Wei Yi in Towards Data Science (pinned)
Understanding the Denoising Diffusion Probabilistic Model, the Socratic Way
A deep dive into the motivation behind the denoising diffusion model and detailed derivations for the loss function
Feb 25, 2023

Wei Yi in Towards Data Science (pinned)
Understand REINFORCE, Actor-Critic and PPO in one go
Use the loss function of the Policy Gradient algorithm to understand REINFORCE, Actor-Critic, and Proximal Policy Optimization (PPO).
Jul 24

Wei Yi in Towards Data Science (pinned)
How Does an Image-Text Multimodal Foundation Model Work
Learn how an image-text multi-modality model can perform image classification, image retrieval, and image captioning
Jun 14

Wei Yi in Towards Data Science (pinned)
How Does the Segment-Anything Model’s (SAM’s) Encoder Work?
A deep dive into how image content embedding, sine and cosine positional embedding, guidance click embedding and dense mask embedding is…
May 14

Wei Yi in Towards Data Science (pinned)
How does the Segment-Anything Model’s (SAM’s) decoder work?
A deep dive into how the Segment-Anything model’s decoding procedure, with a focus on how its self-attention and cross-attention mechanism…
Mar 24

Wei Yi in Towards Data Science
Speeding up vision transformer prediction by 9 times faster with PyTorch, ONNX and TensorRT
How to use 16-bit float, TensorRT, network rewriting and multi-threading to dramatically speed up deep learning model prediction
Jun 4, 2023

Wei Yi in Towards Data Science
How Decision Trees Split Nodes, from Loss Function Perspective
Learn how a decision tree splits nodes only to minimize its loss function
May 15, 2023

Wei Yi in Towards Data Science
Distributed data parallel and distributed model parallel in PyTorch
How distributed data parallel (DDP) and distributed model parallel (DMP) work in stochastic gradient descent with large models and huge data
May 8, 2023

Wei Yi in Towards Data Science
The Input-output Attention Mechanism from “Neural Machine Translation by Jointly Learning…
Learn the math and intuition behind the input-output attention mechanism in an RNN-based language-to-language translation model
Mar 18, 2022

Wei Yi in Towards Data Science
Can We Use Stochastic Gradient Descent (SGD) on a Linear Regression Model?
Learn why it is valid to use SGD on a linear regression model for parameter learning, see that SGD can nevertheless be inefficient, and appreciate…
Aug 5, 2021