Pinned | Published in TDS Archive | Understanding the Denoising Diffusion Probabilistic Model, the Socratic Way | A deep dive into the motivation behind the denoising diffusion model and detailed derivations for the loss function | Feb 25, 2023
Pinned | Published in TDS Archive | Understand REINFORCE, Actor-Critic and PPO in one go | Use the loss function of the Policy Gradient algorithm to understand REINFORCE, Actor-Critic, and Proximal Policy Optimization (PPO). | Jul 24, 2024
Pinned | Published in TDS Archive | How Does an Image-Text Multimodal Foundation Model Work | Learn how an image-text multi-modality model can perform image classification, image retrieval, and image captioning | Jun 1, 2024
Pinned | Published in TDS Archive | How Does the Segment-Anything Model’s (SAM’s) Encoder Work? | A deep dive into how image content embedding, sine and cosine positional embedding, guidance click embedding and dense mask embedding is… | May 14, 2024
Pinned | Published in TDS Archive | How does the Segment-Anything Model’s (SAM’s) decoder work? | A deep dive into how the Segment-Anything model’s decoding procedure, with a focus on how its self-attention and cross-attention mechanism… | Mar 24, 2024
Published in TDS Archive | Speeding up vision transformer prediction by 9 times faster with PyTorch, ONNX and TensorRT | How to use 16-bit float, TensorRT, network rewriting and multi-threading to dramatically speed up deep learning model prediction | Jun 4, 2023
Published in TDS Archive | How Decision Trees Split Nodes, from Loss Function Perspective | Learn how a decision tree splits nodes only to minimize its loss function | May 15, 2023
Published in TDS Archive | Distributed data parallel and distributed model parallel in PyTorch | How distributed data parallel (DDP) and distributed model parallel (DMP) work in stochastic gradient descent with large models and huge data | May 8, 2023
Published in TDS Archive | The Input-output Attention Mechanism from “Neural Machine Translation by Jointly Learning… | Learn the math and intuition behind the input-output attention mechanism in an RNN-based language-to-language translation model | Mar 18, 2022
Published in TDS Archive | Can We Use Stochastic Gradient Descent (SGD) on a Linear Regression Model? | Learn why it is valid to use SGD on a linear regression model for parameter learning, see how SGD can nevertheless be inefficient, and appreciate… | Aug 5, 2021