Wei Yi – Medium

Wei Yi

Pinned

Published in
TDS Archive

Understanding the Denoising Diffusion Probabilistic Model, the Socratic Way

A deep dive into the motivation behind the denoising diffusion model and detailed derivations for the loss function

Feb 25, 2023


Pinned

Published in
TDS Archive

Understand REINFORCE, Actor-Critic and PPO in one go

Use the loss function of the Policy Gradient algorithm to understand REINFORCE, Actor-Critic, and Proximal Policy Optimization (PPO).

Jul 24, 2024


Pinned

Published in
TDS Archive

How Does an Image-Text Multimodal Foundation Model Work

Learn how an image-text multi-modality model can perform image classification, image retrieval, and image captioning

Jun 1, 2024


Pinned

Published in
TDS Archive

How Does the Segment-Anything Model’s (SAM’s) Encoder Work?

A deep dive into how image content embedding, sine and cosine positional embedding, guidance click embedding and dense mask embedding are…

May 14, 2024


Pinned

Published in
TDS Archive

How does the Segment-Anything Model’s (SAM’s) decoder work?

A deep dive into the Segment-Anything model’s decoding procedure, with a focus on how its self-attention and cross-attention mechanism…

Mar 24, 2024


Published in
TDS Archive

Speeding up vision transformer prediction by 9 times faster with PyTorch, ONNX and TensorRT

How to use 16-bit floats, TensorRT, network rewriting and multi-threading to dramatically speed up deep learning model prediction

Jun 4, 2023


Published in
TDS Archive

How Decision Trees Split Nodes, from Loss Function Perspective

Learn how a decision tree splits nodes only to minimize its loss function

May 15, 2023


Published in
TDS Archive

Distributed data parallel and distributed model parallel in PyTorch

How distributed data parallel (DDP) and distributed model parallel (DMP) work in stochastic gradient descent with large models and huge data

May 8, 2023


Published in
TDS Archive

The Input-output Attention Mechanism from “Neural Machine Translation by Jointly Learning…

Learn the math and intuition behind the input-output attention mechanism in an RNN-based language-to-language translation model

Mar 18, 2022


Published in
TDS Archive

Can We Use Stochastic Gradient Descent (SGD) on a Linear Regression Model?

Learn why it is valid to use SGD on a linear regression model for parameter learning, see that SGD can nevertheless be inefficient, and appreciate…

Aug 5, 2021


Wei Yi


Friend of Medium

I'm leading the Deep Learning team at AstraZeneca. Previously I worked at SecondMind and Microsoft Research, and was also CTO of a hedge fund, EQB.
