Understanding Denoising Diffusion Probabilistic Models (DDPMs), the Socratic Way

A deep dive into the motivation behind the denoising diffusion model and detailed derivations for the loss function

Wei Yi
TDS Archive

Photo by Chaozzy Lin on Unsplash

Denoising Diffusion Probabilistic Models by Jonathan Ho et al. is a great paper, but I had difficulty understanding it. So I decided to dive into the model and work out all the derivations. In this article, I will focus on the two main obstacles to understanding the paper:

  1. Why is the denoising diffusion model designed in terms of the forward process, the forward process posteriors, and the backward process, and how are these processes related? By the way, in this article I call the forward process posteriors “the reverse of the forward process” because I find the word “posteriors” confusing, and perhaps subconsciously I want to avoid it because it frightens me: every time it appears, things become complicated.
  2. How to derive the mysterious loss function. The paper skips many steps in deriving the loss function Lₛᵢₘₚₗₑ, so I went through all the derivations to fill in the missing steps. Now I realize that the derivation of the analytical formula for Lₛᵢₘₚₗₑ tells a truly beautiful Bayesian story. And with all the steps filled in, the whole story…
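
As a rough orientation for the two items above, here is a minimal Python sketch, not taken from this article, of the forward process in closed form and of Lₛᵢₘₚₗₑ as they are stated in Ho et al. It assumes NumPy and the linear variance schedule used in the paper (β from 1e-4 to 0.02 over T = 1000 steps). The function names `make_schedule`, `q_sample`, and `l_simple` are hypothetical, and the noise-prediction network is left out: any function that maps (xₜ, t) to a noise estimate could stand in for it.

```python
import numpy as np


def make_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear variance schedule beta_1..beta_T and cumulative products alpha_bar_t.

    The default values follow the schedule reported in Ho et al.
    """
    betas = np.linspace(beta_start, beta_end, T)
    alpha_bars = np.cumprod(1.0 - betas)
    return betas, alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t from q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I).

    t is a 0-based index into alpha_bars (an assumption of this sketch).
    Returns both x_t and the noise eps that produced it.
    """
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps


def l_simple(eps, eps_pred):
    """Squared error between true and predicted noise, averaged over the batch.

    Expects arrays of shape (batch, dim); the squared norm is taken over dim.
    """
    return np.mean(np.sum((eps - eps_pred) ** 2, axis=-1))


# Usage: noise a toy batch and score a (hypothetical) predictor that always outputs zeros.
rng = np.random.default_rng(0)
betas, alpha_bars = make_schedule()
x0 = rng.standard_normal((8, 2))          # toy "data": 8 points in 2 dimensions
xt, eps = q_sample(x0, t=500, alpha_bars=alpha_bars, rng=rng)
loss = l_simple(eps, np.zeros_like(eps))  # stand-in for eps_theta(x_t, t)
```

In a real training loop, t would be drawn uniformly at random for each example and the zero predictor would be replaced by a trained network; the sketch only shows which quantities the loss compares.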

Responses (5)

Excellent step-by-step analysis (derivation). What would rock even more is a Colab notebook that contains the steps in executable form :)

Thank you very much! This is the best material I have found on the topic so far. You are a hero, man!

Line (8) splits the integrating variables into 4 parts, corresponding to x₀, xₜ₋₁, xₜ and xₒₜₕₑᵣ, and re-orders them.

Could you please explain the formula in line 8?
1. How are there 2 integrals in line 8, when in line 7 there was only one?
2. Is q(x_t, x_other, x_0) equal to q(x_t | x_0) * q(x_other | x_0) * q(x_0) ?
Thank you