Understanding Denoising Diffusion Probabilistic Models (DDPMs), the Socratic Way
A deep dive into the motivation behind the denoising diffusion model and detailed derivations for the loss function
Denoising Diffusion Probabilistic Models by Jonathan Ho et al. is a great paper, but I had difficulty understanding it. So I dove into the model and worked out all the derivations. In this article, I will focus on the two main obstacles to understanding the paper:
- Why is the denoising diffusion model designed in terms of the forward process, the forward process posteriors, and the reverse process, and what is the relationship among these processes? By the way, in this article I call the forward process posteriors “the reverse of the forward process,” because the word “posteriors” confuses me, and/or I subconsciously want to avoid it as it frightens me: every time it appears, things become complicated.
- How to derive the mysterious loss function. The paper skips many steps in deriving the loss function Lₛᵢₘₚₗₑ, so I went through all the derivations to fill in the missing ones. I now realize that the derivation of the analytical formula for Lₛᵢₘₚₗₑ tells a truly beautiful Bayesian story. And with all the steps filled in, the whole story…
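To make the two objects above concrete before the derivations, here is a minimal NumPy sketch of the closed-form forward process q(xₜ | x₀) and the Lₛᵢₘₚₗₑ objective from the DDPM paper. The schedule values (1e-4 to 0.02 over 1000 steps) are the ones used in the paper; `eps_pred` stands in for the output of a noise-prediction network εθ, which is not implemented here.

```python
import numpy as np

# Linear beta schedule from the DDPM paper: T = 1000 steps.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # cumulative product: alpha_bar_t


def q_sample(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps,
    with eps ~ N(0, I). Returns both x_t and the noise eps."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps


def l_simple(eps_true, eps_pred):
    """L_simple = E[ || eps - eps_theta(x_t, t) ||^2 ].
    eps_pred is a placeholder for the network's noise prediction."""
    return np.mean((eps_true - eps_pred) ** 2)


rng = np.random.default_rng(0)
x0 = rng.standard_normal(8)  # a toy 8-"pixel" data point
xt, eps = q_sample(x0, t=500, rng=rng)
```

Note that by t = T the signal coefficient sqrt(ᾱₜ) is close to zero, so xₜ is nearly pure Gaussian noise; this is exactly why the reverse process can start from N(0, I).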