Diffusion

![[Pasted image 20241205171034.png]]

Gradually add Gaussian noise to the data, then learn to reverse the noising process to generate samples

Forward diffusion process:
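For reference, a standard DDPM-style formulation (assuming a variance schedule $\beta_t$ and $\bar\alpha_t = \prod_{s=1}^{t}(1-\beta_s)$):

$$
q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t \mathbf{I}\right)
$$

$$
q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar\alpha_t}\,x_0,\ (1-\bar\alpha_t)\,\mathbf{I}\right)
$$

so $x_t$ can be sampled from $x_0$ in one step: $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ with $\epsilon \sim \mathcal{N}(0, \mathbf{I})$.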

Training Objective

![[Pasted image 20241205171209.png]]
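A common choice is DDPM's simplified noise-prediction loss, where $\epsilon_\theta$ is trained to predict the noise that was added:

$$
L_{\text{simple}} = \mathbb{E}_{x_0,\ \epsilon \sim \mathcal{N}(0,\mathbf{I}),\ t}\left[\left\lVert \epsilon - \epsilon_\theta\!\left(\sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon,\ t\right)\right\rVert^2\right]
$$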

Network structure

Diffusion models often use a U-Net architecture with ResNet blocks and self-attention layers to represent $\epsilon_\theta(x_t, t)$
Time representation: sinusoidal positional embeddings or random Fourier features
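A minimal sketch of the sinusoidal timestep embedding in PyTorch (the embedding dimension and the frequency base 10000 are conventional choices; in practice the result is usually passed through a small MLP before being injected into the ResNet blocks):

```python
import math
import torch

def sinusoidal_time_embedding(t: torch.Tensor, dim: int = 128) -> torch.Tensor:
    """Map integer timesteps t of shape [B] to [B, dim] sinusoidal embeddings."""
    half = dim // 2
    # Geometrically spaced frequencies, as in transformer positional encodings
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, dtype=torch.float32) / half)
    args = t.float()[:, None] * freqs[None, :]                    # [B, half]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)  # [B, dim]

# Example: embeddings for a batch of 4 timesteps
emb = sinusoidal_time_embedding(torch.tensor([0, 10, 100, 999]))
print(emb.shape)  # torch.Size([4, 128])
```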

Latent Diffusion Models

Map the data into a compressed latent space
Train the diffusion model efficiently in that latent space (see the training-step sketch below)
![[Pasted image 20241205171519.png]]
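A minimal sketch of one latent-diffusion training step, assuming a frozen pretrained `encoder`, a noise-prediction network `eps_theta`, and a precomputed `alphas_cumprod` tensor (all names are placeholders, not the actual Stable Diffusion API):

```python
import torch
import torch.nn.functional as F

def ldm_training_step(x, encoder, eps_theta, alphas_cumprod):
    """Simplified LDM step: diffusion runs on the latent z, not on pixels."""
    with torch.no_grad():
        z0 = encoder(x)                                   # compress, e.g. 256x256x3 -> 32x32x4
    t = torch.randint(0, len(alphas_cumprod), (z0.shape[0],), device=z0.device)
    eps = torch.randn_like(z0)
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)           # \bar{alpha}_t per sample
    zt = a_bar.sqrt() * z0 + (1.0 - a_bar).sqrt() * eps   # forward diffusion in latent space
    return F.mse_loss(eps_theta(zt, t), eps)              # same noise-prediction loss as above
```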

Advantages
  1. Compressed latent space: train the diffusion model at lower resolution
    1. Computationally more efficient
  2. Regularized, smooth / compressed latent space
    1. Easier task for the diffusion model and faster sampling
  3. Flexibility
    1. Autoencoders can be tailored to the data

Conditioning information is fed into the latent diffusion model via cross-attention:
Query: (flattened) intermediate feature maps of the U-Net
Key / Value: output of the conditioning encoder $\tau_\theta(y)$, e.g. a text encoder
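A minimal sketch of such a cross-attention layer in PyTorch (the dimensions, residual connection, and output projection are assumptions, not the exact Stable Diffusion implementation):

```python
import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    """Queries come from the U-Net latent features; keys and values come
    from the conditioning tokens, e.g. text embeddings from tau_theta(y)."""

    def __init__(self, latent_dim: int, cond_dim: int, attn_dim: int = 256, heads: int = 8):
        super().__init__()
        self.to_q = nn.Linear(latent_dim, attn_dim)
        self.to_k = nn.Linear(cond_dim, attn_dim)
        self.to_v = nn.Linear(cond_dim, attn_dim)
        self.attn = nn.MultiheadAttention(attn_dim, heads, batch_first=True)
        self.proj = nn.Linear(attn_dim, latent_dim)

    def forward(self, z_flat: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # z_flat: [B, H*W, latent_dim] flattened U-Net feature map
        # cond:   [B, L, cond_dim]     conditioning tokens
        q, k, v = self.to_q(z_flat), self.to_k(cond), self.to_v(cond)
        out, _ = self.attn(q, k, v)
        return z_flat + self.proj(out)   # residual connection back into the U-Net path
```

For text-to-image models, `cond` would be the output of a text encoder such as CLIP's.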

Compression and Encoding

Diffusion Model

![[Pasted image 20241205173219.png]]

Latent Diffusion Model

![[Pasted image 20241205173247.png]]

Diffusion Transformers