Fully visible belief network (FVBN)

Explicit density model

p(x)=p(x1,x2,...,xn)
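The joint density above becomes tractable once it is factorised with the chain rule of probability, which is the defining step of an FVBN. A sketch in LaTeX, using the same symbols as the line above and assuming a fixed ordering of the n variables:

```latex
p(x) = p(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} p(x_i \mid x_1, \ldots, x_{i-1})
```

Each factor is a one-dimensional conditional distribution, so a model only has to predict one variable at a time given the ones before it.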

PixelRNN

  1. Generate image pixels one at a time, starting at the upper left corner
  2. Compute a hidden state for each pixel that depends on hidden states and RGB values from the left and from above
  3. At each pixel, predict red, then green, then blue
    1. Each channel is a softmax over the 256 intensity values [0, 1, …, 255]
Sequential generation is slow!

Each pixel depends implicitly on all pixels above and to the left
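
The generation loop described above can be sketched as follows. This is a minimal illustration, not PixelRNN itself: `toy_logits` is a hypothetical stand-in for the trained network (it only looks at the left neighbour), and `sample_image` is an illustrative name. The point is the loop structure, where every value must be sampled before the next one can be predicted.

```python
import numpy as np

def toy_logits(image, row, col, channel):
    """Hypothetical stand-in for a trained network: returns 256 logits.

    It only reads a value generated earlier in raster order
    (the same channel of the left neighbour).
    """
    left = image[row, col - 1, channel] if col > 0 else 128
    values = np.arange(256)
    # Favour intensities close to the left neighbour.
    return -((values - left) ** 2) / (2.0 * 20.0 ** 2)

def sample_image(height, width, logits_fn, rng):
    """Sample pixels one at a time, starting at the upper left corner."""
    image = np.zeros((height, width, 3), dtype=np.int64)
    for row in range(height):
        for col in range(width):
            for channel in range(3):  # one channel at a time
                logits = logits_fn(image, row, col, channel)
                # Softmax over the 256 intensity values.
                probs = np.exp(logits - logits.max())
                probs /= probs.sum()
                image[row, col, channel] = rng.choice(256, p=probs)
    return image

rng = np.random.default_rng(0)
sample = sample_image(4, 4, toy_logits, rng)
```

The three nested loops make one network call per channel per pixel, which is why generation cost grows with the number of pixels and cannot be parallelised across them.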

PixelCNN

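  1. Still generates image pixels one at a time, starting from the upper left corner
  2. Dependency on previous pixels is now modelled using a CNN over a context region (masked convolution)
    Training is faster than PixelRNN: the context values are known from the training image, so the convolutions run in parallel over all pixels
    Generation is still sequential → slow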
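
A masked convolution can be sketched in plain NumPy as below. This is a single-channel illustration under simplifying assumptions (no colour channels, no stacked layers); `causal_mask` and `masked_conv2d` are illustrative names, not a library API. The mask zeroes every kernel weight that would look at a pixel not yet generated: the rows below, and the positions to the right in the current row. The `include_center` flag corresponds to what the PixelCNN paper calls type A (centre excluded) and type B (centre included) masks.

```python
import numpy as np

def causal_mask(kernel_size, include_center):
    """1 where the kernel may look, 0 at pixels not yet generated."""
    mask = np.zeros((kernel_size, kernel_size))
    center = kernel_size // 2
    mask[:center, :] = 1.0        # all rows above
    mask[center, :center] = 1.0   # same row, to the left
    if include_center:
        mask[center, center] = 1.0
    return mask

def masked_conv2d(image, kernel, mask):
    """Zero-padded, same-size cross-correlation with a masked kernel."""
    k = kernel.shape[0]
    padded = np.pad(image, k // 2)
    weights = kernel * mask
    out = np.zeros_like(image, dtype=float)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            out[r, c] = np.sum(padded[r:r + k, c:c + k] * weights)
    return out

rng = np.random.default_rng(0)
image = rng.random((5, 5))
kernel = rng.standard_normal((3, 3))
mask = causal_mask(3, include_center=False)
before = masked_conv2d(image, kernel, mask)

# Perturb pixel (2, 2): outputs at or before it in raster order must not change.
changed = image.copy()
changed[2, 2] += 1.0
after = masked_conv2d(changed, kernel, mask)
```

Because the output at every position depends only on earlier pixels, all positions of a training image can be computed in one pass, which is the source of the training speed-up. At generation time the inputs do not exist yet, so the loop over pixels remains.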
Pros

  • Can explicitly compute likelihood p(x)
  • Explicit likelihood of training data gives good evaluation metric
  • Good samples

Cons

  • Sequential generation → slow

Improving PixelCNN performance