Dropout
In each forward pass, randomly set some neurons to zero
- Probability of dropping is a hyperparameter (see the sketch below)
- 0.5 is common
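A minimal numpy sketch of the training-time forward pass; the function and variable names are illustrative, not from the source:

```python
import numpy as np

def dropout_forward_train(h, p_drop=0.5):
    # Binary mask: each unit is kept with probability (1 - p_drop)
    mask = (np.random.rand(*h.shape) > p_drop).astype(h.dtype)
    return h * mask  # dropped units are zeroed for this forward pass

# Example: apply dropout to a layer of 10 hidden activations
h = np.random.randn(10)
h_dropped = dropout_forward_train(h, p_drop=0.5)
```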
Why do we want this?
Forces the network to have a redundant representation
- Prevents co-adaptation of features
Another interpretation: dropout trains a large ensemble of models (that share parameters)
- Each binary mask is one model
Problem: Test Time
Dropout makes our output random!
Want to “average out” the randomness at test time: y = f(x) = E_z[ f(x, z) ] = ∫ p(z) f(x, z) dz
- this integral seems hard to evaluate…
Approximate the integral
At test time, drop nothing and multiply by the dropout probability (see the sketch below)
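A sketch of this test-time approximation, using the same illustrative names as above. Here the activations are scaled by the keep probability (1 - p_drop); with p_drop = 0.5 this equals the drop probability, which is why the factor is often just called "the dropout probability":

```python
def dropout_forward_test(h, p_drop=0.5):
    # Drop nothing; scale by the keep probability (1 - p_drop) so the
    # expected activation matches what the network saw during training.
    return h * (1.0 - p_drop)
```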
Inverted Dropout
Do the re-scaling at training time rather than at test time (see the sketch below)
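A sketch of the inverted variant under the same assumptions: divide the kept activations by the keep probability during training, so the test-time forward pass needs no change.

```python
import numpy as np

def inverted_dropout_train(h, p_drop=0.5):
    # Scale surviving units by 1 / (1 - p_drop) during training...
    mask = (np.random.rand(*h.shape) > p_drop) / (1.0 - p_drop)
    return h * mask

def inverted_dropout_test(h):
    # ...so at test time the forward pass is the plain, unmodified pass.
    return h
```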
Architecture
Dropout is usually applied in the fully-connected layers
- later architectures (e.g., GoogLeNet, ResNet) use global average pooling instead of fully-connected layers
- so they don’t use dropout