Dropout

In each forward pass, randomly set some neurons to zero; the probability of dropping each neuron is a hyperparameter (0.5 is common)
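
A minimal numpy sketch of the training-time forward pass (the function name and keep probability `p` here are illustrative, not from these notes):

```python
import numpy as np

p = 0.5  # probability of keeping a unit; 0.5 is a common default

def dropout_train(x):
    # Zero each unit independently, keeping it with probability p.
    # A fresh binary mask is sampled on every forward pass.
    mask = (np.random.rand(*x.shape) < p)
    return x * mask
```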

Why do we want this?

Forces the network to have a redundant representation

  • Prevents co-adaptation of features


Dropout can be seen as training a large ensemble of models that share parameters: each binary mask defines a different sub-network, so a network with $n$ units has $2^n$ possible masks

Problems: Test Time

Dropout makes our output random!
Want to “average out” the randomness at test time:

$$y = f(x) = \mathbb{E}_z\big[f(x, z)\big] = \int p(z)\, f(x, z)\, dz$$

where $z$ is the random dropout mask.
This integral is intractable, so we approximate it:

At test time, drop nothing and multiply the activations by the dropout probability $p$, so the output matches its expected value during training
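
As a quick sanity check (a small worked example with an illustrative two-input neuron): let $a = w_1 x + w_2 y$ and keep each input with probability $p = 1/2$. Averaging over the four equally likely masks:

$$\mathbb{E}[a] = \tfrac{1}{4}(w_1 x + w_2 y) + \tfrac{1}{4}(w_1 x + 0) + \tfrac{1}{4}(0 + w_2 y) + \tfrac{1}{4}(0 + 0) = \tfrac{1}{2}(w_1 x + w_2 y)$$

so multiplying the no-dropout output by $p = 1/2$ at test time reproduces the training-time expectation.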
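A sketch of this "vanilla" scheme for a single hidden layer with ReLU (the weight names and shapes are illustrative):

```python
import numpy as np

p = 0.5  # probability of keeping a unit

def train_step(X, W1, W2):
    H = np.maximum(0, X @ W1)              # hidden layer with ReLU
    mask = (np.random.rand(*H.shape) < p)  # sample a fresh mask each pass
    H = H * mask                           # drop units during training
    return H @ W2

def predict(X, W1, W2):
    H = np.maximum(0, X @ W1) * p          # no dropping; scale activations by p
    return H @ W2
```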

Inverted Dropout

Do the re-scaling at training time (divide the kept activations by $p$) rather than at test time, so the test-time forward pass is unchanged
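
A sketch of inverted dropout under the same illustrative setup as above; dividing by `p` during training keeps the expected activation unchanged, so the test-time code needs no modification:

```python
import numpy as np

p = 0.5  # probability of keeping a unit

def train_step(X, W1, W2):
    H = np.maximum(0, X @ W1)
    mask = (np.random.rand(*H.shape) < p) / p  # fold the 1/p rescale into the mask
    H = H * mask
    return H @ W2

def predict(X, W1, W2):
    H = np.maximum(0, X @ W1)  # plain forward pass: no dropout, no scaling
    return H @ W2
```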

Architecture

Dropout is usually applied in the fully-connected layers