Batch Normalization

“Normalize” the outputs of a layer so they have zero mean and unit variance

Pasted image 20241201235908.png

Input

x: N × D (a minibatch of N examples, each with D features)
Pasted image 20241202000042.png
Pasted image 20241202000048.png
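
A minimal NumPy sketch of the training-time computation for an N × D input. It assumes the usual formulation: a per-feature mean and variance over the minibatch, then a learnable scale and shift. The names `batchnorm_train`, `gamma`, `beta`, and `eps` are illustrative and do not come from these notes.

```python
import numpy as np

def batchnorm_train(x, gamma, beta, eps=1e-5):
    """Normalize each feature (column) of x over the minibatch (rows)."""
    mu = x.mean(axis=0)                     # per-feature mean, shape (D,)
    var = x.var(axis=0)                     # per-feature biased variance, shape (D,)
    x_hat = (x - mu) / np.sqrt(var + eps)   # roughly zero mean, unit variance per feature
    y = gamma * x_hat + beta                # learnable scale and shift, shape (N, D)
    return y, mu, var

# Toy minibatch: N = 8 examples, D = 4 features, deliberately off-center.
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(8, 4))
y, mu, var = batchnorm_train(x, gamma=np.ones(4), beta=np.zeros(4))
```

With `gamma = 1` and `beta = 0` the output is just the normalized input. `eps` is an assumed small constant that keeps the division stable when a feature has near-zero variance.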

Test-Time

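At test time, the mean and variance estimates should not depend on the minibatch. The usual fix is to use fixed running averages of the statistics seen during training.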
Pasted image 20241202000459.png
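
A sketch of how the train/test switch might look, assuming the statistics are tracked with an exponential moving average. The class name `BatchNorm1D` and the `momentum` convention (weight on the old running value) are assumptions for illustration. Libraries differ on this convention.

```python
import numpy as np

class BatchNorm1D:
    """Hypothetical batch norm layer for inputs of shape (N, D)."""

    def __init__(self, dim, momentum=0.9, eps=1e-5):
        self.gamma = np.ones(dim)          # learnable scale
        self.beta = np.zeros(dim)          # learnable shift
        self.running_mean = np.zeros(dim)  # averaged over training batches
        self.running_var = np.ones(dim)
        self.momentum = momentum           # assumed: weight kept on the old value
        self.eps = eps

    def forward(self, x, training):
        if training:
            # Use minibatch statistics and update the running averages.
            mu, var = x.mean(axis=0), x.var(axis=0)
            m = self.momentum
            self.running_mean = m * self.running_mean + (1 - m) * mu
            self.running_var = m * self.running_var + (1 - m) * var
        else:
            # Use fixed statistics, so the output ignores the rest of the batch.
            mu, var = self.running_mean, self.running_var
        x_hat = (x - mu) / np.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta

rng = np.random.default_rng(0)
bn = BatchNorm1D(dim=3)
for _ in range(50):                        # simulate a few training steps
    bn.forward(rng.normal(2.0, 1.5, size=(16, 3)), training=True)

x_test = rng.normal(2.0, 1.5, size=(5, 3))
alone = bn.forward(x_test[:1], training=False)      # one example by itself
in_batch = bn.forward(x_test, training=False)[:1]   # same example inside a batch
```

Here `alone` and `in_batch` match, because in test mode the layer is a fixed per-feature linear transform.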

Pros
Cons