Regularization

Prevents overfitting to the training data

The model should be “simple”, so that it generalizes to test data instead of merely fitting the training data

Regularization term: λR(W)
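To make “simple” concrete, here is a minimal sketch that assumes R(W) is L2 regularization (the sum of squared weights). The input `x` and the weight vectors `w1` and `w2` are hypothetical. Both weight vectors give the same score, so the data loss alone cannot prefer either one. The penalty breaks the tie in favour of the weights that are spread out.

```python
def l2_penalty(w):
    """Assumed R(W): L2 regularization, the sum of squared weights."""
    return sum(wi * wi for wi in w)

def score(w, x):
    """Linear score w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))

x = [1.0, 1.0, 1.0, 1.0]
w1 = [1.0, 0.0, 0.0, 0.0]      # puts all weight on a single input
w2 = [0.25, 0.25, 0.25, 0.25]  # spreads weight across all inputs

# Identical scores: the data loss cannot tell w1 and w2 apart.
print(score(w1, x), score(w2, x))      # 1.0 1.0
# The L2 penalty prefers the spread-out, "simpler" w2.
print(l2_penalty(w1), l2_penalty(w2))  # 1.0 0.25
```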

L = \frac{1}{N} \sum_{i=1}^{N} L_i\big(f(x_i, W), y_i\big) + \lambda R(W)
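The formula combines two parts. The first is the data loss averaged over the N training examples, and the second is the weighted regularization term. The sketch below computes both. It makes three assumptions not fixed by the notes: f(x, W) = Wx is a linear classifier, L_i is the multiclass SVM (hinge) loss with margin 1, and R(W) is L2 regularization. The tiny dataset is hypothetical.

```python
def scores(W, x):
    """Assumed f(x, W) = W x: one row of W per class."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def hinge_loss(s, y, delta=1.0):
    """Assumed L_i: multiclass SVM loss for scores s and correct class y."""
    return sum(max(0.0, s[j] - s[y] + delta) for j in range(len(s)) if j != y)

def l2_reg(W):
    """Assumed R(W): sum of squares of every entry of W."""
    return sum(w * w for row in W for w in row)

def full_loss(W, X, Y, lam):
    """L = (1/N) * sum_i L_i(f(x_i, W), y_i) + lam * R(W)."""
    data_loss = sum(hinge_loss(scores(W, x), y) for x, y in zip(X, Y)) / len(X)
    return data_loss + lam * l2_reg(W)

W = [[1.0, 0.0], [0.0, 1.0]]   # 2 classes, 2 features (hypothetical)
X = [[1.0, 2.0], [1.0, 0.0]]   # N = 2 training inputs
Y = [1, 1]                     # correct class of each input

print(full_loss(W, X, Y, lam=0.0))  # 1.0 (data loss only)
print(full_loss(W, X, Y, lam=0.5))  # 2.0 (1.0 + 0.5 * R(W), with R(W) = 2.0)
```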

λ = regularization strength (a hyperparameter)
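One way to see what λ controls is to take a gradient step on the full loss. The sketch below assumes L2 regularization, λ Σ W², whose gradient is 2λW. Under that assumption each step pulls every weight toward zero, which is often called “weight decay”, and a larger λ pulls harder. The function `sgd_step` and the learning rate are hypothetical. The data-loss gradient is set to zero so that the regularizer's effect can be seen on its own.

```python
def sgd_step(W, data_grad, lam, lr):
    """One gradient-descent step on L = data loss + lam * sum(W**2).

    The regularization term adds 2 * lam * w to each weight's gradient,
    so every update shrinks the weights toward zero.
    """
    return [w - lr * (g + 2.0 * lam * w) for w, g in zip(W, data_grad)]

W = [1.0, -2.0]
zero_grad = [0.0, 0.0]  # isolate the effect of the regularizer

for lam in (0.0, 0.1, 1.0):
    # Each weight is multiplied by (1 - 2 * lr * lam): larger lam -> stronger shrinkage.
    print(lam, sgd_step(W, zero_grad, lam, lr=0.1))
```

Setting λ = 0 leaves the weights untouched. In practice λ is tuned like any other hyperparameter, for example on a validation set.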

Common use:

A common pattern (e.g. dropout):

Training: add some kind of randomness

Testing: average out the randomness (sometimes only approximately)
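Dropout is a standard instance of this pattern. The sketch below uses the “inverted dropout” variant, which is one common choice and not necessarily the one these notes assume. During training, each activation is kept with probability `p_keep` and rescaled by 1 / p_keep. At test time no randomness is used, and because of the rescaling the test-time output equals the expected training-time output. The function names and activation values are hypothetical. The loop checks numerically that averaging many random masks approaches the deterministic test-time output.

```python
import random

def dropout_train(h, p_keep, rng):
    """Training: zero each activation at random, keeping it with probability p_keep.

    Kept activations are scaled by 1 / p_keep (inverted dropout), so each
    output has the same expected value as the corresponding input.
    """
    return [x / p_keep if rng.random() < p_keep else 0.0 for x in h]

def dropout_test(h):
    """Testing: no randomness; inverted dropout already matches the expectation."""
    return list(h)

rng = random.Random(0)  # fixed seed for reproducibility
h = [1.0, 2.0, 3.0]
trials = 100_000

# Average the random training-time outputs over many masks.
avg = [0.0] * len(h)
for _ in range(trials):
    out = dropout_train(h, 0.5, rng)
    avg = [a + o / trials for a, o in zip(avg, out)]

print(avg)               # approximately equal to h
print(dropout_test(h))   # exactly h
```

For a single layer, the rescaling matches the expected value of each unit exactly. In a deep network with nonlinearities, the test-time network only approximates the average over all dropout masks, which is why the averaging is sometimes only approximate.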