Training Neural Networks

Overview

  1. One-time setup
    1. Activation functions
    2. Data pre-processing
    3. Weight initialization
    4. Regularization
  2. Training Dynamics
    1. Learning rate schedules
    2. Large-batch training
    3. Hyperparameter optimization
  3. After training
    1. Model ensembles
    2. Transfer learning

Activation Functions

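A minimal sketch of a few commonly used activation functions, written for scalar inputs. The particular set (ReLU, Leaky ReLU, sigmoid, tanh), the function names, and the 0.01 negative slope are assumptions for illustration. `sigmoid_grad` is included to show why saturating activations can slow learning: their local gradient shrinks toward 0 for large-magnitude inputs.

```python
import math

def relu(x):
    # max(0, x): zero for negative inputs, identity for positive ones
    return max(0.0, x)

def leaky_relu(x, negative_slope=0.01):
    # Like ReLU, but keeps a small (assumed 0.01) slope for negative inputs
    return x if x > 0 else negative_slope * x

def sigmoid(x):
    # Squashes inputs into (0, 1); split by sign to avoid overflow in exp()
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def sigmoid_grad(x):
    # Local gradient of the sigmoid; close to 0 when the input saturates
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh(x):
    # Zero-centered squashing into (-1, 1)
    return math.tanh(x)

print(sigmoid_grad(0.0))   # 0.25, the largest possible sigmoid gradient
print(sigmoid_grad(10.0))  # tiny: a saturated unit passes almost no gradient
```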

Data Preprocessing
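One common preprocessing step is to zero-center each input feature and scale it to unit standard deviation, using statistics computed on the training set only and then reused unchanged for validation and test data. The sketch below assumes plain lists of feature rows; `fit_normalizer` and `normalize` are hypothetical names.

```python
import math

def fit_normalizer(train_rows):
    """Per-feature mean and standard deviation, computed on training data only."""
    n = len(train_rows)
    num_features = len(train_rows[0])
    means = [sum(row[j] for row in train_rows) / n for j in range(num_features)]
    stds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in train_rows) / n)
        for j in range(num_features)
    ]
    return means, stds

def normalize(rows, means, stds, eps=1e-8):
    """Zero-center and scale each feature; eps guards against constant features."""
    return [
        [(value - m) / (s + eps) for value, m, s in zip(row, means, stds)]
        for row in rows
    ]

train = [[1.0, 10.0], [3.0, 30.0]]
means, stds = fit_normalizer(train)            # [2.0, 20.0] and [1.0, 10.0]
train_norm = normalize(train, means, stds)     # ≈ [[-1.0, -1.0], [1.0, 1.0]]
test_norm = normalize([[2.0, 40.0]], means, stds)  # ≈ [[0.0, 2.0]], training stats reused
```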

Weight Initialization
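For ReLU networks, one widely used choice is Kaiming (He) initialization: zero-mean Gaussian weights with standard deviation $\sqrt{2 / \text{fan\_in}}$, which keeps the mean squared activation roughly constant from layer to layer. The sketch below, a toy experiment with assumed width, depth, and seed, compares it with a small fixed standard deviation of 0.01, under which activations shrink toward zero as depth grows. Biases are omitted and all helper names are hypothetical.

```python
import math
import random

def init_weights(fan_in, fan_out, std, rng):
    # fan_in x fan_out matrix of i.i.d. zero-mean Gaussian weights
    return [[rng.gauss(0.0, std) for _ in range(fan_out)] for _ in range(fan_in)]

def relu_layer(x, w):
    # y_j = max(0, sum_i x_i * w_ij); biases omitted for brevity
    fan_out = len(w[0])
    return [max(0.0, sum(xi * wi[j] for xi, wi in zip(x, w))) for j in range(fan_out)]

def mean_square_after(depth, width, std, seed=0):
    """Mean squared activation after `depth` ReLU layers with weight std `std`."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(width)]
    for _ in range(depth):
        x = relu_layer(x, init_weights(width, width, std, rng))
    return sum(v * v for v in x) / width

width = 128
kaiming_std = math.sqrt(2.0 / width)  # Kaiming / He: std = sqrt(2 / fan_in)
kaiming_ms = mean_square_after(10, width, kaiming_std)  # stays on the order of 1
small_ms = mean_square_after(10, width, 0.01)           # collapses toward 0
```

The factor 2 compensates for ReLU zeroing out roughly half of its inputs.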

Regularization

Common:

Try Cutout and Mixup for small classification datasets.
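A minimal sketch of both augmentations. Mixup blends two training examples and their one-hot labels with a weight λ drawn from Beta(α, α), so the targets become soft. Cutout zeroes a random square patch of an input image. This assumes flat feature lists for Mixup and a single-channel image stored as a list of rows for Cutout; the function names, α = 0.2, and the patch size are illustrative assumptions.

```python
import random

def mixup(x1, y1, x2, y2, alpha, rng):
    """Blend two examples and their one-hot labels with lam ~ Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam

def cutout(image, size, rng):
    """Return a copy of a 2-D image with a size x size square zeroed out.

    The square's center is uniform over the image, so the patch may be
    clipped at the border.
    """
    h, w = len(image), len(image[0])
    cy, cx = rng.randrange(h), rng.randrange(w)
    top, left = max(0, cy - size // 2), max(0, cx - size // 2)
    bottom = min(h, cy - size // 2 + size)
    right = min(w, cx - size // 2 + size)
    out = [row[:] for row in image]  # leave the original image untouched
    for i in range(top, bottom):
        for j in range(left, right):
            out[i][j] = 0.0
    return out

rng = random.Random(0)
x, y, lam = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], alpha=0.2, rng=rng)
img = [[1.0] * 8 for _ in range(8)]
masked = cutout(img, size=4, rng=rng)
```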

Learning Rate

SGD, SGD + Momentum, Adagrad, RMSProp, and Adam all have the learning rate as a hyperparameter.
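To make the role of the learning rate concrete, here is a sketch of three of these update rules on a toy 1-D loss $f(x) = x^2$. The toy loss and the defaults `rho=0.9`, `beta1=0.9`, `beta2=0.999` are assumptions chosen as common values. In every rule `lr` multiplies the final step; Adagrad and RMSProp follow the same pattern.

```python
import math

def grad(x):
    # Gradient of the toy loss f(x) = x**2 (minimum at x = 0)
    return 2.0 * x

def sgd(x, lr, steps):
    for _ in range(steps):
        x -= lr * grad(x)  # step size is set directly by lr
    return x

def sgd_momentum(x, lr, steps, rho=0.9):
    v = 0.0
    for _ in range(steps):
        v = rho * v + grad(x)  # running "velocity" of gradients
        x -= lr * v            # lr scales the velocity step
    return x

def adam(x, lr, steps, beta1=0.9, beta2=0.999, eps=1e-8):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g       # first moment (momentum)
        v = beta2 * v + (1 - beta2) * g * g   # second moment
        m_hat = m / (1 - beta1 ** t)          # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)  # lr still scales the step
    return x

print(sgd(5.0, lr=0.1, steps=100))  # converges toward 0
print(sgd(5.0, lr=1.1, steps=50))   # learning rate too large: diverges
```

On this loss, plain SGD multiplies `x` by `1 - 2 * lr` each step, so any `lr` above 1 makes the iterates grow instead of shrink.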

Early Stopping

How long to train?
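One practical answer is to track validation accuracy every epoch, remember the epoch (checkpoint) where it was best, and stop once it has not improved for a while. The sketch below assumes a "patience" rule for stopping and a precomputed list of per-epoch validation accuracies; the curve is hypothetical, not a measured result.

```python
def early_stopping(val_accuracies, patience):
    """Stop after `patience` epochs without a new best validation accuracy.

    Returns (best_epoch, best_accuracy, stop_epoch). In a real training loop
    you would save a checkpoint at each new best and restore it at the end.
    """
    best_epoch, best_acc = -1, float("-inf")
    for epoch, acc in enumerate(val_accuracies):
        if acc > best_acc:
            best_epoch, best_acc = epoch, acc  # new best: "save checkpoint" here
        elif epoch - best_epoch >= patience:
            return best_epoch, best_acc, epoch
    return best_epoch, best_acc, len(val_accuracies) - 1

# Hypothetical validation curve: improves, plateaus, then starts to overfit
curve = [0.50, 0.60, 0.65, 0.64, 0.63, 0.66, 0.62, 0.61, 0.60]
print(early_stopping(curve, patience=3))  # (5, 0.66, 8)
```

A small patience stops early and can miss a later improvement (here, `patience=2` stops at epoch 4 and keeps epoch 2).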

Choosing Hyperparameters
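One common strategy is random search, with scale-type hyperparameters such as the learning rate and weight decay sampled uniformly in log space. The sketch below assumes those two hyperparameters, assumed search ranges, and a hypothetical `toy_val_error` standing in for "train briefly and report validation error".

```python
import math
import random

def sample_log_uniform(low, high, rng):
    # Uniform in log10 space, so each decade is equally likely to be tried
    return 10 ** rng.uniform(math.log10(low), math.log10(high))

def random_search(evaluate, num_trials, rng):
    """Return (best_score, best_config); lower scores are better."""
    best = None
    for _ in range(num_trials):
        config = {
            "lr": sample_log_uniform(1e-5, 1e-1, rng),
            "weight_decay": sample_log_uniform(1e-6, 1e-2, rng),
        }
        score = evaluate(config)  # e.g. validation error after a short run
        if best is None or score < best[0]:
            best = (score, config)
    return best

def toy_val_error(config):
    # Hypothetical stand-in objective, minimized at lr = 1e-3, weight_decay = 1e-4
    return (math.log10(config["lr"]) + 3) ** 2 + (math.log10(config["weight_decay"]) + 4) ** 2

best_score, best_config = random_search(toy_val_error, num_trials=200, rng=random.Random(0))
```

Sampling in log space matters because a learning rate of 1e-4 versus 1e-3 is as large a change as 1e-2 versus 1e-1.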

Model Ensembles
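A simple form of ensembling averages the predicted class probabilities of several independently trained models and takes the most likely class. The three probability vectors below are hypothetical model outputs for a single input, and `ensemble_predict` is an illustrative name.

```python
def ensemble_predict(per_model_probs):
    """Average class probabilities across models; return (class, averaged probs)."""
    num_models = len(per_model_probs)
    num_classes = len(per_model_probs[0])
    avg = [sum(probs[c] for probs in per_model_probs) / num_models
           for c in range(num_classes)]
    return max(range(num_classes), key=lambda c: avg[c]), avg

# Hypothetical softmax outputs of three models for one input (two classes)
pred, avg = ensemble_predict([[0.6, 0.4], [0.3, 0.7], [0.45, 0.55]])
# pred == 1, avg ≈ [0.45, 0.55]
```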

Transfer Learning