Gradient Descent
# Vanilla gradient descent
while True:
    weights_grad = evaluate_gradient(loss_fun, data, weights)
    weights -= step_size * weights_grad  # perform parameter update
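The loop above is pseudocode (`evaluate_gradient`, `loss_fun`, and `data` are not defined). A minimal runnable sketch, assuming a toy loss f(w) = ||w||^2 with analytic gradient 2w and a fixed number of iterations in place of `while True`:

```python
import numpy as np

# Hypothetical toy loss: f(w) = ||w||^2, whose gradient is 2w.
def evaluate_gradient(weights):
    return 2.0 * weights

weights = np.array([3.0, -2.0])
step_size = 0.1  # hyperparameter (learning rate)

for _ in range(100):  # fixed step count instead of while True
    weights_grad = evaluate_gradient(weights)
    weights -= step_size * weights_grad  # perform parameter update

print(weights)  # close to [0, 0], the minimizer of ||w||^2
```

Each update shrinks the weights by a factor of (1 - 2 * step_size), so they converge geometrically to the minimum at the origin.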
The step_size (also called the learning rate) is a hyperparameter.
The full sum of the loss over all N training examples is expensive when N is large!
Stochastic Gradient Descent (SGD): approximate the sum using a minibatch of examples; 32/64/128 are common minibatch sizes.
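The minibatch idea can be sketched on a toy least-squares problem (the data, model, batch size, and step count below are illustrative assumptions, not part of the original notes):

```python
import numpy as np

# Toy regression data: y = X @ true_w + small noise.
rng = np.random.default_rng(0)
N, D = 10_000, 5
X = rng.normal(size=(N, D))
true_w = rng.normal(size=D)
y = X @ true_w + 0.01 * rng.normal(size=N)

def minibatch_gradient(w, X_batch, y_batch):
    # Gradient of the mean squared error, computed on the minibatch only
    # rather than on all N examples.
    residual = X_batch @ w - y_batch
    return 2.0 * X_batch.T @ residual / len(y_batch)

w = np.zeros(D)
step_size = 0.05
batch_size = 64  # common choices: 32 / 64 / 128

for _ in range(2_000):
    idx = rng.integers(0, N, size=batch_size)  # sample a random minibatch
    w -= step_size * minibatch_gradient(w, X[idx], y[idx])

print(np.max(np.abs(w - true_w)))  # small: w approaches true_w
```

Each step costs O(batch_size * D) instead of O(N * D), which is why SGD scales to large datasets; the minibatch gradient is a noisy but unbiased estimate of the full gradient.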