Neural Networks

Feature Extraction

Before

Linear score function: $f = Wx$

Now

2-layer Neural Network: $f = W_2 \max(0, W_1 x)$
$W_2 \in \mathbb{R}^{C \times H},\ W_1 \in \mathbb{R}^{H \times D},\ x \in \mathbb{R}^D$

3-layer Neural Network: $f = W_3 \max(0, W_2 \max(0, W_1 x))$
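
A minimal numpy sketch of the 2-layer forward pass; the shapes are illustrative and biases are omitted to match the formula above:

```python
import numpy as np

D, H, C = 4, 8, 3                   # input dim, hidden dim, num classes
rng = np.random.default_rng(0)

W1 = rng.standard_normal((H, D))    # W1 in R^{H x D}
W2 = rng.standard_normal((C, H))    # W2 in R^{C x H}
x = rng.standard_normal(D)          # x in R^D

h = np.maximum(0, W1 @ x)           # hidden activations: max(0, W1 x)
f = W2 @ h                          # class scores: W2 max(0, W1 x)
print(f.shape)                      # (3,)
```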

Activation Function

The $\max(0, \cdot)$ in the formulas above is the ReLU activation. The nonlinearity matters: without it, $f = W_2 W_1 x$ collapses back into a single linear score function.
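
A quick numpy sketch of the usual activation options (ReLU is the one used above):

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)               # max(0, x): used in the formulas above

def sigmoid(x):
    return 1 / (1 + np.exp(-x))           # squashes to (0, 1)

def tanh(x):
    return np.tanh(x)                     # squashes to (-1, 1)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)  # small slope instead of hard zero
```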

Fully-connected network

AKA Multi-layer perceptron

Deep Neural Networks

More hidden layers = more capacity
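
A sketch of an L-layer forward pass, assuming the weights are given as a list of matrices (biases omitted again):

```python
import numpy as np

def forward(x, weights):
    """ReLU after every layer except the last, which outputs raw scores."""
    h = x
    for W in weights[:-1]:
        h = np.maximum(0, W @ h)
    return weights[-1] @ h

rng = np.random.default_rng(0)
dims = [4, 8, 8, 3]                       # D -> H1 -> H2 -> C (3 layers)
weights = [rng.standard_normal((dout, din))
           for din, dout in zip(dims[:-1], dims[1:])]
scores = forward(rng.standard_normal(4), weights)
```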

FAQ

Are we overfitting with too many hidden layers?
Should we reduce the number of hidden layers?

Don’t regularize by shrinking the network; keep the larger model and use stronger L2 regularization instead
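
A minimal sketch of what “stronger L2” means in the loss; the `reg` strength here is a hypothetical value to tune:

```python
import numpy as np

def l2_regularized_loss(data_loss, weights, reg=1e-3):
    """Full loss = data loss + reg * sum of squared weights.
    Tune reg upward for more regularization instead of removing layers."""
    return data_loss + reg * sum(np.sum(W * W) for W in weights)
```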

Universal Approximation

A neural network with one hidden layer can approximate any continuous function $f: \mathbb{R}^N \to \mathbb{R}^M$ to arbitrary precision, given enough hidden units

Examples
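
A toy numpy check of the idea: fitting $\sin(x)$ with one hidden layer of ReLU “hinges”. Hinge positions are random and the output weights come from a least-squares fit rather than gradient descent, which is enough to illustrate the approximation:

```python
import numpy as np

# One hidden layer of ReLUs gives a piecewise-linear function of x.
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 400)
y = np.sin(x)

H = 50
b = rng.uniform(-3, 3, H)                          # hinge locations
hidden = np.maximum(0, x[:, None] - b)             # ReLU(x - b_j)
A = np.column_stack([np.ones_like(x), x, hidden])  # affine part + hinges
w, *_ = np.linalg.lstsq(A, y, rcond=None)

err = np.max(np.abs(A @ w - y))
print(f"max abs error with {H} hidden units: {err:.4f}")  # shrinks as H grows
```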

Convex Function

A function $f: X \subseteq \mathbb{R}^n \to \mathbb{R}$ is convex if for all $x_1, x_2 \in X$ and $t \in [0, 1]$:
$f(t x_1 + (1 - t) x_2) \le t f(x_1) + (1 - t) f(x_2)$
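
A numeric sketch that probes this definition on random points; it can refute convexity but never prove it:

```python
import numpy as np

def seems_convex(f, dim, trials=10_000, seed=0):
    """Test f(t x1 + (1-t) x2) <= t f(x1) + (1-t) f(x2) on random probes."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x1, x2 = rng.standard_normal((2, dim))
        t = rng.uniform()
        if f(t * x1 + (1 - t) * x2) > t * f(x1) + (1 - t) * f(x2) + 1e-9:
            return False   # found a point above the chord: not convex
    return True

print(seems_convex(lambda x: np.sum(x**2), dim=3))     # True:  x^2 is convex
print(seems_convex(lambda x: np.sin(x).sum(), dim=3))  # False: sin is not
```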

A convex function is a (multidimensional) bowl

  • Easy to optimize
  • Converges to the global minimum

Linear classifiers optimize a convex function
Most neural networks require non-convex optimization

  • Few or no guarantees about convergence
  • Seems to work anyways lol
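
A toy illustration of why neural-net losses are non-convex, using a tiny hypothetical 1-D regression: swapping the two hidden units leaves the loss unchanged, but the midpoint of the two weight settings is worse, violating the convexity inequality:

```python
import numpy as np

x, y = np.array([1.0, -1.0]), np.array([1.0, 1.0])   # toy 1-D dataset

def loss(w1a, w1b, w2a, w2b):
    # Mean squared error of a one-hidden-layer net with two ReLU units.
    pred = w2a * np.maximum(0, w1a * x) + w2b * np.maximum(0, w1b * x)
    return np.mean((pred - y) ** 2)

p = (1.0, -1.0, 1.0, 1.0)      # unit A handles x > 0, unit B handles x < 0
q = (-1.0, 1.0, 1.0, 1.0)      # same net with the two hidden units swapped
mid = tuple((a + b) / 2 for a, b in zip(p, q))
print(loss(*p), loss(*q), loss(*mid))   # 0.0, 0.0, 1.0: midpoint is worse
```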