Neural Networks

Feature Extraction

Before

Linear score function: $f = Wx$

Now

2-layer Neural Network: $f = W_2 \max(0, W_1 x)$
$W_2 \in \mathbb{R}^{C \times H},\ W_1 \in \mathbb{R}^{H \times D},\ x \in \mathbb{R}^D$

3-layer Neural Network: $f = W_3 \max(0, W_2 \max(0, W_1 x))$
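
A minimal numpy sketch of the 2-layer forward pass; the shapes are illustrative and biases are omitted to match the formula above:

```python
import numpy as np

D, H, C = 4, 8, 3                   # input dim, hidden dim, num classes
rng = np.random.default_rng(0)

W1 = rng.standard_normal((H, D))    # W1 in R^{H x D}
W2 = rng.standard_normal((C, H))    # W2 in R^{C x H}
x = rng.standard_normal(D)          # x in R^D

h = np.maximum(0, W1 @ x)           # hidden activations: max(0, W1 x)
f = W2 @ h                          # class scores: W2 max(0, W1 x)
print(f.shape)                      # (3,)
```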

Activation Function

The $\max(0, \cdot)$ in the formulas above is the ReLU activation. The nonlinearity matters: without it, $f = W_2 W_1 x$ collapses back into a single linear score function.
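
A quick numpy sketch of the usual activation options (ReLU is the one used above):

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)               # max(0, x): used in the formulas above

def sigmoid(x):
    return 1 / (1 + np.exp(-x))           # squashes to (0, 1)

def tanh(x):
    return np.tanh(x)                     # squashes to (-1, 1)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)  # small slope instead of hard zero
```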

Fully-connected network

AKA Multi-layer perceptron

Deep Neural Networks

More hidden layers = more capacity
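
A sketch of an L-layer forward pass, assuming the weights are given as a list of matrices (biases omitted again):

```python
import numpy as np

def forward(x, weights):
    """ReLU after every layer except the last, which outputs raw scores."""
    h = x
    for W in weights[:-1]:
        h = np.maximum(0, W @ h)
    return weights[-1] @ h

rng = np.random.default_rng(0)
dims = [4, 8, 8, 3]                       # D -> H1 -> H2 -> C (3 layers)
weights = [rng.standard_normal((dout, din))
           for din, dout in zip(dims[:-1], dims[1:])]
scores = forward(rng.standard_normal(4), weights)
```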

FAQ

Are we overfitting with too many hidden layers?
Should we reduce the number of hidden layers?

Don’t regularize by shrinking the network; keep the larger model and use stronger L2 regularization instead
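
A minimal sketch of what “stronger L2” means in the loss; the `reg` strength here is a hypothetical value to tune:

```python
import numpy as np

def l2_regularized_loss(data_loss, weights, reg=1e-3):
    """Full loss = data loss + reg * sum of squared weights.
    Tune reg upward for more regularization instead of removing layers."""
    return data_loss + reg * sum(np.sum(W * W) for W in weights)
```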

Universal Approximation

A neural network with one hidden layer can approximate any continuous function $f: \mathbb{R}^N \to \mathbb{R}^M$ to arbitrary precision, given enough hidden units

Examples
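
A toy numpy check of the idea: fitting $\sin(x)$ with one hidden layer of ReLU “hinges”. Hinge positions are random and the output weights come from a least-squares fit rather than gradient descent, which is enough to illustrate the approximation:

```python
import numpy as np

# One hidden layer of ReLUs gives a piecewise-linear function of x.
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 400)
y = np.sin(x)

H = 50
b = rng.uniform(-3, 3, H)                          # hinge locations
hidden = np.maximum(0, x[:, None] - b)             # ReLU(x - b_j)
A = np.column_stack([np.ones_like(x), x, hidden])  # affine part + hinges
w, *_ = np.linalg.lstsq(A, y, rcond=None)

err = np.max(np.abs(A @ w - y))
print(f"max abs error with {H} hidden units: {err:.4f}")  # shrinks as H grows
```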

Convex Function

A function $f: X \subseteq \mathbb{R}^n \to \mathbb{R}$ is convex if for all $x_1, x_2 \in X$ and $t \in [0, 1]$:
$f(t x_1 + (1 - t) x_2) \le t f(x_1) + (1 - t) f(x_2)$
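
A numeric sketch that probes this definition on random points; it can refute convexity but never prove it:

```python
import numpy as np

def seems_convex(f, dim, trials=10_000, seed=0):
    """Test f(t x1 + (1-t) x2) <= t f(x1) + (1-t) f(x2) on random probes."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x1, x2 = rng.standard_normal((2, dim))
        t = rng.uniform()
        if f(t * x1 + (1 - t) * x2) > t * f(x1) + (1 - t) * f(x2) + 1e-9:
            return False   # found a point above the chord: not convex
    return True

print(seems_convex(lambda x: np.sum(x**2), dim=3))     # True:  x^2 is convex
print(seems_convex(lambda x: np.sin(x).sum(), dim=3))  # False: sin is not
```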

A convex function is a (multidimensional) bowl

  • Easy to optimize
  • Converges to the global minimum

Linear classifiers optimize a convex function
Most neural networks require non-convex optimization

  • Few or no guarantees about convergence
  • Seems to work anyways lol
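
A toy illustration of why neural-net losses are non-convex, using a tiny hypothetical 1-D regression: swapping the two hidden units leaves the loss unchanged, but the midpoint of the two weight settings is worse, violating the convexity inequality:

```python
import numpy as np

x, y = np.array([1.0, -1.0]), np.array([1.0, 1.0])   # toy 1-D dataset

def loss(w1a, w1b, w2a, w2b):
    # Mean squared error of a one-hidden-layer net with two ReLU units.
    pred = w2a * np.maximum(0, w1a * x) + w2b * np.maximum(0, w1b * x)
    return np.mean((pred - y) ** 2)

p = (1.0, -1.0, 1.0, 1.0)      # unit A handles x > 0, unit B handles x < 0
q = (-1.0, 1.0, 1.0, 1.0)      # same net with the two hidden units swapped
mid = tuple((a + b) / 2 for a, b in zip(p, q))
print(loss(*p), loss(*q), loss(*mid))   # 0.0, 0.0, 1.0: midpoint is worse
```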