CNN Architectures

AlexNet (8 Layers)

Important

  • 227×227 inputs
  • 5 convolutional layers, with max pooling
  • 3 fully connected layers
  • ReLU nonlinearities

Trends

  • Most of the memory usage is in the early convolution layers
  • Nearly all of the parameters are in the fully connected layers
  • Most of the floating-point computation happens in the convolution layers

ZFNet: A Bigger AlexNet (8 Layers)

Refined AlexNet through more trial and error: less aggressive downsampling in the first conv layer and more filters in the later conv layers.

Takeaway: bigger networks seem to work better, but the design is still hand-tuned.

VGG (16/19 Layers)

Design rules

  1. All conv layers are 3×3, stride 1, pad 1
  2. All max pools are 2×2, stride 2
  3. After each pool, double the number of channels

Five conv stages, then three FC layers (VGG-16 shown; VGG-19 adds a fourth conv to each of the last three stages)

  1. conv-conv-pool
  2. conv-conv-pool
  3. conv-conv-conv-pool
  4. conv-conv-conv-pool
  5. conv-conv-conv-pool

AlexNet vs VGG: VGG is a much bigger network, with far more memory, parameters, and floating-point operations per image.
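A minimal sketch in PyTorch of one VGG-style stage following these rules (the `vgg_stage` helper and exact channel numbers are illustrative, not the reference implementation):

```python
import torch
import torch.nn as nn

def vgg_stage(in_channels, out_channels, num_convs):
    """One VGG-style stage: num_convs 3x3 convs, then a 2x2 max pool."""
    layers = []
    for i in range(num_convs):
        layers.append(nn.Conv2d(in_channels if i == 0 else out_channels,
                                out_channels, kernel_size=3, stride=1, padding=1))
        layers.append(nn.ReLU(inplace=True))
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))  # halves H and W
    return nn.Sequential(*layers)

# VGG-16-style stack: channels double after each pool (capped at 512)
features = nn.Sequential(
    vgg_stage(3, 64, 2),
    vgg_stage(64, 128, 2),
    vgg_stage(128, 256, 3),
    vgg_stage(256, 512, 3),
    vgg_stage(512, 512, 3),
)
x = torch.randn(1, 3, 224, 224)
print(features(x).shape)  # torch.Size([1, 512, 7, 7])
```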


GoogLeNet (22 Layers)

Focus on Efficiency!

Stem network

Aggressively downsamples the input at the start (224×224 down to 28×28 within the first few layers), so the expensive layers operate at low spatial resolution.

Compare to VGG, which spends most of its compute at high spatial resolution.

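A rough sketch of a GoogLeNet-style stem in PyTorch (layer sizes approximate the original paper; treat them as illustrative): the input shrinks from 224×224 to 28×28 before any Inception modules run.

```python
import torch
import torch.nn as nn

# Approximate stem: 224x224x3 -> 28x28x192 in a handful of layers,
# so the Inception modules work at low spatial resolution.
stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),   # 224 -> 112
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),        # 112 -> 56
    nn.Conv2d(64, 64, kernel_size=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 192, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),        # 56 -> 28
)
print(stem(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 192, 28, 28])
```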

Inception Module

A local unit with parallel branches; this local structure is repeated many times throughout the network.

No kernel-size hyperparameter to tune: run all the kernel sizes in parallel (1×1, 3×3, and 5×5 convs plus a pooling branch) and concatenate the outputs. 1×1 bottleneck convolutions keep the channel counts, and therefore the cost, under control.

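A simplified Inception-style module in PyTorch (the per-branch channel counts are illustrative): parallel 1×1, 3×3, 5×5, and pooling branches, with 1×1 bottlenecks, concatenated along the channel dimension.

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Parallel branches with different kernel sizes, concatenated on channels."""
    def __init__(self, in_channels):
        super().__init__()
        self.branch1 = nn.Conv2d(in_channels, 64, kernel_size=1)
        self.branch3 = nn.Sequential(                        # 1x1 bottleneck, then 3x3
            nn.Conv2d(in_channels, 96, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(96, 128, kernel_size=3, padding=1),
        )
        self.branch5 = nn.Sequential(                        # 1x1 bottleneck, then 5x5
            nn.Conv2d(in_channels, 16, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 32, kernel_size=5, padding=2),
        )
        self.branch_pool = nn.Sequential(                     # pool, then 1x1 projection
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_channels, 32, kernel_size=1),
        )

    def forward(self, x):
        outs = [self.branch1(x), self.branch3(x), self.branch5(x), self.branch_pool(x)]
        return torch.cat(outs, dim=1)   # 64 + 128 + 32 + 32 = 256 output channels

x = torch.randn(1, 192, 28, 28)
print(InceptionModule(192)(x).shape)  # torch.Size([1, 256, 28, 28])
```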

Global Average Pooling

No large FC layers at the end!

Use global average pooling to collapse spatial dimensions, and one linear layer to produce class scores
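A minimal sketch of the idea in PyTorch (the 1024-channel feature map and 1000 classes are assumptions): average each channel over the spatial grid, then a single linear layer produces class scores.

```python
import torch
import torch.nn as nn

num_classes = 1000                      # assumption: ImageNet-style output
features = torch.randn(8, 1024, 7, 7)   # final conv feature map (N, C, H, W)

gap = nn.AdaptiveAvgPool2d(1)           # global average pool: (N, C, 7, 7) -> (N, C, 1, 1)
classifier = nn.Linear(1024, num_classes)

pooled = gap(features).flatten(1)       # (N, C)
scores = classifier(pooled)             # (N, num_classes)
print(scores.shape)                     # torch.Size([8, 1000])
```

This removes the giant fully connected layers that hold most of the parameters in AlexNet and VGG.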

Auxiliary Classifiers

Problem

Training using only the loss at the end of the network didn't work well: the network is too deep, and the gradient signal does not propagate cleanly back to the early layers.

Solution (hacky)

Attach 'auxiliary classifiers' at several intermediate points in the network; they also try to classify the image and receive their own loss, injecting extra gradient into the middle of the network.

Batch normalization later made this trick unnecessary: networks train fine with a single loss at the end.
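A hedged sketch of how the auxiliary losses combine during training; the score tensors below are dummies standing in for the outputs of the final head and the two intermediate heads, and 0.3 is the discount weight used in the GoogLeNet paper.

```python
import torch
import torch.nn.functional as F

# Dummy class scores from the final head and two auxiliary heads (assumed shapes).
labels = torch.randint(0, 1000, (8,))
main_scores = torch.randn(8, 1000, requires_grad=True)
aux1_scores = torch.randn(8, 1000, requires_grad=True)   # head attached mid-network
aux2_scores = torch.randn(8, 1000, requires_grad=True)

# Total loss = final classifier loss + down-weighted auxiliary losses.
loss = (F.cross_entropy(main_scores, labels)
        + 0.3 * (F.cross_entropy(aux1_scores, labels)
                 + F.cross_entropy(aux2_scores, labels)))
loss.backward()   # auxiliary heads inject gradient signal into earlier layers
```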

Residual Networks

Once we have Batch Normalization, we can train networks with 10+ layers

What happens as we go deeper?

The deeper model does worse than the shallow model, on both training and test error, so it is underfitting rather than overfitting!

This is an optimization problem: deeper models are harder to optimize, and in particular they don't learn identity functions in the extra layers that would let them emulate a shallower model.

Residual blocks

Change the network so that learning identity functions with the extra layers is easy: each block computes F(x) + x instead of F(x), so driving the block's weights toward zero makes it close to the identity.

A residual network is a stack of many residual blocks
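A minimal sketch of a basic residual block in PyTorch, assuming the common conv-BN-ReLU ordering: the block computes relu(F(x) + x), so if F is driven toward zero the block stays close to the identity.

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """Basic residual block: out = relu(F(x) + x), where F is two 3x3 convs."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)   # shortcut: add the input back in

x = torch.randn(1, 64, 56, 56)
print(BasicResidualBlock(64)(x).shape)   # torch.Size([1, 64, 56, 56])
```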

Bottleneck Block

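The bottleneck block uses a 1×1 conv to shrink the channel count, runs the 3×3 conv on the thin representation, then expands back with another 1×1 conv, so deeper stacks cost about the same as stacks of basic blocks. A sketch in PyTorch (the 256/64 split mirrors ResNet-50's first stage):

```python
import torch
import torch.nn as nn

class BottleneckBlock(nn.Module):
    """Bottleneck residual block: 1x1 reduce -> 3x3 -> 1x1 expand, plus the shortcut."""
    def __init__(self, channels, bottleneck_channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, bottleneck_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(bottleneck_channels), nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck_channels, bottleneck_channels, kernel_size=3,
                      padding=1, bias=False),
            nn.BatchNorm2d(bottleneck_channels), nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck_channels, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(self.net(x) + x)

x = torch.randn(1, 256, 56, 56)
print(BottleneckBlock(256, 64)(x).shape)   # torch.Size([1, 256, 56, 56])
```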

Able to train very deep networks
Deeper networks do better than shallow networks

Improving ResNets

Re-organize the order of operations inside each block

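One well-known re-organization is the "pre-activation" block of He et al. (2016): BatchNorm and ReLU move before each conv so the shortcut path stays a clean identity. A sketch:

```python
import torch
import torch.nn as nn

class PreActBlock(nn.Module):
    """Pre-activation residual block: BN -> ReLU -> conv, twice; identity shortcut."""
    def __init__(self, channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
        )

    def forward(self, x):
        return x + self.net(x)   # no ReLU after the add: the shortcut is untouched

print(PreActBlock(64)(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 64, 56, 56])
```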

Parallel bottleneck blocks (ResNeXt)


Add parallel branches (groups) while keeping total computation roughly constant: increase the number of groups while shrinking the per-group width so the FLOPs stay about the same.
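In implementation terms, the parallel bottleneck branches collapse into a single grouped 3×3 convolution. A sketch in PyTorch (32 groups of width 4 follow the paper's "32x4d" setting):

```python
import torch
import torch.nn as nn

class ResNeXtBlock(nn.Module):
    """ResNeXt-style block: the parallel branches become one grouped 3x3 conv."""
    def __init__(self, channels, inner_channels, groups):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, inner_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(inner_channels), nn.ReLU(inplace=True),
            nn.Conv2d(inner_channels, inner_channels, kernel_size=3, padding=1,
                      groups=groups, bias=False),            # grouped convolution
            nn.BatchNorm2d(inner_channels), nn.ReLU(inplace=True),
            nn.Conv2d(inner_channels, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(self.net(x) + x)

# 32 groups of width 4, so the grouped conv costs about the same as one narrow 3x3 conv
block = ResNeXtBlock(channels=256, inner_channels=128, groups=32)
print(block(torch.randn(1, 256, 56, 56)).shape)  # torch.Size([1, 256, 56, 56])
```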

Densely Connected Neural Networks

Dense blocks in which each layer is connected to every later layer in a feedforward fashion: each layer takes the concatenated feature maps of all preceding layers as input (see the sketch after this list).

  • Alleviates vanishing gradients
  • Strengthens feature propagation
  • Encourages feature reuse
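A minimal sketch of a dense block in PyTorch (`growth_rate` and the layer count are illustrative): each layer takes the concatenation of all earlier feature maps as input and contributes `growth_rate` new channels.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Each layer sees the concatenation of all earlier feature maps."""
    def __init__(self, in_channels, growth_rate, num_layers):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(in_channels + i * growth_rate), nn.ReLU(inplace=True),
                nn.Conv2d(in_channels + i * growth_rate, growth_rate,
                          kernel_size=3, padding=1, bias=False),
            ))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            out = layer(torch.cat(features, dim=1))   # reuse all earlier features
            features.append(out)
        return torch.cat(features, dim=1)

block = DenseBlock(in_channels=64, growth_rate=32, num_layers=4)
print(block(torch.randn(1, 64, 28, 28)).shape)  # torch.Size([1, 192, 28, 28])
```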

MobileNets: Tiny Networks

Maximize efficiency at the cost of some accuracy, mainly by replacing standard convolutions with depthwise separable convolutions.
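The key building block replaces a standard 3×3 convolution with a depthwise 3×3 convolution (one filter per input channel) followed by a 1×1 pointwise convolution. A sketch in PyTorch (channel sizes are illustrative):

```python
import torch
import torch.nn as nn

def depthwise_separable(in_channels, out_channels):
    """MobileNet-style layer: depthwise 3x3 conv, then 1x1 pointwise conv."""
    return nn.Sequential(
        nn.Conv2d(in_channels, in_channels, kernel_size=3, padding=1,
                  groups=in_channels, bias=False),           # depthwise: groups == channels
        nn.BatchNorm2d(in_channels), nn.ReLU(inplace=True),
        nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),  # pointwise
        nn.BatchNorm2d(out_channels), nn.ReLU(inplace=True),
    )

layer = depthwise_separable(64, 128)
print(layer(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 128, 56, 56])
```

For 3×3 kernels and reasonably wide layers, the depthwise plus pointwise pair uses roughly 8 to 9 times fewer FLOPs than the standard convolution it replaces.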

Automating neural architecture creation

VERY EXPENSIVE: a controller network proposes candidate architectures, and each update to the controller requires training many child networks from scratch.