Recurrent Neural Networks

So far: “Feedforward” Neural Networks

Pasted image 20241202210814.png

Key Idea

RNNs have an “internal state” that is updated as a sequence is processed

We can process a sequence of vectors x by applying a recurrence formula at every time step:

$$h_t = f_W(h_{t-1}, x_t)$$

$h_t$ = new state
$f_W$ = some function with parameters $W$
$h_{t-1}$ = old state
$x_t$ = input vector at some time step
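
A minimal sketch of this idea in Python (function and variable names are illustrative, not from the lecture): the same function $f_W$ is applied at every time step, carrying the state along.

```python
# Illustrative sketch: x_seq is a list of input vectors, f_W is any function
# of (old state, input) -> new state.
def process_sequence(x_seq, h_0, f_W):
    h = h_0
    states = []
    for x_t in x_seq:
        h = f_W(h, x_t)   # h_t = f_W(h_{t-1}, x_t): same f_W, same W, every step
        states.append(h)
    return states
```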

Vanilla Recurrent Neural Networks

Pasted image 20241202213640.png

$$h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t)$$
$$y_t = W_{hy} h_t$$
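
A hedged one-step sketch of the vanilla cell in NumPy (shapes and names are illustrative; biases are omitted to match the formula above):

```python
import numpy as np

# One vanilla RNN step: new hidden state from old state + input, then output.
def rnn_step(h_prev, x_t, W_hh, W_xh, W_hy):
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t)   # h_t = tanh(W_hh h_{t-1} + W_xh x_t)
    y_t = W_hy @ h_t                            # y_t = W_hy h_t
    return h_t, y_t
```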

RNN Computational Graph

Initial hidden state

Re-use the same weight matrix at every time-step

What if different timesteps had different weights?

  • Could only handle inputs of a fixed length
    • set by the number of weight matrices
  • Model size would increase linearly with the number of timesteps
  • Different weights applied at different timesteps
    • makes the weights difficult to learn
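
To illustrate the weight re-use point, a small usage sketch built on the `rnn_step` function above (sizes below are arbitrary): the same three matrices unroll over a sequence of any length, so the parameter count does not grow with the sequence.

```python
import numpy as np

H, D, V = 4, 3, 2                      # hidden, input, output sizes (arbitrary)
W_hh, W_xh, W_hy = (np.random.randn(H, H),
                    np.random.randn(H, D),
                    np.random.randn(V, H))
h = np.zeros(H)
for x_t in np.random.randn(7, D):      # works identically for length 7, 70, ...
    h, y = rnn_step(h, x_t, W_hh, W_xh, W_hy)   # re-uses the one set of weights
```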

Pasted image 20241202213955.png

Many to Many

Pasted image 20241202214152.png

Many to One

Pasted image 20241202214315.png

One to Many

Pasted image 20241202214328.png

Sequence to Sequence (Machine translation)

Many to One (encoder) + One to Many (decoder)

At test-time, generate new text by sampling one token at a time and feeding each prediction back in as the next input:
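
A hedged sketch of that generation loop, reusing the `rnn_step` sketch from above (the embedding table, START/END indices, and softmax sampling are assumptions, not the lecture's exact code):

```python
import numpy as np

def generate(h, embed, W_hh, W_xh, W_hy, start_idx, end_idx, max_len=50):
    tokens, idx = [], start_idx
    for _ in range(max_len):
        h, scores = rnn_step(h, embed[idx], W_hh, W_xh, W_hy)
        probs = np.exp(scores - scores.max())        # softmax over the vocabulary
        probs /= probs.sum()
        idx = np.random.choice(len(probs), p=probs)  # sample the next token
        if idx == end_idx:                           # stop at the END token
            break
        tokens.append(idx)
    return tokens
```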

Backpropagation Through Time

Pasted image 20241202215127.png

Takes a lot of memory for long sequences!

Truncated Backpropagation Through Time

Pasted image 20241202215355.png

TL;DR

Run forward and backward through chunks of the sequence instead of whole sequence

Carry hidden states forward in time forever, but only backpropagate for some smaller number of steps
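
A minimal PyTorch-style sketch of truncated BPTT, assuming a toy model and data (none of this is the lecture's code): the hidden state is carried across chunks but detached, so gradients only flow back through the current chunk.

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=10, hidden_size=20, batch_first=True)
head = nn.Linear(20, 10)
opt = torch.optim.Adam(list(rnn.parameters()) + list(head.parameters()))

seq = torch.randn(1, 1000, 10)        # one long sequence (toy data)
target = torch.randn(1, 1000, 10)
h = None
chunk = 50                            # backprop through 50 steps at a time
for t in range(0, seq.size(1), chunk):
    x, y = seq[:, t:t+chunk], target[:, t:t+chunk]
    out, h = rnn(x, h)                # forward through one chunk
    loss = ((head(out) - y) ** 2).mean()
    opt.zero_grad()
    loss.backward()                   # gradients stop at the chunk boundary
    opt.step()
    h = h.detach()                    # keep the state, drop its history
```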

RNN Tradeoffs

Advantages

Example: Image Captioning

Pasted image 20241202220712.png

  1. Take feature vector coming out of CNN
  2. Feed into RNN
    Pasted image 20241202221808.png
    Result:
    Pasted image 20241202221913.png
We have a START token to mark the beginning of the prediction and an END token so the model knows when to stop generating.
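
A hedged sketch of how the conditioning might look, reusing the `generate` sketch from above; initializing the hidden state from a projected CNN feature is one common choice and may differ from the lecture's exact formulation.

```python
import numpy as np

# All names and shapes are illustrative: W_img projects the CNN feature vector
# into the RNN's hidden space, then captioning is just the generation loop.
def caption_image(cnn_feat, W_img, embed, W_hh, W_xh, W_hy, start_idx, end_idx):
    h0 = np.tanh(W_img @ cnn_feat)    # condition the RNN on the image
    return generate(h0, embed, W_hh, W_xh, W_hy, start_idx, end_idx)
```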