Backpropagation

Deriving gradients by hand on paper is tedious
- Need to re-derive if we change the loss function
Instead, use a computational graph
Example

- Forward pass: Compute outputs
- Backward pass: Compute derivatives
- Want the derivative of the output with respect to each input (see the sketch below)


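A minimal sketch of one forward/backward pass on a scalar computational graph. The specific function f(x, y, z) = (x + y) * z and the input values are assumptions chosen for illustration; the example used in the original notes is not shown.

```python
# Forward pass: compute outputs node by node
x, y, z = -2.0, 5.0, -4.0    # illustrative inputs
q = x + y                    # add node
f = q * z                    # multiply node

# Backward pass: walk the graph in reverse, multiplying each
# local gradient by the upstream gradient (chain rule)
df_df = 1.0                  # gradient of the output w.r.t. itself
df_dq = z * df_df            # multiply gate: gradient is the other input
df_dz = q * df_df
df_dx = 1.0 * df_dq          # add gate: distribute the gradient
df_dy = 1.0 * df_dq

print(df_dx, df_dy, df_dz)   # -4.0 -4.0 3.0
```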
Example

We can define our own nodes, each with its own forward and backward rule (see the sketch below)

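As a sketch of a user-defined node, the class below bundles the sigmoid function into a single graph node. The choice of sigmoid and the names (`SigmoidNode`, `forward`, `backward`) are illustrative assumptions, not a fixed API.

```python
import numpy as np

class SigmoidNode:
    def forward(self, x):
        # Forward pass: compute and cache the output for use in backward
        self.out = 1.0 / (1.0 + np.exp(-x))
        return self.out

    def backward(self, upstream_grad):
        # Backward pass: local gradient of sigmoid is s * (1 - s);
        # multiply it by the upstream gradient (chain rule)
        return upstream_grad * self.out * (1.0 - self.out)

node = SigmoidNode()
y = node.forward(np.array([0.0, 2.0]))
dx = node.backward(np.ones(2))   # sigmoid'(0) = 0.25
```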
Patterns (sketched in code after this list)
Add gate: gradient distributor
- Copies the upstream gradient to each downstream input

Copy gate: gradient adder
- Sums the upstream gradients to get the downstream gradient

Multiplication gate: “swap multiplier”
- Downstream gradient is the upstream gradient times the other input

Max gate: gradient router
- Routes the gradient to the max input, 0 to the others

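A sketch of the four gradient patterns written out with scalar values. All variable names and numbers below are illustrative assumptions.

```python
dL_dout = 2.0                      # upstream gradient arriving at each gate

# Add gate (gradient distributor): out = a + b
# Local gradients are 1, so each input receives a copy of the upstream gradient.
dL_da = dL_dout
dL_db = dL_dout

# Copy gate (gradient adder): one value is reused by two branches,
# so the upstream gradients from those branches are summed.
dL_from_branch1, dL_from_branch2 = 2.0, 0.5
dL_dinput = dL_from_branch1 + dL_from_branch2

# Multiplication gate ("swap multiplier"): out = u * v
# Downstream gradient is the upstream gradient times the *other* input.
u, v = 4.0, 0.5
dL_du = dL_dout * v
dL_dv = dL_dout * u

# Max gate (gradient router): out = max(p, q)
# The whole gradient goes to the larger input; the other gets 0.
p, q = 1.0, 5.0
dL_dp = dL_dout if p > q else 0.0
dL_dq = dL_dout if q > p else 0.0
```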
What about backprop with vector-valued functions?

Backprop with vectors

Example
ReLU

For elementwise ReLU the Jacobian is diagonal (all off-diagonal entries are 0), so we want an implicit way to apply it rather than building it explicitly (see the sketch below)

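A sketch of the implicit approach for ReLU, assuming NumPy arrays: instead of forming the diagonal Jacobian, the backward pass just masks the upstream gradient wherever the input was non-positive. The values here are illustrative.

```python
import numpy as np

x = np.array([1.0, -2.0, 3.0, -0.5])
y = np.maximum(0.0, x)                        # forward pass

upstream = np.array([4.0, -1.0, 5.0, 9.0])    # dL/dy
dx = upstream * (x > 0)                       # implicit Jacobian-vector product: dL/dx

# Explicit (wasteful) version for comparison: dx == J @ upstream
J = np.diag((x > 0).astype(float))
assert np.allclose(dx, J @ upstream)
```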
Backprop with Matrices (or Tensors)

Example

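As an illustrative example (the one in the original notes is not shown), here is backprop through a matrix multiply Y = X @ W in NumPy. The gradients dX = dY Wᵀ and dW = Xᵀ dY avoid ever forming the full 4-D Jacobian; the shapes are assumptions.

```python
import numpy as np

N, D, M = 2, 3, 4
X = np.random.randn(N, D)
W = np.random.randn(D, M)
Y = X @ W                        # forward pass, shape (N, M)

dY = np.random.randn(N, M)       # upstream gradient dL/dY
dX = dY @ W.T                    # dL/dX, shape (N, D)
dW = X.T @ dY                    # dL/dW, shape (D, M)

# Shape check: each gradient matches the shape of the thing it differentiates
assert dX.shape == X.shape and dW.shape == W.shape
```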