Backpropagation
- Deriving gradients by hand on paper is tedious
- Need to re-derive if we change the loss function
Use a Computational Graph
Example
- Forward pass: Compute outputs
- Backward pass: Compute derivatives
- Want: the derivative of the output with respect to each input
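A minimal sketch of one forward and one backward pass, assuming the toy function f(x, y, z) = (x + y) * z (this particular function is an assumption, not taken from the notes):

```python
# Forward/backward pass on a toy computational graph.
# Assumed example: f(x, y, z) = (x + y) * z, with q = x + y as the intermediate node.

x, y, z = -2.0, 5.0, -4.0

# Forward pass: compute outputs node by node
q = x + y            # add node:      q = 3.0
f = q * z            # multiply node: f = -12.0

# Backward pass: walk the graph in reverse, multiplying local by upstream gradients
df_df = 1.0          # gradient of the output with respect to itself
df_dq = z * df_df    # multiply node: local gradient of f w.r.t. q is z
df_dz = q * df_df    # multiply node: local gradient of f w.r.t. z is q
df_dx = 1.0 * df_dq  # add node: local gradient is 1, pass df_dq along
df_dy = 1.0 * df_dq

print(df_dx, df_dy, df_dz)  # -4.0 -4.0 3.0
```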
Example
We can define our own nodes
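One way to define our own node is to bundle several primitive operations into a single unit with a known local gradient. The sketch below uses the sigmoid as that custom node; the class interface and names are my own, not from the notes:

```python
import numpy as np

class SigmoidNode:
    """Custom graph node: sigmoid(x) = 1 / (1 + exp(-x)).

    Its local gradient has the closed form s * (1 - s), so the whole expression
    can be treated as one node instead of several primitive ones.
    """

    def forward(self, x):
        self.s = 1.0 / (1.0 + np.exp(-x))  # cache the output for the backward pass
        return self.s

    def backward(self, upstream_grad):
        # downstream gradient = local gradient * upstream gradient
        return self.s * (1.0 - self.s) * upstream_grad

# Usage
node = SigmoidNode()
out = node.forward(np.array([0.0, 2.0]))
grad = node.backward(np.ones(2))  # pretend the upstream gradient is all ones
```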
Patterns
Add gate: gradient distributor
- Copies the upstream gradient unchanged to each downstream input
Copy gate: gradient adder
- Adds the upstream gradients to get the downstream gradient
Multiplication gate: “swap multiplier”
- Downstream gradient is the upstream gradient times the other input
Max gate: gradient router
- Routes the gradient to the max input, 0 to the other (each pattern is sketched in code below)
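A sketch of the four patterns written as local backward rules; the function and variable names are my own:

```python
def add_backward(upstream):
    # Add gate: gradient distributor - both inputs receive the upstream gradient unchanged
    return upstream, upstream

def copy_backward(upstream_a, upstream_b):
    # Copy gate: gradient adder - the single input receives the sum of the upstream gradients
    return upstream_a + upstream_b

def mul_backward(upstream, x, y):
    # Multiplication gate: "swap multiplier" - each input gets upstream times the *other* input
    return upstream * y, upstream * x

def max_backward(upstream, x, y):
    # Max gate: gradient router - the larger input gets the full gradient, the other gets 0
    return (upstream, 0.0) if x >= y else (0.0, upstream)
```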
What about backprop with vector-valued functions?
Backprop with vectors
- Upstream gradient is now a vector and the local gradient is a Jacobian matrix; the downstream gradient is the (transposed) Jacobian times the upstream gradient
Example
ReLU
Want an implicit way to express the Jacobian: only its diagonal is useful, the rest is filled with 0s, so we never form the full matrix explicitly
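For ReLU the Jacobian is diagonal with entries 1 where the input is positive and 0 elsewhere, so the Jacobian-vector product reduces to an elementwise mask. A minimal sketch (names are my own):

```python
import numpy as np

def relu_forward(x):
    return np.maximum(0.0, x)

def relu_backward(upstream_grad, x):
    # Implicit Jacobian: the full Jacobian is diagonal with entries (x > 0),
    # so instead of building an N x N matrix of mostly zeros we just mask.
    return upstream_grad * (x > 0)

x = np.array([1.0, -2.0, 3.0, -1.0])
dy = np.array([4.0, -1.0, 5.0, 9.0])  # upstream gradient dL/dy
dx = relu_backward(dy, x)             # [4., 0., 5., 0.]
```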
Backprop with Matrices (or Tensors)
Example
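The notes do not spell out this example, so the sketch below assumes the common case of a matrix multiply Y = X @ W. The full Jacobian would be enormous, but the downstream gradients have compact closed forms that we apply directly:

```python
import numpy as np

# Assumed example: Y = X @ W, with upstream gradient dL/dY of the same shape as Y.
#   dL/dX = dL/dY @ W.T   (same shape as X)
#   dL/dW = X.T @ dL/dY   (same shape as W)

N, D, M = 4, 3, 5
X = np.random.randn(N, D)
W = np.random.randn(D, M)
dY = np.random.randn(N, M)  # pretend upstream gradient

dX = dY @ W.T               # (N, D), matches X
dW = X.T @ dY               # (D, M), matches W
```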