Backpropagation
- Deriving gradients by hand on paper is tedious
- Need to re-derive if we change the loss function
Use a Computational Graph
Example
- Forward pass: Compute outputs
- Backward pass: Compute derivatives
- Want: the derivative of the output with respect to each input
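A minimal sketch of one forward and one backward pass, assuming the toy function f(x, y, z) = (x + y) * z (this particular function is an assumption, not taken from the notes):

```python
# Forward/backward pass on a toy computational graph.
# Assumed example: f(x, y, z) = (x + y) * z, with q = x + y as the intermediate node.

x, y, z = -2.0, 5.0, -4.0

# Forward pass: compute outputs node by node
q = x + y            # add node:      q = 3.0
f = q * z            # multiply node: f = -12.0

# Backward pass: walk the graph in reverse, multiplying local by upstream gradients
df_df = 1.0          # gradient of the output with respect to itself
df_dq = z * df_df    # multiply node: local gradient of f w.r.t. q is z
df_dz = q * df_df    # multiply node: local gradient of f w.r.t. z is q
df_dx = 1.0 * df_dq  # add node: local gradient is 1, pass df_dq along
df_dy = 1.0 * df_dq

print(df_dx, df_dy, df_dz)  # -4.0 -4.0 3.0
```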
Example
We can define our own nodes
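One way to define our own node is to bundle several primitive operations into a single unit with a known local gradient. The sketch below uses the sigmoid as that custom node; the class interface and names are my own, not from the notes:

```python
import numpy as np

class SigmoidNode:
    """Custom graph node: sigmoid(x) = 1 / (1 + exp(-x)).

    Its local gradient has the closed form s * (1 - s), so the whole expression
    can be treated as one node instead of several primitive ones.
    """

    def forward(self, x):
        self.s = 1.0 / (1.0 + np.exp(-x))  # cache the output for the backward pass
        return self.s

    def backward(self, upstream_grad):
        # downstream gradient = local gradient * upstream gradient
        return self.s * (1.0 - self.s) * upstream_grad

# Usage
node = SigmoidNode()
out = node.forward(np.array([0.0, 2.0]))
grad = node.backward(np.ones(2))  # pretend the upstream gradient is all ones
```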
Patterns
Add gate: gradient distributor
- Copies the upstream gradient unchanged to each downstream input
Copy gate: gradient adder
- Adds the upstream gradients to get the downstream gradient
Multiplication gate: “swap multiplier”
- Downstream gradient is the upstream gradient times the other input
Max gate: gradient router
- Routes the gradient to the max input, 0 to the other (each pattern is sketched in code below)
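A sketch of the four patterns written as local backward rules; the function and variable names are my own:

```python
def add_backward(upstream):
    # Add gate: gradient distributor - both inputs receive the upstream gradient unchanged
    return upstream, upstream

def copy_backward(upstream_a, upstream_b):
    # Copy gate: gradient adder - the single input receives the sum of the upstream gradients
    return upstream_a + upstream_b

def mul_backward(upstream, x, y):
    # Multiplication gate: "swap multiplier" - each input gets upstream times the *other* input
    return upstream * y, upstream * x

def max_backward(upstream, x, y):
    # Max gate: gradient router - the larger input gets the full gradient, the other gets 0
    return (upstream, 0.0) if x >= y else (0.0, upstream)
```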
What about backprop with vector-valued functions?
Backprop with vectors
- Upstream gradient is now a vector and the local gradient is a Jacobian matrix; the downstream gradient is the (transposed) Jacobian times the upstream gradient
Example
ReLU
Want an implicit way to express the Jacobian: only its diagonal is useful, the rest is filled with 0s, so we never form the full matrix explicitly
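For ReLU the Jacobian is diagonal with entries 1 where the input is positive and 0 elsewhere, so the Jacobian-vector product reduces to an elementwise mask. A minimal sketch (names are my own):

```python
import numpy as np

def relu_forward(x):
    return np.maximum(0.0, x)

def relu_backward(upstream_grad, x):
    # Implicit Jacobian: the full Jacobian is diagonal with entries (x > 0),
    # so instead of building an N x N matrix of mostly zeros we just mask.
    return upstream_grad * (x > 0)

x = np.array([1.0, -2.0, 3.0, -1.0])
dy = np.array([4.0, -1.0, 5.0, 9.0])  # upstream gradient dL/dy
dx = relu_backward(dy, x)             # [4., 0., 5., 0.]
```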
Backprop with Matrices (or Tensors)
Example
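The notes do not spell out this example, so the sketch below assumes the common case of a matrix multiply Y = X @ W. The full Jacobian would be enormous, but the downstream gradients have compact closed forms that we apply directly:

```python
import numpy as np

# Assumed example: Y = X @ W, with upstream gradient dL/dY of the same shape as Y.
#   dL/dX = dL/dY @ W.T   (same shape as X)
#   dL/dW = X.T @ dL/dY   (same shape as W)

N, D, M = 4, 3, 5
X = np.random.randn(N, D)
W = np.random.randn(D, M)
dY = np.random.randn(N, M)  # pretend upstream gradient

dX = dY @ W.T               # (N, D), matches X
dW = X.T @ dY               # (D, M), matches W
```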