Backpropagation sounds like a special algorithm. It's the chain rule from calculus run in reverse, with one practical demand: remember the intermediate values from the forward pass so you can reuse them going back.

Forward stores, backward reuses

On the way forward, each operation saves what it'll need for its own derivative. On the way back, gradients flow from the loss to every parameter, each node multiplying the incoming gradient by its local derivative. That "multiply by the local derivative" step is the entire trick.

Write it once by hand for a two-layer net and the mystery evaporates. dL/dw is never magic; it's dL/dy * dy/dw, and dy/dw is something you already computed on the forward pass.

Autograd isn't intelligence. It's a very disciplined accountant.

Why do it by hand

You will never debug a vanishing gradient or a flipped sign by trusting the framework. You debug it by knowing, concretely, what each number on the backward pass is supposed to be — and the only way to know that is to have derived it yourself at least once.