This course is a hands-on dive into ML built on interactive visualizations rather than slides full of formulas. Drag a slider and feel what regularization does, press a button and watch the gradient vanish in a deep net, build a RAG pipeline and see why LLMs hallucinate without context. Every idea is anchored in a widget that makes the abstraction tangible.
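A neuron is a function: a weighted sum of its inputs plus a bias, passed through a nonlinear activation. Drag the point along the x-axis and watch sigmoid saturate at the edges, where its derivative shrinks toward zero; even at the center it never exceeds 0.25. Backpropagation multiplies one such derivative per layer, so in a deep sigmoid network the gradient reaching the early layers can shrink exponentially with depth. That is a key reason deep sigmoid nets were so hard to train until around 2012. ReLU fixes this: its derivative is exactly 1 for x > 0, so active units pass the gradient through unchanged.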
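A minimal sketch in plain Python of a single neuron and the two activation derivatives. It assumes scalar inputs and, in the chained product at the end, ignores the weight factors that backprop also multiplies in (a deliberate simplification), so it isolates the activation's contribution to vanishing gradients.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x: float) -> float:
    s = sigmoid(x)
    return s * (1.0 - s)  # peaks at 0.25 when x = 0, decays toward 0 for large |x|

def relu(x: float) -> float:
    return max(0.0, x)

def relu_grad(x: float) -> float:
    return 1.0 if x > 0 else 0.0  # convention here: derivative 0 at x == 0

def neuron(inputs, weights, bias, activation):
    """Weighted sum of inputs plus bias, passed through a nonlinearity."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# Saturation: the sigmoid derivative collapses away from the origin.
for x in (0.0, 2.0, 5.0, 10.0):
    print(f"x={x:5.1f}  sigmoid'={sigmoid_grad(x):.6f}  relu'={relu_grad(x):.1f}")

# Chain rule through 10 layers, activation derivatives only (weights ignored).
depth = 10
print("sigmoid chain, best case:", sigmoid_grad(0.0) ** depth)  # 0.25**10, about 9.5e-07
print("relu chain, active units:", relu_grad(1.0) ** depth)     # stays 1.0
```

Even in the best case (every pre-activation at 0), ten sigmoid layers scale the gradient by less than one millionth, while active ReLU units leave it untouched.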
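Training a neural net means searching for a minimum of the loss over a high-dimensional landscape. Each step: compute the gradient → step opposite to the gradient → repeat. Press Start and watch the ball roll into the valley. With a learning rate that is too high, the ball overshoots the minimum and can bounce out of the valley entirely; with one that is too low, it crawls and may take forever to arrive.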
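A minimal sketch of that loop, assuming a hand-picked two-parameter bowl f(x, y) = x² + 10y² in place of a real network's loss (real landscapes have far more dimensions and are not convex). The starting point and the three learning rates are illustrative choices, not recommended values.

```python
def loss(p):
    x, y = p
    return x**2 + 10 * y**2  # elongated bowl with its minimum at (0, 0)

def grad(p):
    x, y = p
    return (2 * x, 20 * y)  # analytic gradient of the loss above

def gradient_descent(start, lr, steps):
    p = start
    for _ in range(steps):
        g = grad(p)
        # Step opposite to the gradient, scaled by the learning rate.
        p = (p[0] - lr * g[0], p[1] - lr * g[1])
    return p

start = (3.0, 2.0)
for lr in (0.001, 0.05, 0.11):
    p = gradient_descent(start, lr, steps=50)
    print(f"lr={lr}: loss after 50 steps = {loss(p):.4g} (start: {loss(start):.4g})")
```

Along y, each step multiplies the coordinate by 1 − 20·lr. At lr = 0.11 that factor is −1.2, so the ball overshoots back and forth with growing amplitude and the loss explodes; at lr = 0.001 the factors are close to 1 and the loss is still well above zero after 50 steps.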
After the last layer, the network outputs logits: raw, unnormalized scores. Softmax turns them into probabilities that are all positive and sum to 1. Temperature T divides the logits before the softmax, p_i = exp(z_i / T) / Σ_j exp(z_j / T), and so controls “confidence”: at low T one class dominates; at high T the distribution flattens. The same knob appears when sampling from LLMs.
Temperature slider: winner-takes-all at low T, uniform as T → ∞.
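A minimal sketch of softmax with temperature in plain Python. The logits [2.0, 1.0, 0.1] and the three temperatures are arbitrary illustrative values. Subtracting the maximum before exponentiating cancels in the ratio, so it leaves the result unchanged while avoiding overflow for large logits.

```python
import math

def softmax(logits, temperature=1.0):
    """Divide logits by T, exponentiate, then normalize to sum to 1."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                          # shift for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
for t in (0.1, 1.0, 10.0):
    probs = softmax(logits, t)
    print(f"T={t:>4}: " + ", ".join(f"{p:.3f}" for p in probs))
```

As T shrinks, nearly all probability mass goes to the largest logit (winner-takes-all); as T grows, the three probabilities approach 1/3 each (uniform).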