
Introduction: what this course is about

Machine learning: from equations to production

This course is a hands-on dive into ML. Instead of slides full of formulas, it is built around interactive visualizations: drag a slider and feel regularization at work, press a button and watch the gradient vanish in a deep network, build a RAG pipeline and see why LLMs hallucinate when they lack context. Every idea is anchored in a widget that makes the abstraction tangible.

What awaits you: 8 chapters, 42 lessons
Course structure: Math · Data · Classics · Ensembles · DL · NLP · CV · MLOps
70 years in 8 milestones: a short history of ML
1957 · Perceptron
1986 · Backprop
1995 · SVM
2012 · AlexNet 🔥 Breakthrough
2017 · Transformer: “Attention Is All You Need”
2022 · ChatGPT: 100M users in 2 months
2023–2024 · GPT-4 / Gemini
Try it now: a neuron from the inside

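A neuron is a function: it takes a weighted sum of its inputs plus a bias and passes the result through a nonlinear activation. Drag the point along the x-axis and notice how the sigmoid saturates at the edges, where its derivative shrinks toward zero. The sigmoid derivative never exceeds 0.25, so backpropagating through many sigmoid layers multiplies many small factors together and the gradient vanishes. This is a major reason deep sigmoid networks barely trained until around 2012. ReLU fixes this because its derivative is 1 for x > 0 (and 0 for x < 0).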

Drag into the red zone — watch the derivative drop. Switch to ReLU — the difference is obvious.
ACTIVATION EXPLORER · Sigmoid
(Plot: σ(x) for x from −6 to 6, with the saturation zones marked at both edges)
σ(x) = 1 / (1 + e⁻ˣ)
x = 0.000 · f(x) = 0.5000 · f′(x) = 0.2500: the gradient flows well and the layer trains actively.
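The following is a minimal pure-Python sketch that reproduces the same numbers outside the widget using only the standard library. The helper names (`sigmoid`, `sigmoid_grad`, `relu_grad`, `neuron`, `chained_grad`) are illustrative and do not come from a library. `chained_grad` is a deliberately simplified toy model that assumes unit weights and the same pre-activation in every layer. It only shows how per-layer derivatives multiply during backpropagation.

```python
import math
from typing import Callable, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid 1 / (1 + e^(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # for x < 0, rewrite to avoid overflow in exp(-x)
    return e / (1.0 + e)


def sigmoid_grad(x: float) -> float:
    """sigma'(x) = sigma(x) * (1 - sigma(x)); its maximum is 0.25 at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


def relu(x: float) -> float:
    return max(0.0, x)


def relu_grad(x: float) -> float:
    """1 for x > 0, 0 for x < 0; at x = 0 we pick 0 by convention."""
    return 1.0 if x > 0 else 0.0


def neuron(inputs: Sequence[float], weights: Sequence[float], bias: float,
           activation: Callable[[float], float]) -> float:
    """Weighted sum of inputs plus bias, passed through a nonlinear activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


def chained_grad(grad_fn: Callable[[float], float], x: float, depth: int) -> float:
    """Toy model (hypothetical): one activation derivative per layer, multiplied
    across `depth` layers, assuming unit weights and the same input x everywhere."""
    return grad_fn(x) ** depth


for x in (0.0, 3.0, 6.0):
    print(f"x={x}: sigmoid'={sigmoid_grad(x):.4f}, relu'={relu_grad(x):.0f}")

for depth in (1, 5, 10, 20):
    print(f"{depth:2d} layers: sigmoid {chained_grad(sigmoid_grad, 0.0, depth):.2e}, "
          f"relu {chained_grad(relu_grad, 1.0, depth):.0f}")
```

Even in the sigmoid's best case (x = 0), ten layers shrink the gradient by a factor of 0.25¹⁰, roughly 10⁻⁶. ReLU on its active side passes the gradient through unchanged.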
This is training: descending the loss surface

Training a neural network means finding a minimum of the loss function in a high-dimensional parameter landscape. Each step computes the gradient, moves a small step opposite to it, and repeats. Press Start and watch the ball roll into the valley. If the learning rate is too high, the ball overshoots; if it is too low, it crawls and needs a very large number of steps.

Press Start — watch the descent. Try different learning rates
LOSS LANDSCAPE · GRADIENT DESCENT
f(x) = 0.08x⁴ − 0.8x² + 0.2x + 3
(Plot: f(x) for x from −4 to 4; up to 80 steps with an adjustable learning rate)
Learning rate 0.1: smooth descent toward the minimum.
Iteration 0: x = 3.5000 · f(x) = 5.9050 · f′(x) = 8.3200
step = −lr × f′(x) = −0.1 × 8.3200 = −0.8320
x_new = x + step = 3.5000 + (−0.8320) = 2.6680
|f′(x)| = 8.3200 (→ 0 at the minimum)
Press Step or Start to run gradient descent.
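The update rule x_new = x − lr × f′(x) fits in a few lines of code. The sketch below assumes plain, full-batch gradient descent on the widget's one-dimensional f(x). The derivative f′(x) = 0.32x³ − 1.6x + 0.2 is worked out by hand, and the divergence cutoff |x| > 1e6 is an arbitrary choice for this illustration. Starting at x = 3.5 with lr = 0.1 reproduces the first step shown in the widget (x_new = 2.6680).

```python
def f(x: float) -> float:
    """The widget's loss: f(x) = 0.08x^4 - 0.8x^2 + 0.2x + 3."""
    return 0.08 * x**4 - 0.8 * x**2 + 0.2 * x + 3


def grad_f(x: float) -> float:
    """Analytic derivative: f'(x) = 0.32x^3 - 1.6x + 0.2."""
    return 0.32 * x**3 - 1.6 * x + 0.2


def gradient_descent(x0: float, lr: float, steps: int = 80) -> list[float]:
    """Plain gradient descent; returns the trajectory [x0, x1, ...].

    Stops early once |x| exceeds 1e6 (an arbitrary cutoff), which happens
    when the learning rate is too large and the iterates diverge.
    """
    xs = [x0]
    x = x0
    for _ in range(steps):
        x = x - lr * grad_f(x)  # step opposite to the gradient
        xs.append(x)
        if abs(x) > 1e6:
            break
    return xs


for lr in (0.001, 0.1, 1.0):
    xs = gradient_descent(3.5, lr)
    print(f"lr={lr}: ran {len(xs) - 1} steps, final x = {xs[-1]:.4g}, f(x) = {f(xs[-1]):.4g}")
```

This f has two valleys. From x = 3.5 the descent settles in the right-hand local minimum near x ≈ 2.17, while the deeper minimum lies near x ≈ −2.3. Where gradient descent ends up therefore depends on the starting point as well as the learning rate. With lr = 1.0 the iterates overshoot and blow up within a few steps. With lr = 0.001 they are still well above the valley floor after 80 steps.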
How a model decides: softmax

The final layer of the network outputs raw, unnormalized scores called logits. Softmax turns them into probabilities that are all positive and sum to 1. The temperature T controls “confidence”: at low T one class dominates, and at high T the distribution flattens. The same knob appears when sampling from LLMs.

Drag logits or change temperature — watch probabilities shift
SOFTMAX · 5 classes · T = 1.0
cat: z = 3.2 → p = 0.61
dog: z = 1.5 → p = 0.11
bird: z = 0.8 → p = 0.06
fish: z = −0.5 → p = 0.02
rabbit: z = 2.1 → p = 0.20
Temperature: T→0 winner-takes-all · T→∞ uniform
Entropy H = 1.578 bits (maximum for 5 classes: log₂5 = 2.322)
softmax(z)ᵢ = exp(zᵢ/T) / Σⱼ exp(zⱼ/T). Winner: cat · p = 0.613
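The following is a minimal standard-library sketch of the softmax formula with temperature and the entropy readout. It subtracts the largest scaled logit before exponentiating. This is a common numerical-stability trick that leaves the result unchanged. Entropy is measured in bits (log base 2), which matches the widget's maximum of log₂5 ≈ 2.322 for five classes. The function names are illustrative.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """softmax(z)_i = exp(z_i / T) / sum_j exp(z_j / T)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtracting the max avoids overflow; probabilities are unchanged
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_bits(probs: Sequence[float]) -> float:
    """Shannon entropy in bits; the maximum for n classes is log2(n)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


classes = ["cat", "dog", "bird", "fish", "rabbit"]
logits = [3.2, 1.5, 0.8, -0.5, 2.1]  # the widget's default logits

for t in (0.1, 1.0, 10.0):
    probs = softmax(logits, t)
    winner = classes[probs.index(max(probs))]
    print(f"T={t}: winner={winner} p={max(probs):.3f} H={entropy_bits(probs):.3f} bits")
```

With the default logits and T = 1 this gives p ≈ 0.613 for cat and H ≈ 1.578 bits, matching the widget. Lowering T pushes the winner's probability toward 1, and raising it flattens the distribution toward 0.2 per class.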
Who this course is for
💻 Developers · 📊 Data scientists · ⚙️ ML engineers · 🔍 The curious
From data to production: the path across 8 chapters
From raw data to a live service:
📊 Data (EDA, features) → 🧹 Preprocessing (scaling, encoding) → 🧠 Model (train & evaluate) → 🔬 Experiments (MLflow tracking) → 📦 Container (Docker image) → 🚀 Production (serving + monitoring)
How to use this course

Four rules for effective learning

ML is learned with your hands, not your eyes. Reading about gradient descent is not the same as feeling it. Every widget is built to give you an “aha” moment. Do not skip the interactive blocks — that is where learning happens.
Touch every slider: each widget reacts. That is how intuition is built.
Read the formula lines under widgets: they explain the math in real time.
Run the code: every code-explorer has runnable Python you can copy.
It is OK to skip around: chapters are relatively independent. Jump to NLP or CV if you need to.
ℹ️ Prerequisites: basic Python (lists, functions, classes) and school-level math (functions, and derivatives at a conceptual level). Linear algebra and probability are explained along the way, so no special preparation is required. For the hands-on parts you need Python 3.11+ and these packages: pip install torch numpy pandas scikit-learn matplotlib