Generation vs discrimination: what is the difference
Most classical ML tasks are discriminative: the model learns to map input X to label Y. A spam filter outputs spam / not spam. An object detector returns bounding box coordinates. The model makes inferences about existing data — it creates nothing new.
Generative model: a model that learns the data distribution P(X) and can sample new examples from it. Instead of asking "what is this object?", it answers "what does a typical object from this world look like?" GPT generates text that never existed before; Stable Diffusion starts from random noise and refines it step by step into an image, guided by a text prompt.
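Learning P(X) and then sampling from it can be seen in miniature. Below is a minimal sketch in Python, assuming a toy character-level bigram model chosen purely for illustration. It factors P(X) into next-character probabilities, the same chain-rule idea that autoregressive models like GPT rely on, but it conditions on only one previous character and learns by counting instead of training a transformer. The helpers `fit_bigram`, `probability`, and `sample` and the `^`/`$` boundary markers are hypothetical names.

```python
import random
from collections import Counter, defaultdict

# Hypothetical boundary markers for the start and end of a string.
START, END = "^", "$"


def fit_bigram(corpus):
    """Estimate P(next char | previous char) by counting transitions."""
    counts = defaultdict(Counter)
    for word in corpus:
        chars = [START] + list(word) + [END]
        for prev, nxt in zip(chars, chars[1:]):
            counts[prev][nxt] += 1
    model = {}
    for prev, nexts in counts.items():
        total = sum(nexts.values())
        model[prev] = {c: n / total for c, n in nexts.items()}
    return model


def probability(model, word):
    """P(X) as a product of P(x_i | x_{i-1}); 0.0 if any transition is unseen."""
    p = 1.0
    chars = [START] + list(word) + [END]
    for prev, nxt in zip(chars, chars[1:]):
        p *= model.get(prev, {}).get(nxt, 0.0)
    return p


def sample(model, rng, max_len=20):
    """Generate a new string one character at a time (capped at max_len)."""
    out, prev = [], START
    while len(out) < max_len:
        chars, probs = zip(*model[prev].items())
        nxt = rng.choices(chars, weights=probs)[0]
        if nxt == END:
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)


corpus = ["banana", "bandana", "cabana", "ananas"]
model = fit_bigram(corpus)
rng = random.Random(0)
print([sample(model, rng) for _ in range(5)])  # strings built from corpus transitions
print(probability(model, "banana"), probability(model, "xyz"))
```

Every sampled string is assembled only from transitions seen in the corpus, yet it need not be one of the corpus strings; in this small sense the model creates something new.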
Key differences between the two approaches
| Aspect | Discriminative | Generative |
|---|---|---|
| Training goal | Predict label P(Y\|X) | Model data distribution P(X) or P(X\|Y) |
| Output | Class, number, bounding box | New text, image, audio, video |
| Labels | Required (supervised) | Often not needed (self-supervised) |
| Task examples | Classification, regression, NER | Text generation, image synthesis, translation |
| Model examples | BERT, SVM, XGBoost, ResNet | GPT, Claude, Stable Diffusion, GAN, VAE |
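The "Training goal" row can be made concrete with a classic pairing. The sketch below, in Python with made-up 1-D data, fits a generative model (one Gaussian per class for P(X|Y), plus the prior P(Y)) and a discriminative one (logistic regression for P(Y|X)). Both can classify, the generative one indirectly via Bayes' rule, but only the generative one can also produce new inputs. The data, function names, and hyperparameters are illustrative assumptions, not taken from any particular library.

```python
import math
import random

rng = random.Random(0)
# Made-up 1-D data: class 0 is centered at -2, class 1 at +2.
data = [(rng.gauss(-2.0, 1.0), 0) for _ in range(200)]
data += [(rng.gauss(2.0, 1.0), 1) for _ in range(200)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# --- Generative: learn P(X|Y) (one Gaussian per class) and the prior P(Y) ---
def fit_generative(data):
    params = {}
    for y in (0, 1):
        xs = [x for x, label in data if label == y]
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)
        params[y] = (mean, var, len(xs) / len(data))  # (mean, variance, prior)
    return params


def log_gaussian(x, mean, var):
    return -((x - mean) ** 2) / (2 * var) - 0.5 * math.log(2 * math.pi * var)


def generative_posterior(params, x):
    """P(Y=1|x) derived with Bayes' rule from P(x|Y) * P(Y), in log space."""
    log_joint = {y: log_gaussian(x, m, v) + math.log(prior)
                 for y, (m, v, prior) in params.items()}
    return sigmoid(log_joint[1] - log_joint[0])


def generate(params, y, rng):
    """Sample a brand-new input x from P(X|Y=y)."""
    mean, var, _ = params[y]
    return rng.gauss(mean, math.sqrt(var))


# --- Discriminative: learn P(Y=1|X) directly with logistic regression ---
def fit_logistic(data, lr=0.1, epochs=200):
    w = b = 0.0
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in data:
            err = sigmoid(w * x + b) - y  # cross-entropy gradient w.r.t. the logit
            gw += err * x
            gb += err
        w -= lr * gw / len(data)
        b -= lr * gb / len(data)
    return w, b


def discriminative_posterior(wb, x):
    w, b = wb
    return sigmoid(w * x + b)


gen = fit_generative(data)
disc = fit_logistic(data)
for x in (-3.0, 0.0, 3.0):
    print(x, round(generative_posterior(gen, x), 3),
          round(discriminative_posterior(disc, x), 3))
print("new class-1 inputs:", [round(generate(gen, 1, rng), 2) for _ in range(3)])
```

On well-separated points both models give a confident answer; the difference lies in what each one learned, and therefore in what else it can do.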
The boundary is not absolute. BERT is pretrained without labels (masked token prediction) but is then used mostly for discriminative tasks. A GPT-based classifier is a generative architecture applied to discrimination. What separates the two approaches is the training objective, not the architecture.
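To illustrate that last point, here is a minimal sketch in Python, assuming a hypothetical `LogitTable` class as a stand-in for a shared architecture (real systems would use a transformer). The same class, with the same softmax and the same cross-entropy update, is trained twice: once on human-provided labels, once on next-token targets cut from raw text. The toy sentences and labels are made up for illustration.

```python
import math
from collections import defaultdict


class LogitTable:
    """Hypothetical stand-in for a shared architecture: one row of logits per context.

    The softmax and the cross-entropy update are identical for both objectives
    below; only the (context, target) training pairs differ.
    """

    def __init__(self, outputs):
        self.outputs = list(outputs)
        self.logits = defaultdict(lambda: {o: 0.0 for o in self.outputs})

    def probs(self, context):
        row = self.logits[context]
        top = max(row.values())  # subtract the max for a stable softmax
        exps = {o: math.exp(v - top) for o, v in row.items()}
        total = sum(exps.values())
        return {o: e / total for o, e in exps.items()}

    def train(self, pairs, lr=0.5, epochs=50):
        """Gradient descent on cross-entropy: d(loss)/d(logit_o) = p_o - [o == target]."""
        for _ in range(epochs):
            for context, target in pairs:
                p = self.probs(context)
                for o in self.outputs:
                    grad = p[o] - (1.0 if o == target else 0.0)
                    self.logits[context][o] -= lr * grad


# Discriminative objective: targets are human-provided labels, modeling P(Y|X).
labeled = [("great movie", "pos"), ("awful movie", "neg"), ("great plot", "pos")]
classifier = LogitTable(outputs=["pos", "neg"])
classifier.train(labeled)

# Generative objective: targets are the next token of unlabeled text, modeling
# P(x_t | x_{t-1}), a one-token-context version of next-token prediction.
raw_text = "great movie great plot awful movie".split()
next_token_pairs = list(zip(raw_text, raw_text[1:]))
language_model = LogitTable(outputs=sorted(set(raw_text)))
language_model.train(next_token_pairs)

print(classifier.probs("great movie"))
print(language_model.probs("great"))
```

The generative objective needed no labels at all: its targets come from the text itself, which is what "self-supervised" means in the table above.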
A map of generative AI: six families
The six families are LLM, Diffusion, GAN, VAE, Audio Gen, and Video Gen.
Why generative AI is possible right now
The ideas behind generative models are not new — GANs were introduced in 2014, VAEs in 2013. What changed was the convergence of three factors that made training at scale both possible and productive.
Three factors converged to make GenAI possible:
Transformer (2017: self-attention) → GPU / TPU (×10 000 compute vs 2012) → Data (internet-scale corpora) → Scaling laws (more data + params = better) → GenAI (GPT-3 → GPT-4 → Claude)
Key takeaways
Generative AI is a paradigm shift: instead of inferring labels from data, models learn to create new data that looks like it came from the same distribution. This became practical through the convergence of the transformer architecture, GPU compute scaling, and internet-scale datasets.
Discriminative models learn P(Y|X) — mapping inputs to labels
Generative models learn P(X) — the data distribution itself, from which new examples can be sampled
The architecture can be identical (BERT and GPT are both transformers); the difference is in the training objective
Six main families: LLM, Diffusion, GAN, VAE, Audio Gen, Video Gen
GenAI became practical through the transformer (2017), compute scaling, and internet-scale data — all three at once