Discrimination vs generation: what is the difference

Most classical ML tasks are discriminative: the model learns to map input X to label Y. A spam filter outputs spam / not spam. An object detector returns bounding box coordinates. The model makes inferences about existing data — it creates nothing new.

ℹ️ Generative model: a model that learns the data distribution P(X) and can sample new examples from it. Instead of asking "what is this object?" it answers "what does a typical object from this world look like?" GPT generates text that never existed before; Stable Diffusion produces an image by iteratively denoising random noise.
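To make "learn P(X), then sample from it" concrete, here is a minimal sketch using only the Python standard library. It assumes a toy one-dimensional dataset and a single Gaussian as the model; the data values and variable names are made up for illustration. Real generative models use far richer distributions, but the two steps (fit, then sample) are the same idea.

```python
import random
from statistics import NormalDist

# Hypothetical 1-D "dataset" (made-up numbers, e.g. heights in cm).
random.seed(0)
data = [random.gauss(170.0, 8.0) for _ in range(1000)]

# "Training": estimate P(X) by fitting a Gaussian to the observed data.
model = NormalDist.from_samples(data)

# "Generation": draw new examples from the fitted distribution.
new_examples = model.samples(5, seed=1)

print(f"fitted mean={model.mean:.1f}, stdev={model.stdev:.1f}")
print([round(x, 1) for x in new_examples])
```

The sampled values are new draws that resemble the training data without being copied from it.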

Key differences between the two approaches

Aspect          Discriminative                    Generative
Training goal   Predict label P(Y|X)              Model data distribution P(X) or P(X|Y)
Output          Class, number, bounding box       New text, image, audio, video
Labels          Required (supervised)             Often not needed (self-supervised)
Task examples   Classification, regression, NER   Text generation, image synthesis, translation
Model examples  BERT, SVM, XGBoost, ResNet        GPT, Claude, Stable Diffusion, GAN, VAE
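The two training goals are linked by Bayes' rule, P(Y|X) = P(X|Y) P(Y) / P(X), so a model of P(X|Y) can both generate examples and classify them. The sketch below illustrates this under strong assumptions: one Gaussian per class, a single numeric feature, and a tiny made-up dataset. The names `train`, `posterior`, and `sample` are hypothetical.

```python
from statistics import NormalDist

# Hypothetical labelled 1-D data: one feature value per example.
train = {
    "ham":  [1.0, 1.5, 2.0, 2.5, 3.0],
    "spam": [6.0, 6.5, 7.0, 7.5, 8.0],
}

# Generative step: model P(X|Y) with one Gaussian per class,
# and P(Y) with the class frequencies.
total = sum(len(xs) for xs in train.values())
likelihood = {y: NormalDist.from_samples(xs) for y, xs in train.items()}
prior = {y: len(xs) / total for y, xs in train.items()}

def posterior(x):
    """Return P(Y|X=x) for every class via Bayes' rule."""
    joint = {y: likelihood[y].pdf(x) * prior[y] for y in train}
    evidence = sum(joint.values())  # P(X=x)
    return {y: p / evidence for y, p in joint.items()}

def sample(label, n, seed=None):
    """Generate n new feature values for the given class from P(X|Y)."""
    return likelihood[label].samples(n, seed=seed)

print(posterior(2.0))          # class probabilities for a new input
print(sample("spam", 3, seed=0))  # new synthetic "spam" feature values
```

A purely discriminative model would fit P(Y|X) directly and would have nothing to sample from.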

The boundary is not absolute. BERT is pretrained without human-provided labels (masked token prediction) but is mostly used for discriminative tasks. A GPT-based classifier is a generative architecture used for discrimination. The key difference is in the training objective, not the architecture.
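The point about training objectives can be illustrated by how training pairs are built from raw text with no human labels. The sketch below is a simplification: it assumes whitespace tokens and a plain random mask, whereas real pipelines work on subword tokens and use more elaborate masking rules. The function names and the `mask_prob` parameter are hypothetical.

```python
import random

MASK = "[MASK]"

def next_token_pairs(tokens):
    """GPT-style objective: predict each token from the tokens before it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

def masked_pairs(tokens, mask_prob=0.15, seed=0):
    """BERT-style objective: hide some tokens and predict them from context."""
    rng = random.Random(seed)
    masked, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked[i] = MASK      # the model sees the mask...
            targets[i] = tok      # ...and must recover the original token
    return masked, targets

tokens = "the cat sat on the mat".split()

for context, target in next_token_pairs(tokens):
    print(context, "->", target)

print(masked_pairs(tokens, mask_prob=0.5, seed=0))
```

In both cases the "labels" come from the text itself; only the objective differs, which is why the same architecture can serve either purpose.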

A map of generative AI: six families
🔤 LLM
🎨 Diffusion
⚔️ GAN
🗜️ VAE
🎵 Audio Gen
🎬 Video Gen

Why generative AI is possible right now

The ideas behind generative models are not new — GANs were introduced in 2014, VAEs in 2013. What changed was the convergence of three factors that made training at scale both possible and productive.

Three factors converged to make GenAI possible:
🏗️ Transformer (2017): self-attention
🖥️ GPU / TPU: ×10 000 vs 2012
🌐 Data: internet-scale corpora
📈 Scaling laws: more data + params = better
🤖 GenAI: GPT-3 → GPT-4 → Claude

Key takeaways

Generative AI is a paradigm shift: instead of inferring labels from data, models learn to create new data that looks like it came from the same distribution. This became practical through the convergence of the transformer architecture, GPU compute scaling, and internet-scale datasets.
Discriminative models learn P(Y|X): a mapping from inputs to labels
Generative models learn P(X) or P(X|Y): the data distribution itself, from which new examples can be sampled
The architecture can be identical (both can be transformers); the difference is in the training objective
Six main families: LLM, Diffusion, GAN, VAE, Audio Gen, Video Gen
GenAI became practical through the transformer (2017), compute scaling, and internet-scale data, all three at once