Scaling laws: data, parameters, and compute

Why does GPT-4 know more than GPT-2? Why does Llama 3 8B outperform models from two years earlier that have 10× more parameters? The answer lies in scaling laws, empirical formulas that predict how a model's quality depends on three quantities: the number of parameters, the amount of training data, and the compute spent on training.
