Evaluating generative models: perplexity, BLEU, FID

Evaluating a generative model is far harder than evaluating a classifier. For classification there is one correct label; for generation there are infinitely many good outputs. "The sun shines" and "A bright sun is shining" are both perfect continuations of the same prompt — but a metric that demands an exact string match will fail the second one.
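The exact-match problem can be made concrete with a minimal Python sketch that scores the paraphrase two ways: by strict string equality, and by a naive clipped word-overlap score. The function names `exact_match` and `unigram_precision` are illustrative and do not come from any library. The lowercase-and-whitespace tokenization is an assumption chosen for brevity, not how production metrics tokenize.

```python
from collections import Counter


def exact_match(candidate: str, reference: str) -> float:
    """Return 1.0 only if the two strings are identical, else 0.0."""
    return 1.0 if candidate == reference else 0.0


def unigram_precision(candidate: str, reference: str) -> float:
    """Fraction of candidate words that also appear in the reference.

    Counts are clipped, so a word repeated in the candidate is credited at
    most as many times as it occurs in the reference. Tokenization is a
    naive assumption: lowercase, then split on whitespace.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    if not cand:
        return 0.0
    # Counter returns 0 for missing words, so unmatched words add nothing.
    overlap = sum(min(count, ref[word]) for word, count in cand.items())
    return overlap / sum(cand.values())


reference = "The sun shines"
candidate = "A bright sun is shining"

print(exact_match(candidate, reference))        # 0.0
print(unigram_precision(candidate, reference))  # 0.2
```

Exact match rejects the paraphrase outright. The overlap score gives partial credit, but only 0.2, because "sun" is the single shared word and "shines" and "shining" count as different tokens. Surface-overlap scores reward shared wording, not shared meaning.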