Weight initialization: Xavier, He and variance scaling

Weight initialization: why the start matters

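Training a neural net is optimization in a space with millions of dimensions, so where you start matters. Zero weights: every neuron in a layer computes the same output and receives the same gradient, symmetry is never broken, and the net does not learn. Too large: saturating activations such as tanh and sigmoid flatten out, and the gradient dies. Too small: activations shrink toward zero layer by layer, and the gradients shrink with them. Xavier and He initialization fix this by design: they set the weight variance from the layer's fan-in (and, for Xavier, its fan-out) so that activation variance stays roughly stable through the network.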
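To make the three failure modes concrete, here is a minimal sketch, not a reference implementation. It assumes square dense layers without biases, Gaussian weights, and a single random input; the helper names (`xavier_std`, `he_std`, `final_rms`) and the width, depth, and seed values are illustrative choices, not part of any library. The Gaussian forms used are Var(W) = 2 / (fan_in + fan_out) for Xavier and Var(W) = 2 / fan_in for He; uniform variants also exist.

```python
import math
import random


def xavier_std(fan_in, fan_out):
    """Xavier/Glorot (Gaussian form): Var(W) = 2 / (fan_in + fan_out)."""
    return math.sqrt(2.0 / (fan_in + fan_out))


def he_std(fan_in):
    """He/Kaiming (Gaussian form): Var(W) = 2 / fan_in.

    The factor 2 compensates for ReLU zeroing roughly half the units.
    """
    return math.sqrt(2.0 / fan_in)


def relu(x):
    return max(0.0, x)


def final_rms(weight_std, activation, width=128, depth=10, seed=0):
    """Push one random input through `depth` dense layers (no biases)
    and return the root-mean-square activation of the last layer."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(width)]
    for _ in range(depth):
        # Each output unit draws its own fresh row of `width` weights.
        x = [
            activation(sum(rng.gauss(0.0, weight_std) * xi for xi in x))
            for _ in range(width)
        ]
    return math.sqrt(sum(v * v for v in x) / width)


if __name__ == "__main__":
    n = 128  # illustrative layer width
    print("tanh, too small (std=0.001):", final_rms(0.001, math.tanh))
    print("tanh, too large (std=1.0):  ", final_rms(1.0, math.tanh))
    print("tanh, Xavier:               ", final_rms(xavier_std(n, n), math.tanh))
    print("ReLU, He:                   ", final_rms(he_std(n), relu))
```

With the tiny standard deviation the final activations should be vanishingly small, and with the large one the tanh units should sit near ±1, where their derivative is close to zero. The Xavier and He runs should keep the activations at a moderate scale, which is the behavior the variance rules are designed to produce.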
