ResNet: why 152 layers can beat 20
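Before 2015, adding layers beyond a certain depth hurt accuracy, even on the training set. This was not overfitting: the deeper network had higher training error too, which means it was failing to optimize, not failing to generalize. This is called the degradation problem. He et al. addressed it with one idea: have each block learn a residual F(x) and output F(x) + x. If the extra layers add nothing useful, they can learn F(x) = 0, and the block output equals its input (identity). A skip connection makes that trivial, and because the addition passes gradients through unchanged, the gradient always has a direct highway back to earlier layers.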
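To make the identity argument concrete, here is a minimal sketch in plain Python. It is not the architecture from the paper: the real blocks use convolutions, batch normalization, and a ReLU after the addition. The function names (`residual_block`, `plain_block`) and the tiny fully connected layers are assumptions chosen for illustration. The sketch only shows that when the learned part F(x) is zero, a residual block returns its input unchanged, while a plain block with the same weights loses the input.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, a) for a in v]


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def plain_block(x, W1, W2):
    """Plain block (hypothetical): output is F(x) = W2 * relu(W1 * x)."""
    return matvec(W2, relu(matvec(W1, x)))


def residual_block(x, W1, W2):
    """Residual block (hypothetical): output is x + F(x), the skip connection."""
    fx = plain_block(x, W1, W2)
    return [a + b for a, b in zip(x, fx)]


dim = 3
zeros = [[0.0] * dim for _ in range(dim)]  # weights for which F(x) = 0
x = [1.5, -2.0, 0.25]

out_res = x
out_plain = x
for _ in range(50):  # stack 50 "useless" blocks
    out_res = residual_block(out_res, zeros, zeros)
    out_plain = plain_block(out_plain, zeros, zeros)

print(out_res)    # the input survives all 50 residual blocks
print(out_plain)  # the plain stack has collapsed the signal to zeros
```

The same addition explains the gradient highway: the derivative of x + F(x) with respect to x is 1 + F'(x), so even when F'(x) is close to zero, a gradient of 1 still flows through the skip path.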