Matrices and linear transforms: how a neural layer warps space

A matrix is not a table — it's a transformation

If a vector is a point's position, a matrix is a command that makes that point move. A matrix can rotate, stretch, squash, or warp all of space at once. In machine learning, a neural layer's weights are the numbers in a matrix. When data passes through the layer, the weight matrix literally warps space, trying to place points of different classes farther apart.

Linear transforms — what a matrix can do

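A linear transform preserves two things: straight lines stay straight, and the origin stays fixed. Within those limits a matrix can do everything else: rotations, scaling, reflections, shearing. The one thing it cannot do is move the origin, which is why a layer adds a bias term separately.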
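Both properties can be checked numerically. The sketch below is only an illustration: the matrix `M`, the vectors `u` and `v`, and the coefficients `alpha` and `beta` are arbitrary values chosen for the example, not taken from the text.

```python
import numpy as np

# Arbitrary example matrix and vectors (illustrative assumptions).
M = np.array([[2.0, 1.0],
              [0.0, 3.0]])
u = np.array([1.0, 2.0])
v = np.array([-3.0, 0.5])

# The origin stays fixed: M applied to the zero vector is the zero vector.
origin_fixed = np.allclose(M @ np.zeros(2), np.zeros(2))

# Linearity: transforming a weighted combination of vectors gives the same
# result as combining the transformed vectors with the same weights.
alpha, beta = 2.0, -1.5
lhs = M @ (alpha * u + beta * v)
rhs = alpha * (M @ u) + beta * (M @ v)
is_linear = np.allclose(lhs, rhs)

print(origin_fixed, is_linear)
```

Because weighted combinations are preserved, every point on the line between `u` and `v` lands on the line between `M @ u` and `M @ v`, which is what "straight lines stay straight" means.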

Identity [[1,0],[0,1]]: the “do nothing” transform. Space is unchanged. In NumPy: np.eye(2)

Scale [[2,0],[0,2]]: stretches space by 2 along both axes

Rotation [[0,-1],[1,0]]: rotates every vector by 90° counterclockwise
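To see the three matrices act on a concrete point, here is a minimal NumPy sketch. The test vector `v` is an assumption picked for the example; any 2-D vector would do.

```python
import numpy as np

identity = np.eye(2)                     # [[1, 0], [0, 1]]
scale = np.array([[2, 0], [0, 2]])       # stretch by 2 along both axes
rotate90 = np.array([[0, -1], [1, 0]])   # rotate by 90 degrees counterclockwise

v = np.array([1, 0])  # example vector pointing along the x-axis

same = identity @ v      # unchanged: (1, 0)
doubled = scale @ v      # twice as long: (2, 0)
turned = rotate90 @ v    # now points along the y-axis: (0, 1)

print(same, doubled, turned)
```

The `@` operator is NumPy's matrix product; applying a matrix to a vector is just a matrix-vector multiplication.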

Every dense (fully connected) layer is a matrix: $y=Wx+b$. Training = tuning the numbers in W.
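As a rough sketch of what the formula computes, the snippet below pushes one input vector through one layer. The function name `dense` and the values in `W`, `b`, and `x` are hypothetical, chosen so the arithmetic is easy to follow; a trained network would have learned values instead.

```python
import numpy as np

def dense(x, W, b):
    """Affine part of a dense layer: y = W x + b (no activation function)."""
    return W @ x + b

# Hypothetical weights mapping 2 inputs to 3 outputs.
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([0.5, -0.5, 0.0])
x = np.array([2.0, 3.0])

y = dense(x, W, b)
print(y)  # values: 2.5, 2.5, 5.0
```

Note that `W` has one row per output and one column per input, so a layer can also change the dimension of the space it acts on.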

Matrix multiplication: composing two transforms into one

Matrix multiplication $A\times B$ is not just arithmetic. It applies two transforms in sequence: first B, then A on top. Each cell of the result matrix C combines one row of A with one column of B: the cell in row i, column j is the dot product of row i of A and column j of B. Order matters: $A\times B$ is usually not equal to $B\times A$.
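The following sketch checks these claims on a small case. `A` is the 90° rotation from earlier; `B`, a stretch along x only, and the vector `v` are assumptions added for the example (a uniform scale would commute with the rotation and hide the effect).

```python
import numpy as np

A = np.array([[0, -1], [1, 0]])   # rotate 90 degrees counterclockwise
B = np.array([[2, 0], [0, 1]])    # stretch by 2 along x only (assumed example)
v = np.array([1, 1])

C = A @ B  # one matrix meaning "first B, then A"

# Applying C in one step matches applying B and then A.
two_steps = A @ (B @ v)
one_step = C @ v

# One cell of C is a row of A dotted with a column of B.
cell_01 = A[0, :] @ B[:, 1]

# Swapping the order gives a different transform.
commutes = np.array_equal(A @ B, B @ A)

print(one_step, two_steps, cell_01, commutes)
```

Here `A @ B` is [[0,-1],[2,0]] while `B @ A` is [[0,-2],[1,0]]: stretching and then rotating is not the same as rotating and then stretching.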

Three operations you should know

Matrices in neural nets: a layer as a warp of space

Every dense layer is literally $y=Wx+b$. Matrix W warps space; bias b shifts it. Training means tuning the entries of W (and b) so that after all layers, points of different classes (e.g. cats vs dogs) end up far apart and easy to separate.
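A toy illustration of that idea, under loud assumptions: the "cat" and "dog" points below are made-up 2-D data, and `W` and `b` are hand-picked rather than learned. The helper `layer` is a hypothetical name; it applies the same $y=Wx+b$ to every row of a batch.

```python
import numpy as np

# Made-up 2-D points, one point per row.
cats = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 4.5]])
dogs = np.array([[2.0, 1.0], [3.0, 2.0], [4.5, 3.0]])

# Hand-picked (not learned) weights and bias.
W = np.array([[3.0, -3.0],
              [0.0, 1.0]])
b = np.array([0.0, -2.0])

def layer(X, W, b):
    """Apply y = W x + b to every row of X at once."""
    return X @ W.T + b

cats_out = layer(cats, W, b)
dogs_out = layer(dogs, W, b)

# Along the first output coordinate, cats are negative and dogs positive.
print(cats_out[:, 0])
print(dogs_out[:, 0])
```

Before the layer, the two groups overlap along both axes. After it, the sign of the first output coordinate alone tells them apart, which is the kind of arrangement training tries to find automatically.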

ℹ️ GPUs became the heart of deep learning because they run the many multiply-add operations inside matrix multiplications in parallel. Training a model like GPT-4 takes an astronomically large number of such operations; OpenAI has not published the exact figure. Without this matrix throughput, modern LLMs would be impractical.