Signing you in…

Matrices and linear transforms: how a neural layer warps space

A matrix is not a table — it's a transformation

If a vector is a point's position, a matrix is a command that makes that point move. A matrix can rotate, stretch, squash, or warp all of space at once. In machine learning, a neural layer's weights are the numbers in a matrix. When data passes through the layer, the weight matrix literally warps space, trying to place points of different classes farther apart.

Change the numbers in matrix W and watch how space deforms
Grid nodes (x, y) map to (x′, y′):  x′ = ax + by,   y′ = cx + dy
Matrix W
(
)
[1.00, 0.00; 0.00, 1.00]

Linear transforms — what a matrix can do

A linear transform preserves two things: straight lines stay straight, and the origin stays fixed. Everything else — rotations, scaling, reflections, shearing — a matrix can do.
Identity [[1,0],[0,1]]: the “do nothing” transform. Space is unchanged. np.eye(2)
Scale [[2,0],[0,2]]: stretches space by 2 along both axes
Rotation [[0,-1],[1,0]]: rotates every vector by 90°. Try it in the widget above
Every dense (fully connected) layer is a matrix: y=Wx+by=Wx+b. Training = tuning the numbers in W
Matrix multiplication: composing two transforms into one

Matrix multiplication A×BA\times B is not just arithmetic. It's applying two transforms in sequence: first B, then A on top. Hover over a cell in the result matrix C below to see which row of A and which column of B combine. Order matters — A×B is usually not equal to B×A.

Hover a cell in matrix C to see which elements combine
Cij=kAikBkjC_{ij} = \sum_k A_{ik} B_{kj} — row ii of Layer 1 (W₁), column jj of Layer 2 (W₂)
Layer 1 (W₁)
231
042
Layer 2 (W₂)
12
01
30
Result
57
64
Hover a result cell in C — the matching row of A and column of B are highlighted.
Three operations you should know
Click an operation for details
↩️
Inverse matrix
📐
Determinant
🔄
Transpose
Matrices in neural nets: a layer as a warp of space

Every dense layer is literally y=Wx+by=Wx+b. Matrix W warps space; bias b shifts it. Training means tuning the entries of W so that after all layers, points of different classes (e.g. cats vs dogs) end up far apart and easy to separate.

Linear algebra in action ▼
python
1
import numpy as np
2
W = np.array([[0, -1], [1, 0]])
3
x = np.array([1, 0])
4
y = W @ x  # [0, 1]
5
inv_W = np.linalg.inv(W)
6
det = np.linalg.det(W)  # = 1.0
ℹ️GPUs became the heart of deep learning because they run billions of matrix multiplies in parallel. Training GPT-4 took on the order of quadrillions of such operations. Without this matrix throughput, modern LLMs would be impossible.