MLP Architecture
Multilayer perceptrons are the foundational building block of deep learning. Compare how shallow, standard, and deep designs trade capacity against simplicity.
What is a Multilayer Perceptron?
Inspired by biology
Loosely modeled on biological neurons. Each artificial neuron computes a weighted sum of inputs, adds a bias, then passes through an activation function.
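The computation of a single artificial neuron can be sketched in a few lines. The values below are illustrative, not taken from any trained model, and ReLU is assumed as the activation:

```python
def neuron(inputs, weights, bias):
    # Weighted sum of inputs, plus a bias term...
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # ...passed through an activation function (here ReLU).
    return max(0.0, z)

# Example with made-up inputs and weights:
print(neuron([0.5, -1.0, 2.0], [0.8, 0.2, -0.1], 0.1))
```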
Layered hierarchy
Each layer transforms its input into a new representation. Deeper layers capture more abstract features — edges → shapes → objects.
Trained by backprop
Gradient descent + chain rule. The model computes its error, calculates how each weight contributed, then nudges weights in the direction that reduces error.
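One update step can be sketched for a single linear neuron with squared-error loss; every number here is made up for illustration:

```python
w, b, lr = 0.5, 0.0, 0.1   # weight, bias, learning rate (illustrative values)
x, target = 2.0, 2.0       # one training example

# Forward pass: compute the prediction and its squared error.
pred = w * x + b
error = pred - target
loss = error ** 2

# Backward pass (chain rule): d(loss)/dw = 2*error*x, d(loss)/db = 2*error.
grad_w = 2 * error * x
grad_b = 2 * error

# Nudge each weight in the direction that reduces the error.
w -= lr * grad_w
b -= lr * grad_b

new_loss = (w * x + b - target) ** 2
print(loss, new_loss)  # the loss shrinks after the step
```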
Network diagram
- Parameters: 109,386
- Layers: 4
- MNIST accuracy: >97%
- Overfit risk: low
Two hidden layers learn a hierarchy of visual features. Hidden 1 (128 neurons) detects primitive patterns — horizontal edges, vertical strokes, curves. Hidden 2 (64 neurons) combines those primitives into digit-level shape detectors. This is the GradVex model: 109,386 parameters trained on 60,000 MNIST images using the Adam optimizer.
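The forward pass through this 784 → 128 → 64 → 10 stack can be sketched with NumPy. The random weights below stand in for the trained GradVex parameters, and ReLU hidden activations with a softmax output are assumed:

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(n_in, n_out):
    # Random weights stand in for trained parameters (sketch only).
    return rng.normal(0, 0.1, (n_in, n_out)), np.zeros(n_out)

# 784 -> 128 -> 64 -> 10, matching the standard architecture described above.
W1, b1 = layer(784, 128)
W2, b2 = layer(128, 64)
W3, b3 = layer(64, 10)

def forward(x):
    h1 = np.maximum(0, x @ W1 + b1)    # Hidden 1: 128 ReLU units
    h2 = np.maximum(0, h1 @ W2 + b2)   # Hidden 2: 64 ReLU units
    logits = h2 @ W3 + b3              # Output: one logit per digit class
    e = np.exp(logits - logits.max())  # softmax turns logits into probabilities
    return e / e.sum()

probs = forward(rng.random(784))       # a stand-in for a flattened 28x28 image
print(probs.shape, probs.sum())        # 10 class probabilities summing to 1
```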
Real-world use cases
- Credit card fraud detection (tabular features)
- Customer churn prediction in SaaS
- Medical diagnosis from structured lab results
- Digit recognition in postal sorting (this exact task)
- Basic recommendation systems with user/item features
Advantages
- Strong accuracy with minimal tuning
- Fast training on modern hardware
- Learns non-linear decision boundaries
- Hierarchical feature representation
- Generalizes well with proper regularization
Disadvantages
- More complex than linear models — harder to explain
- Requires careful initialization
- Sensitive to learning rate choice
- Slower than linear models at inference
Deep dive — concepts explained
Architecture comparison
| Architecture | Shape | Parameters | Accuracy | Train speed | Overfit risk |
|---|---|---|---|---|---|
| Shallow MLP | 784 → 10 | 7,850 | ~92% | Very fast | Low (underfits instead) |
| Standard MLP | 784 → 128 → 64 → 10 | 109,386 | >97% | Fast (~30s) | Low |
| Deep MLP | 784 → 256 → 128 → 64 → 32 → 10 | 244,522 | ~98% | Moderate | High |
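The parameter counts in the table can be checked directly: each fully connected layer contributes n_in × n_out weights plus n_out biases.

```python
def mlp_params(layers):
    # Each dense layer: (n_in * n_out) weights + n_out biases.
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layers, layers[1:]))

print(mlp_params([784, 10]))                    # shallow:  7,850
print(mlp_params([784, 128, 64, 10]))           # standard: 109,386
print(mlp_params([784, 256, 128, 64, 32, 10]))  # deep:     244,522
```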