Architecture Explorer

MLP Architecture

Multilayer Perceptrons are the foundational building blocks of deep learning. Compare how shallow, standard, and deep designs trade capacity against simplicity. Click any layer to inspect its math.

What is a Multilayer Perceptron?

Inspired by biology

Loosely modeled on biological neurons. Each artificial neuron computes a weighted sum of inputs, adds a bias, then passes through an activation function.
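The computation above can be sketched in a few lines. This is a minimal illustration with made-up weights and inputs, not values from the GradVex model; ReLU is used as the activation since that is what the hidden layers here use.

```python
# One artificial neuron: weighted sum of inputs, plus a bias,
# passed through an activation function (ReLU here).
def neuron(inputs, weights, bias):
    z = sum(x * w for x, w in zip(inputs, weights)) + bias  # weighted sum + bias
    return max(0.0, z)                                      # ReLU activation

# Illustrative values: 1.0*0.5 + 2.0*(-0.25) + 0.1 = 0.1
print(neuron([1.0, 2.0], [0.5, -0.25], 0.1))  # 0.1
```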

Layered hierarchy

Each layer transforms its input into a new representation. Deeper layers capture more abstract features — edges → shapes → objects.

Trained by backprop

Gradient descent + chain rule. The model computes its error, calculates how each weight contributed, then nudges weights in the direction that reduces error.
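The "compute error, attribute it to each weight, nudge" loop can be shown on the smallest possible case: one linear neuron, one weight, squared error, and a single hypothetical training pair (x, target). The numbers are illustrative, not from the actual training run.

```python
# Gradient descent on a single weight: pred = w * x, loss = (pred - target)^2 / 2.
x, target = 2.0, 1.0
w, lr = 0.0, 0.1

for _ in range(50):
    pred = w * x           # forward pass
    error = pred - target  # how wrong the prediction is
    grad = error * x       # chain rule: dLoss/dw = (pred - target) * x
    w -= lr * grad         # nudge the weight in the direction that reduces error

print(round(w, 4))  # converges to 0.5, since 0.5 * 2.0 = 1.0 = target
```

The real network does exactly this, just for 109,386 weights at once, with the chain rule carrying the error signal backwards through each layer.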

Real world: MLPs power credit card fraud detection (Stripe, PayPal), medical diagnosis support (IBM Watson Health), ad click-through prediction (Google, Meta), and of course handwriting recognition — the task you're visualizing in GradVex right now.

Network diagram — click a layer to inspect

Input (784) → Hidden 1 (128, ReLU) → Hidden 2 (64, ReLU) → Output (10, Softmax)

Parameters

109,386

Layers

4

MNIST Accuracy

>97%

Overfit Risk

Low

Two hidden layers learn a hierarchy of visual features. Hidden 1 (128 neurons) detects primitive patterns — horizontal edges, vertical strokes, curves. Hidden 2 (64 neurons) combines those primitives into digit-level shape detectors. This is the GradVex model: 109,386 parameters trained on 60,000 MNIST images using Adam optimizer.
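A forward pass through this shape can be sketched with random weights. This is a shape-and-flow illustration only, not the trained GradVex weights; the input is a fake flattened 28×28 image.

```python
import numpy as np

# Forward pass through the 784 → 128 → 64 → 10 architecture.
rng = np.random.default_rng(0)
sizes = [784, 128, 64, 10]
Ws = [rng.normal(0, 0.05, (i, o)) for i, o in zip(sizes, sizes[1:])]
bs = [np.zeros(o) for o in sizes[1:]]

x = rng.random(784)                       # fake flattened 28×28 image
h1 = np.maximum(0, x @ Ws[0] + bs[0])     # Hidden 1: 128 ReLU units (primitive patterns)
h2 = np.maximum(0, h1 @ Ws[1] + bs[1])    # Hidden 2: 64 ReLU units (combined shapes)
logits = h2 @ Ws[2] + bs[2]               # Output: 10 class scores
probs = np.exp(logits - logits.max())
probs /= probs.sum()                      # softmax → probabilities over digits 0–9

print(probs.shape)  # (10,)
```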

Real-world use cases

  • Credit card fraud detection (tabular features)
  • Customer churn prediction in SaaS
  • Medical diagnosis from structured lab results
  • Digit recognition in postal sorting (this exact task)
  • Basic recommendation systems with user/item features

Advantages

  • Strong accuracy with minimal tuning
  • Fast training on modern hardware
  • Learns non-linear decision boundaries
  • Hierarchical feature representation
  • Generalizes well with proper regularization

Disadvantages

  • More complex than linear models — harder to explain
  • Requires careful initialization
  • Sensitive to learning rate choice
  • Slower than linear models at inference

Deep dive — concepts explained

Architecture comparison

Click a row to switch architecture

Architecture    Shape                            Parameters  Accuracy  Train speed   Overfit risk
Shallow MLP     784 → 10                         7,850       ~92%      Very fast     Low (tends to underfit)
Standard MLP    784 → 128 → 64 → 10              109,386     >97%      Fast (~30s)   Low
Deep MLP        784 → 256 → 128 → 64 → 32 → 10   244,522     ~98%      Moderate      High
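The parameter counts in the comparison follow directly from the shapes: each fully connected layer contributes fan_in × fan_out weights plus one bias per output neuron. A quick check:

```python
# Parameter count for a fully connected MLP: weights + biases per layer.
def param_count(shape):
    return sum(i * o + o for i, o in zip(shape, shape[1:]))

print(param_count([784, 10]))                       # 7850   (shallow)
print(param_count([784, 128, 64, 10]))              # 109386 (standard)
print(param_count([784, 256, 128, 64, 32, 10]))     # 244522 (deep)
```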