Architecture Explorer

MLP Architecture

Multilayer Perceptrons are the foundational building blocks of deep learning. Compare how shallow, standard, and deep designs trade capacity against simplicity. Click any layer to inspect its math.

What is a Multilayer Perceptron?

Inspired by biology

Loosely modeled on biological neurons. Each artificial neuron computes a weighted sum of inputs, adds a bias, then passes through an activation function.
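The computation above can be sketched in a few lines. This is a minimal illustration with made-up weights and inputs, not values from the GradVex model; ReLU is used as the activation since that is what the hidden layers here use.

```python
# One artificial neuron: weighted sum of inputs, plus a bias,
# passed through an activation function (ReLU here).
def neuron(inputs, weights, bias):
    z = sum(x * w for x, w in zip(inputs, weights)) + bias  # weighted sum + bias
    return max(0.0, z)                                      # ReLU activation

# Illustrative values: 1.0*0.5 + 2.0*(-0.25) + 0.1 = 0.1
print(neuron([1.0, 2.0], [0.5, -0.25], 0.1))  # 0.1
```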

Layered hierarchy

Each layer transforms its input into a new representation. Deeper layers capture more abstract features — edges → shapes → objects.

Trained by backprop

Gradient descent + chain rule. The model computes its error, calculates how each weight contributed, then nudges weights in the direction that reduces error.
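The "compute error, attribute it to each weight, nudge" loop can be shown on the smallest possible case: one linear neuron, one weight, squared error, and a single hypothetical training pair (x, target). The numbers are illustrative, not from the actual training run.

```python
# Gradient descent on a single weight: pred = w * x, loss = (pred - target)^2 / 2.
x, target = 2.0, 1.0
w, lr = 0.0, 0.1

for _ in range(50):
    pred = w * x           # forward pass
    error = pred - target  # how wrong the prediction is
    grad = error * x       # chain rule: dLoss/dw = (pred - target) * x
    w -= lr * grad         # nudge the weight in the direction that reduces error

print(round(w, 4))  # converges to 0.5, since 0.5 * 2.0 = 1.0 = target
```

The real network does exactly this, just for 109,386 weights at once, with the chain rule carrying the error signal backwards through each layer.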

Real world: MLPs power credit card fraud detection (Stripe, PayPal), medical diagnosis support (IBM Watson Health), ad click-through prediction (Google, Meta), and of course handwriting recognition — the task you're visualizing in GradVex right now.

Network diagram — click a layer to inspect

Input (784) → Hidden 1 (128, ReLU) → Hidden 2 (64, ReLU) → Output (10, Softmax)

Parameters

109,386

Layers

4

MNIST Accuracy

>97%

Overfit Risk

Low

Two hidden layers learn a hierarchy of visual features. Hidden 1 (128 neurons) detects primitive patterns — horizontal edges, vertical strokes, curves. Hidden 2 (64 neurons) combines those primitives into digit-level shape detectors. This is the GradVex model: 109,386 parameters trained on 60,000 MNIST images using Adam optimizer.
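A forward pass through this shape can be sketched with random weights. This is a shape-and-flow illustration only, not the trained GradVex weights; the input is a fake flattened 28×28 image.

```python
import numpy as np

# Forward pass through the 784 → 128 → 64 → 10 architecture.
rng = np.random.default_rng(0)
sizes = [784, 128, 64, 10]
Ws = [rng.normal(0, 0.05, (i, o)) for i, o in zip(sizes, sizes[1:])]
bs = [np.zeros(o) for o in sizes[1:]]

x = rng.random(784)                       # fake flattened 28×28 image
h1 = np.maximum(0, x @ Ws[0] + bs[0])     # Hidden 1: 128 ReLU units (primitive patterns)
h2 = np.maximum(0, h1 @ Ws[1] + bs[1])    # Hidden 2: 64 ReLU units (combined shapes)
logits = h2 @ Ws[2] + bs[2]               # Output: 10 class scores
probs = np.exp(logits - logits.max())
probs /= probs.sum()                      # softmax → probabilities over digits 0–9

print(probs.shape)  # (10,)
```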

Real-world use cases

  • Credit card fraud detection (tabular features)
  • Customer churn prediction in SaaS
  • Medical diagnosis from structured lab results
  • Digit recognition in postal sorting (this exact task)
  • Basic recommendation systems with user/item features

Advantages

  • Strong accuracy with minimal tuning
  • Fast training on modern hardware
  • Learns non-linear decision boundaries
  • Hierarchical feature representation
  • Generalizes well with proper regularization

Disadvantages

  • More complex than linear models — harder to explain
  • Requires careful initialization
  • Sensitive to learning rate choice
  • Slower than linear models at inference

Deep dive — concepts explained

Architecture comparison

Click a row to switch architecture

Architecture    Shape                            Parameters  Accuracy  Train speed   Overfit risk
Shallow MLP     784 → 10                         7,850       ~92%      Very fast     Low (tends to underfit)
Standard MLP    784 → 128 → 64 → 10              109,386     >97%      Fast (~30s)   Low
Deep MLP        784 → 256 → 128 → 64 → 32 → 10   244,522     ~98%      Moderate      High
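The parameter counts in the comparison follow directly from the shapes: each fully connected layer contributes fan_in × fan_out weights plus one bias per output neuron. A quick check:

```python
# Parameter count for a fully connected MLP: weights + biases per layer.
def param_count(shape):
    return sum(i * o + o for i, o in zip(shape, shape[1:]))

print(param_count([784, 10]))                       # 7850   (shallow)
print(param_count([784, 128, 64, 10]))              # 109386 (standard)
print(param_count([784, 256, 128, 64, 32, 10]))     # 244522 (deep)
```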