Neural Networks Deep Dive: Build One From Scratch
Build a neural network from first principles, then in PyTorch, and understand backpropagation
Greetings, brave adventurer! Deep within the Oracle’s Tower lies a chamber where lifeless numbers awaken into something that learns. This quest, Neural Networks Deep Dive, reveals the machinery beneath all of deep learning. You will build a neural network by hand - neuron by neuron - then watch the same idea expressed in PyTorch. By the end you will understand the two great spells of the network: forward propagation, which makes a prediction, and backpropagation, which teaches the network from its mistakes.
Many invoke model.fit() without knowing what it does. After this quest, you will be among the few who truly understand.
📖 The Legend Behind This Quest
Inspired by the neurons of living brains, early artificers stacked simple units - each computing a weighted sum and a non-linear twist - into layers. Alone, a neuron is a line. Stacked with non-linear activations, layers can bend space into any shape, approximating nearly any function. The breakthrough was not the neuron but the teaching: backpropagation, which uses the chain rule of calculus to assign blame for an error backward through every layer, nudging each weight toward a better answer. This single algorithm powers everything from image recognition to the large language models of today.
🎯 Quest Objectives
By the time you complete this journey, you will have mastered:
Primary Objectives (Required for Quest Completion)
- Neurons and Layers - Explain what a single neuron computes and how layers stack
- Activation Functions - Use ReLU, sigmoid, and softmax and explain why non-linearity matters
- Forward Propagation - Trace an input through a network to a prediction
- Backpropagation & Gradient Descent - Implement learning by propagating error backward
Secondary Objectives (Bonus Achievements)
- Loss Functions - Choose cross-entropy for classification and MSE for regression
- PyTorch Autograd - Let the framework compute gradients automatically
- Hyperparameter Intuition - Reason about learning rate, epochs, and hidden size
Mastery Indicators
You’ll know you’ve truly mastered this quest when you can:
- Explain backpropagation to a peer using the chain rule
- Diagnose a network that fails to learn (e.g., dead ReLUs or a too-large learning rate)
- Translate a from-scratch network into a PyTorch
nn.Module - Read a falling loss curve and know when to stop
🗺️ Quest Prerequisites
📋 Knowledge Requirements
- Completion of Machine Learning Fundamentals (recommended)
- Comfortable with NumPy arrays and matrix shapes
- Basic calculus intuition (a derivative is a slope)
🛠️ System Requirements
- Modern operating system (Windows 10+, macOS 10.14+, or Linux)
- Python 3.10 or newer installed and on your PATH
- 8 GB RAM recommended; a GPU is optional for this quest
- Internet connection for installing packages
🧠 Skill Level Indicators
This ⚔️ Epic quest expects:
- You have trained at least one machine learning model
- You are ready to work with matrices and derivatives
- Ready for 5-7 hours of deep, hands-on learning
🌍 Choose Your Adventure Platform
Build an isolated environment with NumPy and PyTorch.
🍎 macOS Kingdom Path
Click to expand macOS instructions
```bash python3 -m venv ~/nn-quest && source ~/nn-quest/bin/activate pip install --upgrade pip pip install numpy torch matplotlib # Apple Silicon: PyTorch can use the MPS backend for acceleration python -c "import torch; print('torch', torch.__version__, 'mps', torch.backends.mps.is_available())" ```🪟 Windows Empire Path
Click to expand Windows instructions
```powershell python -m venv $HOME\nn-quest & $HOME\nn-quest\Scripts\Activate.ps1 pip install --upgrade pip pip install numpy torch matplotlib python -c "import torch; print('torch', torch.__version__, 'cuda', torch.cuda.is_available())" ```🐧 Linux Territory Path
Click to expand Linux instructions
```bash sudo apt update && sudo apt install -y python3-venv python3-pip python3 -m venv ~/nn-quest && source ~/nn-quest/bin/activate pip install --upgrade pip pip install numpy torch matplotlib python -c "import torch; print('torch', torch.__version__, 'cuda', torch.cuda.is_available())" ```☁️ Cloud Realms Path
Click to expand Cloud/Container instructions
```bash # Google Colab provides a free GPU. Select Runtime > Change runtime type > GPU. # torch and numpy are preinstalled; verify the device: python -c "import torch; print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'CPU')" ```🧙♂️ Chapter 1: The Neuron and the Forward Pass
A neuron computes a weighted sum of its inputs, adds a bias, and passes the result through an activation function. Stack many neurons into a layer, stack layers into a network, and you have a function that can learn.
⚔️ Skills You’ll Forge in This Chapter
- The math of a single neuron
- Why activation functions must be non-linear
- Forward propagation through a layer
🏗️ The Forward Pass by Hand
import numpy as np
def relu(z):
return np.maximum(0, z)
def sigmoid(z):
return 1.0 / (1.0 + np.exp(-z))
# A tiny network: 2 inputs -> 3 hidden neurons -> 1 output
np.random.seed(42)
W1 = np.random.randn(2, 3) * 0.5 # weights for layer 1
b1 = np.zeros(3)
W2 = np.random.randn(3, 1) * 0.5 # weights for layer 2
b2 = np.zeros(1)
x = np.array([0.7, -1.2]) # one input example
# Forward propagation: linear -> activation, layer by layer
z1 = x @ W1 + b1 # weighted sums of hidden layer
a1 = relu(z1) # non-linear activation
z2 = a1 @ W2 + b2
y_hat = sigmoid(z2) # output in (0, 1)
print("Hidden activations:", np.round(a1, 3))
print("Prediction:", float(y_hat))
Without the non-linear relu, stacking layers would collapse into a single linear function. Non-linearity is what lets networks bend.
🔍 Knowledge Check: The Forward Pass
- What does a single neuron compute before activation?
- Why would removing all activation functions make depth useless?
- What range does sigmoid squash its input into?
⚡ Quick Wins and Checkpoints
- Computed a forward pass: Your network produced a prediction
- Named two activations: You can describe ReLU and sigmoid
🧙♂️ Chapter 2: Backpropagation and Gradient Descent
Forward propagation makes a guess. Backpropagation teaches. It computes how much each weight contributed to the error using the chain rule, then gradient descent nudges each weight in the direction that lowers the loss.
⚔️ Skills You’ll Forge in This Chapter
- The intuition of the chain rule for layered functions
- Implementing one training step from scratch
- Watching the loss fall over epochs
🏗️ A Network That Learns XOR
XOR is the classic problem a single linear model cannot solve. A hidden layer can.
import numpy as np
np.random.seed(0)
X = np.array([[0,0],[0,1],[1,0],[1,1]], dtype=float)
y = np.array([[0],[1],[1],[0]], dtype=float) # XOR truth table
def sigmoid(z): return 1/(1+np.exp(-z))
def dsigmoid(a): return a*(1-a) # derivative given the activation
# Initialize a 2 -> 4 -> 1 network
W1 = np.random.randn(2,4); b1 = np.zeros((1,4))
W2 = np.random.randn(4,1); b2 = np.zeros((1,1))
lr = 0.5
for epoch in range(5000):
# Forward
a1 = sigmoid(X @ W1 + b1)
a2 = sigmoid(a1 @ W2 + b2)
# Loss (mean squared error) and its gradient at the output
loss = np.mean((a2 - y)**2)
d2 = (a2 - y) * dsigmoid(a2) # error at output layer
# Backpropagate the error to the hidden layer (chain rule)
d1 = (d2 @ W2.T) * dsigmoid(a1)
# Gradient descent: update weights against the gradient
W2 -= lr * a1.T @ d2
b2 -= lr * d2.sum(axis=0, keepdims=True)
W1 -= lr * X.T @ d1
b1 -= lr * d1.sum(axis=0, keepdims=True)
if epoch % 1000 == 0:
print(f"epoch {epoch:4d} loss {loss:.4f}")
print("Predictions:", np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2).ravel(), 2))
The loss falls toward zero and predictions approach the XOR table. You have implemented learning itself.
🔍 Knowledge Check: Backpropagation
- Why does gradient descent subtract the gradient from the weights?
- What role does the chain rule play across layers?
- What happens if the learning rate is far too large?
🧙♂️ Chapter 3: The Same Network in PyTorch
PyTorch computes gradients for you via autograd. You define the forward pass; the framework derives the backward pass automatically.
⚔️ Skills You’ll Forge in This Chapter
- Defining a model as an
nn.Module - Using an optimizer and a loss function
- The canonical training loop
🏗️ PyTorch Training Loop
import torch
import torch.nn as nn
X = torch.tensor([[0,0],[0,1],[1,0],[1,1]], dtype=torch.float32)
y = torch.tensor([[0],[1],[1],[0]], dtype=torch.float32)
model = nn.Sequential(
nn.Linear(2, 8),
nn.ReLU(),
nn.Linear(8, 1),
nn.Sigmoid(),
)
loss_fn = nn.BCELoss() # binary cross-entropy
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)
for epoch in range(2000):
optimizer.zero_grad() # clear old gradients
y_hat = model(X) # forward pass
loss = loss_fn(y_hat, y)
loss.backward() # autograd computes all gradients
optimizer.step() # update every weight
if epoch % 500 == 0:
print(f"epoch {epoch:4d} loss {loss.item():.4f}")
print("Predictions:", model(X).detach().round().ravel().tolist())
loss.backward() replaces the entire hand-written backpropagation from Chapter 2. This is why frameworks exist: you describe what to compute, and autograd handles how to differentiate it.
🔍 Knowledge Check: PyTorch
- What does
optimizer.zero_grad()prevent? - Which line triggers automatic differentiation?
- Why is cross-entropy preferred over MSE for classification?
🎮 Mastery Challenges
🟢 Novice Challenge: Swap the Activation
Objective: Observe how activation choice affects learning.
Requirements:
- Take the PyTorch XOR model and replace
nn.ReLU()withnn.Tanh() - Train both and compare final loss
- Note which converged faster
Validation: Both models reach a low loss; you can report the difference.
🟡 Intermediate Challenge: Tune the Learning Rate
Objective: Feel the effect of the most important hyperparameter.
Requirements:
- Train the from-scratch XOR network with
lrof 0.01, 0.5, and 5.0 - Record which converges, which is slow, and which diverges
- Explain why the largest rate fails
Validation: You can describe the trade-off in two sentences.
🔴 Advanced Challenge: A Real Classifier
Objective: Apply a PyTorch network to a real dataset.
Requirements:
- Load
load_digitsfrom scikit-learn (8x8 handwritten digits) - Build a 64 -> 64 -> 10 network with ReLU and softmax (
CrossEntropyLoss) - Train and report test accuracy above 0.90
Validation: Your model classifies held-out digits with high accuracy.
🏆 Quest Rewards & Achievements
🎖️ Badges Earned:
- 🏆 Network Weaver - You built a neural network from first principles
- ⚡ Gradient Channeler - You implemented backpropagation and gradient descent
🛠️ Skills Unlocked:
- From-Scratch Neural Networks - You understand the math, not just the API
- Backpropagation & Autograd - Manual gradients and PyTorch automation
🔓 Unlocked Quests:
- Deep Learning Frameworks - Master PyTorch and TensorFlow in depth
- Computer Vision - Apply convolutional networks to images
- Natural Language Processing - Apply networks to text and transformers
📊 Progression Points: +90 XP
🗺️ Next Steps in Your Journey
Continue the Main Story:
- 🎯 Deep Learning Frameworks - PyTorch vs. TensorFlow
Explore Side Adventures:
- ⚔️ Computer Vision - Convolutional networks for images
- ⚔️ Natural Language Processing - Transformers for text
Character Class Recommendations
💻 Software Developer: Continue to Deep Learning Frameworks
🏗️ System Engineer: Explore MLOps Engineering
📊 Data Scientist: Advance to Computer Vision
📚 Resources
Official Documentation
- PyTorch Tutorials - Official guided learning
- PyTorch nn module - Layers and loss functions
- PyTorch Autograd Mechanics - How gradients work
Community Resources
- 3Blue1Brown: Neural Networks - Visual intuition
- Stack Overflow: pytorch tag - Q&A
- Andrej Karpathy: Neural Networks Zero to Hero - Build it all from scratch
Learning Materials
- Deep Learning Book (Goodfellow et al.) - The canonical text
- CS231n Notes: Backpropagation - Stanford lecture notes
🤝 Quest Completion Checklist
- ✅ Completed all primary objectives
- ✅ Built a network from scratch that learns XOR
- ✅ Answered all knowledge check questions
- ✅ Completed at least one mastery challenge
- ✅ Explored the resource library
- ✅ Identified your next quest in the journey
🕸️ Knowledge Graph
Structured wiki-links connect this quest to the IT-Journey knowledge graph. Open the Obsidian Graph View to explore connections.
Level hub: [[Level 1101 - Machine Learning & AI]] Overworld: [[🏰 Overworld - Master Quest Map]] Prerequisites: [[Machine Learning Fundamentals: Supervised & Unsupervised Learning with Scikit-Learn]] Unlocks: [[Deep Learning Frameworks: PyTorch vs TensorFlow Comparison & Implementation]] · [[Computer Vision Mastery: Image Classification, Object Detection & Segmentation]] · [[Natural Language Processing: Text Analysis, Transformers & LLMs with Python]] Obsidian docs: [[Obsidian Knowledge Graph and Wiki Links]]
🎁 Rewards
Badges
- 🏆 Network Weaver - Built a neural network from first principles
- ⚡ Gradient Channeler - Implemented backpropagation and gradient descent
Skills unlocked
- 🛠️ From-Scratch Neural Networks
- ⚡ Backpropagation & Autograd
Features unlocked
- Access to the deep learning frameworks, vision, and NLP quests
🕸️ Quest Network
Click a node to open the quest · ⌘/Ctrl-click for a new tab · drag to reposition · scroll to zoom.
Referenced by
- Loading…