Skip to main content
Settings
Search
Appearance
Theme Mode
About
Jekyll v3.10.0
Environment Production
Last Build
2026-07-02 02:06 UTC
Current Environment Production
Build Time Jul 02, 02:06
Jekyll v3.10.0
Build env (JEKYLL_ENV) production
Page Location
Page Info
Layout quest
Collection quests
Path _quests/1101/neural-networks.md
URL /quests/1101/neural-networks/
Date 2025-11-29
Theme Skin
SVG Backgrounds
Layer Opacity
0.6
0.04
0.08

Neural Networks Deep Dive: Build One From Scratch

Build a neural network from scratch, then in PyTorch, mastering neurons, layers, activations, forward propagation, backpropagation, and gradient descent.

Lvl 1101Master 🏰 Main Quest ⚔️ Epic 5-7 hours

Neural Networks Deep Dive: Build One From Scratch

Build a neural network from first principles, then in PyTorch, and understand backpropagation

Primary Tech
🛠️ pytorch
Skill Focus
Ai ml
Series
AI/ML Mastery
Author
IT-Journey Team
XP Range
⚡ 7000-8000

Greetings, brave adventurer! Deep within the Oracle’s Tower lies a chamber where lifeless numbers awaken into something that learns. This quest, Neural Networks Deep Dive, reveals the machinery beneath all of deep learning. You will build a neural network by hand - neuron by neuron - then watch the same idea expressed in PyTorch. By the end you will understand the two great spells of the network: forward propagation, which makes a prediction, and backpropagation, which teaches the network from its mistakes.

Many invoke model.fit() without knowing what it does. After this quest, you will be among the few who truly understand.

📖 The Legend Behind This Quest

Inspired by the neurons of living brains, early artificers stacked simple units - each computing a weighted sum and a non-linear twist - into layers. Alone, a neuron is a line. Stacked with non-linear activations, layers can bend space into any shape, approximating nearly any function. The breakthrough was not the neuron but the teaching: backpropagation, which uses the chain rule of calculus to assign blame for an error backward through every layer, nudging each weight toward a better answer. This single algorithm powers everything from image recognition to the large language models of today.

🎯 Quest Objectives

By the time you complete this journey, you will have mastered:

Primary Objectives (Required for Quest Completion)

  • Neurons and Layers - Explain what a single neuron computes and how layers stack
  • Activation Functions - Use ReLU, sigmoid, and softmax and explain why non-linearity matters
  • Forward Propagation - Trace an input through a network to a prediction
  • Backpropagation & Gradient Descent - Implement learning by propagating error backward

Secondary Objectives (Bonus Achievements)

  • Loss Functions - Choose cross-entropy for classification and MSE for regression
  • PyTorch Autograd - Let the framework compute gradients automatically
  • Hyperparameter Intuition - Reason about learning rate, epochs, and hidden size

Mastery Indicators

You’ll know you’ve truly mastered this quest when you can:

  • Explain backpropagation to a peer using the chain rule
  • Diagnose a network that fails to learn (e.g., dead ReLUs or a too-large learning rate)
  • Translate a from-scratch network into a PyTorch nn.Module
  • Read a falling loss curve and know when to stop

🗺️ Quest Prerequisites

📋 Knowledge Requirements

  • Completion of Machine Learning Fundamentals (recommended)
  • Comfortable with NumPy arrays and matrix shapes
  • Basic calculus intuition (a derivative is a slope)

🛠️ System Requirements

  • Modern operating system (Windows 10+, macOS 10.14+, or Linux)
  • Python 3.10 or newer installed and on your PATH
  • 8 GB RAM recommended; a GPU is optional for this quest
  • Internet connection for installing packages

🧠 Skill Level Indicators

This ⚔️ Epic quest expects:

  • You have trained at least one machine learning model
  • You are ready to work with matrices and derivatives
  • Ready for 5-7 hours of deep, hands-on learning

🌍 Choose Your Adventure Platform

Build an isolated environment with NumPy and PyTorch.

🍎 macOS Kingdom Path

Click to expand macOS instructions ```bash python3 -m venv ~/nn-quest && source ~/nn-quest/bin/activate pip install --upgrade pip pip install numpy torch matplotlib # Apple Silicon: PyTorch can use the MPS backend for acceleration python -c "import torch; print('torch', torch.__version__, 'mps', torch.backends.mps.is_available())" ```

🪟 Windows Empire Path

Click to expand Windows instructions ```powershell python -m venv $HOME\nn-quest & $HOME\nn-quest\Scripts\Activate.ps1 pip install --upgrade pip pip install numpy torch matplotlib python -c "import torch; print('torch', torch.__version__, 'cuda', torch.cuda.is_available())" ```

🐧 Linux Territory Path

Click to expand Linux instructions ```bash sudo apt update && sudo apt install -y python3-venv python3-pip python3 -m venv ~/nn-quest && source ~/nn-quest/bin/activate pip install --upgrade pip pip install numpy torch matplotlib python -c "import torch; print('torch', torch.__version__, 'cuda', torch.cuda.is_available())" ```

☁️ Cloud Realms Path

Click to expand Cloud/Container instructions ```bash # Google Colab provides a free GPU. Select Runtime > Change runtime type > GPU. # torch and numpy are preinstalled; verify the device: python -c "import torch; print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'CPU')" ```

🧙‍♂️ Chapter 1: The Neuron and the Forward Pass

A neuron computes a weighted sum of its inputs, adds a bias, and passes the result through an activation function. Stack many neurons into a layer, stack layers into a network, and you have a function that can learn.

⚔️ Skills You’ll Forge in This Chapter

  • The math of a single neuron
  • Why activation functions must be non-linear
  • Forward propagation through a layer

🏗️ The Forward Pass by Hand

import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A tiny network: 2 inputs -> 3 hidden neurons -> 1 output
np.random.seed(42)
W1 = np.random.randn(2, 3) * 0.5   # weights for layer 1
b1 = np.zeros(3)
W2 = np.random.randn(3, 1) * 0.5   # weights for layer 2
b2 = np.zeros(1)

x = np.array([0.7, -1.2])          # one input example

# Forward propagation: linear -> activation, layer by layer
z1 = x @ W1 + b1                   # weighted sums of hidden layer
a1 = relu(z1)                      # non-linear activation
z2 = a1 @ W2 + b2
y_hat = sigmoid(z2)                # output in (0, 1)

print("Hidden activations:", np.round(a1, 3))
print("Prediction:", float(y_hat))

Without the non-linear relu, stacking layers would collapse into a single linear function. Non-linearity is what lets networks bend.

🔍 Knowledge Check: The Forward Pass

  • What does a single neuron compute before activation?
  • Why would removing all activation functions make depth useless?
  • What range does sigmoid squash its input into?

⚡ Quick Wins and Checkpoints

  • Computed a forward pass: Your network produced a prediction
  • Named two activations: You can describe ReLU and sigmoid

🧙‍♂️ Chapter 2: Backpropagation and Gradient Descent

Forward propagation makes a guess. Backpropagation teaches. It computes how much each weight contributed to the error using the chain rule, then gradient descent nudges each weight in the direction that lowers the loss.

⚔️ Skills You’ll Forge in This Chapter

  • The intuition of the chain rule for layered functions
  • Implementing one training step from scratch
  • Watching the loss fall over epochs

🏗️ A Network That Learns XOR

XOR is the classic problem a single linear model cannot solve. A hidden layer can.

import numpy as np

np.random.seed(0)
X = np.array([[0,0],[0,1],[1,0],[1,1]], dtype=float)
y = np.array([[0],[1],[1],[0]], dtype=float)   # XOR truth table

def sigmoid(z): return 1/(1+np.exp(-z))
def dsigmoid(a): return a*(1-a)                # derivative given the activation

# Initialize a 2 -> 4 -> 1 network
W1 = np.random.randn(2,4); b1 = np.zeros((1,4))
W2 = np.random.randn(4,1); b2 = np.zeros((1,1))
lr = 0.5

for epoch in range(5000):
    # Forward
    a1 = sigmoid(X @ W1 + b1)
    a2 = sigmoid(a1 @ W2 + b2)

    # Loss (mean squared error) and its gradient at the output
    loss = np.mean((a2 - y)**2)
    d2 = (a2 - y) * dsigmoid(a2)               # error at output layer

    # Backpropagate the error to the hidden layer (chain rule)
    d1 = (d2 @ W2.T) * dsigmoid(a1)

    # Gradient descent: update weights against the gradient
    W2 -= lr * a1.T @ d2
    b2 -= lr * d2.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d1
    b1 -= lr * d1.sum(axis=0, keepdims=True)

    if epoch % 1000 == 0:
        print(f"epoch {epoch:4d}  loss {loss:.4f}")

print("Predictions:", np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2).ravel(), 2))

The loss falls toward zero and predictions approach the XOR table. You have implemented learning itself.

🔍 Knowledge Check: Backpropagation

  • Why does gradient descent subtract the gradient from the weights?
  • What role does the chain rule play across layers?
  • What happens if the learning rate is far too large?

🧙‍♂️ Chapter 3: The Same Network in PyTorch

PyTorch computes gradients for you via autograd. You define the forward pass; the framework derives the backward pass automatically.

⚔️ Skills You’ll Forge in This Chapter

  • Defining a model as an nn.Module
  • Using an optimizer and a loss function
  • The canonical training loop

🏗️ PyTorch Training Loop

import torch
import torch.nn as nn

X = torch.tensor([[0,0],[0,1],[1,0],[1,1]], dtype=torch.float32)
y = torch.tensor([[0],[1],[1],[0]], dtype=torch.float32)

model = nn.Sequential(
    nn.Linear(2, 8),
    nn.ReLU(),
    nn.Linear(8, 1),
    nn.Sigmoid(),
)

loss_fn = nn.BCELoss()                      # binary cross-entropy
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)

for epoch in range(2000):
    optimizer.zero_grad()                   # clear old gradients
    y_hat = model(X)                        # forward pass
    loss = loss_fn(y_hat, y)
    loss.backward()                         # autograd computes all gradients
    optimizer.step()                        # update every weight
    if epoch % 500 == 0:
        print(f"epoch {epoch:4d}  loss {loss.item():.4f}")

print("Predictions:", model(X).detach().round().ravel().tolist())

loss.backward() replaces the entire hand-written backpropagation from Chapter 2. This is why frameworks exist: you describe what to compute, and autograd handles how to differentiate it.

🔍 Knowledge Check: PyTorch

  • What does optimizer.zero_grad() prevent?
  • Which line triggers automatic differentiation?
  • Why is cross-entropy preferred over MSE for classification?

🎮 Mastery Challenges

🟢 Novice Challenge: Swap the Activation

Objective: Observe how activation choice affects learning.

Requirements:

  • Take the PyTorch XOR model and replace nn.ReLU() with nn.Tanh()
  • Train both and compare final loss
  • Note which converged faster

Validation: Both models reach a low loss; you can report the difference.

🟡 Intermediate Challenge: Tune the Learning Rate

Objective: Feel the effect of the most important hyperparameter.

Requirements:

  • Train the from-scratch XOR network with lr of 0.01, 0.5, and 5.0
  • Record which converges, which is slow, and which diverges
  • Explain why the largest rate fails

Validation: You can describe the trade-off in two sentences.

🔴 Advanced Challenge: A Real Classifier

Objective: Apply a PyTorch network to a real dataset.

Requirements:

  • Load load_digits from scikit-learn (8x8 handwritten digits)
  • Build a 64 -> 64 -> 10 network with ReLU and softmax (CrossEntropyLoss)
  • Train and report test accuracy above 0.90

Validation: Your model classifies held-out digits with high accuracy.

🏆 Quest Rewards & Achievements

🎖️ Badges Earned:

  • 🏆 Network Weaver - You built a neural network from first principles
  • Gradient Channeler - You implemented backpropagation and gradient descent

🛠️ Skills Unlocked:

  • From-Scratch Neural Networks - You understand the math, not just the API
  • Backpropagation & Autograd - Manual gradients and PyTorch automation

🔓 Unlocked Quests:

  • Deep Learning Frameworks - Master PyTorch and TensorFlow in depth
  • Computer Vision - Apply convolutional networks to images
  • Natural Language Processing - Apply networks to text and transformers

📊 Progression Points: +90 XP

🗺️ Next Steps in Your Journey

Continue the Main Story:

Explore Side Adventures:

Character Class Recommendations

💻 Software Developer: Continue to Deep Learning Frameworks
🏗️ System Engineer: Explore MLOps Engineering
📊 Data Scientist: Advance to Computer Vision

📚 Resources

Official Documentation

Community Resources

Learning Materials

🤝 Quest Completion Checklist

  • ✅ Completed all primary objectives
  • ✅ Built a network from scratch that learns XOR
  • ✅ Answered all knowledge check questions
  • ✅ Completed at least one mastery challenge
  • ✅ Explored the resource library
  • ✅ Identified your next quest in the journey

🕸️ Knowledge Graph

Structured wiki-links connect this quest to the IT-Journey knowledge graph. Open the Obsidian Graph View to explore connections.

Level hub: [[Level 1101 - Machine Learning & AI]] Overworld: [[🏰 Overworld - Master Quest Map]] Prerequisites: [[Machine Learning Fundamentals: Supervised & Unsupervised Learning with Scikit-Learn]] Unlocks: [[Deep Learning Frameworks: PyTorch vs TensorFlow Comparison & Implementation]] · [[Computer Vision Mastery: Image Classification, Object Detection & Segmentation]] · [[Natural Language Processing: Text Analysis, Transformers & LLMs with Python]] Obsidian docs: [[Obsidian Knowledge Graph and Wiki Links]]

🎁 Rewards

90 XP

Badges

  • 🏆 Network Weaver - Built a neural network from first principles
  • ⚡ Gradient Channeler - Implemented backpropagation and gradient descent

Skills unlocked

  • 🛠️ From-Scratch Neural Networks
  • ⚡ Backpropagation & Autograd

Features unlocked

  • Access to the deep learning frameworks, vision, and NLP quests

🕸️ Quest Network

Loading quest graph…

Click a node to open the quest · ⌘/Ctrl-click for a new tab · drag to reposition · scroll to zoom.