Skip to main content
Settings
Search
Appearance
Theme Mode
About
Jekyll v3.10.0
Environment Production
Last Build
2026-07-02 02:06 UTC
Current Environment Production
Build Time Jul 02, 02:06
Jekyll v3.10.0
Build env (JEKYLL_ENV) production
Page Location
Page Info
Layout quest
Collection quests
Path _quests/1101/computer-vision.md
URL /quests/1101/computer-vision/
Date 2025-11-29
Theme Skin
SVG Backgrounds
Layer Opacity
0.6
0.04
0.08

Computer Vision Mastery: CNNs and Transfer Learning

Build computer vision models in PyTorch: learn convolutions, train CNNs for image classification, and fine-tune pretrained models with transfer learning.

Table of Contents

Lvl 1101Master 🏰 Main Quest 🔴 Hard 4-5 hours

Computer Vision Mastery: CNNs and Transfer Learning

Build computer vision models with convolutions, CNNs, image classification, and transfer learning

Primary Tech
🛠️ pytorch
Skill Focus
Ai ml
Series
AI/ML Mastery
Author
IT-Journey Team
XP Range
⚡ 7000-8000

Greetings, brave adventurer! You have taught machines to read numbers and to read words. Now you grant them sight. This quest, Computer Vision Mastery, leads you into the Tower of the Seeing Eye, where convolutional neural networks learn to recognize objects in raw pixels. By its end you will have trained a network that classifies images and bent a giant pretrained model to a task of your own choosing.

Whether you have only tagged photos or you already wonder how a self-driving car “sees” a stop sign, this adventure reveals the machinery of machine perception.

📖 The Legend Behind This Quest

A dense neural network treats an image as a flat list of pixels, blind to the fact that nearby pixels form edges, edges form shapes, and shapes form objects. The convolution changed everything: a small filter slides across the image, detecting local patterns - an edge here, a corner there - and stacking these detectors into deeper and deeper layers builds a hierarchy from edges to eyes to faces.

This insight, crowned by the 2012 ImageNet victory of a deep CNN, launched the modern vision era. Today you can stand on the shoulders of giants: take a network already trained on millions of images and transfer its learned features to your own small dataset.

🎯 Quest Objectives

By the time you complete this journey, you will have mastered:

Primary Objectives (Required for Quest Completion)

  • Convolutions - Explain how a filter detects local patterns in an image
  • CNN Architecture - Stack convolution, activation, and pooling layers into a classifier
  • Image Classification - Train and evaluate a CNN on a real image dataset
  • Transfer Learning - Fine-tune a pretrained model on a new task with little data

Secondary Objectives (Bonus Achievements)

  • Data Augmentation - Expand a dataset with flips, crops, and rotations
  • Feature Maps - Inspect what early layers learn
  • Confusion Analysis - Find which classes a model confuses most

Mastery Indicators

You’ll know you’ve truly mastered this quest when you can:

  • Explain why CNNs need far fewer parameters than dense nets on images
  • Describe the role of pooling in building translation tolerance
  • Decide when to fine-tune versus train from scratch
  • Diagnose an overfit vision model and respond

🗺️ Quest Prerequisites

📋 Knowledge Requirements

  • Completion of the Deep Learning Frameworks quest (tensors, training loop)
  • Comfortable building and training a PyTorch model
  • Basic understanding of layers and activations

🛠️ System Requirements

  • Modern operating system (Windows 10+, macOS 10.14+, or Linux)
  • Python 3.10 or newer on your PATH
  • A text editor or IDE (VS Code) or a Jupyter environment
  • Internet connection (torchvision downloads datasets and models)

🧠 Skill Level Indicators

This 🔴 Hard quest expects:

  • You can write a forward/loss/backward/step loop
  • You are ready to reason about image tensors and channels
  • Ready for 4-5 hours of focused, hands-on learning

🌍 Choose Your Adventure Platform

The torchvision library is cross-platform. A GPU makes training far quicker but is not required for the small models here.

🍎 macOS Kingdom Path

Click to expand macOS instructions ```bash python3 -m venv ~/cv-quest && source ~/cv-quest/bin/activate pip install --upgrade pip pip install torch torchvision numpy matplotlib # Apple Silicon can accelerate with the MPS backend python -c "import torch, torchvision; print('torch', torch.__version__, 'mps', torch.backends.mps.is_available())" ```

🪟 Windows Empire Path

Click to expand Windows instructions ```powershell python -m venv $HOME\cv-quest & $HOME\cv-quest\Scripts\Activate.ps1 pip install --upgrade pip pip install torch torchvision numpy matplotlib python -c "import torch, torchvision; print('torch', torch.__version__, 'cuda', torch.cuda.is_available())" ```

🐧 Linux Territory Path

Click to expand Linux instructions ```bash sudo apt update && sudo apt install -y python3-venv python3-pip python3 -m venv ~/cv-quest && source ~/cv-quest/bin/activate pip install --upgrade pip pip install torch torchvision numpy matplotlib python -c "import torch, torchvision; print('torch', torch.__version__, 'cuda', torch.cuda.is_available())" ```

☁️ Cloud Realms Path

Click to expand Cloud/Container instructions ```bash # Google Colab ships torch and torchvision with a free GPU. # Enable it under Runtime > Change runtime type > GPU, then train in minutes. python -c "import torchvision; print('torchvision ready')" ```

🧙‍♂️ Chapter 1: Convolutions - Teaching a Filter to See Edges

A convolution slides a tiny weight grid (a kernel) across an image, multiplying and summing at each position to produce a feature map. Different kernels detect different patterns - one finds vertical edges, another horizontal. The network learns the kernels that matter for the task.

⚔️ Skills You’ll Forge in This Chapter

  • Understanding kernels, strides, and feature maps
  • Seeing why convolutions share parameters across the image
  • Recognizing the role of pooling

🏗️ A Convolution Detecting Edges

import torch
import torch.nn.functional as F

# A 1x1x5x5 grayscale image: a bright square on a dark background
img = torch.zeros(1, 1, 5, 5)
img[0, 0, 1:4, 1:4] = 1.0

# A vertical-edge detector kernel (Sobel-like)
kernel = torch.tensor([[-1.0, 0.0, 1.0],
                       [-1.0, 0.0, 1.0],
                       [-1.0, 0.0, 1.0]]).view(1, 1, 3, 3)

edges = F.conv2d(img, kernel)
print("Feature map (vertical edges):")
print(edges.squeeze().round())
# Strong responses appear where brightness changes left-to-right.

Two ideas make convolutions efficient on images. Parameter sharing: the same small kernel scans the whole image, so a dense net’s millions of weights collapse to a handful. Translation tolerance: an edge is detected wherever it appears. Pooling (e.g. max-pooling) then shrinks feature maps, keeping the strongest signal and discarding exact position - so the network recognizes a cat whether it sits left or right.

🔍 Knowledge Check: Convolutions

  • What does parameter sharing save compared to a dense layer?
  • What is a feature map?
  • What does max-pooling give up, and what does it gain?

⚡ Quick Wins and Checkpoints

  • Environment ready: import torchvision works
  • Ran a convolution: You produced an edge feature map

🧙‍♂️ Chapter 2: Building and Training a CNN

Stack convolutions, activations, and pooling to extract features, then flatten and feed a small dense head to classify. We train it on FashionMNIST - 28x28 grayscale images of clothing in ten classes.

⚔️ Skills You’ll Forge in This Chapter

  • Defining a CNN with nn.Module
  • Loading image data with torchvision
  • Training and evaluating on held-out images

🏗️ A Complete Image Classifier

import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"
tfm = transforms.ToTensor()

train = datasets.FashionMNIST("./data", train=True, download=True, transform=tfm)
test = datasets.FashionMNIST("./data", train=False, download=True, transform=tfm)
train_dl = DataLoader(train, batch_size=128, shuffle=True)
test_dl = DataLoader(test, batch_size=256)

class SmallCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 28 -> 14
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2), # 14 -> 7
        )
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(32 * 7 * 7, 10))

    def forward(self, x):
        return self.head(self.features(x))

model = SmallCNN().to(device)
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(2):                      # 2 epochs is enough to see learning
    model.train()
    for xb, yb in train_dl:
        xb, yb = xb.to(device), yb.to(device)
        opt.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        opt.step()

    # Evaluate on the held-out test set
    model.eval()
    correct = 0
    with torch.no_grad():
        for xb, yb in test_dl:
            xb, yb = xb.to(device), yb.to(device)
            correct += (model(xb).argmax(1) == yb).sum().item()
    print(f"epoch {epoch}  test accuracy {correct / len(test):.3f}")

After two short epochs this tiny model already classifies clothing with high accuracy - all from the convolutions discovering edges, textures, and shapes on their own.

🔍 Knowledge Check: Training a CNN

  • Why does each MaxPool2d(2) halve the spatial size?
  • What does CrossEntropyLoss expect as its inputs?
  • Why do we wrap evaluation in torch.no_grad()?

🧙‍♂️ Chapter 3: Transfer Learning - Standing on Giants

Training from scratch needs lots of data. Transfer learning reuses a model already trained on millions of images (like ResNet on ImageNet): keep its feature extractor (already trained to detect edges, textures, and shapes that generalize across images), replace only the final classification layer, and fine-tune. You get strong results from a few hundred images.

⚔️ Skills You’ll Forge in This Chapter

  • Loading a pretrained model from torchvision
  • Freezing the backbone and swapping the head
  • Knowing when to fine-tune versus train fresh

🏗️ Adapting ResNet to a New Task

import torch
import torch.nn as nn
from torchvision import models

# Load ResNet-18 pretrained on ImageNet
net = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the learned feature extractor so we don't disturb it
for p in net.parameters():
    p.requires_grad = False

# Replace the final layer for OUR task (e.g. 5 custom classes)
num_classes = 5
net.fc = nn.Linear(net.fc.in_features, num_classes)

# Only the new head's parameters will train
trainable = [p for p in net.parameters() if p.requires_grad]
print("Trainable parameter tensors:", len(trainable))  # just the new fc layer
optimizer = torch.optim.Adam(trainable, lr=1e-3)
# Train this on your small dataset exactly like Chapter 2's loop.

Because ResNet’s early layers already know edges, textures, and shapes that generalize across images, you only teach the final layer to map those features to your classes. This is why transfer learning is the default first move in applied computer vision.

🔍 Knowledge Check: Transfer Learning

  • Why do we freeze the backbone’s parameters?
  • Which layer must always change for a new task?
  • Why does transfer learning need far less data than training from scratch?

🎮 Mastery Challenges

🟢 Novice Challenge: Visualize a Feature Map

Objective: See what a convolution detects.

Requirements:

  • Apply two different 3x3 kernels to a sample image
  • Plot both feature maps with matplotlib
  • Describe what each kernel emphasizes

Validation: The two feature maps highlight visibly different structures (e.g. vertical vs horizontal edges).

🟡 Intermediate Challenge: Train and Confuse

Objective: Train the Chapter 2 CNN and analyze its mistakes.

Requirements:

  • Train for 3 epochs and report test accuracy
  • Build a confusion matrix over the test set
  • Name the two classes the model confuses most

Validation: Accuracy clears 0.85 and you identify a real confusion pair (e.g. shirt vs coat).

🔴 Advanced Challenge: Transfer to Your Own Classes

Objective: Fine-tune a pretrained model on a small custom dataset.

Requirements:

  • Gather or download a small set of images in 2-3 classes
  • Freeze a pretrained backbone and replace the head
  • Train the head and report held-out accuracy

Validation: The fine-tuned model meaningfully outperforms random guessing on your held-out images.

🏆 Quest Rewards & Achievements

🎖️ Badges Earned:

  • 🏆 Sight Giver - You trained a network that classifies images
  • 🔭 Transfer Sage - You adapted a pretrained model to a new task

🛠️ Skills Unlocked:

  • CNN Construction in PyTorch - Convolution, pooling, and a classifier head
  • Transfer Learning - Reusing pretrained features efficiently

🔓 Unlocked Quests:

  • MLOps Engineering - Deploy and monitor vision models
  • AI Ethics - Reason about fairness in vision systems

📊 Progression Points: +75 XP

🗺️ Next Steps in Your Journey

Continue the Main Story:

Explore Side Adventures:

Character Class Recommendations

💻 Software Developer: Continue to MLOps Engineering
🏗️ System Engineer: Explore MLOps Engineering
📊 Data Scientist: Advance to Natural Language Processing

📚 Resources

Official Documentation

Community Resources

Learning Materials

🤝 Quest Completion Checklist

  • ✅ Completed all primary objectives
  • ✅ Trained a CNN and evaluated it on held-out images
  • ✅ Answered all knowledge check questions
  • ✅ Completed at least one mastery challenge
  • ✅ Explored the resource library
  • ✅ Identified your next quest in the journey

🕸️ Knowledge Graph

Structured wiki-links connect this quest to the IT-Journey knowledge graph. Open the Obsidian Graph View to explore connections.

Level hub: [[Level 1101 - Machine Learning & AI]] Overworld: [[🏰 Overworld - Master Quest Map]] Required: [[Deep Learning Frameworks: PyTorch vs TensorFlow Comparison & Implementation]] Unlocks: [[MLOps Engineering: CI/CD Pipelines for Machine Learning Production]] · [[AI Ethics and Responsible AI: Bias Detection, Fairness & Governance]] Obsidian docs: [[Obsidian Knowledge Graph and Wiki Links]]

🎁 Rewards

75 XP

Badges

  • 🏆 Sight Giver - Trained a network that classifies images
  • 🔭 Transfer Sage - Adapted a pretrained model to a new task

Skills unlocked

  • 🛠️ CNN Construction in PyTorch
  • 🧠 Transfer Learning

Features unlocked

  • Access to the MLOps and AI Ethics quests of Level 1101

🕸️ Quest Network

Loading quest graph…

Click a node to open the quest · ⌘/Ctrl-click for a new tab · drag to reposition · scroll to zoom.