Anchoring the Drifting Agent: State Persistence and Drift Prevention
By IT-Journey Team
Detect, measure, and prevent context drift in long-running GitHub Copilot agent sessions — implement state checkpointing, drift detection signals, and recovery procedures.
Estimated reading time: 7 minutes
The Tide Fields south of the Citadel are treacherous — the currents shift without warning, and many agents sent to cross them have returned to shore hundreds of leagues from their intended destination. They set off correctly, but drift, session by session, until they no longer remember the original shore.
🗺️ Quest Network Position
graph LR
Q8[✅ Q8: Memory Vaults] --> Q9[🎯 Q9: Drifting Agent]
Q9 --> Q10[🔜 Q10: Tool Planes]
style Q9 fill:#4CAF50,stroke:#2E7D32,stroke-width:4px,color:#fff
🎯 Quest Objectives
- Define context drift — identify signals that indicate an agent has lost its original task context
- Implement state checkpointing — save agent state at defined intervals during long-running tasks
- Build a drift detector — script that compares current agent output against original task intent
- Test recovery from drift — simulate a drifted agent and restore it using a checkpoint
- Document intervention triggers — define exactly when drift causes automatic escalation to a human
⚔️ The Quest Begins
Chapter 1 — Recognising Context Drift
Context drift occurs when an agent’s understanding of its task shifts as its context window fills with intermediate results, tool outputs, and accumulated messages.
Drift signals to watch for:
| Signal | Example | Severity |
|---|---|---|
| Task scope creep | Agent starts modifying files not in the original plan | 🔴 High |
| Original task forgotten | Agent’s PR description no longer references the original issue | 🔴 High |
| Contradictory decisions | Agent reverses an earlier decision without noting it | 🟡 Medium |
| Excessive sub-tasks | Agent creates 10+ sub-issues for a 2-step task | 🟡 Medium |
| Repeated actions | Agent rewrites the same file multiple times | 🟡 Medium |
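Two of the signals in the table, task scope creep and repeated actions, can be checked mechanically. The following is a minimal sketch, assuming you can collect the planned file list, the modified file list, and a per-write log from your agent run. The names `DriftSignal`, `check_signals`, `needs_human`, and the `max_rewrites` limit are hypothetical and not part of any Copilot API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class DriftSignal:
    name: str
    severity: str  # "high" or "medium", mirroring the table above
    detail: str


def check_signals(planned_files, modified_files, write_log, max_rewrites=2):
    """Return the drift signals found in one agent run.

    planned_files:  paths named in the original plan
    modified_files: paths the agent actually changed
    write_log:      one path per write operation (hypothetical log format)
    """
    signals = []

    # Task scope creep: files touched that were never in the plan.
    unplanned = sorted(set(modified_files) - set(planned_files))
    if unplanned:
        signals.append(DriftSignal("scope_creep", "high", ", ".join(unplanned)))

    # Repeated actions: the same file written more often than max_rewrites.
    for path, count in sorted(Counter(write_log).items()):
        if count > max_rewrites:
            detail = f"{path} written {count} times"
            signals.append(DriftSignal("repeated_action", "medium", detail))

    return signals


def needs_human(signals):
    """Escalate as soon as any high-severity signal is present."""
    return any(s.severity == "high" for s in signals)
```

Treating every high-severity signal as an escalation trigger is one possible policy; the medium-severity signals could instead be counted and escalated once several accumulate.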
Chapter 2 — Implementing State Checkpoints
Exercise 9.1: Add periodic checkpointing to your agent workflow.
# .github/workflows/agent-with-checkpointing.yml
name: Agent with State Checkpointing
on:
  issues:
    types: [labeled]
jobs:
  agent-run:
    runs-on: ubuntu-latest
    timeout-minutes: 45
    steps:
      - uses: actions/checkout@v4
      - name: Record initial task intent
        id: initial_intent
        env:
          # Pass the issue body through the environment instead of interpolating
          # it into the script, so quotes or shell syntax in the issue cannot
          # break (or hijack) this step
          ISSUE_BODY: ${{ github.event.issue.body }}
          ISSUE_NUMBER: ${{ github.event.issue.number }}
        run: |
          # Save the original task as the anchor; never overwrite this
          mkdir -p .agent-memory
          jq -n \
            --arg original_task "$ISSUE_BODY" \
            --argjson issue_number "$ISSUE_NUMBER" \
            --arg recorded_at "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
            --arg anchor_hash "$(printf '%s' "$ISSUE_BODY" | sha256sum | cut -d' ' -f1)" \
            '{original_task: $original_task, issue_number: $issue_number, recorded_at: $recorded_at, anchor_hash: $anchor_hash}' \
            > .agent-memory/initial-intent.json
          echo "✅ Initial task intent anchored"
      - name: Checkpoint 1 (after planning)
        run: |
          python3 work/gh-600/scripts/save_checkpoint.py \
            --stage planning \
            --state-file .agent-memory/checkpoint-planning.json
      - name: Validate drift (planning vs intent)
        run: |
          # Exits non-zero on drift, which stops the job before execution
          python3 work/gh-600/scripts/detect_drift.py \
            --intent .agent-memory/initial-intent.json \
            --checkpoint .agent-memory/checkpoint-planning.json \
            --threshold 0.7
      - name: Checkpoint 2 (after execution)
        run: |
          python3 work/gh-600/scripts/save_checkpoint.py \
            --stage execution \
            --state-file .agent-memory/checkpoint-execution.json
      - name: Upload checkpoints
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: agent-checkpoints-${{ github.run_id }}
          path: .agent-memory/
          retention-days: 30
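The workflow calls `save_checkpoint.py`, but the quest does not show that script. Here is a minimal sketch of what it might look like. The `--stage` and `--state-file` flags and the `task_summary` field come from this quest; everything else is an assumption, in particular the `AGENT_TASK_SUMMARY` environment variable used to pass in the agent's current summary and the `stage` and `saved_at` fields.

```python
import argparse
import json
import os
from datetime import datetime, timezone
from pathlib import Path


def build_checkpoint(stage, task_summary):
    """Assemble the record; task_summary is the field the drift detector reads."""
    return {
        "stage": stage,
        "task_summary": task_summary,
        "saved_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


def save_checkpoint(state_file, checkpoint):
    """Write via a temp file and rename, so a crash never leaves a partial file."""
    path = Path(state_file)
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(checkpoint, indent=2), encoding="utf-8")
    os.replace(tmp, path)  # atomic when both paths are on the same filesystem
    return path


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--stage", required=True)
    parser.add_argument("--state-file", required=True)
    args = parser.parse_args(argv)

    # Assumption: the agent's current task summary arrives via this variable.
    summary = os.environ.get("AGENT_TASK_SUMMARY", "")
    save_checkpoint(args.state_file, build_checkpoint(args.stage, summary))
    print(f"Checkpoint saved: {args.state_file} (stage: {args.stage})")
    return 0
```

To use it as the command-line script the workflow expects, end the file with `if __name__ == "__main__": raise SystemExit(main())`. An empty summary is still saved, and the drift detector will then report drift for that stage unless the original task is also empty.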
Chapter 3 — Building the Drift Detector
Exercise 9.2: Create a drift detection script.
# work/gh-600/scripts/detect_drift.py
"""Detects context drift by comparing checkpoint against original task intent."""
import argparse
import json
import sys
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; 1.0 means identical strings."""
    return SequenceMatcher(None, a, b).ratio()


def detect_drift(intent_file: str, checkpoint_file: str, threshold: float) -> bool:
    """Return True when the checkpoint's task summary has drifted from the intent."""
    with open(intent_file) as f:
        intent = json.load(f)
    with open(checkpoint_file) as f:
        checkpoint = json.load(f)

    original_task = intent.get("original_task", "")
    current_task_summary = checkpoint.get("task_summary", "")
    score = similarity(original_task, current_task_summary)

    print("Drift detection report:")
    print(f"  Original task length: {len(original_task)} chars")
    print(f"  Current summary length: {len(current_task_summary)} chars")
    print(f"  Similarity score: {score:.2f} (threshold: {threshold})")

    if score < threshold:
        print(f"⚠️ DRIFT DETECTED: similarity {score:.2f} below threshold {threshold}")
        print("  Recommended action: Stop agent, restore from last good checkpoint")
        return True
    print(f"✅ No significant drift detected ({score:.2f} >= {threshold})")
    return False


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--intent", required=True)
    parser.add_argument("--checkpoint", required=True)
    parser.add_argument("--threshold", type=float, default=0.7)
    args = parser.parse_args()
    drifted = detect_drift(args.intent, args.checkpoint, args.threshold)
    # A non-zero exit code fails the workflow step when drift is found
    sys.exit(1 if drifted else 0)
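One caveat with the detector above: `SequenceMatcher.ratio()` is computed as twice the number of matching characters divided by the combined length of both strings, so it can never exceed `2 * min(len) / (len_a + len_b)`. A 200-character summary compared with a 2,000-character issue body tops out near 0.18, below the 0.7 threshold no matter how faithful the summary is. One possible complement, sketched here as an assumption and not as part of the quest's script, is to measure how many of the summary's keywords also appear in the original task. The function names and the small stopword list are hypothetical.

```python
import re

# Small illustrative stopword list; a real deployment would likely need a fuller one.
STOPWORDS = {"a", "an", "and", "the", "to", "of", "in", "on", "for", "is", "it", "with"}


def keywords(text):
    """Lowercased word set with stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9_]+", text.lower()) if w not in STOPWORDS}


def max_ratio(len_a, len_b):
    """Upper bound of SequenceMatcher.ratio() for two strings of these lengths."""
    total = len_a + len_b
    return 1.0 if total == 0 else 2 * min(len_a, len_b) / total


def containment(original_task, task_summary):
    """Fraction of the summary's keywords that also occur in the original task."""
    summary_words = keywords(task_summary)
    if not summary_words:
        return 0.0  # an empty summary is treated as fully drifted
    return len(summary_words & keywords(original_task)) / len(summary_words)
```

A reasonable way to combine the two is to check `max_ratio` first: if the bound is already below your threshold, the character-level score cannot pass, and `containment` gives a more meaningful reading.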
Chapter 4 — Recovery from Drift
When drift is detected, the recovery procedure is:
- Stop the agent — do not allow further execution
- Identify the last good checkpoint — find the checkpoint before drift began
- Restore context — inject the original intent + last good checkpoint into a new agent session
- Re-plan — have the agent create a new plan from the restored state
- Human approval required — a drifted agent must get explicit plan approval before resuming
#!/usr/bin/env bash
# Recovery script
# work/gh-600/scripts/recover_from_drift.sh
# (the shebang must be the first line of the file for it to take effect)
set -euo pipefail
CHECKPOINT_DIR=".agent-memory"
RECOVERY_PROMPT_FILE="$CHECKPOINT_DIR/recovery-prompt.md"
echo "=== Agent Drift Recovery Procedure ==="
# Read original intent
ORIGINAL_TASK=$(jq -r '.original_task' "$CHECKPOINT_DIR/initial-intent.json")
cat > "$RECOVERY_PROMPT_FILE" << EOF
# Agent Recovery Context
Your previous session experienced context drift. You are resuming from a clean state.
## Original Task (Anchor)
$ORIGINAL_TASK
## Recovery Instructions
1. Read the original task above carefully
2. Produce a NEW structured plan from scratch — do not reference any previous plans
3. The plan must address ONLY the original task, nothing else
4. Submit the plan for human review before taking any action
## What NOT to Do
- Do not reference prior incomplete work
- Do not continue from where you left off
- Do not assume any previous files are correct
EOF
echo "✅ Recovery prompt written to $RECOVERY_PROMPT_FILE"
echo "Next step: inject this prompt as the first message in a new Copilot session"
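The recovery script rebuilds the prompt from the original intent only. Step 2 of the procedure, identifying the last good checkpoint, can be sketched as follows. This assumes the file names used in the workflow (`initial-intent.json`, `checkpoint-<stage>.json`) and reuses the same character-level score as `detect_drift.py`; the function name `last_good_checkpoint` and the `STAGE_ORDER` list are hypothetical.

```python
import json
from difflib import SequenceMatcher
from pathlib import Path

# Stage order follows the workflow's checkpoints; extend it if more stages are added.
STAGE_ORDER = ["planning", "execution"]


def last_good_checkpoint(checkpoint_dir, threshold=0.7):
    """Return the path of the latest checkpoint saved before drift began, or None."""
    base = Path(checkpoint_dir)
    intent = json.loads((base / "initial-intent.json").read_text(encoding="utf-8"))
    original_task = intent.get("original_task", "")

    last_good = None
    for stage in STAGE_ORDER:
        path = base / f"checkpoint-{stage}.json"
        if not path.exists():
            break  # later stages never ran
        checkpoint = json.loads(path.read_text(encoding="utf-8"))
        summary = checkpoint.get("task_summary", "")
        score = SequenceMatcher(None, original_task, summary).ratio()
        if score < threshold:
            break  # drift starts here, so keep the previous checkpoint
        last_good = path
    return last_good
```

A return value of `None` means even the first checkpoint had drifted (or none exist), in which case recovery falls back to the original intent alone, which is what the script above does.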
✅ Quest Validation
python3 scripts/validate_quest.py --quest q9
# ✅ Checkpointing workflow: agent-with-checkpointing.yml present
# ✅ Drift detector: detect_drift.py present and executable
# ✅ Recovery script: recover_from_drift.sh present
# ✅ Initial intent anchoring: implemented
# 🏆 Quest Q9 complete!
🏆 Quest Rewards
| Reward | Details |
|---|---|
| ⚓ Anchor Master Badge | Earned on completion |
| 🔍 Drift Detection | Skill unlocked |
| 100 XP | Added to Level 1010 total |
| Unlocks | Q10: Crossing the Tool Planes |