Domain 5 of GH-600 (17% of the exam) covers multi-agent systems. This is where the exam moves from “can you build one agent?” to “can you build a system of agents that works reliably together?”
Multi-agent design on GitHub is primarily a GitHub Actions design problem. The primitives are workflow triggers, job dependencies, artifacts, and environments.
Orchestration Patterns (Sub-skill 5.1)
Two patterns cover most multi-agent use cases:
Fan-Out (Parallelism)
An orchestrator job triggers multiple sub-agents simultaneously. Each sub-agent handles a different task (e.g., frontend tests, backend tests, security scan). The orchestrator collects results after all sub-agents complete.
jobs:
orchestrate:
outputs:
task_a_result: $
agent_a:
needs: orchestrate
# ... sub-agent A
agent_b:
needs: orchestrate
# ... sub-agent B
collect:
needs: [agent_a, agent_b]
# ... collect and evaluate results
Chain (Sequential)
Each sub-agent’s output is the next sub-agent’s input. A planning agent produces a plan, an implementation agent implements it, a review agent reviews the output.
Observability in Multi-Agent Systems (Sub-skill 5.2)
The critical challenge with multi-agent systems is debugging failures. When agent C fails, the failure might have been caused by faulty output from agent B, which was caused by an ambiguous plan from agent A.
The solution is distributed tracing: every agent in the system writes structured log entries with a shared correlation ID — a unique identifier for the entire multi-agent run. This allows you to query all log entries for a specific run, across all agents, in order.
In GitHub Actions, correlation IDs are passed as job outputs and injected into artifact filenames and step summary headers.
Failure Recovery (Sub-skill 5.3)
Multi-agent failures require different recovery strategies than single-agent failures. When one sub-agent fails, the orchestrator must decide:
- Abort — stop all agents, mark the entire run failed
- Continue — mark the failing agent’s subtask as failed, continue with others
- Retry — re-run the failing agent with modified inputs
- Escalate — create a human-review issue and pause
The continue-on-error: true and if: always() patterns in GitHub Actions enable orchestrators to continue despite sub-agent failures.
Agent Lifecycle Management (Sub-skill 5.4)
Multi-agent systems have operational demands that single agents don’t:
- Provisioning: Configuring a new agent, registering it in a shared registry
- Health monitoring: Regular health checks to confirm agents are responsive and producing expected outputs
- Deprecation: Gracefully retiring an agent that is being replaced
The Agentic Codex uses _data/agents.yml as an agent registry — a YAML file that records every agent’s name, role, owner, status, and review date.
Domain 5 Quests
| Quest | Skill | Link |
|---|---|---|
| Q14 | Orchestration Patterns | Multi-Agent Orchestration Patterns |
| Q15 | Multi-Agent Observability | Multi-Agent Observability |
| Q16 | Failure Recovery | Multi-Agent Failure Recovery |
| Q17 | Lifecycle Management | Multi-Agent Lifecycle Management |
Q14 includes the full fan-out and chain workflow patterns. Q15 includes the trace writer script and correlation ID implementation. Q16 includes the recovery coordinator. Q17 includes the agents.yml registry schema and health monitoring workflow.