| Level: Journeyman (Lvl 001) | Difficulty: 🟡 Medium | Time: 2-3 hours |
In the realm of software development, documentation is your most powerful spell—but only if you can find it when you need it! As projects multiply across GitHub repositories, valuable knowledge becomes scattered across dozens of README files, wiki pages, and doc folders. This quest will teach you to build an automated documentation aggregation system that collects, organizes, and maintains a centralized knowledge hub.
A self-updating documentation repository powered by GitHub Actions, Bash, and Python.
Every developer faces this problem: documentation lives everywhere. Your team’s API docs are in one repo, deployment guides in another, troubleshooting tips scattered across wikis. When you need information, you’re hunting through multiple repositories, branches, and directories.
The Solution? Build a system that automatically clones your source repositories, harvests their Markdown documentation, and organizes it into a categorized hub.
By quest's end, you'll have a living documentation hub that grows and evolves automatically.
By completing this quest, you will learn to schedule GitHub Actions workflows, write a robust Bash collection script, and organize Markdown files with Python.
Objective: Create the central repository that will house all aggregated documentation.
Name it docs-hub (or choose your own meaningful name), then clone it locally:

git clone https://github.com/YOUR-USERNAME/docs-hub.git
cd docs-hub
# Create all necessary directories
mkdir -p scripts raw_docs docs temp .github/workflows
# Create essential files
touch repos.txt
touch scripts/aggregate.sh
touch scripts/process.py
touch .github/workflows/aggregate-docs.yml
Define Your Source Repositories
Edit repos.txt to list the repositories you want to aggregate documentation from:
https://github.com/username/project-api
https://github.com/username/project-frontend
https://github.com/username/project-backend
https://github.com/username/project-infrastructure
git add .
git commit -m "feat: Initialize docs-hub repository structure"
git push origin main
Checkpoint: You now have a structured repository ready for automation!
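Before wiring up automation, you can sanity-check `repos.txt` with a short Python sketch. The skip rules mirror the Bash script you'll write later (blank lines and `#` comments are ignored); the URL pattern is an assumption for plain GitHub HTTPS URLs:

```python
import re

def valid_repo_lines(path='repos.txt'):
    """Return (valid, invalid) repo URLs, skipping blanks and # comments."""
    valid, invalid = [], []
    # Assumed pattern: https://github.com/<owner>/<repo>, optionally ending in .git
    pattern = re.compile(r'^https://github\.com/[\w.-]+/[\w.-]+$')
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith('#'):
                continue  # same skip rules as aggregate.sh
            (valid if pattern.match(line) else invalid).append(line)
    return valid, invalid
```

Run it locally before your first commit; anything in the `invalid` list will fail to clone later.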
Harness the power of GitHub Actions to automate your doc-harvesting ritual. Create .github/workflows/aggregate-docs.yml (the file you touched earlier) with this incantation:
name: Aggregate Documentation

on:
  schedule:
    - cron: '0 0 * * *'  # Daily at midnight UTC
  workflow_dispatch:      # Manual trigger

jobs:
  aggregate:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout central repo
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Install dependencies
        run: pip install pyyaml requests  # Add more if your potions require

      - name: Run aggregation script
        run: bash scripts/aggregate.sh

      - name: Commit changes
        uses: stefanzweifel/git-auto-commit-action@v5
        with:
          commit_message: "docs: Auto-aggregate documentation [skip ci]"
          commit_user_name: "GitHub Actions Bot"
          commit_user_email: "actions@github.com"
- `on.schedule.cron`: Uses cron syntax to run daily at midnight UTC
- `workflow_dispatch`: Enables manual triggering from the GitHub Actions tab
- `actions/checkout@v4`: Checks out your repository code
- `actions/setup-python@v5`: Sets up the Python environment
- `stefanzweifel/git-auto-commit-action@v5`: Automatically commits changes

For AI-powered categorization, add your API key to GitHub Secrets as `XAI_API_KEY` (or `OPENAI_API_KEY`).

Checkpoint: Your workflow is configured and ready to orchestrate the automation!
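Inside the processing script, the secret arrives as an ordinary environment variable. A minimal sketch of reading it with a fallback (both variable names are the ones this quest suggests; use whichever secret name you actually configured):

```python
import os

def get_ai_key():
    """Prefer XAI_API_KEY, fall back to OPENAI_API_KEY (names assumed from this quest's setup)."""
    return os.getenv('XAI_API_KEY') or os.getenv('OPENAI_API_KEY')

if get_ai_key() is None:
    # Without a key, process.py simply falls back to rule-based categorization
    print('No AI key found; falling back to rule-based categorization')
```

This keeps the AI step strictly optional: the workflow succeeds whether or not the secret is set.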
Objective: Create a Bash script that clones repositories and collects documentation files.
The Bash script will read each repository URL from repos.txt, clone it, and copy its Markdown files into raw_docs/.

Create scripts/aggregate.sh with the following content:
#!/bin/bash
set -euo pipefail  # Exit on error, undefined variables, and pipe failures

# Color codes for better output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'  # No Color

# Logging functions
log_info() {
    echo -e "${GREEN}[INFO]${NC} $1"
}
log_error() {
    echo -e "${RED}[ERROR]${NC} $1"
}
log_warn() {
    echo -e "${YELLOW}[WARN]${NC} $1"
}

# Create necessary directories
mkdir -p temp raw_docs docs

log_info "Starting documentation aggregation..."

# Read and process each repository
while IFS= read -r repo || [ -n "$repo" ]; do
    # Skip empty lines and comments
    [[ -z "$repo" || "$repo" =~ ^#.* ]] && continue

    repo_name=$(basename "$repo" .git)
    temp_dir="temp/$repo_name"

    log_info "Processing repository: $repo_name"

    # Clone or update repository
    if [ -d "$temp_dir/.git" ]; then
        log_info "Updating existing clone..."
        git -C "$temp_dir" pull --quiet || log_warn "Failed to update $repo_name"
    else
        log_info "Cloning repository..."
        git clone --depth 1 --quiet "$repo" "$temp_dir" || {
            log_error "Failed to clone $repo"
            continue
        }
    fi

    # Create directory for this repo's docs
    mkdir -p "raw_docs/$repo_name"

    # Find and copy documentation files
    file_count=0
    while IFS= read -r file; do
        # Calculate relative path
        rel_path="${file#"$temp_dir"/}"
        target_dir="raw_docs/$repo_name/$(dirname "$rel_path")"

        # Create target directory and copy file. Use arithmetic assignment
        # rather than ((file_count++)): the post-increment expression returns
        # a non-zero status when the count is 0, which would abort the script
        # under `set -e`.
        mkdir -p "$target_dir"
        cp "$file" "$target_dir/" && file_count=$((file_count + 1))
    done < <(find "$temp_dir" -type f \( -name "*.md" -o -name "README*" \) -not -path "*/.git/*" -not -path "*/node_modules/*" -not -path "*/vendor/*")

    log_info "Collected $file_count documentation files from $repo_name"
done < repos.txt

log_info "Repository aggregation complete. Processing documentation..."

# Run Python processing script
python3 scripts/process.py || log_error "Python processing failed"

# Clean up temporary files
log_info "Cleaning up temporary files..."
rm -rf temp/

log_info "Documentation aggregation complete!"
chmod +x scripts/aggregate.sh
Before committing, test your script:
./scripts/aggregate.sh
Checkpoint: Your Bash script can now clone repositories and collect documentation files!
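If you want a quick sanity check after a local run, a short Python sketch (assuming the `raw_docs/` layout produced by the script above) can count what was collected per repository:

```python
from pathlib import Path

def count_collected_docs(raw_dir: str = 'raw_docs') -> dict:
    """Count collected Markdown files per repository directory under raw_dir."""
    counts = {}
    root = Path(raw_dir)
    if not root.exists():
        return counts
    for repo_dir in sorted(root.iterdir()):
        if repo_dir.is_dir():
            # rglob also counts files in nested doc folders
            counts[repo_dir.name] = sum(1 for _ in repo_dir.rglob('*.md'))
    return counts

if __name__ == '__main__':
    for repo, n in count_collected_docs().items():
        print(f'{repo}: {n} file(s)')
```

A repository showing 0 files usually means its docs don't use the `.md` extension or were excluded by the `find` filters.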
Now, in scripts/process.py, mix Python alchemy to sort, categorize, and enchant with front matter:
import os
import yaml
from pathlib import Path
import requests  # For AI API calls

RAW_DIR = 'raw_docs'
ORGANIZED_DIR = 'docs'
AI_API_URL = 'https://api.x.ai/v1/chat/completions'  # Placeholder; adjust per docs
AI_API_KEY = os.getenv('XAI_API_KEY')


def categorize_content(content):
    # Basic rule-based categorization (expand with NLP if desired)
    lowered = content.lower()
    if 'api' in lowered:
        return 'api'
    elif 'guide' in lowered or 'tutorial' in lowered:
        return 'user-guides'
    else:
        return 'misc'


def generate_front_matter(content):
    if AI_API_KEY:
        payload = {
            'model': 'grok-beta',
            'messages': [{'role': 'user', 'content': f"Summarize and tag this doc: {content[:500]}"}]
        }
        response = requests.post(
            AI_API_URL,
            json=payload,
            headers={'Authorization': f'Bearer {AI_API_KEY}'},
            timeout=30,
        )
        if response.status_code == 200:
            ai_result = response.json()['choices'][0]['message']['content']
            return {'title': 'Auto-Generated Title', 'tags': ai_result.split(', '), 'summary': ai_result}
    return {'title': 'Default Title', 'tags': ['uncategorized'], 'summary': 'No summary'}


# Process files (only .md files; extension-less READMEs are skipped)
for root, dirs, files in os.walk(RAW_DIR):
    for file in files:
        if file.endswith('.md'):
            src_path = Path(root) / file
            with open(src_path, 'r', encoding='utf-8') as f:
                content = f.read()

            # Extract existing front matter, if any
            if content.startswith('---'):
                fm_end = content.index('---', 3) + 3
                # safe_load returns None for empty front matter; default to {}
                existing_fm = yaml.safe_load(content[3:fm_end - 3]) or {}
                body = content[fm_end:]
            else:
                existing_fm = {}
                body = content

            new_fm = generate_front_matter(body)
            updated_fm = {**existing_fm, **new_fm}

            # Organize by category, preserving each repo's directory structure
            category = categorize_content(body)
            dest_dir = Path(ORGANIZED_DIR) / category / Path(root).relative_to(RAW_DIR)
            dest_dir.mkdir(parents=True, exist_ok=True)
            dest_path = dest_dir / file

            # Write the file with its updated front matter
            with open(dest_path, 'w', encoding='utf-8') as f:
                f.write('---\n')
                yaml.dump(updated_fm, f)
                f.write('---\n')
                f.write(body)

# Clean up raw_docs
for root, dirs, files in os.walk(RAW_DIR, topdown=False):
    for file in files:
        os.remove(Path(root) / file)
    for dir in dirs:
        os.rmdir(Path(root) / dir)
os.rmdir(RAW_DIR)
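To see the rule-based fallback in action, you can exercise `categorize_content` on a few sample headings in a REPL (logic copied from `scripts/process.py` so the snippet runs standalone):

```python
def categorize_content(content):
    # Same rule-based logic as in scripts/process.py
    lowered = content.lower()
    if 'api' in lowered:
        return 'api'
    elif 'guide' in lowered or 'tutorial' in lowered:
        return 'user-guides'
    else:
        return 'misc'

print(categorize_content('# REST API reference for the billing service'))  # api
print(categorize_content('# Deployment guide for new contributors'))       # user-guides
print(categorize_content('# Release notes'))                               # misc
```

Note that the `'api'` check wins whenever both keywords appear, so a document titled "API tutorial" lands in `api/`, not `user-guides/`.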
Objective: Launch your documentation hub and verify it works end-to-end.
# Add all new files
git add .
# Commit with descriptive message
git commit -m "feat: Implement automated documentation aggregation system
- Add GitHub Actions workflow for scheduled execution
- Create Bash script for repository cloning and file collection
- Implement Python script for intelligent organization
- Add YAML front matter generation with categorization
- Include error handling and comprehensive logging"
# Push to GitHub
git push origin main
Watch the workflow execute in real time from your repository's Actions tab.
After the workflow completes:
# Pull the changes locally
git pull origin main
# Check the organized documentation
ls -la docs/
# View a processed file to see front matter
head -n 20 docs/api/README.md
Expected Directory Structure (matching the categories produced by process.py):
docs/
├── api/
│   ├── README.md
│   └── endpoints.md
├── user-guides/
│   ├── getting-started.md
│   └── tutorial.md
└── misc/
    ├── design-decisions.md
    └── misc-docs.md
Checkpoint: Your documentation hub is live and automatically updating!
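To verify that processed files carry valid front matter, a small parsing sketch (mirroring the delimiter convention `process.py` writes; the sample document below is made up for illustration):

```python
import yaml  # pip install pyyaml

def read_front_matter(text):
    """Split a processed doc into (front_matter_dict, body)."""
    if not text.startswith('---'):
        return {}, text
    end = text.index('---', 3)          # find the closing delimiter
    fm = yaml.safe_load(text[3:end]) or {}
    return fm, text[end + 3:]

sample = """---
tags:
- uncategorized
title: Default Title
---
# My Doc
Body text.
"""
fm, body = read_front_matter(sample)
print(fm['title'])  # Default Title
print(fm['tags'])   # ['uncategorized']
```

If `read_front_matter` returns an empty dict for a file in `docs/`, that file was copied without enrichment and is worth investigating.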
Congratulations, Documentation Architect! You’ve successfully:
✅ Built a Multi-Repository Documentation System that automatically aggregates knowledge
✅ Mastered GitHub Actions with scheduled and manual workflow triggers
✅ Combined Bash and Python for powerful automation workflows
✅ Implemented Intelligent Organization with category-based file structure
✅ Enhanced Documents with rich YAML front matter metadata
✅ Created a Scalable Solution that grows with your project ecosystem
Deploy your documentation hub as a searchable website:
# Add to workflow after aggregation
- name: Deploy to GitHub Pages
  uses: peaceiris/actions-gh-pages@v3
  with:
    github_token: ${{ secrets.GITHUB_TOKEN }}
    publish_dir: ./docs
Enhance categorization with more sophisticated AI models and prompts.
Add full-text search capabilities to the published site.
Track documentation health with metrics such as staleness and coverage.
Problem: Git clone fails with authentication error
Solution: Ensure your GITHUB_TOKEN has the correct permissions: add a `permissions:` block with `contents: write` to the workflow, or enable "Read and write permissions" under Settings → Actions → General.
Problem: ModuleNotFoundError: No module named 'yaml'
Solution: Add dependency installation to workflow:
- name: Install dependencies
  run: pip install pyyaml requests
Problem: Bash script runs but no files appear
Solution: Check your repos.txt format: one repository URL per line, with comment lines starting with `#` and no trailing text after the URL.
Problem: Documents copied but no YAML added
Solution: Check file detection in the Python script:
- Verify the `RAW_DIR` path is correct
- Confirm the files have a `.md` extension

Built something amazing? We want to see it!
Tag us: @it-journey with #DocumentationHub #QuestComplete
Quest Master’s Wisdom: “Documentation is not just about recording what exists—it’s about creating a living knowledge system that grows, adapts, and serves your team’s evolving needs. Automation doesn’t replace the human touch; it amplifies it, freeing you to focus on insights rather than organization.”
May your documentation always be current, your automation reliable, and your knowledge easily discoverable. Onward to greater adventures! 🚀✨