This guide will help you set up and configure the Guardian 2.0 Hyperlink Testing Framework for your Jekyll-based GitHub Pages site.
# Clone or navigate to your repository
cd /path/to/your/jekyll-site
# Run the automated setup
./test/hyperlink-guardian/scripts/validate.sh setup
# Ensure required directories exist
mkdir -p test-results/{artifacts,reports}
# Make scripts executable
chmod +x test/hyperlink-guardian/scripts/*.sh
chmod +x test/hyperlink-guardian/scripts/*.py
# Install Python dependencies
pip3 install openai requests pyyaml --break-system-packages
# Install optional Node.js dependencies
npm install -g markdown-link-check
The Guardian 2.0 system works out of the box with sensible defaults, but you can customize it using configuration files:
Main Configuration: test/hyperlink-guardian/config/guardian-config.yml
# Site configuration
site:
url: "https://your-username.github.io/your-repo"
name: "Your Site Name"
# Testing parameters
testing:
max_parallel: 10
timeout: 30
retry_count: 2
# AI analysis configuration
ai_analysis:
model: "gpt-4"
max_tokens: 3000
temperature: 0.3
URL Exclusions: test/hyperlink-guardian/config/exclusions.txt
# Add regex patterns for URLs to exclude
localhost
127\.0\.0\.1
example\.com
For AI-powered analysis, configure the OpenAI API key:
OPENAI_API_KEY
Verify your Guardian 2.0 installation:
# Validate the complete setup
./test/hyperlink-guardian/scripts/validate.sh validate
# Check dependencies only
./test/hyperlink-guardian/scripts/validate.sh dependencies
# Quick functionality test
./test/hyperlink-guardian/scripts/validate.sh test
Guardian 2.0 supports environment variable overrides:
export SITE_URL="https://your-site.com"
export MAX_PARALLEL=20
export TIMEOUT=45
export VERBOSE=true
export OPENAI_API_KEY="your-api-key"
Create custom configuration files for different environments:
# Development configuration
cp test/hyperlink-guardian/config/guardian-config.yml dev-config.yml
# Use custom configuration
./test/hyperlink-guardian/scripts/guardian.sh --config dev-config.yml
The workflow is located at .github/workflows/hyperlink-guardian.yml
. Key customization points:
# Change schedule (currently daily at 3 AM UTC)
on:
schedule:
- cron: '0 3 * * *' # Modify this line
# Modify environment variables
env:
MAX_PARALLEL: 15 # Increase parallel processing
TIMEOUT: 45 # Increase timeout for slow sites
# Run a basic scan
./test/hyperlink-guardian/scripts/guardian.sh
# Run with verbose output
./test/hyperlink-guardian/scripts/guardian.sh --verbose
# Test only internal links
./test/hyperlink-guardian/scripts/guardian.sh --internal-only
# Exclude specific domains
./test/hyperlink-guardian/scripts/guardian.sh --exclude "github\.com|localhost"
# Set up OpenAI API key
export OPENAI_API_KEY="your-api-key-here"
# Run AI analysis on existing results
python3 test/hyperlink-guardian/scripts/ai-analyzer.py \
--input ./test-results \
--output ./ai-analysis.json \
--verbose
# Quick validation without full scan
./test/hyperlink-guardian/scripts/validate.sh test --quick
# Dry run to see what would be executed
./test/hyperlink-guardian/scripts/guardian.sh --dry-run
# Test with limited URLs for development
MAX_PARALLEL=2 TIMEOUT=10 \
./test/hyperlink-guardian/scripts/guardian.sh --parallel 2 --timeout 10
If you’re upgrading from the previous hyperlink guardian system:
# Backup old scripts (if they exist)
mkdir -p backup/old-guardian
cp scripts/hyperlink-guardian.sh backup/old-guardian/ 2>/dev/null || true
cp scripts/ai-link-analyzer.py backup/old-guardian/ 2>/dev/null || true
The new workflow automatically uses the Guardian 2.0 framework. No manual updates needed if you’re using the refactored workflow.
# If you had custom exclusion patterns in the old system
# Add them to test/hyperlink-guardian/config/exclusions.txt
# If you had custom timeouts or parallel settings
# Add them to test/hyperlink-guardian/config/guardian-config.yml
# Ensure everything works correctly
./test/hyperlink-guardian/scripts/validate.sh validate --verbose
# Make all scripts executable
find test/hyperlink-guardian/scripts/ -name "*.sh" -exec chmod +x {} \;
find test/hyperlink-guardian/scripts/ -name "*.py" -exec chmod +x {} \;
# Check required dependencies
./test/hyperlink-guardian/scripts/validate.sh dependencies
# Install missing dependencies
./test/hyperlink-guardian/scripts/validate.sh setup
# Use --break-system-packages for externally managed Python environments
pip3 install openai requests pyyaml --break-system-packages
# Or use virtual environment
python3 -m venv guardian-env
source guardian-env/bin/activate
pip install openai requests pyyaml
# Check if you're in the correct directory
pwd # Should be your Jekyll site root
# Verify file patterns
find . -name "*.md" | head -5
# Run with verbose output
./test/hyperlink-guardian/scripts/guardian.sh --verbose
# Verify API key is set
echo $OPENAI_API_KEY
# Test API connectivity
python3 -c "import openai; print('OpenAI library works')"
# Run with fallback analysis
# Guardian 2.0 automatically provides enhanced fallback when AI is unavailable
Enable comprehensive debugging:
# Enable debug output
export DEBUG=true
export VERBOSE=true
# Run with maximum debugging
./test/hyperlink-guardian/scripts/guardian.sh --verbose 2>&1 | tee debug.log
Guardian 2.0 generates comprehensive logs:
test-results/
├── artifacts/
│ ├── guardian.log # Main execution log
│ ├── ai-analyzer.log # AI analysis log
│ └── raw-links.txt # All discovered links
└── summary.json # High-level results
# Complete configuration example
site:
url: "https://example.github.io/repo"
name: "Example Site"
description: "Educational platform"
testing:
max_parallel: 10
timeout: 30
retry_count: 2
retry_delay: 5
internal_only: false
user_agent: "Guardian-2.0"
# Performance thresholds
slow_response_threshold: 5.0
critical_response_threshold: 10.0
exclusions:
patterns:
- "localhost"
- "127\\.0\\.0\\.1"
- "example\\.com"
url_patterns:
- "^#.*"
- "^mailto:"
- "^tel:"
ai_analysis:
model: "gpt-4"
max_tokens: 3000
temperature: 0.3
enable_fallback: true
fallback_model: "gpt-3.5-turbo"
output:
base_directory: "test-results"
artifacts:
retention_days: 30
include_raw_data: true
github:
create_issues: true
issue_labels:
- "automated-report"
- "guardian-2.0"
close_old_issues: true
old_issue_threshold_days: 7
monitoring:
track_response_times: true
track_success_rates: true
track_error_patterns: true
enable_trend_analysis: true
educational:
prioritize_learning_resources: true
analyze_quest_links: true
assess_learner_impact: true
development:
verbose_logging: false
debug_mode: false
test_mode: false
# Guardian script options
./test/hyperlink-guardian/scripts/guardian.sh [OPTIONS]
Options:
-h, --help Show help message
-v, --verbose Enable verbose output
-u, --url URL Set target site URL
-o, --output DIR Set output directory
-p, --parallel NUM Set max parallel tests
-t, --timeout SEC Set request timeout
-r, --retry NUM Set retry count
-c, --config FILE Load configuration file
--internal-only Test only internal links
--exclude PATTERN Exclude URLs matching regex
# AI analyzer options
python3 test/hyperlink-guardian/scripts/ai-analyzer.py [OPTIONS]
Options:
-h, --help Show help message
-i, --input DIR Input directory with scan results
-o, --output FILE Output file for analysis
-m, --model MODEL OpenAI model to use
-c, --config FILE Configuration file
-v, --verbose Enable verbose logging
# Validation script options
./test/hyperlink-guardian/scripts/validate.sh [COMMAND] [OPTIONS]
Commands:
setup Set up development environment
validate Full validation
test Quick functionality test
dependencies Check dependencies
config Validate configuration
Options:
-h, --help Show help message
-v, --verbose Enable verbose output
-d, --dry-run Show what would be done
-q, --quick Quick tests only
After successful setup:
Guardian 2.0 is part of the IT-Journey Testing Framework, designed to ensure reliable access to educational resources while providing real-world examples of modern DevOps practices.