Our IT-Journey repository had grown to include two separate link checking workflows:
- `link-checker.yml` - Basic link validation with embedded Python scripts
- `hyperlink-guardian.yml` - Advanced monitoring with AI analysis capabilities

While both served their purposes, they shared significant code duplication and suffered from the classic problem of embedded scripts in YAML files: they were difficult to test, debug, and maintain.
Today we embarked on a comprehensive refactoring journey to create a unified, modular link checking system.
We began by analyzing both existing workflows to understand their unique capabilities:
link-checker.yml provided:
hyperlink-guardian.yml offered:
We designed a five-script modular system:
```
scripts/link-checker/
├── install-dependencies.sh    # Dependency management
├── run-link-checker.sh        # Main execution engine
├── analyze-links.py           # Enhanced analysis
├── ai-analyze-links.py        # AI-powered insights
└── create-github-issue.sh     # GitHub integration
```
```bash
#!/bin/bash
# Centralized dependency installation with verification

install_lychee() {
    if ! command -v lychee >/dev/null 2>&1; then
        log_info "Installing Lychee link checker..."
        curl -sSfL https://github.com/lycheeverse/lychee/releases/latest/download/lychee-x86_64-unknown-linux-gnu.tar.gz \
            | tar -xz -C /tmp
        sudo mv /tmp/lychee /usr/local/bin/
        verify_installation lychee "Lychee link checker"
    else
        log_success "Lychee already installed: $(lychee --version 2>/dev/null | head -1)"
    fi
}
```
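The installer leans on a few shared helpers (`log_info`, `log_success`, `verify_installation`) that live alongside it. Here is a hypothetical sketch of what those helpers might look like; the implementations below are assumptions, not the repository's actual code.

```bash
#!/bin/bash
# Assumed implementations of the shared logging/verification helpers
# referenced by install-dependencies.sh.

log_info()    { echo "[INFO]  $*"; }
log_success() { echo "[OK]    $*"; }
log_error()   { echo "[ERROR] $*" >&2; }

# Confirm a binary is on PATH after installation, logging the outcome.
verify_installation() {
    local bin="$1" label="$2"
    if command -v "$bin" >/dev/null 2>&1; then
        log_success "$label installed at $(command -v "$bin")"
    else
        log_error "$label installation failed"
        return 1
    fi
}

verify_installation sh "POSIX shell"
```

Centralizing these helpers means every script reports progress and failures in the same format, which makes CI logs much easier to scan.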
Key Features:
```bash
#!/bin/bash
# Main link checking execution engine with flexible configuration

build_lychee_command() {
    local cmd="lychee"

    # Add scope-specific includes/excludes
    case "$SCOPE" in
        "website")
            cmd="$cmd --include '$BASE_URL'"
            ;;
        "internal")
            cmd="$cmd --include '$BASE_URL' --exclude-all-private"
            ;;
        "docs")
            cmd="$cmd 'docs/'"
            ;;
    esac

    # Add configuration options
    cmd="$cmd --timeout $TIMEOUT"
    cmd="$cmd --max-retries $MAX_RETRIES"
    [[ "$FOLLOW_REDIRECTS" == "true" ]] && cmd="$cmd --remap"

    echo "$cmd"
}
```
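To make the builder's output concrete, here is a condensed, self-contained version with illustrative values plugged in (the `BASE_URL` and other settings below are placeholders, not the project's real configuration):

```bash
#!/bin/bash
# Condensed sketch of build_lychee_command with illustrative settings,
# showing the command string it assembles.
SCOPE="website"
BASE_URL="https://example.com"
TIMEOUT=30
MAX_RETRIES=3
FOLLOW_REDIRECTS="true"

build_lychee_command() {
    local cmd="lychee"
    case "$SCOPE" in
        "website")  cmd="$cmd --include '$BASE_URL'" ;;
        "internal") cmd="$cmd --include '$BASE_URL' --exclude-all-private" ;;
        "docs")     cmd="$cmd 'docs/'" ;;
    esac
    cmd="$cmd --timeout $TIMEOUT"
    cmd="$cmd --max-retries $MAX_RETRIES"
    [[ "$FOLLOW_REDIRECTS" == "true" ]] && cmd="$cmd --remap"
    echo "$cmd"
}

build_lychee_command
# → lychee --include 'https://example.com' --timeout 30 --max-retries 3 --remap
```

Because the function only echoes a string, it can be unit tested without ever hitting the network, which is exactly the testability win the refactoring was after.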
Key Features:
```python
#!/usr/bin/env python3
"""
Enhanced link analysis with pattern recognition and categorization
"""

def analyze_broken_links(self, broken_links):
    """Enhanced analysis with better categorization"""
    categories = {
        'external_timeout': [],
        'dns_failure': [],
        'http_errors': [],
        'internal_broken': [],
        'redirect_issues': []
    }

    for link in broken_links:
        error_msg = link.get('error', '').lower()
        status = link.get('status', 0)

        # Defensive programming for various error formats
        if 'timeout' in error_msg or 'timed out' in error_msg:
            categories['external_timeout'].append(link)
        elif 'dns' in error_msg or 'resolve' in error_msg:
            categories['dns_failure'].append(link)
        elif status >= 400:
            categories['http_errors'].append(link)
        # ... additional categorization logic

    return categories
```
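To see the categorization in action, here is a standalone sketch of the same logic fed with sample records shaped like link-checker output (the field names `url`, `error`, and `status` are assumptions about the result format):

```python
# Standalone sketch of the categorization logic with sample input.
def categorize(broken_links):
    categories = {
        'external_timeout': [], 'dns_failure': [],
        'http_errors': [], 'internal_broken': [], 'redirect_issues': [],
    }
    for link in broken_links:
        error_msg = link.get('error', '').lower()
        status = link.get('status', 0) or 0
        if 'timeout' in error_msg or 'timed out' in error_msg:
            categories['external_timeout'].append(link)
        elif 'dns' in error_msg or 'resolve' in error_msg:
            categories['dns_failure'].append(link)
        elif status >= 400:
            categories['http_errors'].append(link)
    return categories

sample = [
    {'url': 'https://a.example', 'error': 'request timed out', 'status': 0},
    {'url': 'https://b.example', 'error': 'failed to resolve host', 'status': 0},
    {'url': 'https://c.example', 'error': '', 'status': 404},
]
result = categorize(sample)
print({k: len(v) for k, v in result.items() if v})
# → {'external_timeout': 1, 'dns_failure': 1, 'http_errors': 1}
```

Grouping failures this way lets the later reporting stage recommend different fixes per category instead of dumping one undifferentiated list of broken URLs.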
Key Features:
```python
#!/usr/bin/env python3
"""
AI-powered analysis using OpenAI with intelligent fallbacks
"""

async def analyze_with_ai(self, context):
    """AI analysis with comprehensive error handling"""
    try:
        if not self.openai_client:
            return await self.fallback_analysis(context)

        prompt = self._build_analysis_prompt(context)
        response = await self.openai_client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
            max_tokens=2000,
            temperature=0.3
        )
        return self._parse_ai_response(response)
    except Exception as e:
        self.logger.warning(f"AI analysis failed: {e}")
        return await self.fallback_analysis(context)
```
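The fallback path matters just as much as the happy path: the workflow must still produce a useful report when no API key is configured. The real `fallback_analysis` is async and takes the full context; the sketch below is a simplified, synchronous illustration of the kind of rule-based summary it might produce (the recommendation text is invented for illustration):

```python
# Simplified sketch of a rule-based fallback: summarize category counts
# and attach a canned recommendation per category, no AI required.
def fallback_analysis(categories):
    recommendations = {
        'external_timeout': 'Consider raising the timeout or retry count.',
        'dns_failure': 'Domain may be gone; verify or remove the link.',
        'http_errors': 'Check for moved pages and update the URLs.',
    }
    lines = []
    for name, links in categories.items():
        if links:
            tip = recommendations.get(name, 'Review manually.')
            lines.append(f"{name}: {len(links)} link(s). {tip}")
    return "\n".join(lines) or "No broken links to analyze."

report = fallback_analysis({
    'http_errors': [{'url': 'https://x.example'}],
    'dns_failure': [],
})
print(report)
# → http_errors: 1 link(s). Check for moved pages and update the URLs.
```

Degrading to heuristics rather than failing outright means the scheduled runs never silently skip reporting just because the AI tier is unavailable.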
Key Features:
```bash
#!/bin/bash
# Comprehensive GitHub issue creation with rich formatting

generate_issue_body() {
    cat > "$issue_body_file" << EOF
# 🔗 IT-Journey Link Health Report

## 📊 Summary
- **Total links checked**: $TOTAL_COUNT
- **Broken links found**: $BROKEN_COUNT
- **Success rate**: $SUCCESS_RATE%
- **Check date**: $timestamp

$(generate_status_section)
$(include_detailed_results)
$(include_ai_analysis)
$(include_action_items)
EOF
}
```
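Once the body file exists, handing it to GitHub is a one-liner with the `gh` CLI. Here is a hypothetical sketch of that final step; the report contents and label are illustrative, while `--title`, `--body-file`, and `--label` are real `gh issue create` flags:

```bash
#!/bin/bash
# Sketch: write a minimal report body, then hand it to the GitHub CLI.
BROKEN_COUNT=3
issue_body_file="$(mktemp)"
cat > "$issue_body_file" << EOF
# 🔗 IT-Journey Link Health Report
Broken links found: $BROKEN_COUNT
EOF

# In CI this step would run (requires an authenticated gh / GITHUB_TOKEN):
# gh issue create \
#     --title "🔗 Link Health Report: $BROKEN_COUNT broken links" \
#     --body-file "$issue_body_file" \
#     --label "automated"
echo "Report written to $issue_body_file"
```

Writing the body to a file first, rather than inlining it, keeps the Markdown (emoji, command substitutions and all) out of the shell quoting minefield.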
Key Features:
The new link-health-guardian.yml workflow combines features from both previous workflows:
```yaml
name: 🔗 IT-Journey Link Health Guardian

on:
  schedule:
    - cron: '0 6 * * 1'   # Monday mornings
    - cron: '0 18 * * 5'  # Friday evenings
  workflow_dispatch:
    inputs:
      scope:
        description: 'Link checking scope'
        type: choice
        options:
          - 'website'
          - 'internal'
          - 'external'
          - 'docs'
          - 'posts'
          - 'quests'
      analysis_level:
        description: 'Analysis depth'
        type: choice
        options:
          - 'basic'
          - 'standard'
          - 'comprehensive'
          - 'ai-only'
```
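With the logic living in standalone scripts, the workflow's job section reduces to a thin orchestration layer. The fragment below is a sketch of what that might look like, not the workflow's actual steps; the secret name and step names are assumptions:

```yaml
# Hypothetical jobs section: each step just invokes a script.
jobs:
  link-health:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install dependencies
        run: ./scripts/link-checker/install-dependencies.sh
      - name: Run link checker
        run: |
          ./scripts/link-checker/run-link-checker.sh \
            --scope "${{ inputs.scope }}" \
            --analysis-level "${{ inputs.analysis_level }}"
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```

Keeping the YAML this thin is the payoff of the refactoring: the workflow file changes rarely, while the scripts it calls can be tested and iterated on locally.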
Challenge: Managing the complexity of combining two different approaches.
Solution: Started with a clear architectural vision and built modularly.

Challenge: Ensuring backward compatibility with existing functionality.
Solution: Preserved all existing features while enhancing them.

Challenge: Testing modular scripts independently.
Solution: Each script includes comprehensive argument parsing and supports standalone operation.
- `install-dependencies.sh` - Dependency management
- `run-link-checker.sh` - Main execution engine
- `analyze-links.py` - Enhanced analysis
- `ai-analyze-links.py` - AI-powered insights
- `create-github-issue.sh` - GitHub integration
- `link-health-guardian.yml` - Unified workflow

Our refactoring achieved several key improvements:
```bash
# Manual execution example
./scripts/link-checker/install-dependencies.sh
./scripts/link-checker/run-link-checker.sh \
    --scope website \
    --analysis-level comprehensive \
    --follow-redirects \
    --max-retries 3
```

When triggered, the workflow automatically:
1. Installs dependencies
2. Runs link checking
3. Performs analysis
4. Generates AI insights
5. Creates a GitHub issue
```yaml
# Basic link checking
scope: 'docs'
analysis_level: 'basic'
ai_analysis: false
```

```yaml
# Comprehensive analysis
scope: 'website'
analysis_level: 'comprehensive'
ai_analysis: true
create_issue: true
```

```yaml
# AI-only analysis of existing results
scope: 'website'
analysis_level: 'ai-only'
ai_analysis: true
```
```bash
# Make scripts executable
chmod +x scripts/link-checker/*.sh
chmod +x scripts/link-checker/*.py

# Force reinstall dependencies
./scripts/link-checker/install-dependencies.sh --force

# Check API key availability
echo $OPENAI_API_KEY

# Run without AI if needed
./scripts/link-checker/run-link-checker.sh --analysis-level standard
```
The modular architecture makes it easy to contribute:
This transformation from embedded scripts to modular architecture demonstrates how thoughtful refactoring can dramatically improve maintainability while enhancing functionality. The new system provides all the capabilities of the previous workflows while being much easier to test, debug, and extend.
The key takeaway: sometimes the best way to tame growing complexity is to step back, understand the core requirements, and rebuild on better foundations. Our AI-assisted development approach made this refactoring both efficient and comprehensive, resulting in a system that serves the IT-Journey community far better than either of its predecessors did.