AI agent documentation quality analyzer for AGENTS.md and CLAUDE.md files.
This tool evaluates your AI agent instruction files using 17 specialized evaluators to identify issues and improvement opportunities. It helps ensure your documentation provides clear, actionable guidance for AI coding assistants.
An experimental project from Packmind.
Visit https://context-evaluator.ai and paste your repository URL.
# Clone and install
git clone https://github.com/PackmindHub/context-evaluator.git
cd context-evaluator
bun install
# Start the application
bun run dev

Then open http://localhost:3000 in your browser.
- Git clone operations run on your local machine
- Private repositories may work if your git credentials are configured (SSH keys, credential helpers); a quick check is shown after this list
- The homepage auto-detects which AI agents you have installed
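For example, you can confirm that a private repository is reachable with your local credentials before running an evaluation. This is a generic git check, not part of the tool; the URL below is a placeholder, assuming SSH access to your host:

# Verify that git can reach the private repository (placeholder URL)
git ls-remote git@github.com:your-org/your-private-repo.git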
Input (Git URL or Local Path)
↓
Clone Repository (if remote)
↓
Analyze Codebase (languages, frameworks, patterns)
↓
Find Documentation (AGENTS.md, CLAUDE.md, linked files)
↓
Run 17 Evaluators via AI
↓
Rank by Impact
↓
Calculate Score & Grade
↓
Return Results
Processing time: 1-3 minutes depending on codebase size and AI provider.
Cost display: Shows API costs when supported by the provider.
Results are categorized into two types:
- Errors (13 evaluators): Issues with existing content that need fixing
- Suggestions (4 evaluators): Opportunities for new content based on codebase analysis
Each issue includes:
- Severity level (Critical, High, Medium, Low)
- Location in your documentation
- Problem description
- Recommended fix
| # | Evaluator | Type | Description |
|---|---|---|---|
| 01 | Content Quality | Error | Detects human-focused, irrelevant, or vague content |
| 02 | Structure & Formatting | Error | Identifies poor organization and inconsistent formatting |
| 03 | Command Completeness | Error | Finds incomplete commands and missing prerequisites |
| 04 | Testing Guidance | Error | Detects absent or unclear testing instructions |
| 05 | Code Style Clarity | Error | Identifies missing or conflicting style guidelines |
| 06 | Language Clarity | Error | Finds ambiguous language and undefined jargon |
| 07 | Workflow Integration | Error | Detects missing git/CI workflow documentation |
| 08 | Project Structure | Error | Identifies missing codebase organization explanations |
| 09 | Security Awareness | Error | Finds exposed credentials and security risks |
| 10 | Completeness & Balance | Error | Detects skeletal or over-detailed content |
| 11 | Subdirectory Coverage | Suggestion | Recommends separate AGENTS.md for subdirectories |
| 12 | Context Gaps | Suggestion | Discovers undocumented framework/tool patterns |
| 13 | Contradictory Instructions | Error | Detects conflicting instructions across files |
| 14 | Test Patterns Coverage | Suggestion | Discovers undocumented testing conventions |
| 15 | Database Patterns Coverage | Suggestion | Discovers undocumented database/ORM patterns |
| 17 | Markdown Validity | Error | Checks markdown syntax and link validity |
| 19 | Outdated Documentation | Error | Verifies documented paths and files exist |
The tool supports multiple AI providers:
| Provider | CLI Flag | Setup |
|---|---|---|
| Claude Code | --agent claude (default) | claude.ai/code |
| Cursor Agent | --agent cursor | cursor.com |
| OpenCode | --agent opencode | github.com/opencode-ai/opencode |
| GitHub Copilot | --agent github-copilot | docs.github.com/copilot |
# Evaluate current directory
bun run evaluate
# Evaluate a remote repository
bun run evaluate --url https://github.com/user/repo
# Evaluate a local directory
bun run evaluate --path /path/to/project

| Option | Description | Default |
|---|---|---|
| --url <github-url> | GitHub repository URL to clone and evaluate | - |
| --path <directory> | Local directory path (absolute or relative) | Current directory |
| --agent <name> | AI provider: claude, cursor, opencode, github-copilot | claude |
| -o, --output <file> | Output file path for results | evaluator-results.json |
| --report <mode> | Output format: terminal, raw, json | terminal |
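Combining these options, a run against a local project with a custom output file might look like the following (the path and file name are placeholders):

# Evaluate a sibling directory with Cursor and write JSON results to a custom file
bun run evaluate --path ../my-project --agent cursor --report json -o my-results.json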
Evaluation Scope:
| Option | Description | Default |
|---|---|---|
| --evaluators <number> | Number of evaluators to run | 12 |
| --evaluator-filter <type> | Filter: all (19), errors (14), suggestions (5) | all |
| --depth <integer> | Limit directory depth for context file search (0 = root only) | Unlimited |
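For instance, to run only the suggestion evaluators and restrict the context file search to the repository root (values taken from the table above):

# Suggestions only, context files from the repository root
bun run evaluate --evaluator-filter suggestions --depth 0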
Evaluation Mode:
| Option | Description |
|---|---|
| --unified | All files evaluated together (better cross-file detection) |
| --independent | Each file evaluated separately |
| --max-tokens <number> | Maximum combined tokens for unified mode (default: 100000) |
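For example, to force unified evaluation with a larger combined-token budget (the right limit depends on your provider's context window):

# Evaluate all context files together with a higher token ceiling
bun run evaluate --unified --max-tokens 150000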
Results:
| Option | Description | Default |
|---|---|---|
| --no-curation | Show all issues without impact prioritization | Curation enabled |
| --top-n <number> | Number of top issues to curate | 20 |
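For example, to list every reported issue instead of only the top-ranked ones:

# Disable impact curation and show all issues
bun run evaluate --no-curation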
Debug:
| Option | Description |
|---|---|
| -v, --verbose | Enable verbose output |
| --debug | Save prompts/responses to debug-output/ directory |
| --preserve-debug-output | Keep debug files after successful evaluation |
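A debugging run that keeps the prompt and response dumps for later inspection might look like this:

# Verbose run that saves debug artifacts to debug-output/ and keeps them afterwards
bun run evaluate -v --debug --preserve-debug-output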
# Run all error evaluators only
bun run evaluate --evaluator-filter errors
# Evaluate with verbose output and top 10 issues
bun run evaluate -v --top-n 10
# Evaluate remote repo with JSON output
bun run evaluate --url https://github.com/user/repo --report json -o report.json
# Use Cursor agent with unified mode
bun run evaluate --agent cursor --unified

See CONTRIBUTING.md for development setup, architecture details, API reference, and contribution guidelines.
Built with Bun, React, Tailwind CSS, and TypeScript.
License: MIT
Issues & Feedback: GitHub Issues