
context-evaluator

AI agent documentation quality analyzer for AGENTS.md and CLAUDE.md files.

This tool evaluates your AI agent instruction files using 17 specialized evaluators to identify issues and improvement opportunities. It helps ensure your documentation provides clear, actionable guidance for AI coding assistants.

An experimental project from Packmind.


How to Use

Option 1: Online (No Install)

Visit https://context-evaluator.ai and paste your repository URL.

Option 2: Local

# Clone and install
git clone https://github.com/PackmindHub/context-evaluator.git
cd context-evaluator
bun install

# Start the application
bun run dev

Then open http://localhost:3000 in your browser.

Local Scanning Notes

  • Git clone operations run on your local machine
  • Private repositories may work if your git credentials are configured (SSH keys, credential helpers); see the example below
  • The homepage auto-detects which AI agents you have installed
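
If a private repository fails to clone, first confirm that plain git can reach it with your existing credentials. This is standard git setup, independent of context-evaluator; the repository URL below is a placeholder.

# Check SSH access to GitHub
ssh -T git@github.com

# Or configure a credential helper for HTTPS
git config --global credential.helper store

# Confirm the clone works on its own before running the evaluator
git clone git@github.com:your-org/private-repo.git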

How it Works

Input (Git URL or Local Path)
    ↓
Clone Repository (if remote)
    ↓
Analyze Codebase (languages, frameworks, patterns)
    ↓
Find Documentation (AGENTS.md, CLAUDE.md, linked files)
    ↓
Run 17 Evaluators via AI
    ↓
Rank by Impact
    ↓
Calculate Score & Grade
    ↓
Return Results

Processing time: 1-3 minutes depending on codebase size and AI provider.

Cost display: Shows API costs when supported by the provider.


Understanding Results

Results are categorized into two types:

  • Errors (13 evaluators): Issues with existing content that need fixing
  • Suggestions (4 evaluators): Opportunities for new content based on codebase analysis

Each issue includes:

  • Severity level (Critical, High, Medium, Low)
  • Location in your documentation
  • Problem description
  • Recommended fix
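
The exact output schema is not documented in this README, but purely as an illustration, a single issue in the JSON results file might carry this information (field names are hypothetical):

# Hypothetical entry in evaluator-results.json (illustrative field names only)
{
  "evaluator": "Command Completeness",
  "severity": "High",
  "location": "AGENTS.md > Setup section",
  "problem": "The install command does not mention the required runtime version",
  "fix": "Document the Bun version prerequisite before the install step"
}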

Evaluators

| # | Evaluator | Type | Description |
|---|-----------|------|-------------|
| 01 | Content Quality | Error | Detects human-focused, irrelevant, or vague content |
| 02 | Structure & Formatting | Error | Identifies poor organization and inconsistent formatting |
| 03 | Command Completeness | Error | Finds incomplete commands and missing prerequisites |
| 04 | Testing Guidance | Error | Detects absent or unclear testing instructions |
| 05 | Code Style Clarity | Error | Identifies missing or conflicting style guidelines |
| 06 | Language Clarity | Error | Finds ambiguous language and undefined jargon |
| 07 | Workflow Integration | Error | Detects missing git/CI workflow documentation |
| 08 | Project Structure | Error | Identifies missing codebase organization explanations |
| 09 | Security Awareness | Error | Finds exposed credentials and security risks |
| 10 | Completeness & Balance | Error | Detects skeletal or over-detailed content |
| 11 | Subdirectory Coverage | Suggestion | Recommends separate AGENTS.md for subdirectories |
| 12 | Context Gaps | Suggestion | Discovers undocumented framework/tool patterns |
| 13 | Contradictory Instructions | Error | Detects conflicting instructions across files |
| 14 | Test Patterns Coverage | Suggestion | Discovers undocumented testing conventions |
| 15 | Database Patterns Coverage | Suggestion | Discovers undocumented database/ORM patterns |
| 17 | Markdown Validity | Error | Checks markdown syntax and link validity |
| 19 | Outdated Documentation | Error | Verifies documented paths and files exist |

AI Providers

The tool supports multiple AI providers:

| Provider | CLI Flag | Setup |
|----------|----------|-------|
| Claude Code | `--agent claude` (default) | claude.ai/code |
| Cursor Agent | `--agent cursor` | cursor.com |
| OpenCode | `--agent opencode` | github.com/opencode-ai/opencode |
| GitHub Copilot | `--agent github-copilot` | docs.github.com/copilot |

CLI Reference

Basic Usage

# Evaluate current directory
bun run evaluate

# Evaluate a remote repository
bun run evaluate --url https://github.com/user/repo

# Evaluate a local directory
bun run evaluate --path /path/to/project

Evaluate Command Options

| Option | Description | Default |
|--------|-------------|---------|
| `--url <github-url>` | GitHub repository URL to clone and evaluate | - |
| `--path <directory>` | Local directory path (absolute or relative) | Current directory |
| `--agent <name>` | AI provider: claude, cursor, opencode, github-copilot | claude |
| `-o, --output <file>` | Output file path for results | evaluator-results.json |
| `--report <mode>` | Output format: terminal, raw, json | terminal |
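
For example, the basic options can be combined freely; the path and file names below are placeholders:

# Evaluate a local project with OpenCode and write JSON results to a custom file
bun run evaluate --path ./my-project --agent opencode --report json -o my-results.json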

Evaluation Scope:

| Option | Description | Default |
|--------|-------------|---------|
| `--evaluators <number>` | Number of evaluators to run | 12 |
| `--evaluator-filter <type>` | Filter: all (19), errors (14), suggestions (5) | all |
| `--depth <integer>` | Limit directory depth for context file search (0 = root only) | Unlimited |
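
For example, to keep a run small and fast (the evaluator count is arbitrary, and depth 0 restricts the context file search to the repository root):

# Run at most 5 evaluators and only look for context files at the root
bun run evaluate --evaluators 5 --depth 0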

Evaluation Mode:

| Option | Description |
|--------|-------------|
| `--unified` | All files evaluated together (better cross-file detection) |
| `--independent` | Each file evaluated separately |
| `--max-tokens <number>` | Maximum combined tokens for unified mode (default: 100000) |
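
For example (the token budget shown is arbitrary):

# Evaluate all documentation files together with a larger token budget
bun run evaluate --unified --max-tokens 150000

# Or evaluate each file separately
bun run evaluate --independent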

Results:

| Option | Description | Default |
|--------|-------------|---------|
| `--no-curation` | Show all issues without impact prioritization | Curation enabled |
| `--top-n <number>` | Number of top issues to curate | 20 |
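
For example (the issue count is arbitrary):

# Show every issue without curation
bun run evaluate --no-curation

# Or curate only the 5 highest-impact issues
bun run evaluate --top-n 5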

Debug:

| Option | Description |
|--------|-------------|
| `-v, --verbose` | Enable verbose output |
| `--debug` | Save prompts/responses to debug-output/ directory |
| `--preserve-debug-output` | Keep debug files after successful evaluation |
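
For example, to inspect exactly what was sent to and returned by the AI provider:

# Verbose run that keeps prompts and responses in debug-output/ after it finishes
bun run evaluate -v --debug --preserve-debug-output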

Examples

# Run all error evaluators only
bun run evaluate --evaluator-filter errors

# Evaluate with verbose output and top 10 issues
bun run evaluate -v --top-n 10

# Evaluate remote repo with JSON output
bun run evaluate --url https://github.com/user/repo --report json -o report.json

# Use Cursor agent with unified mode
bun run evaluate --agent cursor --unified

Contributing

See CONTRIBUTING.md for development setup, architecture details, API reference, and contribution guidelines.


About

Built with Bun, React, Tailwind CSS, and TypeScript.

License: MIT

Issues & Feedback: GitHub Issues
