From 944d4d8ad5a5fe603dc86d8191d8776006b25f14 Mon Sep 17 00:00:00 2001 From: Drew Miller Date: Sun, 8 Feb 2026 12:31:43 -0500 Subject: [PATCH 1/3] Rewrite workflows:review with phased file-based map-reduce architecture MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Replace the v1 review command with a context-managed v2 that prevents context overflow by having sub-agents write findings to .review/ on disk and return only single-sentence summaries to the parent. Key changes: - Add PR Intent Analysis phase (shared context for all specialists) - Agents write JSON to .review/, return ~100 tokens each to parent - Add validation step to catch silent agent failures and invalid JSON - Add Judge phase for dedup, hallucination removal, and ranking - Add Deep Analysis phase for P1/P2 enrichment - Parent reads only ENRICHED_FINDINGS.json (~8k tokens vs ~30-50k in v1) - Smart agent selection based on PR content instead of running all agents - Cross-platform safe: uses project-relative .review/ instead of /tmp/ 🤖 Generated with Claude Code Co-Authored-By: Claude --- .../commands/workflows/review.md | 795 +++++++++--------- 1 file changed, 398 insertions(+), 397 deletions(-) diff --git a/plugins/compound-engineering/commands/workflows/review.md b/plugins/compound-engineering/commands/workflows/review.md index e6593034..4518b8d0 100644 --- a/plugins/compound-engineering/commands/workflows/review.md +++ b/plugins/compound-engineering/commands/workflows/review.md @@ -4,21 +4,33 @@ description: Perform exhaustive code reviews using multi-agent analysis, ultra-t argument-hint: "[PR number, GitHub URL, branch name, or latest]" --- -# Review Command +# Review Command (v2 — Context-Managed) - Perform exhaustive code reviews using multi-agent analysis, ultra-thinking, and Git worktrees for deep local inspection. +Perform exhaustive code reviews using multi-agent analysis with file-based synthesis to prevent context overflow. ## Introduction Senior Code Review Architect with expertise in security, performance, architecture, and quality assurance +## Architecture: Why This Command Works + +This command uses the **Phased File-Based Map-Reduce** pattern — optimized for review quality while staying within context limits: + +1. **Intent Phase** (sequential) — A single agent analyzes the PR and produces a shared intent summary + architectural context. All specialists receive this so they start from the same understanding. +2. **Map Phase** (parallel) — Specialist sub-agents write full analysis to `.review/`, return only a single sentence to the parent (~100 tokens each vs 2k-4k) +3. **Validate** — Verifies all expected agent files exist (re-runs missing agents), then checks JSON schema compliance +4. **Judge Phase** — A judge agent deduplicates, evidence-checks (discards hallucinated findings), ranks, and selects top-N +5. **Deep Analysis Phase** — A sub-agent performs stakeholder impact and scenario analysis on P1/P2 findings, enriching the judged report +6. **Reduce Phase** — The parent reads only `ENRICHED_FINDINGS.json` (~8k tokens max), then spawns todo-creation agents that read the actual code to produce high-quality, actionable todos + +This keeps the parent context under ~12k tokens of agent output regardless of how many agents run, while maximizing analytical depth at every stage. 
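+
+A hypothetical end-of-run layout of `.review/` (names illustrative — the specialist files depend on which agents you launch):
+
+```bash
+# .review/ after a complete run (sketch, not an exhaustive listing)
+ls .review/
+# pr_meta.json   pr_diff.txt   PR_INTENT.md
+# architecture-strategist.json   security-sentinel.json   performance-oracle.json
+# JUDGED_FINDINGS.json   ENRICHED_FINDINGS.json
+```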
+ ## Prerequisites - Git repository with GitHub CLI (`gh`) installed and authenticated - Clean main/master branch - Proper permissions to create worktrees and access the repository -- For document reviews: Path to a markdown file or document ## Main Tasks @@ -27,500 +39,489 @@ argument-hint: "[PR number, GitHub URL, branch name, or latest]" #$ARGUMENTS - -First, I need to determine the review target type and set up the code for analysis. - - -#### Immediate Actions: - - - [ ] Determine review type: PR number (numeric), GitHub URL, file path (.md), or empty (current branch) - [ ] Check current git branch -- [ ] If ALREADY on the target branch (PR branch, requested branch name, or the branch already checked out for review) → proceed with analysis on current branch -- [ ] If DIFFERENT branch than the review target → offer to use worktree: "Use git-worktree skill for isolated Call `skill: git-worktree` with branch name -- [ ] Fetch PR metadata using `gh pr view --json` for title, body, files, linked issues -- [ ] Set up language-specific analysis tools -- [ ] Prepare security scanning environment -- [ ] Make sure we are on the branch we are reviewing. Use gh pr checkout to switch to the branch or manually checkout the branch. - -Ensure that the code is ready for analysis (either in worktree or on current branch). ONLY then proceed to the next step. - +- [ ] If ALREADY on the target branch → proceed with analysis on current branch +- [ ] If DIFFERENT branch → use git-worktree skill: `skill: git-worktree` with branch name +- [ ] Fetch PR metadata using `gh pr view --json title,body,files,labels,milestone` +- [ ] Get the diff: `gh pr diff ` — save to `.review/pr_diff.txt` +- [ ] Make sure we are on the branch we are reviewing -#### Protected Artifacts - - -The following paths are compound-engineering pipeline artifacts and must never be flagged for deletion, removal, or gitignore by any review agent: - -- `docs/plans/*.md` — Plan files created by `/workflows:plan`. These are living documents that track implementation progress (checkboxes are checked off by `/workflows:work`). -- `docs/solutions/*.md` — Solution documents created during the pipeline. - -If a review agent flags any file in these directories for cleanup or removal, discard that finding during synthesis. Do not create a todo for it. - - -#### Parallel Agents to review the PR: - - - -Run ALL or most of these agents at the same time: - -1. Task kieran-rails-reviewer(PR content) -2. Task dhh-rails-reviewer(PR title) -3. If turbo is used: Task rails-turbo-expert(PR content) -4. Task git-history-analyzer(PR content) -5. Task dependency-detective(PR content) -6. Task pattern-recognition-specialist(PR content) -7. Task architecture-strategist(PR content) -8. Task code-philosopher(PR content) -9. Task security-sentinel(PR content) -10. Task performance-oracle(PR content) -11. Task devops-harmony-analyst(PR content) -12. Task data-integrity-guardian(PR content) -13. Task agent-native-reviewer(PR content) - Verify new features are agent-accessible - - - -#### Conditional Agents (Run if applicable): - - - -These agents are run ONLY when the PR matches specific criteria. Check the PR files list to determine if they apply: - -**If PR contains database migrations (db/migrate/*.rb files) or data backfills:** - -14. Task data-migration-expert(PR content) - Validates ID mappings match production, checks for swapped values, verifies rollback safety -15. 
Task deployment-verification-agent(PR content) - Creates Go/No-Go deployment checklist with SQL verification queries - -**When to run migration agents:** -- PR includes files matching `db/migrate/*.rb` -- PR modifies columns that store IDs, enums, or mappings -- PR includes data backfill scripts or rake tasks -- PR changes how data is read/written (e.g., changing from FK to string column) -- PR title/body mentions: migration, backfill, data transformation, ID mapping - -**What these agents check:** -- `data-migration-expert`: Verifies hard-coded mappings match production reality (prevents swapped IDs), checks for orphaned associations, validates dual-write patterns -- `deployment-verification-agent`: Produces executable pre/post-deploy checklists with SQL queries, rollback procedures, and monitoring plans - - - -### 4. Ultra-Thinking Deep Dive Phases - - For each phase below, spend maximum cognitive effort. Think step by step. Consider all angles. Question assumptions. And bring all reviews in a synthesis to the user. - - -Complete system context map with component interactions - +### 2. Prepare the Scratchpad Directory -#### Phase 3: Stakeholder Perspective Analysis + +Use a project-relative path, NOT /tmp/. The /tmp/ path causes two problems: +1. Claude Code's Read tool and MCP filesystem tools cannot access /tmp/ (outside allowed directories) +2. On Windows, /tmp/ resolves to different locations depending on the subprocess (MSYS2 vs literal C:\tmp), splitting agent files across directories - ULTRA-THINK: Put yourself in each stakeholder's shoes. What matters to them? What are their pain points? +Using .review/ inside the project avoids both issues. All tools (Read, Write, Bash, MCP) can access project-relative paths reliably. + - - -1. **Developer Perspective** - - - How easy is this to understand and modify? - - Are the APIs intuitive? - - Is debugging straightforward? - - Can I test this easily? - -2. **Operations Perspective** - - - How do I deploy this safely? - - What metrics and logs are available? - - How do I troubleshoot issues? - - What are the resource requirements? +```bash +# Create the review session directory (project-relative, cross-platform safe) +REVIEW_DIR=".review" +rm -rf "$REVIEW_DIR" +mkdir -p "$REVIEW_DIR" +echo "$REVIEW_DIR/" >> .gitignore 2>/dev/null # Ensure it's gitignored + +# Save PR metadata for agents to reference +gh pr view --json title,body,files > "$REVIEW_DIR/pr_meta.json" +gh pr diff > "$REVIEW_DIR/pr_diff.txt" +``` -3. **End User Perspective** +Each sub-agent will READ from `$REVIEW_DIR/pr_diff.txt` and WRITE its findings to `$REVIEW_DIR/{agent_name}.json`. - - Is the feature intuitive? - - Are error messages helpful? - - Is performance acceptable? - - Does it solve my problem? +All references below use `.review/` — this is the ONLY path agents should use. Do not use `/tmp/` anywhere. -4. **Security Team Perspective** +### 3. PR Intent Analysis (Phase 0 — Sequential) - - What's the attack surface? - - Are there compliance requirements? - - How is data protected? - - What are the audit capabilities? + +Run this BEFORE launching specialist agents. This produces the shared context that all specialists receive, ensuring they start from the same understanding and spend their reasoning on domain-specific analysis rather than independently re-deriving PR intent. + -5. **Business Perspective** - - What's the ROI? - - Are there legal/compliance risks? - - How does this affect time-to-market? - - What's the total cost of ownership? 
+``` +Task pr-intent-analyzer(" +You are a PR Intent Analyzer. Your job is to produce a concise, high-signal summary that specialist review agents will use as shared context. + +## Instructions: +1. Read .review/pr_diff.txt and .review/pr_meta.json +2. Write your analysis to .review/PR_INTENT.md (max 500 words) +3. Your analysis MUST cover: + - **Intent**: What is this PR trying to accomplish? (2-3 sentences) + - **Approach**: How does it achieve this? Key architectural decisions made. (3-5 sentences) + - **Scope**: Files changed, components affected, boundaries of the change + - **Risk Surface**: Which areas carry the most risk — new code, refactored paths, changed interfaces, data model changes + - **Testing Gap**: What's tested vs what's not, based on test files in the diff +4. Return to parent: 'Intent analysis complete. Written to .review/PR_INTENT.md' +") +``` -#### Phase 4: Scenario Exploration +Wait for this to complete before launching specialists. The ~30 second cost pays for itself in higher-quality, less-redundant specialist findings. - ULTRA-THINK: Explore edge cases and failure scenarios. What could go wrong? How does the system behave under stress? +### 4. Launch Review Agents (Map Phase) - + +EVERY sub-agent prompt MUST include these output constraints AND the shared intent context. This is what prevents context overflow while maximizing quality. -- [ ] **Happy Path**: Normal operation with valid inputs -- [ ] **Invalid Inputs**: Null, empty, malformed data -- [ ] **Boundary Conditions**: Min/max values, empty collections -- [ ] **Concurrent Access**: Race conditions, deadlocks -- [ ] **Scale Testing**: 10x, 100x, 1000x normal load -- [ ] **Network Issues**: Timeouts, partial failures -- [ ] **Resource Exhaustion**: Memory, disk, connections -- [ ] **Security Attacks**: Injection, overflow, DoS -- [ ] **Data Corruption**: Partial writes, inconsistency -- [ ] **Cascading Failures**: Downstream service issues +Append this to EVERY agent spawn prompt: -### 6. Multi-Angle Review Perspectives +``` +## SHARED CONTEXT +Read .review/PR_INTENT.md first for PR intent, approach, and risk surface. Do not re-derive what the PR does — focus your full reasoning on your specialist domain. + +## OUTPUT RULES (MANDATORY) +1. Read the diff from .review/pr_diff.txt +2. Write your FULL analysis as JSON to .review/{your_agent_name}.json +3. Use this EXACT schema for findings (hard caps enforced): + { + "agent_type": "security|performance|architecture|rails|patterns|simplicity|...", + "summary": "<500 chars max — your key takeaway>", + "findings_count": , + "critical_count": , + "findings": [ + { + "severity": "p1|p2|p3", + "category": "<50 chars>", + "title": "<100 chars>", + "location": "path/to/file.ext:line_number", + "description": "<300 chars>", + "recommendation": "<200 chars>", + "confidence": 0.0-1.0 + } + ] + } +4. Max 10 findings per agent. Prioritize by severity. No prose analysis fields. +5. Only report findings with confidence >= 0.6. Quality over quantity. +6. EXCLUDED PATHS — do not analyze or report findings for: + - docs/plans/*.md + - docs/solutions/*.md +7. Every finding MUST reference a real file path and line number from the diff. Do not report findings without specific location evidence. +8. Return ONLY this to the parent (do NOT return the full analysis): + "Review complete. Wrote findings ( critical) to .review/{agent_name}.json. 
Key takeaway: <1 sentence>" +``` + -#### Technical Excellence Angle +#### Agent Selection — Choose Based on PR Content -- Code craftsmanship evaluation -- Engineering best practices -- Technical documentation quality -- Tooling and automation assessment + +Do NOT run all agents on every PR. Examine the PR file list and select relevant agents: -#### Business Value Angle +**Always Run (Core — 5 agents):** +1. `architecture-strategist` — structural patterns, coupling, abstractions, component boundaries +2. `security-sentinel` — vulnerabilities, auth, input validation, injection +3. `performance-oracle` — N+1 queries, memory leaks, scaling concerns, resource exhaustion +4. `code-simplicity-reviewer` — unnecessary complexity, dead code, simplification opportunities +5. `pattern-recognition-specialist` — anti-patterns, inconsistencies, convention violations -- Feature completeness validation -- Performance impact on users -- Cost-benefit analysis -- Time-to-market considerations +**Run If Applicable:** +- `kieran-rails-reviewer` — only if PR has `.rb` files +- `dhh-rails-reviewer` — only if PR has `.rb` files +- `rails-turbo-expert` — only if PR uses Turbo/Stimulus +- `agent-native-reviewer` — only if PR adds user-facing features +- `dependency-detective` — only if PR modifies package files (Gemfile, package.json, etc.) +- `devops-harmony-analyst` — only if PR touches CI/CD, Docker, or deployment configs +- `data-integrity-guardian` — only if PR modifies database queries or data flows -#### Risk Management Angle +**Migration-Specific (only for db/migrate/*.rb files):** +- `data-migration-expert` — validates ID mappings, rollback safety +- `deployment-verification-agent` — Go/No-Go deployment checklist + -- Security risk assessment -- Operational risk evaluation -- Compliance risk verification -- Technical debt accumulation +#### Launch Pattern -#### Team Dynamics Angle + +Launch selected agents in parallel. Each agent reads the diff from the scratchpad and writes its findings there. -- Code review etiquette -- Knowledge sharing effectiveness -- Collaboration patterns -- Mentoring opportunities +Example for a TypeScript PR (no Rails, no migrations): -### 4. Simplification and Minimalism Review +``` +Task architecture-strategist("Review PR #49. . " + SHARED_CONTEXT + OUTPUT_RULES) +Task security-sentinel("Review PR #49. . " + SHARED_CONTEXT + OUTPUT_RULES) +Task performance-oracle("Review PR #49. . " + SHARED_CONTEXT + OUTPUT_RULES) +Task code-simplicity-reviewer("Review PR #49. . " + SHARED_CONTEXT + OUTPUT_RULES) +Task pattern-recognition-specialist("Review PR #49. . " + SHARED_CONTEXT + OUTPUT_RULES) +``` -Run the Task code-simplicity-reviewer() to see if we can simplify the code. +Example for a Rails PR with migrations: -### 5. Findings Synthesis and Todo Creation Using file-todos Skill +``` +Task kieran-rails-reviewer("Review PR #49. . " + SHARED_CONTEXT + OUTPUT_RULES) +Task security-sentinel("Review PR #49. . " + SHARED_CONTEXT + OUTPUT_RULES) +Task performance-oracle("Review PR #49. . " + SHARED_CONTEXT + OUTPUT_RULES) +Task architecture-strategist("Review PR #49. . " + SHARED_CONTEXT + OUTPUT_RULES) +Task data-migration-expert("Review PR #49. . " + SHARED_CONTEXT + OUTPUT_RULES) +``` - ALL findings MUST be stored in the todos/ directory using the file-todos skill. Create todo files immediately after synthesis - do NOT present findings for user approval first. Use the skill for structured todo management. +Wait for ALL agents to complete. + -#### Step 1: Synthesize All Findings +### 5. 
Verify and Validate Agent Outputs - -Consolidate all agent reports into a categorized list of findings. -Remove duplicates, prioritize by severity and impact. - + +Two checks run before the judge: (1) verify every launched agent actually produced a file, (2) validate the JSON schema. This catches the silent-failure bug where an agent reports completion but its file is missing. + - +#### Step 5a: Verify All Expected Files Exist -- [ ] Collect findings from all parallel agents -- [ ] Discard any findings that recommend deleting or gitignoring files in `docs/plans/` or `docs/solutions/` (see Protected Artifacts above) -- [ ] Categorize by type: security, performance, architecture, quality, etc. -- [ ] Assign severity levels: 🔴 CRITICAL (P1), 🟡 IMPORTANT (P2), 🔵 NICE-TO-HAVE (P3) -- [ ] Remove duplicate or overlapping findings -- [ ] Estimate effort for each finding (Small/Medium/Large) +```bash +# List expected agent files based on which agents were launched +# (adjust this list to match the agents you actually spawned in Step 4) +EXPECTED_AGENTS="architecture security performance simplicity patterns" # or whatever you launched + +MISSING="" +for agent in $EXPECTED_AGENTS; do + if ! ls .review/${agent}*.json 1>/dev/null 2>&1; then + MISSING="$MISSING $agent" + echo "MISSING: $agent — no output file found in .review/" + fi +done + +if [ -n "$MISSING" ]; then + echo "⚠️ Missing agent files:$MISSING" + echo "Re-run these agents before proceeding to judge." +fi +``` - +**If any agent file is missing:** Re-launch that specific agent before proceeding. Do not skip to the judge — silent agent failures caused 10 findings (including 2 P1s) to be lost in testing. -#### Step 2: Create Todo Files Using file-todos Skill +#### Step 5b: Validate JSON Schema - Use the file-todos skill to create todo files for ALL findings immediately. Do NOT present findings one-by-one asking for user approval. Create all todo files in parallel using the skill, then summarize results to user. +```bash +# Validate all agent output files +PR_FILES=$(cat .review/pr_meta.json | python3 -c "import json,sys; [print(f['path']) for f in json.load(sys.stdin)['files']]" 2>/dev/null || echo "") + +for f in .review/*.json; do + [[ "$(basename "$f")" == "pr_meta.json" ]] && continue + python3 -c " +import json, sys +try: + data = json.load(open('$f')) + assert 'findings' in data, 'missing findings key' + assert isinstance(data['findings'], list), 'findings not a list' + assert len(data['findings']) <= 10, f'too many findings: {len(data[\"findings\"])}' + for i, finding in enumerate(data['findings']): + assert finding.get('severity') in ('p1','p2','p3'), f'finding {i}: bad severity' + assert finding.get('location'), f'finding {i}: missing location' + print('VALID:', '$(basename $f)', '-', len(data['findings']), 'findings') +except Exception as e: + print('INVALID:', '$(basename $f)', '-', str(e), '— removing from review') + import os; os.remove('$f') +" 2>&1 +done +``` -**Implementation Options:** +If an agent produced invalid output, it's removed before judging. The review continues with the valid agent outputs. -**Option A: Direct File Creation (Fast)** +**Checkpoint:** At this point, every launched agent should have a valid JSON file in `.review/`. If not, re-run the missing/invalid agents before proceeding. 
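+
+One way to make this checkpoint a hard gate (a sketch — it reuses the EXPECTED_AGENTS list from Step 5a):
+
+```bash
+# Hard gate: refuse to launch the judge while any expected agent file is missing
+for agent in $EXPECTED_AGENTS; do
+  ls .review/${agent}*.json >/dev/null 2>&1 || { echo "BLOCKED: $agent output still missing"; exit 1; }
+done
+echo "All expected agent files present — safe to launch the judge."
+```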
-- Create todo files directly using Write tool -- All findings in parallel for speed -- Use standard template from `.claude/skills/file-todos/assets/todo-template.md` -- Follow naming convention: `{issue_id}-pending-{priority}-{description}.md` +### 6. Judge Phase — Deduplicate and Rank -**Option B: Sub-Agents in Parallel (Recommended for Scale)** For large PRs with 15+ findings, use sub-agents to create finding files in parallel: + +Do NOT read individual agent JSON files into your context. Launch a JUDGE agent that reads them in its own context window. + -```bash -# Launch multiple finding-creator agents in parallel -Task() - Create todos for first finding -Task() - Create todos for second finding -Task() - Create todos for third finding -etc. for each finding. +``` +Task judge-findings(" +You are a Code Review Judge. Consolidate findings from multiple review agents into a single, deduplicated, ranked report. + +## Instructions: +1. Read ALL JSON files in .review/*.json (skip pr_meta.json and pr_diff.txt) +2. Read .review/pr_meta.json to get the list of files actually in this PR +3. Collect all findings across agents +4. EVIDENCE CHECK: Discard any finding whose 'location' field does not reference a file path present in pr_meta.json. These are hallucinated findings. +5. DEDUPLICATE: Remove semantically similar findings (keep the higher-confidence one) +6. RESOLVE CONFLICTS: If agents contradict each other, note the conflict and pick the more evidence-based finding +7. RANK by: severity (p1 > p2 > p3), then confidence score +8. SELECT top 15 findings maximum. Only include findings with confidence >= 0.6. +9. For each P1 finding, add a brief 'impact' field (<200 chars) assessing: deployment risk, user impact, and attack surface. +10. Write the consolidated report to .review/JUDGED_FINDINGS.json using this schema: + +{ + \"pr_number\": , + \"total_raw_findings\": , + \"duplicates_removed\": , + \"hallucinated_removed\": , + \"conflicts_resolved\": , + \"final_findings\": [ + { + \"id\": 1, + \"severity\": \"p1|p2|p3\", + \"category\": \"<50 chars>\", + \"title\": \"<100 chars>\", + \"location\": \"path/to/file.ext:line\", + \"description\": \"<300 chars>\", + \"recommendation\": \"<200 chars>\", + \"confidence\": 0.0-1.0, + \"source_agents\": [\"agent1\", \"agent2\"], + \"effort\": \"small|medium|large\", + \"impact\": \"<200 chars, P1 only, null for P2/P3>\" + } + ], + \"agent_summaries\": [ + {\"agent\": \"name\", \"summary\": \"\"} + ] +} + +11. Return to parent: 'Judging complete. raw findings → after dedup ( hallucinated removed). P1: , P2: , P3: . Report at .review/JUDGED_FINDINGS.json' +") ``` -Sub-agents can: - -- Process multiple findings simultaneously -- Write detailed todo files with all sections filled -- Organize findings by severity -- Create comprehensive Proposed Solutions -- Add acceptance criteria and work logs -- Complete much faster than sequential processing - -**Execution Strategy:** +### 7. Deep Analysis Phase (P1/P2 Enrichment) -1. Synthesize all findings into categories (P1/P2/P3) -2. Group findings by severity -3. Launch 3 parallel sub-agents (one per severity level) -4. Each sub-agent creates its batch of todos using the file-todos skill -5. Consolidate results and present summary + +After judging, spawn a deep-analysis agent to enrich findings. This runs in its own context window so it doesn't cost the parent anything, but it produces significantly richer findings for todo creation. 
+ -**Process (Using file-todos Skill):** +``` +Task deep-analysis(" +You are a Deep Analysis Reviewer. Your job is to enrich judged code review findings with stakeholder impact analysis and scenario exploration. + +## Instructions: +1. Read .review/JUDGED_FINDINGS.json +2. Read .review/pr_diff.txt for code context +3. Read .review/PR_INTENT.md for PR context +4. For each P1 finding: + - Add 'stakeholder_impact' field: assess Developer experience, Operations risk, End User impact, Security surface (each 1-2 sentences, <400 chars total) + - Add 'scenarios' field: list 2-3 specific failure scenarios with trigger conditions (<300 chars total) + - Add 'suggested_fix' field: a concrete code-level suggestion for resolution (<500 chars) +5. For each P2 finding: + - Add 'stakeholder_impact' field (<200 chars) + - Add 'suggested_fix' field (<300 chars) +6. P3 findings: pass through unchanged +7. Write the enriched report to .review/ENRICHED_FINDINGS.json (same schema as JUDGED_FINDINGS.json, plus the new fields) +8. Return to parent: 'Deep analysis complete. Enriched P1 and P2 findings. Written to .review/ENRICHED_FINDINGS.json' +") +``` -1. For each finding: +### 8. Synthesis — Read ONLY the Enriched Findings - - Determine severity (P1/P2/P3) - - Write detailed Problem Statement and Findings - - Create 2-3 Proposed Solutions with pros/cons/effort/risk - - Estimate effort (Small/Medium/Large) - - Add acceptance criteria and work log + +NOW read `.review/ENRICHED_FINDINGS.json` — this is the ONLY agent output file you read into your context. This file is pre-deduplicated, evidence-checked, ranked, enriched with impact analysis, and capped at ~15 findings. + -2. Use file-todos skill for structured todo management: +Read the enriched findings file and proceed to create todos. - ```bash - skill: file-todos - ``` +### 9. Create Todo Files - The skill provides: + +ALL findings from ENRICHED_FINDINGS.json MUST be stored in the todos/ directory. Create todo files immediately — do NOT present findings for user approval first. + - - Template location: `.claude/skills/file-todos/assets/todo-template.md` - - Naming convention: `{issue_id}-{status}-{priority}-{description}.md` - - YAML frontmatter structure: status, priority, issue_id, tags, dependencies - - All required sections: Problem Statement, Findings, Solutions, etc. +#### Execution: LLM-Powered Todo Creation (Sub-Agents) -3. Create todo files in parallel: +Todo creation benefits from LLM reasoning — agents can read the actual code at each finding location to write meaningful Problem Statements, generate real Proposed Solutions with pros/cons, and craft acceptance criteria specific to the codebase. - ```bash - {next_id}-pending-{priority}-{description}.md - ``` +Launch parallel sub-agents grouped by severity: -4. Examples: +``` +Task create-todos-p1(" +Create todo files in todos/ for these P1 findings from the code review. + +## Instructions: +1. Read .review/ENRICHED_FINDINGS.json — process only P1 findings +2. For each P1 finding, read the actual source code at the finding's location to understand full context +3. Create a todo file using the file-todos skill template: .claude/skills/file-todos/assets/todo-template.md +4. File naming: {id}-pending-p1-{slug}.md (id zero-padded to 3 digits, slug from title) +5. 
Fill in ALL sections with substantive content: + - Problem Statement: explain the issue in context of the actual code, not just the finding description + - Proposed Solutions: 2-3 real options with pros/cons/effort/risk based on the codebase + - Acceptance Criteria: specific, testable checklist items + - Work Log: initial entry with today's date +6. Tag all with 'code-review' plus relevant domain tags +") + +Task create-todos-p2(" +Create todo files in todos/ for these P2 findings from the code review. + +## Instructions: +1. Read .review/ENRICHED_FINDINGS.json — process only P2 findings +2. For each P2 finding, read the actual source code at the finding's location +3. Create todo files using the file-todos skill template +4. File naming: {id}-pending-p2-{slug}.md +5. Fill in all sections with substantive content +6. Tag all with 'code-review' plus relevant domain tags +") + +Task create-todos-p3(" +Create todo files in todos/ for these P3 findings from the code review. + +## Instructions: +1. Read .review/ENRICHED_FINDINGS.json — process only P3 findings +2. For each P3 finding, create a lighter-weight todo file +3. File naming: {id}-pending-p3-{slug}.md +4. Problem Statement and Recommendation are sufficient — Proposed Solutions can be brief +5. Tag all with 'code-review' plus relevant domain tags +") +``` - ``` - 001-pending-p1-path-traversal-vulnerability.md - 002-pending-p1-api-response-validation.md - 003-pending-p2-concurrency-limit.md - 004-pending-p3-unused-parameter.md - ``` +**File naming convention:** +``` +{issue_id}-pending-{priority}-{description}.md -5. Follow template structure from file-todos skill: `.claude/skills/file-todos/assets/todo-template.md` +Examples: +- 001-pending-p1-path-traversal-vulnerability.md +- 002-pending-p1-api-response-validation.md +- 003-pending-p2-concurrency-limit.md +- 004-pending-p3-unused-parameter.md +``` -**Todo File Structure (from template):** +**Todo file structure (from file-todos skill template):** Each todo must include: - - **YAML frontmatter**: status, priority, issue_id, tags, dependencies - **Problem Statement**: What's broken/missing, why it matters - **Findings**: Discoveries from agents with evidence/location - **Proposed Solutions**: 2-3 options, each with pros/cons/effort/risk -- **Recommended Action**: (Filled during triage, leave blank initially) -- **Technical Details**: Affected files, components, database changes +- **Recommended Action**: Leave blank initially +- **Technical Details**: Affected files, components - **Acceptance Criteria**: Testable checklist items -- **Work Log**: Dated record with actions and learnings -- **Resources**: Links to PR, issues, documentation, similar patterns - -**File naming convention:** - -``` -{issue_id}-{status}-{priority}-{description}.md - -Examples: -- 001-pending-p1-security-vulnerability.md -- 002-pending-p2-performance-optimization.md -- 003-pending-p3-code-cleanup.md -``` - -**Status values:** - -- `pending` - New findings, needs triage/decision -- `ready` - Approved by manager, ready to work -- `complete` - Work finished - -**Priority values:** - -- `p1` - Critical (blocks merge, security/data issues) -- `p2` - Important (should fix, architectural/performance) -- `p3` - Nice-to-have (enhancements, cleanup) +- **Work Log**: Dated record +- **Resources**: Links to PR, issues, documentation -**Tagging:** Always add `code-review` tag, plus: `security`, `performance`, `architecture`, `rails`, `quality`, etc. 
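+
+For orientation, a minimal sketch of what a creator agent might write (illustrative — the authoritative template is `.claude/skills/file-todos/assets/todo-template.md`, and the exact frontmatter syntax comes from it):
+
+```bash
+# Skeleton of a P1 todo file (sections abbreviated with "...")
+cat > todos/001-pending-p1-path-traversal-vulnerability.md <<'EOF'
+---
+status: pending
+priority: p1
+issue_id: "001"
+tags: [code-review, security]
+dependencies: []
+---
+## Problem Statement
+...
+## Proposed Solutions
+...
+## Acceptance Criteria
+- [ ] ...
+## Work Log
+- 2026-02-12: Created from code review finding (entry illustrative)
+EOF
+```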
+**Status values:** `pending` → `ready` → `complete` +**Priority values:** `p1` (blocks merge) | `p2` (should fix) | `p3` (nice-to-have) +**Tagging:** Always add `code-review` tag, plus relevant domain tags -#### Step 3: Summary Report +### 10. Summary Report -After creating all todo files, present comprehensive summary: +After creating all todo files, present: -````markdown +```markdown ## ✅ Code Review Complete -**Review Target:** PR #XXXX - [PR Title] **Branch:** [branch-name] +**Review Target:** PR #XXXX - [PR Title] +**Branch:** [branch-name] ### Findings Summary: - -- **Total Findings:** [X] -- **🔴 CRITICAL (P1):** [count] - BLOCKS MERGE -- **🟡 IMPORTANT (P2):** [count] - Should Fix -- **🔵 NICE-TO-HAVE (P3):** [count] - Enhancements +- **Total Raw Findings:** [X] (from [N] agents) +- **Hallucinated/Invalid Removed:** [Z] +- **After Dedup/Judging:** [Y] +- **🔴 P1 (BLOCKS MERGE):** [count] +- **🟡 P2 (Should Fix):** [count] +- **🔵 P3 (Nice-to-Have):** [count] ### Created Todo Files: -**P1 - Critical (BLOCKS MERGE):** - -- `001-pending-p1-{finding}.md` - {description} -- `002-pending-p1-{finding}.md` - {description} - -**P2 - Important:** +**P1 — Critical (BLOCKS MERGE):** +- `001-pending-p1-{finding}.md` — {description} -- `003-pending-p2-{finding}.md` - {description} -- `004-pending-p2-{finding}.md` - {description} +**P2 — Important:** +- `003-pending-p2-{finding}.md` — {description} -**P3 - Nice-to-Have:** +**P3 — Nice-to-Have:** +- `005-pending-p3-{finding}.md` — {description} -- `005-pending-p3-{finding}.md` - {description} - -### Review Agents Used: - -- kieran-rails-reviewer -- security-sentinel -- performance-oracle -- architecture-strategist -- agent-native-reviewer -- [other agents] +### Agents Used: +- [list of agents that ran] ### Next Steps: - -1. **Address P1 Findings**: CRITICAL - must be fixed before merge - - - Review each P1 todo in detail - - Implement fixes or request exemption - - Verify fixes before merging PR - -2. **Triage All Todos**: - ```bash - ls todos/*-pending-*.md # View all pending todos - /triage # Use slash command for interactive triage - ``` -```` - -3. **Work on Approved Todos**: - - ```bash - /resolve_todo_parallel # Fix all approved items efficiently - ``` - -4. **Track Progress**: - - Rename file when status changes: pending → ready → complete - - Update Work Log as you work - - Commit todos: `git add todos/ && git commit -m "refactor: add code review findings"` - -### Severity Breakdown: - -**🔴 P1 (Critical - Blocks Merge):** - -- Security vulnerabilities -- Data corruption risks -- Breaking changes -- Critical architectural issues - -**🟡 P2 (Important - Should Fix):** - -- Performance issues -- Significant architectural concerns -- Major code quality problems -- Reliability issues - -**🔵 P3 (Nice-to-Have):** - -- Minor improvements -- Code cleanup -- Optimization opportunities -- Documentation updates - +1. **Address P1 Findings** — must fix before merge +2. **Triage todos:** `ls todos/*-pending-*.md` +3. **Work on approved items:** `/resolve_todo_parallel` ``` -### 7. End-to-End Testing (Optional) +### 11. 
End-to-End Testing (Optional) -**First, detect the project type from PR files:** - | Indicator | Project Type | |-----------|--------------| -| `*.xcodeproj`, `*.xcworkspace`, `Package.swift` (iOS) | iOS/macOS | -| `Gemfile`, `package.json`, `app/views/*`, `*.html.*` | Web | -| Both iOS files AND web files | Hybrid (test both) | +| `*.xcodeproj`, `*.xcworkspace`, `Package.swift` | iOS/macOS | +| `Gemfile`, `package.json`, `app/views/*` | Web | +| Both | Hybrid | - - -After presenting the Summary Report, offer appropriate testing based on project type: - -**For Web Projects:** -```markdown -**"Want to run browser tests on the affected pages?"** -1. Yes - run `/test-browser` -2. No - skip -``` - -**For iOS Projects:** -```markdown -**"Want to run Xcode simulator tests on the app?"** -1. Yes - run `/xcode-test` -2. No - skip -``` - -**For Hybrid Projects (e.g., Rails + Hotwire Native):** -```markdown -**"Want to run end-to-end tests?"** -1. Web only - run `/test-browser` -2. iOS only - run `/xcode-test` -3. Both - run both commands -4. No - skip -``` - - +After presenting the Summary Report, offer testing based on project type: -#### If User Accepts Web Testing: +**Web:** "Run browser tests? → `/test-browser`" +**iOS:** "Run simulator tests? → `/xcode-test`" +**Hybrid:** "Web only / iOS only / Both / Skip" -Spawn a subagent to run browser tests (preserves main context): +If accepted, spawn as sub-agent to preserve main context: ``` -Task general-purpose("Run /test-browser for PR #[number]. Test all affected pages, check for console errors, handle failures by creating todos and fixing.") +Task general-purpose("Run /test-browser for PR #[number]. Test affected pages, check for errors, create P1 todos for failures.") ``` -The subagent will: -1. Identify pages affected by the PR -2. Navigate to each page and capture snapshots (using Playwright MCP or agent-browser CLI) -3. Check for console errors -4. Test critical interactions -5. Pause for human verification on OAuth/email/payment flows -6. Create P1 todos for any failures -7. Fix and retry until all tests pass - -**Standalone:** `/test-browser [PR number]` - -#### If User Accepts iOS Testing: - -Spawn a subagent to run Xcode tests (preserves main context): - ``` -Task general-purpose("Run /xcode-test for scheme [name]. Build for simulator, install, launch, take screenshots, check for crashes.") +Task general-purpose("Run /xcode-test for scheme [name]. Build, install, screenshot, check for crashes.") ``` -The subagent will: -1. Verify XcodeBuildMCP is installed -2. Discover project and schemes -3. Build for iOS Simulator -4. Install and launch app -5. Take screenshots of key screens -6. Capture console logs for errors -7. Pause for human verification (Sign in with Apple, push, IAP) -8. Create P1 todos for any failures -9. Fix and retry until all tests pass - -**Standalone:** `/xcode-test [scheme]` - ### Important: P1 Findings Block Merge Any **🔴 P1 (CRITICAL)** findings must be addressed before merging the PR. Present these prominently and ensure they're resolved before accepting the PR. 
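+
+A quick way to surface this at the end of a run (sketch — relies only on the todos/ naming convention above):
+
+```bash
+# Count unresolved P1 todos created by this review; any non-zero count blocks merge
+P1_COUNT=$(ls todos/*-pending-p1-*.md 2>/dev/null | wc -l | tr -d ' ')
+if [ "$P1_COUNT" -gt 0 ]; then
+  echo "🔴 $P1_COUNT P1 finding(s) must be resolved before merge:"
+  ls todos/*-pending-p1-*.md
+fi
+```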
-``` + +--- + +## Appendix: Token Budget Reference + +**Parent context (what matters for avoiding overflow):** + +| Component | Token Budget | Notes | +|-----------|-------------|-------| +| PR metadata | ~500 | Title, body, file list | +| Intent analyzer return | ~100 | One sentence confirmation | +| Per-specialist summary returned to parent | ~100-150 | One sentence + counts (5-7 agents) | +| Validation script | ~0 | Bash, no LLM tokens | +| Judge agent return | ~100 | One sentence + counts | +| Deep analysis return | ~100 | One sentence confirmation | +| Enriched findings (ENRICHED_FINDINGS.json) | ~5,000-8,000 | ≤15 findings with impact/scenarios | +| Todo creation returns | ~300-450 | 3 agents × ~100-150 tokens each | +| Parent orchestrator overhead | ~5,000 | Instructions, synthesis, report | +| **Total parent context from agents** | **~11,000-14,000** | **vs ~30,000-50,000 in v1** | + +**Sub-agent spawns (quality-maximized):** + +| Agent | Context Cost | Purpose | +|-------|-------------|---------| +| Intent analyzer | 1 window | Shared context for all specialists | +| 5-7 specialist reviewers | 5-7 windows | Deep domain-specific analysis | +| Judge | 1 window | Dedup, evidence-check, rank | +| Deep analysis | 1 window | Stakeholder impact, scenarios, fix suggestions | +| 3 todo creators | 3 windows | Code-aware, substantive todo files | +| **Total** | **11-13 windows** | **Each isolated, parent stays lean** | + +The key insight: sub-agent context windows are independent and disposable. Only what they *return* to the parent matters for overflow. Every sub-agent returns ~100 tokens. The parent reads one file (ENRICHED_FINDINGS.json) at ~5-8k tokens. Everything else lives on disk. From 3f9fb0a4e3cbc2096195475c727edcbac503bb08 Mon Sep 17 00:00:00 2001 From: Drew Miller Date: Sun, 8 Feb 2026 13:06:40 -0500 Subject: [PATCH 2/3] Expand deep analysis with full stakeholder perspectives and scenario exploration MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Add git-history-analyzer and code-philosopher to conditional agent selection - Expand Deep Analysis Phase with detailed stakeholder impact analysis (Developer, Operations, End User, Security, Business perspectives) - Add comprehensive scenario exploration checklist (invalid inputs, concurrency, scale, network, resource exhaustion, security vectors, etc.) - Note that deep analysis inherits the Ultra-Thinking approach from v1 but scoped to judged findings in an isolated context window 🤖 Generated with Claude Code Co-Authored-By: Claude --- .../commands/workflows/review.md | 59 +++++++++++++++---- 1 file changed, 48 insertions(+), 11 deletions(-) diff --git a/plugins/compound-engineering/commands/workflows/review.md b/plugins/compound-engineering/commands/workflows/review.md index 4518b8d0..b15bb26f 100644 --- a/plugins/compound-engineering/commands/workflows/review.md +++ b/plugins/compound-engineering/commands/workflows/review.md @@ -163,6 +163,8 @@ Do NOT run all agents on every PR. Examine the PR file list and select relevant - `dependency-detective` — only if PR modifies package files (Gemfile, package.json, etc.) 
- `devops-harmony-analyst` — only if PR touches CI/CD, Docker, or deployment configs - `data-integrity-guardian` — only if PR modifies database queries or data flows +- `git-history-analyzer` — if PR touches files with complex change history or frequent churn +- `code-philosopher` — if PR introduces new abstractions, significant API design, or architectural decisions worth deeper reasoning about maintainability and conceptual clarity **Migration-Specific (only for db/migrate/*.rb files):** - `data-migration-expert` — validates ID mappings, rollback safety @@ -312,26 +314,61 @@ You are a Code Review Judge. Consolidate findings from multiple review agents in After judging, spawn a deep-analysis agent to enrich findings. This runs in its own context window so it doesn't cost the parent anything, but it produces significantly richer findings for todo creation. + +This agent performs the Ultra-Thinking deep dive from the original review command — stakeholder perspectives, scenario exploration, and multi-angle analysis — but scoped to the judged findings rather than the raw diff, and in an isolated context window rather than the parent. ``` Task deep-analysis(" -You are a Deep Analysis Reviewer. Your job is to enrich judged code review findings with stakeholder impact analysis and scenario exploration. +You are a Deep Analysis Reviewer. Your job is to enrich judged code review findings with deep stakeholder analysis, scenario exploration, and concrete fix suggestions. ## Instructions: 1. Read .review/JUDGED_FINDINGS.json 2. Read .review/pr_diff.txt for code context 3. Read .review/PR_INTENT.md for PR context -4. For each P1 finding: - - Add 'stakeholder_impact' field: assess Developer experience, Operations risk, End User impact, Security surface (each 1-2 sentences, <400 chars total) - - Add 'scenarios' field: list 2-3 specific failure scenarios with trigger conditions (<300 chars total) - - Add 'suggested_fix' field: a concrete code-level suggestion for resolution (<500 chars) -5. For each P2 finding: - - Add 'stakeholder_impact' field (<200 chars) - - Add 'suggested_fix' field (<300 chars) -6. P3 findings: pass through unchanged -7. Write the enriched report to .review/ENRICHED_FINDINGS.json (same schema as JUDGED_FINDINGS.json, plus the new fields) -8. Return to parent: 'Deep analysis complete. Enriched P1 and P2 findings. Written to .review/ENRICHED_FINDINGS.json' + +## For each P1 finding, perform ALL of the following: + +### Stakeholder Impact Analysis +Put yourself in each stakeholder's shoes. Assess impact from each perspective: + +- **Developer**: How easy is the affected code to understand and modify? Are the APIs intuitive? Is debugging straightforward? Can this be tested easily? +- **Operations**: How do I deploy this safely? What metrics and logs are available? How do I troubleshoot if this goes wrong? What are the resource requirements? +- **End User**: Is the feature intuitive? Are error messages helpful? Is performance acceptable? Does it solve their problem? +- **Security**: What's the attack surface? Are there compliance requirements? How is data protected? What are the audit capabilities? +- **Business**: What's the ROI impact? Are there legal/compliance risks? How does this affect time-to-market? + +Add a 'stakeholder_impact' object with a field for each perspective (each 1-3 sentences). + +### Scenario Exploration +Think through edge cases and failure scenarios. 
For the code involved in this finding, consider: + +- Invalid/malformed inputs (null, empty, unexpected types) +- Boundary conditions (min/max values, empty collections) +- Concurrent access (race conditions, deadlocks) +- Scale pressure (10x, 100x normal load) +- Network issues (timeouts, partial failures) +- Resource exhaustion (memory, disk, connections) +- Security vectors (injection, overflow, DoS) +- Data corruption (partial writes, inconsistency) +- Cascading failures (downstream service issues) + +Add a 'scenarios' array listing the 3-5 most relevant failure scenarios with specific trigger conditions and expected impact. + +### Suggested Fix +Add a 'suggested_fix' field with a concrete code-level suggestion for resolution (<500 chars). + +## For each P2 finding: +- Add 'stakeholder_impact' with Developer and Operations perspectives (1-2 sentences each) +- Add 'scenarios' with 1-2 most relevant failure scenarios +- Add 'suggested_fix' (<300 chars) + +## For P3 findings: +- Pass through unchanged + +## Output: +Write the enriched report to .review/ENRICHED_FINDINGS.json (same schema as JUDGED_FINDINGS.json, plus the new fields). +Return to parent: 'Deep analysis complete. Enriched P1 and P2 findings. Written to .review/ENRICHED_FINDINGS.json' ") ``` From 2009ddb40f96620ecf35e3e0595c20de2834455c Mon Sep 17 00:00:00 2001 From: Drew Miller Date: Thu, 12 Feb 2026 15:01:54 -0500 Subject: [PATCH 3/3] fix: align review v2 with plugin ecosystem via 7-dimension audit - Remove 4 ghost agents (rails-turbo-expert, dependency-detective, devops-harmony-analyst, code-philosopher) that don't exist in plugin - Add 5 missing real agents to selection matrix (kieran-typescript-reviewer, kieran-python-reviewer, julik-frontend-races-reviewer, schema-drift-detector, learnings-researcher) - Replace Python3 JSON validation with Node.js for cross-platform compat - Add excluded paths check to judge prompt (not just agent prompts) - Add 0-findings short-circuit for clean PRs - Fix .gitignore append duplication with grep -q guard - Update frontmatter description to match v2 purpose - Bump version to 2.31.0 --- .../.claude-plugin/plugin.json | 2 +- plugins/compound-engineering/CHANGELOG.md | 24 +++++++ .../commands/workflows/review.md | 67 ++++++++++++------- 3 files changed, 68 insertions(+), 25 deletions(-) diff --git a/plugins/compound-engineering/.claude-plugin/plugin.json b/plugins/compound-engineering/.claude-plugin/plugin.json index 842ed85c..f6636a99 100644 --- a/plugins/compound-engineering/.claude-plugin/plugin.json +++ b/plugins/compound-engineering/.claude-plugin/plugin.json @@ -1,6 +1,6 @@ { "name": "compound-engineering", - "version": "2.30.0", + "version": "2.31.0", "description": "AI-powered development tools. 29 agents, 25 commands, 16 skills, 1 MCP server for code review, research, design, and workflow automation.", "author": { "name": "Kieran Klaassen", diff --git a/plugins/compound-engineering/CHANGELOG.md b/plugins/compound-engineering/CHANGELOG.md index e55f5c76..02035b91 100644 --- a/plugins/compound-engineering/CHANGELOG.md +++ b/plugins/compound-engineering/CHANGELOG.md @@ -5,6 +5,30 @@ All notable changes to the compound-engineering plugin will be documented in thi The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). 
+## [2.31.0] - 2026-02-12 + +### Changed + +- **`/workflows:review` command** — Rewritten as v2 with context-managed map-reduce architecture + - **Architecture**: Phased File-Based Map-Reduce — sub-agents write full analysis to `.review/`, return ~100 tokens each to parent. Parent context stays under ~12k tokens vs ~30-50k in v1 + - **6 phases**: Intent → Map (parallel specialists) → Validate → Judge (dedup + evidence-check) → Deep Analysis (stakeholder impact) → Reduce (todo creation) + - **Smart agent selection**: 5 always-run core agents + conditional agents based on PR file types (vs v1's "run ALL agents") + - **Cross-platform**: Uses `.review/` scratchpad (not `/tmp/`), Node.js validation (not Python3) + - **Judge phase**: Deduplicates findings, discards hallucinated file references, ranks by severity × confidence, caps at 15 findings + - **Deep Analysis phase**: Replaces v1's inline Ultra-Thinking with isolated sub-agent for stakeholder perspectives and scenario exploration on P1/P2 findings + - **Protected artifacts**: Excluded paths (docs/plans/, docs/solutions/) enforced in both agent prompts and judge phase + - **0-findings short-circuit**: Clean PRs get a summary without running deep analysis or todo creation + +### Fixed + +- Removed 4 ghost agent references (`rails-turbo-expert`, `dependency-detective`, `devops-harmony-analyst`, `code-philosopher`) that don't exist in the plugin +- Added 5 missing agents to selection matrix: `kieran-typescript-reviewer`, `kieran-python-reviewer`, `julik-frontend-races-reviewer`, `schema-drift-detector`, `learnings-researcher` +- Replaced Python3 JSON validation with Node.js (cross-platform — Python3 not guaranteed on all systems) +- Fixed `.gitignore` append duplication on repeated runs (now uses `grep -q` guard) +- Added excluded paths check to judge prompt (v1 only had it in agent prompts, not synthesis) + +--- + ## [2.30.0] - 2026-02-05 ### Added diff --git a/plugins/compound-engineering/commands/workflows/review.md b/plugins/compound-engineering/commands/workflows/review.md index b15bb26f..62db4ce0 100644 --- a/plugins/compound-engineering/commands/workflows/review.md +++ b/plugins/compound-engineering/commands/workflows/review.md @@ -1,6 +1,6 @@ --- name: workflows:review -description: Perform exhaustive code reviews using multi-agent analysis, ultra-thinking, and worktrees +description: Perform exhaustive code reviews using multi-agent analysis with file-based synthesis to prevent context overflow argument-hint: "[PR number, GitHub URL, branch name, or latest]" --- @@ -64,7 +64,7 @@ Using .review/ inside the project avoids both issues. All tools (Read, Write, Ba REVIEW_DIR=".review" rm -rf "$REVIEW_DIR" mkdir -p "$REVIEW_DIR" -echo "$REVIEW_DIR/" >> .gitignore 2>/dev/null # Ensure it's gitignored +grep -q '.review/' .gitignore 2>/dev/null || echo "$REVIEW_DIR/" >> .gitignore # Ensure it's gitignored (idempotent) # Save PR metadata for agents to reference gh pr view --json title,body,files > "$REVIEW_DIR/pr_meta.json" @@ -155,16 +155,19 @@ Do NOT run all agents on every PR. Examine the PR file list and select relevant 4. `code-simplicity-reviewer` — unnecessary complexity, dead code, simplification opportunities 5. 
`pattern-recognition-specialist` — anti-patterns, inconsistencies, convention violations -**Run If Applicable:** +**Run If Applicable (language-specific):** - `kieran-rails-reviewer` — only if PR has `.rb` files - `dhh-rails-reviewer` — only if PR has `.rb` files -- `rails-turbo-expert` — only if PR uses Turbo/Stimulus +- `kieran-typescript-reviewer` — only if PR has `.ts`/`.tsx` files +- `kieran-python-reviewer` — only if PR has `.py` files +- `julik-frontend-races-reviewer` — only if PR has Stimulus controllers or frontend async code + +**Run If Applicable (domain-specific):** - `agent-native-reviewer` — only if PR adds user-facing features -- `dependency-detective` — only if PR modifies package files (Gemfile, package.json, etc.) -- `devops-harmony-analyst` — only if PR touches CI/CD, Docker, or deployment configs - `data-integrity-guardian` — only if PR modifies database queries or data flows - `git-history-analyzer` — if PR touches files with complex change history or frequent churn -- `code-philosopher` — if PR introduces new abstractions, significant API design, or architectural decisions worth deeper reasoning about maintainability and conceptual clarity +- `schema-drift-detector` — if PR includes `schema.rb` changes, cross-reference against included migrations +- `learnings-researcher` — if PR touches areas where `docs/solutions/` may have relevant prior art **Migration-Specific (only for db/migrate/*.rb files):** - `data-migration-expert` — validates ID mappings, rollback safety @@ -184,16 +187,19 @@ Task security-sentinel("Review PR #49. . " + SHARED_CONTEX Task performance-oracle("Review PR #49. . " + SHARED_CONTEXT + OUTPUT_RULES) Task code-simplicity-reviewer("Review PR #49. . " + SHARED_CONTEXT + OUTPUT_RULES) Task pattern-recognition-specialist("Review PR #49. . " + SHARED_CONTEXT + OUTPUT_RULES) +Task kieran-typescript-reviewer("Review PR #49. . " + SHARED_CONTEXT + OUTPUT_RULES) ``` Example for a Rails PR with migrations: ``` Task kieran-rails-reviewer("Review PR #49. . " + SHARED_CONTEXT + OUTPUT_RULES) +Task dhh-rails-reviewer("Review PR #49. . " + SHARED_CONTEXT + OUTPUT_RULES) Task security-sentinel("Review PR #49. . " + SHARED_CONTEXT + OUTPUT_RULES) Task performance-oracle("Review PR #49. . " + SHARED_CONTEXT + OUTPUT_RULES) Task architecture-strategist("Review PR #49. . " + SHARED_CONTEXT + OUTPUT_RULES) Task data-migration-expert("Review PR #49. . " + SHARED_CONTEXT + OUTPUT_RULES) +Task schema-drift-detector("Review PR #49. . " + SHARED_CONTEXT + OUTPUT_RULES) ``` Wait for ALL agents to complete. 
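+
+To keep Step 5a's EXPECTED_AGENTS list in sync with what was actually launched, one option is to record the list at spawn time (a sketch — `.review/expected_agents.txt` is a hypothetical convention, not part of the schema):
+
+```bash
+# Record the launched agents so the validation step checks exactly this list
+EXPECTED_AGENTS="kieran-rails-reviewer dhh-rails-reviewer security-sentinel performance-oracle architecture-strategist data-migration-expert schema-drift-detector"
+echo "$EXPECTED_AGENTS" > .review/expected_agents.txt
+```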
@@ -231,25 +237,26 @@ fi #### Step 5b: Validate JSON Schema ```bash -# Validate all agent output files -PR_FILES=$(cat .review/pr_meta.json | python3 -c "import json,sys; [print(f['path']) for f in json.load(sys.stdin)['files']]" 2>/dev/null || echo "") - +# Validate all agent output files (uses Node.js — guaranteed available since Claude Code requires it) for f in .review/*.json; do [[ "$(basename "$f")" == "pr_meta.json" ]] && continue - python3 -c " -import json, sys -try: - data = json.load(open('$f')) - assert 'findings' in data, 'missing findings key' - assert isinstance(data['findings'], list), 'findings not a list' - assert len(data['findings']) <= 10, f'too many findings: {len(data[\"findings\"])}' - for i, finding in enumerate(data['findings']): - assert finding.get('severity') in ('p1','p2','p3'), f'finding {i}: bad severity' - assert finding.get('location'), f'finding {i}: missing location' - print('VALID:', '$(basename $f)', '-', len(data['findings']), 'findings') -except Exception as e: - print('INVALID:', '$(basename $f)', '-', str(e), '— removing from review') - import os; os.remove('$f') + node -e " +const fs = require('fs'); +const f = '$f'; +const name = require('path').basename(f); +try { + const data = JSON.parse(fs.readFileSync(f, 'utf8')); + if (!Array.isArray(data.findings)) throw new Error('findings not an array'); + if (data.findings.length > 10) throw new Error('too many findings: ' + data.findings.length); + data.findings.forEach((finding, i) => { + if (!['p1','p2','p3'].includes(finding.severity)) throw new Error('finding ' + i + ': bad severity'); + if (!finding.location) throw new Error('finding ' + i + ': missing location'); + }); + console.log('VALID:', name, '-', data.findings.length, 'findings'); +} catch (e) { + console.log('INVALID:', name, '-', e.message, '— removing from review'); + fs.unlinkSync(f); +} " 2>&1 done ``` @@ -273,6 +280,7 @@ You are a Code Review Judge. Consolidate findings from multiple review agents in 2. Read .review/pr_meta.json to get the list of files actually in this PR 3. Collect all findings across agents 4. EVIDENCE CHECK: Discard any finding whose 'location' field does not reference a file path present in pr_meta.json. These are hallucinated findings. +4b. EXCLUDED PATHS: Also discard any finding whose location is under docs/plans/ or docs/solutions/ — these are compound-engineering pipeline artifacts and must never be flagged. 5. DEDUPLICATE: Remove semantically similar findings (keep the higher-confidence one) 6. RESOLVE CONFLICTS: If agents contradict each other, note the conflict and pick the more evidence-based finding 7. RANK by: severity (p1 > p2 > p3), then confidence score @@ -310,6 +318,17 @@ You are a Code Review Judge. Consolidate findings from multiple review agents in ") ``` +**Short-circuit: If the judge returns 0 final findings**, skip Deep Analysis and Todo Creation. Present a clean summary: + +```markdown +## ✅ Code Review Complete — No Findings + +**Review Target:** PR #XXXX - [PR Title] +**Branch:** [branch-name] + +All [N] agents reviewed the PR. [X] raw findings were raised, but all were either hallucinated (wrong file paths), duplicates, or below confidence threshold after judging. No action items. +``` + ### 7. Deep Analysis Phase (P1/P2 Enrichment)