
Feature/my updates #120 (Closed)

getworken wants to merge 2 commits into AutoForgeAI:master from getworken:feature/my-updates

Conversation

@getworken commented on Jan 28, 2026

Summary by CodeRabbit

  • New Features

    • Project registry system for managing projects anywhere on disk
    • IDE integration: open projects in VS Code, Cursor, or Antigravity
    • Knowledge files management: upload, view, and delete project-specific files
    • Project reset functionality while preserving registry registration
    • Database health checks and integrity validation
    • Web UI authentication support with Basic Auth
    • Ollama model support as alternative to Claude
    • Assistant conversation resume capability
    • Automated quality gates (linting, type checking, custom scripts)
    • Security command denial logging and auditability
  • Improvements

    • Enhanced error handling and rate-limit detection with backoff strategies
    • Better Windows process management and compatibility
    • Improved database resilience with WAL mode and retry logic
    • Structured logging for better observability and debugging


@coderabbitai bot commented on Jan 28, 2026

📝 Walkthrough

This PR introduces comprehensive infrastructure improvements including database layer refactoring with robust connection management, rate-limiting and error-handling enhancements to the agent system, security hardening with authentication and command denial tracking, new project management features (reset, IDE integration, knowledge files), expanded UI capabilities, and extensive test infrastructure setup.

Changes

  • Core Database Infrastructure (api/connection.py, api/models.py, api/migrations.py, api/database.py):
    Introduced comprehensive database connection layer with WAL mode, retry logic, health checks, and session management. Moved ORM models to dedicated module with relationships for attempts/errors. Added 10+ idempotent migrations for schema evolution. Reorganized database.py as re-export hub for backwards compatibility.
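    For orientation, the WAL-plus-retry behavior described above usually reduces to a couple of SQLite pragmas at connect time. A minimal sketch, assuming SQLAlchemy; make_engine and health_check are illustrative names, not the PR's actual API:

    ```python
    # Minimal sketch, assuming SQLAlchemy; names are illustrative.
    from sqlalchemy import create_engine, event, text

    def make_engine(db_path: str):
        engine = create_engine(f"sqlite:///{db_path}")

        @event.listens_for(engine, "connect")
        def _set_pragmas(dbapi_conn, _connection_record):
            cur = dbapi_conn.cursor()
            cur.execute("PRAGMA journal_mode=WAL")   # readers no longer block writers
            cur.execute("PRAGMA busy_timeout=5000")  # wait up to 5s under lock contention
            cur.close()

        return engine

    def health_check(engine) -> bool:
        # Cheap liveness probe; a fuller check could run PRAGMA integrity_check.
        with engine.connect() as conn:
            return conn.execute(text("SELECT 1")).scalar() == 1
    ```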
  • Configuration & Type System (api/config.py, api/agent_types.py, registry.py):
    Added centralized Pydantic-based configuration with environment loading. Introduced AgentType enum (INITIALIZER, CODING, TESTING). Extended model registries to support both Claude and Ollama models with per-mode defaults.
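    A sketch of what this shape typically looks like with pydantic-settings; the AgentType values come from the summary above, while the Settings fields and env handling are assumptions:

    ```python
    # Sketch only; field names are assumptions for illustration.
    from enum import Enum
    from pydantic_settings import BaseSettings, SettingsConfigDict

    class AgentType(str, Enum):
        INITIALIZER = "initializer"
        CODING = "coding"
        TESTING = "testing"

    class Settings(BaseSettings):
        model_config = SettingsConfigDict(env_file=".env", extra="ignore")

        anthropic_base_url: str | None = None  # e.g. a local Ollama endpoint
        default_model: str = "claude"          # per-mode defaults would extend this
    ```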
  • Logging & Observability (api/logging_config.py, structured_logging.py, progress.py):
    Created modular logging system with rotating file handlers, orchestrator-specific logging, and structured log entries. Implemented SQLite-backed structured logging with query interface, timeline bucketing, and multi-format export (JSON/JSONL/CSV). Enhanced progress tracking with session webhooks.
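    A SQLite-backed handler of the kind described can be sketched as follows (table schema and class name are assumptions; the explicit busy timeout anticipates a locking issue flagged later in this review):

    ```python
    # Hypothetical sketch of a SQLite-backed logging.Handler.
    import json, logging, sqlite3

    class SQLiteLogHandler(logging.Handler):
        def __init__(self, db_path: str):
            super().__init__()
            self.db_path = db_path
            with sqlite3.connect(self.db_path, timeout=5.0) as conn:
                conn.execute(
                    "CREATE TABLE IF NOT EXISTS logs ("
                    "ts REAL, level TEXT, logger TEXT, message TEXT, extra TEXT)"
                )

        def emit(self, record: logging.LogRecord) -> None:
            try:
                with sqlite3.connect(self.db_path, timeout=5.0) as conn:
                    conn.execute(
                        "INSERT INTO logs VALUES (?, ?, ?, ?, ?)",
                        (record.created, record.levelname.lower(),
                         record.name, record.getMessage(), json.dumps({})),
                    )
            except Exception:
                self.handleError(record)  # never let logging crash the app
    ```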
  • Security & Authentication (security.py, server/utils/auth.py, server/utils/validation.py, server/utils/process_utils.py):
    Added HTTP Basic Auth middleware, WebSocket authentication support, and centralized project name validation. Implemented denied command tracking (timestamp, reason, hashing). Added dangerous pattern pre-validation. Windows-specific process termination via taskkill fallback and orphaned process cleanup.
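    A minimal sketch of HTTP Basic Auth as FastAPI middleware, with hypothetical UI_AUTH_USER/UI_AUTH_PASS variables; note the binascii import, whose absence in server/main.py is flagged in the fixes below:

    ```python
    # Sketch only; env var names are assumptions, not the PR's actual config.
    import base64, binascii, os, secrets
    from fastapi import FastAPI, Request, Response

    app = FastAPI()

    @app.middleware("http")
    async def basic_auth(request: Request, call_next):
        expected_user = os.getenv("UI_AUTH_USER")
        expected_pass = os.getenv("UI_AUTH_PASS", "")
        if not expected_user:  # auth disabled when no credentials are configured
            return await call_next(request)
        header = request.headers.get("Authorization", "")
        if header.startswith("Basic "):
            try:
                user, _, password = base64.b64decode(header[6:]).decode().partition(":")
                if secrets.compare_digest(user, expected_user) and \
                   secrets.compare_digest(password, expected_pass):
                    return await call_next(request)
            except (ValueError, UnicodeDecodeError, binascii.Error):
                pass
        return Response(status_code=401, headers={"WWW-Authenticate": "Basic"})
    ```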
  • Feature Management & Persistence (api/feature_repository.py, mcp_server/feature_mcp.py):
    Created FeatureRepository with 20+ methods for CRUD, state queries, dependency resolution, and retry logic. Expanded MCP feature endpoints: added attempt/error tracking, feature deletion with cascade, regression testing workflow, quality result storage, and structured error handling with locking.
  • Agent & Orchestration (agent.py, parallel_orchestrator.py, autonomous_agent_demo.py, rate_limit_utils.py):
    Added rate-limit detection, exponential backoff, and retry-after parsing in agent session flow. Implemented Windows asyncio cleanup (Proactor policy, safe_asyncio_run). Enhanced orchestrator with platform-aware environment pruning, WAL checkpointing, database state refresh after cross-process commits, and improved agent lifecycle resilience.
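    The rate-limit handling described here pairs retry-after parsing with capped exponential backoff; the unit tests later in this PR expect the sequence 60, 120, 240, ... capped at 3600 seconds. A sketch with illustrative names:

    ```python
    # Sketch consistent with the retry-after patterns and backoff tests below.
    import re, time
    from typing import Optional

    def parse_retry_after(error_message: str) -> Optional[int]:
        match = re.search(r"retry.?after[:\s]+(\d+)\s*(?:seconds?)?",
                          error_message, re.IGNORECASE)
        return int(match.group(1)) if match else None

    def backoff_seconds(retries: int, base: int = 60, cap: int = 3600) -> int:
        return min(base * (2 ** retries), cap)  # 60, 120, 240, ... capped at 3600

    def run_with_retries(call, max_retries: int = 8):
        for attempt in range(max_retries):
            try:
                return call()
            except RuntimeError as exc:  # stand-in for an SDK rate-limit error
                delay = parse_retry_after(str(exc)) or backoff_seconds(attempt)
                time.sleep(delay)
        raise TimeoutError("rate-limit retries exhausted")
    ```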
  • API Routes & Endpoints (server/routers/projects.py, server/routers/settings.py, server/routers/terminal.py, server/routers/spec_creation.py, server/routers/expand_project.py, server/routers/assistant_chat.py, server/routers/schedules.py, server/routers/features.py):
    Added project import/reset/IDE-open endpoints; knowledge file CRUD; database health checks. Added /denied-commands GET/DELETE endpoints. Integrated centralized validators and auth gates. Expanded ScheduleCreate/Response with max_concurrency. Added WebSocket resume capability for assistant chat.
  • Server Configuration & Main (server/main.py, server/schemas.py):
    Added rate-limit middleware (SlowAPI), Basic Auth conditional middleware, and startup/shutdown orchestration. Introduced VALID_IDES constant, DatabaseHealth/KnowledgeFile schemas, enhanced FeatureUpdate with Field constraints, extended SettingsResponse/Update with preferred_ide and IDE selection validation.
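    SlowAPI integration follows the library's documented wiring; a minimal sketch (the rate string is an assumed example, not the PR's setting):

    ```python
    # Standard SlowAPI setup; the limit itself is an illustrative assumption.
    from fastapi import FastAPI, Request
    from slowapi import Limiter, _rate_limit_exceeded_handler
    from slowapi.errors import RateLimitExceeded
    from slowapi.util import get_remote_address

    limiter = Limiter(key_func=get_remote_address)
    app = FastAPI()
    app.state.limiter = limiter
    app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

    @app.get("/api/health")
    @limiter.limit("60/minute")  # illustrative rate, not the PR's actual setting
    async def health(request: Request):  # slowapi requires the Request parameter
        return {"ok": True}
    ```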
  • Chat Services (server/services/assistant_chat_session.py, server/services/expand_chat_session.py, server/services/spec_chat_session.py):
    Implemented MCP config file-based management instead of in-memory dicts. Added skip_greeting for conversation resume. Extended Claude output token defaults (131072 for ANT, 32000 fallback). Enhanced feature tool access (update/delete). Integrated MCP tool success tracking with XML fallback for feature creation.
  • Process & Terminal Management (server/services/process_manager.py, server/services/dev_server_manager.py, server/services/terminal_manager.py):
    Added robust lock cleanup with stale-process detection via psutil. Introduced Windows process grouping flags (CREATE_NEW_PROCESS_GROUP). Enhanced Windows termination with two-phase graceful/forceful shutdown and taskkill cleanup.
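    The two-phase shutdown described above can be sketched like this, using psutil with a taskkill fallback on Windows (function name and grace period are illustrative):

    ```python
    # Sketch: polite terminate, short wait, then force-kill the whole tree.
    import subprocess, sys
    import psutil

    def stop_process_tree(pid: int, grace_seconds: float = 5.0) -> None:
        try:
            proc = psutil.Process(pid)
        except psutil.NoSuchProcess:
            return
        proc.terminate()                 # phase 1: graceful termination
        try:
            proc.wait(timeout=grace_seconds)
            return
        except psutil.TimeoutExpired:
            pass
        if sys.platform == "win32":      # phase 2: force-kill the process tree
            subprocess.run(["taskkill", "/PID", str(pid), "/T", "/F"],
                           capture_output=True, check=False)
        else:
            proc.kill()
    ```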
  • UI Components (ui/src/components/IDESelectionModal.tsx, ui/src/components/ResetProjectModal.tsx, ui/src/components/ProjectSetupRequired.tsx, ui/src/components/ErrorBoundary.tsx, ui/src/App.tsx, ui/src/components/{SettingsModal,ScheduleModal,ConversationHistory,DebugLogViewer,ProjectSelector,ThemeSelector}.tsx):
    Added IDE selection modal with remember-preference flow. Introduced project reset modal with quick/full reset toggle. Added project setup guide component. Implemented global ErrorBoundary with cache-clear recovery. Extended Settings with IDE selection. Added type-safe event handlers across multiple components.
  • UI Hooks & API (ui/src/hooks/{useProjects,useAssistantChat,useConversations}.ts, ui/src/lib/{api,types}.ts, ui/src/main.tsx):
    Added useResetProject hook with query invalidation. Enhanced useAssistantChat with safer WebSocket lifecycle management, centralized tool descriptions, retry cap (50), and improved streaming completion detection. Added retry policy to useConversation (skip 404s). Extended API with resetProject, openProjectInIDE, and knowledge file operations. Added IDEType type. Wrapped App in ErrorBoundary.
  • Quality Gates & Utilities (quality_gates.py, client.py):
    Introduced comprehensive quality gates with auto-detection of linters/type checkers (ESLint/Biome/Python ruff/flake8, tsc/mypy), timeout/output handling, custom script support, and multi-format result aggregation. Replaced prints with structured logging throughout client initialization.
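    Based on the verify_quality signature quoted in the review comments further down, usage would look roughly like this; the passed attribute on the result is an assumption:

    ```python
    # Rough usage sketch; verify_quality's signature appears later in this review,
    # but the result field accessed here is hypothetical.
    from pathlib import Path
    from quality_gates import verify_quality

    result = verify_quality(Path("./my_project"), run_lint=True, run_custom=False)
    if not result.passed:  # assumed attribute on QualityGateResult
        print("Quality gates failed; see aggregated linter/type-check output")
    ```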
  • Templates & Documentation (.claude/commands/{create-spec,expand-project}.md, .claude/templates/{coding_prompt,initializer_prompt,testing_prompt}.template.md, CLAUDE.md, README.md):
    Expanded spec creation with Phase 3b database decision (impacts Phase 4 infrastructure features). Added mock-data detection (5 categories). Introduced refactoring features workflow and server-restart persistence testing. Enhanced initializer with infrastructure feature descriptions and 165/265/405+ test count guidance. Updated CLAUDE.md with feature behavior/precedence. Added project registry, knowledge files, authentication, and quality gates to README.
  • Testing Infrastructure (tests/{conftest,test_async_examples,test_repository_and_config,test_security,test_security_integration}.py, test_agent.py, test_structured_logging.py):
    Created comprehensive conftest with fixtures for project setup, database lifecycle, async client, and mock project structures. Added 1178 lines of security hook tests covering command extraction, validation, config merging, blocklist enforcement, and pre-validation. Added 440 lines of security integration tests using temporary HOME override. Added unit tests for rate-limit parsing and backoff. Added 469 lines of structured logging tests (entry creation, handler, query, export, thread safety).
  • Requirements & Build (requirements.txt, pyproject.toml, ui/package.json, start_ui.{py,sh,bat}):
    Loosened dependency version pins to ~= ranges for flexibility. Added pytest asyncio configuration (auto mode, function scope). Updated ui @types/node. Enhanced npm install logic to check node_modules freshness. Added venv detection and creation in start_ui.sh; removed pause in start_ui.bat.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

Possibly related PRs

  • PR #78: Introduces spec-creation UI flow and Windows asyncio subprocess handling (Proactor policy, safe_asyncio_run), with overlapping changes to autonomous_agent_demo.py and server/main.py.
  • PR #80: Expands security command-allowlist logic with hierarchical config, org/project merging, and pre-validation patterns—directly related to security.py enhancements in this PR.
  • PR #74: Modifies assistant chat session flow including skip_greeting for resumed conversations and history-loading behavior—shares changes with server/services/assistant_chat_session.py.

Poem

🐰 Databases dance on WAL-mode flights,
While auth guards the door both day and night,
Features bloom with attempts tracked true,
IDE dreams and orchestration anew!
Security whispers denied commands' song.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

  • Title check (❓ Inconclusive): The title 'Feature/my updates' is vague and generic, using non-descriptive terms that do not clearly convey the changeset's primary purpose or scope. Resolution: replace the vague title with a specific, descriptive summary of the main changes, for example 'Add infrastructure validation, quality gates, and enhanced security checks' or 'Refactor database layer and add comprehensive test coverage'.

✅ Passed checks (2 passed)

  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.
  • Docstring Coverage (✅ Passed): Docstring coverage is 97.00%, which is sufficient. The required threshold is 80.00%.


@getworken (Author) commented:
Ignore this, I added it to #117 so it's all in one place. I'm going through with Claude Opus 4.5 to make all of these changes. I love the project and want to contribute as much as I can. I also have unlimited GLM 4.7 if you need anything mass produced for this project.

I will be making some big changes that I think the project needs, with the goal of making this build apps 10x better than anything else.

Just a heads up: a lot of the updates Claude, GPT, and Gemini have made in terms of skills, agents, and other things have come directly from my data from dealing with agents. I started those systems back in early November and they finally made their way here.

I will be restructuring the prompts as well to handle Pro level coding structures.

@coderabbitai bot left a comment

Actionable comments posted: 9

Note

Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
server/routers/filesystem.py (1)

44-45: WebSocket endpoint /ws/projects/{project_name} exists in server/main.py:225 and is correctly implemented.

The endpoint handler in server/websocket.py:662-870 sends progress, agent_status, log, and agent_update messages. However, feature_update message type is defined in server/schemas.py:282 but is not sent by the WebSocket handler—it may need to be implemented to fully satisfy the real-time UI updates requirement.

start_ui.sh (1)

86-89: Fail fast when dependency installation fails.

If pip install fails, the script still proceeds and the launcher will crash with less actionable errors. Check the exit code and stop early.

🔧 Suggested fix
 pip install -r requirements.txt --quiet
+if [ $? -ne 0 ]; then
+    echo "[ERROR] Failed to install dependencies"
+    exit 1
+fi
parallel_orchestrator.py (1)

451-476: Testing agents can exceed max_concurrency due to desired overwrite.

desired is clamped before the loop but then overwritten inside the lock, so a configuration like max_concurrency=1 and testing_agent_ratio=3 can still spawn 3 testing agents, contradicting the stated cap.

🐛 Proposed fix
-                desired = self.testing_agent_ratio
+                desired = min(self.testing_agent_ratio, self.max_concurrency)
mcp_server/feature_mcp.py (1)

8-33: Required MCP tools feature_get_next and feature_claim_next are missing.

The guideline mandates these tools, but the file now exposes feature_claim_and_get instead. Please add compatibility tools or restore the originals to avoid breaking clients that depend on the required MCP surface.

➕ Suggested compatibility wrappers
+@mcp.tool()
+def feature_get_next() -> str:
+    payload = json.loads(feature_get_ready(limit=1))
+    features = payload.get("features", [])
+    return json.dumps({"feature": features[0] if features else None})
+
+@mcp.tool()
+def feature_claim_next() -> str:
+    payload = json.loads(feature_get_next())
+    feature = payload.get("feature")
+    if not feature:
+        return json.dumps({"error": "No ready features"})
+    return feature_claim_and_get(feature["id"])
As per coding guidelines, "mcp_server/feature_mcp.py: Use MCP servers for feature management with tools: feature_get_stats, feature_get_next, feature_claim_next, feature_get_for_regression, feature_mark_passing, feature_skip, feature_create_bulk, feature_add_dependency, and feature_remove_dependency".
ui/src/components/DebugLogViewer.tsx (1)

349-428: Add generic type parameter to React.MouseEvent for better type safety

The onClick handlers use React.MouseEvent without specifying the element type. For React 19 + TypeScript on Button elements, use React.MouseEvent<HTMLButtonElement> to ensure proper type safety and IDE support.

🤖 Fix all issues with AI agents
In `@api/database.py`:
- Around line 5-37: Move the Feature ORM class from api/models.py into
api/database.py and keep all other models (FeatureAttempt, FeatureError,
Schedule, ScheduleOverride) in api/models.py; in api/database.py define Feature
using the existing Base (ensure Base is imported or declared before the class)
and the same columns/relationships (priority, category, name, description,
steps, passes, dependencies) so the class signature and metadata remain
unchanged, then remove Feature from api/models.py and update any
re-exports/imports to reference api.database.Feature instead of
api.models.Feature; ensure the module still re-exports Base and the other model
classes and that any session/engine utilities (get_db, get_db_session,
create_database, invalidate_engine_cache, etc.) continue to import/import-from
the same names.

In `@api/feature_repository.py`:
- Around line 37-77: The retry loop in _commit_with_retry currently calls
session.rollback(), which discards caller modifications (e.g., changes made by
mark_in_progress) and makes retries meaningless; change the behavior so the
function does NOT perform session.rollback() on transient OperationalError, and
instead either (A) propagate/raise a specific retryable exception (e.g.,
RetryableCommitError) or re-raise the OperationalError immediately so the caller
(such as mark_in_progress) can catch it, re-apply changes, and perform its own
retry logic; also update the _commit_with_retry docstring to state that callers
are responsible for re-applying pending changes after a rollback and for
retrying commits.
- Around line 205-223: The status-update methods (mark_in_progress,
mark_passing, mark_failing, clear_in_progress) modify an ORM object then call
_commit_with_retry, but if a commit fails and rollback happens those in-memory
changes are lost and retries will commit nothing; fix by ensuring the state
changes are applied inside the retryable commit path — e.g., change each method
to fetch/merge the Feature (use get_by_id or session.merge), then wrap the
attribute assignments and commit inside the same retry loop or pass a callback
to _commit_with_retry that re-applies feature.in_progress / feature.passes /
feature.started_at modifications before each commit attempt so retries always
persist the intended updates and session.refresh(feature) is done after a
successful commit.
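
A minimal sketch of the callback approach suggested above, with assumed names, so that a rollback can never strand the caller's changes:

```python
# Sketch: the mutation is re-applied before every commit attempt, so a
# rollback on a transient OperationalError cannot discard it. Names assumed.
import time
from sqlalchemy.exc import OperationalError

def commit_with_retry(session, apply_changes, retries: int = 3) -> None:
    for attempt in range(retries):
        apply_changes()  # re-apply pending changes after any rollback
        try:
            session.commit()
            return
        except OperationalError:
            session.rollback()
            if attempt == retries - 1:
                raise
            time.sleep(0.1 * (attempt + 1))

# Hypothetical use inside mark_in_progress:
#   feature = session.get(Feature, feature_id)
#   commit_with_retry(session, lambda: setattr(feature, "in_progress", True))
```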

In `@api/logging_config.py`:
- Around line 113-116: The f-string for the startup log contains escaped quotes
causing a syntax error; update the logger.debug call (the one using logger from
logging.getLogger(__name__)) to use a proper f-string delimiter (e.g., f"Logging
initialized. Log file: {log_path}" or f'Logging initialized. Log file:
{log_path}') so the string is valid and includes the log_path variable.

In `@server/main.py`:
- Around line 181-186: The except clause in the authorization parsing references
binascii.Error (in the except tuple around ValueError and UnicodeDecodeError and
the Response return block), but binascii is not imported; add "import binascii"
alongside the other top-of-file imports in server/main.py so the NameError is
avoided and the exception handler can catch base64 decode errors properly.

In `@server/schemas.py`:
- Around line 127-134: The steps field in FeatureUpdate currently only enforces
the list length but not that each string is non-empty; update the schema so each
step string has min length 1 (e.g., change the typing of steps from list[str] |
None to list[Annotated[str, MinLen(1)]] | None or add a Pydantic validator on
FeatureUpdate.steps that checks every item is a non-empty string) and ensure
imports for Annotated and MinLen (or the validator utilities) are added/updated
accordingly.
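
The suggested constraint, sketched for Pydantic v2 with annotated-types:

```python
# Sketch of the fix described above; FeatureUpdate here is reduced to the
# one field under discussion.
from typing import Annotated
from annotated_types import MinLen
from pydantic import BaseModel

class FeatureUpdate(BaseModel):
    # each step must be a non-empty string; the list itself remains optional
    steps: list[Annotated[str, MinLen(1)]] | None = None
```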

In `@server/services/assistant_chat_session.py`:
- Around line 48-50: Remove the duplicate os import and add the missing uuid
import at the top of the module so uuid.uuid4().hex can be used without error;
specifically, eliminate the second "import os" near the
DEFAULT_MAX_OUTPUT_TOKENS definition (leave the original import at the top that
already exists) and add "import uuid" alongside other imports so the usage of
uuid in the code (uuid.uuid4().hex) resolves correctly while keeping
DEFAULT_MAX_OUTPUT_TOKENS logic unchanged.

In `@structured_logging.py`:
- Around line 96-195: The SQLite connections in _init_database and emit
currently open without a busy timeout, causing "database is locked" under
concurrent writers; update both sqlite3.connect calls to specify a timeout (e.g.
sqlite3.connect(self.db_path, timeout=5.0)) and/or immediately run "PRAGMA
busy_timeout = 5000" on each new connection (use milliseconds) so writes retry
instead of failing; add a configurable attribute (e.g., self.sqlite_busy_timeout
or self.sqlite_timeout) and use it for all new connections (in _init_database
and emit) to ensure consistent behavior across the class.

In `@ui/src/components/ProjectSetupRequired.tsx`:
- Around line 8-55: The async startAgent call in handleSpecComplete needs a
cancel guard so onSetupComplete and state updates don't run if the user cancels;
modify handleSpecComplete to create and use an AbortController signal (if
startAgent supports it) or a local "cancelled" flag/promise, pass the signal to
startAgent, and only call onSetupComplete or setInitializerStatus/error when not
cancelled, and update handleChatCancel to call controller.abort() or set
cancelled = true and also reset showChat, initializerStatus ('idle'),
initializerError (null) and yoloModeSelected to their initial values; also
ensure handleRetryInitializer reinitializes a fresh controller/flag before
calling handleSpecComplete so retries aren't affected by prior cancels.
🟡 Minor comments (20)
ui/src/components/ProjectSetupRequired.tsx-96-150 (1)

96-150: Add type="button" to avoid accidental form submissions.

These buttons default to type="submit" if rendered inside a <form>. Set type="button" explicitly.

✅ Proposed change
-        <button
+        <button type="button"
           onClick={handleClaudeSelect}
           className={`
             w-full text-left p-4
             border-3 border-[var(--color-neo-border)]
             bg-white
             shadow-[4px_4px_0px_rgba(0,0,0,1)]
             hover:translate-x-[-2px] hover:translate-y-[-2px]
             hover:shadow-[6px_6px_0px_rgba(0,0,0,1)]
             transition-all duration-150
           `}
         >
...
-        <button
+        <button type="button"
           onClick={handleManualSelect}
           className={`
             w-full text-left p-4
             border-3 border-[var(--color-neo-border)]
             bg-white
             shadow-[4px_4px_0px_rgba(0,0,0,1)]
             hover:translate-x-[-2px] hover:translate-y-[-2px]
             hover:shadow-[6px_6px_0px_rgba(0,0,0,1)]
             transition-all duration-150
           `}
         >
server/services/spec_chat_session.py-36-42 (1)

36-42: Compute default max-output tokens at runtime.
DEFAULT_MAX_OUTPUT_TOKENS is derived at import time, so env changes after import won’t be reflected (common in tests or runtime config). Compute the default when building sdk_env.

🛠️ Suggested fix
-DEFAULT_MAX_OUTPUT_TOKENS = "131072" if os.getenv("ANTHROPIC_BASE_URL") else "32000"
+def _default_max_output_tokens() -> str:
+    return "131072" if os.getenv("ANTHROPIC_BASE_URL") else "32000"
-            sdk_env["CLAUDE_CODE_MAX_OUTPUT_TOKENS"] = DEFAULT_MAX_OUTPUT_TOKENS
+            sdk_env["CLAUDE_CODE_MAX_OUTPUT_TOKENS"] = _default_max_output_tokens()

Also applies to: 177-180

api/config.py-120-126 (1)

120-126: Handle base URLs without a scheme in is_using_ollama.

If users set anthropic_base_url to localhost:11434, urlparse yields no hostname and Ollama detection fails. Normalize missing schemes before hostname checks.

🔧 Suggested fix
-        host = urlparse(self.anthropic_base_url).hostname or ""
-        return host in {"localhost", "127.0.0.1", "::1"}
+        parsed = urlparse(self.anthropic_base_url)
+        host = parsed.hostname
+        if not host:
+            host = urlparse(f"//{self.anthropic_base_url}").hostname
+        host = host or ""
+        return host in {"localhost", "127.0.0.1", "::1"}
server/utils/process_utils.py-174-217 (1)

174-217: Termination count can be overstated on failed cleanup.

terminated increments even if the kill attempt fails, so the return value can be inaccurate.

🛠️ Suggested fix
-                        _kill_windows_process_tree_taskkill(proc.pid)
-                        terminated += 1
+                        if _kill_windows_process_tree_taskkill(proc.pid):
+                            terminated += 1
server/services/expand_chat_session.py-43-46 (1)

43-46: Remove duplicate import os.

os is already imported at line 12. This duplicate import at line 44 is unnecessary.

Proposed fix
 # Default max output tokens - use 131k only for alternative APIs (like GLM), otherwise use 32k for Anthropic
-import os
 DEFAULT_MAX_OUTPUT_TOKENS = "131072" if os.getenv("ANTHROPIC_BASE_URL") else "32000"
.claude/templates/initializer_prompt.template.md-133-161 (1)

133-161: JSON comments will cause parsing errors.

The dependency example uses // comments inside a JSON block, but JSON doesn't support comments. If this template is processed programmatically or if users copy-paste it, parsing will fail.

Suggested fix: Use JSONC format indicator or move comments outside

Option 1: Change the code block language to jsonc:

-```json
+```jsonc

Option 2: Move comments outside the JSON structure or use a different documentation format.

server/services/assistant_chat_session.py-247-252 (1)

247-252: Potential AttributeError if _mcp_config_file is not initialized.

The close() method accesses self._mcp_config_file but this attribute is only set in start() (line 314). If close() is called before start(), this will raise an AttributeError.

Proposed fix: Initialize in __init__ or use getattr

Option 1: Initialize in __init__:

 def __init__(self, project_name: str, project_dir: Path, conversation_id: Optional[int] = None):
     ...
     self._history_loaded: bool = False
+    self._mcp_config_file: Optional[Path] = None

Option 2: Use hasattr/getattr in close():

-if self._mcp_config_file and self._mcp_config_file.exists():
+if getattr(self, '_mcp_config_file', None) and self._mcp_config_file.exists():
.claude/templates/testing_prompt.template.md-176-184 (1)

176-184: Inconsistency: feature_get_summary doesn't exist in tool list.

Line 182 recommends using feature_get_summary for lighter status checks, but this tool is not listed in the "AVAILABLE MCP TOOLS" section (lines 151-172). Either add this tool to the list or correct the reference to feature_get_stats.

Proposed fix
 - **Don't read files unnecessarily** - Feature details from `feature_get_by_id` contain everything you need
 - **Be concise** - Short, focused responses save tokens for actual work
-- **Use `feature_get_summary`** for status checks (lighter than `feature_get_by_id`)
+- **Use `feature_get_stats`** for status checks (lighter than `feature_get_by_id`)
 - **Avoid re-reading large files** - Read once, remember the content
server/routers/settings.py-152-178 (1)

152-178: LGTM overall. The denied commands endpoints are well-implemented, with one documentation nit described below.

Minor documentation inconsistency: limit mismatch.

The docstring on line 156 states "Returns the last 100 commands," but the get_denied_commands function in server/security.py has limit: int = 50 as its default. Either update the endpoint to pass limit=100 or correct the documentation.

Proposed fix - align with documentation
 async def get_denied_commands_list():
     """Get list of recently denied commands.

     Returns the last 100 commands that were blocked by the security system.
     Useful for debugging and understanding what commands agents tried to run.
     """
-    denied = get_denied_commands()
+    denied = get_denied_commands(limit=100)
quality_gates.py-292-353 (1)

292-353: Parameter shadowing: run_type_check shadows function name.

The parameter run_type_check on line 295 shadows the function run_type_check defined on line 196. This can cause confusion and potential bugs if someone tries to call the function from within verify_quality.

Proposed fix - rename parameter
 def verify_quality(
     project_dir: Path,
     run_lint: bool = True,
-    run_type_check: bool = True,
+    run_types: bool = True,
     run_custom: bool = True,
     custom_script_path: str | None = None,
 ) -> QualityGateResult:
     ...
-    if run_type_check:
-        type_result = run_type_check(project_dir)
+    if run_types:
+        type_result = run_type_check(project_dir)
quality_gates.py-68-69 (1)

68-69: Potential index error if cmd is empty.

If cmd is an empty list, accessing cmd[0] on line 69 will raise an IndexError. Add a guard at the start of the function.

Proposed fix
 def _run_command(cmd: list[str], cwd: Path, timeout: int = 60) -> tuple[int, str, int]:
+    if not cmd:
+        return 1, "Empty command provided", 0
+    
     start = time.time()
structured_logging.py-35-37 (1)

35-37: LogLevel type doesn't include "warning", but the database stores only "warning".

The type allows "warn" but logs are stored with "warning" (from Python's standard logging level name). This causes a type-contract violation: callers using query(level="warn") would return empty results. Add "warning" to the LogLevel type.

Suggested fix
-LogLevel = Literal["debug", "info", "warn", "error"]
+LogLevel = Literal["debug", "info", "warn", "warning", "error"]
tests/test_repository_and_config.py-262-269 (1)

262-269: Use actual IDs after flush instead of hard-coding dependencies.

The tests assume IDs start at 1/2 in a fresh temporary database. While this will reliably work, assigning dependencies after flush() makes tests more explicit and resilient to potential database state variations.

Suggested fix
             f1 = Feature(priority=1, category="test", name="F1", description="", steps=[], passes=True)
             f2 = Feature(priority=2, category="test", name="F2", description="", steps=[], passes=False)
-            f3 = Feature(priority=3, category="test", name="F3", description="", steps=[], passes=False, dependencies=[1])
-            f4 = Feature(priority=4, category="test", name="F4", description="", steps=[], passes=False, dependencies=[2])
+            f3 = Feature(priority=3, category="test", name="F3", description="", steps=[], passes=False)
+            f4 = Feature(priority=4, category="test", name="F4", description="", steps=[], passes=False)
 
             session.add_all([f1, f2, f3, f4])
-            session.commit()
+            session.flush()
+            f3.dependencies = [f1.id]
+            f4.dependencies = [f2.id]
+            session.commit()
             f1 = Feature(priority=1, category="test", name="F1", description="", steps=[], passes=False)
-            f2 = Feature(priority=2, category="test", name="F2", description="", steps=[], passes=False, dependencies=[1])
+            f2 = Feature(priority=2, category="test", name="F2", description="", steps=[], passes=False)
 
             session.add_all([f1, f2])
-            session.commit()
+            session.flush()
+            f2.dependencies = [f1.id]
+            session.commit()
ui/src/components/AssistantPanel.tsx-53-68 (1)

53-68: Guard non-Error throwables before reading .message.

conversationError has type unknown | null from useQuery, so accessing .message without an instanceof Error check violates strict TypeScript. The codebase already uses this pattern in the retry function of the same hook. A simple instanceof Error guard prevents the crash.

✅ Suggested fix
  useEffect(() => {
-   if (conversationError && conversationId) {
-     const message = conversationError.message.toLowerCase()
+   if (conversationId && conversationError instanceof Error) {
+     const message = conversationError.message.toLowerCase()
      // Only clear for 404 errors, not transient network issues
      if (message.includes('not found') || message.includes('404')) {
        console.warn(`Conversation ${conversationId} not found, clearing stored ID`)
        setConversationId(null)
      }
    }
  }, [conversationError, conversationId])
progress.py-23-85 (1)

23-85: Close webhook responses to prevent socket leaks.

The urllib.request.urlopen() response object is a context manager and must be closed. Leaving it open can accumulate unclosed file descriptors over time.

Suggested fix
-        urllib.request.urlopen(req, timeout=5)
+        with urllib.request.urlopen(req, timeout=5) as resp:
+            resp.read()
server/websocket.py-273-277 (1)

273-277: Fire-and-forget task may silently swallow exceptions.

The _schedule_cleanup method creates a task and adds a done callback that retrieves the exception but doesn't log it. If cleanup_stale_agents fails, the error is silently ignored.

🔧 Suggested fix
     def _schedule_cleanup(self) -> None:
         """Schedule cleanup if needed (non-blocking)."""
         if self._should_cleanup():
             task = asyncio.create_task(self.cleanup_stale_agents())
-            task.add_done_callback(lambda t: t.exception() if not t.cancelled() else None)
+            def log_exception(t):
+                if not t.cancelled():
+                    exc = t.exception()
+                    if exc:
+                        logger.warning(f"Cleanup task failed: {exc}")
+            task.add_done_callback(log_exception)
api/feature_repository.py-291-311 (1)

291-311: Missing priority/score sorting for ready features.

The get_ready_features method returns features without sorting. The relevant snippet from api/dependency_resolver.py (lines 387-416) shows that ready features should be sorted by scheduling score, priority, and ID. This inconsistency could lead to different behavior depending on which function is used.

💡 Suggested enhancement
     def get_ready_features(self) -> list[Feature]:
         """Get features that are ready to implement.
         ...
         """
         passing_ids = self.get_passing_ids()
         candidates = self.get_pending()

         ready = []
         for f in candidates:
             deps = f.dependencies or []
             if all(dep_id in passing_ids for dep_id in deps):
                 ready.append(f)

+        # Sort by priority (lowest first), then by ID
+        ready.sort(key=lambda f: (f.priority, f.id))
+
         return ready
server/routers/projects.py-669-670 (1)

669-670: Timezone-naive datetime from file modification time.

datetime.fromtimestamp(stat.st_mtime) creates a timezone-naive datetime. This could cause inconsistencies when comparing with UTC timestamps used elsewhere (e.g., in the Feature model). Consider using timezone-aware datetimes.

🔧 Suggested fix
+from datetime import timezone

 # In list_knowledge_files:
             files.append(KnowledgeFile(
                 name=filepath.name,
                 size=stat.st_size,
-                modified=datetime.fromtimestamp(stat.st_mtime)
+                modified=datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
             ))
api/migrations.py-62-151 (1)

62-151: Refactor dynamic column handling to use SQL parameterization.

The code correctly implements try/except with rollback (lines 148-151), so transaction safety is adequate. However, the dynamic column name construction (lines 105-106, also at line 208 in migrate_add_timestamp_columns) injects column names directly into f-strings:

optional_col_defs += f",\n                        {col_name} {col_type}"
optional_col_names += f", {col_name}"

While the column names originate from SQLite's PRAGMA table_info() and are inherently validated, this pattern is fragile and violates parameterization best practices. Column identifiers should be quoted or handled via proper SQL identifier escaping to protect against edge cases.

Suggested approach

Use SQL quoting for identifiers:

optional_col_defs += f',\n                        "{col_name}" {col_type}'
optional_col_names += f', "{col_name}"'

Apply the same fix to line 208 in migrate_add_timestamp_columns.

Additionally, SQLite DDL rollback behavior (used in the catch block) requires SQLite 3.26.0+ (2019). If supporting older versions is necessary, document the minimum SQLite version requirement.

api/connection.py-73-91 (1)

73-91: Mount point matching may select incorrect filesystem on Unix.

When multiple mount points share a common prefix (e.g., /data and /data/shared), the current logic matches the first one encountered rather than the longest matching mount point. If /data is ext4 and /data/shared is nfs, a path under /data/shared/project could incorrectly be identified as local.

Suggested fix: select the longest matching mount point
         try:
             with open("/proc/mounts", "r") as f:
                 mounts = f.read()
+                # Find the longest matching mount point
+                best_match = None
+                best_fs_type = None
                 # Check each mount point to find which one contains our path
                 for line in mounts.splitlines():
                     parts = line.split()
                     if len(parts) >= 3:
                         mount_point = parts[1]
                         fs_type = parts[2]
-                        # Check if path is under this mount point and if it's a network FS
-                        if path_str.startswith(mount_point):
-                            if fs_type in ("nfs", "nfs4", "cifs", "smbfs", "fuse.sshfs"):
-                                return True
+                        # Check if path is under this mount point
+                        if path_str.startswith(mount_point + "/") or path_str == mount_point:
+                            if best_match is None or len(mount_point) > len(best_match):
+                                best_match = mount_point
+                                best_fs_type = fs_type
+                # Check if the best matching mount is a network FS
+                if best_fs_type in ("nfs", "nfs4", "cifs", "smbfs", "fuse.sshfs"):
+                    return True
         except (FileNotFoundError, PermissionError):
             pass
🧹 Nitpick comments (37)
pyproject.toml (1)

19-28: Avoid blanket DeprecationWarning ignores if possible.
Global suppression can mask upcoming breakages; consider scoping to known noisy modules or leaving DeprecationWarnings visible in CI.

start_ui.sh (1)

3-3: Remove or use SCRIPT_DIR to avoid dead state.

SCRIPT_DIR is currently unused; either remove it or use it to build absolute paths for inputs like requirements.txt and start_ui.py so future edits don’t rely on implicit CWD.

🧹 Suggested cleanup
-SCRIPT_DIR="$(pwd)"
autonomous_agent_demo.py (1)

57-86: Consider centralizing safe_asyncio_run to avoid drift.

This helper now exists here and in parallel_orchestrator.py; extracting it to a shared utility would keep Windows cleanup behavior consistent across entry points.

rate_limit_utils.py (1)

26-53: Consider precompiling retry-after regexes.

The patterns are rebuilt on every call; precompiling reduces overhead if this is on a hot path.

♻️ Suggested refactor
+RETRY_AFTER_PATTERNS = [
+    re.compile(r"retry.?after[:\s]+(\d+)\s*(?:seconds?)?", re.IGNORECASE),
+    re.compile(r"try again in\s+(\d+)\s*(?:seconds?|s\b)", re.IGNORECASE),
+    re.compile(r"(\d+)\s*seconds?\s*(?:remaining|left|until)", re.IGNORECASE),
+]
+
 def parse_retry_after(error_message: str) -> Optional[int]:
@@
-    patterns = [
-        r"retry.?after[:\s]+(\d+)\s*(?:seconds?)?",
-        r"try again in\s+(\d+)\s*(?:seconds?|s\b)",
-        r"(\d+)\s*seconds?\s*(?:remaining|left|until)",
-    ]
-
-    for pattern in patterns:
-        match = re.search(pattern, error_message, re.IGNORECASE)
+    for pattern in RETRY_AFTER_PATTERNS:
+        match = pattern.search(error_message)
         if match:
             return int(match.group(1))
ui/src/components/ErrorBoundary.tsx (1)

67-107: Align fallback UI styling with neobrutalism tokens & animations.

The current palette uses generic grays/blues and no custom animations. Please switch to the required neobrutalism tokens and apply the custom animations (and remember Tailwind v4 CSS variable syntax uses parentheses, e.g., bg-(--color-neo-...)). As per coding guidelines, Use neobrutalism design with Tailwind CSS v4 custom animations (animate-slide-in, animate-pulse-neo, animate-shimmer) and color tokens (--color-neo-pending, --color-neo-progress, --color-neo-done).

server/utils/auth.py (1)

94-106: Consider documenting query param token logging risk.

The query parameter token approach (?token=base64(...)) is a valid workaround for browser WebSockets that don't support custom headers. However, query parameters may be logged in web server access logs or browser history.

The module docstring mentions this option, but consider adding a comment near line 95 noting the logging risk, or recommend using this only in trusted/internal environments.

test_agent.py (1)

89-108: Backoff tests verify formulas, not actual implementation.

These tests compute the expected backoff inline rather than calling functions from rate_limit_utils. If backoff functions exist in the module, consider importing and testing them directly. If they're embedded in the agent logic, these specification tests still provide value as documentation of expected behavior.

Consider testing actual backoff functions if they exist

If rate_limit_utils exports backoff calculation functions, test them directly:

from rate_limit_utils import calculate_backoff, calculate_error_backoff

def test_backoff_sequence(self):
    expected = [60, 120, 240, 480, 960, 1920, 3600, 3600]
    for retries, expected_delay in enumerate(expected):
        assert calculate_backoff(retries) == expected_delay
api/logging_config.py (2)

84-86: Clearing root logger handlers may remove handlers from other modules.

root_logger.handlers.clear() removes all existing handlers, which could affect logging configured elsewhere before setup_logging() is called. Consider either:

  1. Documenting that setup_logging() should be called first at application startup
  2. Only adding handlers without clearing existing ones
Alternative: Add handlers without clearing
-        # Remove existing handlers to avoid duplicates
-        root_logger.handlers.clear()
+        # Check if we've already added our handlers
+        handler_names = {type(h).__name__ for h in root_logger.handlers}
+        if "RotatingFileHandler" in handler_names:
+            # Already configured, skip
+            _logging_configured = True
+            return

Or simply document the expected usage pattern.


118-131: Potential naming collision with structured_logging.get_logger.

The relevant code snippets show structured_logging.py also exports a get_logger function with a different signature (takes project_dir, agent_id, console_output). Having two get_logger functions in the codebase could cause confusion. Consider using a more specific name or documenting the distinction clearly.

quality_gates.py (3)

40-71: Potential command injection risk via cmd parameter.

The _run_command function passes cmd directly to subprocess.run. While the callers within this module construct commands safely, if this function is ever exposed or called with user-controlled input, it could be exploited. Consider documenting this as an internal-only function or adding input validation.

Also, the generic Exception catch on line 70-71 is overly broad and may mask unexpected errors. Consider catching more specific exceptions or at least logging the exception type.

Suggested improvement
 except FileNotFoundError:
     return 127, f"Command not found: {cmd[0]}", 0
-except Exception as e:
-    return 1, str(e), 0
+except (OSError, ValueError) as e:
+    return 1, f"Command execution error: {e}", 0

103-127: Detection order may produce unexpected results.

The function checks global ruff/flake8 before checking the project's venv. If a user has a globally installed older version but the project venv has a specific version, the global one will be used. Consider prioritizing the venv paths first for project-local consistency.

Consider reordering detection
 def _detect_python_linter(project_dir: Path) -> tuple[str, list[str]] | None:
-    # Check for ruff
-    if shutil.which("ruff"):
-        return ("ruff", ["ruff", "check", "."])
-
-    # Check for flake8
-    if shutil.which("flake8"):
-        return ("flake8", ["flake8", "."])
-
     # Check in virtual environment
     venv_ruff = project_dir / "venv/bin/ruff"
     if venv_ruff.exists():
         return ("ruff", [str(venv_ruff), "check", "."])

     venv_flake8 = project_dir / "venv/bin/flake8"
     if venv_flake8.exists():
         return ("flake8", [str(venv_flake8), "."])

+    # Fall back to global tools
+    if shutil.which("ruff"):
+        return ("ruff", ["ruff", "check", "."])
+
+    if shutil.which("flake8"):
+        return ("flake8", ["flake8", "."])
+
     return None

130-152: Same venv priority concern for type checker detection.

The function checks global mypy before venv mypy. Also, the condition on line 145 assumes that the presence of pyproject.toml or setup.py indicates a Python project that should use mypy, but many Python projects don't use type checking. This is acceptable given the "no type checker detected" fallback behavior.

.claude/templates/coding_prompt.template.md (1)

261-287: Mock data detection commands may have false positives.

Line 279's grep for new Map() and new Set() will flag legitimate uses of these data structures (e.g., for caching, memoization, or temporary data processing). This could cause confusion for agents. Consider adding context to clarify that these patterns need investigation, not automatic remediation.

.claude/commands/create-spec.md (1)

237-246: Static analysis: Consider using a proper heading instead of bold emphasis.

The markdownlint tool flagged line 237 for using emphasis (**MANDATORY: Infrastructure Features**) instead of a heading. While this is a stylistic choice, using a proper heading (e.g., #### Mandatory: Infrastructure Features) would improve document structure and accessibility.

Suggested change
-**MANDATORY: Infrastructure Features**
+#### Mandatory: Infrastructure Features
 
-If the app requires a database (Phase 3b answer was "Yes" or "Not sure"), you MUST include 5 Infrastructure features (indices 0-4):
+If the app requires a database (Phase 3b answer was "Yes" or "Not sure"), you MUST include 5 Infrastructure features (indices 0-4):
server/routers/settings.py (1)

136-149: Consider allowing preferred_ide to be cleared.

Currently, if update.preferred_ide is not None, the value is persisted. However, there's no way to clear/reset the preferred IDE to None. If a user wants to remove their preference, they cannot do so with the current logic. Consider supporting an empty string to clear the setting, or document that None means "don't update" rather than "clear".

Suggested approach to allow clearing
-    if update.preferred_ide is not None:
-        set_setting("preferred_ide", update.preferred_ide)
+    # Handle preferred_ide - allow empty string to clear
+    if update.preferred_ide is not None:
+        if update.preferred_ide == "":
+            # Clear the preference by setting empty string
+            set_setting("preferred_ide", "")
+        else:
+            set_setting("preferred_ide", update.preferred_ide)

Note: This would also require updating the validator in schemas.py to allow empty strings.

.github/workflows/ci.yml (1)

21-24: Avoid double-running security tests in CI.

pytest tests/ -v will re-run the two security files already executed in the previous step. Consider excluding them from the all-tests step (or removing the dedicated step) to save CI time.

♻️ Example tweak to avoid duplication
-      - name: Run all tests
-        run: python -m pytest tests/ -v
+      - name: Run all tests (excluding security)
+        run: python -m pytest tests/ -v --ignore=tests/test_security.py --ignore=tests/test_security_integration.py
ui/src/components/SettingsModal.tsx (1)

4-21: Add ARIA state to the IDE toggle group for accessibility.

These buttons act like a radio group; adding role/aria-checked (or aria-pressed) improves screen‑reader clarity.

♿ Suggested tweak
-              <div className="flex rounded-lg border overflow-hidden">
+              <div className="flex rounded-lg border overflow-hidden" role="radiogroup" aria-label="Preferred IDE">
                 {IDE_OPTIONS.map((ide) => (
                   <button
                     key={ide.id}
                     onClick={() => handleIDEChange(ide.id)}
                     disabled={isSaving}
+                    role="radio"
+                    aria-checked={settings.preferred_ide === ide.id}
                     className={`flex-1 py-2 px-3 text-sm font-medium transition-colors ${
                       settings.preferred_ide === ide.id
                         ? 'bg-primary text-primary-foreground'
                         : 'bg-background text-foreground hover:bg-muted'
                     } ${isSaving ? 'opacity-50 cursor-not-allowed' : ''}`}
                   >

Also applies to: 52-56, 209-231

ui/src/components/IDESelectionModal.tsx (1)

15-20: Consider passing isLoading prop from parent component.

The isLoading prop is well-integrated into the component (disabling inputs, showing spinner), but looking at App.tsx lines 559-563, the modal is rendered without passing isLoading. This means the loading state during IDE opening won't be reflected in the modal UI if it remains open.

If the modal is intended to close immediately upon selection (before the async operation completes), this is fine. Otherwise, consider passing isOpeningIDE from App.tsx.

Also applies to: 105-108

agent.py (1)

192-206: Fragile heuristic for detecting parallel mode.

The check "python" not in parent_name.lower() to detect if the agent is spawned by the orchestrator is unreliable:

  • The parent process name varies by platform and Python installation (e.g., python3, python3.11, pythonw, or even custom names).
  • On some systems, the orchestrator might not have "python" in its process name.

Consider using an explicit environment variable set by the orchestrator instead.

♻️ Suggested approach
-    try:
-        import psutil
-        parent_process = psutil.Process().parent()
-        parent_name = parent_process.name() if parent_process else ""
-
-        # Only clear if parent is NOT python (i.e., we're running manually, not from orchestrator)
-        if "python" not in parent_name.lower():
-            clear_stuck_features(project_dir)
-        # else: Skip clearing - we're in parallel mode, orchestrator manages features
-    except (ImportError, ModuleNotFoundError):
-        # psutil not available - skip clearing to be safe in unknown environment
-        logger.debug("psutil not available, skipping stuck feature clearing")
-    except Exception as e:
-        # If parent process check fails, skip clearing to avoid race conditions
-        logger.debug(f"Parent process check failed ({e}), skipping stuck feature clearing")
+    # Orchestrator sets this env var when spawning parallel agents
+    is_parallel_mode = os.environ.get("AUTOCODER_PARALLEL_MODE", "").lower() in ("1", "true")
+    if not is_parallel_mode:
+        clear_stuck_features(project_dir)

Then have the orchestrator set AUTOCODER_PARALLEL_MODE=1 when spawning agents.

ui/src/App.tsx (1)

242-260: IDE integration flow is well-implemented.

The flow properly:

  • Checks for a preferred IDE before showing the selection modal
  • Manages loading state during the async operation
  • Handles errors gracefully (though silently)

Consider adding user-visible feedback (e.g., toast notification) when opening the IDE fails, as console.error alone won't inform users of the issue.

ui/src/components/ResetProjectModal.tsx (2)

75-113: Hardcoded colors may break dark mode theming.

The reset type toggle buttons use bg-white and hover:bg-gray-50 (lines 81, 101) which won't adapt to dark mode. Consider using theme-aware CSS variables or Tailwind's dark mode utilities.

♻️ Suggested fix
               className={`w-full p-4 text-left border-3 border-[var(--color-neo-border)] transition-colors ${
                 !fullReset
                   ? 'bg-[var(--color-neo-progress)] shadow-neo'
-                  : 'bg-white hover:bg-gray-50'
+                  : 'bg-background hover:bg-muted'
               }`}
               className={`w-full p-4 text-left border-3 border-[var(--color-neo-border)] transition-colors ${
                 fullReset
                   ? 'bg-[var(--color-neo-danger)] text-white shadow-neo'
-                  : 'bg-white hover:bg-gray-50'
+                  : 'bg-background hover:bg-muted'
               }`}

74-114: Consider adding ARIA attributes for the reset type toggle.

The toggle buttons function as a mutually exclusive selection (like radio buttons) but lack ARIA attributes. For consistency with IDESelectionModal.tsx and better accessibility:

♿ Suggested accessibility enhancement
           {/* Reset Type Toggle */}
-          <div className="space-y-3">
+          <div className="space-y-3" role="radiogroup" aria-label="Reset type">
             <button
               type="button"
               onClick={() => setFullReset(false)}
+              role="radio"
+              aria-checked={!fullReset}
               className={...}
             >

Apply similarly to the Full Reset button with aria-checked={fullReset}.

ui/src/lib/api.ts (1)

5-34: Use IDEType for openProjectInIDE to preserve type safety.
The shared type IDEType is defined in types.ts and should be used to restrict the ide parameter to valid values ('vscode' | 'cursor' | 'antigravity') instead of accepting any string.

♻️ Suggested refactor
 import type {
   ProjectSummary,
   ProjectDetail,
   ProjectPrompts,
   FeatureListResponse,
   Feature,
   FeatureCreate,
   FeatureUpdate,
   FeatureBulkCreate,
   FeatureBulkCreateResponse,
   DependencyGraph,
   AgentStatusResponse,
   AgentActionResponse,
   SetupStatus,
   DirectoryListResponse,
   PathValidationResponse,
   AssistantConversation,
   AssistantConversationDetail,
   Settings,
   SettingsUpdate,
   ModelsResponse,
   DevServerStatusResponse,
   DevServerConfig,
   TerminalInfo,
   Schedule,
   ScheduleCreate,
   ScheduleUpdate,
   ScheduleListResponse,
   NextRunResponse,
+  IDEType,
 } from './types'
@@
-export async function openProjectInIDE(name: string, ide: string): Promise<{ status: string; message: string }> {
+export async function openProjectInIDE(
+  name: string,
+  ide: IDEType
+): Promise<{ status: string; message: string }> {
   return fetchJSON(`/projects/${encodeURIComponent(name)}/open-in-ide?ide=${encodeURIComponent(ide)}`, {
     method: 'POST',
   })
 }
api/feature_repository.py (2)

32-34: Duplicate helper function across modules.

The _utc_now() function is also defined in mcp_server/feature_mcp.py (lines 47-49). Consider extracting this to a shared utility module to avoid duplication and ensure consistency.


139-147: SQLAlchemy boolean comparison style.

Using Feature.passes == True works but triggers linter warnings in some configurations. The idiomatic SQLAlchemy approach is Feature.passes.is_(True).

♻️ Optional style fix
-        return {
-            f.id for f in self.session.query(Feature.id).filter(Feature.passes == True).all()
-        }
+        return {
+            f.id for f in self.session.query(Feature.id).filter(Feature.passes.is_(True)).all()
+        }
tests/test_security_integration.py (2)

28-69: Duplicate code: temporary_home context manager.

This context manager is identical to the one in tests/test_security.py (lines 33-71). Consider extracting it to a shared test utility module to avoid duplication.


392-440: Consider using pytest instead of manual test runner.

The manual test runner with try/except blocks and manual pass/fail tracking is functional but doesn't integrate with pytest's test discovery and reporting. Since tests/conftest.py sets up pytest fixtures, these tests would benefit from being converted to pytest test functions.

💡 Example conversion
# Convert to pytest style
import pytest

@pytest.mark.asyncio
async def test_blocked_command_via_hook(tmp_path):
    """Test that hardcoded blocked commands are rejected."""
    project_dir = tmp_path / "test_project"
    project_dir.mkdir()
    # ... rest of test logic ...
    assert result.get("decision") == "block"
api/migrations.py (2)

206-208: SQL injection via f-string interpolation.

The column name is interpolated directly into SQL via f-string. While col_name comes from a hardcoded list (line 199-204), this pattern is risky if the list is ever extended with user input or external data.

💡 Safer approach using parameterization or validation
# Add explicit validation for column names
VALID_COLUMN_PATTERN = re.compile(r'^[a-z_]+$')

for col_name, col_type in timestamp_columns:
    if not VALID_COLUMN_PATTERN.match(col_name):
        raise ValueError(f"Invalid column name: {col_name}")
    if col_name not in columns:
        conn.execute(text(f"ALTER TABLE features ADD COLUMN {col_name} {col_type}"))

Note: SQLite doesn't support parameterized DDL, so validation is the safest approach.


279-290: Missing error handling in orchestration function.

run_all_migrations calls each migration sequentially but doesn't handle failures. If one migration fails, subsequent migrations won't run, and there's no indication of which migration failed. Consider adding logging or a more robust error handling strategy.

💡 Suggested improvement
 def run_all_migrations(engine) -> None:
     """Run all migrations in order."""
-    migrate_add_in_progress_column(engine)
-    migrate_fix_null_boolean_fields(engine)
-    # ... etc
+    migrations = [
+        ("in_progress_column", migrate_add_in_progress_column),
+        ("fix_null_booleans", migrate_fix_null_boolean_fields),
+        ("dependencies_column", migrate_add_dependencies_column),
+        ("testing_columns", migrate_add_testing_columns),
+        ("timestamp_columns", migrate_add_timestamp_columns),
+        ("schedules_tables", migrate_add_schedules_tables),
+        ("feature_attempts_table", migrate_add_feature_attempts_table),
+        ("feature_errors_table", migrate_add_feature_errors_table),
+        ("regression_count_column", migrate_add_regression_count_column),
+        ("quality_result_column", migrate_add_quality_result_column),
+    ]
+    
+    for name, migration_fn in migrations:
+        try:
+            migration_fn(engine)
+            logger.debug(f"Migration '{name}' completed")
+        except Exception as e:
+            logger.error(f"Migration '{name}' failed: {e}")
+            raise
tests/conftest.py (2)

73-88: Session rollback may not fully isolate tests.

The db_session fixture rolls back changes in finally, which is good. However, if a test commits explicitly, those changes persist across tests since the same engine is reused via create_database caching.

Consider using a nested transaction (savepoint) pattern for better isolation.

💡 Nested transaction pattern
@pytest.fixture
def db_session(temp_db: Path):
    """Get a database session with automatic rollback."""
    from api.database import create_database

    _, SessionLocal = create_database(temp_db)
    connection = SessionLocal().get_bind().connect()
    transaction = connection.begin()
    session = SessionLocal(bind=connection)

    try:
        yield session
    finally:
        session.close()
        transaction.rollback()
        connection.close()

223-255: Double cleanup in populated_db fixture.

The populated_db fixture calls invalidate_engine_cache(temp_db) at line 255, but temp_db (which it depends on) already does this cleanup at line 70. This results in a double-invalidation, which is harmless but redundant.

♻️ Remove redundant cleanup
     yield temp_db

-    # Dispose cached engine to prevent file locks on Windows
-    invalidate_engine_cache(temp_db)
+    # Cleanup handled by temp_db fixture
server/routers/projects.py (3)

462-518: Subprocess command injection risk mitigated but worth documenting.

The IDE opening uses shutil.which() to find the executable (line 499), which is good. However, project_path is passed directly to subprocess.Popen. While project_path comes from the registry (not user input), documenting this trust assumption is valuable.

Also, on Unix, start_new_session=True is used but not on Windows. This is intentional but the asymmetry could be documented.

💡 Add comment for clarity
     try:
         if sys.platform == "win32":
+            # Windows: No need for start_new_session, process detaches automatically
             subprocess.Popen([cmd_path, project_path])
         else:
-            # Unix-like systems
+            # Unix-like systems: start_new_session detaches process from terminal
             subprocess.Popen([cmd_path, project_path], start_new_session=True)

726-727: Redundant validation with schema.

The KnowledgeFileUpload schema already enforces the filename pattern r'^[a-zA-Z0-9_\-\.]+\.md$' (per the relevant snippets). The manual check on line 726 for .., /, and \\ is therefore partially redundant, since the schema regex already restricts the format.

However, the schema validation runs before this code, so this check is a defense-in-depth measure. Consider aligning the validation or removing the redundancy.


694-695: Filename regex inconsistency with schema.

The regex r'^[a-zA-Z0-9_\-\.]+\.md$' on line 694 matches the schema's pattern, which is good. However, this validation is duplicated in multiple places (lines 694, 757). Consider extracting to a shared constant or utility.

♻️ Extract validation pattern
# At module level or in validation utils
import re

from fastapi import HTTPException

KNOWLEDGE_FILENAME_PATTERN = re.compile(r'^[a-zA-Z0-9_\-\.]+\.md$')

def validate_knowledge_filename(filename: str) -> None:
    if not KNOWLEDGE_FILENAME_PATTERN.match(filename):
        raise HTTPException(status_code=400, detail="Invalid filename")
api/connection.py (1)

419-436: Type hint doesn't reflect generator behavior.

get_db() is a generator function (uses yield) designed for FastAPI's Depends(), but the return type Session doesn't indicate this. While FastAPI handles it correctly, the type annotation is misleading for static analysis and IDE support.

Suggested fix: use Generator type hint
-from typing import Any, Optional
+from typing import Any, Generator, Optional
 ...
-def get_db() -> Session:
+def get_db() -> Generator[Session, None, None]:
     """
     Dependency for FastAPI to get database session.

     Yields a database session and ensures it's closed after use.
     Properly rolls back on error to prevent PendingRollbackError.
     """
api/models.py (2)

28-30: _utc_now() is duplicated across multiple modules.

This helper function appears identically in api/feature_repository.py (lines 31-33) and mcp_server/feature_mcp.py (lines 47-49). Consider consolidating into a shared utility module to follow DRY principles.


21-25: Update import to use the modern SQLAlchemy 2.0 location.

The project uses SQLAlchemy 2.0 (as specified in pyproject.toml), which moved declarative_base from the legacy sqlalchemy.ext.declarative namespace to sqlalchemy.orm. Using the new location aligns with SQLAlchemy 2.0 best practices.

Suggested fix
-from sqlalchemy.ext.declarative import declarative_base
+from sqlalchemy.orm import declarative_base

Comment on lines +5 to 37

"""
This module re-exports all database components for backwards compatibility.

The implementation has been split into:
- api/models.py - SQLAlchemy ORM models
- api/migrations.py - Database migration functions
- api/connection.py - Connection management and session utilities
"""

from sqlalchemy import (
    Boolean,
    CheckConstraint,
    Column,
    DateTime,
    ForeignKey,
    Index,
    Integer,
    String,
    Text,
    create_engine,
    text,
)

from api.connection import (
    SQLITE_BUSY_TIMEOUT_MS,
    SQLITE_MAX_RETRIES,
    SQLITE_RETRY_DELAY_MS,
    check_database_health,
    checkpoint_wal,
    create_database,
    execute_with_retry,
    get_database_path,
    get_database_url,
    get_db,
    get_db_session,
    get_robust_connection,
    invalidate_engine_cache,
    robust_db_connection,
    set_session_maker,
)
from api.models import (
    Base,
    Feature,
    FeatureAttempt,
    FeatureError,
    Schedule,
    ScheduleOverride,
)

def _utc_now() -> datetime:
    """Return current UTC time. Replacement for deprecated datetime.utcnow()."""
    return datetime.now(timezone.utc)
🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm where Feature is defined and its implementation
echo "=== Searching for Feature class definition ==="
rg -n "class Feature" api/database.py api/models.py

echo ""
echo "=== Feature class full definition from api/models.py ==="
# Get the Feature class definition to verify required fields
rg -A 20 "^class Feature" api/models.py

Repository: leonvanzyl/autocoder

Length of output: 2694


🏁 Script executed:

#!/bin/bash
# Get complete Feature class definition to see all columns including dependencies
rg -A 40 "^class Feature\(Base\)" api/models.py | head -50

Repository: leonvanzyl/autocoder

Length of output: 2164


Feature ORM definition must be moved to api/database.py per project guideline.

The Feature class is currently defined in api/models.py and re-exported in api/database.py, but the project guideline explicitly requires it to be defined directly in api/database.py. While all required fields (priority, category, name, description, steps, passes, dependencies) are present and correct, the architectural location violates the guideline. Move the Feature class definition to api/database.py.
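
A sketch of what the move could look like. The column types below are assumptions for illustration; the authoritative definitions are the ones currently in api/models.py:

# api/database.py (sketch, assuming Base stays importable from api.models)
from sqlalchemy import Boolean, Column, Integer, String, Text
from api.models import Base

class Feature(Base):
    __tablename__ = "features"

    id = Column(Integer, primary_key=True)
    priority = Column(Integer, nullable=False)
    category = Column(String, nullable=False)
    name = Column(String, nullable=False)
    description = Column(Text, nullable=False)
    steps = Column(Text, nullable=False)        # JSON-encoded list of steps (assumed)
    passes = Column(Boolean, default=False)
    dependencies = Column(Text, nullable=True)  # JSON-encoded list of feature IDs (assumed)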

🤖 Prompt for AI Agents
In `@api/database.py` around lines 5 - 37, Move the Feature ORM class from
api/models.py into api/database.py and keep all other models (FeatureAttempt,
FeatureError, Schedule, ScheduleOverride) in api/models.py; in api/database.py
define Feature using the existing Base (ensure Base is imported or declared
before the class) and the same columns/relationships (priority, category, name,
description, steps, passes, dependencies) so the class signature and metadata
remain unchanged, then remove Feature from api/models.py and update any
re-exports/imports to reference api.database.Feature instead of
api.models.Feature; ensure the module still re-exports Base and the other model
classes and that any session/engine utilities (get_db, get_db_session,
create_database, invalidate_engine_cache, etc.) continue to import/import-from
the same names.

Comment on lines +37 to +77
def _commit_with_retry(session: Session, max_retries: int = MAX_COMMIT_RETRIES) -> None:
    """
    Commit a session with retry logic for transient errors.

    Handles SQLITE_BUSY, SQLITE_LOCKED, and similar transient errors
    with exponential backoff.

    Args:
        session: SQLAlchemy session to commit
        max_retries: Maximum number of retry attempts

    Raises:
        OperationalError: If commit fails after all retries
    """
    delay_ms = INITIAL_RETRY_DELAY_MS
    last_error = None

    for attempt in range(max_retries + 1):
        try:
            session.commit()
            return
        except OperationalError as e:
            error_msg = str(e).lower()
            # Retry on lock/busy errors
            if "locked" in error_msg or "busy" in error_msg:
                last_error = e
                if attempt < max_retries:
                    logger.warning(
                        f"Database commit failed (attempt {attempt + 1}/{max_retries + 1}), "
                        f"retrying in {delay_ms}ms: {e}"
                    )
                    time.sleep(delay_ms / 1000)
                    delay_ms *= 2  # Exponential backoff
                    session.rollback()  # Reset session state before retry
                    continue
            raise

    # If we get here, all retries failed
    if last_error:
        logger.error(f"Database commit failed after {max_retries + 1} attempts")
        raise last_error
⚠️ Potential issue | 🟠 Major

Retry logic loses pending changes after rollback.

After session.rollback() on line 70, any pending changes made before calling _commit_with_retry are lost. On retry, the function only attempts session.commit() again without re-applying the changes, which will either commit nothing or fail.

The caller must re-add/modify entities after each rollback for this pattern to work correctly. This is likely a design assumption, but it's worth documenting or restructuring to make this explicit.

💡 Consider this approach

The current pattern works if callers follow a specific flow:

  1. Modify entities
  2. Call _commit_with_retry
  3. On failure, caller must re-query and re-modify

However, the callers (e.g., mark_in_progress) modify entities and then call commit. After rollback, those modifications are lost but the retry loop still tries to commit an empty transaction.

Either document this limitation clearly or restructure so callers handle retry themselves.
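
One way to make the contract explicit is to pass the mutation in as a callable that gets re-applied before every commit attempt. A sketch only, using literal retry values in place of the module's MAX_COMMIT_RETRIES / INITIAL_RETRY_DELAY_MS constants:

import time
from typing import Callable

from sqlalchemy.exc import OperationalError
from sqlalchemy.orm import Session

def commit_with_reapply(
    session: Session,
    apply_changes: Callable[[Session], None],
    max_retries: int = 5,
) -> None:
    """Re-apply the caller's mutation before each commit so a rollback cannot lose it."""
    delay_ms = 50
    for attempt in range(max_retries + 1):
        apply_changes(session)  # re-run the mutation on every attempt, including retries
        try:
            session.commit()
            return
        except OperationalError as e:
            msg = str(e).lower()
            if ("locked" in msg or "busy" in msg) and attempt < max_retries:
                session.rollback()  # discards the mutation; the next loop re-applies it
                time.sleep(delay_ms / 1000)
                delay_ms *= 2  # exponential backoff
                continue
            raise

mark_in_progress would then pass a closure that re-fetches the feature and sets in_progress/started_at, so every retry commits a fresh copy of the change.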

🤖 Prompt for AI Agents
In `@api/feature_repository.py` around lines 37 - 77, The retry loop in
_commit_with_retry currently calls session.rollback(), which discards caller
modifications (e.g., changes made by mark_in_progress) and makes retries
meaningless; change the behavior so the function does NOT perform
session.rollback() on transient OperationalError, and instead either (A)
propagate/raise a specific retryable exception (e.g., RetryableCommitError) or
re-raise the OperationalError immediately so the caller (such as
mark_in_progress) can catch it, re-apply changes, and perform its own retry
logic; also update the _commit_with_retry docstring to state that callers are
responsible for re-applying pending changes after a rollback and for retrying
commits.

Comment on lines +205 to +223
def mark_in_progress(self, feature_id: int) -> Optional[Feature]:
    """Mark a feature as in progress.

    Args:
        feature_id: The feature ID to update.

    Returns:
        Updated Feature or None if not found.

    Note:
        Uses retry logic to handle transient database errors.
    """
    feature = self.get_by_id(feature_id)
    if feature and not feature.passes and not feature.in_progress:
        feature.in_progress = True
        feature.started_at = _utc_now()
        _commit_with_retry(self.session)
        self.session.refresh(feature)
    return feature
⚠️ Potential issue | 🟠 Major

Conditional update may not re-apply changes on retry.

In mark_in_progress, if the commit fails and _commit_with_retry rolls back, the modifications to feature.in_progress and feature.started_at are lost. The retry in _commit_with_retry will commit an empty transaction since the feature object was detached/reverted.

This applies to all status update methods (mark_passing, mark_failing, clear_in_progress).

🔧 Proposed fix: wrap modification in retry loop
 def mark_in_progress(self, feature_id: int) -> Optional[Feature]:
+    delay_ms = INITIAL_RETRY_DELAY_MS
+    for attempt in range(MAX_COMMIT_RETRIES + 1):
         feature = self.get_by_id(feature_id)
         if feature and not feature.passes and not feature.in_progress:
             feature.in_progress = True
             feature.started_at = _utc_now()
-            _commit_with_retry(self.session)
+            try:
+                self.session.commit()
+            except OperationalError as e:
+                if attempt < MAX_COMMIT_RETRIES and ("locked" in str(e).lower() or "busy" in str(e).lower()):
+                    self.session.rollback()
+                    time.sleep(delay_ms / 1000)
+                    delay_ms *= 2
+                    continue
+                raise
             self.session.refresh(feature)
-        return feature
+            return feature
+    return feature
🤖 Prompt for AI Agents
In `@api/feature_repository.py` around lines 205 - 223, The status-update methods
(mark_in_progress, mark_passing, mark_failing, clear_in_progress) modify an ORM
object then call _commit_with_retry, but if a commit fails and rollback happens
those in-memory changes are lost and retries will commit nothing; fix by
ensuring the state changes are applied inside the retryable commit path — e.g.,
change each method to fetch/merge the Feature (use get_by_id or session.merge),
then wrap the attribute assignments and commit inside the same retry loop or
pass a callback to _commit_with_retry that re-applies feature.in_progress /
feature.passes / feature.started_at modifications before each commit attempt so
retries always persist the intended updates and session.refresh(feature) is done
after a successful commit.

Comment on lines +113 to +116
# Log startup
logger = logging.getLogger(__name__)
logger.debug(f\"Logging initialized. Log file: {log_path}\")

⚠️ Potential issue | 🔴 Critical

Typo: Escaped quotes in f-string.

Line 115 has \" instead of regular quotes, which will cause a syntax error.

Proposed fix
-        logger.debug(f\"Logging initialized. Log file: {log_path}\")
+        logger.debug(f"Logging initialized. Log file: {log_path}")
🤖 Prompt for AI Agents
In `@api/logging_config.py` around lines 113 - 116, The f-string for the startup
log contains escaped quotes causing a syntax error; update the logger.debug call
(the one using logger from logging.getLogger(__name__)) to use a proper f-string
delimiter (e.g., f"Logging initialized. Log file: {log_path}" or f'Logging
initialized. Log file: {log_path}') so the string is valid and includes the
log_path variable.

Comment on lines +181 to +186
except (ValueError, UnicodeDecodeError, binascii.Error):
return Response(
status_code=401,
content="Invalid authorization header",
headers={"WWW-Authenticate": 'Basic realm="Autocoder"'},
)
⚠️ Potential issue | 🔴 Critical

Missing binascii import causes NameError.

The exception handler references binascii.Error, but the binascii module is not imported at the top of the file. This will raise a NameError when an invalid base64-encoded authorization header is provided.

🐛 Proposed fix

Add the import at the top of the file with other imports:

 import asyncio
 import base64
+import binascii
 import os
🤖 Prompt for AI Agents
In `@server/main.py` around lines 181 - 186, The except clause in the
authorization parsing references binascii.Error (in the except tuple around
ValueError and UnicodeDecodeError and the Response return block), but binascii
is not imported; add "import binascii" alongside the other top-of-file imports
in server/main.py so the NameError is avoided and the exception handler can
catch base64 decode errors properly.

Comment on lines 127 to 134
 class FeatureUpdate(BaseModel):
-    """Request schema for updating a feature (partial updates allowed)."""
-    category: str | None = None
-    name: str | None = None
-    description: str | None = None
-    steps: list[str] | None = None
+    """Request schema for updating a feature. All fields optional for partial updates."""
+    category: str | None = Field(None, min_length=1, max_length=100)
+    name: str | None = Field(None, min_length=1, max_length=255)
+    description: str | None = Field(None, min_length=1)
+    steps: list[str] | None = Field(None, min_length=1)
     priority: int | None = None
     dependencies: list[int] | None = None  # Optional - can update dependencies
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Find requirements or setup files to determine Pydantic version
find . -type f \( -name "requirements*.txt" -o -name "setup.py" -o -name "pyproject.toml" -o -name "poetry.lock" \) | head -20

Repository: leonvanzyl/autocoder

Length of output: 99


🌐 Web query:

Pydantic v2 Field min_length for list type validation behavior

💡 Result:

In Pydantic v2, Field(min_length=...) on a list[...] constrains the number of items in the list (JSON Schema minItems), not the “length” of each element. Lists explicitly support min_length / max_length as item-count constraints. [1] Field documents min_length as “minimum length for iterables” (lists, tuples, sets, etc.). [2]

So:

from typing import Annotated
from annotated_types import MinLen
from pydantic import BaseModel, Field

class M(BaseModel):
    # list must have >= 2 items
    xs: list[str] = Field(min_length=2)

    # list must have >= 2 items AND each string must have length >= 3
    ys: list[Annotated[str, MinLen(3)]] = Field(min_length=2)

If you prefer a constrained-list type, conlist(item_type, min_length=..., max_length=...) is the dedicated helper (it wraps the list type with length constraints). [3]

Also note: v1’s min_items / max_items were replaced by min_length / max_length (and min_items/max_items are deprecated aliases in the Field implementation). [2]

Sources: [1] [2] [3]


🏁 Script executed:

cat requirements.txt

Repository: leonvanzyl/autocoder

Length of output: 509


🏁 Script executed:

cat pyproject.toml

Repository: leonvanzyl/autocoder

Length of output: 729


🏁 Script executed:

cat -n server/schemas.py | sed -n '120,145p'

Repository: leonvanzyl/autocoder

Length of output: 1138


🏁 Script executed:

# Check if there are any custom validators in the FeatureUpdate model
rg -A 20 "class FeatureUpdate" server/schemas.py

Repository: leonvanzyl/autocoder

Length of output: 882


Each step string in the steps list should be validated as non-empty for consistency.

On line 132, steps: list[str] | None = Field(None, min_length=1) validates that the list has at least one item but does not ensure each step string is non-empty. In Pydantic v2, min_length on a list constrains the item count, not string length. The other string fields (category, name, description) all use min_length=1 to prevent empty strings, so the steps field should similarly validate that each step is a non-empty string. Consider using list[Annotated[str, MinLen(1)]] or a custom validator to enforce this constraint.
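
A sketch of that constraint in the schema (Pydantic v2 with annotated-types; only the steps field shown):

from typing import Annotated

from annotated_types import MinLen
from pydantic import BaseModel, Field

class FeatureUpdate(BaseModel):
    # Field(min_length=1) bounds the item count of the list;
    # MinLen(1) on the item type rejects empty step strings.
    steps: list[Annotated[str, MinLen(1)]] | None = Field(None, min_length=1)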

🤖 Prompt for AI Agents
In `@server/schemas.py` around lines 127 - 134, The steps field in FeatureUpdate
currently only enforces the list length but not that each string is non-empty;
update the schema so each step string has min length 1 (e.g., change the typing
of steps from list[str] | None to list[Annotated[str, MinLen(1)]] | None or add
a Pydantic validator on FeatureUpdate.steps that checks every item is a
non-empty string) and ensure imports for Annotated and MinLen (or the validator
utilities) are added/updated accordingly.

Comment on lines +48 to +50
# Default max output tokens - use 131k only for alternative APIs (like GLM), otherwise use 32k for Anthropic
import os
DEFAULT_MAX_OUTPUT_TOKENS = "131072" if os.getenv("ANTHROPIC_BASE_URL") else "32000"
⚠️ Potential issue | 🔴 Critical

Duplicate import and missing uuid module.

  1. os is already imported at line 12, making this redundant
  2. uuid is used at line 313 (uuid.uuid4().hex) but is never imported, causing a NameError at runtime
Proposed fix
-# Default max output tokens - use 131k only for alternative APIs (like GLM), otherwise use 32k for Anthropic
-import os
-DEFAULT_MAX_OUTPUT_TOKENS = "131072" if os.getenv("ANTHROPIC_BASE_URL") else "32000"
+import uuid
+
+# Default max output tokens - use 131k only for alternative APIs (like GLM), otherwise use 32k for Anthropic
+DEFAULT_MAX_OUTPUT_TOKENS = "131072" if os.getenv("ANTHROPIC_BASE_URL") else "32000"
🤖 Prompt for AI Agents
In `@server/services/assistant_chat_session.py` around lines 48 - 50, Remove the
duplicate os import and add the missing uuid import at the top of the module so
uuid.uuid4().hex can be used without error; specifically, eliminate the second
"import os" near the DEFAULT_MAX_OUTPUT_TOKENS definition (leave the original
import at the top that already exists) and add "import uuid" alongside other
imports so the usage of uuid in the code (uuid.uuid4().hex) resolves correctly
while keeping DEFAULT_MAX_OUTPUT_TOKENS logic unchanged.

Comment on lines +96 to +195
def _init_database(self) -> None:
    """Initialize the SQLite database for logs."""
    with self._lock:
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()

        # Enable WAL mode for better concurrency with parallel agents
        # WAL allows readers and writers to work concurrently without blocking
        cursor.execute("PRAGMA journal_mode=WAL")

        # Create logs table
        cursor.execute("""
            CREATE TABLE IF NOT EXISTS logs (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                timestamp TEXT NOT NULL,
                level TEXT NOT NULL,
                message TEXT NOT NULL,
                agent_id TEXT,
                feature_id INTEGER,
                tool_name TEXT,
                duration_ms INTEGER,
                extra TEXT
            )
        """)

        # Create indexes for common queries
        cursor.execute("""
            CREATE INDEX IF NOT EXISTS idx_logs_timestamp
            ON logs(timestamp)
        """)
        cursor.execute("""
            CREATE INDEX IF NOT EXISTS idx_logs_level
            ON logs(level)
        """)
        cursor.execute("""
            CREATE INDEX IF NOT EXISTS idx_logs_agent_id
            ON logs(agent_id)
        """)
        cursor.execute("""
            CREATE INDEX IF NOT EXISTS idx_logs_feature_id
            ON logs(feature_id)
        """)

        conn.commit()
        conn.close()

def emit(self, record: logging.LogRecord) -> None:
    """Store a log record in the database."""
    try:
        # Extract structured data from record
        entry = StructuredLogEntry(
            timestamp=datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"),
            level=record.levelname.lower(),
            message=self.format(record),
            agent_id=getattr(record, "agent_id", self.agent_id),
            feature_id=getattr(record, "feature_id", None),
            tool_name=getattr(record, "tool_name", None),
            duration_ms=getattr(record, "duration_ms", None),
            extra=getattr(record, "extra", {}),
        )

        with self._lock:
            conn = sqlite3.connect(self.db_path)
            cursor = conn.cursor()

            cursor.execute(
                """
                INSERT INTO logs
                (timestamp, level, message, agent_id, feature_id, tool_name, duration_ms, extra)
                VALUES (?, ?, ?, ?, ?, ?, ?, ?)
                """,
                (
                    entry.timestamp,
                    entry.level,
                    entry.message,
                    entry.agent_id,
                    entry.feature_id,
                    entry.tool_name,
                    entry.duration_ms,
                    json.dumps(entry.extra) if entry.extra else None,
                ),
            )

            # Cleanup old entries if over limit
            cursor.execute("SELECT COUNT(*) FROM logs")
            count = cursor.fetchone()[0]
            if count > self.max_entries:
                delete_count = count - self.max_entries
                cursor.execute(
                    """
                    DELETE FROM logs WHERE id IN (
                        SELECT id FROM logs ORDER BY timestamp ASC LIMIT ?
                    )
                    """,
                    (delete_count,),
                )

            conn.commit()
            conn.close()
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

SQLite WAL mode concurrent writers database locked busy_timeout official documentation

💡 Result:

  • WAL mode supports concurrent readers and a writer, but it does not support concurrent writers: “there can only be one writer at a time” because only one connection can hold the exclusive WAL write lock while appending to the WAL. [1], [2]
  • So if two connections try to write at once, the second writer will typically hit SQLITE_BUSY / “database is locked” until the first writer finishes (or until it gives up). [1], [2]
  • To wait instead of failing immediately, set a busy timeout (or busy handler):
    • C API: sqlite3_busy_timeout(db, ms) makes SQLite sleep/retry up to ms total before returning SQLITE_BUSY. [3]
    • SQL: PRAGMA busy_timeout = ms; (sets the connection’s busy handler; only one busy handler per connection). [4]
  • Even with a busy handler/timeout, SQLite may return SQLITE_BUSY without calling the handler if calling it could risk a deadlock (per the official sqlite3_busy_handler() docs). [5]

Sources: Write-Ahead Logging [1], WAL file format / WAL_WRITE_LOCK [2], sqlite3_busy_timeout() [3], PRAGMA busy_timeout [4], sqlite3_busy_handler() [5].


SQLite writes may fail under multi-process contention without a busy timeout.

WAL mode supports concurrent readers and a single writer, but does not support concurrent writers. When multiple processes attempt to write simultaneously (e.g., parallel logging agents), the second writer will hit "database is locked" unless a timeout is configured. Configure a busy timeout on database connections so SQLite retries instead of failing immediately.

🛠️ Proposed fix
-            conn = sqlite3.connect(self.db_path)
+            conn = sqlite3.connect(self.db_path, timeout=5.0)
             cursor = conn.cursor()
@@
-                conn = sqlite3.connect(self.db_path)
+                conn = sqlite3.connect(self.db_path, timeout=5.0)
                 cursor = conn.cursor()
🤖 Prompt for AI Agents
In `@structured_logging.py` around lines 96 - 195, The SQLite connections in
_init_database and emit currently open without a busy timeout, causing "database
is locked" under concurrent writers; update both sqlite3.connect calls to
specify a timeout (e.g. sqlite3.connect(self.db_path, timeout=5.0)) and/or
immediately run "PRAGMA busy_timeout = 5000" on each new connection (use
milliseconds) so writes retry instead of failing; add a configurable attribute
(e.g., self.sqlite_busy_timeout or self.sqlite_timeout) and use it for all new
connections (in _init_database and emit) to ensure consistent behavior across
the class.

Comment on lines +8 to +55
import { useState } from 'react'
import { Bot, FileEdit, Loader2, AlertTriangle } from 'lucide-react'
import { SpecCreationChat } from './SpecCreationChat'
import { startAgent } from '../lib/api'

type InitializerStatus = 'idle' | 'starting' | 'error'

interface ProjectSetupRequiredProps {
  projectName: string
  onSetupComplete: () => void
}

export function ProjectSetupRequired({ projectName, onSetupComplete }: ProjectSetupRequiredProps) {
  const [showChat, setShowChat] = useState(false)
  const [initializerStatus, setInitializerStatus] = useState<InitializerStatus>('idle')
  const [initializerError, setInitializerError] = useState<string | null>(null)
  const [yoloModeSelected, setYoloModeSelected] = useState(false)

  const handleClaudeSelect = () => {
    setShowChat(true)
  }

  const handleManualSelect = () => {
    // For manual, just refresh to show the empty project
    // User can edit prompts/app_spec.txt directly
    onSetupComplete()
  }

  const handleSpecComplete = async (_specPath: string, yoloMode: boolean = false) => {
    setYoloModeSelected(yoloMode)
    setInitializerStatus('starting')
    try {
      await startAgent(projectName, { yoloMode })
      onSetupComplete()
    } catch (err) {
      setInitializerStatus('error')
      setInitializerError(err instanceof Error ? err.message : 'Failed to start agent')
    }
  }

  const handleRetryInitializer = () => {
    setInitializerError(null)
    handleSpecComplete('', yoloModeSelected)
  }

  const handleChatCancel = () => {
    setShowChat(false)
  }
⚠️ Potential issue | 🟠 Major

Guard async startAgent against cancel to prevent unintended navigation.

If the user cancels while startAgent is in flight, onSetupComplete() will still fire and can navigate away despite cancel intent. It also leaves stale status/error when returning to the options view. Add a cancel guard (or AbortController if startAgent supports it) and reset state on cancel.

🛠️ Suggested fix
-import { useState } from 'react'
+import { useRef, useState } from 'react'

 export function ProjectSetupRequired({ projectName, onSetupComplete }: ProjectSetupRequiredProps) {
   const [showChat, setShowChat] = useState(false)
   const [initializerStatus, setInitializerStatus] = useState<InitializerStatus>('idle')
   const [initializerError, setInitializerError] = useState<string | null>(null)
   const [yoloModeSelected, setYoloModeSelected] = useState(false)
+  const cancelRef = useRef(false)

   const handleClaudeSelect = () => {
+    cancelRef.current = false
     setShowChat(true)
   }

   const handleSpecComplete = async (_specPath: string, yoloMode: boolean = false) => {
+    cancelRef.current = false
     setYoloModeSelected(yoloMode)
     setInitializerStatus('starting')
     try {
       await startAgent(projectName, { yoloMode })
+      if (cancelRef.current) return
       onSetupComplete()
     } catch (err) {
+      if (cancelRef.current) return
       setInitializerStatus('error')
       setInitializerError(err instanceof Error ? err.message : 'Failed to start agent')
     }
   }

   const handleChatCancel = () => {
+    cancelRef.current = true
+    setInitializerStatus('idle')
+    setInitializerError(null)
     setShowChat(false)
   }
🤖 Prompt for AI Agents
In `@ui/src/components/ProjectSetupRequired.tsx` around lines 8 - 55, The async
startAgent call in handleSpecComplete needs a cancel guard so onSetupComplete
and state updates don't run if the user cancels; modify handleSpecComplete to
create and use an AbortController signal (if startAgent supports it) or a local
"cancelled" flag/promise, pass the signal to startAgent, and only call
onSetupComplete or setInitializerStatus/error when not cancelled, and update
handleChatCancel to call controller.abort() or set cancelled = true and also
reset showChat, initializerStatus ('idle'), initializerError (null) and
yoloModeSelected to their initial values; also ensure handleRetryInitializer
reinitializes a fresh controller/flag before calling handleSpecComplete so
retries aren't affected by prior cancels.

@getworken getworken closed this Jan 28, 2026
@getworken getworken deleted the feature/my-updates branch January 28, 2026 09:49
rudiheydra added a commit to rudiheydra/AutoBuildr that referenced this pull request Jan 28, 2026
- Added TestAcceptanceGateFailDeterministic class with
  test_acceptance_gate_fail_deterministic() proving AcceptanceGate
  returns verdict='failed' when file_exists validator points to
  a non-existent path
- Deterministic test — no LLM involvement
- All 43 tests in test_dspy_pipeline_e2e.py pass

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>