@Shironex (Collaborator) commented Jan 24, 2026

Summary

This PR fixes the app spec generation failing for non-Claude models (Cursor, Gemini, OpenCode, Copilot) that don't support structured output capabilities.

  • Add supportsStructuredOutput() centralized utility function
  • Add explicit JSON instructions for non-Claude models in feature generation
  • Add JSON extraction fallback for text responses in sync-spec
  • Improve consistency across all spec generation endpoints

Fixes #669

Problem

When users select a non-Claude AI model (such as Cursor, Gemini, or Copilot) for app spec generation, the final step fails with a "No valid JSON found in response" error. This happens because:

  1. generate-features-from-spec.ts was missing structured output handling
  2. sync-spec.ts was missing JSON extraction fallback
  3. No consistent way to detect which models support structured output

Solution

Applied the same pattern already used in generate-spec.ts:

  1. Added supportsStructuredOutput() utility (libs/types/src/provider-utils.ts)
    • Centralized function to detect if a model supports JSON schema output
    • Currently returns true for Claude and Codex models
    • Returns false for Cursor, Gemini, OpenCode, and Copilot models
  2. Updated generate-features-from-spec.ts
    • Added featuresOutputSchema for Claude/Codex structured output
    • Added explicit JSON formatting instructions for non-Claude/Codex models
    • Pre-extract JSON from text responses using extractJsonWithArray
    • Handle both structured_output and text responses properly
  3. Updated sync-spec.ts
    • Added techStackOutputSchema for structured output
    • Added JSON extraction fallback using extractJson utility
    • Handle both structured output and text parsing paths
  4. Updated generate-spec.ts and validate-issue.ts
    • Replaced isCursorModel() with supportsStructuredOutput() for consistency
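
For readers skimming the thread, here is a minimal sketch of the prompt-side half of this pattern: pass a schema when the model can enforce one, otherwise append JSON-only instructions to the prompt. The names `QueryPlan`, `buildQueryPlan`, and `JSON_ONLY_INSTRUCTIONS`, the instruction wording, and the `outputFormat` shape are assumptions made for illustration and are not taken from the PR's code.

```typescript
// Illustrative only: every name here is hypothetical, not the PR's actual code.
type JsonSchema = Record<string, unknown>;

interface QueryPlan {
  prompt: string;
  // Present only when the model can enforce a schema natively (assumed shape).
  outputFormat?: { type: "json_schema"; schema: JsonSchema };
}

// Assumed wording; the real prompts may differ.
const JSON_ONLY_INSTRUCTIONS = [
  "CRITICAL: Respond with a single valid JSON object and nothing else.",
  "Do not wrap the JSON in markdown fences or add commentary.",
].join("\n");

function buildQueryPlan(
  basePrompt: string,
  schema: JsonSchema,
  structuredOutputSupported: boolean
): QueryPlan {
  if (structuredOutputSupported) {
    // Structured path: the provider is asked to honor the schema directly.
    return { prompt: basePrompt, outputFormat: { type: "json_schema", schema } };
  }
  // Fallback path: no schema, so the prompt itself must demand JSON-only output.
  return { prompt: `${basePrompt}\n\n${JSON_ONLY_INSTRUCTIONS}` };
}
```

The boolean is passed in rather than computed here so the sketch stays independent of how model capability is detected.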

Test plan

  • Test spec generation with Claude model (should use structured output)
  • Test spec generation with Cursor model (should use JSON instructions fallback)
  • Test spec regeneration/feature generation with non-Claude models
  • Test sync-spec tech stack analysis with non-Claude models
  • Verify issue validation works with all model types

Related Issues

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Prefer structured JSON output from AI models for feature and tech‑stack extraction; schema-backed parsing when available.
  • Improvements

    • Robust fallback to plain-text JSON extraction with clearer logging and telemetry.
    • Longer, model-aware timeouts and reasoning-effort hints for large feature-generation runs.
    • Parsing now prefers structured responses with safer fallback handling.
  • Chores

    • Public API extended to expose structured-output capability and reasoning-effort metadata.

… generation

This fixes the app spec generation failing for non-Claude models (Cursor, Gemini,
OpenCode, Copilot) that don't support structured output capabilities.

Changes:
- Add `supportsStructuredOutput()` utility function in @automaker/types to
  centralize model capability detection
- Update generate-features-from-spec.ts:
  - Add explicit JSON instructions for non-Claude/Codex models
  - Define featuresOutputSchema for structured output
  - Pre-extract JSON from text responses using extractJsonWithArray
  - Handle both structured_output and text responses properly
- Update generate-spec.ts:
  - Replace isCursorModel with supportsStructuredOutput for consistency
- Update sync-spec.ts:
  - Add techStackOutputSchema for structured output
  - Add JSON extraction fallback for text responses
  - Handle both structured_output and text parsing
- Update validate-issue.ts:
  - Use supportsStructuredOutput for cleaner capability detection

The fix follows the same pattern used in generate-spec.ts where non-Claude models
receive explicit JSON formatting instructions in the prompt and responses are
parsed using extractJson utilities.

Fixes #669

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

coderabbitai bot commented Jan 24, 2026

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Replaces per-model checks with a generic supportsStructuredOutput(model) predicate and implements a structured-output-first flow with a robust JSON-extraction fallback across feature generation, spec generation, tech-stack sync, and issue validation; threads reasoningEffort and Codex-specific timeout behavior into model resolution and providers.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Model capability detection (libs/types/src/provider-utils.ts, libs/types/src/index.ts) | Add and re-export supportsStructuredOutput(model); note: provider-utils.ts contains duplicate implementations that should be deduped. |
| Feature generation flow (apps/server/src/routes/app-spec/generate-features-from-spec.ts) | Signature extended to accept abortController, maxFeatures, settingsService; use supportsStructuredOutput/isCodexModel; add featuresOutputSchema + FeaturesExtractionResult; prefer structured_output when present and fallback to extractJsonWithArray with explicit JSON-instruction fallback for non-structured models; add logging/telemetry and Codex timeout handling. |
| Spec generation entry (apps/server/src/routes/app-spec/generate-spec.ts) | Replace prior model-type checks with supportsStructuredOutput(model) and append CRITICAL JSON-only instructions when structured output not supported; log structured-mode. |
| Tech-stack analysis & sync (apps/server/src/routes/app-spec/sync-spec.ts) | Add TechStackExtractionResult and techStackOutputSchema; prefer structured_output with outputFormat schema, fallback to extractJson; update parsing, diff calculation, and logging. |
| GitHub issue validation (apps/server/src/routes/github/routes/validate-issue.ts) | Replace hardcoded Claude/Codex checks with supportsStructuredOutput(model) to decide structured system prompt vs JSON-instruction fallback. |
| Model resolver (libs/model-resolver/src/resolver.ts) | Add reasoningEffort?: ReasoningEffort to ResolvedPhaseModel, propagate through resolvePhaseModel, and include in logs. |
| Codex provider timeouts (apps/server/src/providers/codex-provider.ts, apps/server/tests/unit/providers/codex-provider.test.ts) | Add CODEX_FEATURE_GENERATION_BASE_TIMEOUT_MS and use extended base timeout for high reasoningEffort Codex runs; update tests to assert on new base. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Server
    participant Settings
    participant Model as LLM Provider
    participant Extractor as JSON Extractor

    Client->>Server: request spec/features generation
    Server->>Settings: resolve model & reasoningEffort
    Server->>Model: streamingQuery(prompt, outputFormat? schema)
    alt Model returns structured_output
        Model-->>Server: structured_output (JSON)
        Server->>Server: validate with schema -> contentForParsing
    else Model returns text
        Model-->>Server: text response
        Server->>Extractor: extractJsonWithArray(text)
        alt extraction succeeds
            Extractor-->>Server: parsed JSON -> contentForParsing
        else extraction fails
            Extractor-->>Server: error
            Server->>Server: emit error event & throw
        end
    end
    Server->>Server: parse features from contentForParsing & proceed
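
The branching in the diagram can be sketched roughly as follows. This is a hedged illustration, not the code in generate-features-from-spec.ts: `ModelResult`, `FeaturesResult`, `isFeaturesResult`, and `resolveFeatures` are hypothetical names, the shape check stands in for real schema validation, and the extractor and error emitter are injected so the sketch stays self-contained.

```typescript
// Hypothetical shapes for illustration; the PR's real types may differ.
interface ModelResult {
  structured_output?: unknown;
  text?: string;
}

interface FeaturesResult {
  features: unknown[];
}

// Minimal shape check standing in for real schema validation.
function isFeaturesResult(value: unknown): value is FeaturesResult {
  if (typeof value !== "object" || value === null) return false;
  return Array.isArray((value as { features?: unknown }).features);
}

function resolveFeatures(
  result: ModelResult,
  extract: (text: string) => unknown,
  emitError: (message: string) => void
): FeaturesResult {
  // 1. Prefer structured output when the provider returned it.
  if (isFeaturesResult(result.structured_output)) return result.structured_output;

  // 2. Otherwise try to pull JSON out of the plain-text response.
  const extracted = result.text ? extract(result.text) : null;
  if (isFeaturesResult(extracted)) return extracted;

  // 3. Fail fast: emit the error event, then throw, with no second parse attempt.
  const message = "No valid JSON found in response";
  emitError(message);
  throw new Error(message);
}
```

Failing immediately in step 3 mirrors the diagram's "emit error event & throw" branch.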
Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

Enhancement

Poem

🐰 I nibble at prompts and tidy the nest,

Claude or Codex hum, I do the rest.
If JSON hides, I'll dig with care,
Pull out the schema, tidy and share.
Hop—specs renewed, let's code and pair! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 71.43% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the primary objective: adding a structured output fallback for non-Claude models in app spec generation, which directly addresses the core issue of spec generation failures. |
| Linked Issues check | ✅ Passed | The PR fulfills all coding requirements from issue #669: introduces supportsStructuredOutput for centralized model capability detection, implements structured output with JSON extraction fallback in generate-features-from-spec.ts and sync-spec.ts, replaces isCursorModel with supportsStructuredOutput for consistency, and handles both structured and plain-text parsing paths. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to issue #669: new utility function, conditional structured output logic in affected files, timeout adjustments for feature generation, and schema definitions. One minor concern: the duplicate supportsStructuredOutput export in provider-utils.ts appears unintentional but is related to the feature implementation. |

@gemini-code-assist (Contributor)

Summary of Changes

Hello @Shironex, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical issue where app specification generation failed for AI models that do not natively support structured JSON output. By introducing a centralized mechanism to detect model capabilities and implementing intelligent fallbacks—including explicit JSON instructions in prompts and post-processing text responses to extract JSON—the system can now reliably generate app specs and analyze tech stacks across a wider range of AI providers, ensuring a more resilient and compatible generation pipeline.

Highlights

  • Centralized Structured Output Detection: Introduced a new utility function, supportsStructuredOutput(), in libs/types/src/provider-utils.ts to consistently determine if a given AI model supports native structured JSON output (currently Claude and Codex models).
  • Enhanced Feature Generation (generate-features-from-spec.ts): Implemented a robust strategy for feature generation: models supporting structured output now use a defined JSON schema, while non-supporting models receive explicit JSON formatting instructions in the prompt and their text responses are processed with a new extractJsonWithArray fallback to extract the JSON.
  • Improved Spec Synchronization (sync-spec.ts): Updated the spec synchronization process to handle tech stack analysis similarly: structured output is used for compatible models via techStackOutputSchema, and an extractJson fallback is applied to text responses from other models, along with explicit JSON instructions in the prompt.
  • Consistency Across Spec Generation: Refactored generate-spec.ts and validate-issue.ts to leverage the new supportsStructuredOutput() utility, replacing previous model-specific checks (e.g., isCursorModel()) for a more unified and maintainable approach to determining structured output capabilities.
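
To make the text-extraction fallback concrete, below is a rough sketch of how a JSON object can be pulled out of a chatty model reply. It is not the project's extractJson or extractJsonWithArray implementation, whose behavior is not shown in this thread; `extractFirstJsonObject` and `extractObjectWithArray` are hypothetical helpers that scan for a balanced `{ ... }` span (ignoring braces inside string literals) and then require a named array property.

```typescript
// Hypothetical helpers; the project's real extractJson utilities may work differently.

// Returns the first balanced {...} span that parses as JSON, or null.
function extractFirstJsonObject(text: string): unknown {
  for (let start = text.indexOf("{"); start !== -1; start = text.indexOf("{", start + 1)) {
    let depth = 0;
    let inString = false;
    let escaped = false;
    for (let i = start; i < text.length; i++) {
      const ch = text[i];
      if (inString) {
        // Braces inside string literals must not affect nesting depth.
        if (escaped) escaped = false;
        else if (ch === "\\") escaped = true;
        else if (ch === '"') inString = false;
        continue;
      }
      if (ch === '"') inString = true;
      else if (ch === "{") depth++;
      else if (ch === "}") {
        depth--;
        if (depth === 0) {
          try {
            return JSON.parse(text.slice(start, i + 1));
          } catch {
            break; // Balanced but not valid JSON; try the next "{".
          }
        }
      }
    }
  }
  return null;
}

// Accepts the object only if it carries the expected array (e.g. "features").
function extractObjectWithArray(text: string, arrayKey: string): Record<string, unknown> | null {
  const value = extractFirstJsonObject(text);
  if (typeof value !== "object" || value === null) return null;
  const record = value as Record<string, unknown>;
  return Array.isArray(record[arrayKey]) ? record : null;
}
```

Requiring the array key guards against a model that returns some unrelated JSON object before, or instead of, the expected payload.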


@Shironex Shironex self-assigned this Jan 24, 2026
@Shironex added the Bug, Testers-Requested, and Do Not Merge labels Jan 24, 2026

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a robust fallback mechanism for app spec generation on AI models that do not support structured output. The changes are well-executed, introducing a supportsStructuredOutput() utility to centralize model capability detection and applying it consistently across generate-features-from-spec.ts, sync-spec.ts, generate-spec.ts, and validate-issue.ts.

The addition of explicit JSON instructions for non-Claude models and the use of a JSON extraction fallback (extractJson and extractJsonWithArray) significantly improve the reliability of spec generation with a wider range of models. The code is clean, well-documented, and the refactoring improves maintainability.

I have one suggestion in generate-features-from-spec.ts to avoid a redundant parsing attempt on failure, which would make the error handling slightly more efficient and explicit. Overall, this is an excellent contribution that addresses a key functionality gap.

@Shironex added the Work-In-Progress label Jan 24, 2026
- Throw error immediately when JSON extraction fails in
  generate-features-from-spec.ts to avoid redundant parsing attempt
  (feedback from Gemini Code Assist review)
- Emit spec_regeneration_error event before throwing for consistency
- Fix TypeScript cast in sync-spec.ts by using double cast through unknown

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@Shironex removed the Work-In-Progress label Jan 24, 2026

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@libs/types/src/provider-utils.ts`:
- Around line 348-374: supportsStructuredOutput currently returns true for any
model where isClaudeModel(model) is true, which incorrectly enables structured
output for Claude-branded Copilot/OpenCode IDs (e.g.,
"copilot-claude-3.5-sonnet"); update supportsStructuredOutput(model) to also
exclude copilot/OpenCode models by checking existing helpers (e.g.,
isCopilotModel(model) and isOpenCodeModel(model)) or by explicitly matching
prefixes (e.g., model startsWith 'copilot-' or 'opencode-') and only return true
when isClaudeModel(model) is true AND those exclusion checks are false (i.e.,
return isClaudeModel(model) && !isCopilotModel(model) &&
!isOpenCodeModel(model)); keep isCodexModel logic as-is.
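
A minimal sketch of the suggested predicate follows. The four `is*Model` helpers are hypothetical stand-ins written only so the example runs on its own; the real helpers in provider-utils.ts may match model IDs differently, and apart from "copilot-claude-3.5-sonnet" (quoted in the review above) any model IDs you try against it are made up.

```typescript
// Hypothetical stand-ins for the helpers in provider-utils.ts (assumed matching rules).
const isClaudeModel = (model: string): boolean => model.toLowerCase().includes("claude");
const isCodexModel = (model: string): boolean => model.toLowerCase().startsWith("codex-");
const isCopilotModel = (model: string): boolean => model.toLowerCase().startsWith("copilot-");
const isOpenCodeModel = (model: string): boolean => model.toLowerCase().startsWith("opencode-");

// Sketch of the reviewer's suggestion: Claude-branded IDs routed through
// Copilot or OpenCode are excluded; the Codex check is left as-is.
function supportsStructuredOutput(model: string): boolean {
  const nativeClaude =
    isClaudeModel(model) && !isCopilotModel(model) && !isOpenCodeModel(model);
  return nativeClaude || isCodexModel(model);
}
```

The point of the exclusion is that capability follows the provider serving the request, not the brand name embedded in the model ID.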

- Introduced a dedicated 5-minute timeout for Codex models during feature generation to accommodate slower response times when generating 50+ features.
- Updated the CodexProvider to utilize this extended timeout based on the reasoning effort level.
- Enhanced the feature generation logic in generate-features-from-spec.ts to detect Codex models and apply the appropriate timeout.
- Modified the model resolver to include reasoning effort in the resolved phase model structure.

This change improves the reliability of feature generation for Codex models, ensuring they have sufficient time to process requests effectively.
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
apps/server/src/providers/codex-provider.ts (1)

831-842: Gate the extended timeout on models that actually support reasoning effort.

Right now, any caller passing reasoningEffort: 'xhigh' gets the 5‑minute base timeout even if the model doesn’t support reasoning effort (you already gate CLI overrides with supportsReasoningEffort). This can unnecessarily delay failure detection for unsupported models. Consider normalizing the effective reasoning effort before applying the extended timeout.

🛠️ Suggested fix
-      const baseTimeout =
-        options.reasoningEffort === 'xhigh'
-          ? CODEX_FEATURE_GENERATION_BASE_TIMEOUT_MS
-          : CODEX_CLI_TIMEOUT_MS;
-      const timeout = calculateReasoningTimeout(options.reasoningEffort, baseTimeout);
+      const effectiveReasoningEffort = supportsReasoningEffort(options.model)
+        ? options.reasoningEffort
+        : undefined;
+      const baseTimeout =
+        effectiveReasoningEffort === 'xhigh'
+          ? CODEX_FEATURE_GENERATION_BASE_TIMEOUT_MS
+          : CODEX_CLI_TIMEOUT_MS;
+      const timeout = calculateReasoningTimeout(effectiveReasoningEffort, baseTimeout);

Based on learnings, keep capability checks strictly per-model.
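
The effect of the suggested gating can be sketched in isolation. Only the 5-minute base comes from this PR's description; the `ReasoningEffort` members other than 'xhigh', the 30-second `CODEX_CLI_TIMEOUT_MS` value, the multipliers inside `calculateReasoningTimeout`, and the `resolveCodexTimeout` wrapper are all assumptions made so the example is runnable.

```typescript
// Assumed union; only 'xhigh' appears in the review comment above.
type ReasoningEffort = "low" | "medium" | "high" | "xhigh";

const CODEX_CLI_TIMEOUT_MS = 30_000; // assumed value, for illustration only
const CODEX_FEATURE_GENERATION_BASE_TIMEOUT_MS = 5 * 60 * 1000; // 5 minutes, per the PR description

// Hypothetical multipliers; the real calculateReasoningTimeout may scale differently.
function calculateReasoningTimeout(effort: ReasoningEffort | undefined, baseMs: number): number {
  const multipliers: Record<ReasoningEffort, number> = { low: 1, medium: 1.5, high: 2, xhigh: 4 };
  return Math.round(baseMs * (effort ? multipliers[effort] : 1));
}

function resolveCodexTimeout(
  requestedEffort: ReasoningEffort | undefined,
  modelSupportsReasoningEffort: boolean
): number {
  // Ignore the requested effort when the model cannot act on it, so an
  // unsupported model never inherits the extended base timeout.
  const effectiveEffort = modelSupportsReasoningEffort ? requestedEffort : undefined;
  const baseTimeout =
    effectiveEffort === "xhigh"
      ? CODEX_FEATURE_GENERATION_BASE_TIMEOUT_MS
      : CODEX_CLI_TIMEOUT_MS;
  return calculateReasoningTimeout(effectiveEffort, baseTimeout);
}
```

With these assumed numbers, an 'xhigh' request against a model without reasoning-effort support falls back to the plain CLI timeout instead of waiting on the extended one.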

🧹 Nitpick comments (1)
apps/server/src/routes/app-spec/generate-features-from-spec.ts (1)

29-33: Remove unused constant CODEX_FEATURE_GENERATION_TIMEOUT_MS.

This constant is defined but never referenced anywhere in the codebase. The timeout is controlled via reasoningEffort: 'xhigh' passed to streamingQuery, with the actual 5-minute timeout managed by the Codex provider implementation. Remove this dead code.

@Shironex added the Tests and Ready-To-Merge labels and removed the Testers-Requested and Do Not Merge labels Jan 24, 2026
@Shironex Shironex merged commit d12e070 into v0.14.0rc Jan 24, 2026
6 checks passed
@Shironex Shironex deleted the feature/bug-fix-app-spec-generation-for-non-claude-models-dgq0 branch January 24, 2026 19:57
Labels

Bug, Ready-To-Merge, Tests

2 participants