feat:Add language_detection_options to TranscriptOptionalParams #119

HavenDV · 2025-08-26T21:16:57Z

Summary by CodeRabbit

New Features
- Added optional Language Detection options for transcription when language detection is enabled. You can now:
  - Specify expected languages as a list of language codes.
  - Set a fallback language used when the detected language isn’t in the expected list.
- The fallback defaults to “auto”, which lets the model pick the highest-confidence language from the expected list.
- Backward compatible: no other behaviors or endpoints changed.

coderabbitai · 2025-08-26T21:17:05Z

Walkthrough

Added a new optional object field language_detection_options to TranscriptOptionalParams in src/libs/AssemblyAI/openapi.yaml, introducing expected_languages (array of language codes) and fallback_language (string with default "auto") to configure behavior when language_detection is enabled. No endpoints or other behaviors were changed.
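
For illustration, a request body that uses the new field might be assembled as in the sketch below. The names `language_detection`, `language_detection_options`, `expected_languages` and `fallback_language` come from the schema change described above; `audio_url`, the example URL, the language codes and the helper `build_transcript_params` are assumptions made for this example and are not part of the generated SDK.

```python
import json


def build_transcript_params(audio_url, expected_languages, fallback_language="auto"):
    """Build a transcript request body (hypothetical helper, not an SDK function)."""
    return {
        "audio_url": audio_url,  # assumed request field, placeholder value below
        "language_detection": True,
        "language_detection_options": {
            "expected_languages": list(expected_languages),
            # A plain string, matching `type: string` in the schema.
            "fallback_language": fallback_language,
        },
    }


params = build_transcript_params("https://example.com/audio.mp3", ["en", "es"])
print(json.dumps(params, indent=2))
```

Leaving `fallback_language` out of the call keeps the schema default of "auto".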

Changes

| Cohort / File(s) | Summary |
|---|---|
| OpenAPI schema: transcript options `src/libs/AssemblyAI/openapi.yaml` | Added TranscriptOptionalParams.language_detection_options object with properties: expected_languages (string array) and fallback_language (string, default "auto"). No other schema elements modified. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Client
  participant API as AssemblyAI API
  participant Engine as Transcription Engine

  Note over Client,API: Create transcript request
  Client->>API: POST /transcripts { language_detection, language_detection_options }
  alt language_detection enabled
    API->>Engine: Start job with options.expected_languages, options.fallback_language
    Engine-->>API: Job accepted
    API-->>Client: 201 Created + job_id
    loop Processing
      Engine->>Engine: Detect language (use expected_languages)
      opt Detected not in expected_languages
        Engine->>Engine: Apply fallback_language ("auto" allowed)
      end
      Engine->>API: Update transcript status/results
    end
    API-->>Client: GET /transcripts/{id} results
  else language_detection disabled
    API->>Engine: Start job without language detection options
  end

  Note over Client,API: Error paths follow existing API error handling
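
The fallback step in the diagram can be sketched roughly as follows. This is only an illustration of the behavior the field descriptions suggest, not the engine's actual implementation: the function `resolve_language`, the confidence dictionary and the handling of an empty `expected_languages` list are all assumptions.

```python
def resolve_language(detected, confidences, expected_languages, fallback_language="auto"):
    """Return the language to transcribe in (illustrative logic only).

    detected: language code chosen by detection
    confidences: dict mapping language code -> confidence score
    """
    # Assumption: with no expected list, the detected language is used as-is.
    if not expected_languages or detected in expected_languages:
        return detected
    if fallback_language == "auto":
        # Pick the expected language with the highest confidence score.
        return max(expected_languages, key=lambda code: confidences.get(code, 0.0))
    return fallback_language


scores = {"fr": 0.55, "en": 0.30, "es": 0.15}
print(resolve_language("fr", scores, ["en", "es"]))        # en
print(resolve_language("fr", scores, ["en", "es"], "es"))  # es
print(resolve_language("en", scores, ["en", "es"]))        # en
```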

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

I twitch my ears at YAML lines,
New fields hop in between the signs—
Expected tongues, a fallback too,
To guide the words the mics once knew.
With "auto" breezes in the air,
I thump: transcripts, handle with care! 🐇✨


coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

src/libs/AssemblyAI/openapi.yaml (1)

1256-1261: Optional: strengthen types, align with existing patterns, and polish labels.

Align with TranscriptLanguageCode by allowing either the enum or any string (as done elsewhere). This keeps SDKs flexible while guiding users.
Enforce uniqueItems and minItems for expected_languages.
Tweak the container label/description to match your style used for redact_pii_audio_options, etc.
Expose TS/Go hints for fallback_language for better SDK ergonomics.

Apply this diff:

-        language_detection_options:
-          x-label: Specify options for Automatic Language Detection.
-          description: Specify options for Automatic Language Detection.
+        language_detection_options:
+          x-label: Language detection options
+          description: Options for Automatic Language Detection. Only used when `language_detection` is true.
           type: object
           additionalProperties: false
           properties:
             expected_languages:
-              x-label: Minimum speakers expected
-              description: List of languages expected in the audio file.
+              x-label: Expected languages
+              description: List of language codes expected in the audio file.
               type: array
-              objects:
-                x-label: language
-                type: string
+              minItems: 1
+              uniqueItems: true
+              items:
+                anyOf:
+                  - $ref: "#/components/schemas/TranscriptLanguageCode"
+                  - type: string
             fallback_language:
               x-label: Fallback language
               description: |
-                If the detected language of the audio file is not in the list of expected languages, the `fallback_language` is used. Specify `["auto"]` to let our model choose the fallback language from `expected_languages` with the highest confidence score.
-              type: string
+                If the detected language of the audio file is not in the list of `expected_languages`, use this fallback language.
+                Set to "auto" to let our model choose the fallback from `expected_languages` with the highest confidence score.
+              type: string
+              x-ts-type: LiteralUnion<TranscriptLanguageCode, string>
+              x-go-type: TranscriptLanguageCode
               default: "auto"

Also applies to: 1262-1275
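
To make the proposed constraints concrete, the sketch below checks an options object against them in plain Python. It mirrors the rules in the diff above (`additionalProperties: false`, `minItems: 1`, `uniqueItems: true`, string-typed `fallback_language`); note that `minItems` and `uniqueItems` are only suggested in this nitpick and are not in the submitted spec. The function name `validate_language_detection_options` is hypothetical.

```python
def validate_language_detection_options(options):
    """Return a list of problems; an empty list means the object passed these checks."""
    problems = []

    # additionalProperties: false
    unknown = set(options) - {"expected_languages", "fallback_language"}
    if unknown:
        problems.append(f"unexpected properties: {sorted(unknown)}")

    langs = options.get("expected_languages")
    if langs is not None:
        if not isinstance(langs, list) or not all(isinstance(x, str) for x in langs):
            problems.append("expected_languages must be an array of strings")
        else:
            if len(langs) < 1:  # minItems: 1
                problems.append("expected_languages needs at least one item")
            if len(set(langs)) != len(langs):  # uniqueItems: true
                problems.append("expected_languages must not contain duplicates")

    # type: string, default "auto"
    fallback = options.get("fallback_language", "auto")
    if not isinstance(fallback, str):
        problems.append("fallback_language must be a string")
    return problems


print(validate_language_detection_options({"expected_languages": ["en", "es"]}))
print(validate_language_detection_options({"expected_languages": ["en", "en"],
                                           "fallback_language": ["auto"]}))
```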

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between c531b33 and d047f7a.

⛔ Files ignored due to path filters (4)

src/libs/AssemblyAI/Generated/AssemblyAI.JsonSerializerContextTypes.g.cs is excluded by !**/generated/**
src/libs/AssemblyAI/Generated/AssemblyAI.Models.TranscriptOptionalParams.g.cs is excluded by !**/generated/**
src/libs/AssemblyAI/Generated/AssemblyAI.Models.TranscriptOptionalParamsLanguageDetectionOptions.Json.g.cs is excluded by !**/generated/**
src/libs/AssemblyAI/Generated/AssemblyAI.Models.TranscriptOptionalParamsLanguageDetectionOptions.g.cs is excluded by !**/generated/**

📒 Files selected for processing (1)

src/libs/AssemblyAI/openapi.yaml (1 hunks)

coderabbitai · 2025-08-26T21:22:52Z

src/libs/AssemblyAI/openapi.yaml

+        language_detection_options:
+          x-label: Specify options for Automatic Language Detection.
+          description: Specify options for Automatic Language Detection.
+          type: object
+          additionalProperties: false
+          properties:
+            expected_languages:
+              x-label: Minimum speakers expected
+              description: List of languages expected in the audio file.
+              type: array
+              objects:
+                x-label: language
+                type: string
+            fallback_language:
+              x-label: Fallback language
+              description: |
+                If the detected language of the audio file is not in the list of expected languages, the `fallback_language` is used. Specify `["auto"]` to let our model choose the fallback language from `expected_languages` with the highest confidence score.
+              type: string
+              default: "auto"
+


💡 Verification agent

🧩 Analysis chain

Quick validation checks for this spec block.

Run this script to ensure we’ve eliminated the invalid "objects" key, corrected the "auto" guidance, and that the YAML still parses.

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Fail if any invalid 'objects:' key remains
if rg -n "^\s*objects\s*:" -C2 src/libs/AssemblyAI/openapi.yaml; then
  echo "ERROR: Found invalid 'objects:' key under an array schema."
  exit 1
else
  echo "OK: No 'objects:' keys found."
fi

# 2) Fail if docs still suggest [\"auto\"] (array) for a string field
if rg -n '\["auto"\]' src/libs/AssemblyAI/openapi.yaml; then
  echo 'ERROR: Found ["auto"] in fallback_language description.'
  exit 1
else
  echo 'OK: No ["auto"] mention found.'
fi

# 3) Basic YAML parse sanity check
python - <<'PY'
import yaml, sys
with open("src/libs/AssemblyAI/openapi.yaml", "r") as f:
    yaml.safe_load(f)
print("OK: YAML parsed successfully.")
PY

Length of output: 434

AssemblyAI OpenAPI Spec: Remove Invalid objects Key

The validation script still detects an objects: key under the expected_languages array schema on line 1266. This must be replaced with the correct OpenAPI keyword (items:) before the spec can merge cleanly.

Please update src/libs/AssemblyAI/openapi.yaml as follows:

• File: src/libs/AssemblyAI/openapi.yaml
• Line ~1266: Change
-              objects:
+              items:
• Ensure the indented block beneath (x-label: language, type: string) remains unchanged under items:.

After making that change, re-run the validation script to confirm:

#!/bin/bash
set -euo pipefail

# Each check uses if/exit: an `exit 1` inside a `( ... )` subshell would not
# stop the script, and a trailing `|| echo OK` would still print after an error.

# Check for any remaining invalid 'objects:' occurrences
if rg -n "^\s*objects\s*:" -C2 src/libs/AssemblyAI/openapi.yaml; then
  echo "ERROR: 'objects:' key still present"
  exit 1
fi
echo "OK: No 'objects:' keys found."

# Confirm no array-style ["auto"] guidance remains
if rg -n '\["auto"\]' src/libs/AssemblyAI/openapi.yaml; then
  echo "ERROR: Found [\"auto\"] in description"
  exit 1
fi
echo "OK: No [\"auto\"] mention found."

# Verify YAML parses
python - <<'PY'
import yaml
with open("src/libs/AssemblyAI/openapi.yaml") as f:
    yaml.safe_load(f)
print("OK: YAML parsed successfully.")
PY

Once the objects: key is removed and replaced with items:, the script should pass without errors.

🤖 Prompt for AI Agents

In src/libs/AssemblyAI/openapi.yaml around lines 1256 to 1275 (approximately line 1266), replace the invalid OpenAPI key "objects:" under the expected_languages array with the correct keyword "items:" and keep the indented block beneath it (x-label: language and type: string) exactly as-is under the new items: key; then save and re-run the provided validation script to confirm there are no remaining "objects:" occurrences and YAML parses cleanly.

coderabbitai · 2025-08-26T21:22:52Z

src/libs/AssemblyAI/openapi.yaml

+            expected_languages:
+              x-label: Minimum speakers expected
+              description: List of languages expected in the audio file.
+              type: array
+              objects:
+                x-label: language
+                type: string


⚠️ Potential issue

Fix invalid OpenAPI array schema: replace "objects" with "items" and correct the field label.

OpenAPI uses "items" (not "objects") to define array element schemas. Using "objects" will fail validation and break client generation.

The x-label "Minimum speakers expected" is clearly a copy/paste error for an array of languages.

Apply this diff:

             expected_languages:
-              x-label: Minimum speakers expected
+              x-label: Expected languages
               description: List of languages expected in the audio file.
               type: array
-              objects:
-                x-label: language
-                type: string
+              items:
+                type: string

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-            expected_languages:
-              x-label: Minimum speakers expected
-              description: List of languages expected in the audio file.
-              type: array
-              objects:
-                x-label: language
-                type: string
+            expected_languages:
+              x-label: Expected languages
+              description: List of languages expected in the audio file.
+              type: array
+              items:
+                type: string

🤖 Prompt for AI Agents

In src/libs/AssemblyAI/openapi.yaml around lines 1262-1268 the array schema is invalid and mislabelled: replace the incorrect "objects" key with the OpenAPI "items" key to define the element schema, and correct the x-label values so the array-level label reflects expected languages (e.g., "Expected languages") and the item-level x-label reflects an individual language (e.g., "language"); keep the item type as string.
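
The `objects` versus `items` problem flagged in this comment can also be caught structurally rather than by text search. The sketch below walks an already-parsed schema (a plain Python dict, for example the result of loading the YAML) and reports array schemas that carry an `objects` key or lack `items`, which OpenAPI 3.0 requires for arrays. The function name `find_bad_array_schemas` and the inline fragment are made up for this example.

```python
def find_bad_array_schemas(schema, path="#"):
    """Yield (path, message) for array schemas that lack `items` or carry an `objects` key."""
    if isinstance(schema, dict):
        if schema.get("type") == "array":
            if "objects" in schema:
                yield path, "'objects' is not an OpenAPI keyword; use 'items'"
            if "items" not in schema:
                yield path, "array schema has no 'items'"
        for key, value in schema.items():
            yield from find_bad_array_schemas(value, f"{path}/{key}")
    elif isinstance(schema, list):
        for index, value in enumerate(schema):
            yield from find_bad_array_schemas(value, f"{path}/{index}")


# Fragment shaped like the submitted schema, written inline for the example.
spec_fragment = {
    "language_detection_options": {
        "type": "object",
        "properties": {
            "expected_languages": {
                "type": "array",
                "objects": {"x-label": "language", "type": "string"},
            },
        },
    },
}

for location, message in find_bad_array_schemas(spec_fragment):
    print(location, "->", message)
```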

coderabbitai · 2025-08-26T21:22:52Z

src/libs/AssemblyAI/openapi.yaml

+              x-label: Fallback language
+              description: |
+                If the detected language of the audio file is not in the list of expected languages, the `fallback_language` is used. Specify `["auto"]` to let our model choose the fallback language from `expected_languages` with the highest confidence score.
+              type: string
+              default: "auto"


⚠️ Potential issue

Documentation/type mismatch: fallback_language description suggests an array, but the type is string.

The text says to specify ["auto"], but the schema defines a string. This will confuse users and SDKs.

Apply this diff:

             fallback_language:
               x-label: Fallback language
               description: |
-                If the detected language of the audio file is not in the list of expected languages, the `fallback_language` is used. Specify `["auto"]` to let our model choose the fallback language from `expected_languages` with the highest confidence score.
+                If the detected language of the audio file is not in the list of expected languages, the `fallback_language` is used. Set to "auto" to let our model choose the fallback language from `expected_languages` with the highest confidence score.
               type: string
               default: "auto"

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-              x-label: Fallback language
-              description: |
-                If the detected language of the audio file is not in the list of expected languages, the `fallback_language` is used. Specify `["auto"]` to let our model choose the fallback language from `expected_languages` with the highest confidence score.
-              type: string
-              default: "auto"
+            fallback_language:
+              x-label: Fallback language
+              description: |
+                If the detected language of the audio file is not in the list of expected languages, the `fallback_language` is used. Set to "auto" to let our model choose the fallback language from `expected_languages` with the highest confidence score.
+              type: string
+              default: "auto"

🤖 Prompt for AI Agents

In src/libs/AssemblyAI/openapi.yaml around lines 1270 to 1274, the description for fallback_language refers to specifying ["auto"] (an array) but the schema sets type: string; update the description to say the value should be set to the string "auto" so that it matches the schema, leave type: string and default: "auto" unchanged, and change any examples that pass ["auto"] to use the plain string instead.
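
A mismatch of this kind between a declared type and the value the documentation suggests can be checked mechanically. The sketch below tests whether a schema's `default` fits its declared `type`; it assumes `type` is a single string (as in OpenAPI 3.0), and the helper name `default_matches_type` is invented for the example.

```python
JSON_TYPES = {
    "string": str,
    "array": list,
    "object": dict,
    "boolean": bool,
    "integer": int,
    "number": (int, float),
}


def default_matches_type(schema):
    """Return True when the schema has no default or its default fits the declared type."""
    declared = schema.get("type")
    if "default" not in schema or not isinstance(declared, str):
        return True
    default = schema["default"]
    # bool is a subclass of int in Python, so reject it for numeric types explicitly.
    if declared in ("integer", "number") and isinstance(default, bool):
        return False
    # Unknown type names are not checked (every value is an instance of object).
    return isinstance(default, JSON_TYPES.get(declared, object))


print(default_matches_type({"type": "string", "default": "auto"}))    # True
print(default_matches_type({"type": "string", "default": ["auto"]}))  # False
```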

feat: Updated OpenAPI spec

d047f7a

github-actions bot approved these changes Aug 26, 2025

HavenDV enabled auto-merge (squash) August 26, 2025 21:17

coderabbitai bot changed the title ~~feat:@coderabbitai~~ feat:Add language_detection_options to TranscriptOptionalParams Aug 26, 2025

coderabbitai bot reviewed Aug 26, 2025

HavenDV closed this Aug 27, 2025

auto-merge was automatically disabled August 27, 2025 11:19
Pull request was closed
