@HavenDV HavenDV commented Aug 26, 2025

Summary by CodeRabbit

  • New Features
    • Added optional Language Detection options for transcription when language detection is enabled. You can now:
      • Specify expected languages as a list of language codes.
      • Set a fallback language used when the detected language isn’t in the expected list.
    • The fallback defaults to “auto”, which lets the model pick the expected language with the highest confidence score.
    • Backward compatible: no other behaviors or endpoints changed.

coderabbitai bot commented Aug 26, 2025

Walkthrough

Added a new optional object field language_detection_options to TranscriptOptionalParams in src/libs/AssemblyAI/openapi.yaml, introducing expected_languages (array of language codes) and fallback_language (string with default "auto") to configure behavior when language_detection is enabled. No endpoints or other behaviors were changed.
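
As a rough illustration of the new field, the sketch below builds a request body that sets language_detection_options alongside language_detection. Only the three field names from the schema change come from this PR; the helper name build_transcript_request, the audio_url field, the example URL, and the language codes "en" and "es" are assumptions made for the example.

```python
import json


def build_transcript_request(audio_url, expected_languages=None, fallback_language="auto"):
    """Build a JSON-serialisable request body with language detection enabled.

    `audio_url` is an assumed field name; it is not part of this PR.
    """
    body = {"audio_url": audio_url, "language_detection": True}
    if expected_languages is not None:
        # The options object is optional; omit it entirely when unused.
        body["language_detection_options"] = {
            "expected_languages": list(expected_languages),
            "fallback_language": fallback_language,
        }
    return body


payload = build_transcript_request("https://example.com/audio.mp3", ["en", "es"])
print(json.dumps(payload, indent=2))
```

Leaving expected_languages unset keeps the request identical to what clients sent before this change, which matches the backward-compatibility note in the summary.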

Changes

Cohort / File(s) | Summary
OpenAPI schema: transcript options (src/libs/AssemblyAI/openapi.yaml) | Added TranscriptOptionalParams.language_detection_options object with properties: expected_languages (string array) and fallback_language (string, default "auto"). No other schema elements modified.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Client
  participant API as AssemblyAI API
  participant Engine as Transcription Engine

  Note over Client,API: Create transcript request
  Client->>API: POST /transcripts { language_detection, language_detection_options }
  alt language_detection enabled
    API->>Engine: Start job with options.expected_languages, options.fallback_language
    Engine-->>API: Job accepted
    API-->>Client: 201 Created + job_id
    loop Processing
      Engine->>Engine: Detect language (use expected_languages)
      opt Detected not in expected_languages
        Engine->>Engine: Apply fallback_language ("auto" allowed)
      end
      Engine->>API: Update transcript status/results
    end
    API-->>Client: GET /transcripts/{id} results
  else language_detection disabled
    API->>Engine: Start job without language detection options
  end

  Note over Client,API: Error paths follow existing API error handling
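
The fallback branch in the diagram can be sketched as a small function. This is only a reading of the behaviour described in the field descriptions, not the engine's actual implementation; the function name resolve_language and the per-language confidence dictionary are hypothetical.

```python
def resolve_language(confidences, expected_languages, fallback_language="auto"):
    """Pick the transcription language from per-language confidence scores.

    `confidences` maps language code -> score; this input shape is an assumption.
    """
    detected = max(confidences, key=confidences.get)
    # No expectations given, or the detected language is expected: use it as-is.
    if not expected_languages or detected in expected_languages:
        return detected
    # An explicit fallback wins over any scoring.
    if fallback_language != "auto":
        return fallback_language
    # "auto": choose the expected language with the highest confidence score.
    return max(expected_languages, key=lambda code: confidences.get(code, 0.0))


scores = {"fr": 0.6, "en": 0.3, "es": 0.1}
print(resolve_language(scores, ["en", "es"]))        # en
print(resolve_language(scores, ["en", "es"], "es"))  # es
```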

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

I twitch my ears at YAML lines,
New fields hop in between the signs—
Expected tongues, a fallback too,
To guide the words the mics once knew.
With "auto" breezes in the air,
I thump: transcripts, handle with care! 🐇✨


@HavenDV HavenDV enabled auto-merge (squash) August 26, 2025 21:17
@coderabbitai coderabbitai bot changed the title from "feat:@coderabbitai" to "feat:Add language_detection_options to TranscriptOptionalParams" Aug 26, 2025
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (1)
src/libs/AssemblyAI/openapi.yaml (1)

1256-1261: Optional: strengthen types, align with existing patterns, and polish labels.

  • Align with TranscriptLanguageCode by allowing either the enum or any string (as done elsewhere). This keeps SDKs flexible while guiding users.
  • Enforce uniqueItems and minItems for expected_languages.
  • Tweak the container label/description to match your style used for redact_pii_audio_options, etc.
  • Expose TS/Go hints for fallback_language for better SDK ergonomics.

Apply this diff:

-        language_detection_options:
-          x-label: Specify options for Automatic Language Detection.
-          description: Specify options for Automatic Language Detection.
+        language_detection_options:
+          x-label: Language detection options
+          description: Options for Automatic Language Detection. Only used when `language_detection` is true.
           type: object
           additionalProperties: false
           properties:
             expected_languages:
-              x-label: Minimum speakers expected
-              description: List of languages expected in the audio file.
+              x-label: Expected languages
+              description: List of language codes expected in the audio file.
               type: array
-              objects:
-                x-label: language
-                type: string
+              minItems: 1
+              uniqueItems: true
+              items:
+                anyOf:
+                  - $ref: "#/components/schemas/TranscriptLanguageCode"
+                  - type: string
             fallback_language:
               x-label: Fallback language
               description: |
-                If the detected language of the audio file is not in the list of expected languages, the `fallback_language` is used. Specify `["auto"]` to let our model choose the fallback language from `expected_languages` with the highest confidence score.
-              type: string
+                If the detected language of the audio file is not in the list of `expected_languages`, use this fallback language.
+                Set to "auto" to let our model choose the fallback from `expected_languages` with the highest confidence score.
+              type: string
+              x-ts-type: LiteralUnion<TranscriptLanguageCode, string>
+              x-go-type: TranscriptLanguageCode
               default: "auto"

Also applies to: 1262-1275
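
To make the effect of the proposed minItems and uniqueItems constraints concrete, here is a minimal client-side check that mirrors them. It is a sketch only: the function name check_expected_languages and the error wording are made up for illustration, and a real validator would be driven by the schema itself.

```python
def check_expected_languages(values):
    """Return a list of violations of `minItems: 1` and `uniqueItems: true`."""
    if not isinstance(values, list):
        return ["expected_languages must be an array"]
    if not all(isinstance(value, str) for value in values):
        return ["expected_languages items must be strings"]
    errors = []
    if len(values) < 1:
        errors.append("expected_languages must contain at least 1 item (minItems)")
    if len(set(values)) != len(values):
        errors.append("expected_languages must not contain duplicates (uniqueItems)")
    return errors


print(check_expected_languages(["en", "es"]))  # []
print(check_expected_languages(["en", "en"]))
```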

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between c531b33 and d047f7a.

⛔ Files ignored due to path filters (4)
  • src/libs/AssemblyAI/Generated/AssemblyAI.JsonSerializerContextTypes.g.cs is excluded by !**/generated/**
  • src/libs/AssemblyAI/Generated/AssemblyAI.Models.TranscriptOptionalParams.g.cs is excluded by !**/generated/**
  • src/libs/AssemblyAI/Generated/AssemblyAI.Models.TranscriptOptionalParamsLanguageDetectionOptions.Json.g.cs is excluded by !**/generated/**
  • src/libs/AssemblyAI/Generated/AssemblyAI.Models.TranscriptOptionalParamsLanguageDetectionOptions.g.cs is excluded by !**/generated/**
📒 Files selected for processing (1)
  • src/libs/AssemblyAI/openapi.yaml (1 hunks)

Comment on lines +1256 to +1275
        language_detection_options:
          x-label: Specify options for Automatic Language Detection.
          description: Specify options for Automatic Language Detection.
          type: object
          additionalProperties: false
          properties:
            expected_languages:
              x-label: Minimum speakers expected
              description: List of languages expected in the audio file.
              type: array
              objects:
                x-label: language
                type: string
            fallback_language:
              x-label: Fallback language
              description: |
                If the detected language of the audio file is not in the list of expected languages, the `fallback_language` is used. Specify `["auto"]` to let our model choose the fallback language from `expected_languages` with the highest confidence score.
              type: string
              default: "auto"

💡 Verification agent

🧩 Analysis chain

Quick validation checks for this spec block.

Run this script to ensure we’ve eliminated the invalid "objects" key, corrected the "auto" guidance, and that the YAML still parses.


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Fail if any invalid 'objects:' key remains
if rg -n "^\s*objects\s*:" -C2 src/libs/AssemblyAI/openapi.yaml; then
  echo "ERROR: Found invalid 'objects:' key under an array schema."
  exit 1
else
  echo "OK: No 'objects:' keys found."
fi

# 2) Fail if docs still suggest [\"auto\"] (array) for a string field
if rg -n '\["auto"\]' src/libs/AssemblyAI/openapi.yaml; then
  echo 'ERROR: Found ["auto"] in fallback_language description.'
  exit 1
else
  echo 'OK: No ["auto"] mention found.'
fi

# 3) Basic YAML parse sanity check
python - <<'PY'
import yaml, sys
with open("src/libs/AssemblyAI/openapi.yaml", "r") as f:
    yaml.safe_load(f)
print("OK: YAML parsed successfully.")
PY

Length of output: 434


AssemblyAI OpenAPI Spec: Remove Invalid objects Key

The validation script still detects an objects: key under the expected_languages array schema on line 1266. This must be replaced with the correct OpenAPI keyword (items:) before the spec can merge cleanly.

Please update src/libs/AssemblyAI/openapi.yaml as follows:

• File: src/libs/AssemblyAI/openapi.yaml
• Line ~1266: Change
    - objects:
    + items:
• Ensure the indented block beneath (x-label: language, type: string) remains unchanged under items:.

After making that change, re-run the validation script to confirm:

#!/bin/bash
set -euo pipefail

# Check for any remaining invalid 'objects:' occurrences.
# An if/else is used instead of `cmd && (...; exit 1) || ...`, because in that
# form the subshell's exit status would also trigger the "OK" branch.
if rg -n "^\s*objects\s*:" -C2 src/libs/AssemblyAI/openapi.yaml; then
  echo "ERROR: 'objects:' key still present"
  exit 1
else
  echo "OK: No 'objects:' keys found."
fi

# Confirm no array-style ["auto"] guidance remains
if rg -n '\["auto"\]' src/libs/AssemblyAI/openapi.yaml; then
  echo 'ERROR: Found ["auto"] in description'
  exit 1
else
  echo 'OK: No ["auto"] mention found.'
fi

# Verify YAML parses
python - <<'PY'
import yaml
yaml.safe_load(open("src/libs/AssemblyAI/openapi.yaml"))
print("OK: YAML parsed successfully.")
PY

Once the objects: key is replaced with items: and the ["auto"] wording in the fallback_language description is corrected, the script should pass without errors.

🤖 Prompt for AI Agents
In src/libs/AssemblyAI/openapi.yaml around lines 1256 to 1275 (approximately
line 1266), replace the invalid OpenAPI key "objects:" under the
expected_languages array with the correct keyword "items:" and keep the indented
block beneath it (x-label: language and type: string) exactly as-is under the
new items: key; then save and re-run the provided validation script to confirm
there are no remaining "objects:" occurrences and YAML parses cleanly.

Comment on lines +1262 to +1268
            expected_languages:
              x-label: Minimum speakers expected
              description: List of languages expected in the audio file.
              type: array
              objects:
                x-label: language
                type: string

⚠️ Potential issue

Fix invalid OpenAPI array schema: replace "objects" with "items" and correct the field label.

  • OpenAPI uses "items" (not "objects") to define array element schemas. Using "objects" will fail validation and break client generation.
  • The x-label "Minimum speakers expected" is clearly a copy/paste error for an array of languages.
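
One way to catch this class of mistake before client generation is to walk the parsed spec and flag array schemas that have no items key (OpenAPI 3.0 requires items whenever type is array). The sketch below works on a plain dictionary such as a YAML parser would return; the function name find_arrays_without_items and the trimmed sample schema are illustrative assumptions, not part of the repository.

```python
def find_arrays_without_items(node, path="#"):
    """Yield JSON-pointer-like paths of array schemas that lack an `items` key."""
    if isinstance(node, dict):
        if node.get("type") == "array" and "items" not in node:
            yield path
        for key, child in node.items():
            yield from find_arrays_without_items(child, f"{path}/{key}")
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from find_arrays_without_items(child, f"{path}/{index}")


# Trimmed copy of the schema under review, with the invalid `objects` key.
schema = {
    "language_detection_options": {
        "type": "object",
        "properties": {
            "expected_languages": {
                "type": "array",
                "objects": {"x-label": "language", "type": "string"},
            },
            "fallback_language": {"type": "string", "default": "auto"},
        },
    }
}

for broken in find_arrays_without_items(schema):
    print("array schema without items:", broken)
```

Unlike the text search for objects:, this check does not depend on how the wrong key happens to be spelled.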

Apply this diff:

             expected_languages:
-              x-label: Minimum speakers expected
+              x-label: Expected languages
               description: List of languages expected in the audio file.
               type: array
-              objects:
-                x-label: language
-                type: string
+              items:
+                type: string
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-            expected_languages:
-              x-label: Minimum speakers expected
-              description: List of languages expected in the audio file.
-              type: array
-              objects:
-                x-label: language
-                type: string
+            expected_languages:
+              x-label: Expected languages
+              description: List of languages expected in the audio file.
+              type: array
+              items:
+                type: string
🤖 Prompt for AI Agents
In src/libs/AssemblyAI/openapi.yaml around lines 1262-1268 the array schema is
invalid and mislabelled: replace the incorrect "objects" key with the OpenAPI
"items" key to define the element schema, and correct the x-label values so the
array-level label reflects expected languages (e.g., "Expected languages") and
the item-level x-label reflects an individual language (e.g., "language"); keep
the item type as string.

Comment on lines +1270 to +1274
              x-label: Fallback language
              description: |
                If the detected language of the audio file is not in the list of expected languages, the `fallback_language` is used. Specify `["auto"]` to let our model choose the fallback language from `expected_languages` with the highest confidence score.
              type: string
              default: "auto"

⚠️ Potential issue

Documentation/type mismatch: fallback_language description suggests an array, but the type is string.

The text says to specify ["auto"], but the schema defines a string. This will confuse users and SDKs.
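
The mismatch is easy to see with a check that mirrors type: string. This is a minimal sketch; the function name is_valid_fallback_language is hypothetical and it only models the type constraint, not the set of accepted language codes.

```python
def is_valid_fallback_language(value):
    """Mirror `type: string`: accept a plain string, reject anything else."""
    return isinstance(value, str)


print(is_valid_fallback_language("auto"))    # True
print(is_valid_fallback_language(["auto"]))  # False: an array, as the old description suggested
```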

Apply this diff:

             fallback_language:
               x-label: Fallback language
               description: |
-                If the detected language of the audio file is not in the list of expected languages, the `fallback_language` is used. Specify `["auto"]` to let our model choose the fallback language from `expected_languages` with the highest confidence score.
+                If the detected language of the audio file is not in the list of expected languages, the `fallback_language` is used. Set to "auto" to let our model choose the fallback language from `expected_languages` with the highest confidence score.
               type: string
               default: "auto"
📝 Committable suggestion


Suggested change
-              x-label: Fallback language
-              description: |
-                If the detected language of the audio file is not in the list of expected languages, the `fallback_language` is used. Specify `["auto"]` to let our model choose the fallback language from `expected_languages` with the highest confidence score.
-              type: string
-              default: "auto"
+            fallback_language:
+              x-label: Fallback language
+              description: |
+                If the detected language of the audio file is not in the list of expected languages, the `fallback_language` is used. Set to "auto" to let our model choose the fallback language from `expected_languages` with the highest confidence score.
+              type: string
+              default: "auto"
🤖 Prompt for AI Agents
In src/libs/AssemblyAI/openapi.yaml around lines 1270 to 1274, the description
for fallback_language refers to specifying ["auto"] (an array) but the schema
sets type: string; keep type: string and default: "auto", and change the
description to say: set to "auto" to let the model choose the fallback language
from expected_languages; ensure the description matches the schema and that any
examples pass the plain string "auto" rather than an array.

@HavenDV HavenDV closed this Aug 27, 2025
auto-merge was automatically disabled August 27, 2025 11:19

Pull request was closed
