fix: use category name instead of uuid generated by llm (CM-889) #3822

Open
ulemons wants to merge 7 commits into main from feat/categorization-issue

ulemons commented Feb 3, 2026

PR: Stricter prompt + double validation for category UUIDs

Summary

This PR hardens LLM category selection by:

  • Increasing prompt rigidity to reduce category/id drift.
  • Applying a double validation step to ensure the returned UUID is both well-formed and present in the DB.
  • Falling back to DB-backed resolution using the category name when the UUID is missing/invalid/not found.

What changed

  • Prompt

    • Reinforced “closed set” constraints: categories must be copied verbatim from the authoritative JSON list.
    • Added stricter self-check requirements to minimize mismatches.
  • Post-processing / Validation

    • Step 1: Accept category only if id matches UUID format and exists in categories (DB list).
    • Step 2: If id fails (missing/invalid/not in DB), try resolving by name (case-insensitive) and use the DB id.
    • Step 3: If neither id nor name resolves to a DB category, skip the entry and warn.
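
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the DbCategory and LlmCategory shapes, the resolveCategory name, the warn callback, and the UUID regex are all assumptions, and comparing ids case-insensitively is a choice made here for the sketch.

```typescript
// Hypothetical shapes; the PR's actual types are not shown in this thread.
interface DbCategory {
  id: string;
  name: string;
}

interface LlmCategory {
  id?: string | null;
  name?: string | null;
}

// Generic 8-4-4-4-12 hex layout; does not check UUID version or variant bits.
const UUID_PATTERN =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function resolveCategory(
  candidate: LlmCategory,
  dbCategories: DbCategory[],
  warn: (message: string) => void,
): DbCategory | null {
  // Step 1: accept the id only if it is well-formed AND present in the DB list.
  const id = (candidate.id ?? '').toLowerCase();
  if (UUID_PATTERN.test(id)) {
    const byId = dbCategories.find((c) => c.id.toLowerCase() === id);
    if (byId !== undefined) {
      return byId;
    }
  }

  // Step 2: fall back to a case-insensitive name match and use the DB record.
  const name = (candidate.name ?? '').toLowerCase();
  if (name !== '') {
    const byName = dbCategories.find((c) => c.name.toLowerCase() === name);
    if (byName !== undefined) {
      return byName;
    }
  }

  // Step 3: neither id nor name resolved, so skip the entry and warn.
  warn(`Skipping unknown category: ${JSON.stringify(candidate)}`);
  return null;
}
```

Returning the DB record rather than the LLM's object means the caller always ends up with the DB id and the DB spelling of the name, even when the match came from the name fallback.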

Why

LLM outputs can return:

  • Correct names but wrong UUIDs
  • Correct UUID format that doesn’t exist in DB
  • New/modified category names or invented IDs

Validating against the DB list and falling back to the name ensures the output is always DB-consistent, prevents silent data corruption, and keeps categorization within the authoritative category set.


Note

Medium Risk
Moderate risk because it changes production categorization behavior and introduces filtering/correction logic that could drop or alter LLM-selected categories if matching fails.

Overview
Tightens the LLM category-classification flow by switching the category list in the prompt from grouped text to an authoritative closed-set JSON array and adding strict output constraints/self-check instructions to reduce name/id drift.
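
As an illustration of the closed-set JSON approach, a prompt builder could look something like the sketch below. The function name buildCategoryPrompt, the PromptCategory shape, and every line of prompt wording are assumptions made for this example; the actual prompt text in the PR is not reproduced in this thread.

```typescript
// Hypothetical shape for a category row fetched from the DB.
interface PromptCategory {
  id: string;
  name: string;
}

function buildCategoryPrompt(
  categories: PromptCategory[],
  projectDescription: string,
): string {
  // Serialize only id and name so the model sees one authoritative JSON array
  // rather than grouped free text it might paraphrase.
  const closedSet = JSON.stringify(
    categories.map(({ id, name }) => ({ id, name })),
    null,
    2,
  );

  // Illustrative wording only: closed-set rules plus a self-check instruction.
  return [
    'Select the categories that apply to the project described below.',
    'AUTHORITATIVE CATEGORY LIST (closed set, JSON):',
    closedSet,
    'Rules:',
    '- Copy each selected "id" and "name" verbatim from the list above.',
    '- Do not invent, rename, or modify categories or ids.',
    '- Before answering, verify that every returned id/name pair appears in the list.',
    'Project:',
    projectDescription,
  ].join('\n');
}
```

Even with constraints like these, the model can still return mismatched pairs, which is why the post-processing step remains necessary.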

Adds post-processing in findCategoriesWithLLM to handle null LLM responses and to ensure returned category IDs are DB-consistent: accept only IDs that are valid UUIDs and exist in the fetched category list, otherwise resolve by case-insensitive name and replace with the DB UUID, skipping unknown categories with warnings.
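
A rough sketch of the list-level handling described above, under stated assumptions: the LlmResult shape, the postProcessCategories name, and the injected resolve and warn callbacks are hypothetical, and de-duplicating by DB id is an extra added here rather than something the PR description mentions. The resolver is passed in so the sketch stays independent of how a single category is matched.

```typescript
// Hypothetical shapes; findCategoriesWithLLM's real types are not shown here.
interface KnownCategory {
  id: string;
  name: string;
}

interface RawCategory {
  id?: string | null;
  name?: string | null;
}

interface LlmResult {
  categories: RawCategory[];
}

function postProcessCategories(
  result: LlmResult | null,
  resolve: (candidate: RawCategory) => KnownCategory | null,
  warn: (message: string) => void,
): KnownCategory[] {
  // A null response yields no categories instead of throwing.
  if (result === null) {
    warn('LLM returned no result; no categories assigned');
    return [];
  }

  const accepted: KnownCategory[] = [];
  for (const candidate of result.categories) {
    const resolved = resolve(candidate);
    if (resolved === null) {
      // Unknown category: skip it and leave a trace in the logs.
      warn(`Skipping unknown category: ${JSON.stringify(candidate)}`);
      continue;
    }
    // Assumption: keep each DB category at most once.
    if (!accepted.some((c) => c.id === resolved.id)) {
      accepted.push(resolved);
    }
  }
  return accepted;
}
```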

Written by Cursor Bugbot for commit 0f8d57e. This will update automatically on new commits.

ulemons self-assigned this Feb 3, 2026
ulemons added the Bug Created by Linear-GitHub Sync label Feb 3, 2026
ulemons changed the title fix: use category name instead of uuid generated by llm fix: use category name instead of uuid generated by llm (CM-889) Feb 3, 2026
ulemons requested a review from joanagmaia February 3, 2026 16:56
ulemons marked this pull request as ready for review February 3, 2026 16:56

cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Labels

Bug Created by Linear-GitHub Sync

2 participants