@yuechao-qin yuechao-qin commented Jan 13, 2026

TODO

  • Discuss whether indexing and migrations are needed for annotations (key, value).
  • How do I confirm that the unit test will be part of CI/CD?
  • Is EXISTS needed for searching for Keys? Is EQUALS sufficient?
  • Should we combine user search in the new search API too?
  • Should the Black formatter settings align with Google's Python style guide here?

Description

Closes #45

Implemented a new API to search annotations for runs.

Background

Annotations are key-value pairs. The following are examples of annotations (written as key = value):

  • env = production
  • team = backend

This PR allows searches for key/value strings.

Features

  • Created a new API (POST /api/pipeline_runs/search/) to search key/value in annotations.
  • Keys searchable operations
    • EXISTS: If any key exists regardless of key string
    • CONTAINS: If key contains a substring
    • IN_SET: If key string matches set of strings
    • EQUALS: If key string equals string
  • Values searchable operations
    • CONTAINS: If value contains a substring
    • IN_SET: If value matches set of strings
    • EQUALS: If value equals string
  • Key and value search operations can be negated (i.e. NOT).
  • Any number of searches can be grouped with AND or OR, and groups can be nested recursively.
    • Example: (S1 and S2) or (S3 or S4)

Use Cases and Examples

1. Key equals a string

Find runs where annotation key equals "environment":

{
  "annotation_filters": {
    "filters": [
      {"operator": "equals", "key": "environment"}
    ]
  }
}

2. Key contains substring AND value in set

Find runs where key contains "env" AND value is "prod" or "staging":

{
  "annotation_filters": {
    "filters": [
      {"operator": "contains", "key": "env"},
      {"operator": "in_set", "values": ["prod", "staging"]}
    ],
    "operator": "and"
  }
}

3. Complex: (key contains OR value contains) AND key NOT contains

Find runs where (key contains "env" OR any value contains "prod") AND key NOT contains "deprecated":

{
  "annotation_filters": {
    "filters": [
      {
        "filters": [
          {"operator": "contains", "key": "env"},
          {"operator": "contains", "value": "prod"}
        ],
        "operator": "or"
      },
      {"operator": "contains", "key": "deprecated", "negate": true}
    ],
    "operator": "and"
  }
}
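
To make the semantics of these request bodies concrete, here is a minimal in-memory sketch of how such a filter tree could be evaluated against a single run's annotations. It is not the PR's implementation. The names `matches` and `_compare` are hypothetical, only the three operators used in the examples are handled, a missing group "operator" is assumed to default to "and", and each leaf is assumed to be satisfied when any annotation of the run matches it. The function takes the object found under "annotation_filters".

```python
from typing import Mapping


def _compare(operator: str, actual: str, operand) -> bool:
    """Apply one operator from the PR description to a single key or value."""
    if operator == "equals":
        return actual == operand
    if operator == "contains":
        return operand in actual
    if operator == "in_set":
        return actual in operand
    raise ValueError(f"unsupported operator: {operator!r}")


def matches(node: dict, annotations: Mapping[str, str]) -> bool:
    """Return True if one run's annotations satisfy the filter tree `node`."""
    if "filters" in node:
        # Group node: combine the children with AND (assumed default) or OR.
        combine = any if node.get("operator", "and") == "or" else all
        return combine(matches(child, annotations) for child in node["filters"])

    # Leaf node: decide whether it targets keys or values.
    if "key" in node:
        candidates, operand = annotations.keys(), node["key"]
    elif "values" in node:
        candidates, operand = annotations.values(), node["values"]
    else:
        candidates, operand = annotations.values(), node["value"]

    # Assumption: a leaf is satisfied if ANY annotation matches it.
    hit = any(_compare(node["operator"], c, operand) for c in candidates)
    # "negate" flips the result of the leaf.
    return hit != node.get("negate", False)


example_2 = {
    "filters": [
        {"operator": "contains", "key": "env"},
        {"operator": "in_set", "values": ["prod", "staging"]},
    ],
    "operator": "and",
}
print(matches(example_2, {"environment": "prod"}))  # True
print(matches(example_2, {"environment": "dev"}))   # False
```

Under the any-annotation assumption, a run annotated with {"env": "dev", "team": "prod"} would also match example 2, because the key leaf and the value leaf may be satisfied by different annotations.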

Test Plan

  • Unit test
    • uv run pytest tests/test_pipeline_run_search.py -v
  • Manual Testing
    • Watch demo videos below
    • Add annotations for testing (PUT /api/pipeline_runs/<ID>/annotations/<KEY>/)
    • Query annotations for ID (GET /api/pipeline_runs/<ID>/annotations/)
    • Test with new search (POST /api/pipeline_runs/search/)
  • Test on Staging (same procedure as Manual Testing above)

Demo

part1.mov
part2.mov

@yuechao-qin yuechao-qin marked this pull request as ready for review January 14, 2026 00:10
inject_session_dependency(list_pipeline_runs_func)
)
router.post(
"/api/pipeline_runs/search/",

@morgan-wowk morgan-wowk Jan 19, 2026

This is a piece of feedback on REST semantics mainly. Typically you wouldn't see a POST request for searching for resources, in other words getting resources. If it were a GraphQL API then things would be different.

Here's the suggestion:

Turn this into a GET endpoint:

GET /api/pipeline_runs

This makes it clear that you are retrieving a list of pipeline runs, which would be an unfiltered, paginated response by default. Then add the search capability after the foundation (unfiltered search) is established.

From loading the UI, I can see there is already a request being made:

curl 'http://localhost:8000/api/pipeline_runs?include_pipeline_names=true&include_execution_stats=true'

and upon asking Cursor, I know that it has a pagination implementation already. We can add the filter options to this existing endpoint rather than creating a new endpoint, which would set a good precedent for all future search capabilities on other resources.

Here is an example of how I've seen this implemented:

curl 'http://localhost:8000/api/pipeline_runs?filter=in(annotations.env:staging,prod)+eq(annotations.app:agenticsearch)'

With other operators like gte (greater than or equal), lte, etc.

Accounting for complex AND / OR combinations is something that would require some thought but I wouldn't go there unless that's something we see people using. The above example is a basic OR and AND.
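
As a rough illustration of the query-string style above, the following sketch parses a filter expression into terms. The grammar is an assumption inferred from the single example (`op(field:value[,value...])` terms joined by `+`), and the names `Term` and `parse_filter` are hypothetical. Values containing spaces, commas, or parentheses are not handled.

```python
import re
from typing import NamedTuple


class Term(NamedTuple):
    operator: str   # e.g. "in", "eq", "gte"
    field: str      # e.g. "annotations.env"
    values: tuple   # one or more operand strings


# One term looks like op(field:value[,value...]); this grammar is assumed.
_TERM = re.compile(r"^(\w+)\(([\w.]+):([^()]*)\)$")


def parse_filter(expr: str) -> list:
    """Split a filter expression into terms that are implicitly ANDed."""
    terms = []
    # A "+" in a query string is commonly decoded as a space,
    # so accept either as the separator between terms.
    for chunk in re.split(r"[+\s]+", expr.strip()):
        m = _TERM.match(chunk)
        if m is None:
            raise ValueError(f"malformed filter term: {chunk!r}")
        op, field, raw_values = m.groups()
        terms.append(Term(op, field, tuple(raw_values.split(","))))
    return terms


terms = parse_filter(
    "in(annotations.env:staging,prod)+eq(annotations.app:agenticsearch)"
)
for term in terms:
    print(term.operator, term.field, term.values)
```

Here the values inside one `in(...)` term act as the OR, and the `+` between terms acts as the AND.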

Contributor

The main reason for using POST is that GET requests do not support a request body. Sending complex query structures via query parameters can be problematic in certain cases. If we can overcome that, then we can use GET pipeline_runs.
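
One possible way to carry a nested filter on a GET request, sketched here only as an assumption and not something this PR implements, is to serialize the JSON tree into a single query parameter. The parameter name `filter` and the helper names are hypothetical; only standard-library calls are used.

```python
import json
from urllib.parse import parse_qs, urlencode


def encode_filter(filter_tree: dict) -> str:
    """Serialize a filter tree into a URL-encoded query string."""
    compact = json.dumps(filter_tree, separators=(",", ":"))
    return urlencode({"filter": compact})


def decode_filter(query_string: str) -> dict:
    """Recover the filter tree from a query string built by encode_filter."""
    return json.loads(parse_qs(query_string)["filter"][0])


tree = {
    "filters": [
        {"operator": "contains", "key": "env"},
        {"operator": "in_set", "values": ["prod", "staging"]},
    ],
    "operator": "and",
}
query = encode_filter(tree)
print(len(query), "characters in the query string")
print(decode_filter(query) == tree)
```

The trade-off is that deeply nested filters make the URL long and hard to read, and some proxies and servers limit URL length, which is the kind of problem the comment above refers to.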

I will dig up some documentation on how I've seen this achieved previously, then we can review whether it meets our needs.

I think if we solve this once, we can use the same solution for all future APIs and won't need an extra /search endpoint to maintain

"key",
"value",
),
# Index for searching pipeline runs by annotation value only (across all keys)
Contributor

I'm not sure such a feature would be useful.

Author

I'm fine with removing this. The reason I had it is explained in my other, longer comment.

Contributor

Ark-kun commented Jan 20, 2026

Thank you for implementing this feature.

I think there might be a slight misunderstanding regarding which predicates are needed.
If I understand correctly, the current implementation treats the annotation key and value separately. However, the original issue asks to filter "by values of the keys".

Imagine annotations is a dict. Then the predicates that we need are:

  • "key1" in annotations
  • annotations["key1"] == "value1"
  • annotations["key1"] in ("value1", "value2")
  • "substr1" in annotations["key1"]
  • negation of any of those predicates
  • and (a root-level predicate with a list of sub-predicates)
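
Treating annotations as a plain dict, the predicates listed above could be sketched as small composable functions. This is only an illustration of the requested semantics, where every value test is bound to a specific key. All function names are hypothetical.

```python
from typing import Callable, Iterable, Mapping

Predicate = Callable[[Mapping[str, str]], bool]


def has_key(key: str) -> Predicate:
    """"key1" in annotations"""
    return lambda a: key in a


def value_equals(key: str, value: str) -> Predicate:
    """annotations["key1"] == "value1" (False when the key is missing)"""
    return lambda a: key in a and a[key] == value


def value_in(key: str, values: Iterable[str]) -> Predicate:
    """annotations["key1"] in ("value1", "value2")"""
    allowed = set(values)
    return lambda a: key in a and a[key] in allowed


def value_contains(key: str, substr: str) -> Predicate:
    """"substr1" in annotations["key1"]"""
    return lambda a: key in a and substr in a[key]


def negate(pred: Predicate) -> Predicate:
    """Negation of any predicate."""
    return lambda a: not pred(a)


def all_of(*preds: Predicate) -> Predicate:
    """Root-level "and" over a list of sub-predicates."""
    return lambda a: all(p(a) for p in preds)


query = all_of(
    value_in("env", ["prod", "staging"]),
    negate(has_key("deprecated")),
)
print(query({"env": "prod", "team": "backend"}))    # True
print(query({"env": "prod", "deprecated": "yes"}))  # False
```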

Contributor

Ark-kun commented Jan 20, 2026

Another question: Would we be able to reuse the same search classes and functions for component search?

@yuechao-qin yuechao-qin reopened this Jan 20, 2026
Author

@yuechao-qin yuechao-qin left a comment

Regarding #68 (comment)

Yes, I plan on reusing this PR's search classes and functions for component search.

Regarding #68 (comment)

Thanks for questioning this, Alexey. Your concern led me to a bug in my code, which is fixed now.

I believe the current implementation does handle your predicates. Let me first address your predicate examples, then explain why I chose this design.

  • "key1" in annotations
    • {
        "annotation_filters": {
          "filters": [
            {"operator": "equals", "key": "key1"}
          ],
          "operator": "and"
        }
      }
  • annotations["key1"] == "value1"
    • {
        "annotation_filters": {
          "filters": [
            {"operator": "equals", "key": "key1"},
            {"operator": "equals", "value": "value1"}
          ],
          "operator": "and"
        }
      }
  • annotations["key1"] in ("value1", "value2")
    • {
        "annotation_filters": {
          "filters": [
            {"operator": "equals", "key": "key1"},
            {"operator": "in_set", "values": ["value1", "value2"]}
          ],
          "operator": "and"
        }
      }
  • "substr1" in annotations["key1"]
    • {
        "annotation_filters": {
          "filters": [
            {"operator": "equals", "key": "key1"},
            {"operator": "contains", "value": "substr1"}
          ],
          "operator": "and"
        }
      }
  • negation of any of those predicates
    • {
        "annotation_filters": {
          "filters": [
            {"operator": "equals", "key": "key1", "negate": true},
            {"operator": "equals", "value": "value1", "negate": true}
          ],
          "operator": "and"
        }
      }
  • and (a root-level predicate with a list of sub-predicates)
    • {
        "annotation_filters": {
          "filters": [
            {
              "filters": [
                {"operator": "equals", "key": "key1"},
                {"operator": "equals", "value": "value1"}
              ],
              "operator": "and"
            },
            {
              "filters": [
                {"operator": "equals", "key": "key2"},
                {"operator": "in_set", "values": ["value2", "value3"]}
              ],
              "operator": "and"
            }
          ],
          "operator": "and"
        }
      }
      

My reasoning for choosing this design:

  • The backend is simpler because it can combine multiple predicates with different operators.
    • Can search for runs with annotations keys and/or values.
    • The JSON structure is similar for key and/or value filters, such that it requires:
      • search type (key/value)
      • operator (equals/contains/in_set)
      • can be negated (true/false)
      • all predicates can be in groups with logical operators (and/or).
    • The JSON structure translates to SQL in a straightforward way. For example, your predicates above will result in the following SQL:
      • "key1" in annotations is equivalent to SQL where pipeline_run_annotation."key" IS NOT NULL
      • annotations["key1"] == "value1" is equivalent to SQL where pipeline_run_annotation."key" = 'key1' AND pipeline_run_annotation.value = 'value1'
  • The key motivation for this design is to make sure the SQL is feasible to generate while staying as flexible as possible for any combination of predicates.
    • The current design allows quite complex SQL predicates to be generated (groups, keys/values, operators, negation).
  • The example predicates you gave above are more Pythonic. If we want that UX, it's still possible with this backend design. Since SQL has no direct equivalent of Pythonic predicates, we would need to translate them to SQL, which is something we can explore in the future.
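
To illustrate how a JSON filter tree can map onto SQL, here is a minimal sketch that builds a parameterized WHERE fragment. It is not the code in this PR: the function name `to_where` is hypothetical, the `?` placeholder style and the use of LIKE for "contains" are assumptions, and only the table and column names come from the SQL quoted above.

```python
def to_where(node: dict) -> tuple:
    """Translate a filter tree into (sql_fragment, params) with ? placeholders."""
    if "filters" in node:
        # Group node: join the children with AND (assumed default) or OR.
        if not node["filters"]:
            raise ValueError("a group needs at least one filter")
        joiner = " OR " if node.get("operator", "and") == "or" else " AND "
        fragments, params = [], []
        for child in node["filters"]:
            sql, child_params = to_where(child)
            fragments.append(sql)
            params.extend(child_params)
        return "(" + joiner.join(fragments) + ")", params

    # Leaf node: a "key" entry targets the key column, otherwise the value column.
    on_key = "key" in node
    column = 'pipeline_run_annotation."key"' if on_key else "pipeline_run_annotation.value"
    operator = node["operator"]
    if operator == "in_set":
        params = list(node["values"])
        placeholders = ", ".join("?" for _ in params)
        sql = f"{column} IN ({placeholders})"
    elif operator == "equals":
        params = [node["key"] if on_key else node["value"]]
        sql = f"{column} = ?"
    elif operator == "contains":
        operand = node["key"] if on_key else node["value"]
        # LIKE wildcards (% and _) inside the operand are not escaped here.
        params = [f"%{operand}%"]
        sql = f"{column} LIKE ?"
    else:
        raise ValueError(f"unsupported operator: {operator!r}")

    if node.get("negate", False):
        sql = f"NOT ({sql})"
    return sql, params


tree = {
    "filters": [
        {"operator": "equals", "key": "key1"},
        {"operator": "equals", "value": "value1"},
    ],
    "operator": "and",
}
sql, params = to_where(tree)
print(sql)
print(params)
```

In this sketch every condition applies to one annotation row at a time, so a group that ANDs conditions on two different keys can never match a single row. Matching per run rather than per row would need something extra, for example one EXISTS subquery per key.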

If my explanation does not align with your expectations, I'm happy to change the design. Feel free to suggest any changes you think are better.

@yuechao-qin yuechao-qin requested a review from Ark-kun January 20, 2026 19:41

Volv-G commented Jan 21, 2026

I'm curious about contains and about whether we always expect annotation values to be strings. I think we might benefit from having arrays there, in which case contains should be checking whether the array contains the term from the query. Can this be supported?
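
A minimal sketch of what such a dual-mode `contains` could look like, assuming annotation values may be either strings or lists of strings. How array values would actually be stored is not specified in this PR, and the function name is hypothetical.

```python
from typing import Sequence, Union


def contains(value: Union[str, Sequence[str]], term: str) -> bool:
    """Substring test for string values, membership test for array values."""
    if isinstance(value, str):
        return term in value
    if isinstance(value, (list, tuple)):
        # For arrays, the term must equal one of the elements exactly.
        return term in value
    return False


print(contains("production", "prod"))    # True: substring of a string value
print(contains(["prod", "eu"], "prod"))  # True: element of an array value
print(contains(["production"], "prod"))  # False: no element equals "prod"
```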

Development

Successfully merging this pull request may close these issues.

Run list - Search by annotations