@dbschmigelski
Member

Description

Our integration tests have been flaky for some time. This PR improves robustness by introducing a retry decorator, which we will use sparingly. We also plan to apply it to certain integ tests where model providers have been flaky.

  • Add retry_on_flaky decorator to handle flaky integration tests caused by LLM non-determinism
  • Fix test_guardrail_output_intervention_redact_output which was failing ~90% of runs due to the model referencing "CACTUS" from conversation history
  • Change the follow-up prompt from "Reply with only: OK" to "What is 2+2? Reply with only the number." which gives the model an unrelated task and eliminates flakiness
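
For readers unfamiliar with the pattern, here is a minimal sketch of what a retry decorator for flaky tests can look like. It is not the code from this PR: the commit message indicates the real `retry_on_flaky` builds on tenacity, while this version uses only the standard library, and the parameter names (`reason`, `max_attempts`, `wait_seconds`) are assumptions made for illustration.

```python
import functools
import time


def retry_on_flaky(reason, max_attempts=3, wait_seconds=0.0):
    """Re-run a test when it raises AssertionError (hypothetical signature).

    `reason` is documentation only: it forces the author to state why
    the test is allowed to retry.
    """

    def decorator(test_func):
        @functools.wraps(test_func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return test_func(*args, **kwargs)
                except AssertionError:
                    # Out of attempts: surface the original failure.
                    if attempt == max_attempts:
                        raise
                    time.sleep(wait_seconds)

        return wrapper

    return decorator


# Simulated flaky test: fails twice, then passes on the third attempt.
calls = {"count": 0}


@retry_on_flaky("simulated model non-determinism", max_attempts=3)
def flaky_check():
    calls["count"] += 1
    assert calls["count"] >= 3, "fails on the first two attempts"
    return calls["count"]
```

Retrying only on `AssertionError` keeps genuine errors (import failures, bad credentials) from being masked. A retry still hides real regressions that fail intermittently, which is why the description calls for sparing use and pairs the decorator with a prompt change that removes the root cause.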

Documentation PR

N/A

Type of Change

Bug fix

Testing

How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli

  • I ran hatch run prepare

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

codecov bot commented Jan 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

- Add retry_on_flaky decorator to handle LLM non-determinism
- Change response2 prompt from "Reply with only: OK" to a math question
  to prevent model from referencing blocked word from conversation history
- Update system prompt to explicitly instruct not mentioning the word
- Add tenacity dependency for retry functionality
Unshure previously approved these changes Jan 20, 2026
pgrayy previously approved these changes Jan 20, 2026