state-tracking

Here are 5 public repositories matching this topic...

GoldEvidenceBench is a regression harness for RAG/LLM systems. It generates long, noisy synthetic logs with oracle labels to separate retrieval, selection, attribution, and authority failures, and provides an auto-curriculum loop to train selectors on defined trap families and measure evidence-grounded state tracking.

  • Updated Feb 4, 2026
  • Python
