Agenta-AI · mmabrouk · Jan 27, 2026 · Jan 28, 2026
diff --git a/docs/design/migrate-evaluator-playground/README.md b/docs/design/migrate-evaluator-playground/README.md
@@ -0,0 +1,84 @@
+# Migrate Evaluator Playground to New Evaluator Endpoints
+
+## Overview
+
+This planning workspace documents the migration of the Evaluator Playground frontend to the new workflow-based evaluator endpoints. On the backend, evaluators have already been migrated from the old `EvaluatorConfig` model to the new `SimpleEvaluator` (workflow-based) model.
+
+## Migration Strategy
+
+**Direct migration (no adapters)** split into two PRs:
+
+| PR | Scope | Description |
+|----|-------|-------------|
+| **PR 1** | CRUD | Migrate to `/preview/simple/evaluators/*`, change internal types to `SimpleEvaluator` |
+| **PR 2** | Run | Migrate to `/preview/workflows/invoke`, add workflow service types |
+
+See [plan.md](./plan.md) for detailed implementation steps.
+
+## Context
+
+- **PR #3527**: Backend migration that introduces new evaluator endpoints
+- **Goal**: Full migration to new endpoints, no legacy code remaining
+
+## Documents
+
+| File | Description |
+|------|-------------|
+| [context.md](./context.md) | Background, motivation, problem statement, goals, and non-goals |
+| [current-system.md](./current-system.md) | Detailed map of current Evaluator Playground implementation |
+| [new-endpoints.md](./new-endpoints.md) | New evaluator endpoint shapes and differences from legacy |
+| [research.md](./research.md) | Deep dive into evaluator execution architecture and URI-based handlers |
+| [migration-options.md](./migration-options.md) | Why we chose direct migration over adapters |
+| [risk-analysis.md](./risk-analysis.md) | Coupling points and risk areas for the migration |
+| [plan.md](./plan.md) | **Main plan** - PR 1 (CRUD) and PR 2 (Run) implementation details |
+| [status.md](./status.md) | Living document for progress updates and decisions |
+
+## Key Mapping Changes
+
+| Legacy | New |
+|--------|-----|
+| `EvaluatorConfig` | `SimpleEvaluator` |
+| `evaluator_key` | derived from `data.uri` |
+| `settings_values` | `data.parameters` |
+| `GET /evaluators/configs/` | `POST /preview/simple/evaluators/query` |
+| `POST /evaluators/configs/` | `POST /preview/simple/evaluators/` |
+| `PUT /evaluators/configs/{id}/` | `PUT /preview/simple/evaluators/{id}` |
+| `DELETE /evaluators/configs/{id}/` | `POST /preview/simple/evaluators/{id}/archive` |
+| `POST /evaluators/{key}/run/` | `POST /preview/workflows/invoke` |
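+
+The endpoint rows of this table can be read as a lookup from each legacy route to its replacement. The sketch below is illustrative only: the route pairs are copied from the table, while the `Route` type, the `ROUTE_MIGRATION` constant, and the `findReplacement` helper are hypothetical names that do not exist in the codebase.
+
+```typescript
+// Illustrative only: route pairs come from the mapping table above;
+// the type names and the lookup helper are hypothetical.
+type HttpMethod = "GET" | "POST" | "PUT" | "DELETE"
+
+interface Route {
+    method: HttpMethod
+    path: string
+}
+
+const ROUTE_MIGRATION: ReadonlyArray<{legacy: Route; next: Route}> = [
+    {
+        legacy: {method: "GET", path: "/evaluators/configs/"},
+        next: {method: "POST", path: "/preview/simple/evaluators/query"},
+    },
+    {
+        legacy: {method: "POST", path: "/evaluators/configs/"},
+        next: {method: "POST", path: "/preview/simple/evaluators/"},
+    },
+    {
+        legacy: {method: "PUT", path: "/evaluators/configs/{id}/"},
+        next: {method: "PUT", path: "/preview/simple/evaluators/{id}"},
+    },
+    {
+        legacy: {method: "DELETE", path: "/evaluators/configs/{id}/"},
+        next: {method: "POST", path: "/preview/simple/evaluators/{id}/archive"},
+    },
+    {
+        legacy: {method: "POST", path: "/evaluators/{key}/run/"},
+        next: {method: "POST", path: "/preview/workflows/invoke"},
+    },
+]
+
+// Returns the replacement route for a legacy route, or undefined if none is listed.
+function findReplacement(method: HttpMethod, legacyPath: string): Route | undefined {
+    return ROUTE_MIGRATION.find(
+        (entry) => entry.legacy.method === method && entry.legacy.path === legacyPath,
+    )?.next
+}
+```
+
+Two rows change HTTP method as well as path: listing becomes a `POST` to `/query`, and deletion becomes a `POST` to `/archive`.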
+
+## Files Affected
+
+### PR 1: CRUD Migration
+
+| Area | Files |
+|------|-------|
+| Types | `web/oss/src/lib/Types.ts` |
+| Services | `web/oss/src/services/evaluators/index.ts` |
+| State | `web/oss/src/state/evaluators/atoms.ts` |
+| Playground State | `web/oss/src/components/.../ConfigureEvaluator/state/atoms.ts` |
+| Playground UI | `web/oss/src/components/.../ConfigureEvaluator/index.tsx` |
+| Registry | `web/oss/src/components/Evaluators/index.tsx` |
+| Registry Hook | `web/oss/src/components/Evaluators/hooks/useEvaluatorsRegistryData.ts` |
+| Columns | `web/oss/src/components/Evaluators/assets/getColumns.tsx` |
+
+### PR 2: Run Migration
+
+| Area | Files |
+|------|-------|
+| Types | `web/oss/src/lib/Types.ts` (add workflow types) |
+| Invoke Service | `web/oss/src/services/workflows/invoke.ts` (new) |
+| Debug Section | `web/oss/src/components/.../ConfigureEvaluator/DebugSection.tsx` |
+
+### Backend Reference (PR #3527)
+- `api/oss/src/routers/evaluators_router.py` - Legacy endpoints (kept temporarily)
+- `api/oss/src/apis/fastapi/evaluators/router.py` - New `SimpleEvaluators` router
+- `api/oss/src/apis/fastapi/workflows/router.py` - Workflow invoke endpoint
+- `api/oss/src/core/evaluators/dtos.py` - New data transfer objects
+
+## Effort Estimate
+
+| PR | Effort |
+|----|--------|
+| PR 1: CRUD | 4-5 days |
+| PR 2: Run | 3-4 days |
+| **Total** | **7-9 days** |
diff --git a/docs/design/migrate-evaluator-playground/context.md b/docs/design/migrate-evaluator-playground/context.md
@@ -0,0 +1,72 @@
+# Context: Migrate Evaluator Playground
+
+## Background
+
+The Agenta platform has changed architecturally so that **evaluators are now workflows**. As a result, evaluators follow the same git-like versioning model as other workflows:
+- **Artifact** (Evaluator) → **Variant** → **Revision**
+
+Previously, evaluators were stored in a flat `EvaluatorConfigDB` table with simple key-value settings. The new model stores evaluators as `WorkflowArtifactDBE`, `WorkflowVariantDBE`, and `WorkflowRevisionDBE` records with richer metadata and versioning.
+
+## Motivation
+
+1. **Unified Architecture**: Evaluators, testsets, and apps now share the same git-like workflow model
+2. **Better Versioning**: Evaluators can have multiple variants and revision history
+3. **Richer Metadata**: New model supports URIs, schemas, scripts, and configuration in a structured way
+4. **Future Extensibility**: Custom evaluators will be first-class citizens with the same capabilities as built-in ones
+
+## Problem Statement
+
+The Evaluator Playground frontend currently uses legacy endpoints:
+- `GET /evaluators/` - List evaluator templates
+- `GET/POST/PUT/DELETE /evaluators/configs/` - CRUD for evaluator configurations
+- `POST /evaluators/{key}/run/` - Run evaluator in playground
+
+The backend (PR #3527) has:
+1. Migrated all evaluator configs to the new workflow-based model via DB migrations
+2. Created new `SimpleEvaluators` endpoints at `/preview/simple/evaluators/`
+3. Exposed native workflow execution at `/preview/workflows/invoke`
+4. Kept legacy endpoints as thin wrappers (to be deprecated)
+
+**The frontend needs to migrate to use the new endpoints directly.**
+
+## Goals
+
+1. **Replace legacy evaluator config CRUD** with new `SimpleEvaluator` endpoints
+2. **Replace legacy evaluator run** with native workflow invoke (`/preview/workflows/invoke`)
+3. **Update data models** in frontend to match new `SimpleEvaluator` shape (no adapters)
+4. **Preserve UX** - no user-facing changes to the Evaluator Playground functionality
+5. **Remove all legacy endpoint usage** - clean migration, no dual-path code
+
+## Non-Goals
+
+1. **Not changing the Evaluator Playground UI** - Only the data layer changes
+2. **Not migrating evaluation batch runs** - Those already use the new workflow system internally
+3. **Not introducing new evaluator features** - This is a pure endpoint migration
+
+## Success Criteria
+
+1. Evaluator Playground can create, edit, and delete evaluators using new `SimpleEvaluator` endpoints
+2. Evaluator Playground can run evaluators using native workflow invoke
+3. All existing evaluator configurations continue to work
+4. No regression in evaluator testing functionality
+5. No legacy endpoint calls remain in frontend code
+
+## Constraints
+
+1. Must not break existing evaluator configurations
+2. Must coordinate with backend team on endpoint availability (PR #3527)
+3. Split into two PRs for reviewability (CRUD first, then Run)
+
+## Migration Approach
+
+**Direct migration (no adapters):**
+
+| PR | Scope | Endpoints |
+|----|-------|-----------|
+| PR 1 | CRUD | `/preview/simple/evaluators/*` |
+| PR 2 | Run | `/preview/workflows/invoke` |
+
+This approach:
+- Avoids tech debt from adapter layers
+- Aligns internal types with backend models
+- Keeps changes reviewable by splitting into two PRs
diff --git a/docs/design/migrate-evaluator-playground/current-system.md b/docs/design/migrate-evaluator-playground/current-system.md
@@ -0,0 +1,230 @@
+# Current System: Evaluator Playground
+
+## Overview
+
+The Evaluator Playground allows users to:
+1. **Browse** evaluator templates (built-in evaluators)
+2. **Create/Configure** evaluator configurations with custom settings
+3. **Test** evaluators by running them against app variants and test cases
+4. **Manage** (edit, clone, delete) existing evaluator configurations
+
+## File Structure
+
+### Entry Points (Pages)
+
+| Path | Purpose |
+|------|---------|
+| `/web/oss/src/pages/w/[workspace_id]/p/[project_id]/evaluators/index.tsx` | Evaluators list page |
+| `/web/oss/src/pages/w/[workspace_id]/p/[project_id]/evaluators/configure/[evaluator_id].tsx` | Configure evaluator page |
+
+### Core Components
+
+#### Evaluators Registry (`/web/oss/src/components/Evaluators/`)
+
+| File | Purpose |
+|------|---------|
+| `index.tsx` | Main registry with table, search, tabs (automatic/human) |
+| `hooks/useEvaluatorsRegistryData.ts` | Fetches and transforms evaluator data |
+| `assets/getColumns.tsx` | Table column definitions |
+| `components/SelectEvaluatorModal/` | Modal to select evaluator template for new config |
+| `components/ConfigureEvaluator/index.tsx` | Page wrapper that loads data and initializes atoms |
+| `components/DeleteEvaluatorsModal/` | Delete confirmation modal |
+
+#### ConfigureEvaluator (Main UI)
+
+Location: `/web/oss/src/components/pages/evaluations/autoEvaluation/EvaluatorsModal/ConfigureEvaluator/`
+
+| File | Purpose |
+|------|---------|
+| `index.tsx` | Configuration form + test panel layout |
+| `DebugSection.tsx` | Test evaluator panel (run variant, run evaluator) |
+| `DynamicFormField.tsx` | Renders settings fields based on evaluator template |
+| `AdvancedSettings.tsx` | Collapsible advanced parameters |
+| `state/atoms.ts` | Jotai atoms for playground state |
+| `variantUtils.ts` | Utility for building variants from revisions |
+
+### State Management
+
+#### Playground Atoms (`state/atoms.ts`)
+
+```typescript
+// Session state
+playgroundSessionAtom          // { evaluator, existingConfigId, mode }
+playgroundEvaluatorAtom        // Current evaluator template (derived)
+playgroundIsEditModeAtom       // Is editing existing config? (derived)
+playgroundIsCloneModeAtom      // Is cloning config? (derived)
+playgroundEditValuesAtom       // Current config values being edited
+
+// Form state
+playgroundFormRefAtom          // Ant Design Form instance
+
+// Test section state
+playgroundSelectedVariantAtom  // Selected variant for testing
+playgroundSelectedTestsetIdAtom // Selected testset ID
+playgroundSelectedRevisionIdAtom // Selected revision ID
+playgroundSelectedTestcaseAtom // Testcase data
+playgroundTraceTreeAtom        // Trace output from running variant
+
+// Persisted state (localStorage)
+playgroundLastAppIdAtom        // Last used app ID
+playgroundLastVariantIdAtom    // Last used variant ID
+
+// Action atoms
+initPlaygroundAtom             // Initialize playground state
+resetPlaygroundAtom            // Reset all state
+commitPlaygroundAtom           // Update state after save
+cloneCurrentConfigAtom         // Switch to clone mode
+```
+
+#### Global Evaluator Atoms (`/web/oss/src/state/evaluators/atoms.ts`)
+
+```typescript
+evaluatorConfigsQueryAtomFamily // Query for evaluator configs
+evaluatorsQueryAtomFamily       // Query for evaluator templates
+nonArchivedEvaluatorsAtom       // Derived: non-archived evaluators
+evaluatorByKeyAtomFamily        // Find evaluator by key
+```
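+
+As a rough, framework-free illustration of what the two derived atoms compute, the sketch below uses plain functions instead of Jotai atoms. It assumes that "non-archived" is decided by the `archived` flag on the legacy `Evaluator` type and that lookup is by its `key` field; `EvaluatorLike`, `selectNonArchived`, and `findByKey` are hypothetical names, not the actual atom implementations.
+
+```typescript
+// Hypothetical stand-ins for the derived atoms listed above.
+// Only `key`, `name`, and `archived` from the legacy `Evaluator` type are used.
+interface EvaluatorLike {
+    key: string
+    name: string
+    archived?: boolean
+}
+
+// Assumed behaviour of nonArchivedEvaluatorsAtom: drop entries flagged as archived.
+function selectNonArchived<T extends EvaluatorLike>(evaluators: readonly T[]): T[] {
+    return evaluators.filter((evaluator) => !evaluator.archived)
+}
+
+// Assumed behaviour of evaluatorByKeyAtomFamily: first entry with a matching key.
+function findByKey<T extends EvaluatorLike>(
+    evaluators: readonly T[],
+    key: string,
+): T | undefined {
+    return evaluators.find((evaluator) => evaluator.key === key)
+}
+```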
+
+### API Service Layer
+
+#### Evaluators Service (`/web/oss/src/services/evaluators/index.ts`)
+
+```typescript
+// Evaluator Templates (legacy)
+fetchAllEvaluators()           // GET /evaluators
+
+// Evaluator Configs (legacy)
+fetchAllEvaluatorConfigs()     // GET /evaluators/configs
+createEvaluatorConfig()        // POST /evaluators/configs
+updateEvaluatorConfig()        // PUT /evaluators/configs/{id}
+deleteEvaluatorConfig()        // DELETE /evaluators/configs/{id}
+
+// Custom/Human Evaluators (new)
+createEvaluator()              // POST /preview/simple/evaluators/
+updateEvaluator()              // PUT /preview/simple/evaluators/{id}
+fetchEvaluatorById()           // GET /preview/simple/evaluators/{id}
+deleteHumanEvaluator()         // POST /preview/simple/evaluators/{id}/archive
+```
+
+#### Evaluator Run Service (`/web/oss/src/services/evaluations/api_ee/index.ts`)
+
+```typescript
+createEvaluatorDataMapping()   // POST /evaluators/map
+createEvaluatorRunExecution()  // POST /evaluators/{key}/run
+```
+
+## Data Flow
+
+```
+┌─────────────────────────────────────────────────────────────────────────────┐
+│                           USER ACTIONS                                       │
+│  - Browse evaluators list                                                   │
+│  - Create new evaluator config                                              │
+│  - Edit existing evaluator config                                           │
+│  - Test evaluator with variant + testcase                                   │
+└─────────────────────────────────────────────────────────────────────────────┘
+                                    │
+                                    ▼
+┌─────────────────────────────────────────────────────────────────────────────┐
+│  ENTRY POINTS                                                                │
+│  /evaluators → EvaluatorsRegistry                                           │
+│       ├─ Uses useEvaluatorsRegistryData() hook                              │
+│       │     ├─ Calls fetchAllEvaluators() → GET /evaluators                 │
+│       │     └─ Calls fetchAllEvaluatorConfigs() → GET /evaluators/configs   │
+│       │                                                                      │
+│       ├─ "Create new" → SelectEvaluatorModal → /evaluators/configure/new    │
+│       └─ Click row → /evaluators/configure/{id}                             │
+│                                                                              │
+│  /evaluators/configure/{id} → ConfigureEvaluatorPage                        │
+│       ├─ Loads evaluator template & existing config                         │
+│       ├─ Initializes playgroundSessionAtom                                  │
+│       └─ Renders ConfigureEvaluator component                               │
+└─────────────────────────────────────────────────────────────────────────────┘
+                                    │
+                                    ▼
+┌─────────────────────────────────────────────────────────────────────────────┐
+│  ConfigureEvaluator                                                          │
+│  ┌─────────────────────────────┐  ┌─────────────────────────────┐           │
+│  │  LEFT: Configuration Form   │  │  RIGHT: DebugSection        │           │
+│  │  - Name input               │  │  - Testcase selector        │           │
+│  │  - DynamicFormField[]       │  │  - Variant selector         │           │
+│  │  - AdvancedSettings         │  │  - Run variant button       │           │
+│  │  - Commit/Reset buttons     │  │  - Run evaluator button     │           │
+│  └─────────────────────────────┘  └─────────────────────────────┘           │
+│                                                                              │
+│  Commit Actions:                                                             │
+│  - Create: POST /evaluators/configs → createEvaluatorConfig()               │
+│  - Update: PUT /evaluators/configs/{id} → updateEvaluatorConfig()           │
+│                                                                              │
+│  Test Actions:                                                               │
+│  - Run Variant: callVariant() → POST to variant URL                         │
+│  - Run Evaluator: createEvaluatorRunExecution()                             │
+│                   → POST /evaluators/{key}/run                              │
+└─────────────────────────────────────────────────────────────────────────────┘
+```
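+
+The commit branch of this flow can be sketched as a small pure function. This is a sketch under stated assumptions, not the component's actual logic: the source only names the session fields `evaluator`, `existingConfigId`, and `mode`, so the mode values (`"create"`, `"edit"`, `"clone"`), the rule that clone mode creates a new config, and the names `PlaygroundSession`, `CommitRequest`, and `resolveLegacyCommit` are all hypothetical.
+
+```typescript
+// Hypothetical session shape; the mode values are assumed for illustration.
+type PlaygroundMode = "create" | "edit" | "clone"
+
+interface PlaygroundSession {
+    existingConfigId?: string
+    mode: PlaygroundMode
+}
+
+interface CommitRequest {
+    method: "POST" | "PUT"
+    path: string
+}
+
+function resolveLegacyCommit(session: PlaygroundSession): CommitRequest {
+    // Assumption: only edit mode with a known config ID updates in place;
+    // create and clone both produce a new config.
+    if (session.mode === "edit" && session.existingConfigId) {
+        return {method: "PUT", path: `/evaluators/configs/${session.existingConfigId}`}
+    }
+    return {method: "POST", path: "/evaluators/configs"}
+}
+```
+
+Keeping this decision in one place would mean only the two returned paths change when the commit actions move to `/preview/simple/evaluators/`.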
+
+## Current API Endpoints Used
+
+### Legacy Endpoints (to be migrated)
+
+| Endpoint | Method | Frontend Function | Purpose |
+|----------|--------|-------------------|---------|
+| `/evaluators/` | GET | `fetchAllEvaluators()` | List evaluator templates |
+| `/evaluators/configs/` | GET | `fetchAllEvaluatorConfigs()` | List evaluator configs |
+| `/evaluators/configs/` | POST | `createEvaluatorConfig()` | Create new config |
+| `/evaluators/configs/{id}/` | PUT | `updateEvaluatorConfig()` | Update existing config |
+| `/evaluators/configs/{id}/` | DELETE | `deleteEvaluatorConfig()` | Delete config |
+
+### Endpoints Outside the CRUD Migration
+
+| Endpoint | Method | Frontend Function | Purpose |
+|----------|--------|-------------------|---------|
+| `/evaluators/map/` | POST | `createEvaluatorDataMapping()` | Map trace data for RAG evaluators (unchanged) |
+| `/evaluators/{key}/run/` | POST | `createEvaluatorRunExecution()` | Run evaluator (test); unchanged in PR 1, replaced by `/preview/workflows/invoke` in PR 2 |
+
+### Already Using New Endpoints (for custom evaluators)
+
+| Endpoint | Method | Frontend Function | Purpose |
+|----------|--------|-------------------|---------|
+| `/preview/simple/evaluators/` | POST | `createEvaluator()` | Create custom evaluator |
+| `/preview/simple/evaluators/{id}` | PUT | `updateEvaluator()` | Update custom evaluator |
+| `/preview/simple/evaluators/{id}` | GET | `fetchEvaluatorById()` | Fetch evaluator by ID |
+| `/preview/simple/evaluators/{id}/archive` | POST | `deleteHumanEvaluator()` | Archive human evaluator |
+
+## Data Types
+
+### Current EvaluatorConfig (Legacy)
+
+```typescript
+interface EvaluatorConfig {
+    id: string
+    evaluator_key: string
+    name: string
+    settings_values: Record<string, any>
+    created_at: string
+    updated_at: string
+    color?: string
+    tags?: string[]
+    // Frontend additions
+    icon_url?: string | StaticImageData
+}
+```
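+
+To make the field mapping from the README concrete (`settings_values` becomes `data.parameters`, and `evaluator_key` is derived from `data.uri`), here is a minimal sketch of how a legacy record's fields would line up with the new shape. It is not a proposed adapter layer. `SimpleEvaluatorDraft` is a hypothetical subset of the new model, the presence of `name` on it is an assumption, and because the URI format is not specified here, the key-to-URI rule is passed in by the caller rather than guessed.
+
+```typescript
+// Minimal sketch: `SimpleEvaluatorDraft` is a hypothetical subset of the new shape.
+// Only `data.uri` and `data.parameters` are named in the migration notes.
+interface LegacyEvaluatorConfig {
+    id: string
+    evaluator_key: string
+    name: string
+    settings_values: Record<string, unknown>
+}
+
+interface SimpleEvaluatorDraft {
+    name: string
+    data: {
+        uri: string
+        parameters: Record<string, unknown>
+    }
+}
+
+function toSimpleEvaluatorDraft(
+    config: LegacyEvaluatorConfig,
+    // Caller-supplied rule, since the real URI format is not documented here.
+    uriForKey: (evaluatorKey: string) => string,
+): SimpleEvaluatorDraft {
+    return {
+        name: config.name,
+        data: {
+            uri: uriForKey(config.evaluator_key),
+            // Shallow copy so edits to the draft do not mutate the legacy object.
+            parameters: {...config.settings_values},
+        },
+    }
+}
+```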
+
+### Current Evaluator Template (Legacy)
+
+```typescript
+interface Evaluator {
+    name: string
+    key: string
+    settings_presets?: SettingsPreset[]
+    settings_template: Record<string, EvaluationSettingsTemplate>
+    icon_url?: string | StaticImageData
+    color?: string
+    direct_use?: boolean
+    description: string
+    oss?: boolean
+    requires_llm_api_keys?: boolean
+    tags: string[]
+    archived?: boolean
+}
+```