84 changes: 84 additions & 0 deletions docs/design/migrate-evaluator-playground/README.md
@@ -0,0 +1,84 @@
# Migrate Evaluator Playground to New Evaluator Endpoints

## Overview

This planning workspace documents the migration of the Evaluator Playground frontend to use the new workflow-based evaluator endpoints. The backend team has migrated evaluators from the old `EvaluatorConfig` model to the new `SimpleEvaluator` (workflow-based) model.

## Migration Strategy

**Direct migration (no adapters)**, split into two PRs:

| PR | Scope | Description |
|----|-------|-------------|
| **PR 1** | CRUD | Migrate to `/preview/simple/evaluators/*`, change internal types to `SimpleEvaluator` |
| **PR 2** | Run | Migrate to `/preview/workflows/invoke`, add workflow service types |

See [plan.md](./plan.md) for detailed implementation steps.

## Context

- **PR #3527**: Backend migration that introduces new evaluator endpoints
- **Goal**: Full migration to new endpoints, no legacy code remaining

## Documents

| File | Description |
|------|-------------|
| [context.md](./context.md) | Background, motivation, problem statement, goals, and non-goals |
| [current-system.md](./current-system.md) | Detailed map of current Evaluator Playground implementation |
| [new-endpoints.md](./new-endpoints.md) | New evaluator endpoint shapes and differences from legacy |
| [research.md](./research.md) | Deep dive into evaluator execution architecture and URI-based handlers |
| [migration-options.md](./migration-options.md) | Why we chose direct migration over adapters |
| [risk-analysis.md](./risk-analysis.md) | Coupling points and risk areas for the migration |
| [plan.md](./plan.md) | **Main plan** - PR 1 (CRUD) and PR 2 (Run) implementation details |
| [status.md](./status.md) | Living document for progress updates and decisions |

## Key Mapping Changes

| Legacy | New |
|--------|-----|
| `EvaluatorConfig` | `SimpleEvaluator` |
| `evaluator_key` | derived from `data.uri` |
| `settings_values` | `data.parameters` |
| `GET /evaluators/configs/` | `POST /preview/simple/evaluators/query` |
| `POST /evaluators/configs/` | `POST /preview/simple/evaluators/` |
| `PUT /evaluators/configs/{id}/` | `PUT /preview/simple/evaluators/{id}` |
| `DELETE /evaluators/configs/{id}/` | `POST /preview/simple/evaluators/{id}/archive` |
| `POST /evaluators/{key}/run/` | `POST /preview/workflows/invoke` |
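
The endpoint rows of the table above can also be kept as data, which may help when auditing call sites during the migration. The following is a minimal sketch, not part of the planned implementation: `ENDPOINT_MIGRATIONS`, `stripTrailingSlash`, and `findReplacement` are hypothetical names, and the paths are templates copied from the table rather than concrete URLs.

```typescript
type HttpMethod = "GET" | "POST" | "PUT" | "DELETE"

interface Endpoint {
    method: HttpMethod
    path: string
}

interface EndpointMigration {
    legacy: Endpoint
    replacement: Endpoint
}

// Rows copied from the mapping table above. `{id}` and `{key}` are path
// placeholders, not concrete values.
const ENDPOINT_MIGRATIONS: readonly EndpointMigration[] = [
    {
        legacy: {method: "GET", path: "/evaluators/configs/"},
        replacement: {method: "POST", path: "/preview/simple/evaluators/query"},
    },
    {
        legacy: {method: "POST", path: "/evaluators/configs/"},
        replacement: {method: "POST", path: "/preview/simple/evaluators/"},
    },
    {
        legacy: {method: "PUT", path: "/evaluators/configs/{id}/"},
        replacement: {method: "PUT", path: "/preview/simple/evaluators/{id}"},
    },
    {
        legacy: {method: "DELETE", path: "/evaluators/configs/{id}/"},
        replacement: {method: "POST", path: "/preview/simple/evaluators/{id}/archive"},
    },
    {
        legacy: {method: "POST", path: "/evaluators/{key}/run/"},
        replacement: {method: "POST", path: "/preview/workflows/invoke"},
    },
]

// Treat "/x/" and "/x" as the same path template.
function stripTrailingSlash(path: string): string {
    return path.length > 1 && path.endsWith("/") ? path.slice(0, -1) : path
}

// Hypothetical helper: look up the replacement for a legacy endpoint template.
// Returns undefined when the endpoint is not part of this migration.
function findReplacement(method: HttpMethod, path: string): Endpoint | undefined {
    const wanted = stripTrailingSlash(path)
    const match = ENDPOINT_MIGRATIONS.find(
        (m) => m.legacy.method === method && stripTrailingSlash(m.legacy.path) === wanted,
    )
    return match ? match.replacement : undefined
}
```

Note that list and delete both change HTTP method (`GET` to `POST` query, `DELETE` to `POST` archive), so a path-only search and replace would not be enough.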

## Files Affected

### PR 1: CRUD Migration

| Area | Files |
|------|-------|
| Types | `web/oss/src/lib/Types.ts` |
| Services | `web/oss/src/services/evaluators/index.ts` |
| State | `web/oss/src/state/evaluators/atoms.ts` |
| Playground State | `web/oss/src/components/.../ConfigureEvaluator/state/atoms.ts` |
| Playground UI | `web/oss/src/components/.../ConfigureEvaluator/index.tsx` |
| Registry | `web/oss/src/components/Evaluators/index.tsx` |
| Registry Hook | `web/oss/src/components/Evaluators/hooks/useEvaluatorsRegistryData.ts` |
| Columns | `web/oss/src/components/Evaluators/assets/getColumns.tsx` |

### PR 2: Run Migration

| Area | Files |
|------|-------|
| Types | `web/oss/src/lib/Types.ts` (add workflow types) |
| Invoke Service | `web/oss/src/services/workflows/invoke.ts` (new) |
| Debug Section | `web/oss/src/components/.../ConfigureEvaluator/DebugSection.tsx` |

### Backend Reference (PR #3527)
- `api/oss/src/routers/evaluators_router.py` - Legacy endpoints (kept temporarily)
- `api/oss/src/apis/fastapi/evaluators/router.py` - New `SimpleEvaluators` router
- `api/oss/src/apis/fastapi/workflows/router.py` - Workflow invoke endpoint
- `api/oss/src/core/evaluators/dtos.py` - New data transfer objects

## Effort Estimate

| PR | Effort |
|----|--------|
| PR 1: CRUD | 4-5 days |
| PR 2: Run | 3-4 days |
| **Total** | **7-9 days** |
72 changes: 72 additions & 0 deletions docs/design/migrate-evaluator-playground/context.md
@@ -0,0 +1,72 @@
# Context: Migrate Evaluator Playground

## Background

The Agenta platform has undergone a significant architectural change: **evaluators are now workflows**. Evaluators therefore follow the same git-like versioning model as other workflows:
- **Artifact** (Evaluator) → **Variant** → **Revision**

Previously, evaluators were stored in a flat `EvaluatorConfigDB` table with simple key-value settings. The new model stores evaluators as `WorkflowArtifactDBE`, `WorkflowVariantDBE`, and `WorkflowRevisionDBE` records with richer metadata and versioning.
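
To make the Artifact → Variant → Revision hierarchy concrete, here is a rough TypeScript sketch. Every field name below is an assumption chosen for illustration; the actual `WorkflowArtifactDBE`, `WorkflowVariantDBE`, and `WorkflowRevisionDBE` records are defined by the backend and may be structured differently (for example, as flat rows linked by IDs rather than nested objects).

```typescript
// Illustrative shapes only; field names are hypothetical.
interface EvaluatorRevision {
    id: string
    version: number // assumed to increase with each new revision
    parameters: Record<string, unknown>
}

interface EvaluatorVariant {
    id: string
    revisions: EvaluatorRevision[]
}

interface EvaluatorArtifact {
    id: string
    name: string
    variants: EvaluatorVariant[]
}

// Pick the revision with the highest version number, if the variant has any.
function latestRevision(variant: EvaluatorVariant): EvaluatorRevision | undefined {
    let latest: EvaluatorRevision | undefined = undefined
    for (const revision of variant.revisions) {
        if (latest === undefined || revision.version > latest.version) {
            latest = revision
        }
    }
    return latest
}
```

In the old flat model there was a single settings record per config, so a helper like `latestRevision` had no equivalent; with revisions, "the current settings" becomes a question of which revision is selected.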

## Motivation

1. **Unified Architecture**: Evaluators, testsets, and apps now share the same git-like workflow model
2. **Better Versioning**: Evaluators can have multiple variants and revision history
3. **Richer Metadata**: New model supports URIs, schemas, scripts, and configuration in a structured way
4. **Future Extensibility**: Custom evaluators will be first-class citizens with the same capabilities as built-in ones

## Problem Statement

The Evaluator Playground frontend currently uses legacy endpoints:
- `GET /evaluators/` - List evaluator templates
- `GET/POST/PUT/DELETE /evaluators/configs/` - CRUD for evaluator configurations
- `POST /evaluators/{key}/run/` - Run evaluator in playground

The backend (PR #3527) has:
1. Migrated all evaluator configs to the new workflow-based model via DB migrations
2. Created new `SimpleEvaluators` endpoints at `/preview/simple/evaluators/`
3. Exposed native workflow execution at `/preview/workflows/invoke`
4. Kept legacy endpoints as thin wrappers (to be deprecated)

**The frontend needs to migrate to use the new endpoints directly.**

## Goals

1. **Replace legacy evaluator config CRUD** with new `SimpleEvaluator` endpoints
2. **Replace legacy evaluator run** with native workflow invoke (`/preview/workflows/invoke`)
3. **Update data models** in frontend to match new `SimpleEvaluator` shape (no adapters)
4. **Preserve UX** - no user-facing changes to the Evaluator Playground functionality
5. **Remove all legacy endpoint usage** - clean migration, no dual-path code

## Non-Goals

1. **Not changing the Evaluator Playground UI** - Only the data layer changes
2. **Not migrating evaluation batch runs** - Those already use the new workflow system internally
3. **Not introducing new evaluator features** - This is a pure endpoint migration

## Success Criteria

1. Evaluator Playground can create, edit, and delete evaluators using new `SimpleEvaluator` endpoints
2. Evaluator Playground can run evaluators using native workflow invoke
3. All existing evaluator configurations continue to work
4. No regression in evaluator testing functionality
5. No legacy endpoint calls remain in frontend code
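
Criterion 5 lends itself to a mechanical check. The sketch below is one possible approach, not an agreed-upon tool: `findLegacyCalls` and `LEGACY_PATTERNS` are hypothetical names, and the patterns cover only the config CRUD and run paths. The template list endpoint (`GET /evaluators/`) is left out because a plain pattern cannot reliably tell it apart from `/preview/simple/evaluators/`.

```typescript
interface LegacyHit {
    line: number // 1-based line number within the scanned text
    text: string // the offending line, trimmed
}

// Patterns for legacy paths named in this design. Assumption: call sites
// spell the path literally (string or template literal) on a single line.
const LEGACY_PATTERNS: readonly RegExp[] = [
    /\/evaluators\/configs\b/,
    /\/evaluators\/[^\/\s]+\/run\b/,
]

// Return every line of `source` that appears to reference a legacy endpoint.
function findLegacyCalls(source: string): LegacyHit[] {
    const hits: LegacyHit[] = []
    source.split("\n").forEach((text, index) => {
        if (LEGACY_PATTERNS.some((pattern) => pattern.test(text))) {
            hits.push({line: index + 1, text: text.trim()})
        }
    })
    return hits
}
```

Run over the contents of each frontend source file, an empty result would suggest, but not prove, that no legacy config or run calls remain; paths assembled across multiple lines would be missed.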

## Constraints

1. Must not break existing evaluator configurations
2. Must coordinate with backend team on endpoint availability (PR #3527)
3. Split into two PRs for reviewability (CRUD first, then Run)

## Migration Approach

**Direct migration (no adapters):**

| PR | Scope | Endpoints |
|----|-------|-----------|
| PR 1 | CRUD | `/preview/simple/evaluators/*` |
| PR 2 | Run | `/preview/workflows/invoke` |

This approach:
- Avoids tech debt from adapter layers
- Aligns internal types with backend models
- Keeps changes reviewable by splitting into two PRs
230 changes: 230 additions & 0 deletions docs/design/migrate-evaluator-playground/current-system.md
@@ -0,0 +1,230 @@
# Current System: Evaluator Playground

## Overview

The Evaluator Playground allows users to:
1. **Browse** evaluator templates (built-in evaluators)
2. **Create/Configure** evaluator configurations with custom settings
3. **Test** evaluators by running them against app variants and test cases
4. **Manage** (edit, clone, delete) existing evaluator configurations

## File Structure

### Entry Points (Pages)

| Path | Purpose |
|------|---------|
| `/web/oss/src/pages/w/[workspace_id]/p/[project_id]/evaluators/index.tsx` | Evaluators list page |
| `/web/oss/src/pages/w/[workspace_id]/p/[project_id]/evaluators/configure/[evaluator_id].tsx` | Configure evaluator page |

### Core Components

#### Evaluators Registry (`/web/oss/src/components/Evaluators/`)

| File | Purpose |
|------|---------|
| `index.tsx` | Main registry with table, search, tabs (automatic/human) |
| `hooks/useEvaluatorsRegistryData.ts` | Fetches and transforms evaluator data |
| `assets/getColumns.tsx` | Table column definitions |
| `components/SelectEvaluatorModal/` | Modal to select evaluator template for new config |
| `components/ConfigureEvaluator/index.tsx` | Page wrapper that loads data and initializes atoms |
| `components/DeleteEvaluatorsModal/` | Delete confirmation modal |

#### ConfigureEvaluator (Main UI)

Location: `/web/oss/src/components/pages/evaluations/autoEvaluation/EvaluatorsModal/ConfigureEvaluator/`

| File | Purpose |
|------|---------|
| `index.tsx` | Configuration form + test panel layout |
| `DebugSection.tsx` | Test evaluator panel (run variant, run evaluator) |
| `DynamicFormField.tsx` | Renders settings fields based on evaluator template |
| `AdvancedSettings.tsx` | Collapsible advanced parameters |
| `state/atoms.ts` | Jotai atoms for playground state |
| `variantUtils.ts` | Utility for building variants from revisions |

### State Management

#### Playground Atoms (`state/atoms.ts`)

```typescript
// Session state
playgroundSessionAtom // { evaluator, existingConfigId, mode }
playgroundEvaluatorAtom // Current evaluator template (derived)
playgroundIsEditModeAtom // Is editing existing config? (derived)
playgroundIsCloneModeAtom // Is cloning config? (derived)
playgroundEditValuesAtom // Current config values being edited

// Form state
playgroundFormRefAtom // Ant Design Form instance

// Test section state
playgroundSelectedVariantAtom // Selected variant for testing
playgroundSelectedTestsetIdAtom // Selected testset ID
playgroundSelectedRevisionIdAtom // Selected revision ID
playgroundSelectedTestcaseAtom // Testcase data
playgroundTraceTreeAtom // Trace output from running variant

// Persisted state (localStorage)
playgroundLastAppIdAtom // Last used app ID
playgroundLastVariantIdAtom // Last used variant ID

// Action atoms
initPlaygroundAtom // Initialize playground state
resetPlaygroundAtom // Reset all state
commitPlaygroundAtom // Update state after save
cloneCurrentConfigAtom // Switch to clone mode
```

#### Global Evaluator Atoms (`/web/oss/src/state/evaluators/atoms.ts`)

```typescript
evaluatorConfigsQueryAtomFamily // Query for evaluator configs
evaluatorsQueryAtomFamily // Query for evaluator templates
nonArchivedEvaluatorsAtom // Derived: non-archived evaluators
evaluatorByKeyAtomFamily // Find evaluator by key
```

### API Service Layer

#### Evaluators Service (`/web/oss/src/services/evaluators/index.ts`)

```typescript
// Evaluator Templates (legacy)
fetchAllEvaluators() // GET /evaluators

// Evaluator Configs (legacy)
fetchAllEvaluatorConfigs() // GET /evaluators/configs
createEvaluatorConfig() // POST /evaluators/configs
updateEvaluatorConfig() // PUT /evaluators/configs/{id}
deleteEvaluatorConfig() // DELETE /evaluators/configs/{id}

// Custom/Human Evaluators (new)
createEvaluator() // POST /preview/simple/evaluators/
updateEvaluator() // PUT /preview/simple/evaluators/{id}
fetchEvaluatorById() // GET /preview/simple/evaluators/{id}
deleteHumanEvaluator() // POST /preview/simple/evaluators/{id}/archive
```

#### Evaluator Run Service (`/web/oss/src/services/evaluations/api_ee/index.ts`)

```typescript
createEvaluatorDataMapping() // POST /evaluators/map
createEvaluatorRunExecution() // POST /evaluators/{key}/run
```

## Data Flow

```
┌─────────────────────────────────────────────────────────────────────────────┐
│ USER ACTIONS │
│ - Browse evaluators list │
│ - Create new evaluator config │
│ - Edit existing evaluator config │
│ - Test evaluator with variant + testcase │
└─────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│ ENTRY POINTS │
│ /evaluators → EvaluatorsRegistry │
│ ├─ Uses useEvaluatorsRegistryData() hook │
│ │ ├─ Calls fetchAllEvaluators() → GET /evaluators │
│ │ └─ Calls fetchAllEvaluatorConfigs() → GET /evaluators/configs │
│ │ │
│ ├─ "Create new" → SelectEvaluatorModal → /evaluators/configure/new │
│ └─ Click row → /evaluators/configure/{id} │
│ │
│ /evaluators/configure/{id} → ConfigureEvaluatorPage │
│ ├─ Loads evaluator template & existing config │
│ ├─ Initializes playgroundSessionAtom │
│ └─ Renders ConfigureEvaluator component │
└─────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│ ConfigureEvaluator │
│ ┌─────────────────────────────┐ ┌─────────────────────────────┐ │
│ │ LEFT: Configuration Form │ │ RIGHT: DebugSection │ │
│ │ - Name input │ │ - Testcase selector │ │
│ │ - DynamicFormField[] │ │ - Variant selector │ │
│ │ - AdvancedSettings │ │ - Run variant button │ │
│ │ - Commit/Reset buttons │ │ - Run evaluator button │ │
│ └─────────────────────────────┘ └─────────────────────────────┘ │
│ │
│ Commit Actions: │
│ - Create: POST /evaluators/configs → createEvaluatorConfig() │
│ - Update: PUT /evaluators/configs/{id} → updateEvaluatorConfig() │
│ │
│ Test Actions: │
│ - Run Variant: callVariant() → POST to variant URL │
│ - Run Evaluator: createEvaluatorRunExecution() │
│ → POST /evaluators/{key}/run │
└─────────────────────────────────────────────────────────────────────────────┘
```

## Current API Endpoints Used

### Legacy Endpoints (to be migrated)

| Endpoint | Method | Frontend Function | Purpose |
|----------|--------|-------------------|---------|
| `/evaluators/` | GET | `fetchAllEvaluators()` | List evaluator templates |
| `/evaluators/configs/` | GET | `fetchAllEvaluatorConfigs()` | List evaluator configs |
| `/evaluators/configs/` | POST | `createEvaluatorConfig()` | Create new config |
| `/evaluators/configs/{id}/` | PUT | `updateEvaluatorConfig()` | Update existing config |
| `/evaluators/configs/{id}/` | DELETE | `deleteEvaluatorConfig()` | Delete config |
| `/evaluators/{key}/run/` | POST | `createEvaluatorRunExecution()` | Run evaluator (test); moves to `/preview/workflows/invoke` in PR 2 |

### Endpoints That Remain Unchanged

| Endpoint | Method | Frontend Function | Purpose |
|----------|--------|-------------------|---------|
| `/evaluators/map/` | POST | `createEvaluatorDataMapping()` | Map trace data for RAG evaluators |

### Already Using New Endpoints (for custom evaluators)

| Endpoint | Method | Frontend Function | Purpose |
|----------|--------|-------------------|---------|
| `/preview/simple/evaluators/` | POST | `createEvaluator()` | Create custom evaluator |
| `/preview/simple/evaluators/{id}` | PUT | `updateEvaluator()` | Update custom evaluator |
| `/preview/simple/evaluators/{id}` | GET | `fetchEvaluatorById()` | Fetch evaluator by ID |
| `/preview/simple/evaluators/{id}/archive` | POST | `deleteHumanEvaluator()` | Archive human evaluator |

## Data Types

### Current EvaluatorConfig (Legacy)

```typescript
interface EvaluatorConfig {
    id: string
    evaluator_key: string
    name: string
    settings_values: Record<string, any>
    created_at: string
    updated_at: string
    color?: string
    tags?: string[]
    // Frontend additions
    icon_url?: string | StaticImageData
}
```
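
The README for this design maps `evaluator_key` to a value derived from `data.uri` and `settings_values` to `data.parameters`. As a rough illustration of what "existing configurations continue to work" could mean at the data level, the sketch below compares a legacy record with its migrated counterpart. It is a verification aid under stated assumptions, not an adapter for production code: `SimpleEvaluatorFields` is a guessed minimal view of the new shape, `carriesSameSettings` is a hypothetical name, and because this document does not specify the URI format, the key extraction is passed in by the caller rather than hard-coded.

```typescript
// Subset of the legacy shape shown above.
interface LegacyConfigFields {
    evaluator_key: string
    settings_values: Record<string, unknown>
}

// Assumed minimal view of the new shape. Only `data.uri` and
// `data.parameters` come from the design's mapping table.
interface SimpleEvaluatorFields {
    data: {
        uri: string
        parameters: Record<string, unknown>
    }
}

// Caller-supplied rule for deriving the evaluator key from a URI.
type KeyFromUri = (uri: string) => string

// True when the migrated record resolves to the same evaluator key and holds
// the same top-level settings. Values are compared via JSON.stringify, so
// nested objects must have matching key order to count as equal.
function carriesSameSettings(
    legacy: LegacyConfigFields,
    migrated: SimpleEvaluatorFields,
    keyFromUri: KeyFromUri,
): boolean {
    if (keyFromUri(migrated.data.uri) !== legacy.evaluator_key) return false
    const legacyKeys = Object.keys(legacy.settings_values).sort()
    const newKeys = Object.keys(migrated.data.parameters).sort()
    if (legacyKeys.length !== newKeys.length) return false
    return legacyKeys.every(
        (key, i) =>
            key === newKeys[i] &&
            JSON.stringify(legacy.settings_values[key]) ===
                JSON.stringify(migrated.data.parameters[key]),
    )
}
```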

### Current Evaluator Template (Legacy)

```typescript
interface Evaluator {
    name: string
    key: string
    settings_presets?: SettingsPreset[]
    settings_template: Record<string, EvaluationSettingsTemplate>
    icon_url?: string | StaticImageData
    color?: string
    direct_use?: boolean
    description: string
    oss?: boolean
    requires_llm_api_keys?: boolean
    tags: string[]
    archived?: boolean
}
```