
Conversation


@Bhoy1 commented Jan 25, 2026

Description

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running uv run pytest locally.
  • New tests have been added to cover the changes

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Notes


Note

Adds first-class multi-agent support across the library with per-actor rollouts, scoring, and training.

  • Introduces Actor, Protocol, MultiAgentEnv, and MultiAgentRubric for multi-agent turn management, spawning, per-actor trajectory tagging, and per-actor rewards/advantages
  • MultiAgentEnv.generate() flattens game states into per-actor states and computes per-actor GRPO advantages (a rough sketch of this step follows the list); results now include actor_id
  • New MultiAgentOrchestrator drives training via Protocol.generate() and builds microbatches from flattened, per-actor trajectories
  • Exports updated in verifiers/__init__.py; eval_utils.save_rollout_results writes actor_id
  • Example environments: rock_paper_scissors (simultaneous moves with custom rollout) and twenty_questions (alternating turns, asymmetric actors), each with datasets and rubrics for per-actor rewards

Written by Cursor Bugbot for commit 1e5f474. This will update automatically on new commits.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.


@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

from .envs.actor import Actor # noqa # isort: skip
from .envs.protocol import EpisodeRequest, GenerateResult, Protocol # noqa # isort: skip
from .envs.multiagent_env import MultiAgentEnv # noqa # isort: skip
from .rubrics.multiagent_rubric import MultiAgentRubric # noqa # isort: skip


New multi-agent classes lack documentation updates

Medium Severity · Bugbot Rules

This PR adds major new user-facing classes (Actor, Protocol, MultiAgentEnv, MultiAgentRubric, MultiAgentOrchestrator) exported in __all__, but no corresponding documentation updates are included. Per the review rules, any PR adding core user-facing functionality needs to update relevant documentation in docs/environments.md, docs/training.md, and docs/reference.md.



env_name = inp.get("task") or self._get_default_env()
env = self.get_env(env_name)
if env.rubric:
    await env.rubric.score_rollout(state, score_sem=score_sem)


spawn() uses wrong scoring method for multi-agent rubrics

Medium Severity

The spawn() method calls score_rollout() for scoring when score=True (the default). However, MultiAgentRubric stores per-actor reward functions in actor_reward_funcs, which score_rollout() (inherited from parent Rubric) does not use—it only processes functions in self.funcs. Additionally, score_rollout() does not compute advantages, which are required for GRPO training. Spawned multi-agent states would have incomplete rewards and missing advantages.


@Bhoy1 marked this pull request as draft January 25, 2026 11:28
