Agent params #792

alexmanle · 2026-01-27T22:03:42Z

Summary

Adds a simple interface to override agents hyperparameters.

Test Plan

Tested using AIConfigurator workload and GA agent.
Test scenario

Example TOML:

[agent_config]
population_size = 0
n_offsprings = 0
crossover_prob = 0.0
mutation_prob = 0.0
random_seed = 0

New terminal output:

2026-01-27 13:05:11,633 - INFO - Applying agent config overrides for 'ga': {'n_offsprings': 0, 'crossover_prob': 0.0, 'mutation_prob': 0.0, 'random_seed': 0}

coderabbitai · 2026-01-27T22:04:28Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

🔍 Trigger a full review

📝 Walkthrough

Walkthrough

Added Pydantic agent configuration models, extended TestDefinition with an optional agent_config field, and implemented runtime validation of agent overrides in CLI handlers that validates per-agent config types and conditionally supplies validated kwargs to agent constructors.

Changes

Cohort / File(s)	Summary
Agent configuration models `src/cloudai/models/agent_config.py`	New Pydantic module introducing `AgentConfig` (base, extra="forbid") and specialized models: `GeneticAlgorithmConfig`, `BayesianOptimizationConfig`, `MultiArmedBanditConfig` with typed, constrained optional fields and metadata.
Workload model update `src/cloudai/models/workload.py`	Added `agent_config: Optional[dict[str, Any]] = None` to `TestDefinition` to carry agent override data.
Runtime validation & handler changes `src/cloudai/cli/handlers.py`	Added `validate_agent_overrides(agent_type, agent_config) -> dict[str, Any]`, imported `ValidationError` and agent config models, and updated agent construction flow to validate overrides per-agent type, log field-level ValidationError details, skip invalid overrides, and pass validated kwargs when present.
Miscellaneous `manifest_file`, `requirements.txt`	Updated manifest/requirements to reflect new module/dependencies (lines added).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐇 I nibble on configs, tidy and neat,
Seeds for GA, BO, and arms that compete,
Pydantic keeps rules, no surprises today,
Validated overrides guide each agent's play,
Hop, ship, and repeat — carrots all the way! 🥕

🚥 Pre-merge checks | ✅ 1 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Title check	❓ Inconclusive	The title 'Agent params' is vague and generic, using non-descriptive language that doesn't convey the specific nature of the changeset.	Use a more specific title that describes the main feature, such as 'Add agent hyperparameter override configuration' or 'Support agent config overrides in DSE jobs'.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Description check	✅ Passed	The description clearly explains the purpose, implementation, and testing of the agent hyperparameter override feature with concrete examples.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

greptile-apps · 2026-01-27T22:07:09Z

Greptile Overview

Greptile Summary

This PR introduces a configuration override mechanism for agents, allowing users to customize agent hyperparameters through TOML configuration files.

Key Changes:

Added base AgentConfig model with pydantic validation and extra="forbid" to prevent typos
Extended BaseAgent with optional config class attribute for agent-specific config schemas
Added agent_config field to TestDefinition to pass overrides from TOML files
Implemented validate_agent_overrides() function that validates config against agent's schema and provides helpful error messages
Agent instantiation now supports **kwargs when overrides are present

Issues Found:

Error handling in handlers.py:157 could raise a secondary ValueError when trying to display valid fields if the agent doesn't support config, masking the original ValidationError

The implementation follows pydantic best practices with strict validation and clear error messages. Most previously reported issues have been addressed.

Confidence Score: 4/5

PR is generally safe to merge with one minor error handling issue to address
The implementation is well-structured with proper validation and error handling. Most concerns from previous threads have been addressed. The one remaining issue is a potential error masking problem in the exception handler that could confuse users, but it won't cause data corruption or runtime failures in production.
src/cloudai/cli/handlers.py requires attention for error handling improvement at line 157

Important Files Changed

Filename	Overview
src/cloudai/cli/handlers.py	Added agent config validation and override mechanism with proper error handling
src/cloudai/configurator/base_agent.py	Added optional config class attribute to BaseAgent for agent configuration support
src/cloudai/models/agent_config.py	Introduced base AgentConfig model with random_seed field and strict validation
src/cloudai/models/workload.py	Added agent_config field to TestDefinition to support configuration overrides

Sequence Diagram

sequenceDiagram
    participant User
    participant handler as handle_dse_job
    participant validator as validate_agent_overrides
    participant Registry
    participant AgentClass
    participant Agent

    User->>handler: DSE job with agent_config
    handler->>validator: validate_agent_overrides(agent_type, agent_config)
    validator->>Registry: get agents_map
    Registry-->>validator: agent classes
    validator->>validator: build config_class_map
    
    alt agent has no config class
        validator-->>handler: raise ValueError
        handler->>User: log error and continue
    else agent has config class
        validator->>AgentClass: config.model_validate(agent_config)
        
        alt validation fails
            AgentClass-->>validator: raise ValidationError
            validator-->>handler: raise ValidationError
            handler->>validator: call again without agent_config (for error details)
            validator-->>handler: return field descriptions
            handler->>User: log validation errors and valid fields
        else validation succeeds
            AgentClass-->>validator: validated_config
            validator->>validator: model_dump(exclude_none=True)
            validator-->>handler: agent_kwargs dict
            handler->>AgentClass: agent_class(env, **agent_kwargs)
            AgentClass-->>handler: agent instance
            handler->>Agent: execute agent steps
        end
    end

greptile-apps

_{3 files reviewed, 3 comments}

_{Edit Code Review Agent Settings | Greptile}

src/cloudai/cli/handlers.py

src/cloudai/models/agent_config.py

src/cloudai/cli/handlers.py

coderabbitai

Actionable comments posted: 3

🤖 Fix all issues with AI agents

In `@src/cloudai/cli/handlers.py`:
- Around line 198-201: The code silently drops a user-supplied agent_config when
no validator exists; update the branch that checks config_class =
config_class_map.get(agent_type) to emit a logging.warning (not debug) when an
agent_config was provided and will be ignored, referencing agent_type and
agent_config in the message so callers understand overrides were ignored; keep
returning {} but ensure the warning clearly states the agent_type and that
agent_config will be ignored (optionally include a truncated/summary of
agent_config) to aid user visibility.

In `@src/cloudai/models/agent_config.py`:
- Around line 51-54: The algorithm Field currently accepts any string which
risks invalid runtime values; restrict and validate it by replacing the loose
Optional[str] with a strict set of allowed values (use a Python Enum or
typing.Literal for "ucb1", "ts" (thompson_sampling), "epsilon_greedy",
"softmax", "random") or add a Pydantic validator on the algorithm field in the
AgentConfig class to raise a clear validation error when an unsupported
algorithm is provided, and update the Field description to match the enforced
choices.
- Around line 42-45: The botorch_num_trials field currently allows any integer
but should only accept -1 or integers >= 1 (matching the semantic described and
the sibling sobol_num_trials). Add a Pydantic validator for botorch_num_trials
on the AgentConfig model (e.g., a method named validate_botorch_num_trials
decorated with `@validator`("botorch_num_trials")) that returns the value if it's
None or equals -1 or is >= 1, and raises a ValueError for other values (e.g., 0
or < -1).

src/cloudai/cli/handlers.py

coderabbitai · 2026-01-27T22:12:42Z

src/cloudai/models/agent_config.py

+    sobol_num_trials: Optional[int] = Field(default=None, ge=1, description="Number of SOBOL initialization trials")
+    botorch_num_trials: Optional[int] = Field(
+        default=None, description="Number of BoTorch trials (-1 for unlimited until max_steps)"
+    )


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the actual file structure and find agent_config.py fd -t f "agent_config.py" | head -20

Repository: NVIDIA/cloudai

Length of output: 92

🏁 Script executed:

# Read the agent_config.py file to see current state cat -n src/cloudai/models/agent_config.py

Repository: NVIDIA/cloudai

Length of output: 3888

🏁 Script executed:

# Search for botorch_num_trials usage throughout the codebase rg "botorch_num_trials" -B 2 -A 2

Repository: NVIDIA/cloudai

Length of output: 499

🏁 Script executed:

# Search for files that use BayesianOptimizationConfig or botorch_num_trials rg "BayesianOptimizationConfig|botorch_num_trials" -t py --files-with-matches

Repository: NVIDIA/cloudai

Length of output: 120

🏁 Script executed:

# Look for the BO agent implementation fd -t f ".*agent.*" -path "*/bayesian*" -o -path "**/optimization*" | grep -i agent

Repository: NVIDIA/cloudai

Length of output: 226

🏁 Script executed:

# Broader search for agent implementations fd -t f "agent" -type f

Repository: NVIDIA/cloudai

Length of output: 228

🏁 Script executed:

# Search more broadly for agent implementations fd -e py "agent" | head -30

Repository: NVIDIA/cloudai

Length of output: 152

🏁 Script executed:

# Also search for where BayesianOptimizationConfig is instantiated rg "BayesianOptimizationConfig" -B 3 -A 10 -t py

Repository: NVIDIA/cloudai

Length of output: 2822

🏁 Script executed:

# Search for botorch usage in the codebase rg "botorch" -i -B 3 -A 3

Repository: NVIDIA/cloudai

Length of output: 673

🏁 Script executed:

# Find the base_agent.py file and search for BO implementation cat src/cloudai/configurator/base_agent.py | head -100

Repository: NVIDIA/cloudai

Length of output: 2224

🏁 Script executed:

# Search for where botorch_num_trials is actually used in the agent rg "botorch_num_trials" -A 5 -B 5 src/cloudai/configurator/

Repository: NVIDIA/cloudai

Length of output: 40

🏁 Script executed:

# Look for any BO agent or related files fd -e py | xargs rg -l "class.*Bayesian|botorch_num_trials"

Repository: NVIDIA/cloudai

Length of output: 92

🏁 Script executed:

# Search for actual agent implementations/classes rg "class.*Agent.*:" -A 5 | head -100

Repository: NVIDIA/cloudai

Length of output: 4155

🏁 Script executed:

# Look at test_agents.py to see how these configs are tested cat tests/test_agents.py

Repository: NVIDIA/cloudai

Length of output: 2749

🏁 Script executed:

# Check if there are any actual BO agent implementations elsewhere fd -e py -x rg -l "num_trials" {} \;

Repository: NVIDIA/cloudai

Length of output: 94

🏁 Script executed:

# Look for any BO agent implementations or references fd -e py | xargs rg -l "BayesianOptimization|BoTorch" 2>/dev/null

Repository: NVIDIA/cloudai

Length of output: 120

🏁 Script executed:

# Check if there's a bayesian agent file fd -e py -path "*/agent*" -o -path "*/bayesian*"

Repository: NVIDIA/cloudai

Length of output: 226

🏁 Script executed:

# Search for where the agent configs are actually used rg "botorch_num_trials|sobol_num_trials" -B 5 -A 5

Repository: NVIDIA/cloudai

Length of output: 905

🏁 Script executed:

# Check what agent types are actually implemented ls -la src/cloudai/configurator/

Repository: NVIDIA/cloudai

Length of output: 559

🏁 Script executed:

# Check if BO agent exists or is planned fd "\.py$" | xargs rg -l "class.*Bayesian" 2>/dev/null

Repository: NVIDIA/cloudai

Length of output: 92

🏁 Script executed:

# Look at handlers.py more carefully to see how config is used cat src/cloudai/cli/handlers.py | grep -A 20 "BayesianOptimizationConfig"

Repository: NVIDIA/cloudai

Length of output: 1467

🏁 Script executed:

# Check if there's external documentation or plugin system for agents rg "entry_points\|entrypoint" -A 5 -B 5

Repository: NVIDIA/cloudai

Length of output: 40

🏁 Script executed:

# Search for any test or documentation mentioning BO agent rg "bayesian|bo agent" -i -B 2 -A 2

Repository: NVIDIA/cloudai

Length of output: 1069

Add validation constraint to botorch_num_trials.
The field currently accepts any integer, but the description specifies "(-1 for unlimited until max_steps)", implying only -1 or ≥1 are valid. Without validation, invalid values like 0 or -2 pass Pydantic validation. This is inconsistent with the sibling field sobol_num_trials (line 42), which has ge=1 constraint.

🛠 Proposed fix

-from pydantic import BaseModel, ConfigDict, Field +from pydantic import BaseModel, ConfigDict, Field, field_validator @@ class BayesianOptimizationConfig(AgentConfig): """Configuration overrides for Bayesian Optimization agent.""" @@ botorch_num_trials: Optional[int] = Field( default=None, description="Number of BoTorch trials (-1 for unlimited until max_steps)" ) + + `@field_validator`("botorch_num_trials") + `@classmethod` + def _validate_botorch_num_trials(cls, v: Optional[int]) -> Optional[int]: + if v is None: + return v + if v != -1 and v < 1: + raise ValueError("botorch_num_trials must be -1 or >= 1") + return v

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

sobol_num_trials: Optional[int] = Field(default=None, ge=1, description="Number of SOBOL initialization trials")

botorch_num_trials: Optional[int] = Field(

default=None, description="Number of BoTorch trials (-1 for unlimited until max_steps)"

)

sobol_num_trials: Optional[int] = Field(default=None, ge=1, description="Number of SOBOL initialization trials")

botorch_num_trials: Optional[int] = Field(

default=None, description="Number of BoTorch trials (-1 for unlimited until max_steps)"

)

`@field_validator`("botorch_num_trials")

`@classmethod`

def _validate_botorch_num_trials(cls, v: Optional[int]) -> Optional[int]:

if v is None:

return v

if v != -1 and v < 1:

raise ValueError("botorch_num_trials must be -1 or >= 1")

return v

🤖 Prompt for AI Agents

In `@src/cloudai/models/agent_config.py` around lines 42 - 45, The botorch_num_trials field currently allows any integer but should only accept -1 or integers >= 1 (matching the semantic described and the sibling sobol_num_trials). Add a Pydantic validator for botorch_num_trials on the AgentConfig model (e.g., a method named validate_botorch_num_trials decorated with `@validator`("botorch_num_trials")) that returns the value if it's None or equals -1 or is >= 1, and raises a ValueError for other values (e.g., 0 or < -1).

@alexmanle would be good to consider this. wdyt?

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Also would be good to test it with BO agent w/ AIConfigurator along with GA.

src/cloudai/models/agent_config.py

greptile-apps

_{2 files reviewed, 2 comments}

_{Edit Code Review Agent Settings | Greptile}

greptile-apps · 2026-01-27T22:15:12Z

src/cloudai/cli/handlers.py

+            err = 1
+            continue
+
+        agent = agent_class(env, **agent_overrides) if agent_overrides else agent_class(env)


Conditional passes kwargs only when non-empty, but empty dict is falsy. If agent_overrides = {}, the condition evaluates False and takes the else branch. Consider using is not None or explicit length check for clarity:

Suggested change

agent = agent_class(env, **agent_overrides) if agent_overrides else agent_class(env)

agent = agent_class(env, **agent_overrides) if agent_overrides is not None else agent_class(env)

However, given validate_agent_overrides always returns a dict, the current logic works but is subtle.

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

src/cloudai/cli/handlers.py

greptile-apps

_{4 files reviewed, 1 comment}

_{Edit Code Review Agent Settings | Greptile}

greptile-apps · 2026-01-31T00:38:03Z

src/cloudai/cli/handlers.py

+            for error in e.errors():
+                logging.error(f"  - {'.'.join(str(var_name) for var_name in error['loc'])}: {error['msg']}")
+            logging.error("Valid overrides: ")
+            for item, desc in validate_agent_overrides(agent_type).items():


calling validate_agent_overrides(agent_type) in error handler could raise another ValueError if agent doesn't support config, masking original validation error

Suggested change

for item, desc in validate_agent_overrides(agent_type).items():

logging.error(f"Invalid agent_config for agent '{agent_type}': ")

for error in e.errors():

logging.error(f" - {'.'.join(str(var_name) for var_name in error['loc'])}: {error['msg']}")

alexmanle added 2 commits January 27, 2026 13:46

agent hyperparam interface

1012aa0

fix formatting

bd551e4

alexmanle requested review from amaslenn, jeffnvidia and srivatsankrishnan as code owners January 27, 2026 22:03

greptile-apps bot reviewed Jan 27, 2026

View reviewed changes

src/cloudai/cli/handlers.py Outdated Show resolved Hide resolved

src/cloudai/models/agent_config.py Outdated Show resolved Hide resolved

src/cloudai/cli/handlers.py Outdated Show resolved Hide resolved

default pass no kwargs

c1ecf25

coderabbitai bot reviewed Jan 27, 2026

View reviewed changes

greptile-apps bot reviewed Jan 27, 2026

View reviewed changes

jeffnvidia approved these changes Jan 28, 2026

View reviewed changes

alexmanle marked this pull request as draft January 28, 2026 20:31

alexmanle changed the base branch from main to agent-dev January 29, 2026 17:51

alexmanle added 3 commits January 29, 2026 13:27

update error logging

9e7d6c3

fix docstring

e7fee0c

make agent interface abstract

cfb38f5

alexmanle marked this pull request as ready for review January 31, 2026 00:34

greptile-apps bot reviewed Jan 31, 2026

View reviewed changes

-    sobol_num_trials: Optional[int] = Field(default=None, ge=1, description="Number of SOBOL initialization trials")
-    botorch_num_trials: Optional[int] = Field(
-        default=None, description="Number of BoTorch trials (-1 for unlimited until max_steps)"
-    )
+    sobol_num_trials: Optional[int] = Field(default=None, ge=1, description="Number of SOBOL initialization trials")
+    botorch_num_trials: Optional[int] = Field(
+        default=None, description="Number of BoTorch trials (-1 for unlimited until max_steps)"
+    )
+    `@field_validator`("botorch_num_trials")
+    `@classmethod`
+    def _validate_botorch_num_trials(cls, v: Optional[int]) -> Optional[int]:
+        if v is None:
+            return v
+        if v != -1 and v < 1:
+            raise ValueError("botorch_num_trials must be -1 or >= 1")
+        return v

	agent = agent_class(env, **agent_overrides) if agent_overrides else agent_class(env)
	agent = agent_class(env, **agent_overrides) if agent_overrides is not None else agent_class(env)

Agent params #792

Are you sure you want to change the base?

Agent params #792

Conversation

alexmanle commented Jan 27, 2026

Summary

Test Plan

Uh oh!

coderabbitai bot commented Jan 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

greptile-apps bot commented Jan 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Overview

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot Jan 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

srivatsankrishnan Jan 27, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 27, 2026

Choose a reason for hiding this comment

Uh oh!

srivatsankrishnan Jan 27, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Jan 27, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Jan 31, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

coderabbitai bot commented Jan 27, 2026 •

edited

Loading

greptile-apps bot commented Jan 27, 2026 •

edited

Loading

coderabbitai bot Jan 27, 2026 •

edited

Loading