Skip to content

Conversation

@alexmanle
Copy link

Summary

Adds a simple interface to override agents hyperparameters.

Test Plan

Tested using AIConfigurator workload and GA agent.
Test scenario

Example TOML:

[agent_config]
population_size = 0
n_offsprings = 0
crossover_prob = 0.0
mutation_prob = 0.0
random_seed = 0

New terminal output:

2026-01-27 13:05:11,633 - INFO - Applying agent config overrides for 'ga': {'n_offsprings': 0, 'crossover_prob': 0.0, 'mutation_prob': 0.0, 'random_seed': 0}

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 27, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

  • 🔍 Trigger a full review
📝 Walkthrough

Walkthrough

Added Pydantic agent configuration models, extended TestDefinition with an optional agent_config field, and implemented runtime validation of agent overrides in CLI handlers that validates per-agent config types and conditionally supplies validated kwargs to agent constructors.

Changes

Cohort / File(s) Summary
Agent configuration models
src/cloudai/models/agent_config.py
New Pydantic module introducing AgentConfig (base, extra="forbid") and specialized models: GeneticAlgorithmConfig, BayesianOptimizationConfig, MultiArmedBanditConfig with typed, constrained optional fields and metadata.
Workload model update
src/cloudai/models/workload.py
Added agent_config: Optional[dict[str, Any]] = None to TestDefinition to carry agent override data.
Runtime validation & handler changes
src/cloudai/cli/handlers.py
Added validate_agent_overrides(agent_type, agent_config) -> dict[str, Any], imported ValidationError and agent config models, and updated agent construction flow to validate overrides per-agent type, log field-level ValidationError details, skip invalid overrides, and pass validated kwargs when present.
Miscellaneous
manifest_file, requirements.txt
Updated manifest/requirements to reflect new module/dependencies (lines added).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐇 I nibble on configs, tidy and neat,
Seeds for GA, BO, and arms that compete,
Pydantic keeps rules, no surprises today,
Validated overrides guide each agent's play,
Hop, ship, and repeat — carrots all the way! 🥕

🚥 Pre-merge checks | ✅ 1 | ❌ 1
❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Title check ❓ Inconclusive The title 'Agent params' is vague and generic, using non-descriptive language that doesn't convey the specific nature of the changeset. Use a more specific title that describes the main feature, such as 'Add agent hyperparameter override configuration' or 'Support agent config overrides in DSE jobs'.
✅ Passed checks (1 passed)
Check name Status Explanation
Description check ✅ Passed The description clearly explains the purpose, implementation, and testing of the agent hyperparameter override feature with concrete examples.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Jan 27, 2026

Greptile Overview

Greptile Summary

This PR introduces a configuration override mechanism for agents, allowing users to customize agent hyperparameters through TOML configuration files.

Key Changes:

  • Added base AgentConfig model with pydantic validation and extra="forbid" to prevent typos
  • Extended BaseAgent with optional config class attribute for agent-specific config schemas
  • Added agent_config field to TestDefinition to pass overrides from TOML files
  • Implemented validate_agent_overrides() function that validates config against agent's schema and provides helpful error messages
  • Agent instantiation now supports **kwargs when overrides are present

Issues Found:

  • Error handling in handlers.py:157 could raise a secondary ValueError when trying to display valid fields if the agent doesn't support config, masking the original ValidationError

The implementation follows pydantic best practices with strict validation and clear error messages. Most previously reported issues have been addressed.

Confidence Score: 4/5

  • PR is generally safe to merge with one minor error handling issue to address
  • The implementation is well-structured with proper validation and error handling. Most concerns from previous threads have been addressed. The one remaining issue is a potential error masking problem in the exception handler that could confuse users, but it won't cause data corruption or runtime failures in production.
  • src/cloudai/cli/handlers.py requires attention for error handling improvement at line 157

Important Files Changed

Filename Overview
src/cloudai/cli/handlers.py Added agent config validation and override mechanism with proper error handling
src/cloudai/configurator/base_agent.py Added optional config class attribute to BaseAgent for agent configuration support
src/cloudai/models/agent_config.py Introduced base AgentConfig model with random_seed field and strict validation
src/cloudai/models/workload.py Added agent_config field to TestDefinition to support configuration overrides

Sequence Diagram

sequenceDiagram
    participant User
    participant handler as handle_dse_job
    participant validator as validate_agent_overrides
    participant Registry
    participant AgentClass
    participant Agent

    User->>handler: DSE job with agent_config
    handler->>validator: validate_agent_overrides(agent_type, agent_config)
    validator->>Registry: get agents_map
    Registry-->>validator: agent classes
    validator->>validator: build config_class_map
    
    alt agent has no config class
        validator-->>handler: raise ValueError
        handler->>User: log error and continue
    else agent has config class
        validator->>AgentClass: config.model_validate(agent_config)
        
        alt validation fails
            AgentClass-->>validator: raise ValidationError
            validator-->>handler: raise ValidationError
            handler->>validator: call again without agent_config (for error details)
            validator-->>handler: return field descriptions
            handler->>User: log validation errors and valid fields
        else validation succeeds
            AgentClass-->>validator: validated_config
            validator->>validator: model_dump(exclude_none=True)
            validator-->>handler: agent_kwargs dict
            handler->>AgentClass: agent_class(env, **agent_kwargs)
            AgentClass-->>handler: agent instance
            handler->>Agent: execute agent steps
        end
    end
Loading

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 files reviewed, 3 comments

Edit Code Review Agent Settings | Greptile

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@src/cloudai/cli/handlers.py`:
- Around line 198-201: The code silently drops a user-supplied agent_config when
no validator exists; update the branch that checks config_class =
config_class_map.get(agent_type) to emit a logging.warning (not debug) when an
agent_config was provided and will be ignored, referencing agent_type and
agent_config in the message so callers understand overrides were ignored; keep
returning {} but ensure the warning clearly states the agent_type and that
agent_config will be ignored (optionally include a truncated/summary of
agent_config) to aid user visibility.

In `@src/cloudai/models/agent_config.py`:
- Around line 51-54: The algorithm Field currently accepts any string which
risks invalid runtime values; restrict and validate it by replacing the loose
Optional[str] with a strict set of allowed values (use a Python Enum or
typing.Literal for "ucb1", "ts" (thompson_sampling), "epsilon_greedy",
"softmax", "random") or add a Pydantic validator on the algorithm field in the
AgentConfig class to raise a clear validation error when an unsupported
algorithm is provided, and update the Field description to match the enforced
choices.
- Around line 42-45: The botorch_num_trials field currently allows any integer
but should only accept -1 or integers >= 1 (matching the semantic described and
the sibling sobol_num_trials). Add a Pydantic validator for botorch_num_trials
on the AgentConfig model (e.g., a method named validate_botorch_num_trials
decorated with `@validator`("botorch_num_trials")) that returns the value if it's
None or equals -1 or is >= 1, and raises a ValueError for other values (e.g., 0
or < -1).

Comment on lines 42 to 45
sobol_num_trials: Optional[int] = Field(default=None, ge=1, description="Number of SOBOL initialization trials")
botorch_num_trials: Optional[int] = Field(
default=None, description="Number of BoTorch trials (-1 for unlimited until max_steps)"
)
Copy link
Contributor

@coderabbitai coderabbitai bot Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the actual file structure and find agent_config.py
fd -t f "agent_config.py" | head -20

Repository: NVIDIA/cloudai

Length of output: 92


🏁 Script executed:

# Read the agent_config.py file to see current state
cat -n src/cloudai/models/agent_config.py

Repository: NVIDIA/cloudai

Length of output: 3888


🏁 Script executed:

# Search for botorch_num_trials usage throughout the codebase
rg "botorch_num_trials" -B 2 -A 2

Repository: NVIDIA/cloudai

Length of output: 499


🏁 Script executed:

# Search for files that use BayesianOptimizationConfig or botorch_num_trials
rg "BayesianOptimizationConfig|botorch_num_trials" -t py --files-with-matches

Repository: NVIDIA/cloudai

Length of output: 120


🏁 Script executed:

# Look for the BO agent implementation
fd -t f ".*agent.*" -path "*/bayesian*" -o -path "**/optimization*" | grep -i agent

Repository: NVIDIA/cloudai

Length of output: 226


🏁 Script executed:

# Broader search for agent implementations
fd -t f "agent" -type f

Repository: NVIDIA/cloudai

Length of output: 228


🏁 Script executed:

# Search more broadly for agent implementations
fd -e py "agent" | head -30

Repository: NVIDIA/cloudai

Length of output: 152


🏁 Script executed:

# Also search for where BayesianOptimizationConfig is instantiated
rg "BayesianOptimizationConfig" -B 3 -A 10 -t py

Repository: NVIDIA/cloudai

Length of output: 2822


🏁 Script executed:

# Search for botorch usage in the codebase
rg "botorch" -i -B 3 -A 3

Repository: NVIDIA/cloudai

Length of output: 673


🏁 Script executed:

# Find the base_agent.py file and search for BO implementation
cat src/cloudai/configurator/base_agent.py | head -100

Repository: NVIDIA/cloudai

Length of output: 2224


🏁 Script executed:

# Search for where botorch_num_trials is actually used in the agent
rg "botorch_num_trials" -A 5 -B 5 src/cloudai/configurator/

Repository: NVIDIA/cloudai

Length of output: 40


🏁 Script executed:

# Look for any BO agent or related files
fd -e py | xargs rg -l "class.*Bayesian|botorch_num_trials"

Repository: NVIDIA/cloudai

Length of output: 92


🏁 Script executed:

# Search for actual agent implementations/classes
rg "class.*Agent.*:" -A 5 | head -100

Repository: NVIDIA/cloudai

Length of output: 4155


🏁 Script executed:

# Look at test_agents.py to see how these configs are tested
cat tests/test_agents.py

Repository: NVIDIA/cloudai

Length of output: 2749


🏁 Script executed:

# Check if there are any actual BO agent implementations elsewhere
fd -e py -x rg -l "num_trials" {} \;

Repository: NVIDIA/cloudai

Length of output: 94


🏁 Script executed:

# Look for any BO agent implementations or references
fd -e py | xargs rg -l "BayesianOptimization|BoTorch" 2>/dev/null

Repository: NVIDIA/cloudai

Length of output: 120


🏁 Script executed:

# Check if there's a bayesian agent file
fd -e py -path "*/agent*" -o -path "*/bayesian*"

Repository: NVIDIA/cloudai

Length of output: 226


🏁 Script executed:

# Search for where the agent configs are actually used
rg "botorch_num_trials|sobol_num_trials" -B 5 -A 5

Repository: NVIDIA/cloudai

Length of output: 905


🏁 Script executed:

# Check what agent types are actually implemented
ls -la src/cloudai/configurator/

Repository: NVIDIA/cloudai

Length of output: 559


🏁 Script executed:

# Check if BO agent exists or is planned
fd "\.py$" | xargs rg -l "class.*Bayesian" 2>/dev/null

Repository: NVIDIA/cloudai

Length of output: 92


🏁 Script executed:

# Look at handlers.py more carefully to see how config is used
cat src/cloudai/cli/handlers.py | grep -A 20 "BayesianOptimizationConfig"

Repository: NVIDIA/cloudai

Length of output: 1467


🏁 Script executed:

# Check if there's external documentation or plugin system for agents
rg "entry_points\|entrypoint" -A 5 -B 5

Repository: NVIDIA/cloudai

Length of output: 40


🏁 Script executed:

# Search for any test or documentation mentioning BO agent
rg "bayesian|bo agent" -i -B 2 -A 2

Repository: NVIDIA/cloudai

Length of output: 1069


Add validation constraint to botorch_num_trials.
The field currently accepts any integer, but the description specifies "(-1 for unlimited until max_steps)", implying only -1 or ≥1 are valid. Without validation, invalid values like 0 or -2 pass Pydantic validation. This is inconsistent with the sibling field sobol_num_trials (line 42), which has ge=1 constraint.

🛠 Proposed fix
-from pydantic import BaseModel, ConfigDict, Field
+from pydantic import BaseModel, ConfigDict, Field, field_validator
@@
 class BayesianOptimizationConfig(AgentConfig):
     """Configuration overrides for Bayesian Optimization agent."""
@@
     botorch_num_trials: Optional[int] = Field(
         default=None, description="Number of BoTorch trials (-1 for unlimited until max_steps)"
     )
+
+    `@field_validator`("botorch_num_trials")
+    `@classmethod`
+    def _validate_botorch_num_trials(cls, v: Optional[int]) -> Optional[int]:
+        if v is None:
+            return v
+        if v != -1 and v < 1:
+            raise ValueError("botorch_num_trials must be -1 or >= 1")
+        return v
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
sobol_num_trials: Optional[int] = Field(default=None, ge=1, description="Number of SOBOL initialization trials")
botorch_num_trials: Optional[int] = Field(
default=None, description="Number of BoTorch trials (-1 for unlimited until max_steps)"
)
sobol_num_trials: Optional[int] = Field(default=None, ge=1, description="Number of SOBOL initialization trials")
botorch_num_trials: Optional[int] = Field(
default=None, description="Number of BoTorch trials (-1 for unlimited until max_steps)"
)
`@field_validator`("botorch_num_trials")
`@classmethod`
def _validate_botorch_num_trials(cls, v: Optional[int]) -> Optional[int]:
if v is None:
return v
if v != -1 and v < 1:
raise ValueError("botorch_num_trials must be -1 or >= 1")
return v
🤖 Prompt for AI Agents
In `@src/cloudai/models/agent_config.py` around lines 42 - 45, The
botorch_num_trials field currently allows any integer but should only accept -1
or integers >= 1 (matching the semantic described and the sibling
sobol_num_trials). Add a Pydantic validator for botorch_num_trials on the
AgentConfig model (e.g., a method named validate_botorch_num_trials decorated
with `@validator`("botorch_num_trials")) that returns the value if it's None or
equals -1 or is >= 1, and raises a ValueError for other values (e.g., 0 or <
-1).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexmanle would be good to consider this. wdyt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also would be good to test it with BO agent w/ AIConfigurator along with GA.

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

err = 1
continue

agent = agent_class(env, **agent_overrides) if agent_overrides else agent_class(env)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conditional passes kwargs only when non-empty, but empty dict is falsy. If agent_overrides = {}, the condition evaluates False and takes the else branch. Consider using is not None or explicit length check for clarity:

Suggested change
agent = agent_class(env, **agent_overrides) if agent_overrides else agent_class(env)
agent = agent_class(env, **agent_overrides) if agent_overrides is not None else agent_class(env)

However, given validate_agent_overrides always returns a dict, the current logic works but is subtle.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@alexmanle alexmanle marked this pull request as draft January 28, 2026 20:31
@alexmanle alexmanle changed the base branch from main to agent-dev January 29, 2026 17:51
@alexmanle alexmanle marked this pull request as ready for review January 31, 2026 00:34
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

for error in e.errors():
logging.error(f" - {'.'.join(str(var_name) for var_name in error['loc'])}: {error['msg']}")
logging.error("Valid overrides: ")
for item, desc in validate_agent_overrides(agent_type).items():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

calling validate_agent_overrides(agent_type) in error handler could raise another ValueError if agent doesn't support config, masking original validation error

Suggested change
for item, desc in validate_agent_overrides(agent_type).items():
logging.error(f"Invalid agent_config for agent '{agent_type}': ")
for error in e.errors():
logging.error(f" - {'.'.join(str(var_name) for var_name in error['loc'])}: {error['msg']}")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants