Collaborator

@mzegla mzegla commented Jan 30, 2026

No description provided.

Copilot AI review requested due to automatic review settings January 30, 2026 12:19
@mzegla mzegla added the 2026.0 label Jan 30, 2026
Contributor

Copilot AI left a comment

Pull request overview

This PR adds Eagle3 support to the codebase by introducing sequential processing enforcement and configuration options. Eagle3 is a speculative decoding variant that requires stricter limitations than standard speculative decoding, including forced greedy sampling and single-request processing.

Changes:

  • Added Eagle3-specific configuration fields and mutex-based sequential processing enforcement
  • Created new EAGLE3 decoding method with enforced greedy sampling (disabling random sampling and beam search)
  • Updated documentation to clarify Eagle3 limitations and enforcement mechanisms
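
The greedy-sampling enforcement in the second bullet can be sketched roughly as below. This is a minimal illustration, not the PR's actual code: `SketchGenerationConfig`, `SketchDecodingMethod`, and `enforceEagle3Limits` are hypothetical names invented for this example, and the fields only loosely mirror the kind of settings a generation config usually carries.

```cpp
#include <cstddef>

// Hypothetical stand-in for a generation config. The real types used by
// the PR are not visible on this page, so these fields are assumptions.
struct SketchGenerationConfig {
    bool do_sample = false;     // true would enable random sampling
    std::size_t num_beams = 1;  // values above 1 would enable beam search
};

// Hypothetical enum; only the EAGLE3 value is named in the PR summary.
enum class SketchDecodingMethod { STANDARD, SPECULATIVE_DECODING, EAGLE3 };

// For Eagle3, override whatever the request asked for and fall back to
// greedy decoding: no random sampling, no beam search.
inline void enforceEagle3Limits(SketchDecodingMethod method,
                                SketchGenerationConfig& config) {
    if (method != SketchDecodingMethod::EAGLE3) {
        return;  // other decoding methods keep the requested settings
    }
    config.do_sample = false;
    config.num_beams = 1;
}
```

Overriding the values silently, rather than rejecting the request, is one possible design; whether the PR overrides or rejects is not stated in the summary.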

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| src/llm/servable.hpp | Added mutex and lock fields to enforce sequential processing for Eagle3 |
| src/llm/servable.cpp | Added Eagle3 mode detection and decoding method assignment |
| src/llm/llm_calculator.proto | Added draft_eagle3_mode configuration option and renumbered subsequent fields |
| src/llm/language_model/continuous_batching/servable_initializer.cpp | Set eagle3Mode property from node options |
| src/llm/io_processing/base_generation_config_builder.hpp | Added EAGLE3 enum value and documentation |
| src/llm/io_processing/base_generation_config_builder.cpp | Implemented Eagle3 configuration enforcement (greedy sampling only) |
| src/llm/http_llm_calculator.cc | Added lock acquisition/release logic for Eagle3 sequential processing |
| demos/continuous_batching/speculative_decoding/README.md | Updated documentation with enforcement details |
| demos/common/export_models/export_model.py | Renamed flag and updated template to enforce max_num_seqs=1 for Eagle3 |
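
The mutex-based sequential processing listed for `servable.hpp` and `http_llm_calculator.cc` could look something like the following sketch. It assumes an RAII guard around `std::mutex`; the PR summary only mentions "mutex and lock fields" and "lock acquisition/release logic", so `SketchServable`, `eagle3Mutex`, and `SketchRequestGuard` are hypothetical names and the real code may acquire and release the lock differently.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical servable holding the Eagle3 flag and the mutex that
// serializes requests. Names are illustrative, not taken from the PR.
struct SketchServable {
    bool eagle3Mode = false;
    std::mutex eagle3Mutex;
};

// RAII guard: takes the servable's mutex only when Eagle3 is enabled,
// so non-Eagle3 servables keep processing requests concurrently.
class SketchRequestGuard {
public:
    explicit SketchRequestGuard(SketchServable& servable)
        : lock_(servable.eagle3Mutex, std::defer_lock) {
        if (servable.eagle3Mode) {
            lock_.lock();  // blocks until the previous request finishes
        }
    }

    bool holdsLock() const { return lock_.owns_lock(); }

private:
    // Released automatically when the guard goes out of scope.
    std::unique_lock<std::mutex> lock_;
};
```

A request handler would construct one `SketchRequestGuard` at the start of processing and let it go out of scope once the response is complete, which keeps the release path correct even on early returns or exceptions.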


namespace ovms {

// TODO: Monitor Eagle3 sampling support in GenAI and update this when Eagle3 supports more sampling strategies.
Collaborator

Add CVS?

Collaborator Author

No ticket exists. As far as I know, other sampling strategies degrade performance, so no ticket has been created to support them.

Collaborator

@dkalinowski dkalinowski left a comment

comments


3 participants