Eagle3 - update docs, enforce limitations #3939
Pull request overview
This PR adds Eagle3 support to the codebase by introducing sequential processing enforcement and configuration options. Eagle3 is a speculative decoding variant that requires stricter limitations than standard speculative decoding, including forced greedy sampling and single-request processing.
Changes:
- Added Eagle3-specific configuration fields and mutex-based sequential processing enforcement
- Created new `EAGLE3` decoding method with enforced greedy sampling (disabling random sampling and beam search)
- Updated documentation to clarify Eagle3 limitations and enforcement mechanisms
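The greedy-sampling enforcement listed above can be illustrated with a minimal sketch. It is not the code from this PR: `SketchGenerationConfig`, `SketchDecodingMethod`, and `enforceDecodingLimits` are hypothetical names, and the struct is a simplified stand-in rather than the real generation config type. The idea is only that, when the decoding method is Eagle3, any request-supplied sampling or beam search settings are overridden.

```cpp
#include <cstddef>

// Hypothetical, simplified stand-in for a generation config.
// The real builder in the PR works on a different, richer type.
struct SketchGenerationConfig {
    bool doSample = false;     // true means random (multinomial) sampling
    std::size_t numBeams = 1;  // values above 1 mean beam search
};

// Hypothetical enum mirroring the idea of a dedicated EAGLE3 decoding method.
enum class SketchDecodingMethod { STANDARD,
    SPECULATIVE_DECODING,
    EAGLE3 };

// For Eagle3, force greedy decoding regardless of what the request asked for.
// Other decoding methods keep the requested settings untouched.
inline void enforceDecodingLimits(SketchGenerationConfig& config, SketchDecodingMethod method) {
    if (method != SketchDecodingMethod::EAGLE3) {
        return;
    }
    config.doSample = false;  // disable random sampling
    config.numBeams = 1;      // disable beam search
}
```

Overriding the values silently is one possible policy; rejecting the request with an error would be an equally valid design, and the sketch does not claim which one the PR uses.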
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| src/llm/servable.hpp | Added mutex and lock fields to enforce sequential processing for Eagle3 |
| src/llm/servable.cpp | Added Eagle3 mode detection and decoding method assignment |
| src/llm/llm_calculator.proto | Added draft_eagle3_mode configuration option and renumbered subsequent fields |
| src/llm/language_model/continuous_batching/servable_initializer.cpp | Set eagle3Mode property from node options |
| src/llm/io_processing/base_generation_config_builder.hpp | Added EAGLE3 enum value and documentation |
| src/llm/io_processing/base_generation_config_builder.cpp | Implemented Eagle3 configuration enforcement (greedy sampling only) |
| src/llm/http_llm_calculator.cc | Added lock acquisition/release logic for Eagle3 sequential processing |
| demos/continuous_batching/speculative_decoding/README.md | Updated documentation with enforcement details |
| demos/common/export_models/export_model.py | Renamed flag and updated template to enforce max_num_seqs=1 for Eagle3 |
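The table mentions a mutex on the servable and lock acquisition/release in the calculator. A rough sketch of that pattern follows, under stated assumptions: `SketchServable`, `SequentialGuard`, and their members are hypothetical names, not the fields added in `servable.hpp`, and the real code may acquire and release the lock explicitly rather than through an RAII guard.

```cpp
#include <mutex>

// Hypothetical stand-in for the servable; only the parts relevant to the sketch.
struct SketchServable {
    bool eagle3Mode = false;     // assumed to be set from node options at initialization
    std::mutex sequentialMutex;  // serializes requests when eagle3Mode is true
};

// RAII guard: takes the mutex only in Eagle3 mode and releases it on destruction,
// so at most one request at a time is processed for an Eagle3 servable.
class SequentialGuard {
    std::unique_lock<std::mutex> lock;  // empty (owns nothing) unless Eagle3 mode is on

public:
    explicit SequentialGuard(SketchServable& servable) {
        if (servable.eagle3Mode) {
            lock = std::unique_lock<std::mutex>(servable.sequentialMutex);
        }
    }
    bool holdsLock() const { return lock.owns_lock(); }
};
```

A request handler would construct a `SequentialGuard` at the start of processing and let it go out of scope once the response is complete. Servables that are not in Eagle3 mode skip the lock and keep normal concurrent batching.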
namespace ovms {
// TODO: Monitor Eagle3 sampling support in GenAI and update this when Eagle3 supports more sampling strategies.
Add a CVS ticket reference?
No ticket exists. As far as I know, other sampling strategies degrade performance, so no ticket has been created to support them.
dkalinowski left a comment
comments
No description provided.