Changing --enable_prefix_caching param to be consistent with ovms cli api #3936
base: main
Pull request overview
This PR updates the --enable_prefix_caching parameter to align with OVMS CLI API conventions by changing it from an action flag to a boolean type with a default value of True. Since prefix caching is now enabled by default, all explicit usages of this parameter have been removed from documentation examples. Additionally, the "devstral" option has been added to the tool_parser choices.
Changes:
- Modified `--enable_prefix_caching` argument from `action='store_true'` to `type=bool` with `default=True`
- Removed `--enable_prefix_caching` flags from all command examples across documentation
- Added "devstral" as a new tool_parser option
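As a rough illustration of the argument change described above, the sketch below declares `--enable_prefix_caching` with `type=bool` and `default=True` and compares it with a stricter converter. It is not the code from `export_model.py`: `build_parser` and `str_to_bool` are hypothetical names introduced here, and the accepted spellings are an assumption. What is standard `argparse` behavior is that `type=bool` calls `bool()` on the raw string, so any non-empty value, including `"false"`, parses as `True`.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Hypothetical helper: map common true/false spellings to a bool."""
    normalized = value.strip().lower()
    if normalized in {"true", "1", "yes"}:
        return True
    if normalized in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser(bool_type=bool) -> argparse.ArgumentParser:
    """Build a parser with only the flag discussed in this PR (illustrative)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--enable_prefix_caching", type=bool_type, default=True)
    return parser


naive = build_parser(bool)          # declaration as described in the overview
strict = build_parser(str_to_bool)  # assumed alternative with explicit parsing

# bool("false") is True, so the naive parser cannot switch the option off this way.
naive_false = naive.parse_args(["--enable_prefix_caching", "false"]).enable_prefix_caching
strict_false = strict.parse_args(["--enable_prefix_caching", "false"]).enable_prefix_caching
default_value = strict.parse_args([]).enable_prefix_caching
```

If the script declares the option exactly as `type=bool`, passing `--enable_prefix_caching false` would still leave prefix caching enabled; a converter along the lines of `str_to_bool` is one possible way to make the explicit `false` value take effect.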
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| demos/continuous_batching/accuracy/README.md | Removed --enable_prefix_caching flag from export_model command example |
| demos/common/export_models/export_model.py | Changed enable_prefix_caching to boolean type with True default; added devstral to tool_parser choices |
| demos/common/export_models/README.md | Updated help text to reflect new boolean parameter type and added devstral to tool_parser options |
| demos/code_local_assistant/README.md | Removed --enable_prefix_caching flags from multiple export_model and ovms command examples |
@@ -21,7 +21,7 @@ mkdir models
 python demos/common/export_models/export_model.py text_generation --source_model meta-llama/Meta-Llama-3-8B-Instruct --weight-format fp16 --kv_cache_precision u8 --config_file_path models/config.json --model_repository_path models
 python demos/common/export_models/export_model.py text_generation --source_model meta-llama/Meta-Llama-3-8B --weight-format fp16 --kv_cache_precision u8 --config_file_path models/config.json --model_repository_path models
 python demos/common/export_models/export_model.py text_generation --source_model OpenGVLab/InternVL2_5-8B --weight-format fp16 --config_file_path models/config.json --model_repository_path models
Should I add `--enable_prefix_caching false` to these commands?
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
🛠 Summary
CVS-179961
Changing the --enable_prefix_caching param to be consistent with the OVMS CLI API, and dropping this param from the examples since it's true by default now.
Adding the devstral tool_parser option to export_model.
🧪 Checklist