Parakeet: Support quantization on XNNPACK #17175
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17175
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure
As of commit 22e744b with merge base eee5d96:
NEW FAILURE - The following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Pull request overview
This PR adds support for dynamic quantization (8da4w) on the XNNPACK backend for the Parakeet TDT speech recognition model.
Changes:
- Adds HQQ (Half-Quadratic Quantization) scale-only algorithm for 8da4w quantization configuration
- Enables XNNPACK backend to handle both dynamically quantized operations and remaining floating-point operations using dual partitioners
- Adds documentation and examples for using dynamic quantization with XNNPACK
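For readers unfamiliar with the 8da4w naming: it refers to 8-bit dynamically quantized activations combined with 4-bit weights. The following is a minimal pure-Python sketch of that arithmetic for a single dot product. It is not the PR's code and not torchao's implementation; it uses a plain max-abs scale rather than HQQ, and every function name and the group size are hypothetical choices made for illustration.

```python
from typing import List, Tuple


def quantize_weights_4bit(row: List[float], group_size: int) -> Tuple[List[int], List[float]]:
    """Symmetric per-group 4-bit quantization: one scale per group, no zero point.

    Assumption: a simple max-abs scale. HQQ would pick the scale differently.
    """
    q: List[int] = []
    scales: List[float] = []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        max_abs = max(abs(v) for v in group)
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        scales.append(scale)
        # Signed 4-bit integers cover [-8, 7].
        q.extend(max(-8, min(7, round(v / scale))) for v in group)
    return q, scales


def quantize_activations_8bit(x: List[float]) -> Tuple[List[int], float, int]:
    """Asymmetric int8 quantization whose parameters come from x itself.

    "Dynamic" means the scale and zero point are computed at runtime
    from the actual input range instead of being calibrated ahead of time.
    """
    lo, hi = min(min(x), 0.0), max(max(x), 0.0)
    scale = (hi - lo) / 255.0 or 1.0
    zero_point = round(-128 - lo / scale)
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in x]
    return q, scale, zero_point


def quantized_dot(x: List[float], w_row: List[float], group_size: int) -> float:
    """Approximate dot(x, w_row) using integer accumulation per weight group."""
    qx, sx, zx = quantize_activations_8bit(x)
    qw, sw = quantize_weights_4bit(w_row, group_size)
    total = 0.0
    for g, start in enumerate(range(0, len(w_row), group_size)):
        end = min(start + group_size, len(w_row))
        acc = sum((qx[i] - zx) * qw[i] for i in range(start, end))
        # Rescale the integer accumulator with the activation and group scales.
        total += acc * sx * sw[g]
    return total


x = [0.5, -1.0, 2.0, 0.25]
w = [0.1, -0.2, 0.3, 0.7]
exact = sum(a * b for a, b in zip(x, w))
approx = quantized_dot(x, w, group_size=2)
print(f"exact={exact:.4f} approx={approx:.4f}")
```

Smaller groups give each scale fewer weights to cover, which generally lowers the quantization error at the cost of storing more scales.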
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `examples/models/parakeet/quantize.py` | Adds `intx_choose_qparams_algorithm="hqq_scale_only"` parameter to 8da4w quantization config for improved quantization quality with grouped quantization |
| `examples/models/parakeet/export_parakeet_tdt.py` | Imports and uses `XnnpackDynamicallyQuantizedPartitioner` alongside `XnnpackPartitioner` for handling dynamic quantization ops; enables quantization fusion and constant propagation |
| `examples/models/parakeet/README.md` | Adds documentation and example command for using 8da4w dynamic quantization with XNNPACK backend |
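The dual-partitioner idea in the table can be pictured with a toy model. The sketch below is not ExecuTorch code: the node kinds, predicates, and `assign_partitions` helper are all hypothetical. It assumes that partitioners are tried in list order, with the dynamically quantized partitioner first, so that the general partitioner only sees nodes the first one did not claim. The actual ordering used in the PR is not stated in this conversation.

```python
from typing import Callable, Dict, List, Tuple

Node = Dict[str, str]  # toy node, e.g. {"name": "fc1", "kind": "dq_linear"}


def is_dynamic_quant_linear(node: Node) -> bool:
    # Hypothetical predicate: stands in for the dynamically quantized partitioner.
    return node["kind"] == "dq_linear"


def is_xnnpack_supported(node: Node) -> bool:
    # Hypothetical predicate: stands in for the general floating-point partitioner.
    return node["kind"] in {"dq_linear", "linear", "conv", "add"}


def assign_partitions(
    nodes: List[Node],
    partitioners: List[Tuple[str, Callable[[Node], bool]]],
) -> Dict[str, str]:
    """Give each node to the first partitioner, in list order, that accepts it."""
    assignment: Dict[str, str] = {}
    for node in nodes:
        assignment[node["name"]] = "unclaimed"
        for label, accepts in partitioners:
            if accepts(node):
                assignment[node["name"]] = label
                break
    return assignment


graph = [
    {"name": "fc1", "kind": "dq_linear"},
    {"name": "conv1", "kind": "conv"},
    {"name": "custom_op", "kind": "custom"},
]
result = assign_partitions(
    graph,
    [("dynamic_quant", is_dynamic_quant_linear), ("fp32", is_xnnpack_supported)],
)
print(result)
```

In this toy model, swapping the list order would let the general predicate claim the quantized linear node first, which is why the ordering matters in the sketch.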
Can you add a CI test? I don't know which job specifically, but you should be able to run this script: https://github.com/pytorch/executorch/blob/main/.ci/scripts/test_model_e2e.sh
Duplicate of #17216
model.pte is 719.2 MB
Runtime output: