Parakeet: Support quantization on XNNPACK #17216
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17216
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 29 Pending
As of commit 0406ebc with merge base 267a59d:
NEW FAILURE - The following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Pull request overview
This PR extends the Parakeet TDT export pipeline to support dynamic quantization for XNNPACK (8da4w), wires it into the ExecuTorch lowering path, documents the workflow, and adds CI coverage.
Changes:
- Updates Parakeet’s quantization config to use `intx_choose_qparams_algorithm="hqq_scale_only"` for the 8da4w dynamic-activation / 4-bit-weight path.
- Enhances the Parakeet XNNPACK export pipeline by using both dynamic-quant and general XNNPACK partitioners, enabling quant-fusion and const-prop in `ExecutorchBackendConfig`, and adds documentation and CLI examples for XNNPACK dynamic quantization.
- Extends CI and utility scripts (`export_model_artifact.sh`, `test_model_e2e.sh`, `pull.yml`) to support an `xnnpack` device mode and a new `quantized-8da4w` preset for Parakeet, with an end-to-end test job.
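
For readers unfamiliar with the 8da4w naming, here is a minimal pure-Python sketch of the numerics it refers to: activations are quantized to int8 dynamically at runtime, and weights are stored as 4-bit integers with one scale per group. It is not the torchao or XNNPACK implementation. The function names are hypothetical, and the weight scale is a plain absmax scale, which is an assumption standing in for the `hqq_scale_only` parameter search this PR actually selects.

```python
import random


def quantize_activations_int8(row):
    """Asymmetric int8 quantization with scale/zero-point computed at runtime."""
    lo, hi = min(min(row), 0.0), max(max(row), 0.0)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 for an all-zero row
    zero_point = round(-128 - lo / scale)
    q = [max(-128, min(127, round(x / scale) + zero_point)) for x in row]
    return q, scale, zero_point


def quantize_weights_int4(weights, group_size=32):
    """Symmetric 4-bit quantization, one scale per group (no zero point).

    Assumption: absmax scaling, used here only for illustration.
    """
    q, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        scale = max(abs(w) for w in group) / 7.0 or 1.0
        scales.append(scale)
        q.extend(max(-8, min(7, round(w / scale))) for w in group)
    return q, scales


def quantized_dot(x, w, group_size=32):
    """Dot product using integer accumulation per weight group."""
    xq, x_scale, x_zp = quantize_activations_int8(x)
    wq, w_scales = quantize_weights_int4(w, group_size)
    total = 0.0
    for g, w_scale in enumerate(w_scales):
        start = g * group_size
        stop = min(start + group_size, len(w))
        acc = sum((xq[i] - x_zp) * wq[i] for i in range(start, stop))
        total += acc * x_scale * w_scale  # rescale the integer accumulator
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.uniform(-1, 1) for _ in range(64)]
    w = [rng.uniform(-1, 1) for _ in range(64)]
    exact = sum(a * b for a, b in zip(x, w))
    print("float:", exact, "8da4w sketch:", quantized_dot(x, w))
```

With group size 32 (the value used in the README example), a 64-element weight row carries two scales. Smaller groups track the weight distribution more closely at the cost of storing more scales.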
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `examples/models/parakeet/quantize.py` | Adds `intx_choose_qparams_algorithm="hqq_scale_only"` to the 8da4w `Int8DynamicActivationIntxWeightConfig` to align Parakeet linear quantization behavior with HQQ-style scale-only parameter selection. |
| `examples/models/parakeet/export_parakeet_tdt.py` | Introduces `_create_xnnpack_partitioners` using both `XnnpackDynamicallyQuantizedPartitioner` and `XnnpackPartitioner`, and enables `do_quant_fusion_and_const_prop=True` in `ExecutorchBackendConfig` to optimize dynamic quantized XNNPACK exports. |
| `examples/models/parakeet/README.md` | Documents a concrete “Dynamic Quantization for XNNPACK” example using `--backend xnnpack` with 8da4w configs and group size 32. |
| `.github/workflows/pull.yml` | Adds a `test-parakeet-xnnpack-linux` workflow job that exports Parakeet with XNNPACK + `quantized-8da4w` and runs the e2e test runner. |
| `.ci/scripts/test_model_e2e.sh` | Generalizes the script to handle `xnnpack` as a device, adds the `quantized-8da4w` option (XNNPACK-only), and maps `xnnpack` to the `*-cpu` CMake target so the Parakeet CPU/XNNPACK runner can be built and exercised. |
| `.ci/scripts/export_model_artifact.sh` | Adds `xnnpack` as a supported device, introduces a `quantized-8da4w` quantization preset restricted to XNNPACK, and routes Parakeet exports on XNNPACK to fp32 (no `--dtype` override) while preserving bf16 for CUDA/Metal. |
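
The device and preset rules summarized for the two CI scripts can be sketched as a small lookup. This is an illustration in Python rather than the scripts' shell. The function name, the lowercase device strings, the returned keys, and the exact `--dtype bf16` spelling are assumptions; only the rules themselves come from the summary above.

```python
def resolve_export_options(device, quant=None):
    """Hypothetical mirror of the device/preset rules described for the CI scripts."""
    supported_devices = {"cuda", "metal", "xnnpack"}  # assumed device names
    if device not in supported_devices:
        raise ValueError(f"unsupported device: {device}")

    # The quantized-8da4w preset is restricted to XNNPACK.
    if quant == "quantized-8da4w" and device != "xnnpack":
        raise ValueError("quantized-8da4w is only supported on xnnpack")

    # XNNPACK exports stay in fp32 (no --dtype override); CUDA/Metal keep bf16.
    dtype_args = [] if device == "xnnpack" else ["--dtype", "bf16"]

    # XNNPACK reuses the CPU runner target (the "*-cpu" CMake target).
    runner_suffix = "cpu" if device == "xnnpack" else device

    return {"dtype_args": dtype_args, "runner_suffix": runner_suffix}


if __name__ == "__main__":
    print(resolve_export_options("xnnpack", "quantized-8da4w"))
    print(resolve_export_options("cuda"))
```

Rejecting an invalid device/preset pair up front lets a misconfigured CI job fail before a long export starts.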
f233f47 to 5d7d23f
5d7d23f to d5b145a
d5b145a to 0406ebc
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.
Old PR: #17175
CI: https://github.com/pytorch/executorch/actions/runs/21691143266/job/62550971847?pr=17216