Parakeet: Support quantization on XNNPACK by mergennachin · Pull Request #17216 · pytorch/executorch

mergennachin · 2026-02-04T19:58:03Z

Old PR: #17175

CI: https://github.com/pytorch/executorch/actions/runs/21691143266/job/62550971847?pr=17216

pytorch-bot · 2026-02-04T19:58:08Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17216

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 29 Pending

As of commit 0406ebc with merge base 267a59d:

NEW FAILURE - The following job has failed:

trunk / test-mcu-cortex-m-backend / linux-job (gh)
RuntimeError: Command docker exec -t 099c5b2dcad6e3467bf562382368c4601a2bcf9cd3c6a29285cd2d4ec7f4fbff /exec failed with exit code 1

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-02-04T19:58:46Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e., would users of this library care about this change?), please use a label starting with `release notes:`. This helps us keep track of your work and include it in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Copilot

Pull request overview

This PR extends the Parakeet TDT export pipeline to support dynamic quantization for XNNPACK (8da4w), wires it into the ExecuTorch lowering path, documents the workflow, and adds CI coverage.

Changes:

- Updates Parakeet’s quantization config to use `intx_choose_qparams_algorithm="hqq_scale_only"` for the 8da4w path (8-bit dynamic activations, 4-bit weights).
- Enhances the Parakeet XNNPACK export pipeline by using both the dynamic-quant and the general XNNPACK partitioners and by enabling quant fusion and const-prop in `ExecutorchBackendConfig`. Also adds documentation and CLI examples for XNNPACK dynamic quantization.
- Extends CI and utility scripts (`export_model_artifact.sh`, `test_model_e2e.sh`, `pull.yml`) to support an `xnnpack` device mode and a new `quantized-8da4w` preset for Parakeet, with an end-to-end test job.
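
As a rough illustration of what the 4-bit-weight half of 8da4w involves, the sketch below quantizes one weight row group by group, with a single scale per group and no zero point. It swaps in a plain absmax scale for the HQQ-based `hqq_scale_only` selection named above, and the function names are hypothetical. It is not the torchao implementation, only a minimal model of scale-only group-wise quantization.

```python
from typing import List, Tuple

INT4_MIN, INT4_MAX = -8, 7  # signed 4-bit integer range


def quantize_groupwise_int4(
    weights: List[float], group_size: int = 32
) -> Tuple[List[int], List[float]]:
    """Symmetric (scale-only) 4-bit quantization with one scale per group.

    Assumption: a simple absmax scale stands in for the HQQ-style
    scale selection used by the real config.
    """
    if len(weights) % group_size != 0:
        raise ValueError("row length must be a multiple of group_size")
    quantized: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        absmax = max(abs(w) for w in group)
        # Map the largest magnitude to 7; use 1.0 for an all-zero group.
        scale = absmax / INT4_MAX if absmax > 0 else 1.0
        scales.append(scale)
        for w in group:
            q = round(w / scale)
            quantized.append(max(INT4_MIN, min(INT4_MAX, q)))
    return quantized, scales


def dequantize_groupwise(
    quantized: List[int], scales: List[float], group_size: int = 32
) -> List[float]:
    """Reconstruct approximate weights from 4-bit codes and per-group scales."""
    return [q * scales[i // group_size] for i, q in enumerate(quantized)]
```

With group size 32, a row of 64 weights yields 64 four-bit codes and 2 scales, and each reconstructed weight differs from the original by at most half of its group's scale.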

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

Show a summary per file

| File | Description |
| --- | --- |
| `examples/models/parakeet/quantize.py` | Adds `intx_choose_qparams_algorithm="hqq_scale_only"` to the 8da4w `Int8DynamicActivationIntxWeightConfig` to align Parakeet linear quantization behavior with HQQ-style scale-only parameter selection. |
| `examples/models/parakeet/export_parakeet_tdt.py` | Introduces `_create_xnnpack_partitioners` using both `XnnpackDynamicallyQuantizedPartitioner` and `XnnpackPartitioner`, and enables `do_quant_fusion_and_const_prop=True` in `ExecutorchBackendConfig` to optimize dynamic quantized XNNPACK exports. |
| `examples/models/parakeet/README.md` | Documents a concrete “Dynamic Quantization for XNNPACK” example using `--backend xnnpack` with 8da4w configs and group size 32. |
| `.github/workflows/pull.yml` | Adds a `test-parakeet-xnnpack-linux` workflow job that exports Parakeet with XNNPACK + `quantized-8da4w` and runs the e2e test runner. |
| `.ci/scripts/test_model_e2e.sh` | Generalizes the script to handle `xnnpack` as a device, adds the `quantized-8da4w` option (XNNPACK-only), and maps `xnnpack` to the `*-cpu` CMake target so the Parakeet CPU/XNNPACK runner can be built and exercised. |
| `.ci/scripts/export_model_artifact.sh` | Adds `xnnpack` as a supported device, introduces a `quantized-8da4w` quantization preset restricted to XNNPACK, and routes Parakeet exports on XNNPACK to fp32 (no `--dtype` override) while preserving bf16 for CUDA/Metal. |
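
The device and preset rules described for the two CI scripts can be sketched as follows. The real logic lives in shell scripts; this Python version is only a hypothetical model of it. The function names, the device set, the `cuda`/`metal` backend values, and the `bf16` flag value are assumptions, and only `--backend xnnpack`, `--dtype`, `quantized-8da4w`, and the `*-cpu` target mapping come from the summary above.

```python
from typing import List

# Assumption: the devices mentioned in the review summary.
SUPPORTED_DEVICES = {"cuda", "metal", "xnnpack"}
# Presets that the summary describes as restricted to XNNPACK.
XNNPACK_ONLY_PRESETS = {"quantized-8da4w"}


def parakeet_export_args(device: str, preset: str) -> List[str]:
    """Return hypothetical export flags for a device and quantization preset."""
    if device not in SUPPORTED_DEVICES:
        raise ValueError(f"unsupported device: {device}")
    if preset in XNNPACK_ONLY_PRESETS and device != "xnnpack":
        raise ValueError(f"preset {preset} is only supported on xnnpack")
    args = ["--backend", device]
    if device != "xnnpack":
        # CUDA/Metal keep bf16; XNNPACK exports stay fp32 with no override.
        args += ["--dtype", "bf16"]
    return args


def runner_target_suffix(device: str) -> str:
    """Map a device to the suffix of its CMake runner target."""
    # XNNPACK runs on the CPU runner, so it reuses the *-cpu target.
    return "cpu" if device == "xnnpack" else device
```

Under these assumptions, `parakeet_export_args("xnnpack", "quantized-8da4w")` produces only the backend flag, while requesting the same preset on `cuda` is rejected before any export is attempted.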

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.

Copilot AI review requested due to automatic review settings February 4, 2026 19:58

mergennachin requested a review from lucylq as a code owner February 4, 2026 19:58

meta-cla bot added the CLA Signed label Feb 4, 2026

mergennachin mentioned this pull request Feb 4, 2026

Parakeet: Support quantization on XNNPACK #17175

Closed

mergennachin requested a review from larryliu0820 February 4, 2026 19:58

Copilot started reviewing on behalf of mergennachin February 4, 2026 19:58

Copilot AI reviewed Feb 4, 2026

mergennachin had a problem deploying to upload-benchmark-results February 4, 2026 20:26 — with GitHub Actions Failure

mergennachin force-pushed the parakeet_quantization branch from f233f47 to 5d7d23f Compare February 4, 2026 20:42

mergennachin requested a review from kirklandsign as a code owner February 4, 2026 20:42

mergennachin force-pushed the parakeet_quantization branch from 5d7d23f to d5b145a Compare February 4, 2026 20:43

mergennachin temporarily deployed to upload-benchmark-results February 4, 2026 22:02 — with GitHub Actions Inactive

Parakeet: Support quantization on XNNPACK

0406ebc

Copilot AI review requested due to automatic review settings February 4, 2026 22:38

mergennachin force-pushed the parakeet_quantization branch from d5b145a to 0406ebc Compare February 4, 2026 22:38

Copilot started reviewing on behalf of mergennachin February 4, 2026 22:39

Copilot AI reviewed Feb 4, 2026

larryliu0820 approved these changes Feb 4, 2026

mergennachin merged commit f2f337e into main Feb 4, 2026
296 of 300 checks passed

mergennachin deleted the parakeet_quantization branch February 4, 2026 23:22

mergennachin temporarily deployed to upload-benchmark-results February 4, 2026 23:34 — with GitHub Actions Inactive

mergennachin mentioned this pull request Feb 5, 2026

Metal backend: Add Metal int4 quantization support to Parakeet #17235

Merged

mergennachin merged 1 commit into main from parakeet_quantization