Parakeet: Support quantization on XNNPACK#17216

Merged: mergennachin merged 1 commit into main from parakeet_quantization on Feb 4, 2026

mergennachin commented Feb 4, 2026

Copilot AI review requested due to automatic review settings February 4, 2026 19:58
mergennachin requested a review from lucylq as a code owner February 4, 2026 19:58
pytorch-bot bot commented Feb 4, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17216

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 29 Pending

As of commit 0406ebc with merge base 267a59d (image):

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla bot added the CLA Signed label Feb 4, 2026
github-actions bot commented Feb 4, 2026

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Copilot AI left a comment

Pull request overview

This PR extends the Parakeet TDT export pipeline to support dynamic quantization for XNNPACK (8da4w), wires it into the ExecuTorch lowering path, documents the workflow, and adds CI coverage.
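For readers unfamiliar with the 8da4w shorthand (8-bit dynamic activations, 4-bit weights), the following is a minimal pure-Python sketch of the arithmetic involved. It is not the torchao or XNNPACK implementation: the function names are hypothetical, the weight path uses plain absmax scaling rather than HQQ, and the asymmetric activation mapping is an assumption made for illustration.

```python
from typing import List, Tuple


def quantize_weights_4bit(row: List[float], group_size: int = 32) -> Tuple[List[int], List[float]]:
    """Scale-only (symmetric) 4-bit quantization with one scale per group."""
    if len(row) % group_size != 0:
        raise ValueError("row length must be a multiple of group_size")
    q: List[int] = []
    scales: List[float] = []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        absmax = max(abs(v) for v in group)
        # Map the largest magnitude to 7; signed 4-bit values span [-8, 7].
        scale = absmax / 7.0 if absmax > 0 else 1.0
        scales.append(scale)
        q.extend(max(-8, min(7, round(v / scale))) for v in group)
    return q, scales


def dequantize_weights(q: List[int], scales: List[float], group_size: int = 32) -> List[float]:
    """Recover approximate float weights from 4-bit codes and group scales."""
    return [qv * scales[i // group_size] for i, qv in enumerate(q)]


def quantize_activations_8bit(x: List[float]) -> Tuple[List[int], float, int]:
    """8-bit quantization whose range is computed from the input at run time.

    Assumption: an asymmetric mapping (scale plus zero point) is used here.
    """
    lo = min(min(x), 0.0)
    hi = max(max(x), 0.0)
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    zero_point = round(-128 - lo / scale)
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in x]
    return q, scale, zero_point
```

With group size 32, every 32 consecutive weights share one scale, so the storage overhead is one scale per 32 four-bit values. "Dynamic" means the activation scale is recomputed for each input rather than fixed at export time.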

Changes:

  • Updates Parakeet’s quantization config to use intx_choose_qparams_algorithm="hqq_scale_only" for the 8da4w dynamic-activation / 4-bit-weight path.
  • Enhances the Parakeet XNNPACK export pipeline by using both the dynamic-quant and general XNNPACK partitioners and by enabling quant-fusion and const-prop in ExecutorchBackendConfig. Also adds documentation and CLI examples for XNNPACK dynamic quantization.
  • Extends CI and utility scripts (export_model_artifact.sh, test_model_e2e.sh, pull.yml) to support an xnnpack device mode and a new quantized-8da4w preset for Parakeet, with an end-to-end test job.
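The reason for passing two partitioners can be illustrated with a toy model. The sketch below is not ExecuTorch code: the node dictionaries, predicate functions, and `assign_owners` helper are all hypothetical, and the first-match-wins rule is an assumption about how an ordered partitioner list behaves. It only shows why the more specific dynamic-quant partitioner would be listed before the general one.

```python
from typing import Callable, Dict, List, Tuple

Node = Dict[str, object]  # hypothetical stand-in for a graph node


def claims_dynamic_quant(node: Node) -> bool:
    """Toy rule: only dynamically quantized linear ops are claimed."""
    return node["op"] == "linear" and bool(node["dynamic_quant"])


def claims_general(node: Node) -> bool:
    """Toy rule: a fixed set of ops is claimed regardless of quantization."""
    return node["op"] in {"linear", "conv", "add", "relu"}


def assign_owners(
    nodes: List[Node],
    partitioners: List[Tuple[str, Callable[[Node], bool]]],
) -> List[str]:
    """Give each node to the first partitioner in the list that claims it."""
    owners = []
    for node in nodes:
        owner = "unclaimed"
        for name, claims in partitioners:
            if claims(node):
                owner = name
                break
        owners.append(owner)
    return owners


graph = [
    {"op": "linear", "dynamic_quant": True},
    {"op": "conv", "dynamic_quant": False},
    {"op": "custom_op", "dynamic_quant": False},
]
ordered = [("dynamic_quant", claims_dynamic_quant), ("general", claims_general)]
owners = assign_owners(graph, ordered)
```

In this toy model, reversing the list would hand the quantized linear node to the general partitioner, which is the situation the ordering avoids.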

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| examples/models/parakeet/quantize.py | Adds intx_choose_qparams_algorithm="hqq_scale_only" to the 8da4w Int8DynamicActivationIntxWeightConfig to align Parakeet linear quantization behavior with HQQ-style scale-only parameter selection. |
| examples/models/parakeet/export_parakeet_tdt.py | Introduces _create_xnnpack_partitioners using both XnnpackDynamicallyQuantizedPartitioner and XnnpackPartitioner, and enables do_quant_fusion_and_const_prop=True in ExecutorchBackendConfig to optimize dynamic quantized XNNPACK exports. |
| examples/models/parakeet/README.md | Documents a concrete “Dynamic Quantization for XNNPACK” example using --backend xnnpack with 8da4w configs and group size 32. |
| .github/workflows/pull.yml | Adds a test-parakeet-xnnpack-linux workflow job that exports Parakeet with XNNPACK + quantized-8da4w and runs the e2e test runner. |
| .ci/scripts/test_model_e2e.sh | Generalizes the script to handle xnnpack as a device, adds the quantized-8da4w option (XNNPACK-only), and maps xnnpack to the *-cpu CMake target so the Parakeet CPU/XNNPACK runner can be built and exercised. |
| .ci/scripts/export_model_artifact.sh | Adds xnnpack as a supported device, introduces a quantized-8da4w quantization preset restricted to XNNPACK, and routes Parakeet exports on XNNPACK to fp32 (no --dtype override) while preserving bf16 for CUDA/Metal. |
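The device and preset rules described for the two CI scripts can be summarized in a short sketch. The actual scripts are shell; this Python paraphrase is only an illustration, and the function name, the returned keys, the device set, and the non-XNNPACK target suffix are assumptions rather than the scripts' real variables.

```python
from typing import Dict, Optional

# Assumption: the devices named in the review summary; the real scripts may accept others.
SUPPORTED_DEVICES = {"cuda", "metal", "xnnpack"}


def resolve_parakeet_options(device: str, quant: str) -> Dict[str, Optional[str]]:
    """Hypothetical helper mirroring the routing rules in the summary above."""
    if device not in SUPPORTED_DEVICES:
        raise ValueError(f"unsupported device: {device}")
    # The quantized-8da4w preset is restricted to XNNPACK.
    if quant == "quantized-8da4w" and device != "xnnpack":
        raise ValueError("quantized-8da4w is only supported on xnnpack")
    # XNNPACK exports stay in fp32 (no --dtype override); CUDA/Metal keep bf16.
    dtype = None if device == "xnnpack" else "bf16"
    # xnnpack maps to the *-cpu CMake target; other suffixes are assumed to match the device.
    cmake_target_suffix = "cpu" if device == "xnnpack" else device
    return {"quant": quant, "dtype": dtype, "cmake_target_suffix": cmake_target_suffix}
```

Calling `resolve_parakeet_options("cuda", "quantized-8da4w")` raises `ValueError` in this sketch, which corresponds to the XNNPACK-only restriction on the new preset.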

mergennachin had a problem deploying to upload-benchmark-results February 4, 2026 20:26 — with GitHub Actions Failure
mergennachin force-pushed the parakeet_quantization branch from f233f47 to 5d7d23f Compare February 4, 2026 20:42
mergennachin force-pushed the parakeet_quantization branch from 5d7d23f to d5b145a Compare February 4, 2026 20:43
mergennachin temporarily deployed to upload-benchmark-results February 4, 2026 22:02 — with GitHub Actions Inactive
Copilot AI review requested due to automatic review settings February 4, 2026 22:38
mergennachin force-pushed the parakeet_quantization branch from d5b145a to 0406ebc Compare February 4, 2026 22:38
Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.


mergennachin merged commit f2f337e into main Feb 4, 2026
296 of 300 checks passed
mergennachin deleted the parakeet_quantization branch February 4, 2026 23:22
mergennachin temporarily deployed to upload-benchmark-results February 4, 2026 23:34 — with GitHub Actions Inactive
Labels

CLA Signed

2 participants