
Fix KeyError in InsertIOQDQ pass for LLM quantization #17194

Open
mohammed-saalim wants to merge 1 commit into pytorch:main from mohammed-saalim:fix-insert-io-qdq-keyerror

Conversation

Contributor

@mohammed-saalim mohammed-saalim commented Feb 4, 2026

Summary

This PR fixes a KeyError in the InsertIOQDQ pass that occurs when quantizing LLMs (such as SmolLM2) for the Qualcomm QNN backend.

Problem

In insert_io_qdq.py, the q_dq_map dictionary was missing entries for dequantize operations. When a node's quantization encoding was already a dequantize operation (e.g., dequantize_per_tensor.default), trying to look it up in the map during the _insert phase caused a KeyError.
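
A minimal sketch of the failure mode, with strings standing in for the real torch OpOverload keys (the actual map in insert_io_qdq.py keys on operator objects):

```python
# Simplified model of the pre-fix lookup: the map only had quantize-op
# keys, so an encoding that was already a dequantize op had no entry.
q_dq_map = {
    "quantized_decomposed.quantize_per_tensor.default":
        "quantized_decomposed.dequantize_per_tensor.default",
    "quantized_decomposed.quantize_per_tensor.tensor":
        "quantized_decomposed.dequantize_per_tensor.tensor",
    "quantized_decomposed.quantize_per_channel.default":
        "quantized_decomposed.dequantize_per_channel.default",
}

encoding = "quantized_decomposed.dequantize_per_tensor.default"
q_dq_map[encoding]  # KeyError: no dequantize keys in the map
```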

Solution

Extended the q_dq_map to include dequantize-to-self (identity) mappings for:

  • quantized_decomposed.dequantize_per_tensor.default
  • quantized_decomposed.dequantize_per_tensor.tensor
  • quantized_decomposed.dequantize_per_channel.default

This allows the pass to correctly handle nodes that have already been processed into dequantized form.
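
A sketch of the extended map under the same string simplification as above; the three identity entries are the ones this PR adds:

```python
q_dq_map = {
    # existing quantize -> dequantize pairs
    "quantized_decomposed.quantize_per_tensor.default":
        "quantized_decomposed.dequantize_per_tensor.default",
    "quantized_decomposed.quantize_per_tensor.tensor":
        "quantized_decomposed.dequantize_per_tensor.tensor",
    "quantized_decomposed.quantize_per_channel.default":
        "quantized_decomposed.dequantize_per_channel.default",
    # new: dequantize ops map to themselves (identity), so the _insert
    # phase lookup succeeds for nodes already carrying dequantize encodings
    "quantized_decomposed.dequantize_per_tensor.default":
        "quantized_decomposed.dequantize_per_tensor.default",
    "quantized_decomposed.dequantize_per_tensor.tensor":
        "quantized_decomposed.dequantize_per_tensor.tensor",
    "quantized_decomposed.dequantize_per_channel.default":
        "quantized_decomposed.dequantize_per_channel.default",
}

assert len(q_dq_map) == 6  # matches the key count checked under Testing
```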

Testing

  • Verified that the modified file parses correctly via Python's ast module (a sketch of such a check follows this list).
  • Confirmed that q_dq_map now contains the expected 6 keys.
  • Manual verification on Qualcomm hardware is requested from the maintainers to confirm resolution for the SmolLM2 workflow.
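
The parse check in the first bullet could look roughly like this (a sketch; the script and file path are assumptions, not taken from the PR):

```python
import ast

# Assumed location of the modified pass within the executorch repo.
path = "backends/qualcomm/_passes/insert_io_qdq.py"

with open(path) as f:
    tree = ast.parse(f.read())  # raises SyntaxError if the file is invalid

# Count the keys of any dict literal assigned to the name q_dq_map.
for node in ast.walk(tree):
    if isinstance(node, ast.Assign) and isinstance(node.value, ast.Dict):
        targets = [t.id for t in node.targets if isinstance(t, ast.Name)]
        if "q_dq_map" in targets:
            print(f"q_dq_map has {len(node.value.keys)} keys")  # expect 6
```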
Fixes #16690: Qualcomm Quantization and Lowering for LLM fails

cc @cccclai @winskuo-quic @shewu-quic @haowhsu-quic @DannyYuyang-quic @cbilgin

Extend q_dq_map to include dequantize ops mapping to themselves.
This fixes KeyError when nodes have dequantize encodings (e.g.,
dequantize_per_tensor.default) instead of quantize encodings.

Fixes pytorch#16690
Copilot AI review requested due to automatic review settings February 4, 2026 06:06

pytorch-bot bot commented Feb 4, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17194

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures, 1 Unrelated Failure

As of commit 5ebb788 with merge base 2ace1cc:

NEW FAILURES - The following jobs have failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed label Feb 4, 2026
@mohammed-saalim
Contributor Author

While changing the quantization recipe (like using an 8-bit KV cache) might change the graph structure, the InsertIOQDQ pass should still be robust enough to handle dequantize operations in the IR without throwing a KeyError. This PR ensures the pass is forward-compatible with models that already have these encodings.

@mohammed-saalim
Contributor Author

@pytorchbot label "release notes: none"

@pytorch-bot pytorch-bot bot added the release notes: none label Feb 4, 2026
Contributor

Copilot AI left a comment


Pull request overview

This PR fixes a KeyError in the InsertIOQDQ pass that occurred when quantizing LLMs (such as SmolLM2) for the Qualcomm QNN backend. The error was caused by missing entries in the q_dq_map dictionary for dequantize operations.

Changes:

  • Extended q_dq_map with identity mappings for dequantize operations to handle nodes that already have dequantize encodings
  • Added three new entries mapping dequantize operations to themselves (per-tensor default, per-tensor tensor, and per-channel default)


@nil-is-all nil-is-all added the partner: qualcomm and module: qnn labels Feb 4, 2026

Labels

  • CLA Signed (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.)
  • module: qnn (Issues related to Qualcomm's QNN delegate and code under backends/qualcomm/)
  • partner: qualcomm (For backend delegation, kernels, demo, etc. from the 3rd-party partner, Qualcomm)
  • release notes: none (Do not include this in the release notes)


Development

Successfully merging this pull request may close these issues.

Qualcomm Quantization and Lowering for LLM fails

2 participants