Fix KeyError in InsertIOQDQ pass for LLM quantization #17194

mohammed-saalim wants to merge 1 commit into pytorch:main from
Conversation
Extend `q_dq_map` to include dequantize ops mapping to themselves. This fixes a `KeyError` raised when nodes have dequantize encodings (e.g., `dequantize_per_tensor.default`) instead of quantize encodings. Fixes pytorch#16690
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17194
Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures, 1 Unrelated Failure. As of commit 5ebb788 with merge base 2ace1cc.

NEW FAILURES - The following jobs have failed:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
While changing the quantization recipe (like using an 8-bit KV cache) might change the graph structure, the InsertIOQDQ.py
@pytorchbot label "release notes: none"
Pull request overview
This PR fixes a `KeyError` in the InsertIOQDQ pass that occurred when quantizing LLMs (such as SmolLM2) for the Qualcomm QNN backend. The error was caused by missing entries in the `q_dq_map` dictionary for dequantize operations.
Changes:
- Extended `q_dq_map` with identity mappings for dequantize operations, to handle nodes that already have dequantize encodings
- Added three new entries mapping dequantize operations to themselves (per-tensor default, per-tensor tensor, and per-channel default)
Summary
This PR fixes a `KeyError` in the InsertIOQDQ pass that occurs when quantizing LLMs (such as SmolLM2) for the Qualcomm QNN backend.
Problem

In insert_io_qdq.py, the `q_dq_map` dictionary was missing entries for dequantize operations. When a node's quantization encoding was already a dequantize operation (e.g., `dequantize_per_tensor.default`), looking it up in the map during the `_insert` phase raised a `KeyError`.
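A minimal, self-contained sketch of the failure mode (the string keys here are stand-ins for the real op objects, used only to illustrate the lookup pattern):

```python
# Hypothetical reduction of the lookup done during the _insert phase.
# Before this PR, q_dq_map only had quantize ops as keys:
q_dq_map = {
    "quantize_per_tensor.default": "dequantize_per_tensor.default",
}

# A node whose encoding is already a dequantize op has no entry:
encoding = "dequantize_per_tensor.default"
io_op = q_dq_map[encoding]  # KeyError: 'dequantize_per_tensor.default'
```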
Solution

Extended the `q_dq_map` to include dequantize-to-self (identity) mappings for:

- `quantized_decomposed.dequantize_per_tensor.default`
- `quantized_decomposed.dequantize_per_tensor.tensor`
- `quantized_decomposed.dequantize_per_channel.default`

This allows the pass to correctly handle nodes that have already been processed into dequantized form.
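For reference, a sketch of what the extended map looks like, assuming the ops live under the edge dialect's `quantized_decomposed` namespace as the names above suggest (the surrounding pass class is elided, so this is not the literal diff):

```python
from executorch.exir.dialects._ops import ops as exir_ops

qd = exir_ops.edge.quantized_decomposed  # shorthand

q_dq_map = {
    # existing entries: each quantize op maps to its dequantize counterpart
    qd.quantize_per_tensor.default: qd.dequantize_per_tensor.default,
    qd.quantize_per_tensor.tensor: qd.dequantize_per_tensor.tensor,
    qd.quantize_per_channel.default: qd.dequantize_per_channel.default,
    # new identity entries: dequantize ops map to themselves, so nodes
    # already carrying a dequantize encoding no longer raise KeyError
    qd.dequantize_per_tensor.default: qd.dequantize_per_tensor.default,
    qd.dequantize_per_tensor.tensor: qd.dequantize_per_tensor.tensor,
    qd.dequantize_per_channel.default: qd.dequantize_per_channel.default,
}
```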
Testing
Verified that the pass module's `q_dq_map` now contains the expected 6 keys (a minimal check is sketched below).

Fixes #16690 (Qualcomm Quantization and Lowering for LLM fails).
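A spot-check along these lines, assuming `q_dq_map` is a class-level attribute and the import path follows the backend's `_passes` layout:

```python
# Assumed import path; adjust if the pass lives elsewhere.
from executorch.backends.qualcomm._passes.insert_io_qdq import InsertIOQDQ

# 3 original quantize entries + 3 new dequantize identity entries
assert len(InsertIOQDQ.q_dq_map) == 6

# Every dequantize key should map to itself.
for k, v in InsertIOQDQ.q_dq_map.items():
    if "dequantize" in str(k):
        assert k is v
```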
cc @cccclai @winskuo-quic @shewu-quic @haowhsu-quic @DannyYuyang-quic @cbilgin