[ET-VK][qconv] Add layout-agnostic general shader for quantized conv (#17219)
SS-JIA wants to merge 1 commit into `gh/SS-JIA/408/base`
The existing quantized conv2d implementation (`conv2d_q8ta_q8csw_q8to`) only supports the 4W4C memory layout, which limits its use when models require different tensor layouts. This change introduces a new general-purpose quantized conv2d shader (`q8ta_conv2d`) that works with any memory layout by using BufferMetadata for tensor indexing.

The routing logic determines which implementation to use based on the input/output layouts: when both are 4W4C, the existing optimized path is used; otherwise, the new general shader handles the computation. This enables quantized conv2d to work seamlessly across the 4C1W, 4W4C, and 4C memory layouts.

Key changes:
- New GLSL shader `q8ta_conv2d.glsl` using layout specialization constants
- New `Q8taConv2d.cpp` with operator registration and workgroup size heuristics
- Refactored routing in `QuantizedConvolution.cpp` to dispatch based on layout
- Extended test coverage to validate all three memory layouts

Authored with Claude.

Differential Revision: [D92307252](https://our.internmc.facebook.com/intern/diff/D92307252/)
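The routing decision described above can be sketched as a simple dispatch on the input and output layouts. This is a minimal illustrative sketch, not the actual ExecuTorch code: the `MemoryLayout` enum and `pick_conv2d_shader` helper are hypothetical names, and only the shader identifiers come from the PR.

```cpp
#include <string>

// Hypothetical tags for the memory layouts named in this PR.
enum class MemoryLayout { k4W4C, k4C1W, k4C };

// Sketch of the routing logic: the existing optimized shader is chosen only
// when both the input and output tensors use the 4W4C layout; every other
// combination falls back to the new layout-agnostic general shader.
std::string pick_conv2d_shader(MemoryLayout input, MemoryLayout output) {
  if (input == MemoryLayout::k4W4C && output == MemoryLayout::k4W4C) {
    return "conv2d_q8ta_q8csw_q8to";  // existing 4W4C-only optimized path
  }
  return "q8ta_conv2d";  // general shader, indexes via BufferMetadata
}
```

Because the general shader is only a fallback, mixed-layout cases (e.g. a 4C1W input feeding a 4W4C output) no longer need a layout-conversion pass before the convolution.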
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17219. Note: links to docs will display an error until the docs builds have completed.

❌ As of commit e1d06ed with merge base 477867a: 2 new failures, plus 2 unrelated failures that were already present on the merge base (broken trunk). 👉 Rebase onto the `viable/strict` branch to avoid the broken-trunk failures.