
[ET-VK][qconv] Add layout-flexible impl of quantized depthwise conv2d #17108

Open

SS-JIA wants to merge 3 commits into gh/SS-JIA/401/base from gh/SS-JIA/401/head

Conversation

SS-JIA (Contributor) commented Feb 2, 2026

Stack from ghstack (oldest at bottom):


Adds a new layout-agnostic quantized depthwise convolution operator
`etvk.q8ta_conv2d_dw` that uses BufferMetadata and layout specialization
constants to support arbitrary memory layouts (contiguous, channels-last,
4W4C block-packed, etc.).

Key changes:

1. New shader `q8ta_conv2d_dw.glsl`:
   - Uses BufferMetadata for input/output tensor addressing
   - Layout-aware via `inp_layout`/`outp_layout` specialization constants
   - Processes 4 adjacent width positions × 4 channels per thread (a 4W×4C tile; see the scalar sketch after this list)
   - Includes optimized paths for simple layouts (`outer_block_size == 1`)

2. New indexing utilities in `indexing.glslh` (both sketched in C++ after this list):
   - `texel_idx_to_tensor4d_idx()`: converts linear texel index to 4D tensor coords
   - `tensor4d_idx_to_texel_idx()`: converts 4D tensor index to texel index

3. Code refactoring:
   - Extract `Conv2DParams` struct and `create_conv2d_params()` to ConvolutionUtils.h
   - Create Q8taConv2dDW.cpp with new operator implementation
   - Add Q8taConv2d.h with public API declarations
   - Move `prepack_quantized_conv2d_dw_weight()` to new implementation file

4. New workgroup size helpers (see the ceil-division sketch after this list):
   - `pick_q8ta_conv2d_dw_global_wg_size()`: computes {W4, H, C4} dispatch size
   - `pick_q8ta_conv2d_dw_local_wg_size()`: adaptive local size based on tensor dims

5. Test updates:
   - Rename test to `test_q8_conv2d_dw.cpp`
   - Add `TestQ8taConv2d.cpp` with shared test utilities
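
To make the 4W×4C tiling in item 1 concrete, here is a minimal scalar sketch in C++ of the work one shader invocation covers. It is an illustration under stated assumptions only: the names, float arithmetic, and contiguous CHW indexing are made up for clarity, while the real `q8ta_conv2d_dw.glsl` operates on packed int8 texels addressed through BufferMetadata.

```cpp
#include <vector>

// Hypothetical scalar model of one invocation's 4W×4C tile. The real
// shader vectorizes this and reads/writes quantized int8 texels.
void dw_conv_tile(
    const std::vector<float>& inp,  // input, contiguous CHW (N == 1)
    const std::vector<float>& wgt,  // weights, C x Kh x Kw (one filter per channel)
    std::vector<float>& out,        // output, contiguous C x OH x OW
    int C, int H, int W, int OH, int OW,
    int Kh, int Kw, int stride, int pad,
    int tile_w, int tile_c, int oh) {
  for (int dw = 0; dw < 4; ++dw) {    // 4 adjacent output width positions
    for (int dc = 0; dc < 4; ++dc) {  // 4 channels
      const int ow = tile_w * 4 + dw;
      const int c = tile_c * 4 + dc;
      if (ow >= OW || c >= C) continue;  // mask partial tiles at the edges
      float acc = 0.0f;
      for (int kh = 0; kh < Kh; ++kh) {
        for (int kw = 0; kw < Kw; ++kw) {
          const int ih = oh * stride - pad + kh;
          const int iw = ow * stride - pad + kw;
          if (ih < 0 || ih >= H || iw < 0 || iw >= W) continue;  // zero padding
          acc += inp[(c * H + ih) * W + iw] * wgt[(c * Kh + kh) * Kw + kw];
        }
      }
      out[(c * OH + oh) * OW + ow] = acc;
    }
  }
}
```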
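
The two helpers in item 2 reduce to a div/mod walk over the layout's dimension order. Below is a minimal C++ analogue, assuming a layout described by per-dimension sizes plus a fastest-to-slowest dim order; the `Layout4D` struct and its fields are hypothetical stand-ins for BufferMetadata, not the actual GLSL definitions.

```cpp
#include <array>
#include <cstdint>

// Hypothetical layout description: dim_order lists dimensions from
// fastest-moving to slowest-moving; sizes is indexed by dimension.
struct Layout4D {
  std::array<int, 4> dim_order;
  std::array<int64_t, 4> sizes;
};

// Linear texel index -> 4D tensor index, peeling off one dimension at a
// time in fastest-to-slowest order (what texel_idx_to_tensor4d_idx()
// presumably computes).
std::array<int64_t, 4> texel_idx_to_tensor4d_idx(int64_t idx, const Layout4D& l) {
  std::array<int64_t, 4> t4d{};
  for (int i = 0; i < 4; ++i) {
    const int d = l.dim_order[i];
    t4d[d] = idx % l.sizes[d];
    idx /= l.sizes[d];
  }
  return t4d;
}

// Inverse mapping: 4D tensor index -> linear texel index.
int64_t tensor4d_idx_to_texel_idx(const std::array<int64_t, 4>& t4d,
                                  const Layout4D& l) {
  int64_t idx = 0;
  int64_t stride = 1;
  for (int i = 0; i < 4; ++i) {
    const int d = l.dim_order[i];
    idx += t4d[d] * stride;
    stride *= l.sizes[d];
  }
  return idx;
}
```

Under this model a contiguous NCHW tensor has `dim_order = {3, 2, 1, 0}` (W fastest) and channels-last has `{1, 3, 2, 0}` (C fastest); the same two functions serve every layout, which is what lets the shader stay layout-agnostic.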

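The {W4, H, C4} dispatch in item 4 is ceiling division of the width and channel extents by the tile size. A brief sketch; the struct and function names are stand-ins for the actual `utils::uvec3`-based helpers:

```cpp
#include <cstdint>

struct Uvec3 { uint32_t x, y, z; };  // stand-in for utils::uvec3

static uint32_t div_up(uint32_t n, uint32_t d) { return (n + d - 1) / d; }

// One invocation covers a 4W×4C tile, so the width and channel axes
// collapse by 4 while height keeps one row per invocation. Hypothetical
// mirror of pick_q8ta_conv2d_dw_global_wg_size().
Uvec3 pick_global_wg_size(uint32_t W, uint32_t H, uint32_t C) {
  return Uvec3{div_up(W, 4), H, div_up(C, 4)};
}
```

For example, W = 17, H = 8, C = 10 yields {5, 8, 3}; the edge tiles that overhang W or C are masked inside the shader, as in the tile sketch above.
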
Differential Revision: [D92061368](https://our.internmc.facebook.com/intern/diff/D92061368/)


pytorch-bot bot commented Feb 2, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17108

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures, 1 Pending, 1 Unrelated Failure

As of commit f77a947 with merge base 477867a:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

SS-JIA pushed a commit that referenced this pull request Feb 2, 2026
ghstack-source-id: 337539965
Pull Request resolved: #17108
meta-cla bot added the CLA Signed label Feb 2, 2026

github-actions bot commented Feb 2, 2026

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e., would users of this library care about this change?), please use a label starting with `release notes:`. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Labels: CLA Signed, fb-exported, meta-exported