Skip to content

Conversation

@naoyam
Copy link
Collaborator

@naoyam naoyam commented Jan 16, 2026

As part of #5722, I dropped the matmul scheduler for Ampere/Turing. I tried to just disable Swizzle2D but wasn't quite sure how to correctly update the scheduler. Instead of figuring it out, I decided just dropping it.

The matmul tests now only run on Hopper or upper.

@github-actions
Copy link

github-actions bot commented Jan 16, 2026

Review updated until commit 060c34c

Description

  • Remove Ampere-Turing matmul scheduler implementation completely

  • Update architecture requirements from Ampere (8.0) to Hopper (9.0) across all matmul tests

  • Replace scheduler calls with error messages for unsupported architectures

  • Delete matmul_ampere-.cpp and matmul_ampere-.h files entirely

Changes walkthrough

Relevant files
Enhancement
5 files
matmul.cpp
Update architecture guard from Ampere to Hopper                   
+1/-1     
matmul.cpp
Remove Ampere-Turing scheduler include and add error handling
+3/-2     
matmul_ampere-.cpp
Delete entire Ampere-Turing scheduler implementation file
+0/-1338
matmul.h
Update comment to reflect Hopper+ only support                     
+1/-1     
matmul_ampere-.h
Delete entire Ampere-Turing scheduler header file               
+0/-206 
Tests
4 files
test_matmul.cpp
Update matmul tests to require Hopper+ architecture           
+10/-4   
test_matmul_scheduler.cpp
Update scheduler tests to require Hopper+ architecture     
+9/-2     
test_multidevice_matmul.cpp
Update distributed matmul tests to require Hopper+             
+2/-2     
test_translate_mma.cpp
Update MMA translation tests to require Hopper+                   
+6/-6     
Configuration changes
1 files
CMakeLists.txt
Remove Ampere-Turing scheduler source file from build       
+0/-1     

PR Reviewer Guide

Here are some key observations to aid the review process:

🧪 PR contains tests
⚡ Recommended focus areas for review
Breaking Change Documentation

The PR throws an NVF_THROW error when Turing or Ampere macros are detected, which is a breaking change for existing users. Ensure this change is properly documented in release notes and migration guides, as users on these architectures will no longer be able to use the matmul scheduler functionality.

if (isTuring(macro) || isAmpere(macro)) {
  NVF_THROW(
      "Support for Turing and Ampere was dropped. Scheduling functionality "
      "should still be largely available except for swizzling");
} else if (isHopper(macro)) {
Benchmark Architecture Restriction

The SMEM guard was updated from (8,0) to (9,0), which means benchmarks will now only run on Hopper+ architectures. Verify that this is the intended behavior and that benchmark results documentation reflects this change.

9, 0, getSmemSize(cta_tile, number_of_stage), benchmark_state);

@naoyam naoyam force-pushed the remove_swizzle_2d_supplemental branch from a0d36ff to 643b29e Compare January 16, 2026 01:05
@naoyam
Copy link
Collaborator Author

naoyam commented Jan 16, 2026

!test

@naoyam naoyam requested a review from zasdfgbnm January 16, 2026 01:07
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Jan 16, 2026

Greptile Summary

Removes the Ampere/Turing matmul scheduler as part of the effort to eliminate Swizzle2D support. The PR deletes ~1,544 lines of scheduler implementation code and updates all matmul tests to require Hopper (SM 9.0) or newer GPUs.

Key Changes:

  • Deleted csrc/scheduler/matmul_ampere-.cpp and csrc/scheduler/matmul_ampere-.h
  • Replaced Ampere/Turing scheduler invocation with an error throw explaining support was dropped
  • Updated all test files to skip on pre-Hopper architectures using NVFUSER_TEST_CUDA_ARCH_GUARD(9, 0)
  • Removed matmul_ampere-.cpp from CMake build configuration

Minor Issues:

  • Dead code in benchmarks/cpp/matmul.cpp:404-409 - the architecture range check for [8.0, 9.0) is unreachable since the earlier guard already skips anything below 9.0

Confidence Score: 5/5

  • Safe to merge - well-executed code removal with comprehensive test updates
  • The PR cleanly removes deprecated functionality with no remaining references to the deleted code. All affected tests have been properly updated with architecture guards. The only issue is minor dead code that doesn't affect functionality.
  • No files require special attention - the dead code in benchmarks/cpp/matmul.cpp is a cosmetic issue

Important Files Changed

Filename Overview
csrc/scheduler/matmul.cpp Removed include and replaced Ampere/Turing scheduler call with error throw
csrc/scheduler/matmul_ampere-.cpp File deleted - removed 1338 lines of Ampere/Turing scheduler implementation
csrc/scheduler/matmul_ampere-.h File deleted - removed 206 lines of Ampere/Turing scheduler declarations
tests/cpp/test_matmul.cpp Added Hopper guard (9.0+) to MatmulTest base class and MLPBenchmarkTest
benchmarks/cpp/matmul.cpp Changed arch guard from 8.0 to 9.0, but left dead code checking for 8.0-9.0 range

Sequence Diagram

sequenceDiagram
    participant Client
    participant MatmulScheduler
    participant ArchCheck
    participant HopperScheduler
    participant AmpereMinus as AmpereMinus (REMOVED)
    
    Client->>MatmulScheduler: schedule(fusion, params)
    MatmulScheduler->>MatmulScheduler: Get MMA macro
    
    alt isTuring(macro) || isAmpere(macro)
        MatmulScheduler->>AmpereMinus: BEFORE: schedule_matmul::AmpereMinus(fusion, mparams).run()
        Note over AmpereMinus: Removed 1544 lines<br/>(matmul_ampere-.cpp/h)
        MatmulScheduler->>MatmulScheduler: AFTER: throw NVF_THROW error
        Note right of MatmulScheduler: "Support for Turing and<br/>Ampere was dropped"
    else isHopper(macro)
        MatmulScheduler->>HopperScheduler: schedule_matmul::Hopper(fusion, mparams).run()
        HopperScheduler-->>MatmulScheduler: Success
    else isBlackwell(macro)
        MatmulScheduler->>MatmulScheduler: schedule_matmul::Blackwell(fusion, mparams).run()
    end
    
    Note over ArchCheck: All tests now require<br/>Hopper (arch 9.0+)
    ArchCheck->>ArchCheck: BEFORE: CUDA arch >= 7.5/8.0
    ArchCheck->>ArchCheck: AFTER: CUDA arch >= 9.0
    
    MatmulScheduler-->>Client: Scheduled fusion or error
Loading

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Comments (4)

  1. tests/cpp/test_matmul_scheduler.cpp, line 142 (link)

    logic: Test will fail on Turing/Ampere since scheduler now throws error for these architectures. Should be NVFUSER_TEST_CUDA_ARCH_RANGE_GUARD(9, 0, 10, 0);

  2. tests/cpp/test_matmul_scheduler.cpp, line 168 (link)

    logic: Ampere (8.0) no longer supported. Update to NVFUSER_TEST_CUDA_ARCH_GUARD(9, 0);

  3. tests/cpp/test_matmul_scheduler.cpp, line 142-3245 (link)

    logic: 48 tests in this file still have CUDA arch guards for Turing (7.5) and/or Ampere (8.0). Since the scheduler now throws for these architectures (see csrc/scheduler/matmul.cpp:80), all these tests will fail. All guards with 7, 5 should start at 9, 0 and all guards with upper bound 9, 0 should have upper bound 10, 0. Are these tests intentionally meant to verify error handling for unsupported architectures, or should they be updated to only run on Hopper+?

  4. csrc/scheduler/matmul.h, line 49 (link)

    syntax: Stale reference to AmpereMinus in comment - this class was removed

7 files reviewed, 4 comments

Edit Code Review Agent Settings | Greptile

@naoyam
Copy link
Collaborator Author

naoyam commented Jan 16, 2026

!test

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Comments (1)

  1. benchmarks/cpp/matmul.cpp, line 404-409 (link)

    style: dead code - the NVFUSER_BENCHMARK_ARCH_SMEM_GUARD(9, 0, ...) on line 401 already skips anything below arch 9.0, so this check for architectures in range [8.0, 9.0) will never execute

10 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants