Improve handling of the NVTE_CUDA_ARCHS by ptrendx · Pull Request #2665 · NVIDIA/TransformerEngine

ptrendx · 2026-02-09T23:09:22Z

Description

Improve handling of the NVTE_CUDA_ARCHS environment variable:

Add the regular (non-specialized) architectures to the build of the sources compiled for specific architectures, so that GPU architectures in the same family that were not specialized directly still get some support.
Automatically add sm75 to the build if CMAKE_CUDA_ARCHITECTURES becomes empty (currently this should only happen when CMake < 4.0.2 and sm120 is the only selected architecture).

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactoring

Changes

Please list the changes introduced in this PR:

Change A
Change B

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

greptile-apps · 2026-02-09T23:20:51Z

Greptile Overview

Greptile Summary

This PR refactors the CUDA architecture handling to improve compatibility across GPU families. The key changes are:

Architecture Compilation Strategy:

sm_100/101: Now compiles arch-specific sources for both generic (sm_100) and specialized variants (sm_100a, sm_103a). Previously only compiled specialized variants. This enables broader support for GPUs in the Blackwell family.
sm_110/120: Intelligently routes to family-specific variants (110f, 120f) when CMake 4.0.2+ supports 'f' suffix natively, otherwise falls back to manual NVTE_GENERIC_ARCHS + NVTE_SPECIFIC_ARCHS handling.
Fallback for old CMake: Adds sm_75 when CMAKE_CUDA_ARCHITECTURES becomes empty on CMake < 4.0.2 (only happens when sm_120 is sole selected arch).

Safety Check Changes:

NVTE_ARCH_SPECIFIC_TARGETS is now unconditionally set to TRUE (line 104), which defines NVTE_HAS_ARCH_SPECIFIC_TARGETS=1 for all arch-specific sources.
This disables the compile-time static_assert in ptx.cuh that catches misuse of arch/family-specific features when compiling for generic architectures.
The assert is replaced with a runtime compatibility check that returns false if the arch doesn't match.

Critical Issue:
The unconditional flag on line 104 disables compile-time safety checks even when NVTE_SPECIFIC_ARCHS is empty (e.g., building only sm_70/80/89/90). The flag should be conditional on whether arch-specific targets are actually being compiled.

Confidence Score: 3/5

Safe to merge with one critical logic issue that disables compile-time safety checks unconditionally
The architecture routing logic is correct and well-structured for CMake version compatibility. The sm_100/101 double-compilation is intentional for family compatibility. However, line 104 unconditionally disables compile-time safety checks that prevent misuse of arch-specific features, even when no arch-specific compilation is happening. This reduces build safety for common configurations.
Pay close attention to transformer_engine/common/CMakeLists.txt line 104 - the unconditional flag should be conditional

Important Files Changed

Filename	Overview
transformer_engine/common/CMakeLists.txt	Improves arch handling for sm_100/110/120 families by keeping generic archs in CMAKE_CUDA_ARCHITECTURES for broader compatibility, but unconditionally disables compile-time safety checks (line 104)
transformer_engine/common/util/ptx.cuh	Adds runtime fallback for arch-specific checks when NVTE_HAS_ARCH_SPECIFIC_TARGETS=1, replacing compile-time static_assert with runtime compatibility check

Sequence Diagram

sequenceDiagram
    participant User as User/CMake
    participant CMakeLists as CMakeLists.txt
    participant Compiler as CUDA Compiler
    participant ArchSpecific as Arch-Specific Sources
    participant Generic as Generic Sources

    User->>CMakeLists: Set CMAKE_CUDA_ARCHITECTURES (e.g., 100, 110, 120)
    
    Note over CMakeLists: Process architecture mappings
    
    alt Arch 100/101 (new behavior)
    CMakeLists->>CMakeLists: Keep "100" in CMAKE_CUDA_ARCHITECTURES
    CMakeLists->>CMakeLists: Add "100a" to NVTE_SPECIFIC_ARCHS
    end
    
    alt Arch 110/120 with CMake >= 4.0.2
    CMakeLists->>CMakeLists: Replace "110" with "110f" in CMAKE_CUDA_ARCHITECTURES
    CMakeLists->>CMakeLists: Replace "120" with "120f" in CMAKE_CUDA_ARCHITECTURES
    end
    
    alt Arch 110/120 with CMake < 4.0.2
    CMakeLists->>CMakeLists: Remove from CMAKE_CUDA_ARCHITECTURES
    CMakeLists->>CMakeLists: Add to NVTE_GENERIC_ARCHS + NVTE_SPECIFIC_ARCHS
    end
    
    CMakeLists->>CMakeLists: Set NVTE_ARCH_SPECIFIC_TARGETS = TRUE
    
    Note over CMakeLists,ArchSpecific: Compilation phase
    
    CMakeLists->>Generic: Compile with CMAKE_CUDA_ARCHITECTURES
    CMakeLists->>Generic: Add --generate-code for NVTE_GENERIC_ARCHS
    
    CMakeLists->>ArchSpecific: Compile with CMAKE_CUDA_ARCHITECTURES (e.g., sm_100)
    CMakeLists->>ArchSpecific: Add --generate-code for NVTE_SPECIFIC_ARCHS (e.g., sm_100a)
    CMakeLists->>ArchSpecific: Define NVTE_HAS_ARCH_SPECIFIC_TARGETS=1
    
    ArchSpecific->>Compiler: Disable static_assert (use runtime check instead)
    
    Compiler->>User: Build artifacts with both generic and specific arch code

greptile-apps

2 files reviewed, 2 comments

greptile-apps · 2026-02-09T23:20:54Z

transformer_engine/common/CMakeLists.txt

@@ -54,31 +52,57 @@ endif()
 # Check for architecture 101 (if we see this we are in toolkit <= 12.9)
 list(FIND CMAKE_CUDA_ARCHITECTURES "101" arch_101_index)
 if(NOT arch_101_index EQUAL -1)
-  list(REMOVE_ITEM CMAKE_CUDA_ARCHITECTURES "101")
-  list(APPEND NVTE_GENERIC_ARCHS "101")
  list(APPEND NVTE_SPECIFIC_ARCHS "101a")
 endif()


Arch 100/101 not removed

When "100" / "101" are present in CMAKE_CUDA_ARCHITECTURES, this block only appends 100a/101a to NVTE_SPECIFIC_ARCHS but never removes the base arch from CMAKE_CUDA_ARCHITECTURES nor adds it to NVTE_GENERIC_ARCHS. As a result, the build will still compile all sources for sm_100 / sm_101 (via CMAKE_CUDA_ARCHITECTURES) while also compiling arch-specific sources for sm_100a / sm_101a (via --generate-code), which defeats the “generic + specific” split and can cause unexpected extra build work / incorrect targeting. This differs from the 110/120 handling below, which removes the base arch and routes it into the explicit lists.

ptrendx · Feb 10, 2026

@greptileai That is because arches 110/120 are routed to the "f" variant (like 120f), which covers the entire family, whereas 100 is routed to the "a" variants (like 100a). This is intentional behavior, but in order to support different architectures in the same family, we still need to compile for the regular 100.
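The routing described in this reply can be sketched as a standalone script. This is a minimal illustration, not the PR's actual code: the example architecture selection is made up, and the value appended to NVTE_SPECIFIC_ARCHS on the old-CMake path ("120f") is an assumption, since the thread only says that path uses NVTE_GENERIC_ARCHS + NVTE_SPECIFIC_ARCHS.

```cmake
# Hypothetical sketch; run with: cmake -P arch_routing_sketch.cmake
cmake_minimum_required(VERSION 3.18)

# Example user selection (assumption for illustration only).
set(CMAKE_CUDA_ARCHITECTURES 100 120)
set(NVTE_GENERIC_ARCHS "")
set(NVTE_SPECIFIC_ARCHS "")

# Arch 100: keep the regular "100" in CMAKE_CUDA_ARCHITECTURES so other GPUs
# in the family are still covered, and additionally request the "a" variant.
list(FIND CMAKE_CUDA_ARCHITECTURES "100" arch_100_index)
if(NOT arch_100_index EQUAL -1)
  list(APPEND NVTE_SPECIFIC_ARCHS "100a")
endif()

# Arch 120: the "f" variant covers the whole family, so the base arch is removed.
list(FIND CMAKE_CUDA_ARCHITECTURES "120" arch_120_index)
if(NOT arch_120_index EQUAL -1)
  list(REMOVE_ITEM CMAKE_CUDA_ARCHITECTURES "120")
  if(CMAKE_VERSION VERSION_GREATER_EQUAL 4.0.2)
    # CMake understands the 'f' suffix natively.
    list(APPEND CMAKE_CUDA_ARCHITECTURES "120f")
  else()
    # Older CMake: route through the explicit lists (values are assumptions).
    list(APPEND NVTE_GENERIC_ARCHS "120")
    list(APPEND NVTE_SPECIFIC_ARCHS "120f")
  endif()
endif()

message(STATUS "CMAKE_CUDA_ARCHITECTURES: ${CMAKE_CUDA_ARCHITECTURES}")
message(STATUS "NVTE_GENERIC_ARCHS:       ${NVTE_GENERIC_ARCHS}")
message(STATUS "NVTE_SPECIFIC_ARCHS:      ${NVTE_SPECIFIC_ARCHS}")
```

In this sketch "100" is never removed from CMAKE_CUDA_ARCHITECTURES, which is the asymmetry with the 110/120 handling that the review comment asked about.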

greptile-apps · 2026-02-09T23:20:56Z

transformer_engine/common/CMakeLists.txt

+if(CMAKE_VERSION VERSION_LESS 4.0.2)
+  if(NOT CMAKE_CUDA_ARCHITECTURES)
+    message(WARNING
+            "CMAKE_CUDA_ARCHITECTURES is empty after replacing arch-specific targets. "
+            "Please upgrade to CMake 4.0.2+ for native 'f' architecture support. "
+            "Adding sm_75 target in addition to the specified target to avoid configuration "
+            "errors - this will result in longer build time, but does not affect correctness.")
+    set(CMAKE_CUDA_ARCHITECTURES 75)
  endif()


Fallback wipes user archs

On CMake < 4.0.2, if the arch list becomes empty you set(CMAKE_CUDA_ARCHITECTURES 75) (dropping the user’s originally requested arch entirely). Given the warning text says “add sm_75 target in addition to the specified target”, this looks like a functional mismatch: the fallback should preserve the originally selected arch(es) and append 75, not replace everything with 75.

ptrendx · Feb 10, 2026

@greptileai If the list is empty, then setting is the same as appending.
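The point made in this reply can be checked with a small script. This is a sketch using made-up variable names (`archs_via_set`, `archs_via_append`), not code from the PR; it only compares `set()` and `list(APPEND)` on an empty list.

```cmake
# Hypothetical check; run with: cmake -P set_vs_append.cmake
cmake_minimum_required(VERSION 3.18)

# Both variables start out as empty lists, as CMAKE_CUDA_ARCHITECTURES
# does on the fallback path discussed above.
set(archs_via_set "")
set(archs_via_append "")

set(archs_via_set 75)               # what the PR's fallback does
list(APPEND archs_via_append 75)    # what the review comment proposed

if(archs_via_set STREQUAL archs_via_append)
  message(STATUS "Same result: '${archs_via_set}'")
else()
  message(FATAL_ERROR "Different: '${archs_via_set}' vs '${archs_via_append}'")
endif()
```

The two forms would only differ if the list still held entries when the fallback ran, which the `if(NOT CMAKE_CUDA_ARCHITECTURES)` guard in the hunk rules out.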

ptrendx · 2026-02-11T20:55:25Z

/te-ci

greptile-apps

2 files reviewed, 1 comment

greptile-apps · 2026-02-11T21:08:45Z

transformer_engine/common/CMakeLists.txt

+set(NVTE_ARCH_SPECIFIC_TARGETS TRUE)
+
 # cuDNN frontend API


Always-on arch-specific define

set(NVTE_ARCH_SPECIFIC_TARGETS TRUE) makes NVTE_HAS_ARCH_SPECIFIC_TARGETS=1 get added to all transformer_engine_cuda_arch_specific_sources (lines ~247-255) even when NVTE_SPECIFIC_ARCHS is empty (e.g., user selects only 70/80/89/90). That flips the behavior in util/ptx.cuh to skip the compile-time static_assert that catches “compiled generic but using arch/family-specific features”, so those mistakes will now silently pass in exactly the builds where no arch-specific codegen flags are being injected for these sources.

ptrendx · 2026-02-12T19:15:32Z

/te-ci

greptile-apps

2 files reviewed, 1 comment

greptile-apps · 2026-02-12T19:18:16Z

transformer_engine/common/CMakeLists.txt

  endif()
 endif()

+set(NVTE_ARCH_SPECIFIC_TARGETS TRUE)


Unconditional flag disables compile-time safety checks. NVTE_ARCH_SPECIFIC_TARGETS is set to TRUE even when NVTE_SPECIFIC_ARCHS is empty (e.g., user builds only sm_70/80/89/90). This causes NVTE_HAS_ARCH_SPECIFIC_TARGETS=1 to be defined for all arch-specific sources (lines 249-258), which disables the static_assert in ptx.cuh lines 34-38 and 56-60. The compile-time check that catches misuse of arch/family-specific features in generic builds is now always off, even when no arch-specific code generation is happening.

Suggested change

-set(NVTE_ARCH_SPECIFIC_TARGETS TRUE)
+if(NVTE_SPECIFIC_ARCHS)
+  set(NVTE_ARCH_SPECIFIC_TARGETS TRUE)
+endif()
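How such a guard would behave for the two kinds of configuration mentioned in the comment can be sketched as follows. The helper `resolve_arch_specific_flag` and the example lists are hypothetical and exist only for this illustration; the sketch relies on the fact that CMake's `if(<variable>)` is false for an empty list and true for a non-empty one such as "100a".

```cmake
# Hypothetical sketch; run with: cmake -P guard_sketch.cmake
cmake_minimum_required(VERSION 3.18)

# Illustrative helper (not part of TransformerEngine): reports whether the
# arch-specific flag would be enabled for a given NVTE_SPECIFIC_ARCHS value.
function(resolve_arch_specific_flag specific_archs out_var)
  if(specific_archs)
    set(${out_var} TRUE PARENT_SCOPE)
  else()
    set(${out_var} FALSE PARENT_SCOPE)
  endif()
endfunction()

# Generic-only build (e.g. only 70/80/89/90 selected): nothing specific.
resolve_arch_specific_flag("" generic_only_flag)

# Build that includes arch-specific targets (example values).
resolve_arch_specific_flag("100a;120f" with_specific_flag)

message(STATUS "generic-only build:   ${generic_only_flag}")
message(STATUS "with specific archs:  ${with_specific_flag}")
```

Under the suggested change, the compile-time check in ptx.cuh would stay active in the first case and only be relaxed in the second.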

ptrendx added 4 commits January 26, 2026 15:57

Trial and error

8f9804a

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

Should be ok

09e6ca8

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

Fix one issue and add better message to the fallback path

5740300

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

Fixes

71041ae

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

ptrendx requested a review from ksivaman February 9, 2026 23:09

greptile-apps bot reviewed Feb 9, 2026

Merge branch 'main' into pr_get_arch_cmake2

495739c

greptile-apps bot reviewed Feb 11, 2026

Merge branch 'main' into pr_get_arch_cmake2

18f5a3c

greptile-apps bot reviewed Feb 12, 2026

ptrendx wants to merge 6 commits into NVIDIA:main from ptrendx:pr_get_arch_cmake2