Conversation
Signed-off-by: Ziang Li <ziangli@umich.edu>
for more information, see https://pre-commit.ci
Greptile Overview

Greptile Summary

Critical Issues:
Implementation:
Confidence Score: 1/5
Important Files Changed
Sequence Diagram

sequenceDiagram
participant User
participant Recipe
participant Linear
participant BasicLinear
participant Quantize
User->>Recipe: Set NVTE_KEEP_BACKWARD_UNQUANTIZED=1
Recipe->>Recipe: quantize_backward = False
Note over Recipe: DelayedScaling: CRASHES HERE<br/>(assertion at line 220)
User->>Linear: forward(input)
Linear->>Linear: keep_backward_unquantized = True
Linear->>Linear: save_original_input = True
Linear->>Quantize: quantize(input)
Quantize->>Quantize: Check recipe.quantize_forward
Note over Quantize: Potential crash if recipe is None
Quantize-->>Linear: quantized_input (FP8)
Linear->>BasicLinear: forward(quantized_input, weight)
BasicLinear->>BasicLinear: Save high-precision input for backward
BasicLinear-->>Linear: output
User->>Linear: backward(grad_output)
Linear->>BasicLinear: backward(grad_output)
Note over BasicLinear: Uses high-precision saved tensors<br/>Skip quantization in backward
BasicLinear->>BasicLinear: wgrad = grad_output @ input_hp
BasicLinear->>BasicLinear: dgrad = grad_output @ weight_hp
BasicLinear-->>Linear: grad_input (high precision)
Linear-->>User: gradients (BF16/FP32)
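For intuition, here is a hedged numerical sanity check of the flow the diagram describes. This is only a sketch, not the PR's test: it assumes an FP8-capable GPU, that the env var is read before the forward pass, and it uses Float8CurrentScaling because the thread notes DelayedScaling rejects an unquantized backward.

import os
os.environ["NVTE_KEEP_BACKWARD_UNQUANTIZED"] = "1"  # assumed to be read before the forward pass

import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import Float8CurrentScaling

torch.manual_seed(0)
layer = te.Linear(128, 128, params_dtype=torch.bfloat16).cuda()
ref = torch.nn.Linear(128, 128, dtype=torch.bfloat16, device="cuda")
with torch.no_grad():
    ref.weight.copy_(layer.weight)
    ref.bias.copy_(layer.bias)

x = torch.randn(32, 128, dtype=torch.bfloat16, device="cuda", requires_grad=True)
x_ref = x.detach().clone().requires_grad_()

with te.fp8_autocast(enabled=True, fp8_recipe=Float8CurrentScaling()):
    y = layer(x)        # forward GEMM runs quantized
y.sum().backward()      # wgrad/dgrad use the saved high-precision tensors

ref(x_ref).sum().backward()

# With an unquantized backward, gradients should track the BF16 reference closely.
torch.testing.assert_close(x.grad, x_ref.grad, rtol=5e-2, atol=5e-2)
torch.testing.assert_close(layer.weight.grad, ref.weight.grad, rtol=5e-2, atol=5e-2)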
I'll work on potential unit test breakage.
Signed-off-by: Ziang Li <ziangli@umich.edu>
Signed-off-by: Ziang Li <ziangli@umich.edu>
Signed-off-by: Ziang Li <ziangli@umich.edu>
Signed-off-by: Ziang Li <ziangli@umich.edu>
for more information, see https://pre-commit.ci
Signed-off-by: Ziang Li <ziangli@umich.edu>
… is used

Signed-off-by: Ziang Li <ziangli@umich.edu>
ln_out_return = None
if return_layernorm_output or return_layernorm_output_gathered:
    ln_out_return = ln_out
ln_out_hp = ln_out if keep_backward_unquantized else None
Storing both ln_out (quantized) and ln_out_hp (high precision) roughly doubles the memory footprint for this activation. Verify that this overhead is acceptable for your target models, especially during training with large batch sizes or long sequences.
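For a rough sense of scale (hypothetical shape, not from the PR), the extra BF16 copy alone costs:

# Hypothetical activation shape; the added cost is one BF16 copy of ln_out
# (2 bytes per element), on top of the quantized tensor that is saved today.
batch, seq, hidden = 8, 4096, 8192
extra_gib = 2 * batch * seq * hidden / 2**30
print(f"extra activation memory per layer: {extra_gib:.2f} GiB")  # 0.50 GiB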
Signed-off-by: Ziang Li <ziangli@umich.edu>
Signed-off-by: Ziang Li <ziangli@umich.edu>
Signed-off-by: Ziang Li <ziangli@umich.edu>
Signed-off-by: Ziang Li <ziangli@umich.edu>
Signed-off-by: Ziang Li <ziangli@umich.edu>
Signed-off-by: Ziang Li <ziangli@umich.edu>
Signed-off-by: Ziang Li <ziangli@umich.edu>
# Check if FP8 is enabled
fp8_enabled = FP8GlobalStateManager.is_fp8_enabled()
quantize_forward = fp8_enabled and self._quantize_forward
quantize_backward = fp8_enabled and self._quantize_backward
quantize_backward = (
Recipe None crash
FP8GlobalStateManager.get_fp8_recipe() can be None (e.g., if FP8 is enabled but no recipe was set), so ...get_fp8_recipe().quantize_backward will raise an AttributeError. This makes Quantize.forward() crash in that configuration; please guard for None (e.g., default to quantized backward when the recipe is unset, or raise an explicit error with a clearer message).
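A minimal sketch of such a guard inside the Quantize op (variable names follow the hunk above; falling back to the op's own flags when no recipe is set is just one possible policy):

# Sketch: None-safe recipe override.
fp8_enabled = FP8GlobalStateManager.is_fp8_enabled()
recipe = FP8GlobalStateManager.get_fp8_recipe() if fp8_enabled else None

quantize_forward = fp8_enabled and self._quantize_forward
quantize_backward = fp8_enabled and self._quantize_backward
if recipe is not None:
    # Apply recipe-level overrides only when a recipe has actually been set.
    quantize_forward = quantize_forward and recipe.quantize_forward
    quantize_backward = quantize_backward and recipe.quantize_backward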
Signed-off-by: Ziang Li <ziangli@umich.edu>
for more information, see https://pre-commit.ci
assert not (
    not self.quantize_forward and self.quantize_backward
), "Invalid recipe configuration: quantize_backward=True requires quantize_forward=True."
assert self.quantize_backward, "Delayed scaling does not support quantize_backward=False."
This assertion prevents using NVTE_KEEP_BACKWARD_UNQUANTIZED=1 with the DelayedScaling recipe: when the env var is set, quantize_backward becomes False, so the assert fails and blocks the entire feature for this recipe type.
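Since the diff elsewhere notes that the env var is ignored when delayed scaling is used, one hedged option is to warn and fall back instead of asserting. A sketch only, assuming these checks live in DelayedScaling.__post_init__ and the dataclass is mutable:

import warnings

# Sketch: ignore the env var for DelayedScaling instead of crashing.
if not self.quantize_backward:
    warnings.warn(
        "DelayedScaling does not support quantize_backward=False; "
        "ignoring NVTE_KEEP_BACKWARD_UNQUANTIZED for this recipe."
    )
    self.quantize_backward = True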
assert (
    not keep_backward_unquantized
), "NVTE_KEEP_BACKWARD_UNQUANTIZED is not implemented in LayerNormMLP"
This assertion makes LayerNormMLP crash immediately when NVTE_KEEP_BACKWARD_UNQUANTIZED=1 is set. If this module doesn't support the feature, either implement it or handle the case more gracefully, with a clear error message, before reaching this point.
Additional Comments (1)
Signed-off-by: Ziang Li <ziangli@umich.edu>
# Save state for backward pass
if ctx.requires_grad:
    saved_input = input_ if keep_backward_unquantized else x_local
    saved_weight = self.weight if keep_backward_unquantized else w
    if is_cpu_offload_enabled():
        mark_activation_offload(x_local)
    ctx.save_for_backward(x_local, w)
    ctx.with_quantized_compute = with_quantized_compute
        mark_activation_offload(saved_input)
    ctx.save_for_backward(saved_input, saved_weight)
Unnecessary saved tensors
The forward path now saves saved_input/saved_weight whenever ctx.requires_grad is true, even when weight_requires_grad or input_requires_grad is false. In cases like frozen weights (common for finetuning) or when only one side needs grads, this saves extra tensors and can materially increase activation memory. The prior logic of setting saved_input = None when not weight_requires_grad, and saved_weight = None when not input_requires_grad, avoided that.
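A sketch of how the earlier pruning could coexist with the new flag (names follow the hunk and the comment above; weight_requires_grad / input_requires_grad as in the prior code):

# Sketch: honor keep_backward_unquantized, but still drop tensors that
# backward will never read (no wgrad -> input not needed, no dgrad -> weight not needed).
if ctx.requires_grad:
    saved_input = input_ if keep_backward_unquantized else x_local
    saved_weight = self.weight if keep_backward_unquantized else w
    if not weight_requires_grad:
        saved_input = None
    if not input_requires_grad:
        saved_weight = None
    if is_cpu_offload_enabled() and saved_input is not None:
        mark_activation_offload(saved_input)
    ctx.save_for_backward(saved_input, saved_weight)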
Signed-off-by: Ziang Li <ziangli@umich.edu>
Additional Comments (1)
Currently, without
quantize_forward : bool, default = True
    Whether to quantize tensors in the forward pass.
quantize_backward : bool, default = True
    Whether to quantize tensors in the backward pass.
Not sure we need that for the custom recipe, since there we can just specify the quantizers we want, but sure, we can have it to keep the API consistent.
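For reference, a hypothetical sketch of how the flags might be used on a recipe, assuming they end up as plain recipe fields as the docstring above describes (the recipe class and keyword argument here are illustrative, not confirmed API):

import transformer_engine.pytorch as te
from transformer_engine.common.recipe import Float8CurrentScaling

# Hypothetical: quantize the forward GEMMs only, keep wgrad/dgrad in high precision.
recipe = Float8CurrentScaling(quantize_backward=False)

with te.fp8_autocast(enabled=True, fp8_recipe=recipe):
    out = linear_layer(inp)  # linear_layer and inp are placeholders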
)
assert (
    not keep_backward_unquantized
), "NVTE_KEEP_BACKWARD_UNQUANTIZED is not implemented in LayerNormMLP"
)
if keep_backward_unquantized:
    # Note, NVTE_KEEP_BACKWARD_UNQUANTIZED is ignored when delayed scaling is used
    save_original_input = True
We should also make sure that we don't create the columnwise version of the input.
The input_quantizer's columnwise usage is disabled here: https://github.com/NVIDIA/TransformerEngine/pull/2644/changes/BASE..253873a4560b2c2a2c909918cc3ee26500e5b43d#diff-864ad36a21c571fb178499535cfada611df4a82223c9ffbfea872dda39972eaeR335-R342
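For context, a sketch of the kind of call involved, assuming the quantizer's set_usage(rowwise=..., columnwise=...) helper from current TE:

# Sketch: wgrad will use the saved high-precision input, so the quantized
# input only needs its rowwise (forward-GEMM) layout.
if keep_backward_unquantized and input_quantizer is not None:
    input_quantizer.set_usage(rowwise=True, columnwise=False)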
@zianglih Thank you for your contribution!
Signed-off-by: Ziang Li <ziangli@umich.edu>
Signed-off-by: Ziang Li <ziangli@umich.edu>
for more information, see https://pre-commit.ci
assert not (
    not self.quantize_forward and self.quantize_backward
), "Invalid recipe configuration: quantize_backward=True requires quantize_forward=True."
assert self.quantize_backward, "Delayed scaling does not support quantize_backward=False."
Blocks NVTE_KEEP_BACKWARD_UNQUANTIZED=1 with DelayedScaling: when the env var is set, quantize_backward becomes False, triggering this assertion and preventing the feature from working with this recipe type.
assert (
    not keep_backward_unquantized
Hard crash when NVTE_KEEP_BACKWARD_UNQUANTIZED=1 is set: LayerNormMLP becomes completely unusable with this env var.
Hi @zhongbozhu @timmoon10 @ptrendx, thank you so much for reviewing! I have implemented and added the unit test. All new tests passed:
Hi @timmoon10, @ptrendx,
This design was from @timmoon10's comment here: #2644 (comment)
Which way do we prefer? Thanks!
Signed-off-by: Ziang Li <ziangli@umich.edu>
assert not (
    not self.quantize_forward and self.quantize_backward
), "Invalid recipe configuration: quantize_backward=True requires quantize_forward=True."
assert self.quantize_backward, "Delayed scaling does not support quantize_backward=False."
Blocks NVTE_KEEP_BACKWARD_UNQUANTIZED=1 with the DelayedScaling recipe: when the env var is set, quantize_backward becomes False, but this assertion requires it to be True, so the feature cannot work with this recipe type at all.
Suggested change:
- assert self.quantize_backward, "Delayed scaling does not support quantize_backward=False."
+ # Note: DelayedScaling does not support quantize_backward=False yet
assert (
    not keep_backward_unquantized
), "NVTE_KEEP_BACKWARD_UNQUANTIZED is not implemented in LayerNormMLP"
Hard crash when NVTE_KEEP_BACKWARD_UNQUANTIZED=1: setting the env var makes LayerNormMLP completely unusable; it crashes immediately on first use.
Suggested change:
- assert (
-     not keep_backward_unquantized
- ), "NVTE_KEEP_BACKWARD_UNQUANTIZED is not implemented in LayerNormMLP"
+ if keep_backward_unquantized:
+     raise NotImplementedError(
+         "NVTE_KEEP_BACKWARD_UNQUANTIZED is not yet implemented in LayerNormMLP"
+     )
# Recipe quantize overrides
if FP8GlobalStateManager.get_fp8_recipe() is not None:
    quantize_forward = (
        quantize_forward and FP8GlobalStateManager.get_fp8_recipe().quantize_forward
    )
    quantize_backward = (
        quantize_backward and FP8GlobalStateManager.get_fp8_recipe().quantize_backward
get_fp8_recipe() returns None when FP8 is enabled but no recipe has been set; calling .quantize_backward on None will crash with an AttributeError.
Suggested change:
- # Recipe quantize overrides
- if FP8GlobalStateManager.get_fp8_recipe() is not None:
-     quantize_forward = (
-         quantize_forward and FP8GlobalStateManager.get_fp8_recipe().quantize_forward
-     )
-     quantize_backward = (
-         quantize_backward and FP8GlobalStateManager.get_fp8_recipe().quantize_backward
+ # Recipe quantize overrides
+ recipe = FP8GlobalStateManager.get_fp8_recipe()
+ if recipe is not None:
+     quantize_forward = quantize_forward and recipe.quantize_forward
+     quantize_backward = quantize_backward and recipe.quantize_backward
Full unit test results, with the newly added
Description
Add an NVTE_KEEP_BACKWARD_UNQUANTIZED env var for quantized fprop + high precision wgrad & dgrad.
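A hypothetical end-to-end usage sketch (the recipe choice is illustrative; per the review discussion, DelayedScaling does not support the unquantized backward path):

import os
os.environ["NVTE_KEEP_BACKWARD_UNQUANTIZED"] = "1"  # quantized fprop, high-precision bprop

import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import Float8CurrentScaling

layer = te.Linear(1024, 1024, params_dtype=torch.bfloat16).cuda()
x = torch.randn(16, 1024, dtype=torch.bfloat16, device="cuda", requires_grad=True)

with te.fp8_autocast(enabled=True, fp8_recipe=Float8CurrentScaling()):
    y = layer(x)    # forward GEMM in FP8
y.sum().backward()  # wgrad and dgrad computed from the saved BF16 tensors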
Type of change
Changes
Please list the changes introduced in this PR:
Checklist: