Add support for torch.export exported models #1499
tolleybot wants to merge 2 commits into dotnet:main from
Conversation
@dotnet-policy-service agree
Force-pushed 1f64f5b to af266bd
Build failures: missing LibTorch 2.9.0 packages. I believe the CI builds are failing because the build system requires .sha files for LibTorch package validation, and these are missing for LibTorch 2.9.0.

Missing SHA files:

Package availability check:

Why my local tests passed: I was building against the PyTorch Python installation at

Should we wait for PyTorch to publish all LibTorch 2.9.0 packages?
@masaru-kimura-hacarus Thank you for the detailed investigation and the Gemini Deep Research report! You're absolutely right: I was looking for the wrong package name. I've just pushed the correct SHA files using the new naming convention. Let's see if the CI builds pass now.
@dotnet-policy-service agree
👋 Friendly ping on this PR! It's been open for a little while and I wanted to check if there's anything I can do to help move it forward. Happy to address any feedback or make adjustments as needed.
Force-pushed f5d82b7 to b1c3dac
Rebased onto latest main with the libtorch 2.10 backend. Regenerated all .pt2 test models with PyTorch 2.10. Ready for review.
Pull request overview
Adds a new TorchSharp integration for running PyTorch torch.export / AOTInductor-packaged .pt2 models (via LibTorch 2.9+ torch::inductor::AOTIModelPackageLoader), enabling inference-only execution from .NET.
Changes:
- Introduces native (C++) bindings to load and run `.pt2` packages and wires them into TorchSharp via P/Invoke.
- Adds a managed `torch.export` API (`ExportedProgram` + generic typed returns) to load/run exported programs.
- Adds `.pt2` test fixtures, a Python generator script, and new unit tests covering basic load/run scenarios.
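For context, a minimal sketch of how the new managed surface is meant to be used. Names follow the PR description; the exact signatures and file names are illustrative, not code from this PR:

```csharp
using TorchSharp;
using static TorchSharp.torch;

// Load an AOTInductor-packaged .pt2 model (inference only).
using var program = torch.export.load("model.pt2");

// Run with tensor inputs; run(), forward(), and call() are equivalent.
using var input = torch.randn(1, 10);
var outputs = program.run(input);   // Tensor[] of outputs

// Or use the generic typed wrapper for type-safe returns.
using var typed = torch.export.load<(Tensor, Tensor)>("tuple_out.pt2");
var (a, b) = typed.call(input);
```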
Reviewed changes
Copilot reviewed 11 out of 17 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/Native/LibTorchSharp/THSExport.h | Declares native API for loading/running AOTI .pt2 exported programs. |
| src/Native/LibTorchSharp/THSExport.cpp | Implements the wrapper over torch::inductor::AOTIModelPackageLoader and marshals tensor inputs/outputs. |
| src/Native/LibTorchSharp/Utils.h | Adds ExportedProgram module typedef (and currently the AOTI header include). |
| src/Native/LibTorchSharp/THSJIT.h | Exposes helper declarations intended for sharing with export support. |
| src/Native/LibTorchSharp/CMakeLists.txt | Adds new export source/header to the native build. |
| src/TorchSharp/PInvoke/LibTorchSharp.THSExport.cs | Adds P/Invoke declarations for the new native export APIs. |
| src/TorchSharp/Export/ExportedProgram.cs | Adds managed torch.export.load() + ExportedProgram runtime wrapper and typed-return convenience API. |
| test/TorchSharpTest/TestExport.cs | Adds unit tests covering load/run with single output, multi-input, tuple output, and array output. |
| test/TorchSharpTest/generate_export_models.py | Adds a script to generate AOTInductor-packaged .pt2 test fixtures. |
| test/TorchSharpTest/TorchSharpTest.csproj | Ensures .pt2 fixtures are copied to the test output directory. |
| RELEASENOTES.md | Notes the new torch.export support under API changes. |
Comments suppressed due to low confidence (2)
test/TorchSharpTest/TestExport.cs:75
`ExportedProgram<TResult>` adds special handling for `ValueTuple<,,>` (3 tensor outputs), but the current tests only cover single output, `Tensor[]`, and `ValueTuple<,>`. Add a unit test (and a small generated `.pt2` fixture) that returns 3 tensors to ensure the `ValueTuple<,,>` path works end-to-end.
```csharp
public void TestLoadExport_TupleOutput()
{
    // Test loading a model that returns a tuple
    using var exported = torch.export.load<(Tensor, Tensor)>(@"tuple_out.export.pt2");
    Assert.NotNull(exported);
```
src/Native/LibTorchSharp/Utils.h:8
`Utils.h` is included by most native binding files; adding `torch/csrc/inductor/aoti_package/model_package_loader.h` here makes the entire native build depend on this internal header even when torch.export support isn't used. Since `ExportedProgramModule` is just a pointer typedef, consider forward-declaring `torch::inductor::AOTIModelPackageLoader` and/or moving the include + typedef into `THSExport.h` to keep compile dependencies localized.

```cpp
#include "torch/torch.h"
#include "torch/csrc/inductor/aoti_package/model_package_loader.h"
```
```csharp
}

// Free the native array (tensors are now owned by managed Tensor objects)
Marshal.FreeHGlobal(result_ptr);
```
`result_ptr` is freed with `Marshal.FreeHGlobal`, but the native side allocates the returned pointer array with C++ `new[]` (`new Tensor[...]`). This allocator/free mismatch can crash or corrupt the heap. Expose a native free API that uses `delete[]` (and call it here), or change the native allocation to `malloc`/`CoTaskMemAlloc` to match `FreeHGlobal`.
Fixed. Added a dedicated THSExport_Module_run_free_results() native function that uses delete[] to free the array. The C# side now calls this instead of Marshal.FreeHGlobal.
```cpp
// Allocate output array and copy results
*result_length = outputs.size();
*result_tensors = new Tensor[outputs.size()];
```
The returned pointer array is allocated with new Tensor[outputs.size()] but there is no corresponding exported API to free it from managed code (and FreeHGlobal is not compatible with new[]). Add an exported free function that delete[]s this array (or switch to a caller-provided allocator callback), and consider using size_t/int64_t for result_length to avoid truncation from outputs.size().
Fixed both issues. Added THSExport_Module_run_free_results() for proper delete[] cleanup, and changed result_length from int to int64_t to avoid truncation.
Force-pushed b1c3dac to 9427efe
Hey @tolleybot, can you also address the Copilot comments? Also, there are some test failures from TestExport.
Force-pushed 9427efe to 4b97275
@alinpahontu2912 Thanks for the review! Addressed the Copilot comments inline in each thread. TestExport CI failures — fixed:
Hey @tolleybot, thanks for the work. Unfortunately we cannot run macOS tests in our pipelines, so we would need a strategy that allows running on both Ubuntu and Windows machines. I am also not a fan of adding multiple .pt2 files only to test that export works; I don't think it will scale nicely. Can you think of a solution for this?
@alinpahontu2912 Good points — I agree on both fronts. I've just pushed a commit that removes all 7

These all run on any platform (macOS, Linux, Windows) without needing any model files. The core native load/run/dispose plumbing is still fully exercised through the error paths. For end-to-end model execution testing, that could be handled separately — e.g. via a CI step that generates platform-native
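A sketch of what such a platform-independent, fixture-free test could look like. This is illustrative xUnit code, not the PR's actual tests; the class and test names are invented, and it assumes only that loading a missing or invalid `.pt2` file surfaces a native error as an exception:

```csharp
using System.IO;
using System.Runtime.InteropServices;
using TorchSharp;
using Xunit;

public class TestExportApiSurface
{
    [Fact]
    public void LoadMissingPackageThrows()
    {
        // No .pt2 fixture required: exercising the error path still goes
        // through the native load/dispose plumbing on every platform.
        Assert.ThrowsAny<ExternalException>(
            () => torch.export.load("does_not_exist.pt2"));
    }

    [Fact]
    public void LoadRejectsNonPackageFile()
    {
        // A file that exists but is not a valid AOTI package should also
        // fail cleanly rather than crash.
        var path = Path.GetTempFileName();
        try
        {
            Assert.ThrowsAny<ExternalException>(() => torch.export.load(path));
        }
        finally { File.Delete(path); }
    }
}
```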
just my 2 cents,
I can go ahead and do an interactive rebase to scrub them from history entirely and force-push. Happy to do it if that's preferred.
Adds inference-only support for running PyTorch torch.export / AOTInductor-compiled .pt2 models from .NET via LibTorch's AOTIModelPackageLoader.
- C++ bindings: THSExport.h/.cpp wrapping AOTIModelPackageLoader
- P/Invoke layer: LibTorchSharp.THSExport.cs
- Managed API: torch.export.load() returning ExportedProgram
- Generic typed wrapper: ExportedProgram&lt;T&gt; for Tensor, Tensor[], tuples
- Platform-independent API-surface tests (no .pt2 fixtures required)
Force-pushed 059e8a2 to 519f9af
@masaru-kimura-hacarus I rebased and squashed into a single commit so the
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
```csharp
using System;
using System.Runtime.InteropServices;
using TorchSharp.PInvoke;
```
`using TorchSharp.PInvoke;` appears unused in this file (`NativeMethods` is already referenced via `using static TorchSharp.PInvoke.NativeMethods;`). With `TreatWarningsAsErrors=true`, this can fail the build (CS8019). Remove the unused using directive.

Suggested change: delete the `using TorchSharp.PInvoke;` line.
```csharp
Marshal.Copy(result_ptr, result_handles, 0, count);

for (int i = 0; i < count; i++)
{
    results[i] = new torch.Tensor(result_handles[i]);
}

// Free the native array (tensors are now owned by managed Tensor objects)
THSExport_Module_run_free_results(result_ptr);
```
`THSExport_Module_run_free_results(result_ptr)` is not in a `finally`. If `Marshal.Copy` or `new torch.Tensor(...)` throws, the native result pointer array will leak. Wrap the marshal + tensor construction in a try/finally (or use a SafeHandle) so the native array is always freed on all paths.
Suggested change:

```csharp
try
{
    Marshal.Copy(result_ptr, result_handles, 0, count);
    for (int i = 0; i < count; i++)
    {
        results[i] = new torch.Tensor(result_handles[i]);
    }
}
finally
{
    // Free the native array (tensors are now owned by managed Tensor objects)
    THSExport_Module_run_free_results(result_ptr);
}
```
Fixed — wrapped in try/finally.
```csharp
THSExport_Module_run(handle, input_handles, inputs.Length, out IntPtr result_ptr, out long result_length);
torch.CheckForErrors();

// Marshal result array
```
`result_length` is an `Int64` from native code but is cast to `int` without validation. If a model ever returns more than `int.MaxValue` outputs, this will overflow and lead to incorrect allocations/marshalling. Add a checked cast or an explicit range check and throw a clear exception when the length is out of range.
Suggested change:

```csharp
// Marshal result array
if (result_length < 0 || result_length > int.MaxValue)
{
    throw new InvalidOperationException(
        $"Native export run returned an out-of-range result length: {result_length}.");
}
```
Fixed — added range check before casting.
```csharp
var genericType = typeof(TResult).GetGenericTypeDefinition();
if (genericType == typeof(ValueTuple<,>))
{
    if (results.Length != 2)
        throw new InvalidOperationException($"Expected 2 output tensors, got {results.Length}");
    return (TResult)Activator.CreateInstance(typeof(TResult), results[0], results[1]);
}
if (genericType == typeof(ValueTuple<,,>))
{
    if (results.Length != 3)
        throw new InvalidOperationException($"Expected 3 output tensors, got {results.Length}");
    return (TResult)Activator.CreateInstance(typeof(TResult), results[0], results[1], results[2]);
```
The tuple branch assumes any `ValueTuple<,>` / `ValueTuple<,,>` is a tuple of tensors. If `TResult` is e.g. `ValueTuple<int,int>`, `Activator.CreateInstance` will throw a confusing reflection exception. Validate that the tuple generic arguments are `torch.Tensor` (or otherwise explicitly supported) and throw a clear `NotSupportedException`/`ArgumentException` when they are not.
Suggested change:

```csharp
var resultType = typeof(TResult);
var genericType = resultType.GetGenericTypeDefinition();
if (genericType == typeof(ValueTuple<,>))
{
    var typeArgs = resultType.GetGenericArguments();
    if (typeArgs.Length != 2 ||
        typeArgs[0] != typeof(torch.Tensor) ||
        typeArgs[1] != typeof(torch.Tensor))
    {
        throw new NotSupportedException(
            $"Tuple return type {resultType} is not supported. Only ValueTuple<torch.Tensor, torch.Tensor> is supported.");
    }
    if (results.Length != 2)
        throw new InvalidOperationException($"Expected 2 output tensors, got {results.Length}");
    return (TResult)Activator.CreateInstance(resultType, results[0], results[1]);
}
if (genericType == typeof(ValueTuple<,,>))
{
    var typeArgs = resultType.GetGenericArguments();
    if (typeArgs.Length != 3 ||
        typeArgs[0] != typeof(torch.Tensor) ||
        typeArgs[1] != typeof(torch.Tensor) ||
        typeArgs[2] != typeof(torch.Tensor))
    {
        throw new NotSupportedException(
            $"Tuple return type {resultType} is not supported. Only ValueTuple<torch.Tensor, torch.Tensor, torch.Tensor> is supported.");
    }
    if (results.Length != 3)
        throw new InvalidOperationException($"Expected 3 output tensors, got {results.Length}");
    return (TResult)Activator.CreateInstance(resultType, results[0], results[1], results[2]);
```
Fixed — now validates generic type arguments are torch.Tensor and throws NotSupportedException otherwise.
```cpp
*result_length = static_cast<int64_t>(outputs.size());
*result_tensors = new Tensor[outputs.size()];

for (size_t i = 0; i < outputs.size(); i++) {
    (*result_tensors)[i] = new torch::Tensor(outputs[i]);
```
If an exception occurs after allocating *result_tensors (or after allocating some of the new torch::Tensor(...) outputs), the current CATCH path will leak the partially allocated array/tensors, and the out-params may be left in an indeterminate state. Initialize *result_tensors = nullptr / *result_length = 0 before the try block and ensure allocations are cleaned up on failure (e.g., use RAII/smart pointers and only release ownership on success).
Fixed — out-params initialized to nullptr/0 before the try block, and only assigned after full success.
test/TorchSharpTest/TestExport.cs
Outdated
```csharp
// Verify that double-dispose doesn't throw.
// We can't construct a valid ExportedProgram without a real model,
// so we catch the load error and verify we can still call Dispose
// without crashing (the constructor should have cleaned up already).
ExportedProgram? program = null;
try
{
    program = torch.export.load("nonexistent.pt2");
}
catch (ExternalException)
{
    // Expected - the file doesn't exist
}

// If somehow a program was created (shouldn't happen), dispose it twice
if (program != null)
{
    program.Dispose();
    program.Dispose(); // second dispose should not throw
}

// The fact that we reach here without crashing validates idempotent cleanup
```
This test doesn't actually validate idempotent Dispose(): when torch.export.load("nonexistent.pt2") throws, program remains null and the double-dispose branch is never executed. Either create/load a valid .pt2 test artifact (and then dispose twice), or construct an ExportedProgram instance in a way that allows exercising Dispose() without relying on a failing load.
Suggested change:

```csharp
// Verify that double-dispose on a real ExportedProgram doesn't throw.
// Create a simple model and export it to obtain an ExportedProgram instance.
var model = torch.nn.Linear(10, 10);
using var input = torch.randn(1, 10);
using var exported = torch.export.export(model, input);
// Dispose explicitly twice to validate idempotence.
exported.Dispose();
exported.Dispose(); // second dispose should not throw
```
Removed — the test never exercised the dispose path since the constructor throws before an object is assigned. Idempotent dispose will get proper coverage once end-to-end tests with real models are added.
- Remove unused using directive (TorchSharp.PInvoke)
- Wrap marshal/tensor construction in try/finally to prevent native leak
- Add range check on result_length before casting to int
- Validate tuple generic type arguments are torch.Tensor
- Initialize native out-params before try block for safe error paths
- Remove no-op dispose test that never exercised the dispose path



Add support for torch.export exported models (#1498)
Implements functionality to load and execute PyTorch models exported via torch.export (.pt2 files), enabling .NET applications to run ExportedProgram models as the PyTorch ecosystem transitions from ONNX to torch.export.
Summary
This PR adds support for loading and running AOTInductor-compiled `.pt2` models in TorchSharp using `torch::inductor::AOTIModelPackageLoader` from LibTorch 2.9+.

Key Points:
`torch._inductor.aoti_compile_and_package()` in Python

Implementation
Native Layer (C++)
Files:
- `src/Native/LibTorchSharp/Utils.h`: Added AOTIModelPackageLoader header include
- `src/Native/LibTorchSharp/THSExport.h`: C++ API declarations
- `src/Native/LibTorchSharp/THSExport.cpp`: Implementation using `torch::inductor::AOTIModelPackageLoader`

Key Changes:
Managed Layer (C#)
Files:
- `src/TorchSharp/PInvoke/LibTorchSharp.THSExport.cs`: PInvoke declarations
- `src/TorchSharp/Export/ExportedProgram.cs`: High-level C# API

API Design:
Features:
- `IDisposable` for proper resource cleanup
- `ExportedProgram<TResult>` for type-safe returns
- `run()`, `forward()`, and `call()` methods (all equivalent)

Testing
Files:
- `test/TorchSharpTest/TestExport.cs`: 7 comprehensive unit tests
- `test/TorchSharpTest/generate_export_models.py`: Python script to generate test models
- `test/TorchSharpTest/*.pt2`: 6 test models

Test Coverage:
All 7 tests pass successfully.
Dependencies
Updated:
- `build/Dependencies.props`: Updated LibTorch from 2.7.1 to 2.9.0

LibTorch 2.9.0 includes the `torch::inductor::AOTIModelPackageLoader` implementation that was previously only available in PyTorch source code.

Technical Details
Two .pt2 Formats
PyTorch has two different .pt2 export formats:
- Python-only (from `torch.export.save()`):
- AOTInductor-compiled (from `torch._inductor.aoti_compile_and_package()`):

Python Model Generation
To create compatible .pt2 files:
Limitations
Performance
According to PyTorch documentation, AOTInductor provides:
Testing
Migration Guide
For users currently using TorchScript:
Before (TorchScript):
After (torch.export):
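A minimal illustration of what this migration could look like on the .NET side. The file names, shapes, and the exact shape of the new export API are assumptions for illustration, not code taken from this PR:

```csharp
using TorchSharp;
using static TorchSharp.torch;

using var x = torch.randn(1, 10);

// Before (TorchScript): load a scripted/traced module via torch.jit.
using var scripted = torch.jit.load("model.ts.pt");   // hypothetical path
var y1 = scripted.forward(x);

// After (torch.export): load an AOTInductor-packaged .pt2 model instead.
using var exported = torch.export.load("model.pt2");  // hypothetical path
var y2 = exported.forward(x);
```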
References
- `torch/csrc/inductor/aoti_package/model_package_loader.h`

Fixes #1498