Description
Summary
The ibm benchmark case crashes immediately at simulation startup when compiled with the AMD compiler (-c famd) and run with OpenMP offloading (--gpu mp) on Frontier. The case compiles successfully — the failure is at runtime during the initial host-to-device data transfer.
The same case works fine with the CCE compiler (-c f).
Error
Simulating a case-optimized 576x288x288 case on 8 rank(s) with OpenMP offloading.
"PluginInterface" error: Failure to copy data from host to device. Pointers: host = 0x0000154c9a2f1010, device = 0x0000154c89200000, size = 0: "unknown or internal error" error in hsa_amd_memory_async_copy_on_engine: HSA_STATUS_ERROR_INVALID_ARGUMENT: One of the actual arguments does not meet a precondition stated in the documentation of the corresponding formal argument.
omptarget error: Copying data to device failed.
omptarget error: Consult https://openmp.llvm.org/design/Runtimes.html for debugging options.
omptarget error: Source location information not present. Compile with -g or -gline-tables-only.
omptarget fatal error 1: failure of target construct while offloading is mandatory
srun: error: frontier10350: tasks 4-7: Aborted (core dumped)
All four failing ranks (tasks 4-7) report the same error with size = 0. This suggests a zero-sized OpenMP target data transfer that the AMD HSA runtime rejects with HSA_STATUS_ERROR_INVALID_ARGUMENT, while the CCE runtime presumably treats it as a no-op.
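To confirm that every failing rank reports the same transfer size, the captured job output can be summarized with a small filter. This is only a sketch: `summarize_copy_failures` is a hypothetical helper name, and it assumes the job's stderr was saved to a file and that the message format matches the log shown above.

```shell
# Hypothetical helper: count host-to-device copy failures per reported size.
# Assumes the message format shown in the log above ("..., size = N: ...").
summarize_copy_failures() {
    log_file="$1"
    grep 'Failure to copy data from host to device' "$log_file" |
        sed -n 's/.*size = \([0-9]*\):.*/size=\1/p' |   # keep only the size field
        sort | uniq -c                                  # count failures per size
}
```

Calling `summarize_copy_failures <captured-log>` prints one line per distinct size, prefixed by how many failures reported it. A single `size=0` line would be consistent with the zero-sized-transfer hypothesis.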
Details
- Case: benchmarks/ibm/case.py, a 3D viscous 2-fluid case with an immersed boundary sphere ("ib": "T")
- Unique feature: IBM is the only benchmark case that enables the immersed boundary method; the other 4 benchmark cases all pass with AMD
- Build: Case-optimized (--case-optimization), which compiles a custom case.fpp that eliminates dead code paths. This may produce zero-sized arrays for unused features that trigger the AMD runtime bug
- syscheck: Passes (MPI + OMP GPU detection OK)
- pre_process: Passes (576x288x288 on 8 ranks, 3.9s)
- simulation: Crashes immediately on launch
Reproducer
On Frontier:
. ./mfc.sh load -c famd -m g
./mfc.sh run benchmarks/ibm/case.py --gpu mp --case-optimization -c frontier_amd -n 8
Possible root cause
Case optimization may compile certain IBM-related arrays to zero size. The AMD LLVM OpenMP runtime calls hsa_amd_memory_async_copy_on_engine with size = 0, which the HSA runtime rejects. CCE presumably skips zero-size copies. A debug build (-g or -gline-tables-only) would reveal which source line triggers the transfer.
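Besides a debug build, the LLVM OpenMP offload runtime documents a `LIBOMPTARGET_INFO` environment variable that logs data mappings and host/device copies, which could help narrow down the zero-sized transfer. The wrapper below is a rough sketch with assumptions: `trace_offload_run` is a hypothetical helper, and it is not confirmed here that the AMD compiler's runtime honors this variable or that `./mfc.sh run` forwards the environment to the launched ranks.

```shell
# Hypothetical wrapper: run a command with offload tracing enabled and keep a copy
# of its combined stdout/stderr in a log file.
# Assumption: the runtime honors LIBOMPTARGET_INFO as described in the LLVM docs;
# -1 turns on every information flag.
trace_offload_run() {
    log_file="$1"
    shift
    LIBOMPTARGET_INFO=-1 "$@" 2>&1 | tee "$log_file"
}
```

For example, `trace_offload_run trace.log ./mfc.sh run benchmarks/ibm/case.py --gpu mp --case-optimization -c frontier_amd -n 8` reruns the reproducer with tracing on (`trace.log` is an arbitrary file name). The last mapping or copy reported before the failure would be the first place to look.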
Workaround
The AMD compiler has been removed from the benchmark workflow in #1135 until this is resolved.