Skip to content

IBM benchmark crashes at runtime with AMD OpenMP offloading compiler on Frontier #1158

@sbryngelson

Description

@sbryngelson

Summary

The ibm benchmark case crashes immediately at simulation startup when compiled with the AMD compiler (-c famd) and run with OpenMP offloading (--gpu mp) on Frontier. The case compiles successfully — the failure is at runtime during the initial host-to-device data transfer.

The same case works fine with the CCE compiler (-c f).

Error

Simulating a case-optimized 576x288x288 case on 8 rank(s) with OpenMP offloading.
"PluginInterface" error: Failure to copy data from host to device. Pointers: host = 0x0000154c9a2f1010, device = 0x0000154c89200000, size = 0: "unknown or internal error" error in hsa_amd_memory_async_copy_on_engine: HSA_STATUS_ERROR_INVALID_ARGUMENT: One of the actual arguments does not meet a precondition stated in the documentation of the corresponding formal argument.
omptarget error: Copying data to device failed.
omptarget error: Consult https://openmp.llvm.org/design/Runtimes.html for debugging options.
omptarget error: Source location information not present. Compile with -g or -gline-tables-only.
omptarget fatal error 1: failure of target construct while offloading is mandatory
srun: error: frontier10350: tasks 4-7: Aborted (core dumped)

All 4 failing ranks show the same error with size = 0, suggesting a zero-sized omp target data transfer that the AMD HSA runtime rejects (HSA_STATUS_ERROR_INVALID_ARGUMENT) while CCE treats as a no-op.

Details

  • Case: benchmarks/ibm/case.py — 3D viscous 2-fluid case with immersed boundary sphere ("ib": "T")
  • Unique feature: IBM is the only benchmark case that enables the immersed boundary method; the other 4 benchmark cases all pass with AMD
  • Build: Case-optimized (--case-optimization), which compiles a custom case.fpp that eliminates dead code paths. This may produce zero-sized arrays for unused features that trigger the AMD runtime bug
  • syscheck: Passes (MPI + OMP GPU detection OK)
  • pre_process: Passes (576x288x288 on 8 ranks, 3.9s)
  • simulation: Crashes immediately on launch

Reproducer

On Frontier:

. ./mfc.sh load -c famd -m g
./mfc.sh run benchmarks/ibm/case.py --gpu mp --case-optimization -c frontier_amd -n 8

Possible root cause

Case optimization may compile certain IBM-related arrays to zero size. The AMD LLVM OpenMP runtime calls hsa_amd_memory_async_copy_on_engine with size = 0, which the HSA runtime rejects. CCE presumably skips zero-size copies. A debug build (-g or -gline-tables-only) would reveal which source line triggers the transfer.

Workaround

The AMD compiler has been removed from the benchmark workflow in #1135 until this is resolved.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions