IBM benchmark crashes at runtime with AMD OpenMP offloading compiler on Frontier #1158

## Summary

The `ibm` benchmark case crashes immediately at simulation startup when compiled with the AMD compiler (`-c famd`) and run with OpenMP offloading (`--gpu mp`) on Frontier. The case compiles successfully; the failure occurs at runtime, during the initial host-to-device data transfer.

The same case works fine with the CCE compiler (`-c f`).

## Error

```
Simulating a case-optimized 576x288x288 case on 8 rank(s) with OpenMP offloading.
"PluginInterface" error: Failure to copy data from host to device. Pointers: host = 0x0000154c9a2f1010, device = 0x0000154c89200000, size = 0: "unknown or internal error" error in hsa_amd_memory_async_copy_on_engine: HSA_STATUS_ERROR_INVALID_ARGUMENT: One of the actual arguments does not meet a precondition stated in the documentation of the corresponding formal argument.
omptarget error: Copying data to device failed.
omptarget error: Consult https://openmp.llvm.org/design/Runtimes.html for debugging options.
omptarget error: Source location information not present. Compile with -g or -gline-tables-only.
omptarget fatal error 1: failure of target construct while offloading is mandatory
srun: error: frontier10350: tasks 4-7: Aborted (core dumped)
```

All four failing ranks (tasks 4-7) show the same error with `size = 0`. This suggests a zero-sized `omp target` data transfer that the AMD HSA runtime rejects with `HSA_STATUS_ERROR_INVALID_ARGUMENT`, whereas CCE appears to treat it as a no-op.
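To confirm that every failure in a job log is a zero-byte transfer, the log can be scanned for the error line shown above. This is a minimal sketch: the function name `zero_size_copies` and the log file name are hypothetical, and it assumes the job output was captured to a file.

```shell
# Count failed host-to-device copies in a log that report a zero-byte size.
# Matches the "PluginInterface" error line format shown in the Error section.
# Note: grep -c prints 0 and returns a non-zero status when nothing matches.
zero_size_copies() {
    grep -c 'Failure to copy data from host to device.*size = 0:' "$1"
}
```

For example, `zero_size_copies slurm.out` (with `slurm.out` standing in for the captured output) prints the number of zero-byte copy failures; comparing it against the number of aborted tasks shows whether any rank failed for a different reason.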

## Details

- **Case**: `benchmarks/ibm/case.py` — 3D viscous 2-fluid case with immersed boundary sphere (`"ib": "T"`)
- **Unique feature**: IBM is the only benchmark case that enables the immersed boundary method; the other 4 benchmark cases all pass with AMD
- **Build**: Case-optimized (`--case-optimization`), which compiles a custom `case.fpp` that eliminates dead code paths. This may produce zero-sized arrays for unused features that trigger the AMD runtime bug
- **syscheck**: Passes (MPI + OMP GPU detection OK)
- **pre_process**: Passes (576x288x288 on 8 ranks, 3.9s)
- **simulation**: Crashes immediately on launch

## Reproducer

On Frontier:
```bash
. ./mfc.sh load -c famd -m g
./mfc.sh run benchmarks/ibm/case.py --gpu mp --case-optimization -c frontier_amd -n 8
```

## Possible root cause

Case optimization may compile certain IBM-related arrays to zero size. The AMD LLVM OpenMP runtime calls `hsa_amd_memory_async_copy_on_engine` with `size = 0`, which the HSA runtime rejects. CCE presumably skips zero-size copies. A debug build (`-g` or `-gline-tables-only`) would reveal which source line triggers the transfer.
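Besides a debug build, the LLVM OpenMP offload runtime can report data mappings and transfers at runtime through the `LIBOMPTARGET_INFO` environment variable, described on the debugging page linked in the error output. The sketch below assumes the AMD compiler's offload runtime honors this variable and that the environment is passed through to the ranks launched by `./mfc.sh run`; neither has been verified for this case.

```shell
# Enable all informational output from the LLVM OpenMP offload runtime
# (-1 sets every bit of the info bitmask), including data mappings and copies.
# Assumption: the exported variable reaches the simulation ranks.
export LIBOMPTARGET_INFO=-1
```

With the variable set, rerunning the reproducer command above should list the mapped variables and their sizes before the failing copy, which may be enough to identify the zero-sized array.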

## Workaround

The AMD compiler has been removed from the benchmark workflow in #1135 until this is resolved.
