
MIBM Performance Optimizations #1157

Open

danieljvickers wants to merge 16 commits into MFlowCode:master from danieljvickers:gpu-optimizations

Conversation

@danieljvickers
Member

@danieljvickers danieljvickers commented Feb 17, 2026

Description

Following the levelset refactor, several performance optimizations remained to be made to the code. This PR introduces optimizations that make multi-particle MIBM runs viable. It also raises the upper bound on the allowed number of immersed boundaries to 1000. Performance was measured on 1-4 ranks of ACC GPU compute using A100 GPUs.

This PR also extends the optimizations to STL IBs, which are now fully capable of running on the GPU. An example benchmark case is given below.

Type of change

  • New feature
  • Refactor
  • Other: Performance Tuning

Testing

All changes pass the IBM section of the test suite on GPUs with the NVHPC compiler. Performance was measured with a case of 1000 particles with viscosity enabled. The particles are all resolved 3D spheres placed at random, non-overlapping positions generated by the following case file:

import json
import argparse

num_cells = [240, 240, 240]
dim = [8., 8., 8.]
num_particles = 1000

# create a stationary fluid
case = {
    "run_time_info": "T",
    "parallel_io": "T",
    "m": num_cells[0]-1,
    "n": num_cells[1]-1,
    "p": num_cells[2]-1,
    "dt": 0.005,
    "t_step_start": 0,
    "t_step_stop": 2,
    "t_step_save": 1,
    "num_patches": 1,
    "model_eqns": 2,
    "alt_soundspeed": "F",
    "num_fluids": 1,
    "mpp_lim": "F",
    "mixture_err": "T",
    "time_stepper": 3,
    "recon_type": 1,
    "weno_eps": 1e-16,
    "riemann_solver": 2,
    "wave_speeds": 1,
    "avg_state": 2,
    "precision": 2,
    "format": 1,
    "prim_vars_wrt": "T",
    "E_wrt": "T",
    "viscous": "T",
    "x_domain%beg": -0.5*dim[0],
    "x_domain%end": 0.5*dim[0],
    "y_domain%beg": -0.5*dim[1],
    "y_domain%end": 0.5*dim[1],
    "z_domain%beg": -0.5*dim[2],
    "z_domain%end": 0.5*dim[2],
    "bc_x%beg": -3,
    "bc_x%end": -3,
    "bc_y%beg": -3,
    "bc_y%end": -3,
    "bc_z%beg": -3,
    "bc_z%end": -3,
    "patch_icpp(1)%geometry": 9,
    "patch_icpp(1)%z_centroid": 0.0,
    "patch_icpp(1)%length_z": dim[2],
    "patch_icpp(1)%y_centroid": 0.0,
    "patch_icpp(1)%length_y": dim[1],
    "patch_icpp(1)%x_centroid": 0.0,
    "patch_icpp(1)%length_x": dim[0],
    "weno_order": 5,
    "patch_icpp(1)%pres": 1.0,
    "patch_icpp(1)%alpha_rho(1)": 1.0,
    "patch_icpp(1)%alpha(1)": 1.0,
    "patch_icpp(1)%vel(1)": 0.0,
    "patch_icpp(1)%vel(2)": 0.0,
    "patch_icpp(1)%vel(3)": 0.0,
    "fluid_pp(1)%gamma": 2.5000000000000004,
    "fluid_pp(1)%pi_inf": 0.0,
    "fluid_pp(1)%Re(1)": 2500000,
}

import random
random.seed(42)

dx = [float(dim[i]) / float(num_cells[i]) for i in range(3)]

# set particle properties
particle_radius_cells = 5  # particle radius in grid cells
particle_cell_spacing = particle_radius_cells*2 + 5  # spacing of twice the radius plus 5 cells to guarantee no image points land in other IBs
mpi_cell_spacing = particle_radius_cells + 5  # keep particles safely away from MPI halo regions to prevent out-of-bounds errors
generation_bounds = [6., 6., 6.]  # generate particles in this box, safely away from the domain boundary
velocity_magnitude = 1.

# convert non-dimensional grid-cell units to the units of the simulation
radius_units = float(particle_radius_cells) * dx[0]
particle_units_spacing = float(particle_cell_spacing) * dx[0]
particle_units_spacing_squared = particle_units_spacing**2
mpi_units_spacing = [float(mpi_cell_spacing) * dx[i] for i in range(3)]

# generate an array of xyz positions that guarantees non-overlapping particles
particles = []
while len(particles) < num_particles:
  # generate a completely random position
  position = [random.random() for i in range(3)]
  position = [(position[i] - 0.5) * generation_bounds[i] for i in range(3)]

  # optionally check that the particle is not too close to the MPI halo regions (disabled here)
  valid = True
  # for i in range(3):
  #   valid = valid and abs(position[i]) >= mpi_units_spacing[i]

  # check the minimum spacing against all existing particles, exiting early once a violation is found
  for particle in particles:
    distance_squared = sum([(particle[i] - position[i])**2 for i in range(3)])
    valid = valid and distance_squared >= particle_units_spacing_squared
    if not valid:
      break

  if valid:
    particles.append(position)
    # print(f"\rProgress: {100.*float(len(particles))/float(num_particles)}%", end="", flush=True)

# print()
# convert our array of positions into valid JSON entries for the immersed boundaries
ib_properties = {"ib": "T", "num_ibs": num_particles,}
for i in range(len(particles)):
  ib_properties[f'patch_ib({i+1})%radius'] = radius_units
  ib_properties[f'patch_ib({i+1})%slip'] = 'F'
  ib_properties[f'patch_ib({i+1})%geometry'] = 8
  ib_properties[f'patch_ib({i+1})%moving_ibm'] = 2
  ib_properties[f'patch_ib({i+1})%mass'] = 1.
  ib_properties[f'patch_ib({i+1})%x_centroid'] = particles[i][0]
  ib_properties[f'patch_ib({i+1})%y_centroid'] = particles[i][1]
  ib_properties[f'patch_ib({i+1})%z_centroid'] = particles[i][2]
  # move each particle radially outward to guarantee they never touch during the simulation
  position_mag = (sum([(particles[i][j])**2 for j in range(3)]))**0.5
  for j in range(3):
    ib_properties[f'patch_ib({i+1})%vel({j+1})'] = particles[i][j] * velocity_magnitude / position_mag

print(json.dumps({**case, **ib_properties}))
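
The script above assembles the case dictionary and prints it as JSON, so it can be sanity-checked before handing it to the toolchain. A minimal sketch of such a check, assuming the script is saved as case.py (the file name is an assumption for illustration):

import json
import subprocess

# Run the case file and parse the JSON it prints to stdout (file name case.py assumed).
raw = subprocess.run(["python3", "case.py"], capture_output=True, text=True, check=True).stdout
case = json.loads(raw)
# Spot-check a few of the generated IB entries.
print(case["num_ibs"], case["patch_ib(1)%radius"], case["patch_ib(1)%geometry"])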

These optimizations yield nearly a 1000x speedup in the moving IBM propagation and generation code. Prior to these optimizations, profiling the benchmark case with NVIDIA Nsight showed about 45 seconds per RK substep:

[Nsight profile screenshot, 2026-02-16: ~45 s per RK substep]

After these optimizations, the same profile shows roughly 50 ms per RK substep:
[Nsight profile screenshot, 2026-02-17: ~50 ms per RK substep]

Checklist

  • I updated documentation if user-facing behavior changed

See the developer guide for full coding standards.

GPU changes (expand if you modified src/simulation/)
  • GPU results match CPU results
  • Tested on NVIDIA GPU or AMD GPU

Summary by CodeRabbit

Release Notes

  • New Features

    • Model-based levelset patch geometry support
  • Enhancements

    • Increased maximum patches to 1000 (previously 10)
    • Increased maximum immersed boundaries to 1000 (previously 10)
    • Enhanced STL model processing for immersed boundary computations
    • Added performance profiling instrumentation

@coderabbitai
Contributor

coderabbitai bot commented Feb 17, 2026

📝 Walkthrough

Walkthrough

This PR implements GPU-accelerated immersed boundary method (IBM) support by introducing GPU-friendly data structures for STL models, refactoring IB patch handling with type-based markers, adding bounding-index optimizations, expanding model limits from 10 to 1000, and integrating profiling instrumentation across multiple computational kernels.
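
The "bounding-index optimizations" mentioned above can be pictured as converting each IB patch's physical bounding box into grid-index windows, so marker loops only visit nearby cells instead of the full grid. A minimal NumPy sketch of that idea (get_bounding_indices is named in the file summary below, but this body is illustrative, not the PR's Fortran implementation):

import numpy as np

def get_bounding_indices(lo, hi, x0, dx, n_cells, pad=2):
    # Map a physical interval [lo, hi] to clamped cell-index bounds, with a small halo.
    i_lo = max(int(np.floor((lo - x0)/dx)) - pad, 0)
    i_hi = min(int(np.ceil((hi - x0)/dx)) + pad, n_cells - 1)
    return i_lo, i_hi

def mark_spheres(markers, centers, radius, x0, dx):
    # markers: (m, n, p) integer array; centers: list of 3-vectors in physical units.
    for pid, c in enumerate(centers, start=1):
        (i0, i1), (j0, j1), (k0, k1) = [
            get_bounding_indices(c[d] - radius, c[d] + radius, x0[d], dx[d], markers.shape[d])
            for d in range(3)]
        for i in range(i0, i1 + 1):
            for j in range(j0, j1 + 1):
                for k in range(k0, k1 + 1):
                    xc = np.array([x0[0] + (i + 0.5)*dx[0],
                                   x0[1] + (j + 0.5)*dx[1],
                                   x0[2] + (k + 0.5)*dx[2]])
                    if np.sum((xc - c)**2) <= radius**2:
                        markers[i, j, k] = pid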

Changes

  • GPU Parallelization Infrastructure (src/common/include/parallel_macros.fpp): Added END_GPU_ATOMIC_CAPTURE() macro for atomic capture ending directives, mirroring existing GPU parallel loop patterns.
  • Configuration & Limits (src/common/m_constants.fpp, toolchain/mfc/params/definitions.py, toolchain/mfc/case_validator.py, toolchain/mfc/params_tests/test_definitions.py): Increased num_patches_max from 10 to 1000, and updated the corresponding IBM validation limits and parameter count expectations from ~3400 to ~40000.
  • Model Data Structures (src/common/m_derived_types.fpp, src/common/m_helper.fpp): Changed t_model_array's interpolate field from logical to integer; added GPU-optimized fields (ntrs, trs_v, trs_n). Added a GPU routine annotation to the f_cross function.
  • GPU Model Data & Functions (src/common/m_model.fpp): Introduced public GPU-friendly data arrays (gpu_ntrs, gpu_trs_v, gpu_trs_n, gpu_boundary_v, gpu_interpolate, etc.) and new subroutines s_pack_model_for_gpu, f_model_is_inside_flat, and f_distance_normals_3d_flat for GPU computation pathways (a sketch of the packing idea follows this list).
  • Immersed Boundary Patch Framework (src/simulation/m_ib_patches.fpp): Replaced the raw integer array ib_markers_sf with the typed ib_markers across all patch routines (circle, airfoil, model, sphere, cylinder, etc.); added get_bounding_indices for GPU loop optimization; substantially expanded s_instantiate_STL_models with GPU data packing and transfer logic.
  • Levelset Computation (src/simulation/m_compute_levelset.fpp): Refactored to use GPU data structures (gpu_interpolate, gpu_boundary_edge_count, gpu_total_vertices) and flat-variant distance/normal functions; replaced boolean interpolation checks with integer comparisons.
  • IBM & Time Stepping (src/simulation/m_ibm.fpp, src/simulation/m_time_steppers.fpp): Introduced NVTX profiling ranges for IBM setup/computation steps; unified 2D/3D logic with GPU parallelization scaffolding; added atomic counters for ghost-point tracking; integrated GPU data transfers and synchronization around IB patch application and force computation.
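
The s_pack_model_for_gpu step listed above can be pictured as flattening ragged per-model triangle lists into rectangular arrays plus a per-model triangle count, which is trivial to copy to the device. A rough NumPy sketch of that layout (array names mirror gpu_ntrs/gpu_trs_v/gpu_trs_n, but the axis ordering and shapes here are illustrative assumptions, not the Fortran code):

import numpy as np

def pack_models(models):
    # models: list of dicts with 'verts' of shape (ntrs, 3, 3) and 'normals' of shape (ntrs, 3).
    n_models = len(models)
    max_ntrs = max(m["verts"].shape[0] for m in models)
    ntrs = np.zeros(n_models, dtype=np.int64)      # per-model triangle count
    trs_v = np.zeros((n_models, max_ntrs, 3, 3))   # padded triangle vertices
    trs_n = np.zeros((n_models, max_ntrs, 3))      # padded triangle normals
    for p, m in enumerate(models):
        k = m["verts"].shape[0]
        ntrs[p] = k
        trs_v[p, :k] = m["verts"]
        trs_n[p, :k] = m["normals"]
    return ntrs, trs_v, trs_n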

Sequence Diagram(s)

sequenceDiagram
    participant CPU as CPU/Host
    participant GPU as GPU Device
    participant Model as Model Data
    participant LevelSet as LevelSet Computation
    participant IBM as IBM Forces

    CPU->>Model: s_instantiate_STL_models()
    Note over Model: Read & transform STL models
    Model->>Model: Compute bounding boxes
    Model->>Model: Pack data (ntrs, trs_v, trs_n)
    
    CPU->>GPU: s_pack_model_for_gpu()
    Note over GPU: Upload gpu_ntrs, gpu_trs_v, gpu_trs_n<br/>gpu_boundary_v, gpu_interpolate
    
    CPU->>LevelSet: s_compute_levelset()
    LevelSet->>GPU: f_distance_normals_3d_flat()
    Note over GPU: Parallel computation using<br/>pre-packed GPU data arrays
    GPU->>LevelSet: Return distance & normals
    
    CPU->>CPU: s_apply_ib_patches(ib_markers)
    Note over CPU: Type-based ib_markers<br/>with bounding-index constraints
    
    CPU->>IBM: s_update_mib()
    IBM->>GPU: NVTX profiled GPU kernels
    Note over GPU: Unified 2D/3D logic<br/>Atomic counters for ghost points<br/>Image point interpolation
    GPU->>IBM: Force data
    
    IBM->>CPU: Return computed forces

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

Review effort 4/5, size:XXL

Suggested reviewers

  • sbryngelson
  • wilfonba

Poem

🐰 A GPU path unfolds so bright,
Where models pack and take their flight,
From CPU threads to CUDA cores—
IBM forces through GPU doors!
With atomic counts and bounding grace,
We've found immersion's perfect place. 🚀

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)

  • Title check — ✅ Passed: The title 'MIBM Performance Optimizations' clearly and concisely describes the main objective of the PR, which is to optimize the Moving Immersed Boundary Method code for GPU execution.
  • Description check — ✅ Passed: The PR description is comprehensive and well-structured. It includes a clear summary of changes, motivation, expanded upper bounds, testing results with performance metrics, and a practical example case. However, the GPU changes checklist items were not marked as completed despite GPU changes being made.
  • Docstring Coverage — ✅ Passed: Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.


@codecov

codecov bot commented Feb 17, 2026

Codecov Report

❌ Patch coverage is 57.08812% with 112 lines in your changes missing coverage. Please review.
✅ Project coverage is 44.24%. Comparing base (c5493c1) to head (bc972ca).
⚠️ Report is 4 commits behind head on master.

Files with missing lines | Patch % | Lines
src/simulation/m_ib_patches.fpp | 63.12% | 56 missing, 3 partials ⚠️
src/simulation/m_ibm.fpp | 50.00% | 23 missing, 13 partials ⚠️
src/common/m_model.fpp | 37.03% | 17 missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1157      +/-   ##
==========================================
+ Coverage   44.07%   44.24%   +0.17%     
==========================================
  Files          70       70              
  Lines       20431    20505      +74     
  Branches     1974     1950      -24     
==========================================
+ Hits         9004     9073      +69     
- Misses      10291    10306      +15     
+ Partials     1136     1126      -10     

☔ View full report in Codecov by Sentry.

@danieljvickers danieljvickers marked this pull request as ready for review February 19, 2026 18:39
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment


7 issues found across 12 files

Confidence score: 2/5

  • High risk due to concrete breakages: stl_bounding_boxes re-allocation in src/simulation/m_ib_patches.fpp can crash at runtime with multiple STL patches, and src/simulation/m_compute_levelset.fpp changes make if (interpolate) invalid in Fortran 3D path.
  • An unconditional interpolate = .false. in src/simulation/m_ib_patches.fpp disables STL interpolation, making the intended logic dead and likely altering results.
  • Pay close attention to src/simulation/m_ib_patches.fpp and src/simulation/m_compute_levelset.fpp - crash/compile risks around interpolation and allocation.
Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="toolchain/mfc/case_validator.py">

<violation number="1" location="toolchain/mfc/case_validator.py:687">
P3: Update the validation error message to match the new upper bound (1000) so users get accurate guidance when the check fails.</violation>
</file>

<file name="src/simulation/m_compute_levelset.fpp">

<violation number="1" location="src/simulation/m_compute_levelset.fpp:647">
P1: Changing interpolate to INTEGER breaks the `if (interpolate) then` check in the 3D-model path; INTEGER values are not valid logical expressions in Fortran, so this will not compile. Either keep interpolate LOGICAL (and assign with a logical comparison) or update the condition to compare against 1.</violation>
</file>

<file name="src/simulation/m_time_steppers.fpp">

<violation number="1" location="src/simulation/m_time_steppers.fpp:805">
P3: Fix the NVTX range label typo (“IMERSED”) so profiling markers are consistent and searchable.</violation>
</file>

<file name="src/common/m_model.fpp">

<violation number="1" location="src/common/m_model.fpp:550">
P2: Ray directions should be random around zero; adding point(k) skews them toward the position vector and biases inside/outside tests. Use only the random offset for directions.</violation>
</file>

<file name="src/simulation/m_ib_patches.fpp">

<violation number="1" location="src/simulation/m_ib_patches.fpp:1026">
P1: Debugging override left in: `interpolate = .false.` unconditionally disables STL model interpolation, making the interpolation check dead code. This appears to be a leftover from testing — remove this line to restore the intended interpolation behavior.</violation>

<violation number="2" location="src/simulation/m_ib_patches.fpp:1069">
P1: Runtime crash with multiple STL models: `stl_bounding_boxes` is allocated inside the per-patch loop. When a second STL patch is encountered, Fortran will raise an error because the array is already allocated. Move the allocation before the loop or guard it with a check and use `num_ibs` for the first dimension.</violation>
</file>

<file name="src/simulation/m_ibm.fpp">

<violation number="1" location="src/simulation/m_ibm.fpp:114">
P3: `s_populate_ib_buffers` is now dead code — it is a private subroutine that is no longer called from anywhere after the refactoring removed its callsites in `s_ibm_setup` and `s_update_mib`. Consider removing the subroutine to avoid confusion for future contributors.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.


        integer :: i, j, k, patch_id, boundary_edge_count, total_vertices
-       logical :: interpolate
+       integer :: interpolate

P1: Changing interpolate to INTEGER breaks the if (interpolate) then check in the 3D-model path; INTEGER values are not valid logical expressions in Fortran, so this will not compile. Either keep interpolate LOGICAL (and assign with a logical comparison) or update the condition to compare against 1.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/simulation/m_compute_levelset.fpp, line 647:

<comment>Changing interpolate to INTEGER breaks the `if (interpolate) then` check in the 3D-model path; INTEGER values are not valid logical expressions in Fortran, so this will not compile. Either keep interpolate LOGICAL (and assign with a logical comparison) or update the condition to compare against 1.</comment>

<file context>
@@ -655,7 +644,7 @@ contains
 
         integer :: i, j, k, patch_id, boundary_edge_count, total_vertices
-        logical :: interpolate
+        integer :: interpolate
         real(wp), dimension(1:3) :: center, xyz_local
         real(wp) :: normals(1:3) !< Boundary normal buffer
</file context>

write (*, "(A, 3(2X, F20.10))") " > Max:", grid_mm(:, 2)
end if

allocate (stl_bounding_boxes(patch_id, 1:3, 1:3))

P1: Runtime crash with multiple STL models: stl_bounding_boxes is allocated inside the per-patch loop. When a second STL patch is encountered, Fortran will raise an error because the array is already allocated. Move the allocation before the loop or guard it with a check and use num_ibs for the first dimension.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/simulation/m_ib_patches.fpp, line 1069:

<comment>Runtime crash with multiple STL models: `stl_bounding_boxes` is allocated inside the per-patch loop. When a second STL patch is encountered, Fortran will raise an error because the array is already allocated. Move the allocation before the loop or guard it with a check and use `num_ibs` for the first dimension.</comment>

<file context>
@@ -865,83 +806,368 @@ contains
+                    write (*, "(A, 3(2X, F20.10))") "    >         Max:", grid_mm(:, 2)
+                end if
+
+                allocate (stl_bounding_boxes(patch_id, 1:3, 1:3))
+                stl_bounding_boxes(patch_id, 1, 1:3) = [bbox%min(1), (bbox%min(1) + bbox%max(1))/2._wp, bbox%max(1)]
+                stl_bounding_boxes(patch_id, 2, 1:3) = [bbox%min(2), (bbox%min(2) + bbox%max(2))/2._wp, bbox%max(2)]
</file context>

else
call f_check_interpolation_2D(boundary_v, boundary_edge_count, (/dx_local, dy_local, 0._wp/), interpolate)
end if
interpolate = .false.

P1: Debugging override left in: interpolate = .false. unconditionally disables STL model interpolation, making the interpolation check dead code. This appears to be a leftover from testing — remove this line to restore the intended interpolation behavior.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/simulation/m_ib_patches.fpp, line 1026:

<comment>Debugging override left in: `interpolate = .false.` unconditionally disables STL model interpolation, making the interpolation check dead code. This appears to be a leftover from testing — remove this line to restore the intended interpolation behavior.</comment>

<file context>
@@ -865,83 +806,368 @@ contains
+                else
+                    call f_check_interpolation_2D(boundary_v, boundary_edge_count, (/dx_local, dy_local, 0._wp/), interpolate)
+                end if
+                interpolate = .false.
+
+                ! Show the number of edges and boundary edges in 2D STL models
</file context>

! random jitter in the origin helps us estimate volume fraction instead of only at the cell center
ray_origins(i, k) = point(k) + (f_model_random_number(rand_seed) - 0.5_wp)*spacing(k)
! cast sample rays in all directions
ray_dirs(i, k) = point(k) + f_model_random_number(rand_seed) - 0.5_wp

P2: Ray directions should be random around zero; adding point(k) skews them toward the position vector and biases inside/outside tests. Use only the random offset for directions.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/common/m_model.fpp, line 550:

<comment>Ray directions should be random around zero; adding point(k) skews them toward the position vector and biases inside/outside tests. Use only the random offset for directions.</comment>

<file context>
@@ -493,58 +526,116 @@ contains
+                ! random jitter in the origin helps us estimate volume fraction instead of only at the cell center
+                ray_origins(i, k) = point(k) + (f_model_random_number(rand_seed) - 0.5_wp)*spacing(k)
+                ! cast sample rays in all directions
+                ray_dirs(i, k) = point(k) + f_model_random_number(rand_seed) - 0.5_wp
+            end do
             ray_dirs(i, :) = ray_dirs(i, :)/sqrt(sum(ray_dirs(i, :)*ray_dirs(i, :)))
</file context>
Suggested change
-ray_dirs(i, k) = point(k) + f_model_random_number(rand_seed) - 0.5_wp
+ray_dirs(i, k) = f_model_random_number(rand_seed) - 0.5_wp
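
For intuition, a minimal Python sketch of the corrected ray setup (origin jittered around the query point, direction drawn purely from a zero-centered random offset, normalization guarded by a small epsilon; names here are illustrative, not the Fortran routine):

import random

EPS = 1e-12  # stand-in for the code's small epsilon (e.g. sgm_eps)

def sample_ray(point, spacing, rng=random):
    # Origin: cell-center position plus sub-cell jitter to sample the volume fraction.
    origin = [point[k] + (rng.random() - 0.5)*spacing[k] for k in range(3)]
    # Direction: random offset only, independent of the point's absolute coordinates.
    direction = [rng.random() - 0.5 for _ in range(3)]
    mag = max(sum(d*d for d in direction)**0.5, EPS)  # guard against a zero-length vector
    return origin, [d/mag for d in direction]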

self.prohibit(ib and n <= 0,
              "Immersed Boundaries do not work in 1D (requires n > 0)")
-self.prohibit(ib and (num_ibs <= 0 or num_ibs > 10),
+self.prohibit(ib and (num_ibs <= 0 or num_ibs > 1000),

P3: Update the validation error message to match the new upper bound (1000) so users get accurate guidance when the check fails.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At toolchain/mfc/case_validator.py, line 687:

<comment>Update the validation error message to match the new upper bound (1000) so users get accurate guidance when the check fails.</comment>

<file context>
@@ -684,7 +684,7 @@ def check_ibm(self):
         self.prohibit(ib and n <= 0,
                      "Immersed Boundaries do not work in 1D (requires n > 0)")
-        self.prohibit(ib and (num_ibs <= 0 or num_ibs > 10),
+        self.prohibit(ib and (num_ibs <= 0 or num_ibs > 1000),
                      "num_ibs must be between 1 and num_patches_max (10)")
         self.prohibit(not ib and num_ibs > 0,
</file context>

integer :: i
logical :: forces_computed

call nvtxStartRange("PROPAGATE-IMERSED-BOUNDARIES")

P3: Fix the NVTX range label typo (“IMERSED”) so profiling markers are consistent and searchable.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/simulation/m_time_steppers.fpp, line 805:

<comment>Fix the NVTX range label typo (“IMERSED”) so profiling markers are consistent and searchable.</comment>

<file context>
@@ -802,6 +802,8 @@ contains
         integer :: i
         logical :: forces_computed
 
+        call nvtxStartRange("PROPAGATE-IMERSED-BOUNDARIES")
+
         forces_computed = .false.
</file context>
Suggested change
-call nvtxStartRange("PROPAGATE-IMERSED-BOUNDARIES")
+call nvtxStartRange("PROPAGATE-IMMERSED-BOUNDARIES")

@@ -28,6 +28,8 @@ module m_ibm


P3: s_populate_ib_buffers is now dead code — it is a private subroutine that is no longer called from anywhere after the refactoring removed its callsites in s_ibm_setup and s_update_mib. Consider removing the subroutine to avoid confusion for future contributors.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/simulation/m_ibm.fpp, line 114:

<comment>`s_populate_ib_buffers` is now dead code — it is a private subroutine that is no longer called from anywhere after the refactoring removed its callsites in `s_ibm_setup` and `s_update_mib`. Consider removing the subroutine to avoid confusion for future contributors.</comment>

<file context>
@@ -96,19 +100,18 @@ contains
-        call s_apply_ib_patches(ib_markers%sf(0:m, 0:n, 0:p))
         $:GPU_UPDATE(device='[ib_markers%sf]')
-        call s_populate_ib_buffers()
+        call s_apply_ib_patches(ib_markers)
         $:GPU_UPDATE(host='[ib_markers%sf]')
         do i = 1, num_ibs
</file context>

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (8)
src/simulation/m_time_steppers.fpp (1)

805-855: ⚠️ Potential issue | 🟡 Minor

Fix NVTX range label typo for consistent profiling.

The label reads “IMERSED” instead of “IMMERSED”, which can break grouping in profiling tools.

✏️ Proposed fix
-        call nvtxStartRange("PROPAGATE-IMERSED-BOUNDARIES")
+        call nvtxStartRange("PROPAGATE-IMMERSED-BOUNDARIES")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_time_steppers.fpp` around lines 805 - 855, The NVTX range
label passed to nvtxStartRange in the MTimeSteppers routine is misspelled
("PROPAGATE-IMERSED-BOUNDARIES"); update the string to the correct label
"PROPAGATE-IMMERSED-BOUNDARIES" so profiling tools group this region correctly
(look for the nvtxStartRange(...) call in the do-loop that updates patch_ib and
ensure nvtxEndRange remains paired).
src/simulation/m_compute_levelset.fpp (1)

641-709: ⚠️ Potential issue | 🔴 Critical

Fix integer interpolate used as logical in 3D model path (won't compile).

Line 691 uses if (interpolate) then but interpolate is declared as integer (line 647). This is not valid Fortran and will fail compilation. The 2D branch already correctly uses if (interpolate == 1) then (line 704).

Proposed fix
             ! Get the shortest distance between the cell center and the interpolated model boundary
-            if (interpolate) then
+            if (interpolate == 1) then
                 gp%levelset = f_interpolated_distance(gpu_interpolated_boundary_v(:, :, patch_id), total_vertices, xyz_local)
             else
                 gp%levelset = distance
             end if
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_compute_levelset.fpp` around lines 641 - 709, The integer
variable interpolate in subroutine s_model_levelset is being used as a logical
in the 3D branch (if (interpolate) then) which will not compile; update that
conditional to explicitly compare the integer (e.g., if (interpolate == 1) then)
or change the variable to a logical and adjust all uses accordingly so that the
3D path (the block that currently calls f_interpolated_distance or assigns
gp%levelset = distance) uses a valid boolean check on interpolate.
src/simulation/m_ibm.fpp (3)

589-676: ⚠️ Potential issue | 🟠 Major

Add patch_id to the GPU private list in s_find_ghost_points.

patch_id is assigned in every iteration (lines 628, 666) and used only within that iteration. Without being declared private, multiple GPU threads will access a shared patch_id variable, causing race conditions when reading from patch_ib(patch_id)%....

Proposed fix
-        $:GPU_PARALLEL_LOOP(private='[i,j,k,ii,jj,kk,is_gp,local_idx]', copyin='[count,count_i, x_domain, y_domain, z_domain]', firstprivate='[gp_layers,gp_layers_z]', collapse=3)
+        $:GPU_PARALLEL_LOOP(private='[i,j,k,ii,jj,kk,is_gp,local_idx,patch_id]', copyin='[count,count_i, x_domain, y_domain, z_domain]', firstprivate='[gp_layers,gp_layers_z]', collapse=3)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_ibm.fpp` around lines 589 - 676, The variable patch_id in
s_find_ghost_points is written per-iteration but not listed as private in the
$:GPU_PARALLEL_LOOP, causing concurrent threads to race when reading
patch_ib(patch_id); update the GPU pragma for s_find_ghost_points to include
patch_id in the private list (alongside i,j,k,ii,jj,kk,is_gp,local_idx) so each
thread gets its own patch_id before using patch_ib(patch_id)%... .

441-534: ⚠️ Potential issue | 🟠 Major

Fix pointer association and silent out-of-bounds risk in GPU kernel image-point search.

Inside the GPU-parallel loop, s_cc => x_cc/y_cc/z_cc is pointer association within device code, and the bounds safety check (lines 509–521) is compiled out for GPU builds (#if !defined(MFC_OpenACC)). This means the do while loop can walk index beyond [-buff_size, bound] silently, storing invalid values in ip_grid without error or warning on GPU.

Recommended fixes:

  • Add explicit bounds clamping inside the loop: if index reaches the boundary, exit and clamp to a valid index.
  • Alternatively, replace the linear search with a binary search over monotone s_cc for both safety and performance.
Current unsafe loop (lines 506–524)
do while ((temp_loc < s_cc(index) &
           .or. temp_loc > s_cc(index + 1)))
    index = index + dir
#if !defined(MFC_OpenACC) && !defined(MFC_OpenMP)
    if (index < -buff_size .or. index > bound) then
        ! error handling (skipped on GPU)
        error stop "Ghost Point and Image Point on Different Processors"
    end if
#endif
end do
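
For reference, the binary-search alternative amounts to a clamped lookup over the monotone array of cell-center coordinates. A small Python sketch of that idea (bisect stands in for a hand-rolled, device-safe binary search; names are illustrative):

from bisect import bisect_right

def find_cell(cell_centers, x, lo, hi):
    # Return i in [lo, hi-1] such that cell_centers[i] <= x < cell_centers[i+1],
    # clamped to the local block so a point outside it cannot walk out of bounds.
    i = lo + bisect_right(cell_centers[lo:hi + 1], x) - 1
    return min(max(i, lo), hi - 1)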
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_ibm.fpp` around lines 441 - 534, s_compute_image_points
contains a GPU-parallel do-while that advances index using pointer s_cc and can
run out-of-bounds on device builds because the host-only error check is compiled
out; fix by adding an in-device-safe boundary clamp or replace the linear scan
with a binary search over s_cc. Specifically, inside the loop that updates index
(references: s_cc, index, dir, temp_loc, bound, buff_size) add an explicit check
that if index reaches the lower/upper limits (-buff_size or bound) you break the
loop and clamp index to the nearest valid cell before assigning
ghost_points_in(q)%ip_grid(:); alternatively, implement a device-safe
binary_search helper that takes s_cc and temp_loc and returns the correct index
and use that in place of the do-while to guarantee safety and better
performance.

967-1120: ⚠️ Potential issue | 🟠 Major

Add GPU_UPDATE(device=...) after MPI reductions to maintain self-contained GPU synchronization.

The MPI reductions at lines 1108–1109 occur on the host, followed by host-side writes to patch_ib(i)%force and patch_ib(i)%torque. While the caller (s_update_mib at line 940 in m_time_steppers.fpp) does eventually sync patch_ib to device, making s_compute_ib_forces self-contained by adding GPU_UPDATE(device='[patch_ib(1:num_ibs)]') after the final loop ensures coherence is handled within the function and aligns with the guideline: "Wrap MPI calls with GPU_UPDATE to ensure GPU-host synchronization before and after MPI communication."

Suggested fix
         do i = 1, num_ibs
             patch_ib(i)%force(:) = forces(i, :)
             patch_ib(i)%torque(:) = matmul(patch_ib(i)%rotation_matrix_inverse, torques(i, :))
         end do
+
+        ! Sync updated patch_ib to device for GPU kernels
+        $:GPU_UPDATE(device='[patch_ib(1:num_ibs)]')
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_ibm.fpp` around lines 967 - 1120, The MPI reductions in
s_compute_ib_forces (calls to s_mpi_allreduce_vectors_sum for forces and
torques) are host-side and must be bracketed by a GPU update to ensure
device/host coherence; insert GPU_UPDATE(device='[patch_ib(1:num_ibs)]')
immediately after the two s_mpi_allreduce_vectors_sum calls and before any
writes to patch_ib(i)%force or patch_ib(i)%torque so the routine handles
synchronization itself.
src/common/m_model.fpp (1)

498-576: ⚠️ Potential issue | 🟠 Major

Ray direction generation is incorrect and risks biased/degenerate rays.

Ray directions in both f_model_is_inside (line 550) and f_model_is_inside_flat (line 608) are computed as point(k) + f_model_random_number(rand_seed) - 0.5_wp. This couples the direction to absolute coordinates, causing heavy bias for large coordinate values. Additionally, both functions normalize without guarding against zero-length vectors, risking division by zero.

Ray directions should be independent of the point's absolute location—only the ray origin should depend on point. The fix removes the point(k) term and adds max(..., sgm_eps) guards on normalization (consistent with existing patterns in the codebase):

Proposed changes
Line 550 (f_model_is_inside):
-                ray_dirs(i, k) = point(k) + f_model_random_number(rand_seed) - 0.5_wp
+                ray_dirs(i, k) = f_model_random_number(rand_seed) - 0.5_wp

Line 552:
-            ray_dirs(i, :) = ray_dirs(i, :)/sqrt(sum(ray_dirs(i, :)*ray_dirs(i, :)))
+            ray_dirs(i, :) = ray_dirs(i, :)/max(sqrt(sum(ray_dirs(i, :)*ray_dirs(i, :))), sgm_eps)

Line 608 (f_model_is_inside_flat):
-                dir(k) = point(k) + f_model_random_number(rand_seed) - 0.5_wp
+                dir(k) = f_model_random_number(rand_seed) - 0.5_wp

Lines 610–611:
-            dir_mag = sqrt(dir(1)*dir(1) + dir(2)*dir(2) + dir(3)*dir(3))
-            dir(:) = dir(:)/dir_mag
+            dir_mag = sqrt(dir(1)*dir(1) + dir(2)*dir(2) + dir(3)*dir(3))
+            dir(:) = dir(:)/max(dir_mag, sgm_eps)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/common/m_model.fpp` around lines 498 - 576, The ray direction generation
in f_model_is_inside (and similarly f_model_is_inside_flat) incorrectly adds
point(k) to the random direction (coupling direction to absolute coordinates)
and normalizes without guarding against zero length; fix by computing
ray_dirs(i,k) = f_model_random_number(rand_seed) - 0.5_wp (no point(k) term) and
when normalizing replace direct division by sqrt(sum(...)) with division by
max(sqrt(sum(ray_dirs(i,:)*ray_dirs(i,:))), sgm_eps_wp) (or the project-wide
small epsilon symbol) to avoid divide-by-zero; update both f_model_is_inside and
f_model_is_inside_flat and keep ray_origins assignment as-is using
f_model_random_number(rand_seed).
src/simulation/m_ib_patches.fpp (2)

169-323: ⚠️ Potential issue | 🟠 Major

Add loop-local scalars xa, yc, and dycdxc to the private list in GPU_PARALLEL_LOOP.

Lines 265–266: These three scalars are assigned per-iteration inside the GPU-parallel region (lines 274–280) but are not declared private, creating a race condition where multiple GPU threads write to the same global variables. Update the private clause to private='[i,j,xy_local,k,f,xa,yc,dycdxc]'.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_ib_patches.fpp` around lines 169 - 323, The
GPU_PARALLEL_LOOP in subroutine s_ib_airfoil currently omits loop-local scalars
xa, yc, and dycdxc from its private list, causing a race condition; update the
GPU_PARALLEL_LOOP directive's private clause to include xa, yc, and dycdxc (so
it becomes private='[i,j,xy_local,k,f,xa,yc,dycdxc]') so each GPU thread gets
its own copies when evaluating the airfoil logic that assigns xa, yc, and dycdxc
and writes into ib_markers%sf.

28-112: ⚠️ Potential issue | 🟠 Major

Public API change: s_apply_ib_patches now takes type(integer_field) — all callers verified and updated correctly.

s_apply_ib_patches changed its argument type (line 60). Both call sites in src/simulation/m_ibm.fpp (lines 114, 945) correctly pass type(integer_field) as argument. This breaking signature change affects the public API and is appropriately flagged for review.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_ib_patches.fpp` around lines 28 - 112, s_apply_ib_patches
now expects an argument of type(integer_field) (the signature changed), so
update all call sites and any interface/wrapper that referenced the old
signature; specifically ensure the calls in m_ibm.fpp (the two places that were
already updated) match the new prototype and remove/adjust any legacy
declarations or interface blocks that declared s_apply_ib_patches with the old
argument type, and update any documentation/comments that describe
s_apply_ib_patches to reflect it takes a type(integer_field) argument.
🧹 Nitpick comments (2)
src/common/m_constants.fpp (1)

25-25: Run all three targets after updating shared constants.

This change is in src/common/, so please ensure pre_process, simulation, and post_process targets are exercised to catch any cross-target regressions.
Based on learnings: “Changes to src/common/ affect all three executables (pre_process, simulation, post_process); test all three targets.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/common/m_constants.fpp` at line 25, You changed the shared constant
num_patches_max in src/common/m_constants.fpp which can impact all executables;
rebuild and run the pre_process, simulation, and post_process targets (or their
test suites) to exercise cross-target behavior and catch regressions —
specifically rebuild the three binaries that consume num_patches_max
(pre_process, simulation, post_process), run their full test inputs or
integration runs, and verify no runtime assertions, array bounds, or config
mismatches occur after the change.
src/simulation/m_ibm.fpp (1)

536-586: Use reduction instead of atomic operations to improve scaling for grid-cell classification.

num_gps_local and num_inner_gps_local are currently incremented via GPU_ATOMIC(update) for every classified cell, causing high contention on large grids. Since these are simple sums, use GPU_PARALLEL_LOOP reductions instead:

$:GPU_PARALLEL_LOOP(collapse=3, private='[i,j,k,ii,jj,kk,is_gp]', firstprivate='[gp_layers,gp_layers_z]', &
    & reduction='[[num_gps_local, num_inner_gps_local]]', reductionOp='[+, +]')

Replace the copy directive and remove both GPU_ATOMIC calls. Gang/vector reductions avoid per-thread atomics and scale significantly better.
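
Conceptually, the reduction variant classifies every cell first and then accumulates the counts with a single sum, rather than bumping shared counters cell by cell. A NumPy sketch of that shape (the neighborhood rule below is a simplified stand-in for MFC's actual ghost-point test, and np.roll wraps periodically, which a real implementation would handle at the edges):

import numpy as np

def count_ghost_points(markers, layers=2):
    inner = markers > 0                       # cells inside any immersed boundary
    near_fluid = np.zeros_like(inner)
    for axis in range(inner.ndim):
        for shift in range(1, layers + 1):
            near_fluid |= ~np.roll(inner, shift, axis=axis)
            near_fluid |= ~np.roll(inner, -shift, axis=axis)
    is_gp = inner & near_fluid                # IB cells with fluid within `layers` cells
    num_gps = int(np.count_nonzero(is_gp))               # one reduction, no atomics
    num_inner_gps = int(np.count_nonzero(inner & ~is_gp))
    return num_gps, num_inner_gps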

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_ibm.fpp` around lines 536 - 586, The loop in subroutine
s_find_num_ghost_points uses per-cell GPU_ATOMIC updates on num_gps_local and
num_inner_gps_local causing contention; replace the copy-based GPU_PARALLEL_LOOP
with a reduction-based directive so the counts are accumulated via GPU
reductions instead of atomics (use GPU_PARALLEL_LOOP with collapse=3,
private='[i,j,k,ii,jj,kk,is_gp]', firstprivate='[gp_layers,gp_layers_z]' and a
reduction clause for [num_gps_local, num_inner_gps_local] with '+'), remove the
two GPU_ATOMIC(atomic='update') regions and keep the rest of the
ghost-classification logic unchanged, then assign num_gps_out and
num_inner_gps_out from the reduced locals as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/common/m_derived_types.fpp`:
- Around line 186-199: The t_model_array derived type declares scalar fields
boundary_edge_count, total_vertices, interpolate and ntrs without defaults which
can leave garbage values; add default initializers for these scalars in the type
definition (set boundary_edge_count=0, total_vertices=0, interpolate=0, ntrs=0)
so all instances start deterministic before any assignment, and then run and
verify pre_process, post_process and simulation builds/tests to ensure the
initialization change doesn't break their workflows; update any constructors or
code that relied on uninitialized behavior if necessary.

In `@src/common/m_model.fpp`:
- Around line 1209-1267: The subroutine f_distance_normals_3D_flat can index
trs_n(:, tri_idx, pid) with tri_idx==0 when ntrs <= 0; add a guard at the top of
f_distance_normals_3D_flat that checks if ntrs <= 0 and in that case set
distance to a safe value (e.g. a large positive sentinel or 0._wp as
appropriate) and set normals to a defined default (e.g. zero vector) and return
immediately to avoid using tri_idx/uninitialized indices; ensure you reference
and update tri_idx, distance and normals in this early-return path so no
out-of-bounds access to trs_n occurs.
- Around line 20-42: Add a module finalizer in m_model.fpp (e.g., subroutine
s_finalize_model or a MODULE PROCEDURE bound to final) that checks
ALLOCATED(...) and calls DEALLOCATE(...) for all public allocatable arrays
gpu_ntrs, gpu_trs_v, gpu_trs_n, gpu_boundary_v, gpu_interpolated_boundary_v,
gpu_interpolate, gpu_boundary_edge_count, gpu_total_vertices, stl_bounding_boxes
and models to ensure GPU/host memory is freed during simulation cleanup; export
or document the finalizer so the cleanup path in your driver or cleanup routine
calls it. Also remove the unnecessary "use m_model" import from
src/pre_process/m_icpp_patches.fpp (or the specific pre_process file mentioned)
since it does not reference the gpu_* arrays or models, eliminating the unwanted
coupling.
- Around line 1402-1415: s_pack_model_for_gpu currently unconditionally
allocates ma%trs_v and ma%trs_n; make it re-entrant by checking existing
allocations before allocating: set ma%ntrs = ma%model%ntrs, then for ma%trs_v
and ma%trs_n use allocated(...) to detect prior allocation and only allocate if
not allocated or if the current allocation dimensions don't match ma%ntrs (in
that case deallocate then allocate with the correct bounds); ensure you copy
into ma%trs_v(:, :, i) and ma%trs_n(:, i) after the allocation checks complete.

In `@src/simulation/m_ib_patches.fpp`:
- Around line 954-1169: s_instantiate_STL_models has four fixes: allocate
stl_bounding_boxes once before the patch loop with shape (num_ibs,1:3,1:3)
instead of calling allocate(stl_bounding_boxes(patch_id,...)) inside the loop;
ensure total_vertices is initialized or only copied into
models(patch_id)%total_vertices when interpolate is true (do not propagate
uninitialized total_vertices when interpolation is skipped); pass spacing
magnitudes to the interpolation routines by using (/dx_local, dy_local,
dz_local/) when calling f_interpolate_3D/f_interpolate_2D (and remove or fix the
unconditional interpolate = .false. that currently masks interpolation logic);
and protect module-level gpu_* allocations (gpu_ntrs, gpu_trs_v, gpu_trs_n,
gpu_interpolate, gpu_boundary_edge_count, gpu_total_vertices, gpu_boundary_v,
gpu_interpolated_boundary_v) with if (.not. allocated(...)) checks before each
allocate to allow repeated calls to s_instantiate_STL_models.
- Around line 828-953: s_update_mib’s restart path must refresh the
device-resident model arrays before generating markers: ensure
s_instantiate_STL_models() (or an equivalent GPU_ENTER_DATA/refresh routine for
module arrays gpu_ntrs, gpu_trs_v, gpu_trs_n) is called or re-entered before
s_apply_ib_patches() in the restart branch so device data reflects any changes
from s_update_ib_rotation_matrix(); in s_ib_model and s_ib_3d_model, add
explicit device presence to the GPU_PARALLEL_LOOP clauses by including
present='[gpu_ntrs, gpu_trs_v, gpu_trs_n]' (in addition to existing copyin) to
avoid relying on implicit persistence; and narrow the 2D scan in s_ib_model by
reusing the bounding-index logic from s_ib_3d_model (compute corner_distance
from stl_bounding_boxes and call get_bounding_indices to set i/j limits) to
avoid scanning the full grid for sparse STL patches.

In `@toolchain/mfc/case_validator.py`:
- Around line 687-688: The error message passed to self.prohibit for the num_ibs
check is stale (mentions 10) while the actual upper bound is 1000; update the
message to reflect the real max and, to prevent future drift, construct the
message using the authoritative constraint value (e.g. the CONSTRAINTS constant
or the class attribute used for bounds) rather than a hardcoded literal—locate
the self.prohibit call that checks num_ibs (the line with self.prohibit(ib and
(num_ibs <= 0 or num_ibs > 1000), ...)) and interpolate/format the upper-bound
from CONSTRAINTS (or self.CONSTRAINTS / CONSTRAINTS['num_ibs']['max'] as
appropriate) so the message reads "num_ibs must be between 1 and <max>".


Comment on lines 186 to 199
type :: t_model_array
! Original CPU-side fields (unchanged)
type(t_model), allocatable :: model
real(wp), allocatable, dimension(:, :, :) :: boundary_v
real(wp), allocatable, dimension(:, :) :: interpolated_boundary_v
integer :: boundary_edge_count
integer :: total_vertices
-logical :: interpolate
+integer :: interpolate

! GPU-friendly flattened arrays
integer :: ntrs ! copy of model%ntrs
real(wp), allocatable, dimension(:, :, :) :: trs_v ! (3, 3, ntrs) - triangle vertices
real(wp), allocatable, dimension(:, :) :: trs_n ! (3, ntrs) - triangle normals
end type t_model_array

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, check the current state of the file at the specified lines
cat -n src/common/m_derived_types.fpp | sed -n '180,210p'

Repository: MFlowCode/MFC

Length of output: 1607


🏁 Script executed:

# Search for all uses of %interpolate across the codebase
rg -nP '%interpolate\b' src --type-add 'fpp:*.fpp' --type 'fpp'

Repository: MFlowCode/MFC

Length of output: 344


🏁 Script executed:

# Look for suspicious logical patterns with %interpolate
rg -nP '%interpolate\b.*(\.(true|false)|\.not\.)' src --type-add 'fpp:*.fpp' --type 'fpp'

Repository: MFlowCode/MFC

Length of output: 39


🏁 Script executed:

# Check if there are any conditional patterns that may need review
rg -nP 'if\s*\(.*%interpolate' src --type-add 'fpp:*.fpp' --type 'fpp' -A 2

Repository: MFlowCode/MFC

Length of output: 854


🏁 Script executed:

# Check for assignments to %interpolate in the codebase
rg -nP '%interpolate\s*=' src --type-add 'fpp:*.fpp' --type 'fpp' -B 1 -A 1

Repository: MFlowCode/MFC

Length of output: 429


t_model_array scalars must be initialized to prevent garbage values reaching GPU or uninitialized field reads.

All uses of %interpolate correctly employ integer semantics (0/1, no logical operators); however, the type definition still lacks default initialization for boundary_edge_count, total_vertices, interpolate, and ntrs. Uninitialized scalars can leak garbage into GPU staging arrays or cause undefined behavior if any code path reads these fields before assignment.

Apply default initialization
         integer :: boundary_edge_count = 0
         integer :: total_vertices = 0
         integer :: interpolate = 0

         integer :: ntrs = 0  ! copy of model%ntrs

Additionally, verify that changes do not break pre_process and post_process executables, not just simulation.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 type :: t_model_array
     ! Original CPU-side fields (unchanged)
     type(t_model), allocatable :: model
     real(wp), allocatable, dimension(:, :, :) :: boundary_v
     real(wp), allocatable, dimension(:, :) :: interpolated_boundary_v
-    integer :: boundary_edge_count
-    integer :: total_vertices
-    integer :: interpolate
+    integer :: boundary_edge_count = 0
+    integer :: total_vertices = 0
+    integer :: interpolate = 0
     ! GPU-friendly flattened arrays
-    integer :: ntrs ! copy of model%ntrs
+    integer :: ntrs = 0 ! copy of model%ntrs
     real(wp), allocatable, dimension(:, :, :) :: trs_v ! (3, 3, ntrs) - triangle vertices
     real(wp), allocatable, dimension(:, :) :: trs_n ! (3, ntrs) - triangle normals
 end type t_model_array
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/common/m_derived_types.fpp` around lines 186 - 199, The t_model_array
derived type declares scalar fields boundary_edge_count, total_vertices,
interpolate and ntrs without defaults which can leave garbage values; add
default initializers for these scalars in the type definition (set
boundary_edge_count=0, total_vertices=0, interpolate=0, ntrs=0) so all instances
start deterministic before any assignment, and then run and verify pre_process,
post_process and simulation builds/tests to ensure the initialization change
doesn't break their workflows; update any constructors or code that relied on
uninitialized behavior if necessary.

Comment on lines +20 to 42
public :: f_model_read, s_model_write, s_model_free, f_model_is_inside, models, gpu_ntrs, &
gpu_trs_v, gpu_trs_n, gpu_boundary_v, gpu_interpolated_boundary_v, gpu_interpolate, gpu_boundary_edge_count, &
gpu_total_vertices, stl_bounding_boxes

! Subroutines for STL immersed boundaries
public :: f_check_boundary, f_register_edge, f_check_interpolation_2D, &
f_check_interpolation_3D, f_interpolate_2D, f_interpolate_3D, &
f_interpolated_distance, f_normals, f_distance, f_distance_normals_3D, f_tri_area
f_interpolated_distance, f_normals, f_distance, f_distance_normals_3D, f_tri_area, s_pack_model_for_gpu, &
f_model_is_inside_flat, f_distance_normals_3d_flat

!! array of STL models that can be allocated and then used in IB marker and levelset compute
type(t_model_array), allocatable, target :: models(:)
!! GPU-friendly flat arrays for STL model data
integer, allocatable :: gpu_ntrs(:)
real(wp), allocatable, dimension(:, :, :, :) :: gpu_trs_v
real(wp), allocatable, dimension(:, :, :) :: gpu_trs_n
real(wp), allocatable, dimension(:, :, :, :) :: gpu_boundary_v
real(wp), allocatable, dimension(:, :, :) :: gpu_interpolated_boundary_v
integer, allocatable :: gpu_interpolate(:)
integer, allocatable :: gpu_boundary_edge_count(:)
integer, allocatable :: gpu_total_vertices(:)
real(wp), allocatable :: stl_bounding_boxes(:, :, :)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find allocation and deallocation sites for the new module variables
echo "=== Allocation/Deallocation patterns ==="
rg -nP --type fpp '(allocate|deallocate)\s*\(' src/common/m_model.fpp | head -30

echo -e "\n=== GPU array allocations in entire src/common ==="
rg -nP --type fpp 'gpu_ntrs|gpu_trs_v|gpu_trs_n|gpu_boundary_v|gpu_interpolated_boundary_v|gpu_interpolate|gpu_boundary_edge_count|gpu_total_vertices|stl_bounding_boxes' src/common/ | grep -i allocate | head -20

echo -e "\n=== Where m_model is imported ==="
rg -n 'use.*m_model' src/ | head -20

Repository: MFlowCode/MFC

Length of output: 622


🏁 Script executed:

#!/bin/bash
# Check the finalization routine referenced in the public interface
echo "=== s_model_free subroutine definition ==="
rg -A 50 'subroutine\s+s_model_free' src/common/m_model.fpp | head -60

echo -e "\n=== s_model_write subroutine definition ==="
rg -A 30 'subroutine\s+s_model_write' src/common/m_model.fpp | head -40

Repository: MFlowCode/MFC

Length of output: 2632


🏁 Script executed:

#!/bin/bash
# Check if pre_process or post_process reference the new public variables or m_model
echo "=== pre_process usage of m_model ==="
rg -n 'gpu_ntrs|gpu_trs_v|gpu_trs_n|gpu_boundary_v|gpu_interpolated_boundary_v|stl_bounding_boxes|models' src/pre_process/ 2>/dev/null | head -10

echo -e "\n=== post_process usage of m_model ==="
rg -n 'gpu_ntrs|gpu_trs_v|gpu_trs_n|gpu_boundary_v|gpu_interpolated_boundary_v|stl_bounding_boxes|models' src/post_process/ 2>/dev/null | head -10

echo -e "\n=== Check if pre_process or post_process import m_model ==="
rg -n 'use.*m_model' src/pre_process/ src/post_process/ 2>/dev/null

Repository: MFlowCode/MFC

Length of output: 518


🏁 Script executed:

#!/bin/bash
# Get the full m_model.fpp file to understand context
wc -l src/common/m_model.fpp

Repository: MFlowCode/MFC

Length of output: 84


🏁 Script executed:

#!/bin/bash
# Read the module to find allocation patterns
# Focus on where the public arrays are allocated
sed -n '20,100p' src/common/m_model.fpp | cat -n

Repository: MFlowCode/MFC

Length of output: 3661


🏁 Script executed:

#!/bin/bash
# Search for allocation patterns - try broader search
rg -n 'allocate' src/common/m_model.fpp | head -30

Repository: MFlowCode/MFC

Length of output: 786


🏁 Script executed:

#!/bin/bash
# Look for initialization subroutines that might allocate these arrays
rg -n 'subroutine.*init\|subroutine.*allocate\|subroutine.*setup' src/common/m_model.fpp | head -20

Repository: MFlowCode/MFC

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Search for where models(:) is allocated
rg -B 3 -A 3 'models\s*\(' src/common/m_model.fpp | head -50

Repository: MFlowCode/MFC

Length of output: 430


🏁 Script executed:

#!/bin/bash
# Look for gpu_ntrs allocation pattern
rg -B 3 -A 3 'gpu_ntrs\s*\(' src/common/m_model.fpp | head -50

Repository: MFlowCode/MFC

Length of output: 490


🏁 Script executed:

#!/bin/bash
# Check how these are used in simulation to understand lifecycle
echo "=== Usage in m_ibm.fpp ==="
rg -B 2 -A 2 'gpu_ntrs\|gpu_trs_v\|gpu_boundary_v\|models\(' src/simulation/m_ibm.fpp | head -40

echo -e "\n=== Usage in pre_process/m_icpp_patches.fpp ==="
rg -B 2 -A 2 'models\(' src/pre_process/m_icpp_patches.fpp | head -40

Repository: MFlowCode/MFC

Length of output: 132


🏁 Script executed:

#!/bin/bash
# Search for allocation of the MODULE-LEVEL arrays in calling modules
echo "=== Search for allocate of gpu_ntrs in entire codebase ==="
rg 'allocate.*gpu_ntrs\|gpu_ntrs.*allocate' src/ | head -20

echo -e "\n=== Search for s_pack_model_for_gpu ==="
rg -n 's_pack_model_for_gpu' src/ | head -20

echo -e "\n=== Look for where gpu arrays are assigned/populated ==="
rg -n 'gpu_ntrs\s*=' src/ | head -20

Repository: MFlowCode/MFC

Length of output: 591


🏁 Script executed:

#!/bin/bash
# Search for deallocate of module-level arrays
echo "=== Deallocate of module-level gpu arrays ==="
rg 'deallocate.*gpu_ntrs\|deallocate.*gpu_trs_v\|deallocate.*models\(' src/ | head -20

echo -e "\n=== Look for finalization in m_ibm.fpp ==="
rg -n 'finalize\|cleanup\|deallocate' src/simulation/m_ibm.fpp | head -20

Repository: MFlowCode/MFC

Length of output: 146


🏁 Script executed:

#!/bin/bash
# Look at s_pack_model_for_gpu implementation
echo "=== s_pack_model_for_gpu subroutine ==="
rg -A 80 'subroutine\s+s_pack_model_for_gpu' src/common/m_model.fpp | head -100

Repository: MFlowCode/MFC

Length of output: 519


🏁 Script executed:

#!/bin/bash
# Final verification: confirm module-level arrays have no lifecycle
echo "=== Complete scan for any allocation pattern ==="
rg 'allocate\s*\(' src/common/m_model.fpp src/simulation/m_ib_patches.fpp src/simulation/m_ibm.fpp | grep -E 'gpu_ntrs|gpu_trs_v|gpu_boundary_v|models\s*\(' || echo "NO ALLOCATIONS FOUND for public module arrays"

echo -e "\n=== Confirm these are truly module-level (public scope) ==="
rg -B 1 'integer.*allocatable.*gpu_ntrs|real.*allocatable.*gpu_trs' src/common/m_model.fpp

Repository: MFlowCode/MFC

Length of output: 763


🏁 Script executed:

#!/bin/bash
# Search for deallocation of these specific arrays
echo "=== Deallocate patterns for public module arrays ==="
rg -n 'deallocate.*gpu_ntrs\|deallocate.*gpu_trs_v\|deallocate.*gpu_boundary_v\|deallocate.*models' src/ | head -20

echo -e "\n=== Broader search for deallocate in m_ib_patches.fpp ==="
rg -n 'deallocate' src/simulation/m_ib_patches.fpp | head -20

echo -e "\n=== Context around gpu array allocations in m_ib_patches.fpp ==="
rg -B 5 -A 10 'allocate.*gpu_ntrs' src/simulation/m_ib_patches.fpp | head -40

Repository: MFlowCode/MFC

Length of output: 890


🏁 Script executed:

#!/bin/bash
# Check what pre_process actually uses from m_model
echo "=== What does pre_process use from m_model? ==="
sed -n '1,50p' src/pre_process/m_icpp_patches.fpp | grep -A 15 'use m_model'

echo -e "\n=== Any direct usage of gpu_* or models arrays in pre_process? ==="
rg -n 'gpu_ntrs|gpu_trs_v|gpu_boundary_v|models\(' src/pre_process/m_icpp_patches.fpp | head -10

Repository: MFlowCode/MFC

Length of output: 553


Add deallocate() calls for all public module arrays in a finalization routine, and remove unnecessary m_model import from pre_process.

The public module arrays (gpu_ntrs, gpu_trs_v, gpu_trs_n, gpu_interpolate, gpu_boundary_edge_count, gpu_total_vertices) are allocated in m_ib_patches.fpp but never deallocated, causing GPU memory leaks. Create a finalization subroutine in m_model.fpp to deallocate these arrays and call it during simulation cleanup. Additionally, src/pre_process/m_icpp_patches.fpp imports m_model but does not use any of the gpu or models arrays—remove this unnecessary coupling per src/common/ guidelines.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/common/m_model.fpp` around lines 20 - 42, Add a module finalizer in
m_model.fpp (e.g., subroutine s_finalize_model or a MODULE PROCEDURE bound to
final) that checks ALLOCATED(...) and calls DEALLOCATE(...) for all public
allocatable arrays gpu_ntrs, gpu_trs_v, gpu_trs_n, gpu_boundary_v,
gpu_interpolated_boundary_v, gpu_interpolate, gpu_boundary_edge_count,
gpu_total_vertices, stl_bounding_boxes and models to ensure GPU/host memory is
freed during simulation cleanup; export or document the finalizer so the cleanup
path in your driver or cleanup routine calls it. Also remove the unnecessary
"use m_model" import from src/pre_process/m_icpp_patches.fpp (or the specific
pre_process file mentioned) since it does not reference the gpu_* arrays or
models, eliminating the unwanted coupling.

Comment on lines +1209 to +1267
subroutine f_distance_normals_3D_flat(ntrs, trs_v, trs_n, pid, point, normals, distance)

$:GPU_ROUTINE(parallelism='[seq]')

integer, intent(in) :: ntrs
real(wp), dimension(:, :, :, :), intent(in) :: trs_v
real(wp), dimension(:, :, :), intent(in) :: trs_n
integer, intent(in) :: pid
real(wp), dimension(1:3), intent(in) :: point
real(wp), dimension(1:3), intent(out) :: normals
real(wp), intent(out) :: distance

real(wp), dimension(1:3, 1:3) :: tri
real(wp) :: dist_min, dist_t_min
real(wp) :: dist_min_normal, dist_buffer_normal
real(wp), dimension(1:3) :: midp
real(wp), dimension(1:3) :: dist_buffer
integer :: i, j, tri_idx

dist_min = 1.e12_wp
dist_min_normal = 1.e12_wp
distance = 0._wp

tri_idx = 0
do i = 1, ntrs
do j = 1, 3
tri(j, 1) = trs_v(j, 1, i, pid)
tri(j, 2) = trs_v(j, 2, i, pid)
tri(j, 3) = trs_v(j, 3, i, pid)
dist_buffer(j) = sqrt((point(1) - tri(j, 1))**2 + &
(point(2) - tri(j, 2))**2 + &
(point(3) - tri(j, 3))**2)
end do

do j = 1, 3
midp(j) = (tri(1, j) + tri(2, j) + tri(3, j))/3
end do

dist_t_min = minval(dist_buffer(1:3))
dist_buffer_normal = sqrt((point(1) - midp(1))**2 + &
(point(2) - midp(2))**2 + &
(point(3) - midp(3))**2)

if (dist_t_min < dist_min) then
dist_min = dist_t_min
end if

if (dist_buffer_normal < dist_min_normal) then
dist_min_normal = dist_buffer_normal
tri_idx = i
end if
end do

normals(1) = trs_n(1, tri_idx, pid)
normals(2) = trs_n(2, tri_idx, pid)
normals(3) = trs_n(3, tri_idx, pid)
distance = dist_min

end subroutine f_distance_normals_3D_flat

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

rg -nP --type-add 'fpp:*.fpp' --type 'fpp' '\bf_distance_normals_3D_flat\s*\(' src

Repository: MFlowCode/MFC

Length of output: 345


🏁 Script executed:

sed -n '680,700p' src/simulation/m_compute_levelset.fpp

Repository: MFlowCode/MFC

Length of output: 913


🏁 Script executed:

rg -nP 'gpu_ntrs' src/simulation/m_compute_levelset.fpp | head -20

Repository: MFlowCode/MFC

Length of output: 186


🏁 Script executed:

rg -nP '\bgpu_ntrs\b' src --type-add 'fpp:*.fpp' | grep -E '(allocate|dimension|intent|gpu_ntrs\s*=)'

Repository: MFlowCode/MFC

Length of output: 206


🏁 Script executed:

sed -n '1110,1160p' src/simulation/m_ib_patches.fpp

Repository: MFlowCode/MFC

Length of output: 2729


🏁 Script executed:

sed -n '650,710p' src/simulation/m_compute_levelset.fpp

Repository: MFlowCode/MFC

Length of output: 2630


Add guard in f_distance_normals_3D_flat for ntrs <= 0 to prevent out-of-bounds access.

When ntrs is 0 (which can occur if a patch has no model allocated), tri_idx keeps its initial value of 0 and is then used to index trs_n(:, tri_idx, pid) at the end, causing an out-of-bounds read. The caller at src/simulation/m_compute_levelset.fpp:688 does not validate gpu_ntrs(patch_id) > 0 before invoking this function.
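
A minimal sketch of the guard, placed immediately after the local declarations; the zero normal and the large sentinel distance are assumptions, and any defined defaults would do:

    ! No packed triangles for this patch: return defined values instead of
    ! indexing trs_n with tri_idx = 0.
    if (ntrs <= 0) then
        normals(1:3) = 0._wp
        distance = 1.e12_wp  ! large sentinel so this patch never wins a nearest-surface test
        return
    end if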

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/common/m_model.fpp` around lines 1209 - 1267, The subroutine
f_distance_normals_3D_flat can index trs_n(:, tri_idx, pid) with tri_idx==0 when
ntrs <= 0; add a guard at the top of f_distance_normals_3D_flat that checks if
ntrs <= 0 and in that case set distance to a safe value (e.g. a large positive
sentinel or 0._wp as appropriate) and set normals to a defined default (e.g.
zero vector) and return immediately to avoid using tri_idx/uninitialized
indices; ensure you reference and update tri_idx, distance and normals in this
early-return path so no out-of-bounds access to trs_n occurs.

Comment on lines +1402 to +1415
subroutine s_pack_model_for_gpu(ma)
type(t_model_array), intent(inout) :: ma
integer :: i

ma%ntrs = ma%model%ntrs
allocate (ma%trs_v(1:3, 1:3, 1:ma%ntrs))
allocate (ma%trs_n(1:3, 1:ma%ntrs))

do i = 1, ma%ntrs
ma%trs_v(:, :, i) = ma%model%trs(i)%v(:, :)
ma%trs_n(:, i) = ma%model%trs(i)%n(:)
end do
end subroutine

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find all call sites to s_pack_model_for_gpu
rg -nP --type-add 'fpp:*.fpp' --type 'fpp' '\bs_pack_model_for_gpu\s*\(' src || true

Repository: MFlowCode/MFC

Length of output: 220


🏁 Script executed:

#!/bin/bash
# Also search for the subroutine definition and nearby code
rg -nP --type-add 'fpp:*.fpp' --type 'fpp' -B 5 -A 20 'subroutine s_pack_model_for_gpu' src || true

Repository: MFlowCode/MFC

Length of output: 1111


🏁 Script executed:

#!/bin/bash
# Check if ma%trs_v or ma%trs_n are referenced elsewhere to understand lifecycle
rg -nP --type-add 'fpp:*.fpp' --type 'fpp' 'ma%trs_v|ma%trs_n' src || true

Repository: MFlowCode/MFC

Length of output: 367


🏁 Script executed:

#!/bin/bash
# Get context around the call site in m_ib_patches.fpp
sed -n '1080,1120p' src/simulation/m_ib_patches.fpp

Repository: MFlowCode/MFC

Length of output: 1850


🏁 Script executed:

#!/bin/bash
# Search for surrounding loop context (broader range)
sed -n '1050,1130p' src/simulation/m_ib_patches.fpp

Repository: MFlowCode/MFC

Length of output: 4004


🏁 Script executed:

#!/bin/bash
# Check if models is deallocated/re-initialized somewhere
rg -nP --type-add 'fpp:*.fpp' --type 'fpp' 'models\s*\(|deallocate.*models' src/simulation/ | head -30

Repository: MFlowCode/MFC

Length of output: 3130


🏁 Script executed:

#!/bin/bash
# Check if s_instantiate_STL_models or models are re-initialized in any time-stepping/simulation loop
rg -nP --type-add 'fpp:*.fpp' --type 'fpp' 's_instantiate_STL_models|allocate.*models\(' src/simulation/ | grep -v '\.fpp-' | head -20

Repository: MFlowCode/MFC

Length of output: 1135


🏁 Script executed:

#!/bin/bash
# Search for any re-initialization or deallocate of models during simulation
rg -nP --type-add 'fpp:*.fpp' --type 'fpp' 'deallocate.*models|models.*=|trs_v.*deallocate|trs_n.*deallocate' src | head -20

Repository: MFlowCode/MFC

Length of output: 1318


Make s_pack_model_for_gpu re-entrant by checking allocations before allocating.

The function is called only once per object during initialization (in s_instantiate_STL_models), so there is currently no runtime risk. However, the unconditional allocations of ma%trs_v and ma%trs_n violate defensive-programming practice and would raise a runtime allocation error if the function were ever called again on the same object. Add pre-allocation checks:

Proposed fix
    ma%ntrs = ma%model%ntrs
+   if (allocated(ma%trs_v)) deallocate(ma%trs_v)
+   if (allocated(ma%trs_n)) deallocate(ma%trs_n)
    allocate (ma%trs_v(1:3, 1:3, 1:ma%ntrs))
    allocate (ma%trs_n(1:3, 1:ma%ntrs))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
subroutine s_pack_model_for_gpu(ma)
type(t_model_array), intent(inout) :: ma
integer :: i
ma%ntrs = ma%model%ntrs
allocate (ma%trs_v(1:3, 1:3, 1:ma%ntrs))
allocate (ma%trs_n(1:3, 1:ma%ntrs))
do i = 1, ma%ntrs
ma%trs_v(:, :, i) = ma%model%trs(i)%v(:, :)
ma%trs_n(:, i) = ma%model%trs(i)%n(:)
end do
end subroutine
subroutine s_pack_model_for_gpu(ma)
type(t_model_array), intent(inout) :: ma
integer :: i
ma%ntrs = ma%model%ntrs
if (allocated(ma%trs_v)) deallocate(ma%trs_v)
if (allocated(ma%trs_n)) deallocate(ma%trs_n)
allocate (ma%trs_v(1:3, 1:3, 1:ma%ntrs))
allocate (ma%trs_n(1:3, 1:ma%ntrs))
do i = 1, ma%ntrs
ma%trs_v(:, :, i) = ma%model%trs(i)%v(:, :)
ma%trs_n(:, i) = ma%model%trs(i)%n(:)
end do
end subroutine
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/common/m_model.fpp` around lines 1402 - 1415, s_pack_model_for_gpu
currently unconditionally allocates ma%trs_v and ma%trs_n; make it re-entrant by
checking existing allocations before allocating: set ma%ntrs = ma%model%ntrs,
then for ma%trs_v and ma%trs_n use allocated(...) to detect prior allocation and
only allocate if not allocated or if the current allocation dimensions don't
match ma%ntrs (in that case deallocate then allocate with the correct bounds);
ensure you copy into ma%trs_v(:, :, i) and ma%trs_n(:, i) after the allocation
checks complete.

Comment on lines +828 to +953
!> The STL patch is a 2D geometry that is imported from an STL file.
!! @param patch_id is the patch identifier
!! @param ib_markers_sf Array to track patch ids
!! @param STL_levelset STL levelset
!! @param STL_levelset_norm STL levelset normals
subroutine s_ib_model(patch_id, ib_markers_sf)
!! @param ib_markers Array to track patch ids
subroutine s_ib_model(patch_id, ib_markers)

integer, intent(in) :: patch_id
integer, dimension(0:m, 0:n, 0:p), intent(inout) :: ib_markers_sf

integer :: i, j, k !< Generic loop iterators
type(integer_field), intent(inout) :: ib_markers

type(t_model), pointer :: model
integer :: i, j, k !< Generic loop iterators
integer :: spc

real(wp) :: eta
real(wp) :: eta, threshold
real(wp), dimension(1:3) :: point, local_point, offset
real(wp), dimension(1:3) :: center, xyz_local
real(wp), dimension(1:3) :: center, xy_local
real(wp), dimension(1:3, 1:3) :: inverse_rotation

model => models(patch_id)%model
center = 0._wp
center(1) = patch_ib(patch_id)%x_centroid
center(2) = patch_ib(patch_id)%y_centroid
if (p > 0) center(3) = patch_ib(patch_id)%z_centroid
inverse_rotation(:, :) = patch_ib(patch_id)%rotation_matrix_inverse(:, :)
offset(:) = patch_ib(patch_id)%centroid_offset(:)
spc = patch_ib(patch_id)%model_spc
threshold = patch_ib(patch_id)%model_threshold

do i = 0, m
do j = 0, n
do k = 0, p
$:GPU_PARALLEL_LOOP(private='[i,j, xy_local, eta]',&
& copyin='[patch_id,center,inverse_rotation, offset, spc, threshold]', collapse=2)
do i = -gp_layers, m + gp_layers
do j = -gp_layers, n + gp_layers

xyz_local = [x_cc(i) - center(1), y_cc(j) - center(2), 0._wp]
if (p > 0) then
xyz_local(3) = z_cc(k) - center(3)
end if
xy_local = [x_cc(i) - center(1), y_cc(j) - center(2), 0._wp]
xy_local = matmul(inverse_rotation, xy_local)
xy_local = xy_local - offset

if (grid_geometry == 3) then
xy_local = f_convert_cyl_to_cart(xy_local)
end if

eta = f_model_is_inside_flat(gpu_ntrs(patch_id), &
gpu_trs_v, gpu_trs_n, &
patch_id, &
xy_local, (/dx(i), dy(j), 0._wp/), &
spc)

! Reading STL boundary vertices and compute the levelset and levelset_norm
if (eta > threshold) then
ib_markers%sf(i, j, 0) = patch_id
end if

end do
end do
$:END_GPU_PARALLEL_LOOP()

end subroutine s_ib_model

!> The STL patch is a 3D geometry that is imported from an STL file.
!! @param patch_id is the patch identifier
!! @param ib_markers Array to track patch ids
subroutine s_ib_3d_model(patch_id, ib_markers)

integer, intent(in) :: patch_id
type(integer_field), intent(inout) :: ib_markers

integer :: i, j, k, il, ir, jl, jr, kl, kr !< Generic loop iterators
integer :: spc

real(wp) :: eta, threshold, corner_distance
real(wp), dimension(1:3) :: point, local_point, offset
real(wp), dimension(1:3) :: center, xyz_local
real(wp), dimension(1:3, 1:3) :: inverse_rotation

center = 0._wp
center(1) = patch_ib(patch_id)%x_centroid
center(2) = patch_ib(patch_id)%y_centroid
center(3) = patch_ib(patch_id)%z_centroid
inverse_rotation(:, :) = patch_ib(patch_id)%rotation_matrix_inverse(:, :)
offset(:) = patch_ib(patch_id)%centroid_offset(:)
spc = patch_ib(patch_id)%model_spc
threshold = patch_ib(patch_id)%model_threshold

il = -gp_layers
jl = -gp_layers
kl = -gp_layers
ir = m + gp_layers
jr = n + gp_layers
kr = p + gp_layers
corner_distance = 0._wp
do i = 1, 3
corner_distance = corner_distance + maxval(abs(stl_bounding_boxes(patch_id, i, 1:3)))**2 ! distance to rim of cylinder
end do
corner_distance = sqrt(corner_distance)
call get_bounding_indices(center(1) - corner_distance, center(1) + corner_distance, x_cc, il, ir)
call get_bounding_indices(center(2) - corner_distance, center(2) + corner_distance, y_cc, jl, jr)
call get_bounding_indices(center(3) - corner_distance, center(3) + corner_distance, z_cc, kl, kr)

$:GPU_PARALLEL_LOOP(private='[i,j,k, xyz_local, eta]',&
& copyin='[patch_id,center,inverse_rotation, offset, spc, threshold]', collapse=3)
do i = il, ir
do j = jl, jr
do k = kl, kr

xyz_local = [x_cc(i) - center(1), y_cc(j) - center(2), z_cc(k) - center(3)]
xyz_local = matmul(inverse_rotation, xyz_local)
xyz_local = xyz_local - offset

if (grid_geometry == 3) then
xyz_local = f_convert_cyl_to_cart(xyz_local)
end if

if (p == 0) then
eta = f_model_is_inside(model, xyz_local, (/dx(i), dy(j), 0._wp/), patch_ib(patch_id)%model_spc)
else
eta = f_model_is_inside(model, xyz_local, (/dx(i), dy(j), dz(k)/), patch_ib(patch_id)%model_spc)
end if
eta = f_model_is_inside_flat(gpu_ntrs(patch_id), &
gpu_trs_v, gpu_trs_n, &
patch_id, &
xyz_local, (/dx(i), dy(j), dz(k)/), &
spc)

! Reading STL boundary vertices and compute the levelset and levelset_norm
if (eta > patch_ib(patch_id)%model_threshold) then
ib_markers_sf(i, j, k) = patch_id
ib_markers%sf(i, j, k) = patch_id
end if

end do
end do
end do
$:END_GPU_PARALLEL_LOOP()

end subroutine s_ib_model
end subroutine s_ib_3d_model

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# 1. Find s_instantiate_STL_models and s_apply_ib_patches calls to check ordering
echo "=== Looking for s_instantiate_STL_models and s_apply_ib_patches calls ==="
rg -n 's_instantiate_STL_models|s_apply_ib_patches' src/simulation/ --type fpp

echo -e "\n=== Checking if they're in correct order within m_ibm.fpp ==="
rg -n 's_instantiate_STL_models\|s_apply_ib_patches' src/simulation/m_ibm.fpp -B 2 -A 2

Repository: MFlowCode/MFC

Length of output: 238


🏁 Script executed:

#!/bin/bash
# 2. Find where gpu_ntrs/gpu_trs_v/gpu_trs_n are allocated and managed
echo "=== Finding gpu_ntrs/gpu_trs_v/gpu_trs_n allocations and GPU_ENTER_DATA ==="
rg -n 'allocate.*gpu_ntrs|allocate.*gpu_trs_v|allocate.*gpu_trs_n' src/simulation/ --type fpp

echo -e "\n=== Searching for GPU_ENTER_DATA calls with these variables ==="
rg -n 'GPU_ENTER_DATA' src/simulation/m_ib_patches.fpp -A 5 || rg -n 'GPU_ENTER_DATA' src/simulation/m_ibm.fpp -A 5

Repository: MFlowCode/MFC

Length of output: 821


🏁 Script executed:

#!/bin/bash
# 3. Verify bounding-box logic in both functions
echo "=== Checking s_ib_model for bounding-box logic ==="
sed -n '828,875p' src/simulation/m_ib_patches.fpp | grep -n 'il\|ir\|jl\|jr\|bounding' || echo "No bounding-box narrowing found in s_ib_model"

echo -e "\n=== Checking s_ib_3d_model for bounding-box logic ==="
sed -n '877,953p' src/simulation/m_ib_patches.fpp | grep -n 'il\|ir\|jl\|jr\|kl\|kr\|bounding'

Repository: MFlowCode/MFC

Length of output: 1111


🏁 Script executed:

#!/bin/bash
# 4. Check data movement for gpu_ntrs/gpu_trs_v/gpu_trs_n in the GPU loop context
echo "=== Checking GPU_PARALLEL_LOOP copyin clause for these variables ==="
sed -n '828,875p' src/simulation/m_ib_patches.fpp | grep -A 2 'GPU_PARALLEL_LOOP'

echo -e "\n=== In s_ib_3d_model ==="
sed -n '877,953p' src/simulation/m_ib_patches.fpp | grep -A 2 'GPU_PARALLEL_LOOP'

echo -e "\n=== Checking if gpu_trs_v/gpu_trs_n are explicitly referenced ==="
rg -n 'gpu_trs_v|gpu_trs_n' src/simulation/m_ib_patches.fpp | head -20

Repository: MFlowCode/MFC

Length of output: 1440


🏁 Script executed:

#!/bin/bash
# Check the broader context of s_apply_ib_patches and see if GPU_ENTER_DATA creates persistent region
rg -n 's_apply_ib_patches' src/simulation/m_ibm.fpp -B 10 -A 20 | head -80

Repository: MFlowCode/MFC

Length of output: 2981


🏁 Script executed:

#!/bin/bash
# Look for GPU_DATA wrapping context around the kernel calls
rg -n 'GPU_DATA|GPU_ENTER_DATA|GPU_EXIT_DATA' src/simulation/m_ibm.fpp -B 2 -A 2

Repository: MFlowCode/MFC

Length of output: 493


🏁 Script executed:

#!/bin/bash
# Check f_model_is_inside_flat signature to see if it expects device arrays
rg -n 'subroutine\s+f_model_is_inside_flat|function\s+f_model_is_inside_flat' src/simulation/ -A 15

Repository: MFlowCode/MFC

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Verify where GPU_ENTER_DATA is called and what scope it governs
rg -n 'subroutine s_instantiate_STL_models' src/simulation/m_ib_patches.fpp -A 300 | tail -150

Repository: MFlowCode/MFC

Length of output: 6345


🏁 Script executed:

#!/bin/bash
# Check if there's a persistent GPU_ENTER_DATA region wrapping both calls
# Look for GPU_ENTER_DATA without matching GPU_EXIT_DATA
rg -n 'GPU_ENTER_DATA|GPU_EXIT_DATA' src/simulation/m_ibm.fpp | head -40

Repository: MFlowCode/MFC

Length of output: 185


🏁 Script executed:

#!/bin/bash
# Verify the exact scope of GPU_ENTER_DATA at line 1160 - is it persistent?
sed -n '1140,1170p' src/simulation/m_ib_patches.fpp

Repository: MFlowCode/MFC

Length of output: 1881


🏁 Script executed:

#!/bin/bash
# Check where gpu_ntrs/gpu_trs_v/gpu_trs_n might be declared with present() or accessed post-ENTER_DATA
rg -n 'present\|gpu_ntrs\|gpu_trs_v\|gpu_trs_n' src/simulation/m_ib_patches.fpp | grep -A 3 -B 3 'present'

Repository: MFlowCode/MFC

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Search for how GPU data persistence is managed - look for surrounding GPU_DATA or block context
sed -n '1100,1170p' src/simulation/m_ib_patches.fpp | head -80

Repository: MFlowCode/MFC

Length of output: 3699


🏁 Script executed:

#!/bin/bash
# Check if there's a GPU_EXIT_DATA anywhere in the file
rg -n 'GPU_EXIT_DATA' src/simulation/m_ib_patches.fpp || echo "No GPU_EXIT_DATA found"

# Check broader context around GPU_ENTER_DATA call at line 1160
sed -n '1090,1170p' src/simulation/m_ib_patches.fpp

Repository: MFlowCode/MFC

Length of output: 4049


🏁 Script executed:

#!/bin/bash
# Look for how f_model_is_inside_flat uses the arrays - check if it's marked GPU_ROUTINE
rg -n 'f_model_is_inside_flat' src/simulation/ -A 5 | head -40

Repository: MFlowCode/MFC

Length of output: 1164


🏁 Script executed:

#!/bin/bash
# Check the GPU macro definitions to understand data region semantics
fd 'gpuParallelization\|gpu.*\.fpp' src/simulation/ && cat src/simulation/gpuParallelization.fpp 2>/dev/null | head -200 || echo "File not found, searching for macro definitions"
rg -n '@:GPU_PARALLEL_LOOP|@:GPU_DATA|@:GPU_ENTER_DATA' src/ -B 2 | head -50

Repository: MFlowCode/MFC

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Check if there's a broad GPU_DATA or GPU context managing device persistence around the entire flow
rg -n 'GPU_DATA|GPU_ENTER_DATA|GPU_EXIT_DATA|GPU_ROUTINE' src/simulation/m_ibm.fpp | head -20

Repository: MFlowCode/MFC

Length of output: 280


🏁 Script executed:

#!/bin/bash
# Look at the main timestepping loop structure to understand device data persistence scope
rg -n 'subroutine s_time_step\|end subroutine s_time_step' src/simulation/m_ibm.fpp -A 300 | head -150

Repository: MFlowCode/MFC

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Check if the GPU arrays like gpu_trs_v are declared as module-level allocatables that persist
rg -n 'gpu_ntrs|gpu_trs_v|gpu_trs_n' src/simulation/m_ib_patches.fpp | grep -i 'allocatable\|dimension' | head -10

Repository: MFlowCode/MFC

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Search for module-level declarations of gpu arrays
rg -n 'allocatable.*gpu_ntrs|allocatable.*gpu_trs_v|allocatable.*gpu_trs_n' src/simulation/m_ib_patches.fpp -B 2 -A 2

Repository: MFlowCode/MFC

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Find where these gpu arrays are declared at module scope
rg -n 'integer.*allocatable|real.*allocatable' src/simulation/m_ib_patches.fpp | grep -i gpu

Repository: MFlowCode/MFC

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Get the entire module-level variable declarations section
sed -n '1,100p' src/simulation/m_ib_patches.fpp | grep -n 'allocatable\|dimension'

Repository: MFlowCode/MFC

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Search for the module declaration section more broadly
head -200 src/simulation/m_ib_patches.fpp | tail -100

Repository: MFlowCode/MFC

Length of output: 3972


🏁 Script executed:

#!/bin/bash
# Check module-level variable declarations in m_ib_patches
grep -n 'module m_ib_patches' src/simulation/m_ib_patches.fpp -A 50 | head -60

Repository: MFlowCode/MFC

Length of output: 2313


🏁 Script executed:

#!/bin/bash
# Verify whether gpu_trs_v/gpu_trs_n require present() or explicit data movement in nested GPU regions
# Look for documentation or examples in gpuParallelization guidance
rg -n 'present\|GPU_PARALLEL_LOOP' src/simulation/m_ibm.fpp -B 2 -A 3 | head -80

Repository: MFlowCode/MFC

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Check if any other GPU_PARALLEL_LOOP kernels use variables allocated via GPU_ENTER_DATA elsewhere
rg -n 'GPU_PARALLEL_LOOP' src/simulation/ -A 5 | grep -A 5 'copyin\|present' | head -60

Repository: MFlowCode/MFC

Length of output: 4582


🏁 Script executed:

#!/bin/bash
# Critical check: verify gpu_ntrs/gpu_trs_v/gpu_trs_n are referenced but NOT in data clauses
echo "=== In s_ib_model GPU_PARALLEL_LOOP (line ~850) ==="
sed -n '850,870p' src/simulation/m_ib_patches.fpp

echo -e "\n=== In s_ib_3d_model GPU_PARALLEL_LOOP (line ~920) ==="
sed -n '920,940p' src/simulation/m_ib_patches.fpp

echo -e "\n=== Checking for 'present' in GPU_PARALLEL_LOOP patterns ==="
rg -n 'present=' src/simulation/ | head -10

Repository: MFlowCode/MFC

Length of output: 2233


🏁 Script executed:

#!/bin/bash
# Check if gpu_ntrs/gpu_trs_v/gpu_trs_n are module-level in m_model or elsewhere
fd -e fpp -e f90 | xargs grep -l 'gpu_ntrs\|gpu_trs_v' | head -5
rg -n 'integer.*allocatable.*gpu_ntrs|real.*allocatable.*gpu_trs' src/ -B 2 | head -30

Repository: MFlowCode/MFC

Length of output: 549


🏁 Script executed:

#!/bin/bash
# Check if s_instantiate_STL_models is called before s_apply_ib_patches in BOTH initial and restart paths
echo "=== Initial setup (lines ~100-120) ==="
sed -n '100,120p' src/simulation/m_ibm.fpp

echo -e "\n=== Restart/update path (lines ~935-950) ==="
sed -n '935,950p' src/simulation/m_ibm.fpp

Repository: MFlowCode/MFC

Length of output: 1505


🏁 Script executed:

#!/bin/bash
# Verify that GPU arrays are module-level and persisted
echo "=== Module scope declaration in m_model.fpp ==="
sed -n '30,40p' src/common/m_model.fpp

echo -e "\n=== Confirm they're never deallocated ==="
rg -n 'deallocate.*gpu_ntrs|deallocate.*gpu_trs_v|deallocate.*gpu_trs_n' src/

Repository: MFlowCode/MFC

Length of output: 810


Critical: GPU model data not refreshed in s_update_mib restart path; also declare persistent GPU arrays explicitly in kernel data clauses.

The gpu_ntrs/gpu_trs_v/gpu_trs_n arrays (module-level in m_model.fpp) are allocated and pushed to device via GPU_ENTER_DATA in s_instantiate_STL_models() with persistent lifetime. However, in the restart path (s_update_mib), s_apply_ib_patches() is called at line 945 without re-calling s_instantiate_STL_models() to refresh GPU data. If s_update_ib_rotation_matrix() modifies model state, the device arrays will be stale on the next marker generation.

Additionally, both s_ib_model and s_ib_3d_model reference gpu_ntrs/gpu_trs_v/gpu_trs_n inside GPU_PARALLEL_LOOP but do not explicitly declare them in the copyin or present clauses—they rely on implicit device persistence. For clarity and robustness, add present='[gpu_ntrs, gpu_trs_v, gpu_trs_n]' to the loop declarations.
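
A sketch of the 3D loop header with the explicit presence declaration; whether GPU_PARALLEL_LOOP forwards a present argument in exactly this form is an assumption to check against the macro definition:

    $:GPU_PARALLEL_LOOP(private='[i,j,k, xyz_local, eta]',&
        & present='[gpu_ntrs, gpu_trs_v, gpu_trs_n]',&
        & copyin='[patch_id, center, inverse_rotation, offset, spc, threshold]', collapse=3)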

Finally, s_ib_model (2D) scans the full grid; adding the same bounding-box narrowing logic from s_ib_3d_model would improve performance for sparse STL objects.
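
A sketch of the 2D narrowing, mirroring s_ib_3d_model; il, ir, jl, jr, and corner_distance would need local declarations matching the 3D routine, and summing only the x and y extents of stl_bounding_boxes for the corner distance is an assumption about how 2D patches store their bounds:

    ! Default to the full ghost-padded extent, then shrink to the patch's bounding box.
    il = -gp_layers; ir = m + gp_layers
    jl = -gp_layers; jr = n + gp_layers
    corner_distance = 0._wp
    do i = 1, 2
        corner_distance = corner_distance + maxval(abs(stl_bounding_boxes(patch_id, i, 1:3)))**2
    end do
    corner_distance = sqrt(corner_distance)
    call get_bounding_indices(center(1) - corner_distance, center(1) + corner_distance, x_cc, il, ir)
    call get_bounding_indices(center(2) - corner_distance, center(2) + corner_distance, y_cc, jl, jr)
    ! The parallel loop then runs i = il, ir and j = jl, jr instead of the full range.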

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_ib_patches.fpp` around lines 828 - 953, s_update_mib’s
restart path must refresh the device-resident model arrays before generating
markers: ensure s_instantiate_STL_models() (or an equivalent
GPU_ENTER_DATA/refresh routine for module arrays gpu_ntrs, gpu_trs_v, gpu_trs_n)
is called or re-entered before s_apply_ib_patches() in the restart branch so
device data reflects any changes from s_update_ib_rotation_matrix(); in
s_ib_model and s_ib_3d_model, add explicit device presence to the
GPU_PARALLEL_LOOP clauses by including present='[gpu_ntrs, gpu_trs_v,
gpu_trs_n]' (in addition to existing copyin) to avoid relying on implicit
persistence; and narrow the 2D scan in s_ib_model by reusing the bounding-index
logic from s_ib_3d_model (compute corner_distance from stl_bounding_boxes and
call get_bounding_indices to set i/j limits) to avoid scanning the full grid for
sparse STL patches.

Comment on lines +954 to +1169
subroutine s_instantiate_STL_models()

! Variables for IBM+STL
real(wp) :: normals(1:3) !< Boundary normal buffer
integer :: boundary_vertex_count, boundary_edge_count, total_vertices !< Boundary vertex
real(wp), allocatable, dimension(:, :, :) :: boundary_v !< Boundary vertex buffer
real(wp), allocatable, dimension(:, :) :: interpolated_boundary_v !< Interpolated vertex buffer
real(wp) :: dx_local, dy_local, dz_local !< Levelset distance buffer
logical :: interpolate !< Logical variable to determine whether or not the model should be interpolated

integer :: i, j, k !< Generic loop iterators
integer :: patch_id

type(t_bbox) :: bbox, bbox_old
type(t_model) :: model
type(ic_model_parameters) :: params

real(wp) :: eta
real(wp), dimension(1:3) :: point, model_center
real(wp) :: grid_mm(1:3, 1:2)

real(wp), dimension(1:4, 1:4) :: transform, transform_n

dx_local = minval(dx); dy_local = minval(dy)
if (p /= 0) dz_local = minval(dz)

do patch_id = 1, num_ibs
if (patch_ib(patch_id)%geometry == 5 .or. patch_ib(patch_id)%geometry == 12) then
allocate (models(patch_id)%model)
print *, " * Reading model: "//trim(patch_ib(patch_id)%model_filepath)

model = f_model_read(patch_ib(patch_id)%model_filepath)
params%scale(:) = patch_ib(patch_id)%model_scale(:)
params%translate(:) = patch_ib(patch_id)%model_translate(:)
params%rotate(:) = patch_ib(patch_id)%model_rotate(:)
params%spc = patch_ib(patch_id)%model_spc
params%threshold = patch_ib(patch_id)%model_threshold

if (f_approx_equal(dot_product(params%scale, params%scale), 0._wp)) then
params%scale(:) = 1._wp
end if

if (proc_rank == 0) then
print *, " * Transforming model."
end if

! Get the model center before transforming the model
bbox_old = f_create_bbox(model)
model_center(1:3) = (bbox_old%min(1:3) + bbox_old%max(1:3))/2._wp

! Compute the transform matrices for vertices and normals
transform = f_create_transform_matrix(params, model_center)
transform_n = f_create_transform_matrix(params)

call s_transform_model(model, transform, transform_n)

! Recreate the bounding box after transformation
bbox = f_create_bbox(model)

! Show the number of vertices in the original STL model
if (proc_rank == 0) then
print *, ' * Number of input model vertices:', 3*model%ntrs
end if

call f_check_boundary(model, boundary_v, boundary_vertex_count, boundary_edge_count)

! Check if the model needs interpolation
if (p > 0) then
call f_check_interpolation_3D(model, (/dx_local, dy_local, dz_local/), interpolate)
else
call f_check_interpolation_2D(boundary_v, boundary_edge_count, (/dx_local, dy_local, 0._wp/), interpolate)
end if
interpolate = .false.

! Show the number of edges and boundary edges in 2D STL models
if (proc_rank == 0 .and. p == 0) then
print *, ' * Number of 2D model boundary edges:', boundary_edge_count
end if

! Interpolate the STL model along the edges (2D) and on triangle facets (3D)
if (interpolate) then
if (proc_rank == 0) then
print *, ' * Interpolating STL vertices.'
end if

if (p > 0) then
call f_interpolate_3D(model, (/dx, dy, dz/), interpolated_boundary_v, total_vertices)
else
call f_interpolate_2D(boundary_v, boundary_edge_count, (/dx, dy, dz/), interpolated_boundary_v, total_vertices)
end if

if (proc_rank == 0) then
print *, ' * Total number of interpolated boundary vertices:', total_vertices
end if
end if

if (proc_rank == 0) then
write (*, "(A, 3(2X, F20.10))") " > Model: Min:", bbox%min(1:3)
write (*, "(A, 3(2X, F20.10))") " > Cen:", (bbox%min(1:3) + bbox%max(1:3))/2._wp
write (*, "(A, 3(2X, F20.10))") " > Max:", bbox%max(1:3)

grid_mm(1, :) = (/minval(x_cc(0:m)) - 0.5_wp*dx_local, maxval(x_cc(0:m)) + 0.5_wp*dx_local/)
grid_mm(2, :) = (/minval(y_cc(0:n)) - 0.5_wp*dy_local, maxval(y_cc(0:n)) + 0.5_wp*dy_local/)

if (p > 0) then
grid_mm(3, :) = (/minval(z_cc(0:p)) - 0.5_wp*dz_local, maxval(z_cc(0:p)) + 0.5_wp*dz_local/)
else
grid_mm(3, :) = (/0._wp, 0._wp/)
end if

write (*, "(A, 3(2X, F20.10))") " > Domain: Min:", grid_mm(:, 1)
write (*, "(A, 3(2X, F20.10))") " > Cen:", (grid_mm(:, 1) + grid_mm(:, 2))/2._wp
write (*, "(A, 3(2X, F20.10))") " > Max:", grid_mm(:, 2)
end if

allocate (stl_bounding_boxes(patch_id, 1:3, 1:3))
stl_bounding_boxes(patch_id, 1, 1:3) = [bbox%min(1), (bbox%min(1) + bbox%max(1))/2._wp, bbox%max(1)]
stl_bounding_boxes(patch_id, 2, 1:3) = [bbox%min(2), (bbox%min(2) + bbox%max(2))/2._wp, bbox%max(2)]
stl_bounding_boxes(patch_id, 3, 1:3) = [bbox%min(3), (bbox%min(3) + bbox%max(3))/2._wp, bbox%max(3)]

models(patch_id)%model = model
models(patch_id)%boundary_v = boundary_v
models(patch_id)%boundary_edge_count = boundary_edge_count
if (interpolate) then
models(patch_id)%interpolate = 1
else
models(patch_id)%interpolate = 0
end if
if (interpolate) then
models(patch_id)%interpolated_boundary_v = interpolated_boundary_v
models(patch_id)%total_vertices = total_vertices
end if

end if
end do

! Pack and upload flat arrays for GPU (AFTER the loop)
block
integer :: pid, max_ntrs
integer :: max_bv1, max_bv2, max_bv3, max_iv1, max_iv2

max_ntrs = 0
max_bv1 = 0; max_bv2 = 0; max_bv3 = 0
max_iv1 = 0; max_iv2 = 0

do pid = 1, num_ibs
if (allocated(models(pid)%model)) then
call s_pack_model_for_gpu(models(pid))
max_ntrs = max(max_ntrs, models(pid)%ntrs)
end if
if (allocated(models(pid)%boundary_v)) then
max_bv1 = max(max_bv1, size(models(pid)%boundary_v, 1))
max_bv2 = max(max_bv2, size(models(pid)%boundary_v, 2))
max_bv3 = max(max_bv3, size(models(pid)%boundary_v, 3))
end if
if (allocated(models(pid)%interpolated_boundary_v)) then
max_iv1 = max(max_iv1, size(models(pid)%interpolated_boundary_v, 1))
max_iv2 = max(max_iv2, size(models(pid)%interpolated_boundary_v, 2))
end if
end do

if (max_ntrs > 0) then
allocate (gpu_ntrs(1:num_ibs))
allocate (gpu_trs_v(1:3, 1:3, 1:max_ntrs, 1:num_ibs))
allocate (gpu_trs_n(1:3, 1:max_ntrs, 1:num_ibs))
allocate (gpu_interpolate(1:num_ibs))
allocate (gpu_boundary_edge_count(1:num_ibs))
allocate (gpu_total_vertices(1:num_ibs))

gpu_ntrs = 0
gpu_trs_v = 0._wp
gpu_trs_n = 0._wp
gpu_interpolate = 0
gpu_boundary_edge_count = 0
gpu_total_vertices = 0

if (max_bv1 > 0) then
allocate (gpu_boundary_v(1:max_bv1, 1:max_bv2, 1:max_bv3, 1:num_ibs))
gpu_boundary_v = 0._wp
end if

if (max_iv1 > 0) then
allocate (gpu_interpolated_boundary_v(1:max_iv1, 1:max_iv2, 1:num_ibs))
gpu_interpolated_boundary_v = 0._wp
end if

do pid = 1, num_ibs
if (allocated(models(pid)%model)) then
gpu_ntrs(pid) = models(pid)%ntrs
gpu_trs_v(:, :, 1:models(pid)%ntrs, pid) = models(pid)%trs_v
gpu_trs_n(:, 1:models(pid)%ntrs, pid) = models(pid)%trs_n
gpu_interpolate(pid) = models(pid)%interpolate
gpu_boundary_edge_count(pid) = models(pid)%boundary_edge_count
gpu_total_vertices(pid) = models(pid)%total_vertices
end if
if (allocated(models(pid)%boundary_v)) then
gpu_boundary_v(1:size(models(pid)%boundary_v, 1), &
1:size(models(pid)%boundary_v, 2), &
1:size(models(pid)%boundary_v, 3), pid) = models(pid)%boundary_v
end if
if (allocated(models(pid)%interpolated_boundary_v)) then
gpu_interpolated_boundary_v(1:size(models(pid)%interpolated_boundary_v, 1), &
1:size(models(pid)%interpolated_boundary_v, 2), pid) = models(pid)%interpolated_boundary_v
end if
end do

$:GPU_ENTER_DATA(copyin='[gpu_ntrs, gpu_trs_v, gpu_trs_n, gpu_interpolate, gpu_boundary_edge_count, gpu_total_vertices]')
if (allocated(gpu_boundary_v)) then
$:GPU_ENTER_DATA(copyin='[gpu_boundary_v]')
end if
if (allocated(gpu_interpolated_boundary_v)) then
$:GPU_ENTER_DATA(copyin='[gpu_interpolated_boundary_v]')
end if
end if
end block

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for all occurrences of stl_bounding_boxes in the codebase
echo "=== stl_bounding_boxes allocations and usage ==="
rg -n "stl_bounding_boxes" src/simulation/m_ib_patches.fpp src/common/m_model.fpp 2>/dev/null || echo "Not found in specified files"

# Broader search
rg -n "stl_bounding_boxes" --type f90 --type fpp 2>/dev/null || true

Repository: MFlowCode/MFC

Length of output: 999


🏁 Script executed:

#!/bin/bash
# Check total_vertices initialization and usage pattern
echo "=== total_vertices usage pattern ==="
rg -n "total_vertices" src/simulation/m_ib_patches.fpp | head -50

# Check gpu_total_vertices
echo "=== gpu_total_vertices pattern ==="
rg -n "gpu_total_vertices" src/simulation/m_ib_patches.fpp

Repository: MFlowCode/MFC

Length of output: 1328


🏁 Script executed:

#!/bin/bash
# Check gpu_* allocations and deallocations
echo "=== gpu_* array allocations ==="
rg -n "allocate.*gpu_" src/simulation/m_ib_patches.fpp | head -30

echo "=== gpu_* deallocations ==="
rg -n "deallocate.*gpu_" src/simulation/m_ib_patches.fpp | head -30

# Check if there's a finalize/cleanup subroutine
echo "=== Potential cleanup routines ==="
rg -n "subroutine.*finalize|subroutine.*cleanup" src/simulation/m_ib_patches.fpp

Repository: MFlowCode/MFC

Length of output: 855


🏁 Script executed:

#!/bin/bash
# Check the module-level declarations to understand data structure
echo "=== Module-level declarations for gpu_* and stl_bounding_boxes ==="
rg -n "real.*gpu_|integer.*gpu_|allocatable.*gpu_|real.*stl_bounding|allocatable.*stl_bounding" src/simulation/m_ib_patches.fpp | head -40

# Look for module/global scope context
head -150 src/simulation/m_ib_patches.fpp | tail -100

Repository: MFlowCode/MFC

Length of output: 4044


s_instantiate_STL_models has critical allocation/initialization issues breaking multi-model scenarios.

All four issues are confirmed:

  1. stl_bounding_boxes allocation inside loop: Line 1069 allocates with allocate(stl_bounding_boxes(patch_id, 1:3, 1:3)) within the do patch_id = 1, num_ibs loop. This will fail on the second iteration—the first allocates shape (1,3,3), the second attempts (2,3,3), but the array is already allocated. Should allocate once before the loop with shape (num_ibs, 1:3, 1:3).

  2. total_vertices uninitialized when interpolate=false: The local variable total_vertices (line 958) is only assigned at lines 1040 or 1042 inside the if (interpolate) block. Line 1084 unconditionally copies it into models(patch_id)%total_vertices, propagating garbage values when interpolation is skipped. Initialize or conditionally assign.

  3. Incorrect spacing vector in interpolation: Lines 1040 and 1042 pass (/dx, dy, dz/) (array elements, not spacing magnitudes), though this is currently masked by line 1037 forcing interpolate = .false.. If interpolation is re-enabled, use (/dx_local, dy_local, dz_local/).

  4. Missing allocation guards on gpu_* arrays: Lines 1116–1121 allocate module-level gpu_* arrays without checking whether they are already allocated. Calling s_instantiate_STL_models twice will fail. Add guards such as if (.not. allocated(gpu_ntrs)) allocate(...), as shown in the sketch below.
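
A minimal sketch for items 1 and 4, following the variable names in the snippet above; the guarded re-use assumes max_ntrs and num_ibs do not change between calls, otherwise the arrays would need to be reallocated:

    ! Item 1: allocate once for all patches, before the `do patch_id = 1, num_ibs` loop.
    if (.not. allocated(stl_bounding_boxes)) allocate (stl_bounding_boxes(1:num_ibs, 1:3, 1:3))

    ! Item 4: guard the module-level staging arrays so a repeated call does not re-allocate.
    if (.not. allocated(gpu_ntrs)) allocate (gpu_ntrs(1:num_ibs))
    if (.not. allocated(gpu_trs_v)) allocate (gpu_trs_v(1:3, 1:3, 1:max_ntrs, 1:num_ibs))
    if (.not. allocated(gpu_trs_n)) allocate (gpu_trs_n(1:3, 1:max_ntrs, 1:num_ibs))
    if (.not. allocated(gpu_interpolate)) allocate (gpu_interpolate(1:num_ibs))
    if (.not. allocated(gpu_boundary_edge_count)) allocate (gpu_boundary_edge_count(1:num_ibs))
    if (.not. allocated(gpu_total_vertices)) allocate (gpu_total_vertices(1:num_ibs))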

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulation/m_ib_patches.fpp` around lines 954 - 1169,
s_instantiate_STL_models has four fixes: allocate stl_bounding_boxes once before
the patch loop with shape (num_ibs,1:3,1:3) instead of calling
allocate(stl_bounding_boxes(patch_id,...)) inside the loop; ensure
total_vertices is initialized or only copied into
models(patch_id)%total_vertices when interpolate is true (do not propagate
uninitialized total_vertices when interpolation is skipped); pass spacing
magnitudes to the interpolation routines by using (/dx_local, dy_local,
dz_local/) when calling f_interpolate_3D/f_interpolate_2D (and remove or fix the
unconditional interpolate = .false. that currently masks interpolation logic);
and protect module-level gpu_* allocations (gpu_ntrs, gpu_trs_v, gpu_trs_n,
gpu_interpolate, gpu_boundary_edge_count, gpu_total_vertices, gpu_boundary_v,
gpu_interpolated_boundary_v) with if (.not. allocated(...)) checks before each
allocate to allow repeated calls to s_instantiate_STL_models.

Comment on lines +687 to 688
self.prohibit(ib and (num_ibs <= 0 or num_ibs > 1000),
"num_ibs must be between 1 and num_patches_max (10)")

⚠️ Potential issue | 🟡 Minor

Fix outdated error message for num_ibs upper bound.

The validation allows up to 1000, but the message still says 10, which will mislead users. Consider deriving the bound from CONSTRAINTS to avoid future drift.

✅ Suggested fix
-        self.prohibit(ib and (num_ibs <= 0 or num_ibs > 1000),
-                     "num_ibs must be between 1 and num_patches_max (10)")
+        max_ibs = CONSTRAINTS["num_ibs"]["max"]
+        self.prohibit(ib and (num_ibs <= 0 or num_ibs > max_ibs),
+                     f"num_ibs must be between 1 and num_patches_max ({max_ibs})")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@toolchain/mfc/case_validator.py` around lines 687 - 688, The error message
passed to self.prohibit for the num_ibs check is stale (mentions 10) while the
actual upper bound is 1000; update the message to reflect the real max and, to
prevent future drift, construct the message using the authoritative constraint
value (e.g. the CONSTRAINTS constant or the class attribute used for bounds)
rather than a hardcoded literal—locate the self.prohibit call that checks
num_ibs (the line with self.prohibit(ib and (num_ibs <= 0 or num_ibs > 1000),
...)) and interpolate/format the upper-bound from CONSTRAINTS (or
self.CONSTRAINTS / CONSTRAINTS['num_ibs']['max'] as appropriate) so the message
reads "num_ibs must be between 1 and <max>".
