MIBM Performance Optimizations #1157
danieljvickers wants to merge 16 commits into MFlowCode:master from
Conversation
📝 Walkthrough

This PR implements GPU-accelerated immersed boundary method (IBM) support by introducing GPU-friendly data structures for STL models, refactoring IB patch handling with type-based markers, adding bounding-index optimizations, expanding model limits from 10 to 1000, and integrating profiling instrumentation across multiple computational kernels.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CPU as CPU/Host
    participant GPU as GPU Device
    participant Model as Model Data
    participant LevelSet as LevelSet Computation
    participant IBM as IBM Forces
    CPU->>Model: s_instantiate_STL_models()
    Note over Model: Read & transform STL models
    Model->>Model: Compute bounding boxes
    Model->>Model: Pack data (ntrs, trs_v, trs_n)
    CPU->>GPU: s_pack_model_for_gpu()
    Note over GPU: Upload gpu_ntrs, gpu_trs_v, gpu_trs_n<br/>gpu_boundary_v, gpu_interpolate
    CPU->>LevelSet: s_compute_levelset()
    LevelSet->>GPU: f_distance_normals_3d_flat()
    Note over GPU: Parallel computation using<br/>pre-packed GPU data arrays
    GPU->>LevelSet: Return distance & normals
    CPU->>CPU: s_apply_ib_patches(ib_markers)
    Note over CPU: Type-based ib_markers<br/>with bounding-index constraints
    CPU->>IBM: s_update_mib()
    IBM->>GPU: NVTX profiled GPU kernels
    Note over GPU: Unified 2D/3D logic<br/>Atomic counters for ghost points<br/>Image point interpolation
    GPU->>IBM: Force data
    IBM->>CPU: Return computed forces
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```
@@ Coverage Diff @@
##           master    #1157      +/-   ##
==========================================
+ Coverage   44.07%   44.24%    +0.17%
==========================================
  Files          70       70
  Lines       20431    20505       +74
  Branches     1974     1950       -24
==========================================
+ Hits         9004     9073       +69
- Misses      10291    10306       +15
+ Partials     1136     1126       -10
```

☔ View full report in Codecov by Sentry.
7 issues found across 12 files
Confidence score: 2/5
- High risk due to concrete breakages: `stl_bounding_boxes` re-allocation in `src/simulation/m_ib_patches.fpp` can crash at runtime with multiple STL patches, and the `src/simulation/m_compute_levelset.fpp` changes make `if (interpolate)` invalid in the Fortran 3D path.
- An unconditional `interpolate = .false.` in `src/simulation/m_ib_patches.fpp` disables STL interpolation, making the intended logic dead and likely altering results.
- Pay close attention to `src/simulation/m_ib_patches.fpp` and `src/simulation/m_compute_levelset.fpp`: crash/compile risks around interpolation and allocation.
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="toolchain/mfc/case_validator.py">
<violation number="1" location="toolchain/mfc/case_validator.py:687">
P3: Update the validation error message to match the new upper bound (1000) so users get accurate guidance when the check fails.</violation>
</file>
<file name="src/simulation/m_compute_levelset.fpp">
<violation number="1" location="src/simulation/m_compute_levelset.fpp:647">
P1: Changing interpolate to INTEGER breaks the `if (interpolate) then` check in the 3D-model path; INTEGER values are not valid logical expressions in Fortran, so this will not compile. Either keep interpolate LOGICAL (and assign with a logical comparison) or update the condition to compare against 1.</violation>
</file>
<file name="src/simulation/m_time_steppers.fpp">
<violation number="1" location="src/simulation/m_time_steppers.fpp:805">
P3: Fix the NVTX range label typo (“IMERSED”) so profiling markers are consistent and searchable.</violation>
</file>
<file name="src/common/m_model.fpp">
<violation number="1" location="src/common/m_model.fpp:550">
P2: Ray directions should be random around zero; adding point(k) skews them toward the position vector and biases inside/outside tests. Use only the random offset for directions.</violation>
</file>
<file name="src/simulation/m_ib_patches.fpp">
<violation number="1" location="src/simulation/m_ib_patches.fpp:1026">
P1: Debugging override left in: `interpolate = .false.` unconditionally disables STL model interpolation, making the interpolation check dead code. This appears to be a leftover from testing — remove this line to restore the intended interpolation behavior.</violation>
<violation number="2" location="src/simulation/m_ib_patches.fpp:1069">
P1: Runtime crash with multiple STL models: `stl_bounding_boxes` is allocated inside the per-patch loop. When a second STL patch is encountered, Fortran will raise an error because the array is already allocated. Move the allocation before the loop or guard it with a check and use `num_ibs` for the first dimension.</violation>
</file>
<file name="src/simulation/m_ibm.fpp">
<violation number="1" location="src/simulation/m_ibm.fpp:114">
P3: `s_populate_ib_buffers` is now dead code — it is a private subroutine that is no longer called from anywhere after the refactoring removed its callsites in `s_ibm_setup` and `s_update_mib`. Consider removing the subroutine to avoid confusion for future contributors.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
```diff
  integer :: i, j, k, patch_id, boundary_edge_count, total_vertices
- logical :: interpolate
+ integer :: interpolate
```
P1: Changing interpolate to INTEGER breaks the if (interpolate) then check in the 3D-model path; INTEGER values are not valid logical expressions in Fortran, so this will not compile. Either keep interpolate LOGICAL (and assign with a logical comparison) or update the condition to compare against 1.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/simulation/m_compute_levelset.fpp, line 647:
<comment>Changing interpolate to INTEGER breaks the `if (interpolate) then` check in the 3D-model path; INTEGER values are not valid logical expressions in Fortran, so this will not compile. Either keep interpolate LOGICAL (and assign with a logical comparison) or update the condition to compare against 1.</comment>
<file context>
@@ -655,7 +644,7 @@ contains
integer :: i, j, k, patch_id, boundary_edge_count, total_vertices
- logical :: interpolate
+ integer :: interpolate
real(wp), dimension(1:3) :: center, xyz_local
real(wp) :: normals(1:3) !< Boundary normal buffer
</file context>
```fortran
    write (*, "(A, 3(2X, F20.10))") " > Max:", grid_mm(:, 2)
end if

allocate (stl_bounding_boxes(patch_id, 1:3, 1:3))
```
P1: Runtime crash with multiple STL models: stl_bounding_boxes is allocated inside the per-patch loop. When a second STL patch is encountered, Fortran will raise an error because the array is already allocated. Move the allocation before the loop or guard it with a check and use num_ibs for the first dimension.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/simulation/m_ib_patches.fpp, line 1069:
<comment>Runtime crash with multiple STL models: `stl_bounding_boxes` is allocated inside the per-patch loop. When a second STL patch is encountered, Fortran will raise an error because the array is already allocated. Move the allocation before the loop or guard it with a check and use `num_ibs` for the first dimension.</comment>
<file context>
@@ -865,83 +806,368 @@ contains
+ write (*, "(A, 3(2X, F20.10))") " > Max:", grid_mm(:, 2)
+ end if
+
+ allocate (stl_bounding_boxes(patch_id, 1:3, 1:3))
+ stl_bounding_boxes(patch_id, 1, 1:3) = [bbox%min(1), (bbox%min(1) + bbox%max(1))/2._wp, bbox%max(1)]
+ stl_bounding_boxes(patch_id, 2, 1:3) = [bbox%min(2), (bbox%min(2) + bbox%max(2))/2._wp, bbox%max(2)]
</file context>
```fortran
else
    call f_check_interpolation_2D(boundary_v, boundary_edge_count, (/dx_local, dy_local, 0._wp/), interpolate)
end if
interpolate = .false.
```
P1: Debugging override left in: interpolate = .false. unconditionally disables STL model interpolation, making the interpolation check dead code. This appears to be a leftover from testing — remove this line to restore the intended interpolation behavior.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/simulation/m_ib_patches.fpp, line 1026:
<comment>Debugging override left in: `interpolate = .false.` unconditionally disables STL model interpolation, making the interpolation check dead code. This appears to be a leftover from testing — remove this line to restore the intended interpolation behavior.</comment>
<file context>
@@ -865,83 +806,368 @@ contains
+ else
+ call f_check_interpolation_2D(boundary_v, boundary_edge_count, (/dx_local, dy_local, 0._wp/), interpolate)
+ end if
+ interpolate = .false.
+
+ ! Show the number of edges and boundary edges in 2D STL models
</file context>
```fortran
! random jitter in the origin helps us estimate volume fraction instead of only at the cell center
ray_origins(i, k) = point(k) + (f_model_random_number(rand_seed) - 0.5_wp)*spacing(k)
! cast sample rays in all directions
ray_dirs(i, k) = point(k) + f_model_random_number(rand_seed) - 0.5_wp
```
P2: Ray directions should be random around zero; adding point(k) skews them toward the position vector and biases inside/outside tests. Use only the random offset for directions.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/common/m_model.fpp, line 550:
<comment>Ray directions should be random around zero; adding point(k) skews them toward the position vector and biases inside/outside tests. Use only the random offset for directions.</comment>
<file context>
@@ -493,58 +526,116 @@ contains
+ ! random jitter in the origin helps us estimate volume fraction instead of only at the cell center
+ ray_origins(i, k) = point(k) + (f_model_random_number(rand_seed) - 0.5_wp)*spacing(k)
+ ! cast sample rays in all directions
+ ray_dirs(i, k) = point(k) + f_model_random_number(rand_seed) - 0.5_wp
+ end do
ray_dirs(i, :) = ray_dirs(i, :)/sqrt(sum(ray_dirs(i, :)*ray_dirs(i, :)))
</file context>
Suggested change:

```diff
- ray_dirs(i, k) = point(k) + f_model_random_number(rand_seed) - 0.5_wp
+ ray_dirs(i, k) = f_model_random_number(rand_seed) - 0.5_wp
```
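To see why the `point(k)` term matters, here is a small standalone Python sketch (illustrative only, not project code; the helper names are invented) comparing the two formulas. With the cell position folded into the direction, nearly every ray ends up pointing along the cell's position vector; with the pure random offset, directions spread isotropically around zero:

```python
import random

random.seed(0)
point = [100.0, 50.0, 25.0]  # a cell far from the origin

def unit(v):
    # guard against zero-length vectors, mirroring the suggested sgm_eps guard
    mag = max(sum(c * c for c in v) ** 0.5, 1e-12)
    return [c / mag for c in v]

# flagged formula: direction includes the position, coupling it to absolute coordinates
biased = [unit([p + random.random() - 0.5 for p in point]) for _ in range(1000)]
# suggested fix: direction is the random offset alone
fixed = [unit([random.random() - 0.5 for _ in range(3)]) for _ in range(1000)]

toward = unit(point)
cos_biased = sum(sum(a * b for a, b in zip(d, toward)) for d in biased) / 1000
cos_fixed = sum(sum(a * b for a, b in zip(d, toward)) for d in fixed) / 1000
# cos_biased is close to 1 (all rays aim the same way); cos_fixed is near 0 (isotropic)
```

With such clustered rays, an even-odd crossing count samples essentially one direction, which is what biases the inside/outside vote.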
```diff
  self.prohibit(ib and n <= 0,
                "Immersed Boundaries do not work in 1D (requires n > 0)")
- self.prohibit(ib and (num_ibs <= 0 or num_ibs > 10),
+ self.prohibit(ib and (num_ibs <= 0 or num_ibs > 1000),
```
P3: Update the validation error message to match the new upper bound (1000) so users get accurate guidance when the check fails.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At toolchain/mfc/case_validator.py, line 687:
<comment>Update the validation error message to match the new upper bound (1000) so users get accurate guidance when the check fails.</comment>
<file context>
@@ -684,7 +684,7 @@ def check_ibm(self):
self.prohibit(ib and n <= 0,
"Immersed Boundaries do not work in 1D (requires n > 0)")
- self.prohibit(ib and (num_ibs <= 0 or num_ibs > 10),
+ self.prohibit(ib and (num_ibs <= 0 or num_ibs > 1000),
"num_ibs must be between 1 and num_patches_max (10)")
self.prohibit(not ib and num_ibs > 0,
</file context>
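A minimal Python sketch of the reviewer's suggestion (hypothetical names; `NUM_PATCHES_MAX` stands in for whatever constant the toolchain actually exposes): deriving the message text from the same constant used in the check prevents the message from going stale again:

```python
NUM_PATCHES_MAX = 1000  # stand-in for the toolchain's authoritative bound

def check_num_ibs(ib: bool, num_ibs: int):
    """Return an error string when the IBM patch count is out of range, else None."""
    if ib and (num_ibs <= 0 or num_ibs > NUM_PATCHES_MAX):
        # the bound is interpolated from the constant, so message and check cannot diverge
        return f"num_ibs must be between 1 and num_patches_max ({NUM_PATCHES_MAX})"
    return None
```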
```fortran
integer :: i
logical :: forces_computed

call nvtxStartRange("PROPAGATE-IMERSED-BOUNDARIES")
```
P3: Fix the NVTX range label typo (“IMERSED”) so profiling markers are consistent and searchable.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/simulation/m_time_steppers.fpp, line 805:
<comment>Fix the NVTX range label typo (“IMERSED”) so profiling markers are consistent and searchable.</comment>
<file context>
@@ -802,6 +802,8 @@ contains
integer :: i
logical :: forces_computed
+ call nvtxStartRange("PROPAGATE-IMERSED-BOUNDARIES")
+
forces_computed = .false.
</file context>
Suggested change:

```diff
- call nvtxStartRange("PROPAGATE-IMERSED-BOUNDARIES")
+ call nvtxStartRange("PROPAGATE-IMMERSED-BOUNDARIES")
```
```diff
@@ -28,6 +28,8 @@ module m_ibm
```
P3: s_populate_ib_buffers is now dead code — it is a private subroutine that is no longer called from anywhere after the refactoring removed its callsites in s_ibm_setup and s_update_mib. Consider removing the subroutine to avoid confusion for future contributors.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/simulation/m_ibm.fpp, line 114:
<comment>`s_populate_ib_buffers` is now dead code — it is a private subroutine that is no longer called from anywhere after the refactoring removed its callsites in `s_ibm_setup` and `s_update_mib`. Consider removing the subroutine to avoid confusion for future contributors.</comment>
<file context>
@@ -96,19 +100,18 @@ contains
- call s_apply_ib_patches(ib_markers%sf(0:m, 0:n, 0:p))
$:GPU_UPDATE(device='[ib_markers%sf]')
- call s_populate_ib_buffers()
+ call s_apply_ib_patches(ib_markers)
$:GPU_UPDATE(host='[ib_markers%sf]')
do i = 1, num_ibs
</file context>
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (8)
src/simulation/m_time_steppers.fpp (1)
805-855: ⚠️ Potential issue | 🟡 Minor. Fix NVTX range label typo for consistent profiling.
The label reads “IMERSED” instead of “IMMERSED”, which can break grouping in profiling tools.
✏️ Proposed fix
```diff
- call nvtxStartRange("PROPAGATE-IMERSED-BOUNDARIES")
+ call nvtxStartRange("PROPAGATE-IMMERSED-BOUNDARIES")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/simulation/m_time_steppers.fpp` around lines 805-855, the NVTX range label passed to nvtxStartRange in the MTimeSteppers routine is misspelled ("PROPAGATE-IMERSED-BOUNDARIES"); update the string to the correct label "PROPAGATE-IMMERSED-BOUNDARIES" so profiling tools group this region correctly (look for the nvtxStartRange(...) call in the do-loop that updates patch_ib and ensure nvtxEndRange remains paired).

src/simulation/m_compute_levelset.fpp (1)
641-709: ⚠️ Potential issue | 🔴 Critical. Fix integer `interpolate` used as logical in the 3D-model path (won't compile).

Line 691 uses `if (interpolate) then`, but `interpolate` is declared as `integer` (line 647). This is not valid Fortran and will fail compilation. The 2D branch already correctly uses `if (interpolate == 1) then` (line 704).

Proposed fix:

```diff
  ! Get the shortest distance between the cell center and the interpolated model boundary
- if (interpolate) then
+ if (interpolate == 1) then
      gp%levelset = f_interpolated_distance(gpu_interpolated_boundary_v(:, :, patch_id), total_vertices, xyz_local)
  else
      gp%levelset = distance
  end if
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/simulation/m_compute_levelset.fpp` around lines 641-709, the integer variable interpolate in subroutine s_model_levelset is being used as a logical in the 3D branch (if (interpolate) then) which will not compile; update that conditional to explicitly compare the integer (e.g., if (interpolate == 1) then) or change the variable to a logical and adjust all uses accordingly so that the 3D path (the block that currently calls f_interpolated_distance or assigns gp%levelset = distance) uses a valid boolean check on interpolate.

src/simulation/m_ibm.fpp (3)
589-676: ⚠️ Potential issue | 🟠 Major. Add `patch_id` to the GPU `private` list in `s_find_ghost_points`.

`patch_id` is assigned in every iteration (lines 628, 666) and used only within that iteration. Without being declared private, multiple GPU threads will access a shared `patch_id` variable, causing race conditions when reading from `patch_ib(patch_id)%...`.

Proposed fix:

```diff
- $:GPU_PARALLEL_LOOP(private='[i,j,k,ii,jj,kk,is_gp,local_idx]', copyin='[count,count_i, x_domain, y_domain, z_domain]', firstprivate='[gp_layers,gp_layers_z]', collapse=3)
+ $:GPU_PARALLEL_LOOP(private='[i,j,k,ii,jj,kk,is_gp,local_idx,patch_id]', copyin='[count,count_i, x_domain, y_domain, z_domain]', firstprivate='[gp_layers,gp_layers_z]', collapse=3)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/simulation/m_ibm.fpp` around lines 589 - 676, The variable patch_id in s_find_ghost_points is written per-iteration but not listed as private in the $:GPU_PARALLEL_LOOP, causing concurrent threads to race when reading patch_ib(patch_id); update the GPU pragma for s_find_ghost_points to include patch_id in the private list (alongside i,j,k,ii,jj,kk,is_gp,local_idx) so each thread gets its own patch_id before using patch_ib(patch_id)%... .
441-534: ⚠️ Potential issue | 🟠 Major. Fix pointer association and silent out-of-bounds risk in GPU kernel image-point search.

Inside the GPU-parallel loop, `s_cc => x_cc/y_cc/z_cc` is pointer association within device code, and the bounds safety check (lines 509-521) is compiled out for GPU builds (`#if !defined(MFC_OpenACC)`). This means the `do while` loop can walk `index` beyond `[-buff_size, bound]` silently, storing invalid values in `ip_grid` without error or warning on GPU.

Recommended fixes:

- Add explicit bounds clamping inside the loop: if `index` reaches the boundary, exit and clamp to a valid index.
- Alternatively, replace the linear search with a binary search over monotone `s_cc` for both safety and performance.

Current unsafe loop (lines 506-524):

```fortran
do while ((temp_loc < s_cc(index) &
           .or. temp_loc > s_cc(index + 1)))
    index = index + dir
#if !defined(MFC_OpenACC) && !defined(MFC_OpenMP)
    if (index < -buff_size .or. index > bound) then
        ! error handling (skipped on GPU)
        error stop "Ghost Point and Image Point on Different Processors"
    end if
#endif
end do
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/simulation/m_ibm.fpp` around lines 441 - 534, s_compute_image_points contains a GPU-parallel do-while that advances index using pointer s_cc and can run out-of-bounds on device builds because the host-only error check is compiled out; fix by adding an in-device-safe boundary clamp or replace the linear scan with a binary search over s_cc. Specifically, inside the loop that updates index (references: s_cc, index, dir, temp_loc, bound, buff_size) add an explicit check that if index reaches the lower/upper limits (-buff_size or bound) you break the loop and clamp index to the nearest valid cell before assigning ghost_points_in(q)%ip_grid(:); alternatively, implement a device-safe binary_search helper that takes s_cc and temp_loc and returns the correct index and use that in place of the do-while to guarantee safety and better performance.
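The suggested binary-search replacement can be sketched in a few lines of Python (illustrative only, not the Fortran kernel): because the cell-center coordinates are monotone, `bisect` finds the enclosing interval in O(log n), and clamping the result removes the silent walk past the array bounds:

```python
from bisect import bisect_right

def locate_cell(s_cc, x):
    """Return i such that s_cc[i] <= x < s_cc[i+1], clamped to a valid interval."""
    i = bisect_right(s_cc, x) - 1          # O(log n) search over monotone centers
    return max(0, min(i, len(s_cc) - 2))   # clamp instead of indexing out of range

centers = [0.0, 1.0, 2.0, 3.0, 4.0]
```

A device-side version would use the same halving loop written out explicitly, since library bisection is not available in a GPU kernel.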
967-1120: ⚠️ Potential issue | 🟠 Major. Add `GPU_UPDATE(device=...)` after MPI reductions to maintain self-contained GPU synchronization.

The MPI reductions at lines 1108-1109 occur on the host, followed by host-side writes to `patch_ib(i)%force` and `patch_ib(i)%torque`. While the caller (`s_update_mib` at line 940 in m_time_steppers.fpp) does eventually sync `patch_ib` to device, making `s_compute_ib_forces` self-contained by adding `GPU_UPDATE(device='[patch_ib(1:num_ibs)]')` after the final loop ensures coherence is handled within the function and aligns with the guideline: "Wrap MPI calls with `GPU_UPDATE` to ensure GPU-host synchronization before and after MPI communication."

Suggested fix:

```diff
  do i = 1, num_ibs
      patch_ib(i)%force(:) = forces(i, :)
      patch_ib(i)%torque(:) = matmul(patch_ib(i)%rotation_matrix_inverse, torques(i, :))
  end do
+
+ ! Sync updated patch_ib to device for GPU kernels
+ $:GPU_UPDATE(device='[patch_ib(1:num_ibs)]')
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/simulation/m_ibm.fpp` around lines 967-1120, the MPI reductions in s_compute_ib_forces (calls to s_mpi_allreduce_vectors_sum for forces and torques) are host-side and must be bracketed by a GPU update to ensure device/host coherence; insert GPU_UPDATE(device='[patch_ib(1:num_ibs)]') immediately after the two s_mpi_allreduce_vectors_sum calls and before any writes to patch_ib(i)%force or patch_ib(i)%torque so the routine handles synchronization itself.

src/common/m_model.fpp (1)
498-576: ⚠️ Potential issue | 🟠 Major. Ray direction generation is incorrect and risks biased/degenerate rays.

Ray directions in both `f_model_is_inside` (line 550) and `f_model_is_inside_flat` (line 608) are computed as `point(k) + f_model_random_number(rand_seed) - 0.5_wp`. This couples the direction to absolute coordinates, causing heavy bias for large coordinate values. Additionally, both functions normalize without guarding against zero-length vectors, risking division by zero.

Ray directions should be independent of the point's absolute location; only the ray origin should depend on `point`. The fix removes the `point(k)` term and adds `max(..., sgm_eps)` guards on normalization (consistent with existing patterns in the codebase).

Proposed changes:

```diff
  Line 550 (f_model_is_inside):
- ray_dirs(i, k) = point(k) + f_model_random_number(rand_seed) - 0.5_wp
+ ray_dirs(i, k) = f_model_random_number(rand_seed) - 0.5_wp

  Line 552:
- ray_dirs(i, :) = ray_dirs(i, :)/sqrt(sum(ray_dirs(i, :)*ray_dirs(i, :)))
+ ray_dirs(i, :) = ray_dirs(i, :)/max(sqrt(sum(ray_dirs(i, :)*ray_dirs(i, :))), sgm_eps)

  Line 608 (f_model_is_inside_flat):
- dir(k) = point(k) + f_model_random_number(rand_seed) - 0.5_wp
+ dir(k) = f_model_random_number(rand_seed) - 0.5_wp

  Lines 610-611:
- dir_mag = sqrt(dir(1)*dir(1) + dir(2)*dir(2) + dir(3)*dir(3))
- dir(:) = dir(:)/dir_mag
+ dir_mag = sqrt(dir(1)*dir(1) + dir(2)*dir(2) + dir(3)*dir(3))
+ dir(:) = dir(:)/max(dir_mag, sgm_eps)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/common/m_model.fpp` around lines 498-576, the ray direction generation in f_model_is_inside (and similarly f_model_is_inside_flat) incorrectly adds point(k) to the random direction (coupling direction to absolute coordinates) and normalizes without guarding against zero length; fix by computing ray_dirs(i,k) = f_model_random_number(rand_seed) - 0.5_wp (no point(k) term) and when normalizing replace direct division by sqrt(sum(...)) with division by max(sqrt(sum(ray_dirs(i,:)*ray_dirs(i,:))), sgm_eps) (or the project-wide small epsilon symbol) to avoid divide-by-zero; update both f_model_is_inside and f_model_is_inside_flat and keep ray_origins assignment as-is using f_model_random_number(rand_seed).

src/simulation/m_ib_patches.fpp (2)
169-323: ⚠️ Potential issue | 🟠 Major. Add loop-local scalars `xa`, `yc`, and `dycdxc` to the `private` list in `GPU_PARALLEL_LOOP`.

Lines 265-266: These three scalars are assigned per-iteration inside the GPU-parallel region (lines 274-280) but are not declared private, creating a race condition where multiple GPU threads write to the same global variables. Update the `private` clause to `private='[i,j,xy_local,k,f,xa,yc,dycdxc]'`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/simulation/m_ib_patches.fpp` around lines 169 - 323, The GPU_PARALLEL_LOOP in subroutine s_ib_airfoil currently omits loop-local scalars xa, yc, and dycdxc from its private list, causing a race condition; update the GPU_PARALLEL_LOOP directive's private clause to include xa, yc, and dycdxc (so it becomes private='[i,j,xy_local,k,f,xa,yc,dycdxc]') so each GPU thread gets its own copies when evaluating the airfoil logic that assigns xa, yc, and dycdxc and writes into ib_markers%sf.
28-112: ⚠️ Potential issue | 🟠 Major. Public API change: `s_apply_ib_patches` now takes `type(integer_field)`; all callers verified and updated correctly.

`s_apply_ib_patches` changed its argument type (line 60). Both call sites in `src/simulation/m_ibm.fpp` (lines 114, 945) correctly pass `type(integer_field)` as the argument. This breaking signature change affects the public API and is appropriately flagged for review.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/simulation/m_ib_patches.fpp` around lines 28 - 112, s_apply_ib_patches now expects an argument of type(integer_field) (the signature changed), so update all call sites and any interface/wrapper that referenced the old signature; specifically ensure the calls in m_ibm.fpp (the two places that were already updated) match the new prototype and remove/adjust any legacy declarations or interface blocks that declared s_apply_ib_patches with the old argument type, and update any documentation/comments that describe s_apply_ib_patches to reflect it takes a type(integer_field) argument.
🧹 Nitpick comments (2)
src/common/m_constants.fpp (1)
25-25: Run all three targets after updating shared constants.

This change is in `src/common/`, so please ensure the pre_process, simulation, and post_process targets are exercised to catch any cross-target regressions.
Based on learnings: "Changes to `src/common/` affect all three executables (pre_process, simulation, post_process); test all three targets."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/common/m_constants.fpp` at line 25, you changed the shared constant num_patches_max in src/common/m_constants.fpp which can impact all executables; rebuild and run the pre_process, simulation, and post_process targets (or their test suites) to exercise cross-target behavior and catch regressions; specifically rebuild the three binaries that consume num_patches_max (pre_process, simulation, post_process), run their full test inputs or integration runs, and verify no runtime assertions, array bounds, or config mismatches occur after the change.

src/simulation/m_ibm.fpp (1)
536-586: Use reduction instead of atomic operations to improve scaling for grid-cell classification.

`num_gps_local` and `num_inner_gps_local` are currently incremented via `GPU_ATOMIC(update)` for every classified cell, causing high contention on large grids. Since these are simple sums, use `GPU_PARALLEL_LOOP` reductions instead:

```fortran
$:GPU_PARALLEL_LOOP(collapse=3, private='[i,j,k,ii,jj,kk,is_gp]', firstprivate='[gp_layers,gp_layers_z]', &
    & reduction='[[num_gps_local, num_inner_gps_local]]', reductionOp='[+, +]')
```

Replace the `copy` directive and remove both `GPU_ATOMIC` calls. Gang/vector reductions avoid per-thread atomics and scale significantly better.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/simulation/m_ibm.fpp` around lines 536 - 586, The loop in subroutine s_find_num_ghost_points uses per-cell GPU_ATOMIC updates on num_gps_local and num_inner_gps_local causing contention; replace the copy-based GPU_PARALLEL_LOOP with a reduction-based directive so the counts are accumulated via GPU reductions instead of atomics (use GPU_PARALLEL_LOOP with collapse=3, private='[i,j,k,ii,jj,kk,is_gp]', firstprivate='[gp_layers,gp_layers_z]' and a reduction clause for [num_gps_local, num_inner_gps_local] with '+'), remove the two GPU_ATOMIC(atomic='update') regions and keep the rest of the ghost-classification logic unchanged, then assign num_gps_out and num_inner_gps_out from the reduced locals as before.
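The atomic-vs-reduction trade-off can be illustrated with a plain Python sketch (not the GPU directive syntax): each "gang" keeps a private partial count and the partials are combined once at the end, which is the shape of code a `+` reduction generates, instead of every cell contending on one shared counter:

```python
def count_flagged_cells(flags, num_gangs=4):
    """Count True entries using per-gang private accumulators plus one combine step."""
    n = len(flags)
    chunk = (n + num_gangs - 1) // num_gangs
    partials = []
    for g in range(num_gangs):          # each gang works on its own slice
        local = 0                       # private accumulator: no shared-counter contention
        for f in flags[g * chunk:(g + 1) * chunk]:
            if f:
                local += 1
        partials.append(local)
    return sum(partials)                # single reduction replaces per-cell atomics
```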
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/common/m_derived_types.fpp`:
- Around line 186-199: The t_model_array derived type declares scalar fields
boundary_edge_count, total_vertices, interpolate and ntrs without defaults which
can leave garbage values; add default initializers for these scalars in the type
definition (set boundary_edge_count=0, total_vertices=0, interpolate=0, ntrs=0)
so all instances start deterministic before any assignment, and then run and
verify pre_process, post_process and simulation builds/tests to ensure the
initialization change doesn't break their workflows; update any constructors or
code that relied on uninitialized behavior if necessary.
In `@src/common/m_model.fpp`:
- Around line 1209-1267: The subroutine f_distance_normals_3D_flat can index
trs_n(:, tri_idx, pid) with tri_idx==0 when ntrs <= 0; add a guard at the top of
f_distance_normals_3D_flat that checks if ntrs <= 0 and in that case set
distance to a safe value (e.g. a large positive sentinel or 0._wp as
appropriate) and set normals to a defined default (e.g. zero vector) and return
immediately to avoid using tri_idx/uninitialized indices; ensure you reference
and update tri_idx, distance and normals in this early-return path so no
out-of-bounds access to trs_n occurs.
- Around line 20-42: Add a module finalizer in m_model.fpp (e.g., subroutine
s_finalize_model or a MODULE PROCEDURE bound to final) that checks
ALLOCATED(...) and calls DEALLOCATE(...) for all public allocatable arrays
gpu_ntrs, gpu_trs_v, gpu_trs_n, gpu_boundary_v, gpu_interpolated_boundary_v,
gpu_interpolate, gpu_boundary_edge_count, gpu_total_vertices, stl_bounding_boxes
and models to ensure GPU/host memory is freed during simulation cleanup; export
or document the finalizer so the cleanup path in your driver or cleanup routine
calls it. Also remove the unnecessary "use m_model" import from
src/pre_process/m_icpp_patches.fpp (or the specific pre_process file mentioned)
since it does not reference the gpu_* arrays or models, eliminating the unwanted
coupling.
- Around line 1402-1415: s_pack_model_for_gpu currently unconditionally
allocates ma%trs_v and ma%trs_n; make it re-entrant by checking existing
allocations before allocating: set ma%ntrs = ma%model%ntrs, then for ma%trs_v
and ma%trs_n use allocated(...) to detect prior allocation and only allocate if
not allocated or if the current allocation dimensions don't match ma%ntrs (in
that case deallocate then allocate with the correct bounds); ensure you copy
into ma%trs_v(:, :, i) and ma%trs_n(:, i) after the allocation checks complete.
In `@src/simulation/m_ib_patches.fpp`:
- Around line 954-1169: s_instantiate_STL_models has four fixes: allocate
stl_bounding_boxes once before the patch loop with shape (num_ibs,1:3,1:3)
instead of calling allocate(stl_bounding_boxes(patch_id,...)) inside the loop;
ensure total_vertices is initialized or only copied into
models(patch_id)%total_vertices when interpolate is true (do not propagate
uninitialized total_vertices when interpolation is skipped); pass spacing
magnitudes to the interpolation routines by using (/dx_local, dy_local,
dz_local/) when calling f_interpolate_3D/f_interpolate_2D (and remove or fix the
unconditional interpolate = .false. that currently masks interpolation logic);
and protect module-level gpu_* allocations (gpu_ntrs, gpu_trs_v, gpu_trs_n,
gpu_interpolate, gpu_boundary_edge_count, gpu_total_vertices, gpu_boundary_v,
gpu_interpolated_boundary_v) with if (.not. allocated(...)) checks before each
allocate to allow repeated calls to s_instantiate_STL_models.
- Around line 828-953: s_update_mib’s restart path must refresh the
device-resident model arrays before generating markers: ensure
s_instantiate_STL_models() (or an equivalent GPU_ENTER_DATA/refresh routine for
module arrays gpu_ntrs, gpu_trs_v, gpu_trs_n) is called or re-entered before
s_apply_ib_patches() in the restart branch so device data reflects any changes
from s_update_ib_rotation_matrix(); in s_ib_model and s_ib_3d_model, add
explicit device presence to the GPU_PARALLEL_LOOP clauses by including
present='[gpu_ntrs, gpu_trs_v, gpu_trs_n]' (in addition to existing copyin) to
avoid relying on implicit persistence; and narrow the 2D scan in s_ib_model by
reusing the bounding-index logic from s_ib_3d_model (compute corner_distance
from stl_bounding_boxes and call get_bounding_indices to set i/j limits) to
avoid scanning the full grid for sparse STL patches.
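The bounding-index narrowing this comment asks s_ib_model to reuse amounts to converting a patch's bounding interval into loop-index limits against the sorted cell-center array. A minimal Python sketch of the idea; the function name and 0-based indexing here are illustrative, not the Fortran get_bounding_indices signature:

```python
import bisect

def bounding_indices(cell_centers, lo, hi, il, ir):
    """Narrow [il, ir] to the cells whose centers fall in [lo, hi].

    cell_centers is assumed sorted ascending (like x_cc/y_cc/z_cc);
    il/ir are the current (widest) loop limits, 0-based here.
    """
    # first index with center >= lo
    left = bisect.bisect_left(cell_centers, lo)
    # last index with center <= hi
    right = bisect.bisect_right(cell_centers, hi) - 1
    # never widen beyond the original limits
    return max(il, left), min(ir, right)

# Toy grid: cell centers at 0.5, 1.5, ..., 9.5
centers = [i + 0.5 for i in range(10)]
print(bounding_indices(centers, 2.0, 5.0, 0, 9))  # -> (2, 4)
```

Sparse STL patches then cost O(log n) index setup instead of a full-grid scan per patch.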
In `@toolchain/mfc/case_validator.py`:
- Around line 687-688: The error message passed to self.prohibit for the num_ibs
check is stale (mentions 10) while the actual upper bound is 1000; update the
message to reflect the real max and, to prevent future drift, construct the
message using the authoritative constraint value (e.g. the CONSTRAINTS constant
or the class attribute used for bounds) rather than a hardcoded literal—locate
the self.prohibit call that checks num_ibs (the line with self.prohibit(ib and
(num_ibs <= 0 or num_ibs > 1000), ...)) and interpolate/format the upper-bound
from CONSTRAINTS (or self.CONSTRAINTS / CONSTRAINTS['num_ibs']['max'] as
appropriate) so the message reads "num_ibs must be between 1 and <max>".
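A sketch of the suggested pattern, with a hypothetical CONSTRAINTS table standing in for whatever structure the toolchain actually uses as the authoritative bound:

```python
# Hypothetical constraint table; the real toolchain may store bounds differently.
CONSTRAINTS = {"num_ibs": {"min": 1, "max": 1000}}

def num_ibs_error(num_ibs, ib):
    """Return an error string when num_ibs is out of range, else None.

    Mirrors the self.prohibit(...) pattern: the message is formatted from
    the authoritative bound, so it cannot drift from the check itself.
    """
    lo, hi = CONSTRAINTS["num_ibs"]["min"], CONSTRAINTS["num_ibs"]["max"]
    if ib and (num_ibs < lo or num_ibs > hi):
        return f"num_ibs must be between {lo} and {hi}"
    return None

print(num_ibs_error(2000, True))  # -> num_ibs must be between 1 and 1000
```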
---
Outside diff comments:
In `@src/common/m_model.fpp`:
- Around line 498-576: The ray direction generation in f_model_is_inside (and
similarly f_model_is_inside_flat) incorrectly adds point(k) to the random
direction (coupling direction to absolute coordinates) and normalizes without
guarding against zero length; fix by computing ray_dirs(i,k) =
f_model_random_number(rand_seed) - 0.5_wp (no point(k) term) and when
normalizing replace direct division by sqrt(sum(...)) with division by
max(sqrt(sum(ray_dirs(i,:)*ray_dirs(i,:))), sgm_eps_wp) (or the project-wide
small epsilon symbol) to avoid divide-by-zero; update both f_model_is_inside and
f_model_is_inside_flat and keep ray_origins assignment as-is using
f_model_random_number(rand_seed).
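The proposed fix fits in a few lines. In this sketch, SGM_EPS stands in for the project's sgm_eps and the helper is illustrative, not the Fortran routine:

```python
import random

SGM_EPS = 1e-32  # stand-in for the project's small epsilon (sgm_eps)

def random_ray_direction(rng):
    """Random ray direction decoupled from the query point.

    Components are rand - 0.5 (no point(k) term), and the norm is
    clamped from below so a (vanishingly unlikely) zero vector never
    triggers a divide-by-zero during normalization.
    """
    d = [rng.random() - 0.5 for _ in range(3)]
    norm = max(sum(c * c for c in d) ** 0.5, SGM_EPS)
    return [c / norm for c in d]

rng = random.Random(1234)
d = random_ray_direction(rng)
assert abs(sum(c * c for c in d) - 1.0) < 1e-9  # unit length
```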
In `@src/simulation/m_compute_levelset.fpp`:
- Around line 641-709: The integer variable interpolate in subroutine
s_model_levelset is being used as a logical in the 3D branch (if (interpolate)
then) which will not compile; update that conditional to explicitly compare the
integer (e.g., if (interpolate == 1) then) or change the variable to a logical
and adjust all uses accordingly so that the 3D path (the block that currently
calls f_interpolated_distance or assigns gp%levelset = distance) uses a valid
boolean check on interpolate.
In `@src/simulation/m_ib_patches.fpp`:
- Around line 169-323: The GPU_PARALLEL_LOOP in subroutine s_ib_airfoil
currently omits loop-local scalars xa, yc, and dycdxc from its private list,
causing a race condition; update the GPU_PARALLEL_LOOP directive's private
clause to include xa, yc, and dycdxc (so it becomes
private='[i,j,xy_local,k,f,xa,yc,dycdxc]') so each GPU thread gets its own
copies when evaluating the airfoil logic that assigns xa, yc, and dycdxc and
writes into ib_markers%sf.
- Around line 28-112: s_apply_ib_patches now expects an argument of
type(integer_field) (the signature changed), so update all call sites and any
interface/wrapper that referenced the old signature; specifically ensure the
calls in m_ibm.fpp (the two places that were already updated) match the new
prototype and remove/adjust any legacy declarations or interface blocks that
declared s_apply_ib_patches with the old argument type, and update any
documentation/comments that describe s_apply_ib_patches to reflect it takes a
type(integer_field) argument.
In `@src/simulation/m_ibm.fpp`:
- Around line 589-676: The variable patch_id in s_find_ghost_points is written
per-iteration but not listed as private in the $:GPU_PARALLEL_LOOP, causing
concurrent threads to race when reading patch_ib(patch_id); update the GPU
pragma for s_find_ghost_points to include patch_id in the private list
(alongside i,j,k,ii,jj,kk,is_gp,local_idx) so each thread gets its own patch_id
before using patch_ib(patch_id)%... .
- Around line 441-534: s_compute_image_points contains a GPU-parallel do-while
that advances index using pointer s_cc and can run out-of-bounds on device
builds because the host-only error check is compiled out; fix by adding an
in-device-safe boundary clamp or replace the linear scan with a binary search
over s_cc. Specifically, inside the loop that updates index (references: s_cc,
index, dir, temp_loc, bound, buff_size) add an explicit check that if index
reaches the lower/upper limits (-buff_size or bound) you break the loop and
clamp index to the nearest valid cell before assigning
ghost_points_in(q)%ip_grid(:); alternatively, implement a device-safe
binary_search helper that takes s_cc and temp_loc and returns the correct index
and use that in place of the do-while to guarantee safety and better
performance.
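The suggested device-safe binary search might look like this sketch (Python for brevity; locate_cell is a hypothetical helper, and the clamping mirrors the proposed boundary check):

```python
def locate_cell(s_cc, temp_loc, lo, hi):
    """Binary-search the index i with s_cc[i] <= temp_loc < s_cc[i+1].

    s_cc is assumed monotonically increasing (cell-center coordinates);
    the result is clamped to [lo, hi-1] so out-of-range points map to
    the nearest valid cell instead of walking past the array bounds.
    """
    if temp_loc <= s_cc[lo]:
        return lo           # clamp low
    if temp_loc >= s_cc[hi]:
        return hi - 1       # clamp high
    a, b = lo, hi
    while b - a > 1:        # O(log n) and branch-light: GPU friendly
        mid = (a + b) // 2
        if s_cc[mid] <= temp_loc:
            a = mid
        else:
            b = mid
    return a

s_cc = [0.5, 1.5, 2.5, 3.5, 4.5]
print(locate_cell(s_cc, 2.7, 0, 4))  # -> 2
```

Besides safety, this bounds the per-point work even when the image point is far from the starting index.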
- Around line 967-1120: The MPI reductions in s_compute_ib_forces (calls to
s_mpi_allreduce_vectors_sum for forces and torques) are host-side and must be
bracketed by a GPU update to ensure device/host coherence; insert
GPU_UPDATE(device='[patch_ib(1:num_ibs)]') immediately after the two
s_mpi_allreduce_vectors_sum calls and before any writes to patch_ib(i)%force or
patch_ib(i)%torque so the routine handles synchronization itself.
In `@src/simulation/m_time_steppers.fpp`:
- Around line 805-855: The NVTX range label passed to nvtxStartRange in the
MTimeSteppers routine is misspelled ("PROPAGATE-IMERSED-BOUNDARIES"); update the
string to the correct label "PROPAGATE-IMMERSED-BOUNDARIES" so profiling tools
group this region correctly (look for the nvtxStartRange(...) call in the
do-loop that updates patch_ib and ensure nvtxEndRange remains paired).
---
Nitpick comments:
In `@src/common/m_constants.fpp`:
- Line 25: You changed the shared constant num_patches_max in
src/common/m_constants.fpp which can impact all executables; rebuild and run the
pre_process, simulation, and post_process targets (or their test suites) to
exercise cross-target behavior and catch regressions — specifically rebuild the
three binaries that consume num_patches_max (pre_process, simulation,
post_process), run their full test inputs or integration runs, and verify no
runtime assertions, array bounds, or config mismatches occur after the change.
In `@src/simulation/m_ibm.fpp`:
- Around line 536-586: The loop in subroutine s_find_num_ghost_points uses
per-cell GPU_ATOMIC updates on num_gps_local and num_inner_gps_local causing
contention; replace the copy-based GPU_PARALLEL_LOOP with a reduction-based
directive so the counts are accumulated via GPU reductions instead of atomics
(use GPU_PARALLEL_LOOP with collapse=3, private='[i,j,k,ii,jj,kk,is_gp]',
firstprivate='[gp_layers,gp_layers_z]' and a reduction clause for
[num_gps_local, num_inner_gps_local] with '+'), remove the two
GPU_ATOMIC(atomic='update') regions and keep the rest of the
ghost-classification logic unchanged, then assign num_gps_out and
num_inner_gps_out from the reduced locals as before.
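The atomics-to-reduction rewrite boils down to private per-thread accumulation followed by a single combine step; a toy sketch of the idea (not the GPU directive syntax):

```python
def count_ghost_points_reduction(flags, nthreads=4):
    """Count set flags via per-thread partial sums, then one combine.

    Emulates a '+' reduction clause: each "thread" accumulates into a
    private counter, avoiding the per-cell contention of a shared
    counter updated atomically.
    """
    chunks = [flags[t::nthreads] for t in range(nthreads)]
    partial = [sum(1 for f in chunk if f) for chunk in chunks]  # private accumulation
    return sum(partial)                                          # single reduction

flags = [i % 3 == 0 for i in range(30)]
print(count_ghost_points_reduction(flags))  # -> 10
```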
    type :: t_model_array
        ! Original CPU-side fields (unchanged)
        type(t_model), allocatable :: model
        real(wp), allocatable, dimension(:, :, :) :: boundary_v
        real(wp), allocatable, dimension(:, :) :: interpolated_boundary_v
        integer :: boundary_edge_count
        integer :: total_vertices
-       logical :: interpolate
+       integer :: interpolate

        ! GPU-friendly flattened arrays
        integer :: ntrs ! copy of model%ntrs
        real(wp), allocatable, dimension(:, :, :) :: trs_v ! (3, 3, ntrs) - triangle vertices
        real(wp), allocatable, dimension(:, :) :: trs_n ! (3, ntrs) - triangle normals
    end type t_model_array
🧩 Analysis chain
🏁 Script executed:
# First, check the current state of the file at the specified lines
cat -n src/common/m_derived_types.fpp | sed -n '180,210p'
🏁 Script executed:
# Search for all uses of %interpolate across the codebase
rg -nP '%interpolate\b' src --type-add 'fpp:*.fpp' --type 'fpp'
🏁 Script executed:
# Look for suspicious logical patterns with %interpolate
rg -nP '%interpolate\b.*(\.(true|false)|\.not\.)' src --type-add 'fpp:*.fpp' --type 'fpp'
🏁 Script executed:
# Check if there are any conditional patterns that may need review
rg -nP 'if\s*\(.*%interpolate' src --type-add 'fpp:*.fpp' --type 'fpp' -A 2
🏁 Script executed:
# Check for assignments to %interpolate in the codebase
rg -nP '%interpolate\s*=' src --type-add 'fpp:*.fpp' --type 'fpp' -B 1 -A 1
t_model_array scalars must be initialized to prevent garbage values reaching GPU or uninitialized field reads.
All uses of %interpolate correctly employ integer semantics (0/1, no logical operators); however, the type definition still lacks default initialization for boundary_edge_count, total_vertices, interpolate, and ntrs. Uninitialized scalars can leak garbage into GPU staging arrays or cause undefined behavior if any code path reads these fields before assignment.
Apply default initialization:

    integer :: boundary_edge_count = 0
    integer :: total_vertices = 0
    integer :: interpolate = 0
    integer :: ntrs = 0 ! copy of model%ntrs

Additionally, verify that changes do not break pre_process and post_process executables, not just simulation.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Current:

    type :: t_model_array
        ! Original CPU-side fields (unchanged)
        type(t_model), allocatable :: model
        real(wp), allocatable, dimension(:, :, :) :: boundary_v
        real(wp), allocatable, dimension(:, :) :: interpolated_boundary_v
        integer :: boundary_edge_count
        integer :: total_vertices
        integer :: interpolate
        ! GPU-friendly flattened arrays
        integer :: ntrs ! copy of model%ntrs
        real(wp), allocatable, dimension(:, :, :) :: trs_v ! (3, 3, ntrs) - triangle vertices
        real(wp), allocatable, dimension(:, :) :: trs_n ! (3, ntrs) - triangle normals
    end type t_model_array

Suggested:

    type :: t_model_array
        ! Original CPU-side fields (unchanged)
        type(t_model), allocatable :: model
        real(wp), allocatable, dimension(:, :, :) :: boundary_v
        real(wp), allocatable, dimension(:, :) :: interpolated_boundary_v
        integer :: boundary_edge_count = 0
        integer :: total_vertices = 0
        integer :: interpolate = 0
        ! GPU-friendly flattened arrays
        integer :: ntrs = 0 ! copy of model%ntrs
        real(wp), allocatable, dimension(:, :, :) :: trs_v ! (3, 3, ntrs) - triangle vertices
        real(wp), allocatable, dimension(:, :) :: trs_n ! (3, ntrs) - triangle normals
    end type t_model_array
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/common/m_derived_types.fpp` around lines 186 - 199, The t_model_array
derived type declares scalar fields boundary_edge_count, total_vertices,
interpolate and ntrs without defaults which can leave garbage values; add
default initializers for these scalars in the type definition (set
boundary_edge_count=0, total_vertices=0, interpolate=0, ntrs=0) so all instances
start deterministic before any assignment, and then run and verify pre_process,
post_process and simulation builds/tests to ensure the initialization change
doesn't break their workflows; update any constructors or code that relied on
uninitialized behavior if necessary.
    public :: f_model_read, s_model_write, s_model_free, f_model_is_inside, models, gpu_ntrs, &
        gpu_trs_v, gpu_trs_n, gpu_boundary_v, gpu_interpolated_boundary_v, gpu_interpolate, gpu_boundary_edge_count, &
        gpu_total_vertices, stl_bounding_boxes

    ! Subroutines for STL immersed boundaries
    public :: f_check_boundary, f_register_edge, f_check_interpolation_2D, &
        f_check_interpolation_3D, f_interpolate_2D, f_interpolate_3D, &
-       f_interpolated_distance, f_normals, f_distance, f_distance_normals_3D, f_tri_area
+       f_interpolated_distance, f_normals, f_distance, f_distance_normals_3D, f_tri_area, s_pack_model_for_gpu, &
+       f_model_is_inside_flat, f_distance_normals_3d_flat

    !! array of STL models that can be allocated and then used in IB marker and levelset compute
    type(t_model_array), allocatable, target :: models(:)
    !! GPU-friendly flat arrays for STL model data
    integer, allocatable :: gpu_ntrs(:)
    real(wp), allocatable, dimension(:, :, :, :) :: gpu_trs_v
    real(wp), allocatable, dimension(:, :, :) :: gpu_trs_n
    real(wp), allocatable, dimension(:, :, :, :) :: gpu_boundary_v
    real(wp), allocatable, dimension(:, :, :) :: gpu_interpolated_boundary_v
    integer, allocatable :: gpu_interpolate(:)
    integer, allocatable :: gpu_boundary_edge_count(:)
    integer, allocatable :: gpu_total_vertices(:)
    real(wp), allocatable :: stl_bounding_boxes(:, :, :)
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find allocation and deallocation sites for the new module variables
echo "=== Allocation/Deallocation patterns ==="
rg -nP --type fpp '(allocate|deallocate)\s*\(' src/common/m_model.fpp | head -30
echo -e "\n=== GPU array allocations in entire src/common ==="
rg -nP --type fpp 'gpu_ntrs|gpu_trs_v|gpu_trs_n|gpu_boundary_v|gpu_interpolated_boundary_v|gpu_interpolate|gpu_boundary_edge_count|gpu_total_vertices|stl_bounding_boxes' src/common/ | grep -i allocate | head -20
echo -e "\n=== Where m_model is imported ==="
rg -n 'use.*m_model' src/ | head -20
🏁 Script executed:
#!/bin/bash
# Check the finalization routine referenced in the public interface
echo "=== s_model_free subroutine definition ==="
rg -A 50 'subroutine\s+s_model_free' src/common/m_model.fpp | head -60
echo -e "\n=== s_model_write subroutine definition ==="
rg -A 30 'subroutine\s+s_model_write' src/common/m_model.fpp | head -40
🏁 Script executed:
#!/bin/bash
# Check if pre_process or post_process reference the new public variables or m_model
echo "=== pre_process usage of m_model ==="
rg -n 'gpu_ntrs|gpu_trs_v|gpu_trs_n|gpu_boundary_v|gpu_interpolated_boundary_v|stl_bounding_boxes|models' src/pre_process/ 2>/dev/null | head -10
echo -e "\n=== post_process usage of m_model ==="
rg -n 'gpu_ntrs|gpu_trs_v|gpu_trs_n|gpu_boundary_v|gpu_interpolated_boundary_v|stl_bounding_boxes|models' src/post_process/ 2>/dev/null | head -10
echo -e "\n=== Check if pre_process or post_process import m_model ==="
rg -n 'use.*m_model' src/pre_process/ src/post_process/ 2>/dev/null
🏁 Script executed:
#!/bin/bash
# Get the full m_model.fpp file to understand context
wc -l src/common/m_model.fpp
🏁 Script executed:
#!/bin/bash
# Read the module to find allocation patterns
# Focus on where the public arrays are allocated
sed -n '20,100p' src/common/m_model.fpp | cat -n
🏁 Script executed:
#!/bin/bash
# Search for allocation patterns - try broader search
rg -n 'allocate' src/common/m_model.fpp | head -30
🏁 Script executed:
#!/bin/bash
# Look for initialization subroutines that might allocate these arrays
rg -n 'subroutine.*init\|subroutine.*allocate\|subroutine.*setup' src/common/m_model.fpp | head -20
🏁 Script executed:
#!/bin/bash
# Search for where models(:) is allocated
rg -B 3 -A 3 'models\s*\(' src/common/m_model.fpp | head -50
🏁 Script executed:
#!/bin/bash
# Look for gpu_ntrs allocation pattern
rg -B 3 -A 3 'gpu_ntrs\s*\(' src/common/m_model.fpp | head -50
🏁 Script executed:
#!/bin/bash
# Check how these are used in simulation to understand lifecycle
echo "=== Usage in m_ibm.fpp ==="
rg -B 2 -A 2 'gpu_ntrs\|gpu_trs_v\|gpu_boundary_v\|models\(' src/simulation/m_ibm.fpp | head -40
echo -e "\n=== Usage in pre_process/m_icpp_patches.fpp ==="
rg -B 2 -A 2 'models\(' src/pre_process/m_icpp_patches.fpp | head -40
🏁 Script executed:
#!/bin/bash
# Search for allocation of the MODULE-LEVEL arrays in calling modules
echo "=== Search for allocate of gpu_ntrs in entire codebase ==="
rg 'allocate.*gpu_ntrs\|gpu_ntrs.*allocate' src/ | head -20
echo -e "\n=== Search for s_pack_model_for_gpu ==="
rg -n 's_pack_model_for_gpu' src/ | head -20
echo -e "\n=== Look for where gpu arrays are assigned/populated ==="
rg -n 'gpu_ntrs\s*=' src/ | head -20
🏁 Script executed:
#!/bin/bash
# Search for deallocate of module-level arrays
echo "=== Deallocate of module-level gpu arrays ==="
rg 'deallocate.*gpu_ntrs\|deallocate.*gpu_trs_v\|deallocate.*models\(' src/ | head -20
echo -e "\n=== Look for finalization in m_ibm.fpp ==="
rg -n 'finalize\|cleanup\|deallocate' src/simulation/m_ibm.fpp | head -20
🏁 Script executed:
#!/bin/bash
# Look at s_pack_model_for_gpu implementation
echo "=== s_pack_model_for_gpu subroutine ==="
rg -A 80 'subroutine\s+s_pack_model_for_gpu' src/common/m_model.fpp | head -100
🏁 Script executed:
#!/bin/bash
# Final verification: confirm module-level arrays have no lifecycle
echo "=== Complete scan for any allocation pattern ==="
rg 'allocate\s*\(' src/common/m_model.fpp src/simulation/m_ib_patches.fpp src/simulation/m_ibm.fpp | grep -E 'gpu_ntrs|gpu_trs_v|gpu_boundary_v|models\s*\(' || echo "NO ALLOCATIONS FOUND for public module arrays"
echo -e "\n=== Confirm these are truly module-level (public scope) ==="
rg -B 1 'integer.*allocatable.*gpu_ntrs|real.*allocatable.*gpu_trs' src/common/m_model.fpp
🏁 Script executed:
#!/bin/bash
# Search for deallocation of these specific arrays
echo "=== Deallocate patterns for public module arrays ==="
rg -n 'deallocate.*gpu_ntrs\|deallocate.*gpu_trs_v\|deallocate.*gpu_boundary_v\|deallocate.*models' src/ | head -20
echo -e "\n=== Broader search for deallocate in m_ib_patches.fpp ==="
rg -n 'deallocate' src/simulation/m_ib_patches.fpp | head -20
echo -e "\n=== Context around gpu array allocations in m_ib_patches.fpp ==="
rg -B 5 -A 10 'allocate.*gpu_ntrs' src/simulation/m_ib_patches.fpp | head -40
🏁 Script executed:
#!/bin/bash
# Check what pre_process actually uses from m_model
echo "=== What does pre_process use from m_model? ==="
sed -n '1,50p' src/pre_process/m_icpp_patches.fpp | grep -A 15 'use m_model'
echo -e "\n=== Any direct usage of gpu_* or models arrays in pre_process? ==="
rg -n 'gpu_ntrs|gpu_trs_v|gpu_boundary_v|models\(' src/pre_process/m_icpp_patches.fpp | head -10
Add deallocate() calls for all public module arrays in a finalization routine, and remove unnecessary m_model import from pre_process.
The public module arrays (gpu_ntrs, gpu_trs_v, gpu_trs_n, gpu_interpolate, gpu_boundary_edge_count, gpu_total_vertices) are allocated in m_ib_patches.fpp but never deallocated, causing GPU memory leaks. Create a finalization subroutine in m_model.fpp to deallocate these arrays and call it during simulation cleanup. Additionally, src/pre_process/m_icpp_patches.fpp imports m_model but does not use any of the gpu or models arrays—remove this unnecessary coupling per src/common/ guidelines.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/common/m_model.fpp` around lines 20 - 42, Add a module finalizer in
m_model.fpp (e.g., subroutine s_finalize_model or a MODULE PROCEDURE bound to
final) that checks ALLOCATED(...) and calls DEALLOCATE(...) for all public
allocatable arrays gpu_ntrs, gpu_trs_v, gpu_trs_n, gpu_boundary_v,
gpu_interpolated_boundary_v, gpu_interpolate, gpu_boundary_edge_count,
gpu_total_vertices, stl_bounding_boxes and models to ensure GPU/host memory is
freed during simulation cleanup; export or document the finalizer so the cleanup
path in your driver or cleanup routine calls it. Also remove the unnecessary
"use m_model" import from src/pre_process/m_icpp_patches.fpp (or the specific
pre_process file mentioned) since it does not reference the gpu_* arrays or
models, eliminating the unwanted coupling.
    subroutine f_distance_normals_3D_flat(ntrs, trs_v, trs_n, pid, point, normals, distance)

        $:GPU_ROUTINE(parallelism='[seq]')

        integer, intent(in) :: ntrs
        real(wp), dimension(:, :, :, :), intent(in) :: trs_v
        real(wp), dimension(:, :, :), intent(in) :: trs_n
        integer, intent(in) :: pid
        real(wp), dimension(1:3), intent(in) :: point
        real(wp), dimension(1:3), intent(out) :: normals
        real(wp), intent(out) :: distance

        real(wp), dimension(1:3, 1:3) :: tri
        real(wp) :: dist_min, dist_t_min
        real(wp) :: dist_min_normal, dist_buffer_normal
        real(wp), dimension(1:3) :: midp
        real(wp), dimension(1:3) :: dist_buffer
        integer :: i, j, tri_idx

        dist_min = 1.e12_wp
        dist_min_normal = 1.e12_wp
        distance = 0._wp

        tri_idx = 0
        do i = 1, ntrs
            do j = 1, 3
                tri(j, 1) = trs_v(j, 1, i, pid)
                tri(j, 2) = trs_v(j, 2, i, pid)
                tri(j, 3) = trs_v(j, 3, i, pid)
                dist_buffer(j) = sqrt((point(1) - tri(j, 1))**2 + &
                                      (point(2) - tri(j, 2))**2 + &
                                      (point(3) - tri(j, 3))**2)
            end do

            do j = 1, 3
                midp(j) = (tri(1, j) + tri(2, j) + tri(3, j))/3
            end do

            dist_t_min = minval(dist_buffer(1:3))
            dist_buffer_normal = sqrt((point(1) - midp(1))**2 + &
                                      (point(2) - midp(2))**2 + &
                                      (point(3) - midp(3))**2)

            if (dist_t_min < dist_min) then
                dist_min = dist_t_min
            end if

            if (dist_buffer_normal < dist_min_normal) then
                dist_min_normal = dist_buffer_normal
                tri_idx = i
            end if
        end do

        normals(1) = trs_n(1, tri_idx, pid)
        normals(2) = trs_n(2, tri_idx, pid)
        normals(3) = trs_n(3, tri_idx, pid)
        distance = dist_min

    end subroutine f_distance_normals_3D_flat
🧩 Analysis chain
🏁 Script executed:
rg -nP --type-add 'fpp:*.fpp' --type 'fpp' '\bf_distance_normals_3D_flat\s*\(' src
🏁 Script executed:
sed -n '680,700p' src/simulation/m_compute_levelset.fpp
🏁 Script executed:
rg -nP 'gpu_ntrs' src/simulation/m_compute_levelset.fpp | head -20
🏁 Script executed:
rg -nP '\bgpu_ntrs\b' src --type-add 'fpp:*.fpp' | grep -E '(allocate|dimension|intent|gpu_ntrs\s*=)'
🏁 Script executed:
sed -n '1110,1160p' src/simulation/m_ib_patches.fpp
🏁 Script executed:
sed -n '650,710p' src/simulation/m_compute_levelset.fpp
Add guard in f_distance_normals_3D_flat for ntrs <= 0 to prevent out-of-bounds access.
When ntrs is 0 (which can occur if a patch has no model allocated), the triangle loop never executes, so tri_idx keeps its initial value of 0 and is then used to index trs_n(:, tri_idx, pid) at the end, an out-of-bounds read. The caller at src/simulation/m_compute_levelset.fpp:688 does not validate gpu_ntrs(patch_id) > 0 before invoking this function.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/common/m_model.fpp` around lines 1209 - 1267, The subroutine
f_distance_normals_3D_flat can index trs_n(:, tri_idx, pid) with tri_idx==0 when
ntrs <= 0; add a guard at the top of f_distance_normals_3D_flat that checks if
ntrs <= 0 and in that case set distance to a safe value (e.g. a large positive
sentinel or 0._wp as appropriate) and set normals to a defined default (e.g.
zero vector) and return immediately to avoid using tri_idx/uninitialized
indices; ensure you reference and update tri_idx, distance and normals in this
early-return path so no out-of-bounds access to trs_n occurs.
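A sketch of the proposed guard. This toy version measures distance to triangle centroids only, a simplification of the Fortran routine; the point is the early return when ntrs <= 0:

```python
BIG = 1.0e12  # sentinel matching dist_min's initial value in the Fortran

def distance_normals_flat(ntrs, trs_v, trs_n, point):
    """Nearest-triangle distance/normal with an ntrs <= 0 guard.

    With no triangles, return a defined (BIG distance, zero normal)
    instead of indexing trs_n with a never-updated tri_idx.
    """
    if ntrs <= 0:
        return BIG, (0.0, 0.0, 0.0)  # early return: no out-of-bounds read
    best, tri_idx = BIG, 0
    for i in range(ntrs):
        # centroid of triangle i (list of three (x, y, z) vertices)
        mid = [sum(v[j] for v in trs_v[i]) / 3.0 for j in range(3)]
        d = sum((point[j] - mid[j]) ** 2 for j in range(3)) ** 0.5
        if d < best:
            best, tri_idx = d, i
    return best, trs_n[tri_idx]

# One unit triangle in the z=0 plane, normal +z
trs_v = [[(0, 0, 0), (1, 0, 0), (0, 1, 0)]]
trs_n = [(0.0, 0.0, 1.0)]
d, nrm = distance_normals_flat(1, trs_v, trs_n, (1.0 / 3.0, 1.0 / 3.0, 1.0))
assert nrm == (0.0, 0.0, 1.0)
```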
    subroutine s_pack_model_for_gpu(ma)
        type(t_model_array), intent(inout) :: ma
        integer :: i

        ma%ntrs = ma%model%ntrs
        allocate (ma%trs_v(1:3, 1:3, 1:ma%ntrs))
        allocate (ma%trs_n(1:3, 1:ma%ntrs))

        do i = 1, ma%ntrs
            ma%trs_v(:, :, i) = ma%model%trs(i)%v(:, :)
            ma%trs_n(:, i) = ma%model%trs(i)%n(:)
        end do
    end subroutine
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find all call sites to s_pack_model_for_gpu
rg -nP --type-add 'fpp:*.fpp' --type 'fpp' '\bs_pack_model_for_gpu\s*\(' src || true
🏁 Script executed:
#!/bin/bash
# Also search for the subroutine definition and nearby code
rg -nP --type-add 'fpp:*.fpp' --type 'fpp' -B 5 -A 20 'subroutine s_pack_model_for_gpu' src || true
🏁 Script executed:
#!/bin/bash
# Check if ma%trs_v or ma%trs_n are referenced elsewhere to understand lifecycle
rg -nP --type-add 'fpp:*.fpp' --type 'fpp' 'ma%trs_v|ma%trs_n' src || true
🏁 Script executed:
#!/bin/bash
# Get context around the call site in m_ib_patches.fpp
sed -n '1080,1120p' src/simulation/m_ib_patches.fpp
🏁 Script executed:
#!/bin/bash
# Search for surrounding loop context (broader range)
sed -n '1050,1130p' src/simulation/m_ib_patches.fpp
🏁 Script executed:
#!/bin/bash
# Check if models is deallocated/re-initialized somewhere
rg -nP --type-add 'fpp:*.fpp' --type 'fpp' 'models\s*\(|deallocate.*models' src/simulation/ | head -30
🏁 Script executed:
#!/bin/bash
# Check if s_instantiate_STL_models or models are re-initialized in any time-stepping/simulation loop
rg -nP --type-add 'fpp:*.fpp' --type 'fpp' 's_instantiate_STL_models|allocate.*models\(' src/simulation/ | grep -v '\.fpp-' | head -20
🏁 Script executed:
#!/bin/bash
# Search for any re-initialization or deallocate of models during simulation
rg -nP --type-add 'fpp:*.fpp' --type 'fpp' 'deallocate.*models|models.*=|trs_v.*deallocate|trs_n.*deallocate' src | head -20
Make s_pack_model_for_gpu re-entrant by checking allocations before allocating.
The function is called only once per object during initialization (in s_instantiate_STL_models), so there is currently no runtime risk. However, unconditional allocations of ma%trs_v and ma%trs_n violate defensive programming practices and would fail silently if the function were ever called again on the same object. Add pre-allocation checks:
Proposed fix

      ma%ntrs = ma%model%ntrs
    + if (allocated(ma%trs_v)) deallocate(ma%trs_v)
    + if (allocated(ma%trs_n)) deallocate(ma%trs_n)
      allocate (ma%trs_v(1:3, 1:3, 1:ma%ntrs))
      allocate (ma%trs_n(1:3, 1:ma%ntrs))

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Current:

    subroutine s_pack_model_for_gpu(ma)
        type(t_model_array), intent(inout) :: ma
        integer :: i
        ma%ntrs = ma%model%ntrs
        allocate (ma%trs_v(1:3, 1:3, 1:ma%ntrs))
        allocate (ma%trs_n(1:3, 1:ma%ntrs))
        do i = 1, ma%ntrs
            ma%trs_v(:, :, i) = ma%model%trs(i)%v(:, :)
            ma%trs_n(:, i) = ma%model%trs(i)%n(:)
        end do
    end subroutine

Suggested:

    subroutine s_pack_model_for_gpu(ma)
        type(t_model_array), intent(inout) :: ma
        integer :: i
        ma%ntrs = ma%model%ntrs
        if (allocated(ma%trs_v)) deallocate(ma%trs_v)
        if (allocated(ma%trs_n)) deallocate(ma%trs_n)
        allocate (ma%trs_v(1:3, 1:3, 1:ma%ntrs))
        allocate (ma%trs_n(1:3, 1:ma%ntrs))
        do i = 1, ma%ntrs
            ma%trs_v(:, :, i) = ma%model%trs(i)%v(:, :)
            ma%trs_n(:, i) = ma%model%trs(i)%n(:)
        end do
    end subroutine
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/common/m_model.fpp` around lines 1402 - 1415, s_pack_model_for_gpu
currently unconditionally allocates ma%trs_v and ma%trs_n; make it re-entrant by
checking existing allocations before allocating: set ma%ntrs = ma%model%ntrs,
then for ma%trs_v and ma%trs_n use allocated(...) to detect prior allocation and
only allocate if not allocated or if the current allocation dimensions don't
match ma%ntrs (in that case deallocate then allocate with the correct bounds);
ensure you copy into ma%trs_v(:, :, i) and ma%trs_n(:, i) after the allocation
checks complete.
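The re-entrancy pattern the prompt describes, sketched with a dict standing in for the derived type (the real fix uses allocated(...)/deallocate on the Fortran components):

```python
def pack_model_for_gpu(ma):
    """Re-entrant packing: (re)build flat arrays only when shapes change.

    'ma' is a dict stand-in for the t_model_array derived type; None
    plays the role of an unallocated component.
    """
    ntrs = len(ma["model"]["trs"])
    if ma.get("trs_v") is None or len(ma["trs_v"]) != ntrs:
        ma["trs_v"] = [None] * ntrs  # "deallocate then allocate" with new bounds
        ma["trs_n"] = [None] * ntrs
    ma["ntrs"] = ntrs
    for i, tri in enumerate(ma["model"]["trs"]):
        ma["trs_v"][i] = tri["v"]    # copy triangle vertices
        ma["trs_n"][i] = tri["n"]    # copy triangle normal
    return ma

ma = {"model": {"trs": [{"v": "V0", "n": "N0"}]}, "trs_v": None, "trs_n": None}
pack_model_for_gpu(ma)
pack_model_for_gpu(ma)  # safe to call twice: no double-allocation failure
```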
    !> The STL patch is a 2D geometry that is imported from an STL file.
    !! @param patch_id is the patch identifier
    !! @param ib_markers_sf Array to track patch ids
    !! @param STL_levelset STL levelset
    !! @param STL_levelset_norm STL levelset normals
    subroutine s_ib_model(patch_id, ib_markers_sf)
    !! @param ib_markers Array to track patch ids
    subroutine s_ib_model(patch_id, ib_markers)

        integer, intent(in) :: patch_id
        integer, dimension(0:m, 0:n, 0:p), intent(inout) :: ib_markers_sf

        integer :: i, j, k !< Generic loop iterators
        type(integer_field), intent(inout) :: ib_markers

        type(t_model), pointer :: model
        integer :: i, j, k !< Generic loop iterators
        integer :: spc

        real(wp) :: eta
        real(wp) :: eta, threshold
        real(wp), dimension(1:3) :: point, local_point, offset
        real(wp), dimension(1:3) :: center, xyz_local
        real(wp), dimension(1:3) :: center, xy_local
        real(wp), dimension(1:3, 1:3) :: inverse_rotation

        model => models(patch_id)%model
        center = 0._wp
        center(1) = patch_ib(patch_id)%x_centroid
        center(2) = patch_ib(patch_id)%y_centroid
        if (p > 0) center(3) = patch_ib(patch_id)%z_centroid
        inverse_rotation(:, :) = patch_ib(patch_id)%rotation_matrix_inverse(:, :)
        offset(:) = patch_ib(patch_id)%centroid_offset(:)
        spc = patch_ib(patch_id)%model_spc
        threshold = patch_ib(patch_id)%model_threshold

        do i = 0, m
        do j = 0, n
        do k = 0, p
        $:GPU_PARALLEL_LOOP(private='[i,j, xy_local, eta]',&
            & copyin='[patch_id,center,inverse_rotation, offset, spc, threshold]', collapse=2)
        do i = -gp_layers, m + gp_layers
            do j = -gp_layers, n + gp_layers

                xyz_local = [x_cc(i) - center(1), y_cc(j) - center(2), 0._wp]
                if (p > 0) then
                    xyz_local(3) = z_cc(k) - center(3)
                end if
                xy_local = [x_cc(i) - center(1), y_cc(j) - center(2), 0._wp]
                xy_local = matmul(inverse_rotation, xy_local)
                xy_local = xy_local - offset

                if (grid_geometry == 3) then
                    xy_local = f_convert_cyl_to_cart(xy_local)
                end if

                eta = f_model_is_inside_flat(gpu_ntrs(patch_id), &
                                             gpu_trs_v, gpu_trs_n, &
                                             patch_id, &
                                             xy_local, (/dx(i), dy(j), 0._wp/), &
                                             spc)

                ! Reading STL boundary vertices and compute the levelset and levelset_norm
                if (eta > threshold) then
                    ib_markers%sf(i, j, 0) = patch_id
                end if

            end do
        end do
        $:END_GPU_PARALLEL_LOOP()

    end subroutine s_ib_model

    !> The STL patch is a 3D geometry that is imported from an STL file.
    !! @param patch_id is the patch identifier
    !! @param ib_markers Array to track patch ids
    subroutine s_ib_3d_model(patch_id, ib_markers)

        integer, intent(in) :: patch_id
        type(integer_field), intent(inout) :: ib_markers

        integer :: i, j, k, il, ir, jl, jr, kl, kr !< Generic loop iterators
        integer :: spc

        real(wp) :: eta, threshold, corner_distance
        real(wp), dimension(1:3) :: point, local_point, offset
        real(wp), dimension(1:3) :: center, xyz_local
        real(wp), dimension(1:3, 1:3) :: inverse_rotation

        center = 0._wp
        center(1) = patch_ib(patch_id)%x_centroid
        center(2) = patch_ib(patch_id)%y_centroid
        center(3) = patch_ib(patch_id)%z_centroid
        inverse_rotation(:, :) = patch_ib(patch_id)%rotation_matrix_inverse(:, :)
        offset(:) = patch_ib(patch_id)%centroid_offset(:)
        spc = patch_ib(patch_id)%model_spc
        threshold = patch_ib(patch_id)%model_threshold

        il = -gp_layers
        jl = -gp_layers
        kl = -gp_layers
        ir = m + gp_layers
        jr = n + gp_layers
        kr = p + gp_layers
        corner_distance = 0._wp
        do i = 1, 3
            corner_distance = corner_distance + maxval(abs(stl_bounding_boxes(patch_id, i, 1:3)))**2 ! distance to rim of cylinder
        end do
        corner_distance = sqrt(corner_distance)
        call get_bounding_indices(center(1) - corner_distance, center(1) + corner_distance, x_cc, il, ir)
        call get_bounding_indices(center(2) - corner_distance, center(2) + corner_distance, y_cc, jl, jr)
| call get_bounding_indices(center(3) - corner_distance, center(3) + corner_distance, z_cc, kl, kr) | ||
|
|
||
| $:GPU_PARALLEL_LOOP(private='[i,j,k, xyz_local, eta]',& | ||
| & copyin='[patch_id,center,inverse_rotation, offset, spc, threshold]', collapse=3) | ||
| do i = il, ir | ||
| do j = jl, jr | ||
| do k = kl, kr | ||
|
|
||
| xyz_local = [x_cc(i) - center(1), y_cc(j) - center(2), z_cc(k) - center(3)] | ||
| xyz_local = matmul(inverse_rotation, xyz_local) | ||
| xyz_local = xyz_local - offset | ||
|
|
||
| if (grid_geometry == 3) then | ||
| xyz_local = f_convert_cyl_to_cart(xyz_local) | ||
| end if | ||
|
|
||
| if (p == 0) then | ||
| eta = f_model_is_inside(model, xyz_local, (/dx(i), dy(j), 0._wp/), patch_ib(patch_id)%model_spc) | ||
| else | ||
| eta = f_model_is_inside(model, xyz_local, (/dx(i), dy(j), dz(k)/), patch_ib(patch_id)%model_spc) | ||
| end if | ||
| eta = f_model_is_inside_flat(gpu_ntrs(patch_id), & | ||
| gpu_trs_v, gpu_trs_n, & | ||
| patch_id, & | ||
| xyz_local, (/dx(i), dy(j), dz(k)/), & | ||
| spc) | ||
|
|
||
| ! Reading STL boundary vertices and compute the levelset and levelset_norm | ||
| if (eta > patch_ib(patch_id)%model_threshold) then | ||
| ib_markers_sf(i, j, k) = patch_id | ||
| ib_markers%sf(i, j, k) = patch_id | ||
| end if | ||
|
|
||
| end do | ||
| end do | ||
| end do | ||
| $:END_GPU_PARALLEL_LOOP() | ||
|
|
||
| end subroutine s_ib_model | ||
| end subroutine s_ib_3d_model | ||
|
|
Critical: GPU model data not refreshed in s_update_mib restart path; also declare persistent GPU arrays explicitly in kernel data clauses.
The gpu_ntrs/gpu_trs_v/gpu_trs_n arrays (module-level in m_model.fpp) are allocated and pushed to device via GPU_ENTER_DATA in s_instantiate_STL_models() with persistent lifetime. However, in the restart path (s_update_mib), s_apply_ib_patches() is called at line 945 without re-calling s_instantiate_STL_models() to refresh GPU data. If s_update_ib_rotation_matrix() modifies model state, the device arrays will be stale on the next marker generation.
Additionally, both s_ib_model and s_ib_3d_model reference gpu_ntrs/gpu_trs_v/gpu_trs_n inside GPU_PARALLEL_LOOP but do not explicitly declare them in the copyin or present clauses—they rely on implicit device persistence. For clarity and robustness, add present='[gpu_ntrs, gpu_trs_v, gpu_trs_n]' to the loop declarations.
Finally, s_ib_model (2D) scans the full grid; adding the same bounding-box narrowing logic from s_ib_3d_model would improve performance for sparse STL objects.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/simulation/m_ib_patches.fpp` around lines 828 - 953, s_update_mib’s
restart path must refresh the device-resident model arrays before generating
markers: ensure s_instantiate_STL_models() (or an equivalent
GPU_ENTER_DATA/refresh routine for module arrays gpu_ntrs, gpu_trs_v, gpu_trs_n)
is called or re-entered before s_apply_ib_patches() in the restart branch so
device data reflects any changes from s_update_ib_rotation_matrix(); in
s_ib_model and s_ib_3d_model, add explicit device presence to the
GPU_PARALLEL_LOOP clauses by including present='[gpu_ntrs, gpu_trs_v,
gpu_trs_n]' (in addition to existing copyin) to avoid relying on implicit
persistence; and narrow the 2D scan in s_ib_model by reusing the bounding-index
logic from s_ib_3d_model (compute corner_distance from stl_bounding_boxes and
call get_bounding_indices to set i/j limits) to avoid scanning the full grid for
sparse STL patches.
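The bounding-index narrowing recommended above can be sketched in a few lines. This is a minimal Python illustration of the assumed semantics of `get_bounding_indices` (the Fortran helper's actual interface may differ); the `x_cc` values and window arguments here are hypothetical.

```python
from bisect import bisect_left, bisect_right

def get_bounding_indices(lo, hi, cell_centers, il, ir):
    """Shrink an index window [il, ir] to the cells whose centers lie in
    [lo, hi]. cell_centers is assumed sorted ascending; the window is only
    ever narrowed, never widened, mirroring how the Fortran helper tightens
    the ghost-layer-padded loop bounds."""
    first = bisect_left(cell_centers, lo)       # first center >= lo
    last = bisect_right(cell_centers, hi) - 1   # last center <= hi
    return max(il, first), min(ir, last)

# Narrow a 1D scan to one patch's bounding interval, as suggested for s_ib_model.
x_cc = [i + 0.5 for i in range(100)]  # uniform cell centers 0.5 .. 99.5
il, ir = get_bounding_indices(2.2, 40.7, x_cc, 0, 99)
print(il, ir)  # -> 2 40
```

Applying this per axis turns a full-grid scan into a loop over only the cells that can intersect the patch's bounding box, which is where the 3D routine gets its speedup for sparse STL objects.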
| subroutine s_instantiate_STL_models() | ||
|
|
||
| ! Variables for IBM+STL | ||
| real(wp) :: normals(1:3) !< Boundary normal buffer | ||
| integer :: boundary_vertex_count, boundary_edge_count, total_vertices !< Boundary vertex | ||
| real(wp), allocatable, dimension(:, :, :) :: boundary_v !< Boundary vertex buffer | ||
| real(wp), allocatable, dimension(:, :) :: interpolated_boundary_v !< Interpolated vertex buffer | ||
| real(wp) :: dx_local, dy_local, dz_local !< Levelset distance buffer | ||
| logical :: interpolate !< Logical variable to determine whether or not the model should be interpolated | ||
|
|
||
| integer :: i, j, k !< Generic loop iterators | ||
| integer :: patch_id | ||
|
|
||
| type(t_bbox) :: bbox, bbox_old | ||
| type(t_model) :: model | ||
| type(ic_model_parameters) :: params | ||
|
|
||
| real(wp) :: eta | ||
| real(wp), dimension(1:3) :: point, model_center | ||
| real(wp) :: grid_mm(1:3, 1:2) | ||
|
|
||
| real(wp), dimension(1:4, 1:4) :: transform, transform_n | ||
|
|
||
| dx_local = minval(dx); dy_local = minval(dy) | ||
| if (p /= 0) dz_local = minval(dz) | ||
|
|
||
| do patch_id = 1, num_ibs | ||
| if (patch_ib(patch_id)%geometry == 5 .or. patch_ib(patch_id)%geometry == 12) then | ||
| allocate (models(patch_id)%model) | ||
| print *, " * Reading model: "//trim(patch_ib(patch_id)%model_filepath) | ||
|
|
||
| model = f_model_read(patch_ib(patch_id)%model_filepath) | ||
| params%scale(:) = patch_ib(patch_id)%model_scale(:) | ||
| params%translate(:) = patch_ib(patch_id)%model_translate(:) | ||
| params%rotate(:) = patch_ib(patch_id)%model_rotate(:) | ||
| params%spc = patch_ib(patch_id)%model_spc | ||
| params%threshold = patch_ib(patch_id)%model_threshold | ||
|
|
||
| if (f_approx_equal(dot_product(params%scale, params%scale), 0._wp)) then | ||
| params%scale(:) = 1._wp | ||
| end if | ||
|
|
||
| if (proc_rank == 0) then | ||
| print *, " * Transforming model." | ||
| end if | ||
|
|
||
| ! Get the model center before transforming the model | ||
| bbox_old = f_create_bbox(model) | ||
| model_center(1:3) = (bbox_old%min(1:3) + bbox_old%max(1:3))/2._wp | ||
|
|
||
| ! Compute the transform matrices for vertices and normals | ||
| transform = f_create_transform_matrix(params, model_center) | ||
| transform_n = f_create_transform_matrix(params) | ||
|
|
||
| call s_transform_model(model, transform, transform_n) | ||
|
|
||
| ! Recreate the bounding box after transformation | ||
| bbox = f_create_bbox(model) | ||
|
|
||
| ! Show the number of vertices in the original STL model | ||
| if (proc_rank == 0) then | ||
| print *, ' * Number of input model vertices:', 3*model%ntrs | ||
| end if | ||
|
|
||
| call f_check_boundary(model, boundary_v, boundary_vertex_count, boundary_edge_count) | ||
|
|
||
| ! Check if the model needs interpolation | ||
| if (p > 0) then | ||
| call f_check_interpolation_3D(model, (/dx_local, dy_local, dz_local/), interpolate) | ||
| else | ||
| call f_check_interpolation_2D(boundary_v, boundary_edge_count, (/dx_local, dy_local, 0._wp/), interpolate) | ||
| end if | ||
| interpolate = .false. | ||
|
|
||
| ! Show the number of edges and boundary edges in 2D STL models | ||
| if (proc_rank == 0 .and. p == 0) then | ||
| print *, ' * Number of 2D model boundary edges:', boundary_edge_count | ||
| end if | ||
|
|
||
| ! Interpolate the STL model along the edges (2D) and on triangle facets (3D) | ||
| if (interpolate) then | ||
| if (proc_rank == 0) then | ||
| print *, ' * Interpolating STL vertices.' | ||
| end if | ||
|
|
||
| if (p > 0) then | ||
| call f_interpolate_3D(model, (/dx, dy, dz/), interpolated_boundary_v, total_vertices) | ||
| else | ||
| call f_interpolate_2D(boundary_v, boundary_edge_count, (/dx, dy, dz/), interpolated_boundary_v, total_vertices) | ||
| end if | ||
|
|
||
| if (proc_rank == 0) then | ||
| print *, ' * Total number of interpolated boundary vertices:', total_vertices | ||
| end if | ||
| end if | ||
|
|
||
| if (proc_rank == 0) then | ||
| write (*, "(A, 3(2X, F20.10))") " > Model: Min:", bbox%min(1:3) | ||
| write (*, "(A, 3(2X, F20.10))") " > Cen:", (bbox%min(1:3) + bbox%max(1:3))/2._wp | ||
| write (*, "(A, 3(2X, F20.10))") " > Max:", bbox%max(1:3) | ||
|
|
||
| grid_mm(1, :) = (/minval(x_cc(0:m)) - 0.5_wp*dx_local, maxval(x_cc(0:m)) + 0.5_wp*dx_local/) | ||
| grid_mm(2, :) = (/minval(y_cc(0:n)) - 0.5_wp*dy_local, maxval(y_cc(0:n)) + 0.5_wp*dy_local/) | ||
|
|
||
| if (p > 0) then | ||
| grid_mm(3, :) = (/minval(z_cc(0:p)) - 0.5_wp*dz_local, maxval(z_cc(0:p)) + 0.5_wp*dz_local/) | ||
| else | ||
| grid_mm(3, :) = (/0._wp, 0._wp/) | ||
| end if | ||
|
|
||
| write (*, "(A, 3(2X, F20.10))") " > Domain: Min:", grid_mm(:, 1) | ||
| write (*, "(A, 3(2X, F20.10))") " > Cen:", (grid_mm(:, 1) + grid_mm(:, 2))/2._wp | ||
| write (*, "(A, 3(2X, F20.10))") " > Max:", grid_mm(:, 2) | ||
| end if | ||
|
|
||
| allocate (stl_bounding_boxes(patch_id, 1:3, 1:3)) | ||
| stl_bounding_boxes(patch_id, 1, 1:3) = [bbox%min(1), (bbox%min(1) + bbox%max(1))/2._wp, bbox%max(1)] | ||
| stl_bounding_boxes(patch_id, 2, 1:3) = [bbox%min(2), (bbox%min(2) + bbox%max(2))/2._wp, bbox%max(2)] | ||
| stl_bounding_boxes(patch_id, 3, 1:3) = [bbox%min(3), (bbox%min(3) + bbox%max(3))/2._wp, bbox%max(3)] | ||
|
|
||
| models(patch_id)%model = model | ||
| models(patch_id)%boundary_v = boundary_v | ||
| models(patch_id)%boundary_edge_count = boundary_edge_count | ||
| if (interpolate) then | ||
| models(patch_id)%interpolate = 1 | ||
| else | ||
| models(patch_id)%interpolate = 0 | ||
| end if | ||
| if (interpolate) then | ||
| models(patch_id)%interpolated_boundary_v = interpolated_boundary_v | ||
| models(patch_id)%total_vertices = total_vertices | ||
| end if | ||
|
|
||
| end if | ||
| end do | ||
|
|
||
| ! Pack and upload flat arrays for GPU (AFTER the loop) | ||
| block | ||
| integer :: pid, max_ntrs | ||
| integer :: max_bv1, max_bv2, max_bv3, max_iv1, max_iv2 | ||
|
|
||
| max_ntrs = 0 | ||
| max_bv1 = 0; max_bv2 = 0; max_bv3 = 0 | ||
| max_iv1 = 0; max_iv2 = 0 | ||
|
|
||
| do pid = 1, num_ibs | ||
| if (allocated(models(pid)%model)) then | ||
| call s_pack_model_for_gpu(models(pid)) | ||
| max_ntrs = max(max_ntrs, models(pid)%ntrs) | ||
| end if | ||
| if (allocated(models(pid)%boundary_v)) then | ||
| max_bv1 = max(max_bv1, size(models(pid)%boundary_v, 1)) | ||
| max_bv2 = max(max_bv2, size(models(pid)%boundary_v, 2)) | ||
| max_bv3 = max(max_bv3, size(models(pid)%boundary_v, 3)) | ||
| end if | ||
| if (allocated(models(pid)%interpolated_boundary_v)) then | ||
| max_iv1 = max(max_iv1, size(models(pid)%interpolated_boundary_v, 1)) | ||
| max_iv2 = max(max_iv2, size(models(pid)%interpolated_boundary_v, 2)) | ||
| end if | ||
| end do | ||
|
|
||
| if (max_ntrs > 0) then | ||
| allocate (gpu_ntrs(1:num_ibs)) | ||
| allocate (gpu_trs_v(1:3, 1:3, 1:max_ntrs, 1:num_ibs)) | ||
| allocate (gpu_trs_n(1:3, 1:max_ntrs, 1:num_ibs)) | ||
| allocate (gpu_interpolate(1:num_ibs)) | ||
| allocate (gpu_boundary_edge_count(1:num_ibs)) | ||
| allocate (gpu_total_vertices(1:num_ibs)) | ||
|
|
||
| gpu_ntrs = 0 | ||
| gpu_trs_v = 0._wp | ||
| gpu_trs_n = 0._wp | ||
| gpu_interpolate = 0 | ||
| gpu_boundary_edge_count = 0 | ||
| gpu_total_vertices = 0 | ||
|
|
||
| if (max_bv1 > 0) then | ||
| allocate (gpu_boundary_v(1:max_bv1, 1:max_bv2, 1:max_bv3, 1:num_ibs)) | ||
| gpu_boundary_v = 0._wp | ||
| end if | ||
|
|
||
| if (max_iv1 > 0) then | ||
| allocate (gpu_interpolated_boundary_v(1:max_iv1, 1:max_iv2, 1:num_ibs)) | ||
| gpu_interpolated_boundary_v = 0._wp | ||
| end if | ||
|
|
||
| do pid = 1, num_ibs | ||
| if (allocated(models(pid)%model)) then | ||
| gpu_ntrs(pid) = models(pid)%ntrs | ||
| gpu_trs_v(:, :, 1:models(pid)%ntrs, pid) = models(pid)%trs_v | ||
| gpu_trs_n(:, 1:models(pid)%ntrs, pid) = models(pid)%trs_n | ||
| gpu_interpolate(pid) = models(pid)%interpolate | ||
| gpu_boundary_edge_count(pid) = models(pid)%boundary_edge_count | ||
| gpu_total_vertices(pid) = models(pid)%total_vertices | ||
| end if | ||
| if (allocated(models(pid)%boundary_v)) then | ||
| gpu_boundary_v(1:size(models(pid)%boundary_v, 1), & | ||
| 1:size(models(pid)%boundary_v, 2), & | ||
| 1:size(models(pid)%boundary_v, 3), pid) = models(pid)%boundary_v | ||
| end if | ||
| if (allocated(models(pid)%interpolated_boundary_v)) then | ||
| gpu_interpolated_boundary_v(1:size(models(pid)%interpolated_boundary_v, 1), & | ||
| 1:size(models(pid)%interpolated_boundary_v, 2), pid) = models(pid)%interpolated_boundary_v | ||
| end if | ||
| end do | ||
|
|
||
| $:GPU_ENTER_DATA(copyin='[gpu_ntrs, gpu_trs_v, gpu_trs_n, gpu_interpolate, gpu_boundary_edge_count, gpu_total_vertices]') | ||
| if (allocated(gpu_boundary_v)) then | ||
| $:GPU_ENTER_DATA(copyin='[gpu_boundary_v]') | ||
| end if | ||
| if (allocated(gpu_interpolated_boundary_v)) then | ||
| $:GPU_ENTER_DATA(copyin='[gpu_interpolated_boundary_v]') | ||
| end if | ||
| end if | ||
| end block | ||
|
|
s_instantiate_STL_models has critical allocation/initialization issues breaking multi-model scenarios.
All four issues are confirmed:
1. `stl_bounding_boxes` allocated inside the loop: line 1069 calls `allocate (stl_bounding_boxes(patch_id, 1:3, 1:3))` within the `do patch_id = 1, num_ibs` loop. This fails on the second iteration: the first pass allocates shape (1,3,3), and the second attempts (2,3,3) on an already-allocated array. Allocate once before the loop with shape `(num_ibs, 1:3, 1:3)`.
2. `total_vertices` uninitialized when `interpolate` is false: the local variable `total_vertices` (line 958) is assigned only at lines 1040 or 1042 inside the `if (interpolate)` block, yet line 1084 unconditionally copies it into `models(patch_id)%total_vertices`, propagating garbage when interpolation is skipped. Initialize it, or copy only when `interpolate` is true.
3. Incorrect spacing vector in interpolation: lines 1040 and 1042 pass `(/dx, dy, dz/)` (the whole grid-spacing arrays, not scalar spacings), though this is currently masked by line 1037 forcing `interpolate = .false.`. If interpolation is re-enabled, use `(/dx_local, dy_local, dz_local/)`.
4. Missing allocation guards on `gpu_*` arrays: lines 1116-1121 allocate the module-level `gpu_*` arrays without checking whether they are already allocated, so calling `s_instantiate_STL_models` twice will fail. Add guards such as `if (.not. allocated(gpu_ntrs)) allocate (...)`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/simulation/m_ib_patches.fpp` around lines 954 - 1169,
s_instantiate_STL_models has four fixes: allocate stl_bounding_boxes once before
the patch loop with shape (num_ibs,1:3,1:3) instead of calling
allocate(stl_bounding_boxes(patch_id,...)) inside the loop; ensure
total_vertices is initialized or only copied into
models(patch_id)%total_vertices when interpolate is true (do not propagate
uninitialized total_vertices when interpolation is skipped); pass spacing
magnitudes to the interpolation routines by using (/dx_local, dy_local,
dz_local/) when calling f_interpolate_3D/f_interpolate_2D (and remove or fix the
unconditional interpolate = .false. that currently masks interpolation logic);
and protect module-level gpu_* allocations (gpu_ntrs, gpu_trs_v, gpu_trs_n,
gpu_interpolate, gpu_boundary_edge_count, gpu_total_vertices, gpu_boundary_v,
gpu_interpolated_boundary_v) with if (.not. allocated(...)) checks before each
allocate to allow repeated calls to s_instantiate_STL_models.
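The padded-packing and allocate-once pattern asked for above can be sketched in Python. This is an illustration only: `pack_models_for_gpu`, `_GPU_CACHE`, and the dict layout are assumptions, not the module's real API.

```python
_GPU_CACHE = {}  # stands in for the module-level allocatable gpu_* arrays

def pack_models_for_gpu(models):
    """Pad ragged per-patch triangle-normal lists to a common size, as the
    packing block does for gpu_trs_n, with allocate-once semantics on
    re-entry (the `if (.not. allocated(gpu_ntrs))` guard in Fortran).

    models maps patch id -> list of 3-component normals (ragged lengths).
    """
    if _GPU_CACHE:  # already "allocated": repeated calls are no-ops
        return _GPU_CACHE["gpu_trs_n"], _GPU_CACHE["gpu_ntrs"]

    max_ntrs = max((len(trs) for trs in models.values()), default=0)
    gpu_ntrs = {pid: len(trs) for pid, trs in models.items()}
    # Zero-pad every patch to max_ntrs so one dense buffer can hold them all.
    gpu_trs_n = {pid: trs + [[0.0, 0.0, 0.0]] * (max_ntrs - len(trs))
                 for pid, trs in models.items()}
    _GPU_CACHE["gpu_trs_n"], _GPU_CACHE["gpu_ntrs"] = gpu_trs_n, gpu_ntrs
    return gpu_trs_n, gpu_ntrs

models = {1: [[0.0, 0.0, 1.0]], 2: [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]}
trs_n, ntrs = pack_models_for_gpu(models)
print(ntrs[1], len(trs_n[1]))  # -> 1 2  (one real triangle, padded to max_ntrs)
```

Padding to the maximum triangle count trades a little memory for a rectangular layout that maps directly to a single device upload, which is the same trade-off the PR's flat `gpu_trs_v`/`gpu_trs_n` arrays make.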
| self.prohibit(ib and (num_ibs <= 0 or num_ibs > 1000), | ||
| "num_ibs must be between 1 and num_patches_max (10)") |
Fix outdated error message for num_ibs upper bound.
The validation allows up to 1000, but the message still says 10, which will mislead users. Consider deriving the bound from CONSTRAINTS to avoid future drift.
✅ Suggested fix
- self.prohibit(ib and (num_ibs <= 0 or num_ibs > 1000),
- "num_ibs must be between 1 and num_patches_max (10)")
+ max_ibs = CONSTRAINTS["num_ibs"]["max"]
+ self.prohibit(ib and (num_ibs <= 0 or num_ibs > max_ibs),
| + f"num_ibs must be between 1 and num_patches_max ({max_ibs})")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@toolchain/mfc/case_validator.py` around lines 687 - 688, The error message
passed to self.prohibit for the num_ibs check is stale (mentions 10) while the
actual upper bound is 1000; update the message to reflect the real max and, to
prevent future drift, construct the message using the authoritative constraint
value (e.g. the CONSTRAINTS constant or the class attribute used for bounds)
rather than a hardcoded literal—locate the self.prohibit call that checks
num_ibs (the line with self.prohibit(ib and (num_ibs <= 0 or num_ibs > 1000),
...)) and interpolate/format the upper-bound from CONSTRAINTS (or
self.CONSTRAINTS / CONSTRAINTS['num_ibs']['max'] as appropriate) so the message
reads "num_ibs must be between 1 and <max>".
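A constraints-driven variant of the check might look like the following sketch; `CONSTRAINTS` and `check_num_ibs` here are hypothetical stand-ins for the toolchain's actual table and `self.prohibit` call.

```python
# Hypothetical mirror of the toolchain's constraint table; the real
# CONSTRAINTS structure and prohibit() signature may differ.
CONSTRAINTS = {"num_ibs": {"min": 1, "max": 1000}}

def check_num_ibs(ib, num_ibs, errors):
    """Validate num_ibs against the table so the error message can never
    drift from the enforced bound, as the review recommends."""
    lo, hi = CONSTRAINTS["num_ibs"]["min"], CONSTRAINTS["num_ibs"]["max"]
    if ib and not (lo <= num_ibs <= hi):
        errors.append(f"num_ibs must be between {lo} and {hi}")

errors = []
check_num_ibs(True, 2000, errors)
print(errors)  # -> ['num_ibs must be between 1 and 1000']
```

Because both the comparison and the message read from the same entry, raising the limit again (as this PR does from 10 to 1000) only requires changing the table.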
Description
Following the refactor of the levelset code, several performance optimizations remained to be made. This PR introduces optimizations that make multi-particle MIBM viable, and it raises the upper bound on the allowed number of immersed boundaries to 1000. Performance was measured on 1-4 ranks of ACC GPU compute using A100 GPUs.
This PR also extends these optimizations to STL IBs, which are now fully capable of running on the GPU. An example is given below.
Type of change
Testing
All changes pass the IBM section of the test suite on GPUs with the NVHPC compiler. Performance was measured with a case of 1000 particles with viscosity enabled. The particles are all resolved 3D spheres placed at random, non-overlapping positions generated by the following case file:
These optimizations yield a nearly 1000x speedup in the moving IBM propagation and generation code. Prior to these optimizations, an NVIDIA Nsight profile of the benchmark case showed 45 seconds per RK substep:
Following these optimizations, the same profile shows roughly 50 ms per RK substep:

Checklist
See the developer guide for full coding standards.
GPU changes (expand if you modified `src/simulation/`)
Summary by CodeRabbit
Release Notes
New Features
Enhancements