Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
17 commits
Select commit Hold shift + click to select a range
0a304a2
Moved all of the IB marker calculation to the GPU without copy
danieljvickers Feb 16, 2026
e8c778e
Added profiling and increased maximum num IBs to 1000
danieljvickers Feb 16, 2026
5d885bb
Performance tuning complted for marker generation
danieljvickers Feb 16, 2026
85889c0
Bindary search for IB index region beginning for reduced IB marker co…
danieljvickers Feb 16, 2026
f0085c9
ghost points are now computed on the GPU
danieljvickers Feb 17, 2026
bfcc593
image points computed on the GPU for x4 performance in that subroutine
danieljvickers Feb 17, 2026
3c4b6dd
Merge branch 'master' into gpu-optimizations
danieljvickers Feb 17, 2026
5201ee7
Need WAY more parameters in the case file... We should probably do so…
danieljvickers Feb 17, 2026
622edb0
Extended the binary search reduction to all 3D IB geometries
danieljvickers Feb 17, 2026
0a089ce
Extended area reduction to all non-model IBs
danieljvickers Feb 17, 2026
804a286
Intermittent commit for GPU STLs
danieljvickers Feb 18, 2026
64bc348
Ib markers computed on GPU working
danieljvickers Feb 18, 2026
bc972ca
Passes STL tests with GPU compute for IB markers (not added levelset …
danieljvickers Feb 18, 2026
a1769d0
Moved mdoel-specific code to the model file for cleanliness
danieljvickers Feb 19, 2026
5e58655
STLs appear to be working on the GPU with NVHPC!
danieljvickers Feb 19, 2026
6cc7acc
STLs ran on GPU in 3D!
danieljvickers Feb 19, 2026
5fc31be
Merge branch 'master' into gpu-optimizations
danieljvickers Feb 20, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 10 additions & 0 deletions src/common/include/parallel_macros.fpp
Original file line number Diff line number Diff line change
Expand Up @@ -174,6 +174,16 @@
#endif
#:enddef

#:def END_GPU_ATOMIC_CAPTURE()
#:set acc_end_directive = '!$acc end atomic'
#:set omp_end_directive = '!$omp end atomic'
#if defined(MFC_OpenACC)
$:acc_end_directive
#elif defined(MFC_OpenMP)
$:omp_end_directive
#endif
#:enddef

#:def GPU_UPDATE(host=None, device=None, extraAccArgs=None, extraOmpArgs=None)
#:set acc_code = ACC_UPDATE(host=host, device=device, extraAccArgs=extraAccArgs)
#:set omp_code = OMP_UPDATE(host=host, device=device, extraOmpArgs=extraOmpArgs)
Expand Down
2 changes: 1 addition & 1 deletion src/common/m_constants.fpp
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ module m_constants
integer, parameter :: fourier_rings = 5 !< Fourier filter ring limit
integer, parameter :: num_fluids_max = 10 !< Maximum number of fluids in the simulation
integer, parameter :: num_probes_max = 10 !< Maximum number of flow probes in the simulation
integer, parameter :: num_patches_max = 10
integer, parameter :: num_patches_max = 1000
integer, parameter :: num_bc_patches_max = 10
integer, parameter :: pathlen_max = 400
integer, parameter :: nnode = 4 !< Number of QBMM nodes
Expand Down
8 changes: 7 additions & 1 deletion src/common/m_derived_types.fpp
Original file line number Diff line number Diff line change
Expand Up @@ -183,12 +183,18 @@ module m_derived_types
end type t_model

type :: t_model_array
! Original CPU-side fields (unchanged)
type(t_model), allocatable :: model
real(wp), allocatable, dimension(:, :, :) :: boundary_v
real(wp), allocatable, dimension(:, :) :: interpolated_boundary_v
integer :: boundary_edge_count
integer :: total_vertices
logical :: interpolate
integer :: interpolate

! GPU-friendly flattened arrays
integer :: ntrs ! copy of model%ntrs
real(wp), allocatable, dimension(:, :, :) :: trs_v ! (3, 3, ntrs) - triangle vertices
real(wp), allocatable, dimension(:, :) :: trs_n ! (3, ntrs) - triangle normals
end type t_model_array
Comment on lines 185 to 198
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, check the current state of the file at the specified lines
cat -n src/common/m_derived_types.fpp | sed -n '180,210p'

Repository: MFlowCode/MFC

Length of output: 1607


🏁 Script executed:

# Search for all uses of %interpolate across the codebase
rg -nP '%interpolate\b' src --type-add 'fpp:*.fpp' --type 'fpp'

Repository: MFlowCode/MFC

Length of output: 344


🏁 Script executed:

# Look for suspicious logical patterns with %interpolate
rg -nP '%interpolate\b.*(\.(true|false)|\.not\.)' src --type-add 'fpp:*.fpp' --type 'fpp'

Repository: MFlowCode/MFC

Length of output: 39


🏁 Script executed:

# Check if there are any conditional patterns that may need review
rg -nP 'if\s*\(.*%interpolate' src --type-add 'fpp:*.fpp' --type 'fpp' -A 2

Repository: MFlowCode/MFC

Length of output: 854


🏁 Script executed:

# Check for assignments to %interpolate in the codebase
rg -nP '%interpolate\s*=' src --type-add 'fpp:*.fpp' --type 'fpp' -B 1 -A 1

Repository: MFlowCode/MFC

Length of output: 429


t_model_array scalars must be initialized to prevent garbage values reaching GPU or uninitialized field reads.

All uses of %interpolate correctly employ integer semantics (0/1, no logical operators); however, the type definition still lacks default initialization for boundary_edge_count, total_vertices, interpolate, and ntrs. Uninitialized scalars can leak garbage into GPU staging arrays or cause undefined behavior if any code path reads these fields before assignment.

Apply default initialization
         integer :: boundary_edge_count = 0
         integer :: total_vertices = 0
         integer :: interpolate = 0

         integer :: ntrs = 0  ! copy of model%ntrs

Additionally, verify that changes do not break pre_process and post_process executables, not just simulation.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
type :: t_model_array
! Original CPU-side fields (unchanged)
type(t_model), allocatable :: model
real(wp), allocatable, dimension(:, :, :) :: boundary_v
real(wp), allocatable, dimension(:, :) :: interpolated_boundary_v
integer :: boundary_edge_count
integer :: total_vertices
logical :: interpolate
integer :: interpolate
! GPU-friendly flattened arrays
integer :: ntrs ! copy of model%ntrs
real(wp), allocatable, dimension(:, :, :) :: trs_v ! (3, 3, ntrs) - triangle vertices
real(wp), allocatable, dimension(:, :) :: trs_n ! (3, ntrs) - triangle normals
end type t_model_array
type :: t_model_array
! Original CPU-side fields (unchanged)
type(t_model), allocatable :: model
real(wp), allocatable, dimension(:, :, :) :: boundary_v
real(wp), allocatable, dimension(:, :) :: interpolated_boundary_v
integer :: boundary_edge_count = 0
integer :: total_vertices = 0
integer :: interpolate = 0
! GPU-friendly flattened arrays
integer :: ntrs = 0 ! copy of model%ntrs
real(wp), allocatable, dimension(:, :, :) :: trs_v ! (3, 3, ntrs) - triangle vertices
real(wp), allocatable, dimension(:, :) :: trs_n ! (3, ntrs) - triangle normals
end type t_model_array
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/common/m_derived_types.fpp` around lines 186 - 199, The t_model_array
derived type declares scalar fields boundary_edge_count, total_vertices,
interpolate and ntrs without defaults which can leave garbage values; add
default initializers for these scalars in the type definition (set
boundary_edge_count=0, total_vertices=0, interpolate=0, ntrs=0) so all instances
start deterministic before any assignment, and then run and verify pre_process,
post_process and simulation builds/tests to ensure the initialization change
doesn't break their workflows; update any constructors or code that relied on
uninitialized behavior if necessary.


!> Derived type adding initial condition (ic) patch parameters as attributes
Expand Down
2 changes: 2 additions & 0 deletions src/common/m_helper.fpp
Original file line number Diff line number Diff line change
Expand Up @@ -333,6 +333,8 @@ contains
!! @return The cross product of the two vectors.
pure function f_cross(a, b) result(c)

$:GPU_ROUTINE(parallelism='[seq]')

real(wp), dimension(3), intent(in) :: a, b
real(wp), dimension(3) :: c

Expand Down
197 changes: 181 additions & 16 deletions src/common/m_model.fpp
Original file line number Diff line number Diff line change
Expand Up @@ -18,12 +18,28 @@ module m_model

private

public :: f_model_read, s_model_write, s_model_free, f_model_is_inside
public :: f_model_read, s_model_write, s_model_free, f_model_is_inside, models, gpu_ntrs, &
gpu_trs_v, gpu_trs_n, gpu_boundary_v, gpu_interpolated_boundary_v, gpu_interpolate, gpu_boundary_edge_count, &
gpu_total_vertices, stl_bounding_boxes

! Subroutines for STL immersed boundaries
public :: f_check_boundary, f_register_edge, f_check_interpolation_2D, &
f_check_interpolation_3D, f_interpolate_2D, f_interpolate_3D, &
f_interpolated_distance, f_normals, f_distance, f_distance_normals_3D, f_tri_area
f_interpolated_distance, f_normals, f_distance, f_distance_normals_3D, f_tri_area, s_pack_model_for_gpu, &
f_model_is_inside_flat, f_distance_normals_3d_flat

!! array of STL models that can be allocated and then used in IB marker and levelset compute
type(t_model_array), allocatable, target :: models(:)
!! GPU-friendly flat arrays for STL model data
integer, allocatable :: gpu_ntrs(:)
real(wp), allocatable, dimension(:, :, :, :) :: gpu_trs_v
real(wp), allocatable, dimension(:, :, :) :: gpu_trs_n
real(wp), allocatable, dimension(:, :, :, :) :: gpu_boundary_v
real(wp), allocatable, dimension(:, :, :) :: gpu_interpolated_boundary_v
integer, allocatable :: gpu_interpolate(:)
integer, allocatable :: gpu_boundary_edge_count(:)
integer, allocatable :: gpu_total_vertices(:)
real(wp), allocatable :: stl_bounding_boxes(:, :, :)

Comment on lines +21 to 43
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find allocation and deallocation sites for the new module variables
echo "=== Allocation/Deallocation patterns ==="
rg -nP --type fpp '(allocate|deallocate)\s*\(' src/common/m_model.fpp | head -30

echo -e "\n=== GPU array allocations in entire src/common ==="
rg -nP --type fpp 'gpu_ntrs|gpu_trs_v|gpu_trs_n|gpu_boundary_v|gpu_interpolated_boundary_v|gpu_interpolate|gpu_boundary_edge_count|gpu_total_vertices|stl_bounding_boxes' src/common/ | grep -i allocate | head -20

echo -e "\n=== Where m_model is imported ==="
rg -n 'use.*m_model' src/ | head -20

Repository: MFlowCode/MFC

Length of output: 622


🏁 Script executed:

#!/bin/bash
# Check the finalization routine referenced in the public interface
echo "=== s_model_free subroutine definition ==="
rg -A 50 'subroutine\s+s_model_free' src/common/m_model.fpp | head -60

echo -e "\n=== s_model_write subroutine definition ==="
rg -A 30 'subroutine\s+s_model_write' src/common/m_model.fpp | head -40

Repository: MFlowCode/MFC

Length of output: 2632


🏁 Script executed:

#!/bin/bash
# Check if pre_process or post_process reference the new public variables or m_model
echo "=== pre_process usage of m_model ==="
rg -n 'gpu_ntrs|gpu_trs_v|gpu_trs_n|gpu_boundary_v|gpu_interpolated_boundary_v|stl_bounding_boxes|models' src/pre_process/ 2>/dev/null | head -10

echo -e "\n=== post_process usage of m_model ==="
rg -n 'gpu_ntrs|gpu_trs_v|gpu_trs_n|gpu_boundary_v|gpu_interpolated_boundary_v|stl_bounding_boxes|models' src/post_process/ 2>/dev/null | head -10

echo -e "\n=== Check if pre_process or post_process import m_model ==="
rg -n 'use.*m_model' src/pre_process/ src/post_process/ 2>/dev/null

Repository: MFlowCode/MFC

Length of output: 518


🏁 Script executed:

#!/bin/bash
# Get the full m_model.fpp file to understand context
wc -l src/common/m_model.fpp

Repository: MFlowCode/MFC

Length of output: 84


🏁 Script executed:

#!/bin/bash
# Read the module to find allocation patterns
# Focus on where the public arrays are allocated
sed -n '20,100p' src/common/m_model.fpp | cat -n

Repository: MFlowCode/MFC

Length of output: 3661


🏁 Script executed:

#!/bin/bash
# Search for allocation patterns - try broader search
rg -n 'allocate' src/common/m_model.fpp | head -30

Repository: MFlowCode/MFC

Length of output: 786


🏁 Script executed:

#!/bin/bash
# Look for initialization subroutines that might allocate these arrays
rg -n 'subroutine.*init\|subroutine.*allocate\|subroutine.*setup' src/common/m_model.fpp | head -20

Repository: MFlowCode/MFC

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Search for where models(:) is allocated
rg -B 3 -A 3 'models\s*\(' src/common/m_model.fpp | head -50

Repository: MFlowCode/MFC

Length of output: 430


🏁 Script executed:

#!/bin/bash
# Look for gpu_ntrs allocation pattern
rg -B 3 -A 3 'gpu_ntrs\s*\(' src/common/m_model.fpp | head -50

Repository: MFlowCode/MFC

Length of output: 490


🏁 Script executed:

#!/bin/bash
# Check how these are used in simulation to understand lifecycle
echo "=== Usage in m_ibm.fpp ==="
rg -B 2 -A 2 'gpu_ntrs\|gpu_trs_v\|gpu_boundary_v\|models\(' src/simulation/m_ibm.fpp | head -40

echo -e "\n=== Usage in pre_process/m_icpp_patches.fpp ==="
rg -B 2 -A 2 'models\(' src/pre_process/m_icpp_patches.fpp | head -40

Repository: MFlowCode/MFC

Length of output: 132


🏁 Script executed:

#!/bin/bash
# Search for allocation of the MODULE-LEVEL arrays in calling modules
echo "=== Search for allocate of gpu_ntrs in entire codebase ==="
rg 'allocate.*gpu_ntrs\|gpu_ntrs.*allocate' src/ | head -20

echo -e "\n=== Search for s_pack_model_for_gpu ==="
rg -n 's_pack_model_for_gpu' src/ | head -20

echo -e "\n=== Look for where gpu arrays are assigned/populated ==="
rg -n 'gpu_ntrs\s*=' src/ | head -20

Repository: MFlowCode/MFC

Length of output: 591


🏁 Script executed:

#!/bin/bash
# Search for deallocate of module-level arrays
echo "=== Deallocate of module-level gpu arrays ==="
rg 'deallocate.*gpu_ntrs\|deallocate.*gpu_trs_v\|deallocate.*models\(' src/ | head -20

echo -e "\n=== Look for finalization in m_ibm.fpp ==="
rg -n 'finalize\|cleanup\|deallocate' src/simulation/m_ibm.fpp | head -20

Repository: MFlowCode/MFC

Length of output: 146


🏁 Script executed:

#!/bin/bash
# Look at s_pack_model_for_gpu implementation
echo "=== s_pack_model_for_gpu subroutine ==="
rg -A 80 'subroutine\s+s_pack_model_for_gpu' src/common/m_model.fpp | head -100

Repository: MFlowCode/MFC

Length of output: 519


🏁 Script executed:

#!/bin/bash
# Final verification: confirm module-level arrays have no lifecycle
echo "=== Complete scan for any allocation pattern ==="
rg 'allocate\s*\(' src/common/m_model.fpp src/simulation/m_ib_patches.fpp src/simulation/m_ibm.fpp | grep -E 'gpu_ntrs|gpu_trs_v|gpu_boundary_v|models\s*\(' || echo "NO ALLOCATIONS FOUND for public module arrays"

echo -e "\n=== Confirm these are truly module-level (public scope) ==="
rg -B 1 'integer.*allocatable.*gpu_ntrs|real.*allocatable.*gpu_trs' src/common/m_model.fpp

Repository: MFlowCode/MFC

Length of output: 763


🏁 Script executed:

#!/bin/bash
# Search for deallocation of these specific arrays
echo "=== Deallocate patterns for public module arrays ==="
rg -n 'deallocate.*gpu_ntrs\|deallocate.*gpu_trs_v\|deallocate.*gpu_boundary_v\|deallocate.*models' src/ | head -20

echo -e "\n=== Broader search for deallocate in m_ib_patches.fpp ==="
rg -n 'deallocate' src/simulation/m_ib_patches.fpp | head -20

echo -e "\n=== Context around gpu array allocations in m_ib_patches.fpp ==="
rg -B 5 -A 10 'allocate.*gpu_ntrs' src/simulation/m_ib_patches.fpp | head -40

Repository: MFlowCode/MFC

Length of output: 890


🏁 Script executed:

#!/bin/bash
# Check what pre_process actually uses from m_model
echo "=== What does pre_process use from m_model? ==="
sed -n '1,50p' src/pre_process/m_icpp_patches.fpp | grep -A 15 'use m_model'

echo -e "\n=== Any direct usage of gpu_* or models arrays in pre_process? ==="
rg -n 'gpu_ntrs|gpu_trs_v|gpu_boundary_v|models\(' src/pre_process/m_icpp_patches.fpp | head -10

Repository: MFlowCode/MFC

Length of output: 553


Add deallocate() calls for all public module arrays in a finalization routine, and remove unnecessary m_model import from pre_process.

The public module arrays (gpu_ntrs, gpu_trs_v, gpu_trs_n, gpu_interpolate, gpu_boundary_edge_count, gpu_total_vertices) are allocated in m_ib_patches.fpp but never deallocated, causing GPU memory leaks. Create a finalization subroutine in m_model.fpp to deallocate these arrays and call it during simulation cleanup. Additionally, src/pre_process/m_icpp_patches.fpp imports m_model but does not use any of the gpu or models arrays—remove this unnecessary coupling per src/common/ guidelines.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/common/m_model.fpp` around lines 20 - 42, Add a module finalizer in
m_model.fpp (e.g., subroutine s_finalize_model or a MODULE PROCEDURE bound to
final) that checks ALLOCATED(...) and calls DEALLOCATE(...) for all public
allocatable arrays gpu_ntrs, gpu_trs_v, gpu_trs_n, gpu_boundary_v,
gpu_interpolated_boundary_v, gpu_interpolate, gpu_boundary_edge_count,
gpu_total_vertices, stl_bounding_boxes and models to ensure GPU/host memory is
freed during simulation cleanup; export or document the finalizer so the cleanup
path in your driver or cleanup routine calls it. Also remove the unnecessary
"use m_model" import from src/pre_process/m_icpp_patches.fpp (or the specific
pre_process file mentioned) since it does not reference the gpu_* arrays or
models, eliminating the unwanted coupling.

contains

Expand Down Expand Up @@ -481,6 +497,23 @@ contains
is_buffered = .true.
end subroutine s_skip_ignored_lines

!> This function is used to replace the fortran random number
!! generator because the native generator is not compatible being called
!! from GPU routines/functions
function f_model_random_number(seed) result(rval)

$:GPU_ROUTINE(parallelism='[seq]')

integer, intent(inout) :: seed
real(wp) :: rval

seed = ieor(seed, ishft(seed, 13))
seed = ieor(seed, ishft(seed, -17))
seed = ieor(seed, ishft(seed, 5))

rval = abs(real(seed, wp))/real(huge(seed), wp)
end function f_model_random_number

!> This procedure, recursively, finds whether a point is inside an octree.
!! @param model Model to search in.
!! @param point Point to test.
Expand All @@ -495,58 +528,116 @@ contains
real(wp), dimension(1:3), intent(in) :: point
real(wp), dimension(1:3), intent(in) :: spacing
integer, intent(in) :: spc
real(wp) :: phi, theta
integer :: rand_seed

real(wp) :: fraction

type(t_ray) :: ray
integer :: i, j, nInOrOut, nHits
integer :: i, j, k, nInOrOut, nHits

real(wp), dimension(1:spc, 1:3) :: ray_origins, ray_dirs

! TODO :: The random number generation prohibits GPU compute due to the subroutine not being able to be called in kernels
! This should be swapped out with something that allows GPU compute. I recommend the fibonacci sphere:
! do i = 1, spc
! phi = acos(1.0 - 2.0*(i-1.0)/(spc-1.0))
! theta = pi * (1.0 + sqrt(5.0)) * (i-1.0)
! ray_dirs(i,:) = [cos(theta)*sin(phi), sin(theta)*sin(phi), cos(phi)]
! ray_origins(i,:) = point
! end do
rand_seed = int(point(1)*73856093_wp) + &
int(point(2)*19349663_wp) + &
int(point(3)*83492791_wp)
Comment on lines +541 to +543
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Lint failure: _wp suffix on integer hash constants loses precision in single-precision builds.

The pipeline flags 73856093_wp, 19349663_wp, 83492791_wp. Using _wp makes these real literals, but these primes exceed the single-precision mantissa (~8.4M), so they'll be silently truncated when wp is single precision. Use explicit conversion instead.

Proposed fix
-        rand_seed = int(point(1)*73856093_wp) + &
-                    int(point(2)*19349663_wp) + &
-                    int(point(3)*83492791_wp)
+        rand_seed = int(point(1)*real(73856093, wp)) + &
+                    int(point(2)*real(19349663, wp)) + &
+                    int(point(3)*real(83492791, wp))

Apply the same fix to lines 599–601 in f_model_is_inside_flat.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
rand_seed = int(point(1)*73856093_wp) + &
int(point(2)*19349663_wp) + &
int(point(3)*83492791_wp)
rand_seed = int(point(1)*real(73856093, wp)) + &
int(point(2)*real(19349663, wp)) + &
int(point(3)*real(83492791, wp))
🧰 Tools
🪛 GitHub Actions: Lint Source

[error] 541-541: Pattern match found: '73856093_wp'


[error] 542-542: Pattern match found: '19349663_wp'


[error] 543-543: Pattern match found: '83492791_wp'

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/common/m_model.fpp` around lines 541 - 543, The hash constants are
written with the `_wp` suffix (e.g. 73856093_wp) which creates real literals
that will be rounded in single-precision builds; change them to use explicit
conversion from integer to the working kind, e.g. replace occurrences like
int(point(1)*73856093_wp) with int(point(1)*real(73856093, wp)) (or equivalently
real(73856093, wp) wrapped as needed) in the rand_seed computation in m_model
(the lines setting rand_seed) and make the identical change in
f_model_is_inside_flat (the three constants on lines ~599–601) so the integer
constants are converted explicitly to kind wp instead of being written as real
literals.

if (rand_seed == 0) rand_seed = 1

! generate our random collection or rays
do i = 1, spc
call random_number(ray_origins(i, :))
ray_origins(i, :) = point + (ray_origins(i, :) - 0.5_wp)*spacing(:)

call random_number(ray_dirs(i, :))
ray_dirs(i, :) = ray_dirs(i, :) - 0.5_wp
do k = 1, 3
! random jitter in the origin helps us estimate volume fraction instead of only at the cell center
ray_origins(i, k) = point(k) + (f_model_random_number(rand_seed) - 0.5_wp)*spacing(k)
! cast sample rays in all directions
ray_dirs(i, k) = point(k) + f_model_random_number(rand_seed) - 0.5_wp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Ray directions should be random around zero; adding point(k) skews them toward the position vector and biases inside/outside tests. Use only the random offset for directions.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/common/m_model.fpp, line 550:

<comment>Ray directions should be random around zero; adding point(k) skews them toward the position vector and biases inside/outside tests. Use only the random offset for directions.</comment>

<file context>
@@ -493,58 +526,116 @@ contains
+                ! random jitter in the origin helps us estimate volume fraction instead of only at the cell center
+                ray_origins(i, k) = point(k) + (f_model_random_number(rand_seed) - 0.5_wp)*spacing(k)
+                ! cast sample rays in all directions
+                ray_dirs(i, k) = point(k) + f_model_random_number(rand_seed) - 0.5_wp
+            end do
             ray_dirs(i, :) = ray_dirs(i, :)/sqrt(sum(ray_dirs(i, :)*ray_dirs(i, :)))
</file context>
Suggested change
ray_dirs(i, k) = point(k) + f_model_random_number(rand_seed) - 0.5_wp
ray_dirs(i, k) = f_model_random_number(rand_seed) - 0.5_wp

end do
ray_dirs(i, :) = ray_dirs(i, :)/sqrt(sum(ray_dirs(i, :)*ray_dirs(i, :)))
end do

! ray trace
nInOrOut = 0
do i = 1, spc
ray%o = ray_origins(i, :)
ray%d = ray_dirs(i, :)

nHits = 0
do j = 1, model%ntrs
! count the number of triangles this ray intersects
if (f_intersects_triangle(ray, model%trs(j))) then
nHits = nHits + 1
end if
end do

! if the ray hits an odd number of triangles on its way out, then
! it must be on the inside of the model
nInOrOut = nInOrOut + mod(nHits, 2)
end do

fraction = real(nInOrOut)/real(spc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

real() without kind parameter — use real(..., wp) for precision consistency.

real(nInOrOut)/real(spc) uses default real kind, which may differ from wp. Same issue at line 629 in f_model_is_inside_flat. As per coding guidelines: avoid implicit precision; use real(nInOrOut, wp)/real(spc, wp).

Proposed fix (line 576)
-        fraction = real(nInOrOut)/real(spc)
+        fraction = real(nInOrOut, wp)/real(spc, wp)
Proposed fix (line 629)
-        fraction = real(nInOrOut)/real(spc)
+        fraction = real(nInOrOut, wp)/real(spc, wp)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
fraction = real(nInOrOut)/real(spc)
fraction = real(nInOrOut, wp)/real(spc, wp)
Suggested change
fraction = real(nInOrOut)/real(spc)
fraction = real(nInOrOut, wp)/real(spc, wp)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/common/m_model.fpp` at line 576, The expression computing fraction uses
default real kind; update the conversion to enforce working precision by
replacing real(nInOrOut)/real(spc) with real(nInOrOut, wp)/real(spc, wp) and
apply the same change in the f_model_is_inside_flat occurrence (line referencing
spc and nInOrOut) so both numerator and denominator use the wp kind
consistently.


end function f_model_is_inside

impure function f_model_is_inside_flat(ntrs, trs_v, trs_n, pid, point, spacing, spc) result(fraction)

$:GPU_ROUTINE(parallelism='[seq]')

integer, intent(in) :: ntrs
real(wp), dimension(:, :, :, :), intent(in) :: trs_v
real(wp), dimension(:, :, :), intent(in) :: trs_n
integer, intent(in) :: pid
real(wp), dimension(1:3), intent(in) :: point
real(wp), dimension(1:3), intent(in) :: spacing
integer, intent(in) :: spc

real(wp) :: fraction
real(wp) :: origin(1:3), dir(1:3), dir_mag
type(t_ray) :: ray
type(t_triangle) :: tri
integer :: i, j, k, nInOrOut, nHits
integer :: rand_seed

rand_seed = int(point(1)*73856093_wp) + &
int(point(2)*19349663_wp) + &
int(point(3)*83492791_wp)
if (rand_seed == 0) rand_seed = 1

! generate our random collection of rays
nInOrOut = 0
do i = 1, spc
! Generate one ray at a time — no arrays needed
do k = 1, 3
origin(k) = point(k) + (f_model_random_number(rand_seed) - 0.5_wp)*spacing(k)
dir(k) = point(k) + f_model_random_number(rand_seed) - 0.5_wp
end do
dir_mag = sqrt(dir(1)*dir(1) + dir(2)*dir(2) + dir(3)*dir(3))
dir(:) = dir(:)/dir_mag
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Guard the normalization of the ray direction in the flat "inside" check to avoid potential division by zero in degenerate cases. [custom_rule]

Severity Level: Minor ⚠️

Suggested change
dir(:) = dir(:)/dir_mag
if (dir_mag > 0._wp) then
dir(:) = dir(:)/dir_mag
else
dir(:) = 0._wp
end if
Why it matters? ⭐

Normalizing a zero-length direction vector can produce a division-by-zero or NaNs; guarding with a magnitude check directly fixes a correctness/numerical stability issue (priority #1 in the MFC rules). The suggested conditional is syntactically valid Fortran and directly prevents the unsafe division.

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** src/common/m_model.fpp
**Line:** 613:613
**Comment:**
	*Custom Rule: Guard the normalization of the ray direction in the flat "inside" check to avoid potential division by zero in degenerate cases.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
👍 | 👎


ray%o = origin
ray%d = dir

nHits = 0
do j = 1, ntrs
tri%v(:, :) = trs_v(:, :, j, pid)
tri%n(:) = trs_n(:, j, pid)
if (f_intersects_triangle(ray, tri)) then
nHits = nHits + 1
end if
end do
nInOrOut = nInOrOut + mod(nHits, 2)
end do

fraction = real(nInOrOut)/real(spc)
end function f_model_is_inside_flat

! From https://www.scratchapixel.com/lessons/3e-basic-rendering/ray-tracing-rendering-a-triangle/ray-triangle-intersection-geometric-solution.html
!> This procedure checks if a ray intersects a triangle.
!! @param ray Ray.
!! @param triangle Triangle.
!! @return True if the ray intersects the triangle, false otherwise.
elemental function f_intersects_triangle(ray, triangle) result(intersects)

$:GPU_ROUTINE(parallelism='[seq]')

type(t_ray), intent(in) :: ray
type(t_triangle), intent(in) :: triangle

Expand Down Expand Up @@ -1109,6 +1200,66 @@ contains

end subroutine f_distance_normals_3D

subroutine f_distance_normals_3D_flat(ntrs, trs_v, trs_n, pid, point, normals, distance)

$:GPU_ROUTINE(parallelism='[seq]')

integer, intent(in) :: ntrs
real(wp), dimension(:, :, :, :), intent(in) :: trs_v
real(wp), dimension(:, :, :), intent(in) :: trs_n
integer, intent(in) :: pid
real(wp), dimension(1:3), intent(in) :: point
real(wp), dimension(1:3), intent(out) :: normals
real(wp), intent(out) :: distance

real(wp), dimension(1:3, 1:3) :: tri
real(wp) :: dist_min, dist_t_min
real(wp) :: dist_min_normal, dist_buffer_normal
real(wp), dimension(1:3) :: midp
real(wp), dimension(1:3) :: dist_buffer
integer :: i, j, tri_idx

dist_min = 1.e12_wp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: In f_distance_normals_3D_flat, tri_idx remains zero when ntrs is zero, leading to an out-of-bounds access trs_n(:, tri_idx, pid) and undefined behavior; you should explicitly handle the ntrs <= 0 case before the loop by returning a safe default normal and distance. [off-by-one]

Severity Level: Major ⚠️
- ❌ 3D flat distance/normals routine may read out-of-bounds.
- ⚠️ GPU levelset computations for STL IBs may crash.
Suggested change
dist_min = 1.e12_wp
! Handle degenerate case with no triangles to avoid invalid indexing
if (ntrs <= 0) then
normals = 0._wp
distance = 0._wp
return
end if
Steps of Reproduction ✅
1. Build the code including `module m_model` in `src/common/m_model.fpp`, ensuring
bounds-checking is enabled (e.g., `-Mbounds` with NVHPC or `-check bounds` with ifort).

2. From any caller (e.g., a level-set or IBM GPU packing routine) invoke
`f_distance_normals_3D_flat` (defined at lines 1203–1261 in `src/common/m_model.fpp`) with
`ntrs = 0`, and with `trs_v`/`trs_n` passed from a `t_model_array` whose `trs_v`/`trs_n`
were allocated via `s_pack_model_for_gpu` at lines 1395–1406 (which allocates them with
lower bound 1 in the triangle dimension).

3. When `f_distance_normals_3D_flat` executes, the `do i = 1, ntrs` loop (lines 1215–1236)
is skipped because `ntrs = 0`, leaving `tri_idx` at its initialized value of 0.

4. The routine then executes the assignments to `normals` at lines 1256–1258, accessing
`trs_n(:, tri_idx, pid)` with `tri_idx = 0`; since `trs_n` is indexed from 1, this is an
out-of-bounds access that will either trigger a runtime bounds error (with checking on) or
cause undefined behavior/incorrect normals on CPU or GPU.
Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** src/common/m_model.fpp
**Line:** 1222:1222
**Comment:**
	*Off By One: In `f_distance_normals_3D_flat`, `tri_idx` remains zero when `ntrs` is zero, leading to an out-of-bounds access `trs_n(:, tri_idx, pid)` and undefined behavior; you should explicitly handle the `ntrs <= 0` case before the loop by returning a safe default normal and distance.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
👍 | 👎

dist_min_normal = 1.e12_wp
distance = 0._wp

tri_idx = 0
do i = 1, ntrs
do j = 1, 3
tri(j, 1) = trs_v(j, 1, i, pid)
tri(j, 2) = trs_v(j, 2, i, pid)
tri(j, 3) = trs_v(j, 3, i, pid)
dist_buffer(j) = sqrt((point(1) - tri(j, 1))**2 + &
(point(2) - tri(j, 2))**2 + &
(point(3) - tri(j, 3))**2)
end do

do j = 1, 3
midp(j) = (tri(1, j) + tri(2, j) + tri(3, j))/3
end do

dist_t_min = minval(dist_buffer(1:3))
dist_buffer_normal = sqrt((point(1) - midp(1))**2 + &
(point(2) - midp(2))**2 + &
(point(3) - midp(3))**2)

if (dist_t_min < dist_min) then
dist_min = dist_t_min
end if

if (dist_buffer_normal < dist_min_normal) then
dist_min_normal = dist_buffer_normal
tri_idx = i
end if
end do

normals(1) = trs_n(1, tri_idx, pid)
normals(2) = trs_n(2, tri_idx, pid)
normals(3) = trs_n(3, tri_idx, pid)
distance = dist_min

end subroutine f_distance_normals_3D_flat
Comment on lines +1203 to +1261
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

rg -nP --type-add 'fpp:*.fpp' --type 'fpp' '\bf_distance_normals_3D_flat\s*\(' src

Repository: MFlowCode/MFC

Length of output: 345


🏁 Script executed:

sed -n '680,700p' src/simulation/m_compute_levelset.fpp

Repository: MFlowCode/MFC

Length of output: 913


🏁 Script executed:

rg -nP 'gpu_ntrs' src/simulation/m_compute_levelset.fpp | head -20

Repository: MFlowCode/MFC

Length of output: 186


🏁 Script executed:

rg -nP '\bgpu_ntrs\b' src --type-add 'fpp:*.fpp' | grep -E '(allocate|dimension|intent|gpu_ntrs\s*=)'

Repository: MFlowCode/MFC

Length of output: 206


🏁 Script executed:

sed -n '1110,1160p' src/simulation/m_ib_patches.fpp

Repository: MFlowCode/MFC

Length of output: 2729


🏁 Script executed:

sed -n '650,710p' src/simulation/m_compute_levelset.fpp

Repository: MFlowCode/MFC

Length of output: 2630


Add guard in f_distance_normals_3D_flat for ntrs <= 0 to prevent out-of-bounds access.

When ntrs is 0 (which can occur if a patch has no model allocated), tri_idx remains uninitialized at 0 and is used to index trs_n(:, tri_idx, pid) at the end, causing an out-of-bounds read. The caller at src/simulation/m_compute_levelset.fpp:688 does not validate gpu_ntrs(patch_id) > 0 before invoking this function.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/common/m_model.fpp` around lines 1209 - 1267, The subroutine
f_distance_normals_3D_flat can index trs_n(:, tri_idx, pid) with tri_idx==0 when
ntrs <= 0; add a guard at the top of f_distance_normals_3D_flat that checks if
ntrs <= 0 and in that case set distance to a safe value (e.g. a large positive
sentinel or 0._wp as appropriate) and set normals to a defined default (e.g.
zero vector) and return immediately to avoid using tri_idx/uninitialized
indices; ensure you reference and update tri_idx, distance and normals in this
early-return path so no out-of-bounds access to trs_n occurs.


!> This procedure determines the levelset distance of 2D models without interpolation.
!! @param boundary_v Group of all the boundary vertices of the 2D model without interpolation
!! @param boundary_edge_count Output the total number of boundary edges
Expand Down Expand Up @@ -1240,4 +1391,18 @@ contains

end function f_interpolated_distance

subroutine s_pack_model_for_gpu(ma)
type(t_model_array), intent(inout) :: ma
integer :: i

ma%ntrs = ma%model%ntrs
allocate (ma%trs_v(1:3, 1:3, 1:ma%ntrs))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Safely deallocate any existing packed triangle buffers before reallocating them to prevent memory leaks when repacking models for the GPU. [custom_rule]

Severity Level: Minor ⚠️

Suggested change
allocate (ma%trs_v(1:3, 1:3, 1:ma%ntrs))
if (allocated(ma%trs_v)) deallocate(ma%trs_v)
if (allocated(ma%trs_n)) deallocate(ma%trs_n)
Why it matters? ⭐

The PR added allocation of packed triangle buffers but did not guard against repeated calls to s_pack_model_for_gpu leaving previously allocated arrays attached — this directly violates the MFC memory rule about matching ALLOCATE/DEALLOCATE pairing. The proposed change (deallocate if allocated before reallocate) is a minimal, correct fix that prevents leaks and addresses the custom rule about paired allocation/deallocation.

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** src/common/m_model.fpp
**Line:** 1399:1399
**Comment:**
	*Custom Rule: Safely deallocate any existing packed triangle buffers before reallocating them to prevent memory leaks when repacking models for the GPU.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
👍 | 👎

allocate (ma%trs_n(1:3, 1:ma%ntrs))

do i = 1, ma%ntrs
ma%trs_v(:, :, i) = ma%model%trs(i)%v(:, :)
ma%trs_n(:, i) = ma%model%trs(i)%n(:)
end do
end subroutine

Comment on lines +1394 to +1407
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find all call sites to s_pack_model_for_gpu
rg -nP --type-add 'fpp:*.fpp' --type 'fpp' '\bs_pack_model_for_gpu\s*\(' src || true

Repository: MFlowCode/MFC

Length of output: 220


🏁 Script executed:

#!/bin/bash
# Also search for the subroutine definition and nearby code
rg -nP --type-add 'fpp:*.fpp' --type 'fpp' -B 5 -A 20 'subroutine s_pack_model_for_gpu' src || true

Repository: MFlowCode/MFC

Length of output: 1111


🏁 Script executed:

#!/bin/bash
# Check if ma%trs_v or ma%trs_n are referenced elsewhere to understand lifecycle
rg -nP --type-add 'fpp:*.fpp' --type 'fpp' 'ma%trs_v|ma%trs_n' src || true

Repository: MFlowCode/MFC

Length of output: 367


🏁 Script executed:

#!/bin/bash
# Get context around the call site in m_ib_patches.fpp
sed -n '1080,1120p' src/simulation/m_ib_patches.fpp

Repository: MFlowCode/MFC

Length of output: 1850


🏁 Script executed:

#!/bin/bash
# Search for surrounding loop context (broader range)
sed -n '1050,1130p' src/simulation/m_ib_patches.fpp

Repository: MFlowCode/MFC

Length of output: 4004


🏁 Script executed:

#!/bin/bash
# Check if models is deallocated/re-initialized somewhere
rg -nP --type-add 'fpp:*.fpp' --type 'fpp' 'models\s*\(|deallocate.*models' src/simulation/ | head -30

Repository: MFlowCode/MFC

Length of output: 3130


🏁 Script executed:

#!/bin/bash
# Check if s_instantiate_STL_models or models are re-initialized in any time-stepping/simulation loop
rg -nP --type-add 'fpp:*.fpp' --type 'fpp' 's_instantiate_STL_models|allocate.*models\(' src/simulation/ | grep -v '\.fpp-' | head -20

Repository: MFlowCode/MFC

Length of output: 1135


🏁 Script executed:

#!/bin/bash
# Search for any re-initialization or deallocate of models during simulation
rg -nP --type-add 'fpp:*.fpp' --type 'fpp' 'deallocate.*models|models.*=|trs_v.*deallocate|trs_n.*deallocate' src | head -20

Repository: MFlowCode/MFC

Length of output: 1318


Make s_pack_model_for_gpu re-entrant by checking allocations before allocating.

The function is called only once per object during initialization (in s_instantiate_STL_models), so there is currently no runtime risk. However, unconditional allocations of ma%trs_v and ma%trs_n violate defensive programming practices and would fail silently if the function were ever called again on the same object. Add pre-allocation checks:

Proposed fix
    ma%ntrs = ma%model%ntrs
+   if (allocated(ma%trs_v)) deallocate(ma%trs_v)
+   if (allocated(ma%trs_n)) deallocate(ma%trs_n)
    allocate (ma%trs_v(1:3, 1:3, 1:ma%ntrs))
    allocate (ma%trs_n(1:3, 1:ma%ntrs))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
subroutine s_pack_model_for_gpu(ma)
type(t_model_array), intent(inout) :: ma
integer :: i
ma%ntrs = ma%model%ntrs
allocate (ma%trs_v(1:3, 1:3, 1:ma%ntrs))
allocate (ma%trs_n(1:3, 1:ma%ntrs))
do i = 1, ma%ntrs
ma%trs_v(:, :, i) = ma%model%trs(i)%v(:, :)
ma%trs_n(:, i) = ma%model%trs(i)%n(:)
end do
end subroutine
subroutine s_pack_model_for_gpu(ma)
type(t_model_array), intent(inout) :: ma
integer :: i
ma%ntrs = ma%model%ntrs
if (allocated(ma%trs_v)) deallocate(ma%trs_v)
if (allocated(ma%trs_n)) deallocate(ma%trs_n)
allocate (ma%trs_v(1:3, 1:3, 1:ma%ntrs))
allocate (ma%trs_n(1:3, 1:ma%ntrs))
do i = 1, ma%ntrs
ma%trs_v(:, :, i) = ma%model%trs(i)%v(:, :)
ma%trs_n(:, i) = ma%model%trs(i)%n(:)
end do
end subroutine
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/common/m_model.fpp` around lines 1402 - 1415, s_pack_model_for_gpu
currently unconditionally allocates ma%trs_v and ma%trs_n; make it re-entrant by
checking existing allocations before allocating: set ma%ntrs = ma%model%ntrs,
then for ma%trs_v and ma%trs_n use allocated(...) to detect prior allocation and
only allocate if not allocated or if the current allocation dimensions don't
match ma%ntrs (in that case deallocate then allocate with the correct bounds);
ensure you copy into ma%trs_v(:, :, i) and ma%trs_n(:, i) after the allocation
checks complete.

end module m_model
Loading
Loading