
Refactor NSS and DNSS implementations for improved readability and modularity #6

Closed
eclipse0922 wants to merge 1 commit into master from refactor_2

Conversation


@eclipse0922 eclipse0922 commented Feb 8, 2026

User description

  • Rewrote legacy code into clean C++17 style.
  • Added missing method implementations and input validation.
  • Introduced options struct for configurable parameters in NSS and DNSS.
  • Implemented CUDA support for DNSS rotational feature computation.
  • Added new CMake configuration for building with or without CUDA.
  • Updated README to reflect changes and provide build instructions.
  • Added .gitignore to exclude build directories.

PR Type

Enhancement, Bug fix, Documentation


Description

  • Refactored NSS and DNSS implementations into clean C++17 with modular design and options structs.

  • Added input validation, missing method implementations, and improved bucket-based sampling logic.

  • Implemented optional CUDA acceleration for DNSS rotational feature computation.

  • Updated documentation, build system (CMake), and README with clear usage and configuration instructions.


Changes walkthrough 📝

Relevant files

Enhancement

NSS.cpp: Refactored NSS/DNSS core logic and sampling
  • Replaced legacy implementation with modular helper functions and clean C++17 code.
  • Introduced sphericalBucketIndex, computeCenteredAndScaledVertices, and computeRotationalReturnValue.
  • Rewrote sampling logic using lazy removal and priority queues for efficiency.
  • Added CUDA integration point for DNSS rotational features.
  • +545/-310

NSS.h: Modernized header with options and dependencies
  • Replaced legacy class definitions with modern C++17 structs and options.
  • Added Options for both NSS and DNSS, including CUDA and bucket configuration.
  • Removed external dependencies (Eigen, Concurrency) and added a minimal glm::fvec3 fallback.
  • +162/-123

dnss_cuda.cu: Added CUDA implementation for DNSS
  • Implemented CUDA kernel for rotational normal and return computation.
  • Added host wrapper DNSSComputeRotationalFeaturesCuda with fallback logic.
  • +217/-0

Configuration changes

CMakeLists.txt: Added CMake CUDA integration
  • Added CMake configuration with optional CUDA support via DNSS_ENABLE_CUDA.
  • Integrated dnss_cuda.cu and set appropriate compile definitions and properties.
  • +21/-0

Documentation

README.md: Updated documentation and build guide
  • Updated to reflect refactored implementation, algorithm notes, and build instructions.
  • Added sections on fixes, build options (CPU/CUDA), and references.
  • +50/-10


    @github-actions

    github-actions bot commented Feb 8, 2026

    PR Description updated to latest commit (c1f2767)

    @github-actions

    github-actions bot commented Feb 8, 2026

    PR Reviewer Guide 🔍

    (Review updated until commit b7adea4)

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Memory leak in CUDA error path

    In DNSSComputeRotationalFeaturesCuda, when cudaMemcpy from device to host fails (lines 192-193), the function frees device memory but returns false without clearing the output vectors. This leaves *rotational_normals and *rotational_returns in an inconsistent state (partially assigned or empty), potentially causing undefined behavior in the caller. The CPU fallback path in DNSS::dualNormalSpaceSampling (lines 385-395) assumes these vectors are valid, but they may contain stale or uninitialized data.

    NSS::NSS(const Options &options)
    	: m_options(options), m_runtime_seed(options.random_seed)
    Potential division by zero in CUDA kernel

In computeRotationalFeaturesKernel, normal_norm is computed (line 51) and, when found invalid (line 52), reset to 1.0f (line 54). However, if both point_norm and normal_norm are near zero, the divisions that follow (e.g., pdx = px / point_norm on line 57) can still produce NaNs or undefined behavior unless the early return on line 48 fires first. Verify the ordering: the point_norm check must precede every division by point_norm, and the normal_norm correction must precede every division by normal_norm.

    const float point_norm = sqrtf(px * px + py * py + pz * pz);
    if (!(isfinite(point_norm)) || point_norm <= kEpsilon)
    {
    	rotational_returns[index] = 0.0f;
    	return;
    }
    
    float normal_norm = sqrtf(nx * nx + ny * ny + nz * nz);
    if (!(isfinite(normal_norm)) || normal_norm <= kEpsilon)
    {
    	normal_norm = 1.0f;
    }
    
    const float pdx = px / point_norm;
    const float pdy = py / point_norm;
    const float pdz = pz / point_norm;
    const float ndx = nx / normal_norm;
    const float ndy = ny / normal_norm;
    const float ndz = nz / normal_norm;
    Inconsistent bucket index handling

    In DNSS::dualNormalSpaceSampling, when t_bucket or r_bucket computed by translationalBucketIndex or rotationalBucketIndex are out of bounds (lines 423-430), they are clamped to 0. However, this clamping is applied after the bucket indices are used to populate t_buckets and r_buckets (lines 432-434). This means points with invalid bucket indices are all forced into bucket 0, potentially causing load imbalance and incorrect sampling behavior. The clamping should occur before inserting into the buckets.

    if (t_bucket < 0 || t_bucket >= t_bucket_count)
    {
    	t_bucket = 0;
    }
    if (r_bucket < 0 || r_bucket >= r_bucket_count)
    {
    	r_bucket = 0;
    }
    
    t_buckets[static_cast<std::size_t>(t_bucket)].push_back(static_cast<int>(point_index));
    r_buckets[static_cast<std::size_t>(r_bucket)].push(
    	RotationEntry{rotational_returns[point_index], static_cast<int>(point_index)});

    @github-actions

    github-actions bot commented Feb 8, 2026

    PR Description updated to latest commit (b7adea4)

    @github-actions

    github-actions bot commented Feb 8, 2026

    Persistent review updated to latest commit b7adea4

    @github-actions

    github-actions bot commented Feb 8, 2026

    PR Code Suggestions ✨

    Latest suggestions up to b7adea4

Category | Suggestion | Impact
    Possible issue
    Use atan2 for numerical stability

    The function computeRotationalReturnValue uses std::atan with potential division by
    zero or overflow risks. Replace it with std::atan2 for better numerical stability
    and correctness.

    NSS.cpp [74-75]

     float computeRotationalReturnValue(const glm::fvec3 &normalized_point, const glm::fvec3 &normal, float theta)
     {
     	const float point_norm = vectorLength(normalized_point);
     	if (!std::isfinite(point_norm) || point_norm <= kEpsilon)
     	{
     		return 0.0f;
     	}
     
     	const glm::fvec3 point_dir = normalized_point / point_norm;
     	const glm::fvec3 normal_dir = normalizeSafe(normal);
     	const float dot_pn = clampFloat(dotProduct(point_dir, normal_dir), -1.0f, 1.0f);
     	const float beta = std::acos(dot_pn);
     
     	const float pp = 2.0f * point_norm * std::sin(theta * 0.5f);
     	const float sin_beta = std::sin(beta);
     	const float cos_beta = std::cos(beta);
     
     	const float pq_positive = pp * std::cos(beta - theta * 0.5f);
     	const float pq_negative = pp * std::cos(-beta - theta * 0.5f);
     
    -	const float denominator_positive = std::max(kEpsilon, point_norm - pq_positive * cos_beta);
    -	const float denominator_negative = std::max(kEpsilon, point_norm - pq_negative * cos_beta);
    +	const float denominator_positive = point_norm - pq_positive * cos_beta;
    +	const float denominator_negative = point_norm - pq_negative * cos_beta;
     
    -	const float atan_positive = std::atan((pq_positive * sin_beta) / denominator_positive);
    -	const float atan_negative = std::atan((pq_negative * (-sin_beta)) / denominator_negative);
    +	const float atan_positive = std::atan2(pq_positive * sin_beta, denominator_positive);
    +	const float atan_negative = std::atan2(pq_negative * (-sin_beta), denominator_negative);
     
     	const float gamma_positive = theta - atan_positive;
     	const float gamma_negative = theta - atan_negative;
     
     	const float rr_positive = point_norm * gamma_positive / theta;
     	const float rr_negative = point_norm * gamma_negative / theta;
     	const float rotational_return = std::max(rr_positive, rr_negative);
     
     	if (!std::isfinite(rotational_return))
     	{
     		return 0.0f;
     	}
     	return std::max(0.0f, rotational_return);
     }
    Suggestion importance[1-10]: 8


    Why: Replacing std::atan with std::atan2 improves numerical stability by correctly handling quadrant determination and avoiding division-by-zero issues when the denominator is near zero. This is a meaningful correctness improvement for the rotational return calculation.

    Medium

    Previous suggestions

    Suggestions up to commit c1f2767
Category | Suggestion | Impact
    Possible issue
    Add bin count validation to prevent overflow

    The sphericalBucketIndex function should validate that azimuth_bins and z_bins are
    reasonable upper bounds (e.g., less than a large threshold) to prevent integer
    overflow when computing bucket indices later. Add a check for maximum allowed bin
    counts.

    NSS.cpp [91-100]

     int sphericalBucketIndex(
     	const glm::fvec3 &raw_normal,
     	int azimuth_bins,
     	int z_bins,
     	bool fold_antipodal)
     {
    -	if (azimuth_bins <= 0 || z_bins <= 0)
    +	if (azimuth_bins <= 0 || azimuth_bins > 10000 || z_bins <= 0 || z_bins > 10000)
     	{
     		return -1;
     	}
    Suggestion importance[1-10]: 8


    Why: Adding upper bound validation for bin counts helps prevent integer overflow when computing bucket indices (azimuth_index * z_bins + z_index), which is a valid safety improvement. This addresses a potential edge case that could cause incorrect behavior with extreme input values.

    Medium
    General
    Use std::clamp for clarity and correctness

    The clampFloat function should use std::clamp for better readability and to avoid
    potential issues with nested std::min/std::max calls. This is a standard C++17
    feature that directly expresses the intent.

    NSS.cpp [18-21]

     float clampFloat(float value, float min_value, float max_value)
     {
    -	return std::max(min_value, std::min(value, max_value));
    +	return std::clamp(value, min_value, max_value);
     }
    Suggestion importance[1-10]: 7


    Why: Using std::clamp is more idiomatic and clearer than nested std::min/std::max, but this is a stylistic improvement rather than a correctness fix. It improves readability and reduces potential for error, but doesn't address critical issues.

    Medium
    Rename function and parameter for accuracy

    The function name computeRotationalReturnValue is misleading because it takes
    normalized_point but the original implementation expected non-normalized points.
    Rename the parameter to point and update the function name to
    computeRotationalReturn for consistency with the algorithm description.

    NSS.cpp [51-57]

    -float computeRotationalReturnValue(const glm::fvec3 &normalized_point, const glm::fvec3 &normal, float theta)
    +float computeRotationalReturn(const glm::fvec3 &point, const glm::fvec3 &normal, float theta)
     {
    -	const float point_norm = vectorLength(normalized_point);
    +	const float point_norm = vectorLength(point);
     	if (!std::isfinite(point_norm) || point_norm <= kEpsilon)
     	{
     		return 0.0f;
     	}
    Suggestion importance[1-10]: 6


    Why: Renaming the function and parameter improves clarity and consistency with the algorithm description, but the suggestion doesn't fully address the discrepancy since the function still uses vectorLength(point) which computes the norm of the input regardless of whether it's normalized. The parameter name change alone doesn't fix the underlying confusion about normalization expectations.

    Low
    Reserve memory before resizing for efficiency

    The computeCenteredAndScaledVertices function should reserve memory for
    normalized_vertices before resizing to avoid unnecessary reallocations when the
    input vector is large. This improves performance for large point clouds.

    NSS.cpp [139-148]

     void computeCenteredAndScaledVertices(
     	const std::vector<glm::fvec3> &vertices,
     	std::vector<glm::fvec3> &normalized_vertices)
     {
     	normalized_vertices.clear();
    -	normalized_vertices.resize(vertices.size(), glm::fvec3(0.0f));
     	if (vertices.empty())
     	{
     		return;
     	}
    +	normalized_vertices.reserve(vertices.size());
    +	normalized_vertices.resize(vertices.size(), glm::fvec3(0.0f));
    Suggestion importance[1-10]: 5


    Why: Reserving memory before resizing can improve performance for large point clouds, but the current implementation already resizes with the correct size upfront. Since resize already allocates the required memory, the reserve call is redundant and adds unnecessary complexity without significant benefit.

    Low
