perf: Optimize DICOM header extraction for large medical datasets (Phase 1)#29

Open
chinmayy777 wants to merge 2 commits into OSIPI:main from chinmayy777:fix/dicom-parsing-performance

chinmayy777 commented Mar 1, 2026

Description

📦 Stacked PR Notice (note for reviewers): This PR is branched off my infrastructure fixes in PR #8 so that I could build and test the Docker containers locally.

For now, the GitHub diff shows additional changed files, but the only changes specific to this feature are in package/src/pyaslreport/main.py. Once PR #8 is merged into main, GitHub will automatically update the diff to show only the relevant optimization.

This PR resolves a significant performance bottleneck in the pyaslreport package (see #28).

Previously, get_dicom_header() parsed every file in the target directory with pydicom.dcmread just to populate an array, then read the first file a second time to extract the header. For clinical datasets containing thousands of DICOM slices, this caused large amounts of unnecessary disk I/O and memory overhead, even though only one header was ever used.

This optimization refactors the loop to return dcm_header as soon as the first valid DICOM file has been parsed. In the best case, where the first file read is valid, this turns an O(N) parsing operation into an O(1) one.
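For reviewers who want the idea without opening the diff, here is a minimal sketch of the early-return pattern described above. It is not the code in main.py: the function name `first_valid_header`, the injectable `read` parameter, and the use of `stop_before_pixels=True` are assumptions made for illustration.

```python
import os
from typing import Any, Callable, Optional


def _read_with_pydicom(path: str) -> Any:
    # Imported lazily so the sketch can be exercised with a stub reader.
    import pydicom

    # stop_before_pixels=True skips the pixel data; this is an extra
    # assumption of the sketch, not something the PR states it does.
    return pydicom.dcmread(path, stop_before_pixels=True)


def first_valid_header(
    directory: str,
    read: Callable[[str], Any] = _read_with_pydicom,
) -> Optional[Any]:
    """Return the header of the first readable file, or None if none parse.

    Hypothetical helper illustrating the early return; the directory is
    still listed in full, but files are parsed only until one succeeds.
    """
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        try:
            # Return on the first success instead of collecting every dataset.
            return read(path)
        except Exception:
            # Broad catch keeps the sketch reader-agnostic; real code would
            # likely narrow this to pydicom.errors.InvalidDicomError.
            continue
    return None
```

If non-DICOM files sort ahead of the first valid one, the loop still has to try each of them, so the O(1) figure applies only to the best case.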

Checklist

  • Code follows style guidelines
  • Tests are included and passing (Existing logic preserved)
  • Documentation is updated
  • No breaking changes
  • Error handling is appropriate
  • Performance considerations addressed (Eliminates O(N) disk parsing)
  • Security implications considered
