Implement streaming FVDL processing with memory tracking to optimize … #920
mneeta wants to merge 1 commit into fortify:dev/v3.x
…parser performance
To what extent is this streaming parser related to the DOM-based parsing logic used by other Fortify products? Other Fortify products could presumably also benefit from a streaming parser. At the same time, we want to use the same parsing logic everywhere to ensure consistency and avoid hard-to-identify differences in behavior (like those we've had in the past between SSC and AWB, for example). As a side note, we'll likely want to eventually move the FPR parsing code to fcli-common or similar, since we'd want to reuse this functionality to provide FPR-based fcli commands (i.e., as an enhanced replacement for FPRUtility).
Thanks for raising this; it is an important consideration. The current streaming implementation is functionally aligned with the existing DOM-based parsing logic. The goal was not to introduce new parsing semantics, but to replicate the same behavior while avoiding loading the entire FVDL document into memory.
I fully agree that consistency across Fortify products is critical to avoid behavioral differences. Moving the parsing logic into a shared component (e.g., fcli-common) would be architecturally beneficial. However, given the tight timeline for this release, such a refactoring would be difficult to complete safely, so it would be a good candidate for a follow-up improvement.
Problem
The existing FVDL processing relied on DOM-based parsing, which loads the entire XML document into memory.
For large FPR/FVDL files, this resulted in high memory consumption and scalability limitations.
Solution
This PR introduces a streaming-based FVDL processor that parses the XML incrementally instead of loading it entirely into memory.
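To illustrate the general idea of incremental parsing, here is a minimal sketch using the JDK's StAX API. It is not the code from this PR: the class name `StreamingFvdlSketch`, the method `readInstanceIds`, and the choice to extract `InstanceID` elements are assumptions made for illustration only. The reader visits one XML event at a time, so only the current element is held in memory rather than the whole document tree.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration; not the StreamingFVDLProcessor from this PR.
final class StreamingFvdlSketch {

    /** Collects the text of every InstanceID element (assumed name) without building a DOM tree. */
    public static List<String> readInstanceIds(InputStream in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Disable DTDs and external entities, since the input may be untrusted.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);

        XMLStreamReader reader = factory.createXMLStreamReader(in);
        List<String> ids = new ArrayList<>();
        try {
            // Pull events one at a time; nothing is retained once an event has been passed.
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "InstanceID".equals(reader.getLocalName())) {
                    // getElementText() consumes the text up to the matching end tag.
                    ids.add(reader.getElementText().trim());
                }
            }
        } finally {
            reader.close();
        }
        return ids;
    }

    public static void main(String[] args) throws Exception {
        // Tiny stand-in document; a real FVDL file is far larger and uses a namespace.
        String xml = "<FVDL><Vulnerabilities>"
                + "<Vulnerability><InstanceInfo><InstanceID>A1</InstanceID></InstanceInfo></Vulnerability>"
                + "<Vulnerability><InstanceInfo><InstanceID>B2</InstanceID></InstanceInfo></Vulnerability>"
                + "</Vulnerabilities></FVDL>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(readInstanceIds(in));
    }
}
```

Matching on `getLocalName()` keeps the sketch independent of the namespace prefix. A DOM-based parser would instead materialize every node before the first value could be read.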
Key additions:
StreamingFVDLProcessor
Parser components for metadata, description, and trace
Memory tracking support via MemoryTracker
YAML-based language comment configuration
Because the XML is processed incrementally rather than held in memory as a complete DOM tree, the new implementation significantly reduces peak memory usage during parsing.
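One way to observe peak memory during parsing is to sample the used heap at intervals and keep the maximum. The sketch below shows that idea only. The PR's `MemoryTracker` API is not shown in this conversation, so the class `PeakHeapSampler` and its methods are hypothetical names and may differ from the actual implementation.

```java
// Hypothetical illustration; not the MemoryTracker from this PR.
final class PeakHeapSampler {
    private final Runtime runtime = Runtime.getRuntime();
    private long peakBytes;

    /** Records the currently used heap and remembers the largest value seen so far. */
    public long sample() {
        long used = runtime.totalMemory() - runtime.freeMemory();
        peakBytes = Math.max(peakBytes, used);
        return used;
    }

    /** Returns the largest used-heap value recorded by sample(). */
    public long peakBytes() {
        return peakBytes;
    }

    public static void main(String[] args) {
        PeakHeapSampler sampler = new PeakHeapSampler();
        sampler.sample();
        byte[][] chunks = new byte[16][];
        for (int i = 0; i < chunks.length; i++) {
            chunks[i] = new byte[1 << 20]; // stand-in for work done per parsed element
            sampler.sample();
        }
        System.out.printf("peak used heap: %d MiB%n", sampler.peakBytes() / (1024 * 1024));
    }
}
```

Values obtained this way are approximate, because they depend on when the garbage collector runs. They are better suited to comparing two parsing strategies on the same input than to reporting absolute numbers.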