
Conversation

@yruslan (Collaborator) commented Feb 11, 2026

Closes #723

Summary by CodeRabbit

  • New Features

    • Added generate_corrupt_fields option that appends a _corrupt_fields array (field_name, raw_value) for values that fail conversion.
    • Added a fluent CobolSchema builder API for easier schema construction and configuration.
  • Documentation

    • README and changelog updated with usage examples and description of generate_corrupt_fields.
  • Tests

    • New integration tests validating corrupt-fields output and option compatibility (including incompatible option checks).
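For orientation, here is a minimal usage sketch of the option described in the summary above. The SparkSession (`spark`) and file paths are placeholders, not taken from this PR:

```scala
// Hypothetical usage sketch (paths are placeholders).
val df = spark.read
  .format("cobol")
  .option("copybook", "/path/to/copybook.cpy")   // placeholder copybook location
  .option("generate_corrupt_fields", "true")     // append the _corrupt_fields array column
  .load("/path/to/data")                         // placeholder data location

// Each _corrupt_fields element carries the field name and the raw bytes that failed to decode.
df.select("_corrupt_fields").show(truncate = false)
```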

- Introduced `generate_corrupted_fields` option to append a `_corrupted_fields` array to the schema, containing field names and raw values of fields that could not be decoded.
- Updated relevant methods in `RecordExtractors` to handle corrupted fields.
- Enhanced documentation and README to reflect the new feature.
…ull fields, improve performance of generating corrupted fields slightly.
…d memory wise), fix `PIC X USAGE COMP` cases.
@coderabbitai bot (Contributor) commented Feb 11, 2026

Walkthrough

Adds a new generate_corrupt_fields option that captures fields failing to decode by storing their field name and raw bytes into a _corrupt_fields array. The flag is threaded through parameters, schema, iterators, and record extractors; Primitive gained raw extraction and isEmpty helpers. Tests, README, and a CobolSchema builder were added/updated.
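For orientation, the appended `_corrupt_fields` column described above would have roughly the following Spark type. This is a hedged sketch; the exact element types (in particular `BinaryType` for `raw_value`) and nullability are assumptions, not taken from the PR:

```scala
// Hedged sketch of the appended column's Spark type.
import org.apache.spark.sql.types._

val corruptFieldsType = ArrayType(
  StructType(Seq(
    StructField("field_name", StringType),  // name of the field that failed to decode
    StructField("raw_value", BinaryType)    // raw bytes of the value (type is an assumption)
  ))
)
```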

Changes

  • **Parameters & Parsing** (`cobol-parser/src/main/scala/.../CobolParameters.scala`, `.../CobolParametersParser.scala`, `.../ReaderParameters.scala`): Add `generate_corrupt_fields` constant and `generateCorruptFields: Boolean` option; validate incompatible combos and propagate into `ReaderParameters`.
  • **Record Extraction & Corrupt model** (`cobol-parser/src/main/scala/.../record/RecordExtractors.scala`, `.../record/CorruptField.scala`): Collect `CorruptField(field_name, raw_value)` when decoding yields null but raw bytes exist; pass the buffer through post-processing and optionally append the `_corrupt_fields` array to outputs.
  • **Primitive API** (`cobol-parser/src/main/scala/.../Primitive.scala`): Add `getRawValue` and `isEmpty` methods; separate raw extraction from decoding to enable corrupt-byte detection.
  • **Schema & Constants** (`cobol-parser/src/main/scala/.../Constants.scala`, `.../schema/CobolSchema.scala`, `spark-cobol/src/main/scala/.../CobolSchema.scala`): Add constants (`_corrupt_fields`, `field_name`, `raw_value`); propagate `generateCorruptFields` through `CobolSchema`/`CobolReaderSchema` and conditionally add the corrupt-fields `StructType`; add the `CobolSchema` builder API.
  • **Iterators & Reader Flow** (`cobol-parser/src/main/scala/.../FixedLenNestedRowIterator.scala`, `.../VarLenNestedIterator.scala`, `spark-cobol/src/main/scala/.../SparkCobolOptionsBuilder.scala`): Forward `generateCorruptFields` into `RecordExtractors.extractRecord` calls and adjust call sites to the new parameter list.
  • **Parser & Usage comments** (`cobol-parser/src/main/scala/.../ParserVisitor.scala`, `.../datatype/Usage.scala`): Adjust COMP usage handling (COMP4/COMP9) and update usage guidance comments; minor import reorders.
  • **Tests & Documentation** (`README.md`, `spark-cobol/src/test/scala/.../Test41CorruptFieldsSpec.scala`, `spark-cobol/src/test/scala/.../*`): README documents the new option and changelog; a new integration test validates schema and data; many tests updated to use the `CobolSchema` builder and parameter constants.
  • **Misc test updates** (`spark-cobol/src/test/.../DummyCobolSchema.scala`, `.../Test02TextFilesOldSchool.scala`, etc.): Refactor tests to the builder API, small style/import tweaks, and replace some hard-coded strings with constants.

Sequence Diagram

sequenceDiagram
    participant Client as Spark Client
    participant Extractor as RecordExtractors
    participant Primitive as Primitive
    participant Output as Output Row

    Client->>Extractor: extractRecord(..., generateCorruptFields=true)
    activate Extractor

    Extractor->>Extractor: init corruptedFields buffer

    loop for each field
        Extractor->>Primitive: getRawValue(offset)
        activate Primitive
        Primitive-->>Extractor: rawBytes or null
        alt rawBytes present and decode succeeds
            Extractor-->>Extractor: use decoded value
        else rawBytes present but decode returns null
            Extractor->>Extractor: create CorruptField(fieldName, rawBytes)
            Extractor-->>Extractor: add to corruptedFields
        end
        deactivate Primitive
    end

    alt generateCorruptFields == true
        Extractor->>Extractor: build _corrupt_fields array from buffer
        Extractor->>Output: append _corrupt_fields to Row
    end

    Extractor-->>Client: Row (with _corrupt_fields if enabled)
    deactivate Extractor

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

🐰 I nibble bytes both shy and bold,

When decoders stumble, crumbs are told.
I tuck raw bytes and names in rows,
A little trail where the mystery goes.
Builder-bounced and brisk I hop — corrupts now saved, no more flip-flop.

🚥 Pre-merge checks: ✅ 3 passed | ❌ 2 not passed

❌ Failed checks (1 warning, 1 inconclusive)

  • **Linked Issues check** (⚠️ Warning): The PR implements the core requirements from #723: it adds the `generate_corrupt_fields` option and generates a `_corrupt_fields` column with `field_name` and `raw_value` data when decoding fails. However, the linked issue requested user-configurable column names and detailed error messages, while the implementation uses a fixed column name and a simplified data structure without error descriptions. Resolution: verify whether the simplified implementation (fixed column name `_corrupt_fields` with only `field_name` and `raw_value`) fully satisfies the original #723 requirements for user-configurable naming and detailed error messages. Consider documenting any scope changes or design trade-offs.
  • **Out of Scope Changes check** (❓ Inconclusive): Changes are generally scoped to the corrupted fields feature. However, ParserVisitor.scala adjusts COMP encoding handling (COMP1/COMP4 to COMP4/COMP9), and Usage.scala adds deprecation comments; these appear tangential to the `_corrupt_fields` feature and may warrant clarification. Resolution: clarify how the COMP encoding changes in ParserVisitor.scala and the Usage.scala comments relate to the #723 feature, and confirm whether they are prerequisite fixes or separate improvements bundled into this PR.

✅ Passed checks (3 passed)

  • **Description Check** (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.
  • **Title check** (✅ Passed): The title directly describes the main feature: adding a `_corrupted_fields` column to capture field decoding failures, which aligns with the primary change in the changeset.
  • **Docstring Coverage** (✅ Passed): No functions found in the changed files to evaluate docstring coverage. Skipping the docstring coverage check.



@github-actions bot commented Feb 11, 2026

JaCoCo code coverage report - 'cobol-parser'

| Scope | Coverage |
| --- | --- |
| Overall Project | 91.45% (-0.59%) 🍏 |
| Files changed | 62.29% 🍏 |

| File | Coverage |
| --- | --- |
| Constants.scala | 89.7% 🍏 |
| ParserVisitor.scala | 89.44% (-0.07%) 🍏 |
| CobolSchema.scala | 80.67% (-1%) 🍏 |
| RecordExtractors.scala | 80.27% (-17%) 🍏 |
| Usage.scala | 79.59% 🍏 |
| FixedLenNestedRowIterator.scala | 73.44% 🍏 |
| Primitive.scala | 71.16% (-13.6%) 🍏 |
| VarLenNestedIterator.scala | 67.72% 🍏 |
| CorruptField.scala | 0% |

@github-actions bot commented Feb 11, 2026

JaCoCo code coverage report - 'spark-cobol'

| Scope | Coverage |
| --- | --- |
| Overall Project | 82.39% 🍏 |
| Files changed | 100% 🍏 |

| File | Coverage |
| --- | --- |
| CobolSchema.scala | 95.51% 🍏 |
| SparkCobolOptionsBuilder.scala | 90.32% 🍏 |

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/schema/CobolSchema.scala (1)

43-43: ⚠️ Potential issue | 🟡 Minor

Typo in Scaladoc: "pecifies" → "Specifies".

Proposed fix
-  * `@param` schemaRetentionPolicy   pecifies a policy to transform the input schema. The default policy is to keep the schema exactly as it is in the copybook.
+  * `@param` schemaRetentionPolicy   Specifies a policy to transform the input schema. The default policy is to keep the schema exactly as it is in the copybook.
🤖 Fix all issues with AI agents
In
`@cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/ast/Primitive.scala`:
- Around line 125-156: The isEmpty method currently treats byte 0x40 as an
empty-space unconditionally; update isEmpty to be encoding-aware by checking the
field's encoding (use dataType.enc on AlphaNumeric fields) and only treating
0x40 as an empty byte when the encoding is EBCDIC, while treating 0x20 as the
space for ASCII-encoded fields; retain the existing zero (0x00) check and fall
back to the current behavior for unknown encodings. Modify the loop in isEmpty
(and the earlier idx bounds logic using binaryProperties.dataSize) to build the
set of “empty” byte values based on dataType.enc and use that set to decide
emptiness.

In
`@cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/extractors/record/RecordExtractors.scala`:
- Around line 435-436: The hierarchical extractor call in
extractHierarchicalRecord always passes generateCorruptedFields = false and
corruptedFields = null to applyRecordPostProcessing, so users enabling
corrupted-fields support for hierarchical/multisegment data get no output;
update extractHierarchicalRecord to detect when the user option for corrupted
fields is true and either (a) propagate the flag and a proper corruptedFields
collection into applyRecordPostProcessing if you implement support, or (b) emit
a clear warning via the existing logging mechanism (e.g., logger.warn) that
corrupted-fields are not supported for hierarchical/multisegment extraction and
will be ignored; reference extractHierarchicalRecord and
applyRecordPostProcessing so the change is made at that call site.

In
`@cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParameters.scala`:
- Line 50: The ScalaDoc for the parameter generateCorruptFields is referencing
the wrong field name "_corrupt_fields"; update the comment to the correct field
name "_corrupted_fields" so the doc matches the actual option and field used
elsewhere (search for generateCorruptFields and any references to
_corrupted_fields/_corrupt_fields to confirm consistency).

In
`@cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParametersParser.scala`:
- Around line 898-901: The current check uses
params.contains(PARAM_CORRUPTED_FIELDS) which treats an explicit "false" value
the same as presence and wrongly throws; update the condition in the
CobolParametersParser where PARAM_CORRUPTED_FIELDS is checked so it only throws
when the option is explicitly set to true (e.g., inspect params.get/PARAM value
or use params.getBoolean-like logic on the params map) instead of just checking
presence; change the check around PARAM_CORRUPTED_FIELDS and the throw in that
block to allow explicit generate_corrupted_fields=false while still blocking
true when used with segment-children:*.

In
`@cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/ReaderParameters.scala`:
- Line 61: The Scaladoc and parameter naming are inconsistent: update
ReaderParameters (parameter generateCorruptFields) to match the actual constant
and usage by renaming it to generateCorruptedFields (or vice-versa across the
codebase) and fix the Scaladoc text to refer to Constants.corruptedFieldsField
(`_corrupted_fields`); ensure RecordExtractors.extractRecord and any other
callers use the same name so the parameter and documentation are consistent.

In `@README.md`:
- Around line 600-603: The fenced code block showing
.option("generate_corrupted_fields", "true") lacks a language identifier, which
triggers markdownlint MD040; update the fenced block to add a language tag
(e.g., scala) immediately after the opening backticks so it reads ```scala and
retains the existing code snippet .option("generate_corrupted_fields", "true").
- Around line 1954-1960: The changelog heading for the 2.9.8 entry uses ####
which skips a level (MD001); update the heading text "2.9.8 will be released
soon." to use a triple-hash heading (###) instead of #### so it correctly
follows the prior `##` level; ensure the surrounding block (the bullet about
[`#723`] and the code example `.option("generate_corrupted_fields", "true")`)
remains unchanged and properly nested under the new `###` heading.

In
`@spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/schema/CobolSchema.scala`:
- Line 47: Update the Scaladoc for the generateCorruptedFields parameter in
CobolSchema (parameter name: generateCorruptedFields) to state that corrupted
record fields are appended to the end of the schema (not the beginning); this
aligns the doc with the implementation that uses :+ to add the
`_corrupted_fields` field at the end (see construction around the code that
appends `_corrupted_fields`).
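A hedged sketch of the boolean-aware check suggested in the CobolParametersParser item above. The plain `Map[String, String]`, the helper name, and the literal option key are assumptions for illustration; the real parser uses its own parameters wrapper and constants:

```scala
// Hypothetical sketch: reject the combination only when the option is explicitly enabled.
def validateCorruptFieldsOptions(options: Map[String, String]): Unit = {
  val corruptFieldsEnabled =
    options.get("generate_corrupt_fields").exists(_.trim.equalsIgnoreCase("true"))
  val hasSegmentChildren =
    options.keys.exists(_.startsWith("segment-children"))

  if (corruptFieldsEnabled && hasSegmentChildren) {
    throw new IllegalArgumentException(
      "Option 'generate_corrupt_fields' is not supported together with 'segment-children:*' (hierarchical records).")
  }
}
```

With this shape, an explicit `generate_corrupt_fields=false` no longer trips the incompatibility check.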
🧹 Nitpick comments (7)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/common/Constants.scala (1)

67-69: Constants defined but not referenced in RecordExtractors.getCorruptedFieldsGroup.

In RecordExtractors.scala (lines 535-538), the corrupted fields group is constructed using hardcoded string literals "field_name", "raw_value", and "_corrupted_fields" instead of referencing these constants. Either use the constants there or remove them here to avoid drift.

Suggested change in RecordExtractors.scala (lines 532-539)
  private def getCorruptedFieldsGroup: Group = {
    val corruptedFieldsInGroup = new mutable.ArrayBuffer[Statement]

-   corruptedFieldsInGroup += Primitive(15, "field_name", "field_name", 0, AlphaNumeric("X(50)", 50), decode = null, encode = null)(None)
-   corruptedFieldsInGroup += Primitive(15, "raw_value", "raw_value", 0, AlphaNumeric("X(50)", 50, enc = Some(RAW), compact = Some(COMP4())), decode = null, encode = null)(None)
+   corruptedFieldsInGroup += Primitive(15, Constants.fieldNameColumn, Constants.fieldNameColumn, 0, AlphaNumeric("X(50)", 50), decode = null, encode = null)(None)
+   corruptedFieldsInGroup += Primitive(15, Constants.rawValueColumn, Constants.rawValueColumn, 0, AlphaNumeric("X(50)", 50, enc = Some(RAW), compact = Some(COMP4())), decode = null, encode = null)(None)

-   Group(10, "_corrupted_fields", "_corrupted_fields", 0,  children = corruptedFieldsInGroup, occurs = Some(10))(None)
+   Group(10, Constants.corruptedFieldsField, Constants.corruptedFieldsField, 0,  children = corruptedFieldsInGroup, occurs = Some(10))(None)
  }
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/ast/Primitive.scala (1)

125-156: Duplicated bounds-checking logic between isEmpty and getRawValue.

Both methods repeat the same offset validation and byte-range calculation (string vs. non-string, varchar trimming). Consider extracting a shared private helper that returns the valid (startIdx, endIdx) or null/None to reduce duplication.

Also applies to: 167-191
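A possible shape for such a helper, sketched under assumptions about Primitive's internals: only `binaryProperties.dataSize` is taken from this review, and the string vs. non-string sizing and varchar trimming mentioned above are deliberately omitted:

```scala
// Hypothetical shared bounds helper for isEmpty and getRawValue (simplified sketch).
// Assumes binaryProperties.dataSize is the field's byte length; the real logic also
// distinguishes string vs. non-string sizes and trims variable-length strings.
private def fieldByteRange(itOffset: Int, record: Array[Byte]): Option[(Int, Int)] = {
  val start = itOffset
  val end   = itOffset + binaryProperties.dataSize
  if (start >= 0 && end <= record.length) Some((start, end)) else None
}
```

Both methods could then pattern match on the returned range and treat `None` as out of bounds (empty for `isEmpty`, null for `getRawValue`).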

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/extractors/record/RecordExtractors.scala (1)

72-73: ArrayBuffer allocated even when generateCorruptedFields is false.

Line 73 unconditionally allocates new ArrayBuffer[CorruptedField] for every record, even when the feature is disabled. Consider making it lazy or conditional.

Suggested fix
-    val corruptedFields = new ArrayBuffer[CorruptedField]
+    val corruptedFields = if (generateCorruptedFields) new ArrayBuffer[CorruptedField] else null

Then the null-guard at line 511 already covers this.

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/iterator/VarLenNestedIterator.scala (1)

95-110: Correct propagation of generateCorruptFields to extractRecord.

The wiring is correct. Note the naming inconsistency: ReaderParameters.generateCorruptFields maps to RecordExtractors.extractRecord's generateCorruptedFields parameter. Not a bug, but the inconsistent naming (Corrupt vs Corrupted) across the API surface could cause confusion.

spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/integration/Test41CorruptedFieldsSpec.scala (1)

104-113: Consider using PARAM_CORRUPTED_FIELDS constant for the option key.

PARAM_CORRUPTED_FIELDS is imported on line 21 but only used in the assertion at line 121. For consistency and to guard against the option key string drifting out of sync, consider using the constant here as well.

That said, using string literals in Spark option maps is a common test pattern (it validates the actual user-facing API surface), so this is a minor style preference.

spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/CobolSchemaSpec.scala (1)

905-910: Inconsistent type assertion style.

Line 909 uses isInstanceOf[StringType] while all other tests in this file compare with == (e.g., == StringType, == IntegerType). For consistency, prefer the equality check.

Proposed fix
-      assert(sparkSchema.fields(2).dataType.isInstanceOf[StringType])
+      assert(sparkSchema.fields(2).dataType == StringType)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/schema/CobolSchema.scala (1)

147-147: Naming inconsistency: generateCorruptFields vs generateCorruptedFields across the codebase.

The parameter is defined as generateCorruptFields (no "ed") in ReaderParameters and CobolParameters, but CobolSchema and RecordExtractors use generateCorruptedFields. This inconsistency spans multiple files and creates confusing naming conventions. Consider aligning all usages to generateCorruptedFields, which is grammatically correct and already used in the public-facing option key generate_corrupted_fields.

Comment on lines 435 to 436
applyRecordPostProcessing(ast, records.toList, effectiveSchemaRetentionPolicy, generateRecordId, generateRecordBytes = false, generateCorruptedFields = false, Nil, fileId, recordId, recordLength, Array.empty[Byte], generateInputFileField = generateInputFileField, inputFileName, null, handler)
}
@coderabbitai bot (Contributor) commented Feb 11, 2026

⚠️ Potential issue | 🟡 Minor

Hierarchical record extraction does not support corrupted fields.

extractHierarchicalRecord hardcodes generateCorruptedFields = false and corruptedFields = null. If a user enables the option with hierarchical/multisegment data, they'll silently get no corrupted fields output. Consider logging a warning if the feature is enabled but unsupported, or documenting this limitation.

🤖 Prompt for AI Agents
In
`@cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/extractors/record/RecordExtractors.scala`
around lines 435 - 436, The hierarchical extractor call in
extractHierarchicalRecord always passes generateCorruptedFields = false and
corruptedFields = null to applyRecordPostProcessing, so users enabling
corrupted-fields support for hierarchical/multisegment data get no output;
update extractHierarchicalRecord to detect when the user option for corrupted
fields is true and either (a) propagate the flag and a proper corruptedFields
collection into applyRecordPostProcessing if you implement support, or (b) emit
a clear warning via the existing logging mechanism (e.g., logger.warn) that
corrupted-fields are not supported for hierarchical/multisegment extraction and
will be ignored; reference extractHierarchicalRecord and
applyRecordPostProcessing so the change is made at that call site.

@yruslan (Collaborator, Author) commented

If they set the option to true the option map fails validation and throws an exception. So for now this is good. But in the future we can enable this option for hierarchical record extractor as well.

@coderabbitai bot (Contributor) commented

@yruslan, Thank you for the clarification! I can see from the validation logic in CobolParametersParser that the combination of generate_corrupted_fields with hierarchical segment-children is properly prevented at option parsing time, throwing an IllegalArgumentException before extraction occurs. This makes the hardcoded false value in extractHierarchicalRecord safe and appropriate for the current implementation.

The validation approach is sound—failing fast during configuration rather than silently ignoring the option. Looking forward to seeing this feature extended to hierarchical extractors in the future when support is implemented.


✏️ Learnings added
Learnt from: yruslan
Repo: AbsaOSS/cobrix PR: 820
File: cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/extractors/record/RecordExtractors.scala:435-436
Timestamp: 2026-02-11T08:01:45.131Z
Learning: In `cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/extractors/record/RecordExtractors.scala`, the `extractHierarchicalRecord` method hardcodes `generateCorruptedFields = false` because validation in `CobolParametersParser` throws an `IllegalArgumentException` when `generate_corrupted_fields` is used with segment-children (hierarchical data), preventing users from enabling this unsupported combination.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

* @param variableLengthParams VariableLengthParameters containing the specifications for the consumption of variable-length Cobol records.
* @param variableSizeOccurs If true, OCCURS DEPENDING ON data size will depend on the number of elements
* @param generateRecordBytes Generate 'record_bytes' field containing raw bytes of the original record
* @param generateCorruptFields Generate '_corrupt_fields' field for fields that haven't converted successfully
@coderabbitai bot (Contributor) commented Feb 11, 2026

⚠️ Potential issue | 🟡 Minor

Fix doc field name to _corrupted_fields.
Line 50 mentions _corrupt_fields, but the option and field name elsewhere are _corrupted_fields.

📝 Suggested fix
-  * `@param` generateCorruptFields   Generate '_corrupt_fields' field for fields that haven't converted successfully
+  * `@param` generateCorruptFields   Generate '_corrupted_fields' field for fields that haven't converted successfully
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
* @param generateCorruptFields Generate '_corrupt_fields' field for fields that haven't converted successfully
* `@param` generateCorruptFields Generate '_corrupted_fields' field for fields that haven't converted successfully
🤖 Prompt for AI Agents
In
`@cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParameters.scala`
at line 50, The ScalaDoc for the parameter generateCorruptFields is referencing
the wrong field name "_corrupt_fields"; update the comment to the correct field
name "_corrupted_fields" so the doc matches the actual option and field used
elsewhere (search for generateCorruptFields and any references to
_corrupted_fields/_corrupt_fields to confirm consistency).

@yruslan (Collaborator, Author) commented

Which name for the field seems better _corrupt_fields or _corrupted_fields ?

@coderabbitai bot (Contributor) commented

@yruslan, great question! Here's my analysis of both options:

_corrupted_fields (current implementation):

  • ✅ More grammatically complete (past participle: "fields that have been corrupted")
  • ✅ More explicit and descriptive
  • ✅ Commonly used pattern in data quality contexts (similar to "rejected_records", "failed_rows")
  • ✅ Clearer that it refers to fields that experienced corruption during processing

_corrupt_fields:

  • ✅ Shorter and more concise
  • ✅ Better parallelism with the parameter name generateCorruptFields
  • ⚠️ Could be ambiguous: "corrupt fields" might imply fields that are inherently corrupt vs. fields that failed during decoding

Recommendation: I'd lean toward _corrupted_fields for clarity and standard data quality terminology. The past participle form makes it unambiguous that these fields encountered corruption during the decoding process.

For consistency, you might also consider renaming the parameter to generateCorruptedFields to maintain parallelism, though generateCorruptFields is already quite clear.

What's your preference given the context of your users and how they'll interact with this feature?

@yruslan (Collaborator, Author) commented

I see but still lean towards _corrupt_fields since spark-csv has a similar concept generating _corrupt_records field for rows it failed to decode

@coderabbitai bot (Contributor) commented

@yruslan, excellent point — Spark CSV indeed uses _corrupt_record as the standard naming convention for similar functionality. Aligning with that established pattern makes perfect sense for consistency across the Spark ecosystem and will be more intuitive for users already familiar with Spark's data quality features.

So to confirm: you'd like to change the implementation from _corrupted_fields to _corrupt_fields (rather than updating the doc comment), which means:

  1. Update Constants.scala from corruptedFieldsField = "_corrupted_fields" to corruptedFieldsField = "_corrupt_fields"
  2. Ensure all usages throughout the codebase reference the updated constant
  3. Keep the doc comment as-is (it already says _corrupt_fields)

This would align the feature naming with Spark's established conventions while maintaining internal consistency. The parameter name generateCorruptFields would also align nicely with the field name _corrupt_fields.

Would you like me to help identify all the files that would need updates for this change?


✏️ Learnings added
Learnt from: yruslan
Repo: AbsaOSS/cobrix PR: 820
File: cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParameters.scala:50-50
Timestamp: 2026-02-11T08:15:51.206Z
Learning: In the Cobrix project, the field name for corrupted/failed fields should be `_corrupt_fields` (not `_corrupted_fields`) to align with Apache Spark CSV's established naming convention which uses `_corrupt_record` for similar functionality, ensuring consistency across the Spark ecosystem.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: yruslan
Repo: AbsaOSS/cobrix PR: 820
File: cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/extractors/record/RecordExtractors.scala:435-436
Timestamp: 2026-02-11T08:01:45.131Z
Learning: In `cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/extractors/record/RecordExtractors.scala`, the `extractHierarchicalRecord` method hardcodes `generateCorruptedFields = false` because validation in `CobolParametersParser` throws an `IllegalArgumentException` when `generate_corrupted_fields` is used with segment-children (hierarchical data), preventing users from enabling this unsupported combination.

* @param fileEndOffset A number of bytes to skip at the end of each file
* @param generateRecordId If true, a record id field will be prepended to each record.
* @param generateRecordBytes Generate 'record_bytes' field containing raw bytes of the original record
* @param generateCorruptFields Generate '_corrupt_fields' field for fields that haven't converted successfully
@coderabbitai bot (Contributor) commented

⚠️ Potential issue | 🟡 Minor

Naming inconsistency: generateCorruptFields vs. generateCorruptedFields and doc typo.

  1. The Scaladoc (line 61) says _corrupt_fields but the actual column name in Constants.corruptedFieldsField is _corrupted_fields.
  2. This parameter is named generateCorruptFields while RecordExtractors.extractRecord names it generateCorruptedFields. Aligning these would reduce confusion.
Suggested doc fix
-  * `@param` generateCorruptFields   Generate '_corrupt_fields' field for fields that haven't converted successfully
+  * `@param` generateCorruptFields   Generate '_corrupted_fields' field for fields that haven't converted successfully

Also applies to: 115-115

🤖 Prompt for AI Agents
In
`@cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/ReaderParameters.scala`
at line 61, The Scaladoc and parameter naming are inconsistent: update
ReaderParameters (parameter generateCorruptFields) to match the actual constant
and usage by renaming it to generateCorruptedFields (or vice-versa across the
codebase) and fix the Scaladoc text to refer to Constants.corruptedFieldsField
(`_corrupted_fields`); ensure RecordExtractors.extractRecord and any other
callers use the same name so the parameter and documentation are consistent.

Comment on lines 1954 to 1960
- #### 2.9.8 will be released soon.
- [#723](https://github.com/AbsaOSS/cobrix/pull/723) Added the option to generate `corrupted_fields` field that contains field names and raw values
of fields that Cobrix couldn't decode.
```scala
.option("generate_corrupted_fields", "true")
```

@coderabbitai bot (Contributor) commented

⚠️ Potential issue | 🟡 Minor

Fix heading level increment (MD001).
The changelog jumps from ## to ####. Use ### for the 2.9.8 entry.

📝 Suggested fix
-- #### 2.9.8 will be released soon.
+- ### 2.9.8 will be released soon.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- #### 2.9.8 will be released soon.
- [#723](https://github.com/AbsaOSS/cobrix/pull/723) Added the option to generate `corrupted_fields` field that contains field names and raw values
of fields that Cobrix couldn't decode.
```scala
.option("generate_corrupted_fields", "true")
```
- ### 2.9.8 will be released soon.
- [`#723`](https://github.com/AbsaOSS/cobrix/pull/723) Added the option to generate `corrupted_fields` field that contains field names and raw values
of fields that Cobrix couldn't decode.
🧰 Tools
🪛 markdownlint-cli2 (0.20.0)

[warning] 1954-1954: Heading levels should only increment by one level at a time
Expected: h3; Actual: h4

(MD001, heading-increment)

🤖 Prompt for AI Agents
In `@README.md` around lines 1954 - 1960, The changelog heading for the 2.9.8
entry uses #### which skips a level (MD001); update the heading text "2.9.8 will
be released soon." to use a triple-hash heading (###) instead of #### so it
correctly follows the prior `##` level; ensure the surrounding block (the bullet
about [`#723`] and the code example `.option("generate_corrupted_fields",
"true")`) remains unchanged and properly nested under the new `###` heading.

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In
`@cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/ast/Primitive.scala`:
- Around line 155-163: Update the ScalaDoc for the isEmpty(itOffset: Int,
record: Array[Byte]): Boolean method to accurately describe which byte values
are treated as "space" for different encodings: state that ASCII bytes use 0x20,
EBCDIC bytes use 0x40, and unknown/fallback encodings treat space as 0x00 (in
addition to null bytes 0x00 being considered empty); reference the isEmpty
method in Primitive.scala so readers understand the documented behavior matches
the implementation.

In `@README.md`:
- Around line 1955-1956: Update the changelog entry to reference the correct
field name _corrupt_fields (with leading underscore) instead of corrupt_fields,
and ensure the rest of the codebase consistently uses _corrupt_fields: search
for and rename any occurrences of corrupt_fields or _corrupted_fields to
_corrupt_fields (including the field definition and parameter in
CobolParameters.scala and any tests, serializers, or references that use the old
names), and update README.md, tests, and serialization/deserialization logic to
the new symbol to keep naming aligned with Spark conventions.
🧹 Nitpick comments (3)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/extractors/record/RecordExtractors.scala (1)

528-539: Prefer Constants for corrupt field names.
Avoids string drift now that canonical names live in Constants.

♻️ Suggested refactor
-    corruptFieldsInGroup += Primitive(15, "field_name", "field_name", 0, AlphaNumeric("X(50)", 50), decode = null, encode = null)(None)
-    corruptFieldsInGroup += Primitive(15, "raw_value", "raw_value", 0, AlphaNumeric("X(50)", 50, enc = Some(RAW), compact = Some(COMP4())), decode = null, encode = null)(None)
+    corruptFieldsInGroup += Primitive(15, Constants.fieldNameColumn, Constants.fieldNameColumn, 0, AlphaNumeric("X(50)", 50), decode = null, encode = null)(None)
+    corruptFieldsInGroup += Primitive(15, Constants.rawValueColumn, Constants.rawValueColumn, 0, AlphaNumeric("X(50)", 50, enc = Some(RAW), compact = Some(COMP4())), decode = null, encode = null)(None)
 
-    Group(10, "_corrupt_fields", "_corrupt_fields", 0,  children = corruptFieldsInGroup, occurs = Some(10))(None)
+    Group(10, Constants.corruptFieldsField, Constants.corruptFieldsField, 0, children = corruptFieldsInGroup, occurs = Some(10))(None)
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/integration/Test41CorruptFieldsSpec.scala (1)

104-105: Use the imported PARAM_CORRUPT_FIELDS constant instead of the string literal.

PARAM_CORRUPT_FIELDS is already imported (line 21) and used in the assertion on line 121. Using the constant here too avoids the risk of the option name drifting out of sync.

♻️ Suggested fix
-        val df = getDataFrame(tmpFileName, Map("generate_corrupt_fields" -> "true"))
+        val df = getDataFrame(tmpFileName, Map(PARAM_CORRUPT_FIELDS -> "true"))

Same applies to line 118:

-        getDataFrame("/tmp/dummy", Map("generate_corrupt_fields" -> "true", "segment-children:0" -> "COMPANY => DEPT,CUSTOMER"))
+        getDataFrame("/tmp/dummy", Map(PARAM_CORRUPT_FIELDS -> "true", "segment-children:0" -> "COMPANY => DEPT,CUSTOMER"))
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/CobolSchemaSpec.scala (1)

744-911: Thorough builder test coverage.

The builder tests comprehensively exercise each withX method and their combinations. One minor nit:

Line 909 uses isInstanceOf[StringType] while the rest of this file consistently uses == StringType for type assertions (e.g., lines 542, 544, 627). Consider using == StringType for consistency.

♻️ Suggested fix
-      assert(sparkSchema.fields(2).dataType.isInstanceOf[StringType])
+      assert(sparkSchema.fields(2).dataType == StringType)

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In
`@cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/ast/Primitive.scala`:
- Line 174: Update the two typos in the comments inside Primitive.scala: change
"should exactly fix the required bytes" to "should exactly fit the required
bytes" for the comment that starts "Non-string field size..." and apply the same
replacement for the second occurrence around line 216; locate these comments in
the Primitive.scala source (search for the phrase "should exactly fix the
required bytes") and edit the text accordingly.
🧹 Nitpick comments (5)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/ast/Primitive.scala (1)

66-102: Minor duplication in spaceChar initialization between Integral and Decimal branches.

The None compact case for both Integral and Decimal (lines 80-86 and 92-98) is identical. Not urgent, but could be deduplicated if this logic grows.
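One possible deduplication, sketched from the `t.enc match` blocks quoted later in this conversation; the helper name and the `Option[Encoding]` parameter type are assumptions:

```scala
// Hypothetical helper: map an optional encoding to the byte treated as "space".
private def encodingSpaceChar(enc: Option[Encoding]): Byte = enc match {
  case Some(EBCDIC) => 0x40  // EBCDIC space
  case Some(ASCII)  => 0x20  // ASCII space
  case Some(_)      => 0x00  // other encodings fall back to the null byte
  case None         => 0x40  // default when no encoding is specified
}
```

The `case None =>` arms of both the Integral and Decimal branches could then call `encodingSpaceChar(t.enc)`.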

spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/CobolSchemaSpec.scala (1)

872-887: Consider adding deeper structural assertions for the _corrupt_fields array element type.

The multi-segment test at Line 300 validates the inner struct fields (field_name, raw_value) of the _corrupt_fields array, but this builder test only checks the top-level name and ArrayType. Adding a similar inner-struct assertion here would guard against regressions in the element schema.

Suggested addition
       assert(sparkSchema.fields(1).name == "_corrupt_fields")
       assert(sparkSchema.fields(1).dataType.isInstanceOf[ArrayType])
+      val corruptFieldsType = sparkSchema.fields(1).dataType.asInstanceOf[ArrayType]
+      val elementType = corruptFieldsType.elementType.asInstanceOf[StructType]
+      assert(elementType.fields.map(_.name).toSet == Set("field_name", "raw_value"))
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/extractors/record/RecordExtractors.scala (2)

114-116: Minor style: missing space after comma in getRawValue calls.

Lines 115 and 142 both have s.getRawValue(offset,data) / st.getRawValue(useOffset,data) without a space after the comma, inconsistent with the rest of the codebase (e.g., s.decodeTypeValue(offset, data) on line 113).

Proposed fix
-              corruptFields += CorruptField(s"${field.name}[$i]", s.getRawValue(offset,data))
+              corruptFields += CorruptField(s"${field.name}[$i]", s.getRawValue(offset, data))
-            corruptFields += CorruptField(field.name, st.getRawValue(useOffset,data))
+            corruptFields += CorruptField(field.name, st.getRawValue(useOffset, data))

533-540: Synthetic Group for corrupt fields schema uses hardcoded occurs = Some(10) and decode = null.

The occurs = Some(10) on line 539 is arbitrary since the actual array size is determined at runtime. If this Group is ever used for size calculations (e.g., binaryProperties.actualSize), the hardcoded 10 could produce incorrect results. Similarly, decode = null and encode = null on the Primitives (lines 536–537) would NPE if anyone tries to decode through these synthetic nodes.

Since this Group is only used as a structural schema hint for handler.create (line 518), this is likely safe today. But it's fragile — consider adding a comment clarifying this Group must never be used for decoding or size computation.

spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/integration/Test41CorruptFieldsSpec.scala (1)

36-42: Test data comments could be more precise for record 3.

Line 39 comment says "COMP invalid value (negative)" but the expected output shows F3 = 61702 (decoded successfully, not flagged as corrupt). The field F1 = 0x00 is the one that decodes to null here. Consider updating the comment to clarify what's actually being tested in this record — it seems to test that an out-of-PIC-range COMP value is still decoded (not flagged) and that a null byte for alphanumeric is treated as empty (not corrupt).

@yruslan force-pushed the feature/723-add-corrupted-fields-field branch from 91a8caa to 8a835e5 on February 11, 2026 09:00
@yruslan changed the title from "#723 Add the ability to generate _corrupted_fields column containing information on fields that Cobrix failed to decode" to "#723 Add the ability to generate _corrupt_fields column containing information on fields that Cobrix failed to decode" on Feb 11, 2026
@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In
`@cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/ast/Primitive.scala`:
- Around line 66-101: The cached spaceChar value in Primitive.scala incorrectly
treats COMP-3 empty byte as 0x40; locate the private val spaceChar (pattern
matching on dataType) and change the COMP3() cases inside the Integral and
Decimal branches from returning 0x40 to returning 0x00 so COMP-3 fields follow
the project-wide null encoding; ensure both occurrences (the t.compact match in
Integral and the t.compact match in Decimal) are updated and run tests to verify
corruption detection now flags all-0x40 packed fields.

In
`@cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/extractors/record/RecordExtractors.scala`:
- Around line 533-540: getCorruptFieldsGroup currently hardcodes occurs =
Some(10) which caps maxElements to 10 and can truncate flattening; update
getCorruptFieldsGroup to compute the occurs value instead of using the literal
10 — e.g., derive it from the copybook metadata or a configuration parameter
(use an existing method/property that exposes max possible corrupt fields or add
a config lookup/fallback) and pass that computed Option[Int] into the Group
constructor (replace the literal Some(10) with the computed Option). Ensure the
change uses the same symbols: getCorruptFieldsGroup, Group(..., occurs = ...) so
downstream metadata reflects the true upper bound.

Comment on lines +66 to +101
```scala
  /** This is cached value to speedup checking for empty values */
  private val spaceChar: Byte = {
    dataType match {
      case t: AlphaNumeric =>
        t.enc match {
          case Some(EBCDIC) => 0x40
          case Some(ASCII)  => 0x20
          case Some(_)      => 0
          case None         => 0x40
        }
      case t: Integral =>
        t.compact match {
          case Some(COMP3()) => 0x40
          case Some(_)       => 0
          case None =>
            t.enc match {
              case Some(EBCDIC) => 0x40
              case Some(ASCII)  => 0x20
              case Some(_)      => 0
              case None         => 0x40
            }
        }
      case t: Decimal =>
        t.compact match {
          case Some(COMP3()) => 0x40
          case Some(_)       => 0
          case None =>
            t.enc match {
              case Some(EBCDIC) => 0x40
              case Some(ASCII)  => 0x20
              case Some(_)      => 0
              case None         => 0x40
            }
        }
      case _ => 0
    }
```
@coderabbitai bot (Contributor) commented

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's understand how spaceChar is used
rg -A 5 -B 5 "spaceChar" cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/ast/Primitive.scala

Repository: AbsaOSS/cobrix

Length of output: 630


🏁 Script executed:

# Search for usages of spaceChar throughout the codebase
rg "spaceChar" --type scala -A 3 -B 3

Repository: AbsaOSS/cobrix

Length of output: 3940


🏁 Script executed:

# Look for COMP-3 handling in reader/extractor code
rg "COMP3|0x40" cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/ --type scala -A 2 -B 2

Repository: AbsaOSS/cobrix

Length of output: 40


🏁 Script executed:

# Check for empty/null value detection in readers
rg "isEmpty|isNull|empty.*value" cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/ --type scala -A 2 | head -50

Repository: AbsaOSS/cobrix

Length of output: 5846


🏁 Script executed:

# Look for test cases related to COMP-3 empty values
fd ".*Suite\.scala$" cobol-parser/src/test --exec rg -l "COMP3|comp3" {} \;

Repository: AbsaOSS/cobrix

Length of output: 40


🏁 Script executed:

# Find test files mentioning COMP3 or packed decimal
find cobol-parser/src/test -name "*.scala" -exec grep -l "COMP3\|comp3\|packed" {} \;

Repository: AbsaOSS/cobrix

Length of output: 329


🏁 Script executed:

# Check for corrupted field handling related to COMP-3
rg "corrupted|Corrupt" cobol-parser/src/main/scala --type scala -B 2 -A 2 | grep -A 5 -B 5 "COMP3"

Repository: AbsaOSS/cobrix

Length of output: 40


🏁 Script executed:

# Look for isEmpty method implementation and all its usages
rg "def isEmpty" cobol-parser/src/main/scala --type scala -A 10

Repository: AbsaOSS/cobrix

Length of output: 1312


🏁 Script executed:

# Search for any configuration or documentation about null encoding
rg "null.*encod|encod.*null|0x00|zero.*byte" cobol-parser/src/main/scala --type scala | head -20

Repository: AbsaOSS/cobrix

Length of output: 3289


🏁 Script executed:

# Let's look at the BinaryDecoderSpec test
cat cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/decoders/BinaryDecoderSpec.scala | head -100

Repository: AbsaOSS/cobrix

Length of output: 10149


🏁 Script executed:

# Check isEmpty usage in tests
rg "isEmpty" cobol-parser/src/test --type scala -B 3 -A 3

Repository: AbsaOSS/cobrix

Length of output: 24980


🏁 Script executed:

# Look for COMP3 in the test files we found
rg "COMP3|comp3" cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/decoders/BinaryDecoderSpec.scala -B 5 -A 5

Repository: AbsaOSS/cobrix

Length of output: 4525


🏁 Script executed:

# Let's check the full isEmpty method
sed -n '102,130p' cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/ast/Primitive.scala

Repository: AbsaOSS/cobrix

Length of output: 1087


🏁 Script executed:

# Look for the complete Primitive.scala spaceChar definition with context
sed -n '60,110p' cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/ast/Primitive.scala

Repository: AbsaOSS/cobrix

Length of output: 1540


🏁 Script executed:

# Search for any test that calls isEmpty or checks empty values with COMP-3
rg "isEmpty|empty" cobol-parser/src/test --type scala | grep -i "comp3\|packed"

Repository: AbsaOSS/cobrix

Length of output: 40


🏁 Script executed:

# Check for usage of isEmpty method in the codebase
rg "\.isEmpty\(" cobol-parser/src/main --type scala -B 2 | head -40

Repository: AbsaOSS/cobrix

Length of output: 1013


Change COMP‑3 empty byte from 0x40 to 0x00 to properly detect corruption.

COMP-3 is a packed decimal format in which 0x40 is meaningless data. Using 0x40 as the empty indicator causes fields with invalid all-0x40 bytes to be incorrectly classified as empty and excluded from corruption reporting. This contradicts the design principle that null values are encoded as 0x00 for all field types. Changing this to 0x00 aligns with the null encoding scheme and ensures corrupted COMP-3 data is properly detected.

Suggested fix
-          case Some(COMP3()) => 0x40
+          case Some(COMP3()) => 0
-          case Some(COMP3()) => 0x40
+          case Some(COMP3()) => 0
🤖 Prompt for AI Agents
In
`@cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/ast/Primitive.scala`
around lines 66 - 101, The cached spaceChar value in Primitive.scala incorrectly
treats COMP-3 empty byte as 0x40; locate the private val spaceChar (pattern
matching on dataType) and change the COMP3() cases inside the Integral and
Decimal branches from returning 0x40 to returning 0x00 so COMP-3 fields follow
the project-wide null encoding; ensure both occurrences (the t.compact match in
Integral and the t.compact match in Decimal) are updated and run tests to verify
corruption detection now flags all-0x40 packed fields.

Comment on lines +533 to +540
```scala
  private def getCorruptFieldsGroup: Group = {
    val corruptFieldsInGroup = new mutable.ArrayBuffer[Statement]

    corruptFieldsInGroup += Primitive(15, Constants.fieldNameColumn, Constants.fieldNameColumn, 0, AlphaNumeric("X(50)", 50), decode = null, encode = null)(None)
    corruptFieldsInGroup += Primitive(15, Constants.rawValueColumn, Constants.rawValueColumn, 0, AlphaNumeric("X(50)", 50, enc = Some(RAW), compact = Some(COMP4())), decode = null, encode = null)(None)

    Group(10, Constants.corruptFieldsField, Constants.corruptFieldsField, 0, children = corruptFieldsInGroup, occurs = Some(10))(None)
  }
```
@coderabbitai bot (Contributor) commented

⚠️ Potential issue | 🟡 Minor

Hardcoded occurs = Some(10) can under‑report max corrupt fields.
This caps maxElements metadata to 10, which can be misleading (and can truncate flattening logic). Consider deriving the upper bound from the copybook (e.g., max possible corrupt fields) or making it configurable.

🤖 Prompt for AI Agents
In
`@cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/extractors/record/RecordExtractors.scala`
around lines 533 - 540, getCorruptFieldsGroup currently hardcodes occurs =
Some(10) which caps maxElements to 10 and can truncate flattening; update
getCorruptFieldsGroup to compute the occurs value instead of using the literal
10 — e.g., derive it from the copybook metadata or a configuration parameter
(use an existing method/property that exposes max possible corrupt fields or add
a config lookup/fallback) and pass that computed Option[Int] into the Group
constructor (replace the literal Some(10) with the computed Option). Ensure the
change uses the same symbols: getCorruptFieldsGroup, Group(..., occurs = ...) so
downstream metadata reflects the true upper bound.
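A hedged sketch of the parameterized variant suggested above. How the bound would be computed (for example, from the number of primitive fields in the copybook) is an assumption:

```scala
// Hypothetical sketch: take the OCCURS bound as a parameter instead of hardcoding 10.
private def getCorruptFieldsGroup(maxCorruptFields: Int): Group = {
  val corruptFieldsInGroup = new mutable.ArrayBuffer[Statement]

  corruptFieldsInGroup += Primitive(15, Constants.fieldNameColumn, Constants.fieldNameColumn, 0,
    AlphaNumeric("X(50)", 50), decode = null, encode = null)(None)
  corruptFieldsInGroup += Primitive(15, Constants.rawValueColumn, Constants.rawValueColumn, 0,
    AlphaNumeric("X(50)", 50, enc = Some(RAW), compact = Some(COMP4())), decode = null, encode = null)(None)

  // Callers could pass, for example, the number of primitive fields in the copybook.
  Group(10, Constants.corruptFieldsField, Constants.corruptFieldsField, 0,
    children = corruptFieldsInGroup, occurs = Some(maxCorruptFields))(None)
}
```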

@yruslan merged commit 8022849 into master on Feb 11, 2026 (7 checks passed)
@yruslan deleted the feature/723-add-corrupted-fields-field branch on February 11, 2026 09:11


Development

Successfully merging this pull request may close these issues.

Add an option to store casting errors in a separate field
