Add BaseFormatModelTest for FormatModel implementations #15441

joyhaldar wants to merge 2 commits into apache:main from
Conversation
Force-pushed from 9abf7c7 to 1e3e8a7
```java
protected abstract Class<W> writeType();

protected abstract Class<R> readType();
```
Why do we have different read and write types?
I would expect that we use generic Records for one, and the model-specific type for the other.
```java
EqualityDeleteWriter<W> writer =
    writerBuilder
        .schema(TestBase.SCHEMA)
        .engineSchema(writeEngineSchema(TestBase.SCHEMA))
```
Why is this added?
I remember similar issues when I was working on the Spark model, but I also remember fixing it.
Do we need this at this point?
Tests fail for AVRO without `engineSchema` with the error `java.lang.IllegalArgumentException: Invalid struct: null is not a struct`.
When I checked the code:
- For AVRO, `engineSchema` is passed directly to `SparkAvroWriter` with no null fallback (`SparkFormatModels.java` line 43)
- For Parquet, `SparkParquetWriters.buildWriter` has a fallback: if `engineSchema` is null, it converts from `icebergSchema` (`SparkParquetWriters.java` line 89)

This is according to my understanding; please correct me if I am incorrect. Should I keep `engineSchema` in the tests, or should AVRO have a similar fallback?
We should create a similar fallback for Avro in an independent PR.
This is why these tests are good!
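To make the fallback idea concrete, here is a minimal, illustrative-only sketch of the pattern described above: use the explicit engine schema when the caller supplies one, otherwise derive it from the Iceberg schema, mirroring what `SparkParquetWriters.buildWriter` reportedly does for Parquet. None of the names below are Iceberg's actual API; the schema types are stand-ins.

```java
import java.util.Objects;
import java.util.function.Function;

public class EngineSchemaFallback {
  // Resolve the engine schema: prefer the explicit one, otherwise convert
  // from the Iceberg schema. I and E are stand-ins for the real schema types.
  static <I, E> E resolve(E engineSchema, I icebergSchema, Function<I, E> convert) {
    return engineSchema != null
        ? engineSchema
        : convert.apply(Objects.requireNonNull(icebergSchema, "icebergSchema"));
  }

  public static void main(String[] args) {
    // Toy stand-ins: the "Iceberg schema" is a String, the "engine schema"
    // is its upper-case form produced by the converter.
    String derived = resolve(null, "struct<id:int>", String::toUpperCase);
    String explicit = resolve("EXPLICIT", "struct<id:int>", String::toUpperCase);
    System.out.println(derived);
    System.out.println(explicit);
  }
}
```

With this shape, the Avro write path would no longer throw when `engineSchema` is absent, and the tests could drop the explicit `engineSchema(...)` call.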
```java
protected abstract Object readEngineSchema(Schema schema);

protected abstract List<W> testRecords();
```
I would leave the responsibility of generating the test records to the base class. We might want to add different test data later (like an array of maps, or a struct of arrays), and I would love to see those tests run automatically.
Maybe something like the Flink DataGenerators could help here. Or a simple converter method?
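A hedged sketch of the "converter method" idea: the base class owns the (deterministic) test data, and each engine subclass contributes only a conversion to its own row type, so new test data added to the base class reaches every engine automatically. All names here are illustrative, not from the PR; a toy String "engine" stands in for Spark's `InternalRow` or Flink's `RowData`.

```java
import java.util.List;
import java.util.stream.Collectors;

public class ConverterSketch {
  abstract static class BaseModelTest<W> {
    // Stand-in for something like RandomGenericData.generate(schema, count, seed);
    // deterministic so every engine is exercised with the same data.
    protected List<String> genericRecords() {
      return List.of("record-0", "record-1", "record-2");
    }

    // The only engine-specific piece: a converter, not the data itself.
    protected abstract W toEngineRow(String generic);

    final List<W> testRecords() {
      return genericRecords().stream().map(this::toEngineRow).collect(Collectors.toList());
    }
  }

  // A toy "engine" whose row type is an upper-cased String.
  static class UpperCaseModelTest extends BaseModelTest<String> {
    @Override
    protected String toEngineRow(String generic) {
      return generic.toUpperCase();
    }
  }

  public static void main(String[] args) {
    System.out.println(new UpperCaseModelTest().testRecords());
  }
}
```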
```java
InputFile inputFile = encryptedFile.encryptingOutputFile().toInputFile();
List<R> readRecords;
try (CloseableIterable<R> reader =
    FormatModelRegistry.readBuilder(fileFormat, readType(), inputFile)
```
We don't need an engine-specific reader for the positional deletes. We can just read with the generic reader.
```java
protected abstract void assertEquals(Types.StructType struct, List<W> expected, List<R> actual);

protected abstract List<W> expectedPositionDeletes(Schema schema);
```
We don't need to test engine-specific readers for position deletes. It is enough if the engine-specific write is tested with the generic read.
…ations with Generic, Spark, and Flink tests

Force-pushed from 1e3e8a7 to 42ab761
rambleraptor left a comment:
Loving the direction this is going!
```java
@ParameterizedTest
@FieldSource("FILE_FORMATS")
public void testDataWriterRoundTrip(FileFormat fileFormat) throws IOException {
```
What would you think about creating a roundTrip method (or possibly several depending on the types)? Most of these roundTrip methods are trying to do the same things.
My gut feeling is that we'd use the roundTrip methods on a lot of different tests.
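One possible shape for that shared helper, sketched with purely illustrative names (nothing here is Iceberg's API): the writer and reader become parameters, so the same `roundTrip` method can serve data files, equality deletes, and position deletes, and each test reduces to "write these records, read them back, compare".

```java
import java.util.List;
import java.util.function.Function;

public class RoundTripSketch {
  // Generic round trip: serialize with the given writer, then deserialize
  // with the given reader. W is the write-side row type, R the read-side.
  static <W, R> List<R> roundTrip(
      List<W> records, Function<List<W>, byte[]> write, Function<byte[], List<R>> read) {
    return read.apply(write.apply(records));
  }

  public static void main(String[] args) {
    // Toy "file format": join with commas on write, split on read.
    List<String> back = roundTrip(
        List.of("a", "b", "c"),
        rs -> String.join(",", rs).getBytes(),
        bytes -> List.of(new String(bytes).split(",")));
    System.out.println(back);
  }
}
```

In the real tests the two functions would presumably wrap the engine-specific writer builder and the generic `FormatModelRegistry` reader.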
```java
public class TestGenericFormatModels {
  private static final List<Record> TEST_RECORDS =
      RandomGenericData.generate(TestBase.SCHEMA, 10, 1L);

public abstract class TestBaseFormatModel<T> {
```
Make sure that the visibility modifiers are as strict as possible for classes, methods, and attributes.
Adds a base test class and tests for FormatModel implementations.

Changes
- `BaseFormatModelTest<T>` - base test class supporting the engine type `T`
- `TestSparkFormatModel` - Spark `InternalRow` round-trip tests
- `TestFlinkFormatModel` - Flink `RowData` round-trip tests