
Core: Track & close FileIO used for remote scan planning#15439

Open
nastra wants to merge 1 commit into apache:main from nastra:remote-planning-close-fileio

Conversation


@nastra nastra commented Feb 25, 2026

This uses an approach similar to what we do in the RESTSessionCatalog with the FileIOTracker: the RESTTableScan is wrapped in a WeakReference, and the attached FileIO instance is closed once the RESTTableScan object is garbage-collected. We don't have close() methods on either Table or Scan, so I can't think of an easier way to keep track of FileIO instances created inside RESTTableScan and close them properly before the finalizer in S3FileIO runs, which would otherwise complain as shown below:

6/02/25 09:25:19 WARN ResolvingFileIO: Unclosed ResolvingFileIO instance created by:
	org.apache.iceberg.io.ResolvingFileIO.<init>(ResolvingFileIO.java:85)
	java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
	java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
	java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
	org.apache.iceberg.common.DynConstructors$Ctor.newInstanceChecked(DynConstructors.java:51)
	org.apache.iceberg.common.DynConstructors$Ctor.newInstance(DynConstructors.java:64)
	org.apache.iceberg.CatalogUtil.loadFileIO(CatalogUtil.java:401)
	org.apache.iceberg.rest.RESTTableScan.fileIOForPlanId(RESTTableScan.java:202)
	org.apache.iceberg.rest.RESTTableScan.planTableScan(RESTTableScan.java:180)
	org.apache.iceberg.rest.RESTTableScan.planFiles(RESTTableScan.java:163)
	org.apache.iceberg.BatchScanAdapter.planFiles(BatchScanAdapter.java:125)
	org.apache.iceberg.spark.source.SparkPartitioningAwareScan.tasks(SparkPartitioningAwareScan.java:185)
	org.apache.iceberg.spark.source.SparkPartitioningAwareScan.taskGroups(SparkPartitioningAwareScan.java:213)
	org.apache.iceberg.spark.source.SparkPartitioningAwareScan.outputPartitioning(SparkPartitioningAwareScan.java:115)
	org.apache.spark.sql.execution.datasources.v2.V2ScanPartitioningAndOrdering$$anonfun$partitioning$1.applyOrElse(V2ScanPartitioningAndOrdering.scala:45)
	org.apache.spark.sql.execution.datasources.v2.V2ScanPartitioningAndOrdering$$anonfun$partitioning$1.applyOrElse(V2ScanPartitioningAndOrdering.scala:43)
	org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:491)
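To make the described pattern concrete, here is a minimal, self-contained sketch of tying a FileIO's lifetime to the scan object that created it. It uses `java.lang.ref.Cleaner` as a stand-in for the WeakReference-based FileIOTracker machinery; class and interface names (`ScanResourceTracker`, `TrackedScan`, the `FileIO` stub) are hypothetical and not Iceberg's actual implementation:

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: close a FileIO-like resource when the object that
// created it becomes unreachable (or is closed explicitly), so the resource
// never survives long enough for a finalizer to warn about it.
public class ScanResourceTracker {
  private static final Cleaner CLEANER = Cleaner.create();

  // Stand-in for org.apache.iceberg.io.FileIO
  interface FileIO extends AutoCloseable {
    @Override
    void close();
  }

  static final class TrackedScan implements AutoCloseable {
    private final Cleaner.Cleanable cleanable;

    TrackedScan(FileIO io) {
      // The cleanup action captures only `io`, never `this` -- capturing the
      // scan itself would keep it reachable and defeat the GC-based cleanup.
      this.cleanable = CLEANER.register(this, io::close);
    }

    @Override
    public void close() {
      cleanable.clean(); // idempotent: runs the cleanup action at most once
    }
  }

  public static void main(String[] args) {
    AtomicBoolean closed = new AtomicBoolean(false);
    FileIO io = () -> closed.set(true);
    try (TrackedScan scan = new TrackedScan(io)) {
      // plan files through the scan ...
    }
    System.out.println("io closed: " + closed.get()); // io closed: true
  }
}
```

If the caller forgets to close the scan, the Cleaner runs the same action after the scan is garbage-collected, which matches the safety-net behavior the PR describes for RESTTableScan.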

I have verified that this works in combination with #15368, where fileIOForPlanId previously wasn't closed properly and triggered the WARN shown above.
I'm currently looking into how to add a unit test for this to TestRESTScanPlanning, but I think this is non-trivial (similar to how it was non-trivial to test this behavior in #7487).

}

/** Cancels the plan on the server (if supported) and closes the plan-scoped FileIO. */
private void cleanupPlanResources() {
Contributor


Maybe the reverse would be nice: when I cancel the plan, cleanupPlanResources gets called. We should also check that closing the streaming iterator cancels the plan (which I think it already does) in addition to closing the FileIO.

Contributor Author


We should also check that closing the streaming iterator cancels the plan (which I think it already does) in addition to closing the FileIO.

Closing the fileIOForPlanId in

won't work because the FileIO instance is still accessed after close() is called on the iterator, which is why we explicitly close fileIOForPlanId only in the two other places where exceptions occur.
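A minimal sketch of that exception-path cleanup: the plan-scoped FileIO must outlive the iterator's close() because the returned tasks still read through it, so it is closed eagerly only when planning fails and is otherwise left to the GC-based tracker. The names here (`PlanCleanupSketch`, `planWithCleanup`) are hypothetical stand-ins, not Iceberg APIs:

```java
// Hypothetical illustration of closing a plan-scoped FileIO only on the
// failure path, as described above.
public class PlanCleanupSketch {
  // Stand-in for org.apache.iceberg.io.FileIO
  interface FileIO extends AutoCloseable {
    @Override
    void close();
  }

  static String planWithCleanup(FileIO planIO, boolean fail) {
    try {
      if (fail) {
        throw new RuntimeException("planning failed");
      }
      // On success the FileIO stays open: the returned tasks are still read
      // through it, so the WeakReference-style tracker closes it later.
      return "tasks";
    } catch (RuntimeException e) {
      planIO.close(); // eager close only when planning fails
      throw e;
    }
  }
}
```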

