
Core: Track & close FileIO used for remote scan planning#15439

Open
nastra wants to merge 1 commit into apache:main from nastra:remote-planning-close-fileio

Conversation


@nastra nastra commented Feb 25, 2026

This uses an approach similar to what we do in the RESTSessionCatalog with the FileIOTracker: the RESTTableScan is wrapped in a WeakReference, and the attached FileIO instance is closed once the RESTTableScan object is garbage-collected. We don't have close() methods on either Table or Scan, so I can't think of an easier way to keep track of FileIO instances created inside RESTTableScan and close them properly before the finalizer in S3FileIO runs, which would otherwise complain as shown below:

6/02/25 09:25:19 WARN ResolvingFileIO: Unclosed ResolvingFileIO instance created by:
	org.apache.iceberg.io.ResolvingFileIO.<init>(ResolvingFileIO.java:85)
	java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
	java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
	java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
	org.apache.iceberg.common.DynConstructors$Ctor.newInstanceChecked(DynConstructors.java:51)
	org.apache.iceberg.common.DynConstructors$Ctor.newInstance(DynConstructors.java:64)
	org.apache.iceberg.CatalogUtil.loadFileIO(CatalogUtil.java:401)
	org.apache.iceberg.rest.RESTTableScan.fileIOForPlanId(RESTTableScan.java:202)
	org.apache.iceberg.rest.RESTTableScan.planTableScan(RESTTableScan.java:180)
	org.apache.iceberg.rest.RESTTableScan.planFiles(RESTTableScan.java:163)
	org.apache.iceberg.BatchScanAdapter.planFiles(BatchScanAdapter.java:125)
	org.apache.iceberg.spark.source.SparkPartitioningAwareScan.tasks(SparkPartitioningAwareScan.java:185)
	org.apache.iceberg.spark.source.SparkPartitioningAwareScan.taskGroups(SparkPartitioningAwareScan.java:213)
	org.apache.iceberg.spark.source.SparkPartitioningAwareScan.outputPartitioning(SparkPartitioningAwareScan.java:115)
	org.apache.spark.sql.execution.datasources.v2.V2ScanPartitioningAndOrdering$$anonfun$partitioning$1.applyOrElse(V2ScanPartitioningAndOrdering.scala:45)
	org.apache.spark.sql.execution.datasources.v2.V2ScanPartitioningAndOrdering$$anonfun$partitioning$1.applyOrElse(V2ScanPartitioningAndOrdering.scala:43)
	org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:491)
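To make the described pattern concrete, here is a minimal, self-contained sketch of tying a FileIO's lifetime to the scan object that created it. It uses `java.lang.ref.Cleaner` as a stand-in for the WeakReference-based FileIOTracker machinery; class and interface names (`ScanResourceTracker`, `TrackedScan`, the `FileIO` stub) are hypothetical and not Iceberg's actual implementation:

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: close a FileIO-like resource when the object that
// created it becomes unreachable (or is closed explicitly), so the resource
// never survives long enough for a finalizer to warn about it.
public class ScanResourceTracker {
  private static final Cleaner CLEANER = Cleaner.create();

  // Stand-in for org.apache.iceberg.io.FileIO
  interface FileIO extends AutoCloseable {
    @Override
    void close();
  }

  static final class TrackedScan implements AutoCloseable {
    private final Cleaner.Cleanable cleanable;

    TrackedScan(FileIO io) {
      // The cleanup action captures only `io`, never `this` -- capturing the
      // scan itself would keep it reachable and defeat the GC-based cleanup.
      this.cleanable = CLEANER.register(this, io::close);
    }

    @Override
    public void close() {
      cleanable.clean(); // idempotent: runs the cleanup action at most once
    }
  }

  public static void main(String[] args) {
    AtomicBoolean closed = new AtomicBoolean(false);
    FileIO io = () -> closed.set(true);
    try (TrackedScan scan = new TrackedScan(io)) {
      // plan files through the scan ...
    }
    System.out.println("io closed: " + closed.get()); // io closed: true
  }
}
```

If the caller forgets to close the scan, the Cleaner runs the same action after the scan is garbage-collected, which matches the safety-net behavior the PR describes for RESTTableScan.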

I have verified that this works in combination with #15368, where fileIOForPlanId previously wasn't closed properly and triggered the WARN shown above.
I'm currently looking into how to add a unit test for this to TestRESTScanPlanning, but I think this is non-trivial (similar to how it was non-trivial to test this behavior in #7487).

}

/** Cancels the plan on the server (if supported) and closes the plan-scoped FileIO. */
private void cleanupPlanResources() {
Contributor


Maybe the reverse would be nice: when I cancel the plan, cleanupPlanResources gets called. We should also check that closing the streaming iterator cancels the plan (which I think it already does) in addition to closing the FileIO.

Contributor Author


We should also check that closing the streaming iterator cancels the plan (which I think it already does) in addition to closing the FileIO.

Closing the fileIOForPlanId in

won't work because the FileIO instance is still accessed after close() is called on the iterator, which is why we explicitly close fileIOForPlanId only in the two other places where exceptions occur.
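A minimal sketch of that exception-path cleanup: the plan-scoped FileIO must outlive the iterator's close() because the returned tasks still read through it, so it is closed eagerly only when planning fails and is otherwise left to the GC-based tracker. The names here (`PlanCleanupSketch`, `planWithCleanup`) are hypothetical stand-ins, not Iceberg APIs:

```java
// Hypothetical illustration of closing a plan-scoped FileIO only on the
// failure path, as described above.
public class PlanCleanupSketch {
  // Stand-in for org.apache.iceberg.io.FileIO
  interface FileIO extends AutoCloseable {
    @Override
    void close();
  }

  static String planWithCleanup(FileIO planIO, boolean fail) {
    try {
      if (fail) {
        throw new RuntimeException("planning failed");
      }
      // On success the FileIO stays open: the returned tasks are still read
      // through it, so the WeakReference-style tracker closes it later.
      return "tasks";
    } catch (RuntimeException e) {
      planIO.close(); // eager close only when planning fails
      throw e;
    }
  }
}
```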

