Spark: Add sort_by parameter to rewrite_manifests procedure #15467

Open
hemanthboyina wants to merge 1 commit into apache:main from hemanthboyina:rewrite_manifests_sortby

@hemanthboyina
Contributor

This PR adds a sort_by parameter to the rewrite_manifests stored procedure, exposing the sortBy functionality
that was added to RewriteManifestsSparkAction. Currently, custom manifest clustering by partition fields
is only accessible through the Java API. This change allows SQL users to specify which partition fields to cluster
manifests by, which can reduce scan planning time by enabling Spark to skip manifests that don't contain relevant
partition values.

Example:
CALL catalog.system.rewrite_manifests(table => 'db.sample', sort_by => array('category'));
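
To illustrate why clustering manifests by a partition field can help planning, here is a toy model in Python. It is not Iceberg's implementation: the `Manifest` class, `build_manifests`, and `manifests_to_read` are hypothetical names invented for this sketch, and it assumes each manifest only records the lowest and highest `category` value of the files it tracks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Manifest:
    """Hypothetical manifest: tracks files plus min/max of their partition value."""
    lower: str
    upper: str
    files: tuple


def build_manifests(categories, per_manifest):
    """Group file partition values into manifests of at most `per_manifest` entries."""
    chunks = [
        categories[i:i + per_manifest]
        for i in range(0, len(categories), per_manifest)
    ]
    return [Manifest(min(c), max(c), tuple(c)) for c in chunks]


def manifests_to_read(manifests, value):
    """Keep only manifests whose [lower, upper] range could contain `value`."""
    return [m for m in manifests if m.lower <= value <= m.upper]


# Partition value of each data file, in arrival order (made-up data).
files = ["books", "toys", "garden"] * 3

unsorted = build_manifests(files, per_manifest=3)
clustered = build_manifests(sorted(files), per_manifest=3)

# A filter on category = 'garden' must open every unsorted manifest,
# but only one manifest once files are clustered by category.
print(len(manifests_to_read(unsorted, "garden")))
print(len(manifests_to_read(clustered, "garden")))
```

In this sketch, every unsorted manifest spans the full range of categories and none can be skipped, while clustering narrows each manifest's range so a filter on one category touches a single manifest.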
