fix(glue): add ability for glue to skip StorageDescriptor cols #769

Open
laskoviymishka wants to merge 1 commit into apache:main from laskoviymishka:fix/glue-large-schema-payload

@laskoviymishka
Contributor

Introduces a new catalog property glue.schema-columns (default: true) that controls whether Iceberg schema columns are written to the Glue StorageDescriptor on table create and update operations.

When set to false, the StorageDescriptor is populated with location only, avoiding the Glue API payload size limit that is hit for tables with very large schemas (e.g. ~3600 fields).
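
The behaviour described above could be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `column`, `storageDescriptor`, and `buildStorageDescriptor` are hypothetical stand-ins for the Glue SDK types and the catalog's internal helper, and the bucket path is made up.

```go
package main

import "fmt"

// column and storageDescriptor are hypothetical stand-ins for the Glue SDK
// types; they are NOT the real AWS SDK structs.
type column struct {
	Name string
	Type string
}

type storageDescriptor struct {
	Location string
	Columns  []column
}

// buildStorageDescriptor always sets the location and attaches the schema
// columns only when includeColumns is true.
func buildStorageDescriptor(location string, cols []column, includeColumns bool) storageDescriptor {
	sd := storageDescriptor{Location: location}
	if includeColumns {
		sd.Columns = cols
	}
	return sd
}

func main() {
	cols := []column{{Name: "id", Type: "bigint"}, {Name: "name", Type: "string"}}

	with := buildStorageDescriptor("s3://example-bucket/tbl", cols, true)
	without := buildStorageDescriptor("s3://example-bucket/tbl", cols, false)

	// The second descriptor carries the location only, so its size does not
	// grow with the number of schema fields.
	fmt.Println(len(with.Columns), len(without.Columns))
}
```

With the flag off, the request payload no longer scales with the schema width, which is the point of the change for tables with thousands of fields.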

Fixes #701

@laskoviymishka
Contributor Author

@zeroshade PTAL. This is a legitimate issue that I think is worth fixing here, and I'll try to port the fix to the Python, Java, and Rust Iceberg implementations too.

@laskoviymishka marked this pull request as ready for review March 4, 2026 22:16
Member

@zeroshade zeroshade left a comment

I just have the one question, otherwise this is fine to me. Primarily my concern is that I don't want us to be creating an entirely new catalog property if there might already exist a standard / common usage for this. And if there isn't already something standard for this, is this something we should propose as a standard property to get adopted?

// StorageDescriptor on table create/update. Set to "false" to omit columns,
// which avoids exceeding Glue's API payload limit for tables with very large
// schemas. Note: disabling columns breaks Athena column discovery via Glue.
GlueSchemaColumns = "glue.schema-columns"
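
For readers following along, one way such a property might be read from the catalog properties is sketched below. This is an assumption-laden illustration, not the PR's implementation: `schemaColumnsEnabled` is a hypothetical helper, and falling back to the default on an unparsable value is a choice made for this sketch only.

```go
package main

import (
	"fmt"
	"strconv"
)

// glueSchemaColumns mirrors the property key shown in the diff above.
const glueSchemaColumns = "glue.schema-columns"

// schemaColumnsEnabled is a hypothetical helper. It returns true when the
// property is absent (the documented default) and otherwise parses the value
// as a boolean.
func schemaColumnsEnabled(props map[string]string) bool {
	v, ok := props[glueSchemaColumns]
	if !ok {
		return true
	}
	enabled, err := strconv.ParseBool(v)
	if err != nil {
		// Assumption for this sketch: an unparsable value keeps the default.
		return true
	}
	return enabled
}

func main() {
	fmt.Println(schemaColumnsEnabled(map[string]string{}))
	fmt.Println(schemaColumnsEnabled(map[string]string{glueSchemaColumns: "false"}))
}
```
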
Member

Is this a known property that exists outside of our usage of it here, or are we creating an entirely new usage/property?

Contributor Author

Fair point, it's something new.

This is a new property; it doesn't exist in iceberg-java or pyiceberg today. However, it follows the established glue.* namespace convention used consistently across implementations (e.g. glue.skip-archive, glue.id, glue.endpoint).

The Java community attempted to introduce something similar twice, as glue.non-current-fields-disabled in apache/iceberg#11334 and apache/iceberg#12664, both targeting the narrower problem of historical/non-current fields. Both PRs stalled and were closed for inactivity without reaching consensus. I'll try to get them merged later, since I think this has value.

Contributor Author

I can try to get some pre-alignment on Slack if you can point me in the right direction, so we can start this discussion from the broader context rather than just iceberg-go.

Member

I think the best route here would be to send an email to the iceberg mailing list, linking this PR, and starting the discussion there.

… columns

Introduces a new catalog property `glue.schema-columns` (default: `true`)
that controls whether Iceberg schema columns are written to the Glue
StorageDescriptor on table create and update operations.

When set to `false`, the StorageDescriptor is populated with location only,
avoiding the Glue API payload size limit that is hit for tables with very
large schemas (e.g. ~3600 fields).

Fixes apache#701
@laskoviymishka force-pushed the fix/glue-large-schema-payload branch from f16bce0 to e5f7670 March 5, 2026 21:35

Development

Successfully merging this pull request may close these issues.

updateTable using glue with a large schema fails
