fix(glue): add ability for glue to skip StorageDescriptor cols #769
laskoviymishka wants to merge 1 commit into apache:main
@zeroshade PTAL. This is a legitimate issue that I think is worth fixing here, and I'll try to port the fix to the Python, Java, and Rust Iceberg implementations too.
zeroshade left a comment
I just have the one question, otherwise this is fine to me. Primarily my concern is that I don't want us to be creating an entirely new catalog property if there might already exist a standard / common usage for this. And if there isn't already something standard for this, is this something we should propose as a standard property to get adopted?
// StorageDescriptor on table create/update. Set to "false" to omit columns,
// which avoids exceeding Glue's API payload limit for tables with very large
// schemas. Note: disabling columns breaks Athena column discovery via Glue.
GlueSchemaColumns = "glue.schema-columns"
is this a known property that exists outside of our usage of it here? or are we creating an entirely new usage/property here?
Fair point: it is something new.
This is a new property, it doesn't exist in iceberg-java or pyiceberg today. However, it follows the established glue.* namespace convention used consistently across implementations (e.g. glue.skip-archive, glue.id, glue.endpoint).
The Java community attempted to introduce something similar twice, as glue.non-current-fields-disabled in apache/iceberg#11334 and apache/iceberg#12664, both targeting the narrower problem of historical/non-current fields. Those PRs stalled and were closed due to inactivity without consensus. I would like to try to get them merged later, since I think this has value.
I can try to get some pre-alignment in Slack, if you can point me in the right direction, so we can start this from a broader context, not just the iceberg-go one.
I think the best route here would be to send an email to the iceberg mailing list, linking this PR, and starting the discussion there.
… columns

Introduces a new catalog property `glue.schema-columns` (default: `true`) that controls whether Iceberg schema columns are written to the Glue StorageDescriptor on table create and update operations.

When set to `false`, the StorageDescriptor is populated with location only, avoiding the Glue API payload size limit that is hit for tables with very large schemas (e.g. ~3600 fields).

Fixes apache#701
f16bce0 to e5f7670
Introduces a new catalog property `glue.schema-columns` (default: `true`) that controls whether Iceberg schema columns are written to the Glue StorageDescriptor on table create and update operations.

When set to `false`, the StorageDescriptor is populated with location only, avoiding the Glue API payload size limit that is hit for tables with very large schemas (e.g. ~3600 fields).

Fixes #701
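
For readers skimming the thread, here is a minimal sketch of the behaviour described above. It is not the code from this PR: `column`, `storageDescriptor`, `includeColumns`, and `buildStorageDescriptor` are hypothetical stand-ins (the real change works with the AWS SDK's Glue types), and falling back to `true` on an unparsable value is an assumption. Only the property key `glue.schema-columns` and its default come from the description.

```go
package main

import (
	"fmt"
	"strconv"
)

// glueSchemaColumns is the property key quoted in the PR description.
const glueSchemaColumns = "glue.schema-columns"

// column and storageDescriptor are hypothetical stand-ins for the Glue SDK types.
type column struct {
	Name string
	Type string
}

type storageDescriptor struct {
	Location string
	Columns  []column
}

// includeColumns reports whether schema columns should be written.
// Assumption: a missing or unparsable value falls back to the default (true).
func includeColumns(props map[string]string) bool {
	raw, ok := props[glueSchemaColumns]
	if !ok {
		return true
	}
	enabled, err := strconv.ParseBool(raw)
	if err != nil {
		return true
	}
	return enabled
}

// buildStorageDescriptor always sets the location and attaches columns
// only when the property allows it.
func buildStorageDescriptor(props map[string]string, location string, cols []column) storageDescriptor {
	sd := storageDescriptor{Location: location}
	if includeColumns(props) {
		sd.Columns = cols
	}
	return sd
}

func main() {
	cols := []column{{Name: "id", Type: "bigint"}, {Name: "name", Type: "string"}}

	withCols := buildStorageDescriptor(map[string]string{}, "s3://bucket/tbl", cols)
	withoutCols := buildStorageDescriptor(
		map[string]string{glueSchemaColumns: "false"}, "s3://bucket/tbl", cols)

	fmt.Println("default:", len(withCols.Columns), "columns")
	fmt.Println("disabled:", len(withoutCols.Columns), "columns")
}
```

Defaulting to `true` keeps existing tables unchanged, so only users who hit the payload limit need to opt out, at the cost of Athena column discovery noted in the review.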