Skip to content

26.1 Antalya port - Timezone for partitioning#1453

Merged
zvonand merged 11 commits intoantalya-26.1from
frontport/antalya-26.1/timezone_for_partitioning
Mar 2, 2026
Merged

26.1 Antalya port - Timezone for partitioning#1453
zvonand merged 11 commits intoantalya-26.1from
frontport/antalya-26.1/timezone_for_partitioning

Conversation

@ianton-ru
Copy link

@ianton-ru ianton-ru commented Feb 26, 2026

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Frontport for Antalya 26.1

Documentation entry for user-facing changes

...

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • Tiered Storage (2h)

@ianton-ru ianton-ru added antalya port-antalya PRs to be ported to all new Antalya releases antalya-26.1 labels Feb 26, 2026
@github-actions
Copy link

github-actions bot commented Feb 26, 2026

Workflow [PR], commit [918ab34]

@zvonand
Copy link
Collaborator

zvonand commented Mar 2, 2026

test_storage_delta/test.py::test_network_activity_with_system_tables is failing in multiple PRs

@zvonand
Copy link
Collaborator

zvonand commented Mar 2, 2026

test_storage_delta/test.py::test_network_activity_with_system_tables fails in many PRs

@zvonand zvonand merged commit 3e0d716 into antalya-26.1 Mar 2, 2026
307 of 313 checks passed
@CarlosFelipeOR
Copy link
Collaborator

QA Verification

PR Summary

This PR adds the iceberg_partition_timezone setting to support correct date/time partitioning in Iceberg tables when the server timezone differs from the data timezone.

Integration Tests (Altinity/ClickHouse CI)

PR test passed (6/6 runs across x86 and arm64):

  • test_database_iceberg/test_partition_timezone.py::test_partition_timezone — PASSED on amd_binary (2/5, 3/5) and arm_binary (1/4)

Backward compatibility: Existing Iceberg integration tests (test_database_iceberg/, test_storage_iceberg/) do not use the new iceberg_partition_timezone setting (default = empty) and all passed, confirming no regression on existing behavior.

Unrelated failures:

Test / Job Status Root Cause
test_storage_delta/test.py::test_network_activity_with_system_tables FAIL Regression from PR #1432 (confirmed by developer). Fix tracked in #1474
Integration tests (arm_binary, distributed plan, 3/4) CANCELLED / ERROR ARM CI infrastructure timeout issue, being fixed by PR #1466. Affects all antalya-26.1 PRs

Regression Tests (Altinity/clickhouse-regression)

No iceberg_partition_timezone tests exist in the regression suites yet. PR #1453 is covered by its own integration test.

All failures are unrelated and pre-existing:

Suite x86 arm64 Reason
iceberg_1, iceberg_2 FAIL FAIL Antalya features still being forward-ported to 26.1
parquet FAIL FAIL Antalya features still being forward-ported to 26.1
s3_minio_export_partition FAIL FAIL Antalya features still being forward-ported to 26.1
s3_minio_export_part FAIL FAIL Antalya features still being forward-ported to 26.1
swarms FAIL FAIL Antalya features still being forward-ported to 26.1
settings FAIL FAIL Not updated for 26.1 yet
version FAIL FAIL Fixed in PR #1377, build was not up to date when regression was run
ice FAIL FAIL New suite, not ready
aggregate_functions_3 FAIL PASS Flaky test (rankCorrState snapshot), being addressed by the QA team

Test Coverage Notes

The integration test covers the primary scenario: DayTransform on TimestampType with a positive timezone offset (Asia/Istanbul, UTC+3) and verifies both data reading and partition pruning with iceberg_partition_timezone=UTC.

Not covered by the current test: other temporal transforms (hour, month, year), TimestamptzType, and negative timezone offsets. These are acceptable gaps for a forward-port of an existing feature and can be addressed with additional tests in the future.

Conclusion

All CI failures are justified and unrelated to PR #1453. The PR's integration test passes on both x86 and arm64. Backward compatibility is confirmed by existing Iceberg tests passing without the new setting.

@CarlosFelipeOR CarlosFelipeOR added the verified Verified by QA label Mar 4, 2026
@CarlosFelipeOR
Copy link
Collaborator

PR #1453 Audit Review

AI review comment (model: gpt-5.3-codex)

1) Scope and partitions

PR introduces Iceberg partition-timezone support and test harness updates across 14 files.
Functional partitioning used:

  • Partition A (runtime Iceberg logic)
    src/Storages/ObjectStorage/DataLakes/Iceberg/Utils.*
    src/Storages/ObjectStorage/DataLakes/Iceberg/ManifestFilesPruning.*
    src/Storages/ObjectStorage/DataLakes/Iceberg/ManifestFile.cpp
    src/Storages/ObjectStorage/DataLakes/Iceberg/ChunkPartitioner.*
  • Partition B (settings/compatibility)
    src/Core/Settings.cpp
    src/Core/SettingsChangesHistory.cpp
  • Partition C (integration/test infra)
    compose ports + cluster env wiring + new integration test and configs in tests/integration/test_database_iceberg/*

Audit focus:

  • transform parsing and argument-shape consistency,
  • pruning/read/insert path semantic consistency,
  • error-contract consistency,
  • concurrency/locking/regression safety,
  • test coverage relevance.

2) Call graph

Main changed flow:

  • Setting read: Context::getSettingsRef()[Setting::iceberg_partition_timezone]
  • Transform decode: parseTransformAndArgument(transform_name, time_zone)
  • Read/pruning path:
    ManifestFileContent::ManifestFileContent(...) -> getASTFromTransform(...) -> partition key AST / pruning condition
  • Insert partitioning path:
    ChunkPartitioner::ChunkPartitioner(...) -> ChunkPartitioner::partitionChunk(...) (adds timezone const arg)
  • Sort-key path:
    getSortingKeyDescriptionFromMetadata(...) -> string expression build -> KeyDescription::parse(...)

Integration boundaries:

  • Iceberg metadata JSON,
  • function resolution via FunctionFactory,
  • parser/planner via KeyDescription::parse,
  • REST/minio docker wiring in integration setup.

3) Transition matrix

ID Transition Invariant
T1 transform string -> TransformAndArgument temporal transforms carry timezone only when configured
T2 TransformAndArgument -> pruning AST generated function arg order matches function signature
T3 TransformAndArgument -> insert partition function args partition key calculation is timezone-consistent
T4 TransformAndArgument -> sort-key expression string generated SQL expression is syntactically valid
T5 new setting registration/history compatibility machinery remains valid

4) Logical code-path testing summary

Reviewed branches:

  • temporal transforms (year/month/day/hour) now include optional timezone.
  • non-temporal transforms (identity/void/bucket/truncate) keep previous argument semantics.
  • unknown/malformed transforms still return nullopt or throw where expected.
  • pruning AST generation now appends timezone literal argument when present.
  • insert partitioning now appends timezone constant column when present.
  • sort-key generation appends timezone to textual expression.

Handled-failure behavior:

  • unknown transform in pruning path logs warning + no AST for that transform.
  • malformed bucket/truncate remains fail-closed (nullopt/exception path).

Concurrency/timing:

  • no new shared mutable state, locks, or lock-order changes in reviewed runtime code.

5) Fault categories and category-by-category injection results

Fault category Status Outcome Defects
Temporal transform argument-shape consistency Executed Fail 1
SQL expression serialization correctness Executed Fail 1 (same root cause)
Unknown/malformed transform handling Executed Pass 0
Error-contract consistency Executed Pass 0
Concurrency/race/deadlock Not Applicable N/A 0
C++ lifetime/iterator/ownership Executed Pass 0
Partial-update/rollback safety Executed Pass 0
Performance/resource failure checks Executed Pass 0

6) Confirmed defects (High/Medium/Low)

Medium — Unquoted timezone in sort-key expression generation

  • Impact: transformed Iceberg sort-order parsing/planning can fail when iceberg_partition_timezone is non-empty.
  • Location: src/Storages/ObjectStorage/DataLakes/Iceberg/Utils.cpp, function getSortingKeyDescriptionFromMetadata.
  • Trigger condition: non-empty iceberg_partition_timezone + temporal sort transform (e.g. day) in metadata.
  • Affected transition: T4.
  • Why this is a defect: timezone is appended as raw token (identifier-like), not as SQL string literal.
  • Proof sketch: expression built as toRelativeDayNum(column, UTC) instead of toRelativeDayNum(column, 'UTC') before KeyDescription::parse(...).
  • Smallest logical repro:
    1. set iceberg_partition_timezone='UTC';
    2. load Iceberg metadata with temporal transformed sort-order;
    3. sort-key parse/planning path consumes malformed expression tokenization.
  • Likely fix direction: quote/escape timezone in expression string (or generate AST instead of string concatenation).
  • Regression test direction: add test for transformed sort-order + timezone setting ensuring sort-key description parses successfully.
  • Blast radius: Iceberg read/planning path for tables with transformed sort order.

Code snippet demonstrating defect condition:

if (clickhouse_transform_name->time_zone)
    full_argument += ", " + *clickhouse_transform_name->time_zone;
...
return KeyDescription::parse(order_by_str, column_description, local_context, true);

7) Coverage accounting + stop-condition status

  • Call-graph nodes covered: all changed runtime nodes in partitions A/B/C.
  • Transitions covered: T1-T5.
  • Fault categories: all executed or explicitly N/A.
  • Coverage stop-condition: met (all in-scope nodes/transitions/categories reviewed or justified N/A).

8) Assumptions & Limits

  • Static audit only (no local runtime execution in this pass).
  • Behavioral confirmation for all data type combinations needs dynamic runs.
  • CI outcome and sanitizer runtime evidence were not re-executed here.

9) Confidence rating and confidence-raising evidence

  • Overall confidence: Medium-High.
  • To raise confidence to High:
    • execute integration/regression for Iceberg transformed sort-order under timezone override,
    • run sanitizer-backed jobs touching this path.

10) Residual risks and untested paths

  • Residual risk: temporal transform behavior across all type variants (Date/DateTime/DateTime64) without runtime matrix.
  • Untested path: sort-order metadata edge cases combined with timezone override in heterogeneous schemas.

@ianton-ru
Copy link
Author

@CarlosFelipeOR
"Unquoted timezone in sort-key expression generation" - yes, need to fix

@CarlosFelipeOR
Copy link
Collaborator

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

antalya antalya-26.1 port-antalya PRs to be ported to all new Antalya releases verified-with-issue Verified by QA and issue(s) found.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants