Skip to content

expr: Make sqlfunc macro generic over container element types#35431

Open
antiguru wants to merge 11 commits intoMaterializeInc:mainfrom
antiguru:sqlfunc_generic
Open

expr: Make sqlfunc macro generic over container element types#35431
antiguru wants to merge 11 commits intoMaterializeInc:mainfrom
antiguru:sqlfunc_generic

Conversation

@antiguru
Copy link
Member

@antiguru antiguru commented Mar 11, 2026

Add phantom type parameters to DatumList, DatumMap, and Array (defaulting to Datum<'a>) so the #[sqlfunc] proc macro can track which input type parameter flows to the output.
The macro detects generic type parameters like T in fn foo<T>(a: DatumList<T>) -> T, classifies their structural position (bare, in list, in array, in map, in range), and auto-derives output_type_expr.
Since T defaults to Datum<'a>, the emitted function compiles without type erasure.

Changes in mz_repr:

  • Add PhantomData<fn() -> T> to DatumList, DatumMap; propagate T through Array via its DatumList field
  • Add private constructors DatumList::new(), DatumMap::new() to centralize PhantomData bookkeeping
  • Add FromDatum trait with Borrow<Datum<'a>> supertrait, replacing the previous IntoDatum trait
  • Add typed_iter() on DatumList and DatumMap yielding typed T elements via FromDatum
  • Add RowArena::make_datum_list() accepting impl IntoIterator<Item = T> for type-safe list construction

Changes in mz_expr_derive_impl:

  • Add classify_generic_usage to detect how T appears in types (bare, in container, absent)
  • Add derive_output_type_for_generic covering all container-to-container, container-to-bare, bare-to-bare, and cross-container derivations
  • Extract container type name constants with compile-time test verifying they match actual type names
  • Support multiple generic type parameters (e.g., <A, B>)

Migrated functions (removing manual output_type_expr):

  • Lists: list_list_concat, list_element_concat, element_list_concat, list_remove
  • Arrays: cast_array_to_list_one_dim, array_index
  • Maps: map_get_value
  • Ranges: range_lower, range_upper, range_empty, range_lower_inc, range_upper_inc, range_lower_inf, range_upper_inf, range_union, range_intersection

Other:

  • Update builtin view descriptors for more precise list nullability
  • Add design doc at doc/developer/design/20260311_sqlfunc_generic.md

🤖 Generated with Claude Code

Add phantom type parameters to DatumList, DatumMap, and Array
(defaulting to Datum<'a>) so the #[sqlfunc] proc macro can track
which input type parameter flows to the output. The macro detects
generic type parameters like T in `fn foo<T>(a: DatumList<T>) -> T`,
auto-derives the output_type_expr, and erases T → Datum<'a> in the
emitted code.

Supports multiple generic type parameters (e.g., `<A, B>`) and
all container types: DatumList, Array, DatumMap, Range.

Also adds DatumList::iter_typed() which yields typed T elements
via a FromDatum trait, enabling compile-time type safety while
monomorphizing to identity at runtime.

Migrates range_lower, range_upper, list_list_concat,
list_element_concat, element_list_concat, list_remove, and
map_get_value to use the new generic syntax.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Copy link

Thanks for opening this PR! Here are a few tips to help make the review process smooth for everyone.

PR title guidelines

  • Use imperative mood: "Fix X" not "Fixed X" or "Fixes X"
  • Be specific: "Fix panic in catalog sync when controller restarts" not "Fix bug" or "Update catalog code"
  • Prefix with area if helpful: compute: , storage: , adapter: , sql:

Pre-merge checklist

  • The PR title is descriptive and will make sense in the git log.
  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).

antiguru and others added 8 commits March 11, 2026 16:40
* Add typed map iterator (`DatumDictTypedIter`) wrapping `DatumDictIter`
  and `DatumMap::iter_typed()` method
* Add `RowArena::make_datum_list()` returning typed `DatumList<'a, T>`
* Migrate list concat/remove functions to use `make_datum_list`
* Change `map_get_value` return type to `Option<T>` using `iter_typed()`
* Make `DatumList::empty()` and `DatumMap::empty()` generic over T
* Make `Debug` for `DatumMap` generic over T

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add insta tests showing macro expansion and erasure for:
* Unary generic over Range<T> (range_lower)
* Binary generic over DatumList<T> (list_list_concat)
* Multi-generic binary with A, B (multi_generic)

Also addresses remaining PR feedback:
* Add DatumDictTypedIter wrapping DatumDictIter (no duplication)
* Add RowArena::make_datum_list for typed list construction
* Use iter_typed() in list/map functions
* Make DatumList/DatumMap::empty() and DatumMap Debug generic
* Fix line overflow in quote! macros

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Instead of erasing T → Datum<'a> in the emitted function, keep the
generic signature intact so users see readable, type-safe code.

Split FromDatum into two traits:
* FromDatum<'a>: converts Datum<'a> → T (for iteration)
* IntoDatum<'a>: converts T → Datum<'a> (for packing)

Add RowPacker::push_datum and push_list_elems for generic element
types. Users add FromDatum<'a> bounds to their function signatures
where needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
`make_datum_list` previously accepted a `FnOnce(&mut RowPacker)` closure,
which made it easy to accidentally double-wrap elements (e.g., calling
`push_list_elems` inside the closure would create a nested list).
Change it to accept an `IntoIterator<Item = T>` instead, which
guarantees only elements of type `T` are pushed and prevents the
double-wrapping bug that caused `list_list_concat` to produce
`List([List([...])])` instead of `List([...])`.

Simplify the list function implementations to use iterator combinators.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…methods

Replace the `IntoDatum` trait with `Borrow<Datum<'a>>` as a supertrait
on `FromDatum`. This makes `push_datum` and `push_list_elems` on
`RowPacker` unnecessary since callers can use the existing `push`
method with `*val.borrow()`.

Also refactor `DatumListTypedIter` to wrap `DatumListIter` and delegate
to it, matching the pattern already used by `DatumDictTypedIter`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ility

The `element_list_concat` and `list_element_concat` functions now
correctly report their output as non-nullable since they always produce
a list, even when one input is NULL. Update the expected descriptors
for `mz_dataflow_channel_operators_per_worker` and
`mz_dataflow_channel_operators` to reflect this.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Convert `cast_array_to_list_one_dim` and `array_index` to use generic
type parameters, removing their explicit `output_type_expr` attributes.

Generalize `Array::dims()` and `Array::elements()` from
`impl<'a> Array<'a>` to `impl<'a, T> Array<'a, T>` so they work with
typed arrays.

Add cross-container derivation case `(InDatumList, InArray)` to
the sqlfunc macro so it can auto-derive the output type when a
function converts from `Array<T>` to `DatumList<T>`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Convert range_union, range_intersection, range_empty, range_lower_inc,
range_upper_inc, range_lower_inf, and range_upper_inf to use generic
type parameters instead of concrete Datum types, enabling auto-derived
output_type_expr in the sqlfunc macro.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
antiguru and others added 2 commits March 11, 2026 21:41
* Remove no-op `emitted_func` if/else blocks in unary, binary, variadic
* Rename `iter_typed` to `typed_iter` on DatumList and DatumMap
* Add private constructors `DatumList::new()` and `DatumMap::new()`,
  use them at all construction sites
* Handle `(Bare, Bare)` case in output type derivation by forwarding
  the input type
* Prefer container usages over bare in tuple generic classification
* Extract container type name strings to constants and add compile-time
  test verifying they match actual type names
* Update design doc to reflect current implementation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@antiguru antiguru marked this pull request as ready for review March 11, 2026 21:01
@antiguru antiguru requested review from a team as code owners March 11, 2026 21:01
@antiguru antiguru requested a review from ohbadiah March 11, 2026 21:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant