Adding tpcds benchmark by tsafin · Pull Request #12 · tsafin/tpch-cpp

tsafin · 2026-03-06T22:19:49Z

Summary

This PR adds a dedicated tpcds_benchmark executable and integrates the TPC-DS dsdgen reference code into this repository in a form usable from the existing Arrow-based writer pipeline.

The branch started as initial TPC-DS table support and grew to include:

complete TPC-DS table generation coverage used by the benchmark
Arrow schema/converter plumbing for TPC-DS rows
dictionary encoding for selected low-cardinality columns
Parquet compression controls
Lance support with zero-copy / streaming modes
profiling-driven fixes for Lance streaming fragmentation and memory behavior
follow-up cleanup from PR review feedback

What is included

1. TPC-DS benchmark executable

adds tpcds_benchmark
CLI broadly mirrors tpch_benchmark
supports single-table generation into multiple output formats

2. Embedded TPC-DS dsdgen integration

builds and embeds the required tpcds.idx distribution data at build time
adds a thin C/C++ wrapper around the upstream TPC-DS generator
exposes per-table generation APIs from C++
keeps the upstream TPC-DS source mostly intact while making it usable in an embedded benchmark binary

3. Table support

Initial phases added fact tables and then expanded to the full set used by the benchmark work in this branch, including:

store_sales
inventory
catalog_sales
web_sales
store_returns
catalog_returns
web_returns
dimension tables such as customer, item, date_dim, call_center, catalog_page, web_page, web_site, warehouse, ship_mode, household_demographics, customer_demographics, customer_address, income_band, reason, time_dim, promotion, store

4. Arrow conversion layer

adds TPC-DS row-to-Arrow builder conversion
moves builder access to positional indexing via generated column indices
adds dictionary arrays for selected low-cardinality string columns
adds lance.cardinality metadata hints on TPC-DS schemas

5. Output format support

Parquet
CSV
ORC
Lance
existing optional formats wired through the common writer interface where enabled

Additional format-related work in this branch:

--compression for Parquet (snappy, zstd, none)
ORC fix for INT32 handling

Lance zero-copy / streaming work

A substantial part of this branch investigates and improves Lance behavior for TPC-DS generation.

Added functionality

--zero-copy streaming mode in tpcds_benchmark
Lance-specific zero-copy mode selection (sync, auto, async)
Rust/C++ copy telemetry for investigation
simplified FFI/CLI after removing dev-only tuning knobs that did not prove useful

Main conclusions from the investigation

for single-table TPC-DS generation, sync zero-copy is the practical default
the earlier async path incurred large extra RSS and Tokio overhead for this workload
a major throughput issue came from overly small sync Lance flush/fragment sizes
increasing sync flush size from very small fragments to 128 batches / 1,048,576 rows significantly reduced fragment/manifest churn and improved store_sales scaling

Current default from this branch

tpcds_benchmark --format lance --zero-copy defaults to the bounded sync path
sync zero-copy uses larger flushes to avoid pathological small-fragment behavior

Profiling / benchmarking artifacts

This branch also includes investigation notes and generated benchmark artifacts under benchmark-results/, including:

SF=5 zero-copy mode comparisons across major TPC-DS fact tables
store_sales streaming scaling investigations
large-fragment experiments
memory projections / plots

These files are included because they were used to justify behavior changes in the Lance path and to document the observed tradeoffs.

Review-driven cleanup included

explicit Arrow error propagation instead of ValueOrDie() in the TPC-DS batch finalization path
early CLI validation for invalid --scale-factor
stronger exception-safety around installed global dsdgen callbacks
compile-time static_assert checks tying TableType values to generated TPC-DS table constants
refactoring of the repeated master-detail sales generation logic (store_sales, catalog_sales, web_sales)

Known limitations / follow-up items

*_returns generation semantics still deserve a follow-up review; there is an open question about whether benchmark convenience should preserve the native sparse return-rate exactly
the tpcds_benchmark table dispatch remains explicit if/else logic for now; this could be converted to a dispatch table later if desired
Lance zero-copy sync scaling is much better after the fragment fix, but higher-scale runs are still partly stall/writeback bound and likely merit further storage-side profiling

Testing / validation

Validation in this branch was primarily benchmark- and profiler-driven:

repeated tpcds_benchmark runs on major fact tables
SF matrix runs for store_sales
perf record / perf stat profiling of Lance zero-copy paths
rebuild and smoke-checks after review-driven code changes

Adds a complete tpcds_benchmark executable (built with -DTPCDS_ENABLE=ON) that generates TPC-DS store_sales and inventory tables into Parquet, CSV, ORC, Lance, Paimon, or Iceberg via the existing tpch::WriterInterface. Build infrastructure (third_party/dsdgen/): - CMakeLists.txt: mkheader → tables.h/streams.h/columns.h → distcomp → tpcds.idx → gen_dsts.py → dsts_generated.c → dsdgen_objs object library - dsdgen_stubs.c: thin file-I/O stubs (not needed in embedded mode) - tpcds_dsdgen.h: C++-safe forward declarations that bypass the fragile config.h/porting.h LINUX/HUGE_TYPE dependency chain; defines ds_key_t, decimal_t, ds_pricing_t, W_STORE_SALES_TBL, W_INVENTORY_TBL, and the EMBEDDED_DSDGEN callback globals - cmake/gen_dsts.py: embeds tpcds.idx as a C byte array (dsts_generated.c) C++ wrappers (include/tpch/, src/dsdgen/): - dsdgen_wrapper.hpp/.cpp: DSDGenWrapper class — initialises dsdgen from the embedded tpcds.idx (via mkstemp temp file), exposes Arrow schemas for STORE_SALES and INVENTORY, and drives generation via the EMBEDDED_DSDGEN callback trampoline (store_sales is master-detail: 8-16 line items per ticket; callback fires once per line item) - dsdgen_converter.hpp/.cpp: append_*_to_builders() helpers mapping the dsdgen C structs to Arrow array builders; dec_to_double() correctly converts decimal_t scaled integers (avoids buggy upstream dectoflt) Executable (src/tpcds_main.cpp): - CLI mirrors tpch_benchmark: --format, --table, --scale-factor, --output-dir, --max-rows, --verbose - Batched Arrow generation loop (10 000 rows/batch) → writer->write_batch third_party/tpcds submodule updated to branch tpcds_cpp_embedded which adds EMBEDDED_DSDGEN guards to w_store_sales.c (callback + suppressed file I/O for both store_sales and store_returns output files). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…hers) Implements the following tables in tpcds_benchmark: catalog_sales — master-detail (callback-based, 4-14 line items/ticket) web_sales — master-detail (callback-based, 8-16 line items/ticket) customer — dimension table (simple, 100K rows at SF=1) item — dimension table (simple, ~18K rows at SF=1) date_dim — dimension table (simple, 73,049 fixed rows) store_returns — driven through store_sales ticket loop catalog_returns — driven through catalog_sales ticket loop web_returns — driven through web_sales ticket loop Changes: - tpcds_dsdgen.h: added struct definitions for all 8 new table row types, table ID constants, mk_w_* function declarations, and EMBEDDED_DSDGEN extern callback declarations for catalog_sales and web_sales - dsdgen_wrapper.hpp: added generate_* method declarations for 8 tables - dsdgen_wrapper.cpp: Arrow schemas, trampolines (catalog/web sales), generate_* implementations; returns tables are driven through their parent sales loop using a no-op callback to suppress file output while populating the global sales struct for valid return references - dsdgen_converter.hpp/.cpp: Arrow builder converters for all 8 tables; updated append_dsdgen_row_to_builders() dispatcher - tpcds_main.cpp: updated parse_table(), generation dispatch, and usage text to reflect Phase 3 completion - third_party/tpcds submodule: updated to include w_catalog_sales.c and w_web_sales.c EMBEDDED_DSDGEN callback patches All 10 implemented tables verified: correct row counts, sane values. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

tsafin · 2026-03-06T22:20:08Z

@codex review

Copilot

Pull request overview

Adds TPC-DS data generation support by vendoring the upstream tools, embedding distribution data, and exposing a new tpcds_benchmark executable.

Changes:

Adds a third_party/tpcds submodule and a CMake build for compiling/embedding TPC-DS dsdgen.
Introduces a C++ DSDGenWrapper + Arrow conversion layer for several initial TPC-DS tables.
Adds a TPCDS_ENABLE build flag and wires a new tpcds_benchmark target into the top-level build.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
third_party/tpcds	Adds the TPC-DS tools submodule pin for `dsdgen` sources.
third_party/dsdgen/tpcds_dsdgen.h	Provides a C++-friendly API surface for calling into `dsdgen` objects.
third_party/dsdgen/dsdgen_stubs.c	Placeholder TU for embed-mode stubs (currently comment-only).
third_party/dsdgen/CMakeLists.txt	Builds `mkheader`, `distcomp`, embeds `tpcds.idx`, and compiles `dsdgen` as objects.
src/tpcds_main.cpp	New `tpcds_benchmark` CLI that generates Arrow batches and writes using existing writer backends.
src/dsdgen/dsdgen_wrapper.cpp	Initializes embedded distributions + exposes per-table generation APIs with callbacks.
src/dsdgen/dsdgen_converter.cpp	Converts `dsdgen` structs into Arrow builders for supported tables.
include/tpch/dsdgen_wrapper.hpp	Public wrapper API + table enum + schema helpers.
include/tpch/dsdgen_converter.hpp	Public conversion function declarations.
cmake/gen_dsts.py	New generator to embed `tpcds.idx` as a `uint8_t[]` C array.
CMakeLists.txt	Adds `TPCDS_ENABLE` option and the `tpcds_benchmark` target when enabled.
.gitmodules	Registers the new `third_party/tpcds` submodule.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

third_party/dsdgen/CMakeLists.txt

src/tpcds_main.cpp

src/dsdgen/dsdgen_converter.cpp

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6b0d16b9ce

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

src/dsdgen/dsdgen_wrapper.cpp

Adds support for all 24 TPC-DS tables to tpcds_benchmark: call_center, catalog_page, web_page, web_site, warehouse, ship_mode, household_demographics, customer_demographics, customer_address, income_band, reason, time_dim, promotion, store. Key changes: - tpcds_dsdgen.h: fix ds_addr_t (add missing plus4 field); add structs and TPCDS_* constants for all 14 new tables - dsdgen_wrapper.hpp/.cpp: TPCDS_SIMPLE_GENERATE macro + 14 new generators - dsdgen_converter.cpp: append_*_to_builders + dispatcher for 14 tables; append_addr_fields() helper for ds_addr_t (street/city/state/zip/etc.) - tpcds_main.cpp: parse_table() + generation dispatch for all 24 tables Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ructs The old approach manually copied all struct definitions (W_STORE_SALES_TBL, ds_addr_t, etc.) into tpcds_dsdgen.h, which could silently diverge from the actual generator sources — as demonstrated by the missing ds_addr_t.plus4 field that caused stack smashing in call_center/warehouse/customer_address. New approach: include the canonical tpcds w_*.h headers directly. All 24 table structs are now sourced from the real implementation files. The TPCDS_* integer constants are kept as prefixed literals (tables.h defines bare macros like CALL_CENTER=0 that would collide with C++ enum member names). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…e enum - Re-added #include "tables.h" to tpcds_dsdgen.h - Changed TPCDS_* from literal values to aliases (e.g., #define TPCDS_CALL_CENTER CALL_CENTER) - This allows using the canonical build-generated constants while avoiding macro collisions - Renamed TableType enum members to PascalCase (CallCenter, CatalogPage, etc.) to avoid colliding with ALL_CAPS macros from tables.h - Updated all TableType:: references throughout dsdgen_wrapper.cpp and tpcds_main.cpp - All 24 TPC-DS tables now fully functional via CLI and generation dispatch This maintains the canonical struct layouts (by including real tpcds headers) while resolving namespace conflicts through C++ enum conventions. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

tsafin · 2026-03-07T14:30:14Z

@codex review

Copilot

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

include/tpch/dsdgen_converter.hpp

include/tpch/dsdgen_wrapper.hpp

src/tpcds_main.cpp

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1015c1c823

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

src/dsdgen/dsdgen_wrapper.cpp

src/tpcds_main.cpp

Apply Arrow dictionary(int8(), utf8()) encoding to low/medium-cardinality TPC-DS columns, following TPC-H Phase 3.3 pattern (+57% throughput improvement). Encoded columns (50 total): - customer_demographics: cd_gender, cd_marital_status, cd_education_status, cd_credit_rating - customer_address: ca_location_type, ca_state, ca_country, ca_street_type - time_dim: t_am_pm, t_shift, t_sub_shift, t_meal_time - date_dim: d_day_name - item: i_category, i_size, i_color, i_units, i_container - call_center: cc_class, cc_hours, cc_name + address fields (cc_state, cc_country, cc_street_type) - catalog_page: cp_department, cp_type - web_page: wp_type - web_site: web_class + address fields (web_state, web_country, web_street_type) - warehouse: address fields (w_state, w_country, w_street_type) - ship_mode: sm_type, sm_code, sm_carrier - store: s_hours, s_geography_class, s_division_name, s_company_name + address fields (s_state, s_country, s_street_type) - customer: c_salutation - promotion: p_purpose Implementation: - src/tpcds_main.cpp: DICTIONARY type handling in create_builders() and finish_batch() - include/tpch/dsdgen_converter.hpp: get_dict_for_field() declaration - src/dsdgen/dsdgen_converter.cpp: 25+ encode functions + dictionary registry (41 entries) - src/dsdgen/dsdgen_wrapper.cpp: 9 table schemas updated to dict8 All 24 TPC-DS tables validated (SF=1). Expected: +50-60% performance gain at scale (from Phase 3.3 precedent). Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

Two targeted fixes to address fact-table regressions found after DS-1 dict8 encoding (store_sales -59%, catalog_sales -74%, web_sales -74%): 1. Disable Parquet auto-dict for int64/int32/float64 columns - Add make_writer_props(schema) helper in parquet_writer.cpp - Iterates schema fields; calls disable_dictionary() for numeric types - Arrow dictionary(int8,utf8) columns unaffected (identified by name) - Eliminates ScalarMemoTable<double>::GetOrInsert (was 8.85% of CPU) and ScalarMemoTable<long>::GetOrInsert (was 4.53%) from profiles - Used in all 3 WriterProperties construction sites 2. std::map → std::unordered_map for builder lookup - Add BuilderMap type alias in dsdgen_converter.hpp - Replace all 26 function signatures in dsdgen_converter.cpp - Replace create_builders/finish_batch/reset_builders in tpcds_main.cpp - O(log N) string comparison → O(1) hash per column per row Measured gains (SF=1, Parquet/SNAPPY, avg 2 runs): web_sales: 356K → 424K r/s (+19%) catalog_sales: 359K → 414K r/s (+16%) store_returns: 161K → 167K r/s (+4%) web_returns: 115K → 121K r/s (+5%) customer_demographics: 1354K → 1508K r/s (+11%) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… indices Replace BuilderMap = unordered_map<string, Builder> with BuilderMap = vector<Builder> (schema-order), with named access via auto-generated constexpr column index constants. Code reads: builders[col::store_returns::sr_returned_date_sk] // readable + zero cost instead of: builders["sr_returned_date_sk"] // was: hash per row New files: include/tpch/dsdgen_col_idx.hpp — auto-generated constexpr indices (24 tables, one constexpr size_t per column per table) scripts/gen_col_indices.py — generates dsdgen_col_idx.hpp and rewrites builders[N] → builders[col::TABLE::col] in converter Regenerate after schema changes: python3 scripts/gen_col_indices.py dsdgen_converter.cpp: - 374 builders["col"] → builders[col::TABLE::col] (named index) - append_addr_fields: pfx+string arg → size_t base arg - Callers pass col::TABLE::PREFIX_street_number (named, self-documenting) dsdgen_converter.hpp: BuilderMap = vector<shared_ptr<ArrayBuilder>> tpcds_main.cpp: create_builders push_back, finish_batch/reset by index Measured gains vs DS-2 unordered_map (SF=1, warm runs): store_returns: 167K → 198K r/s (+19%) web_sales: 424K → 777K r/s (+83%) catalog_sales: 414K → 768K r/s (+86%) store_sales: 543K →1012K r/s (+86%) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Profiling after DS-3 showed Snappy at ~32% CPU for store_sales, but removing it actually SLOWS things down: compression reduces write volume, hiding I/O latency on WSL2 VirtIO-blk. Snappy remains the default and is the optimal choice. The flag is useful for explicit benchmarking of the tradeoff: --compression snappy (default, optimal for I/O-bound workloads) --compression zstd (better ratio, slightly more CPU) --compression none (fastest encode, most disk I/O) Note: LZ4 is intentionally excluded (Parquet LZ4 interoperability issues). Implementation: - ParquetWriter::set_compression(codec) + compression_codec_ member - parse_compression() maps string → parquet::Compression::type - tpcds_main: --compression CLI flag routed through create_writer() Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add cmake/gen_dist_cache.py that reads tpcds.idx at build time and emits static const int[] / decimal_t[] arrays for all 42 TKN_INT and 0 TKN_DECIMAL value sets across 67 TPC-DS distributions. CMake integration in third_party/dsdgen/CMakeLists.txt: - Step 2c: gen_dist_cache.py → dist_cache_generated.c (51 KB) - dist_cache_generated.c added to dsdgen_objs OBJECT library - New tpcds_dist_cache_gen target The generated tpcds_lookup_int_cache() / tpcds_lookup_dec_cache() are referenced by dist.c (in submodule commit) when EMBEDDED_DSDGEN is defined. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Ports the same cardinality-hint pattern from TPC-H (Phase 3.2/3.5) to TPC-DS schemas. Without hints Lance runs XXH3+HLL per batch per utf8 column, causing store_sales SF=5 to degrade to ~86K rows/s. Changes: - Add tpcds_field() helper (mirrors tpch_field()) that attaches lance.cardinality metadata to Arrow utf8 fields - Pass scale_factor to get_schema() so SF-scaled counts are correct - Annotate every utf8 field in all 18 TPC-DS dimension+fact schemas with the appropriate bounded cardinality from TPC-DS v3 spec - Convert hd_buy_potential (6 values) from utf8 to dict8; add encode_hd_buy_potential() and register the dictionary in get_dict_for_field() Result: store_sales SF=5 Lance throughput 86K -> 1.13M rows/s (13x). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

In Apache ORC all integer types (tinyint/smallint/int/bigint) are stored using LongVectorBatch — there is no IntVectorBatch for the `int` (32-bit) ORC type. The previous code performed a dynamic_cast to orc::IntVectorBatch for Arrow INT32 arrays which always returned nullptr, causing every table with int32 fields to crash with "Failed to cast ORC column to IntVectorBatch". Fix: cast to LongVectorBatch (same as INT64 path), matching the ORC C++ library contract. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Enables Parquet streaming write path (ParquetWriter::enable_streaming_write()) and Lance streaming path (LanceWriter::enable_streaming_write(true)) via a new --zero-copy / -z CLI flag. Impact on store_sales SF=5: Parquet without --zero-copy: 1.00M r/s, 4.2 GB peak RSS Parquet with --zero-copy: 1.36M r/s, 271 MB peak RSS (15× less RAM, 36% faster) Lance: both modes stream via Rust FFI — no C++ batch accumulation regardless. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Change batch_size from 10,000 → 8,192 rows in run_generation() to match Lance's default max_rows_per_group. This ensures C++ batches align to Lance row-group boundaries, eliminating split/leftover rows at edges. Also benefits Parquet and ORC stripe alignment. SF=10 store_sales results (28.8M rows): Lance buffered: 290K → 868K r/s after alignment Lance zero-copy: stability improved Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

… tuning

- Add -DTPCDS_ENABLE=ON to all build-matrix configs (base, orc, lance) so tpcds_benchmark is built and uploaded as part of the CI artifact - Rename benchmark-suite → tpch-benchmark-suite and optimization-benchmarks → tpch-optimization-benchmarks for clarity - Add tpcds-benchmark-suite job: 12 matrix entries covering csv/parquet × 4 tables, orc/lance × 2 tables (store_returns, store_sales, customer, item); tpcds.idx is embedded in the binary so no runtime data files are needed - Add tpcds-optimization-benchmarks job: 8 matrix entries covering parquet/lance × baseline/zero-copy × store_returns/store_sales - Update results-aggregation to wait on all 4 benchmark jobs Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

tsafin and others added 2 commits March 6, 2026 23:30

tsafin requested a review from Copilot March 6, 2026 22:19

Copilot AI reviewed Mar 6, 2026

View reviewed changes

third_party/dsdgen/CMakeLists.txt Show resolved Hide resolved

src/tpcds_main.cpp Outdated Show resolved Hide resolved

src/dsdgen/dsdgen_converter.cpp Show resolved Hide resolved

src/dsdgen/dsdgen_converter.cpp Outdated Show resolved Hide resolved

chatgpt-codex-connector bot reviewed Mar 6, 2026

View reviewed changes

src/dsdgen/dsdgen_wrapper.cpp Outdated Show resolved Hide resolved

tsafin and others added 3 commits March 7, 2026 13:43

tsafin requested a review from Copilot March 7, 2026 14:29

Copilot AI reviewed Mar 7, 2026

View reviewed changes

include/tpch/dsdgen_converter.hpp Show resolved Hide resolved

include/tpch/dsdgen_wrapper.hpp Outdated Show resolved Hide resolved

src/tpcds_main.cpp Outdated Show resolved Hide resolved

src/tpcds_main.cpp Show resolved Hide resolved

chatgpt-codex-connector bot reviewed Mar 7, 2026

View reviewed changes

src/dsdgen/dsdgen_wrapper.cpp Show resolved Hide resolved

src/tpcds_main.cpp Show resolved Hide resolved

tsafin and others added 17 commits March 7, 2026 18:47

Phase DS-7: bump tpcds submodule (cache per-tabId price limits)

d42edad

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

lance: document sf5 investigation and drop ineffective stream toggles

19ceecc

lance: add zero-copy sync mode and remove store_sales hack path

79b418c

docs: add sf5 tpcds 3-table zero-copy mode and perf analysis

a3422f4

tpcds/lance: default sync zero-copy and add copy telemetry with async…

e520586

… tuning

docs: add async rss floor isolation with rust stage memory data

6c18837

lance-ffi: add internal live-memory estimator for async stream

fb30a51

tpcds/lance: remove dev cli knobs and simplify ffi instrumentation

3ac8aaa

tsafin and others added 9 commits March 10, 2026 03:04

tpcds/lance: increase sync zero-copy flush size for store_sales

94f08e0

tpcds: address review feedback on wrapper safety

dabcade

tpcds: factor master-detail sales generation helper

83dca5a

cmake: not build lance ffi from source by default

f9c6630

ci: use branch base image for derived docker builds

1a443d1

cmake: prefer prebuilt lance ffi by default

53adb95

ci: disable native cpu tuning for portable artifacts

01f4294

cmake: probe prebuilt lance ffi by exact path

483da57

tsafin merged commit be51337 into master Mar 11, 2026
75 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adding tpcds benchmark#12

Adding tpcds benchmark#12
tsafin merged 31 commits intomasterfrom
tpcds_cpp_embedded

tsafin commented Mar 6, 2026 •

edited

Loading

Uh oh!

tsafin commented Mar 6, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

Uh oh!

tsafin commented Mar 7, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

tsafin commented Mar 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What is included

1. TPC-DS benchmark executable

2. Embedded TPC-DS dsdgen integration

3. Table support

4. Arrow conversion layer

5. Output format support

Lance zero-copy / streaming work

Added functionality

Main conclusions from the investigation

Current default from this branch

Profiling / benchmarking artifacts

Review-driven cleanup included

Known limitations / follow-up items

Testing / validation

Uh oh!

tsafin commented Mar 6, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

tsafin commented Mar 7, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

tsafin commented Mar 6, 2026 •

edited

Loading