Merged
31 commits
bf57b3c
Add TPC-DS Phase 2: store_sales and inventory generators
tsafin Mar 6, 2026
6b0d16b
Add TPC-DS Phase 3: 8 more tables (catalog_sales, web_sales, and 6 ot…
tsafin Mar 6, 2026
35d5c49
Phase 5: implement all 14 remaining TPC-DS dimension tables
tsafin Mar 7, 2026
5f5f6e1
tpcds_dsdgen.h: include real tpcds headers instead of re-declaring st…
tsafin Mar 7, 2026
1015c1c
Phase 5 final: Re-include tables.h with TPCDS_* aliases and PascalCas…
tsafin Mar 7, 2026
897a4d7
Phase DS-1: TPC-DS Dictionary Encoding — 50 columns across 14 tables
tsafin Mar 7, 2026
75e34a9
Phase DS-2: fix numeric dict regression + unordered_map builder lookup
tsafin Mar 7, 2026
eafe592
Phase DS-3: named positional BuilderMap — zero-cost vector with col::…
tsafin Mar 7, 2026
76b6bb2
Phase DS-4: add --compression flag for Parquet output (snappy/zstd/none)
tsafin Mar 8, 2026
52eafb0
Phase DS-5: build-time dist cache generator + CMake integration
tsafin Mar 8, 2026
d42edad
Phase DS-7: bump tpcds submodule (cache per-tabId price limits)
tsafin Mar 8, 2026
74b73b4
Phase DS-8: Add lance.cardinality hints to TPC-DS schemas
tsafin Mar 8, 2026
ec4d3b4
Fix ORC writer: use LongVectorBatch for INT32 columns
tsafin Mar 8, 2026
7741a1f
Phase DS-9: add --zero-copy streaming mode to tpcds_benchmark
tsafin Mar 8, 2026
3054de9
DS-9: Align batch_size to Lance max_rows_per_group (8192 rows)
tsafin Mar 8, 2026
19ceecc
lance: document sf5 investigation and drop ineffective stream toggles
tsafin Mar 9, 2026
79b418c
lance: add zero-copy sync mode and remove store_sales hack path
tsafin Mar 9, 2026
a3422f4
docs: add sf5 tpcds 3-table zero-copy mode and perf analysis
tsafin Mar 9, 2026
e520586
tpcds/lance: default sync zero-copy and add copy telemetry with async…
tsafin Mar 9, 2026
6c18837
docs: add async rss floor isolation with rust stage memory data
tsafin Mar 9, 2026
fb30a51
lance-ffi: add internal live-memory estimator for async stream
tsafin Mar 9, 2026
3ac8aaa
tpcds/lance: remove dev cli knobs and simplify ffi instrumentation
tsafin Mar 9, 2026
94f08e0
tpcds/lance: increase sync zero-copy flush size for store_sales
tsafin Mar 10, 2026
dabcade
tpcds: address review feedback on wrapper safety
tsafin Mar 10, 2026
83dca5a
tpcds: factor master-detail sales generation helper
tsafin Mar 10, 2026
f9c6630
cmake: not build lance ffi from source by default
tsafin Mar 10, 2026
1a443d1
ci: use branch base image for derived docker builds
tsafin Mar 10, 2026
53adb95
cmake: prefer prebuilt lance ffi by default
tsafin Mar 10, 2026
01f4294
ci: disable native cpu tuning for portable artifacts
tsafin Mar 10, 2026
483da57
cmake: probe prebuilt lance ffi by exact path
tsafin Mar 10, 2026
7f60403
ci: add TPC-DS build, benchmark suite, and optimization jobs
tsafin Mar 11, 2026
3 changes: 2 additions & 1 deletion .docker/Dockerfile.lance
@@ -1,5 +1,6 @@
# Lance Docker image extending base with Rust and Lance FFI for TPC-H benchmarks
- FROM ghcr.io/tsafin/tpch-cpp-base:latest
+ ARG BASE_IMAGE=ghcr.io/tsafin/tpch-cpp-base:latest
+ FROM ${BASE_IMAGE}

LABEL org.opencontainers.image.source="https://github.com/tsafin/tpch-cpp"
LABEL org.opencontainers.image.description="TPC-H C++ Lance Build Environment with Arrow/Parquet/Lance"
3 changes: 2 additions & 1 deletion .docker/Dockerfile.orc
@@ -1,5 +1,6 @@
# ORC Docker image extending base with ORC support for TPC-H benchmarks
- FROM ghcr.io/tsafin/tpch-cpp-base:latest
+ ARG BASE_IMAGE=ghcr.io/tsafin/tpch-cpp-base:latest
+ FROM ${BASE_IMAGE}

LABEL org.opencontainers.image.source="https://github.com/tsafin/tpch-cpp"
LABEL org.opencontainers.image.description="TPC-H C++ ORC Build Environment with Arrow/Parquet/ORC"
234 changes: 226 additions & 8 deletions .github/workflows/ci.yml
@@ -117,16 +117,19 @@ jobs:
-DCMAKE_PREFIX_PATH=${{ matrix.deps_path }} \
-DTPCH_ENABLE_ORC=${{ matrix.enable_orc }} \
-DTPCH_ENABLE_LANCE=${{ matrix.enable_lance }} \
-DTPCH_ENABLE_NATIVE_OPTIMIZATIONS=OFF \
-DTPCH_ENABLE_ASYNC_IO=ON \
-DTPCH_ENABLE_ASAN=OFF \
- -DTPCH_BUILD_TESTS=${{ matrix.enable_tests }}
+ -DTPCH_BUILD_TESTS=${{ matrix.enable_tests }} \
+ -DTPCDS_ENABLE=ON

- name: Build project
run: cmake --build build -j$(nproc)

- name: Verify executable and tests
run: |
test -f build/tpch_benchmark && echo "✓ tpch_benchmark created"
test -f build/tpcds_benchmark && echo "✓ tpcds_benchmark created"
test -f build/tests/buffer_lifetime_manager_test && echo "✓ buffer_lifetime_manager_test created" || true
test -f build/tests/dbgen_batch_iterator_test && echo "✓ dbgen_batch_iterator_test created" || true
if [ "${{ matrix.enable_lance }}" = "ON" ]; then
@@ -176,12 +179,13 @@ jobs:
name: tpch-benchmark-${{ matrix.config }}
path: |
build/tpch_benchmark
build/tpcds_benchmark
build/tests/*_test
retention-days: 1
if-no-files-found: error

- benchmark-suite:
- name: Benchmark Suite
+ tpch-benchmark-suite:
+ name: TPC-H Benchmark Suite
runs-on: ubuntu-22.04
needs: [resolve-images, build-matrix]
timeout-minutes: 20
@@ -330,13 +334,13 @@ jobs:
if: always()
uses: actions/upload-artifact@v4
with:
- name: benchmark-logs-suite-${{ matrix.format }}-${{ matrix.table }}
+ name: tpch-benchmark-logs-suite-${{ matrix.format }}-${{ matrix.table }}
path: benchmark-results/${{ matrix.format }}_${{ matrix.table }}_baseline.log
retention-days: 30
if-no-files-found: ignore

- optimization-benchmarks:
- name: Optimization Benchmarks (${{ matrix.format }}-${{ matrix.mode }})
+ tpch-optimization-benchmarks:
+ name: TPC-H Optimization Benchmarks (${{ matrix.format }}-${{ matrix.mode }})
runs-on: ubuntu-22.04
needs: [resolve-images, build-matrix]
timeout-minutes: 20
@@ -533,15 +537,229 @@ jobs:
if: always()
uses: actions/upload-artifact@v4
with:
- name: benchmark-logs-optimization-${{ matrix.format }}-${{ matrix.mode }}-${{ matrix.table }}
+ name: tpch-benchmark-logs-optimization-${{ matrix.format }}-${{ matrix.mode }}-${{ matrix.table }}
path: benchmark-results/${{ matrix.format }}_${{ matrix.table }}_${{ matrix.mode }}.log
retention-days: 30
if-no-files-found: ignore

tpcds-benchmark-suite:
name: TPC-DS Benchmark Suite
runs-on: ubuntu-22.04
needs: [resolve-images, build-matrix]
timeout-minutes: 20
container:
image: ${{ matrix.build == 'base' && needs.resolve-images.outputs.base_image || matrix.build == 'orc' && needs.resolve-images.outputs.orc_image || needs.resolve-images.outputs.lance_image }}
options: --user root
credentials:
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
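
The `image:` expression above uses the `cond && value || fallback` idiom that GitHub Actions expressions offer in place of a ternary. One consequence worth keeping in mind: an empty `base_image` or `orc_image` output is falsy, so the chain falls through to the Lance image rather than failing. A minimal shell sketch of that selection logic follows; `select_image` and the example image names are hypothetical and not part of this PR.

```shell
# Hypothetical model of the workflow's image-selection expression.
# Usage: select_image BUILD BASE_IMAGE ORC_IMAGE LANCE_IMAGE
select_image() {
    si_build=$1
    si_base=$2
    si_orc=$3
    si_lance=$4

    if [ "$si_build" = "base" ] && [ -n "$si_base" ]; then
        printf '%s\n' "$si_base"
    elif [ "$si_build" = "orc" ] && [ -n "$si_orc" ]; then
        printf '%s\n' "$si_orc"
    else
        # Reached for build=lance, for any unknown build value, and
        # when the selected image output is empty.
        printf '%s\n' "$si_lance"
    fi
}

select_image orc example/base:tag example/orc:tag example/lance:tag
# prints: example/orc:tag
```

If a silent fallback to the Lance image is not wanted, an explicit check on the resolved outputs before the job starts would surface the problem earlier.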

strategy:
fail-fast: false
matrix:
include:
# CSV format
- format: csv
table: store_returns
build: base
- format: csv
table: store_sales
build: base
- format: csv
table: customer
build: base
- format: csv
table: item
build: base
# Parquet format
- format: parquet
table: store_returns
build: base
- format: parquet
table: store_sales
build: base
- format: parquet
table: customer
build: base
- format: parquet
table: item
build: base
# ORC format
- format: orc
table: store_returns
build: orc
- format: orc
table: store_sales
build: orc
# Lance format
- format: lance
table: store_returns
build: lance
- format: lance
table: store_sales
build: lance

steps:
- name: Checkout code
uses: actions/checkout@v4
with:
fetch-depth: 1

- name: Download build artifact
uses: actions/download-artifact@v4
with:
name: tpch-benchmark-${{ matrix.build }}
path: .

- name: Setup benchmark executable
run: |
chmod +x tpcds_benchmark
mkdir -p benchmark-results
export LD_LIBRARY_PATH=/opt/dependencies/lib:$LD_LIBRARY_PATH
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH" >> $GITHUB_ENV

- name: Run format coverage benchmark
run: |
if ! timeout 600 ./tpcds_benchmark \
--scale-factor 1 \
--format ${{ matrix.format }} \
--table ${{ matrix.table }} \
--output-dir benchmark-results/ \
2>&1 | grep -v "^DEBUG:" | tee "benchmark-results/tpcds_${{ matrix.format }}_${{ matrix.table }}_baseline.log"; then
echo "ERROR: Benchmark failed with exit code $?"
exit 1
fi

if grep -q "dumped core" "benchmark-results/tpcds_${{ matrix.format }}_${{ matrix.table }}_baseline.log"; then
echo "ERROR: Benchmark crashed with core dump"
exit 1
fi

if grep -qi "unknown format\|unsupported format\|not supported" "benchmark-results/tpcds_${{ matrix.format }}_${{ matrix.table }}_baseline.log"; then
echo "ERROR: Format ${{ matrix.format }} not supported by this build"
exit 1
fi

- name: Upload benchmark logs
if: always()
uses: actions/upload-artifact@v4
with:
name: tpcds-benchmark-logs-suite-${{ matrix.format }}-${{ matrix.table }}
path: benchmark-results/tpcds_${{ matrix.format }}_${{ matrix.table }}_baseline.log
retention-days: 30
if-no-files-found: ignore
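
A note on the failure check in the run step above: `if ! a | b | c` tests the exit status of the last pipeline stage (here `tee`) unless `pipefail` is enabled, and GitHub Actions documents `pipefail` only for steps that set `shell: bash` explicitly. Inside the `then` branch `$?` is also already 0, so the message cannot report the benchmark's real exit code. Below is a minimal POSIX-shell sketch of one alternative that captures the status before filtering. `run_and_log` is a hypothetical helper, not part of this PR, and it gives up live streaming of the output.

```shell
# Hypothetical helper: run a command, keep its exit status, and write a
# log with the "DEBUG:" lines filtered out.
# Usage: run_and_log LOG_FILE COMMAND [ARGS...]
run_and_log() {
    ral_log=$1
    shift
    ral_status=0

    # Capture stdout and stderr first, so the command's own exit status
    # is not masked by a later pipeline stage.
    "$@" > "$ral_log.raw" 2>&1 || ral_status=$?

    # grep -v exits 1 when every line is filtered out; that is not an error.
    grep -v '^DEBUG:' "$ral_log.raw" > "$ral_log" || true
    rm -f "$ral_log.raw"

    cat "$ral_log"
    return "$ral_status"
}

demo_log=$(mktemp)
demo_rc=0
run_and_log "$demo_log" sh -c 'echo "DEBUG: noisy"; echo "rows written"; exit 3' || demo_rc=$?
echo "exit status: $demo_rc"
rm -f "$demo_log"
# prints "rows written", then "exit status: 3"
```

In the workflow this would wrap the `timeout 600 ./tpcds_benchmark ...` call. Setting `shell: bash` on the step, so that `pipefail` applies, would be a smaller change with a similar effect.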

tpcds-optimization-benchmarks:
name: TPC-DS Optimization Benchmarks (${{ matrix.format }}-${{ matrix.mode }})
runs-on: ubuntu-22.04
needs: [resolve-images, build-matrix]
timeout-minutes: 20
container:
image: ${{ matrix.image == 'base' && needs.resolve-images.outputs.base_image || needs.resolve-images.outputs.lance_image }}
options: --user root
credentials:
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}

strategy:
fail-fast: false
matrix:
include:
# Parquet benchmarks
- format: parquet
mode: baseline
table: store_returns
image: base
- format: parquet
mode: baseline
table: store_sales
image: base
- format: parquet
mode: zero-copy
table: store_returns
image: base
- format: parquet
mode: zero-copy
table: store_sales
image: base
# Lance benchmarks
- format: lance
mode: baseline
table: store_returns
image: lance
- format: lance
mode: baseline
table: store_sales
image: lance
- format: lance
mode: zero-copy
table: store_returns
image: lance
- format: lance
mode: zero-copy
table: store_sales
image: lance

steps:
- name: Checkout code
uses: actions/checkout@v4
with:
fetch-depth: 1

- name: Download build artifact
uses: actions/download-artifact@v4
with:
name: tpch-benchmark-${{ matrix.image }}
path: .

- name: Setup benchmark executable
run: |
chmod +x tpcds_benchmark
mkdir -p benchmark-results
export LD_LIBRARY_PATH=/opt/dependencies/lib:$LD_LIBRARY_PATH
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH" >> $GITHUB_ENV

- name: Run optimization benchmark
run: |
MODE_FLAGS=""
if [ "${{ matrix.mode }}" = "zero-copy" ]; then
MODE_FLAGS="--zero-copy"
fi

if ! timeout 600 ./tpcds_benchmark \
--scale-factor 1 \
--format ${{ matrix.format }} \
--table ${{ matrix.table }} \
--output-dir benchmark-results/ \
$MODE_FLAGS \
2>&1 | grep -v "^DEBUG:" | tee "benchmark-results/tpcds_${{ matrix.format }}_${{ matrix.table }}_${{ matrix.mode }}.log"; then
echo "ERROR: Benchmark failed with exit code $?"
exit 1
fi

if grep -q "dumped core" "benchmark-results/tpcds_${{ matrix.format }}_${{ matrix.table }}_${{ matrix.mode }}.log"; then
echo "ERROR: Benchmark crashed with core dump"
exit 1
fi

if grep -qi "unknown format\|unsupported format\|not supported" "benchmark-results/tpcds_${{ matrix.format }}_${{ matrix.table }}_${{ matrix.mode }}.log"; then
echo "ERROR: Format ${{ matrix.format }} not supported by this build"
exit 1
fi

- name: Upload benchmark logs
if: always()
uses: actions/upload-artifact@v4
with:
name: tpcds-benchmark-logs-optimization-${{ matrix.format }}-${{ matrix.mode }}-${{ matrix.table }}
path: benchmark-results/tpcds_${{ matrix.format }}_${{ matrix.table }}_${{ matrix.mode }}.log
retention-days: 30
if-no-files-found: ignore

results-aggregation:
name: Aggregate Results
runs-on: ubuntu-22.04
- needs: [benchmark-suite, optimization-benchmarks]
+ needs: [tpch-benchmark-suite, tpch-optimization-benchmarks, tpcds-benchmark-suite, tpcds-optimization-benchmarks]
if: always()

steps:
24 changes: 24 additions & 0 deletions .github/workflows/docker-images.yml
@@ -161,6 +161,16 @@ jobs:
submodules: recursive
fetch-depth: 1

- name: Resolve base image
id: base-image
run: |
BRANCH_TAG=$(echo "${GITHUB_REF_NAME}" | tr '/' '-' | tr '[:upper:]' '[:lower:]')
if [ "${{ needs.build-base.result }}" = "success" ]; then
echo "image=${{ env.IMAGE_PREFIX }}-base:${BRANCH_TAG}" >> "$GITHUB_OUTPUT"
else
echo "image=${{ env.IMAGE_PREFIX }}-base:latest" >> "$GITHUB_OUTPUT"
fi
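
The tag derivation in this step can be tried locally. The sketch below mirrors the two `tr` calls and the fallback to `latest`. The function names, the `ghcr.io/example/tpch-cpp` prefix, and the branch name are hypothetical stand-ins for `IMAGE_PREFIX` and `GITHUB_REF_NAME`.

```shell
# Normalise a branch name into a Docker tag the same way the step does:
# slashes become dashes, upper case becomes lower case.
branch_tag() {
    printf '%s\n' "$1" | tr '/' '-' | tr '[:upper:]' '[:lower:]'
}

# Pick the branch-tagged base image when the base build succeeded,
# otherwise fall back to :latest.
# Usage: resolve_base_image IMAGE_PREFIX BRANCH BUILD_BASE_RESULT
resolve_base_image() {
    if [ "$3" = "success" ]; then
        printf '%s-base:%s\n' "$1" "$(branch_tag "$2")"
    else
        printf '%s-base:latest\n' "$1"
    fi
}

resolve_base_image ghcr.io/example/tpch-cpp Feature/TPCDS-ci success
# prints: ghcr.io/example/tpch-cpp-base:feature-tpcds-ci
```

The resulting reference is the value the build step passes as `BASE_IMAGE`. The same value can be supplied by hand with `docker build --build-arg BASE_IMAGE=<ref> -f .docker/Dockerfile.orc .`. Characters other than `/` that are invalid in Docker tags are not handled by this normalisation.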

- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3

@@ -187,6 +197,8 @@ jobs:
context: .
file: .docker/Dockerfile.orc
push: true
build-args: |
BASE_IMAGE=${{ steps.base-image.outputs.image }}
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
cache-from: |
@@ -219,6 +231,16 @@ jobs:
submodules: recursive
fetch-depth: 1

- name: Resolve base image
id: base-image
run: |
BRANCH_TAG=$(echo "${GITHUB_REF_NAME}" | tr '/' '-' | tr '[:upper:]' '[:lower:]')
if [ "${{ needs.build-base.result }}" = "success" ]; then
echo "image=${{ env.IMAGE_PREFIX }}-base:${BRANCH_TAG}" >> "$GITHUB_OUTPUT"
else
echo "image=${{ env.IMAGE_PREFIX }}-base:latest" >> "$GITHUB_OUTPUT"
fi

- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3

@@ -245,6 +267,8 @@ jobs:
context: .
file: .docker/Dockerfile.lance
push: true
build-args: |
BASE_IMAGE=${{ steps.base-image.outputs.image }}
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
cache-from: |
3 changes: 3 additions & 0 deletions .gitmodules
@@ -16,3 +16,6 @@
[submodule "third_party/lance"]
path = third_party/lance
url = https://github.com/tsafin/lance.git
[submodule "third_party/tpcds"]
path = third_party/tpcds
url = https://github.com/tsafin/tpchds-tools.git