A code-generating GPU database system that executes all 22 TPC-H benchmark queries using CUDA. Queries are compiled at runtime with either NVRTC (fast) or NVCC, and kernel code is generated through a composable operator framework. CODEGENeral is extensible and supports benchmarking on custom-defined schemas and queries.
```
      .^.
     /_+_\
    ( o_o )
 __/  (_)  \__    "Elevate your GPU Database!"
/ /|  ___  |\ \       - the CODEGENERAL
| | | {JIT} | | |
\ \|  ---  |/ /
 '--'_____'--'
```

## Features

- All 22 TPC-H queries implemented
- Runtime JIT compilation (NVRTC or NVCC)
- Operator-based code generation framework
- Bitmap-based join acceleration
- Shared memory reductions for high-performance aggregations
## Requirements

- CUDA Toolkit (tested with CUDA 11+)
- CMake 3.18+
- C++17 compiler
- TPC-H data files (`.tbl` format)
## Building

```shell
mkdir build && cd build
cmake ..
make -j8
```

## Running

```shell
# Run with TPC-H data directory
./src/tpch_demo_refactored /path/to/tpch-dbgen

# Run with random test data (no data directory)
./src/tpch_demo_refactored
```

### Options

| Option | Description |
|---|---|
| --nvcc | Use the NVCC compiler instead of NVRTC (slower compilation, same kernel performance) |
| --dp | Use dynamic parallelism when generating the code for launching kernels |
| --coop | Use cooperative groups and fuse the generated code into a single kernel |
| -v, --verbose | Print generated CUDA kernel code |
| -q <number> | Execute only the specified TPC-H query |
| -o, --output FILE | Write timing results to a CSV file |
| -h, --help | Show help message |
### Examples

```shell
# Run all queries with NVRTC and export timing to CSV
./src/tpch_demo_refactored /path/to/tpch-dbgen -o results.csv

# Run with the NVCC compiler and verbose output
./src/tpch_demo_refactored /path/to/tpch-dbgen --nvcc -v

# Compare NVRTC vs NVCC compilation times
./src/tpch_demo_refactored /path/to/tpch-dbgen -o nvrtc_timing.csv
./src/tpch_demo_refactored /path/to/tpch-dbgen --nvcc -o nvcc_timing.csv
```

The CSV output has the following format:

```
query,compiler,lineitem_count,compilation_ms,kernel_ms,total_ms
Q1,NVRTC,6001215,117.38,30.65,148.04
Q2,NVRTC,6001215,535.31,0.24,535.54
...
```

## Architecture

The query engine uses a composable operator framework to generate CUDA kernels. Operators are chained together in a producer-consumer pattern.
### Core classes

| Class | Description |
|---|---|
| Codegen | Code generator that builds CUDA kernel source code. Manages indentation, parameters, and code blocks. |
| Operator | Abstract base class for all operators. Defines the produce(codegen, consume) interface. |
| UnaryOperator | Base class for operators with a single child operator. |
### Table scan operators

| Operator | Description |
|---|---|
| GPUTableScan | Basic table scan with a simple for loop. Iterates over tuples with an idx variable. |
| GPUTableScanGridStride | Table scan using the grid-stride loop pattern for better GPU occupancy. |
### Selection operators

| Operator | Description |
|---|---|
| Selection | Filters rows using a predicate (for basic scans). Generates if (predicate) { ... }. |
### Bitmap join operators

| Operator | Description |
|---|---|
| BitmapBuild | Builds a bitmap from qualifying rows: bitmap[keyExpr] = 1. |
| BitmapJoin | Filters rows by checking the bitmap: if (bitmap[keyExpr]) { ... }. |
| MultiBitmapJoin | Joins on multiple bitmaps simultaneously (AND logic). |
| AntiBitmapJoin | Anti-join: passes rows where the bitmap is NOT set: if (!bitmap[keyExpr]) { ... }. |
| TableBitmapBuild | Standalone operator that scans a table and builds a bitmap in one step. |
| TableBitmapBuildGridStride | Grid-stride version of TableBitmapBuild. |
### Array operators

| Operator | Description |
|---|---|
| ArrayLookup | Reads a value from an array: resultVar = array[keyExpr]. |
| ArrayStore | Writes a value to an array: array[keyExpr] = valueExpr. |
### Aggregation operators

| Operator | Description |
|---|---|
| AtomicArrayAgg | Atomic aggregation by key: atomicAdd(&array[bucketExpr], valueExpr). |
| AtomicArrayCount | Atomic count by key: atomicAdd(&array[keyExpr], 1). |
| AtomicCount | Atomic increment of a single counter. |
| SharedMemReductionAgg | High-performance reduction using shared memory (one atomicAdd per block). |
| ArrayMaxReduction | Finds the maximum value in an array using a shared memory reduction. |
| KeyedAggregation | GROUP BY aggregation with multiple aggregates per bucket. |
| KeyedDualAggregation | Dual aggregation with a bucket key (GROUP BY with two aggregates). |
### Expression operators

| Operator | Description |
|---|---|
| ComputeExpr | Computes a derived value: type varName = expression. |
### Code generation example

```cpp
#include "codegen/codegen.hpp"

// Generate a kernel for:
// SELECT SUM(l_extendedprice) FROM lineitem WHERE l_shipdate >= '1994-01-01'
codegen::Codegen cg;
cg.setKernelName("aggregateRevenue");

auto scan = std::make_unique<codegen::GPUTableScanGridStride>(
    "lineitem", "LineItemTuple", "li");
auto filter = std::make_unique<codegen::Selection>(
    std::move(scan), "date_ge(li.l_shipdate, date_start)");
codegen::AtomicArrayAgg agg(
    std::move(filter), "d_result", "0", "li.l_extendedprice", "double");

agg.produce(&cg, [](){});
std::string kernelCode = cg.print();
```

## Project structure

```
CUDACodeGeneral/
├── codegen/
│   ├── codegen.hpp                   # Codegen framework
│   └── operator.hpp                  # Operator framework
├── queries/tpch/
│   └── tpch_q1.hpp ... tpch_q22.hpp  # TPC-H query implementations
├── schema/tables/tpch/
│   └── tpch_schema.hpp               # TPC-H table definitions
├── src/
│   ├── tpch_demo_refactored.cu       # Main executable
│   ├── tpch_loader.hpp               # TPC-H data loader
│   ├── launcher.hpp                  # Kernel launch utilities
│   └── jit_compiler.hpp              # NVRTC/NVCC compilation
└── build/                            # Build output
```
## Performance notes

- NVRTC compilation is typically 3-6x faster than NVCC
- Kernel execution times are identical between the two compilers
- Shared memory reductions significantly outperform naive atomic aggregations
- Bitmap joins enable efficient multi-table query execution
## License

See the LICENSE file for details.