MaxEmanuel/CUDACodeGeneral

Welcome to CUDA CODE GENeral

A code-generating GPU database system that executes all 22 TPC-H benchmark queries using CUDA. Queries are compiled at runtime with either NVRTC (faster compilation) or NVCC, and kernel code is generated through a composable operator framework. CODEGENeral is extensible and supports benchmarking on custom schemas and queries.

       .^.
      /_+_\
     ( o_o )        
  __/  (_)  \__    "Elevate your GPU Database!"
 / /|  ___  |\ \    - the CODEGENERAL
| | | {JIT} | | |   
 \ \|  ---  |/ /
  '--'_____'--'

Features

  • All 22 TPC-H queries implemented
  • Runtime JIT compilation (NVRTC or NVCC)
  • Operator-based code generation framework
  • Bitmap-based join acceleration
  • Shared memory reductions for high-performance aggregations

Requirements

  • CUDA Toolkit (tested with CUDA 11+)
  • CMake 3.18+
  • C++17 compiler
  • TPC-H data files (.tbl format)

Building

mkdir build && cd build
cmake ..
make -j8

Running TPC-H Queries

Basic Usage

# Run with TPC-H data directory
./src/tpch_demo_refactored /path/to/tpch-dbgen

# Run with random test data (no data directory)
./src/tpch_demo_refactored

Options

Option              Description
--nvcc              Use the NVCC compiler instead of NVRTC (slower compilation, same kernel performance)
--dp                Use Dynamic Parallelism when generating the kernel-launch code
--coop              Use cooperative groups and fuse the generated code into a single kernel
-v, --verbose       Print the generated CUDA kernel code
-q <number>         Execute only the TPC-H query with the given number
-o, --output FILE   Write timing results to a CSV file
-h, --help          Show the help message

Examples

# Run all queries with NVRTC and export timing to CSV
./src/tpch_demo_refactored /path/to/tpch-dbgen -o results.csv

# Run with NVCC compiler and verbose output
./src/tpch_demo_refactored /path/to/tpch-dbgen --nvcc -v

# Compare NVRTC vs NVCC compilation times
./src/tpch_demo_refactored /path/to/tpch-dbgen -o nvrtc_timing.csv
./src/tpch_demo_refactored /path/to/tpch-dbgen --nvcc -o nvcc_timing.csv

CSV Output Format

query,compiler,lineitem_count,compilation_ms,kernel_ms,total_ms
Q1,NVRTC,6001215,117.38,30.65,148.04
Q2,NVRTC,6001215,535.31,0.24,535.54
...
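
For post-processing the timing file, a row can be split into typed fields. The following is a minimal C++ sketch, not part of the repository: `TimingRow`, `parseTimingRow`, and `compilationShare` are hypothetical names, and the parser assumes exactly six comma-separated fields without quoting, as in the sample rows above.

```cpp
#include <sstream>
#include <string>

// One row of the timing CSV; the fields mirror the header line shown above.
struct TimingRow {
    std::string query;
    std::string compiler;
    long lineitemCount = 0;
    double compilationMs = 0.0;
    double kernelMs = 0.0;
    double totalMs = 0.0;
};

// Parses a data row such as "Q1,NVRTC,6001215,117.38,30.65,148.04".
// Assumption: six comma-separated fields, no quoting, no embedded commas.
inline TimingRow parseTimingRow(const std::string& line) {
    std::istringstream in(line);
    std::string field;
    TimingRow row;
    std::getline(in, row.query, ',');
    std::getline(in, row.compiler, ',');
    std::getline(in, field, ','); row.lineitemCount = std::stol(field);
    std::getline(in, field, ','); row.compilationMs = std::stod(field);
    std::getline(in, field, ','); row.kernelMs = std::stod(field);
    std::getline(in, field);      row.totalMs = std::stod(field);
    return row;
}

// Fraction of the total time spent on compilation (0 if the total is not positive).
inline double compilationShare(const TimingRow& row) {
    return row.totalMs > 0.0 ? row.compilationMs / row.totalMs : 0.0;
}
```

Applied to the Q2 sample row, `compilationShare` shows that almost all of the measured time is JIT compilation rather than kernel execution.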

Code Generation Framework (codegen)

The query engine uses a composable operator framework to generate CUDA kernels. Operators are chained together using a producer-consumer pattern.

Core Infrastructure

Class          Description
Codegen        Code generator that builds CUDA kernel source code. Manages indentation, parameters, and code blocks.
Operator       Abstract base class for all operators. Defines the produce(codegen, consume) interface.
UnaryOperator  Base class for operators with a single child operator.
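
To illustrate the producer-consumer pattern, here is a minimal, self-contained C++ sketch. It is not the repository's code: everything in the `sketch` namespace (`Codegen`, `Operator`, `Scan`, `Filter`, `generateExample`) is a hypothetical stand-in, and the real classes may differ in names and signatures. The idea shown is that each operator emits its own code and invokes the `consume` callback at the point where a tuple is available to its parent.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration only; not the repository's API.
namespace sketch {

// Collects emitted lines and tracks the current indentation level.
class Codegen {
public:
    void line(const std::string& text) { out_ += std::string(indent_ * 2, ' ') + text + "\n"; }
    void open(const std::string& header) { line(header + " {"); ++indent_; }
    void close() { --indent_; line("}"); }
    const std::string& str() const { return out_; }
private:
    std::string out_;
    int indent_ = 0;
};

using Consume = std::function<void()>;

struct Operator {
    virtual ~Operator() = default;
    // Emit this operator's code; call consume() where a tuple becomes available.
    virtual void produce(Codegen& cg, const Consume& consume) = 0;
};

// Leaf operator: emits a loop over n tuples.
struct Scan : Operator {
    void produce(Codegen& cg, const Consume& consume) override {
        cg.open("for (int idx = 0; idx < n; ++idx)");
        consume();
        cg.close();
    }
};

// Unary operator: asks its child to produce, wrapping the parent's code in an if-block.
struct Filter : Operator {
    Filter(std::unique_ptr<Operator> child, std::string pred)
        : child_(std::move(child)), pred_(std::move(pred)) {}
    void produce(Codegen& cg, const Consume& consume) override {
        child_->produce(cg, [&] {
            cg.open("if (" + pred_ + ")");
            consume();
            cg.close();
        });
    }
    std::unique_ptr<Operator> child_;
    std::string pred_;
};

// Chains Scan -> Filter; the caller's callback emits the innermost statement.
inline std::string generateExample() {
    Codegen cg;
    Filter filter(std::make_unique<Scan>(), "t[idx].x > 0");
    filter.produce(cg, [&] { cg.line("sum += t[idx].x;"); });
    return cg.str();
}

}  // namespace sketch
```

Calling `sketch::generateExample()` returns a `for` loop containing an `if` block that contains the summation statement, which mirrors how a scan, a selection, and an aggregation nest in the generated kernel.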

Table Scan Operators

Operator                Description
GPUTableScan            Basic table scan with a simple for-loop. Iterates over tuples with the idx variable.
GPUTableScanGridStride  Table scan using the grid-stride loop pattern for better GPU occupancy.
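
The grid-stride pattern can be modeled on the CPU to check that every tuple is visited exactly once, whatever the launch configuration. The C++ sketch below is an illustration under that assumption, not the generated kernel code; `gridStrideVisits` and `coversEachTupleOnce` are hypothetical helper names.

```cpp
#include <cstddef>
#include <vector>

// CPU model of a grid-stride loop: counts how often each of the n tuple indices
// is visited by a launch of gridDim blocks with blockDim threads each.
inline std::vector<int> gridStrideVisits(std::size_t n, std::size_t gridDim, std::size_t blockDim) {
    std::vector<int> visits(n, 0);
    const std::size_t stride = gridDim * blockDim;  // total number of threads in the grid
    for (std::size_t block = 0; block < gridDim; ++block) {
        for (std::size_t thread = 0; thread < blockDim; ++thread) {
            // Same shape as the CUDA idiom:
            //   for (i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += gridDim.x * blockDim.x)
            for (std::size_t idx = block * blockDim + thread; idx < n; idx += stride) {
                ++visits[idx];
            }
        }
    }
    return visits;
}

// True if every tuple index is processed exactly once.
inline bool coversEachTupleOnce(std::size_t n, std::size_t gridDim, std::size_t blockDim) {
    for (int v : gridStrideVisits(n, gridDim, blockDim)) {
        if (v != 1) return false;
    }
    return true;
}
```

Because each thread advances by the total thread count, the grid size can be chosen independently of the table size: a small grid loops several times per thread, and an oversized grid leaves the surplus threads idle.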

Selection (Filter) Operators

Operator   Description
Selection  Filter rows using a predicate (for basic scans). Generates if (predicate) { ... }.

Bitmap Operations

Operator                    Description
BitmapBuild                 Build a bitmap from qualifying rows: bitmap[keyExpr] = 1.
BitmapJoin                  Filter rows by checking bitmap: if (bitmap[keyExpr]) { ... }.
MultiBitmapJoin             Join on multiple bitmaps simultaneously (AND logic).
AntiBitmapJoin              Anti-join: passes rows where bitmap is NOT set: if (!bitmap[keyExpr]) { ... }.
TableBitmapBuild            Standalone operator that scans a table and builds a bitmap in one step.
TableBitmapBuildGridStride  Grid-stride version of TableBitmapBuild.
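
The build and probe steps can be sketched on the CPU as follows. This is a simplified C++ model rather than the generated CUDA code: `Order`, `Item`, `buildBitmap`, `probeSum`, and `demoProbe` are hypothetical names, keys are assumed to be dense non-negative integers, and one byte per key is used where a real implementation might pack bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Order { int orderkey; bool qualifies; };  // hypothetical build-side row
struct Item  { int orderkey; double price; };    // hypothetical probe-side row

// BitmapBuild step: bitmap[key] = 1 for every qualifying build-side row.
inline std::vector<std::uint8_t> buildBitmap(const std::vector<Order>& orders, int maxKey) {
    std::vector<std::uint8_t> bitmap(static_cast<std::size_t>(maxKey) + 1, 0);
    for (const Order& o : orders) {
        assert(o.orderkey >= 0 && o.orderkey <= maxKey);
        if (o.qualifies) bitmap[o.orderkey] = 1;
    }
    return bitmap;
}

// Probe step: anti == false models BitmapJoin (keep rows whose key is set),
// anti == true models AntiBitmapJoin (keep rows whose key is NOT set).
inline double probeSum(const std::vector<Item>& items,
                       const std::vector<std::uint8_t>& bitmap, bool anti) {
    double sum = 0.0;
    for (const Item& it : items) {
        const bool hit = bitmap[it.orderkey] != 0;
        if (hit != anti) sum += it.price;
    }
    return sum;
}

// Tiny fixed data set: orders 1 and 3 qualify; items reference orders 1, 2, 3, 3.
inline double demoProbe(bool anti) {
    const std::vector<Order> orders = {{1, true}, {2, false}, {3, true}};
    const std::vector<Item> items = {{1, 10.0}, {2, 20.0}, {3, 30.0}, {3, 5.0}};
    return probeSum(items, buildBitmap(orders, 3), anti);
}
```

The probe side needs only one array read per row and no hash table, which is why this approach suits key domains that are dense enough to index directly.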

Array Operations

Operator     Description
ArrayLookup  Read value from array: resultVar = array[keyExpr].
ArrayStore   Write value to array: array[keyExpr] = valueExpr.

Aggregation Operators

Operator               Description
AtomicArrayAgg         Atomic aggregation by key: atomicAdd(&array[bucketExpr], valueExpr).
AtomicArrayCount       Atomic count by key: atomicAdd(&array[keyExpr], 1).
AtomicCount            Atomic increment of a single counter.
SharedMemReductionAgg  High-performance reduction using shared memory (one atomicAdd per block).
ArrayMaxReduction      Find maximum value in an array using shared memory reduction.
KeyedAggregation       GROUP BY aggregation with multiple aggregates per bucket.
KeyedDualAggregation   Dual aggregation with bucket key (GROUP BY with two aggregates).
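
The difference between per-element atomics and a per-block reduction can be modeled on the CPU by counting how many updates reach the global result. The C++ sketch below is a simplified model and not the code emitted by SharedMemReductionAgg: `ReductionResult`, `naiveAtomicSum`, `blockReductionSum`, and `sampleValues` are hypothetical names, a local buffer stands in for shared memory, each block is assumed to handle exactly `blockDim` elements, and `blockDim` is assumed to be a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ReductionResult {
    long long sum;           // final aggregate
    std::size_t atomicAdds;  // number of updates to the global result
};

// Naive model: every element issues its own atomicAdd on the global result.
inline ReductionResult naiveAtomicSum(const std::vector<int>& v) {
    ReductionResult r{0, 0};
    for (int x : v) { r.sum += x; ++r.atomicAdds; }
    return r;
}

// Block model: each block tree-reduces blockDim elements in a local buffer
// (standing in for shared memory) and then performs a single atomicAdd.
inline ReductionResult blockReductionSum(const std::vector<int>& v, std::size_t blockDim) {
    assert(blockDim > 0 && (blockDim & (blockDim - 1)) == 0);  // power of two
    ReductionResult r{0, 0};
    std::vector<long long> shared(blockDim);
    for (std::size_t base = 0; base < v.size(); base += blockDim) {
        for (std::size_t t = 0; t < blockDim; ++t) {
            shared[t] = (base + t < v.size()) ? v[base + t] : 0;  // pad the tail with 0
        }
        // Halve the number of active "threads" in every step.
        for (std::size_t s = blockDim / 2; s > 0; s /= 2) {
            for (std::size_t t = 0; t < s; ++t) shared[t] += shared[t + s];
        }
        r.sum += shared[0];  // the one atomicAdd of this block
        ++r.atomicAdds;
    }
    return r;
}

// Test data: the integers 1..n.
inline std::vector<int> sampleValues(int n) {
    std::vector<int> v;
    for (int i = 1; i <= n; ++i) v.push_back(i);
    return v;
}
```

For 1000 values and a block size of 256, both variants compute the same sum, but the block model touches the global result 4 times instead of 1000, which is the contention that the shared memory variant avoids.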

Expression Operators

Operator     Description
ComputeExpr  Compute a derived value: type varName = expression.

Example: Building a Query Kernel

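#include <memory>
#include <string>
#include <utility>

#include "codegen/codegen.hpp"

// Generate a kernel for:
//   SELECT SUM(l_extendedprice) FROM lineitem WHERE l_shipdate >= '1994-01-01'
codegen::Codegen cg;
cg.setKernelName("aggregateRevenue");

// Scan lineitem with a grid-stride loop; each tuple is bound to the variable li.
auto scan = std::make_unique<codegen::GPUTableScanGridStride>("lineitem", "LineItemTuple", "li");
// Keep only tuples that pass the predicate; date_start must be available inside the kernel.
auto filter = std::make_unique<codegen::Selection>(std::move(scan),
    "date_ge(li.l_shipdate, date_start)");
// Atomically add l_extendedprice into bucket 0 of d_result.
codegen::AtomicArrayAgg agg(std::move(filter), "d_result", "0", "li.l_extendedprice", "double");

// The aggregation is the last operator in the chain, so the consume callback is empty.
agg.produce(&cg, [](){});
std::string kernelCode = cg.print();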

Project Structure

CUDACodeGeneral/
├── codegen/
│   ├── codegen.hpp      # Codegen framework
│   └── operator.hpp     # Operator framework
├── queries/tpch/
│   └── tpch_q1.hpp ... tpch_q22.hpp  # TPC-H query implementations
├── schema/tables/tpch/
│   └── tpch_schema.hpp     # TPC-H table definitions
├── src/
│   ├── tpch_demo_refactored.cu  # Main executable
│   ├── tpch_loader.hpp     # TPC-H data loader
│   ├── launcher.hpp        # Kernel launch utilities
│   └── jit_compiler.hpp    # NVRTC/NVCC compilation
└── build/                  # Build output

Performance Notes

  • NVRTC compilation is typically 3-6x faster than NVCC
  • Kernel execution times are identical between compilers
  • Shared memory reductions significantly outperform naive atomic aggregations
  • Bitmap joins provide efficient multi-table query execution

License

See LICENSE file for details.
