Video Diffusion CUDA Optimization

High-performance video generation pipeline with custom CUDA kernel optimizations for Stable Video Diffusion.

Project Overview

This project implements custom CUDA kernels to optimize video diffusion models, achieving:

2-3x speedup on attention operations over the baseline PyTorch implementation
30-40% reduction in overall pipeline latency
8-12 FPS on NVIDIA T4 GPU (Google Colab free tier)
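
As a rough sanity check on how the figures above relate to each other, the sketch below applies Amdahl's law. It assumes, hypothetically, that attention is the only part of the pipeline that gets faster and that it accounts for about 60% of baseline latency; neither figure comes from this project's measurements. The function names are illustrative and not part of the codebase.

```python
def overall_latency_reduction(attention_fraction: float, attention_speedup: float) -> float:
    """Amdahl's law: share of total latency removed when only attention is sped up.

    attention_fraction: share of baseline latency spent in attention (0..1).
                        ASSUMPTION: not a measured value from this project.
    attention_speedup:  speedup factor of the attention kernels (e.g. 2.0 for 2x).
    """
    if not 0.0 <= attention_fraction <= 1.0:
        raise ValueError("attention_fraction must be between 0 and 1")
    if attention_speedup <= 0.0:
        raise ValueError("attention_speedup must be positive")
    return attention_fraction * (1.0 - 1.0 / attention_speedup)


def ms_per_frame(fps: float) -> float:
    """Convert a frame rate into the per-frame latency budget in milliseconds."""
    if fps <= 0.0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


if __name__ == "__main__":
    assumed_fraction = 0.6  # hypothetical share of runtime spent in attention
    for speedup in (2.0, 3.0):
        reduction = overall_latency_reduction(assumed_fraction, speedup)
        print(f"{speedup:.0f}x attention speedup -> {reduction:.0%} lower overall latency")
    for fps in (8, 12):
        print(f"{fps} FPS -> {ms_per_frame(fps):.1f} ms per frame")
```

Under that assumed 60% share, a 2x attention speedup removes 30% of total latency and a 3x speedup removes 40%, which matches the range quoted above. A frame rate of 8-12 FPS corresponds to roughly 125 ms down to 83 ms per frame.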

Hardware Requirements

GPU: NVIDIA T4 (16 GB), available free on Google Colab
CUDA: 12.2+ (pre-installed on Colab)
Python: 3.10+
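
The requirements above can be checked from a notebook cell before building anything. The following is a minimal sketch, not a script shipped with this repository: `check_requirements` and `probe_environment` are hypothetical helper names, and the probe assumes PyTorch is the way CUDA is reached (as it is on Colab). Note that `torch.version.cuda` reports the CUDA version PyTorch was built against, which may differ from the installed toolkit.

```python
import sys

MIN_PYTHON = (3, 10)  # from "Python: 3.10+"
MIN_CUDA = (12, 2)    # from "CUDA: 12.2+"


def parse_version(text):
    """Turn a dotted version string such as '12.2' into a tuple of ints."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def check_requirements(python_version, cuda_version, gpu_name):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append(f"Python {python_version[0]}.{python_version[1]} is older than 3.10")
    if cuda_version is None:
        problems.append("no CUDA version reported (CPU-only PyTorch build, or PyTorch missing)")
    elif parse_version(cuda_version)[:2] < MIN_CUDA:
        problems.append(f"CUDA {cuda_version} is older than 12.2")
    if gpu_name is None:
        problems.append("no CUDA-capable GPU is visible")
    return problems


def probe_environment():
    """Collect (python_version, cuda_version, gpu_name) from the running interpreter."""
    cuda_version = None
    gpu_name = None
    try:
        import torch  # optional: the CUDA fields stay None if PyTorch is absent
    except ImportError:
        torch = None
    if torch is not None:
        cuda_version = torch.version.cuda  # None on CPU-only builds
        if torch.cuda.is_available():
            gpu_name = torch.cuda.get_device_name(0)
    return sys.version_info[:3], cuda_version, gpu_name


if __name__ == "__main__":
    issues = check_requirements(*probe_environment())
    print("Environment OK" if not issues else "\n".join(issues))
```

Keeping the version comparison separate from the probing step means the check itself can be unit-tested on a machine without a GPU.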

Project Structure

video_diffusion_cuda/
├── src/                # Source code
│   ├── baseline/       # Baseline PyTorch implementation
│   ├── cuda_kernels/   # Custom CUDA kernels
│   ├── extensions/     # PyTorch C++ extensions
│   ├── optimized/      # Optimized pipeline
│   └── utils/          # Utilities and profiling
├── tests/              # Test suite
│   ├── unit/           # Unit tests
│   ├── property/       # Property-based tests
│   └── integration/    # Integration tests
├── notebooks/          # Colab notebooks
├── docs/               # Documentation
└── benchmarks/         # Benchmark scripts

Quick Start (Google Colab)

See notebooks/setup_colab.ipynb for complete setup instructions.

Development Status

🚧 Under active development

Repository: Kash6/VideoDiffusionCUDA
