Add challenge 80: Grouped Query Attention (Medium)#215

Merged
kunal-mansukhani merged 2 commits into main from add-challenge-80-grouped-query-attention
Mar 11, 2026
Conversation


@claude claude bot commented Mar 10, 2026

Summary

  • Adds challenge 80: Grouped Query Attention (GQA), a medium-difficulty inference kernel challenge
  • GQA is used in virtually all modern LLMs (LLaMA-3, Mistral, Gemma, Phi-3) to reduce KV-cache memory during inference by sharing K/V heads across groups of Q heads
  • Solvers must implement scaled dot-product attention where num_q_heads / num_kv_heads consecutive Q heads attend to the same K/V head, requiring correct understanding of memory layout, head grouping, and softmax normalization
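The grouping rule described above can be sketched as a plain NumPy reference implementation. This is an illustrative sketch, not the challenge's actual `reference_impl`; the function name, tensor layout `(num_heads, seq_len, head_dim)`, and omission of a batch dimension are assumptions for clarity.

```python
import numpy as np

def gqa_forward(q, k, v):
    """Grouped Query Attention forward pass (illustrative sketch).

    q: (num_q_heads, seq_len, head_dim)
    k, v: (num_kv_heads, seq_len, head_dim)
    num_q_heads must be divisible by num_kv_heads; each group of
    num_q_heads // num_kv_heads consecutive Q heads shares one K/V head.
    """
    num_q_heads, seq_len, head_dim = q.shape
    num_kv_heads = k.shape[0]
    assert num_q_heads % num_kv_heads == 0
    group_size = num_q_heads // num_kv_heads

    out = np.empty_like(q)
    scale = 1.0 / np.sqrt(head_dim)
    for h in range(num_q_heads):
        kv = h // group_size  # consecutive Q heads map to the same KV head
        scores = (q[h] @ k[kv].T) * scale                 # (seq_len, seq_len)
        scores -= scores.max(axis=-1, keepdims=True)      # numerically stable softmax
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)
        out[h] = weights @ v[kv]
    return out
```

With `num_kv_heads == num_q_heads` this degenerates to standard multi-head attention, and with `num_kv_heads == 1` to multi-query attention, which matches the MHA-equivalent and MQA edge cases the functional tests cover.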

Checklist

  • challenge.html starts with <p>, has <h2> sections for Implementation Requirements, Example, Constraints
  • First example matches generate_example_test() values
  • Uses <pre> for 1D data (consistent)
  • Constraints includes performance test bullet: num_q_heads=32, num_kv_heads=8, seq_len=1024, head_dim=128
  • SVG visualization included (dark theme, #222 background)
  • challenge.py inherits ChallengeBase, all 6 methods present
  • reference_impl has assertions on shape, dtype, device
  • generate_functional_test returns 10 cases covering edge cases, powers-of-2, non-powers-of-2, MQA, MHA-equivalent, zero inputs, realistic sizes
  • Performance test fits 5x in 16GB VRAM (~600MB for perf test)
  • All 6 starter files present with correct parameter description comments
  • Pre-commit lint passes
  • Validated with run_challenge.py --action submit — all tests pass on NVIDIA TESLA T4
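A back-of-envelope check of the performance-test shape helps sanity-check the VRAM claim above. The figures below assume fp32 and batch size 1, which are assumptions (the harness settings and the intermediates behind the ~600MB estimate are not specified here); a naive implementation's dominant cost is the materialized score matrix.

```python
# Rough activation-memory estimate for the performance test shape
# (num_q_heads=32, num_kv_heads=8, seq_len=1024, head_dim=128).
# ASSUMPTIONS: fp32 elements, batch size 1; the real harness may differ.
num_q_heads, num_kv_heads, seq_len, head_dim = 32, 8, 1024, 128
bytes_per_el = 4  # fp32

q_bytes   = num_q_heads * seq_len * head_dim * bytes_per_el       # 16 MiB
kv_bytes  = 2 * num_kv_heads * seq_len * head_dim * bytes_per_el  # 8 MiB (K and V)
out_bytes = q_bytes                                               # 16 MiB
# A naive kernel also materializes the (num_q_heads, seq_len, seq_len)
# score matrix, which dominates:
scores_bytes = num_q_heads * seq_len * seq_len * bytes_per_el     # 128 MiB

total_mib = (q_bytes + kv_bytes + out_bytes + scores_bytes) / 2**20
print(f"~{total_mib:.0f} MiB")
```

Under these assumptions the total is well under the 16GB budget even at 5x, leaving headroom for softmax intermediates or larger batch sizes.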

🤖 Generated with Claude Code

Implements a GQA forward pass challenge inspired by real-world LLM
inference (LLaMA-3, Mistral, Gemma). Solvers must correctly handle
Q/K/V tensors with different head counts and implement scaled
dot-product attention with softmax over grouped KV heads.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
shxjames previously approved these changes Mar 10, 2026
- Add missing output matrices to the Example section (required by checklist)
- Convert example from <pre> notation to LaTeX \begin{bmatrix} for all
  Q, K, V, and output head matrices (required for 2D/3D data per CLAUDE.md)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@kunal-mansukhani kunal-mansukhani merged commit 945983a into main Mar 11, 2026
5 checks passed