64 changes: 64 additions & 0 deletions gpt/local/course/README.md
@@ -0,0 +1,64 @@
# Understanding GPT From Scratch

## A Course Based on Andrej Karpathy's Implementations

**Target Audience:** Early CS students who understand basic programming (Python, loops, functions, classes) but have no background in AI/ML or GPT.

---

## What This Course Covers

This course walks you through **two real GPT implementations** by Andrej Karpathy, starting from zero AI knowledge:

| Implementation | Location | What It Is |
|---|---|---|
| **microGPT** | `../../8627fe009c40f57531cb18360106ce95/microgpt.py` | ~200 lines of pure Python. No libraries. Builds everything from scratch — autograd, neural networks, the full transformer. |
| **minGPT** | `../../minGPT/` | A clean PyTorch implementation. Production-style code with proper modules, training infrastructure, and real GPT-2 compatibility. |

---

## Course Structure

| Folder | Title | Key Idea |
|---|---|---|
| `ch01_What_is_a_Language_Model/` | What is a Language Model? | The big picture — predicting the next word |
| `ch02_Tokenization/` | Tokenization | Turning text into numbers a computer can process |
| `ch03_Autograd/` | Autograd | Teaching computers to do calculus automatically |
| `ch04_Neural_Network_Building_Blocks/` | Neural Network Building Blocks | Linear layers, activation functions, softmax |
| `ch05_Attention_and_Transformers/` | Attention & Transformers | The core innovation behind GPT |
| `ch06_Training_Loop_and_Optimization/` | Training: How Models Learn | Loss functions, backprop, optimizers |
| `ch07_microGPT_Full_Walkthrough/` | microGPT: Full Walkthrough | Line-by-line through the 200-line pure-Python GPT |
| `ch08_Scaling_Up_with_PyTorch/` | Scaling Up with PyTorch | Why we need frameworks, intro to PyTorch |
| `ch09_minGPT_Model_Deep_Dive/` | minGPT: Model Deep Dive | The production-quality GPT architecture |
| `ch10_Training_and_Inference/` | minGPT: Training & Inference | Training loops, text generation, real demos |
| `ch11_Side_by_Side_Comparison/` | Side-by-Side Comparison | microGPT vs minGPT — same ideas, different scales |
| `ch12_Exercises_and_Next_Steps/` | Exercises & Next Steps | Hands-on challenges and further reading |

---

## How to Use This Course

1. **Read chapters in order** — each builds on the previous one
2. **Run the code examples** — every chapter has runnable `.py` files in its folder
3. **Chapters 01-06** teach the concepts with small, isolated examples
4. **Chapters 07-10** apply those concepts to the real Karpathy code
5. **Chapter 11** ties everything together
6. **Chapter 12** gives you challenges to test your understanding

## Prerequisites

- Python basics: variables, loops, functions, classes, lists, dictionaries
- Basic math: addition, multiplication, exponents (no calculus needed — we teach it!)
- A terminal / command line
- Python 3.8+ installed

## Running Examples

```bash
# For chapters 01-07 (pure Python, no dependencies):
python ch01_What_is_a_Language_Model/language_model_idea.py

# For chapters 08-10 (needs PyTorch):
pip install torch
python ch08_Scaling_Up_with_PyTorch/pytorch_basics.py
```
102 changes: 102 additions & 0 deletions gpt/local/course/ch01_What_is_a_Language_Model/README.md
@@ -0,0 +1,102 @@
# Chapter 01: What is a Language Model?

## The One-Sentence Summary

A language model is a program that **predicts the next word** (or character) given some previous words.

---

## Think of It Like Autocomplete

You've used autocomplete on your phone:

```
You type: "How are ___"
Phone suggests: "you"
```

That's a language model! It looked at "How are" and predicted "you" is the most likely next word.

GPT is the same idea — just way more powerful.

---

## The Core Loop

Every language model follows this pattern:

```
1. Look at some text → "The cat sat on the"
2. Predict the next word → "mat" (70%), "floor" (20%), "dog" (10%)
3. Pick one (sample or take best) → "mat"
4. Append it and repeat → "The cat sat on the mat"
```

That's it. GPT generates entire essays by repeating steps 1-4 over and over.
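The loop can be sketched in a few lines of Python. This is only an illustration: `predict_next` here is a made-up stand-in that returns fixed probabilities, where a real model would compute them from its parameters.

```python
import random

def predict_next(text):
    # Stand-in for a real model: a fixed, made-up distribution
    # for the next word after "The cat sat on the".
    return {"mat": 0.7, "floor": 0.2, "dog": 0.1}

text = "The cat sat on the"
probs = predict_next(text)                  # step 2: predict
words = list(probs)
chosen = random.choices(words, weights=[probs[w] for w in words])[0]  # step 3: sample
text = text + " " + chosen                  # step 4: append, then repeat
print(text)
```

A real model would run this in a loop, feeding the growing text back into `predict_next` each time.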

---

## But How Does It Know?

The model **learns patterns from data**. If you show it millions of sentences, it notices:
- "The cat sat on the ___" → usually "mat", "floor", "chair"
- "Once upon a ___" → usually "time"
- "def __init__(self, ___" → usually a parameter name

It doesn't "understand" language. It's really good at **pattern matching**.

---

## The Two Phases

### Phase 1: Training (Learning)
- Feed the model tons of text
- For each position, it tries to predict the next character/word
- When it's wrong, it adjusts its internal numbers to be less wrong next time
- Repeat millions of times

### Phase 2: Inference (Generating)
- Give it a starting text (prompt)
- Let it predict the next token, one at a time
- It generates new text that "sounds like" its training data
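For the counting model in this chapter, both phases fit in a few lines. The corpus and the `train`/`infer` helper names below are made up for illustration; word-level bigram counting stands in for the character-level version in `language_model_idea.py`.

```python
from collections import Counter

def train(corpus):
    # Phase 1, "learning": count which word follows which (bigram counts).
    model = {}
    for sentence in corpus:
        words = sentence.split()
        for a, b in zip(words, words[1:]):
            model.setdefault(a, Counter())[b] += 1
    return model

def infer(model, word):
    # Phase 2, "generating": pick the most common successor of `word`.
    return model[word].most_common(1)[0][0]

corpus = ["the cat sat", "the cat ran", "the dog sat"]
model = train(corpus)
print(infer(model, "the"))  # → cat ("cat" follows "the" twice, "dog" once)
```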

---

## Characters vs Words vs Tokens

Language models can predict at different levels:

| Level | Example Input | Predicts |
|---|---|---|
| **Character-level** | `['H', 'e', 'l', 'l']` | `'o'` |
| **Word-level** | `['The', 'cat']` | `'sat'` |
| **Subword (BPE)** | `['The', ' cat', ' s']` | `'at'` |

- **microGPT** uses character-level (simplest)
- **minGPT** can use any level; its demos use character-level and BPE
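The first two levels are easy to see in plain Python (subword tokenization is covered in Chapter 02):

```python
text = "The cat"

char_tokens = list(text)     # character-level: one token per character
word_tokens = text.split()   # word-level: one token per whitespace-separated word

print(char_tokens)  # ['T', 'h', 'e', ' ', 'c', 'a', 't']
print(word_tokens)  # ['The', 'cat']
```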

---

## What's Inside a Language Model?

At its core, a language model is just a **function with adjustable numbers** (parameters):

```
f(input_tokens, parameters) → probability of each possible next token
```

- The **parameters** are millions of numbers that encode patterns
- **Training** = finding good values for those numbers
- **Architecture** = how those numbers are organized and combined

GPT uses an architecture called a **Transformer**, which we'll learn about in Chapter 05.
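A toy version of that function, with a hand-written parameter table standing in for the millions of learned weights:

```python
def f(input_tokens, parameters):
    # Look up the distribution over next tokens given the last token.
    # In GPT, this lookup is replaced by a deep neural network.
    last = input_tokens[-1]
    return parameters.get(last, {})

parameters = {
    "the": {"cat": 0.6, "dog": 0.4},  # toy values, not learned
}

print(f(["the"], parameters))  # {'cat': 0.6, 'dog': 0.4}
```

Training would adjust the numbers inside `parameters`; the architecture is how `f` combines them.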

---

## Run the Example

The file `language_model_idea.py` in this folder shows the simplest possible "language model" — just counting letter frequencies. It's silly and bad, but it captures the core idea.

```bash
python language_model_idea.py
```
104 changes: 104 additions & 0 deletions gpt/local/course/ch01_What_is_a_Language_Model/language_model_idea.py
@@ -0,0 +1,104 @@
"""
Chapter 01: The Simplest Possible "Language Model"

This is NOT a real language model — it's a toy to show the core idea:
1. Learn patterns from data (training)
2. Generate new text using those patterns (inference)

We simply count how often each character follows another character.
"""

import random

# ============================================================
# STEP 1: Our "training data" — a few names
# ============================================================
training_data = [
"emma", "olivia", "ava", "sophia", "isabella",
"mia", "charlotte", "amelia", "harper", "evelyn",
]

print("=== Training Data ===")
for name in training_data:
    print(f" {name}")

# ============================================================
# STEP 2: TRAINING — Count character transitions
# ============================================================
# We'll count: given character X, how often does character Y come next?
# We use a special character '.' to mean "start" or "end" of a name.

# Build a dictionary of dictionaries:
# counts['a']['b'] = number of times 'b' follows 'a' in the training data
counts = {}

for name in training_data:
    # Add start/end markers: ".emma."
    chars = ['.'] + list(name) + ['.']
    for i in range(len(chars) - 1):
        current = chars[i]
        next_char = chars[i + 1]
        if current not in counts:
            counts[current] = {}
        counts[current][next_char] = counts[current].get(next_char, 0) + 1

# Let's look at what follows 'a':
print("\n=== What follows 'a' in training data? ===")
if 'a' in counts:
    total = sum(counts['a'].values())
    for char, count in sorted(counts['a'].items(), key=lambda x: -x[1]):
        prob = count / total
        display = "END" if char == '.' else char
        print(f" '{display}' : {count} times ({prob:.0%})")

# ============================================================
# STEP 3: Convert counts to probabilities
# ============================================================
probs = {}
for current_char, next_chars in counts.items():
    total = sum(next_chars.values())
    probs[current_char] = {}
    for next_char, count in next_chars.items():
        probs[current_char][next_char] = count / total

# ============================================================
# STEP 4: INFERENCE — Generate new names!
# ============================================================
print("\n=== Generated Names (sampling from our 'model') ===")
random.seed(42)

for i in range(10):
    name = []
    current = '.'  # start token

    for _ in range(20):  # max length safety
        if current not in probs:
            break
        # Get possible next characters and their probabilities
        next_chars = list(probs[current].keys())
        weights = [probs[current][c] for c in next_chars]

        # Sample randomly, weighted by probability
        chosen = random.choices(next_chars, weights=weights, k=1)[0]

        if chosen == '.':  # end token
            break
        name.append(chosen)
        current = chosen

    print(f" {i+1:2d}. {''.join(name)}")

# ============================================================
# KEY TAKEAWAYS
# ============================================================
print("""
=== Key Takeaways ===
1. We LEARNED patterns from data (counted character transitions)
2. We GENERATED new text by sampling from those patterns
3. The generated names "sound like" the training data but are new

This is exactly what GPT does — just with WAY more sophisticated
pattern detection (neural networks instead of simple counting).

Next chapter: How do we turn text into numbers? → Tokenization
""")