64 changes: 64 additions & 0 deletions gpt/local/course/README.md
@@ -0,0 +1,64 @@
# Understanding GPT From Scratch

## A Course Based on Andrej Karpathy's Implementations

**Target Audience:** Early CS students who understand basic programming (Python, loops, functions, classes) but have no background in AI/ML or GPT.

---

## What This Course Covers

This course walks you through **two real GPT implementations** by Andrej Karpathy, starting from zero AI knowledge:

| Implementation | Location | What It Is |
|---|---|---|
| **microGPT** | `../../8627fe009c40f57531cb18360106ce95/microgpt.py` | ~200 lines of pure Python. No libraries. Builds everything from scratch — autograd, neural networks, the full transformer. |
| **minGPT** | `../../minGPT/` | A clean PyTorch implementation. Production-style code with proper modules, training infrastructure, and real GPT-2 compatibility. |

---

## Course Structure

| Folder | Title | Key Idea |
|---|---|---|
| `ch01_What_is_a_Language_Model/` | What is a Language Model? | The big picture — predicting the next word |
| `ch02_Tokenization/` | Tokenization | Turning text into numbers a computer can process |
| `ch03_Autograd/` | Autograd | Teaching computers to do calculus automatically |
| `ch04_Neural_Network_Building_Blocks/` | Neural Network Building Blocks | Linear layers, activation functions, softmax |
| `ch05_Attention_and_Transformers/` | Attention & Transformers | The core innovation behind GPT |
| `ch06_Training_Loop_and_Optimization/` | Training: How Models Learn | Loss functions, backprop, optimizers |
| `ch07_microGPT_Full_Walkthrough/` | microGPT: Full Walkthrough | Line-by-line through the 200-line pure-Python GPT |
| `ch08_Scaling_Up_with_PyTorch/` | Scaling Up with PyTorch | Why we need frameworks, intro to PyTorch |
| `ch09_minGPT_Model_Deep_Dive/` | minGPT: Model Deep Dive | The production-quality GPT architecture |
| `ch10_Training_and_Inference/` | minGPT: Training & Inference | Training loops, text generation, real demos |
| `ch11_Side_by_Side_Comparison/` | Side-by-Side Comparison | microGPT vs minGPT — same ideas, different scales |
| `ch12_Exercises_and_Next_Steps/` | Exercises & Next Steps | Hands-on challenges and further reading |

---

## How to Use This Course

1. **Read chapters in order** — each builds on the previous one
2. **Run the code examples** — every chapter has runnable `.py` files in its folder
3. **Chapters 01-06** teach the concepts with small, isolated examples
4. **Chapters 07-10** apply those concepts to the real Karpathy code
5. **Chapter 11** ties everything together
6. **Chapter 12** gives you challenges to test your understanding

## Prerequisites

- Python basics: variables, loops, functions, classes, lists, dictionaries
- Basic math: addition, multiplication, exponents (no calculus needed — we teach it!)
- A terminal / command line
- Python 3.8+ installed

## Running Examples

```bash
# For chapters 01-07 (pure Python, no dependencies):
python ch01_What_is_a_Language_Model/language_model_idea.py

# For chapters 08-10 (needs PyTorch):
pip install torch
python ch08_Scaling_Up_with_PyTorch/pytorch_basics.py
```
102 changes: 102 additions & 0 deletions gpt/local/course/ch01_What_is_a_Language_Model/README.md
@@ -0,0 +1,102 @@
# Chapter 01: What is a Language Model?

## The One-Sentence Summary

A language model is a program that **predicts the next word** (or character) given some previous words.

---

## Think of It Like Autocomplete

You've used autocomplete on your phone:

```
You type: "How are ___"
Phone suggests: "you"
```

That's a language model! It looked at "How are" and predicted "you" is the most likely next word.

GPT is the same idea — just way more powerful.

---

## The Core Loop

Every language model follows this pattern:

```
1. Look at some text → "The cat sat on the"
2. Predict the next word → "mat" (70%), "floor" (20%), "dog" (10%)
3. Pick one (sample or take best) → "mat"
4. Append it and repeat → "The cat sat on the mat"
```

That's it. GPT generates entire essays by repeating steps 1-4 over and over.
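The loop can be sketched in a few lines of Python. This is only an illustration: `predict_next` here is a made-up stand-in that returns fixed probabilities, where a real model would compute them from its parameters.

```python
import random

def predict_next(text):
    # Stand-in for a real model: a fixed, made-up distribution
    # for the next word after "The cat sat on the".
    return {"mat": 0.7, "floor": 0.2, "dog": 0.1}

text = "The cat sat on the"
probs = predict_next(text)                  # step 2: predict
words = list(probs)
chosen = random.choices(words, weights=[probs[w] for w in words])[0]  # step 3: sample
text = text + " " + chosen                  # step 4: append, then repeat
print(text)
```

A real model would run this in a loop, feeding the growing text back into `predict_next` each time.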

---

## But How Does It Know?

The model **learns patterns from data**. If you show it millions of sentences, it notices:
- "The cat sat on the ___" → usually "mat", "floor", "chair"
- "Once upon a ___" → usually "time"
- "def __init__(self, ___" → usually a parameter name

It doesn't "understand" language. It's really good at **pattern matching**.

---

## The Two Phases

### Phase 1: Training (Learning)
- Feed the model tons of text
- For each position, it tries to predict the next character/word
- When it's wrong, it adjusts its internal numbers to be less wrong next time
- Repeat millions of times

### Phase 2: Inference (Generating)
- Give it a starting text (prompt)
- Let it predict the next token, one at a time
- It generates new text that "sounds like" its training data
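For the counting model in this chapter, both phases fit in a few lines. The corpus and the `train`/`infer` helper names below are made up for illustration; word-level bigram counting stands in for the character-level version in `language_model_idea.py`.

```python
from collections import Counter

def train(corpus):
    # Phase 1, "learning": count which word follows which (bigram counts).
    model = {}
    for sentence in corpus:
        words = sentence.split()
        for a, b in zip(words, words[1:]):
            model.setdefault(a, Counter())[b] += 1
    return model

def infer(model, word):
    # Phase 2, "generating": pick the most common successor of `word`.
    return model[word].most_common(1)[0][0]

corpus = ["the cat sat", "the cat ran", "the dog sat"]
model = train(corpus)
print(infer(model, "the"))  # → cat ("cat" follows "the" twice, "dog" once)
```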

---

## Characters vs Words vs Tokens

Language models can predict at different levels:

| Level | Example Input | Predicts |
|---|---|---|
| **Character-level** | `['H', 'e', 'l', 'l']` | `'o'` |
| **Word-level** | `['The', 'cat']` | `'sat'` |
| **Subword (BPE)** | `['The', ' cat', ' s']` | `'at'` |

- **microGPT** uses character-level (simplest)
- **minGPT** can use any level; its demos use character-level and BPE
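The first two levels are easy to see in plain Python (subword tokenization is covered in Chapter 02):

```python
text = "The cat"

char_tokens = list(text)     # character-level: one token per character
word_tokens = text.split()   # word-level: one token per whitespace-separated word

print(char_tokens)  # ['T', 'h', 'e', ' ', 'c', 'a', 't']
print(word_tokens)  # ['The', 'cat']
```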

---

## What's Inside a Language Model?

At its core, a language model is just a **function with adjustable numbers** (parameters):

```
f(input_tokens, parameters) → probability of each possible next token
```

- The **parameters** are millions of numbers that encode patterns
- **Training** = finding good values for those numbers
- **Architecture** = how those numbers are organized and combined

GPT uses an architecture called a **Transformer**, which we'll learn about in Chapter 05.
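A toy version of that function, with a hand-written parameter table standing in for the millions of learned weights:

```python
def f(input_tokens, parameters):
    # Look up the distribution over next tokens given the last token.
    # In GPT, this lookup is replaced by a deep neural network.
    last = input_tokens[-1]
    return parameters.get(last, {})

parameters = {
    "the": {"cat": 0.6, "dog": 0.4},  # toy values, not learned
}

print(f(["the"], parameters))  # {'cat': 0.6, 'dog': 0.4}
```

Training would adjust the numbers inside `parameters`; the architecture is how `f` combines them.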

---

## Run the Example

The file `language_model_idea.py` in this folder shows the simplest possible "language model" — just counting letter frequencies. It's silly and bad, but it captures the core idea.

```bash
python language_model_idea.py
```
104 changes: 104 additions & 0 deletions gpt/local/course/ch01_What_is_a_Language_Model/language_model_idea.py
@@ -0,0 +1,104 @@
"""
Chapter 01: The Simplest Possible "Language Model"

This is NOT a real language model — it's a toy to show the core idea:
1. Learn patterns from data (training)
2. Generate new text using those patterns (inference)

We simply count how often each character follows another character.
"""

import random

# ============================================================
# STEP 1: Our "training data" — a few names
# ============================================================
training_data = [
"emma", "olivia", "ava", "sophia", "isabella",
"mia", "charlotte", "amelia", "harper", "evelyn",
]

print("=== Training Data ===")
for name in training_data:
    print(f" {name}")

# ============================================================
# STEP 2: TRAINING — Count character transitions
# ============================================================
# We'll count: given character X, how often does character Y come next?
# We use a special character '.' to mean "start" or "end" of a name.

# Build a dictionary of dictionaries:
# counts['a']['b'] = number of times 'b' follows 'a' in the training data
counts = {}

for name in training_data:
    # Add start/end markers: ".emma."
    chars = ['.'] + list(name) + ['.']
    for i in range(len(chars) - 1):
        current = chars[i]
        next_char = chars[i + 1]
        if current not in counts:
            counts[current] = {}
        counts[current][next_char] = counts[current].get(next_char, 0) + 1

# Let's look at what follows 'a':
print("\n=== What follows 'a' in training data? ===")
if 'a' in counts:
    total = sum(counts['a'].values())
    for char, count in sorted(counts['a'].items(), key=lambda x: -x[1]):
        prob = count / total
        display = "END" if char == '.' else char
        print(f" '{display}' : {count} times ({prob:.0%})")

# ============================================================
# STEP 3: Convert counts to probabilities
# ============================================================
probs = {}
for current_char, next_chars in counts.items():
    total = sum(next_chars.values())
    probs[current_char] = {}
    for next_char, count in next_chars.items():
        probs[current_char][next_char] = count / total

# ============================================================
# STEP 4: INFERENCE — Generate new names!
# ============================================================
print("\n=== Generated Names (sampling from our 'model') ===")
random.seed(42)

for i in range(10):
    name = []
    current = '.'  # start token

    for _ in range(20):  # max length safety
        if current not in probs:
            break
        # Get possible next characters and their probabilities
        next_chars = list(probs[current].keys())
        weights = [probs[current][c] for c in next_chars]

        # Sample randomly, weighted by probability
        chosen = random.choices(next_chars, weights=weights, k=1)[0]

        if chosen == '.':  # end token
            break
        name.append(chosen)
        current = chosen

    print(f" {i+1:2d}. {''.join(name)}")

# ============================================================
# KEY TAKEAWAYS
# ============================================================
print("""
=== Key Takeaways ===
1. We LEARNED patterns from data (counted character transitions)
2. We GENERATED new text by sampling from those patterns
3. The generated names "sound like" the training data but are new

This is exactly what GPT does — just with WAY more sophisticated
pattern detection (neural networks instead of simple counting).

Next chapter: How do we turn text into numbers? → Tokenization
""")