
Comments

Add GPT Course: Understanding microGPT & minGPT #1

Open
RaikaSurendra wants to merge 1 commit into master from gpt/course-microgpt-mingpt

Conversation

@RaikaSurendra (Owner) commented Feb 23, 2026

GPT Course for Early CS Students

A 12-chapter course that walks through Andrej Karpathy's two GPT implementations — from zero AI knowledge to full understanding.

Course Structure

Chapter   Topic
Ch01      What is a Language Model
Ch02      Tokenization
Ch03      Autograd (Value class)
Ch04      Neural Network Building Blocks
Ch05      Attention & Transformers
Ch06      Training Loop & Optimization
Ch07      microGPT Full Walkthrough
Ch08      Scaling Up with PyTorch
Ch09      minGPT Model Deep Dive
Ch10      Training & Inference
Ch11      Side-by-Side Comparison
Ch12      Exercises & Next Steps

Details

  • 36 files (12 READMEs + 24 runnable Python scripts)
  • Ch01–07: Pure Python, zero dependencies
  • Ch08–10: Requires PyTorch
  • Ch11–12: Reference & exercises
  • All examples are runnable locally with python <script>.py

Summary by CodeRabbit

  • New Features
    • Added comprehensive "Understanding GPT From Scratch" course with 12 chapters
    • Includes runnable Python examples demonstrating language models, tokenization, autograd, neural networks, attention mechanisms, and training
    • Provides side-by-side comparison between pure Python (microGPT) and PyTorch (minGPT) implementations
    • Includes 12 guided exercises with starter code and next-steps reading list

A comprehensive course for early CS students covering:
- Ch01-06: Fundamentals (language models, tokenization, autograd, NN blocks, attention, training)
- Ch07: microGPT full annotated walkthrough (pure Python, zero dependencies)
- Ch08-10: minGPT with PyTorch (model architecture, trainer, inference)
- Ch11: Side-by-side comparison of both implementations
- Ch12: Exercises and next steps

All chapters include runnable Python examples.
Ch01-07 require no dependencies; Ch08-10 require PyTorch.
@coderabbitai bot commented Feb 23, 2026

📝 Walkthrough

This pull request introduces a comprehensive 12-chapter educational course titled "Understanding GPT From Scratch," featuring both microGPT (pure Python) and minGPT (PyTorch) implementations. The course covers language model fundamentals, tokenization, automatic differentiation, neural network components, attention mechanisms, training optimization, complete implementations, and exercises, with executable examples throughout.

Changes

Cohort / File(s) | Summary
Course Foundation
gpt/local/course/README.md
Main course README detailing structure, prerequisites, usage guide, and workflow for chapters 01-10 (pure Python and PyTorch).
Chapter 01: Language Models
gpt/local/course/ch01_What_is_a_Language_Model/README.md, gpt/local/course/ch01_What_is_a_Language_Model/language_model_idea.py, gpt/local/course/ch01_What_is_a_Language_Model/.../language_model_idea-checkpoint.py
Introduction to language models with character-level pattern learning. Includes README, toy implementation demonstrating transition probabilities and sampling, and checkpoint version.
Chapter 02: Tokenization
gpt/local/course/ch02_Tokenization/README.md, gpt/local/course/ch02_Tokenization/char_tokenizer.py, gpt/local/course/ch02_Tokenization/bpe_intuition.py
Tokenization techniques (character, word, BPE). Includes documentation, character-level tokenizer implementation with BOS token, and simplified BPE training algorithm.
Chapter 03: Autograd
gpt/local/course/ch03_Autograd/README.md, gpt/local/course/ch03_Autograd/value_basics.py, gpt/local/course/ch03_Autograd/computation_graph.py, gpt/local/course/ch03_Autograd/gradient_descent.py
Automatic differentiation foundations. Introduces Value class with forward/backward passes, computation graph visualization, topological sorting for gradients, and SGD learning demonstration.
Chapter 04: Neural Network Blocks
gpt/local/course/ch04_Neural_Network_Building_Blocks/README.md, gpt/local/course/ch04_Neural_Network_Building_Blocks/building_blocks.py, gpt/local/course/ch04_Neural_Network_Building_Blocks/mlp.py
Core neural network operations: linear layers, ReLU/GELU activations, softmax, normalization (LayerNorm/RMSNorm), embeddings. Includes autograd-enabled MLP with training loop.
Chapter 05: Attention & Transformers
gpt/local/course/ch05_Attention_and_Transformers/README.md, gpt/local/course/ch05_Attention_and_Transformers/attention_basics.py, gpt/local/course/ch05_Attention_and_Transformers/multi_head_attention.py, gpt/local/course/ch05_Attention_and_Transformers/transformer_block.py
Attention mechanisms and transformer architecture. Covers single-head attention with causal masking, multi-head attention with KV cache, and complete transformer block with residuals.
Chapter 06: Training
gpt/local/course/ch06_Training_Loop_and_Optimization/README.md, gpt/local/course/ch06_Training_Loop_and_Optimization/cross_entropy.py, gpt/local/course/ch06_Training_Loop_and_Optimization/adam_optimizer.py, gpt/local/course/ch06_Training_Loop_and_Optimization/training_loop.py
Training mechanics: cross-entropy loss, Adam optimizer, learning rate scheduling, and complete mini training loop with autograd-based next-token prediction on sequence data.
Chapter 07: microGPT
gpt/local/course/ch07_microGPT_Full_Walkthrough/README.md, gpt/local/course/ch07_microGPT_Full_Walkthrough/microgpt_annotated.py
Complete microGPT implementation: character-level tokenizer, transformer with KV cache, Adam optimizer, training loop, and inference with temperature sampling on name dataset.
Chapter 08: PyTorch Basics
gpt/local/course/ch08_Scaling_Up_with_PyTorch/README.md, gpt/local/course/ch08_Scaling_Up_with_PyTorch/pytorch_basics.py, gpt/local/course/ch08_Scaling_Up_with_PyTorch/pytorch_vs_manual.py
Introduction to PyTorch: tensors, autograd, nn.Module, embeddings, loss functions, batching. Side-by-side comparisons with manual Python implementations.
Chapter 09: minGPT Model
gpt/local/course/ch09_minGPT_Model_Deep_Dive/README.md, gpt/local/course/ch09_minGPT_Model_Deep_Dive/model_sizes.py, gpt/local/course/ch09_minGPT_Model_Deep_Dive/model_walkthrough.py
minGPT architecture breakdown: NewGELU, CausalSelfAttention, transformer blocks, and complete GPT model with parameter counting across various configurations.
Chapter 10: Training & Inference
gpt/local/course/ch10_Training_and_Inference/README.md, gpt/local/course/ch10_Training_and_Inference/generate_text.py, gpt/local/course/ch10_Training_and_Inference/trainer_explained.py
minGPT training and generation: temperature-scaled sampling, top-k filtering, trainer loop with gradient clipping, and sorting task demonstration.
Chapter 11: Comparison
gpt/local/course/ch11_Side_by_Side_Comparison/README.md, gpt/local/course/ch11_Side_by_Side_Comparison/comparison.py
Direct comparison of microGPT vs minGPT: philosophy, architecture mapping, component-by-component implementation differences, and use case recommendations.
Chapter 12: Exercises
gpt/local/course/ch12_Exercises_and_Next_Steps/README.md, gpt/local/course/ch12_Exercises_and_Next_Steps/exercise_solutions.py
Exercise problems (beginner to advanced), next steps resources, quick reference card, and starter solutions including bigram models, temperature exploration, and activation functions (tanh, sigmoid).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰 A course from naught to GPT's might,
From tokens small to transformers bright,
Through Python pure and PyTorch's grace,
We hop through attention's careful space,
Twelve chapters deep, with code in hand,
Now rabbits truly understand!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name           Status      Explanation / Resolution
Docstring Coverage   ⚠️ Warning   Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name           Status      Explanation
Description Check    ✅ Passed    Check skipped - CodeRabbit’s high-level summary is enabled.
Title check          ✅ Passed    The title accurately summarizes the main change: adding a comprehensive 12-chapter GPT course covering microGPT and minGPT implementations.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)

🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch gpt/course-microgpt-mingpt

Tip

Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai bot left a comment

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

🟡 Minor comments (22)
gpt/local/course/ch06_Training_Loop_and_Optimization/cross_entropy.py-102-116 (1)

102-116: ⚠️ Potential issue | 🟡 Minor

Misleading comments in the sequence example.

The comments on lines 103-107 say "model correctly predicts" but this is misleading. The first probability in each tuple represents the probability assigned to that character, not necessarily a "correct" prediction. For example, line 103 shows 0.1 (10%) for 'h', which would typically be considered a poor prediction, not a correct one.

Additionally, the variable name l on line 114 is ambiguous (easily confused with 1 or I). Consider renaming to loss_val or char_loss.

Proposed fix
 # Simulating predictions for "hello" where model gets better over positions
 sequence = [
-    ("h", [0.1, 0.8, 0.05, 0.05]),    # model correctly predicts 'h' with 10%
-    ("e", [0.05, 0.05, 0.8, 0.1]),     # model correctly predicts 'e' with 80%
-    ("l", [0.2, 0.6, 0.1, 0.1]),       # model correctly predicts 'l' with 60%
-    ("l", [0.1, 0.1, 0.7, 0.1]),       # model correctly predicts second 'l' with 70%
-    ("o", [0.1, 0.1, 0.1, 0.7]),       # model correctly predicts 'o' with 70%
+    ("h", [0.1, 0.8, 0.05, 0.05]),    # model assigns P=0.1 to 'h' (weak)
+    ("e", [0.8, 0.05, 0.05, 0.1]),    # model assigns P=0.8 to 'e' (strong)
+    ("l", [0.6, 0.2, 0.1, 0.1]),      # model assigns P=0.6 to 'l'
+    ("l", [0.7, 0.1, 0.1, 0.1]),      # model assigns P=0.7 to second 'l'
+    ("o", [0.7, 0.1, 0.1, 0.1]),      # model assigns P=0.7 to 'o'
 ]
 
 print(f"\nPredicting each character in a sequence:")
 losses = []
 for char, probs_for_correct in sequence:
     p = probs_for_correct[0]  # simplified: first prob is for correct token
-    l = -math.log(max(p, 1e-10))
-    losses.append(l)
-    print(f"  '{char}': P(correct) = {p:.2f}, loss = {l:.4f}")
+    loss_val = -math.log(max(p, 1e-10))
+    losses.append(loss_val)
+    print(f"  '{char}': P(correct) = {p:.2f}, loss = {loss_val:.4f}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gpt/local/course/ch06_Training_Loop_and_Optimization/cross_entropy.py` around
lines 102 - 116, Update the misleading comments in the sequence tuples to state
that the numbers are "model-assigned probability for that character" (not "model
correctly predicts") and clarify that higher values mean the model assigns more
probability to that token; also rename the ambiguous variable l (used in the
loop where p = probs_for_correct[0]) to a clearer name such as char_loss (and
update its usage in losses.append and the print formatting) and consider
renaming probs_for_correct to probs_for_token or probs_for_target to better
reflect its meaning.
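
For readers following along, a standalone illustration of the point (independent of cross_entropy.py, standard library only): per-token cross-entropy is -log(p), so a model-assigned probability of 0.1 is a weak prediction with a high loss, while 0.8 is a strong one.

import math

# Per-token cross-entropy is -log(p), where p is the probability the model
# assigned to the correct token. Lower p means higher loss.
for p in (0.8, 0.6, 0.1, 0.01):
    print(f"P(correct) = {p:4.2f}  ->  loss = {-math.log(p):.4f}")

# The sequence loss is the mean of the per-token losses.
probs = [0.1, 0.8, 0.6, 0.7, 0.7]          # one probability per character
losses = [-math.log(p) for p in probs]
print(f"mean loss = {sum(losses) / len(losses):.4f}")
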
gpt/local/course/ch01_What_is_a_Language_Model/.ipynb_checkpoints/language_model_idea-checkpoint.py-1-104 (1)

1-104: ⚠️ Potential issue | 🟡 Minor

Remove Jupyter checkpoint file from version control.

The .ipynb_checkpoints/ directory contains auto-generated backup files from Jupyter and should not be committed. This file is identical to language_model_idea.py in the parent directory.

Add .ipynb_checkpoints/ to your .gitignore and remove this file from the repository.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@gpt/local/course/ch01_What_is_a_Language_Model/.ipynb_checkpoints/language_model_idea-checkpoint.py`
around lines 1 - 104, The checkpoint file
gpt/local/course/ch01_What_is_a_Language_Model/.ipynb_checkpoints/language_model_idea-checkpoint.py
is an auto-generated Jupyter backup and duplicates language_model_idea.py;
remove this file from the repo and stop tracking it (git rm --cached or delete
and commit removal), add .ipynb_checkpoints/ to .gitignore so future checkpoints
are ignored, and ensure the canonical script language_model_idea.py remains
tracked (verify counts/probs/generation code in that file if you need to confirm
it's identical).
gpt/local/course/ch08_Scaling_Up_with_PyTorch/pytorch_basics.py-39-41 (1)

39-41: ⚠️ Potential issue | 🟡 Minor

Remove extraneous f-string prefixes (Ruff F541).
These will fail the linter even though they run.

🔧 Suggested fix
-print(f"  This replaces our manual linear() function with nested loops!")
+print("  This replaces our manual linear() function with nested loops!")
...
-print(f"  Manual: 4*(2x+3) = 4*(2*2+3) = 20.0 ✓")
+print("  Manual: 4*(2x+3) = 4*(2*2+3) = 20.0 ✓")
...
-print(f"  Model: 3 → 8 → 2")
+print("  Model: 3 → 8 → 2")
-print(f"  Parameters are automatically tracked by nn.Module!")
+print("  Parameters are automatically tracked by nn.Module!")

Also applies to: 55-59, 171-173

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gpt/local/course/ch08_Scaling_Up_with_PyTorch/pytorch_basics.py` around lines
39 - 41, The print statements use unnecessary f-string prefixes that trigger
Ruff F541; remove the 'f' from the affected print calls (e.g., the prints that
reference A, B, C shapes near the Matrix multiply message and the other prints
around lines shown) so they are plain strings with interpolation performed by
concatenation or format methods if needed; locate the prints that currently
start with f"  Matrix multiply..." and the similar prints at the other ranges
(55–59 and 171–173) and delete the leading 'f' before the opening quote.
gpt/local/course/ch05_Attention_and_Transformers/attention_basics.py-41-41 (1)

41-41: ⚠️ Potential issue | 🟡 Minor

Remove the unused f-string prefix.
Ruff F541 will error on this line.

🔧 Suggested fix
-print(f"\nToken embeddings (X):")
+print("\nToken embeddings (X):")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gpt/local/course/ch05_Attention_and_Transformers/attention_basics.py` at line
41, Remove the unnecessary f-string prefix on the print statement that prints
"Token embeddings (X):" (the top-level print with the literal string "Token
embeddings (X):") to avoid Ruff F541; change the line to a normal print without
the leading f (i.e., print("\nToken embeddings (X):")).
gpt/local/course/ch04_Neural_Network_Building_Blocks/building_blocks.py-84-96 (1)

84-96: ⚠️ Potential issue | 🟡 Minor

Remove the unused f-string prefix.
Ruff F541 will error on this line.

🔧 Suggested fix
-print(f"\nHigher score → higher probability. All positive, sum to 1.")
+print("\nHigher score → higher probability. All positive, sum to 1.")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gpt/local/course/ch04_Neural_Network_Building_Blocks/building_blocks.py`
around lines 84 - 96, The trailing print uses an unnecessary f-string prefix
which triggers Ruff F541; remove the leading "f" from the string in the final
print statement (the one printing "Higher score → higher probability. All
positive, sum to 1.") so it becomes a plain string, keeping the softmax
function, scores, probs variables and all other prints unchanged.
gpt/local/course/ch07_microGPT_Full_Walkthrough/microgpt_annotated.py-146-150 (1)

146-150: ⚠️ Potential issue | 🟡 Minor

Avoid lambda assignment for matrix (Ruff E731).
Use a def for lint compliance and readability.

🔧 Suggested fix
-matrix = lambda nout, nin, std=0.08: [
-    [Value(random.gauss(0, std)) for _ in range(nin)]
-    for _ in range(nout)
-]
+def matrix(nout, nin, std=0.08):
+    return [
+        [Value(random.gauss(0, std)) for _ in range(nin)]
+        for _ in range(nout)
+    ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gpt/local/course/ch07_microGPT_Full_Walkthrough/microgpt_annotated.py` around
lines 146 - 150, Replace the lambda assigned to matrix with a regular function
definition to satisfy lint rule Ruff E731 and improve readability: change the
current "matrix = lambda nout, nin, std=0.08: ..." into "def matrix(nout, nin,
std=0.08):" and return the same list comprehension that creates
Value(random.gauss(0, std)) for each output/input; keep the default std and the
use of Value and random.gauss unchanged so functionally identical behavior is
preserved.
gpt/local/course/ch07_microGPT_Full_Walkthrough/microgpt_annotated.py-358-364 (1)

358-364: ⚠️ Potential issue | 🟡 Minor

Rename the loop variable l (Ruff E741).
Single-letter l is flagged as ambiguous.

🔧 Suggested fix
-        probs = softmax([l / temperature for l in logits])
+        probs = softmax([logit / temperature for logit in logits])
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gpt/local/course/ch07_microGPT_Full_Walkthrough/microgpt_annotated.py` around
lines 358 - 364, The list-comprehension uses a single-letter variable `l` which
is ambiguous; rename it to a descriptive name (e.g., `logit`) in the
temperature-scaling expression inside the loop that calls gpt (the block with
function calls gpt(token_id, pos_id, keys_cache, values_cache) and softmax),
updating the corresponding comprehension [l / temperature for l in logits] to
use the new name and ensure any other list comprehensions in that loop (e.g.,
weights=[p.data for p in probs]) remain correct.
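
For context, temperature scaling is all that comprehension does. A minimal standalone sketch of the pattern with plain floats (not the file's Value objects; the helper name sample_next is illustrative only):

import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next(logits, temperature=1.0):
    # Divide every logit by the temperature before the softmax:
    # T < 1 sharpens the distribution, T > 1 flattens it.
    probs = softmax([logit / temperature for logit in logits])
    return random.choices(range(len(probs)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.5, -1.0]
print("T = 0.5:", [round(p, 3) for p in softmax([x / 0.5 for x in logits])])
print("T = 2.0:", [round(p, 3) for p in softmax([x / 2.0 for x in logits])])
print("sampled id:", sample_next(logits, temperature=0.8))
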
gpt/local/course/ch11_Side_by_Side_Comparison/README.md-25-38 (1)

25-38: ⚠️ Potential issue | 🟡 Minor

Add a language tag to the architecture mapping fence.
markdownlint MD040 will flag this fenced block without a language.

🔧 Suggested fix
-```
+```text
 microGPT                          minGPT
 ────────                          ──────
 Value class                   →   torch.Tensor + autograd
 ...
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gpt/local/course/ch11_Side_by_Side_Comparison/README.md` around lines 25 -
38, The fenced comparison block starting with "microGPT                         
minGPT" in the README is missing a language tag, so add a language identifier
(e.g., use ```text) to the opening fence of that code block to satisfy
markdownlint MD040; update the opening fence for the block that contains the
table rows (Value class → torch.Tensor + autograd, state_dict['wte'][id] →
nn.Embedding(...), etc.) to be ```text.
gpt/local/course/ch03_Autograd/gradient_descent.py-34-38 (1)

34-38: ⚠️ Potential issue | 🟡 Minor

Split one-line method definitions (Ruff E701).
This will otherwise fail Ruff’s multi-statement-on-one-line rule.

🔧 Suggested fix
-    def __neg__(self): return self * -1
-    def __sub__(self, other): return self + (-other)
-    def __radd__(self, other): return self + other
-    def __rmul__(self, other): return self * other
+    def __neg__(self):
+        return self * -1
+    def __sub__(self, other):
+        return self + (-other)
+    def __radd__(self, other):
+        return self + other
+    def __rmul__(self, other):
+        return self * other
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gpt/local/course/ch03_Autograd/gradient_descent.py` around lines 34 - 38, The
four dunder methods (__neg__, __sub__, __radd__, __rmul__) are written as
one-line definitions which violates Ruff E701; replace each one-line definition
with a standard multi-line method block (def __neg__(self): then a new line with
the return statement, etc.) so each method has its own indented body line —
update the methods __neg__, __sub__, __radd__, and __rmul__ accordingly to use
the expanded form.
gpt/local/course/ch08_Scaling_Up_with_PyTorch/README.md-30-42 (1)

30-42: ⚠️ Potential issue | 🟡 Minor

Add a language tag to the project structure fence.
markdownlint MD040 will flag this fenced block without a language.

🔧 Suggested fix
-```
+```text
 minGPT/
 ├── mingpt/
 │   ├── model.py      ← The GPT model (311 lines)
 ...
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gpt/local/course/ch08_Scaling_Up_with_PyTorch/README.md` around lines 30 -
42, The fenced project-tree block in README.md (the triple-backtick block that
begins with "minGPT/") lacks a language tag and triggers markdownlint MD040;
update the opening fence from "```" to "```text" so the code block is explicitly
marked (leave the contents and the closing "```" unchanged) to satisfy the
linter.
gpt/local/course/ch03_Autograd/computation_graph.py-99-99 (1)

99-99: ⚠️ Potential issue | 🟡 Minor

Remove extraneous f prefix from string literal.

This line uses an f-string but contains no placeholders.

Proposed fix
-print(f"\nBackward Pass (computing gradients):")
+print("\nBackward Pass (computing gradients):")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gpt/local/course/ch03_Autograd/computation_graph.py` at line 99, The print
statement in computation_graph.py uses an unnecessary f-string: change the line
printing "Backward Pass (computing gradients):" by removing the leading f so it
becomes a plain string literal; look for the print call that currently reads
print(f"\nBackward Pass (computing gradients):") and replace it with
print("\nBackward Pass (computing gradients):") to avoid the unused f-string.
gpt/local/course/ch10_Training_and_Inference/trainer_explained.py-167-167 (1)

167-167: ⚠️ Potential issue | 🟡 Minor

Remove extraneous f prefix from string literal.

This line uses an f-string but contains no placeholders.

Proposed fix
-print(f"Example: [2, 0, 1] → [0, 1, 2]\n")
+print("Example: [2, 0, 1] → [0, 1, 2]\n")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gpt/local/course/ch10_Training_and_Inference/trainer_explained.py` at line
167, The print statement uses an unnecessary f-string: change the print call
that currently reads print(f"Example: [2, 0, 1] → [0, 1, 2]\n") to a plain
string literal print("Example: [2, 0, 1] → [0, 1, 2]\n") by removing the leading
f so no f-string formatting is used.
gpt/local/course/ch03_Autograd/value_basics.py-148-148 (1)

148-148: ⚠️ Potential issue | 🟡 Minor

Remove extraneous f prefix from string literal.

This line uses an f-string but contains no placeholders. Use a regular string instead.

Proposed fix
-print(f"  By hand: dy/dx = 2 * 2 * (2x+3) = 4 * (2*1+3) = 20 ✓")
+print("  By hand: dy/dx = 2 * 2 * (2x+3) = 4 * (2*1+3) = 20 ✓")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gpt/local/course/ch03_Autograd/value_basics.py` at line 148, The print
statement in value_basics.py uses an f-string with no placeholders; replace the
f-string prefix on the print call (the line printing "  By hand: dy/dx = 2 * 2 *
(2x+3) = 4 * (2*1+3) = 20 ✓") with a plain string literal so it is not an
f-string (locate the print in value_basics.py near the end of the demonstration
output).
gpt/local/course/ch12_Exercises_and_Next_Steps/exercise_solutions.py-135-135 (1)

135-135: ⚠️ Potential issue | 🟡 Minor

Ambiguous variable name l.

The variable l can be confused with 1 (one) or I (capital i). Use a more descriptive name like logit.

Proposed fix
-    scaled = [l / temp for l in logits]
+    scaled = [logit / temp for logit in logits]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gpt/local/course/ch12_Exercises_and_Next_Steps/exercise_solutions.py` at line
135, Rename the ambiguous list-comprehension iterator variable `l` to a
descriptive name like `logit` in the expression that computes `scaled` (i.e.,
change the comprehension used to build `scaled` from the `logits` iterable), so
it reads conceptually as "for logit in logits" and divides each `logit` by
`temp`; update any nearby similar comprehensions in the same function to use
`logit` for clarity.
gpt/local/course/ch06_Training_Loop_and_Optimization/training_loop.py-153-153 (1)

153-153: ⚠️ Potential issue | 🟡 Minor

Remove extraneous f prefix from string literal.

This line uses an f-string but contains no placeholders.

Proposed fix
-print(f"  (Should be 'abcabcabc...' if training worked!)")
+print("  (Should be 'abcabcabc...' if training worked!)")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gpt/local/course/ch06_Training_Loop_and_Optimization/training_loop.py` at
line 153, The print statement uses an unnecessary f-string: locate the print
call that currently reads print(f"  (Should be 'abcabcabc...' if training
worked!)") in training_loop.py and remove the leading f so it becomes a plain
string literal; this fixes the extraneous f prefix on the print statement.
gpt/local/course/ch10_Training_and_Inference/trainer_explained.py-86-86 (1)

86-86: ⚠️ Potential issue | 🟡 Minor

Unused variable B in unpacking.

The batch dimension B is unpacked but never used. Prefix with underscore to indicate it's intentionally unused.

Proposed fix
-        B, T = idx.size()
+        _B, T = idx.size()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gpt/local/course/ch10_Training_and_Inference/trainer_explained.py` at line
86, The tuple unpacking currently does "B, T = idx.size()" but B is never used;
change the unused variable to a prefixed name (e.g., "_B, T = idx.size()") so
it's clear it's intentional and suppresses linter warnings—update the unpacking
in the same scope where idx.size() is called (the line with "B, T = idx.size()")
and leave all other uses of T unchanged.
gpt/local/course/ch10_Training_and_Inference/trainer_explained.py-225-225 (1)

225-225: ⚠️ Potential issue | 🟡 Minor

Unused loop variable y.

The variable y is unpacked in the loop but never used. Rename to _y or _ to indicate it's intentionally ignored.

Proposed fix
-    for b, (x, y) in enumerate(loader):
+    for b, (x, _) in enumerate(loader):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gpt/local/course/ch10_Training_and_Inference/trainer_explained.py` at line
225, The loop unpacks (x, y) but never uses y; update the loop header to show
the variable is intentionally ignored by renaming it to _ or _y (e.g., change
"for b, (x, y) in enumerate(loader):" to "for b, (x, _) in enumerate(loader):"
or "for b, (x, _y) in enumerate(loader):"), which avoids unused-variable
warnings and clarifies intent in the training/inference loop where
enumerate(loader) is iterated.
gpt/local/course/ch03_Autograd/value_basics.py-11-11 (1)

11-11: ⚠️ Potential issue | 🟡 Minor

Unused import.

The math module is imported but never used in this file.

Proposed fix
-import math
-
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gpt/local/course/ch03_Autograd/value_basics.py` at line 11, Remove the unused
import of the math module: delete the "import math" statement (the unused symbol
"math") from the top of value_basics.py or, if you intended to use math in
functions like any helper referencing math.*, actually use it; otherwise remove
the import to eliminate the unused dependency and lint warning.
gpt/local/course/ch05_Attention_and_Transformers/README.md-44-44 (1)

44-44: ⚠️ Potential issue | 🟡 Minor

Attention weights don't sum to 1.0.

The example shows weights = [0.35, 0.05, 0.18, 0.27, 0.05] which sums to 0.90, not 1.0 as expected from softmax output. This may confuse learners.

📝 Suggested fix
-After softmax: weights = [0.35, 0.05, 0.18, 0.27, 0.05]
+After softmax: weights = [0.35, 0.05, 0.20, 0.30, 0.10]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gpt/local/course/ch05_Attention_and_Transformers/README.md` at line 44, The
listed softmax output "After softmax: weights = [0.35, 0.05, 0.18, 0.27, 0.05]"
does not sum to 1.0; replace that line with a properly normalized softmax result
(divide each value by 0.9) so it reads "After softmax: weights = [0.3888889,
0.0555556, 0.2, 0.3, 0.0555556]" (or similar rounded values) to ensure the
weights sum to 1.0 and update the surrounding text to state these are normalized
softmax probabilities.
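
A quick standalone check of the underlying property (any correctly computed softmax output sums to 1), independent of the README's example numbers:

import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

weights = softmax([1.0, -1.0, 0.3, 0.7, -1.0])
print([round(w, 2) for w in weights])
print("sum =", round(sum(weights), 6))   # always 1.0 up to rounding
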
gpt/local/course/ch02_Tokenization/char_tokenizer.py-43-45 (1)

43-45: ⚠️ Potential issue | 🟡 Minor

encode() will raise KeyError for characters not in the training vocabulary.

If a user tries to encode text containing characters not present in the docs dataset (e.g., uppercase letters, spaces, digits), a KeyError will be raised. For an educational demo this is acceptable, but consider adding a brief comment noting this limitation.

📝 Suggested documentation improvement
 def encode(text):
-    """Convert a string to a list of token IDs"""
+    """Convert a string to a list of token IDs.
+    
+    Note: Raises KeyError for characters not in vocabulary.
+    """
     return [char_to_id[ch] for ch in text]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gpt/local/course/ch02_Tokenization/char_tokenizer.py` around lines 43 - 45,
The encode function (encode) currently uses char_to_id lookup and will raise
KeyError for characters outside the training vocabulary; add a brief inline
comment above encode noting this limitation (that unknown characters like
uppercase, spaces, digits will cause KeyError) and optionally suggest handling
strategies (e.g., pre-normalize text, add an "<unk>" token, or wrap lookups with
.get and fallback) so readers understand the behavior and how to mitigate it;
reference char_to_id and encode in the comment.
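
If the chapter later wants to demonstrate a mitigation rather than only note the limitation, one possible shape is sketched below; the <unk> token, the docs list, and the variable names are hypothetical, not the file's current API.

# Hypothetical sketch: character-level encode/decode with an <unk> fallback.
docs = ["hello world", "tokenizers are fun"]
vocab = sorted(set("".join(docs)))
UNK = len(vocab)                      # reserve one extra id for unknown characters
char_to_id = {ch: i for i, ch in enumerate(vocab)}
id_to_char = {i: ch for ch, i in char_to_id.items()}
id_to_char[UNK] = "<unk>"

def encode(text):
    # .get() maps out-of-vocabulary characters (uppercase, digits, ...) to UNK
    # instead of raising KeyError.
    return [char_to_id.get(ch, UNK) for ch in text]

def decode(ids):
    return "".join(id_to_char[i] for i in ids)

print(encode("Hello 123"))
print(decode(encode("Hello 123")))    # "<unk>ello <unk><unk><unk>"
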
gpt/local/course/ch09_minGPT_Model_Deep_Dive/model_sizes.py-21-57 (1)

21-57: ⚠️ Potential issue | 🟡 Minor

The n_head parameter is unused in the calculation.

The function accepts n_head but doesn't use it. While n_head doesn't affect total parameter count (it only changes how parameters are partitioned within attention), having an unused parameter could confuse students. Consider either:

  1. Removing the parameter, or
  2. Adding a comment explaining why it doesn't affect parameter count
📝 Option 1: Remove unused parameter
-def count_gpt_params(vocab_size, block_size, n_layer, n_head, n_embd):
+def count_gpt_params(vocab_size, block_size, n_layer, n_embd):
     """Count parameters in a GPT model without building it."""

Note: This would require updating all call sites to remove n_head.

📝 Option 2: Add explanatory comment
-def count_gpt_params(vocab_size, block_size, n_layer, n_head, n_embd):
-    """Count parameters in a GPT model without building it."""
+def count_gpt_params(vocab_size, block_size, n_layer, n_head, n_embd):  # noqa: ARG001
+    """Count parameters in a GPT model without building it.
+    
+    Note: n_head is accepted for API consistency with model configs but doesn't
+    affect parameter count (it only determines how attention weights are partitioned).
+    """
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gpt/local/course/ch09_minGPT_Model_Deep_Dive/model_sizes.py` around lines 21
- 57, The function count_gpt_params currently accepts n_head but never uses it;
either remove the parameter from the signature and update all call sites that
pass n_head, or (preferred to avoid touching call sites) keep n_head and add a
concise docstring/comment in count_gpt_params explaining why n_head does not
change the total parameter count (e.g., attention weights are stored as combined
c_attn of size n_embd x 3*n_embd so head count only affects tensor
reshaping/partitioning, not parameter count), and update the function docstring
to mention this explicitly so readers aren’t confused.
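
For context on why n_head drops out of the count, here is a standalone back-of-the-envelope version assuming standard GPT-2-style shapes (fused qkv projection, 4x MLP, biases on linears, output head tied to the token embedding); this is an illustration, not a restatement of the file's exact formula.

def count_gpt_params(vocab_size, block_size, n_layer, n_embd):
    # n_head never appears below: it only controls how the qkv tensor is
    # reshaped into heads, not how many weights exist.
    attn = n_embd * 3 * n_embd + 3 * n_embd      # c_attn weight + bias
    attn += n_embd * n_embd + n_embd             # attention c_proj weight + bias
    mlp = n_embd * 4 * n_embd + 4 * n_embd       # c_fc weight + bias
    mlp += 4 * n_embd * n_embd + n_embd          # MLP c_proj weight + bias
    norms = 2 * (2 * n_embd)                     # two LayerNorms per block
    per_block = attn + mlp + norms
    embeddings = vocab_size * n_embd + block_size * n_embd   # wte + wpe
    final_ln = 2 * n_embd
    return n_layer * per_block + embeddings + final_ln

# GPT-2 small's config: ~124M parameters regardless of how many heads it uses.
print(count_gpt_params(vocab_size=50257, block_size=1024, n_layer=12, n_embd=768))
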
gpt/local/course/ch05_Attention_and_Transformers/transformer_block.py-63-75 (1)

63-75: ⚠️ Potential issue | 🟡 Minor

The pos parameter is unused inside the function.

The pos argument is accepted but not used within transformer_block. It's only used in the demo's print statements. Consider removing it from the function signature or adding a comment explaining it's for external logging purposes.

📝 Option 1: Remove the parameter
-def transformer_block(x, keys_cache, values_cache, pos):
+def transformer_block(x, keys_cache, values_cache):
     """
     Process one token through one transformer block.
 
     Args:
         x: input embedding vector [n_embd]
         keys_cache: list of previous key vectors
         values_cache: list of previous value vectors
-        pos: position index (for printing)
 
     Returns:
         output vector [n_embd]
     """

Then update the call site:

-    x_out = transformer_block(x_in, keys_cache, values_cache, pos)
+    x_out = transformer_block(x_in, keys_cache, values_cache)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gpt/local/course/ch05_Attention_and_Transformers/transformer_block.py` around
lines 63 - 75, The transformer_block function currently accepts a pos parameter
that is never used; either remove pos from the transformer_block signature and
update all call sites that pass pos accordingly (references:
transformer_block(x, keys_cache, values_cache, pos)) or keep it but add a clear
comment in the function docstring stating pos is only for external logging/demo
and will not be used by the function (update signature remains
transformer_block(x, keys_cache, values_cache, pos)); choose one approach and
make consistent changes where transformer_block is invoked.
🧹 Nitpick comments (20)
gpt/local/course/ch11_Side_by_Side_Comparison/comparison.py (1)

13-23: Remove extraneous f prefixes in compare() function.

Lines 17 and 20 use f-strings but contain no placeholders. Use regular strings instead.

Proposed fix
 def compare(title, micro_code, min_code, explanation):
     print("=" * 70)
     print(f"  {title}")
     print("=" * 70)
-    print(f"\n  microGPT (pure Python):")
+    print("\n  microGPT (pure Python):")
     for line in micro_code.strip().split('\n'):
         print(f"    {line}")
-    print(f"\n  minGPT (PyTorch):")
+    print("\n  minGPT (PyTorch):")
     for line in min_code.strip().split('\n'):
         print(f"    {line}")
     print(f"\n  → {explanation}\n")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gpt/local/course/ch11_Side_by_Side_Comparison/comparison.py` around lines 13
- 23, The compare function uses f-strings where no interpolation occurs (in the
print calls that output section headers), so remove the unnecessary 'f' prefixes
and convert those to plain string literals; specifically update the print calls
in compare that currently use f"\n  microGPT (pure Python):" and f"\n  minGPT
(PyTorch):" to "\n  microGPT (pure Python):" and "\n  minGPT (PyTorch):",
leaving other f-strings (those with placeholders) unchanged.
gpt/local/course/ch06_Training_Loop_and_Optimization/adam_optimizer.py (1)

92-124: Remove extraneous f prefix from string without placeholders.

This multi-line string contains no format placeholders, so the f prefix is unnecessary and triggers a linter warning.

Proposed fix
-print(f"""
+print("""
 === Why Adam is Better ===
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gpt/local/course/ch06_Training_Loop_and_Optimization/adam_optimizer.py`
around lines 92 - 124, The multi-line print call uses an unnecessary f-string
prefix: remove the leading 'f' from the print(f"""...""") so the call becomes
print("""...""") (leave the triple-quoted string contents unchanged); locate the
print invocation that begins with the "=== Why Adam is Better ===" block in
adam_optimizer.py and drop the 'f' prefix to eliminate the linter warning.
gpt/local/course/ch06_Training_Loop_and_Optimization/cross_entropy.py (1)

44-44: Remove extraneous f prefix from strings without placeholders.

Lines 44, 57, 74-79, 110, 120, and 121 use f-strings but contain no placeholders. Use regular strings instead to avoid confusion and satisfy the linter.

Example fix for line 44
-print(f"\nStep 1 - Model outputs logits (raw scores):")
+print("\nStep 1 - Model outputs logits (raw scores):")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gpt/local/course/ch06_Training_Loop_and_Optimization/cross_entropy.py` at
line 44, Several print statements (e.g., the one printing "Step 1 - Model
outputs logits (raw scores):" and other prints at lines noted) use f-strings
even though they have no placeholders; replace the f-prefixed strings with plain
string literals (remove the leading f) for each occurrence (search for the exact
printed messages such as "Step 1 - Model outputs logits (raw scores):" and the
other messages at lines 57, 74-79, 110, 120, 121) so the prints become normal
strings and satisfy the linter.
gpt/local/course/ch08_Scaling_Up_with_PyTorch/pytorch_vs_manual.py (2)

158-158: Move import to top of file.

The import time statement on line 158 should be moved to the top of the file with other imports (after line 11) per PEP 8 conventions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gpt/local/course/ch08_Scaling_Up_with_PyTorch/pytorch_vs_manual.py` at line
158, Move the stray "import time" statement into the module-level imports
grouped with the other top-of-file imports (i.e., place the import time
alongside the existing imports near the top of the file) to comply with PEP 8;
remove the duplicate/misplaced import at its current location so only the
top-level "import time" remains.

99-99: Split statement onto separate lines for readability.

Line 99 has multiple statements on one line, which reduces readability and violates PEP 8 style.

Proposed fix
         def build(v):
             if v not in visited:
                 visited.add(v)
-                for c in v._children: build(c)
+                for c in v._children:
+                    build(c)
                 topo.append(v)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gpt/local/course/ch08_Scaling_Up_with_PyTorch/pytorch_vs_manual.py` at line
99, The single-line compound loop "for c in v._children: build(c)" reduces
readability; split it into a two-line loop in the build function so the loop
header iterates over v._children and the body calls build(c) on an indented next
line, referencing v._children and the build function to locate the change.
Ensure proper indentation to match surrounding code style and PEP8.
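
For readers unsure what build() is doing there, a minimal standalone sketch of the same topological-sort-then-backward pattern on a toy scalar autograd class (simplified, not the file's implementation):

class Value:
    """Minimal scalar autograd node: data, grad, children, and a backward rule."""
    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._children = children
        self._backward = lambda: None

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Order the graph topologically so each node's gradient is fully
        # accumulated before its own backward rule propagates to its children.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for c in v._children:
                    build(c)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

a, b = Value(2.0), Value(3.0)
y = a * b
y.backward()
print(a.grad, b.grad)   # 3.0 2.0
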
gpt/local/course/ch01_What_is_a_Language_Model/README.md (1)

13-16: Consider adding language specifiers to fenced code blocks.

Lines 13-16, 28-33, and 84-86 use fenced code blocks without language identifiers. Adding text or plaintext would improve rendering consistency and silence lint warnings.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gpt/local/course/ch01_What_is_a_Language_Model/README.md` around lines 13 -
16, The fenced code blocks in README.md (the block starting with 'You type: "How
are ___"' and the other blocks around lines referenced in the review) are
missing language specifiers; update each triple-backtick fence to include a
language like text or plaintext (e.g., change ``` to ```text) so the blocks
render consistently and lint warnings are silenced for the snippets shown
(including the blocks containing "You type: \"How are ___\"" and the other
examples mentioned in the comment).
gpt/local/course/ch12_Exercises_and_Next_Steps/README.md (1)

93-101: Add language specifier to fenced code block.

The Quick Reference Card code block lacks a language identifier, which triggers a markdown lint warning and may affect rendering in some viewers. Since it's plain text/pseudocode, consider using text or plaintext.

Proposed fix
-```
+```text
 GPT in one paragraph:
   Tokenize text into integers. Embed each token into a vector.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gpt/local/course/ch12_Exercises_and_Next_Steps/README.md` around lines 93 -
101, The fenced code block containing the "GPT in one paragraph:" quick
reference currently has no language tag; update the opening fence from ``` to
```text (or ```plaintext) so the block is marked as plain text, which will
satisfy markdown linting and improve rendering — locate the code block that
begins with the line "GPT in one paragraph:" in README.md and add the language
specifier to its opening backticks.
gpt/local/course/ch06_Training_Loop_and_Optimization/training_loop.py (1)

65-66: Use def instead of assigning lambda to a variable.

Lambda expressions assigned to variables should be rewritten as function definitions for clarity and better error messages.

Proposed fix
-matrix = lambda rows, cols: [[Value(random.gauss(0, 0.5)) for _ in range(cols)]
-                              for _ in range(rows)]
+def matrix(rows, cols):
+    return [[Value(random.gauss(0, 0.5)) for _ in range(cols)]
+            for _ in range(rows)]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gpt/local/course/ch06_Training_Loop_and_Optimization/training_loop.py` around
lines 65 - 66, Replace the assigned lambda named "matrix" with a proper function
definition (e.g., def matrix(rows, cols): ...) that returns the same
list-of-lists of Value(random.gauss(0, 0.5)) so the behavior is unchanged;
update any references to the "matrix" callable accordingly and keep the
construction of Value and random.gauss(...) intact for clarity and better
tracebacks.
gpt/local/course/ch10_Training_and_Inference/trainer_explained.py (1)

133-145: Move import pickle to the top of the file.

Importing pickle inside __getitem__ means it's re-imported on every dataset access. Move it to the module level for efficiency.

Proposed fix

Add at top of file (after other imports):

import pickle

Then remove the import from __getitem__:

     def __getitem__(self, idx):
-        import pickle
         while True:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gpt/local/course/ch10_Training_and_Inference/trainer_explained.py` around
lines 133 - 145, The import of pickle inside the __getitem__ method causes
repeated imports on every access; move "import pickle" to the module-level
imports at the top of the file (alongside other imports) and remove the inline
import from the __getitem__ function so __getitem__ uses the top-level pickle
name directly.
gpt/local/course/ch03_Autograd/README.md (2)

36-40: Add language specifier to fenced code block.

This ASCII diagram would benefit from a language specifier.

Proposed fix
-```
+```text
 a ──┐
     ├──(+)──→ c
 b ──┘
 ```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gpt/local/course/ch03_Autograd/README.md` around lines 36 - 40, Update the
fenced code block containing the ASCII diagram (the lines starting with "a ──┐",
"    ├──(+)──→ c", "b ──┘") to include a language specifier (e.g., "text") after
the opening triple backticks so the diagram is treated as plain text; leave the
diagram content unchanged and ensure the opening fence becomes ```text and the
closing fence remains ``` to apply proper formatting.

15-17: Add language specifier to fenced code block.

Adding a language identifier (e.g., text or math) helps with rendering and accessibility.

Proposed fix
-```
+```text
 dy/dx = dy/dg * dg/dx
 ```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gpt/local/course/ch03_Autograd/README.md` around lines 15 - 17, The fenced
code block containing the expression "dy/dx = dy/dg * dg/dx" lacks a language
specifier; update the opening fence from ``` to ```text (or ```math) and keep
the closing ``` so the block becomes a labeled fenced code block for proper
rendering and accessibility; locate the block that wraps the exact expression
"dy/dx = dy/dg * dg/dx" and add the language token to the opening fence.
gpt/local/course/ch05_Attention_and_Transformers/multi_head_attention.py (1)

119-119: Remove unnecessary f-string prefix.

This string has no placeholders, so the f prefix is extraneous.

📝 Suggested fix
-print(f"\n--- Outputs ---")
+print("\n--- Outputs ---")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gpt/local/course/ch05_Attention_and_Transformers/multi_head_attention.py` at
line 119, The print statement using an unnecessary f-string prefix should be
changed to a plain string literal: locate the print call that currently reads
print(f"\n--- Outputs ---") in multi_head_attention.py and remove the leading f
so the statement prints "\n--- Outputs ---" without formatting; ensure no other
f-string formatting is accidentally removed nearby.
gpt/local/course/ch10_Training_and_Inference/generate_text.py (1)

91-91: Remove unnecessary f-string prefix.

This string has no placeholders, so the f prefix is extraneous.

📝 Suggested fix
-print(f"  Sampling (10 tries): ", end="")
+print("  Sampling (10 tries): ", end="")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gpt/local/course/ch10_Training_and_Inference/generate_text.py` at line 91,
The print call uses an unnecessary f-string prefix; update the print invocation
(print(f"  Sampling (10 tries): ", end="")) to use a plain string (remove the
leading 'f') so it becomes print("  Sampling (10 tries): ", end=""); locate this
in generate_text.py where that print statement appears.
gpt/local/course/ch09_minGPT_Model_Deep_Dive/model_walkthrough.py (4)

125-129: Remove extraneous f prefix from string without placeholders.

This string has no format placeholders, so the f prefix is unnecessary.

✨ Proposed fix
-print(f"\n  Causal mask (first 5×5):")
+print("\n  Causal mask (first 5×5):")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gpt/local/course/ch09_minGPT_Model_Deep_Dive/model_walkthrough.py` around
lines 125 - 129, The print statement uses an unnecessary f-string prefix: remove
the leading "f" from the literal in the print call that outputs "  Causal mask
(first 5×5):" (the block using attn.bias, mask, and the loop over i) so it
becomes a normal string literal; keep the rest of the code (mask = attn.bias[0,
0, :5, :5] and the subsequent loop building rows and printing counts) unchanged.

234-246: Remove extraneous f prefixes from strings without placeholders.

Lines 234 and 242 use f-strings but have no placeholders.

✨ Proposed fix
-print(f"  Model: vocab=27, block_size=16, layers=3, heads=3, embd=48")
+print("  Model: vocab=27, block_size=16, layers=3, heads=3, embd=48")
-print(f"\n  Forward pass:")
+print("\n  Forward pass:")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gpt/local/course/ch09_minGPT_Model_Deep_Dive/model_walkthrough.py` around
lines 234 - 246, Two print statements use f-strings even though they contain no
placeholders; remove the unnecessary f prefix to use normal string literals for
the lines printing the model summary and the forward pass header. Specifically,
change the print call that outputs "  Model: vocab=27, block_size=16, layers=3,
heads=3, embd=48" and the print call that outputs "\n  Forward pass:" to plain
print("...") (leave other f-strings like the logits/ loss prints unchanged);
locate these near the code that calls model(idx, targets) and prints
shapes/loss.

200-202: Prefix unused variable with underscore.

The batch dimension B is unpacked but never used. Prefixing with underscore signals intentional non-use.

✨ Proposed fix
     def forward(self, idx, targets=None):
-        B, T = idx.size()
+        _B, T = idx.size()
         pos = torch.arange(0, T, dtype=torch.long, device=idx.device)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gpt/local/course/ch09_minGPT_Model_Deep_Dive/model_walkthrough.py` around
lines 200 - 202, In the forward method, the batch dimension B is unpacked but
never used; change the unpacking in def forward(self, idx, targets=None) from
"B, T = idx.size()" to use a prefixed underscore for the unused variable (e.g.,
"_, T = idx.size()") so it's clear B is intentionally ignored and avoid
unused-variable warnings; ensure the rest of the method (pos = torch.arange(...,
device=idx.device) and any uses of T/idx) remains unchanged.

173-173: Remove extraneous f prefix.

✨ Proposed fix
-print(f"  Components: LayerNorm + Attention + LayerNorm + MLP(expand→GELU→compress)")
+print("  Components: LayerNorm + Attention + LayerNorm + MLP(expand→GELU→compress)")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gpt/local/course/ch09_minGPT_Model_Deep_Dive/model_walkthrough.py` at line
173, The print statement in model_walkthrough.py currently uses an unnecessary
f-string prefix (print(f"  Components: LayerNorm + Attention + LayerNorm +
MLP(expand→GELU→compress)")); remove the extraneous `f` so it becomes a plain
string print call (print("  Components: LayerNorm + Attention + LayerNorm +
MLP(expand→GELU→compress)")) to avoid misleading formatting semantics.
gpt/local/course/ch02_Tokenization/char_tokenizer.py (1)

57-59: Consider using list unpacking for cleaner syntax.

Per static analysis hint (RUF005), list unpacking is more idiomatic.

✨ Optional style improvement
-    tokens_with_bos = [BOS] + tokens + [BOS]
+    tokens_with_bos = [BOS, *tokens, BOS]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gpt/local/course/ch02_Tokenization/char_tokenizer.py` around lines 57 - 59,
Replace the concatenation-based construction of tokens_with_bos with list
unpacking for clarity: instead of building tokens_with_bos by adding lists
around tokens, use Python's unpacking to create the list with BOS, then all
elements of tokens, then BOS (reference: tokens_with_bos, tokens, BOS).
gpt/local/course/ch09_minGPT_Model_Deep_Dive/model_sizes.py (1)

10-16: PyTorch is imported but not actually used.

The function count_gpt_params performs pure arithmetic and doesn't require PyTorch. The import could be removed, making the script dependency-free and runnable without PyTorch installed.

♻️ Proposed fix
-try:
-    import torch
-    import torch.nn as nn
-    import math
-except ImportError:
-    print("PyTorch not installed. Run: pip install torch")
-    exit(1)
+import math

Update the docstring accordingly:

-Requires: pip install torch
+No dependencies required.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gpt/local/course/ch09_minGPT_Model_Deep_Dive/model_sizes.py` around lines 10
- 16, Remove the unused PyTorch imports at the top (the try/except that imports
torch, torch.nn, math) since count_gpt_params performs only arithmetic and
doesn't need torch; delete that import block and any exit(1) logic, ensure math
(if used) remains imported or add it explicitly, and update the module/docstring
for count_gpt_params to state the function is dependency-free and does not
require PyTorch.
gpt/local/course/ch05_Attention_and_Transformers/transformer_block.py (1)

31-35: Minor: softmax implementation differs from attention_basics.py.

This version doesn't handle -inf values, whereas attention_basics.py (lines 131-135) does. This is fine here since no masking is applied, but worth noting for consistency across chapters if students compare implementations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gpt/local/course/ch05_Attention_and_Transformers/transformer_block.py` around
lines 31 - 35, The softmax function currently computes a stable softmax but
doesn't handle -inf entries like the version in attention_basics.py; update
softmax to treat -math.inf logits as masked (produce zero probability) by
excluding them from the max computation and setting their exponent to 0 (or map
-inf to a very large negative constant before subtracting max) so the output
probabilities for -inf entries are zero and the remaining logits normalize
correctly; locate and modify the softmax function to mirror the -inf handling
used in attention_basics.py.
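
If the chapter adopts the same convention, one way to write it, assuming the plain list-of-floats representation used throughout these examples:

import math

def softmax(scores):
    # Treat -inf scores as masked-out positions: they receive probability 0
    # and are excluded from the max used for numerical stability.
    finite = [s for s in scores if s != -math.inf]
    if not finite:
        raise ValueError("all positions are masked")
    m = max(finite)
    exps = [0.0 if s == -math.inf else math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Causal masking example: the last two positions are masked out.
print(softmax([1.2, 0.4, -math.inf, -math.inf]))
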
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@gpt/local/course/ch01_What_is_a_Language_Model/README.md`:
- Around line 13-16: The fenced code blocks in README.md (the block starting
with 'You type: "How are ___"' and the other blocks around lines referenced in
the review) are missing language specifiers; update each triple-backtick fence
to include a language like text or plaintext (e.g., change ``` to ```text) so
the blocks render consistently and lint warnings are silenced for the snippets
shown (including the blocks containing "You type: \"How are ___\"" and the other
examples mentioned in the comment).

In `@gpt/local/course/ch02_Tokenization/char_tokenizer.py`:
- Around line 57-59: Replace the concatenation-based construction of
tokens_with_bos with list unpacking for clarity: instead of building
tokens_with_bos by adding lists around tokens, use Python's unpacking to create
the list with BOS, then all elements of tokens, then BOS (reference:
tokens_with_bos, tokens, BOS).

In `@gpt/local/course/ch03_Autograd/README.md`:
- Around line 36-40: Update the fenced code block containing the ASCII diagram
(the lines starting with "a ──┐", "    ├──(+)──→ c", "b ──┘") to include a
language specifier (e.g., "text") after the opening triple backticks so the
diagram is treated as plain text; leave the diagram content unchanged and ensure
the opening fence becomes ```text and the closing fence remains ``` to apply
proper formatting.
- Around line 15-17: The fenced code block containing the expression "dy/dx =
dy/dg * dg/dx" lacks a language specifier; update the opening fence from ``` to
```text (or ```math) and keep the closing ``` so the block becomes a labeled
fenced code block for proper rendering and accessibility; locate the block that
wraps the exact expression "dy/dx = dy/dg * dg/dx" and add the language token to
the opening fence.

In `@gpt/local/course/ch05_Attention_and_Transformers/multi_head_attention.py`:
- Line 119: The print statement using an unnecessary f-string prefix should be
changed to a plain string literal: locate the print call that currently reads
print(f"\n--- Outputs ---") in multi_head_attention.py and remove the leading f
so the statement prints "\n--- Outputs ---" without formatting; ensure no other
f-string formatting is accidentally removed nearby.

In `@gpt/local/course/ch05_Attention_and_Transformers/transformer_block.py`:
- Around line 31-35: The softmax function currently computes a stable softmax
but doesn't handle -inf entries like the version in attention_basics.py; update
softmax to treat -math.inf logits as masked (produce zero probability) by
excluding them from the max computation and setting their exponent to 0 (or map
-inf to a very large negative constant before subtracting max) so the output
probabilities for -inf entries are zero and the remaining logits normalize
correctly; locate and modify the softmax function to mirror the -inf handling
used in attention_basics.py.

In `@gpt/local/course/ch06_Training_Loop_and_Optimization/adam_optimizer.py`:
- Around line 92-124: The multi-line print call uses an unnecessary f-string
prefix: remove the leading 'f' from the print(f"""...""") so the call becomes
print("""...""") (leave the triple-quoted string contents unchanged); locate the
print invocation that begins with the "=== Why Adam is Better ===" block in
adam_optimizer.py and drop the 'f' prefix to eliminate the linter warning.

In `@gpt/local/course/ch06_Training_Loop_and_Optimization/cross_entropy.py`:
- Line 44: Several print statements (e.g., the one printing "Step 1 - Model
outputs logits (raw scores):" and the prints at lines 57, 74-79, 110, 120, and
121) use f-strings even though they contain no placeholders; remove the leading
f from each of these string literals so they become plain strings and satisfy
the linter.

In `@gpt/local/course/ch06_Training_Loop_and_Optimization/training_loop.py`:
- Around line 65-66: Replace the assigned lambda named "matrix" with a proper
function definition (e.g., def matrix(rows, cols): ...) that returns the same
list-of-lists of Value(random.gauss(0, 0.5)) so the behavior is unchanged;
update any references to the "matrix" callable accordingly and keep the
construction of Value and random.gauss(...) intact for clarity and better
tracebacks.
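
A minimal before/after sketch of the suggested change, using a hypothetical
stripped-down Value class in place of the course's autograd Value:

```python
import random

class Value:
    """Stand-in for the course's Value class; only the constructor matters here."""
    def __init__(self, data):
        self.data = data

# Before: a lambda assigned to a name (what the linter flags)
matrix = lambda rows, cols: [[Value(random.gauss(0, 0.5)) for _ in range(cols)]
                             for _ in range(rows)]

# After: an equivalent def with the same behavior
def matrix(rows, cols):
    """Return a rows x cols grid of randomly initialized Value objects."""
    return [[Value(random.gauss(0, 0.5)) for _ in range(cols)] for _ in range(rows)]

w = matrix(2, 3)
print(len(w), len(w[0]))  # 2 3
```

Both forms build the same grid; the def simply gives the callable a real name
in tracebacks and profiler output.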

In `@gpt/local/course/ch08_Scaling_Up_with_PyTorch/pytorch_vs_manual.py`:
- Line 158: Move the stray "import time" statement into the module-level
imports at the top of the file, grouped with the existing imports, to comply
with PEP 8, and delete it from its current location so only the top-level
"import time" remains.
- Line 99: The single-line compound statement "for c in v._children: build(c)"
hurts readability; split it into a two-line loop inside the build function,
with the header iterating over v._children and the call to build(c) on an
indented line below, matching the surrounding code style and PEP 8 (the sketch
below shows both changes to this file).
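
A small sketch showing both suggested changes on a toy graph-walking function;
Node, topo_order, and the timing code are hypothetical stand-ins, with only
v._children and build taken from the comments above:

```python
import time  # moved to module level, alongside the other top-of-file imports

class Node:
    """Toy stand-in for a Value-like object with a _children tuple."""
    def __init__(self, children=()):
        self._children = children

def topo_order(root):
    visited, order = set(), []
    def build(v):
        if v in visited:
            return
        visited.add(v)
        # Split loop: header on one line, body indented on the next.
        for c in v._children:
            build(c)
        order.append(v)
    build(root)
    return order

start = time.time()
leaf_a, leaf_b = Node(), Node()
print(len(topo_order(Node((leaf_a, leaf_b)))), "nodes in", time.time() - start, "s")
```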

In `@gpt/local/course/ch09_minGPT_Model_Deep_Dive/model_sizes.py`:
- Around line 10-16: Remove the unused PyTorch imports at the top (the
try/except that imports torch, torch.nn, and math), since count_gpt_params
performs only arithmetic and does not need torch; delete that import block and
any exit(1) logic, keep math imported explicitly if it is actually used, and
update the module docstring to state that count_gpt_params is dependency-free
and does not require PyTorch.
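
For context, a rough, hypothetical sketch of what such dependency-free
parameter arithmetic can look like; this is not the course's count_gpt_params,
and the exact terms depend on details like biases and weight tying:

```python
def count_gpt_params(vocab_size, block_size, n_layer, n_embd):
    """Rough GPT-2-style parameter count using plain arithmetic (no torch).

    Hypothetical sketch: assumes biased linear layers and an untied lm_head;
    not the course's actual count_gpt_params.
    """
    emb = vocab_size * n_embd + block_size * n_embd          # token + position embeddings
    attn = n_embd * 3 * n_embd + 3 * n_embd                  # qkv projection (weights + bias)
    attn += n_embd * n_embd + n_embd                         # attention output projection
    mlp = n_embd * 4 * n_embd + 4 * n_embd                   # MLP expand
    mlp += 4 * n_embd * n_embd + n_embd                      # MLP compress
    ln = 2 * (2 * n_embd)                                    # two LayerNorms per block
    per_block = attn + mlp + ln
    lm_head = n_embd * vocab_size                            # output projection to vocab
    return emb + n_layer * per_block + 2 * n_embd + lm_head  # plus final LayerNorm

# Toy config from this review (the head count doesn't change the total).
print(count_gpt_params(vocab_size=27, block_size=16, n_layer=3, n_embd=48))
```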

In `@gpt/local/course/ch09_minGPT_Model_Deep_Dive/model_walkthrough.py`:
- Around line 125-129: The print statement uses an unnecessary f-string prefix:
remove the leading "f" from the literal in the print call that outputs "  Causal
mask (first 5×5):" (the block using attn.bias, mask, and the loop over i) so it
becomes a normal string literal; keep the rest of the code (mask = attn.bias[0,
0, :5, :5] and the subsequent loop building rows and printing counts) unchanged.
- Around line 234-246: Two print statements use f-strings even though they
contain no placeholders; remove the unnecessary f prefix to use normal string
literals for the lines printing the model summary and the forward pass header.
Specifically, change the print call that outputs "  Model: vocab=27,
block_size=16, layers=3, heads=3, embd=48" and the print call that outputs "\n 
Forward pass:" to plain print("...") (leave other f-strings like the logits/
loss prints unchanged); locate these near the code that calls model(idx,
targets) and prints shapes/loss.
- Around line 200-202: In the forward method, the batch dimension B is unpacked
but never used; change the unpacking in def forward(self, idx, targets=None)
from "B, T = idx.size()" to a prefixed underscore for the unused variable
(e.g., "_, T = idx.size()") so it is clear B is intentionally ignored and
unused-variable warnings are avoided (see the sketch after this list); the rest
of the method (pos = torch.arange(..., device=idx.device) and any uses of T and
idx) stays unchanged.
- Line 173: The print statement in model_walkthrough.py currently uses an
unnecessary f-string prefix (print(f"  Components: LayerNorm + Attention +
LayerNorm + MLP(expand→GELU→compress)")); remove the extraneous `f` so it
becomes a plain string print call (print("  Components: LayerNorm + Attention +
LayerNorm + MLP(expand→GELU→compress)")) to avoid misleading formatting
semantics.
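
A tiny sketch of the underscore convention from the third comment above, using
a stand-in index tensor rather than the course's model (requires PyTorch, as
chapter 09 already does):

```python
import torch

idx = torch.zeros(4, 16, dtype=torch.long)  # stand-in (B, T) index tensor

# Before: B is unpacked but never used, which triggers unused-variable warnings
B, T = idx.size()

# After: the underscore signals that the batch dimension is intentionally ignored
_, T = idx.size()

pos = torch.arange(0, T, dtype=torch.long, device=idx.device)
print(T, pos.shape)  # 16 torch.Size([16])
```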

In `@gpt/local/course/ch10_Training_and_Inference/generate_text.py`:
- Line 91: The print call uses an unnecessary f-string prefix; update the print
invocation (print(f"  Sampling (10 tries): ", end="")) to use a plain string
(remove the leading 'f') so it becomes print("  Sampling (10 tries): ", end="");
locate this in generate_text.py where that print statement appears.

In `@gpt/local/course/ch10_Training_and_Inference/trainer_explained.py`:
- Around line 133-145: The inline import of pickle inside the __getitem__
method re-executes the import statement on every access (cheap after the first
time, but noisy and unconventional); move "import pickle" to the module-level
imports at the top of the file, alongside the other imports, and remove the
inline import so __getitem__ uses the top-level pickle name directly.
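
A small sketch of the suggested move, with a hypothetical dataset-style class
standing in for the one in trainer_explained.py:

```python
import pickle  # module-level, imported once alongside the other top-of-file imports

class CharDataset:
    """Hypothetical stand-in for the dataset class in trainer_explained.py."""
    def __init__(self, records):
        # Pre-serialize records just to give __getitem__ something to unpickle.
        self._blobs = [pickle.dumps(r) for r in records]

    def __getitem__(self, i):
        # No inline "import pickle" here; use the module-level name directly.
        return pickle.loads(self._blobs[i])

ds = CharDataset([("hello", 1), ("world", 2)])
print(ds[0], ds[1])
```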

In `@gpt/local/course/ch11_Side_by_Side_Comparison/comparison.py`:
- Around line 13-23: The compare function uses f-strings where no interpolation
occurs (in the print calls that output section headers), so remove the
unnecessary 'f' prefixes and convert those to plain string literals;
specifically update the print calls in compare that currently use f"\n  microGPT
(pure Python):" and f"\n  minGPT (PyTorch):" to "\n  microGPT (pure Python):"
and "\n  minGPT (PyTorch):", leaving other f-strings (those with placeholders)
unchanged.

In `@gpt/local/course/ch12_Exercises_and_Next_Steps/README.md`:
- Around line 93-101: The fenced code block containing the "GPT in one
paragraph:" quick reference has no language tag; update its opening fence from
``` to ```text (or ```plaintext) so the block is marked as plain text, which
satisfies markdown linting and improves rendering of that README.md section.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR, between commits e5ac245 and 4f3b966.

📒 Files selected for processing (37)
  • gpt/local/course/README.md
  • gpt/local/course/ch01_What_is_a_Language_Model/.ipynb_checkpoints/language_model_idea-checkpoint.py
  • gpt/local/course/ch01_What_is_a_Language_Model/README.md
  • gpt/local/course/ch01_What_is_a_Language_Model/language_model_idea.py
  • gpt/local/course/ch02_Tokenization/README.md
  • gpt/local/course/ch02_Tokenization/bpe_intuition.py
  • gpt/local/course/ch02_Tokenization/char_tokenizer.py
  • gpt/local/course/ch03_Autograd/README.md
  • gpt/local/course/ch03_Autograd/computation_graph.py
  • gpt/local/course/ch03_Autograd/gradient_descent.py
  • gpt/local/course/ch03_Autograd/value_basics.py
  • gpt/local/course/ch04_Neural_Network_Building_Blocks/README.md
  • gpt/local/course/ch04_Neural_Network_Building_Blocks/building_blocks.py
  • gpt/local/course/ch04_Neural_Network_Building_Blocks/mlp.py
  • gpt/local/course/ch05_Attention_and_Transformers/README.md
  • gpt/local/course/ch05_Attention_and_Transformers/attention_basics.py
  • gpt/local/course/ch05_Attention_and_Transformers/multi_head_attention.py
  • gpt/local/course/ch05_Attention_and_Transformers/transformer_block.py
  • gpt/local/course/ch06_Training_Loop_and_Optimization/README.md
  • gpt/local/course/ch06_Training_Loop_and_Optimization/adam_optimizer.py
  • gpt/local/course/ch06_Training_Loop_and_Optimization/cross_entropy.py
  • gpt/local/course/ch06_Training_Loop_and_Optimization/training_loop.py
  • gpt/local/course/ch07_microGPT_Full_Walkthrough/README.md
  • gpt/local/course/ch07_microGPT_Full_Walkthrough/microgpt_annotated.py
  • gpt/local/course/ch08_Scaling_Up_with_PyTorch/README.md
  • gpt/local/course/ch08_Scaling_Up_with_PyTorch/pytorch_basics.py
  • gpt/local/course/ch08_Scaling_Up_with_PyTorch/pytorch_vs_manual.py
  • gpt/local/course/ch09_minGPT_Model_Deep_Dive/README.md
  • gpt/local/course/ch09_minGPT_Model_Deep_Dive/model_sizes.py
  • gpt/local/course/ch09_minGPT_Model_Deep_Dive/model_walkthrough.py
  • gpt/local/course/ch10_Training_and_Inference/README.md
  • gpt/local/course/ch10_Training_and_Inference/generate_text.py
  • gpt/local/course/ch10_Training_and_Inference/trainer_explained.py
  • gpt/local/course/ch11_Side_by_Side_Comparison/README.md
  • gpt/local/course/ch11_Side_by_Side_Comparison/comparison.py
  • gpt/local/course/ch12_Exercises_and_Next_Steps/README.md
  • gpt/local/course/ch12_Exercises_and_Next_Steps/exercise_solutions.py

