Add GPT Course: Understanding microGPT & minGPT #1
RaikaSurendra wants to merge 1 commit into master from
Conversation
A comprehensive course for early CS students covering:
- Ch01-06: Fundamentals (language models, tokenization, autograd, NN blocks, attention, training)
- Ch07: microGPT full annotated walkthrough (pure Python, zero dependencies)
- Ch08-10: minGPT with PyTorch (model architecture, trainer, inference)
- Ch11: Side-by-side comparison of both implementations
- Ch12: Exercises and next steps

All chapters include runnable Python examples. Ch01-07 require no dependencies; Ch08-10 require PyTorch.
📝 Walkthrough

This pull request introduces a comprehensive 12-chapter educational course titled "Understanding GPT From Scratch," featuring both microGPT (pure Python) and minGPT (PyTorch) implementations. The course covers language model fundamentals, tokenization, automatic differentiation, neural network components, attention mechanisms, training optimization, complete implementations, and exercises, with executable examples throughout.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Note
Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.
🟡 Minor comments (22)
gpt/local/course/ch06_Training_Loop_and_Optimization/cross_entropy.py-102-116 (1)
102-116: ⚠️ Potential issue | 🟡 Minor

Misleading comments in the sequence example.
The comments on lines 103-107 say "model correctly predicts" but this is misleading. The first probability in each tuple represents the probability assigned to that character, not necessarily a "correct" prediction. For example, line 103 shows `0.1` (10%) for 'h', which would typically be considered a poor prediction, not a correct one.

Additionally, the variable name `l` on line 114 is ambiguous (easily confused with `1` or `I`). Consider renaming to `loss_val` or `char_loss`.

Proposed fix
```diff
 # Simulating predictions for "hello" where model gets better over positions
 sequence = [
-    ("h", [0.1, 0.8, 0.05, 0.05]),  # model correctly predicts 'h' with 10%
-    ("e", [0.05, 0.05, 0.8, 0.1]),  # model correctly predicts 'e' with 80%
-    ("l", [0.2, 0.6, 0.1, 0.1]),    # model correctly predicts 'l' with 60%
-    ("l", [0.1, 0.1, 0.7, 0.1]),    # model correctly predicts second 'l' with 70%
-    ("o", [0.1, 0.1, 0.1, 0.7]),    # model correctly predicts 'o' with 70%
+    ("h", [0.1, 0.8, 0.05, 0.05]),  # model assigns P=0.1 to 'h' (weak)
+    ("e", [0.8, 0.05, 0.05, 0.1]),  # model assigns P=0.8 to 'e' (strong)
+    ("l", [0.6, 0.2, 0.1, 0.1]),    # model assigns P=0.6 to 'l'
+    ("l", [0.7, 0.1, 0.1, 0.1]),    # model assigns P=0.7 to second 'l'
+    ("o", [0.7, 0.1, 0.1, 0.1]),    # model assigns P=0.7 to 'o'
 ]
 print(f"\nPredicting each character in a sequence:")
 losses = []
 for char, probs_for_correct in sequence:
     p = probs_for_correct[0]  # simplified: first prob is for correct token
-    l = -math.log(max(p, 1e-10))
-    losses.append(l)
-    print(f"  '{char}': P(correct) = {p:.2f}, loss = {l:.4f}")
+    loss_val = -math.log(max(p, 1e-10))
+    losses.append(loss_val)
+    print(f"  '{char}': P(correct) = {p:.2f}, loss = {loss_val:.4f}")
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gpt/local/course/ch06_Training_Loop_and_Optimization/cross_entropy.py` around lines 102-116, update the misleading comments in the sequence tuples to state that the numbers are "model-assigned probability for that character" (not "model correctly predicts") and clarify that higher values mean the model assigns more probability to that token; also rename the ambiguous variable `l` (used in the loop where `p = probs_for_correct[0]`) to a clearer name such as `char_loss` (and update its usage in `losses.append` and the print formatting) and consider renaming `probs_for_correct` to `probs_for_token` or `probs_for_target` to better reflect its meaning.

gpt/local/course/ch01_What_is_a_Language_Model/.ipynb_checkpoints/language_model_idea-checkpoint.py-1-104 (1)
1-104: ⚠️ Potential issue | 🟡 Minor

Remove Jupyter checkpoint file from version control.
The `.ipynb_checkpoints/` directory contains auto-generated backup files from Jupyter and should not be committed. This file is identical to `language_model_idea.py` in the parent directory.

Add `.ipynb_checkpoints/` to your `.gitignore` and remove this file from the repository.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. The checkpoint file `gpt/local/course/ch01_What_is_a_Language_Model/.ipynb_checkpoints/language_model_idea-checkpoint.py` is an auto-generated Jupyter backup and duplicates `language_model_idea.py`; remove this file from the repo and stop tracking it (`git rm --cached` or delete and commit removal), add `.ipynb_checkpoints/` to `.gitignore` so future checkpoints are ignored, and ensure the canonical script `language_model_idea.py` remains tracked (verify counts/probs/generation code in that file if you need to confirm it's identical).

gpt/local/course/ch08_Scaling_Up_with_PyTorch/pytorch_basics.py-39-41 (1)
39-41: ⚠️ Potential issue | 🟡 Minor

Remove extraneous f-string prefixes (Ruff F541).
These will fail the linter even though they run.

🔧 Suggested fix
```diff
-print(f"  This replaces our manual linear() function with nested loops!")
+print("  This replaces our manual linear() function with nested loops!")
 ...
-print(f"  Manual: 4*(2x+3) = 4*(2*2+3) = 20.0 ✓")
+print("  Manual: 4*(2x+3) = 4*(2*2+3) = 20.0 ✓")
 ...
-print(f"  Model: 3 → 8 → 2")
+print("  Model: 3 → 8 → 2")
-print(f"  Parameters are automatically tracked by nn.Module!")
+print("  Parameters are automatically tracked by nn.Module!")
```

Also applies to: 55-59, 171-173
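A quick standalone check of why this fix is harmless (the strings here are chosen for illustration, not quoted verbatim from the file):

```python
# An f-string with no placeholders evaluates to exactly the same string
# as the plain literal, so the f prefix is pure noise; Ruff F541 flags
# exactly this case, and removing the prefix cannot change behavior.
plain = "  Parameters are automatically tracked by nn.Module!"
fstr = f"  Parameters are automatically tracked by nn.Module!"
assert plain == fstr

# The prefix only earns its keep once a placeholder appears.
module_name = "nn.Module"
computed = f"  Parameters are automatically tracked by {module_name}!"
assert computed == plain
```

This is why the suggested diff is a pure lint fix: every output line stays byte-for-byte identical.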
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gpt/local/course/ch08_Scaling_Up_with_PyTorch/pytorch_basics.py` around lines 39-41, the print statements use unnecessary f-string prefixes that trigger Ruff F541; remove the 'f' from the affected print calls so they are plain strings, and do the same for the similar prints at the other ranges (55-59 and 171-173).

gpt/local/course/ch05_Attention_and_Transformers/attention_basics.py-41-41 (1)
41-41: ⚠️ Potential issue | 🟡 Minor

Remove the unused f-string prefix.
Ruff F541 will error on this line.

🔧 Suggested fix
```diff
-print(f"\nToken embeddings (X):")
+print("\nToken embeddings (X):")
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gpt/local/course/ch05_Attention_and_Transformers/attention_basics.py` at line 41, remove the unnecessary f-string prefix on the print statement that prints "Token embeddings (X):" to avoid Ruff F541; change the line to a normal print without the leading f (i.e., `print("\nToken embeddings (X):")`).

gpt/local/course/ch04_Neural_Network_Building_Blocks/building_blocks.py-84-96 (1)
84-96: ⚠️ Potential issue | 🟡 Minor

Remove the unused f-string prefix.
Ruff F541 will error on this line.

🔧 Suggested fix
```diff
-print(f"\nHigher score → higher probability. All positive, sum to 1.")
+print("\nHigher score → higher probability. All positive, sum to 1.")
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gpt/local/course/ch04_Neural_Network_Building_Blocks/building_blocks.py` around lines 84-96, the trailing print uses an unnecessary f-string prefix which triggers Ruff F541; remove the leading "f" from the final print statement (the one printing "Higher score → higher probability. All positive, sum to 1.") so it becomes a plain string, keeping the `softmax` function, `scores`, `probs` variables and all other prints unchanged.

gpt/local/course/ch07_microGPT_Full_Walkthrough/microgpt_annotated.py-146-150 (1)
146-150: ⚠️ Potential issue | 🟡 Minor

Avoid lambda assignment for `matrix` (Ruff E731).

Use a `def` for lint compliance and readability.

🔧 Suggested fix
```diff
-matrix = lambda nout, nin, std=0.08: [
-    [Value(random.gauss(0, std)) for _ in range(nin)]
-    for _ in range(nout)
-]
+def matrix(nout, nin, std=0.08):
+    return [
+        [Value(random.gauss(0, std)) for _ in range(nin)]
+        for _ in range(nout)
+    ]
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gpt/local/course/ch07_microGPT_Full_Walkthrough/microgpt_annotated.py` around lines 146-150, replace the lambda assigned to `matrix` with a regular function definition to satisfy Ruff E731 and improve readability: change `matrix = lambda nout, nin, std=0.08: ...` into `def matrix(nout, nin, std=0.08):` and return the same list comprehension that creates `Value(random.gauss(0, std))` for each output/input; keep the default `std` and the use of `Value` and `random.gauss` unchanged so functionally identical behavior is preserved.

gpt/local/course/ch07_microGPT_Full_Walkthrough/microgpt_annotated.py-358-364 (1)
358-364: ⚠️ Potential issue | 🟡 Minor

Rename the loop variable `l` (Ruff E741).

Single-letter `l` is flagged as ambiguous.

🔧 Suggested fix
```diff
-    probs = softmax([l / temperature for l in logits])
+    probs = softmax([logit / temperature for logit in logits])
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gpt/local/course/ch07_microGPT_Full_Walkthrough/microgpt_annotated.py` around lines 358-364, the list comprehension uses a single-letter variable `l` which is ambiguous; rename it to a descriptive name (e.g., `logit`) in the temperature-scaling expression inside the loop that calls `gpt(token_id, pos_id, keys_cache, values_cache)` and `softmax`, updating the comprehension `[l / temperature for l in logits]` to use the new name and ensure any other list comprehensions in that loop (e.g., `weights=[p.data for p in probs]`) remain correct.

gpt/local/course/ch11_Side_by_Side_Comparison/README.md-25-38 (1)
25-38: ⚠️ Potential issue | 🟡 Minor

Add a language tag to the architecture mapping fence.
markdownlint MD040 will flag this fenced block without a language.

🔧 Suggested fix
````diff
-```
+```text
 microGPT                 minGPT
 ────────                 ──────
 Value class  →  torch.Tensor + autograd
 ...
````
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gpt/local/course/ch11_Side_by_Side_Comparison/README.md` around lines 25-38, the fenced comparison block starting with "microGPT minGPT" is missing a language tag; add a language identifier (e.g., `text`) to the opening fence of the block that contains the mapping rows (Value class → torch.Tensor + autograd, state_dict['wte'][id] → nn.Embedding(...), etc.) to satisfy markdownlint MD040.

gpt/local/course/ch03_Autograd/gradient_descent.py-34-38 (1)
34-38: ⚠️ Potential issue | 🟡 Minor

Split one-line method definitions (Ruff E701).
This will otherwise fail Ruff's multi-statement-on-one-line rule.

🔧 Suggested fix
```diff
-    def __neg__(self): return self * -1
-    def __sub__(self, other): return self + (-other)
-    def __radd__(self, other): return self + other
-    def __rmul__(self, other): return self * other
+    def __neg__(self):
+        return self * -1
+    def __sub__(self, other):
+        return self + (-other)
+    def __radd__(self, other):
+        return self + other
+    def __rmul__(self, other):
+        return self * other
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gpt/local/course/ch03_Autograd/gradient_descent.py` around lines 34-38, the four dunder methods (`__neg__`, `__sub__`, `__radd__`, `__rmul__`) are written as one-line definitions, which violates Ruff E701; replace each one-line definition with a standard multi-line method block (`def __neg__(self):` then a new line with the return statement, etc.) so each method has its own indented body line.

gpt/local/course/ch08_Scaling_Up_with_PyTorch/README.md-30-42 (1)
30-42: ⚠️ Potential issue | 🟡 Minor

Add a language tag to the project structure fence.
markdownlint MD040 will flag this fenced block without a language.

🔧 Suggested fix
````diff
-```
+```text
 minGPT/
 ├── mingpt/
 │   ├── model.py   ← The GPT model (311 lines)
 ...
````
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gpt/local/course/ch08_Scaling_Up_with_PyTorch/README.md` around lines 30-42, the fenced project-tree block (the triple-backtick block that begins with "minGPT/") lacks a language tag and triggers markdownlint MD040; update the opening fence from "```" to "```text" so the code block is explicitly marked (leave the contents and the closing "```" unchanged) to satisfy the linter.

gpt/local/course/ch03_Autograd/computation_graph.py-99-99 (1)
99-99: ⚠️ Potential issue | 🟡 Minor

Remove extraneous `f` prefix from string literal.

This line uses an f-string but contains no placeholders.
Proposed fix
```diff
-print(f"\nBackward Pass (computing gradients):")
+print("\nBackward Pass (computing gradients):")
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gpt/local/course/ch03_Autograd/computation_graph.py` at line 99, the print statement uses an unnecessary f-string: find the call that currently reads `print(f"\nBackward Pass (computing gradients):")` and replace it with `print("\nBackward Pass (computing gradients):")` to remove the unused f prefix.

gpt/local/course/ch10_Training_and_Inference/trainer_explained.py-167-167 (1)
167-167: ⚠️ Potential issue | 🟡 Minor

Remove extraneous `f` prefix from string literal.

This line uses an f-string but contains no placeholders.
Proposed fix
```diff
-print(f"Example: [2, 0, 1] → [0, 1, 2]\n")
+print("Example: [2, 0, 1] → [0, 1, 2]\n")
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gpt/local/course/ch10_Training_and_Inference/trainer_explained.py` at line 167, the print statement uses an unnecessary f-string: change the call that currently reads `print(f"Example: [2, 0, 1] → [0, 1, 2]\n")` to the plain string literal `print("Example: [2, 0, 1] → [0, 1, 2]\n")` by removing the leading f.

gpt/local/course/ch03_Autograd/value_basics.py-148-148 (1)
148-148: ⚠️ Potential issue | 🟡 Minor

Remove extraneous `f` prefix from string literal.

This line uses an f-string but contains no placeholders. Use a regular string instead.
Proposed fix
```diff
-print(f"  By hand: dy/dx = 2 * 2 * (2x+3) = 4 * (2*1+3) = 20 ✓")
+print("  By hand: dy/dx = 2 * 2 * (2x+3) = 4 * (2*1+3) = 20 ✓")
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gpt/local/course/ch03_Autograd/value_basics.py` at line 148, the print statement uses an f-string with no placeholders; replace the f-string prefix on the print call (the line printing "By hand: dy/dx = 2 * 2 * (2x+3) = 4 * (2*1+3) = 20 ✓") with a plain string literal so it is not an f-string (the print sits near the end of the demonstration output).

gpt/local/course/ch12_Exercises_and_Next_Steps/exercise_solutions.py-135-135 (1)
135-135: ⚠️ Potential issue | 🟡 Minor

Ambiguous variable name `l`.

The variable `l` can be confused with `1` (one) or `I` (capital i). Use a more descriptive name like `logit`.

Proposed fix
```diff
-    scaled = [l / temp for l in logits]
+    scaled = [logit / temp for logit in logits]
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gpt/local/course/ch12_Exercises_and_Next_Steps/exercise_solutions.py` at line 135, rename the ambiguous list-comprehension iterator variable `l` to a descriptive name like `logit` in the expression that computes `scaled` from the `logits` iterable, so it reads "for logit in logits" and divides each `logit` by `temp`; update any nearby similar comprehensions in the same function to use `logit` for clarity.

gpt/local/course/ch06_Training_Loop_and_Optimization/training_loop.py-153-153 (1)
153-153: ⚠️ Potential issue | 🟡 Minor

Remove extraneous `f` prefix from string literal.

This line uses an f-string but contains no placeholders.
Proposed fix
```diff
-print(f"  (Should be 'abcabcabc...' if training worked!)")
+print("  (Should be 'abcabcabc...' if training worked!)")
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gpt/local/course/ch06_Training_Loop_and_Optimization/training_loop.py` at line 153, the print statement uses an unnecessary f-string: locate the call that currently reads `print(f" (Should be 'abcabcabc...' if training worked!)")` and remove the leading f so it becomes a plain string literal.

gpt/local/course/ch10_Training_and_Inference/trainer_explained.py-86-86 (1)
86-86: ⚠️ Potential issue | 🟡 Minor

Unused variable `B` in unpacking.

The batch dimension `B` is unpacked but never used. Prefix with underscore to indicate it's intentionally unused.

Proposed fix
```diff
-        B, T = idx.size()
+        _B, T = idx.size()
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gpt/local/course/ch10_Training_and_Inference/trainer_explained.py` at line 86, the tuple unpacking currently does `B, T = idx.size()` but `B` is never used; change the unused variable to a prefixed name (e.g., `_B, T = idx.size()`) so it's clear it's intentional and suppresses linter warnings; update the unpacking in the same scope where `idx.size()` is called and leave all other uses of `T` unchanged.

gpt/local/course/ch10_Training_and_Inference/trainer_explained.py-225-225 (1)
225-225: ⚠️ Potential issue | 🟡 Minor

Unused loop variable `y`.

The variable `y` is unpacked in the loop but never used. Rename to `_y` or `_` to indicate it's intentionally ignored.

Proposed fix
```diff
-    for b, (x, y) in enumerate(loader):
+    for b, (x, _) in enumerate(loader):
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gpt/local/course/ch10_Training_and_Inference/trainer_explained.py` at line 225, the loop unpacks `(x, y)` but never uses `y`; update the loop header to show the variable is intentionally ignored by renaming it (e.g., change `for b, (x, y) in enumerate(loader):` to `for b, (x, _) in enumerate(loader):` or `for b, (x, _y) in enumerate(loader):`), which avoids unused-variable warnings and clarifies intent in the loop where `enumerate(loader)` is iterated.

gpt/local/course/ch03_Autograd/value_basics.py-11-11 (1)
11-11: ⚠️ Potential issue | 🟡 Minor

Unused import.

The `math` module is imported but never used in this file.

Proposed fix
```diff
-import math
-
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gpt/local/course/ch03_Autograd/value_basics.py` at line 11, remove the unused import of the `math` module: delete the `import math` statement from the top of the file or, if you intended to use `math` somewhere, actually use it; otherwise remove the import to eliminate the unused dependency and lint warning.

gpt/local/course/ch05_Attention_and_Transformers/README.md-44-44 (1)
44-44: ⚠️ Potential issue | 🟡 Minor

Attention weights don't sum to 1.0.
The example shows `weights = [0.35, 0.05, 0.18, 0.27, 0.05]`, which sums to 0.90, not 1.0 as expected from softmax output. This may confuse learners.

📝 Suggested fix
```diff
-After softmax: weights = [0.35, 0.05, 0.18, 0.27, 0.05]
+After softmax: weights = [0.35, 0.05, 0.20, 0.30, 0.10]
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gpt/local/course/ch05_Attention_and_Transformers/README.md` at line 44, the listed softmax output `weights = [0.35, 0.05, 0.18, 0.27, 0.05]` does not sum to 1.0; replace that line with a properly normalized softmax result (divide each value by 0.9) so it reads `weights = [0.3888889, 0.0555556, 0.2, 0.3, 0.0555556]` (or similar rounded values) to ensure the weights sum to 1.0, and update the surrounding text to state these are normalized softmax probabilities.

gpt/local/course/ch02_Tokenization/char_tokenizer.py-43-45 (1)
43-45: ⚠️ Potential issue | 🟡 Minor

`encode()` will raise `KeyError` for characters not in the training vocabulary.

If a user tries to encode text containing characters not present in the `docs` dataset (e.g., uppercase letters, spaces, digits), a `KeyError` will be raised. For an educational demo this is acceptable, but consider adding a brief comment noting this limitation.

📝 Suggested documentation improvement
```diff
 def encode(text):
-    """Convert a string to a list of token IDs"""
+    """Convert a string to a list of token IDs.
+
+    Note: Raises KeyError for characters not in vocabulary.
+    """
     return [char_to_id[ch] for ch in text]
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gpt/local/course/ch02_Tokenization/char_tokenizer.py` around lines 43-45, the `encode` function currently uses a `char_to_id` lookup and will raise `KeyError` for characters outside the training vocabulary; add a brief inline comment above `encode` noting this limitation (that unknown characters like uppercase, spaces, digits will cause `KeyError`) and optionally suggest handling strategies (e.g., pre-normalize text, add an `"<unk>"` token, or wrap lookups with `.get` and a fallback) so readers understand the behavior and how to mitigate it.

gpt/local/course/ch09_minGPT_Model_Deep_Dive/model_sizes.py-21-57 (1)
21-57: ⚠️ Potential issue | 🟡 Minor

The `n_head` parameter is unused in the calculation.

The function accepts `n_head` but doesn't use it. While `n_head` doesn't affect total parameter count (it only changes how parameters are partitioned within attention), having an unused parameter could confuse students. Consider either:

- Removing the parameter, or
- Adding a comment explaining why it doesn't affect parameter count

📝 Option 1: Remove unused parameter
```diff
-def count_gpt_params(vocab_size, block_size, n_layer, n_head, n_embd):
+def count_gpt_params(vocab_size, block_size, n_layer, n_embd):
     """Count parameters in a GPT model without building it."""
```

Note: This would require updating all call sites to remove `n_head`.

📝 Option 2: Add explanatory comment
```diff
-def count_gpt_params(vocab_size, block_size, n_layer, n_head, n_embd):
-    """Count parameters in a GPT model without building it."""
+def count_gpt_params(vocab_size, block_size, n_layer, n_head, n_embd):  # noqa: ARG001
+    """Count parameters in a GPT model without building it.
+
+    Note: n_head is accepted for API consistency with model configs but doesn't
+    affect parameter count (it only determines how attention weights are partitioned).
+    """
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gpt/local/course/ch09_minGPT_Model_Deep_Dive/model_sizes.py` around lines 21-57, the function `count_gpt_params` currently accepts `n_head` but never uses it; either remove the parameter from the signature and update all call sites that pass `n_head`, or (preferred, to avoid touching call sites) keep `n_head` and add a concise docstring/comment in `count_gpt_params` explaining why `n_head` does not change the total parameter count (e.g., attention weights are stored as a combined `c_attn` of size `n_embd x 3*n_embd`, so head count only affects tensor reshaping/partitioning, not parameter count), and update the function docstring to mention this explicitly so readers aren't confused.

gpt/local/course/ch05_Attention_and_Transformers/transformer_block.py-63-75 (1)
63-75: ⚠️ Potential issue | 🟡 Minor

The `pos` parameter is unused inside the function.

The `pos` argument is accepted but not used within `transformer_block`. It's only used in the demo's print statements. Consider removing it from the function signature or adding a comment explaining it's for external logging purposes.

📝 Option 1: Remove the parameter
```diff
-def transformer_block(x, keys_cache, values_cache, pos):
+def transformer_block(x, keys_cache, values_cache):
     """
     Process one token through one transformer block.

     Args:
         x: input embedding vector [n_embd]
         keys_cache: list of previous key vectors
         values_cache: list of previous value vectors
-        pos: position index (for printing)

     Returns:
         output vector [n_embd]
     """
```

Then update the call site:
```diff
-        x_out = transformer_block(x_in, keys_cache, values_cache, pos)
+        x_out = transformer_block(x_in, keys_cache, values_cache)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gpt/local/course/ch05_Attention_and_Transformers/transformer_block.py` around lines 63 - 75, The transformer_block function currently accepts a pos parameter that is never used; either remove pos from the transformer_block signature and update all call sites that pass pos accordingly (references: transformer_block(x, keys_cache, values_cache, pos)) or keep it but add a clear comment in the function docstring stating pos is only for external logging/demo and will not be used by the function (update signature remains transformer_block(x, keys_cache, values_cache, pos)); choose one approach and make consistent changes where transformer_block is invoked.
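One of the comments above flags attention weights in the ch05 README that sum to 0.90. A tiny standalone check (scores here are illustrative, not taken from the course files) confirms the underlying point: genuine softmax output always sums to 1, so the listed values cannot be softmax output.

```python
import math


def softmax(scores):
    # Numerically stable softmax: shift by the max, exponentiate, normalize.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative raw attention scores.
weights = softmax([2.0, 0.1, 1.3, 1.7, 0.1])
assert abs(sum(weights) - 1.0) < 1e-9  # softmax output always sums to 1

# The README's listed weights sum to 0.90, so they can't be softmax output.
listed = [0.35, 0.05, 0.18, 0.27, 0.05]
assert abs(sum(listed) - 0.90) < 1e-9
```

This is the same normalization argument behind the suggested fix: any replacement values must sum to exactly 1.0.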
🧹 Nitpick comments (20)
gpt/local/course/ch11_Side_by_Side_Comparison/comparison.py (1)
13-23: Remove extraneous `f` prefixes in `compare()` function.

Lines 17 and 20 use f-strings but contain no placeholders. Use regular strings instead.
Proposed fix
```diff
 def compare(title, micro_code, min_code, explanation):
     print("=" * 70)
     print(f"  {title}")
     print("=" * 70)
-    print(f"\n  microGPT (pure Python):")
+    print("\n  microGPT (pure Python):")
     for line in micro_code.strip().split('\n'):
         print(f"    {line}")
-    print(f"\n  minGPT (PyTorch):")
+    print("\n  minGPT (PyTorch):")
     for line in min_code.strip().split('\n'):
         print(f"    {line}")
     print(f"\n  → {explanation}\n")
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gpt/local/course/ch11_Side_by_Side_Comparison/comparison.py` around lines 13-23, the `compare` function uses f-strings where no interpolation occurs (in the print calls that output section headers); remove the unnecessary 'f' prefixes and convert those to plain string literals, specifically the print calls that currently use `f"\n  microGPT (pure Python):"` and `f"\n  minGPT (PyTorch):"`, leaving other f-strings (those with placeholders) unchanged.

gpt/local/course/ch06_Training_Loop_and_Optimization/adam_optimizer.py (1)
92-124: Remove extraneous `f` prefix from string without placeholders.

This multi-line string contains no format placeholders, so the `f` prefix is unnecessary and triggers a linter warning.

Proposed fix
```diff
-print(f"""
+print("""
 === Why Adam is Better ===
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gpt/local/course/ch06_Training_Loop_and_Optimization/adam_optimizer.py` around lines 92-124, the multi-line print call uses an unnecessary f-string prefix: remove the leading 'f' from the `print(f"""...""")` so the call becomes `print("""...""")` (leave the triple-quoted string contents unchanged); locate the print invocation that begins with the "=== Why Adam is Better ===" block and drop the 'f' prefix to eliminate the linter warning.

gpt/local/course/ch06_Training_Loop_and_Optimization/cross_entropy.py (1)
44-44: Remove extraneous `f` prefix from strings without placeholders.

Lines 44, 57, 74-79, 110, 120, and 121 use f-strings but contain no placeholders. Use regular strings instead to avoid confusion and satisfy the linter.
Example fix for line 44
```diff
-print(f"\nStep 1 - Model outputs logits (raw scores):")
+print("\nStep 1 - Model outputs logits (raw scores):")
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gpt/local/course/ch06_Training_Loop_and_Optimization/cross_entropy.py` at line 44, several print statements (e.g., the one printing "Step 1 - Model outputs logits (raw scores):" and the other prints at the lines noted) use f-strings even though they have no placeholders; replace the f-prefixed strings with plain string literals (remove the leading f) for each occurrence at lines 44, 57, 74-79, 110, 120, and 121 so the prints become normal strings and satisfy the linter.

gpt/local/course/ch08_Scaling_Up_with_PyTorch/pytorch_vs_manual.py (2)
158-158: Move import to top of file.

The `import time` statement on line 158 should be moved to the top of the file with other imports (after line 11) per PEP 8 conventions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gpt/local/course/ch08_Scaling_Up_with_PyTorch/pytorch_vs_manual.py` at line 158, Move the stray "import time" statement into the module-level imports grouped with the other top-of-file imports (i.e., place the import time alongside the existing imports near the top of the file) to comply with PEP 8; remove the duplicate/misplaced import at its current location so only the top-level "import time" remains.
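For illustration of the layout the comment above asks for, a minimal sketch with the stdlib import grouped at the top (the `timed` helper is hypothetical, not code from the course file):

```python
import time  # stdlib imports grouped at the top of the module, per PEP 8


def timed(fn, *args):
    # Measure the wall-clock duration of a single call using a
    # monotonic high-resolution timer.
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


value, elapsed = timed(sum, range(1000))
assert value == 499500
assert elapsed >= 0.0
```

Keeping `import time` at module level also makes the file's dependencies visible at a glance instead of burying them inside the timing section.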
99-99: Split statement onto separate lines for readability.

Line 99 has multiple statements on one line, which reduces readability and violates PEP 8 style.
Proposed fix
```diff
 def build(v):
     if v not in visited:
         visited.add(v)
-        for c in v._children: build(c)
+        for c in v._children:
+            build(c)
         topo.append(v)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gpt/local/course/ch08_Scaling_Up_with_PyTorch/pytorch_vs_manual.py` at line 99, the single-line compound loop `for c in v._children: build(c)` reduces readability; split it into a two-line loop in the `build` function so the loop header iterates over `v._children` and the body calls `build(c)` on an indented next line. Ensure proper indentation to match surrounding code style and PEP 8.

gpt/local/course/ch01_What_is_a_Language_Model/README.md (1)
13-16: Consider adding language specifiers to fenced code blocks.

Lines 13-16, 28-33, and 84-86 use fenced code blocks without language identifiers. Adding `text` or `plaintext` would improve rendering consistency and silence lint warnings.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gpt/local/course/ch01_What_is_a_Language_Model/README.md` around lines 13-16, the fenced code blocks (the block starting with 'You type: "How are ___"' and the other blocks at the line ranges referenced in the review) are missing language specifiers; update each triple-backtick fence to include a language like `text` or `plaintext` so the blocks render consistently and lint warnings are silenced.

gpt/local/course/ch12_Exercises_and_Next_Steps/README.md (1)
93-101: Add language specifier to fenced code block.

The Quick Reference Card code block lacks a language identifier, which triggers a markdown lint warning and may affect rendering in some viewers. Since it's plain text/pseudocode, consider using `text` or `plaintext`.

Proposed fix

````diff
-```
+```text
 GPT in one paragraph:
 Tokenize text into integers.
 Embed each token into a vector.
````

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gpt/local/course/ch12_Exercises_and_Next_Steps/README.md` around lines 93 - 101, The fenced code block containing the "GPT in one paragraph:" quick reference currently has no language tag; update the opening fence from ``` to ```text (or ```plaintext) so the block is marked as plain text, which will satisfy markdown linting and improve rendering — locate the code block that begins with the line "GPT in one paragraph:" in README.md and add the language specifier to its opening backticks.

gpt/local/course/ch06_Training_Loop_and_Optimization/training_loop.py (1)
65-66: Use `def` instead of assigning lambda to a variable.

Lambda expressions assigned to variables should be rewritten as function definitions for clarity and better error messages.

Proposed fix

```diff
-matrix = lambda rows, cols: [[Value(random.gauss(0, 0.5)) for _ in range(cols)]
-                             for _ in range(rows)]
+def matrix(rows, cols):
+    return [[Value(random.gauss(0, 0.5)) for _ in range(cols)]
+            for _ in range(rows)]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gpt/local/course/ch06_Training_Loop_and_Optimization/training_loop.py` around lines 65 - 66, Replace the assigned lambda named "matrix" with a proper function definition (e.g., def matrix(rows, cols): ...) that returns the same list-of-lists of Value(random.gauss(0, 0.5)) so the behavior is unchanged; update any references to the "matrix" callable accordingly and keep the construction of Value and random.gauss(...) intact for clarity and better tracebacks.

gpt/local/course/ch10_Training_and_Inference/trainer_explained.py (1)
133-145: Move `import pickle` to the top of the file.

Importing `pickle` inside `__getitem__` re-executes the import statement on every dataset access (the module itself is cached after the first import, but the repeated lookup is needless overhead). Move it to the module level.

Proposed fix

Add at top of file (after other imports):

```python
import pickle
```

Then remove the import from `__getitem__`:

```diff
 def __getitem__(self, idx):
-    import pickle
     while True:
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gpt/local/course/ch10_Training_and_Inference/trainer_explained.py` around lines 133 - 145, The import of pickle inside the __getitem__ method causes repeated imports on every access; move "import pickle" to the module-level imports at the top of the file (alongside other imports) and remove the inline import from the __getitem__ function so __getitem__ uses the top-level pickle name directly.

gpt/local/course/ch03_Autograd/README.md (2)
36-40: Add language specifier to fenced code block.

This ASCII diagram would benefit from a language specifier.

Proposed fix

````diff
-```
+```text
 a ──┐
     ├──(+)──→ c
 b ──┘
 ```
````

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gpt/local/course/ch03_Autograd/README.md` around lines 36 - 40, Update the fenced code block containing the ASCII diagram (the lines starting with "a ──┐", " ├──(+)──→ c", "b ──┘") to include a language specifier (e.g., "text") after the opening triple backticks so the diagram is treated as plain text; leave the diagram content unchanged and ensure the opening fence becomes ```text and the closing fence remains ``` to apply proper formatting.
15-17: Add language specifier to fenced code block.

Adding a language identifier (e.g., `text` or `math`) helps with rendering and accessibility.

Proposed fix

````diff
-```
+```text
 dy/dx = dy/dg * dg/dx
 ```
````

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gpt/local/course/ch03_Autograd/README.md` around lines 15 - 17, The fenced code block containing the expression "dy/dx = dy/dg * dg/dx" lacks a language specifier; update the opening fence from ``` to ```text (or ```math) and keep the closing ``` so the block becomes a labeled fenced code block for proper rendering and accessibility; locate the block that wraps the exact expression "dy/dx = dy/dg * dg/dx" and add the language token to the opening fence.

gpt/local/course/ch05_Attention_and_Transformers/multi_head_attention.py (1)
119-119: Remove unnecessary f-string prefix.

This string has no placeholders, so the `f` prefix is extraneous.

📝 Suggested fix

```diff
-print(f"\n--- Outputs ---")
+print("\n--- Outputs ---")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gpt/local/course/ch05_Attention_and_Transformers/multi_head_attention.py` at line 119, The print statement using an unnecessary f-string prefix should be changed to a plain string literal: locate the print call that currently reads print(f"\n--- Outputs ---") in multi_head_attention.py and remove the leading f so the statement prints "\n--- Outputs ---" without formatting; ensure no other f-string formatting is accidentally removed nearby.

gpt/local/course/ch10_Training_and_Inference/generate_text.py (1)
91-91: Remove unnecessary f-string prefix.

This string has no placeholders, so the `f` prefix is extraneous.

📝 Suggested fix

```diff
-print(f" Sampling (10 tries): ", end="")
+print(" Sampling (10 tries): ", end="")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gpt/local/course/ch10_Training_and_Inference/generate_text.py` at line 91, The print call uses an unnecessary f-string prefix; update the print invocation (print(f" Sampling (10 tries): ", end="")) to use a plain string (remove the leading 'f') so it becomes print(" Sampling (10 tries): ", end=""); locate this in generate_text.py where that print statement appears.

gpt/local/course/ch09_minGPT_Model_Deep_Dive/model_walkthrough.py (4)
125-129: Remove extraneous `f` prefix from string without placeholders.

This string has no format placeholders, so the `f` prefix is unnecessary.

✨ Proposed fix

```diff
-print(f"\n Causal mask (first 5×5):")
+print("\n Causal mask (first 5×5):")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gpt/local/course/ch09_minGPT_Model_Deep_Dive/model_walkthrough.py` around lines 125 - 129, The print statement uses an unnecessary f-string prefix: remove the leading "f" from the literal in the print call that outputs " Causal mask (first 5×5):" (the block using attn.bias, mask, and the loop over i) so it becomes a normal string literal; keep the rest of the code (mask = attn.bias[0, 0, :5, :5] and the subsequent loop building rows and printing counts) unchanged.
234-246: Remove extraneous `f` prefixes from strings without placeholders.

Lines 234 and 242 use f-strings but have no placeholders.

✨ Proposed fix

```diff
-print(f" Model: vocab=27, block_size=16, layers=3, heads=3, embd=48")
+print(" Model: vocab=27, block_size=16, layers=3, heads=3, embd=48")
```

```diff
-print(f"\n Forward pass:")
+print("\n Forward pass:")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gpt/local/course/ch09_minGPT_Model_Deep_Dive/model_walkthrough.py` around lines 234 - 246, Two print statements use f-strings even though they contain no placeholders; remove the unnecessary f prefix to use normal string literals for the lines printing the model summary and the forward pass header. Specifically, change the print call that outputs " Model: vocab=27, block_size=16, layers=3, heads=3, embd=48" and the print call that outputs "\n Forward pass:" to plain print("...") (leave other f-strings like the logits/ loss prints unchanged); locate these near the code that calls model(idx, targets) and prints shapes/loss.
200-202: Prefix unused variable with underscore.

The batch dimension `B` is unpacked but never used. Prefixing with underscore signals intentional non-use.

✨ Proposed fix

```diff
 def forward(self, idx, targets=None):
-    B, T = idx.size()
+    _B, T = idx.size()
     pos = torch.arange(0, T, dtype=torch.long, device=idx.device)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gpt/local/course/ch09_minGPT_Model_Deep_Dive/model_walkthrough.py` around lines 200 - 202, In the forward method, the batch dimension B is unpacked but never used; change the unpacking in def forward(self, idx, targets=None) from "B, T = idx.size()" to use a prefixed underscore for the unused variable (e.g., "_, T = idx.size()") so it's clear B is intentionally ignored and avoid unused-variable warnings; ensure the rest of the method (pos = torch.arange(..., device=idx.device) and any uses of T/idx) remains unchanged.
173-173: Remove extraneous `f` prefix.

✨ Proposed fix

```diff
-print(f" Components: LayerNorm + Attention + LayerNorm + MLP(expand→GELU→compress)")
+print(" Components: LayerNorm + Attention + LayerNorm + MLP(expand→GELU→compress)")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gpt/local/course/ch09_minGPT_Model_Deep_Dive/model_walkthrough.py` at line 173, The print statement in model_walkthrough.py currently uses an unnecessary f-string prefix (print(f" Components: LayerNorm + Attention + LayerNorm + MLP(expand→GELU→compress)")); remove the extraneous `f` so it becomes a plain string print call (print(" Components: LayerNorm + Attention + LayerNorm + MLP(expand→GELU→compress)")) to avoid misleading formatting semantics.

gpt/local/course/ch02_Tokenization/char_tokenizer.py (1)
57-59: Consider using list unpacking for cleaner syntax.

Per static analysis hint (RUF005), list unpacking is more idiomatic.

✨ Optional style improvement

```diff
-    tokens_with_bos = [BOS] + tokens + [BOS]
+    tokens_with_bos = [BOS, *tokens, BOS]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gpt/local/course/ch02_Tokenization/char_tokenizer.py` around lines 57 - 59, Replace the concatenation-based construction of tokens_with_bos with list unpacking for clarity: instead of building tokens_with_bos by adding lists around tokens, use Python's unpacking to create the list with BOS, then all elements of tokens, then BOS (reference: tokens_with_bos, tokens, BOS).

gpt/local/course/ch09_minGPT_Model_Deep_Dive/model_sizes.py (1)
10-16: PyTorch is imported but not actually used.

The function `count_gpt_params` performs pure arithmetic and doesn't require PyTorch. The import could be removed, making the script dependency-free and runnable without PyTorch installed.

♻️ Proposed fix

```diff
-try:
-    import torch
-    import torch.nn as nn
-    import math
-except ImportError:
-    print("PyTorch not installed. Run: pip install torch")
-    exit(1)
+import math
```

Update the docstring accordingly:

```diff
-Requires: pip install torch
+No dependencies required.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gpt/local/course/ch09_minGPT_Model_Deep_Dive/model_sizes.py` around lines 10 - 16, Remove the unused PyTorch imports at the top (the try/except that imports torch, torch.nn, math) since count_gpt_params performs only arithmetic and doesn't need torch; delete that import block and any exit(1) logic, ensure math (if used) remains imported or add it explicitly, and update the module/docstring for count_gpt_params to state the function is dependency-free and does not require PyTorch.

gpt/local/course/ch05_Attention_and_Transformers/transformer_block.py (1)
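To make the "pure arithmetic" point concrete, here is a hedged sketch of such a dependency-free counter, assuming the standard GPT-2 layout (biased linear layers, learned positional embeddings, weight-tied output head); the actual model_sizes.py may differ in details:

```python
def count_gpt_params(vocab_size, block_size, n_layer, n_head, n_embd):
    """Parameter count for a GPT-2-style model, by arithmetic alone.
    n_head is accepted for symmetry but does not change the count."""
    # Token embedding table + learned positional embedding table.
    params = vocab_size * n_embd + block_size * n_embd
    per_block = 0
    # Attention: fused qkv projection and output projection, both biased.
    per_block += n_embd * (3 * n_embd) + 3 * n_embd
    per_block += n_embd * n_embd + n_embd
    # MLP: expand to 4*n_embd, then project back, both biased.
    per_block += n_embd * (4 * n_embd) + 4 * n_embd
    per_block += (4 * n_embd) * n_embd + n_embd
    # Two LayerNorms per block (gain and bias each).
    per_block += 2 * (2 * n_embd)
    params += n_layer * per_block
    # Final LayerNorm; lm_head shares the token table (no new weights).
    params += 2 * n_embd
    return params

# GPT-2 small reproduces the well-known ~124M figure.
print(count_gpt_params(50257, 1024, 12, 12, 768))  # → 124439808
```

None of this touches torch, which is why the review suggests dropping the import entirely.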
31-35: Minor: `softmax` implementation differs from `attention_basics.py`.

This version doesn't handle `-inf` values, whereas `attention_basics.py` (lines 131-135) does. This is fine here since no masking is applied, but worth noting for consistency across chapters if students compare implementations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gpt/local/course/ch05_Attention_and_Transformers/transformer_block.py` around lines 31 - 35, The softmax function currently computes a stable softmax but doesn't handle -inf entries like the version in attention_basics.py; update softmax to treat -math.inf logits as masked (produce zero probability) by excluding them from the max computation and setting their exponent to 0 (or map -inf to a very large negative constant before subtracting max) so the output probabilities for -inf entries are zero and the remaining logits normalize correctly; locate and modify the softmax function to mirror the -inf handling used in attention_basics.py.
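By way of illustration, a small masking-aware stable softmax in the spirit the prompt describes (a sketch only, not the actual attention_basics.py code):

```python
import math

def softmax(logits):
    """Numerically stable softmax that treats -inf entries as masked."""
    finite = [x for x in logits if x != -math.inf]
    m = max(finite)  # take the max over unmasked entries only
    # exp(-inf - m) would evaluate to 0.0 anyway; guard explicitly for clarity.
    exps = [0.0 if x == -math.inf else math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([0.5, -math.inf, 1.5])  # middle position is masked out
print(probs[1])  # → 0.0
```

Masked positions get exactly zero probability, and the remaining entries still sum to one, which is the behavior causal attention relies on.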
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@gpt/local/course/ch01_What_is_a_Language_Model/README.md`:
- Around line 13-16: The fenced code blocks in README.md (the block starting
with 'You type: "How are ___"' and the other blocks around lines referenced in
the review) are missing language specifiers; update each triple-backtick fence
to include a language like text or plaintext (e.g., change ``` to ```text) so
the blocks render consistently and lint warnings are silenced for the snippets
shown (including the blocks containing "You type: \"How are ___\"" and the other
examples mentioned in the comment).
In `@gpt/local/course/ch02_Tokenization/char_tokenizer.py`:
- Around line 57-59: Replace the concatenation-based construction of
tokens_with_bos with list unpacking for clarity: instead of building
tokens_with_bos by adding lists around tokens, use Python's unpacking to create
the list with BOS, then all elements of tokens, then BOS (reference:
tokens_with_bos, tokens, BOS).
In `@gpt/local/course/ch03_Autograd/README.md`:
- Around line 36-40: Update the fenced code block containing the ASCII diagram
(the lines starting with "a ──┐", " ├──(+)──→ c", "b ──┘") to include a
language specifier (e.g., "text") after the opening triple backticks so the
diagram is treated as plain text; leave the diagram content unchanged and ensure
the opening fence becomes ```text and the closing fence remains ``` to apply
proper formatting.
- Around line 15-17: The fenced code block containing the expression "dy/dx =
dy/dg * dg/dx" lacks a language specifier; update the opening fence from ``` to
```text (or ```math) and keep the closing ``` so the block becomes a labeled
fenced code block for proper rendering and accessibility; locate the block that
wraps the exact expression "dy/dx = dy/dg * dg/dx" and add the language token to
the opening fence.
In `@gpt/local/course/ch05_Attention_and_Transformers/multi_head_attention.py`:
- Line 119: The print statement using an unnecessary f-string prefix should be
changed to a plain string literal: locate the print call that currently reads
print(f"\n--- Outputs ---") in multi_head_attention.py and remove the leading f
so the statement prints "\n--- Outputs ---" without formatting; ensure no other
f-string formatting is accidentally removed nearby.
In `@gpt/local/course/ch05_Attention_and_Transformers/transformer_block.py`:
- Around line 31-35: The softmax function currently computes a stable softmax
but doesn't handle -inf entries like the version in attention_basics.py; update
softmax to treat -math.inf logits as masked (produce zero probability) by
excluding them from the max computation and setting their exponent to 0 (or map
-inf to a very large negative constant before subtracting max) so the output
probabilities for -inf entries are zero and the remaining logits normalize
correctly; locate and modify the softmax function to mirror the -inf handling
used in attention_basics.py.
In `@gpt/local/course/ch06_Training_Loop_and_Optimization/adam_optimizer.py`:
- Around line 92-124: The multi-line print call uses an unnecessary f-string
prefix: remove the leading 'f' from the print(f"""...""") so the call becomes
print("""...""") (leave the triple-quoted string contents unchanged); locate the
print invocation that begins with the "=== Why Adam is Better ===" block in
adam_optimizer.py and drop the 'f' prefix to eliminate the linter warning.
In `@gpt/local/course/ch06_Training_Loop_and_Optimization/cross_entropy.py`:
- Line 44: Several print statements (e.g., the one printing "Step 1 - Model
outputs logits (raw scores):" and other prints at lines noted) use f-strings
even though they have no placeholders; replace the f-prefixed strings with plain
string literals (remove the leading f) for each occurrence (search for the exact
printed messages such as "Step 1 - Model outputs logits (raw scores):" and the
other messages at lines 57, 74-79, 110, 120, 121) so the prints become normal
strings and satisfy the linter.
In `@gpt/local/course/ch06_Training_Loop_and_Optimization/training_loop.py`:
- Around line 65-66: Replace the assigned lambda named "matrix" with a proper
function definition (e.g., def matrix(rows, cols): ...) that returns the same
list-of-lists of Value(random.gauss(0, 0.5)) so the behavior is unchanged;
update any references to the "matrix" callable accordingly and keep the
construction of Value and random.gauss(...) intact for clarity and better
tracebacks.
In `@gpt/local/course/ch08_Scaling_Up_with_PyTorch/pytorch_vs_manual.py`:
- Line 158: Move the stray "import time" statement into the module-level imports
grouped with the other top-of-file imports (i.e., place the import time
alongside the existing imports near the top of the file) to comply with PEP 8;
remove the duplicate/misplaced import at its current location so only the
top-level "import time" remains.
- Line 99: The single-line compound loop "for c in v._children: build(c)"
reduces readability; split it into a two-line loop in the build function so the
loop header iterates over v._children and the body calls build(c) on an indented
next line, referencing v._children and the build function to locate the change.
Ensure proper indentation to match surrounding code style and PEP8.
In `@gpt/local/course/ch09_minGPT_Model_Deep_Dive/model_sizes.py`:
- Around line 10-16: Remove the unused PyTorch imports at the top (the
try/except that imports torch, torch.nn, math) since count_gpt_params performs
only arithmetic and doesn't need torch; delete that import block and any exit(1)
logic, ensure math (if used) remains imported or add it explicitly, and update
the module/docstring for count_gpt_params to state the function is
dependency-free and does not require PyTorch.
In `@gpt/local/course/ch09_minGPT_Model_Deep_Dive/model_walkthrough.py`:
- Around line 125-129: The print statement uses an unnecessary f-string prefix:
remove the leading "f" from the literal in the print call that outputs " Causal
mask (first 5×5):" (the block using attn.bias, mask, and the loop over i) so it
becomes a normal string literal; keep the rest of the code (mask = attn.bias[0,
0, :5, :5] and the subsequent loop building rows and printing counts) unchanged.
- Around line 234-246: Two print statements use f-strings even though they
contain no placeholders; remove the unnecessary f prefix to use normal string
literals for the lines printing the model summary and the forward pass header.
Specifically, change the print call that outputs " Model: vocab=27,
block_size=16, layers=3, heads=3, embd=48" and the print call that outputs "\n
Forward pass:" to plain print("...") (leave other f-strings like the logits/
loss prints unchanged); locate these near the code that calls model(idx,
targets) and prints shapes/loss.
- Around line 200-202: In the forward method, the batch dimension B is unpacked
but never used; change the unpacking in def forward(self, idx, targets=None)
from "B, T = idx.size()" to use a prefixed underscore for the unused variable
(e.g., "_, T = idx.size()") so it's clear B is intentionally ignored and avoid
unused-variable warnings; ensure the rest of the method (pos = torch.arange(...,
device=idx.device) and any uses of T/idx) remains unchanged.
- Line 173: The print statement in model_walkthrough.py currently uses an
unnecessary f-string prefix (print(f" Components: LayerNorm + Attention +
LayerNorm + MLP(expand→GELU→compress)")); remove the extraneous `f` so it
becomes a plain string print call (print(" Components: LayerNorm + Attention +
LayerNorm + MLP(expand→GELU→compress)")) to avoid misleading formatting
semantics.
In `@gpt/local/course/ch10_Training_and_Inference/generate_text.py`:
- Line 91: The print call uses an unnecessary f-string prefix; update the print
invocation (print(f" Sampling (10 tries): ", end="")) to use a plain string
(remove the leading 'f') so it becomes print(" Sampling (10 tries): ", end="");
locate this in generate_text.py where that print statement appears.
In `@gpt/local/course/ch10_Training_and_Inference/trainer_explained.py`:
- Around line 133-145: The import of pickle inside the __getitem__ method causes
repeated imports on every access; move "import pickle" to the module-level
imports at the top of the file (alongside other imports) and remove the inline
import from the __getitem__ function so __getitem__ uses the top-level pickle
name directly.
In `@gpt/local/course/ch11_Side_by_Side_Comparison/comparison.py`:
- Around line 13-23: The compare function uses f-strings where no interpolation
occurs (in the print calls that output section headers), so remove the
unnecessary 'f' prefixes and convert those to plain string literals;
specifically update the print calls in compare that currently use f"\n microGPT
(pure Python):" and f"\n minGPT (PyTorch):" to "\n microGPT (pure Python):"
and "\n minGPT (PyTorch):", leaving other f-strings (those with placeholders)
unchanged.
In `@gpt/local/course/ch12_Exercises_and_Next_Steps/README.md`:
- Around line 93-101: The fenced code block containing the "GPT in one
paragraph:" quick reference currently has no language tag; update the opening
fence from ``` to ```text (or ```plaintext) so the block is marked as plain
text, which will satisfy markdown linting and improve rendering — locate the
code block that begins with the line "GPT in one paragraph:" in README.md and
add the language specifier to its opening backticks.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (37)
gpt/local/course/README.md
gpt/local/course/ch01_What_is_a_Language_Model/.ipynb_checkpoints/language_model_idea-checkpoint.py
gpt/local/course/ch01_What_is_a_Language_Model/README.md
gpt/local/course/ch01_What_is_a_Language_Model/language_model_idea.py
gpt/local/course/ch02_Tokenization/README.md
gpt/local/course/ch02_Tokenization/bpe_intuition.py
gpt/local/course/ch02_Tokenization/char_tokenizer.py
gpt/local/course/ch03_Autograd/README.md
gpt/local/course/ch03_Autograd/computation_graph.py
gpt/local/course/ch03_Autograd/gradient_descent.py
gpt/local/course/ch03_Autograd/value_basics.py
gpt/local/course/ch04_Neural_Network_Building_Blocks/README.md
gpt/local/course/ch04_Neural_Network_Building_Blocks/building_blocks.py
gpt/local/course/ch04_Neural_Network_Building_Blocks/mlp.py
gpt/local/course/ch05_Attention_and_Transformers/README.md
gpt/local/course/ch05_Attention_and_Transformers/attention_basics.py
gpt/local/course/ch05_Attention_and_Transformers/multi_head_attention.py
gpt/local/course/ch05_Attention_and_Transformers/transformer_block.py
gpt/local/course/ch06_Training_Loop_and_Optimization/README.md
gpt/local/course/ch06_Training_Loop_and_Optimization/adam_optimizer.py
gpt/local/course/ch06_Training_Loop_and_Optimization/cross_entropy.py
gpt/local/course/ch06_Training_Loop_and_Optimization/training_loop.py
gpt/local/course/ch07_microGPT_Full_Walkthrough/README.md
gpt/local/course/ch07_microGPT_Full_Walkthrough/microgpt_annotated.py
gpt/local/course/ch08_Scaling_Up_with_PyTorch/README.md
gpt/local/course/ch08_Scaling_Up_with_PyTorch/pytorch_basics.py
gpt/local/course/ch08_Scaling_Up_with_PyTorch/pytorch_vs_manual.py
gpt/local/course/ch09_minGPT_Model_Deep_Dive/README.md
gpt/local/course/ch09_minGPT_Model_Deep_Dive/model_sizes.py
gpt/local/course/ch09_minGPT_Model_Deep_Dive/model_walkthrough.py
gpt/local/course/ch10_Training_and_Inference/README.md
gpt/local/course/ch10_Training_and_Inference/generate_text.py
gpt/local/course/ch10_Training_and_Inference/trainer_explained.py
gpt/local/course/ch11_Side_by_Side_Comparison/README.md
gpt/local/course/ch11_Side_by_Side_Comparison/comparison.py
gpt/local/course/ch12_Exercises_and_Next_Steps/README.md
gpt/local/course/ch12_Exercises_and_Next_Steps/exercise_solutions.py
GPT Course for Early CS Students
A 12-chapter course that walks through Andrej Karpathy's two GPT implementations — from zero AI knowledge to full understanding.
Course Structure
Run any chapter script with: python <script>.py