Write tests first, code second — and think like a QA engineer while doing both
Test-Driven Development (TDD) is a discipline that transforms how you write code. Instead of writing code and then testing it, you write tests first and let them drive your implementation. This guide covers TDD methodology, Python-specific testing patterns, and most importantly — how to develop the QA mindset that separates good tests from great ones.
While 08-tooling-testing.md covers basic pytest and unittest mechanics, this guide focuses on methodology, strategy, and anti-patterns.
Reading time: 90-120 minutes
- Why TDD? The Case for Test-First Development
- The TDD Cycle: Red-Green-Refactor
- Thinking Like a QA Engineer
- pytest Deep Dive
- unittest Advanced Patterns
- Mocking and Test Doubles
- Property-Based Testing with Hypothesis
- Test Design Strategies
- Writing Testable Code
- TDD Anti-Patterns
- Test Code Smells
- The Test Pyramid
- Coverage and Mutation Testing
- TDD Workflows
- TDD for Different Scenarios
- When NOT to Use TDD
- Interview Questions
- Quick Reference Cards
- Resources
Traditional development follows this pattern:
Write code → Manual testing → Write tests (maybe) → Move on
This approach has several problems:
| Problem | Consequence |
|---|---|
| Tests as an afterthought | Tests document what code does, not what it should do |
| Confirmation bias | You test to prove code works, not to find bugs |
| Untestable code | Code written without tests in mind is hard to test |
| Skipped tests | Under time pressure, tests are first to be cut |
| Regression blindness | Changes break things you don't realize |
TDD inverts the process:
Write failing test → Write minimal code → Test passes → Refactor → Repeat
Benefits of TDD:
| Benefit | Impact |
|---|---|
| Executable specifications | Tests document requirements, not just behavior |
| Design feedback | Hard-to-test code signals design problems |
| Confidence | Every change is verified against expectations |
| Regression safety | Refactoring with a safety net |
| Focus | Write only code that's needed (YAGNI) |
Studies show TDD's impact:
| Study | Finding |
|---|---|
| IBM Case Study | 40% fewer defects with 15-35% more initial time |
| Microsoft Research | 60-90% reduction in defect density |
| Quality Improvement Studies | Code written TDD-style has 40-80% fewer bugs |
The initial time investment pays off through reduced debugging and maintenance.
TDD follows a strict three-phase cycle:
┌─────────────────────────────────────────────┐
│ │
│ ┌───────┐ ┌───────┐ ┌──────────┐ │
│ │ │ │ │ │ │ │
│ │ RED │───►│ GREEN │───►│ REFACTOR │ │
│ │ │ │ │ │ │ │
│ └───────┘ └───────┘ └──────────┘ │
│ ▲ │ │
│ └──────────────────────────┘ │
│ │
└─────────────────────────────────────────────┘
Write a test for behavior that doesn't exist yet.
# test_email_validator.py
import pytest
from email_validator import validate_email, InvalidEmailError
def test_validate_email_rejects_empty_string():
"""Empty string should raise InvalidEmailError."""
with pytest.raises(InvalidEmailError):
validate_email("")Rules:
- Test must fail for the right reason (missing code, not syntax error)
- Test should be small and focused on one behavior
- Test name should describe the expected behavior
Run the test — it fails because the module doesn't exist:
$ pytest test_email_validator.py
ModuleNotFoundError: No module named 'email_validator'Write the minimum code to make the test pass.
# email_validator.py
class InvalidEmailError(Exception):
"""Raised when email validation fails."""
pass
def validate_email(email: str) -> str:
"""Validate an email address and return it if valid."""
if email == "":
raise InvalidEmailError("Email cannot be empty")
return emailRules:
- Write the simplest code that passes
- Don't add features the test doesn't require
- It's OK if the code is ugly — you'll fix it next
Run the test — it passes:
$ pytest test_email_validator.py
PASSEDNow improve the code while keeping tests green.
# email_validator.py (refactored)
class InvalidEmailError(Exception):
"""Raised when email validation fails."""
pass
def validate_email(email: str) -> str:
"""Validate an email address and return it if valid.
Args:
email: The email address to validate
Returns:
The validated email address
Raises:
InvalidEmailError: If the email is invalid
"""
if not email: # More Pythonic
raise InvalidEmailError("Email cannot be empty")
return emailRules:
- Tests must pass after every change
- Improve both production code AND test code
- Remove duplication, improve names, simplify logic
- Don't add new functionality — that requires a new Red phase
Let's build a complete email validator step by step:
Cycle 1: Empty email
# Red
def test_validate_email_rejects_empty_string():
with pytest.raises(InvalidEmailError):
validate_email("")
# Green
def validate_email(email: str) -> str:
if not email:
raise InvalidEmailError("Email cannot be empty")
return emailCycle 2: Missing @ symbol
# Red
def test_validate_email_requires_at_symbol():
with pytest.raises(InvalidEmailError):
validate_email("invalid.email")
# Green
def validate_email(email: str) -> str:
if not email:
raise InvalidEmailError("Email cannot be empty")
if "@" not in email:
raise InvalidEmailError("Email must contain @")
return emailCycle 3: Valid email passes
# Red (this should pass, so we verify the happy path)
def test_validate_email_accepts_valid_email():
result = validate_email("user@example.com")
assert result == "user@example.com"
# Green: Already passes! Move to refactor.Cycle 4: Local part cannot be empty
# Red
def test_validate_email_requires_local_part():
with pytest.raises(InvalidEmailError):
validate_email("@example.com")
# Green
def validate_email(email: str) -> str:
if not email:
raise InvalidEmailError("Email cannot be empty")
if "@" not in email:
raise InvalidEmailError("Email must contain @")
local, _, domain = email.partition("@")
if not local:
raise InvalidEmailError("Email must have local part")
return emailCycle 5: Domain cannot be empty
# Red
def test_validate_email_requires_domain():
with pytest.raises(InvalidEmailError):
validate_email("user@")
# Green
def validate_email(email: str) -> str:
if not email:
raise InvalidEmailError("Email cannot be empty")
if "@" not in email:
raise InvalidEmailError("Email must contain @")
local, _, domain = email.partition("@")
if not local:
raise InvalidEmailError("Email must have local part")
if not domain:
raise InvalidEmailError("Email must have domain")
return emailRefactor: Extract validation logic
# email_validator.py (final)
import re
class InvalidEmailError(Exception):
"""Raised when email validation fails."""
pass
# Simple email regex (not RFC 5322 compliant, but practical)
EMAIL_PATTERN = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")
def validate_email(email: str) -> str:
"""Validate an email address.
Args:
email: The email address to validate
Returns:
The validated email address (normalized to lowercase)
Raises:
InvalidEmailError: If the email is invalid
"""
if not email:
raise InvalidEmailError("Email cannot be empty")
if not EMAIL_PATTERN.match(email):
raise InvalidEmailError(f"Invalid email format: {email}")
return email.lower()The biggest difference between a developer and QA engineer is mindset:
| Developer Mindset | QA Mindset |
|---|---|
| "How do I make this work?" | "How can I break this?" |
| Tests prove code works | Tests find defects |
| Focus on happy path | Focus on edge cases |
| "It works on my machine" | "What about other environments?" |
Test at the edges of valid ranges:
def test_age_validation():
"""Test boundary values for age field (valid: 0-150)."""
# At boundaries
assert is_valid_age(0) # Minimum valid
assert is_valid_age(150) # Maximum valid
# Just outside boundaries
assert not is_valid_age(-1) # Below minimum
assert not is_valid_age(151) # Above maximum
# Just inside boundaries
assert is_valid_age(1) # One above minimum
assert is_valid_age(149) # One below maximumDivide inputs into groups that should behave the same:
def test_discount_tiers():
"""Test discount calculation for different purchase amounts."""
# Partition 1: No discount (< $100)
assert calculate_discount(50) == 0
assert calculate_discount(99.99) == 0
# Partition 2: 10% discount ($100-$499)
assert calculate_discount(100) == 10
assert calculate_discount(250) == 25
# Partition 3: 20% discount ($500+)
assert calculate_discount(500) == 100
assert calculate_discount(1000) == 200Use experience to anticipate common errors:
def test_string_processing_edge_cases():
"""Test with inputs that commonly cause bugs."""
process = string_processor
# Empty/null-like inputs
assert process("") == ""
assert process(" ") == "" # Whitespace only
# Special characters
assert process("hello\nworld") == "hello world" # Newlines
assert process("hello\tworld") == "hello world" # Tabs
assert process("hello\0world") == "hello world" # Null char
# Unicode
assert process("héllo") == "hello" # Accented chars
assert process("你好") == "" # Non-ASCII
# Very long strings
assert len(process("a" * 1_000_000)) <= 1000 # Truncation
# SQL injection attempt
assert "DROP TABLE" not in process("'; DROP TABLE users;--")Test behavior across state changes:
class TestShoppingCart:
"""Test cart state transitions."""
def test_empty_cart_to_has_items(self):
"""Adding item to empty cart."""
cart = ShoppingCart()
assert cart.is_empty
cart.add_item("SKU123", quantity=1)
assert not cart.is_empty
assert cart.item_count == 1
def test_remove_last_item_returns_to_empty(self):
"""Removing last item returns cart to empty state."""
cart = ShoppingCart()
cart.add_item("SKU123", quantity=1)
cart.remove_item("SKU123")
assert cart.is_empty
assert cart.item_count == 0
def test_checkout_clears_cart(self):
"""Successful checkout empties cart."""
cart = ShoppingCart()
cart.add_item("SKU123", quantity=2)
cart.checkout(payment_info={})
assert cart.is_empty
### The ZOMBIES Acronym
A useful mnemonic for edge cases:
| Letter | Meaning | Example |
|--------|---------|---------|
| **Z** | Zero | Empty list, zero quantity |
| **O** | One | Single element, first item |
| **M** | Many | Multiple items, large datasets |
| **B** | Boundary | Min/max values, edge of ranges |
| **I** | Interface | Public API, contracts |
| **E** | Exception | Error conditions, invalid input |
| **S** | Simple/Scenarios | Happy path, real-world use cases |
```python
class TestStack:
"""Test stack using ZOMBIES."""
# Z - Zero
def test_empty_stack_has_zero_size(self):
assert Stack().size() == 0
def test_pop_empty_stack_raises(self):
with pytest.raises(EmptyStackError):
Stack().pop()
# O - One
def test_push_one_item(self):
stack = Stack()
stack.push("item")
assert stack.size() == 1
assert stack.peek() == "item"
# M - Many
def test_push_multiple_items(self):
stack = Stack()
for i in range(100):
stack.push(i)
assert stack.size() == 100
assert stack.pop() == 99 # LIFO
# B - Boundary
def test_large_number_of_items(self):
stack = Stack(max_size=1000)
for i in range(1000):
stack.push(i)
with pytest.raises(StackOverflowError):
stack.push("overflow")
# E - Exception
def test_peek_empty_stack_raises(self):
with pytest.raises(EmptyStackError):
Stack().peek()import pytest
@pytest.fixture(scope="function") # Default: new instance per test
def user():
return User("test@example.com")
@pytest.fixture(scope="class") # Shared within test class
def database():
db = Database()
yield db
db.close()
@pytest.fixture(scope="module") # Shared within test module
def api_client():
return APIClient()
@pytest.fixture(scope="session") # Shared across entire test session
def config():
return load_config()@pytest.fixture
def make_user():
"""Factory fixture for creating users with custom attributes."""
created = []
def _make_user(email="test@example.com", name="Test User", **kwargs):
user = User(email=email, name=name, **kwargs)
created.append(user)
return user
yield _make_user
# Cleanup
for user in created:
user.delete()
def test_user_creation(make_user):
user1 = make_user(email="alice@example.com")
user2 = make_user(email="bob@example.com", role="admin")
assert user1.email != user2.email
assert user2.role == "admin"@pytest.fixture
def database():
db = Database()
yield db
db.close()
@pytest.fixture
def user_repository(database):
"""Depends on database fixture."""
return UserRepository(database)
@pytest.fixture
def authenticated_user(user_repository, make_user):
"""Depends on multiple fixtures."""
user = make_user()
user_repository.save(user)
return user
def test_user_profile(authenticated_user, user_repository):
"""Uses composed fixtures."""
profile = user_repository.get_profile(authenticated_user.id)
assert profile is not None@pytest.fixture(params=["sqlite", "postgres", "mysql"])
def database(request):
"""Run tests against multiple database backends."""
db_type = request.param
db = create_database(db_type)
yield db
db.close()
def test_user_save(database):
"""This test runs 3 times - once per database type."""
user = User("test@example.com")
database.save(user)
assert database.get(user.id) == userimport pytest
import sys
# Skip with condition
@pytest.mark.skipif(sys.version_info < (3, 10), reason="Requires Python 3.10+")
def test_pattern_matching():
pass
# Expected failure
@pytest.mark.xfail(reason="Known bug, fix pending")
def test_known_broken_feature():
pass
# Custom markers
@pytest.mark.slow
def test_large_dataset():
pass
@pytest.mark.integration
def test_external_api():
pass
# Run specific markers:
# pytest -m slow
# pytest -m "not slow"
# pytest -m "slow and integration"@pytest.mark.parametrize("email,expected_valid", [
pytest.param("user@example.com", True, id="valid_email"),
pytest.param("", False, id="empty_string"),
pytest.param("no-at-symbol", False, id="missing_at"),
pytest.param("@no-local.com", False, id="missing_local_part"),
pytest.param("user@", False, id="missing_domain"),
])
def test_email_validation(email, expected_valid):
result = is_valid_email(email)
assert result == expected_valid# conftest.py - shared fixtures and hooks
import pytest
from pathlib import Path
# Fixtures available to all tests in this directory and subdirectories
@pytest.fixture
def temp_dir(tmp_path):
"""Create a temporary directory with test files."""
test_file = tmp_path / "test.txt"
test_file.write_text("test content")
return tmp_path
@pytest.fixture(autouse=True)
def reset_singleton():
"""Automatically reset singleton before each test."""
MySingleton._instance = None
yield
MySingleton._instance = None
# Hooks
def pytest_configure(config):
"""Register custom markers."""
config.addinivalue_line("markers", "slow: marks tests as slow")
config.addinivalue_line("markers", "integration: marks integration tests")
def pytest_collection_modifyitems(config, items):
"""Skip slow tests unless --slow flag is passed."""
if not config.getoption("--slow"):
skip_slow = pytest.mark.skip(reason="need --slow option to run")
for item in items:
if "slow" in item.keywords:
item.add_marker(skip_slow)
def pytest_addoption(parser):
"""Add custom command line options."""
parser.addoption("--slow", action="store_true", help="run slow tests")import unittest
class TestEmailValidator(unittest.TestCase):
def test_empty_email_raises_with_message(self):
"""Verify exception message content."""
with self.assertRaises(ValueError) as context:
validate_email("")
self.assertIn("cannot be empty", str(context.exception))
def test_invalid_email_raises_specific_error(self):
"""Verify exception type and message."""
with self.assertRaisesRegex(ValueError, r"Invalid email.*@"):
validate_email("no-at-symbol")import unittest
import logging
class TestLogging(unittest.TestCase):
def test_warning_logged_on_retry(self):
"""Verify warning is logged when operation retries."""
with self.assertLogs('myapp.retry', level='WARNING') as logs:
retry_operation()
self.assertEqual(len(logs.output), 1)
self.assertIn("Retrying", logs.output[0])
def test_debug_logging_captures_details(self):
"""Verify debug information is logged."""
with self.assertLogs('myapp', level='DEBUG') as logs:
process_request({"id": 123})
# Find the specific log entry
debug_logs = [log for log in logs.output if 'DEBUG' in log]
self.assertTrue(any("request_id=123" in log for log in debug_logs))import unittest
class TestMathOperations(unittest.TestCase):
def test_square_multiple_values(self):
"""Test square function with multiple inputs."""
test_cases = [
(0, 0),
(1, 1),
(2, 4),
(3, 9),
(-2, 4),
(0.5, 0.25),
]
for value, expected in test_cases:
with self.subTest(value=value):
result = square(value)
self.assertEqual(result, expected)| Type | Purpose | Example Use |
|---|---|---|
| Dummy | Fill parameter lists | Unused logger passed to constructor |
| Stub | Provide canned answers | Return fixed data from API |
| Fake | Working implementation (simplified) | In-memory database |
| Mock | Verify interactions | Check that email was "sent" |
| Spy | Record interactions | Track method calls for later assertion |
from unittest.mock import Mock, MagicMock, patch, call
# Basic Mock
mock = Mock()
mock.method.return_value = 42
result = mock.method() # 42
# Verify calls
mock.method.assert_called_once()
mock.method.assert_called_with()
mock.method.assert_not_called()
# Call history
mock.method("arg1", key="value")
mock.method.call_args # call('arg1', key='value')
mock.method.call_count # 1
# Side effects
mock.method.side_effect = ValueError("Error!")
# mock.method() # Raises ValueError
mock.method.side_effect = [1, 2, 3]
mock.method() # 1
mock.method() # 2
mock.method() # 3# myapp/service.py
from myapp.client import APIClient
def fetch_user(user_id):
client = APIClient()
return client.get(f"/users/{user_id}")
# test_service.py
from unittest.mock import patch
# WRONG: Patching where it's defined
@patch('myapp.client.APIClient')
def test_fetch_user_wrong(mock_client):
# This doesn't work because service.py already imported APIClient
pass
# CORRECT: Patch where it's used (looked up)
@patch('myapp.service.APIClient')
def test_fetch_user_correct(mock_client_class):
# This works!
mock_client = mock_client_class.return_value
mock_client.get.return_value = {"id": 1, "name": "Alice"}
result = fetch_user(1)
assert result["name"] == "Alice"
mock_client.get.assert_called_once_with("/users/1")def test_with_mocker(mocker):
"""pytest-mock provides a cleaner interface."""
# Patch
mock_client = mocker.patch('myapp.service.APIClient')
mock_client.return_value.get.return_value = {"id": 1}
# Spy (track calls to real object)
spy = mocker.spy(real_object, 'method')
real_object.method("arg")
spy.assert_called_once_with("arg")
# Stub attribute
mocker.patch.object(obj, 'attribute', new="value")def test_with_monkeypatch(monkeypatch):
"""monkeypatch is good for environment and simple mocks."""
# Environment variables
monkeypatch.setenv("API_KEY", "test-key")
monkeypatch.delenv("DEBUG", raising=False)
# Attributes
monkeypatch.setattr(module, "CONSTANT", 999)
# Dictionary items
monkeypatch.setitem(config_dict, "timeout", 30)
# Sys.path
monkeypatch.syspath_prepend("/custom/path")# DON'T mock data structures
# BAD
def test_with_mock_list(mocker):
mock_list = mocker.Mock()
mock_list.__len__ = mocker.Mock(return_value=3)
# This is testing your mock, not your code!
# GOOD
def test_with_real_list():
real_list = [1, 2, 3]
assert len(real_list) == 3
# DON'T mock the system under test
# BAD
def test_calculator_mocked(mocker):
calc = Calculator()
mocker.patch.object(calc, 'add', return_value=5)
assert calc.add(2, 3) == 5 # Testing the mock!
# GOOD
def test_calculator():
calc = Calculator()
assert calc.add(2, 3) == 5 # Testing the calculator
# DO mock external services
# GOOD
def test_weather_service(mocker):
mocker.patch('app.weather_api.fetch', return_value={"temp": 72})
weather = get_weather("NYC")
assert weather["temp"] == 72Property-based testing generates random inputs to find edge cases you wouldn't think of.
pip install hypothesisfrom hypothesis import given, strategies as st
@given(st.integers())
def test_integer_round_trip(n):
"""int -> str -> int should be identity."""
assert int(str(n)) == n
@given(st.text())
def test_reverse_twice_is_identity(s):
"""Reversing a string twice gives the original."""
assert s[::-1][::-1] == s
@given(st.lists(st.integers()))
def test_sorted_list_is_sorted(lst):
"""sorted() should produce a sorted list."""
result = sorted(lst)
assert all(result[i] <= result[i+1] for i in range(len(result)-1))from hypothesis import strategies as st
# Basic types
st.integers()
st.integers(min_value=0, max_value=100)
st.floats(allow_nan=False)
st.text()
st.text(min_size=1, max_size=50)
st.booleans()
st.none()
# Collections
st.lists(st.integers())
st.lists(st.integers(), min_size=1, max_size=10)
st.sets(st.text())
st.dictionaries(st.text(), st.integers())
st.tuples(st.integers(), st.text())
# Combining strategies
st.one_of(st.integers(), st.text())
st.integers() | st.none() # Same as one_of
# Custom objects
@st.composite
def user_strategy(draw):
"""Generate random User objects."""
email = draw(st.emails())
name = draw(st.text(min_size=1, max_size=50))
age = draw(st.integers(min_value=0, max_value=150))
return User(email=email, name=name, age=age)
@given(user_strategy())
def test_user_serialization(user):
"""Users should survive JSON round-trip."""
json_str = user.to_json()
restored = User.from_json(json_str)
assert restored == userfrom hypothesis import given, strategies as st, assume
# Property 1: Inverse operations
@given(st.text())
def test_encode_decode_roundtrip(s):
"""Encoding then decoding should return original."""
encoded = encode(s)
decoded = decode(encoded)
assert decoded == s
# Property 2: Invariants
@given(st.lists(st.integers()))
def test_sort_preserves_length(lst):
"""Sorting shouldn't change list length."""
assert len(sorted(lst)) == len(lst)
@given(st.lists(st.integers()))
def test_sort_preserves_elements(lst):
"""Sorting shouldn't add or remove elements."""
from collections import Counter
assert Counter(sorted(lst)) == Counter(lst)
# Property 3: Idempotence
@given(st.text())
def test_normalize_is_idempotent(s):
"""Normalizing twice should equal normalizing once."""
assert normalize(normalize(s)) == normalize(s)
# Using assume() to filter inputs
@given(st.integers(), st.integers())
def test_division_inverse(a, b):
"""(a / b) * b should equal a."""
assume(b != 0) # Skip when b is zero
result = (a / b) * b
assert abs(result - a) < 1e-9 # Float comparisonHypothesis automatically shrinks failing examples to minimal cases:
@given(st.lists(st.integers()))
def test_sum_is_positive(lst):
"""Intentionally wrong property for demonstration."""
assert sum(lst) >= 0
# Hypothesis might find: [1, -2]
# Then shrink to: [-1]Test at the boundaries of input domains:
@pytest.mark.parametrize("age,expected_valid", [
# Boundary: minimum age (0)
(-1, False), # Below minimum
(0, True), # At minimum
(1, True), # Above minimum
# Boundary: maximum age (150)
(149, True), # Below maximum
(150, True), # At maximum
(151, False), # Above maximum
])
def test_age_validation(age, expected_valid):
result = is_valid_age(age)
assert result == expected_validGroup inputs that should behave identically:
class TestUsernameValidation:
"""Test username validation using equivalence partitions."""
# Partition 1: Valid usernames (5-20 chars, alphanumeric + underscore)
@pytest.mark.parametrize("username", [
"alice", # Minimum length
"alice_smith", # Contains underscore
"user123", # Contains numbers
"a" * 20, # Maximum length
])
def test_valid_usernames(self, username):
assert is_valid_username(username)
# Partition 2: Too short (< 5 chars)
@pytest.mark.parametrize("username", ["", "a", "ab", "abc", "abcd"])
def test_too_short_usernames(self, username):
assert not is_valid_username(username)
# Partition 3: Too long (> 20 chars)
@pytest.mark.parametrize("username", ["a" * 21, "a" * 100])
def test_too_long_usernames(self, username):
assert not is_valid_username(username)
# Partition 4: Invalid characters
@pytest.mark.parametrize("username", [
"alice!", # Special char
"alice smith", # Space
"alice@email", # At symbol
"alice-name", # Hyphen
])
def test_invalid_character_usernames(self, username):
assert not is_valid_username(username)For complex business rules:
# Decision table for shipping cost:
# | Order > $100 | Member | Free Shipping |
# |--------------|--------|---------------|
# | Yes | Yes | Yes |
# | Yes | No | Yes |
# | No | Yes | Yes |
# | No | No | No |
@pytest.mark.parametrize("order_total,is_member,expected_free", [
(150, True, True), # Over $100, member
(150, False, True), # Over $100, non-member
(50, True, True), # Under $100, member
(50, False, False), # Under $100, non-member
(100, True, True), # Exactly $100, member (boundary)
(100, False, True), # Exactly $100, non-member (boundary)
])
def test_free_shipping_rules(order_total, is_member, expected_free):
shipping = calculate_shipping(order_total, is_member)
assert (shipping == 0) == expected_freeFunctions without side effects are easiest to test:
# HARD TO TEST: Side effects
def process_order_hard(order_id):
order = database.get(order_id) # Database call
send_email(order.user) # Email side effect
update_inventory(order.items) # More side effects
return order.total
# EASY TO TEST: Pure function
def calculate_order_total(items: list[Item], discount: float) -> float:
"""Pure function: no side effects, deterministic."""
subtotal = sum(item.price * item.quantity for item in items)
return subtotal * (1 - discount)
def test_calculate_order_total():
items = [Item(price=10.0, quantity=2), Item(price=5.0, quantity=3)]
total = calculate_order_total(items, discount=0.1)
assert total == 31.5 # (20 + 15) * 0.9Inject dependencies instead of creating them:
# HARD TO TEST: Creates own dependency
class OrderService:
def __init__(self):
self.db = PostgresDatabase() # Hard-coded dependency
self.email = SMTPEmailService()
def process(self, order_id):
order = self.db.get(order_id)
self.email.send(order.user, "Order confirmed")
return order
# EASY TO TEST: Dependencies injected
class OrderService:
def __init__(self, db: Database, email: EmailService):
self.db = db
self.email = email
def process(self, order_id):
order = self.db.get(order_id)
self.email.send(order.user, "Order confirmed")
return order
def test_order_processing():
fake_db = FakeDatabase()
fake_db.save(Order(id=1, user="alice@example.com", total=100))
fake_email = FakeEmailService()
service = OrderService(db=fake_db, email=fake_email)
result = service.process(1)
assert result.total == 100
assert fake_email.sent_to == ["alice@example.com"]Separate pure logic from side effects:
# Pure core: All business logic, no I/O
def calculate_pricing(items: list[Item], rules: PricingRules) -> PricingResult:
"""Pure function with all business logic."""
subtotal = sum(item.price * item.quantity for item in items)
discount = rules.calculate_discount(subtotal)
tax = rules.calculate_tax(subtotal - discount)
return PricingResult(
subtotal=subtotal,
discount=discount,
tax=tax,
total=subtotal - discount + tax
)
# Imperative shell: I/O at edges
def checkout(cart_id: str) -> Receipt:
"""Thin shell that handles I/O."""
# Input
cart = database.get_cart(cart_id)
rules = config.get_pricing_rules()
# Pure calculation
pricing = calculate_pricing(cart.items, rules)
# Output
receipt = create_receipt(cart, pricing)
database.save_receipt(receipt)
email_service.send_receipt(cart.user_email, receipt)
return receipt
# Testing is easy!
def test_pricing_calculation():
items = [Item("SKU1", price=10, quantity=2)]
rules = PricingRules(discount_threshold=100, tax_rate=0.1)
result = calculate_pricing(items, rules)
assert result.subtotal == 20
assert result.discount == 0
assert result.tax == 2
assert result.total == 22Tests that pass but don't verify anything meaningful:
# BAD: The Liar - always passes
def test_user_created():
user = User("alice@example.com")
assert user # Truthy check, not meaningful
assert True # Always passes!
# GOOD: Meaningful assertions
def test_user_created():
user = User("alice@example.com")
assert user.email == "alice@example.com"
assert user.created_at is not None
assert user.id is not NoneTests that test too much:
# BAD: The Giant - tests everything
def test_user_workflow():
# Registration
user = register("alice@example.com", "password123")
assert user.email == "alice@example.com"
# Login
token = login("alice@example.com", "password123")
assert token is not None
# Update profile
update_profile(user.id, name="Alice")
assert get_user(user.id).name == "Alice"
# Password change
change_password(user.id, "password123", "newpassword")
assert login("alice@example.com", "newpassword")
# Deletion
delete_user(user.id)
assert get_user(user.id) is None
# GOOD: Focused tests
def test_user_registration():
user = register("alice@example.com", "password123")
assert user.email == "alice@example.com"
def test_user_login():
user = register("alice@example.com", "password123")
token = login("alice@example.com", "password123")
assert token is not None
def test_profile_update():
user = register("alice@example.com", "password123")
update_profile(user.id, name="Alice")
assert get_user(user.id).name == "Alice"Tests with too much arrangement:
# BAD: Excessive setup
def test_order_discount():
# 20 lines of setup...
company = Company(name="Acme")
department = Department(company=company, name="Sales")
employee = Employee(department=department, role="Manager")
user = User(employee=employee, email="test@example.com")
account = Account(user=user, type="Premium")
cart = Cart(account=account)
product1 = Product(name="Widget", price=100)
product2 = Product(name="Gadget", price=200)
cart.add(product1)
cart.add(product2)
# ...finally the actual test
discount = calculate_discount(cart)
assert discount == 30
# GOOD: Use fixtures and factories
def test_order_discount(premium_cart_with_items):
discount = calculate_discount(premium_cart_with_items)
assert discount == 30
@pytest.fixture
def premium_cart_with_items(make_cart, make_product):
cart = make_cart(account_type="Premium")
cart.add(make_product(price=100))
cart.add(make_product(price=200))
return cartTests that take too long:
# BAD: Slow test with real delays
def test_rate_limiting():
for i in range(100):
response = make_request()
time.sleep(1) # Real delay!
assert response.status_code == 429
# GOOD: Mock the clock
def test_rate_limiting(mocker):
mock_time = mocker.patch('time.time')
mock_time.side_effect = range(0, 200, 2) # Simulated time
for i in range(100):
response = make_request()
assert response.status_code == 429# BAD: Tests implementation
def test_user_stored_in_dict(mocker):
service = UserService()
spy = mocker.spy(service, '_users')
service.add_user(User("alice@example.com"))
# Fragile: breaks if internal storage changes
assert "alice@example.com" in service._users
# GOOD: Tests behavior
def test_user_can_be_retrieved():
service = UserService()
service.add_user(User("alice@example.com"))
user = service.get_user("alice@example.com")
assert user.email == "alice@example.com"# BAD: Testing just for coverage
def test_property_getter():
user = User(name="Alice")
assert user.name == "Alice" # Tests Python, not your code
def test_trivial_delegation():
service = UserService(repo)
service.get_user(1)
repo.get.assert_called_with(1) # Just verifies wiring
# GOOD: Focus on meaningful behavior
def test_user_not_found_raises():
service = UserService(EmptyRepository())
with pytest.raises(UserNotFoundError):
service.get_user(1)
def test_user_email_validation():
with pytest.raises(ValueError):
User(name="Alice", email="invalid")# BAD: Conditional logic
def test_processing(data_type):
if data_type == "json":
result = process(json_data)
assert result.format == "json"
elif data_type == "xml":
result = process(xml_data)
assert result.format == "xml"
# GOOD: Parametrize instead
@pytest.mark.parametrize("data,expected_format", [
(json_data, "json"),
(xml_data, "xml"),
])
def test_processing(data, expected_format):
result = process(data)
assert result.format == expected_format# BAD: Magic numbers
def test_discount():
result = calculate_discount(150)
assert result == 15 # Why 15? Why 150?
# GOOD: Named constants
def test_discount_for_orders_over_threshold():
PREMIUM_THRESHOLD = 100
ORDER_TOTAL = 150
EXPECTED_DISCOUNT = 15 # 10% of 150
result = calculate_discount(ORDER_TOTAL)
assert result == EXPECTED_DISCOUNT# BAD: Time-dependent flaky test
def test_created_today():
user = User("alice@example.com")
assert user.created_at.date() == date.today() # Fails at midnight!
# GOOD: Freeze time
def test_created_today(freezer):
freezer.move_to("2025-01-15 12:00:00")
user = User("alice@example.com")
assert user.created_at == datetime(2025, 1, 15, 12, 0, 0)
# BAD: Order-dependent test
def test_user_count():
assert User.count() == 5 # Depends on other tests
# GOOD: Isolated test
def test_user_count(empty_database):
User.create("alice@example.com")
User.create("bob@example.com")
assert User.count() == 2 /\
/ \
/ E2E\ Few, slow, expensive
/ \
/--------\
/ \
/Integration \ Some, moderate
/ \
/----------------\
/ \
/ Unit \ Many, fast, cheap
/ \
/------------------------\
| Level | Percentage | Characteristics |
|---|---|---|
| Unit | 70% | Fast, isolated, no I/O |
| Integration | 20% | Test module boundaries |
| E2E | 10% | Full system, slow, brittle |
/------------------------\
/ \
/ E2E (many) \ WRONG!
/ \
/--------------------------------\
/ \
/ Integration (some) \
/ \
/----------------------------------------\
Unit (few)
This inverts the pyramid: too many slow, brittle E2E tests and too few fast unit tests.
def categorize(value):
if value > 0:
return "positive"
elif value < 0:
return "negative"
else:
return "zero"
# Line coverage: 100% with these tests
def test_positive():
assert categorize(1) == "positive"
def test_negative():
assert categorize(-1) == "negative"
def test_zero():
assert categorize(0) == "zero"
# Branch coverage considers all paths through conditionals
# Both branches of each if/else are tested above# Run with coverage
pytest --cov=myapp tests/
# Generate HTML report
pytest --cov=myapp --cov-report=html tests/
# Fail if coverage below threshold
pytest --cov=myapp --cov-fail-under=80 tests/High coverage doesn't mean good tests:
# 100% coverage but useless
def test_square():
square(4) # No assertion!
# 50% coverage but valuable
def test_square_positive():
assert square(4) == 16
def test_square_negative():
assert square(-4) == 16Mutation testing checks test quality by introducing bugs:
# Original code
def is_adult(age):
return age >= 18
# Mutant 1: Change >= to >
def is_adult_mutant1(age):
return age > 18 # Boundary error
# Mutant 2: Change 18 to 17
def is_adult_mutant2(age):
return age >= 17 # Off by one
# Good tests kill mutants:
def test_is_adult_at_boundary():
assert is_adult(18) == True # Kills mutant1
assert is_adult(17) == False # Kills mutant2Tools: mutmut, cosmic-ray
- Write acceptance test (high-level, may be skipped initially)
- Red: Write failing unit test for first behavior
- Green: Implement minimal code
- Refactor: Clean up
- Repeat steps 2-4 for each behavior
- Integrate: Run acceptance test
- Write failing test that reproduces the bug
- Verify the test fails for the right reason
- Fix the bug with minimal change
- Verify the test passes
- Refactor if needed
- Add related edge case tests
# Bug report: "Discount not applied for exactly $100 orders"
# Step 1: Write failing test
def test_discount_applied_at_exact_threshold():
"""Regression test for bug #1234."""
order = Order(total=100)
discount = calculate_discount(order)
assert discount == 10 # 10% discount at $100
# Step 2: Verify it fails
# pytest shows: AssertionError: assert 0 == 10
# Step 3: Find and fix the bug
# Original: if order.total > 100: # Bug: > instead of >=
# Fixed: if order.total >= 100:
# Step 4: Verify it passes
# pytest shows: PASSED
# Step 5: Add related tests
def test_discount_just_below_threshold():
order = Order(total=99.99)
discount = calculate_discount(order)
assert discount == 0 # No discount below $100- Identify the change point
- Write characterization test (tests current behavior)
- Find seams for testing
- Break dependencies carefully
- Write tests for new behavior
- Make changes with safety net
import pytest
from fastapi.testclient import TestClient
from myapp import app
@pytest.fixture
def client():
return TestClient(app)
def test_create_user(client):
response = client.post("/users", json={
"email": "alice@example.com",
"name": "Alice"
})
assert response.status_code == 201
data = response.json()
assert data["email"] == "alice@example.com"
assert "id" in data
def test_create_user_invalid_email(client):
response = client.post("/users", json={
"email": "not-an-email",
"name": "Alice"
})
assert response.status_code == 422 # Validation errorimport pytest
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
@pytest.fixture
def db_session():
"""Create a test database session with rollback."""
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
session = Session()
yield session
session.rollback()
session.close()
def test_user_repository_save(db_session):
repo = UserRepository(db_session)
user = User(email="alice@example.com", name="Alice")
repo.save(user)
saved = repo.get_by_email("alice@example.com")
assert saved.name == "Alice"import pytest
import asyncio
@pytest.mark.asyncio
async def test_async_fetch():
result = await fetch_data("https://api.example.com/data")
assert result is not None
@pytest.mark.asyncio
async def test_async_with_mock(mocker):
mock_fetch = mocker.patch("myapp.client.aiohttp.get")
mock_fetch.return_value.__aenter__.return_value.json = asyncio.coroutine(
lambda: {"data": "test"}
)
result = await fetch_data("https://api.example.com/data")
assert result["data"] == "test"from click.testing import CliRunner
from myapp.cli import main
def test_cli_help():
runner = CliRunner()
result = runner.invoke(main, ["--help"])
assert result.exit_code == 0
assert "Usage:" in result.output
def test_cli_process_file(tmp_path):
runner = CliRunner()
input_file = tmp_path / "input.txt"
input_file.write_text("test data")
result = runner.invoke(main, ["process", str(input_file)])
assert result.exit_code == 0
assert "Processed" in result.outputTDD isn't always the best approach:
| Scenario | Why TDD Might Not Fit |
|---|---|
| Exploratory prototypes | Design is unknown; you're learning |
| Throwaway scripts | One-time use; not worth testing |
| UI/visual code | Hard to test automatically |
| Spikes/research | Goal is learning, not production code |
| Legacy code first contact | Characterization tests first |
- Spike and stabilize: Explore first, then add tests
- Test-after for prototypes: Write tests before committing
- Characterization tests: Document existing behavior before changes
Answer: TDD is a development methodology where you write a failing test before writing the code to make it pass. The cycle is:
- Red: Write a failing test
- Green: Write minimal code to pass
- Refactor: Improve code while keeping tests green
TDD leads to better-designed, more testable code with comprehensive test coverage.
Answer:
- Stub: Returns canned data. Used when you need to control inputs.
stub_api.get_user.return_value = {"id": 1, "name": "Alice"}
- Mock: Verifies interactions. Used when you need to check calls.
mock_email.send_email("alice@example.com", "Welcome!") mock_email.send_email.assert_called_once()
Answer: Property-based testing generates random inputs to verify properties that should always hold, rather than testing specific examples. For example:
@given(st.lists(st.integers()))
def test_sorted_list_is_sorted(lst):
result = sorted(lst)
assert all(result[i] <= result[i+1] for i in range(len(result)-1))This finds edge cases you wouldn't think to test manually.
Answer: The test pyramid recommends:
- 70% unit tests: Fast, isolated, many
- 20% integration tests: Test boundaries, moderate
- 10% E2E tests: Full system, few
The inverted pyramid (ice cream cone) with many E2E tests is an anti-pattern because E2E tests are slow and brittle.
Answer: Use test doubles:
- Mock/stub the external call
- Use dependency injection to swap implementations
- Create a fake (in-memory implementation)
def test_weather_service(mocker):
mocker.patch('app.weather_api.get_weather', return_value={"temp": 72})
result = get_weather_report("NYC")
assert result["temperature"] == 72Answer:
- unittest setUp: Runs before every test method in class
- pytest fixtures: More flexible scopes (function, class, module, session), composable, and can be parametrized
# Fixture with scope and composition
@pytest.fixture(scope="module")
def database():
db = connect()
yield db
db.close()
@pytest.fixture
def user(database): # Composed with database fixture
return database.create_user()Answer: A flaky test passes sometimes and fails other times without code changes. Common causes and fixes:
- Time-dependent: Use time freezing libraries
- Order-dependent: Ensure test isolation
- Concurrency: Use proper synchronization
- External services: Mock external calls
Answer: Mutation testing checks test quality by introducing small bugs (mutants) into code and checking if tests catch them. If a test suite kills all mutants, it's effective. If mutants survive, tests may be weak.
Answer: Python imports create references at import time. If you patch where a function is defined instead of where it's used, the reference in the using module still points to the original.
# myapp/service.py imports from myapp/client.py
# Patch in service.py, not client.py:
@patch('myapp.service.APIClient') # Correct
@patch('myapp.client.APIClient') # WrongAnswer:
- Global state and singletons
- Hard-coded dependencies instead of injection
- Side effects mixed with logic
- Tight coupling between components
- Long methods doing many things
Solution: Use dependency injection, separate pure logic from I/O, and follow SOLID principles.
1. RED: Write failing test (test doesn't compile? That counts!)
2. GREEN: Write minimal code to pass (ugly is OK)
3. REFACTOR: Clean up (tests must stay green)
| Need | Use |
|---|---|
| Canned return value | Stub |
| Verify method called | Mock |
| Track all calls | Spy |
| Working fake implementation | Fake |
| Fill unused parameter | Dummy |
# Return value
mock.method.return_value = 42
# Raise exception
mock.method.side_effect = ValueError("error")
# Multiple returns
mock.method.side_effect = [1, 2, 3]
# Verify called
mock.method.assert_called()
mock.method.assert_called_once()
mock.method.assert_called_with("arg")
mock.method.assert_not_called()
# Call info
mock.method.call_count
mock.method.call_args
mock.method.call_args_list- Test name describes expected behavior
- Test is focused on one thing
- Test would fail without the code
- Test doesn't test implementation details
- Test is deterministic (not flaky)
- Test is fast (< 1 second)
- Test is isolated (no shared state)
- Test has clear arrange/act/assert
- Test-Driven Development: By Example — Kent Beck
- Growing Object-Oriented Software, Guided by Tests — Freeman & Pryce
- Modern TDD in Python — TestDriven.io
- Hypothesis documentation
- Property-Based Testing with Python
- OOPSLA 2025: Evaluation of Property-Based Testing
- pytest-mock — pytest plugin for mocking
- pytest-cov — Coverage plugin
- freezegun — Time freezing
- mutmut — Mutation testing