MultiPeriodDiD: full event-study specification with pre-period coefficients by igerber · Pull Request #125 · igerber/diff-diff

igerber · 2026-02-01T17:38:26Z

Summary

BREAKING: Transform MultiPeriodDiD from a post-period-only estimator into a proper event-study estimator with treatment × period interactions for ALL non-reference periods (pre and post)
Pre-period coefficients enable parallel trends assessment directly from the estimation output
Default reference period changed to last pre-period (e=-1 convention) with FutureWarning for one release cycle
New unit parameter for staggered adoption detection warning
Bug fix: HonestDiD/PreTrendsPower VCV extraction now uses interaction sub-VCV instead of full regression VCV

Changes

Core estimator (`diff_diff/estimators.py`)

Treatment × period interactions created for ALL non-reference periods (pre + post)
Default reference_period changed from first to last pre-period with FutureWarning
Validation warning when reference_period is set to a post-treatment period
unit parameter added to fit() for staggered adoption detection
interaction_indices (period → column index) stored on results for robust VCV extraction
t_stat guard uses np.isfinite(se) and se > 0 (consistent with other estimators)

Results (`diff_diff/results.py`)

New reference_period and interaction_indices fields on MultiPeriodDiDResults
pre_period_effects and post_period_effects convenience properties
summary() shows pre-period section with [ref] indicator for reference period
to_dataframe() includes is_post column
get_effect() gives informative error when accessing reference period

HonestDiD / PreTrendsPower VCV fix

_extract_event_study_params now extracts interaction sub-VCV using stored interaction_indices
Eliminates fragile string matching on coefficient names
Ensures correct sensitivity analysis bounds

Visualization (`diff_diff/visualization.py`)

Default periods now include all estimated periods (pre + post), not just post
Auto-detects reference_period from results object

Tests

14 new tests in TestMultiPeriodDiDEventStudy class
Updated existing tests for new behavior (reference period, pre-period effects)
HonestDiD sub-VCV extraction integration test
All 252 tests pass, 0 failures

Test plan

pytest tests/test_estimators.py -k "MultiPeriod" — 47 tests pass
pytest tests/test_honest_did.py — 52 tests pass
pytest tests/test_pretrends.py — 63 tests pass
pytest tests/test_visualization.py — 18 tests pass
Full suite: 252 passed, 22 skipped, 0 failed
No FutureWarning leaks (verified with -W error::FutureWarning)
ruff check and black --check clean

🤖 Generated with Claude Code

…cients BREAKING: Transform MultiPeriodDiD from a post-period-only estimator into a proper event-study estimator with pre-period coefficients for parallel trends assessment. Core changes: - Treatment × period interactions for ALL non-reference periods (pre and post) - Default reference period is last pre-period (e=-1 convention) with FutureWarning - New `unit` parameter for staggered adoption detection warning - `period_effects` now contains both pre and post period effects - `summary()` shows pre-period section with reference period indicator - `to_dataframe()` includes `is_post` column - `interaction_indices` stored on results for robust sub-VCV extraction Bug fix: - HonestDiD/PreTrendsPower VCV extraction uses interaction sub-VCV instead of full regression VCV (via stored interaction_indices) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

github-actions · 2026-02-01T17:45:46Z

Overall assessment: ⚠️ Needs changes

Executive Summary

MultiPeriodDiD’s new full event-study spec is a good direction, but REGISTRY-mandated assumption checks (≥2 pre-periods, absorbing/time-invariant treatment) are not enforced, so the estimator can run in invalid settings. docs/methodology/REGISTRY.md:96 diff_diff/estimators.py:791
Allowing reference_period to be post-treatment currently only warns, but downstream logic assumes all post periods are estimated, which can miscompute avg_att/SE or crash summary. diff_diff/estimators.py:859 diff_diff/estimators.py:1021 diff_diff/results.py:414
The staggered-adoption warning can false-positive on unbalanced panels because it uses the first observed treated period per unit. diff_diff/estimators.py:799

Methodology

Severity: P1. Impact: REGISTRY now requires ≥2 pre-periods and an absorbing/time-invariant treatment indicator; code only checks binary treatment and ≥1 pre-period, so parallel-trends assessment can be invalid and treatment-misclassified panels can pass silently. Concrete fix: enforce len(pre_periods) >= 2 (or warn + disable pre-trend outputs), and when unit is provided verify treatment is constant within unit and monotonic/absorbing; consider requiring unit for these checks or explicitly document that treatment must be a time‑invariant group indicator. docs/methodology/REGISTRY.md:96 docs/methodology/REGISTRY.md:102 diff_diff/estimators.py:791 diff_diff/estimators.py:829
Severity: P1. Impact: If reference_period is in post_periods (allowed with a warning), that period is omitted from period_effects, but avg_att/SE uses len(post_periods) and summary() indexes all post_periods, leading to biased averages or KeyError. Concrete fix: either disallow post-period references (raise ValueError) or consistently exclude the reference from post_periods in downstream computations and include an explicit zero effect for averaging/summary; use len(post_effect_values) for the variance denominator. diff_diff/estimators.py:859 diff_diff/estimators.py:1021 diff_diff/results.py:414

Code Quality

Severity: P2. Impact: Staggered-adoption detection uses the minimum treated time among observed treated rows; in unbalanced panels with missing early periods this can spuriously warn even under simultaneous adoption. Concrete fix: compute adoption timing relative to each unit’s first observed period or only warn when within-unit treatment timing truly differs after accounting for missingness; consider adding a balanced‑panel check. diff_diff/estimators.py:799

Performance

No issues found.

Maintainability

No issues found.

Tech Debt

No issues found.

Security

No issues found.

Documentation/Tests

Severity: P2. Impact: REGISTRY’s “Standard errors” section says MultiPeriodDiD defaults to cluster-robust, but the cross-reference table (and code) indicate HC1; the registry is internally inconsistent. Concrete fix: align the “Standard errors” bullet with actual defaults (HC1) or change code defaults to match. docs/methodology/REGISTRY.md:149 docs/methodology/REGISTRY.md:1038
Severity: P2. Impact: Tests cover the warning for reference_period in post, but not the downstream behavior (summary/avg_att), so the failure mode is untested. Concrete fix: add tests asserting the chosen behavior (error or correct exclusion/zero effect) when reference_period is post. tests/test_estimators.py:2051

- Upgrade post-period reference_period from warning to ValueError (P1) - Add warning when <2 pre-periods available for parallel trends (P1) - Add absorbing treatment validation when unit param provided (P2) - Fix staggered detection false positives on unbalanced panels (P2) - Fix REGISTRY.md SE default documentation (HC1, not cluster-robust) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

igerber · 2026-02-01T18:26:02Z

/ai-review

github-actions · 2026-02-01T18:31:19Z

🔁 AI review rerun (requested by @igerber)

Head SHA: a2931e67fba9911aa322198ae83f9055db632afd

Overall assessment: ⛔ Blocker
Executive summary:

P0: SE==0/non-finite paths still produce non-NaN confidence intervals for period effects, violating the NaN/Inf propagation rule and risking misleading inference.
P1: Methodology Registry now requires ≥2 pre-periods, but implementation only warns and proceeds, which is a spec mismatch for the updated event‑study design.
P2: Changelog states a warning for post-period reference, but code raises a ValueError (docs/behavior mismatch).

Methodology

P1 — Registry vs implementation mismatch on pre-period requirement. Impact: estimator can be run outside documented assumptions, undermining parallel‑trends assessment. Fix: either enforce ValueError when len(pre_periods) < 2 or update the Registry to explicitly allow a single pre‑period with a warning and “no pre‑trend test” caveat. Locations: docs/methodology/REGISTRY.md:L96-L101, diff_diff/estimators.py:L854-L867.

Code Quality

P0 — NaN/Inf propagation incomplete for period effects: t_stat is NaN when se is non‑finite/0, but CI is still computed from se (can collapse to the point estimate when se==0). Impact: misleading inference fields (CI not NaN) in edge cases flagged by the review checklist. Fix: if se is not finite or <=0, set t_stat, p_value, and conf_int to NaN (or (np.nan, np.nan)) before creating PeriodEffect, or add a global guard in compute_confidence_interval. Locations: diff_diff/estimators.py:L1015-L1026, diff_diff/utils.py:L81-L103.

Performance

No issues found.

Maintainability

No issues found.

Tech Debt

No issues found.

Security

No issues found.

Documentation/Tests

P2 — Release notes vs behavior mismatch: changelog says “Warning when reference_period is set to a post‑treatment period,” but code raises ValueError. Impact: user expectations and automation may be wrong. Fix: update changelog (and any other docs) to “ValueError,” or change code to warn if that behavior is intended. Locations: CHANGELOG.md:L27, diff_diff/estimators.py:L891-L900.

- Guard period-level p_value and conf_int when SE is non-finite/zero (previously conf_int returned misleading zero-width interval) - Add test assertions for full NaN propagation on unidentified periods - Fix REGISTRY.md: "requires ≥2 pre-periods" → "warns when <2" - Fix CHANGELOG.md: "Warning" → "ValueError" for post-period reference Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

igerber · 2026-02-01T19:27:29Z

/ai-review

github-actions · 2026-02-01T19:35:08Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 9440e094e52b66dedf481d695fde2b91a6fb5361

Overall assessment: ⚠️ Needs changes

Executive summary

P1: The new full event‑study spec assumes an ever‑treated group indicator (D_i), but the implementation uses the raw treatment column, so time‑varying D_it inputs (common) make pre‑period effects undefined and undermine the main feature.
P1: Staggered‑adoption warning is opt‑in via unit; the registry says this warning is required, so the default path can silently violate assumptions.
P2: HonestDiD’s max‑pre‑violation now reads pre‑period effects without filtering non‑finite values, which can propagate NaNs into bounds.
Tests/docs should cover time‑varying treatment inputs and NaN pre‑period effects, and clarify the expected treatment indicator.

Methodology

P1 Treatment indicator expectation is inconsistent with the registry’s D_i event‑study equation: the code uses the raw treatment column in interactions, so if users pass time‑varying D_it (0 pre / 1 post), pre‑period interactions are all‑zero/collinear and pre‑period effects become NaN, defeating parallel‑trends assessment. Impact: misleads users and can invalidate pre‑trend diagnostics and downstream HonestDiD/PreTrendsPower inputs. Concrete fix: require/derive an ever‑treated indicator for interactions (e.g., D_i = groupby(unit).max() when unit is provided), validate time‑invariance when unit is passed, and document the requirement clearly. (diff_diff/estimators.py:L923-L952, docs/methodology/REGISTRY.md:L105-L127)
P1 Staggered‑adoption warning is only executed when unit is provided; the registry lists the warning as a required assumption check. Impact: users who omit unit (common) receive no warning and may apply MultiPeriodDiD to staggered adoption settings, biasing estimates. Concrete fix: make unit required (or infer it from a known column/cluster), or explicitly downgrade the registry requirement and docs to “opt‑in warning.” (diff_diff/estimators.py:L794-L832, docs/methodology/REGISTRY.md:L96-L103)
P2 HonestDiD._estimate_max_pre_violation now uses period_effects directly without filtering non‑finite pre‑period effects; NaNs can propagate into bounds when any pre‑period coefficient is unidentified. Impact: sensitivity analysis can return NaNs even when some pre‑period effects are valid. Concrete fix: filter np.isfinite(effect) and fallback to avg_se when no finite pre‑effects remain. (diff_diff/honest_did.py:L1237-L1248)

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

P2 Missing tests/docs for the time‑varying treatment input path and NaN pre‑period effects. Impact: regressions can slip in for the most common user data format (D_it). Concrete fix: add a test asserting either a warning or conversion to ever‑treated when treatment varies within unit, plus a test that HonestDiD ignores NaN pre‑effects; update README/docs to state that treatment should be an ever‑treated indicator (or that the estimator derives it).

- Add time-varying treatment warning when `unit` is provided and treatment varies within units (guides users toward ever-treated indicator D_i) - Filter non-finite pre-period effects in HonestDiD _extract_event_study_params and _estimate_max_pre_violation (prevents NaN propagation into bounds) - Update REGISTRY.md: D_i requirement and staggered check conditional on unit - Add tests for D_it warning, NaN filtering, and all-NaN-pre-periods error Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

igerber · 2026-02-01T20:10:07Z

/ai-review

github-actions · 2026-02-01T20:19:48Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 8207057580c6d184fad01b81df1fbd4837792137

Overall Assessment

⚠️ Needs changes

Executive Summary

MultiPeriodDiD now estimates full event‑study pre/post coefficients with updated results/visualization paths and expanded tests.
HonestDiD’s new event‑study extraction can silently drop post‑periods or end up with num_post=0, changing the estimand or causing divide‑by‑zero; needs a guard aligned with the Methodology Registry.
Registry now requires treated/control variation, but MultiPeriodDiD only validates binary values and can proceed with all‑treated/all‑control data.
Tests cover NaN pre‑period filtering, but no coverage for NaN/missing post‑period effects in HonestDiD.

Methodology

Severity: P1 | Impact: _extract_event_study_params filters non‑finite post‑period effects without error; HonestDiD then computes bounds on a reduced post set or divides by zero when num_post==0, which changes the estimand vs. the Registry’s ATT over all post periods. | Fix: After filtering, explicitly require all post‑periods to be finite (or raise) and add a num_post==0 guard in fit(); align behavior with MultiPeriodDiD’s NA‑propagation for unidentified post effects. diff_diff/honest_did.py:L535-L605, diff_diff/honest_did.py:L1084-L1116
Severity: P1 | Impact: Methodology Registry now states treatment must vary across treated/control groups, but MultiPeriodDiD only checks binary values and can proceed with all‑0 or all‑1 treatment, leading to rank deficiency and misleading output. | Fix: Add an explicit “both 0 and 1 present” check after validate_binary and raise a ValueError when violated. diff_diff/estimators.py:L791-L795, diff_diff/utils.py:L29-L45

Code Quality

Severity: P3 | Impact: No issues found. | Fix: None.

Performance

Severity: P3 | Impact: No issues found. | Fix: None.

Maintainability

Severity: P3 | Impact: No issues found. | Fix: None.

Tech Debt

Severity: P3 | Impact: No issues found. | Fix: None.

Security

Severity: P3 | Impact: No issues found. | Fix: None.

Documentation/Tests

Severity: P2 | Impact: No test ensures HonestDiD errors or propagates NA when post‑period effects are non‑finite or missing; current tests only cover pre‑period NaNs. This could let the P1 issue regress. | Fix: Add edge‑case tests for NaN/missing post‑period effects and num_post==0 in tests/test_honest_did.py. tests/test_honest_did.py:L692-L725

Add num_post == 0 guard in HonestDiD.fit() to raise ValueError when all post-period effects are non-finite, preventing silent computation with empty arrays. Covers both MultiPeriodDiD and CallawaySantAnna paths. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

igerber · 2026-02-01T20:32:58Z

/ai-review

github-actions · 2026-02-01T20:37:46Z

🔁 AI review rerun (requested by @igerber)

Head SHA: d1de9cb68ee8a4ad5e412d23badc39e9040cc6ed

Overall assessment: ⛔ Blocker

Executive summary

P0: avg_att inference can emit infinite t-stats/p-values when avg_se is 0/non‑finite; this violates the NaN/Inf propagation rule and can mislead users.
P1: HonestDiD’s new full event‑study extraction assumes sorted periods are pre‑then‑post; if period labels don’t sort chronologically or post_periods aren’t strictly later, pre/post splitting is wrong and bounds are invalid.
Registry alignment is otherwise strong (full pre/post interactions, last‑pre reference, pre‑period warnings, sub‑VCV extraction).

Methodology

Severity: P0 | Impact: avg_att inference can report spurious significance if avg_se is 0 or non‑finite, which violates the edge‑case requirement that inference fields become NaN when SE is invalid; this can mislead downstream users and papers. | Concrete fix: mirror the per‑period guard when computing average inference: if not np.isfinite(avg_se) or avg_se <= 0, set avg_t_stat, avg_p_value, and avg_conf_int to NaN (and optionally avg_se to NaN) instead of dividing. | Location: diff_diff/estimators.py:MultiPeriodDiD.fit (avg ATT computation block).
Severity: P1 | Impact: HonestDiD uses sorted(periods) then slices beta_hat[:num_pre]/[num_pre:]; if time labels are non‑numeric or post_periods aren’t strictly later, pre/post effects are misclassified, invalidating parallel‑trends diagnostics and HonestDiD bounds (violates Methodology Registry’s pre/post separation). | Concrete fix: build ordered lists explicitly: pre_est = [p for p in pre_periods if finite], post_est = [p for p in post_periods if finite], then all_estimated = pre_est + post_est and use this same order for effects, sigma, and num_pre/num_post. Add a validation or test with non‑monotone labels to lock this in. | Location: diff_diff/honest_did.py:_extract_event_study_params.

Code Quality

No additional issues found beyond the methodology blockers.

Performance

No issues found.

Maintainability

No issues found.

Tech Debt

No issues found.

Security

No issues found.

Documentation/Tests

Severity: P2 | Impact: There is no test that the HonestDiD extraction preserves pre/post ordering when period labels are non‑chronological; this makes the P1 above easy to regress. | Concrete fix: add a unit test with mixed/unsorted period labels or a post_periods list that is out of order, and assert correct pre/post counts and ordering. | Location: tests/test_honest_did.py (add a new extraction test).

- Align avg_se guard with per-period pattern (np.isfinite check prevents avg_t_stat=0 / avg_p_value=1 when variance is infinite) - Use explicit pre-then-post ordering in HonestDiD extraction instead of sorted() (prevents misclassification when period labels don't sort chronologically) - Add test with non-monotone period labels (pre=[5,6,7], post=[1,2]) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

igerber · 2026-02-01T21:01:49Z

/ai-review

github-actions · 2026-02-01T21:12:57Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 66b27546da306fc4b3e89a3e0423633fbdc09e9d

PR Review

Overall assessment: ✅ Looks good

Executive summary

MultiPeriodDiD’s full event‑study spec (pre + post interactions, last‑pre reference default, post‑only avg ATT) aligns with the Methodology Registry.
HonestDiD/PreTrendsPower now use interaction sub‑VCV extraction; ordering and NaN filtering logic look consistent with the updated results object.
Two minor doc/UX inconsistencies to clean up (warning text and an outdated comment).

Methodology

No findings. Cross‑checked MultiPeriodDiD + HonestDiD/PreTrendsPower changes against docs/methodology/REGISTRY.md; no mismatches in spec/SE/identification logic found.

Code Quality

P3 — Impact: The comment in HonestDiD.fit() still claims beta_hat contains only post‑period effects, which is no longer true after the full event‑study change and can mislead future maintenance. Fix: update the comment to reflect pre+post ordering and the conditional extraction that follows. (diff_diff/honest_did.py:L1094-L1101)

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

P3 — Impact: The FutureWarning message says the default reference period will change in a future release, but the code already uses last pre‑period now; this can confuse users about current behavior. Fix: update the warning text to say the default has changed (or only warn while still using the old default). (diff_diff/estimators.py:L890-L905)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

igerber · 2026-02-01T21:37:13Z

/ai-review

github-actions · 2026-02-01T21:49:38Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 3cb1c857280069541281f8482a1a328e04478628

Overall Assessment: ⚠️ Needs changes

Executive Summary

P2 methodology: HonestDiD orders coefficients with sorted(...) instead of preserving the pre_periods/post_periods order, which can mis-order event time for non-numeric labels and distort smoothness-based bounds. diff_diff/honest_did.py:L561-L606
MultiPeriodDiD’s event-study spec, defaults, and warnings align with the Methodology Registry (full pre+post interactions, last-pre reference, D_i guidance).
NaN/Inf propagation for period-level and average inference in MultiPeriodDiD is now consistent.
Tests/docs are expanded for event-study outputs, HonestDiD sub-VCV extraction, and new warnings.

Methodology

P2 | Impact: HonestDiD smoothness/combined bounds rely on the chronological ordering of event-time coefficients; using sorted(...) can reorder periods when labels aren’t naturally sortable (e.g., string periods), violating the ordering assumptions in Rambachan & Roth and yielding incorrect bounds. | Fix: Build pre_estimated/post_estimated by filtering pre_periods/post_periods in their existing order (e.g., pre_estimated = [p for p in pre_periods if p in finite_periods]) and use that ordering for all_estimated, beta_hat, and indices. diff_diff/honest_did.py:L561-L606

Code Quality

No issues found.

Performance

No issues found.

Maintainability

No issues found.

Tech Debt

No issues found.

Security

No issues found.

Documentation/Tests

No issues found.

Replace sorted() with order-preserving list comprehensions in _extract_event_study_params to maintain chronological period ordering. sorted() could break smoothness-based bounds (DeltaSD, DeltaSDRM) when period labels don't sort chronologically by default (e.g., string labels). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

igerber · 2026-02-01T22:07:13Z

/ai-review

github-actions · 2026-02-01T22:20:29Z

🔁 AI review rerun (requested by @igerber)

Head SHA: ecddfb248a6585104db3d5850d47e41807d6bfca

Overall Assessment ✅ Looks good

Executive Summary

MultiPeriodDiD’s full event‑study spec and default reference period align with the updated Methodology Registry; no mismatches found.
HonestDiD/PreTrendsPower now consume interaction sub‑VCV and filter non‑finite estimates, which matches the intended sensitivity‑analysis inputs.
Edge‑case handling around non‑finite estimates and reference‑period access appears covered; no blocking issues identified.

Methodology

No findings.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

No findings.

Address PR #125 review round 6: stale comment and FutureWarning text

3cb1c85

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

igerber merged commit 4d7901c into main Feb 2, 2026
7 checks passed

igerber deleted the methodology-review-multiperiod branch February 2, 2026 16:28

Conversation

igerber commented Feb 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Core estimator (diff_diff/estimators.py)

Results (diff_diff/results.py)

HonestDiD / PreTrendsPower VCV fix

Visualization (diff_diff/visualization.py)

Tests

Test plan

Uh oh!

github-actions bot commented Feb 1, 2026

Uh oh!

igerber commented Feb 1, 2026

Uh oh!

github-actions bot commented Feb 1, 2026

Uh oh!

igerber commented Feb 1, 2026

Uh oh!

github-actions bot commented Feb 1, 2026

Methodology

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Uh oh!

igerber commented Feb 1, 2026

Uh oh!

github-actions bot commented Feb 1, 2026

Uh oh!

igerber commented Feb 1, 2026

Uh oh!

github-actions bot commented Feb 1, 2026

Methodology

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Uh oh!

igerber commented Feb 1, 2026

Uh oh!

github-actions bot commented Feb 1, 2026

PR Review

Executive summary

Methodology

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Uh oh!

igerber commented Feb 1, 2026

Uh oh!

github-actions bot commented Feb 1, 2026

Uh oh!

igerber commented Feb 1, 2026

Uh oh!

github-actions bot commented Feb 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

igerber commented Feb 1, 2026 •

edited

Loading

Core estimator (`diff_diff/estimators.py`)

Results (`diff_diff/results.py`)

Visualization (`diff_diff/visualization.py`)