Skip to content

MultiPeriodDiD: full event-study specification with pre-period coefficients#125

Merged
igerber merged 8 commits intomainfrom
methodology-review-multiperiod
Feb 2, 2026
Merged

MultiPeriodDiD: full event-study specification with pre-period coefficients#125
igerber merged 8 commits intomainfrom
methodology-review-multiperiod

Conversation

@igerber
Copy link
Owner

@igerber igerber commented Feb 1, 2026

Summary

  • BREAKING: Transform MultiPeriodDiD from a post-period-only estimator into a proper event-study estimator with treatment × period interactions for ALL non-reference periods (pre and post)
  • Pre-period coefficients enable parallel trends assessment directly from the estimation output
  • Default reference period changed to last pre-period (e=-1 convention) with FutureWarning for one release cycle
  • New unit parameter for staggered adoption detection warning
  • Bug fix: HonestDiD/PreTrendsPower VCV extraction now uses interaction sub-VCV instead of full regression VCV

Changes

Core estimator (diff_diff/estimators.py)

  • Treatment × period interactions created for ALL non-reference periods (pre + post)
  • Default reference_period changed from first to last pre-period with FutureWarning
  • Validation warning when reference_period is set to a post-treatment period
  • unit parameter added to fit() for staggered adoption detection
  • interaction_indices (period → column index) stored on results for robust VCV extraction
  • t_stat guard uses np.isfinite(se) and se > 0 (consistent with other estimators)

Results (diff_diff/results.py)

  • New reference_period and interaction_indices fields on MultiPeriodDiDResults
  • pre_period_effects and post_period_effects convenience properties
  • summary() shows pre-period section with [ref] indicator for reference period
  • to_dataframe() includes is_post column
  • get_effect() gives informative error when accessing reference period

HonestDiD / PreTrendsPower VCV fix

  • _extract_event_study_params now extracts interaction sub-VCV using stored interaction_indices
  • Eliminates fragile string matching on coefficient names
  • Ensures correct sensitivity analysis bounds

Visualization (diff_diff/visualization.py)

  • Default periods now include all estimated periods (pre + post), not just post
  • Auto-detects reference_period from results object

Tests

  • 14 new tests in TestMultiPeriodDiDEventStudy class
  • Updated existing tests for new behavior (reference period, pre-period effects)
  • HonestDiD sub-VCV extraction integration test
  • All 252 tests pass, 0 failures

Test plan

  • pytest tests/test_estimators.py -k "MultiPeriod" — 47 tests pass
  • pytest tests/test_honest_did.py — 52 tests pass
  • pytest tests/test_pretrends.py — 63 tests pass
  • pytest tests/test_visualization.py — 18 tests pass
  • Full suite: 252 passed, 22 skipped, 0 failed
  • No FutureWarning leaks (verified with -W error::FutureWarning)
  • ruff check and black --check clean

🤖 Generated with Claude Code

…cients

BREAKING: Transform MultiPeriodDiD from a post-period-only estimator into a
proper event-study estimator with pre-period coefficients for parallel trends
assessment.

Core changes:
- Treatment × period interactions for ALL non-reference periods (pre and post)
- Default reference period is last pre-period (e=-1 convention) with FutureWarning
- New `unit` parameter for staggered adoption detection warning
- `period_effects` now contains both pre and post period effects
- `summary()` shows pre-period section with reference period indicator
- `to_dataframe()` includes `is_post` column
- `interaction_indices` stored on results for robust sub-VCV extraction

Bug fix:
- HonestDiD/PreTrendsPower VCV extraction uses interaction sub-VCV instead of
  full regression VCV (via stored interaction_indices)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@github-actions
Copy link

github-actions bot commented Feb 1, 2026

Overall assessment: ⚠️ Needs changes

Executive Summary

  • MultiPeriodDiD’s new full event-study spec is a good direction, but REGISTRY-mandated assumption checks (≥2 pre-periods, absorbing/time-invariant treatment) are not enforced, so the estimator can run in invalid settings. docs/methodology/REGISTRY.md:96 diff_diff/estimators.py:791
  • Allowing reference_period to be post-treatment currently only warns, but downstream logic assumes all post periods are estimated, which can miscompute avg_att/SE or crash summary. diff_diff/estimators.py:859 diff_diff/estimators.py:1021 diff_diff/results.py:414
  • The staggered-adoption warning can false-positive on unbalanced panels because it uses the first observed treated period per unit. diff_diff/estimators.py:799

Methodology

  • Severity: P1. Impact: REGISTRY now requires ≥2 pre-periods and an absorbing/time-invariant treatment indicator; code only checks binary treatment and ≥1 pre-period, so parallel-trends assessment can be invalid and treatment-misclassified panels can pass silently. Concrete fix: enforce len(pre_periods) >= 2 (or warn + disable pre-trend outputs), and when unit is provided verify treatment is constant within unit and monotonic/absorbing; consider requiring unit for these checks or explicitly document that treatment must be a time‑invariant group indicator. docs/methodology/REGISTRY.md:96 docs/methodology/REGISTRY.md:102 diff_diff/estimators.py:791 diff_diff/estimators.py:829
  • Severity: P1. Impact: If reference_period is in post_periods (allowed with a warning), that period is omitted from period_effects, but avg_att/SE uses len(post_periods) and summary() indexes all post_periods, leading to biased averages or KeyError. Concrete fix: either disallow post-period references (raise ValueError) or consistently exclude the reference from post_periods in downstream computations and include an explicit zero effect for averaging/summary; use len(post_effect_values) for the variance denominator. diff_diff/estimators.py:859 diff_diff/estimators.py:1021 diff_diff/results.py:414

Code Quality

  • Severity: P2. Impact: Staggered-adoption detection uses the minimum treated time among observed treated rows; in unbalanced panels with missing early periods this can spuriously warn even under simultaneous adoption. Concrete fix: compute adoption timing relative to each unit’s first observed period or only warn when within-unit treatment timing truly differs after accounting for missingness; consider adding a balanced‑panel check. diff_diff/estimators.py:799

Performance

  • No issues found.

Maintainability

  • No issues found.

Tech Debt

  • No issues found.

Security

  • No issues found.

Documentation/Tests

  • Severity: P2. Impact: REGISTRY’s “Standard errors” section says MultiPeriodDiD defaults to cluster-robust, but the cross-reference table (and code) indicate HC1; the registry is internally inconsistent. Concrete fix: align the “Standard errors” bullet with actual defaults (HC1) or change code defaults to match. docs/methodology/REGISTRY.md:149 docs/methodology/REGISTRY.md:1038
  • Severity: P2. Impact: Tests cover the warning for reference_period in post, but not the downstream behavior (summary/avg_att), so the failure mode is untested. Concrete fix: add tests asserting the chosen behavior (error or correct exclusion/zero effect) when reference_period is post. tests/test_estimators.py:2051

- Upgrade post-period reference_period from warning to ValueError (P1)
- Add warning when <2 pre-periods available for parallel trends (P1)
- Add absorbing treatment validation when unit param provided (P2)
- Fix staggered detection false positives on unbalanced panels (P2)
- Fix REGISTRY.md SE default documentation (HC1, not cluster-robust)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@igerber
Copy link
Owner Author

igerber commented Feb 1, 2026

/ai-review

@github-actions
Copy link

github-actions bot commented Feb 1, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: a2931e67fba9911aa322198ae83f9055db632afd


Overall assessment: ⛔ Blocker
Executive summary:

  • P0: SE==0/non-finite paths still produce non-NaN confidence intervals for period effects, violating the NaN/Inf propagation rule and risking misleading inference.
  • P1: Methodology Registry now requires ≥2 pre-periods, but implementation only warns and proceeds, which is a spec mismatch for the updated event‑study design.
  • P2: Changelog states a warning for post-period reference, but code raises a ValueError (docs/behavior mismatch).

Methodology

  • P1 — Registry vs implementation mismatch on pre-period requirement. Impact: estimator can be run outside documented assumptions, undermining parallel‑trends assessment. Fix: either enforce ValueError when len(pre_periods) < 2 or update the Registry to explicitly allow a single pre‑period with a warning and “no pre‑trend test” caveat. Locations: docs/methodology/REGISTRY.md:L96-L101, diff_diff/estimators.py:L854-L867.

Code Quality

  • P0 — NaN/Inf propagation incomplete for period effects: t_stat is NaN when se is non‑finite/0, but CI is still computed from se (can collapse to the point estimate when se==0). Impact: misleading inference fields (CI not NaN) in edge cases flagged by the review checklist. Fix: if se is not finite or <=0, set t_stat, p_value, and conf_int to NaN (or (np.nan, np.nan)) before creating PeriodEffect, or add a global guard in compute_confidence_interval. Locations: diff_diff/estimators.py:L1015-L1026, diff_diff/utils.py:L81-L103.

Performance

  • No issues found.

Maintainability

  • No issues found.

Tech Debt

  • No issues found.

Security

  • No issues found.

Documentation/Tests

  • P2 — Release notes vs behavior mismatch: changelog says “Warning when reference_period is set to a post‑treatment period,” but code raises ValueError. Impact: user expectations and automation may be wrong. Fix: update changelog (and any other docs) to “ValueError,” or change code to warn if that behavior is intended. Locations: CHANGELOG.md:L27, diff_diff/estimators.py:L891-L900.

- Guard period-level p_value and conf_int when SE is non-finite/zero
  (previously conf_int returned misleading zero-width interval)
- Add test assertions for full NaN propagation on unidentified periods
- Fix REGISTRY.md: "requires ≥2 pre-periods" → "warns when <2"
- Fix CHANGELOG.md: "Warning" → "ValueError" for post-period reference

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@igerber
Copy link
Owner Author

igerber commented Feb 1, 2026

/ai-review

@github-actions
Copy link

github-actions bot commented Feb 1, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: 9440e094e52b66dedf481d695fde2b91a6fb5361


Overall assessment: ⚠️ Needs changes

Executive summary

  • P1: The new full event‑study spec assumes an ever‑treated group indicator (D_i), but the implementation uses the raw treatment column, so time‑varying D_it inputs (common) make pre‑period effects undefined and undermine the main feature.
  • P1: Staggered‑adoption warning is opt‑in via unit; the registry says this warning is required, so the default path can silently violate assumptions.
  • P2: HonestDiD’s max‑pre‑violation now reads pre‑period effects without filtering non‑finite values, which can propagate NaNs into bounds.
  • Tests/docs should cover time‑varying treatment inputs and NaN pre‑period effects, and clarify the expected treatment indicator.

Methodology

  • P1 Treatment indicator expectation is inconsistent with the registry’s D_i event‑study equation: the code uses the raw treatment column in interactions, so if users pass time‑varying D_it (0 pre / 1 post), pre‑period interactions are all‑zero/collinear and pre‑period effects become NaN, defeating parallel‑trends assessment. Impact: misleads users and can invalidate pre‑trend diagnostics and downstream HonestDiD/PreTrendsPower inputs. Concrete fix: require/derive an ever‑treated indicator for interactions (e.g., D_i = groupby(unit).max() when unit is provided), validate time‑invariance when unit is passed, and document the requirement clearly. (diff_diff/estimators.py:L923-L952, docs/methodology/REGISTRY.md:L105-L127)
  • P1 Staggered‑adoption warning is only executed when unit is provided; the registry lists the warning as a required assumption check. Impact: users who omit unit (common) receive no warning and may apply MultiPeriodDiD to staggered adoption settings, biasing estimates. Concrete fix: make unit required (or infer it from a known column/cluster), or explicitly downgrade the registry requirement and docs to “opt‑in warning.” (diff_diff/estimators.py:L794-L832, docs/methodology/REGISTRY.md:L96-L103)
  • P2 HonestDiD._estimate_max_pre_violation now uses period_effects directly without filtering non‑finite pre‑period effects; NaNs can propagate into bounds when any pre‑period coefficient is unidentified. Impact: sensitivity analysis can return NaNs even when some pre‑period effects are valid. Concrete fix: filter np.isfinite(effect) and fallback to avg_se when no finite pre‑effects remain. (diff_diff/honest_did.py:L1237-L1248)

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings.

Security

  • No findings.

Documentation/Tests

  • P2 Missing tests/docs for the time‑varying treatment input path and NaN pre‑period effects. Impact: regressions can slip in for the most common user data format (D_it). Concrete fix: add a test asserting either a warning or conversion to ever‑treated when treatment varies within unit, plus a test that HonestDiD ignores NaN pre‑effects; update README/docs to state that treatment should be an ever‑treated indicator (or that the estimator derives it).

- Add time-varying treatment warning when `unit` is provided and treatment
  varies within units (guides users toward ever-treated indicator D_i)
- Filter non-finite pre-period effects in HonestDiD _extract_event_study_params
  and _estimate_max_pre_violation (prevents NaN propagation into bounds)
- Update REGISTRY.md: D_i requirement and staggered check conditional on unit
- Add tests for D_it warning, NaN filtering, and all-NaN-pre-periods error

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@igerber
Copy link
Owner Author

igerber commented Feb 1, 2026

/ai-review

@github-actions
Copy link

github-actions bot commented Feb 1, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: 8207057580c6d184fad01b81df1fbd4837792137


Overall Assessment

  • ⚠️ Needs changes

Executive Summary

  • MultiPeriodDiD now estimates full event‑study pre/post coefficients with updated results/visualization paths and expanded tests.
  • HonestDiD’s new event‑study extraction can silently drop post‑periods or end up with num_post=0, changing the estimand or causing divide‑by‑zero; needs a guard aligned with the Methodology Registry.
  • Registry now requires treated/control variation, but MultiPeriodDiD only validates binary values and can proceed with all‑treated/all‑control data.
  • Tests cover NaN pre‑period filtering, but no coverage for NaN/missing post‑period effects in HonestDiD.

Methodology

  • Severity: P1 | Impact: _extract_event_study_params filters non‑finite post‑period effects without error; HonestDiD then computes bounds on a reduced post set or divides by zero when num_post==0, which changes the estimand vs. the Registry’s ATT over all post periods. | Fix: After filtering, explicitly require all post‑periods to be finite (or raise) and add a num_post==0 guard in fit(); align behavior with MultiPeriodDiD’s NA‑propagation for unidentified post effects. diff_diff/honest_did.py:L535-L605, diff_diff/honest_did.py:L1084-L1116
  • Severity: P1 | Impact: Methodology Registry now states treatment must vary across treated/control groups, but MultiPeriodDiD only checks binary values and can proceed with all‑0 or all‑1 treatment, leading to rank deficiency and misleading output. | Fix: Add an explicit “both 0 and 1 present” check after validate_binary and raise a ValueError when violated. diff_diff/estimators.py:L791-L795, diff_diff/utils.py:L29-L45

Code Quality

  • Severity: P3 | Impact: No issues found. | Fix: None.

Performance

  • Severity: P3 | Impact: No issues found. | Fix: None.

Maintainability

  • Severity: P3 | Impact: No issues found. | Fix: None.

Tech Debt

  • Severity: P3 | Impact: No issues found. | Fix: None.

Security

  • Severity: P3 | Impact: No issues found. | Fix: None.

Documentation/Tests

  • Severity: P2 | Impact: No test ensures HonestDiD errors or propagates NA when post‑period effects are non‑finite or missing; current tests only cover pre‑period NaNs. This could let the P1 issue regress. | Fix: Add edge‑case tests for NaN/missing post‑period effects and num_post==0 in tests/test_honest_did.py. tests/test_honest_did.py:L692-L725

Add num_post == 0 guard in HonestDiD.fit() to raise ValueError when all
post-period effects are non-finite, preventing silent computation with
empty arrays. Covers both MultiPeriodDiD and CallawaySantAnna paths.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@igerber
Copy link
Owner Author

igerber commented Feb 1, 2026

/ai-review

@github-actions
Copy link

github-actions bot commented Feb 1, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: d1de9cb68ee8a4ad5e412d23badc39e9040cc6ed


Overall assessment: ⛔ Blocker

Executive summary

  • P0: avg_att inference can emit infinite t-stats/p-values when avg_se is 0/non‑finite; this violates the NaN/Inf propagation rule and can mislead users.
  • P1: HonestDiD’s new full event‑study extraction assumes sorted periods are pre‑then‑post; if period labels don’t sort chronologically or post_periods aren’t strictly later, pre/post splitting is wrong and bounds are invalid.
  • Registry alignment is otherwise strong (full pre/post interactions, last‑pre reference, pre‑period warnings, sub‑VCV extraction).

Methodology

  • Severity: P0 | Impact: avg_att inference can report spurious significance if avg_se is 0 or non‑finite, which violates the edge‑case requirement that inference fields become NaN when SE is invalid; this can mislead downstream users and papers. | Concrete fix: mirror the per‑period guard when computing average inference: if not np.isfinite(avg_se) or avg_se <= 0, set avg_t_stat, avg_p_value, and avg_conf_int to NaN (and optionally avg_se to NaN) instead of dividing. | Location: diff_diff/estimators.py:MultiPeriodDiD.fit (avg ATT computation block).
  • Severity: P1 | Impact: HonestDiD uses sorted(periods) then slices beta_hat[:num_pre]/[num_pre:]; if time labels are non‑numeric or post_periods aren’t strictly later, pre/post effects are misclassified, invalidating parallel‑trends diagnostics and HonestDiD bounds (violates Methodology Registry’s pre/post separation). | Concrete fix: build ordered lists explicitly: pre_est = [p for p in pre_periods if finite], post_est = [p for p in post_periods if finite], then all_estimated = pre_est + post_est and use this same order for effects, sigma, and num_pre/num_post. Add a validation or test with non‑monotone labels to lock this in. | Location: diff_diff/honest_did.py:_extract_event_study_params.

Code Quality

  • No additional issues found beyond the methodology blockers.

Performance

  • No issues found.

Maintainability

  • No issues found.

Tech Debt

  • No issues found.

Security

  • No issues found.

Documentation/Tests

  • Severity: P2 | Impact: There is no test that the HonestDiD extraction preserves pre/post ordering when period labels are non‑chronological; this makes the P1 above easy to regress. | Concrete fix: add a unit test with mixed/unsorted period labels or a post_periods list that is out of order, and assert correct pre/post counts and ordering. | Location: tests/test_honest_did.py (add a new extraction test).

- Align avg_se guard with per-period pattern (np.isfinite check prevents
  avg_t_stat=0 / avg_p_value=1 when variance is infinite)
- Use explicit pre-then-post ordering in HonestDiD extraction instead of
  sorted() (prevents misclassification when period labels don't sort
  chronologically)
- Add test with non-monotone period labels (pre=[5,6,7], post=[1,2])

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@igerber
Copy link
Owner Author

igerber commented Feb 1, 2026

/ai-review

@github-actions
Copy link

github-actions bot commented Feb 1, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: 66b27546da306fc4b3e89a3e0423633fbdc09e9d


PR Review

Overall assessment: ✅ Looks good

Executive summary

  • MultiPeriodDiD’s full event‑study spec (pre + post interactions, last‑pre reference default, post‑only avg ATT) aligns with the Methodology Registry.
  • HonestDiD/PreTrendsPower now use interaction sub‑VCV extraction; ordering and NaN filtering logic look consistent with the updated results object.
  • Two minor doc/UX inconsistencies to clean up (warning text and an outdated comment).

Methodology

  • No findings. Cross‑checked MultiPeriodDiD + HonestDiD/PreTrendsPower changes against docs/methodology/REGISTRY.md; no mismatches in spec/SE/identification logic found.

Code Quality

  • P3 — Impact: The comment in HonestDiD.fit() still claims beta_hat contains only post‑period effects, which is no longer true after the full event‑study change and can mislead future maintenance. Fix: update the comment to reflect pre+post ordering and the conditional extraction that follows. (diff_diff/honest_did.py:L1094-L1101)

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings.

Security

  • No findings.

Documentation/Tests

  • P3 — Impact: The FutureWarning message says the default reference period will change in a future release, but the code already uses last pre‑period now; this can confuse users about current behavior. Fix: update the warning text to say the default has changed (or only warn while still using the old default). (diff_diff/estimators.py:L890-L905)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@igerber
Copy link
Owner Author

igerber commented Feb 1, 2026

/ai-review

@github-actions
Copy link

github-actions bot commented Feb 1, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: 3cb1c857280069541281f8482a1a328e04478628


Overall Assessment: ⚠️ Needs changes

Executive Summary

  • P2 methodology: HonestDiD orders coefficients with sorted(...) instead of preserving the pre_periods/post_periods order, which can mis-order event time for non-numeric labels and distort smoothness-based bounds. diff_diff/honest_did.py:L561-L606
  • MultiPeriodDiD’s event-study spec, defaults, and warnings align with the Methodology Registry (full pre+post interactions, last-pre reference, D_i guidance).
  • NaN/Inf propagation for period-level and average inference in MultiPeriodDiD is now consistent.
  • Tests/docs are expanded for event-study outputs, HonestDiD sub-VCV extraction, and new warnings.

Methodology

  • P2 | Impact: HonestDiD smoothness/combined bounds rely on the chronological ordering of event-time coefficients; using sorted(...) can reorder periods when labels aren’t naturally sortable (e.g., string periods), violating the ordering assumptions in Rambachan & Roth and yielding incorrect bounds. | Fix: Build pre_estimated/post_estimated by filtering pre_periods/post_periods in their existing order (e.g., pre_estimated = [p for p in pre_periods if p in finite_periods]) and use that ordering for all_estimated, beta_hat, and indices. diff_diff/honest_did.py:L561-L606

Code Quality

  • No issues found.

Performance

  • No issues found.

Maintainability

  • No issues found.

Tech Debt

  • No issues found.

Security

  • No issues found.

Documentation/Tests

  • No issues found.

Replace sorted() with order-preserving list comprehensions in
_extract_event_study_params to maintain chronological period ordering.
sorted() could break smoothness-based bounds (DeltaSD, DeltaSDRM) when
period labels don't sort chronologically by default (e.g., string labels).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@igerber
Copy link
Owner Author

igerber commented Feb 1, 2026

/ai-review

@github-actions
Copy link

github-actions bot commented Feb 1, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: ecddfb248a6585104db3d5850d47e41807d6bfca


Overall Assessment ✅ Looks good

Executive Summary

  • MultiPeriodDiD’s full event‑study spec and default reference period align with the updated Methodology Registry; no mismatches found.
  • HonestDiD/PreTrendsPower now consume interaction sub‑VCV and filter non‑finite estimates, which matches the intended sensitivity‑analysis inputs.
  • Edge‑case handling around non‑finite estimates and reference‑period access appears covered; no blocking issues identified.

Methodology

  • No findings.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings.

Security

  • No findings.

Documentation/Tests

  • No findings.

@igerber igerber merged commit 4d7901c into main Feb 2, 2026
7 checks passed
@igerber igerber deleted the methodology-review-multiperiod branch February 2, 2026 16:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant