Multiple Testing Correction
Multiple testing correction is the adjustment of statistical decision rules when many hypotheses or metrics are tested at once, so false positives do not accumulate unnoticed.
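To see how quickly false positives accumulate, assume m tests whose null hypotheses are all true, each run at significance level α. The first line also assumes the tests are independent, which is a simplification because real experiment metrics are often correlated. The last line shows how the Bonferroni rule bounds the family-wise error rate through the union bound. That bound does not need the independence assumption.

```latex
\begin{aligned}
\mathrm{FWER} &= \Pr(\text{at least one false positive}) = 1 - (1 - \alpha)^{m}, \\
\alpha = 0.05,\ m = 20 &\;\Rightarrow\; 1 - 0.95^{20} \approx 0.64, \\
\text{Bonferroni } (\alpha/m \text{ per test}) &\;\Rightarrow\; \mathrm{FWER} \le \sum_{i=1}^{m} \frac{\alpha}{m} = \alpha .
\end{aligned}
```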
Key points
- Statsig’s significance guide warns that the chance of at least one false positive rises quickly when teams test many hypotheses simultaneously without adjustment [src-035].
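- Bonferroni correction controls the family-wise error rate (the probability of at least one false positive across all tests) by dividing the significance threshold by the number of tests. Each individual test therefore becomes harder to pass as more tests are run [src-035].
- Benjamini-Hochberg controls the false discovery rate (the expected share of false positives among results declared significant). This is often more practical when teams expect to inspect many metrics or hypotheses and can tolerate a small fraction of discoveries being false [src-035].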
- The speed-focused Statsig article distinguishes multiple-testing correction from Sequential Testing: Benjamini-Hochberg can correct broad metric dashboards, but it does not solve repeated peeking over time [src-031].
- The practical pattern is to match the correction to the source of error. Use multiple-comparison methods for many metrics or hypotheses and sequential methods for repeated looks at the same test. Report results clearly to preserve trust [src-031, src-035].
- Multiple testing correction also reduces the temptation toward p-hacking because it makes selective reporting of one lucky significant metric less persuasive [src-035].
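The two corrections described above can be sketched in plain Python. This is a minimal illustration, not Statsig's implementation. The function names `bonferroni` and `benjamini_hochberg` are hypothetical, and so are the ten p-values in the usage example. The Benjamini-Hochberg guarantee assumes the tests are independent or positively dependent.

```python
"""Sketch of Bonferroni and Benjamini-Hochberg adjustments (hypothetical helpers)."""
from typing import List


def bonferroni(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Reject H_i when p_i <= alpha / m; controls the family-wise error rate."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values: List[float], q: float = 0.05) -> List[bool]:
    """Step-up Benjamini-Hochberg procedure; controls the false discovery rate at q."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k whose sorted p-value satisfies p_(k) <= k * q / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k_max = rank
    # Step-up rule: reject every hypothesis ranked at or below k_max,
    # even if its own p-value missed its individual threshold.
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            rejected[i] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values for ten metrics on one experiment dashboard.
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
    print(sum(x <= 0.05 for x in p))    # 5 significant with no correction
    print(sum(bonferroni(p)))           # 1 significant under Bonferroni
    print(sum(benjamini_hochberg(p)))   # 2 significant under Benjamini-Hochberg
```

On these illustrative values, Benjamini-Hochberg lands between no correction and Bonferroni. It bounds the share of false discoveries rather than the chance of any single one. For production use, library routines such as statsmodels' `multipletests` are a safer choice than hand-rolled code.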
Related entities
Related concepts
- Statistical Significance Testing
- P-Value Interpretation
- Sequential Testing
- Experiment Statistical Power
- A/B Testing Mindset
Source references
- [src-031] Yuzheng Sun — “Speeding up A/B tests with discipline” (2025-06-24)
- [src-035] Jack Virag — “How to accurately test statistical significance” (2025-04-12)