Experiment Variance Reduction
Experiment variance reduction is the use of statistical and assignment methods that lower noise in experiment readouts, narrowing confidence intervals so that fewer samples are needed to detect the same effect size.
Key points
- Statsig positions variance reduction as a major lever for faster A/B tests because lower noise produces narrower confidence intervals [src-031].
- CUPED subtracts variation explained by pre-experiment covariates; Statsig’s CURE generalizes this with regression over arbitrary covariate data [src-031].
- The article claims CURE can cut variance substantially, while noting that CUPED already captures most of the value for many practical teams [src-031].
- Winsorization and thresholding can tame heavy-tailed spend metrics by trimming or capping extreme observations, but the rule must be documented carefully [src-031].
- Stratified sampling or stratified assignment can create better-balanced groups from day one instead of hoping randomization balances high-value and low-value users by chance [src-031].
- Statsig warns against excessive novelty: if a fancy method flips a result and stakeholders do not trust the method, the whole experimentation program can lose credibility [src-031].
- Anthropic extends variance reduction to model evals: for chain-of-thought evals, resampling multiple answers per question and averaging at the question level can reduce stochastic response variance [src-067].
- For non-chain-of-thought multiple-choice evals, next-token probabilities may eliminate much of the random answer component by using probability mass on the correct answer as the score [src-067].
- Pairing model scores by question is another “free” variance-reduction tactic when models are evaluated on the same benchmark items [src-067].
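The CUPED adjustment described above can be sketched in a few lines: the coefficient `theta` is the OLS slope of the in-experiment metric on the pre-experiment covariate, and subtracting `theta * (x - mean(x))` removes the variance that covariate explains. This is a minimal illustration of the general CUPED idea, not Statsig's CURE implementation; the function name and simulated data are assumptions for the example.

```python
import numpy as np

def cuped_adjust(y, x):
    """CUPED: subtract the variation in metric y that is explained by a
    pre-experiment covariate x. theta is the variance-minimizing slope."""
    theta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    return y - theta * (x - x.mean())

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)             # pre-experiment metric (covariate)
y = 0.8 * x + rng.normal(size=10_000)   # in-experiment metric, correlated with x
y_adj = cuped_adjust(y, x)
# The adjusted metric keeps the same mean but has lower variance,
# which is what narrows the confidence interval.
```

CURE generalizes this step by regressing on arbitrary covariate data rather than a single pre-experiment metric [src-031].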
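Winsorization for heavy-tailed spend metrics can be sketched as capping values at a documented percentile. The percentile choice below is an assumption for illustration; the source's point is that whatever rule is used must be written down and applied consistently.

```python
import numpy as np

def winsorize(values, upper_pct=99.0):
    """Cap observations above the chosen percentile to tame heavy tails.
    The percentile rule should be documented so results are reproducible."""
    cap = np.percentile(values, upper_pct)
    return np.minimum(values, cap)

spend = np.array([5.0, 12.0, 9.0, 10_000.0])   # one extreme spender
capped = winsorize(spend, upper_pct=75)
# Typical observations are untouched; only the extreme tail is capped,
# which shrinks the metric's variance without dropping the user.
```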
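Stratified assignment can be sketched as randomizing within each stratum (say, high-value vs low-value users) so that every arm gets a balanced share from day one. The function below is a hypothetical illustration, not Statsig's assignment service; in practice the stratum key would come from pre-experiment data.

```python
import random
from collections import defaultdict

def stratified_assign(users, stratum_of, seed=0):
    """Shuffle users within each stratum, then alternate arms, so
    treatment and control are balanced inside every stratum."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for u in users:
        by_stratum[stratum_of(u)].append(u)
    assignment = {}
    for members in by_stratum.values():
        rng.shuffle(members)
        for i, u in enumerate(members):
            assignment[u] = "treatment" if i % 2 == 0 else "control"
    return assignment

# Example: users 0-19 are "high-value", the rest "low-value".
assignment = stratified_assign(range(100),
                               lambda u: "high" if u < 20 else "low")
```

Because balance holds within each stratum by construction, the between-arm difference no longer depends on randomization happening to spread high-value users evenly.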
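The three eval-side tactics can be sketched together: averaging resampled answers per question, scoring multiple choice by the probability on the correct option, and differencing paired per-question scores. These are minimal stand-ins for the methods described in [src-067]; the function names are assumptions, and real evals would pull probabilities from model logits.

```python
import statistics

def question_scores(resampled):
    """Average k stochastic samples per question (e.g. 0/1 correctness),
    damping the random component of chain-of-thought responses."""
    return [statistics.mean(runs) for runs in resampled]

def mc_prob_scores(prob_on_correct):
    """Non-CoT multiple choice: use probability mass on the correct
    answer as the score, removing sampling noise entirely."""
    return list(prob_on_correct)

def paired_mean_diff(scores_a, scores_b):
    """Pair two models' scores by question and average the differences;
    shared per-question difficulty cancels out of the comparison."""
    return statistics.mean(a - b for a, b in zip(scores_a, scores_b))

# Three resamples on each of two questions:
qs = question_scores([[1, 0, 1], [0, 0, 1]])
# Paired comparison of two models on the same two questions:
delta = paired_mean_diff([0.9, 0.5], [0.8, 0.4])
```

Pairing is "free" in the sense that it needs no extra model calls, only scoring both models on the same benchmark items.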
Related concepts
- A/B Test Acceleration
- Experiment Statistical Power
- Proxy Metrics in Experiments
- Sequential Testing
- Statistical Model Evaluations
- Paired-Difference Model Evals
- Clustered Standard Errors in Evals