Kartik Hosanagar, Operations, Information, and Decisions, The Wharton School
Abstract: In this project, we seek to update the statistical paradigm for conducting A/B tests in the 21st century. In particular, we plan to develop a statistical framework that takes the guesswork out of deciding when to stop an experiment and that calibrates a firm’s decision thresholds based on its own historical data. We have an established partnership with an A/B testing software provider, which gives us unique insight into the distribution of effect sizes, the uncertainty of their estimates, and how both change over time. The traditional 5% significance threshold has served as an informal rule for judging the outcomes of randomized experiments for nearly a century. The data from this partnership give us a rare opportunity to quantitatively measure the costs of Type I errors (detecting an effect when none exists) and Type II errors (failing to detect an effect when one exists).