Bandits for Time-to-Event Outcomes

Hamsa Bastani and Arielle Anderer, Operations, Information and Decisions, The Wharton School; John Silberholz, University of Michigan

Abstract: Bandits are a popular framework for adaptive, sequential decision-making and have found numerous applications, including clinical trials, mobile health, dynamic pricing, and personalized customer targeting. A large literature has developed state-of-the-art algorithms that efficiently manage the exploration-exploitation tradeoff. However, these algorithms are not designed for time-to-event outcomes, which are prevalent in healthcare and marketing. We design new algorithms that leverage tools from survival analysis to provably perform well in these settings. We demonstrate the value of our approach on a large-scale clinical trial database on metastatic breast cancer.