Amandeep Singh, PhD Candidate, The Wharton School
Abstract: Supervised machine learning algorithms perform poorly when the explanatory variables are endogenous. In this paper, we borrow from the literature on partial identification to propose deep causal inequalities, which overcome this issue. Instead of relying on observed labels, the DeepCI estimator uses inequalities inferred from the observed behavior of agents in the data. By construction, this allows us to circumvent the problem of endogenous explanatory variables in many cases. We provide theoretical guarantees for our estimator and show that it is consistent under very mild conditions. Through extensive simulations, we find that our estimator outperforms standard supervised machine learning algorithms and existing partial identification methods. Finally, we illustrate how estimators based on deep causal inequalities can be valuable in demand settings. We discuss how deep causal inequalities could help estimate demand structures from observational data that suffer from endogeneity. We further show how deep causal inequalities accommodate highly unstructured data, such as images, in differentiated-products demand settings. Using customer-level rental data from Hertz, we estimate which car design features correspond to consumer demand.