Anoop Menon; Sen Chai, Harvard University; Clarence Lee, Cornell University; and Haris Tabakovic, The Brattle Group
Abstract: Can machine learning techniques be used to predict high-impact, general technologies? We find that an ensemble of deep learning models that analyze both the text of patents and their bibliometric information can identify such patents ex ante, correctly flagging 80 of the top 100 high-generality patents in the hold-out sample. We also find that patent abstracts alone carry enough information for a deep learning model trained on them to outperform models that draw on many patent quality indicator variables. The ability to spot promising technologies in their nascent stages can be highly valuable for society, with implications for national technology policy and for the R&D strategies of firms. Our results also demonstrate that machine learning-driven prediction models can be fruitfully applied even to complex, socio-culturally driven processes such as technology adoption.