Abstract
Can events be accurately described as historic at the time they are happening? Claims of this sort are in effect predictions about the evaluations of future historians; that is, that they will regard the events in question as significant. Here we provide empirical evidence in support of earlier philosophical arguments (ref. 1) that such claims are likely to be spurious and that, conversely, many events that will one day be viewed as historic attract little attention at the time. We introduce a conceptual and methodological framework for applying machine learning prediction models to large corpora of digitized historical archives. We find that although such models can correctly identify some historically important documents, they tend to overpredict historical significance while also failing to identify many documents that will later be deemed important, where both types of error increase monotonically with the number of documents under consideration. On balance, we conclude that historical significance is extremely difficult to predict, consistent with other recent work on intrinsic limits to predictability in complex social systems (refs. 2, 3). However, the results also indicate the feasibility of developing ‘artificial archivists’ to identify potentially historic documents in very large digital corpora.
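The two error types named in the abstract, overpredicting significance (false positives) and missing documents later deemed important (false negatives), can be illustrated with a minimal sketch. It does not reproduce the study's models or data: the scores, labels, the 0.5 threshold and the function names are all hypothetical, chosen only to show how both error counts can accumulate as more documents are considered.

```python
from typing import List, Tuple


def count_errors(scores: List[float], labels: List[bool],
                 threshold: float) -> Tuple[int, int]:
    """Return (false_positives, false_negatives) at a score threshold.

    A document is predicted "historic" when its score >= threshold.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    false_pos = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    false_neg = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    return false_pos, false_neg


def errors_by_corpus_size(scores: List[float], labels: List[bool],
                          threshold: float) -> List[Tuple[int, int]]:
    """Error counts for the first 1, 2, ..., n documents of the corpus."""
    return [count_errors(scores[:n], labels[:n], threshold)
            for n in range(1, len(scores) + 1)]


# Hypothetical toy corpus: model scores and whether each document
# was later judged important (made-up values, not from the study).
scores = [0.9, 0.8, 0.2, 0.7, 0.1, 0.6, 0.3, 0.4]
labels = [True, False, True, False, False, True, True, False]

for n, (fp, fn) in enumerate(errors_by_corpus_size(scores, labels, 0.5), start=1):
    print(f"{n} documents: {fp} false positives, {fn} false negatives")
```

With a fixed threshold, adding documents can only add errors, never remove them, so both counts are non-decreasing in this toy setting. How the error rates behave on real archives is an empirical question that the sketch does not address.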
Data availability
The original data (that is, cable text and metadata) analysed during the current study are available in the History Lab repository (http://history-lab.org) and US National Archives (https://aad.archives.gov/aad/series-description.jsp?s=4073). Derivative data (for example, model scores) are available at https://osf.io/nhmcd/.
Code availability
All code necessary to reproduce our results is available at https://osf.io/nhmcd/.
References
1. Danto, A. C. Analytical Philosophy of History (Cambridge Univ. Press, 1965).
2. Martin, T., Hofman, J. M., Sharma, A., Anderson, A. & Watts, D. J. Exploring limits to prediction in complex social systems. In Proc. 25th International Conference on World Wide Web 683–694 (International World Wide Web Conference Committee, 2016).
3. Hofman, J. M., Sharma, A. & Watts, D. J. Prediction and explanation in social systems. Science 355, 486–488 (2017).
4. Hegel, G. W. F. Hegel’s Philosophy of Right (Clarendon Press, 1942).
5. Bearman, P. S., Faris, R. & Moody, J. Blocking the future: new solutions for old problems in historical social science. Soc. Sci. Hist. 23, 501–533 (1999).
6. Sewell, W. H. Historical events as transformations of structures: inventing revolution at the Bastille. Theory Soc. 25, 841–881 (1996).
7. Kennedy, P. M. The Rise and Fall of the Great Powers: Economic Change and Military Conflict from 1500 to 2000 (Random House, 1987).
8. McAllister, W. B., Botts, J., Cozzens, P. & Marrs, A. W. Toward “Thorough, Accurate, and Reliable”: A History of the Foreign Relations of the United States Series (US Department of State, 2015).
9. Schapire, R. E. in Nonlinear Estimation and Classification 149–171 (Springer, 2003).
10. Chen, T. & Guestrin, C. XGBoost: a scalable tree boosting system. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (Association for Computing Machinery, 2016).
11. Provost, F. & Fawcett, T. Data Science for Business: What You Need to Know About Data Mining and Data-Analytic Thinking (O’Reilly Media, 2013).
12. Blei, D. M., Ng, A. Y. & Jordan, M. I. Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003).
13. Newman, D. J. & Block, S. Probabilistic topic decomposition of an eighteenth-century American newspaper. J. Am. Soc. Inf. Sci. Technol. 57, 753–767 (2006).
14. Yang, T.-I., Torget, A. & Mihalcea, R. Topic modeling on historical newspapers. In Proc. 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities 96–104 (Association for Computational Linguistics, 2011).
15. Griffiths, T. L. & Steyvers, M. Finding scientific topics. Proc. Natl Acad. Sci. USA 101, 5228–5235 (2004).
16. Blei, D. M. & Lafferty, J. D. A correlated topic model of science. Ann. Appl. Stat. 1, 17–35 (2007).
17. Hall, D., Jurafsky, D. & Manning, C. D. Studying the history of ideas using topic models. In Proc. Conference on Empirical Methods in Natural Language Processing 363–371 (Association for Computational Linguistics, 2008).
18. Barron, A. T., Huang, J., Spang, R. L. & DeDeo, S. Individuals, institutions, and innovation in the debates of the French Revolution. Proc. Natl Acad. Sci. USA 115, 4607–4612 (2018).
19. Watts, D. J. Common sense and sociological explanations. Am. J. Sociol. 120, 313–351 (2014).
20. Tetlock, P. E. Expert Political Judgment: How Good Is It? How Can We Know? (Princeton Univ. Press, 2005).
21. Salganik, M. J., Dodds, P. S. & Watts, D. J. Experimental study of inequality and unpredictability in an artificial cultural market. Science 311, 854–856 (2006).
22. Clauset, A., Larremore, D. B. & Sinatra, R. Data-driven predictions in the science of science. Science 355, 477–480 (2017).
23. Bakshy, E., Hofman, J. M., Mason, W. A. & Watts, D. J. Everyone’s an influencer: quantifying influence on Twitter. In Proc. 4th ACM International Conference on Web Search and Data Mining 65–74 (Association for Computing Machinery, 2011).
24. González-Bailón, S. Decoding the Social World: Data Science and the Unintended Consequences of Communication (MIT Press, 2017).
25. Ferguson, N. Virtual History: Alternatives and Counterfactuals (Hachette UK, 2008).
Acknowledgements
The authors are grateful for support from the team at Columbia University’s History Lab. This work was supported in part by the National Science Foundation (SBE-1637108). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
Author information
Contributions
D.J.W. and M.C. conceived and designed the experiments. J.R., A.S. and R.S. performed the experiments and analysed the data. M.C. and R.S. contributed materials. D.J.W. and M.C. wrote the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Figs. 1–4, Supplementary Tables 1–2 and Supplementary Methods.
About this article
Cite this article
Risi, J., Sharma, A., Shah, R. et al. Predicting history. Nat Hum Behav 3, 906–912 (2019). https://doi.org/10.1038/s41562-019-0620-8
DOI: https://doi.org/10.1038/s41562-019-0620-8