Predicting history

Risi, Joseph; Sharma, Amit; Shah, Rohan; Connelly, Matthew; Watts, Duncan J.

doi:10.1038/s41562-019-0620-8

Letter
Published: 03 June 2019

Nature Human Behaviour volume 3, pages 906–912 (2019)

Abstract

Can events be accurately described as historic at the time they are happening? Claims of this sort are in effect predictions about the evaluations of future historians; that is, that they will regard the events in question as significant. Here we provide empirical evidence in support of earlier philosophical arguments¹ that such claims are likely to be spurious and that, conversely, many events that will one day be viewed as historic attract little attention at the time. We introduce a conceptual and methodological framework for applying machine learning prediction models to large corpora of digitized historical archives. We find that although such models can correctly identify some historically important documents, they tend to overpredict historical significance while also failing to identify many documents that will later be deemed important, where both types of error increase monotonically with the number of documents under consideration. On balance, we conclude that historical significance is extremely difficult to predict, consistent with other recent work on intrinsic limits to predictability in complex social systems²,³. However, the results also indicate the feasibility of developing ‘artificial archivists’ to identify potentially historic documents in very large digital corpora.
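The two error types named in the abstract, overpredicting significance and missing documents later deemed important, correspond to false positives and false negatives in a binary classification setting. The following is a minimal sketch of how such errors could be counted and summarized as precision and recall. It is not the authors' code: the function names and the toy labels are hypothetical and chosen purely for illustration.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(actual: Sequence[bool], predicted: Sequence[bool]) -> Dict[str, int]:
    """Count true/false positives and negatives for binary labels.

    actual[i]    -- True if document i was later deemed significant
    predicted[i] -- True if the model flagged document i as significant
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for a, p in zip(actual, predicted):
        if p and a:
            counts["tp"] += 1  # flagged and later deemed significant
        elif p:
            counts["fp"] += 1  # flagged but not significant (overprediction)
        elif a:
            counts["fn"] += 1  # significant but not flagged (missed)
        else:
            counts["tn"] += 1  # correctly left unflagged
    return counts


def precision_recall(counts: Dict[str, int]) -> Tuple[float, float]:
    """Return (precision, recall); each is 0.0 when its denominator is zero."""
    flagged = counts["tp"] + counts["fp"]
    significant = counts["tp"] + counts["fn"]
    precision = counts["tp"] / flagged if flagged else 0.0
    recall = counts["tp"] / significant if significant else 0.0
    return precision, recall


# Toy labels, invented for illustration only (not data from the study).
actual = [True, False, False, True, False, True, False, False]
predicted = [True, True, False, False, True, False, False, False]

counts = confusion_counts(actual, predicted)
precision, recall = precision_recall(counts)
print(counts, precision, recall)
```

In this framing, low precision reflects overprediction of significance and low recall reflects significant documents that the model failed to flag.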

**Fig. 1: Performance of the PCI model.**

**Fig. 2: Performance of the SIC model.**

**Fig. 3: Venn diagram of true and false positives and negatives for the PCI and SIC models.**

Data availability

The original data (that is, cable text and metadata) analysed during the current study are available in the History Lab repository (http://history-lab.org) and US National Archives (https://aad.archives.gov/aad/series-description.jsp?s=4073). Derivative data (for example, model scores) are available at https://osf.io/nhmcd/.

Code availability

All code necessary to reproduce our results is available at https://osf.io/nhmcd/.

References

1. Danto, A. C. Analytical Philosophy of History (Cambridge Univ. Press, 1965).
2. Martin, T., Hofman, J. M., Sharma, A., Anderson, A. & Watts, D. J. Exploring limits to prediction in complex social systems. In Proc. 25th International Conference on World Wide Web 683–694 (International World Wide Web Conference Committee, 2016).
3. Hofman, J. M., Sharma, A. & Watts, D. J. Prediction and explanation in social systems. Science 355, 486–488 (2017).
4. Hegel, G. W. F. Hegel’s Philosophy of Right (Clarendon Press, 1942).
5. Bearman, P. S., Faris, R. & Moody, J. Blocking the future: new solutions for old problems in historical social science. Soc. Sci. Hist. 23, 501–533 (1999).
6. Sewell, W. H. Historical events as transformations of structures: inventing revolution at the Bastille. Theory Soc. 25, 841–881 (1996).
7. Kennedy, P. M. The Rise and Fall of the Great Powers: Economic Change and Military Conflict from 1500 to 2000 (Random House, 1987).
8. McAllister, W. B., Botts, J., Cozzens, P. & Marrs, A. W. Toward “Thorough, Accurate, and Reliable”: A History of the Foreign Relations of the United States Series (US Department of State, 2015).
9. Schapire, R. E. in Nonlinear Estimation and Classification 149–171 (Springer, 2003).
10. Chen, T. & Guestrin, C. XGBoost: a scalable tree boosting system. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (Association for Computing Machinery, 2016).
11. Provost, F. & Fawcett, T. Data Science for Business: What You Need to Know About Data Mining and Data-Analytic Thinking (O’Reilly Media, 2013).
12. Blei, D. M., Ng, A. Y. & Jordan, M. I. Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003).
13. Newman, D. J. & Block, S. Probabilistic topic decomposition of an eighteenth‐century American newspaper. J. Am. Soc. Inf. Sci. Technol. 57, 753–767 (2006).
14. Yang, T.-I., Torget, A. & Mihalcea, R. Topic modeling on historical newspapers. In Proc. 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities 96–104 (Association for Computational Linguistics, 2011).
15. Griffiths, T. L. & Steyvers, M. Finding scientific topics. Proc. Natl Acad. Sci. USA 101, 5228–5235 (2004).
16. Blei, D. M. & Lafferty, J. D. A correlated topic model of science. Ann. Appl. Stat. 1, 17–35 (2007).
17. Hall, D., Jurafsky, D. & Manning, C. D. Studying the history of ideas using topic models. In Proc. Conference on Empirical Methods in Natural Language Processing 363–371 (Association for Computational Linguistics, 2008).
18. Barron, A. T., Huang, J., Spang, R. L. & DeDeo, S. Individuals, institutions, and innovation in the debates of the French Revolution. Proc. Natl Acad. Sci. USA 115, 4607–4612 (2018).
19. Watts, D. J. Common sense and sociological explanations. Am. J. Sociol. 120, 313–351 (2014).
20. Tetlock, P. E. Expert Political Judgment: How Good Is It? How Can We Know? (Princeton Univ. Press, 2005).
21. Salganik, M. J., Dodds, P. S. & Watts, D. J. Experimental study of inequality and unpredictability in an artificial cultural market. Science 311, 854–856 (2006).
22. Clauset, A., Larremore, D. B. & Sinatra, R. Data-driven predictions in the science of science. Science 355, 477–480 (2017).
23. Bakshy, E., Hofman, J. M., Mason, W. A. & Watts, D. J. Everyone’s an influencer: quantifying influence on Twitter. In Proc. 4th ACM International Conference on Web Search and Data Mining 65–74 (Association for Computing Machinery, 2011).
24. González-Bailón, S. Decoding the Social World: Data Science and the Unintended Consequences of Communication (MIT Press, 2017).
25. Ferguson, N. Virtual History: Alternatives and Counterfactuals (Hachette UK, 2008).

Acknowledgements

The authors are grateful for support from the team at Columbia University’s History Lab. This work was supported in part by the National Science Foundation (SBE-1637108). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

These authors contributed equally: Joseph Risi, Amit Sharma.

Authors and Affiliations

Microsoft Research New York, New York, NY, USA
Joseph Risi & Duncan J. Watts
Microsoft Research India, Bangalore, India
Amit Sharma & Rohan Shah
Department of History, Columbia University, New York, NY, USA
Matthew Connelly

Contributions

D.J.W. and M.C. conceived and designed the experiments. J.R., A.S. and R.S. performed the experiments and analysed the data. M.C. and R.S. contributed materials. D.J.W. and M.C. wrote the paper.

Corresponding authors

Correspondence to Matthew Connelly or Duncan J. Watts.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–4, Supplementary Tables 1–2 and Supplementary Methods.

Reporting Summary

About this article

Cite this article

Risi, J., Sharma, A., Shah, R. et al. Predicting history. Nat Hum Behav 3, 906–912 (2019). https://doi.org/10.1038/s41562-019-0620-8

Received: 30 October 2018
Accepted: 25 April 2019
Published: 03 June 2019
Issue Date: September 2019
DOI: https://doi.org/10.1038/s41562-019-0620-8