  • Letter

Predicting history

Abstract

Can events be accurately described as historic at the time they are happening? Claims of this sort are in effect predictions about the evaluations of future historians; that is, that they will regard the events in question as significant. Here we provide empirical evidence in support of earlier philosophical arguments [1] that such claims are likely to be spurious and that, conversely, many events that will one day be viewed as historic attract little attention at the time. We introduce a conceptual and methodological framework for applying machine learning prediction models to large corpora of digitized historical archives. We find that although such models can correctly identify some historically important documents, they tend to overpredict historical significance while also failing to identify many documents that will later be deemed important, where both types of error increase monotonically with the number of documents under consideration. On balance, we conclude that historical significance is extremely difficult to predict, consistent with other recent work on intrinsic limits to predictability in complex social systems [2,3]. However, the results also indicate the feasibility of developing ‘artificial archivists’ to identify potentially historic documents in very large digital corpora.
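The two error types named in the abstract, overpredicted significance (false positives) and missed significant documents (false negatives), can be made concrete with a small sketch. This is only an illustration under stated assumptions: the function name `confusion_counts`, the fixed score threshold, and the toy scores and labels are all hypothetical and are not taken from the paper or its models.

```python
from typing import Dict, Sequence


def confusion_counts(
    scores: Sequence[float],
    labels: Sequence[bool],
    threshold: float,
) -> Dict[str, int]:
    """Count outcomes when documents scoring >= threshold are flagged.

    scores: hypothetical model scores, one per document.
    labels: True if the document was later deemed significant.
    """
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for score, is_significant in zip(scores, labels):
        flagged = score >= threshold
        if flagged and is_significant:
            counts["tp"] += 1  # correctly identified as significant
        elif flagged:
            counts["fp"] += 1  # overpredicted significance
        elif is_significant:
            counts["fn"] += 1  # significant document that was missed
        else:
            counts["tn"] += 1  # correctly ignored
    return counts


# Invented toy data for illustration only (not from the study).
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
toy_labels = [True, False, False, True, False, False]

result = confusion_counts(toy_scores, toy_labels, threshold=0.5)
precision = result["tp"] / (result["tp"] + result["fp"])
recall = result["tp"] / (result["tp"] + result["fn"])
print(result, precision, recall)
```

In this toy case one of the three flagged documents is truly significant and one of the two significant documents is missed, so both error types appear at once even on six documents.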

Fig. 1: Performance of the PCI model.
Fig. 2: Performance of the SIC model.
Fig. 3: Venn diagram of true and false positives and negatives for the PCI and SIC models.

Data availability

The original data (that is, cable text and metadata) analysed during the current study are available in the History Lab repository (http://history-lab.org) and US National Archives (https://aad.archives.gov/aad/series-description.jsp?s=4073). Derivative data (for example, model scores) are available at https://osf.io/nhmcd/.

Code availability

All code necessary to reproduce our results is available at https://osf.io/nhmcd/.

References

  1. Danto, A. C. Analytical Philosophy of History (Cambridge Univ. Press, 1965).

  2. Martin, T., Hofman, J. M., Sharma, A., Anderson, A. & Watts, D. J. Exploring limits to prediction in complex social systems. In Proc. 25th International Conference on World Wide Web 683–694 (International World Wide Web Conference Committee, 2016).

  3. Hofman, J. M., Sharma, A. & Watts, D. J. Prediction and explanation in social systems. Science 355, 486–488 (2017).

  4. Hegel, G. W. F. Hegel’s Philosophy of Right (Clarendon Press, 1942).

  5. Bearman, P. S., Faris, R. & Moody, J. Blocking the future: new solutions for old problems in historical social science. Soc. Sci. Hist. 23, 501–533 (1999).

  6. Sewell, W. H. Historical events as transformations of structures: inventing revolution at the Bastille. Theory Soc. 25, 841–881 (1996).

  7. Kennedy, P. M. The Rise and Fall of the Great Powers: Economic Change and Military Conflict from 1500 to 2000 (Random House, 1987).

  8. McAllister, W. B., Botts, J., Cozzens, P. & Marrs, A. W. Toward “Thorough, Accurate, and Reliable”: A History of the Foreign Relations of the United States Series (US Department of State, 2015).

  9. Schapire, R. E. in Nonlinear Estimation and Classification 149–171 (Springer, 2003).

  10. Chen, T. & Guestrin, C. XGBoost: a scalable tree boosting system. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (Association for Computing Machinery, 2016).

  11. Provost, F. & Fawcett, T. Data Science for Business: What You Need to Know About Data Mining and Data-Analytic Thinking (O’Reilly Media, 2013).

  12. Blei, D. M., Ng, A. Y. & Jordan, M. I. Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003).

  13. Newman, D. J. & Block, S. Probabilistic topic decomposition of an eighteenth‐century American newspaper. J. Am. Soc. Inf. Sci. Technol. 57, 753–767 (2006).

  14. Yang, T.-I., Torget, A. & Mihalcea, R. Topic modeling on historical newspapers. In Proc. 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities 96–104 (Association for Computational Linguistics, 2011).

  15. Griffiths, T. L. & Steyvers, M. Finding scientific topics. Proc. Natl Acad. Sci. USA 101, 5228–5235 (2004).

  16. Blei, D. M. & Lafferty, J. D. A correlated topic model of science. Ann. Appl. Stat. 1, 17–35 (2007).

  17. Hall, D., Jurafsky, D. & Manning, C. D. Studying the history of ideas using topic models. In Proc. Conference on Empirical Methods in Natural Language Processing 363–371 (Association for Computational Linguistics, 2008).

  18. Barron, A. T., Huang, J., Spang, R. L. & DeDeo, S. Individuals, institutions, and innovation in the debates of the French Revolution. Proc. Natl Acad. Sci. USA 115, 4607–4612 (2018).

  19. Watts, D. J. Common sense and sociological explanations. Am. J. Sociol. 120, 313–351 (2014).

  20. Tetlock, P. E. Expert Political Judgment: How Good Is It? How Can We Know? (Princeton Univ. Press, 2005).

  21. Salganik, M. J., Dodds, P. S. & Watts, D. J. Experimental study of inequality and unpredictability in an artificial cultural market. Science 311, 854–856 (2006).

  22. Clauset, A., Larremore, D. B. & Sinatra, R. Data-driven predictions in the science of science. Science 355, 477–480 (2017).

  23. Bakshy, E., Hofman, J. M., Mason, W. A. & Watts, D. J. Everyone’s an influencer: quantifying influence on Twitter. In Proc. 4th ACM International Conference on Web Search and Data Mining 65–74 (Association for Computing Machinery, 2011).

  24. González-Bailón, S. Decoding the Social World: Data Science and the Unintended Consequences of Communication (MIT Press, 2017).

  25. Ferguson, N. Virtual History: Alternatives and Counterfactuals (Hachette UK, 2008).

Acknowledgements

The authors are grateful for support from the team at Columbia University’s History Lab. This work was supported in part by the National Science Foundation (SBE-1637108). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Contributions

D.J.W. and M.C. conceived and designed the experiments. J.R., A.S. and R.S. performed the experiments and analysed the data. M.C. and R.S. contributed materials. D.J.W. and M.C. wrote the paper.

Corresponding authors

Correspondence to Matthew Connelly or Duncan J. Watts.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–4, Supplementary Tables 1–2 and Supplementary Methods.

Reporting Summary

About this article

Cite this article

Risi, J., Sharma, A., Shah, R. et al. Predicting history. Nat Hum Behav 3, 906–912 (2019). https://doi.org/10.1038/s41562-019-0620-8

  • DOI: https://doi.org/10.1038/s41562-019-0620-8
