Predicting history

Abstract

Can events be accurately described as historic at the time they are happening? Claims of this sort are in effect predictions about the evaluations of future historians; that is, that they will regard the events in question as significant. Here we provide empirical evidence in support of earlier philosophical arguments¹ that such claims are likely to be spurious and that, conversely, many events that will one day be viewed as historic attract little attention at the time. We introduce a conceptual and methodological framework for applying machine learning prediction models to large corpora of digitized historical archives. We find that although such models can correctly identify some historically important documents, they tend to overpredict historical significance while also failing to identify many documents that will later be deemed important, where both types of error increase monotonically with the number of documents under consideration. On balance, we conclude that historical significance is extremely difficult to predict, consistent with other recent work on intrinsic limits to predictability in complex social systems²,³. However, the results also indicate the feasibility of developing ‘artificial archivists’ to identify potentially historic documents in very large digital corpora.
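The two error types named in the abstract, overpredicting significance and missing documents later deemed important, correspond to false positives and false negatives in a binary classification setting. The following is a minimal sketch of how they can be counted from model scores. It is not the authors' code: the scores, labels, threshold and the names `Confusion` and `confusion_at_threshold` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Confusion:
    """Counts of the four outcomes for a binary 'significant or not' prediction."""
    tp: int  # flagged by the model and later deemed significant
    fp: int  # flagged by the model but not later deemed significant
    fn: int  # not flagged by the model but later deemed significant
    tn: int  # not flagged and not later deemed significant

    @property
    def precision(self) -> float:
        # Share of flagged documents that were in fact significant.
        flagged = self.tp + self.fp
        return self.tp / flagged if flagged else 0.0

    @property
    def recall(self) -> float:
        # Share of significant documents that the model flagged.
        significant = self.tp + self.fn
        return self.tp / significant if significant else 0.0


def confusion_at_threshold(
    scores: Sequence[float],
    labels: Sequence[bool],
    threshold: float,
) -> Confusion:
    """Flag every document whose score is >= threshold and tally the outcomes.

    `scores` are hypothetical model outputs; `labels` mark documents that
    were later judged significant. Both are assumptions of this sketch.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    tp = fp = fn = tn = 0
    for score, is_significant in zip(scores, labels):
        flagged = score >= threshold
        if flagged and is_significant:
            tp += 1
        elif flagged:
            fp += 1
        elif is_significant:
            fn += 1
        else:
            tn += 1
    return Confusion(tp=tp, fp=fp, fn=fn, tn=tn)


if __name__ == "__main__":
    # Made-up example: five documents, two of them later deemed significant.
    example_scores = [0.9, 0.8, 0.4, 0.3, 0.1]
    example_labels = [True, False, True, False, False]
    result = confusion_at_threshold(example_scores, example_labels, threshold=0.5)
    print(result, result.precision, result.recall)
```

Lowering the threshold flags more documents, which typically trades fewer false negatives for more false positives. Precision and recall summarize that trade-off in two numbers.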

Fig. 1: Performance of the PCI model.
Fig. 2: Performance of the SIC model.
Fig. 3: Venn diagram of true and false positives and negatives for the PCI and SIC models.

Data availability

The original data (that is, cable text and metadata) analysed during the current study are available in the History Lab repository (http://history-lab.org) and US National Archives (https://aad.archives.gov/aad/series-description.jsp?s=4073). Derivative data (for example, model scores) are available at https://osf.io/nhmcd/.

Code availability

All code necessary to reproduce our results is available at https://osf.io/nhmcd/.

References

  1. Danto, A. C. Analytical Philosophy of History (Cambridge Univ. Press, 1965).
  2. Martin, T., Hofman, J. M., Sharma, A., Anderson, A. & Watts, D. J. Exploring limits to prediction in complex social systems. In Proc. 25th International Conference on World Wide Web 683–694 (International World Wide Web Conference Committee, 2016).
  3. Hofman, J. M., Sharma, A. & Watts, D. J. Prediction and explanation in social systems. Science 355, 486–488 (2017).
  4. Hegel, G. W. F. Hegel’s Philosophy of Right (Clarendon Press, 1942).
  5. Bearman, P. S., Faris, R. & Moody, J. Blocking the future: new solutions for old problems in historical social science. Soc. Sci. Hist. 23, 501–533 (1999).
  6. Sewell, W. H. Historical events as transformations of structures: inventing revolution at the Bastille. Theory Soc. 25, 841–881 (1996).
  7. Kennedy, P. M. The Rise and Fall of the Great Powers: Economic Change and Military Conflict from 1500 to 2000 (Random House, 1987).
  8. McAllister, W. B., Botts, J., Cozzens, P. & Marrs, A. W. Toward “Thorough, Accurate, and Reliable”: A History of the Foreign Relations of the United States Series (US Department of State, 2015).
  9. Schapire, R. E. in Nonlinear Estimation and Classification 149–171 (Springer, 2003).
  10. Chen, T. & Guestrin, C. XGBoost: a scalable tree boosting system. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (Association for Computing Machinery, 2016).
  11. Provost, F. & Fawcett, T. Data Science for Business: What You Need to Know About Data Mining and Data-Analytic Thinking (O’Reilly Media, 2013).
  12. Blei, D. M., Ng, A. Y. & Jordan, M. I. Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003).
  13. Newman, D. J. & Block, S. Probabilistic topic decomposition of an eighteenth-century American newspaper. J. Am. Soc. Inf. Sci. Technol. 57, 753–767 (2006).
  14. Yang, T.-I., Torget, A. & Mihalcea, R. Topic modeling on historical newspapers. In Proc. 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities 96–104 (Association for Computational Linguistics, 2011).
  15. Griffiths, T. L. & Steyvers, M. Finding scientific topics. Proc. Natl Acad. Sci. USA 101, 5228–5235 (2004).
  16. Blei, D. M. & Lafferty, J. D. A correlated topic model of science. Ann. Appl. Stat. 1, 17–35 (2007).
  17. Hall, D., Jurafsky, D. & Manning, C. D. Studying the history of ideas using topic models. In Proc. Conference on Empirical Methods in Natural Language Processing 363–371 (Association for Computational Linguistics, 2008).
  18. Barron, A. T., Huang, J., Spang, R. L. & DeDeo, S. Individuals, institutions, and innovation in the debates of the French Revolution. Proc. Natl Acad. Sci. USA 115, 4607–4612 (2018).
  19. Watts, D. J. Common sense and sociological explanations. Am. J. Sociol. 120, 313–351 (2014).
  20. Tetlock, P. E. Expert Political Judgment: How Good Is It? How Can We Know? (Princeton Univ. Press, 2005).
  21. Salganik, M. J., Dodds, P. S. & Watts, D. J. Experimental study of inequality and unpredictability in an artificial cultural market. Science 311, 854–856 (2006).
  22. Clauset, A., Larremore, D. B. & Sinatra, R. Data-driven predictions in the science of science. Science 355, 477–480 (2017).
  23. Bakshy, E., Hofman, J. M., Mason, W. A. & Watts, D. J. Everyone’s an influencer: quantifying influence on Twitter. In Proc. 4th ACM International Conference on Web Search and Data Mining 65–74 (Association for Computing Machinery, 2011).
  24. González-Bailón, S. Decoding the Social World: Data Science and the Unintended Consequences of Communication (MIT Press, 2017).
  25. Ferguson, N. Virtual History: Alternatives and Counterfactuals (Hachette UK, 2008).

Acknowledgements

The authors are grateful for support from the team at Columbia University’s History Lab. This work was supported in part by the National Science Foundation (SBE-1637108). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

D.J.W. and M.C. conceived and designed the experiments. J.R., A.S. and R.S. performed the experiments and analysed the data. M.C. and R.S. contributed materials. D.J.W. and M.C. wrote the paper.

Correspondence to Matthew Connelly or Duncan J. Watts.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–4, Supplementary Tables 1–2 and Supplementary Methods.

Reporting Summary
