Letter | Published:

The wisdom of the inner crowd in three large natural experiments


The quality of decisions depends on the accuracy of estimates of relevant quantities. According to the wisdom of crowds principle, accurate estimates can be obtained by combining the judgements of different individuals [1,2]. This principle has been successfully applied to improve, for example, economic forecasts [3–5], medical judgements [6–9] and meteorological predictions [10–13]. Unfortunately, there are many situations in which it is infeasible to collect judgements of others. Recent research proposes that a similar principle applies to repeated judgements from the same person [14]. This paper tests this promising approach on a large scale in a real-world context. Using proprietary data comprising 1.2 million observations from three incentivized guessing competitions, we find that within-person aggregation indeed improves accuracy and that the method works better when there is a time delay between subsequent judgements. However, the benefit pales against that of between-person aggregation: the average of a large number of judgements from the same person is barely better than the average of two judgements from different people.
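
The contrast between within-person and between-person aggregation can be illustrated with a minimal sketch. It assumes a simple additive error model that is not taken from this paper: each estimate equals the true value plus a person-specific bias (shared by all of that person's estimates) plus independent noise. The function names and the variance values are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: an ASSUMED additive error model, not the paper's analysis.
# Each estimate = truth + person-specific bias + independent noise.
# Biases and noise are assumed to have mean zero and to be mutually independent.


def mse_inner(n, var_bias, var_noise):
    """Expected squared error of the mean of n estimates from ONE person.

    The person's bias is shared by all n estimates, so only the noise averages out.
    """
    return var_bias + var_noise / n


def mse_outer(n, var_bias, var_noise):
    """Expected squared error of the mean of n estimates from n DIFFERENT people.

    Biases are assumed independent across people, so both error terms shrink with n.
    """
    return (var_bias + var_noise) / n


if __name__ == "__main__":
    var_bias, var_noise = 1.0, 1.0  # hypothetical values, not estimated from data
    for n in (1, 2, 5, 10, 100):
        print(n, mse_inner(n, var_bias, var_noise), mse_outer(n, var_bias, var_noise))
```

Under these assumed variances, the inner-crowd error levels off at the bias variance however many estimates are averaged, whereas the outer-crowd error keeps falling. With equal bias and noise variance, a very large inner crowd only approaches the error of two different people. Whether real judgements follow this model is an empirical question.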

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


References

  1. Surowiecki, J. The Wisdom of Crowds. Why the Many Are Smarter Than the Few (Doubleday Books, New York, NY, 2004).
  2. Page, S. E. The Difference: How the Power of Diversity Creates Better Groups, Firms, Schools, and Societies (Princeton Univ. Press, Princeton, NJ, 2007).
  3. Clemen, R. T. Combining forecasts: a review and annotated bibliography. Int. J. Forecast. 5, 559–583 (1989).
  4. Armstrong, J. S. in Principles of Forecasting: A Handbook for Researchers and Practitioners (ed. Armstrong, J. S.) 417–439 (Kluwer Academic, Norwell, MA, 2001).
  5. Timmermann, A. in Handbook of Economic Forecasting Vol. 1 (eds Elliott, G. et al.) 135–196 (Elsevier, Amsterdam, 2006).
  6. Kurvers, R. H. J. M., Krause, J., Argenziano, G., Zalaudek, I. & Wolf, M. Detection accuracy of collective intelligence assessments for skin cancer diagnosis. JAMA Dermatol. 151, 1346–1353 (2015).
  7. Wolf, M., Krause, J., Carney, P. A., Bogart, A. & Kurvers, R. H. J. M. Collective intelligence meets medical decision-making: the collective outperforms the best radiologist. PLoS ONE 10, e0134269 (2015).
  8. Kurvers, R. H. J. M. et al. Boosting medical diagnostics by pooling independent judgments. Proc. Natl Acad. Sci. USA 113, 8777–8782 (2016).
  9. Kämmer, J. E., Hautz, W. E., Herzog, S. M., Kunina-Habenicht, O. & Kurvers, R. H. J. M. The potential of collective intelligence in emergency medicine: pooling medical students’ independent decisions improves diagnostic performance. Med. Decis. Making 37, 715–724 (2017).
  10. Sanders, F. On subjective probability forecasting. J. Appl. Meteorol. 2, 191–201 (1963).
  11. Staël von Holstein, C.-A. An experiment in probabilistic weather forecasting. J. Appl. Meteorol. 10, 635–645 (1971).
  12. Vislocky, R. L. & Fritsch, J. M. Improved model output statistics forecasts through model consensus. Bull. Am. Meteorol. Soc. 76, 1157–1164 (1995).
  13. Baars, J. A. & Mass, C. F. Performance of national weather service forecasts compared to operational, consensus, and weighted model output statistics. Weather Forecast. 20, 1034–1047 (2005).
  14. Vul, E. & Pashler, H. Measuring the crowd within: probabilistic representations within individuals. Psychol. Sci. 19, 645–647 (2008).
  15. Kelley, T. L. The applicability of the Spearman–Brown formula for the measurement of reliability. J. Educ. Psychol. 16, 300–303 (1925).
  16. Stroop, J. R. Is the judgment of the group better than that of the average member of the group? J. Exp. Psychol. 15, 550–562 (1932).
  17. Preston, M. G. Note on the reliability and the validity of the group judgment. J. Exp. Psychol. 22, 462–471 (1938).
  18. Eysenck, H. J. The validity of judgments as a function of the number of judges. J. Exp. Psychol. 25, 650–654 (1939).
  19. Hogarth, R. M. A note on aggregating opinions. Organ. Behav. Hum. Perform. 21, 40–46 (1978).
  20. Galton, F. Vox populi. Nature 75, 450–451 (1907).
  21. Galton, F. The ballot-box. Nature 75, 509–510 (1907).
  22. Galton, F. Memories of My Life (Methuen & Co, London, 1908).
  23. Gordon, K. Group judgments in the field of lifted weights. J. Exp. Psychol. 7, 398–400 (1924).
  24. Jenness, A. The role of discussion in changing opinion regarding a matter of fact. J. Abnorm. Soc. Psychol. 27, 279–296 (1932).
  25. Gordon, K. Further observations on group judgments of lifted weights. J. Psychol. 1, 105–115 (1935).
  26. Klugman, S. F. Group judgments for familiar and unfamiliar materials. J. Gen. Psychol. 32, 103–110 (1945).
  27. Treynor, J. L. Market efficiency and the bean jar experiment. Financ. Anal. J. 43, 50–53 (1987).
  28. Blackwell, C. & Pickford, R. The wisdom of the few or the wisdom of the many? An indirect test of the marginal trader hypothesis. J. Econ. Finan. 35, 164–180 (2011).
  29. Lorenz, J., Rauhut, H., Schweitzer, F. & Helbing, D. How social influence can undermine the wisdom of crowd effect. Proc. Natl Acad. Sci. USA 108, 9020–9025 (2011).
  30. Ariely, D. et al. The effects of averaging subjective probability estimates between and within judges. J. Exp. Psychol. Appl. 6, 130–147 (2000).
  31. Herzog, S. M. & Hertwig, R. The wisdom of many in one mind: Improving individual judgments with dialectical bootstrapping. Psychol. Sci. 20, 231–237 (2009).
  32. Müller-Trede, J. Repeated judgment sampling: boundaries. Judgm. Decis. Mak. 6, 283–294 (2011).
  33. Rauhut, H. & Lorenz, J. The wisdom of crowds in one mind: how individuals can simulate the knowledge of diverse societies to reach better decisions. J. Math. Psychol. 55, 191–197 (2011).
  34. Herzog, S. M. & Hertwig, R. Think twice and then: combining or choosing in dialectical bootstrapping? J. Exp. Psychol. Learn. Mem. Cogn. 40, 218–232 (2014).
  35. Krueger, J. I. & Chen, L. J. The first cut is the deepest: effects of social projection and dialectical bootstrapping on judgmental accuracy. Soc. Cogn. 32, 315–336 (2014).
  36. Herzog, S. M. & Hertwig, R. Harnessing the wisdom of the inner crowd. Trends Cogn. Sci. 18, 504–506 (2014).
  37. Dehaene, S., Izard, V., Spelke, E. & Pica, P. Log or linear? Distinct intuitions of the number scale in Western and Amazonian indigene cultures. Science 320, 1217–1220 (2008).
  38. Dehaene, S. The Number Sense. How the Mind Creates Mathematics (Oxford Univ. Press, Oxford, 1997).
  39. Nieder, A. Counting on neurons: the neurobiology of numerical competence. Nat. Rev. Neurosci. 6, 177–190 (2005).
  40. Siegler, R. S. & Opfer, J. E. The development of numerical estimation: evidence for multiple representations of numerical quantity. Psychol. Sci. 14, 237–243 (2003).
  41. Siegler, R. S. & Booth, J. L. Development of numerical estimation in young children. Child Dev. 75, 428–444 (2004).
  42. Booth, J. L. & Siegler, R. S. Developmental and individual differences in pure numerical estimation. Dev. Psychol. 42, 189–201 (2006).
  43. Berteletti, I., Lucangeli, D., Piazza, M., Dehaene, S. & Zorzi, M. Numerical estimation in preschoolers. Dev. Psychol. 46, 545–551 (2010).
  44. Hooker, R. Mean or median. Nature 75, 487–488 (1907).
  45. Genest, C. & Zidek, J. V. Combining probability distributions: a critique and an annotated bibliography. Stat. Sci. 1, 114–135 (1986).
  46. Dawid, A. P. et al. Coherent combination of experts’ opinions. Test 4, 263–313 (1995).
  47. Genre, V., Kenny, G., Meyler, A. & Timmermann, A. Combining expert forecasts: can anything beat the simple average? Int. J. Forecast. 29, 108–121 (2013).
  48. Baron, J., Mellers, B. A., Tetlock, P. E., Stone, E. & Ungar, L. H. Two reasons to make aggregated probability forecasts more extreme. Decis. Anal. 11, 133–145 (2014).
  49. Satopää, V. A. et al. Combining multiple probability predictions using a simple logit model. Int. J. Forecast. 30, 344–356 (2014).
  50. Larrick, R. P. & Soll, J. B. Intuitions about combining opinions: misappreciation of the averaging principle. Manage. Sci. 52, 111–127 (2006).
  51. Mannes, A. E. Are we wise about the wisdom of crowds? The use of group judgments in belief revision. Manage. Sci. 55, 1267–1279 (2009).
  52. Fraundorf, S. H. & Benjamin, A. S. Knowing the crowd within: metacognitive limits on combining multiple judgments. J. Mem. Lang. 71, 17–38 (2014).
  53. Hourihan, K. L. & Benjamin, A. S. Smaller is better (when sampling from the crowd within): low memory-span individuals benefit more from multiple opportunities for estimation. J. Exp. Psychol. Learn. Mem. Cogn. 36, 1068–1074 (2010).
  54. Steegen, S., Dewitte, L., Tuerlinckx, F. & Vanpaemel, W. Measuring the crowd within again: a pre-registered replication study. Front. Psychol. 5, 786 (2014).
  55. Krogh, A. & Vedelsby, J. in Advances in Neural Information Processing Systems Vol. 7 (eds Tesauro, G. et al.) 231–238 (MIT Press, Cambridge, MA, 1995).

Acknowledgements


We thank Holland Casino for providing the data, and A. Baillon, S. Herzog, A. Lucas, L. Molleman, A. Opschoor, R. Potter van Loon, V. Spinu, and L. Wolk for their constructive and valuable comments. The paper has benefited from discussions with seminar participants at the Max Planck Institute for Human Development, Carnegie Mellon University and the University of Nottingham, and with participants of the 2015 NIBS workshop, SPUDM 2015 Budapest, WESSI 2016 Abu Dhabi, IMEBESS 2016 Rome, TIBER 2016 Tilburg and BFWG 2017 London. We gratefully acknowledge support from the Netherlands Organisation for Scientific Research (NWO) and from the Economic and Social Research Council via the Network for Integrated Behavioural Sciences (ES/K002201/1). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

D.v.D. and M.J.v.d.A. designed the research, performed the research, contributed new analytic tools, analysed the data, and wrote the paper.

Competing interests

The authors declare no competing interests.

Correspondence to Dennie van Dolder.

Electronic supplementary material

  1. Supplementary Information

    Supplementary Notes, Supplementary Notes 2, Supplementary Tables 1–4, Supplementary Figures 1–18

  2. Life Sciences Reporting Summary

  3. Experiment code

Fig. 1: MSE of the inner crowd and the outer crowd as a function of the number of included estimates.
Fig. 2
Fig. 3: Values of \(T_t^{*}\) for different delays.