  • Letter

A robust data-driven approach identifies four personality types across four large data sets

Matters Arising to this article was published on 16 September 2019


Understanding human personality has been a focus for philosophers and scientists for millennia [1]. It is now widely accepted that there are about five major personality domains that describe the personality profile of an individual [2,3]. In contrast to personality traits, the existence of personality types remains extremely controversial [4]. Despite the various purported personality types described in the literature, small sample sizes and the lack of reproducibility across data sets and methods have led to inconclusive results about personality types [5,6]. Here we develop an alternative approach to the identification of personality types, which we apply to four large data sets comprising more than 1.5 million participants. We find robust evidence for at least four distinct personality types, extending and refining previously suggested typologies. We show that these types appear as a small subset of a much more numerous set of spurious solutions in typical clustering approaches, highlighting principal limitations in the blind application of unsupervised machine learning methods to the analysis of big data.
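The point about spurious clustering solutions can be illustrated with a toy example. The sketch below is not the authors' pipeline: it assumes plain k-means and a simple null model in which each coordinate is shuffled independently across individuals, and every function name in it is hypothetical. It shows that a clustering routine always returns the requested number of clusters, so a partition is only informative if it is tighter than the one found in shuffled data.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_inertia(points, k, rng, n_iter=50):
    """Run Lloyd's k-means once; return the within-cluster sum of squares."""
    centres = rng.sample(points, k)
    for _ in range(n_iter):
        # Assign every point to its nearest centre.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centres[j]))
            groups[nearest].append(p)
        # Move each centre to the mean of its group (keep it if the group is empty).
        new_centres = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centres[j]
            for j, g in enumerate(groups)
        ]
        if new_centres == centres:
            break
        centres = new_centres
    return sum(min(sq_dist(p, c) for c in centres) for p in points)


def best_inertia(points, k, rng, restarts=5):
    """Lowest inertia over several random initialisations."""
    return min(kmeans_inertia(points, k, rng) for _ in range(restarts))


def shuffle_columns(points, rng):
    """Null model (an assumption of this sketch): permute each coordinate
    independently, keeping the marginals but destroying joint structure."""
    columns = [list(col) for col in zip(*points)]
    for col in columns:
        rng.shuffle(col)
    return list(zip(*columns))


def structure_ratio(points, k, rng):
    """Inertia of the data divided by inertia of its shuffled counterpart."""
    null_points = shuffle_columns(points, rng)
    return best_inertia(points, k, rng) / best_inertia(null_points, k, rng)


rng = random.Random(0)
# Structureless data: two independent Gaussian coordinates.
noise = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
# Structured data: two well-separated groups.
blobs = [(rng.gauss(m, 1), rng.gauss(m, 1)) for m in (-3, 3) for _ in range(100)]

noise_ratio = structure_ratio(noise, 4, rng)  # k-means still "finds" 4 clusters
blob_ratio = structure_ratio(blobs, 2, rng)
print(f"structureless data, k=4: ratio = {noise_ratio:.2f}")
print(f"two real groups,    k=2: ratio = {blob_ratio:.2f}")
```

A ratio close to 1 suggests that the partition is no tighter than what shuffled data would give, whereas a ratio well below 1 suggests structure beyond the marginal distributions. The choice of k-means and of inertia as the comparison statistic is for brevity only.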

Fig. 1: Uncertainty in the ARC-type classification.
Fig. 2: Clustering reveals four meaningful personality types.
Fig. 3: Replicability of personality types in three independent data sets.
Fig. 4: The composition of four meaningful clusters is correlated with age and gender and is stable across different data sets.

Data availability

Data are available from (Johnson-300 and Johnson-120), (myPersonality-100) and (BBC-44).


References

  1. Revelle, W., Wilt, J. & Condon, D. M. in The Wiley-Blackwell Handbook of Individual Differences (eds Chamorro-Premuzic, T. et al.) 1–38 (Wiley-Blackwell, Oxford, 2013).

  2. McCrae, R. R. & Costa, P. T. in The SAGE Handbook of Personality Theory and Assessment: Volume 1 Personality Theories and Models (eds Boyle, G. J. et al.) 273–294 (SAGE, London, 2008).

  3. Widiger, T. A. The Oxford Handbook of the Five Factor Model of Personality (Oxford Univ. Press, Oxford, 2015).

  4. McCrae, R. R., Terracciano, A., Costa, P. T. & Ozer, D. J. Person-factors in the California adult Q-set: closing the door on personality trait types? Eur. J. Pers. 20, 29–44 (2006).

  5. Donnellan, M. B. & Robins, R. W. Resilient, overcontrolled, and undercontrolled personality types: issues and controversies. Soc. Pers. Psychol. Compass 4, 1070–1083 (2010).

  6. Specht, J., Luhmann, M. & Geiser, C. On the consistency of personality types across adulthood: latent profile analyses in two large-scale panel studies. J. Pers. Soc. Psychol. 107, 540–556 (2014).

  7. Goldberg, L. R. An alternative “description of personality”: the Big-Five factor structure. J. Pers. Soc. Psychol. 59, 1216–1229 (1990).

  8. Costa, P. T. & McCrae, R. R. NEO PI-R Professional Manual (Psychological Assessment Resources, Odessa, FL, 1992).

  9. Ozer, D. J. & Benet-Martínez, V. Personality and the prediction of consequential outcomes. Annu. Rev. Psychol. 57, 401–421 (2006).

  10. Widiger, T. A. & Costa, P. T. Jr. Personality Disorders and the Five-Factor Model of Personality 3rd edn (American Psychological Association, Washington DC, 2013).

  11. Asendorpf, J. B., Borkenau, P., Ostendorf, F. & Van Aken, M. A. G. Carving personality description at its joints: confirmation of three replicable personality prototypes for both children and adults. Eur. J. Pers. 15, 169–198 (2001).

  12. Robins, R. W., John, O. P., Caspi, A., Moffitt, T. E. & Stouthamer-Loeber, M. Resilient, overcontrolled, and undercontrolled boys: three replicable personality types. J. Pers. Soc. Psychol. 70, 157–171 (1996).

  13. Caspi, A. & Silva, P. A. Temperamental qualities at age three predict personality traits in young adulthood: longitudinal evidence from a birth cohort. Child Dev. 66, 486–498 (1995).

  14. Block, J. Lives Through Time (Bancroft Press, Berkeley, CA, 1971).

  15. Costa, P. T., Herbst, J. H., McCrae, R. R., Samuels, J. & Ozer, D. J. The replicability and utility of three personality types. Eur. J. Pers. 16, S73–S87 (2002).

  16. Herzberg, P. Y. & Roth, M. Beyond resilients, undercontrollers, and overcontrollers? An extension of personality prototype research. Eur. J. Pers. 20, 5–28 (2006).

  17. Altman, N. & Krzywinski, M. Points of significance: clustering. Nat. Methods 14, 545–546 (2017).

  18. Ashton, M. C. & Lee, K. An investigation of personality types within the HEXACO personality framework. J. Individ. Differ. 30, 181–187 (2009).

  19. Isler, L., Fletcher, G. J. O., Liu, J. H. & Sibley, C. G. Validation of the four-profile configuration of personality types within the Five-Factor model. Pers. Individ. Dif. 106, 257–262 (2017).

  20. Rentfrow, P. J. et al. Divided we stand: three psychological regions of the United States and their political, economic, social, and health correlates. J. Pers. Soc. Psychol. 105, 996–1012 (2013).

  21. Rentfrow, P. J., Jokela, M. & Lamb, M. E. Regional personality differences in Great Britain. PLoS ONE 10, e0122245 (2015).

  22. Revelle, W. et al. in SAGE Handbook of Online Research Methods (eds Fielding, N. G. et al.) 578–595 (SAGE, London, 2016).

  23. Jain, A. K. Data clustering: 50 years beyond k-means. Pattern Recogn. Lett. 31, 651–666 (2010).

  24. Goldberg, L. R. in Personality Psychology in Europe Vol. 7 (eds Mervielde, I., Deary, I., De Fruyt, F. & Ostendorf, F.) 7–28 (Tilburg Univ. Press, Tilburg, 1999).

  25. Revelle, W. An Introduction to Psychometric Theory with Applications in R (Personality Project, 2017);

  26. Costa, P. T. & McCrae, R. in The Oxford Handbook of the Five Factor Model (ed. Widiger, T. A.) 1–52 (Oxford Univ. Press, Oxford, 2015).

  27. Burnham, K. P. & Anderson, D. R. Model Selection and Multimodel Inference 2nd edn (Springer, New York, NY, 2002).

  28. Schwarz, G. Estimating the dimension of a model. Ann. Stat. 6, 461–464 (1978).

  29. Fortunato, S. & Barthelemy, M. Resolution limit in community detection. Proc. Natl Acad. Sci. USA 104, 36–41 (2007).

  30. Lancichinetti, A. et al. A high-reproducibility and high-accuracy method for automated topic classification. Phys. Rev. X 5, 011007 (2015).

  31. Horn, J. L. A rationale and test for the number of factors in factor analysis. Psychometrika 30, 179–185 (1965).

  32. Xie, X., Chen, W., Lei, L., Xing, C. & Zhang, Y. The relationship between personality types and prosocial behavior and aggression in Chinese adolescents. Pers. Individ. Dif. 95, 56–61 (2016).

  33. Terracciano, A., McCrae, R. R., Brent, L. J. & Costa, P. T. Hierarchical linear modeling analyses of the NEO-PI-R scales in the Baltimore longitudinal study of aging. Psychol. Aging 20, 493–506 (2005).

  34. Meeus, W., Van de Schoot, R., Klimstra, T. & Branje, S. Personality types in adolescence: change and stability and links with adjustment and relationships: a five-wave longitudinal study. Dev. Psychol. 47, 1181–1195 (2011).

  35. Eysenck, H. J. & Eysenck, M. W. Personality and Individual Differences: a Natural Science Approach (Plenum Press, New York, NY, 1985).

  36. Johnson, J. A. Measuring thirty facets of the Five Factor model with a 120-item public domain inventory: development of the IPIP-NEO-120. J. Res. Pers. 51, 78–89 (2014).

  37. Condon, D. M. The SAPA personality inventory: an empirically-derived, hierarchically-organized self-report personality assessment model. Preprint at (2018).

  38. Vazire, S. & Mehl, M. Knowing me, knowing you: the accuracy and unique predictive validity of self-ratings and other-ratings of daily behavior. J. Pers. Soc. Psychol. 95, 1202–1216 (2008).

  39. Paulhus, D. L. & Vazire, S. in Handbook of Research Methods in Personality Psychology (eds Robins, R. W. et al.) 224–239 (Guilford, New York, NY, 2007).

  40. Chapman, B. & Goldberg, L. Replicability and 40-year predictive power of childhood ARC types. J. Pers. Soc. Psychol. 101, 593–606 (2011).

  41. Steca, P., Alessandri, G. & Caprara, G. V. The utility of a well-known personality typology in studying successful aging: resilients, undercontrollers, and overcontrollers in old age. Pers. Individ. Dif. 48, 442–446 (2010).

  42. Kosinski, M., Matz, S., Gosling, S., Popov, V. & Stillwell, D. Facebook as a social science research tool: opportunities, challenges, ethical considerations and practical guidelines. Am. Psychol. 70, 543–556 (2015).

  43. University of Cambridge, Department of Psychology, British Broadcasting Corporation BBC Big Personality Test, 2009–2011: Dataset for Mapping Personality across Great Britain [data collection] (UK Data Service, 2015);

  44. Gosling, S. D., Vazire, S., Srivastava, S. & John, O. P. Should we trust web-based studies? A comparative analysis of six preconceptions about internet questionnaires. Am. Psychol. 59, 93–104 (2004).

  45. Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning 2nd edn (Springer, New York, NY, 2009).

  46. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).

  47. Kaiser, H. F. The varimax criterion for analytic rotation in factor analysis. Psychometrika 23, 187–200 (1958).

  48. Factor rotation. Python code for factor rotation (GitHub, 2017);

  49. Carroll, J. An analytical solution for approximating simple structure in factor analysis. Psychometrika 18, 23–38 (1953).

  50. Bishop, C. Pattern Recognition and Machine Learning (Springer, New York, NY, 2006).


Acknowledgements

L.A.N.A. thanks the John and Leslie McQuown Gift and support from the Department of Defense Army Research Office under grant number W911NF-14-1-0259. W.R.’s work was partially supported by a grant from the National Science Foundation: SMA-1419324. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. We thank J. Johnson for making the Johnson-300 and the Johnson-120 data sets publicly available; D. Stillwell, M. Kosinski and the myPersonality project for sharing the myPersonality-100 data; and the BBC LabUK for making the BBC-44 data set publicly available.

Author information

Authors and Affiliations



Contributions

M.G., B.F., W.R. and L.A.N.A. designed the research. M.G., B.F., W.R. and L.A.N.A. performed the research. M.G. and B.F. analysed the data. M.G., W.R. and L.A.N.A. wrote the paper.

Corresponding author

Correspondence to Luís A. Nunes Amaral.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figures 1–17; Supplementary Table 1; Supplementary Methods; Supplementary References

Reporting Summary

About this article

Cite this article

Gerlach, M., Farb, B., Revelle, W. et al. A robust data-driven approach identifies four personality types across four large data sets. Nat Hum Behav 2, 735–742 (2018).
