  • Letter

A robust data-driven approach identifies four personality types across four large data sets

Matters Arising to this article was published on 16 September 2019


Understanding human personality has been a focus for philosophers and scientists for millennia [1]. It is now widely accepted that there are about five major personality domains that describe the personality profile of an individual [2,3]. In contrast to personality traits, the existence of personality types remains extremely controversial [4]. Despite the various purported personality types described in the literature, small sample sizes and the lack of reproducibility across data sets and methods have led to inconclusive results about personality types [5,6]. Here we develop an alternative approach to the identification of personality types, which we apply to four large data sets comprising more than 1.5 million participants. We find robust evidence for at least four distinct personality types, extending and refining previously suggested typologies. We show that these types appear as a small subset of a much more numerous set of spurious solutions in typical clustering approaches, highlighting principal limitations in the blind application of unsupervised machine learning methods to the analysis of big data.
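
The point about spurious clustering solutions can be illustrated with a toy example. The abstract does not name the clustering algorithm or the statistical test used, so the sketch below is an illustration under assumptions and not the authors' procedure: it uses plain k-means as a stand-in for a "typical clustering approach", and it compares the share of points near each cluster centre with the share found after shuffling every trait column independently, which destroys any joint structure while keeping the marginals. The function names (`kmeans`, `enrichment`, and the helpers) and all parameter values are hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=30, seed=0):
    """Plain Lloyd's k-means; always returns k centres, structure or not."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            groups[nearest].append(p)
        # Move each centre to the mean of its group (keep it if the group is empty).
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return centers


def local_density(points, center, radius):
    """Fraction of points lying within `radius` of `center`."""
    r2 = radius ** 2
    return sum(1 for p in points if sq_dist(p, center) <= r2) / len(points)


def shuffled_copy(points, rng):
    """Shuffle each column independently: same marginals, no joint structure."""
    cols = [list(c) for c in zip(*points)]
    for c in cols:
        rng.shuffle(c)
    return list(zip(*cols))


def enrichment(points, center, radius, n_null=20, seed=1):
    """Observed local density divided by the mean density under the shuffled null."""
    rng = random.Random(seed)
    observed = local_density(points, center, radius)
    null = [
        local_density(shuffled_copy(points, rng), center, radius)
        for _ in range(n_null)
    ]
    mean_null = sum(null) / n_null
    return observed / mean_null if mean_null > 0 else float("inf")


if __name__ == "__main__":
    rng = random.Random(42)
    # Structureless data: two independent Gaussian "traits" (an assumption
    # made for this demo, not real questionnaire data).
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(1000)]
    for c in kmeans(data, k=4):
        print(c, round(enrichment(data, c, radius=1.0), 2))
```

On structureless data k-means still reports four centres, but their enrichment over the shuffled null stays close to 1, whereas a centre sitting on a genuinely over-populated region would score clearly above 1. Only centres of the second kind would be candidates for meaningful types in this toy setting.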

Fig. 1: Uncertainty in the ARC-type classification.
Fig. 2: Clustering reveals four meaningful personality types.
Fig. 3: Replicability of personality types in three independent data sets.
Fig. 4: The composition of four meaningful clusters is correlated with age and gender and is stable across different data sets.

Data availability

Data are available from (Johnson-300 and Johnson-120), (myPersonality-100) and (BBC-44).

Acknowledgements


L.A.N.A. acknowledges the John and Leslie McQuown Gift and support from the Department of Defense Army Research Office under grant number W911NF-14-1-0259. W.R.’s work was partially supported by National Science Foundation grant SMA-1419324. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. We thank J. Johnson for making the Johnson-300 and Johnson-120 data sets publicly available; D. Stillwell, M. Kosinski and the myPersonality project for sharing the myPersonality-100 data; and the BBC LabUK for making the BBC-44 data set publicly available.

Author information

Contributions



M.G., B.F., W.R. and L.A.N.A. designed and performed the research. M.G. and B.F. analysed the data. M.G., W.R. and L.A.N.A. wrote the paper.

Corresponding author

Correspondence to Luís A. Nunes Amaral.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figures 1–17; Supplementary Table 1; Supplementary Methods; Supplementary References

Reporting Summary

Gerlach, M., Farb, B., Revelle, W. et al. A robust data-driven approach identifies four personality types across four large data sets. Nat Hum Behav 2, 735–742 (2018).

