Introduction

Humans differ significantly from one another. Some people's idea of fun is partying all night long, while others enjoy binging on a TV series while eating snacks; some are extremely intelligent, and others less so; some are hot-headed, and others remain cool, no matter what. Because of this variety, predicting humans' thoughts, feelings, and behaviors is a daunting task; nonetheless, we attempt to solve this task on a daily basis. For example, when we decide whom to marry, we try to predict whether we can depend on the other person till death do us part; when we choose a career, we must do our best to predict whether we will be successful and fulfilled in a given profession.

In order to predict a person’s thoughts, feelings, and behaviors, people often have no other option but to generate something akin to a scientific theory1—a parsimonious model that attempts to capture the unique characteristics of individuals, and that could be used to predict their behavior in novel circumstances. Indeed, research shows that people employ such theories when predicting their own2 and others’ behaviors. Unfortunately, theories based strictly on intuition are often highly inaccurate3, even if produced by professional psychological theoreticians4. In light of this, ever since the early days of psychology research, scholars have been attempting to devise personality models using the scientific method, giving rise to the longstanding field of personality science.

Personality, when used as a scientific term, refers to the mental features of individuals that characterize them across different situations, and thus can be used to predict their behavior. In the early years of personality research, scientists generated numerous competing theories and measures, but struggled to arrive at a scientific consensus regarding the core structure of human personality. In recent decades, a consensus theory of the core dimensions of human personality has emerged—the Five Factor Model (FFM).

The FFM emerged from the so-called "lexical paradigm", which assumes that if people regularly exhibit a form of behavior that is meaningful to human life, then language will produce a term to describe it5. Given this assumption, personality psychologists asked individuals to rate themselves on lists of common English-language trait words (e.g., friendly, upbeat), and then developed and used early dimensionality-reduction methods to find a parsimonious model that could account for much of the variability in each person's trait ratings5.

Much research shows that these five factors, often termed the "Big Five", are relatively stable over time and have convergent and discriminant validity across methods and observers6. Moreover, research into the FFM has replicated its dimensional structure in different samples, languages, and cultures7,8 (but see9 for a recent criticism). In light of this, the FFM is taken by some to reflect a comprehensive ontology of the psychological makeup of human beings10; according to McCrae and Costa11, the five factors are "both necessary and reasonably sufficient for describing at a global level the major features of personality".

Surely, human beings are complex entities, and their personality is not fully captured by five dimensions; however, the importance of having a parsimonious model of humans' psychological diversity cannot be overstated. As noted by John and Srivastava12, a parsimonious taxonomy "permits researchers to study specified domains of personality characteristics, rather than examining separately the thousands of particular attributes that make human beings individual and unique." Moreover, as they note, such a taxonomy greatly facilitates "the accumulation and communication of empirical findings by offering a standard vocabulary, or nomenclature".

An additional consequence of having a parsimonious model of the core dimensions of human personality is that such an abstraction enables the acquisition of novel knowledge via statistical learning (see13 for a discussion of the importance of abstract representations in learning); namely, whereas the estimation of covariances between high-dimensional vectors is often highly unreliable (the so-called "curse of dimensionality"14), learning the statistical correlates of a low-dimensional structure is a more tractable problem. For example, research has shown that participants' self-reported ratings on the FFM dimensions can be reliably estimated from their digital footprint15.

This ability to infer individuals' personality traits using machine learning also raises serious concerns, as it may be used for effective psychological manipulation of the public. In 2013, a private company named Cambridge Analytica harvested the data of Facebook users, and used statistical methods to infer the personality characteristics of hundreds of millions of Americans16. This psychological profile of the American population was supposedly used by the Trump campaign in an attempt to tailor political advertisements to an individual's specific personality profile. While the success of these methods remains unclear, given the vast amount of data accumulated by companies such as Alphabet and Meta, the potential dangers of machine-learning-based psychological profiling are taken by many to be a serious threat to democracy17.

Even if dubious entities indeed manage to acquire the Big Five personality profile of entire populations, it is far from obvious that such information could be used to generate actionable predictions. Indeed, the FFM was criticized by some researchers for its somewhat limited contribution to predicting outcomes on meaningful dimensions18,19,20. In light of such claims, some have argued that the public concern over the Cambridge Analytica scandal was overblown21 (but see22 for evidence for potential reasons for concern).

Roberts et al.23 present a counter-argument to critical stances on the predictive accuracy of the FFM, noting that: "As research on the relative magnitude of effects has documented, personality psychologists should not apologize for correlations between 0.10 and 0.30, given that the effect sizes found in personality psychology are no different than those found in other fields of inquiry." While this claim is clearly true, there is also no doubt that such correlations (which translate to explained variance in the range of 1–9%) potentially leave room for improvement in terms of predictive accuracy.

If one's goal is to find a parsimonious representation of personality that has better predictive accuracy than the FFM, it could be instructive to remember that the statistical method by which the FFM was produced—namely, Factor Analysis—is not geared towards prediction. Factor analysis is an unsupervised dimensionality-reduction method (i.e., a method that maps the original data to a new, lower-dimensional space without utilizing information regarding outcomes) aimed at maximizing explanatory coherence and semantic interpretability, rather than maximizing predictive ability. It does so by finding a parsimonious, low-dimensional representation (e.g., the Big Five factors: extraversion, neuroticism, and so on) that maximizes the variance explained in the higher-dimensional domain (e.g., hundreds of responses to questionnaire items such as "I am lazy" or "I enjoy meeting new people"). Advances in statistics and machine learning have opened up new techniques for supervised dimensionality reduction: methods that reduce the dimensionality of a source domain (i.e., predictor variables, \(X_1, \ldots, X_n\); in the case of personality, hundreds of questionnaire items) with the objective of maximizing the capacity of the lower-dimensional representation to predict outcomes in a target domain (outcome variables, \(Y_1, \ldots, Y_m\); for example, depression, risky behavior, workplace performance).

Such techniques, wherein dimensionality reduction is achieved by maximizing predictive accuracy across a host of target-domain outcomes, hold the potential of providing psychologists with parsimonious models of the psychological feature space that serve as relatively "generalizable predictors" of important aspects of human behavior. Moreover, they may demonstrate that privacy leaks, à la Cambridge Analytica, are indeed a serious threat to democracy, despite being dismissed by some as science fiction.

In light of this, we investigated whether a supervised dimensionality-reduction approach that takes into account a host of meaningful outcomes can improve the predictive performance of personality models. Such an approach could pave the way to a new family of personality models and could advance the study of personality. Alternatively, it may very well be the case that the FFM indeed "carves nature at its joints" and provides the most accurate ontology of the psychological proclivities of humans. In such a case, the FFM may remain the best predictive model of personality, and our approach will not provide improvements in prediction.

In order to examine this question, we conducted three studies. In Study 1, we built a supervised learning model using a large dataset of personality questionnaire items and diverse, important life outcomes. We reduced the dimensionality of 100 questionnaire items into a set of five dimensions, with the objective of simultaneously minimizing prediction errors across ten meaningful life outcomes. We hypothesized that the resulting five-dimensional representation would outperform the FFM representation when fitting a new model and attempting to predict the ten important outcomes on a held-out dataset. Next, in Studies 2 and 3, we explored the performance of the resulting model on new outcome variables.

Study 1

Method

Participants

The analyses relied on the myPersonality dataset, which was collected between 2007 and 2012 via the myPersonality Facebook application. The myPersonality database is no longer shared by its creators for additional use. We received approval to download the data from the administrators of myPersonality on January 7th, 2018, and downloaded the data shortly thereafter. After the myPersonality database was taken down in 2018, we emailed the administrators (on June 8th, 2018) and received confirmation that we could use the data we had already downloaded. The application enabled its users to take various validated psychological and psychometric tests, such as different versions of the International Personality Item Pool (IPIP) questionnaire. Many participants also provided informed consent for researchers to access their Facebook usage details (e.g., liked pages). Participation was voluntary and likely motivated by people's desire for self-knowledge24. The participants in the myPersonality database are relatively representative of the overall population25. All participants provided informed consent for the data they provided to be used in subsequent psychological studies. We used data from 397,851 participants (210,279 females, 142,497 males, and 44,805 who did not identify) who answered all of the questions on the 100-item IPIP representation of Goldberg's26 markers for the FFM, which is freely available for all types of use. Participants' mean age was 25.7 years (SD = 8.84). The study was approved by the Institutional Review Board of Ben-Gurion University and was conducted in accordance with relevant guidelines and regulations.

Measures

Dependent variables

We sought to use supervised learning in order to find a low-dimensional representation of personality that can be used to predict psychological consequences across a diverse set of domains. We thus focused on ten meaningful outcome variables that were available in the myPersonality database and that cover many dimensions of human life that psychologists care about:

(1) Intelligence Quotient (IQ), measured with a brief 20-item version of Raven's Standard Progressive Matrices test27.

(2) Well-being, measured with the Satisfaction with Life scale28.

Personal values, measured using two scores representing the two axes of Schwartz's Values Survey:

(3) Self-transcendence vs. Self-enhancement values and

(4) Openness to Change vs. Conservation values29.

(5) Empathy, measured with the Empathy Quotient Scale30.

(6) Depression, measured with The Center for Epidemiologic Study Depression (CES-D) scale31.

(7) Risky behavior, measured with a single-item question concerning illegal drug use.

(8) Self-reports of legal yet unhealthy behavior, measured by averaging two single-item questions concerning alcohol consumption and smoking.

(9) A single-item self-report of political ideology.

(10) The number of friends participants had on the social network Facebook.

Independent variables

Our independent variables were the participants' answers to the 100 questions included in the IPIP-100 questionnaire32. In this questionnaire, participants rate their agreement with various statements related to different behaviors in their life and their general characteristics and competencies, on a scale from 1 (strongly disagree) to 5 (strongly agree). The original purpose of this questionnaire is to reliably gauge participants' scores on each of the FFM dimensions. It includes five subscales, each containing 20 items; the factor score for each FFM dimension can be calculated as a simple average of these 20 items (after reverse-coding some items), as in the sketch below. In the current research, we treat each item from this list of 100 questions as a separate independent variable, and seek to reduce the dimensionality of this vector using supervised learning.
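To make this conventional scoring concrete, the following R sketch computes FFM factor scores under the assumptions just described; `ipip`, `key`, and `reversed` are hypothetical names (an n × 100 response table, an item-to-factor mapping, and reverse-keyed item flags) and are not part of the original materials.

```r
# Hedged sketch of conventional FFM scoring from IPIP-100 responses (1-5 scale).
# `ipip`: n x 100 data frame of item responses; `key`: character vector of
# length 100 naming each item's FFM factor; `reversed`: logical vector of
# length 100 flagging reverse-keyed items. All three names are hypothetical.
score_ffm <- function(ipip, key, reversed) {
  ipip[reversed] <- 6 - ipip[reversed]               # reverse-code flagged items
  sapply(split(seq_along(key), key), function(cols)  # 20 item columns per factor
    rowMeans(ipip[cols]))                            # factor score = item average
}
```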

Model construction

The problem we set out to solve is to find a good predictive model that (a) is based on the 100 questions of the existing IPIP-100 questionnaire, and (b) uses only five variables, so that we can fairly compare it with the FFM. Reduced Rank Regression (RRR) is a tool that allows just that: it can be used to compress the original 100 IPIP items into a set of five new variables. These new variables are constructed so that they are good predictors, on average, of a large set of outcomes. Unlike Principal Component Analysis (PCA) or Factor Analysis, RRR reduces data dimensionality by optimizing predictive accuracy.

We randomly divided our data into independent train and test sets. Each subject in the train and test sets had 100 scores on the IPIP questionnaire (\(X_1, X_2, \ldots, X_{100}\)), as well as a score on each of the ten dependent variables (\(Y_1, Y_2, \ldots, Y_{10}\)).

X (n × 100) and Y (n × 10) were centered and scaled. We fitted a linear predictor, with coefficient matrix C:

$$\widehat{Y}_{j} := \sum_{k=1}^{100} X_{k} C_{kj}, \quad j = 1, 2, \ldots, 10 \tag{1}$$

And in matrix notation:

$$\widehat{Y} = XC \tag{2}$$

Our linear predictors were fully characterized by the matrix C. We wanted these predictors to satisfy the following criteria: (a) minimize the squared prediction loss; and (b) consist of 5 predictors, i.e., rank(C) = r = 5. Criterion (a) ensures the goodness of fit of the model, and criterion (b) ensures a fair comparison with the FFM. The RRR problem amounts to finding a set of predictors, \(\hat{C}\), such that:

$$\hat{C} := \underset{C}{\operatorname{argmin}} \left\{ \left\| Y - XC \right\|^{2} \;\;\text{such that}\;\; \operatorname{rank}(C) = r \right\}, \tag{3}$$

where \(\|\cdot\|\) denotes the Frobenius matrix norm. The matrix \(C\) can be expressed as a product of two rank-constrained matrices:

$$C := BA^{T} \tag{4}$$

where \(B\) has p rows and r columns (i.e., is of dimension p × r; here, 100 × 5), and \(A\) is of dimension q × r (here, 10 × 5). The model (2) may thus be rewritten as:

$$\widehat{Y} = (X\widehat{B})\widehat{A}^{T} \tag{5}$$

The n × r matrix \(X\hat{B}\), which we denote \(\tilde{X}\), may be interpreted as our new low-dimensional personality representation. Crucially for our purposes, the same set of r predictors is used for all dependent variables. By choosing dependent variables from different domains, we argue that this set of predictors can serve as a set of "generalizable predictors", which we henceforth call the Predictive Five (PF). For the details of the estimation of \(\hat{B}\), see the attached code. For a good description of the RRR algorithm, see33.
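As a concrete illustration, the following R sketch implements the classical least-squares solution to (3): an unconstrained OLS fit followed by a singular value decomposition of the fitted values33. It is a minimal sketch under the setup above (p = 100 items, q = 10 outcomes, r = 5) and is not necessarily identical to the estimation in our attached code.

```r
# Minimal RRR sketch. X: n x 100 centered/scaled item matrix; Y: n x 10
# centered/scaled outcome matrix; r: target rank (here, 5).
rrr_fit <- function(X, Y, r = 5) {
  C_ols <- solve(crossprod(X), crossprod(X, Y))  # unconstrained OLS coefficients
  A     <- svd(X %*% C_ols)$v[, 1:r]             # leading right singular vectors (q x r)
  B     <- C_ols %*% A                           # p x r "rotation" matrix
  list(B = B, A = A, C = B %*% t(A))             # C = B A^T satisfies rank(C) = r
}
# The PF scores are then the n x r matrix X %*% rrr_fit(X, Y)$B.
```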

Model assessment

To assess the predictive performance of the PF, and compare it to the predictive properties of the classical FFM, we used a fourfold cross-validation scheme. The validation worked as follows: we learned \(\hat{B}\) from a train set (397,851 participants) using RRR; we then divided the independent test set (800 participants) into 4 subsets; we learned \(\hat{A}\) from three-quarters of the test set (600 participants) and computed the R2 on the holdout quarter (200 participants); we then iterated this process over the four test subsets. The rationale of this scheme is that: (a) predictive performance is assessed using R2 on a completely novel dataset; and (b) when learning the predictive model, we wanted to treat the personality attributes as known. We thus learned \(\hat{B}\) and \(\hat{A}\) from different sets. The size of the holdout set was selected so that the R2 estimates would have low variance. The details of the process can be found in the accompanying code (https://github.com/GalBenY/Predictive-Five).
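The following R sketch illustrates this scheme under our assumptions; `X_test` and `Y_test` are hypothetical names for the held-out item and outcome matrices (preprocessed as in training), and `B` is the rotation matrix learned above.

```r
# Fourfold assessment sketch: B is fixed from training; A is re-learned on 3/4
# of the 800-person test set, and per-outcome R^2 is computed on the remaining
# 1/4; the process repeats over the four folds.
assess_model <- function(X_test, Y_test, B, folds = 4) {
  Z    <- X_test %*% B                                # five-dimensional scores
  fold <- sample(rep(1:folds, length.out = nrow(Z)))  # random fold assignment
  sapply(1:folds, function(k) {
    fit  <- lm(Y_test[fold != k, ] ~ Z[fold != k, ])  # learn A on ~600 participants
    pred <- cbind(1, Z[fold == k, ]) %*% coef(fit)    # predict the ~200 holdouts
    y    <- Y_test[fold == k, ]
    1 - colSums((y - pred)^2) / colSums(scale(y, scale = FALSE)^2)  # R^2 per outcome
  })
}
```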

To examine the performance of the RRR algorithm against another candidate reference model, we also performed Principal Component Regression (PCR), wherein we reduced the IPIP questionnaire to its 5 leading principal components, which were then used to predict the outcome variables. We used the resulting model as a point of comparison in the follow-up assessment of predictive accuracy. As in the RRR case, we learned the principal components from the train set (397,851 participants). Next, we divided the independent test set (800 participants) into 4 subsets and used fourfold cross-validation: three-quarters to learn the 5 regression coefficients, and one-quarter to compute the R2.
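A sketch of this PCR benchmark under the same assumptions (`X_train` is a hypothetical name for the training item matrix; the assessment then proceeds exactly as in the RRR case, e.g., via the `assess_model` sketch above):

```r
# PCR benchmark sketch: learn the 5 leading principal components on the
# training items and use the resulting 100 x 5 loading matrix in place of B.
# Test data must be centered/scaled with the training means and SDs.
pca    <- prcomp(X_train, center = TRUE, scale. = TRUE)
B_pc   <- pca$rotation[, 1:5]                  # 100 x 5 component loadings
r2_pcr <- assess_model(X_test, Y_test, B_pc)   # per-outcome holdout R^2
```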

In order to calculate the significance of the difference in the predictive accuracy of the models, we took the following approach: predictions are essentially paired, since they originate from the same participant. For each participant, we thus computed the (holdout) difference between the absolute errors of the PF and FFM models: \(|y_i - \widehat{y}_i^{PF}| - |y_i - \widehat{y}_i^{FFM}|\). Given a sample of such differences, comparing the models collapses to a univariate t-test of the null hypothesis that the mean of the differences is 0.
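In code, assuming hypothetical vectors `err_pf` and `err_ffm` of signed holdout prediction errors for the same participants, the test reduces to:

```r
# Paired comparison: per-participant difference in absolute prediction error.
d <- abs(err_pf) - abs(err_ffm)
t.test(d, mu = 0)  # H0: the two models are, on average, equally accurate
```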

Results

PF loadings

Each of the resulting PF dimensions was a weighted linear combination of IPIP-100 item responses. Despite the fact that the model was based on a questionnaire meant to reliably gauge the FFM, the resulting dimensions did not fully recapitulate the FFM structure. The detailed loadings for each of the five dimensions appear in the supplementary materials (Fig. 1, Supplementary Materials), can be examined in an online application we have created (https://predictivefive.shinyapps.io/PredictiveFive), and can be gleaned from the correlations of the PF scores with the FFM scores (Fig. 2). None of the PF dimensions strongly correlated with demographic variables (Table 1, Supplementary Materials). In Fig. 1, we display the correlations between the ten outcome variables, five principal components of these outcome variables (capturing 86% of the total variance), and the five PF dimensions. For example, it can be observed that PF 3 is inversely related to performance on the intelligence test and to empathy.

Figure 1

Correlations between the 10 outcome variables, 5 principal components of the outcome variables, and the 5 PF dimensions.

Figure 2

Correlations between the PF and FFM scale scores.

Table 1 Comparison of the predictive performance of the different models.

Predictive performance

The out-of-sample R2 of the three models is reported in Table 1. From this table, we learn that the PF-based regression model is a better predictor of the outcome variables. This holds true on average (over behavioral outcomes), and also for nine of the ten outcomes individually. On 5 of the 10 comparisons, the PF-based model significantly outperformed the FFM, and in a single case the FFM-based model significantly outperformed the PF. The average improvement across all 10 measures was 40.8%.

Reproducibility analysis

If our model discovery process produced very different loadings when run on different samples of participants, then the ontological status of the PF representation would be called into question.

In order to assess the reproducibility of the PF, we split the training dataset from Study 1 into two datasets: sample A with 198,850 participants and sample B with 198,851 participants. We then learned the rotation matrix, \(B\), on each data part and applied it, yielding two independent copies of the PF, \(X_l \widehat{B}_l,\ l \in \{A, B\}\). Replicability was measured by the correlation between the data parts, over participants. Table 2 reports this correlation, averaged over the 5 PF dimensions (column "Correlation between replications"). As can be seen, the correlation between the replications is satisfactory to high, ranging from 0.7 to 0.98. This suggests that the PF representation replicates well across samples.
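On our reading of this procedure, the check can be sketched as follows (reusing the hypothetical `rrr_fit` sketch from above; `half_A`, `half_B`, and `X_common` are hypothetical index vectors and a common response matrix to which both rotations are applied):

```r
# Split-half reproducibility sketch: learn the rotation matrix on each half of
# the training data and correlate the resulting PF scores, dimension by dimension.
B_A <- rrr_fit(X[half_A, ], Y[half_A, ])$B
B_B <- rrr_fit(X[half_B, ], Y[half_B, ])$B
diag(cor(X_common %*% B_A, X_common %*% B_B))  # one correlation per PF dimension
```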

Table 2 Reproducibility and reliability analysis.

Reliability analysis

If the same individuals, tested on different occasions, receive markedly different scores on the PF dimensions, then the ontological status of the PF representation should be called into question. To examine this, we exploited the fact that 96,682 users answered the IPIP questionnaire twice. The test–retest correlation between these two sets of answers is reported in Table 2 (column "Test–retest correlation"). It varied from 0.69 for Dimension 3 to 0.79 for Dimensions 1 and 5, suggesting that the variance captured by these dimensions is indeed (relatively) stable.

Divergence from the FFM

The superior predictive performance of the PF representation provides evidence that it differs from the FFM. Additionally, as can be gleaned from Fig. 2 (and from the detailed factor loadings’ Supplemental Material), Dimensions 3 and 4 reflect a relatively even combination of several FFM dimensions.

However, these observations do not provide us with an estimate of the degree of agreement between the two multidimensional spaces. Prevalent statistical methods for assessing discriminant validity34 are also not suitable for answering our question regarding the convergence/divergence of the PF and FFM spaces, as these methods only provide researchers with estimates of the agreement between unidimensional constructs.

Nonetheless, the underlying logic behind these methods (i.e., a formalization of the multitrait-multimethod matrix35) is still applicable to our case. We calculated an estimate of the agreement between the FFM and PF spaces using cosine similarity, which gauges the angle between two vectors in a multidimensional space (the smaller the angle, the more similar the vectors). Our rationale is that if the FFM scores differ from the PF scores, they should span different spaces. The cosine similarity within measures (in our case, between the first and second measurements, denoted T1 and T2) should thus be larger than the similarity between measures (FFM to PF).

We used the data from the 96,682 participants for whom we had test–retest data. Instead of computing standard test–retest correlations, we calculated a multidimensional test–retest score as the cosine similarity of participants' scores on the first and second measurements, for both the FFM and the PF. These estimates are expected to be highly similar and provide an upper bound on the similarity measure, partially analogous to the reliability diagonal of the multitrait-multimethod matrix. In a second stage, for each T1 and T2 vector, we measured the extent to which participants' FFM scores are similar to their PF scores, thereby calculating a magnitude analogous to measures of divergent validity. Because cosine similarity is sensitive to the sign and order of dimensions, we extracted the maximal possible similarity between the two spaces, providing the most conservative estimate of divergent validity.
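The per-participant similarity computation can be sketched in R as follows; `S1` and `S2` are hypothetical n × 5 score matrices (e.g., FFM scores at T1 and T2, or FFM versus PF scores at T1):

```r
# Row-wise cosine similarity between two sets of five-dimensional scores; each
# row holds one participant's scores under the two measurements (or models).
cosine_rows <- function(S1, S2) {
  rowSums(S1 * S2) / (sqrt(rowSums(S1^2)) * sqrt(rowSums(S2^2)))
}
# e.g., FFM test-retest convergence: mean(cosine_rows(ffm_t1, ffm_t2))
```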

As can be seen in Fig. 3, the T1–T2 similarity of the FFM is nearly maximal (M = 0.994, SD = 0.011); the T1–T2 similarity of the PF is also very high (M = 0.969, SD = 0.100). The similarity between the FFM and the PF at both T1 and T2 is much lower (M = 0.730, SD = 0.111). The minimal difference between the convergence and divergence measures corresponds to a Hedges' g of 2.217, clearly representing a substantial divergence between the FFM and PF spaces. In other words, while the PF representation bears some resemblance to the FFM, it is clearly a different representation.

Figure 3

Distribution, over participants, of the multidimensional similarity between the FFM and PF representations.

Discussion

The results of Study 1 provide evidence that a supervised dimensionality reduction method can yield a low-dimensional representation that is simultaneously predictive of a set of psychological outcome variables. We demonstrate that by using a standard personality questionnaire and supervised learning methods, it is possible to improve the overall prediction of a set of 10 important psychological outcomes, even when restricting ourselves to 5 dimensions of personality. RRR allowed us to compress the 100 questions of the personality questionnaire to a new quintet of attributes that optimize prediction across a large set of psychological outcomes. The resulting set of five dimensions differs from the FFM, and has better predictive power on the held-out sample than the classical FFM and an additional comparison benchmark of five dimensions generated using Principal Component Analysis.

A theory of personality should strive to predict humans’ thoughts, feelings, and behaviors across different life contexts. Indeed, the representation we discovered in Study 1 was superior to the FFM in terms of its ability to predict a diverse set of psychological outcomes on a set of novel observations. The fact that the same low-dimensional representation was applicable across a set of important outcomes of human psychology suggests that it is a relatively generalizable model, in the sense that it simultaneously applies to several important domains. However, despite the diversity of the outcome measures examined in Study 1, it remains possible that the PF representation is only effective for the prediction of the set of outcome measures on which it was trained. Such a finding would not negate the usefulness of this model, given the wide variety of outcomes captured by the PF. However, it is interesting to see whether the resulting representation can improve prediction on additional sets of outcomes. In light of this, in Study 2 we sought to examine the performance of the PF on a set of novel outcome measures that were present in the myPersonality database, but that were held-out from the model generation process. Specifically, in this study we sought to see whether the PF representation outperforms the FFM in its ability to predict participants’ experiences during their childhood.

Unlike the outcome measures used in Study 1, this dependent variable does not pertain to participants' lives in the present; rather, it is a measure of their past experiences. As such, "retrodiction" of remote history may be especially challenging. Nonetheless, it is widely held that individuals' psychological properties are shaped, at least to some extent, by the degree to which they were raised in a loving household36,37. Indeed, there is evidence that many specific psychological attributes are shaped by experiences with primary caregivers (e.g., shared environmental effects on food preference38, substance abuse39, and aggression40). In light of this, we reasoned that one's personality profile should contain information that is predictive of individuals' retrospective reports of their upbringing.

Study 2

Method

Participants

We used data from 3869 participants who answered all of the questions on the 100-item IPIP representation of Goldberg's26 markers for the Big Five factor structure, and who also answered the short form of the "My Memories of Upbringing" (EMBU) questionnaire41.

Measures

The short form of the EMBU includes a total of six subscales: three subscales measuring the extent to which the participants' father was a warm, rejecting, and overprotective parent, and three subscales measuring the extent to which the participants' mother was warm, rejecting, and overprotective.

Results

As can be seen in Table 1, for all six variables, prediction accuracy was relatively low; however, importantly, in all six cases the PF-based model outperformed the FFM-based model, and was significantly better for four out of the six outcome variables. The average improvement across the six outcome measures was 49.2%.

Discussion

The results of Study 2 further support the idea that the PF representation that was built using the 10 meaningful outcome measures present in the myPersonality database is at least somewhat generalizable. However, Study 2 again relied on myPersonality participants, upon which the PF was built. In light of this, in Study 3 we sought to further test the generality of the PF by examining whether it outperforms the FFM-based model on a set of new participants. Furthermore, we wanted to see whether our model can outperform the FFM-based model on a set of new outcome measures selected by an independent group of professional psychologists, blind to our model-generation procedure.

Study 3

Method

Participants

We collected new data using Amazon's Mechanical Turk (www.MTurk.com). M-Turk is an online marketplace that enables data collection from a diverse workforce who are paid upon successful completion of each task. Our target sample size was 500 participants, double what is considered a standard, adequate sample size in individual differences research42. In practice, 582 participants took part in the study; 35 of them were omitted for failing attention checks, leaving 547 participants in the final dataset (243 women and 304 men). This number exceeds the 470 participants required for 95% confidence that a small effect (ρ = 0.1) will be estimated with a narrow (w = 0.1) corridor of stability42.

Measures

Dependent variables

In order to make sure that the PF generalizes across different domains of psychological interest, it was important to generate the list of outcome variables in a way that is not biased by our knowledge of the original ten outcome variables on which the PF was designed (i.e., intelligence, well-being, and so on). Therefore, on January 3rd, 2019, we gathered a list of 12 new outcome measures by posting a call on the Facebook group PsychMAP (https://www.facebook.com/groups/psychmap) asking researchers "to name psychological outcome measures that you find interesting, important, and that can be measured on M-Turk using a single questionnaire item on a Likert scale." Once we arrived at the target number of questions, we closed the discussion and stopped collecting additional variables. The 12 items were suggested by eight different psychologists, six of whom held a PhD in psychology and five of whom were principal investigators. By using this elicitation method, we had no control over the outcome measures, and could be certain that we had gathered an arbitrarily chosen sample of outcomes that are of interest to psychologists.

The arbitrariness of the outcome generation process (selecting the first 12 outcomes nominated by psychologists, without any consideration of consensus views regarding variable importance), as well as the likely low psychometric reliability of single-item measures, can be seen as limitations of this study. However, our reasoning was that such a situation best approximates the "messiness" of the unexpected, noisy, real-world scenarios wherein prediction may be of interest, and as such provides a good test of the predictive performance of the FFM and PF.

In the M-Turk study, participants rated their agreement with 12 statements (1 = Strongly Disagree to 7 = Strongly Agree). The elicited items were:

(1) “I care deeply about being a good person at heart”.

(2) “I value following my heart/intuition over carefully reasoning about problems in my life”.

(3) “Other people's pain is very real to me”.

(4) “It is important to me to have power over other people”.

(5) “I have always been an honest person”.

(6) “When someone reveals that s/he is lonely I want to keep my distance from him/her”.

(7) “Before an important decision, I ask myself what my parents would think”.

(8) “I have math anxiety”.

(9) “I am typically very anxious”.

(10) “I enjoy playing with fire”.

(11) “I am a hardcore sports fan”.

(12) “Politically speaking, I consider myself to be very conservative”.

Independent variables

The independent variables were participants’ answers to the 100 questions of the IPIP questionnaire.

Model assessment

As in Study 1, we used a fourfold cross-validation scheme to assess the predictive performance of the PF on a new dataset and new outcome variables, and then compared it to the predictive performance of the FFM. The validation worked as follows: we took \(\hat{B}\) from Study 1, learned \(\hat{A}\) from a part of the new sample (~400 participants), and computed the R2 on the holdout test set (~130 participants). In the spirit of fourfold cross-validation, we iterated this process over the four test subsets and calculated the average test R2 for each model.

Results

Similarly to Studies 1–2, the results showed that the predictive performance of the PF was again better than that of the Big Five, although the improvements were more modest (an average improvement of 30% across the 12 measures). In 5 out of 12 cases, the PF-based model was significantly better than the FFM-based model, and the opposite was true in 2 cases.

Discussion

The out-of-sample R2 of the two models (PF/Big Five) in Study 3 shows a trend consistent with the results of Studies 1 and 2, that is, a somewhat higher percentage of explained variance for the models with the PF as predictors. The improvement observed in Study 3 was more modest than that observed earlier, but is nonetheless non-trivial, given that the set of outcome variables differed from the one on which the PF representation was trained, and given that the PF representation was trained on items from a questionnaire designed to measure the FFM. As such, the results of Studies 1–3 clearly demonstrate the generalizability of the PF.

A potential criticism of these findings is that the success of the PF model was more prominent on variables that were more similar to the 10 dependent measures upon which the PF was trained. However, it is important to keep in mind that the 12 outcome measures in this study were selected at random by an external group of psychologists. As such, this primarily means that the 10 psychological outcomes used to train the PF indeed provide good coverage of psychological processes that are of interest to psychologists, and thereby, overall, generalize well to novel prediction challenges.

General discussion

In this contribution, we set out to examine the viability of a novel approach to modeling human personality. Unlike the prevailing Five-Factor Model (FFM) of personality, which was developed using unsupervised dimensionality-reduction techniques (i.e., Factor Analysis), we utilized supervised machine-learning techniques for dimensionality reduction, using numerous psychologically meaningful outcomes as data labels (e.g., intelligence, well-being, sociability). Whereas the FFM is optimized towards discovering an ontology that explains most of the variance in self-report measures of psychological traits, our new approach devised a low-dimensional representation of human trait statements that is optimized towards the prediction of life outcomes. Indeed, the results showed that our model, which we term the Predictive Five (PF), provides predictive performance better than that achieved by the FFM in independent validation datasets (Studies 1–2), and on a new set of outcome variables, selected independently of the first study (Study 3). The main contribution of the current work is explicating and demonstrating a methodological approach for generating a personality representation. However, the result of this work is also a specific representation that is of interest and of potential use in and of itself. We now turn to discuss both our general approach and the resulting representation.

Interpreting the PF

The dimensional structure that emerged when using our supervised-dimensionality reduction approach differed from the FFM. Two dimensions (Dimension 1 and 2) largely reproduced the original FFM factors of Extraversion and Neuroticism. Interestingly, these two dimensions are the ones that were highlighted in early psychological research as the “Big Two” factors of personality (Wiggins, 1966). Dimension 5 was also highly related to an existing FFM dimension, namely, Openness to Experience.

The third and fourth dimensions in the model did not correspond to a single FFM trait, but were composed of a mixture of various items. An inspection of the loadings suggests that Dimension 4 is related to some sort of a combative attitude, perhaps captured best by the construct of Dominance43,44,45. The items that loaded highly on this dimension related to hostility (“Do not sympathize with others”; “Insult people”), a right-wing political orientation (“Do not vote for liberal political candidates”), and an approach-oriented46 stance (“Get chores done right away”; “Find it easy to get down to work”).

Like PF Dimension 4, Dimension 3 also seemed to capture approach-oriented characteristics (with high loadings for the items "Get chores done right away" and "Find it easy to get down to work"); however, this dimension differed from Dimension 4 in that it represented a harmony-seeking phenotype47. The items that loaded highly on this dimension were those associated with low levels of narcissism ("keep in the background", "do not believe I am better than others") but with stable self-worth ("am pleased with myself"). Additional items that loaded highly on this dimension reflect cooperativity ("concerned with others" and "sympathize with others").

These two dimensions may seem like dialectical opposites. Indeed, the item "sympathize with others" loaded strongly on both factors, but with different signs. However, the additional items that loaded strongly on these two dimensions appear to have provided a context that altered the meaning of this item. This is evident in the fact that Dimensions 3 and 4 are not correlated with each other. A speculative interpretation is that the two phenotypes captured by Dimensions 3 and 4 can be thought of as two strategies that may have been adaptive throughout human evolution. The first, captured by Dimension 4, seems to represent aggressive traits that may have been especially useful in the context of inter-group competition and conflict; the second, captured by Dimension 3, seems to represent traits associated with intra-group cooperation and peace.

In general, the interpretability of the PF representation is lower than that of the FFM, with some surprising items loading together on the same dimension. For example, the two agreeableness items "do not believe I am better than others" and "respect others", which are strongly correlated with each other, loaded highly onto Dimension 1 (which is related to introversion), but with opposite signs. To a certain extent, this is a limitation of the predictive approach in psychology. However, such confusing associations may lead us towards novel insights. For example, it is possible that some individuals adopt an irreverent stance towards both self and others, and such a stance could be predictive of various psychological outcomes, and correlated with introversion.

Towards a more predictive science of personality

As noted, the reasons that people seek models of personality are twofold: first, we want models that allow us to understand, discuss, and study the differences between people; second, we need these models in order to predict and affect people's choices, feelings, and behaviors48. Current approaches to personality modeling have succeeded on the former, providing highly comprehensible dimensions of individual differences (e.g., we can easily understand and communicate the contents of the dimension of "Neuroticism" by using this sparse semantic label). However, the ability of the FFM to accurately predict outcomes in people's lives is at least somewhat limited19,20,49.

The significance of the current work is that it describes a new approach to modeling human personality, that makes the prediction of behavior an explicit and fundamental goal. Our research shows that supervised dimensionality reduction methods can generate relatively generalizable, low-dimensional models of personality with somewhat improved predictive accuracy. Such an approach could complement the unsupervised dimensionality reduction models that have prevailed for decades in personality research. Moreover, this research can complement attempts to improve the predictive validity of psychology by using non-parsimonious (i.e., facets and item-level) questionnaire-based predictive models50.

Aside from providing a general approach for the generation of personality models, the current research also provides a potentially useful instrument for psychologists across different domains of psychological investigation. Our findings suggest that psychologists who are interested in predicting meaningful consequences (e.g., workplace or romantic compatibility) or in optimizing interventions on the basis of individuals' characteristics (e.g., finding out which individuals will best respond to a given therapeutic technique) may benefit from incorporating the PF dimensions in their predictive models. To facilitate such future research, we provide the R code that calculates the five dimensions based on answers to the freely available IPIP-100 questionnaire (https://github.com/GalBenY/Predictive-Five). The use of an existing, open-access, widely used questionnaire means that researchers can now easily apply the PF coding scheme alongside the FFM coding scheme to their data, and compare the utility of the two models in their own specific research domains.

One avenue of potential use of the PF representation is in clinical research. The PF showed improved prediction of depression and well-being; moreover, the PF substantially outperformed the FFM in the prediction of two known resilience factors (intelligence and empathy). Specifically, PF Dimension 3 (which, as noted above, seems to represent a harmony-seeking phenotype) significantly contributed to the prediction of all four outcomes. As such, future work could further investigate the incremental validity of this dimension (and the PF representation more generally) as a global resilience indicator.

Across a set of 28 comparisons, the predictions derived from the PF-based model were significantly better in 15 cases, and significantly worse in 3 cases. The average improvement in R2 across the 28 outcomes was 37.7%. However, it is important to note that the PF representation described herein is just a first proof of concept of this general approach, and it is likely that future attempts that are untethered from the constraints adopted in the current study can provide models of greater predictive accuracy. Specifically, in the current research we relied on the IPIP-100, a questionnaire designed specifically to reliably measure the factors of the FFM, and we limited ourselves to a five-dimension solution to allow comparison with the FFM. The PF representation outperformed the FFM representation despite these constraints. These results thus provide a very conservative test of the utility of our approach.

Future directions

Future attempts to generate generalizable predictive models will likely produce even stronger predictive performance if they relax the constraint of finding exactly five dimensions and perform dimensionality-reduction based on the raw data used to generate the FFM itself—namely, the long list of trait adjectives that exist in human language, and that were reduced into the five dimensions of the FFM.

For the sake of simplicity and comparability to the FFM, the current work employed a linear method for supervised dimensionality reduction. Recent work in machine learning has demonstrated the power of Deep Neural Networks as tools for dimensionality reduction (e.g., language embedding models). In light of this, it is likely that future work utilizing non-linear methods for supervised dimensionality reduction could generate ever more predictive representations (i.e., "personality embeddings").

A limitation of the current work is that the PF was trained on a relatively limited set of 10 important life outcomes (e.g., IQ, well-being). While these outcome measures seem to cover many of the important consequences humans care about (as evidenced by the predictive performance in Study 3), it is likely that training a PF model on a larger set of outcome variables will improve the coverage and generalizability of future (supervised) personality models. A potential downside of extending the set of outcome measures used for training is that at some point (e.g., 20 or 100 outcomes) the "blanket may become too short": namely, it may become difficult to find a low-dimensional representation that achieves satisfactory predictive performance simultaneously across all outcomes. Thus, future research aiming to generate more predictive personality models may need to find a "sweet spot" that allows the model to fit a sufficiently comprehensive array of target outcomes.

What may be the most important consequence of the current approach is that whereas previous attempts at modeling human personality were necessarily limited by their reliance on the subjective products of the human mind (i.e., they were predicated on human-made psychological theories, or on subjective ratings of trait words), our approach holds the unique potential of generating personality representations that are based on objective inputs.

A final question concerning predictive models of personality is whether we even want to generate such models, given their potential for misuse. While the current results show that the majority of variance in psychological outcomes remains unexplained, in the era of social networks and commercial genetic testing, the predictive approach to personality modeling could theoretically lead to models that render human behavior highly predictable. Such models give rise to both ethical concerns (e.g., unethical use by governments and private companies, as in the Cambridge Analytica scandal) and moral qualms (e.g., if behavior becomes highly predictable, what will it mean for notions of free will and personal responsibility?). While these are all valid concerns, we believe that, like all other scientific advancements, personality models are tools that can provide a meaningful contribution to human life (e.g., predicting suicide in order to prevent it; predicting which occupation will make a person happiest). As such, the important, inescapable quest towards generating ever more effective models that allow us to predict and intervene in human behavior is only just beginning.