Uncovering the complex genetics of human temperament

Experimental studies of learning suggest that human temperament may depend on the molecular mechanisms for associative conditioning, which are highly conserved in animals. The main genetic pathways for associative conditioning are known in experimental animals, but have not been identified in prior genome-wide association studies (GWAS) of human temperament. We used a data-driven machine learning method for GWAS to uncover the complex genotypic–phenotypic networks and environmental interactions related to human temperament. First, in a discovery sample of 2149 healthy Finns, we identified sets of single-nucleotide polymorphisms (SNPs) that cluster within particular individuals (i.e., SNP sets) regardless of phenotype. Second, we identified 3 clusters of people with distinct temperament profiles, measured by the Temperament and Character Inventory, regardless of genotype. Third, we found 51 SNP sets that identified 736 gene loci and were significantly associated with temperament. The identified genes were enriched in pathways activated by associative conditioning in animals, including the ERK, PI3K, and PKC pathways. Of the identified genes, 74% were unique to a specific temperament profile. Environmental influences measured in childhood and adulthood had small but significant effects. We confirmed the replicability of the 51 Finnish SNP sets in healthy Korean (90%) and German samples (89%), as well as their associations with temperament. The identified SNPs explained nearly all the heritability expected in each sample (37–53%) despite variable cultures and environments. We conclude that human temperament is strongly influenced by more than 700 genes that modulate associative conditioning by molecular processes for synaptic plasticity and long-term memory.


Introduction
Temperament is classically defined as those aspects of personality that express basic emotions like fear, anger, and disgust, and that are developmentally stable and heritable, rather than learned [1]. However, this classical definition is inadequate because human beings have three major systems of learning and memory with distinctive genetic and biological bases that evolved in succession over the long phylogenetic lineage leading from primitive animals to modern human beings [2][3][4]. Procedural learning of habits is present in all animals through highly conserved molecular mechanisms of associative conditioning, including classical and operant conditioning [5][6][7][8][9]. In contrast, evidence for intentional cognitive processes, such as purposeful goal-seeking, social reconciliation, and abstract symbolization of facts, is present in the primate lineage of human beings, but not in reptiles [2][3][4][10]. Evidence for autonoetic or autobiographical learning appears to be present only with the advent of art and science in modern Homo sapiens [2][11][12][13][14][15].
Early research assessing temperament focused on developmentally stable features of activity and affect, but some recent work has extended assessments of temperament to include aspects of attention and self-regulatory processes that emerged later in evolution and that develop in response to both individual experience and social norms [1,2,16]. In contrast, Cloninger took an evolutionary perspective to learning in developing the Temperament and Character Inventory (TCI), defining temperament as that aspect of personality based on associative conditioning [17][18][19]. The TCI measures four temperament dimensions that have been empirically confirmed by functional brain imaging to quantify individual differences in associative conditioning and related human brain circuitry: Harm Avoidance (i.e., fearful, pessimistic vs. risk-taking, optimistic) [20][21][22], Novelty Seeking (i.e., impulsive, excitable vs. deliberate, reserved) [23,24], Reward Dependence (i.e., friendly, sentimental vs. detached, objective) [21,24], and Persistence (i.e., determined, ambitious vs. easily discouraged, underachieving) [25,26]. Harm Avoidance is an indicator of negative valence that measures passive avoidance learning and increased sensitivity to fearful stimuli mediated by activation of the amygdala, subgenual cingulate cortex, and the insular salience network [22,27,28]. Novelty Seeking is an indicator of positive valence that measures approach to novel stimuli [29,30], even if they do not predict rewards [24], whereas Reward Dependence is predictive of social affiliation and approach to rewards based on a different pattern of activation of dopaminergic neurons in the nucleus accumbens and substantia nigra [24] and on oxytocinergic neurons in the hypothalamus [31]. 
Persistence quantifies differences in rates of extinction of intermittently rewarded behaviors in response to frustrative nonreward by activation of a circuit connecting the nucleus accumbens, anterior cingulate, and ventrolateral frontal cortex [25,26].
Studies of gene expression in response to associative conditioning in experimental animals have consistently documented the activation of specific molecular pathways that trigger synaptic plasticity, which is a fundamental basis for long-term memory [7,32-34]. The Ras-MEK-ERK cascade (also known as the Mitogen-activated Protein Kinase (MAPK) pathway) and the PI3K-AKT-mTOR cascade are major cellular mechanisms for responding to extracellular stimuli, and their activation triggers intracellular processes that promote synaptic plasticity and associative conditioning, including long-term potentiation (LTP) and long-term depression (LTD) [7,32,33,35]. The cell-surface receptors for these pathways can be activated by a wide variety of somatic, psychological, and social stressors that vary in positive and negative valence and in consequences for survival and reproduction [6,33,36]. Changes in these pathways in response to associative conditioning occur in a coordinated manner with related processes including stress reactivity [37], neuronal and glial growth [38], and neurotransmission [39]. Therefore, we hypothesized that genes in the same molecular pathways identified in non-human animals for associative conditioning and related processes would be associated with human temperament profiles. This hypothesis was already supported indirectly by our finding that genes in these pathways were associated with the dependent and apathetic character profiles in which self-regulatory personality traits were inadequate to regulate temperament in a healthy manner, resulting in stress reactivity and ill-health [40].
Unfortunately, prior genome-wide association studies (GWAS) of temperament that considered only the average effects of genes have identified few genes associated with personality, and have specifically failed to uncover the genes associated with long-term memory, whether the TCI or other personality inventories were used [41,42]. Such failure is an example of the "missing" [43] or "hidden" [44] heritability problem in studies of complex phenotypes. Temperament as measured by the TCI and other inventories is known to be strongly influenced by gene-gene [45][46][47][48] and gene-environment interactions [49][50][51]. Such complexity is expected from the extensive feedback interactions among the molecular pathways that are activated in non-human animals in response to associative conditioning [52].
As in our accompanying GWAS of human character [40], we have chosen to use strictly data-driven methods of deep cluster analysis in GWAS to uncover the complex genotypic and phenotypic architecture of temperament [53][54][55]. We postulate that the genes in molecular pathways related to temperament are not missing but are distributed in different networks of interacting genes and environments that influence different people [54][55][56][57]. More specifically, we hypothesize that the genes associated with temperament will be enriched in the molecular pathways experimentally activated by associative conditioning in non-human animals.

Subjects and methods
Subjects and methods were the same as detailed in an accompanying paper [40], so essentials are briefly summarized here.

Description of the samples
Our discovery sample was the Young Finns Study, an epidemiological study of 2149 healthy Finnish children followed regularly from 1980 (ages 3-18 years) to 2012 (ages 35-50 years) [58]. All Finnish subjects (56% women) had thorough standardized genotypic, environmental, and phenotypic assessments, including administration of the TCI [16,58].
We replicated the results in two independent samples of healthy adults from Germany [59,60] and Korea [61,62] in which comparable genotypic and phenotypic features were available (see Supplement). The Korean study involved 1052 unrelated individuals extracted from a national register (aged 28-81 years, 57% women). The German study involved 902 subjects (aged 20-74 years, 49% women) randomly selected from the Munich city register and screened to exclude anyone with a history of psychiatric illness in themselves or their first-degree relatives.

Personality assessment
All subjects completed the TCI to assess seven heritable dimensions of personality [18,63]. The TCI measures four well-validated dimensions of temperament (Novelty Seeking, Harm Avoidance, Reward Dependence, Persistence) and three dimensions of character, as described in the "Introduction" and in more detail in Supplementary Section 1 and Table S1 [18,63]. The 12 temperament subscales from the TCI were used as the primary phenotypic data in all three samples (Supplementary Section 2 and Table S1).

Personality health indices
People at risk of unhealthy personality were identified as the bottom decile of the sum of TCI Self-directedness and Cooperativeness [64], a previously validated indicator of ill-being [65,66]. In contrast, people with healthy personalities were identified as the top decile of the product of all three TCI character traits, a previously validated indicator of well-being [64,67,68]. Our ill-being and well-being indices were used to measure the health status of subjects consistently in all three samples.
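The two health indices are simple decile cut-offs over character scores. The sketch below is a minimal illustration, assuming numpy, one array per TCI character trait (Self-directedness, Cooperativeness, Self-transcendence) on a non-negative scale, and deciles computed with `numpy.percentile` (linear interpolation, so ties can push a group slightly above 10%). The function name `health_indices` is hypothetical.

```python
import numpy as np


def health_indices(sd, co, st):
    """Return boolean masks (ill_being, well_being), one entry per subject.

    sd, co, st: TCI Self-directedness, Cooperativeness, and Self-transcendence
    scores (assumed non-negative, so the product is monotone in each trait).
    """
    sd, co, st = (np.asarray(x, dtype=float) for x in (sd, co, st))
    risk = sd + co            # low sum of SD and CO -> at risk of ill-being
    health = sd * co * st     # high product of all three traits -> well-being
    ill_being = risk <= np.percentile(risk, 10)       # bottom decile
    well_being = health >= np.percentile(health, 90)  # top decile
    return ill_being, well_being
```

In practice, the masks from each sample would be used to compute the probability of well-being or ill-being among the subjects of any SNP set or temperament set.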
We also identified an empirical index of temperament (Supplementary Section 3 and Table S2) as a single comprehensive measure of temperament that could be used in SNP-set Kernel Association Test (SKAT) [56,57] and heritability analyses.

Genotyping
The Finnish sample was genotyped using Illumina Human670-Quad Custom (i.e., Illumina 670k custom) arrays [69]. The Korean sample used Affymetrix Genome-Wide Human SNP Array 6.0 and Illumina HumanCore [61]. The German sample used Affymetrix Genome-Wide Human SNP Array 6.0, Illumina OMNI Express, and the 300 Array, pre-phased and imputed with SHAPEIT2 and IMPUTE2. Some German individuals had also been genotyped on Illumina Omni1-Quad. Quality control was performed for all samples as in prior work [55] (Supplementary Section 3).
After quality control, the PLINK software suite [70] was used to reduce the large search space by pre-selecting a subset of SNPs using a generously inclusive threshold (p-value < 0.01 without Bonferroni correction) for possible association with temperament, taking gender and ethnicity into account as covariates of the individual SNPs, as detailed in an accompanying paper [40]. We accounted for ethnicity in each sample by using the first three principal components of the SNP genotypes to correct for ancestral stratification (Supplementary Section 3) [71].
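The pre-selection step can be sketched as a per-SNP regression with sex and the first three ancestry principal components as covariates. This is a minimal numpy sketch, not PLINK's implementation: it assumes a quantitative temperament score, additive 0/1/2 allele coding, principal components computed from the same standardized genotypes, and a normal approximation to the t distribution for the p-value. The function name `preselect_snps` is hypothetical.

```python
import math

import numpy as np


def preselect_snps(genotypes, phenotype, sex, n_pcs=3, alpha=0.01):
    """Return indices of SNPs whose additive effect on the phenotype has p < alpha.

    genotypes: (n_subjects, n_snps) allele counts 0/1/2.
    phenotype: (n_subjects,) quantitative trait (assumed, e.g., a temperament score).
    sex: (n_subjects,) 0/1 covariate.
    """
    G = np.asarray(genotypes, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    n, m = G.shape
    # Standardize SNPs, guarding against monomorphic SNPs with zero variance.
    sd = G.std(axis=0)
    Z = (G - G.mean(axis=0)) / np.where(sd > 0, sd, 1.0)
    # Top principal components of the standardized genotypes (ancestry proxy).
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    pcs = U[:, :n_pcs] * S[:n_pcs]
    covariates = np.column_stack([np.ones(n), sex, pcs])
    selected = []
    for j in range(m):
        X = np.column_stack([covariates, G[:, j]])
        beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
        if rank < X.shape[1]:
            continue  # SNP is collinear with the covariates (e.g., monomorphic)
        resid = y - X @ beta
        sigma2 = resid @ resid / (n - X.shape[1])
        se = math.sqrt(sigma2 * np.linalg.inv(X.T @ X)[-1, -1])
        t = beta[-1] / se
        # Two-sided p-value using a normal approximation (reasonable for large n).
        p = math.erfc(abs(t) / math.sqrt(2))
        if p < alpha:
            selected.append(j)
    return selected
```

The threshold is deliberately loose: the goal is only to shrink the search space before clustering, so false positives at this stage are expected and are handled by the later set-level tests.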

Computational procedures
The cluster analyses used the Generalized Factorization Method [72][73][74][75] including Non-negative Matrix Factorization (NMF), which optimizes pattern recognition and naturally occurring associations between patterns across different types of data. The clustering was entirely data-driven without restrictive assumptions about the number or content of the clusters [54], as detailed elsewhere [53-55, 72, 76]. The steps of this analytic procedure are summarized and schematically related to unsupervised Deep NMF Learning in Supplementary Figure S1. The advantages of this clustering approach over alternative analyses of single or multiple markers are described in Supplementary Section 4.
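As a point of reference, the following sketch shows the basic factorization that the Generalized Factorization Method builds on: a non-negative data matrix (e.g., subjects by SNPs, or subjects by temperament subscales) is approximated as W @ H, and each subject can be assigned to the factor with the largest loading. This is plain NMF with the Lee-Seung multiplicative updates in numpy; it is not the PGMRA or FNMF implementation, which includes further steps (such as selecting optimal sets across many numbers of clusters) not shown here. Function names are hypothetical.

```python
import numpy as np


def nmf(V, k, n_iter=500, eps=1e-10, seed=0):
    """Factorize non-negative V (n x m) as W @ H with W (n x k) >= 0 and H (k x m) >= 0.

    Lee-Seung multiplicative updates minimizing the Frobenius error ||V - WH||_F.
    """
    V = np.asarray(V, dtype=float)
    if (V < 0).any():
        raise ValueError("V must be non-negative")
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # Each update keeps entries non-negative and does not increase the error.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def cluster_labels(W):
    """Assign each row (e.g., a subject) to the factor with its largest loading."""
    return np.argmax(W, axis=1)
```

Running the factorization for several values of k and comparing the stability of the resulting clusters is one common way to choose the number of clusters without fixing it in advance.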
Our web server application for Phenotype-Genotype Many-to-many Relations Analysis (PGMRA) in GWAS has been published [54] and is available online at http://phop.ugr.es/fenogeno. The PGMRA method and algorithm are also summarized in Supplementary Sections 5 and 6, which include a semi-supervised classifier of phenotypes from genotypes. PGMRA accounts for linkage disequilibrium (LD) efficiently, that is, without loss of information about complex genotypic-phenotypic relations (Supplementary Section 4). Statistical analysis correcting for multiple comparisons, with gender and ethnicity as covariates of the SNP sets, was performed by SKAT [56,57], which is also accessible via PGMRA. Heritability was estimated from a trimmed regression of SNPs on the empirical index of temperament, controlling for outliers and environmental variables [77,78] (see also Supplementary Section 7).
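The SNP-set statistic used by SKAT can be sketched for a continuous phenotype: fit the covariate-only null model, then sum the weighted squared per-SNP scores of the residuals, Q = r'GW²G'r. In this numpy sketch the weights default to 1 (the SKAT package's default instead uses a Beta(1, 25) density of minor allele frequency), and the p-value comes from permuting null-model residuals rather than the Davies method that the SKAT package uses by default. Treat it as an illustration of the statistic, not a substitute for the package; function names are hypothetical.

```python
import numpy as np


def skat_q(G, y, X, weights=None):
    """SKAT-style variance-component score statistic for one SNP set.

    G: (n, p) allele counts for the SNPs in the set; y: (n,) phenotype;
    X: (n, q) covariates including an intercept column.
    Returns (Q, residuals of y under the covariate-only null model).
    """
    G = np.asarray(G, float)
    y = np.asarray(y, float)
    X = np.asarray(X, float)
    w = np.ones(G.shape[1]) if weights is None else np.asarray(weights, float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    score = (G * w).T @ r        # W G' r: one weighted score per SNP
    return float(score @ score), r


def skat_permutation_p(G, y, X, n_perm=999, seed=0, weights=None):
    """Permutation p-value from shuffled null-model residuals (approximation)."""
    q_obs, r = skat_q(G, y, X, weights)
    G = np.asarray(G, float)
    w = np.ones(G.shape[1]) if weights is None else np.asarray(weights, float)
    Gw = G * w
    rng = np.random.default_rng(seed)
    exceed = 0
    for _ in range(n_perm):
        s = Gw.T @ rng.permutation(r)
        exceed += int(s @ s >= q_obs)
    return (exceed + 1) / (n_perm + 1)
```

Because Q pools the scores of all SNPs in a set, many weak effects can add up to a strong set-level signal even when no single SNP stands out, which is the pattern reported for the temperament SNP sets.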
Replicability of results was evaluated in the three independent samples for SNP sets, phenotypic sets, and genotypic-phenotypic relations using multi-objective optimization techniques [55], as detailed in Supplementary Section 8. The PGMRA classifier was used to predict temperament phenotypes from the genotypic sets (Supplementary Section 9). Further details are available in Supplementary Information and elsewhere [72][73][74][75].
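Multi-objective matching of sets across samples can be illustrated with Pareto dominance: a candidate match between a Finnish set and a set in a replication sample is kept if no other candidate is at least as good on every objective and strictly better on at least one. The objectives below (e.g., SNP overlap and phenotypic similarity, both maximized), the candidate names, and the function names are hypothetical; the objectives actually used are described in Supplementary Section 8.

```python
def dominates(a, b):
    """True if objective vector a is >= b everywhere and > b somewhere (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the sorted names of candidates not dominated by any other candidate.

    candidates: dict mapping a candidate match to a tuple of objectives to maximize.
    """
    return sorted(
        name
        for name, obj in candidates.items()
        if not any(
            dominates(other, obj)
            for other_name, other in candidates.items()
            if other_name != name
        )
    )
```

For example, with candidates `{"A": (0.9, 0.2), "B": (0.5, 0.5), "C": (0.4, 0.4), "D": (0.1, 0.95)}`, only C is dominated (by B), so the front is A, B, and D. Keeping the whole front, rather than a single weighted score, avoids choosing arbitrary weights between objectives.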

Results
Identifying SNP sets as candidates for causal variability
PGMRA exhaustively identified 902 non-identical but possibly overlapping SNP sets in the Finnish sample without knowledge of the phenotype, as in our analysis of character [40]. Among these, the SNP sets related to temperament had different numbers of subjects and/or SNPs and associated health risks (Figure S2, Supplementary Table S3).
Table 1. The SNP sets are named based on molecular pathways and neuronal functions of the genes that distinguish the sets from one another (see Supplementary Table S4). % coding indicates the percentage of protein-coding genes. Strengths of association are compared for the SNP set, the best SNP, and the average SNP based on SKAT p-values. The number of subjects and SNPs comprising each SNP set is specified. The probabilities of well-being and ill-being are given for subjects in each SNP set (see also Supplementary Table S2). Character indicates the association of the set with the Character phenotype (published elsewhere [40]). # Gs indicates the number of genes mapped by the SNP sets (Figure S6), where genes can be mapped by more than one SNP set. * indicates SNP sets directly associated only with temperament sets.

Identifying clusters of subjects with distinct temperament profiles
Using the 12 temperament subscales without knowledge of the genotype, PGMRA exhaustively identified 118 temperament sets in the Finnish sample. These fine-grained sets were identified in clustering solutions with the number of sets ranging from 2 to 15. By hierarchically clustering these 118 fine-grained sets with PGMRA, we identified 3 temperament super-sets that minimized the Cophenetic Correlation Coefficient (Table 2). In other words, 3 groups of people had highly distinct temperament profiles.
The three temperament profiles were named Reliable, Antisocial, and Sensitive based on traditional labels for their prominent features [17]. People in the Reliable profile were high in Reward Dependence (i.e., sentimental, friendly, approval-seeking), high in Persistence (i.e., determined), low in Novelty Seeking (i.e., deliberate, thrifty, orderly), and low in Harm Avoidance (i.e., optimistic, confident, outgoing, vigorous). This profile is frequently associated with healthy and trustworthy behavior (Table 2). In contrast, people in the Antisocial profile were low in Reward Dependence (i.e., cold, detached, independent), low in Persistence (i.e., easily discouraged), and high in Novelty Seeking (i.e., extravagant, rule-breaking, but not inquisitive), a combination frequently associated with unhealthy antisocial conduct (Table 2). People with the Sensitive profile were high in Harm Avoidance (i.e., pessimistic, fearful, shy, and fatigable), high in Novelty Seeking (i.e., impulsive, extravagant), and high in Reward Dependence (i.e., sentimental, friendly), a combination frequently associated with approach-avoidance conflicts and emotional sensitivity (Table 2).

Prediction of temperament profiles by SNP sets
We computed the association of SNP sets with temperament in Finnish subjects. SKAT showed that the association of the empirical index of temperament with particular SNP sets was stronger than with the average effects of their constituent SNPs (Table 1). We found that 51 SNP sets had significant associations with temperament (p < 4E−04). SNP sets were labeled by a genotypic identification "G", followed by 2 numbers indicating the maximum number of clusters and the order of their selection by the algorithm. For example, the SNP set G_13_3 has a p-value of 7.38E−14, whereas the best and average SNPs within this set have p-values of 1.46E−04 and 1.50E−01, respectively (Table 1). SKAT [56] and PLINK [70] estimated similar p-values for the individual SNPs (R² = 0.95, F statistic, p < 1E−41), showing that SKAT did not inflate results.
The 51 SNP sets associated with temperament are described in Table 1. We assigned names to the SNP sets based on prominent molecular processes and pathways that distinguished them (Supplementary Table S4). The temperament-related SNP sets comprised networks of SNPs that mapped to 736 genes, nearly all of which are known to influence individual differences in brain functions. In particular, these SNP sets were involved in the regulation of synaptic plasticity, long-term memory based on associative conditioning (long-term potentiation and depression, fear conditioning, reward reinforcement, habit extinction), and related processes involving stress reactivity, neurotransmission (cholinergic, monoaminergic, GABAergic, glutamatergic), resistance to aging, neuronal and glial growth, myelination, and energy production (Table 1, Supplementary Tables S4-S6).
Fig. 1 SNP sets were labeled for specificity by a pair of numbers: the maximum number of clusters from which the bicluster was selected (e.g., a 16-cluster solution may yield more specific sets than a 5-cluster solution) and the order in which the set was selected by the method (e.g., the 3rd bicluster or factor selected by FNMF when the maximum number of clusters was 5); labels usually have a prefix G for genotype or P for phenotype. Only a subset of optimal and cohesive sets is selected across all numbers of clusters (see Supplementary Methods). The SNPs within each SNP set can map to different chromosomes (e.g., 6 and 20) and exhibit distinct molecular consequences (see Supplementary Table S3). The pie chart shows the percentage of SNPs within a SNP set that belong to each type of consequence. b Dissection of a GWAS in a Finnish population to identify the genotypic and phenotypic architecture of personality measured by the TCI. The genotypic network is depicted as nodes (SNP sets) linked by shared SNPs (blue lines) and/or subjects (red lines). Each SNP set maps to one or more genes (see Supplementary Table S6 for a full list of genes associated with each SNP set). SNP sets associated with each of the three general temperament profiles are distinguished by color-coding as shown in the legend (see Table 3). The probability of well-being in the z-axis varies from high (red for high well-being) to low (green). The order of the SNP sets is based on shared subjects (x-axis) and on shared SNPs (y-axis) measured by hypergeometric statistics, so SNP sets sharing more SNPs and/or subjects are nearby. (See ill health surface in Supplementary Figure S3.) h Surface showing the pattern of health status of subjects based on both genotypic information (SNP sets) and phenotypic information (temperament sets) (as in Table 3). The probability of well-being in the z-axis varies from high (red, high well-being) to low (green). The sharing of subjects is shown for both SNP sets (x-axis) and temperament sets (y-axis). (See ill health surface in Supplementary Figure S4.)

Complex genotypic-phenotypic relationships in temperament profiles
We found that 44 of the 118 temperament sets were significantly associated with particular SNP sets (hypergeometric statistics, 1E−11 < p < 1E−03, Table 3). The genotypic-phenotypic relations were complex, demonstrating pleiotropy and heterogeneity. For example, G_13_3 (ERK-conditioned impulsivity) comprises multiple genes that regulate behavioral disinhibition in associative learning tasks, such as DAB1 and CDH13 (Table 1, Supplementary Table S4); it was frequently associated with sensitive temperament sets, but sometimes with antisocial or reliable profiles (Table 3). The 44 temperament sets were associated with the 51 SNP sets in 158 relationships that were significant by a permutation test (Table 3, empirical p < 4.6E−03).
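The overlap test between a SNP set and a temperament set can be sketched directly from the hypergeometric distribution: given N subjects, a SNP set of size a, and a temperament set of size b, the probability of sharing at least k subjects by chance is the one-sided Fisher's exact test. This standard-library sketch uses a hypothetical function name and omits the multiple-comparison and permutation steps reported above.

```python
from math import comb


def hypergeom_overlap_p(n_total, n_a, n_b, k):
    """One-sided p-value P(X >= k) for the overlap of two subject sets.

    n_total: subjects in the sample; n_a, n_b: sizes of the SNP set and the
    temperament set; k: observed number of shared subjects.
    """
    denom = comb(n_total, n_b)
    upper = min(n_a, n_b)
    # Sum the hypergeometric probabilities of overlaps k, k+1, ..., upper.
    return sum(
        comb(n_a, i) * comb(n_total - n_a, n_b - i) for i in range(k, upper + 1)
    ) / denom
```

For instance, `hypergeom_overlap_p(2149, 120, 200, 30)` asks how surprising it would be for hypothetical sets of 120 and 200 Finnish subjects to share 30 people when about 11 would be expected by chance; exact integer arithmetic keeps very small p-values accurate.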
Clusters of SNP sets sharing SNPs and/or subjects (Fig. 1b) often had similar temperament profiles associated with particular molecular processes (Table 3, Supplementary Tables S4, S7). As predicted, each of the temperament profiles was strongly associated with regulation of synaptic plasticity and associative conditioning by genes regulating the Ras-MEK-ERK and PI3K-AKT-mTOR cascades in interaction with one another, Protein Kinases A, B (also known as AKT), and C, and various physiological and psychosocial stressors (Fig. 2c, Table 1, Supplementary Table S4).

Relations among SNP sets with one another and molecular processes
We found 17 single and disjoint nodes and at least 3 subnetworks composed of highly connected nodes, shown in Fig. 1b (see Supplementary Information, 9. Identification of sub-networks). SNP sets G_8_8 (Inositol-Chemokine signaling), G_9_2 (Serotonin-Chemokine interaction), and G_7_3 (Neurogenesis) each form the hub of a subnetwork, with direct connections to 6 or 7 other SNP sets. These subnetworks were relatively disjoint (i.e., sharing few SNPs and subjects; see Supplementary Section 6 (iv)), suggesting that they are distinct antecedents of personality.
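The network view in Fig. 1b can be approximated with a simple adjacency structure: SNP sets are nodes, edges join sets that share SNPs or subjects, hubs are high-degree nodes, and connected components are subnetworks (singletons are the disjoint nodes). This standard-library sketch is a simplification: it links any two sets that share at least one SNP or subject, whereas the published network orders and links sets using hypergeometric overlap statistics. The data and function names are hypothetical.

```python
from itertools import combinations


def build_network(snp_sets):
    """Link two SNP sets when they share at least one SNP or one subject.

    snp_sets: dict name -> (set_of_snp_ids, set_of_subject_ids).
    Returns an adjacency dict name -> set of neighbour names.
    """
    adj = {name: set() for name in snp_sets}
    for a, b in combinations(snp_sets, 2):
        snps_a, subj_a = snp_sets[a]
        snps_b, subj_b = snp_sets[b]
        if snps_a & snps_b or subj_a & subj_b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def components(adj):
    """Connected components (subnetworks) by iterative depth-first search."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def hubs(adj, min_degree):
    """Names of nodes with at least min_degree direct connections."""
    return sorted(name for name, nbrs in adj.items() if len(nbrs) >= min_degree)
```

With real SNP sets, `hubs(adj, 6)` would flag sets with direct connections to 6 or more others, matching the hub criterion described above.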

Heterogenic pathways influence the same temperament trait
The genes associated with each of the three temperament profiles were largely unique to that profile. Of the 736 genes associated with temperament, 73.6% were unique to a single temperament profile: 266 to reliable, 236 to sensitive, and 40 to antisocial (Supplementary Table S8). Consequently, multiple clusters of genes lead to each individual temperament trait, as depicted in Fig. 1f. For example, high Novelty Seeking is a composite of individuals with the antisocial or sensitive temperament profiles because both are associated with features of high Novelty Seeking. Likewise, high Reward Dependence is a composite of individuals with Sensitive or Reliable profiles.
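The profile-specificity count reduces to set arithmetic: a gene is unique to a profile if it appears in that profile's gene list and in no other. With the counts reported above, (266 + 236 + 40) / 736 = 542 / 736 ≈ 73.6%. The sketch below, with hypothetical function names and toy gene labels, shows the computation.

```python
def unique_genes(profile_genes):
    """Map each profile to the genes associated with it and with no other profile.

    profile_genes: dict profile name -> set of gene symbols.
    """
    out = {}
    for profile, genes in profile_genes.items():
        others = set().union(*(g for p, g in profile_genes.items() if p != profile))
        out[profile] = set(genes) - others
    return out


def unique_fraction(profile_genes):
    """Fraction of all genes (across profiles) that are unique to one profile."""
    all_genes = set().union(*profile_genes.values())
    uniq = unique_genes(profile_genes)
    return sum(len(g) for g in uniq.values()) / len(all_genes)
```

With toy input `{"reliable": {"A", "B", "C"}, "sensitive": {"C", "D"}, "antisocial": {"D", "E"}}`, genes C and D are shared, so 3 of the 5 genes (60%) are profile-specific.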
More generally, we refer to the multiple genotypicphenotypic networks that contribute to individual traits as a pipeline, as depicted in Fig. 1f. The specific genes and molecular processes in the pipelines for each of the four temperament traits are described in Supplementary Tables S9-S11.

Complex genotypic-phenotypic relationships influence health status
Combining genotypic and phenotypic information provided more information than either alone for both well-being (Fig. 1g vs. 1h) and ill-being (Supplementary Figures S3 vs. S4). When health status was based on the joint relationship of SNP sets and temperament sets, all three temperament profiles were well distinguished in terms of the probabilities of both ill-being (p < 1.58E−42, ANOVA statistics, Fig. 1c) and well-being (p < 1.05E−23, ANOVA, Fig. 1d). In contrast, when health status was based on temperament scores only, the probabilities of ill-being (p-value < 1.27E−06, ANOVA statistics, Supplementary Figure S5A) and well-being (p-value < 1.33E−05, ANOVA statistics, Supplementary Figure S5B) differentiated only the reliable profile from the other two.
Fig. 2 a, b Types of genetic variants mapped by SNP sets associated with temperament: a specific molecular consequences (Supplementary Table S5) and b their subtypes. Genes related only to temperament sets (red) were less often protein coding and more often RNA genes than those also associated with temperament sets (blue). c Cell displaying the molecular pathways containing genes associated with the Sensitive and Antisocial profiles. The uncovered genes influence the Ras-MEK-ERK (MAPK), PI3K-AKT-mTOR, and Protein Kinase A, B, C pathways that regulate associative conditioning (see also Supplementary Tables S4, S7). d Multiple SNPs within a SNP set can affect a single gene or multiple genes in many ways (Supplementary Table S3). The PIP4K2A gene, the ARMC3 divergent regulatory region, and the ARMC3 coding region are illustrated. SNPs in the SNP set G_41_37 may affect regulatory regions (thereby inhibiting transcription), whereas SNPs from SNP set 39_26 are mostly located in intronic regions (thereby blocking or decreasing protein production). The SNP sets are associated with profiles exhibiting distinct temperament features (sensitive vs. antisocial).
We found 46 "switch" genes associated with temperament. These are a small number of genes within a particular SNP set whose presence or absence is associated with a switch in health status (Supplementary Table S12). These included 23 protein-coding genes, 10 lincRNAs, 4 other ncRNAs, 6 pseudogenes, 1 antisense, and 1 sense-intronic gene.
Overall, about 67% of the 736 genes associated with temperament may be involved in regulatory processes: these included transcriptional regulators (10%), lncRNAs (14%), other RNA genes (5%), and targets of microRNAs (36%), as identified in the TRANSFAC® release 2017.1 database (Supplementary Table S13). We identified one microRNA (MIR7162) in association with temperament; according to TRANSFAC, it targets 116 of the 736 genes we found associated with temperament.

Replication of results in two independent samples
We tested the replicability of our findings in the Finnish study by carrying out the same analyses in the German and Korean samples. All but one (98%) of the 51 SNP sets associated with temperament in the Finnish sample were identified in one or both of the replication samples: 40 were identified in both the Korean and German samples, 5 in the Korean sample only, and 5 in the German sample only (Supplementary Table S14). We also found that all but one (98%) of the 44 temperament sets associated with SNP sets in the Finnish sample were replicated in the other samples: 31 in both, 7 in the Korean sample only, and 5 in the German sample only (Supplementary Table S15).
Overall, the genotypic-phenotypic relations between the SNP and temperament sets identified in the Finnish sample were closely matched by those observed in both the Korean (89%) and German (76%) studies (Supplementary Table S16). The genotypic-phenotypic relations of people with reliable and sensitive temperaments were strongly replicated in both samples. However, at least two antisocial temperament sets strongly associated with ill-being and with several SNP sets were missing in the German sample, which had been screened to exclude anyone with a history of psychiatric illness in themselves or their first-degree relatives. As expected, the absence of these unhealthy temperament sets reduced the replicability of genotypic-phenotypic relations in the German sample (Supplementary Figures S6). The strength of correspondence between replicated sets was calculated using hypergeometric statistics and multi-objective optimization techniques (see Pareto values in Supplementary Tables S17, S18).
Prior literature reporting associations with TCI-related keywords was systematically surveyed in PubMed to identify genes that had been reported to be associated with TCI traits (Supplementary Tables S19, S20). We found that 120 of our detected genes were related to genes, protein families, or gene pathways previously associated with TCI traits (Supplementary Table S19). Among the genes in temperament-related SNP sets, we also detected 74% of the 111 genes that had been previously associated with TCI temperament or character traits, and 78% of the 74 genes that had previously been reported in association with TCI temperament traits (Supplementary Table S20). Considering all 111 genes previously associated with any TCI traits in a multi-omic approach (Supplementary Table S20), we recovered 6 genes exactly, another 32 variants from the same protein families, and another 44 genes in the same molecular pathways previously reported.
[TCI subscales are indicated as Self-directedness (sd1 to sd5), Cooperativeness (co1 to co5), and Self-transcendence (st1 to st3). Subscale values were divided by median split into high and low scores (low scores are distinguished by a preceding L). The number of subjects in each character set is specified (#S). The probabilities of well-being and of ill-being are shown for subjects in each character set (see also Supplementary Table S2).]

Estimation of heritability and environmental influences
The heritability of temperament controlling for outliers was estimated as 48% in the Finnish sample, 53% in the German sample, and 37% in the Korean sample (Supplementary Table S21). In addition, 87% of the SNP sets were strongly associated with the empirical temperament index (5E−08 > p-value > 5E−73). In other words, the SNPs that comprise the different SNP sets strongly distinguished the temperament features of the subjects in each set, indicating that each individual SNP set contributed significantly to the total distributed heritability (Supplementary Section 9). Consequently, when the genotypic sets were used with the PGMRA classifier to classify the well- and ill-being of the subjects, the predictions were highly accurate (average areas under the curve (AUC) of 0.940 and 0.922, respectively) (Supplementary Figure S8).
We also considered environmental influences in the Finnish sample. There were direct associations of sets of environmental influences in childhood and adulthood (Supplementary Table S22A) with temperament sets (Supplementary Table S22B) and with SNP sets (Supplementary Table S22C). The impact of these correlations was small, so the heritability estimate was still 46-52% in the Finnish sample when adjusted for gene-environment correlation (Supplementary Table S21).
[Association is measured by Fisher's exact test (hypergeometric). Probabilities of well-being and ill-being are given for subjects in the character sets, the SNP sets, and subjects identified in both jointly. i indicates Temperament sets that are more specific than their parental sets, which are also selected]
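The AUC values reported for the PGMRA classifier can be read as the probability that a randomly chosen positive subject (e.g., one in the well-being group) receives a higher classifier score than a randomly chosen negative one. A minimal standard-library sketch of that Mann-Whitney formulation follows; the scores are assumed to come from some classifier (the PGMRA classifier's internals are not reproduced here), and the function name is hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    scores: classifier scores (higher = more likely positive).
    labels: 1 for positive subjects, 0 for negative subjects.
    Tied positive/negative pairs count as half a win.
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to chance ranking and 1.0 to perfect separation; the pairwise loop is quadratic, which is fine for illustration but a rank-based version would be preferable for thousands of subjects.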
Furthermore, 12 novel associations between SNP sets and temperament sets were uncovered when environmental influences were used as mediators (Supplementary Table S22D). Seven SNP sets associated with the antisocial profile depended on exposure to low parental income during childhood, stressful life events in adulthood, and rural residence in childhood or adulthood (p < 3.4E−03 to 6.3E−04). Two SNP sets associated with sensitive profiles depended on the experience of tolerance and low income in childhood (p < 9.7E−04 to 4.7E−05). One SNP set associated with reliable profiles depended on high parental income throughout childhood (p < 1.5E−04).

Discussion
SNPs that map to 736 genes explained 48% of the variability in temperament in the Finnish sample, thereby accounting for nearly all the heritability of human temperament expected from twin studies. More specifically, most of the genes that we identified in a strictly data-driven manner are known to regulate synaptic plasticity, associative conditioning, and related processes of stress reactivity and neurotransmission. These findings confirm our hypothesis that the highly conserved molecular processes that regulate associative conditioning in experimental animals account substantially for the heritability of human temperament. Our findings are supported in independent replications by GWAS and by independent studies of gene expression during habit learning in experimental animals [7,32,33].

Molecular pathways for temperament and associative conditioning
Most of the SNP sets associated with temperament were involved in the regulation of habit learning and synaptic plasticity in response to extracellular stimuli, mediated mainly by the Ras-MEK-ERK and PI3K-AKT-mTOR cascades (Table 1, Fig. 2c). As predicted, these main pathways of fast adaptive response operated in conjunction with related processes for stress reactivity, neurotransmission, chromatin plasticity, neuronal and glial growth, myelination, neuroprotection, and energy production (Table 1; Supplementary Tables S4–S6). The identified pathways for associative conditioning are known to intersect, regulating each other and co-regulating downstream functions [52], as illustrated in Fig. 2c. The ERK and PI3K cascades are integrated through cross-activation, cross-inhibition, negative feedback, and positive and negative influences that converge on the same complex (e.g., mTOR in Fig. 2c). In addition, protein kinases A, B (also known as AKT), and C, which regulate these pathways, are relatively non-selective [52]. Such interactions are expected to produce complex genotypic-phenotypic relationships, as we observed.
These findings about specific molecular pathways for human temperament have important implications. First, they confirm our hypothesis that human temperament is based on the highly conserved mechanisms for habit learning, which supports a precise definition of temperament in terms of associative conditioning [17,18]. Second, the independent experimental evidence for specific molecular pathways of associative conditioning supports the validity of the strictly data-driven method we used to analyze and interpret genome-wide association data.
These results should encourage widespread use of PGMRA for the analysis of complex phenotypes in a variety of settings, including GWAS [54,55] and neuroimaging [53]. For example, PGMRA provides an effective way to allow for epistasis and gene-environment interactions, which are prominent in complex phenotypes, thereby overcoming the hidden heritability problem (that is, the consistent inability to account for most of the heritability of complex traits when only the average effects of genes are considered). The generalized clustering method implemented in PGMRA can be interpreted as a deep unsupervised learning process based on non-negative matrix factorization (NMF) that can identify clusters of individuals with distinct features from various types of information, such as genotypes, phenotypes, and environments (Supplementary Figure S1). Such clusters, SNP sets, and temperament sets can serve as auto-encoders for recommender systems in precision medicine [55].
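As a much simpler illustration of the NMF idea behind this kind of clustering, the sketch below applies ordinary single-layer NMF (the multiplicative updates of Lee and Seung for the Frobenius loss) to a hypothetical subjects-by-SNPs genotype matrix and assigns each subject to the factor with its largest loading. It is not PGMRA: it omits the deep, multi-domain steps described in the text, and the helper names and toy matrix are invented for illustration.

```python
import numpy as np


def nmf(X, k, n_iter=1000, seed=0, eps=1e-10):
    """Factor non-negative X (subjects x features) as W @ H with Lee-Seung
    multiplicative updates that minimize the Frobenius reconstruction error."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative at every step.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def nmf_clusters(X, k, n_restarts=10):
    """Hypothetical helper: keep the best of several random restarts (NMF is
    non-convex) and assign each subject to the factor with its largest loading."""
    best = None
    for seed in range(n_restarts):
        W, H = nmf(X, k, seed=seed)
        err = np.linalg.norm(X - W @ H)
        if best is None or err < best[0]:
            best = (err, W, H)
    _, W, H = best
    return W.argmax(axis=1), W, H


# Hypothetical toy genotypes: 6 subjects x 6 SNPs, minor-allele counts 0/1/2.
X = np.array([
    [2, 2, 1, 0, 0, 0],
    [2, 1, 2, 0, 0, 0],
    [1, 2, 2, 0, 0, 0],
    [0, 0, 0, 2, 2, 1],
    [0, 0, 0, 1, 2, 2],
    [0, 0, 0, 2, 1, 2],
], dtype=float)

labels, W, H = nmf_clusters(X, k=2)
print(labels)  # cluster index assigned to each subject
```

Read row-wise, W gives each subject's loading on each factor and H gives each factor's SNP profile, which loosely parallels the pairing of subject clusters with SNP sets described in the text.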

Strengths and limitations
The major strength of this study is the strong replicability of its findings in three independent samples from different cultures and in independent studies of gene expression during behavioral conditioning of experimental animals. Cluster analysis is a hypothesis-generating method with no unique solution to the number of clusters, the features relevant to each cluster, or the degree of homogeneity to demand of each cluster; however, PGMRA includes a practical and robust solution to each of these problems [53,54].
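To show why the number of clusters has no unique answer and must be chosen by some criterion, the sketch below uses one common heuristic, the mean silhouette width, swapped in for illustration; it is not the procedure used in PGMRA. It runs plain k-means (with deterministic farthest-point seeding) for several candidate values of k on simulated two-dimensional data with three well-separated groups and keeps the k with the highest mean silhouette. The data and helper names are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point seeding."""
    centers = [X[0]]
    for _ in range(1, k):
        # Next seed: the point farthest from all seeds chosen so far.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def silhouette(X, labels):
    """Mean silhouette width; singleton clusters score 0 by convention."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    widths = []
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself from its own cluster
        if not same.any():
            widths.append(0.0)
            continue
        a = D[i, same].mean()  # mean distance within own cluster
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        widths.append((b - a) / max(a, b))
    return float(np.mean(widths))


# Hypothetical data: three tight groups of 20 points in two dimensions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc, 0.3, size=(20, 2)) for loc in ([0, 0], [5, 0], [0, 5])])

sil_scores = {k: silhouette(X, kmeans(X, k)) for k in range(2, 6)}
best_k = max(sil_scores, key=sil_scores.get)
print(sil_scores, best_k)
```

A different criterion (for example, reconstruction error or stability across resamples) can favor a different k on less clean data, which is exactly the non-uniqueness the paragraph above describes.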

Conclusions and recommendations for future research
We were able to describe and replicate the complex genotypic-phenotypic risk architecture of temperament in three independent samples of people. Our unbiased data-driven findings confirm the hypothesis that temperament is based on associative conditioning and related processes, particularly stress reactivity in response to extracellular stimuli. We have found that different molecular and cognitive processes are associated with character [40], but health status depends on genotypic-phenotypic relations that influence both temperament and character. Therefore, we recommend further work to examine the overlap and interactions between temperament and character.

Compliance with ethical standards
Conflict of interest: The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.