Introduction

Schizophrenia is among the most debilitating and highly heritable of mental disorders. Recent shifts in our conceptualization of neuropsychiatric illnesses suggest that such disorders might be better understood in terms of underlying behavioral or neurobiological dimensions rather than as categories.1 Evidence from neuroscience,2, 3, 4, 5 behavioral genetics6 and prospective clinical studies7, 8 suggest that schizophrenia is associated with quantitative variations in neurobiological and neurocognitive systems. These observations have led to several hypothesized relationships between schizophrenia and neurocognitive abilities,9, 10, 11 though genetic studies examining the association between schizophrenia and general cognitive ability or educational attainment have been inconsistent.12, 13

Recent genome-wide associations studies (GWAS) provide a window into the genetic architecture of schizophrenia, and support a complex model of psychosis liability. First, these studies demonstrate that, individually, common single-nucleotide polymorphisms explain very little of the variation in schizophrenia liability.14 The effects of common genetic variants must be taken in aggregate to explain a meaningful proportion of schizophrenia risk, indicating that the genetic liability for schizophrenia is the result of differences across many hundreds to thousands of genes and regulatory regions. GWAS data now provide the means to quantify aggregate genetic risk for schizophrenia (hereafter referred to as schizophrenia polygenic risk) for any individual, regardless of their familial or phenotypic risk,14, 15 paving the way for a renewed interrogation of intermediate phenotypes (also known as endophenotypes).16, 17, 18 We use the term ‘intermediate phenotype’ to refer to measurable variations in biological or information processing systems that are thought to lie along a causal pathway between genetic risk and mental disorder and might be used to understand the downstream effects of validated genetic-risk factors.19 Understanding downstream effects of schizophrenia genetic-risk variants in terms of intermediate phenotypes can highlight potential pathways that lead from genetic variation to schizophrenia, as well as potential biologically informative phenotypes to study outside of case populations.18, 19

Here we take advantage of a data set of ~4300 individuals ages 8–21 years collected through the Philadelphia Neurodevelopmental Cohort (PNC)20, 21, 22 and a replication sample of ~700 individuals tested as part of the Harvard/Massachusetts General Hospital (MGH) Brain Genomics Superstruct Project (GSP).23 The PNC data set includes genome-wide data on all participants and a comprehensive assessment of neurocognitive function across domains of general and social cognition, where measures were selected and designed to map onto specific neural circuitry.24, 25, 26 We took an unbiased approach towards understanding the relationship between schizophrenia polygenic risk and multiple domains of neurocognitive performance, by first exploring the relationship between polygenic risk and performance across all neurocognitive measures. Genetic effects on complex traits are known to be broadly pleiotropic—a phenotypic screening approach allows us to identify the profile of associations between schizophrenia genetic risk and neurocognition, accounting for such pleiotropy within the neurocognitive phenotypes assessed. We then examined whether there was any evidence of developmentally specific effects of polygenic risk on neurocognition. We operationalize polygenic risk for schizophrenia as the weighted sum of the effects of many thousands of risk alleles across the genome, identified through large-scale schizophrenia GWAS analyses.14, 15 Previous research has suggested that schizophrenia vulnerability might impact neurocognition at specific developmental transitions that happen during puberty and adolescence,27, 28 and thus associations between polygenic risk and neurocognition might be restricted to a particular age range or not begin until a critical developmental period begins. Using the resources of the PNC and GSP, we explore and replicate tests of the hypothesis that individual differences in schizophrenia genetic risk are related to quantitative dimensions of neurocognition.

Materials and methods

Participants

Our primary analytic sample was drawn from the PNC, a Children’s Hospital of Philadelphia health network-based sample of ~9500 individuals ages 8–21 years from the greater Philadelphia area (details in Calkins et al.20 and Gur et al.24). Participants who provided assent/consent gave genetic samples and written permission to be recontacted for further research. The University of Pennsylvania and Children’s Hospital of Philadelphia Institutional Review Boards approved the study. After stratification based on sex, age and ethnicity, PNC participants were recruited through random selection. Inclusion criteria were (1) ability to provide informed consent (parental consent where age<18 years), (2) English proficiency and (3) physical and cognitive functioning sufficient to complete clinical assessment interviews and cognitive testing on a computer. Participants were not excluded for any other medical concerns, including psychiatric illness. No recruitment was done at psychiatric clinics, however, so the sample is not enriched for individuals seeking psychiatric care.

Our replication sample was drawn from the Harvard/MGH Brain GSP. The GSP is a study cohort that includes neuroimaging, genomic and cognitive data on over 4000 healthy participants.23 The present sample included an age-restricted subsample that performed behavioral tasks compatible with the discovered effects in the PNC. Because of the differences in protocols, only one PNC assessment could be tested for replication. All GSP participants provided written informed consent for biomedical research approved by the Partners Healthcare Institutional Review Board or the Harvard University Committee on the Use of Human Subjects. Inclusion criteria were (1) English proficiency, (2) age 18–35 years, (3) no history of psychiatric illness or major health problems and (4) physical and cognitive functioning sufficient to complete magnetic resonance imaging scanning and cognitive testing.

Genetic analysis

As polygenic risk scores are sensitive to ancestry, we restricted our analysis to genotyped individuals with self-described white non-Hispanic ancestry. Within this subsample, we further excluded population outliers (prior to imputation) based on directly genotyped single-nucleotide polymorphism data. Data cleaning and imputation were performed using standard procedures (see Robinson et al.21 for PNC data; Hibar et al.29 for GSP data; see also Ripke et al.30). In the PNC data set, the ancestry threshold was relaxed to a pi_hat of 0.1 (0.125 for the GSP data set), which is a level of relatedness equal to or less than that of first cousins. Imputed data were used to generate individual schizophrenia polygenic risk scores using the procedures described in Purcell et al.15 and Ripke et al.14 In brief, polygenic risk scores estimate genome-wide common variant liability for a trait through a weighted sum of many thousands of risk alleles. The polygenic risk score used here was generated using summary statistics from the Psychiatric Genomics Consortium recent meta-analysis of schizophrenia,14 and includes only single-nucleotide polymorphisms with P<0.05. This version of the score was selected because it most commonly maximized the schizophrenia risk explained in independent case–control samples (~20% of case–control variation).14 In addition to limiting the analysis to participants of European descent, the first 10 principal components of ancestry were controlled for in all analyses. There was no relationship between schizophrenia polygenic risk and age or sex in either sample.

Phenotypic neurocognitive assessment and analysis

PNC participants completed the Computerized Neurocognitive Battery (CNB).24, 25 The CNB was developed from tasks that map onto specific brain systems, as identified through functional neuroimaging.26 Psychometric properties (adult and pediatric samples) and task descriptions for the CNB measures are included elsewhere.21, 24, 25, 26, 31 The CNB provides accuracy and speed measures of: executive function (abstraction and mental flexibility, attention and working memory), memory (verbal, spatial and facial), complex cognition (verbal reasoning, nonverbal reasoning and spatial processing), social cognition (emotion identification, emotion differentiation and age differentiation) as well as speed measures for sensorimotor and pure motor function. All PNC neurocognitive tests were completed on a computer in the laboratory or at home, according to family or participant preference. For specific test names, see Table 1.

Table 1 Specific test names are given corresponding to each neurocognitive domain assessed

GSP participants completed a battery of personality, cognitive and behavioral measures assessing a broad range of domains. A full list of measures and details of neurocognitive assessment for the GSP are described in Holmes et al.23 A subset of measures that are available in the PNC were also available in the GSP data set (in comparable or identical form). All GSP neurocognitive tests were completed using online versions of the tests, on a participant's own computer.

Differences in performance attributable to age and sex were regressed out of all neurocognitive variables (speed and accuracy) prior to analysis. Any individual scores more than four s.d. from the mean for a particular test were excluded. Outliers were determined based on age group means. Linear regression was then used to examine the relationship between schizophrenia polygenic risk and neurocognitive performance. In our primary analytic sample (PNC), Bonferroni correction was applied to correct for the number of comparisons across all neurocognitive variables (26 comparisons: 12 accuracy variables and 14 speed variables) with an alpha threshold of 0.05. Bonferroni correction is appropriate for family-wise error adjustment even in the case where all phenotypes are independent of one another. As neurocognitive performance across difference measures tends to be somewhat correlated, this correction ensures that the probability of a false positive is (at most) 0.05, and provides a strict standard of statistical evidence in these analyses. Thus, this correction is conservative given the correlation structure of the neurocognitive phenotypes.21

Results

The final analytic sample from the PNC included 4303 participants (50% female) ranging in age from 8 to 21 years (near uniform distribution with mean age of 13.8 years). As described in Robinson et al.,21 most of the 838 genotyped white non-Hispanic PNC individuals excluded from this analysis were removed for outlying ethnicity or relatedness to another person in the data set. After Bonferroni correction, two neurocognitive measures were significantly associated with schizophrenia polygenic risk at P<0.05 (Figure 1a). These were verbal reasoning speed (analogies) (β=−0.058 l, P<5E−4 uncorrected, P<5E−3 corrected) and emotion identification speed (β=−0.066, P<5E−5 uncorrected, P<1E−3 corrected). For both variables, increases in polygenic risk were related to linear decreases in speed of responses, with the highest quartile showing the slowest speeds (longest reaction times) and the lowest quartile showing the highest speeds (shortest reaction time; Figure 1b). This was not the case for other variables related to response speed, for example sensorimotor speed, where no differences were observed between the lowest and highest quartiles of schizophrenia genetic risk (Figure 1b). Schizophrenia polygenic risk was unrelated to matrix (nonverbal) reasoning ability, a common proxy for general intelligence (no relationship after correction for multiple comparisons; P=0.035 uncorrected, P=0.91 corrected) or general cognitive ability (that is, general intelligence or ‘g’, based on factor analysis; P= 0.64). We estimated ‘g’ using standard methods,32 specifically by taking scores from the first principal component (unrotated) based on a principal component analysis using all accuracy variables (tests 1–12; Table 1) in the PNC sample.

Figure 1
figure 1

Schizophrenia polygene scores and neurocognitive performance. (a) Linear regression was used to estimate associations between schizophrenia polygenic risk, estimated from genome-wide data, and performance for each neurocognitive variable (labels shown on the right). To best illustrate the strength of the evidence for each association, relationships are plotted in terms of the negative base-10 logarithm of the P-value, when regressing neurocognitive performance on schizophrenia polygenic risk estimates for each participant. The gray line shows the threshold for statistical significance based on P<0.05, uncorrected. The black line shows the threshold for statistical significance after Bonferroni correction for all 26 comparisons (P<0.05 corrected). Red bars show variables where an association exceeded the threshold for statistical significance (verbal reasoning speed and emotion identification speed). For nonsignificant associations, black bars indicate a negative relationship between schizophrenia polygenic risk and neurocognitive performance (that is, greater polygenic risk associated with poorer performance) and gray bars indicate a positive relationship. (b) The relationship between schizophrenia polygenic risk and neurocognitive performance is shown for the two speed variables (emotion identification and verbal reasoning) where associations were statistically significant after correction for multiple statistical tests and (for comparison purposes) associations with a general measure of response speed. Participants from the primary analytic sample (PNC data set) were divided into four groups of approximately equal size based on level of schizophrenia polygenic risk. Quartile 1 (Q1) includes individuals with the lowest schizophrenia polygenic risk. Quartile 4 (Q4) includes individuals with the highest schizophrenia polygenic risk. Mean z-score is plotted on the y axis, with higher values reflecting better performance. For both emotion identification and verbal reasoning speed, increasing polygenic risk was linearly associated with decrease in neurocognitive performance. There was no consistent relationship—significant or otherwise—between schizophrenia polygenic risk and sensorimotor speed.

To understand the specificity of the relationship between schizophrenia polygenic risk and speed of verbal reasoning and emotion identification, we performed the same analysis, controlling for overall motor speed, sensorimotor speed, as well as matrix/nonverbal reasoning ability. Both associations were similar after controlling for motor speed and sensorimotor speed (emotion identification speed: β=−0.079, P<5E−7 uncorrected, P<1E−5 corrected; verbal reasoning speed: β=−0.065, P<5E−5 uncorrected, P<1E−3 corrected) and matrix reasoning ability (emotion identification speed: β=−0.065, P<5E−5 uncorrected, P<1E−3 corrected; verbal reasoning speed: β=−0.057, P<5E−4 uncorrected, P<1E−2 corrected).

To understand the impact of speed accuracy trade-offs on the strength of each association, we also looked at both emotion identification speed and verbal reasoning speed, controlling for emotion identification accuracy and verbal reasoning accuracy, respectively. Emotion identification speed was associated with schizophrenia polygenic risk, even after controlling for emotion identification accuracy (β=−0.065, P<5E−5 uncorrected, P<1E−3 corrected). Verbal reasoning speed was also associated with schizophrenia polygenic risk after controlling for verbal reasoning accuracy (β=−0.056, P<5E−4 uncorrected, P<1E−2 corrected).

For both variables, schizophrenia polygenic risk accounted for approximately 0.3–0.5% of the variation in neurocognitive performance.

Developmental specificity

We evaluated the relationship between polygenic risk and speed over development, by looking at the interaction between schizophrenia polygenic risk and age for both emotion identification speed and verbal reasoning speed (Figure 2). There was no interaction between schizophrenia polygenic risk and age for either emotion identification speed (P=0.38) or verbal reasoning speed (P=0.11). The relationship between schizophrenia polygenic risk and neurocognitive performance was statistically significant by 8 years of age for verbal reasoning speed (β=−0.094, P<5E−2) and by 9 years of age for emotion identification speed (β=−0.11, P<5E−2). Thus, there was no evidence that the associations between schizophrenia polygenic risk and emotion identification, or between schizophrenia polygenic risk and verbal reasoning, were emerging later in development (for example, after puberty). There was also no evidence for the opposite pattern—early associations that disappeared later in development due to potential compensatory changes. Overall, our analyses did not provide any indication of modulation in the relationship between schizophrenia polygenic risk and emotion identification or schizophrenia polygenic risk and verbal reasoning across development.

Figure 2
figure 2

Schizophrenia polygene scores as predictors of emotion identification speed and verbal reasoning speed, across age. Shown are associations between schizophrenia polygenic risk and speed of emotion identification and verbal reasoning, at each year of age. Bars give ±1 s.e. of the effect size estimate. Although both measures were significantly associated with schizophrenia polygenic risk, there was no significant interaction of polygenic risk and age on neurocognitive performance for either variable.

Replication in an independent sample

In the GSP data set, there were 695 participants completed the emotion identification measures used in the CNB and met our inclusion criteria. These participants were 53% female and ranged in age from 18 to 35 years (mean: 21.5 years; s.d.: 3.2 years). This independent sample allowed us to test replication of the relationship between schizophrenia polygenic risk and emotion identification speed. The stimuli for the emotion identification task were shared from the CNB at the initiation of the GSP data collection effort, allowing for a true, independent replication using the same stimuli and test in an independent cohort. The association between schizophrenia polygenic risk and emotion identification speed was replicated, despite GSP participants being sampled from a very different population (β=−0.09, P<0.05; see Figure 3). This association again survived controlling for general intelligence (as estimated by matrix reasoning performance) and a measure of psychomotor response speed. The CNB emotion identification test was the only neurocognitive test that overlapped between the two samples. Verbal reasoning speed was not one of the neurocognitive measures included in the GSP, so we were unable to assess replication for this association.

Figure 3
figure 3

Schizophrenia polygenic risk and emotion identification speed. Replication in an independent sample of adults. Shown are effect size relationships between emotion identification speed (controlling for age and sex) and schizophrenia polygenic risk in the original discovery sample (PNC ages 8–21 years) and the replication sample (GSP ages 18–35 years). Black bars show s.e. of the β values in both samples. GSP, Genomics Superstruct Project; PNC, Philadelphia Neurodevelopmental Cohort.

Discussion

In two large samples, spanning middle childhood to adulthood, we identify a small, but statistically replicable relationship between schizophrenia polygenic risk and speed of emotion identification. We also found an equally significant association between schizophrenia polygenic risk and speed of verbal reasoning in our developmental sample (ages 8–21 years). Both associations survived correction for multiple comparisons in our developmental (discovery) sample and the inclusion of covariates related to general intelligence and sensorimotor speed. Critically, the association between schizophrenia risk and emotion identification replicated in an independent, demographically distinct sample of adults.

We found no evidence that either association depended on developmental phase: effects were significant by 9 years of age and effect sizes were consistent across middle childhood, adolescence and early adulthood. These results suggest a relatively early perturbation affecting neurocognitive development that is present before the typical onset of psychotic illness in young adulthood. Finally, we found that even intermediate levels of schizophrenia polygenic risk were associated with reductions in emotion identification and verbal reasoning speed. In other words, the relationship between schizophrenia polygenic risk and performance was evident across the spectrum of polygenic risk, as opposed to being observed only in those at the highest polygenic risk.33 Our results indicate that common variants that increase risk for schizophrenia impact the development of specific aspects of social cognition and possibly verbal reasoning.

The finding of a relationship between schizophrenia polygenic risk and emotion identification performance is consistent with a large body of literature emphasizing the importance of social cognition in schizophrenia, individuals who go on to develop schizophrenia and individuals at risk of schizophrenia. Schizophrenia is associated with profound and consistent deficits in social cognition.34, 35, 36, 37 These deficits appear early, sometimes decades before the onset of illness38, 39, 40 and are strongly related to core aspects of symptomatology and everyday social functioning.41, 42, 43, 44, 45 Differences in social cognition are observable in healthy individuals with a potential genetic predisposition towards developing schizophrenia.46, 47, 48 Early environmental factors linked with the development of schizophrenia49, 50 are also associated with adult differences in social cognition.51 Even in healthy populations, quantitative differences in psychosis-like characteristics are linearly related to differences in emotion identification.33 There is extensive literature with functional neuroimaging indicating abnormalities in regional brain activation to emotion identification tasks in patients with schizophrenia4, 5, 52 and those with psychosis spectrum features,53 including individuals from the PNC who underwent neuroimaging.54 Our specific findings provide further support, based on polygenic risk estimates, that schizophrenia is a disorder that is fundamentally related to the development of social cognition, specifically the efficiency of emotion identification.

Abnormalities in verbal reasoning have also been identified in schizophrenia and first-degree relatives of schizophrenia patients.55 Although this association was not available for replication due to the absence of a comparable phenotype in our replication sample, the effect size difference was comparable to our finding relating polygenic risk and emotion identification speed. Perhaps deficits in processing emotion and verbal communication combine in creating difficulties in social communication, contributing to social functioning deficits related to schizophrenia risk. Abnormalities in the development of social understanding and communication—in the presence of other life events or cognitive vulnerabilities—could set the stage for psychosis later in life. This possibility is consistent with the findings in the entire PNC sample that individuals with psychosis spectrum features are delayed in their neurocognitive development already by 8 years of age.56 Notably, the most pronounced delays were in complex reasoning and social cognition.

These results have implications for our understanding of schizophrenia genetic risk in relation to neurocognition, particularly social cognition. First, our findings validate the notion that risk genes might perturb cognition in a dimensional fashion—with a linear (as opposed to threshold) relationship between genetic load and cognitive impairment. We found such a relationship for both emotion identification speed and verbal reasoning speed, indicating that quantitative dimensions of genetic risk and cognitive vulnerability can provide critical information for understanding psychopathology. Further, that these relationships did emerge so early may make it unlikely that social deficits arise as a consequence of the expression of psychotic or psychosis-like characteristics (for example, stigma or social rejection could drive social isolation and subsequent deterioration of social cognitive skills in patients with schizophrenia). Indeed, the differences at even low levels of polygenic risk point to a more direct and fundamental relationship.

It is noteworthy that our findings were mainly for speed of processing rather than accuracy. This effect is consistent with earlier studies that showed reduced speed in individuals genetically related to probands with schizophrenia and may indicate compensatory strategies to mitigate vulnerability.57 Furthermore, processing speed has been implicated in meta-analyses as a major deficit domain in schizophrenia.11 Longitudinal studies are needed to examine whether reduced speed of processing during development portends deficits that extend to accuracy and eventually to schizophrenia.

The present study has several limitations. The wide age range spanning from childhood to early adulthood is an epoch associated with rapid improvement in cognitive performance. While enabling the identification of developmental effects, this characteristic of the PNC may mask effects that could be detected in equally sized samples of adults with a narrower age range. The effects we report, although significant, are small—accounting for only between 0.5 and 1% of variance in neurocognitive performance. Given the small effect size, the association between schizophrenia polygenic risk and neurocognitive performance reported is primarily of theoretical interest by illustrating how polygenic risk is linked to a specific dimensional cognitive domain. The small effect sizes reported here suggest that although current schizophrenia polygenic risk models may be useful for identifying mechanisms, they should not be used for making predictions about neurocognitive performance or making conclusions about individuals. The size of the identified associations may also reflect the overall normative nature of the sample; estimated risk could still confer clinically significant effects in vulnerable individuals. Another limitation of these analyses is that we only looked at schizophrenia polygenic risk, and not polygenic risk for other neuropsychiatric diseases. Emotion identification speed might also be related to polygenic risk for other neuropsychiatric disorders beyond schizophrenia. Finally, only one of the effects could be tested in an independent sample because the other measure was not available, underscoring the need for harmonizing measures across genomic projects.

Genome-wide association analyses now provide robust and replicable methods for quantifying genetic risk for schizophrenia. Although such developments are an important milestone toward unraveling the biological architecture of mental disorders, they have been broadly acknowledged as only first steps toward understanding mechanisms. Here we show that variations in emotion identification speed are one replicable downstream effect of schizophrenia polygenic risk. This finding suggests specific hypotheses about the relationship between schizophrenia polygenic risk and neurobiology that could be addressed by future work. Given the large body of research on the neural basis of emotion identification, future studies might focus on the link between schizophrenia polygenic risk and neural responses in subcortical and superior temporal regions that have been consistently linked with emotion perception. Future work might also look at the developmental trajectories of these regions in relation to schizophrenia polygenic risk, to understand how differences in early development might lead to neurocognitive variations in middle childhood and beyond, and ultimately confer psychiatric vulnerability.

Conclusion

Notwithstanding these limitations, our results highlight a new and important role for intermediate phenotypes in the GWAS-era: as a means of understanding mechanism from validated genetic predictors of disease.19 In this role, intermediate phenotypes are not only useful but also fundamental, as they allow us to understand how genes contribute to the development of disease. Finally, our results point to one potential mechanism linking genetic risk for schizophrenia to psychosis through alterations in the efficiency of emotion identification. These alterations arise early in development before typical onset of psychosis. This finding is consistent with decades of research on the fundamental nature of social deficits in schizophrenia, further indicating that these social deficits may arise from schizophrenia risk-related common genetic variants. The finding that schizophrenia has a genetic basis in individual differences opens up many pathways for further study, including the translation of knowledge from social neuroscience and social cognitive psychology toward the elucidation of the roots of major psychiatric illness.