The term “executive functions” (EFs) describes a group of higher-level cognitive abilities [1], including the regulation of thoughts and actions in daily life [1, 2]. As humans age, EFs pass through different developmental stages, in which great variability is observed both within and between individuals [3, 4]. EFs naturally decline with advanced age [4,5,6] in a gender-specific manner [7], and diminished EFs are also observed in the longitudinal course of severe mental disorders, such as schizophrenia [8]. In particular, EFs appear to be generally impaired in psychiatric patients suffering from schizophrenia, depression [4], or bipolar disorder [9]. Deficits are also associated with, for example, decreased abilities to perform routine tasks [4]. Neurobiologically, EFs are linked intimately to the prefrontal cortex, as exemplified by the famous case of Phineas Gage [10].

There are many definitions of an EF [3], as it represents an umbrella term for multiple cognitive processes [2]. An influential theory of EFs is the “unity and diversity” concept [3, 11] that describes EFs as a “collection of related but separable abilities” [3]. EFs are differentiated into three latent core skills [3, 4, 11]: (i) set-shifting, allowing an individual to approach tasks flexibly and adjust to new conditions [3, 4], (ii) updating (or working memory), with respect to the monitoring, manipulating, and updating of information [4, 11], and (iii) inhibition, enabling an individual to control behavior, emotions, and responses [4, 11]. In general, EFs rank among the “most heritable psychological traits” [3]. On the behavioral genetic level, a highly heritable latent (common) factor affecting all EF aspects accounted for 99% of the variance common to all three skills [3]. Regarding specific EF components, the heritability estimates of set-shifting assessed by the Trail Making Test (TMT) range from 0.34 to 0.65 [12] and the estimates of updating measured by digit span tests range from 0.27 to 0.62 [12] (these results were obtained in twin studies). Recently, several genome-wide association studies (GWASs) on EFs have been undertaken [13,14,15,16,17,18]; however, genome-wide significance was not attained [2, 12]. Moreover, the genetic basis of variation over time is yet to be elucidated [19].

Here, we performed two longitudinal GWASs for the set-shifting and updating EF abilities assessed by the Trail Making Test, part B (TMT-B) and the Verbal Digit Span backwards (VDS-B), respectively, to identify genetic variation associated with the course of EFs across time. We used a linear mixed model (LMM) to model the dependence structure of the longitudinal PsyCourse Study [20] with four measurements across time. To validate our findings, we also performed a replication study using data from the FOR2107 consortium [21], which assessed two measurements over time.

Materials and methods

Discovery sample: PsyCourse Study

The PsyCourse Study is a multicenter longitudinal study that combines multilevel omics and longitudinal data [20]. We included 1338 genotyped individuals (dataset version 3.0) recruited in different centers in Germany and Austria, comprising patients from the affective-to-psychotic spectrum (377 bipolar I disorder, 100 bipolar II disorder, 420 schizophrenia, 95 schizoaffective disorder, 6 brief psychotic disorder, 9 schizophreniform disorder, and 73 with recurrent depression) and 258 psychiatrically healthy controls. The study protocol was approved by the respective ethics committee for each study center and was carried out following the rules of the Declaration of Helsinki of 1975, revised in 2008 (see ref. [20]). All study participants provided written consent [20]. The patients were diagnosed using parts of the Structured Clinical Interview for DSM (SCID-I) and were classified according to the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) criteria. The patients were broadly differentiated into patients with predominantly affective symptoms (550, “affective”, with recurrent depression, bipolar I and II disorders) and patients with predominantly psychotic symptoms (530, “psychotic”, with schizophrenia, schizoaffective, brief psychotic and schizophreniform disorder) [20]. Deep phenotyping was performed during four visits, each ~6 months apart (see ref. [20]), thus corresponding to time t of the longitudinal course.

Set-shifting and updating were assessed with the Trail Making Test, part B (TMT-B) [22] and the Verbal Digit Span backwards (VDS-B) [23], respectively. The TMT-B requires an individual to connect numbers (numbers: 1–26) and letters of the alphabet in ascending alternating order. The test score was the time (in seconds (s)) needed to finish this exercise. As recommended by [24], times >300 s were set to 300 s. The VDS-B measures the updating ability. Here, a trained interviewer verbally presented up to seven pairs of number sequences with increasing length, and the study participant was requested to repeat each sequence in backwards order, receiving one point for each correctly repeated sequence. The maximum possible score for each sequence pair was 2. The process was terminated when an individual failed to repeat both sequences in a pair of a given length correctly. The test score was the sum of the points across all sequence pairs (range: 0–14).
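As an illustration of the two scoring rules described above, the following Python sketch caps TMT-B times and sums VDS-B points with the stopping rule. It is not the scoring procedure as implemented in the study; the function names and the input layout (one pair of booleans per sequence length, in order of presentation) are assumptions made for this example.

```python
from typing import List, Tuple

TMT_B_CAP_SECONDS = 300.0  # cap applied to TMT-B times in the discovery sample
MAX_PAIRS = 7              # up to seven pairs of number sequences


def cap_tmt_b(seconds: float, cap: float = TMT_B_CAP_SECONDS) -> float:
    """Set completion times above the cap to the cap itself."""
    return min(seconds, cap)


def vds_b_score(pairs: List[Tuple[bool, bool]]) -> int:
    """Sum one point per correctly repeated sequence (0 to 14).

    Testing stops after the first pair in which both sequences
    were repeated incorrectly, so later entries are ignored.
    """
    score = 0
    for first_ok, second_ok in pairs[:MAX_PAIRS]:
        score += int(first_ok) + int(second_ok)
        if not first_ok and not second_ok:
            break
    return score
```

For example, a participant who repeats both sequences of the first pair, one of the second pair, and none of the third pair would receive 3 points under this rule.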

Replication sample: FOR2107 consortium

To perform the replication study, we used data from the research consortium FOR2107 [21], a longitudinal cohort with two centers, Marburg and Münster (Germany), in which deep phenotyping was performed twice ~2 years apart [21]. In our analyses, we used a sample comprising 1795 individuals with genotype data available divided into five different diagnostic groups (851 affective: 107 bipolar disorder and 744 depression, 112 psychotic: 68 schizophrenia and 44 schizoaffective disorder, and 832 healthy controls). The participants were classified into the same three broad diagnostic groups (affective, psychotic, and controls) as in the discovery sample. Set-shifting was assessed by the TMT-B. In this cohort, participants with a time >180 s were excluded. For updating, we used the Letter–Number-Sequencing Test (LNST) as a substitute for the VDS-B. Here, a trained interviewer verbally presented an increasing sequence of letters and numbers, which the participant was requested to repeat, starting with the numbers in ascending order and ending with the letters in alphabetical order. The test was terminated when the individual repeated the same sequence incorrectly four times. The sum of the correctly repeated sequences was the test score, with a maximum of 24.

Genotyping and imputation

Discovery sample

The Illumina Infinium PsychArray (Illumina, USA) was used for genotyping purposes [20]. Genotypes were imputed with SHAPEIT2/IMPUTE2 using the 1000 Genomes Project Phase 3 data as a reference panel. Quality control (QC) was performed according to standard procedures, as described previously [25] (for details, see Supplementary List 1), and poorly imputed genetic variants (INFO < 0.8) were excluded [20]. We included ~8.2 million SNPs with minor allele frequency (MAF) ≥ 0.01 in our analysis. Ancestry principal components (PCs) were computed with PLINK v1.9 [26].
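The two variant-level thresholds stated above (INFO ≥ 0.8 and MAF ≥ 0.01) can be illustrated with a minimal Python sketch. The `Variant` record, its field names, and the helper functions are hypothetical; the full QC procedure is described in Supplementary List 1 and is not reproduced here.

```python
from dataclasses import dataclass
from typing import Iterable, List

INFO_MIN = 0.8   # variants with INFO < 0.8 were excluded
MAF_MIN = 0.01   # variants with MAF >= 0.01 were analyzed


@dataclass
class Variant:
    """Hypothetical per-variant summary used only for this example."""
    variant_id: str
    info: float      # imputation quality score
    alt_freq: float  # frequency of the alternate allele, between 0 and 1


def minor_allele_frequency(alt_freq: float) -> float:
    """The minor allele is whichever of the two alleles is rarer."""
    return min(alt_freq, 1.0 - alt_freq)


def passes_filters(variant: Variant) -> bool:
    """Apply the imputation-quality and allele-frequency thresholds."""
    return (variant.info >= INFO_MIN
            and minor_allele_frequency(variant.alt_freq) >= MAF_MIN)


def filter_variants(variants: Iterable[Variant]) -> List[Variant]:
    """Keep only variants that satisfy both thresholds."""
    return [v for v in variants if passes_filters(v)]
```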

Replication sample

To replicate genome-wide significant SNPs of the discovery sample, we analyzed the genotypes of these nine significant SNPs (SNPR). We additionally analyzed 187 suggestive SNPs (SNPNR) with a P value ≤ $1 \times 10^{-5}$ in the discovery sample (99 for TMT-B, 88 for VDS-B/LNST) in an exploratory analysis. For the QC in the replication sample, please refer to Supplementary List 2.

Statistical analysis

We performed regression analysis, log-transforming the TMT-B values (lgTMT-B) to fulfill the linear mixed model requirement of normally distributed errors. We present effect estimates with 95% confidence intervals (c.i.s) transformed back to the original scale. Furthermore, we investigated missing data patterns across visits and diagnoses for violation of a missing-at-random (MAR) mechanism [27]. We computed the mean and standard deviation (s.d.) of EFs per visit and diagnostic group, testing for differences in means between diagnostic groups at each visit. For the discovery sample, we fitted LMMs to the longitudinal time course of lgTMT-B and VDS-B, investigating each phenotype first without the SNP terms, and subsequently including them. For each SNP, the fitted model for individual $i$ at visit/time $t_{ij}$ with $j = 1, 2, 3, 4$ was as follows:

$$\begin{array}{l}Y_{ij} = \beta _0 + \beta _1t_{ij} + \beta _2age_i + \beta _3gender_i + \beta _4diagnosis_i + \mathop {\sum }\limits_{k = 1}^5 \beta _{4 + k}PC_{ik} + \\ b_{0i} + b_{1i}t_{ij} + c_icenter_i + \beta _{10}SNP_i + \beta _{11}SNP_i \ast t_{ij} + \varepsilon _{ij}\end{array}$$

The LMM adjusted for $age_i$, $gender_i$, $diagnosis_i$, and $PC_{ik}$, i.e., age at visit 1, gender, diagnostic group (affective, psychotic, or control), and the top five PCs for each individual $i$, the latter to correct for population stratification. We allowed for random intercepts and slopes $b_{0i}$, $b_{1i}$ of the trajectories and a random center effect.
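To make the structure of the model concrete, the following Python sketch generates the four outcomes of one individual from the terms of the equation above. It simulates the data-generating process implied by the model and is not the fitting procedure. It rests on assumptions not stated in the text: visits are coded as t = 1, 2, 3, 4, the random intercept and slope are drawn independently, and all covariate terms (age, gender, diagnosis, PCs) are collapsed into a single number `covariate_part`. All names and parameter values are hypothetical.

```python
import random
from typing import Dict, List, Sequence


def simulate_individual(
    rng: random.Random,
    beta: Dict[str, float],
    snp_dosage: float,
    covariate_part: float,
    center_effect: float,
    sd_intercept: float,
    sd_slope: float,
    sd_error: float,
    visits: Sequence[int] = (1, 2, 3, 4),
) -> List[float]:
    """Return one simulated outcome per visit for a single individual."""
    b0 = rng.gauss(0.0, sd_intercept)  # random intercept b_0i
    b1 = rng.gauss(0.0, sd_slope)      # random slope b_1i
    outcomes = []
    for t in visits:
        fixed = (
            beta["intercept"]
            + beta["time"] * t
            + covariate_part                        # age, gender, diagnosis, PCs
            + beta["snp"] * snp_dosage              # SNP main effect
            + beta["snp_x_time"] * snp_dosage * t   # SNP-by-time interaction
        )
        random_part = b0 + b1 * t + center_effect
        outcomes.append(fixed + random_part + rng.gauss(0.0, sd_error))
    return outcomes
```

With all variance components set to zero, the difference in slope between a dosage of 1 and a dosage of 0 equals the interaction coefficient, which is the quantity tested in the GWAS.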

For the respective SNP under consideration, we included the main effect ($SNP_i$) and the SNP-by-time interaction ($SNP_i \ast t_{ij}$), where the latter was tested (two-sided) for the influence of the SNP on the longitudinal course (see ref. [28]). The three-way interaction SNP × diagnosis × time was not investigated owing to the limited sample size. We assumed an additive genetic model with each considered SNP in dosage format. We set the genome-wide significance level to $5 \times 10^{-8}$, yielding replication SNPs (SNPR), and set the level for suggestive significance to $1 \times 10^{-5}$ for SNPs to be further explored (SNPNR, not to be replicated). For the replication sample, we separately determined linkage disequilibrium (LD) blocks with $r^2 > 0.8$ for both SNP sets, correcting for multiple testing by dividing 5% by the number of LD blocks for the SNP set [29]. In the end, the SNPR were contained in a single LD block, so the significance level for replication could be set to 5%. The significance levels for the exploratory analysis of the SNPNR were set to 0.05/24 = 0.0021 for lgTMT-B and 0.05/12 = 0.0042 for VDS-B/LNST, respectively.
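The correction over LD blocks described above amounts to a single division, shown here as a small Python sketch that reproduces the three thresholds given in the text. The function name is an assumption made for illustration.

```python
def block_corrected_alpha(n_ld_blocks: int, alpha: float = 0.05) -> float:
    """Divide the overall significance level by the number of LD blocks."""
    if n_ld_blocks < 1:
        raise ValueError("at least one LD block is required")
    return alpha / n_ld_blocks


# Thresholds stated in the text, rounded to four decimals.
replication_alpha = block_corrected_alpha(1)               # single LD block
tmt_b_alpha = round(block_corrected_alpha(24), 4)          # lgTMT-B
vds_b_lnst_alpha = round(block_corrected_alpha(12), 4)     # VDS-B/LNST
```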

For the SNP analysis in the replication sample, we analyzed the difference (diff) of lgTMT-B (LNST) between the visits as outcome, with SNP, age, gender, diagnosis, and PCs as covariates. We applied the difference model because the LMM above contained too many parameters for the replication sample, which had only two measurements (in total: 613 individuals) and incomplete data, resulting in low statistical power (data not shown; two-sided test). Here, the SNP effect may be interpreted as the difference in average change between the genotypes, especially since SNPR displayed only two genotypes.
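For a SNP with only two observed genotypes, the unadjusted version of this difference model reduces to comparing the mean change between genotype groups, because the least-squares slope on a 0/1 dosage equals the difference in group means. The Python sketch below illustrates this with made-up numbers. It omits the covariates (age, gender, diagnosis, PCs) used in the actual analysis and assumes the change is computed as visit 2 minus visit 1, which the text does not specify.

```python
from statistics import mean
from typing import List, Sequence


def change_scores(visit1: Sequence[float], visit2: Sequence[float]) -> List[float]:
    """Per-individual change, assumed here to be visit 2 minus visit 1."""
    return [second - first for first, second in zip(visit1, visit2)]


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Slope of the simple least-squares regression of y on x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


# Made-up log-scale scores for five individuals; dosage 0 = AA, 1 = AC.
dosage = [0, 0, 0, 1, 1]
visit1 = [4.0, 4.2, 4.1, 4.0, 4.3]
visit2 = [3.8, 4.1, 3.8, 4.1, 4.6]
diff = change_scores(visit1, visit2)
snp_effect = ols_slope(dosage, diff)
```

In this toy data set, `snp_effect` equals the mean change among AC carriers minus the mean change among AA carriers.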

We computed LD and haplotypes for Europeans with LDlink [30] and created a regional plot with gene identification using Locus-Zoom [31]. Finally, the average longitudinal course over time per genotype along with 95% c.i. is displayed for the top SNP.

All statistical analyses were performed with R, version 3.5.1. The LMM was fitted with the R package lme4 [32] and P values were computed using the Satterthwaite approximation of the lmerTest package [33, 34].

Results


Behavioral characteristics of the EFs

Discovery sample

In comparison with controls, the disease groups were slightly older on average (Table 1). A total of 1272 (1297) individuals had at least one TMT-B (VDS-B) measurement, demonstrating a similar decrease of available data in each diagnostic group (Table 2). Missing value patterns did not hint at any violation of a missing-at-random (MAR) assumption (data not shown). Figure 1 illustrates the mean longitudinal course of TMT-B (left) and VDS-B (right) for each diagnostic group with 95% c.i.s; controls differed significantly from patients (see Fig. 1, c.i.s). Generally, executive performance increased over time, with differences between affective and psychotic patients decreasing over time. An improvement in the respective EF performance is reflected by a decreased TMT-B score for set-shifting and an increased VDS-B score for updating. The individual trajectories were highly variable (Supplementary Fig. 1). The mean difference between diagnostic groups was significant at each visit when adjusting for age and gender (see Table 1). Table 3 displays the time effect estimates in the LMM for each phenotype without SNP stratified by diagnostic group. For lgTMT-B, the time effect within each diagnostic group is highly significant and similar across groups. For VDS-B, the time effects for the two patient groups are similar, very small, and only nominally significant in the psychotic group, but larger and highly significant for controls.

Table 1 Characteristics at visit 1 in discovery sample and replication sample by diagnostic group.
Table 2 Available data of TMT-B and VDS-B per visit for the discovery sample.
Fig. 1: Longitudinal course of TMT-B score (time in seconds, left) and VDS-B score (working memory capacity, right) for each diagnostic group in the discovery sample.

Displayed are means with 95% confidence interval for each visit 1, 2, 3, 4, ~6 months apart.

Table 3 Results of the LMM of the discovery sample to test the time effect on lgTMT-B and VDS-B within each diagnostic group.

Replication sample

We analyzed 1795 genotyped individuals with at least one TMT-B and LNST measurement (we deleted data for one individual who had a value larger than the maximum score of 24). Phenotypes were measured at both visits for 34.2% of individuals. The means of the diagnostic groups differed significantly at each visit (Table 1); the controls again had the best EF abilities, followed by affective and then psychotic individuals (Supplementary Fig. 2).

GWAS of the discovery sample

The QQ-plot (Supplementary Fig. 3) demonstrates that the genomic inflation factor was λ = 1.0034 for lgTMT-B and λ = 0.9999 for VDS-B, hence not indicating any inflation. As illustrated on the Manhattan plots (lgTMT-B Fig. 2A, VDS-B Fig. 2B) for the SNP-by-time interaction in the LMM, we identified nine genome-wide significant SNPs on chromosome 5 (all imputed) in one LD block ($r^2 > 0.85$) for lgTMT-B, and none for VDS-B. For lgTMT-B, 99 SNPs were suggestive, for VDS-B 88.
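The genomic inflation factor is commonly computed as the median of the observed chi-square statistics (one degree of freedom) divided by the median of the corresponding chi-square distribution, about 0.455. The Python sketch below follows that common definition using only the standard library. Whether the authors used exactly this estimator is an assumption, and the function name is hypothetical.

```python
from statistics import NormalDist, median
from typing import Sequence

CHI2_1DF_MEDIAN = 0.4549364  # median of the chi-square distribution with 1 df


def genomic_inflation_factor(p_values: Sequence[float]) -> float:
    """Median observed 1-df chi-square statistic over its expected median.

    Each two-sided P value is converted to a chi-square statistic by
    squaring the matching standard-normal quantile.
    """
    if not p_values:
        raise ValueError("no P values given")
    if any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("P values must lie in (0, 1]")
    std_normal = NormalDist()
    chi2_stats = [std_normal.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2_stats) / CHI2_1DF_MEDIAN
```

Values close to 1, such as those reported above, indicate that the bulk of the test statistics follows the null distribution.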

Fig. 2: Results of the genome-wide association studies of the discovery sample.

A Manhattan plot of the GWAS of lgTMT-B in the discovery sample. The lines in (A) and (B) indicate the thresholds for the genome-wide significance of $5 \times 10^{-8}$ (red) and for suggestive SNPs (blue, P ≤ $1 \times 10^{-5}$). B Manhattan plot of the GWAS of VDS-B in the discovery sample. C Mean profile of TMT-B by the top SNP rs150547358 genotypes for the discovery sample (1039 AA, 28 AC, 0 CC) with the 95% confidence intervals. D GWAS regional Manhattan plot of chromosome 5 for lgTMT-B of the discovery sample. Colors indicate the LD values ($r^2$) of SNPs with rs150547358 (in purple).

For the nine genome-wide significant SNPs of the GWAS, Supplementary Table 1 displays estimates for the effect of the SNP-by-time interaction on lgTMT-B with 95% c.i. and P values. The top SNP rs150547358 (P value = $7.2 \times 10^{-10}$) had an effect of 1.16 (95% c.i. 1.11–1.22) seconds per measurement (spm) in the discovery sample on the original TMT-B scale. We present the mean plot for the top SNP in Fig. 2C, where the TMT-B score increases over time for heterozygotes with risk allele “C”. Figure 2D displays the regional Manhattan plot with three genes that contain or lie near the nine significant SNPs. Four of the SNPs, including rs150547358, are located in an intronic region of ring finger protein 180 (RNF180) (Supplementary Table 1). Other genes located nearby are regulator of G protein signaling 7 binding protein (RGS7BP) and 5-hydroxytryptamine receptor 1A (HTR1A), but neither contained any of the nine SNPs. For the SNP main effect, also included in the model, we did not observe any genome-wide significant SNPs (Supplementary Fig. 4; P < $5 \times 10^{-8}$).
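As a hedged illustration of how an estimate on the log scale can be brought back to the original scale, the Python sketch below exponentiates a coefficient and the limits of its 95% c.i. It assumes a natural logarithm (for a base-10 logarithm, `10 ** x` would replace `math.exp(x)`), and the coefficient and standard error are hypothetical values chosen only so that the result is close to the reported interval; they are not taken from the study. Under this assumption, the back-transformed value acts as a multiplicative factor per visit. The text does not state whether the authors performed the back-transformation in exactly this way.

```python
import math
from typing import Tuple


def back_transform(beta: float, se: float, z: float = 1.96) -> Tuple[float, float, float]:
    """Exponentiate a log-scale estimate and its Wald-type 95% c.i. limits."""
    lower = beta - z * se
    upper = beta + z * se
    return math.exp(beta), math.exp(lower), math.exp(upper)


# Hypothetical log-scale interaction coefficient and standard error.
estimate, ci_low, ci_high = back_transform(beta=0.148, se=0.024)
```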

Difference analysis of the replication sample

The analysis of the differences also identified the top SNP, rs150547358, as significant (P = 0.015), and thus replicated this GWAS-significant LD block. The effect estimate for the top SNP was 0.85 (95% c.i. 0.74–0.97) on the original scale and was the largest effect on the scale of the analysis (greatest negative effect). The estimates for the other SNPs were slightly larger when transformed back to the original scale and also positive (see Supplementary Table 1 for the summary).

Exploratory analysis of the GWAS-suggestive SNPNR in the replication sample yielded no significant results after multiple testing corrections for either phenotype (Supplementary Fig. 5).

Discussion


We performed a GWAS on the longitudinal course of EFs and detected nine SNPs within the same LD block associated with change over a relatively short period of time (1.5 years) in the EF core skill set-shifting. Importantly, we were able to replicate a significant result for this LD block in an independent sample, which was observed in a heterogeneous population including controls and different psychiatric disorders of the affective-to-psychotic spectrum across age groups. Analysis of TMT-B performance of C-allele carriers, in contrast to the AA genotype, revealed a pronounced slowing over time.

Recently, the analysis of longitudinal data has come to the fore in genetic research. Multiple methods have been developed to perform GWAS with longitudinal data [35,36,37,38,39,40] for binary as well as continuous phenotypes. These analysis methods are mostly applied to analyze long-term developments of the investigated phenotypes [41, 42], as most data comprise multiple measurements over a relatively long period of time. These longitudinal studies often detect group effects [8] based on age or baseline cognitive functions, for example. To date, short-term variability has been reported, for example in the longitudinal course of schizophrenia (reviewed in [8]), but without consideration of a potential genetic effect. In our longitudinal GWAS, we enter uncharted territory as we study short-term courses of cognitive phenotypes in relation to the genetic background. The discovery sample, the PsyCourse Study, is unique in this sense, as it assesses the phenotypes multiple times in a very heterogeneous sample over a relatively short period of time (18 months). Here, the main interest is the observation of short-term changes specific to a phenotype, such as EF skills, and the use of newly identified characteristics to detect genotype–phenotype associations. The genetic variants found in this study may, if further replicated, be used to improve clinical evaluation of the longitudinal course of EF skills. Knowledge of the genetic status of a patient may, in the future, enhance the interpretation of the course of EF abilities, e.g., during psychiatric treatment. Moreover, special training programs could support patients with a known genetic disposition to lack improvement over time. To our knowledge, no other study has performed such analyses to date.

Behavioral results

Prior to our GWAS, we studied the short-term courses of changes in cognitive abilities, focusing on the differences between the diagnostic groups considered. In the discovery sample, we observed an identical pattern for both phenotypes: psychotic individuals demonstrated the lowest EF abilities, followed by those with affective disorders and then the control individuals. This greater EF impairment in psychotic individuals compared to controls is well-documented, as exemplified by [43]. The difference in impairment between bipolar (affective) and schizophrenic (psychotic) patients, however, has been analyzed in various studies [43,44,45,46,47,48] with mixed results. The hypothesis exists that bipolar patients demonstrate less severe impairment in comparison to schizophrenic patients [49]. Some studies [44, 46, 48] lend their support to this hypothesis, though not always with statistical significance, whereas others detected similar levels of impairment in symptomatic patients [45, 47]. In our analysis, we observed a statistically significant difference between affective and psychotic individuals at visit 1 but detected a decline in these discrepancies over time. The abilities of these two diagnostic groups converged, with patients from the psychotic group displaying an improvement in their skills and patients from the affective group presenting a more constant course. Documentation of the EF convergence is only possible thanks to the longitudinal design of the discovery sample and represents a great advantage of this study design.

Owing to the slightly different age structure of the two study samples, with the discovery sample being minimally older on average at visit 1, we further observed the impact of age reflected by the minimally lower average test score. That is, the discovery sample had lower VDS-B and greater TMT-B scores than the replication sample. The TMT-B mean scores may also be influenced further by the different cutoff thresholds of 300 s in the discovery sample and 180 s in the replication sample.

Genome-wide association studies

To our knowledge, the LD block comprising the nine SNPs we detected for the set-shifting ability has not been identified in any GWAS before. These SNPs are part of two common haplotypes, that is, 97.7% carry the haplotype consisting of the major alleles and 1.7% have the rare haplotype with only minor alleles in European populations [30]. However, we did not observe different allelic distributions between the three diagnostic groups (Supplementary Table 2). We displayed the longitudinal course for the two genotypes “AC” and “AA” of the top SNP rs150547358, observing a steady increase in the TMT-B score for “AC” and an almost unchanging course for “AA”. Consequently, the minor allele C was associated with a decline in the set-shifting ability of ~5 s over a period of 18 months for AC, with a large c.i. at the last visit owing to the small number of available heterozygous individuals. This result reflects a relatively large decrease in the ability over this short period. Furthermore, it portrays a highly interesting observation, which is further underpinned when we consider the genetic region of the nine SNPs. Variant rs150547358, the significantly replicated SNP, is one of four associated SNPs directly located in the ring finger protein 180 (RNF180) gene on chromosome 5q12.3. RNF180 encodes an E3 ubiquitin-protein ligase [50], which is involved in protein modification. RNF180 is associated with the regulation of monoamine levels in different brain regions, as shown, for example, for the prefrontal cortex (PFC) in RNF180 knockout mice [51]. The PFC is a critical part of the frontal lobe in the development of EFs [4, 52]. Another gene located in the nearby region, HTR1A (5-hydroxytryptamine receptor 1A), is an important receptor of serotonin (5-HT) also essential to the prefrontal lobe. More importantly, HTR1A is an autoreceptor, located on the cell bodies of serotonin-synthesizing neurons of the brainstem dorsal raphe nucleus, helping to maintain homeostasis in serotonergic function [53]. Furthermore, a genetic polymorphism in the 5-HT system has previously been implicated in EF performance [12].

In an additional exploratory gene-set analysis performed with MAGMA v1.06 as a part of the FUMA pipeline [54], we did not obtain any significant pathways (Bonferroni-corrected P values ≤0.05) for either phenotype.

Our results are a first step toward understanding the molecular genetic influences on the longitudinal course of EFs. We were unable to consider the third core ability, inhibition, which also plays an important role for EF, because we could not fulfill a specific assessment requirement resulting from the multicenter and interview-based structure of the discovery sample [20]. Many unknown factors remain, such as the genetic aspects due to the correlation of the different EF abilities, as we only concentrated on individual EF core skills in two separate analyses. According to the “unity and diversity” concept [11], which also concerns the genetic underpinnings of the EFs, a genetic study of a latent common factor needs to follow. Further, we need to acknowledge the problem of missing data, which is a great challenge in longitudinal studies, as seen in our samples. Here, selecting the correct analysis method, e.g., linear mixed models, is important, but generally, more longitudinal studies with multiple time points and greater sample sizes will be required to unmask further time and genomics interactions [19].