Introduction

In recent years, many large international consortia have performed meta-analyses of genome-wide association studies (GWASs), identifying numerous associations of common genetic variants (ie, single nucleotide polymorphisms (SNPs)) with a wide variety of diseases and traits.1, 2 The rapid increase in the number of SNPs identified provides an opportunity to systematically examine the quantitative impact of these common genetic variants, individually or collectively as part of a genetic risk score (GRS). Such GRSs offer promise for personalized prediction of complex disease risk with potential future application in clinical practice.

Lifelines is a multi-disciplinary population-based prospective cohort study examining the health and health-related behaviors of 167 729 persons living in the North East region of the Netherlands in a three-generation family design.3, 4 The general aim of the Lifelines Cohort Study is to unravel how life-time exposure to environmental and genetic risk factors, and their interaction, influences individual susceptibility to multifactorial diseases. Lifelines not only provides an in-depth characterization of the biomedical, socio-demographic, behavioral, physical and psychological factors that contribute to health and disease in the general population, it also employs a broad disease- and organ-overriding phenotypic characterization of its participants, allowing it to validly address questions concerning the multi-morbidity that occurs with ageing, rather than focusing on single-disease conditions. Its representativeness for the population in the Northern Netherlands was recently shown5 and the age-related multi-morbidity was documented.6 Thus, Lifelines is particularly suited to investigate causes of morbidity across disease domains. Along the same lines, genotype–phenotype associations were assessed for multiple disease domains in the current paper.

The aim of this study was to test the validity and reliability of the Lifelines Cohort Study for genetic research of a wide range of complex disease traits, and to gauge whether the ‘heritability gap’ of these traits is closing. To this end, we collected dedicated data on a wide variety of phenotypes and performed genome-wide genotyping on more than 13 000 unrelated individuals. We selected 32 continuous traits from five broad disease areas (musculoskeletal, cardiovascular and renal, metabolic, hematologic and inflammatory, and pulmonary), for which we compiled a list of genome-wide significantly associated index SNPs based on the GWAS catalog2 and performed an extensive literature search. For each of the 32 traits we (i) tested whether associations with previously identified index SNPs could be replicated in the Lifelines Cohort; (ii) determined how much of the phenotypic variance could be explained by the known variants when combined in a GRS;7, 8 and (iii) calculated the percentage of variance explained by additive (h2SNP) and dominance variation (δ2SNP) at all common SNPs, because estimates of the narrow-sense (h2SNP) and the broad-sense heritability (H2SNP=h2SNP+δ2SNP) captured by SNPs9 provide an upper bound to the explanatory power of genetic variants that can be discovered by GWAS.9, 10, 11

Materials and methods

Trait and SNP selection

Our selection of traits was based on two criteria: (i) it had to be a continuous trait measured in the baseline visit of the Lifelines Cohort Study, and (ii) a meta-GWAS on the trait including at least 10 000 European individuals had to have been published. We searched the GWAS catalog1, 2 for original papers describing or meta-analyzing GWASs of the selected traits (date: 01/14/2015). In addition, we performed a literature search to identify papers that used a gene-centric genotyping platform for association analysis (Figure 1). From the resulting list of publications, we selected the paper(s) with the largest sample size of European individuals (at least 10 000 individuals). This led to selection of multiple papers for several traits, if the sample sizes of these papers were similarly large, or there were additional papers using a gene-centric array (Table 1). From these papers we identified index SNPs that were significantly associated with the phenotypes of interest. The criterion for statistical significance depended on the genotyping platform. For analyses based on a genome-wide SNP array, the standard genome-wide significance threshold was used (P=5 × 10−8). For gene-centric genotyping platforms, we used the threshold of significance from the original papers. Preferably, statistical significance was assessed based on the P-value of the combined analysis of discovery and replication samples. If a study did not include a P-value for the combined analysis, we used the P-values from the discovery phase. For each selected SNP, we recorded the effect size (beta+direction of effect), standard error, P-value, and effect allele, when available. For the effect sizes, we used the effect size derived from the combined analysis of discovery and replication cohorts if given. If not, we used the effect size in the replication cohort if available, and otherwise that of the discovery. 
If multiple papers contributed SNPs for a particular trait and they used different units, transformations or meta-analysis methods, we transformed the effect sizes of the studies to make them comparable.

Figure 1

Flow diagram showing the paper and SNP selection process of known SNP-phenotype associations for the 32 traits assessed in this study. For some traits multiple papers were selected (Table 1) if sample sizes of several GWAS papers were similarly large, or there were additional papers using a gene-centric array.

Table 1 List of the 32 selected continuous disease traits in Lifelines from five broad disease areas and details on their transformations, covariates and exclusions used for genetic analysis

Genotyping and imputation

A total of 15 638 presumably unrelated individuals of the Lifelines Cohort Study were selected for genome-wide genotyping on the Illumina CytoSNP 12 v2 array and called in GenomeStudio (Illumina, San Diego, CA, USA). A pre-imputation quality control was carried out in PLINK.12 SNPs with a call rate <95%, Hardy–Weinberg equilibrium P-value <0.0001, or minor allele frequency <1% were excluded, as were samples with a sex mismatch, deviating heterozygosity (>4 s.d. from mean), non-European ancestry, a call rate <95%, or that were duplicates or first-degree relatives. A total of 268 407 SNPs and 13 436 samples remained after quality control. Imputation was carried out using Beagle v3.1.0(ref. 13) with the HapMap Phase 2 CEU reference panel (release 24, build 36).14 To determine the genetic relationship matrices needed for the common-SNP heritability estimation (see below) another imputation was performed using IMPUTE2(ref. 15) with the HapMap Phase 3 reference panel (release 2, build 36), as the HapMap Phase 3 SNP set was optimized to capture common genetic variation in the human genome.16 This latter imputed dataset was converted to best-guess genotypes: the most likely genotype was assigned for each SNP and for each individual when it had a probability >0.9, otherwise the genotype was set to missing for that individual. After conversion, SNPs with a call rate <0.95 were excluded. This yielded a set of 874 760 SNPs.
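The best-guess conversion described above can be sketched in a few lines of Python. This is only an illustration, not the pipeline used in the study: the function names and the layout of the genotype probabilities (one triple per individual, ordered as 0, 1 and 2 copies of the counted allele) are assumptions made for the example.

```python
def best_guess(probs, threshold=0.9):
    """Return the most likely genotype (0, 1 or 2) or None if it is too uncertain.

    probs: hypothetical triple of posterior probabilities for 0, 1 and 2 copies
    of the counted allele. The genotype is set to missing (None) unless its
    probability is strictly greater than the threshold.
    """
    best = max(range(3), key=lambda g: probs[g])
    return best if probs[best] > threshold else None


def call_rate(calls):
    """Fraction of individuals with a non-missing best-guess genotype."""
    return sum(c is not None for c in calls) / len(calls)


def keep_snp(calls, min_call_rate=0.95):
    """Keep a SNP only if its call rate after conversion is at least 0.95."""
    return call_rate(calls) >= min_call_rate
```

For example, `best_guess((0.05, 0.93, 0.02))` yields a heterozygous call, whereas `best_guess((0.50, 0.45, 0.05))` is set to missing.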

Statistical analysis

Correction for the Lifelines’ effect size

To obtain unbiased estimates, the validation cohort (ie, Lifelines) needs to be completely independent of the discovery sample.7 As some of the selected papers included Lifelines data in their meta-analysis, we corrected their results for the ‘Lifelines effect’ by recalculating the SNPs’ effect sizes (betas) and standard errors (SEs) using inverse versions of the formula for an inverse-variance fixed-effects meta-analysis:

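$$\beta_{\mathrm{corrected}} = \frac{\beta_{\mathrm{meta}}/SE_{\mathrm{meta}}^{2} - \beta_{\mathrm{Lifelines}}/SE_{\mathrm{Lifelines}}^{2}}{1/SE_{\mathrm{meta}}^{2} - 1/SE_{\mathrm{Lifelines}}^{2}}$$

and

$$SE_{\mathrm{corrected}} = \sqrt{\frac{1}{1/SE_{\mathrm{meta}}^{2} - 1/SE_{\mathrm{Lifelines}}^{2}}}$$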

where βmeta and SEmeta are the beta and SE of the SNP in the original meta-GWAS paper and βLifelines and SELifelines are the beta and SE in the Lifelines Cohort, which have been provided to the meta-GWAS consortium. With the corrected beta and SE, we recalculated the P-value. If a SNP no longer met the significance threshold of the original paper, it was excluded.
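As an illustration of this correction, the following Python sketch backs one cohort out of an inverse-variance fixed-effects meta-analysis and recomputes a two-sided P-value from the corrected estimates. The function name is hypothetical, and the P-value uses a normal approximation, which is an assumption of this example rather than a detail given in the text.

```python
import math


def remove_cohort(beta_meta, se_meta, beta_cohort, se_cohort):
    """Remove one cohort's contribution from a fixed-effects meta-analysis.

    Inverse-variance weights are w = 1 / SE^2. The meta-analysis weight is the
    sum of the cohort weights, so the remaining weight is w_meta - w_cohort.
    Returns (corrected beta, corrected SE, two-sided P-value).
    """
    w_meta = 1.0 / se_meta ** 2
    w_cohort = 1.0 / se_cohort ** 2
    w_rest = w_meta - w_cohort
    if w_rest <= 0:
        raise ValueError("cohort SE must be larger than the meta-analysis SE")
    beta_rest = (beta_meta * w_meta - beta_cohort * w_cohort) / w_rest
    se_rest = math.sqrt(1.0 / w_rest)
    z = beta_rest / se_rest
    # Two-sided P-value under a normal approximation (assumption of this sketch).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return beta_rest, se_rest, p_value
```

A SNP would then be retained only if the returned P-value still meets the significance threshold of the original paper (for example, 5 × 10−8 for genome-wide arrays).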

Only independently associated SNPs were included in our study. If a study did not test for independence between identified SNPs at a locus for the same trait, or if SNPs from multiple studies were included for a single trait, we calculated linkage disequilibrium between the SNPs in the Lifelines dataset. SNP pairs with an r2<0.1 were considered to be independent. If multiple dependent SNPs in a locus were reported in a single paper, we selected the most significant one. When multiple papers reported different, dependent SNPs in the same locus for the same trait, we selected the SNP from the study with the largest sample size.
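The independence filter can be sketched as follows. This is a simplified illustration under stated assumptions: r2 is approximated by the squared Pearson correlation between genotype dosages, and dependent SNPs are resolved greedily by P-value only (the additional rule of preferring the study with the largest sample size is not modeled). All function and variable names are hypothetical.

```python
def ld_r2(x, y):
    """Squared Pearson correlation between two dosage vectors (proxy for LD r^2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return sxy * sxy / (sxx * syy)


def select_independent(snps, dosages, max_r2=0.1):
    """Keep the most significant SNP from each set of mutually dependent SNPs.

    snps: list of (snp_id, p_value) pairs; dosages: dict snp_id -> dosage list.
    A SNP is kept only if its r^2 with every already kept SNP is below max_r2.
    """
    kept = []
    for snp_id, _p in sorted(snps, key=lambda s: s[1]):
        if all(ld_r2(dosages[snp_id], dosages[k]) < max_r2 for k in kept):
            kept.append(snp_id)
    return kept
```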

Association analysis

Each trait was associated with the individual SNPs selected for that trait, as well as with two GRSs: the unweighted GRS, ie, the sum of the risk allele dosages of selected SNPs; and the weighted GRS, ie, the sum of the risk allele dosages of selected SNPs weighted by the corresponding effect size from the literature (if applicable corrected for the Lifelines’ effect size). To determine whether the inclusion of SNPs with low imputation quality in the GRSs contributes to explained variance, we repeated the analyses for both unweighted and weighted GRS using only SNPs imputed with high quality (R2>0.5) in Lifelines. Association was tested using linear regression in R (rms package v4.2-0).17, 18 For each of the traits, we used the same unit, transformation, exclusion criteria and covariates as described in the original papers (Table 1) to achieve exact replication. Details on the trait measurements, which were all based on either the physical examination or biomaterial collection at the baseline visit of Lifelines, have been described previously.4 Ten principal components were added as covariates to correct for population stratification. The genome-wide significant SNPs from the literature were considered replicated in Lifelines if they showed a one-sided P-value <0.05 and the same direction of effect. We chose a significance threshold of 0.05 because the selected SNPs are firmly established associated variants and hence no multiple testing correction was applied. To determine the percentage of variance explained (R2) by the GRS, we compared nested models (including covariates) with and without the GRS and calculated the difference in R2 between them. The GRS R2 values used in the remainder of the paper refer to this difference in R2.
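The GRS construction and the nested-model comparison can be illustrated with a small, dependency-free Python sketch. The analyses in the study were run in R; the code below is not that implementation, only a minimal example of the same idea, and all function names are hypothetical. Ordinary least squares is solved through the normal equations, which is adequate for a toy example but not numerically robust for large designs.

```python
def grs(dosages, weights=None):
    """Genetic risk score per individual: (weighted) sum of risk-allele dosages.

    dosages: one row per individual, one column per SNP.
    weights: literature effect sizes; None gives the unweighted score.
    """
    if weights is None:
        weights = [1.0] * len(dosages[0])
    return [sum(w * d for w, d in zip(weights, row)) for row in dosages]


def _solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(columns, y):
    """R^2 of an OLS fit of y on an intercept plus the given predictor columns."""
    n = len(y)
    design = [[1.0] * n] + [list(c) for c in columns]
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in design] for ci in design]
    xty = [sum(p * v for p, v in zip(ci, y)) for ci in design]
    coef = _solve(xtx, xty)
    fitted = [sum(b * col[i] for b, col in zip(coef, design)) for i in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((v - f) ** 2 for v, f in zip(y, fitted))
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - ss_res / ss_tot


def grs_r2(score, covariates, y):
    """Variance explained by the GRS: R^2(covariates + GRS) - R^2(covariates)."""
    return r_squared(covariates + [score], y) - r_squared(covariates, y)
```

Here `covariates` is a list of columns (for example age, sex and the principal components), so the returned value corresponds to the difference in R2 between the nested models.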

Common-SNP heritability

Genomic-relatedness-matrix restricted maximum likelihood (GREML) was performed using the genome-wide complex trait analysis (GCTA) software package11 to determine the percentage of variance that can be attributed to common SNPs, ie, the common-SNP heritability. Only unrelated individuals (estimated pairwise relationship <0.05) were included in this analysis (max N=10 234). We also estimated the dominance component using a recent extension of GREML.9 The proportions of variance explained by additive and dominance variation at all common SNPs are defined as h2SNP=σ2A/(σ2A+σ2D+σ2e) and δ2SNP=σ2D/(σ2A+σ2D+σ2e), respectively, where h2SNP is interpreted as the narrow-sense heritability captured by SNPs and H2SNP=h2SNP+δ2SNP as the broad-sense heritability captured by SNPs. Please note that H2SNP as defined here does not include an epistatic component, which is expected to be small.9
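The definitions above translate directly into code. The sketch below only turns estimated variance components into the three ratios; it does not perform the GREML estimation itself, and the function name and the example values are hypothetical.

```python
def snp_variance_fractions(var_a, var_d, var_e):
    """Return (h2_SNP, delta2_SNP, H2_SNP) from estimated variance components.

    var_a: additive genetic variance captured by common SNPs (sigma^2_A)
    var_d: dominance genetic variance captured by common SNPs (sigma^2_D)
    var_e: residual variance (sigma^2_e)
    """
    total = var_a + var_d + var_e
    h2 = var_a / total      # narrow-sense heritability captured by SNPs
    delta2 = var_d / total  # dominance variance fraction
    return h2, delta2, h2 + delta2  # last value: broad-sense, without epistasis


# Hypothetical variance components, for illustration only.
h2, delta2, big_h2 = snp_variance_fractions(0.45, 0.01, 0.54)
```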

Results

Trait selection

Thirty-two traits fulfilled our criteria, representing five main systems or disease areas that are the focus of Lifelines: musculoskeletal, cardiovascular and renal, metabolic, hematologic and inflammatory, and pulmonary function. Descriptive statistics for the 32 traits and demographic variables in our cohort are shown in Supplementary Table 1.

SNP selection

Figure 1 shows the results of the paper and SNP selection from the literature for the 32 traits of interest. From the GWAS catalog2, 19 and the successive literature search, we identified 243 and 15 papers using genome-wide and gene-centric genotyping platforms, respectively. After filtering for ethnicity, sample size, and relevance/suitability, 29 papers (of which 18 used GWAS data, 7 used gene-centric chip data, and 4 used a combination) were selected as sources of known SNPs for our 32 traits. From these papers, we identified 1709 SNP-phenotype associations. A final number of 1442 index SNP-phenotype associations were included in our analyses (Figure 1; Supplementary Table 2) after exclusion of the following associations: 55 were not statistically significant according to our criteria, for 35 the respective SNP was not present in the Lifelines HapMap Phase 2 imputed data, 106 lost statistical significance after correcting the meta-GWAS results for the effect size of Lifelines because Lifelines was part of the meta-GWAS (see Materials and Methods), and 71 were in linkage disequilibrium (LD) with another SNP in the SNP list. Some index SNPs were associated with multiple traits; the number of SNPs included in our analyses was 1307, of which 967 (74%; accounting for 1057 SNP-phenotype combinations) had a high imputation quality (R2>0.5) in the Lifelines Cohort.

Direct SNP replication

Of the 1442 index SNP-phenotype associations that were tested, 865 (60%; median per trait=75%, interquartile range=59.8–88.2%) reached statistical significance (Table 2; Supplementary Table 2). When considering only high-quality imputed SNPs, the replication rate increased to 66.2% (700/1057; median=83.8%, interquartile range=63.6–98%). None of the SNPs would have had a significant effect in the opposite direction had we used a two-sided test. Furthermore, for the non-replicated SNPs the directions of effect were highly consistent with those from the literature: a median of 86.1% (interquartile range 75–100%) of all non-replicated SNPs per trait showed a direction of effect that was consistent with the literature (100% (85.6–100%) for high-quality SNPs; Supplementary Table 3). The replication rates decreased with increasing sample size of the GWAS discovery from which the SNPs were selected, whereas the percentages of non-significant SNPs with a consistent direction of effect increased (Supplementary Figure 1). The Lifelines’ effect sizes correlated well with those from the literature (if applicable, after correction for the Lifelines’ effect size; Supplementary Figure 2).
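The replication criterion (one-sided P-value <0.05 in the direction reported in the literature) can be sketched as below. The sketch assumes a normal approximation to the regression test statistic, which is reasonable for a sample of this size but is an assumption of the example; the function names are hypothetical.

```python
from statistics import NormalDist


def one_sided_p(beta, se, literature_direction):
    """One-sided P-value for an effect in the direction reported in the literature.

    literature_direction: +1 if the literature reports a positive effect of the
    effect allele, -1 if it reports a negative effect.
    """
    z = literature_direction * beta / se
    return 1.0 - NormalDist().cdf(z)


def is_replicated(beta, se, literature_direction, alpha=0.05):
    """True if the one-sided P-value is below alpha.

    A one-sided P-value below 0.5 already implies the same direction of effect,
    so no separate sign check is needed.
    """
    return one_sided_p(beta, se, literature_direction) < alpha
```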

Table 2 Numbers and percentages of previously reported SNPs that were statistically significantly associated in Lifelines with the trait of interest

Genetic risk score analysis

The number of index SNPs incorporated in the GRSs ranged from 0 (for UACR) to 476 (for height) when only high-quality SNPs were used, and from 1 (for UACR) to 635 (for height) when all SNPs were used (Table 3; Supplementary Table 4). All GRSs were significantly associated with their respective traits. In general, inclusion of low-quality SNPs in the GRS resulted in an increase in phenotypic variance explained by the GRS. For the weighted GRS built from all SNPs, the median of the relative increase was 10.5% (interquartile range=3.9–19.9%) and for the unweighted GRS it was 11.7% (3.4–20%) compared with the GRSs based on only high-quality SNPs.

Table 3 Estimates of the explained variances by unweighted and weighted GRSs and the additive (h2SNP) and dominant (δ2SNP) variance components captured by common SNPs

The most significant GRS was the weighted GRS constructed from all SNPs for height (P<1 × 10−320), which explained 15.52% of the phenotypic variance. The percentages of explained variance for the other traits ranged between 0.02% (FVC) and 6.67% (HDL). For all but four traits, the weighted GRS model was more significant and explained more phenotypic variance than the unweighted model (Table 3). For the exceptions (BMI-adjusted WHR, heart rate, HbA1c, and FVC) the numbers of SNPs included in the GRSs were small and/or the percentage of variance explained was low.

Common-SNP heritability

The narrow-sense common-SNP heritability estimates were statistically significant for all 32 traits (Table 3). The percentage of phenotypic variance that could be explained by the additive effect of common SNPs was highest for height (48.9%) and ranged from 5.6 to 39.2% for the other traits. The dominance effects, on the other hand, were uniformly small (0–1.5%) and not significant.

Discussion

In this study, we investigated in over 13 000 participants of the Lifelines Cohort Study to which extent the heritability of 32 complex traits can be explained by previously reported, genome-wide significantly associated SNPs. We first demonstrated that the majority of previously reported SNP-phenotype associations (median=75%) could be replicated and that there was high enrichment of effects in the right direction for the non-replicated ones (median=86.1%), indicating that power was likely insufficient for those SNPs. These percentages increased to 83.8 and 100%, respectively, when only high-quality SNPs were considered. Second, all unweighted and weighted GRSs combining the information of these SNPs were significantly associated with their respective traits, with weighted GRSs generally explaining more phenotypic variance than the unweighted GRS. Inclusion of poorly-imputed SNPs in GRSs in general still contributed to the variance explained, advocating for the use of all known SNPs and not only high-quality ones when constructing GRSs. The total variance explained by the weighted GRSs constructed from all SNPs was 15.52% for height and between 0.02 and 6.67% for the other traits. The additive genetic variance at all common SNPs explained a significant proportion of the phenotypic variance for all traits ranging from 5.6% (body mass index-adjusted waist–hip ratio) to 48.9% (height), but none of the traits showed significant dominance genetic variance.

As mentioned above, we could not replicate all SNPs that were previously identified in large meta-GWASs. This is likely due to reduced power resulting from a sample size of ‘only’ 13 000 individuals in comparison to the much larger GWAS discovery samples. Power analysis shows that our study had sufficient power (>80%) to detect effect sizes of 0.04 s.d. for very common SNPs (allele frequency >20%), 0.08 s.d. for SNPs with an allele frequency between 5 and 20%, and 0.16 s.d. for SNPs with an allele frequency between 1 and 5% (Supplementary Figure 3). As expected, SNPs extracted from smaller GWASs were more likely to be replicated in Lifelines, as in those GWASs only SNPs with larger effect sizes could be discovered. Conversely, SNPs from smaller GWASs more often had an effect in the opposite direction, indicating that effect sizes from the large GWASs could be estimated more accurately.

As far as we know, we are the first to determine how much of the phenotypic variance is explained by both known and common SNPs to measure the current heritability gap for a large number of complex disease traits in one homogeneous population. However, explained variances of GRSs and common-SNP heritabilities for individual traits have been estimated before, and our overall results are consistent with this literature. For instance, various studies estimated the common-SNP heritability of height to be 40–50%,11, 20, 21 which matches our finding of 48.9%. The study from which most height-associated SNPs were extracted22 reported that genome-wide significantly associated SNPs explained 16% of the variance in height, which is comparable to our GRS result (15.5%). Only one trait (LDL cholesterol concentration) differed considerably from the literature: we found a common-SNP heritability of 27%, whereas a previous study in the Icelandic population found only 10%.21 Zaitlen et al. used a slightly different method, which could have caused the difference, but it might also reflect a population specific effect.

In our study, the dominance effect of common SNPs was not significant for any trait. This is consistent with one earlier study that was unable to detect any replicable dominance effects for 79 quantitative complex traits using data from three large European cohorts including Lifelines as a replication sample.9

Although all our GRSs composed of known SNPs were significantly associated with their respective outcomes, they explain only a fraction of the total common-SNP heritability of these complex traits. On average, the variance explained by the weighted GRS accounted for only 10.7% of the common-SNP heritability. Only for height (15.5%/48.9%=31.7%), alanine transaminase (24.5%), fasting glucose (28%), and HDL cholesterol (35.3%) did known SNPs account for a considerable part of the common-SNP heritability. Our data thus confirm what has been found previously for height, BMI, and QT interval,10, 23 but now extend these observations to many other complex disease traits. There are a number of potential causes for the large gaps between the common-SNP heritability and the part that can be explained by all identified SNPs. It may be due to errors in the estimation of SNP effects, but it is likely also due to the way in which SNPs are selected for a GRS. Usually the selection of SNPs is restricted to genome-wide significant SNPs only. This is likely too conservative, as low power will result in SNPs with a small effect or low allele frequency not reaching the genome-wide significance threshold. Prediction accuracy will probably increase when less stringent significance thresholds for the selection of SNPs are applied. Polygenic risk score analyses could be applied using various significance thresholds to determine the percentage of variance explained by a larger number of SNPs. We should also keep in mind that the markers detected with GWAS are not likely to be the causal variants, but merely in LD with them. As the SNPs on GWAS chips are selected because they are common variants, and low-frequency variants are in low LD with them, power to detect rare causal variants is limited.7

Some investigators have argued that a considerable part of the missing heritability may be caused by non-additive effects such as dominance and epistasis.9, 24 As such, an important result of our analyses, also confirmed by Zhu et al.,9 was that we were unable to find strong dominance effects. This is further supported by the findings of Zaitlen et al.,21 who developed a method to include both closely and distantly related individuals, and used this to analyze the inflation of narrow-sense heritability. Their findings supported neither dominance nor epistatic effects, and they attributed the inflation of narrow-sense heritability to shared environmental factors in datasets using close relatives. Epigenetic variation, including methylation profiles, has been suggested as another source of missing heritability, but a recent paper including samples from Lifelines found that methylation and genetic predictors for BMI did not overlap, suggesting that the former represent environmental effects on this trait. For height, methylation profiles did not explain any variation.25

In the current study, we paid careful attention to the construction of the GRS to determine the percentage of variance explained by known SNPs. The advantage of GRS is that it is conservative, as it only uses verified markers, and that its results are fairly robust. However, a number of issues need to be considered when using a GRS (see also the review by Wray et al.7). First, there should be no overlap in the samples from which the SNPs were selected and to which the GRS is applied, as this would yield overestimates of the percentage of variance explained. Consequently, for meta-GWASs that included Lifelines, we adjusted the SNP effect sizes by subtracting the Lifelines’ effect size, and excluded the SNPs that were no longer genome-wide significant after this correction. Second, if the replication sample is more closely related to the discovery population than to the target population, or if population stratification patterns of the discovery and replication samples are similar, the prediction accuracy will be overestimated.7 For the selection of SNPs we included only papers that used European individuals, to match the ethnicity of Lifelines and increase the probability that the SNPs replicated with a similar effect size. As a consequence our results may be less applicable to other ethnicities and percentages of explained variance are likely to be lower in non-European cohorts. Third, all SNPs included in the GRS need to be independent, because otherwise regions with high LD will contribute more to the GRS than regions with low or no LD, leading to biased results. This check is mainly important when multiple meta-GWAS and/or gene-centric studies are used for the selection of SNPs. For this reason we excluded 71 (4.2%) dependent SNPs from our original SNP set. 
Fourth, it is advised to check that the effect direction of all individual SNPs in your cohort matches that of the literature, otherwise effects of misaligned and correctly aligned SNPs might cancel each other out in the GRS and the GRS might turn out to be non-significant. As a check on correct modeling of the GRS, when using the original effect sizes as weights, and assuming that the effect sizes in your cohort are similar to those of the meta-GWAS, the regression coefficient of the weighted GRS model should be around 1.
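The last check can be illustrated with a toy example in Python. It regresses a phenotype on the weighted GRS without covariates (a simplification made for this sketch) and shows how a misaligned effect allele pulls the regression coefficient away from 1. The function names and the data are hypothetical.

```python
def weighted_grs(dosages, weights):
    """Weighted sum of risk-allele dosages per individual."""
    return [sum(w * d for w, d in zip(weights, row)) for row in dosages]


def slope(x, y):
    """Regression coefficient of y on x in a simple linear regression."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def flip_allele(dosages, snp_index):
    """Simulate a misaligned SNP by counting the other allele (dosage -> 2 - dosage)."""
    return [[2 - d if j == snp_index else d for j, d in enumerate(row)]
            for row in dosages]


# Toy data: the phenotype follows the literature effect sizes exactly.
dosages = [[0, 1], [1, 0], [2, 1], [1, 2], [0, 0], [2, 2]]
weights = [0.5, 0.2]
phenotype = [10.0 + g for g in weighted_grs(dosages, weights)]

aligned = slope(weighted_grs(dosages, weights), phenotype)
misaligned = slope(weighted_grs(flip_allele(dosages, 0), weights), phenotype)
```

With correctly aligned alleles the coefficient is 1 in this noise-free example, whereas flipping the first SNP drives it far from 1, which is the kind of deviation the check is meant to reveal.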

One limitation of our approach is that we focussed on heritability explained by common SNPs only, which is estimated to cover one third to one half of the total heritability found in twin and family studies,23, 26 with the remaining genetic variance most likely explained by lower frequency variants.27, 28 As such, our conclusions are limited to the heritability gap for only common SNPs as typically targeted by GWAS. The new method developed by Zaitlen et al.21 including relatives allows separate estimation of both common-SNP and total heritability. However, this latter estimate is potentially confounded by shared environment. With the increasing availability of whole-genome sequencing data expected in the near future, the recently developed expansion by Yang et al.28 of their GREML method, which stratifies for LD and minor allele frequency (GREML-LDMS), may be a good alternative for the estimation of heritability of complex traits as it captures contributions of both rare and common variants. A similar method using both rare and common SNPs was recently applied to 9 common diseases in data from the UK Biobank and explained an average of 57.3% of their total (narrow-sense) heritability based on structural equation modeling.29

In conclusion, we demonstrated that the majority of previously reported SNP associations for 32 continuous disease traits could be replicated in the Lifelines Cohort Study, confirming Lifelines’ value and reliability as a resource for genetic epidemiological studies. Although meta-GWASs have identified many SNPs that are associated with complex disease traits, these SNPs explain only a small to moderate part of the common-SNP heritability, which in turn explains up to ~50% of the total heritability. Our data suggest that dominance effects are unlikely to fill the gap of the missing heritability. Overall, our results showed that none of the GRSs of complex disease traits are sufficiently accurate for personalized prediction, which limits successful applications in clinical practice.

Data availability

The data that support the findings of this study are available from the Lifelines Cohort Study (https://lifelines.nl/lifelines-research/general) but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Genotype and phenotype data are available upon request from the Lifelines cohort, whereas the effect sizes used to correct for the Lifelines effect are available from the authors on reasonable request and with permission from Lifelines.