Genome-wide association studies (GWAS) are not fully comprehensive, as current strategies typically test only the additive model, exclude the X chromosome, and use only one reference panel for genotype imputation. We implement an extensive GWAS strategy, GUIDANCE, which improves genotype imputation by using multiple reference panels and includes the analysis of the X chromosome and non-additive models to test for association. We apply this methodology to 62,281 subjects across 22 age-related diseases and identify 94 genome-wide associated loci, including 26 previously unreported. Moreover, we observe that 27.7% of the 94 loci are missed if we use standard imputation strategies with a single reference panel, such as HRC, and only test the additive model. Among the new findings, we identify three novel low-frequency recessive variants with odds ratios larger than 4, which need at least a three-fold larger sample size to be detected under the additive model. This study highlights the benefits of applying innovative strategies to better uncover the genetic architecture of complex diseases.
Genome-wide association studies (GWAS) have been successful in identifying thousands of associations between genetic variation and human complex diseases and traits1. Nevertheless, for most complex diseases, only a small fraction of their genetic architecture is known, and a small amount of the estimated heritability is explained2. Variants that individually have small contributions to the risk of disease, and/or are rare in the population, are often missed by the majority of GWAS even though they may contribute to the pathophysiology of complex diseases. Some of the current limitations of GWAS could be overcome by increasing sample sizes and by applying more comprehensive analytical methods with improved imputation strategies3. Besides, whole-exome sequencing (WES) and whole-genome sequencing (WGS) datasets, such as the NHLBI Trans-Omics for Precision Medicine (TOPMed) Project4, and the UK Biobank5, are rapidly growing and expanding the range of variants to be tested in genetic association studies. However, together with the general tendency of increasing sample sizes, this imposes additional methodological and computational challenges. These can require scientists to restrict and simplify the analysis by limiting it to autosomal chromosomes, to a single reference panel for imputation, and to a single (additive) inheritance model for association testing, leaving a relevant fraction of the genetic architecture of the disease unexplored6.
The genetic variants that modify the risk to develop a particular complex disease may contribute to the final phenotype through different functional mechanism defined by a particular model of inheritance, which is further reflected in a characteristic distribution of affected alleles across patients and healthy individuals in GWAS. For example, the additive inheritance model, which is often the only genetic model tested, assumes that the risk of the disease is proportional to the number of risk alleles in an individual, i.e., that the effect of the heterozygous genotype is halfway between the two possible homozygous genotypes. However, some variants follow non-additive inheritance models, which include dominant, recessive, and heterodominant. The additive model is expected to capture a large fraction of the genetic risk for disease7 and can identify some variants that follow non-additive inheritance patterns. However, the additive model is not sufficient to provide a comprehensive overview of the genetic architecture of diseases. In particular, most GWAS may have insufficient power to identify low-frequency variants that show recessive effects8,9. The importance of evaluating non-additive inheritance models is well reported in the context of Mendelian diseases10 and occasionally for complex traits as well, such as the recessive effects of the FTO locus in obesity11, and in or near ITGA112, TBC1D413, and CDKAL111,14 genes in type 2 diabetes, as well as the known non-additive effects of HLA haplotypes in autoimmune diseases15 and ulcerative colitis16. The increasing ability to capture low-frequency variants using modern imputation reference panels and the need to uncover the still missing heritability estimated for most complex diseases, call for comprehensive association strategies that should include, among other improvements, the analysis of non-additive inheritance models.
In this work, and to fill this gap and to determine the prevalence and contribution of the different inheritance patterns involved in the genetic architecture of complex diseases, we design and implement a comprehensive strategy for genetic association analysis that combines imputation from multiple reference panels with association testing under five different inheritance models across multiple phenotypes. We apply this strategy to the Kaiser Permanente Research Program on Genes, Environment and Health: A Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort17, which includes 62,281 subjects from European ancestry and 22 diseases. Finally, we release here both the summary statistics for all the models of inheritance as well as the complete methodology, provided to the community as an easy-to-use and standalone pipeline. This pipeline allows the analysis of existing and newly generated GWAS data with better efficiency and more comprehensive testing, improving the chances of variant discovery.
In order to assess the potential benefits of applying more in-depth GWAS methodologies to available genetic datasets, and to investigate the relative contribution of different inheritance models to the risk to develop complex diseases, we have applied a global analysis strategy to the GERA cohort, an age-related disease-based cohort with an average age of 63, well-powered to study a broad range of clinically defined age-related conditions. By using this particular cohort, we expect to minimize a possible loss of power due to the misclassification of controls, as often happens in datasets with younger individuals that can include cases at pre-disease stages classified as controls.
Genotype imputation and association testing using multiple reference panels
After applying genetic quality control to the GERA cohort (see “Methods”), we retained 56,637 individuals with European ancestry for further downstream analysis (Supplementary Data 1). To cover the maximum number and type of genetic variants, we next applied an imputation strategy with four reference panels: the Genome of the Netherlands (GoNL)18,19, the UK10K Project20, the 1000 Genomes Project (1000G) phase 321, and Haplotype Reference Consortium (HRC)22, and imputed 11.2 M, 11.4 M, 13.1 M, and 11.7 M high-quality imputed variants (IMPUTE223 info score ≥0.7 and minor allele frequency [MAF] ≥ 0.001) with each panel, respectively. After combining the results of the four reference panels by choosing, for each variant, the panel that provided the highest imputation accuracy, we retained a total of 16,059,686 variants covering all the autosomes and the X chromosome (Fig. 1a). Using this strategy we imputed 2.6 M and 5.5 M high-quality, low-frequency (0.05> MAF > 0.01) and rare variants (0.01 > MAF > 0.001), respectively, as well as 1.6 M indels. Note that as many as 684,393 common variants (MAF ≥ 0.05), 255,106 low-frequency, 1.7 M rare, and all indels (1.6 M) would be missed if only the HRC reference panel was used. This highlights the benefit of combining different reference panels for comprehensive association testing (Fig. 1b).
To evaluate our imputation strategy, we used sequenced data from UK10K, which includes 3781 sequenced genomes, to build an in silico array of 599,208 variants. We then imputed the in silico array using 1000G, GoNL and HRC as reference panels and compared the imputed genotype dosages with the sequenced genotype dosages (allelic dosage R2, see “Methods” and Supplementary Fig. 1). Our results empirically demonstrate that our IMPUTE2-info threshold of 0.7 is a good cutoff for well-predicted genotypes (Supplementary Fig. 2). Moreover, the combination of the results from the different panels based on IMPUTE2-info values outperforms single reference panels in terms of % of variants with allelic dosage R2 ≥ 0.5 in autosomes (Fig. 1c) and in the X chromosome (Fig. 1d) for both SNPs and indels (Supplementary Fig. 3).
To evaluate that our strategy is able to identify previously known loci that are well-powered in our cohort, we performed a power comparison under the additive model for type 2 diabetes and age-related macular degeneration, since both are well-defined diseases with available summary statistics. The power comparison between our additive results for type 2 diabetes and those from the DIAMANTE consortium24 showed that while we had an 80% power to find two variants with a p-value of 5.0 × 10−8 in our dataset, we found four variants with genome-wide significance (Supplementary Data 2). For age-related macular degeneration25, we had 80% power to find six variants with a p-value of 5.0 × 10−8 of which we identified four of them with genome-wide significance. The remaining two variants were nominally significant in our study (Supplementary Data 3). These analyses suggest that our strategy is robust to find previously known variants given sufficient power.
We next tested all the 16 M variants for association with the 22 conditions available in the GERA cohort and five different inheritance models (Supplementary Figs. 4–25). This analysis identified 94 independent loci associated with 17 phenotypes at a usual genome-wide significance level (p < 5.0 × 10−8) of which 63 for 14 phenotypes were also experiment-wide significant (p < 2.0 × 10−8) after considering correction for the different models of inheritance (see “Methods”) (Supplementary Data 4). According to the GWAS Catalog, 68 of the 94 genome-wide significant loci had been previously reported to be associated with the same disease (Supplementary Data 5), whereas 26 of them correspond to previously unreported loci with associations across 16 phenotypes (Table 1).
Of these 26 new loci, 16 correspond to common, 3 to low-frequency, and 7 to rare variants. Only a fraction of the 26 new loci would have been genome-wide significant by using individual imputation panels (Fig. 1e), namely 19/26 using HRC, 14/26 using 1000G Phase 3, 13/26 using UK10K or 14/26 using GoNL. In addition, the lead marker for three of the novel signals was an indel, not covered by HRC, further confirming the benefits of combining multiple panels with our approach.
Identification of recessive variants with large effects
The implementation of refined GWAS strategies not only increases the number of associated variants, but also allows the identification of loci with large impact on the disease. Among the variants that were not detected under the additive model, and hence are expected to be missed by the majority of current GWAS, we highlight three variants with large recessive effects. First, an intronic indel in the CACNB4 gene, rs201654520, associated with a nearly 20-fold increase in risk for cardiovascular disease (MAF = 0.017, OR [CI 95%] = 19.0 [5.5–65.8], p = 4.3 × 10−8). CACNB4 encodes the β4 subunit of the voltage-dependent calcium channel. This subunit contributes to the flux of calcium ions into the cell by increasing peak calcium current and triggering muscle contraction. Interestingly, an intronic single nucleotide polymorphism (SNP) within CACNB4, rs150793926, was associated with idiopathic dilated cardiomyopathy in African Americans26, but this variant is not in linkage disequilibrium (LD) with rs201654520 (LD r2 27 = 0.0016 for European ancestry and LD r2 = 0.0 for African ancestry).
A second recessive variant with large effect, rs77704739, near the PELO gene, is associated with a fourfold risk for type 2 diabetes (MAF = 0.036, OR [CI 95%] = 4.3 [2.7–6.9], p = 1.75 × 10−8). An independent signal that is about 112 K base pairs away (rs870992, LD r2 = 0.0009) was previously associated with type 2 diabetes in the Greenlandic population, also with a recessive effect12. To provide insights into the underlying molecular mechanisms in disease, we interrogated comprehensive catalogs of genetic effects on gene expression: eQTLGen Consortium28 and GTEx29. The rs77704739 variant was significantly associated with gene expression of PELO in multiple tissues, including diabetes-relevant tissues such as adipose tissue, skeletal muscle, and pancreas. Colocalization analyses showed a probability higher than 0.8 in several tissues, including subcutaneous adipose tissue and skeletal muscle, suggesting this gene as the effector transcript (Fig. 2a, b, and Supplementary Data 6). In addition, we found that the lead variants in the PELO locus overlap with active promoter annotations in human pancreatic islets and open chromatin sites highly bounded by islet-specific transcription factors30,31 (Fig. 2c).
Third, a rare indel, rs557998486, located near the THUMPD2 gene, is associated with age-related macular degeneration (MAF = 0.009, OR = 10.5, p = 2.75 × 10−8). Interestingly, the fact that we found no SNPs in LD with this lead indel further confirms the benefits of multiple reference panel imputation strategies that include alternative forms of variation. The lead indel rs557998486 overlaps DNAse I hypersensitivity sites in retinal and iris cell lines32, highlighting a candidate open chromatin region that is also predicted to be an enhancer assigned to the THUMPD2 gene according to GeneHancer33. One of the variants with the highest LD with rs557998486 (rs116649730, LD r2 = 0.32) is associated with reduced expression of its nearest gene, THUMPD2 (z-score = −4.85, p = 1.25 × 10−6), according to eQTLGen Consortium data.
Our empirical evaluation also demonstrates that our imputation approach is accurate for the new variants, including those with a large recessive effect, and that the combination of multiple reference panels increases the certainty of the imputed genotypes (Table 1 and Supplementary Data 7).
Replication using UK Biobank and FinnGen
We sought replication of previously unreported loci using UK Biobank, a prospective cohort of ~500 K individuals aged between 40 and 695. Given the high heterogeneity in phenotype definitions in UK Biobank compared to GERA, we tested for replication with the same phenotype and related traits (Supplementary Data 8). Compared to GERA, some of the conditions may not be ascertained or have an age at onset later than the average age at ascertainment in UK Biobank (56.52 years34) which could affect the replication success. Despite these limitations, we tested the novel variants using the corresponding inheritance model and replicated 4 new loci with the same phenotype (Table 2).
The variant rs77704739 variant near PELO was associated with type 2 diabetes (OR-recessive [95% CI] = 1.9 [1.4–2.6], p = 4.95 × 10−4) and metformin use (OR-recessive [95% CI] = 2.3 [1.6–3.4], p = 3.8 × 10−5) in the UK Biobank5 (Supplementary Data 8), also only under the recessive model.
We further sought replication of the association within the CACNB4 gene with cardiovascular disease in FinnGen, a cohort of ~218 K Finnish individuals with an average age of 63, as it includes individuals with a higher average age (63 vs 56 in UK Biobank) and the risk of developing a cardiovascular disease is well-known to increase with age35. In addition, FinnGen has a precise and richer classification of this particular phenotype than UK Biobank. In brief, we tested rs201654520 for association with 47 cardiovascular endpoints. Of all the conditions tested, four (hypertensive heart disease, hypertensive heart and/or renal disease, heart failure, and right bundle-branch block) were nominally associated (p < 0.05). All the associations had a direction of effect consistent with the effect observed in the GERA cohort (Supplementary Fig. 26a). Despite the high heterogeneity in the phenotype definitions between cohorts, we meta-analyzed the results from these endpoints from FinnGen with the result from cardiovascular disease phenotype from GERA, but none of them reach the genome-wide significance (see “Methods”) (Supplementary Fig. 26). We did not include UK Biobank in this meta-analysis as the equivalent phenotypes were not available or had less than 350 cases in UK Biobank, therefore, underpowered for a recessive analysis. Notably, when analyzing the association of rs201654520 with related quantitative traits we found that those who were homozygous for the high-risk allele had lower systolic blood pressure (p = 4.1 × 10−3, beta = −0.23) (Supplementary Data 8). While lower systolic blood pressure has been associated with increased risk of myocardial infarction in particular circumstances, this is not the typical direction of association, and therefore merits additional study36.
We also sought replication of the recessive association of rs557998486 near THUMPD2 gene with macular degeneration in FinnGen. rs557998486 was associated with increased risk of macular degeneration in UK Biobank under the recessive model (OR [CI 95%] = 7.6 [1.5–37.3], p = 4.1 × 10−2), eye surgery (beta [CI 95%] = 1.6 [0.6–2.6], p = 1.17 × 10−3) (Supplementary Data 8), and with increased C-reactive protein, a known biomarker for macular degeneration37,38 (beta [CI 95%] = 1.1 [0.7–1.5], p = 1.15 × 10−4) (Supplementary Data 9). In FinnGen this variant was not significantly associated although it showed the same direction of effect. However, the meta-analysis did not reach the genome-wide significance (rs557998486 p = 9.6 × 10−6) and had a high heterogeneity (heterogeneity I2 = 87.1, heterogeneity p = 4.3 × 10−4).
Detection ranges of the different inheritance models
Our findings provide an empirical overview of the detection range of five different inheritance models and show how each of them captures a fraction of the genetic variants associated with complex traits. Compared to current GWAS that usually only consider additive allelic effects, we found three different scenarios. Among all the 94 associated loci identified, 12 showed genome-wide significance only under the additive model, 62 under both additive and non-additive models, and 20 showed genome-wide significance only when non-additive tests were applied (Fig. 3a). To further classify these variants, we tested whether any of the 62 variants associated with both additive and non-additive models deviate from additivity through a dominance deviation test11. Eleven of these 62 variants (17.7%) showed significant deviation from additivity (dominance deviation test p < 0.05). However, variants not showing a significant deviation from additivity may become significant for other models with larger sample sizes. Altogether, the dominance deviation test over the 93 autosomal loci identified 62 (66%) additive and 24 (25.5%) non-additive associations, and 8 undetermined. Based on the smallest GWAS p-value, we further classified non-additive associations into 9 recessive, 13 dominant, 8 heterodominat, and 7 genotypic (Supplementary Data 4).
We also observed that each of the available models for association testing has a different range of detection. To identify the 94 genome-wide associated loci, the additive test, as expected, was the most sensitive model (74 loci), followed by the genotypic (59 loci), the dominant (56 loci), the recessive (43 loci), and the heterodominant (32 loci). When considering known loci, 48 of the 68 previously reported loci were identified by more than one model in our analysis, and almost half of these (22 loci) with all five models. In contrast, of the 26 newly discovered variants, only 8 were identified with multiple models, whereas the majority of them (18 loci), were detected only with the additive (6 loci), the genotypic (4 loci), the recessive (4 loci), and the dominant (3 loci) model. Of note, 13 out of 26 (50%) novel loci were only identified by non-additive models.
To further investigate to what extent the additive model captures non-additive signals, and how much this depends on sample size, we carried out power calculations on loci that we identified only under a non-additive model, such as rs201654520 within CACNB4 gene and rs77704739 near the PELO gene. These power calculations showed that the additive test would require a population sample size of at least 370,646 individuals to detect the recessive association of rs201654520 in CACNB4 (Fig. 3b), and at least 188,637 individuals to capture the recessive signal of rs77704739 near the PELO gene (Fig. 3c), while the population sample size required for the recessive model was only 21,021 and 67,611, respectively. In this study, we were able to identify both associations with a modest sample size by using the most well-suited disease model.
The GUIDANCE framework
The complete methodology described here and used for the analysis of the GERA cohort was integrated into a framework, called GUIDANCE. GUIDANCE allows the analysis of genome-wide genotyped data in a single execution in distributed computing infrastructures without the need for extensive computational expertise or constant user intervention. The GUIDANCE workflow requires quality-controlled genotyped data as input and provides association results, graphical outputs, and statistical summaries. Integrating state-of-the-art tools with in-house code written in Java, bash, and R39, GUIDANCE efficiently performs large-scale GWAS, including (1) the pre-phasing of haplotypes, (2) the imputation of genotypes using multiple reference panels, (3) the association testing for different inheritance models and the integration of the results from different panels, (4) a cross-phenotype analysis when more than one phenotype is available in the cohort (Supplementary Data 10), and finally, (5) the generation of summary statistics tables and graphic representations of the results (Supplementary Fig. 27), for both the autosomes and the X chromosome. While GUIDANCE can be executed as a standalone compact program it can also be used in modules (Supplementary Fig. 28), which makes it adaptable to existing frameworks and provides a higher level of control to users.
GUIDANCE runs in distributed computing platforms, including the cloud, without requiring a broad background in these environments. This is feasible since GUIDANCE was implemented on top of the COMP Superscalar Programming Framework (COMPSs)40. The GUIDANCE workflow was implemented as a sequential Java program containing the calls to the GWAS tools, encapsulated in Java methods, and selected as tasks, while COMPSs controls the execution of those tasks on the underlying distributed infrastructure. The source code, the pre-compiled binaries, and the documentation to use GUIDANCE are available at http://cg.bsc.es/guidance.
Current genetic association studies are undergoing fundamental strategic and methodological changes to identify new associations. The gradual incorporation of WES and WGS, together with the continuous efforts to increase the sample size, impose computational and methodological challenges that cannot always be overcome. In addition, the inclusion of the X chromosome, and non-additive models during association testing, despite increasing the computational burden, can also lead to the identification of new associations. In this study, we demonstrate the value of applying a comprehensive GWAS strategy, including denser imputation strategies, the X chromosome, and non-additive association tests, to an existing large-scale genetic resource, the GERA cohort. We show that by applying more innovative imputation protocols we increased the number and the type of variants tested for association, including low-frequency and rare SNPs as well as alternative forms of variation, such as indels. Our analysis of the GERA cohort shows that between 13 and 20 of the genome-wide significant associations (14–21%) would not have been identified when using a single reference panel. Likewise, our analysis in the GERA cohort demonstrates that 21% of the associations would be missed by only testing the additive model. Overall, 27.6% of associations would not have been identified by applying the commonly used HRC and additive model association testing. The inclusion of the X chromosome within the study allowed us to identify an intronic variant (rs67648651) associated with dyslipidemia in the RGAG1 gene among the previously known associated loci since a variant in LD with our top variant (rs5985471, LD r2 = 0.92 for European ancestry) has already been associated with low-density lipoprotein cholesterol20. This supports the value of analyzing the X chromosome, which is not considered in the majority of GWAS.
In this study, we show the potential of identifying large effect recessive associations by maximizing the use of current reference panels and testing different inheritance models, as exemplified by the associations with type 2 diabetes, cardiovascular disease, and macular degeneration with variants near PELO, CACNB4, and THUMPD2, respectively. This strategy opens new avenues for future analyses in large-scale biobanks, as demonstrated with our power calculations, which show that even the largest available GWAS meta-analyses or biobanks would not have enough power to identify these associations using only the additive model. For example, the CACNB4 gene, associated with cardiovascular disease, would require a sample size equivalent to 370,000 individuals when using the additive test, 17 times larger than the required sample size under a recessive analysis. After considering all the supporting evidence illustrated with many examples in this study, the results suggest that these new associations deserve future validations and follow-up analysis. Therefore, this study demonstrates the importance of a comprehensive analysis including non-additive models when performing GWAS, which can increase the discovery not only on GWAS relying on genotyping array data, but also on WES or WGS association studies.
The inclusion of non-additive associations can also have an impact on the construction of polygenic risk scores. Current polygenic risk scores (PRS) are calculated summing risk alleles weighted by effect sizes from GWAS results, which have typically tested only the additive model in the association test. Hence, large-scale genome-wide association data accounting for different models of inheritance and including both SNPs and alternative forms of variation, such as indels, will also be essential to develop more accurate genome-wide PRS, which would weight each of the genotype carriers appropriately, rather than weighting the heterozygous halfway between the homozygous of the effect and alternate alleles.
To easily apply this strategy to genetic studies we present GUIDANCE, a standalone and easy-to-use application that allows an efficient and comprehensive GWAS analysis in different computing platforms, such as cloud and high-performance computing architectures. GUIDANCE is designed to allocate an unlimited number of reference panels. This can be useful for GWAS performed in specific diverse populations where the addition of a population-specific reference panel alongside the commonly used ones could be an advantage. This feature can also be useful when incorporating reference panels that have a better ascertainment of other type of variation, such as structural variants, with the most commonly used ones. In addition, it is possible to launch the execution by steps, incorporating the previously obtained results, avoiding repeating all the computations. In a moment where the community is facing computational and methodological challenges due to the growing complexity and size of genetic datasets, the availability of robust and complete analysis platforms can improve the efficiency of genetic studies, standardizing analysis strategies among meta-analysis cohorts to ensure consistency.
Finally, to share our results with the community and to promote the analysis of non-additive inheritance models in GWAS, a public searchable database including additive and non-additive summary statistics for 16 M of variants and 22 phenotypes is available at the Common Metabolic Diseases Knowledge Portal (https://cmdkp.org/), and full summary statistics both at the Common Metabolic Diseases Knowledge Portal and at http://cg.bsc.es/guidance.
Empirical evaluation of the imputation strategy
To empirically evaluate our imputation strategy, we extracted the corresponding genotypes in UK10K for each variant present in the genotyping array used in GERA to build an in silico array of 599,208 genotyped variants for the 3781 UK10K individuals. We then imputed the data using the 599,208 variants as a backbone, using 1000G, GoNL, and HRC as reference panels. We used IMPUTE2 to impute the genotypes. We also combined the imputed results based on the highest imputation accuracy according to the IMPUTE2-info values as we did with the GERA cohort. We transformed the IMPUTE2 genotype probabilities into genotype dosages and calculated the r-squared correlation (R2) between the imputed genotype dosages and the sequenced genotype dosages from UK10K, termed allelic dosage R2 (Supplementary Fig. 1).
GUIDANCE workflow description
By combining and integrating state-of-the-art GWAS analysis tools into the COMP Superscalar programming Framework (COMPSs), we developed GUIDANCE, a standalone application that performs haplotype phasing, genome-wide imputation, association testing, and PheWAS analysis of large GWAS datasets (Supplementary Fig. 27).
As shown in Supplementary Fig. 27, GUIDANCE’s workflow starts with quality-controlled genotype data and ends with providing association results, graphical outputs, and statistical summaries.
Once everything is settled in the GUIDANCE configuration file, GUIDANCE performs an efficient two-stage imputation procedure, by pre-phasing the genotypes into whole haplotypes followed by genotype imputation itself41. SHAPEIT242 or EAGLE243 and IMPUTE223 or MINIMAC444 can be used for pre-phasing and genotype imputation, respectively. In addition, GUIDANCE accepts one or multiple reference panels, allowing the integration of the results obtained from all panels by selecting for each variant the genotypes from the reference panel that provides the highest imputation accuracy according to the IMPUTE2-info score or MINIMAC2 r2 (Supplementary Fig. 29). GUIDANCE also performs a post-imputation quality control to eliminate low-quality imputed variants under the basis of the IMPUTE2-info score or MINIMAC2 r2 and the MAF.
After genotype imputation and post-imputation quality control, GUIDANCE applies SNPTEST for association testing, where additive, dominant, recessive, heterodominant, and genotype models can be analyzed. Here, the user can decide to include several covariates for the association test, such as principal components to adjust for population stratification, or any other confounders. GUIDANCE also allows testing for multiple phenotypes or for a single phenotype with different covariates in the same execution. After association testing, variants are filtered by the deviation from Hardy–Weinberg equilibrium (HWE) p-value. Finally, GUIDANCE generates summary reports for each trait with all the inheritance models tested in the association and the corresponding graphical representation, i.e., Manhattan and quantile–quantile (Q–Q) plots (Supplementary Fig. 4–25), also providing a matrix identifying cross-phenotype associations (Supplementary Data 10).
GUIDANCE can be executed as a standalone compact program or as independent modules (see Supplementary Fig. 28 for a list of independent modules) to facilitate the use of GUIDANCE into existing frameworks.
Further details can be found in the configuration file from the GUIDANCE execution at http://cg.bsc.es/guidance. Specific documentation to use this framework is available at http://cg.bsc.es/guidance, as well as the source code and the pre-compiled binaries that are available in the Download section.
The analysis of GERA cohort
GERA cohort description
GERA cohort data was obtained through dbGaP under accession phs000674.v1.p1. For further information about the specific phenotypes (ICD-9-CM codes) included in GERA, visit its website on dbGaP (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/GetPdf.cgi?id=phd004308). The Resource for Genetic Epidemiology Research on Aging (GERA) Cohort was created by a RC2 Grand Opportunity grant that was awarded to the Kaiser Permanente Research Program on Genes, Environment, and Health (RPGEH) and the UCSF Institute for Human Genetics (AG036607; Schaefer/Risch, PIs). The RC2 project enabled genome-wide SNP genotyping (GWAS) to be conducted on a cohort of over 100 K adults who were members of the Kaiser Permanente Medical Care Plan, Northern California Region (KPNC), and participating in its RPGEH. The resulting GERA cohort is composed of 42% of males, 58% of females, and ranges in age from 18 to over 100 years old with an average age of 63 years at the time of the RPGEH survey (2007). Nineteen percent of the individuals are from non-European ancestry, while 81% are described as white non-Hispanic participants. After an explicit requirement of consent by email, data from 78,486 participants were deposited in dbGaP, with similar demographic characteristics to those of the initial genotyped cohort.
A subset of 62,281 subjects of European ancestry underwent quality control analyses. A three-step quality control protocol was applied using PLINK45,46, and included two stages of SNP removal and an intermediate stage of sample exclusion.
The exclusion criteria for genetic markers consisted of proportion of missingness ≥ 0.05, HWE p ≤ 1 × 10−20 for all the cohort, and MAF < 0.001. This protocol for genetic markers was performed twice, before and after sample exclusion.
For the individuals, we considered the following exclusion criteria: gender discordance, subject relatedness (pairs with PI-HAT ≥ 0.125 from which we removed the individual with the highest proportion of missingness), sample call rates ≥ 0.02, and population structure showing more than four standard deviations within the distribution of the study population according to the first seven principal components (Supplementary Fig. 30). After QC, 56,637 subjects remained for the analysis (Supplementary Data 1).
Analyzing GERA cohort using GUIDANCE
GUIDANCE pre-phased the genotypes to whole haplotypes with SHAPEIT2, and then performed genotype imputation with IMPUTE2 using 1000G phase 3, UK10K, GoNL, and HRC as reference panels. After filtering variants with an info score < 0.7 and a MAF < 0.001, we tested for association with additive, dominant, recessive, heterodominant, and genotypic logistic regression using SNPTEST, and including seven derived principal components, sex, and age as covariates. To maximize power and accuracy, we combined the association results from the four reference panels by choosing for each variant, the genotypes from the reference panel that provided the best IMPUTE2-info score.
For chromosome X we restricted the analysis to non-pseudoautosomal (non-PAR) regions. For the haplotype phasing of chromosome X, we used the --chrX flag required for SHAPEIT to only phase female samples since males only have one X chromosome, and to impute missing data in male samples. We did the same imputing genotypes using IMPUTE2 as it requires the -chrX flag alongside the sample file with the sex information. In the association test, we used -method newml from SNPTEST to ignore samples with missing sex or males encoded wrongly (males should be coded 0/1, as homozygote females), and to assume a model of full X inactivation. Hence, the logistic regression model assumes a complete inactivation of one allele in females and equal effect size between males and females. For heterogeneity between males and females, and to allow a complete inactivation of the X chromosome in females, we used -stratify_on sex to separate the effects and the baselines of males and females, accounting for hemizygosity for males, while for females, we followed an autosomal model.
Finally, we excluded variants with HWE controls p < 1 × 10−6 and with a case count for homozygous for the alternative allele below three in the final results for the recessive and genotypic model, as we observed a trend for genomic inflation and deflation in the recessive and genotypic model before removing these variants.
Identification of known and new associated loci
After the association test, GUIDANCE provided a list of variants that passed the p-value threshold specified in the configuration file (i.e., p ≤ 5.0 × 10−8). Using the IRanges R package47, all the genome-wide significant variants were collapsed into ranges (500 kb) that define each associated locus.
To distinguish between known or new associated regions, for each top variant we looked for any proxy variant with an LD r2 > 0.35 in the GWAS catalog (accession 5 September 2019) associated with the same phenotype or a related one (for example, bone mineral density, cholesterol levels or diastolic/systolic blood pressure phenotypes for osteoporosis, dyslipidemia or hypertension, respectively). HLA regions at chromosome 6 were excluded since the particularities of these regions required further detailed studies on their LD pattern. Proxies were selected using LDlink (https://ldlink.nci.nih.gov/)48.
We defined an experiment-wide significant p-value cutoff of p < 2.0 × 10−8 by applying the Bonferroni correction for 2.5 effective test (5.0 × 10−8/2.5 effective test). This factor of 2.5 was obtained from a simulation study when four genetic models (additive, dominant, recessive, and genotypic) are used49 since the genetic models are not independent. However, a new simulation study including the heterodominant model should be done for a more accurate effective number of tests.
Replication with UK biobank
UK Biobank participants agreed to provide detailed information about their lifestyle, environment, and medical history, to donate biological samples (for genotyping and for biochemical assays), to undergo measures, and to have their health followed (http://www.ukbiobank.ac.uk/).
When collecting and analyzing a wide range of phenotypes from the UK Biobank, a central challenge was the curation and harmonization of the vast array of categorizations, variable scalings, and follow-up responses. Fortunately, to this end, the PHEnome Scan ANalysis Tool (or PHESANT: https://github.com/MRCIEU/PHESANT)50 performs much of the transformations and recodings required to generate meaningful, interpretable phenotypes.
We have made further adjustments based on user feedback, owing to the value of transparency in generating our phenotype guidelines. Applying these changes to the PHESANT source code, phenotypes were parsed using our modified version (github.com/astheeggeggs/PHESANT) on a virtual machine on the Google Cloud Platform.
We first restricted to the subset of European individuals, before passing the resultant phenotypic data to PHESANT. The ‘variable list’ file and ‘data-coding’ file, whose formats are defined in the original version of PHESANT were updated as new phenotypes were added in the latest UK Biobank release. Recodings of variables, and inherent orderings of categorical variables, are defined in the ‘data-coding’ file. The ‘Excluded’ column of the ‘variable list’ file defines the collection of variables that we do not wish to interrogate.
A high-level overview of the PHESANT pipeline, our defaults, and the associated short flags for the phenomescan.r code are displayed in Supplementary Fig. 31. In addition to the inverse-rank normalization applied to the collection of continuous phenotypes, we also consider the raw version of the continuous phenotype, with no transformation applied to the data.
Curation of the ICD10 codes was carried out separately for computational efficiency. For the ICD10 phenotype, individuals are assigned a vector of ICD10 diagnoses. We truncated these codes to two digits, and assigned each individual to either case or control status for that ICD10 code in turn by checking if their vector contains that code. Throughout, we assumed the data contained no missingness, so the sum of cases and controls throughout was the number of individuals in our ‘European’ subset of the UK Biobank data. As in the PHESANT categorical (multiple) phenotypes, ICD10 code case/control phenotypes were removed if <50 individuals had the diagnosis.
Association testing and meta-analysis for UK Biobank phenotypes
We performed the association testing for the curated phenotypes as implemented in SNPTEST for additive, dominant, recessive, heterodominant, and genotypic inheritance models, as it has been described in the “Analyzing GERA cohort using GUIDANCE” section. For all genotypic variants identified in the discovery stage, we assigned the recessive model after we identified it as the underlying model.
After the association testing, we filtered and ordered all the phenotypes based on the p-value for the best model of inheritance obtained from the GERA cohort analysis, with special consideration to equivalent phenotypes or related traits.
With the association testing results of both GERA cohort and UK Biobank, we meta-analyzed the results using METAL51. We use the inverse variance-weighted fixed-effect model for all the variants except for the rs557998486 variant associated with macular degeneration, since its beta, calculated with the em method from SNPTEST, was inflated. Therefore, we performed a sample size-based meta-analysis, which converts the direction of the effect and the p-value into a z-score.
For biomarkers, only the results from the first visit were taken into account since <10% of the cases were present in the second visit.
Association testing and meta-analysis with FinnGen
We used SAIGE52 for recessive association testing using sex, age, PC1–10, and batch as covariates. We analyzed FinnGen release 5 that contains 218,792 individuals with a median age 62.6 and a mean age 59.8.
For the cardiovascular disease endpoints, we meta-analyzed the results using rmeta R package53. For macular degeneration, we meta-analyzed the results using METAL as described in the previous section.
Dominance deviation test
To detect genuine differences between additive and non-additive signals, we performed a dominance deviation test for all 93 autosomal genome-wide significant loci.
Dominance deviation was tested by a logistic regression analysis using PLINK (v1.90b6.9, www.cog-genomics.org/plink/1.9/). Sex, age, and the first 7 PCs were included as covariates.
Definition of 99% credible set of PELO locus
For the PELO locus, the fraction of aggregated variants that have a 99% probability of containing the causal one was identified. The 99% credible set of variants for the region was defined with a Bayesian refinement approach54, considering variants with an r2 > 0.1 with the leading one.
For each variant within the PELO locus, the credible set provides a posterior probability of being the causal one54. The approximate Bayes factor (ABF) for each variant was estimated as
The β and the SE result from a logistic regression model testing for association. The posterior probability for each variant was calculated as
where ABFi corresponds to the approximate Bayes’ factor for the marker i, and T represents the sum of all the ABF values enclosed in the interval. As commonly employed by SNPTEST, this calculation assumes that the prior of the β is a Gaussian with mean 0 and variance 0.04.
Finally, the cumulative posterior probability was calculated after ranking the variants according to the ABF in decreasing order. Variants were included in the 99% credible set of the region until the cumulative posterior probability of association got over 0.99.
Gene expression and functional characterization
The eQTLGen Consortium (https://www.eqtlgen.org/cis-eqtls.html, last access on July 2019) and GTEx portal (https://gtexportal.org/, last access on July 2019) were used to find associations between our novel genetic associations and gene expression. When the variant was not available in the resources, a proxy SNP was used instead.
To determine whether any identified overlap between GERA GWAS loci and eQTLGen or GTEx eQTLs was due to a true shared association signal, we performed a colocalization analysis. Colocalization was assessed by a Bayesian test using summary statistics from both studies55: summary statistics from the cis eQTLGen and GTEx were downloaded from the eQTLGen website and GTEx portal, respectively. The test was performed using the R package coloc v3.2-155,56,57. The test provided a posterior probability for the GWAS locus and the eQTL to share the same causal variant(s).
We integrated available epigenomic datasets to examine the role of human pancreatic islet transcriptional regulation underlying rs77704739 association with type 2 diabetes. We used the WashU EpiGenome Browser (http://epigenomegateway.wustl.edu/browser/, last access on July 2019) and previously published RNA-seq, ATAC-seq, and ChIP-seq assays of H3K4me3, H3K27ac, Mediator, CTCF, and islet transcription factors (FOXA2, MAFB, NKX2.2, NKX6.1, and PDX1) in human pancreatic islets30,31 and islet regulome annotations31.
Comparison of power calculation under different inheritance models
Power calculations were performed using libraries epiR58 and GeneticsDesign59. For each variant, the power was computed across different allele frequencies and sample sizes. Frequencies of homozygous for different allele frequencies were estimated assuming Hardy–Weinberg equilibrium. The sample size needed to achieve 80% power was plotted against the allele frequency. For the additive model, we chose the observed odds ratio for the additive model, whereas the observed odds ratio for the recessive model was chosen for the recessive model.
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
GUIDANCE60 is available at https://gitlab.bsc.es/computational-genomics/guidance (https://doi.org/10.5281/zenodo.4446121). A zipped folder containing the code used to generate the results presented in the article can be found in GitLab (https://gitlab.bsc.es/computational-genomics/guidance) and Zenodo (https://zenodo.org/record/4446121#.YASVmWhKiUk) and can be cited with the DOI 10.5281/zenodo.4446121. It contains example configuration files, explanations on how to execute the framework, a containerized version of the pipeline, and the source code. More detailed instructions can be found on its website (http://cg.bsc.es/guidance/).
Welter, D. et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 42, D1001–D1006 (2014).
Manolio, T. A. et al. Finding the missing heritability of complex diseases. Nature 461, 747–753 (2009).
Bonas-Guarch, S. et al. Re-analysis of public genetic data reveals a rare X-chromosomal variant associated with type 2 diabetes. Nat. Commun. 9, 321 (2018).
Taliun, D. et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 590, 290–299 (2021).
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
Tam, V. et al. Benefits and limitations of genome-wide association studies. Nat. Rev. Genet. 20, 467–484 (2019).
Zhu, Z. et al. Dominance genetic variation contributes little to the missing heritability for human complex traits. Am. J. Hum. Genet. 96, 377–385 (2015).
Salanti, G. et al. Underlying genetic models of inheritance in established type 2 diabetes associations. Am. J. Epidemiol. 170, 537–545 (2009).
Lettre, G., Lange, C. & Hirschhorn, J. N. Genetic model testing and statistical power in population-based association studies of quantitative traits. Genet. Epidemiol. 31, 358–362 (2007).
Antonarakis, S. E. & Beckmann, J. S. Mendelian disorders deserve more attention. Nat. Rev. Genet. 7, 277–282 (2006).
Wood, A. R. et al. Variants in the FTO and CDKAL1 loci have recessive effects on risk of obesity and type 2 diabetes, respectively. Diabetologia 59, 1214–1221 (2016).
Grarup, N. et al. Identification of novel high-impact recessively inherited type 2 diabetes risk variants in the Greenlandic population. Diabetologia 61, 2005–2015 (2018).
Moltke, I. et al. A common Greenlandic TBC1D4 variant confers muscle insulin resistance and type 2 diabetes. Nature 512, 190–193 (2014).
Steinthorsdottir, V. et al. A variant in CDKAL1 influences insulin response and risk of type 2 diabetes. Nat. Genet. 39, 770–775 (2007).
Lenz, T. L. et al. Widespread non-additive and interaction effects within HLA loci modulate the risk of autoimmune diseases. Nat. Genet. 47, 1085–1090 (2015).
Goyette, P. et al. High-density mapping of the MHC identifies a shared role for HLA-DRB1*01:03 in inflammatory bowel diseases and heterozygous advantage in ulcerative colitis. Nat. Genet. 47, 172–179 (2015).
Hoffmann, T. J. et al. Imputation of the rare HOXB13 G84E mutation and cancer risk in a large population-based cohort. PLoS Genet. 11, e1004930 (2015).
Boomsma, D. I. et al. The Genome of the Netherlands: design, and project goals. Eur. J. Hum. Genet. 22, 221–227 (2014).
Genome of the Netherlands Consortium. Whole-genome sequence variation, population structure and demographic history of the Dutch population. Nat. Genet 46, 818–825 (2014).
UK10K Consortium. et al. The UK10K project identifies rare variants in health and disease. Nature 526, 82–90 (2015).
1000 Genomes Project Consortium. et al. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).
McCarthy, S. et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat. Genet. 48, 1279–1283 (2016).
Howie, B. N., Donnelly, P. & Marchini, J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet. 5, e1000529 (2009).
Mahajan, A. et al. Fine-mapping type 2 diabetes loci to single-variant resolution using high-density imputation and islet-specific epigenome maps. Nat. Genet. 50, 1505–1513 (2018).
Fritsche, L. G. et al. A large genome-wide association study of age-related macular degeneration highlights contributions of rare and common variants. Nat. Genet. 48, 134–143 (2016).
Xu, H. et al. A genome-wide association study of idiopathic dilated cardiomyopathy in African Americans. J. Pers. Med. 8, 11 (2018).
Hill, W. G. & Robertson, A. Linkage disequilibrium in finite populations. Theor. Appl Genet 38, 226–231 (1968).
Võsa, U. et al. Unraveling the polygenic architecture of complex traits using blood eQTL metaanalysis. Preprint at bioRxiv https://doi.org/10.1101/447367 (2018).
GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 45, 580–585 (2013).
Pasquali, L. et al. Pancreatic islet enhancer clusters enriched in type 2 diabetes risk-associated variants. Nat. Genet. 46, 136–143 (2014).
Miguel-Escalada, I. et al. Human pancreatic islet three-dimensional chromatin architecture provides insights into the genetics of type 2 diabetes. Nat. Genet. 51, 1137–1148 (2019).
Consortium, E. P. The ENCODE (ENCyclopedia Of DNA Elements) Project. Science 306, 636–640 (2004).
Fishilevich, S. et al. GeneHancer: genome-wide integration of enhancers and target genes in GeneCards. Database 2017, https://doi.org/10.1093/database/bax028 (2017).
Hewitt, J., Walters, M., Padmanabhan, S. & Dawson, J. Cohort profile of the UK Biobank: diagnosis and characteristics of cerebrovascular disease. BMJ Open 6, e009161 (2016).
Yazdanyar, A. & Newman, A. B. The burden of cardiovascular disease in the elderly: morbidity, mortality, and costs. Clin. Geriatr. Med. 25, 563–577 (2009). vii.
Vidal-Petiot, E. et al. Cardiovascular event rates and mortality according to achieved systolic and diastolic blood pressure in patients with stable coronary artery disease: an international cohort study. Lancet 388, 2142–2152 (2016).
Han, X. et al. Using Mendelian randomization to evaluate the causal relationship between serum C-reactive protein levels and age-related macular degeneration. Eur. J. Epidemiol. 35, 139–146 (2020).
Molins, B., Romero-Vazquez, S., Fuentes-Prior, P., Adan, A. & Dick, A. D. C-reactive protein as a therapeutic target in age-related macular degeneration. Front. Immunol. 9, 808 (2018).
R Foundation for Statistical Computing. R: A Language and Environment for Statistical Computing (2019).
Lordan, F. et al. ServiceSs: an interoperable programming framework for the Cloud. J. Grid Comput. 12, 67–91 (2014).
Howie, B., Fuchsberger, C., Stephens, M., Marchini, J. & Abecasis, G. R. Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Nat. Genet. 44, 955–959 (2012).
Delaneau, O., Zagury, J. F. & Marchini, J. Improved whole-chromosome phasing for disease and population genetic studies. Nat. Methods 10, 5–6 (2013).
Loh, P. R. et al. Reference-based phasing using the Haplotype Reference Consortium panel. Nat. Genet. 48, 1443–1448 (2016).
Das, S. et al. Next-generation genotype imputation service and methods. Nat. Genet. 48, 1284–1287 (2016).
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).
Lawrence, M. et al. Software for computing and annotating genomic ranges. PLoS Comput. Biol. 9, e1003118 (2013).
Machiela, M. J. & Chanock, S. J. LDlink: a web-based application for exploring population-specific haplotype structure and linking correlated alleles of possible functional variants. Bioinformatics 31, 3555–3557 (2015).
Mercader, J. M. et al. Altered brain-derived neurotrophic factor blood levels and gene variability are associated with anorexia and bulimia. Genes Brain Behav. 6, 706–716 (2007).
Millard, L. A. C., Davies, N. M., Gaunt, T. R., Davey Smith, G. & Tilling, K. Software Application Profile: PHESANT: a tool for performing automated phenome scans in UK Biobank. Int. J. Epidemiol. 47, 29–35 (2018).
Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).
Zhou, W. et al. Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nat. Genet. 50, 1335–1341 (2018).
Lumley, T. rmeta: Meta-analysis. R package version 3.0 (2018).
Wellcome Trust Case Control C. et al. Bayesian refinement of association signals for 14 loci in 3 common diseases. Nat. Genet. 44, 1294–1301 (2012).
Giambartolomei, C. et al. A Bayesian framework for multiple trait colocalization from summary association statistics. Bioinformatics 34, 2538–2545 (2018).
Wallace, C. Statistical testing of shared genetic control for potentially related traits. Genet. Epidemiol. 37, 802–813 (2013).
Wallace, C. et al. Statistical colocalization of monocyte gene expression and genetic risk variants for type 1 diabetes. Hum. Mol. Genet. 21, 2815–2824 (2012).
Stevenson, M. et al. epiR: tools for the analysis of epidemiological. R package version 1.0, 2 edn (2019).
Warnes, G., Duffy, D., Man, M., Qiu, W. & Lazarus, R. GeneticsDesign: functions for designing genetics studies. R package version 1.52.0 edn (2019).
Guindo-Martínez, M. et al. The impact of non-additive genetic associations on age-related complex diseases. https://doi.org/10.5281/zenodo.4446121 (2021).
Lim, E. T. et al. A novel test for recessive contributions to complex diseases implicates Bardet-Biedl syndrome gene BBS10 in idiopathic type 2 diabetes and obesity. Am. J. Hum. Genet. 95, 509–520 (2014).
This work has been sponsored by the grant SEV-2011-00067 and SEV2015-0493 of Severo Ochoa Program, awarded by the Spanish Government, by the grant TIN2015-65316-P, awarded by the Spanish Ministry of Science and Innovation, and by the Generalitat de Catalunya (contract 2014-SGR-1051). This work was supported by an EFSD/Lilly research fellowship. Josep M. Mercader was supported by a Sara Borrell Fellowship from the Instituto Carlos III, Beatriu de Pinós fellowship from the Agency for Management of University and Research Grants (AGAUR) and by the American Diabetes Association Innovative and Clinical Translational Award 1-19-ICTS-068. Sílvia Bonàs was supported by FI-DGR Fellowship from FIDGR 2013 from Agència de Gestió d’Ajuts Universitaris i de Recerca (AGAUR, Generalitat de Catalunya), and a ‘Juan de la Cierva’ postdoctoral fellowship (MINECO;FJCI-2017-32090). Cecilia Salvoro received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement H2020-MSCA-COFUND-2016-754433. Cristian Ramon-Cortes pre-doctoral contract is financed by the Spanish Ministry of Science, Innovation, and Universities under contract BES-2016-076791. Elizabeth G. Atkinson was supported by the National Institutes of Mental Health (grants K01MH121659 and T32MH017119). Jose Florez was supported by NIH/NIDDK award K24 DK110550. This study made use of data generated by the UK10K Consortium, derived from samples from UK10K COHORT IMPUTATION (EGAS00001000713). A full list of the investigators who contributed to the generation of the data is available at www.UK10K.org. Funding for UK10K was provided by the Wellcome Trust under award WT091310. This study made use of data generated by the ‘Genome of the Netherlands’ project, which is funded by the Netherlands Organization for Scientific Research (grant no. 184021007). The data were made available as a Rainbow Project of BBMRI-NL. Samples were contributed by LifeLines (http://lifelines.nl/lifelines-research/general), the Leiden Longevity Study (http://www.healthy-ageing.nl; http://www.langleven.net), the Netherlands Twin Registry (NTR: http://www.tweelingenregister.org), the Rotterdam studies (http://www.erasmus-epidemiology.nl/rotterdamstudy) and the Genetic Research in Isolated Populations program (http://www.epib.nl/research/geneticepi/research.html#gip). The sequencing was carried out in collaboration with the Beijing Institute for Genomics (BGI). This study also made use of data generated by The Haplotype Reference Consortium (HRC) accessed through The European Genome-phenome Archive at the European Bioinformatics Institute with the accession numbers EGAD00001002729, after a form agreed by the Barcelona Supercomputing Center (BSC) with WTSI. This research has been conducted using also the UK Biobank Resource (application number 31063 and 27892). The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health, and by NCI, NHGRI, NHLBI, NIDA, NIMH, and NINDS. The data used for the analyses described in this manuscript were obtained from the GTEx Portal on 07/16/2019. We acknowledge PRACE for awarding us access to both MareNostrum supercomputer from the Barcelona Supercomputing Center, based in Spain at Barcelona, and the SuperMUC supercomputer of the Leibniz Supercomputing Center (LRZ), based in Garching at Germany (proposals numbers 2016143358 and 2016163985). The technical support group from the Barcelona Supercomputing Center is gratefully acknowledged. Finally, we thank all the Computational Genomics group at the BSC for their helpful discussions and valuable comments on the manuscript. We also acknowledge Elias Rodriguez Fos for designing the GUIDANCE logo.
The authors declare no competing interests.
Peer review information Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Guindo-Martínez, M., Amela, R., Bonàs-Guarch, S. et al. The impact of non-additive genetic associations on age-related complex diseases. Nat Commun 12, 2436 (2021). https://doi.org/10.1038/s41467-021-21952-4
This article is cited by
Dominance is common in mammals and is associated with trans-acting gene expression and alternative splicing
Genome Biology (2023)
Translational Psychiatry (2023)
Genetic Variants in WNT16 and PKD2L1 Locus Affect Heel Ultrasound Bone Stiffness: Analyses from the General Population and Patients Evaluated for Osteoporosis
Calcified Tissue International (2023)
High-throughput genetic clustering of type 2 diabetes loci reveals heterogeneous mechanistic pathways of metabolic disease