Genetic association mapping produces statistical links between phenotypes and genomic regions, but identifying causal variants remains difficult. Whole-genome sequencing (WGS) can help by providing complete knowledge of all genetic variants, but it is financially prohibitive for well-powered GWAS. We performed mapping of expression quantitative trait loci (eQTLs) with WGS and RNA-seq, and found that lead eQTL variants called with WGS were more likely to be causal. Through simulations, we derived properties of causal variants and used them to develop a method for identifying likely causal SNPs. We estimated that 25–70% of causal variants were located in open-chromatin regions, depending on the tissue and experiment. Finally, we identified a set of high-confidence causal variants and showed that these were more enriched in GWAS associations than other eQTLs. Of those, we found 65 associations with GWAS traits and provide examples in which genes implicated by expression are functionally validated as being relevant for complex traits.
We thank N. Lykoskoufis for assistance with the enrichment analysis. T.S. is supported as an NIHR Senior Research Fellow. This project was supported by a Helse Sør-Øst grant (2011060) to A.B. and an MRC Project Grant (L01999X/1) to K.S., and by grants from the NIH-NIMH (NIH-R01MH101814-GTEx), an IMI-Joint Undertaking of the European Commission (UE7-DIRECT-115317-1), the European Commission (UE7-EUROBATS-259749), the European Research Council (UE7-POPRNASEQ-260927), the Louis Jeantet Foundation, the Swiss National Science Foundation (31003A-149984 and 31003A-170096), and SystemsX (2012/201-SysGenetix) to E.T.D. The TwinsUK study was funded by the Wellcome Trust; European Community's Seventh Framework Programme (FP7/2007-2013) and the Medical Research Council. The study also received support from the National Institute for Health Research (NIHR)-funded BioResource, Clinical Research Facility and Biomedical Research Centre, based at Guy's and St Thomas' NHS Foundation Trust, in partnership with King's College London. SNP genotyping was performed by The Wellcome Trust Sanger Institute and National Eye Institute via NIH-CIDR. This study used data generated by the UK10K Consortium. Funding for UK10K was provided by the Wellcome Trust under award WT091310. A full list of the investigators who contributed to the generation of the UK10K data is available at http://www.UK10K.org/. This research was supported by grants from the European Research Council. Computation was performed at the Vital-IT Center (http://www.vital-it.ch/) for high-performance computing of the SIB Swiss Institute of Bioinformatics.
The authors declare no competing financial interests.
Integrated supplementary information
Based on five simulations per tissue, the x-axis shows the rank of the causal variant and the y-axis shows the proportion of times this outcome occurred. The whole-blood experiment was smaller than the other experiments, yet sample size does not seem to affect the distribution. The causal variant is the most strongly associated variant in 45% of cases and is among the ten most significantly associated variants 89% of the time. The boxes show the 25th and 75th percentiles, and each whisker ends at the furthest value that lies no more than 1.5 times the interquartile range from the edge of the box. Any values outside these whiskers are outliers and are plotted individually.
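The whisker rule described in this legend can be sketched as follows. This is a minimal illustration only, not the plotting code used for the figure: the function name `tukey_box_stats` and the choice of quantile interpolation are assumptions, and other quantile conventions would move the box edges slightly.

```python
from statistics import quantiles


def tukey_box_stats(values):
    """Box edges, whisker ends and outliers under the 1.5 x IQR rule.

    Hypothetical helper; the quantile method is an assumption.
    """
    data = sorted(values)
    # Box edges: 25th and 75th percentiles (linear interpolation).
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers end at the furthest observed values inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    # Everything beyond the fences is drawn as an individual point.
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


if __name__ == "__main__":
    # Made-up proportions, for illustration only.
    toy = [0.40, 0.42, 0.44, 0.45, 0.46, 0.47, 0.48, 0.90]
    print(tukey_box_stats(toy))
```

Note that the whiskers stop at observed data points rather than at the fences themselves, which matches the wording "the furthest value" in the legend.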
The LEVs called using sequence have a lower minor allele frequency than those called using arrays (0.26 vs. 0.27). The box edges show the 25th and 75th percentiles, and the whiskers end at the maximum and minimum values.
Supplementary Figure 3 Impute-derived INFO scores for the two genotyping arrays of the sequence LEVs.
The dots in pink reflect LEVs that were filtered from the array data due to poor imputation quality.
The CaVEMaN score is calibrated using the simulations to estimate the probability that the lead eQTL variant is causal. The estimated calibration functions are consistent across tissues, with the exception of blood, which is slightly less conservative than the other tissues, probably due to the smaller sample size.
Five simulated datasets were produced based on the genotype data and eQTLs mapped in Geuvadis. CaVEMaN was run on all of these datasets, and we plot the median CaVEMaN causal probabilities for LEVs, binned into 10 groups, against the true proportions of LEVs in these bins that were causal for the trait. The plot also shows the equivalent analysis performed using dap-g, and a further simulation in which the assumption of only one genetic signal in the region is violated by simulating a secondary eQTL.
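A calibration check of the kind described in this legend could be sketched as below. This is not the CaVEMaN implementation: the function name `calibration_table` is hypothetical, and the use of equal-sized bins ordered by predicted probability is an assumption, since the legend states only that LEVs were binned into 10 groups.

```python
from statistics import median


def calibration_table(probs, is_causal, n_bins=10):
    """Median predicted probability and observed causal fraction per bin.

    probs     : predicted causal probability for each lead eQTL variant
    is_causal : True/False (or 1/0) flag from the simulation truth
    Assumption: bins are equal-sized groups after sorting by probability.
    """
    if len(probs) != len(is_causal):
        raise ValueError("probs and is_causal must have the same length")
    pairs = sorted(zip(probs, is_causal))
    n = len(pairs)
    rows = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue  # fewer variants than bins
        predicted = median(p for p, _ in chunk)
        observed = sum(1 for _, c in chunk if c) / len(chunk)
        rows.append((predicted, observed))
    return rows


if __name__ == "__main__":
    # Made-up values, for illustration only.
    toy_probs = [0.1] * 10 + [0.9] * 10
    toy_truth = [1] + [0] * 9 + [1] * 9 + [0]
    for predicted, observed in calibration_table(toy_probs, toy_truth, n_bins=2):
        print(f"median predicted {predicted:.2f}  observed {observed:.2f}")
```

For a well-calibrated score, the two columns should lie close to the diagonal when plotted against each other.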
The comparison was only performed for genes with only one eQTL to minimize differences due to multiple eQTL mapping strategies. Spearman correlation between the two estimates was 0.856.
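For readers unfamiliar with the statistic, a Spearman correlation between two sets of per-variant estimates can be computed as the Pearson correlation of their ranks. The sketch below uses only the Python standard library; the function names are hypothetical and the averaging of tied ranks is an assumption about tie handling, not a description of the analysis performed here.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    print(spearman([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```

Because it depends only on ranks, the statistic is insensitive to any monotone rescaling of either estimate.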
Supplementary Figures 1–6, Supplementary Tables 1 and 2 and Supplementary Note
A full list of all eQTLs discovered in the five experiments, together with P value for association and causal probability score
A list of high confidence causal variants which are also significantly associated with a GWAS trait, together with an estimate produced by coloc of the probability of a shared genetic signal
About this article
Cite this article
Brown, A., Viñuela, A., Delaneau, O. et al. Predicting causal variants affecting expression by using whole-genome sequencing and RNA-seq from multiple human tissues. Nat Genet 49, 1747–1751 (2017). https://doi.org/10.1038/ng.3979