Predicting causal variants affecting expression by using whole-genome sequencing and RNA-seq from multiple human tissues



Genetic association mapping produces statistical links between phenotypes and genomic regions, but identifying causal variants remains difficult. Whole-genome sequencing (WGS) can help by providing complete knowledge of all genetic variants, but it is financially prohibitive for well-powered genome-wide association studies (GWAS). We mapped expression quantitative trait loci (eQTLs) with WGS and RNA-seq and found that lead eQTL variants called with WGS were more likely to be causal. Through simulations, we derived properties of causal variants and used them to develop a method for identifying likely causal SNPs. We estimated that 25–70% of causal variants were located in open-chromatin regions, depending on the tissue and experiment. Finally, we identified a set of high-confidence causal variants and showed that these were more enriched for GWAS associations than other eQTLs. Among these variants, we found 65 associations with GWAS traits, and we provide examples in which genes implicated by expression have been functionally validated as relevant for complex traits.
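
To make the idea of a "lead eQTL variant" concrete, the following is a minimal sketch on simulated data. It is not the authors' pipeline: every function name and parameter here is hypothetical, linkage disequilibrium is imitated crudely by copying the causal genotype with a fixed probability, and the squared Pearson correlation stands in for a regression-based association test.

```python
import random


def pearson_r2(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic variant or constant expression
    return sxy * sxy / (sxx * syy)


def simulate_locus(n_samples, n_variants, causal, ld, effect, rng):
    """Toy locus (assumption): each non-causal variant copies the causal
    genotype with probability `ld`, otherwise it is drawn independently."""
    def draw():
        # Allele dosage 0, 1 or 2 with an assumed allele frequency of 0.3.
        return sum(rng.random() < 0.3 for _ in range(2))

    causal_geno = [draw() for _ in range(n_samples)]
    genotypes = []
    for v in range(n_variants):
        if v == causal:
            genotypes.append(causal_geno)
        else:
            genotypes.append([g if rng.random() < ld else draw()
                              for g in causal_geno])
    # Expression depends only on the causal variant, plus Gaussian noise.
    expression = [effect * g + rng.gauss(0.0, 1.0) for g in causal_geno]
    return genotypes, expression


def lead_variant(genotypes, expression):
    """Return the index of the most strongly associated variant and all scores."""
    scores = [pearson_r2(g, expression) for g in genotypes]
    lead = max(range(len(scores)), key=scores.__getitem__)
    return lead, scores


def fraction_lead_is_causal(n_reps, rng, causal=7):
    """Repeat the simulation and count how often the lead variant is causal."""
    hits = 0
    for _ in range(n_reps):
        genotypes, expression = simulate_locus(200, 20, causal, 0.9, 0.5, rng)
        hits += lead_variant(genotypes, expression)[0] == causal
    return hits / n_reps


rng = random.Random(1)
genotypes, expression = simulate_locus(200, 20, causal=7, ld=0.9, effect=0.5, rng=rng)
lead, scores = lead_variant(genotypes, expression)
print("lead variant:", lead, "| simulated causal variant: 7")
print("fraction of replicates with a causal lead:", fraction_lead_is_causal(100, rng))
```

In this toy setting the lead variant need not coincide with the simulated causal variant when neighbouring variants are strongly correlated with it, which illustrates why the abstract treats identification of causal variants as a separate problem from association mapping.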

Published in Nature Genetics.









Acknowledgements

We thank N. Lykoskoufis for assistance with the enrichment analysis. T.S. is supported as an NIHR Senior Research Fellow. This project was supported by a Helse Sør-Øst grant (2011060) to A.B. and an MRC Project Grant (L01999X/1) to K.S., and by grants from the NIH-NIMH (NIH-R01MH101814-GTEx), an IMI-Joint Undertaking of the European Commission (UE7-DIRECT-115317-1), the European Commission (UE7-EUROBATS-259749), the European Research Council (UE7-POPRNASEQ-260927), the Louis Jeantet Foundation, the Swiss National Science Foundation (31003A-149984 and 31003A-170096), and SystemsX (2012/201-SysGenetix) to E.T.D. The TwinsUK study was funded by the Wellcome Trust, the European Community's Seventh Framework Programme (FP7/2007-2013), and the Medical Research Council. The study also received support from the National Institute for Health Research (NIHR)-funded BioResource, Clinical Research Facility, and Biomedical Research Centre, based at Guy's and St Thomas' NHS Foundation Trust, in partnership with King's College London. SNP genotyping was performed by the Wellcome Trust Sanger Institute and the National Eye Institute via NIH-CIDR. This study used data generated by the UK10K Consortium. Funding for UK10K was provided by the Wellcome Trust under award WT091310. A full list of the investigators who contributed to the generation of the UK10K data is available at http://www.UK10K.org/. This research was supported by grants from the European Research Council. Computation was performed at the Vital-IT Center (http://www.vital-it.ch/) for high-performance computing of the SIB Swiss Institute of Bioinformatics.

Author information


  1. Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland.

    • Andrew Anand Brown
    • Ana Viñuela
    • Olivier Delaneau
    • Emmanouil T Dermitzakis
  2. Institute of Genetics and Genomics in Geneva (iGE3), University of Geneva, Geneva, Switzerland.

    • Andrew Anand Brown
    • Ana Viñuela
    • Olivier Delaneau
    • Emmanouil T Dermitzakis
  3. Swiss Institute of Bioinformatics, Geneva, Switzerland.

    • Andrew Anand Brown
    • Ana Viñuela
    • Olivier Delaneau
    • Emmanouil T Dermitzakis
  4. NORMENT, KG Jebsen Centre for Psychosis Research, Oslo University Hospital, Oslo, Norway.

    • Andrew Anand Brown
  5. Department of Twin Research and Genetic Epidemiology, King's College London, London, UK.

    • Tim D Spector
    • Kerrin S Small




Contributions

A.A.B. and E.T.D. designed the study. A.A.B. ran the analyses. A.A.B., A.V., and E.T.D. interpreted the results. A.A.B., A.V., and E.T.D. wrote the manuscript. O.D. provided methodological suggestions. K.S.S. and T.D.S. contributed data.

Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to Andrew Anand Brown or Emmanouil T Dermitzakis.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1–6, Supplementary Tables 1 and 2, and Supplementary Note

  2. Life Sciences Reporting Summary

Text files

  1. Supplementary Data Set 1

    A full list of all eQTLs discovered in the five experiments, together with the P value for association and the causal probability score

  2. Supplementary Data Set 2

    A list of high-confidence causal variants that are also significantly associated with a GWAS trait, together with an estimate, produced by coloc, of the probability of a shared genetic signal