Abstract

Structural variants (SVs) are an important source of human genetic diversity, but their contribution to traits, disease and gene regulation remains unclear. We mapped cis expression quantitative trait loci (eQTLs) in 13 tissues via joint analysis of SVs, single-nucleotide variants (SNVs) and short insertion/deletion (indel) variants from deep whole-genome sequencing (WGS). We estimated that SVs are causal at 3.5–6.8% of eQTLs—a substantially higher fraction than prior estimates—and that expression-altering SVs have larger effect sizes than do SNVs and indels. We identified 789 putative causal SVs predicted to directly alter gene expression: most (88.3%) were noncoding variants enriched at enhancers and other regulatory elements, and 52 were linked to genome-wide association study loci. We observed a notable abundance of rare high-impact SVs associated with aberrant expression of nearby genes. These results suggest that comprehensive WGS-based SV analyses will increase the power of common- and rare-variant association studies.
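The abstract describes cis-eQTL mapping in which SVs, SNVs and indels are tested jointly. The sketch below is a rough illustration of that idea, not the authors' pipeline. It regresses one gene's expression on the genotype dosage of each nearby variant, whatever its type, and reports the variant with the largest association statistic. The `Variant` class, the function names and the toy data are hypothetical. A real analysis would also adjust for covariates such as ancestry and hidden expression factors, and would assess significance by permutation; this sketch does neither.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Variant:
    """Hypothetical candidate cis variant: genotype dosages (0, 1 or 2) per sample."""
    name: str
    kind: str  # illustrative label such as "SNV", "indel" or "SV"
    dosages: list[float]


def assoc_t(dosages: list[float], expression: list[float]) -> float:
    """t-statistic of the slope from regressing expression on dosage (no covariates)."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in expression)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic variant or constant expression: nothing to test
    r = sxy / sqrt(sxx * syy)
    if abs(r) >= 1.0:
        return float("inf") if r > 0 else float("-inf")  # perfect linear fit
    return r * sqrt((n - 2) / (1 - r * r))


def lead_variant(variants: list[Variant], expression: list[float]) -> tuple[Variant, float]:
    """Score SVs, SNVs and indels in one pool and return the strongest association."""
    scored = [(v, assoc_t(v.dosages, expression)) for v in variants]
    return max(scored, key=lambda pair: abs(pair[1]))


# Toy data for one gene in six samples (invented for illustration only).
expression = [1.0, 1.2, 2.1, 2.0, 3.1, 2.9]
variants = [
    Variant("snv_1", "SNV", [0, 1, 1, 0, 2, 1]),
    Variant("del_1", "SV", [0, 0, 1, 1, 2, 2]),
    Variant("indel_1", "indel", [1, 0, 1, 0, 1, 0]),
]
lead, t = lead_variant(variants, expression)
print(lead.name, lead.kind, round(t, 2))
```

Pooling all variant types in a single scan lets an SV compete directly with nearby SNVs and indels for the lead position. The top-scoring variant is not necessarily causal, because variants in linkage disequilibrium can score similarly; estimating causality requires further statistical fine-mapping.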

Acknowledgements

The authors thank R.E. Handsaker for advice on Genome STRiP, H.J. Abel for helpful statistical discussions and R.M. Layer for software contributions. This work was supported by the NIH (MH101810) (D.F.C.), the NIH/NHGRI (1UM1HG008853) (I.M.H.), a Burroughs Wellcome Fund Career Award (I.M.H.), a Mr. and Mrs. Spencer T. Olin Fellowship for Women in Graduate Study (A.J.S.), a Lucille P. Markey Biomedical Research Stanford Graduate Fellowship (J.R.D.), the Stanford Genome Training Program (SGTP; NIH/NHGRI T32HG000044) (J.R.D.), a Hewlett-Packard Stanford Graduate Fellowship (E.K.T.), and a doctoral scholarship from the Natural Sciences and Engineering Research Council of Canada (E.K.T.). The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health. Additional funds were provided by the NCI, NHGRI, NHLBI, NIDA, NIMH and NINDS. Donors were enrolled at Biospecimen Source Sites funded by NCI/SAIC-Frederick, Inc. (SAIC-F) subcontracts to the National Disease Research Interchange (10XS170), Roswell Park Cancer Institute (10XS171) and Science Care, Inc. (X10S172). The Laboratory, Data Analysis, and Coordinating Center (LDACC) was funded through a contract (HHSN268201000029C) to The Broad Institute, Inc. Biorepository operations were funded through an SAIC-F subcontract to the Van Andel Institute (10ST1035). Additional data repository and project management were provided by SAIC-F (HHSN261200800001E). The Brain Bank was supported by supplements to University of Miami grants DA006227 and DA033684 and to contract N01MH000028. Statistical Methods development grants were made to the University of Geneva (MH090941 and MH101814), the University of Chicago (MH090951, MH090937, MH101820 and MH101825), the University of North Carolina—Chapel Hill (MH090936 and MH101819), Harvard University (MH090948), Stanford University (MH101782), Washington University in St. Louis (MH101810) and the University of Pennsylvania (MH101822).

Author information

Affiliations

  1. McDonnell Genome Institute, Washington University School of Medicine, St. Louis, Missouri, USA.

    • Colby Chiang
    • Alexandra J Scott
    • Liron Ganel
    • Ira M Hall
  2. Department of Pathology, Stanford University School of Medicine, Stanford, California, USA.

    • Joe R Davis
    • Emily K Tsang
    • Xin Li
    • Stephen B Montgomery
  3. Department of Genetics, Stanford University School of Medicine, Stanford, California, USA.

    • Joe R Davis
    • Stephen B Montgomery
  4. Biomedical Informatics Program, Stanford University School of Medicine, Stanford, California, USA.

    • Emily K Tsang
  5. Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA.

    • Yungil Kim
    • Farhan N Damani
    • Alexis Battle
  6. Department of Psychiatry, Washington University School of Medicine, St. Louis, Missouri, USA.

    • Tarik Hadzic
  7. Department of Computer Science, Stanford University, Stanford, California, USA.

    • Stephen B Montgomery
  8. Department of Pathology & Immunology, Washington University School of Medicine, St. Louis, Missouri, USA.

    • Donald F Conrad
    • Ira M Hall
  9. Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, USA.

    • Donald F Conrad
  10. Department of Medicine, Washington University School of Medicine, St. Louis, Missouri, USA.

    • Ira M Hall

Consortia

  1. GTEx Consortium

    A full list of members and affiliations appears in the Supplementary Note.

Authors

Colby Chiang, Alexandra J Scott, Joe R Davis, Emily K Tsang, Xin Li, Yungil Kim, Tarik Hadzic, Farhan N Damani, Liron Ganel, Stephen B Montgomery, Alexis Battle, Donald F Conrad & Ira M Hall

Contributions

C.C., A.B., S.B.M., D.F.C. and I.M.H. designed the experiments. C.C. and A.J.S. performed SV discovery and genotyping. C.C. performed common eQTL mapping, causality analyses, LD tagging and candidate GWAS analyses. J.R.D., E.K.T., X.L., Y.K. and F.N.D. identified gene expression outliers. C.C. and A.J.S. analyzed rare SVs. L.G. and I.M.H. designed SVScore annotation. D.F.C. and T.H. performed microarray-based CNV detection. C.C., D.F.C. and I.M.H. wrote the manuscript.

Competing interests

D.F.C. is a paid consultant of PierianDx. The authors declare no other competing financial interests.

Corresponding authors

Correspondence to Donald F Conrad or Ira M Hall.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1–27, Supplementary Tables 1, 3, 4 and 6–9, and Supplementary Note.

Excel files

  1. Supplementary Table 2

    Excel file of all SV-only and joint eQTLs, along with causality scores.

  2. Supplementary Table 5

    Excel file of all SV-eQTL GWAS hits.

About this article

DOI

https://doi.org/10.1038/ng.3834
