Abstract
Structural variants (SVs) are an important source of human genetic diversity, but their contribution to traits, disease and gene regulation remains unclear. We mapped cis expression quantitative trait loci (eQTLs) in 13 tissues via joint analysis of SVs, single-nucleotide variants (SNVs) and short insertion/deletion (indel) variants from deep whole-genome sequencing (WGS). We estimated that SVs are causal at 3.5–6.8% of eQTLs—a substantially higher fraction than prior estimates—and that expression-altering SVs have larger effect sizes than do SNVs and indels. We identified 789 putative causal SVs predicted to directly alter gene expression: most (88.3%) were noncoding variants enriched at enhancers and other regulatory elements, and 52 were linked to genome-wide association study loci. We observed a notable abundance of rare high-impact SVs associated with aberrant expression of nearby genes. These results suggest that comprehensive WGS-based SV analyses will increase the power of common- and rare-variant association studies.
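The abstract's central method is joint mapping: SVs, SNVs and indels are tested side by side, so the lead variant for a gene can belong to any class. The sketch below illustrates that idea. It is a hypothetical simplification, not the authors' pipeline. It assumes expression values are already normalized and covariate-corrected. It regresses expression on each variant's alt-allele dosage with ordinary least squares and ranks variants by variance explained (r²). When every variant is tested on the same individuals with no missing genotypes, ranking by r² is equivalent to ranking by nominal p-value. A real scan would also need covariates, linkage-disequilibrium-aware fine-mapping and permutation-based significance. The names `Variant`, `Association`, `regress` and `lead_variant` are illustrative only.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Variant:
    vid: str
    vclass: str           # hypothetical labels: "SNV", "indel" or "SV"
    dosages: list[float]  # alt-allele count (0/1/2) per individual


@dataclass
class Association:
    vid: str
    vclass: str
    beta: float  # OLS slope: expression change per alt allele
    r2: float    # fraction of expression variance explained


def regress(x, y):
    """Ordinary least squares of y on x with an intercept; returns (beta, r2)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        # Monomorphic variant or constant expression: nothing to test.
        return 0.0, 0.0
    return sxy / sxx, sxy * sxy / (sxx * syy)


def lead_variant(expression, cis_variants):
    """Test every cis variant regardless of class and return the strongest one."""
    best = None
    for v in cis_variants:
        beta, r2 = regress(v.dosages, expression)
        if best is None or r2 > best.r2:
            best = Association(v.vid, v.vclass, beta, r2)
    return best


if __name__ == "__main__":
    # Toy data for six individuals: one SNV and one deletion near the same gene.
    expr = [0.1, 0.2, 1.1, 1.0, 2.1, 1.9]
    variants = [
        Variant("snv1", "SNV", [0, 1, 0, 1, 2, 1]),
        Variant("del1", "SV", [0, 0, 1, 1, 2, 2]),
    ]
    hit = lead_variant(expr, variants)
    print(hit.vid, hit.vclass, f"{hit.beta:.3f}")  # prints "del1 SV 0.925"
```

The scan does not stratify by variant class, so the class of the returned variant (`hit.vclass`) is itself an outcome. Tallying it across genes gives a naive, LD-unaware view of how often each class leads an eQTL. That view is cruder than the causality estimates reported above.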
Acknowledgements
The authors thank R.E. Handsaker for advice on Genome STRiP, H.J. Abel for helpful statistical discussions and R.M. Layer for software contributions. This work was supported by the NIH (MH101810) (D.F.C.), the NIH/NHGRI (1UM1HG008853) (I.M.H.), a Burroughs Wellcome Fund Career Award (I.M.H.), a Mr. and Mrs. Spencer T. Olin Fellowship for Women in Graduate Study (A.J.S.), a Lucille P. Markey Biomedical Research Stanford Graduate Fellowship (J.R.D.), the Stanford Genome Training Program (SGTP; NIH/NHGRI T32HG000044) (J.R.D.), a Hewlett-Packard Stanford Graduate Fellowship (E.K.T.), and a doctoral scholarship from the Natural Sciences and Engineering Research Council of Canada (E.K.T.). The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health. Additional funds were provided by the NCI, NHGRI, NHLBI, NIDA, NIMH and NINDS. Donors were enrolled at Biospecimen Source Sites funded by NCI/SAIC-Frederick, Inc. (SAIC-F) subcontracts to the National Disease Research Interchange (10XS170), Roswell Park Cancer Institute (10XS171) and Science Care, Inc. (X10S172). The Laboratory, Data Analysis, and Coordinating Center (LDACC) was funded through a contract (HHSN268201000029C) to The Broad Institute, Inc. Biorepository operations were funded through an SAIC-F subcontract to the Van Andel Institute (10ST1035). Additional data repository and project management were provided by SAIC-F (HHSN261200800001E). The Brain Bank was supported by supplements to University of Miami grants DA006227 and DA033684 and to contract N01MH000028. Statistical Methods development grants were made to the University of Geneva (MH090941 and MH101814), the University of Chicago (MH090951, MH090937, MH101820 and MH101825), the University of North Carolina—Chapel Hill (MH090936 and MH101819), Harvard University (MH090948), Stanford University (MH101782), Washington University in St. Louis (MH101810) and the University of Pennsylvania (MH101822).
Author information
Contributions
C.C., A.B., S.B.M., D.F.C. and I.M.H. designed the experiments. C.C. and A.J.S. performed SV discovery and genotyping. C.C. performed common eQTL mapping, causality analyses, LD tagging and candidate GWAS analyses. J.R.D., E.K.T., X.L., Y.K. and F.N.D. identified gene expression outliers. C.C. and A.J.S. analyzed rare SVs. L.G. and I.M.H. designed SVScore annotation. D.F.C. and T.H. performed microarray-based CNV detection. C.C., D.F.C. and I.M.H. wrote the manuscript.
Ethics declarations
Competing interests
D.F.C. is a paid consultant of PierianDx. The authors declare no other competing financial interests.
Additional information
A full list of members and affiliations appears in the Supplementary Note.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–27, Supplementary Tables 1, 3, 4 and 6–9, and Supplementary Note. (PDF 7904 kb)
Supplementary Table 2
Excel file of all SV-only and joint eQTLs, along with causality scores. (XLSX 7084 kb)
Supplementary Table 5
Excel file of all SV-eQTL GWAS hits. (XLSX 94 kb)
Cite this article
Chiang, C., Scott, A., Davis, J. et al. The impact of structural variation on human gene expression. Nat Genet 49, 692–699 (2017). https://doi.org/10.1038/ng.3834
DOI: https://doi.org/10.1038/ng.3834