Analysis | Published:

Abundant contribution of short tandem repeats to gene expression variation in humans

Nature Genetics volume 48, pages 22–29 (2016)

Abstract

The contribution of repetitive elements to quantitative human traits is largely unknown. Here we report a genome-wide survey of the contribution of short tandem repeats (STRs), which constitute one of the most polymorphic and abundant repeat classes, to gene expression in humans. Our survey identified 2,060 significant expression STRs (eSTRs). These eSTRs were replicable in orthogonal populations and expression assays. We used variance partitioning to disentangle the contribution of eSTRs from that of linked SNPs and indels and found that eSTRs contribute 10–15% of the cis heritability mediated by all common variants. Further functional genomic analyses showed that eSTRs are enriched in conserved regions, colocalize with regulatory elements and may modulate certain histone modifications. By analyzing known genome-wide association study (GWAS) signals and searching for new associations in 1,685 whole genomes from deeply phenotyped individuals, we found that eSTRs are enriched in various clinically relevant conditions. These results highlight the contribution of STRs to the genetic architecture of quantitative human traits.
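To make the idea of an expression STR concrete, the following is a minimal sketch of how an association between repeat length and expression could be tested for one STR and one gene. It is not the pipeline used in the study. The data are simulated, and the function names (`linear_fit`, `permutation_p_value`), the "dosage" encoding (sum of the two allele repeat counts), the effect size and the permutation scheme are all illustrative assumptions.

```python
import random


def linear_fit(x, y):
    """Ordinary least squares for y = a + b*x; returns (slope, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, r_squared


def permutation_p_value(x, y, n_perm=1000, seed=0):
    """Empirical p-value: how often shuffled expression values give an
    r_squared at least as large as the observed one."""
    rng = random.Random(seed)
    _, observed = linear_fit(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        _, r2 = linear_fit(x, shuffled)
        if r2 >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Simulated toy data (hypothetical, not taken from the study).
rng = random.Random(42)
n_samples = 200
# STR dosage: sum of the repeat counts of an individual's two alleles.
dosage = [rng.randint(10, 20) + rng.randint(10, 20) for _ in range(n_samples)]
true_effect = 0.2  # assumed expression change per repeat unit
expression = [true_effect * d + rng.gauss(0.0, 1.0) for d in dosage]

slope, r2 = linear_fit(dosage, expression)
p_value = permutation_p_value(dosage, expression)
print(f"slope={slope:.3f}  r2={r2:.3f}  permutation p={p_value:.4f}")
```

In this sketch, `r2` is the fraction of expression variance explained by the STR alone. A variance-partitioning analysis of the kind described in the abstract would additionally model linked SNPs and indels, which this toy example does not attempt.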

Accessions

ArrayExpress

Acknowledgements

We thank T. Lappalainen, A. Goren, T. Hashimoto and D. Zielinski for useful comments and discussions. M.G. was supported by the National Defense Science and Engineering Graduate Fellowship. Y.E. holds a Career Award at the Scientific Interface from the Burroughs Wellcome Fund. This study was supported by a gift from Andria and Paul Heafy (Y.E.), National Institute of Justice (NIJ) grant 2014-DN-BX-K089 (Y.E. and T.W.), US National Institutes of Health (NIH) grants 1U01HG007037 (H.Z.), R01MH084703 (J.K.P.), R01HG006399 (A.L.P.), HG006696 (A.J.S.), DA033660 (A.J.S.) and MH097018 (A.J.S.), and research grant 6-FY13-92 from the March of Dimes Foundation (A.J.S.).

Author information

Affiliations

  1. Whitehead Institute for Biomedical Research, Cambridge, Massachusetts, USA.

    • Melissa Gymrek, Thomas Willems, Barak Markus & Yaniv Erlich

  2. Harvard-MIT Division of Health Sciences and Technology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.

    • Melissa Gymrek

  3. Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA.

    • Melissa Gymrek, Mark J Daly & Alkes L Price

  4. New York Genome Center, New York, New York, USA.

    • Melissa Gymrek, Thomas Willems & Yaniv Erlich

  5. Computational and Systems Biology Program, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.

    • Thomas Willems

  6. Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA.

    • Audrey Guilmatre & Andrew J Sharp

  7. Department of Pediatric Hematology, Robert Debré Hospital, Paris, France.

    • Audrey Guilmatre

  8. Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.

    • Haoyang Zeng

  9. Department of Genetics and Biology, Stanford University, Stanford, California, USA.

    • Stoyan Georgiev & Jonathan K Pritchard

  10. Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, USA.

    • Mark J Daly

  11. Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA.

    • Alkes L Price

  12. Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA.

    • Alkes L Price

  13. Howard Hughes Medical Institute, Chevy Chase, Maryland, USA.

    • Jonathan K Pritchard

  14. Department of Computer Science, Fu Foundation School of Engineering, Columbia University, New York, New York, USA.

    • Yaniv Erlich

  15. Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, USA.

    • Yaniv Erlich

Authors

  1. Melissa Gymrek

  2. Thomas Willems

  3. Audrey Guilmatre

  4. Haoyang Zeng

  5. Barak Markus

  6. Stoyan Georgiev

  7. Mark J Daly

  8. Alkes L Price

  9. Jonathan K Pritchard

  10. Andrew J Sharp

  11. Yaniv Erlich

Contributions

M.G. and Y.E. conceived the study. M.G., T.W., H.Z., B.M. and Y.E. performed analyses. A.G. performed experimental work to generate high-coverage sequencing data for promoter STRs. S.G., M.J.D., A.L.P. and J.K.P. provided statistical input. A.J.S. contributed data and analyses. M.G., T.W. and Y.E. wrote the manuscript.

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Yaniv Erlich.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1–12, Supplementary Note and Supplementary Tables 1–9.

CSV files

  1. Supplementary Data Set 1: Significant eSTRs

    A table of all STR × gene associations at a gene-level FDR of 5%.

About this article

Publication history

Received

Accepted

Published

DOI

https://doi.org/10.1038/ng.3461
