The impact of short tandem repeat variation on gene expression

Abstract

Short tandem repeats (STRs) have been implicated in a variety of complex traits in humans. However, genome-wide studies of the effects of STRs on gene expression thus far have had limited power to detect associations and provide insights into putative mechanisms. Here, we leverage whole-genome sequencing and expression data for 17 tissues from the Genotype–Tissue Expression Project to identify more than 28,000 STRs for which repeat number is associated with expression of nearby genes (eSTRs). We use fine-mapping to quantify the probability that each eSTR is causal and characterize the top 1,400 fine-mapped eSTRs. We identify hundreds of eSTRs linked with published genome-wide association study signals and implicate specific eSTRs in complex traits, including height, schizophrenia, inflammatory bowel disease and intelligence. Overall, our results support the hypothesis that eSTRs contribute to a range of human phenotypes, and our data should serve as a valuable resource for future studies of complex traits.

Fig. 1: Multitissue identification of eSTRs.
Fig. 2: Characterization of FM-eSTRs.
Fig. 3: FM-eSTRs colocalize with GWAS signals.
Fig. 4: Summary of FM-eSTRs classes and potential regulatory mechanisms.

Data availability

All eSTR summary statistics are available for download from WebSTR (http://webstr.ucsd.edu/downloads).

Code availability

Code for performing analyses and generating figures is available at http://github.com/gymreklab/gtex-estrs-paper.

References

  1. GTEx Consortium. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).

  2. Lappalainen, T. et al. Transcriptome and genome sequencing uncovers functional variation in humans. Nature 501, 506–511 (2013).

  3. Grünewald, T. G. P. et al. Chimeric EWSR1-FLI1 regulates the Ewing sarcoma susceptibility gene EGR2 via a GGAA microsatellite. Nat. Genet. 47, 1073–1078 (2015).

  4. Song, J. H. T., Lowe, C. B. & Kingsley, D. M. Characterization of a human-specific tandem repeat associated with bipolar disorder and schizophrenia. Am. J. Hum. Genet. 103, 421–430 (2018).

  5. Boettger, L. M. et al. Recurring exon deletions in the HP (haptoglobin) gene contribute to lower blood cholesterol levels. Nat. Genet. 48, 359–366 (2016).

  6. Leffler, E. M. et al. Resistance to malaria through structural variation of red blood cell invasion receptors. Science 356, eaam6393 (2017).

  7. Sekar, A. et al. Schizophrenia risk from complex variation of complement component 4. Nature 530, 177–183 (2016).

  8. Sun, J. X. et al. A direct characterization of human mutation based on microsatellites. Nat. Genet. 44, 1161–1165 (2012).

  9. Lynch, M. Rate, molecular spectrum, and consequences of human mutation. Proc. Natl Acad. Sci. USA 107, 961–968 (2010).

  10. Willems, T. et al. Population-scale sequencing data enable precise estimates of Y-STR mutation rates. Am. J. Hum. Genet. 98, 919–933 (2016).

  11. Mirkin, S. M. Expandable DNA repeats and human disease. Nature 447, 932–940 (2007).

  12. Willems, T. et al. The landscape of human STR variation. Genome Res. 24, 1894–1904 (2014).

  13. Li, H. Towards better understanding of artifacts in variant calling from high-coverage samples. Bioinformatics 30, 2843–2851 (2014).

  14. Gymrek, M. et al. Abundant contribution of short tandem repeats to gene expression variation in humans. Nat. Genet. 48, 22–29 (2016).

  15. Nasrallah, M. P. et al. Differential effects of a polyalanine tract expansion in Arx on neural development and gene expression. Hum. Mol. Genet. 21, 1090–1098 (2012).

  16. Quilez, J. et al. Polymorphic tandem repeats within gene promoters act as modifiers of gene expression and DNA methylation in humans. Nucleic Acids Res. 44, 3750–3762 (2016).

  17. Vinces, M. D., Legendre, M., Caldara, M., Hagihara, M. & Verstrepen, K. J. Unstable tandem repeats in promoters confer transcriptional evolvability. Science 324, 1213–1216 (2009).

  18. Gemayel, R., Vinces, M. D., Legendre, M. & Verstrepen, K. J. Variable tandem repeats accelerate evolution of coding and regulatory sequences. Annu. Rev. Genet. 44, 445–477 (2010).

  19. Liu, X. S. et al. Rescue of fragile X syndrome neurons by DNA methylation editing of the FMR1 gene. Cell 172, 979–992.e6 (2018).

  20. Raveh-Sadka, T. et al. Manipulating nucleosome disfavoring sequences allows fine-tune regulation of gene expression in yeast. Nat. Genet. 44, 743–750 (2012).

  21. Suter, B., Schnappauf, G. & Thoma, F. Poly(dA.dT) sequences exist as rigid DNA structures in nucleosome-free yeast promoters in vivo. Nucleic Acids Res. 28, 4083–4089 (2000).

  22. Afek, A., Schipper, J. L., Horton, J., Gordan, R. & Lukatsky, D. B. Protein-DNA binding in the absence of specific base-pair recognition. Proc. Natl Acad. Sci. USA 111, 17140–17145 (2014).

  23. Conlon, E. G. et al. The C9ORF72 GGGGCC expansion forms RNA G-quadruplex inclusions and sequesters hnRNP H to disrupt splicing in ALS brains. eLife 5, e17820 (2016).

  24. Lin, Y., Dent, S. Y., Wilson, J. H., Wells, R. D. & Napierala, M. R loops stimulate genetic instability of CTG.CAG repeats. Proc. Natl Acad. Sci. USA 107, 692–697 (2010).

  25. Rothenburg, S., Koch-Nolte, F., Rich, A. & Haag, F. A polymorphic dinucleotide repeat in the rat nucleolin gene forms Z-DNA and inhibits promoter activity. Proc. Natl Acad. Sci. USA 98, 8985–8990 (2001).

  26. Min, J. L. et al. The use of genome-wide eQTL associations in lymphoblastoid cell lines to identify novel genetic pathways involved in complex traits. PLoS ONE 6, e22070 (2011).

  27. Willems, T. et al. Genome-wide profiling of heritable and de novo STR variations. Nat. Methods 14, 590–59 (2017).

  28. Borel, C. et al. Tandem repeat sequence variation as causative cis-eQTLs for protein-coding gene expression variation: the case of CSTB. Hum. Mutat. 33, 1302–1309 (2012).

  29. Contente, A., Dittmer, A., Koch, M. C., Roth, J. & Dobbelstein, M. A polymorphic microsatellite that mediates induction of PIG3 by p53. Nat. Genet. 30, 315–320 (2002).

  30. Gebhardt, F., Zänker, K. S. & Brandt, B. Modulation of epidermal growth factor receptor gene transcription by a polymorphic dinucleotide repeat in intron 1. J. Biol. Chem. 274, 13176–13180 (1999).

  31. Johnson, A. D. et al. Genome-wide association meta-analysis for total serum bilirubin levels. Hum. Mol. Genet. 18, 2700–2710 (2009).

  32. Matsuzono, K. et al. Antisense oligonucleotides reduce RNA foci in spinocerebellar ataxia 36 patient iPSCs. Mol. Ther. Nucleic Acids 8, 211–219 (2017).

  33. Saha, A. et al. Functional IFNG polymorphism in intron 1 in association with an increased risk to promote sporadic breast cancer. Immunogenetics 57, 165–171 (2005).

  34. Shimajiri, S. et al. Shortened microsatellite d(CA)21 sequence down-regulates promoter activity of matrix metalloproteinase 9 gene. FEBS Lett. 455, 70–74 (1999).

  35. Vikman, S. et al. Functional analysis of 5-lipoxygenase promoter repeat variants. Hum. Mol. Genet. 18, 4521–4529 (2009).

  36. Hormozdiari, F., Kostem, E., Kang, E. Y., Pasaniuc, B. & Eskin, E. Identifying causal variants at loci with multiple signals of association. Genetics 198, 497–508 (2014).

  37. Kobayashi, H. et al. Expansion of intronic GGCCTG hexanucleotide repeat in NOP56 causes SCA36, a type of spinocerebellar ataxia accompanied by motor neuron involvement. Am. J. Hum. Genet. 89, 121–130 (2011).

  38. Lalioti, M. D. et al. Dodecamer repeat expansion in cystatin B gene in progressive myoclonus epilepsy. Nature 386, 847–851 (1997).

  39. Mougey, E. et al. ALOX5 polymorphism associates with increased leukotriene production and reduced lung function and asthma control in children with poorly controlled asthma. Clin. Exp. Allergy 43, 512–520 (2013).

  40. Stephensen, C. B. et al. ALOX5 gene variants affect eicosanoid production and response to fish oil supplementation. J. Lipid Res. 52, 991–1003 (2011).

  41. Urbut, S. M., Wang, G., Carbonetto, P. & Stephens, M. Flexible statistical methods for estimating and testing effects in genomic studies with multiple conditions. Nat. Genet. 51, 187–195 (2019).

  42. Jiang, C. & Pugh, B. F. Nucleosome positioning and gene regulation: advances through genomics. Nat. Rev. Genet. 10, 161–172 (2009).

  43. Bochman, M. L., Paeschke, K. & Zakian, V. A. DNA secondary structures: stability and function of G-quadruplex structures. Nat. Rev. Genet. 13, 770–780 (2012).

  44. Ciesiolka, A., Jazurek, M., Drazkowska, K. & Krzyzosiak, W. J. Structural characteristics of simple RNA repeats associated with disease and their deleterious protein interactions. Front. Cell. Neurosci. 11, 97 (2017).

  45. MacArthur, J. et al. The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog). Nucleic Acids Res. 45, D896–D901 (2017).

  46. Yengo, L. et al. Meta-analysis of genome-wide association studies for height and body mass index in ~700,000 individuals of European ancestry. Hum. Mol. Genet. 27, 3641–3649 (2018).

  47. Schizophrenia Working Group of the Psychiatric Genomics Consortium. Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–427 (2014).

  48. Liu, J. Z. et al. Association analyses identify 38 susceptibility loci for inflammatory bowel disease and highlight shared genetic risk across populations. Nat. Genet. 47, 979–986 (2015).

  49. Savage, J. E. et al. Genome-wide association meta-analysis in 269,867 individuals identifies new genetic and functional links to intelligence. Nat. Genet. 50, 912–919 (2018).

  50. Guo, H. et al. Integration of disease association and eQTL data using a Bayesian colocalisation approach highlights six candidate causal genes in immune-mediated diseases. Hum. Mol. Genet. 24, 3305–3313 (2015).

  51. Haeuptle, M. A. et al. Human RFT1 deficiency leads to a disorder of N-linked glycosylation. Am. J. Hum. Genet. 82, 600–606 (2008).

  52. Saini, S., Mitra, I., Mousavi, N., Fotsing, S. F. & Gymrek, M. A reference haplotype panel for genome-wide imputation of short tandem repeats. Nat. Commun. 9, 4397 (2018).

  53. Chiang, C. et al. The impact of structural variation on human gene expression. Nat. Genet. 49, 692–699 (2017).

  54. Hasler, J. & Strub, K. Alu elements as regulators of gene expression. Nucleic Acids Res. 34, 5491–5497 (2006).

  55. Kent, W. J. et al. The human genome browser at UCSC. Genome Res. 12, 996–1006 (2002).

  56. The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).

  57. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).

  58. Patterson, N., Price, A. L. & Reich, D. Population structure and eigenanalysis. PLoS Genet. 2, e190 (2006).

  59. Price, A. L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904–909 (2006).

  60. Stegle, O., Parts, L., Piipari, M., Winn, J. & Durbin, R. Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses. Nat. Protoc. 7, 500–507 (2012).

  61. Seabold, S. P. & Perktold, J. Statsmodels: econometric and statistical modeling with Python. In Proc. 9th Python in Science Conference (eds van der Walt, S. & Millman, J.) 57–61 (SCIPY, 2010).

  62. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).

  63. Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).

  64. Zuker, M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 31, 3406–3415 (2003).

  65. Wang, Y. et al. The 3D Genome Browser: a web-based browser for visualizing 3D genome organization and long-range chromatin interactions. Genome Biol. 19, 151 (2018).

  66. Mifsud, B. et al. Mapping long-range promoter contacts in human cells with high-resolution capture Hi-C. Nat. Genet. 47, 598–606 (2015).

  67. Howie, B. N., Donnelly, P. & Marchini, J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet. 5, e1000529 (2009).

  68. Browning, B. L., Zhou, Y. & Browning, S. R. A one-penny imputed genome from next-generation reference panels. Am. J. Hum. Genet. 103, 338–348 (2018).

Acknowledgements

Research reported in this publication was supported in part by the Office of The Director, National Institutes of Health under Award Number DP5OD024577 (M.G.). We thank V. Bafna, E. Mendenhall, J. Gleeson and Y. Liu for helpful comments. See the Supplementary Note for additional acknowledgements.

Author information

Contributions

S.F.F. performed all eSTR and SNP mapping, helped to perform downstream analyses and helped to draft the manuscript. J.M. performed multitissue analysis using mashR and helped to revise the manuscript. C.W. optimized and performed the reporter assay. S.S. participated in the design of the STR imputation analysis. S.S.-B. led, designed and analyzed data from the reporter assay. R.Y. implemented the WebSTR web application. A.G. conceived and planned analyses and validation experiments of regulatory effects of eSTRs and wrote the manuscript. M.G. conceived the study, designed and performed analyses and wrote the manuscript. All authors have read and approved the final manuscript.

Corresponding author

Correspondence to Melissa Gymrek.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Relationship between sample size and number of eSTRs detected.

The x-axis shows the number of samples per tissue. The y-axis shows the number of eSTRs (gene-level FDR<10%) detected in each tissue. Each dot represents a single tissue, using the same colors as shown in Fig. 1 in the main text (see box on the right). Notably, although whole blood and skeletal muscle had the highest number of samples, we identified fewer eSTRs in those tissues than in others with lower sample sizes. This is concordant with previous results for SNPs in the GTEx cohort and may reflect higher cell-type heterogeneity in these tissue samples.

Extended Data Fig. 2 Enrichment of genomic annotations as a function of CAVIAR threshold.

The x-axis represents CAVIAR thresholds in terms of the percentile (percentage of all 28,375 eSTRs excluded by those thresholds). The y-axis represents the enrichment of eSTRs above each percentile threshold in each of these categories: a. 5′ UTRs (purple); b. 3′ UTRs (blue); c. promoters (orange; TSS ±3 kb); d. coding regions (red); and e. introns (green). Center values denote the log2 odds ratios comparing eSTRs passing each threshold to all STRs. Error bars represent ±1 s.e.
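As a rough illustration of the quantity plotted here, the sketch below computes a log2 odds ratio and its standard error from a 2x2 table of counts. The layout of the table and the use of Woolf's approximation for the standard error are assumptions made for illustration; the caption does not state how the s.e. was obtained, and the function name `log2_odds_ratio` is hypothetical.

```python
import math


def log2_odds_ratio(a, b, c, d):
    """Return (log2 odds ratio, s.e. on the log2 scale) for a 2x2 table.

    Assumed layout (hypothetical, for illustration):
      a: eSTRs passing the threshold, inside the annotation
      b: eSTRs passing the threshold, outside the annotation
      c: comparison STRs inside the annotation
      d: comparison STRs outside the annotation
    All four counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    # Woolf's approximation gives the s.e. of the natural-log odds ratio;
    # dividing by ln(2) rescales it to log2 units.
    se_ln = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.log2(odds_ratio), se_ln / math.log(2)


# Made-up counts, not taken from the paper.
log2_or, se = log2_odds_ratio(40, 960, 1000, 98000)
print(f"log2 OR = {log2_or:.2f} +/- {se:.2f}")
```

A positive value indicates that the annotation is over-represented among eSTRs passing the threshold relative to the comparison set.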

Extended Data Fig. 3 Example multi-allelic FM-eSTRs.

For each plot, the x-axis represents the mean number of repeats in each individual and the y-axis represents normalized expression in the tissue for which the eSTR was most significant. Boxplots summarize the distribution of expression values for each genotype. Horizontal lines show median values, boxes span from the 25th percentile (Q1) to the 75th percentile (Q3). Whiskers extend to Q1-1.5*IQR (bottom) and Q3+1.5*IQR (top), where IQR gives the interquartile range (Q3-Q1). The red line shows the mean expression for each x-axis value.
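The boxplot conventions described above can be sketched as follows. This is a minimal example using only the Python standard library; the quartile interpolation method (`"inclusive"`) is an assumption, since the caption does not say which quantile definition was used, and `boxplot_summary` is a hypothetical helper name.

```python
import statistics


def boxplot_summary(values):
    """Return the quartiles and the Q1-1.5*IQR / Q3+1.5*IQR whisker limits.

    Requires at least two values.
    """
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_whisker": q1 - 1.5 * iqr,
        "upper_whisker": q3 + 1.5 * iqr,
    }


# Made-up normalized expression values for one genotype.
print(boxplot_summary([1, 2, 3, 4, 5]))
```

Note that many plotting libraries draw each whisker at the most extreme data point lying within these limits rather than at the limits themselves.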

Extended Data Fig. 4 Sharing of eSTRs across tissues.

The x-axis represents the number of tissues that share a given eSTR (absolute value of mashR Z-score >4). The y-axis represents the number of eSTRs shared across a given number of tissues.
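The counting behind this histogram can be sketched as below, assuming the per-tissue Z-scores are already available as a mapping from eSTR identifier to a list of values. The input format and the function name `sharing_histogram` are hypothetical; only the |Z| > 4 criterion comes from the caption.

```python
from collections import Counter


def sharing_histogram(z_scores, threshold=4.0):
    """Map 'number of tissues sharing an eSTR' -> 'number of such eSTRs'.

    z_scores: dict of eSTR id -> list of per-tissue Z-scores (assumed format).
    A tissue counts as sharing the eSTR when |Z| exceeds the threshold.
    """
    counts = Counter()
    for zs in z_scores.values():
        n_tissues = sum(1 for z in zs if abs(z) > threshold)
        counts[n_tissues] += 1
    return dict(counts)


# Made-up Z-scores for three eSTRs across three tissues.
example = {
    "estr_a": [5.0, -6.0, 1.0],
    "estr_b": [0.5, 4.0, -4.1],
    "estr_c": [1.0, 2.0, 3.0],
}
print(sharing_histogram(example))
```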

Extended Data Fig. 5 Localization of all STRs around putative regulatory regions.

Left and right plots show localization around transcription start sites and DNaseI HS clusters, respectively. The y-axis denotes the fraction of STRs of each type in each bin. For promoters, the x-axis is divided into 100bp bins. For DNaseI HS sites, the x-axis is divided into 50bp bins. In each plot, values were smoothed by taking a sliding average of each four consecutive bins. Only STR-gene pairs included in our analysis are considered. Each plot compares localization of the two possible sequences of a given repeat unit on the coding strand. Top plots compare repeat units of the form CnG vs. their reverse complement on the opposite strand, middle plots compare AC vs. GT repeats, and bottom plots compare A vs. T repeats. The strand of each STR was determined based on the coding strand of each target gene.
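The smoothing step mentioned above (a sliding average over four consecutive bins) might look like the following sketch. How the ends of the profile were handled is not stated, so this version simply returns one value per complete window; `sliding_average` is a hypothetical name.

```python
def sliding_average(values, window=4):
    """Average each run of `window` consecutive bins.

    Returns len(values) - window + 1 values; incomplete windows at the
    edges are dropped (an assumption, not specified in the caption).
    """
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Made-up per-bin fractions.
print(sliding_average([1, 2, 3, 4, 5, 6]))
```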

Extended Data Fig. 6 Relative probability of eSTRs around TSSs and DNaseI HS sites for a range of CAVIAR scores.

Plots are shown for FM-eSTRs defined using multiple CAVIAR thresholds (0, corresponding to all eSTRs, 0.3, as used in the main text, or 0.5). a., c., and e. show the relative probability of an STR to be an FM-eSTR around TSSs. The black lines represent the probability of an STR in each bin to be an FM-eSTR. Values were scaled relative to the genome-wide average. b., d., and f. show the relative probability of an STR to be an FM-eSTR around DNaseI HS clusters. Values were smoothed by taking a sliding average of each four consecutive bins.

Extended Data Fig. 7 Nucleosome occupancy and DNaseI hypersensitivity show distinct patterns around eSTRs.

a-c. Nucleosome density around STRs with different repeat unit lengths. Nucleosome density in GM12878 in 5bp windows is averaged across all STRs analyzed (dashed) and FM-eSTRs (solid) relative to the center of the STR. b. DNaseI HS density around STRs with different repeat unit lengths. The number of DNaseI HS reads in GM12878 (gray), fat (red), tibial nerve (yellow), and skin (cyan) is averaged across all STRs in each category. Solid lines show FM-eSTRs. Dashed lines show all STRs. Left=homopolymers, middle=dinucleotides, right=tetranucleotides. Other repeat unit lengths were excluded since they have low numbers of FM-eSTRs (see Fig. 4a). Dashed vertical lines in (d) show the STR position +/- 147bp.

Extended Data Fig. 8 Strand-biased characteristics of FM-eSTRs.

Top panel: the y-axis shows the number of FM-eSTRs with each repeat unit on the template strand. Bottom panel: the y-axis shows the percentage of FM-eSTRs with each repeat unit on the template strand that have positive effect sizes. Gray bars denote A-rich repeat units (A/AC/AAC/AAAC) and red bars denote T-rich repeat units (T/GT/GTT/GTTT). Single asterisks denote repeat units nominally enriched or depleted (two-sided binomial p<0.05). Double asterisks denote repeat units significantly enriched after controlling for multiple hypothesis testing (Bonferroni adjusted p<0.05). Asterisks above brackets show significant differences between repeat unit pairs. Asterisks on x-axis labels denote departure from the 50% positive effect sizes expected by chance. Error bars give 95% confidence intervals.
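The tests named in this caption can be illustrated with a small standard-library sketch: an exact two-sided binomial test against the 50% expected by chance, followed by a Bonferroni adjustment. The two-sided definition used here (summing the probabilities of all outcomes no more likely than the observed one) is one common convention and is an assumption; the function names are hypothetical.

```python
from math import comb


def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value for k successes in n trials."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Sum outcomes at most as likely as the observed one; the small
    # relative tolerance guards against floating-point ties.
    total = sum(x for x in pmf if x <= observed * (1 + 1e-7))
    return min(1.0, total)


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Made-up counts: repeat units with (positive effects, total FM-eSTRs).
counts = [(70, 100), (48, 100), (9, 10)]
raw = [binom_two_sided_p(k, n) for k, n in counts]
print(bonferroni(raw))
```

For very large counts the direct evaluation of the probability mass function can underflow, in which case a statistics library would be the safer choice.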

Extended Data Fig. 9 Example GWAS signals co-localized with FM-eSTRs.

Left: For each plot, the x-axis represents the mean number of repeats in each individual and the y-axis represents normalized expression in the tissue with the most significant eSTR signal at each locus. Boxplots summarize the distribution of expression values for each genotype. Boxplots are as defined in Fig. 1c. The red line shows the mean expression for each x-axis value. Right: Top panels give genes in each region. The target gene for the eQTL associations is shown in black. Middle panels give the -log10 p-values of association between each SNP (black points) and the expression of the target gene. The FM-eSTR is denoted by a red star. Bottom panels give the -log10 p-values of association between each SNP and the trait based on published GWAS summary statistics. P-values are two-sided and are based on t-statistics computed for effect sizes (β) (see Methods). Dashed gray horizontal lines give the genome-wide significance threshold of 5E-8.

Extended Data Fig. 10 Example GWAS signal for schizophrenia potentially driven by an eSTR for MED19.

a. eSTR association for MED19. The x-axis shows STR genotypes at an AC repeat (chr11:57523883) as the mean number of repeats in each individual and the y-axis shows normalized MED19 expression in subcutaneous adipose. Each point represents a single individual. Red lines show the mean expression for each x-axis value. Boxplots are as defined in Fig. 1c. b. Summary statistics for MED19 expression and schizophrenia. The top panel shows genes in the region around MED19. The middle panel shows the -log10 p-values of association between each variant and MED19 expression in subcutaneous adipose tissue in the GTEx cohort. The FM-eSTR is denoted by a red star. The bottom panel shows the -log10 p-values of association for each variant with schizophrenia reported by the Psychiatric Genomics Consortium. The dashed gray horizontal line shows the genome-wide significance threshold of 5E-8. c. Detailed view of the MED19 locus. A UCSC genome browser screenshot is shown for the region in the gray box in (b). The FM-eSTR is shown in red. The bottom track shows transcription factor (TF) and chromatin regulator binding sites profiled by ENCODE. The bottom panel shows long-range interactions reported by Mifsud et al. using Capture Hi-C on GM12878. Interactions shown in black include MED19. Interactions to loci outside of the window depicted are not shown.

Supplementary information

Supplementary Information

Supplementary Note and Figs. 1–14

Reporting Summary

Supplementary Table 1

Supplementary Tables 1–7

Supplementary Data 1

All unique eSTRs identified across 17 tissues

Supplementary Data 2

Complete eSTR summary statistics

Supplementary Data 3

FM-eSTRs within 1 Mb of published hits from the NHGRI GWAS Catalog

Cite this article

Fotsing, S.F., Margoliash, J., Wang, C. et al. The impact of short tandem repeat variation on gene expression. Nat Genet 51, 1652–1659 (2019). https://doi.org/10.1038/s41588-019-0521-9
