The impact of short tandem repeat variation on gene expression

Fotsing, Stephanie Feupe; Margoliash, Jonathan; Wang, Catherine; Saini, Shubham; Yanicky, Richard; Shleizer-Burko, Sharona; Goren, Alon; Gymrek, Melissa

doi:10.1038/s41588-019-0521-9

Analysis
Published: 01 November 2019


Nature Genetics volume 51, pages 1652–1659 (2019)


Abstract

Short tandem repeats (STRs) have been implicated in a variety of complex traits in humans. However, genome-wide studies of the effects of STRs on gene expression thus far have had limited power to detect associations and provide insights into putative mechanisms. Here, we leverage whole-genome sequencing and expression data for 17 tissues from the Genotype–Tissue Expression Project to identify more than 28,000 STRs for which repeat number is associated with expression of nearby genes (eSTRs). We use fine-mapping to quantify the probability that each eSTR is causal and characterize the top 1,400 fine-mapped eSTRs. We identify hundreds of eSTRs linked with published genome-wide association study signals and implicate specific eSTRs in complex traits, including height, schizophrenia, inflammatory bowel disease and intelligence. Overall, our results support the hypothesis that eSTRs contribute to a range of human phenotypes, and our data should serve as a valuable resource for future studies of complex traits.


**Fig. 1: Multitissue identification of eSTRs.**

**Fig. 2: Characterization of FM-eSTRs.**

**Fig. 3: FM-eSTRs colocalize with GWAS signals.**

**Fig. 4: Summary of FM-eSTRs classes and potential regulatory mechanisms.**


Data availability

All eSTR summary statistics are available for download on WebSTR http://webstr.ucsd.edu/downloads.

Code availability

Code for performing analyses and generating figures is available at http://github.com/gymreklab/gtex-estrs-paper.

References

GTEx Consortium Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).
Lappalainen, T. et al. Transcriptome and genome sequencing uncovers functional variation in humans. Nature 501, 506–511 (2013).
Grünewald, T. G. P. et al. Chimeric EWSR1-FLI1 regulates the Ewing sarcoma susceptibility gene EGR2 via a GGAA microsatellite. Nat. Genet. 47, 1073–1078 (2015).
Song, J. H. T., Lowe, C. B. & Kingsley, D. M. Characterization of a human-specific tandem repeat associated with bipolar disorder and schizophrenia. Am. J. Hum. Genet. 103, 421–430 (2018).
Boettger, L. M. et al. Recurring exon deletions in the HP (haptoglobin) gene contribute to lower blood cholesterol levels. Nat. Genet. 48, 359–366 (2016).
Leffler, E. M. et al. Resistance to malaria through structural variation of red blood cell invasion receptors. Science 356, eaam6393 (2017).
Sekar, A. et al. Schizophrenia risk from complex variation of complement component 4. Nature 530, 177–183 (2016).
Sun, J. X. et al. A direct characterization of human mutation based on microsatellites. Nat. Genet. 44, 1161–1165 (2012).
Lynch, M. Rate, molecular spectrum, and consequences of human mutation. Proc. Natl Acad. Sci. USA 107, 961–968 (2010).
Willems, T. et al. Population-scale sequencing data enable precise estimates of Y-STR mutation rates. Am. J. Hum. Genet. 98, 919–933 (2016).
Mirkin, S. M. Expandable DNA repeats and human disease. Nature 447, 932–940 (2007).
Willems, T. et al. The landscape of human STR variation. Genome Res. 24, 1894–1904 (2014).
Li, H. Towards better understanding of artifacts in variant calling from high-coverage samples. Bioinformatics 30, 2843–2851 (2014).
Gymrek, M. et al. Abundant contribution of short tandem repeats to gene expression variation in humans. Nat. Genet. 48, 22–29 (2016).
Nasrallah, M. P. et al. Differential effects of a polyalanine tract expansion in Arx on neural development and gene expression. Hum. Mol. Genet. 21, 1090–1098 (2012).
Quilez, J. et al. Polymorphic tandem repeats within gene promoters act as modifiers of gene expression and DNA methylation in humans. Nucleic Acids Res. 44, 3750–3762 (2016).
Vinces, M. D., Legendre, M., Caldara, M., Hagihara, M. & Verstrepen, K. J. Unstable tandem repeats in promoters confer transcriptional evolvability. Science 324, 1213–1216 (2009).
Gemayel, R., Vinces, M. D., Legendre, M. & Verstrepen, K. J. Variable tandem repeats accelerate evolution of coding and regulatory sequences. Annu. Rev. Genet. 44, 445–477 (2010).
Liu, X. S. et al. Rescue of fragile X syndrome neurons by DNA methylation editing of the FMR1 gene. Cell 172, 979–992.e6 (2018).
Raveh-Sadka, T. et al. Manipulating nucleosome disfavoring sequences allows fine-tune regulation of gene expression in yeast. Nat. Genet. 44, 743–750 (2012).
Suter, B., Schnappauf, G. & Thoma, F. Poly(dA.dT) sequences exist as rigid DNA structures in nucleosome-free yeast promoters in vivo. Nucleic Acids Res. 28, 4083–4089 (2000).
Afek, A., Schipper, J. L., Horton, J., Gordan, R. & Lukatsky, D. B. Protein-DNA binding in the absence of specific base-pair recognition. Proc. Natl Acad. Sci. USA 111, 17140–17145 (2014).
Conlon, E. G. et al. The C9ORF72 GGGGCC expansion forms RNA G-quadruplex inclusions and sequesters hnRNP H to disrupt splicing in ALS brains. eLife 5, e17820 (2016).
Lin, Y., Dent, S. Y., Wilson, J. H., Wells, R. D. & Napierala, M. R loops stimulate genetic instability of CTG.CAG repeats. Proc. Natl Acad. Sci. USA 107, 692–697 (2010).
Rothenburg, S., Koch-Nolte, F., Rich, A. & Haag, F. A polymorphic dinucleotide repeat in the rat nucleolin gene forms Z-DNA and inhibits promoter activity. Proc. Natl Acad. Sci. USA 98, 8985–8990 (2001).
Min, J. L. et al. The use of genome-wide eQTL associations in lymphoblastoid cell lines to identify novel genetic pathways involved in complex traits. PLoS ONE 6, e22070 (2011).
Willems, T. et al. Genome-wide profiling of heritable and de novo STR variations. Nat. Methods 14, 590–592 (2017).
Borel, C. et al. Tandem repeat sequence variation as causative cis-eQTLs for protein-coding gene expression variation: the case of CSTB. Hum. Mutat. 33, 1302–1309 (2012).
Contente, A., Dittmer, A., Koch, M. C., Roth, J. & Dobbelstein, M. A polymorphic microsatellite that mediates induction of PIG3 by p53. Nat. Genet. 30, 315–320 (2002).
Gebhardt, F., Zänker, K. S. & Brandt, B. Modulation of epidermal growth factor receptor gene transcription by a polymorphic dinucleotide repeat in intron 1. J. Biol. Chem. 274, 13176–13180 (1999).
Johnson, A. D. et al. Genome-wide association meta-analysis for total serum bilirubin levels. Hum. Mol. Genet. 18, 2700–2710 (2009).
Matsuzono, K. et al. Antisense oligonucleotides reduce RNA foci in spinocerebellar ataxia 36 patient iPSCs. Mol. Ther. Nucleic Acids 8, 211–219 (2017).
Saha, A. et al. Functional IFNG polymorphism in intron 1 in association with an increased risk to promote sporadic breast cancer. Immunogenetics 57, 165–171 (2005).
Shimajiri, S. et al. Shortened microsatellite d(CA)21 sequence down-regulates promoter activity of matrix metalloproteinase 9 gene. FEBS Lett. 455, 70–74 (1999).
Vikman, S. et al. Functional analysis of 5-lipoxygenase promoter repeat variants. Hum. Mol. Genet. 18, 4521–4529 (2009).
Hormozdiari, F., Kostem, E., Kang, E. Y., Pasaniuc, B. & Eskin, E. Identifying causal variants at loci with multiple signals of association. Genetics 198, 497–508 (2014).
Kobayashi, H. et al. Expansion of intronic GGCCTG hexanucleotide repeat in NOP56 causes SCA36, a type of spinocerebellar ataxia accompanied by motor neuron involvement. Am. J. Hum. Genet. 89, 121–130 (2011).
Lalioti, M. D. et al. Dodecamer repeat expansion in cystatin B gene in progressive myoclonus epilepsy. Nature 386, 847–851 (1997).
Mougey, E. et al. ALOX5 polymorphism associates with increased leukotriene production and reduced lung function and asthma control in children with poorly controlled asthma. Clin. Exp. Allergy 43, 512–520 (2013).
Stephensen, C. B. et al. ALOX5 gene variants affect eicosanoid production and response to fish oil supplementation. J. Lipid Res. 52, 991–1003 (2011).
Urbut, S. M., Wang, G., Carbonetto, P. & Stephens, M. Flexible statistical methods for estimating and testing effects in genomic studies with multiple conditions. Nat. Genet. 51, 187–195 (2019).
Jiang, C. & Pugh, B. F. Nucleosome positioning and gene regulation: advances through genomics. Nat. Rev. Genet. 10, 161–172 (2009).
Bochman, M. L., Paeschke, K. & Zakian, V. A. DNA secondary structures: stability and function of G-quadruplex structures. Nat. Rev. Genet. 13, 770–780 (2012).
Ciesiolka, A., Jazurek, M., Drazkowska, K. & Krzyzosiak, W. J. Structural characteristics of simple RNA repeats associated with disease and their deleterious protein interactions. Front. Cell. Neurosci. 11, 97 (2017).
MacArthur, J. et al. The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog). Nucleic Acids Res. 45, D896–D901 (2017).
Yengo, L. et al. Meta-analysis of genome-wide association studies for height and body mass index in ~700,000 individuals of European ancestry. Hum. Mol. Genet. 27, 3641–3649 (2018).
Schizophrenia Working Group of the Psychiatric Genomics Consortium Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–427 (2014).
Liu, J. Z. et al. Association analyses identify 38 susceptibility loci for inflammatory bowel disease and highlight shared genetic risk across populations. Nat. Genet. 47, 979–986 (2015).
Savage, J. E. et al. Genome-wide association meta-analysis in 269,867 individuals identifies new genetic and functional links to intelligence. Nat. Genet. 50, 912–919 (2018).
Guo, H. et al. Integration of disease association and eQTL data using a Bayesian colocalisation approach highlights six candidate causal genes in immune-mediated diseases. Hum. Mol. Genet. 24, 3305–3313 (2015).
Haeuptle, M. A. et al. Human RFT1 deficiency leads to a disorder of N-linked glycosylation. Am. J. Hum. Genet. 82, 600–606 (2008).
Saini, S., Mitra, I., Mousavi, N., Fotsing, S. F. & Gymrek, M. A reference haplotype panel for genome-wide imputation of short tandem repeats. Nat. Commun. 9, 4397 (2018).
Chiang, C. et al. The impact of structural variation on human gene expression. Nat. Genet. 49, 692–699 (2017).
Hasler, J. & Strub, K. Alu elements as regulators of gene expression. Nucleic Acids Res. 34, 5491–5497 (2006).
Kent, W. J. et al. The human genome browser at UCSC. Genome Res. 12, 996–1006 (2002).
The 1000 Genomes Project Consortium A global reference for human genetic variation. Nature 526, 68–74 (2015).
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Patterson, N., Price, A. L. & Reich, D. Population structure and eigenanalysis. PLoS Genet. 2, e190 (2006).
Price, A. L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904–909 (2006).
Stegle, O., Parts, L., Piipari, M., Winn, J. & Durbin, R. Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses. Nat. Protoc. 7, 500–507 (2012).
Seabold, S. P. & Perktold, J. Statsmodels: econometric and statistical modeling with Python. In Proc. 9th Python in Science Conference (eds van der Walt, S. & Millman, J.) 57–61 (SCIPY, 2010).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).
Zuker, M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 31, 3406–3415 (2003).
Wang, Y. et al. The 3D Genome Browser: a web-based browser for visualizing 3D genome organization and long-range chromatin interactions. Genome Biol. 19, 151 (2018).
Mifsud, B. et al. Mapping long-range promoter contacts in human cells with high-resolution capture Hi-C. Nat. Genet. 47, 598–606 (2015).
Howie, B. N., Donnelly, P. & Marchini, J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet. 5, e1000529 (2009).
Browning, B. L., Zhou, Y. & Browning, S. R. A one-penny imputed genome from next-generation reference panels. Am. J. Hum. Genet. 103, 338–348 (2018).


Acknowledgements

Research reported in this publication was supported in part by the Office of The Director, National Institutes of Health under Award Number DP5OD024577 (M.G.). We thank V. Bafna, E. Mendenhall, J. Gleeson and Y. Liu for helpful comments. See the Supplementary Note for additional acknowledgements.

Author information

Stephanie Feupe Fotsing
Present address: La Jolla Institute of Immunology, La Jolla, CA, USA

Authors and Affiliations

Biomedical Informatics and Systems Biology, University of California San Diego, La Jolla, CA, USA
Stephanie Feupe Fotsing
Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
Stephanie Feupe Fotsing
Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
Jonathan Margoliash, Shubham Saini & Melissa Gymrek
Department of Medicine, University of California San Diego, La Jolla, CA, USA
Jonathan Margoliash, Richard Yanicky, Sharona Shleizer-Burko, Alon Goren & Melissa Gymrek
Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
Catherine Wang


Contributions

S.F.F. performed all eSTR and SNP mapping, helped to perform downstream analyses and helped to draft the manuscript. J.M. performed multitissue analysis using mashR and helped to revise the manuscript. C.W. optimized and performed the reporter assay. S.S. participated in the design of the STR imputation analysis. S.S.-B. led, designed and analyzed data from the reporter assay. R.Y. implemented the WebSTR web application. A.G. conceived and planned analyses and validation experiments of regulatory effects of eSTRs and wrote the manuscript. M.G. conceived the study, designed and performed analyses and wrote the manuscript. All authors have read and approved the final manuscript.

Corresponding author

Correspondence to Melissa Gymrek.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Relationship between sample size and number of eSTRs detected.

The x-axis shows the number of samples per tissue. The y-axis shows the number of eSTRs (gene-level FDR<10%) detected in each tissue. Each dot represents a single tissue, using the same colors as shown in Fig. 1 in the main text (see box on the right). Notably, although whole blood and skeletal muscle had the highest number of samples, we identified fewer eSTRs in those tissues than in others with lower sample sizes. This is concordant with previous results for SNPs in the GTEx cohort and may reflect higher cell-type heterogeneity in these tissue samples.

Extended Data Fig. 2 Enrichment of genomic annotations as a function of CAVIAR threshold.

The x-axis represents CAVIAR thresholds in terms of the percentile (percentage of all 28,375 eSTRs excluded by those thresholds). The y-axis represents the odds ratio for enrichment in eSTRs above each percentile threshold in each of these categories: a. 5’UTRs (purple); b. 3’UTRs (blue); c. promoters (orange; TSS +/- 3kb); d. Coding regions (red) and e. Introns (green). The y-axis center values denote the log₂ odds ratios comparing eSTRs passing each threshold to all STRs. Error bars represent +/−1 s.e.
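The log₂ odds ratios and ±1 s.e. error bars described in this caption can be illustrated with a 2×2 contingency table. This is a minimal sketch rather than the authors' code: the function name and the counts are hypothetical, the comparison set is simplified to "all other STRs", and the use of Woolf's formula for the standard error is an assumption, since the caption does not state how the error bars were computed.

```python
import math

def log2_odds_ratio(in_set_annot, in_set_other, bg_annot, bg_other):
    """Return (log2 odds ratio, standard error in log2 units) for a 2x2 table.

    Assumes all four counts are positive. The standard error uses
    Woolf's formula, which is an assumption made for this sketch.
    """
    odds_ratio = (in_set_annot * bg_other) / (in_set_other * bg_annot)
    # Woolf's standard error is defined on the natural-log scale.
    se_ln = math.sqrt(
        1 / in_set_annot + 1 / in_set_other + 1 / bg_annot + 1 / bg_other
    )
    # Dividing by ln(2) converts the standard error to log2 units.
    return math.log2(odds_ratio), se_ln / math.log(2)

# Hypothetical counts: eSTRs passing a threshold vs. other STRs,
# split by whether they overlap an annotation such as a 5'UTR.
log2_or, se = log2_odds_ratio(40, 960, 100, 9900)
```

A positive value of `log2_or` indicates enrichment of the annotation among the thresholded eSTRs, and a value of zero indicates no difference from the comparison set.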

Extended Data Fig. 3 Example multi-allelic FM-eSTRs.

For each plot, the x-axis represents the mean number of repeats in each individual and the y-axis represents normalized expression in the tissue for which the eSTR was most significant. Boxplots summarize the distribution of expression values for each genotype. Horizontal lines show median values, boxes span from the 25th percentile (Q1) to the 75th percentile (Q3). Whiskers extend to Q1-1.5*IQR (bottom) and Q3+1.5*IQR (top), where IQR gives the interquartile range (Q3-Q1). The red line shows the mean expression for each x-axis value.
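The boxplot conventions in this caption (median, Q1, Q3 and limits at 1.5 × IQR) can be written out directly. The sketch below is illustrative only: the function name and the toy data are hypothetical, and the "inclusive" quantile method is an assumption, because the caption does not specify which quartile definition was used.

```python
import statistics

def boxplot_summary(values):
    """Return the median, quartiles, IQR and 1.5*IQR whisker limits."""
    # Assumption: quartiles use the "inclusive" interpolation method.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": q1 - 1.5 * iqr,
        "whisker_high": q3 + 1.5 * iqr,
    }

# Toy expression values for a single genotype class.
summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
```

Note that many plotting libraries draw each whisker at the most extreme data point that lies inside these limits, rather than at the limits themselves.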

Extended Data Fig. 4 Sharing of eSTRs across tissues.

The x-axis represents the number of tissues that share a given eSTR (absolute value of mashR Z-score >4). The y-axis represents the number of eSTRs shared across a given number of tissues.
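The tally behind this figure can be sketched as a count of tissues per eSTR in which the absolute Z-score exceeds 4. The example is a hypothetical illustration: the function name, the input layout (one list of per-tissue Z-scores per eSTR) and the toy values are assumptions, and it does not reproduce the mashR analysis itself.

```python
from collections import Counter

def sharing_histogram(z_scores_by_estr, threshold=4.0):
    """Map number of tissues -> number of eSTRs with |Z| > threshold in that many tissues."""
    counts = Counter()
    for z_scores in z_scores_by_estr.values():
        n_tissues = sum(1 for z in z_scores if abs(z) > threshold)
        # eSTRs that pass the threshold in no tissue are left out of the histogram.
        if n_tissues > 0:
            counts[n_tissues] += 1
    return dict(counts)

# Hypothetical Z-scores for three eSTRs across three tissues.
toy = {
    "STR_1": [5.2, -4.5, 1.0],
    "STR_2": [4.1, 0.3, 0.2],
    "STR_3": [-6.0, 7.5, 4.0],
}
histogram = sharing_histogram(toy)
```

Because the comparison is strict, a Z-score of exactly 4.0 (as in `STR_3`) does not count as shared in that tissue.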

Extended Data Fig. 5 Localization of all STRs around putative regulatory regions.

Left and right plots show localization around transcription start sites and DNaseI HS clusters, respectively. The y-axis denotes the fraction of STRs of each type in each bin. For promoters, the x-axis is divided into 100bp bins. For DNaseI HS sites, the x-axis is divided into 50bp bins. In each plot, values were smoothed by taking a sliding average of each four consecutive bins. Only STR-gene pairs included in our analysis are considered. Each plot compares localization of the two possible sequences of a given repeat unit on the coding strand. Top plots compare repeat units of the form C_nG vs. their reverse complement on the opposite strand, middle plots compare AC vs. GT repeats, and bottom plots compare A vs. T repeats. The strand of each STR was determined based on the coding strand of each target gene.
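The smoothing step mentioned in this caption, a sliding average over each four consecutive bins, can be sketched as follows. The function name and the handling of the edges (only full windows are averaged, so the output is shorter than the input) are assumptions; the caption does not say how edge bins were treated.

```python
def sliding_average(values, window=4):
    """Average each run of `window` consecutive bins, using full windows only."""
    if window < 1 or window > len(values):
        return []
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# Toy per-bin fractions; six bins yield three smoothed values.
smoothed = sliding_average([1, 2, 3, 4, 5, 6])
```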

Extended Data Fig. 6 Relative probability of eSTRs around TSSs and DNaseI HS sites for a range of CAVIAR scores.

Plots are shown for FM-eSTRs defined using multiple CAVIAR thresholds (0, corresponding to all eSTRs, 0.3, as used in the main text, or 0.5). a., c., and e. show the relative probability of an STR to be an FM-eSTR around TSSs. The black lines represent the probability of an STR in each bin to be an FM-eSTR. Values were scaled relative to the genome-wide average. b., d., and f. show the relative probability of an STR to be an FM-eSTR around DNaseI HS clusters. Values were smoothed by taking a sliding average of each four consecutive bins.
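The relative probability plotted here can be illustrated as the per-bin fraction of STRs that are FM-eSTRs, divided by the overall fraction. This is a sketch under stated assumptions: the function name and counts are hypothetical, and the "genome-wide average" is approximated by the pooled fraction over the bins supplied.

```python
def relative_probability(fm_counts, str_counts):
    """Per-bin P(FM-eSTR | STR in bin), scaled by the pooled fraction across bins."""
    # Assumption: the pooled fraction stands in for the genome-wide average.
    overall = sum(fm_counts) / sum(str_counts)
    return [
        (fm / total) / overall if total > 0 else float("nan")
        for fm, total in zip(fm_counts, str_counts)
    ]

# Hypothetical counts for three bins around a TSS.
rel = relative_probability([2, 6, 2], [100, 100, 100])
```

Values above 1 mark bins in which an STR is more likely than average to be an FM-eSTR; empty bins are reported as NaN rather than zero.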

Extended Data Fig. 7 Nucleosome occupancy and DNaseI hypersensitivity show distinct patterns around eSTRs.

a-c. Nucleosome density around STRs with different repeat unit lengths. Nucleosome density in GM12878 in 5bp windows is averaged across all STRs analyzed (dashed) and FM-eSTRs (solid) relative to the center of the STR. b. DNaseI HS density around STRs with different repeat unit lengths. The number of DNaseI HS reads in GM12878 (gray), fat (red), tibial nerve (yellow), and skin (cyan) is averaged across all STRs in each category. Solid lines show FM-eSTRs. Dashed lines show all STRs. Left=homopolymers, middle=dinucleotides, right=tetranucleotides. Other repeat unit lengths were excluded since they have low numbers of FM-eSTRs (see Fig. 4a). Dashed vertical lines in (d) show the STR position +/- 147bp.

Extended Data Fig. 8 Strand-biased characteristics of FM-eSTRs.

Top panel: the y-axis shows the number of FM-eSTRs with each repeat unit on the template strand. Bottom panel: the y-axis shows the percentage of FM-eSTRs with each repeat unit on the template strand that have positive effect sizes. Gray bars denote A-rich repeat units (A/AC/AAC/AAAC) and red bars denote T-rich repeat units (T/GT/GTT/GTTT). Single asterisks denote repeat units nominally enriched or depleted (two-sided binomial p<0.05). Double asterisks denote repeat units significantly enriched after controlling for multiple hypothesis testing (Bonferroni adjusted p<0.05). Asterisks above brackets show significant differences between repeat unit pairs. Asterisks on x-axis labels denote departure from the 50% positive effect sizes expected by chance. Error bars give 95% confidence intervals.
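The two-sided binomial test and Bonferroni adjustment referred to in this caption can be sketched with the standard library. The function names and example counts are hypothetical, and the two-sided definition used here (summing the probabilities of all outcomes no more likely than the observed one) is an assumption; the caption does not state which convention was applied.

```python
from math import comb

def binomial_two_sided_p(successes, trials, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed count under the null success probability `p`.
    """
    pmf = [
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(trials + 1)
    ]
    observed = pmf[successes]
    # A small relative tolerance guards against floating-point ties.
    total = sum(prob for prob in pmf if prob <= observed * (1 + 1e-7))
    return min(1.0, total)

def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical example: 8 of 10 and 5 of 10 FM-eSTRs with positive effect sizes.
raw = [binomial_two_sided_p(8, 10), binomial_two_sided_p(5, 10)]
adjusted = bonferroni(raw)
```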

Extended Data Fig. 9 Example GWAS signals co-localized with FM-eSTRs.

Left: For each plot, the x-axis represents the mean number of repeats in each individual and the y-axis represents normalized expression in the tissue with the most significant eSTR signal at each locus. Boxplots summarize the distribution of expression values for each genotype. Boxplots are as defined in Fig. 1c. The red line shows the mean expression for each x-axis value. Right: Top panels give genes in each region. The target gene for the eQTL associations is shown in black. Middle panels give the -log₁₀ p-values of association between each SNP (black points) and the expression of the target gene. The FM-eSTR is denoted by a red star. Bottom panels give the -log₁₀ p-values of association between each SNP and the trait based on published GWAS summary statistics. P-values are two-sided and are based on t-statistics computed for effect sizes (β) (see Methods). Dashed gray horizontal lines give the genome-wide significance threshold of 5E-8.
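To make the quantities in this caption concrete, the sketch below regresses expression on genotype, forms the t-statistic for the effect size (β), and converts a two-sided p-value to the -log₁₀ scale. It is an illustration under loud assumptions: the function name and toy data are hypothetical, no covariates are included, and the p-value uses a large-sample normal approximation instead of the t distribution with n − 2 degrees of freedom, because the standard library has no t distribution.

```python
import math
from statistics import mean

def association_neglog10_p(genotypes, expression):
    """Regress expression on genotype; return (beta, t-statistic, -log10 p)."""
    n = len(genotypes)
    g_mean, e_mean = mean(genotypes), mean(expression)
    sxx = sum((g - g_mean) ** 2 for g in genotypes)
    sxy = sum((g - g_mean) * (e - e_mean) for g, e in zip(genotypes, expression))
    beta = sxy / sxx
    intercept = e_mean - beta * g_mean
    # Residual sum of squares of the fitted line.
    rss = sum(
        (e - (intercept + beta * g)) ** 2 for g, e in zip(genotypes, expression)
    )
    se = math.sqrt(rss / (n - 2) / sxx)
    t_stat = beta / se
    # Assumption: normal approximation to the t distribution.
    # For a standard normal Z, the two-sided p-value is erfc(|t| / sqrt(2)).
    p_value = math.erfc(abs(t_stat) / math.sqrt(2))
    return beta, t_stat, -math.log10(p_value)

# Toy data: mean repeat number per individual and normalized expression.
genotypes = [10, 10, 11, 11, 12, 12, 13, 13]
expression = [-1.2, -0.8, -0.4, -0.1, 0.2, 0.5, 0.7, 1.1]
beta, t_stat, neglog10_p = association_neglog10_p(genotypes, expression)
```

With only eight samples the normal approximation overstates significance; a real analysis would use the t distribution and the covariates described in the Methods.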

Extended Data Fig. 10 Example GWAS signal for schizophrenia potentially driven by an eSTR for MED19.

a. eSTR association for MED19. The x-axis shows STR genotypes at an AC repeat (chr11:57523883) as the mean number of repeats in each individual and the y-axis shows normalized MED19 expression in subcutaneous adipose. Each point represents a single individual. Red lines show the mean expression for each x-axis value. Boxplots are as defined in Fig. 1c. b. Summary statistics for MED19 expression and schizophrenia. The top panel shows genes in the region around MED19. The middle panel shows the -log₁₀ p-values of association between each variant and MED19 expression in subcutaneous adipose tissue in the GTEx cohort. The FM-eSTR is denoted by a red star. The bottom panel shows the -log₁₀ p-values of association for each variant with schizophrenia reported by the Psychiatric Genomics Consortium. The dashed gray horizontal line shows genome-wide significance threshold of 5E-8. c. Detailed view of the MED19 locus. A UCSC genome browser screenshot is shown for the region in the gray box in (b). The FM-eSTR is shown in red. The bottom track shows transcription factor (TF) and chromatin regulator binding sites profiled by ENCODE. The bottom panel shows long-range interactions reported by Mifsud, et al. using Capture Hi-C on GM12878. Interactions shown in black include MED19. Interactions to loci outside of the window depicted are not shown.


About this article

Cite this article

Fotsing, S.F., Margoliash, J., Wang, C. et al. The impact of short tandem repeat variation on gene expression. Nat Genet 51, 1652–1659 (2019). https://doi.org/10.1038/s41588-019-0521-9


Received: 13 February 2019
Accepted: 25 September 2019
Published: 01 November 2019
Issue Date: November 2019
DOI: https://doi.org/10.1038/s41588-019-0521-9
