Fig. 4 | Nature Communications

Fig. 4

From: A reference haplotype panel for genome-wide imputation of short tandem repeats

SNP haplotypes distinguish allele lengths at known pathogenic STRs. a Example SNP-STR haplotypes inferred in European samples at a polyglutamine repeat in ATN1 implicated in DRPLA. Each column represents a SNP from the founder haplotype reported by Veneziano et al. Each row represents a single haplotype inferred in 1000 Genomes Project phase 3 European samples, with gray and black boxes denoting major and minor alleles, respectively. Haplotypes are grouped by the corresponding STR allele. The number of SNP haplotypes for each group of STR alleles is annotated to the left of each box. Alleles seen fewer than 10 times in 1000 Genomes samples were excluded from the visualization. b Comparison of imputed vs. observed STR genotypes in SSC samples at the DRPLA locus. The x-axis gives the maximum likelihood genotype dosage returned by HipSTR and the y-axis gives the imputed dosage. Dosage is defined as the sum of the two allele lengths of each genotype relative to the hg19 reference genome. The bubble size represents the number of samples summarized by each data point. c Distribution of DRPLA repeat length vs. similarity to the pathogenic founder haplotype. The founder haplotype refers to the SNP haplotype reported by Veneziano et al. on which a pathogenic expansion in ATN1 implicated in DRPLA likely originated. The x-axis gives the Hamming distance between observed haplotypes and the founder haplotype, computed as the number of positions with discordant alleles. White dots represent the median length
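
The two quantities defined in the caption, genotype dosage (panel b) and Hamming distance to the founder haplotype (panel c), can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the 0/1 encoding of major and minor alleles, and all numeric values are hypothetical, and it assumes allele lengths and the reference length are expressed in the same unit.

```python
from typing import Sequence, Tuple


def genotype_dosage(allele_lengths: Tuple[int, int], reference_length: int) -> int:
    """Sum of the two allele lengths, each taken relative to the reference length.

    Assumes all lengths share one unit (e.g. repeat copies); this is an assumption.
    """
    first, second = allele_lengths
    return (first - reference_length) + (second - reference_length)


def hamming_distance(haplotype: Sequence[int], founder: Sequence[int]) -> int:
    """Number of SNP positions at which the two haplotypes carry different alleles."""
    if len(haplotype) != len(founder):
        raise ValueError("haplotypes must cover the same SNP positions")
    return sum(a != b for a, b in zip(haplotype, founder))


# Hypothetical example values, not taken from the figure.
founder_haplotype = [1, 0, 1, 1, 0]   # 1 = minor allele, 0 = major allele (assumed encoding)
observed_haplotype = [1, 1, 1, 0, 0]

distance = hamming_distance(observed_haplotype, founder_haplotype)
dosage = genotype_dosage((15, 17), reference_length=15)
```

With these made-up inputs, the observed haplotype differs from the founder at two positions, and a genotype with alleles 0 and 2 units longer than the reference has a dosage of 2.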