Fig. 2 | Nature Communications

Fig. 2

From: Strain profiling and epidemiology of bacterial species from metagenomic sequencing

Validation on synthetic data and comparison with existing tools. StrainEst is able to predict the relative abundances of multistrain synthetic mixtures for different species such as B. longum, E. coli, E. faecalis, P. acnes, S. aureus, S. epidermidis, and S. pneumoniae. For each species, we simulated 10 synthetic data sets at coverage 10X (a) and 100X (b), generating reads from four strains mixed at variable relative abundances (60-25-10-5%). In the upper panel, we show the comparison between actual and predicted relative abundances for E. coli. Colors indicate different strains. In the middle panel, we show the Jensen-Shannon divergence (JSD) between actual and predicted strain composition. In the lower panel, we show the Matthews correlation coefficient (MCC) between the actual and predicted strain composition, discarding strains with predicted relative abundances below 1%. As expected, the accuracy of StrainEst grows with increasing coverage. Boxes extend from the first to the third quartile; whiskers extend to the highest and lowest values within 1.5 × IQR of the box. Outliers are shown as points. c-e Upper panels: distance between the dominant (D) strain and the second (II), third (III), and fourth (IV) most frequent strains predicted by Bowtie 2, ConStrains, PanPhlAn, PathoScope, Sigma, and StrainEst for the three synthetic data sets composed of 2, 3, and 4 strains of E. coli. NA (generic E. coli) indicates that the algorithm only predicted the presence of E. coli without further specification. The broken lines indicate the 25th percentile, median, and 75th percentile of the distribution of Mash distances between pairs of strains randomly chosen from the 3041 E. coli genomes downloaded from NCBI. Lower panels: predicted relative abundances of the identified strains. The expected relative abundances are marked in colors (D, II, III, and IV for the dominant, second, third, and fourth strain in terms of relative abundance, respectively) on the vertical axes. Error bars indicate the first and third quartiles.
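The two accuracy measures in panels a and b can be illustrated with a minimal sketch. It is not the authors' evaluation code, and it rests on several assumptions: JSD is taken as the Jensen-Shannon divergence with base-2 logarithms, MCC is computed on the presence or absence of each strain in a fixed reference list, and the function names `jsd` and `mcc`, the six-strain reference list, and the predicted values are all hypothetical.

```python
import math


def jsd(p, q):
    """Jensen-Shannon divergence between two abundance vectors.

    Assumption: base-2 logarithms, so the value lies in [0, 1].
    """
    def kl(a, b):
        # Terms with a zero abundance in `a` contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def mcc(actual, predicted, threshold=0.01):
    """Matthews correlation coefficient on strain presence/absence.

    A strain counts as predicted only if its predicted relative
    abundance is at least `threshold` (1% here, as in the caption).
    """
    tp = fp = fn = tn = 0
    for a, b in zip(actual, predicted):
        present, called = a > 0, b >= threshold
        if present and called:
            tp += 1
        elif called:
            fp += 1
        elif present:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (an assumption): return 0 when the MCC is undefined.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical example: four true strains mixed at 60-25-10-5%,
# evaluated against a reference list of six strains.
actual = [0.60, 0.25, 0.10, 0.05, 0.0, 0.0]
predicted = [0.58, 0.27, 0.09, 0.055, 0.005, 0.0]

print(f"JSD = {jsd(actual, predicted):.4f}")
print(f"MCC = {mcc(actual, predicted):.2f}")
```

In this hypothetical case the fifth strain is predicted at 0.5%, below the 1% cutoff, so it is not counted as a false positive and the MCC is 1, while the JSD remains slightly above 0 because the abundances do not match exactly.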
