In order for human microbiome studies to translate into actionable outcomes for health, meta-analysis of reproducible data from population-scale cohorts is needed. Achieving sufficient reproducibility in microbiome research has proven challenging. We report a baseline investigation of variability in taxonomic profiling for the Microbiome Quality Control (MBQC) project baseline study (MBQC-base). Blinded specimen sets from human stool, chemostats, and artificial microbial communities were sequenced by 15 laboratories and analyzed using nine bioinformatics protocols. Variability depended most on biospecimen type and origin, followed by DNA extraction, sample handling environment, and bioinformatics. Analysis of artificial community specimens revealed differences in extraction efficiency and bioinformatic classification. These results may guide researchers in experimental design choices for gut microbiome studies.
The authors are grateful to the many additional laboratory members and scientists who contributed to the Microbiome Quality Control Project baseline study, particularly during the sample handling and data generation processes. We would also like to extend our thanks to the participants in the original studies who generously provided specimens to support this and other research. This work was funded in part by the National Institutes of Health NIDDK U54DE023798 (C.H.), NHGRI R01HG005969 (C.H.), NHGRI R01 HG005220 (C.H., to Rafael Irizarry), NHGRI U01HG004866 (O.W.), NHGRI U01HG006537 (R.K.), R01HG004872 (R.K.), U01HGs004866 (R.K.), the NCI Intramural Research Program (R.S., E.V. and C.C.A.), NSF DBI-1053486 (C.H.), ARO W911NF-11-1-0473 (C.H.), the W. M. Keck Foundation (R.K.), John Templeton Foundation (R.K.), and Alfred P. Sloan Foundation (R.K.). R.K. was a Howard Hughes Medical Institute Early Career Scientist.
The Microbiome Quality Control Project Consortium members are as follows: Gail Ackermann, BioFrontiers Institute, University of Colorado – Boulder; Nadim J Ajami, Alkek Center for Metagenomics and Microbiome Research, Department of Molecular Virology and Microbiology, Baylor College of Medicine; Tulin Ayvaz, Alkek Center for Metagenomics and Microbiome Research, Department of Molecular Virology and Microbiology, Baylor College of Medicine; Jordan E Bisanz, Microbiology and Immunology / Lawson Health Research Institute, Western University; Ian Brown, Molecular and Cellular Biology, University of Guelph; Zigui Chen, Department of Pediatrics, Albert Einstein College of Medicine; Michelle C Daigneault, Molecular and Cellular Biology, University of Guelph; Mike S Humphrys, School of Medicine, Institute for Genome Sciences, University of Maryland; Catherine A Kelty, ORD, NRMRL, WSWRD, MCCB, USEPA; Randy S Longman, Pathology, Skirball Institute of Biomolecular Medicine; Bing Ma, Institute for Genome Sciences, Department of Microbiology and Immunology, University of Maryland; Corinne F Maurice, FAS Center for Systems Biology, Harvard University; Julie AK McDonald, Molecular and Cellular Biology, University of Guelph; Michael Minson, Chemistry & Biochemistry, University of Colorado at Boulder; Tiffany W Poon, MPG, Broad Institute; Joshua N Sampson, Biostatistics Branch, DCEG, National Cancer Institute; Daniel A Victorio, Jill Roberts Center for Inflammatory Bowel Disease, Weill Cornell Medical College; Matthew C Wong, Alkek Center for Metagenomics and Microbiome Research, Department of Molecular Virology and Microbiology, Baylor College of Medicine; Xiaolin Wu, Cancer Research Technology Program, Leidos Biomedical Research Inc., Frederick National Laboratory for Cancer Research; Guoqin Yu, Division of Cancer Epidemiology and Genetics, National Cancer Institute; Emma Allen-Vercoe, Molecular and Cellular Biology, University of Guelph; Robert D Burk, Pediatrics; Microbiology & Immunology; Epidemiology & Population Health, Albert Einstein College of Medicine; J Gregory Caporaso, Department of Biological Sciences, Northern Arizona University; Nicholas Chia, Surgery, Biomedical Engineering and Physiology, Mayo College; Roberto Flores, Nutritional Science Research Group / Division of Cancer Prevention, National Cancer Institute; Dirk Gevers, Broad Institute of MIT and Harvard; Gregory B Gloor, Biochemistry, University of Western Ontario; Andrew L Goodman, Department of Microbial Pathogenesis and Microbial Sciences Institute, Yale University School of Medicine; Dan R Littman, Molecular Pathogenesis Program, Kimmel Center for Biology and Medicine of the Skirball Institute, New York University School of Medicine; David A Mills, Food Science and Technology, Viticulture and Enology, and Foods for Health Institute, University of California, Davis; Joseph F Petrosino, Alkek Center for Metagenomics and Microbiome Research, Department of Molecular Virology and Microbiology, Baylor College of Medicine; Jacques Ravel, Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA; Orin C Shanks, Office of Research and Development, United States Environmental Protection Agency; Peter J Turnbaugh, FAS Center for Systems Biology, Harvard University.
N.J.A., J.F.P., and M.C.W. own shares in Diversigen, Inc.
A complete list of members is provided in the Acknowledgments.
Integrated supplementary information
A) Multidimensional scaling of MBQC sample Bray-Curtis dissimilarities (see Fig. 1). Labels indicate centroids of the indicated sample types. B) As panel A, but also including post-hoc mothur-processed samples (BL-10, see Methods). Systematic taxonomic shifts from this protocol are present (see Supplementary Dataset 6) but are not of sufficient effect size to appear on the first two ordination axes. C) Proportions of 10 bacterial phyla that were detected with a minimum relative abundance of 0.01% in at least 10% of the 16,554 samples that were subjected to integrated analysis.
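As an illustration of the dissimilarity underlying panels A and B, the following is a minimal sketch of the Bray-Curtis calculation on abundance profiles. The sample names and abundance values are hypothetical and are not taken from the MBQC data; the resulting matrix would then be passed to an ordination routine, which is not shown.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0  # convention chosen here for two empty profiles
    return sum(abs(x - y) for x, y in zip(a, b)) / total


def pairwise_bray_curtis(profiles):
    """Return a dict mapping (sample, sample) pairs to their dissimilarity."""
    names = list(profiles)
    return {(p, q): bray_curtis(profiles[p], profiles[q])
            for p in names for q in names}


# Hypothetical relative abundances of three taxa in three samples.
profiles = {
    "stool_1": [0.6, 0.3, 0.1],
    "stool_2": [0.5, 0.4, 0.1],
    "blank": [0.0, 0.0, 1.0],
}
dist = pairwise_bray_curtis(profiles)
```

In this toy example the two stool profiles are much closer to each other than either is to the blank, which is the kind of separation an ordination of the full matrix would display.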
Supplementary Figure 2 Within-sample alpha diversity, stratified by handling and bioinformatics lab.
Four different alpha diversity measures (inverse Simpson [IS], observed species richness [OS], Chao1, and phylogenetic diversity [PD]) across all samples processed by each A) handling and B) bioinformatics lab. All diversity measures, whether qualitative (OS, Chao1, PD) or quantitative (IS) and whether taxonomic (IS, OS, Chao1) or phylogenetic (PD), correlate closely, with large but consistent differences induced by distinct handling protocol choices.
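For readers less familiar with these measures, the three taxonomic indices can be sketched directly from a vector of OTU counts. This is an illustrative sketch only: the counts are hypothetical, the Chao1 estimator is written in its commonly used bias-corrected form (an assumption, since the caption does not say which form was used), and phylogenetic diversity is omitted because it requires a tree.

```python
def observed_richness(counts):
    """Number of OTUs with at least one read."""
    return sum(1 for c in counts if c > 0)


def inverse_simpson(counts):
    """Inverse Simpson index: 1 / sum(p_i^2) over relative abundances p_i."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return 1.0 / sum((c / total) ** 2 for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1 richness estimate from singleton/doubleton counts."""
    s_obs = observed_richness(counts)
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical OTU counts for one sample.
counts = [10, 10, 5, 2, 1, 1, 0]
richness = observed_richness(counts)
inv_simpson = inverse_simpson(counts)
chao1_estimate = chao1(counts)
```

Observed richness and Chao1 depend only on which OTUs are present and how many are rare, whereas inverse Simpson weights OTUs by abundance, which is the qualitative versus quantitative distinction drawn in the caption.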
Supplementary Figure 3 Within-sample alpha and beta diversities, stratified by handling and bioinformatics lab.
Distributions of A) within-sample Inverse Simpson alpha diversity and B) between-sample Bray-Curtis beta diversity, stratified by sample type (total n=2,033 for artificial communities, n=11,991 for human-derived samples, and n=1,725 for chemostat samples), handling lab, and bioinformatics lab. As in other, higher-level summary statistics, effect sizes in decreasing order are (on average) biospecimen type, biospecimen, handling laboratory, and bioinformatics laboratory; no bioinformatics protocol (regardless of read length or trimming) is an evident outlier from this trend. Values more than 1.5 times the interquartile range beyond the quartiles are omitted for clarity.
Supplementary Figure 4 Correlations of alpha diversities for samples as processed by different handling and bioinformatics labs.
Each tile represents a Spearman rank correlation coefficient between pairwise comparisons of all log10-transformed Inverse Simpson index estimates for the overlapping subsets of 20,708 samples that survived quality control for each pair of A) handling and B) bioinformatics laboratories. High correlation across labs in OTU Inverse Simpson estimates, which account for richness and evenness, implies robustness (or consistent bias) in microbial community in silico reconstruction protocols across laboratories. Correlations in diversity are lower among handling than bioinformatics labs but generally highly significantly positive; exceptions include potential external (e.g. HL-A) and within-batch (e.g. HL-D) contaminants (see Supplementary Fig. 14).
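The value in each tile can be illustrated with a small sketch of the Spearman rank correlation, computed as the Pearson correlation of average ranks. The two vectors of Inverse Simpson estimates below are hypothetical stand-ins for the same samples as processed by two laboratories.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical Inverse Simpson estimates for five shared samples.
lab_a = [3.2, 8.1, 5.5, 12.0, 4.4]
lab_b = [2.9, 9.0, 6.1, 8.5, 5.0]
rho_raw = spearman(lab_a, lab_b)
rho_log = spearman([math.log10(v) for v in lab_a],
                   [math.log10(v) for v in lab_b])
```

Because the coefficient depends only on ranks, the log10 transform leaves it unchanged here (`rho_raw` and `rho_log` agree); the transform matters for display and for parametric summaries rather than for the rank correlation itself.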
Supplementary Figure 5 Simpson diversity estimates for each individual sample under a single bioinformatics protocol (BL-3).
Each boxplot within each panel represents data from a distinct handling lab extraction event. A null hypothesis of no difference for each sample is evaluated by one-way ANOVA with nominal p-values shown above each panel.
Supplementary Figure 6 Simpson’s diversity for a specific sample under different bioinformatics protocols.
Using a single sample (D2497), the overall pattern of diversity is similar between different bioinformatics protocols, but the absolute diversity reported by each protocol varies by up to a factor of two or more. Different bioinformatics protocols applied to the same sequences therefore do not produce directly comparable absolute diversity estimates. Direct comparisons of absolute alpha diversity are thus most feasible for data processed by a single bioinformatics protocol, while relative alpha diversities can be more safely compared between protocols.
Supplementary Figure 7 Individual samples differ in terms of how much diversity estimates depend on wet lab extraction.
Each column represents one specimen, with the distribution of p-values across bioinformatics protocols (colors) testing whether handling lab has a significant effect on the estimation of Simpson diversity. For some samples (such as D2327), diversity estimates across different handling labs were not affected by bioinformatics protocol, while for others (such as D2561) different handling protocols produced very different absolute diversity measurements depending on bioinformatics protocol. In the case of D2561, a freeze-dried specimen, different extraction protocols produced unusually variable distributions of Bacteroidetes versus Firmicutes, and some extraction results included large proportions of likely contaminants such as Methylobacterium, Staphylococcus, and Spirochaetes. A smaller study only incorporating a few specimens or technical replicates using the same specimen might thus reach different conclusions based solely on which specimens happen to be included.
Supplementary Figure 8 Ordination for each sample, comparing replicates extracted with different kits for a single bioinformatics pipeline.
For each sample (panel), MDS ordination was performed using Bray-Curtis dissimilarity to examine how each kit (color) influenced microbial composition. In general, samples extracted centrally (open symbols) and in each lab (filled symbols) overlapped when local extraction used the same extraction kit as centralized extraction (MO-BIO, red symbols), but this was not true for all samples (for example, sample DZ15294).
For each extraction lab (top panel) and kit manufacturer (bottom panel), log10(p-values) are shown from a paired t-test for no difference between shipped and locally extracted DNA. Different colors represent results from different phyla as indicated in the figure legend. Bioinformatics pipelines agreed closely at the phylum level, leading to clusters of near-identical points for each color. When kits from certain manufacturers (such as MO-BIO and Omega Bio-tek) were used by the locally extracting lab, there was good agreement with the shipped DNA (which was extracted with a MO-BIO kit). However, for other kits, there were substantial differences in the relative abundance calls at the phylum level, producing small p-values.
Supplementary Figure 10 Taxonomic profiles of positive-control and negative-control samples stratified by extraction, handling lab, and bioinformatics.
Average relative abundance profiles for A) fecal and B) oral artificial communities (see Supplementary Table 1) and C) buffer blank sample profiles stratified by extraction location (super-row), handling lab (super-column), and bioinformatics lab (column). Only taxa (rows) achieving at least 0.1% relative abundance in at least one sample are shown; combinations for which no data were provided are gray.
Supplementary Figure 11 Alpha diversity of gut- and oral-derived artificial communities and negative-control blanks as stratified by handling and bioinformatics laboratories.
Rarefaction curves for mean number of OTUs as a function of rarefaction depth for the fecal (A-B) and oral (C-D) artificial communities and for negative control blanks (E-F). Means and standard error for a minimum of 5 samples are shown for different bioinformatic pipelines (average across handling labs; A, C, E) or for different handling labs (averaging across bioinformatics; B, D, F). Target values are 20 for fecal and 22 for oral artificial communities, respectively.
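One way to reason about such curves is the analytic expectation of the number of OTUs observed when a fixed number of reads is drawn without replacement. The sketch below uses that hypergeometric expectation as an assumption for illustration; the caption does not state whether the study used this formula or repeated random subsampling, and the counts are hypothetical.

```python
from math import comb


def expected_otus(counts, depth):
    """Expected number of OTUs seen when `depth` reads are drawn without
    replacement from a sample with the given per-OTU read counts."""
    total = sum(counts)
    if not 0 <= depth <= total:
        raise ValueError("depth must be between 0 and the total read count")
    denom = comb(total, depth)
    # For each OTU, 1 - P(no read of that OTU appears in the subsample).
    return sum(1 - comb(total - c, depth) / denom for c in counts if c > 0)


# Hypothetical OTU counts for one artificial community sample (100 reads).
counts = [50, 30, 15, 4, 1]
curve = [(depth, expected_otus(counts, depth)) for depth in (0, 1, 10, 50, 100)]
```

The curve rises from zero and reaches the observed richness at full depth; a plateau above the target value for an artificial community would point to spurious OTUs rather than undersampling.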
Supplementary Figure 12 Correlation between taxonomic profiles from whole metagenome shotgun (WMS) and 16S amplicon sequence data on stool and oral artificial communities.
Relative abundances calculated from WMS (horizontal) and 16S rRNA gene (vertical) artificial community sequence data for the 17 and 19 species, from the gut and oral artificial communities respectively, that were identifiable from both sequence sets. Each point represents the species relative abundance interquartile ranges (IQRs) for WMS and 16S; the IQRs intersect at respective median values. Spearman rho correlation coefficients are shown in the top left of each plot. The dashed line marks the diagonal. Data are summarized from 43, 36, 43, and 40 artificial community 16S amplicon samples for gut centrally and locally, and oral centrally and locally extracted DNA samples, respectively; twelve WMS samples, three in each respective group, were summarized in each subplot.
Supplementary Figure 13 Mismatched raw reads in artificial communities account for ~30% of sequences and vary by handling laboratory.
A) Fecal and B) oral artificial community fractions of reads exactly matching one of the 20 or 22 reference 16S rRNA gene sequences, respectively. Each bar represents one sample, with copy numbers varying depending on the data deposited and the number of sample sets handled. C) Fecal and D) oral per-nucleotide error rates estimated based on reads containing exactly one mismatch to reference 16S rRNA gene sequences. E) Fecal and F) oral reads identical to references but offset by either one or two nucleotides. G) Fecal and H) oral reads containing chimeric sequences from two known community members (as determined by exhaustive search of all reference pairs).
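The first two read categories in this figure (exact matches and single-mismatch reads) can be illustrated with a simple tally. This is a rough sketch under stated assumptions: reads are compared only against references of identical length using Hamming distance, the sequences are short made-up strings, and the offset and chimera categories are not handled. It is not the procedure used in the study.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def classify_read(read, references):
    """Label a read as 'exact', 'one_mismatch', or 'other'."""
    same_length = [ref for ref in references if len(ref) == len(read)]
    if read in same_length:
        return "exact"
    if any(hamming(read, ref) == 1 for ref in same_length):
        return "one_mismatch"
    return "other"


def read_fractions(reads, references):
    """Fraction of reads falling in each category."""
    if not reads:
        raise ValueError("no reads supplied")
    tally = Counter(classify_read(r, references) for r in reads)
    return {label: tally[label] / len(reads)
            for label in ("exact", "one_mismatch", "other")}


# Hypothetical reference sequences and reads.
references = ["ACGTACGT", "TTGACCAA"]
reads = ["ACGTACGT", "ACGTACGA", "TTGACCAA", "GGGGGGGG"]
fractions = read_fractions(reads, references)
```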
Supplementary Figure 14 Reads in oral artificial communities are often apparently derived from abundant taxa in non-artificial samples.
Rows correspond to OTUs abundant in non-artificial MBQC-base samples, columns to samples (grouped by handling lab). Averages on the right are per lab in the same order across all non-artificial samples. Only results from the BL-9B bioinformatics method are shown for simplicity, and handling lab HL-A samples are omitted due to an incongruous pattern of apparent external contamination by other organisms (see Supplementary Fig. 10).
Supplementary Figure 15 The significance of all variables from a multivariate model of experimental and bioinformatic protocol variables.
A) Magnitudes and significance levels of only fixed effects from a random effects model capturing all handling and bioinformatics lab variables for which sufficient measurements were available (see Supplementary Table 7). Comparably high variability in phylum-level taxonomic abundance readouts was associated with subsets of both handling and bioinformatics experimental protocol choices. B) Magnitudes and significance levels of only random effects from the full model. These highlight large differences induced by biological variation relative to sample handling or bioinformatics protocol choices on the resulting abundances. C) Random effects from a simplified model including only individual handling and bioinformatics laboratory identifiers and differences between pre- and locally-extracted samples. D) Fixed effects of sample handling and bioinformatics laboratories from the simplified model. These suggest that variability in taxonomic profiling is primarily driven by sample handling protocol choices, with lesser differences induced at the phylum level by bioinformatics choices. All effects were evaluated using a likelihood ratio test with Benjamini-Hochberg-Yekutieli correction within models.
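The multiple-testing step can be sketched as follows, assuming that "Benjamini-Hochberg-Yekutieli" refers to the Benjamini-Yekutieli procedure, which is valid under arbitrary dependence between tests. That reading is an assumption, and the p-values below are hypothetical rather than taken from the models.

```python
def benjamini_yekutieli(pvalues):
    """Benjamini-Yekutieli adjusted p-values, returned in the input order."""
    m = len(pvalues)
    if m == 0:
        return []
    # Harmonic-sum penalty that distinguishes BY from plain Benjamini-Hochberg.
    c_m = sum(1.0 / k for k in range(1, m + 1))
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Step from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = pvalues[idx] * m * c_m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical nominal p-values from four likelihood ratio tests in one model.
nominal = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_yekutieli(nominal)
```

Applying the correction separately within each model, as the caption describes, means the number of tests `m` is the number of effects in that model rather than the total across all models.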
Supplementary Figures 1–15 (PDF 3507 kb)
Extended literature review and categorization of microbiome protocol studies. (XLSX 117 kb)
Handling protocol variables recorded during the MBQC-base. (XLSX 25 kb)
Bioinformatics protocol variable reporting. (XLSX 24 kb)
Handling protocol variables curated from the MBQC-base. (XLSX 16 kb)
Raw read counts per sample. (XLSX 41 kb)
MBQC-base OTU table. (ZIP 14543 kb)
Alpha and beta diversities stratified by sample type within and between handling and bioinformatics laboratories. (ZIP 7139 kb)
Beta-diversity divergence from artificial community positive controls; stratified by handling and bioinformatics laboratories. (TXT 2 kb)
About this article
Cite this article
Sinha, R., Abu-Ali, G., Vogtmann, E. et al. Assessment of variation in microbial community amplicon sequencing by the Microbiome Quality Control (MBQC) project consortium. Nat Biotechnol 35, 1077–1086 (2017). https://doi.org/10.1038/nbt.3981