Assessment of variation in microbial community amplicon sequencing by the Microbiome Quality Control (MBQC) project consortium

Abstract

In order for human microbiome studies to translate into actionable outcomes for health, meta-analysis of reproducible data from population-scale cohorts is needed. Achieving sufficient reproducibility in microbiome research has proven challenging. We report a baseline investigation of variability in taxonomic profiling for the Microbiome Quality Control (MBQC) project baseline study (MBQC-base). Blinded specimen sets from human stool, chemostats, and artificial microbial communities were sequenced by 15 laboratories and analyzed using nine bioinformatics protocols. Variability depended most on biospecimen type and origin, followed by DNA extraction, sample handling environment, and bioinformatics. Analysis of artificial community specimens revealed differences in extraction efficiency and bioinformatic classification. These results may guide researchers in experimental design choices for gut microbiome studies.

Figure 1: Microbiome Quality Control Project baseline study design.
Figure 2: Beta-diversity of MBQC-base microbial community analyses.
Figure 3: Individual and aggregate effects of sample handling and bioinformatics laboratories on microbial profiles.
Figure 4: Detection of abundant taxa in positive and negative control samples is affected by sample handling.
Figure 5: Variation in community profiling analyzed using a multivariate model of experimental and bioinformatic protocol variables.

Accession codes

Primary accessions

Sequence Read Archive

Acknowledgements

The authors are grateful to the many additional laboratory members and scientists who contributed to the Microbiome Quality Control Project baseline study, particularly during the sample handling and data generation processes. We would also like to extend our thanks to the participants in the original studies who generously provided specimens to support this and other research. This work was funded in part by the National Institutes of Health NIDDK U54DE023798 (C.H.), NHGRI R01HG005969 (C.H.), NHGRI R01 HG005220 (C.H., to Rafael Irizarry), NHGRI U01HG004866 (O.W.), NHGRI U01HG006537 (R.K.), R01HG004872 (R.K.), U01HGs004866 (R.K.), the NCI Intramural Research Program (R.S., E.V. and C.C.A.), NSF DBI-1053486 (C.H.), ARO W911NF-11-1-0473 (C.H.), the W. M. Keck Foundation (R.K.), John Templeton Foundation (R.K.), and Alfred P. Sloan Foundation (R.K.). R.K. was a Howard Hughes Medical Institute Early Career Scientist.

The Microbiome Quality Control Project Consortium members are as follows: Gail Ackermann, BioFrontiers Institute, University of Colorado – Boulder; Nadim J Ajami, Alkek Center for Metagenomics and Microbiome Research, Department of Molecular Virology and Microbiology, Baylor College of Medicine; Tulin Ayvaz, Alkek Center for Metagenomics and Microbiome Research, Department of Molecular Virology and Microbiology, Baylor College of Medicine; Jordan E Bisanz, Microbiology and Immunology / Lawson Health Research Institute, Western University; Ian Brown, Molecular and Cellular Biology, University of Guelph; Zigui Chen, Department of Pediatrics, Albert Einstein College of Medicine; Michelle C Daigneault, Molecular and Cellular Biology, University of Guelph; Mike S Humphrys, School of Medicine, Institute for Genome Sciences, University of Maryland; Catherine A Kelty, ORD, NRMRL, WSWRD, MCCB, USEPA; Randy S Longman, Pathology, Skirball Institute of Biomolecular Medicine; Bing Ma, Institute for Genome Sciences, Department of Microbiology and Immunology, University of Maryland; Corinne F Maurice, FAS Center for Systems Biology, Harvard University; Julie AK McDonald, Molecular and Cellular Biology, University of Guelph; Michael Minson, Chemistry & Biochemistry, University of Colorado at Boulder; Tiffany W Poon, MPG, Broad Institute; Joshua N Sampson, Biostatistics Branch, DCEG, National Cancer Institute; Daniel A Victorio, Jill Roberts Center for Inflammatory Bowel Disease, Weill Cornell Medical College; Matthew C Wong, Alkek Center for Metagenomics and Microbiome Research, Department of Molecular Virology and Microbiology, Baylor College of Medicine; Xiaolin Wu, Cancer Research Technology Program, Leidos Biomedical Research Inc., Frederick National Laboratory for Cancer Research; Guoqin Yu, Division of Cancer Epidemiology and Genetics, National Cancer Institute; Emma Allen-Vercoe, Molecular and Cellular Biology, University of Guelph; Robert D Burk, Pediatrics; Microbiology & Immunology; Epidemiology & Population Health, Albert Einstein College of Medicine; J Gregory Caporaso, Department of Biological Sciences, Northern Arizona University; Nicholas Chia, Surgery, Biomedical Engineering and Physiology, Mayo College; Roberto Flores, Nutritional Science Research Group / Division of Cancer Prevention, National Cancer Institute; Dirk Gevers, Broad Institute of MIT and Harvard; Gregory B Gloor, Biochemistry, University of Western Ontario; Andrew L Goodman, Department of Microbial Pathogenesis and Microbial Sciences Institute, Yale University School of Medicine; Dan R Littman, Molecular Pathogenesis Program, Kimmel Center for Biology and Medicine of the Skirball Institute, New York University School of Medicine; David A Mills, Food Science and Technology, Viticulture and Enology, and Foods for Health Institute, University of California, Davis; Joseph F Petrosino, Alkek Center for Metagenomics and Microbiome Research, Department of Molecular Virology and Microbiology, Baylor College of Medicine; Jacques Ravel, Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA; Orin C Shanks, Office of Research and Development, United States Environmental Protection Agency; Peter J Turnbaugh, FAS Center for Systems Biology, Harvard University.

Author information

Contributions

A.A., G.A.-A., N.J.A., R.D.B., J.G.C., N.C., Z.C., A.A.F., G.B.G., C.H., M.S.H., R.K., B.M., J.F.P., B.R., J.R., M.C.W., X.W., and G.Y. contributed bioinformatics; A.A., G.A.-A., J.G.C., A.A.F., C.H., R.K., B.R., and E.S. contributed cross-laboratory data handling or analysis; E.A., N.J.A., T.A., R.D.B., I.B., Z.C., M.C.D., R.F., D.G., A.L.G., M.S.H., C.A.K., R.K., D.R.L., R.S.L., D.A.M., C.F.M., J.F.P., J.R., O.C.S., P.J.T., D.A.V., and X.W. contributed DNA extraction; A.A., C.C.A., G.A.-A., A.A.F., C.H., B.R., E.S., R.S., E.V., and O.W. contributed manuscript preparation/writing; E.A., G.A.-A., N.J.A., T.A., R.D.B., J.E.B., I.B., Z.C., R.F., G.B.G., D.G., A.L.G., M.S.H., C.A.K., R.K., D.R.L., R.S.L., D.A.M., C.F.M., J.A.K.M., M.M., J.F.P., J.R., O.C.S., P.J.T., D.A.V., and X.W. contributed other sample handling (sequencing library preparation, sequencing, etc.); E.A., R.D.B., J.G.C., N.C., A.A.F., R.F., G.B.G., D.G., A.L.G., C.H., R.K., D.R.L., D.A.M., J.F.P., J.R., O.C.S., R.S., and P.J.T. contributed as PIs; G.A.-A., T.W.P., and E.V. contributed project management; C.C.A., C.H., R.K., R.S., and O.W. contributed to the steering committee; C.C.A., C.H., R.K., E.S., J.N.S., R.S., and E.V. contributed study design.

Corresponding author

Correspondence to Curtis Huttenhower.

Ethics declarations

Competing interests

N.J.A., J.F.P., and M.C.W. own shares in Diversigen, Inc.

Additional information

A complete list of members is provided in the Acknowledgments.

Integrated supplementary information

Supplementary Figure 1 MBQC–base beta diversity, major protocol variables, and taxonomic profiles.

A) Multidimensional scaling of MBQC sample Bray-Curtis dissimilarities (see Fig. 1). Labels indicate centroids of the indicated sample types. B) As panel A, but also including post-hoc mothur-processed samples (BL-10, see Methods). Systematic taxonomic shifts from this protocol are present (see Supplementary Dataset 6) but are not of sufficient effect size to appear on the first two ordination axes. C) Proportions of 10 bacterial phyla that were detected with a minimum relative abundance of 0.01% in at least 10% of the 16,554 samples that were subjected to integrated analysis.
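
As an illustration of the two quantities this caption relies on, the sketch below computes a Bray-Curtis dissimilarity between two abundance profiles and applies a prevalence filter of the kind described for panel C (at least 0.01% relative abundance in at least 10% of samples). The function names and the list-of-lists input layout are assumptions made for this example and are not taken from the MBQC analysis code.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("profiles must have the same length")
    total = sum(u) + sum(v)
    if total == 0:
        return 0.0  # convention chosen here for two empty profiles
    return sum(abs(a - b) for a, b in zip(u, v)) / total


def prevalence_filter(profiles, min_abundance=1e-4, min_prevalence=0.10):
    """Indices of taxa reaching min_abundance (0.01% = 1e-4) in at least
    min_prevalence (10%) of the samples. `profiles` is a list of samples,
    each a list of relative abundances in the same taxon order."""
    n_samples = len(profiles)
    keep = []
    for j in range(len(profiles[0])):
        hits = sum(1 for sample in profiles if sample[j] >= min_abundance)
        if hits / n_samples >= min_prevalence:
            keep.append(j)
    return keep


if __name__ == "__main__":
    a = [0.5, 0.5, 0.0]
    b = [0.5, 0.0, 0.5]
    print(bray_curtis(a, b))
    print(prevalence_filter([[0.9, 0.1, 0.0], [1.0, 0.0, 0.0]]))
```

The resulting pairwise dissimilarity matrix is what an ordination method such as multidimensional scaling would take as input; the ordination step itself is not sketched here.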

Supplementary Figure 2 Within-sample alpha diversity, stratified by handling and bioinformatics lab.

Four different alpha diversity measures (inverse Simpson, observed species richness, Chao1, and phylogenetic diversity) across all samples processed by each A) handling and B) bioinformatics lab. All diversity measures, whether qualitative (OS, Chao1, PD) or quantitative (IS) and whether taxonomic (IS, OS, Chao1) or phylogenetic (PD) correlate closely, with large but consistent differences induced by distinct handling protocol choices.
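
A minimal sketch of three of the four alpha diversity measures named above, computed from a vector of per-OTU read counts. Phylogenetic diversity is left out because it requires a tree. The Chao1 estimator is written in its bias-corrected form, which is an assumption; the caption does not say which variant was used.

```python
def inverse_simpson(counts):
    """Inverse Simpson index: 1 / sum(p_i^2), with p_i the relative abundances."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return 1.0 / sum((c / total) ** 2 for c in counts)


def observed_richness(counts):
    """Number of OTUs with at least one read."""
    return sum(1 for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)),
    where F1 and F2 are the numbers of singleton and doubleton OTUs."""
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return observed_richness(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))


if __name__ == "__main__":
    sample = [5, 1, 1, 2, 0]
    print(inverse_simpson(sample), observed_richness(sample), chao1(sample))
```

Inverse Simpson weights OTUs by abundance (quantitative), while observed richness and Chao1 depend only on presence and on the rarest OTUs (qualitative), which matches the distinction drawn in the caption.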

Supplementary Figure 3 Within-sample alpha and beta diversities, stratified by handling and bioinformatics lab.

Distributions of A) within-sample Inverse Simpson alpha diversity and B) between-sample Bray-Curtis beta diversity, stratified by sample type (total n=2,033 for artificial communities, n=11,991 for human-derived samples, and n=1,725 for chemostat samples), handling lab, and bioinformatics lab. As in other, higher-level summary statistics, effect sizes in decreasing order are (on average) biospecimen type, biospecimen, handling laboratory, and bioinformatics laboratory, with no bioinformatics protocol (regardless of read length or trimming) an evident outlier from this trend. Outlier values outside 1.5 times the interquartile range are omitted for clarity.

Supplementary Figure 4 Correlations of alpha diversities for samples as processed by different handling and bioinformatics labs.

Each tile represents a Spearman rank correlation coefficient between pairwise comparisons of all log10 transformed Inverse Simpson index estimates for the overlapping subsets of 20,708 samples that survived quality control for each pair of A) handling and B) bioinformatics laboratories. High correlation in OTU Inverse Simpson estimates, which accounts for richness and evenness, across labs implies robustness (or consistent bias) in microbial community in silico reconstruction protocols across laboratories. Correlations in diversity are lower among handling than bioinformatics labs but generally highly significantly positive; exceptions include potential external (e.g. HL-A) and within-batch (e.g. HL-D) contaminants (see Supplementary Fig. 14).
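
The per-tile statistic can be sketched as follows: a Spearman rank correlation between the log10-transformed Inverse Simpson estimates that two laboratories produced for the same samples. This is a standard-library illustration under the assumption that the two input lists are already matched by sample; the function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman_log10(lab_a, lab_b):
    """Spearman correlation of log10-transformed diversity estimates."""
    log_a = [math.log10(v) for v in lab_a]
    log_b = [math.log10(v) for v in lab_b]
    return pearson(average_ranks(log_a), average_ranks(log_b))


if __name__ == "__main__":
    print(spearman_log10([1.5, 3.0, 8.0, 12.0], [2.0, 2.5, 20.0, 30.0]))
```

Because log10 is monotone, the transformation does not change the ranks and therefore does not change the Spearman coefficient; it is kept here only to mirror the caption.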

Supplementary Figure 5 Simpson diversity estimates for each individual sample under a single bioinformatics protocol (BL-3).

Each boxplot within each panel represents data from a distinct handling lab extraction event. A null hypothesis of no difference for each sample is evaluated by one-way ANOVA with nominal p-values shown above each panel.

Supplementary Figure 6 Simpson’s diversity for a specific sample under different bioinformatics protocols.

Using a single sample (D2497), the overall pattern of diversity is similar between different bioinformatics protocols, but the absolute diversity reported by each protocol varies by up to a factor of two or more. Different bioinformatics protocols applied to the same sequences therefore do not produce directly comparable absolute diversity estimates. Direct comparisons of absolute alpha diversity are thus most feasible for data processed by a single bioinformatics protocol, while relative alpha diversities can be more safely compared between protocols.

Supplementary Figure 7 Individual samples differ in terms of how much diversity estimates depend on wet lab extraction.

Each column represents one specimen, with the distribution of p-values across bioinformatics protocols (colors) testing whether handling lab has a significant effect on the estimation of Simpson diversity. For some samples (such as D2327), diversity estimates across different handling labs were not affected by bioinformatics protocol, while for others (such as D2561) different handling protocols produced very different absolute diversity measurements depending on bioinformatics protocol. In the case of D2561, a freeze-dried specimen, different extraction protocols produced unusually variable distributions of Bacteroidetes versus Firmicutes, and some extraction results included large proportions of likely contaminants such as Methylobacterium, Staphylococcus, and Spirochaetes. A smaller study only incorporating a few specimens or technical replicates using the same specimen might thus reach different conclusions based solely on which specimens happen to be included.

Supplementary Figure 8 Ordination for each sample, comparing replicates extracted with different kits for a single bioinformatics pipeline.

For each sample (panel), MDS ordination was performed using Bray-Curtis dissimilarity to examine how each kit (color) influenced microbial composition. In general, samples extracted centrally (open symbols) and in each lab (filled symbols) overlapped when local extraction used the same extraction kit as centralized extraction (MO-BIO, red symbols), but this was not true for all samples (for example, sample DZ15294).

Supplementary Figure 9 Different extraction kits produce different estimates of relative abundance.

For each extraction lab (top panel) and kit manufacturer (bottom panel), log10(p-values) are shown from a paired t-test for no difference between shipped and locally extracted DNA. Different colors represent results from different phyla as indicated in the figure legend. Bioinformatics pipelines agreed closely at the phylum level, leading to clusters of near-identical points for each color. When kits from certain manufacturers (such as MO-BIO and Omega Bio-tek) were used by the locally extracting lab, there was good agreement with the shipped DNA (which was extracted with a MO-BIO kit). However, for other kits, there were substantial differences in the relative abundance calls at the phylum level, producing small p-values.

Supplementary Figure 10 Taxonomic profiles of positive-control and negative-control samples stratified by extraction, handling lab, and bioinformatics.

Average relative abundance profiles for A) fecal and B) oral artificial communities (see Supplementary Table 1) and C) buffer blank sample profiles stratified by extraction location (super-row), handling lab (super-column), and bioinformatics lab (column). Only taxa (rows) achieving at least 0.1% relative abundance in at least one sample are shown; combinations for which no data were provided are gray.

Supplementary Figure 11 Alpha diversity of gut- and oral-derived artificial communities and negative-control blanks as stratified by handling and bioinformatics laboratories.

Rarefaction curves for mean number of OTUs as a function of rarefaction depth for the fecal (A-B) and oral (C-D) artificial communities and for negative control blanks (E-F). Means and standard error for a minimum of 5 samples are shown for different bioinformatic pipelines (average across handling labs; A, C, E) or for different handling labs (averaging across bioinformatics; B, D, F). Target values are 20 for fecal and 22 for oral artificial communities, respectively.
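
One way to picture a rarefaction curve is through the expected number of OTUs observed when a fixed number of reads is drawn without replacement from a sample. The sketch below uses the closed-form expectation rather than repeated random subsampling; that choice is an assumption made to keep the example deterministic, and the caption does not state how the curves were computed.

```python
from math import comb


def expected_richness(counts, depth):
    """Expected number of OTUs seen when `depth` reads are drawn without
    replacement from a sample with the given per-OTU read counts.

    Uses E[S] = sum_i (1 - C(N - N_i, depth) / C(N, depth)), where N is the
    total read count and N_i the count of OTU i."""
    total = sum(counts)
    if not 0 <= depth <= total:
        raise ValueError("depth must lie between 0 and the total read count")
    denom = comb(total, depth)
    return sum(1 - comb(total - c, depth) / denom for c in counts if c > 0)


def rarefaction_curve(counts, depths):
    """Expected richness at each requested depth."""
    return [expected_richness(counts, d) for d in depths]


if __name__ == "__main__":
    sample = [50, 30, 10, 5, 3, 1, 1]
    for depth, richness in zip([1, 10, 50, 100], rarefaction_curve(sample, [1, 10, 50, 100])):
        print(depth, round(richness, 2))
```

For an artificial community, a curve that plateaus near the known number of members (20 or 22 here) indicates that the pipeline neither splits nor misses community members at that depth, whereas a curve that keeps rising suggests spurious OTUs.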

Supplementary Figure 12 Correlation between taxonomic profiles from whole metagenome shotgun (WMS) and 16S amplicon sequence data on stool and oral artificial communities.

Relative abundances calculated from WMS (horizontal) and 16S rRNA gene (vertical) artificial community sequence data for 17 and 19 species from the gut and oral artificial communities, respectively, which were identifiable from both sequence sets. Each point represents the species relative abundance interquartile ranges (IQRs) for WMS and 16S; the IQRs intersect at respective median values. Spearman rho correlation coefficients are shown in the top left of each plot. The dashed line marks the diagonal, where WMS and 16S relative abundances are equal. Data are summarized from 43, 36, 43, and 40 artificial community 16S amplicon samples for gut centrally and locally, and oral centrally and locally extracted DNA samples, respectively; twelve WMS samples, three in each respective group, were summarized in each subplot.

Supplementary Figure 13 Mismatched raw reads in artificial communities account for ~30% of sequences and vary by handling laboratory.

A) Fecal and B) oral artificial community fractions of reads exactly matching one of the 20 or 22 reference 16S rRNA gene sequences, respectively. Each bar represents one sample, with copy numbers varying depending on the data deposited and the number of sample sets handled. C) Fecal and D) oral per-nucleotide error rates estimated based on reads containing exactly one mismatch to reference 16S rRNA gene sequences. E) Fecal and F) oral reads identical to references but offset by either one or two nucleotides. G) Fecal and H) oral reads containing chimeric sequences from two known community members (as determined by exhaustive search of all reference pairs).
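
The four read categories in this caption (exact match, exactly one mismatch, offset by one or two nucleotides, and chimera of two known members) can be illustrated with a small classifier. This is a simplified sketch, not the consortium's procedure: it assumes ungapped reads that nominally start at the first base of each reference, references at least two bases longer than the read, and a fixed order in which the categories are tested.

```python
from itertools import permutations


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def classify_read(read, refs, max_offset=2):
    """Assign a read to 'exact', 'one_mismatch', 'offset', 'chimera', or 'other'."""
    n = len(read)
    windows = [ref[:n] for ref in refs]
    if read in windows:
        return "exact"
    if any(hamming(read, w) == 1 for w in windows):
        return "one_mismatch"
    for ref in refs:
        for k in range(1, max_offset + 1):
            # Read starts k bases downstream of, or k bases upstream of, the reference start.
            if ref[k:k + n] == read or read[k:] == ref[:n - k]:
                return "offset"
    # Exhaustive search over ordered reference pairs and breakpoints.
    for left, right in permutations(refs, 2):
        for bp in range(1, n):
            if read[:bp] == left[:bp] and read[bp:] == right[bp:n]:
                return "chimera"
    return "other"


if __name__ == "__main__":
    refs = ["ACGTACGTAC", "TTGGCCAATT"]  # toy references, not real 16S sequences
    for read in ["ACGTACGT", "ACGTACGA", "CGTACGTA", "ACGTCCAA", "GGGGGGGG"]:
        print(read, classify_read(read, refs))
```

The order of the tests matters when references are closely related: a read that differs from one reference by a single base may also be explainable as a chimera of two similar references, and this sketch reports the simpler explanation first.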

Supplementary Figure 14 Reads in oral artificial communities are often apparently derived from abundant taxa in non-artificial samples.

Rows correspond to OTUs abundant in non-artificial MBQC-base samples, columns to samples (grouped by handling lab). Averages on the right are per lab in the same order across all non-artificial samples. Only results from the BL-9B bioinformatics method are shown for simplicity, and handling lab HL-A samples are omitted due to an incongruous pattern of apparent external contamination by other organisms (see Supplementary Fig. 10).

Supplementary Figure 15 The significance of all variables from a multivariate model of experimental and bioinformatic protocol variables.

A) Magnitudes and significance levels of only fixed effects from a random effects model capturing all handling and bioinformatics lab variables for which sufficient measurements were available (see Supplementary Table 7). Comparably high variability in phylum-level taxonomic abundance readouts was associated with subsets of both handling and bioinformatics experimental protocol choices. B) Magnitudes and significance levels of only random effects from the full model. These highlight large differences induced by biological variation relative to sample handling or bioinformatics protocol choices on the resulting abundances. C) Random effects from a simplified model including only individual handling and bioinformatics laboratory identifiers and differences between pre- and locally-extracted samples. D) Fixed effects of sample handling and bioinformatics laboratories from the simplified model. These suggest that variability in taxonomic profiling is primarily driven by sample handling protocol choices, with lesser differences induced at the phylum level by bioinformatics choices. All effects were evaluated using a likelihood ratio test with Benjamini-Hochberg-Yekutieli correction within models.
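
The multiple-testing step can be sketched as below, assuming that "Benjamini-Hochberg-Yekutieli correction" refers to the Benjamini-Yekutieli step-up procedure, which controls the false discovery rate under arbitrary dependence between tests. The function name is hypothetical and the likelihood ratio tests that would supply the p-values are not reproduced.

```python
def by_adjust(pvalues):
    """Benjamini-Yekutieli adjusted p-values, returned in the input order.

    For the p-value of rank r among m tests, the adjusted value is the
    minimum over ranks j >= r of p_(j) * m * c(m) / j, capped at 1, where
    c(m) = 1 + 1/2 + ... + 1/m."""
    m = len(pvalues)
    if m == 0:
        return []
    c_m = sum(1.0 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the running minimum enforces monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m * c_m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    print(by_adjust([0.01, 0.04, 0.03]))
```

Applying the adjustment "within models", as the caption states, would mean calling such a function once per model on that model's set of p-values rather than once on the pooled set.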

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–15 (PDF 3507 kb)

Life Sciences Reporting Summary (PDF 158 kb)

Supplementary Notes 1–9 and Supplementary Tables 1–9 (PDF 801 kb)

Supplementary Data set 1

Extended literature review and categorization of microbiome protocol studies. (XLSX 117 kb)

Supplementary Data set 2

Handling protocol variables recorded during the MBQC-base. (XLSX 25 kb)

Supplementary Data set 3

Bioinformatics protocol variable reporting. (XLSX 24 kb)

Supplementary Data set 4

Handling protocol variables curated from the MBQC-base. (XLSX 16 kb)

Supplementary Data set 5

Raw read counts per sample. (XLSX 41 kb)

Supplementary Data set 6

MBQC-base OTU table. (ZIP 14543 kb)

Supplementary Data set 7

Alpha and beta diversities stratified by sample type within and between handling and bioinformatics laboratories. (ZIP 7139 kb)

Supplementary Data set 8

Beta-diversity divergence from artificial community positive controls, stratified by handling and bioinformatics laboratories. (TXT 2 kb)

Cite this article

Sinha, R., Abu-Ali, G., Vogtmann, E. et al. Assessment of variation in microbial community amplicon sequencing by the Microbiome Quality Control (MBQC) project consortium. Nat Biotechnol 35, 1077–1086 (2017). https://doi.org/10.1038/nbt.3981

  • DOI: https://doi.org/10.1038/nbt.3981
