Differential abundance analysis for microbial marker-gene surveys


We introduce a methodology to assess differential abundance in sparse high-throughput microbial marker-gene survey data. Our approach, implemented in the metagenomeSeq Bioconductor package, relies on a novel normalization technique and a statistical model that accounts for undersampling—a common feature of large-scale marker-gene studies. Using simulated data and several published microbiota data sets, we show that metagenomeSeq outperforms the tools currently used in this field.

Figure 1: Clustering analysis is improved substantially by CSS normalization.
Figure 2: Simulation results indicated that metagenomeSeq has greater sensitivity and specificity in a variety of settings.


Acknowledgements

J.N.P. was supported by a US National Science Foundation Graduate Research Fellowship (award DGE0750616). J.N.P., O.C.S. and M.P. were supported in part by the Bill and Melinda Gates Foundation (award 42917 to O.C.S.). H.C.B. was supported in part by the US National Institutes of Health grant 5R01HG005220. We would like to thank B. Lindsay and L. Magder for discussion of the methods and C.M. Hill for help with clustering of OTUs.

Author information

Contributions

J.N.P. and H.C.B. developed the algorithms and wrote the software. J.N.P. collected results. O.C.S. and M.P. contributed to discussions of the methods. J.N.P., H.C.B. and M.P. analyzed results. J.N.P., H.C.B. and M.P. wrote the manuscript. All authors read and approved the manuscript.

Corresponding authors

Correspondence to Héctor Corrada Bravo or Mihai Pop.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

