Striped UniFrac: enabling microbiome analysis at unprecedented scale



Fig. 1: Algorithm description and empirical performance results.

Data availability

The datasets analyzed during the current study are available in the Qiita repository with the specific study accessions in Supplementary Data 1, and were extracted with Qiita’s redbiom interface.




This work was supported by the NSF (grant DBI-1565100 to D.M., Y.V.-B., Z.X., A.G., and R.K.; award 1664803 to D.K. and J.M.), the Alfred P. Sloan Foundation (G-2017-9838 to D.M., Y.V.-B., A.G., and R.K.; G-2015-13933 to A.G. and R.K.), ONR (grant N00014-15-1-2809 to D.M., A.G., and R.K.), and NIH–NIDDK (grant P01DK078669 to A.G. and R.K.). This work was partially supported by XSEDE resource grant BIO150043. Additional support was provided by CRISP, one of six centers in JUMP, a Semiconductor Research Corporation (SRC) program sponsored by DARPA.

Author information




D.M. designed Striped UniFrac, planned the study, analyzed data, and wrote the manuscript. Y.V.-B. integrated Striped UniFrac with QIIME 2 and contributed to the manuscript. D.K. and J.M. contributed to the proof. N.R. contributed language interface code. Z.X. contributed to the manuscript. A.G. integrated Striped UniFrac with Qiita. R.K. planned the study and wrote the manuscript.

Corresponding author

Correspondence to Rob Knight.

Ethics declarations

Competing interests

R.K. is a founder and CSO of Biota Technology Inc. D.M. is a consultant with Biota Technology Inc.

Integrated supplementary information

Supplementary Figure 1 Parallel scaling and heuristic correlations.

(A-B) Walltime and memory distributions of independent processes operating on the full Earth Microbiome Project dataset (n = 26,181) executing on shared compute nodes. An individual partition represents a single independent process, and each process was run with two threads; 32 partitions indicates 32 processes using two threads each. A higher partition count means each individual process is doing less work. Box plots show the median; whiskers extend 1.5 times the interquartile range past the 25th and 75th percentiles; the number of data points in each box plot is the number of partitions in the processing run. (C) An empirical assessment of the number of proportion vectors that must be retained in memory over increasing tree sizes. This assessment was performed by randomly sampling tips from the Greengenes 99% OTU tree and counting the maximum number of nodes required to hold proportion vectors resident in memory. Box plots show the median; whiskers extend 1.5 times the interquartile range past the 25th and 75th percentiles; each box plot represents 10 independent experiments. (D) Empirical assessment of the runtime of Striped UniFrac for 1,024 samples over increasing numbers of tips in a phylogeny. (E) Mantel tests (Pearson) between Striped UniFrac in exact mode, which produces identical results to UniFrac, and fast mode, in which the UniFrac distances are not computed at the tips of the tree during traversal. Each data point represents n = 10 random subsets (independent experiments) of the Earth Microbiome Project Deblur 90-nt dataset, with the mean R² value depicted. Error bars are 95% CI around the mean. The figure data can be found in Supplementary Data 3.
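The Mantel comparison in panel E can be illustrated with a minimal sketch. This is not the authors' code: the function name `mantel_pearson`, its parameters, and the two-sided permutation scheme are assumptions made for illustration, using only NumPy. It correlates the upper triangles of two square distance matrices (for example, exact-mode and fast-mode UniFrac results over the same samples) and estimates a p-value by jointly permuting the rows and columns of one matrix.

```python
import numpy as np


def mantel_pearson(dm_x, dm_y, permutations=999, seed=0):
    """Illustrative Mantel test (Pearson) between two distance matrices.

    Hypothetical helper, not part of Striped UniFrac. Both inputs must be
    square, symmetric matrices over the same samples in the same order.
    Returns (r, p), where p is a two-sided permutation p-value.
    """
    x = np.asarray(dm_x, dtype=float)
    y = np.asarray(dm_y, dtype=float)
    if x.ndim != 2 or x.shape[0] != x.shape[1] or x.shape != y.shape:
        raise ValueError("inputs must be square matrices of equal shape")

    n = x.shape[0]
    upper = np.triu_indices(n, k=1)  # each sample pair counted once
    x_flat = x[upper]

    def pearson(a, b):
        return np.corrcoef(a, b)[0, 1]

    r_obs = pearson(x_flat, y[upper])

    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(permutations):
        order = rng.permutation(n)
        # Permute sample labels: rows and columns are shuffled together.
        y_perm = y[np.ix_(order, order)]
        if abs(pearson(x_flat, y_perm[upper])) >= abs(r_obs):
            hits += 1

    # Add one to numerator and denominator so p is never exactly zero.
    p_value = (hits + 1) / (permutations + 1)
    return r_obs, p_value
```

Squaring the returned `r` gives an R² statistic of the kind the caption reports. For real analyses, an established implementation such as the `mantel` function in scikit-bio's `skbio.stats.distance` module is likely a better choice than this sketch.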

Supplementary information

Supplementary Text and Figures

Supplementary Figure 1 and Supplementary Note 1

Reporting Summary

Supplementary Data 1

table_s1.xlsx, the Qiita study accessions used.

Supplementary Data 2

figure1-data.xlsx, the data necessary to re-create panels c and d in Fig. 1.

Supplementary Data 3

figureS1-data.xlsx, the data necessary to re-create Supplementary Fig. 1.

Supplementary Software

Unifrac.tar.gz, the version of UniFrac used in the study.


Cite this article

McDonald, D., Vázquez-Baeza, Y., Koslicki, D. et al. Striped UniFrac: enabling microbiome analysis at unprecedented scale. Nat Methods 15, 847–848 (2018).
