Striped UniFrac: enabling microbiome analysis at unprecedented scale

McDonald, Daniel; Vázquez-Baeza, Yoshiki; Koslicki, David; McClelland, Jason; Reeve, Nicolai; Xu, Zhenjiang; Gonzalez, Antonio; Knight, Rob

doi:10.1038/s41592-018-0187-8

Correspondence
Published: 30 October 2018

Nature Methods volume 15, pages 847–848 (2018)

To the Editor — The UniFrac metric is used frequently in microbiome research, but it does not scale to today’s large datasets. We propose a new algorithm, Striped UniFrac, which produces results identical to those of previous algorithms but requires dramatically less memory and computing power. A BSD-licensed implementation is available that produces a C shared library linkable by any programming language (Supplementary Software and https://github.com/biocore/unifrac).

UniFrac¹ is a phylogenetic distance metric used to compare pairs of microbiome profiles. Microbiome studies now encompass tens of thousands of samples, such as the 27,751-sample Earth Microbiome Project (EMP)² and the 15,096-sample American Gut Project³. Existing algorithms for UniFrac computation cannot scale in time or space to these study designs. For example, Fast UniFrac with the EMP was projected to take months. Striped UniFrac produces results identical to those of other existing algorithms, shows >30-fold improvement in single-threaded performance and near-linear parallel scaling (Supplementary Fig. 1a,b), and can process the EMP dataset on a laptop in less than 24 hours. It can enable scientists to derive new biological insights, as shown by a meta-analysis³ of the American Gut Project and EMP. To demonstrate the utility of the algorithm, we computed UniFrac on 113,721 public samples in Qiita⁴ in less than 48 hours using 256 CPUs (an interactive plot is available at https://bit.ly/2LHMDFC).
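
As a rough illustration of what the metric measures (this is not the Striped UniFrac algorithm itself), unweighted UniFrac between two samples can be computed as the fraction of observed branch length that leads to taxa present in only one of the two samples. The sketch below assumes a hypothetical toy tree and three made-up samples; the names `TREE`, `branches_observed`, `unweighted_unifrac`, and `n_pairs` are illustrative and are not part of the unifrac library.

```python
# Hypothetical toy tree, stored as child -> (parent, branch length to parent).
# Each branch is identified by the node at its lower (child) end.
TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "n1": ("root", 1.0),
    "C": ("root", 2.0),
}


def branches_observed(tips):
    """Return every branch on the path from any observed tip to the root."""
    seen = set()
    for node in tips:
        # Walk upward until the root (which has no entry in TREE) is reached.
        while node in TREE and node not in seen:
            seen.add(node)
            node = TREE[node][0]
    return seen


def unweighted_unifrac(tips_u, tips_v):
    """Unique branch length divided by total observed branch length."""
    bu = branches_observed(tips_u)
    bv = branches_observed(tips_v)
    unique = sum(TREE[b][1] for b in bu ^ bv)  # branches seen in one sample only
    total = sum(TREE[b][1] for b in bu | bv)   # branches seen in either sample
    return unique / total if total else 0.0


def n_pairs(n_samples):
    """Number of distinct sample pairs in an all-against-all comparison."""
    return n_samples * (n_samples - 1) // 2


# Made-up samples, each given as the set of tips (taxa) it contains.
s1, s2, s3 = {"A", "B"}, {"B", "C"}, {"C"}
print(unweighted_unifrac(s1, s2))
print(unweighted_unifrac(s1, s3))
print(n_pairs(113_721))
```

The pair count grows quadratically with the number of samples, so the 113,721 samples mentioned above imply roughly 6.5 billion pairwise distances, which is why both time and memory become limiting at this scale.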

**Fig. 1: Algorithm description and empirical performance results.**

Data availability

The datasets analyzed during the current study are available in the Qiita repository with the specific study accessions in Supplementary Data 1, and were extracted with Qiita’s redbiom interface.

References

1. Lozupone, C. & Knight, R. Appl. Environ. Microbiol. 71, 8228–8235 (2005).
2. Thompson, L. R. et al. Nature 551, 457–463 (2017).
3. McDonald, D. et al. mSystems 3, e00031-18 (2018).
4. Gonzalez, A. et al. Nat. Methods 15, 796–798 (2018).
5. Caporaso, J. G. et al. Nat. Methods 7, 335–336 (2010).
6. Chang, Q., Luan, Y. & Sun, F. BMC Bioinformatics 12, 118 (2011).
7. Chen, J. et al. Bioinformatics 28, 2106–2113 (2012).
8. McMurdie, P. J. & Holmes, S. PLoS One 8, e61217 (2013).
9. Amir, A. et al. mSystems 2, e00191-16 (2017).


Acknowledgements

This work was supported by the NSF (grant DBI-1565100 to D.M., Y.V.-B., Z.X., A.G., and R.K.; award 1664803 to D.K. and J.M.), the Alfred P. Sloan Foundation (G-2017-9838 to D.M., Y.V.-B., A.G., and R.K.; G-2015-13933 to A.G. and R.K.), ONR (grant N00014-15-1-2809 to D.M., A.G., and R.K.), and NIH–NIDDK (grant P01DK078669 to A.G. and R.K.). This work was partially supported by XSEDE resource grant BIO150043. Additional support was provided by CRISP, one of six centers in JUMP, a Semiconductor Research Corporation (SRC) program sponsored by DARPA.

Author information

Nicolai Reeve
Present address: Biota Technology Inc., La Jolla, CA, USA

Authors and Affiliations

Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
Daniel McDonald, Yoshiki Vázquez-Baeza, Nicolai Reeve, Zhenjiang Xu, Antonio Gonzalez & Rob Knight
Mathematics Department, Oregon State University, Corvallis, OR, USA
David Koslicki & Jason McClelland
Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA
Rob Knight
Center for Microbiome Innovation, University of California, San Diego, La Jolla, CA, USA
Rob Knight
Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
Rob Knight

Contributions

D.M. designed Striped UniFrac, planned the study, analyzed data, and wrote the manuscript. Y.V.-B. integrated Striped UniFrac with QIIME 2 and contributed to the manuscript. D.K. and J.M. contributed to the proof. N.R. contributed language interface code. Z.X. contributed to the manuscript. A.G. integrated Striped UniFrac with Qiita. R.K. planned the study and wrote the manuscript.

Corresponding author

Correspondence to Rob Knight.

Ethics declarations

Competing interests

R.K. is a founder and CSO of Biota Technology Inc. D.M. is a consultant with Biota Technology Inc.

Integrated supplementary information

Supplementary Figure 1 Parallel scaling and heuristic correlations.

(A-B) Walltime and memory distributions of independent processes operating on the full Earth Microbiome Project dataset (n = 26,181) executing on shared compute nodes. An individual partition represents a single independent process, and each process was run with two threads; 32 partitions indicates 32 processes using two threads each. A higher partition count means each individual process is doing less work. Box plots show the median; whiskers extend 1.5 times the interquartile range past the 25th and 75th percentiles; the number of data points in each box plot is the number of partitions in the processing run. (C) An empirical assessment of the number of proportion vectors that must be retained in memory over increasing tree sizes. This assessment was performed by randomly sampling tips from the Greengenes 99% OTU tree and counting the maximum number of nodes required to hold proportion vectors resident in memory. Box plots show the median; whiskers extend 1.5 times the interquartile range past the 25th and 75th percentiles; each box plot represents 10 independent experiments. (D) Empirical assessment of the runtime of Striped UniFrac for 1,024 samples over increasing numbers of tips in a phylogeny. (E) Mantel tests (Pearson) between Striped UniFrac in exact mode, which produces results identical to UniFrac, and fast mode, in which the UniFrac distances are not computed at the tips of the tree during traversal. Each data point represents n = 10 random subsets (independent experiments) of the Earth Microbiome Project Deblur 90-nt dataset, with the mean R² value depicted. Error bars are 95% CI around the mean. The figure data can be found in Supplementary Data 3.
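
For readers unfamiliar with the comparison in panel E, a Mantel test correlates the corresponding off-diagonal entries of two distance matrices and assesses significance by permuting the sample order of one matrix. The following is a minimal standard-library sketch of a two-sided Pearson Mantel test, assuming symmetric square matrices given as nested lists; the function names, the toy matrix, and the permutation count are illustrative assumptions and do not reproduce the code used in the study.

```python
import random
from math import sqrt


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def mantel(d1, d2, permutations=999, seed=0):
    """Return (Pearson r, two-sided permutation p-value) for two matrices."""
    rng = random.Random(seed)
    n = len(d1)
    flat2 = upper_triangle(d2)
    observed = pearson(upper_triangle(d1), flat2)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        # Permute rows and columns of d1 together, i.e. relabel its samples.
        rng.shuffle(order)
        permuted = [[d1[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if abs(pearson(upper_triangle(permuted), flat2)) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Made-up 4 x 4 distance matrix, compared against itself as a sanity check.
toy = [
    [0.0, 0.2, 0.5, 0.9],
    [0.2, 0.0, 0.4, 0.8],
    [0.5, 0.4, 0.0, 0.3],
    [0.9, 0.8, 0.3, 0.0],
]
r, p = mantel(toy, toy)
print(r ** 2, p)
```

Squaring the returned coefficient gives an R² value of the kind reported in the panel.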

Supplementary information

Supplementary Text and Figures

Supplementary Figure 1 and Supplementary Note 1

Reporting Summary

Supplementary Data 1

table_s1.xlsx, the Qiita study accessions used.

Supplementary Data 2

figure1-data.xlsx, the data necessary to re-create panels c and d in Fig. 1.

Supplementary Data 3

figureS1-data.xlsx, the data necessary to re-create Supplementary Fig. 1.

Supplementary Software

Unifrac.tar.gz, the version of UniFrac used in the study.

About this article

Cite this article

McDonald, D., Vázquez-Baeza, Y., Koslicki, D. et al. Striped UniFrac: enabling microbiome analysis at unprecedented scale. Nat Methods 15, 847–848 (2018). https://doi.org/10.1038/s41592-018-0187-8

Published: 30 October 2018
Issue Date: November 2018
DOI: https://doi.org/10.1038/s41592-018-0187-8
