Correspondence | Published:

Striped UniFrac: enabling microbiome analysis at unprecedented scale

Nature Methods volume 15, pages 847–848 (2018)

Data availability

The datasets analyzed during the current study are available in the Qiita repository with the specific study accessions in Supplementary Data 1, and were extracted with Qiita’s redbiom interface.

Acknowledgements

This work was supported by the NSF (grant DBI-1565100 to D.M., Y.V.-B., Z.X., A.G., and R.K.; award 1664803 to D.K. and J.M.), the Alfred P. Sloan Foundation (G-2017-9838 to D.M., Y.V.-B., A.G., and R.K.; G-2015-13933 to A.G. and R.K.), ONR (grant N00014-15-1-2809 to D.M., A.G., and R.K.), and NIH–NIDDK (grant P01DK078669 to A.G. and R.K.). This work was partially supported by XSEDE resource grant BIO150043. Additional support was provided by CRISP, one of six centers in JUMP, a Semiconductor Research Corporation (SRC) program sponsored by DARPA.

Author information

Author notes

    • Nicolai Reeve

    Present address: Biota Technology Inc., La Jolla, CA, USA

Affiliations

  1. Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA

    • Daniel McDonald, Yoshiki Vázquez-Baeza, Nicolai Reeve, Zhenjiang Xu, Antonio Gonzalez & Rob Knight

  2. Mathematics Department, Oregon State University, Corvallis, OR, USA

    • David Koslicki & Jason McClelland

  3. Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA

    • Rob Knight

  4. Center for Microbiome Innovation, University of California, San Diego, La Jolla, CA, USA

    • Rob Knight

  5. Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA

    • Rob Knight

Authors

  1. Daniel McDonald

  2. Yoshiki Vázquez-Baeza

  3. David Koslicki

  4. Jason McClelland

  5. Nicolai Reeve

  6. Zhenjiang Xu

  7. Antonio Gonzalez

  8. Rob Knight

Contributions

D.M. designed Striped UniFrac, planned the study, analyzed data, and wrote the manuscript. Y.V.-B. integrated Striped UniFrac with QIIME 2 and contributed to the manuscript. D.K. and J.M. contributed to the proof. N.R. contributed language interface code. Z.X. contributed to the manuscript. A.G. integrated Striped UniFrac with Qiita. R.K. planned the study and wrote the manuscript.

Competing interests

R.K. is a founder and CSO of Biota Technology Inc. D.M. is a consultant with Biota Technology Inc.

Corresponding author

Correspondence to Rob Knight.

Integrated supplementary information

  1. Supplementary Figure 1 Parallel scaling and heuristic correlations.

    (A–B) Walltime and memory distributions of independent processes operating on the full Earth Microbiome Project dataset (n = 26,181) executing on shared compute nodes. An individual partition represents a single independent process, and each process was run with two threads; 32 partitions therefore indicates 32 processes using two threads each. A higher partition count means each individual process does less work. Box plots show the median; whiskers extend 1.5 times the interquartile range past the 25th and 75th percentiles; the number of data points in each box plot is the number of partitions in the processing run. (C) An empirical assessment of the number of proportion vectors that must be retained in memory over increasing tree sizes. This assessment was performed by randomly sampling tips from the Greengenes 99% OTU tree and counting the maximum number of nodes required to hold proportion vectors resident in memory. Box plots show the median; whiskers extend 1.5 times the interquartile range past the 25th and 75th percentiles; each box plot represents 10 independent experiments. (D) Empirical assessment of the runtime of Striped UniFrac for 1,024 samples over increasing numbers of tips in a phylogeny. (E) Mantel tests (Pearson) between Striped UniFrac in exact mode, which produces identical results to UniFrac, and fast mode, in which the UniFrac distances are not computed at the tips of the tree during traversal. Each data point represents n = 10 random subsets (independent experiments) of the Earth Microbiome Project Deblur 90-nt dataset, with the mean R² value depicted. Error bars are 95% CI around the mean. The figure data can be found in Supplementary Data 3.
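    The Pearson statistic behind the Mantel comparison in panel E can be illustrated with a minimal sketch, assuming two symmetric distance matrices supplied as nested lists. The function names `condensed` and `mantel_pearson` and the toy matrices are hypothetical and are not taken from the study's software. The sketch computes only the correlation between matching off-diagonal entries and omits the permutation step that a full Mantel test uses to assess significance.

```python
import math


def condensed(matrix):
    """Return the strict upper-triangle entries of a square matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def mantel_pearson(dm_a, dm_b):
    """Pearson r between matching off-diagonal entries of two distance matrices."""
    if len(dm_a) != len(dm_b):
        raise ValueError("distance matrices must have the same shape")
    x, y = condensed(dm_a), condensed(dm_b)
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy 4-sample matrices (made-up values, for illustration only).
exact = [
    [0.0, 0.2, 0.5, 0.7],
    [0.2, 0.0, 0.4, 0.6],
    [0.5, 0.4, 0.0, 0.3],
    [0.7, 0.6, 0.3, 0.0],
]
# A second matrix that is an exact linear rescaling of the first.
fast = [[0.9 * value for value in row] for row in exact]

r = mantel_pearson(exact, fast)
r_squared = r ** 2  # the quantity reported as R² in the caption
```

    Because the second toy matrix is a linear rescaling of the first, the correlation here is essentially 1; real exact-mode and fast-mode matrices would generally differ by more than a scale factor.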

Supplementary information

  1. Supplementary Text and Figures

    Supplementary Figure 1 and Supplementary Note 1

  2. Reporting Summary

  3. Supplementary Data 1

    table_s1.xlsx, the Qiita study accessions used.

  4. Supplementary Data 2

    figure1-data.xlsx, the data necessary to re-create panels c and d in Fig. 1.

  5. Supplementary Data 3

    figureS1-data.xlsx, the data necessary to re-create Supplementary Fig. 1.

  6. Supplementary Software

    Unifrac.tar.gz, the version of UniFrac used in the study.

About this article

DOI

https://doi.org/10.1038/s41592-018-0187-8
