Metagenome-assembled genome extraction and analysis from microbiomes using KBase

A Publisher Correction to this article was published on 30 November 2022

This article has been updated

Abstract

Uncultivated Bacteria and Archaea account for the vast majority of species on Earth, but obtaining their genomes directly from the environment using shotgun sequencing has become possible only recently. To realize the hope of capturing Earth’s microbial genetic complement and to facilitate the investigation of the functional roles of specific lineages in a given ecosystem, technologies that accelerate the recovery of high-quality genomes are necessary. We present a series of analysis steps and data products for the extraction of high-quality metagenome-assembled genomes (MAGs) from microbiomes using the U.S. Department of Energy Systems Biology Knowledgebase (KBase) platform (http://www.kbase.us/). Overall, obtaining extracted genomes takes about a day when starting from smaller environmental shotgun read libraries, and up to about a week when starting from larger libraries. In KBase, the process is end-to-end: a user can go from the initial sequencing reads all the way to MAGs, which can then be analyzed with other KBase capabilities such as phylogenetic placement, functional assignment, metabolic modeling, pangenome functional profiling, RNA-Seq and others. Although some of these capabilities are available individually from other resources, KBase combines intuitive usability, data interoperability and tool integration in a freely available computational resource, which makes it a powerful platform for obtaining MAGs from microbiomes. Beyond offering tools for each of the key steps in the genome extraction process, the workflow provides a scaffold that can easily be extended with additional MAG recovery and analysis tools via the KBase software development kit (SDK).
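
Whether a bin counts as a ‘high-quality’ MAG is commonly judged against the MIMAG standard (ref. 53), using completeness and contamination estimates such as those produced by CheckM (ref. 49). The Python sketch below is not part of the KBase workflow. It assumes the MIMAG draft-genome thresholds (high quality: >90% completeness, <5% contamination, 23S, 16S and 5S rRNA genes and tRNAs for at least 18 of the 20 amino acids; medium quality: ≥50% completeness, <10% contamination; low quality: <50% completeness, <10% contamination). The `BinQuality` container, the `mimag_tier` function and the example values are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class BinQuality:
    """Hypothetical container for one bin's quality estimates (e.g. from CheckM)."""
    name: str
    completeness: float           # estimated completeness, percent
    contamination: float          # estimated contamination, percent
    has_rrna_genes: bool = False  # 23S, 16S and 5S rRNA genes all detected
    trna_types: int = 0           # tRNAs detected, counted by amino acid (out of 20)


def mimag_tier(q: BinQuality) -> str:
    """Assign a MIMAG-style draft tier from completeness, contamination, rRNA and tRNA evidence."""
    if (q.completeness > 90 and q.contamination < 5
            and q.has_rrna_genes and q.trna_types >= 18):
        return "high-quality draft"
    if q.completeness >= 50 and q.contamination < 10:
        return "medium-quality draft"
    if q.contamination < 10:
        # Reaching here with contamination < 10 implies completeness < 50.
        return "low-quality draft"
    # Contamination of 10% or more falls outside all three MIMAG draft tiers.
    return "below MIMAG draft thresholds"


if __name__ == "__main__":
    # Made-up example values, not results from the Compost or Moab Desert Crust Narratives.
    bins = [
        BinQuality("bin.001", 96.2, 1.3, has_rrna_genes=True, trna_types=19),
        BinQuality("bin.002", 93.0, 2.0),  # complete, but no rRNA/tRNA evidence recorded
        BinQuality("bin.003", 41.5, 0.8),
        BinQuality("bin.004", 88.0, 14.7),
    ]
    for b in bins:
        print(b.name, mimag_tier(b))
```

Note that a bin with excellent completeness and contamination estimates (bin.002 above) still lands in the medium tier unless rRNA and tRNA evidence is supplied, which is why the rRNA and tRNA checks are kept as explicit fields rather than folded into a single score.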

Fig. 1: Overview of MAG extraction data and analysis workflow using KBase apps.
Fig. 2: Example FastQC app report, before and after base call quality trimming with Trimmomatic.
Fig. 3: Stacked bar plots of lineage abundance measured in the compost enrichment by the Kaiju app.
Fig. 4: Krona plot of lineages measured in the compost enrichment by the Kaiju app.
Fig. 5: Cumulative and sorted contig lengths of different assemblies from the compost enrichment.
Fig. 6: Summary statistics and histograms of contig lengths of different assemblies.
Fig. 7: MaxBin2 bin contig plot.
Fig. 8: DAS-Tool bin optimization plot.
Fig. 9: CheckM quality assessment of bins plot.
Fig. 10: Diagram of bin extraction to assembly data objects.
Fig. 11: Phylogenetic placement of MAGs.
Fig. 12: DRAM functional classification of MAGs.
Fig. 13: Gene identification of targeted domain families.
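
Fig. 2 compares base-call quality before and after trimming with Trimmomatic (ref. 32). As a rough illustration of sliding-window quality trimming, the Python sketch below decodes quality strings, assuming the Phred+33 encoding used by current Illumina FASTQ files (see ref. 83 for Phred scores), and truncates a read at the first window whose mean score falls below a threshold. This is a simplified assumption, not Trimmomatic’s code: the defaults (a 4-base window and a threshold of 15) are illustrative, the function names are hypothetical, and this version cuts at the start of the failing window, whereas Trimmomatic’s own implementation may differ in how it treats bases inside that window.

```python
from __future__ import annotations


def phred33_scores(quality: str) -> list[int]:
    """Decode a FASTQ quality string, assuming Phred+33 encoding, into integer scores."""
    return [ord(ch) - 33 for ch in quality]


def sliding_window_trim(seq: str, quality: str,
                        window: int = 4, threshold: float = 15.0) -> tuple[str, str]:
    """Truncate a read at the first window whose mean Phred score is below `threshold`.

    Simplified sketch: the cut is made at the start of the failing window.
    Reads shorter than `window` are returned unchanged.
    """
    if len(seq) != len(quality):
        raise ValueError("sequence and quality strings must have equal length")
    if window < 1:
        raise ValueError("window must be at least 1")
    scores = phred33_scores(quality)
    # Scan windows from the 5' end towards the 3' end.
    for start in range(len(scores) - window + 1):
        mean_q = sum(scores[start:start + window]) / window
        if mean_q < threshold:
            return seq[:start], quality[:start]
    return seq, quality


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 under Phred+33 (made-up read).
    print(sliding_window_trim("ACGTACGTAC", "IIIIIII###"))
```

With this made-up read the sketch prints `('ACGTAC', 'IIIIII')`: the seventh base is Q40 but is still removed, because the first failing window starts at that base. That loss is the price of the simplified cut rule described above.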

Data availability

The analyses and data discussed are available via the ‘dynamic’ KBase Narratives https://narrative.kbase.us/narrative/33233 (Compost) and https://narrative.kbase.us/narrative/62384 (Moab Desert Crust). In addition, a ‘static’ HTML Narrative has been published on KBase [https://docs.kbase.us/getting-started/narrative/share#publishing-a-static-narrative] from each of these dynamic Narratives; they are available at https://kbase.us/n/33233/628/ (Compost, ref. 78; https://doi.org/10.25982/33233.606/1831502) and https://kbase.us/n/62384/334/ (Moab Desert Crust, ref. 79; https://doi.org/10.25982/62384.253/1831503). All input and derived data objects can be exported in standard formats from the Narratives: click the object in the data panel at the upper left of the dynamic Narrative, then click its download arrow, as described at https://docs.kbase.us/data/upload-download-guide/downloads.
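
Because exported objects use standard formats, simple checks can also be run locally. The Python sketch below assumes an assembly has been downloaded as a FASTA file and computes a contig-length summary similar in spirit to Figs. 5 and 6: contig count, total length, longest contig and N50 (the largest length L such that contigs of length ≥ L contain at least half of all assembled bases). The function names and the optional `min_length` filter are hypothetical; this minimal parser also lets a duplicated record ID overwrite the earlier record.

```python
from __future__ import annotations

import io
from typing import Iterable, TextIO


def read_fasta_lengths(handle: TextIO) -> dict[str, int]:
    """Map each FASTA record ID (first word of its header) to its sequence length."""
    lengths: dict[str, int] = {}
    current: str | None = None
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            current = fields[0] if fields else f"unnamed_{len(lengths) + 1}"
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)  # sequences may be wrapped over several lines
    return lengths


def nx(lengths: Iterable[int], fraction: float = 0.5) -> int:
    """Nx statistic: largest L such that contigs of length >= L hold >= fraction of all bases."""
    ordered = sorted(lengths, reverse=True)
    target = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= target:
            return length
    return 0  # empty input


def contig_summary(lengths: dict[str, int], min_length: int = 0) -> dict[str, int]:
    """Summarise contig lengths, ignoring contigs shorter than min_length."""
    kept = [n for n in lengths.values() if n >= min_length]
    return {
        "contigs": len(kept),
        "total_bp": sum(kept),
        "longest": max(kept, default=0),
        "N50": nx(kept, 0.5),
    }


if __name__ == "__main__":
    # A tiny made-up assembly; in practice, open the exported FASTA file instead.
    fasta = io.StringIO(">c1 len=5\nACGTA\n>c2\nACG\nTAC\n>c3\nAC\n")
    print(contig_summary(read_fasta_lengths(fasta)))
```

On the made-up records above, the summary reports three contigs totalling 13 bp, a longest contig of 6 bp and an N50 of 5 bp. Passing a `min_length` mirrors the common practice of comparing assemblies only over contigs above a size cutoff.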

Code availability

All KBase code is open source under the MIT license and available on GitHub at https://github.com/kbase and https://github.com/kbaseapps. By policy, all externally developed software run in KBase is also open source and available from its respective repository, typically on GitHub, GitLab, Bitbucket or SourceForge (‘Code versions’ section).

References

  1. Hug, L. A. et al. A new view of the tree of life. Nat. Microbiol. 1, 16048 (2016).

  2. Spang, A. et al. Complex archaea that bridge the gap between prokaryotes and eukaryotes. Nature 521, 173–179 (2015).

  3. Tyson, G. W. et al. Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature 428, 37–43 (2004).

  4. Anantharaman, K. et al. Thousands of microbial genomes shed light on interconnected biogeochemical processes in an aquifer system. Nat. Commun. 7, 13219 (2016).

  5. Parks, D. H. et al. Recovery of nearly 8,000 metagenome-assembled genomes substantially expands the tree of life. Nat. Microbiol. 2, 1533–1542 (2017).

  6. Tully, B. J., Graham, E. D. & Heidelberg, J. F. The reconstruction of 2,631 draft metagenome-assembled genomes from the global oceans. Sci. Data 5, 170203 (2018).

  7. Stewart, R. D. et al. Assembly of 913 microbial genomes from metagenomic sequencing of the cow rumen. Nat. Commun. 9, 870 (2018).

  8. Pasolli, E. et al. Extensive unexplored human microbiome diversity revealed by over 150,000 genomes from metagenomes spanning age, geography and lifestyle. Cell 176, 649–662 (2019).

  9. Nayfach, S. et al. A genomic catalog of Earth’s microbiomes. Nat. Biotechnol. 39, 499–509, https://doi.org/10.1038/s41587-020-0718-6 (2021).

  10. Gilbert, J. A., Jansson, J. K. & Knight, R. The Earth Microbiome project: successes and aspirations. BMC Biol. 12, 69 (2014).

  11. Saheb Kashaf, S., Almeida, A., Segre, J. A. & Finn, R. D. Recovering prokaryotic genomes from host-associated, short-read shotgun metagenomic sequencing data. Nat. Protoc. 16, 2520–2541 (2021).

  12. Chong, J., Liu, P., Zhou, G. & Xia, J. Using MicrobiomeAnalyst for comprehensive statistical, functional, and meta-analysis of microbiome data. Nat. Protoc. 15, 799–821 (2020).

  13. Arkin, A. P. et al. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nat. Biotechnol. 36, 566–569 (2018).

  14. Sayers, E. W. et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 49, D10–D17 (2021).

  15. Kluyver, T. et al. Jupyter Notebooks – a publishing format for reproducible computational workflows. In Positioning and Power in Academic Publishing: Players, Agents and Agendas (eds Loizides, F. & Schmidt, B.) 87–90 (2016).

  16. Banfield, J. Development of a Knowledgebase to Integrate, Analyze, Distribute, and Visualize Microbial Community Systems Biology Data. Report no. DOE-UCB-4918, OSTI ID 1167269 (2015).

  17. Chen, I.-M. A. et al. IMG/M v.5.0: an integrated data management and comparative analysis system for microbial genomes and microbiomes. Nucleic Acids Res. 47, D666–D677 (2019).

  18. Afgan, E. et al. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update. Nucleic Acids Res. 44, W3–W10 (2016).

  19. Devisetty, U. K., Kennedy, K., Sarando, P., Merchant, N. & Lyons, E. Bringing your tools to CyVerse discovery environment using Docker. F1000Res. 5, 1442 (2016).

  20. Wang, L., Lu, Z., Van Buren, P. & Ware, D. SciApps: a bioinformatics workflow platform powered by XSEDE and CyVerse. In Proceedings of the Practice and Experience on Advanced Research Computing 1–5 (Association for Computing Machinery, 2018).

  21. Eren, A. M. et al. Community-led, integrated, reproducible multi-omics with anvi’o. Nat. Microbiol. 6, 3–6 (2021).

  22. Wattam, A. R. et al. Improvements to PATRIC, the all-bacterial bioinformatics database and analysis resource center. Nucleic Acids Res. 45, D535–D542 (2017).

  23. Mitchell, A. L. et al. MGnify: the microbiome analysis resource in 2020. Nucleic Acids Res. 48, D570–D578 (2020).

  24. Wu, Y.-W. et al. Ionic liquids impact the bioenergy feedstock-degrading microbiome and transcription of enzymes relevant to polysaccharide hydrolysis. mSystems 1, e00120-16 (2016).

  25. Rajeev, L. et al. Dynamic cyanobacterial response to hydration and dehydration in a desert biological soil crust. ISME J. 7, 2178–2191 (2013).

  26. Foster, I. Globus Online: accelerating and democratizing science through cloud-based services. IEEE Internet Comput. 15, 70–73 (2011).

  27. Nurk, S., Meleshko, D., Korobeynikov, A. & Pevzner, P. A. metaSPAdes: a new versatile metagenomic assembler. Genome Res. 27, 824–834 (2017).

  28. Zhang, H. et al. dbCAN2: a meta server for automated carbohydrate-active enzyme annotation. Nucleic Acids Res. 46, W95–W101 (2018).

  29. Chaumeil, P.-A., Mussig, A. J., Hugenholtz, P. & Parks, D. H. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics 36, 1925–1927 (2019).

  30. Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinformatics 10, 421 (2009).

  31. Nordberg, H. et al. The genome portal of the Department of Energy Joint Genome Institute: 2014 updates. Nucleic Acids Res. 42, D26–D31 (2014).

  32. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).

  33. Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet. J. 17, 10–12 (2011).

  34. Menzel, P., Ng, K. L. & Krogh, A. Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nat. Commun. 7, 11257 (2016).

  35. Freitas, T. A. K., Li, P.-E., Scholz, M. B. & Chain, P. S. G. Accurate read-based metagenome characterization using a hierarchical suite of unique signatures. Nucleic Acids Res. 43, e69 (2015).

  36. Wood, D. E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. Genome Biol. 20, 257 (2019).

  37. Truong, D. T. et al. MetaPhlAn2 for enhanced metagenomic taxonomic profiling. Nat. Methods 12, 902–903 (2015).

  38. Milanese, A. et al. Microbial abundance, activity and population genomic profiling with mOTUs2. Nat. Commun. 10, 2014 (2019).

  39. Youngblut, N. D. & Ley, R. E. Struo2: efficient metagenome profiling database construction for ever-expanding microbial genome datasets. PeerJ 9, e12198 (2021).

  40. Ondov, B. D., Bergman, N. H. & Phillippy, A. M. Interactive metagenomic visualization in a Web browser. BMC Bioinformatics 12, 385 (2011).

  41. Li, D., Liu, C.-M., Luo, R., Sadakane, K. & Lam, T.-W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31, 1674–1676 (2015).

  42. Peng, Y., Leung, H. C. M., Yiu, S. M. & Chin, F. Y. L. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics 28, 1420–1428 (2012).

  43. Orakov, A. et al. GUNC: detection of chimerism and contamination in prokaryotic genomes. Genome Biol. 22, 178 (2021).

  44. Gurevich, A., Saveliev, V., Vyahhi, N. & Tesler, G. QUAST: quality assessment tool for genome assemblies. Bioinformatics 29, 1072–1075 (2013).

  45. Wu, Y.-W., Simmons, B. A. & Singer, S. W. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics 32, 605–607 (2016).

  46. Kang, D. D. et al. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ 7, e7359 (2019).

  47. Alneberg, J. et al. Binning metagenomic contigs by coverage and composition. Nat. Methods 11, 1144–1146 (2014).

  48. Sieber, C. M. K. et al. Recovery of genomes from metagenomes via a dereplication, aggregation and scoring strategy. Nat. Microbiol. 3, 836–843 (2018).

  49. Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 25, 1043–1055 (2015).

  50. Delcher, A. L., Salzberg, S. L. & Phillippy, A. M. Using MUMmer to identify similar regions in large sequence sets. Curr. Protoc. Bioinformatics Chapter 10, Unit 10.3 (2003).

  51. Darling, A. C. E., Mau, B., Blattner, F. R. & Perna, N. T. Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res. 14, 1394–1403 (2004).

  52. Parks, D. H. et al. GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic Acids Res. 50, D785–D794 (2022).

  53. Bowers, R. M. et al. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nat. Biotechnol. 35, 725–731 (2017).

  54. Brettin, T. et al. RASTtk: a modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes. Sci. Rep. 5, 8365 (2015).

  55. Overbeek, R. et al. The SEED and the rapid annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res. 42, D206–D214 (2014).

  56. Seemann, T. Prokka: rapid prokaryotic genome annotation. Bioinformatics 30, 2068–2069 (2014).

  57. Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010).

  58. Parks, D. H. et al. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat. Biotechnol. 36, 996–1004 (2018).

  59. Rinke, C. et al. A standardized archaeal taxonomy for the Genome Taxonomy Database. Nat. Microbiol. 6, 946–959 (2021).

  60. Haft, D. H. et al. RefSeq: an update on prokaryotic genome annotation and curation. Nucleic Acids Res. 46, D851–D860 (2018).

  61. Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2 – approximately maximum-likelihood trees for large alignments. PLoS ONE 5, e9490 (2010).

  62. Shaffer, M. et al. DRAM for distilling microbial metabolism to automate the curation of microbiome function. Nucleic Acids Res. 48, 8883–8900 (2020).

  63. Galperin, M. Y., Makarova, K. S., Wolf, Y. I. & Koonin, E. V. Expanded microbial genome coverage and improved protein family annotation in the COG database. Nucleic Acids Res. 43, D261–D269 (2015).

  64. El-Gebali, S. et al. The Pfam protein families database in 2019. Nucleic Acids Res. 47, D427–D432 (2019).

  65. Haft, D. H. et al. TIGRFAMs and Genome Properties in 2013. Nucleic Acids Res. 41, D387–D395 (2013).

  66. Eddy, S. R. Accelerated profile HMM searches. PLoS Comput. Biol. 7, e1002195 (2011).

  67. Lombard, V., Golaconda Ramulu, H., Drula, E., Coutinho, P. M. & Henrissat, B. The carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Res. 42, D490–D495 (2014).

  68. Chivian, D., Dehal, P. S., Keller, K. & Arkin, A. P. MetaMicrobesOnline: phylogenomic analysis of microbial communities. Nucleic Acids Res. 41, D648–D654 (2013).

  69. Karaoz, U. & Brodie, E. L. microTrait: a toolset for a trait-based representation of microbial genomes. Front. Bioinform. https://doi.org/10.3389/fbinf.2022.918853 (2022).

  70. Wood-Charlson, E. M. et al. The National Microbiome Data Collaborative: enabling microbiome science. Nat. Rev. Microbiol. 18, 313–314 (2020).

  71. Hofmeyr, S. et al. Terabase-scale metagenome coassembly with MetaHipMer. Sci. Rep. 10, 10689 (2020).

  72. Kolmogorov, M. et al. metaFlye: scalable long-read metagenome assembly using repeat graphs. Nat. Methods 17, 1103–1110 (2020).

  73. Koren, S. et al. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 27, 722–736 (2017).

  74. Bertrand, D. et al. Hybrid metagenomic assembly enables high-resolution analysis of resistance determinants and mobile elements in human microbiomes. Nat. Biotechnol. 37, 937–944 (2019).

  75. Chen, L.-X. et al. Accurate and complete genomes from metagenomes. Genome Res. 30, 315–333 (2020).

  76. Lui, L. M., Nielsen, T. N. & Arkin, A. P. A method for achieving complete microbial genomes and improving bins from metagenomics data. PLoS Comput. Biol. 17, e1008972 (2021).

  77. Miller, C. S., Baker, B. J., Thomas, B. C., Singer, S. W. & Banfield, J. F. EMIRGE: reconstruction of full-length ribosomal genes from microbial community short read sequencing data. Genome Biol. 12, R44 (2011).

  78. Chivian, D. et al. Genome extraction from shotgun metagenome sequence data. KBase n/33233/628 https://doi.org/10.25982/33233.606/1831502 (2022).

  79. Chivian, D. et al. Moab desert crust – sample 4E. KBase n/62384/334 https://doi.org/10.25982/62384.253/1831503 (2022).

  80. Jain, C., Rodriguez-R, L. M., Phillippy, A. M., Konstantinidis, K. T. & Aluru, S. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat. Commun. 9, 5114 (2018).

  81. Matsen, F. A., Kodner, R. B. & Armbrust, E. V. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics 11, 538 (2010).

  82. Benson, D. A. et al. GenBank. Nucleic Acids Res. 46, D41–D47 (2018).

  83. Ewing, B. & Green, P. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 8, 186–194 (1998).

  84. Teiling, C. BaseSpace: simplifying metagenomic analysis. 26th European Congress of Clinical Microbiology and Infectious Diseases https://doi.org/10.26226/morressier.56d5ba2ed462b80296c9509d (2016).

  85. Reich, M. et al. The GenePattern notebook environment. Cell Syst. 5, 149–151.e1 (2017).

  86. Uritskiy, G. V., DiRuggiero, J. & Taylor, J. MetaWRAP – a flexible pipeline for genome-resolved metagenomic data analysis. Microbiome 6, 158 (2018).

  87. Karp, P. D. et al. A comparison of microbial genome web portals. Front. Microbiol. 10, 208 (2019).

  88. Yue, Y. et al. Evaluating metagenomics tools for genome binning with real metagenomic datasets and CAMI datasets. BMC Bioinformatics 21, 334 (2020).

  89. Nelson, W. C., Tully, B. J. & Mobberley, J. M. Biases in genome reconstruction from metagenomic data. PeerJ 8, e10119 (2020).

  90. Olm, M. R., Brown, C. T., Brooks, B. & Banfield, J. F. dRep: a tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication. ISME J. 11, 2864–2868 (2017).

  91. Li, L., Stoeckert, C. J. Jr & Roos, D. S. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13, 2178–2189 (2003).

  92. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).

  93. Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).

  94. Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).

  95. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).

  96. Kumari, S. et al. A KBase case study on genome-wide transcriptomics and plant primary metabolism in response to drought stress in sorghum. Curr. Plant Biol. 28, 100229 (2021).

  97. Seaver, S. M. D. et al. The ModelSEED biochemistry database for the integration of metabolic annotations and the reconstruction, comparison and analysis of metabolic models for plants, fungi and microbes. Nucleic Acids Res. 49, D575–D588 (2021).

  98. Schloss, P. D. et al. Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl. Environ. Microbiol. 75, 7537–7541 (2009).

  99. Caporaso, J. G. et al. QIIME allows analysis of high-throughput community sequencing data. Nat. Methods 7, 335–336 (2010).

Acknowledgements

The authors thank S. Singer for the use of the Compost sequence data and T. Northen for the Desert Crust sequence data. We thank U. Karaoz and E. L. Brodie for the use of the microTrait HMMs. We thank K. Wrighton, M. Shaffer and M. Borton for the use of their DRAM app, and P. Chain, M. Flynn and C. Lo for the use of their GOTTCHA2 app. We thank D. Parks and G. Tyson for the use of CheckM, and P.-A. Chaumeil, D. Parks, A. J. Mussig and P. Hugenholtz for the use of GTDB-Tk. KBase especially thanks all primary developers whose tools have been wrapped as apps in KBase; if you use any of these apps, please cite their primary publications. KBase gratefully acknowledges funding from the Genomic Science program within the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research under award nos. DE-AC02-05CH11231, DE-AC02-06CH11357, DE-AC05-00OR22725 and DE-AC02-98CH10886.

Author information

Contributions

D.C., P.S.D. and A.P.A. conceived the workflow. D.C., P.S.D., R.S.C., E.W.C. and S.P.J. designed the workflow. D.C., S.P.J., P.S.D., G.A.P., W.J.R., T.G., R.S.C., M.L., Q.Z., M.W.S. and R.S. wrote the KBase Genome Extraction and related apps and developed the KBase platform. D.C. built the Narratives. D.C. and M.C. wrote the Narrative tutorial. D.C., E.W.C., S.P.J. and A.P.A. wrote the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Dylan Chivian or Adam P. Arkin.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Protocols thanks Ami S. Bhatt and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

Key references using this protocol

Romero Victorica, M. et al. Sci. Rep. 10, 3864 (2020): https://doi.org/10.1038/s41598-020-60850-5

Buongiorno, J. et al. PLoS One 15, e0234839 (2020): https://doi.org/10.1371/journal.pone.0234839

Quoc, B. N. et al. Water Res. 198, 117119 (2021): https://doi.org/10.1016/j.watres.2021.117119

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chivian, D., Jungbluth, S.P., Dehal, P.S. et al. Metagenome-assembled genome extraction and analysis from microbiomes using KBase. Nat Protoc (2022). https://doi.org/10.1038/s41596-022-00747-x

  • DOI: https://doi.org/10.1038/s41596-022-00747-x
