Parks, D.H. et al. Nat. Microbiol. http://dx.doi.org/10.1038/s41564-017-0012-7 (2017).

The growing piles of metagenomic data, generated by shotgun sequencing of environmentally sampled microbial mixtures, present a tantalizing target for data mining. Parks et al. take a deep dive into over 1,500 publicly available metagenomes and surface with a huge trove of newly characterized microbial genomes. The authors use the CLC de novo assembler to generate long contiguous sequences, which are then binned based on similarity and taxonomic compatibility and pruned for quality. The nearly 8,000 resulting draft-quality metagenome-assembled genomes (MAGs), which they call the Uncultivated Bacteria and Archaea (UBA) data set, are more than 50% complete, with nearly half of the MAGs over 90% complete. The selected metagenome data sets were mainly collected outside of the relatively well-characterized human host context, and thus the MAGs generated in this analysis expand known phylogenetic diversity by over 30%.
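The quality-pruning step can be pictured with a minimal sketch. It assumes each bin already carries an estimated completeness percentage; the `Bin` class, the function names, and the example values are hypothetical and are not taken from the authors' pipeline. The 50% and 90% cutoffs mirror the figures quoted above, and a real workflow would likely weigh contamination estimates as well.

```python
from dataclasses import dataclass


@dataclass
class Bin:
    """A hypothetical genome bin with an estimated completeness (0-100%)."""
    name: str
    completeness: float


def prune_bins(bins, min_completeness=50.0):
    """Keep only bins whose estimated completeness exceeds the cutoff."""
    return [b for b in bins if b.completeness > min_completeness]


def fraction_above(bins, threshold=90.0):
    """Return the fraction of bins whose completeness exceeds the threshold."""
    if not bins:
        return 0.0
    return sum(b.completeness > threshold for b in bins) / len(bins)


# Made-up example values, for illustration only.
bins = [
    Bin("bin_01", 97.2),
    Bin("bin_02", 62.5),
    Bin("bin_03", 38.0),
    Bin("bin_04", 91.4),
]

kept = prune_bins(bins)               # drops bin_03 (below 50%)
near_complete = fraction_above(kept)  # share of retained bins above 90%
print([b.name for b in kept], round(near_complete, 2))
```

In this toy input, three of the four bins survive pruning and two of those three exceed 90% completeness.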