Karst, S.M. et al. Nat. Biotechnol. https://doi.org/10.1038/nbt.4045 (2018).

Most microbial profiling efforts rely on amplifying and sequencing the small subunit (SSU) ribosomal RNA gene, a phylogenetic marker used to identify the taxa present in a sample. Taxonomic assignment depends on full-length reference SSU sequences, which span 1.4 to 1.9 kilobases and are therefore difficult to capture with conventional short-read sequencing. To help fill gaps in reference databases, Karst et al. developed a pipeline that combines size selection of total RNA to enrich for SSU rRNA, ligation-based amplification to avoid primer bias, and dual end-tagging with sequence subassembly to reconstruct full-length SSU sequences. The high throughput of short-read sequencing allowed the researchers to generate 1.6 million sequences, adding considerable new diversity to the roughly 2 million sequences already in public databases.
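The subassembly idea can be illustrated with a toy sketch: short reads that share a molecular tag are assumed to come from the same original molecule, so each tag's reads can be grouped and stitched back together by their overlaps. This is not Karst et al.'s actual pipeline. It assumes error-free reads on the same strand and uses a simple greedy suffix-prefix overlap merge; the function names `subassemble` and `greedy_assemble`, the tag strings, and the `min_len` threshold are all hypothetical.

```python
from collections import defaultdict


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b (at least min_len), else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def _drop_contained(contigs: list[str]) -> list[str]:
    """Remove duplicates and any fragment fully contained in a longer one."""
    unique = list(dict.fromkeys(contigs))
    return [c for c in unique if not any(c != o and c in o for o in unique)]


def greedy_assemble(fragments: list[str], min_len: int = 4) -> list[str]:
    """Hypothetical greedy assembler: repeatedly merge the pair with the largest overlap."""
    contigs = _drop_contained(fragments)
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        rest = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs = _drop_contained(rest + [merged])
    return contigs


def subassemble(reads: list[tuple[str, str]], min_len: int = 4) -> dict[str, list[str]]:
    """Group (tag, sequence) reads by tag and assemble each group independently."""
    by_tag: dict[str, list[str]] = defaultdict(list)
    for tag, seq in reads:
        by_tag[tag].append(seq)
    return {tag: greedy_assemble(frags, min_len) for tag, frags in by_tag.items()}


if __name__ == "__main__":
    molecule = "ACGTTGCATGCCATAGGCTA"  # stand-in for one long SSU molecule
    reads = [
        ("UMI1", molecule[0:10]),
        ("UMI1", molecule[6:16]),
        ("UMI1", molecule[12:20]),
        ("UMI2", "GGGAAACCC"),
        ("UMI2", "AACCCTTT"),
    ]
    print(subassemble(reads))
    # {'UMI1': ['ACGTTGCATGCCATAGGCTA'], 'UMI2': ['GGGAAACCCTTT']}
```

Grouping by tag before assembly keeps each problem small and prevents reads from different molecules from being merged. Real data would also require handling sequencing errors and reverse-complement reads, which this sketch ignores.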