The rise of genomic big data has transformed many aspects of phylogenetic inference, a field of research that spans decades. To maximize the information extracted from sequenced biomolecules, computational pipelines have been developed that chain together steps such as genome assembly, annotation and orthology identification, each relying on dedicated algorithms. Although remarkably powerful, these complex pipelines also set high bars in terms of data quality, bioinformatics expertise and computing resources. To ease the task of phylogenomic inference, Fritz Sedlazeck at Baylor College of Medicine, Christophe Dessimoz at the University of Lausanne and their colleagues developed Read2Tree, an efficient and accurate method for building trees directly from raw sequencing reads.
Read2Tree bypasses several computationally expensive steps of traditional pipelines, such as genome assembly, annotation, and homology and orthology inference. Instead, it aligns raw reads to the sequences of specific reference orthologous groups from the OMA database developed by Dessimoz’s lab. After constructing consensus sequences and selecting the best ones, the tool concatenates them for tree building. Besides being 10–100 times faster than the conventional approach, Read2Tree is typically as accurate as it, and in some cases even more accurate, notes Sedlazeck: “As absurd as this sounds, in the paper we report accurately reconstructing trees containing fungi for which the closest reference genome diverged more than [a] billion years ago.” It is also versatile with respect to input data, accommodating various sequencing technologies (Illumina, PacBio or Oxford Nanopore Technologies) and types of molecule assayed (DNA or RNA).
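The reads-to-supermatrix idea can be illustrated with a toy Python sketch. It is not Read2Tree's implementation. It assumes that reads have already been placed on the coordinates of one reference sequence of an orthologous group, with '-' wherever a read has no coverage. It also uses a simple majority-rule consensus, picks the "best" consensus by the fewest ambiguous positions (a hypothetical criterion), and pads missing groups with gaps when concatenating. All function names are hypothetical.

```python
from collections import Counter


def majority_consensus(aligned_reads, min_depth=2):
    """Majority-rule consensus over reads already aligned to one reference.

    aligned_reads: equal-length strings; '-' marks positions a read does not cover.
    Positions covered by fewer than min_depth reads become 'N' (ambiguous).
    """
    consensus = []
    for i in range(len(aligned_reads[0])):
        bases = [read[i] for read in aligned_reads if read[i] != "-"]
        if len(bases) < min_depth:
            consensus.append("N")
        else:
            consensus.append(Counter(bases).most_common(1)[0][0])
    return "".join(consensus)


def pick_best(candidates):
    """Hypothetical selection rule: keep the consensus with the fewest 'N' positions."""
    return min(candidates, key=lambda seq: seq.count("N"))


def concatenate(consensus_by_sample, og_lengths):
    """Build a supermatrix: one row per sample, orthologous groups in a fixed order.

    Groups a sample lacks are padded with '-' so every row has the same length.
    """
    order = sorted(og_lengths)
    matrix = {}
    for sample, groups in consensus_by_sample.items():
        parts = []
        for og in order:
            seq = groups.get(og, "-" * og_lengths[og])
            if len(seq) != og_lengths[og]:
                raise ValueError(f"{sample}/{og}: expected length {og_lengths[og]}")
            parts.append(seq)
        matrix[sample] = "".join(parts)
    return matrix


if __name__ == "__main__":
    print(majority_consensus(["ACGT-", "ACGTA", "AGGT-"]))
    print(concatenate({"s1": {"OG1": "AC", "OG2": "GTA"}, "s2": {"OG2": "GTT"}},
                      {"OG1": 2, "OG2": 3}))
```

The resulting rows all have equal length, so they could be written out as an alignment and passed to any standard tree-building program.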