Nature Methods 6, 673 - 676 (2009)
Published online: 2 August 2009 | doi:10.1038/nmeth.1358
Phymm and PhymmBL: metagenomic phylogenetic classification with interpolated Markov models
Arthur Brady¹ & Steven L Salzberg¹
Abstract
Metagenomics projects collect DNA from uncharacterized environments that may contain thousands of species per sample. One main challenge facing metagenomic analysis is phylogenetic classification of raw sequence reads into groups representing the same or similar taxa, a prerequisite for genome assembly and for analyzing the biological diversity of a sample. New sequencing technologies have made metagenomics easier, by making sequencing faster, and more difficult, by producing shorter reads than previous technologies. Classifying sequences from reads as short as 100 base pairs has until now been relatively inaccurate, requiring researchers to use older, long-read technologies. We present Phymm, a classifier for metagenomic data that has been trained on 539 complete, curated genomes and can accurately classify reads as short as 100 base pairs, a substantial improvement over previous composition-based classification methods. We also describe how combining Phymm with sequence alignment algorithms improves accuracy.
1. Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, USA.
Correspondence to: Arthur Brady (e-mail: abrady@umiacs.umd.edu)

