Nature Methods 4, 63-72 (2007)
Published online: 10 December 2006 | doi:10.1038/nmeth976
Accurate phylogenetic classification of variable-length DNA fragments
Alice Carolyn McHardy1, Héctor García Martín2, Aristotelis Tsirigos1, Philip Hugenholtz2 & Isidore Rigoutsos1
1 Bioinformatics and Pattern Discovery Group, IBM Thomas J Watson Research Center, 1101 Kitchawan Road, Yorktown Heights, New York 10598, USA.
2 US Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, California 94598, USA.
Correspondence should be addressed to Isidore Rigoutsos (rigoutso@us.ibm.com).
Metagenome studies have retrieved vast amounts of sequence data from a variety of environments, leading to new discoveries and insights into the uncultured microbial world. Except for very simple communities, the encountered diversity has made fragment assembly and the subsequent analysis a challenging problem. A taxonomic characterization of metagenomic fragments is required for a deeper understanding of shotgun-sequenced microbial communities, but success has mostly been limited to sequences containing phylogenetic marker genes. Here we present PhyloPythia, a composition-based classifier that combines higher-level generic clades from a set of 340 completed genomes with sample-derived population models. Extensive analyses on synthetic and real metagenome data sets showed that PhyloPythia allows the accurate classification of most sequence fragments across all considered taxonomic ranks, even for unknown organisms. The method requires no more than 100 kb of training sequence for the creation of accurate models of sample-specific populations and can assign fragments ≥1 kb with high specificity.
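To illustrate what "composition-based" can mean in practice, the following is a minimal sketch of one common way to turn a DNA fragment into a feature vector: normalized oligonucleotide (k-mer) frequencies. It is not taken from the paper; the function name `kmer_composition`, the default `k = 4`, and the choice to skip k-mers containing ambiguous bases are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def kmer_composition(sequence: str, k: int = 4) -> list[float]:
    """Return normalized k-mer frequencies in lexicographic k-mer order.

    Hypothetical helper for illustration only; not PhyloPythia's actual
    feature extraction.
    """
    seq = sequence.upper()
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Fixed ordering over the 4**k unambiguous k-mers.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Windows containing other symbols (e.g. N) are ignored in the total.
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


if __name__ == "__main__":
    vec = kmer_composition("ACGTACGTTTGACA", k=2)
    print(len(vec))  # one entry per possible 2-mer
```

A vector like this has a fixed length regardless of fragment size, which is one reason composition features suit variable-length fragments; any classifier could then be trained on such vectors.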