Abstract
As high-throughput transcriptome sequencing provides evidence for novel transcripts in many species, there is a renewed need for accurate methods to classify small genomic regions as protein-coding or non-coding. We present PhyloCSF, a novel comparative genomics method that analyzes a multi-species nucleotide sequence alignment to determine whether it is likely to represent a conserved protein-coding region, based on a formal statistical comparison of phylogenetic codon models. We show that PhyloCSF's classification performance in 12-species Drosophila genome alignments exceeds that of all other methods we compared in a previous study, and we provide a software implementation for use by the community. We anticipate that this method will be widely applicable as the transcriptomes of many additional species, tissues, and subcellular compartments are sequenced, particularly in the context of ENCODE and modENCODE.
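The "formal statistical comparison" of two models can be pictured as a log-likelihood ratio. The sketch below is only an illustration of that general idea, not the PhyloCSF implementation: it assumes the log-likelihoods of an alignment under a coding model and a non-coding model have already been computed elsewhere, and the function names, the decibans unit, and the zero threshold are all hypothetical choices made for this example.

```python
import math


def log_likelihood_ratio_decibans(loglik_coding, loglik_noncoding):
    """Return 10 * log10(L_coding / L_noncoding).

    Both inputs are natural-log likelihoods of the same alignment,
    one under each model (assumed to be computed elsewhere).
    """
    return 10.0 * (loglik_coding - loglik_noncoding) / math.log(10.0)


def classify(loglik_coding, loglik_noncoding, threshold=0.0):
    """Label a region by comparing the score to a threshold.

    The threshold of 0.0 is an illustrative assumption; in practice it
    would be tuned on regions of known coding status.
    """
    score = log_likelihood_ratio_decibans(loglik_coding, loglik_noncoding)
    label = "coding" if score > threshold else "non-coding"
    return label, score


if __name__ == "__main__":
    # Made-up log-likelihoods, purely for demonstration.
    label, score = classify(-100.0, -110.0)
    print(label, round(score, 2))
```

A positive score means the alignment is better explained by the coding model than by the non-coding model; a negative score means the opposite.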
Cite this article
Lin, M., Jungreis, I. & Kellis, M. PhyloCSF: a comparative genomics method to distinguish protein-coding and non-coding regions. Nat Prec (2010). https://doi.org/10.1038/npre.2010.4784.1
DOI: https://doi.org/10.1038/npre.2010.4784.1