Original article
Cell Research (2008) 18:695–700. doi: 10.1038/cr.2008.59; published online 27 May 2008
Finding noncoding RNA transcripts from low abundance expressed sequence tags
Chenghai Xue1,2,*, Fei Li1,3,* and Fei Li1,§
- 1Department of Entomology, Nanjing Agricultural University, Nanjing 210095, China;
- 2MOE Key Laboratory of Bioinformatics and Bioinformatics Div, TNLIST/Department of Automation, Tsinghua University, Beijing 100084, China;
- 3The First Hospital of Tsinghua University, Beijing 10084, China
Correspondence: §Fei Li, Tel/Fax: +86-25-84399025 E-mail: lifei@njau.edu.cn
*These two authors contributed equally to this work.
Received 11 February 2007; Revised 2 July 2007; Accepted 21 December 2007.
Abstract
Noncoding RNA (ncRNA) genes have been shown to be much more numerous than expected. However, identifying ncRNAs remains difficult with either computational algorithms or biological experiments. Recent reports have suggested that ncRNAs may also appear in expressed sequence tag (EST) databases. Nevertheless, intergenic ESTs have received little attention and are poorly annotated owing to their low abundance. Here, we developed a computational strategy for discovering ncRNA genes from human ESTs. We first collected ESTs that are located in intergenic regions and lack detailed annotations. The intergenic regions were divided into non-overlapping 50-nt windows, and PhastCons scores obtained from the UCSC database were assigned to these windows. We kept conserved windows that had PhastCons scores over 0.8 and at least three supporting ESTs to act as seeds. Each cluster of ESTs corresponding to the seeds was assembled into a long contig. We used two criteria to screen for ncRNA transcripts from these contigs: first, that the longest predicted open reading frame was shorter than 300 nt; and second, that likely Pol-II promoters exist within 2,000 nt upstream or downstream of the contig. As a result, 118 novel ncRNA genes were identified from human low-abundance ESTs. Of seven randomly selected candidates, six were transcribed in human 2BS cells, as shown by RT-PCR. Our work shows that EST data are a 'hidden treasure' for detecting novel ncRNA genes.
Keywords:
ncRNA, EST, computational identification, RT-PCR
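The seed selection and screening criteria stated in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The thresholds (50-nt windows, PhastCons over 0.8, at least three ESTs, ORF under 300 nt, 2,000-nt flank) come from the abstract; everything else is an assumption made for illustration: the function names, the use of the mean per-base score as the window score, counting any overlap as EST support, defining an ORF as ATG through an in-frame stop codon on either strand, and treating promoter predictions as single genomic positions.

```python
from typing import List, Tuple

WINDOW = 50       # window size in nt (from the abstract)
MIN_SCORE = 0.8   # PhastCons threshold (from the abstract)
MIN_ESTS = 3      # minimum number of supporting ESTs (from the abstract)
MAX_ORF = 300     # longest ORF must be shorter than this, in nt (from the abstract)
FLANK = 2000      # promoter search distance in nt (from the abstract)

STOPS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def seed_windows(region_start: int, region_end: int,
                 phastcons: List[float],
                 ests: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return conserved, EST-supported 50-nt windows in [region_start, region_end).

    Assumptions: `phastcons` holds one score per base of the region, the
    window score is the mean of those scores, and an EST (half-open
    interval) supports a window if it overlaps it by at least 1 nt.
    """
    seeds = []
    for w_start in range(region_start, region_end - WINDOW + 1, WINDOW):
        w_end = w_start + WINDOW
        scores = phastcons[w_start - region_start:w_end - region_start]
        mean_score = sum(scores) / len(scores)
        support = sum(1 for s, e in ests if s < w_end and e > w_start)
        if mean_score > MIN_SCORE and support >= MIN_ESTS:
            seeds.append((w_start, w_end))
    return seeds


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def longest_orf(seq: str) -> int:
    """Length in nt of the longest ATG-to-stop ORF over all six frames.

    Assumption: the length includes the stop codon; ORFs without an
    in-frame stop are ignored.
    """
    seq = seq.upper()
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    best = max(best, i + 3 - start)
                    start = None
    return best


def is_ncrna_candidate(contig_seq: str, contig_start: int, contig_end: int,
                       promoter_positions: List[int]) -> bool:
    """Apply both screening criteria to a contig spanning [contig_start, contig_end).

    Assumption: a promoter counts if its position lies within FLANK nt
    immediately upstream or downstream of the contig.
    """
    if longest_orf(contig_seq) >= MAX_ORF:
        return False
    return any(
        contig_start - FLANK <= p < contig_start
        or contig_end <= p < contig_end + FLANK
        for p in promoter_positions
    )
```

In this sketch, `seed_windows` would be run on each intergenic region, and `is_ncrna_candidate` on each contig assembled from the ESTs overlapping a seed. The assembly step and the promoter prediction itself are outside its scope.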
