Original article

Cell Research (2008) 18:695–700. doi: 10.1038/cr.2008.59; published online 27 May 2008

Finding noncoding RNA transcripts from low abundance expressed sequence tags

Chenghai Xue (1,2,*), Fei Li (1,3,*) and Fei Li (1,§)

  1. Department of Entomology, Nanjing Agricultural University, Nanjing 210095, China;
  2. MOE Key Laboratory of Bioinformatics and Bioinformatics Div, TNLIST/Department of Automation, Tsinghua University, Beijing 100084, China;
  3. The First Hospital of Tsinghua University, Beijing 10084, China

Correspondence: §Fei Li, Tel/Fax: +86-25-84399025, E-mail: lifei@njau.edu.cn

*These two authors contributed equally to this work.

Received 11 February 2007; Revised 2 July 2007; Accepted 21 December 2007.


Abstract

It has been shown that noncoding RNA (ncRNA) genes are far more numerous than expected. However, identifying ncRNAs remains difficult with either computational algorithms or biological experiments. Recent reports have suggested that ncRNAs may also be represented in expressed sequence tag (EST) databases. Nevertheless, intergenic ESTs have received little attention and are poorly annotated owing to their low abundance. Here, we developed a computational strategy for discovering ncRNA genes from human ESTs. We first collected ESTs that are located in intergenic regions and lack detailed annotation. The intergenic regions were divided into non-overlapping 50-nt windows, and PhastCons scores obtained from the UCSC database were assigned to these windows. Conserved windows with PhastCons scores above 0.8 and at least three supporting ESTs were retained as seeds. Each cluster of ESTs corresponding to a seed was assembled into a long contig. We used two criteria to screen these contigs for ncRNA transcripts: (1) the longest predicted open reading frame (ORF) is shorter than 300 nt, and (2) a likely Pol II promoter lies within 2 000 nt upstream or downstream of the contig. In total, 118 novel ncRNA genes were identified from human low-abundance ESTs. RT-PCR showed that six of seven randomly selected candidates were transcribed in human 2BS cells. Our work shows that ESTs are a 'hidden treasure' for detecting novel ncRNA genes.
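The abstract describes the screening pipeline only at a high level. The Python sketch below illustrates the filtering logic under several assumptions that are not taken from the paper: all names (`Interval`, `find_seeds`, `longest_orf`, and so on) are hypothetical; a window's score is taken as the mean of per-base PhastCons scores; a trailing partial window is dropped; real EST assembly is replaced by the span covered by the supporting ESTs; an ORF is counted from ATG through a stop codon on either strand; and promoter predictions are supplied as precomputed genomic positions. Only the numeric thresholds (50 nt, 0.8, 3 ESTs, 300 nt, 2 000 nt) come from the abstract.

```python
from dataclasses import dataclass

# Thresholds stated in the abstract
WINDOW = 50            # non-overlapping window size (nt)
MIN_PHASTCONS = 0.8    # windows must score above this
MIN_ESTS = 3           # minimum supporting ESTs per seed
MAX_ORF = 300          # longest ORF must be shorter than this (nt)
PROMOTER_FLANK = 2000  # promoter search distance on each side of a contig (nt)

STOPS = {"TAA", "TAG", "TGA"}


@dataclass
class Interval:  # hypothetical genomic interval type
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive


def windows(region: Interval) -> list[Interval]:
    """Split a region into non-overlapping 50-nt windows (partial tail dropped: an assumption)."""
    return [Interval(s, s + WINDOW)
            for s in range(region.start, region.end - WINDOW + 1, WINDOW)]


def window_score(w: Interval, base_scores: dict[int, float]) -> float:
    # Assumption: window score = mean per-base PhastCons score (missing bases count as 0)
    vals = [base_scores.get(i, 0.0) for i in range(w.start, w.end)]
    return sum(vals) / len(vals)


def overlaps(a: Interval, b: Interval) -> bool:
    return a.start < b.end and b.start < a.end


def find_seeds(region, base_scores, ests):
    """Keep conserved windows supported by at least MIN_ESTS overlapping ESTs."""
    seeds = []
    for w in windows(region):
        support = [e for e in ests if overlaps(w, e)]
        if window_score(w, base_scores) > MIN_PHASTCONS and len(support) >= MIN_ESTS:
            seeds.append((w, support))
    return seeds


def contig_span(support: list[Interval]) -> Interval:
    # Simplification: the span of the supporting ESTs stands in for real assembly
    return Interval(min(e.start for e in support), max(e.end for e in support))


def revcomp(seq: str) -> str:
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))


def longest_orf(seq: str) -> int:
    """Length (nt, ATG through stop codon) of the longest ORF on either strand.

    ORFs that run off the end of the sequence without a stop codon are ignored.
    """
    best = 0
    for s in (seq.upper(), revcomp(seq.upper())):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    best = max(best, i + 3 - start)
                    start = None
    return best


def near_promoter(contig: Interval, promoters: list[int]) -> bool:
    """True if a predicted promoter lies in the contig or within 2 000 nt of either end."""
    return any(contig.start - PROMOTER_FLANK <= p < contig.end + PROMOTER_FLANK
               for p in promoters)


def is_candidate(contig_seq: str, contig: Interval, promoters: list[int]) -> bool:
    """Apply both screening criteria from the abstract."""
    return longest_orf(contig_seq) < MAX_ORF and near_promoter(contig, promoters)
```

For example, a contig whose sequence is `"ATG" + "AAA" * 98 + "TAA"` contains a 300-nt ORF and is rejected by `is_candidate` whatever the promoter positions, because the abstract requires the longest ORF to be strictly shorter than 300 nt.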

Keywords:

ncRNA, EST, computational identification, RT-PCR
