Recent mammalian microarray experiments detected widespread transcription and indicated that there may be many undiscovered multiple-exon protein-coding genes. To explore this possibility, we hybridized cDNA from unamplified, polyadenylation-selected RNA samples from 37 mouse tissues to microarrays encompassing 1.14 million exon probes. We analyzed these data using GenRate, a Bayesian algorithm that uses a genome-wide scoring function in a factor graph to infer genes. At a stringent exon false detection rate of 2.7%, GenRate detected 12,145 gene-length transcripts and confirmed 81% of the 10,000 most highly expressed known genes. Notably, our analysis showed that most of the 155,839 exons detected by GenRate were associated with known genes, providing microarray-based evidence that most multiple-exon genes have already been identified. GenRate also detected tens of thousands of potential new exons and reconciled discrepancies in current cDNA databases by 'stitching' new transcribed regions into previously annotated genes.
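The abstract names the inference machinery (a factor graph) without describing it. As a generic illustration of what inference in a factor graph involves, and explicitly not GenRate's model (whose variables and genome-wide scoring function are defined in the full paper), the minimal sketch below runs the sum-product algorithm on a toy chain. It assumes each hypothetical exon probe has a binary label (0 = not transcribed, 1 = transcribed), a local evidence factor, and a shared pairwise factor that favors neighboring probes taking the same label; all numbers are made up. On a chain, sum-product reduces to a forward-backward pass and yields exact marginals, which a brute-force enumeration checks.

```python
import itertools

import numpy as np


def chain_marginals(unary, pairwise):
    """Exact marginals of a chain factor graph via sum-product message passing.

    unary:    (N, K) nonnegative local factors f_i(x_i)
    pairwise: (K, K) nonnegative factor g(x_i, x_{i+1}) shared by all adjacent pairs
    Returns an (N, K) array whose row i is the normalized marginal P(x_i).
    """
    unary = np.asarray(unary, dtype=float)
    pairwise = np.asarray(pairwise, dtype=float)
    n, k = unary.shape
    fwd = np.ones((n, k))  # message reaching x_i from the left
    bwd = np.ones((n, k))  # message reaching x_i from the right
    for i in range(1, n):
        m = (fwd[i - 1] * unary[i - 1]) @ pairwise  # sum over x_{i-1}
        fwd[i] = m / m.sum()  # rescale to avoid underflow on long chains
    for i in range(n - 2, -1, -1):
        m = pairwise @ (bwd[i + 1] * unary[i + 1])  # sum over x_{i+1}
        bwd[i] = m / m.sum()
    belief = fwd * unary * bwd  # product of all messages into x_i
    return belief / belief.sum(axis=1, keepdims=True)


def brute_force_marginals(unary, pairwise):
    """Reference answer: enumerate every joint labeling (only for tiny chains)."""
    unary = np.asarray(unary, dtype=float)
    pairwise = np.asarray(pairwise, dtype=float)
    n, k = unary.shape
    marg = np.zeros((n, k))
    for xs in itertools.product(range(k), repeat=n):
        p = np.prod([unary[i, xs[i]] for i in range(n)])
        p *= np.prod([pairwise[xs[i], xs[i + 1]] for i in range(n - 1)])
        for i, x in enumerate(xs):
            marg[i, x] += p
    return marg / marg.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    # Hypothetical evidence for five adjacent probes; column 1 = "transcribed".
    evidence = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5], [0.3, 0.7], [0.8, 0.2]])
    # Hypothetical smoothing factor: neighbors agree with weight 0.8.
    smooth = np.array([[0.8, 0.2], [0.2, 0.8]])
    marg = chain_marginals(evidence, smooth)
    assert np.allclose(marg, brute_force_marginals(evidence, smooth))
    print(np.round(marg[:, 1], 3))  # P(transcribed) for each probe
```

With these made-up numbers, the middle probe, whose own evidence is a coin flip, ends up slightly more likely to be labeled transcribed because both of its neighbors lean that way. A genome-scale model such as GenRate uses a much richer graph; the toy only shows how local evidence and neighborhood factors are combined by message passing.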
We thank G.E. Hinton for conversations and C. Boone and B. Andrews for their support. This work was supported by grants from the Canadian Institutes of Health Research, the Natural Sciences and Engineering Research Council of Canada and the Canada Foundation for Innovation (to T.R.H., B.J.F. and B.J.B.), by a PREA award (to B.J.F.) and by a Natural Sciences and Engineering Research Council of Canada postdoctoral fellowship (to Q.D.M.).
The authors declare no competing financial interests.
About this article
Cite this article
Frey, B., Mohammad, N., Morris, Q. et al. Genome-wide analysis of mouse transcripts using exon microarrays and factor graphs. Nat Genet 37, 991–996 (2005). https://doi.org/10.1038/ng1630