  • Letter

Genome-wide analysis of mouse transcripts using exon microarrays and factor graphs

Abstract

Recent mammalian microarray experiments detected widespread transcription and indicated that there may be many undiscovered multiple-exon protein-coding genes. To explore this possibility, we hybridized labeled cDNA from unamplified, polyadenylation-selected RNA samples from 37 mouse tissues to microarrays encompassing 1.14 million exon probes. We analyzed these data using GenRate, a Bayesian algorithm that uses a genome-wide scoring function in a factor graph to infer genes. At a stringent exon false detection rate of 2.7%, GenRate detected 12,145 gene-length transcripts and confirmed 81% of the 10,000 most highly expressed known genes. Notably, our analysis showed that most of the 155,839 exons detected by GenRate were associated with known genes, providing microarray-based evidence that most multiple-exon genes have already been identified. GenRate also detected tens of thousands of potential new exons and reconciled discrepancies in current cDNA databases by 'stitching' new transcribed regions into previously annotated genes.

Figure 1: Example of results and illustration of analysis method.
Figure 2: GenRate detects exons with high sensitivity and high specificity.
Figure 3: Performance of GenRate on detecting genes.
Figure 4: New exons detected by GenRate and associated with RefSeq Golden Path genes are categorized by 3′ or 5′ extensions of known genes, bridges that join together known genes, new exons that map to an EST or cDNA in the FANTOM2 or Unigene database, new exons that can be stitched together with the known gene by a previously detected EST or cDNA, and new exons that do not map to any previously detected sequences.

Accession codes

Accessions

Gene Expression Omnibus

References

  1. International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature 431, 931–945 (2004).

  2. Schadt, E.E. et al. A comprehensive transcript index of the human genome generated using microarrays and computational approaches. Genome Biol. 5, R73 (2004).

  3. Hughes, T.R. et al. Expression profiling using microarrays fabricated by an ink-jet oligonucleotide synthesizer. Nat. Biotechnol. 19, 342–347 (2001).

  4. Burge, C. & Karlin, S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78–94 (1997).

  5. Krogh, A. Two methods for improving performance of an HMM and their application for gene finding. Proc. Int. Conf. Intell. Syst. Mol. Biol. 5, 179–186 (1997).

  6. Xu, Y., Mural, R.J. & Uberbacher, E.C. Inferring gene structures in genomic sequences using pattern recognition and expressed sequence tags. Proc. Int. Conf. Intell. Syst. Mol. Biol. 5, 344–353 (1997).

  7. Waterston, R.H. et al. Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520–562 (2002).

  8. Zhang, W. et al. The functional landscape of mouse gene expression. J. Biol. 3, 21 (2004).

  9. Karolchik, D. et al. The UCSC genome browser database. Nucleic Acids Res. 31, 51–54 (2003).

  10. Shoemaker, D.D. et al. Experimental annotation of the human genome using microarray technology. Nature 409, 922–927 (2001).

  11. Stolc, V. et al. A gene expression map for the euchromatic genome of Drosophila melanogaster. Science 306, 655–660 (2004).

  12. Yamada, K. et al. Empirical analysis of transcriptional activity in the Arabidopsis genome. Science 302, 842–846 (2003).

  13. Kapranov, P. et al. Large-scale transcriptional activity in Chromosomes 21 and 22. Science 296, 916–919 (2002).

  14. Rinn, J.L. et al. The transcriptional activity of human Chromosome 22. Genes Dev. 17, 529–540 (2003).

  15. Bertone, P. et al. Global identification of human transcribed sequences with genome tiling arrays. Science 306, 2242–2246 (2004).

  16. Kschischang, F.R., Frey, B.J. & Loeliger, H.A. Factor graphs and the sum-product algorithm. IEEE Trans. Inf. Theory 47, 498–519 (2001).

  17. Garbarino, J.E. & Gibbons, I.R. Expression and genomic analysis of midasin, a novel and highly conserved AAA protein distantly related to dynein. BMC Genomics 3, 18 (2002).

  18. Okazaki, Y. et al. Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature 420, 563–573 (2002).

  19. Cheng, J. et al. Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science 308, 1149–1154 (2005).

  20. Wang, J. et al. Mouse transcriptome: Neutral evolution of 'non-coding' complementary DNAs (reply). Nature 431, 757 (2004).

  21. Wyers, F. et al. Cryptic Pol II transcripts are degraded by a nuclear quality control pathway involving a new poly(A) polymerase. Cell 121, 725–737 (2005).

  22. Wong, G.K., Passey, D.A. & Yu, J. Most of the human genome is transcribed. Genome Res. 11, 1975–1977 (2001).

  23. Kent, W.J. BLAT - The BLAST-like alignment tool. Genome Res. 12, 656–664 (2002).

  24. Pruitt, K.D., Tatusova, T. & Maglott, D.R. NCBI Reference Sequence Project: update and current status. Nucleic Acids Res. 31, 34–37 (2003).

  25. Hubbard, T. et al. Ensembl 2005. Nucleic Acids Res. 33, D447–D453 (2005).

  26. Pontius, J.U., Wagner, L. & Schuler, G.D. Unigene: A unified view of the transcriptome. in The NCBI Handbook (National Center for Biotechnology Information, Bethesda, MD, 2003).

Acknowledgements

We thank G.E. Hinton for conversations and C. Boone and B. Andrews for their support. This work was supported by grants from the Canadian Institutes of Health Research, the Natural Sciences and Engineering Research Council of Canada and the Canadian Foundation for Innovation (to T.R.H., B.J.F. and B.J.B.), by a PREA award (to B.J.F.) and by a Natural Sciences and Engineering Research Council of Canada postdoctoral fellowship (to Q.D.M.).

Author information

Corresponding authors

Correspondence to Benjamin J Blencowe or Timothy R Hughes.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Fig. 1

GenRate output for putative novel gene. (PDF 54 kb)

Supplementary Table 1

PCR confirmation of predicted false detections. (PDF 80 kb)

Supplementary Table 2

PCR confirmation of a novel gene. (PDF 11 kb)

Supplementary Methods (PDF 30 kb)

About this article

Cite this article

Frey, B., Mohammad, N., Morris, Q. et al. Genome-wide analysis of mouse transcripts using exon microarrays and factor graphs. Nat Genet 37, 991–996 (2005). https://doi.org/10.1038/ng1630

  • DOI: https://doi.org/10.1038/ng1630
