Genome-wide analysis of mouse transcripts using exon microarrays and factor graphs


Abstract

Recent mammalian microarray experiments detected widespread transcription and indicated that there may be many undiscovered multiple-exon protein-coding genes. To explore this possibility, we hybridized labeled cDNA from unamplified, polyadenylation-selected RNA samples from 37 mouse tissues to microarrays encompassing 1.14 million exon probes. We analyzed these data using GenRate, a Bayesian algorithm that uses a genome-wide scoring function in a factor graph to infer genes. At a stringent exon false detection rate of 2.7%, GenRate detected 12,145 gene-length transcripts and confirmed 81% of the 10,000 most highly expressed known genes. Notably, our analysis showed that most of the 155,839 exons detected by GenRate were associated with known genes, providing microarray-based evidence that most multiple-exon genes have already been identified. GenRate also detected tens of thousands of potential new exons and reconciled discrepancies in current cDNA databases by 'stitching' new transcribed regions into previously annotated genes.


Figure 1: Example of results and illustration of analysis method.
Figure 2: GenRate detects exons with high sensitivity and high specificity.
Figure 3: Performance of GenRate on detecting genes.
Figure 4: New exons detected by GenRate and associated with RefSeq Golden Path genes are categorized by 3′ or 5′ extensions of known genes, bridges that join together known genes, new exons that map to an EST or cDNA in the FANTOM2 or Unigene database, new exons that can be stitched together with the known gene by a previously detected EST or cDNA, and new exons that do not map to any previously detected sequences.

Accession codes

Accessions

Gene Expression Omnibus


Acknowledgements

We thank G.E. Hinton for conversations and C. Boone and B. Andrews for their support. This work was supported by grants from the Canadian Institutes of Health Research, the Natural Sciences and Engineering Research Council of Canada and the Canadian Foundation for Innovation (to T.R.H., B.J.F. and B.J.B.), by a PREA award (to B.J.F.) and by a Natural Sciences and Engineering Research Council of Canada postdoctoral fellowship (to Q.D.M.).

Author information

Correspondence to Benjamin J Blencowe or Timothy R Hughes.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Fig. 1

GenRate output for putative novel gene. (PDF 54 kb)

Supplementary Table 1

PCR confirmation of predicted false detections. (PDF 80 kb)

Supplementary Table 2

PCR confirmation of a novel gene. (PDF 11 kb)

Supplementary Methods (PDF 30 kb)
