  • Letter

An encyclopedia of mouse genes

Abstract

The laboratory mouse is the premier model system for studies of mammalian development due to the powerful classical genetic analysis [1] possible (see also the Jackson Laboratory web site, http://www.jax.org/) and the ever-expanding collection of molecular tools [2,3]. To enhance the utility of the mouse system, we initiated a program to generate a large database of expressed sequence tags (ESTs) that can provide rapid access to genes [4–16]. Of particular significance was the possibility that cDNA libraries could be prepared from very early stages of development, a situation unrealized in human EST projects [7,12]. We report here the development of a comprehensive database of ESTs for the mouse. The project, initiated in March 1996, has focused on 5′ end sequences from directionally cloned, oligo-dT primed cDNA libraries. As of 23 October 1998, 352,040 sequences had been generated, annotated and deposited in dbEST, where they comprised 93% of the total ESTs available for mouse. EST data are versatile and have been applied to gene identification [17], comparative sequence analysis [18,19], comparative gene mapping and candidate disease gene identification [20], genome sequence annotation [21,22], microarray development [23] and the development of gene-based map resources [24].

Figure 1: Sequence discrepancies between the mouse mRNA set and matching ESTs plotted as a function of trimmed sequence length.
Figure 2: Sugano libraries are enriched for full-length cDNAs.

References

  1. Brown, S.D.M. & Peters, J. Combining mutagenesis and genomics in the mouse—closing the phenotype gap. Trends Genet. 12, 433–435 (1996).

  2. Zambrowicz, B.P. et al. Disruption and sequence identification of 2,000 genes in mouse embryonic stem cells. Nature 392, 608–611 (1998).

  3. Hicks, G.G. et al. Functional genomics in mice by tagged sequence mutagenesis. Nature Genet. 16, 338–344 (1997).

  4. Milner, R.J. & Sutcliffe, J.G. Gene expression in rat brain. Nucleic Acids Res. 11, 5497–5520 (1983).

  5. Putney, S.D., Herligh, W.D. & Schimmel, P. A new troponin T and cDNA clones for 13 different muscle proteins, found by shotgun sequencing. Nature 302, 718–721 (1983).

  6. Adams, M.D. et al. Complementary DNA sequencing: expressed sequence tags and the human genome project. Science 252, 1651–1656 (1991).

  7. Adams, M.D. et al. Initial assessment of human gene diversity and expression patterns based upon 83 million nucleotides of cDNA sequence. Nature 377, 3–17 (1995).

  8. McCombie, W.R. et al. Caenorhabditis elegans expressed sequence tags identify gene families and potential disease gene homologues. Nature Genet. 1, 124–131 (1992).

  9. Waterston, R.H. et al. A survey of expressed genes in C. elegans. Nature Genet. 1, 114–123 (1992).

  10. Sasaki, T. et al. Toward cataloguing all rice genes: large-scale sequencing of randomly chosen rice cDNAs from a callus cDNA library. Plant J. 6, 615–624 (1994).

  11. Houlgatte, R. et al. The GenExpress index: a resource for gene discovery and the genic map of the human genome. Genome Res. 5, 272–304 (1995).

  12. Hillier, L. et al. Generation and analysis of 280,000 human expressed sequence tags. Genome Res. 6, 807–828 (1996).

  13. Yamamoto, K. & Sasaki, T. Large-scale EST sequencing in rice. Plant Mol. Biol. 35, 135–144 (1997).

  14. Nelson, P.S. et al. An expressed-sequence-tag database of the human prostate: sequence analysis of 1168 clones. Genomics 47, 12–25 (1998).

  15. Ajioka, J.W. et al. Gene discovery by EST sequencing in Toxoplasma gondii reveals sequences restricted to the Apicomplexa. Genome Res. 8, 18–28 (1998).

  16. Sasaki, N. et al. Characterization of gene expression in mouse blastocyst using single-pass sequencing of 3995 clones. Genomics 49, 167–179 (1998).

  17. Sutherland, H.F., Kim, U.J. & Scambler, P.J. Cloning and comparative mapping of the DiGeorge syndrome critical region in the mouse. Genomics 52, 37–43 (1998).

  18. Makalowski, W. & Boguski, M.S. Evolutionary parameters of the transcribed mammalian genome: an analysis of 2,820 orthologous rodent and human sequences. Proc. Natl Acad. Sci. USA 95, 9407–9412 (1998).

  19. Makalowski, W., Zhang, J. & Boguski, M.S. Comparative analysis of 1,196 orthologous mouse and human full-length mRNA and protein sequences. Genome Res. 6, 846–857 (1996).

  20. Scharf, J.M. et al. Identification of a candidate modifying gene for spinal muscular atrophy by comparative genomics. Nature Genet. 20, 83–86 (1998).

  21. Bailey, L.C. Jr, Searls, D.B. & Overton, G.C. Analysis of EST-driven gene annotation in human genomic sequence. Genome Res. 8, 362–376 (1998).

  22. Jiang, J. & Jacob, H.J. EbEST: an automated tool using expressed sequence tags to delineate gene structure. Genome Res. 8, 268–275 (1998).

  23. Schena, M. et al. Microarrays: biotechnology's discovery platform for functional genomics. Trends Biotechnol. 16, 301–306 (1998).

  24. Schuler, G.D. et al. A gene map of the human genome. Science 274, 540–546 (1996).

  25. Bonaldo, M.F., Lennon, G. & Soares, M.B. Normalization and subtraction: two approaches to facilitate gene discovery. Genome Res. 6, 791–806 (1996).

  26. Ewing, B., Hillier, L., Wendl, M. & Green, P. Basecalling of automated sequencer traces using PHRED I. Accuracy assessment. Genome Res. 8, 175–185 (1998).

  27. Ewing, B. & Green, P. Basecalling of automated sequencer traces using PHRED II. Error probabilities. Genome Res. 8, 186–194 (1998).

  28. Suzuki, Y., Yoshitomo-Nakagawa, K., Maruyama, K., Suyama, A. & Sugano, S. Construction and characterization of a full length-enriched and a 5′-end enriched cDNA library. Gene 200, 149–156 (1997).

  29. Lennon, G., Auffray, C., Polymeropoulos, M. & Soares, M.B. The I.M.A.G.E. Consortium: an integrated molecular analysis of genomes and their expression. Genomics 33, 151–152 (1996).

  30. Sonnhammer, E.L. & Durbin, R. Analysis of protein domain families in Caenorhabditis elegans. Genomics 46, 200–216 (1997).

Acknowledgements

We thank all investigators who have donated libraries for sequencing; S. Tilghman for scientific guidance; S. Chissoe and S. Gorski for comments on the manuscript and useful discussion; G. Schuler, C. Tolstoshev and others at NCBI for assistance with databases; and the staff at Washington University Genome Center for technical support. Work by C.P. and G.L. was supported by the U.S. DOE under contract W-7405-Eng-48 to LLNL. Work at Washington University was funded by a grant from Howard Hughes Medical Institute.

Author information

Corresponding author

Correspondence to Marco Marra.

About this article

Cite this article

Marra, M., Hillier, L., Kucaba, T. et al. An encyclopedia of mouse genes. Nat Genet 21, 191–194 (1999). https://doi.org/10.1038/5976

  • DOI: https://doi.org/10.1038/5976
