Abstract
The laboratory mouse is the premier model system for studies of mammalian development because of the powerful classical genetic analysis it allows [1] (see also the Jackson Laboratory web site, http://www.jax.org/) and its ever-expanding collection of molecular tools [2,3]. To enhance the utility of the mouse system, we initiated a program to generate a large database of expressed sequence tags (ESTs), which can provide rapid access to genes [4–16]. Of particular significance was the possibility of preparing cDNA libraries from very early stages of development, something that had not been achieved in human EST projects [7,12]. We report here the development of a comprehensive database of ESTs for the mouse. The project, initiated in March 1996, has focused on 5′ end sequences from directionally cloned, oligo-dT-primed cDNA libraries. As of 23 October 1998, 352,040 sequences had been generated, annotated and deposited in dbEST, where they made up 93% of all mouse ESTs available. EST data are versatile and have been applied to gene identification [17], comparative sequence analysis [18,19], comparative gene mapping and candidate disease gene identification [20], genome sequence annotation [21,22], microarray development [23] and the construction of gene-based map resources [24].
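The dbEST share quoted above can be inverted to estimate how many mouse ESTs dbEST held in total at that date. The short Python sketch below does this arithmetic. It assumes the 93% figure was rounded to the nearest whole percent, so it also reports the range of totals consistent with that rounding; the variable names are illustrative only and do not come from the project.

```python
# Figures stated in the abstract (dbEST status as of 23 October 1998).
project_ests = 352_040   # sequences deposited by this project
stated_share = 0.93      # stated fraction of all mouse ESTs in dbEST

# Assumption: 93% is rounded to the nearest whole percent,
# so the true share lies somewhere in [0.925, 0.935).
low_share, high_share = 0.925, 0.935

# total = project_ests / share; a larger share implies a smaller total.
implied_total = project_ests / stated_share
min_total = project_ests / high_share
max_total = project_ests / low_share
other_ests = implied_total - project_ests  # ESTs from all other sources

print(f"implied total mouse ESTs: ~{implied_total:,.0f}")
print(f"range under rounding:     ~{min_total:,.0f} to ~{max_total:,.0f}")
print(f"ESTs from other sources:  ~{other_ests:,.0f}")
```

Run as-is, it reports an implied total of roughly 378,500 mouse ESTs (about 376,500 to 380,600 under the rounding assumption), of which about 26,500 would have come from other sources.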
References
Brown, S.D.M. & Peters, J. Combining mutagenesis and genomics in the mouse—closing the phenotype gap. Trends Genet. 12, 433–435 (1996).
Zambrowicz, B.P. et al. Disruption and sequence identification of 2,000 genes in mouse embryonic stem cells. Nature 392, 608–611 (1998).
Hicks, G.G. et al. Functional genomics in mice by tagged sequence mutagenesis. Nature Genet. 16, 338–344 (1997).
Milner, R.J. & Sutcliffe, J.G. Gene expression in rat brain. Nucleic Acids Res. 11, 5497–5520 (1983).
Putney, S.D., Herlihy, W.C. & Schimmel, P. A new troponin T and cDNA clones for 13 different muscle proteins, found by shotgun sequencing. Nature 302, 718–721 (1983).
Adams, M.D. et al. Complementary DNA sequencing: expressed sequence tags and the human genome project. Science 252, 1651–1656 (1991).
Adams, M.D. et al. Initial assessment of human gene diversity and expression patterns based upon 83 million nucleotides of cDNA sequence. Nature 377, 3–174 (1995).
McCombie, W.R. et al. Caenorhabditis elegans expressed sequence tags identify gene families and potential disease gene homologues. Nature Genet. 1, 124–131 (1992).
Waterston, R.H. et al. A survey of expressed genes in C. elegans. Nature Genet. 1, 114–123 (1992).
Sasaki, T. et al. Toward cataloguing all rice genes: large-scale sequencing of randomly chosen rice cDNAs from a callus cDNA library. Plant J. 6, 615–624 (1994).
Houlgatte, R. et al. The GenExpress index: a resource for gene discovery and the genic map of the human genome. Genome Res. 5, 272–304 (1995).
Hillier, L. et al. Generation and analysis of 280,000 human expressed sequence tags. Genome Res. 6, 807–828 (1996).
Yamamoto, K. & Sasaki, T. Large-scale EST sequencing in rice. Plant Mol. Biol. 35, 135–144 (1997).
Nelson, P.S. et al. An expressed-sequence-tag database of the human prostate: sequence analysis of 1168 clones. Genomics 47, 12–25 (1998).
Ajioka, J.W. et al. Gene discovery by EST sequencing in Toxoplasma gondii reveals sequences restricted to the Apicomplexa. Genome Res. 8, 18–28 (1998).
Sasaki, N. et al. Characterization of gene expression in mouse blastocyst using single-pass sequencing of 3995 clones. Genomics 49, 167–179 (1998).
Sutherland, H.F., Kim, U.J. & Scambler, P.J. Cloning and comparative mapping of the DiGeorge syndrome critical region in the mouse. Genomics 52, 37–43 (1998).
Makalowski, W. & Boguski, M.S. Evolutionary parameters of the transcribed mammalian genome: an analysis of 2,820 orthologous rodent and human sequences. Proc. Natl Acad. Sci. USA 95, 9407–9412 (1998).
Makalowski, W., Zhang, J. & Boguski, M.S. Comparative analysis of 1,196 orthologous mouse and human full-length mRNA and protein sequences. Genome Res. 6, 846–857 (1996).
Scharf, J.M. et al. Identification of a candidate modifying gene for spinal muscular atrophy by comparative genomics. Nature Genet. 20, 83–86 (1998).
Bailey, L.C. Jr, Searls, D.B. & Overton, G.C. Analysis of EST-driven gene annotation in human genomic sequence. Genome Res. 8, 362–376 (1998).
Jiang, J. & Jacob, H.J. EbEST: an automated tool using expressed sequence tags to delineate gene structure. Genome Res. 8, 268–275 (1998).
Schena, M. et al. Microarrays: biotechnology's discovery platform for functional genomics. Trends Biotechnol. 16, 301–306 (1998).
Schuler, G.D. et al. A gene map of the human genome. Science 274, 540–546 (1996).
Bonaldo, M.F., Lennon, G. & Soares, M.B. Normalization and subtraction: two approaches to facilitate gene discovery. Genome Res. 6, 791–806 (1996).
Ewing, B., Hillier, L., Wendl, M. & Green, P. Basecalling of automated sequencer traces using PHRED I. Accuracy assessment. Genome Res. 8, 175–185 (1998).
Ewing, B. & Green, P. Basecalling of automated sequencer traces using PHRED II. Error probabilities. Genome Res. 8, 186–194 (1998).
Suzuki, Y., Yoshitomo-Nakagawa, K., Maruyama, K., Suyama, A. & Sugano, S. Construction and characterization of a full length-enriched and a 5′-end enriched cDNA library. Gene 200, 149–156 (1997).
Lennon, G., Auffray, C., Polymeropoulos, M. & Soares, M.B. The I.M.A.G.E. Consortium: an integrated molecular analysis of genomes and their expression. Genomics 33, 151–152 (1996).
Sonnhammer, E.L. & Durbin, R. Analysis of protein domain families in Caenorhabditis elegans. Genomics 46, 200–216 (1997).
Acknowledgements
We thank all investigators who have donated libraries for sequencing; S. Tilghman for scientific guidance; S. Chissoe and S. Gorski for comments on the manuscript and useful discussion; G. Schuler, C. Tolstoshev and others at NCBI for assistance with databases; and the staff at the Washington University Genome Center for technical support. Work by C.P. and G.L. was supported by the U.S. DOE under contract W-7405-Eng-48 to LLNL. Work at Washington University was funded by a grant from the Howard Hughes Medical Institute.
About this article
Cite this article
Marra, M., Hillier, L., Kucaba, T. et al. An encyclopedia of mouse genes. Nat Genet 21, 191–194 (1999). https://doi.org/10.1038/5976
This article is cited by
- Exploratory bioinformatics investigation reveals importance of “junk” DNA in early embryo development. BMC Genomics (2017)
- Computational methods for transcriptome annotation and quantification using RNA-seq. Nature Methods (2011)
- Temperature-dependent growth rates and gene expression patterns of various medaka Oryzias latipes cell lines derived from different populations. Journal of Comparative Physiology B (2006)
- Transcriptome analysis of human gastric cancer. Mammalian Genome (2005)
- Initial sequencing and comparative analysis of the mouse genome. Nature (2002)