Article

Functional annotation of a full-length mouse cDNA collection

Nature volume 409, pages 685–690 (08 February 2001)

Abstract

The RIKEN Mouse Gene Encyclopaedia Project, a systematic approach to determining the full coding potential of the mouse genome, involves collection and sequencing of full-length complementary DNAs and physical mapping of the corresponding genes to the mouse genome. We organized an international functional annotation meeting (FANTOM) to annotate the first 21,076 cDNAs to be analysed in this project. Here we describe the first RIKEN clone collection, which is one of the largest described for any organism. Analysis of these cDNAs extends known gene families and identifies new ones.

Acknowledgements

We thank the following (in alphabetical order) for discussion, encouragement and technical assistance: R. Abagyan, T. Akimura, K. Arakawa, M. Boguski, L. Corbani, T. A. Dragani, J. T. Eppig, S. Fujimori, G. Grillo, T. Haga, T. Hanagaki, S. Hanaoka, S. Hatta, N. Hayatsu, K. Hiramoto, T. Hiraoka, T. Hirozane, Y. Hodoyama, F. Hori, T. Hubbard, R. Hynes, K. Ikeda, K. Ikeo, C. Imamura, K. Imotani, S. Inoue, H. Kato, N. Kikuchi, Y. Kojima, A. Konagaya, M. Kouda, S. Koya, M. Kubota, S. Kumagai, C. Kurihara, M. Kusakabe, F. Licciulli, S. Liuni, L. Maltais, T. Matsuyama, L. McKenzie, A. Miyazaki, K. Mori, M. Muramatsu, M. Nakamura, K. Nomura, N. Nukina, K. Numata, R. Numazaki, M. Ohno, Y. Okuma, H. Ono, C. Owa, Y. Ozawa, G. Pertea, S. Ramachandran, E. M. Rubin, N. Saga, H. Saitou, H. Sakai, C. Sakai, A. Sakurai, H. Sano, D. Sasaki, L. Sato, C. Schneider, J. Schug, T. Shiraki, M. B. Soares, Y. Sogabe, C. Stoeckert, H. Sugawara, R. Sultana, H. Suzuki, M. Tagami, A. Tagawa, F. Takahashi, S. Takaku-Akahira, M. Takeuchi, T. Tanaka, Y. Tateno, Y. Tejima, J. Todd, A. Tomaru, S. Tonegawa, T. Toya, A. Wada, L. Wagner, A. Watahiki, T. Yamamura, T. Yamashita, T. Yao, A. Yasunishi, T. Yokota, S. Yokoyama, A. Yoshiki and K. Yotsutani. We also thank N. Kazuta, Y. Sigemoto, H. Torigoe and T. Washida for secretarial assistance. This study has been mainly supported by a grant for the RIKEN Genome Exploration Research Project and CREST (Core Research for Evolutional Science and Technology) to Y.H. Further support came from ACT-JST (Research and Development for Applying Advanced Computational Science and Technology) of Japan Science and Technology Corporation (JST) to Y.H. and H.M., and the Science and Technology Agency of the Japanese Government to Y.H. and Y.O. (All funds from the Science Technology Agency of the Japanese Government.) 
This work was also supported by a Grant-in-Aid for Scientific Research on Priority Areas and Human Genome Program, from the Ministry of Education, Science and Culture, and by a Grant-in-Aid for a Second Term Comprehensive 10-Year Strategy for Cancer Control from the Ministry of Health and Welfare to Y.H.

Authors’ contributions: J. Kawai and Y. Okazaki contributed as organizers of the phase II team and FANTOM, respectively. A. Shinagawa and H. Bono contributed as managers of the sequence data production system and the computing system, respectively. J. Quackenbush, P. Carninci, M. J. Brownstein, D. A. Hume, C. Schönbach, H. Suzuki and C. Weitz acted as senior managers of the annotation project.

Author information

Affiliations

  1. Laboratory for Genome Exploration Research Group, RIKEN Genomic Sciences Center (GSC), Yokohama Institute 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan

    • J. Kawai
    • A. Shinagawa
    • K. Shibata
    • M. Yoshino
    • M. Itoh
    • Y. Ishii
    • T. Arakawa
    • A. Hara
    • Y. Fukunishi
    • H. Konno
    • J. Adachi
    • S. Fukuda
    • K. Aizawa
    • M. Izawa
    • K. Nishi
    • H. Kiyosawa
    • S. Kondo
    • I. Yamanaka
    • T. Saito
    • Y. Okazaki
    • H. Bono
    • R. Saito
    • K. Kadota
    • K. Sakai
    • T. Okido
    • M. Furuno
    • H. Aono
    • P. Carninci
    • M. Kamiya
    • K. Sato
    • Y. Shibata
    • H. Suzuki
    • K. Yoshida
    • Y. Hayashizaki
  2. CREST, JST, 3-1-1 Koyadai, Tsukuba, Ibaraki 305-0074 Japan

    • J. Kawai
    • K. Shibata
    • M. Itoh
    • Y. Fukunishi
    • H. Konno
    • S. Fukuda
    • K. Aizawa
    • M. Kamiya
    • Y. Hayashizaki
  3. Center for Information Biology, National Institute of Genetics, 1111 Yata, Mishima, Shizuoka 411-8540, Japan

    • T. Gojobori
    • J. Mashima
  4. NTT Software Corporation, 223-1 Yamashita-cho, Naka-ku, Yokohama, Kanagawa, 231-8554, Japan

    • T. Kasukawa
    • Y. Hasegawa
    • H. Kawaji
    • S. Kohtsuki
  5. Osaka University, 1-3 Machikaneyama, Toyonaka, Osaka 560-8531, Japan

    • H. Matsuda
    • H. Kawaji
  6. European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK

    • M. Ashburner
    • W. Fleischmann
  7. Genomics Institute of the Novartis Research Foundation, 3115 Merryfield Row, San Diego, California 92121, USA

    • S. Batalov
    • C. Fletcher
  8. The Coordinated Laboratory for Computational Genomics, University of Iowa, Iowa City, Iowa 52242, USA

    • T. Casavant
  9. The Rockefeller University, 1230 York Avenue, New York, New York 10021-6399, USA

    • T. Gaasterland
  10. Dipartimento di Fisiologia e Biochimica Generali, Universita di Milano, Via Celoria, 26, 20133 Milano, Italy

    • C. Gissi
    • G. Pesole
  11. Mouse Genome Informatics, The Jackson Laboratory, 600 Main Street, Bar Harbor, Maine 04609, USA

    • B. King
    • R. Baldarelli
    • J. Blake
    • C. Bult
    • D. Hill
    • M. Ringwald
  12. Laboratory for Bioinformatics, Faculty of Environmental Information, Keio University, 5322 Endoh, Fujisawa, Kanagawa, 252-0816, Japan

    • H. Kochiwa
    • R. Suzuki
    • M. Tomita
    • T. Washio
  13. Department of Molecular & Cell Biology, University of Maryland at Baltimore, Baltimore, Maryland 20201, USA

    • P. Kuehl
  14. University of California, Berkeley, Department of Molecular & Cell Biology, 142 Life Sciences Addition #3200, Berkeley, California 94720-3200, USA

    • S. Lewis
  15. Computational Proteomics Team, Bioinformatics Group, RIKEN Genomic Sciences Center (GSC), Yokohama Institute 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan

    • Y. Matsuo
  16. Tokai University, Graduate School of Marine Science and Technology, 3-20-1 Orido, Shimizu, Shizuoka, 424-8610 Japan

    • I. Nikaido
  17. The Institute for Genomic Research, 9712 Medical Center Dr., Rockville, Maryland 20850, USA

    • J. Quackenbush
    • N. H. Lee
  18. National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, Room 8N805, Bethesda, Maryland 20894, USA

    • L. M. Schriml
    • L. Wagner
  19. LION Bioscience AG, Im Neuenheimer Feld 515-519, D-69120 Heidelberg, Germany

    • F. Staubli
    • N. Bojunga
    • M. Hofmann
  20. Stanford University School of Medicine, Beckman Centre B271A, Stanford, California 94305-5428, USA

    • G. Barsh
  21. Lawrence Berkeley Laboratory, 1 Cyclotron Rd, MS84-255, Berkeley, California 94710, USA

    • D. Boffelli
  22. Department of Pediatrics, The University of Iowa, 200 Hawkins Drive 440B EMRB, Iowa City, Iowa 52242-1009, USA

    • M. F. de Bonaldo
  23. Laboratory of Genetics, NIMH/NHGRI, National Institutes of Health, Building 36, Room 3D06, Bethesda, Maryland 20892, USA

    • M. J. Brownstein
  24. Graduate School of Medicine, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8655, Japan

    • M. Fujita
  25. Istituto Tumori Milano, Via Venezian, 1, 120133 Milano, Italy

    • M. Gariboldi
  26. Department of Neurobiology, Harvard Medical School, 220 Longwood Ave., Boston, Massachusetts 02115, USA

    • S. Gustincich
  27. Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland 4072, Australia

    • D. A. Hume
  28. Department of Medical Genetics, Wellcome Trust Centre for Molecular Mechanisms in Disease, University of Cambridge, Wellcome Trust/MRC building, Addenbrookes Hospital, Cambridge, CB2 2XY, UK

    • P. Lyons
  29. LNCIB c/o AREA Science Park, Padriciano 99, 34012 Trieste, Italy

    • L. Marchionni
  30. Computational and Bioinformatics Laboratory, Center for Bioinformatics, University of Pennsylvania, 1313 Blockley Hall, 418 Guardian Drive, Philadelphia, Pennsylvania 19104-6021, USA

    • J. Mazzarelli
  31. Vertebrate Developmental Neurogenetics, The Rockefeller University, 1230 York Avenue, Box 242, New York, New York 10021-6399, USA

    • P. Mombaerts
    • I. Rodriguez
  32. University at Buffalo/Roswell Park Cancer Institute, 120 Meyers Rd.#615, Amherst, New York 14226, USA

    • P. Nordone
  33. Department of Genetics, Stanford University, Beckman Centre B281, Stanford, California 94305, USA

    • B. Ring
  34. RIKEN Brain Science Institute, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan

    • N. Sakamoto
  35. National Cancer Research Institute, 1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan

    • H. Sasaki
  36. Computational Genomics Team, Bioinformatics Group, RIKEN Genomic Sciences Center (GSC), Yokohama Institute 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan

    • C. Schönbach
  37. Osaka Medical Center for Cancer, Nakamichi 1-3-3, Higashinari-ku, Osaka 537-8511 Japan

    • T. Seya
  38. Department of Neurobiology, Harvard Medical School, 220 Longwood Ave., Boston, Massachusetts 02115, USA

    • K.-F. Storch
    • C. Weitz
  39. University of California, San Diego, School of Medicine, Department of Pediatrics, 9500 Gilman Dr., Medical Teaching Facility 253, La Jolla, California 92093-0627, USA

    • K. Toyo-oka
  40. E17-353, Center for Learning and Memory, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, Massachusetts 02139, USA

    • K. H. Wang
  41. Massachusetts Institute of Technology, MIT CCR, 77 Massachusetts Avenue 17-230, Cambridge, Massachusetts 02139, USA

    • C. Whittaker
  42. Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK

    • L. Wilming
  43. University of California San Diego School of Medicine, 9500 Gilman Dr., Medical Teaching Facility, Room 252, La Jolla, California 92093-0627, USA

    • A. Wynshaw-Boris
  44. Tsukuba University, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8577, Japan

    • K. Sato
    • Y. Hayashizaki

Consortia

  1. The RIKEN Genome Exploration Research Group Phase II Team and the FANTOM Consortium

    The RIKEN Genome Exploration Research Group Phase II Team

    FANTOM Consortium

    General organizer

    Corresponding author

    Correspondence to Y. Hayashizaki.

    Supplementary information

    Word documents

    1. Supplementary Figure 1

      (A) Distribution of the length of 21,076 insert DNAs. (B) Sequence accuracy.

    2. Supplementary Figure 2

      (A) SAP domain containing RIKEN clones. (B) Phylogenetic tree of the known OATPs and RIKEN clones.

    3. Supplementary Figure 3

      Alignment of the amino acid sequences of the known OATPs

    4. Supplementary Figure 4

      Mapping of RIKEN clones using Radiation Hybrid

    5. Supplementary Table 2

      (A) Full-length evaluation. (B) Analysis of Alternative Splicing in Redundant Clone Set.

    6. Supplementary Table 3

    7. Supplementary Table 4

      (A) RIKEN clones containing a DSP domain. (B) RIKEN clones containing a consensus kinase signature motif. (C) InterPro Motifs in RIKEN clones.

    8. Supplementary Table 5

      (A) Motifs identified by maximum density subgraph analysis. (B) UTR Functional Elements.

    9. Supplementary Table 6

      RH Map of RIKEN Clones using RH databases

    10. Supplementary Table 7

      (A) Databases that were used for the functional annotation. (B) Software that was used during full-length sequencing and the functional annotation.

    11. Supplementary Table 8

      Strategies of Gene Ontology Assignment

    12. Supplementary methods

      This information is also available at: http://www.gsc.riken.go.jp/e/FANTOM/supplement/

    Text files

    1. Supplementary Table 1

      DDBJ accession number and MGI ID for RIKEN ID

    About this article

    DOI

    https://doi.org/10.1038/35055500
