Functional annotation of a full-length mouse cDNA collection

doi:10.1038/35055500

Article
Published: 08 February 2001

The RIKEN Genome Exploration Research Group Phase II Team and the FANTOM Consortium

Nature volume 409, pages 685–690 (2001)

Abstract

The RIKEN Mouse Gene Encyclopaedia Project, a systematic approach to determining the full coding potential of the mouse genome, involves collection and sequencing of full-length complementary DNAs and physical mapping of the corresponding genes to the mouse genome. We organized an international functional annotation meeting (FANTOM) to annotate the first 21,076 cDNAs to be analysed in this project. Here we describe the first RIKEN clone collection, which is one of the largest described for any organism. Analysis of these cDNAs extends known gene families and identifies new ones.

**Figure 1: Phase II full-insert sequencing flow chart.**

**Figure 2: The criteria used in assigning RIKEN definitions (riken_defs).**

Acknowledgements

We thank the following (in alphabetical order) for discussion, encouragement and technical assistance: R. Abagyan, T. Akimura, K. Arakawa, M. Boguski, L. Corbani, T. A. Dragani, J. T. Eppig, S. Fujimori, G. Grillo, T. Haga, T. Hanagaki, S. Hanaoka, S. Hatta, N. Hayatsu, K. Hiramoto, T. Hiraoka, T. Hirozane, Y. Hodoyama, F. Hori, T. Hubbard, R. Hynes, K. Ikeda, K. Ikeo, C. Imamura, K. Imotani, S. Inoue, H. Kato, N. Kikuchi, Y. Kojima, A. Konagaya, M. Kouda, S. Koya, M. Kubota, S. Kumagai, C. Kurihara, M. Kusakabe, F. Licciulli, S. Liuni, L. Maltais, T. Matsuyama, L. McKenzie, A. Miyazaki, K. Mori, M. Muramatsu, M. Nakamura, K. Nomura, N. Nukina, K. Numata, R. Numazaki, M. Ohno, Y. Okuma, H. Ono, C. Owa, Y. Ozawa, G. Pertea, S. Ramachandran, E. M. Rubin, N. Saga, H. Saitou, H. Sakai, C. Sakai, A. Sakurai, H. Sano, D. Sasaki, L. Sato, C. Schneider, J. Schug, T. Shiraki, M. B. Soares, Y. Sogabe, C. Stoeckert, H. Sugawara, R. Sultana, H. Suzuki, M. Tagami, A. Tagawa, F. Takahashi, S. Takaku-Akahira, M. Takeuchi, T. Tanaka, Y. Tateno, Y. Tejima, J. Todd, A. Tomaru, S. Tonegawa, T. Toya, A. Wada, L. Wagner, A. Watahiki, T. Yamamura, T. Yamashita, T. Yao, A. Yasunishi, T. Yokota, S. Yokoyama, A. Yoshiki and K. Yotsutani. We also thank N. Kazuta, Y. Sigemoto, H. Torigoe and T. Washida for secretarial assistance. This study was supported mainly by a grant for the RIKEN Genome Exploration Research Project and by CREST (Core Research for Evolutional Science and Technology) to Y.H. Further support came from ACT-JST (Research and Development for Applying Advanced Computational Science and Technology) of the Japan Science and Technology Corporation (JST) to Y.H. and H.M., and from the Science and Technology Agency of the Japanese Government to Y.H. and Y.O. (All funds came from the Science and Technology Agency of the Japanese Government.)
This work was also supported by a Grant-in-Aid for Scientific Research on Priority Areas and the Human Genome Program from the Ministry of Education, Science and Culture, and by a Grant-in-Aid for a Second Term Comprehensive 10-Year Strategy for Cancer Control from the Ministry of Health and Welfare to Y.H. Authors' contributions: J. Kawai and Y. Okazaki served as organizers of the Phase II team and FANTOM, respectively. A. Shinagawa and H. Bono served as managers of the sequence data production system and the computing system, respectively. J. Quackenbush, P. Carninci, M. J. Brownstein, D. A. Hume, C. Schönbach, H. Suzuki and C. Weitz acted as senior managers of the annotation project.

Author information

Authors and Affiliations

Laboratory for Genome Exploration Research Group, RIKEN Genomic Sciences Center (GSC), Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Kanagawa, Japan
J. Kawai, A. Shinagawa, K. Shibata, M. Yoshino, M. Itoh, Y. Ishii, T. Arakawa, A. Hara, Y. Fukunishi, H. Konno, J. Adachi, S. Fukuda, K. Aizawa, M. Izawa, K. Nishi, H. Kiyosawa, S. Kondo, I. Yamanaka, T. Saito, Y. Okazaki, H. Bono, R. Saito, K. Kadota, K. Sakai, T. Okido, M. Furuno, H. Aono, P. Carninci, M. Kamiya, K. Sato, Y. Shibata, H. Suzuki, K. Yoshida & Y. Hayashizaki
CREST, JST, 3-1-1 Koyadai, Tsukuba, 305-0074, Ibaraki, Japan
J. Kawai, K. Shibata, M. Itoh, Y. Fukunishi, H. Konno, S. Fukuda, K. Aizawa, M. Kamiya & Y. Hayashizaki
Center for Information Biology, National Institute of Genetics, 1111 Yata, Mishima, 411-8540, Shizuoka, Japan
T. Gojobori & J. Mashima
NTT Software Corporation, 223-1 Yamashita-cho, Naka-ku, Yokohama, 231-8554, Kanagawa, Japan
T. Kasukawa, Y. Hasegawa, H. Kawaji & S. Kohtsuki
Osaka University, 1-3 Machikaneyama, Toyonaka, 560-8531, Osaka, Japan
H. Matsuda & H. Kawaji
European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
M. Ashburner & W. Fleischmann
Genomics Institute of the Novartis Research Foundation, 3115 Merryfield Row, San Diego, 92121, California, USA
S. Batalov & C. Fletcher
The Coordinated Laboratory for Computational Genomics, University of Iowa, Iowa City, 52242, Iowa, USA
T. Casavant
The Rockefeller University, 1230 York Avenue, New York, 10021-6399, New York, USA
T. Gaasterland
Dipartimento di Fisiologia e Biochimica Generali, Università di Milano, Via Celoria 26, Milano, 20133, Italy
C. Gissi & G. Pesole
Mouse Genome Informatics, The Jackson Laboratory, 600 Main Street, Bar Harbor, 04609, Maine, USA
B. King, R. Baldarelli, J. Blake, C. Bult, D. Hill & M. Ringwald
Laboratory for Bioinformatics, Faculty of Environmental Information, Keio University, 5322 Endoh, Fujisawa, 252-0816, Kanagawa, Japan
H. Kochiwa, R. Suzuki, M. Tomita & T. Washio
Department of Molecular & Cell Biology, University of Maryland at Baltimore, Baltimore, 20201, Maryland, USA
P. Kuehl
Department of Molecular & Cell Biology, University of California, Berkeley, 142 Life Sciences Addition #3200, Berkeley, 94720-3200, California, USA
S. Lewis
Computational Proteomics Team, Bioinformatics Group, RIKEN Genomic Sciences Center (GSC), Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Kanagawa, Japan
Y. Matsuo
Tokai University, Graduate School of Marine Science and Technology, 3-20-1 Orido, Shimizu, 424-8610, Shizuoka, Japan
I. Nikaido
The Institute for Genomic Research, 9712 Medical Center Dr., Rockville, 20850, Maryland, USA
J. Quackenbush & N. H. Lee
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, Room 8N805, Bethesda, 20894, Maryland, USA
L. M. Schriml & L. Wagner
LION Bioscience AG, Im Neuenheimer Feld 515-519, Heidelberg, D-69120, Germany
F. Staubli, N. Bojunga & M. Hofmann
Stanford University School of Medicine, Beckman Centre B271A, Stanford, 94305-5428, California, USA
G. Barsh
Lawrence Berkeley Laboratory, 1 Cyclotron Rd, MS84-255, Berkeley, 94710, California, USA
D. Boffelli
Department of Pediatrics, The University of Iowa, 200 Hawkins Drive 440B EMRB, Iowa City, 52242-1009, Iowa, USA
M. F. de Bonaldo
Laboratory of Genetics, NIMH/NHGRI, National Institutes of Health Building 36, Room 3D06, Bethesda, 20892, Maryland, USA
M. J. Brownstein
Graduate School of Medicine, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, 113-8655, Tokyo, Japan
M. Fujita
Istituto Tumori Milano, Via Venezian 1, Milano, 20133, Italy
M. Gariboldi
Department of Neurobiology, Harvard Medical School, 220 Longwood Ave., Boston, 02115, Massachusetts, USA
S. Gustincich
Institute for Molecular Bioscience, University of Queensland, Brisbane, 4072, Queensland, Australia
D. A. Hume
Department of Medical Genetics, Wellcome Trust Centre for Molecular Mechanisms in Disease, University of Cambridge, Wellcome Trust/MRC building, Addenbrookes Hospital, Cambridge, CB2 2XY, UK
P. Lyons
LNCIB c/o AREA Science Park, Padriciano 99, Trieste, 34012, Italy
L. Marchionni
Computational and Bioinformatics Laboratory, Center for Bioinformatics, University of Pennsylvania, 1313 Blockley Hall, 418 Guardian Drive, Philadelphia, 19104-6021, Pennsylvania, USA
J. Mazzarelli
Vertebrate Developmental Neurogenetics, The Rockefeller University, 1230 York Avenue, Box 242, New York, 10021-6399, New York, USA
P. Mombaerts & I. Rodriguez
University at Buffalo/Roswell Park Cancer Institute, 120 Meyers Rd. #615, Amherst, 14226, New York, USA
P. Nordone
Department of Genetics, Stanford University, Beckman Centre B281, Stanford, 94305, California, USA
B. Ring
RIKEN Brain Science Institute, 2-1 Hirosawa, Wako, 351-0198, Saitama, Japan
N. Sakamoto
National Cancer Research Institute, 1-1 Tsukiji, Chuo-ku, 104-0045, Tokyo, Japan
H. Sasaki
Computational Genomics Team, Bioinformatics Group, RIKEN Genomic Sciences Center (GSC), Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Kanagawa, Japan
C. Schönbach
Osaka Medical Center for Cancer, Nakamichi 1-3-3, Higashinari-ku, 537-8511, Osaka, Japan
T. Seya
Department of Neurobiology, Harvard Medical School, 220 Longwood Ave., Boston, 02115, Massachusetts, USA
K.-F. Storch & C. Weitz
Department of Pediatrics, University of California, San Diego, School of Medicine, 9500 Gilman Dr., Medical Teaching Facility 253, La Jolla, 92093-0627, California, USA
K. Toyo-oka
E17-353, Center for Learning and Memory, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, 02139, Massachusetts, USA
K. H. Wang
Massachusetts Institute of Technology, MIT CCR, 77 Massachusetts Avenue 17-230, Cambridge, 02139, Massachusetts, USA
C. Whittaker
Sanger Centre, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, Cambridgeshire, UK
L. Wilming
University of California San Diego School of Medicine, 9500 Gilman Dr., Medical Teaching Facility, Room 252, La Jolla, 92093-0627, California, USA
A. Wynshaw-Boris
Tsukuba University, 1-1-1 Tennodai, Tsukuba, 305-8577, Ibaraki, Japan
K. Sato & Y. Hayashizaki

Consortia

The RIKEN Genome Exploration Research Group Phase II Team and the FANTOM Consortium

The RIKEN Genome Exploration Research Group Phase II Team
J. Kawai, A. Shinagawa, K. Shibata, M. Yoshino, M. Itoh, Y. Ishii, T. Arakawa, A. Hara, Y. Fukunishi, H. Konno, J. Adachi, S. Fukuda, K. Aizawa, M. Izawa, K. Nishi, H. Kiyosawa, S. Kondo, I. Yamanaka & T. Saito
FANTOM Consortium
Y. Okazaki, T. Gojobori, H. Bono, T. Kasukawa, R. Saito, K. Kadota, H. Matsuda, M. Ashburner, S. Batalov, T. Casavant, W. Fleischmann, T. Gaasterland, C. Gissi, B. King, H. Kochiwa, P. Kuehl, S. Lewis, Y. Matsuo, I. Nikaido, G. Pesole, J. Quackenbush, L. M. Schriml, F. Staubli, R. Suzuki, M. Tomita, L. Wagner, T. Washio, K. Sakai, T. Okido, M. Furuno, H. Aono, R. Baldarelli, G. Barsh, J. Blake, D. Boffelli, N. Bojunga, P. Carninci, M. F. de Bonaldo, M. J. Brownstein, C. Bult, C. Fletcher, M. Fujita, M. Gariboldi, S. Gustincich, D. Hill, M. Hofmann, D. A. Hume, M. Kamiya, N. H. Lee, P. Lyons, L. Marchionni, J. Mashima, J. Mazzarelli, P. Mombaerts, P. Nordone, B. Ring, M. Ringwald, I. Rodriguez, N. Sakamoto, H. Sasaki, K. Sato, C. Schönbach, T. Seya, Y. Shibata, K.-F. Storch, H. Suzuki, K. Toyo-oka, K. H. Wang, C. Weitz, C. Whittaker, L. Wilming, A. Wynshaw-Boris, K. Yoshida, Y. Hasegawa, H. Kawaji & S. Kohtsuki
General organizer
Y. Hayashizaki

Corresponding author

Correspondence to Y. Hayashizaki.

Supplementary information

Supplementary Figure 1

A Distribution of the length of 21,076 insert DNAs (DOC 34 kb)

B Sequence accuracy

Supplementary Figure 2

A SAP domain-containing RIKEN clones (DOC 41 kb)

B Phylogenetic tree of the known OATPs and RIKEN clones

Supplementary Figure 3

Alignment of the amino acid sequences of the known OATPs (DOC 135 kb)

Supplementary Figure 4

Mapping of RIKEN clones using Radiation Hybrid (DOC 31 kb)

Supplementary Table 1

DDBJ accession number and MGI ID for RIKEN ID (TXT 801 kb)

Supplementary Table 2

A Full-length evaluation (DOC 80 kb)

B Analysis of Alternative Splicing in Redundant Clone Set

Supplementary Table 3 (DOC 1026 kb)

Supplementary Table 4

A RIKEN clones containing a DSP domain (DOC 997 kb)

B RIKEN clones containing a consensus kinase signature motif

C InterPro Motifs in RIKEN clones

Supplementary Table 5

A Motifs identified by maximum density subgraph analysis (DOC 64 kb)

B UTR Functional Elements

Supplementary Table 6

RH Map of RIKEN Clones using RH databases (DOC 56 kb)

Supplementary Table 7

A Databases that were used for the functional annotation (DOC 82 kb)

B Software that was used during full-length sequencing and the functional annotation

Supplementary Table 8

Strategies of Gene Ontology Assignment (DOC 32 kb)

Supplementary methods

This information is also available at: http://www.gsc.riken.go.jp/e/FANTOM/supplement/ (DOC 32 kb)

About this article

Cite this article

The RIKEN Genome Exploration Research Group Phase II Team and the FANTOM Consortium. Functional annotation of a full-length mouse cDNA collection. Nature 409, 685–690 (2001). https://doi.org/10.1038/35055500

Received: 06 November 2000
Accepted: 29 December 2000
Issue Date: 08 February 2001
DOI: https://doi.org/10.1038/35055500
