Article

Initial sequencing and analysis of the human genome


Abstract

The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.


References

  1. 1.

    Untersuchungen über die Xenien bei Zea mays. Berichte der Deutsche Botanische Gesellschaft 17, 410–418 (1899).

  2. 2.

    Sur la loie de disjonction des hybrides. Comptes Rendue Hebdemodaires, Acad. Sci. Paris 130, 845–847 (1900).

  3. 3.

    Uber Künstliche Kreuzung bei Pisum sativum. Berichte der Deutsche Botanische Gesellschaft 18, 232–239. (1900).

  4. 4.

    et al. Nucleotide sequence of bacteriophage Φ X174 DNA. Nature 265, 687–695 (1977).

  5. 5.

    et al. The nucleotide sequence of bacteriophage ΦX174. J Mol Biol 125, 225–246 (1978).

  6. 6.

    , , , & Nucleotide-sequence of bacteriophage Lambda DNA. J. Mol. Biol. 162, 729–773 (1982).

  7. 7.

    et al. Complete nucleotide sequence of SV40 DNA. Nature 273, 113–120 (1978).

  8. 8.

    et al. Sequence and organization of the human mitochondrial genome. Nature 290, 457–465 (1981).

  9. 9.

    , , & Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am. J. Hum. Genet. 32, 314–331 (1980).

  10. 10.

    et al. Random-clone strategy for genomic restriction mapping in yeast. Proc. Natl Acad. Sci. USA 83, 7826–7830 (1986).

  11. 11.

    , , & Toward a physical map of the genome of the nematode Caenorhabditis elegans. Proc. Natl Acad. Sci. USA 83, 7821–7825 (1986).

  12. 12.

    , & A new troponin T and cDNA clones for 13 different muscle proteins, found by shotgun sequencing. Nature 302, 718–721 (1983).

  13. 13.

    & Gene expression in rat brain. Nucleic Acids Res. 11, 5497–5520 (1983).

  14. 14.

    et al. Complementary DNA sequencing: expressed sequence tags and human genome project. Science 252, 1651–1656 (1991).

  15. 15.

    et al. Initial assessment of human gene diversity and expression patterns based upon 83 million nucleotides of cDNA sequence. Nature 377, 3–174 (1995).

  16. 16.

    et al. Large scale cDNA sequencing for analysis of quantitative and qualitative aspects of gene expression. Nature Genet. 2, 173–179 (1992).

  17. 17.

    et al. Generation and analysis of 280,000 human expressed sequence tags. Genome Res. 6, 807–828 (1996).

  18. 18.

    , , & The mammalian gene collection. Science 286, 455–457 (1999).

  19. 19.

    et al. Gene-based sequence-tagged-sites (STSs) as the basis for a human gene map. Nature Genet. 10, 415–423 (1995).

  20. 20.

    et al. The Genexpress Index: a resource for gene discovery and the genic map of the human genome. Genome Res. 5, 272–304 (1995).

  21. 21.

    The Santa Cruz Workshop—May 1985. Genomics 5, 954–956 (1989).

  22. 22.

    Human genome—Department of Energy on the map. Nature 321, 371 (1986).

  23. 23.

    Mapping and Sequencing the Human Genome (National Academy Press, Washington DC, 1988).

  24. 24.

    & Genome (Simon and Schuster, New York, 1990).

  25. 25.

    & (eds) The Code of Codes: Scientific and Social Issues in the Human Genome Project (Harvard Univ. Press, Cambridge, Massachusetts, 1992).

  26. 26.

    The Gene Wars: Science, Politics, and the Human Genome (W. W. Norton & Co., New York, London, 1994).

  27. 27.

    et al. A genetic linkage map of the human genome. Cell 51, 319–337 (1987).

  28. 28.

    et al. The 1993–94 Genethon human genetic linkage map. Nature Genet. 7, 246–339 (1994).

  29. 29.

    et al. An STS-based map of the human genome. Science 270, 1945–1954 (1995).

  30. 30.

    et al. A comprehensive genetic map of the mouse genome. Nature 380, 149–152 (1996).

  31. 31.

    et al. A YAC-based physical map of the mouse genome. Nature Genet. 22, 388–393 (1999).

  32. 32.

    et al. The complete DNA sequence of yeast chromosome III. Nature 357, 38–46 (1992).

  33. 33.

    et al. 2.2 Mb of contiguous nucleotide sequence from chromosome III of C. elegans. Nature 368, 32–38 (1994).

  34. 34.

    et al. The human growth hormone locus: nucleotide sequence, biology, and evolution. Genomics 4, 479–497 (1989).

  35. 35.

    et al. Expressed genes, Alu repeats and polymorphisms in cosmids sequenced from chromosome 4p16.3. Nature Genet. 1, 348–353 (1992).

  36. 36.

    et al. Automated DNA sequencing and analysis of 106 kilobases from human chromosome 19q13.3. Nature Genet. 1, 34–39 (1992).

  37. 37.

    et al. Automated DNA sequencing of the human HPRT locus. Genomics 6, 593–608 (1990).

  38. 38.

    A strategy for sequencing the genome 5 years early. Science 267, 783–784 (1995).

  39. 39.

    Project to sequence human genome moves on to the starting blocks. Nature 375, 93–94 (1995).

  40. 40.

    et al. Cloning and stable maintenance of 300-kilobase-pair fragments of human DNA in Escherichia coli using an F-factor-based vector. Proc. Natl Acad. Sci. USA 89, 8794–8797 (1992).

  41. 41.

    , & Cloning of large segments of exogenous DNA into yeast by means of artificial chromosome vectors. Science 236, 806–812 (1987).

  42. 42.

    A second private genome project. Science 281, 1121 (1998).

  43. 43.

    NIH to produce a ‘working draft’ of the genome by 2001. Science 281, 1774–1775 (1998).

  44. 44.

    Academic sequencers challenge Celera in a sprint to the finish. Science 283, 1822–1823 (1999).

  45. 45.

    , , , & Analysis of the quality and utility of random shotgun sequencing at low redundancies. Genome Res. 8, 1074–1084 (1998).

  46. 46.

    et al. New goals for the U. S. Human Genome Project: 1998–2003. Science 282, 682–689 (1998).

  47. 47.

    & A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase. J. Mol. Biol. 94, 441–448 (1975).

  48. 48.

    & A new method for sequencing DNA. Proc. Natl Acad. Sci. USA 74, 560–564 (1977).

  49. 49.

    Shotgun DNA sequencing using cloned DNase I-generated fragments. Nucleic Acids Res. 9, 3015–3027 (1981).

  50. 50.

    et al. The complete nucleotide sequence of an infectious clone of cauliflower mosaic virus by M13mp7 shotgun sequencing. Nucleic Acids Res. 9, 2871–2888 (1981).

  51. 51.

    Random subcloning of sonicated DNA: application to shotgun DNA sequence analysis. Anal. Biochem. 129, 216–223 (1983).

  52. 52.

    et al. Sequence and analysis of the human ABL gene, the BCR gene, and regions involved in the Philadelphia chromosomal translocation. Genomics 27, 67–82 (1995).

  53. 53.

    , & The complete 685-kilobase DNA sequence of the human beta T cell receptor locus. Science 272, 1755–1762 (1996).

  54. 54.

    et al. Organization, structure, and function of 95 kb of DNA spanning the murine T-cell receptor C alpha/C delta region. Genomics 13, 1209–1230 (1992).

  55. 55.

    et al. Identification of the breast cancer susceptibility gene BRCA2. Nature 378, 789–792 (1995).

  56. 56.

    et al. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269, 496–512 (1995).

  57. 57.

    & Genomic mapping by fingerprinting random clones: a mathematical analysis. Genomics 2, 231–239 (1988).

  58. 58.

    & Human whole-genome shotgun sequencing. Genome Res. 7, 401–409 (1997).

  59. 59.

    Against a whole-genome shotgun. Genome Res. 7, 410–417 (1997).

  60. 60.

    et al. Shotgun sequencing of the human genome. Science 280, 1540–1542 (1998).

  61. 61.

    et al. The sequence of the human genome. Science 291, 1304–1351 (2001).

  62. 62.

    et al. Fluorescence detection in automated DNA sequence analysis. Nature 321, 674–679 (1986).

  63. 63.

    , , , & Fluorescence energy-transfer dye-labeled primers for DNA sequencing and analysis. Proc. Natl Acad. Sci. USA 92, 4347–4351 (1995).

  64. 64.

    et al. New energy transfer dyes for DNA sequencing. Nucleic Acids Res. 25, 2816–2822 (1997).

  65. 65.

    et al. New dye-labeled terminators for improved DNA sequencing patterns. Nucleic Acids Res. 25, 4500–4504 (1997).

  66. 66.

    , & Electrophoretically uniform fluorescent dyes for automated DNA sequencing. Science 271, 1420–1422 (1996).

  67. 67.

    et al. A system for rapid DNA sequencing with fluorescent chain-terminating dideoxynucleotides. Science 238, 336–341 (1987).

  68. 68.

    & A novel thermostable polymerase for DNA sequencing. Nature 376, 796–797 (1995).

  69. 69.

    & Selective inactivation of the exonuclease activity of bacteriophage T7 DNA polymerase by in vitro mutagenesis. J. Biol. Chem. 264, 6447–6458 (1989).

  70. 70.

    & DNA sequence analysis with a modified bacteriophage T7 DNA polymerase—effect of pyrophosphorolysis and metal ions. J. Biol. Chem. 265, 8322–8328 (1990).

  71. 71.

    Improved double-stranded DNA sequencing using the linear polymerase chain reaction. Nucleic Acids Res. 17, 8889 (1989).

  72. 72.

    , , & Analytical and micropreparative ultrahigh resolution of oligonucleotides by polyacrylamide-gel high-performance capillary electrophoresis. Anal. Chem. 62, 137–141 (1990).

  73. 73.

    et al. High-speed DNA sequencing by capillary electrophoresis. Nucleic Acids Res. 18, 4417–4421 (1990).

  74. 74.

    , , & Capillary gel-electrophoresis for DNA sequencing—laser-induced fluorescence detection with the sheath flow cuvette. J. Chromatogr. 516, 61–67 (1990).

  75. 75.

    Automation for genomics, part one: preparation for sequencing. Genome Res. 10, 1081–1092 (2000).

  76. 76.

    Automation for genomics, part two: sequencers, microarrays, and future trends. Genome Res. 10, 1288–1303 (2000).

  77. 77.

    & Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 8, 186–194 (1998).

  78. 78.

    , , & Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8, 175–185 (1998).

  79. 79.

    Genomic sequence information should be released immediately and freely in the public domain. Science 274, 533–534 (1996).

  80. 80.

    Statement on the rapid release of genomic DNA sequence. Genome Res. 8, 413 (1998).

  81. 81.

    et al. A genetic map of the mouse suitable for typing intraspecific crosses. Genetics 131, 423–447 (1992).

  82. 82.

    et al. Construction and characterization of a human bacterial artificial chromosome library. Genomics 34, 213–218 (1996).

  83. 83.

    et al. Bacterial artificial chromosome libraries for mouse sequencing and functional analysis. Genome Res. 10, 116–128 (2000).

  84. 84.

    et al. High throughput fingerprint analysis of large-insert clones. Genome Res. 7, 1072–1084 (1997).

  85. 85.

    et al. A map for sequence analysis of the Arabidopsis thaliana genome. Nature Genet. 22, 265–270 (1999).

  86. 86.

    . A physical map of the human genome. Nature 409, 934–941 (2001).

  87. 87.

    et al. Human BAC ends quality assessment and sequence analyses. Genomics 63, 321–332 (2000).

  88. 88.

    et al. Sequence-tagged connectors: A sequence approach to mapping and scanning the human genome. Proc. Natl Acad. Sci. USA 96, 9739–9744 (1999).

  89. 89.

    et al. A physical map of the human Y chromosome. Nature 409, 943–945 (2001).

  90. 90.

    et al. The physical maps for sequencing human chromosomes 1, 6, 9, 10, 13, 20 and X. Nature 409, 942–943 (2001).

  91. 91.

    et al. A high-resolution map of human chromosome 12. Nature 409, 945–946 (2001).

  92. 92.

    et al. A physical map of human chromosome 14. Nature 409, 947–948 (2001).

  93. 93.

    et al. The DNA sequence of human chromosome 21. Nature 405, 311–319 (2000).

  94. 94.

    et al. The DNA sequence of human chromosome 22. Nature 402, 489–495 (1999).

  95. 95.

    et al. Radiation hybrid map of the human genome. Science (in the press).

  96. 96.

    et al. An improved approach for construction of bacterial artificial chromosome libraries. Genomics 52, 1–8 (1998).

  97. 97.

    . A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409, 928–933 (2001).

  98. 98.

    , & A DNA polymorphism discovery resource for research on human genetic variation. Genome Res. 8, 1229–1231 (1998).

  99. 99.

    et al. An STS-based radiation hybrid map of the human genome. Genome Res. 7, 422–433 (1997).

  100. 100.

    et al. A physical map of 30,000 human genes. Science 282, 744–746 (1998).

  101. 101.

    et al. A comprehensive genetic map of the human genome based on 5,264 microsatellites. Nature 380, 152–154 (1996).

  102. 102.

    , , , & Comprehensive human genetic maps: individual and sex-specific variation in recombination. Am. J. Hum. Genet. 63, 861–869 (1998).

  103. 103.

    . Integration of cytogenetic landmarks into the draft sequence of the human genome. Nature 409, 953–958 (2001).

  104. 104.

    & GigAssembler: an algorithm for the initial assembly of the human working draft . Technical Report UCSC-CRL-00-17 (Univ. California at Santa Cruz, Santa Cruz, California, 2001).

  105. 105.

    Parameters of the human genome. Proc. Natl Acad. Sci. USA 88, 7474–7476 (1991).

  106. 106.

    & Heterochromatic regions on chromosomes 1, 9, 16, and Y in children with some disturbances occurring during embryo development. Hum. Genet. 63, 183–188 (1983).

  107. 107.

    , & Constitutive heterochromatin C-band polymorphism in prostatic cancer. Cancer Genet. Cytogenet. 51, 57–62 (1991).

  108. 108.

    , , , & Human centromeric DNAs. Hum. Genet. 100, 291–304 (1997).

  109. 109.

    et al. Integration of telomere sequences with the draft human genome sequence. Nature 409, 953–958 (2001).

  110. 110.

    & RefSeq and LocusLink: NCBI gene-centered resources. Nucleic Acids Res. 29, 137–140 (2001).

  111. 111.

    , & Guide to the draft human genome. Nature 409, 824–826 (2001).

  112. 112.

    & Evolutionary genomics: reading the bands. Bioessays 22, 105–107 (2000).

  113. 113.

    et al. Correlations between isochores and chromosomal bands in the human genome. Proc. Natl Acad. Sci. USA 90, 11929–11933 (1993).

  114. 114.

    , & The gene distribution of the human genome. Gene 174, 95–102 (1996).

  115. 115.

    Base composition and gene distribution: critical patterns in mammalian genome organization. Trends Genet. 12, 519–524 (1996).

  116. 116.

    , & Statistical analysis of vertebrate sequences reveals that long genes are scarce in GC-rich isochores. J. Mol. Evol. 40, 308–317 (1995).

  117. 117.

    , , & The highest gene concentrations in the human genome are in telomeric bands of metaphase chromosomes. Proc. Natl Acad. Sci. USA 89, 4913–4917 (1992).

  118. 118.

    et al. The mosaic genome of warm-blooded vertebrates. Science 228, 953–958 (1985).

  119. 119.

    Isochores and the evolutionary genomics of vertebrates. Gene 241, 3–17 (2000).

  120. 120.

    , & Base compositional structure of genomes. Genomics 13, 1056–1064 (1992).

  121. 121.

    Stochastic models for heterogeneous DNA sequences. Bull. Math. Biol. 51, 79–94 (1989).

  122. 122.

    , , , & A fraction of the mouse genome that is derived from islands of nonmethylated, CpG-rich DNA. Cell 40, 91–99 (1985).

  123. 123.

    CpG islands as gene markers in the vertebrate nucleus. Trends Genet. 3, 342–347 (1987).

  124. 124.

    , & Relationship between transcription and DNA methylation. Curr. Top. Microbiol. Immunol. 249, 75–86 (2000).

  125. 125.

    & DNA modification mechanisms and gene activity during development. Science 187, 226–232 (1975).

  126. 126.

    , , & CpG islands as gene markers in the human genome. Genomics 13, 1095–1107 (1992).

  127. 127.

    & Alternative chromatin structure at CpG islands. Cell 60, 909–920 (1990).

  128. 128.

    & CpG islands in vertebrate genomes. J. Mol. Biol. 196, 261–282 (1987).

  129. 129.

    & Number of CpG islands and genes in human and mouse. Proc. Natl Acad. Sci. USA 90, 11995–11999 (1993).

  130. 130.

    & Analysis of expressed sequence tags indicates 35,000 human genes. Nature Genet. 25, 232–234 (2000).

  131. 131.

    Comparison of human genetic and sequence-based physical maps. Nature 409, 951–953 (2001).

  132. 132.

    , , & Chromosome size-dependent control of meiotic recombination. Science 256, 228–232 (1992).

  133. 133.

    et al. Physical maps of the 6 smallest chromosomes of Saccharomyces cerevisiae at a resolution of 2.6-kilobase pairs. Genetics 134, 81–150 (1993).

  134. 134.

    et al. Patterns of meiotic recombination on the long arm of human chromosome 21. Genome Res. 10, 1319–1332 (2000).

  135. 135.

    & Further studies on bivalent chiasma frequency in human males with normal karyotypes. Ann. Hum. Genet. 49, 189–201 (1985).

  136. 136.

    Meiotic chromosomes: it takes two to tango. Genes Dev. 11, 2600–2621 (1997).

  137. 137.

    & Meiosis-induced double-strand break sites determined by yeast chromatin structure. Science 263, 515–518 (1994).

  138. 138.

    et al. Global mapping of meiotic recombination hotspots and coldspots in the yeast Saccharomyces cerevisiae. Proc. Natl Acad. Sci. USA 97, 11383–11390 (2000).

  139. 139.

    Molecular Evolution (Sinauer, Sunderland, Massachusetts, 1997).

  140. 140.

    & The modulation of DNA content: proximate causes and ultimate consequences. Genome Res. 9, 317–324 (1999).

  141. 141.

    Molecular melodies in high and low C. Nature Rev. Genet. 1, 145–149 (2000).

  142. 142.

    Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr. Opin. Genet. Dev. 9, 657–663 (1999).

  143. 143.

    & Jr Mobile elements and the human genome. Nature Rev. Genet. 1, 134–144 (2000).

  144. 144.

    , , & SINEs and LINEs share common 3′ sequences: a review. Gene 205, 229–243 (1997).

  145. 145.

    , & Human LINE retrotransposons generate processed pseudogenes. Nature Genet. 24, 363–367 (2000).

  146. 146.

    et al. Human L1 retrotransposition: cis-preference vs. trans-complementation. Mol. Cell. Biol. 21, 1429–1439 (2001)

  147. 147.

    , & Poised for contagion: evolutionary origins of the infectious abilities of invertebrate retroviruses. Genome Res. 10, 1307–1318 (2000).

  148. 148.

    The origin of interspersed repeats in the human genome. Curr. Opin. Genet. Dev. 6, 743–748 (1996).

  149. 149.

    & A phylogenetic perspective on P transposable element evolution in Drosophila. Proc. Natl Acad. Sci. USA 94, 11428–11433 (1997).

  150. 150.

    , & Ancient and recent horizontal invasions of Drosophilids by P elements. J. Mol. Evol. 51, 577–586 (2000).

  151. 151.

    et al. Evidence for recent invasion of the medaka fish genome by the Tol2 transposable element. Genetics 155, 273–281 (2000).

  152. 152.

    & Recent horizontal transfer of a mariner transposable element among and between Diptera and Neuroptera. Mol. Biol. Evol. 12, 850–862 (1995).

  153. 153.

    Horizontal transfer of hobo transposable elements within the Drosophila melanogaster species complex: evidence from DNA sequencing. Mol. Biol. Evol. 9, 1050–1060 (1992).

  154. 154.

    , & The age and evolution of non-LTR retrotransposable elements. Mol. Biol. Evol. 16, 793–805 (1999).

  155. 155.

    & Bov-B long interspersed repeated DNA (LINE) sequences are present in Vipera ammodytes phospholipase A2 genes and in genomes of Viperidae snakes. Eur. J. Biochem. 246, 772–779 (1997).

  156. 156.

    Repbase update: a database and an electronic journal of repetitive elements. Trends Genet. 16, 418–420 (2000).

  157. 157.

    & Generation time and genome evolution in primates. Science 179, 1144–1147 (1973).

  158. 158.

    , , & Ancestral, mammalian-wide subfamilies of LINE-1 repetitive sequences. J. Mol. Biol. 246, 401–417 (1995).

  159. 159.

    & Gross chromosome rearrangements mediated by transposable elements in Drosophila melanogaster. Bioessays 16, 269–275 (1994).

  160. 160.

    , , , & Generation of a widespread Drosophila inversion by a transposable element. Science 285, 415–418 (1999).

  161. 161.

    It takes two transposons to tango: transposable-element-mediated chromosomal rearrangements. Trends Genet. 16, 461–468 (2000).

  162. 162.

    & Genome rearrangements by nonlinear transposons in maize. Genetics 153, 1403–1410 (1999).

  163. 163.

    Identification of a new, abundant superfamily of mammalian LTR-transposons. Nucleic Acids Res. 21, 1863–1872 (1993).

  164. 164.

    , & Isolation of novel human endogenous retrovirus-like elements with foamy virus-related pol sequence. J. Virol. 69, 5890–5897 (1995).

  165. 165.

    & Human-specific integrations of the HERV-K endogenous retrovirus family. J. Virol. 72, 9782–9787 (1998).

  166. 166.

    et al. A whole-genome assembly of Drosophila. Science 287, 2196–2204 (2000).

  167. 167.

    , & High intrinsic rate of DNA loss in Drosophila. Nature 384, 346–349 (1996).

  168. 168.

    , , , & Rates of nucleotide substitution in primates and rodents and the generation-time effect hypothesis. Mol. Phylogenet. Evol. 5, 182–187 (1996).

  169. 169.

    et al. Toward a phylogenetic classification of primates based on DNA evidence complemented by fossil evidence. Mol. Phylogenet. Evol. 9, 585–598 (1998).

  170. 170.

    & The impact of L1 retrotransposons on the human genome. Nature Genet. 19, 19–24 (1998).

  171. 171.

    & NeSL-1, an ancient lineage of site-specific non-LTR retrotransposons from Caenorhabditis elegans. Genetics 154, 193–203 (2000).

  172. 172.

    et al. The end of the LINE?: lack of recent L1 activity in a group of South American rodents. Genetics 154, 1809–1817 (2000).

  173. 173.

    , , , & Sequence organization and genomic distribution of the major family of interspersed repeats of mouse DNA. Proc. Natl Acad. Sci. USA 79, 355–359 (1982).

  174. 174.

    , & The distribution of interspersed repeats is nonuniform and conserved in the mouse and human genomes. Proc. Natl Acad. Sci. USA 80, 1816–1820 (1983).

  175. 175.

    , , , & Replication timing of genes and middle repetitive sequences. Science 224, 686–692 (1984).

  176. 176.

    & Chromosomal and nuclear distribution of the HindIII 1.9-kb human DNA repeat segment. Chromosoma 91, 28–38 (1984).

  177. 177.

    , , & Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition. Cell 87, 905–916 (1996).

  178. 178.

    Sequence patterns indicate an enzymatic involvement in integration of mammalian retroposons. Proc. Natl Acad. Sci. USA 94, 1872–1877 (1997).

  179. 179.

    et al. High-resolution cartography of recently integrated human chromosome 19-specific Alu fossils. J. Mol. Biol. 281, 843–856 (1998).

  180. 180.

    Does SINE evolution preclude Alu function? Nucleic Acids Res. 26, 4541–4550 (1998).

  181. 181.

    , , , & Potential Alu function: regulation of the activity of double-stranded RNA-activated kinase PKR. Mol. Cell. Biol. 18, 58–68 (1998).

  182. 182.

    , , & Physiological stresses increase mouse short interspersed element (SINE) RNA expression in vivo. Gene 239, 367–372 (1999).

  183. 183.

    , , & Cell stress and translational inhibitors transiently increase the abundance of mammalian SINE transcripts. Nucleic Acids Res. 23, 1758–1765 (1995).

  184. 184.

    Correlation between molecular clock ticking, codon usage fidelity of DNA repair, chromosome banding and chromatin compactness in germline cells. FEBS Lett. 217, 184–186 (1987).

  185. 185.

    Directional mutation pressure and neutral molecular evolution. Proc. Natl Acad. Sci. USA 85, 2653–2657 (1988).

  186. 186.

    , & Mutation rates differ among regions of the mammalian genome. Nature 337, 283–285 (1989).

  187. 187.

    Local sequence dependence of rate of base replacement in mammals. Mutat. Res. 267, 43–54 (1992).

  188. 188.

    & DNA precursor asymmetries, replication fidelity, and variable genome evolution. Bioessays 14, 295–301 (1992).

  189. 189.

    & Organization of mutations along the genome: a prime determinant of genome evolution. Trends Ecol. Evol. 9, 65–68 (1994).

  190. 190.

    Evidence of selection on silent site base composition in mammals: potential implications for the evolution of isochores and junk DNA. Genetics 152, 675–683 (1999).

  191. 191.

    . An SNP map of the human genome generated by reduced representation shotgun sequencing. Nature 407, 513–516 (2000).

  192. 192.

    , & Unexpectedly similar rates of nucleotide substitution found in male and female hominids. Nature 406, 622–625 (2000).

  193. 193.

    , & Unit-length LINE-1 transcripts in human teratocarcinoma cells. Mol. Cell. Biol. 8, 1385–1397 (1988).

  194. 194.

    , & L1 (LINE-1) retrotransposon evolution and amplification in recent human history. Mol. Biol. Evol. 17, 915–928 (2000).

  195. 195.

    Human L1 retrotransposition: insights and peculiarities learned from a cultured cell retrotransposition assay. Genetica 107, 39–51 (1999).

  196. 196.

    et al. Haemophilia A resulting from de novo insertion of L1 sequences represents a novel mechanism for mutation in man. Nature 332, 164–166 (1988).

  197. 197.

    et al. Reading between the LINEs: Human genomic variation introduced by LINE-1 retrotransposition. Genome Res. 10, 1496–1508 (2000).

  198. 198.

    , , , & Isolation of an active human transposable element. Science 254, 1805–1808 (1991).

  199. 199.

    , , , & A new retrotransposable human L1 element from the LRE2 locus on chromosome 1q produces a chimaeric insertion. Nature Genet. 7, 143–148 (1994).

  200. 200.

    et al. Many human L1 elements are capable of retrotransposition. Nature Genet. 16, 37–43 (1997).

  201. 201.

    , & Two additional potential retrotransposons isolated from a human L1 subfamily that contains an active retrotransposable element. Proc. Natl Acad. Sci. USA 90, 6513–6517 (1993).

  202. 202.

    et al. Full-length human L1 insertions retain the capacity for high frequency retrotransposition in cultured cells. Hum. Mol. Genet. 8, 1557–1560 (1999).

  203. 203.

    et al. High frequency retrotransposition in cultured mammalian cells. Cell 87, 917–927 (1996).

  204. 204.

    , & Exon shuffling by L1 retrotransposition. Science 283, 1530–1534 (1999).

  205. 205.

    , , & Frequent human genomic DNA transduction driven by LINE-1 retrotransposition. Genome Res. 10, 411–415 (2000).

  206. 206.

    et al. Disruption of the APC gene by a retrotransposal insertion of L1 sequence in a colon cancer. Cancer Res. 52, 643–645 (1992).

  207. 207.

    & Developmental and cell type specificity of LINE-1 expression in mouse testis: implications for transposition. Mol. Cell. Biol. 14, 2584–2592 (1994).

  208. 208.

    & Tightly regulated, developmentally specific expression of the first open reading frame from LINE-1 during mouse embryogenesis. Proc. Natl Acad. Sci. USA 92, 1520–1524 (1995).

  209. 209.

    & Sectorial mutagenesis by transposable elements. Genetica 107, 239–248 (1999).

  210. 210.

    , , & Precise excision of TTAA-specific lepidopteran transposons piggyBac (IFP2) and tagalong (TFP3) from the baculovirus genome in cell lines from two species of Lepidoptera. Insect Mol. Biol. 5, 141–151 (1996).

  211. 211.

    Genomes were forged by massive bombardments with retroelements and retrosequences. Genetica 107, 209–238 (1999).

  212. 212.

    , , & Equilibrium distribution of microsatellite repeat length resulting from a balance between slippage events and point mutations. Proc. Natl Acad. Sci. USA 95, 10774–10778 (1998).

  213. 213.

    , & Microsatellites in different eukaryotic genomes: survey and analysis. Genome Res. 10, 967–981 (2000).

  214. 214.

    Heterogeneous mutation processes in human microsatellite DNA sequences. Nature Genet. 24, 400–402 (2000).

  215. 215.

    , , & Structure of chromosomal duplicons and their role in mediating human genomic disorders. Genome Res. 10, 597–610 (2000).

  216. 216.

    Masquerading repeats: paralogous pitfalls of the human genome. Genome Res. 8, 758–762 (1998).

  217. 217.

    & Pathological consequences of sequence duplications in the human genome. Genome Res. 8, 1007–1021 (1998).

  218. 218.

    et al. Interchromosomal duplications of the adrenoleukodystrophy locus: a phenomenon of pericentromeric plasticity. Hum. Mol. Genet. 6, 991–1002 (1997).

  219. 219.

    , & The mosaic structure of human pericentromeric DNA: a strategy for characterizing complex regions of the human genome. Genome Res. 10, 839–852 (2000).

  220. 220.

    et al. A genomic region encompassing a cluster of olfactory receptor genes and a myosin light chain kinase (MYLK) gene is duplicated on human chromosome regions 3q13-q21 and 3p13. Genomics 56, 98–110 (1999).

  221. 221.

    , , & Comparative mapping of DNA probes derived from the V kappa immunoglobulin gene regions on human and great ape chromosomes by fluorescence in situ hybridization. Genomics 26, 147–150 (1995).

  222. 222.

    et al. Duplication of a gene-rich cluster between 16p11.1 and Xq28: a novel pericentromeric-directed mechanism for paralogous genome evolution. Hum. Mol. Genet. 5, 899–912 (1996).

  223. 223.

    et al. Two sequence-ready contigs spanning the two copies of a 200-kb duplication on human 21q: partial sequence and polymorphisms. Genomics 51, 417–426 (1998).

  224. 224.

    et al. Emergence and scattering of multiple neurofibromatosis (NF1)-related sequences during hominoid evolution suggest a process of pericentromeric interchromosomal transposition. Hum. Mol. Genet. 6, 9–16 (1997).

  225. 225.

    , & A large polymorphic repeat in the pericentromeric region of human chromosome 15q contains three partial gene duplications. Hum. Mol. Genet. 7, 1253–1260 (1998).

  226. 226.

    et al. Members of the olfactory receptor gene family are contained in large blocks of DNA duplicated polymorphically near the ends of human chromosomes. Hum. Mol. Genet. 7, 13–26 (1998).

  227. 227.

    et al. Large multi-chromosomal duplications encompass many members of the olfactory receptor gene family in the human genome. Hum. Mol. Genet. 7, 2007–2020 (1998).

  228. 228.

    et al. Identification of the first gene (FRG1) from the FSHD region on human chromosome 4q35. Hum. Mol. Genet. 5, 581–590 (1996).

  229. 229.

    The immunoglobulin kappa locus—or—what has been learned from looking closely at one-tenth of a percent of the human genome. Gene 135, 167–173 (1993).

  230. 230.

    , , , & Fluorescence in situ hybridization analysis of keratinocyte growth factor gene amplification and dispersion in evolution of great apes and humans. Proc. Natl Acad. Sci. USA 94, 11461–11465 (1997).

  231. 231.

    et al. The FSHD region on human chromosome 4q35 contains potential coding regions among pseudogenes and a high density of repeat elements. Genomics 61, 55–65 (1999).

  232. 232.

    et al. Molecular structure and evolution of an alpha satellite/non-alpha satellite junction at 16p11. Hum. Mol. Genet. 9, 113–123 (2000).

  233. 233.

    et al. Genomic sequence and transcriptional profile of the boundary between pericentromeric satellites and genes on human chromosome arm 10q. Hum. Mol. Genet. 9, 2029–2042 (2000).

  234. 234.

    , , , & The human COX10 gene is disrupted during homologous recombination between the 24 kb proximal and distal CMT1A-REPs. Hum. Mol. Genet. 6, 1595–1603 (1997).

  235. 235.

    et al. Chromosome breakage in the Prader-Willi and Angelman syndromes involves recombination between large, transcribed repeats at proximal and distal breakpoints. Am. J. Hum. Genet. 65, 370–386 (1999).

  236. 236.

    , , , & Large genomic duplicons map to sites of instability in the Prader-Willi/Angelman syndrome chromosome region (15q11-q13). Hum. Mol. Genet. 8, 1025–1037 (1999).

  237. 237.

    , & Low-copy repeats mediate the common 3-Mb deletion in patients with velo-cardio-facial syndrome. Am. J. Hum. Genet. 64, 1076–1086 (1999).

  238. 238.

    et al. Chromosome 22-specific low copy repeats and the 22q11.2 deletion syndrome: genomic organization and deletion endpoint analysis. Hum. Mol. Genet. 9, 489–501 (2000).

  239. 239.

    Williams-Beuren syndrome: genes and mechanisms. Hum. Mol. Genet. 8, 1947–1954 (1999).

  240. 240.

    et al. A physical map, including a BAC/PAC clone contig, of the Williams-Beuren syndrome-deletion region at 7q11.23. Am. J. Hum. Genet. 66, 47–68 (2000).

  241. 241.

    , & CAGGG repeats and the pericentromeric duplication of the hominoid genome. Genome Res. 9, 1048–1058 (1999).

  242. 242.

    & in Comparative Genomics: Empirical and Analytical Approaches to Gene Order Dynamics, Map Alignment and the Evolution of Gene Families (eds Sankoff, D. & Nadeau, J.) 29–46 (Kluwer Academic, Dordrecht, 2000).

  243. 243.

    The new genomics: Global views of biology. Science 274, 536–539 (1996).

  244. 244.

    Noncoding RNA genes. Curr. Opin. Genet. Dev. 9, 695–699 (1999).

  245. 245.

    , , , & The complete atomic structure of the large ribosomal subunit at 2.4 angstrom resolution. Science 289, 905–920 (2000).

  246. 246.

    , , , & The structural basis of ribosome activity in peptide bond synthesis. Science 289, 920–930 (2000).

  247. 247.

    & Guided tours: from precursor snoRNA to functional snoRNP. Curr. Opin. Cell Biol. 11, 378–384 (1999).

  248. 248.

    & in Modification and Editing of RNA (eds Grosjean, H. & Benne, R.) 255–272 (ASM, Washington DC, 1998).

  249. 249.

    & Classification of introns: U2-type or U12-type. Cell 91, 875–879 (1997).

  250. 250.

    et al. The Human Xist gene—analysis of a 17 kb inactive X-specific RNA that contains conserved repeats and is highly localized within the nucleus. Cell 71, 527–542 (1992).

  251. 251.

    , & Vaults are the answer, what is the question? Trends Cell Biol. 6, 174–178 (1996).

  252. 252.

    & Proportion of the HeLa cell genome complementary to the transfer RNA and 5S RNA. J. Mol. Biol. 56, 535–553 (1971).

  253. 253.

    , , , & Compilation of tRNA sequences and sequences of tRNA genes. Nucleic Acids Res. 26, 148–153 (1998).

  254. 254.

    & Repeated genes in eukaryotes. Annu. Rev. Biochem. 49, 727–764 (1980).

  255. 255.

    Codon–anticodon pairing: the wobble hypothesis. J. Mol. Biol. 19, 548–555 (1966).

  256. 256.

    & in The Molecular Biology of the Yeast Saccharomyces: Metabolism and Gene Expression (eds Strathern, J. & Broach, J.) 487–528 (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 1982).

  257. 257.

    & (eds) tRNA: Structure, Biosynthesis, and Function (ASM, Washington DC, 1995).

  258. 258.

    Codon usage and tRNA content in unicellular and multicellular organisms. Mol. Biol. Evol. 2, 13–34 (1985).

  259. 259.

    Coevolution of codon usage and transfer-RNA abundance. Nature 325, 728–730 (1987).

  260. 260.

    tRNA gene number and codon usage in the C. elegans genome are co-adapted for optimal translation of highly expressed genes. Trends Genet. 16, 287–289 (2000).

  261. 261.

    & Codon usage and genome evolution. Curr. Opin. Genet. Dev. 4, 851–860 (1994).

  262. 262.

    A primate transfer-RNA gene cluster and the evolution of human chromosome 1. Cytogenet. Cell Genet. 61, 1–4 (1992).

  263. 263.

    & Human tRNA-Glu genes: their copy number and organization. FEBS Lett. 276, 138–142 (1990).

  264. 264.

    et al. The human ribosomal RNA genes: structure and organization of the complete repeating unit. Hum. Genet. 73, 193–198 (1986).

  265. 265.

    & Characterization of human 5S ribosomal RNA genes. Nucleic Acids Res. 19, 4147–4151 (1991).

  266. 266.

    et al. [Organization of a 5S ribosomal RNA gene cluster in the human genome]. Mol. Biol. (Mosk.) 27, 861–868 (1993).

  267. 267.

    & Genomic organization of human 5S rDNA and sequence of one tandem repeat. Genomics 4, 376–383 (1989).

  268. 268.

    The numerous modified nucleotides in eukaryotic ribosomal RNA. Prog. Nucleic Acid Res. Mol. Biol. 39, 241–303 (1990).

  269. 269.

    , , & Modification of U6 spliceosomal RNA is guided by other small RNAs. Mol. Cell 2, 629–638 (1998).

  270. 270.

    , & Concerted evolution of the tandem array encoding primate U2 snRNA (the RNU2 locus) is accompanied by dramatic remodeling of the junctions with flanking chromosomal sequences. EMBO J. 18, 3783–3792 (1999).

  271. 271.

    , , & Human genes for U2 small nuclear RNA map to a major adenovirus 12 modification site on chromosome 17. Nature 314, 115–116 (1985).

  272. 272.

    & Human genes for U2 small nuclear RNA are tandemly repeated. Mol. Cell. Biol. 4, 492–499 (1984).

  273. 273.

    , & Human genes encoding U3 snRNA associate with coiled bodies in interphase cells and are clustered on chromosome 17p11.2 in a complex inverted repeat structure. Nucleic Acids Res. 25, 4740–4747 (1997).

  274. 274.

    A survey on intron and exon lengths. Nucleic Acids Res. 16, 9893–9908 (1988).

  275. 275.

    & Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78–94 (1997).

  276. 276.

    & Titins: giant proteins in charge of muscle ultrastructure and elasticity. Science 270, 293–296 (1995).

  277. 277.

    , & Architectural limits on split genes. Proc. Natl Acad. Sci. USA 93, 15081–15085 (1996).

  278. 278.

    , , , & General splicing factor SF2/ASF promotes alternative splicing by binding to an exonic splicing enhancer. Genes Dev. 7, 2598–2608 (1993).

  279. 279.

    , & Polypurine sequences within a downstream exon function as a splicing enhancer. Mol. Cell. Biol. 14, 1347–1354 (1994).

  280. 280.

    , & An intron splicing enhancer containing a G-rich repeat facilitates inclusion of a vertebrate micro-exon. RNA 2, 342–353 (1996).

  281. 281.

    , & Analysis of canonical and non-canonical splice sites in mammalian genomes. Nucleic Acids Res. 28, 4364–4375 (2000).

  282. 282.

    , & Evolutionary fates and origins of U12-type introns. Mol. Cell 2, 773–785 (1998).

  283. 283.

    , & Frequent alternative splicing of human genes. Genome Res. 9, 1288–1293 (1999).

  284. 284.

    et al. Alternative splicing of human genes: more the rule than the exception? Trends Genet. 15, 389–390 (1999).

  285. 285.

    et al. EST comparison indicates 38% of human mRNAs contain possible alternative splice forms. FEBS Lett. 474, 83–86 (2000).

  286. 286.

    The gene guessing game. Yeast 17, 218–224 (2000).

  287. 287.

    Gene Expression (Wiley, New York, 1980).

  288. 288.

    Genes IV 466–481 (Oxford Univ. Press, Oxford, 1990).

  289. 289.

    Researchers take a gamble on the human genome. Nature 405, 264 (2000).

  290. 290.

    , , & How many genes in the human genome? Nature Genet. 7, 345–346 (1994).

  291. 291.

    et al. Gene index analysis of the human genome estimates approximately 120,000 genes. Nature Genet. 25, 239–240 (2000).

  292. 292.

    et al. Estimate of human gene number provided by genome-wide analysis using Tetraodon nigroviridis DNA sequence. Nature Genet. 25, 235–238 (2000).

  293. 293.

    Genome sequence of the nematode C. elegans: A platform for investigating biology. Science 282, 2012–2018 (1998).

  294. 294.

    et al. Comparative genomics of the eukaryotes. Science 287, 2204–2215 (2000).

  295. 295.

    et al. Ancient conserved regions in new gene sequences and the protein databases. Science 259, 1711–1716 (1993).

  296. 296.

    et al. Functional genomic analysis of C. elegans chromosome I by systematic RNA interference. Nature 408, 325–330 (2000).

  297. 297.

    EST_GENOME: a program to align spliced DNA sequences to unspliced genomic DNA. Comput. Appl. Biosci. 13, 477–478 (1997).

  298. 298.

    , , , & A computer program for aligning a cDNA sequence with a genomic DNA sequence. Genome Res. 8, 967–974 (1998).

  299. 299.

    , & Analysis of EST-driven gene annotation in human genomic sequence. Genome Res. 8, 362–376 (1998).

  300. 300.

    , & PairWise and SearchWise: finding the optimal alignment in a simultaneous comparison of a protein profile against all DNA translation frames. Nucleic Acids Res. 24, 2730–2739 (1996).

  301. 301.

    , & Gene recognition via spliced sequence alignment. Proc. Natl Acad. Sci. USA 93, 9061–9066 (1996).

  302. 302.

    , , & A generalized hidden Markov model for the recognition of human genes in DNA. ISMB 4, 134–142 (1996).

  303. 303.

    , , & Genie—gene finding in Drosophila melanogaster. Genome Res. 10, 529–538 (2000).

  304. 304.

    & The Gene-Finder computer tools for analysis of human and model organisms genome sequences. ISMB 5, 294–302 (1997).

  305. 305.

    , , , & An assessment of gene prediction accuracy in large DNA sequences. Genome Res. 10, 1631–1642 (2000).

  306. 306.

    & Open annotation offers a democratic solution to genome sequencing. Nature 403, 825 (2000).

  307. 307.

    et al. The Pfam protein families database. Nucleic Acids Res. 28, 263–266 (2000).

  308. 308.

    & Using GeneWise in the Drosophila annotation experiment. Genome Res. 10, 547–548 (2000).

  309. 309.

    Functional annotation of a full-length mouse cDNA collection. Nature 409, 685–690 (2001).

  310. 310.

    , & Small open reading frames: beautiful needles in the haystack. Genome Res. 7, 768–771 (1997).

  311. 311.

    & Domains in proteins: definitions, location, and structural principles. Methods Enzymol. 115, 420–430 (1985).

  312. 312.

    , , , & Evolution of domain families. Adv. Protein Chem. 54, 185–244 (2000).

  313. 313.

    The multiplicity of domains in proteins. Annu. Rev. Biochem. 64, 287–314 (1995).

  314. 314.

    & Searching databases to find protein domain organization. Adv. Protein Chem. 54, 137–157 (2000).

  315. 315.

    et al. Cancer and genomics. Nature 409, 850–852 (2001).

  316. 316.

    & Learning about addiction from the human draft genome. Nature 409, 834–835 (2001).

  317. 317.

    , & Expressing the human genome. Nature 409, 832–835 (2001).

  318. 318.

    , , , & A genomic view of immunology. Nature 409, 836–838 (2001).

  319. 319.

    , , & Evolutionary analyses of the human genome. Nature 409, 847–849 (2001).

  320. 320.

    , , & A genomic perspective on membrane compartment organization. Nature 409, 839–841 (2001).

  321. 321.

    Genomics, the cytoskeleton and motility. Nature 409, 842–843 (2001).

  322. 322.

    & Can sequencing shed light on cell cycling? Nature 409, 844–846 (2001).

  323. 323.

    , & Keeping time with the human genome. Nature 409, 829–831 (2001).

  324. 324.

    et al. Comparison of the complete protein sets of worm and yeast: orthology and divergence. Science 282, 2022–2028 (1998).

  325. 325.

    & Origin of multicellular eukaryotes—insights from proteome comparisons. Curr. Opin. Genet. Dev. 9, 688–694 (1999).

  326. 326.

    et al. PRINTS-S: the database formerly known as PRINTS. Nucleic Acids Res. 28, 225–227 (2000).

  327. 327.

    , , & The PROSITE database, its status in 1999. Nucleic Acids Res. 27, 215–219 (1999).

  328. 328.

    et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).

  329. 329.

    , & No footprints of primordial introns in a eukaryotic genome. Trends Genet. 16, 333–334 (2000).

  330. 330.

    , , , & A. Abnormal behavior associated with a point mutation in the structural gene for monoamine oxidase A. Science 262, 578–580 (1993).

  331. 331.

    et al. Aggressive behavior and altered amounts of brain serotonin and norepinephrine in mice lacking MAOA. Science 268, 1763–1766 (1995).

  332. 332.

    et al. X-linked borderline mental retardation with prominent behavioral disturbance: phenotype, genetic localization, and evidence for disturbed monoamine metabolism. Am. J. Hum. Genet. 52, 1032–1039 (1993).

  333. 333.

    et al. Excess of high activity monoamine oxidase A gene promoter alleles in female patients with panic disorder. Hum. Mol. Genet. 8, 621–624 (1999).

  334. 334.

    & Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197 (1981).

  335. 335.

    , & A genomic perspective on protein families. Science 278, 631–637 (1997).

  336. 336.

    , , , & Eukaryotic signalling domain homologues in archaea and bacteria. Ancient ancestry and horizontal gene transfer. J. Mol. Biol. 289, 729–745 (1999).

  337. 337.

    , & Evolution of the rodent eosinophil-associated RNase gene family by rapid gene sorting and positive selection. Proc. Natl Acad. Sci. USA 97, 4701–4706 (2000).

  338. 338.

    Ependymin, a brain extracellular glycoprotein, and CNS plasticity. Ann. NY Acad. Sci. 627, 94–114 (1991).

  339. 339.

    , , , & SMART: a web-based tool for the study of genetically mobile domains. Nucleic Acids Res. 28, 231–234 (2000).

  340. 340.

    , & The impact of comparative genomics on our understanding of evolution. Cell 101, 573–576 (2000).

  341. 341.

    , & Members of the immunoglobulin superfamily in bacteria. Protein Sci. 5, 1939–1941 (1996).

  342. 342.

    , & Branchless encodes a Drosophila FGF homolog that controls tracheal cell migration and the pattern of branching. Cell 87, 1091–1101 (1996).

  343. 343.

    et al. The molecular basis of lung morphogenesis. Mech. Dev. 92, 55–81 (2000).

  344. 344.

    , , , & The human olfactory subgenome: from sequence to structure to evolution. Hum. Genet. 108, 1–13 (2001).

  345. 345.

    et al. The olfactory receptor gene family: data mining, classification and nomenclature. Mamm. Genome 11, 1016–1023 (2000).

  346. 346.

    et al. Distribution of olfactory receptor genes in the human genome. Nature Genet. 18, 243–250 (1998).

  347. 347.

    et al. Primate evolution of an olfactory receptor cluster: Diversification by gene conversion and recent emergence of a pseudogene. Genomics 61, 24–36 (1999).

  348. 348.

    et al. Dichotomy of single-nucleotide polymorphism haplotypes in olfactory receptor genes and pseudogenes. Nature Genet. 26, 221–224 (2000).

  349. 349.

    & Cells, Embryos, and Evolution (Blackwell Science, Malden, Massachusetts, 1997).

  350. 350.

    et al. The syntenic relationship of the zebrafish and human genomes. Genome Res. 10, 1351–1358 (2000).

  351. 351.

    , , & Estimation of synteny conservation and genome compaction between pufferfish (Fugu) and human. Yeast 17, 22–36 (2000).

  352. 352.

    et al. Linkage of TATA-binding protein and proteasome subunit C5 genes in mice and humans reveals synteny conserved between mammals and invertebrates. Genomics 44, 1–7 (1997).

  353. 353.

    Maps of linkage and synteny homologies between mouse and man. Trends Genet. 5, 82–86 (1989).

  354. 354.

    & Lengths of chromosomal segments conserved since divergence of man and mouse. Proc. Natl Acad. Sci. USA 81, 814–818 (1984).

  355. 355.

    et al. A genetic linkage map of the mouse: current applications and future prospects. Science 262, 57–66 (1993).

  356. 356.

    & Human/mouse homology relationships. Genomics 33, 337–351 (1996).

  357. 357.

    & The lengths of undiscovered conserved segments in comparative maps. Mamm. Genome 9, 491–495 (1998).

  358. 358.

    et al. Comparative genome mapping in the sequence-based era: early experience with human chromosome 7. Genome Res. 10, 624–633 (2000).

  359. 359.

    et al. Chromosome evolution: The junction of mammalian chromosomes in the formation of mouse chromosome 10. Genome Res. 10, 1463–1467 (2000).

  360. 360.

    Mammalian phylogeny: shaking the tree. Nature 356, 121–125 (1992).

  361. 361.

    et al. Genome maps 10. Comparative genomics. Mammalian radiations. Wall chart. Science 286, 463–478 (1999).

  362. 362.

    Vertebrate Paleontology (Univ. Chicago Press, Chicago and New York, 1966).

  363. 363.

    et al. Toward a unified genetic map of higher plants, transcending the monocot-dicot divergence. Nature Genet. 14, 380–382 (1996).

  364. 364.

    , & Differentiation between natural and cultivated populations of Medicago sativa (Leguminosae) from Spain: analysis with random amplified polymorphic DNA (RAPD) markers and comparison to allozymes. Mol. Ecol. 8, 1317–1330 (1999).

  365. 365.

    Evolution by Gene Duplication (George Allen and Unwin, London, 1970).

  366. 366.

    & Molecular evidence for an ancient duplication of the entire yeast genome. Nature 387, 708–713 (1997).

  367. 367.

    , , , & Extensive duplication and reshuffling in the Arabidopsis genome. Plant Cell 12, 1093–1102 (2000).

  368. 368.

    et al. Comparative genomics of plant chromosomes. Plant Cell 12, 1523–1540 (2000).

  369. 369.

    , & The origins of genome duplications in Arabidopsis. Science 290, 2114–2117 (2000).

  370. 370.

    & Molecular phylogeny. Curr. Opin. Genet. Dev. 1, 451–456 (1991).

  371. 371.

    & A molecular evolutionary framework for eukaryotic model organisms. Curr. Biol. 4, 596–603 (1994).

  372. 372.

    Gen(om)e duplications in the evolution of early vertebrates. Curr. Opin. Genet. Dev. 6, 715–722 (1996).

  373. 373.

    Vertebrate evolution by interspecific hybridisation—are we polyploid? FEBS Lett. 400, 2–8 (1997).

  374. 374.

    & Eukaryote genome duplication—where's the evidence? Curr. Opin. Genet. Dev. 8, 694–700 (1998).

  375. 375.

    Phylogenies of developmentally important proteins do not support the hypothesis of two rounds of genome duplication early in vertebrate history. J. Mol. Evol. 48, 565–576 (1999).

  376. 376.

    & Genetic dissection of complex traits. Science 265, 2037–2048 (1994).

  377. 377.

    et al. Genetic variability in the gene encoding calpain-10 is associated with type 2 diabetes mellitus. Nature Genet. 26, 163–175 (2000).

  378. 378.

    et al. The diastrophic dysplasia gene encodes a novel sulfate transporter: positional cloning by fine-structure linkage disequilibrium mapping. Cell 78, 1073–1087 (1994).

  379. 379.

    et al. Global patterns of linkage disequilibrium at the CD4 locus and modern human origins. Science 271, 1380–1387 (1996).

  380. 380.

    et al. Haplotypes and linkage disequilibrium at the phenylalanine hydroxylase locus PAH, in a global representation of populations. Am. J. Hum. Genet. 63, 1882–1899 (2000).

  381. 381.

    et al. Worldwide genetic analysis of the CFTR region. Am. J. Hum. Genet. 68, 103–117 (2001).

  382. 382.

    et al. Extent and distribution of linkage disequilibrium in three genomic regions. Am. J. Hum. Genet. 68, 191–197 (2001).

  383. 383.

    et al. Juxtaposed regions of extensive and minimal linkage disequilibrium in Xq25 and Xq28. Nature Genet. 25, 324–328 (2000).

  384. 384.

    et al. SNPing away at complex diseases: analysis of single-nucleotide polymorphisms around APOE in Alzheimer disease. Am. J. Hum. Genet. 67, 383–394 (2000).

  385. 385.

    , & Genetic epidemiology of single-nucleotide polymorphisms. Proc. Natl Acad. Sci. USA 96, 15173–15177 (1999).

  386. 386.

    et al. The extent of linkage disequilibrium in four populations with distinct demographic histories. Am. J. Hum. Genet. 67, 1544–1554 (2000).

  387. 387.

    , , & Sequence variation in the human angiotensin converting enzyme. Nature Genet. 22, 59–62 (1999).

  388. 388.

    Positional cloning moves from perditional to traditional. Nature Genet. 9, 347–350 (1995).

  389. 389.

    et al. Positional cloning of the APECED gene. Nature Genet. 17, 393–398 (1997).

  390. 390.

    et al. Mutations in PEX1 are the most common cause of peroxisome biogenesis disorders. Nature Genet. 17, 445–448 (1997).

  391. 391.

    et al. Human PEX1 is mutated in complementation group 1 of the peroxisome biogenesis disorders. Nature Genet. 17, 449–452 (1997).

  392. 392.

    et al. Pendred syndrome is caused by mutations in a putative sulphate transporter gene (PDS). Nature Genet. 17, 411–422 (1997).

  393. 393.

    et al. Host response to EBV infection in X-linked lymphoproliferative disease results from mutations in an SH2-domain encoding gene. Nature Genet. 20, 129–135 (1998).

  394. 394.

    et al. Nonsyndromic hearing impairment is associated with a mutation in DFNA5. Nature Genet. 20, 194–197 (1998).

  395. 395.

    et al. Mutations in ATP2A2, encoding a Ca2+ pump, cause Darier disease. Nature Genet. 21, 271–277 (1999).

  396. 396.

    et al. Identification of the gene (SEDL) causing X-linked spondyloepiphyseal dysplasia tarda. Nature Genet. 22, 400–404 (1999).

  397. 397.

    et al. Mutations in the CCN gene family member WISP3 cause progressive pseudorheumatoid dysplasia. Nature Genet. 23, 94–98 (1999).

  398. 398.

    et al. Truncating mutations in CCM1, encoding KRIT1, cause hereditary cavernous angiomas. Nature Genet. 23, 189–193 (1999).

  399. 399.

    et al. Mutations in the gene encoding KRIT1, a Krev-1/rap1a binding protein, cause cerebral cavernous malformations (CCM1). Hum. Mol. Genet. 8, 2325–2333 (1999).

  400. 400.

    et al. Mutations in COL11A2 cause non-syndromic hearing loss (DFNA13). Nature Genet. 23, 413–419 (1999).

  401. 401.

    et al. Limb-girdle muscular dystrophy type 2G is caused by mutations in the gene encoding the sarcomeric protein telethonin. Nature Genet. 24, 163–166 (2000).

  402. 402.

    et al. Mutations in a new gene in Ellis-van Creveld syndrome and Weyers acrodental dysostosis. Nature Genet. 24, 283–286 (2000).

  403. 403.

    et al. Mutations in ACTN4, encoding alpha-actinin-4, cause familial focal segmental glomerulosclerosis. Nature Genet. 24, 251–256 (2000).

  404. 404.

    et al. Mutations of SCN1A, encoding a neuronal sodium channel, in two families with GEFS+2. Nature Genet. 24, 343–345 (2000).

  405. 405.

    et al. Identification of the alpha-aminoadipic semialdehyde synthase gene, which is defective in familial hyperlysinemia. Am. J. Hum. Genet. 66, 1736–1743 (2000).

  406. 406.

    et al. N-myc downstream-regulated gene 1 is mutated in hereditary motor and sensory neuropathy-Lom. Am. J. Hum. Genet. 67, 47–58 (2000).

  407. 407.

    et al. Genetic basis of total colourblindness among the Pingelapese islanders. Nature Genet. 25, 289–293 (2000).

  408. 408.

    et al. Mutations in the CNGB3 gene encoding the beta-subunit of the cone photoreceptor cGMP-gated channel are responsible for achromatopsia (ACHM3) linked to chromosome 8q21. Hum. Mol. Genet. 9, 2107–2116 (2000).

  409. 409.

    et al. Gene encoding a new RING-B-box-coiled-coil protein is mutated in mulibrey nanism. Nature Genet. 25, 298–301 (2000).

  410. 410.

    et al. A defect in harmonin, a PDZ domain-containing protein expressed in the inner ear sensory hair cells, underlies Usher syndrome type 1C. Nature Genet. 26, 51–55 (2000).

  411. 411.

    et al. A recessive contiguous gene deletion causing infantile hyperinsulinism, enteropathy and deafness identifies the Usher type 1C gene. Nature Genet. 26, 56–60 (2000).

  412. 412.

    Mutations in MYH9 result in the May-Hegglin anomaly, and Fechtner and Sebastian syndromes. Nature Genet. 26, 103–105 (2000).

  413. 413.

    , , & Mutation of MYH9, encoding non-muscle myosin heavy chain A, in May-Hegglin anomaly. Nature Genet. 26, 106–108 (2000).

  414. 414.

    et al. Mutations of the gene encoding the protein kinase A type I-α regulatory subunit in patients with the Carney complex. Nature Genet. 26, 89–92 (2000).

  415. 415.

    et al. Human nonsyndromic hereditary deafness DFNA17 is due to a mutation in non-muscle myosin MYH9. Am. J. Hum. Genet. 67, 1121–1128 (2000).

  416. 416.

    et al. Large expansion of the ATTCT pentanucleotide repeat in spinocerebellar ataxia type 10. Nature Genet. 26, 191–194 (2000).

  417. 417.

    et al. Nuclear gene OPA1, encoding a mitochondrial dynamin-related protein, is mutated in dominant optic atrophy. Nature Genet. 26, 207–210 (2000).

  418. 418.

    et al. The complete form of X-linked congenital stationary night blindness is caused by mutations in a gene encoding a leucine-rich repeat protein. Nature Genet. 26, 324–327 (2000).

  419. 419.

    Autosomal dominant hypophosphataemic rickets is associated with mutations in FGF23. Nature Genet. 26, 345–348 (2000).

  420. 420.

    et al. The gene encoding gigaxonin, a new member of the cytoskeletal BTB/kelch repeat family, is mutated in giant axonal neuropathy. Nature Genet. 26, 370–374 (2000).

  421. 421.

    et al. Mutant WD-repeat protein in triple-A syndrome. Nature Genet. 26, 332–335 (2000).

  422. 422.

    et al. Perlecan, the major proteoglycan of basement membranes, is altered in patients with Schwartz-Jampel syndrome (chondrodystrophic myotonia). Nature Genet. 26, 480–483 (2000).

  423. 423.

    et al. Familial Alzheimer's disease in kindreds with missense mutations in a gene on chromosome 1 related to the Alzheimer's disease type 3 gene. Nature 376, 775–778 (1995).

  424. 424.

    et al. Cloning of a gene bearing missense mutations in early-onset familial Alzheimer's disease. Nature 375, 754–760 (1995).

  425. 425.

    & The therapeutic reactivation of fetal haemoglobin. Hum. Mol. Genet. 7, 1655–1658 (1998).

  426. 426.

    Research & development. Basic science and pharmaceutical innovation. Nature Biotechnol. 17, 406 (1999).

  427. 427.

    Drug discovery: a historical perspective. Science 287, 1960–1964 (2000).

  428. 428.

    et al. The 5-HT3B subunit is a major determinant of serotonin-receptor function. Nature 397, 359–363 (1999).

  429. 429.

    et al. Characterization of the human cysteinyl leukotriene 2 receptor. J. Biol. Chem. 275, 30531–30536 (2000).

  430. 430.

    et al. BACE maps to chromosome 11 and a BACE homolog, BACE2, reside in the obligate Down Syndrome region of chromosome 21. Science 286, 1255a (1999).

  431. 431.

    , & BACE maps to chromosome 11 and a BACE homolog, BACE2, reside in the obligate Down Syndrome region of chromosome 21. Science 286, 1255a (1999).

  432. 432.

    The good taste of genomics. Nature 404, 552–553 (2000).

  433. 433.

    , & A family of candidate taste receptors in human and mouse. Nature 404, 601–604 (2000).

  434. 434.

    et al. A novel family of mammalian taste receptors. Cell 100, 693–702 (2000).

  435. 435.

    et al. T2Rs function as bitter taste receptors. Cell 100, 703–711 (2000).

  436. 436.

    Conserved non-coding sequences are reliable guides to regulatory elements. Trends Genet. 16, 369–372 (2000).

  437. 437.

    et al. Sequence and comparative analysis of the mouse 1-megabase region orthologous to the human 11p15 imprinted domain. Genome Res. 10, 1697–1710 (2000).

  438. 438.

    , & Shotgun sample sequence comparisons between mouse and human genomes. Nature Genet. 25, 31–33 (2000).

  439. 439.

    Public-private project to deliver mouse genome in 6 months. Science 290, 242–243 (2000).

  440. 440.

    , , , & Human-mouse genome comparisons to locate regulatory sites. Nature Genet. 26, 225–228 (2000).

  441. 441.

    et al. Embryonic epsilon and gamma globin genes of a prosimian primate (Galago crassicaudatus). Nucleotide and amino acid sequences, developmental regulation and phylogenetic footprints. J. Mol. Biol. 203, 439–455 (1988).

  442. 442.

    , & Conservation of DNA regulatory motifs and discovery of new motifs in microbial genomes. Genome Res. 10, 744–757 (2000).

  443. 443.

    , , & Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation. Nature Biotechnol. 16, 939–945 (1998).

  444. 444.

    & Biclustering of expression data. ISMB 8, 93–103 (2000).

  445. 445.

    , , & A computational analysis of whole-genome expression data reveals chromosomal domains of gene expression. Nature Genet. 26, 183–186 (2000).

  446. 446.

    & Genomic imprinting in mammals: an interplay between chromatin and DNA methylation? Trends Genet. 15, 431–434 (1999).

  447. 447.

    & DNA methylation in health and disease. Nature Rev. Genet. 1, 11–19 (2000).

  448. 448.

    , & From genomics to epigenomics: a loftier view of life. Nature Biotechnol. 17, 1144 (1999).

  449. 449.

    Mapping a subtext in our genetic book. Science 288, 945–946 (2000).

  450. 450.

    in T. S. Eliot. Collected Poems 1909–1962 (Harcourt Brace, New York, 1963).

  451. 451.

    , & FPC: a system for building contigs from restriction fingerprinted clones. Comput. Appl. Biosci. 13, 523–535 (1997).

  452. 452.

    & Approximate statistics of gapped alignments. J. Comp. Biol. 6, 91–112 (1999).


Acknowledgements

Beyond the authors, many people contributed to the success of this work. E. Jordan provided helpful advice throughout the sequencing effort. We thank D. Leja and J. Shehadeh for their expert assistance on the artwork in this paper, especially the foldout figure; K. Jegalian for editorial assistance; J. Schloss, E. Green and M. Seldin for comments on an earlier version of the manuscript; P. Green and F. Ouelette for critiques of the submitted version; C. Caulcott, A. Iglesias, S. Renfrey, B. Skene and J. Stewart of the Wellcome Trust, P. Whittington and T. Dougans of NHGRI and M. Meugnier of Genoscope for staff support for meetings of the international consortium; and the University of Pennsylvania for facilities for a meeting of the genome analysis group. We thank Compaq Computer Corporation's High Performance Technical Computing Group for providing a Compaq Biocluster (a 27 node configuration of AlphaServer ES40s, containing 108 CPUs, serving as compute nodes and a file server with one terabyte of secondary storage) to assist in the annotation and analysis. Compaq provided the systems and implementation services to set up and manage the cluster for continuous use by members of the sequencing consortium. Platform Computing Ltd. provided its LSF scheduling and load-sharing software without license fee. In addition to the data produced by the members of the International Human Genome Sequencing Consortium, the draft genome sequence includes published and unpublished human genomic sequence data from many other groups, all of whom gave permission to include their unpublished data. Four of the groups that contributed particularly significant amounts of data were: M. Adams et al. of the Institute for Genomic Research; E. Chen et al. of the Center for Genetic Medicine and Applied Biosystems; S.-F. Tsai of National Yang-Ming University, Institute of Genetics, Taipei, Taiwan, Republic of China; and Y. Nakamura, K. Koyama et al. of the Institute of Medical Science, University of Tokyo, Human Genome Center, Laboratory of Molecular Medicine, Minato-ku, Tokyo, Japan. Many other groups provided smaller numbers of database entries. We thank them all; a full list of the contributors of unpublished sequence is available as Supplementary Information. This work was supported in part by the National Human Genome Research Institute of the US NIH; The Wellcome Trust; the US Department of Energy, Office of Biological and Environmental Research, Human Genome Program; the UK MRC; the Human Genome Sequencing Project from the Science and Technology Agency (STA) Japan; the Ministry of Education, Science, Sport and Culture, Japan; the French Ministry of Research; the Federal German Ministry of Education, Research and Technology (BMBF) through Projektträger DLR, in the framework of the German Human Genome Project; BEO, Projektträger Biologie, Energie, Umwelt des BMBF und BMWT; the Max-Planck-Society; DFG (Deutsche Forschungsgemeinschaft); TMWFK, Thüringer Ministerium für Wissenschaft, Forschung und Kunst; EC BIOMED2 (European Commission, Directorate Science, Research and Development); Chinese Academy of Sciences (CAS), Ministry of Science and Technology (MOST), National Natural Science Foundation of China (NSFC); US National Science Foundation EPSCoR and The SNP Consortium Ltd. Additional support for members of the Genome Analysis group came, in part, from an ARCS Foundation Scholarship to T.S.F., a Burroughs Wellcome Foundation grant to C.B.B. and P.A.S., a DFG grant to P.B., DOE grants to D.H., E.E.E. and T.S.F., an EU grant to P.B., a Marie-Curie Fellowship to L.C., an NIH-NHGRI grant to S.R.E., an NIH grant to E.E.E., an NIH SBIR to D.K., an NSF grant to D.H., a Swiss National Science Foundation grant to L.C., the David and Lucille Packard Foundation, the Howard Hughes Medical Institute, the University of California at Santa Cruz and the W. M. Keck Foundation.

Author information

Author notes

    • Glen A. Evans

    Present addresses: Genome Sequencing Project, Egea Biosciences, Inc., 4178 Sorrento Valley Blvd., Suite F, San Diego, CA 92121, USA (G.A.E.); INRA, Station d’Amélioration des Plantes, 63039 Clermont-Ferrand Cedex 2, France (L.C.).

Affiliations

  1. Whitehead Institute for Biomedical Research, Center for Genome Research, Nine Cambridge Center, Cambridge, Massachusetts 02142, USA

    • Eric S. Lander
    • , Lauren M. Linton
    • , Bruce Birren
    • , Chad Nusbaum
    • , Michael C. Zody
    • , Jennifer Baldwin
    • , Keri Devon
    • , Ken Dewar
    • , Michael Doyle
    • , William FitzHugh
    • , Roel Funke
    • , Diane Gage
    • , Katrina Harris
    • , Andrew Heaford
    • , John Howland
    • , Lisa Kann
    • , Jessica Lehoczky
    • , Rosie LeVine
    • , Paul McEwan
    • , Kevin McKernan
    • , James Meldrim
    • , Jill P. Mesirov
    • , Cher Miranda
    • , William Morris
    • , Jerome Naylor
    • , Christina Raymond
    • , Mark Rosetti
    • , Ralph Santos
    • , Andrew Sheridan
    • , Carrie Sougnez
    • , Nicole Stange-Thomann
    • , Nikola Stojanovic
    • , Aravind Subramanian
    • , Dudley Wyman
    • , Serafim Batzoglou
    • , Daniel G. Brown
    • , James Galagan
    •  & Victor J. Pollara
  2. The Sanger Centre, The Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1RQ, United Kingdom

    • Jane Rogers
    • , John Sulston
    • , Rachael Ainscough
    • , Stephan Beck
    • , David Bentley
    • , John Burton
    • , Christopher Clee
    • , Nigel Carter
    • , Alan Coulson
    • , Rebecca Deadman
    • , Panos Deloukas
    • , Andrew Dunham
    • , Ian Dunham
    • , Richard Durbin
    • , Lisa French
    • , Darren Grafham
    • , Simon Gregory
    • , Tim Hubbard
    • , Sean Humphray
    • , Adrienne Hunt
    • , Matthew Jones
    • , Christine Lloyd
    • , Amanda McMurray
    • , Lucy Matthews
    • , Simon Mercer
    • , Sarah Milne
    • , James C. Mullikin
    • , Andrew Mungall
    • , Robert Plumb
    • , Mark Ross
    • , Ratna Shownkeen
    • , Sarah Sims
    • , Alex Bateman
    • , Michele Clamp
    •  & James G. R. Gilbert
  3. Washington University Genome Sequencing Center, Box 8501, 4444 Forest Park Avenue, St. Louis, Missouri 63108, USA

    • Robert H. Waterston
    • , Richard K. Wilson
    • , LaDeana W. Hillier
    • , John D. McPherson
    • , Marco A. Marra
    • , Elaine R. Mardis
    • , Lucinda A. Fulton
    • , Asif T. Chinwalla
    • , Kymberlie H. Pepin
    • , Warren R. Gish
    • , Stephanie L. Chissoe
    • , Michael C. Wendl
    • , Kim D. Delehaunty
    • , Tracie L. Miner
    • , Andrew Delehaunty
    • , Jason B. Kramer
    • , Lisa L. Cook
    • , Robert S. Fulton
    • , Douglas L. Johnson
    • , Patrick J. Minx
    • , Sandra W. Clifton
    • , Ian Korf
    • , John Wallis
    •  & Shiaw-Pyng Yang
  4. US DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, California 94598, USA

    • Trevor Hawkins
    • , Elbert Branscomb
    • , Paul Predki
    • , Paul Richardson
    • , Sarah Wenning
    • , Tom Slezak
    • , Norman Doggett
    • , Jan-Fang Cheng
    • , Anne Olsen
    • , Susan Lucas
    • , Christopher Elkin
    • , Edward Uberbacher
    •  & Marvin Frazier
  5. Baylor College of Medicine Human Genome Sequencing Center, Department of Molecular and Human Genetics, One Baylor Plaza, Houston, Texas 77030, USA

    • Richard A. Gibbs
    • , Donna M. Muzny
    • , Steven E. Scherer
    • , John B. Bouck
    • , Erica J. Sodergren
    • , Kim C. Worley
    • , Catherine M. Rives
    • , James H. Gorrell
    •  & Michael L. Metzker
  6. Department of Cellular and Structural Biology, The University of Texas Health Science Center at San Antonio, 7703 Floyd Curl Drive, San Antonio, Texas 78229-3900, USA

    • Susan L. Naylor
  7. Department of Molecular Genetics, Albert Einstein College of Medicine, 1635 Poplar Street, Bronx, New York 10461, USA

    • Raju S. Kucherlapati
  8. Baylor College of Medicine Human Genome Sequencing Center and the Department of Microbiology & Molecular Genetics, University of Texas Medical School, PO Box 20708, Houston, Texas 77225, USA

    • George M. Weinstock
  9. RIKEN Genomic Sciences Center, 1-7-22 Suehiro-cho, Tsurumi-ku Yokohama-city, Kanagawa 230-0045, Japan

    • Yoshiyuki Sakaki
    • , Asao Fujiyama
    • , Masahira Hattori
    • , Tetsushi Yada
    • , Atsushi Toyoda
    • , Takehiko Itoh
    • , Chiharu Kawagoe
    • , Hidemi Watanabe
    • , Yasushi Totoki
    •  & Todd Taylor
  10. Genoscope and CNRS UMR-8030, 2 Rue Gaston Cremieux, CP 5706, 91057 Evry Cedex, France

    • Jean Weissenbach
    • , Roland Heilig
    • , William Saurin
    • , Francois Artiguenave
    • , Philippe Brottier
    • , Thomas Bruls
    • , Eric Pelletier
    • , Catherine Robert
    •  & Patrick Wincker
  11. GTC Sequencing Center, Genome Therapeutics Corporation, 100 Beaver Street, Waltham, Massachusetts 02453-8443, USA

    • Douglas R. Smith
    • , Lynn Doucette-Stamm
    • , Marc Rubenfield
    • , Keith Weinstock
    • , Hong Mei Lee
    •  & JoAnn Dubois
  12. Department of Genome Analysis, Institute of Molecular Biotechnology, Beutenbergstrasse 11, D-07745 Jena, Germany

    • André Rosenthal
    • , Matthias Platzer
    • , Gerald Nyakatura
    • , Stefan Taudien
    •  & Andreas Rump
  13. Beijing Genomics Institute/Human Genome Center, Institute of Genetics, Chinese Academy of Sciences, Beijing 100101, China

    • Huanming Yang
    • , Jun Yu
    •  & Jian Wang
  14. Southern China National Human Genome Research Center, Shanghai 201203, China

    • Guyang Huang
  15. Northern China National Human Genome Research Center, Beijing 100176, China

    • Jun Gu
  16. Multimegabase Sequencing Center, The Institute for Systems Biology, 4225 Roosevelt Way, NE Suite 200, Seattle, Washington 98105, USA

    • Leroy Hood
    • , Lee Rowen
    • , Anup Madan
    •  & Shizen Qin
  17. Stanford Genome Technology Center, 855 California Avenue, Palo Alto, California 94304, USA

    • Ronald W. Davis
    • , Nancy A. Federspiel
    • , A. Pia Abola
    •  & Michael J. Proctor
  18. Stanford Human Genome Center and Department of Genetics, Stanford University School of Medicine, Stanford, California 94305-5120, USA

    • Richard M. Myers
    • , Jeremy Schmutz
    • , Mark Dickson
    • , Jane Grimwood
    •  & David R. Cox
  19. University of Washington Genome Center, 225 Fluke Hall on Mason Road, Seattle, Washington 98195, USA

    • Maynard V. Olson
    • , Rajinder Kaul
    •  & Christopher Raymond
  20. Department of Molecular Biology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo 160-8582, Japan

    • Nobuyoshi Shimizu
    • , Kazuhiko Kawasaki
    •  & Shinsei Minoshima
  21. University of Texas Southwestern Medical Center at Dallas, 6000 Harry Hines Blvd., Dallas, Texas 75235-8591, USA

    • Glen A. Evans
    • , Maria Athanasiou
    •  & Roger Schultz
  22. University of Oklahoma's Advanced Center for Genome Technology, Dept. of Chemistry and Biochemistry, University of Oklahoma, 620 Parrington Oval, Rm 311, Norman, Oklahoma 73019, USA

    • Bruce A. Roe
    • , Feng Chen
    •  & Huaqin Pan
  23. Max Planck Institute for Molecular Genetics, Ihnestrasse 73, 14195 Berlin, Germany

    • Juliane Ramser
    • , Hans Lehrach
    •  & Richard Reinhardt
  24. Cold Spring Harbor Laboratory, Lita Annenberg Hazen Genome Center, 1 Bungtown Road, Cold Spring Harbor, New York 11724, USA

    • W. Richard McCombie
    • , Melissa de la Bastide
    •  & Neilay Dedhia
  25. GBF - German Research Centre for Biotechnology, Mascheroder Weg 1, D-38124 Braunschweig, Germany

    • Helmut Blöcker
    • , Klaus Hornischer
    •  & Gabriele Nordsiek
  26. National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bldg. 38A, 8600 Rockville Pike, Bethesda, Maryland 20894, USA

    • Richa Agarwala
    • , L. Aravind
    • , Hsiu-Chuan Chen
    • , Deanna Church
    • , Wonhee Jang
    • , Paul Kitts
    • , Eugene V. Koonin
    • , Greg Schuler
    • , Danielle Thierry-Mieg
    • , Jean Thierry-Mieg
    • , Lukas Wagner
    •  & Yuri I. Wolf
  27. Department of Genetics, Case Western Reserve School of Medicine and University Hospitals of Cleveland, BRB 720, 10900 Euclid Ave., Cleveland, Ohio 44106, USA

    • Jeffrey A. Bailey
    •  & Evan E. Eichler
  28. EMBL European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom

    • Ewan Birney
    • , Lorenzo Cerutti
    • , Henning Hermjakob
    • , Arek Kasprzyk
    • , Nicola Mulder
    • , Guy Slater
    •  & Elia Stupka
  29. Max Delbrück Center for Molecular Medicine, Robert-Rössle-Strasse 10, 13125 Berlin-Buch, Germany

    • Peer Bork
    •  & Tobias Doerks
  30. EMBL, Meyerhofstrasse 1, 69012 Heidelberg, Germany

    • Peer Bork
    • , Richard R. Copley
    • , Tobias Doerks
    •  & Jörg Schultz
  31. Dept. of Biology, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, Massachusetts 02139-4307, USA

    • Christopher B. Burge
    •  & Ru-Fang Yeh
  32. Howard Hughes Medical Institute, Dept. of Genetics, Washington University School of Medicine, Saint Louis, Missouri 63110, USA

    • Sean R. Eddy
    • , L. Steven Johnson
    •  & Thomas A. Jones
  33. Dept. of Computer Science, University of California at Santa Cruz, Santa Cruz, California 95064, USA

    • Terrence S. Furey
  34. Affymetrix, Inc., 2612 8th St, Berkeley, California 94710, USA

    • Cyrus Harmon
    • , David Kulp
    • , Raymond Wheeler
    •  & Alan Williams
  35. Genome Exploration Research Group, Genomic Sciences Center, RIKEN Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan

    • Yoshihide Hayashizaki
  36. Howard Hughes Medical Institute, Department of Computer Science, University of California at Santa Cruz, California 95064, USA

    • David Haussler
  37. University of Dublin, Trinity College, Department of Genetics, Smurfit Institute, Dublin 2, Ireland

    • Karsten Hokamp
    • , Aoife McLysaght
    •  & Kenneth H. Wolfe
  38. Cambridge Research Laboratory, Compaq Computer Corporation and MIT Genome Center, 1 Cambridge Center, Cambridge, Massachusetts 02142, USA

    • Simon Kasif
    • , Tarjei Mikkelsen
    •  & Joseph Szustakowski
  39. Dept. of Mathematics, University of California at Santa Cruz, Santa Cruz, California 95064, USA

    • Scot Kennedy
  40. Dept. of Biology, University of California at Santa Cruz, Santa Cruz, California 95064, USA

    • W. James Kent
  41. Crown Human Genetics Center and Department of Molecular Genetics, The Weizmann Institute of Science, Rehovot 71600, Israel

    • Doron Lancet
  42. Dept. of Genetics, Stanford University School of Medicine, Stanford, California 94305, USA

    • Todd M. Lowe
  43. The University of Michigan Medical School, Departments of Human Genetics and Internal Medicine, Ann Arbor, Michigan 48109, USA

    • John V. Moran
  44. MRC Functional Genetics Unit, Department of Human Anatomy and Genetics, University of Oxford, South Parks Road, Oxford OX1 3QX, UK

    • Chris P. Ponting
  45. Institute for Systems Biology, 4225 Roosevelt Way NE, Seattle, Washington 98105, USA

    • Arian F. A. Smit
  46. National Human Genome Research Institute, US National Institutes of Health, 31 Center Drive, Bethesda, Maryland 20892, USA

    • Francis Collins
    • , Mark S. Guyer
    • , Jane Peterson
    • , Adam Felsenfeld
    •  & Kris A. Wetterstrand
  47. Office of Science, US Department of Energy, 19901 Germantown Road, Germantown, Maryland 20874, USA

    • Aristides Patrinos
  48. The Wellcome Trust, 183 Euston Road, London, NW1 2BE, UK.

    • Michael J. Morgan

Consortia

  1. International Human Genome Sequencing Consortium

    Whitehead Institute for Biomedical Research, Center for Genome Research:

    The Sanger Centre:

    Washington University Genome Sequencing Center:

    US DOE Joint Genome Institute:

    Baylor College of Medicine Human Genome Sequencing Center:

    RIKEN Genomic Sciences Center:

    Genoscope and CNRS UMR-8030:

    Department of Genome Analysis, Institute of Molecular Biotechnology:

    GTC Sequencing Center:

    Beijing Genomics Institute/Human Genome Center:

    Multimegabase Sequencing Center, The Institute for Systems Biology:

    Stanford Genome Technology Center:

    University of Oklahoma's Advanced Center for Genome Technology:

    Max Planck Institute for Molecular Genetics:

    Cold Spring Harbor Laboratory, Lita Annenberg Hazen Genome Center:

    GBF—German Research Centre for Biotechnology:

    *Genome Analysis Group (listed in alphabetical order, also includes individuals listed under other headings):

    Scientific management: National Human Genome Research Institute, US National Institutes of Health:

    Stanford Human Genome Center:

    University of Washington Genome Center:

    Department of Molecular Biology, Keio University School of Medicine:

    University of Texas Southwestern Medical Center at Dallas:

    Office of Science, US Department of Energy:

    The Wellcome Trust:

