We constructed maps for eight chromosomes (1, 6, 9, 10, 13, 20, X and (previously) 22), representing one-third of the genome, by building landmark maps, isolating bacterial clones and assembling contigs. This approach established the long-range organization of the maps early in the project. It also simplified contig extension, gap closure and problem-solving, because each of these tasks was confined to a local region. The maps currently represent more than 94% of the euchromatic (gene-containing) regions of these chromosomes in 176 contigs, and contain 96% of the chromosome-specific markers in the human gene map. By measuring the remaining gaps, we can assess chromosome length and the coverage of each chromosome in sequenced clones.
The task of sequencing the 3,200-megabase (Mb) human genome can be subdivided into individual chromosome projects ranging in size from 263 Mb (chromosome 1)1 to about 35 Mb (chromosomes 21q and 22q)2,3. Our strategy, in common with other groups4,5,6, was to map selected chromosomes individually, and then to combine the results with those of whole-genome mapping studies into a single map of the human genome7. Chromosome maps were constructed in three steps (see Supplementary Information). First, we constructed a landmark map for each chromosome. Second, we used the chromosome-specific landmarks to identify bacterial clones (bacterial or P1-derived artificial chromosomes (BACs or PACs)) in genomic libraries, and assembled the clones into contigs on the basis of shared restriction enzyme fingerprints and landmark content. Third, contigs were extended and joined by chromosome walking. Walking used BAC end sequences that were either generated in-house from clones selected at the ends of contigs or taken from publicly available resources. Further joins were made by identifying overlaps in genomic sequence data.
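The fingerprint comparison in the second step can be illustrated with a deliberately simplified sketch. It assumes that each clone's fingerprint is a list of restriction-fragment sizes. Two bands are treated as shared when their sizes agree within a tolerance `tol`, and an overlap is called when the shared bands make up at least `min_fraction` of the smaller fingerprint. The function names, tolerance and threshold are hypothetical. Dedicated fingerprint-assembly programs typically score the probability that shared bands arise by chance instead, and the scoring used in this project is not described here.

```python
from bisect import bisect_left


def shared_bands(a: list[float], b: list[float], tol: float) -> int:
    """Count bands in fingerprint `a` that match a distinct band in `b`.

    Sizes are compared within +/- tol; each band in `b` is used at most once.
    Matching is greedy (smallest bands first), which is adequate for a sketch.
    """
    b_sorted = sorted(b)
    used = [False] * len(b_sorted)
    matches = 0
    for band in sorted(a):
        # First band in b that could lie inside the tolerance window.
        i = bisect_left(b_sorted, band - tol)
        while i < len(b_sorted) and b_sorted[i] <= band + tol:
            if not used[i]:
                used[i] = True
                matches += 1
                break
            i += 1
    return matches


def likely_overlap(a: list[float], b: list[float], tol: float,
                   min_fraction: float = 0.5) -> bool:
    """Hypothetical overlap call: enough of the smaller fingerprint is shared."""
    if not a or not b:
        return False
    return shared_bands(a, b, tol) / min(len(a), len(b)) >= min_fraction


# Toy fingerprints (fragment sizes in bp), invented for illustration.
clone_a = [1200, 2500, 3100, 4000, 5600]
clone_b = [2501, 3099, 4002, 7000, 8100]
clone_c = [900, 1500, 6100, 7700]

print(shared_bands(clone_a, clone_b, tol=5))    # 3
print(likely_overlap(clone_a, clone_b, tol=5))  # True
print(likely_overlap(clone_a, clone_c, tol=5))  # False
```

A fraction of the smaller fingerprint is used so that a small clone lying wholly inside a larger one still scores as overlapping.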
Clones that were representative of each contig were selected regularly for inclusion in the ‘tiling path’ (a set of minimally overlapping clones) for genomic sequencing8. Clones from the tiling path of each chromosome were deposited in the ‘HumanMap’ database at the Genome Sequencing Centre7 (http://genome.wustl.edu/gsc/human/Mapping/), for integration with the data obtained by whole-genome fingerprinting. New clones were identified from this integrated dataset to assist the extension and closure of the chromosome maps (Table 1). A detailed description of tiling paths and the underlying clone contigs is available as Supplementary Information and will continue to be updated at http://www.sanger.ac.uk; all clones are publicly available.
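In its simplest form, choosing a tiling path from an ordered contig is an interval-covering problem. The sketch below assumes that each clone has already been given start and end coordinates along its contig (in kb, for example from fingerprint-band positions). It applies the standard greedy rule: from the right-hand end of the region tiled so far, pick the clone that overlaps it by at least `min_overlap` and extends furthest. Under these assumptions, this rule yields the fewest clones that cover the contig. The function name, coordinates and overlap parameter are hypothetical, and selection in practice may take further criteria into account.

```python
def tiling_path(clones: dict[str, tuple[float, float]],
                min_overlap: float = 0.0) -> list[str]:
    """Greedy minimum tiling path across one contig.

    clones maps a clone name to its (start, end) position on the contig.
    Returns the chosen clone names from left to right, or raises
    ValueError if the clones leave a gap or overlap too little.
    """
    if not clones:
        return []
    items = sorted(clones.items(), key=lambda kv: kv[1][0])
    contig_start = items[0][1][0]
    contig_end = max(end for _, (_, end) in items)

    path: list[str] = []
    covered = contig_start  # right edge of the region tiled so far
    best = None             # furthest-reaching eligible clone seen so far
    i = 0
    while covered < contig_end:
        # A clone is eligible if it starts early enough to overlap the tiled
        # region by min_overlap (the first clone must start at the contig start).
        limit = covered - min_overlap if path else contig_start
        while i < len(items) and items[i][1][0] <= limit:
            if best is None or items[i][1][1] > best[1][1]:
                best = items[i]
            i += 1
        if best is None or best[1][1] <= covered:
            raise ValueError(f"no clone extends the path beyond {covered}")
        path.append(best[0])
        covered = best[1][1]
    return path


# Invented clone coordinates (kb) on a single contig.
contig = {"A": (0, 150), "B": (20, 140), "C": (100, 260),
          "D": (130, 300), "E": (250, 400)}
print(tiling_path(contig, min_overlap=10))  # ['A', 'D', 'E']
```

Clones B and C are skipped because D, which also overlaps A sufficiently, reaches further.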
A key question is the extent of coverage of each chromosome in the map. We analysed the coverage of the euchromatic regions on the basis of the assumptions given below and the approximate estimates of chromosome length that were available. Heterochromatic regions, which are estimated to comprise 3–15% of each chromosome (Table 1), are absent from the contigs analysed here. On the basis of the fingerprint bands in the maps (converted to Mb as described in Supplementary Information), the 176 contigs represent 927 Mb, or 94% of the estimated total of 981 Mb of euchromatin. More than half of this (485 Mb) is in fifteen contigs of 22–62 Mb, which illustrates the extent of continuity obtained. We also analysed the representation of human gene markers in the map. The clone map contained 96% of the unique markers that were previously mapped to these chromosomes in GeneMap99 (ref. 9), and 90% were present in the genome sequence. As expected, this figure is higher for chromosomes 20 (99%) and 22q (98%), whose sequences are nearly or completely finished (Table 1).
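The coverage figure above is mapped length divided by estimated euchromatic length (927/981 Mb, about 94.5%, reported as 94%). The observation that fifteen large contigs hold more than half of the mapped DNA is closely related to the continuity statistics that assembly work calls L50 and N50. Both calculations are sketched below. The function names are hypothetical, and the contig sizes in the example are invented, not taken from Table 1.

```python
def coverage_percent(mapped_mb: float, estimated_mb: float) -> float:
    """Percentage of an estimated length that is represented in the map."""
    return 100.0 * mapped_mb / estimated_mb


def l50_n50(contig_mb: list[float]) -> tuple[int, float]:
    """Fewest largest contigs holding at least half the mapped length (L50),
    and the size of the smallest contig in that set (N50)."""
    sizes = sorted(contig_mb, reverse=True)
    half = sum(sizes) / 2
    running = 0.0
    for count, size in enumerate(sizes, start=1):
        running += size
        if running >= half:
            return count, size
    raise ValueError("contig list is empty")


# Figures quoted in the text.
print(round(coverage_percent(927, 981), 1))  # 94.5
print(round(coverage_percent(485, 927), 1))  # 52.3

# Invented contig sizes (Mb), for illustration only.
print(l50_n50([62, 40, 30, 20, 10, 5, 3]))   # (2, 40)
```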
Independent corroboration of the coverage of each chromosome required identification of the boundaries between euchromatic sequence and the centromeric, telomeric, and other heterochromatic repeat sequences, as well as measurement of the remaining gaps in the map. We have measured 52 of the 57 remaining gaps in the maps of chromosomes 6, 9, 10, 13 and 20 by fluorescent in situ hybridization (FISH), in addition to the gaps previously measured on chromosome 22. Clones immediately flanking each gap were hybridized to extended DNA fibres, interphase nuclei or metaphase chromosome spreads.
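On extended DNA fibres, a gap can be sized by measuring the unlabelled stretch between the signals of the two flanking clones and converting that length to kilobases. The sketch below assumes that the conversion factor is calibrated in the same preparation from a flanking clone whose size is known from the map. It takes medians over several fibres to damp variation in stretching. This calibration scheme, the function names and the example numbers are assumptions for illustration. The interphase and metaphase measurements mentioned above need different calibrations, which are not sketched here.

```python
from statistics import median


def kb_per_micron(clone_kb: float, signal_um: list[float]) -> float:
    """Stretching factor from a clone of known size measured on several fibres."""
    return clone_kb / median(signal_um)


def gap_size_kb(gap_um: list[float], factor: float) -> float:
    """Convert per-fibre gap lengths (micrometres) to kb using the calibration."""
    return median(gap_um) * factor


# Invented measurements: a 160-kb flanking clone and the adjacent gap.
factor = kb_per_micron(160, [62.0, 65.0, 64.0])
print(factor)                                   # 2.5
print(gap_size_kb([40.0, 42.0, 41.0], factor))  # 102.5
```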
An example of this analysis is given for chromosome 10 (Fig. 1). Fourteen contigs account for 124.4 Mb of DNA. They contain pericentromeric satellite sequences, and also a subtelomeric sequence on the short arm which is <0.1 Mb from the telomere; the most distal clone on the long arm is <0.15 Mb from the telomere10. From FISH analysis, the total extent of euchromatic gaps is ≤ 4.2 Mb. From these measurements, the estimated coverage of euchromatin in the map can be revised from 89% (based on Morton's previous estimate; Table 1) to 96.7% (124.4 of 128.6 Mb). A 9.75-Mb restriction map that spans the centromere has been constructed for chromosome 10 (ref. 11). By anchoring this to the clone maps on both chromosome arms, we estimate that the size of the gap across the centromeric region is around 4.5 Mb. This analysis provides a new estimate of 133 Mb for the total length of the chromosome (Fig. 1), compared with the previous value of 144 Mb. In a similar analysis of chromosome 13, seven contigs span 102 Mb and the remaining gaps in the euchromatic region total a maximum of 5 Mb. This leads to a new, increased estimate of the size of the euchromatic region of 107 Mb, and the current coverage in the map is 95%.
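The revised figures for chromosomes 10 and 13 follow from simple sums. Euchromatin is taken as the contig length plus the measured upper bound on the gaps, and coverage is the contig length divided by that total. For chromosome 10, the whole-chromosome length also adds the estimated centromeric gap, and the quoted 133 Mb agrees with this sum. Because the gap sizes are maxima, coverage computed this way is a lower bound. The sketch reproduces only the arithmetic, and the function name is hypothetical.

```python
def revised_estimate(contig_mb: float, max_gap_mb: float,
                     centromere_gap_mb: float = 0.0) -> dict[str, float]:
    """Euchromatic length, map coverage (%) and total length from gap sizes."""
    euchromatin = contig_mb + max_gap_mb
    return {
        "euchromatin_mb": euchromatin,
        # Gap sizes are upper bounds, so this coverage is a lower bound.
        "coverage_pct": 100.0 * contig_mb / euchromatin,
        "total_mb": euchromatin + centromere_gap_mb,
    }


chr10 = revised_estimate(124.4, 4.2, centromere_gap_mb=4.5)
chr13 = revised_estimate(102.0, 5.0)
for name, est in (("chr10", chr10), ("chr13", chr13)):
    print(name, {k: round(v, 1) for k, v in est.items()})
# chr10 {'euchromatin_mb': 128.6, 'coverage_pct': 96.7, 'total_mb': 133.1}
# chr13 {'euchromatin_mb': 107.0, 'coverage_pct': 95.3, 'total_mb': 107.0}
```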
Our strategy may form the basis for finishing the map and sequence of every human chromosome. Accurate measurement of all gap sizes assists our continued efforts to bridge the remaining gaps with clones from other cloning systems (for example yeast artificial chromosomes (YACs), plasmids and bacteriophage lambda). Characterization of the centromeric satellites and the remaining heterochromatic regions, including the short arms of the acrocentric chromosomes (13, 14, 15, 21 and 22), will enable us to determine the true physical extent of DNA in the human genome.
1. Morton, N. E. Parameters of the human genome. Proc. Natl Acad. Sci. USA 88, 7474–7476 (1991).
2. The Chromosome 21 Mapping and Sequencing Consortium. The DNA sequence of human chromosome 21. Nature 405, 311–319 (2000).
3. Dunham, I. et al. The DNA sequence of human chromosome 22. Nature 402, 489–495 (1999).
4. Tilford, C. A. et al. A physical map of the human Y chromosome. Nature 409, 943–945 (2001).
5. Montgomery, K. T. A high-resolution map of human chromosome 12. Nature 409, 945–946 (2001).
6. Bruls, T. et al. A physical map of human chromosome 14. Nature 409, 947–948 (2001).
7. The International Human Genome Mapping Consortium. A physical map of the human genome. Nature 409, 934–941 (2001).
8. International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
9. Deloukas, P. et al. A physical map of 30,000 human genes. Science 282, 744–746 (1998).
10. Riethman, H. C. et al. Integration of telomere sequences with the draft human genome sequence. Nature 409, 948–951 (2001).
11. Jackson, M. S., See, C. G., Mulligan, L. M. & Lauffart, B. F. A 9.75-Mb map across the centromere of human chromosome 10. Genomics 33, 258–270 (1996).
We thank M. Sekhon, A. Chinwalla, J. McPherson and staff of the Genome Sequencing Centre, St. Louis for their assistance; the many collaborators who have contributed reagents and information to assist map construction on individual chromosomes; M. Jackson for helpful discussion; T. Cox, R. Pettett and the web team; and the Wellcome Trust for support.