Main

The task of sequencing the 3,200-megabase (Mb) human genome can be subdivided into individual chromosome projects, which range in size from 263 Mb (chromosome 1; ref. 1) to about 35 Mb (chromosomes 21q and 22q; refs 2,3). Our strategy, in common with that of other groups (refs 4–6), was to map selected chromosomes individually and then to combine the results with those of whole-genome mapping studies into a single map of the human genome (ref. 7). Chromosome maps were constructed in three steps (see Supplementary Information). First, we constructed a landmark map for each chromosome. Second, we used the chromosome-specific landmarks to identify bacterial clones (bacterial- or P1-derived artificial chromosomes (BACs or PACs)) in genomic libraries, and assembled them into contigs on the basis of shared restriction enzyme fingerprints and landmark content. Third, contigs were extended and joined by chromosome walking. Walking used BAC end sequences that were either generated in house from clones selected at the ends of contigs or obtained from publicly available resources; joins were also made by identifying overlaps in genomic sequence data.
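The contig-assembly step can be illustrated with a simplified sketch. It assumes that each clone is described by a sorted list of restriction-fragment band sizes plus a set of landmark names, treats two bands as matching when they differ by at most a tolerance, and joins two clones whenever they share at least a threshold number of bands or any landmark. Contigs are then the connected components of that overlap graph. The names `shared_bands` and `assemble_contigs` and the default `tol` and `min_shared` values are hypothetical. Fingerprint-mapping software such as FPC instead scores overlaps by the probability that the shared bands arise by chance (the Sulston score), so the fixed band-count threshold here is only a stand-in.

```python
from itertools import combinations


def shared_bands(a, b, tol):
    """Count bands of fingerprint `a` matched one-to-one by bands of `b`.

    Both fingerprints are sorted lists of band sizes; two bands match when
    they differ by at most `tol`. The greedy two-pointer walk uses each
    band at most once.
    """
    i = j = matches = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tol:
            matches += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a[i] is too small to match b[j] or any later band
        else:
            j += 1  # b[j] is too small to match a[i] or any later band
    return matches


def assemble_contigs(clones, tol=2, min_shared=4):
    """Group clones into contigs: connected components of the overlap graph.

    `clones` maps clone name -> (sorted band sizes, set of landmark names).
    Two clones are joined if they share any landmark or >= `min_shared` bands.
    Thresholds are hypothetical; this is not the Sulston-score test of FPC.
    """
    parent = {name: name for name in clones}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (n1, (bands1, marks1)), (n2, (bands2, marks2)) in combinations(clones.items(), 2):
        if marks1 & marks2 or shared_bands(bands1, bands2, tol) >= min_shared:
            parent[find(n1)] = find(n2)

    groups = {}
    for name in clones:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())
```

Union-find keeps the merging cheap, but the all-pairs comparison is quadratic in the number of clones, which is acceptable for a sketch and not for whole-library fingerprint sets.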

Clones representative of each contig were regularly selected for inclusion in the ‘tiling path’ (a set of minimally overlapping clones) for genomic sequencing (ref. 8). Clones from the tiling path of each chromosome were deposited in the ‘HumanMap’ database at the Genome Sequencing Centre (ref. 7; http://genome.wustl.edu/gsc/human/Mapping/) for integration with the data obtained by whole-genome fingerprinting. New clones identified from this integrated dataset helped to extend and close the chromosome maps (Table 1). A detailed description of the tiling paths and the underlying clone contigs is available as Supplementary Information and will continue to be updated at http://www.sanger.ac.uk; all clones are publicly available.
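Choosing a tiling path through an ordered contig can be framed as an interval-covering problem. Assuming each clone has already been placed at hypothetical (start, end) coordinates along its contig, the sketch below greedily picks, at each step, the clone that overlaps the region covered so far by at least `min_overlap` and extends furthest to the right; for intervals this greedy rule gives a path with the fewest clones. The function name, the coordinate model and the `min_overlap` parameter are illustrative assumptions, not the selection procedure used for these maps.

```python
def tiling_path(clones, min_overlap=1):
    """Pick the fewest clones that tile one contig end to end.

    `clones` maps clone name -> (start, end) coordinates within the contig.
    Consecutive clones in the path must overlap by at least `min_overlap`.
    Raises ValueError if the clones cannot be chained across the contig.
    """
    if not clones:
        return []
    ordered = sorted(clones.items(), key=lambda item: item[1][0])
    contig_start = ordered[0][1][0]
    contig_end = max(end for _, (_, end) in ordered)
    path, covered, i = [], contig_start, 0
    while covered < contig_end:
        # The first clone must start at the contig start; later clones must
        # start early enough to overlap the covered region by min_overlap.
        limit = covered - min_overlap if path else contig_start
        best = None
        while i < len(ordered) and ordered[i][1][0] <= limit:
            name, (_, end) = ordered[i]
            if best is None or end > clones[best][1]:
                best = name
            i += 1
        if best is None or clones[best][1] <= covered:
            raise ValueError(f"cannot extend tiling path beyond {covered}")
        path.append(best)
        covered = clones[best][1]
    return path
```

Clones skipped in one round never need revisiting: each round's choice reaches at least as far as every clone already scanned.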

Table 1 Status of chromosome maps

A key question is how much of each chromosome the map covers. We analysed the coverage of the euchromatic regions on the basis of the assumptions given below and the approximate estimates of chromosome length that were available. Heterochromatic regions, which are estimated to make up 3–15% of each chromosome (Table 1), are absent from the contigs analysed here. On the basis of the fingerprint bands in the maps (converted to Mb as described in Supplementary Information), 176 contigs represent 927 Mb, or 94% of the estimated total of 981 Mb of euchromatin. More than half of this (485 Mb) lies in fifteen contigs of 22–62 Mb each, which illustrates the degree of continuity obtained. We also analysed the representation of human gene markers in the map. The clone map contained 96% of the unique markers previously mapped to these chromosomes in GeneMap99 (ref. 9), and 90% were present in the genome sequence. As expected, this figure is higher for chromosomes 20 (99%) and 22q (98%), which are nearly or completely finished (Table 1).
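The summary figures in this paragraph (total mapped length, fraction of the estimated euchromatin, and how many of the largest contigs hold half of the mapped length) can be computed from a list of contig lengths. This is a minimal sketch that assumes the lengths have already been converted from fingerprint bands to Mb; the function name `map_summary` is hypothetical, and the half-length count is analogous to the L50 statistic used in assembly reporting.

```python
def map_summary(contig_mb, euchromatin_mb):
    """Summarize a clone map from its contig lengths (in Mb).

    Returns (total_mb, coverage, n_half, mb_in_n_half):
      total_mb      summed contig length
      coverage      total_mb as a fraction of the euchromatin estimate
      n_half        fewest largest contigs whose lengths reach half of total_mb
      mb_in_n_half  combined length of those contigs
    """
    if not contig_mb:
        raise ValueError("no contigs")
    total = sum(contig_mb)
    running = 0.0
    for n_half, length in enumerate(sorted(contig_mb, reverse=True), start=1):
        running += length
        if running >= total / 2:
            break
    return total, total / euchromatin_mb, n_half, running
```

With the totals quoted above, 927 Mb of contigs against 981 Mb of euchromatin gives a coverage of about 0.945; reproducing the fifteen-contig count would need the individual contig lengths from the Supplementary Information.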

Independent corroboration of the coverage of each chromosome required identifying the boundaries between euchromatic sequence and the centromeric, telomeric and other heterochromatic repeat sequences, as well as measuring the remaining gaps in the map. We have measured 52 of the 57 remaining gaps in the maps of chromosomes 6, 9, 10, 13 and 20 by fluorescence in situ hybridization (FISH), in addition to the gaps previously measured on chromosome 22. Clones immediately flanking each gap were hybridized to extended DNA fibres, interphase nuclei or metaphase chromosome spreads.
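Bookkeeping for the FISH-measured gaps can be sketched with one small record per gap. In this sketch (all names hypothetical), each gap stores its two flanking contigs, the measured separation between the flanking clones in Mb (or `None` while still unmeasured), and the lengths of any small contigs already placed inside the interval. The net gap is the separation minus those contigs, which is the same subtraction described for contigs 43 and 16 in the Fig. 1 caption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Gap:
    """One gap between adjacent contigs (hypothetical record layout)."""
    left_contig: str
    right_contig: str
    separation_mb: Optional[float]  # FISH distance between flanking clones; None if unmeasured
    intervening_mb: List[float] = field(default_factory=list)  # contigs placed inside the gap

    def net_mb(self) -> Optional[float]:
        """DNA still missing from the map: separation minus intervening contigs."""
        if self.separation_mb is None:
            return None
        net = self.separation_mb - sum(self.intervening_mb)
        if net < 0:
            raise ValueError(
                f"contigs between {self.left_contig} and {self.right_contig} "
                "exceed the measured separation"
            )
        return net


def summarize_gaps(gaps):
    """Return (measured count, unmeasured count, total net gap in Mb of measured gaps)."""
    nets = [g.net_mb() for g in gaps if g.separation_mb is not None]
    return len(nets), len(gaps) - len(nets), sum(nets)
```

For the 10q11 example, `Gap("43", "16", 4.0, [1.6]).net_mb()` returns approximately 2.4, where 1.6 Mb is the combined length of contigs 3000–3003.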

An example of this analysis is given for chromosome 10 (Fig. 1). Fourteen contigs account for 124.4 Mb of DNA. They contain pericentromeric satellite sequences and also a subtelomeric sequence on the short arm that lies <0.1 Mb from the telomere; the most distal clone on the long arm is <0.15 Mb from the telomere (ref. 10). From FISH analysis, the euchromatic gaps total ≤4.2 Mb. From these measurements, the estimated coverage of euchromatin in the map can be revised from 89% (based on Morton's previous estimate; Table 1) to 96.7% (124.4 of 128.6 Mb). A 9.75-Mb restriction map spanning the centromere has been constructed for chromosome 10 (ref. 11). By anchoring this map to the clone maps on both chromosome arms, we estimate that the gap across the centromeric region is about 4.5 Mb. This analysis gives a new estimate of 133 Mb for the total length of the chromosome (124.4 + 4.2 + 4.5 Mb; Fig. 1), compared with the previous value of 144 Mb. In a similar analysis of chromosome 13, seven contigs span 102 Mb and the remaining gaps in the euchromatic region total at most 5 Mb. This gives a new, increased estimate of 107 Mb for the euchromatic region, of which the map currently covers 95%.
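The revised chromosome 10 and 13 figures follow from simple sums. Writing C for total contig length, G for the total of the euchromatic gaps, G^cen for the centromeric gap, E for the euchromatic length and L for the total chromosome length (symbols introduced here for illustration only):

```latex
\begin{aligned}
E_{10} &= C_{10} + G_{10} = 124.4 + 4.2 = 128.6~\text{Mb}\\
C_{10}/E_{10} &= 124.4/128.6 \approx 0.967\\
L_{10} &\approx E_{10} + G^{\mathrm{cen}}_{10} = 128.6 + 4.5 = 133.1 \approx 133~\text{Mb}\\
E_{13} &= C_{13} + G_{13} = 102 + 5 = 107~\text{Mb}\\
C_{13}/E_{13} &= 102/107 \approx 0.953
\end{aligned}
```

Because the FISH gap totals are upper bounds (≤4.2 Mb and at most 5 Mb), coverage fractions computed this way are lower bounds, provided the contig lengths themselves are accurate.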

Figure 1: Physical map of chromosome 10.

Clone contigs cover 124.4 Mb, euchromatic gaps account for 4.2 Mb, and the gap across the centromeric satellites is 4.5 Mb (ref. 11). The separation between contigs 43 and 16 was determined as 4 Mb by FISH on metaphase chromosome spreads; subtracting the combined length of contigs 3000–3003 (1.6 Mb), which lie within this interval, gives a net gap of 2.4 Mb in this region of 10q11.

Our strategy may form the basis for finishing the map and sequence of every human chromosome. Accurate measurement of all gap sizes assists our continued efforts to bridge the remaining gaps with clones from other cloning systems (for example, yeast artificial chromosomes (YACs), plasmids and bacteriophage lambda). Characterization of the centromeric satellites and the remaining heterochromatic regions, including the short arms of the acrocentric chromosomes (13, 14, 15, 21 and 22), will enable us to determine the true physical extent of DNA in the human genome.