A low-resolution physical map of the Y chromosome was previously assembled by testing naturally occurring deletions and yeast artificial chromosome (YAC) clones for the presence or absence of 182 Y-chromosomal sequence-tagged sites (STSs)3,6. These STS markers were generated from Y DNA sequences selected at random, which promoted representative sampling of the entire chromosome6. Nonetheless, most randomly selected sequences proved unusable as map landmarks because they corresponded either to interspersed repetitive elements found throughout the genome or to male-specific repetitive sequences dispersed to many locations in the NRY.

To construct a high-resolution map, we generated additional STSs in a directed manner. To enrich for the single-copy sequences most useful as map landmarks, we systematically applied genomic clone subtraction, whereby a ‘tracer’ clone's DNA is depleted of sequences shared with a set of ‘driver’ clones4. We identified a tiling path of 57 YACs that collectively spanned the euchromatic NRY3, and then carried out, in parallel, 57 subtractions, each employing one YAC as tracer and the remaining YACs (minus those overlapping the tracer YAC) as drivers. We sequenced a random sample of products from each of the 57 subtractions, identifying 308 additional STSs that proved useful in map assembly.
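The parallel subtraction design described above can be illustrated with a minimal sketch. It assumes the overlaps between YACs in the tiling path are already known. The function name `plan_subtractions`, the clone names and the four-clone toy tiling path are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Set


def plan_subtractions(overlaps: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """For each tracer YAC, list its drivers: every other YAC in the
    tiling path except those that overlap the tracer."""
    yacs = sorted(overlaps)
    plan = {}
    for tracer in yacs:
        # Exclude the tracer itself and any clone sharing sequence with it.
        excluded = overlaps[tracer] | {tracer}
        plan[tracer] = [yac for yac in yacs if yac not in excluded]
    return plan


# Hypothetical tiling path of four YACs; neighbours overlap (assumption).
toy_overlaps = {
    "yA": {"yB"},
    "yB": {"yA", "yC"},
    "yC": {"yB", "yD"},
    "yD": {"yC"},
}

plan = plan_subtractions(toy_overlaps)
for tracer, drivers in plan.items():
    print(tracer, "is subtracted against", drivers)
```

With a 57-clone tiling path, the same loop would yield 57 tracer and driver sets, one per subtraction.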

We used radiation hybrid mapping to integrate and order the random and subtraction-derived STSs. Large-fragment radiation hybrid panels offering long-range connectivity have been used to assemble human genome maps at 500–1,000-kilobase (kb) resolution7,8,9,10. To obtain greater resolution in the NRY, where the number of STSs appeared sufficient to cover the euchromatic region at an average spacing of 50 kb, we used a small-fragment radiation hybrid panel that had been used to build detailed maps of limited autosomal segments11. We tested this panel for all random and subtraction-derived NRY STSs. Nascent radiation hybrid linkage groups were ordered and oriented with respect to the centromere by positioning selected STSs on the existing map of natural deletions6. Additional STSs generated in our project's later phases were also tested. Ultimately, 513 STSs were positioned, at a resolution of around 50 kb, on a radiation hybrid map encompassing nearly the entire euchromatic NRY.

To prepare for sequencing the NRY, we used the radiation hybrid map as a scaffold for assembling contigs of bacterial artificial chromosome (BAC) clones. We screened a BAC library of human male genomic DNA with hybridization probes derived from NRY STSs. Through subsequent polymerase chain reaction tests of STS content, we assembled 1,038 BACs into contigs that, except for four small gaps, represented the whole NRY (see Fig. 1 in Supplementary Information). Many portions of this BAC map could be assembled only after the sequence of selected BACs had been determined and compared. Also, many NRY genes and extragenic sequences are known to have closely related counterparts on the X chromosome12. In many cases, it was initially unclear whether BACs identified using X-homologous NRY STSs, especially those from a 4-Mb region of 99% X–Y identity13,14, derived from the NRY or from the X chromosome. We resolved these ambiguities by resequencing STSs from the BACs in question and comparing them to X- or Y-derived reference sequences. The resulting map of overlapping BACs and ordered STSs (Fig. 1 in Supplementary Information) was extensively cross-checked against the radiation hybrid map and was further reinforced by restriction fingerprinting of all mapped BACs15.
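As a simplified illustration of how STS content can link clones into contigs, the sketch below groups BACs that share at least one STS. It is not the procedure used in the study, which also relied on the radiation hybrid scaffold, BAC sequence comparison and restriction fingerprinting. The function name `assemble_contigs` and the clone and marker names are hypothetical.

```python
from typing import Dict, List, Set


def assemble_contigs(bac_sts: Dict[str, Set[str]]) -> List[List[str]]:
    """Group BACs into contigs: two BACs are joined if they share an STS."""
    parent = {bac: bac for bac in bac_sts}

    def find(bac: str) -> str:
        # Follow parent links to the group representative (path halving).
        while parent[bac] != bac:
            parent[bac] = parent[parent[bac]]
            bac = parent[bac]
        return bac

    first_bac_with = {}  # STS -> first BAC seen carrying it
    for bac, markers in bac_sts.items():
        for sts in markers:
            if sts in first_bac_with:
                parent[find(bac)] = find(first_bac_with[sts])
            else:
                first_bac_with[sts] = bac

    groups = {}
    for bac in bac_sts:
        groups.setdefault(find(bac), []).append(bac)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical STS content for four BACs (assumption).
toy_content = {
    "b1": {"s1", "s2"},
    "b2": {"s2", "s3"},
    "b3": {"s3"},
    "b4": {"s9"},
}
contigs = assemble_contigs(toy_content)
print(contigs)
```

In this scheme, an STS present at more than one location would join BACs from unrelated sites, which is one reason single-copy STSs are the most useful landmarks.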

Figure 1: Repetitive structure of euchromatic NRY.

Bottom, schematic of the Y chromosome, comprising large NRY flanked by pseudoautosomal regions (yellow). NRY is divided into euchromatic and heterochromatic (tan, shown truncated) portions, roughly 24 and 30 Mb, respectively. pter, short-arm telomere; cen, centromere; qter, long-arm telomere. Within euchromatic NRY, regions rich in NRY-specific amplicons (blue) or sequence similarity to X chromosome (red) are shown. Above chromosome schematic are positions of some NRY genes; most are found in amplicons (blue) or have X-linked homologues (red). Above genes is a plot of the average number of NRY BACs that contain each of the 758 STSs mapped (136 of these STSs at two or more locations) along euchromatic NRY. As expected, STSs in amplicon regions tended to be present in more BACs than STSs in X-homologous or unshaded regions. (Plotted values are local averages within sliding window of five consecutive STSs; values reflect all NRY BACs containing those STSs, not just BACs assigned to site indicated.) Some amplicon regions were under-represented in the BAC library; four gaps remain (red diamonds; ≤100 kb each) in BAC coverage of NRY. Top, STS-based dot plot of euchromatic NRY. Each dot reflects occurrence of a particular STS at two points in map (complete map shown in Fig. 1 in Supplementary Information). Dots fall almost exclusively within amplicon regions. Many repeats of entire groups of STSs are apparent, with lines parallel to light grey diagonal indicating direct repeats and lines perpendicular to light grey diagonal indicating inverted repeats. Green arrows, inverted repeats flanking 3.5-Mb inversion (see text). Pale red lines, centromere.
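The local averaging mentioned in the legend (a sliding window of five consecutive STSs) can be sketched as follows. The BAC counts are invented for illustration. Treating only complete windows, with no padding at the ends of the map, is an assumption, because the legend does not state how edges were handled.

```python
from typing import List, Sequence


def sliding_mean(counts: Sequence[float], window: int = 5) -> List[float]:
    """Mean of each run of `window` consecutive values (complete windows only)."""
    if window < 1 or window > len(counts):
        return []
    total = sum(counts[:window])
    means = [total / window]
    for i in range(window, len(counts)):
        # Slide by one STS: add the new value, drop the oldest.
        total += counts[i] - counts[i - window]
        means.append(total / window)
    return means


# Hypothetical number of BACs containing each of eight consecutive STSs.
bacs_per_sts = [4, 5, 6, 20, 22, 21, 19, 5]
smoothed = sliding_mean(bacs_per_sts)
print(smoothed)
```

In a plot of this kind, a run of elevated values would be consistent with an amplicon region, where each STS is present in more BACs.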

The greatest challenges were posed by massive, NRY-specific amplified regions (or amplicons), which comprise about one-third of the euchromatic NRY. Of the 758 STSs on which the map is built, 136 are present at two or more locations in the NRY. Although we avoided such repetitive STSs in favour of single-copy STSs wherever possible, substantial portions of the euchromatic NRY contained little or no single-copy sequence. For many such amplicons, BACs derived from different copies could not be distinguished by STS content or restriction fingerprinting15. In many cases, we distinguished among amplicon copies (and the BACs corresponding to them) by typing ‘sequence family variants’ (SFVs)5. SFVs are subtle differences (for example, single-nucleotide substitutions or dinucleotide repeat length alterations) between closely related but non-allelic sequences. We were analysing BACs from only one male's Y chromosome, so these subtle sequence differences could not represent allelic variants. In general, we identified SFVs only after comparing the DNA sequences of BACs that originated from distinct copies, despite having similar STS content. Thus, mapping and sequencing were inseparable, iterative activities in amplicon-rich regions.
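The simplest kind of SFV, a single-nucleotide substitution between two amplicon copies, can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned to equal length, with `-` marking gaps. The function name `find_substitution_sfvs` and the sequences are hypothetical, and length differences in dinucleotide repeats are not handled.

```python
from typing import List, Tuple


def find_substitution_sfvs(copy_a: str, copy_b: str) -> List[Tuple[int, str, str]]:
    """Return (position, base_in_a, base_in_b) for each mismatched,
    ungapped column of two pre-aligned sequences."""
    if len(copy_a) != len(copy_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(copy_a, copy_b))
        if a != b and a != "-" and b != "-"
    ]


# Hypothetical aligned sequences from two copies of one amplicon.
copy_1 = "ACGTTGCAAC"
copy_2 = "ACGTCGCAAT"
sfvs = find_substitution_sfvs(copy_1, copy_2)
print(sfvs)
```

Because all BACs derive from one male's Y chromosome, any difference found this way would mark distinct copies rather than alleles, so each such position could serve as a candidate marker for assigning further BACs to a copy.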

The euchromatic NRY amplicons are diverse in composition, size, copy number and orientation (Fig. 1), with some occurring as tandem repeats, others as inverted repeats, and still others dispersed throughout both arms of the chromosome. The euchromatic amplicons are well populated with testis-specific gene families that may be critical in spermatogenesis (see Fig. 1 in Supplementary Information)1,2.
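The way direct and inverted repeats appear in an STS-based dot plot such as that in Fig. 1 can be illustrated with a toy example. The sketch places a dot wherever one STS occurs at two map positions, then counts adjacent dot pairs that run parallel to the diagonal (direct repeats) or perpendicular to it (inverted repeats). The STS order and function names are hypothetical.

```python
from collections import defaultdict
from typing import List, Set, Tuple


def dot_plot(order: List[str]) -> Set[Tuple[int, int]]:
    """Dots (i, j), i < j, where the same STS occurs at map positions i and j."""
    positions = defaultdict(list)
    for idx, sts in enumerate(order):
        positions[sts].append(idx)
    dots = set()
    for idxs in positions.values():
        for a in range(len(idxs)):
            for b in range(a + 1, len(idxs)):
                dots.add((idxs[a], idxs[b]))
    return dots


def count_steps(dots: Set[Tuple[int, int]]) -> Tuple[int, int]:
    """Count adjacent dot pairs running parallel (direct) or
    perpendicular (inverted) to the main diagonal."""
    direct = sum(1 for (i, j) in dots if (i + 1, j + 1) in dots)
    inverted = sum(1 for (i, j) in dots if (i + 1, j - 1) in dots)
    return direct, inverted


# Hypothetical map order: a-b-c repeated directly, p-q repeated in inverse.
toy_order = ["a", "b", "c", "x", "a", "b", "c", "y", "p", "q", "z", "q", "p"]
dots = dot_plot(toy_order)
direct_steps, inverted_steps = count_steps(dots)
print(sorted(dots), direct_steps, inverted_steps)
```

Single-copy STSs such as `x`, `y` and `z` in the toy order contribute no dots, which matches the observation that dots fall almost exclusively within amplicon regions.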

One pair of amplicons is of particular interest in the context of human variation. Highlighted in Fig. 1 (arrows) are two units, each 300 kb long, that exist in opposite orientations on the short arm. These inverted repeats bound a region of around 3.5 Mb that occurs in one orientation (Fig. 1 in Supplementary Information) in the male from whom the BAC library was constructed, but in the opposite orientation in the existing map of naturally occurring deletions6. This may reflect variation among men for a 3.5-Mb inversion1,16, perhaps the result of homologous recombination between the 300-kb inverted repeats flanking the inverted segment. Large Y-chromosome inversions are postulated to have been crucial in the evolution of the human sex chromosomes12, and this 3.5-Mb inversion may be one of many massive NRY variants that exist in modern populations.