Introduction

The paternally inherited human Y chromosome has a well-established phylogeny1 in which each clade, or haplogroup, is defined by one or more binary markers, such as single-nucleotide polymorphisms and small insertion/deletion polymorphisms (here collectively referred to as SNPs). Each haplogroup has a specific, and sometimes distinct, geographical distribution, shaped by demographic and evolutionary events, such as range expansion, migration and drift. Y-chromosome haplogroup O, characterized by a 5-bp deletion known as M175 (rs2032678), is the dominant haplogroup among males throughout East and Southeast Asia, where its frequency typically ranges between 50 and 100%,2, 3 whereas it is rarer in Central Asia.4 In addition, haplogroup O is observed in coastal and island parts of Near Oceania,5 is frequent overall in Remote Oceania,5 and is present in Madagascar,6 all due to population movements from East/Southeast Asia. Although several SNPs phylogenetically downstream of M175 have been known for more than a decade,7 recent advances in Y-SNP discovery have substantially further resolved the internal topology of haplogroup O.1, 8, 9 To make this accumulated knowledge easily available for future human Y-chromosome studies, we developed a multiplex Y-SNP genotyping tool, based on single-base primer extension (SNaPshot) technology, for the discrimination of the most significant haplogroup O sublineages.

Materials and methods

DNA samples

For testing and validation of the assay, we used DNA from samples of the HapMap 3 reference panel,10 obtained through the Coriell Institute for Medical Research (http://www.coriell.org/), from samples of the European Collection of Cell Cultures (ECACC) ethnic diversity panel (EDP-1) (http://www.hpacultures.org.uk/products/dna/ethnicdna.jsp), and from samples described previously.11

Marker selection

Informative Y-SNP markers were selected by surveying haplogroup frequency data available from the literature as summarized in Figure 1. The final marker selection included M175, M119, P203 (also known as M307), M110, M268, M95, M88, M176 (also known as SRY465 or PS63), M122, M324, KL1 (also known as L465), 002611, P201 (also known as 021354), M7, M134 and PS23.

Figure 1

Marker phylogeny of the Y-SNPs included in the multiplex assay (left part), with previously reported haplogroup frequency data (in percentages) for a range of populations (right part). When the SNP resolution of the available haplogroup frequency data did not match that of the multiplex assay, multiple haplogroups were combined into one class as indicated by the box lines; for example, 2.2% of Koreans fall within haplogroup O-M119, but it is not known how these are distributed over O-M119*(xP203,M110), O-P203 and O-M110, because markers P203 and M110 were not typed in this population sample. In addition, some studies used a different, but phylo-equivalent marker, as compared with the marker included in our assay; in such case, the alternative marker is also shown in the phylogeny; for example, in the Malagasy marker M50 was typed rather than M110. Data sources for haplogroup frequencies are as follows: northeast Indians;13 Han Chinese;9 Koreans;14 Japanese;8 Taiwanese aborigines, western Indonesians and eastern Indonesians;3 Admiralty Islanders, Solomon Islanders and Polynesians;15 Malagasy;6 US Asian Americans.16
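The dependence of the reported haplogroup class on typing resolution can be illustrated with a minimal Python sketch. It assumes only the part of the phylogeny that is stated in the text (M119 downstream of M175, with P203 and M110 as sister lineages under M119); the `PARENT` table and the function names are hypothetical and are not part of the assay.

```python
# Assumed subset of the marker tree (child marker -> parent marker).
# Only relationships stated in the text are included; this is not the full
# 16-marker phylogeny of Figure 1.
PARENT = {"M119": "M175", "P203": "M119", "M110": "M119"}


def lineage(marker):
    """Return the markers from the root (M175) down to `marker`."""
    path = [marker]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]


def reported_class(terminal_marker, typed):
    """Label a sample would receive in a study that typed only `typed`.

    `terminal_marker` is the most downstream derived marker of the sample
    at full resolution; `typed` is the set of markers the study genotyped.
    """
    derived_typed = [m for m in lineage(terminal_marker) if m in typed]
    if not derived_typed:
        raise ValueError("no typed marker is derived in this sample")
    deepest = derived_typed[-1]
    # Typed markers directly below the deepest derived marker were found
    # ancestral, so they are listed as excluded in the paragroup label.
    excluded = [m for m, p in PARENT.items() if p == deepest and m in typed]
    if excluded:
        return f"O-{deepest}*(x{','.join(excluded)})"
    return f"O-{deepest}"


# An O-P203 sample in studies with different marker sets:
print(reported_class("P203", {"M175", "M119"}))
print(reported_class("P203", {"M175", "M119", "M110"}))
print(reported_class("P203", {"M175", "M119", "P203", "M110"}))
```

Under these assumptions the same sample is labelled O-M119, O-M119*(xM110) or O-P203 depending on which markers were typed, which is why frequency data of lower resolution have to be combined into one class.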

Primer design and genotyping protocol

PCR and extension primers (Table 1) were designed as described previously.12

Table 1 Genotyping details of the Y-SNP haplogroup O multiplex

Multiplex PCR amplification was carried out in a reaction volume of 6 μl, containing 1 × GeneAmp PCR Gold Buffer (Applied Biosystems, Foster City, CA, USA), 4.5 mM MgCl2 (Applied Biosystems), 100 μM of each dNTP (Roche, Mannheim, Germany), 0.35 units of AmpliTaq Gold DNA polymerase (Applied Biosystems), 1–2 ng of genomic DNA template and PCR primers (desalted; Metabion, Martinsried, Germany) in concentrations as specified in Table 1. The reactions were performed in a Dual 384-well GeneAmp PCR System 9700 (Applied Biosystems) with the following cycling conditions: 10 min at 95 °C, followed by 30 cycles of 94 °C for 15 s and 60 °C for 45 s, and a final extension at 60 °C for 5 min. PCR products were purified by adding 2 μl ExoSAP-IT (USB Corporation, Cleveland, OH, USA) to 6 μl PCR product and incubating at 37 °C for 30 min followed by 80 °C for 15 min.

Multiplex single-base primer extension was carried out in a reaction volume of 6 μl, containing 1 μl SNaPshot Ready Reaction Mix (Applied Biosystems), 1 μl purified PCR product and extension primers (high-performance liquid chromatography-purified; Metabion) in concentrations as specified in Table 1. The reactions were performed in a Dual 384-well GeneAmp PCR System 9700 (Applied Biosystems) with the following cycling conditions: 2 min at 96 °C, followed by 25 cycles of 96 °C for 10 s, 50 °C for 5 s and 60 °C for 30 s. The reaction products were purified by adding 1 unit of shrimp alkaline phosphatase (USB Corporation) to 6 μl of extension product and incubating at 37 °C for 45 min followed by 75 °C for 15 min.
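As a rough planning aid, the nominal hold times of the two cycling programs described above can be added up as in the following Python sketch. The helper `hold_time_s` is hypothetical, and the calculation ignores the ramping time between temperatures, so actual run times on the thermal cycler will be longer.

```python
def hold_time_s(initial_s, cycles, step_seconds, final_s=0):
    """Sum of programmed hold times in seconds (temperature ramping ignored)."""
    return initial_s + cycles * sum(step_seconds) + final_s


# Multiplex PCR: 10 min at 95 °C, 30 cycles of (15 s + 45 s), 5 min final extension.
pcr_s = hold_time_s(10 * 60, 30, [15, 45], final_s=5 * 60)

# Primer extension: 2 min at 96 °C, 25 cycles of (10 s + 5 s + 30 s).
extension_s = hold_time_s(2 * 60, 25, [10, 5, 30])

print(pcr_s / 60, "min PCR hold time")
print(extension_s / 60, "min extension hold time")
```

With the conditions given in the text, this amounts to 45 min of hold time for the PCR and 20.75 min for the primer extension.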

The extended fragments were analyzed by capillary electrophoresis using a 3130xl Genetic Analyzer (Applied Biosystems) with POP-7 polymer. A mixture of 1 μl purified extension product, 8.7 μl Hi-Di formamide (Applied Biosystems) and 0.3 μl GeneScan-120 LIZ internal size standard (Applied Biosystems) was run with 23 s injection time at 1.2 kV, and 500 s run time at 15.0 kV. Data were analyzed using GeneMapper version 3.7 software (Applied Biosystems).

Results and discussion

The assay introduced here includes a total of 16 Y-SNPs, among which there are four recently discovered markers (P203, KL1, 002611 and P201) that prove phylogenetically informative and hence will help to improve the resolution of future Y-chromosome studies. Figure 1 shows the phylogenetic relationship of the 16 Y-SNPs included and provides frequency data of the respective haplogroups for a range of populations. Figure 2 shows typical electropherograms obtained with the multiplex assay for a range of DNA samples belonging to different haplogroup O subgroups. In all cases, all 16 allele peaks were clearly visible and showed no violation of the established marker topology (Figure 1). The assay is sensitive by virtue of small amplicon sizes (minimum: 45 bp; maximum: 123 bp; average: 90 bp), and is therefore expected to be applicable to small amounts of (degraded) DNA, such as are often encountered in forensic casework and ancient DNA studies. Typically, we obtained good results when using template DNA amounts of 1–2 ng. Furthermore, we tested our assay on a number of commonly available reference DNA samples; we report the determined Y haplogroups in Table 2 so that other researchers can use them as control DNA samples in future studies.
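What "no violation of the established marker topology" means for a genotyped sample can be sketched in Python as follows. This is an illustration rather than the procedure used in the study: it covers only the relationships stated in the text (M175, M119, P203, M110), assumes that every marker in the table was typed, and uses a hypothetical `call_haplogroup` function.

```python
# Assumed subset of the marker tree (child marker -> parent marker).
PARENT = {"M119": "M175", "P203": "M119", "M110": "M119"}


def call_haplogroup(states):
    """Return the most downstream derived marker, or None if all are ancestral.

    `states` maps each typed marker to True (derived allele observed) or
    False (ancestral allele observed). A ValueError is raised when the
    allele pattern contradicts the tree.
    """
    derived = {m for m, is_derived in states.items() if is_derived}

    # A derived marker requires its upstream marker to be derived as well.
    for m in sorted(derived):
        parent = PARENT.get(m)
        if parent is not None and parent not in derived:
            raise ValueError(f"{m} is derived but upstream {parent} is not")

    # Derived markers without a derived child are terminal; a Y chromosome
    # can sit on one branch only, so more than one terminal is a conflict.
    terminals = [m for m in sorted(derived)
                 if not any(PARENT.get(c) == m for c in derived)]
    if len(terminals) > 1:
        raise ValueError(f"derived alleles on sister branches: {terminals}")
    return terminals[0] if terminals else None


sample = {"M175": True, "M119": True, "P203": True, "M110": False}
print(call_haplogroup(sample))
```

For the example sample the sketch returns P203, that is, haplogroup O-P203. A sample carrying derived alleles at both P203 and M110, or at M110 without M119, would be flagged as inconsistent.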

Figure 2

Typical electropherograms obtained with the Y-haplogroup O multiplex assay introduced here. Samples belonging to a range of different haplogroups (indicated at the left of each electropherogram) were selected such that every possible allele can be seen at least once. For each peak, the detected allele is indicated in concordance with Table 1. As is convention, the yellow dye is shown as black for better contrast. A full color version of this figure is available at the Journal of Human Genetics journal online.

Table 2 Haplogroup information of commonly available reference samples

Previous population genetic studies have employed various combinations of the hitherto-known SNPs within Y-chromosome haplogroup O. However, some of the recently discovered Y-SNPs appear to be informative for the breakup of previously undifferentiated clusters. For instance, a considerable fraction of Southeast Asian males who were previously classified as O-M119*(xM110) turned out to belong to O-P203,3 which is a subhaplogroup of O-M119 and a sister haplogroup to O-M110.1 We foresee that future studies making use of our assay will benefit from the inclusion of such an increased marker set, which will reveal patterns of genetic distribution that would otherwise (at lower phylogenetic resolution) remain unnoticed. Furthermore, such studies will generate further knowledge regarding the precise geographic distribution of each of the haplogroup O sublineages.

In conclusion, we provide a convenient and sensitive multiplex genotyping assay for the dissection of the most significant Y-chromosome haplogroup O sublineages. The assay can be applied to male DNA samples for which prior testing (for example, by the use of a global Y-SNP assay12) revealed haplogroup O status, to retrieve more detailed Y-chromosome diversity and patrilineal biogeographic ancestry information. It is thus relevant to human population genetic, anthropological, genealogical and forensic studies.