Haplotypic polymorphisms and mutation rate estimates of 22 Y-chromosome STRs in the Northern Chinese Han father–son pairs

Y-chromosome short tandem repeat (Y-STR) analysis has been widely used in forensic identification, kinship testing, and studies of population evolution. An accurate understanding of haplotype diversity and mutation rates benefits these applications. In this work, we analyzed 1123 male samples from the Northern Chinese Han population, including 578 DNA-confirmed father-son pairs, at 22 Y-STR loci. A total of 537 haplotypes were observed, and the overall haplotype diversity was calculated as 1.0000 ± 0.0001. Two haplotypes were each observed twice; the remaining 535 were unique. Furthermore, a total of 47 mutations were observed in 13,872 paternal meioses. The locus-specific mutation rate estimates ranged from 0.0 to 15.6 × 10−3, with an average mutation rate of 3.4 × 10−3 (95% CI 2.5–4.5 × 10−3). Among the 22 loci, DYS449, DYS389II and DYS458 were the most prone to mutation. This study adds to the growing data on Y-STR haplotype diversity and mutation rates and could be very useful for population and forensic genetics.

Y-chromosome short tandem repeats (Y-STRs) are widely used in genetic epidemiology 1, forensic genetics 2 and studies of human migration 3 because of their paternal inheritance and their structuring among human populations 4. However, like autosomal STRs, Y-STRs have high mutation rates 5. Reliable estimates of Y-STR mutation rates are therefore a prerequisite for accurate applications of Y-STR analysis. Several approaches to estimating Y-STR mutation rates have been reported, such as investigating father-son pairs of confirmed paternity 6, analyzing male individuals from deep-rooted pedigrees 7, genotyping sperm cells 8, and using Y-STR population data with known history 9. Of these approaches, direct observation of allelic transmission between father and son is the most accurate, provided that a large number of meioses can be investigated.
In this study, we determined the haplotypes and mutation rates for the 22

Materials and Methods
Samples and DNA extraction. Blood samples were collected from 1123 healthy Northern Chinese Han male individuals, among whom there were 578 father-son pairs. All father-son pairs were confirmed by typing 39 autosomal STRs with the Microreader™ 21 ID and 23 SP systems (Microread Genetics Incorporation, China), with a minimum paternity probability of 99.99%. All individuals signed informed consent before participating in this study. Genomic DNA was extracted using the Chelex resin method 10, and the analysis strictly followed the recommendations on the analysis of Y-STRs by the DNA Commission of the International Society of Forensic Genetics (ISFG) 11.

Statistical analysis.
Haplotype and allele frequencies were calculated by the gene counting method. Gene diversity (GD) for each locus was calculated as $\mathrm{GD} = n\,(1 - \sum_i p_i^2)/(n - 1)$, where $n$ is the number of alleles sampled and $p_i$ is the frequency of the $i$th allele. Discrimination capacity (DC) was determined as $\mathrm{DC} = N_{\mathrm{diff}}/N$, where $N_{\mathrm{diff}}$ and $N$ are the number of different haplotypes and the sample size, respectively. Haplotype diversity (HD) and its standard error (SE) were calculated according to Nei's formula 12. Mutation rates were calculated as the number of mutations divided by the number of meioses. Confidence intervals (CI) were estimated from the binomial standard deviation 13. In mutation counting, there were two father-son pairs in which a one-step mutation was seen at both DYS389I and DYS389II, for instance (13, 29) → (14, 30). Each of these was treated as one mutation instead of two because DYS389I is part of the sequence of DYS389II. To evaluate the relationship between allele size and mutation rate, the alleles at each locus were categorized by repeat number into short (25%), medium (50%) and long (25%) classes, as described by Ge et al. 14.
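The diversity statistics defined above can be illustrated with a minimal Python sketch. It assumes that HD uses the same unbiased estimator as GD, applied to haplotype rather than allele frequencies; the SE calculation is omitted. The function names and haplotype labels are hypothetical, and the toy input only reproduces the haplotype count spectrum reported in this paper (535 haplotypes seen once and 2 seen twice among 539 males), not the actual genotypes.

```python
from collections import Counter


def diversity(observations):
    """Unbiased diversity n/(n-1) * (1 - sum(p_i^2)) over observed types.

    Applied to alleles at one locus this gives GD; applied to whole
    haplotypes it gives HD (assumption: same estimator for both).
    """
    counts = Counter(observations)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two observations")
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_p2)


def discrimination_capacity(haplotypes):
    """DC = N_diff / N: distinct haplotypes divided by sample size."""
    return len(set(haplotypes)) / len(haplotypes)


# Hypothetical labels: 535 singleton haplotypes plus 2 haplotypes seen twice.
haplotypes = [f"H{i}" for i in range(535)] + ["D1", "D1", "D2", "D2"]

hd = diversity(haplotypes)
dc = discrimination_capacity(haplotypes)
print(f"N = {len(haplotypes)}, HD = {hd:.4f}, DC = {dc:.4f}")
```

With this count spectrum the haplotype diversity rounds to 1.0000 at four decimals, in line with the value reported in the abstract.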

Results and Discussion
Allele frequencies and gene diversity. Allele   From these, 531 haplotypes (99.25%) were observed once and 4 (0.75%) were observed twice. Although the number of unique haplotypes generally increases as additional Y-STR loci are combined, in this study adding 5 loci to the Yfiler set yielded only 2 additional unique haplotypes. This suggests that, to achieve high haplotype resolution in Y-STR analysis, the selection of appropriate loci, such as rapidly mutating Y-STRs 15, should be considered.

Variant alleles.
Thirty-four copy number variants were detected in 1123 males. Variant alleles were confirmed by re-amplification and genotyping. Null alleles were observed at DYS448 (6 father-son pairs), DYS19 (1 father-son pair) and DYS527a/b (1 father-son pair). Primers were designed for larger PCR fragments of these 3 loci, but failed to produce amplicons in the test samples (data not shown). DYS448 is located within the azoospermia factor c (AZFc) region in the distal euchromatic part of the Y chromosome long arm. AZFc consists almost entirely of very long direct and inverted repeats and is therefore prone to partial deletions or duplications by rearrangement 16. The DYS448 null allele has been reported in several studies 17–20. The relatively high frequency of the DYS448 null allele in Asians suggests that careful consideration should be given to the use of DYS448 in commercial genotyping kits and in database construction for Asian populations. Triplications were observed at DYS527a/b (8 father-son pairs) and DYS385a/b (1 father-son pair). These variants are not rare in forensic casework and should be interpreted carefully to exclude mixed profiles. They have been attributed to non-allelic homologous recombination 21.
Mutation rates. In this study, 578 father-to-son meioses were observed, in which 47 mutations were found at all the studied loci except DYS47, DYS438, DYS447, DYS522, and DYS388 (Table 2). No father-son pair carried mutations at more than one locus. Except for one three-step mutation at DYS449 (32 → 29), all mutations were single-step; that is, 97.9% of the mutations were one-step. This finding is consistent with the general notion that the majority of mutations comprise single-step repeat gains or losses due to strand slippage during replication 22. Among these 47 mutations, 26 (55.3%) gained repeats and 21 (44.7%) lost repeats. Hence, the data herein support the view that mutations at these Y-chromosome microsatellites do not show any contraction or expansion bias.
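The proportions quoted in this paragraph, and the claim of no expansion or contraction bias, can be checked with a short Python sketch. The exact two-sided binomial (sign) test used here is an illustrative choice and is not a method described in this paper; the function name is hypothetical. Only the counts (26 gains, 21 losses, one three-step mutation) come from the text.

```python
from math import comb


def two_sided_binomial_p(k, n, p=0.5):
    """Exact two-sided binomial test.

    Sums the probabilities of all outcomes that are no more likely
    than the observed count k under Binomial(n, p).
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for float ties
    return min(1.0, sum(x for x in pmf if x <= threshold))


# Counts taken from the text above.
gains, losses = 26, 21
total = gains + losses
single_step = total - 1  # one three-step mutation at DYS449

print(f"gains: {gains / total:.1%}, losses: {losses / total:.1%}")
print(f"single-step: {single_step / total:.1%}")

# Null hypothesis (illustrative): gains and losses are equally likely.
p_value = two_sided_binomial_p(gains, total)
print(f"two-sided p-value for 26 vs 21: {p_value:.2f}")
```

Under this test, the 26:21 split does not reject equal gain and loss probabilities at the 5% level, which agrees with the conclusion stated above.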

The average mutation rate across these 22 Y-STR loci was 0.0034 (95% confidence interval (CI), 0.0025-0.0045), which was close to the average mutation rates across 16 Y-STR markers in the Texas populations (0.0021) reported by Ge et al. 14 and in the South China Han population (0.0023) reported by Weng et al. 23. The mutation rates of the 22 Y-STR loci ranged from 0.0000 (95% CI, 0.0000-0.0064) to 0.0156 (95% CI, 0.0076-0.0311). Mutation counts and rates by relative allele size (short, moderate, and long) for each locus are shown in Table 3. In the Northern Chinese Han population, the mutation rate of long alleles (6.9 × 10−3) was significantly greater than that of short (1.9 × 10−3) and moderate (2.5 × 10−3) alleles. Longer alleles are therefore more likely to mutate than shorter alleles, which is consistent with previous studies 14,23,24.
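As a rough check on the interval estimates quoted above, the following Python sketch computes a mutation rate with an exact (Clopper-Pearson) binomial confidence interval. The choice of the exact interval is an assumption: the Methods state only that intervals were derived from the binomial distribution, and the nonzero upper bound reported for a locus with zero mutations suggests an exact method was used rather than a normal approximation. The function names are hypothetical.

```python
from math import exp, lgamma, log


def binom_pmf(i, n, p):
    """Binomial probability mass, computed in log space for large n."""
    if p <= 0.0:
        return 1.0 if i == 0 else 0.0
    if p >= 1.0:
        return 1.0 if i == n else 0.0
    log_coef = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
    return exp(log_coef + i * log(p) + (n - i) * log(1 - p))


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


def solve_decreasing(f, target):
    """Bisection on [0, 1] for f(p) = target, with f decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for k mutations in n meioses."""
    # Lower bound: P(X >= k | p) = alpha/2, i.e. P(X <= k-1 | p) = 1 - alpha/2.
    lower = 0.0 if k == 0 else solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper bound: P(X <= k | p) = alpha/2.
    upper = 1.0 if k == n else solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


# Counts from the text: 47 mutations in 13,872 meioses overall,
# and 0 mutations in 578 meioses for a locus without mutations.
for k, n in [(47, 13872), (0, 578)]:
    low, high = clopper_pearson(k, n)
    print(f"{k}/{n}: rate = {k / n:.4f}, 95% CI = {low:.4f}-{high:.4f}")
```

Under this assumption the intervals should land close to the reported 0.0025-0.0045 for the overall rate and 0.0000-0.0064 for a locus with no observed mutations.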
Y-STR mutation rates are estimated most accurately by testing a large number of meioses in father-son pairs. Ballantyne et al. 25 provided mutation rates for a large number of Y-STR markers in up to 2000 DNA-confirmed father-son pairs collected from Germany and Poland. Burgarella et al. 26 performed a meta-analysis to estimate the mutation rates of 110 Y-STRs by combining population and father-son pair data. A comparison of our data with these published rates is shown in Table 4. The mutation rates for most of the shared loci were similar, except for DYS449, for which the rate reported by Burgarella et al. (1.9 × 10−3) was only approximately one eighth of ours and one seventh of that of Ballantyne et al.

Conclusion
In this study, we investigated the haplotype diversity and estimated the mutation rates of 22 Y-STRs in 578 father-son pairs from a Northern Chinese Han population. We detected 537 distinct haplotypes in 539 male individuals, indicating a high power to distinguish unrelated male individuals. Furthermore, a total of 47 mutations were observed in 13,872 paternal meioses. The locus-specific mutation rate estimates ranged from 0.0 to 15.6 × 10−3, with an average mutation rate of 3.4 × 10−3 (95% CI 2.5-4.5 × 10−3). This study adds to the growing data on Y-STR haplotype diversity and mutation rates and could be very useful for population and forensic genetics. However, to obtain precise knowledge of haplotypes and mutation rates, analyses of more meioses involving more Y-STR loci should be performed.