Y chromosome haplogroups based genome-wide association study pinpoints revelation for interactions on non-obstructive azoospermia

The Y chromosome has high genetic variability with low rates of parallel and back mutations, which make up the most informative haplotyping system. To examine whether Y chromosome haplogroups (Y-hgs) could modify the effects of autosomal variants on non-obstructive azoospermia (NOA), based on our previous genome-wide association study (GWAS), we conducted a genetic interaction analysis in GWAS subjects. Logistic regression analysis demonstrated a protective effect of Y-hg O3e* on NOA. Then, we explored the potential interaction between Y-hg O3e* and autosomal variants. Our results demonstrated that there was a suggestively significant interaction between Y-hg O3e* and rs11135484 on NOA (Pinter = 9.89 × 10−5). Bioinformatic analysis revealed that genes annotated by significant single nucleotide polymorphisms (SNPs) were mainly enriched in immunological pathways. This is the first study of interactions between Y-hgs and autosomal variants on a genome-wide scale, which addresses the missing heritability in spermatogenic impairment and sheds new light on the pathogenesis of male infertility.

Infertility is the inability of a sexually active couple to achieve pregnancy without contraception in one year 1 . It affects about one in six couples, and male contributions can be found in approximately half of the cases 2 . Apart from some acquired factors like mechanical injury, infection, medical use et al., something congenital such as genetic abnormalities plays a crucial role in the etiology of male infertility. Mutagenesis studies in vivo have identified hundreds of potential causal genes that impact the process of reproduction 3,4 , and some of them, like chromosomal abnormalities and Y chromosome micro-deletions, are used for the diagnosis of male infertility 5 .
Among these genetic studies, Genome-wide association studies (GWAS) present a powerful tool to detect candidate genes for a trait. Our previous GWAS identified some susceptibility loci for non-obstructive azoospermia (NOA) in Han Chinese 6,7 . Although GWAS have successfully identified some causal single nucleotide polymorphisms (SNPs) in the past few years, some researchers now find it more limited for complex diseases 8,9 . For example, GWAS focus on common SNPs and neglect rare mutations; and SNPs on sex chromosomes are commonly excluded from most GWAS due to the law of linkage disequilibrium. However, sex chromosomes, especially the Y chromosome, play central roles in sex determination, and it is improper to overlook effects of sex chromosomes when understanding the genetic etiology of a disease.
Y chromosome, in contrast to the rest of the genome, is confined to males and contains the smallest number of genes, most of which locate in the male specific region (MSY) 10 . Y chromosome has the most informative Scientific RepoRts | 6:33363 | DOI: 10.1038/srep33363 haplotyping system with applications in evolutionary studies, medical genetics and genealogical reconstruction 11 . Considering the function of the Y chromosome in sex determination, it has been reported that some Y chromosome haplogroups (Y-hgs) may increase the danger of spermatogenic impairment across different populations [12][13][14] .
In fact, as a mainly genetic background, Y-hgs may underlie phenotypic variations like SNPs in autosomes. And association studies considering only one of the Y chromosme or autosomes are likely to attain inconsistent results among different human populations or even in the same population [15][16][17] . A more appropriate strategy to explore potential genetic causes of spermatogenic impairment is to combine Y-hgs with SNPs in autosomes. Therefore, in this study we recruited 1,000 NOA cases and 1,703 fertile controls in Han-Chinese, compared the distributions of Y-hgs in both groups and analyzed the interactions between Y-hgs and autosomal variants in GWAS.

Results
Distributions of Y-hgs for NOA. To assess whether some Y-hgs are predisposing to or protecting against the spermatogenic impairment, we compared the distributions of Y-hgs between cases and controls. As shown in Fig. 1, we found that subjects belonging to Y-hg O3e* were more frequent in controls than in cases, and the difference was statistically significant (OR = 0.68, 95% CI = 0.52-0.89 and P = 5.55 × 10 −3 ). On the contrary, frequencies of Y-hgs O3e1 and O2* was higher in the case group (16.3% and 5.1%, respectively) than those in the control group (13.2% and 3.7%, respectively), although there were no significant differences between these two groups.
Interactions between Y-hgs and SNPs on NOA. From the first-step screening, there was only one significant Y-hg O3e* associated with NOA. Thus, in order to examine potential modifications of the Y-hg on effects of SNPs, we further compared the distribution of SNPs among NOA cases and controls based on the frequency of Y-hg O3e*. Although there is no demonstration of a statistically significant result here, however, totally 38 Y-hg O3e*-SNP pairs were found to be suggestively significant (P inter ≤ 1 × 10 −4 ) and presented in Table 1. Considering the main effects of both Y-hg O3e* and SNPs, rs11135484 possessed the smallest P = 9.09 × 10 −4 and executed a beneficial effect on azoospermia (OR SNP = 0.80). Interaction between Y-hg O3e* and rs11135484 was synergistic (OR inter = 2.07). Besides, rs17217643, rs6774209, rs11135482, rs17139327, rs4757259 and rs8035166 were also protective factors which magnified their effects on the presence of Y-hg O3e*. On the contrary, rs11678378, rs12520985, rs9452333 and rs9510242 showed antagonistic interactions with Y-hg O3e*.

Further functional study in silicon.
To examine whether SNPs in Table 1 exert their function via effects on expressions of nearby genes, we searched these variants on the GTEx in all expression quantitative trait loci (eQTL) tissues including blood, lung, adipose etc. As shown in Fig. 2, only two SNPs (rs11135482 and rs11135484) located in chromosome 5 were cis-eQTL. And both of them were significantly associated with a reduced expression of the gene ERAP2 (P = 5.1 × 10 −32 for both).
In order to explore potential functions of SNPs, we performed pathway analysis using gene ontology (GO) enrichment analysis. As presented in Table 2, 17 biological process pathways were listed. Most of these pathways, including natural killer cell activation, lymphocyte activation, leukocyte activation etc, were belonging to the immunology. The most significant pathway was the Tob1 pathway.

Discussion
GWAS has become a powerful tool for genetic scientists in the past 10 years. The strategy of GWAS is to uncover the SNPs which occur differently in people with or without a particular disease like cancer, Alzheimer disease, obesity, etc. Using this solution, we have identified several potential risk genomic regions of NOA in our previous studies 6,7 . Looking back, however, there is a growing cognition that GWAS approach has its limitations. One of the biggest problems before was whether to choose rare variants of large effect or common variants of small effect, but it is no longer a problem since the development of genotyping technology 18 . Besides that, the neglected sex chromosomes in GWAS are believed to contribute to the "missing" heritability in the etiology of complex diseases.
The X and Y chromosomes, the sex chromosomes, are special for men since the hemizygous exposure. While the abnormalities in the X chromosome are reported to be associated with a much wider range of diseases, the Y chromosome is believed to play a pivotal role in sex determination and spermatogenesis. The deletions of the AZF region in the Y chromosome long arm lead to spermatogenic impairment 19 . Given the haploid nature of the Y chromosome, it is reported to be the major reason for the exclusion in GWAS 20 , and the analysis of Y-hgs that  Table 1. Suggestively significant interactions between Y-hg O3e* and SNPs in control and case groups. OR for odds ratio, P for P value and R 2 for goodness of fit of logistic regression models. a Only those suggestively significant SNPs with a P inter < 1 × 10 −4 were shown. b MAF for minor allele frequency in Chinese in 1000 Genomes, c MAF in this study controls.
defined by a series of SNPs has been recognized as a more appropriate strategy in the association study 21 . Our efforts using this strategy have proved that some Y-hgs, such as Y-hg K, Q1, are potential risk factors for male infertility 22,23 . And there is a great tendency to survey the contribution of sex chromosomes to complex traits for filling the blank space of GWAS 20 . So in this study, we hypothesized that individuals in predisposed Y-hgs may carry some autosomal variants, which might be a potential genetic modifier for the Y-hg specific susceptibility to spermatogenic impairment. We first examined whether some Y-hgs are risk or protective factors for spermatogenic impairment. Our results demonstrated that Y-hg O3e* was significantly associated with NOA. Compared with the reference Y-hg O3*, Y-hg  O3e* was protective against the prevalence of spermatogenic impairment. The diversity of haplogroups is the result of genetic drift, natural selection or stochastic processes, and the beneficial effect on NOA cannot be occasional. Although the machanisms by which haplogroups exert their functions are unclear, emerging evidence has been reported to connect different haplogroups with variations of phenotypes, including high or reduced sperm motility 24 .
Next, to find out the autosomal variants with which Y-hg O3e* might interact, we combined data from Y-hgs with those from our previous GWA study. The pseudo-R 2 , which was a measure of goodness of fit of the statistical model, was also shown. Although pseudo-R 2 values were somewhat low here, they were still helpful in the model building state as a statistic to evaluate competing models and might be quoteworthy for other researchers on the male infertility. Besides that, we found that rs11135484 was suggestive of an interaction with Y-hg O3e* on NOA. Rs11135484 locates in an intron of ERAP2, which was found in the cytoplasm/endoplasmic reticulum (ER) and the plasma membrane. ERAP2 is a proteolytic enzyme site in the ER where it plays a central role in antigen processing and presentation 25 , and it is an attractive candidate involved in immune responses and inflammation 26 . It has been reported that certain SNPs located in ERAP2 can affect its nonsense-mediated RNA decay and protein expression 25,27 . Rs11135484 as well as rs11135482 are proved eQTL of ERAP2 which means they can regulate the expression of ERAP2. Moreover, the Encode project also points out that rs11135484 located in a region marked by enhancer markers, and it may change transcription factors binding affinity resulting in the different expression of ERAP2. In addition, a possible effect of ERAP2 deficiency could be an alteration in a quantitative reduction of MHC levels and thus influence the homeostasis of reproductive function 25 .
Pathway enrichment analysis showed that some processes like "natural killer cell activation" and "antigen processing and presentation of peptide antigen via MHC class I" shared common genes involved in immunology. For all we know, immunology is an important biological process that deals with the response of body to exo-or endo-disturbance, and many studies have linked it to the reproductive problems such as infertility, failed in vivo fertilization, spontaneous abortions etc 28,29 . Y-hg O3e* might regulate these immunological genes to keep the body in a steady state from adverse impacts and maintain a normal reproductive capacity.
In summary, we combined Y-hgs with GWAS to investigate potential interactions between them on NOA, and observed that Y-hg O3e* may modify the risk of some SNPs. These results suggest that both the Y chromosome and autosomes may jointly contribute to the reproductive outcomes. We cannot always divide them into two distinct aspects, and the combination may shed new light on the pathogenesis of male infertility.

Materials and Methods
Study population. This study was approved by the ethics review board of Nanjing Medical University, and all experimental protocol were in accordance with guidelines approved by the Institutional Review Board for Human Studies of Nanjing Medical University. Totally, 1,000 NOA cases and 1,703 male controls included in this study were reported in our previous genome-wide association study 6 , and a written informed consent was obtained from each subject. Briefly, all infertile cases were genetically unrelated Han Chinese men diagnosed to have idiopathic male infertility without a history of cryptorchidism, vascular trauma, orchitis, obstruction of the vas deferens, vasectomy, chromosome abnormalities or Y chromosome microdeletions of azoospermia factor (AZF) region. Semen analysis was performed following World Health Organization (WHO) criteria (2010) 30 . To ensure the accuracy of the diagnosis, each sample was examined twice. And the absence of spermatozoa in both replicate samples was defined as azoospermia. Fertile control subjects who had fathered one or more healthy children without performing assisted reproductive technology (ART) were shared with the Nanjing Lung Cancer Study. Y chromosome haplotyping. Y chromosome haplogtyping was performed in all 2703 participants. Totally, 10 highly informative polymorphic loci for East Asians (M130, YAP, M89, M231, M175, M119, M268, M122, M134 and M117, which defined Y-hgs C, DE, F*, N*, O*, O1, O2*, O3*, O3e*, O3e1) were identified following the nomenclature recommended by the Y Chromosome Consortium (YCC) and its updates [31][32][33] . The experimental procedures, mainly involving multiplex PCR amplification, restriction fragment length polymorphism (RFLP) and capillary electrophoresis, were described previously 14 .
Genotyping and quality control in GWAS. The GWAS was conducted using an Affymetrix Genome-Wide Human SNP Array 6.0 followed by a quality control procedure as described previously 6 . In brief, SNPs were excluded if they: did not map to autosomes, had a call rate of < 95%, had a minor allele frequency < 0.05 or had a genotype distribution in the controls that deviated from that expected with Hardy-Weinberg equilibrium (P < 1 × 10 −5 ). Individuals with overall genotype completion rates < 95%, gender discrepancies, unexpected duplicates or probable relatives, heterozygosity rates > 6 s.d. away from the mean or outliers in the principal component analysis were also excluded. After quality control processing, a total of 957 NOA cases and 1634 healthy controls with 587347 SNPs were included in the subsequent interaction analysis.
Statistical analyses. Statistical analyses were performed using R software (version 3.1.2; The R Foundation for Statistical Computing). Firstly, distributions of Y-hg among cases and controls were assessed, and the statistically significant Y-hg was further combined with SNPs. For the analysis of the Y-hg × SNP interaction, we firstly defined the Y-hg variable as Y-hg O3e* (1) or non-Y-hg O3e* (0). And SNPs were coded as continuous variables (0, 1 and 2) under an additive genetic model. Then we tested the interaction between each pair of Y-hg and SNP by conducting a 1-degree-of-freedom Wald test of a single interaction term as implemented in an unconditional logistic regression based on the equation Y = β 0 + β 1 × Y-hg + β 2 × SNP + β 3 × (Y-hg × SNP). Here, Y is the logit of the probability of being an infertile case, β 0 is a constant, β 1 and β 2 are the main effects of Y-hg and SNP, respectively, and β 3 is the interaction parameter to be tested. P value ≤ 1 × 10 −8 was regarded as statistically significant in interaction analyses considering the issue of multiple comparison, and P value ≤ 1 × 10 −4 was suggestive of an interaction.