INTRODUCTION

Male factors account for nearly half of infertility cases (defined by the World Health Organization [WHO] as failure to conceive within 1 year of unprotected sexual intercourse) with nearly 7% of the male population estimated to suffer from infertility.1 The large number of genes (>400) that cause male infertility when knocked out in mice suggests that hundreds of genes can result in monogenic forms of male infertility in humans.2 However, current estimates of the contribution of genetics to male infertility (15%) suffer from the lack of comprehensive genomic analysis in large cohorts. Indeed, the current practice of screening for numerical and structural chromosomal aberrations including deletions of the AZF regions on the Y chromosome as the standard approach to male infertility (with or without CFTR variant analysis depending on the study population) has a diagnostic yield of only 4%.3 Failure to incorporate single-gene disorders in the diagnostic workup, despite recent advances in sequencing technology with its increasing throughput and declining cost, may also reflect a lack of standardization in determining which genes are relevant to this clinical entity. This represents a missed opportunity from a clinical standpoint since defining monogenic forms of male infertility enables accurate counseling and prevention through carrier screening and potentially contributes to personalized management in the future.

Oud and colleagues have recently reviewed the literature and identified 1337 publications on monogenic forms of male infertility.4 They highlighted 78 genes with at least a moderate level of evidence of their connection to male infertility. They noted the growth in the number of genes confidently related to male infertility at around three genes/year, a rate much slower than many other heterogeneous conditions, e.g., intellectual disability, in the era of genomics. Furthermore, this limited repertoire of genes spans pretesticular, testicular, and post-testicular forms of male infertility such that genes related specifically to oligo-, azo-, and teratozoospermia number only 20 (ref. 4). Exome sequencing (ES) is a very powerful genomic test with a proven track record in unraveling the underlying variants of established genetic diseases as well as the genetic contribution to phenotypes that may or may not be genetic in etiology. Previous genomic studies of male infertility employed targeted sequencing of a limited number of genes, or performed ES on a very limited number of highly selected families.5,6,7 Thus, it remains unknown what the yield of ES is in the setting of male infertility. In this study, we performed ES on a relatively large (285 patients) unselected cohort of patients with male infertility (severe oligospermia and nonobstructive azoospermia). Our study showed the power of this approach to establish a monogenic cause in these patients through the identification of variants in known genes or in previously reported candidate genes thus supporting their proposed link to male infertility. Surprisingly, this unbiased approach also allowed us to identify male infertility as the major and sometimes only phenotypic expression of genes previously linked to multisystem disorders. Furthermore, we increased the number of plausible candidates in male infertility by adding 33 novel candidate genes.

MATERIALS AND METHODS

Human subjects

We included in this study only patients who presented to our center solely for evaluation of infertility and were subsequently found to have nonobstructive azoospermia or severe oligospermia (less than 1 million sperm/ml). Full andrological assessment included semen analysis on two independent samples, hormone profile, scrotal ultrasound, and chromosome analysis. As a control set, we also analyzed ES data on an identical number (n = 285) of ethnically matched fertile adult men who were exome sequenced as part of a trio ES approach on children with suspected Mendelian disorders and healthy parents as described previously.8

Ethics statements

Informed consent was obtained from all participants and the study was approved by the local institutional review board (King Faisal Specialist Hospital and Research Centre [KFSHRC] RAC 2180007 and 2121053).

Chromosome analysis

At an average 450-band resolution, standard chromosome studies were performed on phytohemagglutinin (PHA) stimulated peripheral blood mononuclear cells. Additional fluorescence in situ hybridization studies using centromere or locus specific probes were performed as clinically indicated by the cytogenetic finding.

Y-chromosome microdeletion (YCM) analysis

We utilized the commercial Elucigene Male Factor Infertility kit (catalog code AZFXYB1, Elucigene Diagnostics, Manchester, UK) to screen 275 patients in our cohort for Y-chromosome microdeletions (YCM). The kit involves multiplex quantitative fluorescence–polymerase chain reaction (MQF-PCR) assays used for the rapid determination of sex chromosome aneuploidy combined with the detection and characterization of the three most common classes of YCM associated with male infertility (AZFa, AZFb, and AZFc). Specifically, the assay includes sex chromosome STR markers XHPRT and DXYS218 and non-STR markers AMEL, TAF9L (Xq21.1) and SRY markers for the determination of sex chromosome aneuploidy (SCA) status. The assay also includes the Y specific markers AZFa (sY84, sY86), AZFb (sY127, sY134), and AZFc (sY254 and sY255) for the detection of YCM. Fluorescently labeled primers specific for sequence flanking these markers amplify the wild type sequence while loss of the wild type diagnostic peak is indicative of a microdeletion covering this STS marker. Amplified products of the QF-PCR technique were analyzed and quantitatively assessed on a capillary electrophoresis Genetic Analyzer 3100 XL, to determine both the copy number of the analyzed STR markers and the deletion status of the Y-chromosomal STS markers.
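The marker logic described above can be illustrated with a minimal sketch. It takes the STS markers named in the text and reports, per AZF region, whether the wild-type peaks were seen. The function name, the status strings, and the rule that a deletion is suspected only when both markers of a region are absent are assumptions made for illustration; they are not taken from the kit documentation or from our analysis pipeline.

```python
from typing import Dict, Iterable

# STS markers per AZF region, as listed in the text above.
AZF_MARKERS = {
    "AZFa": ("sY84", "sY86"),
    "AZFb": ("sY127", "sY134"),
    "AZFc": ("sY254", "sY255"),
}


def call_azf_regions(peaks_detected: Iterable[str]) -> Dict[str, str]:
    """Summarize AZF status from the set of markers whose peak was detected.

    Assumption (illustrative only): a region is flagged as a suspected
    deletion when both of its markers are absent, and as inconclusive
    when only one of the two is absent.
    """
    detected = set(peaks_detected)
    calls = {}
    for region, markers in AZF_MARKERS.items():
        missing = [m for m in markers if m not in detected]
        if not missing:
            calls[region] = "no deletion detected"
        elif len(missing) == len(markers):
            calls[region] = "deletion suspected"
        else:
            calls[region] = "inconclusive"
    return calls


if __name__ == "__main__":
    # Hypothetical sample in which both AZFc peaks are lost.
    print(call_azf_regions(["sY84", "sY86", "sY127", "sY134"]))
```

In practice a lost peak would also need to be distinguished from assay failure, which this sketch does not attempt.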

ES, variant classification, and candidate gene discovery

ES was performed on a clinical basis essentially as described previously.8 Briefly, sequencing libraries were prepared, enriched for the desired target using the Illumina Exome Enrichment protocol, and sequenced using an Illumina HiSeq 2000 Sequencer. The reads were aligned against University of California–Santa Cruz (UCSC) hg19 by Burrows–Wheeler Aligner (BWA). When analyzing ES, we were blinded to the results of chromosomal analysis. We prioritized variants in genes that are listed in OMIM with a male infertility phenotype, followed by genes with published variants in male infertility that are not yet listed in OMIM. When no interesting variants were identified in these genes, we searched for variants in genes with compatible animal models or other biological links to male infertility, as well as genes that are recurrently mutated in our cohort. We only considered variants that achieve or approach the cutoff values of “pathogenic” or “likely pathogenic” according to the American College of Medical Genetics and Genomics (ACMG) guidelines when the gene has an established link to disease (the guidelines do not apply to candidate genes).9 ACMG guidelines for variant classification were applied, and both general (gnomAD) and local (an internal database of 2379 ethnically matched exomes) variant databases were used to filter out common variants. In addition, we performed case–control analysis as recommended by ClinGen10 to assess the evidence of novel gene candidacy by performing an identical analysis on 285 fertile ethnically matched adult men.
Similarly, previously reported candidate genes in male infertility were also classified based on ClinGen guidelines.10 Briefly, for previously reported candidate genes, we classified the evidence supporting the respective gene–disease association into likely strong (>2 independent studies with multiple pathogenic variants in unrelated probands and supporting experimental data), moderate (>1 independent study with several unrelated probands with pathogenic variants, supporting experimental data and no contradictory evidence), and limited (>1 independent study with <3 unrelated probands with pathogenic variants and multiple variants reported in unrelated probands but without sufficient evidence of pathogenicity). For novel candidate genes identified in this study, we classified them based on case–control data (single-variant analysis: 3 points or aggregate variant analysis: 6 points), segregation evidence, and experimental evidence (animal model: 4 points, testicular expression: 2 points and interaction with known male infertility genes: 2 points) as strong (12–18 points), moderate (7–11 points), and limited (1–6 points). Only variants that were validated by Sanger sequencing are reported.
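The point scheme for novel candidate genes can be sketched as a short function. This is an illustrative sketch, not the scoring code used in the study. The point values and the tier boundaries come from the text above; the class and function names are hypothetical, the case–control points are treated as mutually exclusive (reading "or" literally), and the segregation points are left as a caller-supplied number because the text does not state them.

```python
from dataclasses import dataclass


@dataclass
class CandidateEvidence:
    """Evidence flags for one novel candidate gene (hypothetical container)."""
    single_variant_case_control: bool = False   # 3 points
    aggregate_case_control: bool = False        # 6 points
    animal_model: bool = False                  # 4 points
    testicular_expression: bool = False         # 2 points
    interacts_with_known_gene: bool = False     # 2 points
    segregation_points: int = 0                 # value not given in the text (assumption)


def total_points(ev: CandidateEvidence) -> int:
    # Case-control evidence: aggregate analysis (6) or single-variant analysis (3), not both.
    if ev.aggregate_case_control:
        case_control = 6
    elif ev.single_variant_case_control:
        case_control = 3
    else:
        case_control = 0
    experimental = (
        4 * ev.animal_model
        + 2 * ev.testicular_expression
        + 2 * ev.interacts_with_known_gene
    )
    return case_control + ev.segregation_points + experimental


def evidence_tier(points: int) -> str:
    # Tiers as stated: strong 12-18, moderate 7-11, limited 1-6.
    if points >= 12:
        return "strong"
    if points >= 7:
        return "moderate"
    if points >= 1:
        return "limited"
    return "no evidence"


if __name__ == "__main__":
    # Hypothetical gene: single-variant case-control support plus a compatible animal model.
    ev = CandidateEvidence(single_variant_case_control=True, animal_model=True)
    print(total_points(ev), evidence_tier(total_points(ev)))
```

Keeping the scoring and the tier lookup separate makes it easy to adjust either one if the point values are revised.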

RESULTS

Cohort characteristics

We recruited a total of 285 male patients who presented to our center with either nonobstructive azoospermia (n = 237) or severe oligospermia (n = 48). One hundred and twenty-seven cases were familial, and the age range was 24–63 years (average 37.4). Table S1 summarizes the detailed clinical workup for the cases with potential molecular findings (see below). The clinical workup included detailed hormonal analysis (estradiol, follicle-stimulating hormone [FSH], luteinizing hormone [LH], prolactin, testosterone, and thyroid-stimulating hormone [TSH]), semen analysis, histopathological analysis of testicular biopsies, and testicular ultrasound imaging.

Chromosomal aberrations in male infertility

Klinefelter syndrome was the most common sex chromosome aneuploidy (18 of 19 SCA patients). Inversions and translocations were encountered in four patients while YCM were observed in seven cases (Table 1). Thus, the overall yield of the “traditional” approach using karyotype and YCM analysis in our cohort is 10.5% (n = 30) (Table 1 and Fig. S1). Patient 19DG1699 had a low level XYY mosaicism in blood (1% of nuclei and 2/30 metaphase cells). However, he was counted among those with potential chromosomal causes of male infertility because a higher level of mosaicism in the gonads could not be excluded. In addition, patient 19DG1706 with a translocation (46,XY,t[14;19][q11.2;q13.3]) is the only patient who was counted among both the chromosomal and single-gene groups because he also harbored a homozygous likely loss-of-function variant in SIRPG (NM_001039508.1:c.407delC;p.[Pro136Glnfs*29]), which was an attractive candidate gene for male infertility (see below).

Table 1 List of chromosomal aberrations identified in this study.

The landscape of point variants in male infertility

The control set (selected from our database of trio ES performed for various Mendelian indications on children born to healthy parents) was analyzed using the same pipeline that was used to identify candidate variants in the male infertility cohort. Tables S2 and S3 list the details of our analysis of the control and patient cohorts, respectively. Variants that may impact the function of genes with potential involvement in male infertility were detected in 24.2% (Table S3 and Fig. S1) of our patient cohort as follows:

1. Variants in genes with established links to male infertility (15 patients; 5.2%) (Table 2):

Table 2 Variants in OMIM disease genes with established links to male infertility.

We identified candidate variants in the following genes that are listed in OMIM under the phenotype Spermatogenic failure (class A): CFAP44, NANOS1, SLC26A8, SYCP2, TEX11, TTC21A, USP9Y, and ZMYND15. We also identified candidate variants in genes for which the listed OMIM phenotype is not spermatogenic failure although male infertility can be part of the phenotypic spectrum (class A*). In contrast to the patient with CFTR-related cystic fibrosis, we found no or limited systemic manifestations in the patients with candidate variants in genes linked to primary ciliary dyskinesia (PCD) (DNAAF2, DNAH1, DNAH5, and HYDIN) and Fanconi anemia (XRCC2), although we note that the latter patient did have short stature and hypoplastic kidneys.

2. Variants in established disease genes with no link to male infertility in humans (class B) (9 patients; 3.1%) (Table 3):

Table 3 Variants in OMIM disease genes with no established links to male infertility.

We identified a candidate variant in HFM1, variants of which have only been described in female infertility in humans (premature ovarian failure 9, MIM 615724). However, Hfm1−/− mice display sterility in both males and females, and severely defective crossing over has been demonstrated in Hfm1−/− spermatocytes.11 Similarly, we identified homozygous candidate variants in SGOL2 (SGO2) and SPIDR, two genes with no OMIM disease entries, each of which was found mutated in a single family with primary ovarian insufficiency.12,13 On the other hand, we note that the OMIM listing of TDP1-related spinocerebellar ataxia is based on a single missense variant reported in 2002 (ref. 14) with no follow-up confirmatory report, whereas our cohort revealed two homozygous candidate variants in TDP1, including a severely truncating variant (NM_001008744.2:c.910C>T;p.[Arg304*]), in two patients with male infertility but no evidence of any neurological involvement. Although male infertility was not described in the mouse model, we note that the established role of TDP1 in DNA damage repair and its binding to meiotic chromatin support our novel observation of biallelic TDP1 variants in male infertility. Similarly, the OMIM listing of CEP250-related retinal dystrophy with hearing loss is based on a few families with missense or compound missense and truncating variants, and the one family with a homozygous nonsense variant was also homozygous for a nonsense variant in an established retinal dystrophy-related gene C2orf71.15 In our cohort, however, we identified four patients with four different homozygous missense or truncating variants with no evidence of retinal dystrophy or hearing loss, which is consistent with the male infertility phenotype observed in Cep250−/− mice (https://www.mousephenotype.org/data/genes/MGI:108084). Accordingly, and following the ClinGen guidelines, we conclude that the TDP1-spinocerebellar ataxia and CEP250-retinal dystrophy associations are “disputed” by our findings (Table S3).

3. Variants in previously reported candidate genes in male infertility (class C) (8 patients; 2.8%) (Table 4):

Table 4 Variants in previously reported male infertility candidate genes.

We identified variants that appear to support the candidacy of previously reported genes with only tentative connection to male infertility. These include three variants in CCDC155 and one variant each in DNAH6, DZIP1, FAM47C, SPAG17, and TDRD6. According to the ClinGen scoring, our study changes their supporting evidence from limited to moderate (Table S3).

4. Variants in novel candidate genes for male infertility (class D) (37 patients; 12.9%) (Table 5):

Table 5 Variants in novel candidate genes for male infertility identified in this study.

We identified 36 candidate variants in 33 genes not previously connected to male infertility in humans. These can be arranged in three groups based on the ClinGen scoring system: group 1 (strong evidence, class D1), group 2 (moderate evidence, class D2) and group 3 (limited evidence, class D3) as follows:

Group 1 (strong evidence, class D1): These are genes with two or more independent variants or a single founder variant in two or more apparently unrelated individuals and are supported by compatible animal models. These include:

  • TERB1 (CCDC79): a single homozygous variant (NM_001136505.2:c.733G>A;p.[Gly245Arg]) was identified in two apparently unrelated patients in our cohort who shared the same haplotype thus confirming its founder nature. This gene is crucial for meiosis and its disruption in mice causes infertility in both sexes.16

  • PIWIL2: two novel homozygous candidate variants were identified in our cohort (one patient each). PIWIL2 encodes a testis-specific member of the Argonaute family of proteins, which have established roles in germline cell development and maintenance. The male knockout mice are sterile with completely blocked spermatogenesis at the early prophase of the first meiosis (between zygotene and early pachytene).17

  • ZSWIM7: two independent homozygous likely loss-of-function variants were identified in our cohort (one patient each). The encoded protein regulates homologous recombination (HR) by promoting the assembly of RAD51 and DMC1 on early meiotic HR intermediates, and its knockout results in infertility with small testes that have reduced cellularity and lack postmeiotic germ cells, and arrested spermatocytes during midpachynema.18

Group 2 (moderate evidence, class D2): These are genes with typically a single mutational event in a single patient and compelling animal models:

  • AKAP9: Knockout mice exhibit male infertility with azoospermia and abnormal spermatogenesis and Sertoli cell maturation.19

  • ASZ1: The encoded protein colocalizes with PIWIL2 (see above) and the knockout mouse is infertile.20

  • CCDC146: The encoded protein is a component of the centrosome in sperm cells as revealed by proteomic analysis, and the knockout mouse (Ccdc146em1J) is infertile.21

  • CEP131: The encoded protein has a key role in cilia formation, and knockout mice are sterile.22

  • DAZL: This is the autosomal ancestor of the Y chromosome (AZFc) DAZ (deleted in azoospermia) and deletion of its mouse ortholog results in complete male sterility due to failure of maintenance and maturation of germ cells.23

  • DDX25: This gene has testis-limited expression, specifically in Leydig and germ cells, and null mice exhibit spermiogenesis arrest and male sterility.24

  • ELMO1: The encoded protein plays a crucial role in the phagocytic clearance of apoptotic germ cells in the seminiferous epithelium by Sertoli cells, and Elmo1−/− testes have grossly abnormal seminiferous tubules with reduced sperm production.25

  • ESR2: This is a critical gene of estrogen signaling, and ablation of murine ESR2 causes sterility in both sexes.26

  • HIPK4: The encoded protein regulates spermiogenesis and the knockout mouse is infertile.27

  • HORMAD1: The encoded protein regulates meiotic synapsis and recombination and the knockout mouse is infertile.28

  • MAGEE2: Although two novel hemizygous likely loss-of-function (stopgain) variants were identified in this gene, ClinGen classification only assigns it a “moderate” level of evidence due to lack of suggestive biology. However, we note that a paralog (MAGEF1) has testis-specific expression and is upregulated during the pachytene stage of spermatocyte development.29 Furthermore, although gnomAD lists a probability of loss of function intolerance (pLI) of 0, we note that all hemizygous loss-of-function variants in gnomAD are flagged for quality, and no loss-of-function variants were identified in our control data sets.

  • MMRN1: Male knockout mice (Mmrn1tm1b(EUCOMM)Hmgu) are infertile (https://www.mousephenotype.org/data/genes/MGI:1918195).

  • ODF4: The encoded protein is a component of the outer dense fiber of mature sperm tail and its knockout results in male infertility in mice (https://www.mousephenotype.org/data/genes/MGI:2182079).

  • PGK2: Male knockout mice have severely impaired fertility.30

  • ROS1: ROS1 signaling has been found to be essential for epithelial differentiation in the epididymis such that knockout male mice have arrested sperm maturation and complete sterility.31

  • SPATA3: Expression of this gene is specifically enriched in spermatids and not detected in other testicular cell types, and it has been shown to interact with KLHL10, which when inactivated causes spermiogenesis arrest and male infertility in mouse and humans.32

  • STRA8: The encoded protein regulates meiotic initiation such that male knockout mice lack evidence of meiotic chromosome cohesion, synapsis and recombination.33

  • TBCCD1: The encoded centrosomal protein localizes to the basal bodies of primary and motile cilia, and its depletion impairs the ciliogenesis potential.34 Knockout mice are infertile (Tbccd1tm1b(EUCOMM)Hmgu).

  • TTLL9: This tubulin polyglutamylase is essential for the heterogeneity of the structure of the outer doublet microtubules in ciliary and flagellar axonemes, and its deficiency causes doublet loss and reduced polyglutamylation of sperm axoneme with abnormal motility and resulting male infertility.35

Group 3 (limited evidence, class D3): These are genes with a single mutational event in a single patient and only a tentative/indirect biological link to male infertility. These include CST1, DMRTA2, DNAH7, MOSPD2, NLGN4Y, PPP1R36, RIOK2, SIRPG, STAG2, TCEANC, and ZNF541 (Table S4). Of note, two additional genes were originally considered, CEP350 and DDX53; however, the finding of high quality homozygous likely loss-of-function variants among fertile men (Table S2) made us exclude these as plausible candidates. While case 19DG1872 with the DDX53 variant remained negative after reanalysis, case 19DG1640 with the CEP350 variant was reanalyzed and found to have the abovementioned variant in ASZ1 (see “Moderate evidence, class D2” above).
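The yields quoted in this section can be checked with a few lines of arithmetic. The sketch below uses only the patient counts given in the text (285 patients; 30 with chromosomal findings; 15, 9, 8, and 37 in classes A to D). The helper name and the choice to truncate rather than round to one decimal place are assumptions: truncation is used because it matches the percentages as printed (for example, 15/285 is 5.26%, reported as 5.2%).

```python
import math

COHORT_SIZE = 285

# Patient counts per class as stated in the text (class A includes A*).
CLASS_COUNTS = {"A": 15, "B": 9, "C": 8, "D": 37}

# Patients with chromosomal aberrations or YCM (karyotype plus YCM analysis).
CHROMOSOMAL_COUNT = 30


def truncated_percent(count: int, total: int = COHORT_SIZE) -> float:
    """Percentage of the cohort, truncated (not rounded) to one decimal place."""
    return math.floor(count * 1000 / total) / 10


if __name__ == "__main__":
    for label, count in CLASS_COUNTS.items():
        print(f"class {label}: {count} patients, {truncated_percent(count)}%")
    point_variant_total = sum(CLASS_COUNTS.values())
    print(f"point variants: {point_variant_total} patients, "
          f"{truncated_percent(point_variant_total)}%")
    # One patient (19DG1706) is counted in both groups, so the chromosomal
    # and point-variant counts should not simply be added together.
    print(f"chromosomal: {CHROMOSOMAL_COUNT} patients, "
          f"{truncated_percent(CHROMOSOMAL_COUNT)}%")
```

Summing the four classes gives 69 patients, which corresponds to the 24.2% reported for point variants.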

DISCUSSION

Despite the relatively common occurrence of male infertility and the multiple lines of evidence supporting a strong genetic component, the workup of infertile male patients has seen little change in the contemporary era of genomic medicine. This may seem surprising given the large number of genetic studies in male infertility.36 However, these studies were largely focused on the identification of individual novel genes, typically one per study, rather than comprehensive genomic studies on a large number of patients. Much has been learned from genomic studies on similarly common and heterogeneous disorders such as intellectual disability, where unbiased analysis of large cohorts, each comprising hundreds of patients, has not only unraveled a large fraction of the disease genetics but also prompted the adoption of this genomics approach clinically.8 This study is an attempt to bring these benefits to the field of male infertility.

Our results show that chromosomal aberrations, including Y-chromosome microdeletions, only account for a small fraction of male infertility, which suggests that the current practice potentially deprives many patients of the benefits of a molecular diagnosis. As with many other clinical entities, molecular classification enables the study of the natural history and “reverse phenotyping” of specific subtypes of male infertility, an important step toward the development of personalized management and targeted therapies in the future. For example, the finding of specific defects in the sperm head and tail suggests that these defects are amenable to treatment using intracytoplasmic sperm injection (ICSI) following surgical sperm retrieval.

Despite the relatively small number of previously published male infertility genes, our results suggest that these account for a nearly identical percentage of patients compared with chromosomal aberrations. The percentage is even higher when one also considers genes that cause systemic disorders in which male infertility may or may not have been described as part of the phenotype. This latter group of disorders deserves special mention because they epitomize the phenomenon of “reverse phenotyping” and the power of unbiased genomics approaches to circumvent historical clinical bias in disease definition. For example, PCDs have historically been ascertained and studied based on their lung pathology or laterality defects rather than male infertility despite the latter being a well-established phenotypic component.37 By specifically interrogating a male infertility disease cohort, we show that nonsyndromic, i.e., isolated, male infertility can be the sole presentation of PCD in humans. We acknowledge that dynein variants are predicted to cause asthenozoospermia (morphologically defective sperm flagella) rather than oligospermia/azoospermia. However, a previous study has linked DNAH6 to nonobstructive azoospermia without any signs of PCD thus establishing a precedent in support of our finding.38 More surprisingly, we show that male infertility is a phenotypic component of a number of syndromes, e.g., HFM1-related male infertility.
Similarly, we have previously proposed a novel Fanconi anemia group based on a young child with the same XRCC2 variant we identified in this cohort in a male with hypoplastic kidneys and short stature but no evidence of other skeletal findings, anemia, or microcephaly.39 Interestingly, a recent report also described a missense variant in XRCC2 in a family with meiotic arrest, azoospermia, and premature ovarian failure, but who lack the typical features of Fanconi anemia.40 The value of this phenotypic expansion in future case identification cannot be overemphasized.

We acknowledge that the novel genes identified in this study have variable levels of evidence supporting their candidacy. To increase the objectivity of judging the level of evidence we employed the ClinGen classification that takes into account both variant- and gene-level factors, including the quality of case–control analysis. Our analysis of the fertile male cohort, using the same pipeline that was used to identify candidate causes of infertility in the study cohort, was very helpful in this regard and supported the specificity of our findings. Specifically, the only two genes (CEP350 and DDX53) with high quality homozygous loss-of-function variants in the fertile male cohort were genes with only a “limited” level of support and we excluded them accordingly. On the other hand, the single homozygous missense variants identified among fertile men in each of the previously published genes CCDC155, DNAH6, and USP9Y do not sufficiently challenge the previous reports, and these genes were thus retained.

The large number of genes (>400) that cause male infertility in mouse models clearly suggests that many disease genes for male infertility have yet to be identified in human patients. Thus, it should not be surprising that this single study uncovered 33 novel candidate genes. Although only a fraction of these genes met the “strong” designation of ClinGen for disease–gene association, we note that all the novel candidate genes were carefully selected on the basis of suggestive biology and, especially in the case of compatible mouse models, these can be readily verified, refined, and added to through the study of future cohorts of males with infertility using unbiased genomic approaches rather than a predefined set of genes.

The novel candidate genes we highlight encode a biologically diverse group of proteins, which is consistent with the known highly complex nature of germ cell development in males. The two phases of spermatogenesis and spermiogenesis encompass numerous developmental steps that require a large number of molecules, many of which are germ cell–specific.36 The demonstration of a male infertility phenotype as a consequence of mutation of specific genes is the ultimate proof of the indispensability of these genes to male fertility in humans because animal models only partly represent human physiology.4 Apart from the diagnostic value of novel disease gene discovery, it should be noted that deeper understanding of these genes can have other translational benefits including the development of male contraceptives. For example, our finding of male infertility secondary to naturally occurring “knockout” events in genes encoding certain sperm plasma membrane antigens may be viewed as an approach to novel drug target discovery and validation.

In conclusion, we show that monogenic causes of male infertility are at least as common as the currently screened-for causes. Our results support previously reported candidate genes, highlight male infertility as a phenotypic expression of genes linked to other disorders in humans, and expand the number of candidate genes that should be investigated in future studies. Defining these monogenic forms of male infertility will greatly expand our understanding of the molecular mechanisms that control male germ cell development and inform future therapies. Expanding the use of genomic medicine in urology could lead to personalized diagnosis and improved management of infertile males.