Diverse monogenic subforms of human spermatogenic failure

Non-obstructive azoospermia (NOA) is the most severe form of male infertility and typically incurable. Defining the genetic basis of NOA has proven challenging, and the most advanced classification of NOA subforms is not based on genetics, but simple description of testis histology. In this study, we exome-sequenced over 1000 clinically diagnosed NOA cases and identified a plausible recessive Mendelian cause in 20%. We find further support for 21 genes in a 2-stage burden test with 2072 cases and 11,587 fertile controls. The disrupted genes are primarily on the autosomes, enriched for undescribed human “knockouts”, and, for the most part, have yet to be linked to a Mendelian trait. Integration with single-cell RNA sequencing data shows that azoospermia genes can be grouped into molecular subforms with synchronized expression patterns, and analogs of these subforms exist in mice. This analysis framework identifies groups of genes with known roles in spermatogenesis but also reveals unrecognized subforms, such as a set of genes expressed across mitotic divisions of differentiating spermatogonia. Our findings highlight NOA as an understudied Mendelian disorder and provide a conceptual structure for organizing the complex genetics of male infertility, which may provide a rational basis for disease classification.

Abstract: Non-obstructive azoospermia (NOA) is the most severe form of male infertility and typically incurable with current medicine. Due to the biological complexity of sperm production, defining the genetic basis of NOA has proven challenging, and to date, the most advanced classification of NOA subforms is based on simple description of testis histology. In this study, we exome-sequenced over 1,000 clinically diagnosed NOA cases and identified a plausible recessive Mendelian cause in 20%. Population-based testing against fertile controls identified 27 genes as significantly associated with azoospermia. The disrupted genes are primarily on the autosomes, enriched for undescribed human "knockouts", and, for the most part, have yet to be linked to a Mendelian trait. Integration with single-cell RNA sequencing of adult testes shows that, rather than affecting a single cell type or pathway, azoospermia genes can be grouped into molecular subforms with highly synchronized expression patterns, and analogs of these subforms exist in mice. This analysis framework identifies groups of genes with known roles in spermatogenesis but also reveals unrecognized subforms, such as a set of genes expressed specifically in mitotic divisions of type B spermatogonia. Our findings highlight NOA as an understudied Mendelian disorder and provide a conceptual structure for organizing the complex genetics of male infertility, which may serve as a basis for disease classification more advanced than histology.

Introduction
Non-obstructive azoospermia, or lack of sperm in the ejaculate due to disruption of spermatogenesis, is a multifactorial trait with a prevalence of 0.4%-2% in the male population [1][2][3] and has an incidence of around 10% in cohorts of infertile men 4 . Mounting evidence suggests that male infertility is broadly associated with late-onset somatic comorbidities, including cancer, cardiovascular disease, and other chronic diseases 5 . The mechanisms by which these risks are linked are generally unknown. From the perspective of patient counseling, disease prevention and management, it is thus becoming increasingly relevant to establish the genetic origins of male infertility and spermatogenic impairment. During pioneering genetic investigations over 40 years ago, large Y chromosome deletions were identified as a highly penetrant genetic cause of human azoospermia 6 . The advent of next generation sequencing has renewed the hunt for Mendelian forms of azoospermia and the number of confirmed monogenic causes has since grown to encompass at least 40 loci 7,8 . Nevertheless, our understanding of genetic causes of male infertility phenotypes is still lacking considering that the number of genes linked to the trait in mice is over 500 9 , half of which are specifically implicated in azoospermia 10 . The few genomewide studies published have shown promise of using whole-exome and whole-genome sequencing to identify the missing genetic causes of azoospermia [11][12][13][14][15] . What is needed now is a conceptual framework to organize what appears to be a very large collection of monogenic disorders that we collectively call "male infertility". We hypothesized that distinct subforms of NOA exist, currently invisible by standard histological classification, but detectable at a molecular level. In this study, we test this hypothesis through analysis of the largest collection of azoospermia cases to date, combined with a new single-cell RNA sequencing (scRNA-seq) atlas of human testis gene expression (Fig. 1). The results presented provide a broad and detailed

Variation prioritization identifies potential monogenic causes in 20% of NOA cases
Through the GEnetics of Male INfertility Initiative (GEMINI), we performed whole exome sequencing (WES) on 1,011 unrelated men diagnosed with spermatogenic failure, the vast majority with unexplained NOA. Carriers of known genetic causes of azoospermia or large structural variants were excluded using the processed sequencing dataset (Extended Data Fig. 1 16 . The prioritized variants were further filtered by taking into account the mutation type and the testicular expression patterns of the underlying genes (Methods) yielding a total of 312 variants most likely to have an impact on spermatogenesis. Experimental validation confirmed 98% (n=82/84) of SNVs and 93% (n=38/41) of INDELs tested, whereas a decreased . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 24, 2022. ; https://doi.org/10.1101/2022.07. 19.22271581 doi: medRxiv preprint rate was detected for CNVs (43%, n=3/7) as expected due to the lower accuracy of detecting structural rearrangements from exome data 17 .
Using PSAP prioritization, we identified a potential molecular cause of the disease for 19.3% (n=178/924) of the NOA cases, when considering the SNVs and INDELs only. The PSAP pipeline is a filtering tool; some variants that are completely unrelated to infertility may pass this filter. In order to estimate what fraction of prioritized variants are false positives, we also applied PSAP to an institutional control cohort of 2,665 individuals with non-reproductive disorders ( Extended Data Table 1). Only 6% of controls carried prioritized variants, indicating that the discovery rate in GEMINI is much higher than background (P=1.7x10 -27 ), but also indicating that as many as ⅓ of prioritized variants in GEMINI may be false positives.
The NOA cohort included 72 unrelated consanguineous cases that were expected to carry damaging recessive variation leading to the disease 8 . Consistently, a possible genetic cause of NOA was identified in 76.4% of the consanguineous men, significantly higher than the remainder of the cohort 14.9%; Fisher's exact test P=8.3x10 -28 ; Supplementary Discussion).
Additionally, two unrelated NOA cases from Utah were identified with uniparental isodisomy of chromosome 2 (UPD2) and chromosome 4 (UPD4), and with prioritized variation in INHBB 18 (regulation of FSH production) and POLN (involved in recombination) on affected chromosomes, respectively (Supplementary Discussion; Extended Data Fig. 4a). The overall detection rate of a candidate NOA lesion with all variation types combined reached 20%, an estimate similar to previously reported rates in genomic studies of male infertility 14,15 .
. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Population-based statistical tests associate 27 genes with azoospermia
The PSAP analysis described above is useful as a prioritization filter that is sensitive to monogenic disease mutations in n=1 cases, but it is not a replacement for association testing, which remains a gold standard for definitive identification of disease genes. To perform association testing, we assembled two independent cohorts of infertile men from MERGE (n=817 cases from Germany) and Newcastle/Radboud studies (total n=331; n=286 from Netherlands, n=45 UK), as well as a large control cohort of 11,587 fertile parents (Fig. 1, Methods). Of the 221 putative disease genes prioritized in the GEMINI cohort, 18.9% were disrupted in at least one other NOA cohort (Supplementary Table 3). Three genes were affected in all NOA cohorts: M1AP, recently published in a cross-center report as a novel cause of autosomal recessive meiotic arrest in men, and the uncharacterized genes KCTD19 and YY2 (Supplementary Table 3) 19 , 20 .
Using the combined NOA cohorts (n=2,072 cases) and 11,587 fertile controls, we performed burden testing on 34 genes with prioritized mutations in at least 2 cases (Fig. 2h). We reasoned that only testing genes with multiple prioritized variants would limit multiple testing and maximize power. To increase rigor, we restricted our analysis to only genotypes compatible with recessive disease (e.g. homozygous and hemizygous changes). Due to data use limitations we were unable to obtain data on compound heterozygous changes from controls, thus limiting some of the genes that could be tested (Methods). Burden testing identified 27 genes with significant p-values, providing strong evidence that variation in these genes is likely deleterious for human male infertility (Supplementary Table 4). Especially strong evidence was found for 21 genes, for which we identified multiple highly deleterious genotypes in cases, and none in 11,587 fertile controls (Fig 2h). Of these, only TDRD9 and RNF212 were identified as having at least "limited" clinical validity in a recent review of azoospermia research 7 .

Undescribed human "knock-outs" in azoospermia
We next identified rare human biallelic loss-of-function variants, or "knock-outs" (KO), enrichment of which has been observed for Mendelian disease, developmental disorders, and autism 21 . A high-confidence complete KO was predicted for 50/221 NOA genes identified in this study with an enrichment on X and Y chromosomes (p=0.018 and p=0.032 vs autosomes, respectively; hypergeometric test, Fig. 2b) that have historically been considered the main source of azoospermia defects. On chromosome X, 9/19 (47%) genotypes were KO, including in genes MAGEA3, MAGEB18 and MAGEC3, the members of the cancer/testis-antigen family modulating reproductive success 22 . On chromosome Y, prioritized LoF mutations were found in USP9Y, DDX3Y, TGIF2LY, and ZFY, the latter three yet to be linked to male infertility in humans.
Notably, 38.0% (n=19) of the KO genes represented the first instances of human KO observed across more than 224,000 individuals in five large human variation datasets, including gnomAD and the Human Gene Mutation Database (Methods; Supplementary Table 6). For comparison, significantly fewer cases of novel predicted KOs were observed among the 2265 institutional controls considering only autosomal genes (n=11; one-tailed binomial test, P=5.4x10 -7 ). Three genes were recurrently affected by biallelic or hemizygous loss-of-function variants, including poorly studied X-linked DCAF12L1 (Fig. 2c).
. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 24, 2022.  Fig. 5a; Supplementary Discussion). The rearranged genes were novel in the context of male infertility, with an exception of STRA8 where a homozygous splice variant was recently found in an NOA/severe oligozoospermia cohort 15 .
We identified and validated a homozygous ~6kb deletion in the gene, which was predicted as the top candidate cause of NOA in the patient (Fig. 2d-f, Extended Data Figs. 5b-d). STRA8 is considered a meiotic gatekeeper and plays a vital role in meiotic initiation upon induction by retinoic acid in both female and male germline 23 . The NOA patient with the STRA8 deletion displayed a maturation arrest phenotype where pre-pachytene spermatocytes were rarely observed and germ cells beyond the pre-pachytene stage were not detected, largely consistent with the Stra8 -/mouse model ( Fig. 2g; Extended Data Fig. 6) 23 . These data indicate that undescribed human KOs contribute to the loss of fertility in men.

Clinical validity assessment and the role of recurrent gene mutations in establishing pathogenicity
Although the aim of this study was to explore and compile an extensive catalog of recessive variation possibly leading to NOA among unrelated cases, we evaluated the clinical validity of the variants to identify the genes with the best prospects for clinical testing. Potentially diagnostic variants classified as "Pathogenic" or "Likely Pathogenic" were identified in 29 genes disrupted in 37 cases from the GEMINI cohort (Supplementary Table 5) based on the guidelines from the American College of Medical Genetics and Genomics (ACMG) 24 . These genes consisted of eight with existing human genetic evidence, whereas three appear to be implicated . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 24, 2022.
in human or mouse male infertility for the first time based on the evidence compiled here: DCAF12L1, DCAF4 and YY2 (Supplementary Table 5).
Recurrent observations are a key for classifying novel disease variation as '(Likely) Pathogenic' according to the ACMG guidelines. We observed only 25/221 (11.3%) of prioritized genes in multiple subjects, with a maximum of three recurrent hits seen for four genes (  For the majority of the NOA cases with a detected possible cause (77%), a single gene was identified, consistent with a monogenic form of the disease, whereas the remaining subjects carried recessive variation in up to five loci depending on the degree of consanguinity (Extended Data Fig. 4b). Based on these observations, and a simple model of NOA genetic architecture, we estimate the total number of monogenic azoospermia genes is around 625 (Fig 2a; Methods).
Incorporating the estimated FDR of PSAP prioritization in the calculations, the total number drops 12% to 550. The cohort sizes required for detecting multiple occurrences of at least half of all azoospermia genes are thus an order of magnitude higher among outbred cases (n~6700) than has been feasible until now (Fig. 3b), while the number of consanguineous cases required for equivalent power is much lower (n~1300) (Fig. 3b). These results highlight the necessity for large cohort studies for identifying NOA disease variation with a diagnostic value as per the ACMG guidelines.
. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 24, 2022.

NOA as an understudied Mendelian disorder
In spite of the growing number of studies on the genetics of NOA, the total number of the genes linked to the disease has remained minuscule. Consistently, we observed a low fraction (7.7%) of genes previously reported in association with male infertility in humans (Extended Data Table   2). The remainder represent novel NOA candidate genes in men, of which at least 29.9% (61/204) were known to cause sub/infertility upon disruption in mice. A protein-protein interaction (PPI) network analysis of the prioritized genes identified numerous well defined subgroups of interacting proteins with clear roles in testis function (Fig. 3c).
To expand the context to all known Mendelian trait loci, we intersected the prioritized gene list with known Mendelian disease genes in OMIM, an online database of Mendelian traits. For the fifty-one (24.9%) NOA genes in OMIM, infertility was the most frequent reported disease type (n=13), followed by developmental disorders (n=11) and cancer (n=7) (Supplementary Table 8).
Collectively these findings highlight primary testicular failure as a poorly studied Mendelian disorder. Cases of idiopathic male infertility can provide knowledge about Mendelian disease that is under-sampled in other study populations, likely due to tissue-specific functions of many NOA genes. Indeed, by analysis of expression patterns across 52 human tissues in the GTEx database 26 , we find that genes with high expression in testis are significantly under-represented among genes linked to recessive Mendelian disorders (Fisher's Exact Test p=1.0x10 -40 ; Fig. 3d).
. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 24, 2022. ; https://doi.org/10.1101/2022.07.19.22271581 doi: medRxiv preprint

Integrative analysis with testis scRNAseq reveals 'molecular subforms' of NOA
We next aimed to characterize the testicular cell types and pathways affected by prioritized NOA variants by characterizing scRNAseq data from twelve human testis samples (Methods) 27 . Testis expressed genes were grouped into 70 expression modules ("components") by soft clustering with Sparse Decomposition of Arrays (SDA), a Bayesian method for tensor decomposition (Fig.   4a, Methods). Each component is described by two matrices: a cell score matrix and a gene loadings matrix. Together they provide a quantitative summary of which genes are co-expressed across which cells. We provide an interactive browser to view the SDA components in detail online at https://conradlab.shinyapps.io/HISTA/#. These components represent sets of genes with similar expression dynamics independent of cell type boundaries and are often driven by shared biology 28 . The activity of each component can be summarized by plotting the cell scores for each component on a map of labeled types (Fig. 4b), and, in the case of germ cell components, plotting cells scores as a function of developmental stage (pseudotime, Fig. 4c). In such a way components can be identified that are specifically expressed in spermatogonia (component 35), spermatocytes (component 59) and spermatids (component 92) (Fig. 4c).
By comparing the profile of SDA genes loadings for GEMINI NOA genes to the profile observed for 1) genes essential for spermatogenesis in mice based on MGI database and literature searches (denoted 'MGI genes'; n=197) and 2) genes identified among the controls of the institutional cohort (n=131), we observed that the mouse MGI genes were more similar to NOA rather than to the control genes (R=0.54 vs R=0.087; p=3x10 -4 Pearson-Filon test, Fig. 5a).
Unlike controls, NOA and MGI genes showed a very strong, specific enrichment on SDA . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 24, 2022. ; https://doi.org/10.1101/2022.07.19.22271581 doi: medRxiv preprint components with peak expression in earlier phases of germ cell development, differentiating spermatogonia and pre-pachytene spermatocytes (Fig. 5a). Specifically, component 51 was an outlier in the absolute number of human and murine infertility genes (Fig. 5b-c) and showed strong enrichment of gene ontology (GO) categories related to meiosis. At least half of the top 50 genes in this component have already been identified as associated with male infertility in humans, mice, or both, whereas several of the genes with a high specificity towards this component have yet to be characterized in mammalian spermatogenesis, including RAD51AP2, PRAP1, C18orf63, C5orf47, and the SYCP3 paralogs FAM9B and FAM9C (Fig. 5b).
We next compared the SDA profiles in the context of the histological diagnosis of the NOA cases and observed a different distribution of gene loadings across the components for patients with MA compared to patients with SCO or "unknown" histology, the latter of which is likely a mixture of all three histologic subtypes (p=3.0x10 -3 , hypergeometric test, Fig. 5a). Furthermore, the human SDA profile for MA genes was negatively correlated with the profile of SCOassociated genes and genes from patients with unknown histology (R=-0.23) indicating distinct molecular causes for different histological subtypes. This observation of different loading patterns for genes linked to MA compared to SCO (p=2.8x10 -3 , hypergeometric test) was replicated with data from 197 mouse models of male infertility with known histological observations (Methods). SDA components were classified into "MA-enriched" and "SCO-enriched" based on the corresponding case histology of component genes. Among the MA components with multiple gene loadings, SDA149 and SDA35 were expressed in undifferentiated and differentiating spermatogonia and involved multiple known early germ cell markers, such as NANOS3, ID4 and . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 24, 2022. ; https://doi.org/10.1101/2022.07.19.22271581 doi: medRxiv preprint DMRT1, but also DCAF12L1, a gene recurrently disrupted in NOA cases (Fig. 5d). The SDA94 component represents an alternative molecular time-point affected in MA cases, meiotic spermatocytes, where cases were identified with mutations in HIST1H1T and C1orf146 (a human ortholog of yeast SPO16). Similarly, five SCO 'molecular subforms' were mapped to the spermatogonial cell population, including SDA67, enriched for DNA replication genes, and SDA46, a component with unknown function (see below). Collectively, these findings indicate that within the same histological endpoint, different 'molecular subforms' can be found reflecting the known and novel molecular origins of azoospermia.

SDA components with uncharacterized genes
We noted several SDA components that included multiple prioritized genes not previously linked to mammalian sperm production. For instance, component 46, which included NOA case genes

piRNA processing factors in NOA
We prioritized potential disease variation in six genes that are essential for the biogenesis of have previously been found to be disrupted in men with azoospermia.
Remarkably, when queried against the SDA components, these 6 genes loaded highly and specifically on component 59 expressed in pre-pachytene spermatocytes (p<0.01, Fig. 6b) together with other known piRNA processing genes PIWIL2, TEX19 and GPAT2, and a . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 24, 2022. ; https://doi.org/10.1101/2022.07.19.22271581 doi: medRxiv preprint potentially novel factor in piRNA processing, TDRD10 (Fig. 6c). Strikingly, 11 of the 15 top loadings on component 59 were long intergenic non-coding RNAs annotated as pre-pre-piRNAs 44 , indicating that the precursors of piRNA molecules are strongly co-regulated with the transcripts of the piRNA biogenesis genes themselves.
We set out to validate the functional consequences in testicular tissue samples available for six NOA cases with prioritized variants in PNLDC1, TDRD9 and TDRD12. PNLDC1, a trimmer of pre-piRNAs, was recurrently disrupted in three men from different GEMINI centers and one case from the Newcastle/Radboud cohort 45 , whereas two potentially deleterious variants in TDRD9 and TDRD12, essential factors in secondary piRNA biogenesis, were found in singleton NOA cases from the GEMINI and MERGE cohorts, respectively (Extended Data Fig. 8a). Disruption of all tested piRNA genes predominantly led to spermatogenic arrest seen in testicular biopsy  45 . Collectively, these findings demonstrate that components of piRNAs biogenesis are indispensable for human spermatogenesis and highlight the benefit of intersecting putative disease genes with scRNAseq data for identification of diverse molecular subforms of NOA.

Discussion
In this study, we performed whole exome sequencing on an international cohort of unrelated NOA cases and reported extensive genetic heterogeneity attributable to the recessive form of the . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 24, 2022. ; https://doi.org/10.1101/2022.07.19.22271581 doi: medRxiv preprint disease. We implicated over 200 genes in primary spermatogenic failure, including 27 genes with significant association with NOA when comparing 2,072 cases and 11,587 fertile controls.
Thus, we provide two lists of genes with two levels of statistical signal: 221 genes prioritized in GEMINI cases, the majority of which were observed in n=1 cases, with an estimated 30% false discovery rate based on a control cohort comparison (Supplementary Table 2). Second, we provide a higher confidence list of 27 genes which have all been observed in multiple NOA cases and have statistically significant association with NOA in a comparison with 11,587 fertile controls (Supplementary Table 4). The observed detection rate of possible causes of NOA in GEMINI (20%) remains a conservative estimate, as our approach did not consider genes exclusively involved in fetal gonad development, excluded genes with broad expression across the body, and did not address dominant or additive defects likely to contribute to the manifestation of NOA as well 46,47 . A more detailed analysis of X-linked genes is reported in a companion paper focusing exclusively on the X chromosome, using a different method of prioritization than described here (Riera Escamilla et al. submitted).
One remarkable conclusion of our study is that the Mendelian causes of NOA are well distributed across the vast number of genes involved in testis function, and not clustered into a small number of genes. The highest level of recurrence recognized in the GEMINI cohort was n=3 cases affected by mutations in the same gene, and most cases were singletons, affecting in total 200 genes. This places NOA in a similar space of genetic architecture as congenital hearing loss (151 genes with at least "limited" clinical validity) 48 , inherited retinal degenerations (over 260 genes) 49 , and ciliopathies (over 190 genes) 50 , but not yet approaching developmental disorders (over 1,500 reportable genes) 51 . This observation has numerous implications for the clinical management of genetic forms of male infertility. First, it seems unlikely that a targeted . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 24, 2022. ; https://doi.org/10.1101/2022.07.19.22271581 doi: medRxiv preprint panel of genes for clinical genetic testing will, in the short term, be an effective replacement for whole exome or genome assays in this disease space. Second, most clinicians will never see multiple unrelated cases of infertility caused by mutations in the same gene. Looking across clinics to share experiences for diagnosis and treatment, and to construct case series, will be essential to optimize patient care. From a research perspective, this scarcity of recurrent cases will challenge variant interpretation: this will need to be addressed by data sharing consortia, development of robust assays to evaluate the function of a wide range of variants, and more creative, integrative use of model organisms for functional validation.
It is now clear that what is clinically referred to as "spermatogenic impairment" is a collection of numerous rare diseases. However, these rare diseases share a related genetic basis; many of the affected genes encode proteins that either interact physically, such as in a protein complex, or physiologically, in regulating a particular cellular process. In such a system it is expected that oligogenic inheritance will occur, as well as variable expression of disease genotypes, where the precise phenotypic effect of a mutation will depend on genetic variation at other interacting loci, as well as the environment (e.g. patient age, lifestyle) 52 . We noted that eleven genes prioritized in our study have previously been linked to male infertility disorders other than NOA, including hypogonadotropic hypogonadism, primary ciliary dyskinesia, and multiple morphological abnormalities of the sperm flagella (MMAF; Fig. 3c, Extended Data Table 2, Supplementary Discussion) 7,53 . The latter suggests that problems in sperm assembly may also result in defects in sperm concentration, consistent with the observation of severe oligozoospermia in some patients with MMAF 54,55 ; 56 . Clinical data available for the affected GEMINI patients did not indicate misdiagnoses, and most of these genes are robustly expressed in adult testicular cell types relevant to NOA, raising the possibility of an expanded phenotype for these loci. In total, these . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 24, 2022. ; https://doi.org/10.1101/2022.07.19.22271581 doi: medRxiv preprint observations suggest that variant and gene interpretation for NOA should typically be performed jointly with genetic knowledge of other male infertility phenotypes. It should be expected that, while some subtypes of NOA may have a unique genetic basis not shared with other traits, many variants that have a causal role in NOA will also modulate sperm production across the full spectrum of sperm count, such has already been shown with some Y chromosome variants 57 .
Furthermore, it is imperative that future studies of NOA should consider oligogenic inheritance during case assessment, and population-based tests of association that include heterozygous genotypes should be used as discovery and validation tools.
By intersecting the identified NOA disease genes with an scRNAseq atlas of human adult testis, we were able to organize the underlying molecular pathologies beyond what is currently possible with histological classification. We identified gene expression components with clear sets of independent genes, involving distinct cellular processes (e.g. mitosis, meiosis, piRNA biogenesis) and distinct cell types (e.g. Sertoli cells, and various germ cell types). In total we found over a dozen clearly distinct components with robust GEMINI gene loadings, and have statistically defined 70 components from total testis expression data. Thus the scope is clearly there to use gene expression components as a rational basis for NOA patient classification with higher resolution than our current systems based on histology. We are not proposing that gene expression signatures like SDA components will provide a comprehensive and definitive classification system for all genetic forms of male infertility, just that such an approach can provide a useful starting point for organizing the rapidly growing, and, in some cases, not obviously related genetic causes that are being identified.
. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 24, 2022. The use of components to categorize patients can have benefits to research and patient management. Most importantly, we showed that defects in multiple genes from the same component, 59, produced a shared molecular signature of defective piRNA processing that could be detected with the same assay of small RNA sequencing. With further research, it may be possible to design similar companion diagnostics for each component, or molecular subform, that can be used in conjunction with genetic analysis to confirm the pathogenicity of candidate variants, even for variants and genes that have yet to be characterized. Examples of other molecular markers that could be measured from patient tissue would be DNA methylation (e.g. for SPOCD1 mutations found in our study 58 ), metabolite levels, or markers of immune activity, although the potential number is large and many have been considered to date. The SDA approach simply provides a framework to pair genes of uncertain function with other genes, many of which may already have well characterized roles. Finally, we showed that components can be used to map the cell types involved in a defect. This would be critical knowledge to nominate patients with primary defects in somatic cells, who could benefit from powerful in vivo therapies targeted to those somatic cells, without directly modifying germ cells.
Infertility treatment is the leading frontier of some genetic technology, including diagnostic single cell sequencing 59 and therapeutic mitochondrial donation 60 . The principles outlined in this discovery study indicate that genetic forms of male infertility have diverse mechanisms, and that genetic diagnosis through whole exome/genome sequencing will be an essential tool for developing and targeting new forms of assisted reproductive therapies but also for risk evaluation of future comorbidities.
. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 24, 2022. ; https://doi.org/10.1101/2022.07.19.22271581 doi: medRxiv preprint To identify potentially missed cases of obstructive azoospermia, the genotype call set was additionally screened for pathogenic mono-or biallelic ClinVar variants (curated Expert Panel list, July 2018) 18 in the CFTR gene, a known cause of obstructive azoospermia. We elected to remove cases who were heterozygous for a single well-acknowledged pathogenic CFTR variant as a conservative measure. Due to the limitations of WES to detect variants in non-exonic regions, we are not able to exclude the presence of other non-coding pathogenic variants in the CFTR gene in trans with the coding variants 19,20 . Additionally, identity by descent (IBD) analysis was performed using SNPRelate R package 21 to identify and remove one counterpart of each accidental sample duplicate or twin pairs and cases with cryptic relatedness to avoid ascertainment bias of rare variation.

Study subjects
. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Institutional control cohort for development of variation prioritization procedures
A collection of individuals (total n=2265, 851 of whom are men) sequenced with the identical exome platform as the GEMINI cases was used as institutional controls to refine the variant prioritization pipeline and to test its ability to specifically aggregate potentially pathogenic variation related to testicular function. This comparative dataset consisted of 79 patients from Alzheimer study as well as 2186 participants from a study of skeletal or brain malformations: cases with adolescent idiopathic scoliosis (AIS, n=1245), Chiari malformation (n=105), hypermobility (n=38) and/or limb abnormalities (n=798), and 43 controls with no malformations.
The AIS dataset has previously been partially reported 22 and is fully available through the database of Genotypes and Phenotypes (dbGaP; https://www.ncbi.nlm.nih.gov/gap/; accession ID phs001677.v1.p1). The institutional cohort was subjected to exome capture and sequencing methods identical to the GEMINI cases (McDonnell Genome Institute, St. Louis, MO, USA) and similar bioinformatic pipeline was applied to read alignment and joint genotyping procedures.
The prioritization of variants and genes likely disrupting testicular function was performed in parallel with the GEMINI cases and was aiming to refine the prioritization strategy in an inheritance mode-aware manner to maximize patient-specific calls. Genes with prioritized variants in both cohorts were removed from both as a conservative step.

Additional cohorts
Two independent "replication" cohorts were screened for variation in genes linked to NOA in the GEMINI study. Whole exome sequencing of 900 men has been performed in the framework of the Male Reproductive Genomics (MERGE) study and includes cases with NOA or extremely . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 24, 2022. ; https://doi.org/10.1101/2022.07.19.22271581 doi: medRxiv preprint low sperm counts (n=817). Patients were recruited at the Centre of Reproductive Medicine and Andrology (CeRA), Münster, Germany and the Clinic for Urology, Pediatric Urology and Andrology, Gießen, Germany. The MERGE cohort was queried for the 221 prioritized GEMINI genes to identify potential disease variation and retain recessive changes that are rare (MAF<0.01), non-synonymous variants seen in the MERGE cohort less than 20 times, had alternate read count <5, frequency >25 in all reads and CADD score >10 if the consequence is missense. The data set used to compare to the GEMINI study was further filtered to match the GEMINI filtering procedures by excluding variation with PSAP P-value >10 -3 , the positions that were observed in more than three individuals and the compound heterozygous pairs that were found within a 10 bp proximity or occurred in more than one individual.
Whole exome sequencing in the Newcastle/Radboud cohort was performed on a total of 331 patients who presented with idiopathic NOA (n=164) or severe to extreme oligozoospermia (with or without asthenozoospermia; n=167) at the Radboudumc outpatient clinic between July 2007 to October 2017 (n=286) and at the Newcastle Fertility Clinic between January 2018 to January 2020 (n=45). The presence of chromosomal anomalies, AZF deletions or pathogenic CFTR variants were exclusion criteria, all patients remained idiopathic following thorough clinical evaluation. Variants were filtered for recessive changes that were rare (MAF<0.01), nonsynonymous, seen in less than 5 patients, with an alternate read count >5, and an alternate allele frequency >15%, predicted to be pathogenic by 3 or more of the following variant prediction tools: SIFT 23 , PolyPhen-2 24 , MutationTaster 25 , FATHMM 26 , Mutation Assessor 27 and CADD 28 .
All variants with read depth less than 50 were validated by Sanger sequencing.
. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Ethics statements & IRBs from centers
The study was approved by the Ethics Committee of all collaborative centers: protocols  with the study cohort and for which a reference call set of high-quality variants is publicly . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 24, 2022.

Detection of copy number variation
Copy number variants were detected from WES data using XHMM as previously described 33 .
CNVs were called using the Viterbi HMM algorithm with default XHMM parameters, and XHMM CNV quality scores were calculated as previously described using the forwardbackward HMM algorithm. In addition, all called CNVs were statistically genotyped across all samples using the same XHMM quality scores and output as a single uniformly called VCF file.
The output of CNV calls was first separated into deletions (DEL) and duplications (DUP) and each subset was then filtered by frequency to remove common CNVs, which were present in >1% of individuals and defined as overlapping more than 50% of their respective targets. Only CNVs with quality scores greater than or equal to 60 were included in the downstream analysis.
As previously recommended 34 , individuals having a CNV count greater than 3 SD above the mean (27 DELs or 30 DUPs, in this study) were removed from the analysis.
To evaluate the sensitivity of CNV detection from the WES dataset, a comparative CNV call set for the reference sample NA12878 sequenced in parallel with the GEMINI study subjects, was available through the whole genome sequencing (WGS) effort of the 1000 Genomes phase 3 project 35 . The CNV calls of the 1000G dataset were required to overlap with WES CNVs by at least 50% of length and span the exonic regions targeted in this study. The sensitivity estimate . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 24, 2022. ; https://doi.org/10.1101/2022.07.19.22271581 doi: medRxiv preprint reached 13% (n=11/85 of reference CNVs), which is within the expected range considering the fragmented coverage of the genome in WES approach and comparable to previously reported sensitivity estimates (7.67%) for CNV calling from exome sequencing data with XHMM when compared to WGS 36 . In total, approximately 55% of the autosomal duplications (n=5/9) and 60% of autosomal deletions (14/23) identified in NA12878 in this study are supported by previous studies 35 and are likely true positives.

Validation of CNVs by PCR
A subset of likely homozygous/hemizygous deletions was selected for experimental validation by +/-PCR, performed with at least one primer pair located between the predicted deletion breakpoints (internal) and a second primer pair outside, either upstream or downstream from the deletion breakpoints in a region unaffected by the predicted deletion (external, positive control).
CNV was confirmed when the internal set of primers did not result in amplification and the external set of primers outside the deletion successfully amplified from patient DNA. Both sets were required to amplify from control DNA. Polymerase reaction for +/-PCR was performed using 50ng of DNA and Qiagen HotStarTaq (Qiagen, Hilden, Germany). Deletions were considered as true positives when only amplification for the external primer pair(s) was obtained.
The primer sequences and annealing temperatures are available upon request.
. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 24, 2022. . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 24, 2022. is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 24, 2022. ; https://doi.org/10.1101/2022.07.19.22271581 doi: medRxiv preprint identical pair of genotypes was observed in more than one case or were found in cis configuration as determined by pHASER v1.1.1 38 , in-house parsing of variant annotation in the VCF file or manual check of read alignments using IGV tools 39 .
In order to further aggregate variation most likely increasing Mendelian disease risk and affecting spermatogenesis pathways, the following information was considered: 1) loss-offunction (LoF; nonsense, splice site and frameshift variants) changes, 2) genes known to cause The criteria for aggregating most plausible NOA variation were evaluated individually for each Mendelian disease model in parallel with the cohort of institutional controls aiming to maximize patient-specific calls. All autosomal variants were required to meet at least one of the three aggregation criteria mentioned above. X-linked variants were expected to be located in genes with testis enhanced expression and additionally meet either the prioritization criteria 1) or 2) . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 24, 2022. ; https://doi.org/10.1101/2022.07.19.22271581 doi: medRxiv preprint mentioned above, whereas Y-linked variants were included if defined as LoF. Any NOA gene, which did not display testis enhanced expression based on the Human Protein Atlas dataset, was required to at least show an appreciable expression specifically in premeiotic germ cells or somatic cells of the testis (normalized TPM>0.5 in the testis scRNA-seq) 42 .

Burden testing
We performed gene-based burden testing using the combined data from GEMINI, MERGE, and Newcastle/Radboud case cohorts, compared to a control cohort of 11,587 fertile parents sequenced at Radboud University. These fertile parents were ascertained from the genetic diagnostic lab at Radboud, where they were referred for genetic testing as part of clinical care for their children. The cohort consisted of approximately equal numbers of men (5,803) and women (5,784), with an ethnic makeup reflecting the local Dutch population. Due to data use limitations, it was not possible to analyze compound heterozygous variants in the control cohort; therefore, burden testing was only performed using single-variant genotype calls. Testing was performed on the 35 genes with more than one prioritized homozygous (autosomal) or hemizygous (X and Y chromosomes) variant in cases (i.e. a smaller number of recurrent genes than observed in the full data). Case and fertile control genotypes were processed using the same filtering and prioritization pipeline described above, so the only variants included in the analysis were the short list of prioritized thought to represent potential Mendelian causes of NOA. The proportion of individuals with filtered variants was compared between cases and controls using the Fisher Exact test, and q-values were obtained using the Benjamini-Hochberg method. Two forms of . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 24, 2022. ; https://doi.org/10.1101/2022.07.19.22271581 doi: medRxiv preprint burden test were performed and reported -one test using just male control samples, and one using the combined male and female control cohort.

Bioinformatic annotation
To determine the ancestry of GEMINI cases, EthSeq software 43 was implemented using Phase 3 genotype and population data from the 1000 Genomes Project as a reference 35 . Only variants with MAF>0.2 and located within the exonic regions targeted in the GEMINI WES study were considered. Long runs of homozygosity (ROH) were detected using the H3M2 tool specifically designed for analysing WES data 44  To extract high quality LoF calls predicted to cause gene knock-outs, homozygous, hemizygous and compound heterozygous LoF variation were annotated and filtered using the LOFTEE tool To explore the expression patterns of GEMINI disease genes across GTEx tissues in comparison to other recessive Mendelian disorders, a list of genes (n=2588) linked to Mendelian diseases was retrieved from the Centers for Mendelian Genomics dataset (http://mendelian.org/phenotypes-genes; accessed Oct 01, 2020). The gene list was then intersected with an OMIM dataset (retrieved May 15, 2020) to only extract genes that follow a recessive inheritance pattern, which includes autosomal as well as X-linked and Y-linked recessive modes. GTEx expression values were available for 604 genes out of the 615 defined as recessive Mendelian disease genes and the difference in the expression levels, when compared to the GEMINI disease genes, was evaluated using Mann-Whitney U test (p-value below 0.05 was considered significant).
. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 24, 2022. To explore the functional role of prioritized disease genes in mice, information on the reproductive phenotypes of knock-out mouse models was extracted from the MGI database, International Mouse Phenotyping Consortium (www.mousephenotype.org) 53 and literature.
To estimate p, the number of monogenic azoospermia genes in non-consanguineous men, we used monte carlo simulations to evaluate a model that relates p to the expected rate of recurrent gene hits in a sequencing study, which we denote as r. We assume that for a given set of azoospermia genes, each gene in the set is equally likely to be a monogenic cause of disease (e.g. the amount and effect size of disease variation in each gene is the same). Next, we evaluate the likelihood of a particular value of p using the difference in the expected rate of recurrent hits, r, and the observed rate ro (9.4% among non-consanguineous NOA cases) as an objective function.
We searched over a grid of 1000 values of pi, i ranging from 1 to 1000, with p1 = 1 , p2= 2, and so forth. For each value of pi, we simulate a set of "gene hits" by random sampling with replacement 221 genes, the actual number of prioritized genes observed in GEMINI, from the pool of pi genes. This is done 1000 times for each value of pi, each time recording the corresponding value of ri, and then averaging across all 1000 replicates to arrive at a final estimate for ri. The objective function, ri-rO, was minimized at i=625. The sensitivity of detecting multiple gene hits as a function of the cohort size was then calculated considering the estimated size of the azoospermia gene pool (n=625), discovery rates of 15% for outbred samples, 76% for consanguineous samples and 20% for the GEMINI cohort overall (8% of consanguineous cases) and assuming the absence of natural selection.
. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

scRNAseq analyses
We created an aggregated dataset of human testis scRNA-seq from multiple sources, comprising cells from a total of 12 human donors. These consisted of 7 adults 42 , 3 adults 54 (GEO accession #GSE109037), and two juvenile samples 55 (GEO accession #GSE120506). As described elsewhere, these datasets were integrated and analyzed using the sparse decomposition of arrays (SDA) framework 56 . SDA soft clusters genes and cells into latent "components", which are then manually interpreted and separated into "technical noise" and "signal". Components corresponding to technical noise are removed from the data, thus providing batch correction. For the analyses described, here, SDA was run with 150 components, half of which were removed as technical noise. Gene ontology enrichment analysis was performed on the top 250 genes from each component (from each side) using the enrichGO function from the clusterProfiler R package in which p-values are calculated based on the hypergeometric distribution and corrected for testing of multiple biological process GO terms using the Benjamini-Hochberg procedure.

Small RNA sequencing and analysis
RNA was extracted from the fixed testicular tissue of the NOA cases with prioritized variants in genes involved in piRNA biogenesis and controls with complete spermatogenesis matched to each case by fixative. Small RNA sequencing libraries were prepared using CATS Small RNAseq kit (Diagenode, Cat. #C05010040) and sequenced on the MiSeq platform (Illumina, San Diego, CA). Raw sequencing reads were processed according to the CATS Small RNA-seq protocol and reads with 25-45 bases in length were mapped to the human assembly hg19 using bowtie v1.0.1 57 allowing one mismatch. The mapped reads were filtered to remove all reads . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
Individual piRNA counts were normalized to the spike-in cel-miR-39-3p (5'-UCACCGGGUGUAAAUCAGCUUG-3') added to the samples in the first step of the sequencing library preparation. Spike-in reads were mapped to the human reference using bowtie v1.0.1 and allowing no mismatches.

Sanger sequencing
The regions of interest were amplified using Hot Start Taq

Immunohistochemistry
Testicular tissue was fixed in Bouin's fixative, processed and stained with an antibody directed against STRA8 (Abcam ab49405, diluted 1:250) using the ImmPRESS detection kit (Vector laboratories, CA, USA) as described before 60 . Antigen retrieval was performed in a microwave in TEG buffer and the staining developed 20 min with AEC.
. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 24, 2022. ; https://doi.org/10.1101/2022.07.19.22271581 doi: medRxiv preprint Correspondence and requests for materials should be addressed to Donald F. Conrad. Email:

conradon@ohsu.edu
Reprints and permissions information is available at www.nature.com/reprints . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 24, 2022. . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 24, 2022.  . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 24, 2022.  is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 24, 2022. ; https://doi.org/10.1101/2022.07.19.22271581 doi: medRxiv preprint biallelic STRA8 deletion. STRA8 CNV (the smallest p-value) was prioritized as the most likely cause of NOA in GEMINI-295. e, Z-scores of normalized read depth of exome sequencing data around the STRA8 locus, plotted against the STRA8 gene model. NLS; nuclear localization; NES, nuclear export signal. Green arrows, PCR primers. f, PCR spanning the predicted deletion region yielded a short ~5 kb product in GEMINI-295 (P) validating the homozygous STRA8 deletion. C, control; M, ladder. g, In control testis, clear staining of STRA8 was observed in spermatogonia (arrows), together with background staining in peritubular and interstitial cells (asterisk). In GEMINI-295, STRA8 protein staining in spermatogonia is much lower (arrows), presumably due to nonsense mediated decay of the transcript. Bar represents 10 m. h, Burden test results from comparison of a combined cohort of 2,072 NOA cases with 11,587 fertile controls. Thirty-four genes with prioritized variation in the GEMINI cohort were selected for burden testing; of these 27 were nominally associated with NOA (Methods). * indicates genes with at least "limited" clinical validity defined by 7 .
. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 24, 2022.  is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 24, 2022. . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 24, 2022.   . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 24, 2022.  . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 24, 2022.  is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 24, 2022. ; https://doi.org/10.1101/2022.07.19.22271581 doi: medRxiv preprint transcription of long precursor-transcripts from genomic clusters, which are then processed in the cytoplasm. b, Cell scores for component 59, which encodes both piRNA processing genes and target pre-pre-piRNAs, indicate that it is expressed primarily in pre-leptotene spermatocytes. c, Gene loadings for component 59. Note TDRD10 (black box), a protein-coding gene which has yet to be characterized for a role in mammalian piRNA processing. d, H&E stain of TDRD9 patient biopsy showing spermatogenic arrest. The most mature germ cell observed were early round spermatids, which often appeared multinucleated (arrows). In addition, many pyknotic cells were observed (arrowheads). Scale bar = 50 m. e, Size distribution of piRNAs detected in the NOA case with a biallelic missense variant in TDRD12 and matching control, derived from small RNA-sequencing of testis tissue. f, Significantly decreased fraction of mature (<32 bases) piRNAs was detected in testis of patients with piRNA pathway mutations, indicative of faulty processing of the immature piRNA transcripts.
. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 24, 2022. karyotype) and large rearrangements on sex chromosomes were detected based on i) the CNV data and ii) the ratio of chromosome Y coverage (normalized to autosomes) and inbreeding . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 24, 2022. . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 24, 2022. . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 24, 2022. . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 24, 2022. . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 24, 2022. is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 24, 2022. ; https://doi.org/10.1101/2022.07.19.22271581 doi: medRxiv preprint 98 bp from E2) and chr7:134933449 (BP2, 2050 bp from E6). d, As expected, PCR1 did not yield a PCR product in the patient (P; the observed 800 bp band is a non-specific product), whereas PCR2 reaction was amplified in the patient. The PCR3 reaction spanning the deletion breakpoints resulted in a 5200 bp fragment in the patient when the expected size in control (C) was 13 kb. Neg, no DNA control; M, marker. Fig. 6. STRA8 and MAGE-A4 staining of testicular tissue. The carrier of STRA8 deletion displayed significantly reduced staining of STRA8 in spermatogonia (arrows) compared to the control with normal spermatogenesis. The STRA8 deletion caused a maturation . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)

Extended Data
The copyright holder for this preprint this version posted July 24, 2022. ; https://doi.org/10.1101/2022.07.19.22271581 doi: medRxiv preprint arrest phenotype, whereby the tubules predominantly contained spermatogonia, stained with spermatogonia-specific marker MAGE-A4, and occasional pre-pachytene spermatocytes (arrowheads). Non-specific background staining is observed particularly in peritubular and Leydig cells in spite of extensive optimization of the only commercially available antibody.
. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 24, 2022. . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 24, 2022.

EXTENDED DATA TABLES
. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 24, 2022.  institutional controls (total, n=2265, n=851 men). Note the highest sensitivity of identifying a possible genetic cause of NOA in the autosomal homozygous inheritance mode, which was likely powered by the presence of consanguineous men in the NOA cohort. P-values below 0.05 were considered significant. *The sum of individuals across disease models is larger than "Total" as multiple candidate disease models can be observed per person. Genes with 'limited' evidence should be considered as candidate genes that require additional . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 24, 2022. ; https://doi.org/10.1101/2022.07.19.22271581 doi: medRxiv preprint proof of association. AD, Autosomal dominant, AR, Autosomal recessive, XL, X-linked, MA, Maturation arrest, SCO, Sertoli Cell Only, n/a, information unavailable.
. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 24, 2022. ; https://doi.org/10.1101/2022.07.19.22271581 doi: medRxiv preprint