X-linked palindromic gene families 4930567H17Rik and Mageb5 are dispensable for male mouse fertility

Mammalian sex chromosomes are enriched for large, nearly identical, palindromic sequences harboring genes expressed predominantly in testicular germ cells. Discerning whether individual palindrome-associated gene families are essential for male reproduction is difficult because of the challenges in disrupting all copies of a gene family. Here we generate precise, independent deletions to assess the reproductive roles of two X-linked palindromic gene families with spermatid-predominant expression, 4930567H17Rik and Mageb5. Sequence analyses reveal that mouse 4930567H17Rik and Mageb5 are orthologs of human HSFX3 and MAGEB5, respectively, where 4930567H17Rik/HSFX3 is harbored in a palindrome in both humans and mice, while Mageb5/MAGEB5 is harbored in a palindrome only in mice. Additional sequence analyses show that 4930567H17Rik and HSFX3 are rapidly diverging in rodents and primates, respectively. Mice lacking either the 4930567H17Rik or the Mageb5 gene family do not have detectable defects in male fertility, fecundity, spermatogenesis, or gene regulation, but do show differences in sperm head morphology, suggesting a potential role in sperm function. We conclude that while not all palindrome-associated gene families are essential for male fertility, large palindromes influence the evolution of their associated gene families.

expression in spermatids, high level of nucleotide identity between palindrome arms) 3 . Moreover, the presence of a single protein-coding gene in each palindrome enables us to more confidently ascribe associated reproductive phenotypes to loss of the deleted gene family 12 . We find that both the 4930567H17Rik and Mageb5 gene families in mice have orthologs in humans, but despite this conservation, mice lacking either gene family do not exhibit detectable defects in male fertility or post-meiotic spermatogenesis. We observe several abnormalities in sperm head morphology, indicating that while 4930567H17Rik and Mageb5 are not necessary for male fertility, both genes likely play a role in post-meiotic sperm development. Overall, our study supports the view that the 4930567H17Rik and Mageb5 gene families may play roles in spermatogenesis but are not necessary for overall male fertility in C57BL/6J mice. Our finding that 4930567H17Rik and Mageb5 are dispensable for male fertility is consistent with previous efforts demonstrating that many single-copy testis-specific genes are also dispensable for male fertility in mice [13][14][15] . Our studies add to previous findings suggesting that the 4930567H17Rik and Mageb5 palindrome structures are not essential for male fertility or spermatogenesis 12 .

Results
Mouse 4930567H17Rik is a highly diverged ortholog of human HSFX3, while mouse Mageb5 is a conserved ortholog of human MAGEB5. X-linked gene families associated with large palindromes can have orthologs in other species or be independently acquired 4 . To assess possible orthologs of mouse 4930567H17Rik and Mageb5 in humans, we compared their protein sequences via BLASTP and found that mouse 4930567H17Rik is orthologous to human Heat Shock Transcription Factor X-linked Member 3 (HSFX3) and mouse MAGEB5 is orthologous to human MAGEB5 (Fig. 1A). We further examined whether the genomic regions in mouse and human are syntenic (i.e., whether they share flanking orthologous genes). In mouse, 4930567H17Rik is flanked by the genes Iduronate 2-sulfatase (Ids) and Transmembrane Protein 185a (Tmem185a), which also flank HSFX3 in humans. Interleukin 1 receptor accessory protein-like 1 (Il1RAPl1) and Aristaless related homeobox (Arx) flank Mageb5 in both human and mouse (Fig. 1A). These data support the conclusion that 4930567H17Rik and HSFX3, and Mageb5 and MAGEB5, are orthologous between humans and mice.
In mouse, two gene copies each of 4930567H17Rik and Mageb5 exist within palindromic sequences (Fig. 1B); however, the copy number of both genes is different on the human X chromosome. In humans, HSFX3 is present in four copies (annotated as HSFX1-4) and MAGEB5 is inverted in a unique non-palindromic sequence. Human MAGEB5 has additional neighboring MAGEB gene family copies, but none of the gene family members are within palindromes (Fig. 1B). BLASTP alignments and synteny of 4930567H17Rik/HSFX3 and Mageb5/MAGEB5 suggest that both X-linked gene families were present in the common ancestor of mice and humans ~80 million years ago (MYA), but diverged at the sequence level, as in the case of 4930567H17Rik, or at the level of palindrome structure, as in the case of Mageb5.
Mouse 4930567H17Rik and human HSFX3 are rapidly diverging protein-coding genes. To further understand how 4930567H17Rik and HSFX3 diverged, we compared the evolutionary dynamics and intron-exon structures of 4930567H17Rik in rodents and HSFX3 in primates. Previous studies have shown that both copies of 4930567H17Rik (gene accession numbers NM_001033807, NM_001081476.1) exhibit rapid sequence divergence in rodents, with a Ka/Ks value of 1.81 when compared across three Mus lineages with R. norvegicus as an outgroup 16 . We find that primate HSFX3 has a Ka/Ks value of 1.20, indicating that the 4930567H17Rik and HSFX3 sequences are rapidly diverging in both rodents and primates. The rapid sequence divergence of 4930567H17Rik in rodents may have been facilitated by loss of an exon. Other mammalian HSFX3 orthologs, including human HSFX3, encode two exons, while 4930567H17Rik encodes a single exon (Fig. 1C). HSFX3 has a DNA-binding domain spanning the splice junction between exons 1 and 2. 4930567H17Rik produces a predicted protein that shares amino acid identity only with HSFX3. While 4930567H17RIK lacks the DNA-binding domain, it possesses an expanded glutamic acid repeat at the C-terminal end (Fig. 1C). Interestingly, sequence comparisons show that the first exon of mouse 4930567H17Rik is pseudogenized and that remnants of the first exon are found outside of the palindrome (Fig. S1). RNA-seq data support that the ancestral first exon of HSFX is not transcribed in mice and thus the second exon is the only remaining functional exon in Mus musculus (Fig. S1).
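As a rough illustration of how the Ka/Ks values above are read, the following Python sketch compares a ratio against the neutral expectation of 1. The function name `classify_ka_ks` and the `tolerance` parameter are hypothetical and are not part of the analysis reported here. Real analyses estimate Ka and Ks from codon alignments and assess significance statistically, which this sketch does not attempt.

```python
def classify_ka_ks(ratio: float, tolerance: float = 0.05) -> str:
    """Describe a Ka/Ks ratio relative to the neutral expectation of 1.

    `tolerance` is an arbitrary illustrative band around 1 (an assumption),
    not a statistical test.
    """
    if ratio < 0:
        raise ValueError("Ka/Ks cannot be negative")
    if ratio > 1 + tolerance:
        return "consistent with positive selection"
    if ratio < 1 - tolerance:
        return "consistent with purifying selection"
    return "consistent with neutral evolution"


# Ka/Ks values quoted in the text for the rodent and primate gene families.
for gene, ratio in (("4930567H17Rik", 1.81), ("HSFX3", 1.20)):
    print(f"{gene}: Ka/Ks = {ratio:.2f}, {classify_ka_ks(ratio)}")
```

Both quoted values exceed 1, which is the basis for describing the two gene families as rapidly diverging.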
The predicted annotation of 4930567H17Rik is a long non-coding RNA 17 . However, we find evidence that 4930567H17Rik is a protein-coding gene despite loss of an exon and rapid sequence divergence. First, 4930567H17Rik encodes a large open reading frame (233 amino acids) that is conserved across rodents (Fig. 1C). Despite the rapid sequence divergence, 4930567H17Rik has not acquired new nonsense mutations typical of pseudogenes or long non-coding RNAs. Second, reanalysis of ribosome profiling data 18 from sorted spermatogenic cells demonstrates that 4930567H17Rik RNA is most highly associated with ribosomes in round spermatids and elongating spermatids, consistent with single-cell RNA-seq studies, altogether indicating that 4930567H17Rik mRNA is likely translated (Fig. 1D). Overall, we conclude that the protein-coding sequence of 4930567H17Rik/HSFX3 is rapidly diverging across mammals, and that the exon containing the HSF DNA-binding domain has been lost along the Mus lineage, after divergence from rat.

Discussion
Our study addresses whether gene families harbored within large singleton X-palindromes are required for male fertility and spermatogenesis in mice. While null mutants of the Slx and Slxl1 gene families harbored in X-palindrome arrays result in male infertility and defects in spermatogenesis 8 , null mutants of the 4930567H17Rik and Mageb5 gene families in singleton X-palindromes do not. The absence of an overt reproductive phenotype in male mice lacking 4930567H17Rik or Mageb5 may in part be due to genetic redundancy. 4930567H17Rik is related to heat shock transcription factors, which could have compensating gene family members. Indeed, Hsf1 and Hsf2 are expressed in post-meiotic cells 23,24 and Hsf2 is known to regulate post-meiotic X- and Y-palindromic gene families 24 , suggesting that Hsf1 and Hsf2 could compensate for the loss of 4930567H17Rik. The most likely candidate to compensate for loss of 4930567H17Rik is Hsf2, based on the robust spermatid expression of Hsf2 as compared to Hsf1 (Fig. S8, left). Similarly, Mageb5 has eight X-linked 3 and one autosomal Mageb gene family members expressed in the testis that could potentially compensate for the loss of the Mageb5 gene family. The most likely candidate to compensate for loss of Mageb5 is Mageb3, because Mageb3 is expressed at the highest level in spermatids as compared to other Mageb gene family members (Fig. S8, right). To better understand the spermatogenic role of Mageb5 and other Mageb family members, removal of multiple Mageb family members, particularly Mageb3, may be necessary. For both 4930567H17Rik and Mageb5, further studies investigating these possibly redundant genes could help elucidate the roles of 4930567H17Rik and Mageb5 in spermatogenesis. Furthermore, studies of 4930567H17Rik orthologs in rats or primates that still possess HSF DNA-binding domains could shed light on the ancestral function of 4930567H17Rik.
Despite the lack of overt reproductive phenotypes in 4930567H17Rik ∆CDS/Y and Mageb5 ∆Arm∆CDS/Y mice, differences in sperm head morphology suggest that 4930567H17Rik and Mageb5 play a role in sperm development. Sperm head morphology analysis uses images of DAPI-stained sperm to detect chromatin. 4930567H17Rik ∆CDS/Y and Mageb5 ∆Arm∆CDS/Y sperm had increased size and elongation, respectively, compared to wild-type sperm (Fig. 4). This finding may indicate that sperm from these mice have a reduced level of chromatin compaction. Thus, 4930567H17Rik and Mageb5 may alter chromatin compaction during spermiogenesis and epididymal transit, a time in development when sperm chromatin compaction is dynamic 25 . Supporting a potential role for 4930567H17Rik and Mageb5 in sperm chromatin compaction, both gene families exhibit increasing expression levels from late round spermatids to elongated spermatids (Fig. 1D). Tracking the dynamics of sperm head morphology during development and epididymal transit may help further define the role of 4930567H17Rik and Mageb5 in sperm development.
Our current analyses of HSFX gene families and previous studies on 4930567H17Rik 16 demonstrate that HSFX and 4930567H17Rik sequences are rapidly diverging, suggesting that the 4930567H17Rik/HSFX gene family is under positive selection throughout mammals. The gene family's presence within an X-linked palindrome may facilitate this rapid evolution in multiple ways. First, positive selection is known to be stronger for X-linked genes with male-beneficial functions, because of male sex chromosome hemizygosity 26 . Second, the second gene copy provides more substrate for new beneficial mutations upon which selection pressures can readily act 27 . Third, the second copy could relax constraint on palindromic genes and facilitate the acquisition of novel functions 27 . Fourth, any beneficial mutation arising in one gene copy could be readily spread to other gene copies in the palindrome through arm-to-arm gene conversion 16 . In the future, it will be important to determine how the rapid sequence divergence of HSFX and 4930567H17Rik relates to their spermatogenic functions.
Large palindromic regions are challenging to study in mice and thus are not a priority for large mouse knockout consortia [28][29][30][31] . CRISPR now enables the study of both X-palindromic structures and their associated gene families. Megabase-sized deletions of arrays of large palindromes demonstrate the necessity of large palindromes and their associated genes for male fertility [8][9][10] . However, these studies cannot resolve whether the palindrome structure or the associated gene families are responsible for male infertility. Our study demonstrates how CRISPR can generate specific deletions of a single palindrome-associated gene family while keeping the palindrome structure largely intact. Our study also improves our understanding of large X-palindrome-associated gene function by demonstrating that individual X-palindrome-associated gene families are dispensable for male fertility. Future studies using CRISPR to genetically dissect the importance of palindrome structures versus associated gene families in reproduction will provide a more complete understanding of the importance of these large genomic regions and their implications for male fertility.

Materials and methods
Generation of mice lacking 4930567H17Rik and Mageb5 palindrome-associated gene families. Mice lacking the X-linked palindrome-associated 4930567H17Rik and Mageb5 gene families were generated using a CRISPR Cas9 strategy. We selected single guide RNAs (sgRNAs) within the coding sequences of 4930567H17Rik or Mageb5 (Table S3) [...] (50 ng/µl) complexes into mouse zygotes and screening for edits via PCR using primers flanking sgRNA cut sites (Table S4) and subsequent Sanger sequencing. We selected sgRNAs with cleavage efficiencies of > 30% to delete 4930567H17Rik and Mageb5 gene families.
To generate mice lacking the 4930567H17Rik gene family (4930567H17Rik ∆CDS/Y ), C57BL/6J × SJL hybrid females were crossed with existing 4930567H17Rik +/Y mice 12 . Zygotes were injected with Cas9 protein (50 ng/μl), a single-stranded oligonucleotide donor (10 ng/μl), and dual sgRNAs (30 ng/μl) targeting each 4930567H17Rik gene copy to achieve a ~650 base pair deletion within each copy on both palindrome arms (Table S3). The deletion breakpoints were verified via PCR and subsequent Sanger sequencing (Fig. S7A). An F1 male carrying a deletion of both 4930567H17Rik coding sequences in cis (4930567H17Rik ∆CDS/Y ) was bred to a C57BL/6J female to generate 4930567H17Rik ∆CDS/+ female mice. 4930567H17Rik ∆CDS/+ females were backcrossed to C57BL/6J males to generate 4930567H17Rik ∆CDS/Y mice, which were used for all experiments. 4930567H17Rik ∆CDS/Y mice used in the described experiments were backcrossed to C57BL/6J for > 7 generations.
To generate mice lacking the Mageb5 gene family, zygotes from Mageb5 ∆Arm/+ females crossed to Mageb5 ∆Arm/Y mice 12 were injected with Cas9 protein (50 ng/μl), an oligonucleotide donor (10 ng/μl), and dual sgRNAs (30 ng/μl) targeting a ~900 bp deletion of Mageb5 (Table S3). These injections resulted in two independent Mageb5 ∆Arm∆CDS/Y lines, "L1" carrying an 860 bp deletion and "L2" carrying a 400 bp deletion. The deletion breakpoints of the two lines were verified via PCR and Sanger sequencing (Fig. S7B). F1 females with both the Mageb5 palindrome arm and coding sequence deleted in cis were bred to C57BL/6J males to generate Mageb5 ∆Arm∆CDS/+ female mice. Mageb5 ∆Arm∆CDS/+ females were backcrossed to C57BL/6J males for > 10 generations to generate Mageb5 ∆Arm∆CDS/Y mice, which were used for all experiments.
Both 4930567H17Rik ∆CDS/Y and Mageb5 ∆Arm∆CDS/Y mice transmitted the 4930567H17Rik and Mageb5 coding sequence deletions through the germline, and no changes in overall health were observed due to off-target effects of CRISPR or as a consequence of the deletions. All mice used in these studies were between 3 and 7 months of age. 4930567H17Rik ∆CDS/Y and Mageb5 ∆Arm∆CDS/Y mice were directly compared to wild-type littermates (4930567H17Rik +/Y and Mageb5 +/Y mice) in all experiments to minimize the effects of genetic background and age. If wild-type littermates were not available, age-matched controls were used. Because both Mageb5 ∆Arm∆CDS/Y L1 and Mageb5 ∆Arm∆CDS/Y L2 mice bred normally and were easily maintained, Mageb5 ∆Arm∆CDS/Y L1 mice were used for the experiments presented in this work unless otherwise specified. Cages were kept on ventilated racks at 72 °F, 30-70% humidity, on a 12 h:12 h light:dark cycle in a specific-pathogen-free room. Cages were monitored daily by husbandry personnel and changed every two weeks. Mice were given water and fed Lab Diet 5008 food ad libitum. Adult mice were sacrificed by CO2 asphyxiation followed by cervical dislocation, and pups were sacrificed by decapitation, in compliance with ULAM standard procedures in euthanasia. The Institutional Animal Care and Use Committee of the University of Michigan approved all animal procedures (PRO00009403) and all experiments followed the National Institutes of Health Guidelines of the Care and Use of Experimental Animals and the ARRIVE guidelines.
Genotyping. Genotypes of 4930567H17Rik ∆CDS/Y and Mageb5 ∆Arm∆CDS/Y mice were determined via PCR on DNA samples collected from 1-2 mm tail snips. Tails were digested in 50 mM NaOH for 20 min at 95 °C and briefly vortexed to dissolve tissues. 50 µl of Tris HCl (pH 6.8) was added to neutralize the NaOH and samples were centrifuged at 13,000 rpm for 30 s 32 . PCR was performed with Taq DNA polymerase (New England Biolabs) per the manufacturer's instructions. To verify genotypes of 4930567H17Rik ∆CDS/Y and Mageb5 ∆Arm∆CDS/Y mice, we used primers flanking the coding sequence of each gene (primers 1-5, Table S4). For the Mageb5 lines, we used primers flanking the Mageb5 palindrome arm to verify the deletion of the palindrome arm, as previously described 12 (primers 6, 7, Table S4).
Reverse transcriptase-PCR. Total testis RNA was extracted using Trizol (Life Technologies) according to the manufacturer's instructions. ~10 µg of total RNA was DNase treated using Turbo DNase (Life Technologies) and reverse transcribed using Superscript II (Invitrogen) with oligo(dT) primers to generate first-strand cDNA. RT-PCR was performed on adult testis cDNA preparations with primers residing in the single-exon coding sequence of 4930567H17Rik (primers 3, 8, Table S4), and with intron-spanning primers for Mageb5 (primers 9, 10, Table S4). Primers to the round spermatid-specific gene Trim42 (primers 11, 12, Table S4) served as a positive control 8 . To control for genomic DNA contamination, a reaction lacking reverse transcriptase was performed in parallel.

RNA-sequencing. Testis RNA was extracted from three 4930567H17Rik ∆CDS/Y and three Mageb5 ∆Arm∆CDS/Y L1 mice, along with three wild-type littermate controls from each line, and DNase treated as described above. Total RNA quality was assessed using the Tapestation 4200 (Agilent) (minimum DV200 value greater than 30% and a minimum concentration of 3.32 ng/µl). RNA used in this study had RIN (RNA integrity number) values ranging from 6.1 to 8.9. Ribo-minus (RNaseH-mediated) stranded RNA-seq libraries with indexed adaptors were generated (New England BioLabs). Final libraries were quantitated by Kapa qPCR using Kapa's library quantification kit for Illumina sequencing platforms (Kapa Biosystems, catalog # KK4835). Pooled libraries were subjected to 150 bp paired-end sequencing according to the manufacturer's protocol (Illumina NovaSeq6000), giving an average of 50 million reads per sample. Bcl2fastq2 Conversion Software (Illumina) was used to generate de-multiplexed Fastq files. RNA-seq reads were pseudoaligned to the NCBI RefSeq gene annotation for the Mus musculus C57BL/6J (mm10) reference genome by Kallisto 33 , using the default settings. Transcript per million (TPM) values were generated by Kallisto. The estimated numbers of RNA-seq reads aligning to each gene, as provided by Kallisto, were used as input to DESeq 34 .
Testis to body weight ratio. To calculate the testis/body weight ratio, total testis mass was divided by the total body mass taken at the time of euthanasia.
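For the RNA-sequencing quantification described above, the relationship between estimated counts and TPM can be sketched in Python as follows. This is a minimal illustration of the standard TPM definition (estimated counts divided by effective transcript length, rescaled to sum to one million), not the Kallisto implementation. The function name `tpm` and the example numbers are hypothetical.

```python
from typing import List, Sequence


def tpm(est_counts: Sequence[float], eff_lengths: Sequence[float]) -> List[float]:
    """Convert estimated counts to transcripts per million (TPM).

    Each count is divided by the transcript's effective length, and the
    resulting rates are rescaled so that they sum to 1,000,000.
    """
    if len(est_counts) != len(eff_lengths):
        raise ValueError("counts and lengths must have the same length")
    if any(length <= 0 for length in eff_lengths):
        raise ValueError("effective lengths must be positive")
    rates = [count / length for count, length in zip(est_counts, eff_lengths)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    return [rate / total * 1_000_000 for rate in rates]


# Made-up example: three transcripts with different counts and lengths.
example = tpm([100.0, 300.0, 50.0], [1000.0, 1500.0, 500.0])
print([round(value, 1) for value in example])
```

Because TPM values are normalized within a sample, they are convenient for comparing relative expression across genes, whereas the estimated counts are the quantity passed to the differential expression step.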
Sperm counts, motility and swim-up assay. Following dissection from the body cavity, the two cauda epididymides were dissected and nicked three times to allow sperm to swim out. Cauda were placed in 1 ml of Human Tubal Fluid (HTF) (Millipore) at 37 °C and rotated in a 37 °C incubator for 10 min. Cauda were removed and a 100 μl aliquot was used for pre-swim-up baseline sperm counts and motility assessment. For swim-up assays, the remaining portion of sperm in HTF was then removed and placed in a new tube using wide-bore tips. Sperm were centrifuged for 5 min at 400×g and the supernatant discarded. The pellet was re-suspended in 1 ml of fresh 37 °C HTF and centrifuged for 5 min at 400×g. The supernatant was removed and 1 ml of fresh 37 °C HTF was carefully overlaid on top of the pellet. The tube was then placed at a 45° angle in a 37 °C incubator for 1 h, after which the top 800 μl containing motile sperm was removed and placed in a new tube. All aliquots of sperm used for counting were diluted 1:10 in H2O and counted using a hemocytometer. Counts were performed blind with four technical replicates per mouse. Sperm counts were calculated by taking the average number of sperm from the four technical replicates per mouse. Motility was assessed by counting ≈ 200 sperm for each replicate on a hemocytometer across at least 5 frames. Sperm were considered motile if they showed both progressive movement and signs of flagellar activity 36,37 . Percent motility for visual assessment was calculated by dividing the number of motile sperm by the total sperm counted and multiplying by 100. Motility assayed via swim-up was calculated by dividing the post-swim-up count by the pre-swim-up count and multiplying by 100. All analyses between groups were performed with an unpaired two-tailed Student's t-test, unless otherwise noted.
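The calculations described above can be sketched in Python using only the standard library. This is a minimal illustration, not the analysis code used in the study: all function names and example values are hypothetical, and the t-test shown is the classical equal-variance (pooled) form, returning the t statistic and degrees of freedom rather than a p-value.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def sperm_count(replicates: Sequence[float]) -> float:
    """Average the technical replicate counts for one mouse."""
    return mean(replicates)


def percent_motile(motile: int, total: int) -> float:
    """Visual motility: motile sperm / total sperm counted x 100."""
    if total <= 0:
        raise ValueError("total must be positive")
    return motile / total * 100


def swim_up_percent(post_count: float, pre_count: float) -> float:
    """Swim-up motility: post-swim-up count / pre-swim-up count x 100."""
    if pre_count <= 0:
        raise ValueError("pre-swim-up count must be positive")
    return post_count / pre_count * 100


def students_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Unpaired Student's t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two observations")
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    if pooled == 0:
        raise ValueError("pooled variance is zero; t is undefined")
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb)), df


# Made-up per-mouse values for illustration only.
wild_type = [72.0, 68.5, 75.1]
mutant = [70.2, 66.9, 73.4]
t_stat, dof = students_t(wild_type, mutant)
print(f"t = {t_stat:.3f}, df = {dof}")
```

A two-tailed p-value would then be obtained from the t distribution with the returned degrees of freedom, for example with a statistics package.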
Fecundity and fertility. Three 4930567H17Rik ∆CDS/Y and three Mageb5 ∆Arm∆CDS/Y mice, and equal numbers of wild-type littermate controls, were each repeatedly mated with two CD1 females. Litter size was recorded, and the sex of each offspring was determined with sex-genotyping PCR primers (primers 13, 14, Table S4) specific to the X- and Y-linked gene Ube1 (Ubiquitin-like modifier activating enzyme 1), as previously described 8 .
Sperm head morphology assessment. To assess sperm head morphology, 25 μl of the pre-swim-up sperm aliquot above was placed on a slide and allowed to dry. Cells were fixed for 10 min in 500 μl of 4% PFA (diluted in PBS). Slides were rinsed twice in PBS for 5 min and left to dry. Slides were stained with Vectashield with DAPI (Vector Laboratories) under a 22 × 40 mm cover slip and imaged using an Olympus UPlanSApo 100× oil objective on an Olympus BX61 equipped with a Hamamatsu Orca-ER camera and Excelitas X-Cite 120LED fluorescence illuminator. Nucleus detection and morphology assessment were performed using the default settings of the custom plugin "Nuclear_Morphology_Analysis_1.20.0_standalone" for the image analysis software ImageJ 22 . ~100 sperm heads from each genotype were blindly selected, imaged, and input into the software. Default edge detection settings were used, and sperm heads were manually inspected to ensure that all sperm were accurately detected and that only sperm were selected by the software. Sperm from Mageb5 ∆Arm∆CDS/Y mice were not originally oriented correctly, so the top and bottom vertical border tag was placed manually for the dataset 38 .

Data availability
The data underlying sections of this article are available in NCBI's Sequence Read Archive at https://www.ncbi.nlm.nih.gov/sra and can be accessed under BioProject number PRJNA748373 and accession numbers SRR15198217-SRR15198228. The rest of the data underlying this article are available in the article and in the online supplementary material.