Mayer–Rokitansky–Küster–Hauser (MRKH) syndrome is characterized by congenital absence of the uterus and the upper two-thirds of the vagina and occurs in approximately 1 in 4500 newborn girls. Patients with MRKH syndrome have a normal female karyotype (46,XX) and apparently normal development of secondary sex characteristics. They typically present with amenorrhea during adolescence, and the condition results in difficulties with sexual intercourse and in infertility [1]. The syndrome is classified into two types: type I involves only uterovaginal aplasia, whereas type II involves uterovaginal aplasia with concomitant defects, such as renal malformations, skeletal malformations, hearing defects, and rare cardiac and digital anomalies [1]. The etiology of this syndrome remains elusive because most cases are sporadic and the underlying causes are potentially heterogeneous. However, familial aggregation is occasionally observed, and several investigators have reported genetic involvement, with a likely autosomal dominant mode of inheritance with incomplete penetrance [1,2]. Genomic rearrangements, such as deletions and duplications, have frequently been identified by array comparative genomic hybridization (CGH) analyses; however, a consistent pattern has not been observed, and a responsible gene has not been pinpointed, except for LHX1 and HNF1B on chromosome 17q12 [1,3,4,5]. Because large families that permit linkage analysis and positional cloning are mostly lacking in MRKH syndrome, a candidate gene approach has been used to detect causal genes; thus far, only WNT4 has been identified, in MRKH syndrome with hyperandrogenism [1].

Recent advances in sequencing technologies have propelled genetic analyses of both rare and common diseases at the whole-genome level. We performed genome-wide single-nucleotide polymorphism (SNP) array analyses to detect chromosomal rearrangements and exome analyses to identify causal variants in trio families and sporadic cases.

Informed consent was obtained in accordance with the Declaration of Helsinki, and the study protocol was approved by the Institutional Review Boards of Tokai University School of Medicine (12I-03, 17I-32), the National Institute of Genetics (nig1608), and Yamaguchi University Graduate School of Medicine (H29-229). Each participant gave written informed consent for the collection of samples and subsequent analyses. A total of 17 individuals (ten patients and seven unaffected family members) were enrolled in this study. Six of the ten patients had type I MRKH syndrome, and the other four had type II MRKH syndrome (Table S1). We also analyzed samples from the healthy parents (mother and father) of two patients (A5 and A7) and from healthy family members (mother, father, and sister) of one patient (A6). We designated the families of A5, A6, and A7 as Fam01, Fam02, and Fam03, respectively (Figure S1).

Genomic DNA was extracted from the peripheral white blood cells of each individual using the QIAamp DNA Mini Kit (Qiagen) according to the manufacturer’s instructions. Exonic regions were captured using the SureSelect Human All Exon V5 Kit (Agilent) and sequenced on a HiSeq 2500 (Illumina) in 100- or 150-base paired-end mode. Sequencing data were mapped to the human reference genome (hg19) using a standard pipeline of BWA, Picard, and GATK, as previously described [6,7]. Variant calling and genotyping were performed using GATK HaplotypeCaller [6]. De novo variants in the three families (Fam01, Fam02, and Fam03) were called using the trio calling module of VarScan [8]. Allele frequencies and functional information for the variants were annotated with ANNOVAR [9]. For further selection of variants, we also used allele frequencies in the Japanese population from the Human Genetic Variation Database (HGVD) [10]. After selection, the allele frequencies of the remaining variants were manually reviewed in the Integrative Japanese Genome Variation Database (iJGVD) [11]. SNP array experiments using Infinium OmniExpress-24 BeadChips (Illumina) were conducted for nine of the ten patients with MRKH syndrome (all except A5). A PennCNV-ParseCNV analysis pipeline [12,13] was applied to detect structural variations. To identify disease-specific copy number variations (CNVs), we used in-house CNV data from individuals without MRKH syndrome. For Fam01 (A5, C8, and C9), an array-CGH experiment was conducted using the Agilent Human Genome CGH Microarray 1M Kit.
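The comparison of patient CNV calls against in-house control CNVs can be illustrated with a simple interval-overlap filter. The following Python sketch is only a conceptual illustration and is not the PennCNV-ParseCNV procedure itself, which relies on its own statistical models. The `CNV` record, the function names, and all coordinates below are hypothetical and were chosen purely for the example.

```python
from typing import List, NamedTuple


class CNV(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive (assumed convention)
    end: int    # 1-based, inclusive
    kind: str   # "dup" or "del"


def overlaps(a: CNV, b: CNV) -> bool:
    """True if two calls of the same type share at least one base."""
    return (
        a.chrom == b.chrom
        and a.kind == b.kind
        and a.start <= b.end
        and b.start <= a.end
    )


def disease_specific(patients: List[CNV], controls: List[CNV]) -> List[CNV]:
    """Keep patient calls that overlap no control call."""
    return [p for p in patients if not any(overlaps(p, c) for c in controls)]


# Hypothetical toy calls, not data from this study.
patient_calls = [
    CNV("chr1", 1_000, 5_000, "del"),
    CNV("chr3", 20_000, 24_000, "dup"),
]
control_calls = [
    CNV("chr1", 4_500, 9_000, "del"),   # overlaps the chr1 patient deletion
    CNV("chr3", 20_000, 24_000, "del"),  # same region, but a different CNV type
]

specific = disease_specific(patient_calls, control_calls)
print(specific)
```

In this toy example only the chr3 duplication is retained, because the control call in that region is a deletion rather than a duplication. Whether calls of different types should be treated as distinct is a design choice made here for illustration.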

Three structural variations, each shared by two patients, were identified by the PennCNV-ParseCNV pipeline (Fig. 1). One amplified structural variation (chr2:10,886,097–10,890,204) was found in A1 and A8. A4 and A6 shared a deleted region at chr8:135,062,170–135,065,947, and A7 and A9 shared a deleted region at chr16:5,608,833–5,613,997. The identified structural variations were small and did not involve coding regions. None of them overlapped previously reported rearrangements or the WNT4 region. Because two patients shared the same structural variation at each site, these variations are likely to be polymorphisms. Therefore, we shifted our focus to single-nucleotide variants (SNVs) and small insertions and deletions (indels) and evaluated causality using whole-exome analyses, mainly in trio-based families.

Fig. 1: Three common structural variations of MRKH syndrome.

(Top) A duplication (4108 bp) on chromosome 2 (chr2p25.1). A1 and A8 had a common duplication at chr2:10,886,097–10,890,204. (Middle) A deletion (3778 bp) on chromosome 8 (chr8q24.22). A4 and A6 had a common deletion at chr8:135,062,170–135,065,947. (Bottom) A deletion (5165 bp) on chromosome 16 (chr16p13.3). A7 and A9 had a common deletion at chr16:5,608,833–5,613,997. Black arrows and red or blue vertical lines indicate the positions of the structural variations in the chromosome ideograms. All structural variations were visualized using the UCSC Genome Browser (hg19).
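The sizes quoted in the legend follow from the listed hg19 coordinates if both endpoints are counted as part of the interval. The short Python check below assumes 1-based, inclusive coordinates; the dictionary labels and the function name are illustrative only.

```python
# hg19 coordinates as listed in the legend of Fig. 1.
regions = {
    "chr2 duplication (A1, A8)": (10_886_097, 10_890_204),
    "chr8 deletion (A4, A6)": (135_062_170, 135_065_947),
    "chr16 deletion (A7, A9)": (5_608_833, 5_613_997),
}


def span_bp(start: int, end: int) -> int:
    """Length of an interval whose start and end are both included."""
    return end - start + 1


sizes = {name: span_bp(start, end) for name, (start, end) in regions.items()}
for name, size in sizes.items():
    print(f"{name}: {size} bp")
```

Under this convention the three intervals span 4108 bp, 3778 bp, and 5165 bp, which matches the sizes given in the legend.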

Ten patients, including three patients from trio-based families, and seven unaffected individuals from their families were subjected to whole-exome analyses. The mean depth of exome sequencing was 71× (range, 49× to 98×). Coverage of ≥10× was achieved for more than 90% of all the samples (Table S2). A total of 228,135 variants were detected across the 17 samples, with an average of 95,293 variants per sample (range, 91,835 to 97,514). After selection of exonic and splicing variants with an allele frequency ≤5%, an average of 284 variants per sample (range, 247 to 348) remained. These variants were then surveyed in the Japanese public database (HGVD), and those with an allele frequency <0.1% were selected (Tables S3 and S4). We did not find any variants in the previously reported MRKH-associated genes (LHX1, HNF1B, and WNT4). To identify de novo variants in the three families, we conducted trio-based de novo variant calling. After manual curation of the variants using IGV [14], we identified three de novo variants in three genes (MYCBP2, NAV3, and PTPN3) (Table S5). We also identified a distinct variant of MYCBP2 in a sporadic case (A1). All the de novo variants and the sporadic MYCBP2 variant were heterozygous. The positions of the variants and the functional predictions for the variants in MYCBP2 (A1 and A5), NAV3 (A6), and PTPN3 (A7) are summarized in Table 1. None of the identified variants was registered in the most recent versions of the databases, and all were predicted to be deleterious, particularly by MutationTaster and fathmm-MKL. MYCBP2 is located on chromosome 13q22.3, a region in which neither SNVs nor chromosomal aberrations have yet been reported in MRKH syndrome. MYCBP2 encodes an E3 ubiquitin–protein ligase and has not been implicated in uterine or vaginal development [15].
Reduced expression of MYCBP2 has been observed in patients with acute lymphoblastic leukemia [16], and a deletion mutation in this gene has been associated with a developmental abnormality of the optic disc that results in a rare inherited vision defect [17]. NAV3 is located on chromosome 12 and belongs to the neuron navigator family, which is expressed in the nervous system [18]. Because NAV3 variants have not been implicated in disease and only one patient carried the variant, NAV3 was not considered to be a cause of MRKH syndrome. PTPN3 is located on chromosome 9 and belongs to a tyrosine phosphatase family [19]. PTPN3 has multiple functions in cellular processes such as differentiation and growth. Somatic mutation of PTPN3 can promote cell proliferation and cholangiocarcinoma [20], but the involvement of this gene in MRKH syndrome is not clear.
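The variant-selection logic described in this section (exonic or splicing variants, allele frequency ≤5%, HGVD frequency <0.1%, followed by a trio-based de novo criterion) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the ANNOVAR or VarScan implementation: VarScan applies its own statistical trio model, whereas this sketch uses a plain genotype rule. The `Variant` record, the gene labels, the treatment of variants absent from a database as rare, and the restriction to heterozygous calls in the child are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    gene: str                  # hypothetical gene label
    region: str                # e.g. "exonic", "splicing", "intronic"
    af: Optional[float]        # general allele frequency; None if not registered
    hgvd_af: Optional[float]   # HGVD allele frequency; None if not registered
    child: str                 # genotypes written as "0/0", "0/1", or "1/1"
    mother: str
    father: str


def passes_frequency_filter(v: Variant, af_max: float = 0.05,
                            hgvd_max: float = 0.001) -> bool:
    """Exonic/splicing, allele frequency <= 5%, HGVD frequency < 0.1%."""
    if v.region not in ("exonic", "splicing"):
        return False
    if v.af is not None and v.af > af_max:
        return False
    if v.hgvd_af is not None and v.hgvd_af >= hgvd_max:
        return False
    return True  # unregistered variants are treated as rare (assumption)


def is_de_novo_candidate(v: Variant) -> bool:
    """Heterozygous in the child, homozygous reference in both parents."""
    return v.child == "0/1" and v.mother == "0/0" and v.father == "0/0"


# Hypothetical toy variants, not data from this study.
variants = [
    Variant("GENE_A", "exonic", None, None, "0/1", "0/0", "0/0"),     # novel, de novo
    Variant("GENE_B", "exonic", 0.0004, None, "0/1", "0/1", "0/0"),   # rare, inherited
    Variant("GENE_C", "exonic", 0.20, 0.18, "0/1", "0/0", "0/0"),     # common
    Variant("GENE_D", "intronic", None, None, "0/1", "0/0", "0/0"),   # non-coding
]

candidates = [
    v.gene for v in variants
    if passes_frequency_filter(v) and is_de_novo_candidate(v)
]
print(candidates)
```

Only the hypothetical GENE_A variant survives both steps here. In practice, candidates selected in this way would still require manual inspection of the read alignments, as was done with IGV in this study.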

Table 1 Functional predictions of the three de novo variants and the MYCBP2 variant in A1

Genetic approaches to identifying the causes of MRKH syndrome have so far had little success. Thus, epigenetic and environmental factors underlying the disease also deserve attention. Indeed, a discordant phenotype in twin sisters has been reported, indicating the heterogeneous nature of the syndrome. Therefore, both genetic and nongenetic factors need to be investigated for a full understanding of MRKH syndrome.

We identified two MYCBP2 mutations in two patients (A1 and A5), one of which (in A5) was a de novo mutation. The functional involvement of MYCBP2 in the etiology of MRKH syndrome needs to be further investigated.