Over the past decade, the use of chromosome microarray analysis (CMA) as a clinical diagnostic tool has generated a massive amount of copy number variation (CNV) data, most of which concern large CNVs with sizes above the arbitrarily set reporting limits, e.g., several hundred thousand base pairs (kilobases/kb) to millions of base pairs (megabases/Mb). Although some of these CNVs have already been implicated in neurodevelopmental disorders and contiguous gene syndromes,1–3 many remain variants of uncertain clinical significance, with critical regions or causative genes yet to be identified. In clinical CMA, CNVs smaller than the analysis cutoff (e.g., 25–50 kb, 25 markers; referred to as ‘small CNVs’ hereafter) have either not been analyzed or not been reported. Thus far, only a few CMA studies have described the detection of such CNVs in human genetic disorders, e.g., 2.2–49 kb in idiopathic autism and/or intellectual disability,4 psoriasis,5 Crohn's disease,6 autism spectrum disorder7 and congenital heart disease.8 Genome sequencing data have also been used to find CNVs as small as 100 base pairs (bp) to 2.8 kb in patients with autism spectrum disorder and congenital heart disease.8–10 Despite such reports, small CNV maps for clinical populations remain underrepresented in the existing databases and literature. In contrast, publicly available high-resolution CNV maps (with CNVs as small as 50 bp) have been constructed and recently updated,11–13 providing comparable resources for the appropriate annotation and interpretation of small CNVs detected in clinical settings. With the availability of these genomic data, we are now poised to build small CNV maps that could aid in narrowing the genomic gap in human genetic disorders, including disorders of sex development (DSDs).
DSDs constitute a group of rare and heterogeneous conditions characterized by atypical manifestations of chromosomal, gonadal and phenotypic sex, resulting in differences in the development of the urogenital and reproductive structures. DSDs are classified into three major groups: XX DSD, XY DSD and sex chromosome DSD.14,15

In a significant portion of DSD cases, the genetic etiology remains elusive. Previous CMA studies in DSD have described a few small CNVs (0.2–50 kb),16–20 most of which have remained uncharacterized. Here, we focused our retrospective CMA on previously unanalyzed small CNVs and other genomic regions, including small regions of homozygosity (ROHs), imprinting and position effects. We used these high-resolution data to investigate salient genomic features, e.g., CpG islands (CGIs), structural regions, functional domains and repeat elements, as well as the DNA methylation, histone modification and RNA expression profiles of these predicted regulatory regions in the normal testis and ovary.

Materials and methods

Ethics statement

This retrospective study was approved by the Institutional Review Board of the Washington University in St Louis School of Medicine.

Patient population

Our retrospective study was performed using CMA data obtained from 52 patients referred to our center from 2008 to 2015 for mild to severe DSDs, atypical secondary sexual development, Turner syndrome or stigmata and sex chromosome anomalies. XY and XX sex chromosome complements accounted for 63% and 21% of the patients, respectively (Table 1). Karyotypes with sex chromosome anomalies were also observed: 6% with loss of X; 4% with XXY; and 6% with mosaicism, X/XX (n=2) and X/XY (n=1).

Table 1 Summary of CNVs and ROH overlapping with DSD genes in our patient cohort

Clinical CMA platform and analysis software

Clinical CMA was performed on two Affymetrix (Affymetrix, Inc., Santa Clara, CA, USA) platform types (Table 1): Whole Genome-Human SNP Array 6.0 (40% of patients; 2008–2012; patients D1–D20, D53) and CytoScan HD (60% of patients; 2012–2015; D21–D52). The latter has a higher resolution (1 probe/880 bp; 2.6 million probes: 1.9 million CN and non-polymorphic probes and 750,000 single nucleotide polymorphism (SNP) probes) than the former (approximately 1.8 million CN and SNP probes), and it encompasses almost all known OMIM and RefSeq genes. CMA data were compared with data from control individuals (270 HapMap and 100 in-house) and from a phenotypically normal cohort of 380 individuals (284 HapMap samples and 96 blood samples, from 186 females and 194 males). Copy number analysis was performed using the Affymetrix Genotyping Console and Chromosome Analysis Suite software programs. All of the genomic linear positions were based on human genome reference version UCSC Genome Assembly GRCh37/hg19.

Retrospective CMA approach

A total of 334 genes obtained from published clinical reports, reviews, CMA studies and the OMIM (Online Mendelian Inheritance in Man) database were included in constructing a customized CMA ‘DSD Track’ (Supplementary Table S1; listed by size). These genes have been associated with the development of adrenal, urogenital and other reproductive structures, as well as with sexual dimorphism. Their sizes and chromosome locations (in bp coordinates) were also noted. CMA data from 52 patients were analyzed using a 1 kb filter and 1 probe (versus clinical limits of 25–50 kb, 25 probes). Gene-desert regions known to cis-regulate genes were used to build a ‘position effect track.’ Known imprinted genes were also collated and used to build the ‘imprinting track.’ To filter the CNVs and ROHs overlapping with genes (DSDs, imprinted and position effects), we used the ‘overlap map’ function of the Chromosome Analysis Suite software (Affymetrix), which separated the CNVs and ROHs of interest from the remaining intervals not overlapping with DSD genes.
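The filtering step described above can be sketched in a few lines. The following is a minimal illustration only, not the vendor's ‘overlap map’ implementation: the `Region` class, the `filter_calls` function and the track entry `GENE_A` (with its coordinates) are hypothetical names and values introduced here, and closed 1-based coordinates are assumed. The first and third calls reuse the coordinates and marker counts given in the Figure 1 legend.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    chrom: str
    start: int      # 1-based, inclusive (assumed convention)
    end: int        # 1-based, inclusive
    label: str
    markers: int = 0

    @property
    def size(self) -> int:
        return self.end - self.start + 1


def overlaps(a: Region, b: Region) -> bool:
    # Two closed intervals on the same chromosome share at least one base.
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def filter_calls(calls, track, min_size=1_000, min_markers=1):
    """Keep calls that pass the size/marker filter and overlap a track entry."""
    kept = []
    for call in calls:
        if call.size < min_size or call.markers < min_markers:
            continue
        genes = [g.label for g in track if overlaps(call, g)]
        if genes:
            kept.append((call, genes))
    return kept


# Hypothetical track entry; the coordinates are placeholders, not a curated gene model.
track = [Region("chrX", 30_322_500, 30_327_500, "GENE_A")]
calls = [
    Region("chrX", 30_322_475, 30_328_115, "D28_gain", markers=39),  # Figure 1a
    Region("chrX", 30_323_000, 30_323_400, "tiny_call", markers=2),  # below 1 kb
    Region("chr3", 41_243_113, 41_255_684, "D14_gain", markers=59),  # no track entry on chr3
]
kept = filter_calls(calls, track)
```

With these inputs only `D28_gain` survives: the second call is shorter than 1 kb and the third does not overlap any entry in the one-gene track.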

Probe coverage

The majority of the genes considered in this study with or without CNVs and/or ROHs showed good or proportional probe coverage (Supplementary Tables S3, S4, S5, S6 and S23). Although no SNP or CN probe coverage was seen in approximately 12% (n=24), multiple probes flanked these genes. Probe coverage was also verified from the Affymetrix website.

In silico investigations

All of the small CNVs and ROHs that overlapped with DSD genes were further analyzed for detailed genomic and epigenomic features using publicly available databases. We aligned the CNVs detected in our study with the following tracks from the UCSC Genome Browser:21 RefSeq genes; common CNV data (inclusive and stringent data) by Zarrei et al.,13 obtained from DGV (Database of Genomic Variants); the ClinGen database, which curates benign, uncertain clinical significance likely benign (UCS LB), UCS likely pathogenic (UCS LP), UCS and curated pathogenic genes; DECIPHER;22 Segmental Duplications; CGIs; UniProt/SwissProt Protein and Secondary Annotations for regions and domains; and Transcription Factor ChIP for predicted transcription factor binding sites. DECIPHER was also investigated for CNVs overlapping with our data, including their sizes, sex chromosome complements, types of CNVs and proportions of small CNVs with DSD phenotypes.

We also aligned our small CNV data set with the WashU Epigenome Browser23 to investigate the following features: RefSeq genes; CGIs; repeat elements; chromHMM of the ovary for predicted regulatory regions; H3K27Ac and H3K4me1 of the ovary (histone regions predicted to be regulatory); methylC-seq of the fetal testis and ovary for DNA methylation (CGI or regulatory); RNA-seq of fetal and adult ovaries for predicted RNA expression; and cis (local) and trans (interchromosomal) interactions (human foreskin fibroblast, high-content genome conformation capture; density map or circle plot representation).

We also used UniProt and BGee to correlate molecular function with the biological processes involving DSD genes, and GeneMania and STRING to examine the genetic and physical interactions of these genes.


Small CNV discoveries

We designed a customized CMA track targeting 334 genes associated with the development of urogenital and other reproductive structures (referred to as ‘DSD track’ and ‘DSD genes’ hereafter) (Supplementary Table S1). The majority of these genes map to chromosomes X, 9 and 1, and approximately 60% of them are ≤50 kb (Supplementary Figures S1A and S1B), e.g., 897 bp (SRY) and 5.9 kb (SOX9). Using 1 kb as the lowest limit of detection, our retrospective CMA of 52 unrelated patients revealed a total of 12,576 CNVs: 6,475 losses (51%) and 6,141 gains (49%) (Table 1). We used the ‘overlap map’ function of the analysis software to expedite the filtration of 301 CNVs overlapping with 68 DSD genes from all of the others, reducing the number of variants for analysis by approximately a hundredfold (Tables 1 and 2, Supplementary Table S2). CN losses accounted for 53% of the CNVs (n=162; 75% CN=1, 25% CN=0), while 47% were CN gains (n=141; 42% CN=3, 42% CN=4, 16% CN=2). Using the haploinsufficiency index scoring from DECIPHER,24 74% (n=121) of the total CN losses involving 36 genes were predicted to be haploinsufficient, in the range of 0.06% (ESR1) to 21.2% (CAMK1D) (Supplementary Table S2). From the recently generated small CNV map obtained from normal populations,13 we compared the 21 homozygous losses (CN=0) involving NOTCH1, SUPT3H, WWOX and DMD detected in our study with the null CNVs consisting of insignificant, enriched, paralogous CN variable regions, and none were found to be overlapping.
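The loss and gain tallies by copy-number state can be illustrated with a short sketch. It assumes that a call is classified by comparing its copy-number state with the copy number expected at that locus (2 for autosomes, 1 for a single X or Y), which is one way a CN=2 call could count as a gain; this reading is an assumption of the example and is not stated in the text. The function names and the input calls are hypothetical.

```python
from collections import Counter


def classify(cn_state: int, expected: int = 2) -> str:
    """Label a call relative to the expected copy number at its locus."""
    if cn_state < expected:
        return "loss"
    if cn_state > expected:
        return "gain"
    return "neutral"


def summarize(calls):
    """Return {type: (count, percent of all losses and gains)}."""
    counts = Counter(classify(cn, expected) for cn, expected in calls)
    total = counts["loss"] + counts["gain"]
    return {
        kind: (counts[kind], round(100 * counts[kind] / total))
        for kind in ("loss", "gain")
    }


# Hypothetical (copy-number state, expected copy number) pairs.
example_calls = [(1, 2), (0, 2), (3, 2), (2, 1), (4, 2), (1, 2)]
summary = summarize(example_calls)
```

Here the pair `(2, 1)` stands for a CN=2 call at a locus where one copy is expected and is therefore counted as a gain.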

Table 2 Summary of DSD genes with recurrent and overlapping CNVs and ROH

Our CMA approach uncovered 284 (94%) small CNVs involving 63 DSD genes (approximately 93%), and 16 (approximately 24%) of these genes were found to be ≤50 kb (Supplementary Table S2). The highest frequencies of CNVs were observed in DMD, WWOX and DHRSX (Table 1) and on chromosomes X, 16 and 9, while none were found on 11, 18, 21 and Y (Supplementary Figure S1A). Eight genes exhibited 14 CNVs with breakpoints disrupting exons (1.8–29.1 kb; 8–37 markers) that involve critical structural and functional regions or domains and biological processes related to sex development (Supplementary Table S3). Thirteen CNVs revealed breakpoints outside of an intact gene, 10 of which included 8 small genes, e.g., a 5.6 kb gain (NR0B1) (Figure 1a) to a 64 kb gain (GNRH1) (Supplementary Table S2). Identical small CNVs involving 25 genes detected in two or more patients were observed (Table 1): four different recurrent CNVs in DMD (1.4–6.3 kb; 10–40 markers), three in WWOX (7.2–8.8 kb; 12–24 markers), two each in ESR2 (1.8–4.8 kb; 8–26 markers), NOTCH3 (2.8–19.8 kb; 9–46 markers) and TP63 (4.6–8.3 kb; 12–32 markers), and one in 19 genes (1.2–37 kb; 2–66 markers). CNVs of variable sizes (1.1–190 kb) that involved the same intron or exon were observed in 32 genes, while non-recurrent and non-overlapping CNVs were detected in 25 genes.
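Identifying identical CNVs shared by two or more patients amounts to grouping calls by their coordinates and type. The sketch below shows one simple way to do this; the function name, the tuple layout and all patient identifiers and coordinates are hypothetical and serve only to illustrate the grouping.

```python
from collections import defaultdict


def recurrent_cnvs(calls, min_patients=2):
    """Group calls by (chrom, start, end, type) and keep those seen in
    at least `min_patients` distinct patients."""
    patients_by_cnv = defaultdict(set)
    for patient, chrom, start, end, kind in calls:
        patients_by_cnv[(chrom, start, end, kind)].add(patient)
    return {
        cnv: sorted(patients)
        for cnv, patients in patients_by_cnv.items()
        if len(patients) >= min_patients
    }


# Hypothetical calls: (patient, chromosome, start, end, type).
example_calls = [
    ("P1", "chr16", 1_000, 8_200, "loss"),
    ("P2", "chr16", 1_000, 8_200, "loss"),
    ("P2", "chr16", 1_000, 8_200, "loss"),   # duplicate record, counted once
    ("P3", "chr16", 1_000, 9_000, "loss"),   # different end point, not identical
    ("P4", "chr16", 1_000, 8_200, "gain"),   # same interval, different type
]
recurrent = recurrent_cnvs(example_calls)
```

Exact coordinate matching is a deliberately strict criterion; overlapping CNVs of variable size would need a separate interval-overlap comparison.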

Figure 1

(a) CN gains overlapping with an intact NR0B1 (5 kb; Xp21.2). UCSC Browser—CN gains (blue) overlapping with an intact NR0B1 (exons 1–2) in 2 patients (Px); D28 (XY) with underdeveloped genitalia (5.6 Kb, 39 markers; chrX:30,322,475–30,328,115) and D30 (XXY) with ambiguous genitalia, undescended testes and small ureters (4.9 kb, 32 markers; chrX:30,322,476–30,327,408). DGV (Zarrei et al.,13)—no overlapping CNVs found. ClinGen—CN gains and losses categorized as uncertain clinical significance (UCS), UCS likely benign (UCS LB), UCS likely pathogenic (UC LP) and a pathogenic loss. DECIPHER—losses in patients with variable findings. Segmental Dups (SegDups)—none. CpG Islands—exon 1. Protein Annotation (Prot Annot)—ligand binding, AF-2 motif, LXXLL motifs 1–3, AA tandem repeat and repeats 1–4. Transcription Factor (Txn Factor)—exons 1 and 2. WashU Epigenome Browser. Repeat Masker (Repeats)—intron 1: LINE (orange), DNA transposon (blue), simple repeat, microsatellite (SR/MS; gold), satellite repeat (SR; brown). ChromHMM—active transcription start site (aTSS) (red); repressed polycomb (gray). Histone Modification—H3K27Ac (acetylation) and H3K4me1 (methylation) marks in the ovary. DNA Methylation (methylC-seq t, ov)—differential profile in exon 1—some DNA in fetal ovary and none in fetal testis). RNA Expression (RNA-Seq fo, o)—differential profile—more in adult ovary than in fetal ovary. (b) CN loss overlapping with partial region of NR5A1 (26 kb; 9q33.3). UCSC Browser—CN loss (red) involving exon 1-intron 4 in a patient (Px D9; XY) with hypospadias (18 kb; 5 markers; chr9:127,261,935–127,279,829). DGV (Zarrei et al.,13)—no entry. ClinGen—UCS likely benign (LB) gain, pathogenic loss, UCS LP gain. DECIPHER—loss/gain (Note: truncated exon 1 of NR6A1 at the 3' end of the CNV). Segmental Dups (SegDups)—none. CpG Islands—intron 1-exon 2-intron 3. Protein Annotation (Prot Annot)—dimerization, ligand binding, phosphorylation, cross-linking, DNA binding, Zn finger, acetylation. 
Transcription Factor (Txn Factor)—exons 1-intron 4, upstream of NR5A1. WashU Epigenome Browser. Repeat Masker (Repeats)—exon 1-intron 4—SINEs (red), LINE (orange), LCRs (dark brown), SR/MS (gold); upstream of NR5A1—SINEs (red), LINE (orange), LCRs (dark brown), SR/MS (gold), DNA transposon (blue). ChromHMM—exon 1-intron 4, active transcription start site (aTSS) (red); enhancers (yellow), strong transcription site (green), weak TS (dark green); upstream of NR5A1—aTSS, enhancers, weak TS. Histone Modification—some H3K4me1 methylation and H3K27Ac acetylation marks in ovary. DNA Methylation (methylC-seq t, ov) in 5 regions involving NR5A1 (numbered and underlined)—absence of DNA methylation in fetal testes in exons 2–3 (1) and intron 1 (3) overlapping with an enhancer; absence of methylation in intron 1 overlapping with an aTSS in both fetal testis and ovary (2); absence of methylation in ovary in exon 1 (3) and upstream of NR5A1 overlapping with an enhancer (5). RNA Expression (RNA-Seq fo, o)—similar RNA expression in adult and fetal ovaries. Local Interactions (CCHiC-HFF1-M-R1/NS-R1)—various regions within and upstream of NR5A1 are locally interacting (purple). (c) CN gain involving intron 1 of CTNNB1 (41 kb, 3p22.1). UCSC Browser—CN gain (blue) overlapping with intron 1 of CTNNB1 in a patient (Px D14, XY) with ambiguous genitalia (12.6 kb, 59 markers; chr3:41243113–41255684). DGV (Zarrei et al.,13)—no overlapping CNVs found. ClinGen—no overlapping CNVs found. DECIPHER—losses in patients with variable findings. Segmental Dups (SegDups)—none. CpG Islands—none. Protein Annotation (Prot Annot)—none. Transcription Factor (Txn Factor)—several. WashU Epigenome Browser. Repeat Masker (Repeats)—LINEs (orange), SINEs (red), DNA transposons (blue). ChromHMM—active transcription start site (aTSS) (red); enhancers (yellow), weak TS (dark green). Histone Modification—some H3K4me1 methylation and H3K27Ac acetylation marks in ovary. 
DNA Methylation (methylC-seq t, ov)—differential DNA methylation in fetal testes and fetal ovaries. RNA Expression (RNA-Seq fo, o)—differential profile in adult and fetal ovaries. Local Interactions (CCHiC-HFF1-M-R1/NS-R1)—various regions within and upstream of NR5A1 are locally interacting (purple arcs). Interchromosomal Interactions (Interchrom inset)—interactions of intron 1 of CTNNB1 with other chromosome loci (purple lines; details in Supplementary Figure 2A). CNV, copy number variation; LCRs, low copy repeats; LINEs, long interspersed elements; SINEs, short interspersed elements.

Genomic and epigenomic landscape

Of the 301 CNVs overlapping with DSD genes, 36% (n=107) and approximately 60% (n=184) accounted for CNVs with exonic and intronic breakpoints, respectively. Thirteen exonic CNVs involved exon 1 of 9 genes (2.5–95.7 kb; 5–47 markers) (Supplementary Table S4), and 34 intronic CNVs involved intron 1 of 15 genes (1.2–29 kb; 3–95 markers) (Supplementary Table S5). Of the remaining non-exon/intron 1 CNVs (n=225/75%; 1.1–507 kb; 2–208 markers), 54% were losses and 36% involved exons (Supplementary Table S6). The majority of exon 1 CNVs (12/13) and the intron 1 of NR5A1 overlapped with functional regions coding for signal peptides, for integral membrane structures and for DNA or ligand binding, as well as with active transcription start sites with unmethylated or differentially methylated CGIs predicted to regulate differential gene expression in normal fetal testes and in fetal and adult ovaries (Figure 1a, b). Other regions upstream or downstream of a gene, within a gene or at its 3'-UTR were also found to include CGIs and other differentially methylated enhancers or active transcription start sites predicted to regulate spatiotemporally the expression of other cis/trans genes or regions25 (Figure 1c; Supplementary Figures S2A–D).
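Assigning a breakpoint to an exon or intron requires only the exon coordinates of the gene. The following sketch assumes a plus-strand gene with exons listed in ascending order and labels a CNV ‘exonic’ when either breakpoint lies in an exon and ‘intronic’ when both lie in introns; that labeling rule, the function names and the toy gene model are assumptions made for illustration.

```python
def locate(pos, exons):
    """Return 'exon N', 'intron N' or 'outside' for a position.

    `exons` is a list of (start, end) pairs in ascending order, numbered
    from 1 on the plus strand (assumed convention).
    """
    for i, (start, end) in enumerate(exons, start=1):
        if start <= pos <= end:
            return f"exon {i}"
        # exons[i] is the next exon because enumerate starts at 1.
        if i < len(exons) and end < pos < exons[i][0]:
            return f"intron {i}"
    return "outside"


def classify_cnv(cnv_start, cnv_end, exons):
    """Classify a CNV by where its two breakpoints fall."""
    left, right = locate(cnv_start, exons), locate(cnv_end, exons)
    if left.startswith("exon") or right.startswith("exon"):
        label = "exonic"
    elif left.startswith("intron") and right.startswith("intron"):
        label = "intronic"
    else:
        label = "other"
    return label, left, right


# Toy three-exon gene model (hypothetical coordinates).
toy_exons = [(100, 200), (500, 600), (900, 1000)]
```

For example, `classify_cnv(150, 300, toy_exons)` places the left breakpoint in exon 1 and the right breakpoint in intron 1, so the call is labeled exonic.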

Segmental duplications and repeat elements

Segmental duplications (segdups) are low copy repeats prone to non-allelic homologous recombinations, often resulting in recurrent, large CNVs.20,26–30 These segdups were found in 16 genes that involved 108 CNVs, with segdups located in DMD and DHRSX showing the highest CNV frequencies (Supplementary Table S7). Smaller repeat elements have been exapted as binding sites for transcription factors, acting as alternative and differentially methylated promoters or enhancers31–33 that can dysregulate the host's gene expression machinery. Non-allelic homologous recombinations involving such repeats have also been described to result in recurrent but smaller deletions/duplications in various human genetic disorders.30,34,35 All of the CNVs detected in our study showed variable distribution of these smaller repeat elements: RNA repeats, AT-rich regions, simple repeats or microsatellites, DNA transposons and retroelements, such as long terminal repeats, short interspersed elements, including Alu and mammalian wide interspersed repeats, and long interspersed elements (Supplementary Table S7; Figure 1, Supplementary Figure S2). These intervals were also found to harbor CGIs, enhancers and active transcription start sites, which exhibit differential DNA methylation and histone modification, as well as predicted RNA expression in the normal testis or ovary.

Region of homozygosity (ROH)

SNP-based CMA can detect genomic copy-neutral (CN=2) stretches of allelic homozygosity. This study detected a total of 3,421 ROHs, and approximately 10% (n=333) overlapped with 92 DSD genes (Table 2; Supplementary Tables S2 and S8). Chromosomes with the most ROHs involving DSD genes were 15, 12 and X (Supplementary Figure S1A). Recurrent ROHs were seen in 58% (n=53) of these genes, with PTPN11 showing the highest frequency (n=29 patients) (Table 2, Supplementary Figure S2). Of the ROHs overlapping with DSD genes (71% were small), 174 showed no corresponding CNVs (exclusively ROH) (Supplementary Table S8). Interestingly, eight X-linked genes seen in nine XX patients with DSD and/or Turner stigmata were found to have copy-neutral ROHs (no CNV) (Supplementary Table S9).
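The idea of an ROH as a run of consecutive homozygous SNP calls can be made concrete with a simplified sketch. This is not the algorithm used by the analysis software; it assumes copy-neutral regions, genotypes encoded as "AA", "AB" or "BB", and hypothetical thresholds (`min_markers`, `min_length_bp`) chosen only for the example.

```python
def find_roh(snps, min_markers=3, min_length_bp=1):
    """Return (start, end, n_markers) for runs of homozygous genotypes.

    `snps` is a list of (position, genotype) pairs sorted by position on
    one chromosome; any heterozygous call ("AB") ends the current run.
    """
    runs = []
    current = []

    def close(run):
        if len(run) >= min_markers and run[-1] - run[0] + 1 >= min_length_bp:
            runs.append((run[0], run[-1], len(run)))

    for pos, genotype in snps:
        if genotype in ("AA", "BB"):
            current.append(pos)
        else:
            close(current)
            current = []
    close(current)  # flush a run that reaches the end of the input
    return runs


# Hypothetical genotype calls.
example_snps = [
    (100, "AA"), (200, "BB"), (300, "AA"), (400, "AB"),
    (500, "AA"), (600, "BB"), (700, "AB"),
    (800, "BB"), (900, "BB"), (1000, "AA"), (1100, "AA"),
]
roh = find_roh(example_snps)
```

A production caller would normally tolerate occasional genotyping errors within a run; this sketch does not.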

Recently, small ROHs detected by CMA were characterized in 46,XX SRY-negative testicular DSD patients.17 These regions involved 27 genes and were variable in size and smaller than the clinical cutoff (i.e., 5–10 Mb), ranging from 200 bp (upstream of ZWINT) to 634 kb (INPP4B, USP38). Comparing these reported regions with our data (Supplementary Table S10), 19 small ROHs (1–8 Mb; 207–2,647 markers) involving 10 genes were also found in 16 of our patients (XX, XY, sex chromosome anomaly). Six of these genes were smaller than 50 kb.

Position effect

We customized a “position effect” CMA track to detect gene-desert regions that regulate the expression of a gene at a nearby or distant locus,36–38 e.g., the upstream region of SOX9.16,39–41 Our study uncovered four recurrent and overlapping small CN losses (1.6–6 kb; 8–17 markers) approximately 1.2 Mb downstream of SOX9 in four patients (Supplementary Table S11). These intervals included DNA transposons (Charlie15A, MER103C), retroelements (Alu and mammalian wide interspersed repeats) and differentially methylated enhancers predicted to interact with other cis/trans regions (Supplementary Figure S2D). In addition to the SOX9 region, 86 CNVs and 44 ROHs involving 15 and 11 regions, respectively, were also detected. The regions with the highest frequencies were SHH (n=11) for CNVs and LCT (n=20) for ROHs. Approximately 52% of the CNVs that involved 15 regions were ≤50 kb (2–81 markers). Although many of the genes regulated by these regions are expressed in adrenal, urogenital and reproductive structures, their functional impact on sex development has yet to be determined.
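Describing a CNV as lying a given distance upstream or downstream of a gene depends on the gene's strand. The sketch below shows the arithmetic under the assumption of closed coordinates on a single chromosome; the function name and all coordinates are hypothetical and were chosen so that the gap equals 1.2 Mb.

```python
def relative_position(cnv_start, cnv_end, gene_start, gene_end, strand="+"):
    """Return (relation, distance_bp) of a CNV with respect to a gene.

    'upstream' means on the 5' side of the gene and 'downstream' on the
    3' side, taking the gene's strand into account.
    """
    if cnv_start <= gene_end and gene_start <= cnv_end:
        return "overlapping", 0
    if cnv_end < gene_start:                 # CNV at lower coordinates
        distance = gene_start - cnv_end
        relation = "upstream" if strand == "+" else "downstream"
    else:                                    # CNV at higher coordinates
        distance = cnv_start - gene_end
        relation = "downstream" if strand == "+" else "upstream"
    return relation, distance


# Hypothetical plus-strand gene and a small CNV 1.2 Mb beyond its 3' end.
example = relative_position(2_205_000, 2_209_000, 1_000_000, 1_005_000, "+")
```

The same interval would be reported as upstream if the gene were on the minus strand, which is why the strand must be recorded alongside the coordinates in a position effect track.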

Imprinted genes

We investigated 34 genes known to be imprinted (list obtained from the GeneImprint database) for ROHs and CNVs (Supplementary Table S12). Approximately 65% of such genes were ≤50 kb. SNRPN is one of these genes, which has been implicated in Prader-Willi syndrome with or without DSD. Using an “imprinting CMA track,” 118 CNVs and 20 ROHs were found to involve 22 and 14 genes (paternal, maternal or isoform-independent), respectively. The CNVs ranged from 1.6 kb to 2.4 Mb (2–728 markers), 73% of which were ≤50 kb. The ROHs ranged from 1 to 2.5 Mb (229–969 markers). GRB10 showed the highest CNV frequency (n=19 patients). Concurrent gain and loss were exhibited by some genes, including GRB10 (13 patients) and SNRPN (7 patients). These genes are expressed in the adrenal gland, as well as the urogenital and reproductive structures, and their contributions to DSDs remain to be explored.

Linked genes and gene interactions

We explored the chromosome distribution of DSD genes, and we found several adjacent genes with recurrent ROH or CN gains (Supplementary Table S13), e.g., HSD3B1/HSD3B2/NOTCH2 (1p11.2-p12; ROH) in three patients with hypospadias (H/E), FGF8/CYP17A1 (10q24.32-q24.33; ROH) in three patients also with H/E and NR0B1/DMD (Xp21.2; gain) in two patients with genital anomalies.

We utilized different public databases to investigate the interactions (e.g., genetic and physical) of the genes included in this study (Supplementary Table S14). Multiple interacting genes (DSD, imprinted, position) with CNVs and ROHs were revealed, with CTNNB1 showing the greatest number of interacting partners (Figure 2). Many other genes within the same pathway or network have also been found to exhibit local (cis) or interchromosomal (trans) gene interactions42 (Supplementary Table S14).
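Ranking genes by their number of interacting partners is a degree count on an undirected network. A minimal sketch follows; the edge list uses placeholder gene names and does not represent any interaction reported by GeneMANIA or STRING, and the function name is hypothetical.

```python
from collections import defaultdict


def interaction_degrees(edges):
    """Count distinct interaction partners per gene in an undirected network."""
    partners = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        partners[a].add(b)
        partners[b].add(a)
    return {gene: len(p) for gene, p in partners.items()}


# Placeholder edges; not real interaction data.
example_edges = [
    ("GENE_A", "GENE_B"),
    ("GENE_A", "GENE_C"),
    ("GENE_C", "GENE_A"),   # same pair in reverse, counted once
    ("GENE_A", "GENE_D"),
    ("GENE_B", "GENE_C"),
]
degrees = interaction_degrees(example_edges)
hub = max(degrees, key=degrees.get)
```

Using sets rather than counters keeps duplicated or reversed edges, which are common when merging several databases, from inflating the partner count.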

Figure 2

Genetic (a) and physical (b) interactions, pathway associations (c) and co-expression (d) of CTNNB1 with 125 other genes (DSD, imprinted, position effect) with CNVs and ROH (GeneMANIA; see Supplementary Table 24 for the complete list of genes). CNV, copy number variation; DSD, disorders of sex development. Genes with predicted interactions/associations are shown as black circles, and those without interaction/association are shown as gray circles.

Overlapping cases in DGV, ClinGen and DECIPHER

We compared our data with the most recent small CNV map (≥50 bp; 2,057,368 variants) constructed from a normal cohort (n=2,647 subjects from 23 studies),13 and approximately 75% (225/301) of our CNVs were not overlapping (Supplementary Table S15; Figure 1a–c; Supplementary Figure S2D). We also searched the ClinGen database for comparable size CNVs overlapping with our data, and approximately 16% were documented as benign or likely benign, 7 CNVs (26–223 kb) involving 5 genes (CREBBP, CHD7, EP300, NR5A1, SUPT3H) were documented as pathogenic, and most (approximately 80%) showed no entry (Supplementary Table S15; Figure 1; Supplementary Figure S2). Approximately 61% of small CNVs that are common in healthy populations (DGV) or categorized as benign (ClinGen) were found to include salient genomic and epigenomic features characteristic of tissue-specific gene regulation and expression (Supplementary Tables S3, S4, S5, S6). The DECIPHER database also revealed CNVs with variable sizes overlapping with our data, and 19 DSD genes were documented to have small CNVs (33–50 kb), including CHD7, ANOS1 and WWOX, in patients with various DSD conditions (Supplementary Table S16). Although small ROHs ( 4 kb) (Supplementary Table S17) overlapping with DSD genes and CNVs overlapping with position effect regions ( 4.2 kb) (Supplementary Table S18) and imprinted genes ( 1.8 kb) (Supplementary Table S19) were also documented in DECIPHER, none were reported in patients with DSD.

Small CNVs and ROHs correlated with DSD phenotype

Pathogenic CMA findings were reported originally in only 13% of patients, and all of the cases involved aneuploidy, polyploidy, mosaic sex chromosome complement or large structural anomalies (Table 1). Variants of uncertain clinical significance were reported in 23% of patients, none of which were recurrent. Normal CMA accounted for 62% of patients, approximately 12% of whom revealed ROHs in one or more chromosomes. CNVs and/or ROHs involving DSD genes were detected in all of the patients regardless of sex chromosome complement (XX, XY, aneuploid, mosaic) and original CMA finding (normal, pathogenic, variants of uncertain clinical significance).

Using our high-resolution and gene-targeted CMA approach, this retrospective study revealed individual genes with concurrent CNVs (loss/loss, gain/gain, loss/gain combinations) or multiple genes with concurrent CNVs and/or ROHs (Table 2). Some of these genes were found to be recurrent in two or more patients with similar DSD phenotypic findings (summarized in Supplementary Tables S20 and S21). The 28 patients with ambiguous genitalia with or without hypospadias/epispadias (H/E) revealed CNVs or ROHs involving 22 genes. All of the CNVs involving CXCL12, VAMP7 and NOTCH1 were detected in seven patients with scrotal anomalies, undescended testes or micropenis/clitoromegaly, respectively. All of the ROHs involving AHRR and PTGDS were detected in four patients with undescended testes, while MAPK3 and NOTCH4 were seen in four patients with micropenis/clitoromegaly. All three patients with H/E revealed CNVs overlapping with a region known to regulate MAF (16q23.2) by position effect (Supplementary Table S11). Gains (71 kb; 21 markers) involving two imprinted linked genes (IGF2, IGF2AS on 11p15.5) were seen in two patients with H/E, undescended testes and micropenis/clitoromegaly, while ROHs (2.1 Mb; 297 markers) involving two imprinted, linked genes (BLCAP and NNAT at 20q11.23) were seen in two patients with ambiguous genitalia and H/E (Supplementary Table S12). The other DSD phenotypic findings were seen in a wide range of patients with variable CNV and ROH frequencies.
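Finding genes that recur among patients who share a phenotypic finding can be expressed as a two-level grouping. The sketch below is illustrative only: the function name, the record layout, and all patient identifiers, phenotype labels and gene names are hypothetical placeholders and do not reproduce the cohort data.

```python
from collections import defaultdict


def recurrent_genes_by_phenotype(patients, min_patients=2):
    """For each phenotype, list genes with CNVs/ROHs in at least
    `min_patients` patients who share that phenotype."""
    carriers = defaultdict(lambda: defaultdict(set))
    for patient_id, record in patients.items():
        for phenotype in record["phenotypes"]:
            for gene in record["genes"]:
                carriers[phenotype][gene].add(patient_id)
    return {
        phenotype: sorted(
            gene for gene, ids in genes.items() if len(ids) >= min_patients
        )
        for phenotype, genes in carriers.items()
    }


# Hypothetical records; identifiers and labels are placeholders.
example_patients = {
    "P1": {"phenotypes": ["finding_1"], "genes": ["GENE_A", "GENE_B"]},
    "P2": {"phenotypes": ["finding_1", "finding_2"], "genes": ["GENE_A"]},
    "P3": {"phenotypes": ["finding_2"], "genes": ["GENE_B", "GENE_C"]},
}
shared = recurrent_genes_by_phenotype(example_patients)
```

In this toy input only `GENE_A` is seen in two patients with `finding_1`, and no gene recurs among the patients with `finding_2`.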


Identification of small CNVs that include causative or candidate genes has recently been gaining more attention from genomics investigators. Large CNVs detected by CMA from cohorts of patients with similar phenotypic findings have been aligned to identify small regions of overlap that contain candidate genes.2,40,43–46 Data from genome sequencing have also been utilized to identify small CNVs.9,10,47,48 These studies have further demonstrated that, regardless of size, the clinical relevance of the detected CNV is determined by gene content and other layers of evidence.49,50 Current clinical CMA practice follows arbitrarily set limits such that CNVs smaller than these cutoffs are typically not analyzed or reported.

In this study, we uncovered small CNVs from unanalyzed CNVs previously obtained from DSD patients. Although relaxing our analysis filter to its 1 kb limit of detection generated an extensive list of variants, our customized DSD gene-targeted CMA track expeditiously filtered CNVs (with proportional probe coverage) overlapping with DSD genes, thereby allowing us to circumvent the daunting task of investigating all of the variants. The majority of the small CNVs detected in our investigations were not documented in comparable resources obtained from normal cohorts.13 Although the DECIPHER and ClinGen databases have documented overlapping CNVs, most of them were found to be relatively larger in size, and very few small CNVs were documented in patients with DSDs. These findings suggested that these variants are not common in healthy populations and remain underrepresented in clinical populations. Many deletions or duplications of partial regions of DSD genes have been described in various disorders, some of which were reported in DSDs (Supplementary Table S22). All of these findings, however, warrant further studies to determine whether these ‘rare’ CNVs are specific to either syndromic or isolated DSDs or to various clinical populations, including those at risk for gonadoblastoma or germ cell neoplasms and especially those CNVs involving genes implicated in cancer.

Our CMA approach demonstrated the ability to detect not only CNVs that are ≤50 kb but also relevant genes that are smaller than the clinical CMA cutoff size. For example, small CN gains (4.9 and 5.6 kb) involving an intact small NR0B1 gene (approximately 5 kb in size) were detected in two patients with DSDs. These small CNVs and genes would be masked in clinical settings unless the gene was one of the contiguous genes included within a reportable large CNV. We also identified not only potentially relevant private or overlapping small CNVs but also several CNVs that were recurrent in two or more patients with similar phenotypic findings. The breakpoints of these CNVs involved different regions or 5′ or 3′ ends, were intragenic or were upstream or downstream of a gene. High-resolution data obtained from this study provided us with a close-up view of the genomic details of deleted or duplicated structural and functional domains or regions. Although there are limited epigenomic data for the normal testis and ovary and none for atypical tissues, we were able to obtain DNA methylation, histone modification and RNA expression profiles in silico. Exon 1 CNVs exhibited the typical features of a 5′ promoter, such as the presence of unmethylated or differentially methylated CGIs,51 regulatory enhancers, active transcription start sites and regions coding for signal peptides. 
Intron CNVs were usually not reported in the past; however, their roles in various diseases have been gaining significance in recent years, e.g., disease-associated SNP variations were found in non-coding regions defined by regulatory H3K27Ac marks.52 The differential H3K27Ac marks52,53 and methylation and RNA expression profiles observed in some of the detected exon/intron 1 CNVs, as well as in other CNVs with orphan CGIs and enhancers acting as remote or cryptic promoter regions, suggested tight spatiotemporal testis/ovary-specific gene regulation.25 Importantly, we found that most of the CNVs involved genes that are highly expressed in adrenal, urogenital and other specific reproductive structures, which are key effector or recipient sites responsible for DSD phenotypes. In addition, these CNVs occur in genes that perform critical functions in sex development, including germ cell migration and development and gonadal and genital development, and in androgen and other signaling pathways. Whether CNVs or ROHs involving these regions and genes in these affected tissues would significantly result in improper peptide transport, aberrant protein structure, dysregulation of expression and ultimately a DSD phenotype remains to be elucidated.

Exaptation of transposable elements into promoter or enhancer regions has been correlated with spatiotemporal or tissue-specific gene regulation of the host genome.31,32,34,35,54 Our study focused on some of these elements within these small CNVs. Similar to segdups, these elements have been known to result in recurrent or overlapping small CNVs via non-allelic homologous recombination, and they have been implicated in various human disorders, e.g., the GTF2IRD1 and GTF2I genes distal to the ELN (7q11.23) locus that harbor a DNA transposon CHARLIE-like region in Williams syndrome.30 Short interspersed elements are non-long terminal repeat retroelements that are typically found in GC- and gene-rich, early replicating euchromatin, as well as in cytogenetic G-banded light bands.31,55,56 It has been demonstrated that CNVs involving clusters of Alu short interspersed elements35 dysregulate gene expression within promoter regions, alter splicing and shift reading frames, resulting in aberrant phenotypic findings.34,57 Although CNVs with or without differential methylation patterns for these DNA transposons and other repeats have been implicated in several human disorders,54,58–60 the significance of the repeat element-harboring CNVs that also showed regulatory genomic and epigenomic profiles remains to be pursued in DSDs.

An ROH region is typically reported in clinical settings if it meets the 5 Mb size cutoff and harbors a causative gene. However, regardless of how large the detected ROH interval is, such regions are further investigated for genes that might be inherited in an autosomal recessive pattern or involved in a disorder associated with uniparental disomy or imprinting. If such a gene is detected, reflex studies, such as methylation or microsatellite analysis, are performed on the gene of interest. Our findings from gene-targeted approaches demonstrated that causative genes within an ROH smaller than the cutoff size can readily be revealed. Although copy neutral ROHs have been described in many neoplastic conditions,61 they are yet to be investigated in DSD. In a recent CMA study in patients with 46,XX testicular DSD, small ROHs harboring candidate DSD genes were uncovered.17 Similarly, our study revealed small CNVs and ROHs overlapping with some of the genes previously reported, as well as with DSD genes and imprinted genes. Although it has yet to be determined whether the CNVs and ROHs involving these genes would result in DSDs, our data provided several avenues for further investigations in DSD patients, including those at risk for gonadoblastoma or germ cell neoplasms.
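The contrast between size-based and gene-targeted ROH reporting can be illustrated with a minimal sketch. It is not the pipeline used in this study: the interval coordinates, the gene names (`GENE_A`, `GENE_B`), the function names and the treatment of the 5 Mb cutoff as inclusive are all assumptions made for illustration only.

```python
from dataclasses import dataclass

# Assumed reporting cutoff of 5 Mb, treated here as inclusive (an assumption).
SIZE_CUTOFF_BP = 5_000_000


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    name: str = ""

    @property
    def size(self) -> int:
        return self.end - self.start


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def size_based_calls(rohs, cutoff=SIZE_CUTOFF_BP):
    """Keep only ROHs at or above the size cutoff."""
    return [r for r in rohs if r.size >= cutoff]


def gene_targeted_calls(rohs, genes):
    """Keep ROHs of any size that overlap at least one gene of interest."""
    hits = []
    for roh in rohs:
        names = [g.name for g in genes if overlaps(roh, g)]
        if names:
            hits.append((roh, names))
    return hits


# Hypothetical genes of interest and ROH calls (coordinates are invented).
genes = [
    Interval("chr1", 1_000_000, 1_050_000, "GENE_A"),
    Interval("chr2", 20_000_000, 20_100_000, "GENE_B"),
]
rohs = [
    Interval("chr1", 900_000, 1_200_000, "roh_small_genic"),    # 300 kb, overlaps GENE_A
    Interval("chr2", 30_000_000, 36_000_000, "roh_large"),      # 6 Mb, no listed gene
    Interval("chr3", 5_000_000, 5_400_000, "roh_small_empty"),  # 400 kb, no listed gene
]

print([r.name for r in size_based_calls(rohs)])
print([(r.name, names) for r, names in gene_targeted_calls(rohs, genes)])
```

In this toy example the size filter alone retains only the 6 Mb interval, whereas the gene-targeted filter recovers the 300 kb interval overlapping a gene of interest, which is the situation described above for ROHs below the cutoff size.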

Our CMA track specific for position effect regions easily detected small CNVs and/or ROHs in gene-desert regions. SOX9 is one of the genes that is regulated by upstream regulatory elements via position effect, and disruption of this region results in craniofacial, skeletal and sex development anomalies.39,41,62,63 On the basis of several reported cases, a more refined interval upstream of SOX9 was described as the smallest critical region for XX (68 kb) and XY (32.5 kb) DSDs.16,39,40 The recurrent, small deletions detected downstream of SOX9 in our DSD patients overlapped with the reported 1.3 Mb position effect region described in a patient with campomelic dysplasia (with DSD).62 Whether the much smaller CNVs overlapping with position effect regions that we revealed in this study would result in DSDs through mechanisms similar to those described for SOX9 remains to be elucidated. It is also possible that other genes mapped within the interval downstream of SOX9 contribute to the DSD phenotype, or that they do not contribute at all; this relationship also remains to be investigated.

Mapping the relative distributions of CNVs and ROHs overlapping with DSD genes revealed that CNVs were most frequently observed on chromosomes X, 16 and 9, and ROHs were commonly found on chromosomes 15, 12 and X. Neither CNVs nor ROHs were observed on chromosomes 21 or Y. Some of these CNVs involve dose-sensitive haploinsufficient genes that typically exhibit highly conserved coding and promoter regions, and they are tissue-specific and embryonically expressed. These genes are known to interact strongly with other haploinsufficiency genes in the same network.24 As genomic tools have become more sophisticated, more combinations of variants obtained from CMA and/or sequencing, both common and rare, have been revealed, e.g., compound heterozygote, triploinsufficiency and CNV/SNV combination, and some have been implicated in human diseases. Whether CNVs or ROHs involving DSD genes (including linked, imprinted or position effect genes) that are within the DSD network result in pathway dysregulation and subsequently in isolated or syndromic DSDs remains to be elucidated and mapped. The chromosome distribution of DSD genes and their associated CNVs and ROHs, as well as their local and interchromosomal interactions, could provide a better picture of the genomic landscape and perhaps some of the tools needed to draw the genomic and epigenomic maps of the spatiotemporal regulation of interactions, pathways and networks in DSDs.
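A per-chromosome tally of this kind can be sketched as follows. The sketch is purely illustrative: the input records are toy values that do not reproduce the study data, and the function names and the `(kind, chromosome)` record layout are assumptions.

```python
from collections import Counter

# Chromosome labels used to report chromosomes with no calls.
ALL_CHROMS = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]


def per_chromosome_counts(calls):
    """Count calls per chromosome, separately for CNVs and ROHs.

    `calls` is an iterable of (kind, chromosome) pairs with kind in {"CNV", "ROH"}.
    """
    counts = {"CNV": Counter(), "ROH": Counter()}
    for kind, chrom in calls:
        counts[kind][chrom] += 1
    return counts


def top_chromosomes(counter, n=3):
    """Return the n chromosomes with the most calls, most frequent first."""
    return [chrom for chrom, _ in counter.most_common(n)]


def chromosomes_without_calls(counts):
    """Return chromosomes carrying neither a CNV nor an ROH call."""
    seen = set(counts["CNV"]) | set(counts["ROH"])
    return [c for c in ALL_CHROMS if c not in seen]


# Toy records (hypothetical; not the distribution reported in this study).
calls = (
    [("CNV", "chr1")] * 3
    + [("CNV", "chr2")] * 2
    + [("CNV", "chr3")]
    + [("ROH", "chr4")] * 2
    + [("ROH", "chr5")]
)

counts = per_chromosome_counts(calls)
print(top_chromosomes(counts["CNV"]))
print(top_chromosomes(counts["ROH"]))
print(chromosomes_without_calls(counts))
```

Keeping CNV and ROH counts in separate counters makes it straightforward to rank the two variant classes independently and to list chromosomes on which neither was observed.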

Our retrospective CMA identified recurrent small CNVs and ROHs overlapping with DSD genes in patients with sex chromosome aneuploidy or mosaicism, variants of uncertain clinical significance and clinically normal CMA. These findings highlight the value of our gene-targeted and disease-specific CMA approach for identifying small CNVs overlapping with genes or small regions of overlap that might be relevant to DSDs, and they support its use in improving the diagnostic power and utility of CMA. In addition, these findings provided shorter intervals for detailed genomic and epigenomic analyses. The results obtained from this study will contribute to the accumulating resources for small CNVs, as well as to narrowing the genomic gap in both normal and clinical populations. We anticipate that this study will provide many avenues for downstream genomic and epigenomic research and that our DSD-specific CMA approach will advance our understanding of DSDs, as well as provide a model analysis tool for finding small CNVs in other human disorders.