Duplication events downstream of IRX1 cause North Carolina macular dystrophy at the MCDR3 locus

Autosomal dominant North Carolina macular dystrophy (NCMD) is believed to represent a failure of macular development. The disorder has been linked to two loci, MCDR1 (chromosome 6q16) and MCDR3 (chromosome 5p15-p13). Recently, non-coding variants upstream of PRDM13 (MCDR1) and a duplication including IRX1 (MCDR3) have been identified. However, the underlying disease-causing mechanism remains uncertain. Through a combination of sequencing studies on eighteen NCMD families, we report two novel overlapping duplications at the MCDR3 locus, in a gene desert downstream of IRX1 and upstream of ADAMTS16. One duplication of 43 kb was identified in nine families (with evidence for a shared ancestral haplotype), and another one of 45 kb was found in a single family. Three families carry the previously reported V2 variant (MCDR1), while five remain unsolved. The MCDR3 locus is thus refined to a shared region of 39 kb that contains DNAse hypersensitive sites active at a restricted time window during retinal development. Publicly available data confirmed expression of IRX1 and ADAMTS16 in human fetal retina, with IRX1 preferentially expressed in fetal macula. These findings represent a major advance in our understanding of the molecular genetics of NCMD and provide insights into the genetic pathways involved in human macular development.

North Carolina macular dystrophy (NCMD) is a rare autosomal dominant disorder in which there is abnormal development of the macula, a crucial structure of the central retina responsible for central vision and colour perception 1 . Understanding the genetics of rare developmental macular conditions is key for unravelling the mechanism of development of this structure that is found only in higher primates within mammals 1 . NCMD shows fully penetrant inheritance and is considered a non-progressive disorder with a wide range of phenotypic manifestations, usually affecting both eyes symmetrically 2,3 . Phenotypic presentation varies from mild cases with drusen-like deposits covering the macular region but with little or no visual impairment, to severe cases with marked central chorioretinal atrophy and poor vision. Although generally non-progressive, complications associated with choroidal neovascularization can contribute to visual deterioration.
The molecular genetics of NCMD has been extensively investigated with the disorder being mapped to chromosome 6q16 (MCDR1, MIM:136550) in multiple families of different ethnic origins since the early 1990s [4][5][6][7][8][9][10] . A similar phenotype has been assigned to a second locus at 5p15-p13 (MCDR3, MIM:608850) 3,11 . Interestingly, several studies reported evidence for ancestral haplotypes at the MCDR1 locus 2,12,13 . Early sequencing studies of the two disease intervals failed to identify exonic disease-causing variants 2,14 . More recently, three novel single nucleotide variants (SNVs) were identified in 11 families at the MCDR1 locus, within a DNase1 hypersensitivity site (DHS), in the non-coding interval between PRDM13 and the neighbouring overlapping genes CCNC/TSTD3 15 . Two tandem duplications including the full coding region of PRDM13, with some additional upstream and downstream sequence included, were also identified 15,16 . One MCDR3-linked family of Danish origin 3 was found to carry a 900 kb tandem duplication 15 that includes the entire coding sequence of IRX1. However, duplications of IRX1 have been observed in several normal individuals from the Database of Genome Variants 15,17 and the significance of this reported variant is uncertain. Thus, the causative mechanism at the 5p15-p13 NCMD locus remains unclear.
In this report we present a combination of genomic investigations in a cohort of 18 NCMD families. The aim of this study was to identify any causative molecular changes and mechanism of disease in these families.

Results
Families and brief clinical phenotype description. Eighteen families with phenotypes consistent with a diagnosis of NCMD were included in the study (Table 1 and Supplementary Fig. S1). Four families were previously reported: suggestive linkage at the MCDR3 locus has been recently described for family 1 14 , family 2 was originally reported to be linked to the MCDR3 locus 11 , and families 12 and 13 were linked to MCDR1 7, 9 , with family 13 recently found to carry the SNV V2 upstream of PRDM13 15 . All families (mostly of small size) showed autosomal dominant inheritance and had at least one individual with Grade 3 disease. DNA samples from a total of 56 affected and 33 unaffected family members were available for genetic analysis. Figure 1 shows fundus autofluorescence and optical coherence tomography (OCT) images for selected individuals from families 2 and 3. Individual IV:5 from family 2 presents with a well demarcated, relatively symmetrical, bilateral area of macular chorioretinal atrophy, while individual IV:3 from family 3 shows a mild form of disease with relatively symmetrical, bilateral hyperfluorescent drusen-like deposits concentrated in the macular region.
Haplotype sharing analysis can exclude or suggest genetic mapping at known NCMD loci.
Haplotype sharing analysis was carried out using the Homozygosity Haplotype (HH) method 18 to search for shared identical-by-descent (IBD) chromosomal segments among affected individuals within each family. This analysis was performed in those families for which Illumina single-nucleotide polymorphism (SNP) array data were available for more than one affected family member (families 1-5 and 12-13). The 6q16 MCDR1 locus was excluded in four families, including the two previously MCDR3-linked families 1 14 and 2 11 and unreported families 3 and 4 (Supplementary Figs S2-S5). Family 5 showed evidence for haplotype sharing at many regions across the genome, including both the 6q16 and 5p15-p13 loci ( Supplementary Fig. S6). The two previously reported MCDR1-linked families 12 7 and 13 9 were confirmed with evidence for a Region with a Conserved HH (RCHH) at the 6q16 locus, and not at the 5p15-p13 locus (Supplementary Figs S7-S8). Two additional NCMD families shown to carry previously reported SNV upstream of PRDM13 at the MCDR1 locus. All families, except families 1-4 for which linkage at the 6q16 locus had been excluded via haplotype sharing analysis, were tested with Sanger Sequencing for the three previously reported SNVs (V1-V3) upstream of PRDM13 15 . In addition to the previously reported V2 family 13 15 , two more NCMD families were found to harbour the variant V2 (family 11 and the previously described MCDR1-linked family 12 7 ).

Array-based comparative genomic hybridization (aCGH) uncovers duplications at the MCDR3 locus in three NCMD families.
To investigate the MCDR3 locus for the presence of structural variants (SVs), an aCGH experiment using 10,000 probes spanning the region at chr5:11882-10140073 (GRCh37/hg19) was performed in three affected individuals from families 1-3 which did not show linkage at the 6q16 locus (Supplementary Figs S2-S4). All three families were found to harbour heterozygous duplications of approximately 45 kb, downstream of IRX1 and upstream of ADAMTS16 (Fig. 2a). The duplications were found to be located in the minimal overlapping regions chr5:4391880-4434888 (GRCh37/hg19) in family 1 and chr5:4397221-4440150 (GRCh37/hg19) in families 2 and 3. These SVs were not seen in 16 control individuals included in the same aCGH experiment, nor were they present in whole genome sequencing (WGS) data from 650 individuals with inherited retinal disease 19 or in publicly available population copy number variant (CNV) data (CNV browser) 20 .
WGS identifies four more NCMD families with duplications at the MCDR3 locus. Thirteen affected individuals from families 1-7 underwent WGS. Graphical visualisation of individual paired-end reads using Integrative Genomics Viewer (IGV) 21,22 confirmed the presence of heterozygous tandem duplications in families 1-3 (Fig. 2b). Precise breakpoint coordinates were identified from coverage changes, split reads and chimeric reads. Family 1 had a 45158 bp duplicated region (GRCh37/hg19 chr5:4391377-4436535) and families 2 and 3 shared an identical 43515 bp tandem duplication (GRCh37/hg19 chr5:4396927-4440442), overlapping the first identified SV by 85% of the sequence (GRCh37/hg19 chr5:4396925-4436534). Subsequently, members from families 4-7 were also found to carry the same 43 kb duplication. Polymerase chain reaction (PCR) primers were designed to amplify the novel sequence across the breakpoint between duplicated copies (Table 2, Fig. 3) and used to confirm the predicted breakpoints and assess segregation of the two variants in all available affected and unaffected members of families 1 and 2 (Fig. 3, Table 1 and Supplementary Fig. S1). PCR was then used to genotype the available affected individuals from families 3-7 and confirmed the presence of a band in all affected individuals tested (Table 1 and Supplementary  Fig. S1).
Genotyping reveals three additional previously unmapped NCMD families with duplications at the MCDR3 locus. The remaining 8 unmapped families were tested with the established PCR assay for both duplications, and 3 of them (families 8-10) were also found to carry the 43 kb duplication (Table 1 and Supplementary Fig. S1). Thus, a total of 9 not knowingly related families were shown to harbour the same 43 kb tandem duplication at the MCDR3 locus. Five affected members available from the remaining 5 families did not carry either of the two novel duplications.   Haplotype sharing analysis suggests presence of ancestral haplotypes at the MCDR1 and MCDR3 loci. We hypothesized that finding the same 6q16 SNV and 5p15 duplication with identical breakpoint in 3 and 9 families respectively, could be due to two different shared ancestral haplotypes suggestive of a common founder, in keeping with previous reports on other 6q16 NCMD families 5,12,13,15 . Therefore, haplotype sharing analysis was performed using available Illumina SNP array data from 14 affected individuals in 3 families carrying the 6q16 V2 variant (families 11-13) and 14 affected individuals in 6 families carrying the 5p15 43 kb duplication (families 2-7). Using a cut-off of 2.0 cM and 2.5 cM respectively, the results confirmed that all the genotyped 6q16 individuals collectively shared a RCHH of approximately 2.5 Mb from GRCh37/hg19 coordinate chr6:98962591 (rs150396) to chr6:101468591 (rs1321204) at the MCDR1 locus, and all the genotyped 5p15 individuals collectively shared a RCHH of approximately 0.9 Mb from GRCh37/hg19 coordinate chr5:4327455 (rs155354) to chr5:5210050 (rs1560063) at the MCDR3 locus (Supplementary Tables S1-S2 and Supplementary Figs S9-S10).

Discussion
We report two distinct heterozygous tandem duplications at the MCDR3 locus in 30 affected individuals from 10 NCMD families. The two novel SVs overlap the previously described duplication found in a single NCMD family of Danish origin 15 and further refine the 5p15 NCMD locus to a shared region of 39 kb in a gene desert downstream of IRX1 and upstream of ADAMTS16 (800 kb and 693.9 kb from the respective transcription start sites, Fig. 4). We postulated that the 39 kb shared region could harbour cis-acting elements that contribute to the fine tuning of gene expression during macular development, affecting target gene expression spatially, temporally and/or quantitatively. Publicly available platforms were queried for informative data on gene expression and chromatin accessibility in relevant tissue types. A dataset screening for gene expression in fetal retina confirmed high expression of IRX1 at 19-20 weeks of gestation in the macula, and medium expression levels in other regions ( Supplementary Fig. S11). In contrast, ADAMTS16 had medium expression levels throughout the retina 23,24 . Although no role in retinal pathophysiology has been described for ADAMTS16, the gene has high sequence similarity to ADAMTS18 which has been previously associated with retinal disease 25 . Overall, the data suggest that the pattern and/or refined spatial dosage and timing of expression of the transcription factor IRX1 may be important in macular development. A second dataset provided information on open chromatin conformation using DNase-accessible sequencing in fetal retina tissues at 5 stages from gestation day 72 to 125 (~10 to 18 weeks) 26 . Different sites were identified to be open/active within the 39 kb shared region at four out of five time points (~10-15 weeks of gestation) available during retinal development (Table 3). Interestingly, one of the sites was active during three developmental stages and the remaining four sites were functionally active as two overlapping pairs. At the last time point (day 125, ~18 weeks), all sites were inactive/closed. In the context of human macular development, the sites are active during the period where retinal progenitor cells are proliferating and differentiating towards photoreceptor fate 27 ; by week 14 of gestation, cells of the central retina exit mitosis 27 , corresponding to the period where DHSs are turning off.
As mentioned, the MCDR1 locus on chromosome 6q16 is associated with variants sited within a DHS, which suggests that aspects of macular development may be highly gene dosage sensitive. Exploring the function and precise target of such regulatory domains in both loci will be essential for understanding the disease mechanism of NCMD and investigating its potential role in the context of normal macular development. The graded Figure 4. Schematic representation of the MCDR3 locus which is refined to a 39 kb shared genomic region (GRCh37/hg19 chr5:4396925-4436534). The shared sequence between a previously reported duplication and the two novel SVs identified in this study is located in a large gene desert, downstream of IRX1 and upstream of ADAMTS16, 800 kb and 693.9 kb from their respective transcription start sites. Publicly available NGS datasets were queried for informative data on chromatin accessibility and 3 sites were found active from human gestation day 72 to 105 in fetal retina, suggestive of functional acting elements within this site. expression of IRX1 and known involvement in retinal development 28,29 , but not ADAMTS16 in the macular region, suggests that IRX1 is the probable target of the putative retinal regulatory element which, when duplicated, may cause misregulation of IRX1.
Eye development, like other organogenesis processes, requires the precise spatio-temporal and quantitative expression of genes, orchestrated by a complex network of regulatory mechanisms influencing critical transcription factors and other developmental genes. The lack of readily accessible animal or in vitro models has hindered detailed understanding of macular development, as this structure only evolved in higher primates among mammals. Recently, disrupted developmental expression of the transcription factor and histone methyltransferase PRDM13 30, 31 was suggested as a disease mechanism for NCMD at the 6q16 locus, based on the identification of non-coding SNVs and duplication events residing in an overlapping region upstream of PRDM13 in many MCDR1 families. Differential regulation of PRDM13 in eyecups derived from wild-type induced pluripotent stem cells (iPSCs) 15,16 was suggested. However, no causal relationship between the non-coding variants and PRDM13 expression has been identified.
Despite variable presentation in affected individuals, the NCMD phenotypic spectrum is indistinguishable in patients assigned to either of the two linked loci, MCDR1 and MCDR3. Whether a biological and functional connection between PRDM13 at the MCDR1 locus and the most likely candidate gene IRX1 at the MCDR3 locus exists warrants further investigation. iPSC technology and CRISPR manipulation in eye cups from normal and affected individuals may help elucidate the molecular mechanism 32, 33 and the potential molecular links between the two genes. Importantly, the involvement of ancestral variation at both the 6q16 and 5p15 loci (Supplementary Tables S1-S2 and Supplementary Figs S9-S10) in such a highly penetrant dominant disease is intriguing, with the implication that there may exist a significant number of unrecognized related NCMD families. Full clinical examination reveals a high degree of penetrance, but visually unaffected individuals in whole families may fail to be ascertained.
Finally, the two novel duplications identified in this study significantly further the understanding of the molecular genetics of NCMD at the MCDR3 locus and provide additional effective tools for the molecular diagnosis of NCMD families.

Materials and Methods
Families. All families were ascertained at Moorfields Eye Hospital, London, United Kingdom, expect for family 1 14 (Vision Centre, Children's Clinical University Hospital, Riga, Latvia) and family 13 9 (Centre Hospitalier Régional Universitaire de Lille, France). When possible, retinal imaging was undertaken using colour fundus photography, fundus autofluorescence and OCT imaging. Blood/saliva samples were collected for DNA extraction, genotyping and sequence analyses. The study protocol was approved by the local ethics committees (Central Medical Ethics Commitee of Latvian Republic; NRES Committee London -Camden & Islington) and conformed to the tenets of the Declaration of Helsinki. Written informed consent was obtained from all participants, or their parents, before inclusion in the study.
Haplotype sharing analysis. In order to search for chromosomal segments sharing the same haplotype across affected individuals (within the same family or across different families), the non-parametric HH method 18 was used for the analysis of those affected individuals that were genotyped with the Illumina array. The HH is a type of haplotype described by the homozygous SNPs only (all heterozygous SNPs are removed) and, therefore, can be uniquely determined on each chromosome. Since affected family members who inherited the same mutation from a common ancestor share a chromosomal segment IBD around the disease gene, they should not have discordant homozygous calls in the IBD region and thus they should share the same HH. The HH approach predicts IBD regions through the identification of RCHHs defined as those regions with a shared HH among affected individuals and a genetic length longer than a certain cut-off value (recommended cut-off for Illumina HumanOmniExpress array is 2.5/3.0 cM for the analysis of one single family). aCGH. aCGH was performed at Oxford Gene Technology (OGT) (Begbroke, United Kingdom) using a custom design consisting of 10,000 probes spanning the MCDR3 locus at GRCh37/hg19 chr5:11882-10140073 (approximately 1 probe every 1,000 bp), designed with Agilent e-Array software (Agilent Technologies Inc., Santa Clara, CA, USA), in three individuals from families 1-3 ( Supplementary Fig. S1). Sixteen other individuals   (Illumina, Inc., San Diego, CA, USA), generating minimum coverage of 30X. Reads were aligned to the hg19 human reference sequence (build GRCh37) with novoalign (version 3.02.08). The aligned reads were sorted by base pair position and duplicates were marked using novosort. Discordant reads were marked with samblaster (version 0.1.20) and sent to a separate file for manual inspection of breakpoints using the IGV (version 2.3.61). SVs were manually investigated using the IGV by identifying peaks of discordant reads which were interpreted as breakpoints. The identified duplicated regions were also screened for the presence of common copy number variants using data from the CNV browser 20 (https://personal.broadinstitute.org/handsake/mcnv_data/) and WGS data from 650 individuals with inherited retinal disease 19 .
Sanger sequencing validation of duplication events. Segregation analysis of the duplication events identified by WGS was performed using primers ( Table 2) designed to span the end of first copy and start of second copy. A graphical representation is shown in Fig. 3. After sequence confirmation with Sanger sequencing, PCR was used to genotype selected individuals from all identified families.
In silico analysis of duplicated sequences and expression of flanking genes. The Encyclopedia of DNA Elements (ENCODE) 26 was interrogated for fetal retina datasets of interest. Bed files from DNA-seq datasets (ENCFF249FGP, ENCFF937NUZ, ENCFF401BCF, ENCFF591NRB, ENCFF265ZNN, Stamatoyannopoulos' laboratory) were downloaded and investigated at the shared duplicated region with R Studio. A second microarray expression dataset on human fetal retina (19-20 gestation week) was queried for the genes of interest 24 using the platform GENEVESTIGATOR 23 .
Data availability statement. The datasets generated during the current study are not publicly available due to limitations imposed by the scope of participant consent, but are available from the corresponding authors on reasonable request.