Introduction

Cornelia de Lange syndrome (CdLS: OMIM# 122470, 300590, and 610759) is a multiple congenital anomaly disorder, characterized by distinctive facial features, intellectual disability/developmental delay, hirsutism, and limb abnormalities.1 Growth failure, which typically manifests in the second trimester, occurs proportionally.2 Hirsutism is commonly observed especially on the face, neck, back, and extremities. Less frequently associated clinical findings include cardiac septal defects, gastrointestinal malformations/dysfunction, genitourinary malformations, ocular findings, and hearing problems.1,3,4 Despite differences in clinical severity from patient to patient, the distinctive recognizable facies have provided the most differentiating feature in establishing the diagnosis. The prevalence of CdLS is estimated to be 1:10,000 live births but the incidence may be underestimated due to lack of recognition of milder cases.1

Thus far, three cohesin complex subunit genes (NIPBL, SMC1A, and SMC3) have been described in the molecular etiology of CdLS.5,6,7,8 Heterozygous mutations in NIPBL, which encodes a key regulatory protein of the cohesin complex that functions in sister chromatid cohesion and transcriptional regulation, are responsible for ~50% of all CdLS cases. Subsequently, the involvement of two additional core cohesin subunit genes, SMC1A and SMC3, was documented, but point mutations in these genes account for only 5% of all CdLS cases.9,10 Hence, the molecular etiology of ~45% of CdLS cases remains unknown. Therefore, other cohesin complex subunit genes, and/or large genomic rearrangements (that is, copy number variation (CNV)) of known genes not detected by DNA-sequencing methods, may have a role in the molecular etiology of CdLS.

The majority of disease-causing mutations in NIPBL are point mutations or single-nucleotide variants (SNVs); case reports of large genomic rearrangements have rarely been published.11,12,13,14 Genotype–phenotype correlation studies revealed that patients who are mutation-negative tend to have a milder phenotype than patients who are mutation-positive, and patients with missense mutations have a milder phenotype than those with truncating mutations, which suggests that NIPBL is a dosage-sensitive gene.15 Furthermore, Gause et al.16 have shown that Nipped-B regulates cohesin chromosome binding in a dosage-sensitive manner in Drosophila salivary glands. Because of the rarity of CNV mutational events identified, genotype–phenotype correlations of large deletion CNVs involving NIPBL have not been extensively studied. Moreover, we showed recently that copy number gains involving NIPBL convey a different clinical phenotype than CdLS, further supporting the dosage sensitivity of NIPBL.13 NIPBL duplication cases have common facial dysmorphic features including frontal bossing, broad nasal root, low-set ears, short philtrum, and high arched palate; however, most of these features do not overlap with the distinctive CdLS facial gestalt. Furthermore, some clinical findings such as overweight body habitus, short philtrum, and long fingers are completely opposite to what is observed with the CdLS phenotype.

CNVs are increases or decreases in the number of copies of a genomic segment, that is, deviations from the normal diploid state, which, like SNVs, may either represent benign variation or result in a disease phenotype.17 Following the application of genome-wide analysis tools such as comparative genomic hybridization (CGH), the human genome has been found to contain a high degree of copy number variation, with CNVs ranging in size from kilobases to megabases and comprising ~12% of the genome.18 Duplication CNVs in the etiology of genetic diseases have long been known,19,20,21 and more continue to be identified with recent advances in genome-wide scanning technologies.22

Three major mechanisms have been proposed for the formation of human genomic disorder–associated CNVs, including nonallelic homologous recombination, nonhomologous end joining, and the DNA replication-based mechanisms of fork stalling and template switching (FoSTeS)/microhomology-mediated break-induced replication (MMBIR).17 Nonallelic homologous recombination is the predominant mechanism for the formation of recurrent genomic rearrangements by using low copy repeats as a substrate for recombination. In nonhomologous end joining, breaks in double strands occur, and then both broken DNA ends are bridged. The product of repair often contains additional nucleotides at the junction, leaving a “molecular scar.” FoSTeS/MMBIR is a recently described replication-based mechanism of DNA repair that utilizes nucleotide microhomology at the breakpoint junctions to prime DNA replication of a template switch; it has been found to be associated with the formation of nonrecurrent and complex rearrangements.23,24,25

To investigate a potential role for large genomic rearrangements in the etiology of CdLS, and to further delineate potential genotype/phenotype correlations of CNVs in the CdLS phenotype, we designed an Agilent 8×60K custom array interrogating all cohesin complex subunit genes and report herein our findings at the NIPBL locus. Furthermore, by defining and characterizing the breakpoint junctions of these genomic rearrangements, we sought to gain insights into underlying molecular mechanisms.

Materials and Methods

Subjects

All patients were offered enrollment into the study after procuring informed consent at the Children’s Hospital of Philadelphia. Consent for publishing subject photographs was also independently procured. All patients were examined by clinical dysmorphologists experienced with the CdLS phenotype. To date, 162 CdLS cases, in which SNV mutations in NIPBL, SMC1A, and SMC3 were excluded by targeted sequencing methods, and genome-wide single-nucleotide polymorphism arrays, were investigated. This study was approved by the institutional review boards of both Baylor College of Medicine and the Children’s Hospital of Philadelphia.

Array CGH (aCGH)

We designed an Agilent 8×60K custom microarray with oligonucleotides interrogating cohesin complex subunit genes using the Agilent eArray website (http://earray.chem.agilent.com/earray). We included 46 genes and their flanking 50 kb of up- and downstream regions with an average genomic resolution of ~1 probe/200 bp. Genes related to cohesin structure and function were selected after a bibliographic search, including the OMIM and PubMed websites.
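
As a rough illustration of what this probe density means in practice, the sketch below estimates how many probes would tile a region at the stated spacing of about one probe per 200 bp. The 189-kb NIPBL locus length and the 4.2-kb smallest deletion are the values reported in the Results; uniform probe spacing and the helper name `estimated_probes` are assumptions made only for this example.

```python
# Rough probe-count estimate for a tiled region (illustrative arithmetic only).

PROBE_SPACING_BP = 200   # ~1 probe per 200 bp, as stated in the text
FLANK_BP = 50_000        # 50 kb added upstream and downstream of each gene


def estimated_probes(region_bp: int, flank_bp: int = 0) -> int:
    """Approximate number of probes tiling a region plus both flanks.

    Assumes perfectly uniform spacing, which a real design will not have.
    """
    return (region_bp + 2 * flank_bp) // PROBE_SPACING_BP


# Locus length and deletion size are taken from the Results section.
nipbl_probes = estimated_probes(189_000, FLANK_BP)
smallest_deletion_probes = estimated_probes(4_200)

print(nipbl_probes)              # 1445
print(smallest_deletion_probes)  # 21
```

Under these assumptions, even the smallest deletion reported here would be covered by roughly 20 probes, which is why single-exon deletions are within reach of this design.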

Experiments for digestion, labeling, purification of the labeled product, hybridization with gender-matched male (NA10851) or female (NA15510) control DNAs (obtained from Coriell Cell Repositories; http://ccr.coriell.org), washing, and scanning were conducted per the manufacturer’s protocol and previously described methods.26 Computational analyses including data extraction, background subtraction, and normalization were done by using Agilent Feature Extraction Software 10_7_3_1 (Agilent Technologies, Santa Clara, CA). These data were subsequently imported into array CGH analytics software (Genomic Workbench Standard Edition 5.0.14; Agilent Technologies). The genomic copy number was defined by analysis of the normalized log2 (Cy5/Cy3) ratio average of the CGH signal. Regions with a ratio of at least 0.6 were considered gains consistent with duplication, and regions with a ratio of −1.0 or lower were considered significant losses consistent with deletion.
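
The thresholds above can be related to the idealized log2 ratios expected for a heterozygous deletion or duplication in a diploid genome. The following is a minimal sketch, not the calling algorithm used by the Agilent software: the function names are hypothetical, and reading the loss threshold as "a ratio of −1.0 or lower" is an assumption.

```python
import math

GAIN_THRESHOLD = 0.6    # gain threshold stated in the text
LOSS_THRESHOLD = -1.0   # loss threshold stated in the text


def expected_log2_ratio(copies: int, reference_copies: int = 2) -> float:
    """Idealized log2(patient/control) ratio for a given copy number."""
    return math.log2(copies / reference_copies)


def classify_region(mean_log2_ratio: float) -> str:
    """Label a region from its averaged log2 ratio using the stated thresholds."""
    if mean_log2_ratio >= GAIN_THRESHOLD:
        return "gain"
    if mean_log2_ratio <= LOSS_THRESHOLD:
        return "loss"
    return "normal"


print(expected_log2_ratio(1))            # -1.0 (one copy instead of two)
print(round(expected_log2_ratio(3), 3))  # 0.585 (three copies instead of two)
print(classify_region(-1.1))             # loss
print(classify_region(0.05))             # normal
```

Note that the idealized value for three copies (about 0.585) lies just under the 0.6 cut-off, so this sketch illustrates the scale of the measurements and should not be read as a reproduction of the software's behavior.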

Breakpoint analysis

To detect the breakpoint junctions of the large genomic rearrangements, primers were designed at the apparent boundaries of each segment based on aCGH analysis and the genomic coordinates of interrogating probes demarcating transitions from normal copy to apparent deletion. Both long-range and conventional PCR methods were conducted for each primer pair. Long-range PCR was performed as previously described.27 Standard PCR was carried out in 12 μl of reaction mixture with 0.52 pmol/µl of each primer, 50 ng of genomic DNA, 10× PCR buffer, 0.2 mmol/l of each deoxynucleotide triphosphate, and 0.6 U of HotStar Taq DNA polymerase (Qiagen, Valencia, CA). The initial denaturation step at 95°C for 15 min was followed by 40 cycles of denaturation at 94°C for 30 s, annealing at 60°C for 30 s, and an extension at 72°C for 1 min. A final extension step at 72°C for 7 min was added. Amplification products were electrophoresed on 0.8–1% agarose gels. PCR products were purified using ExoSAP-IT (Affymetrix, Santa Clara, CA) and analyzed by standard Sanger di-deoxy nucleotide sequencing (DNA Sequencing Core Facility at Baylor College of Medicine, Houston, TX). We successfully amplified the breakpoints of five of seven patients by using different combinations of primers (223-F1: 5′-TTGTTCTGGCAGTCTGTAGTATGG-3′ and 223-R1: 5′-TTAATGGCACACAACTGTAGTTCAC-3′ for patient CDL223; 266-F1: 5′-CAGCGTTCACTTTTGGAGGATGATA-3′ and 266-R2: 5′-CCTTCAACATTTTCCCCTAACCTTC-3′ for patient CDL266; 283-F2: 5′-TGTCAGTCATTCACCAAAGGAAAGT-3′ and 283-R2: 5′-TCTGCCAATATACCAAACAGGAAA-3′ for patient CDL283; 340-F1: 5′-CATGGCAAAAGTAAGATGCAGAAGA-3′ and 340-R1: 5′-CCAAAGAAAAGTATGCCATCCTCTC-3′ for patient CDL340; and 406-F2: 5′-CCTTGTGAGATGAGTATGCTTTTCC-3′ and 406-R2: 5′-GTGTGTTATTTCTCCTATCAGACAGT-3′ for patient CDL406).

PCR products of CDL223 and CDL406 were further sequenced by primer walking using the following primers: 223-FW1: 5′-GGATTCAAAACTAAGCAATT-3′ and 223-RW1: 5′-GAATTAAGAGAACACAATTT-3′ for patient CDL223, and 406-RW1: 5′-ATTTGAGAATGTCTACTCAC-3′ for patient CDL406.
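
For readers who wish to inspect the primers listed above, the sketch below computes basic properties (length, GC fraction, and reverse complement) for the CDL223 primer pair. The sequences are copied from the text; the helper functions are illustrative and were not part of the study's primer design workflow.

```python
# Basic sequence properties for a primer pair (illustrative helpers only).

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of bases in the sequence that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


# Primer sequences as listed in the text for patient CDL223.
PRIMERS = {
    "223-F1": "TTGTTCTGGCAGTCTGTAGTATGG",
    "223-R1": "TTAATGGCACACAACTGTAGTTCAC",
}

for name, seq in PRIMERS.items():
    print(name, len(seq), f"{gc_fraction(seq):.2f}", reverse_complement(seq))
```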

Results

High-resolution genomic analyses of the NIPBL gene region in CdLS

To investigate a potential role for large genomic rearrangements as an etiology for CdLS, we performed high-resolution genome-wide gene-targeted CGH array analyses. In total, 7 of 162 unrelated patients were found to harbor NIPBL deletions ranging in size from 4.2 to 750 kb (Figure 1). As anticipated, parental studies revealed that the deletions occurred de novo in all CdLS cases for whom parental samples were available (4 of 7, both parents; 2 of 7, only one parent). Patients CDL266 and CDL340 each had a relatively small single-exon dropout mutation: deletion of exon 11 (4.2 kb in size) and of exon 2 (4.5 kb in size), respectively. Except for patient CDL341, all deletions were intragenic exonic deletions of varying sizes. Patient CDL341 has a deletion spanning ~750 kb that comprises almost the entire NIPBL gene and the 5′-flanking SLC1A3; we could not determine the breakpoint for this patient using our aCGH assay, as our probe coverage did not allow us to assess whether the deletion includes the distal RANBP3L gene. The deletions observed in patients CDL283 and CDL454 both encompassed exons 2–9, although they differ in size: 32 and 85 kb, respectively. Two patients, CDL223 and CDL406, were found to have multiexon intragenic NIPBL deletions encompassing exons 2–17 (66 kb) and exons 2–6 (18 kb), respectively (Figure 2). Of note, intron 1, the longest intron, which constitutes 77 kb of the 189 kb NIPBL locus, harbors one end of the breakpoints in 5 of 7 patients.
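
The deletions described above can be tabulated to check the summary figures. The sketch below is an illustration under stated assumptions: exon ranges and approximate sizes are taken from the text (the exon 1–45 range for CDL341 comes from the Discussion), and a deletion whose first deleted exon is exon 2 is assumed to have one breakpoint in intron 1. The `Deletion` type is hypothetical.

```python
from typing import NamedTuple


class Deletion(NamedTuple):
    patient: str
    first_exon: int
    last_exon: int
    size_kb: float  # approximate size as reported in the text


DELETIONS = [
    Deletion("CDL266", 11, 11, 4.2),
    Deletion("CDL340", 2, 2, 4.5),
    Deletion("CDL406", 2, 6, 18),
    Deletion("CDL283", 2, 9, 32),
    Deletion("CDL223", 2, 17, 66),
    Deletion("CDL454", 2, 9, 85),
    Deletion("CDL341", 1, 45, 750),
]
COHORT_SIZE = 162  # point mutation-negative CdLS cases screened

# Proportion of the screened cohort carrying an NIPBL deletion.
detection_rate = len(DELETIONS) / COHORT_SIZE

# Assumption: a deletion starting at exon 2 has a breakpoint in intron 1.
intron1_patients = [d.patient for d in DELETIONS if d.first_exon == 2]

print(f"{detection_rate:.1%}")   # 4.3%
print(len(intron1_patients))     # 5
```

Under this assumption the count agrees with the five of seven patients noted above as having a breakpoint in intron 1.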

Figure 1

Array comparative genomic hybridization (aCGH) results displaying the NIPBL gene region. Results for aCGH analyses in each of the seven Cornelia de Lange syndrome (CdLS) cases are shown. Individual dots represent interrogating oligonucleotide probes: black dots represent normal copy number, red dots represent copy number gains, and green dots represent copy number losses, each as compared with a gender-matched control. The horizontal blue bar represents the NIPBL gene. Numbers on the y axis show the log2 ratio of the hybridization signal of patient versus control. The approximate size in kb of the deletion is shown at right.

Figure 2

Schematic view of NIPBL displaying exonic deletions in seven patients. (a) Chromosome 5 karyogram with G bands indicated (top). The location of NIPBL is demarcated with a blue vertical line at 5p13.1. (b) Graphic view of 47 exons (vertical black bars) of NIPBL; size and orientation of the gene above the exons. (c) Solid green bars represent genomic regions deleted with approximate sizes. Vertical dotted lines track exons on deleted regions. Patients’ code and deleted exons (Δ) are given at the left. The graphical normalized data for each patient was obtained by inputting the most distal and proximal oligonucleotide genomic probe coordinates into the custom track at the University of California, Santa Cruz website, http://genome.ucsc.edu/cgi-bin/hgGateway. Narrow green vertical bars depict uncertainty for proximal and distal ends of the deleted regions in patients CDL341 and CDL454, for which breakpoint junctions were not determined. Note the 77-kb length of intron 1; thus intron 1 harbors the distal breakpoint in five of seven cases.

Breakpoint junction sequence analyses

We performed breakpoint sequencing to further fine-map the deletions and potentially infer rearrangement mechanisms. The PCR amplification of breakpoint junctions was achieved in five of seven patients (CDL223, CDL266, CDL283, CDL340, and CDL406; Figure 3). In four of five breakpoints, the distal end of the deletion harbors a repetitive sequence such as a microsatellite, mammalian interspersed repeat element, or a member of the long interspersed element family. Breakpoints from each of these four patients showed 1–5 bp of shared microhomology between proximal and distal reference sequences. The distal breakpoint of patient CDL223, which maps to a mammalian interspersed repeat element, has a 2-bp microhomology (TT) at the breakpoint junction. Patient CDL283 was found to have 5 bp of microhomology (TGTGT), and the distal breakpoint was mapped within a GT microsatellite repeat. The deletion of patient CDL406 harbors a repetitive sequence at both proximal and distal breakpoints, a long interspersed element (L1MA4) and a simple repeat (TATATG), respectively, and has 5 bp of microhomology (ATATA) at the breakpoint junction. The deletion of patient CDL340 showed a 1-bp microhomology (A) and harbors a long interspersed element (L1M5) at the distal breakpoint. One patient, CDL266, showed no microhomology, but instead had a 47-bp insertion between proximal and distal breakpoints. In aggregate, the above observations of breakpoint junctions reveal features attributed to replicative repair mechanisms such as FoSTeS/MMBIR as a potential predominant mechanism for large genomic deletions in CdLS cases. However, other mechanisms may contribute to a small proportion of the events (see Supplementary Table S1 online).

Figure 3

Breakpoint sequence analyses for patients with NIPBL deletions. The proximal and distal sequences refer to reference sequences and to their relative position from the centromere. Proximal reference sequence and patient breakpoint sequences that match with the proximal reference sequence are shown in green, whereas the distal reference sequence and patient breakpoint sequences that match with the distal reference sequence are shown in red. Dash boxed sequences (purple) correspond to regions of microhomology and reveal the breakpoint junctions. Patient identification numbers, the type of the repeat sequence, and observed microhomology are shown above.

Phenotype of large genomic rearrangements in NIPBL

To assess potential correlations between the size of the deletion and clinical features, and to determine whether deletion-associated CdLS may differ from the phenotype observed with SNVs, we reviewed the clinical details in the available records and have summarized the chromosomal involvement, facial features, physical features, and growth for each patient in Table 1. All of the patients had facial features that were typical for CdLS and consistent with disruption of NIPBL activity (Figure 4). They displayed a range of severity, from very mild, high-functioning individuals with no limb anomalies to patients with severe cognitive impairment and upper limb truncations. Also consistent with mutations in NIPBL, all patients exhibited growth delay and microcephaly, but when compared on CdLS-specific growth charts (http://www.cdlsusa.org), they ranged from the 10th percentile in the patients with large multiple exon deletions (patients CDL223 and CDL341) to the 90th percentile for patients with small single or few exon-containing deletions (patients CDL266, CDL340, and CDL406).

Table 1 Clinical features of patients with deletions in NIPBL
Figure 4

Patient photographs. Frontal view of facies and extremity pictures of patients with limb abnormality.

Discussion

We describe seven CdLS patients with nonrecurrent deletions involving the NIPBL gene. The deletions ranged in size from 4 to 750 kb and encompassed a single exon, multiple exons, or almost the entire gene; the majority of breakpoint sequences revealed microhomology.

NIPBL is the gene most frequently found to be mutated in subjects with CdLS; SNV mutations are identified in ~50% of cases. Because studies in Drosophila,16 mouse,28 and human29 showed that NIPBL is a dosage-sensitive gene, we hypothesized that large genomic deletions may contribute to a fraction of the remaining molecularly unidentified cases. Previously, two groups have reported a low frequency of large genomic rearrangements in CdLS.11,12 Ratajska et al.12 studied 11 NIPBL/SMC1A mutation-negative cases and found one deletion spanning 62.7 kb and encompassing exons 35–47 of the NIPBL gene. Bhuiyan et al.11 analyzed 50 CdLS probands negative for NIPBL mutation and found a single 5.2-kb deletion encompassing exons 41–42 of NIPBL.

This study systematically assessed genomic rearrangements in a large cohort of point mutation–negative CdLS cases. Furthermore, the genomic span and breakpoint junctions of multiple NIPBL deletions have been comprehensively examined. Our data show that large intragenic deletions of NIPBL can account for ~5% of mutation-negative CdLS cases. Thus, we suggest screening for large genomic deletions of NIPBL in SNV mutation-negative CdLS cases.

We identified microhomology at four of the five sequenced breakpoints that ranged from 1 to 5 bp, consistent with a possible replicative mechanism such as FoSTeS/MMBIR. Patient CDL266 has a 47-bp insertion and no microhomology at the breakpoint junction. We initially evaluated this case as a nonhomologous end joining event; however, detailed examination of the inserted segment revealed that another replicative mechanism, serial replication slippage, potentially underlies this complex rearrangement (see Supplementary Figure S1 online). Serial replication slippage has been proposed to explain complex rearrangements, especially those potentially occurring within 100-bp intervals of the replication fork.30

Of the two previously reported CNV-associated CdLS cases, the breakpoint junction has been studied in detail in only one. In the case reported by Ratajska et al.,12 the proximal and distal deletion breakpoints mapped within two different long interspersed element 1 sequences from distinct families. The authors did not observe microhomology at the breakpoint junction, and a 15-bp insertion was interpreted as reflecting a potential nonhomologous end joining event as the mechanism for the deletion. We re-analyzed the case reported by Ratajska et al.12 and now suggest that serial replication slippage may be the responsible mechanism for that deletion; viewed from this perspective, the apparent 15-bp insertion actually consists of smaller insertions flanked by microhomology (see Supplementary Figure S2 online). The breakpoint junction of the case reported by Bhuiyan et al.11 was not studied in detail.

The clinical features of patients with genomic deletions that wholly or partially alter the NIPBL locus are comparable to the distinctive facial features, cognitive delay, and growth deficiencies associated with classical CdLS. However, a broad range of clinical severity that is manifest in their degree of cognitive, growth, and structural involvement can be observed with NIPBL deletions. Largely, this correlates with the size of deletion and the number of exons involved, with more exon involvement manifesting as more severe features. However, there are some notable findings that may explain some of the molecular basis for the differences between patients (Table 1). For example, the most severe patients include patient CDL223, who demonstrates a severe, classic NIPBL phenotype due to deletion of a large part of the open reading frame of NIPBL (exons 2–17). In addition, patient CDL341 demonstrates severe cognitive and growth delays and has a deletion of NIPBL exons 1–45 and upstream genes.

In contrast, patient CDL266, with a milder form of the disorder, is a higher-functioning patient with a deletion of only exon 11 that is predicted to result in an in-frame deletion. Also, patient CDL340 has less cognitive and growth involvement with a deletion of only exon 2. It is possible that an in-frame ATG at c.334 in exon 4 provides an alternative start codon.

Consistent with this model, three patients have deletions that include exons 5–8. Each of these patients demonstrates a phenotype that is intermediate to those above; however, their resultant features vary significantly despite deletions at the same three exons. Patient CDL406 is mildly affected with typical CdLS facies and a deletion of exons 2–6. It is possible that the ATG at c.748 in exon 7 could serve as an alternative start codon. Patient CDL283 is a moderately severe Caucasian patient with typical facies with a deletion of exons 2–9. Because exon 2 contains the primary start codon, it is possible that an alternative start codon at c.1618, p.540 in exon 10 could be used, resulting in a truncated protein. In comparison, patient CDL454, an African-American patient with a deletion of the same exons 2–9, demonstrates typical CdLS facies and growth delays, although he appears cognitively less affected than patient CDL283 despite having a deletion that is twice as large. Although these latter two patients could share the same alternative start codon, there are clearly modifying factors that influence the cognitive outcomes, either within the NIPBL locus or at other genomic sites.
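
Whether a candidate ATG lies in the primary reading frame can be checked arithmetically from its coding position. The sketch below assumes the usual convention that c.1 is the A of the primary start codon; the function name is hypothetical. It is applied to the three positions proposed in this Discussion (c.334, c.748, and c.1618).

```python
def codon_of(c_pos: int):
    """Return (in_frame, codon_number) for a coding DNA position.

    Assumes c.1 is the A of the primary ATG, so a position is the first base
    of a codon, and therefore in frame as a start, when (c_pos - 1) is
    divisible by 3.
    """
    in_frame = (c_pos - 1) % 3 == 0
    codon_number = (c_pos - 1) // 3 + 1
    return in_frame, codon_number


for c_pos in (334, 748, 1618):
    print(c_pos, codon_of(c_pos))
# 334 (True, 112)
# 748 (True, 250)
# 1618 (True, 540)
```

The result for c.1618 agrees with the p.540 position given in the text. An in-frame position is a necessary condition for an alternative start codon but does not show that translation actually initiates there.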

To fully understand the multiple potential causes of CdLS and to be able to predict the growth and cognitive outcomes for patients with NIPBL mutations, much more work will need to be done to clarify the type of mutations and their resultant effect on the genome. Nevertheless, this study identifies additional underlying causes of CdLS, and provides insight into the molecular bases of these genomic rearrangements.

Disclosure

J.R.L. is a paid consultant for Athena Diagnostics and Ion Torrent Systems, has stock ownership in 23andMe, and is a co-inventor on multiple US and European patents related to molecular diagnostics for inherited neuropathies, eye diseases, and bacterial genomic fingerprinting. The Department of Molecular and Human Genetics at Baylor College of Medicine derives revenue from the chromosomal microarray analysis offered in the Medical Genetics Laboratory (http://www.bcm.edu/geneticlabs). The other authors declare no conflict of interest.