Cornelia de Lange syndrome (CdLS; OMIM 122470) is a dominantly inherited multisystem developmental disorder characterized by growth and cognitive retardation; abnormalities of the upper limbs; gastroesophageal dysfunction; cardiac, ophthalmologic and genitourinary anomalies; hirsutism; and characteristic facial features1,2,3. Genital anomalies, pyloric stenosis, congenital diaphragmatic hernias, cardiac septal defects, hearing loss and autistic and self-injurious tendencies also frequently occur2. Prevalence is estimated to be as high as 1 in 10,000 (ref. 4). We carried out genome-wide linkage exclusion analysis in 12 families with CdLS and identified four candidate regions, of which chromosome 5p13.1 gave the highest multipoint lod score of 2.7. This information, together with the previous identification of a child with CdLS with a de novo t(5;13)(p13.1;q12.1) translocation, allowed delineation of a 1.1-Mb critical region on chromosome 5 for the gene mutated in CdLS. We identified mutations in one gene in this region, which we named NIPBL, in four sporadic and two familial cases of CdLS. We characterized the genomic structure of NIPBL and found that it is widely expressed in fetal and adult tissues. The fly homolog of NIPBL, Nipped-B, facilitates enhancer-promoter communication and regulates Notch signaling and other developmental pathways in Drosophila melanogaster5.
CdLS is a dominantly inherited disorder with characteristic facial appearance, limb defects (Fig. 1) and growth and cognitive retardation. We carried out a genome-wide linkage analysis in nine families with CdLS with more than one affected family member. Under a model of genetic homogeneity, we used a linkage exclusion mapping approach, excluding all markers for which the affected individuals in one or more families did not share both parental alleles (if both parents were unaffected) or the allele transmitted by the affected parent. This analysis identified five regions containing one or more markers with positive lod scores in the nine families (chromosomes 2q37, 5p13, 10p13, 14q24 and 17p13; Table 1). We analyzed these five regions in the original nine families and in three additional families with CdLS (total of 12 families) and obtained negative lod scores for D17S938 in one family, excluding chromosome 17. All other markers gave positive lod scores (Table 1).
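The exclusion rule described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each affected child's genotype at a marker is available phase-known as a (paternal allele, maternal allele) pair; the function names and data layout are hypothetical and are not taken from the software used in the study.

```python
from typing import List, Optional, Tuple

# Phase-known genotype of one affected child at one marker:
# (allele inherited from father, allele inherited from mother).
# This representation is an illustrative assumption.
Genotype = Tuple[int, int]
Family = Tuple[List[Genotype], Optional[str]]


def family_excludes_marker(affected_children: List[Genotype],
                           affected_parent: Optional[str]) -> bool:
    """Return True if the affected children in one family exclude the marker.

    affected_parent is None when both parents are unaffected,
    otherwise "father" or "mother".
    """
    if len(affected_children) < 2:
        return False  # non-sharing cannot be observed with one affected child
    if affected_parent is None:
        # Both parents unaffected: affected sibs must share both parental alleles.
        return len(set(affected_children)) > 1
    index = 0 if affected_parent == "father" else 1
    # One parent affected: affected children must share that parent's transmitted allele.
    return len({genotype[index] for genotype in affected_children}) > 1


def marker_excluded(families: List[Family]) -> bool:
    """Under genetic homogeneity, one non-sharing family excludes the marker."""
    return any(family_excludes_marker(children, parent)
               for children, parent in families)


if __name__ == "__main__":
    consistent = [([(1, 3), (1, 3)], None), ([(2, 4), (2, 5)], "father")]
    discordant = [([(1, 3), (2, 3)], None)]
    print(marker_excluded(consistent), marker_excluded(discordant))
```

In practice phase is often not directly observed and has to be inferred from parental and sibling genotypes, so this sketch shows only the logic of the rule, not a complete analysis.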
We carried out fine mapping in all 12 families with additional markers at an average density of 1–1.5 cM in the defined regions on chromosomes 2, 5, 10 and 14. Multipoint linkage analysis did not improve the odds for linkage to chromosomes 2, 10 or 14 but resulted in a maximum lod score of 2.7 for chromosome 5p13, which was the highest score for the entire genome analysis. We refined the critical region on chromosome 5p13 by obligate recombination events to a region of ∼7.4 Mb spanning 5p13.1–13.3 flanked by markers D5S477 distally and D5S1376 proximally (Supplementary Fig. 1 online) and containing 58 putative genes (Fig. 2a).
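For readers unfamiliar with lod scores, the following sketch shows how a two-point lod score is computed in the simplest textbook case (phase-known, fully informative meioses) and how a lod score converts to odds. It is a simplification for illustration only: the multipoint scores reported here were computed with GENEHUNTER, which handles unknown phase and multiple markers, and the function names below are hypothetical.

```python
import math


def two_point_lod(recombinants: int, meioses: int, theta: float) -> float:
    """log10 of L(theta) / L(0.5) for phase-known, fully informative meioses."""
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    nonrecombinants = meioses - recombinants
    if theta == 0.0:
        if recombinants > 0:
            return float("-inf")  # any recombinant rules out complete linkage
        return meioses * math.log10(2.0)
    log_linked = (recombinants * math.log10(theta)
                  + nonrecombinants * math.log10(1.0 - theta))
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked


def lod_to_odds(lod: float) -> float:
    """Convert a lod score to odds in favor of linkage."""
    return 10.0 ** lod


if __name__ == "__main__":
    # Under genetic homogeneity, per-family lod scores at the same theta add up.
    family_counts = [(0, 4), (1, 6)]  # (recombinants, meioses); made-up numbers
    total = sum(two_point_lod(r, n, 0.1) for r, n in family_counts)
    print(round(total, 2))
    print(round(lod_to_odds(2.7)))
```

On this scale a lod score of 2.7 corresponds to odds of roughly 500:1 in favor of linkage, which is below the conventional threshold of 3.0 (1,000:1).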
We looked for other corroborating evidence to target one or more of the four candidate regions. We previously identified a child with classic features of CdLS and a balanced de novo t(5;13)(p13.1;q12.1) translocation, and another child with classic features of CdLS and a de novo chromosome 5p13.1–p14.2 deletion (the only reported case of a constitutional deletion of 5p13.2) was recently described6. These cases supported the association of 5p13 with CdLS. We next refined the 5p breakpoint in the child with the translocation (samples were not available from the child with the 5p deletion, who died shortly after birth). We carried out fluorescence in situ hybridization (FISH) analysis using clones in the minimal critical region on 5p13 of the child with the translocation (Fig. 2b). Owing to sample limitations, we could not initially identify a clone that spanned the translocation breakpoint, but we narrowed the critical region to an interval of 1.1 Mb containing 11 putative genes (Fig. 2a).
We carried out mutational analysis of the first three exons of all 11 genes by conformation-sensitive gel electrophoresis (CSGE)7 and identified mutations in two overlapping transcripts, BX538178 (3,653-bp mRNA) and IDN3 (8,124-bp mRNA; Fig. 2a). The identification of mutations in both transcripts (in BX538178-specific sequence, in IDN3-specific sequence and in the overlap region) and their exact sequence identity over a 2,259-bp region of overlap suggested that they were part of a larger transcript, which we called NIPBL (Nipped-B like). CSGE analysis of the complete coding sequence of NIPBL in 30 probands (including the 12 familial probands) identified mutations in 2 familial and 4 sporadic cases of CdLS (20% mutation detection rate; Table 2 and Supplementary Fig. 2 online). In three of the four sporadic cases for which samples from both parents were available, the mutations were de novo. In one sporadic case (3023-3027delTGTCT), samples were available from only the mother, and she did not carry the mutation. All four mutations identified in sporadic cases were frameshift mutations (three deletions and one insertion). The two familial mutations (a missense mutation in family II in the first codon (2G→A, causing the amino acid substitution M1K) and a splice site mutation in family XXI (6763+5G→T) in intron 39) were identified in all affected siblings and were not present in any of the parents, implicating germline mosaicism as a mechanism in familial recurrences where neither parent manifests features of the disorder.
All mutations are expected to result in a truncated or, in the case of the M1K mutation, untranslated protein. The mutations are spread throughout the gene and were not seen in 300 normal ethnically matched control chromosomes. We identified seven sequence polymorphisms (Supplementary Table 1 online).
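As a brief illustration of why the insertions and deletions found in sporadic cases are classed as frameshifts, the sketch below checks whether the length of an insertion or deletion is a multiple of three. The helper names are hypothetical; the only value taken from the text is the 3023-3027delTGTCT deletion.

```python
def deletion_length(first: int, last: int) -> int:
    """Length of a deletion given inclusive cDNA coordinates."""
    if last < first:
        raise ValueError("last coordinate must not precede first")
    return last - first + 1


def is_frameshift(indel_length: int) -> bool:
    """An indel shifts the reading frame unless its length is a multiple of 3."""
    if indel_length <= 0:
        raise ValueError("indel length must be positive")
    return indel_length % 3 != 0


if __name__ == "__main__":
    n = deletion_length(3023, 3027)  # 3023-3027delTGTCT
    print(n, is_frameshift(n))       # prints: 5 True
```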
We studied expression patterns by northern-blot and in situ analyses. Northern blots of fetal and adult samples hybridized with multiple probes detected transcripts of ∼6 kb and 1.9 kb and, in fetal samples, additional bands of ∼9.5 kb and 7.2 kb (Supplementary Fig. 3 online). The presence of multiple transcripts is suggestive of alternative splicing for this gene. Transcripts of the mouse homolog of NIPBL were detected widely at gestation days 9.5 and 10.5 (Fig. 3), with notable accumulations in limb bud, branchial arch and craniofacial mesenchyme. These regions are involved in patterning of the skeleton and soft tissues of the limbs, jaw and face (among others).
We amplified cDNA isolated from lymphoblastoid cell lines, compared it with sequences in the University of California Santa Cruz and National Center for Biotechnology Information genomic databases and determined that NIPBL is represented by two overlapping transcripts: BX538178 (3,653-bp mRNA) and IDN3 (8,124-bp mRNA). We confirmed this by northern-blot analyses using probes generated from sequence-specific regions of these two transcripts. The genomic sequence spans 188 kb, and the mRNA is 9,505 bp (coding region, bases 127–8,539), encoding a protein of 2,804 amino acids. The mRNA comprises 47 exons, with one 5′ noncoding exon. The protein sequence of human NIPBL shares 92% identity with mouse, 88% with rat and 37% with the fruit fly Nipped-B gene product (SIM alignment). In a BLAST search of the National Center for Biotechnology Information database, NIPBL also had substantial homology with the Saccharomyces cerevisiae sister chromatid cohesion protein 2, which forms a complex with SCC4 and is required for the association of the cohesin complex with chromosomes8. We used the PROSITE program to search for conserved motifs and found that NIPBL has a bipartite nuclear targeting sequence (amino acids 1,108–1,124) and a putative HEAT repeat. HEAT repeats (originally identified in the huntingtin protein) are found in condensins, cohesins and other complexes with chromosome-related functions9.
Nipped-B is an essential regulator of cut, Ultrabithorax and Notch receptor signaling. Its protein product belongs to the family of chromosomal adherins, and genetic evidence suggests that it has an architectural role in facilitating long-distance interactions between enhancers and promoters5. The involvement of Nipped-B in regulating Notch signaling is of interest, as two other genes involved in Notch signaling are implicated in human developmental disorders (mutations in JAG1 result in Alagille syndrome10, and mutations in DLL3 result in spondylocostal dysostosis11).
The identification of mutations in a single allele of NIPBL in individuals with CdLS is consistent with a dominant pattern of inheritance. All mutations identified so far predict a truncated protein product and probably result in functional haploinsufficiency. That haploinsufficiency is a mechanism in CdLS is confirmed by the child with a large deletion of the region (encompassing NIPBL) and severe manifestations of CdLS6, and by the child with the translocation reported here, who also has severe manifestations.
In this report we show that mutations in NIPBL cause CdLS. Because the paucity of familial cases and of consistent cytogenetic rearrangements did not allow for standard positional cloning approaches, we identified this gene by combining information on candidate regions not excluded by linkage analysis with other supporting data (cytogenetic rearrangements). The expression pattern of NIPBL and the mechanism of action suggested by its structural homologs provide insight into the pathogenesis of the defects seen in the multiple systems involved in CdLS.
Individuals with CdLS.
We verified that all affected individuals enrolled in the study were diagnosed with CdLS. All affected individuals and unaffected family members were enrolled in the study under a protocol of informed consent approved by the Institutional Review Board at The Children's Hospital of Philadelphia.
Genome-wide linkage analysis.
We carried out linkage studies using the ABI linkage mapping set version 2, consisting of 400 fluorescently labeled polymorphic markers spaced at intervals of ∼10 cM throughout the genome. We estimated marker allele frequencies used in the lod score analysis based on alleles observed in the families' founders. We carried out model-based two-point and multipoint linkage analysis on data from the whole-genome scan and from the fine mapping of chromosomes 2, 5, 10 and 14 in all families using the GENEHUNTER computer program version 2.0 (ref. 12). For lod score analysis, we assumed the disease to follow an autosomal dominant mode of inheritance with disease allele frequency of 0.00001. To account for the possibility that the disease in families with unaffected parents was due to germline mosaicism in one of the parents, we coded all unaffected individuals (parents and siblings) from whom samples were available for genotyping as unknown at the disease phenotype. Thus, we did not have to assume anything about the unknown penetrance of the putative mutation underlying CdLS. We retained marker genotype information from unaffected siblings when such information was available and used it to reconstruct phase for haplotyping. Marker maps used in multipoint linkage analysis were sex-averaged genetic maps from the Center for Medical Genetics of the Marshfield Clinic Research Foundation.
FISH analysis.
We carried out FISH analysis using standard techniques as described previously13. We used BAC clones to the critical region on chromosome 5p13.2 (from telomere to centromere: RP11-8C23, RP11-67N10, RP11-317I21 (AC026463.5), RP11-14I21, RP11-7M4, RP11-252F20, RP11-90P7, RP11-60A21 and RP11-138C1) identified through the University of California Santa Cruz genome browser as probes to refine the position of the chromosome 5p13 breakpoint of the t(5;13)(p13.1;q12.1) translocation. We obtained the BACs from Children's Hospital of Oakland Research Institute. We extracted total BAC DNA (Perfectprep Plasmid XL, Eppendorf Scientific) and labeled it with spectrum orange or green dUTP by nick translation using a commercially available kit (Vysis). We combined labeled DNA with Cot-1 DNA. We carried out hybridization and washes using standard conditions.
Mutational analysis and CSGE.
We carried out CSGE according to standard protocols7. Oligonucleotide primer sequences and PCR conditions used for amplification of all exons of NIPBL are available on request. We purified PCR products corresponding to all altered migration patterns (shifts) using the QIAquick PCR purification kit (QIAGEN Sciences) and sequenced them on an ABI 377 sequencer.
Northern-blot analysis.
We hybridized poly(A)+ RNA northern blots of multiple adult human tissues (Human 12-Lane Multiple Tissue Northern (MTN) Blot, BD Biosciences Clontech) and human fetal tissues (MessageMap Northern Blot, Stratagene) with a 301-bp probe from BX538178-specific cDNA sequence (NIPBL exons 2 and 3), a 344-bp probe from IDN3-specific cDNA sequence (NIPBL exons 46 and 47) and a 252-bp probe from a region of overlap between the two putative transcripts (NIPBL exon 10; all primer sequences available on request). We used the BD SpotLight Random Primer Labeling Kit (BD Biosciences Clontech) to label probes and the SpotLight Chemiluminescent Hybridization & Detection Kit (BD Biosciences Clontech) for hybridization and visualization. We repeated the experiments with probes labeled with 32P-dCTP using Ready-to-go DNA labeling beads (−dCTP; Amersham) and purified on ProbeQuant G-50 microcolumns (Amersham). We blocked blots with yeast tRNA and herring sperm DNA. We visualized the signal by exposure to autoradiograph film for 1–5 min (chemiluminescent) and 1–4 h (32P).
In situ hybridization.
We generated a probe for the mouse homolog of NIPBL by PCR from an EST clone (oligonucleotide primer sequences available on request), which yielded a 389-bp product corresponding to the last 190 bp of exon 10 and all of exon 11 of human NIPBL (Table 2). We subcloned this fragment into pCRII-TOPO (Invitrogen) to generate antisense and sense digoxigenin-labeled cRNA probes. We generated an Fgf8 probe (positive control) from a 422-bp NcoI-PstI fragment of the Fgf8 cDNA (bp 59–481 of GenBank Z48746) cloned into pBluescript. We dissected CD-1 (Charles River) mouse embryos at days 9.5 and 10.5 of gestation and fixed and processed them for whole-mount in situ hybridization, with detection using alkaline phosphatase–conjugated sheep antibodies to digoxigenin and 5-bromo-4-chloro-3-indolyl phosphate/nitroblue tetrazolium as the chromogenic substrate14.
The University of California Santa Cruz genome browser is available at http://genome.ucsc.edu/cgi-bin/hgGateway. The National Center for Biotechnology Information genome database is available at http://www.ncbi.nlm.nih.gov/. The PROSITE program is available at http://us.expasy.org/cgi-bin/scanprosite. SIM alignment is available at http://us.expasy.org/cgi-bin/sim.pl. The Center for Medical Genetics of the Marshfield Clinic Research Foundation is available at http://research.marshfieldclinic.org/genetics/.
GenBank: human IDN3, NM_133433; mouse IDN3 homolog, BG070859 and XM_127929; rat IDN3 homolog, XM_238213; NIPBL, BK005151. GenBank protein: Saccharomyces cerevisiae sister chromatid cohesion protein 2, Q04002.
Note: Supplementary information is available on the Nature Genetics website.
We thank the individuals with CdLS and their families for their support and willingness to donate samples; the Cornelia de Lange Syndrome Foundation, their staff and their director J. Mairano for their support; and N. Spinner, M. Jackson, A. Kline, J. Morrissette, M. Budarf and the staff of the clinical cytogenetics laboratory and the sequencing core at The Children's Hospital of Philadelphia for their comments and guidance. This work was supported by grants from the National Institutes of Health, National Institute of Child Health and Human Development (to I.D.K., M.D., A.D.L. and A.L.C.).