Cornelia de Lange syndrome (CdLS) is a multiple malformation disorder characterized by dysmorphic facial features, mental retardation, growth delay and limb reduction defects1,2. We identified and characterized a new gene, NIPBL, that is mutated in individuals with CdLS and determined its structure and the structures of mouse, rat and zebrafish homologs. We named its protein product delangin. Vertebrate delangins have substantial homology to orthologs in flies, worms, plants and fungi, including Scc2-type sister chromatid cohesion proteins, and D. melanogaster Nipped-B. We propose that perturbed delangin function may inappropriately activate DLX genes, thereby contributing to the proximodistal limb patterning defects in CdLS. Genome analyses typically identify individual delangin or Nipped-B-like orthologs in diploid animal and plant genomes. The evolution of an ancestral sister chromatid cohesion protein to acquire an additional role in developmental gene regulation suggests that there are parallels between CdLS and Roberts syndrome.
The multisystem nature of the CdLS phenotype suggests that it is caused by a microdeletion or microduplication affecting several genes or by a single gene that regulates various target genes. A high-density BAC microarray comparative genome hybridization screen found no evidence for a consistent pattern of microdeletion or microduplication3. Because CdLS is rare and most cases are sporadic, genome-wide linkage screens are problematic. As an alternative, we analyzed chromosomal breakpoints associated with CdLS, focusing first on three classical cases with de novo balanced translocations, including the previously described translocations t(3;17)(q26.3;q23.1)4 and t(14;21)(q32;q11)5. We first analyzed the 3q26.3 breakpoint because of perceived phenotypic overlap between duplication 3q syndrome and mild CdLS6,7. The 3q breakpoint disrupts a large gene undergoing unusual alternative splicing, but we found no additional mutations specific to any individuals with CdLS3. Molecular analyses of regions spanning the 17q23, 14q32 and 21q11 breakpoint regions also did not identify a gene likely to underlie CdLS (data not shown).
We localized the breakpoints in a third translocation case to 5p13.1 and 13q12.1 (Fig. 1a,b and Table 1). Fluorescent in situ hybridization (FISH) mapping identified BACs crossing the 5p breakpoint (Fig. 1c) and the 13q breakpoint. CdLS was recently reported to be associated with a 5p13.1-5p14.2 deletion (D. Viskochil, personal communication), and so we focused on the 5p breakpoint. We continued FISH mapping with fosmids until we identified two clones with overlapping inserts that mapped to either side of the 5p13 breakpoint. G248P84262B4 gave a clear hybridization signal on the normal chromosome 5 and the der(13), indicating preferential binding to the region telomeric to the 5p breakpoint (Fig. 1d). In contrast, G248P88840C10 hybridized clearly to the der(5) chromosome but was not visible on the der(13) chromosome (Fig. 1e).
The 5p breakpoint mapped close to a previously predicted gene-like sequence of unknown function, called IDN3. Using in silico analyses and in-house cDNA sequencing, we determined that IDN3 was a gene fragment and that it comprises 90 kb of a new 190-kb gene, which we named NIPBL (Nipped-B-like; Fig. 2a,b). NIPBL contains 47 exons and is predicted to generate isoforms of 2,804 or 2,697 amino acids (Fig. 2c). Northern-blot analysis confirmed the predicted 9.8-kb transcript size and showed that NIPBL was strongly expressed in fetal and adult kidney, fetal liver, adult placenta, heart, skeletal muscle and thymus, but weakly or almost undetectably expressed in fetal and adult brain and lung and in adult liver, colon, small intestine and leukocytes (Supplementary Fig. 1 online).
We screened other individuals with CdLS for mutations in NIPBL and identified nine plausible point mutations, at least five of which arose de novo (Table 1 and Supplementary Fig. 2 online). As we found NIPBL mutations in individuals with severe and mild CdLS, phenotype variation can be explained, at least in part, by allelic heterogeneity. The spectrum and distribution of mutations imply that pathogenesis arises from loss or alteration of function of a single NIPBL allele. Our mutation detection rate was ∼50%. Locus heterogeneity could be a factor, but limitations of the screening methods are a plausible explanation for the comparatively low mutation detection rate. Considerable intrafamilial variation in phenotype, even between siblings with CdLS8, has been reported, suggesting that additional factors may be important.
Using BLAST and BLAT analyses we determined the full-length sequence of the mouse, rat and zebrafish NIPBL homologs (data not shown). The exon structure is very well conserved in vertebrates and specifies a protein of ∼2,800 amino acids. Sequence identities between human delangin and vertebrate orthologs are 96% (for mouse and rat) and 63% (for zebrafish; data not shown). TBLASTN searches against expressed-sequence tag databases showed that the two C-terminal isoforms are conserved in cow, pig, mouse, rat and chick (data not shown).
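Percent identity figures of this kind depend on how aligned positions are counted. As a minimal sketch, assuming a pre-computed pairwise alignment and the convention that columns containing a gap are excluded from the denominator (an assumption made for illustration, not necessarily the convention behind the values above), the calculation could look like this. The function name and the toy peptide strings are hypothetical and are not delangin sequence.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over ungapped columns of a pairwise alignment.

    Assumption for illustration: columns in which either sequence has a
    gap are skipped, so the denominator is the number of aligned residue
    pairs. Other conventions (e.g. dividing by alignment length) give
    different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy alignment (hypothetical strings): 6 ungapped columns, 5 identical.
    print(round(percent_identity("MKT-LLV", "MRTALLV"), 1))
```

Because the choice of denominator changes the result, identity values from different tools are comparable only when the same convention is used.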
We also identified delangin homologs in flies (Drosophila melanogaster Nipped-B, Anopheles gambiae XM_320088), worms (Caenorhabditis elegans PQN-85, C. briggsae CBG0727), plants (Arabidopsis thaliana NM_121558, Oryza sativa NM_186173) and fungi (Scc2 family of sister chromatid cohesion proteins). In each case, homology is largely confined to a segment of ∼1,500 amino acids spanning most of the delangin C-terminal half (Fig. 2c). Discounting small polyglutamine- and lysine-rich segments, most of the homologs do not share significant homology with any other protein sequence predicted from the relevant genome sequence. Because they are expected to be essential and most of their sequence shows homology to delangins, they may be viewed as orthologs. The pufferfish may be an exception: BLAT analyses suggest that there are two related NIPBL-like gene sequences.
Many of the fungal homologs have crucial chromosomal roles: Saccharomyces cerevisiae Scc2 and Schizosaccharomyces pombe Mis4 in sister chromatid cohesion9,10 and Coprinus cinereus Rad9 in meiotic chromosome pairing and DNA repair11. Some metazoan orthologs, however, are known or likely to be developmental regulators. By facilitating activation of remote enhancers, the D. melanogaster Nipped-B protein regulates the Ultrabithorax (Ubx) and Cut homeobox genes12,13. RNA-interference knock-down of the gene encoding C. elegans PQN-85 results in a high level of embryonic lethality; survivors have a paralyzed uncoordinated phenotype, body morphology defects and sometimes a vulval defect (J. Ahringer, personal communication).
Because embryonic expression can differ substantially between some human-mouse orthologs14, we carried out NIPBL in situ hybridization analyses on human embryonic tissue sections. The observed expression pattern was largely consistent with the CdLS phenotype (Figs. 3 and 4). NIPBL was expressed in developing limbs (Fig. 3a,b) and later in cartilage primordia of the ulna and of various hand bones (Fig. 4c). Sites of craniofacial expression included the cartilage primordium of the basioccipital and basisphenoid skull bones (Figs. 3a and 4f) and elsewhere in the head and face, including a region encompassing the mesenchyme adjacent to the cochlear canal (Fig. 4e,f).
NIPBL was also expressed in the spinal column, notochord and surface ectoderm (Figs. 3a,b and 4b), sclerotome and what seem to be migrating myoblasts (Fig. 3b). Expression in the developing heart was pronounced in the atrial and ventricular myocardium and in the ventricular trabeculae but absent in the endocardial cushions (Fig. 3a,b). NIPBL was also expressed in the developing esophagus, trachea and midgut loops (Fig. 3a), in the bronchi of the lung (Figs. 3b and 4d) and in the tubules of the metanephros (Fig. 4a).
Expression in organs and tissues not typically affected in CdLS (e.g., the developing trachea, bronchi, esophagus, heart and kidney) may reflect a bias towards underreporting of more subtle aspects of the phenotype or problems that typically present later in life. Respiratory and feeding difficulties and gastroesophageal reflux are known CdLS complications1,15, individuals with CdLS have a greater incidence of congenital heart abnormalities1,16 and renal abnormalities can be found in >50% of classical cases (A. Selicorni, personal communication). Expression in the mesenchyme surrounding the cochlear canal (Fig. 4e) may be related to the hearing impairment commonly found in CdLS17. Expression of NIPBL in embryonic brain was not evident, but the main neurodevelopmental deficits in CdLS are thought to occur during late gestation18,19.
The involvement of Nipped-B in activating the Ubx and Cut homeobox genes12 may provide insights into the molecular basis of CdLS pathogenesis. Ubx suppresses limb formation in the fly abdomen by repressing Distalless (Dll), a gene required for distal limb development20. The Dlx family of mammalian Dll homologs is involved in multiple developmental processes, including limb and branchial arch patterning, neurogenesis and hematopoiesis21. They are expressed in the apical ectodermal ridge of the developing limb bud, which partly coordinates limb outgrowth, and also in facial primordia. Therefore, the proximodistal limb patterning defect that underlies limb reduction in CdLS and possibly the associated facial abnormalities could largely be explained by inappropriate activation of DLX genes. Mutations in fly Cut cause leg and wing abnormalities. The mouse homolog Cutl2 (Cux2) is dynamically expressed in branchial arch and limb bud progress zones22, and so reduced expression of a human homolog in CdLS could also contribute to facial and limb abnormalities. The other mouse homolog, Cutl1, is widely expressed but important in lung development23.
A dual role for Nipped-B in sister chromatid cohesion and developmental regulation was recently confirmed24. Similar dual roles can be expected for vertebrate delangins, suggesting a possible parallel between CdLS and Roberts syndrome (OMIM 268300), which is characterized by growth retardation, limb reduction defects, craniofacial abnormalities and premature centromere separation. We assayed C-banded samples from individuals with CdLS for premature centromere separation but, perhaps unsurprisingly, detected no abnormalities; targeted knock-down of both alleles might be more informative. If delangin does have a dual functional role, the housekeeping role in facilitating trans interactions between sequences on sister chromatids could be satisfied with a basal level of expression. An additional role in enabling long-distance cis interactions (between a promoter and a remote enhancer) for select target genes could require strong expression in tissues and organs where the target genes are active. The Scc2–Nipped-B–delangin family provides a model system for investigating evolutionary diversification of protein function.
We used thymidine to synchronize phytohemagglutinin-stimulated blood cultures and carried out G-banding according to standard protocols25. For premature centromere separation assays, we carried out standard C-banding25 on fresh slides of samples from 12 individuals with CdLS. We scored 25 cells from each individual sample for premature centromere separation. We carried out chromosome FISH analysis by nick translation labeling of assorted genomic clones with SpectrumRed (Vysis) according to the manufacturer's instructions. Genomic clones included YAC, BAC and fosmid clones. We hybridized labeled probes along with a prelabeled chromosome 5p telomere-specific probe (Qbiogene) to metaphase chromosomes using standard methodology.
DNA sequencing and mutation screening.
We obtained IMAGE cDNA clones from the MRC Geneservice (see URL below). We sequenced all inserts using a combination of vector-specific and insert-specific primers and the MegaBACE ET system (Amersham). We screened for mutations by direct sequencing and, when exons were suitably small, by SSCP-heteroduplex analysis using standard protocols. In the latter case, we denatured PCR products, size-fractionated them in 1× MDE gels (BioWhittaker) containing 5% glycerol and 0.6× TBE buffer at 300 V for ∼20 h (depending on fragment size) and visualized them by silver staining. We sequenced any samples that had band differences relative to an unaffected control using the MegaBACE ET system (Amersham). If the primary chromatogram suggested the presence of a deletion or insertion, we cloned the PCR product and sequenced a number of transformants to confirm the change. We reamplified and resequenced all mutations to confirm that the change observed was not the result of base misincorporation by the DNA polymerase. We used a panel of genomic DNA samples from 45 individuals of European descent (mostly from the UK, some from Poland and Ireland) with CdLS. For each mutation, we also screened 200 normal chromosomes. We screened parental DNA (when available) to confirm that the observed mutation had occurred de novo.
Because of the very long coding sequence (8,412 nucleotides), our mutation screening protocol surveyed a subset (26) of the 46 coding exons, namely exons 2–8, 13–20, 23, 24, 30, 34–36, 38, 40, 43, 45 and 46. The coding sequence sampled in these 26 exons corresponds to ∼31% of the total. This means that approximately one-third of the coding sequence was sampled in 45 individuals and more than one-half of the proximal intronic sequence was also sampled for splicing mutations. On the basis of observed relative frequencies of splice site mutations and other mutations in large multiexon genes, and assuming that all affected individuals are heterozygotes (as expected from the strong evidence for autosomal dominant transmission26) and that there are no strong mutational hot spots in the coding sequence, the identification of 9 mutations in a panel of 45 individuals with CdLS (Table 1) equates roughly to a detection rate >50%. The relatively low mutation detection rate could reflect limitations of the mutation screening protocol: only coding sequences and proximal intronic sequences were analyzed, and the gene is large and possibly prone to undetected large-scale mutations.
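As a rough illustration of the arithmetic behind this estimate (a sketch, not the exact calculation), the figures quoted above can be combined under one simplifying assumption that is ours: mutations are spread uniformly over the coding sequence, so the fraction of individuals with a detected mutation scales with the fraction of coding sequence sampled. The sketch ignores the separate contribution of splice-site mutations and proximal intronic sequence. All function and constant names are hypothetical.

```python
def expand_ranges(spec):
    """Expand a spec such as "2-8, 13-20, 23" into a sorted list of exon numbers."""
    exons = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = part.split("-")
            exons.update(range(int(lo), int(hi) + 1))
        else:
            exons.add(int(part))
    return sorted(exons)


# Values taken from the text (ranges written with plain hyphens).
SCREENED_EXONS = expand_ranges("2-8, 13-20, 23, 24, 30, 34-36, 38, 40, 43, 45, 46")
CODING_NT = 8412          # length of the coding sequence in nucleotides
FRACTION_SAMPLED = 0.31   # share of coding sequence covered by the screen
MUTATIONS_FOUND = 9
PANEL_SIZE = 45


def naive_detection_rate(found, panel, fraction_sampled):
    """Extrapolate the observed hit rate to the whole coding sequence.

    Assumption (for illustration only): mutations are uniformly
    distributed, so observed_rate = true_rate * fraction_sampled.
    """
    observed = found / panel
    return observed / fraction_sampled


if __name__ == "__main__":
    print("exons screened:", len(SCREENED_EXONS))
    print("codons in coding sequence:", CODING_NT // 3)
    rate = naive_detection_rate(MUTATIONS_FOUND, PANEL_SIZE, FRACTION_SAMPLED)
    print("naive extrapolated rate:", round(rate, 2))
```

Under this naive assumption the extrapolated rate is about 0.65, which is in line with the stated figure of more than 50%. The codon count (8,412 / 3 = 2,804) also matches the length of the long isoform.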
Gene expression analyses.
We designed PCR primers to amplify a 424-bp cDNA probe spanning exons 10–12 of NIPBL, which should hybridize to transcripts encoding both the long and short isoforms (primer sequences are available on our website; see URL below). For northern-blot analyses, we labeled the probe to high specific activity with [α-32P]dCTP by random priming. After removing unincorporated nucleotides (NICK column, Amersham), we hybridized the probe against blots of human adult and fetal RNA (Clontech) containing ∼2 μg of mRNA per lane at 42 °C overnight in ULTRAhyb (Ambion). We washed the blots in 0.1× SSC, 0.1% SDS at 65 °C before exposing them to film. After removing the test probe, we rehybridized the blots with random-primed labeled control cDNA for human β-actin.
For tissue in situ hybridization, we cloned the 424-bp cDNA fragment into the pGEM-T Easy vector (Promega) and transcribed it with T7 and SP6 RNA polymerases incorporating DIG-11-UTP to generate labeled sense and antisense riboprobes, respectively. We generated additional isoform-specific probes to correspond to the long isoform of 2,804 amino acids and the short isoform of 2,697 amino acids (primer sequences are available on our website; see URL below). We hybridized the probes to sections of human embryonic tissue as described27. The isoform-specific probes generated expression patterns similar to those of the non-isoform-specific probe. We collected and used human embryonic tissue samples with ethical permission from the joint Ethics Committee of the Newcastle Health Authority and with appropriate signed consents. Samples were staged by microscopic examination. We fixed and processed tissue samples as previously described28. We selected material that had a normal karyotype and was unrelated to disease.
Sequences of primers used for expression are available at our Newcastle CdLS research website at http://www.ncl.ac.uk/ihg/cdls. Servers used for nucleotide sequence analysis were the US National Center for Biotechnology Information's BLAST server (http://www.ncbi.nih.gov/BLAST/), the University of California at Santa Cruz's BLAT genome search server (http://genome.ucsc.edu/cgi-bin/hgBlat), the Ensembl genome browser (http://www.ensembl.org/), the University of California at Santa Cruz genome browser (http://genome.ucsc.edu/cgi-bin/hgGateway), the NIX suite of nucleotide sequence analysis programs (http://www.hgmp.mrc.ac.uk/Registered/Webapp/nix/) and Baylor College of Medicine's sequence utilities programs (http://searchlauncher.bcm.tmc.edu/seq-util/seq-util.html). Servers for protein sequence analysis included PSORTII (http://psort.nibb.ac.jp) and the DomPred program for predicting protein domains (http://bioinf.cs.ucl.ac.uk/dompred/). The Rep program (http://www.embl-heidelberg.de/~andrade/papers/rep/search.html) allowed us to identify five HEAT repeats in the conserved C-terminal domain. Alignment of multiple orthologous sequences was aided by using the ClustalW program at http://searchlauncher.bcm.tmc.edu/multi-align/multi-align.html. IMAGE cDNA clones were obtained from the MRC Geneservice, available at http://www.hgmp.mrc.ac.uk/geneservice/index.html.
GenBank accession numbers.
Human NIPBL mRNA encoding the long delangin isoform, AJ627032; homologous mouse mRNA encoding the long delangin isoform, AJ627033; human NIPBL mRNA encoding the short delangin isoform, AJ640137; homologous mouse mRNA encoding the short delangin isoform, AJ640138; IMAGE clone 5784375, AJ627564.
Note: Supplementary information is available on the Nature Genetics website.
This paper is dedicated to the memory of F. Strachan (1921–2004). We thank many of our current Newcastle colleagues, especially S. Zwolinsky and J. Wolstenholme, for discussions, carrying out chromosome-banding analyses and conducting the premature centromere separation assay; H. Peters and D. Henderson for contributions to analysis of our expression data; S. Humphray for supplying fosmid clones; L. Jackson for facilitating the collaboration between M.B. and the Newcastle group; I. Krantz for discussions; A. Peaford and colleagues for their support; M. Walasek, M. Ireland and many other clinical geneticists for providing blood samples from individuals with CdLS and access to phenotype data; many individuals with CdLS and their families for their generosity; and previous members of the Newcastle CdLS research team, notably M. Smith, P. J. A. Eichhorn and B. Imamwerdi, for their earlier contributions. We thank the UK Community Fund and previously Action Research for providing funding for this project and the MRC-Wellcome Human Developmental Biology Resource for supplying human embryonic tissue samples.