Clinical relevance of postzygotic mosaicism in Cornelia de Lange syndrome and purifying selection of NIPBL variants in blood

Postzygotic mosaicism (PZM) in NIPBL is an important cause of Cornelia de Lange syndrome (CdLS) and can have major clinical implications. Here, we further delineate the role of somatic mosaicism in CdLS by describing a series of 11 previously unreported patients with mosaic disease-causing variants in NIPBL and by performing a retrospective cohort study at a Spanish CdLS diagnostic center. By reviewing the literature and combining our findings with previously published data, we demonstrate negative selection against somatic deleterious NIPBL variants in blood. Furthermore, the analysis of all reported cases indicates an unusually high prevalence of mosaicism in CdLS, occurring in 13.1% of patients with a positive molecular diagnosis. Notably, most affected individuals with mosaicism have a clinical phenotype at least as severe as that of individuals with constitutive pathogenic variants. The type of genetic change does not differ between germline and somatic events and, even in the presence of mosaicism, missense substitutions are located preferentially within the HEAT repeat domain of NIPBL. In conclusion, the high prevalence of mosaicism in CdLS and the disparity in tissue distribution provide a novel orientation for the clinical management and genetic counselling of families.


Genetic mosaicism is a well-described biological phenomenon characterized by the presence of genetically distinct cell lineages in the same individual due to postzygotic de novo mutational events. Far from being an exceptional condition, mosaicism appears to be the norm in humans: technical advances in DNA and RNA sequencing, which now reach single-cell resolution, have confirmed this theoretical hypothesis [1][2][3]. Postzygotic mosaicism (PZM) can involve a variety of mutation types, such as single-nucleotide substitutions, insertions, deletions, and copy-number variants (CNVs). The biological consequences of these mutations are mainly

Results
Novel postzygotic mosaic variants in 11 individuals with Cornelia de Lange syndrome. Here we report a total of 11 new cases of postzygotic mosaicism in individuals with CdLS from Germany, Italy and Spain. Based on their clinical CdLS score, 10 individuals showed a classic CdLS phenotype and only one showed a non-classic phenotype (Table 1). Global developmental delay and intellectual disability were consistently present (10/11). All patients presented the characteristic (classic) CdLS craniofacial features, such as synophrys, thick arched eyebrows, thin upper lip vermilion and downturned corners of the mouth (11/11). Upturned nasal tip (9/11) and elongated smooth philtrum (10/11) were also commonly observed. Regarding growth parameters, microcephaly (9/11) and postnatal growth retardation (9/11) were the anomalies most frequently observed (Table 1). All 11 individuals presented mosaic disease-causing variants in NIPBL. Using next
[Table 1, partially recovered. Country of origin of the 11 patients, in column order (G, IT, S; presumably Germany, Italy and Spain): G, G, G, G, IT, IT, IT, IT, S, S, S. Clinical score, in the same order: 8, 14, 12, 12, 13, 12, 14, 14, 14, 14, 14. The remaining rows, beginning with synophrys, were not recovered.]

Characteristics of mosaic variants in NIPBL.
Postzygotic mosaic NIPBL variants are scattered across the entire gene. Only one variant was shared by two unrelated individuals: NIPBL (RefSeq NM_133433) c.7168G>A; p.(Ala2390Thr) (Fig. 1a). One gross gene rearrangement causing a deletion of exons 2 to 32 was reported. Of the 37 point variants identified so far, 59.5% (22/37) are nonsense or frameshift variants, 16.2% (6/37) are splice variants, and 24.3% (9/37) are missense variants. A similar proportion of each variant type was observed for the de novo mutations (DNM) or germline variants in NIPBL currently deposited in ClinVar (Fig. 1b). Interestingly, eight of the nine mosaic missense substitutions are located within the HEAT repeat domain of NIPBL (Fig. 1c-e). A similar trend was observed for all the pathogenic and likely pathogenic constitutive variants described in ClinVar: nonsense, frameshift and splice variants are distributed all over the gene, whereas the vast majority of missense variants are located in the HEAT repeat domain (Fig. 1c,d).
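The percentages quoted above follow directly from the reported counts. As a minimal sketch of that arithmetic (the dictionary keys and the helper name `percent` are illustrative and not taken from the study), the figures can be re-derived as follows:

```python
# Counts of mosaic NIPBL point variants as reported in the text (37 in total).
# The key names are illustrative labels, not terminology from the study.
variant_counts = {
    "nonsense_or_frameshift": 22,
    "splice": 6,
    "missense": 9,
}


def percent(part: int, whole: int, digits: int = 1) -> float:
    """Return part/whole as a percentage rounded to `digits` decimals."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100 * part / whole, digits)


total_point_variants = sum(variant_counts.values())
proportions = {
    kind: percent(count, total_point_variants)
    for kind, count in variant_counts.items()
}

# Eight of the nine mosaic missense substitutions lie in the HEAT repeat domain.
heat_share_of_missense = percent(8, variant_counts["missense"])

if __name__ == "__main__":
    for kind, value in proportions.items():
        print(f"{kind}: {value}%")
    print(f"missense variants in HEAT domain: {heat_share_of_missense}%")
```

Adding the single exon 2 to 32 deletion to the 37 point variants gives the 38 mosaic NIPBL cases referred to in the following subsection.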
Purifying selection against NIPBL disease-causing variants in blood. Mosaic variants were detected by Sanger sequencing, pyrosequencing and/or NGS on genomic DNA. Blood and at least one additional tissue (cultured skin fibroblasts, saliva and/or buccal swabs, urine or muscle) were analyzed in 29 of the 38 cases with NIPBL mosaic variants. These 29 variants were detected by quantitative methods (NGS and/or pyrosequencing) in 12 cases and by non-quantitative Sanger sequencing in 17 cases. In all 29 cases, the genetic change in blood DNA was present at a very low allelic frequency or was undetected (Fig. 2).
High frequency of postzygotic mosaicism in Cornelia de Lange syndrome. Of the 12 studies on PZM in CdLS identified in the literature, four were cohort studies [12,13,26,27]. Because inclusion criteria differed among studies, we calculated the prevalence of PZM by dividing the reported number of patients with PZM by the total number of CdLS patients who received a molecular diagnosis. Across the studies, the frequency of PZM ranges from 7.9 to 27% (Fig. 3a). To evaluate the relevance of mosaicism in CdLS more comprehensively, we performed a detailed retrospective study in a Spanish cohort clinically diagnosed as  [28][29][30][31][32]. The 31 causative variants identified in blood were confirmed in all patients by Sanger sequencing in at least one additional biological sample (saliva, buccal swabs or fibroblasts), confirming that the variants identified were all constitutive. By array CGH and MLPA, the genetic cause of the disorder was detected in four patients. Two of them presented a microdeletion involving the RAD21 [33] and ARID1B [23] genes,  Notably, none of the variants could be detected by Sanger sequencing on DNA derived from peripheral blood. At the end of our study, a molecular diagnosis could not be assigned in four cases (9.3%, 4/43).
Hence the prevalence of somatic mosaicism in our cohort was 10.26% (4/39), when considering the individuals with a defined molecular diagnosis (Fig. 3b).
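The prevalence definition used here (mosaic cases divided by patients with a molecular diagnosis, not by the whole cohort) can be made explicit with a short sketch. The function and variable names below are hypothetical; the counts are the ones reported for the Spanish cohort.

```python
def prevalence_percent(cases: int, denominator: int, digits: int = 2) -> float:
    """Percentage of `cases` among `denominator` individuals."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    if not 0 <= cases <= denominator:
        raise ValueError("cases must lie between 0 and the denominator")
    return round(100 * cases / denominator, digits)


# Counts reported in the text for the retrospective cohort.
cohort_size = 43   # all clinically diagnosed patients analyzed
unsolved = 4       # no molecular diagnosis assigned
mosaic = 4         # patients with a postzygotic mosaic variant

# Denominator used in the paper: patients with a defined molecular diagnosis.
diagnosed = cohort_size - unsolved

pzm_prevalence = prevalence_percent(mosaic, diagnosed)           # 4/39
unsolved_rate = prevalence_percent(unsolved, cohort_size, 1)     # 4/43

if __name__ == "__main__":
    print(f"PZM prevalence among diagnosed patients: {pzm_prevalence}%")
    print(f"Cases without molecular diagnosis: {unsolved_rate}%")
```

Choosing the diagnosed patients as the denominator keeps the estimate comparable with the published cohort studies, which is the reason given in the text for this definition.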

Discussion
Currently, a molecular diagnosis is established in approximately 85% of patients with a clinical diagnosis of CdLS [14]. An invaluable tool for reaching this high percentage of solved cases is sensitive next-generation sequencing, in particular the incorporation of deep-sequencing targeted panels. With this approach, a set of genes can be analyzed simultaneously at very high sequencing depth, allowing the identification of genetic mosaicism, which is of special relevance in the context of CdLS [12,35,36]. Recently, it has been estimated that about 3% of causative de novo point variants in children with developmental disorders occurred as PZM [37]. So far, including the present work, five cohort studies have analyzed the prevalence of mosaicism in CdLS [12,13,26,27]. Across the studies, the frequency of PZM ranges from 7.9 to 27%. This variance could be explained by differences in the clinical characteristics of the patient cohorts, the inclusion criteria, the molecular analyses performed and the tissues analyzed. Despite these limitations and the very probable selection bias of the retrospective studies, when all five studies are taken into account, PZM has been identified in 13.1% of the individuals who received a molecular diagnosis, which represents an unusually high frequency of somatic mosaicism in this genetic disease. In other syndromes, many mosaic cases could go unnoticed because a mosaic variant potentially causes a less severe and/or more variable phenotype than the equivalent constitutive variant [4]. However, this is not the case for CdLS, since CdLS patients with somatic mosaicism may present with clinical manifestations as severe as those of individuals harboring a heterozygous loss-of-function variant in a known causative gene. In fact, 22 of the 26 mosaic patients for whom clinical data were available (including the 11 reported in this paper) showed a classic CdLS phenotype.
In the vast majority of mosaic cases described in association with CdLS, NIPBL is the affected gene. The genetic variant type (frameshift, nonsense, missense or splice variant), as well as its distribution over the gene, does not appear to be influenced by the mosaic condition. However, it seems remarkable that the majority of missense variants found in NIPBL lie within the HEAT repeat domain, a very important region for the functionality of the protein. The structure of this domain was recently solved using cryo-electron microscopy (cryo-EM) [38]. The structure suggests that the domain is involved in binding a segment of the central unstructured domain of RAD21 as well as the DNA molecule, thus reinforcing the hypothesis that the HEAT repeat domain plays a central role in the function of NIPBL and the cohesin complex. Somatic mosaicism has also been reported for pathogenic variants in other CdLS-related genes [13][39][40][41]. Further studies based on deep sequencing are needed to better characterize mosaicism in non-NIPBL genes and to draw conclusions about the frequency of mosaic variants in each CdLS causative gene. Besides CdLS, this phenomenon has also been described for other chromatinopathies, including Rubinstein-Taybi syndrome (CREBBP) [42,43], Wiedemann-Steiner syndrome (KMT2A) [26] and Coffin-Siris syndrome (ARID1A) [44].
Despite great progress in DNA sequencing techniques, mosaicism of pathogenic variants as a cause of CdLS is frequently missed because genomic DNA from peripheral blood cells is used as the standard sample for routine genetic diagnostics. Unfortunately, the majority of mosaic events in CdLS were detected in DNA derived from buccal cells, saliva, urine, fibroblasts and/or skeletal muscle, whereas none of the cases described shows an overrepresentation of the mutant allele in DNA from peripheral blood. One could argue that this particularity arises because other tissues are analyzed only when a pathogenic variant cannot be found in blood. Nevertheless, in this work we analyzed other tissues in all the patients in whom causative variants had been detected in blood, and we did not identify any mosaic. This suggests PZM, or genetic reversion, followed by negative selection against mutated clones in blood. Reversion is a rare phenomenon mainly described in skin and hematological diseases and associated with milder phenotypes than constitutive cases [45]. However, given the severity of mosaic cases in CdLS, the heterogeneous allele distribution observed among tissues, and the fact that back mutations would have to be unusually frequent across the various NIPBL causative variants, there is no evidence supporting reversion in CdLS.
It is assumed that the extent of mosaicism across different tissues of a patient depends, at least in part, on the time at which the mutation occurs during early embryogenesis, the relative size of the founding population, and the cell fitness and quality. The clinical severity observed in mosaic CdLS patients suggests that the pathogenic variants arise early in development. More precisely, the presence of these variants in cells from different germ layers indicates that the mutational event might have taken place after the zygotic stage but before gastrulation.
Thus, the specific absence of causative NIPBL variants in blood cannot be explained by the time of occurrence of the mutational event. Instead, it seems that the functional alterations in cells due to these variants could lie behind the mosaicism dynamics.
Several mechanisms of genetic selective pressure have been proposed. For example, the DNA damage response and the unfolded protein response are implicated in the cell-autonomous elimination of altered cells, whereas the innate immune system or local competitive interactions between neighboring cells may drive the expansion or elimination of cells harboring pathogenic variants in a non-cell-autonomous manner [8]. Recently, it has been demonstrated that cells derived from CdLS patients display defective DNA damage signaling and repair [46]. Indeed, NIPBL is already known to have important roles in 3D genome organization and stability [47], and its knockdown has been directly correlated with higher levels of DNA damage [48]. It seems likely that the mutated cell population has a selective growth disadvantage relative to unaffected cells, leading to the expansion of wild-type clones in bone marrow. A similar phenomenon of somatic rescue events specifically in blood has been demonstrated in some genome instability syndromes, such as Fanconi anemia or Bloom syndrome [49], in which pathogenic variants in genes related to DNA damage and repair seem to reduce the fitness of hematopoietic stem and progenitor cells (HSPCs) and drive clonal selection and expansion of non-diseased cells [50]. In any case, a better understanding of mosaicism dynamics and of the forces that drive the generation and shaping of somatic mosaicism in CdLS will provide new insights into a fundamental biological process and will enhance our understanding of the pathological mechanisms of this disease.
This phenomenon of negative selection against somatic deleterious variants in blood may be more common than reported so far. A recent large-scale RNA-seq analysis of samples from individuals of the Genotype-Tissue Expression (GTEx) cohort revealed that fewer than half of disease-causing mosaic variants in genes expressed in blood were detectable in blood-derived DNA [3]. Furthermore, selective genetic segregation in blood has also been described in some genetic disorders, such as Pallister-Killian syndrome [51], and even in some mitochondrial diseases [52], in which genetic testing begins with urine or fibroblast samples instead of blood. The current recommendation for CdLS is to conduct a mosaicism study using fibroblasts, buccal cells or bladder epithelial cells when targeted panel or Sanger sequencing does not detect causal variants in lymphocytes [14,53]. We are well aware of the problems involved in collecting some biological samples and of the technical challenges of obtaining high-quality DNA from some of them. Thus, simultaneous collection of blood samples and buccal swabs may be a plausible option when a patient is suspected of having CdLS. Preferably, if the quality and quantity of DNA extracted from the buccal swab meet the same standards as those established for blood samples, first-line molecular testing should analyze DNA derived from the buccal swab using a deep targeted gene panel containing at least the eight known CdLS causative genes. If the panel detects a causal variant, it should be confirmed by Sanger sequencing of DNA derived from blood to evaluate mosaicism.
Besides the above considerations, the high prevalence of mosaicism and the disparity in tissue distribution can have major clinical implications in CdLS for parental counselling about recurrence risk. In routine clinical practice, risk is in principle classified according to variant heritability: hereditary (high risk), DNM (low risk) and PZM (minimal risk). However, since blood is usually the only sample analyzed in parents to determine heritability, it is very likely that parental mosaicism events are being missed. Indeed, several cases of apparently unaffected parents with very low levels of somatic mosaicism have been identified in CdLS [54], for which a germline mosaicism rate of 4% has been estimated [55]. Thus, deep sequencing of DNA derived from buccal cells or fibroblasts would be a reliable way to investigate somatic mosaicism in patients and parents and, subsequently, to estimate recurrence risk. In this context, the recurrence risk for future pregnancies could be split into four groups based on the type of pathogenic variant found in the probands and their parents: high (parental constitutive variant), moderate (parental gonadosomic and/or germline mosaic variant), low (germline DNM) and minimal (PZM in child).
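The four-group scheme proposed above amounts to a simple lookup from the origin of the variant to a risk tier. The sketch below only restates that mapping; the enum and function names are hypothetical, and it is an illustration of the classification logic, not a counselling tool.

```python
from enum import Enum


class VariantOrigin(Enum):
    """Origin of the pathogenic variant, as established in proband and parents.

    The member names are illustrative labels for the four situations
    described in the text.
    """
    PARENTAL_CONSTITUTIVE = "parental constitutive variant"
    PARENTAL_MOSAIC = "parental gonadosomic and/or germline mosaic variant"
    GERMLINE_DE_NOVO = "germline de novo mutation (DNM)"
    POSTZYGOTIC_IN_CHILD = "postzygotic mosaicism (PZM) in child"


# Risk tiers exactly as listed in the text: high, moderate, low, minimal.
RECURRENCE_RISK = {
    VariantOrigin.PARENTAL_CONSTITUTIVE: "high",
    VariantOrigin.PARENTAL_MOSAIC: "moderate",
    VariantOrigin.GERMLINE_DE_NOVO: "low",
    VariantOrigin.POSTZYGOTIC_IN_CHILD: "minimal",
}


def recurrence_risk(origin: VariantOrigin) -> str:
    """Return the qualitative recurrence-risk group for a variant origin."""
    return RECURRENCE_RISK[origin]


if __name__ == "__main__":
    for origin in VariantOrigin:
        print(f"{origin.value}: {recurrence_risk(origin)} risk")
```

The mapping assumes that the variant origin has already been determined reliably, which, as argued above, may require testing tissues other than blood in both patients and parents.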
In conclusion, the high prevalence of mosaicism in CdLS as well as the likely purifying selection against disease-causing variants in blood should be considered when molecular diagnosis of the proband and familial co-segregation studies are planned.

Material and methods
Patient recruitment and data collection. All mosaic patients were recruited as part of an international collaboration between investigators from Spain, Germany and Italy. The study was performed according to the Declaration of Helsinki protocols and was approved by each Regional Ethics Committee of Clinical Research. Informed consent was obtained from the parents or guardians of all individuals included in this study, and from all the parents in whom inheritance of the variants was evaluated. Patients with mosaic disease-causing variants in NIPBL were phenotyped by a clinical geneticist, a pediatrician or a trained physician. Clinical data were collected using a standard restricted-term questionnaire, and detailed phenotypes of the individuals were entered by the patients' clinicians using the Human Phenotype Ontology (HPO) nomenclature. Clinical scores for CdLS were calculated according to the published international consensus guidelines [14] Table 2.
For the retrospective study, all patients were subjected to molecular analysis in the Clinical Genetics and Functional Genomics Group at the University of Zaragoza.
DNA isolation. Genomic DNA was isolated from blood lymphocytes using the conventional phenol-chloroform-isoamyl alcohol method, from oral mucosa epithelial cells using prepIT.L2P (DNA Genotek Inc.), and/or from fibroblast samples using the PureLink™ Genomic DNA kit (Invitrogen), according to the manufacturers' protocols. Quality and concentration of gDNA were determined using both Qubit Fluorometric Quantitation (Thermo Fisher Scientific) and the Nanodrop 2000 (Thermo Fisher Scientific).
Sanger sequencing. Independent PCR followed by Sanger sequencing was performed to confirm the reportable SNVs and indel variants detected by NGS and for co-segregation analyses. Primers were designed using the Primer-BLAST in silico tool (https://www.ncbi.nlm.nih.gov/tools/primer-blast/) and checked in the UCSC In-Silico PCR tool (https://genome.ucsc.edu/cgi-bin/hgPcr). All primer sequences and annealing temperatures are presented in Supplementary Table 2. PCR products were sequenced on an ABI3730xl Capillary Electrophoresis Sequencing System (Applied Biosystems) according to the manufacturer's protocol. Sequences were analyzed and compared with the reference sequences using the Analysis Module Variant Analysis (VA) software (Applied Biosystems) and the Ensembl and NCBI databases.

Next-generation sequencing.
MLPA and CGH array. If the panel did not detect causal variants, multiplex ligation-dependent probe amplification (MLPA) and/or array comparative genomic hybridization (aCGH) were performed. MLPA was used to search for genomic copy number variations in the NIPBL gene. The SALSA P141/P142 NIPBL MLPA kit (MRC-Holland, Amsterdam, The Netherlands) was used following the manufacturer's instructions. The reaction products were separated by capillary electrophoresis on an ABI Prism 3130XL Analyzer (Applied Biosystems), and the results were analyzed using GeneMapper software (Thermo Fisher Scientific). aCGH analyses were performed with the qChip Post oligonucleotide microarray (Quantitative Genomic Medicine Laboratories, Barcelona, Spain).

Systematic review.
We systematically searched the literature in the PubMed, Web of Science and EMBASE databases from 2005 to 2020. The search strategy included the keywords "de Lange Syndrome", "mosaicism", "somatic mosaicism", and "postzygotic mutation". We also manually checked the reference lists of relevant articles and reviews. Trials, case reports, cohort studies and reviews were included. After full-text review, papers describing patients diagnosed with CdLS according to standard clinical criteria and carrying pathogenic variants in one of the major causative genes of CdLS were included. Asymptomatic familial cases and germline mosaicism were excluded.

Statistical analyses and figures.
Statistical analyses and graphics were produced with GraphPad Prism 6 software. Data sets were compared by chi-square test where appropriate. Differences were considered statistically significant at p values below 0.05. *p < 0.05; **p < 0.01; ***p < 0.001; ****p < 0.0001. All statistical analyses are explained in the figure legends. Figure 1e was generated using the PyMOL Molecular Graphics System (https://pymol.org/; Schrödinger, LLC, Portland, OR) and the information contained in the Protein Data Bank structure 6WGE [38].
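For readers who wish to reproduce this type of comparison outside GraphPad Prism, the following is a minimal sketch of a Pearson chi-square test of independence on a contingency table. It is not the authors' analysis: the function names are hypothetical, and the second row of the example table is a synthetic sanity check built to have exactly the same proportions as the mosaic variant counts, not real ClinVar data.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    `table` is a list of rows of observed counts; every row and column
    total must be positive so that all expected counts are defined.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


def p_value_two_dof(statistic):
    """Upper-tail p value of the chi-square distribution with exactly 2
    degrees of freedom, for which the survival function is exp(-x / 2)."""
    return math.exp(-statistic / 2)


# Row 1: mosaic variant counts reported in the text
# (nonsense/frameshift, splice, missense).
# Row 2: SYNTHETIC placeholder with identical proportions (10x row 1),
# used only to check that identical distributions give a statistic of 0.
sanity_table = [
    [22, 6, 9],
    [220, 60, 90],
]

statistic, dof = chi_square_independence(sanity_table)
p_value = p_value_two_dof(statistic)

if __name__ == "__main__":
    print(f"chi-square = {statistic:.3f}, dof = {dof}, p = {p_value:.3f}")
```

With real germline counts in the second row, a p value below 0.05 would indicate a difference in variant-type distribution under the significance threshold stated above. For tables with other degrees of freedom, a general chi-square survival function from a statistics library would be needed in place of `p_value_two_dof`.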