First interchromosomal insertion in a patient with cerebral and spinal cavernous malformations

Autosomal dominant cerebral cavernous malformations (CCM) are leaky vascular lesions that can cause epileptic seizures and stroke-like symptoms. Germline mutations in either CCM1, CCM2 or CCM3 are found in the majority of patients with multiple CCMs or a positive family history. Recently, the first copy number neutral inversion in CCM2 has been identified by whole genome sequencing in an apparently mutation-negative CCM family. We here asked the question whether further structural genomic rearrangements can be detected within NGS gene panel data of unsolved CCM cases. Hybrid capture NGS data of eight index patients without a pathogenic single nucleotide, indel or copy number variant were analyzed using two bioinformatics pipelines. In a 58-year-old male with multiple CCMs in his brain and spinal cord, we identified a 294 kb insertion within the coding sequence of CCM2. Fine mapping of the breakpoints, molecular cytogenetic studies, and multiplex ligation-dependent probe amplification verified that the structural variation was an inverted unbalanced insertion that originated from 1p12-p11.2. As this rearrangement disrupts exon 6 of CCM2 on 7p13, it was classified as pathogenic. Our study demonstrates that efforts to detect structural variations in known disease genes increase the diagnostic sensitivity of genetic analyses for well-defined Mendelian disorders.


Results
Clinical findings. The male index patient III:1 (pedigree 1, Fig. 1a) was referred to genetic counselling at the age of 58 years because of multiple symptomatic CCMs. Magnetic resonance imaging (MRI) documented at least nine cavernous lesions with surrounding hemosiderin deposits in his brain and spinal cord (Fig. 1b). One brainstem cavernoma and three CCMs in his thoracic, cervical, and lumbar spine had already been resected because of acute bleeding events at the age of 44, 50, and 58, respectively (Fig. 1c, Supplementary Video S1). The index patient also reported an atrial septal defect and a traumatic L1 vertebral body fracture at the age of 55 years which had been treated with balloon kyphoplasty. On physical examination, he presented a partial loss of vibration and fine touch sensation below the level of T10 and impaired sensation of temperature on the contralateral side. In addition, he had an ataxic gait with right foot drop. The index patient's father (II:2) died of gastric cancer when he was 72 years old. Cavernous lesions or neurological symptoms that would be suggestive for CCM had not been documented for him. The mother of the index patient (II:3) passed away at the age of 90 years with no signs of CCM either. Both children (IV:1 and IV:2) are healthy, and there were no other relatives who had CCMs or reported epileptic seizures, stroke-like symptoms or chronic headaches.

Identification of an interchromosomal insertion. Gene panel sequencing of CCM1, CCM2, and CCM3
for index patient III:1 (Fig. 1) and routine bioinformatics analyses identified no pathogenic SNV, indel or CNV. However, a high number of split reads in CCM2 was noticed in NGS gene panel data of this proband. These deviant reads could be grouped into two clusters based on their mapping. Reads of the first group mapped with one part to the 5' half of exon 6 of CCM2, which is located on 7p13, and with the other part to 1p11.2 or 1q21.1. Due to known sequence homologies, aberrant mapping on chromosome 1 could not be further specified. Split reads of the second group partially mapped to the 3' half of CCM2 exon 6 but also to 1p12. Notably, detection of deviant reads required the SeqNext module of the Sequence Pilot tool for mapping and alignment. Using the mapping and alignment information of the MiSeq Reporter Software resulted in significantly reduced read depth at positions [hg19] chr7:45,108,098-45,108,101. However, split reads were filtered out by SeqNext and no variant was called since there were no reads that covered a heterozygous deletion of these four nucleotides (Fig. 2a).
Based on the NGS data of index patient III:1, we suspected an interchromosomal insertion from 1p12-p11.2 into the coding region of CCM2 on 7p13 (Fig. 2b). This rearrangement was finally made visible by fluorescence in situ hybridization (FISH) on metaphase chromosomes from cultured blood lymphocytes of the index patient (Fig. 2c). The inserted fragment covers a genomic region of 294 kb that contains the complete NOTCH2 (MIM: 600275), ADAM30 (604779), and NBPF7 (613997) genes as well as parts of REG4 (609846). MLPA analysis with a centromere kit further indicated the presence of three copies of NOTCH2 which is the only gene in the inserted fragment with an associated phenotype (Fig. 2d). Loss-of-function mutations and pathogenic missense variants in this gene have been reported in patients with Alagille (MIM: 610205) and Hajdu-Cheney syndrome (102500). However, the index patient did not meet the clinical diagnostic criteria of either syndrome. fine mapping of the breakpoints. We next amplified the breakpoints of the interchromosomal insertion for its precise molecular characterization. Sanger sequencing allowed us to fine map the breakpoints to [hg19] chr1:120,347,265, chr1:120,641,440, and chr7:45,108,098-45,108,101, respectively. It also verified that the fragment of 1p12-p11.2 was inserted in an inverted orientation on 7p13 and flanked by a small insertion and a deletion-insertion variant in which the four base pairs with reduced sequencing depths were replaced by six www.nature.com/scientificreports www.nature.com/scientificreports/ nucleotides of unknown origin (Fig. 3). While the breakpoint on 1p11.2 locates within a long interspersed nuclear element (LINE), no repeat elements were found in close proximity to the breakpoints on 1p12 or 7p13. As the heterozygous SV interrupts exon 6 of the CCM2 gene, it was classified as pathogenic for CCM. In summary, the results of our molecular, cytogenetic, and molecular cytogenetic studies demonstrated that index patient III:1 is a heterozygous carrier of a submicroscopic, unbalanced, interchromosomal, and inverted insertion.
Screening for SVs in mutation-negative ccM patients. We finally wanted to know if SVs can also be identified in the hybrid capture NGS data of seven additional CCM patients without any pathogenic SNV, indel or CNV in CCM1, CCM2 or CCM3 that had been analyzed between 2017 and 2019 (Supplementary Table S1). Therefore, we chose the Agilent SureCall tool, which allowed us to search for translocation events based on the presence of split reads in their NGS data. The index patient III:1 from family 1 was used as a positive control. The tool precisely re-identified his SV but did not detect any further chromosomal rearrangements in the other CCM cases. Furthermore, no split reads were found by visual inspection of their NGS data.

Discussion
In this study, we have identified the first interchromosomal insertion, sometimes also referred to as interchromosomal insertional translocation (IT), in a CCM patient. Notably, current genetic analysis cannot clarify the molecular basis of CCM disease in 2 to 13% of familial cases and up to 40% of sporadic CCM patients 1 . Besides phenocopies, somatic mosaicism and pathogenic variants in an unknown CCM4 gene or non-coding regions of CCM1, CCM2 or CCM3 have been discussed as possible explanations for these patients [21][22][23] . However, the identification of an interchromosomal insertion in one of eight unsolved CCM cases and the copy number neutral inversion in CCM2, that we have reported recently 14 , demonstrate that SVs also need to be considered. It is remarkable that both rearrangements identified so far have been detected in CCM2 which is the largest of the three CCM genes. However, future studies will have to show whether CCM2 is more susceptible to the occurrence of SVs or whether further genomic rearrangements can be detected in CCM1 and CCM3 with more sensitive techniques.
ITs and other SVs have a higher prevalence than previously thought 20,24,25 . FISH confirmation and parental follow-up studies of CNVs that had been identified by clinical array comparative genomic hybridization (aCGH) analysis revealed that ITs can be found in approximately 1 of 500 patients referred to aCGH analysis 24 . This observation suggests that their prevalence might also be underestimated in patients with CCM or other inherited disorders for whom aCGH is not part of standard genetic analyses. Sophisticated bioinformatics algorithms have been developed to identify SVs from short-read NGS data based on inconsistent paired-end mapping, the www.nature.com/scientificreports www.nature.com/scientificreports/ presence of split reads or changes in read depth 26,27 . While CNVs can be efficiently detected with NGS-based approaches, inversions and ITs that do not cause gain or loss in the number of CCM1, CCM2 or CCM3 alleles might be regularly missed with targeted gene panel sequencing. Furthermore, the analytical sensitivity of the available bioinformatics tools is still incomplete. Notably, only one of the two tools that were used for mapping and alignment in our current study allowed us to detect aberrant reads in the NGS data analysis for the index patient from pedigree 1.
The search for chromosomal rearrangements is becoming more and more important in genetic research projects that try to solve undiagnosed cases with rare diseases. While the detection of ITs within the coding region of a gene seems to be relatively straightforward with special bioinformatics tools and filter criteria, insertions into intronic, promotor or cis-regulatory regions that can also impair gene regulation 16 can hardly ever be detected with PCR-or hybrid capture-based target enrichment strategies and short-read sequencing. Recently, short-read WGS and a new bioinformatics pipeline revealed disease-causing SVs in 16 out of 477 patients of the Undiagnosed Diseases Network (UDN) 28 . However, third-generation sequencing technologies will probably further improve the diagnostic sensitivity for the identification of chromosomal rearrangements as the probability of spanning the whole SV and the mappability in repetitive regions are higher for long reads 29,30 . In a cohort of 1,324 undiagnosed probands with rare diseases from the NIHR BioResource research study, three cases with a complex pathogenic SV were identified by short-read WGS and a duplication-inversion-duplication event of unknown clinical significance was resolved in another proband by long-read sequencing 31 . Whole genome long-read sequencing also identified a heterozygous ~2.2 kb deletion in PRKAR1A which is a known disease gene for autosomal dominant Carney complex in a patient with multiple tumours for whom initial targeted PRKAR1A sequencing and short-read WGS had not revealed any pathogenic variant 32 . Several other examples illustrate that long-read sequencing enables effective SV calling and could help to solve undiagnosed cases 33 . Therefore, long-read WGS is a promising next step for the yet unsolved CCM cases of our cohort. These have already been checked for CNVs in CCM1, CCM2, and CCM3 but third-generation sequencing technologies would allow the detection of CNVs and copy number neutral SVs in a genome-wide approach. Nevertheless, the combination of multiple sequencing technologies might be still necessary to achieve sufficient insertion and inversion detection sensitivities 20 .
The identification of a pathogenic SNV, indel, CNV or SV is essential for genetic counselling of CCM families and always raises the question whether it is an inherited or a de novo variant. As both parents of the index patient from pedigree 1 died without CCMs, epileptic seizures or stroke-like symptoms, one might speculate that his SV is a de novo mutation. In line with this hypothesis, a recent analysis of the Genome Aggregation Database (gnomAD) Consortium suggested a de novo mutation rate of 0.35 SVs per generation (95% confidence interval: 0.18-0.52) 25 . Given that a significant number of SVs might have been missed with short-read WGS, this projection likely underestimates the real number of de novo SVs per genome. However, we can also not exclude a cryptic inherited case in pedigree 1 since the penetrance of CCM is incomplete and up to 45% of CCM2 mutation carriers remain asymptomatic 9 . Indeed, Nowakowska and colleagues demonstrated that a significant number of apparently de novo, interstitial CNVs that had been found in patients with multiple congenital anomalies or mental retardation were actually the result of an unbalanced transmission of a derivative chromosome from one parent with a balanced insertional translocation 34 . Furthermore, the transmission from an obviously unaffected parent that carries the same unbalanced rearrangement as its affected child has also been reported 24 . Unfortunately, DNA samples of the index patient's parents were not available and there were no other affected family members to address the origin of the chromosomal rearrangement or to demonstrate co-segregation of the SV and CCM disease.
In conclusion, our study adds the first interchromosomal insertion to the CCM mutation spectrum and suggests that hard-to-detect SVs might account for more CCM cases than previously thought. Together with other literature reports of copy number neutral chromosomal rearrangements in known disease genes 14,35,36 , our results support the hypothesis that efforts to detect SVs might be more promising than the search for novel candidate genes for well-defined Mendelian disorders.

Methods
Study population and ethical considerations. Genomic DNA was isolated from peripheral blood lymphocytes of all study participants with written informed consent using the NucleoSpin Blood L Kit according to the manufacturer's instructions (Macherey-Nagel, Düren, Germany). A Qubit 2.0 Fluorometer (Thermo Fisher Scientific, Waltham, USA) was used to measure DNA concentrations before NGS target enrichment. DNA purity was determined on a NanoPhotometer instrument (Implen, München, Germany). All procedures performed in this study involving human participants were in accordance with the 1964 Helsinki declaration and its later amendments.The study protocol was approved by the local ethics committee of the University Medicine Greifswald (BB 047/14) and written informed consent was obtained from index patient III:1 to publish the medical information and images that are presented here.
Target enrichment, next-generation sequencing, and bioinformatics analyses. www.nature.com/scientificreports www.nature.com/scientificreports/ were independently performed for each sample with the MiSeq Reporter Software and the SeqNext module of the Sequence Pilot software (JSI medical systems, Ettenheim, Germany). The SeqNext module was also used in a read depth-based approach to identify CNVs in CCM1, CCM2, and CCM3 as described previously 37 . BAM files that had been generated with the MiSeq Reporter Software were further analyzed for translocation events with the SureCall 4.1.1.5 software (Agilent Technologies), and aligned reads were grouped by the mate chromosome in the Integrative Genomics Viewer 38 .
Sanger sequencing and multiplex ligation-dependent probe amplification. Specific primer pairs were designed to amplify the suspected breakpoints of a novel interchromosomal insertion identified in CCM2. Primer sequences are available upon request. PCR products were purified with the ExoSAP-IT Cleanup Reagent (Thermo Fisher Scientific). Sanger sequencing was performed on a SeqStudio Genetic Analyzer (Thermo Fisher Scientific) following established protocols. The SALSA MLPA P181-B1 Centromere mix 1 (MRC-Holland, Amsterdam, The Netherlands) was used according to the manufacturer's instructions to determine copy number variations of a part of the NOTCH2 gene (MIM: 600275) which is located on chromosome 1p12 (hg19: 120,454,176-120,612,317). chromosome analysis and molecular cytogenetic studies. Karyotyping was performed following standard procedures on metaphases obtained from PHA stimulated blood lymphocytes. Fluorescence in situ hybridization (FISH) was used to confirm the interchromosomal insertion using standard protocols. The following BAC clones were used: RP11-425C16 in 1p12 (hg19: 120,176,963-120,358,983) and RP11-111G20 in 7p12.3~13 (hg19: 45,283,437-45,448,789). ethical approval. All procedures performed in this study involving human participants were in accordance with the ethical standards of the institutional and/or national research committee (University Medicine Greifswald; BB 047/14) and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Data availability
All relevant data generated or analysed during this study are included in this published article and its supplementary information files.