The clinical-phenotype continuum in DYNC1H1-related disorders—genomic profiling and proposal for a novel classification

Mutations in the cytoplasmic dynein 1 heavy chain gene (DYNC1H1) have been identified in rare neuromuscular (NMD) and neurodevelopmental (NDD) disorders such as spinal muscular atrophy with lower extremity dominance (SMALED) and autosomal dominant mental retardation syndrome 13 (MRD13). Phenotypes and genotypes of ten pediatric patients with pathogenic DYNC1H1 variants were analyzed in a multi-center study. Data mining of large-scale genomic variant databases was used to investigate domain-specific vulnerability and conservation of DYNC1H1. We identified ten patients with nine novel mutations in the DYNC1H1 gene. These patients exhibit a broad spectrum of clinical findings, suggesting an overlapping disease manifestation with intermixed phenotypes ranging from neuropathy (peripheral nervous system, PNS) to severe intellectual disability (central nervous system, CNS). Genomic profiling of healthy and patient variant datasets underlines the domain-specific effects of genetic variation in DYNC1H1, specifically on tolerance toward missense variants in the linker domain. A retrospective analysis of all published mutations revealed domain-specific genotype–phenotype correlations, i.e., mutations in the dimerization domain were associated with reductions in lower limb strength in DYNC1H1–NMD, and mutations in the motor domain with cerebral malformations in DYNC1H1–NDD. We highlight that the current classification into distinct disease entities does not sufficiently reflect the clinical disease manifestation that clinicians face in the diagnostic work-up of DYNC1H1-related disorders. We propose a novel clinical classification for DYNC1H1-related disorders encompassing a spectrum from DYNC1H1–NMD with an exclusive PNS phenotype to DYNC1H1–NDD with concomitant CNS involvement.

In this multi-center study, we report the clinical course of ten pediatric patients with DYNC1H1-associated phenotypes with nine novel pathogenic variants, highlighting the broad clinical heterogeneity of dyneinopathies and proposing a new clinical classification for DYNC1H1-related disorders.

Genetic investigations
We included ten patients (P1-P10) with DYNC1H1 (NM_001376.5) mutations in this multi-center study. All parents and/or patients gave informed consent. The study was approved by the ethics committee of the Medical Faculty, University of Cologne.
Genetic investigations of P1 via a next-generation sequencing panel (MYO-SEQ project at Newcastle University) revealed a heterozygous variant in DYNC1H1. Various genes had previously been sequenced without detection of putative pathogenic variants, at a time when whole-exome sequencing (WES) was not readily available [11]. Subsequently, the DYNC1H1 variant was identified in the father by Sanger sequencing. Later, P1 and both parents underwent trio-WES, and the DYNC1H1 variant was confirmed in P1 and his father [11].
P2 was diagnosed via a targeted next-generation sequencing gene panel following enrichment using highly specific Molecular Inversion Probes [12]. For P3, we used a commercial targeted next-generation sequencing panel covering the coding regions of 61 NMD genes (spinal muscular atrophy and limb-girdle muscular dystrophies panel, MGZ München).
P4 received trio-WES after enrichment with the IDT xGene exome research panel followed by 2 × 150 bp sequencing with a mean target coverage of 100-fold on an Illumina NextSeq500 Sequencer. Alignment (mapping to GRCh37/hg19), variant identification (SNPs and indels), variant annotation, and filtering were performed using the CLC Biomedical Genomics Workbench (Qiagen, Hilden, Germany). Variants were filtered with a focus on protein-altering variants (missense, frame-shift, splice-site, and premature stop codons) that were rare in or absent from (de novo variants) public databases (gnomAD and 1000 Genomes project), as previously described [13].
In P5, we performed trio-WES after enrichment with the Agilent SureSelect V6 kit (Agilent, USA) on an Illumina HiSeq 4000 Sequencer (Illumina, USA) with a 2 × 75 bp sequencing protocol according to the manufacturer's best-practice protocol. A mean coverage of 85 was achieved for the patient and father, and a mean coverage of 80 for the mother [14,15]. The sequencing data were analyzed using version 2.20 of the Cologne Center for Genomics exome pipeline [16]. The annotated variant lists were uploaded to the Cologne Center for Genomics Varbank (https://varbank.ccg.uni-koeln.de) database for variant filtering. Since a trio-WES was available, we additionally performed calling and filtering for de novo variants using deNovoGear [17].
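The filtering logic described for P4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the CLC Biomedical Genomics Workbench workflow: the `Variant` fields, the consequence labels, and the allele-frequency cutoff `max_af` are hypothetical, since the text does not specify a numeric rarity threshold.

```python
from dataclasses import dataclass
from typing import List, Optional

# Consequence classes the text lists as "protein-altering".
# The label strings themselves are hypothetical annotation names.
PROTEIN_ALTERING = {"missense", "frameshift", "splice_site", "stop_gained"}


@dataclass
class Variant:
    gene: str
    consequence: str             # hypothetical annotation label
    gnomad_af: Optional[float]   # None means absent from the population database
    in_father: bool
    in_mother: bool


def is_rare(v: Variant, max_af: float) -> bool:
    """Absent from the database, or at/below an assumed frequency cutoff."""
    return v.gnomad_af is None or v.gnomad_af <= max_af


def is_de_novo(v: Variant) -> bool:
    """In a trio, a variant seen in neither parent is a de novo candidate."""
    return not (v.in_father or v.in_mother)


def filter_candidates(variants: List[Variant], max_af: float = 1e-4) -> List[Variant]:
    """Keep protein-altering variants that are rare or absent (max_af is an assumption)."""
    return [
        v for v in variants
        if v.consequence in PROTEIN_ALTERING and is_rare(v, max_af)
    ]


# Toy input, not study data.
toy = [
    Variant("DYNC1H1", "missense", None, False, False),
    Variant("DYNC1H1", "synonymous", None, False, False),
    Variant("GENE_X", "missense", 0.05, True, False),
]
candidates = filter_candidates(toy)
de_novo = [v for v in candidates if is_de_novo(v)]
```

In practice the cutoff and the consequence vocabulary depend on the annotation tool and on the inheritance model being tested.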
For P6, the DYNC1H1 variant in patient and parents was found by a commercial next-generation sequencing gene panel for fetal akinesia (ID 078.03, MGZ München).
In P7, we performed WES after enrichment with the NimbleGen MedExome kit (Roche NimbleGen, Basel, Switzerland) on an Illumina HiSeq 4000 Sequencing System (Illumina, San Diego, CA, USA) with a 2 × 150 bp sequencing protocol according to the manufacturer's best-practice protocol. The variant calling and filtering pipeline has been described elsewhere [14].
Fig. 1 Overview of DYNC1H1 variants identified in this study. Calculated MTR and CADD-Phred score values for the variants from the healthy population and our patient collective show that pathogenic DYNC1H1 mutations cluster in regions of less genetic heterogeneity, specifically in highly conserved domains. a Ten variants in the DYNC1H1 gene (NM_001376.5, 78 exons) identified in our patients, with the corresponding positions in b. b DYNC1H1 protein structure (Q14204); pictogram with protein domains: coiled coil domain (CC, gray), ATPase associated with various cellular activities domain (AAA, red), ATP-binding region in AAA domain (ATP, dark brown), rest of protein in blue. We noted all regions (beginning tail region gray, linker region dark blue, motor region red, stalk/microtubule-binding domain green, end tail region gray) and specified the dimerization domain in yellow with interaction partners DYNC1I2 and DYNC1LI2 noted below. The mutations on protein level are presented in the above-mentioned color scheme. c Missense tolerance ratio (MTR) gene viewer result for DYNC1H1 (ENST00000360184) with window size 21 (http://biosig.unimelb.edu.au/mtr-viewer/); patients' variants are marked with blue crosses. Protein regions noted below as in b. d CADD-Phred scores of all gnomAD variants with ClinVar patient variants (marked with red asterisks) and our patients' variants (marked with blue asterisks); a score >20 indicates a likely pathogenic computation, a score >30 indicates pathogenic [48]. In general, CADD is a gene-level scoring for potential proxy-deleterious variants and has to be treated with caution. The linker mutations in our patient collective show amino acid exchanges with more significant changes in physicochemical properties when compared with variants from a healthy population dataset. The patients' mutations in the motor region are found in highly conserved AAA domains with higher CADD-Phred score values. However, the pathogenic mutations from patients are in regions where allele frequencies and high CADD-Phred scores are "thinned out". For the raw data, please see Supplementary Table 2. Protein regions noted below as in b. e Violin plot for CADD-Phred scores for variants recorded in the gnomAD database (left in blue, https://gnomad.broadinstitute.org/), likely pathogenic and pathogenic variants according to ClinVar (middle in orange, https://www.ncbi.nlm.nih.gov/clinvar/), and the ten patients' variants (right in red); please see Supplementary Table 2 for raw data. Variance analysis (ANOVA, SigmaPlot 12.5, SYSTAT, USA) revealed significant differences between the groups "gnomAD variants" and "ClinVar variants" (**p < 0.01) as well as between the groups "gnomAD variants" and "patients' variants" in our ten patients (*p < 0.05). There was no significant difference between the groups "ClinVar variants" and "patients' variants".
In P8, trio-WES was performed after enrichment using the SOLiD-optimized SureSelect All Human Exon Kit (50 Mb; Agilent Technologies, Santa Clara, CA, USA), followed by sequencing on 5500XL sequencers (Life Technologies, Carlsbad, CA, USA). Quality control parameters were checked throughout the laboratory workflow. Sequence reads were aligned to the human genome (hg19) using Lifescope v2.1 (Life Technologies), followed by variant calling on the aligned sequence. Variants were annotated using a custom analysis pipeline. Samples were automatically checked for quality (e.g., median coverage). For further information on the sequencing procedure, please see previously published data [18].
In P9, we performed WES with the Agilent Clinical Research Exome Kit on an Illumina HiSeq 2000 with 2 × 100 bp reads, a mean depth of coverage of 103×, and a quality threshold of 95.9% (percentage of XomeDx which is covered by at least ten sequence reads/10× coverage). The data were aligned to the reference NM sequence based on GRCh37/hg19 and analyzed for sequence variants using a custom-developed analysis tool (Xome Analyzer).
In P10, we performed a next-generation sequencing panel targeting PAFAH1B1, KIF5C, KIF2A, TUBG1, CRADD, and DYNC1H1, after enrichment with the Agilent in-solution hybridization technology, followed by sequencing on an Illumina HiSeq Sequencing system (Illumina, USA). The analysis included next-generation sequencing-based copy number variant calling and analysis. The panel was supplemented with an MLPA-based deletion and duplication analysis for PAFAH1B1.
We performed dideoxy sequencing to confirm all the patients' DYNC1H1 variants and for further co-segregation analyses, except for P1 and P10.
All variants were classified according to the standards and guidelines of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG-AMP) for the interpretation of sequence variants [19].
The missense tolerance ratio (MTR, Fig. 1c) is the number of observed missense variants relative to the number of all observed (missense and synonymous) single-base variants in a segment, divided by the number of expected missense variants relative to the number of all possible variants in that segment [44]. To evaluate the evolutionary conservation, we performed multiple sequence alignment for all patients' variants using Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/) (Fig. 2).
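The ratio described above can be written out as a short Python sketch. It is a minimal illustration of the stated definition, not the MTR-Viewer implementation: the function names, the per-codon tuple layout, and the way the window is truncated at the ends of the protein are assumptions; only the window size of 21 is taken from the Fig. 1c legend.

```python
def mtr(obs_missense, obs_synonymous, poss_missense, poss_synonymous):
    """Observed missense fraction divided by the missense fraction among all possible variants."""
    observed_total = obs_missense + obs_synonymous
    possible_total = poss_missense + poss_synonymous
    if observed_total == 0 or poss_missense == 0:
        return None  # ratio undefined without observations or possible missense changes
    observed_fraction = obs_missense / observed_total
    expected_fraction = poss_missense / possible_total
    return observed_fraction / expected_fraction


def sliding_mtr(per_codon, window=21):
    """per_codon: one (obs_mis, obs_syn, poss_mis, poss_syn) tuple per codon.

    Counts are summed over a window centred on each codon; windows are
    simply truncated at the sequence ends (an assumption of this sketch).
    """
    half = window // 2
    scores = []
    for i in range(len(per_codon)):
        lo = max(0, i - half)
        hi = min(len(per_codon), i + half + 1)
        totals = [sum(column) for column in zip(*per_codon[lo:hi])]
        scores.append(mtr(*totals))
    return scores


# Toy counts, not study data: 2 of 10 observed variants are missense,
# while 6 of 8 possible variants would be missense.
example = mtr(2, 8, 6, 2)
```

A value below 1 means fewer missense variants were observed than expected from the sequence context, i.e., the segment is comparatively intolerant of missense variation.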
To judge pathogenicity at a domain-specific protein level and to investigate the domain-specific vulnerability and conservation of DYNC1H1, we compared datasets on DYNC1H1 variants in a healthy population (gnomAD database) with those from a patient collective. We performed two-factor analysis of variance (ANOVA, SigmaPlot 12.5, SYSTAT, USA) for CADD-Phred score values and MTR score values between the groups "reports" (healthy subjects vs patients) and the groups "protein regions" (tail vs linker vs motor vs MTBD). For these analyses, we pooled the ten patients from our collective with pathogenic and likely pathogenic DYNC1H1 mutation reports from the ClinVar database (Supplementary Table 2). For all statistical analyses, we performed one-factor or two-factor ANOVA as specified in the text, and we report differences as significant at a p value below 0.05.
For the genotype-phenotype analyses, we performed Pearson's chi-square test and calculated lambda, phi, and Cramér's V (Supplementary Table 3) to correlate the phenotype with the localization of the mutations categorized by domain (beginning tail, dimerization, linker, motor domain). We plotted the results as balloon plots using R 3.6.3 GUI 1.70 El Capitan build (7735) for macOS and the packages ggpubr, ggplot2, and magrittr (Fig. 4b, c).
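For readers unfamiliar with the test, the simpler one-factor case can be sketched in plain Python. This is not the SigmaPlot analysis used in the study and it does not cover the two-factor design with interaction terms; it only shows how the F statistic is formed from between-group and within-group variation. The function name and the toy values are assumptions, and a p value would additionally require the F distribution.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy values only, not CADD-Phred scores from the study.
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```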
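The association measures named above follow directly from a domain-by-symptom contingency table. The sketch below computes Pearson's chi-square statistic, phi, and Cramér's V in plain Python; it is an illustration rather than the authors' analysis, the function name is hypothetical, and the counts are invented placeholders rather than data from Supplementary Table 3. Lambda and the p value are omitted.

```python
import math


def chi_square_association(table):
    """table: list of rows of observed counts (all row and column totals must be > 0).

    Returns (chi2, degrees_of_freedom, phi, cramers_v).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    phi = math.sqrt(chi2 / n)
    smaller_dim = min(len(table), len(table[0]))
    cramers_v = math.sqrt(chi2 / (n * (smaller_dim - 1)))
    return chi2, dof, phi, cramers_v


# Hypothetical counts: rows = domain (beginning tail, dimerization, linker, motor),
# columns = symptom present / symptom absent.
toy_table = [
    [5, 15],
    [40, 10],
    [6, 14],
    [8, 22],
]
chi2, dof, phi, v = chi_square_association(toy_table)
```

Cramér's V ranges from 0 (no association) to 1 (perfect association), which makes it easier to compare across tables of different size than the raw chi-square value.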

Clinical findings
We report ten pediatric patients (P1-P10) with overlapping DYNC1H1-associated phenotypes, onset in infancy, and little to no disease progression (Table 1, Fig. 2). Motor development was delayed in all patients except for P10. Eight out of ten patients had muscular weakness and atrophy predominantly of the lower limbs (P1-P8) with "crouching" movements and three patients exhibited weakness also of the upper limbs (P1, P6, and P7). Deep tendon reflexes (DTR) were normal in three patients (P4, P9, and P10); all other patients had reduced DTR in the lower limb but normal DTR in the upper limb.
Domain-specific review of genotype-phenotype correlations in the literature
As described in the methods, we performed a review of the literature for prevalent symptoms in DYNC1H1-associated disorders and portrayed the results in balloon plots using R (Fig. 4b, c). The genotype-phenotype analyses of our patients and the patients found in the literature revealed significant differences between the domains of DYNC1H1. The statistical analyses revealed that mutations in the dimerization domain of DYNC1H1 corresponded to an NMD phenotype in reported patients, with reductions in lower limb strength and mostly preserved upper limb strength (DYNC1H1-NMD, Fig. 4b). For the detailed results of the statistical tests, please see Supplementary Table 3.
For specific MRI findings and symptoms associated with NDD, e.g., ID, behavioral abnormalities, and seizures, we observed that patients reported in the literature with mutations in the dimerization domain were largely spared (Fig. 4c). Instead, seizures were mostly reported with motor domain mutations, and ID and behavioral abnormalities were largely reported with mutations in the beginning tail, linker, and motor domains. MRI abnormalities, specifically pachygyria, were largely reported with motor domain mutations.
We performed additional statistical analyses for variant pathogenicity with multiple prediction tools as described in the methods, which projected the variants in our patient collective to have a highly damaging impact (Fig. 1c- In the comparison between subjects with a two-factor ANOVA, we observed significant differences in CADD-Phred values between the reports "healthy subjects" (mean: 23.809; standard error of mean: 0.25) and "patients" (mean: 29.05; standard error of mean: 0.612) (p < 0.001). When differentiating the patient groups in a post hoc test (Bonferroni), there were significant differences between the groups "gnomAD variants" and "ClinVar variants (likely pathogenic and pathogenic)" (two-factor ANOVA, p < 0.01), and "gnomAD variants" and "patients' variants" from our collective (two-factor ANOVA, p < 0.05, Fig. 1d, e).
In-depth interaction analyses for CADD-Phred score values uncovered a significant interaction between the group "reports" (patients vs healthy) and "regions" (above-mentioned protein domains) (two-factor ANOVA, p < 0.05). Specifically, in the linker domain, the mutations from patients showed significantly higher mean CADD-Phred score values than the variants in healthy subjects (scores in healthy group 23.1 vs patients 32.3, two-factor ANOVA, p < 0.001), while other domains showed less drastic differences. In the next step, we evaluated the intolerance toward missense mutations throughout the regions of DYNC1H1.
The MTR for our patients' variants showed that these variants were in regions of high intolerance toward missense variations (Fig. 1c). In the comparison between subjects with a two-factor ANOVA, we tested for differences in MTR score values between healthy subjects and patients.
Fig. 4 Overview of overlapping clinical disease manifestation of DYNC1H1-associated disorders and domain-specific presentation of genotype-phenotype correlation based on literature review. a On the left, Venn diagram of the recorded symptoms in patients with each of the three known entities associated with DYNC1H1 mutations and the overlap of phenotypes in DYNC1H1-associated disorders: Charcot-Marie-Tooth disease Type 20 (CMT20), lower extremity-predominant spinal muscular atrophy (SMALED), and cortical malformations. The symptoms were taken from an extensive PubMed literature search ("dync1h1", with each "motor neuropathy", "CMT20", "charcot-marie-tooth", "SMALED", "spinal muscular atrophy", "malformation", "MRD13", "mental retardation"). Specifically, neuromuscular symptoms as in CMT20 and SMALED were mostly observed in patients with mutations in the dimerization domain and cortical malformation was mostly observed in motor domain mutations. On the right, a simplified overview of the protein model from Fig. 1b. b Balloon plot for symptoms "reduction of upper limb strength" and "reduction of lower limb strength" recorded in the literature search with mutations in the beginning tail, dimerization, linker, and motor domains. The size of patient groups is denoted by the size of circles on the right (smallest circle 20, biggest circle 80). The calculated normalization quotient (from green to blue to red, on the right) from Pearson's chi-square test as described in methods revealed clustering of reductions of lower limb strength with preserved upper limb strength in the dimerization domain. c Balloon plot for symptoms "seizures", "intellectual disability", "behavioral abnormalities", "MRI abnormalities" in general, "pachygyria", "enlarged ventricles", "hypoplasia corpus callosum", "hypoplasia cerebellum", "hypoplasia brain stem", and "gray matter heterotopia" recorded in the literature search with mutations in the beginning tail, dimerization, linker, and motor domains. The size of patient groups is denoted by the size of circles on the right (smallest circle 10, biggest circle 50). The calculated normalization quotient from Pearson's chi-square test as described in methods revealed clustering of intellectual disability and behavioral abnormalities in patients with mutations in the beginning tail, linker, and motor domains. Seizures and all MRI abnormalities specifically clustered in patients with mutations in the motor domain. Patients with mutations in the dimerization domain were largely spared of these symptoms, thus underlining the hypothesis that DYNC1H1-NMD and -NDD can be traced to specific domain mutations.
In-depth interaction analyses for MTR score values also revealed significant differences in MTR scores between protein regions (two-factor ANOVA, p < 0.001), while mean MTR scores decreased over protein length (tail 0.63 > linker 0.55 > motor 0.55 > MTBD 0.48). In a post hoc test (Bonferroni), we tested for the interactions between reports (healthy vs patients) and protein regions and observed that variants in the linker domain had the highest mean MTR scores in the healthy population (0.72), but the lowest mean MTR scores in patients (0.39) (two-factor ANOVA, p < 0.001). Mean MTR scores for other domains also showed dissociations between healthy subjects and patients (tail 0.68 vs 0.57, motor 0.65 vs 0.45, two-factor ANOVA, all p < 0.001), while the calculation for MTBD showed similar values between the groups (0.46 vs 0.44, two-factor ANOVA, p < 0.001).

Discussion
We report ten patients with nine novel mutations in the DYNC1H1 gene (Table 2). In neurons, DYNC1H1 as part of the cytoplasmic dynein complex is essential for retrograde transport of cargos in axons and dendrites, thus involved in neuronal development, morphology, and survival [7,9,45].
In the literature, an increasing number of phenotype expansions have shown an overlapping phenotype linking motor neuropathies and brain malformations [32,37,46,47]. We propose a novel clinical classification of DYNC1H1-related disease entities that follows a holistic approach, placing the patients' individual but complex clinical traits at the center of the classification, rather than the current reductionistic classification (e.g., SMALED or MRD13): DYNC1H1-related disorders with an exclusive NMD phenotype, DYNC1H1-NMD, and a combined NMD-CNS phenotype, DYNC1H1-NDD, at either end of the spectrum (Fig. 4).
Genomic profiling of population datasets for the domain-specific impact of genetic variation
All mutations in our ten patients showed high CADD scores, high intolerance toward missense variations (MTR, Fig. 1c) in evolutionarily well-conserved domains (Fig. 2), and no reports in population databases (Table 2). When comparing healthy with patient report datasets, we observed significantly higher CADD scores for pathogenic mutations in patients (Table 2) in all domains, owing to evolutionary conservation and the drastic effects on physicochemical properties caused by amino acid exchanges at the mutational sites (p < 0.001). This is also evident from multiple lines of in silico pathogenicity scores (Supplementary Table 2). Next, we looked at domain-specific statistical pathogenicity prediction tools. The significant interaction between reports and protein regions revealed particularly higher CADD-Phred score value increases in the linker domain (healthy 23.1 vs patients 32.2, p < 0.001), which hints at an evolutionarily well-conserved domain (Fig. 2).
The tail domain itself shows rather low conservation across species, and variants in the tail domains are observed quite often in humans; thus, CADD-Phred score values in the healthy dataset are rather low. The tail domain mutations in patients tend to have rather high CADD-Phred score values and are situated at residues with rather high tolerance toward missense variants. The CC regions in the tail domains are highly conserved and are also associated with high CADD-Phred score values in patient mutations. As the CADD scores also comprise protein impact tools (e.g., Grantham, SIFT, and PolyPhen), the amino acid exchanges in the patients' mutations show higher CADD scores and more drastic effects on physicochemical properties than those in the healthy dataset. Clustered around the CC regions, the MTR scores show significantly higher intolerance toward missense mutations (p < 0.001).
The CADD and MTR score values further support that the motor domain is highly conserved and that the mutational effects lead to drastic changes on the protein level. MTBD variants in the healthy and patient datasets show rather similar MTR scores, and thus a similarly high intolerance toward missense variants. When comparing Fig. 1c, d, we note that there is a much lower number of variants as well as a significantly lower mean CADD score value in the MTBD (p < 0.001), i.e., MTBD variants are generally scarce in healthy population datasets and the protein level scores do not show drastic effects on physicochemical properties.
Based on this genomic profiling of healthy and patient variant datasets, we underline the domain-specific effects of genetic variation in DYNC1H1 and recommend interpreting DYNC1H1 variants based on the following model:
(1) Region location/conservation: mutations in the linker, motor, and MTBD regions are in highly evolutionarily conserved domains with important functional roles in processive and powerstroke movements.
(2) Gene-wide missense intolerance: as a measure of regional intolerance to missense variation, the spatial distribution of observed vs expected variants has to be evaluated against the healthy population. Patient mutations are situated at genomic coordinates with significantly lower MTR scores, i.e., in regions that are less tolerant of missense variation.
(3) Protein level change in physicochemical property: based on multiple lines of pathogenicity prediction scoring tools, the effect of the amino acid exchange in a missense variant can be evaluated for protein structural and functional properties, including secondary structure, solvent accessibility, functional domains, methylation, phosphorylation, and glycosylation.
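As a rough illustration, the three criteria above could be gathered into a single evidence summary per variant. The sketch below is hypothetical and is not a diagnostic rule or the authors' procedure: the function and field names are invented, the MTR cutoff must be supplied by the caller because the text gives none, and the CADD-Phred bands (>20, >30) are taken only from the Fig. 1d legend.

```python
# Regions the model describes as highly evolutionarily conserved.
CONSERVED_REGIONS = {"linker", "motor", "MTBD"}


def cadd_band(cadd_phred):
    """Band a CADD-Phred score using the thresholds quoted in the Fig. 1d legend."""
    if cadd_phred > 30:
        return "> 30"
    if cadd_phred > 20:
        return "> 20"
    return "<= 20"


def summarize_variant(region, mtr_score, cadd_phred, mtr_cutoff):
    """Collect the three lines of evidence; mtr_cutoff is a caller-chosen assumption."""
    return {
        "in_conserved_region": region in CONSERVED_REGIONS,  # criterion (1)
        "missense_intolerant": mtr_score < mtr_cutoff,       # criterion (2)
        "cadd_band": cadd_band(cadd_phred),                  # criterion (3), one proxy
    }


# Illustrative inputs only; the cutoff of 0.5 is arbitrary.
summary = summarize_variant("linker", 0.39, 32.3, mtr_cutoff=0.5)
```

Such a summary only organizes the evidence; classification itself still rests on the ACMG criteria and clinical correlation.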

DYNC1H1-NMD
In our cohort, no patient was characterized by exclusive DYNC1H1-NMD. In the literature, about half of the reported patients had an exclusive NMD phenotype, predominantly involving the lower limbs (SMALED, CMT20) [1,30] and presenting with delay in motor milestones, muscle weakness, atrophy, hyporeflexia, and skeletal limb abnormalities. Most affected individuals developed secondary orthopedic symptoms like hyperlordosis and foot deformities [32]. However, these patients did not exhibit CNS involvement, e.g., ID or cortical malformations (Fig. 4). DYNC1H1 mutations in patients with NMD were located in the tail domain of DYNC1H1 (53AA-1867AA), predominantly within the dimerization domain (300AA-1140AA) (Fig. 1a-d). Previous studies demonstrated that these tail domain mutations, in contrast to motor domain mutations, do not disrupt the retrograde movement of dynein along microtubules. Rather, they exclusively shortened the run length of processive dynein-dynactin-BICD2N complexes, leading to a possible disruption of neuronal cargo delivery [8]. One hypothesis for why the muscular atrophy in DYNC1H1-NMD predominantly affects the lower extremities is that the neuronal transport distance is longer than for the upper extremities or the cortex, making these neurons more sensitive to a reduction in run length [8]. Further studies need to evaluate whether extrinsic (environmental) or intrinsic factors (methylation, genome interactions) contribute to phenotypic variability.

DYNC1H1-NDD
A second group of patients presented with DYNC1H1-NMD and concomitant CNS involvement. In our cohort, all patients presented with predominant lower extremity muscle atrophy, a variable degree of ID, global developmental delay, and/or brain malformations on MRI. In a recent molecular biological study, mutations in the linker region were associated with deficient powerstroke movements [2]. CC mutations (P1) displayed altered foci for plus-end dynein (dynactin independent) and cortical dynein (dynactin dependent). Furthermore, these mutations resulted in reduced processive movement of the dynein complex [8]. Mutations in the MTBD are known to be associated with a reduction in velocity, displacement, and neck transit success, all of which are essential for the stabilization and advancement of movements [8], leading to a more severe disruption of motor activity, as in the other variants associated with DYNC1H1-NDD. The fact that MTBD mutations do not appear in healthy datasets hints at the possibility of early onset lethal disease courses for MTBD mutations. Motor domain mutations, with the six AAA domains forming the core complex, were associated with microtubule gliding defects (AAA5) or inhibition of any movement (AAA1), whereas mutations in other protein areas (tail, linker, or MTBD) do not alter gliding [8]. Moreover, mutations lead to static binding to microtubules (AAA1) or diffuse binding behavior (AAA5 and MTBD), both resulting in disturbed motor activity and possibly secondarily in severe disruption of neuronal migration and myelination [8,45]. We identified four AAA domain mutations with NMD, cognitive-behavioral impairment, and brain malformation (P4, P5, P7, P9, Fig. 1a-d). These findings highlight that motor domain mutations lead to malformations of cortical development (MCD) due to a more severe disruption of the dynein movement.
P2 displayed cerebral hypomyelination on brain MRI with a mutation in CC7, which is involved in the sedimentation of microtubules with the MTBD in the context of a fusion protein with a heterologous CC [8].
Patients with DYNC1H1-NDD had different degrees of ID, learning, and language impairment, in line with our findings. Some reports described more severely affected individuals with epilepsy or spastic paraplegia and variable MCD including pachygyria and polymicrogyria, frequently in combination with ventricular anomalies, abnormal white matter and corpus callosum, and cerebellar hypoplasia [5,37]. Almost all of our patients had an overlapping phenotype with PNS and CNS involvement, signifying a clinical continuum (Fig. 4). Of note, all patients exhibited rare signs and symptoms closely linked to neuronal development (e.g., cataracts, syrinx), even patients with tail domain mutations (Table 1).
We propose a novel clinical classification of DYNC1H1-related disease entities that follows a holistic approach, placing the patients' individual but complex clinical traits at the center of the classification, rather than the current reductionistic classification (e.g., SMALED or MRD13). Our new classification of DYNC1H1-related disorders is based on the leading phenotype characteristics, i.e., DYNC1H1-NMD as an NMD phenotype and DYNC1H1-NDD with concomitant CNS involvement.

Data availability
Any data not published within the article are available as anonymized data and will be shared on request from the authors. The next-generation sequencing raw data, from either gene panels or WES, cannot be fully shared publicly since this may facilitate the de-anonymization of the study subjects and infringe their privacy. We also do not have ethical permission to share the full and raw next-generation sequencing data or the full variant lists of individual subjects.
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.