Introduction

Histone lysine methylation defects are an important cause of developmental disorders and cancers [1, 2]. KMT2D (formerly known as MLL2 and ALR) encodes lysine (K)-specific methyltransferase 2D, which catalyses the mono-, di- and trimethylation of lysine 4 on histone H3 (H3K4), promoting the expression of its target genes [3]. Heterozygous deleterious germline KMT2D variants cause Kabuki syndrome type 1 (KS, MIM# 147920), a rare congenital disorder characterized by intellectual disability, growth retardation, distinctive facial features and structural anomalies [4,5,6,7]. Somatic deleterious KMT2D variants have been described in a spectrum of cancers, including leukaemias and gastrointestinal and central nervous system tumours [8, 9].

Correct interpretation of KMT2D variants is crucial for the diagnosis of KS and for assessing disease progression in cancers [10, 11]. About 80% of deleterious germline KMT2D variants are predicted to result in a truncated protein [5] (Figure S1). Germline pathogenic missense KMT2D variants are also frequently encountered in KS [4, 5, 12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31]. In contrast, only 35% of somatic KMT2D variants in cancers are predicted to be protein truncating (Figure S1). Approximately 50% of the somatic variants found in cancers are missense, and the remainder are in-frame insertions/deletions and synonymous variants [32] (Figure S1).

Although limited functional analysis of KMT2D variants is now possible, determining the consequences of KMT2D missense variants (MVs) in a diagnostic setting remains challenging, because parental segregation analysis is not always possible and, above all, because the structure of the KMT2D protein and its interactions are incompletely understood [33,34,35,36]. Notably, a three-dimensional structure is available only for the SET domain of the protein (PDB entries 4z4p and 4erq) [37]. A systematic study of KMT2D MVs could therefore have significant clinical benefits, helping to distinguish pathogenic from benign germline variants and driver from passenger somatic variants. It may also provide insights into the structure and function of this important protein. Furthermore, the consequences of disease-causing germline and somatic variants can differ. For example, some activating somatic BRAF variants cause malignant melanoma [38], while other activating germline BRAF variants cause cardiofaciocutaneous syndrome (MIM #115150) [39]. Somatic loss-of-function SMARCA4 variants cause hypercalcemic type small cell carcinoma of the ovary [40], whereas postulated activating germline SMARCA4 variants are associated with Coffin-Siris syndrome (MIM #614609) [41]. However, germline and somatic KMT2D MVs have not previously been systematically compared. Likewise, loss-of-function, dominant-negative or activating germline MVs in the same gene can cause different phenotypes or diseases [42,43,44,45]. Although all KS-causing KMT2D variants are presumed to be loss-of-function, the possibility that other phenotypes result from a different spectrum of germline KMT2D variants has not been examined. Similarly, loss-of-function, dominant-negative or activating somatic MVs can have different consequences [46], but this aspect has not previously been explored for KMT2D. For all these reasons, we performed a comprehensive, systematic study of KMT2D MVs.

Methods

The study design is summarised in Fig. 1. The databases and tools used in this study are summarised in Tables S1 and S2.

Fig. 1

Study design. Summary of steps followed for compilation and analysis of missense variants (MV) in KMT2D

Compilation and interpretation of KMT2D MVs

KMT2D MVs reported in the control population (Control-MVs) were compiled from the Exome Aggregation Consortium [47] (ExAC, Version 0.3.1) database, the 1000 Genomes (1K-G) Project [48], the database of single nucleotide polymorphisms (dbSNP) [49] and the NHLBI-GO Exome Sequencing Project (ESP) [50]. The ExAC data were accessed via http://exac.broadinstitute.org/ and the other data were obtained from Ensembl version 80 (GRCh37). For ExAC, only high-quality, non-flagged sites were included. For the analyses, we assumed that Control-MVs did not result in any phenotype.

KMT2D MVs annotated as being identified only in somatic tissue (Cancer-MVs) were compiled from the Catalogue of Somatic Mutations in Cancer (COSMIC) [32] database, version 77.

KMT2D MVs reported in KS (KS-MVs) were obtained from the literature (and cross-checked with the Human Gene Mutation Database Professional® [HGMD]) [51], ClinVar [52] and our in-house database of Kabuki syndrome test results. Of note, the Manchester Centre for Genomic Medicine has offered diagnostic KMT2D genotyping by sequencing since 2012.

All the Control-MVs, Cancer-MVs and KS-MVs were assessed with the Ensembl Variant Effect Predictor (VEP) [53] to obtain their minor allele frequencies (MAFs) and to identify the variants that were likely to disrupt splicing. The EX-SKIP tool [54] was used to identify substitutions that may result in exon skipping in mature transcripts. All MVs predicted not to disrupt splicing were mapped, with their frequencies, onto KMT2D protein domains, regions and motifs (according to UniProt accession number O14686) using the Mutation Mapper tool from the cBio Cancer Genomics Portal [55, 56]. For the purpose of our analysis, we divided the regions of the protein sequence that are not part of a specific domain or motif into 19 'no domain' regions (Figure S2).
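Mapping each variant's residue position onto a domain or 'no domain' region is an interval lookup. The sketch below is a minimal illustration, not the Mutation Mapper implementation: the region table is hypothetical and deliberately partial, containing only the two 'no domain' intervals whose coordinates are quoted in this paper (No Domain #8, residues 3043–3248; No Domain #16, residues 4995–5090). A full table would need every UniProt O14686 feature plus all 19 'no domain' regions.

```python
import bisect

# Hypothetical, partial region table (start, end, label), sorted by start and
# non-overlapping. Only intervals whose coordinates appear in the text are
# included; a real table would cover the whole protein.
REGIONS = [
    (3043, 3248, "No Domain #8"),
    (4995, 5090, "No Domain #16"),
]
_STARTS = [start for start, _, _ in REGIONS]


def region_for_residue(position):
    """Return the label of the region containing `position`, or None."""
    # The only candidate is the last region starting at or before `position`.
    i = bisect.bisect_right(_STARTS, position) - 1
    if i >= 0:
        start, end, label = REGIONS[i]
        if start <= position <= end:
            return label
    return None


# Arg5048, recurrently substituted in both cancer and KS.
print(region_for_residue(5048))  # No Domain #16
```

Residues not covered by this partial table (for example 5179, in FYR-N) return None here; that is a limitation of the illustration, not a statement that they lie outside all domains.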

Next, for all MVs predicted not to significantly affect splicing, we generated three sets of scores: Blocks Substitution Matrix Series 62 (BLOSUM62) [57] scores to assess the evolutionary conservation of each substitution; Ensemble Learning Approach for Stability Prediction of Interface and Core mutations (ELASPIC) ∆∆G values [58] to estimate the changes in thermodynamic properties resulting from each substitution; and Structural Mutation Annotation (StructMAn) scores [59] to estimate the impact of each MV on the interaction of KMT2D with other proteins and ligands, obtaining probability-of-disruption scores where possible. To support the ELASPIC and StructMAn analyses, the PDB file for the longest KMT2D chain reported as part of a complex (PDB entry 4erq) was downloaded from the Protein Data Bank in Europe [37].
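For a single substitution, the BLOSUM62 score is a table lookup on the reference and alternative residues. As a hedged illustration, the sketch below parses simple HGVS protein notation (for example p.Arg5432Trp) and looks up the score; to stay short it carries only the arginine row of the standard BLOSUM62 matrix, since the recurrently substituted residues discussed in this paper are arginines. The function name is hypothetical; a full matrix is available, for example, from Biopython via `Bio.Align.substitution_matrices.load("BLOSUM62")`.

```python
import re

# Standard three-letter to one-letter amino-acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Arginine row of the standard BLOSUM62 matrix (the matrix is symmetric).
BLOSUM62_ARG = dict(zip("ARNDCQEGHILKMFPSTWYV",
                        [-1, 5, 0, -2, -3, 1, 0, -2, 0, -3,
                         -2, 2, -1, -3, -2, -1, -1, -3, -2, -3]))

# Simple missense only, e.g. p.Arg5432Trp (no stop, frameshift or '=' forms).
HGVS_MISSENSE = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def blosum62_for_arg_substitution(hgvs_p):
    """Return (residue position, BLOSUM62 score) for an Arg->X missense change."""
    match = HGVS_MISSENSE.match(hgvs_p)
    if match is None:
        raise ValueError(f"not a simple missense protein change: {hgvs_p}")
    ref, pos, alt = match.groups()
    if ref != "Arg":
        raise ValueError("this sketch only carries the arginine row of BLOSUM62")
    return int(pos), BLOSUM62_ARG[THREE_TO_ONE[alt]]


print(blosum62_for_arg_substitution("p.Arg5432Trp"))  # (5432, -3)
```

Lower (more negative) scores indicate substitutions rarely observed between related proteins, which is why lower BLOSUM62 scores are read as affecting more conserved positions.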

Statistical analysis

To study the association between the type of phenotype and the location of MVs, the likelihood-ratio chi-square test was applied. A z-test with Bonferroni correction was used to compare the proportions of MVs at each location between phenotypes. The Kruskal–Wallis test with multiple comparisons was applied to compare BLOSUM62 scores, ELASPIC ∆∆G values and StructMAn interaction scores amongst the phenotypes; these scores are described using the median and interquartile range. All statistical analyses were performed with IBM SPSS® version 22, and a two-sided, exact p-value < 0.05 was considered significant.
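The likelihood-ratio chi-square test compares observed counts O with the counts E expected under independence through G = 2 Σ O ln(O/E), referred to a chi-square distribution with (rows - 1)(columns - 1) degrees of freedom. The study used SPSS; the sketch below is an independent, standard-library illustration of the same statistic, with a closed-form chi-square tail probability valid for integer degrees of freedom. The function names and the example table are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{i < df/2} y^i / i!
        return math.exp(-y) * sum(y ** i / math.factorial(i) for i in range(df // 2))
    # Odd df: the df = 1 tail is erfc(sqrt(y)); each extra 2 df adds one term.
    total = math.erfc(math.sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def likelihood_ratio_test(table):
    """Return (G, df, p) for an r x c contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            if observed > 0:  # empty cells contribute 0 to G
                expected = row_total * col_total / n
                g += observed * math.log(observed / expected)
    g *= 2.0
    df = (len(table) - 1) * (len(table[0]) - 1)
    return g, df, chi2_sf(g, df)


# Hypothetical counts: rows are phenotypes (Control, Cancer, KS),
# columns are MVs inside vs outside one region of interest.
g, df, p = likelihood_ratio_test([[40, 960], [30, 370], [25, 125]])
print(f"G = {g:.2f}, df = {df}, p = {p:.2g}")
```

In the real analysis the columns would be all grouped locations at once, so the degrees of freedom grow with the number of locations; the closed-form tail handles any integer df.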

Results

Compilation of variants

In total, we identified 1920 distinct MVs, comprising 1535 KMT2D Control-MVs, 584 KMT2D Cancer-MVs and 201 KS-MVs (Table S3). Of note, six MVs were reported in all three groups, 85 in both the Cancer-MVs and Control-MVs groups, 83 in both the KS-MVs and Control-MVs groups, and 23 in both the Cancer-MVs and KS-MVs groups (Figure S3; Tables S3-1).

The MAFs of 1211/1535 (78.9%) Control-MVs were < 1/10,000, and those of 53/1535 (3.5%) Control-MVs were > 1/1000 (Table S3). Arg5048 was the most frequently altered amino acid in the Cancer-MVs group (7/584, 1.2%), followed by Arg3582 and Arg3727 (5/584, 0.9% each) (Table S3). Arg5179 was the most frequently altered amino acid in the KS-MVs group (8/201, 4.0%), followed by Arg5048 and Arg5432 (7/201, 3.5% each) (Table S3).
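The MAF summary above amounts to binning each Control-MV by two thresholds. A minimal sketch with hypothetical input values; the bin labels and function names are illustrative, and the strict inequalities mirror the wording in the text (a MAF exactly equal to a threshold falls into the middle bin).

```python
from collections import Counter


def maf_bin(maf):
    """Place a minor allele frequency into one of three illustrative bins."""
    if maf < 1 / 10_000:
        return "<1/10,000"
    if maf > 1 / 1_000:
        return ">1/1000"
    return "1/10,000-1/1000"


def summarise_mafs(mafs):
    """Return {bin: (count, percentage rounded to one decimal place)}."""
    counts = Counter(maf_bin(m) for m in mafs)
    total = len(mafs)
    return {label: (n, round(100 * n / total, 1)) for label, n in counts.items()}


# Hypothetical MAFs, not taken from ExAC.
print(summarise_mafs([0.00001, 0.00005, 0.0005, 0.002]))
```

Applied to the real Control-MVs, the first and last bins would hold the 1211/1535 and 53/1535 variants reported above.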

Overall, 16/1535 Control-MVs, 14/584 Cancer-MVs and 11/201 KS-MVs were predicted to significantly affect splicing (two of these variants were present in both the Control-MVs and KS-MVs groups, and one in both the Cancer-MVs and KS-MVs groups) (Tables S3-2) (Fig. 2). As these variants are likely to result in loss of function through the introduction of a frameshift, they were excluded from subsequent analyses, which were therefore performed on 1519 Control-MVs, 570 Cancer-MVs and 190 KS-MVs. The proportion of presumed MVs predicted to affect splicing was significantly higher for KS-MVs and Cancer-MVs than for Control-MVs (χ2 = 21.88, df = 2, p = 0.000018). Of these 41 variants predicted to disrupt splicing, 6/16 (37.5%) in controls, 8/14 (57.1%) in cancer and 7/11 (63.6%) in KS affect either the first or the last base of an exon, demonstrating a further enrichment of canonical splice-donor and splice-acceptor sites in cancer and KS (Tables S3-2) (Fig. 2). EX-SKIP analysis of the six Control-MVs affecting the first or last base of an exon showed that two (c.50C>T and c.5188G>A) did not increase the probability of exon skipping compared with the wild-type (WT) sequence, and the remaining four (c.4131G>C, c.4419G>T, c.4693G>T, c.4694C>T) were predicted to result in in-frame exon skipping.
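The splicing comparison can be checked from the counts reported above. Assuming the reported χ2 is a Pearson chi-square on the 3 × 2 table of (splicing, non-splicing) presumed MVs per group, with group totals of 1535, 584 and 201 (the text does not name the exact test used here), the sketch below reproduces χ2 ≈ 21.88 with df = 2; for two degrees of freedom the chi-square tail probability is simply exp(-χ2/2). It also derives the post-exclusion counts. The function name is hypothetical.

```python
import math

# (predicted to disrupt splicing, total presumed MVs) per group, from the text.
GROUPS = {"Control": (16, 1535), "Cancer": (14, 584), "KS": (11, 201)}


def pearson_chi2_two_columns(groups):
    """Pearson chi-square for a k x 2 table of (event, non-event) counts."""
    events = sum(e for e, _ in groups.values())
    total = sum(n for _, n in groups.values())
    chi2 = 0.0
    for e, n in groups.values():
        for observed, column_total in ((e, events), (n - e, total - events)):
            expected = n * column_total / total
            chi2 += (observed - expected) ** 2 / expected
    df = len(groups) - 1  # (k - 1) * (2 - 1)
    if df != 2:
        raise ValueError("the closed-form p-value below assumes three groups")
    return chi2, df, math.exp(-chi2 / 2)  # chi-square tail for df = 2


chi2, df, p = pearson_chi2_two_columns(GROUPS)
print(f"chi2 = {chi2:.2f}, df = {df}, p = {p:.1e}")  # chi2 = 21.88, df = 2, p = 1.8e-05

# Presumed MVs remaining after excluding predicted splicing variants.
remaining = {name: n - e for name, (e, n) in GROUPS.items()}
print(remaining)  # {'Control': 1519, 'Cancer': 570, 'KS': 190}
```

Most of the statistic comes from the KS row, where 11 splicing variants are observed against roughly 3.6 expected.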

Fig. 2

Presumed KMT2D MVs that are likely to disrupt splicing are enriched in Kabuki syndrome and cancer. Variants affecting the first or last three bases of exons are depicted (first/last base in red, second/second-last in orange and third/third-last in green). Variants seen in Kabuki syndrome (*) and in cancer (#) are placed above the transcript (ENST00000301067.11), whereas control variants are placed below it. The proportion of presumed MVs predicted to affect splicing is significantly higher for KS-MVs and Cancer-MVs than for Control-MVs (χ2 = 21.88, df = 2, p = 0.000018). Among the variants predicted to disrupt splicing, canonical splice-donor and splice-acceptor sites are further enriched in cancer and KS (variants in red). Interestingly, the six Control-MVs affecting the canonical splice-donor and splice-acceptor sites either do not increase the probability of exon skipping or are predicted to result in in-frame exon skipping.

Location of MVs

We identified several regions of constraint for Control-MVs (Fig. 3; Tables 1 and 2). Compared with Control-MVs, Cancer-MVs clustered in the PHD#3, PHD#4, RING#4, FYR-C and SET domains (p < 0.05) (Tables 1 and 2). Cancer-MVs also clustered specifically between residues 3043 and 3248 (No Domain #8 in Figure S2) when compared with both Control-MVs and KS-MVs (p < 0.05) (Table 2). Compared with Control-MVs, KS-MVs clustered in the PHD#3, PHD#4, coiled-coil#5, RING#4, FYR-N and SET domains (p < 0.05) (Tables 1 and 2). KS-MVs also clustered specifically between residues 4995 and 5090 (No Domain #16 in Figure S2) when compared with both Control-MVs and Cancer-MVs (p < 0.05) (Table 2).
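The pairwise location comparisons summarised in Tables 1 and 2 rest on the z-test with Bonferroni correction described under Statistical analysis. A minimal sketch of a pooled two-proportion z-test follows; it is an independent illustration rather than the SPSS procedure, the counts in the usage line are hypothetical, and the Bonferroni family size (here three pairwise phenotype comparisons) is an assumption.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sided z-test comparing proportions x1/n1 and x2/n2."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0, 1.0  # both proportions are 0 or both are 1
    z = (x1 / n1 - x2 / n2) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


def bonferroni(p_value, n_comparisons):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_comparisons)


# Hypothetical counts of MVs falling in one domain: KS vs Control.
z, p = two_proportion_z_test(12, 190, 20, 1519)
print(f"z = {z:.2f}, adjusted p = {bonferroni(p, 3):.2g}")
```

Repeating the test for every location and phenotype pair and keeping adjusted p < 0.05 gives the kind of clustering calls listed above.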

Fig. 3

Specific regions of the KMT2D protein are enriched for missense variants in Kabuki syndrome and cancer. Distributions of KMT2D missense variants (MV) seen in (a) a control population, (b) cancers and (c) Kabuki syndrome (KS) are shown. The X-axis shows the length of the KMT2D protein and the location of its domains and regions, which are colour-coded as shown in the legend at the bottom of the figure. Regions/domains enriched in cancers or in Kabuki syndrome are highlighted with red brackets in the respective panels. The Y-axis shows minor allele frequencies in (a) and, in (b) and (c), the number of times a specific Cancer-MV or KS-MV was seen in our cohort. (d) Proportion of KMT2D missense variants grouped according to domains and regions.

Table 1 Comparison of proportions of missense variants seen in control population, cancer and Kabuki syndrome according to their grouped locations
Table 2 Comparison of proportions of missense variants seen in control population, cancer and Kabuki syndrome according to every significantly different location

Consequences on protein properties

The median (interquartile range) BLOSUM62 score was −1 (−2;1) for Control-MVs, −1 (−2;0) for Cancer-MVs and −1 (−2;0) for KS-MVs (Fig. 4). Overall, BLOSUM62 scores for Cancer-MVs and KS-MVs were significantly lower than those for Control-MVs (p < 0.001 and p = 0.007, respectively) (Fig. 4).
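BLOSUM62 scores are small integers with many ties, so the tie-corrected form of the Kruskal–Wallis statistic matters. The sketch below computes H from mid-ranks and divides by the usual tie correction 1 - Σ(t³ - t)/(N³ - N); with three phenotype groups H is referred to a chi-square distribution with 2 degrees of freedom, whose tail is exp(-H/2). This is an independent illustration, not the SPSS procedure; the score lists are hypothetical and the post hoc pairwise comparisons reported above are not reproduced.

```python
import math
from collections import Counter


def kruskal_wallis(*groups):
    """Tie-corrected Kruskal-Wallis H statistic and its degrees of freedom."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    # Mid-ranks: tied values share the mean of the ranks they occupy.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    h = 12 / (n * (n + 1)) * sum(
        sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups
    ) - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are tied")
    return h / correction, len(groups) - 1


# Hypothetical BLOSUM62 scores for three phenotype groups.
control = [1, 0, -1, -2, 2, -1, 0, 1]
cancer = [-1, -2, -3, 0, -1, -2]
ks = [-2, -3, -1, -2, 0]
h, df = kruskal_wallis(control, cancer, ks)
p = math.exp(-h / 2)  # chi-square tail probability, valid for df == 2 only
print(f"H = {h:.2f}, df = {df}, p = {p:.3f}")
```

Because the test works on ranks, groups with identical medians (all −1 here) can still differ significantly when one distribution is shifted in its upper quartile, as with the Control-MVs.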

Fig. 4

Cancer and Kabuki syndrome MVs affect more conserved residues, increase the KMT2D free-energy change (∆∆G) and may disrupt its interaction with other proteins. Global comparisons of (a) BLOSUM62, (b) ELASPIC ∆∆G and (c) StructMAn scores of missense variants (MV) seen in a control population, cancers and Kabuki syndrome. Compared with Control-MVs, both Cancer-MVs and KS-MVs have significantly lower BLOSUM62 scores, KS-MVs have significantly higher ELASPIC ∆∆G scores, and Cancer-MVs have significantly higher StructMAn scores.

The median (interquartile range) ELASPIC ∆∆G score was 0.76 (0.25;1.07) for Control-MVs, 0.89 (0.4;1.46) for Cancer-MVs and 0.98 (0.34;2.17) for KS-MVs (Fig. 4). ELASPIC ∆∆G scores for KS-MVs were significantly higher than those for Control-MVs (p = 0.03); no other pairwise comparison was significant (Fig. 4).

The median (interquartile range) StructMAn score was 0.17 (0.14;0.26) for Control-MVs, 0.32 (0.15;0.42) for Cancer-MVs and 0.21 (0.14;0.34) for KS-MVs (Fig. 4). StructMAn scores for Cancer-MVs were significantly higher than those for Control-MVs (p = 0.019); no other pairwise comparison was significant (Fig. 4).
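The descriptive statistics above are reported as median (first quartile;third quartile). A minimal sketch of that summary using Python's `statistics.quantiles`; quartile conventions differ between packages, so the 'exclusive' method used here may not match SPSS output exactly for every dataset, and the example scores are hypothetical.

```python
import statistics


def median_iqr(values):
    """Return (median, Q1, Q3) using the 'exclusive' quartile method."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return median, q1, q3


def format_median_iqr(values):
    """Format as 'median (Q1;Q3)', the style used in the text."""
    median, q1, q3 = median_iqr(values)
    return f"{median:g} ({q1:g};{q3:g})"


# Hypothetical BLOSUM62 scores.
print(format_median_iqr([-3, -2, -1, -1, 0, 1, 2]))  # -1 (-2;1)
```

The same helper applies unchanged to the continuous ELASPIC ∆∆G and StructMAn scores.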

Discussion

We present a comprehensive analysis of KMT2D MVs reported in control populations, cancers and KS. Rare KMT2D MVs are frequent in the general population: nearly 80% of Control-MVs have a MAF < 1/10,000 (Table S3). Hence, the rarity of a KMT2D variant is not a reliable indicator of pathogenicity. This compilation highlights five arginine residues in KMT2D that are recurrently substituted in cancer (Arg5048, Arg3582 and Arg3727) and KS (Arg5048, Arg5179 and Arg5432) (Table S3). Interestingly, Arg5048 is amongst the most frequently mutated residues in both cancer and KS. Arg5048 and Arg5432 are located outside any recognised domain of the protein (No domain #16 and #18, respectively, in Figure S2). The Arg5432Trp substitution has been shown to disrupt the interaction of KMT2D with RBBP5 and ASH2L and to result in loss of its catalytic activity [60]. Arg5179 is located in the FYR-N domain, a region of around 50–100 amino acids enriched in phenylalanine (F) and tyrosine (Y) that is found in chromatin-associated proteins [61]. Arg3582 and Arg3727 are located in coiled-coils #3 and #4, respectively. Coiled-coils are a type of secondary structure composed of two or more alpha helices that pack together like a cable, and they help to position catalytic activities at a fixed distance [62].

Intriguingly, we found that six KMT2D MVs have been described in controls, cancers and KS; 85 in cancer and controls; and 83 in KS and controls (Tables S3-1). Several possibilities could account for these MVs being observed in both control and disease cohorts. Overlap between Control-MVs and Cancer-MVs could be explained by germline variants being incorrectly curated as somatic-only in the COSMIC database, or by somatic variants being wrongly curated as benign germline variants in controls. Overlap between Control-MVs and KS-MVs could be explained by incorrect interpretation of the pathogenicity of these benign variants in KS. Alternatively, these variants may cause KS with reduced penetrance; however, incomplete penetrance has not been reported in KS. Notably, somatic mosaicism for truly pathogenic variants in healthy controls has been described in other disorders (e.g. Bohring-Opitz syndrome) [63], and this could explain some of the overlap observed between KS-MVs and Control-MVs. As 65/83 of the overlapping KS-MVs and Control-MVs are located outside the regions enriched for KS-MVs, they are more likely to be benign variants (Tables S3-1).

Among MVs predicted to alter splicing, those affecting canonical splice-donor and splice-acceptor sites were significantly more frequent in cancer and KS, which is consistent with the loss-of-function mechanism associated with these two disorders (Tables S3-2) (Fig. 2). These variants in cancer and KS would be more appropriately reclassified as splicing variants.

Of note, the six Control-MVs affecting the first or last nucleotide of an exon are all located in the first half of the gene (exons 2, 13, 16, 17, 18 and 21; Fig. 2), which would still allow expression of an alternative protein-coding transcript (ENST00000526209.1). The protein encoded by this alternative transcript includes the catalytic SET and Post-SET domains but lacks the PHD-type and RING-type zinc fingers, the SPPPEPEA region, the HMG Box, the coiled-coils, the LXXLL motifs and the FYR-N and FYR-C domains (Figure S2). This observation points towards potential redundancy of the N-terminus of KMT2D, which is consistent with previous observations and may reflect the capacity of the alternative transcript to compensate during normal development [60, 64, 65]. Interestingly, 11/16 (68.8%) KMT2D protein-truncating variants (PTVs) reported in ExAC are located in the first half of the gene (residues 1–2768). This contrasts with KMT2D PTVs in HGMD and COSMIC, where 39% of KS-PTVs and 53% of Cancer-PTVs lie in this region.
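The first-half comparison is a simple positional count. A minimal sketch, assuming each PTV is represented by the residue position at which the protein is truncated; the function name and positions are hypothetical, and the boundary at residue 2768 is taken from the text.

```python
def count_in_first_half(ptv_positions, boundary=2768):
    """Return (in first half, total, fraction) for truncating-variant positions."""
    if not ptv_positions:
        raise ValueError("no variants supplied")
    inside = sum(1 for pos in ptv_positions if 1 <= pos <= boundary)
    return inside, len(ptv_positions), inside / len(ptv_positions)


# Hypothetical truncation positions.
inside, total, fraction = count_in_first_half([150, 1200, 2768, 2769, 4800])
print(f"{inside}/{total} ({100 * fraction:.1f}%)")  # 3/5 (60.0%)
```

Running the same count separately on ExAC, HGMD and COSMIC PTVs gives the three proportions being compared.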

We demonstrate significant clustering of Cancer-MVs and KS-MVs in PHD-type zinc fingers #3 and #4, RING-type zinc finger #4 and the SET domain, reflecting the importance of these domains for KMT2D function. PHD (plant homeodomain) fingers are domains of 50–80 amino acids containing a zinc-binding motif; they occur in many chromatin-associated proteins and recognise methylated H3K4 [66]. RING-type zinc fingers are composed of 40–60 amino acids that bind two zinc atoms and may mediate protein–protein interactions [67]. The SET (Su(var)3-9, Enhancer-of-zeste and Trithorax) domain is composed of 130–140 amino acids and harbours the methyltransferase activity and the substrate-binding sites [60, 68]. The similar clustering of Cancer-MVs and KS-MVs strongly suggests that these variants result in loss of function.

We found significant clustering of Cancer-MVs in the FYR-C domain and between residues 3043 and 3248 (No domain #8 in Figure S2). FYR-C domains have features similar to those of FYR-N domains [61]. Notably, these regions were not enriched for KS-MVs. The lack of KS-MVs in these regions could be due to the limited statistical power of our study. Alternatively, these variants may have dominant-negative or gain-of-function effects specific to some cancers. We therefore examined the types of cancer reported with Cancer-MVs in the FYR-C domain and between residues 3043 and 3248 (No domain #8). In these two regions, 87% and 82.1% of the variants, respectively, came from solid cancers, but there was no enrichment for a specific type of cancer (Table S3). Another possibility is that germline MVs in these regions may result in a condition different from KS, which has yet to be delineated.

In total, 460/570 (80.7%) Cancer-MVs were outside the regions of the protein showing statistically significant clustering. Interestingly, 84/460 of these Cancer-MVs are part of the set of overlapping Cancer-MVs and Control-MVs, compared with only 7/110 Cancer-MVs in the cancer-enriched regions of KMT2D (Tables S3-1). Overall, this analysis suggests that a substantial number of the Cancer-MVs lying outside the cancer-enriched regions of KMT2D may be passenger rather than driver variants.
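The comparison above partitions disease MVs by whether they fall in an enriched region and then checks membership in the Control-MV set. A minimal sketch of that bookkeeping; the variant identifiers and positions are hypothetical placeholders, and only No Domain #8 (residues 3043–3248) is used as an enriched interval because its coordinates are given in the text.

```python
def overlap_by_region(disease_mvs, control_mvs, enriched_regions):
    """Count disease MVs also seen in controls, inside vs outside enriched regions.

    disease_mvs: mapping of variant identifier -> residue position.
    control_mvs: iterable of variant identifiers seen in controls.
    enriched_regions: list of inclusive (start, end) residue intervals.
    Returns {"inside": (overlapping, total), "outside": (overlapping, total)}.
    """
    controls = set(control_mvs)
    tallies = {"inside": [0, 0], "outside": [0, 0]}
    for variant, position in disease_mvs.items():
        in_region = any(start <= position <= end for start, end in enriched_regions)
        tally = tallies["inside" if in_region else "outside"]
        tally[1] += 1
        if variant in controls:
            tally[0] += 1
    return {key: tuple(value) for key, value in tallies.items()}


# Hypothetical identifiers and positions.
cancer = {"mvA": 3100, "mvB": 5000, "mvC": 200, "mvD": 3200}
controls = {"mvB", "mvC", "mvX"}
print(overlap_by_region(cancer, controls, [(3043, 3248)]))
# {'inside': (0, 2), 'outside': (2, 2)}
```

The same function applies to KS-MVs with the KS-enriched intervals, which is the comparison made in the paragraph on KS-MVs outside the enriched regions.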

For KS-MVs, we detected significant clustering in the coiled-coil#5 and FYR-N domains and between residues 4995 and 5090 (No Domain #16 in Figure S2), but we did not identify significant clustering of Cancer-MVs in these regions. As MVs in these three regions are likely to result in loss of function, the lack of Cancer-MV clustering in these regions probably reflects a lack of statistical power.

In total, 120/190 KS-MVs were outside the regions of the protein showing statistically significant clustering. Of note, 75/120 of these KS-MVs were also seen among Control-MVs, compared with only 12/70 KS-MVs in the KS-enriched regions of KMT2D (Tables S3-1). Furthermore, 107/120 of these MVs were either inherited from an apparently unaffected parent or lacked information on inheritance. Taken together, 75 KS-MVs can be classed as benign or as variants of uncertain significance (VUS) under the American College of Medical Genetics guidelines [69]. Finally, misdiagnosis of KS in some patients might also explain why their phenotypes do not match their genetic findings, which may be benign. Unfortunately, many KS-MVs were obtained from sources without a comprehensive clinical description of the individual (e.g. ClinVar, Hannibal et al. [12] and Van Laarhoven et al. [30]), in which most patients were described only as having KS. Therefore, we could not separate patients with a true KS phenotype from those without it.

Twenty-two MVs were seen in both KS and cancers (Tables S3-1). Of note, 21 of these were located in KS-enriched and/or cancer-enriched regions. The only MV outside these enriched regions, p.Arg5340Leu, may abolish the interaction between KMT2D and WDR5, resulting in complete loss of the H3K4 dimethylation activity of the complex [33, 34]. Thus, all of the MVs shared between KS and cancer are highly likely to be pathogenic.

We did not find clustering of pathogenic MVs in a number of recognised domains and motifs of KMT2D, such as the SPPPEPEA region, the HMG Box, most coiled-coils (except coiled-coil#5), the LXXLL motifs and the Post-SET domain. The SPPPEPEA region is a poorly characterised sequence of repeats composed of serine (S), proline (P), glutamic acid (E) and alanine (A) [70]. The HMG (high mobility group) Box is a DNA-binding sequence of ~75 amino acids [71]. The LXXLL (L, leucine; X, any amino acid) motifs are necessary to activate nuclear receptors and, therefore, transcription [72]. The Post-SET domain also contributes to the methyltransferase activity of KMT2D [68]. Our results suggest either that these regions of KMT2D are more tolerant of variation or that there may be as yet unrecognised phenotypes associated with variants in these regions.

We found that Cancer-MVs and KS-MVs tend to affect more conserved residues, that KS-MVs increase the energy the protein needs for folding/interacting, and that Cancer-MVs have a greater probability of disrupting protein interactions. We did not identify a significant difference in ELASPIC ∆∆G scores between Cancer-MVs and Control-MVs, or in StructMAn scores between KS-MVs and Control-MVs (Fig. 4b, c), which could be due to the limited information available on the dynamics and interaction sites of KMT2D. This is reflected by our observation that ELASPIC ∆∆G scores and StructMAn interaction scores could be generated for only 222/2279 and 92/2279 MVs, respectively. This also limited the analysis of scores by location (e.g. the enriched regions), as most of these values were available only for the catalytic domain and the PHD-1 and PHD-2 zinc finger domains (Table S3).

Although this approach needs confirmation by large-scale functional analyses, which have only recently been described [73], and by better characterisation of the KMT2D protein structure, a recent study of the functional consequences of some MVs in this gene supports our approach. Cocciadiferro et al. [34] demonstrated that MVs detected in patients with KS and located in PHD-type zinc fingers #3 and #4 (p.Glu1391Lys, p.Met1417Val, p.Ile1428Thr, p.Ser1476Cys), RING-type zinc finger #4 (p.Thr5098Pro), FYR-N (p.Gly5189Arg, p.Trp5217Met) and SET (p.Arg5471Met, p.Glu5425Lys, p.Tyr5510Asp) domains, and between residues 4995 and 5090 (No Domain #16; p.Phe5034Val, p.His5059Pro), decreased catalytic activity and/or disrupted the interaction of KMT2D with ASH2L/RbBP5. These are exactly the regions and domains that our study found to be enriched for KS-MVs compared with Control-MVs. The two exceptions are PHD-type zinc finger #5 and coiled-coil#5. Although the p.Gln1522Arg MV in PHD-type zinc finger #5 also disrupted enzymatic activity and interaction with ASH2L/RbBP5, this domain was not found to be enriched for KS-MVs in our analysis, possibly because too few MVs have been detected in this domain in patients with KS. Conversely, no MVs in coiled-coil#5 were studied by Cocciadiferro et al. [34], so the relevance of this domain to KMT2D function cannot be excluded.

Similarly, the few Cancer-MVs that have been characterised functionally are also concordant with our results. Zhang et al. [74] demonstrated that MVs detected in patients with lymphomas and located in RING-type zinc finger #4 (p.Cys5092Ser, p.Cys5092Tyr), FYR-C (p.Asp5257Val) and SET (p.Arg5432Trp, p.Asn5437Ser, p.Gly5467Asp) domains decreased the catalytic activity of KMT2D. All three of these domains were found to be enriched for Cancer-MVs compared with Control-MVs. Other relevant MVs that decreased KMT2D activity in lymphomas were p.Arg5027Leu and p.Leu5056, which are located between residues 4995 and 5090 (No Domain #16). This region was not found to be enriched for Cancer-MVs in our analysis, which may be explained by the type of cancer studied. Conversely, no MVs in PHD-type zinc fingers #3 and #4, or between residues 3043 and 3248 (No domain #8), were studied by Zhang et al. [74], so the relevance of these regions to KMT2D function cannot be excluded.

In conclusion, this compilation can aid the analysis of KMT2D MVs in diagnostic laboratories. We show that the rarity of KMT2D variants has limited value in determining their pathogenicity. We have identified a set of recurrent KMT2D MVs in cancer and KS. We show that some presumed KMT2D MVs are in fact likely to result in loss of function through the introduction of a frameshift. This work leads to the reclassification of a set of presumed pathogenic MVs as benign variants or VUS. We identify regions of the KMT2D protein, within and outside its known domains and regions, that show significant clustering of MVs in cancer and KS. We establish that the mechanism of most pathogenic KMT2D Cancer-MVs is loss of function, although other possibilities cannot be ruled out for some atypical Cancer-MVs. We raise the possibility of as yet unrecognised ‘non-KS’ phenotypes associated with some germline pathogenic MVs. Finally, this work provides insights into the disease mechanisms of cancers driven by KMT2D mutations and of KS1 (Kabuki syndrome type 1). Future work will be needed to understand the impact of the MVs that could not be examined with the in silico programmes described here. Similar analyses should also be carried out for other genes in which mutations cause both developmental syndromes and cancer [1, 2].