Introduction

Vertebrate craniofacial structures develop through multiple interactions between the epithelium and the underlying mesenchyme, and a series of transcription factors are involved in these processes.1 Among these transcription factors are members of the Msx homeobox gene family, which are related to the Drosophila msh (muscle segment homeobox) genes.2 In humans, the MSX gene family consists of two members, MSX1 and MSX2, which are expressed in partly overlapping patterns in the embryonic craniofacial region.3 As a transcriptional repressor, Msx1 is crucial in palatogenesis and odontogenesis.4 It is also involved in limb formation, development of the nervous system and inhibition of tumor growth.5, 6 In humans, MSX1 variants show pleiotropic phenotypes and are variably associated with non-syndromic cleft lip with or without cleft palate (CL/P; OMIM #608874, Orofacial Cleft 5, OFC5), non-syndromic tooth agenesis (OMIM #106600, Tooth Agenesis, Selective, 1; STHAG1), Witkop syndrome (OMIM #189500) and Wolf-Hirschhorn syndrome (WHS; OMIM #194190).7, 8, 9, 10 Mice with a homozygous deletion of Msx1 exhibit a complete cleft palate and failure of tooth development.11

Affected individuals from the same pedigree carrying the same MSX1 variant also show variable phenotype severity. Proposed explanations for this variability include epigenetic and environmental factors, stochastic effects and modifier genes.12 Genotype-phenotype relations for MSX1 variants are therefore complex. Previous studies have mainly focused on tooth agenesis, relating MSX1 variants to the type and number of missing teeth. They have generally concluded that the second premolars are the teeth most frequently missing in carriers of MSX1 variants.13, 14, 15 In a previous study we identified a novel MSX1 mutation causing tooth agenesis with cleft lip, further confirming that different MSX1 mutations may cause different phenotypes.16

This review reveals a strong correlation between the observed phenotypes and the location of the disease-causing mutations in the MSX1 protein structure. Mutations in the structured part, which disturb DNA binding by the homeodomain (HD), preferentially cause tooth agenesis with or without other phenotypes. Variants in the natively unfolded N-terminal part of the protein generally cause non-syndromic orofacial clefting (ns OFC). Interestingly, all variants associated with ns OFC are in-frame missense mutations, while all syndrome-associated variants are truncating mutations that do affect the HD. Truncating MSX1 mutations cause more severe tooth agenesis than in-frame MSX1 mutations. When the variants are mapped onto the experimentally determined part of the MSX1 3D structure, the indel mutations are predicted to have the most deleterious effect on DNA binding, and hence to cause the most severe tooth agenesis phenotypes.

Disease phenotypes caused by MSX1 mutations

MSX1 and non-syndromic orofacial clefting

Non-syndromic orofacial clefts (ns OFC) are common birth defects in humans and are generally classified as cleft lip with or without cleft palate or as cleft palate only. The etiology of ns OFC is complex, involving both genetic and environmental factors.17 Msx1 first emerged as a candidate gene for clefting from a knock-out (KO) mouse experiment.11 Subsequently, case-control and nuclear family-based studies showed that MSX1 also plays a role in human clefting, and more than 10 MSX1 variants have been related to CL/P7, 18 (Table 1; Figures 1a and b). It has been estimated that MSX1 variants account for 2% of all OFC cases.7, 21 Msx1 was shown to maintain the growth of the primary palate during mammalian palatogenesis by regulating the expression of growth factors such as Bmp4.22 However, the exact role of Msx1 in these and other regulatory processes is still largely unknown. Using the information assembled by KMAD (Knowledge-based Multiple sequence Alignment for intrinsically Disordered proteins),23 we identified the highly conserved amino acid sequence motif LIG_EH1_1 (the Engrailed Homology-1 motif). According to the Eukaryotic Linear Motif (ELM) database,24 this motif is part of a composite, highly conserved eukaryotic motif (Figure 2) that recruits other proteins, which subsequently mediate transcriptional down-regulation. The motif lies in the intrinsically disordered part (IDP), recently also called the natively unfolded part (NUP), of the MSX1 protein, N-terminal to the homeodomain (Figure 2). The EH1 motif is known to mediate physical interaction with transcriptional co-repressors of the Groucho/TLE protein family. It recruits and binds these co-repressors, which take part in transcriptional repressor complexes.25
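
Short linear motifs such as EH1 are typically located by pattern matching on the protein sequence. The sketch below assumes the commonly cited EH1 core consensus F-x-I-x-x-I-L, where x is any residue. This is an illustrative simplification: the ELM class LIG_EH1_1 has its own, more detailed definition, which is not reproduced here. The test peptide is a hypothetical toy string, not the MSX1 sequence.

```python
import re

# Assumed core EH1 consensus F-x-I-x-x-I-L ("." matches any residue).
# This is an illustrative simplification, not the ELM LIG_EH1_1 expression.
EH1_CORE = re.compile(r"F.I..IL")


def find_eh1_like(protein_seq: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched peptide) for each EH1-like hit."""
    return [(m.start() + 1, m.group()) for m in EH1_CORE.finditer(protein_seq.upper())]


# Hypothetical toy peptide, NOT the MSX1 protein sequence.
toy_peptide = "MASPFSIDNILSKRPA"
print(find_eh1_like(toy_peptide))  # [(5, 'FSIDNIL')]
```

A plain regular expression only flags candidate motifs. Judging whether a hit is functional, as done here with KMAD and ELM, also requires conservation and structural context.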

Table 1 MSX1 mutations with non-syndromic orofacial clefts (nsOFC)
Figure 1

Distribution of all MSX1 mutations included in this review. Variant descriptions at the nucleotide and amino acid level are based on the reference sequences NC_000004.12 (chromosome 4, GRCh38.p2; GCF_000001405.28), NP_002439.2 and NM_002448.3. (a) Pie chart of the 18 'in-frame' mutations (blue) versus the 13 'truncating' mutations. The truncating mutations comprise 5 nonsense mutations (red), 5 indel mutations (yellow), 1 splice site mutation (green), 1 nonstop mutation (purple) and 1 whole gene deletion (brown). All in-frame mutations are missense mutations. (b) Mapping of the MSX1 mutations and their associated disease phenotypes to the coding and non-coding structures of MSX1. The horizontal boxes on top represent exon 1 and exon 2; the line between the two exons represents the intron; the grey box is the homeodomain (HD) coding region; ATG, start codon; TAG, stop codon. The vertical blue boxes contain the in-frame missense variants, the red boxes the nonsense mutations, the yellow boxes the indel mutations, the green box the splice site mutation and the purple box the nonstop mutation. (c) Mutations shown in the experimentally determined structural part of the MSX1 protein (PSI; Protein Model Portal; 1ig7; Msx-1 Homeodomain/DNA Complex Structure; residues 173-230; http://www.proteinmodelportal.org/query/uniprot/P28360). They use the same colors as in parts (a) and (b): blue for missense, red for nonsense and yellow for indel mutations. DNA is depicted in green and the protein structure in grey. From their location in the MSX1 structure, the indel mutations can be predicted to severely disturb DNA binding. The missense mutations are also expected to disturb DNA binding, but with variable severity. The red residues are replaced by stop codons and thus represent the first residues absent in these mutants; their effect cannot be predicted.
A full color version of this figure is available at the European Journal of Human Genetics journal online.

Figure 2

Panel with the amino acid sequences and motifs in the natively unfolded part of the MSX1 protein. According to the Knowledge-based Multiple sequence Alignment for intrinsically Disordered proteins (KMAD; DOI:10.1093/bioinformatics/btv663; Lange et al, 2016), most prediction programs classify this part of the MSX1 protein, N-terminal to the homeodomain, as 'natively unfolded'. Many of the sequence motifs in this natively unfolded part have not yet been experimentally associated with a function (KMAD-align; DOI:10.1093/bioinformatics/btv663). Nevertheless, a high degree of amino acid (AA) sequence conservation indicates an important role. This is the case for the AA sequence in the large orange block under the orange arrow. It corresponds to the LIG_EH1_1 motif, in which the M67K mutation causing ns TA resides. This mutation is predicted to severely disrupt the conserved sequence of the LIG_EH1_1 motif. All other arrows point to MSX1 variants associated with ns OFC; the motifs and sequences containing these variants are less conserved. MOD stands for variants found in mouse data. Both MOD_SUMO and MOD_CDK1 (see color legend) have previously been mentioned in relation to cleft lip and palate development in mice. A full color version of this figure is available at the European Journal of Human Genetics journal online.

MSX1 and non-syndromic tooth agenesis

Non-syndromic tooth agenesis (ns TA) is another common developmental anomaly that can be caused by MSX1 variants.14 To date, nearly 20 MSX1 mutations have been related to ns TA8, 13, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37 (Table 2). Functional analyses of the mutant proteins suggest that haploinsufficiency of MSX1 underlies this phenotype.16, 38 The mutant proteins were predicted to have reduced stability, reduced DNA-binding capacity or a reduced ability to interact with their cognate protein partners. As a result, the function of MSX1 as a transcriptional repressor can be greatly impaired.27, 39, 40

Table 2 MSX1 mutations in individuals with non-syndromic tooth agenesis

Reduced MSX1 function could also result from epigenetic silencing of MSX1 by DNA methylation. Such silencing could give rise to one or a combination of its phenotypes, or to an increased risk of tumor growth.

Syndromes caused by MSX1 mutations

Mutations in or including the MSX1 gene can also cause syndromic forms of tooth agenesis, including Wolf-Hirschhorn syndrome, Witkop syndrome, and tooth agenesis combined with orofacial clefting (Table 3). Wolf-Hirschhorn syndrome is caused by a deletion on chromosome 4p that removes the WHS locus, including MSX1.10 Its phenotype includes intellectual disability, growth retardation, characteristic craniofacial features, seizures and tooth agenesis. Furthermore, a nonsense mutation in MSX1 accounts for the genetic etiology of Witkop syndrome, which is characterized by tooth agenesis and nail dysplasia.9 To date, two reports have proposed MSX1 as a candidate gene for tooth agenesis combined with orofacial clefting.16, 41

Table 3 MSX1 mutations with combined phenotypes or syndrome

From genotype to phenotype

Overview of human MSX1 mutations

Since the first MSX1 variant was identified in a pedigree with tooth agenesis,8 several additional pathogenic variants have been discovered. To analyze the MSX1 mutations, we retrieved all MSX1 variants recorded in HGMD (Human Gene Mutation Database) Professional up to 30 June 2015. All variants of the class 'DM' (Disease causing Mutation) were included.42 The only exception was a splice site variant,43 which was excluded because no detailed phenotype description was available. We obtained 31 MSX1 variants: 18 missense mutations, 5 nonsense mutations, 5 indel mutations, 1 splice variant, 1 nonstop variant and 1 entire gene deletion (Tables 1-3, Figure 1a). These variants cause tooth agenesis with or without orofacial clefting, non-syndromic orofacial clefting, Wolf-Hirschhorn syndrome or Witkop syndrome (Tables 1-3, Figure 1b).
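
The inclusion rule above can be written as a simple filter. The sketch below assumes each variant is available as a record with a variant class, a consequence and an optional phenotype description. The field names and record identifiers are hypothetical and do not describe HGMD's actual export format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VariantRecord:
    """Hypothetical record layout; field names are illustrative, not HGMD's format."""
    variant_id: str
    variant_class: str          # HGMD variant class, e.g. "DM" (disease-causing mutation)
    consequence: str            # e.g. "missense", "nonsense", "splice"
    phenotype: Optional[str]    # None if no detailed phenotype description is available


def select_for_review(records: list[VariantRecord]) -> list[VariantRecord]:
    """Keep only 'DM' variants that come with a phenotype description.

    The exact string comparison deliberately excludes other classes such as "DM?".
    """
    return [r for r in records if r.variant_class == "DM" and r.phenotype]


# Toy records with hypothetical identifiers.
records = [
    VariantRecord("var1", "DM", "missense", "ns OFC"),
    VariantRecord("var2", "DM", "splice", None),        # excluded: no phenotype description
    VariantRecord("var3", "DM?", "missense", "ns TA"),  # excluded: not class DM
]
print([r.variant_id for r in select_for_review(records)])  # ['var1']
```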

In-frame mutations and truncating mutations

First, the MSX1 variants were divided into two subsets. In-frame mutations comprise only missense mutations. Truncating mutations comprise all nonsense mutations, out-of-frame insertions or deletions, mutations causing defective splicing, nonstop mutations and deletions of the entire gene. MSX1 in-frame mutations are more frequent than truncating mutations (18 vs 13) (Figure 1a), which is consistent with many other Mendelian genes.44
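
The two-subset classification can be checked against the counts reported in this review with a small tally. This is a sketch: the mutation-type labels are our own shorthand, with "indel" meaning out-of-frame insertions or deletions as defined above.

```python
from collections import Counter

# Mutation types grouped as described in the text; "indel" here means
# out-of-frame insertions or deletions.
IN_FRAME = {"missense"}
TRUNCATING = {"nonsense", "indel", "splice", "nonstop", "whole_gene_deletion"}


def subset_of(mutation_type: str) -> str:
    """Assign a mutation type to the 'in-frame' or 'truncating' subset."""
    if mutation_type in IN_FRAME:
        return "in-frame"
    if mutation_type in TRUNCATING:
        return "truncating"
    raise ValueError(f"unclassified mutation type: {mutation_type}")


# Counts per mutation type as reported in this review (Figure 1a).
type_counts = {"missense": 18, "nonsense": 5, "indel": 5,
               "splice": 1, "nonstop": 1, "whole_gene_deletion": 1}

subset_totals = Counter()
for mutation_type, n in type_counts.items():
    subset_totals[subset_of(mutation_type)] += n

print(dict(subset_totals))  # {'in-frame': 18, 'truncating': 13}
```

Raising an error for unknown labels, rather than silently defaulting to one subset, keeps any future variant type from being miscounted.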

We analysed the phenotypes in terms of the domains in which the variants are located and identified a clear segregation. The average number of missing teeth is lower for in-frame mutations than for truncating mutations, which thus lead to more severe phenotypes (Figure 3). This might be explained by haploinsufficiency, which is thought to play a role in tooth agenesis.38 MSX1 acts in a dose-sensitive manner. Truncating mutations often lead to complete loss of function or to abnormal mRNA and/or protein expression, so they impair MSX1 function more severely than in-frame mutations do.16

Figure 3

Comparison of different types of MSX1 variants. In-frame versus truncating mutations and their corresponding phenotypes: non-syndromic tooth agenesis (ns TA) and non-syndromic orofacial clefts (ns OFC). Number of missing teeth for in-frame (blue) versus truncating (red) variants; all mutations in this histogram cause tooth agenesis with or without other phenotypes. The bars indicate the average number of missing teeth, with error bars representing the maximum and minimum number. The arrows above the bars indicate the presence of other phenotypes. MSX1 variants affecting the homeodomain (HD) versus those not affecting the HD, and their corresponding phenotypes (ns TA and ns OFC). A full color version of this figure is available at the European Journal of Human Genetics journal online.

Second, we analysed the relation between TA phenotype severity and the mutations in as much detail as possible. To do so, we refined the phenotypes into 6 subcategories with an increasing (average) number of missing teeth (Table 4). In this way we could cover the spectrum of dental phenotype severity from agenesis of 1-4 teeth to agenesis of 17-20 teeth in the permanent dentition (excluding the wisdom teeth). We also recorded the changes in individual amino acid properties (charge, polarity, hydrophilicity/hydrophobicity and volume) that could contribute to, or explain, the observed phenotype. The 8 mildest phenotypes (subcategories 1-4, 5-8 and 9-12) were compared with the 8 most severe phenotypes (subcategories 13-16, 17-20 and 21-24), revealing three differences (Table 4). The first concerns the number of mutations affecting the HD: 5 of the 8 mutations in the mildest group, versus 7 of the 8 in the severest group. The second concerns the number of frameshift mutations: 3 of 8 in the mildest group, versus 5 of 8 in the severest group. The third concerns nucleotide duplications: interestingly, none of the 8 variants with the mildest phenotypes involves a nucleotide duplication, whereas 4 of the 8 variants with the severest phenotypes do. TA phenotype severity is therefore related not only to the location of the mutation (in the HD or not) but also to the mutation type, with frameshift mutations more often causing severe TA. Moreover, among the frameshift mutations, those caused by duplications of nucleotides or dinucleotides are associated only with severe ns TA (Figure 3 and Table 4).
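
The binning of tooth agenesis severity into 4-tooth subcategories, and the mild versus severe split, can be sketched as follows. One assumption is ours: averages are assigned to bins as (0, 4], (4, 8] and so on, so that, for example, 3.6 falls into 1-4. The feature counts are those reported in the comparison above.

```python
import math

MILD_SUBCATEGORIES = {"1-4", "5-8", "9-12"}


def subcategory(missing_teeth: float) -> str:
    """Map a (possibly average) number of missing teeth to a 4-tooth bin, e.g. '13-16'.

    Bins are (0, 4], (4, 8], ..., (20, 24]; this rounding rule is an assumption.
    """
    if not 0 < missing_teeth <= 24:
        raise ValueError("expected between 1 and 24 missing teeth")
    k = math.ceil(missing_teeth / 4)  # bin index 1..6
    return f"{4 * k - 3}-{4 * k}"


def severity_group(missing_teeth: float) -> str:
    """Split the six subcategories into the 'mild' and 'severe' comparison groups."""
    return "mild" if subcategory(missing_teeth) in MILD_SUBCATEGORIES else "severe"


# Counts out of 8 variants per group, as reported in the text (Table 4).
feature_counts = {"affects HD": (5, 7), "frameshift": (3, 5), "duplication": (0, 4)}
for feature, (mild, severe) in feature_counts.items():
    print(f"{feature}: {mild}/8 mild vs {severe}/8 severe")
```

With only 8 variants per group, these differences are descriptive; the sketch makes the grouping explicit rather than adding statistical support.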
An additional analysis of the changes in amino acid parameters in the 2 severity subgroups (Table 4) yielded little further insight, as only 2 missense variants were present in the severest group versus 5 in the mildest group. Nevertheless, the volume of the mutant amino acid seems important. It was the only parameter that differed from the wild-type (WT) amino acid in opposite directions in the two groups. In the mildest phenotype, with an average of 3.6 missing teeth, the mutant amino acid was smaller than the WT residue (L>P) (Table 4), while charge, polarity and hydrophilicity did not change. In the severest phenotype, with 15 missing teeth and all other parameters being equal, the mutant amino acid was larger than the WT residue (A<T) (Table 4).
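
The opposite volume changes can be made concrete with tabulated residue volumes. The sketch below assumes the approximate residue volumes of Zamyatnin (1972); other volume scales give slightly different numbers but the same direction of change.

```python
# Approximate residue volumes in cubic angstroms as tabulated by Zamyatnin (1972);
# only the residues discussed in the text are included.
RESIDUE_VOLUME_A3 = {"A": 88.6, "L": 166.7, "P": 112.7, "T": 116.1}


def volume_change(wild_type: str, mutant: str) -> float:
    """Volume of the mutant residue minus that of the wild-type residue (cubic angstroms)."""
    return RESIDUE_VOLUME_A3[mutant] - RESIDUE_VOLUME_A3[wild_type]


for wt, mut in [("L", "P"), ("A", "T")]:
    delta = volume_change(wt, mut)
    direction = "smaller" if delta < 0 else "larger"
    print(f"{wt}->{mut}: {delta:+.1f} cubic angstroms, mutant residue is {direction}")
```

On this scale, Leu to Pro removes about 54 cubic angstroms and Ala to Thr adds about 27.5. This matches the smaller mutant residue in the mildest phenotype and the larger one in the severest phenotype.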

Table 4 MSX1 variants in subphenotypes of increasing severity of non-syndromic tooth agenesis (ns TA) and changes in charge, polarity and hydrophilicity/hydrophobicity

Third, the mutations in the HD were mapped onto the 3D structure of this domain and visualized with YASARA45 (Figure 1c). From their location in the MSX1 structure, the indel mutations can be predicted to severely disturb DNA binding. The missense mutations are also expected to disturb DNA binding, but with variable severity (Table 4; Figure 1c). The red residues are replaced by stop codons and thus represent the first residues absent in these variants; their effect therefore cannot be predicted.

Mutations (not) affecting the homeodomain

Mutations located in different protein domains tend to cause different disease phenotypes,46 because different protein functions are affected. For example, mutations in different domains of TP63 cause either ectrodactyly-ectodermal dysplasia-cleft lip/palate (EEC) syndrome or ankyloblepharon-ectodermal dysplasia-cleft lip/palate (AEC) syndrome.47

MSX1 consists of two exons, the second of which encodes the highly conserved homeodomain (HD). The homeodomain is essential for protein stability, DNA binding, transcriptional repression and interactions with other odontogenic molecules such as PAX9, TATA-binding protein (TBP) and DLX family members.38 When the 31 identified MSX1 variants were mapped onto the gene and protein, some affected the HD and others did not (Figure 1b). Those affecting the HD were all mutations within the HD itself and truncating mutations upstream of the HD. Those not affecting it comprised all in-frame mutations outside the HD and truncating mutations downstream of the HD. This yielded 16 mutations affecting and 15 mutations not affecting the HD (Figure 3). Mutations affecting the HD preferentially cause tooth agenesis with or without other phenotypes, while mutations not affecting the HD preferentially cause orofacial clefting (Figure 3). Only 4 of the 15 mutations not affecting the HD cause ns TA. These are three truncating mutations that affect the MSX1 protein structure31 and one missense mutation, p.(M67K). The latter is predicted to reside in the highly conserved EH1 sequence motif in the natively unfolded part of the MSX1 protein, N-terminal to the HD.
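
The rule for deciding whether a mutation affects the HD can be written as a small function. Two assumptions apply. First, the HD span is taken as residues 173-230, which is the range of the 1ig7 structure cited in the Figure 1c legend; the annotated homeodomain boundaries may differ slightly. Second, a whole gene deletion is treated as affecting the HD. Residue positions other than p.(M67K) are hypothetical.

```python
from dataclasses import dataclass

# Assumed HD span: residues 173-230, the range of the 1ig7 structure cited in
# the Figure 1c legend. The annotated homeodomain boundaries may differ slightly.
HD_START, HD_END = 173, 230


@dataclass
class Mutation:
    residue: int               # first affected residue (hypothetical example positions)
    truncating: bool           # nonsense, frameshift indel, splice, nonstop or deletion
    whole_gene: bool = False   # deletion of the entire MSX1 gene


def affects_hd(m: Mutation) -> bool:
    """Apply the rule described in the text for whether a mutation affects the HD."""
    if m.whole_gene:
        return True                       # assumption: the HD is lost with the gene
    if HD_START <= m.residue <= HD_END:
        return True                       # any mutation within the HD itself
    # Outside the HD, only truncating mutations upstream of it remove the HD.
    return m.truncating and m.residue < HD_START


print(affects_hd(Mutation(residue=67, truncating=False)))  # False, as for p.(M67K)
```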

All MSX1 missense variants associated with ns OFC also map to the natively unfolded part (NUP) of the MSX1 protein. There they fall in sequences and motifs that are somewhat less conserved than the LIG_EH1_1 motif (Figures 1b and 2). The functions of these motifs are not all known or experimentally validated. Nevertheless, we conclude that, in contrast to ns TA, ns OFC is mainly associated with MSX1 variants in motifs involved in cellular regulatory processes. These motifs lie in somewhat less conserved regions than the variants causing ns TA. Moreover, all mutations associated with ns OFC map outside the HD, which suggests that the HD is less important in orofacial clefting. For the syndromic phenotypes, all mutations are truncating mutations affecting both the HD and the sequence C-terminal to the HD (Figures 1b and c).

Edgetic perturbation model

Several difficulties hinder the prediction of an exact phenotype from the type and location of an MSX1 mutation. These include gene pleiotropy, incomplete penetrance and variable expressivity. Only a few examples exist in which the corresponding phenotype could reliably be predicted from the mutation.48

Recently, network modeling has been introduced to explain how specific mutations may lead to distinct phenotypes.44, 46, 49, 50, 51 In one such network model, the edgetic perturbation model, a mutation is considered to alter molecular interactions either through edgetic perturbation or through node removal (Figure 4). Edgetic perturbation removes or adds specific interactions (edges), while all other interactions remain intact. In node removal, all interactions of the affected molecule with other molecules are lost. The perturbation of specific interactions by individual genetic variants can thus give rise to distinct phenotypes (Figure 4).
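
The difference between the two perturbation types can be illustrated on a toy interaction network, represented as a set of undirected edges. The network below is hypothetical. The MSX1 partners are those named in this review (PAX9, TBP, DLX family members, and Groucho/TLE via the EH1 motif). "PARTNER_X" is a placeholder node included only to show that node removal leaves edges between other molecules untouched.

```python
def edge(a: str, b: str) -> frozenset:
    """Undirected interaction between two molecules."""
    return frozenset((a, b))


# Hypothetical toy network; "DLX" stands for the DLX family and
# "PARTNER_X" is a placeholder node, not a real molecule.
WILD_TYPE = {
    edge("MSX1", "PAX9"),
    edge("MSX1", "TBP"),
    edge("MSX1", "DLX"),
    edge("MSX1", "TLE"),
    edge("PAX9", "PARTNER_X"),
}


def edgetic_perturbation(edges, lost=(), gained=()):
    """Remove and/or add specific interactions; all other edges stay intact."""
    return (set(edges) - set(lost)) | set(gained)


def node_removal(edges, node):
    """Remove a molecule: every interaction that involves it is lost."""
    return {e for e in edges if node not in e}


single_edge = edgetic_perturbation(WILD_TYPE, lost=[edge("MSX1", "PAX9")])
no_msx1 = node_removal(WILD_TYPE, "MSX1")
print(len(WILD_TYPE), len(single_edge), len(no_msx1))  # 5 4 1
```

Using frozensets for edges keeps interactions undirected, so edge("MSX1", "PAX9") and edge("PAX9", "MSX1") refer to the same interaction.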

Figure 4

The edgetic perturbation model. Different MSX1 variants might have different effects on the MSX1 network. Compared with the wild-type genotype with full interaction (a), loss of some interactions (b and c) or loss of all interactions (node removal, d) leads to different phenotypes. Blue arrows represent in-frame mutations; red arrows represent truncating mutations. Nodes represent molecules, and edges represent interactions between them. A full color version of this figure is available at the European Journal of Human Genetics journal online.

In line with this edgetic perturbation model, we hypothesize that in-frame MSX1 mutations affecting the HD are likely to cause edgetic perturbations involving only one edge, leading to ns TA (Figure 4c). In-frame mutations not affecting the HD would perturb a different edge, leading to ns OFC (Figure 4b). Finally, truncating mutations might cause node removal, perturbing all edges and leading to a combination of several phenotypes or a syndrome (Figure 4d).
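
The hypothesis can be stated as an explicit decision rule. The sketch assumes that in-frame mutations within the HD correspond to the single-edge ns TA case. It is a simplification of the hypothesis, not a validated predictor: the review itself reports exceptions, such as truncating mutations downstream of the HD and p.(M67K), which cause ns TA.

```python
def predicted_perturbation(truncating: bool, affects_hd: bool) -> tuple[str, str]:
    """Return (network perturbation, expected phenotype) under the stated hypothesis.

    A simplification: known exceptions (e.g. p.(M67K) causing ns TA) are not modeled.
    """
    if truncating:
        return ("node removal", "combined phenotypes or syndrome (Figure 4d)")
    if affects_hd:
        return ("single-edge perturbation", "ns TA (Figure 4c)")
    return ("different single-edge perturbation", "ns OFC (Figure 4b)")


print(predicted_perturbation(truncating=False, affects_hd=False)[1])  # ns OFC (Figure 4b)
```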

Conclusions

MSX1 mutations cause different phenotypes depending on their location in the gene. Variants affecting the HD mainly cause tooth agenesis with or without other phenotypes, while mutations not affecting the HD preferentially cause ns OFC. Mutations causing ns OFC are all in-frame mutations not affecting the HD, while syndrome-associated mutations are all truncating mutations that do affect the HD.

Truncating MSX1 mutations cause more severe tooth agenesis phenotypes than in-frame MSX1 mutations. Our findings, and our hypotheses based on the edgetic perturbation model, can help explain how MSX1 mutations alter molecular interactions and cause specific phenotypes. This not only increases our mechanistic insight into, and understanding of, the pathogenicity of MSX1 variants in craniofacial disorders, but could also broaden the options for their precision treatment in the future.