Introduction

Oral clefts are common multifactorial birth defects that present with a wide range of abnormalities of the upper lip, the primary palate, and the secondary palate; they include cleft lip (CL), cleft palate (CP), CL and/or palate (CL/P), incomplete CP, and submucous CP.1,2 Because the secondary palate consists of both a bone-lined hard palate and a bone-free soft palate, incomplete CP includes hard-palate cleft, soft-palate cleft, and bifid uvula. The mildest forms of CP are defects of the soft palate only (soft-palate cleft) or of the uvula only (bifid uvula). Oral clefts may be nonsyndromic or may manifest as a clinical feature of a syndrome. They can be caused by different etiological factors, such as single-gene mutations, chromosomal aberrations, and specific environmental agents, as well as by interactions between genetic and environmental influences.3,4 Concordance rates for CL, CL/P, and CP are higher in monozygotic twins than in dizygotic twins,5 which indicates significant, but not exclusive, genetic contributions. Epidemiological studies indicate that oral cleft phenotypes may have different underlying etiologies. For instance, isolated CP and CL/P seldom occur in the same family.3 Siblings of patients with CL/P have an increased frequency of CL/P but not of isolated CP, whereas siblings of patients with isolated CP have an increased frequency of isolated CP but not of CL/P.3 Moreover, CL/P and CP differ in sex ratio and prevalence. The recurrence risk of CP among siblings is higher in females than in males, whereas the reverse is true for CL/P.3,6 Gaining insight into the different etiologies of oral clefts is important because it may lead to improved diagnosis, counseling, and prevention.

Oral clefts in humans are associated with a large number of genetic diseases and syndromes,7 and studies of genetically engineered mice with oral clefts have improved our understanding of palatogenesis.8,9 As a result, many mutations associated with human and mouse oral clefts have been identified, and the molecular functions of the affected genes have been elucidated. Because the identification and functional classification of disease-causing genes can reveal general biological mechanisms underlying human diseases and disorders,10 investigating the functional annotation of candidate genes associated with oral clefts should improve our understanding of both the biological basis and the underlying genetic causes of this phenotypically variable and complex group of conditions.

Materials and methods

Genes associated with human oral cleft phenotypes

Online Mendelian Inheritance in Man (OMIM) (http://omim.org)11 is a comprehensive, well-established database of human genes and genetic disorders that integrates genetic information with clinical phenotypes and diseases in humans. Similarly, the GATACA database (https://gataca.cchmc.org/gataca/) links genes to different diseases or phenotypes, using cross mapping to identify genetic overlap between different biological elements, functions, or processes. To evaluate the genetic basis of human palatogenesis, we first used OMIM and GATACA to identify congenital disorders or syndromes associated with oral clefts and their candidate genes. The databases were searched using the terms “cleft lip and palate”, “cleft lip/palate”, “cleft lip and/or palate”, “cleft lip”, “cleft of upper lip”, “cleft palate”, “cleft secondary palate”, “incomplete cleft palate”, “submucosal cleft palate”, “submucous cleft palate”, “soft palate cleft”, “cleft of the soft palate”, “soft cleft palate”, “cleft uvula”, and “bifid uvula”. The search was completed on 9 February 2016. The primary search results were screened using the following exclusion criteria: (1) genes associated with oral clefts in mice with no evidence of association in humans; and (2) genes specifically associated with an absent uvula. The resulting list of genes associated with nonsyndromic and syndromic oral cleft phenotypes in humans was used for ontology analysis. Positive hits were further examined to identify oral cleft subphenotypes through review of either the Clinical Synopsis or the articles cited in OMIM (Supplementary Table 1 and Supplementary References). OMIM and the NCBI Gene database (http://www.ncbi.nlm.nih.gov/gene) were used to identify the corresponding protein and Entrez Gene ID of each gene.
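The screening step can be illustrated with a minimal Python sketch. It assumes that candidate hits have already been exported from OMIM or GATACA into a local list; the `GeneRecord` fields (`has_human_evidence`, `absent_uvula_only`) and the gene symbols in the example are hypothetical placeholders, not the databases' actual schema or data, and plain case-insensitive substring matching stands in for the databases' own search engines.

```python
from dataclasses import dataclass

# Search terms as listed in the text, lowercased for case-insensitive matching.
SEARCH_TERMS = (
    "cleft lip and palate", "cleft lip/palate", "cleft lip and/or palate",
    "cleft lip", "cleft of upper lip", "cleft palate", "cleft secondary palate",
    "incomplete cleft palate", "submucosal cleft palate", "submucous cleft palate",
    "soft palate cleft", "cleft of the soft palate", "soft cleft palate",
    "cleft uvula", "bifid uvula",
)


@dataclass(frozen=True)
class GeneRecord:
    """Hypothetical local representation of one exported database hit."""
    symbol: str
    phenotype_text: str
    has_human_evidence: bool  # False if the association is reported only in mice
    absent_uvula_only: bool   # True if the gene is linked specifically to an absent uvula


def matches_search(record: GeneRecord) -> bool:
    """Return True if any search term occurs in the phenotype text."""
    text = record.phenotype_text.lower()
    return any(term in text for term in SEARCH_TERMS)


def passes_exclusion(record: GeneRecord) -> bool:
    """Apply exclusion criteria (1) mouse-only evidence and (2) absent uvula."""
    return record.has_human_evidence and not record.absent_uvula_only


def screen(records: list[GeneRecord]) -> list[str]:
    """Return sorted, de-duplicated symbols that match a term and are not excluded."""
    return sorted({r.symbol for r in records
                   if matches_search(r) and passes_exclusion(r)})


if __name__ == "__main__":
    # Hypothetical records for illustration only.
    records = [
        GeneRecord("GENE_A", "Cleft palate; micrognathia", True, False),
        GeneRecord("GENE_B", "Cleft lip", False, False),                   # mouse-only
        GeneRecord("GENE_C", "Bifid uvula or absent uvula", True, True),   # absent uvula
        GeneRecord("GENE_D", "Short stature", True, False),                # no term match
        GeneRecord("GENE_E", "Submucous cleft palate", True, False),
    ]
    print(screen(records))  # ['GENE_A', 'GENE_E']
```

Because matching is by substring, a broad term such as "cleft palate" also catches narrower phrases such as "submucous cleft palate"; the narrower terms matter only for sources that index phrases exactly.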

Gene ontology analysis

To better understand the genetic contributions underlying oral clefts, the genes associated with oral clefts were further analyzed by biological process, molecular function, and gene family using the Protein ANalysis THrough Evolutionary Relationships (PANTHER) database (http://pantherdb.org).12 Briefly, Entrez Gene IDs were uploaded to identify the unique, annotated genes to include in the ontology analysis. The resulting gene lists were evaluated using tests for enrichment, which identify functional classes whose genes have values that are non-randomly selected from a genome-wide distribution of values.12 Statistically significant enrichment of the data set in a given class was determined using binomial tests with Bonferroni correction for multiple testing, as described previously.13 Only classes showing statistically significant (P < 0.05) enrichment were used for gene family analysis. Putative chemical–gene–disease interactions were identified using the Comparative Toxicogenomics Database (CTD) (http://ctdbase.org).14 For the CTD analysis, nominal P-values were adjusted for the false discovery rate as described by Benjamini and Yekutieli.15 Because the CTD contains many classes with similar constituents, the gene counts of classes that were a complete subset of another class were discarded.
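The binomial overrepresentation test with Bonferroni correction can be sketched in a few lines of Python. This is a simplified illustration, not PANTHER's implementation: it assumes a one-sided (overrepresentation only) test in which the expected proportion for each class is that class's share of the reference genome, and the class names and gene counts in the usage example are hypothetical.

```python
import math


def binomial_pmf(i: int, n: int, p: float) -> float:
    """P(X = i) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    if p <= 0.0:
        return 1.0 if i == 0 else 0.0
    if p >= 1.0:
        return 1.0 if i == n else 0.0
    log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
               + i * math.log(p) + (n - i) * math.log1p(-p))
    return math.exp(log_pmf)


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """One-sided P(X >= k): the chance of observing at least k class members."""
    return min(1.0, sum(binomial_pmf(i, n, p) for i in range(k, n + 1)))


def overrepresentation(list_counts, list_size, reference_counts, reference_size):
    """Binomial test per class, Bonferroni-corrected over all classes tested.

    list_counts[c]      -- genes in the uploaded list annotated to class c
    reference_counts[c] -- genes in the reference genome annotated to class c
    Returns {class: (observed, expected, raw_p, bonferroni_p)}.
    """
    m = len(reference_counts)  # number of classes tested
    results = {}
    for cls, ref_count in reference_counts.items():
        p = ref_count / reference_size  # expected proportion under the null
        observed = list_counts.get(cls, 0)
        expected = list_size * p
        raw_p = binomial_upper_tail(observed, list_size, p)
        results[cls] = (observed, expected, raw_p, min(1.0, raw_p * m))
    return results


if __name__ == "__main__":
    # Hypothetical: 10 uploaded genes, 1,000-gene reference, two classes.
    res = overrepresentation({"class_A": 5, "class_B": 1}, 10,
                             {"class_A": 50, "class_B": 100}, 1000)
    for cls, (obs, exp, raw_p, adj_p) in res.items():
        print(f"{cls}: observed={obs}, expected={exp:.2f}, P={raw_p:.3g}, Bonferroni P={adj_p:.3g}")
```

In this toy example, class_A (5 genes observed where 0.5 are expected) remains significant after correction, whereas class_B's corrected P-value is capped at 1.0. Bonferroni simply multiplies each raw P-value by the number of classes tested, which is conservative when many overlapping ontology classes are evaluated.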

Results

Gene profiles differ depending on the oral cleft phenotype

Our search of OMIM and GATACA (see Materials and methods for the full list of search terms) identified more than 350 candidate genes with one or more syndromic and/or nonsyndromic oral cleft annotations (Supplementary Table 1). Because phenotypic classification of human genes often yields important insights into gene function,16 we classified the identified genes by their association with CL/P, CP only (CPO), incomplete CP, and submucous CP (Figure 1a).
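Counting the genes in each area of a Venn diagram such as Figure 1a amounts to grouping genes by the exact combination of phenotype sets to which they belong. A minimal Python sketch, using hypothetical placeholder gene identifiers rather than the study's gene lists:

```python
from collections import Counter


def venn_region_counts(gene_sets: dict[str, set[str]]) -> Counter:
    """Count genes in each exclusive Venn region.

    Each gene is keyed by the frozenset of phenotypes it is annotated with,
    so a gene shared by CL/P and CPO is counted only in the CL/P-and-CPO
    region, never also in the CL/P-only or CPO-only regions.
    """
    all_genes = set().union(*gene_sets.values())
    counts = Counter()
    for gene in all_genes:
        members = frozenset(name for name, genes in gene_sets.items() if gene in genes)
        counts[members] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical example; gene names are placeholders, not study data.
    example = {
        "CL/P": {"G1", "G2", "G3"},
        "CPO": {"G3", "G4"},
        "ICP": {"G4", "G5"},
        "SCP": {"G5"},
    }
    regions = venn_region_counts(example)
    for region, n in sorted(regions.items(), key=lambda kv: sorted(kv[0])):
        print(" & ".join(sorted(region)), n)
```

The sum of all region counts equals the number of distinct genes, which is a quick consistency check against the total reported in Supplementary Table 1.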

Figure 1

Gene profiles differ depending on cleft palate phenotype. (a) Venn diagram showing the overlap between human genes associated with cleft phenotypes. The number in each area is the gene count for that section. (b–d) Gene ontology analysis of genes associated with human cleft palate phenotypes according to molecular function (b), biological process (c), and chemicals (d). Plotted is −log10(P-value), with the significance threshold set at 1.3 (−log10 0.05 ≈ 1.3). CP, cleft palate; CL/P, cleft lip and/or palate; CPO, cleft palate only; ICP, incomplete cleft palate; SCP, submucous cleft palate; CL, cleft lip; CLO, cleft lip only.

To investigate whether gene profiles differ among oral cleft phenotypes, we first compared candidate genes in a gene ontology analysis using the PANTHER database (Figures 1b and 1c; Tables 1–3). Studies of expression patterns and phenotypes in mutant mice have shown that homeobox transcription factors have roles in patterning the upper and lower jaws.17,18 When genes were analyzed by molecular function, genes in the transcription factor category, especially those containing a homeobox transcription domain, were enriched in all oral cleft phenotypes (Figure 1b, Table 1, and family #1 in Table 3). Genes associated with signaling molecules (P=0.000035) and growth factors (P=0.0015) were significantly enriched in CL/P, and genes associated with the extracellular matrix were significantly enriched in incomplete CP (P=0.042) (Figure 1b and Table 1). When genes were analyzed by biological process, neurogenesis (P=0.00000076), ectoderm development (P=0.0000021), and segment specification (P=0.00066) were enriched only in CL/P (Figure 1c and Table 2). In submucous CP, muscle development (P=0.0021) and skeletal development (P=0.00099) were enriched (Figure 1c and Table 2). Developmental process and mesoderm development were significantly enriched in all oral cleft phenotypes (Figure 1c).
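Figure 1 plots each class on a −log10(P) scale against a threshold of 1.3, which corresponds to P = 0.05. The short Python sketch below applies that transformation to the P-values quoted in the preceding paragraph; the dictionary labels are merely a convenient container for those values.

```python
import math

# P-values quoted in the text for selected PANTHER classes.
REPORTED_P = {
    "signaling molecule (CL/P)": 0.000035,
    "growth factor (CL/P)": 0.0015,
    "extracellular matrix (incomplete CP)": 0.042,
    "neurogenesis (CL/P)": 0.00000076,
    "ectoderm development (CL/P)": 0.0000021,
    "segment specification (CL/P)": 0.00066,
    "muscle development (submucous CP)": 0.0021,
    "skeletal development (submucous CP)": 0.00099,
}

THRESHOLD = -math.log10(0.05)  # about 1.301, drawn as 1.3 in Figure 1


def neg_log10(p: float) -> float:
    """Transform a P-value to the -log10 scale used for plotting."""
    return -math.log10(p)


if __name__ == "__main__":
    for label, p in REPORTED_P.items():
        score = neg_log10(p)
        status = "above" if score > THRESHOLD else "below"
        print(f"{label}: -log10(P) = {score:.2f} ({status} threshold)")
```

On this scale, smaller P-values give taller bars, so every class quoted above clears the 1.3 line; the extracellular matrix class in incomplete CP (P=0.042, about 1.38) lies closest to it.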

Table 1 Classification of candidate genes associated with human oral cleft phenotypes according to molecular function
Table 2 Classification of candidate genes associated with human oral cleft phenotypes according to biological process
Table 3 Classification of candidate genes associated with human oral cleft phenotypes according to gene family

We next examined possible chemical–gene–disease interactions using the CTD to explore the mechanisms underlying environmentally influenced oral clefts. The enrichment distribution of chemicals also differed among cleft phenotypes (Figure 1d). Tretinoin (the carboxylic acid form of vitamin A), tetrachlorodibenzodioxin (also known as dioxin), and arsenic trioxide (an anticancer chemotherapy drug) were significantly enriched in all oral cleft phenotypes (Figure 1d). Valproic acid, a medication used primarily to treat epilepsy and bipolar disorder, was significantly enriched in CL/P (P=0.00000006144) and CPO (P=0.002719) but not in incomplete CP or submucous CP (Figure 1d). In addition, ethanol and phenytoin (an antiseizure medication) were both enriched in CL/P and incomplete CP, whereas vitamin A and dexamethasone (a corticosteroid) were both enriched in CPO and incomplete CP (Figure 1d). The herbicide nitrofen and reactive oxygen species were significantly enriched in incomplete CP, whereas ochratoxin A, a mycotoxin produced by Aspergillus ochraceus, was enriched specifically in submucous CP (Figure 1d).
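Two CTD post-processing steps described in Materials and methods, the Benjamini–Yekutieli false discovery rate adjustment and the removal of classes that are complete subsets of another class, can be sketched in Python as follows. This is an illustrative reimplementation, not CTD's own code. Treating two classes with identical gene sets as not being "complete subsets" of each other (so both are kept) is an assumption, and the P-values and class names in the usage example are hypothetical.

```python
def benjamini_yekutieli(pvalues: list[float]) -> list[float]:
    """Benjamini-Yekutieli adjusted P-values, returned in the input order.

    BY scales the Benjamini-Hochberg step-up factor m/rank by
    c(m) = 1 + 1/2 + ... + 1/m, which keeps FDR control under arbitrary
    dependence between tests.
    """
    m = len(pvalues)
    if m == 0:
        return []
    c_m = sum(1.0 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda idx: pvalues[idx])  # ascending P-values
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        value = min(1.0, pvalues[idx] * m * c_m / rank)
        running_min = min(running_min, value)
        adjusted[idx] = running_min
    return adjusted


def drop_subset_classes(classes: dict[str, set[str]]) -> dict[str, set[str]]:
    """Keep only classes whose gene set is not a strict subset of another class."""
    return {
        name: genes
        for name, genes in classes.items()
        if not any(genes < other
                   for other_name, other in classes.items() if other_name != name)
    }


if __name__ == "__main__":
    # Hypothetical nominal P-values for three chemical classes.
    print(benjamini_yekutieli([0.04, 0.01, 0.30]))
    # Hypothetical classes: class_B is a strict subset of class_A and is dropped.
    print(sorted(drop_subset_classes({
        "class_A": {"G1", "G2", "G3"},
        "class_B": {"G1", "G2"},
        "class_C": {"G4"},
    })))
```

BY is more conservative than the plain Benjamini–Hochberg procedure because of the extra c(m) factor, which is a reasonable choice for chemical classes whose gene sets overlap heavily and therefore yield correlated tests.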

We also analyzed genes by gene family. Gene products involved in the TGF-β signaling pathway (family #4 in Table 3) were enriched in CPO (P=0.00024) and incomplete CP (P=0.00019), whereas genes of the fibroblast growth factor (FGF) family were enriched only in CL/P (P=0.0000032) (family #5 in Table 3). In addition, the T-box protein, collagen-α chain protein, and TGF-β families were all associated with CPO and incomplete CP (families #2–4 in Table 3).

Discussion

Palatogenesis is a complex process involving many diverse genes. Oral cleft phenotypes develop when this process is disrupted, for example by gene dysfunction. Oral cleft phenotypes vary considerably, however, and this variation likely reflects the involvement of different genes and/or changes in the functional contributions of the same genes. To better understand the genetic contributions underlying different oral cleft phenotypes, these culprit genes must be identified and characterized. The empirical recurrence risks for CP and CL/P are known to be independent, and the two phenotypes differ in sex ratio and prevalence.3 Consistent with this, our ontology analysis found different gene profiles for CP and CL/P, suggesting different underlying genetic etiologies. When genes were analyzed by molecular function, biological process, chemical–gene–disease interactions, and gene family, we found distinct genetic profiles for different oral cleft phenotypes, including CL/P, CP, incomplete CP, and submucous CP. These results support earlier epidemiological studies suggesting that different genetic etiologies underlie different oral cleft phenotypes, and they further demonstrate the usefulness of ontological candidate gene analysis for understanding gene function in palatogenesis.

Using ontology analysis, we found that the T-box protein family, the collagen-α chain protein family, and the TGF-β family were associated with CPO and incomplete CP. Consistent with these findings, TGF-β has been reported to regulate collagen synthesis and degradation, thereby affecting the amount of collagen present in the mesenchyme of the embryonic palate.19 The T-box gene TBX1 is the major candidate gene for DiGeorge syndrome (OMIM #188400) and may be responsible for several phenotypes, including cleft palate, while mutations in TBX22 cause a form of X-linked cleft palate (OMIM #303400). Similarly, mutations in the collagen-α chain genes COL2A1, COL9A2, COL11A1, and COL11A2 have been associated with different forms of Stickler syndrome (OMIM #108300, #614284, #604841, and #184840, respectively), a clinically variable condition that includes cleft palate. Because disruption of both T-box proteins and collagen-α chain proteins contributes to CPO and incomplete CP in humans, and because Tbx1 knockout mice exhibit different CP phenotypes, including incomplete CP and submucous CP,20 further investigation of whether deletion of Tbx1 or Tbx22 affects the expression of collagen-α chain genes in mouse palatal shelves is warranted.

In summary, we identified a pool of candidate genes associated with different oral cleft phenotypes, and our gene ontology analysis revealed that the genes associated with each phenotype show different functional profiles. Some of the candidate genes identified may be involved in tongue or bone anomalies and induce oral clefts during palatogenesis as a secondary defect. In addition, some polymorphisms identified in the listed genes may not be disease-causing per se but may instead be benign sequence variants in linkage disequilibrium with pathogenic variants. Beyond gene mutations, epigenetic changes and microRNA regulation may also alter gene expression during palatogenesis. Nevertheless, the gene ontology analysis indicated distinct genetic profiles for each oral cleft phenotype, suggesting differences in the underlying genetic etiologies of oral clefts. Further analysis of these candidate genes and their products may provide an opportunity to discover new disease-causing genes implicated in palatogenesis.