Introduction

Orofacial clefts (OFCs) comprise any cleft (that is, a break or gap) in the orofacial structures. Most research on OFCs has focused on typical cases such as cleft lip (CL), cleft palate (CP) and cleft lip and palate (CLP).1 The prevalence of CLP is 6.64 per 10 000 and that of CL is 3.28 per 10 000,2 whereas that of CP is 4.50 per 10 000 live births.3 Most OFCs occur as isolated defects, although they can also be associated with other anomalies or occur as part of a syndrome.4

The complex etiology of OFCs involves chromosome rearrangements, gene mutations and environmental factors.5, 6 It has been suggested that OFCs are caused by variation in more than one gene, because lip and palate formation involves several processes, including cell proliferation, differentiation, adhesion and apoptosis.7

A group of genes including IRF6, FOXE1, GLI2, MSX2, SKI, SATB2, MSX1 and members of the FGF signaling pathway has been shown to contribute to OFC etiology. Mutations in IRF6 have been detected in 12% of OFC cases;8 those in FOXE1, GLI2, MSX2, SKI, SATB2 and SPRY2 together account for 6%;9 those in MSX1 for 2%;10, 11 and those in FGF signaling genes, mainly FGFR1 and FGF8, for 3% of cases.12 These genes represent only a small proportion of the known genetic factors involved in OFC development.7 Despite efforts to understand OFC etiology, the molecular mechanisms underlying cleft development have not been fully characterized.

Studies using array-based techniques have uncovered numerous large-scale copy number variations (CNVs), that is, deletions and duplications, that contribute substantially to human genomic variation.13, 14 In addition, CNVs play a role in genetic defects and diseases by modifying gene expression, disrupting gene sequences or altering gene dosage.15, 16, 17 CNV screening has proven to be a powerful strategy for identifying candidate genes and/or chromosome regions involved in various disorders, including OFC.18, 19, 20

To investigate the genetic aspects of OFC, the aims of this study were to screen IRF6, FOXE1, GLI2, MSX2, SKI, SATB2, MSX1, FGF8 and FGFR1 by Sanger sequencing and to investigate the role of CNVs by array genomic hybridization (aGH) in a clinically well-characterized group of patients with OFC.

Materials and methods

All patients, or the parents of patients who were children, provided informed consent, as required and approved by the Research Ethics Committee of our institution (#714/2008).

Patients were ascertained at the Clinical Hospital, University of Campinas, and at the Faculty of Medicine, Federal University of Alagoas. The study population comprised 23 unrelated individuals grouped according to phenotype (isolated or associated with other anomalies) and familial recurrence (Table 1). The group with OFCs associated with other congenital defects included seven patients with CLP, five with CL and three with CP. In the isolated OFC group, three patients had CLP, four CL and one CP. Four familial cases of CLP were identified: patient 7 has an aunt with isolated CL, patient 16 has an uncle with isolated CLP, the mother of patient 18 has CLP and the mother of patient 23 has isolated CP.

Table 1 Clinical aspects of patients

All patients were evaluated by clinical geneticists using the same clinical protocol, classified according to the International Perinatal Database of Typical Oral Clefts (2011) and examined by echocardiography. In addition, GTG-banded karyotypes (600-band level) were obtained for all patients. The karyotypes were normal in 21 cases; patient 11 had a 46,XX,t(4;5)(p10;p10)pat[20] karyotype and patient 14 a chromosomal constitution of 46,XY,ins(11;?)(p13;?)[20].

Mutation screening of the entire coding region and exon–intron boundaries was performed for the following candidate genes: IRF6, FOXE1, GLI2, MSX2, SKI, SATB2, MSX1 and FGF8. For FGFR1, only the three mutations previously described by Riley et al.12 were analyzed. Sequencing was performed on an Applied Biosystems 3500xL Genetic Analyzer (Applied Biosystems, Foster City, CA, USA) using a BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems). To predict the effects of the identified variants, the Grantham score and the Panther, PolyPhen, SIFT and SNP&GO prediction tools were applied. In addition, 100 Brazilian control individuals with no family history of OFC over three generations were sequenced for the detected variants.

The CNV pattern of the patients was determined by aGH using the Affymetrix Genome-Wide Human single-nucleotide polymorphism (SNP) Array 6.0 (Affymetrix, Santa Clara, CA, USA) according to the manufacturer’s instructions. Trio analysis (patient and both parents) was possible for patients 2, 4, 8, 9, 12, 14, 17, 19, 21, 22 and 23, whereas only the patient and mother could be analyzed for individuals 3, 5, 6, 10, 11 and 20. In familial cases, the patient and the affected relative were analyzed.

Data analysis was performed using Genotyping Console v. 3.0.2 (HMM) software (Affymetrix). Comparisons followed three strategies: patients vs 20 Brazilian control individuals with no OFC over three generations; patients vs 50 Brazilian control individuals; and patients vs the HapMap control data set. For CNV screening, a minimum size of 300 kb and a minimum of 25 markers for deletions and 50 markers for duplications were used. CNVs smaller than 300 kb were also inspected individually for genes known to be related to OFCs or new candidates that could be related.
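
The size and marker criteria above amount to a simple filter over the CNV calls. The Python sketch below is a hypothetical illustration, not the Genotyping Console workflow: it assumes the thresholds are minimum values, uses invented field names and example calls, and separates calls that pass the screen from those under 300 kb that would be reviewed by hand.

```python
from dataclasses import dataclass

# Screening thresholds from the text, read here as minimum values (assumption).
MIN_SIZE_BP = 300_000
MIN_MARKERS = {"deletion": 25, "duplication": 50}


@dataclass
class CNVCall:
    """Hypothetical CNV record; field names are illustrative, not software output columns."""
    chrom: str
    start: int    # first base (hg18)
    end: int      # last base (hg18)
    kind: str     # "deletion" or "duplication"
    markers: int  # number of array markers supporting the call

    @property
    def size(self) -> int:
        # 1-based inclusive coordinates assumed
        return self.end - self.start + 1


def passes_screen(call: CNVCall) -> bool:
    """True if the call meets both the size and the type-specific marker thresholds."""
    return call.size >= MIN_SIZE_BP and call.markers >= MIN_MARKERS[call.kind]


def split_calls(calls):
    """Return (calls passing the screen, calls under 300 kb kept for manual gene review).

    Calls of 300 kb or more that fail only the marker threshold fall in neither list.
    """
    screened = [c for c in calls if passes_screen(c)]
    small = [c for c in calls if c.size < MIN_SIZE_BP]
    return screened, small


# Invented example calls (coordinates and marker counts are not study data).
calls = [
    CNVCall("chr2", 10_000_001, 10_500_000, "duplication", 320),  # 500 kb, enough markers
    CNVCall("chr5", 20_000_001, 20_250_000, "deletion", 140),     # 250 kb, below size cut-off
    CNVCall("chr9", 30_000_001, 30_400_000, "duplication", 40),   # 400 kb, too few markers
]
screened, small = split_calls(calls)
print([c.chrom for c in screened], [c.chrom for c in small])
```

Keeping the marker thresholds in a dictionary keyed by CNV type makes the asymmetry between deletions and duplications explicit.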

Results

Sequence screening of the candidate genes identified several SNPs previously reported in a public database (http://www.ncbi.nlm.nih.gov/SNP), as well as three undocumented sequence variants. Patient 21 carried a variant in GLI2 (c.2341C>T, p.Leu761Phe) (Figure 1a), patient 8 a variant in MSX1 (c.329C>T, p.Ala32Val) (Figure 1b) and patient 6 a variant in FGF8 (c.765C>A, p.Glu236Lys) (Figure 1c). None of these variants was detected in the control group of 100 Brazilians. The five prediction tools disagreed about the effects of these changes. Panther and PolyPhen classified all three variants as tolerated and benign, respectively, whereas SIFT classified them as intolerant (score 0.00). The Grantham score and SNP&GO classified the GLI2 variant as conservative (score 22) and deleterious (reliability index 10), respectively, and the MSX1 and FGF8 variants as moderately conservative (scores 64 and 56, respectively) and neutral (reliability indices 1 and 9, respectively).
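
The Grantham labels quoted above follow from binning the Grantham distance between the reference and the substituted amino acid. The sketch below hard-codes the three distances reported in the text and applies a commonly used four-category binning (conservative, moderately conservative, moderately radical, radical); the cut-offs are an assumption and are not necessarily those applied by the tool the authors used.

```python
# Grantham distances for the three substitutions, as reported in the text.
GRANTHAM = {
    ("Leu", "Phe"): 22,  # GLI2 p.Leu761Phe
    ("Ala", "Val"): 64,  # MSX1 p.Ala32Val
    ("Glu", "Lys"): 56,  # FGF8 p.Glu236Lys
}


def grantham_distance(ref: str, alt: str) -> int:
    """Look up a distance; the Grantham matrix is symmetric, so try both orders."""
    if (ref, alt) in GRANTHAM:
        return GRANTHAM[(ref, alt)]
    return GRANTHAM[(alt, ref)]


def grantham_class(distance: int) -> str:
    """Bin a distance into four categories (assumed cut-offs: 50, 100, 150)."""
    if distance <= 50:
        return "conservative"
    if distance <= 100:
        return "moderately conservative"
    if distance <= 150:
        return "moderately radical"
    return "radical"


for (ref, alt), d in GRANTHAM.items():
    print(f"{ref}>{alt}: {d} ({grantham_class(d)})")
```

Because the Grantham score depends only on physicochemical differences between amino acids, it ignores sequence context, which is one reason it can disagree with conservation-based tools such as SIFT.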

Figure 1

(a) DNA sequence of GLI2 in patient 21; the heterozygous C>T variant is indicated by an arrow. (b) DNA sequence of MSX1 in patient 8; the heterozygous C>T variant is indicated by an arrow. (c) DNA sequence of FGF8 in patient 6; the heterozygous C>A variant is indicated by an arrow. A full color version of this figure is available at the Journal of Human Genetics journal online.

Using the size and marker parameters defined above, the three analytical approaches detected different sets of CNVs, which are summarized in Supplementary Table 1. CNVs detected by all three approaches were selected for gene identification and searched in the Database of Chromosomal Imbalance and Phenotype in Humans (DECIPHER) and the Database of Genomic Variants (DGV) (Table 2). DECIPHER cases were considered relevant if they carried no more than four CNVs in addition to the one overlapping the CNV found in our patient, or if OFC was part of their phenotype.
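
Selecting CNVs detected in all three analytical approaches can be read as an interval-overlap consensus across the three comparisons. The sketch below assumes that a call counts as replicated when a call of the same type overlaps it on the same chromosome in each of the other two comparisons; the data structure and the example intervals are hypothetical.

```python
from typing import List, Tuple

# (chromosome, start, end, type); half-open coordinates, type "del" or "dup".
Call = Tuple[str, int, int, str]


def overlaps(a: Call, b: Call) -> bool:
    """Same chromosome, same CNV type and overlapping intervals."""
    return a[0] == b[0] and a[3] == b[3] and a[1] < b[2] and b[1] < a[2]


def consensus(comparisons: List[List[Call]]) -> List[Call]:
    """Calls from the first comparison that are matched in every other comparison."""
    first, *others = comparisons
    return [
        call for call in first
        if all(any(overlaps(call, other) for other in comp) for comp in others)
    ]


# Hypothetical calls for one patient under the three reference sets.
vs_20_controls = [("chr4", 41_000_000, 41_500_000, "del"), ("chr6", 5_000_000, 5_400_000, "dup")]
vs_50_controls = [("chr4", 41_050_000, 41_480_000, "del")]
vs_hapmap = [("chr4", 40_990_000, 41_510_000, "del"), ("chr6", 5_100_000, 5_350_000, "dup")]

print(consensus([vs_20_controls, vs_50_controls, vs_hapmap]))
```

Any overlap is the loosest possible matching rule; a stricter pipeline might instead require a minimum reciprocal overlap, which would change which calls survive.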

Table 2 Clinical information (cleft type) and results of genetic analyses including CNV and mutational screening

CNVs smaller than 300 kb were also assessed; however, only two were considered relevant based on the genes involved. In patient 3, a 96-kb duplication involving FGFR1 was detected at chromosomal region 8p12 (nt 38 431 900–38 441 500 (hg18)). Patient 8 showed a 270-kb deletion at chromosomal region 1p36.11 (nt 23 903 625–24 173 440 (hg18)) encompassing eight genes, including TCEB3 (Figure 2). Both CNVs were confirmed by all three types of analysis.

Figure 2

(a) Genomic array profile of chromosome 1. The black circle highlights the deletion at band 1p36.11. (b) The region includes TCEB3 (black circle), as shown in the Ensembl Genome Browser. A full color version of this figure is available at the Journal of Human Genetics journal online.

Patient 14 had a karyotype of 46,XY,ins(11;?)(p13;?)[20]. The aGH analysis detected two duplications: a 17.09-Mb segment at chromosomal region 15q25–q26 (nt 81 869 248–98 962 477 (hg18)) and a 3.8-Mb segment at chromosomal region 8p23.1 (nt 8 129 435–11 934 586 (hg18)). The complete report of this patient has been published elsewhere;21 the present study highlights the 15q25–q26 region, which involves KIF7 (Figure 3). Patient 11, whose karyotype was 46,XX,t(4;5)(p10;p10), had no copy number alterations at the breakpoint regions, consistent with a balanced translocation.
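
The sizes quoted for these CNVs can be checked directly against their hg18 coordinates. The sketch below assumes 1-based inclusive coordinates (the one-base difference is immaterial at this scale) and recomputes the two duplications in patient 14 and the 1p36.11 deletion in patient 8.

```python
def size_bp(start: int, end: int) -> int:
    """Interval length, assuming 1-based inclusive coordinates."""
    return end - start + 1


def human_size(bp: int) -> str:
    """Format a length in Mb (two decimals) or kb (no decimals)."""
    if bp >= 1_000_000:
        return f"{bp / 1_000_000:.2f} Mb"
    return f"{bp / 1_000:.0f} kb"


# hg18 coordinates as given in the Results.
intervals = {
    "patient 14, 15q25-q26 duplication": (81_869_248, 98_962_477),
    "patient 14, 8p23.1 duplication": (8_129_435, 11_934_586),
    "patient 8, 1p36.11 deletion": (23_903_625, 24_173_440),
}

for label, (start, end) in intervals.items():
    print(f"{label}: {human_size(size_bp(start, end))}")
```

This reproduces 17.09 Mb, 3.81 Mb (reported as 3.8 Mb) and 270 kb.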

Figure 3

(a) Genomic array profile of chromosome 15. The black circle highlights the duplication at band 15q25. (b) The region includes KIF7 (black circle), as shown in the Ensembl Genome Browser. A full color version of this figure is available at the Journal of Human Genetics journal online.

Discussion

Our strategy to investigate the genetic factors involved in OFCs combined standard clinical evaluation, mutational screening and aGH analysis. Because the main aim of this research was to screen for sequence and copy number variants involved in orofacial clefts in general, the sample deliberately included different cleft types (CLP, CP and CL).

Undocumented variants in GLI2, MSX1 and FGF8 were detected in three patients but were observed neither in their parents nor in the 200 control chromosomes, indicating that these are rare variants. The in silico predictions of their effects at the protein level were discordant. Functional studies are therefore necessary to elucidate how these variants affect gene expression and protein production during development and to establish their role in OFC etiology.

GLI2 encodes a zinc-finger protein that is required for gene expression during embryogenesis and is involved in sonic hedgehog signaling.22, 23 Mutations in GLI2, together with those in FOXE1, MSX2, SKI and SATB2, have been detected in 6% of OFC cases.9 Mutations in this gene have also been reported in patients with OFC, holoprosencephaly and facial anomalies.24, 25 The GLI2 variant was detected in patient 21, who was classified as an isolated CL case.

MSX1 encodes a member of the muscle segment homeobox gene family that controls gene expression during development of the palatal shelves.26 This gene is also involved in the growth and differentiation of specific epithelial and mesenchymal tissues. Animal models with disruptive mutations in MSX1 have been shown to develop cleft palate.27 Mutations in MSX1 account for 2% of isolated OFC cases in patients of different ethnicities.10 Patient 8, who carried the MSX1 variant, presented CL associated with other anomalies and also carried a CNV (discussed below) that might have contributed to OFC development.

Mammalian fibroblast growth factors (FGF1–FGF10 and FGF16–FGF23) control a wide spectrum of biological functions during development and adult life.28 FGF8 is expressed during gastrulation and during development of the brain, heart, limbs and craniofacial structures, including the labial and palatal shelves.29, 30, 31, 32 Sequence screening of 12 FGF signaling pathway genes (FGFR1, FGFR2, FGFR3, FGF2, FGF3, FGF4, FGF7, FGF8, FGF9, FGF10, FGF18 and NUDT6) in patients with isolated OFC detected nine potentially pathogenic mutations, including one loss-of-function mutation in FGF8; these genes might account for 3–5% of isolated CLP cases.12 Patient 6 presented CLP together with delayed psychomotor development, a combination consistent with FGF8 expression during brain development.

Mutations in GLI2, MSX1 and FGF8 have been reported in single cases of OFC and are therefore considered private mutations.7 Although the variants found in the present sample may likewise be private, our results reinforce the importance of analyzing these genes in patients with OFC. Mutations in genes other than those described by Vieira,7 which were not investigated in the present study, might be present in other patients from this Brazilian population. In future studies, exome sequencing might identify further genes associated with this disorder.

CNVs have received considerable attention in recent years because improvements in the resolution of the aGH technique have made them easier to identify in the human genome. CNVs occurring at high frequencies in human populations are considered a potential source of genetic diversity13, 14 and might also be relevant to the pathogenesis of complex traits.33 The following features of a CNV should therefore be analyzed: its frequency in the normal population; its mode of origin; the genes it involves, including their expression patterns and correlated phenotypes; and any abnormal phenotypes previously reported in association with that CNV.34, 35, 36, 37, 38 When possible, each CNV was checked to establish whether it was de novo or inherited from an affected or an unaffected parent. At present, it is not certain whether a de novo CNV can explain a patient’s phenotype; the penetrance and the extent of the phenotypic spectrum of the imbalance should also be considered.39
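
The inheritance check described above, together with the distinction between trios and patient–mother pairs made in Materials and methods, can be sketched as follows. This is a hypothetical illustration, not the study's pipeline: the 50% reciprocal-overlap criterion, the function names and the example intervals are all assumptions.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open coordinates
# Both parents must be listed: label -> (CNV calls, or None if not analyzed; affected?)
Parents = Dict[str, Tuple[Optional[List[Interval]], bool]]


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Overlap length as a fraction of each interval; returns the smaller fraction."""
    if a[0] != b[0]:
        return 0.0
    inter = min(a[2], b[2]) - max(a[1], b[1])
    if inter <= 0:
        return 0.0
    return min(inter / (a[2] - a[1]), inter / (b[2] - b[1]))


def classify_origin(cnv: Interval, parents: Parents, min_overlap: float = 0.5) -> str:
    """Classify a patient CNV as inherited, de novo or undetermined."""
    missing = False
    for label, (calls, affected) in parents.items():
        if calls is None:  # this parent's DNA was not available
            missing = True
            continue
        if any(reciprocal_overlap(cnv, c) >= min_overlap for c in calls):
            status = "affected" if affected else "unaffected"
            return f"inherited from {status} {label}"
    # De novo can only be called when both parents were analyzed.
    return "undetermined (a parent was not analyzed)" if missing else "de novo"


cnv = ("chr7", 100_000_000, 100_500_000)  # hypothetical patient CNV
trio_carrier = {"mother": ([("chr7", 100_010_000, 100_490_000)], False), "father": ([], False)}
trio_negative = {"mother": ([], False), "father": ([], False)}
mother_only = {"mother": ([], False), "father": (None, False)}

for parents in (trio_carrier, trio_negative, mother_only):
    print(classify_origin(cnv, parents))
```

Returning "undetermined" rather than "de novo" for patient–mother pairs reflects that a paternal origin cannot be excluded without paternal DNA.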

Because no Brazilian control population database is available, CNV analysis was conducted against three reference sets. Applying the same minimum size (300 kb) and marker criteria, the three comparisons generated different results, particularly the comparison with the HapMap control data set, which mainly captures genetic differences among human populations (Supplementary Table 1). Furthermore, the detected CNVs were not recurrent among patients and involved different regions of the genome, consistent with the genetic heterogeneity expected given the composition of the sample.

Nevertheless, CNVs shared by all three analyses of the same patient were detected in every patient except patients 3, 4, 12, 20 and 22 (Table 2). Most of these have been reported in the Database of Genomic Variants, although population background should also be taken into account. CNV inheritance was inferred when parental DNA was available, as well as in familial cases. A deletion at chromosomal region 17q21.31 was detected in patient 7 and in his aunt, who was affected by isolated CLP, suggesting a role for this region, which encompasses ARL17, LRRC37A and KANSL1. Haploinsufficiency of KANSL1 has been suggested to cause the 17q21.31 microdeletion syndrome (Online Mendelian Inheritance in Man (OMIM): 612452), a multisystem disorder characterized by intellectual disability, hypotonia and distinctive facial features.40, 41 Patient 7 showed a different clinical picture, which may be related to the size of the deletion.

The most plausible explanation for most deleterious phenotypes caused by copy number changes is gene dosage, whereby the gain or loss of a gene copy alters its expression level.42 CNVs typically affect multiple genes; thus, the central question is to estimate the contribution of each gene within a CNV to a particular phenotype.33 Based on the observed expression pattern and clinical phenotype, we suggest that NIPA1 is the most relevant gene within the 15q11.2 deletion detected in patient 17. Mutations in this gene cause hereditary spastic paraplegia type 6, a neurodegenerative disease, and deletions of this gene have been associated with a higher susceptibility to amyotrophic lateral sclerosis.43, 44 NIPA1 is an inhibitor of BMP signaling,45 and BMP genes play a major role in lip and palate development.46

Two CNVs smaller than 300 kb were considered relevant based on the genes involved: a duplication in patient 3 involving FGFR1 (chromosomal region 8p12) and a deletion in patient 8 involving TCEB3 (chromosomal region 1p36.11). FGFR1 plays a role in palatogenesis,47 has been linked to OFC development,48 and mutations in this gene have been implicated in isolated CLP.12 TCEB3 (transcription elongation factor B (SIII), polypeptide 3) encodes Elongin A, a subunit of the transcription factor B (SIII) complex, which is composed of Elongins A (or A2), B and C. SIII stimulates the elongation activity of RNA polymerase II by suppressing transient pausing of the polymerase at multiple sites within transcription units. Elongin A is the transcriptionally active component of the SIII complex, whereas Elongins B and C are regulatory subunits.49 In another study by our group using SNP analysis, this gene was associated with isolated CLP (TK Araujo, 2014, unpublished data). In addition, a 1.88-Mb deletion involving TCEB3 was reported in DECIPHER case 4661, who presented CP associated with other anomalies.

A 17.09-Mb duplication of chromosomal region 15q25–q26 was detected in patient 14. The 15q26 region has previously been reported to be strongly associated with isolated CLP in Indian families.50 In addition, this region encompasses KIF7 (kinesin family member 7), which encodes a protein involved in the sonic hedgehog (SHH) signaling pathway through regulation of the GLI transcription factors.51, 52, 53 SHH signaling plays a role in craniofacial development and lip fusion.54 Mutations in KIF7 have been associated with acrocallosal syndrome53, 55, 56 and with hydrolethalus syndrome, which share features such as polydactyly, brain abnormalities and CP.53 Our group has also associated this gene with isolated CLP using SNP analysis (TK Araujo, 2014, unpublished data). Taken together, these data suggest that KIF7 may be involved in the submucous CP observed in patient 14.

Considering CNV-related phenotypes, patient 11 harbored a 334-kb deletion overlapping the 22q11.2 deletion syndrome region (OMIM: 611867), which has also been related to OFC, particularly CP. This region encompasses PRODH, which is widely expressed in the brain and encodes proline dehydrogenase (proline oxidase). This enzyme catalyzes the first step in the conversion of proline to glutamate, the main excitatory neurotransmitter of the brain.57 PRODH has been associated with the schizophrenia phenotype in the 22q11.2 deletion syndrome.58, 59, 60

Patient 19, who was classified as an isolated CL case, harbored a 564-kb deletion overlapping the 1p36 microdeletion syndrome (OMIM: 607872) region. Monosomy 1p36 is a common terminal deletion syndrome with an estimated incidence of 1 in 5000 births;61 its main clinical features include developmental delay with hypotonia (100%), seizures (up to 72%), cardiac defects (40%) and CLP (20–40%). Deletion size varies widely, from 1.5 Mb to >10 Mb; 40% of the breakpoints occur 3.0–5.0 Mb from the telomere, and 70% of cases involve true terminal deletions. A few (7%) 1p36 deletions are interstitial, retaining the 1p subtelomeric region. Breakpoints most commonly occur within 1p36.13–1p36.33, a region that a meta-analysis has also associated with isolated CLP;62 this is of interest because patient 19 presents isolated CL.

Although CNVs have been studied extensively, their relatively recent discovery still makes their classification and interpretation difficult. For these reasons, it is necessary to understand how changes in the copy number of individual genes or large chromosomal regions affect diseases and malformations. This study, which includes a detailed clinical description of each patient, contributes to elucidating the role of CNVs in OFC pathogenesis. Indeed, both major and minor clinical features have been considered relevant in clinical investigation.63

Given the complex etiology of OFC, it has been proposed that variation in more than one gene causes this phenotype.7 In addition, the cumulative effect of copy number changes in several genes, each with little or no individual effect, could be responsible for the observed abnormal phenotypes.41 CNV screening identified new candidate genes that might influence OFC pathogenesis and warrant further analysis. The results of the present study suggest that CNVs, together with sequence variants, may play a role in the etiology of this complex congenital defect.