Main

Nonsyndromic cleft lip with or without cleft palate (CL/P) is one of the most common birth defects with the birth prevalence being highest in Asian (2/1000 live births), intermediate in European (1/1000 live births), and lowest in African populations (0.4/1000 live births). CL/P is a complex disease with both genetic and environmental risk factors.1 Mutations in the interferon regulatory factor 6 gene (IRF6) located on chromosome 1q32.3-q41 are responsible for a majority of van der Woude syndrome (VWS) cases. VWS is an autosomal dominant syndrome that includes an oral cleft and pits on the lower lip in approximately 85% of cases. Fifteen percent of VWS cases have an isolated cleft with no lip pits and are clinically indistinguishable from nonsyndromic CL/P.2 The GATA124F08 marker located 1 Mb from IRF6 has shown a significant heterogeneity LOD (1.15) with α = 0.45,3 and an anonymous marker (D1S205) in IRF6 has yielded significant evidence of linkage and linkage disequilibrium (LD) in 106 nonsyndromic CL/P trios.4 Recently, strong evidence of overtransmission of the G allele at the IRF6 c.820G>A marker (rs2235371) was found in CL/P case-parent trios from Asia and South America,5 and a significantly higher frequency of the GG genotype was observed among 192 Thai CL/P cases compared with controls (odds ratio = 1.67).6 This variant creates a valine→isoleucine substitution at amino acid 274 (p.V274I) in the protein-binding domain [the Smad-interferon regulatory factor binding domain (SMIR)] of IRF6, but the A allele is rare in white populations. 
Analysis of seven other single nucleotide polymorphisms (SNPs) in and around IRF6 has shown several distinct haplotypes demonstrating altered transmission in Iowan and Danish trios.5 Confirmatory studies using Italian, European-American, and Belgian CL/P families, respectively, have strengthened the evidence that IRF6 is important in the etiology of nonsyndromic oral clefts.7–9 Risk of CL/P associated with particular variants in IRF6 may differ among ethnic groups, however. Here, we evaluated 13 SNPs in and around IRF6 to test for association with nonsyndromic CL/P in 77 European-American trios (including five incomplete trios), 146 (three incomplete trios) and 34 (11 incomplete trios) Han Chinese trios from Taiwan and Singapore, respectively, plus 40 (two incomplete trios) Korean CL/P trios. Expression of IRF6 in human craniofacial structures was also determined using publicly available data.

MATERIALS AND METHODS

Subjects

Infants born with isolated, nonsyndromic CL/P and their parents were ascertained through treatment centers in Maryland (Johns Hopkins University and University of Maryland), Taiwan (Chang Gung Memorial Hospital), Singapore (KK Women's and Children's Hospital), and Korea (Yonsei University Medical Center), respectively, under a protocol approved by the institutional review board at each institution as part of an international study of oral clefts. After informed consent was obtained from parents, ethnicity and other data were obtained through structured interviews.10 Both cases and parents provided blood samples.

SNP selection and genotyping

SNP markers in and around IRF6 were identified from the literature and the dbSNP database (http://www.ncbi.nlm.nih.gov/projects/SNP/) using a NorthStar Searchlet program (Genetic Software Innovations, Inc., Cicero, NY). A final set of 13 SNPs was chosen based on the criteria of high “design scores” as provided by Illumina, Inc. (San Diego, CA), heterozygosity >0.1, and HapMap validation (www.hapmap.org/index.html.en). The final marker set included rs2235371 and rs2013162, which previously showed significant associations in Asians and Europeans, respectively (Fig. 1). Primers for each SNP were synthesized using the Oligator technology by Illumina, Inc. as part of an oligo pool for the BeadLab 1000 system. Genomic DNA samples were prepared from peripheral blood lymphocytes by the protein precipitation method described previously11 and genotyped for SNP markers using the GoldenGate chemistry on Sentrix Array Matrices (Illumina, Inc.) at the Johns Hopkins SNP Center.12 The average distance between neighboring markers was 1.53 kb (based on Build 36.1 of dbSNP). Two duplicates and four CEPH control DNA samples were included to evaluate genotyping consistency.

Fig. 1

Significance of individual single nucleotide polymorphisms (SNPs) and sliding window haplotypes for the interferon regulatory factor 6 (IRF6) gene in four groups of nonsyndromic cleft lip with or without cleft palate trios. The −log10 (empirical P value) for the overall χ2 test for an individual SNP (vertical line) and for sliding windows of haplotypes of two to five SNPs (horizontal lines) is presented. Nominal significance levels are denoted by gray lines (5%, 1%, 0.1%, 0.01%, and 0.001%).

Statistical analysis

Within each population, the minor allele frequency (MAF), heterozygosity, and a χ2 test for Hardy-Weinberg equilibrium (HWE) at each SNP were computed among parents. Pairwise LD was computed as both D′ and r2 for all SNPs using the Haploview program (http://www.broad.mit.edu/mpg/haploview/index.php/).13,14 Individual SNPs and sliding windows of haplotypes consisting of two, three, four, and five SNPs were tested using the family-based association test program (http://www.biostat.harvard.edu/fbat/default.html).15 Empirical P values for observed versus expected transmission were obtained using the permutation option and these are presented as −log10 P values.16 A Web interface (SNPSpD) was used to perform the spectral decomposition correction for multiple comparisons (http://genepi.qimr.edu.au/general/daleN/SNPSpD/).17 Genotypic odds ratios (GORs) for heterozygotes and homozygotes were calculated separately for individual SNPs as well as for diplotypes consisting of two or three SNP haplotypes yielding statistical significance. GORs were obtained from conditional logistic regression models for matched sets consisting of the case and three “pseudosib” controls derived from the parental mating type using publicly available subroutines in the STATA software package (http://www-gene.cimr.cam.ac.uk/clayton/software/stata/).
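The matched sets used for the GORs can be illustrated with a minimal sketch. It assumes a single biallelic SNP with genotypes written as two-letter strings; the function names `possible_offspring` and `pseudosib_controls` and the example trio are hypothetical and are not taken from the STATA subroutines cited above. For any parental mating there are four equally likely offspring genotypes: the observed case is one of them, and the remaining three serve as pseudosib controls.

```python
from itertools import product


def possible_offspring(father: str, mother: str):
    """Return the four equally likely offspring genotypes for one mating.

    Genotypes are two-letter strings such as "AG". Alleles are sorted so
    that "GA" and "AG" count as the same unordered genotype.
    """
    return ["".join(sorted(p + m)) for p, m in product(father, mother)]


def pseudosib_controls(father: str, mother: str, case: str):
    """Drop one copy of the case genotype; the other three are pseudosibs."""
    pool = possible_offspring(father, mother)
    case = "".join(sorted(case))
    if case not in pool:
        raise ValueError("case genotype is not compatible with the parents")
    pool.remove(case)
    return pool


# Hypothetical trio: both parents heterozygous A/G, case homozygous G/G.
print(pseudosib_controls("AG", "AG", "GG"))  # ['AA', 'AG', 'AG']
```

Each case and its three pseudosibs would then form one stratum in a conditional logistic regression. Extending the idea to diplotypes requires phased parental haplotypes, which this sketch does not attempt.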

Gene expression analysis

Expression of IRF6 in human craniofacial structures relevant to normal palate and lip development was determined using data obtained from the Craniofacial and Oral Gene Expression Network (COGENE) consortium (http://hg.wustl.edu/COGENE/). Data from seven serial analysis of gene expression (SAGE) libraries were used to assess gene expression patterns in different human embryonic tissues (i.e., 26-day-old human embryonic tissue, 4-week anterior rhombomere, 4-week posterior rhombomere, 4-week frontonasal prominence, 5-week frontonasal prominence, 6-week mandible, and 8.5-week upper lip).18

RESULTS

Proband gender for the four groups of case-parent trios is shown in Appendix Table A1 (available online at www.geneticsinmedicine.org). Examining duplicated samples revealed a very high reproducibility for genotypes (99.98%). Minor allele frequencies for rs2235371 and rs3753517 were too low to be informative in the Maryland samples (MAF <0.005 for both), and only 60% of genotypes were called at rs2294408 in the Singapore and Korean samples. All remaining SNPs gave no evidence of deviating from HWE (data not shown). Among the 13 SNPs, five groups of markers (rs599021-rs861019, rs2073485-rs2235373, rs2235371-rs3753517, rs674433-rs595918, and rs2013162-rs2236907-rs2294408-rs2073487-rs1005287) showed virtually complete LD (D′ = 1 and r2 > 0.8) in all four populations, so markers within each block became redundant (see Appendix Table A2 available online at www.geneticsinmedicine.org). Consequently, one tagging marker was chosen from each of these five groups (rs599021, rs2235373, rs2235371, rs595918, and rs2013162) to represent haplotypes showing significant transmission distortion and estimate GORs.
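The two LD criteria used to define redundant markers (D′ = 1 and r2 > 0.8) measure different things, which the following sketch illustrates. It assumes phased haplotype counts for two biallelic SNPs are already known, whereas Haploview estimates haplotype frequencies from genotype data; the function name `ld_measures` and the example counts are hypothetical.

```python
def ld_measures(n_AB: int, n_Ab: int, n_aB: int, n_ab: int):
    """Return (D', r^2) for two biallelic SNPs from phased haplotype counts."""
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total  # frequency of allele A at the first SNP
    p_B = (n_AB + n_aB) / total  # frequency of allele B at the second SNP

    # D is the deviation of the AB haplotype frequency from independence.
    d = p_AB - p_A * p_B

    # D' rescales |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r^2 is the squared correlation between alleles at the two SNPs.
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return d_prime, r2


# Hypothetical counts in which one of the four haplotypes is never observed.
d_prime, r2 = ld_measures(60, 0, 10, 30)
print(round(d_prime, 3), round(r2, 3))
```

With these hypothetical counts D′ is 1 (up to floating-point rounding) because one haplotype is absent, yet r2 is only about 0.64 because the allele frequencies at the two SNPs differ. Requiring both D′ = 1 and r2 > 0.8 therefore selects markers that are close to interchangeable as tags.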

TDT analyses for individual markers and haplotypes

In Figure 1, only empirical P values <0.10 from the transmission disequilibrium test (TDT) are presented for haplotypes, whereas all empirical P values for the 13 individual SNPs in each of the four populations are presented. TDT results for individual SNPs showing significant evidence of linkage and LD among the four samples of CL/P trios are summarized in Table 1. Two SNPs, rs2073485 and rs2235373, which were in complete LD with one another (both D′ and r2 = 1), yielded highly significant P values for both single marker and haplotype analyses among the 146 Taiwanese CL/P trios (P = 2 × 10−6 and lowest P < 10−6, respectively). In the 34 Singaporean trios, seven SNPs and their haplotypes yielded nominal significance (lowest P = 0.014). Haplotypes consisting of three SNPs (rs2235371, rs674433, and rs595918) yielded nominal significance in the Maryland trios. In the Korean trios, most haplotypes, including rs2294408, were statistically significant; however, only nine families were informative for this marker. Experiment-wide significance thresholds required to keep the type I error rate at 5% for samples from Maryland, Taiwan, Singapore, and Korea are P < 0.010, 0.013, 0.017, and 0.013, respectively, based on the spectral decomposition correction. Therefore, statistical evidence observed among the Taiwanese, Singaporean, and Korean trios remained significant after correcting for multiple comparisons.
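For readers unfamiliar with the TDT, the classical biallelic form of the statistic can be sketched as follows. This is an illustration only: the analyses reported here used the family-based association test program with permutation-based empirical P values, whereas the sketch uses the asymptotic χ2 approximation with 1 degree of freedom. The function name `tdt` and the transmission counts are hypothetical.

```python
import math


def tdt(transmitted: int, untransmitted: int):
    """Classical biallelic TDT from heterozygous parents only.

    transmitted / untransmitted: number of times the allele of interest was
    or was not passed from a heterozygous parent to the affected child.
    Returns (chi-square statistic, asymptotic P value with 1 df).
    """
    informative = transmitted + untransmitted
    if informative == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (transmitted - untransmitted) ** 2 / informative
    # Survival function of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 30 transmissions versus 10 non-transmissions.
chi2, p = tdt(30, 10)
print(chi2, p, -math.log10(p))  # the last value is the scale used in Figure 1
```

With these hypothetical counts the statistic is 10 and the asymptotic P value is about 0.0016. Only heterozygous parents contribute, which is why a marker can be uninformative in most families even when the sample is large.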

Table 1 Marker information and TDT results for SNPs in IRF6 showing significant evidence of linkage and LD in four groups of CL/P trios

In Table 2, the most common haplotype (AATGA) across five SNPs (rs599021, rs2235373, rs2235371, rs595918, and rs2013162) showed significant undertransmission (P = 0.00051), whereas two haplotypes, (A/C)GC(A/G)C, were significantly overtransmitted among Taiwanese CL/P children. Interestingly, two-SNP haplotypes [e.g., AA for rs599021 and rs2235373 (P = 5 × 10−6) or GC composed of rs2235373 and rs2235371 (P = 9 × 10−6)] were more informative than the three- or four-SNP haplotypes. Here, all alleles are reported on the forward strand of the chromosome (NCBI build 36.1), although the gene is transcribed from the reverse strand. Allele designations need to be reversed when compared with published reports that used the transcription strand as the reference.
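The strand conversion described above amounts to replacing each allele with its complementary base. The sketch below is a minimal illustration under that assumption; the function name `flip_strand` is hypothetical, and the SNP order within a haplotype is left unchanged, so it applies only when both reports list the markers in the same order.

```python
# Map each base to its complement; other characters (e.g. "/") pass through.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def flip_strand(alleles: str) -> str:
    """Convert allele or haplotype labels to the opposite strand."""
    return alleles.upper().translate(COMPLEMENT)


print(flip_strand("A/C"))    # T/G
print(flip_strand("AATGA"))  # TTACT
```

For example, the A/C alleles reported here for rs2013162 on the forward strand correspond to T/G when the transcribed strand is used as the reference.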

Table 2 Markers and haplotypes showing significant evidence of linkage and LD in 79 Taiwanese CL/P case-parent trios

Genotypic and diplotypic odds ratios

As shown in Table 3, G/G and C/C homozygotes at rs2235373 and rs2013162 had a significantly increased risk of being CL/P cases [GOR = 4.94 and 3.78, 95% confidence interval (CI) = 2.43–10.04 and 1.93–7.41, respectively], whereas C/C, C/C, and A/A homozygotes at rs599021, rs2235371, and rs595918 were more likely to be CL/P cases compared with reference homozygotes among Taiwanese trios (GOR = 2.92, 2.75, and 3.20, respectively, and 95% CI: 1.15–7.42, 1.37–5.52, and 1.15–8.90, respectively), although the global P value for the conditional logistic model of rs595918 was not significant. The C/C genotype at rs2013162 increased the risk of being a CL/P case among Singaporean trios (GOR = 6.88, 95% CI: 1.17–40.34).

Table 3 GORs for heterozygotes and homozygotes for individual SNPs showing significant evidence of linkage and LD in four CL/P groups

To determine diplotype-specific risks in the Taiwanese trios, two-, three-, and four-marker models were tested as shown in Table 4. Diplotypes with frequency <4% were analyzed as a single group. The AG/CG diplotype for rs599021 and rs2235373 showed the most increased risk of being a CL/P case among all two-SNP diplotypes composed from the four SNPs identified in Table 3 (i.e., 5.95 times higher than AA/AA, the reference diplotype group; 95% CI: 2.53–13.99). Interestingly, the GOR for diplotypes that included A/C heterozygotes at rs599021 and homozygotes for the high-risk allele at other loci showed the greatest risk. For instance, the AGC/CGC diplotype showed a higher risk of being a CL/P case (GOR = 6.99, 95% CI: 2.70–18.06) than did the CGC/CGC diplotype (GOR = 3.70, 95% CI: 1.31–10.46). Using data from the COGENE consortium, the IRF6 gene was found to be expressed in the 4-week frontonasal prominence among seven SAGE libraries.

Table 4 GORs for heterozygotes and homozygotes for two-, three-, and four-SNP haplotypes consisting of SNPs increasing risk in 146 Taiwanese CL/P case-parent trios

DISCUSSION

Zucchero et al.5 showed strong evidence of overtransmission of the valine (V) allele at p.V274I (rs2235371) in IRF6; however, the estimated attributable risk (11.6%) and the estimated 3-fold increased recurrence risk among Filipino CL/P case-parent trios must be interpreted carefully because it was assumed that carrying this allele was directly causal and uncorrelated with other risk factors.6 This SNP is not highly polymorphic in Europeans, although Asians have allele frequencies around 66% for the G allele (p.274V).9,19 Thus, subsequent studies in European-derived populations focused on four SNPs (rs1319435, rs2013162, rs2235375, and rs2235543) with higher heterozygosity levels. Scapoli et al.7 detected overtransmission of the G and C alleles for markers rs2013162 and rs2235375 (P = 0.004 and P = 0.002, respectively) and all haplotypes carrying these common alleles among 219 Italian CL/P trios (the GTGA haplotype showed significant undertransmission, P = 0.0003). Blanton et al.8 detected overtransmission of the C allele at rs2013162 (P = 0.05), and all haplotypes including the A allele at this marker were significantly undertransmitted to cases (the lowest P = 0.002 for CAXT haplotype among 216 European-American families with CL/P). Ghassibé et al.9 observed transmission distortion of the GG and TG haplotypes (P = 0.004 and P = 0.036, respectively) for two markers (rs2013162 and rs2235543) and confirmed overtransmission of the G allele at rs2013162 (P = 0.01) in 195 Belgian families (this sample included some immigrant families from other populations). Alleles were designated (T/G) for rs2013162 based on the 3′ to 5′ orientation of this gene in some studies, whereas here these alleles are designated A/C based on their 5′ to 3′ orientation on the chromosome. Reported significance levels from these published studies did not include corrections for multiple comparisons.

Initially, we analyzed 103 European-American trios, 171 and 66 Han Chinese trios from Taiwan and Singapore, respectively, and 42 Korean trios with either CL/P or isolated cleft palate (results not shown). When stratified by type of cleft, we found greater statistical significance among CL/P groups, despite the smaller numbers, confirming the possibility of etiologic heterogeneity (e.g., P = 3.6 × 10−5 vs. P = 2 × 10−6 at rs2235373 alone among all 171 Taiwanese trios vs. the 146 CL/P trios).

Two SNPs (rs2073485 and rs2235373), located next to, but not in LD with, the V274I variant, yielded statistical significance for individual SNPs and haplotypes among Taiwanese trios, even after correcting for multiple comparisons. In particular, the G allele at rs2235373 significantly increased the risk of being a CL/P case, whereas the A allele was underrepresented among Taiwanese cases. Overtransmission of the C allele at rs2013162 (located in the fifth exon of IRF6) was not confirmed here among European-American trios (from Maryland), in contrast to three previously reported studies of European-derived populations. In our Han Chinese populations (Taiwan and Singapore), the C allele at this synonymous SNP significantly increased the risk of being a CL/P case, whereas haplotypes including the A allele at the same marker were consistently undertransmitted to CL/P cases (Table 2).

This is the first study that has considered genotypic and diplotypic risks for specific SNP markers in IRF6 particularly in Taiwanese, Singaporean, and Korean samples. However, the number of Singaporean and Korean trios available may not be sufficient to detect SNPs with weak or moderate effects on risk. Although the C allele at rs599021 was overtransmitted to cases and C/C homozygotes showed an increased risk of being a CL/P case, C/A heterozygotes seemed to be at higher risk when diplotypes were considered with other SNPs (Table 4). The fact that mutations in IRF6 cause VWS, which usually includes CL/P, combined with significant evidence of linkage and association with CL/P in our study and other case-parent trio studies strongly suggests that IRF6 itself is a causal gene for CL/P, but not the only one.

IRF6 is one of nine members of a family of transcription factors (IRFs) that share a highly conserved helix-turn-helix DNA-binding domain and a less conserved protein-binding domain. These domains exert diverse functions including regulating host defense pathways.20–22 The Irf6 gene was expressed in the ectoderm fusion forming the upper lip and primary palate in both mouse and chick, but only in the developing secondary palate of the mouse (which fuses as in humans).23 Similar expression patterns for IRF6 are also seen in human craniofacial structures, although the biological function of IRF6 during development of the lip and palate in humans remains uncertain.

Significant results observed from SNPs other than p.V274I (rs2235371) suggested that V274I itself is not causal, but rather in LD with some causal mutation in IRF6. Patterns of LD between SNPs and the SNPs that individually showed statistical significance differed across our sample populations (see Appendix Table A2). However, our data confirm that the IRF6 gene is associated with increased risk of CL/P, and thus the regions showing statistical evidence of association (e.g., rs599021, rs2235373, rs2235371, and rs2013162 for two Chinese groups; rs2294408 for Korean trios) should be searched further for causal variants. High-risk genotypes and diplotypes identified here may provide a better understanding of the etiological role that IRF6 plays in CL/P and could prove useful in genetic counseling, if these findings can be confirmed in subsequent studies.