Introduction

Oral clefts are among the most common birth defects in humans and represent a significant public health burden, both medical and economic, for affected individuals and their families. Non-syndromic cleft lip with or without cleft palate (CL/P) is complex in its etiology, and both genetic and environmental risk factors influence the risk.1 Although several candidate genes have been studied extensively in different populations (TGFA, IRF6, BCL, RARA, etc), relatively few genes have been shown to contain truly causal mutations (MSX1, PVRL1, etc), and these are individually rare and often show incomplete penetrance.2, 3 Recently, several studies have reported that genes responsible for Mendelian malformation syndromes that include CL/P (eg IRF6, which accounts for the majority of Van der Woude syndrome) may also be associated with non-syndromic clefts.2, 4

Paired box (PAX) genes, collectively termed the PAX gene family, encode specific DNA-binding transcription factors, which typically contain a PAX domain (an octapeptide) and a paired-type homeodomain.5 The mammalian PAX gene family includes nine genes encoding DNA-binding transcriptional regulatory proteins.6 These nine individual PAX genes are assigned to four subgroups based on conservation of their primary structure: (1) PAX1/PAX9, (2) PAX2/PAX5/PAX8, (3) PAX3/PAX7, and (4) PAX4/PAX6.6

PAX genes play critical roles during fetal development and in the growth of cancer cells. Mutations in the PAX3 (MIM 606597) gene have been associated with Waardenburg syndrome, which can include CL/P.7 PAX3 has also been associated with craniofacial-deafness-hand syndrome.8 PAX7 (MIM 167410) plays a crucial function during neural crest development.9 PAX3 and PAX7 have also been associated with alveolar rhabdomyosarcoma.10 Mutations in PAX6 (MIM 607108) have been associated with aniridia and development of the central nervous system.11, 12 Several studies have reported that PAX genes are associated with CL/P in animals.9, 13 However, to date few studies have focused on PAX genes as risk factors for CL/P in humans.14

It is important to consider parent-of-origin effects when studying birth defects because maternal genotype controls the in utero environment of the developing fetus, and separating maternal genotypic effects from imprinting effects remains an important scientific question.15, 16 Maternal parent-of-origin effects have been suggested for several genes associated with non-syndromic CL/P.17, 18 Males are more often affected with CL/P than females;19, 20 however, the underlying cause of this aberrant sex ratio remains unclear. In this paper, we tested for association between single nucleotide polymorphisms (SNPs) in the PAX3, PAX6, PAX7, and PAX9 genes and risk of CL/P in 297 case–parent trios, specifically considering parent-of-origin effects in the total sample and stratified by the proband's gender.

Materials and methods

Sample description

As part of an international study of oral clefts, we collected data on case–parent trios recruited through treatment centers in Maryland (MD): Johns Hopkins and University of Maryland; Taiwan (TW): Chang Gung Memorial Hospital; Singapore (SP): KK Women's and Children's Hospital; and Korea (KR): Yonsei Medical Center. Research protocols were reviewed and approved by institutional review boards at each institution. Table 1 lists the gender of all CL/P probands. All parents of probands in the Singaporean, Taiwanese, and Korean trios were unaffected, but 4 parents among the 76 MD trios also had an oral cleft. The racial background of case families from MD was 80% European American, 16% African American, and 4% ‘other’. All probands underwent a clinical genetics evaluation (including checking for other congenital anomalies or major developmental delays), and were classified as having an isolated, non-syndromic CL/P. Among the total collection of 297 cases (5% of whom did not specify laterality), 17% of CL cases and 23% of CLP cases were bilateral.

Table 1 Gender among 297 non-syndromic CL/P cases from four populations

SNP selection, DNA, and genotyping

Single nucleotide polymorphic markers were obtained from literature review and the NCBI dbSNP database (http://www.ncbi.nlm.nih.gov/SNP/), using a NorthStar Searchlet from Genetic Software Innovations (Cicero, NY, USA), which identified SNPs within each gene based on definitions used in LocusLink and EntrezGene. SNPs were selected with primary consideration given to the spacing between known SNPs and the amount of sequence data available at that time in public databases. SNPs with multiple submitters and higher heterozygosity levels were given priority. SNPs with high ‘design scores’ (a predictor of useable genotypes provided by Illumina, Inc., San Diego, CA, USA), heterozygosity above 0.1 in both Caucasian and Asian populations, and HapMap validation were included. SNPs were selected in and around four PAX genes with the goal of identifying one SNP per 5 kb: 7 SNPs were genotyped for PAX7 on chromosome 1p36.2–p36.12, 13 for PAX3 on chromosome 2q35–q37, 7 for PAX6 on chromosome 11p13, and 7 for PAX9 on chromosome 14q12–q13. A total of 45 SNPs were identified, and 35 were polymorphic in all populations. The call rate we considered acceptable was ≥80%. One SNP had unacceptably high rates of missing data (71%), leaving only 34 SNPs with reasonable heterozygosity for analysis (Table 2).

Table 2 SNP minor allele frequencies among parents of 297 CL/P cases from four populations

Genomic DNA samples were prepared from peripheral blood by the protein precipitation method described earlier.21 DNA concentration was determined using the PicoGreen® dsDNA Quantitation Kit (Molecular Probes, Inc., Eugene, OR, USA) and all DNA samples were stored at −20°C. A 4-μg aliquot of each genomic DNA sample (concentration of 100 ng/μl) was dispensed into bar-coded 96-well plates and genotyped for SNP markers using the Illumina GoldenGate chemistry with Sentrix® Array Matrices22 at the SNP Center of the Genetic Resources Core Facility, part of the McKusick–Nathans Institute of Genetic Medicine at the Johns Hopkins School of Medicine. Two duplicates and four CEPH controls were included on each plate to evaluate genotyping consistency within and between plates. Genotypes were generated on a BeadLab 1000 system.23 All SNPs were inspected, and poorly performing SNPs were dropped. No Mendelian inconsistencies were found for these 34 SNPs when checked with the SIB-PAIR program.24

Statistical analysis

Within each population, minor allele frequencies (MAFs) were computed among parents. Pairwise linkage disequilibrium (LD) was computed as r² for all SNPs using the Haploview program,25 and blocks were identified in the Asian and MD populations separately (Figure 1). Clayton's extension of the transmission disequilibrium test (TDT), as incorporated into STATA (version 8.2),26, 27 was used on individual SNPs to test for evidence of linkage and LD in the total sample of 297 CL/P trios. From this TDT analysis, we calculated the odds ratio of transmission, OR(transmission), and defined a ‘high-risk’ allele as the allele over-transmitted to cases (regardless of its statistical significance).
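To make the single-SNP test concrete, the following minimal sketch computes the classic allelic TDT (McNemar) statistic and the OR(transmission) from counts of transmitted and non-transmitted minor alleles among heterozygous parents. It is an illustration only: Clayton's extension as implemented in STATA is not reproduced here, and the function name `tdt` and the example counts are hypothetical rather than taken from Table 3.

```python
import math

def tdt(transmitted, not_transmitted):
    """Classic allelic TDT from heterozygous-parent transmission counts.

    Returns (odds ratio of transmission, chi-square with 1 df, P-value).
    """
    t, nt = transmitted, not_transmitted
    if nt == 0:
        raise ValueError("need at least one non-transmission to form the odds ratio")
    odds_ratio = t / nt
    chi2 = (t - nt) ** 2 / (t + nt)
    # Upper tail of a chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value

# Hypothetical counts for illustration only
or_t, chi2, p = tdt(60, 40)
print(f"OR(transmission)={or_t:.2f}, chi2={chi2:.2f}, P={p:.4f}")
```

Under this definition the ‘high-risk’ allele is simply the one with OR(transmission) above 1, whether or not the P-value is small.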

Figure 1 Linkage disequilibrium as measured by r² in PAX7, PAX3, PAX6, and PAX9 among parents of CL/P children from Asian and Maryland populations. White: r²=0. Shades of gray: 0<r²<1. Black: r²=1.
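For reference, the pairwise LD measure shown in Figure 1 can be sketched as below. This assumes the two-SNP haplotype frequency is already known; Haploview estimates such frequencies from unphased genotypes, a step this sketch does not attempt. The function name `r_squared` and the example frequencies are hypothetical.

```python
def r_squared(p_ab, p_a, p_b):
    """r^2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2.
    p_a, p_b: frequencies of allele A and allele B.
    """
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom

# Hypothetical frequencies: alleles always travel together, then independently
perfect = r_squared(0.3, 0.3, 0.3)
independent = r_squared(0.12, 0.3, 0.4)
print(f"perfect LD r2={perfect:.2f}, no LD r2={independent:.2f}")
```

Two SNPs with r² equal to 1 carry the same information, which is why such pairs are treated as redundant in single-SNP testing.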

Parent-of-origin analyses were conducted in the total sample in several ways. As an initial screening, parent-of-origin effects were examined using the transmission asymmetry test (TAT), suggested by Weinberg et al,28 which is similar to the TDT but excludes matings between two heterozygotes (where transmission can be ambiguous). TAT was stratified into separate paternal and maternal allelic tests. Next, we used the likelihood-based approach proposed by Weinberg29 to confirm these parent-of-origin effects. This log-linear model considers the three mating types where the mother and father carry different numbers of variant alleles, stratified by the number of alleles in the child. This log-linear model is used to compute a parent-of-origin likelihood ratio test (PO-LRT), which tests maternal genotypic effects on the phenotype of the fetus (which could otherwise confound assessment of parent-of-origin effects) along with a separate term for imprinting.29 Here imprinting reflects a differential transmission of alleles to the affected child from mothers versus fathers. PO-LRT was executed using the LEM software.30 We also tested for parent-of-origin effects in the sample stratified by proband’s gender, with separate analyses for trios with male and female cases.
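The stratification step of the TAT, as described above, can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of Weinberg et al or the log-linear PO-LRT: genotypes are coded as minor-allele counts (0, 1, or 2), heterozygous × heterozygous matings are dropped, and the function name `tat_counts` and the example trios are hypothetical.

```python
def tat_counts(trios):
    """Count minor-allele transmissions by parental source.

    trios: iterable of (mother, father, child) minor-allele counts (0, 1 or 2).
    Returns {"maternal": [transmitted, not_transmitted], "paternal": [...]}.
    """
    counts = {"maternal": [0, 0], "paternal": [0, 0]}
    for mother, father, child in trios:
        if mother == 1 and father == 1:
            continue  # het x het matings are dropped, following the description above
        for label, het, hom in (("maternal", mother, father),
                                ("paternal", father, mother)):
            if het != 1:
                continue  # only a heterozygous parent is informative
            # The homozygous parent transmits hom // 2 minor alleles (0 or 1),
            # so the remainder of the child's count came from the heterozygote.
            from_het = child - hom // 2
            if from_het not in (0, 1):
                raise ValueError("Mendelian inconsistency in trio")
            counts[label][0 if from_het == 1 else 1] += 1
    return counts

# Hypothetical trios for illustration only
example = [(1, 0, 1), (1, 0, 1), (1, 2, 1), (0, 1, 0), (1, 1, 1), (1, 0, 0)]
result = tat_counts(example)
print(result)
```

The maternal and paternal count pairs can then each be tested in the same way as an ordinary TDT, giving separate maternal and paternal OR(transmission) estimates.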

The FAMHAP package was used to estimate haplotype frequencies, while testing for excess transmission of multi-SNP haplotypes.31 In this haplotype analysis, haplotypes of 2–5 SNPs were analyzed using a sliding window. For this FAMHAP analysis, MD and Asian trios were analyzed separately. The haplotype analysis was first carried out ignoring parent-of-origin, and then conducted for maternal and paternal transmissions separately. FAMHAP calculates maximum likelihood estimates of haplotype frequencies from nuclear families through the expectation–maximization algorithm and is robust in handling missing SNPs.32 This tool provides a haplotype-based test, where the test statistic is based on simulations that randomly permute transmitted and non-transmitted genotypes/haplotypes in each replicate.33 In this analysis, we used max-TDT, which analyzes each haplotype separately and relies on the maximum TDT as the test statistic.33, 34 The program yields P-values corrected for multiple haplotypes.
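The logic of a max-TDT permutation test can be sketched as below. This is a simplified illustration, not FAMHAP's implementation: it assumes phase-known haplotypes for each parent (FAMHAP instead works with haplotype frequencies estimated by expectation–maximization), and the function names, haplotype labels, and counts are hypothetical.

```python
import random
from collections import Counter

def max_tdt(pairs):
    """Largest single-haplotype TDT chi-square.

    pairs: list of (transmitted, non_transmitted) haplotype labels, one per parent.
    """
    informative = [(t, nt) for t, nt in pairs if t != nt]  # drop homozygous parents
    t_counts = Counter(t for t, _ in informative)
    nt_counts = Counter(nt for _, nt in informative)
    best = 0.0
    for hap in set(t_counts) | set(nt_counts):
        b, c = t_counts[hap], nt_counts[hap]
        best = max(best, (b - c) ** 2 / (b + c))
    return best

def permutation_p(pairs, n_perm=1000, seed=0):
    """Empirical P-value: swap transmitted/non-transmitted labels at random."""
    rng = random.Random(seed)
    observed = max_tdt(pairs)
    hits = 0
    for _ in range(n_perm):
        shuffled = [(a, b) if rng.random() < 0.5 else (b, a) for a, b in pairs]
        if max_tdt(shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical two-SNP haplotype transmissions, for illustration only
data = ([("1-2", "1-1")] * 30 + [("1-1", "1-2")] * 10
        + [("2-2", "1-1")] * 5 + [("1-1", "2-2")] * 5)
observed, p_emp = permutation_p(data)
print(f"max-TDT={observed:.2f}, empirical P={p_emp:.4f}")
```

Because the maximum is recomputed in every replicate, the resulting empirical P-value already accounts for the number of haplotypes tested in the window.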

Results

Among these 34 SNPs, there was considerable variation in allele frequencies among parents from MD and the three Asian populations (Table 2). From the allele frequencies shown in Table 2, it is clear that some markers showed sharp distinctions between MD and Asian samples, whereas others did not. Taiwanese, Korean, and Singaporean parents had very similar haplotype frequencies; therefore all Asian trios were combined for haplotype analysis. Patterns of LD across each gene were similar in all populations, with some adjacent SNPs in each gene in perfect LD, rendering them redundant (see Figure 1 and Table 2).

When individual markers were screened with the TDT in the total sample, one SNP in PAX7 and five SNPs in PAX6 were nominally significant when parent-of-origin was ignored (Table 3). The OR(transmission) was 1.62 (P=0.003) for rs766325 in PAX7. The five SNPs in PAX6 showing significant evidence of linkage and LD included two separate pairs of SNPs in perfect LD. The most significant SNP (rs3026354) gave an OR(transmission)=1.47 (P=0.008) ignoring parent-of-origin (Table 3). When analyzed separately in each of the four populations, the association was less strong because of the smaller sample sizes, but the patterns of OR(transmission) were similar (data not shown).

Table 3 Number of transmitted or non-transmitted minor alleles in 297 CL/P cases (all populations combined) from TDT and estimated odds ratios of transmission, OR(transmission) ignoring parent-of-origin

Parent-of-origin effects were first investigated by stratifying informative transmissions and non-transmissions by parental source for all SNPs in the total dataset (Table 4). TAT (ie where heterozygous × heterozygous matings were dropped) revealed three SNPs showing excess maternal transmission significant at the P<0.01 level (rs618941, rs553934 in PAX7, and rs1367414 in PAX3; see Figure 2 and Table 4), and three others (rs4674639, rs930140, and rs7600206) in PAX3 showing slightly less significant transmission from mothers. For these six SNPs, estimated maternal OR(transmission) was statistically significant (ranging from 1.74 to 2.40) in TAT analysis. The PO-LRTs were also significant for these six SNPs (P-values ranging from 0.035 to 0.012) and gave estimated risk ratios for imprinting ranging between 2.08 and 2.78 for these six SNPs, suggesting excess maternal transmission of this region in PAX7 and PAX3 (Table 4). Parent-of-origin effects for markers in PAX6 and PAX9 were not significant (data not shown).

Table 4 Number of transmitted or non-transmitted associated alleles to 297 CL/P cases (all populations combined)a from TAT and estimated odds ratios, and parent-of-origin likelihood ratio test to test for inequality of maternal versus paternal transmission

Figure 2 Empirical P-values for individual SNPs from PAX7, PAX3, PAX6, and PAX9 genes in 297 CL/P case–parent trios from four populations (Maryland, Singapore, Taiwan, and Korea) combined. (a) Only maternal transmission was considered; (b) only paternal transmission was considered.

Separate analyses were conducted for trios with male and female cases. For two SNPs in PAX7, the estimated OR(transmission) from mothers to male cases was statistically significant (OR=4.50, P=0.0003 for rs553934; and OR=4.20, P=0.0017 for rs618941). Both of these SNPs gave significant PO-LRTs (P=0.028 for rs618941 and P=0.027 for rs553934). Among trios with a female case, however, OR(transmission) and PO-LRT were non-significant for these two SNPs (data not shown).

In the haplotype analysis using a sliding window (ignoring parent-of-origin), haplotypes of two SNPs (rs766325 and rs880810) in PAX7 showed evidence of excess transmission of the 1–2 haplotype to the case among Asian trios (P=0.036). MD trios showed similar transmission patterns, but these were not statistically significant (data not shown). In PAX6, haplotypes of three SNPs (rs592859, rs3026354, and rs2071164) showed strong evidence of excess transmission of the 1–2–2 haplotype to the case among Asian trios (P=0.011). The MD trios again showed similar transmission patterns, but these were not statistically significant (data not shown). Haplotypes in the PAX3 and PAX9 genes were not significant at the P=0.05 level (data not shown).

Next, we conducted the haplotype analyses stratified by parent-of-origin using sliding windows of 2–5 SNPs (Table 5). In PAX7, haplotypes of rs880810 and rs618941 were most significant. The 2–2 haplotype showed evidence of excess maternal transmission to the CL/P child among Asian trios (P=0.049) and among MD trios (P=0.041), whereas no haplotypes showed deviation from expected transmission when inherited from fathers. Analysis of two SNPs in PAX3 (rs4674639 and rs930140) showed evidence of excess maternal transmission of the 1–2 haplotype, with stronger evidence coming from Asian trios (here again MD trios showed similar but non-significant patterns of over-transmission).

Table 5 Testing for excess transmission of haplotypes of SNPs rs880810 and rs618941 in PAX7 and SNPs rs4674639 and rs930140 in PAX3 in 297 CL/P case–parent trios using the program FAMHAP with maternal and paternal transmission considered separately

Discussion

Our study of case–parent trios from different populations (comprising a total of 297 CL/P trios) showed evidence of linkage in the presence of LD for multiple SNPs in the PAX7 and PAX3 genes only when parent-of-origin effects were considered. In this study, ignoring parent-of-origin made the PAX7 and PAX3 genes look relatively uninteresting. Only a single SNP in PAX7 showed any evidence of linkage and LD. However, considering parent-of-origin revealed two SNPs in PAX7 and three SNPs in PAX3 yielding strong evidence of linkage and LD when transmitted from mothers but not from fathers. This evidence was more dramatic among male cases. Other studies also report that ignoring parent-of-origin could lead to overlooking important genes. In a case–parent trio study for bipolar I disorder, TDT analysis revealed no statistically significant association with SNPs on chromosome 18p11. However, when parent-of-origin was considered, evidence of association was seen involving two potentially causal loci.35

In screening for parent-of-origin effects, we found suggestive evidence of excess maternal transmission for several SNPs in PAX3 and PAX7, which are closely related and are important in mammalian embryogenesis.36 Relaix et al37 identified a new cell population expressing transcription factors PAX3 and PAX7, but no skeletal muscle-specific markers. These cells are maintained as a proliferating cell line throughout development in embryonic and fetal muscles of the trunk and limbs.

Excess maternal transmission could reflect genomic imprinting or direct maternal genotype effects on the developing fetus. Maternal genotypic effects for non-syndromic CL/P have also been reported for several other candidate genes (MTHFR, CBS, and TGFB3), but these are yet to be confirmed.17, 18, 38

In this study, log-linear models discriminating between maternal genotype and child genotype effects revealed a possible maternal imprinting effect for multiple SNPs in PAX7 and PAX3. Estimates of maternal genotype effects were generally non-significant for the 19 SNPs in PAX7 and PAX3, except for a single SNP (rs1367414). Genomic imprinting is defined as the differential expression of alleles depending on parent-of-origin.39 A common feature of imprinted genes is a DNA sequence carrying a gametic methylation imprint, known as a gametic DMR (Differentially DNA-Methylated Region).40 Parental allele-specific DNA methylation has been found at most imprinted clusters examined thus far. For example, the IGF2 cluster has a gametic DMR located 2 kb upstream from the H19 ncRNA promoter, which is methylated only in the paternal gamete and is maintained thereafter in all somatic tissues.41 Kurmasheva et al42 suggested PAX3 gene methylation may be correlated with gene inactivation.

In a variety of animal species, maternal transcripts and proteins control early embryonic development in the developing oocyte.43 In the leech Helobdella, Woodruff et al44 found that Hau-Pax3/7A is present as a maternal transcript in both ectodermal and mesodermal progenitor cells. They suggested Hau-Pax3/7A plays an important role in mesoderm development. Helobdella embryos receive a large contribution of maternal Hau-Pax3/7A RNA, but its function remains unknown.44

Many congenital anomalies occur more often in one gender. Males are more often affected with CL/P than females.19, 20 Rittler et al20 reported that infants with CL/P were more frequently female when the father was older, and among CL cases, this shift in sex ratio was highly significant. In our results from markers in the PAX7 and PAX3 genes, male cases showed stronger evidence of possible imprinting than female cases.

Even though this candidate gene study involved a modest number of SNPs in each gene, addressing the issue of multiple comparisons is necessary before an overall statement about the significance of our findings can be made. Here we relied on a hypothesis-driven approach for single SNP analysis and haplotype-based test statistics. Because SNPs in strong LD typically have highly correlated P-values, adjusting significance levels through Bonferroni correction is overly conservative. Therefore, following the strategy in Sull et al,45 we adjusted empirical P-values for the number of LD blocks rather than the number of SNPs. In this study, we have 10 LD blocks in these four genes (three for PAX7, three for PAX3, two for PAX6, and two for PAX9). In the second block with two SNPs in the PAX7 gene (as shown in Table 4), we found evidence against the null hypothesis only for maternal transmission (the empirical P-value of 0.006 would still be marginally significant after correcting for the number of LD blocks). We also used haplotype-based test statistics based on permutation analysis of case–parent trio data. Salyakina et al46 argue that permutation tests are generally preferred over adjustments of asymptotic P-values based on the estimated correlation structure among multiple markers or on conventional Bonferroni adjustment (which can be too conservative).47 The case–parent trio design offers the advantage of testing directly for maternal versus paternal effects, and allows separating these from effects of the fetal genotype versus parental origin in a robust manner.26, 48 Another advantage of this design is that it minimizes the confounding that plagues traditional case–control designs. This permits pooling trios from four diverse populations into a combined test of allelic effects on OR(transmission), while testing for parent-of-origin effects.
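The arithmetic behind the block-based correction described above can be sketched as follows. It assumes a simple Bonferroni-style multiplication by the number of tests, which is an assumption made here for illustration and may differ in detail from the procedure of Sull et al; the function name `block_adjusted` is hypothetical, while the inputs (P=0.006, 10 LD blocks, 34 SNPs) come from this study.

```python
def block_adjusted(p_value, n_tests):
    """Bonferroni-style adjustment: multiply by the number of tests, cap at 1."""
    return min(1.0, p_value * n_tests)

p_maternal = 0.006  # empirical P-value for the second PAX7 block
by_blocks = block_adjusted(p_maternal, 10)  # 10 LD blocks across the four genes
by_snps = block_adjusted(p_maternal, 34)    # 34 individual SNPs

print(f"adjusted for LD blocks: {by_blocks:.3f}")
print(f"adjusted for SNPs:      {by_snps:.3f}")
```

Counting blocks rather than SNPs leaves the adjusted value close to the conventional 0.05 threshold, whereas counting every SNP separately would move it well above that threshold.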

The present study showed excess maternal transmission of markers in PAX7 and PAX3, suggesting that these genes may influence the risk of CL/P, possibly through imprinting. Independent confirmation is still needed to determine the ultimate impact of these PAX genes on the risk of CL/P.