NSCL/P is one of the most common human birth defects. In European populations, NSCL/P has a prevalence ranging from 1 in 700 to 1 in 1,000. We recently reported a susceptibility locus for NSCL/P at chromosome 8q24.21 from a genome-wide association study in 224 individuals with NSCL/P (cases) and 383 population-based controls1. This locus is the second susceptibility locus to have been unequivocally identified for NSCL/P to date, the first being the IRF6 locus2.

To identify additional cleft susceptibility loci, we enlarged our sample by genotyping an additional set of 177 NSCL/P cases and adding the genotypes of 940 population-based controls of central European origin. Genotyping was performed using Illumina BeadChips (Human610-Quad and HumanHap 550k).

Following quality control (Supplementary Methods and Supplementary Fig. 1), association analysis of 521,288 SNPs having a minor allele frequency (MAF) of ≥1% in controls was performed in 399 cases and 1,318 controls.

After excluding markers from the previously described 8q24.21 locus, 20 SNPs with P < 10−5 remained. Five chromosomal loci (8q12.3, 10q25.3, 13q31.1, 15q13.3 and 17q22) were located within these 20 top SNPs, and the associations at these loci were further supported by at least three more SNPs with P < 10−4 (Supplementary Fig. 2 and Supplementary Table 1). Two additional regions were considered to be promising NSCL/P susceptibility loci (6p22.1, 11q14.2), as they contained at least four markers with P < 10−4.

To replicate the genome-wide association study (GWAS) findings, we selected the 20 top SNPs (P < 10−5) as well as additional backup markers for each of the seven previously mentioned loci, resulting in two replication assays. We included additional SNPs with P < 10−4 in the two replication assays, giving highest priority to SNPs with the lowest P values. Thus, a total of 56 markers were genotyped in a replication sample of 793 NSCL/P triads of European origin. Genotyping using matrix assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry (Sequenom Inc.) was successful for 45 markers (representing 32 different loci), which were then analyzed by the transmission-disequilibrium test in 665 triads (128 triads were excluded after quality control, Supplementary Methods).

Of the 45 SNPs successfully genotyped, 11 (representing six different loci) showed P < 0.05 in the replication sample (Supplementary Table 2). Two of these SNPs remained significant after correction for multiple testing by a conservative Bonferroni procedure (17q22: rs227731, Pcorr = 0.01039 and 10q25.3: rs7078160, Pcorr = 0.04999). The probability that 6 or more out of 32 loci would generate P values <0.05 by chance alone is 0.0046. It is therefore likely that true association was detected. After combining the GWAS and replication samples, genome-wide significant evidence of association was found using the combined haplotype relative risk method3 for three SNPs at two loci (17q22: rs227731, Pcomb = 1.07 × 10−8; and 10q25.3: rs7078160, Pcomb = 1.92 × 10−8 and rs4752028, Pcomb = 2.48 × 10−8) (Table 1, Supplementary Table 3 and Supplementary Methods). Two further loci (13q31.1, 15q13.3) were replicated, although they fell short of achieving genome-wide significance (13q31.1: rs9574565, Pcomb = 3.44 × 10−7 and 15q13.3: rs1258763, Pcomb = 1.14 × 10−6). Similarly, the single marker rs7590268 on 2p21 failed to reach genome-wide significance in the combined sample (Pcomb = 8.62 × 10−8). For the two significant markers on 6q27 (rs2197100 and rs7740603, Supplementary Table 2), the putative risk alleles in the replication study differed from those in the GWAS, and these markers were therefore not considered to have been replicated. There was no evidence for imprinting or a maternal genotype effect for any of the 45 SNPs included in the family-based replication step (Supplementary Methods).

Table 1 GWAS and replication for the most significantly associated markers at five NSCL/P susceptibility loci

The relative risk (RR) in the replication sample for rs227731 (17q22) was 1.26 (95% CI 0.99–1.6) for the heterozygous genotype and 1.84 (95% CI 1.34–2.53) for the homozygous genotype. The RR in the replication sample for rs7078160 (10q25.3) was 1.32 (95% CI 1.05–1.65) for the heterozygous genotype and 2.17 (95% CI 1.32–3.56) for the homozygous genotype (Table 1).

The population attributable risk (PAR) estimated from the combined sample was 23.9% for rs227731 (17q22) and 12.3% for rs7078160 (10q25.3). The joint PAR for these two new loci, the key susceptibility locus at 8q24.21 and the IRF6 locus estimated from the combined sample was 54.6% (Supplementary Methods). Although there may be biases in estimating the PAR from these discovery cohort samples, these results suggest that together these four loci may explain a substantial proportion of the risk for NSCL/P. However, because the summary PAR is usually less than the sum of the individual PARs, it is also likely that additional genetic variants contribute to NSCL/P risk and remain to be identified.

The chromosome 17q22 region contains the gene encoding the noggin protein, NOG, which is located 100 kb centromeric of rs227731 (Fig. 1a). NOG is an antagonist of members of the transforming growth factor beta (TGF-β) superfamily, which includes proteins such as the bone morphogenetic protein 4 (BMP4). BMP4 has been shown to regulate mammalian palatogenesis4 and has been reported to be associated with clefting in humans5.

Figure 1: Details of the loci showing genome-wide significant association with NSCL/P in the combined sample.
figure 1

(a,b) Each panel shows single-marker association statistics (as −log10 P; left y axis) from the GWAS (squares) and from the combined analysis (diamonds). Linkage disequilibrium (r2) to the most significantly associated regional SNP (blue diamond), as estimated from the GWAS control genotypes, is color-coded (red: r2 > 0.8; orange: r2 = 0.5–0.8; yellow: r2 = 0.2–0.5; white: r2 < 0.2). Recombination rates across each region in HapMap CEU are shown in light blue (right y axis). The chromosomal locations and relative positions of genes according to hg18 are shown (x axis).

The four markers at 10q25.3 with a PGWAS < 10−4 are located within a 30-kb region and show high intermarker linkage disequilibrium (Fig. 1b). Two genes are located in close vicinity to this region: KIAA1598 (40 kb centromeric) and VAX1 (ventral anterior homeobox 1; 53 kb telomeric). Mice with homozygous Vax1 mutations display craniofacial malformations including cleft palate6. Two individuals with a 10q terminal deletion syndrome with breakpoints in 10q25 have been reported, one with a submucous cleft palate7 and the other with a cleft lip8.

Three further loci were successfully replicated but did not achieve genome-wide significance in the combined analysis. FMN1 (formin-1), a gene with an unknown function, and GREM1 (gremlin-1), the gene coding for another known antagonist of BMP4, are located on chromosome 15q13.3 (Supplementary Fig. 3a). The 13q31.1 locus lies in a gene desert (Supplementary Fig. 3b). SPRY2 (sprouty homolog 2) is located 241 kb telomeric of rs9574565. In mice, palate development is sensitive to Spry2 dosage9, and Spry2 overexpression results in craniofacial defects10. Resequencing of SPRY2 in NSCL/P cases has suggested the presence of rare, and possibly detrimental, variants in SPRY2 (ref. 11). The SNP on 2p21 (rs7590268) is located within intron 31 of THADA (thyroid adenoma associated), which may be involved in the cell death receptor pathway and apoptosis12 (Supplementary Fig. 3c). Notably, rs7590268 maps to a region that was duplicated in two individuals presenting with cleft palate and other anomalies13,14.

The most significant SNP at the 8q24.21 locus1 (rs987525) and the functional IRF6 variant2 (rs642961) were also genotyped in the replication sample in order to test for a possible interaction. Suggestive evidence (P = 0.005) was found for an interaction between the IRF6 variant rs642961 and the SNP near GREM1 at 15q13.3 (rs1258763). No evidence was found for any other interactions.

Recent genetic data2 support the hypothesis that NSCL/P may be separable into two sub-phenotypes: cleft lip only (NSCLO) and cleft lip with cleft palate (NSCLP). The genotype distribution in the NSCLP subsample (318 cases) did not differ significantly from that in the NSCLO subsample (81 cases) (data not shown) for any of the nine reported SNPs at the five newly identified loci (Supplementary Table 3).

Because NSCL/P and nonsyndromic cleft palate only (NSCPO) may have an etiological overlap, we genotyped the replication marker panel in 295 NSCPO triads. None of the SNPs showed evidence of association (Supplementary Table 4).

In summary, we have identified two new NSCL/P susceptibility loci with genome-wide significance on 17q22 and 10q25.3, and three further loci (13q31.1, 15q13.3 and 2p21) for which there is suggestive evidence. Promising candidate genes at these loci include NOG (noggin), VAX1 (ventral anterior homeobox 1), GREM1 (gremlin 1), SPRY2 (sprouty homolog 2) and THADA (thyroid adenoma associated). Given the intergenic location of the associated SNPs in the present study, further studies should test for allele-specific expression of these candidate genes and resequence their coding regions in order to identify possible functional variants.

Accession numbers. GenBank: BMP4, NM_001202 (NM_130850, NM_130851); FMN1, NM_001103184; GREM1, NM_013372; IRF6, NM_006147; KIAA1598, NM_018330 (NM_001127211); NOG, NM_005450; SPRY2, NM_005842; THADA, NM_022065; VAX1, NM_199131 (NM_001112704).