Sill and co-workers1 report that germline variation in semaphorin 4A (SEMA4A) influences colorectal cancer (CRC) risk. This stems from identifying the SEMA4A p.Val78Met variant in one kindred with familial colorectal cancer type X (FCCTX) and subsequently p.Gly484Ala (c.1451G>C, rs148744804) and p.Ser326Phe (c.977C>T) mutations along with the single nucleotide polymorphism (SNP) p.Pro682Ser (c.2044C>T, rs76381440) among an additional 53 FCCTX cases. In comparing the frequency of rs76381440 genotype in 47 FCCTX cases and 1,138 controls, c.2044T carrier status was reported to be associated with 6.79-fold increased CRC risk1.

Here, we report a well-powered study that casts doubt on SEMA4A as a CRC predisposition gene. This has important implications for clinical genetics because inappropriate screening or intervention might be recommended to carriers.

First, we studied the contribution of the recurrent variants, rs148744804 and rs76381440, to CRC by analysing 6,856 CRC cases and 10,090 controls from six European populations as previously described2,3. These comprised (1) 3,666 English case patients (n=250 from the CORGI study; n=957 from the QUASAR study; n=1,168 from NSCCG; n=1,291 from a Leeds case–control series) and 6,140 control patients (n=5,694 from the 1958 Birth Cohort; n=446 from the Leeds series); (2) 2,052 Scottish CRC cases and 2,004 Scottish controls (n=1,452 from the 1935 and 1928 Lothian birth cohorts; n=552 from Generation Scotland); (3) 276 Spanish cases and 284 controls; (4) 800 Dutch samples (n=337 Leiden cases and n=337 controls; n=74 Groningen cases and n=52 controls); (5) 199 Portuguese cases and 186 controls; and (6) 1,339 German samples (n=77 Heidelberg cases and n=88 controls; n=175 Kiel cases and n=999 controls). Collectively, these samples provide >99% power (α=0.05) to detect the lower limit of the point estimate reported by Sill and co-workers for the association between p.Pro682Ser and CRC1 (odds ratio [OR]=2.6).
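The power statement above can be illustrated with a minimal sketch. The text does not state which method or which control carrier frequency was used, so the normal approximation for comparing two proportions, the function names and the 1% carrier frequency below are all assumptions made for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def carrier_freq_in_cases(p_controls: float, odds_ratio: float) -> float:
    """Carrier frequency in cases implied by a control frequency and an OR."""
    if not 0.0 < p_controls < 1.0:
        raise ValueError("p_controls must lie strictly between 0 and 1")
    odds = odds_ratio * p_controls / (1.0 - p_controls)
    return odds / (1.0 + odds)


def power_two_proportions(p_controls, odds_ratio, n_cases, n_controls, alpha=0.05):
    """Approximate power of a two-sided test comparing carrier frequencies
    in cases and controls (normal approximation; an assumed method)."""
    p_cases = carrier_freq_in_cases(p_controls, odds_ratio)
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1.0 - alpha / 2.0)
    # Standard error under the null hypothesis uses the pooled frequency.
    p_bar = (n_cases * p_cases + n_controls * p_controls) / (n_cases + n_controls)
    se_null = sqrt(p_bar * (1.0 - p_bar) * (1.0 / n_cases + 1.0 / n_controls))
    # Standard error under the alternative uses the group-specific frequencies.
    se_alt = sqrt(p_cases * (1.0 - p_cases) / n_cases
                  + p_controls * (1.0 - p_controls) / n_controls)
    delta = abs(p_cases - p_controls)
    # Sum of both rejection tails.
    return (norm.cdf((delta - z_alpha * se_null) / se_alt)
            + norm.cdf((-delta - z_alpha * se_null) / se_alt))


# Sample sizes are those given in the text; the 1% control carrier
# frequency is a hypothetical value, not a figure from the study.
power = power_two_proportions(p_controls=0.01, odds_ratio=2.6,
                              n_cases=6856, n_controls=10090)
print(f"approximate power: {power:.4f}")
```

Under the assumed carrier frequency this approximation gives power above 0.99 for OR=2.6. A rarer variant would lower the figure, so the result depends on the frequency supplied.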

We used Infinium HumanExome BeadChips (Illumina, San Diego, CA) to genotype our samples as previously described2,3 and extracted the genotypes for rs148744804 and rs76381440. We validated genotyping by sequencing 541 randomly selected samples, which showed very strong concordance (r2=1.0 and 0.99 for rs148744804 and rs76381440, respectively; Supplementary Table 1). We used principal component analysis to confirm ancestral comparability of cases and controls (Supplementary Figure 1).
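As a sketch of how a concordance value of this kind could be obtained, the snippet below computes the squared Pearson correlation between array and sequencing genotype calls. It assumes that r2 denotes the squared correlation of genotypes coded as minor-allele counts (0, 1 or 2), which the text does not spell out. The function name and the genotype vectors are hypothetical.

```python
def r_squared(calls_a, calls_b):
    """Squared Pearson correlation between two equal-length genotype vectors."""
    n = len(calls_a)
    if n != len(calls_b) or n < 2:
        raise ValueError("need two vectors of equal length with at least 2 samples")
    mean_a = sum(calls_a) / n
    mean_b = sum(calls_b) / n
    s_ab = sum((a - mean_a) * (b - mean_b) for a, b in zip(calls_a, calls_b))
    s_aa = sum((a - mean_a) * (a - mean_a) for a in calls_a)
    s_bb = sum((b - mean_b) * (b - mean_b) for b in calls_b)
    if s_aa == 0 or s_bb == 0:
        raise ValueError("r2 is undefined when one platform shows no variation")
    return (s_ab * s_ab) / (s_aa * s_bb)


# Hypothetical minor-allele counts for eight samples on each platform.
array_calls = [0, 0, 1, 0, 2, 0, 1, 0]
sequencing_calls = [0, 0, 1, 0, 2, 0, 1, 0]
print(r_squared(array_calls, sequencing_calls))
```

For a rare variant a single discordant call can move r2 noticeably, so the number of carriers among the validated samples matters when interpreting the value.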

None of the six series showed a statistically significant difference in frequency of rs148744804 or rs76381440 genotype between cases and controls (Table 1). In a meta-analysis of data from all studies, we found no association between c.1451C or c.2044T carrier status and CRC (OR=1.14, 95% confidence interval [CI]: 0.58–2.24, Pheterogeneity=0.87, I2=0% and OR=1.04, 95% CI: 0.89–1.22, Pheterogeneity=0.36, I2=9%, respectively; Fig. 1). Adjustment for principal components had no impact on the findings.

Table 1 SEMA4A rs76381440 (p.Pro682Ser, c.2044C>T) and rs148744804 (p.Gly484Ala, c.1451G>C) genotype counts and association statistics for the six colorectal cancer case–control studies.
Figure 1: Forest plot of association between rs148744804 and rs76381440 SEMA4A genotypes and colorectal cancer risk.

Studies were weighted according to the inverse of the variance of the log of the OR calculated by unconditional logistic regression. Meta-analysis under a fixed-effects model was conducted using standard methods. Cochran’s Q statistic was calculated to test for heterogeneity, and the I2 statistic to quantify the proportion of the total variation due to heterogeneity. Horizontal lines indicate 95% confidence intervals (CIs). Boxes indicate odds ratio (OR) point estimates; the area of each box is proportional to the weight of the study. The diamond (and broken line) indicates the overall summary estimate, with the CI given by its width. The unbroken vertical line indicates the null value (OR=1.0).
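The weighting scheme described in this legend can be sketched as follows. This is a minimal illustration of inverse-variance fixed-effects pooling with Cochran's Q and I2, not the code used in the study. The function name and the per-study log ORs and standard errors are hypothetical and are not taken from Table 1.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def fixed_effects_meta(log_ors, std_errs, level=0.95):
    """Inverse-variance fixed-effects pooling of per-study log odds ratios."""
    if len(log_ors) != len(std_errs) or len(log_ors) < 2:
        raise ValueError("need at least two studies with matching inputs")
    # Each study is weighted by the inverse of the variance of its log OR.
    weights = [1.0 / (se * se) for se in std_errs]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    df = len(log_ors) - 1
    # I2: share of total variation attributed to heterogeneity, floored at 0.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {
        "or": exp(pooled),
        "ci_low": exp(pooled - z * pooled_se),
        "ci_high": exp(pooled + z * pooled_se),
        "q": q,
        "df": df,
        "i2": i2,
    }


# Hypothetical log ORs and standard errors for three studies.
result = fixed_effects_meta([log(1.10), log(0.95), log(1.05)], [0.15, 0.20, 0.30])
print(result)
```

The heterogeneity P value is the upper tail of a chi-squared distribution with `df` degrees of freedom evaluated at Q. The standard library has no chi-squared function, so one option is `scipy.stats.chi2.sf(q, df)`.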

Following on from these analyses, we examined the mutational spectrum of SEMA4A in 1,006 familial early-onset CRC cases (≥1 first-degree relative with CRC, <56 years; 158 with FCCTX) from the National Study of Colorectal Cancer Genetics (NSCCG) and 1,609 controls from the 1958 Birth Cohort, sequenced using Illumina TruSeq exome capture in conjunction with HiSeq 2000 technology (Supplementary Figure 2, Supplementary Table 2). Over 99% of the SEMA4A transcript was covered at a depth greater than 10 reads (average coverage 38×, Supplementary Figure 2). We identified 28 protein-changing variants in 354 samples. Of these variants, three were unique frameshifts, present in two controls and one case. Overall, 13% of CRC cases (16% of FCCTX cases) had a protein-changing variant in SEMA4A, compared with 14% of controls.

There are a number of possible explanations for the disparity between our findings and those reported by Sill and co-workers1. Population stratification can lead to spurious associations, and this is especially important with rare variants. The study by Sill and co-workers did not account for this and indeed, in comparing the frequency of rs14874408, included cases from both the US and Germany. In contrast, we ensured ancestral comparability of case patients and control subjects using single nucleotide polymorphism genotypes, thereby excluding this as a source of bias. Generalisability is central to establishing a mutation–phenotype relationship. The evidence for p.Val78Met being causative in FCCTX is based on incomplete segregation in the family reported by Schulz et al. Hence a type 1 error cannot be excluded.

Much of the missing heritability of CRC is likely to be a result of high/moderate penetrance mutations and rare variants. As illustrated by the recent identification of POLE and POLD1 as causes of familial CRC4, this class of susceptibility is especially important in understanding cancer biology and for clinical practice. Hence there is a strong rationale for seeking to identify additional such genes. Given the high frequency of deleterious mutations carried by the healthy population, it is becoming increasingly clear that robust and well-powered studies are required to prevent erroneous findings from exome-sequencing projects being asserted as causes of disease.

In conclusion, in this well-powered study, we find no evidence to support variation in SEMA4A as a determinant of CRC risk. Given that a priori SEMA4A is not a strong candidate CRC predisposition gene, having previously been shown to cause eye disease, we feel that caution should be exercised before SEMA4A is considered as a cause of CRC.

Additional information

How to cite this article: Kinnersley, B. et al. Correspondence: SEMA4A variation and risk of colorectal cancer. Nat. Commun. 7:10611 doi: 10.1038/ncomms10611 (2016).