Exome-wide analyses identify low-frequency variant in CYP26B1 and additional coding variants associated with esophageal squamous cell carcinoma


Genome-wide association studies have identified common variants associated with risk of esophageal squamous cell carcinoma (ESCC). However, these common variants cannot explain all heritability of ESCC. Here we report an exome-wide interrogation of 3,714 individuals with ESCC and 3,880 controls for low-frequency susceptibility loci, with two independent replication samples comprising 7,002 cases and 8,757 controls. We found six new susceptibility loci in CCHCR1, TCN2, TNXB, LTA, CYP26B1 and FASN (P = 7.77 × 10−24 to P = 1.49 × 10−11), and three low-frequency variants had relatively high effect size (odds ratio > 1.5). Individuals with the rs138478634-GA genotype had significantly lower levels of serum all-trans retinoic acid, an anticancer nutrient, than those with the rs138478634-GG genotype (P = 0.0004), most likely due to an enhanced capacity of variant CYP26B1 to catabolize this agent. These findings emphasize the important role of rare coding variants in the development of ESCC.

Access optionsAccess options

Rent or Buy article

Get time limited or full article access on ReadCube.


All prices are NET prices.

Fig. 1: Manhattan plot for associations between genetic variants and ESCC risk.
Fig. 2: Significant gene–lifestyle interaction for the CYP26B1 rs138478634 variant.
Fig. 3: CYP26B1 rs138478634 variant influences ESCC risk by altering the catabolic activity of the enzyme.


  1. 1.

    Chen, W. et al. Cancer statistics in China, 2015. CA Cancer J. Clin. 66, 115–132 (2016).

  2. 2.

    Torre, L. A. et al. Global cancer statistics, 2012. CA Cancer J. Clin. 65, 87–108 (2015).

  3. 3.

    Abnet, C. C. et al. A shared susceptibility locus in PLCE1 at 10q23 for gastric adenocarcinoma and esophageal squamous cell carcinoma. Nat. Genet. 42, 764–767 (2010).

  4. 4.

    Wang, L. D. et al. Genome-wide association study of esophageal squamous cell carcinoma in Chinese subjects identifies susceptibility loci at PLCE1 and C20orf54. Nat. Genet. 42, 759–763 (2010).

  5. 5.

    Wu, C. et al. Genome-wide association study identifies three new susceptibility loci for esophageal squamous-cell carcinoma in Chinese populations. Nat. Genet. 43, 679–684 (2011).

  6. 6.

    Wu, C. et al. Genome-wide association analyses of esophageal squamous cell carcinoma in Chinese identify multiple susceptibility loci and gene-environment interactions. Nat. Genet. 44, 1090–1097 (2012).

  7. 7.

    Wu, C. et al. Joint analysis of three genome-wide association studies of esophageal squamous cell carcinoma in Chinese populations. Nat. Genet. 46, 1001–1006 (2014).

  8. 8.

    Chang, J. et al. Risk prediction of esophageal squamous-cell carcinoma with common genetic variants and lifestyle factors in Chinese population. Carcinogenesis 34, 1782–1786 (2013).

  9. 9.

    Huyghe, J. R. et al. Exome array analysis identifies new loci and low-frequency variants influencing insulin processing and secretion. Nat. Genet. 45, 197–201 (2013).

  10. 10.

    Wessel, J. et al. Low-frequency and rare exome chip variants associate with fasting glucose and type 2 diabetes susceptibility. Nat. Commun. 6, 5897 (2015).

  11. 11.

    Jin, G. et al. Low-frequency coding variants at 6p21.33 and 20q11.21 are associated with lung cancer risk in Chinese populations. Am. J. Hum. Genet. 96, 832–840 (2015).

  12. 12.

    Rhee, E. P. et al. An exome array study of the plasma metabolome. Nat. Commun. 7, 12360 (2016).

  13. 13.

    Zhou, F. et al. Deep sequencing of the MHC region in the Chinese population contributes to studies of complex disease. Nat. Genet. 48, 740–746 (2016).

  14. 14.

    Altucci, L. & Gronemeyer, H. The promise of retinoids to fight against cancer. Nat. Rev. Cancer 1, 181–193 (2001).

  15. 15.

    Lu, T. Y. et al. Inhibition effects of all trans-retinoic acid on the growth and angiogenesis of esophageal squamous cell carcinoma in nude mice. Chin. Med. J. (Engl.) 124, 2708–2714 (2011).

  16. 16.

    Alizadeh, F. et al. Retinoids and their biological effects against cancer. Int. Immunopharmacol. 18, 43–49 (2014).

  17. 17.

    Chen, M. C., Hsu, S. L., Lin, H. & Yang, T. Y. Retinoic acid and cancer treatment. Biomedicine (Taipei) 4, 22 (2014).

  18. 18.

    Kim, H. et al. The retinoic acid synthesis gene ALDH1a2 is a candidate tumor suppressor in prostate cancer. Cancer Res. 65, 8118–8124 (2005).

  19. 19.

    Hou, J. et al. Hepatic RIG-I predicts survival and interferon-α therapeutic response in hepatocellular carcinoma. Cancer Cell 25, 49–63 (2014).

  20. 20.

    Zhang, N. N. et al. RIG-I plays a critical role in negatively regulating granulocytic proliferation. Proc. Natl. Acad. Sci. USA 105, 10553–10558 (2008).

  21. 21.

    Jiang, L. J. et al. RA-inducible gene-I induction augments STAT1 activation to inhibit leukemia cell proliferation. Proc. Natl. Acad. Sci. USA 108, 1897–1902 (2011).

  22. 22.

    Bhattacharya, N. et al. Normalizing microbiota-induced retinoic acid deficiency stimulates protective CD8+ T cell-mediated immunity in colorectal cancer. Immunity 45, 641–655 (2016).

  23. 23.

    Goralczyk, R. Beta-carotene and lung cancer in smokers: review of hypotheses and status of research. Nutr. Cancer 61, 767–774 (2009).

  24. 24.

    Li, T., Molteni, A., Latkovich, P., Castellani, W. & Baybutt, R. C. Vitamin A depletion induced by cigarette smoke is associated with the development of emphysema in rats. J. Nutr. 133, 2629–2634 (2003).

  25. 25.

    Xue, Y., Harris, E., Wang, W. & Baybutt, R. C. Vitamin A depletion induced by cigarette smoke is associated with an increase in lung cancer-related markers in rats. J. Biomed. Sci. 22, 84 (2015).

  26. 26.

    Clugston, R. D. & Blaner, W. S. The adverse effects of alcohol on vitamin A metabolism. Nutrients 4, 356–371 (2012).

  27. 27.

    Menendez, J. A. & Lupu, R. Fatty acid synthase and the lipogenic phenotype in cancer pathogenesis. Nat. Rev. Cancer 7, 763–777 (2007).

  28. 28.

    Kuhajda, F. P. Fatty-acid synthase and human cancer: new perspectives on its role in tumor biology. Nutrition 16, 202–208 (2000).

  29. 29.

    Kuhajda, F. P. Fatty acid synthase and cancer: new application of an old pathway. Cancer Res. 66, 5977–5980 (2006).

  30. 30.

    Nguyen, P. L. et al. Fatty acid synthase polymorphisms, tumor expression, body mass index, prostate cancer risk, and survival. J. Clin. Oncol. 28, 3958–3964 (2010).

  31. 31.

    Eggert, S. L. et al. Genome-wide linkage and association analyses implicate FASN in predisposition to Uterine Leiomyomata. Am. J. Hum. Genet. 91, 621–628 (2012).

  32. 32.

    Campa, D. et al. Genetic variation in genes of the fatty acid synthesis pathway and breast cancer risk. Breast Cancer Res. Treat. 118, 565–574 (2009).

  33. 33.

    Wuerges, J. et al. Structural basis for mammalian vitamin B12 transport by transcobalamin. Proc. Natl. Acad. Sci. USA 103, 4386–4391 (2006).

  34. 34.

    Tajuddin, S. M. et al. Genetic and non-genetic predictors of LINE-1 methylation in leukocyte DNA. Environ. Health Perspect. 121, 650–656 (2013).

  35. 35.

    Hazra, A. et al. Twenty-four non-synonymous polymorphisms in the one-carbon metabolic pathway and risk of colorectal adenoma in the Nurses’ Health Study. Carcinogenesis 28, 1510–1519 (2007).

  36. 36.

    Hazra, A. et al. Germline polymorphisms in the one-carbon metabolism pathway and DNA methylation in colorectal cancer. Cancer Causes Control 21, 331–345 (2010).

  37. 37.

    Martinelli, M. et al. A candidate gene study of one-carbon metabolism pathway genes and colorectal cancer risk. Br. J. Nutr. 109, 984–989 (2013).

  38. 38.

    Tse, K. P. et al. Genome-wide association study reveals multiple nasopharyngeal carcinoma-associated loci within the HLA region at chromosome 6p21.3. Am. J. Hum. Genet. 85, 194–203 (2009).

  39. 39.

    Shiraishi, K. et al. A genome-wide association study identifies two new susceptibility loci for lung adenocarcinoma in the Japanese population. Nat. Genet. 44, 900–903 (2012).

  40. 40.

    Savage, S. A. et al. Genome-wide association study identifies two susceptibility loci for osteosarcoma. Nat. Genet. 45, 799–803 (2013).

  41. 41.

    Chen, D. et al. Genome-wide association study of susceptibility loci for cervical cancer. J. Natl. Cancer Inst. 105, 624–633 (2013).

  42. 42.

    Asumalahti, K. et al. Coding haplotype analysis supports HCR as the putative susceptibility gene for psoriasis at the MHC PSORS1 locus. Hum. Mol. Genet. 11, 589–597 (2002).

  43. 43.

    Orozco, G. et al. Common genetic variants associated with disease from genome-wide association studies are mutually exclusive in prostate cancer and rheumatoid arthritis. BJU Int. 111, 1148–1155 (2013).

  44. 44.

    Kote-Jarai, Z. et al. Seven prostate cancer susceptibility loci identified by a multi-stage genome-wide association study. Nat. Genet. 43, 785–791 (2011).

  45. 45.

    Mao, J. R. et al. Tenascin-X deficiency mimics Ehlers-Danlos syndrome in mice through alteration of collagen deposition. Nat. Genet. 30, 421–425 (2002).

  46. 46.

    Ikuta, T., Ariga, H. & Matsumoto, K. Extracellular matrix tenascin-X in combination with vascular endothelial growth factor B enhances endothelial cell proliferation. Genes Cells 5, 913–927 (2000).

  47. 47.

    Minamitani, T., Ariga, H. & Matsumoto, K. Adhesive defect in extracellular matrix tenascin-X-null fibroblasts: a possible mechanism of tumor invasion. Biol. Pharm. Bull. 25, 1472–1475 (2002).

  48. 48.

    Aggarwal, B. B. Signalling pathways of the TNF superfamily: a double-edged sword. Nat. Rev. Immunol. 3, 745–756 (2003).

  49. 49.

    Niwa, Y. et al. Lymphotoxin-alpha polymorphism and the risk of cervical cancer in Japanese subjects. Cancer Lett. 218, 63–68 (2005).

  50. 50.

    Wang, S. S. et al. Common gene variants in the tumor necrosis factor (TNF) and TNF receptor superfamilies and NF-kB transcription factors and non-Hodgkin lymphoma risk. PLoS One 4, e5360 (2009).

  51. 51.

    Aissani, B. et al. The major histocompatibility complex conserved extended haplotype 8.1 in AIDS-related non-Hodgkin lymphoma. J. Acquir. Immune Defic. Syndr. 52, 170–179 (2009).

  52. 52.

    Skibola, C. F. et al. Tumor necrosis factor (TNF) and lymphotoxin-alpha (LTA) polymorphisms and risk of non-Hodgkin lymphoma in the InterLymph Consortium. Am. J. Epidemiol. 171, 267–276 (2010).

  53. 53.

    Lu, R. et al. A functional polymorphism of lymphotoxin-alpha (LTA) gene rs909253 is associated with gastric cancer risk in an Asian population. Cancer. Epidemiol. 36, e380–e386 (2012).

  54. 54.

    Zhang, Y. et al. Tumor necrosis factor-α induced protein 8 polymorphism and risk of non-Hodgkin’s lymphoma in a Chinese population: a case-control study. PLoS One 7, e37846 (2012).

  55. 55.

    Zhou, P. et al. The lymphotoxin-α 252A>G polymorphism and breast cancer: a meta-analysis. Asian Pac. J. Cancer Prev. 13, 1949–1952 (2012).

  56. 56.

    Sainz, J. et al. Effect of type 2 diabetes predisposing genetic variants on colorectal cancer risk. J. Clin. Endocrinol. Metab. 97, E845–E851 (2012).

  57. 57.

    Huang, Y. et al. Four genetic polymorphisms of lymphotoxin-alpha gene and cancer risk: a systematic review and meta-analysis. PLoS One 8, e82519 (2013).

  58. 58.

    Goldstein, J. I. et al. zCall: a rare variant caller for array-based genotyping: genetics and population analysis. Bioinformatics 28, 2543–2545 (2012).

  59. 59.

    Price, A. L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904–909 (2006).

  60. 60.

    Lou, J. et al. A functional polymorphism located at transcription factor binding sites, rs6695837 near LAMC1 gene, confers risk of colorectal cancer in Chinese populations. Carcinogenesis 38, 177–183 (2017).

  61. 61.

    Li, J. et al. A low-frequency variant in SMAD7 modulates TGF-β signaling and confers risk for colorectal cancer in Chinese population. Mol. Carcinog. 56, 1798–1807 (2017).

  62. 62.

    Li, Y., Willer, C. J., Ding, J., Scheet, P. & Abecasis, G. R. MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes. Genet. Epidemiol. 34, 816–834 (2010).

  63. 63.

    Jia, X. et al. Imputing amino acid polymorphisms in human leukocyte antigens. PLoS One 8, e64683 (2013).

  64. 64.

    Pruim, R. J. et al. LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics 26, 2336–2337 (2010).

  65. 65.

    Kane, M. A., Folias, A. E., Wang, C. & Napoli, J. L. Quantitative profiling of endogenous retinoic acid in vivo and in vitro by tandem mass spectrometry. Anal. Chem. 80, 1702–1708 (2008).

  66. 66.

    Chen, W. et al. Genome-wide association analyses provide genetic and biochemical insights into natural variation in rice metabolism. Nat. Genet. 46, 714–721 (2014).

  67. 67.

    Kane, M. A., Chen, N., Sparks, S. & Napoli, J. L. Quantification of endogenous retinoic acid in limited biological samples by LC/MS/MS. Biochem. J. 388, 363–369 (2005).

Download references


This work was supported by the National Key Research and Development Plan Program (2016YFC1302702 to X.M., 2016YFC1302701 to C.W. and 2016YFC1302703 to R.Z.); the National Program for Support of Top-notch Young Professionals, National Natural Science Foundation of China (81171878, 81222038 to X.M.); the Fok Ying Tung Foundation for Young Teachers in the Higher Education Institutions of China (131038 to X.M.); and the Program for HUST Academic Frontier Youth Team (to X.M.).

Author information

X.M. and C.W. were the overall principal investigators of this study, who conceived the study and obtained financial support, were responsible for study design and oversaw the entire study, and synthesized the paper. J.C. performed statistical analyses, interpreted the results and drafted the initial manuscript. J.C., R.Z., J.T., J. Li, K.Z., J.K., J. Lou, W.C., B.Z., N.S., Y. Zhang, Y.G., Y.Y., Y. Zhu, D.Z. and X.P. performed laboratory analyses. Z.Z. and X.Z. were responsible for patient recruitment and sample preparation from Hebei province. K.H., T.W. and D.L. reviewed the manuscript. All authors have approved the final report for publication.

Correspondence to Chen Wu or Xiaoping Miao.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1

Summary of the study design and work flow

Supplementary Figure 2 Plots for genetic matching of three principal components derived from the PCA of 3,714 cases with ESCC and 3,880 controls, and 206 HapMap individuals without relationships

(a) PC1 versus PC2 for 3,714 cases, 3,880 controls and 206 HapMap individuals, including 57 YRIs, 60 CEUs, 44 JPTs, and 45 CHBs. (b) PC1 versus PC2 for 3,714 ESCC cases and 3,880 controls. (c) PC1 versus PC3 for 3,714 ESCC cases and 3,880 controls. (d) PC2 versus PC3 for 3,714 ESCC cases and 3,880 controls. The case-control matching suggested minimal evidence of population stratification.

Supplementary Figure 3 Quantile-quantile plot and genomic inflation factor lambda for associations with ESCC risk

The results were based on 3,714 ESCC cases and 3,880 controls in the discovery stage of this study. The red circles represent the distribution of P values for the association in the discovery stage. The observed versus expected χ2 test statistics shows no evidence for inflation of χ2 tests (inflation factor λ = 1.032).

Supplementary Figure 4 Regional plots of association results and recombination rates within the four significant susceptibility loci

(a-f) rs130079 (a), rs117353193 (b), rs204900 (c), rs1041981 (d), rs138478634 (e) and rs17848945 (f). The association results were based on imputation results of 3,714 ESCC cases and 3,880 controls in the discovery stage of this study. P values are two sided and were calculated by an additive model in logistic regression analysis adjusted for sex, age, smoking status, drinking status and the first three principle components. For each plot, the −log10P values (y-axis) of the SNPs are presentedaccording to their chromosomal positions (x-axis). The genetic recombination rates (cM/Mb) estimated using the 1000 Genomes June2014 ASN samples are shown with ablue line; we annotated the genes within the region of interest, and these genes are shown as arrows. The LD r2 values were calculated using pairwise linkage disequilibrium analyses. The top genotyped SNP is labeled by rs ID, and the r2 values of the rest of the SNPs with the top genotyped SNP are indicated by different colors.

Supplementary Figure 5 Linkage disequilibrium plot of rs117353193

(a) Regional plot of LD r2 and recombination rates in a 1-Mb region centered by rs117353193. The LD r2 was calculated based on the 1000 Genomes phase 3 ASN population. (b) The LD block plot of variants with LD r2 > 0.1 for rs117353193. The LD r2 was calculated using pairwise linkage disequilibrium analyses in PLINK based on 504 individuals from the 1000 Genomes phase 3 ASN population.

Supplementary Figure 6 Linkage disequilibrium plot of rs17848945

(a) Regional plot of LD r2 and recombination rates in a 1-Mb region centered by rs17848945. The LD r2 was calculated based on the 1000 Genomes phase 3 ASN population. (b) The LD block plot of variants with LD r2 > 0.1 for rs17848945. The LD r2 was calculated using pairwise linkage disequilibrium analyses in PLINK based on 504 individuals from the 1000 Genomes phase 3 ASN population.

Supplementary Figure 7 Linkage disequilibrium plot of rs138478634

(a) Regional plot of LD r2 and recombination rates in a 1-Mb region centered by rs138478634. The LD r2 was calculated based on the 1000 Genomes phase 3 ASN population. (b) The LD block plot of variants with LD r2 > 0.1 for rs138478634. The LD r2 was calculated using pairwise linkage disequilibrium analyses in PLINK based on 504 individuals from the 1000 Genomes phase 3 ASN population.

Supplementary Figure 8 Stratification analysis of the association between risk of ESCC and the six identified SNPs

(a-f) rs130079 (a), rs117353193 (b), rs204900 (c), rs1041981 (d), rs138478634 (e) and rs17848945 (f). Each box and horizontal line represent the OR point estimate and 95% CI derived from the additive model. The analyses were based on 10,716 ESCC cases and 12,637 controls in this study. The area of each box is proportional to the statistical weight of the study. The heterogeneity P values are shown in the right side of the plots.

Supplementary Figure 9 Histogram distribution of minor allele frequencies of variants interrogated in this study in controls

The y-axis shows number of variants. The x-axis shows range of minor allele frequencies.

Supplementary Figure 10 Result of the test of transfection efficiency

(a-f) Relative expression levels of CYP26B1 are shown as determined by western blot (a-d) or qRT–PCR (e,f). The western blot experiment was repeated independently three times with similar results. KYSE30 and KYSE150 cells were transfected with CYP26B1[G], CYP26B1[A] and control vector (a,c,e) or targeting siRNAs and siControl (b,d,f). (a,b) Cropped western blot are shown. (c,d) Full scans of western blots are shown. (e,f) Results present means ± s.e.m. from three independent experiments and each had three replications. P values were compared with control by two-sided unpaired Student’s t-test.

Supplementary information

Supplementary Figures and Tables

Supplementary Figures 1–10 and Supplementary Tables 1–13

Life Sciences Reporting Summary

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Further reading