
Genome sequencing analysis identifies Epstein–Barr virus subtypes associated with high risk of nasopharyngeal carcinoma


Epstein–Barr virus (EBV) infection is ubiquitous worldwide and is associated with multiple cancers, including nasopharyngeal carcinoma (NPC). The importance of EBV genomic variation in NPC development and in the striking NPC epidemic in southern China has been poorly explored. Through large-scale genome sequencing of 270 EBV isolates and a two-stage association study of EBV isolates from China, we identify two non-synonymous EBV variants within BALF2 that are strongly associated with the risk of NPC (odds ratio (OR) = 8.69, P = 9.69 × 10⁻²⁵ for SNP 162476_C; OR = 6.14, P = 2.40 × 10⁻³² for SNP 163364_T). The cumulative effects of these variants contribute to 83% of the overall risk of NPC in southern China. Phylogenetic analysis of the risk variants reveals a unique origin in Asia, followed by clonal expansion in NPC-endemic regions. Our results provide novel insights into NPC endemicity in southern China and also enable the identification of high-risk individuals for NPC prevention.
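To make the two headline quantities concrete, the sketch below computes a crude odds ratio with a Wald 95% confidence interval from a 2×2 carrier table, and a population attributable fraction using Miettinen's case-based formula, PAF = p_case × (OR − 1) / OR. This is only an illustration and not the analysis used in the study. The counts are hypothetical placeholders rather than data from the article, and the function names are invented for the example.

```python
import math


def odds_ratio(case_carrier, case_noncarrier, ctrl_carrier, ctrl_noncarrier):
    """Crude odds ratio and Wald 95% CI from a 2x2 table of variant carriers."""
    or_ = (case_carrier * ctrl_noncarrier) / (case_noncarrier * ctrl_carrier)
    # Standard error of log(OR) is the root of the summed reciprocal cell counts.
    se = math.sqrt(
        1 / case_carrier + 1 / case_noncarrier + 1 / ctrl_carrier + 1 / ctrl_noncarrier
    )
    lower = math.exp(math.log(or_) - 1.96 * se)
    upper = math.exp(math.log(or_) + 1.96 * se)
    return or_, lower, upper


def attributable_fraction(or_, case_carrier_fraction):
    """Miettinen's formula: share of cases attributable to carrying the variant."""
    return case_carrier_fraction * (or_ - 1) / or_


# Hypothetical counts, chosen only to illustrate the arithmetic (NOT study data).
cases = {"carrier": 90, "noncarrier": 10}
controls = {"carrier": 50, "noncarrier": 50}

or_, lower, upper = odds_ratio(
    cases["carrier"], cases["noncarrier"], controls["carrier"], controls["noncarrier"]
)
paf = attributable_fraction(or_, cases["carrier"] / (cases["carrier"] + cases["noncarrier"]))
print(f"OR = {or_:.2f} (95% CI {lower:.2f}-{upper:.2f}), PAF = {paf:.0%}")
```

A crude 2×2 calculation like this ignores covariates and population structure, so it should be read as a back-of-the-envelope check rather than a substitute for a model-based estimate.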


Data availability

The EBV sequencing data are deposited in the US National Center for Biotechnology Information (NCBI) database under BioProject ID PRJNA522388. EBV sequences are released in the NCBI database under GenBank IDs MK540241–MK540470.
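For readers who want to retrieve the deposited sequences, the following sketch enumerates the GenBank accessions and builds an NCBI E-utilities EFetch request URL for a small batch. It assumes that the two identifiers quoted above denote the inclusive range MK540241 to MK540470. The helper names `accession_range` and `efetch_url` are hypothetical, and no network request is made.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities EFetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def accession_range(first, last):
    """List accessions from first to last inclusive, e.g. MK540241..MK540470."""
    prefix = first[:2]
    if last[:2] != prefix:
        raise ValueError("accessions must share the same two-letter prefix")
    width = len(first) - 2  # preserve zero padding of the numeric part
    start, end = int(first[2:]), int(last[2:])
    return [f"{prefix}{n:0{width}d}" for n in range(start, end + 1)]


def efetch_url(accessions, rettype="fasta"):
    """Build an EFetch URL that returns the given nucleotide records as text."""
    query = urlencode(
        {
            "db": "nuccore",
            "id": ",".join(accessions),
            "rettype": rettype,
            "retmode": "text",
        }
    )
    return f"{EFETCH_URL}?{query}"


# Assumption: the deposited records form one contiguous, inclusive range.
accessions = accession_range("MK540241", "MK540470")
print(len(accessions), "accessions; first batch URL:")
print(efetch_url(accessions[:5]))
```

Requesting the records in small batches, as in the last line, keeps each URL short and is gentler on the NCBI servers than one request for the full list.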




Acknowledgements

We thank all of the participants for their generous support of the current study. We also thank R. Sun, C. Wang, H. Chen, J. Shen and C. Jie for helpful discussions on viral biology and on genetic statistical, evolutionary and phylogenetic analyses; W.-S. Liu and X. Zuo for providing code support; Z. Lin (Tulane University) for kindly sharing EBV genome annotation files; and J.-Y. Shao from Sun Yat-sen University Cancer Center for providing the MassArray iPlex platform. This work was supported by the National Natural Science Foundation of China (81430059 to Y.-X.Z. and 81872228 to M.X.), the National Key R&D Program of China (2016YF0902000 to Y.-X.Z., and 2018YFC1406902 and 2018YFC0910400 to W.Z.), the National Cancer Institute at the US National Institutes of Health (NIH) (R01CA115873-01 to H.-O.A. and Y.-X.Z., and R35-CA197449, P01-CA134294, U01-HG009088 and U19-CA203654 to X.L.) and the Agency of Science, Technology and Research (A*STAR), Singapore (to J.L.).

Author information

Y.-X.Z., J.L. and W.Z. were the principal investigators who conceived the study. Y.-X.Z., J.L., W.Z. and M.X. designed and oversaw the study. J.L. and X.L. supervised the viral genome-wide association studies. W.W. supervised phylogenetic analysis. M.X. contributed to sample preparation, sequencing, genotyping, variant calling and genetic statistical analyses. Y.Y. contributed to sequencing, genotyping and variant calling. H.C. contributed to phylogenetic analyses. S.Z. contributed to genotyping and genetic statistical analyses. Z.Li contributed to genetic statistical analyses. Z.Z. contributed to collection of samples from the First Affiliated Hospital of Guangxi Medical College. B.L. contributed to collection of samples from the Affiliated Hospital of the Qingdao University. X.G., M.-Y.C., R.P. and R.-H.X. contributed to collection of samples from Sun Yat-sen University Cancer Center. H.-O.A., W.Y. and Y.-X.Z. supervised the design and implementation of the population-based case–control study in Zhaoqing. W.Y., E.T.C., S.-M.C., S.-H.X. and Z.Liu participated in the case–control study. The manuscript was drafted by M.X., J.L., W.Z. and Y.-X.Z., and revised by V.P. and E.T.C. All authors critically reviewed the article and approved the final manuscript.

Correspondence to Weiwei Zhai or Yi-Xin Zeng or Jianjun Liu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figures 1–15, Supplementary Tables 2–4, 6–17, 19 and 20, and Supplementary Note

Reporting Summary

Supplementary Table 1

Information on the 270 EBV isolates sequenced in the current study and the 97 publicly available genomes included in our study

Supplementary Table 5

Variant information for the EBV genome isolates sequenced in the current study

Supplementary Table 18

The percentage of heterozygous variants in the 270 EBV genome isolates

Fig. 1: Principal component and phylogenetic analyses of EBV genomes.
Fig. 2: Genome-wide association analysis of EBV variants in 156 NPC cases and 47 controls.