Endemic Burkitt Lymphoma in second-degree relatives in Northern Uganda: in-depth genome-wide analysis suggests clues about genetic susceptibility

Mateus H. Gouveia ● Isaac Otim ● Martin D. Ogwang ● Mingyi Wang ● Bin Zhu ● Nathan Cole ● Wen Luo ● Belynda Hicks ● Kristine Jones ● Kathrin Oehl-Huber ● Leona W. Ayers ● Stefania Pittaluga ● Ismail D. Legason ● Hadijah Nabalende ● Patrick Kerchan ● Tobias Kinyera ● Esther Kawira ● Glen Brubaker ● Arthur G. Levin ● Lutz Guertler ● Jung Kim ● Douglas R. Stewart ● Melissa Adde ● Ian Magrath ● Andrew W. Bergen ●

Using the 198 representative children with confirmed eBL in Northern Uganda as a denominator, we estimate that about 1% of eBL cases may be genetically related [2]. The two children lived near each other in an area of high malaria transmission (Fig. 1a). They presented with a short history of symptoms, with onset three months apart, and were diagnosed with eBL based on histological criteria (Fig. 1b): consistent morphology on hematoxylin and eosin stains, positivity for B-cell and germinal center markers, a high proliferation index, EBV RNA, and MYC translocation. Genome-wide copy number variation (CNV) analyses of the tumor DNA by OncoScan SNP array (Fig. 1c) and WES (Fig. S2) were consistent with reports that BL has a simple karyotype [5]. Specifically, genomic imbalance mapping of the tumors revealed only a few alterations besides deletions at the IG loci, which are suggestive of clonal IG rearrangement (Table S1). In the first patient, these included an apparent copy-number-neutral loss of heterozygosity (CNN-LOH) in 17p spanning TP53 and a heterozygous loss in 18q21.32 spanning CCBE1.
Alterations in the second patient were consistent with IG rearrangement-associated loss and a putative low-level (subclonal) gain in 1q, an alteration that is highly recurrent in BL (Fig. 1c).
Fig. 1 (partial caption). b Images are ×20 magnification; scale bar, 25 μm. c Genome-wide detection of chromosomal imbalances in FFPE tumor tissue by OncoScan array analysis. For each tumor, the imbalance and B-allele frequency plots are shown. The upper panel depicts case 1 (male), in which a loss at the IGK locus in 2p11 (left arrow), a copy-number-neutral loss of 17p13.3p13.1 (including TP53, middle arrow), and a heterozygous loss in 18q21.32 (including CCBE1, right arrow) were called. The lower panel shows case 2, with a loss at the IGH locus in 14q32 (arrow) and a probable (subclonal) gain in 1q that was just below the diagnostic threshold for calling by the evaluation pipeline.
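The ~1% figure quoted at the opening of this section is a simple proportion: two related cases out of the 198 confirmed eBL cases used as the denominator. A minimal Python sketch of that arithmetic, assuming the numerator is exactly the two children described here:

```python
# Minimal sketch: proportion of confirmed eBL cases that are genetically related.
# Assumption: numerator = the two related children; denominator = 198 confirmed cases.
related_cases = 2
total_cases = 198

proportion = related_cases / total_cases
print(f"{proportion:.2%}")  # roughly 1%
```

No confidence interval is computed here, in keeping with the study's decision not to apply formal statistics to such small numbers.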
The discovery of BL in two close relatives triggered a review of their medical records, which confirmed Stage C, high-risk eBL in both patients. Both responded to treatment (INCTR 03-06 protocol) and were cured (Table 1). Both patients reported their paternal tribe as Langi, which belongs to the Western Nilotic ethno-linguistic group. Consistent with this, ADMIXTURE analysis showed that both children have more than 50% Nilotic ancestry (Fig. 1a).
We analyzed ~400,000 variants in germline WES and 282 mutations identified in tumor WES of the two children (Fig. S1). We focused on candidate germline variants, defined as those with moderately to highly deleterious Combined Annotation Dependent Depletion (CADD) scores [9] or those mapped to genes that are recurrently mutated in BL tumors (n = 61) or in other cancers (n = 180) (Table S2) [5,6,10].
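The candidate definition above is an OR of two criteria: a deleteriousness score cutoff or membership in a curated gene list. The Python sketch below illustrates that logic only. The CADD cutoff of 10 is an assumption borrowed from the threshold used later in the text, the gene sets are small illustrative stand-ins (not the actual 241 genes of Table S2), and all variant IDs and placeholder gene names are hypothetical.

```python
from dataclasses import dataclass

# Assumed cutoff for "moderately to highly deleterious" (phred-scaled CADD).
CADD_PHRED_THRESHOLD = 10.0
# Illustrative subsets only; the real lists (61 BL genes, 180 other-cancer genes) are in Table S2.
BL_GENES = {"CHD8", "ID3", "TCF3", "MYC", "CCND3"}
OTHER_CANCER_GENES = {"GENE_A", "GENE_B"}  # hypothetical placeholders


@dataclass(frozen=True)
class Variant:
    variant_id: str   # hypothetical identifier
    gene: str
    cadd_phred: float


def is_candidate(v: Variant) -> bool:
    """Candidate if predicted deleterious OR located in a candidate gene."""
    deleterious = v.cadd_phred > CADD_PHRED_THRESHOLD
    in_candidate_gene = v.gene in BL_GENES or v.gene in OTHER_CANCER_GENES
    return deleterious or in_candidate_gene


# Toy input (hypothetical variants).
variants = [
    Variant("var1", "CHD8", 9.9),     # low score, but in a BL gene
    Variant("var2", "GENE_X", 18.7),  # high score, gene not on any list
    Variant("var3", "GENE_Y", 2.0),   # neither criterion met
]
candidates = [v.variant_id for v in variants if is_candidate(v)]
print(candidates)  # ['var1', 'var2']
```

Using an OR rather than an AND keeps low-scoring variants in known BL genes (such as an intronic CHD8 SNP) in the candidate set.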
We identified 106,404 germline variants that were identical by state (~¼ of the total) in the two children, in agreement with their estimated second-degree relatedness. Of these variants, 254 were rare, with minor allele frequency (MAF) ≤ 0.01 in the ancestrally similar (Nilotic) reference population from Shirati, Tanzania (Tables S3 and S4), and 784 were uncommon based on a less stringent threshold of MAF ≤ 0.05 (Table S5). Fourteen of the rare variants had phred-scaled CADD scores >10 (range 10.3–25.3) (Table S3). All variants were validated by manual review in the Integrative Genomics Viewer (IGV; Fig. S3A). None of the 14 variants mapped to a region previously suggested to be of regulatory importance in BL, based on analysis of differential chromatin accessibility in BL-derived cell lines or non-neoplastic B cells, or to a region showing differential DNA methylation in BL [11] (Table S6). Among the 241 candidate genes (Table S2), we identified an intronic SNP (rs772535596) in CHD8 (Tables S3 and S4) with a phred-scaled CADD score of ~10, consistent with "moderate evidence of pathogenicity" [12,13]. CHD8 was recently identified as recurrently mutated in BL by the BL Genome Sequencing Project [4]. The rs772535596 SNP was not observed in the germline DNA of unrelated Nilotic individuals from Shirati with BL (n = 30) or without BL (n = 80).
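The sharing-then-filtering steps in this paragraph can be sketched as: keep variants at which both children carry the same genotype, then bin them by reference-population MAF. The Python below is a simplified illustration, not the authors' pipeline. The "chrom:pos:ref:alt" keys, genotypes, and MAF values are hypothetical, the MAF bins are treated as mutually exclusive (the text does not say whether the 784 uncommon variants include the 254 rare ones), and identity by state also includes chance sharing, so the shared fraction is only a rough proxy for the ~25% identity by descent expected of second-degree relatives.

```python
# Simplified sketch: shared (identical-by-state) variants, then MAF classes.
RARE_MAF = 0.01      # "rare" threshold used in the text
UNCOMMON_MAF = 0.05  # less stringent "uncommon" threshold


def shared_ibs(child1: dict[str, str], child2: dict[str, str]) -> set[str]:
    """Return variant keys at which both children carry the same genotype."""
    return {k for k, gt in child1.items() if child2.get(k) == gt}


def maf_class(maf: float) -> str:
    """Mutually exclusive MAF bins (an assumption of this sketch)."""
    if maf <= RARE_MAF:
        return "rare"
    if maf <= UNCOMMON_MAF:
        return "uncommon"
    return "common"


# Toy data: hypothetical variants, genotypes, and reference MAFs.
child1 = {"1:100:A:G": "0/1", "2:200:C:T": "1/1", "3:300:G:A": "0/1"}
child2 = {"1:100:A:G": "0/1", "2:200:C:T": "0/1", "3:300:G:A": "0/1"}
ref_maf = {"1:100:A:G": 0.004, "3:300:G:A": 0.03}

shared = shared_ibs(child1, child2)
classes = {k: maf_class(ref_maf[k]) for k in sorted(shared)}
print(classes)  # {'1:100:A:G': 'rare', '3:300:G:A': 'uncommon'}

# Shared fraction from the text: 106,404 of roughly 400,000 variants.
print(round(106_404 / 400_000, 3))  # 0.266
```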
The 14 candidate variants in 14 genes identified in the germline DNA of both children (Table S3) were classified by InterVar as variants of unknown significance (VUS), benign, or likely benign. Most of the variants were rare (MAF < 1% in the gnomAD database). Of potential importance was an intronic indel (rs374301928, ATT) upstream of exon 12 of 20 (NM_001369568) in the TCF4 gene. This variant had a phred-scaled CADD score of 18.7 and "supporting evidence of pathogenicity" according to VarSome [12,13], although it is currently classified as VUS by InterVar. This variant was also observed in one of the unrelated Nilotic BL patients from Shirati (for a total of […]; Table S3), suggesting that it is a rare African-specific variant. The variant lies in a highly conserved genomic region, with predicted transcription factor binding sites adjacent to exon 12 (Fig. S4). Of interest, the rs374301928 TCF4 locus appears to have been subject to early negative selection among vertebrate species and archaic hominins, which may indicate a relevant regulatory function [14]. Of pathological relevance to BL, TCF4 encodes transcription factor 4, a helix-loop-helix protein reported to interact with ID3, which is inactivated by recurrent mutations in up to two-thirds of BL cases [6,7,15]. Somatic TCF4 deregulation has been implicated as a mechanism alternative to ID3-inactivating or TCF3-activating mutations in BL [6,15]. Our somatic WES analysis showed that both patients lacked mutations in ID3 and TCF3. Since TCF4 has been implicated in the ID3/TCF3 pathway in BL [6,7,15], the observed germline/somatic pattern in our cases raises the question of whether germline TCF4 variants could have an effect comparable to somatic involvement of ID3/TCF3. In view of their potential significance, both the CHD8 and TCF4 germline variants were verified by Sanger sequencing (Fig. S3B).
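Phred-scaled CADD scores are easier to interpret once converted back to ranks. By the CADD developers' definition, a scaled score C corresponds roughly to the top 10^(-C/10) fraction of all possible reference-genome single-nucleotide substitutions, so 10 means the top 10% and 20 the top 1%. The Python helpers below are a sketch of that conversion; the function names are hypothetical, and the mapping is approximate because scaled scores are rank-based rather than derived from a closed-form formula.

```python
import math


def cadd_top_fraction(phred: float) -> float:
    """Approximate fraction of possible substitutions scoring at least this high."""
    return 10 ** (-phred / 10)


def cadd_phred(top_fraction: float) -> float:
    """Inverse mapping: rank fraction back to a phred-scaled score."""
    return -10 * math.log10(top_fraction)


# Scores mentioned in the text: the >10 cutoff, the TCF4 indel (18.7),
# the >20 somatic cutoff, and the top of the observed range (25.3).
for score in (10.0, 18.7, 20.0, 25.3):
    print(f"CADD {score:>4}: top {cadd_top_fraction(score):.2%}")
```

On this scale, the TCF4 indel's score of 18.7 places it in roughly the top 1 to 2% of predicted deleteriousness.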
The somatic WES analysis of these two children identified 29 mutations in core genes reported as recurrently mutated in other BL studies, including CCND3, MYC, and USP7 in one child and BCL7A and DDX3X in the other (Table S7) [4–7]. Most (266) of the somatic mutations were unique to each child's tumor (Tables S7 and S8). In addition to the mutations in candidate BL genes, we identified 253 mutations in genes not previously reported in BL or other cancers; 70 of these had phred-scaled CADD scores > 20 (Table S8).
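The somatic counts in this paragraph add up to the 282 tumor WES mutations reported earlier: 29 in core BL genes plus 253 in genes not previously reported. A small Python bookkeeping check, assuming the two categories are disjoint as the wording implies:

```python
# Bookkeeping check on the somatic tumor WES counts quoted in the text.
# Assumption: "core gene" and "not previously reported" mutations are disjoint.
TOTAL_SOMATIC = 282
CORE_BL_GENE_MUTATIONS = 29
NOVEL_GENE_MUTATIONS = 253
NOVEL_CADD_GT_20 = 70

partition_ok = CORE_BL_GENE_MUTATIONS + NOVEL_GENE_MUTATIONS == TOTAL_SOMATIC
high_cadd_share = NOVEL_CADD_GT_20 / NOVEL_GENE_MUTATIONS

print(partition_ok)              # True
print(f"{high_cadd_share:.1%}")  # share of novel-gene mutations with CADD > 20
```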
Although the genetic relatedness of these two eBL cases suggests a possible genetic predisposition to BL, we also considered environmental predisposition from P. falciparum malaria and EBV. Neither child carried common malaria-resistance genetic variants (e.g., the sickle cell trait; see Supplementary data) [3]. One child carried the HLA-B53 allele, previously reported to be associated with resistance to severe malaria in West Africa [16]. Both children had EBV-positive tumors and were positive for the EBV LMP-1 DNA Pattern A variant (Table 1), which has been associated with 31-fold higher odds of eBL in EMBLEM [17]. However, the relatively advanced age at BL diagnosis in these children (>10 years) and recent efforts to suppress malaria in their district cast doubt on the hypothesis that these environmental pathogens are the sole triggers.
Our study is a discovery effort with several strengths, including epidemiologically well-characterized samples, extensive genomic data, and the availability of tumor tissue. These strengths enabled us to confirm the diagnosis robustly by histology and molecular analysis and to conduct an integrated multidisciplinary analysis combining somatic and germline WES data. We confirmed the children's genetic relatedness and ancestry and identified two variants that warrant follow-up. However, limitations include the small sample size, which precluded formal statistical testing (including adjustment for multiple comparisons), and the lack of functional validation of the variants. The paucity of genomic data from individuals in the eBL belt, i.e., Nilotic speakers [2,3], is a further limitation. Our study illustrates the feasibility and scalability of collaborative efforts that apply genomic data analysis to identify familial aggregation of eBL in epidemiological or clinical cohorts, and it shows that such investigations can shed light on genetic susceptibility to eBL.
In conclusion, we report the first pathologically confirmed eBL cases in Northern Uganda found to be genetically related, a relationship uncovered through analysis of genetic data collected in the course of an epidemiological study. In both children, we identified potentially important germline variants in TCF4 and CHD8. These discoveries, although preliminary, provide novel clues about genetic susceptibility to eBL.

Data availability
The datasets generated and/or analyzed during the current study are available through dbGaP: the EMBLEM data under accession phs001705.v2.p1; the Shirati data under accession phs002223.v1.p1; and the Childhood Cancer Survivor Study data under accession phs002072.v1.p1. The International Cancer Genome Consortium (ICGC) data were extracted from WGS alignments available from the European Genome-phenome Archive (EGA) under accession number EGAS00001002198, in accordance with approval from the ICGC guidelines (www.icgc.org) under DACO-1064755 (National Institutes of Health).

Acknowledgements
The infrastructural support of the KinderKrebsInitiative Buchholz/Holm-Seppensen to the group at the Institute of Human Genetics in Ulm is gratefully acknowledged. The authors acknowledge the research contributions of the Cancer Genomics Research Laboratory for their expertise, execution, and support of this research in the areas of project planning, wet laboratory processing of specimens, and bioinformatics analysis of generated data. […] from the NCI, NIH, under NCI Contract No. 75N910D00024. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the US Government. The content of this manuscript is the sole responsibility of the authors.

Compliance with ethical standards
Conflict of interest The authors declare that they have no conflict of interest.
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third-party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.