Endemic Burkitt lymphoma (eBL) is a monoclonal B-cell non-Hodgkin lymphoma that is common in equatorial Africa and Papua New Guinea and has been linked to childhood infection with Plasmodium falciparum (Pf)1,2,3,4, a Class 2A carcinogen for eBL5. Evidence for an association between eBL and Pf is inconsistent: for example, the risk of eBL is increased in children with antibody markers of recent Pf infection but decreased in those with antibody markers of long-term exposure to Pf infection6,7. An alternative approach is to assess Pf prevalence, density, or genetic diversity as risk factors for eBL. Early studies of the association between eBL and Pf prevalence yielded null8,9 or inverse associations10, but they were limited by small sample sizes and by reliance on microscopy, which has variable sensitivity for detecting Pf infection and cannot distinguish infection with multiple Pf genotypes.

A recent ecological study using published data from Ghana, Uganda and Tanzania11, countries where Pf transmission intensity is moderate to high (mesoendemic to holoendemic)12,13,14, showed that the age-specific risk of eBL and the average number of distinct malaria genotypes per positive blood sample both peaked between ages 5 and 9 years. In contrast, age-specific asymptomatic parasitaemia and parasite density both peaked at about 2 years of age12,13,14. Infection with multiple Pf genotypes is relatively common in children in areas with holoendemic malaria15, but its association with eBL has yet to be fully studied.

Here, we report our investigation to test the hypothesis that Pf prevalence, parasite density in peripheral blood and genetic diversity are associated with eBL among 303 children with eBL (cases) compared to 274 children with non eBL-related cancers or non-malignant conditions (controls) in Malawi. Pf genetic diversity was measured using a sensitive and specific Pf molecular-barcode array16 of 24 independently segregating Pf single nucleotide polymorphisms (SNPs) representative of the 3D7 Pf genome.


Pf malaria prevalence potentially associated with eBL

Cases were similar to the controls with respect to gender, but they were slightly older than the controls (7.7 [SD 0.2] years versus 6.5 [0.3] years) (Table 1). The distribution of cases and controls across reported home districts was similar (not shown). The Pf prevalence as assessed by PCR analysis was 64.7% among the cases compared to 45.3% among the controls (OR = 2.1, 95% CI: 1.5–3.1). Similar results (66.9% versus 46.4%) were obtained when the analysis was restricted to a subset of 239 children who had previously been tested for EBV2 (OR = 2.9, 95% CI: 1.6–5.4). Associations between eBL and Pf prevalence were evident after stratifying by EBV: eBL was associated with Pf among 106 children with high EBV antibody reactivity (OR = 2.3, 95% CI: 0.8–6.2) and among 133 children with negative, indeterminate or low EBV reactivity (OR = 2.5, 95% CI: 1.1–5.6).
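As a rough arithmetic check on the prevalence comparison above, the crude (unadjusted) odds ratio can be recovered from the two reported PCR prevalences alone. The sketch below uses only those two percentages; the function names are illustrative. The result is not expected to match the reported OR exactly, because the reported estimate comes from a logistic regression adjusted for age, sex and enrolment date.

```python
def odds(p):
    """Convert a proportion (0 < p < 1) into odds."""
    return p / (1.0 - p)


def crude_odds_ratio(p_cases, p_controls):
    """Unadjusted odds ratio from exposure prevalence in cases and controls."""
    return odds(p_cases) / odds(p_controls)


# Reported PCR-based Pf prevalence: 64.7% in cases, 45.3% in controls.
crude_or = crude_odds_ratio(0.647, 0.453)
print(round(crude_or, 2))
```

The crude value is about 2.2, close to the adjusted estimate of 2.1, which suggests that the adjustment covariates changed the estimate only slightly.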

Table 1 Children included in this study by cancer type showing age and sex distributions and prevalence of P. falciparum and Epstein-Barr Virus

Pf genetic diversity potentially associated with eBL

To determine whether Pf density was associated with eBL, we compared Pf log copy number per 10^5 peripheral blood mononuclear cells in cases and controls. Log copy number of Pf parasites was similar in eBL cases and controls (4.9 versus 4.5 log copies, p = 0.28) and it decreased, albeit non-significantly, with age in both groups. Pf density in younger cases and controls (0–5 years) was 4.9 log copies and in older children it was 4.7 log copies (p = 0.54).

We evaluated the association between eBL and Pf genetic diversity in a subset of 129 children whose samples had at least 2 Pf DNA copies detected and at least 20 of 24 unambiguous SNP calls, which we considered the threshold for valid results (Figure 1a). Although there was no effect of Pf density on the number of successful SNP calls, there was a weak relationship between density and the proportion of called SNPs that were non-clonal (Figure 1b); hence Pf density was included as an adjustment in subsequent regression analyses. Genetic diversity at one or more of the 24 SNP locations was observed in 127 (98.5%) of the 129 children (mean number of non-clonal calls per child 11.4, standard error [se] = 0.6, Table 2). The prevalence of non-clonal calls among cases was slightly, but not significantly, increased compared to that among controls (RR = 1.3, 95% CI: 0.97–1.70, p = 0.08). The prevalence of at least 3 non-clonal calls among cases was 2.7 times (95% CI: 0.7–9.9, p = 0.14) that among controls. Mixed (non-concordant) calls may indicate the presence of Pf variant strains at levels close to the limit of assay detection. Mixed calls were less frequent than non-clonal calls; at least one mixed call was observed in 80 (62%) of 129 children (Table 2). Having a mixed call was 3.2 times more likely in cases than in controls, but the result was not statistically significant (p = 0.18). The results were similar in an expanded subset of 160 children with at least 1 copy of Pf DNA (Table 2).

Table 2 Number (%) of children with non-clonal and mixed calls in two nested subsets of the data showing the mean (se) numbers of both per child
Figure 1

The relationship between the amount of P. falciparum DNA isolated in cases and controls and (A) the proportion of SNP genotypes (out of 24 SNPs) determined and (B) the proportion of determined genotypes found to be non-clonal.

The relationships are illustrated by loess fitted curves. Amount of DNA has been natural log (log_e)-transformed. Red reference line indicates 2 copies of parasite DNA present and blue reference line indicates 20 of 24 SNPs determined. Blue markers indicate those patients included in the ‘high quality’ subset.

Graphical and spline analyses of the barcode arrays of the children ordered by either diversity score (Figure 2a, b) or by the proportion of non-clonal SNP calls (not shown) revealed a greater preponderance of cases at the more-diverse end of the Pf diversity scale. Corroborating this finding, cases were also found to have a higher average diversity score, which up-weighted non-clonal (scored as 10) or mixed calls (scored as 5), than controls (mean score: 153.9 [se = 5.8] versus 133.1 [se = 7.7], t-test p = 0.036) (Figure 2c).

Figure 2

The genetic diversity of P. falciparum isolates from 87 cases and 42 controls.

A) The barcode array: The barcode for a single patient is represented in a single row whilst each column summarizes the diversity at each SNP location. Cases and controls are ordered by the diversity score (most diverse are at the bottom of the plot) and the SNPs are arranged by location in the P. falciparum genome – the first column indicating cancer diagnosis (Cases in red and controls in blue). SNP results are coded as follows: minor allele as lighter blue, major allele as darker blue, potentially mixed call as lighter green, non-clonal call as darker green and a failed call as light gray. B) A loess spline curve relating diversity score to the probability of being a case. The X symbols mark the rows for controls and small circles mark the rows for cases. C) A comparison of the distributions of the diversity score among cases and controls.


We report results from a case-control study of children in Malawi evaluating whether prevalence, density or genetic diversity of Plasmodium falciparum (Pf) might be the triggering malaria exposure for children with endemic Burkitt lymphoma. The associations of eBL with Pf prevalence and genetic diversity agree with the well-established epidemiology of eBL, i.e., that it occurs in rural areas where Pf transmission is high2,4,11,17,18. The association of eBL with Pf prevalence and genetic diversity score, although modest, was robust in analyses adjusting for anti-EBV and anti-malaria antibodies and Pf density, and in sensitivity analyses. The significant difference in Pf prevalence and genetic diversity score between cases and controls supports the hypothesis that genetic diversity of Pf may play a role in triggering the pathogenesis of eBL, a hypothesis based on a previous study showing a correlation between age-specific peaks in the number of malaria genotypes and age-specific eBL peaks11. These results are consistent with observations that eBL is characterized by a very short doubling time (1–2 days)18 and that the interval from initiating or promoting events to diagnosis may be comparatively short (3–8 months)19.

Biologically, Pf parasites and EBV are recognized as co-factors in the genesis of eBL, but the detailed mechanisms of interaction between Pf parasites, the B cell compartment and EBV remain obscure. Cases in our study did not have higher average Pf density than controls. This contrasts with children suffering from severe acute malaria (e.g., cerebral malaria), in whom Pf density is high but genetic diversity is low20. Our results may suggest a different conceptual model within which to explore the underlying molecular mechanisms linking Pf density, genetic diversity and host proteins in eBL pathogenesis. Clinical data support the notion of differences in the immunopathology of Pf in eBL compared to severe malaria. First, eBL is rare in children aged 0–2 years, even though this is the age at which children are most vulnerable to high-density Pf parasitemia and severe malaria. Second, multi-clonal Pf infections are frequently associated with mild malaria among people with established disease immunity21, a property that is closer to the risk profile for eBL than to that for severe malaria.

Pf parasites modulate host defenses, promoting both immune suppression and hyper-activation (immunosubversion)22,23 as well as expansion of atypical memory B cells24. Subversion of immunity might be enhanced by parasite genetic diversity. If so, parasite genetic diversity could increase susceptibility of children to EBV infection or trigger reactivation of EBV among children with latent infection. Additionally, polymorphic Pf-encoded ligands, such as PfEMP125, have been shown to induce polyclonal B cell activation26, preferentially of memory B cells (in which EBV persists and eBL develops), to rescue tonsillar B cells from apoptosis and to reactivate latent EBV infection27,28. If parasite genetic diversity enhances immune subversion during infection, potential consequences could include impairment of the EBV-specific T-cell response, hyper-activation of germinal centers where c-myc/Ig chromosomal rearrangements often occur, and increased survival of translocation-positive B cells. Further studies investigating the role of parasite genetic diversity based on the above points are thus needed.

Our study has some limitations. Its case-control design precludes us from distinguishing whether the exposure precedes or follows the development of eBL29. The use of hospital cases and controls is a limitation because, although a similar distribution of home districts was observed for cases and controls, it is not known how well this captures the actual malaria exposure related to geography. The controls were slightly younger than the cases, but this difference would bias the study towards the null, suggesting that our results may be conservative. Other limitations include a relatively small sample size, particularly in subgroup analyses. A particular strength of the molecular barcode array was its design, based on 24 independently segregating SNPs scattered across the Pf genome, which made direct measurement and quantification of Pf genetic diversity possible with a high degree of sensitivity and specificity. Despite this strength, the molecular barcode array may not be uniformly sensitive or specific for malaria clones at low quantities. Our results motivate the adaptation of the recently published malaria genome and of maturing bioinformatics methods that integrate genomics and proteomics30 to investigate the role of malaria genetic diversity in carcinogenesis of eBL.

To conclude, the results of this case-control study of children with endemic Burkitt lymphoma in Malawi support the hypothesis that infection with genetically diverse Pf parasites may be associated with eBL. They also support the rationale for incorporating molecular methods in the study of the pathobiology of malaria in eBL. Further work is needed to evaluate the possible role, and the underlying molecular mechanisms, of Pf genetic diversity in the pathogenesis of eBL.


Patients and cancer diagnoses

Participants were from a case-control study of cancers in children aged 0 to 15 years conducted at the Queen Elizabeth Hospital in Blantyre, Malawi, between July 2005 and August 2010 as described elsewhere2,3. Cancers included Burkitt lymphoma, other haematological malignancies (leukaemia, Hodgkin lymphoma), neuroblastoma, rhabdomyosarcoma, Ewing's sarcoma, primitive neuroectodermal tumour and Wilms' tumour. Eight children with Kaposi sarcoma were included in the current study. All cancer cases were reviewed clinically by one investigator (EM) and were confirmed by histology, cytology or other laboratory investigations when possible. Trained nurses obtained consent and administered a standardized questionnaire to the children or their parents or guardians. All children were routinely tested for HIV infection. HIV positive children were excluded from the current study. For analytic purposes, children with Burkitt lymphoma were coded as cases and children with another diagnosis as controls. The controls comprised children admitted to the same hospital with a wide range of both malignant and non-malignant conditions (Table 1).

Ethics review

The study obtained ethical approval from the Oxford Tropical Research Ethics Committee and the Malawian College of Medicine Research and Ethics Committee and exemption from ethics review by the Office of Human Subjects Research at the National Institutes of Health. All subjects gave written informed consent to participate.

DNA extraction, P. falciparum barcode genotyping

DNA was extracted from whole blood samples using the QIAamp Blood DNA Kit (Qiagen, Inc., Valencia, CA) according to well-established protocols. Genomic DNA samples (20 ng) from blood were evaluated for human DNA content (RNAse P; ABI TaqMan 4316844, VIC) and for Plasmodium falciparum (Pf) copy number, targeting a 519 bp segment of PF07-0076, using a semi-quantitative 5′ nuclease (TaqMan) assay16. For each sample the Pf assay consisted of 20 ng genomic DNA, 900 nM forward primer (CGACCCTGATGTTGTTGTTGGA), 900 nM reverse primer (GGCTTTTTTCCATTTCTGTAGTTAAGATTCA), 200 nM FAM-labeled probe (CAACAGCTCCAAAATAT) and 2.5 μL 2× universal master mix (Applied Biosystems), in a final volume of 5 μL. Samples were denatured at 95 °C for 10 minutes followed by 40 cycles of amplification (95 °C for 15 sec, 60 °C for 60 sec) on an ABI 7900HT. Human DNA content was assessed in parallel aliquots using identical conditions and substituting primers for the human RNAse P gene (TaqMan 4316844, VIC, product 87 bp). The average cycle threshold of triplicate measurements for each sample was compared to standards of known copy number, and those samples with P. falciparum DNA at greater than 0.5 copies per sample were included in subsequent genotyping analysis.
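The quantification step described above (mean cycle threshold of triplicates compared against standards of known copy number) can be sketched as follows. This is a minimal illustration that assumes a log-linear standard curve fitted by least squares, which the text does not specify; the standard values, Ct readings and function names are all hypothetical. Only the 0.5-copy inclusion cut-off is taken from the text.

```python
def fit_standard_curve(log10_copies, ct_values):
    """Least-squares line Ct = slope * log10(copies) + intercept.

    Assumption: a log-linear standard curve, as is common for qPCR.
    """
    n = len(log10_copies)
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(triplicate_ct, slope, intercept):
    """Estimate copy number from the mean Ct of replicate wells."""
    mean_ct = sum(triplicate_ct) / len(triplicate_ct)
    return 10 ** ((mean_ct - intercept) / slope)


# Hypothetical standards: 1 to 10,000 copies, idealised slope of -3.3.
standards_log10 = [0.0, 1.0, 2.0, 3.0, 4.0]
standards_ct = [37.0, 33.7, 30.4, 27.1, 23.8]
slope, intercept = fit_standard_curve(standards_log10, standards_ct)

# Hypothetical triplicate Ct values for one sample.
sample_copies = copies_from_ct([33.6, 33.7, 33.8], slope, intercept)

# Inclusion rule from the text: more than 0.5 copies per sample.
include_in_genotyping = sample_copies > 0.5
```

With these illustrative inputs the sample's mean Ct falls on the 10-copy standard, so it would pass the inclusion cut-off.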

Genotyping assays were performed in 96.96 dynamic arrays for SNP genotyping (SNP arrays) using the BioMark platform (Fluidigm). Each sample was assayed in quadruplicate for 24 single nucleotide polymorphisms (Daniels et al.; assays manufactured by Fluidigm). Samples comprised 20 ng genomic DNA, 50 nM STA primer mixture, 50 nM LSP primer mixture and 2.5 μL 2× universal master mix (Applied Biosystems), in a final volume of 5 μL. Samples were denatured at 95 °C for 10 minutes followed by 15 cycles of amplification (95 °C for 15 sec, 60 °C for 120 sec) on an ABI 9700. The Fluidigm SNP array microfluidic chips were loaded with 5 μL assay comprising 2.5 μL assay loading reagent (2×) (Fluidigm 85000736), 1.0 μL 50× SNP genotyping assay mix (7.5 μM each allele-specific primer and 20 μM locus-specific primer) and 1.5 μL RNAse/DNAse-free water. Samples were loaded in 5 μL comprising 2.5 μL Biotium 2× Fast Probe master mix (Biotium 31005), 0.25 μL SNPtype sample loading reagent (20×) (Fluidigm 100-3425), 0.08 μL SNPtype Reagent (Fluidigm 100-3402), 0.03 μL ROX (Invitrogen 12223-012), 0.06 μL RNAse/DNAse-free water and 2.08 μL of a 5-fold dilution of the pre-amplification mixture. Control samples included a negative control (2.08 μL of water instead of genomic DNA) and positive controls of malaria genomic DNA samples (MRA-102G, MRA-150G, MRA-205G, MRA-330G) from BEI Resources (Manassas, Virginia). Individual assays (5 μL) and samples (5 μL) were pipetted into separate inlets on the frame of the SNP arrays per the manufacturer's instructions. Microfluidic chip loading and mixing of samples and assay mixtures in the 9216 reaction chambers of the dynamic array were carried out on the IFC Controller HX. PCR and image processing were carried out on the BioMark system (Fluidigm). Laboratory staff were masked to the case or control status of the samples.

Two issues surrounding the analyses of the SNP results were identified. Firstly, a number of SNP calls were discordant when the assay was repeated: clonal on one occasion and non-clonal on another. This ambiguity might be taken as an indicator of a mixed infection where one clone is at the limit of detectability. Secondly, successful allelic Pf typing was defined as classification of at least 20 of the 24 SNPs on the malaria barcode, at a threshold of 2 copies of Pf DNA in the sample. The proportion of samples with determined genotypes declined rapidly below this threshold (Figure 1a).
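The inclusion rule just described (at least 2 copies of Pf DNA and at least 20 of 24 SNPs classified) can be expressed as a simple filter. The sketch below is illustrative only: the call labels ("major", "minor", "mixed", "nonclonal", "failed") and the function name are assumptions, not the coding used in the study.

```python
N_LOCI = 24        # SNPs on the molecular barcode
MIN_CALLED = 20    # minimum classified SNPs for a valid barcode
MIN_COPIES = 2.0   # minimum Pf DNA copies in the sample


def passes_quality_threshold(pf_copies, calls,
                             min_copies=MIN_COPIES, min_called=MIN_CALLED):
    """Return True if a sample meets both inclusion criteria.

    `calls` is a list of 24 hypothetical labels; any label other than
    "failed" is counted as a classified SNP.
    """
    if len(calls) != N_LOCI:
        raise ValueError("expected one call per barcode locus")
    n_called = sum(1 for call in calls if call != "failed")
    return pf_copies >= min_copies and n_called >= min_called


# Hypothetical samples.
good = passes_quality_threshold(5.0, ["major"] * 21 + ["failed"] * 3)
too_few_calls = passes_quality_threshold(5.0, ["major"] * 19 + ["failed"] * 5)
low_dna = passes_quality_threshold(1.0, ["major"] * 24)
```

Passing `min_copies=1.0` would correspond to the relaxed threshold used for the expanded sensitivity subset, if the same call criterion is retained.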

Statistical analyses

Initial descriptive analyses of the complete sample of 577 patients were carried out and associations between having Burkitt lymphoma or any other cancer and the presence of P. falciparum and/or EBV were assessed using logistic regression adjusted for age, sex and month and year of enrolment.

The main analyses of clonality were restricted to samples with at least 2 copies of Pf DNA and calls for at least 20 of the 24 barcode loci. Firstly, the number of non-clonal calls among cases and controls was compared using a negative binomial regression model, with the number of SNP calls made (natural logarithm [log_e]-transformed) as an offset, adjusted for age, sex, year and month of enrolment and amount of Pf DNA present. A similar approach was used to assess the number of potentially mixed calls that occurred. Logistic regressions of the presence of at least one potentially mixed call or at least 3 non-clonal calls were also carried out. These analyses were repeated in sensitivity analyses including 160 participants with at least 1 copy of Pf DNA.
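For orientation, the negative binomial model with an offset described above can be written out as follows. This is a sketch under stated assumptions: the quadratic variance form (the common NB2 parameterisation) and the way each covariate enters the linear predictor are not specified in the text, and the symbols are introduced here for illustration. Y_i is the number of non-clonal calls for child i, n_i the number of SNP calls made, t_i a vector of indicators for year and month of enrolment, and d_i the amount of Pf DNA.

```latex
\begin{align*}
Y_i &\sim \mathrm{NegBin}(\mu_i, \alpha), \qquad
\operatorname{Var}(Y_i) = \mu_i + \alpha\,\mu_i^{2},\\
\log \mu_i &= \log n_i + \beta_0 + \beta_1\,\mathrm{case}_i
  + \beta_2\,\mathrm{age}_i + \beta_3\,\mathrm{sex}_i
  + \boldsymbol{\gamma}^{\top}\mathbf{t}_i + \beta_4\,d_i .
\end{align*}
```

Because log n_i enters with a fixed coefficient of 1, the model describes the rate of non-clonal calls per called SNP, and exp(β_1) is the ratio of that rate in cases relative to controls.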

Finally, the unique barcode array for cases and controls was assembled and potential clustering of barcodes assessed by relating the array to cancer type status using a spline fit (with 95% confidence limits). A binary cancer type code was considered the outcome while the array was considered the predictor. The barcodes were ordered in two ways. First, a ‘genetic diversity’ score was assigned to each possible SNP call at each of the 24 locations: 0 for a failed call, 1 for a minor allele (<35% prevalence, as previously defined by Daniels et al16,31), 3 for a major allele, 5 for a potentially mixed call and 10 for a non-clonal call. The score for the entire array was obtained by summing these scores. Second, the proportion of the array with non-clonal calls was determined. For this, potentially mixed calls counted as both a clonal call and non-clonal call. Departures of the spline fit away from the proportion of patients in the sample with Burkitt lymphoma may be indicative of clusters of barcodes predominantly found in one group of patients. Two-sided p-values <0.05 were considered statistically significant; p-values between 0.1 and 0.05 were suggestive of a trend. Because this study was exploratory in nature for hypothesis generation, no adjustment was done for multiple comparisons. All analyses were undertaken with the SAS System (SAS/STAT version 12.3, SAS Institute, Cary, NC, USA).
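The diversity score defined above is a weighted sum over the 24 barcode positions. A minimal sketch of the scoring and ordering follows; the weights are those given in the text, while the call labels, function name and example barcodes are hypothetical.

```python
# Weights from the text; the string labels are illustrative.
CALL_SCORES = {
    "failed": 0,
    "minor": 1,       # minor allele
    "major": 3,       # major allele
    "mixed": 5,       # potentially mixed call
    "nonclonal": 10,  # non-clonal call
}
N_LOCI = 24


def diversity_score(barcode):
    """Sum the per-locus scores across a 24-SNP barcode."""
    if len(barcode) != N_LOCI:
        raise ValueError("expected one call per barcode locus")
    return sum(CALL_SCORES[call] for call in barcode)


# Hypothetical barcodes for three children.
barcodes = {
    "child_a": ["nonclonal"] * 11 + ["major"] * 8 + ["minor"] * 3
               + ["mixed"] + ["failed"],
    "child_b": ["major"] * 20 + ["minor"] * 4,
    "child_c": ["nonclonal"] * 24,
}

scores = {child: diversity_score(calls) for child, calls in barcodes.items()}

# Order children from least to most diverse, as in the barcode array plot.
ordered = sorted(scores, key=scores.get)
```

Under this scheme a fully clonal barcode with every locus called scores between 24 and 72, and a barcode that is non-clonal at every locus scores the maximum of 240.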