Introduction

Burkitt lymphoma (BL) is an aggressive B cell non-Hodgkin lymphoma1 first reported in African children in 1958 by Denis Burkitt2. BL incidence varies 3- to 5-fold within and across continents, but the highest incidence is recorded in children in sub-Saharan Africa (SSA, 2-4 per 100,000 person-years3,4), where it is approximately 10-fold higher than the incidence in the United States or Europe1. Epstein-Barr virus (EBV)5 and Plasmodium (P.) falciparum6 are established risk factors for BL in SSA. The association of BL with these infections strongly suggests that variation within the human leukocyte antigen (HLA) region, which plays an important role in the regulation of immune response7, could be related to the etiology of BL8. This hypothesis is consistent with studies that showed association of HLA-B*53 with protection against severe malaria in Gambian children9. Mechanistic studies suggested that this association was mediated by enhanced cytotoxicity against liver-stage malaria parasite forms among carriers of HLA-B*5310. However, a large genome-wide association study (GWAS) of 17,000 individuals from 11 populations, including people from Gambia, did not replicate this association11. HLA variation has also been associated with EBV control, including HLA-A*02:0112 and HLA-DQB1*0213 which are associated with elevated anti-EBV antibodies. HLA-A*02:01 is part of the HLA Class I system and may mediate direct killing of EBV-infected cells by modulating expression of CD8+ Th1-type immune response14. Conversely, HLA-DQB1*02 is part of the HLA Class II system and may mediate EBV control by facilitating anti-EBV antibody secretion through the modulation of peptide presentation to CD4+ T cells and their Th-2 signaling to B cells, thereby promoting an antibody response15,16. Therefore, the reported HLA associations are compatible with HLA modulating immune response to EBV or malaria as potential mechanisms for influencing risk of BL.

Only five studies have been conducted to directly test the association between HLA variation and risk of BL, yielding null results8,17,18,19,20. Those studies were small (<100 cases), lacked suitable controls, and did not properly adjust for ancestry of participants. Moreover, most typed HLA alleles using serological methods, which can be less accurate21 and are limited to 1-field resolution. Only one study, which was conducted by our group previously, has used sequence-based typing (SBT)22 to obtain accurate high-resolution data (≥2 fields) in 600 participants (including 200 with BL) in the Epidemiology of Burkitt Lymphoma in East African Children and Minors (EMBLEM) study in Uganda23. Potentially significant associations were found between BL and HLA-A*02, HLA-B*41, and HLA-B*5823, underscoring the need to conduct HLA research using high-resolution typing and in a larger sample. We investigated associations of classical HLA alleles, SNPs, and amino acids within the HLA region with BL among 4645 children, 800 with BL, enrolled in Uganda, Tanzania, Kenya, and Malawi, in whom HLA variation was inferred from genome-wide genotype data obtained from an ongoing BL GWAS24. Here we report successful HLA imputation using an updated multi-ancestry reference panel25, with high accuracy (>90% for class I and 85–89% for class II alleles) when compared with SBT results in a subset of participants (n = 600) with paired imputed and SBT data25. We observe significant associations between BL risk and HLA-DQA1*04:01 and rs2040406(G) in SSA. The higher-risk rs2040406(G) variant for BL is associated with decreased HLA-DQB1 expression in eQTL analyses of EBV-transformed lymphocytes, potentially suggesting that the HLA role is mediated by EBV control.

Results

Table 1 shows the participants’ characteristics (see Methods and Fig. 1). Most participants were from the EMBLEM study in Uganda, Tanzania, and Kenya (71.9% of the cases and 94.5% of controls). As expected and by study design1, compared to controls, BL cases were predominantly male (62.9% vs. 52.1%) and were aged 3–11 years old (77.9% of the cases vs. 74.5% of the controls). P. falciparum infection was detected in 282 (35.3%) of the BL cases vs. 1857 (48.3%) of the controls24.

Table 1 Demographic characteristics of 4645 participants in EMBLEM and Malawi.
Fig. 1: Map of study area and flow chart of study procedures.

a Map showing geographical areas shaded green where Burkitt lymphoma cases and controls were enrolled. Participating hospitals are marked with a red cross, while the capital cities where major tertiary care centers are located are marked with white star on a green background. The map was drawn using ESRI ArcGIS Pro software. No portions of this figure were imported as image components from a database. b The Study workflow components.

HLA imputation accuracy at 2-field resolution for alleles with allelic fraction (AF) > 1% was >90% for Class I HLA-A, -B, and Class II HLA-DQA1, 89.5% for HLA-DQB1, 88.3% for HLA-DPB1, and 85.0% for HLA-DRB1 (Supplementary Fig. 1A). We observed a high correlation (r2 ≥ 0.8) between HLA allelic dosage (genotype imputation quality info) and actual typed HLA alleles with AF ≥ 1% (Supplementary Fig. 1B). Analysis of ancestry using only GWAS SNPs in the HLA region replicated the ancestry patterns observed using genome-wide data26 (Supplementary Fig. 2).

The number of imputed HLA variants after quality control is shown in Supplementary Table 1.

Of 12 HLA alleles with prior associations with severe malaria, EBV, or BL (Supplementary Table 2), nine with AF ≥ 1% were evaluated. HLA-A*23:01, which previously was associated with elevated anti-EBV VCA-IgG in a GWAS in Uganda27, was associated with decreased BL risk in our study (OR = 0.77; P = 0.027; Supplementary Table 3). The other alleles, including HLA-B*53, were not associated with BL. In Uganda only, decreased BL risk was observed with HLA-A*02 (OR = 0.61; P = 0.001) and -B*41 (OR = 0.45; P = 0.037; Supplementary Table 4), which is in line with the findings we previously reported in a smaller sample in Uganda.

After Bonferroni correction (P < 2.7 × 10−4), HLA-DQA1*04:01 was associated with elevated risk of BL (OR = 1.61, 95% CI = 1.32–1.97; P = 3.71 × 10−6, Table 2, Fig. 2A). Significant associations were observed with DQA1*04 (OR = 1.60, 95% CI = 1.31–1.96; P = 5.02 × 10−6), DQB1*04 (OR = 1.60, 95% CI = 1.30–1.98; P = 1.29 × 10−5), and DRB1*03:02 (OR = 1.47, 95% CI = 1.20–1.81; P = 2.30 × 10−4) (Supplementary Table 5). These associations were not significant in analysis conditioned on HLA-DQA1*04:01 (Fig. 2B). The observed associations were similar in country-specific analyses (Supplementary Table 6) and in the different sensitivity models (Supplementary Fig. 3 for model 1 and Supplementary Table 7 for models 2-5).
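As a rough consistency check on effect sizes of this kind, the standard error and Wald P value can be approximately recovered from a reported odds ratio and its 95% CI. The sketch below assumes a symmetric Wald interval on the log-odds scale; the function name `wald_from_ci` is hypothetical, and because the published OR and CI are rounded, the recovered P value can only agree with the reported one in order of magnitude.

```python
import math

def wald_from_ci(or_est, ci_low, ci_high, z_crit=1.959964):
    """Recover the log-OR standard error, z statistic, and two-sided Wald
    P value from an odds ratio and its 95% CI.

    Assumption: the CI is a symmetric Wald interval on the log scale.
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(or_est) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p

# Values reported above for HLA-DQA1*04:01 (rounded in the text)
se, z, p = wald_from_ci(1.61, 1.32, 1.97)
print(f"SE(log OR) = {se:.3f}, z = {z:.2f}, P = {p:.2e}")
```

The recovered P value falls in the 10−6 range, the same order of magnitude as the reported P = 3.71 × 10−6; exact agreement is not expected given rounding and the mixed-model estimation used in the study.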

Table 2 Top HLA alleles and Haplotypes, locus, and amino acid residue hits from analysis of 4,645 Participants (800 with BL) in EMBLEM and Malawi.
Fig. 2: Region association plots of variants in the human leukocyte antigen region and Burkitt lymphoma based on 800 BL cases and 3,845 controls.

a Shows results from unconditional analyses; b Shows analyses conditional on DQA1*04:01; c Shows results conditional on rs2040406. The color of the dots indicates different polymorphisms: grey dots indicate imputed SNPs; magenta dots indicate genotyped SNPs; turquoise dots indicate amino acids (AA); light blue dots indicate indels; and orange dots indicate classical HLA alleles at 1- or 2-field resolution.

We identified 24 haplotypes formed by HLA-DRB1, -DQA1 and -DQB1 genes with AF ≥ 0.01. The HLA haplotype substructure was different in our participants in East Africa compared to those in Ghana, consistent with our findings of significant population substructure between these regions26 (Supplementary Fig. 4). For example, HLA haplotype DRB1*08:04-DQB1*03:01-DQA1*04:01 was more frequent in Ghana (5%; 45/923) compared to East Africa (1%; 32/4645). This haplotype was not associated with BL risk (OR = 1.44, 95% CI = 0.82–2.51; p = 0.200), whereas DRB1*03:02- DQB1*04:02-DQA1*04:01, which was also more frequent in Ghana (10.0%), was associated with elevated BL risk (OR = 1.58, 95% CI = 1.25–1.99; P = 1.08 × 10−4) (Table 2).

After Bonferroni correction (P < 1.1 × 10−6), GWAS variant rs2040406(G) in the HLA-DQA1 region was associated with elevated BL risk (OR = 1.43, 95% CI = 1.26–1.63; P = 4.62 × 10−8, Table 2, Fig. 2A, and Supplementary Fig. 5). This association persisted in single variant conditional models adjusting for HLA-DQA1*04:01 (Table 2). Associations with elevated BL risk were observed with rs1064994(C) (OR = 1.38, P = 8.75 × 10−7), rs1065049(A) (OR = 1.39, P = 6.06 × 10−7), rs9272982(A) (OR = 1.38, P = 8.23 × 10−7), and rs1130399(A) (OR = 1.48, P = 1.21 × 10−7), but they were not significant after conditioning on rs2040406, which may reflect LD patterns that were strong (r2 > 0.8) for three of the four SNPs and weak or moderate for rs1130399 (r2 = 0.3) (Fig. 2C and Supplementary Table 8). Finally, we examined the combined effect of the presence of the two risk alleles. Although homozygosity for both HLA-DQA1*04:01 and SNP rs2040406 (with LD R2 = 0.225) was low (1.3% in the cases and 0.68% in the controls), children who were homozygous for both had a higher risk for BL (OR = 2.66, 95% CI = 1.19–5.95; P = 0.017) versus those not carrying either allele.

Among the imputed single amino acid residues in HLA-DQA1 chain, significant associations were observed with variants at position 53, with two allelic variants (Lys/Gln) in the current dataset. Compared to Lys, which is the most common residue at position 53, those having Gln residue at this position had an elevated risk for BL (OR = 1.36; 95% CI = 1.20–1.55; P = 2.06 × 10−6; Table 2). The 3D structure of the HLA-DQA1 chain suggests that Gln 53 is located within the peptide binding groove of HLA-DQ molecule, and may have functional impact on specificity of peptide binding and/or T cell receptor (TCR) contacts (Supplementary Fig. 6).

Among the controls, neither HLA-DQA1*04:01 nor rs2040406(G) alleles were associated with P. falciparum parasite density (Supplementary Fig. 7). In exploratory analyses, we observed no significant associations between HLA alleles and P. falciparum infection detection among the controls (Supplementary Fig. 8).

Regarding a potential EBV connection with HLA in our data, we note that suggestive associations with elevated BL risk (not reaching the Bonferroni threshold) were observed with three GWAS SNPs previously associated with higher anti-EBV EBNA1 IgG antibodies in southwest Uganda27 (rs1064991(G), P = 2.11 × 10−6; rs3129867(G), P = 8.24 × 10−3; rs6927022(G), P = 1.99 × 10−2; Supplementary Table 9). We found no association between BL and three other GWAS SNPs (all P values > 0.05) that were previously associated with higher anti-EBV EBNA1 IgG antibodies in Europeans (rs2516049)28 and Mexican Americans (rs477515 and rs285427529) or with GWAS SNP rs28394498(T) associated with anti-EBV VCA IgG antibodies in southwest Uganda27 (P = 0.256). We note that the GWAS SNP with the strongest association with BL, rs2040406(G), has been reported to be an eQTL for HLA-DQB1 (P = 1.0 × 10−10) and C4A (P = 1.1 × 10−6) in EBV-transformed lymphocytes in the Genotype-Tissue Expression (GTEx) v8 database (Supplementary Fig. 9 and Supplementary Table 10).

Contrary to our hypotheses about global associations of BL with HLA variation, our study did not find significant associations of BL with any of the HLA alleles categorized as rare in our dataset or with HLA zygosity in individuals. The possible exceptions were inverse associations between BL and carriage of common HLA-A alleles (OR = 0.86, P = 0.016, Supplementary Table 11) and homozygosity at HLA-C (OR = 0.62, P = 0.004, Supplementary Table 12). However, independent studies are needed to confirm these findings.

Discussion

Our study using imputed high-resolution HLA data from four countries in Africa supports the hypothesis that BL risk is related to HLA variation. This hypothesis was proposed five decades ago2, but has been difficult to confirm or refute because of lack of accurate HLA typing data and well-designed studies. We also report associations between BL risk and carriage of HLA-DQA1*04:01 allele and GWAS SNP rs2040406(G) in the HLA-DQA1 region. These HLA associations may be related to effects of HLA on EBV control because the higher risk rs2040406(G) variant for BL is in high LD (r2 > 0.8) with several GWAS SNPs (e.g., rs1064991) known to have pleiotropic effects against EBV humoral immune response26,27. Additionally, both rs2040406(G) and HLA-DQA1*04:01 have pleiotropic effects against multiple sclerosis30,31, which is EBV-linked32.

Because malaria is the strongest geographical risk factor for BL and the risk of BL increases by 39% per 100 cumulative infections in a child33, we hypothesized that HLA associations with BL may be related to control of malaria. However, neither HLA-DQA1*04:01 nor rs2040406(G) was associated with P. falciparum infection detection or parasite density. Our finding of a null association between BL and HLA-B*53 confirms a previous null result from a smaller study34, and is consistent with the null association between severe malaria and HLA-B*53 reported in a malaria GWAS of 17,000 individuals from 11 countries11. These results cast doubt on the frequently cited result of HLA-B*53 as a marker of malaria resistance9,35.

Several of our findings suggest that other mechanisms could link HLA variation to risk of BL. For example, HLA-DQA1*04:01 has been linked to autoimmune-related conditions, such as Henoch-Schönlein Purpura36. It is possible that the association with HLA-DQA1*04:01 points to a role of autoimmunity in the risk of BL. This inference was suggested by reports of an inverse association between a history of allergy or asthma and BL risk among young adults in the International Lymphoma Consortium study37,38. It is also supported by findings that BL tumors display skewed usage of immunoglobulin heavy-chain variable gene segment 4 (IGH-V4), which is implicated in autoreactivity39.

The results also suggest immuno-genetic mechanisms in the etiology of BL that can be investigated by experimental methods, particularly with regard to the control of EBV infection as a critical factor affecting risk of BL. These include functional studies of expression of cytokines associated with Th1 (IFN-γ), Th2 (IL-10), or Treg (IL-17) phenotypes40 or the assessment of CD4+ and CD8+ T cell effector memory subsets targeting selected EBV proteins according to HLA type (HLA-DQA1*04:01 versus not). While previous studies have primarily centered on EBNA141, future research could employ broader panels of EBV proteins, including the 33 EBV proteins, such as the viral capsid antigens (VCAp18, -p23, -p40, -p160), which were shown to be differentially reactive in BL cases compared to controls. The studies could also use a smaller set of peptides, such as the four-marker immune panel, including BHRF1 (Bcl-2 homolog), BMRF1 (EAp47), BBLF1 (tegument protein), and BZLF1 (ZTA), whose reactivity most accurately classified BL status in patients in Ghana42.

Our study was made possible by imputation of HLA alleles from GWAS data in sub-Saharan Africa, which we show to be a feasible and practical way to obtain high-resolution HLA data to investigate the relationship between HLA variation and BL risk. Our study adds value to the ongoing GWAS and expands the experience with HLA imputation in African subjects25, who are currently underrepresented in genetic studies23.

We note several strengths of our study, including having the largest sample size to date, enrollment from four countries in Africa, availability of SBT data for evaluating the imputation accuracy, and detailed information on covariates to adjust for confounding. However, we acknowledge weaknesses, including lack of anti-EBV antibody results, which limits our ability to discuss the link between HLA and EBV control. Although our study sample size is the largest to date, it is still relatively small for genome-wide studies and only included people from East Africa. Thus, our results may not be generalizable to other SSA populations. We did not use a GWAS significance threshold (e.g., 5 × 10−8), which is conservative for some of our exploratory hypotheses. Larger studies with broader sampling across different countries in SSA are needed to confirm or refute our findings. Such studies may improve the capacity to investigate HLA in SSA, increase clarity of associations, and identify generalizable results.

To conclude, our findings confirm the high accuracy of HLA imputation from GWAS data in populations with admixed Nilotic ancestry. We report significant association of BL risk with HLA-DQA1*04:01 and rs2040406(G) in East African children. We hypothesize that the observed associations could be mediated by HLA effects on control of EBV infection and/or autoimmunity, and suggest a promising area of research might be understanding the link between HLA variation and EBV control in SSA populations.

Methods

Study population

The methods of the EMBLEM24 and the Infections and Childhood Cancer studies have previously been published43. The EMBLEM study is a population-based study that enrolled cases and controls aged 0-15 years in two neighboring regions in Uganda and in four neighboring regions in Kenya and Tanzania (Fig. 1). BL diagnosis was based on local histological/cytological diagnosis (74% of included cases) or clinical, imaging, and laboratory diagnosis. There were no apparent demographic or clinical differences between these groups. The controls were apparently healthy children enrolled from random villages (100 in Uganda, 100 in Kenya, and 95 in Tanzania)24. HIV-positive participants in EMBLEM were not excluded because HIV infection was rare (24 BL cases and 15 controls)24. In Malawi, participants were children 0–15 years enrolled with cancer at Queen Elizabeth Hospital in Blantyre. BL diagnosis was based on local cytology or histology and compatible clinical investigations; children with non-lymphoid solid cancers were used as controls. Participants with HIV or Kaposi sarcoma were excluded to protect their privacy. Demographic and risk factor information was obtained via a structured interviewer-administered questionnaire24. Venous blood was collected in EDTA tubes at enrollment. In EMBLEM, P. falciparum infection in blood was determined using microscopy of thick-film blood smears, or antigen capture rapid diagnostic tests (RDTs)24. In Malawi, P. falciparum infection was based on P. falciparum-specific polymerase chain reaction (PCR)43. Because we have previously shown distinct ancestry patterns in East versus West African populations26, we conducted comparative studies of HLA patterns in 968 Ghanaian adult men who were previously studied for ancestry to compare with our East African populations.

Ethical approvals

We confirm that all relevant ethical regulations were followed. Specifically, approval for the EMBLEM study was granted by ethics committees at the Uganda Virus Research Institute (GC/127), Uganda National Council for Science and Technology (H816), Tanzania National Institute for Medical Research (NIMR/HQ/R.8c/Vol. IX/1023), Moi University/Moi Teaching and Referral Hospital (000536), and National Cancer Institute (10-C-N133). Ethical approval for the original Infections and Childhood Cancer Study was granted by ethics committees at the Malawi College of Medicine (P.03/04/277R) and Oxford University. Because the original Malawi Infections and Childhood Cancer study did not request participants to consent to genetic testing, special ethical approval to conduct genetic testing was obtained from the Malawi National Health Sciences Research Committee (Approval #2405). Written informed consent was obtained from participants’ guardians in EMBLEM and Malawi studies, and written informed assent was obtained from children aged ≥7 years old in the EMBLEM study. The ethical approval for genetic studies favored research that would enable possible or suitable interventions in the communities where participants were enrolled or increasing knowledge considered relevant to the local communities, such as research on HLA variation and malaria resistance to investigate the association of BL with malaria.

DNA extraction and genotyping procedures

DNA extraction and genotyping were performed at the Cancer Genomics Research Laboratory, NCI, USA. Genotypes of approximately 4.6 million variants were determined using the Infinium Omni5Exome-4 v1.3 BeadChip (Illumina, San Diego, CA, USA) following standard Illumina data analysis workflow26. Genotype data were phased and then imputed using the African Genome Resources panel on the Sanger imputation server (https://imputation.sanger.ac.uk/). This panel was preferred because it currently has the largest number of genomes from African participants, including from the Great Lakes region in Africa. Only variants imputed with high confidence (genotype imputation quality info ≥0.9) with a minor allele frequency (MAF) ≥ 0.01 were retained in the analysis dataset. All variants analyzed passed Hardy–Weinberg equilibrium (HWE) test in the controls using a threshold of P < 1.0 × 10−6. Ancestry was evaluated using principal components (PC) analysis of 787,731 genotyped uncorrelated (r2 < 0.3) SNPs outside the HLA region. The top three country-specific PCs were used to control for ancestry. Additionally, because of a high degree of relatedness in Ugandan controls26, we constructed a genetic relationship matrix (GRM) for all individuals in the dataset, based on the probability that two individuals i and j share 0, 1, or 2 alleles identical by descent (IBD)44, and used it to control for relatedness.
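The variant-level filters described above (imputation info ≥ 0.9, MAF ≥ 0.01, and a Hardy–Weinberg check in controls) can be sketched as follows. This is a minimal illustration, not the study pipeline: the function names are hypothetical, the HWE test is a 1-degree-of-freedom chi-square test (the text does not say whether a chi-square or exact test was used), and the threshold is interpreted as excluding variants with HWE P < 1.0 × 10−6.

```python
import math

def hwe_chisq_p(n_aa, n_ab, n_bb):
    """P value of a 1-df chi-square test of Hardy-Weinberg proportions,
    computed from genotype counts (assumed to come from controls)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    if p == 0 or q == 0:
        return 1.0  # monomorphic variant: nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 df
    return math.erfc(math.sqrt(chi2 / 2))

def keep_variant(info, maf, control_counts,
                 info_min=0.9, maf_min=0.01, hwe_p_min=1e-6):
    """Apply the three filters; thresholds follow the text, logic is assumed."""
    return (info >= info_min
            and maf >= maf_min
            and hwe_chisq_p(*control_counts) >= hwe_p_min)

# Hypothetical variants: (info, MAF, control genotype counts AA/AB/BB)
print(keep_variant(0.95, 0.20, (250, 500, 250)))  # in HWE, well imputed
print(keep_variant(0.85, 0.20, (250, 500, 250)))  # fails the info filter
print(keep_variant(0.95, 0.50, (500, 0, 500)))    # extreme HWE departure
```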

We performed HLA imputation using Minimac445 on the Michigan Imputation Server (MIS) with a multi-ancestry reference panel which contains data from 21,546 unrelated individuals25. We applied the default quality control procedures of the MIS pipeline. We imputed eight classical HLA genes HLA‐A, B, C and HLA‐DRB1, DQA1, DQB1, DPA1, and DPB1, and amino acids and intergenic variants using genotypes extracted from chr6:25Mb-35Mb (hg19/GRCh37; n = 49,159 SNPs) for the 4645 participants. In total, we inferred 1439 classical I and II alleles at 1- and 2-field resolution, 164 INDELs, and 4,511 amino acid polymorphisms (Supplementary Table 1). After quality control (Supplementary Fig. 10), we retained 187 classical HLA alleles, 126 INDELs, 2652 amino acid polymorphisms, and 43,572 GWAS SNPs (11,702 genotyped and 31,870 imputed) with imputation R2 > 0.8 and MAF > 1% for association analysis. The HLA allele frequencies of East Africa, when compared with Ghana population34, are presented in Supplementary Fig. 11.

Statistics and reproducibility

We assessed the accuracy of our HLA imputation by examining the concordance and correlation between the imputed classical HLA alleles and SBT-HLA genotypes of 600 participants in the same dataset with paired data22. We assessed BL and HLA allele, SNP, and single amino acid residue associations by fitting generalized linear mixed models (GLMM) with the logit link in country-specific datasets with covariates (sex, age, P. falciparum infection status, ancestry as fixed effects, and GRM as a random effect) as the main model. P. falciparum is the strongest known co-factor for the geographic patterns of BL24,33,46,47. Thus, it was considered a-priori, as a risk factor and a confounder and included in our main models. When interpreted as a confounder, any associations with HLA alleles that remain significant indicate that additional contribution of P. falciparum was not responsible for the observed associations with HLA alleles.
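One common way to quantify concordance between imputed and sequence-typed HLA genotypes is the proportion of alleles that match per locus. The sketch below assumes that definition (the exact accuracy metric is not spelled out here), uses hypothetical function names, and runs on toy genotypes rather than study data.

```python
from collections import Counter

def allele_matches(imputed, typed):
    """Number of alleles (0, 1, or 2) shared by two unordered genotype pairs."""
    return sum((Counter(imputed) & Counter(typed)).values())

def concordance(imputed_calls, typed_calls):
    """Fraction of matching alleles across individuals at one locus."""
    total = sum(allele_matches(i, t)
                for i, t in zip(imputed_calls, typed_calls))
    return total / (2 * len(imputed_calls))

# Toy HLA-DQA1 genotypes at 2-field resolution for three individuals
imputed = [("04:01", "01:02"), ("05:01", "05:01"), ("01:02", "03:01")]
typed = [("01:02", "04:01"), ("05:01", "04:01"), ("01:02", "03:01")]
print(round(concordance(imputed, typed), 3))  # 5 of 6 alleles match
```

Using a multiset intersection makes the comparison independent of allele order within a genotype and counts a homozygous call correctly against a heterozygous one.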

The main results are presented as summary odds ratios (ORs) and significance assessed by 95% confidence intervals (95% CIs), computed using a standard meta-analytical approach of country-specific ORs in PLINK v1.9. HLA variants were coded as 0, 1, or 2 corresponding to the number of variant alleles carried by an individual. Statistical heterogeneity of associations with BL across the countries was assessed using Cochran’s Q test. P values were calculated using Wald tests and corrected for multiple comparisons using Bonferroni adjustment (P < 2.7 × 10−4 for 187 classical HLA alleles and P < 1.1 × 10−6 for another 46,350 variants in the HLA region). While a more conservative P value of GWAS significance (e.g., 5 × 10−8) may be considered more rigorous, it was not preferred for the current analysis because we wanted to test hypotheses based on prior epidemiological and biological data about HLA associations with malaria10 or EBV27, or with BL23. We performed conditional single-variant association tests, where in addition to the confounders included in the models, we performed conditional adjustment for HLA alleles or SNPs identified in the main models to be significantly associated with BL. Associations of BL with 12 variants with a priori associations with BL, severe malaria or EBV were assessed without correcting for multiple comparisons (Supplementary Table 2). Alleles that are polymorphic at the 2-field resolution are reported as such, while those with limited polymorphism are reported at 1-field resolution.
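The multiple-testing thresholds and the pooling of country-specific estimates can be illustrated with a short sketch. It assumes a family-wise alpha of 0.05 (which reproduces the quoted thresholds) and an inverse-variance fixed-effect model for the meta-analysis; the text does not state which pooling model was used, and the function names and country-level inputs below are hypothetical.

```python
import math

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests

def fixed_effect_meta(log_ors, ses):
    """Inverse-variance fixed-effect pooling of log odds ratios.

    Returns the pooled log OR, its standard error, and Cochran's Q
    (compared against chi-square with k - 1 degrees of freedom).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_w = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    return pooled, pooled_se, q

print(f"{bonferroni_threshold(187):.2e}")    # ~2.7e-4 for classical alleles
print(f"{bonferroni_threshold(46350):.2e}")  # ~1.1e-6 for other variants

# Hypothetical country-specific log ORs and standard errors (not study data)
log_ors = [math.log(1.5), math.log(1.7), math.log(1.6)]
ses = [0.15, 0.20, 0.25]
pooled, se, q = fixed_effect_meta(log_ors, ses)
low, high = math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)
print(f"OR = {math.exp(pooled):.2f} (95% CI {low:.2f}-{high:.2f}), Q = {q:.2f}")
```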

We examined the robustness of our results by conducting several sensitivity analyses. We excluded P. falciparum infection from the main model in sensitivity model 1. In the subsequent sensitivity models 2-5, we controlled for urban vs. rural residence of participants (model 2), wet vs. dry season of enrollment (model 3), carriage of HBB-rs334(A) and ABO-rs8176703(T) that are also protective against BL48 (model 4), and excluded 14 individuals with outlier principal components (model 5).

Finally, we defined HLA allele groups as common when the allele frequency (AF) was >5%, and as not common when the AF was 1%–5%. Alleles present in <1% of the participants were not included in this analysis. This variable was used to investigate global HLA associations of BL with common versus not common HLA alleles, which provides a way to assess potential detrimental associations with rare HLA alleles, based on the assumption that natural selection influences allelic distribution and favorable alleles will be differentially represented in cases and controls49. We also investigated associations between BL and HLA zygosity at six HLA loci (HLA‐A, B, C and HLA‐DRB1, DQB1, and DPB1). This approach was selected to reduce potential selection bias, as it could arise from excluding individuals carrying rare HLA alleles, particularly given their higher likelihood of being heterozygotes. HLA zygosity in an individual was defined as the number of loci, among the six HLA loci, that were homozygous at the 2-field resolution. We hypothesized that low HLA zygosity (i.e., with more loci being heterozygous) is correlated with a broader, and therefore more effective, repertoire of immune responses, while high zygosity would be correlated with a decreased repertoire of immune responses50. If so, then we reasoned that low zygosity might be associated with decreased BL risk, perhaps mediated by more effective immune responses against relevant infections, e.g., EBV or malaria.
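The two derived variables described above can be sketched as simple functions. This is an illustrative reading of the definitions, with hypothetical function names and toy genotypes: boundary values (exactly 1% or 5%) are assigned to the "not common" group, exclusion is applied on allele frequency, and zygosity is counted as the number of homozygous loci among the six listed.

```python
LOCI = ("A", "B", "C", "DRB1", "DQB1", "DPB1")

def frequency_group(af):
    """Classify an allele by frequency: 'common' (>5%), 'not common' (1%-5%),
    or None when the allele is below 1% and excluded from this analysis."""
    if af > 0.05:
        return "common"
    if af >= 0.01:
        return "not common"
    return None

def zygosity_score(genotypes):
    """Count homozygous loci (0-6) from 2-field genotypes.

    genotypes: dict mapping each locus in LOCI to a pair of allele names.
    """
    return sum(1 for locus in LOCI
               if genotypes[locus][0] == genotypes[locus][1])

# Toy individual, homozygous at HLA-C and HLA-DQB1 only
person = {
    "A": ("02:01", "23:01"),
    "B": ("53:01", "41:01"),
    "C": ("04:01", "04:01"),
    "DRB1": ("03:02", "08:04"),
    "DQB1": ("04:02", "04:02"),
    "DPB1": ("01:01", "04:01"),
}
print(frequency_group(0.12), frequency_group(0.03), frequency_group(0.004))
print(zygosity_score(person))  # 2 homozygous loci
```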

Because our understanding of associations between classical HLA alleles and asymptomatic P. falciparum infection, particularly in older children, is less clear, we assessed associations between HLA variants with P. falciparum infection among controls. We used similar models as those used for analysis of BL, except with infection status as the outcome in the controls. To gain insights about whether HLA alleles that were significantly associated with BL have effects on P. falciparum infection, we examined the relationship between those alleles with log-transformed P. falciparum density in the controls. These analyses were performed separately for EMBLEM and Malawi participants because in Malawi parasite density was measured using PCR, which is more sensitive, while in EMBLEM parasite density was measured using blood smears, which is less sensitive.

Statistical analyses were performed in R (version 4.1.0) utilizing computational resources of the NIH HPC Biowulf cluster.

Haplotype inference, visualization, and comparisons across countries

We constructed HLA haplotypes separately for each country from phased genotypes using the Haplo.stats package v.1.7.7. HLA linkage disequilibrium (LD) and/or haplotype configurations across multiallelic genetic markers were visualized using Disentangler plots (http://kumasakanatsuhiko.jp/projects/disentangler). We tested associations between BL and each haplotype, controlling for covariates used in the main model. We also descriptively compared HLA patterns in the four countries in East Africa versus in Ghana (N = 968), which we previously showed to be ancestrally different from the populations in East Africa26.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.