Main

A highly effective malaria vaccine is needed to limit malaria morbidity and mortality worldwide1. However, this has been hampered by poor understanding of naturally acquired or vaccine-induced immunity2. Malaria vaccines that have entered clinical testing show great variation in efficacy in different geographical locations3,4. Increasing evidence supports the notion that human immune responses are not only shaped by genetics, but are also markedly influenced by the environment5. Indeed, vaccine responses can be affected by increased burden of exposure to micro-organisms and parasites in the environment6,7 as well as by pre-exposure to the specific target pathogen. For example, a malaria vaccine that had shown good efficacy when tested in malaria-naive North Americans induced weaker responses in endemic regions8,9, indicating that an in-depth understanding of the interaction between our immune system and Plasmodium parasites is needed, not only in malaria-naive individuals but also, and more importantly, in those living in areas where malaria is endemic.

Immunity to malaria can develop naturally, as indicated by the decline in parasite carriage and disease episodes with increasing age in endemic areas10. Immune responses to malaria parasites are complex: antibodies11,12 and a range of immune cells are thought to be involved in protection13, but the exact contribution of cellular immunity requires further characterization10. Controlled human malaria infection (CHMI) trials provide exceptional opportunities to study immune responses and vaccine efficacy, because the defined onset of infection allows cause and effect to be tracked13,14. Most CHMIs have been conducted in Europe or the USA, where malaria-naive Europeans or European Americans and African Americans have been inoculated with Plasmodium falciparum sporozoites (PfSPZs) and have shown reproducible emergence, from the liver, of blood-stage parasites that can be detected by microscopy within a precise window of time. CHMI is currently also being established in malaria pre-exposed individuals in endemic areas, which will be invaluable for dissecting the relationship between baseline immune profiles, naturally acquired immunity15 and malaria vaccine efficacy.

To date, only a few CHMI studies have examined cellular immune profiles during infection using flow cytometry, and each of these studies could focus on only a limited number of cell subsets9,13,16,17. Developments in single-cell analysis by mass cytometry5,18, however, provide an opportunity for in-depth and broad profiling of immune responses to malaria parasites over time.

Here, we examined the immunological reactivity of malaria-naive Europeans as well as Africans with lifelong residence in a malaria-endemic area, known to exhibit naturally acquired immunity. Both Europeans and Africans were experimentally infected with P. falciparum in Gabon, and mass cytometry was used to reveal, at an unprecedented depth, detailed cellular immunological profiles at baseline as well as the dynamics of immune responses to malaria parasites. This was complemented by assessing cellular functionality through cytokine production, by RNA-sequencing (RNA-seq) transcriptome analyses and by applying machine learning to identify key markers associated with naturally acquired immunity.

Results

Individually unique and stable immune fingerprints revealed by mass cytometry

We enrolled European (n = 5) and African (n = 20) adult volunteers15 (Supplementary Table 1) and collected peripheral blood mononuclear cells (PBMCs) 1 day before direct venous inoculation (DVI) with nonattenuated PfSPZs and 5 days and 11 days after DVI. Through mass cytometry (Supplementary Table 2), we profiled a total of 33.3 million immune cells from 75 blood samples. As summarized in Fig. 1a, unsupervised hierarchical clustering with SPADE19 and t-distributed stochastic neighbor embedding (t-SNE) analysis in Cytosplore (Fig. 1b)18,20 identified distinct cell clusters (Fig. 1c), with unique marker expression profiles. Clusters were labeled with immune lineages and subsets (Fig. 1d, Extended Data Fig. 1 and Supplementary Table 3). Collectively, we were able to distinguish 198 cell clusters belonging to 45 immune subsets and 9 lineages in a data-driven fashion.

Fig. 1: Individually unique and stable immune fingerprints revealed by mass cytometry.

a, A SPADE tree of a PBMC sample after analysis of the combined dataset (n = 75 samples) containing 33.3 million cells. Size and color of the nodes are proportional to the number of clustered cells. The major immune lineages are annotated based on lineage marker expression profiles. b, t-SNE embeddings showing the collective CD4+ T cells (975,000 cells). Cell colors represent arcsinh-transformed (cofactor 5) expression values of the indicated markers. c, A density map describing the local probability density of t-SNE-embedded CD4+ T cells, where black dots represent centroids of clusters identified using Gaussian mean-shift clustering (left); a t-SNE plot depicting the cluster partitions in different colors (middle); and the CD4+ T cell subset borders (right). d, Clusters and subsets within the CD4+ T cell lineage. A heat map summary of median expression values of markers expressed by 24 CD4+ T cell clusters and hierarchical clustering of clusters, labeled by subset name. Cell counts per cluster are shown at the bottom. e, t-SNE map of samples per individual and time point, clustered by cell frequencies. Samples with similar percentages of cell clusters relative to total cells end up close together. All 5 Europeans (blue) and 20 Africans (pink) are shown. For each individual, the three samples representing the three time points, 1 day before DVI with PfSPZs (C−1, triangle) and 5 days (D5, circle) and 11 days (D11, diamond) after DVI, cluster together. For one individual, the C−1 sample (asterisk) does not cluster with the other two time points (D5, D11) of the same individual (also asterisk). f, t-SNE maps of PBMCs showing that each individual has their own immune fingerprint. Two European and two African individuals are visualized over time.
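The arcsinh scaling (cofactor 5) in panel b and the per-cluster median expression in panel d can be illustrated with a minimal Python sketch. It assumes raw per-cell intensities for one marker and cluster labels that have already been assigned; the function names and toy values are hypothetical, and the SPADE, t-SNE and mean-shift steps are not reproduced.

```python
import math
from collections import defaultdict
from statistics import median

COFACTOR = 5.0  # cofactor stated in the Fig. 1b legend


def arcsinh_transform(intensity: float, cofactor: float = COFACTOR) -> float:
    """Arcsinh transform used to compress the dynamic range of CyTOF intensities."""
    return math.asinh(intensity / cofactor)


def cluster_medians(intensities, labels, cofactor=COFACTOR):
    """Median arcsinh-transformed expression of one marker for each cluster label."""
    per_cluster = defaultdict(list)
    for value, label in zip(intensities, labels):
        per_cluster[label].append(arcsinh_transform(value, cofactor))
    return {label: median(values) for label, values in per_cluster.items()}


# Hypothetical raw intensities of one marker for six cells in two clusters
marker = [0.0, 5.0, 10.0, 150.0, 200.0, 400.0]
clusters = ["c1", "c1", "c1", "c2", "c2", "c2"]
print(cluster_medians(marker, clusters))
```

A heat map such as panel d would repeat this per marker and arrange the resulting cluster-by-marker medians as a matrix.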

The immune cell composition of samples from the same individual at three different time points was remarkably similar, whereas we observed striking interindividual variation (Fig. 1e). Thus, each individual preserved their unique immune fingerprint over time (Fig. 1f).

Distinct immune signatures in Europeans and Africans at baseline

A very distinct European and African immune signature was identified at baseline. Differences were apparent at the lineage level (Extended Data Fig. 2) but, in particular, when single cells within lineages were visualized by t-SNE, several unconventional αβ T cell, γδ T cell and innate lymphoid cell (ILC) subsets/clusters were present in Africans that were largely absent in Europeans (Fig. 2a). For example, subsets of CD8+ natural killer (NK) T cells, γδ T cells (CD8+, CD45RA+, CD127−) and type 2 ILCs (ILC2s) were strikingly less frequent in Europeans than in Africans (Extended Data Figs. 1 and 2 and Supplementary Table 4). The t-SNE plots of adaptive cells reflect expansions of memory CD4+ and CD8+ T cell subsets and increased numbers of CD11c+ B cells in Africans (Extended Data Fig. 2). The type 2 helper T (TH2) cell (CRTH2+) subset, as well as the CD161+CD4+ T cell subset, described as having more effector and pathogenic functions21, were also enriched in Africans. The expanded CD11c+ B cells are also known as atypical memory B cells, which are reported to increase in settings of chronic stimulation, including malaria exposure22.

Fig. 2: Immune signatures of Europeans and Africans before and after P. falciparum sporozoite direct venous inoculation.

a, t-SNE maps illustrating differences between Europeans (n = 5) and Africans (n = 20) at the single-cell level, per immune lineage. Cell density per individual map is indicated by color. b, Heat map showing Spearman’s rank correlation coefficients (rho) between immune cell subsets that differed between Europeans (n = 5) and Africans (n = 20) at baseline (Global test within-test P ≤ 0.05), calculated across all individuals. Subsets in the orange block of positive correlations (top left) were more frequent in Europeans than in Africans, whereas high frequencies of subsets in the lower right block were associated with Africans. Orange indicates a positive correlation, whereas purple indicates a negative correlation. MAIT, mucosal-associated invariant T cell. c, Comparison of time to parasitemia, as determined by TBS, after PfSPZ DVI between Europeans (n = 5) and Africans (n = 20). The P value is from a log-rank (Mantel–Cox) test (chi-squared = 23.62, d.f. = 1). d, Relative changes in response to PfSPZ DVI in frequencies of cell subsets that differed between Europeans (n = 5) and Africans (n = 20), between 1 day before (C−1) and 5 days (D5) or 11 days (D11) after DVI. Subset frequencies were calculated relative to total cells (indicated with T) or relative to their lineage (indicated with L). When the difference was significant relative to both total cells and lineage, this is indicated with T,L, and only the change relative to total cells is shown. Fold changes in these frequencies were calculated as ‘D5/C−1’, ‘D11/D5’ and ‘D11/C−1’. Fold increases are indicated in red and decreases in blue. All subsets with a Global test within-test P ≤ 0.05 are depicted (see Supplementary Table 4 for more statistics). mDC, myeloid dendritic cell. e, Percentage of cell subsets relative to their lineage at baseline (C−1). 
Shown are cell subsets that responded differentially in Europeans (n = 5) and Africans (n = 20) after PfSPZ DVI, as indicated in Fig. 2d. Significance is based on the within-test P values (Supplementary Table 4). *P ≤ 0.05, **P ≤ 0.01 (see also Extended Data Fig. 2c and Supplementary Table 4 for more statistics). f, Volcano plots of gene expression (11,659 genes) (log2 fold change versus −log10 adjusted P value) after PfSPZ DVI in Africans (n = 20) and Europeans (n = 5) between two time points. The top five DEGs are annotated. A false discovery rate (FDR) cutoff of 0.05 with no fold change threshold was used to define DEGs. g, Overrepresented Gene Ontology (GO) Biological Process pathways in Europeans (n = 5) and Africans (n = 20). GO testing was performed with adjustment for length bias. Pathways with a Benjamini–Hochberg (BH)-adjusted P value < 0.05 were uploaded to REViGO to reduce redundancy of pathway terms, using the SimRel semantic similarity measure with small (0.5) allowed similarity. Pathways were categorized manually as either immune response or cell biology and metabolism. The gene/pathway ratio was defined as the number of DEGs in a pathway divided by the total number of genes in that pathway. Overrepresentation analysis was equivalent to a one-sided Fisher’s exact test. ER, endoplasmic reticulum; MHC, major histocompatibility complex.
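The overrepresentation test named in the legend can be sketched as a hypergeometric upper-tail probability, which is the same quantity as a one-sided Fisher’s exact test on the 2 × 2 table of DEG status versus pathway membership. This minimal Python sketch also computes the gene/pathway ratio; it omits the length-bias adjustment and the REViGO reduction, and the pathway size and DEG overlap in the usage line are hypothetical.

```python
from math import comb


def overrepresentation_p(k: int, pathway_size: int, n_degs: int, n_genes: int) -> float:
    """One-sided hypergeometric P value for enrichment of DEGs in a pathway.

    k            -- DEGs that fall in the pathway
    pathway_size -- tested genes annotated to the pathway
    n_degs       -- total number of DEGs
    n_genes      -- total number of tested genes (the background)
    The result equals a one-sided (greater) Fisher's exact test on the 2x2 table.
    """
    upper = min(pathway_size, n_degs)
    tail = sum(
        comb(pathway_size, i) * comb(n_genes - pathway_size, n_degs - i)
        for i in range(k, upper + 1)
    )
    return tail / comb(n_genes, n_degs)


def gene_pathway_ratio(k: int, pathway_size: int) -> float:
    """Number of DEGs in the pathway divided by the number of genes in the pathway."""
    return k / pathway_size


# Hypothetical pathway of 50 genes containing 8 of 473 DEGs among 11,659 tested genes
print(overrepresentation_p(8, 50, 473, 11659), gene_pathway_ratio(8, 50))
```

Exact integer arithmetic is used for the binomial coefficients, so the tail probability is not affected by overflow even for backgrounds of this size.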

In total, the frequencies of 45 clusters differed significantly between Africans and Europeans at baseline (Extended Data Fig. 1 and Supplementary Table 4). Overall, clusters expressing markers known to be associated with a more differentiated state or with inflammation were expanded in Africans compared to Europeans, explaining the differences at the subset level. Examples among innate cells include CD56-expressing CD8+CD45RA+CD127− γδ T cells (clusters 60 and 61), which could represent cells with cytotoxic potential23; NK cells expressing CD16 but no or dim CD56 (clusters 93, 94, 96 and 99), described as terminally differentiated NK cells24; and CD56-expressing monocytes (cluster 146), which are known to increase during inflammatory disorders and to decrease after anti-tumor necrosis factor (TNF) therapy25. Among adaptive cells, the expanded memory CD4+ T cell and B cell subsets in Africans could be accounted for by T cell clusters expressing CD25 (clusters 28 and 29) or CD38 and programmed cell death protein 1 (PD-1) (cluster 21), as well as by CD185+ (CXCR5+) (cluster 116) and CD185−CD11c+ (cluster 136) B cell clusters (Extended Data Fig. 1 and Supplementary Table 4). Notably, a cluster of naive CD8+ T cells associated with Africans was positive for CD161 and KLRG1 (cluster 38)26,27. Altogether, the higher frequency of clusters expressing activation/differentiation/pro-inflammatory markers in Africans reflects exposure to micro-organisms and parasites that can lead to an inflammatory environment and a state of chronic stimulation7,28,29.

Additionally, Spearman’s rank correlations computed between the baseline percentages of cell subsets across all individuals (Fig. 2b) revealed clear African and European immune profiles, reflecting the large phenotypic differences across both the adaptive and innate immune compartments.
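As a reminder of what the heat map in Fig. 2b summarizes, Spearman’s rho is the Pearson correlation computed on ranks, with tied values given their average rank. The sketch below is a minimal stdlib-only Python illustration of that definition applied to hypothetical per-individual subset percentages; it computes neither P values nor the block ordering of the heat map.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for idx in order[i:j + 1]:
            ranks[idx] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5  # undefined if either input is constant


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlation_matrix(freqs):
    """freqs maps subset name -> per-individual baseline percentages (same order)."""
    names = list(freqs)
    return {(a, b): spearman_rho(freqs[a], freqs[b]) for a in names for b in names}


# Hypothetical baseline percentages for four individuals
example = {"subset_A": [1.0, 2.5, 3.1, 4.0], "subset_B": [0.9, 0.7, 0.4, 0.2]}
print(correlation_matrix(example))
```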

Differential dynamic changes in immune responses of Europeans and Africans upon PfSPZ DVI

Upon PfSPZ DVI, all Europeans and 12 of 20 Africans (60%) developed parasitemia detectable by thick blood smear (TBS) microscopy within the 28-d study period (Fig. 2c, Extended Data Fig. 3 and Supplementary Table 1)15. Europeans exhibited blood-stage parasitemia within 12–14 d, whereas Africans who became TBS+ did so at later time points (geometric mean 17.9 d, range 13–25 d), indicating varying degrees of immunity to liver- and blood-stage parasites.
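The Fig. 2c comparison uses a log-rank (Mantel–Cox) test, in which individuals who remain TBS− at the end of follow-up are right-censored. A minimal two-group Python sketch of the statistic follows; censoring at day 28 and the toy times are assumptions for illustration, not the study data.

```python
import math


def logrank_chi2(times_a, events_a, times_b, events_b):
    """Two-group log-rank (Mantel-Cox) chi-squared statistic with 1 d.f.

    times_*  -- follow-up time per individual (days)
    events_* -- True if parasitemia was detected, False if right-censored
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    o_minus_e = 0.0  # observed minus expected events in group a
    variance = 0.0
    for t in sorted({ti for ti, e, _ in data if e}):
        n = sum(1 for ti, _, _ in data if ti >= t)               # at risk, both groups
        n_a = sum(1 for ti, _, g in data if g == 0 and ti >= t)  # at risk, group a
        d = sum(1 for ti, e, _ in data if e and ti == t)         # events, both groups
        d_a = sum(1 for ti, e, g in data if e and g == 0 and ti == t)
        o_minus_e += d_a - d * n_a / n
        if n > 1:  # hypergeometric variance of d_a at this event time
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("no informative event times")
    return o_minus_e ** 2 / variance


def chi2_1df_pvalue(chi2):
    """Upper-tail P value of a chi-squared statistic with 1 d.f."""
    return math.erfc(math.sqrt(chi2 / 2))


# Toy data (not the study data); TBS-negative individuals censored at day 28
group_1 = ([12, 13, 13, 14, 14], [True] * 5)
group_2 = ([15, 17, 20, 25, 28, 28], [True, True, True, True, False, False])
chi2 = logrank_chi2(*group_1, *group_2)
print(chi2, chi2_1df_pvalue(chi2))
```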

The responses of Europeans and Africans to PfSPZs also differed over time: regulatory T (Treg) cells and ILC2s increased in Africans by day 5 (D5), whereas in Europeans this increase occurred later, between D5 and day 11 (D11) after DVI (Fig. 2d). The ILC2, effector memory (EM) CD4+ T cell and CD127− (interleukin (IL)-7Rα−) γδ T cell subsets, which were less frequent in Europeans at baseline (C−1, 1 day before DVI) (Fig. 2e and Extended Data Fig. 2), increased at this later time point. In the same period, classical monocytes showed a stronger increase, and CD141hi (BDCA3) myeloid dendritic cells a much stronger decrease, in frequency in Europeans than in Africans (Fig. 2d and Supplementary Table 4).
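The Fig. 2d legend defines frequencies relative to total cells (T) or to the lineage (L) and three fold changes between time points. The hypothetical counts below (not study data) show, in a minimal Python sketch, that the same cell counts can give different fold changes depending on the denominator.

```python
def relative_frequency(subset_count, reference_count):
    """Subset percentage relative to total cells ('T') or to its lineage ('L')."""
    return 100.0 * subset_count / reference_count


def fold_changes(freq):
    """Fold changes between time points, as defined in the Fig. 2d legend.

    freq maps time point ('C-1', 'D5', 'D11') to a subset frequency (%).
    """
    return {
        "D5/C-1": freq["D5"] / freq["C-1"],
        "D11/D5": freq["D11"] / freq["D5"],
        "D11/C-1": freq["D11"] / freq["C-1"],
    }


# Hypothetical counts for one individual: (ILC2 cells, all ILCs, all PBMCs)
counts = {"C-1": (40, 800, 400_000), "D5": (40, 800, 400_000), "D11": (60, 900, 400_000)}
rel_total = {tp: relative_frequency(c[0], c[2]) for tp, c in counts.items()}
rel_lineage = {tp: relative_frequency(c[0], c[1]) for tp, c in counts.items()}
print("relative to total:", fold_changes(rel_total))      # D11/C-1 of about 1.5
print("relative to lineage:", fold_changes(rel_lineage))  # D11/C-1 of about 1.33
```

Because the ILC lineage itself grows in this toy example, the lineage-relative fold change is smaller than the total-relative one, which is why the legend distinguishes T from L.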

RNA-seq analysis of the whole-blood transcriptome from parallel samples showed that PfSPZ DVI resulted in 473 differentially expressed genes (DEGs) over time unique to Africans and 108 DEGs unique to Europeans, with only 15 DEGs shared between the groups. In line with the mass cytometry data, the changes in gene expression occurred predominantly between C−1 and D5 in Africans and between D5 and D11 in Europeans (Fig. 2f). Pathway overrepresentation analysis emphasized the distinct responses of Africans and Europeans to PfSPZ DVI (Extended Data Fig. 2). In Africans, the upregulated pathways were mostly associated with cell biology and metabolism (cellular shape regulation, actin filaments, oxidative phosphorylation) or with immune functions (Fc receptor and platelet activation), whereas in Europeans, the most highly upregulated immune pathways involved responses to interferons (Fig. 2g).
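DEGs here were defined by a Benjamini–Hochberg FDR below 0.05 with no fold-change threshold (Fig. 2f legend), and the unique and shared DEG counts are set differences and intersections. The minimal Python sketch below implements the standard BH step-up adjustment on hypothetical P values for placeholder genes; it does not reproduce the upstream differential expression model, which is not described in this section.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg (step-up) adjusted P values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest P value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_degs(pvalues_by_gene, fdr=0.05):
    """Genes with BH-adjusted P below `fdr`; no fold-change threshold is applied."""
    genes = list(pvalues_by_gene)
    adjusted = benjamini_hochberg([pvalues_by_gene[g] for g in genes])
    return {g for g, q in zip(genes, adjusted) if q < fdr}


# Hypothetical raw P values for five placeholder genes in two groups
degs_group_1 = call_degs({"g1": 0.001, "g2": 0.004, "g3": 0.03, "g4": 0.2, "g5": 0.9})
degs_group_2 = call_degs({"g1": 0.002, "g2": 0.6, "g3": 0.001, "g4": 0.01, "g5": 0.5})
print("unique to group 1:", degs_group_1 - degs_group_2)
print("unique to group 2:", degs_group_2 - degs_group_1)
print("shared:", degs_group_1 & degs_group_2)
```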

Mass cytometry provided further insight into how the immune system responds to P. falciparum by identifying 23 clusters that changed differentially in Europeans and Africans after PfSPZ DVI (Extended Data Fig. 1 and Supplementary Table 4). For example, malaria infection led to the activation of CD161+CD4+ T cells in Europeans, as evident from the increase in CD25+ cells (cluster 16), in line with the higher frequencies of CD161+CD4+ T cells in pre-exposed Africans at baseline (Fig. 2e). Similarly, the increase in CD45RA+ and CD127− γδ T cell subsets in Europeans could be accounted for by an increase in the CD16+CD161+ cluster 70, which at baseline was higher in Africans. Some changes at the cluster level that were not reflected at the subset level were striking, exemplified by a strong decrease in NK cell cluster 107 (CD16−KLRG1−) in Europeans. This could reflect the relocation of these cells to tissues or their differentiation, in line with NK cells being among the first innate cells to respond to malaria parasites and help activate adaptive responses30. Notably, NK cell cluster 107 was already significantly lower in the peripheral blood of pre-exposed Africans. Therefore, in malaria-naive individuals, P. falciparum infection seems to drive an immune profile that resembles the one seen at baseline in pre-exposed Africans.

High-resolution immune signatures at baseline and following PfSPZ DVI in Africans related to control of parasitemia

In contrast to Europeans, some Africans were able to control parasitemia following PfSPZ DVI, remaining negative by microscopy (TBS− Africans) and thus demonstrating strong naturally acquired immunity. Although the Africans who became parasitemic (TBS+ Africans) largely did so at later time points than naive Europeans, indicating some degree of immunity (Fig. 2c), their inability to control infection distinguishes them from TBS− individuals. The visualization of immune cells by t-SNE shows distinct patterns between TBS− and TBS+ Africans (Fig. 3a). Indeed, at the subset level, TBS− Africans were characterized by fewer CD8+ NKT cells but more CD161+CD4+ T cells and double-negative (DN) T cells than TBS+ Africans (Fig. 3b and Supplementary Table 4).

Fig. 3: High-resolution immune signatures at baseline and following PfSPZ DVI in Africans related to parasitemia control.

a, t-SNE maps illustrating the phenotypic differences between TBS+ (n = 12) and TBS− (n = 8) Africans at the single-cell level, per lineage. Cell density per individual map is indicated by color. b, Percentage of cell subsets relative to their lineage at baseline (C−1). Shown are cell subsets that differed between TBS+ (n = 12) and TBS− (n = 8) Africans at baseline or that responded differentially after PfSPZ DVI (Fig. 3f). Significance is based on Global test within-test P values (see Supplementary Table 4 for more statistics). *P ≤ 0.05. c, Survival graph showing the time until Africans (n = 20) developed parasitemia, as determined by TBS, according to the frequency of PD-1+CD161+ cluster 21 relative to central memory (CM) CD4+ T cells at baseline (C−1). Africans were split at the median frequency into two groups: the top half (red), with a high frequency of cluster 21, and the bottom half (blue). This grouping was used as the covariate in a univariate Cox regression; the P value shown is based on the score (log-rank) test, with a likelihood ratio test of 6.94, d.f. = 1, n = 20, 12 events. The box plot shows the median and first and third quartiles of the respective cluster for TBS+ Africans (n = 12) and TBS− Africans (n = 8). Whiskers extend to the maximum/minimum of the respective groups, but no further than 1.5 × interquartile range (IQR). All points are shown on the box plot. d, Survival graph as in c, but showing the time until Africans developed parasitemia according to the frequency of KLRG1+ cluster 197 relative to DN T cells at baseline (C−1). Likelihood ratio test of 7.58, d.f. = 1, n = 20, 12 events. e, Survival graph as in c, but showing the time until Africans developed parasitemia according to the frequency of KLRG1+ cluster 63 relative to CD8+ NKT cells at baseline (C−1). Likelihood ratio test of 5.39, d.f. = 1, n = 20, 12 events. 
f, Relative changes in response to PfSPZ DVI in frequencies of cell subsets that differed between TBS+ (n = 12) and TBS− (n = 8) Africans, comparing baseline (C−1) with D5 or D11 after DVI. All subsets with a Global test within-test P ≤ 0.05 are shown (Supplementary Table 4). See Extended Data Fig. 4 for individual data points.

More in-depth profiling of cell populations revealed differences in 18 clusters (Extended Data Fig. 1 and Supplementary Table 4), 6 of which explained the observations at the subset level. Thus, in TBS− Africans, the higher percentage of CD161+CD4+ T cells could be explained by an expansion of cells expressing PD-1 (CD279) (cluster 21), whereas the expanded DN T cells seemed depleted of clusters showing little expression of differentiation/activation markers (clusters 10 and 197). Moreover, the CD8+ NKT cell subset, which was lower in the peripheral blood of TBS− Africans and thus might reside in tissues or secondary lymphoid organs, showed an increase in clusters expressing KLRG1 (clusters 63, 75 and 103), a marker reported for long-lived invariant NKT cells31. Together, these results indicate that cells in a more activated/differentiated state are associated with parasite control. Kaplan–Meier plots show that higher frequencies of the PD-1+CD161+ cluster 21 (Fig. 3c), lower frequencies of cluster 197 (with virtually no other activation/differentiation markers) (Fig. 3d) and higher frequencies of the KLRG1+ cluster 63 (Fig. 3e) at baseline were associated with a higher probability of remaining TBS−.
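The Kaplan–Meier curves in Fig. 3c–e compare Africans split at the median cluster frequency. A minimal Python sketch of the product-limit estimator and of a median split is given below; how ties at the median were assigned is not stated, so sending them to the "low" group is an assumption, as are the toy frequencies and times. The Cox regression itself is not reproduced.

```python
from statistics import median


def kaplan_meier(times, events):
    """Product-limit estimate of the probability of remaining TBS-negative.

    Returns (time, survival) at each time with at least one detected event;
    events[i] is True if parasitemia was detected, False if censored.
    """
    survival = 1.0
    steps = []
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = sum(1 for ti in times if ti >= t)
        d = sum(1 for ti, e in zip(times, events) if e and ti == t)
        survival *= 1 - d / at_risk
        steps.append((t, survival))
    return steps


def median_split(freqs):
    """Label values above the median 'high' and the rest 'low' (tie rule assumed)."""
    cut = median(freqs)
    return ["high" if f > cut else "low" for f in freqs]


# Toy values (not study data): cluster frequency, day of TBS positivity or censoring
freqs = [0.5, 1.2, 0.8, 2.0, 1.5, 0.3]
times = [14, 28, 17, 28, 28, 13]
events = [True, False, True, False, False, True]
groups = median_split(freqs)
for label in ("high", "low"):
    idx = [i for i, g in enumerate(groups) if g == label]
    # An empty list means no events in that group, i.e. survival stays at 1.0
    print(label, kaplan_meier([times[i] for i in idx], [events[i] for i in idx]))
```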

Following PfSPZ DVI, four cell subsets changed differentially in Africans who controlled their parasites (TBS−) and those who developed parasitemia (TBS+) (Fig. 3f, Extended Data Fig. 4 and Supplementary Table 4). In TBS− Africans, an increase was seen in terminally differentiated effector memory (EMRA) CD4+ T cells as well as in γδ T cells, which would be in line with the possibility that these cells reside in peripheral organs, poised to respond rapidly to infection and thereafter move out into peripheral blood. In parallel, a marked decrease was seen in the percentage of plasmacytoid dendritic cells (pDCs), which, by moving out of the blood into tissues and/or secondary lymphoid organs, can elicit further T cell responses.

At the cluster level, changes were seen in 12 clusters (Supplementary Table 4). The differential change in γδ T cells between baseline and 11 d after DVI might be due to cluster 12 (of CD127+ γδ T cells), which notably expresses KLRG1 and increased in TBS− Africans but decreased in TBS+ Africans (Extended Data Fig. 1 and Supplementary Table 4). Additional analysis with a panel containing Vδ2 showed that both this cluster and γδ T cell cluster 79, which was associated with protection in TBS− Africans at baseline, seem to encompass Vδ2+ γδ T cells (Extended Data Fig. 5). Vδ2+ γδ T cells have been associated with protection but can also become dysfunctional with repeated exposure, resulting in clinical tolerance to malaria32. Furthermore, in TBS− Africans, cluster 187, which comprises CCR7-expressing (CD197) cells within pDCs, decreased between C−1 and D5, whereas cluster 26, expressing PD-1 and KLRG1 within EM CD4+ T cells, increased later, between D5 and D11. This could indicate that activated pDCs leave the peripheral blood and contribute to the stimulation of EM CD4+ T cells, which then expand and appear in peripheral blood.

The CD8+ NKT cell and DN T cell subsets, which distinguished TBS− from TBS+ Africans at baseline, also changed in frequency after DVI. CD8+ NKT cells, which were less frequent at baseline, increased more strongly (P = 0.091) in the peripheral blood of TBS− Africans between days 5 and 11, suggesting that these cells reside in the liver33 or secondary lymphoid organs, respond rapidly to malaria infection and thereafter leave to enter peripheral blood. Within the DN T cell subset, which was more frequent at baseline, the KLRG1-expressing cluster 66 (KLRG1+ DN T cells) increased in TBS− Africans within the first 5 d after PfSPZ DVI (Supplementary Table 4), indicating an early response in those able to control parasitemia.

The RNA-seq data showed that PfSPZ DVI resulted in 236 DEGs over time unique to TBS− Africans and only 42 DEGs unique to TBS+ Africans; thus, changes in gene expression occurred predominantly in TBS− Africans (Fig. 4a). The pathways enriched in TBS− Africans (Fig. 4b) reflected activation (cellular metabolism or response to intracellular organisms) but also included pathways that could explain the strong changes seen in pDCs relative to the monocyte and DC lineage (myeloid cell development) (Fig. 3f at the subset level and Supplementary Table 4 for cluster 187). Gene set enrichment analysis identified two out of eight interferon (IFN)-γ-related pathways (GO:0060334, Bonferroni-adjusted P = 0.015; GO:0060330, Bonferroni-adjusted P = 0.015) as enriched in TBS− Africans at day 5. The few DEGs in TBS+ Africans were not enriched for any identifiable pathway, but a decrease in IFNG expression was seen in this group from C−1 to D5 (Fig. 4b).

Fig. 4: Gene expression in blood of Africans after PfSPZ DVI.

a, Volcano plots of gene expression (log2 fold change versus −log10 adjusted P value) after PfSPZ DVI in TBS− Africans (n = 8) between baseline (C−1) and D5. The top five DEGs are annotated. DEGs were defined by a two-sided BH-adjusted P value < 0.05, without a fold change threshold. b, Overrepresented GO Biological Process pathways in TBS+ (n = 12) and TBS− (n = 8) Africans, together with a circular graph depicting the genes that were differentially up- or downregulated between two time points in TBS+ and TBS− Africans.

Taken together, CHMI made it possible to associate subsets and clusters identified as CD4+ T cells (in particular, EM and CD161+CD4+ T cells), NKT cells, DN T cells and γδ T cells, as well as pDCs, with naturally acquired immunity. Notably, these cells often exhibited differentiation/activation markers that might indicate stronger effector responses for control of parasitemia in Africans with lifelong residence in malaria-endemic regions. In line with this, transcriptome analysis revealed changes in a set of genes indicative of early cellular activation in TBS− Africans only.

Cytokine production in response to Plasmodium falciparum–infected red blood cells

To test immune cell functionality, PBMCs were stimulated with P. falciparum-infected red blood cells (PfRBCs), and cytokine-producing CD4+ T cells, CD8+ T cells, DN T cells, NKT cells and γδ T cells were analyzed by flow cytometry (Extended Data Fig. 6 and Supplementary Table 5). IL-17 and IL-2 responses, although detectable after stimulation with staphylococcal enterotoxin B, were negligible in response to PfRBCs. PfRBC-specific IFN-γ and TNF responses were readily observed (Extended Data Fig. 6). The highest proportions of TNF- and IFN-γ-producing cells were found among γδ T cells and NKT cells in both Europeans and Africans, indicating that the associations of these cells with varying degrees of immunity to P. falciparum can have functional consequences. Even though innate immune cells showed the largest cytokine responses to PfRBCs, an adaptive CD4+ T cell response was clearly measurable, whereas we were unable to detect antigen-specific cytokine-producing CD8+ T cells (Fig. 5a). The CD4+ T cell response to P. falciparum antigen was characterized by higher IFN-γ (Fig. 5b) and TNF (Fig. 5c) responses in Africans than in Europeans, with the highest responses seen in TBS− Africans, who control their infection (Fig. 5d). Consistent with the mass cytometry data, lower frequencies of TNF+CD8+ NKT cells were found in TBS− Africans at baseline (Fig. 5e).

Fig. 5: Cytokine production in response to Plasmodium falciparum–infected red blood cells.

a, Baseline frequencies of the PfRBC-specific IFN-γ response by CD8+ T cells in Europeans (n = 5) and Africans (n = 20). All cytokine responses are shown after subtraction of the background response to uninfected red blood cells. The median of each group is displayed as a horizontal line. b, Baseline frequencies of the PfRBC-specific IFN-γ response by CD4+ T cells in Europeans and Africans. P value from two-sided Wilcoxon rank-sum test with W = 17. c, Baseline frequencies of the PfRBC-specific TNF response by CD4+ T cells in Europeans and Africans. P value from two-sided Wilcoxon rank-sum test with W = 18. d, Baseline frequencies of the PfRBC-specific IFN-γ response by CD4+ T cells in TBS+ and TBS− Africans. P value from two-sided Wilcoxon rank-sum test with W = 106. e, Baseline frequencies of the PfRBC-specific TNF response by CD8+ NKT cells in TBS+ and TBS− Africans. P value from two-sided Wilcoxon rank-sum test with W = 23. f, Changes in frequency of TNF-producing terminally differentiated EMRA CD4+ T cells in TBS− Africans from D5 to D11 after PfSPZ DVI. P value from two-sided Wilcoxon signed-rank test with W = −22.
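Responses in Fig. 5 are background-subtracted and compared with Wilcoxon tests. Software reports the rank-sum W under different conventions (the Mann–Whitney U form or the raw rank sum), so the minimal Python sketch below, which returns the U form with average ranks for ties, is not guaranteed to match the convention behind the legend values; P values are omitted and all inputs are hypothetical.

```python
def background_subtract(stimulated, unstimulated):
    """PfRBC-specific response: % cytokine+ cells with PfRBCs minus % with uninfected RBCs."""
    return [s - u for s, u in zip(stimulated, unstimulated)]


def rank_sum_w(x, y):
    """Mann-Whitney form of W for sample x: pooled rank sum of x minus n_x(n_x + 1)/2.

    Ties receive average ranks. Other software reports the raw rank sum instead.
    """
    pooled = sorted(list(x) + list(y))

    def avg_rank(v):
        first = pooled.index(v) + 1         # 1-based position of first occurrence
        last = first + pooled.count(v) - 1  # 1-based position of last occurrence
        return (first + last) / 2

    n_x = len(x)
    return sum(avg_rank(v) for v in x) - n_x * (n_x + 1) / 2


# Hypothetical % IFN-γ+ CD4+ T cells (PfRBC and uninfected RBC wells) for two groups
group_1 = background_subtract([0.30, 0.25, 0.40], [0.05, 0.10, 0.05])
group_2 = background_subtract([0.90, 0.60, 0.75, 1.10], [0.10, 0.05, 0.15, 0.10])
print(rank_sum_w(group_1, group_2))
```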

Considering responses over time, EMRA CD4+ T cells, which increased in frequency between D5 and D11 in TBS− Africans, showed not an increase but a decrease (P = 0.078) in TNF response from D5 to D11 (Fig. 5f). This could reflect a degree of hyporesponsiveness following their earlier activation upon infection. A similar dysfunction has been described upon repeated malaria infection in children34,35.

Altogether, by examining cytokine production in response to P. falciparum-infected red blood cells, we were able to demonstrate the functionality of cell types that mass cytometry had associated with naturally acquired immunity to P. falciparum.

Multi-omics implicates IFN-γ-producing CD161+CD4+ T cells in natural protection against malaria infection

To understand which cell clusters, genes and antigen-specific responses were most strongly associated with naturally acquired resistance to malaria, the machine-learning algorithm DIABLO was used to classify individuals and identify discriminant features (see Extended Data Fig. 7 and Methods for details on model selection and performance)36. Feature selection was performed using Lasso-like regularization within each fold of the cross-validation, and consensus features present in at least half of the folds were retained. At baseline, comparing Europeans and Africans, three CyTOF clusters, two cellular responses and six genes were consistently included in the model (Fig. 6a). These included the naive CD4+ T cell cluster 3, the naive CD8+ T cell cluster 34, the NKT cell cluster 85 and the genes CASP7, encoding caspase-7, and CD180. We then assessed which features were most important for predicting naturally acquired resistance to malaria by comparing TBS+ and TBS− Africans at baseline (Fig. 6b). The CD161+CD4+ T cell cluster 21 and IFN-γ production by total CD4+ T cells and EM CD4+ T cells were retained in all folds. In addition, cluster 173 of nonclassical monocytes, the gene encoding cathepsin C (CTSC) and cytokine production by CD8+ γδ T cells and NKT cells were also consistently selected.
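The consensus rule described above, retaining features selected in at least half of the cross-validation folds, can be expressed in a few lines. This Python sketch assumes the per-fold selections are already available (DIABLO itself, available in the mixOmics R package, is not reproduced), and the feature names and fold contents are placeholders, not the study's results.

```python
from collections import Counter


def consensus_features(selected_per_fold, min_fraction=0.5):
    """Features chosen in at least `min_fraction` of folds, with their selection frequency."""
    n_folds = len(selected_per_fold)
    counts = Counter(f for fold in selected_per_fold for f in set(fold))
    return {f: c / n_folds for f, c in counts.items() if c / n_folds >= min_fraction}


# Placeholder per-fold selections (not the study's results)
folds = [
    {"feat_a", "feat_b", "feat_c"},
    {"feat_a", "feat_b"},
    {"feat_a", "feat_b", "feat_d"},
    {"feat_a", "feat_c"},
]
for feature, freq in sorted(consensus_features(folds).items(), key=lambda kv: -kv[1]):
    print(feature, freq)
```

Features retained in every fold (frequency 1.0) correspond to the "retained in all folds" category in the text; the bar charts in Fig. 6a,b would display the retained features.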

Fig. 6: Integrative data analysis.

a, Bar charts showing consensus features included in DIABLO machine learning to classify Africans versus Europeans. The most important CyTOF clusters (green), genes (RNA-seq, blue) and cellular responses (intracellular cytokine staining (ICS), red) for discriminating between Africans and Europeans at baseline are shown. Features were selected in every fold of cross-validation, and only features retained in at least 50% of folds are depicted. b, Bar charts showing the consensus features included in DIABLO machine learning to classify TBS+ and TBS− Africans. The most important CyTOF clusters (green), genes (RNA-seq, blue) and cellular responses (ICS, red) for discriminating between TBS+ and TBS− Africans at baseline are shown. Features were selected in every fold of cross-validation, and only features retained in at least 50% of folds are depicted. c, Arrow plots showing projection onto the latent space of the full model for all datasets, including outcome. For each individual, a centroid is shown, with arrows pointing to where the CyTOF, RNA, ICS and outcome data are projected for that individual. Shorter arrows indicate better alignment between datasets. Africans and Europeans are depicted in pink and blue, respectively. Two components were used for the latent space. d, Arrow plots showing the projection onto the latent space of the full model for all datasets, including outcome. For each individual, a centroid is shown, with arrows pointing to where the CyTOF, RNA, ICS and outcome data are projected for that individual. TBS+ and TBS− Africans are depicted in green and orange, respectively. e, Correlation matrix of baseline levels between all 22 scaled features that were most strongly associated with either ethnicity or TBS outcome. Nonsignificant correlations (Pearson test) were set to 0. Hierarchical clustering was performed on the Euclidean distance using complete linkage. f, Baseline levels of important genes, CyTOF clusters and cellular responses per group. 
Mean and s.e.m. are shown for z-scaled levels for Europeans (blue), TBS+ Africans (orange) and TBS− Africans (green). g, Differences in frequency of the PfRBC-specific IFN-γ response between CD161−CD4+ T cells and CD161+CD4+ T cells of Africans (n = 6) from this study. Background responses to uninfected red blood cells were subtracted from the cytokine response data. P value from a two-sided paired Student’s t-test (T = 3.316, d.f. = 5). h, Antibody reactivity of TBS+ and TBS− African individuals to a Plasmodium protein microarray, showing the reactivities at baseline (C−1) to five antigens associated with TBS− Africans (P < 0.05 and fold change > 2 in mean signal intensity (SI), annotated). Statistics are based on a two-sided Welch-corrected Student’s t-test. i, Correlation between antibody reactivity and abundance of total CD4+ T cells at C−1 for TBS+ (n = 12) and TBS− (n = 8) Africans. The y axis represents the cell subset at baseline as a percentage of total cells. The x axis represents the relative binding antibody value to the measured antigens. The Pearson correlation and its corresponding two-sided P value are reported for each antigen, without adjustment for multiple comparisons. The black line represents a fitted linear model (y ~ x), and the shaded gray area is the 95% confidence interval. j, Correlation as in i, between antibody reactivity and abundance of CD161+CD4+ T cells.

The three datasets and outcomes aligned well in the latent space (Fig. 6c,d), and we therefore correlated baseline levels of the predictive genes, CyTOF clusters and cellular responses (Fig. 6e). This revealed three groups of features, one of which was associated with Europeans and consisted of naive CD4+ and CD8+ T cell clusters and NKT cell cluster 85, as well as the genes encoding homeobox protein Nkx-3.1 (NKX3-1), actin-related protein 2/3 complex inhibitor (ARPIN), olfactomedin 1 (OLFM1) and leucine-rich repeat containing 6 (LRRC6) (Fig. 6e,f).

A second cluster, associated with resistance to parasitemia, consisted of the protective CD161+CD4+ T cell cluster 21 and IFN-γ-producing (EM) CD4+ T cells (Fig. 6e,f). Indeed, CD161+CD4+ T cells showed a greater capacity to produce IFN-γ upon PfRBC stimulation than CD161−CD4+ T cells (Fig. 6g). Although the genes CASP7 and CD180 clustered together with CD161+CD4+ T cells and IFN-γ production, these genes were increased in Africans relative to Europeans but were not associated with resistance to parasitemia (Fig. 6e,f).
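The paired comparison behind Fig. 6g (two-sided paired Student's t-test, reported as T = 3.316 with d.f. = 5) can be sketched as follows. The per-donor input lists are hypothetical placeholders for background-subtracted IFN-γ+ frequencies in CD161−CD4+ and CD161+CD4+ T cells; the P value is computed from the t distribution via a regularized incomplete beta function rather than a statistics library.

```python
import math
import statistics


def _betacf(a, b, x, max_iter=200, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (tiny if abs(d) < tiny else d)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),              # even step
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):   # odd step
            d = 1.0 + aa * d
            d = 1.0 / (tiny if abs(d) < tiny else d)
            c = 1.0 + aa / c
            c = tiny if abs(c) < tiny else c
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def two_sided_t_pvalue(t, df):
    """P(|T| >= |t|) = I_{df/(df + t^2)}(df/2, 1/2) for Student's t."""
    return _betai(df / 2.0, 0.5, df / (df + t * t))


def paired_t_test(before, after):
    """Two-sided paired Student's t-test on per-donor differences (after - before)."""
    diffs = [b - a for a, b in zip(before, after)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    df = n - 1
    return t, df, two_sided_t_pvalue(t, df)
```

With six donors the test has five degrees of freedom, matching the d.f. reported in the caption; `two_sided_t_pvalue(3.316, 5)` falls between 0.02 and 0.025.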

The identification of CD161+CD4+ T cells as a prominent feature associated with control of parasitemia prompted us to investigate their functional relevance further. To this end, we probed a protein microarray containing 228 unique P. falciparum antigens with sera from study participants and identified baseline antibody reactivities to five proteins that were associated with parasite control in the TBS− group (Fig. 6h). Whereas total CD4+ T cell frequencies did not correlate with these reactivities (Fig. 6i), baseline frequencies of CD161+CD4+ T cells correlated with two of them and showed a trend for two more (Fig. 6j). This suggests a possible mechanism through which CD161+CD4+ T cells could contribute to the control of parasitemia.
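The antigen selection rule given in the Fig. 6h legend (two-sided Welch-corrected t-test P < 0.05 and a fold change > 2 in mean signal intensity) can be sketched as below. The antigen names and SI values are hypothetical, and the sketch assumes the fold change is taken as mean SI in TBS− divided by mean SI in TBS+ Africans; the original analysis pipeline is not reproduced here.

```python
import math
import statistics


def _betacf(a, b, x, max_iter=200, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (tiny if abs(d) < tiny else d)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),              # even step
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):   # odd step
            d = 1.0 + aa * d
            d = 1.0 / (tiny if abs(d) < tiny else d)
            c = 1.0 + aa / c
            c = tiny if abs(c) < tiny else c
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def welch_t_test(x, y):
    """Two-sided Welch (unequal-variance) t-test; returns (t, df, P)."""
    nx, ny = len(x), len(y)
    vx, vy = statistics.variance(x) / nx, statistics.variance(y) / ny
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx + vy)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df, _betai(df / 2.0, 0.5, df / (df + t * t))


def select_antigens(si_tbs_neg, si_tbs_pos, alpha=0.05, min_fold=2.0):
    """Antigens with Welch P < alpha and mean-SI fold change (TBS- / TBS+) > min_fold."""
    selected = []
    for antigen, neg in si_tbs_neg.items():
        pos = si_tbs_pos[antigen]
        fold = statistics.mean(neg) / statistics.mean(pos)
        _, _, p = welch_t_test(neg, pos)
        if p < alpha and fold > min_fold:
            selected.append(antigen)
    return selected
```

Requiring both criteria discards antigens with a large but highly variable difference, as well as antigens that differ significantly but only modestly.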

Taken together, a machine-learning approach identified a minimal signature of preinoculation features that was predictive of naturally acquired protection against P. falciparum upon CHMI; this signature included CD161+CD4+ T cells as well as CD4+ T cell IFN-γ production upon stimulation. Although validation in an independent cohort is required, this signature could help guide the development of vaccines with high protective potential in endemic areas.

Discussion

Mass cytometry revealed distinct European and African immune signatures: Africans showed an enrichment of memory cells and expression of activation/differentiation markers such as CD25, CD161, KLRG1 and PD-1, not only on adaptive but also on innate immune cells. These distinct immunological patterns likely reflect differences in the burden of exposure to micro-organisms and parasites between Europe and Africa.

By collecting samples 5 days after PfSPZ DVI, we were able to observe a rapid increase in Treg cells as well as ILC2s in Africans, whereas in Europeans a later increase was seen in a number of adaptive and innate cells in peripheral blood. Transcriptomics of parallel whole blood samples supported a rapid response to malaria in Africans and a delayed response in Europeans, but also highlighted the very different pathways that were activated in these two groups. For example, ‘myeloid development’ and ‘platelet aggregation’ typified the immunological pathways activated early in Africans. An early increase in platelet-activation-pathway genes has recently been reported in Tanzanian adults undergoing malaria challenge37 and might represent a response to antibodies present in pre-exposed individuals. Interestingly, a delayed but strong type-I IFN signature was seen in naive Europeans. The stronger increase in γδ T cells between days 5 and 11 in Europeans, in line with earlier studies of CHMI in malaria-naive individuals14, as well as the stronger increase in EM CD4+ T cells and classical monocytes in these volunteers, is consistent with the ability of type-I IFN to drive the activation and recruitment of a network of cells that can mediate malaria-induced inflammation38,39. In this regard, the early increase in Treg cells and ILC2s seen in Africans might represent the initiation of a distinct, anti-inflammatory response that is absent in Europeans.

Comparison of African volunteers who were susceptible to infection (TBS+) with those who controlled parasitemia (TBS−) demonstrated that naturally acquired immunity was associated with a cellular profile involving subsets and clusters within CD4+ T cells, NKT cells, DN T cells and γδ T cells as well as pDCs. These cells often exhibited differentiation/activation markers, which might indicate stronger effector responses capable of controlling parasitemia. In particular, a high frequency of CD161+CD4+ T cells was a notable feature of the baseline immune profile associated with the control of P. falciparum. This parallels a stronger cytokine response to Plasmodium antigen by CD4+ T cells of the TBS− group, reflected in a higher frequency of IFN-γ-producing cells among antigen-stimulated CD161+CD4+ T cells than among CD161−CD4+ T cells. RNA-seq analysis of samples from TBS− Africans showed a rapid alteration of transcriptional profiles upon DVI, mirroring the strong changes seen in pDCs and CD4+ T cells by CyTOF. In addition, a rapid increase in IFN-γ-related pathways was observed early after PfSPZ DVI in protected participants only. CD4+ T cell responses have been associated with protection when chemo-attenuated malaria parasites were used to vaccinate European volunteers14. Our study identifies the CD161+ subset of CD4+ T cells as a correlate of protection in individuals with lifelong malaria exposure. One way in which CD161+CD4+ T cells might contribute to parasite control is by promoting antibody responses. Using malaria parasite protein arrays, we identified antigen-specific antibody reactivities associated with parasite control, some of which correlated with CD161+CD4+ T cell frequencies.

The role of innate immune cells in immune memory is of increasing interest now that there is considerable evidence for trained immunity31,40. NKT cells have been reported to enhance immunity to malaria infection in animal models41, and here we identified KLRG1-expressing CD8+ NKT cells as players in naturally acquired immunity in humans. In addition, Vδ2+ γδ T cells were found to be associated with stronger parasite control. Finally, a γδ T cell subset and γδ T cell clusters expressing KLRG1 increased 11 d after PfSPZ DVI only in the TBS− group. Given the residence of γδ T cells in the liver42 and their importance in parasite clearance42,43, our data support the hypothesis that γδ T cells activated by P. falciparum traffic from the liver as part of effective immune responses in individuals with naturally acquired immunity.

Integrative machine learning identified genes, cell types and responses strongly associated with immunity to malaria. It revealed that expression of the genes CASP7 and NKX3-1 was altered in Africans relative to Europeans. These genes are involved in p53 signaling, which has recently been shown to play a role in malaria-induced inflammation44,45,46. Integrative machine learning also provided further evidence that CD161+CD4+ T cells producing IFN-γ are a correlate of naturally acquired protection against malaria infection. To date, malaria vaccines have aimed to induce strong memory T cell responses, based on compelling evidence for the importance of CD8+ T cells, which can be helped by CD4+ T cells47. Our findings regarding the role of CD161+CD4+ T cells, along with emerging evidence for their strong effector functions in infection and inflammatory diseases21, could be a starting point for the development of vaccines that induce strong protective responses in individuals residing in malaria-endemic areas.

In conclusion, by combining CHMI with high-dimensional single-cell technology, RNA-seq, functional analysis and data integration, we created a detailed map of responses in naive European and pre-exposed African volunteers with varying degrees of naturally acquired immunity and identified rapid responses that are associated with control of parasitemia. We also provide a dataset repository that can help in the design of independent studies to confirm the cellular responses that can be harnessed against malaria.

Methods

Clinical trial details

Samples in this study were obtained as part of the LaCHMI-1 trial15, which studied immunity to P. falciparum malaria in a controlled infection setting in Lambaréné, Gabon, in August 2014. The trial followed the International Council for Harmonization of Technical Requirements for Pharmaceuticals for Human Use Good Clinical Practice guidelines and the principles of the Declaration of Helsinki, was performed in accordance with guidelines approved by the national ethics committee (le Comité Nationale d’Ethique pour la Recherche) and Gabon’s regulatory authority, and was conducted under a US Food and Drug Administration Investigational New Drug application. Written informed consent was obtained from all participants. The trial is registered at ClinicalTrials.gov under NCT02237586, which contains a full list of eligibility criteria. Exclusion criteria included positivity for HIV, hepatitis B or hepatitis C. The trial included 5 malaria-naive European adults and 20 malaria-exposed African adults, of whom 11 had a normal hemoglobin genotype (HbAA) and 9 had sickle cell trait (HbAS) (Supplementary Table 1). Hemoglobin genotype and sex had no measurable effect on the parasitological and immunological outcomes of this study (Supplementary Table 1 and Extended Data Fig. 8). Furthermore, eosinophil counts, used as a proxy for helminth infections, did not differ significantly between TBS+ and TBS− Africans.

Before the start of the trial, four Africans had low, asymptomatic parasitemia by TBS (45 to 105 parasites per µl). As per protocol, all volunteers received a 5-d course of clindamycin (5 mg kg−1 every 12 h) to radically cure any possible Plasmodium infection. After clindamycin treatment, all volunteers were free of asexual parasites at C−1, although one African individual had gametocytemia detected by PCR and two Africans had gametocytemia detected by both PCR and TBS.
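The weight-based dosing arithmetic implied by the protocol (5 mg kg−1 every 12 h for 5 d) can be made explicit with a small helper. The function name and the 60 kg body weight in the usage line are hypothetical, and the sketch assumes that the 12-h interval and the 5-d duration fully define the course.

```python
def clindamycin_course(body_weight_kg, dose_mg_per_kg=5.0, interval_h=12, course_days=5):
    """Return (number of doses, mg per dose, total mg) for a weight-based course."""
    n_doses = course_days * 24 // interval_h   # 5 d at 12-h intervals -> 10 doses
    mg_per_dose = dose_mg_per_kg * body_weight_kg
    return n_doses, mg_per_dose, n_doses * mg_per_dose


# Hypothetical 60 kg participant: 10 doses of 300 mg, 3,000 mg in total.
doses, per_dose, total = clindamycin_course(60)
```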

At least 3 d after clindamycin treatment, individuals received 3,200 aseptic, purified, fully infectious, nonattenuated PfSPZs (Sanaria PfSPZ Challenge), strain NF54, by intravenous injection48,49,50. Blood was drawn on C−1, D5 and D11 in sodium heparin blood collection tubes for immunophenotyping. From D5 onwards, TBS and PCR tests were performed daily to detect malaria parasites. Europeans received effective treatment as soon as parasitemia was detected by TBS. Africans were treated if parasitemia was accompanied by symptoms consistent with malaria, if parasitemia exceeded 1,000 parasites per µl, or at the end of the study (day 28) if not previously treated.

Cryopreservation of PBMCs

Heparinized blood was diluted 2× with HBSS containing 100 U ml−1 penicillin G sodium and 100 µg ml−1 streptomycin, and PBMCs were isolated by Ficoll (1.077 g ml−1) density centrifugation. After washing twice with HBSS, the PBMCs were cryopreserved in 20% fetal bovine serum (FBS; Bodinco; lot BDC 1886)/10% dimethyl sulfoxide (DMSO)/RPMI-1640 medium. The RPMI medium contained 1 mM pyruvate, 2 mM l-glutamine, penicillin G and streptomycin. Cryovials were placed overnight in a Nalgene Mr. Frosty Freezing Container (Thermo Fisher Scientific) at −80 °C before transfer to liquid nitrogen for long-term storage. Cryopreserved PBMCs were shipped in a liquid nitrogen dry vapor shipper from Lambaréné, Gabon, to Leiden, the Netherlands, for analysis.

On the day of mass cytometry staining and PfRBC stimulation, cryopreserved PBMCs were thawed with 50% FBS/RPMI medium at 37 °C and washed twice with 10% FBS/RPMI medium, and 2 × 10⁶ cells per sample were temporarily stored on ice. Thawing was performed in five batches over consecutive weeks in 2015, with a median recovery of 75% and a median viability of 98.5%, without significant differences between batches.

Mass cytometry staining

The mass cytometry antibody panel consisted of preconjugated antibodies as well as self-conjugated antibodies (Supplementary Table 2). Staining for Vδ2+ γδ T cells was conducted with the same panel, with the following adjustments: TCR Vδ2-157Gd (clone B6, BioLegend, cat. 331402, RRID AB_1089226, 1:100 dilution) was added; CD86-144Nd was replaced by CD86-198Pt (clone IT2.2, BioLegend, cat. 305435, RRID AB_2563764, 1:200 dilution); CD14-160Gd was replaced by CD14-148Nd (clone M5E2, BioLegend, cat. 301843, RRID AB_2562813, 1:100 dilution); CD16-148Nd was replaced by CD16-209Bi (clone 3G8, Fluidigm, cat. 3209002B, 1:200 dilution); CD185 (CXCR5)-150Nd was replaced by CD185-150Nd (clone J252D4, BioLegend, cat. 356902, RRID AB_2561811, 1:100 dilution); barcoding with β2-microglobulin (BioLegend, cat. 316302, RRID AB_492835, 1:50 dilution) coupled to 106Cd, 110Cd, 111Cd, 112Cd, 114Cd and 116Cd was added; and most Fluidigm preconjugated antibodies were replaced by self-conjugated antibodies of the same clone from BioLegend. For self-conjugation, 100 µg of antibody and the MaxPar X8 Antibody Labeling kit were used according to the manufacturer’s protocol V2. The conjugated antibody was stored in 200 µl Antibody Stabilizer PBS (Candor Bioscience) at 4 °C. All antibodies were titrated on study samples.

Staining for mass cytometry was based on the MaxPar Cell Surface Staining Protocol V2 (Fluidigm). First, cells were washed with MaxPar staining buffer (Fluidigm) and centrifuged for 5 min at 300g in 5 ml polystyrene round-bottom tubes. The cells were then incubated for 15 min at room temperature with 1 ml of 500 µM Cell-ID Intercalator-103Rh (Fluidigm) diluted 500× in staining buffer. After washing with staining buffer, cells were incubated with 5 µl Human TruStain FcX Fc-receptor blocking solution (BioLegend) and 40 µl staining buffer at room temperature for 10 min. Then, 50 µl of freshly prepared antibody cocktail was added, and the cells were incubated at room temperature for another 45 min. Subsequently, the cells were washed 3× with staining buffer and incubated overnight at 4 °C with 1 ml of 125 µM Cell-ID Intercalator-Ir (Fluidigm) diluted 1,000× in MaxPar Fix and Perm buffer (Fluidigm). After three washes with staining buffer and centrifugation at 800g, cells were stored as a pellet at 4 °C and measured within a week. The median recovery after staining was 80.5%.
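The intercalator dilutions above translate into working concentrations and stock volumes with simple arithmetic. The helper below is hypothetical and assumes the 1 ml incubation volume stated in the text; it shows that the 103Rh step corresponds to 1 µM (2 µl stock per ml) and the Ir step to 0.125 µM, that is 125 nM (1 µl stock per ml).

```python
def working_solution(stock_uM, fold_dilution, final_volume_ul=1000.0):
    """Working concentration (uM) and stock volume (ul) for a given fold dilution."""
    return stock_uM / fold_dilution, final_volume_ul / fold_dilution


rh103 = working_solution(500, 500)     # Cell-ID Intercalator-103Rh, 500x
ir = working_solution(125, 1000)       # Cell-ID Intercalator-Ir, 1,000x
```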

Measurement by mass cytometry

Measurement of samples by mass cytometry was randomized per individual to avoid bias by staining and measurement day. Samples belonging to the same individual were stained and measured together. Furthermore, every staining day, PBMCs from the same cryopreserved control source were thawed and stained, after which the sample was split over two tubes. These tubes were the first and the last samples measured within a measurement week for quality control and were found to be highly comparable.

Samples were measured with a CyTOF2 mass cytometer (Fluidigm), which was automatically tuned according to Fluidigm’s recommendations. Before measurement, cells were counted, washed with Milli-Q water, passed over a cell strainer and brought to a concentration of 0.5 × 10⁶ cells ml−1 with 10% EQ Four Element Calibration Beads (Fluidigm) in Milli-Q water. Samples were measured and analyzed on the fly, using dual-count mode, with the acquisition delay set to 30 s and the detector stability delay set to 10 s. Noise reduction was applied, with a lower convolution threshold of 200, no event subtraction, a minimum event duration of 10 pushes, a maximum event duration of 150 pushes, a sigma of 3 and no event limit. Threshold filtering was set to default. In addition to the channels used to detect antibodies, channels for intercalators (103Rh, 191Ir and 193Ir), calibration beads (140Ce, 151Eu, 153Eu, 165Ho and 175Lu) and background/contamination (133Cs, 138Ba, 206Pb and background) were acquired. FCS files were normalized and concatenated in the CyTOF2 software, without removing beads. The median sampling efficiency (or recovery), which is the percentage of cells in a sample that is saved in the FCS file, was 36.0%, which falls within the expected range for CyTOF2 mass cytometers.

Stimulation with PfRBCs

After thawing, cells were rested overnight at 1 × 10⁶ cells ml−1 in 10% FBS/RPMI in an upright 25 cm² cell culture flask in an incubator at 37 °C with 5% CO2. Subsequently, cells were stimulated in 96-well round-bottom plates at 0.5 × 10⁶ cells per well in 200 µl medium with 0.5 × 10⁶ intact PfRBCs, uninfected red blood cells (uRBCs) or staphylococcal enterotoxin B (200 ng ml−1; Sigma) in the incubator for 24 h. After 2 h of incubation, brefeldin A (10 µg ml−1; Sigma) was added to all conditions. After 24 h, cells were stained with LIVE/DEAD Fixable Aqua Dead Cell Stain (1:400 dilution, 50 µl per well) according to the manufacturer’s instructions (Thermo Fisher Scientific), fixed with 1.9% formaldehyde solution for 15 min at room temperature, frozen at −20 °C in 10% FBS/10% DMSO/RPMI and stored at −80 °C.

Measurement by flow cytometry

Measurement of samples by flow cytometry was also randomized per individual to avoid bias by staining and measurement day. Samples belonging to the same individual were stained and measured together. After thawing, fixed cells were stained in 96-well V-bottom microplates with 50 µl of antibody mixture (see Supplementary Table 5 for panel) diluted in eBioscience permeabilization buffer (Thermo Fisher Scientific) with 1% human Fc-receptor binding inhibitor (Thermo Fisher Scientific; cat. 14-9161) at 4 °C for 30 min. Leftover cells were pooled and then split for use as fluorescence-minus-one (FMO) controls. Compensation beads (BD CompBead, BD Biosciences) were prepared fresh every day. PBMCs were acquired with a BD LSR-Fortessa X-20 SORP (Supplementary Table 5) using BD FACSDiva 8.0.1 software. CST setup beads (BD Biosciences, lot 83654) were run before every measurement. A median of 141,800 live, single cells was obtained per sample.

RNA-seq

Blood for RNA-seq was collected in PAXgene blood RNA tubes (QIAGEN; cat. 762125) and kept at room temperature for 2 h, frozen overnight at −20 °C and then brought to −80 °C for long-term storage. The samples were shipped from Gabon to the Netherlands on dry ice and stored again at −80 °C. RNA was purified from PAXgene tubes, following the manufacturer’s protocol. Library preparation and sequencing were contracted to GenomeScan B.V. The NEBNext Ultra Directional RNA Library Prep kit for Illumina was used to process the samples. Messenger RNA was isolated from total RNA using oligo-dT magnetic beads and used for library preparation. DNA sequencing was performed with the Illumina NextSeq 500 according to the manufacturer’s protocol.

Mass cytometry data processing

The workflow for the mass cytometry data processing is summarized in Fig. 1. First, FlowJo v.10.2-10.7.1 for Mac (FlowJo) was used to gate out beads and subsequently to select single, live, CD45+ cells (Extended Data Fig. 3). A total of 95.1% of the collected events were cells, of which 77% were single cells, 96.7% were viable and 99.8% were CD45+ cells (medians). The gating steps resulted in a median of 432,000 single, live CD45+ cells per sample and a total of 33.3 million cells for all samples combined.

Next, FCS files were analyzed using Cytobank (2015, Cytobank) with arcsinh transformation (cofactor 5). SPADE19 was applied to all 75 samples, with 35 clustering channels (all antibodies except CD197 (CCR7)), a target number of 500 nodes and 70,000 downsampled events, without compensation. Nodes (clusters of cells) comprising artifacts such as dead cells and doublets were identified by combinations of high levels of the dead cell stain 103Rh, contaminating 138Ba and 206Pb, bead marker 140Ce, long event length and/or coexpression of mutually exclusive markers such as CD3, CD14 and CD19. The remaining nodes contained 33.1 million cells and formed five branches: the CD4+ T cell, CD8+ T cell, B cell, CD7+ T cell and myeloid cell branches (Fig. 1 and Extended Data Fig. 6). These branches were subsequently analyzed separately to reduce the size of the individual analyses and to improve their discriminatory power, but they were not intended to reflect pure hematopoietic lineages.
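The arcsinh transformation with cofactor 5 used in Cytobank (and again before t-SNE) maps a raw intensity x to asinh(x / 5): it is nearly linear around zero and close to logarithmic for large values, while remaining defined for zero and negative values. A minimal sketch of the transform and its inverse, with hypothetical function names:

```python
import math


def arcsinh_transform(values, cofactor=5.0):
    """Apply y = asinh(x / cofactor) to raw mass cytometry intensities."""
    return [math.asinh(v / cofactor) for v in values]


def inverse_arcsinh(values, cofactor=5.0):
    """Map transformed values back to the raw intensity scale."""
    return [math.sinh(v) * cofactor for v in values]
```

For large intensities, asinh(x / 5) approaches ln(2x / 5), which is why high-expressing populations are compressed much as on a log scale.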

Cytosplore (www.cytosplore.org, 2015–2021) was used to determine cell clusters within the branches. Cytosplore uses approximated t-SNE51, followed by interactive determination of cell clusters by a Gaussian mean-shift method. This results in more accurate and reproducible cell cluster classification than SPADE or ACCENSE52. In Cytosplore, the number of output clusters is determined by a manually adjustable kernel bandwidth, which could be optimized interactively by visualizing the clusters on the t-SNE map, further guided by the marker expression of the clusters in the corresponding heat map. Before t-SNE was run, FCS files were arcsinh-transformed with cofactor 5, and an extra channel called SampleTag was added to the FCS files so that the sample of origin of each event could be identified after t-SNE. Cytosplore can now perform these tasks, but for this study, MATLAB R2015b (The MathWorks) was used for the transformation with CYT53 as well as for the addition of the SampleTag. Furthermore, FCS files were downsampled before t-SNE and upsampled after t-SNE. Downsampling involved random selection of events from each sample, with a maximum of 13,000 events per sample to reach a maximum of 975,000 total events per t-SNE, as t-SNE does not perform well with more events. After downsampling, t-SNE analyses and clustering were carried out separately for the five SPADE branches in Cytosplore, with perplexity set at 30 and 1,000 iterations, resulting in a total of 235 clusters. Upsampling was then performed by matching events to the median signal intensities of the 235 previously identified clusters, with the maximum arcsinh-transformed single-marker distance set to 3.5 and the maximum total distance set to 25 in Cytosplore. Per SPADE branch, 97.0% to 98.2% of cells were matched to these clusters after upsampling.
The remaining unassigned cells were analyzed in a second round of t-SNE runs, which resulted in an additional 100 clusters. Many of these clusters later proved to be significant in the comparative analyses. After upsampling, the 335 cell clusters were manually merged into a final number of 198 immunologically distinguishable clusters (Extended Data Fig. 1), based on a heat map of the median signal intensity of all channels and all clusters.
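The downsampling and upsampling steps can be sketched as two small functions. This is not Cytosplore's implementation: the function names are hypothetical, and the sketch assumes that the "total distance" is the sum of absolute per-marker differences on the arcsinh scale and that an event is assigned to the closest cluster median passing both thresholds. Events that pass no threshold remain unassigned, like the cells that were re-analyzed in the second round of t-SNE.

```python
import random


def downsample(events_by_sample, max_per_sample=13_000, seed=0):
    """Randomly keep at most max_per_sample events from each sample."""
    rng = random.Random(seed)
    out = {}
    for sample, events in events_by_sample.items():
        if len(events) <= max_per_sample:
            out[sample] = list(events)
        else:
            out[sample] = rng.sample(events, max_per_sample)
    return out


def match_to_cluster(event, cluster_medians, max_single=3.5, max_total=25.0):
    """Label of the closest cluster median within both distance limits, else None."""
    best_label, best_total = None, None
    for label, median in cluster_medians.items():
        diffs = [abs(e - m) for e, m in zip(event, median)]
        if max(diffs) > max_single:
            continue          # one marker is too far from this cluster
        total = sum(diffs)    # assumption: L1 distance across markers
        if total > max_total:
            continue
        if best_total is None or total < best_total:
            best_label, best_total = label, total
    return best_label
```

With 75 samples and at most 13,000 events each, the downsampled input never exceeds the 975,000 events per t-SNE stated in the text.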

Next, the 198 clusters were manually labeled at the subset level and at a higher lineage level (Fig. 1 and Supplementary Table 3). The lineage level consists of CD4+ T cells, CD8+ T cells, unconventional αβ T cells, γδ T cells, B cells, ILCs, monocytes and dendritic cells, basophils and undefined cells. Multiple subsets were identified within each of these lineages. For example, two clusters were found to be CD3+CD4+TCRγδ−CD56−CRTH2+ and were therefore classified into the TH2 cell subset, which was part of the CD4+ T cell lineage. These clusters were also CD45RA−/loCD45RO+/hiCCR7− and could therefore also be analyzed as part of the EM CD4+ T cell subset. These lineages, subsets and clusters were used for the subsequent statistical analysis.

Regarding the analysis for Vδ2+ γδ T cells (Extended Data Fig. 5), batch correction was performed using the fastMNN function54 and uniform manifold approximation and projection dimensionality reduction was run on combined data.

Flow cytometry data processing

Flow cytometry data analysis was performed with FlowJo software (v.10.4.1; TreeStar). Gating on time, singlets and live cells was performed before gating of the subsets of interest (Extended Data Fig. 6). Gates were initially set according to FMO controls and adjusted according to negative controls for gating of cytokine-positive cells. The percentage of PfRBC-specific cytokine-producing cells was corrected by subtracting the background levels observed in the uRBC condition.

RNA-seq data processing

All RNA sequence files were processed using the BIOPET Gentrap pipeline v.0.8 developed at the LUMC (https://biopet-docs.readthedocs.io/en/latest/pipelines/gentrap/). The pipeline consists of FASTQ preprocessing (including quality control, quality trimming and adaptor clipping), RNA-seq alignment, read and base quantification and, optionally, transcript assembly. FastQC v.0.11.2 was used for raw read quality control. Low-quality reads were trimmed using sickle v.1.33 with default settings. Cutadapt v.1.10 with default settings was used for adaptor clipping on the basis of the adaptor sequences detected by FastQC. RNA-seq reads were aligned against the human reference genome GRCh38 using the RNA-seq aligner GSNAP v.2014-12-23 with the settings ‘--npaths 1 --quiet-if-excessive’. Ensembl human genome annotation v.87 was used for raw read counting. Gene read quantification was performed using htseq-count v.0.6.1p1 with the setting ‘--stranded=reverse’. FASTQ files and the gene count matrix will be made available upon request.

Statistics

Statistical testing of the lineages, subsets and clusters was performed using RStudio v.0.99.902 for Windows (RStudio; www.rstudio.com), R x64 v.3.3.1 for Windows (R Foundation for Statistical Computing; http://www.r-project.org/) and the Global test R package, v.5.24.0 (http://bioconductor.org/packages/globaltest/)55. Survival analysis at the cluster level was performed with the survival R package, v.2.38 (https://CRAN.R-project.org/package=survival), using the Cox proportional hazards regression model. Other statistics were performed using SPSS Statistics v.23 for Windows (IBM). Fisher’s exact test for contingency tables and the nonparametric Kruskal–Wallis H test were used to compare population characteristics. The log-rank (Mantel–Cox) test was used for survival/time-to-event analysis. Complete blood counts were compared by Kruskal–Wallis H test followed by Dunn’s post hoc test. Hierarchical clustering was performed based on one minus Spearman’s rank correlation with average linkage. Measurements were taken from distinct samples, rather than repeated measurements, unless stated otherwise. P values ≤0.05 were considered statistically significant. Tests were two-tailed, where applicable. Results presented come from single experiments, unless otherwise specified.
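Hierarchical clustering on one minus Spearman's rank correlation with average linkage can be sketched without external libraries. The sketch below is illustrative only (the analysis itself was run in R/SPSS): it ranks each variable with tied values sharing their mean rank, uses 1 − ρ as the distance, and repeatedly merges the two clusters with the smallest mean pairwise distance (UPGMA). The input profiles are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def average_linkage(profiles):
    """Agglomerative clustering on 1 - Spearman distance with average linkage.

    Returns merges as (cluster_a, cluster_b, distance); clusters are index tuples.
    """
    n = len(profiles)
    dist = [[1.0 - spearman(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
    clusters = [(i,) for i in range(n)]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [dist[i][j] for i in clusters[a] for j in clusters[b]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges
```

Because Spearman's ρ depends only on ranks, two variables with the same ordering across samples have distance 0 even if their values differ in scale.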

Percentages of lineages were expressed relative to total cells, percentages of subsets relative to either total cells or their respective lineage, and percentages of clusters relative to their respective subset. Percentages of lineages and subsets at baseline were log2-transformed for the Global test; percentages of clusters were square-root transformed. Responses of cell lineages, subsets and clusters to PfSPZ DVI were assessed as relative changes in percentages over time (for example, percentage at D5/percentage at C−1), which were then log2-transformed. Percentages of cytokine-producing cells measured by flow cytometry were determined by subtracting the background (uRBC control) from the PfRBC-stimulated samples, with negative values set to zero. Percentages were calculated relative to total live, single cells. The Wilcoxon signed-rank and rank-sum tests were used to compare paired and unpaired groups of samples, respectively.
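The transformations described above can be written as three small functions. The names are hypothetical, and the sketch does not handle zero percentages, for which the log2 transforms are undefined; the text does not state how such values were treated.

```python
import math


def background_subtract(stimulated_pct, background_pct):
    """PfRBC minus uRBC percentage, with negative values set to zero."""
    return max(stimulated_pct - background_pct, 0.0)


def log2_relative_change(pct_timepoint, pct_baseline):
    """log2(percentage at D5 or D11 / percentage at C-1)."""
    return math.log2(pct_timepoint / pct_baseline)


def baseline_transform(pct, level):
    """log2 for lineages and subsets, square root for clusters (Global test input)."""
    if level in ("lineage", "subset"):
        return math.log2(pct)
    if level == "cluster":
        return math.sqrt(pct)
    raise ValueError(f"unknown level: {level}")
```

On the log2 scale a doubling from C−1 to D5 gives +1 and a halving gives −1, so increases and decreases of the same magnitude are symmetric around zero.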

Differential gene expression analysis

For downstream analysis of the transcriptomics data, we included 19,331 genes after excluding genes on the Y chromosome and genes without a known Entrez ID or HGNC symbol. Low-expressed genes were excluded with the filterByExpr function as implemented in the edgeR R package, v.3.24.0 (http://bioconductor.org/packages/edgeR/), resulting in 11,659 genes. The resulting gene expression profiles were sufficient for our exploratory analysis, although more lowly expressed genes might have been detected with deeper sequencing. The trimmed mean of M values (TMM) method was used to compute normalization factors that correct raw counts for differences in library size.

Differential expression testing was conducted with the limma-voom workflow as implemented in the limma R package, v.3.38.3 (http://bioconductor.org/packages/limma/)56. Correlations between paired measurements were estimated with the duplicateCorrelation function in limma. For linear modeling of differential gene expression, a factorial design matrix for a model with no intercept and six groups was constructed (three time points each for Europeans and Africans). Moderated F-tests combining comparisons across all time points were performed separately for Europeans and Africans to identify DEGs for any contrast. For pairwise comparisons between time points, Student’s t-tests with an empirical Bayes moderated variance estimator were applied. DEGs were selected at an FDR (BH-adjusted P value) <0.05.
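The BH adjustment used for the FDR cutoff can be sketched directly: sort P values, scale each by m/rank, and enforce monotonicity from the largest P value downward, which is the same step-up procedure as R's `p.adjust(method = "BH")`. The gene names and P values are hypothetical, and this is not limma's internal code.

```python
def benjamini_hochberg(pvalues):
    """BH-adjusted P values, returned in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # ascending rank of pvalues[i]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, pvalues, fdr=0.05):
    """Genes whose BH-adjusted P value is below the FDR cutoff."""
    adjusted = benjamini_hochberg(pvalues)
    return [g for g, q in zip(genes, adjusted) if q < fdr]
```

The running minimum ensures that a gene with a smaller raw P value never receives a larger adjusted value than a gene ranked above it.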

Hypergeometric tests for GO term enrichment, with adjustment for gene length, were performed with the goana function as implemented in limma. The universe for the enrichment test was restricted to the genes included in the differential expression analysis. REViGO was used to summarize significant GO Biological Process terms (P < 0.05 after BH adjustment)57, using the Homo sapiens GO database (GO Biological Process from MSigDB, v.6.2, July 2018; https://www.gsea-msigdb.org/gsea/downloads_archive.jsp), the SimRel semantic similarity measure and ‘medium’ allowed similarity. GO terms were manually annotated into categories.
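The core of a GO enrichment test is the hypergeometric upper tail: the probability of observing at least k term genes among the DEGs when DEGs are drawn at random from the universe. The sketch below implements only this plain test with hypothetical counts; it does not reproduce the gene-length adjustment that goana applies.

```python
from math import comb


def hypergeometric_upper_tail(k, n_drawn, n_term, n_universe):
    """P(X >= k): n_drawn DEGs from n_universe genes, n_term of which carry the GO term."""
    upper = min(n_drawn, n_term)
    hits = sum(comb(n_term, x) * comb(n_universe - n_term, n_drawn - x)
               for x in range(k, upper + 1))
    return hits / comb(n_universe, n_drawn)


# Hypothetical example within an 11,659-gene universe: 200 DEGs, a 50-gene term,
# 5 of the DEGs in the term (about 0.86 expected by chance).
p_example = hypergeometric_upper_tail(5, 200, 50, 11_659)
```

Python integers are exact, so the binomial coefficients do not overflow even for a universe of this size.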

Gene set enrichment analysis with CAMERA was performed to compare pathway enrichment between the TBS+ and TBS− groups. For this analysis, we restricted testing to eight GO Biological Process pathway terms that contained ‘interferon-gamma’ in their names. The resulting P values were corrected with the Bonferroni procedure.

Machine learning and data integration

For machine learning, the 4,000 most variable genes were selected, and all 176 CyTOF clusters and 49 cellular responses identified by ICS were included. Baseline features were z-scaled before inclusion in machine-learning models. Two machine-learning approaches were tested: Extreme Gradient Boosting (XGB), a gradient-boosted decision-tree method, and DIABLO, which combines partial least-squares discriminant analysis with canonical correlation analysis by aligning individual datasets and outcomes on the same latent space36,58. XGB modeling and parameter tuning were performed using the caret package (v.6.0-84) in R, with a tune length of 3 and a leave-one-out (LOO) cross-validation scheme. For each fold, important features were extracted using the varImp function. DIABLO was performed using mixOmics (v.6.8.0) with a two-component latent space, and tuning was performed using the tune.block.splsda function to determine the minimum number of features that afforded maximum performance in LOO cross-validation59. The weight on alignment between the different datasets (RNA-seq, CyTOF and ICS), relative to the outcome, was tuned manually with values of 0, 0.1, 0.25, 0.5, 0.75 and 1 in the design matrix. At baseline, XGB had an accuracy of 84% in classifying Europeans versus Africans and 70% in classifying TBS+ versus TBS− Africans, using LOO cross-validation (Extended Data Fig. 7). DIABLO performed better, with accuracies of 96% and 85%, respectively. These accuracies were significantly higher than those obtained with random permutations (n = 1,000), indicating that both ethnicity and susceptibility to malaria infection could be accurately predicted at baseline. Moreover, XGB identified only genes as important features, owing to the very large number of genes relative to CyTOF clusters and ICS responses, whereas DIABLO inherently selected features from all datasets.
DIABLO was therefore used to identify the combination of genes, cell clusters and responses that best discriminated between groups. For the comparison of Africans versus Europeans, a weight of 0.5 between datasets and ten genes, four CyTOF clusters and four ICS features were used for the final model. For the comparison of TBS+ versus TBS− Africans, a weight of 0.1 between datasets and four genes, six CyTOF clusters and 12 ICS features were selected for the final model. For each round of the LOO cross-validation, the selected features and the correlation between datasets on the first component of the latent space were extracted for analysis.
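The DIABLO settings described above can be illustrated with three small helpers: a design matrix with the chosen between-dataset weight off the diagonal (assuming the usual mixOmics convention of a symmetric matrix over the RNA-seq, CyTOF and ICS blocks with zeros on the diagonal), the rule of picking the smallest number of features that reaches maximal cross-validated accuracy, and a tally of how often each feature is selected across LOO rounds. All function names are hypothetical, and the code does not call mixOmics.

```python
from collections import Counter


def diablo_design(n_blocks, weight):
    """Design matrix linking data blocks: `weight` off the diagonal, 0 on it."""
    return [[0.0 if i == j else weight for j in range(n_blocks)] for i in range(n_blocks)]


def smallest_feature_count_at_max(accuracy_by_count):
    """Smallest number of selected features that reaches the maximal CV accuracy."""
    best = max(accuracy_by_count.values())
    return min(n for n, acc in accuracy_by_count.items() if acc == best)


def selection_frequency(selected_per_fold):
    """Fraction of LOO rounds in which each feature was selected."""
    n_rounds = len(selected_per_fold)
    # set(fold) guards against counting a feature twice within one round.
    counts = Counter(feat for fold in selected_per_fold for feat in set(fold))
    return {feat: n / n_rounds for feat, n in counts.most_common()}
```

For example, `diablo_design(3, 0.5)` gives the three-block matrix for the Africans-versus-Europeans model and `diablo_design(3, 0.1)` the one for the TBS+ versus TBS− model. Features with a high selection frequency across LOO rounds are the most stable discriminators.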

Of note, the aim of this analysis was to identify the features across datasets that were most strongly associated with naturally acquired resistance to malaria infection, rather than to build a predictive machine-learning model; LOO cross-validation was nevertheless used to assess model performance within our study.
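The performance assessment described above (z-scaled features, LOO cross-validation and comparison with label permutations) can be sketched in plain Python. Because XGB and DIABLO are not reproduced here, a nearest-centroid classifier stands in as a hypothetical placeholder model. The function names, and the choice to refit the z-scaling within each training fold so that the held-out sample does not leak into the scaling, are assumptions of this sketch rather than details reported above.

```python
import random
from statistics import mean, pstdev


def zscale_fit(rows):
    """Per-feature mean and SD from training rows (an SD of 0 is replaced by 1)."""
    cols = list(zip(*rows))
    return [mean(c) for c in cols], [pstdev(c) or 1.0 for c in cols]


def zscale_apply(row, mus, sds):
    """z-scale one sample with training-fold statistics."""
    return [(x - m) / s for x, m, s in zip(row, mus, sds)]


def nearest_centroid(train_x, train_y, test_row):
    """Placeholder classifier (not XGB or DIABLO): label of the closest class centroid."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        members = [r for r, y in zip(train_x, train_y) if y == label]
        centroid = [mean(c) for c in zip(*members)]
        dist = sum((a - b) ** 2 for a, b in zip(test_row, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(x, y):
    """Leave-one-out accuracy, refitting z-scaling on each training fold."""
    correct = 0
    for i in range(len(x)):
        train_x, train_y = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        mus, sds = zscale_fit(train_x)
        scaled = [zscale_apply(r, mus, sds) for r in train_x]
        pred = nearest_centroid(scaled, train_y, zscale_apply(x[i], mus, sds))
        correct += pred == y[i]
    return correct / len(x)


def permutation_test(x, y, n_perm=1000, seed=0):
    """Observed LOO accuracy and permutation P value against shuffled labels."""
    rng = random.Random(seed)
    observed = loo_accuracy(x, y)
    hits = 0
    for _ in range(n_perm):
        shuffled = y[:]
        rng.shuffle(shuffled)
        # Count permutations that do at least as well as the real labels.
        hits += loo_accuracy(x, shuffled) >= observed
    return observed, (hits + 1) / (n_perm + 1)
```

The (hits + 1)/(n_perm + 1) convention avoids reporting a permutation P value of exactly zero; with 1,000 permutations the smallest attainable value is about 0.001.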

Protein microarray

Protein microarray experiments and analyses were performed as previously described60,61. Microarray slides were spotted with malaria proteins at the University of California, Irvine62. In total, 262 P. falciparum proteins representing 228 unique antigens were expressed using an Escherichia coli lysate in vitro expression system and spotted on a 16-pad ONCYTE AVID slide. These antigens were selected because they frequently give a positive signal when tested with serum from individuals with sterile and naturally acquired immunity against the parasite. For the detection of binding antibodies, a secondary IgG antibody (goat anti-human IgG QDot800, Grace Bio-Labs) was used62,63,64,65.

Study serum samples and European control serum were diluted 1:50 in 0.05× Super G Blocking Buffer (Grace Bio-Labs) containing 10% E. coli lysate (GenScript) and incubated for 30 min on a shaker at room temperature. Meanwhile, microarray slides were rehydrated with 0.05× Super G Blocking Buffer at room temperature. The rehydration buffer was then removed and the samples were added to the slides. Arrays were incubated overnight at 4 °C on a shaker (180 r.p.m.). The following day, serum samples were removed and the microarrays were washed with 1× TBST buffer (Grace Bio-Labs). Secondary antibodies were then applied at a dilution of 1:200 and incubated for 2 h at room temperature on the shaker, followed by another washing step and a 1-h incubation with a 1:250 dilution of Qdot585 Streptavidin Conjugate. After a final washing step, slides were dried by centrifugation at 500g for 10 min. Slide images were acquired using the ArrayCAM Imaging System (Grace Bio-Labs) and the ArrayCAM 400-S Microarray Imager Software v.2.2.

Microarray data were analyzed in R v.3.6.2. All images were checked manually for noise. Each antigen spot signal was corrected for local background reactivity by applying a normal-exponential convolution model66, using the RMA-75 algorithm for parameter estimation (available in the limma package v.3.28.14, https://bioconductor.org/packages/limma/)67. Data were log2-transformed and further normalized by subtracting the median signal intensity of the mock expression spots on the same array, to correct for background binding of antibodies to E. coli lysate. After log2 transformation, the data were normally distributed. Differences in antibody levels (protein array signal) between study outcome groups (protected participants and nonprotected participants who developed parasitemia) were assessed with Welch's unequal-variance t-test. Antigens with P < 0.05 and a fold change >2 in mean signal intensity were defined as differentially recognized between the tested sample groups. Volcano plots were generated using GraphPad Prism v.9.0.0.

Graphing

Graphs were made using R, GraphPad Prism v.7 to 9 for Windows (GraphPad Software), Morpheus heat map software v.2016–2018 (https://software.broadinstitute.org/morpheus) and ColorBrewer 2.0 color schemes (www.colorbrewer2.org). t-SNE plots were made in MATLAB using the dscatter function (v.1.1.0.1; https://www.mathworks.com/matlabcentral/fileexchange/8430-flow-cytometry-data-reader-and-visualization) and the inferno color scheme (https://bids.github.io/colormap/). Arrow plots were created with mixOmics (v.6.8.0). Adobe Illustrator CC version 2015–2021 (Adobe) was used to combine multiple graphs and create the figures.

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.