The composition of the intestinal microbiota develops over the first few years of life and is shaped by multiple factors such as mode of birth, breast-feeding, food intake and antibiotic use1,2. It is regulated by infant immunity, which in turn develops and is shaped by exposure to the microbiota3,4. Intestinal lymphoid tissues require the intestinal microbiota for normal development, and exposure to commensal and pathogenic organisms sets the tone of the systemic and mucosal immune system for the long term. These early interactions have a substantial impact on the development of metabolic and immune-mediated disorders5. There is also emerging evidence in mice and humans for microbiota-dependent modeling of local immunity and the subsequent response to infection or vaccination6,7,8. Understanding the complex interaction between the microbiota and immune responses during infancy therefore has potentially important implications for vaccine development as well as for the prevention of immune-mediated disorders.

Infants in low and lower-middle income countries (LMICs) are exposed to a high burden of intestinal pathogens from birth9,10. It is thought that these infections alter intestinal immune system homeostasis and eventually lead to environmental enteropathy (EE). This understudied disorder is commonly found in children in the poorest socioeconomic settings and is characterized by intestinal inflammation, permeability and histological changes including blunted intestinal villi11. Recent work has begun to elucidate its contribution to malnutrition and changes in the microbiota that may adversely affect nutrient uptake and growth12,13,14.

It has also been suggested that EE may cause poor oral vaccine immunogenicity15,16. Oral vaccines against poliovirus, rotavirus, and cholera are all substantially less immunogenic and effective when given to children in LMICs17. This phenomenon has prolonged polio eradication by limiting vaccine effectiveness18, and is limiting the benefits from the global scale-up in access to oral rotavirus vaccines19. However, recent studies of biomarkers of EE in children at the time of immunization have had mixed findings, with different biomarkers showing positive, negative or no association with oral poliovirus or rotavirus vaccine immunogenicity20. Several other mechanisms may also contribute to poor oral vaccine immunogenicity in LMICs, including interference by high titers of antigen-specific transplacental or breastmilk antibodies. However, withholding breastfeeding does not improve oral poliovirus or rotavirus vaccine response21,22,23,24, and neonatal immunization when transplacental antibody titres are at their highest has in general resulted in comparable levels of seroconversion to doses administered later in life25,26.

In the case of oral poliovirus vaccine (OPV), enterovirus infection at the time of vaccination is associated with a modest decline in vaccine immunogenicity27,28. Recent data suggest this may be the case for oral rotavirus vaccine too29, which is also affected by co-administration of the live-attenuated OPV30. The mechanisms underlying this association are unclear and may relate to direct interference at mucosal and epithelial sites affecting virus replication and modulation of innate and adaptive immunity.

Here we report an investigation of the mechanisms behind poor OPV immunogenicity and the contribution of changes in systemic and mucosal immune homeostasis, including EE. We measured systemic and mucosal immune parameters in peripheral blood and stool samples collected at the time of monovalent serotype 3 OPV immunization of seronegative Indian infants aged 6–11 months. These infants were enrolled in a randomized controlled trial of the effect of the antibiotic azithromycin on EE and OPV immunogenicity31. As the primary outcome, we reported that azithromycin treatment had no effect on seroconversion after OPV, despite reducing the enteric bacterial pathogen load and fecal biomarkers of EE. Instead, infection with enteric viruses (enteroviruses and rotavirus) was associated with failure to seroconvert. We now report on extensive immune profiling of these infants at the time of immunization, including mucosal homing immune cells in peripheral blood, plasma cytokines and acute-phase proteins, leukocyte counts, and fecal biomarkers of inflammation. We use a statistical learning approach to identify immune markers measured at the time of vaccination that predict OPV immunogenicity. We end with a consideration of the implications of our findings for improving oral vaccines and oral vaccination strategies.


Study population

We measured markers of innate and adaptive, mucosal and systemic immunity in Indian infants aged 6–11 months who lacked detectable antibodies to serotype 3 poliovirus at a 1:8 dilution at the time of immunization with monovalent type 3 OPV (mOPV3), as part of a clinical trial of the effect of azithromycin on the immunogenicity of this vaccine. In the original study, 50% of infants given azithromycin seroconverted after vaccination with mOPV3, compared with 54% given placebo. Out of the 300 infants randomly selected for this study (stratified by study arm and seroconversion status), 292 completed the study per protocol and were included in our analysis of immune markers. Ex vivo flow cytometry was performed in 129 infants to examine circulating gut-homing T-cell phenotypes (Fig. 1).

Fig. 1: Flow chart of infants included in the analysis.

P3 poliovirus serotype 3, mOPV3 monovalent serotype 3 oral poliovirus vaccine, PBMC peripheral blood mononuclear cells, EE environmental enteropathy, CRP C-reactive protein.

Immune measurements

Total cell counts were measured from freshly collected blood samples. We separated peripheral blood mononuclear cells (PBMCs) and plasma within 4–6 h of collection and measured lymphocyte gut-homing markers, plasma cytokines and acute phase proteins (see Methods section for full details). Data on seroconversion to mOPV3, vaccine virus shedding 7 days after immunization, stool inflammatory biomarkers and pathogens detected by Taqman array card (TAC) were also analyzed as previously reported31.

A cross-correlation matrix among all investigated markers shows strong associations among parameters within each class of measurements (measurement ‘module’) but weaker associations across modules (Fig. 2a). Unsupervised clustering of immune parameters did not reveal any strong association of immune phenotype with seroconversion to mOPV3 or treatment arm (Fig. 2b).

Fig. 2: Correlation and clustering analysis of infant immune status variables.

a Pearson’s correlation coefficient for the 51 variables in each of the 4 measurement “modules”. Pairwise comparisons are based on all available data (n = 292 infants for data on biomarkers of environmental enteropathy (EE), plasma cytokines and CRP, and leukocyte counts; and n = 129 infants for data on ex vivo flow cytometry). b Heatmap for each infant with variables (rows) clustered using Ward’s minimum variance hierarchical clustering method and infants (columns) grouped by treatment arm and OPV seroconversion status. A tree with 6 clusters showing the grouping of the variables is shown on the left of the heatmap and the variables included in each group on the right. In both plots the variables were log-transformed, normalized, and truncated at +/−3 standard deviations before analysis and plotting. Color bars indicate the scale for each plot. Only infants with complete data for all variables were included and invariant variables after normalisation and truncation were removed (126 infants, 37 variables). SN seroconversion negative, SP seroconversion positive, Az azithromycin arm, Pl placebo arm.
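The preprocessing described in this legend (log-transform, normalize, truncate at +/−3 standard deviations, then correlate on all available data) can be sketched as follows. This is a minimal illustration in Python rather than the R code used for the study; the function names, the use of `None` for missing values, and the assumption that all measurements are strictly positive are ours and are not stated in the text.

```python
import math

def standardize_log(values, clip=3.0):
    """Log-transform, rescale to mean 0 and SD 1, then truncate at +/- clip SDs."""
    logs = [math.log(v) for v in values]  # assumption: strictly positive measurements
    n = len(logs)
    mean = sum(logs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in logs) / (n - 1))
    if sd == 0:
        # Invariant variables carry no information and are dropped in the legend
        raise ValueError("invariant variable; remove it before analysis")
    return [max(-clip, min(clip, (x - mean) / sd)) for x in logs]

def pearson(xs, ys):
    """Pearson's r on pairwise-complete observations (None marks a missing value)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values for two fecal biomarkers in five infants
calprotectin = [120.0, 340.0, 95.0, 800.0, 210.0]
myeloperoxidase = [900.0, 2500.0, 700.0, 6100.0, 1500.0]
r = pearson(standardize_log(calprotectin), standardize_log(myeloperoxidase))
```

Using pairwise-complete observations mirrors the statement that panel a is based on all available data, which is why the flow cytometry variables contribute fewer infants than the other modules.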

Demographic and seasonal correlates of immune phenotype

Fecal calprotectin and other biomarkers of EE, C-reactive protein (CRP) as a measure of systemic inflammation, plasma cytokines and leukocyte populations did not differ significantly by age group or sex after false discovery rate (FDR) correction of p-values (Table 1). There were, however, significant differences in measurements according to the time of year the sample was collected, suggesting an important seasonal component. Measures of intestinal inflammation, including calprotectin, were raised among infants immunized in September–October compared with November–December (Table 1). In contrast, markers of systemic inflammation, including interferon-γ (IFN-γ) and interleukin-1β (IL-1β) in plasma and neutrophil counts, were higher during November–December. Nearly all infants (271/292) were breastfed and breastfeeding status was not associated with immune phenotype (p > 0.8 for all variables in Table 1).

Table 1 Variables describing immune status by infant characteristics.

Effect of intestinal infection with pathogenic viruses and bacteria

The number of bacterial or viral pathogens detected in stool at the time of vaccination did not show any significant correlation with EE biomarkers, plasma cytokines, CRP or lymphocyte populations after FDR correction of p-values (Table 1). Calprotectin was raised in infants with bacterial pathogens detected in stool, in agreement with expectations from previous work, but this was not significant after FDR correction31. Viral pathogens were more prevalent among those infants who failed to seroconvert to OPV (prevalence of at least one viral pathogen was 63.9% vs. 45.6%, Fisher’s p = 0.002), as previously reported31. However, detection of any pathogenic virus in stool was not correlated with any of the immune parameters after FDR correction.
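The prevalence comparison above relies on Fisher's exact test. A minimal sketch of the two-sided test for a 2×2 table is given below in Python (the study itself used R). The text reports percentages only, so the counts in the usage lines are hypothetical placeholders and will not reproduce the published p-value.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-7))

# Hypothetical counts: rows = failed to seroconvert / seroconverted,
# columns = at least one viral pathogen detected / none detected
p_value = fisher_exact_two_sided(60, 40, 45, 55)
```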

Effect of treatment with azithromycin

As reported in Grassly et al.31, treatment with azithromycin reduced fecal biomarkers of EE (myeloperoxidase, calprotectin, and α1-antitrypsin) and the prevalence of pathogenic Escherichia coli and Campylobacter bacteria detected in stool, but did not affect seroconversion to OPV, which was 50% and 54% in the treatment and placebo arms, respectively. Here we report that azithromycin, despite significantly reducing fecal calprotectin levels, did not affect measures of systemic inflammation such as CRP (0.88 mg/L among infants in the treatment arm at the time of vaccination compared with 0.94 mg/L in the placebo arm) or other immune parameters of interest, including circulating CD4+ T cells expressing intestinal or mucosal homing markers and the regulatory cell marker forkhead box P3 (FOXP3) (Table 1).

Association with OPV seroconversion and vaccine shedding

After FDR correction, none of the 51 immune parameters showed a significant individual association with OPV seroconversion or shedding of vaccine virus as a marker of vaccine “take” (Table 1 and Supplementary Table 1). Measures of mucosal inflammation (e.g., fecal calprotectin, myeloperoxidase) and systemic inflammation (e.g., plasma IFN-γ and IL-1β) at the time of vaccination did not differ significantly among infants according to their subsequent seroconversion or vaccine shedding status (Table 1 and Supplementary Fig. 1). The number of regulatory CD4+ T cells homing to the small intestine (CCR9+) was higher in infants who failed to seroconvert, but this was not significant after FDR correction (16.4 vs. 13.5 cells/μl, FDR p-value = 0.483).

Statistical learning analysis

We used supervised learning (random forest) analysis of immune measurements taken at the time of vaccination to determine whether they allowed accurate out-of-sample prediction of infant response to OPV or receipt of azithromycin. We analyzed the accuracy of each module independently and combined using 10-fold cross-validation and plotted the correlation network for the eight most important variables in a single run of the random forest algorithm for the combined analyses (Fig. 3).

Fig. 3: Random forests analysis to predict seroconversion and study arm.

The accuracy of random forests analysis to predict a seroconversion and b study arm for each measurement module individually and all modules combined. For each analysis we performed a 10-fold cross-validation repeated 20 times. The boxes correspond to the interquartile range for the accuracies in the prediction set, with the solid line showing the median and the whiskers extending to the 10th and 90th percentile. The dashed line indicates expected accuracy if a random choice were made. The top eight most important variables that predict c seroconversion and d study arm in one best fit full random forests model are shown as correlation networks. The color of the circle around each variable indicates whether it was positively (green) or negatively (red) associated with seroconversion or azithromycin, respectively. The size of each network node indicates the strength of correlation with the outcome and the color indicates whether the variable describes pathogens in stool (orange), EE biomarkers (blue), ex vivo T-cell and total leukocyte count data (pink), or plasma cytokines (green) (as for a and b). Demographic variables were not among the top eight variables. Lines connect circles with a Spearman correlation coefficient of at least 0.2, with the color of the line indicating the strength of the correlation (indicated by color scale bar). EV enterovirus, EGF epidermal growth factor, IL2R interleukin-2 receptor, TLC total leukocyte count, NC neutrophil count, MPO fecal myeloperoxidase, EAEC enteroaggregative Escherichia coli. Analysis is for infants with complete data for all variables only (n = 126 infants).
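The correlation networks in panels c and d link variables whose Spearman correlation reaches 0.2. A minimal sketch of that edge rule is shown below in Python; the study drew its networks with the igraph package in R. The function names are ours, and applying the threshold to the signed coefficient rather than its absolute value is an assumption, since the legend does not say which was used.

```python
import math
from itertools import combinations

def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)

def network_edges(columns, threshold=0.2):
    """columns maps variable name -> list of values (same infants, same order).

    Returns (name_a, name_b, rho) for every pair with rho >= threshold.
    """
    edges = []
    for a, b in combinations(sorted(columns), 2):
        rho = spearman(columns[a], columns[b])
        if rho >= threshold:  # assumption: signed rho, "at least 0.2"
            edges.append((a, b, rho))
    return edges
```

Passing the eight top-ranked variables as `columns` would give the list of lines to draw; node size and color would come from each variable's correlation with the outcome and its measurement module.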

Combining all measurement modules for the 126 infants with complete data, we were able to predict seroconversion with a median accuracy across all cross-validation samples of 58% (interquartile range (IQR): 50–69%; Fig. 3a). This compares with the 50% accuracy expected by random choice, assuming a priori equal probabilities of seroconversion or not. The most important variables contributing to this modest predictive accuracy were enteric pathogens (enterovirus in stool), ex vivo flow cytometry measurements (regulatory CD4+ T cells expressing CCR9), the neutrophil count and the (correlated) total leukocyte count, and plasma cytokines (Fig. 3c). The out-of-sample accuracy of the combined modules in predicting shedding of poliovirus after immunization did not differ from that expected by chance (Supplementary Fig. 2).

Infant study arm (receipt of azithromycin or placebo 2 weeks before sample collection at vaccination) was predicted with a median accuracy of 66.7% (IQR: 58.3–75.0%; Fig. 3b). All measurement modules contributed to this predictive accuracy with the exception of demography (as expected because of the randomized study design). Important variables included measures of intestinal inflammation and integrity (calprotectin, myeloperoxidase, α1-antitrypsin), regulatory T cells homing to the intestine and enteric bacterial pathogens (Campylobacter, enteroaggregative Escherichia coli) (Fig. 3d).


Our study provides insight into the ecology of the gastrointestinal tract and the developing immune system in infants in a low-income setting with a high load of bacterial and viral intestinal pathogens. Perturbing the intestinal microbiota with azithromycin allowed us to overcome some of the difficulties with complex association data in observational studies and reveals distinct bacteria-associated changes in infant immune status. To analyze these complex data including over 30 infection and 50 immune parameters per child we used a systems biology approach to identify immune signatures that could predict OPV immunogenicity.

We did not find a distinct immune phenotype that was strongly associated with OPV seroconversion or shedding among the immune parameters that we measured. As previously reported, infection with enteroviruses at the time of vaccination was more common among infants who failed to seroconvert to OPV31, but these infections were not associated with a distinct immune signature in peripheral blood or stool (Fig. 2b). Combining all measurements in a machine-learning (random forests) analysis, OPV seroconversion could be predicted with an out-of-sample accuracy of 58% (IQR: 50–69%), only a modest improvement over the 50% accuracy expected by chance. Apart from enterovirus infection, the variables with the highest importance in this analysis were the numbers of CD4+ T cells homing to the small intestine (CCR9+), particularly CD4+FOXP3+ (regulatory) T cells, and also circulating leukocytes, in particular neutrophils, and the fecal biomarker of enteropathy α1-antitrypsin. This is potentially suggestive of a local intestinal inflammation that inhibits OPV immunogenicity. CD4+ regulatory T cells have a well-established role in intestinal inflammation: they home towards the inflamed gut and can resolve intestinal inflammation32. They are also involved in the early antiviral immune response to mucosal infection in mice33,34. However, without mucosal tissue samples, it is difficult to draw conclusions about their role in the response to OPV in these infants.

Shedding of poliovirus after vaccination was strongly correlated with seroconversion, and markers of inflammation were somewhat raised among infants who did not shed poliovirus after vaccination as seen for seroconversion (although these differences were not significant in univariate analyses after FDR correction). However, shedding of OPV after immunization was not predicted with any degree of accuracy above that expected by chance on the basis of the immune parameters that we measured.

We did find significant seasonal variation in several immune parameters, including raised measures of intestinal inflammation and lower levels of systemic proinflammatory plasma cytokines (IL-1β, IFN-γ) and neutrophil counts in the warmer months of September to October compared with November to December. This may reflect differences in exposure to infection, with many diarrheal pathogens more common in the warmer months in this population and respiratory illness peaking in colder months9,35.

Treatment of infants with azithromycin resulted in a network of correlated immune changes that were distinct from those associated with OPV seroconversion. This is consistent with the absence of an effect of treatment on vaccine response. Prior receipt of azithromycin or placebo could be predicted with a median of 67% out-of-sample accuracy (IQR 58–75%). In addition to the previously reported reduction in bacterial pathogens and biomarkers of intestinal inflammation, important variables in the random forests analysis included gut-homing (β7+ or CCR9+) regulatory CD4+ T cells and neutrophil counts. Systemic inflammation measured by CRP in plasma was not affected by treatment, despite the known anti-inflammatory effects of azithromycin, which may be mediated both through a reduction in bacterial load and direct effects on cell signaling pathways36.

Our study has a number of limitations. It is observational in nature and lacks mucosal tissue that would have allowed a more direct assessment of intestinal immunity. Although pediatric endoscopy has an excellent safety record, we decided not to take intestinal biopsies because infants were asymptomatic and healthy despite raised levels of EE biomarkers. We therefore relied on inference from the systemic compartment and from circulating cells expressing mucosal and gut-homing receptors/integrins. Further investigation of underlying mechanisms responsible for oral vaccine failure may benefit from examination of tissue biopsies taken during diagnostic procedures and use of appropriate in vitro and animal models37,38.

An additional limitation is that we focused on a single population of seronegative infants in south India with high levels of EE biomarkers. Indeed, fecal biomarkers of inflammation in our study are among the highest reported from any cohort described so far in the world39. It will be important to determine how our findings in this population translate to other settings, such as in sub-Saharan Africa, where oral vaccine immunogenicity is also compromised. We deliberately enrolled infants without detectable serum neutralizing antibodies to serotype 3 poliovirus, but these infants were previously exposed to trivalent oral poliovirus vaccine and may have developed an immune response that was not detectable in blood or that had waned by the time of enrollment. It is therefore possible that our population included both naïve and immunologically primed infants with a potential impact on the subsequent immunogenicity of OPV. However, we were not able to distinguish such a population and analysis of infant vaccination history and seroprevalence at screening for eligibility was consistent with a simple hypothesis of a small, independent probability of vaccine failure per dose of earlier vaccine40.

Finally, we do not report on the commensal intestinal microbiota here, which is known to have immunomodulatory effects3. However, in a subset of infants from this study the bacterial microbiota was assessed by 16S rRNA sequencing of stool and found not to correlate with OPV immunogenicity or shedding28.

In conclusion, our study did not find a strong predictor of OPV immunogenicity in this population in south India. Infection with enteric viruses and changes in gut-homing regulatory T cells showed a modest association with OPV immunogenicity. Further work would be required to investigate this possible association, ideally including mucosal samples. It is clear that intestinal homeostasis among these infants is skewed towards an inflammatory state that maintains health in the presence of multiple intestinal pathogens. On average, each infant had >2 pathogen targets detected in stool using TAC but only 2% had diarrhea at enrollment31. We do not fully understand the mechanisms responsible for the development and maintenance of this inflammatory environment, although they are likely to reflect the integration by the infant immune system of multiple signals from intestinal viral and bacterial pathogens and commensals3,41. Further investigation of these mechanisms in humans and in animal models may identify pathways that could be targeted by coadministered drugs or adjuvants that enhance the immune response to oral vaccines.

Translating our current findings into interventions to improve oral vaccine immunogenicity and effectiveness is difficult. Treating enteric virus infection or directly perturbing the infant intestinal immune environment away from an inflammatory state that might be adaptive in this community may well be detrimental. However, alternative approaches could include a change to an earlier immunization schedule, including neonatal dosing, before exposure to pathogenic enteric viruses, or interventions to promote the gut health of neonates, such as maternal immunization, water and sanitation interventions or promotion of exclusive breastfeeding.


Study population

Infants aged 6–11 months living in Vellore, south India, were enrolled in a double-blind, randomized, placebo-controlled trial to study the effect of a 3-day course of azithromycin on serotype-3 poliovirus vaccine immune response. Infants were enrolled in the study if they lacked detectable antibodies against serotype 3 poliovirus at a dilution of 1 in 8 and were medically fit. They were excluded if they had a history of allergic reaction after oral poliovirus vaccine, had chronic diarrhea (>14 days), were receiving immunosuppressant medication, or they or their mother had syndromic or documented evidence of being immunocompromised. Full details of the study population and the result of treatment with azithromycin have been published previously31. For the present study, we selected a random sample of 300 infants from those who had completed the trial and had sufficient sample volumes, with equal numbers chosen by study arm and poliovirus seroconversion status (defined as the detection of serotype-3 poliovirus-specific serum neutralizing antibodies at a dilution of 1 in 8 or higher in blood taken 21 days after vaccination).
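The selection of infants for the present study can be sketched as a stratified random draw. The Python code below is illustrative only: the record fields `arm` and `seroconverted`, the fixed seed, and the figure of 75 infants per stratum (300 split equally over four strata) are our assumptions, since the text states only that equal numbers were chosen by study arm and seroconversion status.

```python
import random
from collections import defaultdict

def seroconverted(reciprocal_titre, threshold=8):
    """Seroconversion as defined above: neutralizing antibody detected at 1:8 or higher."""
    return reciprocal_titre >= threshold

def stratified_sample(infants, per_stratum, seed=0):
    """Draw the same number of infants at random from each (arm, seroconverted) stratum."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    strata = defaultdict(list)
    for infant in infants:
        strata[(infant["arm"], infant["seroconverted"])].append(infant)
    sample = []
    for key in sorted(strata):
        members = strata[key]
        if len(members) < per_stratum:
            raise ValueError(f"stratum {key} has only {len(members)} infants")
        sample.extend(rng.sample(members, per_stratum))
    return sample

# Hypothetical eligible population: 100 infants in each of the four strata
population = [
    {"id": i, "arm": arm, "seroconverted": status}
    for i, (arm, status) in enumerate(
        (a, s) for a in ("azithromycin", "placebo") for s in (False, True)
        for _ in range(100)
    )
]
selected = stratified_sample(population, per_stratum=75)
```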

The trial was performed in accordance with good clinical practice and ethical principles of the Declaration of Helsinki, including the collection of written informed consent from parents. The study was approved by the Institutional Review Board of the Christian Medical College Vellore and the Drugs Controller General of India. All biological samples were coded with a unique ID linked to the study participant ID, to ensure that only anonymized samples were analyzed and that laboratory personnel were blinded to study group assignment. The trial was registered with the Clinical Trials Registry India on 9 May 2014, number CTRI/2014/05/004588.

Full blood count and isolation of peripheral blood mononuclear cells

Total white blood cell count and differential leukocyte count (lymphocytes, neutrophils, monocytes and eosinophils) were performed on blood collected in EDTA tubes by trained technicians using standardized methods (Leishman’s staining and a Neubauer’s chamber). Peripheral blood mononuclear cells (PBMC) were isolated by the Ficoll Hypaque gradient method on the day of vaccination (day 14 of the trial). PBMC were analyzed ex vivo using flow cytometry (FACS) within 4–6 h of collection.

FACS analysis

Samples were acquired on a BD FACSAria™ III running BD FACSDiva software version 8.0.1, and results were analyzed using FlowJo X 10.0.6 software (Supplementary Table 2). Cell viability was determined using the LIVE/DEAD® Fixable Aqua Dead Cell Stain Kit. Appropriate single-stained controls and fluorescence minus one (FMO) controls were used. The PBMCs collected on the day of administration of monovalent OPV3 (pre-vaccination) were assessed for lymphocyte subsets, with a special focus on the gut-homing subpopulations. Staining was performed for CD3, CD4, CCR6, CCR9, FOXP3, and β7-integrin (Supplementary Table 2). This allowed us to define the following populations: CD3+ T cells, CD3+CD4+ T cells, and CD3+CD4+FOXP3+ regulatory T cells, and to investigate CCR6+, CCR9+, and integrin β7+ gut-homing subsets (gating strategy shown in Supplementary Fig. 3).

Plasma cytokine and C-reactive protein levels

Plasma was separated from blood samples collected on the day of vaccination and stored at −80 °C until analysis. Samples were tested for a diverse range of pro-inflammatory and anti-inflammatory cytokines, growth markers and chemokines using the Human Cytokine Magnetic 30-plex panel and analyzed using a Luminex™ platform. This includes chemokines (Eotaxin, IP-10, MCP-1, MIG, MIP-1α, MIP-1β, and RANTES), growth factors (EGF, FGF-basic, HGF, and VEGF), and cytokines (G-CSF, GM-CSF, IFN-α, IFN-γ, IL-1β, IL-1RA, IL-2, IL-4, IL-5, IL-6, IL-7, IL-8, IL-10, IL-12 (p40/p70), IL-13, IL-15, IL-17, and TNF-α). Results were calculated using the 5PL fit equation in the Bio-plex Manager software Version 6.1. Plasma C-reactive protein (CRP) levels were estimated using the ProcartaPlex Human CRP Simplex kit and analyzed using the Luminex™ platform.
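For readers unfamiliar with the 5PL (five-parameter logistic) fit mentioned above, the sketch below shows the curve in its commonly used form and how a concentration is read back from a measured signal. The parameter names and this particular parameterization are assumptions; the text does not give the exact form used by the software, and the parameter values in the example are hypothetical.

```python
def five_pl(x, a, b, c, d, g):
    """Five-parameter logistic response at concentration x (x > 0).

    a: response at zero concentration
    d: response at infinite concentration
    c: location parameter (the mid-point concentration when g == 1)
    b: slope factor
    g: asymmetry factor (g == 1 reduces to the four-parameter logistic)
    """
    return d + (a - d) / (1.0 + (x / c) ** b) ** g

def five_pl_inverse(y, a, b, c, d, g):
    """Concentration that yields response y; y must lie strictly between a and d."""
    return c * (((a - d) / (y - d)) ** (1.0 / g) - 1.0) ** (1.0 / b)

# Hypothetical fitted parameters for one analyte on one plate
params = dict(a=10.0, b=1.2, c=50.0, d=20000.0, g=0.8)
signal = five_pl(100.0, **params)               # fluorescence expected at 100 pg/mL
concentration = five_pl_inverse(signal, **params)  # recovers about 100 pg/mL
```

Because a separate curve is fitted to the standards on each plate, differences between plates in the fitted parameters are one route by which the plate-to-plate variation described in the next paragraph can arise.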

Plasma cytokine and CRP assays were run across multiple 96-well plates. We found significant plate-to-plate variation in estimated analyte levels using the Luminex™ platform. This variation remained even after using more stringent thresholds for analyte minima (at the value of the 6th dilution standard across 7 dilutions) and checking that readings across plates used to fit the standard curve were within 10% of each other. For example, the F-statistic for IL-12 across the five plates used for plasma cytokine measurement was 22.2 (p < 0.001) and for CRP across the four plates used was 16.5 (p < 0.001). The distribution of samples across plates was approximately equal (plasma cytokines) or deliberately randomized (CRP) with respect to study arm (azithromycin vs. placebo) and outcome (seroconversion), and we therefore included these measurements in our analyses. However, this highlights the importance of checking for batch effects in high-throughput studies.
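The plate effect reported above is a one-way analysis of variance with plate as the grouping factor. A minimal Python sketch of the F-statistic is given below; whether the original test was run on raw or log-transformed concentrations is not stated, and the readings in the example are hypothetical.

```python
def one_way_f(groups):
    """F-statistic for a one-way ANOVA; each inner list holds one plate's values."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of plate means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual readings around their own plate mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical readings for one analyte on three plates
plates = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat = one_way_f(plates)
```

The statistic is referred to an F distribution with k − 1 and n − k degrees of freedom to obtain the p-value; that step is omitted here to keep the sketch free of external libraries.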

Environmental enteropathy biomarkers

Stool biomarkers of inflammation (myeloperoxidase, calprotectin), protein-losing enteropathy (α1-antitrypsin) and immune activation (neopterin), and plasma biomarkers of microbial translocation (soluble CD14 and endotoxin-core IgG) and epithelial damage (intestinal fatty acid binding protein) were measured on the day of vaccination using commercial enzyme-linked immunosorbent assays (ELISA)31. Stool aliquots for the fecal biomarker ELISA assays were supplemented with a cocktail of protease inhibitors before being stored at −70 °C.

Detection of poliovirus shedding and stool pathogens

Shedding of serotype 3 Sabin poliovirus in stool samples collected 7 days after vaccination was assessed using a singleplex quantitative real-time PCR, using RNA extracted by Vx reagents on a Qiaxtractor42. Bacterial, viral, and eukaryotic pathogens were analyzed using total nucleic acid extracted from stool samples collected on the day of vaccination using an enteropathogen Taqman Array Card (TAC) quantitative PCR assay developed by the Division of Infectious Diseases and International Health, University of Virginia31,43. TAC assay targets allowed identification of pathogenic bacteria (Aeromonas, Campylobacter, Clostridium difficile, Helicobacter pylori, Salmonella, Shigella, Vibrio cholerae, and enteroaggregative, enteropathogenic, enterotoxigenic and shiga-toxin producing Escherichia coli (EAEC, EPEC, ETEC, and STEC)), viruses (Adenovirus, Astrovirus, Enterovirus, Norovirus, Rotavirus, and Sapovirus) and eukaryotes (Ancylostoma, Ascaris, Cryptosporidium, Cyclospora, Enterocytozoon bieneusi, Entamoeba histolytica, Encephalitozoon intestinalis, Giardia, Isospora, Necator, Strongyloides, Trichuris).

Statistical analysis

Data on fecal and plasma biomarkers of environmental enteropathy, circulating ex vivo T-cell phenotype, plasma cytokines and leukocyte counts for samples collected on the day of vaccination (study day 14) were compiled in a single dataset. Correlation among the variables was assessed by calculating Pearson’s correlation coefficient for the log-transformed variables rescaled to have a mean of zero and standard deviation (SD) = 1. Univariable comparisons between groups were based on analysis of variance for the log-transformed variables or Wilcoxon’s (non-parametric) rank sum test for the untransformed data (two-sided tests). p-values for the univariable tests of significance were corrected for multiple comparisons using FDR correction44.
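The multiple-comparison step can be sketched as follows. We assume here that the FDR correction is the Benjamini-Hochberg step-up procedure, which is the most common choice but is not named in the text; the function name and the example p-values are ours.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for four immune parameters
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

With 51 immune parameters tested against each outcome, a raw p-value has to be well below 0.05 to survive this adjustment, which is why several nominal differences in the Results are reported as not significant after FDR correction.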

Hierarchical cluster analysis of variables in the complete dataset was performed using Ward’s minimum variance criterion45. Heatmaps of the clustered dataset were plotted to visualize the relationship between infant immune phenotype and study arm/seroconversion status. The association between the immune phenotype data and classification of infants according to their seroconversion status or study arm was assessed using the random forests algorithm46. For each analysis we report the median accuracy from a 10-fold cross-validation using 20 random forests for each fold. Variables were ranked by their importance in the random forests analysis and the top eight most important variables were plotted in a correlation network, with links between pairs of variables shown if their Spearman rank correlation coefficient was greater than 0.2. Variable importance was assessed by the mean decrease in Gini impurity resulting from the inclusion of each variable in the random forest model.
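The cross-validation scheme can be sketched independently of the classifier. The Python code below is a minimal illustration, not the R code used for the study: it takes any `fit_predict` function, so a random forest implementation would be plugged in where the example uses a trivial majority-class rule. The function names, the fixed seed, and the unstratified fold assignment are our assumptions.

```python
import random
import statistics

def repeated_kfold_accuracy(X, y, fit_predict, k=10, repeats=20, seed=0):
    """Out-of-sample accuracy for every fold of a k-fold CV repeated `repeats` times.

    fit_predict(X_train, y_train, X_test) must return one predicted label per test row.
    """
    rng = random.Random(seed)
    n = len(y)
    accuracies = []
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]  # k folds of near-equal size
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in idx if i not in held_out]
            preds = fit_predict([X[i] for i in train_idx],
                                [y[i] for i in train_idx],
                                [X[i] for i in test_idx])
            correct = sum(p == y[i] for p, i in zip(preds, test_idx))
            accuracies.append(correct / len(test_idx))
    return accuracies

def summarize(accuracies):
    """Median and interquartile range of the per-fold accuracies."""
    q1, _, q3 = statistics.quantiles(accuracies, n=4)
    return statistics.median(accuracies), (q1, q3)

def majority_class(X_train, y_train, X_test):
    """Stand-in classifier (NOT a random forest): always predict the commonest label."""
    label = statistics.mode(y_train)
    return [label for _ in X_test]

# Hypothetical data: 126 infants, one uninformative feature, balanced outcome
X = [[float(i)] for i in range(126)]
y = [i % 2 for i in range(126)]
median_acc, iqr = summarize(repeated_kfold_accuracy(X, y, majority_class))
```

Running an uninformative classifier like this one through the same loop gives an empirical chance baseline against which the reported accuracies can be read. Quartile conventions differ between software packages, so the IQR from `statistics.quantiles` may differ slightly from one computed in R.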

All analyses were performed in the R programming language (R Core Team. R: A Language and Environment for Statistical Computing). Individual R packages used during the analysis included pheatmap for the heatmap plots, beeswarm for the univariable plots, randomForest for the random forest analyses, and igraph for the network plots.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.