Effects of infection history on dengue virus infection and pathogenicity

The understanding of immunological interactions among the four dengue virus (DENV) serotypes and their epidemiological implications is often hampered by the lack of individual-level infection history. Using a statistical framework that infers full infection history, we analyze a prospective pediatric cohort in Nicaragua to characterize how infection history modulates the risks of DENV infection and subsequent clinical disease. After controlling for age, one prior infection is associated with 54% lower, while two or more are associated with 91% higher, risk of a new infection, compared to DENV-naive children. Children >8 years old have 55% and 120% higher risks of infection and subsequent disease, respectively, than their younger peers. Among children with ≥1 prior infection, intermediate antibody titers increase, whereas high titers lower, the risk of subsequent infection, compared with undetectable titers. Such complex dependency needs to be considered in the design of dengue vaccines and vaccination strategies.

Reviewer #1 (Remarks to the Author):
The authors make use of a very nice cohort dataset from Nicaragua, applying statistical models to estimate the relationship between unobserved infection history and subsequent infection and disease outcomes. Getting at such unobserved processes is notoriously challenging, and the authors have presented a sensible approach for getting at this problem. However, in my view there are a few areas where the analysis could be strengthened.
I had the following main comments:
-In Figure 1, the authors compare model estimates for infection rates with observed surveillance data as a validation. But if I understand 2.4 of the supplement, the surveillance data is used to infer the left-censored infection history? So it shouldn't be surprising that the two line up pre-2004?
-On page 15, the authors discuss the role of waning cross-protection. They estimate an increase in risk from the 3rd year onwards, but I wonder whether it would be possible to establish a clearer relationship with the model they are using. If responses wane over time, protection should presumably decline monotonically -would it be feasible to explicitly incorporate a function of this type into the model and estimate the likely waning rate?
-The focus on the role of fans and home ownership seems a little odd given that there were multiple comparisons in Table S6 and although the univariable p-values were significant for these two, they were not overwhelmingly so (the evidence in support of a role for school type or mother education was not much weaker).
-On page 18, the authors note the issue with using a 4-fold cutoff for individuals who have higher baseline titres. It's therefore plausible the use of 4-fold rise as a correlate for infection could have influenced conclusions in other ways too. What would the results look like if the authors used a 2-fold cutoff everywhere?
-There have been several recent studies using Bayesian approaches to infer unobserved infections for dengue (DOI: 10.1038/s41586-018-0157-4) and influenza (DOI: 10.1101/330720, 10.1371/journal.pbio.2004974). Unlike the current study, these incorporated titer values as well, so it would be worth discussing how the methods described fit in with other work on similar questions. The authors suggest antibody titers weren't modelled because of limited availability, but they feature prominently in the results and discussion.
-It was useful to see a simulation study to check the model's ability to infer infections, but there weren't many measures of model performance when applied to the data, e.g. residuals. There is also likely to be a lot of collinearity between the variables in 2.2 and 2.3; would it be possible to tease out which of these are actually most predictive of infection and disease risk?
-It would be helpful to include a model schematic in the supplement: there are a lot of different inputs, so it would be good to have a summary of how they fit together.
-I couldn't find a data availability statement. For the analysis to be reproducible, the authors should make the input data for the model available alongside the paper (or at least, what can be published without affecting participant identifiability), and ideally the model code.
-In Figure 4B, what is causing the peak in the distribution at 9 years? Is it an artefact of the timing of the outbreaks?
Minor point:
-On SI page 9, should rho*c be rho/c (as this is what appears in the Poisson function)?
Reviewer #2 (Remarks to the Author):
The major claims of this paper relate to the risk of infection and the proportion of infections that are estimated to be symptomatic or asymptomatic at different times after different numbers of infections and with different antibody titres.
The estimates of risk of disease given infection are not novel, with multiple previous papers looking at these in this cohort. In addition, these results are also not clearly put in context of other estimates of this risk, for example: Clapham et al. 2017 PLoS NTDs (using aggregated data from multiple cohorts) estimating a change in risk of disease at different times following infection. The risk of disease at different antibody levels is also shown in Katzelnick et al. 2017 Science and though this paper is discussed, it is still not clear how these two analyses differ.
The estimates of the risk of infection given different infection histories and Ab titres are interesting. However, for infection histories, I feel the results could be better presented to help the reader understand how this risk changes over time after the different infections.
In general, I found the results of this paper, as currently presented, not very easy to interpret.
Specific comments:
I'm not sure how meaningful it is to present the "crude odds ratio" here when the paper shows much variation in this number for different groups.
For the annual risk of infection, please be clear what data this was calculated using.
Please define a 3-4 year cycle as these cycles are not clear to me from the data shown. DENV2 perhaps has two peaks 3-4 years apart, whereas DENV4 is low all years, there is one peak of DENV1 in 2003, DENV3 peaks in 1998 and then has low transmission until 2009.
Why was the age group cut-off chosen as 8 years old?
Bottom of Page 7: Please be more specific about how population growth might have contributed.
Figure 2: For the "Years since most recent primary infection" (and non-primary), is the "2" meant to be "between 1 and 2" and should the reference be <1?
The presentation of the OR for risk of infection given one past infection doesn't make sense given the fact that it is later shown that this risk varies depending on time since infection. In my opinion, this makes it hard for the reader to understand what the impact of the time after infection is and leads to conclusions about what happens generally after one infection, instead of at the different times. If the risk is decreased at any point after one infection, but then increased in those who had an infection more than a year ago compared to one year, what does it mean for the overall risk of infection 1 year after infection compared to no infection?
The serotype-specific estimates are interesting, but it would be informative to see this broken down by first and second infection too. The estimates here are an amalgamation of infections occurring in multiple types of individuals, which may be obscuring different relationships in different groups.
Figure 3: For time since infection, is this also between 1 and 2 years, and less than 1 as in Figure 2? Also, the time since infection should be broken down into since primary and since non-primary infection as in Figure 2. This is important because of the different waning of immunity after the different infections.
Sentence that begins: "Distribution…" Please clarify what "restricting the analysis to the study period" means.
Discussion: The first paragraph makes it sound like you have not used the individual level data or present a method that does not need such data. Please rephrase.
Not sure what "both in reference to one year after Infection" means here? The fact that risk of infection decreases after 1 infection is perhaps counter-intuitive. Again, there is a conflation of multiple time periods here, and the authors have the data to tease this apart, as they should comment on here.
For the sentence beginning: "While secondary infection….." As in my point on Figure 3, I think that this analysis has not taken into account the time since infection, which could hide the relationship between number of previous infections and outcome. Once this analysis has been done, please comment on these results here.

Reviewer #1 (Remarks to the Author):
The authors make use of a very nice cohort dataset from Nicaragua, applying statistical models to estimate the relationship between unobserved infection history and subsequent infection and disease outcomes. Getting at such unobserved processes is notoriously challenging, and the authors have presented a sensible approach for getting at this problem. However, in my view there are a few areas where the analysis could be strengthened.

Response:
We thank the reviewer for the encouraging comments.

... years. If the model cannot explain either the cohort data or the surveillance data, it is likely to predict unexpected deviation of the trend of infection risks from that of the surveillance counts. This comparison assures us that the information contained in the cohort data during the study period is more or less consistent with that contained in the surveillance counts.

We think the current discrete format with 3 categories, which is more or less equivalent to a two-parameter function, is the most robust way to characterize the general variation pattern of cross-protection. If more years of data become available in the future, we will certainly consider a flexible function to model cross-protection.
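
The contrast between a discrete format and a smooth, monotonically waning function can be pictured with a small sketch. This is an illustration only and not the model fitted in the manuscript: the exponential form, the function names, and the values of `initial_or` and `waning_rate` are all assumptions made for demonstration.

```python
import math

def waning_odds_ratio(years_since_infection, initial_or, waning_rate):
    """Hypothetical two-parameter waning curve.

    The log odds ratio decays exponentially, so the odds ratio moves
    monotonically toward 1 (no effect) as time since infection grows.
    """
    log_or = math.log(initial_or) * math.exp(-waning_rate * years_since_infection)
    return math.exp(log_or)

def categorical_odds_ratio(years_since_infection, or_by_category):
    """Discrete format: one odds ratio each for year 1, year 2 and >=3 years."""
    if years_since_infection <= 1:
        return or_by_category[0]
    if years_since_infection <= 2:
        return or_by_category[1]
    return or_by_category[2]

# Illustrative values only (assumed, not estimates from the study).
initial_or, waning_rate = 0.4, 0.3
curve = [waning_odds_ratio(t, initial_or, waning_rate) for t in (1, 2, 3, 5)]
for t, value in zip((1, 2, 3, 5), curve):
    print(f"year {t}: OR = {value:.2f}")
```

The smooth curve forces a monotone shape, whereas the categorical version lets each period take its own value, which is why the latter may be more robust when only a few years of follow-up are available.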
(3) The focus on the role of fans and home ownership seems a little odd given that there were multiple comparisons in Table S6 and although the univariable p-values were significant for these two, they were not overwhelmingly so (the evidence in support of a role for school type or mother education was not much weaker).

Response:
We included the number of fans and home ownership because they are the only two variables that reached the predefined p-value cut-off of 0.05. Indeed, the univariate p-values were not adjusted for multiple comparisons. The purpose of this screening analysis is to rank the potential predictors. Mother's education was also not considered because only the "university" category differs from the other categories in incidence of DENV infection, and this category has a small number of children. We fitted a logistic regression of observed DENV infection status during the study period on school type, home ownership and number of fans, where private and semi-private were combined for school type, and number of fans was treated as continuous. School type was not significant (p-value=0.5) in the presence of home ownership (p-value=0.014) and number of fans (p-value=0.032). As a result, we think the number of fans and home ownership best represent the socioeconomic variables that may be predictive of DENV infection. To clarify the rationale of this choice, we added these reasons to the end of Section 4 in the Supporting Information.

Response:
The phenomenon that a high baseline antibody titer may not be boosted much after infection has been seen in many infectious diseases such as influenza and is termed the antibody ceiling effect 1,2. Another influenza study showed that a ceiling effect is rare, if present at all, for low to moderate baseline titers 3. In our data, a 2-fold iELISA cutoff leads to twice as many infections as a 4-fold cutoff and will likely result in low specificity of the infection status, in particular for low baseline titers. In the dengue antibody paper (Salje et al.) you suggested 4, a 2-fold cutoff in the mean serotypic HI titers seems to have a more satisfactory combination of sensitivity and specificity than a 4-fold cutoff.
However, in our study, the iELISA titer measures the overall non-serotypic antibody level, which is subject to much higher variation than the mean serotypic HI titer (taking an average can shrink the variability substantially). Consequently, we did not pursue additional analyses using a uniform cutoff of 2-fold.

Response:
Thank you for suggesting the references. We agree that it would be more informative to model the dynamics of the antibody levels simultaneously. However, we believe that fine-grained modelling of antibodies would need to be supported by more data in our setting.
Given that we have annual samples for 6 years but children were born up to 10 years before the study period, we do not think it is feasible to reliably infer the historical antibody levels. In the Thailand-KPP cohort described in Salje et al.'s Nature 2018 paper 4, the quarterly serological sampling is much more frequent than in our study, and the serotypic titers further enriched the information available (note that the random effects modelling the antibody dynamics share means and variances across serotypes in their paper). In addition, in Salje et al.'s paper, the effect of infection history on subsequent infection risks was assumed to be fully mediated by antibody level, and therefore it suffices to estimate baseline antibody level rather than to impute the whole pre-study infection history. Although we were not able to model the antibody dynamics, our study does share similar important findings with the Thai study. For example, the probability of disease was largely not correlated with preseason antibody levels. Interestingly, the somewhat higher probability of disease given infection at very high titers (>1280) in our study also appeared true in the Thai study (empirical probability, log2 titer>6, extended data figure 6), although the two studies used different assays. Our study also has distinct findings, e.g., an ADE pattern for association of the risk of infection with preseason titer among those with at least one prior infection. We have added these comparisons to the 9th and 10th paragraphs in DISCUSSION (pages 20-21).

Response:
We agree a model schematic would be helpful and have attached one as Figure S4 in the SI Appendix. A description of this schematic can be found in Section 2.5 of the SI and the figure's caption.

... suggesting the possibility of spatial and temporal heterogeneity in pathogenicity 1.
We also pointed out in paragraph 7 (page 18) that, although the pooled analysis found secondary infections were more likely to be symptomatic than primary ones, such a difference was not seen in the Nicaraguan data included in the analysis, nor was it observed for most serotypes except for DENV-1. We explained that our analysis was adjusted for age but the pooled analysis was not, which may partially account for the gap between the two analyses.
We ... ORs for 1, 2 and ≥3 years are 0.46, 0.52 and 0.68 for the former and 1.91, 1.45 and 1.26 for the latter". We referred readers to Section 7.2 of the SI Appendix for details on how these numbers were obtained.
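
To make the quoted odds ratios easier to read, the short sketch below converts each one into a percent change in odds relative to the reference group. The helper name `percent_change_in_odds` and the list names are illustrative assumptions; the numbers are the six ORs quoted above. The year-1 values correspond to 54% lower and 91% higher odds, in line with the percentages reported in the abstract.

```python
def percent_change_in_odds(odds_ratio):
    """Percent change in odds relative to the reference group, rounded to an integer."""
    return round((odds_ratio - 1.0) * 100)

years = ["1", "2", ">=3"]
ors_former = [0.46, 0.52, 0.68]  # ORs quoted for "the former"
ors_latter = [1.91, 1.45, 1.26]  # ORs quoted for "the latter"

for year, former, latter in zip(years, ors_former, ors_latter):
    print(
        f"year {year}: former {percent_change_in_odds(former):+d}%, "
        f"latter {percent_change_in_odds(latter):+d}%"
    )
```

Both sequences move toward 0% (an odds ratio of 1) as the number of years increases, which is the decay over time described in the response.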

Specific comments:
(1). I'm not sure how meaningful it is to present the "crude odds ratio" here when the paper shows much variation in this number for different groups.

Response:
We do agree with the reviewer that the crude odds ratios are not very meaningful given that important risk factors such as age group and infection history are not controlled for. We moved these odds ratios from the main text to the SI Appendix (Sec. 7.1) to serve as auxiliary information for the relative virulence of the serotypes.

(2). For the annual risks of infection, please be clear what data this was calculated using.

Response:
The estimation of annual risks of infection is based on the individual data from the cohort and the surveillance data. We added "Based on the model fitted to individual data from the cohort and the surveillance data" at the beginning of the paragraph subtitled "Annual risk of infection". Please also see the newly added model schematic in Figure S4 in the SI Appendix.

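(3). Please define a 3-4 year cycle as these cycles are not clear to me from the data shown. DENV2 perhaps has two peaks 3-4 years apart, whereas DENV4 is low all years, there is one peak of DENV1 in 2003, DENV3 peaks in 1998 and then has low transmission until 2009.

Response: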
We agree the 3-4 year cycle is not really clear. Therefore, we removed that sentence from the manuscript.
(4). Why was the age group cut-off chosen as 8 years old?

Response:
We chose the cut-off of 8 years old because the dengue vaccine is licensed for individuals aged 9 or above (http://www.who.int/immunization/research/development/dengue_q_and_a/en/) and we hope to provide relevant information for future research on vaccine design or vaccination strategies. We added this explanation to Section 2.3 of the SI Appendix.
(5). Bottom of Page 7: Please be more specific about how population growth might have contributed.

Response:
The case number is given as the attack rate multiplied by the population size.
Therefore, for a growing population, an epidemic in 2009 would have caused more cases than those in the nineties even with similar infection probabilities. We added "that is, the number of DENV-3 cases captured by surveillance in 2009 could be higher than those in the nineties because of a larger susceptible population, although the risk of infection did not increase" (bottom of page 8).
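
The arithmetic behind this argument can be sketched as follows. The attack rate and population sizes are hypothetical numbers chosen only for illustration; they are not taken from the cohort or the surveillance data.

```python
def expected_cases(attack_rate, population_size):
    """Expected case count: attack rate multiplied by population size."""
    return attack_rate * population_size

# Hypothetical numbers chosen only to illustrate the argument.
attack_rate = 0.05             # same infection probability in both periods
population_nineties = 100_000  # assumed population size in the nineties
population_2009 = 150_000      # assumed (larger) population size in 2009

cases_nineties = expected_cases(attack_rate, population_nineties)
cases_2009 = expected_cases(attack_rate, population_2009)
print(f"nineties: {cases_nineties:.0f} cases; 2009: {cases_2009:.0f} cases")
```

With an identical attack rate, the larger population yields more cases, so a higher surveillance count does not by itself imply a higher risk of infection.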

(6). Figure 2: For the "Years since most recent primary infection" (and non-primary), is the "2" meant to be "between 1 and 2" and should the reference be <1?

Response:
We did an additional analysis stratifying serotypic probabilities of disease given infection by the number of prior infections (0 vs. ≥1). We did not see a significant difference in pathogenicity for DENV-1 and DENV-3. However, the pathogenicity of secondary infections with DENV-2 doubled as compared to primary infection with DENV-2. This could be related to the fact that the DENV-2 epidemics during the study period were preceded by DENV-1 epidemics during 2002-2004 in Managua (Fig. 1), as increased pathogenicity of DENV-2 in terms of severe disease following previous infection with DENV-1 has been noticed before. We have presented this result in Table S11 of the SI Appendix and discussed it at the end of the 7th paragraph in DISCUSSION (pages 18-19).

For simplicity and also due to data limitations, this stratified analysis is not further stratified by age group. What we saw is very similar to the original results stratified by age group: the pathogenicity peaked in year 2 for one prior infection (matching 2-8 years old) and in year 1 for two or more prior infections (matching >8 years old). This similarity is very likely because the effect of age is confounded with that of the number of prior infections, as older children tended to have more prior infections. We presented the results as Table S13 in the SI Appendix and discussed them in the 8th paragraph in DISCUSSION (page 19).
(10). Sentence that begins: "Distribution…" Please clarify what "restricting the analysis to the study period" means.

Response:
We rephrased this sentence as "When only the infection pairs that occurred within the study period (2004-2009) were used for estimation" (bottom of page 12).
(11). Discussion: The first paragraph makes it sound like you have not used the individual level data or present a method that does not need such data. Please rephrase.

Response:
Thanks for the suggestion. Our method certainly needs individual-level data. We now state in the paragraph that "we developed a novel statistical approach that combines individual-level data from prospective serological cohorts with surveillance data".

Response:
We agree this statement is confusing and thus rephrased the sentence as "We found that one prior infection was protective against but two or more were a risk factor for another DENV infection, and both effects decayed over time" (page 16). Please see our response to Specific Comment (7), where we offered a better interpretation for the dependence of infection risk on the time since most recent infection after primary and secondary infections.

Response:
Please see our responses to Specific Comments (8) and (9). We have done additional analyses stratifying either serotypic pathogenicity (Table S11) or the effect of time since last infection (Table S13) ... Table S13, we also added the following to the 8th paragraph (page 19): "These