Clustering of susceptible individuals within households can drive measles outbreaks: an individual-based model exploration

When estimating important measures such as the herd immunity threshold, and the corresponding efforts required to eliminate measles, it is often assumed that susceptible individuals are uniformly distributed throughout populations. However, unvaccinated individuals may be clustered in a variety of ways, including by geographic location, by age, in schools, or in households. Here, we investigate to which extent different levels of within-household clustering of susceptible individuals may impact the risk and persistence of measles outbreaks. To this end, we apply an individual-based model, Stride, to a population of 600,000 individuals, using data from Flanders, Belgium. We construct a metric to estimate the level of within-household susceptibility clustering in the population. Furthermore, we compare realistic scenarios regarding the distribution of susceptible individuals within households in terms of their impact on epidemiological measures for outbreak risk and persistence. We find that higher levels of within-household clustering of susceptible individuals increase the risk, size and persistence of measles outbreaks. Ignoring within-household clustering thus leads to underestimations of required measles elimination and outbreak mitigation efforts.

The global elimination of measles is an important goal of the WHO 1 . To reach this goal by 2020, the WHO has set the target of 95% vaccination coverage, both at the country and district level. Many regions are close to reaching this target, or have vaccination coverage levels that already exceed this threshold. Nevertheless, outbreaks of measles still occur in these same regions [2][3][4] .
Increasing attention is being devoted to explain why outbreaks still occur in regions with high vaccination coverage [5][6][7] . Finding the answer to this question might give us an indication of where the barriers for successful elimination of measles lie and enable us to make the final push for eradication of this disease.
Such an explanation is undoubtedly multifaceted, relating to vaccination coverage and vaccine effectiveness over time, as well as to the intricacies of social interactions between individuals in populations. In relation to the latter, one potentially contributing factor is that susceptible individuals might not be randomly distributed in a population, and that this may influence to which extent an overall high vaccination coverage avoids outbreaks from occurring or persisting.
The traditional formula to estimate a threshold for herd immunity states that a fraction as large as 1 − (1/R 0 ) should be immune to infection for herd immunity to be ensured. With an R 0 for measles between 12 and 18, a population immunity level of 95% should thus be enough to stop the disease from spreading 8 . However, this formula is based on the assumption that susceptible individuals are evenly distributed throughout the population and that mixing occurs homogeneously. This is, however, not realistic, and may lead to an underestimation of the herd immunity threshold 9 .
When susceptible individuals are clustered in a population, this may lead to a higher risk for measles outbreaks to occur, even when a high level of population immunity already exists 10, 11 . Susceptible individuals in a population may be clustered in a number of ways. One way of clustering, which has already received some Scientific Reports | (2020) 10:19645 | https://doi.org/10.1038/s41598-020-76746-3 www.nature.com/scientificreports/ attention, and which has turned out to have an impact on the risk for measles outbreaks, is clustering based on geographical proximity 9,12,13 . Such findings support policy and provide a basis to set routine immunization levels not only at country-but also at district level, or to recommend campaigns in under-immunized regions 9,12 . Another way in which individuals can be clustered is by age. As social contact patterns are highly assortative with age 14 , susceptible individuals belonging to under-immunized age categories are likely to cluster and thus to be at greater risk for measles infection 9,15,16 . Other reasons for the clustering of un-vaccinated individuals can be socio-cultural and/or socio-economic, i.e. related to religion, ideology, educational attainment, income, or psychological receptiveness to rumours of vaccine associated adverse events 2,[17][18][19][20] . Finally, the household one belongs to can also be influential in determining one's vaccination status. Parents who have a negative attitude towards vaccination, or no access to vaccination resources, might leave one or more of their children un-vaccinated [21][22][23] . This can lead to an accumulation of susceptible children and young adults within the same households.
In the current paper, we aim to investigate the influence of within-household clustering. We believe that it is valuable to focus on this form of susceptibility clustering for three main reasons. First, the extent to which susceptible individuals are clustered within households is something that is relatively easy to measure 'in the field' . The definition of what constitutes a household can be articulated clearly, and a simple survey could shed light on the actual number of immune versus susceptible individuals in each household. Second, evidence from several past measles outbreaks suggests that the household is an important place for the transmission of this infection 19 . We thus hypothesise that within-household clustering of susceptible persons could have an important impact on the risk of measles outbreaks.
Finally, as the recommended age for the first dose of a measles-containing vaccine in many regions, including Flanders, is at 12-months old, infants under 1 year of age constitute an important risk group for measles infection. Infants sharing a household-where frequent contacts occur-with older unvaccinated children are at increased risk of exposure, since within-household mixing tends to be frequent and intense.
To investigate the impact of different levels of within-household susceptibility clustering on the risk for measles outbreaks, we use an individual-based model. In this type of model, the population consists of unique entities that interact with each other under the constraints of a given social structure and transmission model. This makes it possible to model multiple levels of heterogeneity in the population, such as differences in social contact behaviour or in preventive behaviour. Such sources of heterogeneity are especially relevant for diseasessuch as measles-for which a high level of immunity already exists within a population 24 .
In Flanders, Belgium, 96.2% of infants were vaccinated with measles-containing vaccines at 1 year of age, and 93.4% of children at 10 years of age, as of 2016 22,25 . This is above-or, for adolescents, close to-the threshold that was set by the WHO. However, this does not mean that Flanders is safe from measles outbreaks. Regions enjoying similarly high overall vaccination coverage have recently experienced severe outbreaks 4,6,26 , and in Flanders, smaller outbreaks regularly occur 2,3,17,27 .
In this paper, we will use Stride, a previously developed individual-based model for the transmission of infectious diseases 28 , to investigate the impact of different levels of within-household clustering of susceptible individuals on the risk for measles outbreaks. To parametrise the model we used data from Flanders, Belgium. We introduce a metric to estimate the level of within-household susceptibility clustering in a population. Furthermore, we investigate how the level of clustering within households impacts the probability of an outbreak to occur, the distribution of final outbreak sizes, and the distribution of the effective reproduction number.

Methods
Stride is a stochastic model, meaning that some of its processes, such as the simulation of social contacts and disease transmission have a probabilistic component. The simulator is designed to be versatile: by supplying different input files, it is possible to simulate a broad range of populations, scenarios and diseases. The core logic of the model is implemented in C++, making it highly portable and open to performance-optimization 29 . Stride is an open-source project. Its code is maintained in a public Github repository 30 . We briefly discuss the main features of the model and simulator here, and refer to Kuylen et al. 28 for more details.
Population. The population we used consisted of 600,000 individuals, a sample of 10% of the population of Flanders. The synthetic population consists of reference households, obtained from a social contact survey conducted in Flanders in 2010-2011 31,32 . The geographic distribution of these households is based on 2001 census data 33 . The population is closed, meaning that no births or deaths occur during the simulation. Children are assigned to a daycare (0-2 years old), preschool (3-5 years old), primary (6-11 years old), secondary (12-17 years old) or tertiary (18-23 years old) school based on enrolment statistics for Belgium, as acquired from Eurostat 34 . Adults (18-64 years old) are assigned to a workplace based on age-specific employment data and aggregated workplace size data from Eurostat and commuting data from the 2001 census. To account for general contacts, all individuals are also assigned to two artificial 'communities' . One of these represents the general contacts made during the week, while the other represents those made during the weekend, as social contact behaviour differs substantially between work week and weekend days. Each of these communities consists of 1000 individuals on average, which is in line with the size of such communities used in a previous model 35 .
The age-dependent contact rates that determine social behaviour in each of these contexts-to which we refer as 'contact pools'-are based on data from a social contact survey in Flanders 32,36 . We did not model holidays.
Immunity. At the initiation of each simulation, each individual is defined to be either immune or susceptible to the simulated infection. Individuals are marked as immune based on their age and the target level of household-based clustering. This target level of clustering is in turn determined by an input parameter with a value www.nature.com/scientificreports/ between 0 and 1. In short, this parameter represents the probability that if one person-born since 1985-in a household is vaccinated, all other persons born since 1985 in that household will also be vaccinated.
To immunize the population, we follow the procedures that we describe below. The Immunizer class samples individuals from the population to immunize at the beginning of the simulation. It needs two parameters to do this. First, it needs an age-specific distribution of immunity: for each age, we need to know the fraction of immune individuals. The distribution we used is a projection of the age-dependent immunity levels to measles in 2020 in Flanders, Belgium. It is based on projections made by Hens et al. 27 , and was obtained in the same way as was described in a more recent paper 16 . Secondly, the Immunizer also needs to be supplied with a target clustering level, as was described above.
When sampling immune individuals in the population, we first sample among those individuals that are 35 years of age or older-i.e. individuals that were born before 1985. As measles virus circulation was not substantially affected by vaccination until the mid 1980s in Belgium, we assume that the majority of individuals born before 1985 acquired natural immunity 37 . As such, we assume that clustering of susceptible individuals within households as a result of vaccination decisions does not yet play a role in this age category. To immunize individuals born before 1985, we use the following procedure. First, we calculate for each age older than 35 how many immune individuals should be in the population, based on the age-specific immunity level and the number of individuals of that age that are present in the population. Next, we select a random individual. If this individual is still susceptible, and there are not yet enough immune individuals in the age-category this person belongs to, we make them immune. We repeat this procedure until all age-dependent immunity quota for individuals older than 35 have been fulfilled.
Next, we immunize individuals younger than 35 years of age, taking into account the clustering level that was provided as an input parameter. First, we again calculate the required number of immune individuals per age given the target coverage. Secondly, we draw a random individual from a random household. If there are not yet enough immune individuals in their age-category, and provided they are still susceptible, we set this individual's health state to 'immune' . Next, we compare a random draw between 0 and 1 to the clustering level. If the draw is smaller than the clustering level, we immunize all other individuals in this household that are younger than 35 years of age. We do this within a 10% range of the constraints of age-dependent immunity levels. If the random draw is greater or equal to the target clustering level, we go back to step 2, sampling a random individual from a random household. We repeat this procedure until all age-dependent immunity quota have been fulfilled.
Contact and transmission events. The simulator moves forward in discrete time-steps of one day. Each time-step consists of two phases. First, the health status and corresponding presence in contact pools is updated for each individual. Second, the actual contacts and transmission events within each contact pool are simulated.
We adhere to the following natural history for measles. We draw 4 durations from a distribution for each individual: the duration of the incubation period, the duration of the latent period, the duration of the infectious period, and the duration of the symptomatic period. The duration of the incubation period is sampled from a log-normal distribution with median 12.5 and dispersion 1.25 38 . At the end of the incubation period, individuals become infectious for a duration of 6 to 8 days 39 . Next, after either 2 or 3 days of being infectious but still asymptomatic, individuals become symptomatic 38 . When individuals become symptomatic, it is assumed that they will remain home and thus only make contacts with persons within their own household. This symptomatic period lasts 6 to 8 days, and we assume that each primary infected individual eventually becomes symptomatic 40 .
To simulate contacts and transmissions within each contact pool, we use the age-and context-dependent contact rates described above. When a contact occurs between an infectious individual and a susceptible individual, a transmission probability is used to determine whether an actual transmission of the virus occurs. This P transmission is supplied as an input parameter.
Simulations. First, we established a relationship between the transmission probability P transmission and R 0 , the basic reproduction number. We did this by running a large number of simulations, for transmission probabilities ranging from 0 to 1. We tested 21 values for the transmission probability, and ran 1000 stochastic simulations for each of these values. Seeds to initialize our random number generator were generated from a non-deterministic, machine-specific source. We kept track of the number of secondary cases that was caused by one index case in an otherwise completely susceptible population and used this as an estimator for R 0 . We ran each simulation for 30 days, as we were only interested in the transmissions made by the index case, who recovered after-at most-27 days. From the 21,000 data points we collected in this way, we fitted a function, allowing us to predict a corresponding R 0 value from a given P transmission value. We fitted the data using the optimize.curve_fit function in the scipy python package 41 . This uses a non-linear least squares method to fit the simulation data to a given function. In our case we used a function of the form a + b * log(1 + P transmission ).
Secondly, we ran simulations over the course of 730 simulated days, until no more new infections were being recorded, to track the full course of the epidemic. We tested 13 values for the transmission probability between 0.2 and 0.8. We also tested 5 different target levels for the clustering of susceptible individuals within households (0.00, 0.25, 0.50, 0.75, 1.00), defined as the probability that all other persons (aged below 35 years of age) within the same household are immune if one person (aged below 35 years of age) in a household has been vaccinated. For each of these 65 scenarios, we ran 1000 simulations.
We started each simulation with the introduction of one random infected individual in the population. Before the first time-step, we recorded the number of immune and susceptible individuals in each age category. To determine to which extent susceptible individuals were actually clustered within households, we calculated a 'household assortativity coefficient' for the entire population at the beginning of each simulation. To do this we calculated the assortativity based on immunity status (susceptible or immune) in a network of individuals www.nature.com/scientificreports/ connected by household relations 42,43 . We constructed a network, the nodes of which represented the individuals in the population. Each node has one attribute, 'susceptible' , which can be either true or false. An edge between two nodes exists if two individuals belong to the same household. We used the networkX python package to construct the network and calculate the attribute assortativity coefficient 44 .
During the simulation, we also keep track of new transmissions. This enables us to determine the total number of cases over the course of the simulation. As we record who infects whom, it is also possible to calculate an effective R-i.e. how many persons the index case infects on average.
To run these simulations, we used the Python environment we created for Stride (PyStride), and used the multiprocessing package to run multiple simulations in parallel 45 . We ran all simulations on a Linux machine, using 8 cores (16 with hyper-threading).

Results
Relationship P transmission ∼ R 0 . First we established a relationship between the transmission probability P transmission and R 0 . We estimated R 0 as the expected number of secondary cases one infected individual would infect in an otherwise completely susceptible population.
In Fig. 1, the median (solid blue line), mean (dotted pink line), and 95% percentile interval (grey shape) over 1000 simulations for each of 21 values of P transmission between 0 and 1 are shown. We see that both the mean and the median number of secondary cases steadily increases as the transmission probability is increased. The shape roughly follows that of a logarithmic function, which is the reason why we chose to fit the data to a function of the form a + b * log(1 + P transmission ) . We observe that the 95% percentile interval widens as the transmission probability increases, but that nevertheless, mean and median remain very similar over all values of P transmission .
We fitted the data using the optimize.curve_fit function in the scipy python package 41 . This gave us the formula shown in Eq. (1) to estimate R 0 for a given P transmission .
In Fig. 1, we also added the fitted function for comparison (dotted brown line). When P transmission is 0, we observe a corresponding R 0 value of 0. We observe that the fitted function closely follows both the mean and median number of secondary cases we observed in our simulations. Based on the formula we found, a realistic range for transmission probabilities for measles in our model is between 0.372 and 0.601, which corresponds to a value of R 0 between 12.01 and 18.02, which is in line with the basic reproduction number commonly estimated for measles 46 . We ran simulations for P transmission ranging from 0.2 to 0.8, to be certain to include all relevant R 0 values (R 0 ∼ 6.80 to 22.58).

Population immunity profiles.
To get an idea of the way in which immunity is distributed over the population in the different scenarios, we investigated the fraction of susceptible individuals by age, and the assortativity of individuals within households based on their immunity status as described in the 'Methods' section. The mean fraction of susceptible individuals per year of age for simulations with clustering level 0, 0.25, 0.5, 0.75 and 1 can be seen in Fig. 2. Each data point is the mean over simulations for 13 values for the transmission probability, and with 1000 simulations per combination of clustering level and transmission probability, i.e. 13,000 simulations per tested clustering level in total. We also added the 95% percentile interval of the target projections we used as input data in Fig. 2 (grey shape). More detailed distributions of susceptibility by age for clustering levels from 0 to 1 can be found in Supplementary Fig. S1.
We see that overall, the immunity levels in all scenarios seem to be consistent with the target projected immunity levels per age. As the clustering level increases, however, we observe small discrepancies between the  www.nature.com/scientificreports/ simulated and projected immunity levels per age. This can be attributed to the fact that ages are not randomly distributed over the household sample we use to initialize our population. However, when distributing immunity in the population, we randomly draw individuals at each age. For instance, suppose a 7-year-old individual is randomly drawn, and the immunity level for 7-year-olds needs to increase to align with age-specific immunity quota in the population. Suppose also the household this 7-year-old belongs to also contains an 11-year-old, and the target clustering level is 1. In this case, we will set the 11-year-old as 'immune' , regardless of whether the age-dependent immunity quota for 11-year-olds have been reached. As such, we slightly relaxed the constraints imposed by age-specific immunity quota. Slight variations in older age categories can be attributed to fewer individuals being present in these age categories.
We also looked at the relationship between the clustering level we provided as an input parameter to the simulator, and the resulting household assortativity coefficient of the simulated population. The clustering level represents the probability that if one household member under the age of 35 is immune to measles, all other household members under 35 years of age will also be immune. We calculated the household assortativity coefficient-defined as the assortativity of individuals within households, based on immunity status-for clustering levels 0, 0.25, 0.5, 0.75 and 1. We did this for both all age groups in the population and for individuals under the age of 35.
As we increase the clustering level, we observe that the household assortativity coefficient also increases. This is the case when we observe the entire population (Fig. 3a), as well as when we only observe individuals under 35 years of age (Fig. 3b). When we measure the household assortativity coefficient over the entire population, it increases from about 0.3 to 0.5 as the target clustering level increases from 0 to 1. However, when we only take   www.nature.com/scientificreports/ into account individuals aged under 35 years of age, we see an increase from about 0.35 to 0.85 as the clustering level increases from 0 to 1. This discrepancy is as expected, as we only applied household-based clustering of susceptible individuals for individuals aged under 35 years of age. Taking into account older individuals when calculating the household assortativity coefficient lowers the coefficient, as these individuals have not been actively clustered by immunity status. We also observe that there is not much variation per clustering level over the different simulations. We also validated that the distribution of household assortativity coefficients per input clustering level remains stable for samples of the simulated population. We did this by selecting 1000 households-out of the total of 192,627 households-during each simulation, and calculating the household assortativity coefficient over the individuals in these selected households. We found that this did not affect the distribution of household assortativity coefficients by input clustering level in an important way. A box-plot of household assortativity coefficients by input clustering level for samples of 1000 households can be found in Supplementary Fig. S2. Effective R. For each combination of transmission probability and clustering level that we tested, we also calculated the effective R. We defined this as the expected number of secondary cases caused by our index case in a partially immune population. In Fig. 4, a heat-map indicates the effective R-as calculated from 1000 simulations per scenario-for clustering levels 0, 0.25, 0.5, 0.75 and 1 and transmission probabilities between 0.2 and 0.8 (R 0 ∼ 6.80 to 22.58). We observe that a higher clustering level results in a higher effective R for the same transmission probabilities.
The effect is stronger for lower values of P transmission , and becomes weaker as P transmission increases. This is clear when we investigate the percentage the effective R increases when increasing the clustering level from 0 to 1. An overview of the increase in effective R when increasing the clustering level from 0 to 1 can be found in Table 1. We see that, while the increase in effective R is above 60% when P transmission is 0.2 ( R 0 = 6.80 ), it decreases to about 18% for a transmission probability of 0.6 ( R 0 = 17.99).   . We see that, for all transmission probabilities, outbreaks either die out after only a few cases (bins on the far left), or persist and infect a large part of the susceptible population (bins on the far right). These histograms also show that, as the clustering level increases, outbreaks tend to become larger: outbreaks that persist with clustering level 0 (red bars) are generally smaller than outbreaks that persist with clustering level 1 (green bars). Based on these plots, we set the extinction threshold-above which we will regard an outbreak as persistent-at 5000 cases. Based on this threshold, we calculated the fraction of outbreaks that persisted in each scenario. A heat-map showing the probability of persistent outbreaks for clustering levels 0, 0.25, 0.5, 0.75, and 1 and transmission probabilities between 0.2 and 0.8 ( R 0 between 6.80 and 22.58) is shown in Fig. 5. With P transmission = 0.2, we observe no persistent outbreaks, regardless of the clustering level. However, with a transmission probability of 0.3, which corresponds to an R 0 of about 9.91, there are still no persistent outbreaks with a clustering level of 0 or 0.25, but persistent outbreaks start to appear for clustering levels 0.5, 0.75 and 1. For a transmission probability of 0.35 ( R 0 = 11.38 ), we observe persistent outbreaks for all clustering levels. However, at a clustering level of 0, only about 20% of outbreaks is persistent, while this increases to about 50% for a clustering level of 1. For larger transmission probabilities, the same trend can be observed: as the clustering level increases, so does the percentage of outbreaks that persist. The effect is stronger for lower values of P transmission , and less pronounced for higher values of P transmission . A more detailed overview of outbreak probabilities per value of P transmission can be found in Supplementary Fig. S6  Similar trends can be observed when we look at the escape probabilities for the different scenarios. We define the escape probability as shown in Eq. (2), with N susceptible the number of susceptible individuals at the beginning of the simulation, and N cases , the number of infected cases after 730 simulated days.
A heat-map for the mean escape probability over 1000 runs for each scenario (clustering levels ranging from 0 to 1 and transmission probabilities ranging from 0.2 to 0.8-R 0 of 6.80 to 22.58) is shown in Fig. 6. For a transmission probability of 0.2 and 0.25 ( R 0 = 6.80 and R 0 = 8.38 , respectively), we see that the escape probability is almost 1 on average for all clustering levels. This is what we would expect, as we observed before that there were no persistent outbreaks at this transmission probability. However, with a transmission probability of 0.3 and 0.35 ( R 0 = 9.91 and R 0 = 11.38 respectively), we see that while the escape probability is still almost one for a clustering level of 0, we observe that the escape probability lowers to about 0.9 for a clustering level of 1. For values of P transmission 0.4 ( R 0 = 12.80 ) or higher, we see that while the mean escape probability for clustering level 0 is below 1, it decreases further as the clustering level increases. This effect becomes less pronounced for P transmission 0.7 and 0.8 ( R 0 = 20.35 and R 0 = 22.58 respectively), but it can still be observed. A more detailed overview of the distribution of escape probabilities by clustering level and transmission probability can be found We also examined the sizes of the simulated outbreaks in the different scenarios. In Fig. 7, the mean outbreak size for persistent outbreaks over 1000 runs is shown for clustering levels ranging from 0 to 1 and transmission probabilities ranging from 0.2 to 0.8 ( R 0 of 6.80 to 22.58). As discussed above, we set the extinction threshold at 5000 cases.
We observe that the average outbreak size when P transmission = 0.2 ( R 0 = 6.80 ) remains 0 at all clustering levels. With a transmission probability of 0.25 ( R 0 = 8.38 ), persistent outbreaks only occur at a clustering level of 1. On average, these outbreaks only just reach the extinction threshold of 5000 cases. For a transmission probability of 0.3 ( R 0 = 9.91 ), persistent outbreaks occur for clustering levels 0.25, 0.50, 0.75 and 1. With a transmission probability above 0.35 ( R 0 ≥ 11.38 ), we observe that as the clustering level increases, the average outbreak size also increases. The increase in average outbreak size from clustering level 0 to 1 is steep for transmission probabilities 0.35 ( R 0 = 11.38 ) to 0.55 ( R 0 = 16.76 ). For higher values of P transmission , the same trend can be observed, but the increase in outbreak size when the clustering level is increased from 0 to 1 is smaller.
A more detailed representation of the distribution of outbreak sizes by clustering level and transmission probability can be seen in Supplementary Fig. S12  Again, we only looked at persistent outbreaks, and set the extinction threshold at 5000 cases. As the clustering level increases from 0 to 1, we observe that for all transmission probabilities, the outbreak sizes also increase. This is, again, more pronounced for lower values of P transmission . For a transmission probability of  www.nature.com/scientificreports/ 0.35 ( R 0 = 11.38 ), we see that the median outbreak size increases from about 10,000 to about 25,000. For this same transmission probability we notice that there is a higher variance for the outbreak size at lower clustering levels than can be observed at higher levels of clustering. The same can still be observed-albeit to a lesser extent-when we look at results for P transmission = 0.4 ( R 0 = 12.80 ). Here the median outbreak size increases from a little over 20,000 to about 35,000 as the clustering level increases from 0 to 1. For higher values of P transmission , the median outbreak size also increases as the clustering level increases. However, the variance in outbreak sizes is small for all clustering levels. We also observe that, as P transmission increases, the relative increase in outbreak size becomes smaller as the clustering level increases from 0 to 1. Finally, we also wanted to know which individuals in the population had the most risk of being infected. For each scenario, we calculated the mean age of infected individuals. As expected, there is not much difference between the different scenarios. With a higher clustering level, infected individuals tend to be slightly younger on average, but are still very close in age to infected individuals when testing a lower clustering level. A heat-map of average ages of infected individuals for transmission probabilities 0.2 to 0.8 (R 0 ∼ 6.80 to 22.58) and clustering levels 0 to 1 can be found in Supplementary Fig. S15.

Discussion
We looked at the impact of different levels of within-household clustering of susceptible individuals on the risk for measles outbreaks. We did this by including age-dependent immunity rates and household-based clustering into Stride, an individual-based simulator for the transmission of infectious diseases.
We established a measure for estimating the level of household-based clustering of susceptible individuals in a population: the household assortativity coefficient. We calculated this measure as the assortativity based on immunity status within a network of nodes-representing individuals-connected by household relationsmeaning that two nodes were connected if they belonged to the same household. We found that this measure served as a good indication of the input clustering level in our simulated populations, and remains stable for samples of the total population as well. Furthermore, we believe that this measure is valuable for future research into the effects of susceptibility clustering. For example, it would be relatively easy to measure the household assortativity coefficient of a population 'in the field' . By conducting a survey, and counting the number of immune versus susceptible individuals in a sample of households, it would be possible to construct a network similar to the one we constructed for the simulated population, and calculate the household assortativity coefficient. As such, the actual level of within-household susceptibility clustering in a population could be taken into account when estimating the efforts needed to control measles.
We also simulated outbreaks of measles by introducing one infected individual into the simulated population of Flanders, Belgium and following the evolution of the outbreak for 730 simulated days. We found that the level of within-household clustering of susceptible individuals has an impact on important measures for outbreak risk. The effective R increases as the clustering level increases. This is true for all tested transmission probabilities (0.2 to 0.8, which we calculated to be equivalent to an R 0 of about 6.80 to 22.58). However, the relative increase in effective R is higher for lower values of P transmission and is less pronounced for higher transmission probabilities.
This means that, as more clustering of susceptible individuals within households occurs in a population, an index case will on average infect more cases compared to a population with a lower household assortativity coefficient. As such, household-based clustering of susceptible individuals has an impact from the very beginning of an outbreak.
It also plays a role in the rest of the evolution of measles outbreaks. We observed that outbreaks have a higher probability of persisting when there is a higher level of household-based clustering of susceptible individuals. When we set the threshold for persistent outbreaks at 5000 cases, we saw that while there were no persistent outbreaks with a transmission probability of 0.3 ( R 0 = 9.91 ) and a clustering level of 0 and 0.25, persistent outbreaks did appear when the clustering level was at 0.5, 0.75 or 1. For higher values of P transmission , we noticed that the probability for an outbreak to persist increased as the clustering level increased.
We observed the same trend when looking at escape probabilities: as the clustering level increases, the escape probability decreases, meaning that a larger fraction of susceptible individuals will get infected over the course of an outbreak. This can also be seen when looking at the sizes of persistent outbreaks. On average, outbreaks become larger when the level of household-based clustering increases. As such, a higher level of clustering of susceptible individuals within households entails a higher risk for susceptible individuals to become infected whenever an infected individual is introduced in the population.
These effects are strongest for transmission probabilities between 0.3 and 0.6 ( R 0 between 9.91 and 17.99). For larger values of P transmission , the effects of increasing the level of household-based clustering of susceptible individuals become less pronounced, but can still be observed. An explanation for this might be that, as the transmissibility of an infection increases, this infection spreads regardless of the level of within-household clustering of susceptibility. As such, contacts between susceptible and infectious individuals within households become relatively less important for the spread of the epidemic.
We conclude that, as the level of household-based clustering of susceptible individuals has an impact on the risk for measles outbreaks, it is important to include this form of clustering of susceptibility into models for measles transmission in highly vaccinated populations, or in populations with low fertility in which an accumulation of susceptible individuals within households could easily develop 47 . Omitting this form of clustering would lead to an underestimation of the risk for outbreaks and thus of the efforts needed to inhibit the spread of measles and to achieve measles elimination. Taking into account within-household clustering, and potentially other ways in which susceptible individuals can be clustered in a population, provides new and more accurate herd immunity thresholds for measles. www.nature.com/scientificreports/ Some limitations need to be taken into account when interpreting these results. Firstly, the population we used in Stride is closed. No individuals are born or die over the course of the simulation. However, for a childhood disease such as measles, long-term simulations would benefit from having a demographic component. Further work will include expanding our individual-based models to include an open instead of a closed population.
Additionally, having a closed population leads to some limitations regarding our estimation of the relationship between R 0 and P transmission , and regarding our calculation of the effective R. As no new susceptible individuals enter the population over the course of the simulation, each epidemic will eventually die out, as susceptible individuals become depleted. As such, neither R 0 , nor the effective R satisfy the threshold property that an epidemic only dies out if R 0 or, respectively, the effective R is smaller than 1. Moreover, as we use a closed population, we chose to estimate the relationship between R 0 and the transmission probability by tracking the secondary cases caused by the index case-a randomly selected individual that is then infected-in a completely susceptible population. However, it could be argued that a randomly selected individual might have different social contact behaviours opposed to the 'typically infected individual' that R 0 is usually based on 48 .
Furthermore, the projections of age-specific immunity levels in 2020 we used as input for our simulations were partially based on a serological survey 25 . The serological survey information is based on antibody titre levels, which is a humoral immune response indicator. However, cellular immune response mechanisms-which are only partially correlated with humoral response markers-can still provide immunity even when antibody levels are low or undetectable 49 . The levels of immunity that we have used may thus represent an underestimation of the actual immunity levels against measles in Flanders.
On the other hand, even when an individual's humoral immunity levels are considered protective, exposure studies suggest that a sufficiently high infecting dose can still trigger an active infection 49 . This could further increase the importance of within-household clustering of susceptible or partially vaccinated people, as the infecting dose is likely highest for people living in the same household as an infectious individual.
In our simulations, we applied household-based clustering for individuals born since 1985. We assumed that most individuals born before that date had acquired natural immunity against measles. However, in vaccination guidelines, 1970 is used as a cut-off for natural immunity, assuming individuals born before 1970 to have natural immunity against measles 50 .
For this study, we have also made some assumptions in our model of the natural history of measles disease. We assumed that every infected individual will eventually become symptomatic, and that individuals, while symptomatic, will only contact members of their own household. We also assumed that all susceptible individuals had the same chance of becoming infected, and subsequently infectious and symptomatic. We did not take into account variation in the probability and duration of symptoms of previously vaccinated versus unvaccinated susceptible individuals 51,52 . Finally, we also assumed that the durations of the incubation period, latent period, infectious period and symptomatic period were independent from each other within each individual. Future analyses should aim to relax these assumptions.
Additionally, data from observational studies on levels of clustering of susceptible individuals within households should be obtained and used as an input to models for measles transmission. To enable the collection of data concerning susceptibility clustering as well as the inclusion of household-based susceptibility clustering in epidemiological models, we plan to elaborate on the clustering measure we proposed in this paper, and to further validate it. Furthermore, the concept of clustering can be further expanded by examining other types of clustering, including geographical clustering and within-school clustering of susceptible individuals 19 .
Finally, if the underlying reasons for non-or incomplete vaccination can be uncovered and quantified, other relevant types of clustering may be defined in future models. As these models become more realistic, they can be used more effectively to inform policy on measles elimination.

Data availability
The datasets generated and analysed during the current study are available from the corresponding author on reasonable request.