The dense social-contact networks characteristic of urban areas form a perfect fabric for fast, uncontrolled disease propagation. Current explosive trends in urbanization exacerbate the problem: it is estimated that by 2030 more than 60% of the world's population will live in cities7. This raises important questions, such as: How can an outbreak be contained before it becomes an epidemic, and what disease surveillance strategies should be implemented? Recent studies1, under the assumption of homogeneous mixing, make the case for mass vaccination in response to a smallpox outbreak. With different assumptions, it has been shown2 that mass vaccination is not required. Policymakers must trade off the risks associated with vaccinating a large population8 against the poorly understood risks of losing control of an outbreak. Addressing such specific policy questions9 requires a higher-resolution description of disease spread than that offered by the homogeneous-mixing assumption and the differential-equations approach.

Here we present a highly resolved agent-based simulation tool (EpiSims), which combines realistic estimates of population mobility, based on census and land-use data, with parameterized models for simulating the progress of a disease within a host and of transmission between hosts10. The simulation generates a large-scale, dynamic contact graph that replaces the differential equations of the classic approach. EpiSims is based on the Transportation Analysis and Simulation System (TRANSIMS) developed at Los Alamos National Laboratory, which produces estimates of social networks based on the assumption that the transportation infrastructure constrains people's choices about where and when to perform activities11. TRANSIMS creates a synthetic population endowed with demographics such as age and income, consistent with joint distributions in census data. It then estimates positions and activities of all travellers on a second-by-second basis. For more information on TRANSIMS and its availability, see Supplementary Information. The resulting social network is the best extant estimate of the physical contact patterns among large groups of people—alternative methodologies are limited to physical contacts among hundreds of people or non-physical contacts (such as e-mail or citations) among large groups.

The case study we present is a model of Portland, Oregon, USA, but the approach is broadly applicable. People, in the course of carrying out their daily activities (such as work, study or shopping), move between several locations, both exposing themselves to infectious agents within these locations and transporting those agents between locations. We capture these processes in a social contact network, which can be represented as a bipartite graph, GPL, as shown by the example in Fig. 1a. For Portland, GPL has about 1.6 million vertices, with a giant component of about 1.5 million people and 180,000 locations. The degree distribution of the people vertices in GPL, that is, the number of people QPLj who visited j different locations, is shown in Fig. 2a. It has a sharp peak near the average value of about four different locations, followed by a rapidly (exponentially) decaying tail. The degree distribution of the location vertices in GPL, shown in Fig. 2b, is very different: it gives the number of locations MPLi having i different visitors during the day, and it has a power-law tail with an exponent of about -2.8.
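Both degree distributions of GPL can be read off a list of visits. The following is a minimal sketch, assuming a hypothetical input of (person, location) visit records for one day; the function name `degree_distributions` and the toy data are illustrative and are not part of EpiSims or TRANSIMS.

```python
from collections import Counter, defaultdict

def degree_distributions(visits):
    """Return (Q, M): Q[j] counts people who visited j distinct locations,
    M[i] counts locations with i distinct visitors. Repeat visits by the
    same person to the same location count once, because the bipartite
    graph has at most one edge per (person, location) pair."""
    locations_of = defaultdict(set)  # person -> distinct locations visited
    visitors_of = defaultdict(set)   # location -> distinct visitors
    for person, location in visits:
        locations_of[person].add(location)
        visitors_of[location].add(person)
    Q = Counter(len(locs) for locs in locations_of.values())
    M = Counter(len(people) for people in visitors_of.values())
    return Q, M

# Hypothetical toy day: four people, five locations.
visits = [
    ("p1", "home1"), ("p1", "work"), ("p1", "shop"),
    ("p2", "home2"), ("p2", "work"),
    ("p3", "home2"), ("p3", "shop"), ("p3", "home2"),  # repeat visit
    ("p4", "home1"), ("p4", "school"),
]
Q, M = degree_distributions(visits)
print(sorted(Q.items()))  # [(2, 3), (3, 1)]
print(sorted(M.items()))  # [(1, 1), (2, 4)]
```

On the Portland data, plotting Q against j and M against i on logarithmic axes would correspond to Fig. 2a and Fig. 2b.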

Figure 1: An example of a small social contact network.

a, A bipartite graph GPL with two types of vertex representing four people (P) and four locations (L). If person p visited location l, there is an edge in this graph between p and l. Vertices are labelled with appropriate demographic or geographic information, edges with arrival and departure times. b, c, The two disconnected graphs GP and GL induced by connecting vertices that were separated by exactly two edges in GPL. d, The static projections ĜP and ĜL resulting from ignoring time labels in GP and GL. People (such as 24-year-old male) are represented by filled circles, and locations (such as 34 Elm Street) by open squares.

Figure 2: Degree distributions for the estimated Portland social network.

a, The number of people QPLj who visited j different locations in the bipartite people–locations graph GPL. b, The number of locations MPLi in GPL that are visited by exactly i different people. The slope of the straight-line graph is -2.8. c, The number of people who have k neighbours in the static people–contact graph ĜP on log–log scale. d, The in and out degree distributions of the locations network GL. The slope of the straight-line graph is -2.8.

For many infectious diseases, transmission occurs mainly between people who are collocated (simultaneously in the same location), and spread is due mainly to people's movement. Hence we look at two natural projections of GPL, obtained by drawing an edge between every pair of vertices at distance two from each other in the bipartite graph, as illustrated in Fig. 1b, c. The result is two disconnected graphs: GP, containing only people vertices, and GL, containing only locations. In GP, the edges are labelled with the sets of time intervals during which the people were collocated. For simplicity, however, we consider ĜP, a static projection of the time-resolved GP obtained by discarding time labels, as shown in Fig. 1d. This is reasonable for diseases such as smallpox, severe acute respiratory syndrome or influenza, in which both the incubation period and the duration of infectivity are of the order of several days, much longer than the approximately 24-h periodicity of people's contacts. This simplification introduces a systematic bias into results based on ĜP: the static projection yields a worst-case scenario of how the disease is likely to propagate, because ĜP is much more connected than GP. Any control strategy that is effective in such worst-case scenarios will also be effective in the time-resolved case. Furthermore, we have modelled the idea of an effective contact (one in which the transmission of disease is likely to occur) by removing from ĜP all edges for which the duration of contact was less than some minimum threshold, usually one hour. Even this thresholded version of ĜP is biased towards more connectivity than GP. An alternative to thresholding is to weight edges according to duration (and other factors affecting transmission). Figure 2c shows the degree distribution of ĜP for the Portland network. The other important projection of the bipartite graph is the locations network GL.
If at least one person travels from location l1 directly to location l2 during the day, the vertices corresponding to l1 and l2 are connected in GL by a directed edge from l1 to l2; the direction records that people travel out of l1 and into l2. As before, we form a static version of the locations network, ĜL, by ignoring the time labels on the edges. The in and out degree distributions for the locations network are superimposed in Fig. 2d (ref. 12). The power-law decay evident there shows that ĜL is a scale-free network6 with an exponent of γ ≈ 2.8. A simple explanation for this empirical observation is based on the capacity distribution of the locations in Portland. Land-use data indicate that the number of people using a location (its capacity) follows a power-law distribution with the same exponent γ. In large urban areas, people tend to fill locations up to their capacity. More densely filled locations (for example, shopping malls) will have a larger number of people moving in from a proportionally larger number of other locations (for example, homes), which in turn generates the scale-free character of ĜL with the capacity exponent γ.
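Both projections can be sketched from the same visit list. This minimal sketch assumes hypothetical visit records of the form (person, location, arrival hour, departure hour). Two further choices are assumptions for illustration, not a description of how EpiSims builds ĜP or ĜL: the one-hour threshold is applied to a pair's total co-location time summed over all shared locations, and each person's itinerary is obtained by sorting their visits by arrival time.

```python
from collections import defaultdict
from itertools import combinations

def people_contact_graph(visits, min_hours=1.0):
    """Thresholded static people-contact graph: join two people whose total
    co-location time is at least min_hours, then drop the time labels."""
    by_location = defaultdict(list)
    for person, loc, arrive, depart in visits:
        by_location[loc].append((person, arrive, depart))
    hours = defaultdict(float)  # unordered pair -> hours spent together
    for stays in by_location.values():
        for (p, a1, d1), (q, a2, d2) in combinations(stays, 2):
            overlap = min(d1, d2) - max(a1, a2)
            if p != q and overlap > 0:
                hours[tuple(sorted((p, q)))] += overlap
    return {pair for pair, h in hours.items() if h >= min_hours}

def locations_network(visits):
    """Static locations network: directed edge l1 -> l2 whenever some
    person's next visit after l1 is at l2. Returns edges, in/out degrees."""
    itinerary = defaultdict(list)
    for person, loc, _, _ in sorted(visits, key=lambda v: (v[0], v[2])):
        itinerary[person].append(loc)
    edges = {(a, b) for stops in itinerary.values()
             for a, b in zip(stops, stops[1:]) if a != b}
    in_deg, out_deg = defaultdict(int), defaultdict(int)
    for a, b in edges:
        out_deg[a] += 1
        in_deg[b] += 1
    return edges, dict(in_deg), dict(out_deg)

# Hypothetical toy day (hours since midnight).
visits = [
    ("alice", "home1", 0.0, 8.5), ("alice", "office", 9.0, 12.0),
    ("alice", "cafe", 12.25, 13.0), ("alice", "office", 13.25, 17.0),
    ("bob", "home2", 0.0, 11.5), ("bob", "cafe", 12.0, 13.0),
    ("bob", "office", 16.5, 18.0),
    ("carol", "home1", 0.0, 7.5), ("carol", "office", 8.0, 12.0),
]
contacts = people_contact_graph(visits)
edges, in_deg, out_deg = locations_network(visits)
print(sorted(contacts))  # [('alice', 'bob'), ('alice', 'carol')]
print(in_deg["office"], out_deg["office"])  # 2 1
```

In the toy data, alice and bob share only 0.5 h at the office and 0.75 h at the cafe, so their edge survives the one-hour threshold only because durations are summed; a per-location threshold would drop it.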

Measurements of the average clustering coefficient (see Methods) yield CP ≈ 0.48 for ĜP and CL ≈ 0.04 for ĜL, both much larger than the roughly $10^{-6}$ expected for an Erdős–Rényi random graph with the same number of vertices and average degree5. This high clustering, together with the degree distribution and the small diameter (about 6) of ĜP, suggests that the people-contact graph is more like a small-world graph5 than a random graph. The distributions of clustering coefficient versus degree13,14 shown in Supplementary Fig. 1 indicate that the locations network ĜL is an empirical example of a hierarchical scale-free structure15,16,17.

Both degree distribution and clustering are relevant to short-term propagation in a network, but longer-term dynamics will be driven by global graph properties. It is thus natural to consider estimation schemes for global topological measures, such as expansion (see Methods). Informally, the higher the expansion, the more quickly any phenomenon (such as disease, gossip or data) spreads. We estimated an expansion value of about 2 for ĜP by random sampling, indicating that the people-contact graph is extremely well connected. An immediate consequence is that, as for an assortatively mixed network18, ĜP cannot be shattered by removing (by means of vaccination or quarantine) a small number of high-degree vertices19,20,21,22. To verify this, we have computed the size of the giant component (the maximum number of people at risk for a disease introduced by a single person) when all vertices of degree k or higher are removed. A unique giant component persists even when all vertices of degree 11 and higher are removed, as shown in Fig. 3a. Thus, attempting to shatter the contact graph by vaccinating the most gregarious people in a population would essentially be equivalent to mass vaccination. Similarly, we show in Fig. 3b, c that closing the most-visited locations (or vaccinating everyone who visits them) does not shatter the induced people-contact graph until large fractions of the population have been affected. Other infrastructure networks exhibit very different shattering properties.
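The shattering test behind Fig. 3a can be sketched as a breadth-first search restricted to the vertices that survive removal. This is a minimal illustration, assuming the contact graph is given as an adjacency dictionary and that degrees are measured before any vertex is removed; the helper names and the toy star and ring graphs are hypothetical.

```python
from collections import deque

def undirected(edge_list):
    """Build an adjacency dictionary from an undirected edge list."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def largest_component_after_removal(adj, k):
    """Size of the largest connected component once every vertex whose
    original degree is k or higher has been removed."""
    kept = {v for v, nbrs in adj.items() if len(nbrs) < k}
    seen, best = set(), 0
    for start in kept:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:  # breadth-first search over surviving vertices only
            v = queue.popleft()
            size += 1
            for w in adj[v]:
                if w in kept and w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best

star = undirected([("hub", leaf) for leaf in "abcd"])
ring = undirected([(i, (i + 1) % 6) for i in range(6)])
print(largest_component_after_removal(star, 2))  # 1
print(largest_component_after_removal(ring, 3))  # 6
```

Removing its single hub shatters the star, whereas the ring, whose vertices all have the same low degree, keeps its whole component. The result reported for ĜP, where a giant component persists after all vertices of degree 11 and higher are removed, is closer to the second behaviour.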

Figure 3: Shattering and covering the people–contact graph.

a, All people with degree k or higher are removed (by vaccination or quarantine) from the bipartite graph GPL. b, c, All locations with degree k or higher are removed from GPL, and the size of the largest connected component in the static people–contact graph induced by the remaining bipartite graph is monitored. d, Overlap ratios by degree. The lower curve shows the cumulative overlap ratio by degree, which is the overlap ratio for locations having degree k or less. The upper curve shows the overlap ratio for locations having degree exactly k.

Can epidemics be stopped without resorting to mass vaccination? Alternatives rely on early detection and efficient targeting. Here we introduce the overlap ratio, another non-local property of the graph that is crucial to early detection. Consider an idealized situation in which sensors at a location can detect whether any person there is infected. The feasibility of early detection depends on the number of sensors required to cover the population. This problem is equivalent to finding the minimum dominating set23. That is, we wish to find a subset L′ of locations such that all (or most) people visit some location in L′. The overlap ratio23 of a set of locations $J \subseteq L$ is $\omega(J) = n(J) / \sum_{l \in J} \deg(l)$, where n(J) is the total number of people visiting any location in J, and deg(l) is the number of different people visiting location $l \in J$. Thus ω(J) = 1 exactly when no person visits more than one location in J, and the smaller the overlap ratio, the larger is the number of different locations in J visited by a single person. The overlap ratio by degree, shown in Fig. 3d, is the overlap ratio for the set Jk of locations having degree k. Clearly, not many people visit more than one high-degree location, which implies that the high-degree location vertices form a near-optimal dominating set. With high probability, early identification could be accomplished by using sensors placed at locations with the highest degree.
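The overlap ratio can be computed directly from the visitor sets of the locations. The sketch below assumes a hypothetical mapping from each location to the set of distinct people who visit it; the function names and toy data are illustrative.

```python
def overlap_ratio(visitors_of, J):
    """omega(J) = n(J) / sum of deg(l) over l in J, where n(J) counts the
    distinct people visiting any location in J and deg(l) is the number of
    distinct visitors of location l."""
    if not J:
        raise ValueError("J must contain at least one location")
    covered = set().union(*(visitors_of[l] for l in J))
    return len(covered) / sum(len(visitors_of[l]) for l in J)

def locations_of_degree(visitors_of, k):
    """The set J_k: locations with exactly k distinct visitors."""
    return [l for l, people in visitors_of.items() if len(people) == k]

# Hypothetical visitor sets (people identified by integers).
visitors_of = {
    "mall": {1, 2, 3, 4},
    "station": {3, 4, 5, 6},
    "school": {7, 8},
    "cafe": {1, 7},
}
print(overlap_ratio(visitors_of, ["mall", "school"]))  # 1.0 (no one visits both)
print(overlap_ratio(visitors_of, locations_of_degree(visitors_of, 4)))  # 0.75
```

Here people 3 and 4 are counted in the degrees of both the mall and the station, so the six distinct people in J4 are spread over a total degree of 8.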

Alternatives to mass vaccination involve isolating and/or vaccinating small subsets of individuals to ensure that the disease will spread only locally in the graph. Most such strategies assume that people contacted by infectious people are the best candidates for vaccination or quarantine. However, it might also be possible to identify a good subset of the population to target before an outbreak. GP is composed of tight-knit communities joined by long-range edges. A model for this structure is given by adding long-range edges to random geometric graphs24. An infectious individual (even one with low degree) who travels can nucleate two independent growth centres. Other long-range travellers near these centres can in turn nucleate an exponentially growing number of growth centres, as demonstrated by the rapid worldwide spread of severe acute respiratory syndrome. Thus, targeting long-distance travellers (say, across town for urban regions) is a crucial component of any response.
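The model structure mentioned above, a random geometric graph with added long-range edges, can be generated in a few lines. This is only a sketch: the unit-square layout, the connection radius and the number of shortcuts are hypothetical parameters chosen for illustration, not values fitted to Portland.

```python
import math
import random

def geometric_graph_with_shortcuts(n, radius, n_long, seed=0):
    """Random geometric graph on the unit square (points within `radius`
    of each other are joined) plus n_long uniformly random long-range
    edges. n_long must not exceed the number of non-local pairs."""
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    # Local edges: the tight-knit, spatially clustered communities.
    local = {(i, j) for i in range(n) for j in range(i + 1, n)
             if math.dist(pts[i], pts[j]) <= radius}
    # Long-range edges: travellers linking distant parts of the graph.
    shortcuts = set()
    while len(shortcuts) < n_long:
        i, j = sorted(rng.sample(range(n), 2))
        if (i, j) not in local:
            shortcuts.add((i, j))
    return pts, local, shortcuts

pts, local, shortcuts = geometric_graph_with_shortcuts(300, 0.07, 15)
print(len(local), len(shortcuts))
```

An infection seeded in one neighbourhood spreads only locally along the geometric edges, while each shortcut endpoint can seed a new, independent growth centre elsewhere in the square.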

Our results on the expansion property indicate that disease is likely to spread quickly if not controlled early enough. However, exactly how the number of casualties depends on response delay and what constitutes ‘early enough’ depend on disease-specific factors such as incubation period and probability of transmission, as well as scenario-specific factors such as the means of introduction. Because these dependences cannot be easily determined from an analysis of the static social network, we turn to simulation, which captures the full time-dependence of GPL (see Supplementary Methods for details).

There is not yet a consensus on models of smallpox. We have designed a model that captures many features on which there is widespread agreement9,25 and allows us to vary poorly understood properties through reasonable ranges1,2,26. Our model includes the following features: the incubation period is a Gaussian truncated at 7 and 17 days, with a 12-day mean and a 2-day standard deviation; the prodromal period is 3–5 days; the infectious period is 4 days, during which infectivity decreases exponentially; and in 30% of normal cases, death occurs 10–16 days after the rash develops. Ninety-five per cent of susceptibles exposed for 3 h to a person at minimum infectivity will become infected (the remaining 5% have extremely high or low susceptibilities, mimicking some anecdotal transmission incidents). Vaccination is assumed to be 100% effective before exposure, and it reduces the mortality and transmission rates when administered up to 2 days after exposure (or 4 days for a previously vaccinated person, assumed to be everyone over the age of 30 years). The model also includes haemorrhagic variants with a shorter incubation period that are ten times as infectious and invariably fatal (see Methods and Supplementary Information for details).
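One simple way to sample the truncated Gaussian incubation period is rejection: draw from the untruncated Gaussian and redraw until the value falls inside [7, 17] days. The sketch below assumes that approach; the function name is hypothetical and EpiSims may sample this distribution differently.

```python
import random

def sample_incubation_days(rng, mean=12.0, sd=2.0, low=7.0, high=17.0):
    """Incubation period from a Gaussian truncated to [low, high] days,
    drawn by rejection sampling (redraw until the value is in range)."""
    while True:
        x = rng.gauss(mean, sd)
        if low <= x <= high:
            return x

rng = random.Random(42)
samples = [sample_incubation_days(rng) for _ in range(10_000)]
print(min(samples) >= 7.0, max(samples) <= 17.0)  # True True
print(sum(samples) / len(samples))  # close to 12
```

Because the truncation points lie symmetrically 2.5 standard deviations either side of the mean, the truncated distribution keeps its 12-day mean, and rejection discards only about 1% of draws.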

Note that EpiSims does not specify a value for R0, the basic reproductive number27. This parameter reflects how many people in a susceptible population are directly infected by the introduction of a single infective. R0 is a convolution of transmission rates and contact patterns, and EpiSims performs the convolution for us. The implied value of R0 is the ratio of the number of people in the first cohort (those infected directly by the initially infected people) to the number in the original cohort (those initially infected). These estimates include the effects of the simulated response strategy. For the set of experiments reported below, R0 ranges from 0.4 to 3.4.
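Counting the first cohort requires only the infector of each infected person. The sketch below assumes a hypothetical record `infector_of`, mapping each person infected during the simulation to the person who infected them; later generations are ignored, as the definition requires.

```python
def implied_r0(infector_of, initial):
    """Implied R0: size of the first cohort (people infected directly by the
    initially infected) divided by the size of the original cohort."""
    initial = set(initial)
    first_cohort = sum(1 for src in infector_of.values() if src in initial)
    return first_cohort / len(initial)

initial = ["a", "b", "c", "d"]
# e, f, g and i were infected directly by initial cases; h and j are
# later generations and do not count towards the first cohort.
infector_of = {"e": "a", "f": "a", "g": "b", "h": "e", "i": "c", "j": "h"}
print(implied_r0(infector_of, initial))  # 1.0
```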

In these scenarios, aerosolized smallpox was distributed indoors at busy locations over several hours, infecting of the order of 1,000 people. We assumed that the presence of smallpox was detected on the tenth day after the attack. Furthermore, we assumed that cases could be recognized in the prodromal phase after this date. We did not consider the confounding background distribution of influenza-like symptoms.

We studied the sensitivity of the number of casualties to three factors: mitigation efforts, delay in implementing mitigation efforts, and whether people move about while infectious. We simulated a passive (do nothing) ‘baseline’ and three active responses: mass vaccination covering 100% of the population in 4 days (‘mass’); targeted vaccination and quarantine with unlimited resources (‘targeted’); and the same targeted response, using only half as many contact tracers and vaccinators (‘limited’).

For a movie showing the spatial spread of disease under two different response strategies, see Supplementary Information. Figure 4 compares the efficacy of these strategies. For each strategy we plot (on a logarithmic scale) the ratio of the cumulative number of deaths by day 100 to the number initially infected. The absolute numbers are less important than the rank and relative sizes of gaps between the points. Also shown are the effects of delays of 4, 7 or 10 days in implementing the response. For each of the responses including the baseline, we allowed infected people to isolate themselves by withdrawing to the home. This could be due either to the natural history of the disease, which incapacitates its victims, or to actions taken by public health officials encouraging people to stay home. The results are grouped according to time of withdrawal to the home: (1) early, in which everyone withdraws before becoming infectious, producing the lowest estimates for R0; (2) late, in which everyone withdraws about 24 h after becoming infectious; and (3) never, in which everyone carries on their daily activities unless they die. The extreme cases are unrealistic but are shown here because they demonstrate the existence of a clear transition.

Figure 4

Cumulative number of deaths per number of initial infected, for the case of a smallpox outbreak in downtown Portland, under a number of different response strategies: squares, no vaccine; stars, 10-day delay; multiplication signs, 7-day delay; plus signs, 4-day delay.

In this study, time of withdrawal to the home is by far the most important factor, followed by delay in response. This indicates that targeted vaccination is feasible when combined with fast detection. Ironically, the actual strategy used is much less important than either of these factors.

Methods

Clustering coefficient

Here we used the definition of the clustering coefficient, $c_i = 2n_i / [k_i(k_i - 1)]$, given in ref. 5, which measures the extent to which neighbours of a vertex are connected by edges ($n_i$ is the number of connections between the neighbours of vertex i, and $k_i$ is the degree of i). The average clustering coefficient is the mean of $c_i$ over all vertices. Clustering has important implications for the rate and probability of disease spread4,28.
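The definition translates directly into code. This sketch assumes an undirected graph stored as an adjacency dictionary and takes c_i = 0 for vertices with fewer than two neighbours, an assumed convention since the formula is undefined there. For comparison, the expected clustering of an Erdős–Rényi graph is roughly its average degree divided by its number of vertices, which is why the random-graph baseline quoted in the main text is so small.

```python
from itertools import combinations

def clustering(adj, i):
    """c_i = 2 n_i / [k_i (k_i - 1)], with n_i the number of edges among
    the neighbours of i and k_i the degree of i."""
    nbrs = adj[i]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention for vertices of degree 0 or 1
    n_links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2 * n_links / (k * (k - 1))

def average_clustering(adj):
    """Mean of c_i over all vertices."""
    return sum(clustering(adj, i) for i in adj) / len(adj)

# Triangle 0-1-2 with a pendant vertex 3 attached to 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(clustering(adj, 0))       # 0.3333333333333333
print(average_clustering(adj))  # 7/12, about 0.583
```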

Graph expansion

The vertex expansion of a set $P' \subseteq P$ is the ratio between the number of distinct vertices not in P′ reached through edges emanating from P′ and the number of vertices in P′, denoted by |P′|. Because at most $(1 - \alpha)|P|$ vertices lie outside a set P′ of size $|P'| = \alpha|P|$, its vertex expansion is bounded above by $\alpha^{-1} - 1$. By definition, the vertex expansion of the graph GP is the minimum of the vertex expansions of all sets P′ with $|P'| \le N_P/2$, where $N_P = |P|$ is the number of people vertices.
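A random-sampling estimate of the expansion can be sketched as below. It assumes uniformly random subsets P′ with |P′| ≤ NP/2; the sampling scheme used for the Portland estimate is not specified here, so this is a hypothetical choice. Because only some subsets are examined, the sampled minimum is an upper bound on the true vertex expansion.

```python
import random

def vertex_expansion(adj, subset):
    """Distinct vertices outside `subset` reached by its edges, divided
    by the size of `subset`."""
    subset = set(subset)
    reached = {w for v in subset for w in adj[v]} - subset
    return len(reached) / len(subset)

def sampled_expansion(adj, n_samples=1000, seed=0):
    """Minimum expansion over uniformly random subsets of size at most N/2
    (the graph needs at least two vertices). Sets that are never sampled
    cannot lower the result, so this is an upper bound."""
    rng = random.Random(seed)
    vertices = list(adj)
    best = float("inf")
    for _ in range(n_samples):
        size = rng.randint(1, len(vertices) // 2)
        best = min(best, vertex_expansion(adj, rng.sample(vertices, size)))
    return best

ring = {i: {(i - 1) % 8, (i + 1) % 8} for i in range(8)}
complete = {i: set(range(6)) - {i} for i in range(6)}
print(vertex_expansion(ring, [0, 1, 2, 3]))  # 0.5
print(sampled_expansion(complete))           # 1.0
```

For the 8-vertex ring, the worst set is a contiguous arc of four vertices (expansion 0.5), which uniformly random subsets rarely hit, so the sampled value can overestimate the true expansion; in the complete graph on six vertices, every three-vertex set attains the minimum of 1.0, which matches the bound $\alpha^{-1} - 1$ at $\alpha = 1/2$.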

Haemorrhagic variants

In our model, haemorrhagic variants occur in 20% of pregnant women, 10% of HIV-positive people, and 2.4% of the population at large. Of those, 30% get an ‘early haemorrhagic’ variant, with a prodromal period between 0.5 and 1.5 days and an infectious period of 1 day (until death). Pregnancy and HIV status are assigned on the basis of demographics.
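The variant assignment can be sketched as a two-stage random draw: first whether the case is haemorrhagic at all, then whether it is the early form. Two points are assumptions not settled by the text: that 2.4% is the rate for everyone who is neither pregnant nor HIV-positive, and that a person in both risk groups receives the higher (20%) rate. The function name is hypothetical.

```python
import random

def assign_variant(rng, pregnant=False, hiv_positive=False):
    """Return 'ordinary', 'haemorrhagic' or 'early haemorrhagic'."""
    rates = [0.024]               # assumed baseline for everyone else
    if pregnant:
        rates.append(0.20)
    if hiv_positive:
        rates.append(0.10)
    if rng.random() >= max(rates):  # assumed: highest applicable rate wins
        return "ordinary"
    # 30% of haemorrhagic cases are the early form.
    return "early haemorrhagic" if rng.random() < 0.30 else "haemorrhagic"

rng = random.Random(7)
draws = [assign_variant(rng, pregnant=True) for _ in range(50_000)]
haem = [d for d in draws if d != "ordinary"]
print(len(haem) / len(draws))                        # roughly 0.20
print(haem.count("early haemorrhagic") / len(haem))  # roughly 0.30
```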

Simulation protocol

The protocol we simulated for targeted strategies was to place each prodromal person on a list. Contact tracers chose people at random from the list as they became available. We allowed 24 h for each contact tracer to vaccinate everyone living at the infected person's home and to travel to each location that the infected person visited, vaccinating a fraction of the people there who had been present when the infected person was there. We varied the fraction vaccinated according to the type of activity, from zero at a shopping location to unity at work and home. After 24 h the contact tracer was freed to service the list again. The infected person was sent to a quarantine location. In the ‘targeted’ case, roughly 20 people were vaccinated for each person initially infected. The peak rate was 10 people per day per initial victim, or roughly 10,000 people per day.
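A simplified, day-stepped version of this queue is sketched below. It keeps only the list mechanics (free tracers pick a random person from the list and spend 24 h on that case, so every tracer is free again the next day) and omits the vaccination at homes and visited locations; the function name and input format are hypothetical.

```python
import random

def run_tracing(new_prodromal_by_day, n_tracers, seed=0):
    """new_prodromal_by_day[d] lists people who join the list on day d.
    Returns a dict mapping each person to the day a tracer took the case."""
    rng = random.Random(seed)
    waiting, serviced_on = [], {}
    day = 0
    while day < len(new_prodromal_by_day) or waiting:
        if day < len(new_prodromal_by_day):
            waiting.extend(new_prodromal_by_day[day])
        # Each free tracer picks one person at random from the list.
        for _ in range(min(n_tracers, len(waiting))):
            person = waiting.pop(rng.randrange(len(waiting)))
            serviced_on[person] = day
        day += 1  # a case takes 24 h, after which the tracer is free again
    return serviced_on

arrivals = [["a", "b", "c"], ["d"], []]
full = run_tracing(arrivals, n_tracers=2)
limited = run_tracing(arrivals, n_tracers=1)
print(sorted(full.values()))     # [0, 0, 1, 1]
print(sorted(limited.values()))  # [0, 1, 2, 3]
```

Halving the number of tracers, as in the ‘limited’ response, stretches the same four cases over four days instead of two.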