Introduction

Social interactions among humans form complex networks. While these interactions have recently begun to occur via many different channels – email, social networks, texts, and calls1 – the interactions mediated through physical proximity remain a fundamental way for people to connect2. A common way to quantify the nature of a link is to consider repeated interactions: frequently occurring interactions indicate strong ties, such as friendships, while ties with small weights can indicate random encounters. Here we focus on a different dimension: rather than the strength of links, we study the physical distance between individuals when a link is activated. Using epidemics as an example application, we show that changing our definition of what constitutes a social tie, based on the distance between pairs of individuals, leads to strong structural differences in the resulting networks, and we quantify those differences.

The findings presented here are based on a dataset of proximity events in a population of approximately 500 students at the Technical University of Denmark3. These students are densely interconnected via networks of interactions, both virtual (Facebook, calls, texts) and based on physical proximity (both within the university campus and outside). The full dataset – known as the Copenhagen Networks Study – contains two years of high-resolution records of students’ activity (the aforementioned networks along with GPS location and questionnaires), collected primarily through smartphones distributed to students at the beginning of their university education. Here, we explore the dynamic network where every person is represented by a node, and two nodes are connected if they are within a certain physical distance d of each other. While this network is small from the perspective of population-level epidemiological studies, the access to physical proximity sampled at the 5-minute level provides a very detailed view of possible empirical spreading paths (see Table 1 for details).

To quantify the impact of physical proximity on the dynamic network, we use simulated epidemic spreading processes on two distinct networks of physical proximity. We consider the short-range network, consisting of interactions with $$d\lesssim 1$$ meter, and the long-range network, which includes all interactions with $$d\lesssim 10$$ meters4. Below we show that the short-range and long-range networks are fundamentally different in terms of structure and dynamics.

The key novelty of this work arises from the fact that we are able to explore dynamics of two distinct types of spreading mechanisms (in many ways similar to e.g. droplet vs. airborne spreading mechanisms) based on the same underlying empirical behavioral data. Because we are able to consider two fundamentally distinct networks arising from a single underlying dataset, we can be certain that the differences in infection patterns are related solely to differences in how the disease is able to spread on each of the networks. This implies that differences in spreading patterns are not due to other differences in behavior that one might encounter when comparing two disparate datasets of actual human behavior, such as mobility, culture, population density, demographics, etc. Similarly, having both short- and long-range networks directly observable allows us to sidestep creation of synthetic networks via randomization schemes.

In the literature on physical proximity, the tacit rules of human interactions in physical space have been an object of interest since the 1950s5,6,7. Yet little is known about how the structure of person-to-person proximity networks changes as we vary the distance between two individuals that is taken to define a connection between them. Previous research into proximity networks has been based on self-reported data6,8,9 or tightly-controlled laboratory observation5.

We expect the social network of individuals to be closely related to the structure of the short-range network, but with some differences. This similarity arises because, in social networks, the difference between friend and stranger is typically expressed via different personal spaces for each social category6. Interactions with individuals with whom we are not familiar tend to occur at larger distances (we use the term ‘interaction’ for all proximity events, including those in the long-range network). Since people function in bounded spaces, however, we do not have complete freedom to only allow friends to be physically close to us. Rides on buses, random meetings in elevators, or busy dining halls force us to be in close proximity to strangers. Thus, while the majority of our proximity interactions are with friends and family, our interaction network is not fully explained by the underlying social network, as expressed by, for example, link strengths. The long-range network contains all of the links in the short-range network, but in addition also contains spurious connections to people passing by and to the ‘familiar strangers’10,11, those individuals we encounter repeatedly but have never gotten to know. Thus, considering the proximity of pairs engaging in interactions, and moving beyond simply considering the weight of the links in the network, provides a new source of information regarding potential spreading paths.

From the network science literature we know that social networks exhibit non-trivial structure on every level, from degree distributions12,13, over motifs14,15, to communities16,17,18, and at times even an overall hierarchical organization19. In light of the research on physical proximity discussed above, it is interesting to keep these key findings from the social networks literature in mind as we explore the differences between the short-range and long-range networks.

Results

The proximity networks are based on Bluetooth scans providing a measure of pairwise proximity between N = 464 highly-connected participants – freshmen students at a large university3. We define an interaction between users i, j in a 5-minute timebin t (the smartphones were configured to scan for nearby devices every 5 minutes) as γijt = s, where the signal strength s is reported by the handsets as the received signal strength indicator (RSSI). RSSI, measured in dBm, is defined as the observed signal power relative to 1 mW. Two users are considered to be interacting within a given timebin if their phones registered each other at least once in that timebin, regardless of the reported signal strength. This densely-connected dynamic network of all Bluetooth interactions is based on a total of 1,472,094 interactions, taking place over 28 days.

The long-range, sampled long-range, and short-range networks

The long-range network is created by interactions occurring at any distance covered by Bluetooth range, between 0 and 10–15 meters. In order to capture only close-range interactions, we establish the short-range network by selecting the subset of interactions with γijt ≥ −75 dBm, corresponding to distances of approximately 1 meter or less4 (see Supplementary Information for additional details on the choice of threshold). The short-range network consists of f = 18.3% of all interactions.
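The thresholding step can be illustrated with a minimal sketch. The record layout (`Interaction`), the function name `short_range`, and the toy values are assumptions made for illustration only; the −75 dBm threshold is the one stated above.

```python
from typing import NamedTuple


class Interaction(NamedTuple):
    """Hypothetical record layout for one proximity measurement."""
    i: int        # id of the first participant
    j: int        # id of the second participant
    t: int        # index of the 5-minute timebin
    rssi: float   # received signal strength in dBm


SHORT_RANGE_THRESHOLD_DBM = -75.0  # threshold quoted in the text


def short_range(interactions, threshold=SHORT_RANGE_THRESHOLD_DBM):
    """Keep only interactions whose RSSI is at or above the threshold."""
    return [x for x in interactions if x.rssi >= threshold]


# Toy data (invented), not taken from the study.
toy = [
    Interaction(0, 1, 0, -60.0),
    Interaction(0, 2, 0, -90.0),
    Interaction(1, 2, 1, -75.0),
    Interaction(0, 1, 1, -80.0),
]
close = short_range(toy)
fraction = len(close) / len(toy)  # plays the role of f in the text
```

On the real data the analogous fraction is the f = 18.3% reported above; the toy fraction has no empirical meaning.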

Since the short-range network contains only a fraction of all interactions, the simulated spreading processes taking place on this network are trivially slower and smaller than processes occurring on the long-range network. The intuitive reason for this is that, with roughly one fifth of the interactions on average, a node in the short-range network has correspondingly fewer opportunities to spread a disease than in the long-range network. The difference in the number of interactions therefore prevents us from directly comparing the interplay between structure and dynamics of spreading processes for the short-range and long-range networks using simulated disease models with the same parameters.

In order to be able to compare directly, we create a sampled long-range network, which contains the same fraction of interactions as the short-range network, but chosen at random among all interactions (see Fig. 1a). As we argue below, the sampled long-range network thus contains both close and distant interactions and shares most topological properties with the full long-range network, while being based on precisely the same number of interactions as the short-range network.
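A minimal sketch of such a down-sampling step, assuming that interactions are drawn uniformly at random without replacement so that the sampled network has exactly as many interactions as the short-range network. The function name `sample_interactions` and the `(i, j, t)` tuple layout are illustrative assumptions.

```python
import random


def sample_interactions(interactions, n_keep, seed=None):
    """Draw n_keep interactions uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(list(interactions), n_keep)


# Toy data (invented): 100 interactions between consecutive node pairs.
toy = [(k, k + 1, t) for k in range(10) for t in range(10)]
n_short = 18  # stand-in for the number of short-range interactions
sampled = sample_interactions(toy, n_short, seed=1)
```

Repeating the call with different seeds gives independent realizations of the sampled network, which is how averages over realizations can be formed.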

Link weights in the three networks

We start our analysis by studying similarities and differences in the distribution of link weights between the three networks (long-range, sampled long-range, and short-range). For each of the networks, we calculate the weights as described below, using the long-range network as an example. We first create an adjacency matrix $${A}_{i\times j\times t}$$ with timebins t containing interactions aggregated over 5-minute intervals corresponding to the Bluetooth scanning rate. This matrix has entries aijt = 1 when an interaction is present and aijt = 0 otherwise. The weight wij of a link connecting two individuals is defined as the total number of interactions occurring on that link, $${w}_{ij}={\sum }_{t}\,{a}_{ijt}$$. Note that because the sampled long-range network is generated by sampling interactions at random from the full network, it is possible to calculate the weight distribution for this network analytically.
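The weight definition can be sketched in a few lines. The input format (an iterable of `(i, j, t)` tuples) and the function name `link_weights` are assumptions for illustration; duplicate records within a timebin are collapsed, so each entry behaves like a binary aijt.

```python
from collections import Counter


def link_weights(interactions):
    """Return {(i, j): w_ij} with i < j, counting timebins with an interaction."""
    # Canonical ordering makes the pair undirected; the set enforces a_ijt in {0, 1}.
    seen = {(min(i, j), max(i, j), t) for i, j, t in interactions}
    return Counter((i, j) for i, j, _ in seen)


# Toy data (invented): the pair (0, 1) is observed in both directions in bin 0.
toy = [(0, 1, 0), (1, 0, 0), (0, 1, 1), (1, 2, 1)]
weights = link_weights(toy)
```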

We use a number of closely related (but distinct) terms to describe connections between pairs of individuals. A quick overview of the terms follows. Interaction: A single measurement of proximity between a pair of individuals. Signal strength: The RSSI measured by a smartphone for a single interaction. The signal strength can be considered a measure of distance. Link: An abstract description of the connection between two individuals; a link implies at least one interaction. Links are sometimes denoted ties or connections in the literature. Weight: The number of interactions observed on a given link; sometimes called strength in the literature.

Now, considering high-weight links, we find that in the short-range network these links are relatively unaffected by removing interactions according to physical distance: the highest-weight links typically maintain ~80% of their interactions. This is in stark contrast to the sampled long-range network, where link weight is depleted in proportion to the sampling fraction, and high-weight links maintain only ~18% of the interactions from the full long-range network.
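The proportional depletion in the sampled network follows from a short calculation. Assume, for illustration, that the sampled network keeps n of the N long-range interactions uniformly at random without replacement, so that f = n/N (the symbols n, N and w′ are introduced here and do not appear elsewhere in the text). A link carrying w interactions in the full network then retains a hypergeometrically distributed number w′ of them:

$$P({w}^{{\prime} }=k)=\frac{\binom{w}{k}\binom{N-w}{n-k}}{\binom{N}{n}},\qquad E[{w}^{{\prime} }]=f\,w.$$

For N much larger than w this is well approximated by a binomial distribution with parameters w and f, so that a link disappears entirely with probability of approximately $${(1-f)}^{w}$$. Under these assumptions, high-weight links keep a fraction close to f of their interactions, while links with w of order one are lost with high probability.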

In summary, the weight distribution in the short-range network suggests that friends (with high-weight links) tend to be physically close and that most low-weight links correspond to random encounters (encounters between strangers), consistent with results on interaction distance from both quantitative measurements4 as well as sociology6.

Differences in local structure

The key comparison is between the short-range network and the two long-range networks. Since our sampling is uniform over interactions, we expect the sampled long-range network to be structurally very similar to the full long-range network, with weights decreased in proportion to the down-sampling fraction. As we discuss above, however, many low-weight links disappear as part of the sampling process, and the overall network structure is complex, reflecting non-trivial and highly correlated underlying social behaviors. Therefore, it is useful to quantitatively confirm that the structures of the long-range and sampled long-range networks remain remarkably similar – and distinct from that of the short-range network.

Starting from the single-node perspective, we find important differences between the short-range and the long-range networks. We can quantify this difference using the Shannon entropy. For a node i, we start from a link with neighbor j with weight wij and define $$\pi ({w}_{ij})={w}_{ij}/{\sum }_{k}\,{w}_{ik}$$ to mean the fraction of the node’s total interactions taking place on that link. Now, we define the node entropy as $$S(i)=-\,{\sum }_{j}\,\pi ({w}_{ij})\,{\mathrm{log}}_{2}\,\pi ({w}_{ij})$$. Since infection probability is approximately proportional to link weight (see SI), this quantity can be interpreted as the expected number of yes/no questions needed to establish which of i’s links caused an infection. The distribution of entropy for all three networks is plotted in Fig. 2a. For the short-range network (blue), the distribution peaks at 4 bits, corresponding to an effective group of $${2}^{4}=16$$ potential sources of infection. Comparing the long-range (green) and sampled long-range (orange) networks, we find, as expected, that the distributions of node entropies are very similar, emphasizing the structural similarity between these two networks. The distribution for the sampled long-range network is created by averaging per-user entropy values over 100 random realizations of the sampled long-range network. Both peak at around 6 bits, corresponding to a larger effective group of $${2}^{6}=64$$ potential sources of infection in these networks.
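As a concrete illustration of the entropy definition, the following sketch computes S(i) from the list of weights on a node's links. The function name `node_entropy` and the toy weights are assumptions for illustration.

```python
import math


def node_entropy(weights):
    """Shannon entropy (bits) of the distribution of a node's link weights."""
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log2(p) for p in probs)


# A node whose interactions are spread evenly over 16 links has S = 4 bits.
uniform_16 = node_entropy([1] * 16)
effective_group = 2 ** uniform_16

# A node with one dominant link has lower entropy than the uniform case.
skewed = node_entropy([100, 1, 1, 1])
```

The uniform case reproduces the correspondence between 4 bits and an effective group of 16 links used in the text; skewed weight distributions give smaller values for the same number of links.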

These results provide a striking illustration of how the close proximity zone is preferentially reserved for strong ties (e.g. friends or acquaintances) while the distant zone is a more public space where many more random interactions happen, resulting in a correlation between physical proximity and tie strength as reported in ref. 9.

Meso-level structural differences

In the previous section we showed that in the short-range network a large fraction of interactions takes place on high-weight links. We now study the interplay between meso-level network structure and link-weight in the short-range and long-range networks. Specifically we are interested in the structures formed by the highest weight links. To explore these, we start building the networks from empty, adding their respective strongest links one-by-one. As links are added, we keep track of the number of connected components in the network as well as total weight of interactions added through the links, revealing the differences in the networks with respect to the structures created by the heaviest links.
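The link-by-link construction can be sketched with a union-find structure. This is an illustrative reading of the procedure rather than the exact implementation used for the figures: in particular, it assumes that components are counted only among nodes that already have at least one link, and the function name `component_growth` is hypothetical.

```python
def component_growth(weights):
    """Add links from heaviest to lightest, tracking components and weight.

    weights: dict {(i, j): w}. Returns a list of tuples
    (links_added, n_components, total_weight_added).
    """
    parent = {}

    def find(x):
        # Find the root of x with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    n_components = 0
    total = 0
    history = []
    ordered = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    for n_links, ((i, j), w) in enumerate(ordered, start=1):
        for node in (i, j):
            if node not in parent:      # first link touching this node
                parent[node] = node
                n_components += 1
        ri, rj = find(i), find(j)
        if ri != rj:                    # the link merges two components
            parent[ri] = rj
            n_components -= 1
        total += w
        history.append((n_links, n_components, total))
    return history


# Toy weights (invented): two strong pairs, later bridged by a weaker link.
toy = {(0, 1): 10, (2, 3): 8, (1, 2): 5, (4, 5): 3}
history = component_growth(toy)
```

In this toy example the component count rises while isolated strong pairs are added and drops when a link bridges two existing groups, which is the kind of rise and fall tracked in the analysis.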

Figure 2b illustrates how the process of adding links gradually grows the long-range and short-range networks, respectively. In the lower panel of Fig. 2b we show the number of connected components and the total number of interactions in the networks as the links are added. First, notice that the full and sampled long-range networks display identical behavior, with the number of components peaking when approximately the 120 strongest links have been added. This behavior is consistent across 100 random realizations of the sampled network. This is in contrast to the short-range network, where the number of components continues to grow until approximately the 240 heaviest links have been added.

Our analysis shows, therefore, that the short-range network not only contains fewer links than the sampled long-range network, but that the configuration of the heaviest links is more fragmented than in the long-range case. This structural property of the short-range network, with highly-connected neighborhoods bridged by weak ties, is consistent with well-known structures found in other social networks, such as mobile phone networks and online social networks17,20,21,22. In the long-range network, however, this structure is less pronounced, obscured by the presence of spurious links: distinct communities are bridged by a small number of strong links that are not present in the short-range network.

Spreading process is captured in neighborhoods

Having investigated differences between short- and long-range networks with respect to structure, we now explore the differences in how diseases spread on the networks. Using a simple Susceptible-Infected-Recovered (SIR) model, we run simulations of a disease spreading across the networks. Our model is intentionally simplistic, intended to illustrate the structural differences between short- and full-range transmission, rather than emulate a specific disease. We use the actual temporal sequence of proximity interactions observed in the data, choosing parameter values to create a situation where large outbreaks are likely, but not guaranteed (see Methods for details of the epidemic modeling). While we report results for a specific choice of parameters and a single realization of the sampled long-range network, these results are robust across a wide range of values of the transmission parameters and realizations of the sampled network.
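To make the idea of running an SIR process on the observed temporal sequence concrete, here is a deliberately simple sketch. It is not the model specified in the Methods: the per-interaction transmission probability `beta`, the fixed infectious period `infectious_bins`, and the function name `simulate_sir` are all assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def simulate_sir(interactions, seed_node, start_bin, beta, infectious_bins, rng=None):
    """Toy SIR on a temporal network given as (i, j, t) interactions.

    Returns (infected_at, infector): the timebin at which each node was
    infected and the node that infected it (None for the seed).
    """
    rng = rng or random.Random()
    by_bin = defaultdict(list)
    for i, j, t in interactions:
        if t >= start_bin:
            by_bin[t].append((i, j))

    infected_at = {seed_node: start_bin}
    infector = {seed_node: None}
    for t in sorted(by_bin):
        # Nodes infected fewer than infectious_bins ago can still transmit;
        # older infections count as recovered.
        infectious = {n for n, t0 in infected_at.items() if t - t0 < infectious_bins}
        newly = {}
        for i, j in by_bin[t]:
            for src, dst in ((i, j), (j, i)):
                if src in infectious and dst not in infected_at and dst not in newly:
                    if rng.random() < beta:
                        newly[dst] = src
        # Nodes infected in this bin only become infectious in later bins.
        for dst, src in newly.items():
            infected_at[dst] = t
            infector[dst] = src
    return infected_at, infector


# Toy chain (invented): the last contact happens after everyone has recovered.
toy = [(0, 1, 0), (1, 2, 1), (2, 3, 5)]
infected_at, infector = simulate_sir(
    toy, seed_node=0, start_bin=0, beta=1.0, infectious_bins=3, rng=random.Random(0)
)
```

Recording the infector of each node makes it possible to evaluate per-infection-event quantities afterwards, whatever the details of the transmission model.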

Based on the structural analysis, our hypothesis is that, in the short-range network, the simulated pathogen tends to be more contained within small sets of highly interacting individuals. We quantify the contained-in-communities behavior as follows. For each infection event, occurring on a link with weight wij, where node i infects node j, we measure which fraction Ij of node j’s direct (1-hop) neighborhood has already been infected. Since this is a weighted network, we define $${I}_{j}={W}_{\{-i\}}^{-1}\,{\sum }_{k\in {\mathcal I} (j),k\ne i}\,{w}_{jk}$$, where $${\mathcal I} (j)$$ is the set of j’s infected neighbors and $${W}_{\{-i\}}={\sum }_{k\ne i}\,{w}_{jk}$$ is the sum of all weights excluding the infecting link. A value of Ij = 0 indicates that no-one in the direct neighborhood besides the infecting node has yet been infected; a value of Ij = 0.5 indicates that neighbors accounting for 50% of the link weights connecting to j have already been infected. Figure 3a shows a kernel density estimation of I as a function of the fraction of infected nodes, based on 500 runs of the spreading process in the short-range (left), sampled long-range (middle), and long-range (right) networks.
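The definition of Ij translates directly into code. In the sketch below, the nested-dictionary layout of the weights and the function name are illustrative assumptions; which nodes count as "infected" (for example, whether recovered nodes are included) is left to the caller, since it is not fixed by the formula itself.

```python
def neighborhood_infected_fraction(j, i, weights, infected):
    """Weighted fraction of j's neighbors (excluding infector i) already infected.

    weights: dict {node: {neighbor: w}}; infected: set of node ids.
    """
    others = {k: w for k, w in weights[j].items() if k != i}
    denom = sum(others.values())            # W_{-i}
    if denom == 0:
        return 0.0                          # j has no neighbor other than i
    num = sum(w for k, w in others.items() if k in infected)
    return num / denom


# Toy example (invented): node 1 is infected by node 0.
toy_weights = {1: {0: 5, 2: 3, 3: 1, 4: 4}}
I_j = neighborhood_infected_fraction(j=1, i=0, weights=toy_weights, infected={0, 2, 3})
```

Here neighbors 2 and 3 carry a weight of 4 out of the 8 that remain once the infecting link is excluded, giving Ij = 0.5.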

In the case of the short-range network, we observe behavior which suggests that the spreading agent is indeed slowed by neighborhoods, consistent with the behavior of both simulated and real spreading processes found in the literature23,24,25,26,27. As is evident from Fig. 3a, early in the epidemic outbreak, when the fraction of infected nodes is low, the disease agent can saturate small neighborhoods and infect new nodes in neighborhoods where a large fraction (I > 0.80) of neighbors are already infected. Conversely, it is still possible to find neighborhoods with a low fraction (I < 0.20) of infected nodes very late in the outbreak. These effects are possible because the spreading agent does not jump easily between neighborhoods of densely connected nodes.

The disease spreading is very different in the full and sampled long-range cases. In contrast to the contained-in-communities picture, the infection progresses smoothly through the network. In the long-range networks, the neighborhood infection is more closely proportional to the fraction F of the total network infected. Cuts at particular levels of overall network infection F in Fig. 3b show that the pattern of more spread-out I in the short-range network is consistent throughout the spreading progression and across random starting conditions (seed node and time). Visually, the distributions of I at given F are narrower for the long-range networks, with peak values of neighborhood infection I closer to values of overall network infection F. To quantify this effect, we consider the distribution of $${R}^{2}$$ of a linear model fitting infection of the neighborhoods I to the progress of the infection (fraction of network infected F), calculated for each of the aforementioned 500 realizations of an epidemic. The distribution of $${R}^{2}$$ peaks at around 0.4 in the short-range network vs. 0.75 in the two long-range networks, as shown in Fig. 3c. This indicates that direct proportionality between the global (F) and local (I) infection level is a significantly better model for the long-range networks.
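One way to obtain an R² value for a single epidemic realization is an ordinary least-squares fit of I against F. The sketch below assumes a linear model with an intercept, which may differ from the exact model used for Fig. 3c; the function name `r_squared` and the toy values are illustrative.

```python
def r_squared(F, I):
    """Coefficient of determination of the least-squares line I ~ a + b * F."""
    n = len(F)
    mean_f = sum(F) / n
    mean_i = sum(I) / n
    sxx = sum((f - mean_f) ** 2 for f in F)
    sxy = sum((f - mean_f) * (i - mean_i) for f, i in zip(F, I))
    syy = sum((i - mean_i) ** 2 for i in I)
    slope = sxy / sxx
    intercept = mean_i - slope * mean_f
    ss_res = sum((i - (intercept + slope * f)) ** 2 for f, i in zip(F, I))
    return 1.0 - ss_res / syy


# Toy values (invented): one (F, I) pair per infection event.
F = [0.1, 0.2, 0.3, 0.4]
tight = r_squared(F, [0.1, 0.2, 0.3, 0.4])   # I tracks F exactly
loose = r_squared(F, [0.2, 0.1, 0.4, 0.3])   # I scatters around F
```

A value close to 1 means that the local infection level is well predicted by the global one, as reported for the long-range networks; lower values indicate the more scattered pattern seen in the short-range network.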

Thus we find that while – in the short-range network – the infection tends to be captured inside closely connected communities, the picture is quite different in the long-range network. While both types of behavior have been described in the literature8,23,24,25,26,27,28, the important finding in this context is that the two networks are representations of the same underlying behavioral data originating from a single population. These findings underscore how long-range spreading dramatically taps into spurious connections outside the social networks, resulting in fundamentally different types of spreading – in some ways mimicking the differences between droplet and airborne spreading mechanisms29,30,31,32.

Community structure increases infected-infected interactions

Our analysis of link weights showed that the short-range network tends to have fewer links with more interactions on each link. But why is the disease trapped within communities in the first place? One of the reasons that an infection remains ‘stuck’ in a neighborhood is that a disease can only spread via interactions between infected and susceptible nodes. Thus, if a local group is fully infected, we tend to see a large fraction of infected-infected interactions, which cannot help spread the disease. In Fig. 4a we quantify this tendency by plotting how frequently infected-infected interactions occur in the sampled long-range and short-range networks, respectively.

We observe a clear difference between the two networks. In the sampled long-range network, where the local connection patterns have high entropy, there is only a low level of activity among infected or recovered individuals. The spreading agent quickly reaches the entire network due to a large number of available susceptible-infected links. This behavior is in contrast to the short-range network, where infected-infected interactions represent a larger fraction of interaction events. Thus, as above, given the same number of interactions and the same underlying behavioral data, outbreaks are significantly slower and more contained in the short-range network relative to the sampled long-range case (Fig. 4b).

Finally, in Fig. 5 we summarize a number of statistics related to disease spreading in the three networks. These results confirm that the structural differences between the short-range and long-range interaction networks discussed above lead to reliably different outcomes in simulated epidemics. Firstly, in Fig. 5a, we show that when outbreaks do happen in the short-range network, they are smaller in terms of the total number of nodes infected. Moreover, the probability that an outbreak is contained – reaching only a small fraction of the network (<20%) – is higher in the short-range network than in the long-range networks (Fig. 5a inset). In addition, the time an infection needs to reach 50% of the short-range network is significantly longer, with the peak of the distribution for the sampled long-range network occurring after 7 days, while in the short-range network the peak is delayed to 10 days (Fig. 5).

Thus, consistent with the literature, short-range interactions are organized in a way that slows down spreading relative to the long-range case. The sampled long-range network features precisely the same number of interactions as the short-range network, but is structurally more similar to the full long-range network according to the measures considered here. Our results show that taking the physical distance of interactions into account results in networks that can significantly alter the outcome of a simulated outbreak. The qualitative behavior described above is reproduced across a wide range of parameter values.

Discussion

We have demonstrated a strong structural difference between the short-range networks that support short-range transmission processes and the long-range networks that support transmission across distances up to 10 meters. Summarizing our findings, we find that the proximity of interactions correlates with link weight: on average we stay closer to our friends. In the short-range network, we find spreading patterns consistent with our knowledge of spreading on various online social networks and modeling studies23,24,25,26,27. In the long-range network we observe a large proportion of proximity interactions between individuals with weak or absent social ties, resulting in a complex local network structure. This non-social ‘noise’ in the network allows for faster and more powerful outbreaks to take place, even when considering exactly the same number of interactions, consistent with results of synthetic proximity-aware spreading simulations33.

It is, of course, well known that the definition of ‘interaction’ impacts the network structure and spreading dynamics. For example, networks of sexual contacts are analyzed separately from other types of pathogen spread34,35, even though both types of networks are physical interaction networks. A central work in understanding the role of physical proximity is by Read et al.8, where questionnaire data regarding ‘close’ and ‘distant’ interactions were collected from 49 participants over 14 non-consecutive days. This study, however, did not address how differences in mode of transmission can affect the network of infections. Recently, a multitude of new approaches have been developed for collecting data regarding close interactions with the purpose of modeling spreading using various methods, including Bluetooth, RFID, and questionnaires8,28,36,37,38,39.

Here we argue that from the perspective of a spreading agent, the relatively subtle difference of what ‘interaction’ is in the short-range and long-range networks makes an important difference, even given the same underlying social system. Our results suggest that long-range spreading is less related to the underlying social network and closer to a well-mixed system than simulations on purely social structures might lead one to suggest.

Methods

The dataset

The dataset used in this paper comes from the Copenhagen Networks Study3. We use one month of data (February 2014). Out of 696 freshmen student participants active in that month, we chose students with at least 60% of Bluetooth observations present (resulting in a median of 80%) and who belong to a single connected component. Observations are defined as 5-minute bins in which the user has performed scans, whether the scans contained any devices or not. Since Bluetooth scans do not result in false positives, we symmetrized the observation matrix (resulting in an undirected network), assuming that $${\gamma }_{ijt}\iff {\gamma }_{jit}$$. This results in improved data coverage, with a median of 85% of 5-minute bins containing data. More information regarding the dataset is provided in the Supplementary Information.
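The coverage filter and the symmetrization step can be sketched as follows. The data layouts and function names are assumptions for illustration, the restriction to a single connected component is omitted for brevity, and the handling of signal strength when both phones report a value is not covered here.

```python
def select_participants(scans, n_bins, min_coverage=0.6):
    """Keep users who performed scans in at least min_coverage of all bins.

    scans: dict {user: set of timebins in which the user's phone scanned}.
    """
    return {u for u, bins in scans.items() if len(bins) / n_bins >= min_coverage}


def symmetrize(directed):
    """Turn (observer, observed, t) records into undirected (i, j, t), i < j."""
    return {(min(a, b), max(a, b), t) for a, b, t in directed if a != b}


# Toy data (invented): four timebins, three users.
toy_scans = {0: {0, 1, 2, 3}, 1: {0, 1}, 2: {0, 1, 2}}
kept = select_participants(toy_scans, n_bins=4)

# User 2 sees user 0 in bin 1 even though user 0 did not record user 2.
toy_directed = [(0, 2, 0), (2, 0, 0), (2, 0, 1)]
undirected = symmetrize(toy_directed)
```

In the toy example the one-sided observation in bin 1 is retained as an undirected interaction, which is how symmetrization can increase the share of bins that contain data.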