The network limits of infectious disease control via occupation-based targeting

Policymakers commonly employ non-pharmaceutical interventions to reduce the scale and severity of pandemics. Of non-pharmaceutical interventions, physical distancing policies—designed to reduce person-to-person pathogenic spread – have risen to recent prominence. In particular, stay-at-home policies of the sort widely implemented around the globe in response to the COVID-19 pandemic have proven to be markedly effective at slowing pandemic growth. However, such blunt policy instruments, while effective, produce numerous unintended consequences, including potentially dramatic reductions in economic productivity. In this study, we develop methods to investigate the potential to simultaneously contain pandemic spread while also minimizing economic disruptions. We do so by incorporating both occupational and contact network information contained within an urban environment, information that is commonly excluded from typical pandemic control policy design. The results of our methods suggest that large gains in both economic productivity and pandemic control might be had by the incorporation and consideration of simple-to-measure characteristics of the occupational contact network. We find evidence that more sophisticated, and more privacy invasive, measures of this network do not drastically increase performance.

. Projection of selected jobs on the first and second principal components (PC). Figure S2. The COVID-19 death rate in top ten jobs as reported in 7 (left) and the composite proximity score for the closest matching occupations in the O*NET data.
where µ and σ 2 are the mean and the variance of the normal distribution Y = log(X). The parameters of each fit are listed in Table S1. We wish to recover the mean number of contacts of 75 as observed in 14 and to split these total contacts into work, home and transport (more accurately representing all other urban interactions: transport but also shops, cinemas, restaurants etc) while imposing a skew that is consistent with the Coefficient of Variation observed empirically. We choose a CV value of 0.7 representative of the mean and the middle of this range of CV values.  Table S1. Parameters of log-normal distribution fitted to the degree distribution of six empirical networks.

Occupational Proximity to Contact Degree
As described in the main paper, we have a multidimensional score of proximity in the range [0-100] for each occupation. We seek to translate that score into a work contact. These distributions are fitted to data found in 16 . The values are then rescaled to make a mean total of 75, matching empirical data 14 .
The work degree of the node depends solely on her occupation, which is proportional to the rescaled value of its projection on the first PC of the job proximity data with the mean work degree equal to 36.5.
We also calculate the correlation between the first PC with the essential score data found in 17, 18 (see Figure S3).

Workforce Data
To generate the occupational contact network through the configuration model, we use data from the NY Workforce dataset for 2019 19 and the data for the number of contacts per occupation that we derive through the PCA. The NY workforce dataset includes information about the 750 distinct occupations that exist in the NY, such as the number of employees per occupation (out of a total labour size of around 9.5M workers), their percentage in the NY workforce, and the hourly and annual average wages per worker per occupation. Each occupation in the dataset is represented by a six-digit code based on the 2010 Standard Occupational Classification (SOC) System 20 . The first two digits represent the major group, the third digit represents the minor group, the fourth and fifth digits represent the broad occupation and the sixth digit represents the broader occupation.

Merging O*NET and Workforce Data
The data for the number of contacts per occupation derived by the PCA include information for 967 occupations. The occupations in this dataset are represented by an O*NET-SOC eight-digit code. The first six digits match the SOC coding scheme while the additional two digits indicate whether the O*NET-SOC occupation is a one-to-one match or is a detailed O*NET breakout of the SOC occupation 21 . To merge the two datasets by occupation code, we first group the occupations included in the PCA outcomes by their six first SOC digits. In that case, the number of contacts per six-digit SOC occupation is the mean of the number of contacts between the eight-digit SOC occupations that share the first six digits. After grouping the occupations (i.e. transition from the eight-digit coding system to the six-digit classification), we end up with 773 distinct occupations. 680 out of 750 SOC codes from the NY workforce dataset were matched one-to-one with SOC codes from the contacts per occupation data. We then match 18 more occupations presented in Table S2 manually. We therefore create a dataset of 678 distinct occupations having their number of employees in the NY workforce, their average wages and the number of contacts per occupation.

Modified Configuration Model
We generate our contact network between workers using a modified configuration model as follows. We first represent the assigned numbers of home, work and transportation contacts of each worker as half-links of the nodes in the network (see Figure 2 in the main paper). We then randomly select a pair of half links for each category of contacts and connect them. The random selection is achieved by sampling nodes without replacement, in order to avoid duplications of contacts and self-loops. We repeat the process until all half-links are paired up. This technique produces a network that is not fully random as nodes are assigned a degree according to their occupation which is in turn taken from an empirical distribution of workers over occupations.

Calculation of Transmission Rate from Reproductive Number
Let us consider how we can relate R 0 , the mean number of new infections per infection, to p in f and d ; the probability of infection between two nodes in a single encounter and the mean network degree. Since R 0 ≈ 2.5 according to various studies, we would like to fix p in f appropriately in our simulations to be roughly consistent with this measured R 0 . Let us consider a 'point infection' in a single time-step (in our simulations this is a single day) in which node i interacts with its four neighbours (see Figure S4). With probability p in f node j is infected by node i and transitions from susceptible to infected.
In a single encounter the probability of j becoming infected is p j (S → I) = p in f . Under n encounters, the probability of j becoming infected in at least one of the n encounters is the complement of the probability that j is not infected in any of the n encounters. The probability of not becoming infected in a single encounter is (1 − p in f ). This can be expanded to an arbitrary number of neighbours (see Figure S5). Assuming that there are no spillovers e.g. there is no possibility for i to infect k and then k to subsequently infect j (with i and j connected), we can calculate the expected number of new infections as the product of this probability of infection after n days and the mean degree. This is our implied R 0 . This assumption of no second order effects is somewhat unrealistic and becomes more unrealistic as the incubation period increases. Figure S4. A schematic illustration of a contact infection transmission.
Assuming that R 0 is defined over the incubation period i.e. it is the mean number of new infections caused by an infected person before they are symptomatic, then we set n = 14. d is defined by our contact data and so p in f is set by our empirical R 0 .
In fact we can calculate p in f directly, given our assumptions. Given our assumption of no higher order effects (i infects k who infects j) we would actually need a smaller p in f to recover R 0 . So our estimated value of p in f is an upper bound.

Industry Based Worker Productivity
To examine the productivity of workers based upon the industry of their occupation, we employ data from the Bureau of Economic Analysis (BEA) on value added by industry in producers' prices 22 . These data are calculated at the detailed industry level and exist for the years 2007 and 2012. We employ the most recent set of these data -from 2012. We note that while industry value added has surely changed since these data were compiled, the detail industry values in BEA's estimates likely correlate decently well with actual present day value added and provide to our knowledge the highest resolution estimate of detailed industry value added in the U.S. The BEA industry codes (NAICS) do not perfectly match the industry codes contained within the O*NET data, and so we supplement the NAICS crosswalk provided by the U.S. Census 23 with a procedure that sequentially matches industries to occupations based upon increasing levels of industry aggregation. We first match industry value added data on the full six digit NAICS codes of occupations, then match any remaining occupations and industries at the five-digit level, and so on up to the two-digit NAICS code level. Approximately 27% of occupations have a full match, 0.3% have a five digit match, 15% have a four digit match, 28% have a three digit match, and 29% have a two digit match.

Evaluation of Strategies
In order to evaluate and compare strategies, we consider (i) the economic loss due to furloughing of workers (ii) the economic loss due to infected workers and (iii) the size of the peak of the number of infected individuals (as a proxy for the degree of strain on health services). We wish to evaluate each strategy independently from a specific choice about the severity of the 5/16 Figure S5. The probability of a node to become infected based on the number of its infected neighbours (left) and the probability of infection in a single encounter (right).
intervention i.e. the proportion of workers to furlough. We therefore calculate the areas under the curves plotted in Figure 4 of the main paper; higher values are worse for all metrics. This is a simple sum given as below for strategy s for the furlough loss and analogously for loss due to infection and epidemic peak.
In Figure S6, we present the interaction term between the cost due to furloughing and the cost of infection that complements Figure 5 in the main paper.

Centrality and Shuffled Centrality
Assigning a score to each node in a network according to its centrality is a mature area of study. We therefore compare various centrality measures (see Table S3 and Figure S7), we conclude that High Degree Adaptive (HDA) is the best performing and it is this metric we present in the main paper.

Decomposition of the network in subcomponents
Our initial network is composed of a single component with 200,003 nodes. When removing the work and transportation links of nodes that can work from home the network decomposes into 3,846 components where from those, one is the giant component with 194,276 nodes and most of the other components are isolated nodes reflecting the workers that can work from home and do not have any home links. Furloughing n% nodes additional to those that can work from home, the network further decomposes in more components and the size of the giant component reduces as shown in Figures S8 and S9 for the different furloughing strategies. Figure S10 shows the weak correlation between occupational proximity and essentialness score (left) and between occupational proximity and wage (right).

Comparison of Synthetic Contact Network with Empirical Contact Networks
We address the concern that our results are an artefact of our contact network generation model; the modified configuration model. Therefore we consider several empirical contact networks from various scenarios and compare the descriptive statistics of each with our synthetic network (table S4 and Figure S11). The "workplace" is the contact network between 217 individuals measured in an office building in France in 2015. The "conference" network represents the face-to-face interactions of 403 participants to the 2009 SFHH conference in Nice, France. The "art exhibition" is the network of contacts collected during an 6/16 Figure S6. The aggregated performance of each strategy presented as the interaction term between the cost due to furloughing and the cost of infection that complements Figure 5 in the main paper.
art science exhibition at the Science Gallery in Dublin, Ireland in 2009. The "hospital" is the contact network between patients, between health-care workers and patients and among health-care workers during four days in a hospital ward in Lyon, France. The "primary school" network includes the face-to-face interactions between 242 students and teachers in a primary school, and the "high school" is the network of contacts between students in a high school in Marseilles, France. The data for those networks were active contacts as collected during 20-second intervals generating dynamic (temporal) networks in the different locations. Here we consider their static counterparts by simplifying the networks with neglection of multiple edges or by using cumulated contacts (e.g. within a day).

Simulations with random rewiring
We randomly delete 1% of home, transportation and work edges respectively and then rewire random nodes with the same proportion of links. We then repeat the simulations with the furloughing strategies on that network, and compare the outcomes ( Figure S12) with the main findings shown in Figure 4 of the main paper. Each spreading scenario shown in Figure S12, is evaluated over 1000 epidemic realisations.

Simulations with additional removal of home links of the furloughed workers
We repeat the simulations with the different furloughing strategies and we add the hypothesis of home quarantine (Figures S13 and S14), which means that for the furlough nodes we cut all of their links (work, transportation and home).

Comparison with Alternate Configuration Model Instantiation
In order to ensure that our results are not an artifact of our particular random configuration of the network, we repeat our analysis using a different random seed. The results of the analysis with the different network configuration are shown in Figure  S15.        Figure S15. This figure is similar to Figure 4 in the main paper comparing the different strategies across the full range of severities according to cost of furlough (left), cost of infection (middle) and peak of infection (right). Here, we consider a different network configuration drawn from the same modified configuration model.