Diffusion of treatment in social networks and mass drug administration

Information, behaviors, and technologies spread when people interact. Understanding these interactions is critical for achieving the greatest diffusion of public interventions. Yet little is known about the performance of starting points (seed nodes) for diffusion. We track routine mass drug administration (MDA), the large-scale distribution of deworming drugs, in Uganda. We observe friendship networks, socioeconomic factors, and treatment delivery outcomes for 16,357 individuals in 3491 households of 17 rural villages. Each village has two community medicine distributors (CMDs), who are the seed nodes and are responsible for administering treatments. Here, we show that CMDs with tightly knit (clustered) friendship connections achieve the greatest reach and speed of treatment coverage. Importantly, we demonstrate that clustering predicts diffusion through social networks when spreading relies on contact with seed nodes, whereas centrality is unrelated to diffusion. Clustering should be considered when selecting seed nodes for large-scale treatment campaigns.

The coverage outcomes concerned the reach and speed of treatment diffusion. Coverage is defined as the proportion of the eligible population who were offered treatment, i.e. visited and offered at least one drug by a CMD. Coverage was measured for eligible individuals in the 3,436 households belonging to the main components of the networks, of which 3,415 households had at least one person eligible for treatment. For individuals, coverage was a binary indicator equal to one if an individual was in the eligible population and was offered at least one of praziquantel, albendazole, or ivermectin by CMDs. Similarly, household coverage was a binary indicator equal to one if at least one eligible person in the home was offered at least one of the three MDA drugs by CMDs. Individual and household coverage indicators were then expressed as proportions at the village level to assess the fraction of eligible individuals or households approached by CMDs for at least one drug. The day the drug was received was recorded for individuals who reported being offered at least one drug by CMDs and was coded from one (offered a drug on the first day of distribution, October 1st) to 31 (offered a drug on the last day of distribution, October 31st). The earliest day of praziquantel, albendazole, or ivermectin receipt was used as the timestamp of drug receipt. A total of 0.24% (35/14,625) of eligible individuals reported a day of drug receipt greater than 31; these responses were not concentrated in one village and were recoded to the maximum value of 31. At the household level, the earliest day any eligible individual was offered praziquantel, albendazole, or ivermectin by CMDs was used as the timestamp of household drug receipt.
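The coverage and timestamp rules above can be illustrated with a minimal Python sketch for a single village. The record layout (one dict per eligible individual with hypothetical keys `household`, `offered`, and `days`) is an assumption for illustration only; the original processing code is not described.

```python
from collections import defaultdict

LAST_DAY = 31  # distribution ran from October 1 (day 1) to October 31 (day 31)


def individual_timestamp(days):
    """Earliest reported day across the drugs offered, with reports above 31 recoded to 31."""
    if not days:
        return None  # offered, but the day of receipt is unknown
    return min(min(days), LAST_DAY)


def village_coverage(individuals):
    """Return (individual coverage, household coverage, household timestamps).

    `individuals` holds eligible people only (hypothetical layout):
    {"household": id, "offered": bool, "days": [reported day per drug offered]}.
    """
    offered_by_hh = defaultdict(bool)
    first_day_by_hh = {}
    for person in individuals:
        hh = person["household"]
        offered_by_hh[hh] |= person["offered"]  # household covered if anyone eligible was offered
        day = individual_timestamp(person["days"]) if person["offered"] else None
        if day is not None:
            # household timestamp = earliest day any eligible member was offered a drug
            first_day_by_hh[hh] = min(day, first_day_by_hh.get(hh, day))
    ind_cov = sum(p["offered"] for p in individuals) / len(individuals)
    hh_cov = sum(offered_by_hh.values()) / len(offered_by_hh)
    return ind_cov, hh_cov, first_day_by_hh
```

Keeping the individual and household aggregation in one pass mirrors the definitions above: a household counts as covered as soon as one eligible member was offered any of the three drugs.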
3 were available and household members engaged in open defecation in a bush or field. Households that sought private medical care frequently received medical supplies and care from drug shops and private health clinics. Availability of electricity in the home also was recorded as a dummy variable.
Education was a count variable of the highest level of education attained by anyone in the household, coded in levels from zero (no education) to 16. The levels of education were primary 1-7 (levels 1-7), senior 1-6 (levels 8-13), diploma (level 14), some university (level 15), and completed university (level 16). The number of years a household had lived in the village was represented as a continuous variable, rounded to the nearest year. Home quality score was a rank indicator of floor, wall, and roof materials: each material was ranked from 1 to 4, and the three ranks were summed. The rank order was grass, sticks, plastic, and metal for the roof; mud and sticks, plastic, metal, and bricks or cement for the walls; and mud, plastic, wood planks, and brick or cement for the floor. Mud included cow dung. If no roof, wall, or floor material was present, zero was recorded for that component.
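As a small illustration of the home quality score, the following Python sketch encodes the rank orders listed above. The dictionary keys are hypothetical labels for the survey responses, not the original codebook.

```python
# Rank orders from the text (1 = lowest, 4 = highest); key strings are hypothetical labels.
ROOF = {"grass": 1, "sticks": 2, "plastic": 3, "metal": 4}
WALL = {"mud and sticks": 1, "plastic": 2, "metal": 3, "bricks or cement": 4}
FLOOR = {"mud": 1, "cow dung": 1, "plastic": 2, "wood planks": 3, "brick or cement": 4}  # mud includes cow dung


def _rank(table, material):
    if material is None:  # component absent: contributes zero
        return 0
    return table[material]  # a KeyError flags an unrecognized material


def home_quality_score(roof=None, wall=None, floor=None):
    """Sum of roof, wall, and floor ranks, giving a score from 0 to 12."""
    return _rank(ROOF, roof) + _rank(WALL, wall) + _rank(FLOOR, floor)
```

For example, a metal roof, brick or cement walls, and a mud floor score 4 + 4 + 1 = 9.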

Village-level variables
This section describes variables that measure village size, accessibility, and ecology. Two village size indicators were constructed. The total number of households included interviewed and not interviewed households in each village. There were a total of 3,578 households, which included 55 households that were not in the main component of the village friendship networks and 87 households that were not interviewed (see main text, network completeness). The fraction of total households that were friends with the CMDs also was calculated. Friendship was defined as a direct connection in the village social network (see network construction section). The physical size of the village was measured as follows. In November 2014, waypoints of all the homes in the village were taken. The physical homes in the village differed from the total households in the village, as some households lived in the same physical home but were counted as separate families in the household survey (see definitions in participants and data source section). There were 3,323 physical homes in the 17 study villages. In Python version 2.7.3 (www.python.org), the haversine distance in meters between each home and all other homes in the same village was calculated with Global Positioning System data. This procedure was repeated for every home in all the study villages. The haversine distance is the shortest distance (as-the-crow-flies) on the earth's surface between any two points. These distance matrices were used to calculate the average distance between any two homes in a village, which measured the general accessibility of the village for the CMDs 4 . The village ecology also had been surveyed in February 2013. This survey provided indirect measures of accessibility. Five binary indicators for village ecology were constructed. 
Three of these indicators assessed the presence of water bodies and were equal to one if the following were within the village: a large rice farm (rice paddy/swamp), beach on Lake Victoria, or small boat landing site on Lake Victoria (no beach). Other ecology dummy variables included if the village center was more than 0.50 kilometres from Lake Victoria and if there were three or more roads within a village. An additional village indicator was recorded. CMDs were asked if they used any methods not in the national protocol (i.e. other than door-to-door) to deliver treatments within their village. A binary indicator was constructed and equal to one if CMDs also made available an option for treatment pick-up from the CMDs' homes (the only other delivery option stated).
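The village accessibility measure (average haversine distance between any two homes) can be sketched in standard-library Python. The Earth radius of 6,371 km is an assumption, because the value used in the original Python 2.7.3 code is not stated, and homes are represented here as hypothetical (latitude, longitude) tuples in decimal degrees.

```python
import math
from itertools import combinations

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius; the original value is not stated


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle (as-the-crow-flies) distance in meters between two GPS points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def mean_pairwise_distance_m(homes):
    """Average distance between any two homes in a village.

    The distance matrix is symmetric, so averaging over unordered pairs gives the
    same value as averaging over all off-diagonal cells."""
    pairs = list(combinations(homes, 2))
    return sum(haversine_m(*p, *q) for p, q in pairs) / len(pairs)
```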

Network measurements
The friendship graphs were analyzed using Python version 2.7.3 (www.python.org) with several algorithms implemented in the NetworkX library 5 . Edges were unweighted and treated as undirected for all indicators except reciprocity. Well-established centrality indicators 6 were measured. Degree is the total number of incoming and outgoing edges of a node, and average neighbour degree is the mean degree of all neighbours of a node 6 . Reciprocated edges are counted as one edge for degree. Eigenvector centrality is similar to degree, but more weight is assigned to neighbours with more connections 6,7 . The eigenvector centrality of a node is proportional to the sum of the scores of its neighbours. Hence, a node may have few connections but high eigenvector centrality because its neighbours have high degree. Katz centrality captures not only the direct neighbours of a node, but also the extent to which other nodes are connected to the node of interest through its neighbours. Katz centrality is similar to eigenvector centrality except that it includes a free parameter that gives all nodes non-zero centrality. All connections to the node of interest contribute to its Katz centrality 6,8 . Three path-related centrality indicators for undirected edges were calculated: closeness, betweenness, and communicability. Closeness is based on the sum of the shortest-path distances from a node to all other nodes in the network; it is the reciprocal of this sum, normalized by multiplying by the number of other nodes (n−1) 9 . Communicability is similar to closeness except that it considers not only the shortest paths but all paths connecting two nodes 10 . Betweenness centrality is the sum, over all pairs of nodes in the network, of the fraction of shortest paths between the pair that traverse the node of interest 11 . Betweenness was normalized by 1/((n−1)(n−2)), where n is the total number of nodes in the network.
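A sketch of how the centrality indicators could be averaged over a village's two CMDs with current NetworkX (not the original Python 2.7.3 code). The Katz attenuation factor `alpha=0.05` is a hypothetical choice, since the value used is not reported (it must stay below the reciprocal of the largest adjacency eigenvalue). Communicability is omitted because the node-level summary used is not specified.

```python
import networkx as nx


def cmd_centralities(G, cmds):
    """Mean of several centrality indicators over the CMD household nodes.

    G may be directed; it is converted to an undirected, unweighted graph so that a
    reciprocated pair of nominations counts as one edge."""
    H = nx.Graph(G)
    measures = {
        "degree": dict(H.degree()),
        "avg_neighbour_degree": nx.average_neighbor_degree(H),
        "eigenvector": nx.eigenvector_centrality(H, max_iter=1000),
        # hypothetical alpha; must be < 1 / (largest eigenvalue of the adjacency matrix)
        "katz": nx.katz_centrality(H, alpha=0.05),
        "closeness": nx.closeness_centrality(H),  # (n - 1) / sum of shortest-path distances
        "betweenness": nx.betweenness_centrality(H, normalized=True),
    }
    return {name: sum(vals[c] for c in cmds) / len(cmds) for name, vals in measures.items()}
```

Usage example, with the karate club graph standing in for a village network: `cmd_centralities(nx.karate_club_graph(), [0, 33])`.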
Three local measures of network transitivity were calculated: reciprocity, clustering, and density. In directed networks, reciprocity is the frequency with which nodes receive and return connections. Reciprocity was measured as the total number of reciprocated edges divided by the total number of edges in the egocentric network of the CMD 6 . The egocentric network was the sub-graph consisting of the CMD's household, its neighbours (direct connections) in the village network, and all edges among them. Clustering was defined by the pattern of undirected edges that can exist between three nodes. Methods presented in Saramäki et al. were used to calculate clustering 12 . Clustering occurs when two connected nodes also share a connection with a third node, and was calculated as the fraction of possible triangles involving the node of interest and its neighbours. That is, local clustering was equal to the total number of connections between the neighbours of a node divided by the maximum possible number of connections between those neighbours. Clustering is important when comparing models of complex and simple contagion, and is being investigated in other fields including agricultural economics 13 . Density, in an undirected network, is equal to 2m/(n(n−1)), where m is the total number of edges and n is the total number of nodes 6 . Density was calculated for the egocentric networks of the CMDs.
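The three ego-network measures can be sketched with NetworkX as follows. The sketch assumes a directed friendship graph in which an edge u → v means household u named household v. It uses the unweighted NetworkX clustering coefficient, which for unweighted graphs coincides with the neighbour-connection fraction described above.

```python
import networkx as nx


def ego_transitivity(D, cmd):
    """Reciprocity, clustering, and density for one CMD's egocentric network."""
    # CMD plus direct friends in either direction, with all directed edges among them
    ego = nx.ego_graph(D, cmd, radius=1, undirected=True)
    reciprocity = nx.overall_reciprocity(ego)  # reciprocated edges / all directed edges
    # links among the CMD's friends / maximum possible links among them
    clustering = nx.clustering(nx.Graph(D), cmd)
    density = nx.density(nx.Graph(ego))  # 2m / (n(n - 1)) on the undirected ego network
    return reciprocity, clustering, density
```

In the toy network used to check it (A ↔ B, A → C, C → B, D → A), node A has three friends, only one pair of which (B, C) is connected, so its clustering is 1/3.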
Beyond centrality and local transitivity, two specific sub-graph network properties were examined 14,15 . The core number was calculated as described in Batagelj and Zaversnik 14 . A k-core is a maximal sub-graph in which every node has a degree of at least k, i.e. each vertex is connected to at least k other nodes in this sub-graph. The core number of a node is the largest value of k for which the node belongs to a k-core. A subset is maximal in that no vertex can be added whilst maintaining the property of interest; here, the property is at least k connections for every node. If any vertices can be added whilst retaining this property, then the subset is not a k-core. Two k-cores with the same k cannot overlap, as a group can only be a k-core if it is not a subset of a larger group that also has this property. The clique number is the number of nodes in the largest maximal clique containing the node of interest 15 . In a clique, every node is connected to every other node in the set. Cliques were allowed to overlap in the network. Since clustering was calculated with the immediate neighbours (friends) of a node, understanding the effects of the core and clique numbers provides insight into how well connected those friends must be. The core number distinguishes between simply having a well-knit group of friends (clustering) and having a highly connected group of friends who also share a high degree. This distinction is behaviourally interesting since high clustering, as opposed to a high core number, does not require a well-connected, possibly influential group of friends. The clique number measures the completeness of connections between the friends of a node. A significant effect of a high clique number on diffusion would indicate that the friends of a node, or a subset of those friends, must be perfectly connected. Understanding how complete connections must be amongst friends is needed to inform targets for public interventions that seek to introduce friends of a node.
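Both sub-graph properties are available in current NetworkX. The sketch below averages them over the CMD nodes and is illustrative only; `nx.core_number` implements the Batagelj and Zaversnik algorithm and requires a graph without self-loops.

```python
import networkx as nx


def cmd_subgraph_properties(G, cmds):
    """Mean core number and mean clique number over the CMD nodes (undirected view)."""
    H = nx.Graph(G)
    core = nx.core_number(H)  # largest k such that the node lies in a k-core
    clique = nx.node_clique_number(H)  # size of the largest maximal clique containing each node
    n = len(cmds)
    return sum(core[c] for c in cmds) / n, sum(clique[c] for c in cmds) / n
```

For a triangle A-B-C with a pendant node D attached to A, the triangle nodes have core number 2 and clique number 3, whereas D has core number 1 and clique number 2.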

Statistical analysis
Overview
To our knowledge, this study is the first analysis of complex networks for MDA 16 . All network variables except reciprocity were calculated as undirected to assess who knows whom rather than who initiated contact. However, all analyses described below were repeated with directed network indicators, except for clustering and density, which do not have a trivial directed measurement 6 . The direction and significance of all coefficients were robust and unchanged from the results found with undirected network variables. To identify the role of network indicators, the contribution of potential confounders was assessed, including homophily and village geography. Given the limited degrees of freedom (17 village observations) and, most importantly, collinearity between network indicators, univariate models were used for all statistical analyses. These models were robust to omitted variable bias (Supplementary Table 20).

Treatment coverage
To identify the topological indicators that predicted treatment coverage, network characteristics of CMDs were assessed in univariate fractional response models 17 . Unlike in linear models, the magnitudes of the coefficients of fractional response models cannot be directly interpreted. The expected margins of predictors found to be significant were therefore also calculated to provide a more tangible measure of potential impact on the reach of treatment coverage. The dependent variables were the village outcomes (N=17) of household and individual coverage. Topological indicators were expressed as the mean value for the two CMDs in each village, and these average CMD network characteristics were used as predictors for each of the dependent variables. These network characteristics included clustering, reciprocity in the CMD ego-network, density of the CMD ego-network, degree, average neighbour degree, core number, clique number, closeness centrality, eigenvector centrality, Katz centrality, betweenness centrality, and communicability. The fractional response models were specified with a probit link, binomial family, and robust standard errors 18 . To test for heteroskedasticity, the assumption of a fixed variance was relaxed and the resulting models were compared to models with fixed variance. If the equation for heteroskedasticity was significant (Wald test, p-value<0.05), the coefficient from the model with heteroskedasticity was reported and noted. As a robustness check, the treatment coverage models were also estimated as linear regressions, and there was insufficient evidence to suggest any differences in the significance and sign of coefficients. However, unlike fractional response models, linear regressions cannot account for non-normal errors, non-linearity, heteroskedasticity (even with robust standard errors), and the bounded nature of the dependent variable, which requires model predictions (e.g. average partial effects) to lie between zero and one.

Treatment speed
To examine the speed of treatment distribution, CMD network properties were analyzed in univariate Poisson regressions with robust standard errors 18,19 . The speed of diffusion was measured as the time, in days, required for CMDs not only to approach households but also to have 50% of households accept or swallow treatment (effective coverage). Because the magnitudes of coefficients from Poisson regressions are not directly interpretable, predicted margins were provided to indicate the difference in speed between the 90th and 10th percentiles of significant variables. Poisson regressions were used instead of proportional hazard models for the following reasons. In a proportional hazard model, if the baseline hazard is assumed to be constant over time, then the proportional hazard model is the same as the Poisson model 20 . This equivalence arises because the hazard estimator collapses into the standard maximum likelihood estimator used in Poisson regression. Issues arise when attempting to use hazard models for our data. The sample size (15 villages that achieved 50% household coverage) is too small for this type of model, and more equations are generated than there are observations. Hazard models also produced inconsistent predictions when there were 'ties' in the data, i.e. two villages reaching the target on the same day.
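A standard-library Python sketch of a univariate Poisson regression (log link, fitted by iteratively reweighted least squares) with sandwich standard errors, plus predicted days at the 10th and 90th percentiles of the predictor. It is illustrative only. The percentile definition (`statistics.quantiles` with `method="inclusive"`) is an assumption and may differ from the software used in the study.

```python
import math
import statistics


def poisson_regression(x, y, tol=1e-10, max_iter=100):
    """Fit log E[y | x] = a + b*x; return ((a, b), (se_a, se_b)) with robust SEs."""
    a, b = math.log(statistics.fmean(y)), 0.0
    for _ in range(max_iter):
        s00 = s01 = s11 = t0 = t1 = 0.0
        for xi, yi in zip(x, y):
            eta = a + b * xi
            mu = math.exp(eta)  # for the log link the IRLS weight equals mu
            z = eta + (yi - mu) / mu  # working response
            s00, s01, s11 = s00 + mu, s01 + mu * xi, s11 + mu * xi * xi
            t0, t1 = t0 + mu * z, t1 + mu * xi * z
        det = s00 * s11 - s01 * s01
        a_new, b_new = (s11 * t0 - s01 * t1) / det, (s00 * t1 - s01 * t0) / det
        converged = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    # Sandwich covariance H^-1 M H^-1 (no finite-sample correction)
    h00 = h01 = h11 = m00 = m01 = m11 = 0.0
    for xi, yi in zip(x, y):
        mu = math.exp(a + b * xi)
        r2 = (yi - mu) ** 2  # squared score per observation
        h00, h01, h11 = h00 + mu, h01 + mu * xi, h11 + mu * xi * xi
        m00, m01, m11 = m00 + r2, m01 + r2 * xi, m11 + r2 * xi * xi
    det = h00 * h11 - h01 * h01
    i00, i01, i11 = h11 / det, -h01 / det, h00 / det
    v00 = i00 * (i00 * m00 + i01 * m01) + i01 * (i00 * m01 + i01 * m11)
    v11 = i01 * (i01 * m00 + i11 * m01) + i11 * (i01 * m01 + i11 * m11)
    return (a, b), (math.sqrt(max(v00, 0.0)), math.sqrt(max(v11, 0.0)))


def predicted_margins(coef, x):
    """Predicted days at the 10th and 90th percentiles of predictor x."""
    a, b = coef
    cuts = statistics.quantiles(x, n=10, method="inclusive")
    return {p: (v, math.exp(a + b * v)) for p, v in (("p10", cuts[0]), ("p90", cuts[-1]))}
```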
The dependent variables for the speed of household and individual coverage were adjusted to account for missing data in the diffusion speed analysis. Individuals and households with missing day-of-receipt information were excluded from the numerator and denominator (treated as missing data) when calculating individual and household coverage, respectively. The day of treatment offer was unavailable for 11.37% (944/8302) of eligible individuals who were visited by CMDs and refused treatment. In addition, 3.98% (293/7358) of individuals who accepted treatment when offered by CMDs did not remember the day of drug receipt. At the household level, the day of drug receipt was unavailable for 9.48% (228/2404) of households where everyone visited refused treatment, although these households can include unvisited individuals who might comply with treatment if offered by CMDs. In addition, 3.54% (85/2404) of households were offered treatment but did not remember the day of drug receipt. The households that received treatment but did not have any information on the day of drug receipt are presented by village in Supplementary Table 15. These households are roughly evenly spread across villages and were not a large proportion of CMD friends, except in Village ID 17, the smallest village with 65 households.
The predictors, the average CMD network characteristics, were the same as specified for the treatment coverage regressions. As the study was confined to one month (31 days), which was within the intended national schedule of drug distribution, not all villages reached 50% coverage, and only those that did were included in the Poisson regressions. Fifteen villages achieved 50% household coverage, whilst only 11 villages reached 50% individual coverage. The 50% coverage threshold was chosen to ensure comparability across villages, capture variability that was not present at lower thresholds, avoid bias towards small villages, and include the greatest number of villages without setting the threshold to a trivially low number. This 50% threshold is conservative relative to World Health Organization guidelines, which recommend 75% coverage as a feasible target 21 . A trivially low target of 20% would have been needed to include all villages.
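The dependent variable for the speed models (the first day on which effective coverage reaches 50%) can be sketched as below. The input format is hypothetical. Each household is a `(day, missing)` pair, where `day` is the day it accepted treatment (or `None` if it did not). `missing=True` marks households without day-of-receipt information, which are dropped from both numerator and denominator as described above.

```python
def days_to_threshold(households, threshold=0.5, last_day=31):
    """First day on which the share of usable households treated reaches the threshold.

    Returns None if the threshold is not reached within the distribution month, in
    which case the village would be excluded from the Poisson regressions."""
    usable = [day for day, missing in households if not missing]
    if not usable:
        return None
    for d in range(1, last_day + 1):
        covered = sum(1 for day in usable if day is not None and day <= d)
        if covered / len(usable) >= threshold:
            return d
    return None
```

For example, with treatment days 1, 2, 2, and 5, one untreated household, and one household with missing information, coverage is 1/5 on day 1 and 3/5 on day 2, so the function returns 2.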

Robustness to temporal clustering, CMD friendship, and homophily
Temporal clustering, CMD friendship connections, and homophily were examined as potential confounders to the effect of CMD network properties on treatment diffusion. Univariate fractional response models with a probit link, binomial family, and robust standard errors were used when the dependent variable was household coverage 17,18 . For the speed of treatment as the dependent variable, univariate Poisson regressions with robust standard errors were used 18,19 . Temporal clustering was measured as the standard deviation of the day of drug receipt amongst friends of CMDs. This indicator excluded the CMDs. If significant, temporal clustering may suggest that any transitivity of connections amongst CMDs and their friends was confounded by the CMD simply approaching all friends at the same time. The standard deviation was calculated by using the earliest day that an eligible individual in the home was offered treatment by a CMD (see treatment outcomes section for day of drug receipt). Friendship status between CMDs was examined to understand the reliance of village treatment coverage on the connection between CMDs. A binary predictor was included and equal to one if the two CMDs were friends in the village network. The geodesic distance between CMDs in a village also was examined. Due to their initial selection, training, and working together for several years to administer treatments, all CMDs knew the other CMD within the same village irrespective of their close friendship status.
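A NetworkX sketch of the three CMD-level robustness predictors in this paragraph. Two assumptions should be flagged. First, the friends of both CMDs are pooled into one set before taking the standard deviation. Second, the sample standard deviation is used. The text does not specify either choice.

```python
import statistics

import networkx as nx


def cmd_pair_indicators(G, cmds, first_day):
    """Temporal clustering, CMD friendship, and CMD geodesic distance for one village.

    G: undirected household friendship graph; cmds: the two CMD households;
    first_day: household -> earliest day an eligible member was offered treatment."""
    c1, c2 = cmds
    friends = (set(G[c1]) | set(G[c2])) - {c1, c2}  # friends of either CMD, excluding the CMDs
    days = [first_day[h] for h in friends if first_day.get(h) is not None]
    temporal_sd = statistics.stdev(days) if len(days) > 1 else None  # assumed sample SD
    are_friends = int(G.has_edge(c1, c2))  # binary CMD friendship indicator
    geodesic = nx.shortest_path_length(G, c1, c2)  # raises if the CMDs are disconnected
    return temporal_sd, are_friends, geodesic
```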
Homophily consists of the shared characteristics, environments, or other contextual factors amongst a group of individuals 22 . The presence of homophily can confound or, at the very least, inflate effects of network structure on diffusion 23 . The best practice for untangling the effect of homophily from other network properties involves experimental techniques 24 . Latent homophily, which is the set of shared and unobservable characteristics (for example, unrevealed attitudes or preferences), can only be addressed with a randomized controlled trial. Yet, this setup would undermine a key purpose of this study, which was to understand a practical context of diffusion where seed nodes are regularly selected by their fellow village members in a routine round of MDA. Accordingly, we have used the next best approach and directly examined all observable types of homophily, including manifest and secondary homophily 24 . Manifest homophily is a shared likeness on the characteristic of interest, which would suggest that CMDs acquired their friends because they were keen participants of MDA. Two indicators of the years of friendship were used to predict household coverage and speed of treatment. To assess if the CMD had acquired their friends after participating in MDA, a binary indicator was constructed and equal to one if the average years of friendship were less than the average years the CMDs had been active in MDA. If the CMDs joined after the start of MDA in Uganda (10 years at the time of study) then the average years active as a CMD may not be informative. Another binary variable measured if the average years of friendship were less than 10 years. We also assessed if the CMD personally had an affinity for MDA with the assumption that the length of tenure as a CMD may represent this affinity; the average years as a CMD also were used as an indicator of manifest homophily. Secondary homophily is based on an observed characteristic that is not the outcome of interest. 
For example, CMDs may share traits with their friends that make CMDs more likely to better distribute drugs. To test this assumption, a wide range of household socioeconomic variables was used to predict household coverage and treatment speed. Household-level variables were used as CMD friendship connections were at this level. The construction of the following variables included the CMDs to measure the similarity amongst CMDs and their friends, although all analyses of these variables also were tested without the CMDs. The socioeconomic variables, if binary, were presented as the fraction of friends of the CMD that have the trait of interest. For count or continuous variables, the standard deviation of the trait amongst the friends was used as a predictor. The traits, as described in the socioeconomic variable section, included preference for private medical care, social status, Muslim household head, majority tribe, water purification, home latrine, private medical supply, education, years lived in the village, home quality score, and electricity in the home.
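The manifest and secondary homophily predictors described above reduce to simple summaries. The sketch below uses hypothetical inputs: per-friend years of friendship, the CMDs' years active, and trait values for the CMD households and their friends. The sample standard deviation is an assumption, and the binary-versus-continuous choice is passed explicitly rather than inferred from the data.

```python
import statistics


def manifest_homophily(friend_years, cmd_years_active, mda_years=10):
    """Indicators of whether CMD friendships postdate CMD tenure or the start of MDA."""
    mean_friend = statistics.fmean(friend_years)
    mean_tenure = statistics.fmean(cmd_years_active)
    return {
        "friends_newer_than_tenure": int(mean_friend < mean_tenure),
        "friends_newer_than_mda": int(mean_friend < mda_years),  # MDA ongoing for 10 years
        "mean_years_as_cmd": mean_tenure,
    }


def secondary_homophily(values, binary):
    """Fraction with the trait for binary traits; standard deviation otherwise."""
    values = list(values)
    return statistics.fmean(values) if binary else statistics.stdev(values)
```

Passing `binary` explicitly avoids misclassifying a count variable (for example, education) whose observed values happen to be only 0 and 1.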

Robustness to village size, accessibility, and ecology
This analysis measured the general physical accessibility 4 of a village for a CMD. Accessibility was an important feature to study since CMDs were instructed to walk from home to home to distribute treatments. Accordingly, the total number of households, the fraction of total households connected to CMDs, and the average distance in meters between households were examined as predictors of household coverage and of the speed at which 50% household coverage was reached. The univariate regressions were set up in the same manner as in the treatment coverage and speed sections: fractional response models with probit links and binomial families, and Poisson regressions, all with robust standard errors 17-19 , were used for the dependent variables of treatment coverage and speed, respectively. We also sought to understand whether the presence of swamps/rice paddies or fewer roads hindered CMDs from physically reaching a home, perhaps by blocking a footpath or simply requiring more time to reach the home. Additional ecological indicators, as described in the village-level variables section, measuring whether there was a beach in the village, the distance of the village center to Lake Victoria, and the presence of a boat landing site were examined as possible predictors of treatment coverage and speed. In case other drug distribution methods facilitated treatment diffusion, an additional indicator of accessibility that was unrelated to geography was examined. This indicator tested the effect of CMDs allowing individuals to retrieve treatment from the CMDs' homes, which was in addition to home-to-home distribution and not part of the national training instructions.

Robustness to overfitting and variable selection
CMD clustering was selected as a predictor of treatment coverage by several exploratory (not hypothesis-driven) methods. Such approaches complement our use of sparse (univariate) models, forward selection, and factor analyses to reduce dimensionality. We employed data mining approaches to investigate model overfitting and variable selection, using R 25 with the glmnet 26 and hdm 27 packages. These approaches penalize the regression coefficients towards zero (shrinkage) and are useful when the number of covariates exceeds the number of observations (high dimensionality). Three methods were considered: ridge 28 , lasso 29 , and elastic net regressions 30 . Such methods are not hypothesis driven and do not consider the statistical significance of the variables. Ridge regression shrinks coefficient estimates but retains all covariates, and works well in the presence of correlated variables. It imposes a penalty on the coefficients equal to the square of their magnitude (L2 regularization). Here, with ordinary least squares regression, ridge regression results in shrinkage of the coefficient of average CMD clustering. Importantly, the penalized clustering coefficient (up to an unreasonably large penalty of λ=10) retains a value greater than zero (~0.01) against the dependent variable of household coverage 26 . Hence, CMD clustering remained relevant for positively predicting household coverage. Lasso regression, on the other hand, selects few variables and imposes a penalty based on the absolute value of the coefficients (L1 regularization). However, this method performs poorly with correlated variables 30 . We observed this poor performance when implementing lasso with a data-driven penalty 31 and found that no variables were selected (including the intercept). Likewise, no variables, including the intercept, were selected with double-post-lasso.
When the lasso regression was rerun without the intercept, purely for variable selection as opposed to assessing an accurate model fit, CMD clustering (coef. 0.107) was amongst the five penalized variables selected (the other variables were CMD education (0.152), water purification behaviour (0.036), core number (0.026), and clique number (0.009)). Lastly, we examined elastic net, which is a more flexible approach to shrinkage that does not commit to a single type of penalty. A mixing parameter, α ∈ [0,1], is included so that L1 (α=1), L2 (α=0), or both regularizations can be employed. With α=0.40, CMD clustering is selected at λ=0.10.
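The role of the mixing parameter α and penalty λ can be illustrated with a pure-Python coordinate-descent sketch of the Gaussian elastic-net objective (1/2n)‖y − b₀ − Xb‖² + λ(α‖b‖₁ + (1 − α)/2 ‖b‖₂²) on standardized predictors. This objective is the one documented for glmnet, but this is not glmnet itself. Its internal scaling and convergence details may give slightly different numbers, and the data-driven lasso penalty from hdm is not reproduced. Constant columns are not handled.

```python
import statistics


def _soft_threshold(z, gamma):
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, lam, alpha, n_sweeps=1000, tol=1e-12):
    """Return (intercept, coefficients on the original scale); alpha=1 lasso, alpha=0 ridge."""
    n, p = len(y), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) for c in cols]  # 1/n variance, so mean(z_j ** 2) == 1
    Z = [[(v - m) / s for v in c] for c, m, s in zip(cols, means, sds)]
    y_mean = statistics.fmean(y)
    r = [yi - y_mean for yi in y]  # residuals with every coefficient at zero
    beta = [0.0] * p
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            zj = Z[j]
            rho = sum(zj[i] * r[i] for i in range(n)) / n + beta[j]  # partial-residual fit
            new = _soft_threshold(rho, lam * alpha) / (1.0 + lam * (1.0 - alpha))
            delta = new - beta[j]
            if delta:
                r = [r[i] - delta * zj[i] for i in range(n)]
                beta[j] = new
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    coefs = [b / s for b, s in zip(beta, sds)]
    intercept = y_mean - sum(c * m for c, m in zip(coefs, means))
    return intercept, coefs
```

With α=1, the soft-thresholding step can set coefficients exactly to zero (selection). With α=0, coefficients are only shrunk by a factor 1/(1 + λ), which is why ridge retains every covariate.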

Focus group analyses
In April 2016, structured focus group questionnaires were developed at the Ministry of Health in Kampala and conducted in all of our study communities. Although the main aim of our study was to adjudicate between two competing views, i.e. centrality versus clustering for seed node selection in social networks, the authors had an interest in further isolating the mechanism behind the main result. To avoid biasing routine drug distribution, only an ex post focus group analysis was possible in our study. The researchers returned to the study area to meet with the drug distributors and chairmen (village leaders) of all 17 study villages. Individuals were directly asked: "Is there any benefit of having a close-knit group of friends to deliver more treatments during mass drug administration?" All participants enthusiastically agreed. The main finding in all villages, corroborated by quotes from the focus groups, was that close-knit friends feed back information to the CMDs about problems in the community and about missed households. One chairman and CMD explained that "communication is much easier and they can talk to each other about the problems that they are facing.
[Friends] tend to meet each other when everyone is connected; when they are [a] close group they can be together." Other CMDs indicated that receiving the same information from close-knit friends assisted treatment distribution: "For them, they can reach many people because they are sharing the same information." The secondary finding of the focus groups in all villages was that close-knit friends increased the performance of CMDs by spreading information to the wider community and mobilizing households. For example, one CMD stated that close-knit friends "can help them do work quicker. [Close-knit friends] helps (sic) to give information back to the person distributing. [And, a] certain household will already be informed about drug distribution." Another CMD noted that "[I] can tell my friends to go around and mobilize people and they come quickly." Concerning mobilization of the wider community, all CMDs expressed concern that information about drug availability often needs to be provided repeatedly to the same recipients. These results suggest that social reinforcement occurs through the process of spreading repeated information not only to the wider community (complex contagion), but more importantly feeding back information to the CMD. Our qualitative evidence suggests that clustering facilitates this type of repeated information exchange.

Supplementary Figure 1 Village degree distributions
The linear probability density functions are plotted. The number of each plot corresponds to the village ID, and the plots are not ordered by any village characteristic. In-degree, out-degree, and degree are plotted in red, blue, and green, respectively. The y-axis represents the frequency of nodes, and the x-axis is the actual node degree, in-degree, or out-degree. The frequency counts are not binned; the raw counts for every occurring degree, in-degree, or out-degree in a village are displayed. All villages displayed right-skewed degree distributions, where only a small number of nodes have many connections.

32 The numbers next to the villages correspond to the village IDs used in the main text. From this figure, a wide variation in village 'shape' is observable, as some villages have households that are geographically clustered, whereas other villages have households that are spread out. No study villages were isolated from other villages in the study area, i.e. all villages had geographical neighbours, including Village ID 3. Villages that were isolated included villages in the land area between village IDs 11 and 4; these villages, which were not included in our study, were surrounded by government forests, were nearly a kilometre from neighbouring villages, and were difficult to access even by informal dirt roads. The villages that achieved the highest household coverage (IDs 6, 7, 15, 17, 13) and the fastest speed of household coverage (IDs 17, 9, 7, 1, 3) did not all belong to the same geographical cluster in Supplementary 33 .

Three global network statistics are included in Supplementary Table 1 that were not described in the materials and methods section. The mean geodesic distance is the average of all shortest paths between two nodes in the network. We provide both directed and undirected measures of average geodesic distance and also of degree.
In a directed network, there may be no paths between two nodes and these 'infinite' paths are not counted in the average shortest path. The directed measure of degree provided is in-degree, which is the count of incoming connections. Households were allowed to name up to ten outgoing connections and average degree may differ from average in-degree due to the count of these outgoing connections.
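The directed and undirected summaries described above can be illustrated with a minimal sketch. It assumes each village network is stored as a Python dict mapping a household ID to the set of households it named (a hypothetical data layout, not the study's). Shortest paths are found by breadth-first search, and unreachable ordered pairs in the directed network are skipped, mirroring the treatment of 'infinite' paths.

```python
from collections import deque


def bfs_lengths(adj, source):
    """Shortest-path lengths (number of ties) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def mean_geodesic(adj):
    """Average shortest-path length over ordered pairs (u, v), u != v, joined by
    a path; unreachable ('infinite') pairs are left out of the average."""
    total, pairs = 0, 0
    for u in adj:
        for v, d in bfs_lengths(adj, u).items():
            if v != u:
                total += d
                pairs += 1
    return total / pairs if pairs else float("nan")


def to_undirected(adj):
    """Symmetrise: u and v are tied if either household named the other."""
    und = {}
    for u, named in adj.items():
        und.setdefault(u, set())
        for v in named:
            und[u].add(v)
            und.setdefault(v, set()).add(u)
    return und


def mean_in_degree(adj):
    """Average count of incoming ties (total directed ties / number of nodes)."""
    nodes = set(adj) | {v for named in adj.values() for v in named}
    return sum(len(named) for named in adj.values()) / len(nodes)


def mean_degree(adj):
    """Average undirected degree after symmetrising the network."""
    und = to_undirected(adj)
    return sum(len(nbrs) for nbrs in und.values()) / len(und)


# Hypothetical three-household cycle: A names B, B names C, C names A.
village = {"A": {"B"}, "B": {"C"}, "C": {"A"}}
print(mean_geodesic(village), mean_geodesic(to_undirected(village)))  # 1.5 1.0
print(mean_in_degree(village), mean_degree(village))  # 1.0 2.0
```

In this cycle every directed pair is reachable, but the directed paths are longer than the undirected ones, and each household's undirected degree (2) exceeds its in-degree (1) because naming and being named are merged into one tie.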

The exponent of the power-law distribution, alpha, was calculated using the methods described in Alstott et al. 34 . We used a log-likelihood ratio test to assess whether a power-law degree distribution was a better fit than an exponential or a lognormal distribution. All nodes in the main component, i.e. nodes with a degree of at least one, were considered, and degree was treated as a continuous variable. A power-law distribution was a better fit than both the exponential and lognormal distributions in all villages, with all p-values<0.001. Global clustering was defined as the total number of triangles that occurred in the network divided by the maximum number of triangles theoretically possible. a Household coverage is defined as the proportion of households in the village where at least one eligible person in the home was offered at least one treatment through mass drug administration. b Individual coverage is defined as the proportion of eligible individuals in the village who were offered at least one treatment through mass drug administration. c Excludes the actual CMD. d CMDs were asked the number of years that they had been a CMD, which ranged from 1 (first year as a CMD) to 10 (all years MDA had been ongoing in Uganda). Two of the 34 CMDs (Village IDs 3 and 17) did not provide the number of years they had been active as a CMD.
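The sketch below illustrates, under stated assumptions, the continuous maximum-likelihood estimate of alpha on which the powerlaw package of Alstott et al. 34 builds, together with a normalized (Vuong-style) log-likelihood ratio test against an exponential alternative. The lower cut-off xmin is fixed at 1 here as our assumption, reflecting that all nodes with at least one tie were included; the package can also estimate xmin. The lognormal comparison is omitted for brevity, and because this is plain Python rather than the package itself, results may differ in detail.

```python
import math


def powerlaw_alpha(xs, xmin=1.0):
    """Continuous power-law MLE: alpha = 1 + n / sum(ln(x / xmin)) over x >= xmin."""
    tail = [x for x in xs if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)


def llr_powerlaw_vs_exponential(xs, xmin=1.0):
    """Log-likelihood ratio R (positive favours the power law) and a two-sided
    p-value from the normalized ratio R / (sigma * sqrt(n))."""
    tail = [x for x in xs if x >= xmin]
    n = len(tail)
    alpha = powerlaw_alpha(tail, xmin)
    lam = 1.0 / (sum(tail) / n - xmin)  # MLE of an exponential shifted to start at xmin
    diffs = []
    for x in tail:
        log_pl = math.log((alpha - 1.0) / xmin) - alpha * math.log(x / xmin)
        log_exp = math.log(lam) - lam * (x - xmin)
        diffs.append(log_pl - log_exp)
    r = sum(diffs)
    mean_d = r / n
    sigma = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / n)
    z = r / (sigma * math.sqrt(n))
    return r, math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical degree sequence, treated as continuous as in the text.
degrees = [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5, 7, 9, 14, 25]
print(powerlaw_alpha(degrees))
print(llr_powerlaw_vs_exponential(degrees))
```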

Supplementary Table 3 presents the socio-demographic statistics for the community medicine distributors (CMDs) and the friends of the CMDs. When compared with their friends and with the average of all households in the study area (see Supplementary Table 4), the CMDs had higher social status, better home quality, higher educational attainment, and more access to formal medical care; they had also lived in the village longer, all had a home latrine, and more often purified their drinking water. Hence, the CMDs had not only high centrality but also high socio-demographic status. Again, this finding was not surprising, as villagers elect the CMDs. CMDs also offered treatment to more individuals in their own homes, on average, compared with the rest of the village. Individual coverage may appear lower than household coverage for CMDs, but 26 of the 55 people not offered treatment in CMDs' homes came from a single village (Village ID 12), where neither CMD offered treatment to anyone in their own home.

(… Figure 2). Each network characteristic was calculated for the whole network in the same manner as described in the methods for node-level (CMD) calculations. The significance and direction found for the average of CMD network properties were preserved when these properties were examined at the village level. a Community medicine distributor. The average of the network properties for the two CMDs in each village was used in pairwise correlations.

Clustering was not associated (p-value>0.05) with any centrality indicators. However, clustering was associated (p-value<0.05) with other measures of local transitivity, including CMD ego-network density and reciprocity. a Households where at least one eligible person was offered treatment and the day of drug receipt was known. b Individuals who were offered at least one drug and the day of drug receipt was known.
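For readers reimplementing the local transitivity measures, a minimal sketch follows. The exact definitions are given in the main methods and are not restated here, so the ones below are assumptions: local clustering as the share of a node's neighbour pairs that are themselves tied (ties symmetrised), ego-network density as the share of possible ties present among the ego and its neighbours, and ego-network reciprocity as the share of directed ties within the ego network that are returned. The global function uses the standard transitivity ratio (closed triplets over connected triplets), which we take to correspond to the triangle-based definition of global clustering given earlier. The nomination data are hypothetical.

```python
from itertools import combinations


def symmetrise(adj):
    """Undirected neighbour sets: u and v are tied if either named the other."""
    und = {}
    for u, named in adj.items():
        und.setdefault(u, set())
        for v in named:
            und[u].add(v)
            und.setdefault(v, set()).add(u)
    return und


def tied_pairs(und, nodes):
    """Number of unordered pairs within `nodes` that share an undirected tie."""
    return sum(1 for a, b in combinations(nodes, 2) if b in und[a])


def local_clustering(und, i):
    """Tied neighbour pairs / possible neighbour pairs (0 for fewer than 2 neighbours)."""
    k = len(und[i])
    return tied_pairs(und, und[i]) / (k * (k - 1) / 2) if k >= 2 else 0.0


def ego_density(und, i):
    """Share of possible ties present among the ego and its neighbours."""
    ego = und[i] | {i}
    m = len(ego)
    return tied_pairs(und, ego) / (m * (m - 1) / 2) if m >= 2 else 0.0


def ego_reciprocity(adj, und, i):
    """Share of directed ties inside the ego network that are returned."""
    ego = und[i] | {i}
    ties = [(u, v) for u in ego for v in adj.get(u, ()) if v in ego]
    returned = sum(1 for u, v in ties if u in adj.get(v, ()))
    return returned / len(ties) if ties else 0.0


def global_clustering(und):
    """Transitivity: closed triplets / connected triplets (= 3 x triangles / triplets)."""
    closed = sum(tied_pairs(und, nbrs) for nbrs in und.values())
    triplets = sum(len(n) * (len(n) - 1) / 2 for n in und.values())
    return closed / triplets if triplets else 0.0


# Hypothetical directed nominations; "A" plays the role of a CMD household.
named = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A"}, "D": {"A"}}
und = symmetrise(named)
print(local_clustering(und, "A"), ego_density(und, "A"),
      ego_reciprocity(named, und, "A"), global_clustering(und))
```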

The frequency of households or individuals offered treatment on each day of the one-month distribution period was plotted for each village, and descriptive statistics of those frequency distributions are presented in Supplementary Table 7. Skewness measures the asymmetry of a frequency distribution, whereas kurtosis measures the thickness of its tails. More specifically for skewness, if the median is greater than the mean, skewness will typically be negative and the distribution left-skewed. Kurtosis describes whether the peak of the distribution is flat or pronounced. These measures were calculated as described in D'Agostino et al. 35 . For comparison, a Normal distribution has a skewness of zero and a kurtosis of three. The p-values provided are for a chi-squared test of skewness and kurtosis against the values expected under a Normal distribution, i.e. the null hypothesis is that the distribution is Normal. There was no trend in the skewness of the distributions for the villages that achieved the highest household or individual coverage. For example, the top five villages for household coverage (IDs 6, 7, 15, 17, 13) had distributions that were left-skewed, right-skewed, or not significantly different from a Normal distribution. Of the five fastest villages (IDs 17, 9, 7, 1, 3), i.e. those that achieved 50% household coverage in the fewest days, four had right-skewed distributions (IDs 17, 9, 7, 1) when compared with Normal curves. However, the third (ID 5) and fifth (ID 1) slowest villages also had significantly (p-value<0.001) right-skewed distributions when compared with Normal curves. As for kurtosis, there was no notable trend in the tails of the distributions for the best-performing villages. The villages with the fastest or highest coverage had kurtosis that was higher than, lower than, or not significantly different from that of Normal curves.
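The skewness and kurtosis tests can be sketched as follows, assuming the input is the list of days (1 to 31) on which each household in a village was offered treatment. The sketch uses D'Agostino's normal approximation for skewness, the Anscombe-Glynn approximation for kurtosis, and an unadjusted omnibus chi-squared statistic on two degrees of freedom. Some implementations (for example Stata's `sktest`) apply a further adjustment to the joint test, so p-values from this sketch may not match the published table exactly. Kurtosis uses the Pearson definition, under which a Normal distribution has kurtosis three; the offer days in the example are hypothetical.

```python
import math


def sample_moments(xs):
    """Return n, skewness g1 = m3 / m2**1.5 and Pearson kurtosis b2 = m4 / m2**2."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return n, m3 / m2 ** 1.5, m4 / m2 ** 2


def skewness_z(n, g1):
    """D'Agostino's normal approximation for sample skewness (needs n >= 8)."""
    y = g1 * math.sqrt((n + 1) * (n + 3) / (6.0 * (n - 2)))
    beta2 = (3.0 * (n * n + 27 * n - 70) * (n + 1) * (n + 3)
             / ((n - 2.0) * (n + 5) * (n + 7) * (n + 9)))
    w2 = -1.0 + math.sqrt(2.0 * (beta2 - 1.0))
    delta = 1.0 / math.sqrt(0.5 * math.log(w2))
    alpha = math.sqrt(2.0 / (w2 - 1.0))
    return delta * math.asinh(y / alpha)


def kurtosis_z(n, b2):
    """Anscombe-Glynn normal approximation for sample kurtosis (best for n >= 20)."""
    expected = 3.0 * (n - 1) / (n + 1)
    var_b2 = 24.0 * n * (n - 2) * (n - 3) / ((n + 1) ** 2 * (n + 3) * (n + 5))
    x = (b2 - expected) / math.sqrt(var_b2)
    sqrt_beta1 = (6.0 * (n * n - 5 * n + 2) / ((n + 7) * (n + 9))
                  * math.sqrt(6.0 * (n + 3) * (n + 5) / (n * (n - 2) * (n - 3))))
    a = 6.0 + 8.0 / sqrt_beta1 * (2.0 / sqrt_beta1 + math.sqrt(1.0 + 4.0 / sqrt_beta1 ** 2))
    denom = 1.0 + x * math.sqrt(2.0 / (a - 4.0))
    term2 = math.copysign(abs((1.0 - 2.0 / a) / denom) ** (1.0 / 3.0), denom)
    return ((1.0 - 2.0 / (9.0 * a)) - term2) / math.sqrt(2.0 / (9.0 * a))


def normality_tests(days):
    """Skewness, kurtosis and p-values against a Normal distribution."""
    n, g1, b2 = sample_moments(days)
    zs, zk = skewness_z(n, g1), kurtosis_z(n, b2)
    k2 = zs ** 2 + zk ** 2
    return {
        "skewness": g1,
        "kurtosis": b2,
        "p_skewness": math.erfc(abs(zs) / math.sqrt(2.0)),
        "p_kurtosis": math.erfc(abs(zk) / math.sqrt(2.0)),
        "p_joint": math.exp(-k2 / 2.0),  # chi-squared survival function with 2 df
    }


# Hypothetical offer days for 20 households in one village (right-skewed).
print(normality_tests([1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 8, 10, 14, 21, 30]))
```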
Overall, there was no apparent difference in skewness or kurtosis between the fastest and slowest villages, so differences in how the CMDs approached households were not easily observable in the distribution of days on which treatment was offered.

Obs. = 17. a Each model is a single-predictor regression. Fractional response models were used with probit links; constants are not shown. b Household coverage is defined as the proportion of households in the village where at least one eligible person in the home was offered at least one treatment through mass drug administration. c CMD is an abbreviation for community medicine distributor. d Predictors were represented as the percentage of CMDs' friends (including the CMD) with the characteristic of interest. The direction and significance of these variables were unchanged when CMDs were excluded in the variable generation. e Predictors were represented as the standard deviation of the characteristic of interest amongst the CMDs and their friends. The direction and significance of these variables were unchanged when CMDs were excluded in the variable generation.

Household coverage is defined as the proportion of households in the village where at least one eligible person in the home was offered at least one treatment through mass drug administration. c The number of days to reach 50% household coverage was recorded within the one-month distribution period for each village. d CMD is an abbreviation for community medicine distributor. e These binary variables were constructed as follows. The original variables for each CMD were equal to one if the CMD had the characteristic described. Because the average of two binary indicators is not informative, similarity between the two CMDs was assessed by coding the variables as one if the two CMDs had different values for the variable.
For example, for 'household purifies water', if one CMD's household indicated 'yes' and the other CMD's household indicated 'no' to purifying water, the binary indicator would be equal to one. Hence, the base category for all the binary indicators is that there are no differences in the values of these variables between the two CMDs in each village. f Ownership of a home latrine was not included as a predictor in the univariate regressions because there was no variation in this binary indicator: all CMDs had a home latrine. g The variable was averaged for the two CMDs in each village. h Only 15/17 villages achieved 50% household coverage during the one-month distribution observed for this study.
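Two of the variables described in these notes can be sketched directly: the number of days needed to reach 50% household coverage (note c) and the binary indicator of whether the two CMDs in a village differ on a characteristic (note e). The function names and inputs are hypothetical. We assume the coverage denominator is the number of eligible households in the village and that each covered household contributes its earliest offer day, capped at 31 as described for the coverage data.

```python
def days_to_coverage(offer_days, n_eligible_households, threshold=0.5, last_day=31):
    """First day on which cumulative household coverage reaches `threshold`,
    or None if it is not reached within the distribution period."""
    per_day = [0] * (last_day + 1)
    for day in offer_days:
        per_day[min(day, last_day)] += 1  # later responses are capped at the last day
    covered = 0
    for day in range(1, last_day + 1):
        covered += per_day[day]
        if covered / n_eligible_households >= threshold:
            return day
    return None


def cmds_differ(cmd1_has, cmd2_has):
    """1 if exactly one of the two CMDs has the characteristic, otherwise 0
    (the base category: both CMDs have the same value)."""
    return int(bool(cmd1_has) != bool(cmd2_has))


# Hypothetical village: 8 eligible households, 5 of them covered.
print(days_to_coverage([1, 1, 2, 3, 5], 8))  # 3 (4 of 8 households covered by day 3)
print(cmds_differ(True, False), cmds_differ(True, True))  # 1 0
```

A village that never reaches the threshold returns `None`, which corresponds to the two villages in note h that did not reach 50% household coverage within the month.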

Obs. = 17. a Each model is a single-predictor regression. Fractional response models were used with probit links; constants are not shown. b Household coverage is defined as the proportion of households in the village where at least one eligible person in the home was offered at least one treatment through mass drug administration. CMD stands for community medicine distributor. c Predictors were represented as the percentage of CMDs' friends (including the CMD) with the characteristic of interest. The direction and significance of these variables were unchanged when CMDs were excluded in the variable generation. d Predictors were represented as the standard deviation of the characteristic of interest amongst the CMDs and their friends. The direction and significance of these variables were unchanged when CMDs were excluded in the variable generation.
Supplementary Table 13 shows no association (p-value>0.05) between manifest or secondary homophily among the friends of CMDs and household coverage amongst those households. Supplementary Table 16 presents the balance of infrastructural development across the study villages, captured by the variables of home quality (also indicative of wealth), availability of electricity, education, and the longevity of the village (years of residence for villagers). For variables presented as averages, the number of observations equals the denominator shown in the frequency column. The study villages display similar properties across these characteristics.

As a robustness check against multiple hypothesis testing, we perform a confirmatory principal component analysis 36 on our average CMD network indicators and then rerun our regressions on diffusion reach and speed. This analysis reduces the data to two dimensions: local transitivity and centrality. Two indices were constructed, each summarizing a set of conceptually similar variables. Instead of using each indicator of transitivity (average CMD clustering, ego-network density, and ego-network reciprocity) as a separate predictor of diffusion, we construct a factor variable combining all of these measures. Another factor is constructed for the measures of centrality (average CMD degree, average neighbor degree, closeness, eigenvector, Katz, betweenness, and communicability). All local transitivity factors are uncorrelated with the centrality factors (p-value>0.05). Thus, we collapse our predictors into a small set of variables (two factors) that are uncorrelated with each other, rather than using a large set of correlated variables (the centrality measures are correlated with one another, although clustering and centrality are not; Supplementary Table 6). We then use these factors as predictors of diffusion.
Additionally, we construct a second version of the centrality factor without degree and average neighbor degree. Lastly, as a further check of our hypotheses, we also construct naïve indices of local transitivity and centrality. The naïve index for local transitivity sums average CMD clustering, ego-network density, and ego-network reciprocity and divides by the number of variables (here, three), i.e. it is their unweighted mean. Naïve indices were also constructed for centrality, with and without degree and average neighbor degree.
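A minimal sketch of the two kinds of index, assuming each indicator is a list with one value per village (the average of the two CMDs). The factor is taken here as the first principal component of the standardized indicators, found by power iteration on their correlation matrix; the exact extraction procedure and any rotation used in the analysis are not stated, so treat this as one reasonable reading rather than the authors' method. The naïve index is the unweighted mean of the raw indicators, as described above. Every indicator must vary across villages, otherwise its standardization divides by zero. The village values in the example are hypothetical.

```python
import math
import statistics


def zscores(col):
    """Standardize one indicator across villages (sample SD)."""
    m, s = statistics.mean(col), statistics.stdev(col)
    return [(v - m) / s for v in col]


def first_pc_scores(columns, iters=1000, tol=1e-12):
    """First-principal-component score per village from k indicator columns."""
    z = [zscores(c) for c in columns]
    k, n = len(z), len(z[0])
    corr = [[sum(z[i][t] * z[j][t] for t in range(n)) / (n - 1) for j in range(k)]
            for i in range(k)]
    vec = [1.0 + 0.1 * i for i in range(k)]  # arbitrary non-degenerate start
    for _ in range(iters):
        new = [sum(corr[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(v * v for v in new))
        new = [v / norm for v in new]
        converged = max(abs(a - b) for a, b in zip(new, vec)) < tol
        vec = new
        if converged:
            break
    if sum(vec) < 0:  # eigenvector sign is arbitrary; point it along the indicators
        vec = [-v for v in vec]
    return [sum(vec[i] * z[i][t] for i in range(k)) for t in range(n)]


def naive_index(columns):
    """Unweighted mean of the raw indicators for each village."""
    n = len(columns[0])
    return [sum(col[t] for col in columns) / len(columns) for t in range(n)]


# Hypothetical average-CMD clustering, ego-network density and reciprocity.
transitivity = [
    [0.20, 0.35, 0.50, 0.30, 0.45],
    [0.30, 0.40, 0.60, 0.35, 0.50],
    [0.50, 0.55, 0.70, 0.45, 0.65],
]
print(first_pc_scores(transitivity))
print(naive_index(transitivity))
```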

Most importantly, all results remained qualitatively the same. Both indices/factors of local transitivity retained the coefficient signs found in the main paper and remained significant (p-value<0.05) predictors of diffusion reach and speed. Both indices/factors of centrality remained insignificant (p-value>0.05) predictors of household coverage and speed of treatment. Community medicine distributor. c Adjusted for heteroskedasticity.

As a robustness check, we repeated the analysis of the reach of treatment coverage with household coverage adjusted for missing data. We removed the 228 households for which we did not have information on the day of drug receipt from both the numerator and the denominator when calculating household coverage. This table is presented to enable comparisons with the analysis of treatment speed, because households without information on the day of drug offer were excluded from that analysis. All results presented in the main text were unchanged. b These binary variables were constructed as follows. The original variables for each CMD were equal to one if the CMD had the characteristic described. Because the average of two binary indicators is not informative, similarity between the two CMDs was assessed by coding the variables as one if the two CMDs had different values for the variable. For example, for 'household purifies water', if one CMD's household indicated 'yes' and the other CMD's household indicated 'no' to purifying water, the binary indicator would be equal to one. Hence, the base category for all the binary indicators is that there are no differences in the values of these variables between the two CMDs in each village.
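The missing-data adjustment can be sketched as follows, using a hypothetical list of household records that store eligibility, whether any eligible member was offered a drug, and the recorded offer day. We assume that a missing day can only occur for covered households, since the day was recorded only for individuals who reported being offered a drug.

```python
def household_coverage(households, drop_missing_day=False):
    """Share of eligible households in which at least one eligible person was
    offered a drug. With drop_missing_day=True, covered households without a
    recorded offer day are removed from both numerator and denominator."""
    eligible = [h for h in households if h["eligible"]]
    if drop_missing_day:
        eligible = [h for h in eligible if not (h["offered"] and h["day"] is None)]
    return sum(1 for h in eligible if h["offered"]) / len(eligible)


# Hypothetical village of four households.
village = [
    {"eligible": True, "offered": True, "day": 3},
    {"eligible": True, "offered": True, "day": None},    # covered, day unknown
    {"eligible": True, "offered": False, "day": None},
    {"eligible": False, "offered": False, "day": None},  # no eligible members
]
print(household_coverage(village), household_coverage(village, drop_missing_day=True))
```

Here the unadjusted coverage is 2/3 and the adjusted coverage is 1/2, because the covered household with an unknown day leaves both the numerator and the denominator.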

No evidence was found to indicate that unobservable factors (omitted variable bias) were driving the association of average CMD clustering with household coverage. We implement the methods described in Oster 37 and Altonji et al. 38 using Stata v13.1 with the package psacalc 37 . These approaches examine the stability of the coefficient of the predictor of interest, here clustering, when additional covariates are added, together with the accompanying movements in R². Such analysis is necessary because the insignificance of observed covariates may not capture the effect of unobservables. However, assessments of coefficient stability to infer omitted variable bias ultimately rely on a strong assumption that observables and unobservables affect outcomes similarly. Another important assumption is the maximum R², Rmax ∈ [0, 1], that can be achieved in a controlled (with covariates) model. We run ordinary least squares regressions with household coverage as the dependent variable and covariates of centrality and personal characteristics of CMDs, factors we argue are irrelevant for treatment coverage. Home latrine was excluded, as there was no variation in this variable for CMDs; all CMDs had a home latrine. No added covariate was significant (here, p-value<0.10) in any of the regressions; this result accords with the univariate analysis. We set Rmax = 1.3 × the controlled R², where the controlled R² is the R² from the controlled regression, as recommended in Oster 37 ; all results were qualitatively the same when Rmax = 2.2 × the controlled R² was used. Each controlled regression included a single covariate in addition to average CMD clustering; we limit each model to two predictors because of the limited number of observations (17). In the uncontrolled model, with average CMD clustering as the only predictor, this variable had coef. = 0.637, std. err. = 0.270, R² = 0.058, and p-value = 0.032.
We provide the estimated range for the coefficient of clustering after adjustment for unobservable influences (the identified set). This identified set always excluded 0, indicating a positive influence of clustering, except in the case of CMD religion, where the controlled R² was exactly unchanged from that of the uncontrolled model (0.058).
As expected, adding another covariate, and in turn noise, to the model reduced statistical power, so p-values did not always remain below 0.05. We also provide the estimate of δ required for the coefficient of clustering to equal 0, i.e. to be 'eliminated.' Altonji et al. 38 suggest a cutoff of δ=1 for robustness; a value above this cutoff indicates that the unobservables would need to have a stronger effect than the observable factors to produce this bias. A negative δ indicates that adding covariates actually strengthens the coefficient of clustering rather than attenuating it towards 0. In four cases (CMD religion, tribe, clique number, and closeness), δ was positive and below 1. However, in these cases the controlled R² was either completely unchanged (religion) or virtually unchanged from the R² of the uncontrolled model, a setting in which the Oster 37 approach is not applicable and the covariate is likely capturing noise. All network variables are averages for CMDs. Socioeconomic variables not indicated as averages were binary. The binary variables were constructed as follows. The original variables for each CMD were equal to one if the CMD had the characteristic described. Because the average of two binary indicators is not informative, similarity between the two CMDs was assessed by coding the variables as one if the two CMDs had different values for the variable. For example, for 'household purifies water', if one CMD's household indicated 'yes' and the other CMD's household indicated 'no' to purifying water, the binary indicator would be equal to one. Hence, the base category for all the binary indicators is that there are no differences in the values of these variables between the two CMDs in each village.
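The coefficient-stability quantities described above (the identified set and δ) can be sketched with the closed-form approximation from Oster 37 : beta* ≈ beta_ctl − δ (beta_unc − beta_ctl)(Rmax − R2_ctl)/(R2_ctl − R2_unc), where beta_unc and R2_unc come from the uncontrolled regression and beta_ctl and R2_ctl from the controlled one. The identified set is the interval between beta_ctl and beta* at δ = 1, and setting beta* = 0 gives the δ that eliminates the coefficient. This is the approximation, not necessarily the computation psacalc performs, so its numbers may differ from the tables. The division by R2_ctl − R2_unc also shows why the approach breaks down when the controlled R² is unchanged. Only the uncontrolled values in the example come from the text; the controlled values are hypothetical.

```python
import math


def oster_beta_star(b_unc, r2_unc, b_ctl, r2_ctl, r_max, delta=1.0):
    """Approximate bias-adjusted coefficient (Oster's closed-form approximation)."""
    if r2_ctl == r2_unc:
        raise ValueError("controls leave R-squared unchanged; the adjustment is undefined")
    return b_ctl - delta * (b_unc - b_ctl) * (r_max - r2_ctl) / (r2_ctl - r2_unc)


def oster_identified_set(b_unc, r2_unc, b_ctl, r2_ctl, r_max):
    """Interval between the controlled estimate and beta* at delta = 1."""
    b_star = oster_beta_star(b_unc, r2_unc, b_ctl, r2_ctl, r_max)
    return min(b_ctl, b_star), max(b_ctl, b_star)


def oster_delta_for_zero(b_unc, r2_unc, b_ctl, r2_ctl, r_max):
    """delta at which the approximate beta* equals 0 (negative if the controls
    strengthen the coefficient; infinite if the coefficient does not move)."""
    if r2_ctl == r2_unc:
        raise ValueError("controls leave R-squared unchanged; the adjustment is undefined")
    if b_unc == b_ctl:
        return math.inf
    return b_ctl * (r2_ctl - r2_unc) / ((b_unc - b_ctl) * (r_max - r2_ctl))


b_unc, r2_unc = 0.637, 0.058   # uncontrolled model reported in the text
b_ctl, r2_ctl = 0.600, 0.090   # hypothetical controlled model
r_max = 1.3 * r2_ctl           # Rmax = 1.3 x controlled R-squared
print(oster_identified_set(b_unc, r2_unc, b_ctl, r2_ctl, r_max))
print(oster_delta_for_zero(b_unc, r2_unc, b_ctl, r2_ctl, r_max))
```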

The approach employed in Supplementary Table 20 was repeated here with two covariates, rather than one, selected by uniform random sampling (in R v3.3.3) to run alongside clustering, although these models are underpowered. There are 120 combinations of two covariates from 16 predictors, so we present a random sample of 10 of the 120 regressions. Other combinations did not change the conclusions of this study. The coefficient of clustering remained positive (greater than zero).
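The sampling itself was done in R; an equivalent Python sketch of enumerating the C(16, 2) = 120 covariate pairs and drawing 10 of them uniformly without replacement is shown below. The covariate names and the seed are hypothetical placeholders, so the pairs drawn will not match those reported.

```python
import itertools
import math
import random


def sample_covariate_pairs(covariates, k=10, seed=None):
    """All unordered pairs of covariates and a uniform random sample of k of them."""
    pairs = list(itertools.combinations(covariates, 2))
    return pairs, random.Random(seed).sample(pairs, k)


# Hypothetical placeholder names standing in for the 16 covariates.
covariates = [f"covariate_{i}" for i in range(1, 17)]
pairs, chosen = sample_covariate_pairs(covariates, k=10, seed=2017)
print(len(pairs), math.comb(16, 2))  # 120 120
for pair in chosen:
    print(pair)
```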