Incorporating Contact Network Structure in Cluster Randomized Trials

Whenever possible, the efficacy of a new treatment is investigated by randomly assigning some individuals to a treatment and others to control, and comparing the outcomes between the two groups. Often, when the treatment aims to slow an infectious disease, clusters of individuals are assigned to each treatment arm. The structure of interactions within and between clusters can reduce the power of the trial, i.e., the probability of correctly detecting a real treatment effect. We investigate the relationships among power, within-cluster structure, cross-contamination via between-cluster mixing, and infectivity by simulating an infectious process on a collection of clusters. We demonstrate that compared to simulation-based methods, current formula-based power calculations may be conservative for low levels of between-cluster mixing, but failing to account for moderate or high amounts of mixing can result in severely underpowered studies. Power also depends on within-cluster network structure for certain kinds of infectious spreading. Infections that spread opportunistically through highly connected individuals produce unpredictable outbreaks, making it harder to distinguish between random variation and real treatment effects. Our approach can be used before conducting a trial to assess power using network information, and we demonstrate how empirical data can inform the extent of between-cluster mixing.

In this supplement, we provide additional details for a few topics discussed in the main paper. Section S1 demonstrates a simple approach to modeling infectious spread with between-cluster mixing using ordinary differential equations, and compares this result to the simulation approach introduced in the paper. Section S2 connects our definition of the between-cluster mixing parameter γ with modularity, a common metric used in applications of network science. Section S3 describes the stochastic blockmodel and provides details for the specific model we used in our paper. Section S4 describes how the Intracluster Correlation Coefficient is defined and shows estimates of this quantity for our simulations. Finally, Section S5 shows the degree distribution for the empirical cell phone network, with discussion.

S1: Ordinary Differential Equation Approach to Epidemic Spreading with Between-Cluster Mixing
One of the most common approaches to investigating the spread of an epidemic on networks is to use ordinary differential equations (ODEs) 1 2 . An ODE relates a function of a single variable, here time, to its derivatives. Compartmental models of epidemic spread use ODEs to specify the rate at which individuals move between compartments (for example, from susceptible to infected) in terms of the current occupancy of those compartments. A common assumption used to specify ODEs for epidemic spread is mass action, in which the spread of an infection depends only on the proportion of individuals in each compartment. For example, an SI compartmental model assumes that individual i is either infected ($I_i(t) = 1$) or susceptible but not infected ($S_i(t) = 1$) at any time t. These two statuses are mutually exclusive, so $S_i(t) = 1 - I_i(t)$. An ODE that assumes mass action specifies the change in the proportion of infected individuals, $I(t) := \frac{1}{N}\sum_i I_i(t)$ for a population of N individuals, in terms of $I(t)$ itself. Under mass action, we may model the rate of infectious growth in an SI compartmental model as proportional to the proportion of infected individuals multiplied by the proportion of susceptible individuals:
$$\frac{dI(t)}{dt} = \beta\, I(t)\,\bigl(1 - I(t)\bigr), \tag{1}$$
where the constant of proportionality β > 0 is the transmission rate.
In this paper, we consider a collection of cluster pairs c = 1, ..., C, with one cluster in each pair assigned to the treatment condition r = 1 and the other to control r = 0. Furthermore, we assume that clusters are mixed according to mixing parameter γ. For the SI compartmental model, $I_{irc}(t) = 1$ if individual i in cluster r of pair c is infected and 0 otherwise. We may assume that the spread of an infection across the cluster pair follows a mass-action ODE as above, with a simple modification. Let $I_{rc}(t) := \frac{1}{N_{rc}}\sum_i I_{irc}(t)$ represent the proportion of infected nodes in cluster r of pair c at time t, where $N_{rc}$ is the number of individuals in that cluster. Individual i may contact an individual j in the opposing cluster with probability γ. In this case, a successful infection requires that i is susceptible and j is infectious.
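Mass action dictates that the rate of change in each cluster depends only on the proportions of individuals in each infection status in either cluster. With the same transmission rate β in both clusters, the force of infection acting on each cluster is now a sum of within-cluster and between-cluster terms weighted by the mixing parameter γ:
$$\frac{dI_{1c}(t)}{dt} = \beta\,\bigl[(1-\gamma)\, I_{1c}(t) + \gamma\, I_{0c}(t)\bigr]\bigl(1 - I_{1c}(t)\bigr), \tag{2}$$
$$\frac{dI_{0c}(t)}{dt} = \beta\,\bigl[(1-\gamma)\, I_{0c}(t) + \gamma\, I_{1c}(t)\bigr]\bigl(1 - I_{0c}(t)\bigr). \tag{3}$$
According to Supplementary Equations 2 and 3, if γ = 0, the rate of infection in each cluster is identical to Supplementary Equation 1. As γ approaches 1/2, both clusters experience the same force of infection, and the difference in the proportion of infected individuals between the two treatment arms decreases to no difference.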
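As a numerical check on the two-cluster system, the sketch below integrates it with classical fourth-order Runge-Kutta steps, assuming the equations take the form $dI_{1c}/dt = \beta[(1-\gamma)I_{1c} + \gamma I_{0c}](1 - I_{1c})$ for the treatment cluster and the symmetric form for the control cluster, with a common transmission rate β. The values of β, the starting prevalences, and the time horizon are hypothetical. It also includes the closed-form logistic solution of the single-cluster equation $dI/dt = \beta I(1 - I)$, which the pair should reproduce when γ = 0.

```python
import math


def si_pair_rhs(i1, i0, beta, gamma):
    """Right-hand side of the two-cluster mass-action SI system.

    i1 and i0 are the infected proportions in the treatment (r = 1) and
    control (r = 0) clusters of one pair. Each cluster's force of infection
    weights its own prevalence by 1 - gamma and the other cluster's by gamma.
    """
    force1 = (1 - gamma) * i1 + gamma * i0
    force0 = (1 - gamma) * i0 + gamma * i1
    return beta * force1 * (1 - i1), beta * force0 * (1 - i0)


def integrate_pair(i1, i0, beta, gamma, t_end, dt=0.01):
    """Integrate the system from t = 0 to t_end with fourth-order Runge-Kutta steps."""
    for _ in range(round(t_end / dt)):
        a1, a0 = si_pair_rhs(i1, i0, beta, gamma)
        b1, b0 = si_pair_rhs(i1 + dt / 2 * a1, i0 + dt / 2 * a0, beta, gamma)
        c1, c0 = si_pair_rhs(i1 + dt / 2 * b1, i0 + dt / 2 * b0, beta, gamma)
        d1, d0 = si_pair_rhs(i1 + dt * c1, i0 + dt * c0, beta, gamma)
        i1 += dt / 6 * (a1 + 2 * b1 + 2 * c1 + d1)
        i0 += dt / 6 * (a0 + 2 * b0 + 2 * c0 + d0)
    return i1, i0


def logistic(i_init, beta, t):
    """Closed-form solution of the single-cluster equation dI/dt = beta*I*(1-I)."""
    growth = i_init * math.exp(beta * t)
    return growth / (1 - i_init + growth)


if __name__ == "__main__":
    # Hypothetical values: beta = 1 and starting prevalences of 1% and 5%.
    for gamma in (0.0, 0.1, 0.25, 0.5):
        i1, i0 = integrate_pair(0.01, 0.05, beta=1.0, gamma=gamma, t_end=3.0)
        print(f"gamma={gamma:.2f}  I1={i1:.3f}  I0={i0:.3f}  gap={abs(i1 - i0):.3f}")
```

With these inputs, the gap between the arms at t = 3 is far smaller at γ = 1/2 than at γ = 0, because at γ = 1/2 both clusters feel exactly the same force of infection.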
The ODE approach closely approximates the stochastic approach we chose for the paper. To show this, we created network clusters in which every node is connected to every other node in its cluster, performed degree-corrected rewiring, simulated an infectious process with unit infectivity on the pair as described in the paper, and averaged the proportion of infections at each time step. Supplementary Figure 1 shows the results. Whereas the differential equation approach assumes that individuals contact everyone in the population, infections spreading through fixed networks can only pass along existing edges. This redundant contact effect 3 causes infections on networks to spread slightly more slowly, as is also visible in Supplementary Figure 1.
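The simulation side of this comparison can be sketched as follows. This supplement does not spell out the paper's degree-corrected rewiring or how unit infectivity translates into a per-step probability, so the sketch substitutes our own assumptions: between-cluster edges are created by degree-preserving edge swaps until roughly a fraction γ of edges cross clusters, and each susceptible node with k infected neighbours is infected with probability $1 - (1-p)^k$ per step. The function names, cluster size n, transmission probability p, number of steps, and number of runs are all hypothetical.

```python
import random


def complete_pair(n):
    """Edges of two complete clusters: nodes 0..n-1 (cluster 0) and n..2n-1 (cluster 1)."""
    edges = set()
    for offset in (0, n):
        for a in range(offset, offset + n):
            for b in range(a + 1, offset + n):
                edges.add((a, b))
    return edges


def rewire_between(edges, n, gamma, rng):
    """Degree-preserving swaps (hypothetical scheme): within-cluster edges (a, b)
    in cluster 0 and (c, d) in cluster 1 become between-cluster edges (a, d)
    and (b, c). Each swap adds two between-cluster edges, so about
    gamma * m / 2 swaps yield a between-cluster edge fraction near gamma."""
    edges = set(edges)
    target = round(gamma * len(edges) / 2)
    pool0 = [e for e in edges if e[1] < n]
    pool1 = [e for e in edges if e[0] >= n]
    rng.shuffle(pool0)
    rng.shuffle(pool1)
    done = 0
    while done < target and pool0 and pool1:
        (a, b), (c, d) = pool0.pop(), pool1.pop()
        new_edges = {(a, d), (b, c)}  # a, b < n <= c, d, so tuples stay ordered
        if new_edges & edges:
            continue  # skip swaps that would duplicate an existing edge
        edges -= {(a, b), (c, d)}
        edges |= new_edges
        done += 1
    return edges


def simulate_si(edges, n, p, steps, rng, seeds=1):
    """Discrete-time stochastic SI on the pair. A susceptible node with k infected
    neighbours becomes infected with probability 1 - (1 - p)**k in each step.
    Returns the infected proportion of (cluster 0, cluster 1) after each step."""
    neighbours = {v: [] for v in range(2 * n)}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    infected = set(rng.sample(range(n), seeds)) | set(rng.sample(range(n, 2 * n), seeds))
    history = []
    for _ in range(steps):
        newly = set()
        for v in range(2 * n):
            if v in infected:
                continue
            k = sum(u in infected for u in neighbours[v])
            if k and rng.random() < 1 - (1 - p) ** k:
                newly.add(v)
        infected |= newly  # synchronous update: all nodes see the same infected set
        history.append((sum(v < n for v in infected) / n,
                        sum(v >= n for v in infected) / n))
    return history


def average_curves(n=20, gamma=0.1, p=0.05, steps=30, runs=50, seed=1):
    """Average the infected proportions over independent rewirings and epidemics."""
    rng = random.Random(seed)
    base = complete_pair(n)
    mean = [[0.0, 0.0] for _ in range(steps)]
    for _ in range(runs):
        edges = rewire_between(base, n, gamma, rng)
        for t, (prop0, prop1) in enumerate(simulate_si(edges, n, p, steps, rng)):
            mean[t][0] += prop0 / runs
            mean[t][1] += prop1 / runs
    return mean


if __name__ == "__main__":
    for t, (prop0, prop1) in enumerate(average_curves(), start=1):
        print(t, round(prop0, 3), round(prop1, 3))
```

The averaged curves play the role of the stochastic results; plotting them against a numerical ODE solution gives a comparison of the kind described above.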

S2: Modularity and Between-Cluster Mixing Parameter γ
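Our definition of the between-cluster mixing parameter γ (Equation 2 of the main paper) has a convenient interpretation in terms of findings in network science. Modularity Q measures how well the individuals in a network and their relationships fit into mutually exclusive groups 4 . For CRTs, we take the natural groups to be the two treatment arms. Modularity is higher when more edges fall within groups than expected by chance: for two arms with equal total degree, Q = 1/2 when all edges lie within treatment arms and Q = −1/2 when all edges run between them. The definition of modularity is written in the same terms as γ:
$$Q = \frac{1}{2m}\sum_{ij}\left[A_{ij} - \frac{k_i k_j}{2m}\right]\delta(r_i, r_j),$$
where A is the cluster structure (adjacency) matrix, $k_i$ is the degree of individual i, m is the total number of edges, $r_i$ is the treatment arm of individual i, and δ is the Kronecker delta. The first term, $\frac{1}{2m}\sum_{ij} A_{ij}\,\delta(r_i, r_j)$, is the fraction of edges that lie within treatment arms. If the individuals in the two treatment arms have equal total numbers of edges, $\sum_{ij} \frac{k_i k_j}{(2m)^2}\delta(r_i, r_j) = 1/2$, and γ = 1/2 − Q, which is one minus the first term, i.e., the fraction of edges that run between the arms. Therefore, if modularity can be computed, so can the mixing between the two treatment arms. More generally, γ is entirely a function of the cluster structure matrix A and the treatment assignments, so an experimenter who knows the structure of relationships among individuals in the study can calculate the amount of mixing between the two treatment arms.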
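A small sketch of the relationship between γ and Q, computed directly from an adjacency matrix and the arm assignments. It assumes γ is the fraction of edges that join the two treatment arms (the reading implied by γ = 1/2 − Q when the arms have equal total degree; the main paper's Equation 2 is not reproduced here). The function name and the ring example are hypothetical.

```python
def mixing_and_modularity(adj, arm):
    """Return (gamma, Q) for a two-arm partition of a network.

    adj: symmetric 0/1 adjacency matrix as a list of lists.
    arm: arm[i] in {0, 1} is the treatment assignment of individual i.
    gamma is taken here to be the fraction of edges that join the two arms
    (an assumption consistent with gamma = 1/2 - Q); Q is Newman's modularity.
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_m = sum(degree)
    # Sum of A_ij over ordered pairs in the same arm = twice the within-arm edge count.
    within = sum(adj[i][j] for i in range(n) for j in range(n) if arm[i] == arm[j])
    expected = sum(
        degree[i] * degree[j] for i in range(n) for j in range(n) if arm[i] == arm[j]
    ) / two_m**2
    q = within / two_m - expected
    gamma = 1 - within / two_m
    return gamma, q


# Example: an 8-node ring split into two arms of 4 consecutive nodes.
n = 8
ring = [[0] * n for _ in range(n)]
for i in range(n):
    ring[i][(i + 1) % n] = ring[(i + 1) % n][i] = 1
arm = [1] * 4 + [0] * 4
gamma, q = mixing_and_modularity(ring, arm)
print(gamma, q)  # 0.25 0.25: 2 of 8 edges cross arms, and gamma == 1/2 - Q
```

Because every node in the ring has degree 2, both arms have the same total degree, so the expected-edge term is exactly 1/2 and the identity γ = 1/2 − Q holds.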

S3: Details on the Stochastic Blockmodel
A stochastic blockmodel (SBM) is a probabilistic network model in which each edge between nodes i and j exists independently with probability $p_{ij}$. An SBM assumes that each network node is a member of exactly one block in a partition of b blocks $\mathcal{B} = \{B_1, \dots, B_b\}$, and that the probability $p_{ij}$ of a connection between nodes i and j depends only on the block memberships of the two nodes. Denote the block membership of node i as $B_i$. A b × b probability matrix P describes all edge probabilities for a network, with $p_{ij} = P_{B_i B_j}$.
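The definition above translates directly into a sampler: assign each node to a block, then include each pair independently with the probability given by P. This is a minimal pure-Python sketch with hypothetical block sizes and probabilities, not the generator used in the paper; it loops over all pairs, so it is only suitable for small networks (NetworkX provides `stochastic_block_model` for larger graphs).

```python
import random


def sample_sbm(block_sizes, P, rng=None):
    """Sample an undirected stochastic blockmodel.

    block_sizes[b] is the number of nodes in block b and P[a][b] is the
    probability of an edge between a node in block a and a node in block b
    (P must be symmetric). Each pair (i, j) is included independently with
    probability P[B_i][B_j]. Returns (membership, edges).
    """
    rng = rng or random.Random()
    membership = [b for b, size in enumerate(block_sizes) for _ in range(size)]
    n = len(membership)
    edges = [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if rng.random() < P[membership[i]][membership[j]]
    ]
    return membership, edges


# Example with hypothetical values: 3 blocks of 50 nodes, dense inside blocks.
P = [[0.20, 0.01, 0.01],
     [0.01, 0.20, 0.01],
     [0.01, 0.01, 0.20]]
membership, edges = sample_sbm([50, 50, 50], P, random.Random(42))
within = sum(membership[i] == membership[j] for i, j in edges)
print(len(edges), within)
```

With these probabilities, roughly 735 within-block and 75 between-block edges are expected, so most edges stay inside blocks, as in the community structure described next.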
In our study, we imitated within-cluster community structure using an SBM. We assume each cluster is composed of blocks arranged in a triangular lattice. Blocks may be thought of as groups of nodes that are near each other geographically: while most edges are contained within each block, blocks share a few edges according to a triangular spatial pattern. We divided each cluster into 10 equally sized blocks, and individuals within each block are connected to others within their block such that the average within-block degree is $\frac{9}{10}k$. For between-block connections, we also assume that each edge between members of blocks share a total between-block degree of

S4: The ICC
The Intracluster Correlation Coefficient (ICC) is a measure of the average correlation between individual outcomes within a cluster. The ICC assumes that the correlation is identical for all pairs of individuals within a cluster and is constant across clusters. The ICC can also be expressed as the ratio of the between-cluster variance to the total outcome variance in the study 5 . In the case of binary outcomes, this value may be expressed as 6
$$\mathrm{ICC} = \frac{\langle \pi_c^2 \rangle - \langle \pi_c \rangle^2}{\langle \pi_c \rangle\,\bigl(1 - \langle \pi_c \rangle\bigr)},$$
where $\pi_c$ is the proportion of infections in cluster c and ⟨·⟩ is the average over all clusters in a trial. We estimated the ICC for our simulations. These values are quite low, but not far from typical values 7 , and lower values have been reported in actual trials 6 . The ICC values are low because, in our design, data are collected for each cluster pair when the average proportion of infections within the pair is 10%, which results in relatively little variation in infection proportions across clusters.
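The binary-outcome ICC, $(\langle \pi_c^2 \rangle - \langle \pi_c \rangle^2) / (\langle \pi_c \rangle (1 - \langle \pi_c \rangle))$, is straightforward to compute from cluster-level infection proportions. The sketch below is a plug-in estimate on hypothetical data; it does not subtract binomial sampling noise within clusters, so for small clusters it tends to overstate the ICC, and the estimator used in reference 6 may differ.

```python
from statistics import fmean, pvariance


def icc_binary(cluster_proportions):
    """Plug-in ICC for a binary outcome from cluster-level proportions pi_c:
    (<pi_c^2> - <pi_c>^2) / (<pi_c> * (1 - <pi_c>))."""
    pibar = fmean(cluster_proportions)
    # pvariance with mu=pibar is the population variance <pi_c^2> - <pi_c>^2.
    return pvariance(cluster_proportions, mu=pibar) / (pibar * (1 - pibar))


# Hypothetical infection proportions for four clusters averaging 10%.
print(icc_binary([0.08, 0.12, 0.10, 0.10]))  # about 0.0022
```

Proportions that cluster tightly around 10%, as in the design described above, give a between-cluster variance that is tiny relative to $0.1 \times 0.9$, hence a small ICC.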
Like power, the ICC depends on within-cluster structure, the amount of between-cluster mixing, and infectivity. In the case of unit infectivity, the ICC shrinks as between-cluster mixing increases for all within-cluster structures. However, in many power calculation formulas 8 , lower values of the ICC indicate increased power, whereas between-cluster mixing in fact reduces power. This shows that even if sample size calculations account for within-cluster correlation as measured by the ICC, power can be reduced by other trial features, such as the extent of between-cluster mixing.
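To illustrate why a lower ICC implies higher power in formula-based calculations, the sketch below uses a common design-effect approximation, $1 + (m - 1)\,\mathrm{ICC}$ for clusters of size m, combined with a normal-approximation test for two proportions. It is not necessarily the formula in reference 8, and all trial parameters are hypothetical. Note that between-cluster mixing appears nowhere in it.

```python
from statistics import NormalDist


def crt_power(p_control, p_treat, clusters_per_arm, cluster_size, icc, alpha=0.05):
    """Approximate power of a two-arm cluster randomized trial comparing two
    proportions. The individually randomized sample size is deflated by the
    design effect 1 + (m - 1) * ICC, and a two-sided normal approximation is used
    (ignoring the negligible probability of rejecting in the wrong direction)."""
    std_normal = NormalDist()
    design_effect = 1 + (cluster_size - 1) * icc
    n_eff = clusters_per_arm * cluster_size / design_effect  # effective size per arm
    se = ((p_control * (1 - p_control) + p_treat * (1 - p_treat)) / n_eff) ** 0.5
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    return std_normal.cdf(abs(p_control - p_treat) / se - z_crit)


# Hypothetical trial: 10% vs 7% infected, 20 clusters per arm, 100 people per cluster.
for icc in (0.01, 0.005, 0.001):
    print(icc, round(crt_power(0.10, 0.07, 20, 100, icc), 3))
```

The predicted power rises as the ICC falls, which is why a mixing-induced drop in the ICC can lead such a formula to overstate the power of a contaminated trial.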

S5: Degree Distribution for an Empirical Cell Phone Network
The main paper specifies two definitions of an edge between callers in the cell phone network: edges are either unweighted or weighted by the total number of calls made between each pair of callers. The empirical degree distributions for both definitions are shown in Supplementary Figure 4. Focusing on Panel a, we notice three distinct regimes. The vast majority of callers make calls with between 1 and 100 others. The distribution for those who call a large number (100 to 1000) of others follows a nearly straight line on these log-log plots, which is indicative of a power law for this segment. Finally, a few callers call a very large number (more than 1000) of others within the quarter. The general shape is similar for both the unweighted and weighted definitions. This degree distribution is consistent with similar datasets analyzed in the literature 9 .
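The two edge definitions and the straight-line check on a log-log plot can be sketched as follows, assuming hypothetical call records given as (caller, callee) pairs, one per call. The slope function fits a least-squares line to the log of the complementary cumulative distribution over a chosen degree range; it is only a rough diagnostic, and maximum-likelihood methods are preferable for formally fitting a power law. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def caller_degrees(calls):
    """calls: iterable of (caller, callee) records, one per call.
    Returns two dicts: unweighted degree (number of distinct contacts) and
    weighted degree (total number of calls with those contacts)."""
    pair_calls = Counter(frozenset(call) for call in calls if call[0] != call[1])
    unweighted, weighted = defaultdict(int), defaultdict(int)
    for pair, n_calls in pair_calls.items():
        for person in pair:
            unweighted[person] += 1
            weighted[person] += n_calls
    return dict(unweighted), dict(weighted)


def loglog_ccdf_slope(degrees, lo, hi):
    """Least-squares slope of log P(K >= k) against log k for lo <= k <= hi
    (degrees must be positive). A rough check for a straight, power-law-like
    segment, not a formal fit."""
    values = list(degrees)
    points = []
    for k in sorted(set(values)):
        if lo <= k <= hi:
            ccdf = sum(v >= k for v in values) / len(values)
            points.append((math.log(k), math.log(ccdf)))
    if len(points) < 2:
        raise ValueError("need at least two distinct degrees in [lo, hi]")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    return cov / var


# Hypothetical call records: a and b call each other twice, a calls c once.
unweighted, weighted = caller_degrees([("a", "b"), ("b", "a"), ("a", "c")])
print(unweighted["a"], weighted["a"])  # 2 3
```

Applied to the 100 to 1000 range of the unweighted degrees, a roughly constant negative slope would correspond to the straight segment seen in Panel a.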