Understanding how socioeconomic inequalities drive inequalities in COVID-19 infections

Across the world, the COVID-19 pandemic has disproportionately affected economically disadvantaged groups. This differential impact has numerous possible explanations, each with significantly different policy implications. We examine, for the first time in a low- or middle-income country, which mechanisms best explain the disproportionate impact of the virus on the poor. Combining an epidemiological model with rich data from Bogotá, Colombia, we show that total infections and inequalities in infections are largely driven by inequalities in the ability to work remotely and in within-home secondary attack rates. Inequalities in isolation behavior are less important but non-negligible, while access to testing and contact tracing plays practically no role because it is too slow to contain the virus. Interventions that mitigate transmission are often more effective when targeted at socioeconomically disadvantaged groups.


S1.1 CoVIDA Data
Our primary data comes from the CoVIDA project led by the University of Los Andes. This community-based sentinel surveillance initiative was integrated with the district's public health surveillance and organized by occupation group. The CoVIDA project was designed to help contain the spread of COVID-19 through active surveillance among mostly asymptomatic individuals and to provide a range of information that differs from the self-selected symptomatic individuals tested in health facilities. The sample includes 59,770 RT-PCR tests of COVID-19 on 55,078 different individuals in Bogotá from the beginning of June 2020 to March 3rd, 2021. At the time of registration, individuals were surveyed to capture various characteristics, including occupation, socioeconomic stratum, and address.
Two main strategies were employed to recruit participants, and approximately one half of the total sample comes from each strategy. First, through 74 agreements with institutions and companies, we obtained long-lists that we used to contact and invite participants. Most lists were specific to a given occupation, based on the employees of a large company or on a list of individuals who were signed up to a specific mobile app. We also used some lists of residents based on beneficiaries of social programs. We randomly selected participants from the lists and contacted them to invite them to be tested for free. The total population of all lists covers 20% of the population in Bogotá. This means that it is relatively close to population-based sampling, but with an over-representation of some occupations that were prioritized in the CoVIDA project, in particular because they were expected to be more exposed (which is why we re-weight by occupation to maintain representativeness of actual occupations in Bogotá).
The second source of participants' identification comes from public announcements made by the CoVIDA team through various communication channels to invite people to be tested, stating explicitly that the invitation is open to those that are asymptomatic.
For estimations of cumulative incidence with the CoVIDA data, we convert the positivity rate of CoVIDA tests into a number of daily new cases. To do this, we take into account the estimated sensitivity of the RT-PCR tests, which implies that individuals can test positive for a period of 17 days on average [1]. Hence, in order to obtain the number of cases per day and per inhabitant, one needs to divide the positivity rate by 17. Intuitively, if any person who gets infected can test positive for 17 days on average, then the positivity rate should be 17 times higher than the number of daily new infections. See our companion paper [2] for more details on this calculation.
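The conversion described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function name and the example positivity rate are hypothetical and do not come from the CoVIDA data.

```python
# Average number of days an infected person can test positive by RT-PCR (from the text).
DAYS_DETECTABLE = 17


def daily_new_cases_per_capita(positivity_rate, days_detectable=DAYS_DETECTABLE):
    """Convert a test positivity rate into daily new infections per inhabitant."""
    if not 0.0 <= positivity_rate <= 1.0:
        raise ValueError("positivity_rate must lie between 0 and 1")
    return positivity_rate / days_detectable


# Hypothetical example: 8.5% of tests in a given period are positive.
example_rate = daily_new_cases_per_capita(0.085)
print(f"{example_rate:.4f} new cases per inhabitant per day")
```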
Our second database comes from administrative records, collected by the Health Secretary of Bogotá (HSB, in Spanish the Secretaría de Salud de Bogotá), that cover the universe of cases of Bogotá residents who have tested positive for COVID-19 in any laboratory using an RT-PCR test, from the beginning of the pandemic (January 23rd, 2020) until February 14th, 2021. All laboratories in Bogotá must report any positive test to the HSB, which in turn reports it to the National Health Institute that provides national statistics used by the World Health Organization. This administrative data also comes with basic socioeconomic characteristics from a form that is a mandatory part of the institution's report when recording a positive case to the HSB.
Both databases include information on an individual's socioeconomic stratum, a classification that is based on the neighbourhood of residence and is commonly used in Colombia as a proxy for the household's economic living conditions. Neighborhoods are categorized into one of 6 levels, where 1 is the poorest and 6 is the wealthiest. To gain power, we pool together strata 1 and 2 and strata 5 and 6, leaving us with the four following groups: SES 1&2, 3, 4 and 5&6.

S1.2 Model details
In the model, each infected individual comes into contact with other individuals and potentially infects them. An example of this process is seen in Figure 1. A potential secondary infection is defined as a contact that would become infected unless prevented by immunity or isolation. The stochastic number of potential secondary infections each infected person generates depends on (i) whether they are symptomatic or asymptomatic, (ii) the number of contacts they have during their infectious period, and (iii) the secondary attack rate. We distinguish between contacts within the household and outside the household ( Figure 1 panel (b)) and each type of contact has a different secondary attack rate. All members within the household are assumed to be contacts and to be from the same SES. Contacts from outside the household are sampled randomly from the entire population with sampling weights that reflect our estimated contact matrix between SES (see Table 1b, Panel b). The model therefore permits assortative mixing; contacts are typically more likely to occur within a SES, but are not restricted to the same SES.
Potential infections may not translate into actual infections for two reasons. First, because of immunity, a potential secondary case that has already been infected in the past will not develop into a new infection (e.g. person C). Second, any out-of-home contact is prevented if the individual is isolating at the time of the potential infection (while within-home transmissions are unaffected by isolation). An individual may isolate for four reasons: (i) after experiencing symptoms, (ii) after receiving a positive test result, (iii) after being contact-traced, or (iv) if the entire household quarantines because one member received a positive test result.
An individual may be tested either because they decide to take a test themselves after experiencing symptoms (person A and person E in Figure 1), or because they are contact traced. An individual who is "detected" (tests positive) is subjected to a process of contact-tracing with a probability lower than 1, leading to the possible testing and detection of each of their secondary cases. For example, in Figure 1, A is detected, leading to B and D (but not E) being traced and tested.
The model is based on a branching process structure. We initiate the model with 1/5000 of the population infected, and iterate over one-period cycles (that are equivalent to one day). In each period, infected individuals can transmit the virus to others, and other events such as isolation, testing, and tracing can also occur. These events, and how they are simulated, are described in detail below. In all results, the model is simulated with a total population of 100,000, and the proportion of initial cases in each SES is set to be equal to the SES's population as a proportion of total population. The parameters used in the model are summarised in Table S4, which describes the parameters common across all SES, and in Table S5, which describes the parameters that were potentially allowed to differ by SES.

S1.2.1 Secondary Infections
Infected individuals come into contact with other simulated individuals. Contacts are divided into two types, "household" and "external" (outside the household). The number of contacts within the household for individual i, $\delta^{hh}_i$, is assumed to be equal to (household size of i − 1). The number of contacts outside the household for an individual i, $\delta^{ext}_i$, is drawn from a negative binomial distribution with a dispersion parameter k = 0.58 (taken from [3]) and a mean of $\mu_j$, where j is i's SES (see Figure S6). We then define the number of "potential" infections generated by i. This is the number of infections that would be generated by i, absent considerations of any isolation behavior or immunity. Some potential infections will not convert into actual transmission because the infector is isolating or because the potential infectee is immune. The number of potential infections that i generates in each category is given by a binomial distribution with size equal to the number of contacts and probability of infection from the group-specific secondary attack rate (SAR):
$$\text{potential infections}^{hh}_i \sim \text{Binomial}(\delta^{hh}_i, SAR^{hh}_j), \qquad \text{potential infections}^{ext}_i \sim \text{Binomial}(\delta^{ext}_i, SAR^{ext}_j)$$
People in the same household are assumed to be from the same SES. External contacts are selected randomly from the wider population with sampling weights that depend on the estimated contact matrix seen in Table S5, panel (b). This allows for phenomena such as assortative mixing (i.e. individuals are more likely to encounter someone from their own group).
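As an illustration of the draws described above, the following Python sketch samples external contacts from a negative binomial distribution (through its gamma-Poisson mixture representation) and then draws potential infections binomially. Only the dispersion parameter k = 0.58 comes from the text; the function names, the mean number of contacts, the household size and the secondary attack rates are hypothetical placeholders, not the values in Table S5.

```python
import math
import random


def draw_poisson(rng, mean):
    """Poisson draw by Knuth's multiplication method (adequate for moderate means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def draw_external_contacts(rng, mean_contacts, dispersion=0.58):
    """Negative binomial draw with the given mean and dispersion parameter k.

    Uses the gamma-Poisson mixture: rate ~ Gamma(shape=k, scale=mean/k).
    """
    rate = rng.gammavariate(dispersion, mean_contacts / dispersion)
    return draw_poisson(rng, rate)


def draw_potential_infections(rng, n_contacts, sar):
    """Binomial(n_contacts, sar): each contact is a potential infection with probability sar."""
    return sum(rng.random() < sar for _ in range(n_contacts))


def simulate_individual(rng, household_size, mean_external, sar_home, sar_external):
    """Potential infections generated by one infected individual, by contact type."""
    household_contacts = household_size - 1
    external_contacts = draw_external_contacts(rng, mean_external)
    return {
        "household": draw_potential_infections(rng, household_contacts, sar_home),
        "external": draw_potential_infections(rng, external_contacts, sar_external),
    }


# Hypothetical parameter values, for illustration only.
rng = random.Random(0)
print(simulate_individual(rng, household_size=4, mean_external=10.0,
                          sar_home=0.2, sar_external=0.05))
```

The low dispersion parameter produces a heavily right-skewed number of contacts: most individuals have few external contacts while a small share have many.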

S1.2.2 Symptoms
Based on data from a recent review [4], we assume that 20% of infected individuals remain completely asymptomatic throughout the course of infection. We also follow this review in assuming that these individuals are less infectious than symptomatics, with a relative risk of 0.35 (i.e. the secondary attack rate is $0.35 \times SAR_j$). The remaining 80% of infected individuals present symptoms according to the timing described in the next subsection. Symptomatic individuals may get tested upon observing their symptoms (see Section S1.2.5), whereas asymptomatic individuals will never get tested through this channel.

S1.2.3 Timings
The incubation periods of infector and infectee are both assumed to be distributed lognormally with parameters drawn from a meta-analysis of the literature on the incubation period for COVID-19 (µ = 1.63 and σ = 0.5) [5].
The serial interval is assumed to be gamma-distributed with parameters α = 8.12, β = 0.64 (and with the distribution function translated by ∆x = -7.5 to allow for negative values), all taken from [6]. This distribution implies that 10.1% of serial intervals will be negative. The mean serial interval is 5.2 days.
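The properties of the shifted gamma distribution quoted above can be checked by simulation. The sketch below assumes that β is a rate parameter (an assumption on our part, chosen because it reproduces the stated mean of about 5.2 days); the seed, the sample size and the function name are arbitrary.

```python
import random

ALPHA = 8.12   # gamma shape parameter (from the text)
BETA = 0.64    # assumed to be a rate parameter (this reproduces the 5.2-day mean)
SHIFT = -7.5   # translation that allows negative serial intervals (from the text)


def draw_serial_interval(rng):
    """One draw from the shifted gamma serial interval distribution, in days."""
    # random.gammavariate takes (shape, scale), so the rate is inverted.
    return rng.gammavariate(ALPHA, 1.0 / BETA) + SHIFT


rng = random.Random(2021)
draws = [draw_serial_interval(rng) for _ in range(200_000)]
mean_si = sum(draws) / len(draws)
share_negative = sum(value < 0 for value in draws) / len(draws)
print(f"mean = {mean_si:.2f} days, share negative = {share_negative:.3f}")
```

The analytical mean under this parameterisation is α/β + ∆x = 8.12/0.64 − 7.5 ≈ 5.19 days, so the simulated mean should land close to that value.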
If the incubation periods and the serial interval are assumed to be independent, then the implied generation interval is often significantly below 0, which is epidemiologically impossible. Therefore, to make the distributions of the incubation periods and serial interval consistent with a realistic distribution of the generation interval, we allow for correlation between the values of the two incubation periods and the serial interval. To do this, we first draw values from a trivariate skewed-normal distribution [7,8], and then transform the resulting variables to follow the lognormal and shifted-gamma distributions described above. The variance-covariance matrix and the skew parameters of the trivariate skewed-normal are chosen to match the resulting generation interval distribution to a gamma distribution with a mean of 5.2 days (the same as the mean serial interval) and with a freely varying shape parameter. This ensures that the generation interval is realistic and that negative generation intervals are rarely drawn. (In approximately 0.36% of cases, generation intervals less than 1 day are drawn using this procedure. Secondary infections are assumed to only be possible 12 hours after infection, so in such cases we redraw the incubation period and serial interval values using the same procedure until the generation interval is greater than or equal to 1 day.) The process that matches the generation interval distribution to a gamma distribution is carried out by numerically minimizing the Kolmogorov-Smirnov test statistic. The shape parameter of the gamma distribution is estimated to be 4.79. The mean of the generation interval distribution, 5.2 days, is consistent with a recent meta-analysis [9], which estimated the mean generation interval to be 4.8 [95% CI 4.3-5.41] when using a fitted gamma distribution.
The probability distributions of all timing variables can be found in Figures S2, S3, S4, and S5. All timing variables are assumed to follow the same distribution for both household infections and external infections.

S1.2.4 Immunity and isolation
A "potential" infection in the model may not become an actual new infection for three reasons:
1. The potential infectee has already been infected. All individuals who have been infected once are assumed to be immune indefinitely.
2. The potential infector is isolating at the time of potential infection.
3. The potential infectee is isolating at the time of potential infection.
If the potential infectee or infector is isolating at the time of potential infection, then this reduces the probability of infection to 0 for out-of-home infections, while leaving the probability of within-home infections unchanged.
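The rule above amounts to a small decision function. The following Python sketch is one way to express it; the function and argument names are hypothetical and are not taken from the authors' code.

```python
def infection_occurs(infectee_immune, infector_isolating, infectee_isolating, within_home):
    """Return True if a potential infection becomes an actual infection."""
    if infectee_immune:
        # Previously infected individuals are assumed to be immune indefinitely.
        return False
    if within_home:
        # Isolation leaves within-home transmission unchanged.
        return True
    # Out-of-home transmission is blocked if either party is isolating.
    return not (infector_isolating or infectee_isolating)


# An isolating infector still infects a susceptible household member...
print(infection_occurs(False, True, False, within_home=True))
# ...but not a susceptible external contact.
print(infection_occurs(False, True, False, within_home=False))
```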
Potential infectors may isolate for one (or more) of 4 reasons:
1. Symptoms. Upon symptom onset, individuals will isolate with some probability that depends on their SES.
2. Positive test result. Upon receiving a positive test result (see below), individuals will isolate with some probability conditional on their SES.
3. Contact tracing call. Upon being told by a contact tracing team that they have been in contact with an infected individual, individuals will isolate with some probability conditional on their SES.
4. Household quarantine. A proportion $\omega_j$ of households in each group j are quarantining types. This means that if at least one person in the household isolates because they receive a positive test result, then all the members of the household also isolate for the same period as the detected individual.
All the relevant probabilities can be found in Table S5. Potential infectees also reduce transmission by isolating through the household quarantining channel, although this plays a minor empirical role in reducing overall transmissions in the model simulations. In most cases, isolating individuals stop isolating when they recover, which is assumed to be 10 days after experiencing symptoms. However, there are some cases in which individuals will "deisolate" before recovery. Deisolation occurs only when all of the following conditions hold: (i) the individual was isolating due to a contact tracing call, (ii) the individual then receives a negative test result (either because they were tested before they were infected, or because the test was a false negative), and (iii) the individual is not isolating for any other reason. This feature of the model avoids overestimating the efficacy of contact tracing in cases where tests are carried out very quickly.
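The deisolation conditions can be written as a single boolean check. The sketch below is illustrative only; the class and field names are hypothetical and simply mirror the four isolation reasons listed above.

```python
from dataclasses import dataclass


@dataclass
class IsolationState:
    """Reasons for which an individual is currently isolating."""
    due_to_symptoms: bool = False
    due_to_positive_test: bool = False
    due_to_tracing_call: bool = False
    due_to_household_quarantine: bool = False


def should_deisolate(state, received_negative_result):
    """True only if all three deisolation conditions hold."""
    isolating_for_other_reason = (
        state.due_to_symptoms
        or state.due_to_positive_test
        or state.due_to_household_quarantine
    )
    return (
        state.due_to_tracing_call
        and received_negative_result
        and not isolating_for_other_reason
    )


traced_only = IsolationState(due_to_tracing_call=True)
traced_and_symptomatic = IsolationState(due_to_tracing_call=True, due_to_symptoms=True)
print(should_deisolate(traced_only, received_negative_result=True))
print(should_deisolate(traced_and_symptomatic, received_negative_result=True))
```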

S1.2.5 Testing and Contact Tracing
There are two reasons for which an individual can be tested in the model:
1. "Self" testing. Upon symptom onset (if symptomatic), individuals will be tested with some probability conditional on their SES.
2. Contact tracing. If an infected individual from group j is "detected" (tests positive), then everyone they infected has some probability $\pi_j$ of being called by a contact tracing team and then tested themselves.
The relevant probabilities can be found in Table S5.
Tests are imperfectly sensitive, and test sensitivity is assumed to depend on time since infection, following the data from [10] (see Figure S1, in which test sensitivity increases rapidly over the first few days of infection and then decreases after around day 5). All testing processes include relevant delay times. The "test consultation delay" denotes the time it takes for individuals to access a test after symptom onset (if they self-test). The "test results delay" denotes the time it takes for results to be given to an individual after they are tested (for all three types of test). And the "contact tracing delay" denotes the time it takes to contact and test an individual after the original infector is detected (these are assumed to take place simultaneously).

S1.3 Calibration exercises for out-of-home contacts
In order to calculate the full matrix of out-of-home contacts to be input into the model, we first estimate the average number of non-work contacts outside the home for each SES j (from individuals traced in the CoVIDA project), and add it to the contacts due to working outside of home, estimated from the average number of days of work in a 14-day period for group j (from the full data). Together, these provide the average number of contacts of an individual from group j, called $\mu_j$. We then use the CoVIDA contact tracing data to infer the proportion of the contacts of an individual from group j that come from group k, called $q_{jk}$. This assumes that people are "missing at random", i.e. that the probability of being missing is independent of the group. Together these pieces of information allow us to calculate the full symmetric contact matrix $\Delta$ that describes the number of contacts in an average infectious period from group j to group k (absent any isolation behavior). The steps carried out are described in more detail below.

S1.3.1 Contacts outside the home
We assume that the mean number of contacts an individual from group j has outside of home during an average infectious period is given by $\mu_j$, defined as:
$$\mu_j(\lambda) = v_j + \lambda w_j$$
where $v_j$ is the mean number of non-work contacts from outside of home for group j, and $w_j$ is the mean number of days of work for group j. Both of these are estimated directly from the CoVIDA data (see Table 1 in the main text). $\lambda$ is an unknown parameter (the "work factor") that describes the relationship between the number of days spent at work and the number of out-of-home contacts during an infectious period (we assume this relationship to be linear). We choose a value of $\lambda$ by calibrating the model to the true value of $R_q$ estimated from the data. More specifically, we carry out the following steps:
1. For each candidate value $\lambda_0$ of the work factor:
(a) Calculate the values $\mu_j(\lambda_0) = v_j + \lambda_0 w_j$ for each group j.
(b) Use the values of $\mu_j(\lambda_0)$ to calculate the implied full contact matrix $\Delta$ using the maximum likelihood process outlined in Section S1.3.2.
(c) Run 50 simulations of the first 100 periods of the model.
(d) Use these simulations to calculate the implied value of $R_q$ for this value of $\lambda_0$ following the steps in Section S1.3.3. Call this $R_q(\lambda_0)$.
2. Set $\lambda^* = \arg\min_{\lambda_0} |R_q(\lambda_0) - R^*_q|$, with $R^*_q$ being the true value of $R_q$ calculated based on the real data using the methodology in Section S1.3.3.
3. Use $\Delta_{ML}(\lambda^*)$ as calculated using the method in Section S1.3.2 as the value for the contact matrix in the model simulations.
When we carry out the above steps in this way, we find that $\lambda^* = 0.8$, implying that the value for the work factor that best matches the observed growth in early cases in Bogotá's epidemic is 0.8. In other words, the model assumes that the mean number of out-of-home contacts during the average infectious period increases by 0.8 with each additional day of work outside of home.
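The calibration logic can be sketched as follows, assuming the linear relationship described above (mean contacts equal non-work contacts plus the work factor times days of work) and assuming that the selected work factor is the candidate whose simulated R q lies closest to the value estimated from the data. All inputs are hypothetical except the pair (0.8, 1.222) and the target 1.216, which are quoted in the text; the simulation step that produces R q for each candidate is not reproduced here.

```python
def mean_external_contacts(non_work_contacts, work_days, work_factor):
    """Mean out-of-home contacts per group: non-work contacts plus work_factor times work days."""
    return [v + work_factor * w for v, w in zip(non_work_contacts, work_days)]


def pick_work_factor(simulated_rq, target_rq):
    """Candidate work factor whose simulated R_q is closest to the target value."""
    return min(simulated_rq, key=lambda lam: abs(simulated_rq[lam] - target_rq))


# Hypothetical group-level inputs for the four SES groups (not the values in Table 1).
non_work = [3.0, 3.5, 4.0, 4.5]
work_days = [9.0, 8.0, 6.0, 5.0]

# Simulated R_q per candidate work factor. Only the 0.8 entry is quoted in the text;
# the other two are made up for illustration.
simulated = {0.6: 1.150, 0.8: 1.222, 1.0: 1.300}

best = pick_work_factor(simulated, target_rq=1.216)
print(best, mean_external_contacts(non_work, work_days, best))
```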
Using this work factor, the estimated contact matrix based on the maximum likelihood procedure in Section S1.3.2 for a population of 100,000 is: The estimated $\Delta$ matrix tells us the number of contacts that occur between each pair of groups in an average infectious period (which is the same across all groups). For example, the value in the first column and second row indicates that 49,440 contacts occur between individuals of groups 1 and 2 (corresponding to SES 1&2 and 3 respectively) during an average infectious period. Contacts are assumed to be mutual, so that the matrix is symmetric. We can see from the estimation result that there is strong assortative mixing: each group is more likely than random to contact another individual from the same group. (Note that the smaller values in the 3rd and 4th columns are due to smaller population sizes in these groups.)
When using this estimated contact matrix $\Delta_{ML}(\lambda^*)$, the estimated value of $R_q(0.8)$ is 1.222, close to the estimated value in the data $R^*_q = 1.216$ [95% CI: 1.170, 1.263]. We use the matrix $\Delta_{ML}(\lambda^*)$ as an input to the baseline simulations of the model seen in the main results.
Another way of viewing this matrix is by looking at the proportion of contacts for an individual from group j that are from group k: This tells us, for example, that 14.8% of the contacts of an individual from group 4 come from group 1. These values are then directly used in the model to set the probability that a potential infection generated by someone from group j is from group k.
In the following section, we describe in detail the maximum likelihood process used to estimate $\Delta_{ML}(\lambda_0)$ for each of the candidate values $\lambda_0$ of the work factor. Then, in Section S1.3.3 we describe how we calculate the value of $R_q$ both in the data (to calculate $R^*_q$) and in the models (to calculate $R_q(\lambda_0)$). Finally, in Section S1.3.4 we describe the "mobility matching" process, in which we allow the $\Delta$ matrix to be scaled by a time-varying constant over the course of the epidemic in order to account for changes in mobility in Bogotá.
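Going from the matrix of total contacts to the matrix of contact proportions is a row normalisation. The sketch below uses a hypothetical symmetric matrix (not the estimated one) and shows how the resulting shares could serve as sampling weights for the group of a potential infectee; the function names are our own.

```python
import random


def contact_shares(delta):
    """Row-normalise a contact matrix: entry [j][k] is the share of group j's contacts in group k."""
    shares = []
    for row in delta:
        total = sum(row)
        shares.append([value / total for value in row])
    return shares


def draw_contact_group(rng, shares, infector_group):
    """Sample the group of an external contact using the infector's row as weights."""
    return rng.choices(range(len(shares)), weights=shares[infector_group])[0]


# Hypothetical symmetric contact matrix for four groups (illustrative values only).
example_delta = [
    [300000, 60000, 20000, 20000],
    [60000, 200000, 30000, 25000],
    [20000, 30000, 50000, 20000],
    [20000, 25000, 20000, 5000],
]

shares = contact_shares(example_delta)
rng = random.Random(1)
print([round(value, 3) for value in shares[3]])
print(draw_contact_group(rng, shares, infector_group=3))
```

Note that the shares matrix is generally not symmetric even though the contact matrix is, because each row is divided by a different total.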

S1.3.2 Maximum likelihood process for contact matrix
The aim of the following maximum likelihood process is to estimate the symmetric contact matrix $\Delta$, where the element $c_{jk}$ describes the total number of out-of-home contacts (across the whole population) between individuals from group j ∈ {1, 2, 3, 4} and individuals from group k ∈ {1, 2, 3, 4} over the course of an average infectious period:
$$\Delta = \begin{pmatrix} c_{11} & c_{12} & c_{13} & c_{14} \\ c_{21} & c_{22} & c_{23} & c_{24} \\ c_{31} & c_{32} & c_{33} & c_{34} \\ c_{41} & c_{42} & c_{43} & c_{44} \end{pmatrix}$$
Let $\mu_j$ be the mean number of out-of-home contacts for an individual from group j across all groups during an average infectious period. For the purposes of this estimation process, we assume that $\mu(\lambda_0) := [\mu_1(\lambda_0), \ldots, \mu_4(\lambda_0)]$ is given, because it has been calculated in step (1a) in the previous section using a candidate value for the work factor of $\lambda_0$. What follows is thus the estimation process for $\Delta_{ML}(\lambda_0)$. We also treat as given $n := [n_1, \ldots, n_4]$, where $n_j$ is the number of individuals in group j in the population, because this is known from the data. Given $\mu(\lambda_0)$ and n, we use maximum likelihood to estimate the 6 remaining parameters of $\Delta$, which we denote using the vector $d = [d_{11}, d_{12}, d_{13}, d_{22}, d_{23}, d_{33}]$. The values $c_{jk}$ are filled out by the parameter vectors $\mu(\lambda_0)$, n, and d. In particular, we define $\Delta(d, \mu(\lambda_0), n)$ as the matrix with elements $c_{jk}$ defined in the following way:
1. First, the d vector directly specifies some elements of the $\Delta$ matrix, so that:
$$c_{jk} = d_{jk} \quad \text{for } 1 \le j \le k \le 3$$
2. Then, we enforce symmetry, because contacts between each pair of groups are mutual and so happen at the same rate:
$$c_{kj} = c_{jk} \quad \text{for all } j, k \in \{1, 2, 3, 4\}$$
3. Finally, the last row and column are pinned down by the µ vector and the population sizes, since each row j of the $\Delta$ matrix needs to sum to $\mu_j n_j$:
$$\sum_{k=1}^{4} c_{jk} = \mu_j(\lambda_0)\, n_j \quad \text{for all } j \in \{1, 2, 3, 4\}$$
To understand how we estimate the vector d, note first that the set of vectors $\{d, \mu(\lambda_0), n\}$ also pins down the probability $q_{jk}$ that a given contact of a person from group j is from group k, defined in the following way:
$$q_{jk}(d, \mu(\lambda_0), n) := \Pr(\text{Secondary contact is from group } k \mid \text{Individual is from group } j)$$
We can use this probability to define the conditional likelihood function for d given the contact tracing data we observe, and the known parameters $\mu(\lambda_0)$ and n.
So, assuming that the observations are independently and identically distributed, the conditional likelihood of observing the entire dataset is:
$$L(d \mid \mu(\lambda_0), n) = \prod_{i=1}^{N} q_{j_i k_i}(d, \mu(\lambda_0), n)$$
where observation i is a traced contact from group $k_i$ of an individual from group $j_i$, and N is the number of observations. Therefore, to run the maximum likelihood estimation procedure, we input the values of $\mu_j(\lambda_0)$ from the procedure described in Section S1.3.1, along with the known values of n. We then follow the maximisation programme below, maximising the log likelihood function while imposing the constraint that all elements of $\Delta$ are positive:
$$\max_{d} \sum_{i=1}^{N} \log q_{j_i k_i}(d, \mu(\lambda_0), n) \quad \text{subject to } c_{jk} > 0 \text{ for all } j, k$$
We use the nloptr package in R [11] to run this optimization numerically, using a local COBYLA (Constrained Optimization BY Linear Approximations) algorithm. This yields a maximum likelihood estimate of the full symmetric contact matrix, which we denote $\Delta_{ML}(d, \mu(\lambda_0), n)$, sometimes denoted $\Delta_{ML}(\lambda_0)$ for brevity.
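To make the construction concrete, the Python sketch below builds a symmetric 4×4 contact matrix from six free parameters and evaluates a log likelihood from counts of traced contact pairs. It rests on assumptions that should be read as ours: the six free parameters are taken to be the upper triangle of the block for the first three groups, the probability that a contact of group j belongs to group k is taken to be the corresponding row share, and all numerical inputs are hypothetical. The optimisation step itself (run in R with nloptr and COBYLA in the text) is not reproduced; only the objective is shown.

```python
import math


def build_delta(d, mu, n):
    """Symmetric contact matrix from free parameters d, mean contacts mu and group sizes n.

    Assumed ordering of d: [d11, d12, d13, d22, d23, d33].
    """
    size = len(mu)
    delta = [[0.0] * size for _ in range(size)]
    free = iter(d)
    # Fill the upper-left 3x3 block symmetrically from the free parameters.
    for j in range(size - 1):
        for k in range(j, size - 1):
            value = next(free)
            delta[j][k] = value
            delta[k][j] = value
    # The last column (and row) is whatever is left so that row j sums to mu[j] * n[j].
    for j in range(size - 1):
        remainder = mu[j] * n[j] - sum(delta[j][: size - 1])
        delta[j][size - 1] = remainder
        delta[size - 1][j] = remainder
    delta[size - 1][size - 1] = mu[size - 1] * n[size - 1] - sum(delta[size - 1][: size - 1])
    return delta


def log_likelihood(d, mu, n, pair_counts):
    """Log likelihood of traced pairs; pair_counts[j][k] counts contacts in group k of cases in group j."""
    delta = build_delta(d, mu, n)
    if any(value <= 0 for row in delta for value in row):
        return -math.inf  # enforce the positivity constraint on all elements
    total = 0.0
    for j, row in enumerate(delta):
        row_sum = sum(row)
        for k, value in enumerate(row):
            total += pair_counts[j][k] * math.log(value / row_sum)
    return total


# Hypothetical inputs (population of 100,000 split across four groups).
example_mu = [10.0, 9.0, 8.0, 7.0]
example_n = [40000, 35000, 15000, 10000]
example_d = [300000.0, 60000.0, 20000.0, 200000.0, 30000.0, 50000.0]
example_counts = [[70, 15, 8, 7], [14, 60, 10, 6], [5, 9, 20, 6], [6, 7, 6, 3]]

print(build_delta(example_d, example_mu, example_n)[3])
print(round(log_likelihood(example_d, example_mu, example_n, example_counts), 2))
```

Returning minus infinity for infeasible parameter vectors is a simple stand-in for the explicit inequality constraints that a constrained optimiser such as COBYLA would receive.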

S1.3.3 Calculating R q
Here we calculate the value of $R_q$, which we define as the average number of secondary infections generated by an infected individual at the start of the generalized quarantine period under the assumption that the proportion of susceptible individuals is 1 (or very close to 1). We treat this value as a constant in the early phase of the quarantine. To calculate the value of $R_q$, we use the Lotka-Euler equation [12]. This assumes exponential growth in new cases, assumes that all individuals in the population are susceptible (S = 1), and uses the rate of exponential growth in new cases r and the distribution of the generation interval g(a) to calculate an estimate of $R_q$.
Figure S7. Calculation of exponential growth rate of cases in early stage.
First, we calculate the rate of exponential growth in new confirmed cases per day by running an OLS regression with the natural log of daily confirmed cases as the outcome variable, and the date as the independent variable. We limit our sample to the early period of the epidemic, April 1st, 2020 to June 1st, 2020, when the exponential growth curve fits the data well and when immunity is unlikely to play a role in case growth because S ≈ 1. This yields an estimate of r = 0.038 (95% CI: 0.031, 0.046). Figure S7 displays the log daily confirmed new cases in Bogotá over time (the gray dots), and plots the line of best fit (in red) whose slope is equal to r.
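The growth-rate regression can be illustrated with a closed-form least-squares slope. The sketch below uses a synthetic, noise-free case series constructed to grow at 3.8% per day; it is not the Bogotá data, and the function name is hypothetical. It requires strictly positive daily counts, since the log of zero is undefined.

```python
import math


def exponential_growth_rate(daily_cases):
    """OLS slope of log(daily cases) on the day index (days 0, 1, 2, ...)."""
    logs = [math.log(cases) for cases in daily_cases]
    days = list(range(len(logs)))
    mean_x = sum(days) / len(days)
    mean_y = sum(logs) / len(logs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
    sxx = sum((x - mean_x) ** 2 for x in days)
    return sxy / sxx


# Synthetic series: 62 days of exact exponential growth at r = 0.038.
synthetic_cases = [50.0 * math.exp(0.038 * day) for day in range(62)]
estimated_r = exponential_growth_rate(synthetic_cases)
print(round(estimated_r, 4))
```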
Then, we take the estimated generation interval from Figure S4, denoted g(a) where a is the number of days since infection, and use this to calculate the initial value of $R_q$ using the Lotka-Euler equation [12]:
$$R_q = \frac{1}{\int_0^{\infty} e^{-ra} g(a)\, da}$$
where g(a) is the density of the generation interval as a function of the days since infection a, and r is the rate of exponential growth in new cases.
Using the value of r = 0.038 and the g(a) function from Figure S4, this yields an estimate of $R^*_q = 1.216$ [95% CI: 1.170, 1.263]. Note that this estimate comes during a period of strict lockdown in Bogotá, which explains why our estimate of $R_q$ is significantly lower than the estimates of $R_0$ seen in the literature (which are typically calculated in conditions of full mobility [13]).
When calculating the $R_q(\lambda_0)$ values based on the model simulations, we carry out exactly the same steps apart from the fact that we add simulation fixed effects to the OLS regression for calculating r. This ensures that between-simulation variation in new cases does not affect the estimates of the rate of the exponential growth in new cases.
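As a rough check of this calculation, the sketch below evaluates the Lotka-Euler relation (the reproduction number equals one over the integral of exp(−ra) g(a) over a) with r = 0.038. It assumes that g(a) is exactly the fitted gamma distribution with mean 5.2 days and shape 4.79, rather than the empirical distribution in Figure S4, so the result is expected to be close to, but not identical with, the reported 1.216. Function names and integration settings are our own.

```python
import math

SHAPE = 4.79             # fitted gamma shape of the generation interval (from the text)
MEAN = 5.2               # mean generation interval in days (from the text)
SCALE = MEAN / SHAPE     # implied gamma scale


def gamma_pdf(a, shape, scale):
    """Density of a gamma distribution at a (zero for non-positive a)."""
    if a <= 0:
        return 0.0
    log_density = ((shape - 1) * math.log(a) - a / scale
                   - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_density)


def lotka_euler_r(growth_rate, density, upper=60.0, steps=60000):
    """Reproduction number from the Lotka-Euler equation, using a midpoint rule on [0, upper]."""
    width = upper / steps
    integral = 0.0
    for i in range(steps):
        a = (i + 0.5) * width
        integral += math.exp(-growth_rate * a) * density(a)
    return 1.0 / (integral * width)


rq_numeric = lotka_euler_r(0.038, lambda a: gamma_pdf(a, SHAPE, SCALE))
# For a gamma generation interval the integral has a closed form.
rq_closed_form = (1.0 + 0.038 * SCALE) ** SHAPE
print(round(rq_numeric, 3), round(rq_closed_form, 3))
```

Running the same function with the confidence bounds of r (0.031 and 0.046) gives a feel for how the uncertainty in the growth rate carries over to the reproduction number.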

S1.3.4 Mobility calibration
In some model specifications, we allow for changes in the number of out of home contacts over time in the model, in order to account for changes in the level of mobility over time in Bogotá (in particular due to changes in policies such as stay-at-home orders and other mobility restrictions).
To do this, we allow for a generalized mobility factor m(t) to change over time throughout the epidemic, so that at each time t, the contact matrix input into the model is equal to $m(t) \cdot \Delta_{ML}(\lambda^*)$. The factor m(t) scales the contacts of all groups similarly, and does not lead to differential mobility changes across groups through the course of the epidemic.
We estimate m(t) by using an iterative process to match the model predictions to the observed pattern of new cases seen in Bogotá. This matching process calibrates the total confirmed new cases in the model (summing all groups) to the total confirmed new cases in the data (summing all groups). A case is deemed confirmed when an individual receives a positive test result (both in the model and the data). This means that any inequality between groups is a result of the other parameters we input into the model described elsewhere. The precise quantity used to match is the total (all group) per capita new confirmed cases per day, smoothed by taking a 2-week rolling average. Call this quantity $X_k(t; m(t))$ when it is calculated from an individual simulation k. And call this same quantity Y(d) when it is calculated from the data with dates d. This defines the quantities Y(t).
(d) For all t after the t corresponding to the lower bound date, calculate the deviation quantity ∆(t) that indicates the extent to which the model predictions deviate from the data: To avoid sharp changes in mobility, we calculate a smoothed version of this function ∆(t) by taking the 1-week rolling average of the deviations.
(e) Making use of the fact that the average time from infection to receiving positive test results is approximately 3 weeks, we set the value of $m_{l+1}(t)$ in order to account for the deviations from the 3 weeks later: where the adjustment factor γ was chosen to be 1/3, so that a 1 log deviation at t+21 would lead to an adjustment of 1/3 on the value of $m_{l+1}(t)$. We then use this value of $m_{l+1}(t)$ as the input to step (a) for loop l + 1.
are derived from census data. 95% confidence intervals are too small to be seen at this scale; all differences in means are significant at the 1% level. Panel (c) shows the linear fit of the relationship between household size and the probability of being infected conditional on being tested in the CoVIDA data by SES, with 95% shaded confidence intervals. The slopes of the effect of household size on positivity for strata 1&2, 3, 4 and 5&6 are 0.56, 0.58, 0.13 and 0.13 respectively. The p-value of the F-test of difference between these slopes is p = 0.0033.
Figure S10. Visual representation of the theoretical model. An initial infection A potentially infects four other individuals, called B, C, D, and E. 1a: A successfully infects B, D, and E. C does not get infected by A because she has already been infected previously, and is thus immune to further infections. A gets tested upon experiencing symptoms, and isolates upon receiving a positive test result. This begins a process of contact tracing, through which B and D (but not E) are tested. Individuals in the model may or may not be symptomatic, get tested, be contact traced, and they may isolate for a variety of reasons. 1b: Each individual has contacts both within the home and outside the home. Every individual within the home is a contact, and the number of outside-of-home contacts is drawn from a negative binomial distribution. A contact becomes a potential infection with a probability equal to the secondary attack rate, which differs for within-home and outside-of-home contacts, and by SES. 1c: The infection tree summarises the "branching process" in the model, i.e. the first and second generation potential infections caused by A.
Figure S11. Model match based on confirmed cases with confidence intervals. This shows the same results as panels (a), (b) and (c) in Figure 2. The dashed lines represent the observed data on confirmed cases from the Health Secretary of Bogotá (HSB). The solid lines represent the median predictions over 50 model simulations of per capita confirmed incidence. The shaded intervals represent the 0.025 and 0.975 quantiles of the per capita confirmed incidence over these same 50 model simulations.
Figure S12. Share of cases from each group. The y-axis denotes the number of cumulative cases from the specific SES shown divided by the total number of cumulative cases across all groups. The solid line is the median, and the shaded areas denote the 0.025 and 0.975 quantiles of the model results. The points and surrounding error bars are estimations from the CoVIDA data, calculated using the methodology described in subsection S1.1.
Figure S14. Upward adjustment scenarios epidemic curves. Additional results of the upward-adjustment scenarios, in which the parameters of all SES are adjusted to match the values of SES 5&6. The variable on the y-axis is the per capita incidence over the last 2 weeks in each SES. Parameters adjusted in each set are as follows: out of home (number of contacts outside the home), within home (within-household SAR, household size), isolation behavior (probability of isolating conditional on observing symptoms, testing positive, being contact traced, and probability of quarantining as a household), testing & tracing (probability of self testing, delay in test consultation, delay in test results, and probability of being contact traced).
Figure S15. Upward adjustment scenarios when changing within-home SAR and household size. Results of additional upward-adjustment scenarios, in which the parameters of all SES are adjusted to match the values of SES 5&6. "Within-home" adjusts both household size and within-home SAR. "HH size" and "SAR Home" adjust only the single named parameter at a time. Panel (a) shows the two-weekly incidence for each group in each of the upward adjustment scenarios. Panel (b) shows the overall cumulative incidence for each group in these scenarios.
Figure S16. Upward adjustment scenarios with mobility change. Panel (a) shows the two-weekly incidence for each group in each of the upward adjustment scenarios. Panel (b) shows the overall cumulative incidence for each group in these scenarios. Apart from in simulations where the epidemic is completely contained (e.g. Out of home (100%) and Within home (100%)), reductions in transmission early on in the epidemic lead to much more severe second waves, leading to little overall change in the overall cumulative incidence.
Figure S17. Targeted and untargeted policies with varying intensity. The outcome variable in all cases is the cumulative per capita incidence over the course of the whole epidemic for models with no mobility change. In "Targeted" scenarios, only the parameters of SES 1&2 are adjusted, but adjustments in this group are greater, such that the mean adjustment across the whole population is the same as in the untargeted scenario. Panel (a): outside-of-home contacts are reduced by 1, 2, and 3 relative to baseline scenario. Panel (b): 10%, 20% and 30% of the population are immune to the virus from the start of the epidemic. Panel (c): mean increase of 10 and 20 percentage points in select isolation parameters (probability of isolating conditional on being symptomatic and being contact traced). Panel (d): mean increase of 10, 20, 30, and 40 percentage points in the probability of being tested after observing symptoms. 0% fast testing indicates the same testing delays as in the baseline scenario. 100% fast testing indicates that all tests have a consultation delay of 1 day and a results delay of 1 day. The estimates represent the median of 50 simulations, while the confidence intervals represent the 0.025 and 0.975 quantiles of these simulations.
The table displays the same results as Table 1 and extends the information provided with standard errors [in brackets], confidence intervals {in curly brackets}, the data source and an explanation of the estimation method. CoVIDA refers to the primary data collected within the CoVIDA project led by the authors, and HSB refers to administrative data of the Health Secretary of Bogotá on detected cases.