Masks and distancing during COVID-19: a causal framework for imputing value to public-health interventions

During the COVID-19 pandemic, the scientific community developed predictive models to evaluate potential governmental interventions. However, the analysis of the effects these interventions had is less advanced. Here, we propose a data-driven framework to assess these effects retrospectively. We use a regularized regression to find a parsimonious model that fits the data with the least changes in the \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_t$$\end{document}Rt parameter. Then, we postulate each jump in \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_t$$\end{document}Rt as the effect of an intervention. Following the do-operator prescriptions, we simulate the counterfactual case by forcing \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_t$$\end{document}Rt to stay at the pre-jump value. We then attribute a value to the intervention from the difference between true evolution and simulated counterfactual. We show that the recommendation to use facemasks for all activities would reduce the number of cases by 200,000 (\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$95\%$$\end{document}95% CI 190,000–210,000) in Connecticut, Massachusetts, and New York State. The framework presented here might be used in any case where cause and effects are sparse in time.

The burst of the COVID-19 pandemic forced governments around the globe to take health care interventions. One that caused significant controversies is the use of masks among the general public [1][2][3] . Initially, it was assumed that the primary mode of transmission of SARS-CoV-2 was through coughing or contact with surfaces. Extensive shortages of personal protective equipment for health workers led to an initial recommendation for the general population not to wear masks. While no extant studies show surgical masks reduce transmission of SARS-CoV-2 in humans, surgical masks do reduce viral shedding of other coronaviruses 4 , and transmission of SARS-CoV-2 in animal models 5 . Also, cloth masks might filter SARS-CoV-2 6 , mainly if they are worn by the infected individual by preventing droplets from being aerosolized 7 .
The Centers for Disease Control and Prevention (CDC) changed its guidelines on April 3, 2020, and recommended the widespread use of masks 8 . According to the CDC, the rationale behind its policy change was the increase in evidence that asymptomatic and presymptomatic people are infectious [9][10][11][12][13][14] and that there are many undetected cases 15 . Similarly, The European Centre for Disease Prevention and Control (ECDC) recommends using masks 16 . Still, it states that "It is not known how much the use of masks in the community can contribute to a decrease in transmission in addition to the other countermeasures" 17 . On June 5, the World Health Organization (WHO) changed its guidelines and recommended governments encourage the general public to wear masks in specific situations like grocery stores 18 . On December 1, the WHO updated its guidance on masks and included aerosols as a means of transmission 19 . In this paper, we show evidence that the policy change regarding masks by the CDC (and local governments) decreased the number of positive cases in the states of Connecticut (CT), Massachusetts (MA), New York (NY), Rhode Island (RI), and Virginia (VA).
To assess causality, we need to evaluate both branches of an intervention 20 : one in which the intervention did happen, and one in which it did not. The gold standard to do so is the double-blind randomized control trial (RCT) paradigm. Although possible 21 , RCTs are not the norm in public health epidemiological intervention. Even if implemented, there is no such thing as a "placebo arm" for travel restrictions or a double-blind school closure. Since a placebo or double-blind trials are not possible, there is an indirect causal path between the treatment and the outcome 22,23 . For example, people in zip codes with open schools might be more careful with their hygiene because they know that they are at a higher risk than people in zip codes where the schools are closed. When the second branch of the intervention did not happen, it is called a counterfactual ("contrary www.nature.com/scientificreports/ to the facts"). One option to measure the direct effect of an intervention, and the one that we use in this work, is to estimate or simulate the counterfactual branch 24,25 .
Here, we present a framework to analyze data from the COVID-19 epidemic that can simulate counterfactual scenarios in which specific NPI did not occur. In this framework, we use the odds of a positive test as the dependent variable, rather than the number of positive tests 26 or deaths 27 . We motivate a linear equation for the evolution of this dependent variable using the Susceptible-Infected-Recovered (SIR) model 28,29 . We show that we can compute the average number of people infected from one positive case on each day, a parameter known as the instantaneous reproduction number ( R t ) 30 . Finally, we carry out a LASSO 31 regression to fit the data, to obtain a piecewise-linear fit to the logarithm of the odds with the smallest number of breaks (see the Supplementary  Fig. S4 online). This regression finds the times when interventions started, allowing us to simulate alternative scenarios where these interventions did not happen and assess their net impact.

Results
The daily number of cases and tests are highly variable (Fig. 1). To reduce this variability, we compute the logodds, the logarithm of the number of positive tests over the number of negative ones. Doing this, reveals a piecewise linear pattern that we fit using the LASSO regression (Fig. 2). These regressions show three breakpoints in the log-odds in NY, two in CT, MA, MI, and RI, and one in VA. We should stress the fact that these breaks are not an input of the user. On the contrary, this is the result of applying the LASSO regularization. These changepoints happen after different NPIs. Once in place, the NPIs remained in effect during the period that we analyzed. The first change in CT and MA, and the first and second in NY, are due to mobility restrictions (school closure, ban mass gatherings, restriction the non-essential workforce, and stay-at-home orders). In these states, the last break happens after the CDC changed its guidelines regarding masks. In MI, RI, and VA, the stay at home orders and www.nature.com/scientificreports/ the CDC recommendation to wear masks happened closer in time, making it hard to disentangle their effects. However, the MI government only enforced the use of masks in closed areas (like grocery stores), and the VA government never recommended the use of masks. We assume that the lack of local orders correlates negatively with local compliance with the CDC guidelines, and that explains why we do not see the masking effect on MI and VA. We use the slopes of the regression to compute the R t . In Fig. 3, we show R t as a function of time, and in Tables 1 and 2 we show the values of R t , the dates at which it changes, and the dates of the NPIs. The NY plot, in Fig. 3, shows R t dropping down from 2.1 to 1.6 and then to 0.72 on March 30, 8 days after the closure of all nonessential business. Taking them together, this translates to a reduction by 65% on R t due to mobility restrictions. There is a third drop from 0.72 to 0.44 on April 14, 11 days after the CDC changed their guidelines and recommended to wear masks, and 2 days after NY enforced the use of masks for public employees. In CT, the stay-at-home orders reduced the value of R t by 51% . Moreover, after the new CDC recommendation on masks, it dropped by 40% . Remarkably, in MA, after the stay at home order the R t value dropped from 1.9 to 1.1, still above 1. Only after the recommendation of wearing masks it fell to 0.66, below 1. As we already mentioned, we do not see the effect of masks in MI or VA, and we attribute this to the lack of local compliance. In MI, the government only enforced the use of masks in enclosed areas, and the VA government never ruled on the use www.nature.com/scientificreports/ of masks. The data from RI is harder to interpret because stay-at-home orders and masks guidelines happened close in time, and data from before April are unreliable (with less than 500 tests a day). Nonetheless, there is an effect from the CDC guidelines and from local governments making masks mandatory. Finally, one advantage of the sparsifying framework is that we can simulate counterfactual scenarios by removing a breakpoint. Then, the regressed line would have continued at the previous slope. Take the case of the public wearing masks. From the fit shown in Fig. 4, we observe that on April 14 in NY, R t changed from 0.72 to 0.44 . We interpret that the counterfactual to this intervention is that if the public had not used masks, R t would have stayed at 0.72 , or in causal inference jargon do(no masks). Figure 4 (green line) shows that removing the intervention would have resulted in a much more drawn-out dwindling of the case curve. Now, we can use the counterfactual odds to calculate the counterfactual number of positive and compare it with the actual number Orange lines indicate mobility restriction orders such as closing bars, gyms, movie theaters, schools, and banning non essential work. The black line show the moment at which the CDC updated its guidelines to recommend wearing masks. The green lines show moments at which the local states changed their guidelines regarding masks. NY and RI enforced the use of masks for some jobs first, and later on they enforced mask wearing policies among the general public. MA and CT enforced the use masks by the general public. MI only enforced the use of masks in enclosed public areas, such as grocery stores. VA never enforced the use of masks.

Discussion
In conclusion, we found that masks reduced the spread of the virus in CT, MA, and NY. In those states, our calculations showed that the intervention reduced the R t by 40%, and we estimate that masking prevented 200,000 cases ( 95% CI 190,000-210,000) from the moment they were adopted until the end of the stay-at-home orders (see Table 3). These results are consistent with recently published results by Mitze et al. that found the same effect in Germany 32 . Also, we estimate that in New York City alone, masks reduced the number of cases by 29,000 ( 95%   33 . We believe the discrepancy can be accounted for by noting that Zhang et al. did not consider the increase in testing during the stay-at-home order in, leading to a higher difference after the masking order at which point the testing was more stable. The framework that we presented is data-driven, and therefore it relies on only a handful of hypotheses as compared to other methods For example, the counterfactual analysis relies on one hypothesis: the log-odds are piecewise linear (see Eq. 6)-without the need to assume any of the hypotheses of the SIR model. Based on the goodness of fit, we are confident that this hypothesis holds for the datasets presented here (see Fig. 2 and the Supplementary Table S2 online). We also assume that the effects of the different NPIs are independent, in logarithmic scale. For example, we assume that wearing masks reduces the value of R t by the same factor, whether schools are open or not.
It should be noted that the computed R t is representative of the tested population. For example, the R t will be biased towards the one among the vulnerable populations if these are tested more often than others. Due to the structure of the social network, it might be different from the R t of the overall population. This information should be taken into account to interpret our results in a case by case basis.
Also, to put the framework to the test, we apply the method to synthetic data, and we found that it was able to find the corresponding breakpoints and slopes (see the "Supplementary Section Simulations", Supplementary Fig. S5 and Table S4 online). Nevertheless, when there are no reliable figures on negative test results, our framework fails to fit the data (see the Supplementary Fig. S2, S3, and Table S2, online). More detailed data will be necessary to build better models in the future. Ideally, the data would be organized in a case-by-case fashion and it would contain information on the sample criterion. Fundamentally, this information should be available for negative tests results.
Overall, we found evidence that masks reduce the spread of the SARS-CoV-2 and prevent new infections. We hope that our findings will persuade local authorities and intergovernmental institutions to strongly recommend the use of masks to prevent the spread of SARS-CoV-2. We arrived at this conclusion by merging two different traditions: causal inference and regularized regression. We believe that the union of these techniques will be fruitful in other contexts where the causes and effects are sparse in time.

Methods
Data. We collected data from States that offer raw data on the number of tests and positive cases each day.
We found that 16 States offer that information. Some of them offer an Abstraction Protocol Interface (API). Others serve a file with the information. Many states offer a visual dashboard with information on testing, but if they do not offer raw data, we did not use it. There is at least one project that collects data on testing from all the states: https ://covid track ing.com/. This aggregator builds its database based on snapshots of the dashboards published by the states. The information in these snapshots is averaged three or four times a day. This process makes the accumulated number of tests and positives reliably, but not their daily change. That is why we did not use information from this aggregator, and we only use direct information from official sources. We provide links to each dataset in Table S1.
In the main text, we show results for six datasets with the highest R 2 . To show the robustness and also limitations of the framework, we show an analysis for all the states where we found data on the daily number of cases and tests on the Supplementary Figs. S1-S3, Tables S2, and S3, online. In the main text, we limit the analysis to the time in which NYS, MA, and CT ended their stay-at-home order. To show the robustness of the framework, in the "Supplementary Information", we show the results when we apply the method to a bigger timespan. Since, due to backlog, some states have a delay in reporting of about 1 week, we included data until the last day we find reliable data.
The odds as the dependent variable. As can be seen in Fig. 1 the number of daily positive tests, Positive t , oscillates in synchrony with the number of tests. To overcome this source of noise, we propose to use the odds of a positive test: where Negative t is the number of negative tests on day t.
We show the number of positives and the number of tests for each dataset in Fig. 1. www.nature.com/scientificreports/ We show the evolution of the Odds in Fig. 2. The noise due to the variation in the number of tests is reduced, and a trend emerges.
Since we do not have access to the total number of infected individuals, but only to the tested population, we have to use some statistical assumptions about this population. If we assume that the people being tested is a random sample of the population with COVID-19-like symptoms, we can state that: where P t (I|symptoms) is the probability of a patient being positive for SARS-CoV-2 given that she is symptomatic, P t (symptoms) is the probability of having COVID-19-like symptoms, N is the total population, and f t is the fraction of people with symptoms that are selected to be tested (this number can be different each day, for example, if the number of tests available changes). Similarly: where P t (notI|symptoms) is the probability of a patient being SARS-CoV-2 negative given he has COVID-19like symptoms. Now, if we assume that P t (symptoms|I) is constant, we can use Bayes theorem to show that: Then: Finally, if we assume that P t (notI|symptoms)P t (symptoms) is constant: We used four sets of hypotheses. First, we use the assumptions of the SIR model. Second, we use that the tested population is a random sample from the population with COVID-19-like symptoms (Eqs. 1 and 2). This assumption does not hold, for example, if the basis for testing someone is that she was in contact with a confirmed case. If this happens, it follows that our computed R t will be biased towards this over-sampled population. For instance, it would be possible that the calculated R t would be more representative of the one among the elderly than the youth, given that the former is tested more often than the latter. Third, we assume that P t (notI|symptoms)P t (symptoms) is constant. This hypothesis is equivalent to say that the number of people with COVID-19-like symptoms but without the SARS-CoV-2 (for example, people with the flu) is constant. Given that we compute R t in periods that span weeks, it would be enough to assume P t (notI|symptoms)P t (symptoms) is constant during this time or that its change rate is negligible compared with the change rate in the number of symptomatic people with SARS-CoV-2. Fourth, we use that the symptoms show up instantaneously and that the tests are performed and processed on the same day (Eq. 3). This last hypothesis is not true, and it is the reason why, in our analysis, the effects of the interventions show a delay to onset between 8 and 11 days.
Linearization. We write Eq. (4) as a linear function of the rate of change of R t . Defining We can write Eq. (4) as: Now, instead of using b t as the parameters to estimate we decompose each b t as follows: The a i s represent the rate of change of the variable b t in logarithmic scale. Next, we replace the (7) in (6): We can write (8) as a linear problem with the following definitions: www.nature.com/scientificreports/ Importantly, the SIR hypotheses are only necessary to draw the connection to R t (Eq. 5). However, Eq. (8) might hold even if the SIR hypotheses do not. What would change is the interpretation of the parameters.
LASSO regression and feature selection. Since in Eq. (9), we have as many regressors as samples, and we assume that the changes in a are only due to top-down interventions we use a LASSO regression to fit the data 31 . This regression minimizes the loss function: This approach finds a sparse set of β i . We add two extra steps to sparsify even further this set of parameters. If there are contiguous β i = 0 , we set to zero all of them but the first in the chunk. Then, we fit the selected regressors using ordinary least squares, and we recursively remove the β i with p-values * > 0.01 , where p-values * are the Bonferroni corrected p-values. Using the LARS algorithm 37 , we repeat these steps for different values of the hyperparameter α , and we use the fit that minimizes the Bayesian Information Criterion 38 . We show the result of this procedure in Fig. S4.
From fitted parameters to R t . To compute the value of R t from fitter parameters, we have to use Eqs. (5, 7 and 12). From these equations, we arrive at the following equality: where most of the β i values are zero. Using this formula, we arrive at the values presented in Fig. 3 and Table 1 (main text). We show the R t values for all states with data in Fig. 3.

Data availability
All the data and codes to reproduce our analysis are publicly available at https ://githu b.com/ababi no/babin o2020 masks .