Evidence of protective role of Ultraviolet-B (UVB) radiation in reducing COVID-19 deaths

Prior studies indicate the protective role of Ultraviolet-B (UVB) radiation in human health, mediated by vitamin D synthesis. In this observational study, we empirically outline a negative association of UVB radiation as measured by ultraviolet index (UVI) with the number of COVID-19 deaths. We apply a fixed-effect log-linear regression model to a panel dataset of 152 countries over 108 days (n = 6524). We use the cumulative number of COVID-19 deaths and case-fatality rate (CFR) as the main dependent variables and isolate the UVI effect from potential confounding factors. After controlling for time-constant and time-varying factors, we find that a permanent unit increase in UVI is associated with a 1.2 percentage points decline in daily growth rates of cumulative COVID-19 deaths [p < 0.01] and a 1.0 percentage points decline in the CFR daily growth rate [p < 0.05]. These results represent a significant percentage reduction in terms of daily growth rates of cumulative COVID-19 deaths (− 12%) and CFR (− 38%). We find a significant negative association between UVI and COVID-19 deaths, indicating evidence of the protective role of UVB in mitigating COVID-19 deaths. If confirmed via clinical studies, then the possibility of mitigating COVID-19 deaths via sensible sunlight exposure or vitamin D intervention would be very attractive.


Description of Methodology
We apply a fixed-effect log-linear regression model to estimate the effect of UVI on the number of COVID-19 deaths; the model builds upon Fig. 1 in Manuscript. A log-linear model increases the comparability of the growth rates of COVID-19 deaths across countries because it considers percentage rather than absolute changes over time. The model isolates the effect of UVI from country-specific time-constant factors. These time-constant factors consist of a country's location (measured by its latitude and longitude) and diet-related effects such as dietary supplements, food fortification and diets rich in vitamin D. They also include the share of vegans and vegetarians in the population, who tend to have lower vitamin D levels than meat and fish eaters. They further include other population parameters, such as how active people's lifestyle and mobility are and the composition of the population with respect to age, skin pigmentation, obesity rates, underlying diseases, co-morbidities or treatments. Finally, they include the economic and social situation of a country's population, which remains fairly stable over our observation period. Furthermore, the model allows us to control for an increasing pressure on the healthcare system over time, measured by the time passed since the first reported case of COVID-19 in the specific country. Importantly, this factor partials out any linear change of growth rates over time that is similar across countries. Therefore, the model isolates the effect of UVI from an exponential-shaped curve, which is often observed in the cumulative COVID-19 deaths over time or in the growth rates of Fig. 2 in Manuscript.
The model also isolates the effect of UVI from factors which can influence UVI or individuals' absorption of UVB such as precipitation index, cloud index, ozone level, visibility level, humidity level, and temperature.
Time-constant behaviours of individuals are captured by our country-fixed effects because these fixed effects capture time-constant recurring habits or mobility of individuals, which affect the likelihood of exposure to UVB radiation (e.g., walking to work, outdoor exercise, outdoor activities related to work). Yet country-fixed effects do not capture time-varying factors, in particular weather, that influence the behaviour of individuals and thus the likelihood of UVB radiation exposure.
Because an increase of UVB today plausibly affects individuals several weeks later, we include in our model five weekly lags of UVI and of the control variables. Additional lags did not change the results substantively. Thus, we use the following model of equation (1):

$$D_{i,t} = D_{i,t-1} \cdot \exp\Big(\beta + \sum_{l} UVI_{i,t-l}\,\beta_{UVI,l} + \sum_{l} X_{i,t-l}\,\beta_{X,l} + T_{i,t}\,\beta_{T} + \mu_i + \varepsilon_{i,t}\Big) \qquad (1)$$

$D_{i,t}$ represents the cumulative COVID-19 deaths in country $i$ at time point $t$ (in days) and it is related to the explanatory factors via an exponential growth model on the right-hand side of the equation. The exponential growth model flexibly allows for different shapes of the cumulative COVID-19 deaths. These shapes cover the one described in Fig. 2 of the Manuscript or the often observed S-shaped curves which appear at later stages of COVID-19 outbreaks, depending on how flexibly time is allowed to enter the exponential growth model. The exponential growth model consists of six explanatory parts.
1) $\beta$ represents the daily growth rate of COVID-19 deaths from $D_{i,t-1}$ to $D_{i,t}$ that is independent of the factors presented in Fig. 1 in Manuscript. $\beta$ covers virus-specific attributes like its basic reproductive rate R0 combined with its lethality.
2) $UVI_{i,t-l}$ represents the UVI for a country $i$ at day $t$ lagged by $l$ weeks. $\beta_{UVI,l}$ reflects the effect of UVI lagged by $l$ weeks.
3) $X_{i,t-l}$ stands for the set of control variables. This set consists of precipitation index, cloud index, ozone level, visibility level, humidity level, as well as minimum and maximum temperature for a country $i$ at day $t$ lagged by $l$ weeks. The vector $\beta_{X,l}$ identifies the effect of these control variables lagged by $l$ weeks.
4) $T_{i,t}$ stands for the time passed since the first reported COVID-19 infection for a country $i$ at day $t$, and $\beta_{T}$ identifies the associated effect.
5) $\mu_i$ represents time-constant country-specific factors influencing the growth rate of cumulative COVID-19 deaths (e.g., diet-related effects, population parameters about their activities and demographic composition).
6) $\varepsilon_{i,t}$ consists of all the remaining factors that are not identified but also have an effect on the cumulative COVID-19 deaths (i.e., all non-linear differences of growth rates with respect to time and country-specific linear differences of growth rates with respect to time). They could be caused by a decreasing number of people who could potentially become infected or contagious, lockdowns in a country $i$, mutation of the virus in a country $i$ over time, or systematic false reports of the dependent variable.
An appropriate transformation (see Section 6 for the Derivation of Equation (2)) results in the estimable equation (2).
$\beta$ and $\mu_i$ do not appear in the equation anymore and a linear regression can identify all other coefficients. Equation (2) is estimated on the observations of 152 countries in Table 2 and the 6,524 observations of 152 countries in Table 3 in Manuscript. We present an overview of how many observations of which country we use in our analysis in Table S8.
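The effect of taking logs and demeaning can be illustrated with a minimal sketch on simulated data. It assumes a single regressor (UVI without lags or controls), no error term, and made-up parameter values; the function names `simulate_panel`, `pooled_slope` and `within_slope` are hypothetical, and this is not the authors' estimation code. Taking logs of an exponential growth model turns the daily log growth rate into a linear function of the regressors, and subtracting country means then removes every time-constant term.

```python
import math
import random


def simulate_panel(beta_uvi, n_countries=20, n_days=60, seed=0):
    """Simulate D[t] = D[t-1] * exp(growth) per country (illustrative values only)."""
    rng = random.Random(seed)
    panel = {}
    for country in range(n_countries):
        base_uvi = rng.uniform(2.0, 8.0)
        # Assumed country fixed effect, deliberately correlated with the UVI
        # level so that ignoring fixed effects distorts the estimate.
        mu = 0.01 * base_uvi
        deaths = 10.0
        rows = []
        for _ in range(n_days):
            uvi = base_uvi + rng.uniform(-1.0, 1.0)
            growth = 0.05 + mu + beta_uvi * uvi
            new_deaths = deaths * math.exp(growth)
            # Log-linear form: ln D_t - ln D_{t-1} is linear in UVI.
            rows.append((math.log(new_deaths) - math.log(deaths), uvi))
            deaths = new_deaths
        panel[country] = rows
    return panel


def _demeaned_sums(rows):
    """Return sum of (x - x_mean)(y - y_mean) and sum of (x - x_mean)^2."""
    n = len(rows)
    y_mean = sum(y for y, _ in rows) / n
    x_mean = sum(x for _, x in rows) / n
    num = sum((x - x_mean) * (y - y_mean) for y, x in rows)
    den = sum((x - x_mean) ** 2 for _, x in rows)
    return num, den


def pooled_slope(panel):
    """OLS slope over all rows, ignoring country fixed effects."""
    all_rows = [row for rows in panel.values() for row in rows]
    num, den = _demeaned_sums(all_rows)
    return num / den


def within_slope(panel):
    """Fixed-effect (within) slope: demean per country, then pool."""
    num = den = 0.0
    for rows in panel.values():
        country_num, country_den = _demeaned_sums(rows)
        num += country_num
        den += country_den
    return num / den


panel = simulate_panel(beta_uvi=-0.005)
print("pooled:", pooled_slope(panel))
print("within:", within_slope(panel))
```

In this sketch the within estimator recovers the assumed coefficient, while the pooled slope is pulled upward by the country effect that was built to correlate with UVI.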

Derivation of Short-and Long-Run Effects
The coefficients of equation (2) are interpreted in percentage terms, and we can separate the effect of lagged variables into a short- and a long-run effect. For example, a one-unit increase of UVI at time $s$ affects the cumulative COVID-19 deaths $D_{t=s}$ via $\beta_{UVI,0}$ in the short run. After one week, the increase of UVI affects $D_{t=s+1}$ firstly via $D_{t=s}$ and secondly via $\beta_{UVI,1}$ because $D_{t=s+1} = D_{t=s} \cdot e^{\beta_{UVI,1}}$ (partialling out the daily growth rate and keeping the control variables constant). Consequently, if the model consists of 5 lags, then the long-run effect will be reached after 5 weeks (see Table S1). Therefore, an increase of UVI by one-
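The accumulation of lag effects can be sketched as a running sum. The sketch assumes a permanent one-unit increase of UVI and a contemporaneous coefficient plus five weekly lags; the coefficient values are invented for illustration and are not the paper's estimates, and `cumulative_effects` is a hypothetical helper name.

```python
def cumulative_effects(lag_coefficients):
    """Running sum of lag coefficients.

    Entry k is the change in the daily growth rate k weeks after a
    permanent one-unit increase of the regressor.
    """
    total = 0.0
    effects = []
    for coefficient in lag_coefficients:
        total += coefficient
        effects.append(total)
    return effects


# Made-up coefficients for lags 0..5 (weeks); NOT estimates from the paper.
hypothetical_lags = [-0.001, -0.002, 0.0005, -0.003, -0.001, -0.0015]

effects = cumulative_effects(hypothetical_lags)
short_run = effects[0]   # effect in the week of the increase
long_run = effects[-1]   # effect once all five lags have kicked in
print(short_run, long_run)
```

Under these assumptions the short-run effect is the contemporaneous coefficient alone, and the long-run effect is the sum of all lag coefficients.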

Identification of Effect of Ultraviolet Index (UVI)
The key assumption that is required to identify a causal effect of UVI on the cumulative COVID-19 deaths is that $UVI_{i,t-l}$ is uncorrelated with $\varepsilon_{i,t}$ at all points in time. This means that 1) past or future unexplained parts of $D_{i,t}$ must not affect $UVI_{i,t}$. These unexplained parts would appear in $\varepsilon_{i,t}$ and be correlated with $UVI_{i,t-l}$ for some $l$. The key assumption further requires that 2) there is no factor affecting $D_{i,t}$ which also influences $UVI_{i,t}$ in addition to country-specific time-constant factors or those variables which we include in the analysis. This additional factor would appear in $\varepsilon_{i,t}$ and correlate with $UVI_{i,t-l}$ for some $l$. Both requirements are always satisfied if UVI is randomly distributed after controlling for an appropriate set of control variables 1. [...], but would not be biased due to the behavioural change. We control for some changes in behaviour, which may lead to more or less exposure to UVB radiation, in two ways. Firstly, the set of weather variables partially controls for variations in individuals' exposure to UVB radiation because individuals are more likely to go outside on less cloudy days or on days without rain. Secondly, we control for changes in individuals' exposure to UVB radiation induced by governmental measures (see our description in Table S9).
The second assumption could be violated by changes in the growth rates of the cumulative COVID-19 deaths with respect to countries over time. Such changes could be growth rates which are 1) country-specific and time-constant, 2) linearly time-varying but similar across countries, 3) non-linearly time-varying but similar across countries, and 4) linearly or non-linearly time-varying and country-specific. Such changes in the growth rates threaten a causal identification of the effect of UVI because UVI increases over our observation period as summer approaches (in the countries in the Northern Hemisphere). Our main model specification isolates the effect of UVI from changes in the growth rates of type 1) and type 2) by partialling out any country-specific time-constant differences of growth rates and linear changes of growth rates with respect to time that are similar across countries. In our robustness checks we increase the flexibility of the model to also capture country-specific linear differences of growth rates with respect to time as well as some non-linear differences of growth rates with respect to time, which are either similar across countries or even country-specific.
The cloud and precipitation indices could also violate condition 2). On the one hand, high cloud coverage and precipitation today decrease the future number of infections because individuals are less likely to go outside, get infected today and die in the future. On the other hand, both indices decrease UVI because clouds and precipitation absorb the radiation. Together, these two relationships could create an upward bias in the estimate of UVI. Consequently, controlling for the cloud as well as the precipitation index mitigates the bias and makes a causal identification more plausible.
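The direction of this bias can be illustrated with a small simulation. It is a sketch under invented assumptions: cloud coverage lowers both UVI and the growth rate of deaths, all coefficients and noise levels are made up, and the helper names `ols_slope` and `residualize` are hypothetical. Controlling for cloud coverage is done by partialling it out of both variables before regressing one on the other.

```python
import random


def ols_slope(x, y):
    """OLS slope of y on x (with intercept)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    num = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    den = sum((a - x_mean) ** 2 for a in x)
    return num / den


def residualize(y, x):
    """Residuals of y after an OLS regression on x (with intercept)."""
    slope = ols_slope(x, y)
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    return [b - y_mean - slope * (a - x_mean) for a, b in zip(x, y)]


rng = random.Random(1)
n = 5000
true_uvi_effect = -0.005  # assumed value, for illustration only

cloud = [rng.random() for _ in range(n)]
# Clouds reduce UVI ...
uvi = [8.0 - 4.0 * c + rng.uniform(-1.0, 1.0) for c in cloud]
# ... and, by assumption, also reduce the growth rate of deaths.
growth = [
    0.10 + true_uvi_effect * u - 0.02 * c + rng.gauss(0.0, 0.005)
    for u, c in zip(uvi, cloud)
]

# Omitting cloud coverage: estimate is pushed upward (towards zero).
naive = ols_slope(uvi, growth)
# Controlling for cloud coverage by partialling it out of both variables.
controlled = ols_slope(residualize(uvi, cloud), residualize(growth, cloud))
print("naive:", naive, "controlled:", controlled)
```

In this setup the estimate that omits cloud coverage lies above the one that controls for it, which matches the upward bias described in the text.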
Air quality could also violate condition 2). The decrease in traffic because of COVID-19 over our observation period leads to an improvement of the air quality, and better air quality is likely to reduce the cumulative COVID-19 deaths 2. Because UVI increases over time, the air quality could be positively correlated with UVI. These relationships could cause an upward bias in the estimate of UVI, meaning that the true effect of UVI is lower than the estimate. Therefore, controlling for visibility as a proxy variable for air quality mitigates the bias and makes a causal identification more plausible.
A country-specific mutation of the virus over time could also violate assumption 2. If a mutation increased the cumulative COVID-19 deaths, then it could positively correlate with UVI because UVI also increased over the time of our observation period. This relationship would create an upward bias. Therefore, negative estimates of the effect of UVI can be considered conservative. Another threat to our identification is a potential systematic false reporting about the cumulative COVID-19 deaths. In the beginning of the crisis it seems likely that not all deaths were tested for COVID-19 (so that the reported number of COVID-19 deaths is smaller than the true value) while nowadays more deaths are tested and reported as a COVID-19 death, even though not entirely caused by it (i.e., reported COVID-19 deaths is higher than true value). This positive correlation of measurement error with time would generate an upward bias of the estimate of the effect of UVI. Therefore, negative estimates can be considered conservative.
Country-specific governmental measures aimed at mitigating the spread of COVID-19 could also violate assumption 2. If such a measure decreases the number of COVID-19 deaths, then it will be negatively correlated with UVI because UVI increased over time. This relationship could create a downward bias of the estimate of the effect of UVI. Therefore, controlling for country-specific governmental measures mitigates this potential bias.

Model Selection to Identify the Effect of Ultraviolet Index (UVI) on the Cumulative COVID-19 Deaths
We estimated equation (2) with up to 8 weekly lags and chose models with 5 lags and all control variables, as presented in Table S2. On the one hand, we did not find major changes with respect to the size and statistical significance of the estimate of UVI. On the other hand, including more lags increases the number of estimates but not the number of observations, so that the accuracy of the estimates decreases, which favours a more parsimonious model.
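How weekly lags enter a daily panel, and how quickly the number of coefficients grows with the lag length, can be sketched as follows. The sketch assumes that a weekly lag corresponds to 7 days, that the contemporaneous value (lag 0) is included, and that there are 8 lagged variables (UVI plus the seven weather controls listed above); both function names are hypothetical.

```python
def weekly_lag_rows(series, n_lags, days_per_week=7):
    """Build rows [x_t, x_{t-7}, ..., x_{t-7*n_lags}] for every day t
    that has a full lag history in a daily series."""
    first_day = n_lags * days_per_week
    return [
        [series[t - lag * days_per_week] for lag in range(n_lags + 1)]
        for t in range(first_day, len(series))
    ]


def n_lag_coefficients(n_variables, n_lags):
    """Number of lag coefficients when lag 0 is included."""
    return n_variables * (n_lags + 1)


daily_uvi = list(range(30))  # placeholder daily values, not real data
rows = weekly_lag_rows(daily_uvi, n_lags=2)
print(rows[0])  # values at day 14, day 7 and day 0

print(n_lag_coefficients(n_variables=8, n_lags=5))
print(n_lag_coefficients(n_variables=8, n_lags=8))
```

Under these assumptions, moving from 5 to 8 lags adds 24 coefficients without adding observations, which is the trade-off behind the preference for the more parsimonious model.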

Robustness Checks
To examine the robustness of our results we change the dependent variable to the case-fatality rate (CFR). CFR is defined as the cumulative COVID-19 deaths divided by the cumulative COVID-19 infections. Therefore, the CFR of country $i$ on day $t$ is calculated as $CFR_{i,t} = D_{i,t} / I_{i,t}$, where $I_{i,t}$ stands for the cumulative COVID-19 infections. It is a common measure to assess the severity of diseases because a high CFR leads to a high number of cumulative COVID-19 deaths. Its advantage over the cumulative COVID-19 deaths is that it relates the cumulative number of deaths to the cumulative number of infections for a disease.
Therefore, it helps to isolate the effect of UVI on cumulative COVID-19 deaths from its effect on cumulative COVID-19 infections. Provided that the cumulative COVID-19 infections follow the same model structure as outlined in equation (2), we can express $I_{i,t}$ via an exponential growth model in equation (3). The interpretation of the coefficients and variables is essentially the same as in equation (2) but relates to the cumulative COVID-19 infections rather than deaths. Dividing equation (1) by equation (3) leads to equation (4).
After applying to equation (4) the same transformation that is used to derive equation (2), we get the estimable equation (5).
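The division of the two exponential growth models can be sketched as follows. The notation is an assumption made for illustration and may differ from the original equations: $D_{i,t}$ and $I_{i,t}$ are cumulative deaths and infections, the $\beta$ terms belong to the deaths equation, the $\gamma$ terms are their counterparts in the infections equation, and $\nu_i$ and $\eta_{i,t}$ are the country effect and error term of the infections equation.

```latex
\begin{align*}
CFR_{i,t} = \frac{D_{i,t}}{I_{i,t}}
  &= \frac{D_{i,t-1}}{I_{i,t-1}} \cdot
     \exp\Big( (\beta - \gamma)
       + \sum_{l} UVI_{i,t-l}\,(\beta_{UVI,l} - \gamma_{UVI,l})
       + \sum_{l} X_{i,t-l}\,(\beta_{X,l} - \gamma_{X,l}) \\
  &\qquad\qquad + T_{i,t}\,(\beta_{T} - \gamma_{T})
       + (\mu_i - \nu_i) + (\varepsilon_{i,t} - \eta_{i,t}) \Big)
\end{align*}
```

Under this assumed notation, each coefficient of the CFR model is the difference between the corresponding coefficient for deaths and the one for infections, so a CFR coefficient captures the effect on deaths net of the effect on infections.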
Every coefficient of equation (5) [...] we expect UVI to decrease the cumulative COVID-19 infections. The reason is not that fewer people get infected but rather that the infections are less severe, which makes it less likely that people get themselves tested. One concern when using the observed case fatality rate that is obtained during an epidemic is that it likely understates the true case fatality rate. Note, however, that the model is robust to a misreported value of the CFR as long as the error is multiplicative and time-constant. Suppose the observed case fatality rate contains two error terms, which represent the multiplicative error for the cumulative COVID-19 deaths and infections, respectively, and which could grow linearly over time. For example, if the true cumulative COVID-19 deaths or infections are first under-reported and later over-reported, then the corresponding time slope of the error will be positive, and the error terms will first be negative and later (i.e., with larger values of time) become positive.
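Why a multiplicative reporting error does not disturb a model of growth rates can be checked numerically. The sketch below uses invented CFR values and error sizes, and it assumes that an error that grows linearly over time enters in the exponent; `log_growth` is a hypothetical helper name.

```python
import math


def log_growth(series):
    """Period-to-period log growth rates of a positive series."""
    return [math.log(b) - math.log(a) for a, b in zip(series, series[1:])]


# Invented "true" case fatality rates, for illustration only.
true_cfr = [0.010, 0.012, 0.015, 0.016, 0.018]

# Time-constant multiplicative error: only 60% of the true value is observed.
constant_error = [0.6 * value for value in true_cfr]

# Error whose log grows linearly in time: under-reporting first, then
# over-reporting (assumed functional form).
drift = 0.02
drifting_error = [
    value * math.exp(-0.05 + drift * t) for t, value in enumerate(true_cfr)
]

true_growth = log_growth(true_cfr)
print(log_growth(constant_error))   # same as true_growth up to rounding
print(log_growth(drifting_error))   # true_growth shifted by the drift
```

A time-constant factor cancels in the log differences, and a log-linear drift shifts every growth rate by the same constant, which a constant term or a fixed effect would absorb in this sketch.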
Even in the presence of those two forms of measurement error outlined in equations (7) and (8) [...]
Similarly to the control variables of equation (1), we add those 8 variables and their corresponding weekly lags to our model. Table S9 and Table S10 outline [...]
Taking the natural logarithm leads to equation (1.1).

Identification of Daily Growth Rate
Instead of demeaning equation (1)

Time Trend
Linear and exponential