Introduction

As of October 14, 2021, the coronavirus disease 2019 (COVID-19) pandemic has claimed over 720,000 lives in the United States alone, with more than 44.7 million confirmed cases1. Current evidence suggests that the primary mode of transmission of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is close contact with infected individuals2,3. Aerosols4,5, which are particulates less than 5 µm in diameter6,7, likely play a significant role in transmission8. After the initial rise of cases in the early winter of 2020, cases remained severe through the spring before dropping in the summer. Given the shelter-in-place order in most states and the rise in humidity, cases generally decreased in May and stayed in lower ranges through the summer until the fall months. In most areas of the northern hemisphere, as fall turns to winter, the weather becomes colder and drier. Lower absolute humidity has been shown to be associated with increased transmission rates of other respiratory viruses (e.g., influenza)9, posing significant concerns regarding potential increases in the number of COVID-19 cases in the fall and winter. The surge in cases through the end of 2020 further supports the seasonal effects of COVID-19.

While several studies have suggested a relationship between climatic factors (e.g., temperature and/or humidity) and COVID-1910,11,12,13,14,15,16,17,18, the exact environmental and biological mechanism behind airborne and droplet transmission and viral survival of SARS-CoV-219 is not yet clear. In influenza, lower atmospheric moisture has been shown to increase the production of aerosol nuclei and viral survival time9, which translates to higher risks of airborne and droplet transmission. Other climatic factors that may impact transmission include temperature and air quality20,21; nevertheless, absolute humidity can still provide a surrogate measure for indoor air moisture and temperature22.

Initial efforts to slow the spread of COVID-19 focused on reducing contacts between individuals through social-distancing measures such as large-scale lockdowns, which were significantly associated with reductions in cases23. However, as the initial lockdowns were lifted and the movement of individuals increased, the correlation between mobility and case growth rates weakened overall24, though upticks in cases were associated with increased mobility during national holidays25. During the months of 2020 and 2021 some counties and states saw increases in cases, while others observed decreases without corresponding increases in movement by any metric. Thus, other factors, including environmental factors, must also be considered as important transmission drivers.

Analyses of the factors influencing COVID-19 have used either climate data21,26,27,28 or human mobility data23, but no study to our knowledge has considered changes in both climate and human mobility on COVID-19 outbreaks in the United States. Preliminary studies have investigated these effects in China but did not consider varying sensitivities to humidity for different climatological regimes, leading to a weaker detection of humidity impacts on transmission risks in areas with higher variations of humidity29. Understanding the potential for climatic factors to increase transmission in the fall and winter is crucial for developing policies to combat the spread of SARS-CoV-2. While the interaction between environmental factors and human encounters is complex, accounting for this relationship is necessary for determining appropriate policies that will be effective at reducing transmissions. Furthermore, indoor gatherings typically increase in frequency and size in the winter and are one of the largest risk factors for transmission7,30. Therefore, greater understanding regarding the added risk of weather changes is needed to aid future decisions on restricting gatherings or implementing mandates for protective face coverings. In this study, we assessed the relative impact of absolute humidity and human mobility in different climatological regimes on reported cases of COVID-19 in the US.

Results

Partitioning climatological regimes

The US is geographically large and encompasses several different climatological regimes with varying absolute humidity trends. We partitioned all 3137 US counties into six exclusive clusters (Fig. 1) ranked by average absolute humidity (AH) using a dynamic time warping (DTW) algorithm, which considers both the magnitude and the functional trends of AH (see “Methods”). The cluster with the lowest average AH was primarily located in the western region of the US, while the cluster with the highest average AH was located on the southern coast bordering the Gulf of Mexico. The largest changes in humidity were seen in clusters High 1 and High 2, with variances of 26.9 and 30.6 g/m3, respectively (see Fig. S1), while the Low 1 and Low 2 clusters had variances of 4.5 and 14.2 g/m3.

Figure 1

(A) Map of US counties and their respective absolute humidity clusters. Each county is colored according to its cluster. Counties that are included in the regression analysis are indicated by a darker shade. The clustering analysis was conducted using a partitional algorithm that utilized dynamic time warping (DTW) to measure similarity between the absolute humidity profiles of 3137 counties in the United States. As expected, the clustering of absolute humidity is related to the geography of the counties, which serves as a proxy for regional weather patterns and different climatological regimes. (B) The cross-sectional smoothed mean of human encounter, absolute humidity, and new cases per 10,000 people trends for each cluster group of the 497 counties analyzed in the regression analysis. Map was generated using the ggplot package31 in R.

Associations between humidity and case rates

We conducted a regression on counties with more than 50,000 people using a generalized linear model (GLM), controlling for individual movement and behavior with a metric derived from mobile phone data on visits to non-essential businesses (see “Methods”). Increases in AH were significantly negatively associated with cases per 100,000 of COVID-19 in all the non-high humidity regions (Table 1). In counties belonging to the least humid clusters, Low 1 and Low 2, a 1 g/m3 increase in AH was associated with an average 14 percent reduction in cases over the entire duration, while in the most humid clusters (High 1 and High 2) the reduction was 4 percent. The largest associations were seen in counties predominantly in the Rocky Mountains (Low 1; 20% decrease in daily cases), Upper Midwest/Northwest (Mid 1; 12% decrease in daily cases), West Coast/Texas/Northeast (Mid 2; 16% decrease in daily cases), and a region stretching along the western edge of the Midwest down to Texas (Low 2; 8% decrease in daily cases). Small but significant effects were detected in the two high humidity clusters, both located in the southern region of the US (High 1 and High 2), with respective reductions of 6% and 1% in daily cases for a 1 g/m3 increase in AH.

Table 1 Untransformed GLM coefficient estimates for the entire study period.

The associations between AH and COVID-19 cases remained negative when disaggregated across the time periods (Tables 2 and 3). The regression showed that AH had strong associations in the Mid 2 cluster, located in the West Coast/Texas/Northeast, during the spring and summer months of 2020 (Table 2). In the fall of 2020 and spring of 2021, AH associations were generally stronger in counties from the Mid 2 and High 1 clusters, which are in the West Coast, Texas, Northeast and Southern regions of the US (Table 3).

Table 2 Untransformed GLM coefficient estimates for the 2020 spring to fall period.
Table 3 Untransformed GLM coefficient estimates for the 2020 winter and 2021 spring seasons.

Associations between movement and case rates

In general, movement effects on daily cases are larger than absolute humidity effects, with visits to retail and recreation positively associated with new COVID-19 cases in most of the clusters (Table 1). Mobility trends for retail & recreation and grocery stores & pharmacies had a larger positive effect during the earlier phase of the pandemic for most clusters (March 10 to September 30, 2020) compared to the later phase spanning from October 1, 2020 to March 1, 2021. The residential mobility trend was associated with a decrease in new cases in most clusters during the earlier phase of the pandemic (Table 2), while having a positive effect on daily cases during the later phase (Table 3).

Detecting multicollinearity between movement and absolute humidity

To understand the collinearity of the combined regressions shown in Tables 1, 2 and 3, we conducted robustness checks with additional regressions that included AH and the mobility trends separately (see Tables S1–S18). Additionally, we calculated the Generalized Variance Inflation Factor (GVIF) for the regressions in our robustness checks. The Workplaces and Residential mobility trends were the least collinear with the other independent variables (absolute humidity, immunity factor, and previous 14-day caseload), with GVIF values less than 2. Mobility trends for Retail and Recreation and for Grocery Stores and Pharmacies were mostly non-collinear, with few exceptions: mean GVIF values were 1.53 (range: 1.15–2.30) and 1.65 (range: 1.28–2.63), respectively. Finally, Transit Stations and Parks demonstrated the most collinearity, with mean GVIF values of 2.15 (range: 1.45–3.71) and 2.01 (range: 1.56–2.83), respectively.
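As a rough illustration of what a variance inflation factor measures, the sketch below covers only the simplest case of two numeric predictors, where the VIF of either predictor is 1 / (1 − r²) and r is their Pearson correlation. For a term with a single coefficient the GVIF reduces to this ordinary VIF. The function names and the toy series are hypothetical, and the regressions in this study contain more covariates and county fixed effects, so this is not the computation behind the reported values.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def vif_two_predictors(x1, x2):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Hypothetical toy series: one uncorrelated pair and one strongly correlated pair.
vif_uncorrelated = vif_two_predictors([-1.0, 0.0, 1.0], [1.0, -2.0, 1.0])
vif_correlated = vif_two_predictors([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

A value near 1 indicates that the predictor shares little variance with the other one; larger values indicate that the coefficient's standard error is inflated by collinearity.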

Discussion

As the COVID-19 epidemic continues in the US and given the surge of COVID-19 in the winter seasons, there is renewed interest in understanding the relationship between outbreaks and seasonal changes, especially climatological factors related to outdoor and indoor humidity. This is not the first study to investigate the impact of humidity on transmission: low humidity has been associated with increased transmission of respiratory pathogens (e.g., influenza) and SARS-CoV-2. While SARS-CoV-2 is a novel human virus, other pandemic coronaviruses (e.g., MERS-CoV and SARS-CoV-1)9,32,33,34,35 have also been associated with increased transmission in the winter, thus suggesting similar implications for SARS-CoV-2. Here, we found that the relative effect of absolute humidity on transmission has so far been significant and was greatest in the Western, upper Midwest, and Northeast regions of the United States, which were clustered into the driest climatological regimes. These results support the hypothesis that falling absolute humidity magnifies the transmission risk of SARS-CoV-2, particularly in more arid regions36. This effect was less noticeable for more humid regions, such as the coastal and southern counties of the US (Fig. 2).

Figure 2

The average daily new cases per 100,000 people plotted against the average Google Mobility Measure of 497 counties for the entire study duration. The plots are organized by type of movement and cluster group. For each plot, we added a simple linear trend line with shaded standard errors.

The effects of behavior and nonpharmaceutical interventions (NPI) are observed in our analysis when we disaggregate it between the early and later phases of the pandemic. In the early phase, an increase in the mobility trend for retail & recreation, which measures visits to restaurants, cafes, shopping centers, theme parks, museums, libraries, and movie theaters, was associated with an increase in daily cases. In the later phase, during the fall and winter of 2020, retail & recreation mobility had a lesser effect since many of those establishments were closed due to NPI policies. Furthermore, increases in residential mobility played a larger role in transmission, especially during the winter holidays, when travel between homes occurred more frequently.

The relationship between humidity and transmission is not fully clear, but several studies have shown that as absolute humidity decreases, survival times for enveloped viruses increase nonlinearly, including other coronaviruses9,22,37,38. Our findings support the hypothesis of a nonlinear relationship since the log-linear effects between humidity and case growth varied between climatological regimes. Our stratified regression and Fig. 2 show that different climatological regimes have different sensitivities to humidity changes. The increased survival of the virus in lower AH may be compounded by increased binding capacity, thereby enhancing the potential infectivity of the virus39. As AH falls, relative humidity indoors also decreases, which may increase susceptibility to airborne diseases40. This association suggests that increased humidification of indoor air in high transmission settings may help decrease the burden of COVID-19.

Given that our results suggest COVID-19 cases will increase significantly during winters, areas where humidity typically falls earlier in the fall (e.g., the upper Midwest) are likely to see cases increase earlier. In contrast, more humid regions (e.g., Gulf Coast areas) will likely observe outbreaks later in the winter. However, the results demonstrate that mobility had a larger and significant impact on cases, particularly when humidity was unchanging in the summer. Consequently, falling temperatures and holiday celebrations are likely to increase the risk of people gathering in indoor spaces for longer durations, resulting in a surge of COVID-19 cases through the winter, given that there are no substantial changes in population immunity and behavior.

The prior influenza pandemic in 2009 is instructive here, as increased contact patterns that occurred in the fall likely combined with falling humidity to drive transmission, which resulted in the peak of infections occurring significantly earlier than in other years. Given the uncertainty and nonlinear effects of humidity on transmission, increasing vaccination, proper social distancing, and improving healthcare capacities can potentially reduce the toll of the COVID-19 pandemic. In addition, the uncertainty regarding the role of children in transmission41,42,43, who remain largely unvaccinated, suggests that proper precautions related to opening schools are warranted as the potential for transmission increases. While studies linking schools to outbreaks to date have been limited, few have occurred during the winter when transmission is higher.

We suspected that a relationship might exist between human behavior and climate that causes variations in encounters. During winter months, the likelihood of being indoors increases, especially in colder climates. To investigate this potential interaction, we conducted a collinearity analysis. The analysis indicates that residential and workplace movement patterns are not collinear with meteorological conditions (absolute humidity) and epidemiological factors (immunity factor and new cases per 100,000 with a 14-day lag). Retail/recreation and grocery/pharmacy trends are moderately collinear, while transit stations and parks were the most collinear with the meteorological and epidemiological variables.

One limitation of this study is the changing social distancing dynamics and masking adherence between counties. We attempted to account for county-level heterogeneities using fixed effects for each county, but these are static effects. Furthermore, it is difficult to disentangle the epidemiological dynamics that cause exponential growth of cases. Events related to evacuation in natural disasters or mass gatherings during the summer of 2020 that were not reflected in the Google Mobility Data44 may bias the analysis. Also, as with many COVID-19 analyses of retrospective data, differences in testing rates at the county level will result in varying detection rates of actual cases. Potential variations in vaccine efficacy against variants and within-host changes will impact the magnitude and exact timing of outbreaks45.

Transmission of SARS-CoV-2 will likely increase during the winters in the United States and other temperate regions in the northern hemisphere due in part to falling humidity. Studies of prior viruses and preliminary studies of the SARS-CoV-2 virus underpin the theoretical connection between humidity and droplet and aerosol transmission. Nevertheless, mobility is still a significant driver of transmission.

Methods

Study design

The United States is geographically large, and the timing and magnitude of changes in absolute humidity can vary widely across regions. In order to account for regional differences in humidity, we utilized a partitional clustering algorithm with dynamic time warping (DTW) similarity measurements46 to classify the absolute humidity temporal profile for all observed counties into six exclusive clusters that are ranked based on average humidity. The clustering algorithm was implemented using the dtw package in R47. These clusters are ranked from lowest to highest as Low 1, Low 2, Mid 1, Mid 2, High 1, and High 2. Clustering allowed us to designate groups of counties based on temporal, climatological regimes and to stratify different absolute humidity patterns, which reduces group-level effects and enhances the independence of the data points. The DTW clustering of absolute humidity was conducted on a larger set of 3,137 counties. In the regression analysis, we included data from a subset of counties that had more than 20 cumulative confirmed cases and a population of more than 50,000 people. We excluded any days with fewer than 20 cumulative confirmed cases within each county because early transmission dynamics had a high rate of undetected cases48, making the data unreliable for this analysis. The final dataset used in the regression analysis included 497 counties, and a separate panel data GLM was fitted to the counties in each cluster (NLow 1 = 39, NLow 2 = 42, NMid 1 = 118, NMid 2 = 108, NHigh 1 = 78, and NHigh 2 = 105). We assessed the results of the model over three time periods in 2020–2021: (1) the entire duration of the dataset (March 10, 2020 to March 1, 2021), (2) spring and summer, when humidity increases (March 10, 2020 to September 30, 2020), and (3) the fall and winter months, when humidity decreases to its lowest point (October 1, 2020 to March 1, 2021).
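To make the similarity measure concrete, the following is a minimal sketch of the classic dynamic time warping distance between two humidity profiles. It assumes an absolute-difference local cost and no window constraint, neither of which is specified in the text, and it covers only the pairwise distance, not the partitional clustering step or the R package used in the study. The toy profiles are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance with an absolute-difference local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical absolute humidity profiles in g/m^3.
dry = [3.0, 4.0, 6.0, 4.0, 3.0]
dry_shifted = [3.0, 3.0, 4.0, 6.0, 4.0]   # same shape, peak one step later
humid = [12.0, 15.0, 20.0, 15.0, 12.0]

d_shape = dtw_distance(dry, dry_shifted)
d_level = dtw_distance(dry, humid)
```

Because warping absorbs the one-step shift, the two dry profiles end up much closer to each other than either is to the humid profile, which is the behavior that lets the clustering respond to both magnitude and temporal shape.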

Data sources

Confirmed case data were extracted from the Johns Hopkins Center for Systems Science and Engineering1 for each county. Population data were obtained from the US Census Bureau49 for 3,137 counties from March 10, 2020 to March 1, 2021. Daily cases were obtained from the confirmed case count by taking a simple difference between the days. Any data incongruencies, such as negative case counts, were omitted in our analysis.
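The differencing step can be sketched as follows. This is an illustrative reading of the description above, with a hypothetical function name and toy counts; negative differences (which arise when a cumulative count is revised downward) are marked as missing so that they can be omitted.

```python
def daily_new_cases(cumulative):
    """Day-over-day differences of a cumulative series; negatives become None."""
    daily = []
    for previous, current in zip(cumulative, cumulative[1:]):
        diff = current - previous
        daily.append(diff if diff >= 0 else None)
    return daily


# Hypothetical cumulative confirmed cases for one county.
cumulative_counts = [20, 25, 31, 29, 40]
new_cases = daily_new_cases(cumulative_counts)  # [5, 6, None, 11]
```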

Daily average absolute humidity for each US county, excluding territories, was calculated using temperature and dewpoint data from the National Centers for Environmental Information50 at the National Oceanic and Atmospheric Administration (NOAA). Time series data for the year 2020 from US weather stations were acquired from the NOAA Global Summary of the Day Index51. Weather stations were mapped using latitude and longitude to corresponding counties using the Federal Communications Commission (FCC) Census Block API52. For counties without a weather station, we used data from the nearest station, which was calculated based on distance from the county’s spatial centroid using the haversine formula. In cases where counties contained multiple stations, data were averaged across all stations in a county. Absolute humidity was calculated using average daily temperature and average daily dew point (see Alduchov and Eskridge53).
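The two calculations in this paragraph can be sketched as below. The haversine function follows the standard formula with an assumed mean Earth radius of 6371 km. The humidity function is an assumption about the form of the calculation: it takes the Magnus-type saturation vapor pressure coefficients commonly attributed to Alduchov and Eskridge (6.1094 hPa, 17.625, 243.04 °C), evaluates them at the dew point to obtain the actual vapor pressure, and converts to g/m3 with the ideal gas law for water vapor. The exact expression used in the study is not given in the text, and the station records are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0   # assumed mean Earth radius
R_WATER_VAPOR = 461.5      # specific gas constant of water vapor, J/(kg K)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_station(centroid, stations):
    """Pick the station closest to a (lat, lon) county centroid."""
    lat, lon = centroid
    return min(stations, key=lambda s: haversine_km(lat, lon, s["lat"], s["lon"]))


def absolute_humidity(temp_c, dewpoint_c):
    """Absolute humidity in g/m^3 from air temperature and dew point (deg C)."""
    # Actual vapor pressure (hPa): Magnus-type formula evaluated at the dew point.
    vapor_pressure_hpa = 6.1094 * math.exp(17.625 * dewpoint_c / (243.04 + dewpoint_c))
    # Ideal gas law for water vapor: density = e / (Rv * T), converted to g/m^3.
    return 1000.0 * (vapor_pressure_hpa * 100.0) / (R_WATER_VAPOR * (temp_c + 273.15))


# Hypothetical stations and county centroid.
stations = [
    {"id": "A", "lat": 40.0, "lon": -105.0},
    {"id": "B", "lat": 41.5, "lon": -104.0},
]
closest = nearest_station((40.2, -105.1), stations)
ah_saturated = absolute_humidity(20.0, 20.0)
ah_dry = absolute_humidity(20.0, 5.0)
```

At 20 °C with the dew point equal to the air temperature, this gives a value close to the familiar saturation figure of about 17 g/m3, which is a useful sanity check on units.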

Data on mobility from March 10, 2020 to March 1, 2021 were obtained from the Google COVID-19 Community Mobility Reports54. We specifically utilized the metrics that measure visits to grocery stores & pharmacies, parks, transit stations, retail & recreation, residential, and workplaces by comparing the county-level rate to the median over a 5-week baseline period (Jan 3–Feb 6, 2020). The measure was calculated as the percent difference from this baseline, which precedes the point at which policy interventions (e.g., shelter-in-place orders) began to impact movement. This temporal measure allowed us to compare movement differences across counties.
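The percent-difference measure can be illustrated with a simplified sketch. The mobility reports are published as precomputed percent changes, so this code only shows the arithmetic implied by the description: a single baseline median is assumed here, whereas the published reports may define the baseline differently (for example per day of week). The function name and visit counts are hypothetical.

```python
from statistics import median


def percent_change_from_baseline(values, baseline_values):
    """Percent difference of each value from the median of a baseline period."""
    baseline = median(baseline_values)
    return [100.0 * (v - baseline) / baseline for v in values]


# Hypothetical visit counts: a baseline period and three later days.
baseline_visits = [90.0, 100.0, 110.0]
later_visits = [80.0, 100.0, 125.0]
mobility_trend = percent_change_from_baseline(later_visits, baseline_visits)
```

Expressing each county relative to its own baseline is what makes the trends comparable across counties with very different absolute visit volumes.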

Statistical analysis

For each humidity cluster that was classified using the DTW algorithm, we conducted three multivariate regressions using a generalized linear model (GLM) that assessed the time-weighted association between absolute humidity and non-essential visits with the number of new coronavirus cases (Eqs. 1–3). The GLM regression results in Tables 1, 2 and 3 are described in the following equation,

$$\begin{aligned} \log \left( {Y_{it} } \right) = & \log \left( N \right) + \alpha + \beta_{1} IM_{t} + \beta_{2} y_{{i\left( {t - \delta } \right)}} + \beta_{3} AH_{{i\left( {t - \delta } \right)}} + \beta_{4} RR_{{i\left( {t - \delta } \right)}} + \beta_{5} GP_{{i\left( {t - \delta } \right)}} \\ & + \beta_{6} PK_{{i\left( {t - \delta } \right)}} + \beta_{7} TS_{{i\left( {t - \delta } \right)}} + \beta_{8} WP_{{i\left( {t - \delta } \right)}} + \beta_{9} RD_{{i\left( {t - \delta } \right)}} + \gamma_{i} + \epsilon_{it} \\ \end{aligned}$$
(1)

where Yit is the number of daily COVID-19 cases for county i at time t, log(N) is an offset term to control for population size, and α is the intercept. In order to account for population immunity and exponential growth dynamics, we added the independent variables cumulative cases per 100,000, IMt, and lagged daily cases per 100,000, yi(t-δ), to the regression models. Absolute humidity, AHi(t-δ), is smoothed using a 7-day moving average and lagged by δ days. Google mobility trends for retail and recreation, RRi(t-δ), grocery and pharmacies, GPi(t-δ), parks, PKi(t-δ), transit stations, TSi(t-δ), workplaces, WPi(t-δ), and residential places, RDi(t-δ), are smoothed using a 7-day moving average, lagged by δ days, and rescaled and centered on the mean. Fixed effects γi for each county were added to capture unobserved heterogeneities between counties. For our study, we assumed that the lag length δ was equal to 14 days, which is based on previous studies investigating lagged effects due to the incubation period of COVID-1955. As our outcome variable was daily cases, we modeled the variable as a Poisson-distributed random variable with a log link function. Standard errors were calculated for the estimated linear coefficients β.
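The covariate preparation described above (7-day smoothing, a δ = 14 day lag, and centering and rescaling) can be sketched as follows. Several details are assumptions made for illustration: the moving average is taken as trailing rather than centered, rescaling is done by the population standard deviation, and missing leading values are represented by None. The function names are hypothetical.

```python
from statistics import mean, pstdev


def moving_average(series, window=7):
    """Trailing moving average; the first window - 1 entries are None."""
    smoothed = []
    for i in range(len(series)):
        if i + 1 < window:
            smoothed.append(None)
        else:
            smoothed.append(mean(series[i + 1 - window : i + 1]))
    return smoothed


def lag(series, delta=14):
    """Shift a series so that position t holds the value from day t - delta."""
    return ([None] * delta + list(series))[: len(series)]


def center_and_scale(series):
    """Subtract the mean and divide by the standard deviation of observed values."""
    observed = [v for v in series if v is not None]
    mu, sd = mean(observed), pstdev(observed)
    return [None if v is None else (v - mu) / sd for v in series]


# Hypothetical 30-day mobility series.
raw = [float(day) for day in range(30)]
smoothed = moving_average(raw)
lagged = lag(smoothed, delta=14)
scaled = center_and_scale(lagged)
```

With a 7-day window and a 14-day lag, the first 20 days of each county series carry no usable covariate value, which is one reason the analysis window needs to start after enough data have accumulated.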

We conducted additional regressions on the absolute humidity and mobility measures as predictors individually to test for robustness. Specifically, we fit a GLM with absolute humidity for each humidity cluster and one measure from rescaled Google COVID-19 Community Mobility as linear predictors for new daily cases, as described in Eqs. (2) to (8).

$$\log \left( {Y_{it} } \right) = \log \left( N \right) + \alpha + \beta_{1} IM_{t} + \beta_{2} y_{{i\left( {t - \delta } \right)}} + \beta_{3} AH_{{i\left( {t - \delta } \right)}} + \gamma_{i} + \epsilon_{it}$$
(2)
$$\log \left( {Y_{it} } \right) = \log \left( N \right) + \alpha + \beta_{1} IM_{t} + \beta_{2} y_{{i\left( {t - \delta } \right)}} + \beta_{3} AH_{{i\left( {t - \delta } \right)}} + \beta_{4} RR_{{i\left( {t - \delta } \right)}} + \gamma_{i} + \epsilon_{it}$$
(3)
$$\log \left( {Y_{it} } \right) = \log \left( N \right) + \alpha + \beta_{1} IM_{t} + \beta_{2} y_{{i\left( {t - \delta } \right)}} + \beta_{3} AH_{{i\left( {t - \delta } \right)}} + \beta_{4} GP_{{i\left( {t - \delta } \right)}} + \gamma_{i} + \epsilon_{it}$$
(4)
$$\log \left( {Y_{it} } \right) = \log \left( N \right) + \alpha + \beta_{1} IM_{t} + \beta_{2} y_{{i\left( {t - \delta } \right)}} + \beta_{3} AH_{{i\left( {t - \delta } \right)}} + \beta_{4} PK_{{i\left( {t - \delta } \right)}} + \gamma_{i} + \epsilon_{it}$$
(5)
$$\log \left( {Y_{it} } \right) = \log \left( N \right) + \alpha + \beta_{1} IM_{t} + \beta_{2} y_{{i\left( {t - \delta } \right)}} + \beta_{3} AH_{{i\left( {t - \delta } \right)}} + \beta_{4} TS_{{i\left( {t - \delta } \right)}} + \gamma_{i} + \epsilon_{it}$$
(6)
$$\log \left( {Y_{it} } \right) = \log \left( N \right) + \alpha + \beta_{1} IM_{t} + \beta_{2} y_{{i\left( {t - \delta } \right)}} + \beta_{3} AH_{{i\left( {t - \delta } \right)}} + \beta_{4} RD_{{i\left( {t - \delta } \right)}} + \gamma_{i} + \epsilon_{it}$$
(7)
$$\log \left( {Y_{it} } \right) = \log \left( N \right) + \alpha + \beta_{1} IM_{t} + \beta_{2} y_{{i\left( {t - \delta } \right)}} + \beta_{3} AH_{{i\left( {t - \delta } \right)}} + \beta_{4} WP_{{i\left( {t - \delta } \right)}} + \gamma_{i} + \epsilon_{it}$$
(8)
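The structure shared by Eqs. (1)–(8), a Poisson outcome with a log link and a log-population offset, can be illustrated with a deliberately reduced sketch: one covariate, no fixed effects, and a hand-written Newton-Raphson fit. The study itself fitted the models with R; the function below, its name, and the noise-free toy data are assumptions made only to show how the offset and the coefficient enter the likelihood.

```python
import math


def fit_poisson_offset(y, x, population, iterations=25):
    """Fit log(E[y]) = log(population) + a + b * x by Newton-Raphson."""
    a = math.log(sum(y) / sum(population))  # start at the overall case rate
    b = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for yi, xi, ni in zip(y, x, population):
            mu = ni * math.exp(a + b * xi)  # expected count, offset included
            resid = yi - mu
            g0 += resid                     # score with respect to a
            g1 += resid * xi                # score with respect to b
            h00 += mu                       # information matrix entries
            h01 += mu * xi
            h11 += mu * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


# Hypothetical noise-free data generated with a = log(1e-4) and b = -0.112.
ah = [-2.0, -1.0, 0.0, 1.0, 2.0]
pop = [100000.0] * len(ah)
cases = [n * math.exp(math.log(1e-4) - 0.112 * v) for n, v in zip(pop, ah)]
a_hat, b_hat = fit_poisson_offset(cases, ah, pop)
```

Because the offset fixes the coefficient of log(N) at one, the remaining terms model the per-capita case rate rather than the raw count.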

To demonstrate robustness in the coefficient estimates, the coefficients in the combined regression analyses with absolute humidity and all mobility trends (Eq. (1)) were compared to the regression coefficients for absolute humidity and each mobility trend (Eqs. (2)–(8)). The GLM analysis was conducted using the stats package in R (Version 4.0.2). All untransformed coefficient estimates are listed in Tables 1, 2 and 3. In the main text, we reported the exponentiated estimates as the relative change in cases per unit increase (1 g/m3) of absolute humidity. Given the log-linear relationship in a Poisson regression between the covariates and the response variable, the percent change in daily cases for a unit increase of a covariate is equal to exp(β) − 1. For example, if β = −0.112 for absolute humidity, we would state that there is an approximately 10.6% reduction (exp(−0.112) − 1 ≈ −0.106) in daily cases for a 1 g/m3 increase in absolute humidity. To verify that multicollinearity is not a major issue, we conducted a collinearity analysis by calculating the Generalized Variance Inflation Factor (GVIF) for all regressions, which are listed in Table S19.
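The conversion from an untransformed coefficient to a percent change can be checked with a few lines of code. The helper name is hypothetical; the formula is the exp(β) − 1 relationship stated above, generalized to an increase of `delta` units.

```python
import math


def percent_change(beta, delta=1.0):
    """Percent change in expected daily cases for a delta-unit covariate increase."""
    return 100.0 * (math.exp(beta * delta) - 1.0)


change_per_unit = percent_change(-0.112)        # about -10.6
change_two_units = percent_change(-0.112, 2.0)  # effects compound multiplicatively
```

Note that effects compound multiplicatively: a 2 g/m3 increase corresponds to exp(2β) − 1, which is slightly less than twice the one-unit percent change.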

In addition to running a GLM regression, we also discretized the data by month for each humidity cluster and calculated the Pearson correlation coefficient for absolute humidity and Google Mobility Trends against new cases (Fig. S2). Stationarity was checked for absolute humidity and Google mobility trends using the Levin-Lin-Chu unit-root test for unbalanced panel data for the three periods that were analyzed in the aforementioned regressions. Results of the stationarity tests are listed in Table S20 in the supplement.

We tested for robustness and externally validated our regressions by conducting additional analysis using K-fold cross-validation. We validated the coefficient estimation of all the GLMs mentioned previously by showing that the relative effect size for each regression was similar. The analysis was conducted over 100 folds or iterations with separate training and test sets derived from a subset of the county-level data. For each fold, the mean square error (MSE) of the fit was calculated on the test set; the results are shown in Table S22 in the supplement. In order to minimize overfitting, we also excluded county-level fixed effects in our cross-validation analysis. Additionally, we show the 95% confidence intervals of all parameter estimates from the GLM that includes all variables in Table S23.
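The mechanics of a K-fold split and the MSE computed on each held-out set can be sketched as follows. How the folds were actually drawn in the study (for example by county, by day, or by random resampling) is not stated, so the interleaved assignment below is purely an illustrative assumption, as are the function names.

```python
def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) for k interleaved folds over n rows."""
    folds = [list(range(start, n, k)) for start in range(k)]
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        yield train_idx, test_idx


def mean_squared_error(observed, predicted):
    """Average squared difference between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


# Ten rows split into five folds; each row is held out exactly once.
splits = list(k_fold_indices(10, 5))
example_mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

In each iteration the model would be fitted on the training rows and scored on the test rows, so the reported MSE reflects performance on data not used for estimation.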