Environmental determinants of COVID-19 transmission across a wide climatic gradient in Chile

Several studies have examined the transmission dynamics of the novel COVID-19 disease in different parts of the world. Some have reported relationships with various environmental variables, suggesting that spread of the disease is enhanced in colder and drier climates. However, evidence is still scarce and mostly limited to a few countries, particularly in Asia. We examined the potential role of multiple environmental variables in COVID-19 infection rate [measured as mean relative infection rate = (number of infected inhabitants per week / total population) × 100,000] from February 23 to August 16, 2020 across 360 cities of Chile. Chile has a large climatic gradient (≈ 40° of latitude, ≈ 4000 m of altitude and 5 climatic zones, from desert to tundra), but all cities share their social behaviour patterns and regulations. Our results indicated that COVID-19 transmission in Chile was mostly related to three main climatic factors (minimum temperature, atmospheric pressure and relative humidity). Transmission was greater in colder and drier cities and when atmospheric pressure was lower. The results of this study support some previous findings about the main climatic determinants of COVID-19 transmission, which may be useful for decision-making and management of the disease.

www.nature.com/scientificreports/ the analysis of temporal and spatial relationships of these factors with COVID-19 transmission rate. Most of these studies have reported a negative relationship between transmission rate and several proxies of temperature and humidity, suggesting that the disease spread is enhanced in colder and drier climates 5-9. Other environmental variables have received less attention and results have been inconclusive or have differed among countries. For example, one study reported an inverse relationship between COVID-19 transmission and wind speed in Iran 10, while global studies have found no significant association between the two variables 11,12. A negative relationship of the disease transmission with solar radiation has been reported in Iran 10, and a positive relationship with the concentration of atmospheric pollutants was found in China 13. Overall, and despite the rapid response of the scientific community to understand the transmission of COVID-19, the role that environmental variables play in the disease dynamics remains an open question that requires further evidence across the world. The goal of this study was to examine the potential role of multiple environmental variables in COVID-19 transmission rates and patterns in Chile. Environmental variation within Chile is unique due to the particular geography of this country, which includes an altitudinal range of ≈ 7000 m from sea level to the top of Aconcagua mountain, and a ≈ 40° latitudinal gradient that covers 6 climatic zones, including desert, semiarid, mediterranean, marine west coast, tundra and ice sheet. At the same time, the population across the country shares a common social behaviour, and regulations are established by a single national authority, allowing the evaluation of environmental variables under relatively consistent socio-economic conditions 14.
We thus aim to provide information about COVID-19 transmission across a wide range of environmental variation within a single country, which may help in understanding the dynamics of this disease.
Relationship between COVID-19 and predictive variables matrices. The strongest statistical associations between the variables were recorded with no time lag (0 days). The correlation analysis showed a strong relationship between the average, maximum and minimum temperatures, as well as between altitude and relative humidity, atmospheric pressure and solar radiation. There was no correlation between population density, wind speed and IR, and low correlation between precipitation and all other variables (Fig. 1). According to the multicollinearity analysis, the variables of greatest importance were minimum and mean temperature (variance inflation factor, VIF > 10) and the least important was maximum temperature (VIF < 10). Given the above, the reduced data matrix excluded mean and maximum temperature and altitude as predictor variables.
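The multicollinearity screening described above can be illustrated with a short sketch. It relies on the standard identity that the VIF of predictor j equals the j-th diagonal element of the inverse of the predictors' correlation matrix, which is equivalent to 1 / (1 − R²) from regressing that predictor on the others. This is a minimal Python illustration using only the standard library, not the authors' R workflow; the function names and the sample values are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def correlation_matrix(columns):
    """Pairwise Pearson correlations between predictor columns."""
    return [[pearson(a, b) for b in columns] for a in columns]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def variance_inflation_factors(columns):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


# Hypothetical weekly values for two correlated predictors (r = 0.8).
tmin = [1.0, 2.0, 3.0, 4.0, 5.0]
tmean = [2.0, 1.0, 4.0, 3.0, 5.0]
vifs = variance_inflation_factors([tmin, tmean])
```

With only two predictors, both VIFs reduce to 1 / (1 − r²); values above the threshold of 10 used in the text would require r² > 0.9.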
The training model was adjusted using the database with 3368 observations and 7 predictive variables. Hyperparameter selection improved the predictions. The final tuning values used for the model were n_estimators (number of iterations in training) = 56, max_depth (maximum depth of the tree) = 4, eta (model learning rate) = 0.03, gamma (minimum loss reduction required to perform an additional partition on a leaf node of the tree) = 0, colsample_bytree (fraction of predictor columns sampled when constructing each tree) = 0.5, min_child_weight (minimum sum of sample weights required in a leaf node, to prevent overfitting) = 1 and subsample (sampling rate of all training samples) = 1 (Table 2). The predictions corresponding to week 0 showed the lowest error, with 56 iterations (Table 3). The scatter plot of predicted versus observed mean relative infection rate (IR) using the final model is illustrated in Fig. 2. The scatter plot considering all parameters demonstrates an acceptable prediction of IR, with R² = 0.32 (R = 0.57). The Gain Score showed that the most important variables were minimum temperature (Tmin), atmospheric pressure (AP) and relative humidity (RH) (Fig. 3). All these final selected variables showed a highly significant negative relationship with the infection rate (p < 0.0001; Tmin r = −0.25, AP r = −0.23, RH r = −0.21).
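For readers who want to reuse the reported settings, the final tuning values can be collected in a single mapping. The sketch below is in Python and only organizes the values stated in the text; the helper name `split_for_native_api` is hypothetical, and no objective function is included because the text does not report one.

```python
# Final tuning values as reported in the text (Table 2).
FINAL_PARAMS = {
    "n_estimators": 56,       # number of boosting iterations
    "max_depth": 4,           # maximum depth of each tree
    "eta": 0.03,              # learning rate
    "gamma": 0,               # minimum loss reduction to split a leaf
    "colsample_bytree": 0.5,  # fraction of columns sampled per tree
    "min_child_weight": 1,    # minimum sum of sample weights in a leaf
    "subsample": 1,           # fraction of training rows sampled per tree
}


def split_for_native_api(params):
    """Separate the number of boosting rounds from the booster parameters.

    Assumption: the caller uses an interface that takes the round count
    as its own argument instead of inside the parameter mapping.
    """
    booster_params = {k: v for k, v in params.items() if k != "n_estimators"}
    return booster_params, params["n_estimators"]


booster_params, num_boost_round = split_for_native_api(FINAL_PARAMS)
```

If the xgboost Python package is available, these values could be passed as `xgboost.train(booster_params, dtrain, num_boost_round=num_boost_round)`, where `dtrain` is an `xgboost.DMatrix` built from the training set; the authors themselves worked with the R packages.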

Discussion
Our results demonstrate that the COVID-19 infection rate in Chile to date has been linked to three main environmental variables: minimum temperature, atmospheric pressure and relative humidity. Firstly, we found a negative relationship between infection rate and minimum temperature. Other studies have reported a similar negative relationship between air temperature and the transmission of COVID-19 15-18 and other respiratory diseases such as SARS 19. However, a positive correlation with average and minimum temperature has been reported in Singapore, especially in the initial phase of transmission 20. Others have found an indirect, positive effect of average temperature on the spread of the SARS-CoV-2 virus due to increased mobility of people at higher temperatures 21. These findings are particularly concerning at present in the southern hemisphere, which is entering winter: lower temperatures are expected in the coming months, which could drive an upsurge of the disease. Atmospheric pressure was the second relevant variable and it was negatively related to the spread of the SARS-CoV-2 virus. The link between atmospheric pressure and the spread of the SARS-CoV-2 virus has been studied in several countries 20-26, since atmospheric pressure is responsible for air movement (wind), cloud formation, precipitation and humidity. Therefore, this variable has a strong influence on climatic variation, generating favourable conditions for the virus spread in some cases (drought and light wind) but not in others (high humidity and strong wind).
Others have provided evidence for a direct link between atmospheric pressure and the virus spread, indicating that the unusual persistence of an anticyclonic atmospheric situation (i.e., abnormally strong positive phase of the North Atlantic and Arctic oscillation) in southwestern Europe, centered in Spain and Italy during February 2020, generated conditions of drought and light wind that could have favored the faster spread of the virus compared to other European countries 22. This is reinforced by the positive correlation found between atmospheric pressure and the frequency of COVID-19 cases in Mozambique 25, and with several spread parameters (infection rate, effective reproduction number and compound growth rate) in 487 cities in the United States 23. Such a positive relationship could be related to an increase in fog associated with high pressure, which increases the humidity of the air and surfaces. However, other studies have found an inverse link between atmospheric pressure and the spread of the SARS-CoV-2 virus in Singapore and China 20,26, which could be explained by the fact that high pressures can limit the suspension time of viral particles in the environment 26. Indirectly, atmospheric pressure could also reduce the virus spread by limiting people's mobility 21. Overall, there is no consensus on the link between atmospheric pressure and the spread of the SARS-CoV-2 virus, since there is evidence for both direct and inverse correlations, even within the same country (e.g., Italy) 24.
The negative relationship that we observed between relative infection rate and relative humidity is consistent with previous evidence that high relative humidity reduces the viability of the virus 27,28 and transmission rates 7,29. Similarly, high relative humidity has been reported to reduce the survival of the influenza virus 30 and the incidence of this disease 8. Environmental humidity can affect viral transmission through its interaction with respiratory droplets, which act as virus containers and can remain longer in dry air 31,32. Additionally, high relative humidity leads to inactivation of the viral lipid membrane, and consequently to a decrease in the virus stability and transmission 33,34. However, one study found a direct link between average relative humidity and the SARS-CoV-2 basic reproductive ratio in China 26. Again, relative humidity can indirectly contribute to the spread of the SARS-CoV-2 virus through its influence on people's mobility 21.
In conclusion, our study shows that climate plays a key role in the transmission of COVID-19 in Chile, a country that comprises a particularly high variation of environmental conditions. Importantly, it is highly likely that the climatic conditions expected for the coming months in the southern hemisphere (i.e., lower temperature, humidity and atmospheric pressure) will favour faster disease transmission. Our study and others providing information about how climatic factors can influence the spread of the disease may serve as the basis for predictive models of COVID-19 transmission through space and time, which will be highly relevant to decision-making and management of the disease.
The Chilean population is 19.11 million inhabitants, of which 51% are women and 49% men. Life expectancy is 83 years for women and 78 years for men; 68.7% of the population is between 15 and 64 years old and 11.9% is over 65 years old. Of all inhabitants, 88% live in urban areas, and the estimated international migration rate is 12 per thousand inhabitants. Indigenous or native groups account for 13% of the population; of these, 80% are Mapuche, 7% Aymara and 4% Diaguita 37. The population is aging as a result of the decline in fertility and the increase in life expectancy 38.
Chile has 16 administrative regions 3539 , of which the Metropolitana region concentrates the largest population (7.1 million inhabitants), followed by the Valparaíso region (1.8 million inhabitants). In contrast, the Aysén and Magallanes regions, located in the southern extreme of Chile, have the smallest populations (< 200,000 inhabitants). Inhabitants over 65 years old live mainly in the areas with mediterranean climate, in the cities of Santiago, Valparaíso and Concepción, and correspond to 6.28% of the total employed inhabitants in the country 40. By 2050 the total population is projected to reach 21.6 million (i.e., an increase of 15.3% compared to 2020) under assumptions of birth and immigration surpassing mortality and emigration, with inhabitants over 65 years old predicted to exceed 3 million (25% of the population) 38.

COVID-19 transmission data and predictive variables.
We characterized COVID-19 transmission in Chile from February 23 to August 16, 2020, based on the mean relative infection rate [IR; (number of infected inhabitants per week / total population) × 100,000] of 360 cities. Data were obtained from official sources of the Government of Chile 41. We extracted daily climatic data from the databases of 159 meteorological stations in Chile 42, corresponding to cities with and without presence of COVID-19, for the same period; these data were averaged per week to make them comparable with the variables quantifying the disease transmission. The climatic variables, recorded every 1 h at the meteorological stations, were extracted and averaged weekly, and expressed as follows: weekly average, maximum and minimum atmospheric temperature (°C); weekly average relative humidity (%; moisture content (i.e., water vapor) of the atmosphere, expressed as a percentage of the amount of moisture that can be retained by the atmosphere (moisture-holding capacity) at a given temperature and pressure without condensation) 43, absolute humidity (g m⁻³), accumulated precipitation (mm), atmospheric pressure (mbar), ultraviolet solar radiation (MJ m⁻²) and wind speed (km h⁻¹). Additionally, we obtained data for other relevant environmental, demographic and geographic variables, which were averaged and expressed as follows: air pollutant data, including particulate matter with aerodynamic diameter ≤ 10 μm (PM10) and ≤ 2.5 μm (PM2.5), obtained from a database of 30 air quality stations 44; and city area (km²), population size (ind), population density (ind km⁻²), latitude (absolute degrees), longitude (absolute degrees) and altitude (m a.s.l.), obtained from CONAF 45 and IDE Chile 46.
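The response variable and the weekly aggregation of hourly station readings can be sketched as follows. This is a minimal Python illustration rather than the authors' code: the function names and sample numbers are hypothetical, and grouping by ISO calendar week is an assumption, since the text does not state how week boundaries were defined.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def relative_infection_rate(new_cases_in_week, population):
    """IR = (infected inhabitants per week / total population) x 100,000."""
    if population <= 0:
        raise ValueError("population must be positive")
    return new_cases_in_week / population * 100_000


def weekly_means(hourly_records):
    """Average (timestamp, value) readings by ISO year and ISO week.

    Assumption: weeks follow the ISO calendar (Monday to Sunday).
    """
    buckets = defaultdict(list)
    for timestamp, value in hourly_records:
        iso = timestamp.isocalendar()
        buckets[(iso[0], iso[1])].append(value)
    return {week: mean(values) for week, values in sorted(buckets.items())}


# Hypothetical city: 150 new cases in one week, 200,000 inhabitants.
ir = relative_infection_rate(150, 200_000)

# Hypothetical hourly temperatures (deg C) spanning two ISO weeks.
readings = [
    (datetime(2020, 3, 1, 23), 18.0),  # Sunday, ISO week 9
    (datetime(2020, 3, 2, 0), 16.0),   # Monday, ISO week 10
    (datetime(2020, 3, 2, 1), 14.0),   # Monday, ISO week 10
]
weekly_temperature = weekly_means(readings)
```

Weekly maxima and minima would follow the same grouping with `max` and `min` in place of `mean`.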
Statistical analyses. To analyze time lags in the transmission of the virus, three databases were built using environmental information with different time lags of contagion with respect to the response variable (IR): (a) 0 days, (b) 7 days and (c) 14 days. Each database was subjected to an exploratory analysis, which allowed the identification of missing, influential or out-of-range data, and the elimination of variables that were non-influential and/or highly correlated with others. This reduced dimensionality by eliminating redundant information. For this, a correlation matrix was constructed and a boosting model was fitted using the VIF criterion 47. The final variable selection was modelled through extreme gradient boosting (XGBoost 48). XGBoost is a machine learning library based on a gradient boosting algorithm proposed by Chen 48, which processes the data sequentially with a loss or cost function, minimizing the error iteration after iteration and increasing predictive power compared with other sequential tree models. We used VIFs to identify multicollinearity 46 and data were normalized, which is important for machine-learning estimators. Dataset records were shuffled and split into 80% for training and 20% for testing.
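The data preparation steps in this paragraph (lagging, normalization and the shuffled 80/20 split) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: lags are expressed in whole weeks (0, 7 and 14 days become 0, 1 and 2 weeks) because the data are weekly; min-max scaling stands in for the unspecified normalization; and the function names, seed and sample values are hypothetical.

```python
import random


def lag_pairs(env_by_week, ir_by_week, lag_weeks):
    """Pair each week's IR with the environmental value lag_weeks earlier."""
    pairs = []
    for week in sorted(ir_by_week):
        source_week = week - lag_weeks
        if source_week in env_by_week:
            pairs.append((env_by_week[source_week], ir_by_week[week]))
    return pairs


def min_max_normalize(values):
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def shuffle_split(records, train_fraction=0.8, seed=0):
    """Shuffle a copy of the records, then split into train and test."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical weekly minimum temperature and IR for one city.
tmin_by_week = {1: 10.0, 2: 8.0, 3: 6.0, 4: 4.0}
ir_by_week = {1: 5.0, 2: 7.0, 3: 9.0, 4: 12.0}

lagged_7_days = lag_pairs(tmin_by_week, ir_by_week, lag_weeks=1)
scaled_tmin = min_max_normalize([4.0, 6.0, 10.0])
train, test = shuffle_split(range(10))
```

Weeks without a matching earlier environmental record are dropped, so each lagged database is slightly shorter than the unlagged one.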
The model was trained with the original parameters, adjusting the depth (max_depth) between 1 and 10 49. Once the best iteration was identified, we proceeded to predict on the validation set. The model was tuned using hyperparameters (Table 2). We used a parallelized grid search on hyperparameters with threefold cross-validation to find the best model based on root mean square error (RMSE), R² and mean absolute error (MAE). The xgboost and caret libraries of the R software 50 were used for the analyses. Once the most important variables were selected, a spatial representation of each of them and of the climates was generated for each city, adapted from Sarricolea et al. 36. For this, the existing spatial coverage in shape format in the national database of the Spatial Data Infrastructure IDE-Chile 46 was used. ArcMap software version 10.8.1 (ESRI Inc., Redlands, California, USA) was used for the management and analysis of spatial data.
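The evaluation metrics and the cross-validated grid search can be illustrated with the following Python sketch. It is not the authors' caret workflow: the search runs serially rather than in parallel, the best model is chosen by mean RMSE alone, folds are formed by interleaving indices of already-shuffled records, and `shrunken_mean` is a hypothetical stand-in model used only to keep the example self-contained.

```python
from itertools import product
from statistics import mean


def rmse(y_true, y_pred):
    """Root mean square error."""
    return mean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]) ** 0.5


def mae(y_true, y_pred):
    """Mean absolute error."""
    return mean([abs(t - p) for t, p in zip(y_true, y_pred)])


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def k_fold_indices(n, k=3):
    """Yield (train_idx, test_idx); fold i holds indices i, i + k, ..."""
    for fold in range(k):
        test_idx = list(range(fold, n, k))
        held_out = set(test_idx)
        yield [i for i in range(n) if i not in held_out], test_idx


def grid_search(param_grid, fit_predict, X, y, k=3):
    """Return (best_params, best_score) by mean cross-validated RMSE."""
    names = sorted(param_grid)
    best = None
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        scores = []
        for train_idx, test_idx in k_fold_indices(len(y), k):
            preds = fit_predict(
                params,
                [X[i] for i in train_idx],
                [y[i] for i in train_idx],
                [X[i] for i in test_idx],
            )
            scores.append(rmse([y[i] for i in test_idx], preds))
        score = mean(scores)
        if best is None or score < best[1]:
            best = (params, score)
    return best


def shrunken_mean(params, X_train, y_train, X_test):
    """Hypothetical stand-in model: predicts a scaled training mean."""
    return [params["shrink"] * mean(y_train) for _ in X_test]


# Hypothetical data: six records with a constant response.
X_demo = [[0.0]] * 6
y_demo = [10.0] * 6
best_params, best_score = grid_search(
    {"shrink": [0.5, 1.0]}, shrunken_mean, X_demo, y_demo, k=3
)
```

In practice `fit_predict` would wrap the boosting model, and the grid would span the hyperparameters listed in Table 2, for example max_depth from 1 to 10.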

Data availability
The raw data was supplied as a Supplementary Information File.