Cardiac dyspnea risk zones in the South of France identified by geo-pollution trends study

The incidence of cardiac dyspnea (CD) and the distribution of pollution in the south of France suggests that environmental pollution may have a role in disease triggering. CD is a hallmark symptom of heart failure leading to reduced ability to function and engage in activities of daily living. To show the impact of short-term pollution exposure on the increment of CD emergency room visits, we collected pollutants and climate measurements on a daily basis and 43,400 events of CD in the Région Sud from 2013 to 2018. We used a distributed lag non-linear model (DLNM) to assess the association between air pollution and CD events. We divided the region in 357 zones to reconciliate environmental and emergency room visits data. We applied the DLNM on the entire region, on zones grouped by pollution trends and on singular zones. Each pollutant has a significant effect on triggering CD. Depending on the pollutant, we identified four shapes of exposure curves to describe the impact of pollution on CD events: early and late effect for NO2; U-shape and rainbow-shape (or inverted U) for O3; all the four shapes for PM10. In the biggest cities, O3 has the most significant association along with the PM10. In the west side, a delayed effect triggered by PM10 was found. Zones along the main highway are mostly affected by NO2 pollution with an increase of the association for a period up to 9 days after the pollution peak. Our results can be used by local authorities to set up specific prevention policies, public alerts that adapt to the different zones and support public health prediction-making. We developed a user-friendly web application called Health, Environment in PACA Region Tool (HEART) to collect our results. HEART will allow citizens, researchers and local authorities to monitor the impact of pollution trends on local public health.

www.nature.com/scientificreports/ hospital admissions and mortality rate, causing mainly respiratory and cardiovascular diseases, including CD 4-6 , whereas a long-term exposure reduces life expectancy [7][8][9][10][11] . Bourdrel et al. 12 showed that long-term exposure leads to an average increase of 11% in cardiovascular mortality for an annual increase of 10 µg/m 3 in particulate matter less than 2.5 diameter (PM2.5). Increased hospitalizations and risk of mortality upon chronic exposure to PM2.5 was also found in two epidemiological studies 13,14 . Similarly, cardiovascular mortality increases for long-term and short-term exposures at dioxide nitrogen (NO 2 ) 15 . Ozone (O 3 ) also has effects on health 16 . Raza et al. 17 highlighted that short-term exposure to O 3 is associated with a high risk of cardiac arrest.
Although the relationship between short-term exposure to air pollution and several cardiovascular pathologies, such as acute myocardial infarction and congestive heart failure 18 , is widely established; the link regarding CD is not yet fully demonstrated.
The aim of this study is to shed a light on CD causation mechanisms and to develop prevention policies for health care by combining pollutants, environmental factors and CD hospitalization events.

Methods
Study design. The Région Sud (previously known as Provence-Alpes-Côte d' Azur, PACA region) is one of the eighteen administrative regions of France with a population of around 5 millions of people covering a territory of 31,400 km 2 . It encloses six departments: Alpes-de-Haute-Provence (04), Alpes-Maritimes (06), Bouchesdu-Rhône (13), Hautes-Alpes (05), Var (83) and Vaucluse (84) (Fig. 1A). The peculiarity of this region is that it presents a wide variety of landscapes: the Alps mountains, plains and coastal zones where the majority of the population lives. The region is subjected to different pollutants with extreme local specificity due to different distribution of industrial and agricultural activities, urbanization and traffic all over the territory. Furthermore, a demographic imbalance is also present, with 1 out of 4 citizens older that 60 years old. Altogether these characteristics make the Région Sud a suitable model to study the relationship between CD and pollution.

Emergency room visits and environmental data.
Our study is based on two different databases. The emergency room visits data, provided by the agency ORUPACA (https:// ies-sud. fr/) that consists of 43,400 recorded events for individuals aged between 20 and 115 years old, who attended one of the 47 emergency centers of the Région Sud with symptoms related to CD between 2013 and 2018. Inclusion criteria include only patients with International Classification of Disease, version 10, (ICD-10, https:// icd. who. int/ brows e10/ 2019/ en#/) emergency codes: "I50" for heart failure and "I48" for atrial fibrillation and flutter (Table 1). We decided to include atrial fibrillation and flutter and not acute coronary syndrome because an atrial dysrhythmia is a frequent cause of heart failure decompensation. On the other hand, acute coronary syndrome presentation is rarely an isolated dyspnea and the main clinical presentation is chest pain, even if ischemia might be the cause of cardiac decompensation. Pre-anonymized patient data are available, including age, gender and residential postal code (Table 2).
Environmental data, provided by the agency AtmoSud (https:// www. atmos ud. org/), are composed by daily pollution and meteorological factors measurements over the period from 2013 to 2018. AtmoSud relies on a set of eighty static and mobile sensors which measure daily pollutants and meteorological variables. Regarding the pollutants, hourly measurements are taken during the day, but only the maximum value for NO 2 , and O 3 , and the average value for PM10 are registered. Then, using the sophisticated model Chimere 19 to interpolate sensors measurements, a detailed distribution map with a resolution of 4 km for each of these variables is released. Specifically, the region is divided in squares of 4 km, each square represents a measure, constituting a mesh of 1995 and 1967 squares for pollution and meteorological variables respectively. The environmental data are composed by 3 pollutants: the daily maximum observed of NO 2 and O 3 and the daily average value of PM10, and 4 meteorological factors, namely: the daily maximal observed temperature (T) and pressure (P) and the difference between the maximal and minimal value ( T and P).
Pre-processing and merging. Missing values and outliers of environmental data. The environmental database contained missing data, specifically there were 43 days missing (2%) for the pollution data and 22 (1%) for the meteorological data over 2,191 days between 2013 and 2018. Outliers were also detected and were processed as missing data. An outlier is an abnormal value, namely a value that can impossibly be feasible for the associated variable. Two methods were used: interpolation and random sampling. The interpolation method was used for variables with a trend or seasonality (O 3 , PM10, P, T) and the random sampling method for variables with a random distribution (NO 2 ), see Supplementary Information for details.
Merging emergency room visits and environmental data. The three databases had different resolutions: nonoverlapping meshes of 4 km squares for pollution (Fig. 1B, insert I) and meteorological variables (Fig. 1B, insert II), and postal codes for emergency room visits data (Fig. 1B, insert III). In order to associate pollutant and meteorological measurements to each emergency room visit, we first needed to change the data resolution of the three databases. Thus, we created 357 zones (Fig. 1B, insert IV) of non-overlapping postal codes with associated a vector of 10 variables, namely: patient id, age, gender, maximal value for NO 2 , O 3 , T, P, daily temperature amplitude ( T ) and daily pressure amplitude ( P ) and the average value for PM10. Finally, we filtered out 8 zones for which no CD events were registered during the period under study, for a total of 349 zones (Fig. 1B, insert V). Once the bases were reconciled, each emergency room visit has been associated to the environmental data of its residential zone. The database reconciliation procedure is fully explained in Supplementary Information.    20 . The DLNM allows to take into account the non-linear and delayed effects of environmental variables on CD events. It is usually employed for time series data. It is based on the definition of a « cross-basis » function, it describes simultaneously the shape of the relationship along both the space of the predictor and the lag dimension of its occurrence. We defined the cross-basis functions of environmental variables and their lags as natural cubic splines in order to take into account the trend and seasonality of the data. To choose the values of the parameters, we used as reference the zone corresponding to the city of Marseille because it has the highest number of CD events during the studied period. We tested several values of degree of freedom, between 2 and 5, for environmental variables and their lags, and 2 to 8 for time function. Among the meteorological variables, to avoid correlated variables, we included in our model only the maximum of amplitude of pressure and temperature and the maximum of temperature and pressure or the average of pressure and temperature. The selection was done according to the minimization of Q-AIC. The degree of freedom selected are: time function (2), (2), ΔP (2). The logarithm of the number of people for each zone (called here "population") is also included in the model. The number of habitants by INSEE code is found on website "notreterritoire.maregionsud. fr" and we calculated the population for our defined 357 zones. Since data were available only for the years up to 2016, we assumed the population to be constant for 2017 and 2018. The association between environmental variables and CD events is studied on a period of 14 days (lags). The model applied is a generalized linear model with quasi-Poisson distribution. The explained variable is the number of CD by date and by zone. We represented the association between the outcome and predictors by the relative risk (RR) or overall effect for a specific increase above the threshold at each lag. Specifically, for each pollutant, we selected four levels of exposure from the World Health Organization (WHO) (https:// www. who. int/) air quality guidelines, namely: "good", "moderate", "lightly", "heavily" (intervals are indicated in Table 3). Then we applied the statistical model based on DLNM to assess the RR for each of the four defined level when data are available for the studied period, compared to the basal reference value for each pollutant, for a period of 14 days before the event. For meteorological variables we divided the distribution in four percentiles (1st, 10th, 90th, 99th) and the RR was assessed compared to median value (intervals are indicated in Table 4). Since both high and low temperature can have an effect on the RR, to measure this association we chose the median because by definition is the value separating the higher half from the lower half of data distribution.
Global model on the entire region. We performed a multivariate meta-analysis with the DLNM 21 on the previously defined zones in order to identify global trends on the risks of CD by pollutants at the region level (Fig. 1B,

insert VI).
Global model on the zones grouped by pollution trends. To take into account local fine structures of pollution distribution, we employed the co-clustering algorithm "multiFunLBM" 22 on the environmental dataset in order to identify groups of zones with homogeneous trends of pollutions (Fig. 1B, insert VII). To define groups of zones with similar trends of pollution, we used a specific co-clustering method developed for time series with R package funLBM. The algorithm takes into account 2 dimensions: space and time. The analysis is done by week of 7 days (313 weeks for the 6 years). The aim is to identify zones and weeks having similar trends of environmental variables distributions. Parameters used are: the basis function is Fourrier, commonly used function for time series of periodic data, the type of initialization is funFEM. According to the Integrated Completed Likelihook (ICL) criterion, 6 clusters are obtained. The cluster 1 corresponds to countryside (69 zones), cluster 2 to coast-side/medium size cities (23 zones), cluster 3 to mountains (31 zones), cluster 4 to coast-side/ big cities (17 zones), cluster 5 to highway (58 zones) and cluster 6 to west side (53 zones). Finally, we applied the same model at the pollution clusters level.
Local model on the singular zones. In order to identify specific local trends, we applied the DLNM singularly on each zone (Fig. 1B, insert VIII). Finally, we selected 23 zones with the highest number of CD events, representatives of each pollution cluster to explore the relationship between pollution and CD at the local level.
Web application HEART . In order to collect and to easily access to the results of this study, we developed a web application called HEART: Health-Environment PACA Region Tool. HEART can be easily downloaded at https:// github. com/ UCA-MSI/ HEART.

Results
Overall characteristics of emergency room visits data and environmental database. The emergency room visits data is composed by 43,400 CD events collected by the 47 emergency centers of Région Sud in the period from 1st January 2013 to 31st December 2018 reporting an event of CD with symptoms related to either heart failure (I50 codes, 98% of the events) or atrial fibrillation (I48 codes, 2% of the events) ( Table 1). A summary of emergency room visits data characteristic is reported in Table 2.
We normalized the number of events by the total population by year and by zone and we reported the incidence of CD events in Région Sud in Fig. 2A. Indeed, only a small percentage of the population is concerned, slightly increasing over the period of the study. As expected, we observed a higher incidence of CD events in wintertime compared with summer (Fig. 2B) and no specific prevalence due to gender (Fig. 2C). The distribution of CD events by age follows the expected pattern: very few events among people younger than 70 years old with a narrow peak for people aged more than 90 years old (Fig. 2D). Finally, the distribution of CD events by department and by zone are reported in Fig. 2E,F, respectively. Overall similar percentages are observed, specifically the department of Alpes-de-Haute-Provence (04) located to the Alps reported the highest percentage of events (0.30%), along with the department of Var (83) on the coast (0.24%) while the lowest percentage is observed in the department of Bouches-du-Rhône (13) (0.14%).  Supplementary Fig. 1. We observe that O 3 has a periodic behavior with peaks during summertime, PM10 has a less strong seasonality and NO 2 does not show any periodicity. As expected, temperature has a seasonal pattern, while pressure shows a month periodic trend. No particular trend is reported for temperature and pressure amplitudes. The average values of each pollutant by zone, reported in Supplementary Fig. 2, highlight the strong spatial distribution of

Global model on the entire region finds significant associations between meteorological factors and CD events.
We applied the DLNM model globally on the entire region and we measured the relative risk (RR) of having a CD event after exposure to pollution and meteorological factors (see "Methods" for details on the model set-up). Results are reported by exposure curves to show global trends and by single day lag-response confidence intervals (CI) to show significance of RR. Overall, no significant associations were found for NO 2 and PM10 ( Supplementary Fig. 4A,C, respectively) on RR of CD when the model is applied globally on the entire region. Slightly significant associations are observed for O 3 when levels raise above "heavily polluted" compared to reference level, increasing the RR up to 1.06 (95% CI: 1-1.15), appearing at lag 0 and decreasing until no significant at lag 3 ( Supplementary Fig. 4B). These results suggest that the global model is not suitable for our data due to the different local distribution of pollution all over the region.
On the other side, regarding meteorological variables, the global model showed good performances, due to a more homogeneous distribution over the territory. As expected, low temperatures (1st percentile) increase the RR of CD progressively from 1.01 (95% CI: 1.00-1.05) at lag 3 up to 1.04 (95% CI: 1.02-1.06) at lag 14 ( Fig. 3A  www.nature.com/scientificreports/ and Supplementary Fig. 4D) as well as high pressures (99th percentile) from lag 3 up to lag 12 [RR between 1.02 (95% CI: 1.00-1.04) and 1.04 (95% CI: 1.01-1.06)] ( Fig. 3B and Supplementary Fig. 4E). High values of temperature amplitude (above the 99th percentile) increase RR of CD up to 1.04 (95% CI: 1.00-1.06) from lag 10 to lag 14 ( Fig. 3C and Supplementary Fig. 4F). Finally, a slight association is observed only for high values of pressure amplitude (99th percentile) at lag 0 yielding a RR of 1.02 (95% CI: 1.00-1.04) up to lag 10 ( Fig. 3D and Supplementary Fig. 4G). Overall, low temperature, high pressure and high amplitude values for these two parameters have a significant association on the RR of CD.

Global model by pollution clusters identified significant associations between pollution and CD events.
As already showed, the global model did not find any conclusive relationship between pollution exposure and CD events. We reasoned that the different territorial distribution of pollutants was not properly considered in the global analysis ( Supplementary Fig. 2). Thus, we took advantage of our previous work on the analysis of the environmental data in Région Sud 22 , in which we grouped the 357 zones in 6 clusters having homogeneous trends of pollution (Fig. 1B, insert VII and Fig. 4A), to exploit the impact of pollution on CD events. Curves of exposure and CI for significant RR values by lag for each cluster and pollutant are reported in Fig. 4 and supplementary table 2, respectively. Exposure curves for NO 2 show a significant impact on RR only for cluster 6, that collects zones of the west side of the region, when the level overcomes the heavily polluted threshold, with linear trend increasing by lag (Fig. 4B).  Colour code: light blue, cluster 1 (internal zones on the mountains); light green, cluster 3 (mountains); dark green, cluster 4 (big cities on the coast); pink, cluster 5 (highway); red, cluster 6 (west side); grey, zones not included. (B) Lag-specific effect on CD events for increases of NO2 levels over the two thresholds as indicated in the legend in the plot for zones in cluster 6 (west side). Lag-specific effect on CD events for increases of O 3 levels over the thresholds indicated in the legend in the plot for zones in cluster 5 (highway) (C), cluster 1 (internal zones on the mountains) (D) and cluster 3 (mountains) (E). (F) Lag-specific effect on CD events for increases of PM10 over the thresholds indicated in the legend in the plot for zones in cluster 4 (big cities on the coast). Maps were created with the R package maptools v1.1-1. www.nature.com/scientificreports/ For cluster 5, the association on RR becomes slightly significant with the increasing of pollution levels between lag 6 and lag 10 (Fig. 4C). An immediate association up to lag 2 is observed when O 3 level raises above lightly for cluster 1 (internal zones on the mountains) (Fig. 4D) and lightly + for cluster 3 and up to lag 4 when O 3 level raises above good, moderate (Fig. 4E).
PM10 has the highest impact on RR in cluster 4 with an immediate association at lag 0 increasing significantly the RR up to 1.1 (95% CI: 1.00-1.3) regardless the level of pollution (Fig. 4F).
Overall, these results pointed out that RR of CD is affected by pollution levels, with trends and lags that depends by local territorial characteristics.
Local model by singular zones identified specific exposure curves trends. Previous results, although significant, are mild. In order to exploit more how local pollution affects the RR of CD, we applied the DLNM statistical model singularly to each of the 251 zones of Région Sud with at least 18 CD events. Then we selected 23 zones with the highest number of CD events in order to guarantee a geographic representativity of the region peculiarities. Results for pollution variables on these zones are reported in Fig. 5, supplementary Fig. 5 and supplementary table 3. The study of several zones allowed the identification of common patterns of exposure curves characteristics of each pollutant. Therefore, four cumulative exposure curves shapes were identified: early and late effect for NO 2 ; rainbow-shape (inverted U) and U-shape for O 3 ; the four for PM10 (Fig. 5,  supplementary Fig. 5). NO 2 has a significant association with CD emergency room visits only in some zones belonging mainly to cluster 5, namely: "Grasse", "Le Luc" (supplementary Fig. 5A), and mildly for "Salon de Provence" with two exposure patterns, namely "early effect" and "late effect" (Fig. 5A). This result is consistent with the characteristics of NO 2 pollution, mainly due to fuel combustion from motor vehicles and present in densely populated zones, and of the zones collected in these clusters passing by the main highway of the region. Notably, only for the zone "Nice" of cluster 4, NO 2 increases over the level heavily polluted yielding a RR of 1.08 (95% CI: 1.00-1.15) at lag 0 (Fig. 5A). The presence of the airport along with densely populated area with elevated traffic might support this result.

Discussion and conclusions
This is the first study of pollution associations on CD events done in Région Sud and, as far as we know, the first study to cover an entire region. We collected pollutants (NO 2 , O 3 , and PM10) and climate measurements (Temperature and Pressure) on a daily basis and 43,400 events of CD in the Région Sud from 2013 to 2018. We present evidences that the three pollutants are associated with significant increment in CD emergency room visits. Four shapes of exposure curves were identified: early and late effect for NO 2 ; rainbow-shape (inverted U) and U-shape for O 3 ; all the four shapes for PM10. Importantly, our approach effectively proposes pollutants monitoring as tool for prediction-making for public health policies.
In this study, we have found significant associations between short-term exposure to NO 2 , O 3 , PM10 and CD events at the zone level. The strongest effect on CD incidence was observed for PM10, as for previous studies reported on other cardiovascular illnesses [23][24][25] . This finding is supported by biological studies suggesting that particulate matter tends to deposit in pulmonary alveoli more than gaseous pollutants, triggering extra-pulmonary effects 13 . Our study reported a positive association of CD events with O 3 , as already showed for acute coronary events 26 . We observed a weak relationship between NO 2 and CD, as previously showed for acute myocardial infarction failure [18][19][20][21][22][23][24] .
We found that not only extreme pollution levels could increase the risk of CD, but also moderate and good levels depending on the zone of the region. Importantly, the DLNM model used, depicted the non-linear and delayed influence of pollution exposure on CD events. Indeed, we observed four different shapes of exposure curves describing the relationship between pollutants and CD incidence, with important consequences for prevention and treatment. The early-effect curves suggest that the RR increases immediately after exposure and decreases after a certain period until no association on CD is observed. Contrarily, for late effect curves, the opposite scenario is to be expected. U-shaped relationship means that both immediate and late effects on the RR of CD events should be expected. Rainbow-shaped curves yield a slightly delayed effect on CD after the pollution peak, that persists for a certain period (depending on the amplitude of the significant lags). Accordingly, specific alerts can be set up by local authorities to prevent CD events.
One of the limitations of this study could be the resolution of the results: according to the General Data Protection Regulation, only patients' residential postal code was available although data are anonymized. Thus, in order to associate pollution measurements to each emergency room visit, we had to change the resolution of pollution data from a mesh of 4 km squares to the zone level, yielding merged pollution values for zones covered by more than one square. This may have led an underestimation of exposure associations. However, since most of the zones matched with only one square, data aggregation is expected to play only a minor role in this underestimation. For the same reason, we cannot distinguish repeated emergency room visits from singular events. Multiple visits of the same individual would suggests an exacerbation of the pathology and might be affected by pollution differently. However the period covered by this study is relatively short to have several repeated visits for CD except for chronic CD that are excluded since they have a different ICD-10 code.
By contrast, some strengths should be discussed. First, the data on CD were collected from all the 47 emergency centers of the region allowing complete coverage and the sample size is quite big to achieve good statistical performances. Secondly, the availability of the exact date of CD events and daily measurements of pollutants allowed the temporal alignment between air pollution exposure and CD incidence. Thirdly, we set up a statistical model that can be applied at different scales, from the entire region up to single cities/zones.
In order to collect and to easily access to the results of this study, we developed a web application called HEART: Health-Environment PACA Region Tool. HEART can be used by local authorities in order to set up specific policies to lower down pollution or for public alerts when pollution raise above secure levels for the citizens that adapt to the different zones. Furthermore, it can help to identify periods of the years that are particularly affected by pollution, in order to set up alerts and prevention behaviors for each specific zone.
In conclusion, the value of this study goes beyond the results obtained for the Région Sud, because the statistical model we set up, could be expanded to study other regions or zones and/or the impact of pollution on other pathologies.