Spatial analysis of 10-year predicted risk and incident atherosclerotic cardiovascular disease: the CoLaus cohort

Whether cardiovascular risk scores geographically aggregate and inform on spatial development of atherosclerotic cardiovascular diseases (ASCVD) remains unknown. Our aim is to determine the spatial distribution of 10-year predicted cardiovascular risk and ASCVD, and to compare the overlap of the resulting spatial distributions. Using prospective data from the CoLaus|PsyCoLaus cohort study (2003–2021) we computed SCORE2 in participants free from ASCVD. Geographical distributions of predicted risk and events were determined using the Gi* Getis-Ord autocorrelation statistic. 6203 individuals (54% women, mean age 52.5 ± SD 10.7, ASCVD incidence rate 5.7%) were included. We identified clusters of high versus low predicted risk (4%, 6%, respectively) and ASCVD (5%, 5% respectively) at baseline. They persisted at follow-up. Overlap of SCORE2 and ASCVD clusters was marginal. Body-mass index and alcohol consumption explained most of the predicted risk distribution. For ASCVD, high clusters persisted or were reinforced after multivariate adjustment, while low incidence clusters were reduced, multifactorial determinants. Incidence rate of ASCVD was 2.5% higher (IC 95%, 1.4–3.7) in clusters of higher incidence of ASCVD. To develop up-to-date, geographically targeted prevention strategies, there is a need to study novel geographically risk factors affecting ASCVD and to update commonly used prediction models for a population approach.


Ethics declaration
The institutional Ethics Committee of the University of Lausanne, which later became the Ethics Commission of Canton Vaud (www.cer-vd.ch) approved the CoLaus|PsyCoLaus study (www.cer-vd.ch; project number PB_2018-00038, reference 239/09).All participants provided written informed consent.This study conforms to the Declaration of Helsinki.

Inclusion criteria
For the present study, we have included participants of the CoLaus|PsyCoLaus study without pre-existing ASCVD and with geocoding corresponding to the urban districts of Lausanne.Participants with missing data for risk calculation were excluded.Participants developing ASCVD before the onset of the follow-up were excluded from analyses corresponding to that period.

Geocoding
Participants were georeferenced at their postal address through the joint use of the Swiss Confederation's geocoding application, the GEOADMIN API (https:// api3.geo.admin.ch/) and the Register of Buildings, and Housing REGBL (https:// www.housi ng-stat.ch/ fr/ index.html).Geocoding was repeated at each follow-up.Urban districts only were considered because they were dense enough to allow for spatial analyses.

Risk prediction models
Our primary endpoint, 10-year predicted risk was computed at the first two periods of the four follow-ups that were conducted in the study, baseline (2003-2006), follow-up 1 (2009-2012), as to be able to compare its distribution with that of the 10-year incident ASCVD.We computed the Systematic Coronary Risk Evaluation Score 2 (SCORE2) and Systematic Coronary Risk Evaluation Score 2-older persons (SCORE2-OP) for participants of age 35-69, and ≥ 70 years, respectively 5,17 .Both were calibrated for a low-risk region.
SCORE2 and SCORE2-OP are sex and competing-risk adjusted predictions models, recalibrated to four defined European risk regions, to evaluate 10-year cardiovascular risk defined as non-fatal myocardial infarction and stroke in apparent healthy individuals.They have been developed by the SCORE2/SCORE2-OP working group and ESC Cardiovascular risk collaboration.The models included the following predictors: age, current smoking, history of diabetes mellitus, systolic blood pressure, and total-and HDL-cholesterol.
In this cohort, SCORE2 and SCORE2-OP were computed using the available algorithms offered by the SCORE2 working group and ESC Cardiovascular risk collaboration using the software Stata (StataCorp.2021.Stata Statistical Software: Release 17. College Station, TX: StataCorp LLC) 5,17 .
Participants suffering from diabetes, chronic renal kidney disease and possible familial hypercholesterolemia (Low-density lipoprotein cholesterol (LDL-C) > 4.9 mmol/L or Total cholesterol > 7.5 mmol/L) were reclassified according to their corresponding categories of cardiovascular risk as defined by the 2021 ESC Guidelines on cardiovascular disease prevention in clinical practice 3,18 .
Reclassified participants were assigned a predicted risk equal to the inferior threshold of their corresponding category of risk.

Incident atherosclerotic cardiovascular diseases
Ten-year incident ASCVD, as a secondary endpoint, was considered 10 years after baseline visit (2003-2006)  and follow-up 1 (2009-2012) for each participant.ASCVD was defined as the presence of either an episode of: (a) acute coronary syndromes (ACS) (acute myocardial infarction (AMI) or unstable angina); (b) sudden cardiac death; (c) symptomatic coronary artery disease (CAD) with greater than 50% stenosis (treated by percutaneous coronary intervention or coronary artery bypass graft); and (d) fatal and non-fatal ischaemic stroke (including transient ischaemic attack).
Medical records of participants who declared an incident ASCVD and/or ASCVD-related procedure were prospectively collected, as well as information on cause of death.Outcomes were independently adjudicated by

Statistical analysis
The geographical distribution of predicted risk and incident ASCVD was assessed using spatial autocorrelation measures, that enable the study of smaller areas, not delimited by administrative boundaries 10,[20][21][22] .Those methods identify clusters of areas or individuals, in which a variable of interest is more closely related to its surrounding than to regions that are more distant [23][24][25] .
Analyses were conducted using incremental fixed distance bands (200 m, 400 m, 600 m, 800 m, 1000 m, and 1200 m) as regions of interest.Additionally, assessments were repeated using a k-nearest neighbours weights method.
Autocorrelation was also stratified by sex assigned at birth, self-reported, and measured on the mean squared error between predicted probabilities or risk and observed values (ASCVD), equivalent to a Brier score (Supplementary material), to identify areas where prediction models lack performance.
To test for global spatial autocorrelation of predicted risk, overall tendency of clustering, the Global Moran's I statistic was used.Because it tends to average local variations, this method does not rule out local clusters.To test for local spatial autocorrelation, to find such clusters, Getis-Ord Gi* statistic was used for continuous and binary variables.A pseudo-p value > 0.05 based on 999 random permutations was considered.
We conducted exploratory analyses to investigate whether covariates may be associated with clusters.Concisely, spatial autocorrelation was measured before and after adjustment, using residuals of univariable and incremental multivariable regressions models 12 .A reduction of the clustering would indicate a positive dependence upon the covariates.We used both traditional OLS, logistic and geographically weighted models (Multi-scale Geographically Weighted Regression (MGWR), Geographically Weighted Regression (GWR)) that allowed relationships to vary across the study area 26 .
Such models result in multiple coefficients of regression across the area.The coefficient of regression are represented on maps, called parameter estimates surfaces, that can be found in the Supplementary material.Models fit metrics were R-squared, percent deviance explained and AICc.Multicollinearity was investigated using variance inflation factor (VIF) and local condition number (CN) (supplementary figure S1-2).
Regarding predicted risk, adjustment analyses were conducted on predicted risk in which age was fixed using the population mean age as a constant, to focus on geographical associations that were independent of age distribution, which is not a modifiable risk factor.
Covariates were selected after conduction of a scoping review on cardiovascular risk factors and according to data availability.Such variables were weekly alcohol consumption and body mass index (BMI) in model one, run on predicted risk.Low education status, Townsend deprivation index per hectare, Swiss origin and living in a couple were added in model two (socio-economic status variables) and, additionally, existence of anxiety disorder, and major depressive disorder in model three (psychological risk factors).Model four consisted in the same variables as model three with the addition of a polygenic risk score, a Mediterranean diet score and daily minutes of moderate intensity sport activity at follow-up 1. Model four included variables that were only available at one period, the first follow-up.
Models run on ASCVD and on the Brier score included the same variables used in model 4 previously described, with the addition of age, existence of an antihypertensive or lipid modifying drug, presence of either diabetes, chronic renal kidney disease, or possible familial hypercholesterolemia (labelled as comorbidity), and predicted risk with age fixed in one additional model ran on ASCVD.Covariates distribution in our sample can be found in supplementary table S1.
Additionally, Wilcoxon, non-parametric, and Chi-square tests for binary and numeric variables were conducted to calculate risk and incidence rate differences between high or low clusters and the rest of the population (supplementary table S2).
Results are presented using mean and standard deviation (SD) for normal variables, median and interquartile range (IQR) for skewed variables, parameter estimates, model fits metrics and dot maps.Overall performance of risk prediction models was evaluated using ROC curves with 95% IC, calibration plots and Brier scores.
Spatial analyses results are presented considering a 600 m bandwidth, the smallest fixed band without isolates which also displayed well-circumscribed clusters in comparison to smaller perimeters.Statistical analysis was conducted using Stata (StataCorp.2021.Stata Statistical Software: Release 17. College Station, TX: StataCorp LLC), GeoDa, 1.20 (Spatial Analysis Laboratory, University of Illinois, USA9), and MGWR (School of geographical sciences and urban planning, Arizona State University, USA).For further details, refer to Supplementary methods.

Missing data
Participants without any data for risk computation or geocoding (154, 2.4%) were excluded from analyses.To understand effect of loss of participants in between follow-ups on geographical distribution, we restricted analysis on participants present a both timelines.To account for loss of participants due to reclassification, we conducted analyses on non-reclassified predicted risk (Supplementary results).
Characteristics of excluded and included participants were described to better understand effect of missing data on sample representation.Further details are presented within the Supplementary material.www.nature.com/scientificreports/

Sensitivity analyses
All sensitivity analyses were conducted based on a 600 m bandwidth.We conducted analyses only on individuals without diabetes, chronic kidney disease or familial dyslipidaemia to assess the performance of the model in an apparent healthy population.Second, to allow for the use of the SCORE2 model only, we repeated analyses excluding participants over 70 years.Autocorrelation on difference of predicted risk between periods, and on ESC risk categories as binary variables was measured (data not shown).
Finally, to assess the impact of loss to follow-up and house moving on geographical distribution of clusters, analyses for each outcome variable of each period were conducted using iteratively geographical coordinates at baseline and at follow-up.Additionally, the latter analyses were restricted to the subset of individuals who did not move during the study to differentiate a relation mediated by house moving from one mediated by attrition.Additional sensitivity analyses (supplementary figure S3-5) are presented within the Supplementary material.
Included participants were healthier than excluded ones (supplementary figure S7, supplementary table S3).Distribution in characteristics, zip codes and covariates were similar between follow-ups (Table 1).
Spatial distribution was relatively consistent between bandwidths, types of weighting schemes (data not shown) and over time (Fig. 1).Sensitivity analyses, shown within the Supplementary material, confirmed the robustness of our results, as we found similar results within an under 70-year population, when using SCORE2 only (supplementary figure S8) or when restricting analyses to an apparent healthy population (supplementary figure S9).
Regarding the temporal changes in clusters of predicted risk, we mainly see a reinforcement of the central area on the map.These results could neither be explained by loss of participants nor house moving as shown on supplementary figure S10.The significance of this change was confirmed by further spatial auto-correlation analyses on the distribution of the difference of risk in between the two follow-ups (data not shown).www.nature.com/scientificreports/

Geographical distribution of 10-year incident atherosclerotic cardiovascular diseases
Then, we assessed the distribution of 10-year incident atherosclerotic cardiovascular diseases.We found local clusters of high (5%, pseudo-p < 0.001, at baseline and follow-up) and low (5%, pseudo-p < 0.01 at baseline, < 0.001 at follow-up) incident ASCVD (Fig. 1).Results were robust according to most of our sensitivity analyses (Supplementary material).
Between baseline and follow-up, a small cluster appeared on the centre of the map and a north cluster disappeared, while the majority eastern clusters persisted.The reinforcement was not related to house moving according to our sensitivity analyses (supplementary figure S10).We were able to recreate similar patterns at baseline by excluding participants that were lost to follow-up, suggesting the changes could be related to an attrition or exclusion mediated effect (supplementary figure S10).

Adjustment
Next, we conducted exploratory analyses to to investigate whether covariates may be associated with clusters.Regarding predicted risk, the analyses showed that clusters were greatly attenuated at baseline and follow-up after adjustment (Figs. 2, 3).The MGWR model displayed the best model fit metrics (supplementary table S7).Alcohol consumption and BMI were most closely associated with the clustering tendency (Figs. 2, 3; supplementary figure S11-13, 1, 2, supplementary table S8-9).
In our linear multivariate regression models, BMI, alcohol consumption, low education, living in couple and the Townsend index were all positively associated with cardiovascular risk (p value < 0.001).Addition of the latter socio-economic variables to a model containing BMI and alcohol did not alter the magnitude of the association of BMI and alcohol with cardiovascular risk.Conversely, adding BMI and alcohol to models comprising low education, living in couple and the Townsend index resulted in an approximately 50% reduction of the magnitude in the corresponding beta-coefficients.
Regarding incident ASCVD, attenuation was found for low clusters with similar results between GWR and OLS (Fig. 4; supplementary figure S14-20, supplementary table S10).High risk clusters persisted after adjustment, some were even reinforced (Fig. 4).Similar results were found with applying MGWR to the mean squared error between predicted probabilities and observed values (Fig. 4, supplementary figure S21, 22, supplementary table S11).
The age, the polygenic risk score and the predicted risk, displaying a non-hazardous association with ASCVD (supplementary table S12), contributed only modestly to the attenuation of low clusters in univariable models,

Geographical distribution of 10-year predicted risk
Our findings highlighted a tendency of local clustering of predicted risk, in line with Bagheri, al. using the Framingham risk equation in a cross-sectional sample of general physician patients, northwest Adelaide , South Australia, 2012 15 .In comparison, our results showed a narrower range of 10-year predicted risk.
At follow-up, we highlighted a reinforcement in the central area of the map, which was confirmed by sensitivity analyses on the difference in risk in between periods.It should be noted that the concerned central area www.nature.com/scientificreports/consists of a region with a known higher rate of smoker and lower levels of HDL-cholesterol in average, as supported by the results of Juan R. Vallarta-Robledo et al. 27 .

Geographical distribution of 10-year incident atherosclerotic cardiovascular diseases
Our results demonstrated a tendency of local clustering of ASCVD.Such clusters have already been identified in a few studies 10,11,16,28,29 .While most of clusters in our study were consistent, some changed over time in agreement with Rajabi et al. 11 , confirming the relevance of having a dynamic approach towards risk assessment.There was a 2.5% variability in incidence rate of ASCVD over 10 years between clusters of higher or lower incidence of ASCVD and the rest of the population, which is clinically meaningful considering an average 10-year incident rate of approximately 6% in the whole population.
While many clusters stayed alike, we observed shifts in patterns in between periods.Most of changes could be explained by the loss of participants.Our main hypothesis is that some participants who developed an ASCVD were excluded from the analyses in follow-ups based on selection criteria.These 'missing' participants, could then have been responsible for the shift in patterns.It is interesting to note the emergence of a central cluster of ASCVD at follow-up 1, overlapping the clusters of high predicted risk.Our interpretation is that factors comprised in SCORE2 do drive ASCVD but fail to capture the total risk over the time and in a given area.This should stem from the fact that total risk is composed of other known and unknown non-observed factors.

Adjustment
In our exploratory analyses, predicted risk clusters were mainly associated with BMI and alcohol consumption.The results are supported by the fact that the geographical distribution of regression coefficients for those variables persisted after adjustment and that the relationship between alcohol consumption, BMI and ASCVD were found in a small perimeter.
Regarding incident cardiovascular events, we found no association between all studied covariates and high-risk ASCVD clusters.In low-risk clusters, attenuation was multifactorial.In the absence of association with behavioral, psychological, socio-economic status variables and traditional cardiovascular risk factors, our hypothesis is that environmental variables, such as pollution, landscape architecture and noise, are highly likely to be the major driver of distribution of cardiovascular risk in this sample.There is a need to deepen our understanding of these novel risk factors.It should be pointed that this result might suffer from an ecological fallacy and might only apply to a similar environment.
Other studies have found a spatial association between CVD and socio-economic characteristics such as education and material deprivation 10 .The discrepancy with our results could possibly be related to a lack of variability in social economic factors in Switzerland, in fact the difference in Townsend scores between the most deprived and wealthiest areas was very narrow in our study.
Our linear multivariate models confirmed a non-spatial association between socio-economic status and cardiovascular risk and suggested it contributes to risk through BMI and alcohol consumption but not in the opposite direction.

Overlap of distribution between predicted risk and incident atherosclerotic cardiovascular diseases
Our results showed marginal overlap between predicted risk using SCORE2 and ASCVD geographical distributions.
In a previous study, Dalton, al. found that the Pooled Cohort Equations Risk Model accounted for ten percent of the spatial variability in ASCVD event rates, in a cohort in the United States 16 .Neighbourhood deprivation accounted for 32% of the variability, which was not the case in our sample.60% of the variability was unexplored in their study.
Another study by Van Rheenen, al. highlighted a marginal overlap between clusters of stroke risk factors and of ischemic stroke 30 .Quality of care associated with having a reported risk factor was hypothesized as the associated factor, however this data lacked in the study.In our study, the discrepancy was robust to adjustment for treatment of absolute risk by antihypertensive and lipid modifying agents.
Inter-cluster differences were more marked for ASCVD than predicted risk, suggesting the map of predicted risk was more homogeneous and that SCORE2 could have underperformed due to missing risk factors that are geographically dependent.www.nature.com/scientificreports/Clinicians and policymakers might be tempted to use SCORE2 or other risk prediction models, as an alternative to focusing solely on cardiovascular events, to assess spatial distribution of cardiovascular risk.Indeed, the dynamic pattern of events might hinder their effort to develop up-to-date local prevention strategies based on geographical epidemiology.This study help increase the awareness on the risk of biased results when using these tools without any previous validation for a local geographical approach.Further research should focus on replication of ASCVD distribution over the time in specific locations and on the need of employing a valid instrument to spatially predict ASCVD.

Strengths and limitations
Our study showed some limitations.First, our analysis is based on a closed cohort, thus not allowing to account for immigration over time.A serial cross-sectional design may have improved generalizability.Availability of some covariates only at certain periods and attrition could have reduced the potential to find associations.
It is also noteworthy to consider that our study is susceptible to an ecological fallacy, as results might not fully be applicable to another ecosystem with a different distribution of cardiovascular risk factors.However, this limitation can also be considered as a strength of the study.Indeed, as our results question the use of prediction risk models for a geographical approach and should incentivize policy makers to validate models locally.
This study presents several strengths.We used a population-based cohort design with a rather long follow-up.Limited data are available regarding geographical epidemiology of cardiovascular disease and our study adds to a better understanding of spatial distribution of ASCVD development.

Conclusion
Using a population-based cohort with a rather long follow-up, we identified clusters of higher and lower incidence of ASCVD, displaying a high variability in incident rate of events, up to ± 2.5% over 10 years compared to the rest of the population.
The geographical distribution of clusters of clinical risk score, mainly driven by alcohol consumption and BMI, was not congruent with that of the ASCVD, limiting their use in identifying areas where ASCVD may develop.Cluster of CVD have been shown to change overtime, underlining the opportunity that prediction tools could offer to assess cardiovascular risk factors dynamically to design up-to-date prevention strategies with a precision public health perspective.
Our results question the use of available prediction risk models for a geographical approach.Further research should focus on replication of ASCVD distribution over the time in specific locations and on the need for clinicians and policymakers of employing a valid instrument to spatially predict ASCVD.
Surprisingly, accounted variables were not associated with clusters of higher incidences of ASCVD.As unaccounted environmental predictors, such as pollution, landscape architecture and noise, might be major drivers of distribution of cardiovascular risk, there is a need to deepen our understanding of these novel risk factors. https://doi.org/10.1038/s41598-024-54900-5

Figure 1 .
Figure 1.Clusters of predicted risk (A,B), 10-year observed cardiovascular diseases (C,D) and mean squared errors of prediction (E,F).Getis-Ord Gi* statistic for the individuals geo-referenced at their postal address.White dots correspond to individual with values that do not exhibit spatial dependence.Red/Blue dots indicate individuals with a positive/negative Z score considering a p value < 0.05 (α = 0.05).This means that high or low values clusters, respectively, within a spatial lag of 600 m, are found closer together than expected if the underlying spatial process was random.

Figure 2 .
Figure 2. Clusters of predicted risk, with age fixed, unadjusted (A), adjusted for alcohol consumption (B), body mass index (C) and multivariable adjusted at baseline (2003-2006).Getis-Ord Gi* statistic for the individuals geo-referenced at their postal address.White dots correspond to individual with values that do not exhibit spatial dependence.Red/Blue dots indicate individuals with a positive/negative Z score considering a p value < 0.05 (α = 0.05).This means that high or low values clusters, respectively, within a spatial lag of 600 m, are found closer together than expected if the underlying spatial process was random.The multivariable model (MGWR) included body mass index, alcohol consumption, living in couple, low education status, Townsend index, Swiss origin, anxiety disorder and major depressive disorder, a polygenic risk score and at follow-up 1 (2009-2012) a Mediterranean diet score, and daily moderate intensity physical activity.

Figure 3 .
Figure 3. Clusters of predicted risk, with age fixed, unadjusted (A), adjusted for alcohol (B), body mass index (C) and multivariable adjusted at follow-up 1 (2009-2012).Getis-Ord Gi* statistic for the individuals georeferenced at their postal address.White dots correspond to individual with values that do not exhibit spatial dependence.Red/Blue dots indicate individuals with a positive/negative Z score considering a p value < 0.05 (α = 0.05).This means that high or low values clusters, respectively, within a spatial lag of 600 m, are found closer together than expected if the underlying spatial process was random.The multivariable model (MGWR) included body mass index, alcohol consumption, living in couple, low education status, Townsend index, Swiss origin, anxiety disorder and major depressive disorder, a polygenic risk score and at follow-up 1 (2009-2012) a Mediterranean diet score, and daily moderate intensity physical activity.

Figure 4 .
Figure 4. Clusters of 10-year observed cardiovascular diseases unadjusted (A,B) and multivariable adjusted (C,D).Getis-Ord Gi* statistic for the individuals geo-referenced at their postal address.White dots correspond to individual with values that do not exhibit spatial dependence.Red/Blue dots indicate individuals with a positive/negative Z score considering a p value < 0.05 (α = 0.05).This means that high or low values clusters, respectively, within a spatial lag of 600 m, are found closer together than expected if the underlying spatial process was random.The multivariable model (GWR) included age, comorbidity, lipid modifying and antihypertensive drug, body mass index, alcohol consumption, living in couple, low education status, Townsend index, Swiss origin, anxiety and major depressive disorder, the predicted risk with age fixed and at follow-up 1 a Mediterranean diet score, and daily moderate intensity activity.Additional models accounting for a polygenic risk score, with similar results, are shown within the supplementary figure S14, 18.

Table 1 .
Participants' characteristics at baseline(2003-2006)and follow-up 1 (2009-2012).Description with mean and SD of variables showing a normal distribution and using median, and interquartile range for skewed variables.SD a , standard deviation; IQR, interquartile range; n, number.