Geographic variation in cardiometabolic risk factor prevalence explained by area-level disadvantage in the Illawarra-Shoalhaven region of NSW, Australia

Cardiometabolic risk factors (CMRFs) demonstrate significant geographic variation in their distribution. This study aims to quantify the general contextual effect of areas on CMRFs, and the geographic variation explained by area-level socioeconomic disadvantage. A cross-sectional design and multilevel logistic regression methods were adopted. Data included objectively measured routine pathology test data between 2012 and 2017 on: fasting blood sugar level; glycated haemoglobin; total cholesterol; high density lipoprotein; urinary albumin creatinine ratio; estimated glomerular filtration rate; and body mass index. The 2011 Australian census based Index of Relative Socioeconomic Disadvantage (IRSD) was the area-level study variable, analysed at its smallest geographic unit of reporting. A total of 1,132,029 CMRF test results from 256,525 individuals were analysed. After adjusting for individual-level covariates, all CMRFs were significantly associated with IRSD, and the probability of higher risk CMRFs increased with greater area-level disadvantage. Though the specific contribution of IRSD to the geographic variation of CMRFs ranged between 57.8% and 14.71%, the general contextual effect of areas was found to be minimal (ICCs 0.6–3.4%). The results support universal interventions proportional to the need and disadvantage level of populations for the prevention and control of CMRFs, rather than any area-specific interventions, as the contextual effects were found to be minimal in the study region.

It is often reported that socioeconomically disadvantaged individuals in Australia, on average, experience a greater disease burden than their less disadvantaged counterparts [25][26][27][28] . This tendency is also evident at a contextual level in studies that have investigated the association of CMRFs with area-level socioeconomic disadvantage in Australia 5,7 and globally 4,9,10,12,14,16-18,20,23,29-33 . Consistent with this, men from highly urbanised environments have been reported to have a higher incidence of coronary heart disease with increasing residential area socioeconomic disadvantage, after adjusting for individual characteristics 18 . Also, lower area-level disadvantage has been reported to be associated with a lower prevalence of some behavioural cardiac risk factors, such as smoking, physical inactivity and obesity, in some studies 9,10,34 . Most of the reported associations of CMRFs with area-level socioeconomic disadvantage were independent of individual-level characteristics such as age and educational attainment. Even though the area-level associations of CMRFs were significant in these studies, the results often depended on the CMRF analysed, the measure of area-level socioeconomic disadvantage and the geographic scale at which associations were examined 35 .
Multilevel analyses of CMRFs based on the average measures of association or variation alone are insufficient to report the geographical variance as similar associations were possible with very different scenarios of area variance 36 . Multilevel findings extending on the general contextual effects and reporting the proportion of the total area-level variance along with the measures of clustering and the average measures of association or variation are appropriate and informative in reporting area-level influences, but less common 23,[36][37][38] . To differentiate the relative importance of individual versus area-level interventions for the prevention and control of CMRFs, the geographical component of the total individual risk variance has to be identified in a multilevel approach.
Therefore, the aims of this study are to (1) quantify the general contextual or geographic effect of areas on CMRFs, over and above their individual-level compositions; and to (2) quantify the geographic variation across multiple CMRFs specifically explained by area-level socioeconomic disadvantage, within the Illawarra-Shoalhaven region of NSW Australia. Quantification of the general contextual effect and the variation specifically explained by area-level socioeconomic disadvantage will assist our understanding of the socioeconomic context of CMRFs in the study region and provide guidance for health service commissioning more generally nationally.

Methods
A cross-sectional multilevel design was adopted to account for the hierarchical nature of the data and analyses. Informed consent was not obtained for the individual-level data used in this study, as the study used existing data which were already de-identified. The study was approved by the University of Wollongong and Illawarra and Shoalhaven Local Health District Health and Medical Human Research Ethics Committee (HREC protocol No: 2017/124). All methods and analyses were performed in accordance with the relevant ethical guidelines and regulations of the committee.
Study area and data. The study was conducted in the Illawarra-Shoalhaven region of the state of New South Wales (NSW), Australia. The Illawarra-Shoalhaven region is a coastal plain along the south-east border of NSW; it is situated immediately south of the metropolitan boundaries of Sydney and encompasses multiple regional cities, towns and rural areas. The region covers a land area of 5,615 km², and had an estimated residential population of 369,469 at the time of the 2011 Australian Census of Population and Housing conducted by the Australian Bureau of Statistics (ABS) 39 . Statistical Area level 1 (SA1), the smallest geographical unit of the 2011 census data release, was the area-level unit of analysis in this study 39 . SA1s typically have a population size of 200 to 800 persons (average 400), and the Illawarra-Shoalhaven region covers a total of 980 conterminous SA1s 39 .
The CMRF test data in this study were extracted from the Southern IML Research (SIMLR) Study database, which is comprised of de-identified and internally linked pathology results from a major network of pathology services in the study region. The individual-level data in SIMLR database are geocoded to their corresponding SA1 areas, but not to their residential address, for privacy and confidentiality concerns. More details on this data source, procurement and access are published elsewhere 7 . The CMRF test data were extracted for non-pregnant individuals aged 18 years or older presenting for testing between 01 January 2012 and December 2017. Only the most recent test result was included if an individual had undergone the same test multiple times in this data period. Test data with missing details on the individual and area-level factors analysed in this study were excluded from the analyses.
Variables. Outcome variable. Results of the CMRF tests were the individual-level outcome variables. Data on the seven CMRF tests analysed in this study included: fasting blood sugar level (FBSL); glycated haemoglobin (HbA1c); total cholesterol (TC); high density lipoprotein (HDL); urinary albumin creatinine ratio (ACR); estimated glomerular filtration rate (eGFR); and objectively-measured body mass index (BMI). These CMRF test results were dichotomised into higher risk and lower risk values based on the current national and international guidelines on risk definitions (Table 1).
Study variable. The 2011 ABS census based Index of Relative Socioeconomic Disadvantage (IRSD) of the SA1s was the study variable. IRSD summarises a range of measures of relative socioeconomic disadvantage of people and households within SA1s and includes: level of income; education; employment; family structure; disability; housing; transportation; and internet connection 45 . This study uses IRSD reported as quintiles, with the lowest quintile (Q1) indicating the most disadvantaged SA1s and the highest quintile (Q5) the least disadvantaged SA1s 45 . The IRSD quintiles in the study were derived by ABS from the distribution of IRSD scores for the Illawarra-Shoalhaven region based on the 2011 census. The study region has a diverse IRSD profile with representation across IRSD scores in comparison with Australia as a whole, making the region useful for population-level studies 46 .
In the multilevel logistic regression models, y_ij denotes the binary response of the CMRF test outcome (as 'higher risk' or 'lower risk', based on the adopted definitions) for individual i in area (SA1) j; π_ij denotes the probability that individual i in area (SA1) j has a 'higher risk' CMRF test outcome given their individual-level age_ij and sex_ij, and their area-level IRSD index. β1, β2 and β3 are the regression coefficients, which measure the associations between the log-odds of the CMRF outcome and each covariate, all else equal; when exponentiated, they are translated to odds ratios (ORs) 36 . u_j is the random effect for area (SA1) j, and τ²_u is the area-level variance, which has to be estimated.
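The symbol definitions above are consistent with a two-level random-intercept logistic regression. As a sketch reconstructed from those definitions only, the model can be written as below. The intercept β0, the normality of u_j, and the treatment of age, sex and IRSD as single terms rather than sets of category indicators are assumptions and are not taken from the source.

```latex
\begin{aligned}
y_{ij} &\sim \mathrm{Bernoulli}(\pi_{ij}),\\
\operatorname{logit}(\pi_{ij}) = \log\!\left(\frac{\pi_{ij}}{1-\pi_{ij}}\right)
  &= \beta_0 + \beta_1\,\mathrm{age}_{ij} + \beta_2\,\mathrm{sex}_{ij} + \beta_3\,\mathrm{IRSD}_{j} + u_j,\\
u_j &\sim N\!\left(0,\ \tau_u^2\right).
\end{aligned}
```

Under this form, exp(β3) would be read as the odds ratio associated with the area-level IRSD term, holding age and sex fixed and conditional on the area random effect.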
Model comparison. The Akaike Information Criterion (AIC) was used to evaluate model fit. The derived multilevel models were compared for: area-level variance (τ²) at the SA1 level (level 2); proportional change in variance (PCV); intra-cluster correlation coefficients (ICCs); median odds ratios (MORs); area under the receiver operating characteristic curve (AUC); and the change in AUC. The τ² values were first obtained from each of the multilevel models. PCVs were calculated for models M2 to M4 relative to M1. The ICCs of the fitted models were calculated using the latent variable approach 47 . This approach assumes that a latent continuous outcome underlies the observed dichotomous outcome, and it is for this latent outcome that the ICC is calculated and interpreted. The ICC measures the expected correlation in CMRF outcomes between two individuals from the same SA1. The higher the ICC, the more relevant the area-level context is for understanding individual latent outcome variation 36 . The MOR was calculated as an alternative way of interpreting the magnitude of the area-level variance. The MOR translates the area-level variance, which is estimated on the log-odds scale, to the commonly used OR scale. The MOR is interpreted as the median increase in the odds of the outcome if an individual moves to another SA1 with higher risk. Thus, the higher the MOR, the greater the general area-level effect; it equals 1 in the absence of area-level variance 36 . The general contextual effect of the geographic areas on the higher risk CMRFs, over and above their individual-level composition, was obtained through the measure of clustering (ICC) in the M2 models. The geographic variance and ICC in the null models (M1) of higher risk CMRFs may depend on both contextual and individual-level variables. Therefore, the M2 models of the higher risk CMRFs, which adjusted for individual-level attributes, are better suited to provide information on the 'general contextual effect' of the areas. The unique contribution of the area-level study variable (IRSD) to the area-level variance of higher risk CMRFs was assessed through the PCVs between M2 and M4.
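The three variance-based measures described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code (the source used R). It assumes the usual latent-variable ICC with a level-1 variance of π²/3, the usual MOR formula exp(√(2τ²) × Φ⁻¹(0.75)), and a PCV defined so that a positive value means a reduction in variance; the source's sign convention for PCV may differ. The function names are hypothetical.

```python
import math
from statistics import NormalDist

# Assumption: the level-1 residual variance of the latent outcome is that of
# the standard logistic distribution, pi^2 / 3 (latent variable approach).
LOGISTIC_RESIDUAL_VARIANCE = math.pi ** 2 / 3


def icc_latent(tau2: float) -> float:
    """ICC on the latent scale: area variance / (area variance + pi^2/3)."""
    return tau2 / (tau2 + LOGISTIC_RESIDUAL_VARIANCE)


def median_odds_ratio(tau2: float) -> float:
    """MOR: area-level variance (log-odds scale) expressed on the OR scale."""
    z75 = NormalDist().inv_cdf(0.75)  # 75th percentile of N(0, 1), about 0.6745
    return math.exp(math.sqrt(2.0 * tau2) * z75)


def pcv(tau2_reference: float, tau2_model: float) -> float:
    """Proportional change in variance (%); positive means a reduction
    relative to the reference model (sign convention assumed here)."""
    return 100.0 * (tau2_reference - tau2_model) / tau2_reference


if __name__ == "__main__":
    tau2_null = 0.189  # null-model area variance reported for eGFR in the Results
    print(f"ICC = {100 * icc_latent(tau2_null):.1f}%")
    print(f"MOR = {median_odds_ratio(tau2_null):.2f}")
```

With the null-model variance of 0.189 reported for eGFR, the ICC function returns about 5.4%, which agrees with the unadjusted ICC given in the Results.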
The receiver operating characteristic (ROC) curves were created by plotting the true positive rate (TPR), i.e. sensitivity, against the false positive rate (FPR), i.e. 1 − specificity, for different binary classification thresholds of the predicted probabilities in all the models 48 . Post-estimation, predicted probabilities (π_ij) were calculated for each individual and used to calculate the AUC for the model. The AUCs measure the capacity of the models to correctly classify individuals with or without a higher risk CMRF outcome, as a function of their predicted probabilities 36 . AUC values range from 0.5 to 1, where 1 indicates perfect predictive discrimination and 0.5 indicates no predictive power 49 . The AUCs also indicate the general contextual effects and can be compared with the ICC and MOR values 36 . The added value of knowing an individual's area of residence, in addition to individual-level information (age and sex), can be obtained through the change in AUC of Model 2 relative to Model 0, where a larger AUC change indicates higher relevance of areas in relation to CMRFs.
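To make the AUC and AUC-change idea concrete, the following minimal Python sketch computes the AUC as the probability that a randomly chosen 'higher risk' individual receives a larger predicted probability than a randomly chosen 'lower risk' individual, with ties counted as one half. It is a plain pairwise implementation for illustration, not the pROC routine used in the study, and the function names and toy data are hypothetical.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of positives (label 1) and negatives (label 0).

    Equivalent to the area under the ROC curve; O(n_pos * n_neg), so it is
    only suitable for small illustrative data sets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


def auc_change(labels: Sequence[int],
               scores_reference: Sequence[float],
               scores_model: Sequence[float]) -> float:
    """Change in AUC of a model relative to a reference model on the same cases."""
    return auc(labels, scores_model) - auc(labels, scores_reference)


if __name__ == "__main__":
    y = [0, 0, 1, 1]                       # toy outcomes (hypothetical)
    p_single = [0.10, 0.40, 0.35, 0.80]    # toy single-level predictions
    p_multi = [0.10, 0.30, 0.35, 0.80]     # toy multilevel predictions
    print(auc(y, p_single), auc_change(y, p_single, p_multi))
```

In the study's terms, `scores_reference` would hold the predicted probabilities from the single-level model and `scores_model` those from the multilevel model that includes the area random effect.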
Statistical package. All analyses were performed using R version 3.4.4 (R Foundation for Statistical Computing, Vienna, Austria) 50 . Multilevel models were fitted using the glmer function in the lme4 package 51 , likelihood ratio tests were calculated using the lrtest function in the lmtest package 52 , and ROC curves were produced using the roc function in the pROC package 53 .

Results
A total of 1,132,029 CMRF test results belonging to 256,525 individuals were extracted for the analyses. Table 2 provides details of the missing data and the distribution of test data for each CMRF test. The most frequently missing data were the IRSD indices of SA1s in the study area for which an IRSD index was not available from the ABS 2011 census, owing to either low populations or poor data quality 54 . Tables 3 and 4 show the frequencies and relative frequencies of CMRF test results. Overall, the higher risk frequencies of all CMRFs increased with increasing area-level socioeconomic disadvantage, except for TC, which demonstrated an inverse trend.
Single and multilevel models for each of the CMRFs analysed in this study are presented in Tables 5, 6, 7, 8, 9, 10 and 11. After adjusting for the covariates, all seven CMRFs were found to be significantly associated with area-level IRSD in the study region. For all but one CMRF the associations were positive (i.e. the odds of a higher risk result increased with area-level disadvantage). TC was the exception, being inversely associated with area-level disadvantage, with the most disadvantaged quintile (Q1) displaying the lowest odds of higher risk test results. Among the covariates, there was no significant association between gender and higher risk test results for eGFR or BMI. It was also noted that the odds of higher risk eGFR test results accelerated with increasing age group, and the 80+ age group demonstrated very high odds of being identified with a higher risk eGFR test result in the study region.
The overall comparisons of model random effects are presented in Table 12. Reductions in the AIC values were observed for all CMRFs from the null model (M1) to the final model (M4), indicating a better fit for the final models. In the unadjusted null models, higher risk test results of eGFR demonstrated the most area-level variance (0.189) and TC the least (0.026). Adjusting the CMRFs for age and sex initially increased the τ² of M2 for FBSL (PCV = +1.88%), HbA1c (PCV = +3.02%), HDL (PCV = +15.25%) and BMI (PCV = +1.48%). The τ² was reduced in the final model for all CMRFs compared with the null models.
The ICCs of the unadjusted models ranged from 0.8% for high TC to 5.4% for low eGFR. Inclusion of IRSD after adjusting for age and sex reduced the ICCs of all CMRFs in the final models, which ranged from 0.4% for low eGFR to 2.0% for obesity test results. The ICCs of the final models were low and suggest very limited area-level contextual effects. The AUC changes in model 2 and the MORs of the final models support these findings.
The proportions of the geographic variance in CMRFs contributed by IRSD were estimated through the PCV between M2 and M4. Adjusting the models for IRSD and individual-level variables explained a maximum of 92.79% of the variance expressed by the null model of eGFR, reducing the ICC from 5.4% to 0.4%. The changes were smallest among the adjusted models of TC, with a marginal reduction in ICC from 0.8% to 0.5%. Thus, in the final models, the proportional reduction in variance was largest for eGFR (PCV = 92.79%) and smallest for TC (PCV = 33.27%).
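As a rough consistency check on the eGFR figures above, the arithmetic can be worked through by hand. The check assumes the latent-variable ICC with a level-1 variance of π²/3 ≈ 3.290, and a PCV defined as the proportional reduction relative to the null-model variance (0.189, reported earlier in the Results). Both are assumptions about the exact formulas used, so the values are approximate.

```latex
\begin{aligned}
\tau^2_{\mathrm{M4}} &\approx \tau^2_{\mathrm{M1}}\,(1-\mathrm{PCV}) = 0.189 \times (1-0.9279) \approx 0.0136,\\[4pt]
\mathrm{ICC}_{\mathrm{M1}} &= \frac{0.189}{0.189+\pi^2/3} \approx \frac{0.189}{3.479} \approx 5.4\%,\\[4pt]
\mathrm{ICC}_{\mathrm{M4}} &\approx \frac{0.0136}{0.0136+\pi^2/3} \approx \frac{0.0136}{3.303} \approx 0.4\%.
\end{aligned}
```

Under these assumptions, the reported variance, PCV and ICC values for eGFR are mutually consistent.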
The specific contribution of IRSD to the geographic variance was highest for the higher risk findings of HDL tests (57.8%), closely followed by FBSL (57.14%), HbA1c (53.31%) and ACR (51.17%) test results. The contribution of IRSD was comparatively lower, though not negligible, for the higher risk findings of eGFR (41.75%), BMI (41.06%) and TC (14.71%) test results. Even though these proportions are large, it should be noted that they are shares of very small area-level variances (i.e., variances of 0.01-0.07).

Discussion
The study reports on the influence of areas on the distribution of higher risk CMRFs and quantifies the specific proportion of geographic variance explained by IRSD. The work adds to the very few studies which consider multiple CMRF variables within the same region, or which are based on population derived data over extended years 16,17,20,29,31,32 , and which report on both single and multilevel analyses 38,55 . The results present both the measures of association and the area-level variance based on multilevel logistic regression analyses 36 . The findings of the study add to the existing evidence and discussion regarding the relevance of individual versus area-level interventions for the prevention and control of CMRFs.
We found consistent evidence for the association between area-level disadvantage and seven CMRFs among adult residents using health services in the Illawarra-Shoalhaven region of NSW, Australia. In adjusted models, the odds of a higher risk finding increased with increasing area-level disadvantage for all CMRFs except TC, which showed an inverse pattern of association. Thus, in the final models we observed that, over and above individual age and sex, living in a disadvantaged neighbourhood proportionally increased the individual-level probability of being identified with a higher risk CMRF. The findings highlight the importance of including area-level variables in health risk analyses.
The ICCs of the CMRFs were comparatively small in all the models (Table 12). In the fully adjusted models, the ICCs were further reduced and ranged between 0.4% for low eGFR and 2.0% for BMI. As per the interpretation framework proposed by Merlo et al., an ICC value of less than 10% is indicative of very little geographic difference 56 . The AUC changes of the M2 models in relation to the single level models (range 0.01-0.08) reconfirm these findings. However, this has to be interpreted along with traditional geographic comparisons such as the proportion of individuals affected by higher risk CMRF outcomes. A small geographic difference with a uniformly high, medium or low proportion of affected individuals indicates homogeneity of the higher risk CMRF findings within their geographic units 56 . Such a situation would call for balanced universal approaches to prevent and control the higher risk CMRFs, with a focus proportional to the need and disadvantage level of affected populations 57,58 . However, it is also worth noting that when exposure to an agent is homogeneous in a community, traditional epidemiological methods are not very helpful in identifying markers of susceptibility 59 .
Our results confirm, and are comparable with, associations between area-level disadvantage and CMRFs reported in previous studies, and extend their findings. The results primarily confirm the geographic variation of CMRFs and their associations with area-level disadvantage, as reported in previous studies. Further, the study provides a means to compare this association, which was observed consistently across the range of CMRFs analysed. The study extends previous reports by differentiating the individual and area-level contributors to the geographic variance of CMRFs. Most importantly, the general contextual effect and the specific contributions of IRSD to the geographic variance of multiple CMRFs were identified, which is unique in the literature and highly informative for health care service commissioning.
The TC test results often stood apart from the major findings of this study, demonstrating inverse associations with IRSD. However, this was not reflected in the HDL findings, even though both are components of an individual's lipid profile. This raises the possibility of a medication effect on TC in these areas, as lipid lowering drugs have a less consistent effect in raising HDL than in lowering TC 60 [...] and poor diet quality 66,67 . However, the reason for the inverse association demonstrated by the TC test results is not clearly established within the current study results, and further research is required to explore possible individual and area-level contributions.
The study has to be considered within its limitations. Primarily, the cross-sectional nature of the analyses adopted in this study does not support any causal relationships. In addition, the non-linear and time varying effects of the covariates analysed in this study restrict the generalisability of the findings, although they remain very informative for regional health care service commissioning. Secondly, the IRSD quintiles included as the key explanatory variable represent relative disadvantage in an area and have limitations intrinsic to aggregate measures. Thirdly, it should be noted that the data used in this study were extracted from people already utilising the health care service facilities in the area. Fourthly, readers should be mindful that the variances reported in this study are attributable to (1) individual-level factors (age, sex) analysed at the area level, (2) area-level contextual influences (IRSD), and (3) other individual and area-level characteristics not considered in this study. However, further individual-level data extractions or collections are not possible with this study's dataset, as the de-identification process precludes the inclusion of any further individual-level data.
Other individual and area-level factors not considered in this study could include: individual-level SES 68 , type of neighbourhood food outlets [69][70][71][72] , poor physical activity resources 73,74 , residential density and service availability 75 . Finally, the standard multilevel logistic regression modelling methods adopted in this study cannot account for autocorrelation (if any) of the area-level residuals of the models. An expected shortcoming of this is an overestimation of the random effects in our models 76 . However, any such effects appear to be very marginal in our results, as the random effect estimates are already at their lower limits. While acknowledging this limitation, we believe its effects are not critical to our results. Hybrid models which provide more precise estimates of random effects are becoming increasingly available with advances in computational technologies 77 . However, they would not be directly applicable to our data sets, mainly because location-specific data at the individual level are not available in our study data.
Notwithstanding these limitations, the study is unique in that it analysed a range of CMRFs across a widely dispersed population and included both rural and urban residents. In addition, the study used six years (2012-2017) of CMRF test data from the region in the hierarchical multilevel analyses. The findings of the study indicate that those residing in the most disadvantaged areas are more likely to be identified with higher risk CMRFs than those in less disadvantaged areas. However, the low ICC, AUC change and MOR values of the area-level models do not support contextual approaches. Rather, the findings of the study support a proportionate universalism approach in which health resources are made universally available but proportional to the need and disadvantage level of the affected population 57,58 .

Conclusion
The study demonstrates that in the Illawarra-Shoalhaven region of Australia, people residing in socioeconomically disadvantaged areas have a higher probability of being identified with higher risk CMRFs across a range of factors. The low general contextual effects of the areas suggest that universal interventions, proportional to the need and disadvantage level of populations, are appropriate for the prevention and control of CMRFs in this study region. The patterns were consistent across six of the seven CMRFs analysed in this study, and comparable with similar studies reported nationally and globally. Based on our findings, we recommend further area-level research to discern the role of other contextual factors not analysed in this study, especially area-level access to health care services, to determine its existing role and adequacy 78 , and evidence based universal interventions for the prevention and control of CMRFs that are proportionate to the priority level of populations based on area-level disadvantage.