Spatial analysis of cholangiocarcinoma in relation to diabetes mellitus and Opisthorchis viverrini infection in Northeast Thailand

Cholangiocarcinoma (CCA) exhibits a heightened incidence in regions with a high prevalence of Opisthorchis viverrini infection, with previous studies suggesting an association with diabetes mellitus (DM). Our study aimed to investigate the spatial distribution of CCA in relation to O. viverrini infection and DM within high-risk populations in Northeast Thailand. Participants from 20 provinces underwent CCA screening through the Cholangiocarcinoma Screening and Care Program between 2013 and 2019. Health questionnaires collected data on O. viverrini infection and DM, while ultrasonography confirmed CCA diagnoses through histopathology. Multiple zero-inflated Poisson regression, accounting for covariates like age and gender, assessed associations of O. viverrini infection and DM with CCA. Bayesian spatial analysis methods explored spatial relationships. Among 263,588 participants, O. viverrini infection, DM, and CCA prevalence were 32.37%, 8.22%, and 0.36%, respectively. The raw standardized morbidity ratios for CCA was notably elevated in the Northeast’s lower and upper regions. Coexistence of O. viverrini infection and DM correlated with CCA, particularly in males and those aged over 60 years, with a distribution along the Chi, Mun, and Songkhram Rivers. Our findings emphasize the association of the spatial distribution of O. viverrini infection and DM with high-risk CCA areas in Northeast Thailand. Thus, prioritizing CCA screening in regions with elevated O. viverrini infection and DM prevalence is recommended.

dietary habits 14 .This parasitic infection triggers chronic inflammation, resulting in oxidative DNA damage to the infected biliary epithelium and subsequent malignant transformation 1 .Notably, recent investigations have indicated a declining trend in the incidence of CCA in Thailand.Both males and females have experienced a significant reduction in the incidence rate, marking a positive shift in the epidemiological landscape 15 .
The incidence of CCA exhibits significant regional variability, with distinct risk factors identified across diverse populations.Previous studies in Thailand have outlined specific risk factors for CCA, including O. viverrini infection, the consumption of raw cyprinid fish, a family history of cancer, the use of praziquantel (PZQ), hepatitis C virus, hepatitis B virus, cirrhosis, obesity, alcohol consumption, tobacco smoking, and diabetes mellitus (DM) 2,16,17 .Furthermore, a recent study conducted in northeastern Thailand has shed light on an intriguing correlation.This research uncovered that individual infected with O. viverrini not only faced an increased risk of CCA but that this risk was further amplified in individuals who also had DM 18 .This dual association suggests a potential synergistic effect between O. viverrini infection and DM in elevating the susceptibility to CCA.
While prior research has explored the association of various factors with the development of CCA, a noteworthy revelation pertains to the connection between the combined presence of O. viverrini infection and DM and the development of CCA.Despite these insights, the scarcity of spatial analyses aimed at pinpointing geographic areas with a high prevalence of CCA, while considering associated factors, remains evident.This gap is particularly pronounced in regions characterized by a heightened prevalence of both O. viverrini infection and DM.
Extensive research has delved into the spatial analysis of CCA concerning O. viverrini infection and DM in Northeast Thailand.The heightened prevalence of liver and biliary tract abnormalities in this region, predominantly attributed to the distribution of O. viverrini, has been a focal point of spatial analysis studies 19,20 .These investigations aim to unravel the geographical distribution and clustering of CCA cases in Northeast Thailand, offering valuable insights into the environmental and regional factors contributing to the CCA prevalence 20 .These studies have played a pivotal role in emphasizing the pressing need for eradicating the liver fluke and identifying high-risk populations to effectively address the elevated incidence of CCA in the region 20 .Additionally, the investigation into the correlation between DM and CCA has garnered attention, as DM is considered a potential risk factor for CCA 18,[21][22][23] .Previous studies have suggested that individuals with DM were more likely to be incident cases of CCA 24 , and another study demonstrated an association between DM and shorter survival among CCA patients 25 .Understanding the spatial distribution of CCA cases in relation to DM and O. viverrini infection is paramount for crafting targeted prevention and control strategies.Moreover, this knowledge is crucial for enhancing early detection and treatment outcomes, particularly in high-risk areas such as Northeast Thailand 18,26 .This region is endemic for O. viverrini, the primary cause of abnormalities in the biliary system examined in this study and a key factor on the etiological pathway leading to the development of CCA 27 .
Geographic Information System (GIS) analysis has emerged as a prevalent tool in epidemiological research, facilitating the identification of disease clustering, spatial patterns, and the analysis of interactions between environmental factors and health.Nevertheless, the authors are not aware of any studies employing these methods in the investigation of the distribution of CCA, DM, and the history of O. viverrini infection.Therefore, our study aimed to explore the spatial distribution of CCA in relation to O. viverrini infection and DM within high-risk populations in Northeast Thailand.

Study area
The study was conducted in Northeast Thailand, situated between latitudes 14.50°N and 17.50°N, and longitudes 102.12°E and 104.90°E, encompassing an approximate area of 168,854 km 2 (Fig. 1).This region comprises 20 provinces (Supplementary Fig. 1), and data for this study was systematically gathered from all 2678 sub-districts distributed across these provinces.

Study design and data sources
The cross-sectional study was reported in accordance with the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement.The data for this study were obtained from the Cholangiocarcinoma Screening and Care Program (CASCAP) in northeastern Thailand.CASCAP represents the pioneering initiative for CCA screening within a high-risk population, employing a community-based, bottom-up approach 27 .The CASCAP screening program endeavors to encompass all residents of northeastern Thailand, achieving this through various methods and settings, such as tertiary-care hospitals, district-level hospitals, and mobile screening sessions at the sub-district level.For this study, we included all participants who underwent CCA screening and were subsequently enrolled in the CASCAP database.

Study population
Northeastern Thailand boasts a population of approximately 21 million individuals, accounting for roughly onethird of Thailand's total population.Our study encompasses all individuals who participated in CASCAP across 20 provinces and received information regarding their history of O. viverrini infection and diagnosis of DM.Specifically, our study sample comprises individuals who were enrolled in the CASCAP database between 2013 and 2019, totaling 263,588 subjects.Population data for the sub-districts during the study period (2013-2019) were acquired from the Thailand Official Statistics Registration Systems website (https:// stat.bora.dopa.go.th).

Study outcome and independent variables
The primary outcome in this study was the diagnosis of CCA, which was determined through a combination of ultrasonography (US), computed tomography (CT)/magnetic resonance imaging (MRI), and histopathological examination.Positive US findings were defined as those indicating liver masses and/or bile duct dilatation, www.nature.com/scientificreports/indicative of suspected CCA.Individuals falling into this category were subsequently referred for either CT or MRI scans.The results of the CT/MRI scans were categorized as either positive or negative for CCA.For confirmation of positive CCA cases, biopsy procedures were performed.Participants who did not exhibit suspicious findings on US, CT/MRI scans, or had histologically negative results were classified as negative for CCA.The diagnosis of CCA was only confirmed through biopsy results.The independent variables in this study encompassed the key factors of interest, namely the history of O. viverrini infection and the diagnosis of DM, both of which were categorized into two groups (negative/positive).Additionally, the covariates considered for analysis consisted of gender and age at the time of enrollment.These factors were obtained through interviews with study participants using the CASCAP questionnaires.

Statistical analysis
The prevalence of CCA, O. viverrini infection, and DM was calculated as percentages, with the number of positive cases as the numerator and the total number of participants as the denominator.In order to assess differences in the prevalence of CCA across different categorical factors, we utilized the chi-square test statistic, employing a significance level of 0.05.The prevalence is reported along with its corresponding 95% confidence interval (CI).This CI is determined by calculating the lower bound as the proportion value minus the product of the z-score for the 95% CI (1.96) and the standard error (SE).The SE is computed as the square root of the proportion value multiplied by one minus the proportion value, divided by the number of samples.Subsequently, the upper bound is obtained by the proportion value plus the product of the z-score and the SE.These values are then multiplied by 100 and presented as the 95% CI of the prevalence.All statistical analyses were conducted using STATA version 18 (StataCorp, College Station, TX, USA).
We accounted for differences in population sizes by calculating raw standardized morbidity ratios (SMR) for the overall positive of CCA in each sub-district of Northeast Thailand from 2013 to 2019, using the following formula: where Y i represents the SMR in sub-district i, O i stands for the observed number of cases for CCA in the subdistrict, and E i represents the expected number of cases in the sub-district over the study period.We calculated www.nature.com/scientificreports/ the expected number of cases by multiplying the population of each sub-district by the overall crude rates of these conditions during the study period 28 .

Spatial analysis
This analysis covered a region that included 2678 sub-districts in Northeast Thailand.The administrative boundaries for these sub-districts were defined using polygon shapefiles obtained from the DIVA-GIS website (www.diva-gis.org).The spatial dataset, which included SMR for CCA were imported into ArcGIS Pro software version 3.2 (ESRI Inc., Redlands, CA, USA) and projected onto the Universal Transverse Mercator (UTM) coordinate system zone 48 N. The polyline shapefiles representing main water resources in Northeast Thailand were obtained from the DIVA-GIS website (www.diva-gis.org).The administrative boundary map encompassed a total of 2678 sub-district-level areas 20 .We identified cases where O. viverrini infection history was present alongside CCA or DM, cases where DM co-occurred with CCA, and cases where all three conditions coexisted as OVCCA, DMCCA, and OVDMCCA, respectively.These cases, along with the spatial distributions of demographic data, were integrated into ArcGIS Pro software.The datasets were geospatially linked based on the administrative boundary map of Northeast Thailand.This allowed us to aggregate and extract data by sub-district areas and create variables for subsequent statistical analyses.The process of linking and analyzing the data followed established procedures previously employed in spatial analyses within this region 20 .The cut-off points reported in the figure legends were determined using the Natural Breaks (Jenks) classification method in the ArcGIS Pro software.
We initially conducted bivariate Poisson regression analyses to explore associations between CCA cases and various independent variables, including gender, age group, history of O. viverrini infection and DM.This step aimed to identify covariates supported by evidence from prior articles, indicating associations with the development of CCA or demonstrating significant relationships with CCA prevalence (with a threshold of P value < 0.25) in the bivariate models.Variables meeting this criterion were subsequently incorporated into the final models.To assess collinearity among all included variables, we performed Pearson correlation analyses.All preliminary statistical analyses were carried out using STATA software version 18.0 (Stata Corporation, College Station, TX, USA).

Spatial poisson regression analysis
We conducted a Spatial Poisson regression analysis in our study.To address the presence of zero cases in some sub-districts for the CCA, we opted for a zero-inflated Poisson (ZIP) regression model rather than a standard Poisson regression.The ZIP regression was implemented using a Bayesian framework in WinBUGS software version 1.4.3,developed by the Medical Research Council in Cambridge, UK.Three distinct models were constructed to account for different sources of variability.The first model, referred to as Model I, incorporated unstructured random effects for gender, age groups, history of O. viverrini infection, and DM.In the second model, Model II, we introduced spatially structured random effects alongside the same covariates as Model I. Finally, the third model, Model III, was a convolution model that included both spatially structured and unstructured random effects, while maintaining the same covariate set as the previous two models.
The convolution model, with an outcome of observed cases of CCA (numbers), Y, for the ith sub-district in Northeast Thailand (i = 1 to 2678), for the jth gender group and kth age group was structured as follows: where E ijk is the expected number of CCA cases (offsetting population size) in sub-district i, gender j, age group k, μ ijk is the parameter of distribution, θ ijk is the mean log relative risk (RR) and ω ∈ [0,1] relates to proportion of extra zero; α is the intercept, and β 1 , β 2 , β 3, and β 4 are the coefficients for age (≤ 60 years as the reference category), gender (female as the reference category), O. viverrini infection history, DM; u i and s i are the unstructured and structured random effects (assuming a mean of zero and variances of σ u 2 and σ s 2 respectively).In the modelling the spatially structured random effects, we implemented a conditional autoregressive (CAR) prior structure.This structure was employed to account for spatial relationships among sub-districts within the study area.To represent these relationships, we utilized an adjacency weights matrix, where a weight of 1 denoted sub-districts sharing a border, and 0 indicated no shared border.For the model's priors, we employed a flat distribution for the intercept and a normal distribution for the coefficients, with a mean of zero and a precision (the inverse of variance) set at 0.0001 20 .The precision of both unstructured and spatially structured random effects was modelled using non-informative gamma distributions, characterized by shape and scale parameters of 0.001.These methodologies have previously been applied in spatial analyses of CCA in Northeast Thailand.
We initiated the analysis by discarding the initial 10,000 iterations as a burn-in period.Subsequently, we executed sets of 15,000 iterations and scrutinized them for signs of convergence in the Monte Carlo chains.This process was iterated three times, with the outcomes of the first two iterations being discarded.At this www.nature.com/scientificreports/juncture, we conducted a visual examination of the posterior density plots to ascertain that convergence had indeed been achieved.The final model selection hinged on striking a balance between model fit and parsimony.We employed the deviance information criterion (DIC) to guide our decision-making, ultimately selecting the model that yielded the lowest DIC score.To establish statistical significance of the covariates, we set an α-level of 0.05, determined by analyzing the 95% credible intervals (CrI) for mean posterior relative risks (RRs).Variables were considered statistically significant if the 95% CrI for the RRs did not encompass 1 20,29 .Finally, to visually represent the spatial distribution of posterior means for both unstructured and structured random effects, we utilized ArcGIS Pro software to generate maps.

Ethical considerations
The research protocol was approved by Khon Kaen University Ethics Committee for Human Research, reference number HE631061.The data were provided from the CASCAP.The CASCAP data collection was conducted according to the principles of Good Clinical Practice, the Declaration of Helsinki, and national laws and regulations about clinical studies.It was approved by the Khon Kaen University Ethics Committee for Human Research under the reference number HE551404.All patients gave written informed consent for the study.

Descriptive summary
A total of 263,588 participants undergoing screening for CCA were included in our study.Approximately twothirds of the participants were females (60.7%), with the majority being aged 60 years or less, reflecting a mean age of 55.7 (standard deviation = 9.2) years (Table 1).The prevalence of O. viverrini infection was 32.37% (95% CI: 32.19-32.54),and the prevalence of those diagnosed with DM was 8.22% (95% CI: 8.12-8.33).Over the seven-year period under examination, the overall crude incidence rate of CCA was calculated to be 2.64 cases per 100,000 population.At the sub-district level, the incidence rates exhibited considerable variation, spanning from 0 to 38 cases per 100,000 population.Particularly noteworthy were the substantial spatial disparities observed in the incidence rate of CCA among different sub-districts.Figure 2 visually depicts the spatial distribution of SMR for CCA in Northeast Thailand at the sub-district level.

Discussion
This study stands as a pioneering endeavor, being the first to undertake a comprehensive spatial analysis of CCA concerning its correlation with DM and O. viverrini infection.Notably, these health factors are recognized as significant and prevalent issues in the Northeast Thailand.By exploring the spatial dynamics of CCA in conjunction with DM and O. viverrini, this research contributes to a deeper understanding of the interplay between these health concerns, shedding light on potential geographic patterns and correlations that can inform targeted interventions and public health strategies in Northeast Thailand.Our study revealed an association between O. viverrini infection and individuals diagnosed with DM and CCA.These observed effects remained significant even after adjusting for the influence of gender and age, both identified as significant covariates in this investigation.This discovery aligns with prior research indicating that gender and age factors are linked to CCA, particularly among males and the elderly 18,26,30,31 .
The outcomes of our investigation indicate a widespread prevalence of CCA, O. viverrini infection, and DM across nearly all regions in northeastern Thailand.This aligns with prior research conducted in 2011, which identified a strong association between O. viverrini infection and DM with CCA 18 .Furthermore, a prospective follow-up study in 2020, involving O. viverrini-positive subjects in Thailand, revealed a significant increase in serum levels of HbA1c and HDL during the follow-up period 32 .The spatial distribution of CCA is conspicuously apparent in the upper, middle, and specific lower regions of the studied area.This distribution aligns with a previous spatial analysis of CCA in northeast Thailand conducted in 2019, which identified significant clusters of CCA in the upper part (Nong Bua Lamphu, Udon Thani, Sakon Nakhon, and Nakhon Phanom Provinces), middle part (Maha Sarakham and Khon Kaen Provinces), and lower part (Chiyaphum Province) 20 .
On the contrary, O. viverrini infection and DM exhibit a widespread distribution throughout the region, consistent with a study conducted in 2021 that demonstrated a high proportion of O. viverrini infection and DM in a high-risk group of CCA in northeastern Thailand 18 .Moreover, in our study, the spatial distribution of CCA, O. viverrini infection, and DM was notably elevated, particularly in proximity to main water sources such as the Mekong, Songkhram, Chi, and Mun rivers.This finding aligns with previous research associating the prevalence of O. viverrini with populations residing near water sources.This observed pattern corresponds with earlier investigations, such as a 2019 study that identified a large high-risk area cluster of CCA in provinces located in the Chi and Mun River Basin, with more isolated high-risk sub-districts found in provinces situated in the Mekong River Basin 20 .Additionally, a study in 2020 revealed a significant association between geographic factors and O. viverrini infection 33 .Furthermore, a comprehensive review in 2018, which systematically searched five international and seven Thai research databases to identify studies relevant to risk factors for CCA in the Lower Mekong Region, recommended addressing O. viverrini infection as a key strategy to mitigate CCA incidence in affected regions 34 .
Our findings on Bayesian spatial analysis revealed a pronounced cluster of CCA in Maha Sarakham Province, which is situated between the Chi and Mun rivers in the middle part of the region.Subsequent notable clusters were identified in Khon Kaen and Kalasin, which are also located in the middle part, as well as provinces in the upper part (Nong Bua Lamphu, Udon Thani, Nakhon Phanom, and Sakon Nakhon), and provinces in the lower part (Surin and Chiyaphum).These findings align with prior research that has similarly identified the prevalence and spatial distribution of CCA in the northeastern region of Thailand 19,20 .
One limitation of our study pertains to the data collection method for factors related to O. viverrini infection and DM, which relied on participant interviews without the physical examinations or diagnostic procedures.This approach introduces the potential for data reliability issues.Furthermore, this information bias could result in underestimation or overestimation of study results, potentially impacting the overall reliability of the study's conclusions.However, the substantial size of our study's sample mitigates this concern, as the large sample size contributes to reduced variance in the data.Consequently, this diminishes the likelihood of estimation errors and strengthens the reliability of our findings, promoting more conclusive and robust outcomes.

Conclusions
Our findings elucidate a pervasive distribution of CCA, O. viverrini infection, and DM across the entirety of the northeastern region of Thailand, exhibiting a congruent spatial pattern.Specifically, regions characterized by a high distribution of CCA often coincide with elevated occurrences of both O. viverrini infection and DM.Consequently, prioritizing CCA screening initiatives in areas exhibiting high rates of O. viverrini infection and prevalent DM should be accorded top priority.This targeted approach aligns with the observed spatial relationships, optimizing resource allocation for effective and region-specific health interventions.

Figure 1 .
Figure 1.Map of the study area.Map was created using ArcGIS Pro software version 3.2 (ESRI: https:// www.esri.com/ en-us/ home).

Figure 2 .
Figure 2. Raw standardized morbidity ratios of cholangiocarcinoma by sub-districts in Northeast Thailand.Map was created using ArcGIS Pro software version 3.2 (ESRI: https:// www.esri.com/ en-us/ home).

Figure 4 .
Figure 4. Spatial distributions of the posterior means of cholangiocarcinoma in Northeast Thailand in Model III: (A) Spatially structured random effects and (B) Spatially unstructured random effects.Maps were created using ArcGIS Pro software version 3.2 (ESRI: https:// www.esri.com/ en-us/ home).

Table 1 .
Baseline characteristics and health history of participants in the CASCAP study 2013-2019 Percentage-The percentage per 100.

Table 3
presents the Bayesian spatial and non-spatial models applied to CCA.Based on the DIC statistics, the most fitting model was identified as Model III, incorporating both unstructured and structured random effects.Within Model III, gender, age, O. viverrini infection history, and DM exhibited significant associations with CCA.Males were found to be 2.52 times more likely to have CCA compared to females (95% CrI: 2.20-2.90).

Table 2 .
Prevalence of cholangiocarcinoma for each factor.N/A, Not applicable; 95% CI, 95% Confidence interval of the prevalence; Prevalence, The prevalence per 100; P value, Probability values from chi-square tests.