The use of social simulation modelling to understand adherence to diabetic retinopathy screening programs

Pereira, Andreia Penso; Macedo, João; Afonso, Ana; Laureano, Raul M. S.; de Lima Neto, Fernando Buarque

doi:10.1038/s41598-024-55517-4

Published: 29 February 2024


Scientific Reports volume 14, Article number: 4963 (2024)


Abstract

The success of screening programs depends to a large extent on the adherence of the target population. It is therefore of fundamental importance to develop computer simulation models that help to understand the factors that correlate with this adherence and to identify population groups with low adherence, so that public health strategies promoting behavioral change can be defined. Our aim is to demonstrate that screening adherence behavior can be simulated using computer models. Three versions of an agent-based model are presented, each using a different method to determine the agent’s individual decision to adhere to screening: (a) logistic regression; (b) fuzzy logic components; and (c) a combination of the two. All versions were based on real data from 271,867 calls for diabetic retinopathy screening. The simulated results are statistically very close to the real ones, which allows us to conclude that, despite a high degree of abstraction from the real data, the simulations are valid and useful as a tool to support decisions in health planning, allowing multiple scenarios to be evaluated while accounting for emergent behavior.


Introduction

Diabetic retinopathy (DR), ICD-9 code 362.0, is a complication of diabetes that causes structural changes in the blood vessels of the retina. It is currently one of the main causes of blindness in developed countries¹. As DR is asymptomatic until the later stages, patients with diabetes should have regular eye tests^1,2. Several countries have therefore implemented population-based DR screenings³.

The literature shows that the success of screening programs depends to a large extent on the adherence of the target population but, to the best of our knowledge, the behavioral mechanisms behind this phenomenon have not yet been studied (see “Literature review”).

To bridge this gap, our research focuses on developing computer simulation models that help to understand the factors that correlate with adherence rates and to identify population groups with particularly low adherence. Such models may also support decisions in health planning by evaluating multiple scenarios while accounting for emergent behavior.

In this article we focus mainly on the first step of the process: predicting the rate of adherence to population-based screenings through computational simulation models with a high level of abstraction. More specifically, this article aims to (i) demonstrate that it is possible to develop a computational simulation model that faithfully portrays the individual decision of whether or not to adhere to screening, using the intrinsic features of the diabetic patients and of the screening programs; (ii) demonstrate that a simulation model with these characteristics can be used in contexts other than the one in which the data for its development were collected; and (iii) explore the utility of combining agent-based modelling (ABM) and fuzzy logic in models intended to simulate human behavior. To this end, we developed three versions of an agent-based simulation model, using a logistic regression equation and fuzzy logic components to predict each agent's individual decision of whether or not to adhere to screening. Because the fuzzy components are integrated with the result of the logistic regression through parametrizable weights, the proposed model can be used to compare the two methods, as well as a combination of both techniques.

The first set of simulations, in which the individual decision of whether or not to adhere to screening was based on logistic regression, proved good at replicating reality and useful for staging scenarios in a specific geographic context. However, we recognize that its scalability and level of abstraction are limited, as it relies mainly on behaviors previously observed in a specific screening program²⁷. The second set of simulations overcomes these limitations, at the cost of some loss of predictive accuracy, by replacing the logistic regression equation with three fuzzy logic components to simulate individual decision-making. Finally, a third set of simulations was performed, combining the logistic regression and the fuzzy components in equal proportions.

Literature review

The first computer simulations of DR screenings date back to the 1990s and mainly used Markov chains to demonstrate the cost-effectiveness of implementing population-based screening programs^4,5,6,7. In subsequent years, with a broad consensus on the cost-effectiveness of population-based DR screenings, researchers started to focus on the analysis of different screening alternatives. Several simulation models were developed to compare screening methods^8,9,10,11,12, to assess the results of adopting different screening intervals^9,10,11,12,13,14,15,16,17, and to analyze the cost-effectiveness of telemedicine^18,19,20,21,22,23. We highlight the work of Davies and colleagues, who developed a discrete-event simulation (DES) model to stage different screening intervals^9,10,11,12 and found that the population's adherence to screening plays a decisive role in its success. However, in their subsequent models the authors continued to adopt a fixed probability of adherence, and no attempt was made to model the subjects' individual behavior^9,10,11,12. An attempt to include human behavior was made by Schmidt and Brailsford²⁴, who incorporated the Health Belief Model (HBM) into a DES model that produces a result (behavior, the output) from a combination of several factors (inputs) that influence screening adherence. In this model each patient is an individual entity with their own features, implemented as numerical attributes that represent the characteristics of the diabetics (number of times they adhered to previous screenings, perception of general health status, current DR status, information and anxiety regarding DR, and educational qualifications). Participation in screening was calculated as a binary variable, and the model uses only artificial data, so its results are theoretical artifacts that lack validation with real data. This research also stresses the difficulty of incorporating qualitative variables, such as those used by the HBM, in DES models, emphasizing the need for another type of technique²⁴. Supplementary Table S1 provides further details on the strengths and limitations of each of these studies.

The literature suggests that ABMs are a good alternative for studying systems in which individual behavior has a relevant impact, since they are composed of networks and processes formed by interactive and adaptive agents²⁶. In an ABM, the social system is represented as a set of autonomous agents capable of making decisions: in each iteration, each agent individually assesses its situation, makes decisions based on a set of rules, and then takes a certain action. Even a simple ABM can exhibit complex behavior patterns and provide valuable information on the dynamics of the real-world system it simulates^25,26. Among the main advantages of this approach, we highlight its ability to simulate different scenarios; its ability to reproduce emergent behavior that is not explained by classic theories, such as the repeated adoption of behaviors that do not lead to the best outcome; its representation of heterogeneity and of interactions between agents; and its flexibility and the possibility of following the evolution of a system²⁶.

The concept of fuzzy logic, introduced by Lotfi Zadeh in 1965, is based on the observation that human beings make decisions based on imprecise, subjective and non-numerical information^28,29. Fuzzy sets are mathematical entities designed to represent imprecise information. They give models the ability to recognize, represent, manipulate, interpret, and use vague and/or subjective information, allowing a high level of abstraction in relation to the original data²⁹. Techniques based on fuzzy logic are therefore especially suitable for simulating human behavior and have already been used successfully for this purpose³⁰. In addition, fuzzy rules can be embedded within the intelligent agents of an ABM³¹.

Methods

This research focuses on the concept of model development driven by real data³². Thus, the research process began with the important steps of identifying sources and collecting, integrating, and processing data. After this, the modelling process itself began. The following flowchart (Fig. 1) illustrates the main steps performed, which will be described in greater detail in the following subsections.

Sample and data gathering

This research used data on all calls for DR screening between the years 2013 and 2018, provided by the Portuguese Northern Region Health Administration (ARSN). Figure 2 illustrates the geographic area covered by ARSN.

The sample consists of 271,867 calls for DR screening, which correspond to 108,620 different diabetics. The following variables were used: age; gender; professional status; existence of a telephone contact for sending reminders; Health Centre Cluster (ACES); Primary Health Care Unit; type of Primary Health Care Unit; existence of a family doctor; reason for exemption from payment of charges for services, when applicable; number of consultations at the Primary Care Unit in the last 12 months; type of diabetes (I or II); Body Mass Index (BMI); glycated hemoglobin (HbA1c); month of call for screening; days elapsed between calls; number of times the diabetic was called; last screening result; and percentage of times the diabetic adhered to previous screenings. Subsequently, data from the National Institute of Statistics (INE) were used to obtain the variables “income (median)” and “educational qualifications (distribution by 7-digit postal code)”³³, as these variables are identified in the literature as closely related to the adherence rate²⁴. Geographical areas were classified according to their degree of urbanization using the Typology of Urban Areas 2014 (TIPAU 2014), available on the INE website³³.

All methods of data gathering were carried out in accordance with relevant guidelines and regulations.

The authors did not have any direct contact with the subjects participating in the study.

The data obtained from INE are publicly available and of a general nature, not allowing the identification of the subjects involved³³.

The data provided by ARSN were collected by the institution, in accordance with the legislation applicable in the Portuguese Public Administration, including informed consent from all subjects and/or their legal guardian(s).

Moreover, the data provided by ARSN for this research went through a set of mechanisms that guarantee the protection of privacy (for example encryption and anonymization), and all the procedures were duly endorsed by the ARSN ethics committee, in strict compliance with all issues related to access to Public Administration data, and the data protection regime.

The present research does not include the use of experimental protocols.

Data preparation and Statistical analysis

A large percentage of the work in data analysis involves preparing the data³². Hence, in this phase it was necessary to integrate data from different data sources and perform data cleansing: identifying impossible or incorrect values for specific variables, cases that should not be in the study (because they do not meet the inclusion criteria), duplicate cases, missing data, and outliers, while also ensuring that the same value for string variables is always written in a coherent manner throughout the data set. The SPSS Modeler 18.2 software³⁴ was used to carry out this step.
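The authors performed this step in SPSS Modeler. Purely as an illustration of the same checks in code, here is a minimal standard-library Python sketch. It is not the authors' workflow. The field names (`patient_id`, `call_date`, `age`, `gender`), the 0 to 120 plausibility range for age, and the Tukey-fence outlier rule are all hypothetical choices made for the example.

```python
import statistics


def normalize(value):
    """Write string values consistently (trimmed, case-folded); leave other types unchanged."""
    return value.strip().casefold() if isinstance(value, str) else value


def clean_calls(records, required=("patient_id", "call_date", "age")):
    """Drop records with missing required fields, impossible ages, or duplicate calls.

    Field names and the age range are hypothetical, chosen only for illustration.
    """
    seen, kept = set(), []
    for record in records:
        r = {key: normalize(val) for key, val in record.items()}
        if any(r.get(field) in (None, "") for field in required):
            continue  # missing data in a required field
        if not 0 <= r["age"] <= 120:
            continue  # impossible value (assumed plausibility range)
        key = (r["patient_id"], r["call_date"])
        if key in seen:
            continue  # duplicate call record
        seen.add(key)
        kept.append(r)
    return kept


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    return [v for v in values if v < q1 - k * spread or v > q3 + k * spread]


calls = [
    {"patient_id": "A1", "call_date": "2015-03-02", "age": 67, "gender": " Male "},
    {"patient_id": "a1", "call_date": "2015-03-02", "age": 67, "gender": "male"},  # duplicate once normalized
    {"patient_id": "B2", "call_date": "2016-05-10", "age": 150, "gender": "female"},  # impossible age
    {"patient_id": "", "call_date": "2017-01-09", "age": 58, "gender": "female"},  # missing id
]
print(clean_calls(calls))
```

Normalizing strings before checking for duplicates matters here. Without it, "A1" and "a1" would be counted as two different calls.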

In a second phase, a descriptive statistical analysis was carried out to identify the variables that best explain adherence to screening. The results of this analysis are presented in “Statistical analysis results”.

The diabetic’s individual decision

To select the data mining model, eight models were tested using SPSS Modeler 18.2 software³⁴: decision trees (C5, Tree-AS, CHAID, QUEST, C&R Tree), a neural network, a Bayesian network and logistic regression. Adherence to screening was used as a dichotomous dependent variable, and a set X of 21 independent variables was considered. The logistic regression model (Fig. 3) produced the most accurate results: 62.23% correct classifications in the training set (AUC = 0.68) and 63.62% in the testing set (AUC = 0.681). Only three independent variables were retained in the resulting regression model. The others were discarded by the stepwise method because of their low significance in the model. The predictors of screening adherence are the percentage of times the diabetic had previously adhered to screening, the last screening result, and the number of times the diabetic was called for DR screening, which is in line with the literature²⁴.
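The fitted coefficients appear in Fig. 3 and are not reproduced in the text. The minimal sketch below therefore shows only the shape of a per-agent logistic decision built on the three retained predictors. All coefficient values are hypothetical placeholders. Their signs follow the directions reported in “Statistical analysis results”. The last screening result is simplified here to a positive/not-positive indicator. The Bernoulli draw that turns the probability into attend/not attend is an assumed decision rule.

```python
import math
import random

# HYPOTHETICAL coefficients, for illustration only (the fitted values are in Fig. 3).
INTERCEPT = -1.0
COEF_PRIOR_ADHERENCE = 0.03  # per percentage point of previous adherence
COEF_LAST_POSITIVE = -0.4    # last screening result positive (1) vs. not positive (0)
COEF_TIMES_CALLED = 0.05     # per previous call for screening


def adherence_probability(prior_adherence_pct, last_result_positive, times_called):
    """Logistic model: p = 1 / (1 + exp(-(b0 + b1*x1 + b2*x2 + b3*x3)))."""
    z = (INTERCEPT
         + COEF_PRIOR_ADHERENCE * prior_adherence_pct
         + COEF_LAST_POSITIVE * (1 if last_result_positive else 0)
         + COEF_TIMES_CALLED * times_called)
    return 1.0 / (1.0 + math.exp(-z))


def decides_to_attend(p, rng):
    """Assumed decision rule: the agent attends with probability p."""
    return rng.random() < p


rng = random.Random(42)
p = adherence_probability(prior_adherence_pct=80, last_result_positive=False, times_called=5)
attends = decides_to_attend(p, rng)
```

Drawing against the probability, rather than thresholding it at 0.5, keeps heterogeneity in agent behavior. Agents with identical features can still act differently, which is what lets an ABM reproduce an aggregate adherence rate.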

As an alternative to the data mining model, a set of fuzzy components was developed to measure the result of the individual decision on whether to adhere to the screening or not.

The fuzzy components, and the variables that constitute each of them, were established on the basis of the statistical analysis results, the literature that focuses on explaining adherence to health programs, and the HBM^23,24. This analysis resulted in three common-sense fuzzy components: “access barriers”, “knowledge of the disease” and “quality/strategy of the screening program”. The membership function chosen for each variable was based on an analysis of the distribution of the real data.

The “access barriers” component comprises variables B1, B2, B3 and B4. B1 concerns the perception of access barriers due to age. “Difficult access due to age” is defined by the linear function that passes through the points [0, 1] and [100, 0]. “Easy access” is defined by the linear function that passes through the points [0, 0] and [100, 1]. B2 corresponds to the perception of access barriers as a function of income. “Difficult access due to income” is defined by a normal distribution with mean 50,000 euros/year and standard deviation 17,000 euros/year. “Easy access due to income” corresponds to the maximum of two normal distributions with means of 0 and 100,000 euros/year, respectively, and standard deviations of 17,000 euros/year. B3 corresponds to the perception of access barriers depending on the location of the screening. “Difficult access due to screening location” is defined by the linear function that passes through the points [0, 0] and [100, 1]. “Easy access due to screening location” is defined by the linear function that passes through the points [0, 1] and [100, 0]. B4 corresponds to the perception of access barriers depending on the degree of urbanization of the place of residence. “Difficult access due to the degree of urbanization” is defined by a normal distribution with mean 0.3 and standard deviation 0.1. “Easy access due to the degree of urbanization” corresponds to the maximum of two normal distributions with means 0 and 0.5, respectively, and standard deviations of 0.1.

The “knowledge of the disease” component comprises variables C1, C2 and C3. C1 assesses knowledge of the disease as a function of age. “High knowledge level due to age” is defined by a normal distribution with mean 65 years and standard deviation 30. “Low knowledge level due to age” corresponds to the maximum of two normal distributions with means 18 and 100 years, respectively, and standard deviation 30. C2 corresponds to knowledge of the disease as a function of educational qualifications. “High knowledge level due to educational qualifications” is defined by a linear function that passes through the points [0, 0] and [100, 1]. “Low knowledge level due to educational qualifications” is defined by a linear function that passes through the points [0, 1] and [100, 0]. C3 corresponds to knowledge of the disease as a function of the percentage of times the agent previously adhered to screening. “High knowledge level due to prior adherence” is defined by a linear function that passes through the points [0, 0] and [100, 1]. “Low knowledge level due to prior adherence” is defined by a linear function that passes through the points [0, 1] and [100, 0].

The “quality/strategy of the screening program” component comprises variables E1, E2 and E3. E1 corresponds to the quality of the strategy in terms of sending reminders. “High quality, considering sending reminders” is defined by a linear function that passes through the points [0, 0] and [100, 1]. “Low quality, considering sending reminders” is defined by a linear function that passes through the points [0, 1] and [100, 0]. E2 corresponds to the quality of the strategy considering the waiting time on the day of screening (in minutes). “High quality, considering the waiting time” is defined by a linear function that passes through the points [0, 1] and [500, 0]. “Low quality, considering the waiting time” is defined by a linear function that passes through the points [0, 0] and [500, 1]. E3 corresponds to the time (in weeks) between sending the call notice and the date of the screening. “High quality, considering advance notification of the call” is defined by a normal distribution with mean 4 and standard deviation 2. “Low quality, considering advance notification of the call” corresponds to the maximum of two normal distributions with means 0 and 8, respectively, and standard deviations of 2. Finally, random noise was added, with its magnitude controlled by the “variability” parameter.

Within each component, IF–THEN rules were defined so that, if the favourable value (easy access, high knowledge or high quality) is obtained in at least half of the component's variables, there is a high probability that the agent will adhere to screening. Accordingly, 16 rules were defined for the “access barriers” component, 8 for the “knowledge of the disease” component and 8 for the “quality/strategy of the screening program” component. Each count is one rule per combination of the two linguistic values of the component's variables (2^4 = 16 and 2^3 = 8), giving a total of 32 IF–THEN rules (listed in Supplementary Information S2). The Mamdani fuzzy inference method was used, with the maximum as aggregation operator, and each component was defuzzified by the Centre of Gravity (COG) method^28,29. The final result corresponds to the average of the results of the three components.
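To make the inference pipeline concrete, here is a minimal Python sketch of one component, “knowledge of the disease” (C1 to C3). It uses the membership functions listed above, the “at least half favourable” rule base (8 rules), Mamdani inference and COG defuzzification. Several parts are assumptions, not taken from the article:

- the two output sets (`out_low`, `out_high`) on a [0, 1] adherence universe;
- min as the AND operator and for clipping the consequents (the standard Mamdani choice);
- the discretization of the output universe;
- the neutral value returned if no rule fires.

Averaging the three components and adding the “variability” noise are not shown.

```python
import math
from itertools import product


def linear(x, p0, p1):
    """Straight line through p0=(x0, y0) and p1=(x1, y1), clipped to [0, 1]."""
    (x0, y0), (x1, y1) = p0, p1
    y = y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return min(1.0, max(0.0, y))


def gauss(x, mean, sd):
    """Gaussian-shaped membership with value 1 at the mean."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2)


def gauss_max(x, mean_a, mean_b, sd):
    """Maximum of two Gaussian memberships (the two-sided sets in the text)."""
    return max(gauss(x, mean_a, sd), gauss(x, mean_b, sd))


def knowledge_memberships(age, education, prior_adherence_pct):
    """Variables C1-C3 of the 'knowledge of the disease' component."""
    return [
        {"high": gauss(age, 65, 30), "low": gauss_max(age, 18, 100, 30)},
        {"high": linear(education, (0, 0), (100, 1)),
         "low": linear(education, (0, 1), (100, 0))},
        {"high": linear(prior_adherence_pct, (0, 0), (100, 1)),
         "low": linear(prior_adherence_pct, (0, 1), (100, 0))},
    ]


# ASSUMED output sets on a [0, 1] adherence universe (not specified in the article).
def out_low(y):
    return linear(y, (0, 1), (1, 0))


def out_high(y):
    return linear(y, (0, 0), (1, 1))


def mamdani_cog(memberships, n_points=201):
    """Mamdani inference (min for AND and clipping, max aggregation), COG defuzzification."""
    rules = []
    for combo in product(("high", "low"), repeat=len(memberships)):
        strength = min(m[term] for m, term in zip(memberships, combo))
        # Favourable value in at least half of the variables -> high adherence.
        consequent = out_high if 2 * combo.count("high") >= len(combo) else out_low
        rules.append((strength, consequent))
    ys = [i / (n_points - 1) for i in range(n_points)]
    aggregated = [max(min(s, f(y)) for s, f in rules) for y in ys]
    area = sum(aggregated)
    if area == 0:
        return 0.5  # no rule fires: neutral value (assumption)
    return sum(y * mu for y, mu in zip(ys, aggregated)) / area


print(round(mamdani_cog(knowledge_memberships(65, 100, 100)), 3))  # → 0.668
```

For a 65-year-old agent with maximal education and prior adherence, only rules with a high-adherence consequent fire. The COG of the aggregated set then lies well above 0.5.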

Simulation model

The ABM was developed using NetLogo 6.1.1³⁵, a simulation environment written in Scala. The state diagram that was implemented (Fig. 4) comprises four possible states: (i) not called; (ii) called; (iii) attended screening; (iv) did not attend screening. Initially, all diabetics are in the “not called” state. At the beginning of the simulation, each diabetic is called for screening by means of an invitation letter and assumes the “called for screening” state until the date of the screening. On that date, it is verified whether the diabetic attended the screening. Depending on the diabetic’s action, the agent then assumes either the “attended screening” or the “did not attend screening” state. A new cycle then begins in which the diabetic returns to the “not called” state until the stipulated interval between screenings elapses. At that moment, a new invitation letter is issued, the diabetic again assumes the “called for screening” state, and the entire process is repeated.
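The model itself is implemented in NetLogo. The following Python sketch is only an illustration of the four-state cycle described above, advanced in weekly ticks. The 4-week gap between the invitation letter and the screening date is an assumption taken from the value later used for the fuzzy simulations. The fixed attendance probability `p_attend` is a hypothetical placeholder for the output of the decision module, whether regression, fuzzy or combined.

```python
import random
from enum import Enum


class State(Enum):
    NOT_CALLED = "not called"
    CALLED = "called for screening"
    ATTENDED = "attended screening"
    DID_NOT_ATTEND = "did not attend screening"


class Diabetic:
    """One agent cycling through the four states of the state diagram (weekly ticks)."""

    def __init__(self, p_attend, notice_weeks=4, interval_weeks=52):
        self.p_attend = p_attend              # placeholder for the decision module's output
        self.notice_weeks = notice_weeks      # assumed gap between letter and screening date
        self.interval_weeks = interval_weeks  # interval between screenings
        self.state = State.NOT_CALLED
        self.next_call_week = 0               # every agent is invited at the start
        self.screening_week = None
        self.history = []                     # True = attended, False = did not attend

    def step(self, week, rng):
        if self.state is State.NOT_CALLED and week >= self.next_call_week:
            self.state = State.CALLED  # invitation letter issued
            self.screening_week = week + self.notice_weeks
        elif self.state is State.CALLED and week >= self.screening_week:
            attended = rng.random() < self.p_attend
            self.state = State.ATTENDED if attended else State.DID_NOT_ATTEND
            self.history.append(attended)
        elif self.state in (State.ATTENDED, State.DID_NOT_ATTEND):
            self.state = State.NOT_CALLED  # new cycle: wait for the next invitation
            # Next screening falls exactly interval_weeks after this one.
            self.next_call_week = self.screening_week + self.interval_weeks - self.notice_weeks


def run(n_agents=1000, weeks=520, p_attend=0.65, seed=1):
    rng = random.Random(seed)
    agents = [Diabetic(p_attend) for _ in range(n_agents)]
    for week in range(weeks):
        for agent in agents:
            agent.step(week, rng)
    return agents
```

Over 520 weekly ticks with a 52-week interval, each agent in this sketch is screened ten times. That matches the ten screening cycles simulated in the article.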

By integrating the fuzzy components with the result of the logistic regression, the model allows the two methods to be compared, as well as the results obtained with different user-selected weights. The information regarding the screening strategy was based on the opinion of ARSN experts and on an analysis of official documents provided by the institution³⁶. The following parameters were therefore used: screening location = Primary Health Care Unit; screening test sensitivity = 96%; screening test specificity = 94%; probability of a positive screening test = 4%; probability of a negative screening test = 93%; probability of an inconclusive screening test = 3%; screening interval = 52 weeks.
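The text states that the two decision methods are combined with user-selected weights (50/50 in the third version). A natural reading is a convex combination of the two probabilities, and the sketch below assumes exactly that. It then samples the outcome of an attended screening from the result probabilities listed above. The function names are hypothetical. How NetLogo draws the result internally is not described in the article.

```python
import random

# Screening-result probabilities listed in the text.
RESULT_PROBS = {"positive": 0.04, "negative": 0.93, "inconclusive": 0.03}


def combined_probability(p_logit, p_fuzzy, w_logit=0.5):
    """Assumed convex combination: w_logit=1 -> regression only, 0 -> fuzzy only."""
    if not 0.0 <= w_logit <= 1.0:
        raise ValueError("w_logit must lie in [0, 1]")
    return w_logit * p_logit + (1.0 - w_logit) * p_fuzzy


def sample_result(rng):
    """Draw the outcome of an attended screening from RESULT_PROBS."""
    outcomes = list(RESULT_PROBS)
    weights = [RESULT_PROBS[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


rng = random.Random(7)
p = combined_probability(p_logit=0.72, p_fuzzy=0.64)  # equal weights, as in version 3
result = sample_result(rng)
```

Setting `w_logit` to 1 or 0 reproduces the regression-only and fuzzy-only versions. A single parameter therefore spans all three model versions.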

Simulations

A virtual population of 10,000 diabetics was generated and the call for screening was simulated over a period of ten screening cycles. A 52-week interval was defined between screenings. The initial population of agents was designed according to the characteristics of the ARSN diabetic population, and the model was initialized with the parameters measured from the available data. Five simulations were performed for each version of the model. In order to test the model's ability to capture geographic specificities, the simulation results obtained for each subregion were compared with the real adherence rates.

For the version in which the individual decision of whether or not to adhere to screening is based exclusively on fuzzy components, the data set was divided into two groups: training and testing. Data from the geographical areas of Tâmega e Sousa, Cávado, Douro, Trás-os-Montes and the Metropolitan Area of Porto, which correspond to 66.41% of the total diabetic population covered by the ARSN, were used for training. Data from the geographical areas of Alto Minho, Ave and Entre Douro e Vouga, which correspond to 33.59% of the ARSN diabetic population, were reserved for testing. A similar procedure was not possible for the version that uses only the logistic regression model, because that model needs previous regional screening information to run.

For the version that uses fuzzy components, we also compared the results obtained for the entire population with those obtained for specific subgroups determined by a prior cluster analysis. To this end, we performed a cluster analysis using SPSS Modeler 18.2 software³⁴. The initial data set of 271,867 calls for DR screening, corresponding to 108,620 different diabetics, was divided into two clusters using the TwoStep Cluster Analysis algorithm. The model summary table indicates that two clusters were found based on seven input features, and the cluster quality chart rates the overall model quality as "Fair". Of the records, 50.9% (138,258) were assigned to the first cluster and 49.1% (133,609) to the second. The cluster means suggest that the clusters are well separated for some of the variables. To better evaluate the quality of the model, chi-square tests were performed and Cramér's V was computed for each variable. Although the chi-square tests point to a significant relation between the clusters and all the input variables, this is mostly due to the large size of the data set. Under these conditions, Cramér's V, which normalizes the chi-square statistic by the sample size, is better suited to assessing the strength of the association between the input variables and the clusters. Cramér's V revealed that only two of the seven variables have strong correlations with the cluster variable: age group and occupation.

Cluster 1 therefore tends to comprise younger, mostly active diabetics, and Cluster 2 older, mostly retired diabetics. The main aspects of the cluster analysis are summarized here, and the details are available in Supplementary Information, Tables S3 and S4.
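The argument for preferring Cramér's V over the chi-square p-values with a data set of this size can be checked directly. The statistic is V = sqrt(χ² / (n · (min(r, c) − 1))). The minimal Python sketch below computes it for a contingency table, assuming no empty rows or columns. The counts are invented for illustration.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic for an r x c contingency table (list of rows)."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # assumes no empty row/column
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def cramers_v(table):
    """Cramér's V = sqrt(chi2 / (n * (min(r, c) - 1))), bounded in [0, 1]."""
    chi2, n = chi_square(table)
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))


small = [[60, 40], [40, 60]]                 # illustrative counts
large = [[60000, 40000], [40000, 60000]]     # same proportions, 1000x the sample
print(chi_square(small)[0], chi_square(large)[0])         # chi-square grows with n
print(round(cramers_v(small), 3), round(cramers_v(large), 3))  # V is unchanged
```

Multiplying every cell by 1000 multiplies χ² by 1000, which drives any p-value toward zero. The association strength V stays at 0.2.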

Results

Statistical analysis results

The statistical analysis results (Supplementary Information Table S5) are, in general, consistent with those found in the literature on population-based screening. Younger and older diabetics tend to adhere less to screening, as well as those earning higher incomes³⁷. Higher educational qualifications, as well as a regular habit of using primary health care—visits to the health unit in the last 12 months—are conducive to higher rates of adherence^37,38,39. Diabetics who received a higher number of invitations for previous screening and who had adhered more frequently in the past had higher rates of adherence^24,39. There are, however, results that are not supported by the literature. Contrary to expectations^37,38,39, men in the ARSN adhere more to screening, and diabetics with previous positive results have lower adherence rates in the next screening. Regarding this second result, a scientific article focusing on the perspective of one of the main hospitals in the northern region which is an integral part of the ARSN screening program may indicate a possible explanation⁴⁰. In fact, the lack of communication between hospital services and primary health care often results in calls for screening being sent to diabetics who are already being followed up and undergoing treatment in a hospital environment.

Simulation results using logistic regression only

The objective of this first set of simulations was to compare the simulated adherence rate with the real ARSN adherence rate, using logistic regression only in the agent decision process. In order to test the model's ability to capture geographic specificities, the simulation results obtained for each subregion were compared with the real adherence rates. The model was run for 520 simulated weeks to ensure convergence of results. In all cases, there was a significant initial increase in the global adherence rate, after which the model converges to an average adherence rate of 67.6%, with a standard deviation of 0.16%. The real adherence rate is slightly lower (66.6%). Figure 5 illustrates the simulation results obtained after 52, 260, 312 and 520 weeks.

When the model stabilizes, the simulated values approach the real ones. Only in one subregion (Douro) does the actual value of the adherence rate fall outside the 99% confidence interval. The simulation results also reflect the geographical asymmetries well (Table 1, Fig. 7).
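The article does not state how the 99% confidence interval over the five runs was computed. One common choice for so few replications is a Student t interval for the mean, and the minimal sketch below assumes that. It hard-codes the two-sided 99% critical value for 4 degrees of freedom (about 4.604). The run values in the example are invented, not results from the article.

```python
import statistics

# Two-sided 99% critical value of Student's t with 4 degrees of freedom (five runs).
T_99_DF4 = 4.604


def ci99_five_runs(runs):
    """99% t-based confidence interval for the mean of five simulation runs (assumed method)."""
    if len(runs) != 5:
        raise ValueError("the hard-coded t value assumes exactly five runs")
    mean = statistics.mean(runs)
    half_width = T_99_DF4 * statistics.stdev(runs) / len(runs) ** 0.5
    return mean - half_width, mean + half_width


def real_value_inside(real, runs):
    low, high = ci99_five_runs(runs)
    return low <= real <= high


runs = [62.1, 62.9, 62.5, 62.3, 62.7]  # invented adherence rates (%) for one subregion
print(ci99_five_runs(runs), real_value_inside(63.0, runs), real_value_inside(64.0, runs))
```

Because the interval narrows with the run-to-run spread, a model whose runs agree closely can exclude a real value that is only about one percentage point away.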

Table 1 Simulation results versus real data.


Simulation results using fuzzy components only

In this second phase, the agent decision rules were established using fuzzy components only. The initial population of agents was designed according to the characteristics of the diabetic population in each of the geographic areas that belong to the ARSN. Since all the real data belong to the same Regional Health Administration, the screening strategy is similar in all subregions, both for training and testing. In particular, sending reminders is still a very incipient practice, and screening is carried out in primary care units in all eight subregions under analysis. It has not yet been possible to obtain information on the other variables that comprise the “quality/strategy of the screening program” component. It was therefore assumed that the call is sent 4 weeks in advance in all locations and that the waiting time on the screening day is always 10 min. Five simulations were performed. Figure 6 shows NetLogo's graphic output for the first simulation performed.

Table 1 and Fig. 7 summarize the results obtained in comparison to the real data.

The overall ARSN adherence rate obtained with the simulation model is 1.55 percentage points below the region's real adherence rate (65.05% versus 66.6%). The simulated rates are slightly below the actual adherence rate in all geographic subregions, with the smallest difference in Tâmega e Sousa (0.56 percentage points) and the largest in Alto Minho (3.18 percentage points). In four subregions, the actual adherence rate falls outside the 99% confidence interval. However, the model effectively captures the differences in screening adherence between regions.

The results obtained using the previously defined clusters are very satisfactory. Particularly for cluster 1, the simulation results represent reality better than those obtained using the entire population (Table 1).

It should also be noted that the fit to the real data in the test set, and the model's ability to predict higher adherence rates, support the view that the model has predictive capacity in new contexts.

Simulation results using a combination of logistic regression and fuzzy components

In this last set of simulations, the agent decision results from a combination of the results obtained with logistic regression and with fuzzy components, in a ratio of 50/50. The simulations were performed as described in the two previous sections. Table 1 and Fig. 7 illustrate the results obtained.

The overall ARSN adherence rate obtained was 1.52 percentage points below the region's real adherence rate (65.08% versus 66.6%). The largest difference (in absolute value) was registered in Trás-os-Montes (4.13 percentage points) and the smallest in Douro (−0.27 percentage points). In five subregions, the actual adherence rate falls outside the 99% confidence interval.

Comparison of the results obtained with the three versions of the ABM

The results obtained are close to the real ones, even though, in the version that uses fuzzy components, four of the eight subregions present real values that fall outside the 99% confidence interval for the mean of the simulation results. Overall, the model captures the geographic asymmetries well. The use of the fuzzy components provides a high level of abstraction from the real data and shows predictive capability in new contexts (the test set). This attests to the validity of the model for the study of this problem and to its usefulness as a predictive tool for public health planning. The use of logistic regression (version 1) led to the best global result: a predicted adherence rate of 67.6%, a difference of only 1 percentage point from the real value (66.6%). However, the logistic regression technique is of limited use in geographic areas that are about to begin a screening program, since the main predictors in its equation are the percentage of times the diabetic had previously adhered and the result of the last screening. Nevertheless, this technique can be very useful and effective if the necessary data are available. The combined version 3 showed no overall improvement over version 2.

Figure 8 allows a direct comparison of the differences between the results obtained with each of the ABM versions and the real results, in all the geographic areas. Most subregions follow the general trend, producing better results when logistic regression alone is used. However, in subregions where screening started more recently (and which therefore have fewer years of history), such as Douro and Tâmega e Sousa, the version that relies on fuzzy components and the combined version tend to produce better results.

Effect of different interventions on adherence rate

Since the region's adherence rate is below the desired target (80%), public health interventions will be needed in this area. To predict and compare the results of several possible interventions, simulations were carried out for different hypothetical scenarios. These are only first examples of the applications of our research (a proof of concept), and we plan to continue improving our model so that it can be used to analyze a wider range of scenarios. All the simulations in this section were carried out using the ABM version with logistic regression (version 1).

Scenario 1: Intervention that increases to 95% the adherence of diabetics who tested positive in the previous screening

According to ARSN experts, it makes no sense for this group of diabetics to have lower adherence rates than those who had a negative result. Therefore, based on the literature referred to previously⁴⁰, we hypothesized that the existing data are biased by weak coordination between hospital services and primary health care, which results in invitations being sent to diabetics who are already being followed in hospital. Hence, this intervention would not actually increase the real adherence rate; rather, it would improve the quality of the data and of the screening process.

Since only 4% of screening results are positive, a large impact on the overall adherence rate was not expected. Indeed, the results of this simulation (in which the adherence rate of diabetics with a previous positive result was set to 95% in all sub-regions) are in line with empirical knowledge: only the sub-regions with larger differences between the adherence of diabetics with previous negative and positive results show increases in the adherence rate (Table 2). Overall, the region's adherence rate would increase from 67.6% to 68.23%.
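Why a large effect was not expected follows from a share-weighted average: the overall rate is the sum, over groups, of each group's share of calls times its adherence rate, so lifting a small group to 95% moves the overall rate by roughly share × (95 − current rate) percentage points. A minimal sketch, treating the 4% share of positive results as the share of calls from previously positive diabetics (an assumption); the 75% and 67% rates are hypothetical, and this is a static first-order estimate that ignores the feedback between screening rounds that the ABM captures.

```python
def overall_rate(groups):
    """Share-weighted adherence rate (%); groups is a list of (share, rate_pct) pairs."""
    return sum(share * rate for share, rate in groups)


def first_order_gain(share, current_rate, target_rate=95.0):
    """Static estimate (percentage points) of the overall gain from lifting one group."""
    return share * max(target_rate - current_rate, 0.0)


SHARE_POSITIVE = 0.04  # share of positive results reported in the text
current_pos = 75.0     # hypothetical adherence (%) of previously positive diabetics
print(first_order_gain(SHARE_POSITIVE, current_pos))
print(overall_rate([(SHARE_POSITIVE, 95.0), (1 - SHARE_POSITIVE, 67.0)]))
```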

Scenario 2: Intervention that increases by 5% the adherence of all diabetics who have already taken part in screening at least once

Table 2 Scenario results.

According to the experts, this increase could be achieved by taking advantage of the diabetics' presence at screening to give them a short training session on the disease and the importance of annual screening.

Therefore, in this simulation, the adherence rate of all diabetics who had already taken part in screening was increased by 5%.

According to the results obtained, this intervention would lead to substantial increases in all sub-regions (Table 2) and would raise the global adherence rate from 67.6% to 70.28%.
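How Scenario 2 might be expressed at the agent level is sketched below. This is a minimal illustration, not the authors' implementation: the `Agent` class, the population and its probabilities are hypothetical, and because the text does not say whether "5%" means 5 percentage points or a 5% relative increase, the `boosted` helper supports both readings. Both runs reuse the same random seed (common random numbers), so each agent faces the same draw in the baseline and in the scenario, and the difference reflects the intervention rather than sampling noise.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    p_adhere: float        # baseline probability from the decision model
    screened_before: bool  # has taken part in screening at least once


def boosted(p, boost=0.05, relative=False):
    """Raise p by `boost` in absolute terms (points) or relative terms; cap at 1."""
    new_p = p * (1.0 + boost) if relative else p + boost
    return min(new_p, 1.0)


def simulated_rate(agents, rng, boost=0.0, relative=False):
    """One stochastic run: share of agents that adhere after the intervention."""
    adhered = 0
    for agent in agents:
        p = boosted(agent.p_adhere, boost, relative) if agent.screened_before else agent.p_adhere
        adhered += rng.random() < p
    return adhered / len(agents)


rng = random.Random(42)
population = [Agent(rng.uniform(0.4, 0.9), rng.random() < 0.8) for _ in range(10_000)]
baseline = simulated_rate(population, random.Random(1))
scenario = simulated_rate(population, random.Random(1), boost=0.05)
print(f"baseline {baseline:.3f}, scenario {scenario:.3f}")
```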

Scenario 3: Intervention that increases by 20% the adherence rate of younger diabetics, particularly students

Although 20% is an ambitious increase, experts consider it achievable through information sessions in schools, with the collaboration of teachers. Therefore, in this simulation, the adherence of diabetics under 25 years of age and of students was increased by 20%.

Since the percentage of school-age diabetics is very small (only 5.2% of all diabetics in the region), the impact of this measure on the overall adherence rate is minimal: from 67.6% to 67.9%. The measure has somewhat more impact in sub-regions where the adherence rate of this group is extremely low and/or where this age group is larger (Table 2).
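Because the overall effect is diluted by the eligible group's small share (5.2% in the region), an expected-value calculation over agents makes the small global change easy to see. A minimal sketch follows; it reads "under 25 years of age and students" as either condition qualifying and applies the 20% increase as a relative change to each eligible agent's probability. Both readings, the `Diabetic` class and the three-agent population are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Diabetic:
    age: int
    is_student: bool
    p_adhere: float  # adherence probability from the decision model


def eligible(d, age_limit=25):
    # Assumption: being under 25 OR being a student is enough to qualify.
    return d.age < age_limit or d.is_student


def expected_rate(population, boost=0.0):
    """Expected adherence rate (mean probability), raising eligible agents relatively."""
    probs = [min(d.p_adhere * (1.0 + boost), 1.0) if eligible(d) else d.p_adhere
             for d in population]
    return sum(probs) / len(probs)


pop = [Diabetic(19, True, 0.30), Diabetic(45, False, 0.50), Diabetic(70, False, 0.70)]
print(expected_rate(pop), expected_rate(pop, boost=0.20))
```

In a real population the eligible share is far smaller than one agent in three, which is why the simulated global change in this scenario is only a few tenths of a point.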

Conclusions

This research aimed to demonstrate that it is possible to predict the rate of adherence to population-based screenings using ABMs with a high level of abstraction. More specifically, it aimed: (i) to demonstrate that it is possible to develop an ABM that faithfully portrays the decision on whether or not to adhere to screening, using the intrinsic features of the agent and of the screening program; (ii) to demonstrate that such an ABM can be used in contexts other than the one in which the data for its development were collected; and (iii) to demonstrate the utility of combining ABM and fuzzy logic in models intended to simulate human behavior. To this end, three versions of an agent-based model were presented, differing in the method used to infer the individual decision on whether or not to adhere to screening: the first uses a logistic regression equation, the second replaces logistic regression with three fuzzy logic components, and the third combines the two methods. All three versions were calibrated and validated using real data from 271,867 calls for screening in the Northern Region Health Administration. The results indicate that it is possible to predict the rate of adherence to screening for diabetic retinopathy using demographic and socioeconomic data for the target population, together with information about the screening strategy. The use of the fuzzy components leads to a high level of abstraction from the real data and shows predictive capability in new contexts (the test set), which attests to the validity of the model for the study of this problem and its usefulness as a predictive tool for public health planning. Nevertheless, logistic regression (version 1) led to the best global result: a predicted adherence rate of 67.6%, only 1 percentage point above the real value (66.6%). However, the logistic regression technique is of limited use in geographic areas about to begin a screening program, since its main predictors are the percentage of times the diabetic had previously adhered and the result of the last screening. Still, this technique can be very useful and effective when the necessary data are available. The combined version 3 showed no overall improvement over version 2.
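The three fuzzy logic components are not detailed in this section, so the following is only a generic sketch of how one fuzzy component can map a normalized input to an adherence propensity: triangular membership functions, three IF-THEN rules and a zero-order Takagi-Sugeno weighted average. The input, the fuzzy sets and the rule consequents are hypothetical and should not be read as the authors' design.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


# Fuzzy sets over a normalized input in [0, 1] (hypothetical).
SETS = {"low": (-0.5, 0.0, 0.5), "medium": (0.0, 0.5, 1.0), "high": (0.5, 1.0, 1.5)}
# Rule consequents: IF input is <set> THEN propensity is <value> (hypothetical).
CONSEQUENTS = {"low": 0.30, "medium": 0.60, "high": 0.85}


def fuzzy_propensity(x):
    """Zero-order Sugeno inference: firing-strength-weighted average of consequents."""
    x = min(max(x, 0.0), 1.0)  # clamp to the universe of discourse
    strengths = {name: tri(x, *abc) for name, abc in SETS.items()}
    total = sum(strengths.values())  # > 0 everywhere on [0, 1] with these sets
    return sum(w * CONSEQUENTS[name] for name, w in strengths.items()) / total


print([round(fuzzy_propensity(x), 3) for x in (0.0, 0.25, 0.5, 1.0)])
```

Because the output varies smoothly between rule consequents, a component like this can encode qualitative expert knowledge ("low engagement means low adherence") without requiring the individual screening history that the logistic regression needs.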

Discussion

Since the 1990s, several simulation models focused on screening for diabetic retinopathy have been developed. However, despite the recognized importance of adherence to the success of screening, we found only one attempt to model the subjects' individual behavior²⁴. That model incorporated the Health Belief Model (HBM) into a DES model through a combination of several factors (inputs) that influence screening adherence. It used only artificial data, so its results are theoretical and have not been validated against real data. That research also stresses the difficulty of incorporating qualitative variables, such as those used by the HBM, into DES models, emphasizing the need for another type of technique. The objective of our research was to overcome those limitations by showing that it is possible to simulate screening adherence behavior using computer simulations, in particular agent-based models embedded with logistic regression and/or fuzzy logic components.

Regarding the logistic regression, we found that only three independent variables had predictive value: the percentage of times the diabetic had previously adhered to screening, the last screening result, and the number of times the diabetic had been called for DR screening, which is in line with the literature²⁴.
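With these three predictors, the version 1 decision rule can be sketched as a logistic model whose probability drives a Bernoulli draw for each agent. The coefficient values below are hypothetical placeholders, since the fitted equation is not reproduced in this section, and the feature encoding (previous adherence as a proportion, last result as a 0/1 indicator for a positive result) is an assumption.

```python
import math
import random

# Hypothetical coefficients; the fitted values are not given in this section.
COEFS = {"intercept": -1.5, "prev_adherence": 3.5, "last_positive": -0.4, "n_calls": 0.05}


def adherence_probability(prev_adherence, last_positive, n_calls, coefs=COEFS):
    """Logistic model: P(adhere) = 1 / (1 + exp(-z)), with z linear in the predictors."""
    z = (coefs["intercept"]
         + coefs["prev_adherence"] * prev_adherence  # share of past calls attended, 0..1
         + coefs["last_positive"] * last_positive    # 1 if last screening was positive
         + coefs["n_calls"] * n_calls)               # number of calls received so far
    return 1.0 / (1.0 + math.exp(-z))


def decides_to_adhere(rng, prev_adherence, last_positive, n_calls):
    """Individual decision: a Bernoulli draw with the model's probability."""
    return rng.random() < adherence_probability(prev_adherence, last_positive, n_calls)


rng = random.Random(7)
print(adherence_probability(0.8, 0, 4), decides_to_adhere(rng, 0.8, 0, 4))
```

Drawing a random number per agent, rather than thresholding the probability at 0.5, keeps the simulated population rate close to the mean predicted probability, which is what the aggregate comparison with real adherence relies on.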

As far as we know, our research is the first that aims to simulate adherence to DR screening using an ABM. However, ABMs have been used quite successfully to model health behaviors such as alcohol use, diet and smoking^25,26. Techniques based on fuzzy logic have also been used to simulate human behavior with good results³⁰. Our results are therefore in line with the literature, reinforcing the idea that these computational modeling techniques are very effective for representing human behavior, and, to our knowledge, they are the first application of these techniques to DR screening. Moreover, we believe we have demonstrated the utility of embedding fuzzy logic in an ABM intended to simulate human behavior. As we found no research in the health area that combines the two techniques, we believe our results could be an important contribution to future research.

Although the main focus of this article was to show that it is possible to simulate adherence to population-based screenings through computational simulation models with a high level of abstraction, we also carried out some experiments to illustrate how such a model can support decisions in health planning by analyzing the effectiveness of various interventions. For example, we studied the impact of promoting continuity in adherence to screening by giving diabetics a short training session on the day of the test, and of promoting adherence among younger diabetics through information sessions in schools. We are improving the model so that it can analyze a wider range of scenarios, the results of which we intend to present in future research. We believe that the models developed can be of great value for staging hypothetical interventions, for knowledge discovery, and for proposing measures to the public and private entities responsible for legislation and decision-making. They can also help identify the groups and geographic locations where low adherence to screening is a particularly relevant problem, and the factors with the greatest impact on the decision to adhere.

One limitation of this research is that we did not have access to data from other locations with different screening strategies. In the future, to improve the validation of our model, we intend to test it with data from other geographic locations where both population characteristics and screening strategies differ substantially from those in this training group. We also acknowledge that our approach relies on specific assumptions and data. To address data biases, we plan to test the model with data from other population-based screenings and gauge its ability to replicate real-life adherence rates.

Data availability

The data that support the findings of this study are available from the Portuguese North Region Health Administration, but restrictions apply to the availability of these data, which were used under license for the current study, and so they are not publicly available. Data are, however, available from the authors upon reasonable request, addressed to the corresponding author, and with permission of the Portuguese North Region Health Administration.

References

1. Fernandes, A. G., Ferraz, A. N., Brant, R. et al. Diabetic retinopathy screening and treatment through the Brazilian National Health Insurance. Sci. Rep. 12, 13941. https://doi.org/10.1038/s41598-022-18054-6 (2022).
2. World Health Organization. Prevention of Blindness from Diabetes Mellitus (2005). Accessed 05 Aug 2021.
3. World Health Organization/International Foundation Europe. The Saint Vincent declaration on diabetes care and research in Europe. Diabet. Med. 1990 (1989).
4. Javitt, J. et al. Preventive eye care in people with diabetes is cost-saving to the Federal Government: Implications for health-care reform. Diabetes Care 17(8), 909–917 (1994).
5. Palmer, A. J. et al. The cost-effectiveness of different management strategies for type I diabetes: A Swiss perspective. Diabetologia 43, 13–26 (2000).
6. Craig, B. A., Fryback, D. G., Klein, R. & Klein, B. E. K. A Bayesian approach to modelling the natural history of a chronic condition from observations with intervention. Stat. Med. 18, 1355–1371 (1999).
7. Vetrini, D. et al. Incremental cost-effectiveness of screening and laser treatment for diabetic retinopathy and macular edema in Malawi. PLoS ONE 13(1), 1–14 (2018).
8. Maberley, D., Walker, H., Koushik, A. & Cruess, A. Screening for diabetic retinopathy in James Bay, Ontario: A cost-effectiveness analysis. CMAJ 168, 160–164 (2003).
9. Davies, R. & Canning, C. Discrete event simulation to evaluate screening for diabetic eye disease. Simulation 66(4), 209–216 (1996).
10. Davies, R., Brailsford, S., Roderick, P., Canning, C. & Crabbe, D. Using simulation modelling for evaluating screening services for diabetic retinopathy. J. Oper. Res. Soc. 51, 476–484 (2000).
11. Davies, R., Roderick, P., Canning, C. & Brailsford, S. The evaluation of screening policies for diabetic retinopathy using simulation. Diabet. Med. 19, 762–770 (2002).
12. Davies, R. & Brailsford, S. Screening for diabetic retinopathy. In Handbook of OR/MS Applications in Health Care (M. B. a. W. P. & Sainfort, F. ed.). 493 (2004).
13. Chalk, D., Pitt, M., Vaidya, B. & Stein, K. Can the retinal screening interval be safely increased to 2 years for type 2 diabetic patients without retinopathy? Diabetes Care 35, 1663–1668 (2012).
14. Vijan, S., Hofer, T. P. & Hayward, R. A. Cost-utility analysis of screening intervals for diabetic retinopathy in patients with type 2 diabetes mellitus. J. Am. Med. Assoc. 283(7), 889–896 (2000).
15. Brailsford, S. C., Gutjahr, W. J., Rauner, M. S. & Zeppelzauer, W. Combined discrete-event simulation and ant colony optimization approach for selecting optimal screening policies for diabetic retinopathy. CMS 4, 59–83 (2007).
16. Day, T. E., Ravi, N., Xian, H. & Brugh, A. An agent-based modelling template for a cohort of veterans with diabetic retinopathy. PLoS ONE 8, e66812 (2013).
17. Day, T. E., Ravi, N., Xian, H. & Brugh, A. Sensitivity of diabetic retinopathy associated vision loss to screening interval in an agent-based/discrete event simulation model. Comput. Biol. Med. 47, 7–12 (2014).
18. Aoki, N. et al. Cost-effectiveness analysis of telemedicine to evaluate diabetic retinopathy in a prison population. Diabetes Care 27, 1095–1101 (2004).
19. Whited, J. D. et al. A modelled economic analysis of a digital teleophthalmology system as used by three federal healthcare agencies for detecting proliferative diabetic retinopathy. Telemed. e-Health 11(6), 641–651 (2005).
20. Rein, D. B. et al. The cost-effectiveness of three screening alternatives for people with no or early diabetic retinopathy. Health Serv. Res. 46(5), 1534–1561 (2011).
21. Kirkizlar, E. et al. Evaluation of telemedicine for screening of diabetic retinopathy in the Veterans Health Administration. Am. Acad. Ophthalmol. 120(12), 2604–2610 (2013).
22. Nguyen, H. V. et al. Cost-effectiveness of a national telemedicine diabetic retinopathy screening program in Singapore. Am. Acad. Ophthalmol. 123(12), 2571–2580 (2016).
23. Ben, A. J. et al. Cost-utility analysis of opportunistic and systematic diabetic retinopathy screening strategies from the perspective of the Brazilian public healthcare system. Appl. Health Econ. Health Policy 18, 57–68 (2020).
24. Brailsford, S. & Schmidt, B. Towards incorporating human behaviour in models of health care systems: An approach using discrete event simulation. Eur. J. Oper. Res. 150, 19–31 (2003).
25. Siebert, U. et al. State-transition modeling: A report of the ISPOR-SMDM Modeling Good Research Practices Task Force. Value Health 15, 812–820 (2012).
26. Marshall, D. A., Burgos-Liz, L., Ijzerman, M. J., Crown, W., Padula, W. V., Wong, P. K., Pasupathy, K. S., Higashi, M. K. & Osgood, N. D. Selecting a dynamic simulation modeling method for healthcare delivery research—Part 2 report of the ISPOR dynamic simulation modeling emerging good practices task force. Value Health 3, 147–160 (2015).
27. Pereira, A. P., da Silva Laureano, R. M., de Lima Neto, F. B. & Macedo, J. Adherence to the screening of diabetic retinopathy: An agent based simulation model. In 20th Conference of the Portuguese Association for Information Systems (CAPSI, 2020).
28. Yager, R. R. & Zadeh, L. A. An Introduction to Fuzzy Logic Applications in Intelligent Systems (Springer, 2012).
29. Singh, H., Gupta, M. M., Meitzler, T., Hou, Z.-G., Garg, K. K., Solo, A. M. G. & Zadeh, L. A. Real-life applications of fuzzy logic. Adv. Fuzzy Syst. 2013, 12 (2013).
30. Pourtousi, Z., Khalijian, S., Ghanizadeh, A. et al. Ability of neural network cells in learning teacher motivation scale and prediction of motivation with fuzzy logic system. Sci. Rep. 11, 9721. https://doi.org/10.1038/s41598-021-89005-w (2021).
31. Sperb, R. M. & Cabral, R. B. Fuzzy agent-based model: A hybrid tool for exploring spatial perception and behavior. IEEE Annu. Meet. Fuzzy Inf. https://doi.org/10.1109/NAFIPS.2004.1336262 (2004).
32. Tsui, K. L., Chen, N., Zhou, Q., Hai, Y. & Wang, W. Prognostics and health management: A review on data driven approaches. Math. Probl. Eng. (2015).
33. Instituto Nacional de Estatística (online). https://www.ine.pt/xportal/xmain?xpid=INE&xpgid=ine_base_dados&contexto=bd&selTab=tab2&xlang=pt.
34. IBM SPSS Modeler. SPSS Modeler 18.2 Documentation, 18 December 2018 (online). https://www.ibm.com/support/pages/spss-modeler-182-documentation. Accessed 5 Aug 2021.
35. NetLogo. NetLogo User Manual version 6.1.1, 26 September 2019 (online). https://ccl.northwestern.edu/netlogo/6.1.1/docs/. Accessed 5 Aug 2021.
36. Pereira, A. P., da Silva Laureano, R. M. & de Lima Neto, F. B. Five regions, five retinopathy screening programmes: A systematic review of how Portugal addresses the challenge. BMC Health Serv. Res. 21, 756. https://doi.org/10.1186/s12913-021-06776-8 (2021).
37. European Association for the Study of Diabetes. Screening for Diabetic Retinopathy in Europe—Progress Since 2011 (European Association for the Study of Diabetes, 2016).
38. Rahman, S. M., Dignan, M. B. & Shelton, B. J. Factors influencing adherence to guidelines for screening mammography among women aged 40 years and older. Ethn. Dis. 13(4), 477 (2003).
39. Keenum, Z. et al. Patients' adherence to recommended follow-up eye care after diabetic retinopathy screening in a publicly funded county clinic and factors associated with follow-up eye care use. JAMA Ophthalmol. 134(11), 1221–1228. https://doi.org/10.1001/jamaophthalmol.2016.3081 (2016).
40. Abreu, A. et al. First five years of implementation of diabetic screening programme in Centro Hospitalar do Porto. Rev. Bras. Oftalmol. 76(6), 295–299 (2017).

Acknowledgements

This work is partially funded by national funds through the FCT—Fundação para a Ciência e Tecnologia, I.P., under the projects UID/GES/00315/2020 and UIDB/04466/2020, and by the Business Research Unit ISCTE under the project HOPE. We are grateful for the support of ISCTE ISTAR-IUL—Information Sciences, Technologies and Architecture Research Center. We are also grateful to the doctors of the Northern Region Health Administration who provided data, insight and expertise that greatly assisted this research.

Author information

Authors and Affiliations

Information Sciences, Technologies and Architecture Research Center (ISTAR-IUL), Instituto Universitário de Lisboa (ISCTE-IUL), Av. das Forças Armadas, 1649-026, Lisboa, Portugal
Andreia Penso Pereira & Raul M. S. Laureano
Escola Politécnica, Computer Engineering, (POLI/EComp), Universidade de Pernambuco (UPE), Recife, 50720-001, Brazil
João Macedo
Global Health and Tropical Medicine, GHTM, Associate Laboratory in Translation and Innovation Towards Global Health, LA-REAL, Instituto de Higiene e Medicina Tropical, IHMT, Universidade NOVA de Lisboa, UNL, Rua da Junqueira 100, 1349-008, Lisboa, Portugal
Ana Afonso
Business Research Unit (BRU-IUL), Instituto Universitário de Lisboa (ISCTE-IUL), Av. das Forças Armadas, 1649-026, Lisboa, Portugal
Raul M. S. Laureano
Escola Politécnica, Computer Engineering (POLI/PPG-EC), Universidade de Pernambuco (UPE), Rua Benfica, 455-Bloco ‘C’, Recife, 50720-001, Brazil
Fernando Buarque de Lima Neto

Authors

Andreia Penso Pereira
João Macedo
Ana Afonso
Raul M. S. Laureano
Fernando Buarque de Lima Neto

Contributions

A.P.: Conceptualization, Methodology, Data preparation, Software development, Calibration, Validation. Writing original draft. J.M.: Software development, Writing-Reviewing and Editing. A.A.: Conceptualization, Methodology, Expert advice, Writing-Reviewing and Editing. R.L. and F.N.: Conceptualization, Methodology, Supervision, Writing-Reviewing and Editing.

Corresponding authors

Correspondence to Andreia Penso Pereira or Raul M. S. Laureano.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Pereira, A.P., Macedo, J., Afonso, A. et al. The use of social simulation modelling to understand adherence to diabetic retinopathy screening programs. Sci Rep 14, 4963 (2024). https://doi.org/10.1038/s41598-024-55517-4

Received: 25 September 2023
Accepted: 24 February 2024
Published: 29 February 2024
DOI: https://doi.org/10.1038/s41598-024-55517-4
