Introduction

Cervical cancer ranks the fourth most common cancer worldwide (570,000 cases and 311,000 deaths in 2018), accounting for 7.5% of all female cancer deaths. Nearly 90% cervical cancer deaths occur in the less developed regions1. In Thailand, cervical cancer has been a leading cancer for decades with an estimated 24.7 per 100,000 age-standardized incidence rate in the period of 1998–20002, then dropped down to 14.0 per 100,000 in the period 2010 to 20123.

The Thai Ministry of Public Health initiated a screening program in 2002 for women aged from 35 to 60 years at 5-yearly intervals4 and later extended it to women aged from 30 to 60 years5. The coverage of the screening, as evaluated by the Health Intervention and Technology Assessment Program, was around 68% by 2009–20106. Two national surveys done in 2007 and 2009 reported a coverage of 46.3% and 59.7%, respectively7. Variation in coverage across regions existed with Bangkok Metropolitan being the poorest at around 44%. Pap smear and visual inspection with acetic acid are screening methods covered under the universal health care (UHC)plan, while the human hapillomavirus virus vaccine is not covered by UHC due to its unaffordable price7.

Cervical cancer incidence dropped after countrywide screening8. The incidence of cervical cancer declined by 4.7% per year from just give the number at the peak in 1998–2000 in Songkhla9. However, screening rates for cervical cancer were not optimal among several groups of women: less or non-educated, poor, young, unmarried, and non-Buddhist women7. Furthermore, the extent to which Pap smear screening could affect the diagnosis of late stage cervical cancer lacked empirical evidence.

This study examined the effect of repeated Pap screening on cervical cancer diagnosis, as well as other risk factors, including residence, religion, marital status and age in at-risk neighborhoods.

Results

Propensity scores were calculated based on selected covariates (marital status, age, religion and hospital level), to make the screened and not screened groups more comparable. The proportion of each covariate in the 2 groups became similar after propensity scores matching. Among not screened women, 58(12.7%) were Muslim whereas 16(17.6%) were Muslim among screened women (Table 1).

Table 1 Comparison of matched samples.

Among screened women, 59(64.8%) were diagnosed with early stage cancer, while only 202(44.4%) were diagnosed at early stage among screened women (Table 2). The odds of a screened woman getting late-stage diagnosis were 0.43 times lower compared to a not screened woman.

Table 2 Distribution of Pap smear screening and cancer stage at Diagnosis.

A generalized linear model (GLM), which assumed a linear association between screening and diagnosis, was applied to get the probability of being diagnosed at early or late stage based on multiple independent variables. After the model selection (see methods), the best model included five independent variables (screening frequency, marital status, age, religion and hospital level). The results showed that not being screened, being single, older age, and being Muslim had higher odds of being diagnosed at late stage (Table 3).

Table 3 Probability of being diagnosed at late stage.

Generalized additive model regression (GAM) was used to highlight nonlinear trends in the data. All applicable continuous variables were smoothed using thin plate splines to separate trends from noise. The y-axis was transformed to be interpreted as the logarithm of the ‘odds’ to the probability of being diagnosed at late stage. The GAM revealed hidden non-linear trends of disease stage with age at diagnosis and repeated time of screening: the probability of being diagnosed at late stage increases rapidly among female aged from 40 to 55, as well as being screened from 0 to 5 times, the probability of being diagnosed at late stage dropped over 15% from 3-time to 5-time screening (Fig. 1).

Figure 1
figure 1

Smoothed graph of repeated times of screening and age. The curve is smoothed by the technique of thin plate splines, representing the change of the probability of being diagnosed at late stage.

The result of the GAM was similar to the GLM (Table 4). The deviance explained by generalized additive model was 8.74%. Analysis of variance showed that the residual deviance and the Akaike information criterion of the generalized additive model was less than generalized linear model, illustrating the better fit of the generalized additive model fit (Table 4).

Table 4 Results and comparison of generalized linear model and generalized additive model.

Discussion

Cancer stage at diagnosis is a critical determinant of cancer outcomes and is directly associated with survival in cancer patients10. Previous studies showed that women rarely or never screened were more likely to be diagnosed at late stage than women undergoing routine screenings11,12,13,14,15. Our study showed that patients were more likely to be diagnosed at an early stage with more frequent screening.

Studies in the past indicated that various disparities prevented women from being diagnosed at an early stage. A study from Florida indicated that elderly, unmarried, and uninsured women are more likely to be diagnosed late stage16. One study conducted in three American cities showed that residence in less developed neighborhoods tended to cause late-stage cancer diagnosis17, as well as physician characteristics, such as being screened before, or having visited doctors in the past 3 years which was revealed in a retrospective cohort study in Canada18.

Our analysis also validated that elderly and unmarried women had higher odds of getting a late stage diagnosis in Thailand. Moreover, women aged between 40 and 55 years were the most vulnerable population to late stage diagnosis. Possible explanations are that women ages 40 and 55 years are at a high risk for late stage cervical cancer because of having a sexual partner; unmarried females might get less social support or fear the loss of virginity or think of cervical cancer as a sexually transmitted disease which might prevent them from entering the clinic for screening.

Religious and cultural beliefs, especially those valuing modesty and premarital virginity, contribute to reluctance to seek health care19. In our study, Muslim women had a higher chance of getting a late stage diagnosis. Many Muslim women face challenges in obtaining adequate health care due to family pressures, especially from their husband20; they may resist screening practices that threaten their cultural and religious values. Additionally, Asian people consider cancer screening as a response to symptoms instead of tests to prevent the development of symptoms21.

The latest recommendation is to screen women with Pap tests every 3 years according to our findings and the recommendation made by the United States Preventive Services Task Force and American Cancer Societies22. Our findings show that more frequent screening might decrease the number of women diagnosed at late stage. More frequent Pap smear screening should be provided for high-risk group, such as unmarried, women aged 40 to 55 and Muslim women. Reminder letters, texts through mobile phones or door-to-door visit by community health workers might be good ways to notify the high-risk group to attend the screening on time.

Despite the better outcome of 3-year interval screening, many countries including Thailand used the 5-year interval due to the limited financial budget and human resources. Moreover, in 2017, Thailand government has initiated free HPV vaccination for Grade 5 students23, which is another way to prevent cervical cancer under the 5-year interval screening. Thus, we suggest a Pap smear screening program with 3-year interval when there are enough health workers and money for the whole society.

Our study had some limitations. First, registry data and screening data were not well matched leading to a limited number of cases available for this analysis; Second, several important independent variables were not included: socio-economic status, occupation, education, and age at birth of first child. Therefore, our findings may not be generalized to all women in Thailand. However, this is the first study to combine a screening database with a cancer registry in Thailand to identify factors that contribute to reduced screening and how it can affect cancer diagnosis. Also, propensity score matching was proved to be an effective way to link two separate databases. Lastly, generalized additive model revealed nonlinear trends that were important for particular age groups and these could not be detected by the commonly used generalized linear model.

Methods

Region

Songkhla province is located in southern Thailand with a population of 1,424,230 (25% Muslims)24. Although the estimated age-standardized incidence rate dropped from its peak 20.6 in 1999 to 14.0 per 100,000 in the period 2010 to 20123, cervical cancer still ranks the second among common female cancer of Songkhla. The organized Pap smear screening program has started since 2004 in Songkhla9.

Cancer registry and screening

The Songkhla registry covers sixteen districts in southern Thailand. Cancer cases have been collected from 23 sources including community and private hospitals, also the population registration office. Undetected cases still existed in remote villages due to poor access to health facilities and the utilization of traditional Thai medicine in lieu of health care services25.

From 1989–2014 Songkhla Cancer Registry data, 3,317 cervical cancer cases were selected by ICD-10 codes (C53.X for invasive cancer and D06.9 for carcinoma in situ of the cervix uteri). Cervical cancer screening data from 2001–2016 was provided by Songkhla Provincial Health Office. There were 114,222 persons screened with 208,039 times (visits).

Data management

Although screening started in 2002, the cervical cancer screening database used before 2006 had a different data structure. Therefore, data from both the cancer and screening registries were only able to be merged based on unique registry numbers from 2006 to 2014 as this was the time period with the most complete variables needed for the analysis. Prior to 2006, missing data prevented any informative analysis.

Due to the nature of the cancer registry, only invasive cancers were included in this analysis. The cancer registry intends to collect invasive cancers only. If precancerous lesions were identified in the cancer registry, the data was considered incomplete as the invasive cancer was likely not collected. Including this information would bias our analyses as it would seem that the in situ cases did not progress to invasive carcinomas, although they likely did and were just not captured by the registry.

Only 5 patients had unknown stage, and were excluded to only include those with complete stage information. We included 680 women for this retrospective population-based study. Of these, 91 women aged from 35–60 had their screening before diagnosis, and 589 had a cancer diagnosis but no screening history (Fig. 2).

Figure 2
figure 2

Study sample inclusion process.

A propensity score is the probability of a unit (e.g., person, hospital, department) being allocated to a specific treatment given a group of observed covariates. Selection bias can be reduced by equating groups based on these covariates. The propensity score was estimated by running a logit model where the outcome variable is a binary variable indicating the stage of cervical cancer. For the matching, covariates that were related to both the screening and outcome variables were included. The R package “MatchIt” was used for estimating the propensity score and then matches observations based on the method of choice (“nearest” in this case). After matching, 91 for screening group and 455 for no screening group were included.

Data analysis

The stages of cervical cancer were tagged from 1 to 4. Stage 1—The cancer is contained within the cervix; Stage 2—The cancer reaches out of the cervix to the surrounding tissues; Stage 3—The cancer spreads outside the surrounding area of the cervix; Stage 4—This stage is advanced cervical cancer. Stage 1 was assigned to early stage, the rest as late stage.

Pap smear screening was categorized in two ways: binary variable (1 for being screened; 0 for not being screened) and count variable (0, 1, 2 to N times of being screened). Other independent variables included religion, age, marital status, hospital level, list all variables. Women aged 30–60 years were the target of the national screening program and therefore, the focus of this analysis. The first 5-year age group was omitted from the calculation as unstable estimates of the risk might occur in counting cases first entering the screening process by calendar year.

Data management and description analysis was done by using R packages including “epicalc”, “plyr” and “reshape2”. Logistic regression was conducted by using R package “ice”. Inference was made based on the chi-square test in univariate analysis. P-value less than 0.05 suggests there is a significant difference among different categories. The 95% confidence interval and p-value were calculated in the logistic regression model. The likelihood ratio test and Wald’s test were used to test the statistical significance of the variable in the logistic regression model. A p-value less than 0.05 suggests the statistical significance of a variable in the model. All statistical analysis was conducted using R software version 3.5.2.

A generalized additive model was conducted by using R package “mgcv”. Generalized additive models are generalized linear models in which the linear predictor depends linearly on unknown smooth functions of some predictor variables, and interest focuses on inference about these smooth functions. Therefore, the advantage of the generalized additive model (GAM) lies in relaxing the near universal statistical assumption of linearity, and thereby potentially allowing the discovery of important trends that may have been missed in traditional analyses. We actually tried to smooth all applicable continuous variables to see if non-linear relationship was expected. In the usual practice of assessing non-linearity, the value of a continuous variable is cut into ordinal strata and the ordinal variable is tested with a linear model. The smoothing technique takes segments of data and assesses the relationship of the continuous predictor with the outcome and gives a series of values rather than a single value for the beta estimate26. The equation of generalized additive model is as follows: g(E(y)) = β0 + f1(x1) + f2(x2) + ··· + fm(xm). The best fitted multivariate model was met by stepwise varible selection based on Akaike information criterion (AIC). First, we select the possible variables based on the conclusions of previous similar studies and univariate analysis between two groups. Then we applied stepwise model selection to looking for better fitted model. Last, we decided the final variables on the basis of statistical model selection based on Akaiki Information Criterion (AIC) and suggestion from other studies.

Ethical consideration

This study was conducted according to the World Medical Association Declaration of Helsinki and conformed to ICMJE Recommendations for the Conduct, Reporting, Editing, and Publication of Scholarly Work in Medical Journals. This study was approved by the Ethical Committee of the Faculty of Medicine, Prince of Songkla University. Since the data were collected from the process of routine health examination, the content form was not required from the participants. Written informed consent was waived by the Ethical Committee of the Faculty of Medicine, Prince of Songkla University. All the information that might violate the right of privacy were not included in this study.