The effect of Pap smear screening on cervical cancer stage among southern Thai women

Our study aimed to investigate the effect of Pap smear screening on stage at diagnosis of cervical cancer in a heterogeneous population of Thai women. Data from the population-based cancer registry and the screening registry were merged using unique identification numbers for the period 2006 to 2014. Screened patients had lower odds of being diagnosed at late stage. After adjustment, married women had a reduced risk of late-stage cancer compared to single women. Muslim women had almost twice the risk of being diagnosed at late stage compared to Buddhist women. The odds of being diagnosed at late stage decreased as the number of screenings increased. The probability of being diagnosed at late stage increased rapidly among women aged 40 to 55 years. Pap smear screening is a protective factor against late-stage diagnosis of cervical cancer. Patients were more likely to be diagnosed at early stage with more frequent screening. Future screening programs would benefit from shorter screening intervals and greater attention to vulnerable populations: women aged between 40 and 55 years, and women who are single or Muslim.

Among screened women, 59 (64.8%) were diagnosed with early-stage cancer, while only 202 (44.4%) of unscreened women were diagnosed at early stage (Table 2). The odds of a late-stage diagnosis for a screened woman were 0.43 times those of an unscreened woman. A generalized linear model (GLM), which assumed a linear association between screening and diagnosis, was applied to estimate the probability of being diagnosed at early or late stage based on multiple independent variables. After model selection (see Methods), the best model included five independent variables (screening frequency, marital status, age, religion and hospital level). The results showed that not being screened, being single, older age, and being Muslim were associated with higher odds of being diagnosed at late stage (Table 3).
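As a quick arithmetic check, the crude odds ratio can be reproduced from the reported counts. The sketch below assumes that the comparison group is the 455 unscreened women in the matched sample described in the Methods, so that 202 of 455 were early stage; the variable names are illustrative only.

```python
# Counts taken from the text; the unscreened denominator (455) is an
# assumption based on the matched sample size reported in the Methods.
screened_early, screened_total = 59, 91
unscreened_early, unscreened_total = 202, 455

# Late-stage counts follow by subtraction
screened_late = screened_total - screened_early
unscreened_late = unscreened_total - unscreened_early


def odds(events, non_events):
    """Odds of an event: number with the event over number without it."""
    return events / non_events


# Odds of a late-stage diagnosis in each group, then their ratio
odds_screened = odds(screened_late, screened_early)
odds_unscreened = odds(unscreened_late, unscreened_early)
odds_ratio = odds_screened / odds_unscreened

print(f"Early stage, screened:   {screened_early / screened_total:.1%}")
print(f"Early stage, unscreened: {unscreened_early / unscreened_total:.1%}")
print(f"Crude odds ratio (late stage, screened vs unscreened): {odds_ratio:.2f}")
```

Under these assumptions the percentages come out at 64.8% and 44.4%, and the crude odds ratio at 0.43, in line with the figures quoted in the text.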
Generalized additive model (GAM) regression was used to highlight nonlinear trends in the data. All applicable continuous variables were smoothed using thin plate splines to separate trends from noise. The y-axis was transformed so that it can be interpreted as the logarithm of the odds of being diagnosed at late stage. The GAM revealed hidden non-linear trends of disease stage with age at diagnosis and with the number of screenings: the probability of being diagnosed at late stage increased rapidly among women aged 40 to 55 years, and, among women screened 0 to 5 times, it dropped by more than 15% between the third and the fifth screening (Fig. 1).
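To read a log-odds axis of this kind, it can help to convert values back to probabilities. The following minimal sketch shows the standard logit transform and its inverse; the function names and the example values are illustrative and are not taken from the study.

```python
import math


def log_odds(p):
    """Logit transform: probability -> logarithm of the odds."""
    return math.log(p / (1.0 - p))


def probability(z):
    """Inverse logit: logarithm of the odds -> probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Illustrative values only: a log-odds of 0 corresponds to a probability of 0.5
for z in (-1.0, 0.0, 1.0):
    print(f"log-odds {z:+.1f} -> probability {probability(z):.3f}")
```

Because the transform is non-linear, equal steps on the log-odds axis do not correspond to equal changes in probability.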
The results of the GAM were similar to those of the GLM (Table 4). The deviance explained by the generalized additive model was 8.74%. Analysis of variance showed that the residual deviance and the Akaike information criterion of the generalized additive model were lower than those of the generalized linear model, indicating a better fit of the generalized additive model (Table 4).
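For reference, the two fit measures compared here are conventionally defined as below. The symbols (null deviance, residual deviance, maximized likelihood, and number of parameters k) are introduced for illustration and do not appear in the paper; for a GAM, k is usually taken as the effective degrees of freedom.

```latex
% Proportion of deviance explained by a fitted model
\text{deviance explained} = \frac{D_{\text{null}} - D_{\text{resid}}}{D_{\text{null}}}

% Akaike information criterion; lower values indicate a better trade-off
% between goodness of fit and model complexity
\mathrm{AIC} = 2k - 2\ln\hat{L}
```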

Discussion
Cancer stage at diagnosis is a critical determinant of cancer outcomes and is directly associated with survival in cancer patients 10. Previous studies showed that women rarely or never screened were more likely to be diagnosed at late stage than women undergoing routine screenings [11][12][13][14][15]. Our study showed that patients were more likely to be diagnosed at an early stage with more frequent screening.
Past studies indicated that various disparities prevented women from being diagnosed at an early stage. A study from Florida indicated that elderly, unmarried, and uninsured women are more likely to be diagnosed at late stage 16. One study conducted in three American cities showed that residence in less developed neighborhoods was associated with late-stage cancer diagnosis 17, as were physician-related factors, such as having been screened before or having visited a doctor in the past 3 years, according to a retrospective cohort study in Canada 18.
Our analysis also confirmed that elderly and unmarried women had higher odds of a late-stage diagnosis in Thailand. Moreover, women aged between 40 and 55 years were the population most vulnerable to late-stage diagnosis. Possible explanations are that women aged 40 to 55 years are at high risk of late-stage cervical cancer because of having a sexual partner, and that unmarried women might receive less social support, fear the loss of virginity, or regard cervical cancer as a sexually transmitted disease, any of which might deter them from attending a clinic for screening.
Religious and cultural beliefs, especially those valuing modesty and premarital virginity, contribute to reluctance to seek health care 19. In our study, Muslim women had a higher chance of a late-stage diagnosis. Many Muslim women face challenges in obtaining adequate health care due to family pressures, especially from their husbands 20; they may resist screening practices that threaten their cultural and religious values. Additionally, Asian people consider cancer screening a response to symptoms rather than a test to prevent symptoms from developing 21.
The latest recommendation, made by the United States Preventive Services Task Force and the American Cancer Society and consistent with our findings, is to screen women with Pap tests every 3 years 22. Our findings show that more frequent screening might decrease the number of women diagnosed at late stage. More frequent Pap smear screening should be provided for high-risk groups, such as unmarried women, women aged 40 to 55 years, and Muslim women. Reminder letters, mobile phone text messages, or door-to-door visits by community health workers might be good ways to prompt high-risk groups to attend screening on time.
Despite the better outcome of screening at a 3-year interval, many countries, including Thailand, have used a 5-year interval because of limited budgets and human resources. Moreover, in 2017 the Thai government initiated free HPV vaccination for Grade 5 students 23, which is another way to prevent cervical cancer under the 5-year screening interval. Thus, we suggest a Pap smear screening program with a 3-year interval when enough health workers and funding are available for the whole society.
Our study had some limitations. First, registry data and screening data were not well matched, leading to a limited number of cases available for this analysis. Second, several important independent variables were not included: socio-economic status, occupation, education, and age at birth of first child. Therefore, our findings may not be generalizable to all women in Thailand. However, this is the first study to combine a screening database with a cancer registry in Thailand to identify factors that contribute to reduced screening and how it can affect cancer diagnosis. Also, propensity score matching proved to be an effective way to link two separate databases. Lastly, the generalized additive model revealed nonlinear trends that were important for particular age groups and that could not be detected by the commonly used generalized linear model.

Methods
Region. Songkhla province is located in southern Thailand with a population of 1,424,230 (25% Muslims) 24.
Although the estimated age-standardized incidence rate dropped from its peak of 20.6 per 100,000 in 1999 to 14.0 per 100,000 in the period 2010 to 2012 3, cervical cancer still ranks second among the common cancers in women in Songkhla. The organized Pap smear screening program has been in place in Songkhla since 2004 9.
Cancer registry and screening. The Songkhla registry covers sixteen districts in southern Thailand.
Cancer cases have been collected from 23 sources, including community and private hospitals and the population registration office. Undetected cases still existed in remote villages due to poor access to health facilities and the use of traditional Thai medicine in lieu of health care services 25.
From the 1989-2014 Songkhla Cancer Registry data, 3,317 cervical cancer cases were selected by ICD-10 codes (C53.X for invasive cancer and D06.9 for carcinoma in situ of the cervix uteri). Cervical cancer screening data from 2001-2016 were provided by the Songkhla Provincial Health Office. In total, 114,222 persons were screened over 208,039 visits.

Data management.
Although screening started in 2002, the cervical cancer screening database used before 2006 had a different data structure. Therefore, data from the cancer and screening registries could only be merged, based on unique registry numbers, from 2006 to 2014, as this was the period with the most complete set of variables needed for the analysis. Prior to 2006, missing data prevented any informative analysis.
Because the cancer registry is intended to collect invasive cancers only, only invasive cancers were included in this analysis. If precancerous lesions were identified in the cancer registry, the data were considered incomplete, as the invasive cancer was likely not collected. Including this information would bias our analyses, as it would seem that the in situ cases did not progress to invasive carcinomas, although they likely did and were just not captured by the registry.
Only 5 patients had unknown stage; they were excluded so that only patients with complete stage information were analysed. We included 680 women in this retrospective population-based study. Of these, 91 women aged 35-60 years had been screened before diagnosis, and 589 had a cancer diagnosis but no screening history (Fig. 2).
A propensity score is the probability of a unit (e.g., person, hospital, department) being allocated to a specific treatment given a group of observed covariates. Selection bias can be reduced by equating groups based on these covariates. The propensity score was estimated by running a logit model where the outcome variable is a binary variable indicating the stage of cervical cancer. For the matching, covariates that were related to both the screening and outcome variables were included. The R package "MatchIt" was used to estimate the propensity score and then match observations using the chosen method ("nearest" in this case). After matching, 91 women in the screening group and 455 in the no-screening group were included.
Data analysis. The stages of cervical cancer were coded from 1 to 4. Stage 1: the cancer is contained within the cervix; Stage 2: the cancer extends beyond the cervix to the surrounding tissues; Stage 3: the cancer spreads outside the area surrounding the cervix; Stage 4: advanced cervical cancer. Stage 1 was classified as early stage and the remaining stages as late stage.
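The nearest-neighbour matching step on the propensity score can be illustrated with a small, self-contained sketch. This is not the MatchIt implementation: it is a simplified greedy matcher without replacement, the function name and the scores are hypothetical, and MatchIt's actual defaults (for example, the order in which treated units are processed) may differ. The group sizes reported above (91 and 455) suggest five controls per screened woman, but that ratio is inferred rather than stated.

```python
def nearest_neighbor_match(treated, controls, ratio=5):
    """Greedy nearest-neighbour matching without replacement.

    treated, controls: dicts mapping a unit id to its propensity score.
    Returns a dict mapping each treated id to a list of matched control ids.
    """
    available = dict(controls)  # copy so the caller's dict is not modified
    matches = {}
    for t_id, t_score in treated.items():
        # Rank the remaining controls by distance to this treated unit's score
        ranked = sorted(available, key=lambda c_id: abs(available[c_id] - t_score))
        chosen = ranked[:ratio]
        matches[t_id] = chosen
        # Matched controls cannot be reused for later treated units
        for c_id in chosen:
            del available[c_id]
    return matches


# Hypothetical propensity scores, for illustration only
treated = {"s1": 0.30, "s2": 0.60}
controls = {"u1": 0.28, "u2": 0.35, "u3": 0.58, "u4": 0.65, "u5": 0.90, "u6": 0.10}

matches = nearest_neighbor_match(treated, controls, ratio=2)
print(matches)  # {'s1': ['u1', 'u2'], 's2': ['u3', 'u4']}
```

Greedy matching is order-dependent: a control taken by an early treated unit is no longer available to later ones, which is why matching software typically lets the user choose the processing order.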
Pap smear screening was coded in two ways: as a binary variable (1 for screened; 0 for not screened) and as a count variable (0, 1, 2, up to N screenings). Other independent variables included religion, age, marital status, and hospital level. Women aged 30-60 years were the target of the national screening program and were therefore the focus of this analysis. The first 5-year age group was omitted from the calculation because risk estimates might be unstable when counting cases first entering the screening process by calendar year.
Data management and descriptive analysis were done using R packages including "epicalc", "plyr" and "reshape2". Logistic regression was conducted using the R package "ice". Inference in univariate analysis was based on the chi-square test; a p-value less than 0.05 suggests a significant difference among categories. The 95% confidence interval and p-value were calculated in the logistic regression model. The likelihood ratio test and Wald's test were used to test the statistical significance of each variable in the logistic regression model; a p-value less than 0.05 suggests that the variable is statistically significant in the model. All statistical analysis was conducted using R software version 3.5.2.
A generalized additive model was fitted using the R package "mgcv". Generalized additive models are generalized linear models in which the linear predictor depends linearly on unknown smooth functions of some predictor variables, and interest focuses on inference about these smooth functions. Therefore, the advantage of the generalized additive model (GAM) lies in relaxing the near-universal statistical assumption of linearity, thereby potentially allowing the discovery of important trends that may have been missed in traditional analyses. We smoothed all applicable continuous variables to check whether non-linear relationships were present. In the usual practice of assessing non-linearity, the value of a continuous variable is cut into ordinal strata and the ordinal variable is tested with a linear model. The smoothing technique takes segments of data, assesses the relationship of the continuous predictor with the outcome, and gives a series of values rather than a single value for the beta estimate 26. The equation of the generalized additive model is as follows: $g(E(y)) = \beta_0 + f_1(x_1) + f_2(x_2) + \cdots + f_m(x_m)$. The best-fitting multivariate model was obtained by stepwise variable selection based on Akaike information