Introduction

Pelvic organ prolapse (POP) is a common disease among older women and leads to various symptoms1. The prevalence of posterior pelvic prolapse is 18–40%2, and its severity is closely related to defecation disorders3. Constipation is a rising concern among POP patients: obstructed defecation affects as many as 43% of patients with pelvic organ prolapse4.

With increasing attention to pelvic floor disorders, a series of validated self-administered quality-of-life (QOL) questionnaires have been developed to assess individual symptoms5. However, validated Chinese-language questionnaires on functional constipation are limited. Although the Rome III criteria for functional constipation are considered the gold standard, they cannot be used to assess the severity of constipation symptoms6. The Pelvic Floor Impact Questionnaire Short Form 7 (PFIQ-7) and the Pelvic Floor Distress Inventory 20 (PFDI-20) were developed in 20057. Their Chinese translations and the corresponding reliability and validity analyses among Chinese patients were completed in 2011 and 2019, respectively8,9. Both are widely used among patients with pelvic organ prolapse. However, PFIQ-7 and PFDI-20 were not designed to investigate symptoms of constipation7.

The Constipation Scoring System (CSS), also known as the Wexner Constipation Score or Cleveland Constipation Score, is one of the earliest and most widely accepted constipation scores. The original English version was developed and verified by the Colorectal Surgery Department, Cleveland Clinic, USA10. CSS is widely used to assess the severity of constipation in China11. However, no Chinese version of CSS has been analyzed for reliability and validity, nor has it been validated in Chinese women with POP. As pelvic floor gynecologists pay increasing attention to assessing posterior pelvic symptoms, CSS shows excellent prospects for application in POP patients. Using CSS in the Chinese population may raise cultural or linguistic issues, and some items may need to be adjusted to the characteristics of POP patients to improve comprehensibility. The need to translate the CSS questionnaire into Chinese and validate it in patients with POP has therefore become increasingly urgent.

This study aims to translate the CSS questionnaire into Chinese and analyze its reliability and validity in POP patients, to fill the gap in questionnaires for constipation assessment of POP patients, and to promote the application of CSS in clinical work and research. In POP patients, it may provide a basis for establishing a comprehensive evaluation system for the anatomy and function of posterior pelvic compartment prolapse.

Methods

This prospective study was approved by The Ethics Committee of Peking University People's Hospital (Ethics Number: 2019PHB273-01). Patients with pelvic organ prolapse were recruited from the outpatient clinic of Peking University People's Hospital from May 2019 to January 2021. All research was performed following relevant guidelines and regulations, including the Declaration of Helsinki.

Inclusion criteria: (1) Patients diagnosed with stage II or above pelvic organ prolapse according to POP-Q with ages ranging from 30 to 90; (2) Patients who report at least one abnormal defecation symptom; (3) Willing to accept relevant questionnaire survey.

Exclusion criteria: (1) Previous surgical treatment for pelvic organ prolapse; (2) Constipation diagnosed as "colon slow transit constipation"; (3) Previous diagnosis of ulcerative colitis, Crohn's disease, or colorectal malignancy; (4) Unable to understand or complete the questionnaires; (5) Other serious somatic comorbidities (such as diabetes with unstable blood glucose levels, heart disease, and malignant tumors in other body parts).

The CSS included eight questions: frequency of bowel movements, difficulty in evacuation effort, incomplete evacuation, abdominal pain, time in lavatory per attempt, type of assistance, unsuccessful attempts for evacuation per 24 h, and duration of constipation. The respondents answered the eight questions, and each question was scored according to the corresponding options (0–2 points for type of assistance, 0–4 points for the other questions, and 30 points in total). The sum of the eight item scores was the CSS total score. The higher the score, the more severe the symptoms of constipation.
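The scoring rule described above can be sketched in a few lines. This is a minimal illustration, not the study's software: the item keys and the function name `css_total` are hypothetical labels chosen here, and only the point ranges (0–2 for type of assistance, 0–4 for the other items, 30 in total) come from the text.

```python
# Maximum points per item; the key names are hypothetical labels for the
# eight CSS questions listed in the text.
MAX_POINTS = {
    "frequency": 4,
    "difficulty": 4,
    "incomplete_evacuation": 4,
    "abdominal_pain": 4,
    "time_in_lavatory": 4,
    "assistance": 2,  # the only item scored 0-2
    "unsuccessful_attempts": 4,
    "duration": 4,
}


def css_total(answers):
    """Return the CSS total after checking that every item is in range."""
    if set(answers) != set(MAX_POINTS):
        raise ValueError("answers must contain exactly the eight CSS items")
    for item, score in answers.items():
        if not isinstance(score, int) or not 0 <= score <= MAX_POINTS[item]:
            raise ValueError(f"{item}: score {score!r} is outside 0-{MAX_POINTS[item]}")
    return sum(answers.values())


# Made-up answers for one respondent, for illustration only.
example = {
    "frequency": 1,
    "difficulty": 2,
    "incomplete_evacuation": 2,
    "abdominal_pain": 0,
    "time_in_lavatory": 1,
    "assistance": 1,
    "unsuccessful_attempts": 0,
    "duration": 3,
}
print(css_total(example))  # total for the made-up respondent
```

The per-item maxima sum to 30, which matches the maximum total stated above.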

The Chinese version of CSS was translated according to the "WHO-QOL Questionnaire Translation Method for Intercultural Quality of Life Research." We obtained authorization for the translation and the related research on the original CSS questionnaire from the corresponding author via email. To preserve as much of the original meaning as possible, the CSS translation involved two independent forward and backward translations12. We conducted a pilot study in 10 participants with POP-Q stage II or above. The final Chinese version of the CSS questionnaire was obtained after revision according to expert opinions and the feedback of subjects. The Rome III criteria for functional constipation are straightforward and easy to use, and the researchers determined whether each patient met them6. Patients completed the Chinese version of the CSS questionnaire twice before surgery, at an interval of 2–4 weeks, with two separate researchers. The researcher conducting the second investigation was blinded to the data from the first investigation. The enrollment data are presented in Supplementary Fig. 1.

Reliability analysis

Cronbach's α coefficient was used to evaluate the internal consistency of the questionnaire. A Cronbach's α coefficient > 0.7 indicated good internal consistency. The intraclass correlation coefficient (ICC) was used to evaluate the retest reliability of the questionnaire. ICC > 0.75 indicated good retest reliability, 0.4 ≤ ICC ≤ 0.75 medium, and ICC < 0.4 poor. The Kappa coefficient was used to test the consistency of the two measurements. A Kappa coefficient ≥ 0.21 was considered general consistency, and a Kappa coefficient ≥ 0.41 was considered medium consistency.
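As a rough illustration of the internal-consistency statistic, Cronbach's α can be computed from an item-score matrix with the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of total scores). The sketch below uses only the Python standard library; the function names and the toy data are assumptions made for illustration, and the study itself used SPSS. The helper `icc_label` simply encodes the ICC thresholds stated above.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for rows of item scores (one row per respondent)."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the respondents' total scores.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def icc_label(icc):
    """Map an ICC value to the categories used in this section."""
    if icc > 0.75:
        return "good"
    if icc >= 0.4:
        return "medium"
    return "poor"


# Toy data (made up): four respondents, three items.
toy = [[0, 1, 1], [2, 2, 3], [1, 1, 2], [4, 3, 4]]
print(round(cronbach_alpha(toy), 3))
print(icc_label(0.5))
```

An α above 0.7 would be read as good internal consistency under the threshold given in this section.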

Validity analysis

Factor analysis aims to determine whether the correlations among multiple observed variables can be explained or summarized by a smaller number of latent variables. These unobserved variables are also called factors13. There are two main methods: exploratory factor analysis and confirmatory factor analysis. Exploratory factor analysis is used for a preliminary investigation of a set of observed variables, whereas confirmatory factor analysis tests whether a specified factor structure holds for a new data set.

Construct validity: in exploratory factor analysis, we used the Kaiser–Meyer–Olkin (KMO) test and Bartlett's test of sphericity to determine whether the scale met the conditions for factor analysis. If the KMO value was > 0.6 and Bartlett's test gave p < 0.001, the requirements were met, and factors were extracted by principal component analysis, retaining the common factors with eigenvalues > 1, to judge whether the Chinese questionnaire had a sound logical structure. In confirmatory factor analysis, we used CMIN/DF, RMSEA, IFI, CFI, TLI, PNFI, and PCFI to evaluate model fit, used the average variance extracted (AVE) and composite reliability (CR) to assess convergent validity, and calculated the square root of AVE and the correlation coefficient of the factors. If the square root of AVE was greater than the correlation coefficient, discriminant validity was considered good.
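The convergent and discriminant validity checks can be illustrated with the usual formulas based on standardized loadings λ: AVE = mean(λ²) and CR = (Σλ)² / ((Σλ)² + Σ(1 − λ²)). The sketch below assumes these standard definitions. The function names and the loadings are hypothetical, since the study's loadings appear only in its supplementary tables.

```python
from math import sqrt


def ave(loadings):
    """Average variance extracted: mean of the squared standardized loadings."""
    return sum(l * l for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    squared_sum = sum(loadings) ** 2
    error_var = sum(1 - l * l for l in loadings)
    return squared_sum / (squared_sum + error_var)


def discriminant_ok(ave_a, ave_b, factor_corr):
    """True if sqrt(AVE) of both factors exceeds the inter-factor correlation."""
    return min(sqrt(ave_a), sqrt(ave_b)) > abs(factor_corr)


# Hypothetical standardized loadings for one factor (not the study's values).
loadings = [0.62, 0.55, 0.48, 0.51]
print(round(ave(loadings), 3), round(composite_reliability(loadings), 3))
```

The final comparison is the rule stated in the text: discriminant validity is judged good when the square root of each factor's AVE exceeds the correlation between the factors.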

Criterion validity: as a subscale of the PFDI-20 that measures the influence of defecation symptoms on quality of life, the CRADI-8 has been verified in the Chinese PFD population9. There is still no gold standard for evaluating defecation function in POP patients. Accordingly, the recommendations of the consensus-based standards for the selection of health measurement instruments (COSMIN) were followed14, and we chose the CRADI-8 as the criterion measure. Spearman correlation analysis of the CSS score and the CRADI-8 score was used to evaluate criterion validity. If p < 0.05, the correlation between the CSS score and the CRADI-8 score was considered significant.
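For readers unfamiliar with the statistic, Spearman's correlation is the Pearson correlation of the ranks, with tied values given their average rank. The sketch below is a standard-library illustration with hypothetical function names and made-up scores. It returns only the coefficient and does not compute the p value reported by statistical packages.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Made-up paired scores (a CSS-like total and a CRADI-8-like score).
css_scores = [2, 5, 8, 3, 12, 7]
cradi_scores = [10.0, 25.0, 30.0, 5.0, 60.0, 20.0]
print(round(spearman_rho(css_scores, cradi_scores), 3))
```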

Floor effect and ceiling effect

A floor effect or ceiling effect is considered to exist if more than 15% of the respondents obtain the lowest or highest possible score, respectively15.
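The 15% rule is simple to apply in code. The following sketch is illustrative only; the function name `floor_ceiling` and its return keys are hypothetical.

```python
def floor_ceiling(scores, min_score, max_score, threshold=0.15):
    """Share of respondents at the lowest/highest possible score and effect flags."""
    n = len(scores)
    floor_share = sum(1 for s in scores if s == min_score) / n
    ceiling_share = sum(1 for s in scores if s == max_score) / n
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        "floor_effect": floor_share > threshold,
        "ceiling_effect": ceiling_share > threshold,
    }


# Toy total scores on a 0-30 scale (made up for illustration).
toy_scores = [0, 0, 3, 5, 8, 12, 4, 6, 9, 30]
print(floor_ceiling(toy_scores, min_score=0, max_score=30))
```

In the toy data, 20% of respondents score 0, so a floor effect is flagged, while 10% score 30, which stays below the threshold.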

Reactivity

The Wilcoxon signed-rank test was used to assess whether there was a significant difference in CSS score before and after surgery. P < 0.05 indicated a significant difference between the two results.
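To make the test concrete, the sketch below computes the signed-rank sums and a two-sided p value from the large-sample normal approximation. It is a simplified standard-library illustration under stated assumptions: zero differences are dropped, ties receive average ranks, and no tie or continuity correction is applied, so the p value can differ from that of SPSS or other packages, especially in small samples. The function name and the data are hypothetical.

```python
from math import erf, sqrt


def wilcoxon_signed_rank(before, after):
    """Return (W+, W-, z, two-sided p) using the normal approximation."""
    diffs = [b - a for b, a in zip(before, after) if b != a]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    sd = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return w_plus, w_minus, z, p


# Made-up pre- and postoperative scores for five patients.
pre = [6, 8, 10, 5, 7]
post = [3, 4, 5, 4, 9]
print(wilcoxon_signed_rank(pre, post))
```

With only five pairs, as in the toy data, an exact test would be more appropriate than the normal approximation; the example is meant only to show how the rank sums are formed.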

Statistical analysis

Excel was used for data entry, and SPSS 24.0 statistical software (2016 version) and AMOS 21.0 were used for data analysis. SPSS 24.0 and GraphPad Prism 8.0.2 were used to draw the figures. Measurement data were expressed as the mean ± standard deviation (x̄ ± s) or median (25th and 75th percentiles) (M (P25, P75)) according to the data distribution, while categorical data were expressed as frequency and percentage. Spearman correlation analysis was used to evaluate the correlation between two sets of measurement data. The Mann–Whitney U test for two independent samples was used to compare measurement data between two groups. P < 0.05 was considered statistically significant. Taking the Rome III criteria for functional constipation as the reference, the receiver operating characteristic (ROC) curve of CSS for diagnosing constipation in POP patients was drawn. We calculated the area under the curve (AUC) and the optimal cut-off point to examine the role of CSS in the diagnosis of constipation in POP patients.
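The ROC analysis can be sketched without any statistics package. In the illustration below, the AUC is computed as the probability that a randomly chosen patient with constipation has a higher score than one without (ties count as one half), and the cut-off is chosen by maximizing Youden's J = sensitivity + specificity − 1. Both the use of Youden's index and the convention that a score at or above the cut-off counts as positive are assumptions made here, since the text does not state how the optimal cut-off was defined. The function names and the data are hypothetical.

```python
def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores, labels):
    """Return (cut-off, J) maximizing sensitivity + specificity - 1.

    Assumes 'score >= cut-off' is a positive test; on ties the lowest cut-off wins.
    """
    n_pos = sum(1 for l in labels if l)
    n_neg = len(labels) - n_pos
    best = None
    for c in sorted(set(scores)):
        tp = sum(1 for s, l in zip(scores, labels) if l and s >= c)
        tn = sum(1 for s, l in zip(scores, labels) if not l and s < c)
        j = tp / n_pos + tn / n_neg - 1
        if best is None or j > best[1]:
            best = (c, j)
    return best


# Made-up scores; label 1 = meets the reference criteria, 0 = does not.
toy_scores = [1, 2, 3, 5, 4, 6, 8, 9]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(auc(toy_scores, toy_labels), youden_cutoff(toy_scores, toy_labels))
```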

Ethical approval

The Ethics Committee of Peking University People's Hospital approved this study (Ethics Number: 2019PHB273-01). All research was performed in accordance with relevant guidelines and regulations, including the Declaration of Helsinki. All the subjects were adults, and informed consent was obtained from all of them.

Results

Demographic information and POP-Q staging of the participants

A total of 140 women were enrolled in our study and completed the Chinese version of the CSS questionnaire and the CRADI-8 questionnaire. Of these, 90 completed the second CSS questionnaire at an interval of 2–4 weeks before surgery, and 87 completed the postoperative follow-up, which was performed three months after surgery (see Supplementary Fig. 1). The demographic information is shown in Table 1.

Table 1 Demographic information of the participants.

Reliability analysis

Internal consistency

The Cronbach's α coefficient of the total score of the Chinese version of the CSS questionnaire for POP patients was 0.721. The results for each item and the overall Cronbach's α coefficient if an item was deleted are shown in Table 2. The overall Cronbach's α coefficient increased only slightly (to 0.722) when the fourth question, "How often do you suffer from abdominal pain?", was deleted. These results indicate that the internal consistency of the questionnaire is good.

Table 2 Internal consistency of the CSS questionnaire in the Chinese version.

Test–retest reliability

For the 90 patients who completed the second CSS investigation, ICC analysis was conducted, and the results are shown in Table 3. The ICC of the total CSS score was 0.877. The retest reliability of the total score of the questionnaire was therefore good, and that of the subitems was medium to good (0.690–0.920). The Kappa value of the total CSS score was 0.424, and the Kappa values of the individual questions ranged from 0.581 to 0.877. These results showed at least medium consistency for the total score and the subitems.

Table 3 Retest reliability of the CSS questionnaire in the Chinese version.

Validity analysis

Construct validity

Factor analysis was performed on the CSS scores of 140 patients to verify the construct validity of the Chinese version of the CSS questionnaire. The Kaiser–Meyer–Olkin value of the Chinese CSS questionnaire was 0.807, and Bartlett's test of sphericity was significant (p < 0.001), indicating that this questionnaire is very suitable for factor analysis.

The result of factor analysis is shown in Table 4. Two components were extracted from the data (eigenvalues > 1), which explained 36.293% and 13.058% of the total data variation, respectively. These two components accounted for 47.245% of the data variation after extraction. Table 5 shows the results of the rotated component matrix analysis. Questions Q1–Q5 and Q7 belonged to the first factor, and Q6 and Q8 belonged to the second factor. The variances for the contributions of all factors ranged from 52.0% to 88.0%. Although the original version did not test construct validity, the results of the Chinese version show that the tool has excellent construct validity.

Confirmatory factor analysis was employed to assess construct validity in three respects: model fit, convergent validity, and discriminant validity. As shown in Supplementary Table 1, all fit indices indicated good model fit. The standardized loading coefficients of each indicator in the model are shown in Supplementary Table 2. The AVE of factor 1 was not greater than 0.4, indicating that convergent validity was not ideal. The composite reliability (CR) was greater than 0.6, meaning that the basic fit of the model was good and that it had high structural validity. For discriminant validity, the square root of AVE was 0.530 for factor 1 and 0.705 for factor 2, both greater than the correlation between the two factors, 0.525, indicating good discriminant validity.

Table 4 Factor analysis of the CSS questionnaire in the Chinese version.
Table 5 Rotated component matrix of the CSS questionnaire in the Chinese version.

Criterion validity

CRADI-8 is a questionnaire that assesses how bothersome defecation symptoms are. The higher the score, the more bothersome the patient's defecation symptoms9. The Spearman correlation coefficient between the total CSS score and the Chinese version of the CRADI-8 was r = 0.491 (p < 0.001), showing that the CSS score was significantly correlated with the CRADI-8 score.

Hypothesis testing

We hypothesized that patients diagnosed with constipation according to the Rome III criteria for functional constipation had a higher overall CSS score than patients without constipation. Patients with constipation according to the Rome III criteria had higher CSS scores than patients without constipation (p < 0.001): the median was 8 (5, 12) in the constipation group and 3 (1, 5) in the group without constipation.

Floor effect and ceiling effect

No patient had a CSS score of 30; the highest score was 21 (1/140, 0.7%), and the lowest was 0 (9/140, 6.4%). The application of CSS in Chinese POP women with defecation symptoms therefore showed no significant floor effect or ceiling effect for the total score. The floor effect and ceiling effect of the subitems are shown in Table 6. Only question 6, "type of assistance to evacuate," showed a ceiling effect. All questions showed a floor effect.

Table 6 The floor/ceiling effect of different items in the CSS questionnaire.

Reactivity

Of the 90 patients who completed the second CSS investigation, 87 (96.7%) completed the telephone follow-up three months after surgery. The CSS and CRADI-8 scores are shown in Table 7. The median CSS score before and after surgery was 6 (3, 10) and 3 (0, 7), respectively, indicating a significant decrease after surgery, consistent with the change in the CRADI-8 score. This result showed that the Chinese version of the CSS questionnaire had excellent reactivity.

Table 7 Comparison of preoperative and postoperative CSS/CRADI-8 scores.

Preliminary application of CSS in assessing posterior pelvic prolapse

Data from 125 patients with stage II or above posterior pelvic prolapse according to POP-Q were used to build the ROC curve, with the Rome III criteria as the reference (see Fig. 1). The AUC was 0.896 (95% CI 0.844–0.948), and the cut-off value was 4, providing a sensitivity of 91.84% and a specificity of 68.42%. The positive predictive value was 65.2%, and the negative predictive value was 92.9%.
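The four diagnostic measures reported above follow from a 2 × 2 table of test result against reference diagnosis. The sketch below shows the arithmetic. The cell counts are not reported in the text: the values used here (45 true positives, 4 false negatives, 52 true negatives, 24 false positives) are an inference, chosen because they sum to 125 and reproduce the reported percentages, and they should be treated as illustrative rather than as the study's data. The function name is hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, PPV and NPV from the cells of a 2 x 2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Inferred counts (an assumption, not reported in the text); they total 125.
m = diagnostic_metrics(tp=45, fn=4, tn=52, fp=24)
for name, value in m.items():
    print(f"{name}: {value * 100:.2f}%")
```

A high negative predictive value means that a score below the cut-off makes functional constipation unlikely in a population with a similar prevalence.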

Figure 1 ROC curve of CSS score in the diagnosis of constipation.

Discussion

The CSS questionnaire, also known as the Wexner questionnaire, is a common tool for examining the severity of constipation in research and clinical practice, and it takes less than 5 min to answer all questions16. In this study, we evaluated the reliability and validity of the Chinese version of the CSS questionnaire. The results showed good internal consistency, excellent test–retest reliability, good structural validity, and good criterion validity. All hypotheses were confirmed. Thus, the translated version can be used as a standard tool in clinical practice and research in Chinese-speaking women with pelvic organ prolapse.

Although the original version of the CSS questionnaire is widely used, it has not been formally validated17. However, the Persian version of the CSS questionnaire has been shown to have good validity and reliability16. The objective of this study was to translate the CSS questionnaire into Chinese and test its reliability and validity among women with POP.

The internal consistency of the Chinese version was good, indicating that all items presented the severity of constipation. The good test–retest reliability guaranteed that the questionnaire results were consistent over time.

As for construct validity, two components were found in factor analysis. The first factor consisted of Q1–Q5 and Q7, which might refer to the severity of constipation. The second factor consisted of Q6 and Q8. According to our unpublished data, the result of the balloon expulsion test was significantly associated with the type of assistance and duration of defecation. These results indicated that there might be an internal connection between these two items. The model was confirmed to have good fit, good discriminant validity, and moderate convergent validity.

The Chinese version of PFDI-20 has been validated and widely used in Chinese-speaking patients with POP9. For this reason, CRADI-8, the third domain of PFDI-20, was chosen as the reference in criterion validity testing. The CSS score was significantly correlated with the CRADI-8 score, indicating good sensitivity in evaluating how bothersome defecation symptoms are. The CSS score decreased significantly after surgery, consistent with the change in the CRADI-8 score, indicating good reactivity of the CSS questionnaire.

The total score of the Chinese version of CSS among patients with POP had no obvious floor effect or ceiling effect. However, at the subitem level, all items had floor effects, especially defecation frequency, abdominal pain, and difficulties. The floor effect was not analyzed for the original English questionnaire, so a direct comparison is not possible8. The floor effects might arise because constipation among the selected POP women was less severe than in other studies, which investigated patients with functional constipation16. This finding was in line with our former study4. The CSS questionnaire might be adjusted in the future to better suit patients with POP.

We also found that the CSS questionnaire had a certain diagnostic value for functional constipation among patients with posterior vaginal prolapse. With the Rome III criteria for functional constipation as the reference, the AUC of CSS for the diagnosis of functional constipation was 0.896 (95% CI 0.844–0.948), and the optimal cut-off value was 4. The CSS questionnaire is suitable as an exclusion test because its negative predictive value was as high as 92.9%.

This study has several strengths. It is the first to test the reliability and validity of the Chinese version of CSS among POP patients, providing a new, reliable method for assessing the severity of defecation symptoms in Chinese POP women. In addition, the prospective design and the use of blinding support the validity of this study.

Nevertheless, this study also has some limitations. First, it used the Rome III criteria to define functional constipation. New features in the Rome IV criteria may have a modest influence on clinical practice, though most of the changes relative to Rome III are minor for functional constipation18. Second, the 90 patients who participated in the retest were scheduled for surgery and were severely bothered by the symptoms of POP, so the conclusions may not extend to patients with mild POP. Finally, the minimal clinically important difference (MCID) of the CSS questionnaire was not determined in our study, although it is necessary when the CSS is used to evaluate treatment effect.

Conclusion

The Chinese version of the CSS questionnaire has good internal consistency, retest reliability, and structural validity. It can be applied in clinical work and scientific studies to evaluate constipation symptoms in POP patients. The cut-off value for diagnosing functional constipation was 4.