Introduction

Colorectal cancer (CRC) is the third most common cancer in the world1, accounting for 10% of all malignancies and 8% of all cancer deaths in 2012. Its incidence is rising rapidly in both Western and Asia Pacific countries1,2. Screening colonoscopy has been shown to reduce CRC mortality by 68%3,4. Both stool-based occult blood tests and colonoscopy are recommended as primary CRC screening modalities5,6, and the updated Asia Pacific Consensus Recommendations suggest that colonoscopy is the preferred choice for individuals at increased risk7.

However, limited colonoscopy resources have been widely recognized as a barrier to CRC screening8,9. Risk prediction models could therefore be used to prioritize high-risk subjects for colonoscopy, making more efficient use of screening resources. In the past decade, a number of risk scoring systems have been developed and validated in subjects from various countries and regions10,11,12,13,14,15,16,17. These include two scores derived from US residents10 and US physicians11, respectively; one each from Spain12, Germany13 and Poland14; the Asia Pacific Colorectal Screening (APCS) score15; the Korean Colorectal Screening (KCS) score16; and the modified APCS score17. These studies prospectively recruited asymptomatic CRC screening participants, identified significant risk factors in a derivation cohort and evaluated performance in a validation cohort. The systems combine well-defined risk factors for CRC as predictors, including age12,18,19, sex12,19,20, family history of CRC12,21, smoking22,23, body mass index (BMI)19,24, dietary factors23,25 and long-term use of non-steroidal anti-inflammatory drugs (NSAIDs)23,26.

Nevertheless, the ability of these tools to discriminate advanced neoplasia in specific population groups remains unexplored. The incidence and distribution of colorectal neoplasia have been shown to differ among racial and ethnic groups27,28,29,30, and the original studies that published these scoring systems called for external validation in other cohorts10,11,12,13,14,15,16,17. The population of Greater China was 1.39 billion in 2013, not counting ethnic Chinese living elsewhere in the world, and ethnic Chinese account for 20% of the world's population31. This highlights the need to identify the most suitable risk stratification tool for this ethnic group.

The objective of this study is to compare the predictive performance of these seven published risk scoring tools, and the colonoscopy resources they require, in detecting advanced neoplasia in a large Chinese population. The findings could show how well the existing scoring systems predict risk and what resources are needed to identify subjects with advanced neoplasia, i.e. those for whom colonoscopy is warranted.

Methods

A bowel cancer screening centre was established in Hong Kong in 2008 to provide free CRC screening for eligible Hong Kong citizens32,33. The centre is accessible to all Hong Kong residents. Following several media announcements, subjects registered for free screening by telephone, e-mail, fax or walk-in. The present evaluation included all screening participants who underwent colonoscopy during the study period (2008–2014). The Clinical Research Ethics Committee of the Chinese University of Hong Kong approved the study (protocol CRE-2008.404), and the methods were carried out in accordance with the approved guidelines. All participants provided informed consent before enrolling in the study and were invited to visit the centre before screening.

Screening Participants

At the centre visit, two independent health educators checked each subject's eligibility. The inclusion criteria were: (i) age 50–70 years; (ii) absence of any current or previous CRC symptoms, such as haematochezia, tarry stool, anorexia or change in bowel habit in the past 4 weeks, or unintentional weight loss of more than 5 kg in the past 6 months; and (iii) no CRC screening test in the past 5 years. Exclusion criteria were a personal history of CRC, colorectal adenoma or inflammatory bowel disease, and medical conditions contraindicating colonoscopy, such as cardiopulmonary insufficiency or the use of dual antiplatelet therapy.
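
The inclusion and exclusion rules above can be written as a single predicate. This is an illustrative sketch only: the `Candidate` record and all of its field names are hypothetical and do not describe the study's actual data collection forms.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical record layout; field names are illustrative only.
    age: int
    has_recent_crc_symptoms: bool      # haematochezia, tarry stool, anorexia or change in bowel habit (past 4 weeks)
    weight_loss_kg_6m: float           # unintentional weight loss over the past 6 months
    screened_in_past_5y: bool          # any CRC screening test in the past 5 years
    history_crc: bool
    history_adenoma: bool
    history_ibd: bool
    colonoscopy_contraindicated: bool  # e.g. cardiopulmonary insufficiency, dual antiplatelet therapy


def is_eligible(c: Candidate) -> bool:
    """Apply the inclusion criteria, then the exclusion criteria."""
    included = (
        50 <= c.age <= 70
        and not c.has_recent_crc_symptoms
        and c.weight_loss_kg_6m <= 5  # only loss of MORE than 5 kg disqualifies
        and not c.screened_in_past_5y
    )
    excluded = (
        c.history_crc
        or c.history_adenoma
        or c.history_ibd
        or c.colonoscopy_contraindicated
    )
    return included and not excluded
```

Keeping inclusion and exclusion as two named conditions mirrors the way the criteria are stated in the text, which makes it easier to audit each rule separately.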

Screening Colonoscopy

The nature, benefits and risks of colonoscopy were explained to all participants before the procedure. Polyethylene glycol (Klean-Prep®, Helsinn Birex Pharmaceuticals Ltd, Ireland) was used as the standard bowel preparation regimen for all participants, who were reminded to attend colonoscopy before they left the centre. A team of experienced physicians and colorectal surgeons performed all colonoscopies in the endoscopy centres affiliated with the University. The sedation regimen comprised midazolam 2.5 mg (Groupe Panpharma, France) and meperidine 25 mg (Martindale Pharmaceuticals, United Kingdom), with further doses administered according to the subject's level of discomfort. We used air insufflation and aimed for cecal intubation with a withdrawal time of ≥6 minutes, in accordance with current quality indicators for colonoscopy34. All colorectal lesions were removed and biopsied as deemed appropriate by the endoscopists, and all biopsy samples were sent to an accredited laboratory for gross and microscopic examination. Advanced neoplasia was defined as CRC or any colorectal adenoma with (1) a diameter of ≥10 mm, (2) high-grade dysplasia, or (3) villous or tubulovillous histology, or any combination thereof19. In the presence of multiple lesions, the most advanced finding was assigned.
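
The definition of advanced neoplasia, together with the rule of assigning the most advanced finding when a subject has several lesions, can be sketched as follows. The `Lesion` fields and the histology labels are hypothetical stand-ins for the pathology report.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Lesion:
    # Hypothetical per-lesion pathology record.
    is_cancer: bool
    size_mm: float
    high_grade_dysplasia: bool
    histology: str  # e.g. "tubular", "tubulovillous", "villous", "hyperplastic"
    is_adenoma: bool


def is_advanced(lesion: Lesion) -> bool:
    """CRC, or an adenoma with at least one advanced feature."""
    if lesion.is_cancer:
        return True
    if not lesion.is_adenoma:
        return False  # e.g. a large hyperplastic polyp is not advanced neoplasia
    return (
        lesion.size_mm >= 10
        or lesion.high_grade_dysplasia
        or lesion.histology in {"villous", "tubulovillous"}
    )


def subject_category(lesions: Iterable[Lesion]) -> str:
    """Assign the most advanced finding among all of a subject's lesions."""
    found = list(lesions)
    if any(x.is_cancer for x in found):
        return "CRC"
    if any(is_advanced(x) for x in found):
        return "advanced adenoma"
    return "no advanced neoplasia"
```

A subject counts as having advanced neoplasia when `subject_category` returns either "CRC" or "advanced adenoma".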

Evaluation of the existing scoring systems: outcome variables and statistical analysis

From a thorough literature review, we identified seven studies that devised and validated scoring systems based on elementary clinical information to predict advanced neoplasia10,11,12,13,14,15,16,17. Table 1 summarizes the key features, predictor variables and computational algorithm of each scoring system. The threshold for colonoscopy referral was defined as the cut-off score at which either (1) subjects were classified as high or very high risk, or (2) subjects in the original development and validation of the scoring system had a risk level just above that of the whole cohort in the respective study.

Table 1 Existing scoring systems for risk prediction of advanced neoplasia.

For the US Physician Health Survey score11, only male subjects in our cohort were included, since the US survey consisted exclusively of male physicians. For the Germany system evaluated by Tao and colleagues13, we included only non-smokers and non-drinkers, as detailed information on pack-years and drinking frequency was not available in our cohort. For the Poland system devised by Kaminski et al.14, only our participants who were non-smokers aged 66 years or below were included. The original APCS scoring system developed by Yeoh and colleagues has been extensively evaluated in Asia-Pacific countries, with a c-statistic of 0.64 (±0.04)15. A modified version of the APCS scoring system has been devised and validated in 7,463 subjects from 11 Asian cities, with a c-statistic of 0.65 (95% C.I. 0.58–0.72)17. It incorporates BMI as a predictor in addition to age, sex, smoking and family history of CRC. In this study we evaluated the modified APCS system.
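
The cohort restrictions above amount to a filter applied before a score is computed. In this minimal sketch the dictionary keys and the `Participant` fields are hypothetical labels, and systems without an entry are assumed to apply to the whole cohort.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Participant:
    # Hypothetical fields; only those needed for the restrictions below.
    age: int
    male: bool
    smoker: bool
    drinker: bool


# Subgroup each restricted system was applied to, following the text above.
RESTRICTIONS: Dict[str, Callable[[Participant], bool]] = {
    "US Physician Health Survey": lambda p: p.male,
    "Germany": lambda p: not p.smoker and not p.drinker,
    "Poland": lambda p: not p.smoker and p.age <= 66,
}


def eligible_subgroup(system: str, cohort: List[Participant]) -> List[Participant]:
    """Return the participants to whom `system` can be applied (everyone if unrestricted)."""
    keep = RESTRICTIONS.get(system, lambda p: True)
    return [p for p in cohort if keep(p)]
```

Every downstream quantity (referral proportion, sensitivity, NNS) for a restricted system is then computed within the returned subgroup rather than the full cohort.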

The proportion of subjects in our cohort referred for colonoscopy under each scoring system was determined and compared with that under the modified APCS scoring system using the McNemar test. The accuracy of each prediction strategy in detecting advanced neoplasia was evaluated in terms of sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV).

The discriminatory ability of each scoring model was expressed as the concordance (c-) statistic35, which is equivalent to the area under the receiver operating characteristic curve (AUC) for a binary outcome. The c-statistic considers all pairs consisting of one subject with and one subject without advanced neoplasia, and gives the proportion of such pairs in which the model assigns the higher risk to the subject with advanced neoplasia (tied pairs conventionally count as one half). Following the approach of Imperiale et al.36, a c-statistic of 0.7–0.8 was regarded as good discrimination and a value >0.8 as excellent discrimination; a c-statistic of 0.6–0.7 was considered to have some clinical value, whereas a c-statistic <0.6 had no clinical value. We used the DeLong test to compare the AUCs of the seven systems.

Using the modified APCS scoring system as the comparator, each system was evaluated for its ability to classify subjects correctly into high- vs. low-risk groups. This is presented as the Net Reclassification Index (NRI): among subjects with advanced neoplasia, the proportion moved to the high-risk group minus the proportion moved to the low-risk group, plus, among subjects without advanced neoplasia, the proportion moved to the low-risk group minus the proportion moved to the high-risk group. A positive NRI indicates that the assessed system classifies the risk of advanced neoplasia more accurately than the modified APCS system, whereas a negative NRI indicates less accurate classification.
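
The accuracy measures and the c-statistic described above can be computed directly from per-subject data. This plain-Python sketch is for illustration and is not the SPSS procedure used in the study; `referred` (the referral decision at the chosen cut-off), `scores` and `has_an` (advanced neoplasia found at colonoscopy) are hypothetical inputs. Ties are counted as one half, which matters because point-based scores produce many tied pairs.

```python
from typing import Dict, Sequence


def accuracy(referred: Sequence[bool], has_an: Sequence[bool]) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV of a refer/do-not-refer rule.

    Assumes every denominator is non-zero.
    """
    pairs = list(zip(referred, has_an))
    tp = sum(r and d for r, d in pairs)
    fp = sum(r and not d for r, d in pairs)
    fn = sum(not r and d for r, d in pairs)
    tn = sum(not r and not d for r, d in pairs)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def c_statistic(scores: Sequence[float], has_an: Sequence[bool]) -> float:
    """Share of (case, non-case) pairs in which the case scores higher; ties count 1/2."""
    cases = [s for s, d in zip(scores, has_an) if d]
    controls = [s for s, d in zip(scores, has_an) if not d]
    concordant = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                concordant += 1.0
            elif x == y:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


def interpret(c: float) -> str:
    """Bands used in the text, following Imperiale et al."""
    if c > 0.8:
        return "excellent"
    if c >= 0.7:
        return "good"
    if c >= 0.6:
        return "some clinical value"
    return "no clinical value"
```

The brute-force pair loop costs one comparison per case/non-case pair, which is manageable for a cohort of a few thousand subjects; a rank-based formula would scale better for larger data.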

The resources required to detect advanced neoplasia were estimated by the Number Needed to Screen (NNS) and the Number Needed to Refer (NNR) for colonoscopy to detect one advanced neoplasia. For each risk stratification system, the NNS was the total number of subjects in the cohort or subgroup to which the system was applied, divided by the number of subjects who were referred for colonoscopy and found to have advanced neoplasia. The NNR was the number of subjects referred for colonoscopy divided by the number of referred subjects found to have advanced neoplasia. All analyses were performed with the Statistical Package for the Social Sciences (SPSS) version 19.0. P values <0.05 were considered statistically significant.
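
The two resource measures share a denominator (referred subjects with advanced neoplasia), as this sketch shows. The input names are hypothetical, and at least one referred subject is assumed to have advanced neoplasia.

```python
from typing import Sequence, Tuple


def nns_nnr(referred: Sequence[bool], has_an: Sequence[bool]) -> Tuple[float, float]:
    """Number Needed to Screen and Number Needed to Refer, as defined in the text."""
    n_total = len(referred)
    n_referred = sum(referred)
    # Denominator of both measures: referred AND found to have advanced neoplasia.
    n_detected = sum(r and d for r, d in zip(referred, has_an))
    nns = n_total / n_detected
    nnr = n_referred / n_detected  # equals 1 / PPV of the referral rule
    return nns, nnr
```

Because the NNR is the reciprocal of the PPV, the PPVs of 0.10–0.19 reported in the Results correspond to NNRs of roughly 10 down to 5.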

Results

Participant characteristics

Among the 5,899 eligible screening participants, the mean age was 57.7 years (SD 4.9). Males accounted for 47.1%, and the proportions of smokers and alcohol drinkers were 8.3% and 9.7%, respectively. Of all participants, 1,700 (29.2%) had BMI ≥ 25 kg/m² and 847 (14.4%) had a family history of CRC in a first-degree relative; 7.6%, 23.2% and 4.7% self-reported diabetes, hypertension and current NSAID use, respectively. There were 25 (0.4%) CRCs and 339 (5.7%) advanced adenomas. The characteristics of the screening participants are summarized in Table 2.

Table 2 Characteristics of individuals included in the analysis (N = 5,899).

Proportions of colonoscopy referral

The proportion of subjects referred for colonoscopy was highest with the KCS score (27.4%, 95% C.I. 26.3%–28.6%), followed by the Spain score (25.5%, 95% C.I. 24.4%–26.7%) and the modified APCS score (21.4%, 95% C.I. 20.3%–22.5%). The referral proportion under each scoring system differed significantly from that under the modified APCS score (all p < 0.001) (Table 3).
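
Referral proportions with confidence intervals, and the paired McNemar comparison against the modified APCS score, can be sketched as below. The article does not state which interval method or which McNemar variant was used, so the normal-approximation (Wald) interval and the continuity-corrected chi-square here are assumptions; the inputs are hypothetical per-subject referral decisions from two systems applied to the same subjects.

```python
import math
from typing import Sequence, Tuple


def proportion_ci(k: int, n: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Proportion with a normal-approximation (Wald) 95% confidence interval."""
    p = k / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p, p - half, p + half


def mcnemar(a: Sequence[bool], b: Sequence[bool]) -> Tuple[float, float]:
    """Continuity-corrected McNemar chi-square and p-value for paired referral decisions."""
    only_a = sum(x and not y for x, y in zip(a, b))  # referred by system a only
    only_b = sum(y and not x for x, y in zip(a, b))  # referred by system b only
    if only_a + only_b == 0:
        return 0.0, 1.0  # no discordant pairs: nothing to test
    chi2 = max(abs(only_a - only_b) - 1, 0) ** 2 / (only_a + only_b)
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p
```

Only the discordant pairs enter the McNemar statistic, which is why it suits two referral rules applied to the same participants. As a rough check, a proportion of 27.4% among 5,899 subjects gives a Wald interval of about 26.3% to 28.5%, close to the interval reported for the KCS score.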

Table 3 Individuals referred for colonoscopy according to each scoring system.

Performance of the scoring systems

The c-statistics of the scoring systems ranged from 0.56 (95% C.I. 0.48–0.64) for the Spain system to 0.65 (95% C.I. 0.58–0.72) for the modified APCS system (Table 4). Sensitivity ranged from 0.04 to 0.44, and specificity was moderate to high (0.74–0.99). All systems had low PPVs (0.10–0.19) and high NPVs (0.92–0.96). The DeLong tests showed no statistically significant differences among the AUCs of the six scores other than the modified APCS score.
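
For two AUCs computed on the same subjects, the DeLong test estimates the variance of their difference from the per-subject structural components of the Mann–Whitney statistic. The sketch below is a direct implementation of that method for two scores, not the authors' code; the input names are hypothetical, and at least two subjects with and two without advanced neoplasia are assumed.

```python
import math
from typing import List, Sequence, Tuple


def _psi(x: float, y: float) -> float:
    # Mann-Whitney kernel: 1 if the case scores higher, 1/2 on ties, 0 otherwise.
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _components(scores: Sequence[float], has_an: Sequence[bool]) -> Tuple[float, List[float], List[float]]:
    cases = [s for s, d in zip(scores, has_an) if d]
    controls = [s for s, d in zip(scores, has_an) if not d]
    v10 = [sum(_psi(x, y) for y in controls) / len(controls) for x in cases]
    v01 = [sum(_psi(x, y) for x in cases) / len(cases) for y in controls]
    return sum(v10) / len(v10), v10, v01  # AUC is the mean of either component list


def _cov(a: List[float], b: List[float]) -> float:
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(s1: Sequence[float], s2: Sequence[float], has_an: Sequence[bool]) -> Tuple[float, float, float]:
    """Return (AUC of s1, AUC of s2, two-sided p-value for their difference)."""
    auc1, v10_1, v01_1 = _components(s1, has_an)
    auc2, v10_2, v01_2 = _components(s2, has_an)
    m, n = len(v10_1), len(v01_1)
    var = (
        (_cov(v10_1, v10_1) + _cov(v10_2, v10_2) - 2 * _cov(v10_1, v10_2)) / m
        + (_cov(v01_1, v01_1) + _cov(v01_2, v01_2) - 2 * _cov(v01_1, v01_2)) / n
    )
    if var <= 0:
        return auc1, auc2, 1.0  # degenerate case: no variability to test against
    z = (auc1 - auc2) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return auc1, auc2, p
```

The covariance terms are what make this a paired test: scores that rank the same subjects similarly have correlated components, which shrinks the variance of the difference.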

Table 4 Performance characteristics of the various scoring systems.

Using the modified APCS scoring system as the comparator, the NRIs of the Spain (−3.3%, 95% C.I. −8.0% to 1.4%), Germany (−0.5%, 95% C.I. −6.4% to 5.5%), Poland (1.2%, 95% C.I. −2.7% to 5.2%) and KCS (−2.7%, 95% C.I. −7.2% to 1.9%) systems were not statistically significant (all p > 0.05), indicating classification accuracy similar to that of the modified APCS score (Table 5). The US-Seattle (−8.1%, 95% C.I. −13.1% to −3.1%, p = 0.001) and US Physician Health Survey (−20.4%, 95% C.I. −27.2% to −13.6%, p < 0.001) systems classified advanced neoplasia less accurately than the modified APCS score (Table 5).
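
With only two risk categories (referred vs. not referred), the NRI defined in the Methods equals the change in sensitivity plus the change in specificity. The sketch below computes it together with the asymptotic z-test proposed by Pencina and colleagues; whether the authors used this variance estimate for their confidence intervals is not stated, so it is an assumption here, and the input names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def binary_nri(new: Sequence[bool], ref: Sequence[bool], has_an: Sequence[bool]) -> Tuple[float, float]:
    """Two-category NRI of `new` vs. `ref` (True = high risk/referred) and a two-sided p-value."""
    events = [(nw, rf) for nw, rf, d in zip(new, ref, has_an) if d]
    nonevents = [(nw, rf) for nw, rf, d in zip(new, ref, has_an) if not d]
    # "Up" = high risk under the new system but low risk under the comparator.
    up_e = sum(nw and not rf for nw, rf in events) / len(events)
    down_e = sum(rf and not nw for nw, rf in events) / len(events)
    up_ne = sum(nw and not rf for nw, rf in nonevents) / len(nonevents)
    down_ne = sum(rf and not nw for nw, rf in nonevents) / len(nonevents)
    # Correct moves: cases up, non-cases down; incorrect moves are subtracted.
    nri = (up_e - down_e) + (down_ne - up_ne)
    # Asymptotic standard error under the null hypothesis (Pencina et al.).
    se = math.sqrt((up_e + down_e) / len(events) + (up_ne + down_ne) / len(nonevents))
    p = math.erfc(abs(nri / se) / math.sqrt(2)) if se > 0 else 1.0
    return nri, p
```

Passing the Spain referral decisions as `new` and the modified APCS decisions as `ref`, for example, would give the Spain NRI; a negative value means the Spain score moves more subjects in the wrong direction than in the right one.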

Table 5 Reclassification performance of each risk scoring system.

Colonoscopy resources

The NNS ranged from 12 (95% C.I. 6–21) for the US Physician Health Survey to 19 (95% C.I. 11–30) for the Germany and Poland scores (Table 6). The NNR ranged from 5 (95% C.I. 2–12) for the US Physician Health Survey to 10 (95% C.I. 5–18) for the Spain, Poland and KCS systems. Neither the NNS nor the NNR differed significantly among the scoring systems (Table 6).

Table 6 Colonoscopy resources required for each risk scoring system.

Discussion

This study compared the performance of seven existing prediction models in a Chinese population. Of 5,899 asymptomatic subjects, 6.1% had advanced neoplasia. The proportion of participants referred for colonoscopy was highest with the KCS and Spain scores and lowest with the Germany and US Physician Health Survey criteria. The scoring systems varied in their ability to discriminate advanced neoplasia (c-statistics 0.56–0.65). The modified APCS score appeared to be the preferable system for identifying high-risk subjects, as it had the highest c-statistic. These findings imply that prediction tools for advanced neoplasia may need further external validation to establish their generalizability.

The modified APCS scoring system showed a higher discriminatory ability for advanced neoplasia, and better risk classification than two other tools. Chinese subjects made up a relatively high proportion of the cohort used to develop this score (5,795 of 7,463; 77.6%), with the remainder recruited in Korea (4.0%), Malaysia (5.8%), the Philippines (0.7%), Singapore (0.9%), Thailand (4.2%), Japan (5.3%), Brunei (1.1%) and Pakistan (0.4%). The score also includes BMI as a predictor, a parameter missing from the original APCS score15. In addition, despite the lowest c-statistic of the KCS scoring system16, its NNS and NNR were similar to those of the other scores. The differences in c-statistics between systems may have been too small to produce a clinically significant difference in NNS or NNR. Future studies are needed to explore how the discriminatory capability of the KCS scoring system could be improved.

This is the first large-scale study to evaluate the predictive ability of existing prediction tools, and the colonoscopy resources they require, in a Chinese population. The evaluation is distinctive in that we did not rely solely on concordance statistics. Relying solely on discrimination measures has been criticized, since calibration (i.e. agreement between predicted and observed risk) is also a crucial aspect of model performance37,38. Most importantly, a comparison of c-statistics lacks a clear clinical interpretation, such as how patient classification would change if predictors were added to or removed from a risk scoring system. Reclassification has recently become a popular approach for comparing risk scoring systems for common diseases39,40,41. A model is considered better when individuals who subsequently develop the disease are reclassified into a higher risk category and those who do not are reclassified into a lower risk category.

This study highlights a need to modify existing tools with c-statistics <0.60 before using them to risk-stratify subjects for colonoscopy screening. This is clinically important because identification of advanced neoplasia enables secondary prevention by polypectomy19,42. Estimating individual risk of advanced neoplasia may also facilitate informed, shared decision-making about screening as part of patient-centred care43.

Nevertheless, this study has some limitations. Firstly, our screening participants were self-referred following media announcements and may differ from the general public; our subjects are therefore more representative of people who volunteer for screening. This self-selected cohort had a low prevalence of smoking and alcohol drinking and a high proportion of subjects with family members affected by CRC, which may reflect greater health-consciousness than in the general population. On the other hand, recruiting subjects by population-based random sampling would likely meet with a high refusal rate. Secondly, some of our subjects had to be excluded when applying the Germany and Poland systems13,14, because data on pack-years among smokers and on drinking frequency were not collected in our cohort; this might have introduced bias. The original Kaminski score14 was also developed in screening subjects aged 40–66 years, which limited the number of our subjects who could be included in its external validation. Thirdly, we were unable to validate some scoring systems. These include the prediction model constructed by Cai and colleagues in China, which showed better discrimination in previous evaluations44; four of its eight parameters require dietary recall of green vegetable, pickled food, fried food and white meat intake, which was not collected in our cohort. Another system, developed by Law and colleagues in Malaysia45, was designed mainly for symptomatic patients, and none of our subjects was categorized as high risk by it because all were asymptomatic. Other scoring systems required detailed information on family history, physical activity, leisure-time vigorous activity, dietary intake, the use of hormone replacement therapy and folate consumption46,47,48,49,50. Critics might argue that excluding high-risk subjects (smokers and drinkers) and low-risk subjects (women) could have altered the estimated discriminatory ability of the three scoring systems applied to restricted subgroups. Furthermore, the cut-off value for each prediction system followed the recommendation of the respective original article. It remains to be explored whether additional variables could further improve the scoring systems, such as lifestyle measures (e.g. dietary intake, smoking and alcohol drinking) and medical conditions known to be risk factors for advanced neoplasia (e.g. diabetes, and central obesity measured by waist circumference). Variables such as age and BMI could also be analyzed as continuous variables, which might improve the concordance statistics. Finally, these risk models may be intended for lifetime or longer-term prediction of the risk of advanced neoplasia, and the cross-sectional design of the present study would benefit from prospective colonoscopy follow-up to further validate the risk scores.

In summary, these findings suggest that, in the absence of newer prediction tools, the modified APCS system could be useful for risk-stratifying ethnic Chinese subjects. Developing and implementing a higher-performing scoring system may optimize the use of screening resources and prioritize high-risk subjects for colonoscopy. Future studies may evaluate the performance of these scoring systems when the same cut-off for absolute risk of advanced neoplasia is applied, and aim to devise scores that incorporate additional variables.

Additional Information

How to cite this article: Wong, M. C. S. et al. The discriminatory capability of existing scores to predict advanced colorectal neoplasia: a prospective colonoscopy study of 5,899 screening participants. Sci. Rep. 6, 20080; doi: 10.1038/srep20080 (2016).