The discriminatory capability of existing scores to predict advanced colorectal neoplasia: a prospective colonoscopy study of 5,899 screening participants

We evaluated the performance of seven existing risk scoring systems in predicting advanced colorectal neoplasia in an asymptomatic Chinese cohort. We prospectively recruited 5,899 Chinese subjects aged 50–70 years in a colonoscopy screening programme (2008–2014). The scoring systems evaluated included two scoring tools from the US; one each from Spain, Germany, and Poland; the Korean Colorectal Screening (KCS) score; and the modified Asia Pacific Colorectal Screening (APCS) score. The c-statistic, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of each system were evaluated. The resources required were estimated based on the Number Needed to Screen (NNS) and the Number Needed to Refer for colonoscopy (NNR). Advanced neoplasia was detected in 364 (6.2%) subjects. The German system referred the smallest proportion of subjects (11.2%) for colonoscopy, whilst the KCS scoring system referred the largest (27.4%). The c-statistics of the systems ranged from 0.56 to 0.65, with sensitivities ranging from 0.04 to 0.44 and specificities from 0.74 to 0.99. The modified APCS scoring system had the highest c-statistic (0.65, 95% C.I. 0.58–0.72). The NNS (12–19) and NNR (5–10) were similar among the scoring systems. The existing scoring systems have variable capability to predict advanced neoplasia among asymptomatic Chinese subjects, and further external validation should be performed.

Nevertheless, the discriminatory capability of these tools to predict advanced neoplasia in specific population groups remains unexplored. It has been demonstrated that the incidence and distribution of colorectal neoplasia differ among racial and ethnic groups [27][28][29][30]. The original studies which published these scoring systems called for external validation in other cohorts [10][11][12][13][14][15][16][17]. The population of Greater China was 1.39 billion in 2013, excluding Chinese residents living on other continents. Ethnic Chinese accounted for 20% of the world's population 31, highlighting a need to identify the most suitable tool for risk stratification in this ethnic group.
The objective of this study was to compare the predictive performance of these seven published risk scoring tools, and the resources they require, in detecting advanced neoplasia in a large Chinese population. These findings could inform the predictive capability of the existing scoring systems and the resources required to identify subjects with advanced neoplasia, i.e. those for whom colonoscopy is warranted.

Methods
A bowel cancer screening centre was established in Hong Kong in 2008, which provides free-of-charge CRC screening services for eligible Hong Kong citizens 32,33. The centre is accessible to all Hong Kong residents. Following several media announcements, we recruited subjects for free screening services through registration by telephone, e-mail, fax or walk-in. The present evaluation included all screening participants who received colonoscopy in the study period 2008–2014. The Clinical Research Ethics Committee of the Chinese University of Hong Kong approved the study (protocol CRE-2008.404), and the methods were carried out in accordance with the approved guidelines. All participants provided informed consent before enrolling in the study, and were invited to visit the centre before screening.
Screening Participants. At the centre visit, two independent health educators checked subject eligibility. The eligibility criteria were: (i) age 50–70 years; (ii) the absence of any current or previous CRC symptoms, such as haematochezia, tarry stool, anorexia or change in bowel habit in the past 4 weeks, or unintentional weight loss of greater than 5 kg in the past 6 months; and (iii) not having undergone any CRC screening test in the past 5 years. Exclusion criteria consisted of a personal history of CRC, colorectal adenoma or inflammatory bowel disease, and the presence of medical conditions which were contraindications for colonoscopy, such as cardiopulmonary insufficiency and the use of dual antiplatelet therapy.
Screening Colonoscopy. The nature, benefits and risks of colonoscopy were explained to all study participants before the procedure. We used polyethylene glycol (Klean-Prep®, Helsinn Birex Pharmaceuticals Ltd, Ireland) as the standard bowel preparation regimen for all participants, who were reminded of their colonoscopy appointment before they left the centre. A team of experienced physicians and colorectal surgeons performed all colonoscopy procedures in the endoscopy centres affiliated with the University. The sedation regimen consisted of midazolam 2.5 mg (Groupe Panpharma, France) and meperidine 25 mg (Martindale Pharmaceuticals, United Kingdom). Further doses of these drugs were administered according to the subject's level of discomfort. We used air insufflation and aimed for cecal intubation, with a withdrawal time of ≥ 6 minutes in accordance with the current quality indicators for colonoscopy 34. All colorectal lesions were removed and biopsied as deemed appropriate by the endoscopists. We sent all the biopsied samples to an accredited laboratory for gross and microscopic examination. Advanced neoplasia was defined as CRC or any colorectal adenoma with (1) a diameter of ≥ 10 mm; (2) high-grade dysplasia; or (3) villous or tubulovillous histologic characteristics, or any combination thereof 19. In the presence of multiple lesions, the most advanced characteristic was assigned.
Evaluation of the existing scoring systems: outcome variables and statistical analysis. From a thorough literature review, we identified seven studies which devised and validated scoring systems based on elementary clinical information to predict advanced neoplasia [10][11][12][13][14][15][16][17]. Table 1 summarizes the key features, predictor variables, and computational algorithm of each scoring system. We defined the threshold for colonoscopy referral as the cut-off score at which: (1) the subjects were classified as high or very high risk; or (2) the subjects in the initial development and validation of the scoring system were found to be at a specific risk level which was just higher than that of the whole cohort in the respective study.
For the US Physician Health Survey 11, only male subjects in our cohort were included, since the US survey consisted exclusively of male physicians. For the German system evaluated by Tao and colleagues 13, we only included non-smokers and non-drinkers, as detailed information on pack-years and drinking frequency was not available in our cohort. For the Poland system devised by Kaminski et al. 14, only non-smokers aged 66 years or below among our participants were included. The original APCS scoring system developed by Yeoh and colleagues has been extensively evaluated in the Asia-Pacific countries, and its c-statistic was found to be 0.64 (± 0.04) 15. A modified version of the APCS scoring system has been devised and validated in 7,463 subjects from 11 Asian cities, with a c-statistic of 0.65 (95% C.I. 0.58–0.72) 17. It incorporated body mass index as a predictor variable in addition to age, gender, smoking and family history of CRC. In this study we evaluated the modified APCS system.
The proportion of subjects referred for colonoscopy in our cohort was determined for each scoring system, and each was compared with the modified APCS scoring system using the McNemar test. The accuracy of each prediction strategy in detecting advanced neoplasia was evaluated in terms of sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV). The discriminatory ability of each scoring model was computed and presented as the concordance (c-) statistic. The c-statistic measures the discriminatory power between those with and without advanced neoplasia 35, and is identical to the Area Under the Curve (AUC) for binary logistic regression models. It considers all pairs of subjects in which one has advanced neoplasia and the other does not, and gives the proportion of such pairs in which the model assigns the higher risk to the subject with advanced neoplasia. Similar to the approach adopted by Imperiale et al. 36, a c-statistic of 0.7–0.8 was regarded as good discrimination, whilst a value > 0.8 indicated excellent discrimination. A c-statistic of 0.6–0.7 was regarded as having some clinical value, whereas a c-statistic < 0.6 had no clinical value. We employed the DeLong test to compare the AUCs of the seven systems. Using the modified APCS scoring system as the comparison group, each system was evaluated according to its ability to accurately classify subjects into high vs. low risk groups. This is presented as the Net Reclassification Index (NRI), defined as the sum of differences in proportions of correct reclassification minus incorrect reclassification. A positive NRI indicates more accurate classification of risk for advanced neoplasia by the assessed system than by the modified APCS system, whilst a negative NRI indicates less accurate classification of risk by the assessed system than by the modified APCS system.
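The measures described above can be illustrated with a minimal Python sketch. It is not the analysis code used in this study (which was run in SPSS); the function names and the toy data are hypothetical, score ties in the c-statistic are assumed to count as half-concordant, and the NRI is assumed to follow the usual two-category form, i.e. (proportion of subjects with advanced neoplasia moved up minus moved down) plus (proportion of subjects without advanced neoplasia moved down minus moved up), relative to the reference system.

```python
def accuracy_measures(referred, has_an):
    """Sensitivity, specificity, PPV and NPV of a binary referral decision."""
    pairs = list(zip(referred, has_an))
    tp = sum(1 for r, y in pairs if r and y)
    fp = sum(1 for r, y in pairs if r and not y)
    fn = sum(1 for r, y in pairs if not r and y)
    tn = sum(1 for r, y in pairs if not r and not y)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def c_statistic(scores, has_an):
    """Proportion of (case, non-case) pairs in which the case scores higher.

    Ties are counted as 0.5 (an assumption of this sketch).
    """
    cases = [s for s, y in zip(scores, has_an) if y]
    controls = [s for s, y in zip(scores, has_an) if not y]
    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0
            elif case_score == control_score:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


def two_category_nri(ref_high, new_high, has_an):
    """Two-category NRI of an assessed system against a reference system."""
    up_event = down_event = up_nonevent = down_nonevent = 0
    for r, n, y in zip(ref_high, new_high, has_an):
        moved_up = bool(n) and not bool(r)      # low risk -> high risk
        moved_down = bool(r) and not bool(n)    # high risk -> low risk
        if y:
            up_event += moved_up
            down_event += moved_down
        else:
            up_nonevent += moved_up
            down_nonevent += moved_down
    n_events = sum(1 for y in has_an if y)
    n_nonevents = len(has_an) - n_events
    return ((up_event - down_event) / n_events
            + (down_nonevent - up_nonevent) / n_nonevents)


# Hypothetical toy data: eight subjects, three with advanced neoplasia.
scores = [4, 3, 1, 2, 3, 0, 1, 2]          # risk score from the assessed system
has_an = [1, 1, 0, 0, 0, 0, 1, 0]          # 1 = advanced neoplasia found
referred = [int(s >= 3) for s in scores]   # hypothetical referral cut-off of 3
ref_high = [1, 0, 0, 1, 1, 0, 0, 0]        # high-risk flag from the reference system

print(accuracy_measures(referred, has_an))
print(round(c_statistic(scores, has_an), 3))                   # 0.733
print(round(two_category_nri(ref_high, referred, has_an), 3))  # 0.533
```

In this toy example the assessed system moves one subject with advanced neoplasia up and one subject without it down relative to the reference, giving a positive NRI of 1/3 + 1/5.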
The resources required to detect advanced neoplasia were estimated by the Number Needed to Screen (NNS) and the Number Needed to Refer (NNR) for colonoscopy to detect one advanced neoplasia. The NNS was the total number of subjects in each subgroup divided by the number of subjects referred for colonoscopy and detected as having advanced neoplasia, according to each risk stratification system. The NNR was the number of subjects referred for colonoscopy divided by the number of subjects referred for colonoscopy and detected as having advanced neoplasia. The Statistical Package for Social Sciences version 19.0 was used for all data analysis. P values < 0.05 were considered statistically significant.
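As a worked illustration of these two definitions, the following Python sketch computes the NNS and NNR from three counts. The function names and the counts are hypothetical and are not taken from the study data; the sketch only restates the ratios defined above.

```python
def number_needed_to_screen(n_subgroup, n_referred_with_an):
    """NNS: subjects in the subgroup per advanced neoplasia detected among those referred."""
    if n_referred_with_an == 0:
        raise ValueError("no advanced neoplasia detected among referred subjects")
    return n_subgroup / n_referred_with_an


def number_needed_to_refer(n_referred, n_referred_with_an):
    """NNR: subjects referred for colonoscopy per advanced neoplasia detected."""
    if n_referred_with_an == 0:
        raise ValueError("no advanced neoplasia detected among referred subjects")
    return n_referred / n_referred_with_an


# Hypothetical counts for one scoring system (illustration only).
n_subgroup = 1000          # subjects to whom the score was applied
n_referred = 200           # subjects at or above the referral cut-off
n_referred_with_an = 25    # referred subjects found to have advanced neoplasia

nns = number_needed_to_screen(n_subgroup, n_referred_with_an)
nnr = number_needed_to_refer(n_referred, n_referred_with_an)
print(nns, nnr)  # 40.0 8.0
```

Under these definitions the NNR is the reciprocal of the PPV of the referral rule (here 25/200 = 0.125, so NNR = 8).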

Results
Participant characteristics. Among the 5,899 eligible screening participants, the average age was 57.7 years (SD 4.9) (Table 2). Male subjects comprised 47.1%, and the proportions of smokers and alcohol drinkers were 8.3% and 9.7%, respectively. Of all participants, 1,700 (29.2%) had BMI ≥ 25 kg/m², 847 (14.4%) had a family history of CRC in a first-degree relative, and the proportions of subjects who self-reported diabetes, hypertension, and current NSAID use were 7.6%, 23.2% and 4.7%, respectively. There were 25 (0.4%) CRCs and 339 (5.7%) advanced adenomas. The characteristics of the screening participants are summarized in Table 2.
Colonoscopy resources. The NNS ranged from 12 (95% C.I. 6–21) for the US Physician Health Survey to 19 (95% C.I. 11–30) for the Germany and Poland scores (Table 6). The NNR ranged from 5 (95% C.I. 2–12) for the US Physician Health Survey to 10 (95% C.I. 5–18) for the Spain, Poland, and KCS systems. There were no significant differences in either NNS or NNR among the various scoring systems (Table 6).

Discussion
This study compared the performance of seven existing prediction models in a Chinese population. Of 5,899 asymptomatic subjects, 6.2% had advanced neoplasia. The proportion of screening participants referred for colonoscopy was the highest using the KCS and the Spain scores, and the lowest using the Germany and the US Physician Health Survey criteria. The scoring systems had variable discriminatory ability to predict the risk of advanced neoplasia (c-statistics ranged from 0.56 to 0.65). The modified APCS score seemed a preferable system for classifying high-risk subjects, as it had the highest c-statistic. These findings imply that prediction tools for advanced neoplasia may need further external validation to evaluate their generalizability.
The modified APCS scoring system demonstrated a higher discriminatory ability to detect advanced neoplasia and improvements in risk prediction compared with two other tools. A relatively high proportion of Chinese subjects was represented in the cohort used to devise and validate this score (5,795 out of 7,463; 77.6%), whilst the other subjects were recruited in Korea (4.0%), Malaysia (5.8%), the Philippines (0.7%), Singapore (0.9%), Thailand (4.2%), Japan (5.3%), Brunei (1.1%) and Pakistan (0.4%). Also, BMI was included as a predictive variable, a parameter missing from the original APCS score 15. In addition, despite the KCS scoring system having the lowest c-statistic 16, its NNS and NNR were similar to those of the other scores. It might be that its difference in c-statistic from the other systems was too small to produce a clinically significant difference in NNS/NNR. Future studies are needed to explore further room for improving the discriminatory capability of the KCS scoring system.
This is the first large-scale study which evaluated the predictive ability and colonoscopy resources required when the existing prediction tools were applied in a Chinese population. The evaluation is unique in that we did not evaluate these scoring systems by their concordance statistics alone. The approach of relying solely on discrimination measures has been criticized, since calibration (i.e. the agreement between predicted and observed risk) is also a crucial aspect of model performance 37,38. Most importantly, a comparison of c-statistics lacks an apparent clinical interpretation, such as how patient classification would improve with the inclusion or exclusion of other predictors in risk scoring systems. Reclassification has recently become a popular approach for comparing improvement among risk scoring systems for diagnosing common diseases [39][40][41].
A model is considered better when individuals who have subsequently developed the disease and those who have not developed the disease are reclassified to a higher risk category and to a lower risk category, respectively.
Table 3. Individuals referred for colonoscopy according to each scoring system.
This study highlighted a need to modify existing tools with c-statistics < 0.60 for risk-stratifying subjects for colonoscopy screening. This is clinically important, as identification of advanced neoplasia enables secondary prevention by polypectomy 19,42. The estimation of individual risk for advanced neoplasia may facilitate an informed, shared decision-making process about screening as part of patient-centred care 43.
Nevertheless, there are some limitations which should be addressed. Firstly, we invited screening participants who were self-referred following media announcements. They may differ from the general public, so our subjects are more representative of individuals who volunteer as screening participants. The self-selected population included in the present cohort had a low prevalence of smoking and alcohol drinking, and a high proportion of them had family members with CRC, which may reflect greater health-consciousness when compared with the general population. One may also anticipate that a revised methodology to recruit subjects using a population-based, random sampling approach may meet with a high refusal rate. Secondly, the German and Poland systems 13 [...] vegetable, pickled food, fried food, and white meat intake. Detailed information on these food items was not collected in our cohort. Another system developed by Law and colleagues in Malaysia 45 was mainly reserved for symptomatic patients, and none of the subjects in our study was categorized as high risk, as they were all asymptomatic. Other scoring systems required detailed information on family history, physical activity, leisure time vigorous activity, dietary intake, the use of hormone replacement therapy, and folate consumption [46][47][48][49][50].
Critics might argue that exclusion of high-risk subjects (smokers and drinkers) and low-risk subjects (women) might lead to altered estimates of the discriminatory ability of these three scoring systems. Furthermore, the choice of cut-off value for each prediction system was based on the recommendation from the respective original article. It remains to be explored whether the addition of more variables could further improve the scoring systems, such as lifestyle measures (like dietary intake, smoking and alcohol drinking) and medical conditions known to be risk factors for advanced neoplasia (like diabetes and central obesity measured by waist circumference). Besides, some variables like age and BMI could be analyzed as continuous variables, which might enhance the concordance statistics. Finally, these risk models may by nature apply to lifetime or longer-term prediction of risk for advanced colorectal neoplasia (ACN), and the present cross-sectional study would benefit from further prospective colonoscopy follow-up for additional risk score validation.
In summary, these findings suggest that, in the absence of newer prediction tools, the modified APCS system could be useful to risk-stratify ethnic Chinese subjects. The formulation and implementation of a higher-performing scoring system may optimize the efficiency of screening resources and prioritize high-risk subjects for colonoscopy. Future studies may evaluate the performance of these scoring systems when the same cut-off values for absolute risk of ACN are applied, and focus on devising scores using additional variables.