Validation of a risk prediction score for proximal neoplasia in colorectal cancer screening: a prospective colonoscopy study

This study developed a clinical scoring system to predict the risk of proximal neoplasia (PN) among screening participants for colorectal cancer. We recruited 5,789 asymptomatic Chinese screening participants who received colonoscopy in Hong Kong (2008–2014). From a random sample of 2,000 participants, the independent risk factors for PN were evaluated using binary logistic regression analysis. The odds ratios for significant risk factors were used to develop a scoring system, with scores stratified into ‘average risk’ (AR): 0–2 and ‘high risk’ (HR): 3–5. The other 3,789 subjects formed an independent validation cohort. Each participant received a score calculated from their risk factors, and the performance of the scoring system was evaluated. The proportion of PN in the derivation and validation cohorts was 12.6% and 12.9%, respectively. Based on age, gender, family history, body mass index and self-reported ischaemic heart disease, 85.0% and 15.0% of the validation cohort were classified as AR and HR, respectively. Their prevalence of PN was 12.0% and 18.1%, respectively. Participants in the HR group had a 1.51-fold (95% CI = 1.24–1.84, p < 0.001) higher risk of PN than the AR group. The overall c-statistic of the prediction model was 0.71 (0.02). The scoring system is useful in predicting the risk of PN to prioritize patients for colonoscopy.

for screening participants in Asian countries. We have previously evaluated the factors associated with PN and distal neoplasia (DN), and assessed the use of criteria from the US, UK, Italy, Norway and Hong Kong to predict PN among asymptomatic screening participants 20. All of the existing scoring systems were reported to have limited predictive and discriminative ability. There is an urgent need to devise a validated risk prediction system for the Asian population.
The primary objective of this study was to develop and validate a clinical risk stratification score predicting the risk of PN among asymptomatic subjects. We aimed to construct a simple tool for physicians so that individual risk could be easily computed to inform the choice of screening options. Our secondary objective was to validate a similar system with the synchronous presence of PN and DN as the outcome of interest.

Methods
The study setting has been described elsewhere [21][22][23][24][25][26]. In short, a community CRC screening centre was established in Hong Kong in 2008. Through several territory-wide media invitations, it offered free CRC screening to all eligible Hong Kong residents. The study was approved by the Clinical Research Ethics Committee of the Chinese University of Hong Kong (protocol CRE-2008.404). The methods were carried out in accordance with the approved guidelines. All participants provided informed consent for the study.
Study Participants. The screening participants voluntarily enrolled in the programme via online application, telephone, e-mail, fax or walk-in. The inclusion criteria were: (i) age between 50 and 70 years; (ii) the absence of current or previous symptoms suggestive of CRC, such as per rectal bleeding, tarry stool, anorexia or a change in bowel habit in the past 4 weeks, or a weight loss of greater than 5 kg in the past 6 months; and (iii) not having received any CRC screening tests in the past 5 years. The exclusion criteria were: (i) a personal history of CRC, colonic adenoma, diverticular disease, inflammatory bowel disease, prosthetic heart valve or vascular graft surgery; and (ii) medical conditions which were contraindications for colonoscopy, such as cardiopulmonary insufficiency and the use of dual antiplatelet agents. The eligibility of each participant was checked by trained staff in the centre. Each eligible participant completed a self-administered questionnaire, which collected information on age, gender, family history of CRC, smoking status, drinking habits, past medical history, and chronic use of medications. The completeness of the questionnaires was checked by trained personnel. For participants with limited literacy, trained volunteers assisted with survey completion.
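The inclusion and exclusion criteria above can be read as a simple rule set. The following Python sketch is illustrative only: the `Applicant` record and its field names are hypothetical, each clinical criterion is collapsed into a single yes/no flag, and the age range is assumed to be inclusive at both ends.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    """Hypothetical screening record; field names are illustrative only."""
    age: int
    has_crc_symptoms: bool            # e.g. rectal bleeding, tarry stool, weight loss > 5 kg
    screened_in_past_5_years: bool    # any CRC screening test in the past 5 years
    has_excluding_history: bool       # e.g. CRC, adenoma, IBD, prosthetic heart valve
    colonoscopy_contraindicated: bool


def is_eligible(a: Applicant) -> bool:
    """Return True when all inclusion criteria hold and no exclusion applies."""
    age_ok = 50 <= a.age <= 70        # assumption: bounds are inclusive
    included = age_ok and not a.has_crc_symptoms and not a.screened_in_past_5_years
    excluded = a.has_excluding_history or a.colonoscopy_contraindicated
    return included and not excluded


example = Applicant(age=57, has_crc_symptoms=False, screened_in_past_5_years=False,
                    has_excluding_history=False, colonoscopy_contraindicated=False)
print(is_eligible(example))  # True
```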
Each participant was offered a choice between a faecal immunochemical test (FIT) yearly for up to five years, or one direct colonoscopy. The present study included all screening participants who gave informed consent and received colonoscopy in this programme.
Colonoscopy procedure. The detailed procedure of colonoscopy was explained to each study participant before the scheduled colonoscopy appointment. Polyethylene glycol (Klean-Prep®, Helsinn Birex Pharmaceuticals Ltd, Ireland) in split dosing was used as the standardized bowel preparation regimen, and was given to all participants before they left the centre. Colonoscopy was performed by experienced colonoscopists in endoscopy centres affiliated with two major hospitals. Prior to the procedure, all subjects received a sedation regimen consisting of Midazolam 2.5 mg (Groupe Panpharma, France). Meperidine 25 mg (Martindale Pharmaceuticals, United Kingdom) was administered intravenously. Further doses of midazolam and meperidine were given according to the needs of the participants. Air insufflation was used for all procedures. The endoscopists aimed for a withdrawal time of ≥ 6 minutes, in accordance with the current quality indicators for colonoscopy. Lesions were removed and biopsied as deemed appropriate by the endoscopists. The specimens were sent to a certified, accredited laboratory for gross and histopathological examination.
The derivation and validation cohort. A total of 5,789 screening participants completed colonoscopy in the study period 2008-2012. Among them, simple random sampling was used to select 2,000 subjects to act as the derivation cohort, a methodology based on our previous validation study 21. Each study participant was regarded as one unit of randomization, and had an equal probability of being selected. "Proximal" refers to a location in the colon proximal to the splenic flexure, whilst "distal" refers to the rectum, sigmoid and descending colon. The proportion of PN in the derivation cohort was 12.6%, and we assumed 25% as the point prevalence of individual risk factors, as in the Asia Pacific Colorectal Screening (APCS) study by Yeoh and colleagues 27. Based on these assumptions, a minimum of 3,100 subjects was needed to attain a power of > 80% so that a risk factor with an odds ratio of two could be detected at a significance level of p < 0.05. Therefore, the other 3,789 subjects formed our validation cohort.
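As a minimal sketch of the cohort split described above, the Python snippet below draws 2,000 participants by simple random sampling and assigns the remainder to the validation cohort. The participant IDs and the seed are hypothetical values chosen for illustration; the study does not report how the sampling was implemented.

```python
import random


def split_cohort(participant_ids, n_derivation=2000, seed=2008):
    """Split IDs into derivation and validation sets by simple random sampling.

    Every participant has the same probability of selection. The seed is an
    arbitrary illustrative value, not one reported by the study.
    """
    ids = list(participant_ids)
    rng = random.Random(seed)
    derivation = set(rng.sample(ids, n_derivation))   # sampling without replacement
    validation = [pid for pid in ids if pid not in derivation]
    return sorted(derivation), validation


ids = range(1, 5790)  # 5,789 hypothetical participant IDs
derivation_ids, validation_ids = split_cohort(ids)
print(len(derivation_ids), len(validation_ids))  # 2000 3789
```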

Development of the risk scores.
In the derivation cohort, the association between the colonoscopic finding of PN and each risk factor was examined using Pearson chi-square tests. The risk factors examined included age, sex, family history of CRC (in first-degree relatives before the age of 60 years) 7, smoking, drinking (current drinkers of alcohol more than twice per week vs. those drinking less or non-drinkers), body mass index (BMI), self-reported medical conditions, and use of non-steroidal anti-inflammatory drugs (NSAIDs) or aspirin. All variables with p < 0.05 in univariate analysis were included in a binary logistic regression model with PN as the outcome. Following the approach adopted by Yeoh and colleagues 28, a weighting was assigned to each independent variable in the risk score by halving the corresponding adjusted odds ratio (AOR) and rounding to the nearest integer. This technique aims to keep the total score below ten and the scoring system simple. The risk score for each subject is the sum of the points for all of that subject's risk factors. To evaluate the predictive ability of the scoring system, a receiver operating characteristic (ROC) curve was constructed and the area under the curve (AUC) was calculated. The concordance (c) statistic was used to reflect the discriminative ability of the prediction tool. used for data analysis. The proportion of PN was evaluated according to each score in the derivation cohort. Scores at which the proportion of PN was closest to, or below, the overall proportion of PN were allocated to the category "average risk", whilst higher scores were assigned as "high risk". An additional binary logistic regression model was constructed by entering all the significant variables identified in the derivation cohort analysis, and the AORs were evaluated in the validation cohort. The Hosmer-Lemeshow goodness-of-fit statistic was adopted to assess the calibration of the final model, where p > 0.05 indicates good agreement between predicted and observed risk.
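The weighting rule described above (halve the AOR, then round to the nearest integer) can be sketched in a few lines of Python. The function names are hypothetical, and rounding exact halves upward is an assumption, since the text does not say how ties are handled. The two AORs used in the example are those reported in the Results for synchronous PN/DN.

```python
import math


def points_from_aor(aor: float) -> int:
    """Halve the adjusted odds ratio and round to the nearest integer.

    Assumption: exact halves round up. Python's built-in round() would
    instead round exact halves to the nearest even integer.
    """
    return math.floor(aor / 2 + 0.5)


def total_score(points_for_present_factors) -> int:
    """A subject's risk score is the sum of the points for their risk factors."""
    return sum(points_for_present_factors)


# AORs reported in the Results for synchronous PN/DN: gender 3.1, BMI 1.6
gender_points = points_from_aor(3.1)
bmi_points = points_from_aor(1.6)
print(gender_points, bmi_points)                  # 2 1
print(total_score([gender_points, bmi_points]))   # 3
```

These values agree with the points quoted in the Results for the synchronous PN/DN system, where gender carries 2 points and BMI ≥ 25 kg/m² carries 1 point.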
The c-statistic, which for a binary outcome equals the area under the ROC curve, was used to evaluate the ability of the scoring system to predict the risk of having PN. The Cochran-Armitage test of trend was used to compare the prevalence of PN according to scores. The above analyses were repeated with the synchronous presence of PN and distal neoplasia (DN) as the outcome of prediction. Two-sided p values < 0.05 were considered statistically significant.
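To make the c-statistic concrete, the sketch below computes it directly from its definition: the proportion of case/non-case pairs in which the case has the higher score, with tied pairs counted as one half. This is an illustrative pairwise implementation, not the software routine used in the study, and the toy data are made up.

```python
def c_statistic(scores, outcomes):
    """Concordance statistic for a binary outcome.

    Returns the fraction of (case, non-case) pairs in which the case has the
    higher score, counting ties as 1/2. This equals the area under the ROC curve.
    """
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    non_cases = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                concordant += 1.0
            elif c == n:
                concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


# Made-up toy data for illustration only (not study data)
toy_scores = [0, 1, 2, 3, 4, 5]
toy_outcomes = [0, 0, 1, 0, 1, 1]
# 8 of the 9 case/non-case pairs are concordant here
print(c_statistic(toy_scores, toy_outcomes))
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect separation of cases from non-cases.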

Results
Subject characteristics.
In the derivation cohort, the average age of the participants was 57.8 years (SD 4.9), and 47.2% were male (Table 1). A total of 645 (32.2%) cases of colorectal neoplasia were detected, including 8 (0.4%) cases of CRC and 122 (6.1%) cases of advanced neoplasia. The proportion of subjects having PN, DN, and synchronous presence of PN/DN was 12.6%, 14.9% and 4.7%, respectively. The characteristics of the validation cohort were similar to those of the derivation set, except for BMI (p = 0.043). The prevalence of colorectal neoplasia according to the risk factors is shown in Table 2. In univariate analysis, age (p = 0.001), gender (p = 0.001), family history (p = 0.035), BMI (p = 0.013), and self-reported ischaemic heart disease (p = 0.019) were associated with PN. For synchronous PN/DN, the factors identified included age (p < 0.001), gender (p < 0.001), smoking (p = 0.008), alcohol drinking (p = 0.017), and BMI (p = 0.017) (Table 2).

(Table 3). Age (AOR 1.3 to 3.4), gender (AOR = 3.1, 95% C.I. 1.9-5.0, p < 0.001) and BMI (AOR = 1.6, 95% C.I. 1.0-2.5, p = 0.042) were significant independent predictors of synchronous PN/DN (Table 4).

Development of the risk scoring systems.
According to the AORs from the derivation cohort, the following predictors of PN were used to assign scores to each subject ( (2), female gender (0), BMI < 25 kg/m² (0), BMI ≥ 25 kg/m² (1). For both systems, the range of scores was 0-5, and a participant's score was the sum of the points assigned to each risk factor. The numbers and proportions of subjects with each score are shown in Table 7. Since a score of 2 had a proportion of PN closest to the overall prevalence in the derivation cohort (13.8% vs. 12.6%), a score of ≤ 2 was categorized as "Average Risk" (AR). Subjects with scores ≥ 3 had proportions higher than the overall prevalence, and hence were designated as "High Risk" (HR).
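The cut-off described above maps a total score to a risk tier. A minimal Python sketch, with a hypothetical function name, is given below; it takes the summed score as input because the full points tables are not reproduced here.

```python
def risk_tier(score: int) -> str:
    """Map a total score (0-5) to 'AR' (score <= 2) or 'HR' (score >= 3)."""
    if not 0 <= score <= 5:
        raise ValueError("score must be between 0 and 5")
    return "AR" if score <= 2 else "HR"


print([risk_tier(s) for s in range(6)])
# ['AR', 'AR', 'AR', 'HR', 'HR', 'HR']
```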
For synchronous PN/DN, a score of 2 was also chosen as the cut-off point to differentiate between AR and HR, since its proportion was closest to the overall prevalence in the derivation cohort (

Discussion
Statement of principal findings. This study has devised and validated two simple clinical risk scoring systems for predicting PN and synchronous PN/DN, respectively, in asymptomatic subjects. There is a trend toward a higher detection rate of proximal neoplasia with increasing scores. The instrument is simple and easy to use, and the risk prediction requires only basic clinical information. The scoring system is particularly suited to patients who are keen to obtain more comprehensive information about their risks to facilitate their choice of screening test. We recommend that subjects who score ≥ 3 points in either system may choose colonoscopy, whereas those with scores ≤ 2 could select flexible sigmoidoscopy (FS) as the primary CRC screening test. It should be noted that the prevalence of synchronous PN/DN among subjects who scored 5 was 28.1%, where colonoscopy is strongly indicated. The use of this scoring system in clinical practice is consistent with the recommendations of the Institute of Medicine 28 and the US Preventive Services Task Force 29, which advocate promoting shared decision making in screening practices. Besides, its application could rationalize the use of colonoscopy in circumstances where the risk of proximal lesions should be adequately high to warrant the procedure. These findings could also inform policy-makers at the macro level, especially when the characteristics of eligible residents in population-based screening programmes are known. The use of this tool could have substantial public health implications, as resources to equip colonoscopy and FS capacity could be more accurately planned. Some variables, such as family history of CRC, were significantly associated with PN but not with synchronous PN/DN; this might be explained by the relatively small number of subjects who had both synchronous PN/DN and the risk factor.

Table 5. Colorectal Screening score for prediction of risk for proximal neoplasia. BMI: Body Mass Index. *Colorectal neoplasia includes adenoma and advanced neoplasia. Advanced neoplasia is defined as colorectal cancer, or any colorectal adenoma which has a size of ≥ 10 mm in diameter, high grade dysplasia, villous or tubulovillous histologic characteristics, or any combination thereof.

Risk factor Criteria Points
Age 50-55 0

Table 6. Colorectal Screening score for prediction of risk for synchronous proximal and distal neoplasia. BMI: Body Mass Index. *Colorectal neoplasia includes adenoma and advanced neoplasia. Advanced neoplasia is defined as colorectal cancer, or any colorectal adenoma which has a size of ≥ 10 mm in diameter, high grade dysplasia, villous or tubulovillous histologic characteristics, or any combination thereof.

Table 7. Distribution of number of subjects for each score category in the derivation cohort. *Colorectal neoplasia includes adenoma and advanced neoplasia. Advanced neoplasia is defined as colorectal cancer, or any colorectal adenoma which has a size of ≥ 10 mm in diameter, high grade dysplasia, villous or tubulovillous histologic characteristics, or any combination thereof.
Relationship with literature. From a thorough literature review, only a few studies have devised a validated scoring system for the prediction of PN. Imperiale and colleagues developed a seven-point risk stratification tool based on age, gender, and distal findings on FS from a company-based programme of screening colonoscopy in Indianapolis 16. A methodology similar to that of the present study was used. All three variables were found to be independent predictors in the derivation cohort, and the outcome of interest was proximal advanced neoplasia 30. However, when the scoring system was later evaluated in an average-risk asymptomatic cohort in Boston, the clinical index was reported to have limited ability to differentiate low-risk from intermediate-risk white, black and Hispanic patients for PN (c-statistics < 0.07) 17. Another large-scale study including more than 10,000 adults concluded that proximal advanced neoplasia is a function of age and gender only 18. Yet another evaluation was conducted in California, involving more than 2,900 asymptomatic subjects aged ≥ 50 years undergoing colonoscopy as a follow-up to screening sigmoidoscopy. It found that age, family history and distal findings were significant predictors of proximal advanced neoplasia 19. Compared with these existing tools, our scoring system is unique in that it does not rely on distal findings for risk prediction, yet it has high accuracy and predictive ability.

Table 8. Prevalence of proximal neoplasia and proximal advanced neoplasia by risk tier. *Colorectal neoplasia includes adenoma and advanced neoplasia. Advanced neoplasia is defined as colorectal cancer, or any colorectal adenoma which has a size of ≥ 10 mm in diameter, high grade dysplasia, villous or tubulovillous histologic characteristics, or any combination thereof.
A possible difference in discriminative ability between the present scoring system and the existing ones [16][17][18][19]30 may be due to differences in subject ethnicity and the outcomes of interest, as we studied colorectal neoplasia instead of advanced neoplasia in the proximal colon. The exact reasons for the difference remain to be explored. In addition, it is noteworthy that whilst some previous studies demonstrated a higher prevalence of proximal neoplasia in men than in women 16,18,30, other studies showed that women had a higher risk of proximal neoplasia [31][32][33]. The discrepancy among these studies could be due to different study designs (e.g. biomathematical modelling 31 vs. the use of cancer registries 32 vs. prospective recruitment of screening participants [16][17][18][19]30,33). It has also been speculated that sociocultural barriers delay screening and diagnosis among female subjects, and that nutrient metabolism and dietary practices differ between female and male subjects. Although these have been identified as factors which might influence the risk of proximal neoplasia in different populations 33, the exact reasons for the gender-specific discrepancy need to be further explored.

The adjusted odds ratio for proximal neoplasia among those with IHD was the highest (2.2) when compared with other risk factors (1.3-1.8). Many of the risk factors for IHD, namely smoking, alcohol drinking, hypertension, diabetes, and obesity, which are components of the metabolic syndrome, are also risk factors for colorectal neoplasia. One biologically plausible explanation is that, by the time IHD develops in screening participants, they might have been exposed to all these risk factors for a prolonged period, which could explain the higher odds among those with IHD. Future studies should evaluate the relative risk of proximal neoplasia in individuals with established IHD compared with healthy individuals.

Strengths and Limitations. This is the first study which devised a scoring system to predict PN in a large cohort of asymptomatic individuals. It was conducted in a Chinese population, and given subject homogeneity the findings may be extrapolated to the 1.2 billion Chinese people worldwide. A few limitations should, however, be addressed. Firstly, we included self-referred screening participants in this study. Their health-seeking behaviour and health consciousness might differ from those of the general public. Nevertheless, it is impractical to recruit participants by simple random sampling of the entire population, as the anticipated refusal rate would be high. Secondly, we invited screening participants aged between 50 and 70 years, and the utility of this system might not extend to subjects outside this age range. In addition, there are other potential risk factors which were not included in the modelling, including physical activity level 34, dietary intake of saturated fat, red meat and fibre 35,36, as well as waist circumference, which has recently been shown to be an accurate predictor of colorectal neoplasia 37. However, these variables are difficult to measure accurately, and could be subject to recall bias. Furthermore, we used BMI as the measure of obesity; other anthropometric measurements, including waist circumference, waist-to-hip ratio and body fat distribution, could be additional parameters to enhance the predictive capability of the model. Finally, although IHD was found to be a novel predictive component of proximal neoplasia in this study, which is compatible with a recent evaluation 38, the present system relied on self-reported IHD.
Study implications and future research. In summary, we have devised and validated a clinical scoring system for the prediction of PN in a Chinese population. It is anticipated that its use in clinical practice could help physicians to risk-stratify subjects for colorectal cancer screening, and to offer a choice between FS and colonoscopy based on the individual risk of proximal neoplasia. Prospective screening participants could see the possible risk of missing proximal neoplasia, and physicians could use these figures to facilitate a more thoroughly informed, shared decision-making discussion with their patients. Future research should evaluate the scoring system in other countries with different ethnicities and distributions of colorectal neoplasia. The projected cost-effectiveness, acceptability, and practicality of implementing this prediction tool in screening practices should be further addressed.