Risk scores for predicting incidence of type 2 diabetes in the Chinese population: the Kailuan prospective study

Few risk scores have been specifically developed to identify individuals at high risk of type 2 diabetes in China. In the present study, we aimed to develop such risk scores based on simple clinical variables. We studied a population-based cohort of 73,987 adults aged 18 years and over. After a mean follow-up of 5.35 ± 1.59 years, 4,726 participants (9.58%) in the exploration cohort and 2,327 participants (9.44%) in the validation cohort developed type 2 diabetes. Age, gender, body mass index, family history of diabetes, education, blood pressure, and resting heart rate were selected to form the concise score, which had an area under the receiver operating characteristic curve (AUC) of 0.67. The variables in the concise score, combined with fasting plasma glucose (FPG) and triglyceride (TG) level or use of lipid-lowering drugs, constituted the accurate score, which had an AUC of 0.77. The utility of the two scores was confirmed in the validation cohort, with AUCs of 0.66 and 0.77, respectively. In summary, the concise score, based on non-laboratory variables, could be used to identify individuals at high risk of developing diabetes in the Chinese population; the accurate score, which also uses FPG and TG data, is better at identifying such individuals.

Results
Both scores were developed in the exploration cohort using a Cox proportional hazards model. Of the 24 variables initially entered into the model, only age, gender, BMI, family history of diabetes, education, BP, resting heart rate, FPG, and TG or use of lipid-lowering drugs made significant contributions to the score. These statistically significant risk factors constitute the accurate score; excluding FPG and TG or use of lipid-lowering drugs produces the concise score. Participants with FPG values ≥ 6.1 mmol/L received the highest point value for predicting incident diabetes. The receiver operating characteristic (ROC) curves (Fig. 1a) demonstrate that the accurate score has a better predictive capacity than the concise score (areas under the curve (AUC) of 0.77 and 0.67, respectively, P < 0.001). Total concise scores ranged from 0 to 37, and total accurate scores from 0 to 60.
The diagnostic characteristics of the two models in the validation cohort are shown in Table 3. The concise score had an AUC of 0.66 (95% confidence interval (CI): 0.65-0.68), and the accurate score an AUC of 0.77 (95% CI: 0.76-0.78) (Fig. 1b). At its optimum cut-off value of 21, the concise score had a sensitivity of 0.72 and a specificity of 0.52; at its optimum cut-off value of 27, the accurate score had a sensitivity of 0.70 and a specificity of 0.70. Stratified analyses showed that both scores performed better in women than in men (concise score AUC: 0.72 vs. 0.65; accurate score AUC: 0.81 vs. 0.76). The concise score performed better for people < 60 years than for those ≥ 60 years (AUC: 0.67 vs. 0.62), whereas the accurate score performed slightly better for people ≥ 60 years (AUC: 0.77 vs. 0.76). Figures 2 and 3 present the calibration plots for the concise and accurate scores in the validation cohort, with the probability of incident diabetes after a mean of 5.35 years on the ordinate and the score on the abscissa. The dots represent the observed incidence of diabetes, and the vertical lines represent the 95% CIs. The continuous line represents the predicted probability of incident diabetes, which clearly increases with increasing score. At the cut-off value of 21, the predicted probability for the concise score is 9.29%; the corresponding value for the accurate score (at 27) is 9.42%. Table 4 summarizes the performance of 14 other diabetes risk scores. These include 7 scores containing laboratory variables 4,7,8,11-13,19 and 7 scores without laboratory data 6,9,10,14,15,17,18. When applied to the validation cohort (one-third of the whole cohort), none of the 7 scores containing laboratory variables outperformed our accurate score, and our concise score performed better than the 7 scores without laboratory variables.
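The optimum cut-off values reported above (21 and 27) are the points that maximize the sum of sensitivity and specificity, which is equivalent to maximizing Youden's index. A minimal Python sketch of that search follows. It assumes that a score at or above the cut-off is classified as high risk, tries every observed score as a candidate cut-off, and runs on made-up toy data rather than Kailuan data; the function names are hypothetical.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    scores: Sequence[int], cases: Sequence[bool], cutoff: int
) -> Tuple[float, float]:
    """Sensitivity and specificity when a score >= cutoff is classified as high risk."""
    tp = sum(1 for s, c in zip(scores, cases) if c and s >= cutoff)
    fn = sum(1 for s, c in zip(scores, cases) if c and s < cutoff)
    tn = sum(1 for s, c in zip(scores, cases) if not c and s < cutoff)
    fp = sum(1 for s, c in zip(scores, cases) if not c and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def optimal_cutoff(scores: Sequence[int], cases: Sequence[bool]) -> Tuple[int, float, float]:
    """Return (cutoff, sensitivity, specificity) maximizing sensitivity + specificity.

    Every observed score is tried as a candidate; on ties the lowest cut-off is kept.
    Requires at least one case and one non-case.
    """
    best = None
    for cutoff in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, cases, cutoff)
        if best is None or sens + spec > best[1] + best[2]:
            best = (cutoff, sens, spec)
    return best


# Toy data (not Kailuan data): integer point totals and incident-diabetes status.
scores = [5, 10, 15, 20, 22, 25, 30, 35]
cases = [False, False, False, True, False, True, True, True]
print(optimal_cutoff(scores, cases))  # (20, 1.0, 0.75)
```

In the toy data, cut-offs 20 and 25 tie at a sum of 1.75; the sketch keeps the lower one, which favours sensitivity. How the original analysis broke ties is not stated.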
Both of our scores (concise and accurate) outperform the New Chinese Diabetes Risk Score devised by Zhou et al. 17 (AUCs of 0.66 and 0.77 vs. 0.61) in predicting incident diabetes. When applied to the whole sample, the diabetes risk score developed by Schmidt et al. 8 performs best among the 14 risk scores (AUC of 0.74).

Discussion
Using two-thirds of the Kailuan cohort, we derived two scoring systems to predict the incidence of diabetes among Chinese adults over a mean follow-up period of 5.35 years. We validated both scores using the remaining one-third and confirmed their predictive capacity for incident diabetes. The concise score is non-invasive and can be completed by individuals themselves. The accurate score is more effective in predicting diabetes but requires simple blood tests. Our scores performed better than the 14 scores derived from other populations. Our accurate score achieved a relatively high AUC value (0.77), whereas our concise score had a somewhat low AUC value (0.67). Of note, a model with an AUC value < 0.80 for predicting incident diabetes may be of limited clinical utility. However, all predictors included in our scores are readily available clinical variables, and including further blood-based predictors might improve performance. In our accurate score, FPG is the strongest predictor of incident diabetes (contributing up to 20 points), which is consistent with previous reports 7,8,11,13,19. Impaired fasting glucose (IFG) has been defined as FPG levels from 6.1 to 6.9 mmol/L 20,21 and from 5.6 to 6.9 mmol/L 22. It is therefore not surprising that individuals with IFG have a high risk of developing diabetes. In the accurate score, the points contributed by the 6.1 to 6.9 mmol/L category were about twice those contributed by the 5.6 to 6.1 mmol/L category, consistent with the finding that the risk of incident diabetes increases with higher FPG levels 8.
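The two IFG definitions cited above differ only in their lower bound, so the same FPG value can be labelled differently. The Python sketch below makes that concrete. It illustrates the definitions only and does not reproduce the FPG point categories of the accurate score; it assumes the upper IFG bound "6.9 mmol/L" means "below 7.0 mmol/L", the FPG threshold for diabetes used in this study, and the function name is hypothetical.

```python
def fpg_category(fpg_mmol_l: float, ifg_lower: float = 6.1) -> str:
    """Label a fasting plasma glucose value (mmol/L).

    ifg_lower=6.1 follows the 6.1-6.9 mmol/L IFG definition; ifg_lower=5.6
    follows the 5.6-6.9 mmol/L definition. Assumption: the upper IFG bound is
    read as "below 7.0 mmol/L", the FPG criterion for diabetes in this study.
    """
    if fpg_mmol_l >= 7.0:
        return "diabetes range"
    if fpg_mmol_l >= ifg_lower:
        return "IFG"
    return "normal"


# A value of 5.8 mmol/L is IFG under the 5.6 definition but not under the 6.1 one.
for value in (5.4, 5.8, 6.4, 7.0):
    print(value, fpg_category(value), fpg_category(value, ifg_lower=5.6))
```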
Age was the strongest predictor in our concise score and the second-strongest, after FPG, in our accurate score; indeed, it has been included in most published scores for predicting incident diabetes 4,6-14,17-19,23,24. Individuals aged ≥ 60 years have the highest risk of developing diabetes in our scores (this category accounts for 29.7% of the total concise score), closely followed by individuals aged 40 to 59 years. In contrast, in the simple score by Aekplakorn et al. 9, the category ≥ 50 years had the highest point contribution (accounting for 11.8% of the total score). Although the age cut-off values differ, these scores consistently show that older age predicts incident diabetes. In addition, scores developed in a particular age group 6,12,13,19 or including age as a continuous variable 4,8-11,14,17,18 also suggest that the risk of incident diabetes increases with age.
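The 29.7% figure can be related to the score range reported in the Results. Assuming the percentage is the share of the maximum concise score of 37 points, the ≥ 60 years category corresponds to roughly 11 points; this point value is an inference from the two reported numbers, not a value stated in the text:

$$
0.297 \times 37 \approx 10.99 \approx 11 \text{ points}, \qquad \frac{11}{37} \approx 0.297 = 29.7\%
$$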
In our concise score, BMI was the second-strongest predictor after age. In previous diabetes risk scores, BMI or waist circumference was also a strong predictor 4,6-14,17-19,23,24. Compared with height, weight, waist circumference, and BMI showed stronger statistical significance in the univariate analysis. When all of these factors were entered into the Cox proportional hazards model, only BMI contributed to the scoring system. Similarly, in the clinical diabetes risk scores by Balkau et al. 15, BMI and waist circumference had similar predictive value, but only waist circumference was included in the score. Because the two measures carry overlapping information, BMI and waist circumference may not both be retained in the same scoring system. In addition, Kahn et al. 13 did not recommend BMI as a candidate variable because it is a composite index; the association between BMI and incident diabetes might therefore be driven as much by reduced height as by increased weight.
We are not the first researchers to include resting heart rate in a diabetes prediction score 13. Both the basic and the enhanced scores developed by Kahn et al. 13 included resting heart rate, assigning it 2 points in the basic score (mean score 38.1) and 5 points in the enhanced score (mean score 33.7). European studies 25,26 and a Chinese study from the Kailuan database 27 have also demonstrated that an elevated resting heart rate is an independent risk factor for incident diabetes. It has been proposed that sympathetic activation, which increases heart rate, may lead to insulin resistance and thereby increase diabetes risk 28. However, the exact mechanism remains to be elucidated.
The inclusion of TG and BP in diabetes risk scores is also not new 7,8,11-13,19. A widely held view is that type 2 diabetes results from complex metabolic processes 29,30. High-normal BP and hypertension have been shown to be associated with an increased risk of developing type 2 diabetes 31. As reported in previous studies 32,33, higher TG and lower HDL levels are also associated with incident diabetes. However, only TG contributed to incident diabetes in our accurate score, which is consistent with the scores by Kanaya et al. 19 and Gao et al. 12. The diabetes risk scores developed by Schmidt et al. 8, Wilson et al. 7, Chien et al. 11, Meigs et al. 24, and Kahn et al. 13 included both TG and HDL, whereas the score by Stern et al. 4 included only HDL.
A family history of diabetes is also an important predictor of incident diabetes; both genetic and environmental pathways may account for this 24. 'Current heavy smoker' was given the highest point value in the German Diabetes Risk Score 10. In the clinical scores by Balkau et al. 15, smoking was the second most important predictive factor for men but was not a predictor for women. However, smoking did not contribute to either of our scores, which is consistent with the majority of previously developed diabetes risk scores 4,6-8,11,13,19. Physical activity frequency was also not predictive, possibly because of its negative correlation with BMI.
There were some limitations to our study. First, the study is based on residents of the Kailuan community in Tangshan, who might not be representative of the general population of China. In particular, the Kailuan study population is exposed to environmental pollution, and a large proportion of the participants were manual workers, including coal miners. Furthermore, the average BMI of participants in the current study is higher than the national average 34. The two scoring systems we developed will need to be validated in other parts of China or in other countries. Second, our scores were derived and validated within the same cohort, which may limit their ability to predict incident diabetes in other populations. We hope to test these scores in other population samples in the future. Third, we did not collect additional blood-based parameters. One-hour plasma glucose has been shown to be a strong predictor of incident diabetes 35, and single-nucleotide polymorphisms are known to be associated with the risk of diabetes 24. Including these might have improved the discrimination of the accurate score. Finally, we were not able to include oral glucose tolerance test (OGTT) data in our diagnostic criteria. This is likely to have led to an underestimate of the association between diabetes and the score parameters. On the other hand, our scores use parameters that are easy to obtain and appropriate for use in China.

Conclusion
We designed two scores for use as assessment tools to identify individuals at high risk of developing type 2 diabetes in the Chinese population. The concise score is non-invasive and can be used by individuals themselves. The accurate score provides superior assessment ability but requires simple blood tests. Our scores performed better than other existing diabetes risk scores in the Chinese study population. Further research is required to test the scores in other population samples in China.

Methods
Study population. The Kailuan community is located in the Tangshan area of northern China. Periodic health examinations, including questionnaire interviews, anthropometric measurements, clinical examinations, and laboratory assessments, have been performed in 2-year cycles up to the present. We used data from 2006 to 2012 in this study. Individuals were eligible for enrolment if they were aged 18 years or over, provided informed consent, and updated their health status every 2 years according to the protocol. In the present study, 9,268 participants were excluded because of missing information on candidate variables, and 8,766 were excluded because of missing follow-up data. Another 9,489 participants were excluded because they had a baseline FPG level ≥ 7.0 mmol/L (126 mg/dL), a physician-diagnosed history of diabetes, or used anti-diabetic medication. The remaining 73,987 individuals were available for our analyses.
The study followed the guidelines of the Declaration of Helsinki and was approved by the Ethics Committees of both the Kailuan General and Beijing Tiantan hospitals. All participants provided written informed consent.

Assessment of risk factors and outcomes. The candidate baseline variables presented in Table 1 were chosen for their common availability and use in previous diabetes risk scores. Demographic data and information on lifestyle characteristics, medication use, disease history, and family history were obtained using questionnaires administered by specially trained research doctors of the hospitals. The classification of each categorical variable has been described in detail elsewhere 38,39. To further clarify, the physical activity category 'very active' was defined as more than 80 minutes of activity per week, 'moderately active' as less than 80 minutes per week, and 'inactive' as no physical activity. The salt intake category 'high' was defined as > 10 grams/day, 'medium' as 6-10 grams/day, and 'low' as < 6 grams/day. 'Smoking occasionally' was defined as one cigarette or less per day and 'smoking frequently' as smoking daily. 'Drinking occasionally' was defined as drinking 1-3 times per month, and 'drinking frequently' as drinking daily.
Height, weight, hip circumference, and waist circumference (measured 2.5 cm above the umbilicus) were measured in the standing position, without heavy clothing, to the nearest 0.1 cm or 0.1 kg by nurses responsible for annual routine health examinations. Waist-to-hip ratio was calculated as waist circumference divided by hip circumference. Waist circumference was categorized using cut-off points of 84 and 90 cm for men and 77 and 84 cm for women, following the Korean diabetes risk score, which was developed in a similar Asian population 40. BMI was calculated as weight (kg)/height (m)² and classified according to the common Chinese criteria: normal, BMI < 24.0 kg/m²; overweight, 24.0 ≤ BMI < 28.0 kg/m²; and obese, BMI ≥ 28.0 kg/m². BP was measured twice at a 5-minute interval; if the two readings differed by more than 5 mmHg, an additional reading was taken, and the average of the readings was used for analysis. Resting heart rate 27 was obtained from a 12-lead electrocardiogram recorded with the participants in the supine position.
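The anthropometric and BP rules above can be summarized in a short Python sketch. It follows the stated formula and cut-points, but two details are assumptions not given in the text: a value exactly equal to a waist cut-point is placed in the higher category, and when a third BP reading is taken, all readings are averaged. The function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


def bmi_category(bmi_value: float) -> str:
    """Chinese criteria as stated: < 24.0 normal, 24.0 to < 28.0 overweight, >= 28.0 obese."""
    if bmi_value < 24.0:
        return "normal"
    if bmi_value < 28.0:
        return "overweight"
    return "obese"


def waist_category(waist_cm: float, male: bool) -> int:
    """Return 0, 1 or 2 using the 84/90 cm (men) and 77/84 cm (women) cut-points.

    Assumption: a value equal to a cut-point falls into the higher category.
    """
    low, high = (84.0, 90.0) if male else (77.0, 84.0)
    if waist_cm < low:
        return 0
    if waist_cm < high:
        return 1
    return 2


def analysed_bp(readings: Sequence[float]) -> float:
    """Average BP (mmHg) under the two-reading protocol.

    Assumption: if the first two readings differ by more than 5 mmHg, a third
    reading must be supplied, and every reading taken is averaged.
    """
    if len(readings) < 2:
        raise ValueError("at least two readings are required")
    if abs(readings[0] - readings[1]) > 5 and len(readings) < 3:
        raise ValueError("first two readings differ by > 5 mmHg; take another reading")
    return mean(readings)


# Example: a 70 kg, 1.70 m man with a waist of 86 cm and BP readings of 128 and 136 mmHg.
value = bmi(70.0, 1.70)  # about 24.2 kg/m^2
print(round(value, 1), bmi_category(value), waist_category(86.0, male=True))
print(analysed_bp([128.0, 136.0, 131.0]))  # readings differ by 8 mmHg, so a third was taken
```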
Blood samples were collected in the morning after an overnight fast at the 11 hospitals and analysed at the central laboratory of Kailuan General Hospital. FPG was measured using the hexokinase/glucose-6-phosphate dehydrogenase method. TG, total cholesterol (TC), high-density lipoprotein cholesterol (HDL), and low-density lipoprotein cholesterol (LDL) were all measured enzymatically. According to the criteria of the National Cholesterol Education Program Adult Treatment Panel III 41, a TG level of 1.70 mmol/L (150 mg/dL) or greater was considered hypertriglyceridemia, and an HDL level below 1.03 mmol/L (40 mg/dL) in men or below 1.29 mmol/L (50 mg/dL) in women was considered low. TC levels ≥ 5.18 mmol/L (200 mg/dL) and LDL levels ≥ 3.35 mmol/L (130 mg/dL) were considered borderline high.
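The lipid thresholds above reduce to a handful of comparisons, with the HDL threshold depending on sex. A minimal Python sketch, assuming all inputs are in mmol/L and reading the TC and LDL thresholds as "borderline high or above"; the function and key names are hypothetical:

```python
def lipid_flags(tg: float, hdl: float, tc: float, ldl: float, male: bool) -> dict:
    """Apply the NCEP ATP III thresholds quoted above; all inputs in mmol/L."""
    hdl_cut = 1.03 if male else 1.29  # sex-specific low-HDL threshold
    return {
        "hypertriglyceridemia": tg >= 1.70,
        "low_hdl": hdl < hdl_cut,
        "tc_borderline_high_or_above": tc >= 5.18,
        "ldl_borderline_high_or_above": ldl >= 3.35,
    }


# The same HDL value is flagged as low for a woman but not for a man.
print(lipid_flags(tg=1.8, hdl=1.10, tc=5.0, ldl=3.4, male=False)["low_hdl"])
print(lipid_flags(tg=1.8, hdl=1.10, tc=5.0, ldl=3.4, male=True)["low_hdl"])
```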
The outcome of interest in the present study was the first occurrence of diabetes during follow-up, identified by a self-reported diabetes diagnosis, use of anti-diabetic medication after the baseline examination, or an FPG level ≥ 7.0 mmol/L (126 mg/dL) at any of the periodic examinations. The date of incidence was defined as the date of the examination visit at which a new case of diabetes was identified; participants who remained nondiabetic were censored at their last follow-up.
Statistical analysis. We used SAS version 9.4 (SAS Institute, Cary, NC, USA) for our analyses. An exploration cohort of 49,325 persons, accounting for two-thirds of the cohort, was randomly selected to develop the clinical risk scores for predicting the incidence of diabetes. A Cox proportional hazards model was fitted in a stepwise manner: candidate variables with P ≤ 0.2 were included in the initial model, and variables with P > 0.05 were then removed. Interaction terms between the independent variables were not considered. We refer to the model that includes only demographic and anthropometric variables as the concise model; the concise model supplemented with laboratory results constitutes the accurate model. For each model, hazard ratios and 95% CIs were calculated to estimate relative risk. In addition, β-coefficients were calculated to assign points for each risk factor by dividing the sum of the β-coefficients from the two models by 2 and rounding to the nearest integer.
Continuous variables included in the model were categorized so that their estimated contribution to diabetes risk could be expressed through simplified point scores assigned to each category 13. The sum of these points was calculated for each person to predict the hazard of incident diabetes over a mean follow-up of 5.35 years. ROC curves were used to compare the predictive discrimination of the different risk scores, and the AUC (also referred to as the C statistic) was used to quantify the predictive ability of each score. Sensitivity and specificity were used to assess how well each score distinguished participants who developed diabetes from those who did not. The optimal cut-off value was defined as the point that maximized the sum of sensitivity and specificity.
Our literature review covered 40 original articles (published from March 2000 to December 2013) that developed new diabetes risk scores. These included 20 articles that aimed to screen individuals for undiagnosed diabetes or impaired glucose tolerance and 20 articles that aimed to identify individuals at high risk of developing diabetes within a given period. Among the 20 articles on diabetes incidence, we selected 11 articles 4,6-15 for validation in our cohort, based on a better AUC, the availability of the required variables in the Kailuan study, and coverage of different regions. The article by Griffin et al. 18 was selected for its development of the Cambridge Diabetes Risk Score, which has proven effective in identifying individuals at risk of incident diabetes 42; for the same reason, we also selected the article by Kanaya et al. 19. We also tested the New Chinese Diabetes Risk Score 17, which was originally developed to detect undiagnosed diabetes.
Notes to Table 4: ¶ We selected the simple model from the 7 models described in the original article because it was defined with points and was recommended by the article. # Physical activity in the original model was calculated per hour, which could not be derived in such detail in the present study; it was therefore removed for validation. ** We selected the simple model in the original article because it was transformed into a point score, performed well with a good AUC, and was recommended in the article. †† We selected the clinical risk score in the original article because it was defined with points and the variables in the other model could not be obtained in the Kailuan study. ‡‡ We selected the accurate model in the original article because it performed better, with a higher AUC. ## We selected the enhanced diabetes prediction model, which had a higher AUC than the basic model in the original article.
All validations were analysed using a 10-fold cross-validation method. The concise and accurate scores were validated in the remaining one-third of the cohort (24,662 participants), and the other algorithms from different countries were validated separately in both the validation cohort and the whole cohort. We divided the validation cohort (or the whole cohort) into 10 smaller subsamples and validated the scores in 9 of them each time. We repeated this process 10 times and calculated the mean of the 10 resulting AUC values.
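The resampling procedure described above can be sketched in Python as follows. This is one reading of the text rather than the original SAS 9.4 code, and it rests on several assumptions. The subsamples are formed by random shuffling, since the text does not say how the data were split. "Repeating the process 10 times" is read as leaving out each of the 10 subsamples once. Incident diabetes is treated as a binary outcome, with the AUC computed by the rank-sum (Mann-Whitney) formula, which equals the C statistic for a binary outcome. All function names are hypothetical.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple


def auc(scores: Sequence[float], cases: Sequence[bool]) -> float:
    """AUC via the rank-sum (Mann-Whitney) formula; tied scores get average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank of the tie run
        i = j + 1
    n_pos = sum(1 for c in cases if c)
    n_neg = len(cases) - n_pos
    rank_sum = sum(r for r, c in zip(ranks, cases) if c)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def nine_of_ten_auc(
    scores: Sequence[float], cases: Sequence[bool], seed: int = 0
) -> Tuple[float, List[float]]:
    """Split the data into 10 random subsamples; 10 times, evaluate the fixed score
    on the 9 retained subsamples. Returns the mean AUC and the 10 individual AUCs."""
    rng = random.Random(seed)
    idx = list(range(len(scores)))
    rng.shuffle(idx)
    folds = [idx[k::10] for k in range(10)]
    aucs = []
    for left_out in range(10):
        kept = [i for f, fold in enumerate(folds) if f != left_out for i in fold]
        aucs.append(auc([scores[i] for i in kept], [cases[i] for i in kept]))
    return mean(aucs), aucs


# Toy data (not Kailuan data): risk of the outcome rises with the point total.
toy_rng = random.Random(1)
toy_scores = [toy_rng.randint(0, 37) for _ in range(500)]
toy_cases = [toy_rng.random() < s / 60 for s in toy_scores]
mean_auc, fold_aucs = nine_of_ten_auc(toy_scores, toy_cases)
print(round(mean_auc, 3), len(fold_aucs))
```

Because the point scores are fixed and not refitted on each subsample, each iteration under this reading simply re-evaluates the same score on 90% of the data. The spread of the 10 AUCs therefore reflects sampling variability rather than overfitting.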