Development and evaluation of a risk score for type 2 diabetes mellitus among middle-aged Chinese rural population based on the RuralDiab Study

The purpose of this study was to establish a simple and effective risk score for type 2 diabetes mellitus (T2DM) in middle-aged rural Chinese adults. A total of 5453 participants aged 30–59 years from the Rural Diabetes, Obesity and Lifestyle (RuralDiab) study were recruited to establish the RuralDiab risk score using logistic regression analysis. The RuralDiab risk score was validated in a prospective study from Henan Province, China, and compared with previous risk scores using receiver operating characteristic (ROC) curves. Ultimately, sex, age, family history of diabetes, physical activity, waist circumference, history of dyslipidemia, diastolic blood pressure and body mass index were included in the RuralDiab risk score (range 0 to 36), and the optimal cutoff value was 17, with 67.9% sensitivity and 67.8% specificity. The area under the curve (AUC) of the RuralDiab risk score for T2DM in the validation population was 0.723 (95% CI: 0.710–0.735), which was significantly higher than those of the American Diabetes Association score (AUC: 0.636), the Inter99 score (AUC: 0.669) and the Oman risk score (AUC: 0.675). The RuralDiab risk score was established and showed appropriate performance for predicting T2DM in the middle-aged Chinese rural population. Further validation studies should be conducted in different populations.

Risk scores based on risk factors that do not require laboratory tests have been demonstrated to be effective, low-cost and noninvasive tools for identifying individuals at high risk of T2DM [10][11][12][13][14] . Because of the incomplete health care system and underdeveloped economy in rural areas, the prevalence of T2DM is already high and continuously increasing in rural China 2,3 . Thus, a suitable risk score would be useful for identifying high-risk individuals for the prevention and control of T2DM in rural areas.
A risk score for T2DM has been developed using data from a nationwide study in China 14 . However, because of the rapidly increasing prevalence of T2DM and the different levels of risk factors in the rural population of China, we aimed to establish a rural risk assessment tool (the RuralDiab risk score) for T2DM based on data from the Rural Diabetes, Obesity and Lifestyle (RuralDiab) study. A separate prospective study from Henan Province was used to validate the RuralDiab risk score and to compare its performance with that of previous risk scores.

Results
Population characteristics. The characteristics of the establishment population are shown in Table 1. The crude prevalence of undiagnosed T2DM was 4.29% (234 of 5453 individuals). Age, marital status, family history of diabetes, vegetable and fruit intake, use of anti-hypertensive medication and body mass index (BMI) did not differ by sex. The percentages of high fat intake, current smoking, hypertension and dyslipidemia were higher, and physical activity was lower, in men than in women. Detailed characteristics of the validation population are presented in Supplementary Table 1. A total of 249 incident cases of T2DM were identified in the validation population during the 6-year follow-up.
Establishment of risk score. Table 2 describes the results of the multivariate logistic regression analysis.
The characteristics significantly associated with undiagnosed T2DM in the establishment population included sex, age, family history of diabetes, physical activity, waist circumference, history of dyslipidemia, diastolic blood pressure (DBP) and BMI. The Hosmer-Lemeshow test indicated good fit (χ² = 5.25, P = 0.731): the observed prevalence of undiagnosed T2DM matched the prevalence predicted by the multivariate logistic regression model. On the basis of net reclassification improvement (NRI) analysis, BMI, DBP and history of dyslipidemia were added to a multivariate logistic regression model containing sex, age, family history of diabetes, physical activity and waist circumference. DBP contributed more to the T2DM risk score than systolic blood pressure (SBP), and strong collinearity was found between DBP and SBP in their associations with T2DM. In a sensitivity analysis for predicting T2DM, the AUC of the multivariate logistic regression model was larger with DBP than with SBP. Thus, DBP was incorporated into the T2DM risk score. Adding BMI, DBP and history of dyslipidemia improved the classification of predicted probabilities (NRI = 0.2192).
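The Hosmer-Lemeshow statistic compares observed and expected numbers of cases within groups of predicted risk. The Python sketch below is illustrative only and is not the authors' SAS procedure; it assumes the conventional ten groups of roughly equal size, so the statistic has 10 − 2 = 8 degrees of freedom. Under that assumption, the closed-form chi-square tail probability reproduces the reported P = 0.731 for χ² = 5.25. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def hosmer_lemeshow(y: Sequence[int], p: Sequence[float], g: int = 10) -> Tuple[float, int]:
    """Return (chi-square statistic, degrees of freedom = g - 2).

    y: observed outcome (1 = undiagnosed T2DM, 0 = not); p: predicted probability.
    Subjects are sorted by p and split into g groups of (nearly) equal size.
    """
    if len(y) != len(p):
        raise ValueError("y and p must have the same length")
    pairs: List[Tuple[float, int]] = sorted(zip(p, y))
    n = len(pairs)
    stat = 0.0
    for k in range(g):
        group = pairs[k * n // g:(k + 1) * n // g]
        if not group:
            continue
        n_k = len(group)
        observed = sum(outcome for _, outcome in group)  # O_k
        expected = sum(prob for prob, _ in group)        # E_k = n_k * mean predicted risk
        mean_p = expected / n_k
        denom = n_k * mean_p * (1.0 - mean_p)
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    return stat, g - 2


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability; this closed form is valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    term = total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


print(round(chi2_sf_even_df(5.25, 8), 3))  # 0.731, matching the reported P value
```

For arbitrary (including odd) degrees of freedom, a general chi-square survival function from a statistics library would replace the closed form.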

Table 1 columns: Characteristics; Men (n = 1746); Women (n = 3707); Total (n = 5453); P-value.

Validation of the RuralDiab risk score and comparison with other scores. Table 3 presents the validation of the RuralDiab risk score for predicting T2DM in an external prospective study. The AUCs of the RuralDiab risk score were 0.723 (95% CI: 0.710–0.735) in the total population, 0.711 (95% CI: 0.688–0.732) in men and 0.726 (95% CI: 0.709–0.742) in women. The optimal cutoff value was 17 in the total population. In the total population and in men, the AUC of the RuralDiab risk score was significantly higher than those of the American Diabetes Association (ADA) score (AUC: 0.636 in total, 0.628 in men), the Inter99 score (AUC: 0.669 in total, 0.618 in men) and the Oman risk score (AUC: 0.675 in total, 0.659 in men). In women, a significant difference in AUC was found only between the RuralDiab risk score and the ADA score (AUC: 0.648). Compared with the New Chinese Diabetes Risk Score, the RuralDiab risk score significantly improved reclassification, with NRIs of 6.33% in the total population, 3.86% in men and 9.23% in women. Figure 1 shows the comparison between the RuralDiab risk score and previous risk scores.
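The AUC of a risk score equals the probability that a randomly chosen person who developed T2DM scores higher than a randomly chosen person who did not (ties counted as one half). The Python sketch below computes this quantity, the accuracy measures listed in Table 3 at a given cutoff, and a cutoff chosen by maximizing Youden's index (sensitivity + specificity − 1). Using Youden's index to choose the cutoff is an assumption of this sketch, as are the hypothetical function names; a score at or above the cutoff is treated as high risk.

```python
from typing import Dict, Optional, Sequence, Tuple


def auc_mann_whitney(scores: Sequence[float], y: Sequence[int]) -> float:
    """AUC as P(case score > non-case score), counting ties as 1/2 (simple O(n*m) sketch)."""
    cases = [s for s, outcome in zip(scores, y) if outcome == 1]
    controls = [s for s, outcome in zip(scores, y) if outcome == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for d in controls:
            wins += 1.0 if c > d else 0.5 if c == d else 0.0
    return wins / (len(cases) * len(controls))


def metrics_at_cutoff(scores: Sequence[float], y: Sequence[int], cutoff: float) -> Dict[str, float]:
    """Score >= cutoff counts as high risk; y must contain both cases and non-cases."""
    tp = fp = tn = fn = 0
    for s, outcome in zip(scores, y):
        positive = s >= cutoff
        if outcome == 1:
            tp += int(positive)
            fn += int(not positive)
        else:
            fp += int(positive)
            tn += int(not positive)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    nan, inf = float("nan"), float("inf")
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp) if tp + fp else nan,
        "npv": tn / (tn + fn) if tn + fn else nan,
        "lr_pos": sens / (1 - spec) if spec < 1 else inf,
        "lr_neg": (1 - sens) / spec if spec > 0 else inf,
    }


def youden_cutoff(scores: Sequence[float], y: Sequence[int]) -> Optional[Tuple[float, float]]:
    """Return (cutoff, J) maximizing Youden's J over all observed score values."""
    best: Optional[Tuple[float, float]] = None
    for c in sorted(set(scores)):
        m = metrics_at_cutoff(scores, y, c)
        j = m["sensitivity"] + m["specificity"] - 1
        if best is None or j > best[1]:
            best = (c, j)
    return best
```

For example, with scores [1, 2, 3, 4] and outcomes [0, 0, 1, 1], the AUC is 1.0 and the Youden-optimal cutoff is 3.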

Discussion
The RuralDiab risk score, developed from a large-scale rural population study, is the first risk assessment tool for T2DM based on noninvasive factors in a rural population. The RuralDiab risk score was also validated and evaluated in an external prospective study of T2DM, which showed some advantages over previous risk scores. The Framingham Offspring Study reported that incident T2DM occurred mainly in middle-aged adults 15 . Therefore, the RuralDiab risk score was established among Chinese adults aged 30–59 years living in rural areas. Previous reports have shown that T2DM is a multifactorial metabolic disorder in which environmental factors and lifestyle play important roles [16][17][18][19][20][21] . In our analysis, sex, age, family history of diabetes, physical activity, waist circumference, history of dyslipidemia, DBP and BMI were included in the RuralDiab risk score. Compared with the previously published New Chinese Diabetes Risk Score, the RuralDiab risk score adds physical activity and history of dyslipidemia, and substitutes DBP for SBP, with adjustment for "treated with anti-hypertensive medication".
Given its advantages over previous risk scores, especially in the validity of T2DM risk prediction, the RuralDiab risk score is a reliable and inexpensive health check tool that could be used for diabetes screening in large populations. Although it may inevitably miss some individuals at risk of T2DM 22,23 , it has several potential clinical implications. Firstly, applying the RuralDiab risk score to predict T2DM may spare individuals unnecessary invasive procedures. Secondly, the RuralDiab risk score could help both the general population and health care providers in rural areas quickly identify individuals at high risk of T2DM. Finally, wide application of the RuralDiab risk score could improve public awareness of T2DM and help people recognize the relevant risk factors.
Although the RuralDiab risk score is the first rural assessment tool for T2DM in China based on large-scale, population-based data (the RuralDiab study), there are some limitations. Firstly, undiagnosed T2DM was ascertained by fasting plasma glucose without an oral glucose tolerance test (OGTT) or HbA1c, which might have missed some individuals with T2DM; OGTT or HbA1c will be considered in future studies. Secondly, some important covariates, such as diet and lifestyle, might be subject to reporting bias, although potential covariates were adjusted for as much as possible. Thirdly, the current performance might not be sufficient for risk prediction in practice, and new indicators or biomarkers, especially genetic factors, could improve the performance of the risk score in the future. Finally, only data from one province were used to establish and validate the RuralDiab risk score, which might limit its generalizability and application. The performance of the risk tool therefore needs to be further confirmed in multicenter prospective studies.
In conclusion, the current study developed the RuralDiab risk score, including sex, age, family history of diabetes, physical activity, waist circumference, history of dyslipidemia, DBP and BMI, for predicting T2DM. Compared with previously published risk scores, the RuralDiab risk score was more suitable for the rural population and might help rural health care practitioners assess the risk of T2DM and improve awareness of disease prevention in the rural population. However, its potential clinical application remains to be determined.

Methods
Study design and participants. The establishment population of the RuralDiab risk score was derived from the Rural Diabetes, Obesity and Lifestyle (RuralDiab) study. In brief, participants were selected by stratified random cluster sampling from eligible candidates listed in the residential registration records. Firstly, 3 townships were selected from 22 rural areas of Yuzhou County, considering adherence and local medical conditions. Secondly, all permanent residents who satisfied the inclusion criteria and signed informed consent were enrolled. In total, 11032 participants aged 18 years and older were recruited between July and August 2015 in Yuzhou County, Henan Province, China. Exclusion criteria were (1) previously diagnosed diabetes (n = 818); (2) age younger than 30 or older than 59 years (n = 4725); and (3) incomplete information (n = 36). Finally, data from 5453 participants aged 30–59 years were used to establish the RuralDiab risk score for T2DM in the present study.
An external population from a prospective study was used as the validation population to evaluate the RuralDiab risk score. The baseline survey was conducted from 2007 to 2008, and 10009 participants aged 18 years and above who had lived in their current location for at least 10 years were recruited from Xinan County, Henan Province, China. Participants were followed up from 2013 to 2014. Individuals who were lost to follow-up (n = 1280), who died (n = 580), who were younger than 30 or older than 59 years (n = 1627), who had diagnosed diabetes at baseline (n = 654), or who had incomplete information at baseline or follow-up (n = 1215) were excluded. Ultimately, 4653 participants aged 30 to 59 years were included in the current study.
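As an arithmetic check, subtracting the exclusion counts reported for each cohort from the recruited samples reproduces both analytic sample sizes. A minimal Python sketch (the helper name `remaining` is hypothetical):

```python
def remaining(start: int, exclusions: dict) -> int:
    """Apply exclusion counts in order and print the sample size after each step."""
    n = start
    for reason, count in exclusions.items():
        n -= count
        print(f"  excluded {count:>5} ({reason}); remaining n = {n}")
    return n


print("Establishment population (Yuzhou County, 2015)")
establishment = remaining(11032, {
    "previously diagnosed diabetes": 818,
    "age < 30 or > 59 years": 4725,
    "incomplete information": 36,
})

print("Validation population (Xinan County cohort)")
validation = remaining(10009, {
    "lost to follow-up": 1280,
    "died": 580,
    "age < 30 or > 59 years": 1627,
    "diagnosed diabetes at baseline": 654,
    "incomplete information at baseline or follow-up": 1215,
})

assert establishment == 5453  # analytic sample reported in the text
assert validation == 4653
```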
The two surveys were approved by the Zhengzhou University Medical Ethics Committee, and written informed consent was obtained from all participants. The studies were conducted in accordance with the principles of the Declaration of Helsinki.
Data collection and laboratory measurement. Using standardized methods and stringent quality control, well-trained public health workers and physicians administered a standard questionnaire to each participant in a face-to-face interview to collect information on demographics (age, sex, income, educational level and marital status), family and individual disease history (diabetes, hypertension, coronary heart disease and stroke), and diet and lifestyle (smoking, alcohol drinking, intake of fat, vegetables and fruit, and physical activity). Age was classified into three categories: ≥ 30 and < 40, ≥ 40 and < 50, and ≥ 50 and < 60 years. Educational level was classified into four categories: illiterate, primary school, secondary school, and college and above. Marital status was classified into two categories: married/cohabiting and unmarried/divorced/widowed. Family history was defined as a history of the disease in the participant's parents or siblings.
The food frequency method was used to estimate the daily intake of fat, vegetables and fruit over the past year according to the China Food Composition Table 24 . Based on the Chinese Dietary Guidelines, appropriate consumption of vegetables and fruit was defined as more than 500 g daily, and high fat intake was defined as consuming an average of more than 75 g per day 25 . Physical activity for each participant was classified as low, moderate or high based on the International Physical Activity Questionnaire (IPAQ) 26 . Participants with a moderate or high level of physical activity were defined as physically active. Smoking status was classified as current smoking or not current smoking. According to the World Health Organization definition 27 , participants who currently smoked at least one cigarette per day for 6 consecutive or cumulative months were defined as current smokers.
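The categorical definitions above translate directly into simple coding rules. The Python sketch below is illustrative: the function and argument names are hypothetical, and it assumes intakes are already expressed as average grams per day, physical activity is already an IPAQ level string, and smoking duration is recorded in months.

```python
from typing import Optional


def age_group(age: float) -> Optional[str]:
    """Three age bands used in the analysis; None outside 30-59 years."""
    if 30 <= age < 40:
        return "30-39"
    if 40 <= age < 50:
        return "40-49"
    if 50 <= age < 60:
        return "50-59"
    return None


def adequate_vegetable_fruit(grams_per_day: float) -> bool:
    """Chinese Dietary Guidelines threshold: more than 500 g/day."""
    return grams_per_day > 500


def high_fat_intake(grams_per_day: float) -> bool:
    """High fat intake: an average of more than 75 g/day."""
    return grams_per_day > 75


def physically_active(ipaq_level: str) -> bool:
    """A moderate or high IPAQ level counts as physically active."""
    return ipaq_level.lower() in {"moderate", "high"}


def current_smoker(cigarettes_per_day: float, months_smoked: float) -> bool:
    """WHO-based rule: >= 1 cigarette/day for >= 6 consecutive or cumulative months."""
    return cigarettes_per_day >= 1 and months_smoked >= 6
```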
Blood specimens were collected after overnight fasting in vacuum tubes containing dipotassium ethylenediaminetetraacetic acid (EDTA-K2) and were centrifuged at 3000 rpm for 10 min at 4 °C. The plasma was transported via cold chain and stored at −80 °C for biochemical analyses. Plasma glucose was measured using a modified hexokinase enzymatic method.
Definitions. Undiagnosed T2DM was defined as a fasting plasma glucose level ≥ 7.0 mmol/L without previously diagnosed diabetes, based on the American Diabetes Association (ADA) diagnostic criteria 29 . After excluding type 1 diabetes mellitus, gestational diabetes mellitus and other specific types of diabetes, T2DM was defined as self-reported diagnosed diabetes or undiagnosed T2DM. All participants brought their prescribed medications to the investigation, and a self-reported history of diabetes was confirmed by the use of insulin or oral hypoglycemic agents. In addition, the charts of participants hospitalized for diabetes were reviewed.

Table 3. Performance of the RuralDiab risk score and comparison with previously published risk scores for predicting T2DM in the validation population. AUC = area under the curve; PPV = positive predictive value; NPV = negative predictive value; +LR = positive likelihood ratio; −LR = negative likelihood ratio; DBP = diastolic blood pressure; BMI = body mass index; T2DM = type 2 diabetes mellitus. # P < 0.05 compared with the RuralDiab risk score.

Selection of previous risk scores. We selected the American Diabetes Association score (ADA) 10 developed in Americans, the Inter99 score (Inter99) 11 developed in Europeans, the Thai risk score (Thai) 12 developed in Thais, the Oman risk score (Oman) 13 developed in Arabs and the New Chinese Diabetes Risk Score (CHN) 14 developed in Chinese adults for comparison with the RuralDiab risk score (RuralDiab) in the validation population.
Statistical analysis. Participants' characteristics were compared between men and women: categorical variables were analyzed using the chi-square test and continuous variables using the t-test. Candidate predictors were re-categorized, and logistic regression analysis was used to select factors and derive the risk score. The forward stepwise likelihood ratio method of multivariate logistic regression analysis was used to identify significant risk factors for the RuralDiab risk score. Net reclassification improvement analysis was used to determine whether adding further risk factors improved the classification of predicted probabilities from the multivariate logistic regression model 30 . For this analysis, the predicted probabilities of having diabetes from the model comprising sex, age, family history of diabetes, physical activity and waist circumference were divided at their quintiles into five categories: ≤ 2.1%, > 2.1% and ≤ 2.9%, > 2.9% and ≤ 3.9%, > 3.9% and ≤ 5.8%, and > 5.8%. The risk score was calculated from the regression coefficients (β) of the model. Receiver operating characteristic curves were then plotted for the RuralDiab risk score, with sensitivity on the y-axis and the false-positive rate (1 − specificity) on the x-axis. The area under the curve reflected the discriminating accuracy of different combinations of predictors 31 , and the optimal cutoff point was the point at the upper-left peak of the curve. Sensitivity, specificity, likelihood ratios, predictive values and the AUC were used to compare performance among the risk scores. A two-tailed P-value < 0.05 was considered statistically significant. Statistical analyses were performed using SAS 9.3 (SAS Institute, USA).
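The categorical NRI compares, separately for people with and without diabetes, how often the extended model moves them into a higher or lower risk category than the base model: NRI = [P(up | case) − P(down | case)] + [P(down | non-case) − P(up | non-case)]. The Python sketch below uses the five predicted-probability categories defined above. It is a sketch of the standard categorical NRI, not the authors' SAS code, and the function names are hypothetical.

```python
from bisect import bisect_left
from typing import Sequence

# Inclusive upper bounds of the first four categories (<=2.1%, <=2.9%, <=3.9%, <=5.8%);
# anything above 5.8% falls into the fifth category.
UPPER_BOUNDS = [0.021, 0.029, 0.039, 0.058]


def risk_category(p: float) -> int:
    """Map a predicted probability to category 0-4 (bisect_left keeps bounds inclusive)."""
    return bisect_left(UPPER_BOUNDS, p)


def categorical_nri(p_base: Sequence[float], p_new: Sequence[float], y: Sequence[int]) -> float:
    """Categorical NRI of the new (extended) model relative to the base model."""
    up_case = down_case = up_noncase = down_noncase = 0
    n_case = n_noncase = 0
    for pb, pn, outcome in zip(p_base, p_new, y):
        move = risk_category(pn) - risk_category(pb)
        if outcome == 1:
            n_case += 1
            up_case += int(move > 0)
            down_case += int(move < 0)
        else:
            n_noncase += 1
            up_noncase += int(move > 0)
            down_noncase += int(move < 0)
    if n_case == 0 or n_noncase == 0:
        raise ValueError("need both cases and non-cases")
    return (up_case - down_case) / n_case + (down_noncase - up_noncase) / n_noncase
```

In this toy check, one of two cases moves up and one of two non-cases moves down, so each component contributes 0.5 and the NRI is 1.0.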