Development and validation of a new diabetes index for the risk classification of present and new-onset diabetes: multicohort study

Moon, Shinje; Jang, Ji-Yong; Kim, Yumin; Oh, Chang-Myung

doi:10.1038/s41598-021-95341-8

Article
Open access
Published: 03 August 2021

Scientific Reports volume 11, Article number: 15748 (2021)

Abstract

In this study, we aimed to propose a novel diabetes index for risk classification, based on machine learning techniques, with high accuracy for diabetes mellitus. We used the demographic and biochemical data of the 2013–16 Korea National Health and Nutrition Examination Survey (KNHANES), the 2017–18 KNHANES, and the Korean Genome and Epidemiology Study (KoGES) as the derivation, internal validation, and external validation sets, respectively. We constructed a new diabetes index using logistic regression (LR) and calculated the probability of diabetes in the validation sets. We used the area under the receiver operating characteristic curve (AUROC) and Cox regression analysis to measure performance in the internal and external validation sets, respectively. The resulting gender-specific diabetes prediction model had an AUROC of 0.941 and 0.939 for men and women, respectively. Based on the predicted probability, we classified participants into five groups and analyzed the cumulative incidence of diabetes in the KoGES dataset. Group 5 demonstrated significantly worse outcomes than the other groups. Our novel model for predicting diabetes, based on two large-scale population-based cohort studies, showed high sensitivity and specificity. Therefore, our diabetes index can be used to classify individuals at high risk of diabetes.

Introduction

Diabetes mellitus is a chronic metabolic disorder characterized by disrupted glucose homeostasis, resulting from increased insulin resistance and/or impaired insulin secretion. People with diabetes mellitus are predisposed to related disorders, such as cardiovascular disease (CVD), which affects 32.2% of all people with diabetes mellitus globally. Moreover, the complications of diabetes are leading causes of morbidity and mortality^1,2. The prevalence and socioeconomic burden of diabetes are rapidly increasing worldwide. Approximately 1 in 11 adults has diabetes, and 90% of people with diabetes have type 2 diabetes mellitus².

Previous large-scale studies suggest that diet and lifestyle modifications can prevent or delay the development of diabetes mellitus in high-risk individuals^2,3. The Diabetes Prevention Program conducted in the United States reported that lifestyle modification reduced the incidence of diabetes mellitus by 58% compared with control after a 2.8-year mean follow-up⁴. Saito et al. also demonstrated that lifestyle modification reduced the overall relative risk of diabetes mellitus by 44.1% in Japan⁵. Clinical studies conducted in China⁶ and India⁷ have reported 42% and 38% risk reductions, respectively.

Therefore, developing risk prediction models for diabetes mellitus and identifying high-risk individuals have become important challenges in clinical research. Machine learning techniques have been widely used to explore risk factors and formulate predictive models for diabetes development⁸. These methods help researchers discover previously unknown but significant factors and solve scientific problems using large datasets^9,10. In the fields of medical science and healthcare, machine learning provides useful classification and prediction models with high accuracy¹¹. Recently, Lai et al. proposed a risk prediction model with an area under the receiver operating characteristic curve (AUROC) of 84.7%, derived from 13,309 Canadian patients¹². Furthermore, Maniruzzaman et al. built a classification model that yielded 94.25% accuracy for the prediction of diabetes mellitus from an American diabetes dataset¹³.

In this study, we aimed to propose a novel diabetes index based on machine learning techniques for diabetes mellitus with high accuracy from two large community-based cohort studies. We formulated a risk classification model using logistic regression to measure the probability of diabetes presence, based on non-diabetic participants’ demographic information and laboratory data from the Korea National Health and Nutrition Examination Survey (KNHANES). Thereafter, we externally validated this model by predicting new-onset diabetes mellitus in a large prospective cohort study known as the Korean Genome and Epidemiology Study (KoGES).

Results

Baseline characteristics from the KNHANES

Table 1 shows the general characteristics of the KNHANES participants in the derivation and internal validation datasets, according to gender and diabetes status. Subjects with diabetes were older than those without in both datasets. In the derivation dataset, diabetes prevalence was 4.9% in men and 3.8% in women. The prevalence of obesity (body mass index, BMI ≥ 25 kg/m²) was 38% in men (38% in normal and 38% in diabetes) and 28.1% in women (27.3% in normal and 47.3% in diabetes). In the internal validation dataset, diabetes prevalence was 4.6% in men and 3.9% in women. The prevalence of obesity (BMI ≥ 25 kg/m²) was 40.8% in men (40.6% in normal and 44.5% in diabetes) and 27.6% in women (26.8% in normal and 46% in diabetes). Subjects with diabetes in both datasets exhibited lower socioeconomic status and education, higher fasting glucose levels, and higher rates of glycosuria, hypertension, and dyslipidemia than subjects without diabetes.

Table 1 General characteristics of the training set (2013–16 KNHANES) and the testing set (2017–18 KNHANES) according to gender and diabetes.

Feature selection and classification model by logistic regression

Based on literature review, we identified about 40 candidate risk factors (Supplementary Table 1), of which 20 variables were present in both KNHANES and KoGES. Table 2 displays the selection process by means of a univariate LR in men and women separately. All 20 features from Model 1 were selected as candidate variables for univariate analysis in Model 2. By means of multivariate analysis (Models 2 and 3), we identified 16 and 18 variables as diabetes risk factors to be used as the input features for formulating the classification model in men and women, respectively. Thereafter, based on these variables, we generated a gender-specific diabetes classification model using LR. Note that the feature selection and the formulation of the prediction model were conducted using only the derivation dataset.
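To make the scoring step concrete, the following Python sketch shows how a fitted LR model converts one participant's feature values into a probability of diabetes. The function name `diabetes_probability`, the variable names, and all coefficient values are hypothetical placeholders chosen for illustration; they are not the fitted coefficients reported in Table 2.

```python
import math


def diabetes_probability(features, coefficients, intercept):
    """Return the logistic-regression probability for one participant.

    `features` and `coefficients` are dicts keyed by variable name.
    """
    # Linear predictor: intercept plus the sum of coefficient * feature value.
    linear = intercept + sum(
        coefficients[name] * features[name] for name in coefficients
    )
    # The logistic (sigmoid) function maps the linear predictor into (0, 1).
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical coefficients for illustration only (not the values in Table 2).
example_coefficients = {"age": 0.03, "waist_circumference": 0.04, "glycosuria": 0.3}
example_intercept = -8.0
participant = {"age": 55, "waist_circumference": 90, "glycosuria": 0}

probability = diabetes_probability(participant, example_coefficients, example_intercept)
```

In a gender-specific setting such as the one described here, one set of coefficients would be kept for men and another for women, and the appropriate set would be passed in for each participant.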

Table 2 Backward stepwise logistic regression of men and women in training set.

We used this gender-specific diabetes classification model to calculate the probabilities of diabetes in subjects from the internal validation dataset. The AUROC was 0.941 and 0.939 in men and women, respectively (Fig. 1). The area under the precision-recall (PR) curve was 0.475 and 0.381 in men and women, respectively (Fig. 1). Moreover, we evaluated model performance via calibration, that is, the agreement between observed and predicted probabilities, using the val.prob function in the rms package. According to the Spiegelhalter Z-test and its two-tailed p-values, the classification model for women was well calibrated, whereas the model for men was not (S:p for men: 0.008; S:p for women: 0.588, Supplementary Fig. 2).
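As an illustration of the calibration check, the sketch below computes the Spiegelhalter Z statistic and its two-tailed p-value from observed outcomes and predicted probabilities. It is a plain-Python rendering of the standard formula, not the rms implementation used in the study, and the function name and example data are hypothetical.

```python
import math


def spiegelhalter_z(outcomes, probabilities):
    """Return (z, two-tailed p-value) for the Spiegelhalter calibration test.

    `outcomes` holds 0/1 labels; `probabilities` holds predicted risks in (0, 1).
    """
    numerator = sum(
        (y - p) * (1.0 - 2.0 * p) for y, p in zip(outcomes, probabilities)
    )
    variance = sum(
        (1.0 - 2.0 * p) ** 2 * p * (1.0 - p) for p in probabilities
    )
    # The statistic is undefined when the variance term is zero
    # (for example, when every predicted probability equals 0.5).
    if variance == 0.0:
        raise ValueError("variance term is zero; statistic is undefined")
    z = numerator / math.sqrt(variance)
    # Two-tailed p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical toy data for illustration only.
z_stat, p_val = spiegelhalter_z([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
```

A small p-value indicates that predicted and observed risks disagree more than chance alone would explain, which is how the reported value of 0.008 for men is read as evidence of miscalibration.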

External validation of the classification model

Table 3 shows the baseline characteristics of the KoGES dataset. Using our gender-specific classification model constructed from the derivation dataset, we calculated the probability of the presence of diabetes for each subject in the external validation dataset. The subjects were then categorized into five groups in ascending order of probability. Figure 2 shows the cumulative incidence of new-onset diabetes. Most groups differed significantly from the other groups. For both men and women, group 5 yielded significantly worse outcomes than the other groups.
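The grouping step can be sketched as follows. The text states only that subjects were placed into five groups in ascending order of probability; this sketch assumes equal-sized groups (quintiles by rank), which is an assumption rather than a detail confirmed by the study. The function name and example values are hypothetical.

```python
def assign_risk_groups(probabilities, n_groups=5):
    """Assign each subject to a group 1..n_groups by ascending probability rank.

    Assumes equal-sized groups; group 1 has the lowest predicted
    probabilities and group n_groups the highest.
    """
    n = len(probabilities)
    # Indices of subjects sorted from lowest to highest probability.
    order = sorted(range(n), key=lambda i: probabilities[i])
    groups = [0] * n
    for rank, index in enumerate(order):
        # Integer arithmetic maps ranks 0..n-1 evenly onto groups 1..n_groups.
        groups[index] = rank * n_groups // n + 1
    return groups


# Hypothetical predicted probabilities for ten subjects.
example_probabilities = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6, 0.05]
example_groups = assign_risk_groups(example_probabilities)
```

Grouping by rank rather than by fixed probability cut-offs keeps the groups the same size, at the cost of making the boundaries depend on the cohort being scored.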

Table 3 Baseline characteristics of the external validation set.

Discussion

Our novel model for the risk classification of diabetes mellitus (DM), based on two large-scale population-based cohort studies, showed high sensitivity and specificity. Our model yielded AUROCs of 0.941 and 0.939 in men and women, respectively. The Finnish Diabetes Risk Score (FINDRISC) model is a well-known, recommended tool for diabetes mellitus prediction¹⁴. The AUROC of the FINDRISC model was 0.77 and 0.74 in the Norwegian¹⁵ and Spanish¹⁶ populations, respectively. The Framingham Diabetes Risk Scoring Model (FDRSM) by Wilson et al.¹⁷ yielded an AUROC of 0.85 and 0.78 in middle-aged American and Canadian populations, respectively¹⁸. In the Asian population, Zou et al. predicted new-onset diabetes using machine learning techniques in a Chinese cohort; their model yielded an AUROC of 0.8084¹⁹. The diabetes risk score model from the KoGES by Kim et al.²⁰ yielded AUROCs of 0.71 and 0.76 in men and women, respectively. Note that our model predicts the presence of DM rather than new-onset DM, which may partly explain why its AUROC is higher than those of previous models that predict new-onset DM. We used literature review and statistical methods to select more than 15 predictors, which may be appropriate for modeling DM given its complex pathophysiology.

With the help of machine learning techniques, we can handle large numbers of participant features that may have positive or negative correlations with the prevalence of diabetes mellitus. To obtain input features for our model, we used data from the KNHANES, a large-scale cross-sectional study that includes approximately 10,000 participants. As a result, we were able to use 16 and 18 variables in men and women, respectively, during the analysis (Table 2).

Among these variables, glycosuria showed the highest odds ratio (OR) in men (OR 1.35; 95% CI 1.32–1.39). In general, glycosuria has been used as a biomarker for renal complications in diabetes^8,21, not as a predictor of diabetes. Although glycosuria is a result of hyperglycemia, it also occurs with normal blood glucose levels due to renal injury. Moreover, hyperglycemic patients can also excrete a normal range of glucose in their urine^22,23. This implies that machine learning techniques may help identify risk factors that, despite being considered negligible, have a significant impact on predicting diabetes. High triglyceride (TG) levels showed the highest OR in women (OR 1.49; 95% CI 1.45–1.54). High TG levels are known to be a result of metabolic dysfunction in patients with diabetes²⁴ and a risk factor for diabetes development^25,26. Recently, a rural Chinese cohort study by Ren et al. reported that hypertriglyceridemia is a risk factor for diabetes²⁷. They also suggested that reducing triglycerides can decrease the risk of developing diabetes²⁷. This implies that a high TG level is a modifiable risk factor for diabetes and should be managed in people predisposed to diabetes.

Alcohol consumption was related to a decreased risk of diabetes in both men and women (KNHANES dataset). This finding is consistent with previous studies of alcohol consumption, in which heavy consumption showed deleterious effects and moderate consumption showed protective effects on diabetes²⁸. BMI and waist circumference (WC) both showed positive relationships with diabetes in univariate analysis. However, multivariate analysis revealed that BMI had a negative relationship, whereas WC had a positive relationship with diabetes. In light of this, WC, a well-known parameter for central obesity, may be a better parameter for obesity-related risk assessment than BMI, a general obesity indicator. Wang et al. reported similar results regarding risk prediction for diabetes. According to their analysis, abdominal adiposity was superior to overall obesity as a predictor of new-onset diabetes²⁹. Katzmarzyk et al. also reported that WC showed a higher mortality risk than BMI (WC: HR 1.40 [95% CI 1.14–1.72] and BMI: HR 1.29 [1.04–1.61]) in adults with diabetes³⁰.

Risk group classification is one of the most critical uses of machine learning techniques in medical research³¹. Using logistic regression, the combined effect of the selected risk factors on the disease of interest can be calculated as a probability. Based on the probability obtained from LR, the participants were classified into five groups. Subsequently, we assessed the risk of each group by analyzing the cumulative incidence of diabetes using Cox regression analysis. As expected, and as per our prediction model, participants at high risk showed a high incidence of diabetes (Fig. 2).
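To illustrate what a cumulative incidence curve for one risk group represents, the sketch below estimates it as one minus the Kaplan-Meier survival estimate. This is a simpler technique than the Cox regression used in the study and is shown only to make the quantity concrete; the function name and the follow-up data are hypothetical.

```python
def cumulative_incidence(times, events):
    """Return [(time, cumulative incidence)] at each distinct event time.

    `times` holds follow-up durations; `events` holds 1 for new-onset
    diabetes and 0 for censoring. Uses 1 - Kaplan-Meier survival.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        current_time = data[i][0]
        n_events = 0
        n_leaving = 0
        # Gather every subject whose follow-up ends at this time.
        while i < len(data) and data[i][0] == current_time:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            # Multiply by the conditional probability of staying event-free.
            survival *= 1.0 - n_events / at_risk
            curve.append((current_time, 1.0 - survival))
        # Both events and censored subjects leave the risk set.
        at_risk -= n_leaving
    return curve


# Hypothetical follow-up (years) and event indicators for one risk group.
example_curve = cumulative_incidence([2, 4, 4, 6, 8], [1, 0, 1, 0, 1])
```

Computing one such curve per risk group and plotting them together gives a figure of the same general kind as Fig. 2, although formal comparison between groups would still require a model such as Cox regression.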

Our study had several limitations. First, we could not distinguish type 1 diabetes mellitus from type 2 diabetes mellitus because there were no biomarkers or clinical information for classifying new-onset diabetes in the KoGES. The risk factors for each type of diabetes are different. Therefore, distinguishing the type of diabetes may be preferable when formulating a prediction model with high accuracy. However, new-onset type 1 diabetes mellitus in a patient over 30 years of age is rare³². Hence, this prediction model may be used to classify groups with a high risk for type 2 diabetes mellitus. Second, we could not use menopausal status as a predictive factor in women. The effects of various post-menopausal hormones in women must be considered³³. Previous cohort studies reported controversial results regarding the role of menopausal status in diabetes development^34,35. Kim et al. reported that there was no association between natural menopause and the risk for diabetes mellitus³⁴. However, early menopause showed a significant association with type 2 diabetes mellitus³⁶. Unfortunately, KoGES data at baseline did not include the menopausal status of participants. Therefore, we could not use this factor. Third, we used two large cohorts composed of Koreans. Our diabetes index is therefore likely to generalize well to Koreans, but its generalizability to other populations is uncertain. Nevertheless, we used a nationally representative survey to establish the DM classification model, and we validated the model using the KoGES, which is also a nationwide longitudinal study. Because the target population consisted of healthy subjects from the community, our model might have better generalizability than other models based on hospital-based participants.

In conclusion, we developed a diabetes mellitus risk classification model and validated it using Korean datasets. Although the variables used in this model cannot be counted directly, they can be easily collected in real clinical practice. Hence, this new diabetes index can be used to classify individuals at a high risk for diabetes mellitus, who should prevent the disease by managing their risks through lifestyle modification.

Materials and methods

Study population

This study used demographic data and biochemical profiles from the 2013–18 KNHANES. The KNHANES is a national surveillance system assessing the health and nutritional status of the Korean population. It is conducted annually by the Korea Centers for Disease Control and Prevention (KCDC). Details of this nationwide survey have been described elsewhere³⁷. Subjects aged 40 years and older were included. Subjects with incomplete data regarding demographics and laboratory information were excluded. Furthermore, we excluded subjects with a fasting blood glucose level ≥ 126 mg/dL regardless of a diagnosis of diabetes mellitus. When constructing prediction models, subjects with hyperglycemia may cause bias as this may involve predicting the development of an anticipated pre-existing condition. We designated the 2013–16 KNHANES data as the derivation set and the 2017–18 KNHANES data as the internal validation set. The target population of the KNHANES consists of nationally representative non-institutionalized civilians³⁸.

The KoGES is an ongoing, prospective, large cohort study conducted by the Korean government. Since 2001, it has involved a biennial examination of Korean adults covering lifestyle surveys, biochemical profiles, and incidences of common chronic diseases. Details of the KoGES have been described elsewhere³⁹. We used the Ansan–Ansung cohort study, a KoGES study with 10 years of follow-up data, as the external validation set. Subjects who were already diagnosed with diabetes mellitus or exhibited diabetic profiles in lab tests (a fasting glucose level ≥ 126 mg/dL, a 2-h post glucose level ≥ 200 mg/dL in a 75 g oral glucose tolerance test [OGTT], or a glycosylated hemoglobin A1c [HbA1c] level ≥ 6.5%) were excluded at baseline. Finally, 14,977, 9611, and 7140 subjects were used in the derivation, internal validation, and external validation sets for analysis, respectively. The major steps of the inclusion/exclusion process of this study are described in Supplementary Fig. 1.

Definition of diabetes

Diabetes was defined according to the American Diabetes Association (ADA) guidelines⁴⁰ as follows: a fasting blood glucose level ≥ 126 mg/dL, a 2-h post glucose level ≥ 200 mg/dL during OGTT, or an HbA1c ≥ 6.5%. Participants who were previously diagnosed as having diabetes or who exhibited diabetic features in their blood samples were categorized as the diabetes group in the KNHANES. Because the KoGES is a longitudinal observational study, we included only non-diabetic participants in the initial cohort data and detected new-onset diabetes in accordance with the ADA criteria during the observation period.
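The definition above can be expressed as a small rule. The following Python sketch applies the three thresholds stated in the text, together with a prior diagnosis flag; the function and parameter names are hypothetical, and measurements that were not taken are passed as `None`.

```python
def meets_diabetes_definition(
    fasting_glucose_mg_dl=None,
    ogtt_2h_glucose_mg_dl=None,
    hba1c_percent=None,
    previously_diagnosed=False,
):
    """Return True if any criterion of the definition in the text is met.

    Thresholds follow the text: fasting glucose >= 126 mg/dL,
    2-h OGTT glucose >= 200 mg/dL, or HbA1c >= 6.5%.
    """
    if previously_diagnosed:
        return True
    checks = [
        (fasting_glucose_mg_dl, 126.0),
        (ogtt_2h_glucose_mg_dl, 200.0),
        (hba1c_percent, 6.5),
    ]
    # A missing measurement (None) cannot satisfy a criterion.
    return any(
        value is not None and value >= threshold for value, threshold in checks
    )


# Hypothetical participant with a borderline fasting glucose and elevated HbA1c.
example_result = meets_diabetes_definition(
    fasting_glucose_mg_dl=118, hba1c_percent=6.7
)
```

Because the criteria are combined with a logical OR, a participant needs to exceed only one threshold to be classified as having diabetes.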

Variable selection and statistical analysis

To determine predictive risk factors for deriving the risk prediction model, candidate variables were selected based on literature review. Two endocrinologists performed the literature review and selected 40 risk factors (Supplementary Table 1). Subsequently, we determined predictive risk factors using the backward stepwise logistic regression (LR) method⁴¹ after applying weight values to all subjects in the KNHANES. Weight values were used both for determining the significant risk factors and for deriving the prediction model. These values were determined during data construction and denote the number of people in the population that each subject in the study cohort represents.
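A short sketch may help show what such weight values do. Each weight is read as the number of people in the population that a surveyed subject stands for, so a weighted estimate counts each subject that many times. The example below computes a weighted prevalence rather than a weighted regression, purely for brevity; the function name and all numbers are hypothetical.

```python
def weighted_prevalence(has_diabetes, weights):
    """Return the population prevalence estimated with sampling weights.

    `has_diabetes` holds 0/1 flags; `weights` holds the number of people
    in the population that each surveyed subject represents.
    """
    total_weight = sum(weights)
    case_weight = sum(w for flag, w in zip(has_diabetes, weights) if flag)
    return case_weight / total_weight


# Hypothetical survey of four subjects with unequal weights.
example_flags = [1, 0, 0, 1]
example_weights = [100, 300, 500, 100]

unweighted = sum(example_flags) / len(example_flags)
weighted = weighted_prevalence(example_flags, example_weights)
```

In this toy example the unweighted prevalence is 0.5 but the weighted prevalence is 0.2, because the two subjects with diabetes represent fewer people than the two without. The same idea carries over to a weighted regression, where each subject's contribution to the fit is scaled by its weight.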

Normal distribution of candidate variables was verified using the Kolmogorov–Smirnov test. Differences in variables were analyzed based on diabetes status by means of Student's t-test and the Chi-square test for continuous and categorical variables, respectively. Associations between candidate variables were analyzed separately for men and women. The LR model was used to determine the risk factors for the presence of diabetes mellitus, and to formulate the diabetes mellitus prediction model. The AUROC and the Cox regression model were used to measure the performance of the prediction model for the internal validation set and for the external validation set, respectively. Statistical analysis was performed using R (ver. 3.6.1). A P-value < 0.05 was considered statistically significant.
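For readers unfamiliar with the AUROC, the sketch below computes it directly from its probabilistic meaning: the chance that a randomly chosen subject with diabetes receives a higher predicted probability than a randomly chosen subject without. The study itself used R; this plain-Python version, with a hypothetical function name and toy data, is only meant to show the calculation.

```python
def auroc(outcomes, scores):
    """Return the area under the ROC curve via pairwise comparison.

    `outcomes` holds 0/1 labels and `scores` the predicted probabilities.
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(outcomes, scores) if y == 1]
    negatives = [s for y, s in zip(outcomes, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    # Compare every positive score with every negative score.
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data for illustration only.
example_auroc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise loop is quadratic in the number of subjects, so for cohorts of the size used here a rank-based implementation from a statistics library would be the practical choice.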

Ethical considerations

The Institutional Review Board of Gwangju Institute of Science and Technology (South Korea) approved the study protocol (IRB No. 20200414-EX-01-02). All research procedures were performed in accordance with the relevant guidelines and regulations. All participants volunteered and provided written informed consent prior to enrolment, and their records were anonymized before being accessed by the authors.

References

1. Einarson, T. R., Acs, A., Ludwig, C. & Panton, U. H. Prevalence of cardiovascular disease in type 2 diabetes: A systematic literature review of scientific evidence from across the world in 2007–2017. Cardiovasc. Diabetol. 17, 83 (2018).
2. Zheng, Y., Ley, S. H. & Hu, F. B. Global aetiology and epidemiology of type 2 diabetes mellitus and its complications. Nat. Rev. Endocrinol. 14, 88 (2018).
3. Tuomilehto, J., Schwarz, P. & Lindström, J. Long-term benefits from lifestyle interventions for type 2 diabetes prevention: Time to expand the efforts. Diabetes Care 34, S210–S214 (2011).
4. Knowler, W. C. et al. Reduction in the incidence of type 2 diabetes with lifestyle intervention or metformin. N. Engl. J. Med. 346, 393–403 (2002).
5. Saito, T. et al. Lifestyle modification and prevention of type 2 diabetes in overweight Japanese with impaired fasting glucose levels: A randomized controlled trial. Arch. Intern. Med. 171, 1352–1360 (2011).
6. Pan, X.-R. et al. Effects of diet and exercise in preventing NIDDM in people with impaired glucose tolerance: The Da Qing IGT and Diabetes Study. Diabetes Care 20, 537–544 (1997).
7. Ramachandran, A. et al. The Indian Diabetes Prevention Programme shows that lifestyle modification and metformin prevent type 2 diabetes in Asian Indian subjects with impaired glucose tolerance (IDPP-1). Diabetologia 49, 289–297 (2006).
8. Carpentier, C. et al. Glycosuria amount in response to hyperglycaemia and risk for diabetic kidney disease and related events in Type 1 diabetic patients. Nephrol. Dial. Transplant. 34, 1731–1738 (2019).
9. Dhall, D., Kaur, R. & Juneja, M. Machine learning: a review of the algorithms and its applications. In: Proceedings of ICRIC 2019, 47–63 (Springer, 2020).
10. Wu, H., Yang, S., Huang, Z., He, J. & Wang, X. Type 2 diabetes mellitus prediction model based on data mining. Inform. Med. Unlocked 10, 100–107 (2018).
11. Birjais, R., Mourya, A. K., Chauhan, R. & Kaur, H. Prediction and diagnosis of future diabetes risk: A machine learning approach. SN Appl. Sci. 1, 1112 (2019).
12. Lai, H., Huang, H., Keshavjee, K., Guergachi, A. & Gao, X. Predictive models for diabetes mellitus using machine learning techniques. BMC Endocr. Disord. 19, 1–9 (2019).
13. Maniruzzaman, M., Rahman, M. J., Ahammed, B. & Abedin, M. M. Classification and prediction of diabetes disease using machine learning paradigm. Health Inf. Sci. Syst. 8, 7 (2020).
14. Lim, H. M., Chia, Y. C. & Koay, Z. L. Performance of the Finnish Diabetes Risk Score (FINDRISC) and Modified Asian FINDRISC (ModAsian FINDRISC) for screening of undiagnosed type 2 diabetes mellitus and dysglycaemia in primary care. Prim. Care Diabetes 14, 494–500 (2020).
15. Jølle, A. et al. Validity of the FINDRISC as a prediction tool for diabetes in a contemporary Norwegian population: A 10-year follow-up of the HUNT study. BMJ Open Diabetes Res. Care 7, e000769 (2019).
16. Salinero-Fort, M. et al. Performance of the Finnish Diabetes Risk Score and a simplified Finnish Diabetes Risk Score in a community-based, cross-sectional programme for screening of undiagnosed type 2 diabetes mellitus and dysglycaemia in Madrid, Spain: The SPREDIA-2 study. PLoS ONE 11, e0158489 (2016).
17. Wilson, P. W. et al. Prediction of incident diabetes mellitus in middle-aged adults: The Framingham Offspring Study. Arch. Intern. Med. 167, 1068–1074 (2007).
18. Mashayekhi, M. et al. Evaluating the performance of the Framingham Diabetes Risk Scoring Model in Canadian electronic medical records. Can. J. Diabetes 39, 152–156 (2015).
19. Zou, Q. et al. Predicting diabetes mellitus with machine learning techniques. Front. Genet. 9, 515 (2018).
20. Ha, K. H. et al. Development and validation of the Korean diabetes risk score: A 10-year national cohort study. Diabetes Metab. J. 42, 402–414 (2018).
21. Fishman, B. et al. Renal glucosuria is associated with lower body weight and lower rates of elevated systolic blood pressure: Results of a nationwide cross-sectional study of 2.5 million adolescents. Cardiovasc. Diabetol. 18, 124 (2019).
22. Ferrannini, E. Learning from glycosuria. Diabetes 60, 695–696 (2011).
23. Hung, C.-C. et al. Glycosuria and renal outcomes in patients with nondiabetic advanced chronic kidney disease. Sci. Rep. 6, 39372 (2016).
24. Mooradian, A. D. Dyslipidemia in type 2 diabetes mellitus. Nat. Rev. Endocrinol. 5, 150–159 (2009).
25. Tirosh, A. et al. Changes in triglyceride levels over time and risk of type 2 diabetes in young men. Diabetes Care 31, 2032–2037 (2008).
26. Kwon, Y. H. et al. The association between persistent hypertriglyceridemia and the risk of diabetes development: The Kangbuk Samsung Health Study. Endocrinol. Metab. 33, 55–61 (2018).
27. Ren, Y. et al. Hypertriglyceridemia-waist and risk of developing type 2 diabetes: The Rural Chinese Cohort Study. Sci. Rep. 7, 1–8 (2017).
28. Baliunas, D. O. et al. Alcohol as a risk factor for type 2 diabetes: A systematic review and meta-analysis. Diabetes Care 32, 2123–2132 (2009).
29. Wang, Y., Rimm, E. B., Stampfer, M. J., Willett, W. C. & Hu, F. B. Comparison of abdominal adiposity and overall obesity in predicting risk of type 2 diabetes among men. Am. J. Clin. Nutr. 81, 555–563 (2005).
30. Katzmarzyk, P. T., Hu, G., Cefalu, W. T., Mire, E. & Bouchard, C. The importance of waist circumference and BMI for mortality risk in diabetic adults. Diabetes Care 36, 3128–3130 (2013).
31. Alghamdi, M. et al. Predicting diabetes mellitus using SMOTE and ensemble machine learning approach: The Henry Ford ExercIse Testing (FIT) project. PLoS One 12, e0179805 (2017).
32. Thomas, N. J. et al. Frequency and phenotype of type 1 diabetes in the first six decades of life: A cross-sectional, genetically stratified survival analysis from UK Biobank. Lancet Diabetes Endocrinol. 6, 122–129 (2018).
33. Dalal, P. K. & Agarwal, M. Postmenopausal syndrome. Indian J. Psychiatry 57, S222 (2015).
34. Kim, C. et al. Menopause and risk of diabetes in the Diabetes Prevention Program. Menopause (New York, NY) 18, 857 (2011).
35. Ren, Y. et al. Association of menopause and type 2 diabetes mellitus. Menopause 26, 325–330 (2019).
36. Brand, J. S. et al. Age at menopause, reproductive life span, and type 2 diabetes risk: Results from the EPIC-InterAct study. Diabetes Care 36, 1012–1019 (2013).
37. Kweon, S. et al. Data resource profile: The Korea national health and nutrition examination survey (KNHANES). Int. J. Epidemiol. 43, 69–77 (2014).
38. Kwon, S. et al. Sleep duration and mortality in Korean adults: A population-based prospective cohort study. BMC Public Health 20, 1–10 (2020).
39. Kim, Y., Han, B.-G. & Group, K. Cohort profile: The Korean genome and epidemiology study (KoGES) consortium. Int. J. Epidemiol. 46, e20–e20 (2017).
40. American Diabetes Association. 2. Classification and diagnosis of diabetes: standards of medical care in diabetes—2018. Diabetes Care 41, S13–S27 (2018).
41. Zellner, D., Keller, F. & Zellner, G. E. Variable selection in logistic regression models. Commun. Stat. Simul. Comput. 33, 787–805 (2004).

Funding

This research was supported by the Basic Science Research Program, through the National Research Foundation of Korea (2020R1C1C1004999 to C.M.O).

Author information

These authors contributed equally: Shinje Moon and Ji-Yong Jang.

Authors and Affiliations

Department of Endocrinology and Metabolism, Hallym University College of Medicine, Chuncheon, Republic of Korea
Shinje Moon
Division of Cardiology, National Health Insurance Service Ilsan Hospital, Goyang, Republic of Korea
Ji-Yong Jang
Department of Biomedical Science and Engineering, Gwangju Institute of Science and Technology, Gwangju, Republic of Korea
Yumin Kim & Chang-Myung Oh

Contributions

S.M. and C.-M.O. developed the theory and designed the study. J.-Y.J. and Y.K. performed the analysis. S.M. and C.-M.O. drafted the manuscript and designed the tables and figures. All authors critically reviewed the manuscript and provided critical feedback. C.-M.O. supervised the project.

Corresponding author

Correspondence to Chang-Myung Oh.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Moon, S., Jang, JY., Kim, Y. et al. Development and validation of a new diabetes index for the risk classification of present and new-onset diabetes: multicohort study. Sci Rep 11, 15748 (2021). https://doi.org/10.1038/s41598-021-95341-8

Received: 28 March 2021
Accepted: 26 July 2021
Published: 03 August 2021
DOI: https://doi.org/10.1038/s41598-021-95341-8
