Abstract
Diagnostic procedures, therapeutic recommendations, and medical risk stratifications are based on dedicated, strictly controlled clinical trials. However, a plethora of real-world medical data exists, for which the increase in data volume comes at the expense of completeness, uniformity, and control. Here, a case-by-case comparison shows that our real-world-data-based model for diabetes-related chronic kidney disease has greater predictive power than published algorithms derived from clinical study data.
Data availability
Restrictions apply to the general availability of the data because of patient agreements and the nature of patient data. Data were used under license for the study presented in this manuscript. The IBM Explorys database is run by IBM, which makes the data available for secondary use (for example, scientific research) on a commercial basis. The INPC database is owned by the participating health institutions of the INPC. Access to the INPC can be provided for research purposes through the Regenstrief Institute Data Core.
Acknowledgements
The authors thank O. Quarder, C. Ringemann, P. Stephan (Roche Diabetes Care GmbH, Germany), and H. Mikulski (Roche Diabetes Care Spain, S.L.) for their continuing contributions to this work. We are grateful to T. Beck, S. Chittajallu, and S. Weinert (Roche Diabetes Care, Inc., USA) for their consultancy in the early phase of the investigation. The support from U. Günzel as well as H. Rincker and team (Roche Diabetes Care Deutschland, Germany) is highly appreciated. We are indebted to R. Daikeler, K. Kusterer, S. Waibel, and S. Zink (Germany) for their medical advice concerning our initial results. The research described in this manuscript was funded by Roche Diabetes Care GmbH and supplemented with in-kind contributions from Eli Lilly and Company (S.M.), Indiana Biosciences Research Institute (D.R.), and Regenstrief Institute, Inc. (T.S.).
Author information
Contributions
S.R., A.A., A.B., and F.F.F. generated and validated the Roche/IBM algorithm. T.H. and H.K. performed independent validation and further analysis. S.M., D.R., T.S., and teams enabled data withdrawal and assessment. B.S., L.B., and R.H. provided consultation for the overall research project, which was led by W.P.
Ethics declarations
Competing interests
The authors declare the following potential conflicts of interest: T.H., B.S., W.P., S.R., and A.B. are inventors of a patent application related to the work described in this manuscript. T.H., H.K., B.S., R.H., and W.P. are employees of Roche Diabetes Care GmbH. S.R., A.A., A.B., L.B., and F.F.F. are employees of IBM Switzerland Ltd. S.M. is an employee of Eli Lilly and Company. Independent of his employment at Roche, W.P. is affiliated with Heidelberg University and is a member of the Faculty of Physics and Astronomy. T.S. is affiliated with Indiana University School of Medicine.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–3 and Supplementary Tables 1–7
About this article
Cite this article
Ravizza, S., Huschto, T., Adamov, A. et al. Predicting the early risk of chronic kidney disease in patients with diabetes using real-world data. Nat Med 25, 57–59 (2019). https://doi.org/10.1038/s41591-018-0239-8
DOI: https://doi.org/10.1038/s41591-018-0239-8