  • Review Article

Axes of a revolution: challenges and promises of big data in healthcare

Abstract

Health data are increasingly being generated at a massive scale, at various levels of phenotyping and from different types of resources. Concurrent with recent technological advances in both data-generation infrastructure and data-analysis methodologies, there have been many claims that these developments will revolutionize healthcare, but such claims are still a matter of debate. Addressing the potential and challenges of big data in healthcare requires an understanding of the characteristics of the data. Here we characterize various properties of medical data, which we refer to as ‘axes’ of data, and describe the considerations and tradeoffs made when such data are generated, as well as the types of analyses that may achieve the tasks at hand. We then broadly describe the potential and challenges of using big data resources in healthcare, aiming to contribute to the ongoing discussion of how such resources may advance the understanding of health and disease.

Fig. 1: The different axes of health data.
Fig. 2: Tradeoffs between axes of data.
Fig. 3: Global distribution of several biobanks and cohorts.
Fig. 4: Using human-based omics data in drug development.

Author information

Correspondence to Eran Segal.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Joao Monteiro was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Shilo, S., Rossman, H. & Segal, E. Axes of a revolution: challenges and promises of big data in healthcare. Nat Med 26, 29–38 (2020). https://doi.org/10.1038/s41591-019-0727-5

  • DOI: https://doi.org/10.1038/s41591-019-0727-5
