Axes of a revolution: challenges and promises of big data in healthcare

Abstract

Health data are increasingly being generated at a massive scale, at various levels of phenotyping and from different types of resources. Concurrent with recent technological advances in both data-generation infrastructure and data-analysis methodologies, there have been many claims that these events will revolutionize healthcare, but such claims are still a matter of debate. Addressing the potential and challenges of big data in healthcare requires an understanding of the characteristics of the data. Here we characterize various properties of medical data, which we refer to as ‘axes’ of data, and describe the considerations and tradeoffs involved when such data are generated, as well as the types of analyses that may achieve the tasks at hand. We then broadly describe the potential and challenges of using big data resources in healthcare, aiming to contribute to the ongoing discussion of how such resources can advance the understanding of health and disease.

Fig. 1: The different axes of health data.
Fig. 2: Tradeoffs between axes of data.
Fig. 3: Global distribution of several biobanks and cohorts.
Fig. 4: Using human-based omics data in drug development.


Author information

Corresponding author

Correspondence to Eran Segal.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Joao Monteiro was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Shilo, S., Rossman, H. & Segal, E. Axes of a revolution: challenges and promises of big data in healthcare. Nat Med 26, 29–38 (2020). https://doi.org/10.1038/s41591-019-0727-5
