Privacy in the age of medical big data


Big data has become the ubiquitous watchword of medical innovation. The rapid development of machine-learning techniques, and of artificial intelligence in particular, has promised to revolutionize medical practice, from the allocation of resources to the diagnosis of complex diseases. But with big data come big risks and challenges, among them significant questions about patient privacy. Here, we outline the legal and ethical challenges big data brings to patient privacy. We discuss, among other topics, how best to conceive of health privacy; the importance of equity, consent, and patient governance in data collection; discrimination in data uses; and how to handle data breaches. We close by sketching possible ways forward for the regulatory system.

Fig. 1: Consent models for health data.
Fig. 2: The data that is and isn’t included in HIPAA.
Fig. 3: Potential harms to the individual if data is breached.



The authors extend thanks to N. Terry and K. Spector-Bagdady.

Author information

Corresponding author

Correspondence to I. Glenn Cohen.

Ethics declarations

Competing interests

W.N.P. and I.G.C.’s research reported in this publication was done with the support of CeBIL (Collaborative Research Program for Biomedical Innovation Law). CeBIL is a scientifically independent collaborative research program supported by a Novo Nordisk Foundation Grant (grant number NNF17SA0027784). W.N.P.’s work was also supported by the National Cancer Institute (Grant number 1-R01-CA-214829-01-A1; The Lifecycle of Health Data: Policies and Practices). I.G.C. has served as a consultant for Otsuka Pharmaceuticals on their Abilify MyCite product.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Price, W.N., Cohen, I.G. Privacy in the age of medical big data. Nat Med 25, 37–43 (2019).
