Review Article | Published:

Privacy in the age of medical big data


Big data has become the ubiquitous watchword of medical innovation. The rapid development of machine-learning techniques and artificial intelligence in particular has promised to revolutionize medical practice from the allocation of resources to the diagnosis of complex diseases. But with big data come big risks and challenges, among them significant questions about patient privacy. Here, we outline the legal and ethical challenges big data brings to patient privacy. We discuss, among other topics, how best to conceive of health privacy; the importance of equity, consent, and patient governance in data collection; discrimination in data uses; and how to handle data breaches. We close by sketching possible ways forward for the regulatory system.


Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.




The authors extend thanks to N. Terry and K. Spector-Bagdady.

Author information

Competing interests

W.N.P. and I.G.C.’s research reported in this publication was done with the support of CeBIL (Collaborative Research Program for Biomedical Innovation Law). CeBIL is a scientifically independent collaborative research program supported by a Novo Nordisk Foundation Grant (grant number NNF17SA0027784). W.N.P.’s work was also supported by the National Cancer Institute (grant number 1-R01-CA-214829-01-A1; The Lifecycle of Health Data: Policies and Practices). I.G.C. has served as a consultant for Otsuka Pharmaceuticals on their Abilify MyCite product.

Correspondence to I. Glenn Cohen.

Fig. 1: Consent models for health data.
Fig. 2: The data that is and isn’t included in HIPAA.
Fig. 3: Potential harms to the individual if data is breached.