To the Editor — We read with interest the recent Editorial ‘Shared, but not up for grabs’1. In particular, the editor’s reflections about consent and privacy in connection to the use of publicly shared images of humans in the computer science community led us to think about the role of an institutional review board (IRB): how would an IRB examine a waiver of consent in such situations and review whether research “will not adversely affect the rights and welfare of the subjects”2?

An IRB is the stronghold that curbs unethical research practices. Historically, IRB review has been routine practice in the biomedical, behavioural and social sciences, with few exceptions. Most research entities in these fields now mandate IRB training for personnel upon hire, even for those not immediately working on human subjects research. Investigators often complain about the leviathan bureaucracy accompanying the process of protocol set-up and approval by an IRB3. On the other hand, many IRBs face overwhelming challenges in streamlining the review process while maintaining rigour and quality4. These challenges are even more tortuous when research bridges different fields and makes use of big ‘organic’ data acquired from various online sources, even if the official questions remain the same2: ‘Is it research? Does the research involve human subjects?’ Of course, data science practices may not be framed as research, or datasets may not be considered to involve human subjects, and such work can thereby bypass ethics regulations entirely5.

As the editor pointed out, a Creative Commons licence on the data does not guarantee that IRB review is unnecessary. Beyond the scraping of online photos, this issue is also relevant for social media data mining. A systematic review of Twitter use for health research found that only 32% of the articles mentioned ethical approval and only 12% mentioned participant consent6.

The definition of minimal risk to subjects reads2: “the probability and magnitude of harm or discomfort anticipated in the proposed research are not greater in and of themselves than those ordinarily encountered in daily lives of the general population or during the performance of routine physical or psychological examinations or tests.” Although still a subject of debate7, such a definition is workable in clinical settings but hardly translatable to big data research. Would scraping online photos or social media data pose minimal risk to the subjects? Which research designs and objectives can be pursued to minimize risks? We are long overdue for modernization of the procedural burden, of the evaluation and handling of new problematic grey areas in human subject studies, and of the regulation of research integrity with big data8,9.

We should honestly consider how much we can currently rely on IRB training for our big data research endeavours. The Belmont Report10, one of the foundational texts of research ethics, still inspires us after 40 years. The report’s core philosophical principles of ‘respect for persons, beneficence and justice’ should guide us and ensure that IRB review does not become a merely administrative exercise.