Misuse of information from Facebook users will trigger much-needed updates to data ethics. Credit: Josh Edelson/AFP/Getty

Revelations keep emerging in the Cambridge Analytica personal-data scandal, which has captured global public attention for more than a week. But when the dust settles, researchers harvesting data online will face greater scrutiny. And so they should.

At the centre of the controversy is Aleksandr Kogan, a psychologist and neuroscientist at the University of Cambridge, UK. In 2014, he recruited people to complete a number of surveys and sign up to an app that handed over Facebook information on themselves — and tens of millions of Facebook friends. Kogan passed the data to SCL, a UK firm that later founded controversial political-consultancy firm Cambridge Analytica in London. (All those involved deny any wrongdoing.)

Last week, Facebook announced restrictions on data harvesting by third parties, including drastically reducing the kinds of information that app developers can access. (It had already changed its rules in 2014 to stop developers gleaning data from users’ friends through their apps.) But damage has been done: the public has good reason to be angry about the way in which researchers and companies have seemingly used personal data without consumers’ full understanding or consent.

Where do academic researchers fit in? Handled correctly, online data can be a major boon to research, and the world would benefit from companies such as Facebook making their data more open. But ethical safeguards for research that intervenes in human lives were largely set up for medical and psychological studies, and are often written with definitions that exclude Internet research. In the United States, for example, unless data collected are both private and identifiable, informed consent is usually not deemed necessary, and research requires minimal, if any, oversight by an institutional review board. That exemption would cover data from Twitter, which are public by default. Models built on anonymized Facebook data would also tend to be exempt.

Kogan’s study was unusual, both in that it was done by a university academic for a private company he operated, and in that the data were passed to a third party. Yet there is a common theme behind this controversy and ones that preceded it — such as a study warning that someone’s sexual orientation could be determined from their online presence (Y. Wang and M. Kosinski J. Personality Soc. Psychol. 114, 246–257; 2018). Data were used in ways well beyond what users expected or intended. Bundled together and trawled by algorithms, innocuous data points can reveal information that users might reasonably expect to stay private and that might be used in ways they are not happy with.

Guidance does exist. A number of projects are grappling with the ethical challenges of big data. US and European funders have supported efforts in this area, and have issued recommendations such as rethinking what counts as ‘public’ data and the need to consider a study’s potential harm to society, as well as to individuals. (The University of Cambridge is among the institutions writing guidelines for Internet-mediated research, after the UK Research Integrity Office issued non-binding recommendations on the topic in 2016.) Funders should further support such efforts, and make them better known to researchers.

Sticking points remain, a major one being that consent is often not practical when retrospectively accessing data from millions of individuals. But as outlined for biomedical scientists in the 1978 Belmont Report, the principle of beneficence applies: researchers should put the good of research participants first and, with that in mind, perform their own assessment of risks versus benefits. Studies should not be done just because the data are there. In studies that are too large to ask participants for consent, researchers should poll the views of samples of subjects and of any population that could be affected by the outcomes. Ethics training on research should be extended to computer scientists who have not conventionally worked with human study participants.

Academics across many fields know well how technology can outpace its regulation. All researchers have a duty to consider the ethics of their work beyond the strict limits of law or today’s regulations. If they don’t, they will face serious and continued loss of public trust.