After an earthquake tore through Haiti in 2010, killing more than 100,000 people, aid agencies spread across the country to work out where the survivors had fled. But Linus Bengtsson, a graduate student studying global health at the Karolinska Institute in Stockholm, thought he could answer the question from afar. Many Haitians would be using their mobile phones, he reasoned, and those calls would pass through phone towers, which could allow researchers to approximate people’s locations. Bengtsson persuaded Digicel, the biggest phone company in Haiti, to share data from millions of call records from before and after the quake. Digicel replaced the names and phone numbers of callers with random numbers to protect their privacy.
Bengtsson’s idea worked. The analysis wasn’t completed or verified quickly enough to help people in Haiti at the time, but in 2012, he and his collaborators reported that the population of Haiti’s capital, Port-au-Prince, dipped by almost one-quarter soon after the quake, and slowly rose over the next 11 months [1]. That result aligned with an intensive, on-the-ground survey conducted by the United Nations.
Humanitarians and researchers were thrilled. Telecommunications companies scrutinize call-detail records to learn about customers’ locations and phone habits and improve their services. Researchers suddenly realized that this sort of information might help them to improve lives. Even basic population statistics are murky in low-income countries where expensive household surveys are infrequent, and where many people don’t have smartphones, credit cards and other technologies that leave behind a digital trail, making remote-tracking methods used in richer countries too patchy to be useful.
Since the earthquake, scientists working under the rubric of ‘data for good’ have analysed calls from tens of millions of phone owners in Pakistan, Bangladesh, Kenya and at least two dozen other low- and middle-income nations. Humanitarian groups say that they’ve used the results to deliver aid. And researchers have combined call records with other information to try to predict how infectious diseases travel, and to pinpoint locations of poverty, social isolation, violence and more (see ‘Phone calls for good’).
At least 20 mobile-phone companies have donated their proprietary information to such efforts, including operators in 100 countries that back an initiative called Big Data for Social Good, sponsored by the GSMA, an international mobile-phone association. Cash to support the studies has poured in from the UN, the World Bank, the US National Institutes of Health and the Bill & Melinda Gates Foundation in Seattle, Washington. Bengtsson co-founded a non-profit organization in Stockholm called Flowminder that crunches massive call data sets with the aim of saving lives.
Yet as data-for-good projects gain traction, some researchers are asking whether they benefit society enough to outweigh their potential for misuse. That question is complicated to answer. Aid agencies are secretive about the details of their projects. The GSMA celebrates some data-for-good analyses as weapons against epidemics and disasters, but rarely points to peer-reviewed research to support the claims. And in the fields of public health, computer and social science, a decade of published call-record studies has yet to notably assist the communities they track.
Meanwhile, concerns are rising over the lack of consent involved; the potential for breaches of privacy, even from anonymized data sets; and the possibility of misuse by commercial or government entities interested in surveillance. Critics can’t point to any specific harm that has come from these projects. But it’s possible to imagine a government rounding up political dissidents who have been identified in a well-intentioned call-record project, or human traffickers using the results to locate desperate people seeking asylum, suggests Nathaniel Raymond, a data-responsibility researcher at Yale University in New Haven, Connecticut. He and others say it’s time to create thorough guidelines for assessing the benefits and risks of data-for-good studies that use call records. “We don’t know enough about the harm we might cause with good intentions,” he says.
When historians look back on this era, they could well call it the age of the mobile phone. In 2017, more than 5 billion people owned them, about two-thirds of the global population. By 2025, that proportion is expected to reach 71%, according to the GSMA. Although not everyone owns a phone, Flowminder and other researchers have shown that call-record analyses can estimate the distribution and movement of populations. Government agencies, including those in the Netherlands, Afghanistan and the Democratic Republic of the Congo (DRC), are now exploring how call records can feed into censuses. This information is sorely lacking in many low-income countries: the DRC’s last complete census was published in 1984; Flowminder is helping it with one now.
Aid organizations also use these data. The UN’s World Food Programme, based in Rome, analysed anonymized call records to find out where people needed food or cash assistance after an earthquake in Nepal in 2015, says Jonathan Rivers, a programme officer at the agency. Flowminder and the UN team estimated how many people fled the capital Kathmandu after the quake, where they went and when they returned. Rivers says the agency conducts such projects around the world, but declined to name other examples. It rarely makes reports public. He says one reason for the secrecy is that phone companies that make their data available fear a backlash from subscribers who do not want their location shared, even anonymously.
In general, researchers glean results from anonymized call-detail records that show roughly where and when text messages and phone calls are made. The results are then aggregated into groups so researchers can learn what proportion of a population travels from one point to another (see ‘Protecting privacy’). Phone companies don’t legally need subscriber consent to share information that is anonymized and aggregated, says Jeanine Vos, head of the GSMA’s Big Data for Social Good initiative. “The data is no longer attached to any individual,” she explains. When subscribers are asked for consent, it tends to be on an opt-out basis in the fine print of contracts they sign when activating a phone’s SIM card.
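The aggregation step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used by Flowminder or any phone company: the record layout (pseudonym, timestamp, region), the function name `flow_proportions` and the `min_group_size` suppression threshold are all hypothetical choices made for the example.

```python
from collections import Counter


def flow_proportions(records, min_group_size=2):
    """Aggregate pseudonymized call records into origin-destination shares.

    records: iterable of (pseudonym, timestamp, region) tuples (assumed layout).
    A user's origin is the region of their earliest record and their
    destination is the region of their latest record.
    Groups smaller than min_group_size are dropped from the output
    (a hypothetical safeguard against exposing very small groups).
    """
    first_last = {}
    for user, _timestamp, region in sorted(records, key=lambda r: r[1]):
        # Keep the first region seen for this user; overwrite the last one.
        origin, _ = first_last.get(user, (region, region))
        first_last[user] = (origin, region)

    counts = Counter(first_last.values())
    total_users = len(first_last)
    return {
        pair: n / total_users
        for pair, n in counts.items()
        if n >= min_group_size
    }


# Made-up sample data: four pseudonymous users, integer timestamps.
sample = [
    ("a1", 1, "capital"), ("a1", 5, "north"),
    ("b2", 2, "capital"), ("b2", 6, "north"),
    ("c3", 3, "capital"), ("c3", 7, "capital"),
    ("d4", 4, "south"),
]
print(flow_proportions(sample))  # {('capital', 'north'): 0.5}
```

Only the share of users moving between regions leaves the function; the pseudonyms themselves do not appear in the output, and the two single-user groups are suppressed by the threshold.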
The Ebola controversy
During the peak of the Ebola outbreak in Sierra Leone, Guinea and Liberia in 2014, epidemiologists at Flowminder, the UN and other institutions pushed for access to anonymized call records, arguing that the information could help to curb the crisis. “The value of these data in the context of a public-health emergency like the ongoing Ebola outbreak is undeniable,” Bengtsson and his colleagues wrote in PLOS Currents: Outbreaks that September.
But some researchers on the ground didn’t agree. “What they were proposing didn’t even work logically,” says Susan Erikson, an anthropologist at Simon Fraser University in Burnaby, Canada. Unlike highly transmissible, airborne infections, the Ebola virus spreads only through direct contact with infected bodily fluids. So quantifying how populations move wouldn’t reveal how the virus spreads, Erikson argued. It was much more urgent to convince individuals with symptoms to come into clinics, where they’d be isolated to prevent further infections. Officials in the countries hit by Ebola didn’t have ethical guidelines on call-record analyses, so spent time deliberating how to regulate them. That time, says Erikson, could have been better spent handling the escalating crisis.
Liberia decided not to allow the studies, citing privacy concerns. But Bengtsson and his colleagues gained access to anonymized call records from Sierra Leone. Those records didn’t help to track Ebola, but confirmed how much less people travelled during the country’s three-day travel ban in late March 2015 [2]. Despite the modest result, magazines ran headlines saying that call records could predict where Ebola strikes. And Bengtsson and his colleagues wrote [3] in 2018: “As recent crises have made abundantly clear, having qualified researchers being barred from accessing and using valuable mobile-phone data is not acceptable.”
Such forceful statements aggravate researchers who say they have witnessed the roll-out of too many technological experiments during crises that don’t help the people who most need help. Sean McDonald, a digital-governance researcher at Duke University in Durham, North Carolina, cautions that crises can be used as an excuse to rapidly analyse call records without frameworks first being used to evaluate their worth or to assess potential harms.
In interviews with Nature, Bengtsson was forthcoming about the limits of call-record analyses, saying they cannot curb Ebola. But he still considers them invaluable, because they could tell officials or aid workers how a population moves, and that might prove useful — although he was not specific about exactly how.
Epidemiologists have explored how call records might help to combat other diseases, including malaria in Africa and Asia, dengue in Pakistan and cholera in Haiti. In 2012, researchers studied records from nearly 15 million mobile-phone subscribers in Kenya [4], and quantified the seasonal migrations of people who travel to work on tea plantations northeast of Lake Victoria, where malaria is a problem. The researchers suggested that officials ramp up malaria surveillance in the towns to which the workers return. But it’s unclear whether the results were needed, or useful. Malaria-control officers haven’t incorporated the analyses into their efforts. Caroline Buckee, an epidemiologist at Harvard University in Cambridge, Massachusetts, who led the investigation, says: “The capacity and the regulatory pieces are not yet there to have it be an automatic part of a response.”
Buckee’s focus has moved to southeast Asia, where drug-resistant malaria has emerged. Her team has partnered with Telenor, a Norwegian telecommunications company with operations across Asia. In a study reported this April, the researchers combined analyses of call records in Bangladesh with information on the movement of malaria parasites, gleaned from genetic analyses of the parasite in blood samples. They found that malaria might be imported into southwest Bangladesh from several places in the country [5]. Although there’s no indication that the results are being put into action, senior scientist Kenth Engø-Monsen at Telenor Research says: “It is just a question of time.” In a press release, he went further, stating: “The study proves that we have a potent weapon at our disposal in the fight against malaria.” The company is also collaborating with researchers to conduct similar studies in Myanmar and Thailand.
But this type of promotion irks malaria researchers who aren’t convinced that the information is helpful, especially given the lack of resources for proven methods to combat the disease — such as health workers, bed nets, insecticides and malaria drugs. “On an intellectual level, this [mobile-phone research] is attractive,” says Myaing Nyunt, a malaria researcher at Duke University who is based in Myanmar. “But the thing in my head is that actual work is becoming harder to sustain in villages.” Global funding for malaria has plateaued in the past few years, she points out — and with it, progress.
The same practical argument could be made against research on parasite genetics. But Nyunt says that call-record analyses trouble her more, because people haven’t consented to take part.
Data for development
In 2012, the mobile-phone company Orange, together with data scientists at the UN and several universities, held a ‘Data for Development’ challenge to encourage researchers to explore positive uses for call-detail records. Phone companies mostly analyse the records to boost their businesses, says Robert Kirkpatrick, director of UN Global Pulse, an initiative to harness big data. “We wanted to show how it could be used for the public good,” he says.
Orange let scientists analyse anonymized call records from customers in Côte d’Ivoire. In one project, researchers found that brief calls surged before small violent events in Côte d’Ivoire, and suggested that future analyses could help officials to predict danger and thus intervene — but that idea hasn’t been taken up.
Other phone operators took over the challenge. In 2017, Türk Telekom and UN groups invited researchers to study how call records might improve the well-being of Syrian refugees in Turkey. Türk Telekom anonymized and aggregated the data, but flagged call records that were likely to belong to refugees, on the basis of forms of identification that subscribers provided when registering a SIM card.
One project, led by a team from the Middle East Technical University in Ankara, found that refugees living in relatively cheap neighbourhoods “appear to be introvert”, rarely travelling outside these areas. The team posted online maps identifying refugee workplaces and homes, and mapped migrations of refugees to hazelnut plantations in Turkey. The team suggested that migrant workers could benefit from more clinics and child care there.
But there’s no indication that the findings triggered actions that helped refugees. And critics argue that open-ended analyses, such as the refugee challenge, play fast and loose with sensitive information for the sake of exploring big data — rather than doing good for the people in the studies. “Is there no way around understanding how isolated refugees are besides using an invasive technique to track people through mobile technology?” asks Alexandrine Pirlot de Corbion, a programme leader at Privacy International in London, a charity that advocates for the right to privacy. Another way to find out whether refugees are isolated would be to ask them questions, which allows them to decide what to share, she adds.
The Turkish computer engineer who helped to organize the refugee challenge, Albert Ali Salah, now at Utrecht University in the Netherlands, defends the project’s worth. Anyone who might want to harm any of the 3.6 million Syrian refugees in Turkey already knows their neighbourhoods, he argues. But call-record intelligence might help policymakers by giving them quantitative information about refugee movements. And an ethics committee vetted the results: when research indicated refugees were working at a location illegally, for example, the committee told the researchers not to publish the finding.
Responding to the charge that such data challenges have not helped people, Kirkpatrick says exploration was a necessary first step. The next phase in call-records research, he says, should be cost–benefit analyses that look at the investment needed to conduct a study, roll out an intervention and appraise the advantages for communities.
Security and consent
In the meantime, exploratory studies continue. But Bengtsson and others are addressing concerns about consent and data security, not least because one negative story — even if the harm is minor — could trigger a backlash that might stop phone companies from opening up their call records to study at no charge. “Now is the time to put in place standards to do this safely, at scale and ethically,” says Emmanuel Letouzé, director of the Data-Pop Alliance, a coalition in New York City that aims to ensure that big data serves the interests of people across the globe.
Some pressure to change has come from within the community. To show his colleagues the frailty of anonymity, Yves-Alexandre de Montjoye, an applied mathematician at Imperial College London, reported in 2013 that with just four data points per person, 95% of 1.5 million callers in an anonymized mobile-phone data set can be identified [6]. To lessen the chance that a person acting in bad faith could get hold of the records and identify individuals, many researchers now try to conduct their analyses on data that remain on phone-company servers. Flowminder and the UN World Food Programme are among those groups. “It takes the risk off us,” Rivers explains.
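The idea behind that re-identification result can be illustrated with a toy calculation: pick a handful of points from one person’s trace and count how many traces in the data set contain all of them. The sketch below is an assumption-laden simplification, not de Montjoye’s published procedure. The function name `unicity`, the representation of a trace as a set of (hour, tower) pairs and the sample data are all hypothetical.

```python
import random


def unicity(traces, points=4, seed=0):
    """Estimate the share of users pinned down by a few known points.

    traces: dict mapping pseudonym -> set of (hour, tower) pairs (assumed layout).
    For each user with at least `points` observations, draw that many
    at random and count how many traces contain all of them.
    A user counts as unique when only their own trace matches.
    """
    rng = random.Random(seed)
    tested = 0
    unique = 0
    for trace in traces.values():
        if len(trace) < points:
            continue  # not enough observations to test this user
        known = set(rng.sample(sorted(trace), points))
        matches = sum(1 for other in traces.values() if known <= other)
        tested += 1
        if matches == 1:
            unique += 1
    return unique / tested if tested else 0.0


# Made-up traces: u1 and u2 are indistinguishable, u3 stands out.
sample_traces = {
    "u1": {(8, "A"), (9, "B"), (18, "A")},
    "u2": {(8, "A"), (9, "B"), (18, "A")},
    "u3": {(8, "C"), (12, "D"), (20, "C")},
}
print(unicity(sample_traces, points=2))  # one of three users is unique
```

In this toy data set only one of the three users is unique. Real traces contain far more distinct times and places, which is why a few points can be enough to single someone out.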
Letouzé, de Montjoye and their colleagues are piloting a system called Open Algorithms (OPAL) in Senegal and Colombia. As well as running analyses on phone-company servers, their model includes a committee that vets and shapes researchers’ questions so that the data analysed are less specific. For instance, if aid workers want to know how many people leave Senegal’s capital city Dakar each week, the committee can decide that records should be aggregated by day, rather than by hour. This reduces the number of extra, unapproved questions that the results can answer. “It’s not a perfect system,” de Montjoye says, “but we are trying to find a way to mitigate risks, while making sure data can be used for good.”
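The effect of aggregating by day rather than by hour can be shown with a small sketch. It is not OPAL’s code: the function name `count_by_bucket`, the event layout (pseudonym, datetime) and the sample departures are hypothetical, chosen only to show how a coarser time bucket yields fewer, larger groups.

```python
from collections import defaultdict
from datetime import datetime


def count_by_bucket(events, granularity="day"):
    """Count distinct pseudonymous users per time bucket.

    events: iterable of (pseudonym, datetime) pairs (assumed layout).
    granularity: "hour" or "day"; coarser buckets reveal less detail.
    """
    formats = {"hour": "%Y-%m-%d %H:00", "day": "%Y-%m-%d"}
    fmt = formats[granularity]
    users = defaultdict(set)
    for pseudonym, when in events:
        users[when.strftime(fmt)].add(pseudonym)
    return {bucket: len(ids) for bucket, ids in sorted(users.items())}


# Made-up departure events for three pseudonymous users.
departures = [
    ("a", datetime(2019, 5, 1, 8, 15)),
    ("b", datetime(2019, 5, 1, 8, 40)),
    ("c", datetime(2019, 5, 1, 17, 5)),
    ("a", datetime(2019, 5, 2, 9, 0)),
]
print(count_by_bucket(departures, "hour"))
print(count_by_bucket(departures, "day"))
```

With hourly buckets, two of the three groups contain a single person; with daily buckets, the first day’s three travellers are reported together, so the output answers the weekly-departure question while saying less about any one person’s schedule.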
Since last year, groups including Flowminder and phone companies that are headquartered in Europe must comply with the European Union’s general data-protection regulation. Although anonymized and aggregated data seem to be exempt, Letouzé thinks that the law signals a trend towards privacy, and suggests that data scientists should consider how they might incorporate consent into their studies. OPAL is planning to send subscribers a text message asking if they want to opt out, which causes Letouzé some concern. “There are studies showing that when you give people an option, you lose about half,” he says. He’d like to change that by convincing people of the worth of their studies, and by giving them assurances about data security.
Advocates for data security and human rights say that, although technical changes are welcome, more careful risk assessments are required, because records don’t need to be hacked to cause harm. “What if I have aggregated data from the Texas border that shows movement of people coming in from Honduras in the middle of the desert in the middle of the night? That’s a signature of a highly vulnerable population,” Raymond says.
Risk varies from country to country. For instance, the Netherlands’ national statistics office is trying to incorporate anonymized call-record analyses into its censuses — but the results are extraordinarily well protected by law, even from the police, says May Offermans, a statistician there who spoke to Nature in a personal capacity. But many countries don’t enforce data privacy laws well, if they have them at all. Others have a history of human-rights abuses.
For these reasons, critics worry that the GSMA’s initiative on Big Data for Social Good includes countries with governments that routinely track people, such as Turkey, Myanmar, China and Russia. In response, the GSMA says that the phone companies in its network don’t share identifiable data, and that it would hand call records over to government agencies only if required to do so by law.
Researchers analysing call records from nations with over-reaching governments often justify their work by saying that the information they access pales in comparison to what authorities see. But critics counter that, by taking this attitude, scientists are legitimizing an invasion of privacy. Raymond says an organization recently asked his team to assess a call-record study planned in an authoritarian country. (He won’t disclose where or when.) His team pointed out that the study could help the military government to learn how to track populations — including groups they had targeted in the past. The organization called off the project.
Raymond’s passion stems from a tragic mistake he made in 2012 while working on a data-for-good project funded by the actor George Clooney, called the Satellite Sentinel Project. Raymond and colleagues posted satellite images of a new road in Sudan that they supposed could be used to transport tanks and weapons. Two days later, a Sudanese rebel group ambushed a construction crew near an intersection in one of the photos, and took 29 people hostage. The timing of the posted images and the attack suggests that Raymond’s actions might have damaged lives. In retrospect, he says, the initiative lacked a thorough assessment of what could go wrong and whether its objectives were proportional to its risks.
What is needed is clearer guidance on how to decide whether a project is valuable enough to justify concerns, Raymond says. The field of call-record analysis — and big data more generally — needs a broad-scope review from a group such as the US National Academies, he argues, to work out how studies should be vetted. Institutional review boards, which ensure the protection of humans enrolled in studies, “are not fit for purpose in the age of call-detail records, AI and big-data processing”, he says. Because such boards have mainly focused on biomedical investigations in the past, their ethical concerns revolve around protecting individuals from direct harm. They rarely consider unintended consequences that could stem from anonymized, aggregated data sets.
Some guidance might be provided by a set of principles established by the UN Human Rights Council in 2013, which state that digital surveillance shouldn’t be permitted under human-rights law unless it is the only way to achieve a legitimate aim. If legitimacy means actually helping people, vanishingly few of the projects over the past decade measure up.
Bengtsson admits to disappointment. “Frankly, I am surprised a lot of the research hasn’t been used to make decisions,” he says. He explains that it takes time for researchers to work out how to conduct and corroborate such studies, and for policymakers to adopt the practice and act on the results.
Even critics of call-record research think that some studies by Flowminder and other academic groups might one day prove beneficial. But they say extra caution is required in exploratory projects, because real people are involved. McDonald worries that labelling call-records research as ‘data for good’ provides a veneer that can lead people to overlook potentially harmful side effects, and could allow companies to label marketing studies as beneficial research. “If you leave a gun on a table, it is partially your responsibility,” he says, “and what we have now is people who open the arms cache.”
Bengtsson says that Flowminder is doing all it can to ensure that its work doesn’t cause harm. “Unintended consequences of information are always a fear,” he says. “But it’s also discrimination to not have everyone be counted.” If the DRC’s government doesn’t know that an area contains many people, for example, it won’t establish extra schools or clinics there.
More call-record analyses are launching this year. If, like their predecessors, these deliver few tangible advantages to people, the ‘data for good’ mantra could wither. Offermans says the pressure is on to deliver. “You can use [call-record analyses] for good and for bad, I have to admit,” he says. “You just have to trust leaders and policymakers to use it for good.”
Nature 569, 614-617 (2019)