Under the proposed European Data Act, researchers’ access to big data would be more restricted than that of consumers and businesses. Credit: Jorg Greuel/Getty

The amount of information humans and their machines are generating is growing exponentially. It’s expected that the amount of data created, captured and replicated across the world annually will have increased from 33 zettabytes (or 33 trillion gigabytes) in 2018 to 221 zettabytes by 2026.

There is huge potential for this information to drive innovation and economic growth, but most of it is going to waste, say European Union legislators, because companies keep it closely guarded and it ends up largely unused. The European Data Act, proposed in 2022, seeks to free up some of this data, giving consumers, businesses and public-sector bodies access rights.

Researchers, however, say that the proposed act fails to extend such rights to them, and is a missed opportunity to accelerate innovation in key areas, such as climate change, public health and the countering of misinformation. Some see it as the latest example of publicly funded researchers being left behind in the race to make the most of big data.

“There is a wall between researchers and a lot of data that they could use to carry out important research in the public interest, either because companies prevent them from accessing it, or they charge very high prices for it,” says Julien Chicot, a senior policy officer at the Guild of European Research-Intensive Universities in Brussels, a network of 21 research-led institutions. “The proposed Data Act is disappointing because it enables data sharing between businesses, but largely fails to do so for research purposes,” he says.

In 2020, the European Commission published a data strategy to increase the flow of data across sectors in member states. The aims included creating wealth, giving people greater control over their data and fostering trust for companies. The proposed Data Act is a key part of that strategy. The commission’s proposals apply to devices and machines that gather data related to their performance, use and environment, and that can communicate the data through the Internet or by other means. This includes connected objects that are part of the Internet of Things (IoT), such as smart home appliances, connected vehicles and smart manufacturing systems. Products such as smartphones, cameras and personal computers are outside the scope of the proposals, as are ‘value added’ insights derived through software processing.

Indirect access

The commission says it expects the proposals to create €270 billion (US$296 billion) of additional gross domestic product for EU member states by 2028 by unlocking currently underused data. The act could, for example, allow a farmer to access data generated by their machines and pass it on to third-party companies for analysis. Such analysis might identify efficiencies or allow repairs to be made at rates lower than the premium charged by the manufacturers.

Some observers say that the EU’s focus on creating legislative frameworks around digital activities doesn’t take into account the importance of publicly funded research in driving innovation. “The political push behind the data strategy is specifically aimed at industry data and business-driven innovation,” says Viivi Lähteenoja, chair of MyData Global, a personal-data campaign group in Helsinki. “The researchers’ agenda, and research-driven innovation, hasn’t been high up on the list.”

The proposed Data Act states that public-sector bodies can request information from companies in “exceptional need”, including to respond to or prevent emergencies, such as those related to public health, environmental degradation or natural disasters. The laws would also give these bodies access to commercial data if it was needed to fulfil tasks in the public interest that they are legally obliged to carry out, and for which they could not obtain the necessary data in other ways. Small companies would be exempt from the requirements.

Legal specialists and organizations representing researchers say that the Data Act will be of limited use to academics because their institutions would not fall under its definition of public-sector bodies, so they could not request information from companies directly. “Researchers usually find a research concept and methodology, and then look for data that helps them study their subject,” says Heiko Richter, who studies information and data regulation at the Max Planck Institute for Innovation and Competition in Munich, Germany. “Under the act, they would have to see what they could do with data provided to them by public-sector bodies, and could only use it for the purposes for which it was originally intended. So, I don’t think the act, as it stands, does much to help researchers.”

Chicot agrees. “The proposed Data Act text suggests public bodies, such as health authorities, would have to ask companies for data and could then share the data with researchers,” he says. “Universities and other public research-performing organizations must be able to request access to data directly from data holders.”

“The Data Act introduces new ways for public-sector bodies to access and use data held by private companies when it is necessary for specific public-interest purposes,” said Johannes Bahrke, a spokesperson for the commission. “Public-sector bodies can use the expertise of public-research institutes to analyse this data.”

Missed opportunity?

The commission says that people in member states will benefit from the greater accessibility of data that the strategy allows, through innovations leading to improved health care, better transport systems, greater energy efficiency, new products and cheaper public services. Some researchers say that these aims can be achieved only if the act is amended to allow them to request data from companies directly and in a wider set of circumstances.

“We are facing some major global challenges, such as the energy transition, loss of biodiversity and environmental degradation,” says Morten Dæhlen, director of the Centre for Computational and Data Science at the University of Oslo. “I understand some data must remain confidential for reasons of commercial interests and personal privacy; however, researchers need access to more data from companies than the Data Act allows to speed up the green transition.”

Danijel Skočaj, a computer scientist at the University of Ljubljana in Slovenia, says that access to more data from manufacturing businesses could accelerate his efforts to use deep learning to improve defect detection in production processes. “We really struggle to get good, realistic data sets to evaluate our algorithms,” says Skočaj. “The Data Act seems to be mostly about business-to-business data sharing, but if it focused more on data sharing for research, it could benefit everyone.”

Both the European Parliament and the Council of the European Union, which represents member states, have suggested amendments to the proposed act. Representatives of these bodies and the commission have begun discussions aimed at reaching a common position that could be adopted as legislation next year.

Dæhlen says that the lack of emphasis on the importance of research in the Data Act was mirrored in early proposals of the Artificial Intelligence (AI) Act, which seeks to classify and regulate AI systems by their risk profile. The proposed AI Act, published by the commission in 2021, has been revised during negotiations and is currently the subject of discussions in the European Parliament. “The current form is better, but the early version of the AI Act could be interpreted as saying you can’t do research on certain AI topics,” says Dæhlen. “It was again leaving research behind and showed a lack of understanding of its importance in society.”

Researchers’ concerns about the Data Act follow other examples of publicly funded science being left behind while the private sector capitalizes on big data. One preprint study (ref. 1) found that more AI researchers are moving from universities to technology companies than the other way around. And those who made the jump to the commercial sector had more than three times as many citations per paper as those who stayed behind. Twitter’s announced plans to end researchers’ free access to the service’s application programming interface, which enables the extraction and processing of large amounts of data from the platform, are expected to hit university researchers the hardest, because they lack the funds to pay for access.