The European Commission has announced long-awaited plans to make it easier for researchers to harvest facts and data from research papers — by freeing the computer-aided activity from the shackles of copyright law.

Software can analyse millions of online articles and data sets far faster than any human reader, an activity known as text and data mining (TDM). Scientists hope that such mining could reveal patterns in scientific knowledge and generate new hypotheses.

But the field has been hampered by uncertainties about the legality of sifting through science publishers’ content to crunch the data. In the European Union, this sort of activity requires the permission of a paper’s copyright holder. To crawl across paywalled content, would-be miners have had to go through the laborious process of asking various publishers for approval. And publishers have sometimes refused to allow TDM (apparently out of fear that paywalled content might be freely redistributed), or have only permitted it with restrictions, controlled licenses or fees. A 2014 report for the European Commission suggested that Europe’s researchers were doing less computer crawling than those in the United States and Asia.

As part of copyright-reform proposals announced on 14 September, the Commission suggests exempting TDM from copyright, but only for research organizations “acting in the public interest”, such as universities and research centres, and only for content that they already have legal access to read. The exemption would cover both commercial and non-commercial research done by these organizations. But it would not apply to commercial firms, which would still need to negotiate rights with publishers and other content providers.

“We must remove barriers that prevent scientists from digging deeper into the existing knowledge base. This proposed copyright exception will give researchers the freedom to pursue their work without fear of legal repercussions,” said Carlos Moedas, head of research at the European Commission, in a press statement.

Uncertainty lifted

The proposals still need to be approved by the European Parliament and by the council that represents the European Union member states. If adopted, they would lift many of the uncertainties over academics’ right to text-mine. Even if a university library signs a contract with a publisher that runs contrary to the exemption, that contract would be “unenforceable”, the proposals say.

The Association of European Research Libraries in The Hague, the Netherlands, one of the leading campaigners for the exemption, calls the proposals a “hugely important step” towards resolving legal confusion. However, Susan Reilly, the organization’s executive director, says it is disappointing that start-up firms won’t be able to take advantage of the exception.

According to the proposed directive, publishers would have the right to take “reasonable measures” to ensure the security and integrity of their content and of the databases where it is stored. This suggests that publishers and research organizations may still need to agree on how text miners access and process copyrighted content, even if academics no longer have to ask for legal permission to mine it.

Reilly acknowledges the issue: “No one wants to bring down servers.” But she says that publishers’ electronic platforms are robust enough to handle the extra load caused by content mining, and that libraries are ready to help work out what reasonable access measures might be.