Tensions grow as data-mining discussions fall apart

Scientists want to exempt computer-based text crawling from Europe’s copyright law.

Disagreement between scientists and publishers has grown on a thorny issue: how to make it easier for computer programs to extract facts and data from online research papers. On 22 May, researchers, librarians and others pulled out of European Commission talks on how to encourage the techniques, known as text mining and data mining. The withdrawal has effectively ended the contentious discussions, although a formal abandonment can be decided only after a commission review in July.

Scientists have chafed for years at limitations on computer-aided research. They would like to use computer programs to crawl over thousands or millions of articles and other online research content, extracting data to build up databases or to pick out patterns such as associations between genes and diseases.

But in many parts of the world, including Europe, this sort of use currently requires permission from the content’s copyright owner. Even if an institution has paid to access a journal, its academics do not necessarily have permission to mine the text. Publishers, worried that their content might be redistributed for free, tend to block data-mining programs, giving extra licence permissions only on a slow, case-by-case basis (see Nature 483, 134–135; 2012). And although authors can now choose to publish under licences that explicitly allow text mining, that innovation doesn’t help text-miners wanting to run programs on decades of pre-existing content.

Rather than struggle through a thicket of different permissions set by publishers, some researchers want Europe to exempt text mining from copyright law — allowing them to run programs on content that they have paid for, and on free content, without fear of copyright breach. Last year, the UK government said that it plans to introduce exemptions for non-commercial purposes. Lenient ‘fair use’ rights in the United States may already allow text mining, depending on how the law is interpreted.

“There is an intense debate on this within the scientific and research community, with a large number of scientists pointing at the limits of the current copyright regulatory regime,” says Ryan Heath, a spokesman for European Commission vice-president Neelie Kroes. “This is a very serious issue, impacting on scientific excellence and innovation in Europe.”

To tackle the issue, last December the commission set up a working group — one of a number under a framework called Licences for Europe — to open discussions about new policies among publishers, researchers, librarians and other interested parties, such as technology companies. In late February, researchers complained in a letter to the commission that the group was constrained to discuss only text-mining licences, and not changes to copyright law (see Nature 495, 295; 2013) — a restriction that would “make computer-based research in many instances impossible”.

“Every researcher I’ve spoken to thinks licensing is a problem,” says Susan Reilly, projects manager at the Association of European Research Libraries in The Hague, the Netherlands. She coordinated the letter that declared the 22 May withdrawal from talks. “There was really no point in us continuing to attend,” she says. Other signatories include the non-profit Open Knowledge Foundation in Cambridge, UK, and the National Centre for Text Mining at the University of Manchester, UK.

“Continuing the group under current circumstances doesn’t make sense,” says Heath. “This is regrettable, but at least the process brought to the fore the major controversies in this area.” The European Commission, he adds, “will reflect on the implications and will address the matter at the time of the review of the Licences for Europe process in July”.

The European talks had always been conflicted because four different European Union administrative departments were involved — not only the department for research and innovation, but also those for education and culture, for media and information issues, and for Europe’s internal market, economy and intellectual-property rights. (The May letter argues that the research department is being squeezed out in favour of the others’ interests.)

“Since the Licences for Europe process has not managed to deliver in this area, other ways forward must be explored,” says Heath. An analysis under way by the commission’s internal-market department on the need for copyright reform may provide impetus for action, should it conclude that changes are needed.

Many publishers say that there are practical, as well as legal, barriers to text mining. Even if the practice were permitted through licences or changes to copyright law, researchers would still need a way to access websites without crippling publisher servers through excess traffic. And publishers want to be able to identify the purpose of the programs crawling their content, especially if mining is for commercial ends, so as to decide “what they’re willing to allow at what cost”, says Sarah Faulder, chief executive of the Publishers Licensing Society in London, an industry body that took part in the talks.

To lower some of these practical barriers, the non-profit publisher collaboration CrossRef hopes to launch technology this year enabling text-mining researchers to agree to terms by clicking a button on a publisher’s website.

Discussions may have faltered, but scientists and librarians hope to keep talking to officials, says Reilly. “There’s lots of disagreement even among publishers,” she says. “Some are open to text and data mining, some are completely frightened of it. They need an informed discussion.”


Related links

Related links in Nature Research

Text-mining spat heats up 2013-Mar-20

Gold in the text? 2012-Mar-07

Trouble at the text mine 2012-Mar-07

Related external links

Dialogue on text and data mining from Licences for Europe

Letter of withdrawal from researchers and publishers (PDF)

About this article

Cite this article

Van Noorden, R. Tensions grow as data-mining discussions fall apart. Nature 498, 14–15 (2013).
