Text-mining offers clues to success

A Correction to this article was published on 28 May 2014

This article has been updated

US intelligence programme analyses language in patents and papers to identify next big technologies.

The rise of technologies such as satellite tracking could have been foretold by analysing past literature. Credit: Richard Newstead/Getty

A project backed by a US intelligence agency might soon make it much easier to predict which technologies will one day become game-changers. Results revealed this week by the Intelligence Advanced Research Projects Activity (IARPA) suggest that clues in the wordings of, and relationships between, scientific papers and patents could foretell research successes.

The project, called Foresight and Understanding from Scientific Exposition (FUSE), could enable funders to pick winners, and help governments to keep an eye on ‘disruptive technologies’: those that they feel might threaten national security or outpace regulations, for instance. Past examples include nanotechnologies and information technologies, such as the use of the Global Positioning System in mobile phones to allow tracking of individuals’ movements. Last week, FUSE, a four-year project that started in 2011, entered its last phase: to predict which technologies will succeed three to five years from now.

Although abstracts have been text-mined in the past for keywords and other clues in language, FUSE is one of the first projects to mine the full text of papers and patents. So far, it has performed more than 2 million analyses of past data to pick out key advances, says Dewey Murdick, manager of the FUSE project. From these, it has identified several hundred indicators, such as new collaborations or expressions of excitement in text, that highlight emerging areas.

“What we’d like to glean is understanding of the right combination of things that leads to success,” says John Byrnes, a computer scientist at SRI International, an innovation centre in Menlo Park, California, whose team is one of three developing software for FUSE. To make predictions, his program mines text for keywords, citations and phrases that indicate authors’ outlooks in scholarly papers.

One example he cites is the resolution of a technical problem that, once solved, led to what is now a mainstay of solar-panel technology. In the mid-1990s, millions of dollars were invested in research into solar panels that used aqueous solutions to convert photons into energy. Although the technology was promising at first, by 2008 it had been overtaken by the much more stable and effective solid-state solar panels. FUSE might have predicted the aqueous panels’ demise, according to results presented by Byrnes and his team this week at the US Department of Energy’s SunShot Grand Challenge Summit in Anaheim, California.

Scientific literature analysis, or ‘scientometrics’, is decades old. Organizations such as Thomson Reuters, an information firm headquartered in New York, have long used these analyses to identify the most influential papers or researchers in a field. FUSE takes this further by mining millions of papers and patents in both English and Chinese, two of the most commonly used languages in scientific literature, says Murdick.

The analysis and indicators can predict whether a nascent field will become prominent or whether it is simply a source of excitement that will soon die out, says Olga Babko-Malaya, a research engineer at BAE Systems in Winchester, Massachusetts, who heads another FUSE team.

Her team uses software algorithms to analyse ‘sentiment’ in the natural language of papers. For instance, authors might say that their work builds on or contradicts a cited paper, or use descriptive language that expresses excitement.
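As a rough illustration of the idea, the sketch below labels a sentence by matching it against short lists of cue phrases. It is not the BAE Systems team's software, whose methods the article does not describe; the cue lists, the `CUES` table and the `label_sentence` function are hypothetical and chosen only to mirror the three signals mentioned above.

```python
# Hypothetical cue phrases for the three signals named in the article.
# A real system would need far richer linguistic analysis than substring checks.
CUES = {
    "builds_on": ("builds on", "extends", "following"),
    "contradicts": ("contradicts", "in contrast to", "fails to"),
    "excitement": ("promising", "breakthrough", "remarkable"),
}


def label_sentence(sentence):
    """Return the sorted list of cue labels whose phrases occur in the sentence."""
    text = sentence.lower()
    return sorted(
        label
        for label, phrases in CUES.items()
        if any(phrase in text for phrase in phrases)
    )


# Toy input, invented for illustration only.
labels = label_sentence("Our promising method builds on the approach in [12].")
```

Simple substring matching of this kind will mislabel many sentences; it is meant only to show what "sentiment" towards a cited paper could look like as data.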

The researchers also found that promising topics invent their own jargon and start using more acronyms. “Abbreviations imply acceptance by the community and are indicators of more mature technologies,” says Babko-Malaya.
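One simple way to picture such an indicator is to count acronym-like tokens per 1,000 words for each publication year and watch whether the rate climbs. The sketch below assumes that an acronym is any standalone run of two to six capital letters, which is a crude stand-in rather than the FUSE teams' definition; `ACRONYM_RE`, `acronym_rate_by_year` and the sample texts are all hypothetical.

```python
import re
from collections import defaultdict

# Assumption: an "acronym" is a standalone token of 2-6 capital letters.
ACRONYM_RE = re.compile(r"\b[A-Z]{2,6}\b")


def acronym_rate_by_year(docs):
    """docs: iterable of (year, text). Returns {year: acronyms per 1,000 words}."""
    acronyms = defaultdict(int)
    words = defaultdict(int)
    for year, text in docs:
        acronyms[year] += len(ACRONYM_RE.findall(text))
        words[year] += len(text.split())
    return {y: 1000 * acronyms[y] / words[y] for y in sorted(words) if words[y]}


# Toy corpus, invented for illustration only.
sample = [
    (2005, "We study dye sensitized solar cells using liquid electrolytes."),
    (2010, "DSSC devices with solid state HTM layers outperform liquid DSSC cells."),
]
rates = acronym_rate_by_year(sample)
```

A rising rate across years would be one weak signal among many; on its own it cannot distinguish a maturing field from one that is merely fond of abbreviations.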

The changes in group collaborations in a field over time can also be predictive. FUSE researcher Lance Ramshaw at Raytheon BBN Technologies in Cambridge, Massachusetts, and his team are analysing networks between different topics, keywords and authors. He says that a new topic may be emerging when prominent authors start contributing to a group of papers that share common traits, or when alliances between collaborations shift.
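A shift in collaborations can be made concrete by comparing co-author pairs across two time windows. The sketch below measures the share of co-author pairs in a later window that did not appear in an earlier one. This is an illustrative measure, not the one used by Ramshaw's team, and the function names and author labels are hypothetical.

```python
from itertools import combinations


def coauthor_edges(papers):
    """papers: iterable of author lists. Returns the set of unordered author pairs."""
    edges = set()
    for authors in papers:
        # Sort so that each pair has one canonical ordering.
        for a, b in combinations(sorted(set(authors)), 2):
            edges.add((a, b))
    return edges


def new_edge_fraction(earlier, later):
    """Share of co-author pairs in `later` that were not seen in `earlier`."""
    old, new = coauthor_edges(earlier), coauthor_edges(later)
    if not new:
        return 0.0
    return len(new - old) / len(new)


# Toy data: each inner list is the author list of one paper.
earlier = [["A", "B"], ["B", "C"]]
later = [["A", "B"], ["A", "D"], ["C", "D"]]
fraction = new_edge_fraction(earlier, later)
```

A high fraction would suggest that alliances are shifting, although a real analysis would also need to account for field size and for authors who are simply new to the literature.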

Alan Porter, who specializes in technology forecasting at the Georgia Institute of Technology in Atlanta, agrees that retrospective predictions, such as the solar-panel example, are useful for modelling what companies have been doing and for tracking the history of a product. The more difficult task, he says, would be to use such a network to identify the “white spaces”: areas between technology clusters that are ripe for new research.

Ideally, analysis will show patterns or tipping points that are common to success stories. Such patterns might eventually allow the project to forecast when a product might launch, or whether a drug will be approved by regulators, explains Babko-Malaya.

Although software is catching up, human analysts remain the best forecasters, says Murdick. “You can ask experts anything you want,” he says.

In another project, Forecasting Science & Technology, IARPA is funding an online crowdsourcing project called SciCast, run by George Mason University in Fairfax, Virginia, and the American Association for the Advancement of Science. It aims to consult 10,000 scientists to help develop methods for generating accurate forecasts. “My personal bias is that it’s a combination of people and machines that will ultimately provide the most useful value,” says Murdick.

Change history

  • 21 May 2014

    This article originally implied that SRI International is based in Arlington, Virginia. Although it has an office there, it is based in Menlo Park, California. It also originally said that IARPA was a partner in SciCast; in fact it is funding the project. The text has been corrected to reflect this.

  • 28 May 2014

    A Correction to this paper has been published:


Related links

Related links in Nature Research

Who is the best scientist of them all? 2013-Nov-06

Spies to use Twitter as crystal ball 2011-Oct-17

News mining might have predicted Arab Spring 2011-Sep-13

Related external links

IARPA's FUSE programme

AAAS SciCast

Sunshot Grand Challenge Summit

Cite this article

Reardon, S. Text-mining offers clues to success. Nature 509, 410 (2014).
