The creators of a scientific search engine have unveiled software that automatically generates one-sentence summaries of research papers, which they say could help scientists to skim-read papers faster.
The free tool, which creates what the team calls TLDRs (the common Internet acronym for ‘Too long, didn’t read’), was activated this week for search results at Semantic Scholar, a search engine created by the non-profit Allen Institute for Artificial Intelligence (AI2) in Seattle, Washington. For the moment, the software generates sentences only for the ten million computer-science papers covered by Semantic Scholar, but papers from other disciplines should be getting summaries in the next month or so, once the software has been fine-tuned, says Dan Weld, who manages the Semantic Scholar group at AI2.
Preliminary testing suggests that the tool helps readers to sort through search results faster than viewing titles and abstracts, especially on mobile phones, he says. “People seem to really like it.”
A preprint describing the tool was first published on the arXiv preprint server in April [1], and was accepted for publication after peer review by a natural-language-processing conference taking place this month. The researchers have made their code freely available, along with a working demo website where anyone can try the tool.
“I predict that this kind of tool will become a standard feature of scholarly search in the near future. Actually, given the need, I am amazed it has taken this long to see it in practice,” says Jevin West, an information scientist at the University of Washington in Seattle who tested the tool at Nature’s request. “It is not perfect, but it’s definitely a step in the right direction,” he says.
Weld was inspired to create the TLDR software in part by the snappy sentences his colleagues share on Twitter to flag up articles. Like other language-generation software, the tool uses deep neural networks trained on vast amounts of text. The team's training data included tens of thousands of research papers matched to their titles, so that the network could learn to generate concise sentences. The researchers then fine-tuned the software to summarize content by training it on a new data set of a few thousand computer-science papers with matching summaries, some written by the papers’ authors and some by a class of undergraduate students. The team has gathered training examples to improve the software’s performance in 16 other fields, with biomedicine likely to come first.
The TLDR software is not the only scientific summarizing tool: since 2018, the website Paper Digest has offered summaries of papers, but it seems to extract key sentences from text, rather than generate new ones, Weld notes. TLDR can generate a sentence from a paper’s abstract, introduction and conclusion. Its summaries tend to be built from key phrases in the article’s text, so are aimed squarely at experts who already understand a paper’s jargon. But Weld says the team is working on generating summaries for non-expert audiences.
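To make the distinction concrete, the sketch below shows what extracting a key sentence can look like in its simplest form: it scores each sentence by how frequent its words are in the whole text and returns the top-scoring sentence unchanged. This is an illustration only, under stated assumptions. It is not Paper Digest's method, which is not described here, and it is not how TLDR works, since TLDR generates new sentences with a neural network. The function names, the stopword list and the scoring rule are all hypothetical choices made for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "are",
             "that", "it", "for", "on", "with", "as", "we", "this", "by"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def extract_key_sentence(text):
    """Return the existing sentence whose words are most frequent in the text.

    Extractive: the output is always copied verbatim from the input,
    never newly written.
    """
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        if not tokens:
            return 0.0
        # Average word frequency, so long sentences are not favoured by length alone.
        return sum(freq[w] for w in tokens) / len(tokens)

    return max(sentences, key=score)


sample = ("Neural summarization models read papers. "
          "Summarization models compress papers into one sentence. "
          "Lunch was late.")
print(extract_key_sentence(sample))
```

Whatever the input, the result is one of the original sentences, which is the defining limit of an extractive approach: it cannot rephrase or combine ideas from different parts of a paper the way a generative tool can.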
The researchers also plan to license the technology to publishers, and to expand their service to provide personalized research briefings that summarize key papers in a field. “We are just getting to the point where AI methods can generate novel summaries at a level that is acceptable to people,” Weld says.
1. Cachola, I., Lo, K., Cohan, A. & Weld, D. S. Preprint at https://arxiv.org/abs/2004.15011 (2020).