Technology feature

New tools track article buzz online

A new service allows researchers to access raw social media data.

  • Jeffrey Perkel

Credit: Roy Scott/Getty

9 March 2018


“How’s my paper doing?” It’s such a simple question, and in today’s hyperconnected world it’s relatively easy to work out who’s reading and talking about your scientific publications. But are there conversations you might be overlooking?

A handful of companies, including Plum Analytics and Altmetric (which is partially owned by Digital Science, a company that is fully owned by Holtzbrinck, which also invests in Springer Nature), provide services that collect and crunch the numbers from social media providers to produce an aggregate metric. But those metrics are effectively ‘pre-digested’, says Juan Pablo Alperin of the Public Knowledge Project (PKP) in Vancouver, Canada: the analytics firm has made certain judgement calls to turn the raw numbers into an easily understood value. Those metrics don’t necessarily reflect the priorities or assumptions of every institution, nor do they tally every possible interaction.

What is sometimes needed, Alperin says, is access to the raw data themselves. Now a new service from DOI registry Crossref is providing just that.

Crossref’s Event Data service, according to product manager Madeleine Watson, provides publishers, editors, bibliometricians, research scientists, and third-party service providers with a stream of information detailing tens of millions of raw ‘interactions’ between registered DOIs and online resources — some 65 million to date. Users can tap into the stream from the command line via an application programming interface, and then apply their own algorithms to best understand their work’s impact.
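As a rough illustration of what tapping into the stream might look like, the sketch below builds a query URL for every event that points at a given DOI. The endpoint address and the parameter names (`mailto`, `obj-id`, `rows`, `source`) are assumptions that should be checked against Crossref's current documentation, and the DOI and e-mail address are placeholders. The sketch only constructs the URL and does not contact the service.

```python
from urllib.parse import urlencode

# Assumed endpoint for the Event Data query API; verify before relying on it.
EVENTS_ENDPOINT = "https://api.eventdata.crossref.org/v1/events"


def build_events_url(doi, mailto, rows=100, source=None):
    """Return a query URL for events whose object is the given DOI.

    The parameter names used here are assumptions, not confirmed by the article.
    """
    params = {
        "mailto": mailto,  # contact address identifying the caller
        "obj-id": doi,     # the DOI being mentioned
        "rows": rows,      # number of events per page
    }
    if source is not None:
        params["source"] = source  # restrict to one source, e.g. "twitter"
    return f"{EVENTS_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder DOI and address, for illustration only.
    print(build_events_url("10.5555/12345678", "you@example.org", rows=10))
```

The resulting URL could then be fetched with any HTTP client, from the command line or from a script.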

Unlike traditional altmetrics providers, “We don’t do any sort of aggregation,” Watson explains. Instead, the organization provides “an ongoing stream” of subject-relation-object “triples”, each of which describes an “interaction”: this blog post discusses that paper, and was logged at this particular time.
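One way to picture a triple is as a small record with a subject, a relation and an object, plus the source and the time it was logged. The sketch below is a minimal illustration, not Crossref's own data model: the field names (`subj_id`, `relation_type_id`, `obj_id`, `source_id`, `occurred_at`) are assumed, and the sample records are invented placeholders rather than real events.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    subject: str      # the thing doing the mentioning, e.g. a blog post URL
    relation: str     # how it relates to the object, e.g. "discusses"
    obj: str          # the registered DOI being mentioned
    source: str       # where the interaction was observed
    occurred_at: str  # when it happened, as an ISO 8601 string


def parse_event(record):
    """Turn one raw record (a dict) into an Event. Field names are assumed."""
    return Event(
        subject=record["subj_id"],
        relation=record["relation_type_id"],
        obj=record["obj_id"],
        source=record["source_id"],
        occurred_at=record["occurred_at"],
    )


# Invented placeholder records, for illustration only.
SAMPLE_RECORDS = [
    {"subj_id": "https://example.org/blog/post-1", "relation_type_id": "discusses",
     "obj_id": "https://doi.org/10.5555/12345678", "source_id": "newsfeed",
     "occurred_at": "2017-03-04T10:15:00Z"},
    {"subj_id": "https://example.org/status/1", "relation_type_id": "discusses",
     "obj_id": "https://doi.org/10.5555/12345678", "source_id": "twitter",
     "occurred_at": "2017-03-05T09:00:00Z"},
    {"subj_id": "https://example.org/status/2", "relation_type_id": "discusses",
     "obj_id": "https://doi.org/10.5555/12345678", "source_id": "twitter",
     "occurred_at": "2017-04-01T12:30:00Z"},
]

events = [parse_event(record) for record in SAMPLE_RECORDS]
counts_by_source = Counter(event.source for event in events)

if __name__ == "__main__":
    for source, count in counts_by_source.most_common():
        print(f"{source}: {count}")
```

Because nothing is aggregated upstream, even a simple tally like this one is a choice the user makes rather than one made for them.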

Though still in beta, the service captures interactions from a dozen sources, including Twitter, Wikipedia, Reddit, StackExchange, DataCite, and the Cambia Lens patent database. Many of those interactions are also captured by traditional ‘altmetrics’ services. But the dataset also “has applications far beyond altmetrics alone,” Watson says. Users can, for instance, enumerate links between DOIs and the Hypothesis annotation service.

According to Alperin, who is developing a widget that seeks to aggregate these data for open-access publishers, the Event Data service offers two particularly attractive features. First, “they are creating a fully auditable stream.” That is, the data are captured such that “it’s possible to really go back and see what the evidence for every source of every bit of every metric that can be produced out of it will be.” And second, the data are licensed for maximal reuse, mostly under Creative Commons CC0.

As Watson puts it, the data are fully “transparent”. The service captures not just the interactions, but also information detailing how each was identified and mapped to a particular DOI. And users are free to sift through them however they like, to answer the questions most pertinent to them.

“Anyone, whether they’re a researcher interested in researching blogging activity or blog spam, or somebody interested in altmetrics, or someone using it because they want to do some business analysis, for instance, they have all the context that we had when we went out and we noticed that relationship,” Watson says, “so that they can then use it as effectively as possible to create their own measurement of meaning, whatever that is to them.”
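To make the idea of a self-defined measurement concrete, the sketch below scores the same set of events under two different weighting schemes. Everything here is hypothetical: the weights, the `source_id` field name and the sample events are invented for illustration, and neither scheme is one that Crossref or any altmetrics provider is known to use.

```python
def weighted_score(events, weights, default=0.0):
    """Sum a user-chosen weight for each event, keyed on its source.

    Sources missing from `weights` contribute `default`.
    """
    return sum(weights.get(event["source_id"], default) for event in events)


# Invented placeholder events; only the source matters for this sketch.
SAMPLE_EVENTS = [
    {"source_id": "twitter"},
    {"source_id": "twitter"},
    {"source_id": "wikipedia"},
    {"source_id": "reddit"},
]

# Two hypothetical institutions with different priorities.
REFERENCE_HEAVY = {"twitter": 0.5, "wikipedia": 3.0, "reddit": 1.0}
EVERYTHING_EQUAL = {"twitter": 1.0, "wikipedia": 1.0, "reddit": 1.0}

if __name__ == "__main__":
    print("reference-heavy:", weighted_score(SAMPLE_EVENTS, REFERENCE_HEAVY))
    print("everything equal:", weighted_score(SAMPLE_EVENTS, EVERYTHING_EQUAL))
```

The same raw events yield different numbers depending on what the user decides matters, which is the judgement call that an aggregate metric makes on the user's behalf.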

At least two third-party tools are in development to exploit these data. The first, developed by Vancouver-based nonprofit ImpactStory with funding from the PKP, is Paperbuzz, which provides a free, user-friendly interface to the data stream. Simply enter a DOI (papers published in 2017 work best) in the search bar to get a list of all mentions recorded to date. An associated API allows programmers to import the underlying data for processing in their own pipelines and tools. The second tool, PKP’s in-development widget, will distill Paperbuzz data into simple graphical representations of activity over time.
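A representation of activity over time could be as simple as counting mentions per month. The sketch below is an assumption about the general approach, not the PKP widget's or Paperbuzz's actual implementation. It takes a list of ISO 8601 timestamps (invented here) and prints a plain-text bar for each month.

```python
from collections import Counter
from datetime import datetime


def monthly_activity(timestamps):
    """Count timestamps per calendar month, returned in chronological order."""
    counts = Counter()
    for stamp in timestamps:
        # Older Python versions reject a trailing "Z", so normalise it first.
        moment = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
        counts[moment.strftime("%Y-%m")] += 1
    return dict(sorted(counts.items()))


# Invented placeholder timestamps, for illustration only.
SAMPLE_TIMESTAMPS = [
    "2017-03-04T10:15:00Z",
    "2017-03-20T08:00:00Z",
    "2017-04-01T12:30:00Z",
]

if __name__ == "__main__":
    for month, count in monthly_activity(SAMPLE_TIMESTAMPS).items():
        print(f"{month} {'#' * count} ({count})")
```

A graphical widget would swap the text bars for a chart, but the underlying binning step would look much the same.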

Event Data, says ImpactStory spokesperson Jason Priem, “addresses one of the missing components of the altmetrics ecosystem to date, which is a really open and transparent way of gathering altmetrics, aggregating them, and distributing them.”

Currently in beta, Paperbuzz is expected to launch formally in 2018, coincident with the Event Data service itself, Priem says. For the moment, Paperbuzz details only those events logged by Event Data. But ImpactStory is also working to fold in data from its Unpaywall browser extension, a tool that points users to open-access versions of published research articles, as a proxy for how many people actually read an article over time.

PKP’s widget should make those data even friendlier, Alperin says. And among its likely early adopters, he adds, are the journals published using PKP’s Open Journal Systems software. Over 10,000 journals meet that criterion, Alperin says; some don’t assign DOIs, and thus cannot use Paperbuzz. But those that do will likely avail themselves of the new service, he says. “It should be used by thousands of journals in the end.”

This post was originally published on Naturejobs