Published online 9 March 2009 | Nature | doi:10.1038/458135a


Web usage data outline map of knowledge

Analysis offers fresh perspective on role of humanities and social sciences.

When users click from one page to another while looking through online scientific journals, they generate a chain of connections between things they think belong together. Now a billion such 'clickstream events' have been analysed by researchers to map these connections on a grand scale.

The work provides a fascinating snapshot of the web of interconnections between disciplines, which some data-mining experts believe reveals the degree to which work that is not often cited — including work in the social sciences and humanities — is widely consulted and can form bridges between scientific disciplines.

The creators of the maps argue that web-usage metrics give an alternative and more up-to-date view of science than existing maps and indicators, which are largely based on out-of-date citation data. Other researchers agree that the new maps, published this week (J. Bollen et al. PLoS ONE 4, e4803; 2009), are impressive in approach, but they disagree on their significance.

For the study, Johan Bollen and his colleagues at the Los Alamos National Laboratory in New Mexico negotiated access to anonymized server log data covering 35,000 journals from 2006 to 2007. The data came from the University of Texas, the California State University system, and major science journal gateways including Thomson Reuters' Web of Science and Elsevier's Scopus database.

Although data on usage rather than citations have been used in some past studies, the sheer scale of the new study makes it stand out, says Henk Moed, a bibliometry expert at the Centre for Science and Technology Studies at the University of Leiden in the Netherlands. "The paper represents an important step forward."

The data reveal how often users looking at an article in journal A moved on to an article in journal B, and on to one in journal C, and so on, during a browser session. By aggregating hundreds of millions of such relationships, the researchers could use network-visualization algorithms to create maps based on the 'distances' computed between journals and disciplines.
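The aggregation step described above can be illustrated with a minimal Python sketch. It is not the authors' method: the session format, the function names `count_transitions` and `to_distances`, and the choice of inverse click count as a 'distance' are all assumptions made for illustration.

```python
from collections import Counter


def count_transitions(sessions):
    """Count moves from one journal to the next within each browser session."""
    counts = Counter()
    for session in sessions:
        # Pair each journal with the one visited immediately after it.
        for src, dst in zip(session, session[1:]):
            if src != dst:  # skip clicks that stay within the same journal
                counts[(src, dst)] += 1
    return counts


def to_distances(counts):
    """Merge both directions of each pair and map traffic to a distance.

    Assumption: distance = 1 / total clicks, so heavily linked journals sit
    closer together. The published study may define distance differently.
    """
    totals = Counter()
    for (src, dst), n in counts.items():
        totals[frozenset((src, dst))] += n
    return {pair: 1.0 / n for pair, n in totals.items()}


# Hypothetical sessions: each list is the sequence of journals one user visited.
sessions = [
    ["A", "B", "C"],
    ["A", "B"],
    ["B", "A", "C"],
]
counts = count_transitions(sessions)
distances = to_distances(counts)
```

In this toy data, journals A and B exchange the most traffic and so end up with the smallest distance. A network-layout algorithm could then place journals on a map using such pairwise distances.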

The broad structure of the maps is similar to those created using citation data: a network of clusters in different fields, within which journals have strong connections with one another but fewer links to other clusters. A striking difference in the usage maps is that journals in the humanities and social sciences figure much more prominently than in citation-based maps. Along with some journals in other fields, such as psychology and the environment, they also emerge as gateways between clusters that are otherwise poorly connected, and so act as key bridges between disciplines. The difference partly arises because Bollen's study covers a wider literature than the citation databases, which are biased towards natural sciences journals.

The journal ranking generated from the usage maps includes not just the usual suspects such as Nature, Science and Physical Review B, but also the Journal of Advanced Nursing and Environmental Health Perspectives. That reflects a key difference between citation- and usage-based maps and metrics. The former reflect citations by researchers who publish, but ignore the impact of papers on large swathes of the scientific and medical community who read and apply the literature in medical, commercial or policy practice but who rarely or never publish.

"Citation data may undervalue papers written in practitioner-based fields, such as reviews or syntheses in clinical medical journals that are widely read by practising physicians but not cited proportionally," says Carl Bergstrom of the University of Washington in Seattle. "By including practitioners we capture a much wider sample of the scholarly community," adds Bollen.

Usage maps are also more up to date than citation ones because the inherent delay in publication means it takes at least two years before a paper will start to gather citations in sufficient numbers to be meaningful. "The most exciting aspect is that they give us a different time-slice of the process of scientific discovery," says Bergstrom.


Others are less impressed. Anthony van Raan, director of the Leiden Centre for Science and Technology Studies, argues that this more current view may in fact represent today's "fashions", rather than trends that will endure. Faster online publishing means that papers are being cited faster than before, he argues. He also questions the central position of the social sciences in the maps, and various aspects of the data-analysis techniques used. Other experts say they have similar concerns, but are holding off from passing judgement until they can discuss the methodology with the paper's authors.

But Bergstrom argues that usage and citation data each provide different but useful information on the impact of papers and journals. "Usage data tell us where the net was cast; citation data tell us where the fish were caught," he says. "If you want to understand the human enterprise of fishing, you had better know about both." 

