A preliminary analysis shows that citation counts correlate well with paper downloads soon after publication.
Citation analysis has yielded many a controversial conclusion. One striking example is a study which suggested, on the basis of the propagation of citation errors, that about 80% of all references are transcribed from other reference lists rather than from the original source article¹. Such reports lead to the suspicion that most authors do not read the papers they cite, and that the papers that are the most cited are not necessarily the papers that are the most read. If true, this makes citation counting far less significant and calls into question the accuracy of referencing in the literature.
Moreover, citing papers on the basis of their abstracts alone ('abstract citation') or of other reference lists, in lieu of reading the full article, is particularly problematic for subject areas such as neuroscience, where many manuscripts are multidisciplinary. Blind citation could easily lead to increased error rates within any literature discussion, with inaccuracies reverberating significantly across disciplines. Therefore, we decided to test this hypothesis to determine whether our highly cited papers were also our most widely read.
Elevated readership may cautiously suggest impact and interest. But can we actually measure article readership? The answer is a resounding no, but we can get a coarse approximation. These days, most scientists access the literature through journal websites, leaving behind an electronic trail of page views and paper downloads. It is plausible that immediate downloads (online access within the first few months of publication) may estimate unbiased readership, not likely to be influenced or guided by the perceived authority that comes with high citation counts.
We computed the cross-correlations of citations to individual articles and reviews in Nature Neuroscience (February–December, 2005) with download statistics from our website. Downloads represented the total PDF page views for any particular manuscript within the first 90 days of being posted online (including Advance Online Publication (AOP) time). There is a strongly significant correlation (R = 0.648) between immediate downloads and future citations (Fig. 1). PDF downloads were a better predictor than HTML downloads, and the correlation progressively became larger even out to 180 days after AOP (R = 0.724). The correlation dropped substantially when the download timeframe was extended to one year, but, as total downloads decline precipitously with time while citation numbers increase with time, this was to be expected.
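As a rough illustration of the comparison described above, the following minimal sketch computes a correlation coefficient between early PDF downloads and later citation counts. It assumes that R is a Pearson correlation coefficient, which the text does not state explicitly. The records, the field names (`pdf_downloads_90d`, `citations`) and the helper `pearson_r` are hypothetical and invented for illustration; they are not the journal's data or its analysis code.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (unnormalized covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-paper records: invented numbers, NOT the journal's data.
papers = [
    {"pdf_downloads_90d": 1450, "citations": 41},
    {"pdf_downloads_90d": 620, "citations": 12},
    {"pdf_downloads_90d": 980, "citations": 30},
    {"pdf_downloads_90d": 310, "citations": 9},
    {"pdf_downloads_90d": 2100, "citations": 38},
    {"pdf_downloads_90d": 760, "citations": 25},
    {"pdf_downloads_90d": 540, "citations": 18},
    {"pdf_downloads_90d": 1220, "citations": 22},
]

downloads = [p["pdf_downloads_90d"] for p in papers]
citations = [p["citations"] for p in papers]

r = pearson_r(downloads, citations)
print(f"R = {r:.3f} (n = {len(papers)} papers)")
```

Repeating the calculation with downloads counted over different windows (for example 90 days, 180 days and one year) would show how the choice of window changes the coefficient.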
Assuming that everyone who downloads a paper reads it, these results suggest that immediate online readership and eventual citation counts are highly correlated. This casts some doubt on the view that manuscript citation numbers are often a product of 'reference mining' rather than a reflection of the influential science shaping an author's work. By using a relatively recent cohort of papers for gathering citation counts, and by measuring readership well before citation totals for individual articles become influential, we have hopefully minimized the impact of citation for historical reasons, which can be more of a risk for older papers. Although citation feedback loops may still artificially raise the cited totals for particular articles, increased readership could be just as plausible an explanation.
Citation numbers have generally been viewed as one of the most straightforward ways to quantify the potential influence of a manuscript. However, as we have warned previously², this is problematic. Citation rates can vary quite substantially between subdisciplines, often making it difficult to definitively state the importance of one study to its respective field; papers in Alzheimer's disease, for example, get more citations than those in auditory psychophysics. Another problem involves 'citation bias', with authors more likely to cite their own papers or those of their closest collaborators.
Whatever the problems with interpreting citation numbers, it is important to remember that they are complex distributions and must be tracked in the form of trends; converting them to a simple total drastically undermines their usefulness. New findings could render previously highly cited papers obsolete, removing any perceived link between citation numbers and importance for those papers.
Despite the danger of using citation numbers to gauge scientific influence, it is still reassuring to note that these values do seem to reflect the overall readership of and community interest in a particular manuscript. Our results also recapitulate those from earlier attempts at comparing download statistics and citation numbers, calculated for physics or mathematics preprints³ and for a small sector of the medical literature⁴. Thus, this correlation may hold up across disciplines.
Interestingly, as other 'Web 2.0' technologies such as blogs and paper commenting become more entrenched in the community as a way of providing feedback on papers and tracking their popularity, following reader traffic from these additional venues may provide other variables with which to calibrate citation numbers, providing a more complete picture of manuscript influence on a particular field. Soon, citation numbers may only be a small factor within a much larger and more complete overall metric used to gauge scientific influence, impact and importance.
For more details and to discuss this concept further, please visit our blog: http://blogs.nature.com/nn/actionpotential/2008/05/downloads_vs_citations.html
1. Simkin, M.V. & Roychowdhury, V.P. Complex Syst. 14, 269 (2003).
2. Nat. Neurosci. 6, 783 (2003).
3. Brody, T. et al. J. Am. Soc. Inf. Sci. Technol. 57, 1060–1072 (2006).
4. Perneger, T.V. BMJ 329, 546–547 (2004).
Deciphering citation statistics. Nat Neurosci 11, 619 (2008). https://doi.org/10.1038/nn0608-619