Deciphering citation statistics

doi:10.1038/nn0608-619

Editorial
Published: June 2008

Nature Neuroscience volume 11, page 619 (2008)


A Corrigendum to this article was published on 01 December 2008

This article has been updated

A preliminary analysis shows that citation counts correlate well with paper downloads soon after publication.

Citation analysis has yielded many a controversial conclusion. One striking example is a study which suggested, on the basis of the propagation of citation errors, that about 80% of all references are transcribed from other reference lists rather than from the original source article¹. Such reports lead to the suspicion that most authors do not read the papers they cite, and that the papers that are the most cited are not necessarily the papers that are the most read. If true, this makes citation counting far less significant and calls into question the accuracy of referencing in the literature.

Moreover, citing a paper on the strength of its abstract alone, or copying citations from other reference lists, rather than reading the full article is particularly problematic for subject areas such as neuroscience, where many manuscripts are multidisciplinary. Blind citation could easily lead to increased error rates within any literature discussion, with inaccuracies reverberating significantly across disciplines. We therefore decided to test this hypothesis by determining whether our highly cited papers were also our most widely read.

Elevated readership may cautiously suggest impact and interest. But can we actually measure article readership? The answer is a resounding no, but we can get a coarse approximation. These days, most scientists access the literature through journal websites, leaving behind an electronic trail of page views and paper downloads. It is plausible that immediate downloads (online access within the first few months of publication) may estimate unbiased readership, not likely to be influenced or guided by the perceived authority that comes with high citation counts.

We computed the cross-correlations of citations to individual articles and reviews in Nature Neuroscience (February–December 2005) with download statistics from our website. Downloads represented the total PDF page views for any particular manuscript within the first 90 days of being posted online (including Advance Online Publication (AOP) time). There is a strongly significant correlation (R = 0.648) between immediate downloads and future citations (Fig. 1). PDF downloads were a better predictor than HTML downloads, and the correlation grew progressively stronger as the download window was extended, out to 180 days after AOP (R = 0.724). The correlation dropped substantially when the download window was extended to one year; this was to be expected, as total downloads decline precipitously with time while citation numbers increase.
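As a rough illustration of the kind of calculation described above, the following sketch computes a correlation coefficient between early download counts and later citation counts. It assumes that R refers to Pearson's correlation coefficient, which the editorial does not state explicitly. The function name pearson_r and the article records are hypothetical, and the numbers are invented for illustration; they are not the journal's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Invented example records (NOT real data): each tuple is
# (PDF downloads in the first 90 days, citations counted later).
articles = [(120, 4), (340, 11), (95, 6), (410, 9), (260, 15), (180, 3)]

downloads = [d for d, _ in articles]
citations = [c for _, c in articles]

r = pearson_r(downloads, citations)
print(f"R = {r:.3f}")
```

Repeating the calculation with download counts taken over different windows (for example 90 days, 180 days and one year) would show how the coefficient changes with the window, which is the comparison reported in the paragraph above.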

These results suggest that immediate online readership and eventual citation counts are highly correlated, assuming that everyone who downloads a paper reads it. This casts some doubt on the view that manuscript citation numbers are often a product of 'reference mining' rather than a reflection of the influential science shaping an author's work. By drawing citation counts from a relatively recent cohort of papers, and by measuring readership well before the citation totals of individual articles could become influential, we have hopefully minimized the impact of citation for historical reasons, which is more of a risk for older papers. Although citation feedback loops may still artificially raise the cited totals for particular articles, increased readership is just as plausible an explanation.

Citation numbers have been generally viewed as one of the most straightforward ways to quantify the potential influence of a manuscript. However, as we have warned previously², this is problematic. Citation rates can vary quite substantially between subdisciplines, often making it difficult to definitively state the importance of one study to its respective field; papers in Alzheimer's disease, for example, get more citations than those in auditory psychophysics. Another problem involves 'citation bias', with authors more likely to cite their own papers or those of their closest collaborators.

Whatever the problems with interpreting citation numbers, it is important to remember that they are complex distributions and must be tracked in the form of trends; converting them to a simple total drastically undermines their usefulness. New findings can render previously highly cited papers obsolete, removing any perceived link between citation numbers and importance for those papers.

Despite the danger of using citation numbers to gauge scientific influence, it is still reassuring to note that these values do seem to reflect the overall readership of and community interest in a particular manuscript. Our results also recapitulate those from earlier attempts at comparing download statistics and citation numbers, calculated for physics or mathematics preprints³ and for a small sector of the medical literature⁴. Thus, this correlation may hold up across disciplines.

Interestingly, as other 'Web 2.0' technologies such as blogs and paper commenting become more entrenched in the community as ways of providing feedback on papers and tracking their popularity, reader traffic from these additional venues may provide other variables with which to calibrate citation numbers, giving a more complete picture of a manuscript's influence on its field. Soon, citation numbers may be only a small factor within a much larger and more complete metric used to gauge scientific influence, impact and importance.

For more details and to discuss this concept further, please visit our blog: http://blogs.nature.com/nn/actionpotential/2008/05/downloads_vs_citations.html

Change history

25 June 2008
In the version of this editorial originally published, the x and y axes of Figure 1 were mislabeled. The correct x axis should read ‘Number of PDF downloads within 90 days (A.U.)’ and the correct y axis should read ‘Number of citations’. This error has been corrected in the HTML and PDF versions of the editorial.

References

1. Simkin, M.V. & Roychowdhury, V.P. Complex Syst. 14, 269 (2003).
2. Nat. Neurosci. 6, 783 (2003).
3. Brody, T. et al. J. Am. Soc. Inf. Sci. Technol. 57, 1060–1072 (2006).
4. Perneger, T.V. BMJ 329, 546–547 (2004).


Deciphering citation statistics. Nat Neurosci 11, 619 (2008). https://doi.org/10.1038/nn0608-619
