Large-scale analysis of micro-level citation patterns reveals nuanced selection criteria

Abstract

The analysis of citations to scientific publications has become a tool that is used in the evaluation of a researcher’s work, especially in the face of an ever-increasing production volume [1–6]. Despite the acknowledged shortcomings of citation analysis and the ongoing debate on the meaning of citations [7,8], citations are still primarily viewed as endorsements and as indicators of the influence of the cited reference, regardless of the context of the citation. However, only recently has attention been given [9,10] to the connection between contextual information and the success of citing and cited papers, primarily because of the lack of extensive databases that cover both types of metadata. Here we address this issue by studying the usage of citations throughout the full text of 156,558 articles published by the Public Library of Science (PLoS), and by tracing their bibliometric history from among 60 million records obtained from the Web of Science. We find universal patterns of variation in the usage of citations across paper sections [11]. Notably, we find differences in micro-level citation patterns that depend on the ultimate impact of the citing paper itself: publications from high-impact groups tend to cite younger references, as well as more very young and better-cited references. Our study provides a quantitative approach to addressing the long-standing issue that not all citations count the same.
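To make the kind of measurement described above concrete, here is a minimal Python sketch that computes the median age of cited references per paper section and impact group. The records, field layout and group labels ("high", "low") are hypothetical toy values chosen for illustration; they are not data or code from the study.

```python
from collections import defaultdict
from statistics import median

# Hypothetical toy records, one row per in-text citation:
# (section of the citing paper, impact group of the citing paper,
#  publication year of the citing paper, publication year of the reference)
citations = [
    ("Introduction", "high", 2012, 2010),
    ("Introduction", "high", 2012, 2011),
    ("Introduction", "low", 2012, 2004),
    ("Introduction", "low", 2012, 2008),
    ("Methods", "high", 2012, 2001),
    ("Methods", "high", 2012, 2005),
    ("Methods", "low", 2012, 1998),
    ("Methods", "low", 2012, 2002),
]


def median_reference_age(rows):
    """Return {(section, impact_group): median reference age in years}."""
    ages = defaultdict(list)
    for section, group, citing_year, cited_year in rows:
        # Reference age = years between the cited and the citing publication.
        ages[(section, group)].append(citing_year - cited_year)
    return {key: median(values) for key, values in ages.items()}


result = median_reference_age(citations)
for (section, group), age in sorted(result.items()):
    print(f"{section:12s} {group:4s} median age = {age} years")
```

In this toy data set the "high" group cites younger references in both sections, which mirrors the direction of the reported effect but says nothing about its size.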

Fig. 1: References vary in age and impact according to the section in which they are cited.
Fig. 2: The number of references used in each section is mostly independent of the paper’s impact group, although authors of highly cited PLoS papers cite younger references and use a higher percentage of young references.
Fig. 3: Authors of highly cited PLoS papers cite more highly cited references, especially in the case of young references.
Fig. 4: Highly cited papers have a higher-than-expected probability of citing highly cited references, and a lower-than-expected probability of citing poorly cited references.
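The comparison between observed and expected probabilities in Fig. 4 can be illustrated with a simple ratio. The sketch below is only one plausible reading: it assumes the expected probability is the overall share of each reference class across all citations, which may differ from the null model actually used in the paper. The class labels and the toy pairs are hypothetical.

```python
from collections import Counter

# Hypothetical toy data:
# (impact class of the citing paper, impact class of the cited reference)
pairs = [
    ("high", "high"), ("high", "high"), ("high", "high"), ("high", "low"),
    ("low", "high"), ("low", "low"), ("low", "low"), ("low", "low"),
]


def enrichment(pairs):
    """Observed / expected probability for each (paper class, reference class).

    Observed: P(reference class | paper class).
    Expected (assumption): P(reference class) over all citations, i.e. the
    reference class is independent of the citing paper's class.
    """
    joint = Counter(pairs)
    paper_totals = Counter(paper_cls for paper_cls, _ in pairs)
    ref_totals = Counter(ref_cls for _, ref_cls in pairs)
    n = len(pairs)
    ratios = {}
    for (paper_cls, ref_cls), count in joint.items():
        observed = count / paper_totals[paper_cls]
        expected = ref_totals[ref_cls] / n
        ratios[(paper_cls, ref_cls)] = observed / expected
    return ratios


ratios = enrichment(pairs)
for key, value in sorted(ratios.items()):
    print(key, round(value, 2))
```

A ratio above 1 means that a class of papers cites a class of references more often than expected under independence; a ratio below 1 means less often.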

Data availability

The data from PLoS are publicly available through its API (api.PLoS.org); the data from the Web of Science are available from Clarivate Analytics. We provide the conversion tables that link the DOIs of the PLoS papers used in this study to the Web of Science unique IDs (of both the PLoS papers and the references they cite) here: https://doi.org/10.21985/N21X9J.

Code availability

Code for replication of all of our results is available via GitHub: https://github.com/juliettapc/my_In_text_citations.

References

  1. de Solla Price, D. J. Networks of scientific papers. Science 149, 510–515 (1965).
  2. Merton, R. K. The Matthew effect in science: the reward and communication systems of science are considered. Science 159, 56–63 (1968).
  3. Cronin, B. & Barsky Atkins, H. (eds) The Web of Knowledge: A Festschrift in Honor of Eugene Garfield (ASIS, 2000).
  4. Boyack, K. W., Klavans, R. & Börner, K. Mapping the backbone of science. Scientometrics 64, 351–374 (2005).
  5. Evans, J. A. & Foster, J. G. Metaknowledge. Science 331, 721–725 (2011).
  6. Zeng, A. et al. The science of science: from the perspective of complex systems. Phys. Rep. 714–715, 1–73 (2017).
  7. Bornmann, L. & Daniel, H. D. Selecting manuscripts for a high-impact journal through peer review: a citation analysis of communications that were accepted by Angewandte Chemie International Edition, or rejected but published elsewhere. J. Am. Soc. Inf. Sci. Technol. 59, 1841–1852 (2008).
  8. Radicchi, F., Weissman, A. & Bollen, J. Quantifying perceived impact of scientific publications. J. Informetr. 11, 704–712 (2017).
  9. Yegros-Yegros, A., Lamers, W. S., Eck, N. J. V., Waltman, L. & Hoos, H. Patterns in citation context: the case of the field of scientometrics. In Proc. 23rd International Conference on Science and Technology Indicators 1115–1122 (2018).
  10. Boyack, K. W., van Eck, N. J., Colavizza, G. & Waltman, L. Characterizing in-text citations in scientific articles: a large-scale analysis. J. Informetr. 12, 59–73 (2018).
  11. Bertin, M., Atanassova, I., Gingras, Y. & Larivière, V. The invariant distribution of references in scientific articles. J. Assoc. Inf. Sci. Technol. 67, 164–177 (2016).
  12. Rosvall, M. & Bergstrom, C. T. Maps of random walks on complex networks reveal community structure. Proc. Natl Acad. Sci. USA 105, 1118–1123 (2008).
  13. Guimerà, R., Uzzi, B., Spiro, J. & Amaral, L. A. N. Team assembly mechanisms determine collaboration network structure and team performance. Science 308, 697–702 (2005).
  14. Wuchty, S., Jones, B. F. & Uzzi, B. The increasing dominance of teams in production of knowledge. Science 316, 1036–1039 (2007).
  15. Malmgren, R. D., Ottino, J. M. & Amaral, L. A. N. The role of mentorship in protégé performance. Nature 465, 622–627 (2010).
  16. Zeng, X. H. T. et al. Differences in collaboration patterns across discipline, career stage, and gender. PLoS Biol. 14, e1002573 (2016).
  17. Iacopini, I., Milojević, S. & Latora, V. Network dynamics of innovation processes. Phys. Rev. Lett. 120, 048301 (2018).
  18. Uzzi, B., Mukherjee, S., Stringer, M. & Jones, B. Atypical combinations and scientific impact. Science 342, 468–472 (2013).
  19. Ahmadpoor, M. & Jones, B. F. The dual frontier: patented inventions and prior scientific advance. Science 357, 583–587 (2017).
  20. Acuna, D. E., Allesina, S. & Kording, K. P. Predicting scientific success. Nature 489, 201–202 (2012).
  21. Wang, D., Song, C. & Barabási, A.-L. Quantifying long-term scientific impact. Science 342, 127–132 (2013).
  22. Petersen, A. M. et al. Reputation and impact in academic careers. Proc. Natl Acad. Sci. USA 111, 15316–15321 (2014).
  23. Moreira, J. A. G., Zeng, X. H. T. & Amaral, L. A. N. The distribution of the asymptotic number of citations to sets of publications by a researcher or from an academic department are consistent with a discrete lognormal model. PLoS One 10, e0143108 (2015).
  24. Wasserman, M., Zeng, X. H. T. & Amaral, L. A. N. Cross-evaluation of metrics to estimate the significance of creative works. Proc. Natl Acad. Sci. USA 112, 1281–1286 (2015).
  25. Tahamtan, I., Safipour Afshar, A. & Ahamdzadeh, K. Factors affecting number of citations: a comprehensive review of the literature. Scientometrics 107, 1195–1225 (2016).
  26. Milojević, S. How are academic age, productivity and collaboration related to citing behavior of researchers? PLoS One 7, e49176 (2012).
  27. Gingras, Y., Larivière, V., Macaluso, B. & Robitaille, J. P. The effects of aging on researchers' publication and citation patterns. PLoS One 3, e4048 (2008).
  28. West, J. D., Jacquet, J., King, M. M., Correll, S. J. & Bergstrom, C. T. The role of gender in scholarly authorship. PLoS One 8, e66212 (2013).
  29. Bornmann, L. & Daniel, H. What do citation counts measure? A review of studies on citing behavior. J. Doc. 64, 45–80 (2008).
  30. Valenzuela, M., Ha, V. & Etzioni, O. Identifying meaningful citations. In AAAI Workshops https://www.aaai.org/ocs/index.php/WS/AAAIW15/paper/viewPaper/10185 (2015).
  31. Popper, K. R. The nature of philosophical problems and their roots in science. Br. J. Philos. Sci. 3, 124–156 (1952).
  32. Krapivsky, P. L. & Redner, S. Network growth by copying. Phys. Rev. E 71, 036118 (2005).
  33. Redner, S. Citation statistics from more than a century of Physical Review. Phys. Today 58, 49 (2005).
  34. Stringer, M. J., Sales-Pardo, M. & Nunes Amaral, L. A. Statistical validation of a global model for the distribution of the ultimate number of citations accrued by papers published in a scientific journal. J. Am. Soc. Inf. Sci. Technol. 61, 1377–1385 (2010).
  35. Liang, L., Zhong, Z. & Rousseau, R. Scientists’ referencing (mis)behavior revealed by the dissemination network of referencing errors. Scientometrics 101, 1973–1986 (2014).
  36. Roach, V. J., Lau, T. K., Kee, W. D. N. & Kong, H. The quality of citations in major international obstetrics and gynecology journals. Am. J. Obstet. Gynecol. 177, 973–975 (1997).
  37. Davies, K. Reference accuracy in library and information science journals. Aslib Proc. 64, 373–387 (2012).
  38. Dias, L., Gerlach, M., Scharloth, J. & Altmann, E. G. Using text analysis to quantify the similarity and evolution of scientific disciplines. R. Soc. Open Sci. 5, 171545 (2018).
  39. Aksnes, D. W. A macro study of self-citation. Scientometrics 56, 235–246 (2003).
  40. Ioannidis, J. P. A. A generalized view of self-citation: direct, co-author, collaborative, and coercive induced self-citation. J. Psychosom. Res. 78, 7–11 (2015).
  41. Mukherjee, S., Romero, D. M., Jones, B. & Uzzi, B. The nearly universal link between the age of past knowledge and tomorrow’s breakthroughs in science and technology: the hotspot. Sci. Adv. 3, e1601315 (2017).
  42. Van Noorden, R., Maher, B. & Nuzzo, R. The top 100 papers. Nature 514, 550–553 (2014).
  43. Gerlach, M., Peixoto, T. P. & Altmann, E. G. A network approach to topic models. Sci. Adv. 4, eaaq1360 (2018).
  44. Garfield, E. The use of journal impact factors and citation analysis for evaluation of science. In Proc. Cell Separation, Hematology and Journal Citation Analysis, Mini Symposium in Tribute to Arne Bøyum (1998).
  45. Stringer, M. M. J., Sales-Pardo, M. & Amaral, L. A. N. Effectiveness of journal ranking schemes as a tool for locating information. PLoS One 3, e1683 (2008).
  46. Bornmann, L., de Moya Anegón, F. & Leydesdorff, L. Do scientific advancements lean on the shoulders of giants? A bibliometric investigation of the Ortega hypothesis. PLoS One 5, e13327 (2010).
  47. Šubelj, L. & Bajec, M. Model of complex networks based on citation dynamics. In Proc. 22nd International Conference on World Wide Web 527–530 (ACM, 2013).
  48. Zhao, Z., Rollins, J., Bai, L. & Rosen, G. Incremental author name disambiguation for scientific citation data. In Proc. 2017 International Conference on Data Science and Advanced Analytics 175–183 (2018).
  49. Clarivate Analytics. Web of Science Raw Data (XML): User Guide for Web of Science Raw Data Clarivate.com https://clarivate.libguides.com/c.php?g=593069&p=4220414 (2016).
  50. Chiao, J. Y., Bowman, N. E. & Gill, H. The political gender gap: gender bias in facial inferences that predict voting behavior. PLoS One 3, e3666 (2008).
  51. Duch, J., Waitzman, J. S. & Amaral, L. A. N. Quantifying the performance of individual players in a team activity. PLoS One 5, e10937 (2010).
  52. Sales-Pardo, M., Diermeier, D. & Amaral, L. A. N. The impact of individual biases on consensus formation. PLoS One 8, e58989 (2013).
  53. Mann, H. B. & Whitney, D. R. On a test of whether one of two random variables is stochastically larger than the other. Ann. Math. Stat. 18, 50–60 (1947).

Acknowledgements

L.A.N.A. thanks the John and Leslie McQuown Gift and support from the Department of Defense Army Research Office under grant number W911NF-14-1-0259. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Contributions

J.P.-C. contributed to the data preparation, wrote the codes for data analysis, statistical testing and figure plotting, contributed to the interpretation of the results and drafted the manuscript. N.A. collected, cleaned and prepared the data and performed preliminary analysis. M.G. contributed to the collection and the analysis of the data, contributed to the interpretation of the results and drafted the manuscript. L.A.N.A. conceived and designed the study, contributed to the interpretation of the results and drafted the manuscript.

Corresponding author

Correspondence to Luís A. N. Amaral.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Methods 1–3, Supplementary Figs. 1–35, and Supplementary Tables 1–6.

Reporting Summary

About this article

Cite this article

Poncela-Casasnovas, J., Gerlach, M., Aguirre, N. et al. Large-scale analysis of micro-level citation patterns reveals nuanced selection criteria. Nat Hum Behav 3, 568–575 (2019). https://doi.org/10.1038/s41562-019-0585-7
