Large-scale analysis of micro-level citation patterns reveals nuanced selection criteria

Abstract

Citation analysis of scientific publications has become a tool for evaluating a researcher’s work, especially in the face of an ever-increasing production volume [1–6]. Despite the acknowledged shortcomings of citation analysis and the ongoing debate on the meaning of citations [7,8], citations are still primarily viewed as endorsements and as indicators of the influence of the cited reference, regardless of the context of the citation. However, only recently has attention [9,10] been given to the connection between contextual information and the success of citing and cited papers, primarily because of the lack of extensive databases that cover both types of metadata. Here we address this issue by studying the usage of citations throughout the full text of 156,558 articles published by the Public Library of Science (PLoS), and by tracing their bibliometric history from among 60 million records obtained from the Web of Science. We find universal patterns of variation in the usage of citations across paper sections [11]. Notably, we find differences in micro-level citation patterns that depend on the ultimate impact of the citing paper itself: publications from high-impact groups tend to cite younger references, as well as more very young and better-cited references. Our study provides a quantitative approach to addressing the long-standing issue that not all citations count the same.
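
As a rough illustration of the kind of measurement the abstract describes, the sketch below computes the median reference age and the share of young references, grouped either by paper section or by the impact group of the citing paper. The record layout, the toy values and the two-year cutoff for a "young" reference are all assumptions made for this example; they are not taken from the study's data or code.

```python
from collections import defaultdict
from statistics import median

# Hypothetical records, one per in-text citation:
# (citing_year, section, impact_group, reference_year)
citations = [
    (2012, "Introduction", "high", 2010),
    (2012, "Methods", "high", 1998),
    (2012, "Discussion", "high", 2011),
    (2013, "Introduction", "low", 2001),
    (2013, "Methods", "low", 1990),
    (2013, "Discussion", "low", 2009),
]

YOUNG_MAX_AGE = 2  # assumed cutoff, in years, for a "young" reference


def age_summary(rows, key_index):
    """Group reference ages by rows[key_index]; return median age and young share."""
    ages = defaultdict(list)
    for row in rows:
        citing_year, reference_year = row[0], row[3]
        ages[row[key_index]].append(citing_year - reference_year)
    return {
        key: {
            "median_age": median(vals),
            "young_share": sum(a <= YOUNG_MAX_AGE for a in vals) / len(vals),
        }
        for key, vals in ages.items()
    }


by_section = age_summary(citations, key_index=1)  # grouped by paper section
by_group = age_summary(citations, key_index=2)    # grouped by impact group
```

With real data, the same grouping could be applied to each citing paper separately before aggregating, so that papers with long reference lists do not dominate the summary.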

Fig. 1: References vary in age and impact according to the section in which they are cited.
Fig. 2: The number of references used in each section is mostly independent of the paper’s impact group, although authors of highly cited PLoS papers cite younger references and use a higher percentage of young references.
Fig. 3: Authors of highly cited PLoS papers cite more highly cited references, especially in the case of young references.
Fig. 4: Highly cited papers have a higher-than-expected probability of citing highly cited references, and a lower-than-expected probability of citing poorly cited references.
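
The comparison in the Fig. 4 caption between observed and expected citing probabilities can be sketched as a simple ratio. The example below assumes a null expectation in which references are chosen independently of the citing paper's impact group, so the expected probability is the overall share of each reference group; the null model actually used in the study may differ, and the data here are invented for illustration.

```python
from collections import Counter

# Hypothetical (citing_group, reference_group) pairs, one per citation.
pairs = [
    ("high", "high"), ("high", "high"), ("high", "high"), ("high", "low"),
    ("low", "high"), ("low", "low"), ("low", "low"), ("low", "low"),
]


def observed_over_expected(pairs):
    """Return P(ref_group | citing_group) / P(ref_group) for each observed pair."""
    joint = Counter(pairs)
    citing_totals = Counter(citing for citing, _ in pairs)
    ref_totals = Counter(ref for _, ref in pairs)
    n = len(pairs)
    ratios = {}
    for (citing, ref), count in joint.items():
        observed = count / citing_totals[citing]  # share within the citing group
        expected = ref_totals[ref] / n            # share under the assumed null
        ratios[(citing, ref)] = observed / expected
    return ratios


ratios = observed_over_expected(pairs)
```

A ratio above 1 means that the citing group uses that kind of reference more often than the assumed null predicts, and a ratio below 1 means less often.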

Data availability

The data from PLoS are publicly available through its API (api.PLoS.org); the data from the Web of Science are available from Clarivate Analytics. We provide the conversion tables that link the DOIs of the PLoS papers used in this study to the Web of Science unique IDs (of both the PLoS papers and the references they cite) here: https://doi.org/10.21985/N21X9J.

Code availability

Code for replication of all of our results is available via GitHub: https://github.com/juliettapc/my_In_text_citations.

References

  1.

    de Solla Price, D. J. Networks of scientific papers. Science 149, 510–515 (1965).

  2.

    Merton, R. K. The Matthew effect in science: the reward and communication systems of science are considered. Science 159, 56–63 (1968).

  3.

    Cronin, B. & Barsky Atkins, H. (eds) The Web of Knowledge: A Festschrift in Honor of Eugene Garfield (ASIS, 2000).

  4.

    Boyack, K. W., Klavans, R. & Börner, K. Mapping the backbone of science. Scientometrics 64, 351–374 (2005).

  5.

    Evans, J. A. & Foster, J. G. Metaknowledge. Science 331, 721–725 (2011).

  6.

    Zeng, A. et al. The science of science: from the perspective of complex systems. Phys. Rep. 714–715, 1–73 (2017).

  7.

    Bornmann, L. & Daniel, H. D. Selecting manuscripts for a high-impact journal through peer review: a citation analysis of communications that were accepted by Angewandte Chemie International Edition, or rejected but published elsewhere. J. Am. Soc. Inf. Sci. Technol. 59, 1841–1852 (2008).

  8.

    Radicchi, F., Weissman, A. & Bollen, J. Quantifying perceived impact of scientific publications. J. Informetr. 11, 704–712 (2017).

  9.

    Yegros-Yegros, A., Lamers, W. S., Eck, N. J. V., Waltman, L. & Hoos, H. Patterns in citation context: the case of the field of scientometrics. In Proc. 23rd International Conference on Science and Technology Indicators 1115–1122 (2018).

  10.

    Boyack, K. W., van Eck, N. J., Colavizza, G. & Waltman, L. Characterizing in-text citations in scientific articles: a large-scale analysis. J. Informetr. 12, 59–73 (2018).

  11.

    Bertin, M., Atanassova, I., Gingras, Y. & Lariviere, V. The invariant distribution of references in scientific articles. J. Assoc. Inf. Sci. Technol. 67, 164–177 (2016).

  12.

    Rosvall, M. & Bergstrom, C. T. Maps of random walks on complex networks reveal community structure. Proc. Natl Acad. Sci. USA 105, 1118–1123 (2008).

  13.

    Guimerà, R., Uzzi, B., Spiro, J. & Amaral, L. A. N. Team assembly mechanisms determine collaboration network structure and team performance. Science 308, 697–702 (2005).

  14.

    Wuchty, S., Jones, B. F. & Uzzi, B. The increasing dominance of teams in production of knowledge. Science 316, 1036–1039 (2007).

  15.

    Malmgren, R. D., Ottino, J. M. & Amaral, L. A. N. The role of mentorship in protégé performance. Nature 465, 622–627 (2010).

  16.

    Zeng, X. H. T. et al. Differences in collaboration patterns across discipline, career stage, and gender. PLoS Biol. 14, e1002573 (2016).

  17.

    Iacopini, I., Milojević, S. & Latora, V. Network dynamics of innovation processes. Phys. Rev. Lett. 120, 048301 (2018).

  18.

    Uzzi, B., Mukherjee, S., Stringer, M. & Jones, B. Atypical combinations and scientific impact. Science 342, 468–472 (2013).

  19.

    Ahmadpoor, M. & Jones, B. F. The dual frontier: patented inventions and prior scientific advance. Science 357, 583–587 (2017).

  20.

    Acuna, D. E., Allesina, S. & Kording, K. P. Predicting scientific success. Nature 489, 201–202 (2012).

  21.

    Wang, D., Song, C. & Barabási, A.-L. Quantifying long-term scientific impact. Science 342, 127–132 (2013).

  22.

    Petersen, A. M. et al. Reputation and impact in academic careers. Proc. Natl Acad. Sci. USA 111, 15316–15321 (2014).

  23.

    Moreira, J. A. G., Zeng, X. H. T. & Amaral, L. A. N. The distribution of the asymptotic number of citations to sets of publications by a researcher or from an academic department are consistent with a discrete lognormal model. PLoS One 10, e0143108 (2015).

  24.

    Wasserman, M., Zeng, X. H. T. & Amaral, L. A. N. Cross-evaluation of metrics to estimate the significance of creative works. Proc. Natl Acad. Sci. USA 112, 1281–1286 (2015).

  25.

    Tahamtan, I., Safipour Afshar, A. & Ahamdzadeh, K. Factors affecting number of citations: a comprehensive review of the literature. Scientometrics 107, 1195–1225 (2016).

  26.

    Milojević, S. How are academic age, productivity and collaboration related to citing behavior of researchers? PLoS One 7, e49176 (2012).

  27.

    Gingras, Y., Larivière, V., Macaluso, B. & Robitaille, J. P. The effects of aging on researchers' publication and citation patterns. PLoS One 3, e4048 (2008).

  28.

    West, J. D., Jacquet, J., King, M. M., Correll, S. J. & Bergstrom, C. T. The role of gender in scholarly authorship. PLoS One 8, e66212 (2013).

  29.

    Bornmann, L. & Daniel, H. What do citation counts measure? A review of studies on citing behavior. J. Doc. 64, 45–80 (2008).

  30.

    Valenzuela, M., Ha, V. & Etzioni, O. Identifying meaningful citations. In AAAI Workshops https://www.aaai.org/ocs/index.php/WS/AAAIW15/paper/viewPaper/10185 (2015).

  31.

    Popper, K. R. The nature of philosophical problems and their roots in science. Br. J. Philos. Sci. 3, 124–156 (1952).

  32.

    Krapivsky, P. L. & Redner, S. Network growth by copying. Phys. Rev. E 71, 036118 (2005).

  33.

    Redner, S. Citation statistics from more than a century of Physical Review. Phys. Today 58, 49 (2005).

  34.

    Stringer, M. J., Sales-Pardo, M. & Nunes Amaral, L. A. Statistical validation of a global model for the distribution of the ultimate number of citations accrued by papers published in a scientific journal. J. Am. Soc. Inf. Sci. Technol. 61, 1377–1385 (2010).

  35.

    Liang, L., Zhong, Z. & Rousseau, R. Scientists’ referencing (mis)behavior revealed by the dissemination network of referencing errors. Scientometrics 101, 1973–1986 (2014).

  36.

    Roach, V. J., Lau, T. K., Kee, W. D. N. & Kong, H. The quality of citations in major international obstetrics and gynecology journals. Am. J. Obstet. Gynecol. 177, 973–975 (1997).

  37.

    Davies, K. Reference accuracy in library and information science journals. Aslib Proc. 64, 373–387 (2012).

  38.

    Dias, L., Gerlach, M., Scharloth, J. & Altmann, E. G. Using text analysis to quantify the similarity and evolution of scientific disciplines. R. Soc. Open Sci. 5, 171545 (2018).

  39.

    Aksnes, D. W. A macro study of self-citation. Scientometrics 56, 235–246 (2003).

  40.

    Ioannidis, J. P. A. A generalized view of self-citation: direct, co-author, collaborative, and coercive induced self-citation. J. Psychosom. Res. 78, 7–11 (2015).

  41.

    Mukherjee, S., Romero, D. M., Jones, B. & Uzzi, B. The nearly universal link between the age of past knowledge and tomorrow’s breakthroughs in science and technology: the hotspot. Sci. Adv. 3, e1601315 (2017).

  42.

    Van Noorden, R., Maher, B. & Nuzzo, R. The top 100 papers. Nature 514, 550–553 (2014).

  43.

    Gerlach, M., Peixoto, T. P. & Altmann, E. G. A network approach to topic models. Sci. Adv. 4, eaaq1360 (2018).

  44.

    Garfield, E. The use of journal impact factors and citation analysis for evaluation of science. In Proc. Cell Separation, Hematology and Journal Citation Analysis, Mini Symposium in Tribute to Arne Bøyum (1998).

  45.

    Stringer, M. J., Sales-Pardo, M. & Amaral, L. A. N. Effectiveness of journal ranking schemes as a tool for locating information. PLoS One 3, e1683 (2008).

  46.

    Bornmann, L., de Moya Anegón, F. & Leydesdorff, L. Do scientific advancements lean on the shoulders of giants? A bibliometric investigation of the Ortega hypothesis. PLoS One 5, e13327 (2010).

  47.

    Šubelj, L. & Bajec, M. Model of complex networks based on citation dynamics. In Proc. 22nd International Conference on World Wide Web 527–530 (ACM, 2013).

  48.

    Zhao, Z., Rollins, J., Bai, L. & Rosen, G. Incremental author name disambiguation for scientific citation data. In Proc. 2017 International Conference on Data Science and Advanced Analytics 175–183 (2018).

  49.

    Clarivate Analytics. Web of Science Raw Data (XML): User Guide for Web of Science Raw Data Clarivate.com https://clarivate.libguides.com/c.php?g=593069&p=4220414 (2016).

  50.

    Chiao, J. Y., Bowman, N. E. & Gill, H. The political gender gap: gender bias in facial inferences that predict voting behavior. PLoS One 3, e3666 (2008).

  51.

    Duch, J., Waitzman, J. S. & Amaral, L. A. N. Quantifying the performance of individual players in a team activity. PLoS One 5, e10937 (2010).

  52.

    Sales-Pardo, M., Diermeier, D. & Amaral, L. A. N. The impact of individual biases on consensus formation. PLoS One 8, e58989 (2013).

  53.

    Mann, H. B. & Whitney, D. R. On a test of whether one of two random variables is stochastically larger than the other. Ann. Math. Stat. 18, 50–60 (1947).

Acknowledgements

L.A.N.A. thanks the John and Leslie McQuown Gift and support from the Department of Defense Army Research Office under grant number W911NF-14-1-0259. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

J.P.-C. contributed to the data preparation, wrote the code for data analysis, statistical testing and figure plotting, contributed to the interpretation of the results and drafted the manuscript. N.A. collected, cleaned and prepared the data and performed preliminary analysis. M.G. contributed to the collection and the analysis of the data, contributed to the interpretation of the results and drafted the manuscript. L.A.N.A. conceived and designed the study, contributed to the interpretation of the results and drafted the manuscript.

Correspondence to Luís A. N. Amaral.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Methods 1–3, Supplementary Figs. 1–35, and Supplementary Tables 1–6.

Reporting Summary
