Learning on knowledge graph dynamics provides an early warning of impactful research

Abstract

The scientific ecosystem relies on citation-based metrics that provide only imperfect, inconsistent and easily manipulated measures of research quality. Here we describe DELPHI (Dynamic Early-warning by Learning to Predict High Impact), a framework that provides an early-warning signal for ‘impactful’ research by autonomously learning high-dimensional relationships among features calculated across time from the scientific literature. We prototype this framework and deduce its performance and scaling properties on time-structured publication graphs from 1980 to 2019 drawn from 42 biotechnology-related journals, including over 7.8 million individual nodes, 201 million relationships and 3.8 billion calculated metrics. We demonstrate the framework’s performance by correctly identifying 19/20 seminal biotechnologies from 1980 to 2014 via a blinded retrospective study and provide 50 research papers from 2018 that DELPHI predicts will be in the top 5% of time-rescaled node centrality in the future. We propose DELPHI as a tool to aid in the construction of diversified, impact-optimized funding portfolios.
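The abstract ranks papers by "time-rescaled node centrality" but does not define the rescaling. As a rough illustration only, the sketch below assumes one common form of time rescaling: each paper's raw centrality score is converted to a z-score against papers published in the same year, so that older papers are not favored simply for having had longer to accumulate links. The function name `time_rescale`, the same-year cohort and the toy inputs are assumptions made for this example, not the authors' implementation.

```python
from collections import defaultdict
from statistics import mean, pstdev


def time_rescale(scores, years):
    """Z-score each paper's raw score against papers from the same year.

    ASSUMPTION: the comparison cohort is "same publication year"; the
    source does not specify how the rescaling window is chosen.
    """
    # Group raw scores by publication year.
    by_year = defaultdict(list)
    for paper, score in scores.items():
        by_year[years[paper]].append(score)

    # Mean and population standard deviation of each cohort.
    stats = {year: (mean(vals), pstdev(vals)) for year, vals in by_year.items()}

    rescaled = {}
    for paper, score in scores.items():
        mu, sigma = stats[years[paper]]
        # A cohort with zero spread carries no ranking information.
        rescaled[paper] = (score - mu) / sigma if sigma > 0 else 0.0
    return rescaled


# Hypothetical raw centrality scores and publication years.
raw = {"a": 1.0, "b": 3.0, "c": 10.0, "d": 30.0}
pub_year = {"a": 1990, "b": 1990, "c": 2010, "d": 2010}
rescaled = time_rescale(raw, pub_year)
```

In this toy input, papers "b" and "d" receive the same rescaled score even though their raw scores differ tenfold, because each sits equally far above its own cohort's mean.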

Fig. 1: Collecting, structuring, computing on and learning an early-warning signal of scientific impact from dynamic knowledge graphs.
Fig. 2: Dynamics of knowledge graph structure contain information about future scientific impact.
Fig. 3: DELPHI leverages temporal dynamics to identify high-impact research early and with state-of-the-art performance characteristics when focused on biotechnology publications.
Fig. 4: DELPHI correctly identifies historical biotechnology breakthroughs in a blinded back-testing.
Fig. 5: In a world of expanding science and limited resources, quantitative approaches such as DELPHI can be used to help guide research funding allocations to maximize scientific return on investment.

Data availability

The data analyzed are available for download from https://www.lens.org/. Exemplary datasets and retrieval code are further available from GitHub as described in the ‘Code availability’ section.

Code availability

Exemplary code, datasets, trained models, a visualization application to aid in the analysis of results and Docker-based installation instructions are all available from GitHub at https://github.com/jameswweis/delphi.

Acknowledgements

This work was supported by the consortia of sponsors of the MIT Media Lab and the MIT Center for Bits and Atoms. We thank the AWS Cloud Credits for Research program for computational infrastructure and the Lens Lab for providing publication data.

Author information

Contributions

J.W.W. and J.M.J. conceived the study. J.W.W. performed the data structuring, algorithm design and computational implementation. J.W.W. and J.M.J. drafted the manuscript and figures. J.M.J. supported and supervised the project.

Corresponding author

Correspondence to James W. Weis.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information: Nature Biotechnology thanks Lutz Bornmann and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1: The DELPHI framework exhibits strong performance characteristics across a range of definitions of high impact and model evaluation criteria.

(a) The DELPHI framework is based on a user-defined definition of high impact, and the utility of the framework is robust to the specific parameters of that definition. To demonstrate this robustness, DELPHI models were constructed with threshold definitions ranging from 5% to 25% and evaluated across a range of criteria. (b) Papers in the top 5% of our impact metric, time-rescaled node centrality, account for over 35% of total aggregate impact. As such, the high-impact threshold of 5% was chosen for this study.
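The thresholding described in this caption can be sketched as follows. This is a minimal illustration, assuming each paper already has a single non-negative impact value: it labels the top fraction of papers as high impact and reports the share of aggregate impact they hold. The function names `label_high_impact` and `impact_share` and the toy values are hypothetical and are not taken from the DELPHI code base.

```python
import math


def label_high_impact(impact, threshold=0.05):
    """Return the papers in the top `threshold` fraction by impact."""
    ranked = sorted(impact, key=impact.get, reverse=True)
    # Always keep at least one paper so small inputs yield a label set.
    k = max(1, math.ceil(threshold * len(ranked)))
    return set(ranked[:k])


def impact_share(impact, selected):
    """Fraction of total impact held by `selected`.

    ASSUMPTION: impact values are non-negative, so the share is in [0, 1].
    """
    total = sum(impact.values())
    return sum(impact[p] for p in selected) / total if total else 0.0


# Hypothetical impact values for 20 papers: p1 = 1.0, ..., p20 = 20.0.
impact = {f"p{i}": float(i) for i in range(1, 21)}

top_5 = label_high_impact(impact, threshold=0.05)
top_25 = label_high_impact(impact, threshold=0.25)
share_5 = impact_share(impact, top_5)
share_25 = impact_share(impact, top_25)
```

Sweeping `threshold` between 0.05 and 0.25 mirrors the range of definitions mentioned in panel (a). With real data, the share held by the top 5% depends on how heavy-tailed the impact distribution is.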

Supplementary information

Supplementary Information

Supplementary Tables 1 and 2.

Reporting Summary

Cite this article

Weis, J.W., Jacobson, J.M. Learning on knowledge graph dynamics provides an early warning of impactful research. Nat Biotechnol (2021). https://doi.org/10.1038/s41587-021-00907-6
