Learning on knowledge graph dynamics provides an early warning of impactful research

Weis, James W.; Jacobson, Joseph M.

doi:10.1038/s41587-021-00907-6

Analysis
Published: 17 May 2021

Nature Biotechnology volume 39, pages 1300–1307 (2021)

Abstract

The scientific ecosystem relies on citation-based metrics that provide only imperfect, inconsistent and easily manipulated measures of research quality. Here we describe DELPHI (Dynamic Early-warning by Learning to Predict High Impact), a framework that provides an early-warning signal for ‘impactful’ research by autonomously learning high-dimensional relationships among features calculated across time from the scientific literature. We prototype this framework and deduce its performance and scaling properties on time-structured publication graphs from 1980 to 2019 drawn from 42 biotechnology-related journals, including over 7.8 million individual nodes, 201 million relationships and 3.8 billion calculated metrics. We demonstrate the framework’s performance by correctly identifying 19/20 seminal biotechnologies from 1980 to 2014 via a blinded retrospective study and provide 50 research papers from 2018 that DELPHI predicts will be in the top 5% of time-rescaled node centrality in the future. We propose DELPHI as a tool to aid in the construction of diversified, impact-optimized funding portfolios.
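
The phrase ‘time-rescaled node centrality’ can be made concrete with a small sketch. The Python below is illustrative only and assumes one common formulation: each paper's raw centrality score is converted to a z-score against papers of similar age, so that older papers are not favored simply for having had more time to accumulate links. The function name `time_rescaled_scores`, the `window` parameter and the toy scores are hypothetical and are not taken from the DELPHI code base.

```python
import numpy as np


def time_rescaled_scores(scores, window=5):
    """Z-score each paper's centrality against papers of similar age.

    `scores` must be ordered by publication date (oldest first).
    `window` (hypothetical parameter) is the number of neighbours on each
    side that form the comparison group for a given paper.
    """
    scores = np.asarray(scores, dtype=float)
    n = len(scores)
    rescaled = np.zeros(n)
    for i in range(n):
        lo = max(0, i - window)
        hi = min(n, i + window + 1)
        group = scores[lo:hi]          # papers published around the same time
        sigma = group.std()            # population standard deviation
        if sigma > 0:                  # leave 0.0 when the group is constant
            rescaled[i] = (scores[i] - group.mean()) / sigma
    return rescaled


# Toy data: three old papers with large raw scores, three recent ones with small scores.
raw = [100, 90, 110, 1, 1, 3]
rescaled = time_rescaled_scores(raw, window=1)
```

In this toy example the newest paper has a raw score of only 3 but receives the same rescaled score as the oldest paper, because each is compared only with its immediate contemporaries.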

**Fig. 1: Collecting, structuring, computing on and learning an early-warning signal of scientific impact from dynamic knowledge graphs.**

**Fig. 2: Dynamics of knowledge graph structure contain information about future scientific impact.**

**Fig. 3: DELPHI leverages temporal dynamics to identify high-impact research early and with state-of-the-art performance characteristics when focused on biotechnology publications.**

**Fig. 4: DELPHI correctly identifies historical biotechnology breakthroughs in a blinded back-testing.**

**Fig. 5: In a world of expanding science and limited resources, quantitative approaches such as DELPHI can be used to help guide research funding allocations to maximize scientific return on investment.**

Data availability

The data analyzed are available for download from https://www.lens.org/. Exemplary datasets and retrieval code are further available from GitHub as described in the ‘Code availability’ section.

Code availability

Exemplary code, datasets, trained models, a visualization application to aid in the analysis of results and Docker-based installation instructions are all available from GitHub at https://github.com/jameswweis/delphi.

Acknowledgements

This work was supported by the consortia of sponsors of the MIT Media Lab and the MIT Center for Bits and Atoms. We thank the AWS Cloud Credits for Research program for computational infrastructure and the Lens Lab for providing publication data.

Author information

Authors and Affiliations

MIT Media Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
James W. Weis & Joseph M. Jacobson
Department of Computational and Systems Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
James W. Weis
MIT Center for Bits and Atoms, Massachusetts Institute of Technology, Cambridge, MA, USA
Joseph M. Jacobson

Authors

James W. Weis
Joseph M. Jacobson

Contributions

J.W.W. and J.M.J. conceived the study. J.W.W. performed the data structuring, algorithm design and computational implementation. J.W.W. and J.M.J. drafted the manuscript and figures. J.M.J. supported and supervised the project.

Corresponding author

Correspondence to James W. Weis.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

**Peer review information** Nature Biotechnology thanks Lutz Bornmann and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

**Publisher’s note** Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 The DELPHI framework exhibits strong performance characteristics across a range of definitions of high-impact and model evaluation criteria.

(a) The DELPHI framework is based on a user-defined definition of high-impact, and the utility of the framework is robust to the specific parameters of that definition. DELPHI models were constructed with a range of threshold definitions between 5% and 25%, and evaluated across a range of criteria to demonstrate this robustness. (b) Those papers in the top 5% of our impact metric, time-rescaled node centrality, contain over 35% of total aggregate impact. As such, the high-impact threshold of 5% was chosen for this study.
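
The concentration statistic in panel (b) can be illustrated with a short calculation. The Python below is a minimal sketch, not the authors' implementation: it assumes non-negative impact scores, and the function name `top_share` and the toy data are hypothetical. It computes the fraction of total impact held by the highest-scoring papers for a given threshold.

```python
import numpy as np


def top_share(impact, fraction):
    """Share of total impact held by the top `fraction` of papers.

    Assumes all scores are non-negative and their sum is positive.
    """
    ranked = np.sort(np.asarray(impact, dtype=float))[::-1]  # highest first
    k = max(1, round(fraction * len(ranked)))                # number of top papers
    return ranked[:k].sum() / ranked.sum()


# Toy data: one standout paper among twenty.
toy_impact = [10] + [1] * 19

# Sweep thresholds across the 5% to 25% range described in panel (a).
shares = {f: top_share(toy_impact, f) for f in (0.05, 0.10, 0.25)}
```

With real scores, the same sweep would show how sensitive the concentration of impact is to the chosen high-impact threshold.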

Supplementary information

Supplementary Information

Supplementary Tables 1 and 2.

Reporting Summary

About this article

Cite this article

Weis, J.W., Jacobson, J.M. Learning on knowledge graph dynamics provides an early warning of impactful research. Nat Biotechnol 39, 1300–1307 (2021). https://doi.org/10.1038/s41587-021-00907-6

Received: 19 February 2020
Revised: 29 December 2020
Accepted: 22 March 2021
Published: 17 May 2021
Issue Date: October 2021
DOI: https://doi.org/10.1038/s41587-021-00907-6
