Abstract
The scientific ecosystem relies on citation-based metrics that provide only imperfect, inconsistent and easily manipulated measures of research quality. Here we describe DELPHI (Dynamic Early-warning by Learning to Predict High Impact), a framework that provides an early-warning signal for ‘impactful’ research by autonomously learning high-dimensional relationships among features calculated across time from the scientific literature. We prototype this framework and deduce its performance and scaling properties on time-structured publication graphs from 1980 to 2019 drawn from 42 biotechnology-related journals, including over 7.8 million individual nodes, 201 million relationships and 3.8 billion calculated metrics. We demonstrate the framework’s performance by correctly identifying 19/20 seminal biotechnologies from 1980 to 2014 via a blinded retrospective study and provide 50 research papers from 2018 that DELPHI predicts will be in the top 5% of time-rescaled node centrality in the future. We propose DELPHI as a tool to aid in the construction of diversified, impact-optimized funding portfolios.
Data availability
The data analyzed are available for download from https://www.lens.org/. Exemplary datasets and retrieval code are further available from GitHub as described in the ‘Code availability’ section.
Code availability
Exemplary code, datasets, trained models, a visualization application to aid in the analysis of results and Docker-based installation instructions are all available from GitHub at https://github.com/jameswweis/delphi.
Acknowledgements
This work was supported by the consortia of sponsors of the MIT Media Lab and the MIT Center for Bits and Atoms. We thank the AWS Cloud Credits for Research program for computational infrastructure and the Lens Lab for providing publication data.
Author information
Contributions
J.W.W. and J.M.J. conceived the study. J.W.W. performed the data structuring, algorithm design and computational implementation. J.W.W. and J.M.J. drafted the manuscript and figures. J.M.J. supported and supervised the project.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information: Nature Biotechnology thanks Lutz Bornmann and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 The DELPHI framework exhibits strong performance characteristics across a range of definitions of high impact and model evaluation criteria.
(a) The DELPHI framework is based on a user-defined definition of high-impact, and the utility of the framework is robust to the specific parameters of that definition. DELPHI models were constructed with a range of threshold definitions between 5% and 25%, and evaluated across a range of criteria to demonstrate this robustness. (b) Those papers in the top 5% of our impact metric, time-rescaled node centrality, contain over 35% of total aggregate impact. As such, the high-impact threshold of 5% was chosen for this study.
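The thresholding described in this caption can be illustrated with a minimal sketch. It assumes each paper already has a single non-negative impact score; the function names `top_share` and `label_high_impact` are hypothetical and are not taken from the DELPHI code base, and the sketch does not compute time-rescaled node centrality itself.

```python
def _top_count(n, fraction):
    """Number of items in the top group: rounded, but never fewer than one."""
    return max(1, round(fraction * n))


def top_share(scores, fraction=0.05):
    """Share of the total score held by the top `fraction` of items.

    `scores` is any sequence of non-negative impact values (an assumption
    of this sketch, not a statement about the published metric).
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(scores, reverse=True)
    total = sum(ranked)
    if total == 0:
        return 0.0
    return sum(ranked[:_top_count(len(ranked), fraction)]) / total


def label_high_impact(scores, fraction=0.05):
    """Boolean label per item: True if its score reaches the top-`fraction` cutoff.

    Items tied with the cutoff score are all labelled True, so the number
    of positives can exceed the nominal top-group size when ties occur.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    ranked = sorted(scores, reverse=True)
    cutoff = ranked[_top_count(len(ranked), fraction) - 1]
    return [s >= cutoff for s in scores]
```

Calling `top_share(scores, 0.05)` on a real score vector would give the kind of concentration figure reported in panel (b), and varying `fraction` between 0.05 and 0.25 mirrors the range of threshold definitions explored in panel (a).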
Supplementary information
Supplementary Tables 1 and 2.
About this article
Cite this article
Weis, J.W., Jacobson, J.M. Learning on knowledge graph dynamics provides an early warning of impactful research. Nat Biotechnol 39, 1300–1307 (2021). https://doi.org/10.1038/s41587-021-00907-6
DOI: https://doi.org/10.1038/s41587-021-00907-6