Prediction of robust scientific facts from literature

A preprint version of the article is available at arXiv.


The growth of published science in recent years has escalated the difficulty that human and algorithmic agents face in reasoning over prior knowledge to select the next experiment. This challenge is increased by uncertainty about the reproducibility of published findings. The availability of massive digital archives, machine reading, extraction tools and automated high-throughput experiments allows us to evaluate these challenges computationally at scale and identify novel opportunities to craft policies that accelerate scientific progress. Here we demonstrate a Bayesian calculus that enables positive prediction of robust scientific claims with findings extracted from published literature, weighted by scientific, social and institutional factors demonstrated to increase replicability. Illustrated with the case of gene regulatory interactions, our approach automatically estimates and counteracts sources of bias, revealing that scientifically focused but socially and institutionally diverse research activity is most likely to replicate. This results in updated certainty about the literature, which accurately predicts robust scientific facts on which new experiments should build. Our findings allow us to identify and evaluate policy recommendations for scientific institutions that may increase robust scientific knowledge, including sponsorship of increased diversity of and independence between investigations of any particular scientific phenomenon, and diversity of scientific phenomena investigated.
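The Bayesian updating described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each published claim asserts a sign (+1 for a positive interaction, -1 for a negative one), that each claim comes with an estimated probability of being correct, and that claims are conditionally independent given the true sign. The function name `posterior_positive` and the input format are hypothetical.

```python
import math


def posterior_positive(claims, prior_positive=0.5):
    """Posterior probability that an interaction is positive.

    claims: iterable of (sign, p_correct) pairs, where sign is +1 or -1 and
    p_correct is the (assumed, externally estimated) probability that the
    claim reports the true sign. Claims are treated as conditionally
    independent given the true sign, which is a simplifying assumption.
    """
    if not 0.0 < prior_positive < 1.0:
        raise ValueError("prior_positive must lie strictly between 0 and 1")
    # Work in log-odds so that each claim contributes an additive term.
    log_odds = math.log(prior_positive / (1.0 - prior_positive))
    for sign, p_correct in claims:
        if sign not in (1, -1):
            raise ValueError("sign must be +1 or -1")
        if not 0.0 < p_correct < 1.0:
            raise ValueError("p_correct must lie strictly between 0 and 1")
        # A claim of '+' multiplies the odds by p/(1-p); a claim of '-'
        # multiplies them by (1-p)/p.
        log_odds += sign * math.log(p_correct / (1.0 - p_correct))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Three concordant positive claims, each judged 60% reliable,
# raise a flat prior to roughly 0.77.
example = posterior_positive([(1, 0.6), (1, 0.6), (1, 0.6)])
```

Under these assumptions, a reliable claim shifts the posterior more than an unreliable one, and two equally reliable claims of opposite sign cancel each other.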


Fig. 1: Analysis synopsis.
Fig. 2: Feature visualization and estimates from claim-level prediction models.
Fig. 3: Positivity bias in published effects and prediction results.
Fig. 4: Science policy experiments revealing the relationship between community independence, collective attention and certainty about genetic regulatory interactions.


Data availability

To illustrate our pipeline, we used the publicly available GeneWays and Literome datasets, linked with Clarivate’s Web of Science database of bibliographic information. While we cannot share the Web of Science data, we share a linked file that includes all claims of interest and the citation metadata required to perform the described analyses.

Code availability

Our code is publicly available.




Acknowledgements

We thank V. Sitnik, V. Danchev and P. Saleiro for fruitful discussions, Y. Babuji for technical help, H. Poon for suggestions regarding the formulation of the project and I. Mayzus, R. Melamed and O. Kel-Margoulis for help with the annotation and the interpretation of biological datasets. We are grateful for comments from participants of the MetaScience Conference at Stanford (2019), and for meetings associated with the Defense Advanced Research Projects Agency (DARPA) Big Mechanism programme. We acknowledge funding from DARPA (14145043, J.E. and A.V.B.; HR00111820006, J.E., A.V.B. and A.R.), the Air Force Office of Scientific Research (FA9550-19-1-0354, J.E.; FA9550-15-1-0162, J.E.), the National Science Foundation (SBE-1829366, J.E.; 1422902, J.E.; 1158803, J.E.) and the John Templeton Foundation to the ‘Metaknowledge Network’ (J.E. and A.R.).

Author information

Contributions

A.V.B. proposed and implemented the methodology, validated the model, analysed the data and drafted the paper. J.E. was responsible for conception and funding of the project, contributed to the design of the methodology and drafted the paper. A.R. provided feedback on the experimental work and data interpretation, and participated in drafting the paper. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Alexander V. Belikov or James Evans.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Luis Amaral and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Illustration of core interaction and claim variables.

Directed regulatory interactions between genes constitute communities of researchers who study them. Features regarding the position of a claim within prior knowledge are derived from its relationship to other genetic regulatory interactions. Features regarding the breadth and independence of support are derived from the connection between publications making claims about the same interaction.

Extended Data Fig. 2 Correlation between claim value and experimental strength across the claim frequency distribution.

Correlation of mean claim value \(\hat \mu ^\alpha\) and interaction strength \(\hat \pi ^\alpha\) from LINCS L1000 as a function of threshold on minimum claim sequence length per interaction for GeneWays (a) and Literome (b).
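As a rough illustration of the quantity plotted here, the sketch below computes a Pearson correlation between mean claim value and experimental strength while raising a threshold on the minimum number of claims per interaction. The record layout `(n_claims, mean_claim_value, strength)` and the function names are assumptions made for this example, not the authors' data format.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns None when it is undefined."""
    n = len(xs)
    if n < 2:
        return None
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def correlation_by_threshold(records, thresholds):
    """Map each threshold t to the correlation over interactions
    that have at least t claims.

    records: iterable of (n_claims, mean_claim_value, strength) tuples
    (a hypothetical layout chosen for this sketch).
    """
    records = list(records)
    result = {}
    for t in thresholds:
        kept = [(m, s) for n_claims, m, s in records if n_claims >= t]
        result[t] = pearson([m for m, _ in kept], [s for _, s in kept])
    return result


# Toy data: sparsely studied interactions disagree with experiment,
# frequently studied ones agree.
toy_records = [
    (1, 1.0, -1.0),
    (1, -1.0, 1.0),
    (5, 0.2, 0.1),
    (5, 0.4, 0.2),
    (6, 0.8, 0.4),
]
toy_result = correlation_by_threshold(toy_records, [1, 5, 7])
```

In the toy data the correlation is negative when all interactions are included and rises once sparsely studied interactions are filtered out; a threshold that leaves fewer than two interactions yields `None`.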

Extended Data Fig. 3 Data-driven thresholds to partition interactions into neutral, negative and positive interactions for analysis.

\(C_0\), \(C_-\) and \(C_+\) correspond to classes of neutral, negative and positive genetic regulatory interactions. Distance between \(C_0\) and \(C_-\) (\(W\left( {g_0,g_ - ,\theta _ - ,\theta _ + } \right)\), solid green) and between \(C_0\) and \(C_+\) (\(W\left( {g_0,g_ + ,\theta _ - ,\theta _ + } \right)\), solid blue), together with the number of claims in \(C_-\) (dotted green) and in \(C_+\) (dotted blue), for GeneWays (a) and Literome (b). Distance between \(C_0\) and \(C_-\) (\(W\left( {g_0,g_ - ,\theta _ - ,\theta _ + } \right)\)) in GeneWays (c) and Literome (d); distance between \(C_0\) and \(C_+\) (\(W\left( {g_0,g_ + ,\theta _ - ,\theta _ + } \right)\)) in GeneWays (e) and Literome (f).

Extended Data Fig. 4 Pearson correlation between core analysis variables in both GeneWays and Literome datasets.

Heat map indicating correlation between: (a) claim correctness \(y_i^\alpha\) and batch-level features for GeneWays (top row) and Literome (bottom row); (b) claim correctness \(y_i^\alpha\) and claim-level features for GeneWays (top row) and Literome (bottom row); (c) interaction non-neutrality \(\pi _0^\alpha\) and interaction-level features for GeneWays (top row) and Literome (bottom row); (d) interaction positivity \(\pi _ + ^\alpha\) and interaction-level features for GeneWays (top row) and Literome (bottom row).

Extended Data Fig. 5 Variable importance and significance in models of the non-neutrality and positivity of genetic regulatory interactions.

Family importances of random forest model (left, darker shade) and logistic regression coefficients (right, lighter shade) for the model of classification of neutral interactions (top) and positive interactions (bottom) for GeneWays (left) and Literome (right). Vertical centered lines show the 95% confidence interval for the mean of the corresponding importance/coefficient.

Extended Data Fig. 6 Analysis of the relationship between the distribution of claims per interaction and overall certainty about those interactions.

Examples of claim number distribution ρ(nα) per interaction for test subsamples from GeneWays (a) and Literome (b). Information gain as a function of the slope β of the claim number distribution for GeneWays (c) and Literome (d). Solid lines correspond to binned averages and shaded regions denote a one-standard-deviation interval of the data.

Extended Data Fig. 7 Survival functions (complements of the cumulative distribution functions) of claim number per interaction.

Survival functions for GeneWays (a) and Literome (b), for all interactions and for nonzero interactions, where the probability distribution function is modeled as \(\rho \propto n^\gamma\). The exponents γ equal 2.26 and 2.01 for GeneWays for all and non-neutral interactions, respectively, and 2.5 and 2.26 for Literome for all and non-neutral interactions. The exponents were obtained by maximum likelihood estimation.
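The caption does not say which maximum likelihood estimator was used. As an illustration only, the sketch below uses the standard closed-form estimator for a power-law tail, assuming a decaying density of the form \(\rho \propto n^{-\gamma}\) with \(\gamma > 1\) above a cutoff `x_min`. The optional half-unit shift for integer counts is a common approximation that is known to be rough when `x_min` is small. The function name and parameters are hypothetical.

```python
import math


def powerlaw_exponent_mle(values, x_min=1.0, discrete=False):
    """Closed-form MLE of gamma for a density proportional to x**(-gamma).

    Continuous data: gamma = 1 + n / sum(ln(x_i / x_min)).
    Integer data (discrete=True): the same formula with x_min replaced by
    x_min - 0.5, an approximation that is only rough for small x_min.
    Values below x_min are ignored.
    """
    reference = x_min - 0.5 if discrete else x_min
    if reference <= 0:
        raise ValueError("x_min is too small for the chosen estimator")
    tail = [x for x in values if x >= x_min]
    if not tail:
        raise ValueError("no observations at or above x_min")
    log_sum = sum(math.log(x / reference) for x in tail)
    if log_sum <= 0:
        raise ValueError("all observations equal x_min; exponent is undefined")
    return 1.0 + len(tail) / log_sum


# Three observations at e with x_min = 1 give a log-sum of 3, hence gamma = 2.
gamma_toy = powerlaw_exponent_mle([math.e, math.e, math.e], x_min=1.0)
```

For claim counts per interaction, which are integers, one would call the function with `discrete=True`; an exact discrete estimator would instead require numerical optimization of the zeta-normalized likelihood.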

Extended Data Fig. 8 Model selection using ROC AUC values for all models.

Neutral interaction models (a-c,g-i) and positive interaction models (d-f,j-l). Left: the distribution of ROC AUC as a function of random forest depth (a,d,g,j). Center: the distribution of ROC AUC as a function of minimum number of samples in a decision tree leaf (b,e,h,k). Right: the distribution of ROC AUC as a function of the number of trees in a random forest (c,f,i,l).
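For readers unfamiliar with the metric used for model selection here, ROC AUC can be computed directly as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below is a plain quadratic-time version written for clarity; the function name is hypothetical and it is not taken from the authors' code.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison of positive and negative scores.

    labels: sequence of 0/1 class labels.
    scores: sequence of model scores, higher meaning 'more likely positive'.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Three of the four positive/negative pairs are ordered correctly.
auc_toy = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In a model selection loop of the kind summarized in this figure, a function like this would be evaluated on held-out data for each hyperparameter setting (for example tree depth, minimum samples per leaf or number of trees) and the resulting distributions compared.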

Extended Data Fig. 9 Science policy experiments revealing the relationship between community independence, collective attention, and certainty about genetic regulatory interactions (complement to Fig. 4).

a, Relationship between the number of communities studying a particular genetic regulatory interaction and the average AUC of out-of-sample predictions for positive interactions. b, Distribution of the average AUC curves for Literome for interactions with 1, 2–3 and greater than 4 communities. c, Relationship between the shape of the distribution of the number of claims per interaction and the AUC of out-of-sample predictions for positive interactions. β represents the slope of the claim number per interaction distribution for Literome.

Extended Data Fig. 10 Positivity bias in published effects and prediction results for Literome (complement to Fig. 3); random forest Gini Importance scores and logistic regression coefficients for features from Literome (complement to Fig. 2b).

a, Joint plot of the mean experimental interaction strength (x-axis) and mean value of the published claim (y-axis) for each genetic interaction. More intense hues of red (and greater marker size) correspond to the interactions in Literome with 10 or more claims per interaction; for less intense hues (and smaller marker size) the cutoff is absent, representing the complete distribution. (See Fig. 3a for the comparable GeneWays distribution.) b, We first predicted the nonexistence or existence of each published gene-gene regulatory interaction (Literome). c, Then, if the interaction was deemed existent, we predicted whether each claim (of positivity or negativity) from literature was correct. d, Using Bayesian inference, we estimated the sign (positive vs negative) of all genetic regulatory interactions. Mean ROC curves in bold are complemented by 95% c.i. contours, with fainter individual lines corresponding to ROC curves for 60 models corresponding to different training/validation samples. (Complement to Fig. 3 in the main manuscript.) e, Gini Importance or Mean Decrease in Impurity for features in the random forest models (left vertical scale, bold colors), and coefficients from the logistic regression models (right vertical scale, fainter colors) for Literome. Vertical bars represent 95% c.i. for the mean value of the estimate.

Supplementary information

Supplementary Information

Supplementary Discussion and Tables 1–3.


Cite this article

Belikov, A.V., Rzhetsky, A. & Evans, J. Prediction of robust scientific facts from literature. Nat Mach Intell 4, 445–454 (2022).
