Validity of machine learning in biology and medicine increased through collaborations across fields of expertise

Littmann, Maria; Selig, Katharina; Cohen-Lavi, Liel; Frank, Yotam; Hönigschmid, Peter; Kataka, Evans; Mösch, Anja; Qian, Kun; Ron, Avihai; Schmid, Sebastian; Sorbie, Adam; Szlak, Liran; Dagan-Wiener, Ayana; Ben-Tal, Nir; Niv, Masha Y.; Razansky, Daniel; Schuller, Björn W.; Ankerst, Donna; Hertz, Tomer; Rost, Burkhard

doi:10.1038/s42256-019-0139-8

Perspective
Published: 13 January 2020

Nature Machine Intelligence volume 2, pages 18–24 (2020)

Abstract

Machine learning (ML) has become an essential asset for the life sciences and medicine. We selected 250 articles describing ML applications from 17 journals sampling 26 different fields between 2011 and 2016. Independent evaluation by two readers highlighted three results. First, only half of the articles shared software, 64% shared data and 81% applied any kind of evaluation. Although crucial for ensuring the validity of ML applications, these aspects were met more by publications in lower-ranked journals. Second, the authors’ scientific backgrounds highly influenced how technical aspects were addressed: reproducibility and computational evaluation methods were more prominent with computational co-authors; experimental proofs more with experimentalists. Third, 73% of the ML applications resulted from interdisciplinary collaborations comprising authors from at least two of the three disciplines: computational sciences, biology, and medicine. The results suggested collaborations between computational and experimental scientists to generate more scientifically sound and impactful work integrating knowledge from both domains. Although scientifically more valid solutions and collaborations involving diverse expertise did not correlate with impact factors, such collaborations provide opportunities to both sides: computational scientists are given access to novel and challenging real-world biological data, increasing the scientific impact of their research, and experimentalists benefit from more in-depth computational analyses improving the technical correctness of work.

**Fig. 1: Spearman correlation coefficients for numeric and binary variables.**

**Fig. 2: Method validation, comparison and data and programme sharing depends on author expertise.**

**Fig. 3: Sharing and method comparison hardly impact citations.**

**Fig. 4: Number of citations and impact factor not consistently higher for collaborations.**

Acknowledgements

Thanks to T. Karl and I. Weise (both TUM) for invaluable help with technical and administrative aspects of this work. Thanks to the TUM Graduate School (in particular Z. Zhang) for organizing the summer school, to the TUM (in particular H. Keidel and W. Herrmann) for substantial support on several levels including financing the summer school, to the Weizmann Institute, Tel Aviv University, Technion and Hebrew University for financial and general support; thanks also to the enlightening talks by D. Cremers (TUM), M. Linial (IAS Israel, Hebrew University), Y. Ofran (Bar-Ilan University); thanks to PubMed for providing easy access to published articles and supporting automatic access; thanks to the maintainers of Biopython for providing excellent code to access various databases and process biological data. Last, but not least, thanks to all maintainers of public databases and to all experimentalists who enabled this analysis by making their data publicly available. This work was supported by grant no. 640508 from the Deutsche Forschungsgemeinschaft (DFG).

Author information

These authors contributed equally: Maria Littmann, Katharina Selig.

Authors and Affiliations

Department of Informatics, Bioinformatics and Computational Biology, Technical University of Munich, Garching/Munich, Germany
Maria Littmann & Burkhard Rost
Department of Mathematics, Technical University of Munich, Garching/Munich, Germany
Katharina Selig & Donna Ankerst
National Institute for Biotechnology in the Negev, Ben-Gurion University of the Negev, Be’er-Sheva, Israel
Liel Cohen-Lavi & Tomer Hertz
Department of Industrial Engineering and Management, Ben-Gurion University of the Negev, Be’er-Sheva, Israel
Liel Cohen-Lavi
The Blavatnik School of Computer Science, Tel-Aviv University, Ramat Aviv, Israel
Yotam Frank
Department of Bioinformatics, Wissenschaftszentrum Weihenstephan, Technical University of Munich, Freising, Germany
Peter Hönigschmid, Evans Kataka & Anja Mösch
Chair of Human-Machine Communication, Technical University of Munich, Munich, Germany
Kun Qian
Educational Physiology Laboratory, Graduate School of Education, The University of Tokyo, Tokyo, Japan
Kun Qian
Institute for Biological and Medical Imaging, Helmholtz Center Munich, Neuherberg, Germany
Avihai Ron & Daniel Razansky
Faculty of Medicine, Technical University of Munich, Munich, Germany
Avihai Ron & Daniel Razansky
Chair of Food Chemistry and Molecular Sensory Science, Technical University of Munich, Freising, Germany
Sebastian Schmid
Chair of Nutrition and Immunology, Technical University of Munich, Freising, Germany
Adam Sorbie
Weizmann Institute of Science, Rehovot, Israel
Liran Szlak
The Institute of Biochemistry, Food and Nutrition, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University, Rehovot, Israel
Ayana Dagan-Wiener & Masha Y. Niv
Department of Biochemistry and Molecular Biology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
Nir Ben-Tal
The Fritz Haber Center for Molecular Dynamics, The Hebrew University, Jerusalem, Israel
Masha Y. Niv
Faculty of Medicine, University of Zurich, Zurich, Switzerland
Daniel Razansky
Institute of Pharmacology and Toxicology, University of Zurich, Zurich, Switzerland
Daniel Razansky
Institute for Biomedical Engineering, University of Zurich and ETH Zurich, Zurich, Switzerland
Daniel Razansky
Department of Information Technology and Electrical Engineering, ETH Zurich, Zurich, Switzerland
Daniel Razansky
Group on Language, Audio and Music, Imperial College London, London, UK
Björn W. Schuller
The Shraga Segal Department of Microbiology and Immunology, Ben-Gurion University of the Negev, Be’er-Sheva, Israel
Tomer Hertz
Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
Tomer Hertz
Institute for Advanced Study, Garching/Munich, Germany
Burkhard Rost
School of Life Sciences, Technical University of Munich, Freising, Germany
Burkhard Rost
Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
Burkhard Rost

Contributions

M.L. and K.S. performed the major part of the data analysis and of writing the manuscript. M.L. created and adapted the predefined list of articles. K.S. generated figures and performed statistical tests. L.C. assisted in finding interesting correlations in the data by performing complex analyses and statistical tests, and in generating figures. M.L., K.S., L.C., Y.F., P.H., E.K., A.M., K.Q., A.R., S.S., A.S., L.S. and A.D.-W. participated in the summer school where the idea for this work was developed, were involved in agreeing on the goals and analysis methods of this work, were involved in data analysis by collecting data from the predefined list of articles, and assisted in writing the manuscript. M.L., K.S. and A.M. collected the data for 2018. N.B.-T., M.Y.N., D.R. and B.W.S. supervised the work over the entire time and proofread the manuscript. D.A. provided valuable comments, especially regarding statistical analysis, and was involved in manuscript writing. T.H. and B.R. initiated and supervised the summer school where the idea for this project was developed. T.H. provided important comments to refine the analysis and contributed to manuscript writing. B.R. supervised and guided the work over the entire time and proofread the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Maria Littmann or Katharina Selig.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Figures, Supplementary Tables, Supplementary Methods

Supplementary Datasets 1–4

About this article

Cite this article

Littmann, M., Selig, K., Cohen-Lavi, L. et al. Validity of machine learning in biology and medicine increased through collaborations across fields of expertise. Nat Mach Intell 2, 18–24 (2020). https://doi.org/10.1038/s42256-019-0139-8

Received: 09 August 2019
Accepted: 06 December 2019
Published: 13 January 2020
Issue Date: January 2020
DOI: https://doi.org/10.1038/s42256-019-0139-8
