Accurate data-driven prediction does not mean high reproducibility

A valid machine learning model is predictive, but a predictive model is not necessarily valid. The gap between the two can be larger than many practitioners expect.

Fig. 1: Accuracy versus validity.

Acknowledgements

This work has been supported by ARC Discovery Project grant DP170101306 and NHMRC grant 1123042.

Author information

Corresponding author

Correspondence to Jiuyong Li.

Ethics declarations

Competing interests

The authors declare no competing interests.

About this article

Cite this article

Li, J., Liu, L., Le, T.D. et al. Accurate data-driven prediction does not mean high reproducibility. Nat Mach Intell 2, 13–15 (2020). https://doi.org/10.1038/s42256-019-0140-2

Further reading

  • Integration of mechanistic immunological knowledge into a machine learning pipeline improves predictions

    Anthony Culos, Amy S. Tsai, Natalie Stanley, Martin Becker, Mohammad S. Ghaemi, David R. McIlwain, Ramin Fallahzadeh, Athena Tanada, Huda Nassar, Camilo Espinosa, Maria Xenochristou, Edward Ganio, Laura Peterson, Xiaoyuan Han, Ina A. Stelzer, Kazuo Ando, Dyani Gaudilliere, Thanaphong Phongpreecha, Ivana Marić, Alan L. Chang, Gary M. Shaw, David K. Stevenson, Sean Bendall, Kara L. Davis, Wendy Fantl, Garry P. Nolan, Trevor Hastie, Robert Tibshirani, Martin S. Angst, Brice Gaudilliere & Nima Aghaeepour

    Nature Machine Intelligence (2020)
