

  • Comment

Data leakage jeopardizes ecological applications of machine learning

Machine learning is a popular tool in ecology, but many scientific applications suffer from data leakage, which produces misleading results. We highlight common pitfalls in ecological machine-learning methods and argue that discipline-specific model info sheets must be developed to aid model evaluation.
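As a generic illustration of the kind of leakage described here (not an example taken from the article itself), the sketch below runs feature selection on pure noise. Choosing predictors with all labels before cross-validation lets test-fold information leak into training and inflates the estimated accuracy, while repeating the selection inside each training fold removes the leak. The synthetic data, the nearest-centroid classifier and the function names (`top_k_features`, `nearest_centroid_predict`, `cv_accuracy`) are hypothetical choices made to keep the example small and dependent only on NumPy.

```python
import numpy as np


def top_k_features(X, y, k):
    """Indices of the k columns of X most correlated (in absolute value) with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc.T @ yc) / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.argsort(-np.abs(corr))[:k]


def nearest_centroid_predict(X_train, y_train, X_test):
    """Predict the class (0 or 1) whose training-set centroid is closest."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    d0 = ((X_test - c0) ** 2).sum(axis=1)
    d1 = ((X_test - c1) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)


def cv_accuracy(X, y, k, n_folds, leaky, seed=0):
    """k-fold CV accuracy. If leaky, features are selected once using ALL labels."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    # Leaky variant: selection sees the labels of every future test fold.
    selected_all = top_k_features(X, y, k) if leaky else None
    correct = 0
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        # Honest variant: selection uses the training fold only.
        selected = selected_all if leaky else top_k_features(X[train_idx], y[train_idx], k)
        pred = nearest_centroid_predict(
            X[train_idx][:, selected], y[train_idx], X[test_idx][:, selected]
        )
        correct += int((pred == y[test_idx]).sum())
    return correct / len(y)


# Pure noise: 50 samples, 5000 random predictors, labels unrelated to X.
rng = np.random.default_rng(42)
X = rng.normal(size=(50, 5000))
y = rng.permutation(np.repeat([0, 1], 25))

leaky = cv_accuracy(X, y, k=100, n_folds=5, leaky=True)
honest = cv_accuracy(X, y, k=100, n_folds=5, leaky=False)
print(f"leaky CV accuracy:  {leaky:.2f}")
print(f"honest CV accuracy: {honest:.2f}")
```

Because the labels are random, any accuracy well above 0.5 is an artefact of leakage: with this setup the leaky estimate typically comes out far higher than the honest one, which stays around chance.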



Fig. 1: How data leakage might occur in ecological applications, explained through the lens of shortcut learning.
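A second, equally hypothetical sketch shows a leakage route often raised in ecological modelling: spatial autocorrelation between training and test sites. On synthetic transect data whose response depends only on position, a 1-nearest-neighbour model can "predict" held-out sites by copying an adjacent training site when folds are drawn at random, whereas contiguous spatial blocks force it to extrapolate. The sine-shaped response, the noise level, the fold scheme and the function names (`one_nn_predict`, `cv_rmse`) are assumptions for illustration; blocked cross-validation is shown as one common remedy, not as the authors' method.

```python
import numpy as np


def one_nn_predict(x_train, y_train, x_test):
    """Predict each test site as the response at the nearest training site."""
    idx = np.abs(x_test[:, None] - x_train[None, :]).argmin(axis=1)
    return y_train[idx]


def cv_rmse(x, y, folds):
    """Root-mean-square error of 1-NN predictions pooled over the test folds."""
    sq_err = []
    for test_idx in folds:
        train_mask = np.ones(len(x), dtype=bool)
        train_mask[test_idx] = False
        pred = one_nn_predict(x[train_mask], y[train_mask], x[test_idx])
        sq_err.append((pred - y[test_idx]) ** 2)
    return float(np.sqrt(np.concatenate(sq_err).mean()))


rng = np.random.default_rng(0)
n = 300
x = np.sort(rng.uniform(0.0, 1.0, n))  # hypothetical site positions along a transect
# Spatially autocorrelated response: a smooth function of position plus noise.
y = np.sin(2 * np.pi * 3 * x) + rng.normal(0.0, 0.1, n)

n_folds = 5
random_folds = np.array_split(rng.permutation(n), n_folds)
# x is sorted, so consecutive index ranges are contiguous spatial blocks.
block_folds = np.array_split(np.arange(n), n_folds)

rmse_random = cv_rmse(x, y, random_folds)
rmse_block = cv_rmse(x, y, block_folds)
print(f"random-fold RMSE:  {rmse_random:.2f}")
print(f"spatial-block RMSE: {rmse_block:.2f}")
```

Random folds typically report an error close to the noise level, because each test site has a near-duplicate neighbour in the training set; blocked folds reveal how poorly the same model transfers to unsampled stretches of the transect.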


Acknowledgements

We were supported by a Liber Ero Postdoctoral Fellowship (A.S.) and NSERC Discovery Grant RGPIN-2020-05032 (K.M.A.C.).

Author information


Correspondence to Andy Stock.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Ecology & Evolution thanks the anonymous reviewers for their contribution to the peer review of this work.

Supplementary information


Supplementary Figure 1.


About this article


Cite this article

Stock, A., Gregr, E.J. & Chan, K.M.A. Data leakage jeopardizes ecological applications of machine learning. Nat Ecol Evol 7, 1743–1745 (2023). https://doi.org/10.1038/s41559-023-02162-1
