
Iterative human and automated identification of wildlife images

A preprint version of the article is available at arXiv.

Abstract

Camera trapping is increasingly being used to monitor wildlife, but this technology typically requires extensive data annotation. Recently, deep learning has substantially advanced automatic wildlife recognition. However, current methods are hampered by a dependence on large static datasets, whereas wildlife data are intrinsically dynamic and involve long-tailed distributions. These drawbacks can be overcome through a hybrid combination of machine learning and humans in the loop. Our proposed iterative human and automated identification approach is capable of learning from wildlife imagery data with a long-tailed distribution. Additionally, it includes self-updating learning, which facilitates capturing the community dynamics of rapidly changing natural systems. Extensive experiments show that our approach can achieve ~90% accuracy while requiring only ~20% of the human annotations needed by existing approaches. Our synergistic collaboration of humans and machines transforms deep learning from a relatively inefficient post-annotation tool into a collaborative ongoing annotation tool that vastly reduces the burden of human annotation and enables efficient and constant model updates.
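The confidence-gated loop described in the abstract can be sketched as follows. This is a hypothetical minimal sketch, not the authors' implementation: `triage`, `predict` and the 0.9 threshold are illustrative names and values. High-confidence predictions are accepted as machine labels, while low-confidence images are routed to human annotators; both label sources then feed the next model update.

```python
# Hypothetical sketch of one confidence-gated human-in-the-loop annotation pass.
# `predict` returns (label, confidence); all names here are illustrative.

def triage(images, predict, confidence_threshold=0.9):
    """Split images into a machine-labeled set and a human-review queue."""
    machine_labels, needs_human = {}, []
    for img in images:
        label, conf = predict(img)
        if conf >= confidence_threshold:
            machine_labels[img] = label   # trust the high-confidence prediction
        else:
            needs_human.append(img)       # flag for human annotation
    return machine_labels, needs_human

# Toy usage with a stub model: even image ids are "confident", odd ones are not.
stub = lambda img: ("baboon", 0.95) if img % 2 == 0 else ("unknown", 0.4)
auto, manual = triage(range(6), stub)
```

In a full pipeline, `manual` would be sent to annotators and the combined labels used to retrain the model, closing the loop.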


Fig. 1: Overview of a realistic animal classification system.
Fig. 2: Label efficiency comparison with transfer learning on the group 2 validation set (ordered with respect to training sample size).
Fig. 3: Failure cases.


Acknowledgements

We thank T. Gu, A. Ke, H. Rosen, A. Wu, C. Jurgensen, E. Lai, M. Levy and E. Silverberg for annotating the images used in this study, as well as everyone else involved in this project. Data collection was supported by J. Brashares and through grants to K.M.G. from HHMI BioInteractive, the Rufford Foundation, Idea Wild, the Explorers Club and the UC Berkeley Center for African Studies. We are grateful for the support of Gorongosa National Park, especially M. Stalmans, in permitting and facilitating this research. Z.L. is supported by NTU NAP. K.M.G. is supported by Schmidt Science Fellows in partnership with the Rhodes Trust, and the National Center for Ecological Analysis and Synthesis Director’s Postdoctoral Fellowship. M.S.P. is funded by National Science Foundation grant no. PRFB #1810586.

Author information


Contributions

This study was conceived by Z.M., Z.L., K.M.G. and M.S.P. The methods were designed by Z.M. and Z.L. Code was written by Z.M., and the computations were undertaken by Z.M. with help from Z.L. The main text was drafted by Z.M. and Z.L., with contributions, editing and comments from all authors. K.M.G. and M.S.P. collected all data and oversaw annotation. Z.M. created all figures and tables in consultation with W.M.G., Z.L. and S.X.Y.

Corresponding authors

Correspondence to Zhongqi Miao or Wayne M. Getz.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Machine Intelligence thanks Dan Morris and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 The distribution of images across species in the entire camera trap data set.

There are 55 categories in total. 14 categories were tagged as “unknown” (colored in orange) and used to improve and validate our model’s sensitivity to novel and difficult samples.

Extended Data Fig. 2 The distribution of species across the two groups of data.

We split the data set into two groups to mimic two sequential data collection seasons. In the first group, there are 26 categories (colored in blue). The second group has 41 categories. Group 1 is used in the first period experiment to train a baseline model, and Group 2 is used in the second period experiment to test and update the model.

Extended Data Fig. 3 The overall experimental workflow of our framework.

In the first time step, a baseline model is trained on the group 1 training data, which contain only 26 categories. The classifier is then fine-tuned with the 14 unknown categories using an energy-based loss to increase its sensitivity to out-of-distribution categories. The fine-tuned classifier is next used to predict classifications for the group 2 training data: high-confidence predictions are trusted, while low-confidence predictions are flagged for human annotation. In the final step, both machine and human annotations are used to update the previous model with OLTR and semi-supervised techniques. Once the model is updated, the classifier is again fine-tuned with the energy-based loss for out-of-distribution sensitivity.
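The energy-based sensitivity step above rests on a simple score: the energy of a sample is the negative log-sum-exp of the classifier's logits, and samples with high energy are treated as likely out-of-distribution (after Liu et al., "Energy-based out-of-distribution detection", 2020). The sketch below is hedged: the threshold value and the `route` helper are illustrative assumptions, not values or functions from the paper.

```python
# Hedged sketch of energy-based out-of-distribution routing.
# The -2.0 threshold and the `route` helper are illustrative, not the paper's.
import math

def energy_score(logits):
    """Negative log-sum-exp of the logits; higher energy suggests the sample
    is out-of-distribution (a novel category)."""
    m = max(logits)  # subtract the max to keep the exponentials stable
    return -(m + math.log(sum(math.exp(z - m) for z in logits)))

def route(logits, energy_threshold=-2.0):
    """Send likely-novel samples to human annotators, the rest to the machine."""
    return "human" if energy_score(logits) > energy_threshold else "machine"
```

A confident prediction such as `route([5.0, 0.1, 0.2])` has low energy and stays with the machine, whereas a flat, uncertain logit vector such as `route([0.1, 0.0, 0.2])` has high energy and is flagged for human review.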


About this article


Cite this article

Miao, Z., Liu, Z., Gaynor, K.M. et al. Iterative human and automated identification of wildlife images. Nat Mach Intell 3, 885–895 (2021). https://doi.org/10.1038/s42256-021-00393-0
