Abstract
Quantification of behaviours of interest from video data is commonly used to study brain function, the effects of pharmacological interventions, and genetic alterations. Existing approaches, however, cannot analyse the behaviour of groups of animals in complex environments. We present a novel deep learning architecture for classifying individual and social animal behaviour, even in complex environments and directly from raw video frames, that requires no intervention after initial human supervision. Our behavioural classifier is embedded in a pipeline (SIPEC) that performs segmentation, identification, pose estimation and classification of complex behaviour, outperforming the state of the art. SIPEC successfully recognizes multiple behaviours of freely moving individual mice, as well as of socially interacting non-human primates in three dimensions, using data from only simple mono-vision cameras in home-cage set-ups.
Data availability
Mouse data from Sturman and colleagues20 are available at https://zenodo.org/record/3608658. Example mouse data for training are available through our GitHub repository. The primate videos are available to the scientific community on request to V.M. (valerio@ini.uzh.ch).
Code availability
We provide the code for SIPEC at https://github.com/SIPEC-Animal-Data-Analysis/SIPEC (https://doi.org/10.5281/zenodo.5927367) and the GUI for the identification of animals at https://github.com/SIPEC-Animal-Data-Analysis/idtracking_gui.
References
Datta, S. R., Anderson, D. J., Branson, K., Perona, P. & Leifer, A. Computational neuroethology: a call to action. Neuron 104, 11–24 (2019).
Mathis, A. et al. DeepLabCut: markerless pose estimation of user-defined body parts with deep learning. Nat. Neurosci. 21, 1281–1289 (2018).
Geuther, B. Q. et al. Robust mouse tracking in complex environments using neural networks. Commun. Biol. 2, 124 (2019).
Romero-Ferrero, F., Bergomi, M. G., Hinz, R. C., Heras, F. J. & de Polavieja, G. idtracker.ai: tracking all individuals in small or large collectives of unmarked animals. Nat. Methods 16, 179 (2019).
Forys, B. J., Xiao, D., Gupta, P. & Murphy, T. H. Real-time selective markerless tracking of forepaws of head fixed mice using deep neural networks. eNeuro 7, ENEURO.0096-20.2020 (2020).
Pereira, T. D. et al. Fast animal pose estimation using deep neural networks. Nat. Methods 16, 117 (2019).
Graving, J. M. et al. DeepPoseKit, a software toolkit for fast and robust animal pose estimation using deep learning. eLife 8, e47994 (2019).
Bala, P. C. et al. Automated markerless pose estimation in freely moving macaques with OpenMonkeyStudio. Nat. Commun. 11, 4560 (2020).
Günel, S. et al. DeepFly3D, a deep learning-based approach for 3D limb and appendage tracking in tethered, adult Drosophila. eLife 8, e48571 (2019).
Chen, Z. et al. AlphaTracker: a multi-animal tracking and behavioral analysis tool. Preprint at https://www.biorxiv.org/content/10.1101/2020.12.04.405159v1 (2020).
Lauer, J. et al. Multi-animal pose estimation and tracking with DeepLabCut. Preprint at https://www.biorxiv.org/content/10.1101/2021.04.30.442096v1 (2021).
Wiltschko, A. B. et al. Mapping sub-second structure in mouse behavior. Neuron 88, 1121–1135 (2015).
Hsu, A. I. & Yttri, E. A. B-SOiD: an open source unsupervised algorithm for discovery of spontaneous behaviors. Nat. Commun. 12, 5188 (2021).
Berman, G. J., Choi, D. M., Bialek, W. & Shaevitz, J. W. Mapping the stereotyped behaviour of freely moving fruit flies. J. R. Soc. Interface 11, 20140672 (2014).
Whiteway, M. R. et al. Partitioning variability in animal behavioral videos using semi-supervised variational autoencoders. PLoS Comput. Biol. 17, e1009439 (2021).
Calhoun, A. J., Pillow, J. W. & Murthy, M. Unsupervised identification of the internal states that shape natural behavior. Nat. Neurosci. 22, 2040–2049 (2019).
Batty, E. et al. BehaveNet: Nonlinear Embedding and Bayesian Neural Decoding of Behavioral Videos (NeurIPS, 2019).
Nilsson, S. R. et al. Simple behavioral analysis (SimBA)—an open source toolkit for computer classification of complex social behaviors in experimental animals. Preprint at https://www.biorxiv.org/content/10.1101/2020.04.19.049452v2 (2020).
Segalin, C. et al. The Mouse Action Recognition System (MARS) software pipeline for automated analysis of social behaviors in mice. eLife 10, e63720 (2021).
Sturman, O. et al. Deep learning-based behavioral analysis reaches human accuracy and is capable of outperforming commercial solutions. Neuropsychopharmacology 45, 1942–1952 (2020).
Nourizonoz, A. et al. EthoLoop: automated closed-loop neuroethology in naturalistic environments. Nat. Methods 17, 1052–1059 (2020).
Branson, K., Robie, A. A., Bender, J., Perona, P. & Dickinson, M. H. High-throughput ethomics in large groups of Drosophila. Nat. Methods 6, 451–457 (2009).
Dankert, H., Wang, L., Hoopfer, E. D., Anderson, D. J. & Perona, P. Automated monitoring and analysis of social behavior in Drosophila. Nat. Methods 6, 297–303 (2009).
Kabra, M., Robie, A. A., Rivera-Alba, M., Branson, S. & Branson, K. JAABA: interactive machine learning for automatic annotation of animal behavior. Nat. Methods 10, 64 (2013).
Jhuang, H. et al. Automated home-cage behavioural phenotyping of mice. Nat. Commun. 1, 68 (2010).
Hayden, B. Y., Park, H. S. & Zimmermann, J. Automated pose estimation in primates. Am. J. Primatol. https://doi.org/10.1002/ajp.23348 (2021).
He, K., Gkioxari, G., Dollár, P. & Girshick, R. Mask R-CNN. In Proc. IEEE International Conference on Computer Vision 2961–2969 (IEEE, 2017).
Huang, G., Liu, Z., Van Der Maaten, L. & Weinberger, K. Q. Densely connected convolutional networks. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 4700–4708 (IEEE, 2017).
Cho, K. et al. Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) 1724–1734 (Association for Computational Linguistics, 2014).
Chung, J., Gulcehre, C., Cho, K. & Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. In NeurIPS 2014 Workshop on Deep Learning (2014).
Deb, D. et al. Face recognition: primates in the wild. Preprint at https://arxiv.org/abs/1804.08790 (2018).
Chollet, F. Xception: deep learning with depthwise separable convolutions. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 1251–1258 (IEEE, 2017).
Van den Oord, A. et al. WaveNet: a generative model for raw audio. Preprint at https://arxiv.org/abs/1609.03499 (2016).
Bai, S., Kolter, J. Z. & Koltun, V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. Preprint at https://arxiv.org/abs/1803.01271 (2018).
Jung, A. B. et al. imgaug (GitHub, 2020); https://github.com/aleju/imgaug
Yosinski, J., Clune, J., Bengio, Y. & Lipson, H. How transferable are features in deep neural networks? In Advances in Neural Information Processing Systems 3320–3328 (NeurIPS, 2014).
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 770–778 (IEEE, 2016).
Lin, T.-Y. et al. Microsoft COCO: Common Objects in Context. In European Conference on Computer Vision 740–755 (Springer, 2014).
Dutta, A. & Zisserman, A. The VIA annotation software for images, audio and video. In Proc. 27th ACM International Conference on Multimedia (ACM, 2019); https://doi.org/10.1145/3343031.3350535
Xiao, B., Wu, H. & Wei, Y. Simple baselines for human pose estimation and tracking. In Computer Vision – ECCV 2018 (eds. Ferrari, V., Hebert, M., Sminchisescu, C. & Weiss, Y.) 472–487 (Springer International Publishing, 2018).
Tan, M. & Le, Q. V. EfficientNet: rethinking model scaling for convolutional neural networks. Preprint at https://arxiv.org/abs/1905.11946 (2020).
Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 1097–1105 (NeurIPS, 2012).
Vidal, M., Wolf, N., Rosenberg, B., Harris, B. P. & Mathis, A. Perspectives on individual animal identification from biology and computer vision. Integr. Comp. Biol. 61, 900–916 (2021).
Demšar, J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006).
Tenenbaum, J. B., de Silva, V. & Langford, J. C. A global geometric framework for nonlinear dimensionality reduction. Science 290, 2319–2323 (2000).
Lin, T.-Y. et al. Feature pyramid networks for object detection. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 936–944 (IEEE, 2017); https://doi.org/10.1109/CVPR.2017.106
Girshick, R. Fast R-CNN. In 2015 IEEE International Conference on Computer Vision (ICCV) 1440–1448 (IEEE, 2015); https://doi.org/10.1109/ICCV.2015.169
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929−1958 (2014).
Ioffe, S. & Szegedy, C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on International Conference on Machine Learning Vol. 37, 448–456 (JMLR.org, 2015).
Maas, A. L., Hannun, A. Y. & Ng, A. Y. Rectifier nonlinearities improve neural network acoustic models. In Proc. 30th International Conference on Machine Learning (ICML, 2013).
Xu, B., Wang, N., Chen, T. & Li, M. Empirical evaluation of rectified activations in convolutional network. Preprint at https://arxiv.org/abs/1505.00853 (2015).
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR, 2014).
Lin, T.-Y., Goyal, P., Girshick, R., He, K. & Dollár, P. Focal loss for dense object detection. In 2017 IEEE International Conference on Computer Vision (ICCV, 2017).
Bohnslav, J. P. et al. DeepEthogram: a machine learning pipeline for supervised behavior classification from raw pixels. eLife 10, e63377 (2021).
Abadi, M. et al. TensorFlow: large-scale machine learning on heterogeneous distributed systems. Preprint at https://arxiv.org/abs/1603.04467 (2016).
Chollet, F. Keras (GitHub, 2015); https://github.com/fchollet/keras
Acknowledgements
This project was funded by the Swiss Federal Institute of Technology (ETH) Zurich and the European Research Council (ERC) under an ERC Consolidator Award (grant no. 818179 to M.F.Y.), the SNSF (grant no. CRSII5_198739/1 to M.F.Y.; grant no. 310030_172889/1 to J.B.; grant no. PP00P3_157539 to V.M.), an ETH Research Grant (grant no. ETH-20 19-1 to J.B.), the 3RCC (grant no. OC-2019-009 to J.B. and M.F.Y.), the Simons Foundation (award nos. 328189 and 543013 to V.M.) and the Botnar Foundation (to J.B.). We would like to thank P. Tornmalm and V. de La Rochefoucauld for annotating primate data and for feedback on primate behaviour, and P. Johnson, B. Yasar, B. Wu and A. Shah for helpful discussions and feedback.
Author information
Authors and Affiliations
Contributions
M.M. developed, implemented and evaluated the SIPEC modules and framework. J.Q. developed segmentation filtering, tracking and three-dimensional estimation. M.M., W.B. and M.F.Y. wrote the manuscript. M.M., O.S., L.v.Z., S.K., W.B., V.M., J.B. and M.F.Y. conceptualized the study. All authors gave feedback on the manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Machine Intelligence thanks Adam Kepecs and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Individual mouse segmentation.
SIPEC:SegNet performance for single-mouse segmentation, measured as mAP, Dice and IoU, as a function of the number of labels. Lines indicate means across 5-fold cross-validation; circles, squares and triangles indicate mAP, Dice and IoU, respectively, for individual folds. All data are represented by the mean, showing all points.
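For reference, Dice and IoU for a predicted versus ground-truth binary mask can be computed as in the minimal sketch below (mAP additionally aggregates detections across confidence thresholds and is not shown); this is an illustrative implementation, not the exact evaluation code used in SIPEC.

```python
import numpy as np

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over union of two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    return float(np.logical_and(pred, gt).sum() / union) if union else 1.0

def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice coefficient of two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    return float(2 * np.logical_and(pred, gt).sum() / denom) if denom else 1.0
```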
Extended Data Fig. 2 Identification performance of mice across days and interventions.
Identification accuracy across days for models trained on day 1. Performance is very high on the day the model was trained on; it drops when tested on day 2 but remains significantly above chance level. When tested on day 3, after a forced swim test intervention, performance drops significantly. All data are represented by the mean, showing all points.
Extended Data Fig. 3 Identification of typical vs difficult frames.
a) Examples of very difficult frames, which are beyond human single-frame recognition; these frames are excluded from the ‘typical’ frame evaluation. b) Example frames used for the ‘typical’ frame analysis. c) Identification performance is significantly higher on ‘typical’ frames than on all frames. All data are represented by the mean, showing all points.
Extended Data Fig. 4 Additional behavioural evaluation.
a) The overall increase in F1 score is driven by increased recall for grooming events and increased precision for unsupported rearing events. b) Comparison of F1 scores and Pearson correlations of SIPEC:BehaveNet with human-to-human performance and with the combined model. Using pose estimates in conjunction with raw-pixel classification increases precision compared with raw-pixel classification alone, at the cost of decreased recall. All data are represented by Tukey box-and-whisker plots, showing all points. Wilcoxon paired test: *P ≤ 0.05; ***P ≤ 0.001; ****P ≤ 0.0001.
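As an illustration of the combined-model idea, the sketch below fuses a raw-pixel stream with a pose-estimate stream in Keras. The input shapes, layer sizes and number of behaviour classes are assumptions for illustration only, not the architecture used in SIPEC.

```python
from tensorflow.keras import layers, Model

n_keypoints = 12           # assumed number of pose keypoints
frame_shape = (75, 75, 3)  # assumed size of the masked animal crop

# Raw-pixel stream: a small CNN over the cropped frame.
frame_in = layers.Input(shape=frame_shape)
x = layers.Conv2D(32, 3, activation="relu")(frame_in)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)

# Pose stream: flattened (x, y) keypoint coordinates.
pose_in = layers.Input(shape=(n_keypoints * 2,))
p = layers.Dense(64, activation="relu")(pose_in)

# Fuse both streams and classify the behaviour of the current frame.
merged = layers.Concatenate()([x, p])
out = layers.Dense(4, activation="softmax")(merged)  # assumed 4 behaviour classes

model = Model(inputs=[frame_in, pose_in], outputs=out)
model.compile(optimizer="adam", loss="categorical_crossentropy")
```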
Extended Data Fig. 5 3D depth estimates based on mask size.
The inverse of the square root of the mask size (based on the SIPEC:SegNet output) correlates strongly with the depth of the individual in 3D space.
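A minimal sketch of this depth proxy, assuming a binary mask from SIPEC:SegNet; obtaining absolute distances would additionally require a calibration fit, which is not shown here.

```python
import numpy as np

def relative_depth_from_mask(mask: np.ndarray) -> float:
    """Depth proxy from a binary segmentation mask: 1 / sqrt(mask area).

    The value is only proportional to depth; mapping it to a metric distance
    would require a calibration fit (assumption, not shown here).
    """
    area = float(np.count_nonzero(mask))
    return 1.0 / np.sqrt(area) if area > 0 else float("inf")
```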
Extended Data Fig. 6 Comparison of counts of behaviours between SIPEC:BehaveNet, pose estimation based approach and human raters.
Unsupported rears, supported rears and grooming events were counted per video for n = 20 videos of different mice. Behaviours were integrated over multiple frames, as described in Sturman et al. Behavioural counts of three different human expert annotators were averaged (labelled ‘human ground truth’ in the legend). No significant differences were found when comparing the number of behaviours between SIPEC:BehaveNet and human annotators, or between Sturman et al. and human annotators (Tukey’s multiple comparison test). All data are represented by the mean, showing all points.
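One simple way to integrate per-frame labels into event counts is to collapse runs of identical consecutive labels, as in the sketch below; the exact integration procedure (for example, minimum event durations) follows Sturman et al. and may differ from this illustration.

```python
from itertools import groupby

def count_events(frame_labels, min_run=1):
    """Count behaviour events by collapsing runs of identical per-frame labels.

    frame_labels: sequence of per-frame behaviour labels (None = no behaviour).
    min_run: assumed minimum number of consecutive frames to count as an event.
    """
    counts = {}
    for label, run in groupby(frame_labels):
        run_length = sum(1 for _ in run)
        if label is not None and run_length >= min_run:
            counts[label] = counts.get(label, 0) + 1
    return counts

# Example: two grooming events and one rearing event.
print(count_events(["groom", "groom", None, "rear", "rear", "groom"]))
```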
Supplementary information
Supplementary Information
Supplementary Figs. 1–9 and Table 1.
Supplementary Video 1
A short example video of behaving primates in their home-cage environment. SIPEC:SegNet is used to mask the different primates and SIPEC:IdNet is used to identify them. During obstructions, the identity assigned to a primate can switch, but SIPEC:IdNet quickly recovers the correct identity over the next frames as the animal becomes more visible and therefore easier to identify.
Supplementary Video 2
A comparison of tracking four mice by idtracker.ai (left) and SIPEC (right). We used publicly available data from idtracker.ai (https://drive.google.com/drive/folders/1Vua7zd6VuH6jc-NAd1U5iey4wU5bNrm4) as well as idtracker.ai’s publicly available inference results (https://www.youtube.com/watch?v=ANsThSPgBFM) for the tracking comparison. Left: the tracking by idtracker.ai exhibits prolonged label-switching errors, in which the labels of two or more animals are swapped for some time. Right: tracking is performed by SIPEC:SegNet in conjunction with greedy mask matching to track the identities of the animals. In this example video, SIPEC is more robust to such errors than idtracker.ai (see also Supplementary Video 4).
Supplementary Video 3
Tracking of four mice by SIPEC in an open-field test. The masks generated by SIPEC:SegNet in conjunction with greedy mask matching are used to robustly track identities of four mice in an open-field test (see Methods).
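The greedy mask matching referred to above is detailed in the Methods; the following is only a minimal sketch of one greedy, IoU-based assignment between masks in consecutive frames, under the assumption that each identity is propagated to the current mask with the highest overlap.

```python
import numpy as np

def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection over union of two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0

def greedy_match(prev_masks, curr_masks):
    """Greedily assign current-frame masks to previous identities by highest IoU."""
    pairs = sorted(
        ((mask_iou(p, c), i, j)
         for i, p in enumerate(prev_masks)
         for j, c in enumerate(curr_masks)),
        reverse=True,
    )
    assignment, used_prev, used_curr = {}, set(), set()
    for overlap, i, j in pairs:
        if overlap == 0.0 or i in used_prev or j in used_curr:
            continue
        assignment[i] = j  # previous identity i continues as current mask j
        used_prev.add(i)
        used_curr.add(j)
    return assignment
```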
Supplementary Video 4
SIPEC tracking over a 52-min video. We used publicly available data from idtracker.ai (https://drive.google.com/drive/folders/1Vua7zd6VuH6jc-NAd1U5iey4wU5bNrm4) and tracked four mice. The masks generated by SIPEC:SegNet in conjunction with greedy mask matching are used to robustly track the identities of the four mice in an open-field test (see Methods).
About this article
Cite this article
Marks, M., Jin, Q., Sturman, O. et al. Deep-learning-based identification, tracking, pose estimation and behaviour classification of interacting primates and mice in complex environments. Nat Mach Intell 4, 331–340 (2022). https://doi.org/10.1038/s42256-022-00477-5