Geometric deep learning enables 3D kinematic profiling across species and environments

Abstract

Comprehensive descriptions of animal behavior require precise three-dimensional (3D) measurements of whole-body movements. Although two-dimensional approaches can track visible landmarks in restrictive environments, performance drops in freely moving animals, due to occlusions and appearance changes. Therefore, we designed DANNCE to robustly track anatomical landmarks in 3D across species and behaviors. DANNCE uses projective geometry to construct inputs to a convolutional neural network that leverages learned 3D geometric reasoning. We trained and benchmarked DANNCE using a dataset of nearly seven million frames that relates color videos and rodent 3D poses. In rats and mice, DANNCE robustly tracked dozens of landmarks on the head, trunk, and limbs of freely moving animals in naturalistic settings. We extended DANNCE to datasets from rat pups, marmosets, and chickadees, and demonstrate quantitative profiling of behavioral lineage during development.
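The abstract's statement that projective geometry is used to construct network inputs can be illustrated with a small sketch. The code below is not taken from the DANNCE codebase: the function names (`make_grid`, `unproject_image`), the 3x4 projection-matrix convention, the cubic grid, and the nearest-neighbor sampling are all assumptions chosen for illustration. It fills a 3D voxel grid by projecting each voxel center into one calibrated camera image and copying the pixel found there.

```python
import numpy as np

def make_grid(center, side_mm, n_vox):
    """Return (n_vox**3, 3) world coordinates of voxel centers in a cube.

    Hypothetical helper: the cube is axis-aligned and centered on `center`.
    """
    half = side_mm / 2.0
    step = side_mm / n_vox
    # Voxel centers along one axis, offset half a voxel from the cube faces.
    axis = np.linspace(-half + step / 2.0, half - step / 2.0, n_vox)
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    grid = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)
    return grid + np.asarray(center, dtype=float)

def unproject_image(image, P, grid, n_vox):
    """Fill each voxel with the pixel that its center projects to.

    image: (H, W) or (H, W, C) array; P: assumed 3x4 camera projection matrix.
    Voxels that fall behind the camera or outside the image stay zero.
    """
    homog = np.hstack([grid, np.ones((len(grid), 1))])
    uvw = homog @ P.T                      # homogeneous pixel coordinates
    depth = uvw[:, 2]
    valid = depth > 1e-9                   # keep points in front of the camera
    uv = np.zeros((len(grid), 2), dtype=int)
    uv[valid] = np.rint(uvw[valid, :2] / depth[valid, None]).astype(int)
    h, w = image.shape[:2]
    valid &= (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    vol = np.zeros((len(grid),) + image.shape[2:], dtype=image.dtype)
    vol[valid] = image[uv[valid, 1], uv[valid, 0]]   # row index = v, column index = u
    return vol.reshape((n_vox,) * 3 + image.shape[2:])
```

Under these assumptions, volumes built from several cameras over the same grid could be concatenated along a channel axis before being passed to a 3D convolutional network; how the published method combines views is described in the paper itself.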

Fig. 1: Fully 3D deep learning versus 2D-to-3D triangulation for naturalistic 3D pose detection.
Fig. 2: Rat 7M, a training and benchmark dataset for 3D pose detection.
Fig. 3: DANNCE outperforms DLC on rats with and without markers.
Fig. 4: Kinematic profiling of the mouse behavioral repertoire.
Fig. 5: DANNCE can report the ontogeny of behavioral complexity in rats.
Fig. 6: 3D tracking across the behavioral repertoire of a marmoset and chickadees.
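For readers unfamiliar with the 2D-to-3D triangulation approach named in the title of Fig. 1, the following is a minimal sketch of linear (direct linear transform) triangulation of one landmark from several calibrated views. It is a generic textbook formulation, not the pipeline benchmarked in the paper; the names `project_point` and `triangulate_dlt` and the 3x4 projection-matrix convention are assumptions made for illustration.

```python
import numpy as np

def project_point(P, X):
    """Project a 3D point X with an assumed 3x4 projection matrix P to pixels (u, v)."""
    uvw = P @ np.append(np.asarray(X, dtype=float), 1.0)
    return uvw[:2] / uvw[2]

def triangulate_dlt(points_2d, proj_mats):
    """Least-squares 3D point from two or more 2D detections.

    Each view contributes two linear constraints on the homogeneous 3D point;
    the solution is the right singular vector with the smallest singular value.
    """
    rows = []
    for (u, v), P in zip(points_2d, proj_mats):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]    # convert from homogeneous to Euclidean coordinates
```

Because every view enters only through its 2D detection, an error in any single 2D prediction propagates directly into the 3D estimate, which is the weakness of triangulation that the figure title contrasts with a fully 3D approach.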

Data availability

The Rat 7M video and motion capture datasets are available at https://doi.org/10.6084/m9.figshare.c.5295370.v3. Mouse training labels, video, and DANNCE predictions are available at https://github.com/spoonsso/dannce/. Statistical source data for Figs. 1, 5 and 6 are included with the manuscript. Source data for Fig. 3 are available at https://doi.org/10.6084/m9.figshare.13884038. Marmoset data are subject to additional veterinary restrictions and can be made available upon request. Source data are provided with this paper.

Code availability

The code for DANNCE is available at https://github.com/spoonsso/dannce/ and https://doi.org/10.5281/zenodo.4567514 (ref. 61). Code for analyzing and plotting data is available at https://doi.org/10.5281/zenodo.4571521 (ref. 62). The code for labeling points in 3D is available at https://github.com/diegoaldarondo/Label3D/. The core functions used for the behavioral embedding (ref. 23) are available at https://github.com/jessedmarshall/CAPTURE_demo.

References

  1. Wiltschko, A. B. et al. Mapping sub-second structure in mouse behavior. Neuron 88, 1121–1135 (2015).

  2. Hong, W. et al. Automated measurement of mouse social behaviors using depth sensing, video tracking, and machine learning. Proc. Natl Acad. Sci. USA 112, E5351–E5360 (2015).

  3. Alhwarin, F., Ferrein, A. & Scholl, I. IR Stereo kinect: improving depth images by combining structured light with IR stereo. In PRICAI 2014: Trends in Artificial Intelligence 409–421 (2014).

  4. Mathis, A. et al. DeepLabCut: markerless pose estimation of user-defined body parts with deep learning. Nat. Neurosci. 21, 1281–1289 (2018).

  5. Pereira, T. D. et al. Fast animal pose estimation using deep neural networks. Nat. Methods 16, 117–125 (2019).

  6. Graving, J. M. et al. DeepPoseKit, a software toolkit for fast and robust animal pose estimation using deep learning. eLife 8, e47994 (2019).

  7. Günel, S. et al. DeepFly3D, a deep learning-based approach for 3D limb and appendage tracking in tethered, adult Drosophila. eLife 8, e48571 (2019).

  8. Nath, T. et al. Using DeepLabCut for 3D markerless pose estimation across species and behaviors. Nat. Protoc. 14, 2152–2176 (2019).

  9. Karashchuk, P. et al. Anipose: a toolkit for robust markerless 3D pose estimation. Preprint at bioRxiv https://www.biorxiv.org/content/10.1101/2020.05.26.117325v1 (2020).

  10. Bala, P. C. et al. Automated markerless pose estimation in freely moving macaques with OpenMonkeyStudio. Nat. Commun. 11, 4560 (2020).

  11. Kar, A., Häne, C. & Malik, J. Learning a multi-view stereo machine. In 31st Conference on Neural Information Processing Systems (2017).

  12. Qi, C. R., Nießner, M., Dai, A., Yan, M. & Guibas, L. J. Volumetric and multi-view CNNs for object classification on 3D data. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 5648–5656 (2016).

  13. Chang, J., Moon, G. & Lee, K. V2V-PoseNet: voxel-to-voxel prediction network for accurate 3D hand and human pose estimation from a single depth map. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition 5079–5088 (2018).

  14. Ge, L. et al. 3D Hand shape and pose estimation from a single RGB image. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 10825–10834 (2019).

  15. Pavlakos, G., Zhou, X., Derpanis, K. G. & Daniilidis, K. Harvesting multiple views for marker-less 3D human pose annotations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 6988–6997 (2017).

  16. Iskakov, K., Burkov, E., Lempitsky, V. & Malkov, Y. Learnable triangulation of human pose. In The IEEE International Conference on Computer Vision (ICCV) (2019).

  17. Doersch, C. & Zisserman, A. Sim2real transfer learning for 3D human pose estimation: motion to the rescue. In 33rd Conference on Neural Information Processing Systems (2019).

  18. Tome, D., Toso, M., Agapito, L. & Russell, C. Rethinking pose in 3D: multi-stage refinement and recovery for markerless motion capture. In 2018 International Conference on 3D Vision (3DV) (2018).

  19. Sitzmann, V., Zollhöfer, M. & Wetzstein, G. Scene representation networks: continuous 3D-structure-aware neural scene representations. In 33rd Conference on Neural Information Processing Systems 1–12 (2019).

  20. Zimmermann, C., Schneider, A., Alyahyay, M., Brox, T. & Diester, I. FreiPose: a deep learning framework for precise animal motion capture in 3D spaces. Preprint at bioRxiv https://doi.org/10.1101/2020.02.27.967620 (2020).

  21. Remelli, E., Han, S., Honari, S., Fua, P. & Wang, R. Lightweight multi-view 3D pose estimation through camera-disentangled representation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020).

  22. Sun, X., Xiao, B., Wei, F., Liang, S. & Wei, Y. Integral human pose regression. In European Conference on Computer Vision (ECCV) (2018).

  23. Marshall, J. D. et al. Continuous whole-body 3D kinematic recordings across the rodent behavioral repertoire. Neuron 109, 420–437.e8 (2021).

  24. Berman, G. J., Choi, D. M., Bialek, W. & Shaevitz, J. W. Mapping the stereotyped behaviour of freely moving fruit flies. J. R. Soc. Interface 11, 20140672 (2014).

  25. Guo, Z. V. et al. Flow of cortical activity underlying a tactile decision in mice. Neuron 81, 179–194 (2014).

  26. Machado, A. S., Darmohray, D. M., Fayad, J., Marques, H. G. & Carey, M. R. A quantitative framework for whole-body coordination reveals specific deficits in freely walking ataxic mice. eLife 4, e07892 (2015).

  27. Pozzo, T., Berthoz, A. & Lefort, L. Head stabilization during various locomotor tasks in humans. Exp. Brain Res. 82, 97–106 (1990).

  28. Kalueff, A. V. et al. Neurobiology of rodent self-grooming and its value for translational neuroscience. Nat. Rev. Neurosci. 17, 45–59 (2016).

  29. Tinbergen, N. On aims and methods of ethology. Z. Tierpsychol. 20, 410–433 (1963).

  30. Bolles, R. C. & Woods, P. J. The ontogeny of behaviour in the albino rat. Anim. Behav. 12, 427–441 (1964).

  31. Andrew, R. J. Precocious adult behaviour in the young chick. Anim. Behav. 14, 485–500 (1966).

  32. Marler, P. & Peters, S. Developmental overproduction and selective attrition: new processes in the epigenesis of birdsong. Dev. Psychobiol. 15, 369–378 (1982).

  33. Golani, I. & Fentress, J. C. Early ontogeny of face grooming in mice. Dev. Psychobiol. 18, 529–544 (1985).

  34. Miller, C. T. et al. Marmosets: a neuroscientific model of human social behavior. Neuron 90, 219–233 (2016).

  35. Sigal, L., Balan, A. O. & Black, M. J. HumanEva: synchronized video and motion capture dataset and baseline algorithm for evaluation of articulated human motion. Int. J. Comput. Vis. 87, 4 (2009).

  36. Andriluka, M., Pishchulin, L., Gehler, P. & Schiele, B. 2D human pose estimation: new benchmark and state of the art analysis. In 2014 IEEE Conference on Computer Vision and Pattern Recognition 3686–3693 (2014).

  37. Joo, H. et al. Panoptic Studio: a massively multiview system for social motion capture. In IEEE International Conference on Computer Vision (ICCV) 3334–3342 (2015).

  38. Ionescu, C., Papava, D., Olaru, V. & Sminchisescu, C. Human3.6M: large scale datasets and predictive methods for 3D human sensing in natural environments. IEEE Trans. Pattern Anal. Mach. Intell. 36, 1325–1339 (2014).

  39. Qiu, H., Wang, C., Wang, J., Wang, N. & Zeng, W. Cross view fusion for 3D human pose estimation. In IEEE International Conference on Computer Vision (ICCV) (2019).

  40. Oord, A. van den et al. WaveNet: a generative model for raw audio. In 9th ISCA Speech Synthesis Workshop (2016).

  41. Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).

  42. Vaswani, A. et al. Attention is all you need. Adv. Neural Inf. Process. Syst. 5999–6009 (2017).

  43. Pavllo, D., Feichtenhofer, C., Grangier, D. & Auli, M. 3D human pose estimation in video with temporal convolutions and semi-supervised training. In Conference on Computer Vision and Pattern Recognition (CVPR) (2018).

  44. Zhang, L., Dunn, T. W., Marshall, J. D., Ölveczky, B. P. & Linderman, S. Animal pose estimation from video data with a hierarchical von Mises-Fisher-Gaussian model. In Proceedings of The 24th International Conference on Artificial Intelligence and Statistics 130, 2800–2808 (2021).

  45. Bedford, N. L. & Hoekstra, H. E. Peromyscus mice as a model for studying natural variation. eLife 4, 1–13 (2015).

  46. Dell, A. I. et al. Automated image-based tracking and its application in ecology. Trends Ecol. Evol. 29, 417–428 (2014).

  47. Wiltschko, A. B. et al. Revealing the structure of pharmacobehavioral space through motion sequencing. Nat. Neurosci. 23, 1433–1443 (2020).

  48. Niell, C. M. & Stryker, M. P. Modulation of visual responses by behavioral state in mouse visual cortex. Neuron 65, 472–479 (2010).

  49. Markowitz, J. E. et al. The striatum organizes 3D behavior via moment-to-moment action selection. Cell 174, 44–58.e17 (2018).

  50. Harvey, C. D., Coen, P. & Tank, D. W. Choice-specific sequences in parietal cortex during a virtual-navigation decision task. Nature 484, 62–68 (2012).

  51. Mimica, B., Dunn, B. A., Tombaz, T., Bojja, V. P. T. N. C. S. & Whitlock, J. R. Efficient cortical coding of 3D posture in freely behaving rats. Science 362, 584–589 (2018).

  52. Björklund, A. & Dunnett, S. B. The amphetamine induced rotation test: a re-assessment of its use as a tool to monitor motor impairment and functional recovery in rodent models of Parkinson’s disease. J. Parkinsons. Dis. 9, 17–29 (2019).

  53. Ayaz, A. et al. Layer-specific integration of locomotion and sensory information in mouse barrel cortex. Nat. Commun. 10, 2585 (2019).

  54. Batty, E. et al. BehaveNet: nonlinear embedding and Bayesian neural decoding of behavioral videos. In Advances in Neural Information Processing Systems 32 15706–15717 (2019).

  55. Insafutdinov, E., Pishchulin, L., Andres, B., Andriluka, M. & Schiele, B. DeeperCut: a deeper, stronger, and faster multi-person pose estimation model. In European Conference on Computer Vision (ECCV) (2016).

  56. Hartley, R. & Zisserman, A. Multiple View Geometry in Computer Vision (Cambridge Univ. Press, 2003).

  57. Ronneberger, O., Fischer, P. & Brox, T. U-Net: convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention (MICCAI) 234–241 (2015).

  58. Newell, A., Yang, K. & Deng, J. Stacked hourglass networks for human pose estimation. In European Conference on Computer Vision (ECCV) (2016).

  59. Glorot, X. & Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. J. Mach. Learn. Res. Proc. Track 9, 249–256 (2010).

  60. Stephens, G. J., Johnson-Kerner, B., Bialek, W. & Ryu, W. S. Dimensionality and dynamics in the behavior of C. elegans. PLoS Comput. Biol. 4, e1000028 (2008).

  61. Dunn, T. W. et al. dannce (3-dimensional aligned neural network for computational ethology). https://doi.org/10.5281/zenodo.4567514 (2021).

  62. Dunn, T. W. Analysis code for ‘Geometric deep learning enables 3D kinematic profiling across species and environments.’ https://doi.org/10.5281/zenodo.4571521 (2021).

Acknowledgements

We thank K. Herrera for 3D renderings; M. Tadross for guidance on the text; G. Pho and K. Mizes for assistance with 2D behavioral tracking; M. Shah and the Harvard Center for Brain Science neuroengineers for technical assistance; T. Pereira for discussion; S. Gannon for marmoset labeling assistance; M. Applegate for assisting in chickadee arena design; E. Naumann for illustrations; and the Black Rock Forest Consortium for permission to catch chickadees. J.D.M. acknowledges support from the Helen Hay Whitney Foundation and NINDS (K99NS112597); K.S.S. from the NIH (F32MH122995); D.E.A. from the NSF (DGE1745303); W.L.W. from the Harvard College Research Program; D.G.C.H. from the Kavli Neural Systems Institute and the Leon Levy Foundation; S.N.C. from the NIMH (F32MH123015); D.A. from the Beckman Young Investigator and New York Stem Cell Foundation Robertson Investigator programs; F.W. from the NIH (U19NS107466); T.W.D. from the Donna Bernstein fund and the NIH (R01GM136972); and B.P.Ö. from SFARI (646706), the NIH (R01GM136972), and the Starr Family Foundation.

Author information

Contributions

The project was conceived by B.P.Ö., T.W.D., and J.D.M. T.W.D. conceived and developed DANNCE. J.D.M. and W.L.W. acquired and analyzed rat, low-resolution mouse, and rat-pup datasets. J.D.M. performed all behavioral analysis in pups and generated all behavioral maps. T.W.D. quantified DANNCE performance and performed mouse kinematic analysis. K.S.S. developed multicamera video-acquisition software, performed high-resolution experiments in mice, and contributed to the DANNCE codebase. D.E.A. developed labeling software, contributed to the DANNCE codebase, and assisted with rat-pup dataset analysis. D.G.C.H. performed marmoset experiments. S.N.C. performed chickadee experiments and contributed to the DANNCE codebase. A.J.G. assisted with rat-pup analysis. B.P.Ö., D.E.C., D.A., F.W. and W.A.F. provided financial support and project guidance. T.W.D. and J.D.M. wrote the manuscript with input from all authors. D.E.A. and K.S.S. contributed equally to this work, and this will be noted as such in their CVs.

Corresponding authors

Correspondence to Timothy W. Dunn or Jesse D. Marshall or Bence P. Ölveczky.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Methods thanks Gonzalo de Polavieja, Benjamin Risse, and the other, anonymous, reviewer(s), for their contribution to the peer review of this work. Nina Vogt was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–13 and note.

Reporting Summary

Supplementary Video 1

Animated example of DANNCE tracking capabilities. Blender rendering, animated using real data, of DANNCE tracking capabilities. DANNCE can track the 3D whole-body landmarks of animals in a manner that is robust to occlusions and can be extended to diverse environments and species.

Supplementary Video 2

Fine-tuning 2D pose-detection networks with limited training data does not enable 3D tracking across multiple behaviors. Video recording of a rat behaving in an open field, with landmarks tracked by DeepLabCut, projected onto video frames (left). We trained DeepLabCut with 225 hand-labeled frames. Shown consecutively are the predictions for the markers on the animal’s head and trunk, then the predictions across the full 20-marker set made using six cameras, and then the predictions made using three cameras. Right, concurrent wireframe representations of the landmarks in the animal’s egocentric reference frame. Video speed is 1.5 times real time.

Supplementary Video 3

DANNCE, but not DLC, tracks 3D pose in Rat 7M validation data. Video recording of a rat with markers behaving in an open field, with tracked 3D landmarks projected onto video frames (left). Right, concurrent wireframe representations of the tracked 3D landmarks. Shown consecutively are DLC and DANNCE predictions on the validation animal with 3 cameras and then with 6 cameras. Video speed is 3.33 times real time.

Supplementary Video 4

DANNCE can track 3D kinematics in rats using a single camera. Shown consecutively are reprojections of 3D predictions on new camera views from a validation animal not used for training, followed by 3D wireframe predictions for each of six cameras on validation frames in a training animal. All predictions come from passing single-camera volumes through the DANNCE network. The 3D wireframe plot axes are in units of millimeters. Video speed is four times real time.

Supplementary Video 5

DANNCE, but not DLC, tracks 3D markerless rat pose. Video recording of a markerless rat behaving in an open field, with DLC-tracked 3D landmarks projected onto 2 different cameras. Coloring of the predicted landmarks reveals prediction errors that then lead to poor 3D triangulation performance. DLC predictions are then shown concurrent with DANNCE predictions. In this final segment, projections of 3D predictions are plotted on video frames (left), together with wireframe representations of the 3D predictions (right). Predictions were made using six cameras for networks trained on Rat 7M and without any additional training data. Video speed is four times real time.

Supplementary Video 6

DANNCE readily extends to 3D tracking in mice. Video recording of a mouse behaving in an open field, with tracked landmarks projected onto video frames (left). Right, concurrent wireframe representations of the tracked 3D landmarks. Predictions were made using six cameras. Video speed is two times real time. The 3D wireframe plot axes are in units of millimeters.

Supplementary Video 7

DANNCE enables kinematic profiling across behaviors in mice. A series of examples of mouse behavior sampled from behavioral clusters distributed across different categories in the behavioral embedding space. For each cluster sampled (left), four different examples of the mouse’s behavior in that cluster are shown, with DANNCE landmark predictions overlaid (right). Videos are shown at real time and repeated twice.

Supplementary Video 8

DANNCE recordings across behavioral development. A series of examples of video recordings with tracked landmarks reprojected (left) and DANNCE 3D predictions (right) across timepoints of behavioral development. Video speed is three times real time.

Supplementary Video 9

Markerless 3D tracking across the marmoset behavioral repertoire. Left, video recording of a marmoset behaving in a naturalistic enclosure, with tracked landmarks projected onto video frames. Right, corresponding DANNCE 3D landmark predictions. Video speed is four times real time. The 3D wireframe plot axes are in units of millimeters.

Supplementary Video 10

Markerless 3D tracking across the chickadee behavioral repertoire. Left, video recording of a chickadee behaving in a naturalistic enclosure, with tracked landmarks projected onto video frames. Right, concurrent wireframe representations of the tracked 3D landmarks. Video speed is two times real time.

Source data

Source Data Fig. 1

Statistical Source Data.

Source Data Fig. 5

Statistical Source Data.

Source Data Fig. 6

Zip file containing two .csv files of Statistical Source Data for marmosets and chickadees.

About this article

Cite this article

Dunn, T.W., Marshall, J.D., Severson, K.S. et al. Geometric deep learning enables 3D kinematic profiling across species and environments. Nat Methods 18, 564–573 (2021). https://doi.org/10.1038/s41592-021-01106-6
