Comprehensive descriptions of animal behavior require precise three-dimensional (3D) measurements of whole-body movements. Although two-dimensional (2D) approaches can track visible landmarks in restrictive environments, performance drops in freely moving animals due to occlusions and appearance changes. Therefore, we designed DANNCE to robustly track anatomical landmarks in 3D across species and behaviors. DANNCE uses projective geometry to construct inputs to a convolutional neural network that leverages learned 3D geometric reasoning. We trained and benchmarked DANNCE using a dataset of nearly seven million frames that relates color videos and rodent 3D poses. In rats and mice, DANNCE robustly tracked dozens of landmarks on the head, trunk, and limbs of freely moving animals in naturalistic settings. We extended DANNCE to datasets from rat pups, marmosets, and chickadees, and demonstrated quantitative profiling of behavioral lineage during development.
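The "projective geometry" step described above, constructing volumetric network inputs by sampling each camera's image at the projections of 3D voxel centers, can be sketched roughly as follows. This is an illustrative simplification, not the DANNCE implementation: the function name `unproject_to_volume` is hypothetical, a real pipeline would use multi-channel images, bilinear sampling, and one volume per camera fed to the 3D network.

```python
import numpy as np

def unproject_to_volume(image, P, grid_centers):
    """Sample a 2D image at the projections of 3D voxel centers.

    image: (H, W) single-channel image.
    P: (3, 4) camera projection matrix (intrinsics @ extrinsics).
    grid_centers: (N, 3) world coordinates of voxel centers.
    Returns an (N,) vector of nearest-pixel samples, a stand-in for
    the bilinear sampling a real implementation would use.
    """
    H, W = image.shape
    # Homogeneous world coordinates -> homogeneous pixel coordinates.
    homo = np.hstack([grid_centers, np.ones((len(grid_centers), 1))])
    proj = homo @ P.T
    px = proj[:, :2] / proj[:, 2:3]  # perspective divide -> (u, v)
    # Nearest-neighbor lookup, clamped to the image bounds.
    u = np.clip(np.round(px[:, 0]).astype(int), 0, W - 1)
    v = np.clip(np.round(px[:, 1]).astype(int), 0, H - 1)
    return image[v, u]
```

Repeating this sampling for every camera aligns all views into a shared 3D grid, which is what lets the downstream network reason geometrically rather than per-view.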
The Rat 7M video and motion capture datasets are available at https://doi.org/10.6084/m9.figshare.c.5295370.v3. Mouse training labels, video, and DANNCE predictions are available at https://github.com/spoonsso/dannce/. Statistical source data for Figs. 1, 5 and 6 are included with the manuscript. Source data for Fig. 3 are available at https://doi.org/10.6084/m9.figshare.13884038. Marmoset data are subject to additional veterinary restrictions and can be made available upon request. Source data are provided with this paper.
The code for DANNCE is available at https://github.com/spoonsso/dannce/ and https://doi.org/10.5281/zenodo.4567514 (ref. 61). Code for analyzing and plotting data is available at https://doi.org/10.5281/zenodo.4571521 (ref. 62). The code for labeling points in 3D is available at https://github.com/diegoaldarondo/Label3D/. The core functions used for the behavioral embedding (ref. 23) are available at https://github.com/jessedmarshall/CAPTURE_demo.
Wiltschko, A. B. et al. Mapping sub-second structure in mouse behavior. Neuron 88, 1121–1135 (2015).
Hong, W. et al. Automated measurement of mouse social behaviors using depth sensing, video tracking, and machine learning. Proc. Natl Acad. Sci. USA 112, E5351–E5360 (2015).
Alhwarin, F., Ferrein, A. & Scholl, I. IR stereo Kinect: improving depth images by combining structured light with IR stereo. In PRICAI 2014: Trends in Artificial Intelligence 409–421 (2014).
Mathis, A. et al. DeepLabCut: markerless pose estimation of user-defined body parts with deep learning. Nat. Neurosci. 21, 1281–1289 (2018).
Pereira, T. D. et al. Fast animal pose estimation using deep neural networks. Nat. Methods 16, 117–125 (2019).
Graving, J. M. et al. DeepPoseKit, a software toolkit for fast and robust animal pose estimation using deep learning. eLife 8, e47994 (2019).
Günel, S. et al. DeepFly3D, a deep learning-based approach for 3D limb and appendage tracking in tethered, adult Drosophila. eLife 8, e48571 (2019).
Nath, T. et al. Using DeepLabCut for 3D markerless pose estimation across species and behaviors. Nat. Protoc. 14, 2152–2176 (2019).
Karashchuk, P. et al. Anipose: a toolkit for robust markerless 3D pose estimation. Preprint at bioRxiv https://doi.org/10.1101/2020.05.26.117325 (2020).
Bala, P. C. et al. Automated markerless pose estimation in freely moving macaques with OpenMonkeyStudio. Nat. Commun. 11, 4560 (2020).
Kar, A., Häne, C. & Malik, J. Learning a multi-view stereo machine. In 31st Conference on Neural Information Processing Systems (2017).
Qi, C. R., Nießner, M., Dai, A., Yan, M. & Guibas, L. J. Volumetric and multi-view CNNs for object classification on 3D data. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 5648–5656 (2016).
Chang, J., Moon, G. & Lee, K. V2V-PoseNet: voxel-to-voxel prediction network for accurate 3D hand and human pose estimation from a single depth map. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition 5079–5088 (2018).
Ge, L. et al. 3D hand shape and pose estimation from a single RGB image. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 10825–10834 (2019).
Pavlakos, G., Zhou, X., Derpanis, K. G. & Daniilidis, K. Harvesting multiple views for marker-less 3D human pose annotations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 6988–6997 (2017).
Iskakov, K., Burkov, E., Lempitsky, V. & Malkov, Y. Learnable triangulation of human pose. In The IEEE International Conference on Computer Vision (ICCV) (2019).
Doersch, C. & Zisserman, A. Sim2real transfer learning for 3D human pose estimation: motion to the rescue. In 33rd Conference on Neural Information Processing Systems (2019).
Tome, D., Toso, M., Agapito, L. & Russell, C. Rethinking pose in 3D: multi-stage refinement and recovery for markerless motion capture. In 2018 International Conference on 3D Vision (3DV) (2018).
Sitzmann, V., Zollhöfer, M. & Wetzstein, G. Scene representation networks: continuous 3D-structure-aware neural scene representations. In 33rd Conference on Neural Information Processing Systems 1–12 (2019).
Zimmermann, C., Schneider, A., Alyahyay, M., Brox, T. & Diester, I. FreiPose: a deep learning framework for precise animal motion capture in 3D spaces. Preprint at bioRxiv https://doi.org/10.1101/2020.02.27.967620 (2020).
Remelli, E., Han, S., Honari, S., Fua, P. & Wang, R. Lightweight multi-view 3D pose estimation through camera-disentangled representation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020).
Sun, X., Xiao, B., Wei, F., Liang, S. & Wei, Y. Integral human pose regression. In European Conference on Computer Vision (ECCV) (2018).
Marshall, J. D. et al. Continuous whole-body 3D kinematic recordings across the rodent behavioral repertoire. Neuron 109, 420–437.e8 (2021).
Berman, G. J., Choi, D. M., Bialek, W. & Shaevitz, J. W. Mapping the stereotyped behaviour of freely moving fruit flies. J. R. Soc. Interface 11, 20140672 (2014).
Guo, Z. V. et al. Flow of cortical activity underlying a tactile decision in mice. Neuron 81, 179–194 (2014).
Machado, A. S., Darmohray, D. M., Fayad, J., Marques, H. G. & Carey, M. R. A quantitative framework for whole-body coordination reveals specific deficits in freely walking ataxic mice. eLife 4, e07892 (2015).
Pozzo, T., Berthoz, A. & Lefort, L. Head stabilization during various locomotor tasks in humans. Exp. Brain Res. 82, 97–106 (1990).
Kalueff, A. V. et al. Neurobiology of rodent self-grooming and its value for translational neuroscience. Nat. Rev. Neurosci. 17, 45–59 (2016).
Tinbergen, N. On aims and methods of ethology. Z. Tierpsychol. 20, 410–433 (1963).
Bolles, R. C. & Woods, P. J. The ontogeny of behaviour in the albino rat. Anim. Behav. 12, 427–441 (1964).
Andrew, R. J. Precocious adult behaviour in the young chick. Anim. Behav. 14, 485–500 (1966).
Marler, P. & Peters, S. Developmental overproduction and selective attrition: new processes in the epigenesis of birdsong. Dev. Psychobiol. 15, 369–378 (1982).
Golani, I. & Fentress, J. C. Early ontogeny of face grooming in mice. Dev. Psychobiol. 18, 529–544 (1985).
Miller, C. T. et al. Marmosets: a neuroscientific model of human social behavior. Neuron 90, 219–233 (2016).
Sigal, L., Balan, A. O. & Black, M. J. HumanEva: synchronized video and motion capture dataset and baseline algorithm for evaluation of articulated human motion. Int. J. Comput. Vis. 87, 4 (2009).
Andriluka, M., Pishchulin, L., Gehler, P. & Schiele, B. 2D human pose estimation: new benchmark and state of the art analysis. In 2014 IEEE Conference on Computer Vision and Pattern Recognition 3686–3693 (2014).
Joo, H. et al. Panoptic Studio: a massively multiview system for social motion capture. In IEEE International Conference on Computer Vision (ICCV) 3334–3342 (2015).
Ionescu, C., Papava, D., Olaru, V. & Sminchisescu, C. Human3.6M: large scale datasets and predictive methods for 3D human sensing in natural environments. IEEE Trans. Pattern Anal. Mach. Intell. 36, 1325–1339 (2014).
Qiu, H., Wang, C., Wang, J., Wang, N. & Zeng, W. Cross view fusion for 3D human pose estimation. In IEEE International Conference on Computer Vision (ICCV) (2019).
van den Oord, A. et al. WaveNet: a generative model for raw audio. In 9th ISCA Speech Synthesis Workshop (2016).
Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).
Vaswani, A. et al. Attention is all you need. Adv. Neural Inf. Process. Syst. 5999–6009 (2017).
Pavllo, D., Feichtenhofer, C., Grangier, D. & Auli, M. 3D human pose estimation in video with temporal convolutions and semi-supervised training. In Conference on Computer Vision and Pattern Recognition (CVPR) (2018).
Zhang, L., Dunn, T. W., Marshall, J. D., Ölveczky, B. P. & Linderman, S. Animal pose estimation from video data with a hierarchical von Mises-Fisher-Gaussian model. In Proceedings of The 24th International Conference on Artificial Intelligence and Statistics 130, 2800–2808 (2021).
Bedford, N. L. & Hoekstra, H. E. Peromyscus mice as a model for studying natural variation. eLife 4, 1–13 (2015).
Dell, A. I. et al. Automated image-based tracking and its application in ecology. Trends Ecol. Evol. 29, 417–428 (2014).
Wiltschko, A. B. et al. Revealing the structure of pharmacobehavioral space through motion sequencing. Nat. Neurosci. 23, 1433–1443 (2020).
Niell, C. M. & Stryker, M. P. Modulation of visual responses by behavioral state in mouse visual cortex. Neuron 65, 472–479 (2010).
Markowitz, J. E. et al. The striatum organizes 3D behavior via moment-to-moment action selection. Cell 174, 44–58.e17 (2018).
Harvey, C. D., Coen, P. & Tank, D. W. Choice-specific sequences in parietal cortex during a virtual-navigation decision task. Nature 484, 62–68 (2012).
Mimica, B., Dunn, B. A., Tombaz, T., Bojja, V. P. T. N. C. S. & Whitlock, J. R. Efficient cortical coding of 3D posture in freely behaving rats. Science 362, 584–589 (2018).
Björklund, A. & Dunnett, S. B. The amphetamine induced rotation test: a re-assessment of its use as a tool to monitor motor impairment and functional recovery in rodent models of Parkinson’s disease. J. Parkinsons. Dis. 9, 17–29 (2019).
Ayaz, A. et al. Layer-specific integration of locomotion and sensory information in mouse barrel cortex. Nat. Commun. 10, 2585 (2019).
Batty, E. et al. BehaveNet: nonlinear embedding and Bayesian neural decoding of behavioral videos. In Advances in Neural Information Processing Systems 32 15706–15717 (2019).
Insafutdinov, E., Pishchulin, L., Andres, B., Andriluka, M. & Schiele, B. DeeperCut: a deeper, stronger, and faster multi-person pose estimation model. In European Conference on Computer Vision (ECCV) (2016).
Hartley, R. & Zisserman, A. Multiple View Geometry in Computer Vision (Cambridge Univ. Press, 2003).
Ronneberger, O., Fischer, P. & Brox, T. U-Net: convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention (MICCAI) 234–241 (2015).
Newell, A., Yang, K. & Deng, J. Stacked hourglass networks for human pose estimation. In European Conference on Computer Vision (ECCV) (2016).
Glorot, X. & Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. J. Mach. Learn. Res. Proc. Track 9, 249–256 (2010).
Stephens, G. J., Johnson-Kerner, B., Bialek, W. & Ryu, W. S. Dimensionality and dynamics in the behavior of C. elegans. PLoS Comput. Biol. 4, e1000028 (2008).
Dunn, T. W. et al. dannce (3-dimensional aligned neural network for computational ethology). https://doi.org/10.5281/zenodo.4567514 (2021).
Dunn, T. W. Analysis code for ‘Geometric deep learning enables 3D kinematic profiling across species and environments.’ https://doi.org/10.5281/zenodo.4571521 (2021).
We thank K. Herrera for 3D renderings; M. Tadross for guidance on the text; G. Pho and K. Mizes for assistance with 2D behavioral tracking; M. Shah and the Harvard Center for Brain Science neuroengineers for technical assistance; T. Pereira for discussion; S. Gannon for marmoset labeling assistance; M. Applegate for assisting in chickadee arena design; E. Naumann for illustrations; and the Black Rock Forest Consortium for permission to catch chickadees. J.D.M. acknowledges support from the Helen Hay Whitney Foundation and NINDS (K99NS112597). K.S.S. from NIH (F32MH122995), D.E.A. from the NSF (DGE1745303), W.L.W. from Harvard College Research Program, D.G.C.H. from Kavli Neural Systems Institute and the Leon Levy Foundation, S.N.C. from the NIMH (F32MH123015), D.A. from Beckman Young Investigator and New York Stem Cell Foundation Robertson Investigator programs, F.W. from the NIH (U19NS107466), T.W.D. from the Donna Bernstein fund and the NIH (R01GM136972), and B.P.Ö. from SFARI (646706), the NIH (R01GM136972), and the Starr Family Foundation.
The authors declare no competing interests.
Peer review information Nature Methods thanks Gonzalo de Polavieja, Benjamin Risse, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Nina Vogt was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Figs. 1–13 and note.
Animated example of DANNCE tracking capabilities. Blender rendering, animated using real data, of DANNCE tracking capabilities. DANNCE can track the 3D whole-body landmarks of animals in a manner that is robust to occlusions and can be extended to diverse environments and species.
Finetuning 2D pose-detection networks with limited training data does not enable 3D tracking across multiple behaviors. Video recording of a rat behaving in an open field, with landmarks tracked by DeepLabCut projected onto video frames (left). We trained DeepLabCut with 225 hand-labeled frames. Shown consecutively are the predictions for the markers on the animal’s head and trunk, then for the full 20-marker set using six cameras, and then using three cameras. Right, concurrent wireframe representations of the landmarks in the animal’s egocentric reference frame. Video speed is 1.5 times real time.
DANNCE, but not DLC, tracks 3D pose in Rat 7M validation data. Video recording of a rat with markers behaving in an open field, with tracked 3D landmarks projected onto video frames (left). Right, concurrent wireframe representations of the tracked 3D landmarks. Shown consecutively are DLC and DANNCE predictions on the validation animal with 3 cameras and then with 6 cameras. Video speed is 3.33 times real time.
DANNCE can track 3D kinematics in rats using a single camera. Shown consecutively are reprojections of 3D predictions on new camera views from a validation animal not used for training, followed by 3D wireframe predictions for each of six cameras on validation frames in a training animal. All predictions come from passing single-camera volumes through the DANNCE network. The 3D wireframe plot axes are in units of millimeters. Video speed is four times real time.
DANNCE, but not DLC, tracks 3D markerless rat pose. Video recording of a markerless rat behaving in an open field, with DLC-tracked 3D landmarks projected onto 2 different cameras. Coloring of the predicted landmarks reveals prediction errors that then lead to poor 3D triangulation performance. DLC predictions are then shown concurrent with DANNCE predictions. In this final segment, projections of 3D predictions are plotted on video frames (left), together with wireframe representations of the 3D predictions (right). Predictions were made using six cameras for networks trained on Rat 7M and without any additional training data. Video speed is four times real time.
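The "3D triangulation" step mentioned in the caption above, lifting per-camera 2D detections to a 3D point, is conventionally the linear (direct linear transformation, DLT) least-squares solve sketched below; when the 2D detections are wrong, the solve inherits their error. The function name `triangulate_dlt` and the noise-free two-camera setup are illustrative assumptions, not the pipeline's actual code.

```python
import numpy as np

def triangulate_dlt(points_2d, proj_mats):
    """Linear (DLT) triangulation of one 3D landmark from >= 2 views.

    points_2d: list of (u, v) pixel detections, one per camera.
    proj_mats: list of (3, 4) camera projection matrices.
    Returns the least-squares 3D point in world coordinates.
    """
    A = []
    for (u, v), P in zip(points_2d, proj_mats):
        # Each view contributes two linear constraints on the
        # homogeneous 3D point X: u * (P[2] @ X) = P[0] @ X, etc.
        A.append(u * P[2] - P[0])
        A.append(v * P[2] - P[1])
    A = np.asarray(A)
    # The solution is the right singular vector of A with the
    # smallest singular value, dehomogenized.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]
```

Because every 2D detection contributes two rows of the linear system, a single badly mislocalized landmark in one view can dominate the least-squares solution, which is consistent with the failure mode the caption describes.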
DANNCE readily extends to 3D tracking in mice. Video recording of a mouse behaving in an open field, with tracked landmarks projected onto video frames (left). Right, concurrent wireframe representations of the tracked 3D landmarks. Predictions were made using six cameras. Video speed is two times real time. The 3D wireframe plot axes are in units of millimeters.
DANNCE enables kinematic profiling across behaviors in mice. A series of examples of mouse behavior sampled from behavioral clusters distributed across different categories in the behavioral embedding space. For each cluster sampled (left), four different examples of the mouse’s behavior in that cluster are shown, with DANNCE landmark predictions overlaid (right). Videos are shown at real time and repeated twice.
DANNCE recordings across behavioral development. A series of examples of video recordings with tracked landmarks reprojected (left) and DANNCE 3D predictions (right) across timepoints of behavioral development. Video speed is three times real time.
Markerless 3D tracking across the marmoset behavioral repertoire. Left, video recording of a marmoset behaving in a naturalistic enclosure, with tracked landmarks projected onto video frames. Right, corresponding DANNCE 3D landmark predictions. Video speed is four times real time. The 3D wireframe plot axes are in units of millimeters.
Markerless 3D tracking across the chickadee behavioral repertoire. Left, video recording of a chickadee behaving in a naturalistic enclosure, with tracked landmarks projected onto video frames. Right, concurrent wireframe representations of the tracked 3D landmarks. Video speed is two times real time.
Dunn, T.W., Marshall, J.D., Severson, K.S. et al. Geometric deep learning enables 3D kinematic profiling across species and environments. Nat Methods 18, 564–573 (2021). https://doi.org/10.1038/s41592-021-01106-6