
LiftPose3D, a deep learning-based approach for transforming two-dimensional to three-dimensional poses in laboratory animals

Abstract

Markerless three-dimensional (3D) pose estimation has become an indispensable tool for kinematic studies of laboratory animals. Most current methods recover 3D poses by multi-view triangulation of deep network-based two-dimensional (2D) pose estimates. However, triangulation requires multiple synchronized cameras and elaborate calibration protocols that hinder its widespread adoption in laboratory studies. Here we describe LiftPose3D, a deep network-based method that overcomes these barriers by reconstructing 3D poses from a single 2D camera view. We illustrate LiftPose3D’s versatility by applying it to multiple experimental systems using flies, mice, rats and macaques, and in circumstances where 3D triangulation is impractical or impossible. Our framework achieves accurate lifting for stereotypical and nonstereotypical behaviors from different camera angles. Thus, LiftPose3D permits high-quality 3D pose estimation in the absence of complex camera arrays and tedious calibration procedures and despite occluded body parts in freely behaving animals.
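The reference list points to the simple residual fully connected "lifting" baseline (ref. 30) and to its standard building blocks, dropout, ReLU, residual connections, Adam and batch normalization (refs. 52-56). Below is a minimal sketch, in that spirit, of a network that regresses 3D keypoints from 2D keypoints of a single view; the layer widths, keypoint count and training loop are illustrative assumptions and not the exact LiftPose3D configuration described in the Methods.

# Sketch of a 2D-to-3D lifting network in the style of ref. 30.
# Layer widths, keypoint count and training details are assumptions.
import torch
import torch.nn as nn

N_KEYPOINTS = 38  # assumption; the keypoint count depends on the dataset

class ResidualBlock(nn.Module):
    def __init__(self, width=1024, p_dropout=0.5):
        super().__init__()
        self.block = nn.Sequential(
            nn.Linear(width, width), nn.BatchNorm1d(width),
            nn.ReLU(), nn.Dropout(p_dropout),
            nn.Linear(width, width), nn.BatchNorm1d(width),
            nn.ReLU(), nn.Dropout(p_dropout),
        )

    def forward(self, x):
        return x + self.block(x)  # residual connection (ref. 54)

class Lifter(nn.Module):
    """Regress root-relative 3D keypoints from the 2D keypoints of one view."""
    def __init__(self, n_kp=N_KEYPOINTS, width=1024, n_blocks=2):
        super().__init__()
        self.inp = nn.Linear(2 * n_kp, width)
        self.blocks = nn.Sequential(*[ResidualBlock(width) for _ in range(n_blocks)])
        self.out = nn.Linear(width, 3 * n_kp)

    def forward(self, xy):  # xy: (batch, 2 * n_kp)
        return self.out(self.blocks(self.inp(xy)))  # (batch, 3 * n_kp)

# Training step with mean squared error and Adam (ref. 55) on placeholder data.
model = Lifter()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
xy = torch.randn(64, 2 * N_KEYPOINTS)   # placeholder normalized 2D poses
xyz = torch.randn(64, 3 * N_KEYPOINTS)  # placeholder 3D targets
loss = nn.functional.mse_loss(model(xy), xyz)
loss.backward()
optimizer.step()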

Fig. 1: LiftPose3D predicts 3D poses with a single, flexibly positioned camera.
Fig. 2: LiftPose3D performs 3D pose estimation on freely behaving animals with occluded keypoints.
Fig. 3: A pretrained LiftPose3D network predicts 3D poses for diverse data and when triangulation is impossible.

Data availability

The datasets used in this paper and their sources are listed in Supplementary Table 2. Data used for Figs. 1–3 and Extended Data Figs. 1 and 2 can be downloaded from https://doi.org/10.7910/DVN/KHFAEI. Source data are provided with this paper.

Code availability

LiftPose3D can be installed as a pip package from https://pypi.org/project/liftpose/. The source code and the custom software used to acquire images with the LiftPose3D station are available at https://doi.org/10.5281/zenodo.5031774. The code is licensed under the GNU General Public License v.3.0.
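A minimal sketch of installing the package and preparing paired 2D/3D training data is given below. The exact training call is left as a comment because the package API is not reproduced here; the array shapes and file-free placeholder data are assumptions, and the documentation at https://pypi.org/project/liftpose/ should be consulted for the actual entry points.

# Install the package from PyPI, then prepare paired 2D/3D poses.
#   pip install liftpose
import numpy as np

n_frames, n_keypoints = 1000, 38  # assumption: keypoint count varies by dataset
train_2d = np.random.rand(n_frames, n_keypoints, 2)  # 2D poses from one camera view
train_3d = np.random.rand(n_frames, n_keypoints, 3)  # triangulated 3D ground truth
# import liftpose
# ...then call the package's training routine on (train_2d, train_3d); see the docs.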

References

  1. Pereira, T. D. et al. Fast animal pose estimation using deep neural networks. Nat. Methods 16, 117–125 (2019).

  2. Mathis, A. et al. DeepLabCut: markerless pose estimation of user-defined body parts with deep learning. Nat. Neurosci. 21, 1281–1289 (2018).

  3. Günel, S. et al. DeepFly3D, a deep learning-based approach for 3D limb and appendage tracking in tethered, adult Drosophila. eLife 8, 3686 (2019).

  4. Bala, P. C. et al. Automated markerless pose estimation in freely moving macaques with OpenMonkeyStudio. Nat. Commun. 11, 4560 (2020).

  5. Newell, A., Yang, K. & Deng, J. Stacked hourglass networks for human pose estimation. In Proc. European Conference on Computer Vision (ECCV) (2016).

  6. Graving, J. M. et al. DeepPoseKit, a software toolkit for fast and robust animal pose estimation using deep learning. eLife 8, e47994 (2019).

  7. Fang, H.-S., Xie, S., Tai, Y.-W. & Lu, C. RMPE: Regional multi-person pose estimation. In Proc. IEEE International Conference on Computer Vision (ICCV) (2017).

  8. Wei, S.-E., Ramakrishna, V., Kanade, T. & Sheikh, Y. Convolutional pose machines. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016).

  9. Cao, Z., Simon, T., Wei, S.-E. & Sheikh, Y. Realtime multi-person 2D pose estimation using part affinity fields. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017).

  10. Hartley, R. & Zisserman, A. Multiple View Geometry in Computer Vision 2nd edn (Cambridge University Press, Cambridge, 2003).

  11. Dombeck, D. A., Khabbaz, A. N., Collman, F., Adelman, T. L. & Tank, D. W. Imaging large-scale neural activity with cellular resolution in awake, mobile mice. Neuron 56, 43–57 (2007).

  12. Seelig, J. D. et al. Two-photon calcium imaging from head-fixed Drosophila during optomotor walking behavior. Nat. Methods 7, 535–540 (2010).

  13. Gaudry, Q., Hong, E. J., Kain, J., de Bivort, B. L. & Wilson, R. I. Asymmetric neurotransmitter release enables rapid odour lateralization in Drosophila. Nature 493, 424–428 (2013).

  14. Machado, A. S., Darmohray, D. M., Fayad, J., Marques, H. G. & Carey, M. R. A quantitative framework for whole-body coordination reveals specific deficits in freely walking ataxic mice. eLife 4, e07892 (2015).

  15. Isakov, A. et al. Recovery of locomotion after injury in Drosophila melanogaster depends on proprioception. J. Exp. Biol. 219, 1760–1771 (2016).

  16. Uhlmann, V., Ramdya, P., Delgado-Gonzalo, R., Benton, R. & Unser, M. Flylimbtracker: an active contour based approach for leg segment tracking in unmarked, freely behaving Drosophila. PLoS ONE 12, e0173433 (2017).

  17. DeAngelis, B. D., Zavatone-Veth, J. A. & Clark, D. A. The manifold structure of limb coordination in walking Drosophila. eLife 8, 137 (2019).

  18. Lee, H.-J. & Chen, Z. Determination of 3D human body postures from a single view. Comp. Vis. Graphics Image Proc. 30, 148–168 (1985).

  19. Taylor, C. J. Reconstruction of articulated objects from point correspondences in a single uncalibrated image. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2000).

  20. Chen, C. & Ramanan, D. 3D human pose estimation = 2D pose estimation + matching. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017).

  21. Gupta, A., Martinez, J., Little, J. J. & Woodham, R. J. 3D pose from motion for cross-view action recognition via non-linear circulant temporal encoding. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2014).

  22. Sun, J. J. et al. View-invariant probabilistic embedding for human pose. In Proc. European Conference on Computer Vision (ECCV) (2020).

  23. Nibali, A., He, Z., Morgan, S. & Prendergast, L. 3D human pose estimation with 2D marginal heatmaps. In Proc. IEEE Winter Conference on Applications of Computer Vision (WACV) (2019).

  24. Zhao, L., Peng, X., Tian, Y., Kapadia, M. & Metaxas, D. N. Semantic graph convolutional networks for 3D human pose regression. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019).

  25. Iskakov, K., Burkov, E., Lempitsky, V. & Malkov, Y. Learnable triangulation of human pose. In Proc. International Conference on Computer Vision (ICCV) (2019).

  26. Kanazawa, A., Zhang, J. Y., Felsen, P. & Malik, J. Learning 3D human dynamics from video. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019).

  27. Mehta, D. et al. XNect: real-time multi-person 3D motion capture with a single RGB camera. ACM Trans. Graphics 39 (2020).

  28. Rematas, K., Nguyen, C. H., Ritschel, T., Fritz, M. & Tuytelaars, T. Novel views of objects from a single image. IEEE Trans. Patt. Anal. Machine Intell. 39, 1576–1590 (2017).

  29. Rhodin, H., Constantin, V., Katircioglu, I., Salzmann, M. & Fua, P. Neural scene decomposition for multi-person motion capture. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019).

  30. Martinez, J., Hossain, R., Romero, J. & Little, J. J. A simple yet effective baseline for 3D human pose estimation. In Proc. IEEE International Conference on Computer Vision (ICCV) (2017).

  31. Pavllo, D., Feichtenhofer, C., Grangier, D. & Auli, M. 3D human pose estimation in video with temporal convolutions and semi-supervised training. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019).

  32. Liu, J., Guang, Y. & Rojas, J. GAST-Net: graph attention spatio-temporal convolutional networks for 3D human pose estimation in video. Preprint at https://arxiv.org/abs/2003.14179 (2020).

  33. Cai, Y. et al. Exploiting spatial-temporal relationships for 3D pose estimation via graph convolutional networks. In Proc. IEEE International Conference on Computer Vision (ICCV) (2019).

  34. Yiannakides, A., Aristidou, A. & Chrysanthou, Y. Real-time 3D human pose and motion reconstruction from monocular RGB videos. Comput. Animat. Virt. Worlds (2019).

  35. Card, G. & Dickinson, M. H. Visually mediated motor planning in the escape response of Drosophila. Curr. Biol. 18, 1300–1307 (2008).

  36. Wosnitza, A., Bockemühl, T., Dübbert, M., Scholz, H. & Büschges, A. Inter-leg coordination in the control of walking speed in Drosophila. J. Exp. Biol. 216, 480–491 (2013).

  37. Marshall, J. D. et al. Continuous whole-body 3D kinematic recordings across the rodent behavioral repertoire. Neuron 109, 420–437.e8 (2021).

  38. De Bono, M. & Bargmann, C. I. Natural variation in a neuropeptide Y receptor homolog modifies social behavior and food response in C. elegans. Cell 94, 679–689 (1998).

  39. Budick, S. A. & O’Malley, D. M. Locomotor repertoire of the larval zebrafish: swimming, turning and prey capture. J. Exp. Biol. 203, 2565–2579 (2000).

  40. Louis, M., Huber, T., Benton, R., Sakmar, T. P. & Vosshall, L. B. Bilateral olfactory sensory input enhances chemotaxis behavior. Nat. Neurosci. 11, 187–199 (2008).

  41. Strauss, R. & Heisenberg, M. Coordination of legs during straight walking and turning in Drosophila melanogaster. J. Comp. Physiol. A. 167, 403–412 (1990).

  42. Clarke, K. & Still, J. Gait analysis in the mouse. Physiol. Behav. 66, 723–729 (1999).

  43. Wiltschko, A. B. et al. Mapping sub-second structure in mouse behavior. Neuron 88, 1121–1135 (2015).

  44. Hong, W. et al. Automated measurement of mouse social behaviors using depth sensing, video tracking, and machine learning. Proc. Natl. Acad. Sci. USA 112, E5351–E5360 (2015).

  45. Mendes, C. S., Bartos, I., Akay, T., Márka, S. & Mann, R. S. Quantification of gait parameters in freely walking wild type and sensory deprived Drosophila melanogaster. eLife 2, 231 (2013).

  46. Feng, K. et al. Distributed control of motor circuits for backward walking in Drosophila. Nat. Commun. 11, 6166 (2020).

  47. Alp Güler, R., Neverova, N. & Kokkinos, I. Densepose: dense human pose estimation in the wild. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018).

  48. Güler, R. A. & Kokkinos, I. Holopose: holistic 3D human reconstruction in-the-wild. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019).

  49. Loper, M., Mahmood, N., Romero, J., Pons-Moll, G. & Black, M. J. SMPL: a skinned multi-person linear model. ACM Trans. Graphics (Proc. SIGGRAPH Asia) 34, 248:1–248:16 (2015).

  50. Zhang, J. Y., Felsen, P., Kanazawa, A. & Malik, J. Predicting 3D human dynamics from video. In Proc. IEEE International Conference on Computer Vision (ICCV) (2019).

  51. Zuffi, S., Kanazawa, A., Berger-Wolf, T. & Black, M. J. Three-D safari: learning to estimate zebra pose, shape, and texture from images ‘in the wild’. In Proc. IEEE International Conference on Computer Vision (ICCV) (2019).

  52. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014).

  53. Nair, V. & Hinton, G. E. Rectified linear units improve restricted Boltzmann machines. In Proc. International Conference on Machine Learning (ICML) (2010).

  54. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016).

  55. Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. In Proc. International Conference on Learning Representations (ICLR) (2015).

  56. Ioffe, S. & Szegedy, C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In Proc. International Conference on Machine Learning (ICML) (2015).

  57. Wandt, B., Rudolph, M., Zell, P., Rhodin, H. & Rosenhahn, B. CanonPose: self-supervised monocular 3D human pose estimation in the wild. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2021).

  58. Cao, J. et al. Cross-domain adaptation for animal pose estimation. In Proc. IEEE International Conference on Computer Vision (ICCV) (2019).

  59. Sanakoyeu, A., Khalidov, V., McCarthy, M. S., Vedaldi, A. & Neverova, N. Transferring dense pose to proximal animal classes. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2020).

  60. Sridhar, V. H., Roche, D. G. & Gingins, S. Tracktor: image-based automated tracking of animal movement and behaviour. Methods Ecol. Evol. 10, 815–820 (2019).


Acknowledgements

P.R. acknowledges support from an SNSF Project grant (no. 175667) and an SNSF Eccellenza grant (no. 181239). A.G. acknowledges support from an HFSP Cross-disciplinary Postdoctoral Fellowship (grant no. LT000669/2020-C). S.G. acknowledges support from an EPFL SV iPhD grant. D.M. holds a Marie Curie EuroTech postdoctoral fellowship and acknowledges that this project has received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement no. 754462. V.L.-R. acknowledges support from the Mexican National Council for Science and Technology, CONACYT, under the grant number 709993. We thank the Ölveczky laboratory (Center for Brain Science, Harvard University, Boston, MA, USA) for providing us with the CAPTURE dataset. We thank the Carey laboratory (Champalimaud Centre for the Unknown, Lisbon, Portugal) for the LocoMouse dataset.

Author information

Authors and Affiliations

Authors

Contributions

A.G. was responsible for conceptualization, methodology, software, hardware (Drosophila prism-mirror system), formal analysis, data curation, writing the original draft, and reviewing and editing the paper. S.G. was responsible for conceptualization, methodology, software, formal analysis, data curation, writing the original draft, and reviewing and editing the paper. V.L.-R. was responsible for software, hardware (LiftPose3D station and low-resolution Drosophila ventral view system), data curation, and reviewing and editing the paper. M.P.A. was responsible for methodology, software (LiftPose3D), preliminary analysis of the DeepFly3D dataset, data curation, and reviewing and editing the paper. D.M. was responsible for investigation (low-resolution Drosophila experiments) and reviewing and editing the paper. H.R. was responsible for conceptualization and reviewing and editing the paper. P.F. was responsible for reviewing and editing the paper and funding acquisition. P.R. was responsible for conceptualization, hardware (Drosophila prism-mirror system), resources, writing the original draft, reviewing and editing the paper, supervision, project administration and funding acquisition.

Corresponding authors

Correspondence to Adam Gosztolai, Semih Günel or Pavan Ramdya.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information: Nature Methods thanks Katarzyna Bozek, Giorgio Gilestro and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Joint angles resulting from lifting compared with 3D triangulated ground truth and 2D projections.

Joint angles α, β, γ and ω for the front, mid and hind left legs during forward walking. Shown are angles computed from 3D triangulation using DeepFly3D (blue), LiftPose3D predictions (red), and ventral 2D projections α′, β′, γ′ and ω′ (green). The mean (solid lines) and standard deviation (transparency) of joint error distributions are shown. Joint angles were computed by Monte Carlo sampling, and errors were estimated from the fluctuation in bone lengths.
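For orientation, a joint angle can be computed from three consecutive keypoints as the angle between the two adjoining bone vectors; the same formula applied to ventral 2D coordinates yields the primed projected angles. The sketch below is a generic illustration of that computation, not the authors' Monte Carlo error-estimation procedure, and the keypoint indices are placeholders.

# Sketch: angle at keypoint b between bones b->a and b->c.
# Passing only the x,y coordinates gives the ventral 2D projection (primed) angle.
import numpy as np

def joint_angle(a, b, c):
    """Angle (radians) at keypoint b between vectors b->a and b->c."""
    u, v = a - b, c - b
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos, -1.0, 1.0))

pose3d = np.random.rand(38, 3)   # placeholder 3D pose (n_keypoints, 3)
coxa, femur, tibia = 0, 1, 2     # placeholder indices of consecutive leg keypoints
beta = joint_angle(pose3d[coxa], pose3d[femur], pose3d[tibia])
beta_prime = joint_angle(pose3d[coxa, :2], pose3d[femur, :2], pose3d[tibia, :2])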

Source data

Extended Data Fig. 2 Training and test loss convergence of the LiftPose3D network applied to a variety of datasets.

A-E Absolute test errors of LiftPose3D for all joints as a function of optimization epoch. A Two-camera data of Drosophila on a spherical treadmill (each color denotes a different pair of diametrically opposed cameras). B OpenMonkeyStudio dataset (each color denotes a different training run). C Single-camera data of Drosophila behaving freely in the right-angle prism-mirror system. D LocoMouse dataset. E CAPTURE dataset.

Source data

Extended Data Fig. 3 Drosophila LiftPose3D station.

A CAD drawing of the LiftPose3D station indicating major components (color-coded). B Photo of the LiftPose3D station. C Electronic circuit for building the illumination module on a pre-fabricated prototyping board (see Supplementary Table 1); electronic components and additional wiring are indicated (color-coded). D Printed circuit board provided as an alternative to the pre-fabricated board for building the illumination module.

Supplementary information

Supplementary Information

Supplementary Note 1 and Table 1.

Reporting Summary

Supplementary Video 1

3D pose lifting for backward walking in tethered Drosophila obtained from two side cameras. Videos obtained from cameras 2 (top left) and 5 (bottom left). DeepFly3D-derived 2D poses are superimposed. Orange circle indicates that the optogenetic stimulation LED light is on, activating MDNs to elicit backward walking. Right, 3D poses obtained by triangulating six camera views using DeepFly3D (solid lines), or lifting two camera views using LiftPose3D (dashed lines).
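Where the legends above and below refer to triangulating 2D poses from calibrated camera views (for example, with DeepFly3D), the underlying operation is standard multi-view triangulation (ref. 10). The sketch below shows a minimal two-view direct linear transform; the projection matrices and 2D detections are placeholders, and this is not the DeepFly3D implementation.

# Two-view triangulation by the direct linear transform (ref. 10).
# P1, P2 are 3x4 projection matrices; pt1, pt2 are the 2D detections of one
# keypoint in each view. All numerical values here are placeholders.
import numpy as np

def triangulate(P1, P2, pt1, pt2):
    """Return the 3D point minimizing the algebraic reprojection error."""
    x1, y1 = pt1
    x2, y2 = pt2
    A = np.stack([
        x1 * P1[2] - P1[0],
        y1 * P1[2] - P1[1],
        x2 * P2[2] - P2[0],
        y2 * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # dehomogenize

P1 = np.hstack([np.eye(3), np.zeros((3, 1))])               # placeholder camera at the origin
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0], [0]])])   # placeholder translated camera
point = triangulate(P1, P2, (0.2, 0.1), (0.1, 0.1))          # approx. [2, 1, 10]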

Supplementary Video 2

3D pose lifting for antennal grooming in tethered Drosophila obtained from two side cameras. Videos obtained from cameras 2 (top left) and 5 (bottom left). DeepFly3D-derived 2D poses are superimposed. Orange circle indicates that the optogenetic stimulation LED light is on, activating aDNs to elicit antennal grooming. Right, 3D poses obtained by triangulating six camera views using DeepFly3D (solid lines), or lifting two camera views using LiftPose3D (dashed lines).

Supplementary Video 3

3D pose lifting for irregular spontaneous limb movements in tethered Drosophila obtained from two side cameras. Videos obtained from cameras 2 (top left) and 5 (bottom left). DeepFly3D-derived 2D poses are superimposed. Right, 3D poses obtained by triangulating six camera views using DeepFly3D (solid lines), or lifting two camera views using LiftPose3D (dashed lines).

Supplementary Video 4

3D pose lifting of previously published OpenMonkeyStudio dataset4 of a freely moving macaque. Left, single image drawn randomly from one of 62 cameras. Middle, ground truth 3D poses based on triangulation of 2D poses from up to 62 cameras (solid lines), or lifting from a single camera view using LiftPose3D (dashed lines). Right, error distribution across the 62 cameras for a given pose. Camera locations (circles) are color-coded by error. Gray circles denote cameras for which an image was not available. Green circle denotes the camera from which the image was used.

Supplementary Video 5

3D pose lifting of freely behaving Drosophila when triangulation is only partially possible. Single camera images of the ventral (top left) and side (bottom left) views. DeepLabCut-derived 2D poses are superimposed. Right, 3D poses obtained by triangulating partially available multi-view 2D poses (solid lines), or by lifting the ventral 2D pose using LiftPose3D (dashed lines).

Supplementary Video 6

3D pose lifting of previously published freely behaving mouse data14 when triangulation is only partially possible. Side (top left) and ventral (bottom left) views of a freely walking mouse. Superimposed are keypoints on the paws, mouth and proximal tail tracked using the LocoMouse software (blue circles). Using only the ventral view 2D pose, a trained LiftPose3D network can accurately track keypoints in the side view (orange circles).

Supplementary Video 7

3D pose lifting for freely behaving rats in a naturalistic arena. Left, ground truth 3D poses triangulated from six cameras (solid lines) superimposed with LiftPose3D’s predictions using 2D poses from one camera (dashed lines). Right, images from one camera with 2D poses acquired using CAPTURE are superimposed.

Supplementary Video 8

3D pose lifting for low-resolution videos of freely behaving flies when triangulation is impossible. Top, three freely behaving Drosophila in a rounded square arena, recorded ventrally using a single low-resolution camera. Of these, fly 0 is tracked, cropped and rotated leftward. Superimposed are 2D poses for 24 visible joints. Bottom, 3D poses lifted from ventral view 2D poses (xy plane) permit analysis of leg kinematics in the otherwise unobserved xz plane.

Supplementary Video 9

3D pose lifting of previously published ventral view videos16 of freely behaving flies when triangulation is impossible. Top, video from a freely behaving fly within a pill-shaped arena and recorded ventrally using a single high-resolution camera. Bottom left, following tracking, a region-of-interest containing the fly was cropped and rotated to maintain a leftward orientation. Superimposed are 2D poses estimated for 24 visible joints. Bottom middle, 3D poses obtained by lifting ventral view 2D poses. Bottom right, 3D poses lifted from ventral view 2D poses (top) permit analysis of leg kinematics in the otherwise unobserved xz plane (bottom).

Supplementary Video 10

3D pose lifting of data from the Drosophila LiftPose3D station. Left, video of a freely behaving fly in the LiftPose3D station arena. Middle, cropped video around the centroid of the tracked fly, superimposed with 2D pose predictions. Right, lifted 3D poses obtained using ventral 2D poses.

Source data

Source Data Fig. 1

Statistical source data for Fig. 1.

Source Data Fig. 2

Statistical source data for Fig. 2.

Source Data Fig. 3

Statistical source data for Fig. 3.

Source Data Extended Data Fig. 1

Statistical source data for Extended Data Fig. 1.

Source Data Extended Data Fig. 2

Statistical source data for Extended Data Fig. 2.

About this article

Cite this article

Gosztolai, A., Günel, S., Lobato-Ríos, V. et al. LiftPose3D, a deep learning-based approach for transforming two-dimensional to three-dimensional poses in laboratory animals. Nat Methods 18, 975–981 (2021). https://doi.org/10.1038/s41592-021-01226-z
