
Unsupervised behaviour analysis and magnification (uBAM) using deep learning

A preprint version of the article is available at arXiv.

Abstract

Motor behaviour analysis is essential to biomedical research and clinical diagnostics as it provides a non-invasive strategy for identifying motor impairment and its change caused by interventions. State-of-the-art instrumented movement analysis is time- and cost-intensive, because it requires the placement of physical or virtual markers. As well as the effort required for marking the keypoints or annotations necessary for training or fine-tuning a detector, users need to know the interesting behaviour beforehand to provide meaningful keypoints. Here, we introduce unsupervised behaviour analysis and magnification (uBAM), an automatic deep learning algorithm for analysing behaviour by discovering and magnifying deviations. A central aspect is unsupervised learning of posture and behaviour representations to enable an objective comparison of movement. Besides discovering and quantifying deviations in behaviour, we also propose a generative model for visually magnifying subtle behaviour differences directly in a video without requiring a detour via keypoints or annotations. Essential for this magnification of deviations, even across different individuals, is a disentangling of appearance and behaviour. Evaluations on rodents and human patients with neurological diseases demonstrate the wide applicability of our approach. Moreover, combining optogenetic stimulation with our unsupervised behaviour analysis shows its suitability as a non-invasive diagnostic tool correlating function to brain plasticity.


Fig. 1: Unsupervised behaviour analysis and magnification.
Fig. 2: Evaluating the learned disentangled representation.
Fig. 3: Behaviour analysis for disease classification.
Fig. 4: Evaluation of motor function skills at different points during learning.
Fig. 5: Predicting neurophysiological characteristics from behaviour.
Fig. 6: Magnifying impaired behaviour as a diagnostic tool.

Data availability

The rat data can be downloaded at https://hci.iwr.uni-heidelberg.de/compvis_files/Rats.zip. The optogenetics data can be downloaded at https://hci.iwr.uni-heidelberg.de/compvis_files/Optogenetics.zip. The mice data can be downloaded at https://hci.iwr.uni-heidelberg.de/compvis_files/Mice.zip. The human dataset cannot be publicly released because of privacy issues (please contact the authors if needed).

Code availability

The code for training and evaluating our models is publicly available on GitHub at the following address: https://github.com/utabuechler/uBAM (ref. 59).

References

  1. Berman, G. J. Measuring behavior across scales. BMC Biol. 16, 23 (2018).
  2. Filli, L. et al. Profiling walking dysfunction in multiple sclerosis: characterisation, classification and progression over time. Sci. Rep. 8, 4984 (2018).
  3. Vargas-Irwin, C. E. et al. Decoding complete reach and grasp actions from local primary motor cortex populations. J. Neurosci. 30, 9659–9669 (2010).
  4. Loper, M. M., Mahmood, N. & Black, M. J. MoSh: motion and shape capture from sparse markers. ACM Trans. Graph. 33, 220:1–220:13 (2014).
  5. Huang, Y. et al. Deep inertial poser: learning to reconstruct human pose from sparse inertial measurements in real time. ACM Trans. Graph. 37, 185:1–185:15 (2018).
  6. Robie, A. A., Seagraves, K. M., Egnor, S. R. & Branson, K. Machine vision methods for analyzing social interactions. J. Exp. Biol. 220, 25–34 (2017).
  7. Dell, A. I. et al. Automated image-based tracking and its application in ecology. Trends Ecol. Evol. 29, 417–428 (2014).
  8. Peters, S. M. et al. Novel approach to automatically classify rat social behavior using a video tracking system. J. Neurosci. Methods 268, 163–170 (2016).
  9. Arac, A., Zhao, P., Dobkin, B. H., Carmichael, S. T. & Golshani, P. DeepBehavior: a deep learning toolbox for automated analysis of animal and human behavior imaging data. Front. Syst. Neurosci. 13, 20 (2019).
  10. Graving, J. M. et al. DeepPoseKit, a software toolkit for fast and robust animal pose estimation using deep learning. eLife 8, e47994 (2019).
  11. Pereira, T. D. et al. Fast animal pose estimation using deep neural networks. Nat. Methods 16, 117–125 (2019).
  12. Mathis, A. et al. DeepLabCut: markerless pose estimation of user-defined body parts with deep learning. Nat. Neurosci. 21, 1281–1289 (2018).
  13. Simon, T., Joo, H., Matthews, I. & Sheikh, Y. Hand keypoint detection in single images using multiview bootstrapping. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 1145–1153 (IEEE, 2017).
  14. Nath, T. et al. Using DeepLabCut for 3D markerless pose estimation across species and behaviors. Nat. Protoc. 14, 2152–2176 (2019).
  15. Mathis, M. W. & Mathis, A. Deep learning tools for the measurement of animal behavior in neuroscience. Curr. Opin. Neurobiol. 60, 1–11 (2020).
  16. Mu, J., Qiu, W., Hager, G. D. & Yuille, A. L. Learning from synthetic animals. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 12386–12395 (IEEE, 2020).
  17. Li, S. et al. Deformation-aware unpaired image translation for pose estimation on laboratory animals. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 13158–13168 (IEEE, 2020).
  18. Sanakoyeu, A., Khalidov, V., McCarthy, M. S., Vedaldi, A. & Neverova, N. Transferring dense pose to proximal animal classes. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 5233–5242 (IEEE, 2020).
  19. Kocabas, M., Athanasiou, N. & Black, M. J. VIBE: video inference for human body pose and shape estimation. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 5253–5263 (IEEE, 2020).
  20. Loper, M., Mahmood, N., Romero, J., Pons-Moll, G. & Black, M. J. SMPL: a skinned multi-person linear model. ACM Trans. Graph. 34, 248:1–248:16 (2015).
  21. Zuffi, S., Kanazawa, A., Berger-Wolf, T. & Black, M. J. Three-D Safari: learning to estimate zebra pose, shape and texture from images ‘in the wild’. In Proc. IEEE/CVF International Conference on Computer Vision 5359–5368 (IEEE, 2019).
  22. Habermann, M., Xu, W., Zollhofer, M., Pons-Moll, G. & Theobalt, C. DeepCap: monocular human performance capture using weak supervision. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 5052–5063 (IEEE, 2020).
  23. Batty, E. et al. BehaveNet: nonlinear embedding and Bayesian neural decoding of behavioral videos. In Advances in Neural Information Processing Systems 15680–15691 (NIPS, 2019).
  24. Ryait, H. et al. Data-driven analyses of motor impairments in animal models of neurological disorders. PLoS Biol. 17, 1–30 (2019).
  25. Kabra, M., Robie, A. A., Rivera-Alba, M., Branson, S. & Branson, K. JAABA: interactive machine learning for automatic annotation of animal behavior. Nat. Methods 10, 64–67 (2012).
  26. Brattoli, B., Büchler, U., Wahl, A. S., Schwab, M. E. & Ommer, B. LSTM self-supervision for detailed behavior analysis. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 3747–3756 (IEEE, 2017).
  27. Büchler, U., Brattoli, B. & Ommer, B. Improving spatiotemporal self-supervision by deep reinforcement learning. In Proc. European Conference on Computer Vision 770–776 (IEEE, 2017).
  28. Noroozi, M. & Favaro, P. Unsupervised learning of visual representations by solving jigsaw puzzles. In Proc. European Conference on Computer Vision 69–84 (IEEE, 2016).
  29. Lee, H. Y., Huang, J. B., Singh, M. K. & Yang, M. H. Unsupervised representation learning by sorting sequences. In Proc. IEEE/CVF International Conference on Computer Vision 667–676 (IEEE, 2017).
  30. Oh, T. H. et al. Learning-based video motion magnification. In Proc. European Conference on Computer Vision 633–648 (IEEE, 2018).
  31. Liu, C., Torralba, A., Freeman, W. T., Durand, F. & Adelson, E. H. Motion magnification. ACM Trans. Graph. 24, 519–526 (2005).
  32. Wu, H. Y. et al. Eulerian video magnification for revealing subtle changes in the world. ACM Trans. Graph. 31, 65 (2012).
  33. Elgharib, M., Hefeeda, M., Durand, F. & Freeman, W. T. Video magnification in presence of large motions. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 4119–4127 (IEEE, 2015).
  34. Wadhwa, N., Rubinstein, M., Durand, F. & Freeman, W. T. Phase-based video motion processing. ACM Trans. Graph. 32, 80 (2013).
  35. Wadhwa, N., Rubinstein, M., Durand, F. & Freeman, W. T. Riesz pyramids for fast phase-based video magnification. In Proc. International Conference on Computational Photography 1–10 (IEEE, 2014).
  36. Zhang, Y., Pintea, S. L. & Van Gemert, J. C. Video acceleration magnification. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 529–537 (IEEE, 2017).
  37. Tulyakov, S. et al. Self-adaptive matrix completion for heart rate estimation from face videos under realistic conditions. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 2396–2404 (IEEE, 2016).
  38. Dekel, T., Michaeli, T., Irani, M. & Freeman, W. T. Revealing and modifying non-local variations in a single image. ACM Trans. Graph. 34, 227 (2015).
  39. Wadhwa, N., Dekel, T., Wei, D., Durand, F. & Freeman, W. T. Deviation magnification: revealing departures from ideal geometries. ACM Trans. Graph. 34, 226 (2015).
  40. Kingma, D. P. & Welling, M. Auto-encoding variational Bayes. In 2nd International Conference on Learning Representations (ICLR, 2014).
  41. Goodfellow, I. et al. Generative adversarial nets. In Advances in Neural Information Processing Systems Vol. 27, 2672–2680 (NIPS, 2014).
  42. Esser, P., Sutter, E. & Ommer, B. A variational U-Net for conditional appearance and shape generation. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 8857–8866 (IEEE, 2018).
  43. Goodman, A. D. et al. Sustained-release oral fampridine in multiple sclerosis: a randomised, double-blind, controlled trial. Lancet 373, 732–738 (2009).
  44. Zörner, B. et al. Prolonged-release fampridine in multiple sclerosis: improved ambulation effected by changes in walking pattern. Mult. Scler. 22, 1463–1475 (2016).
  45. Schniepp, R. et al. Walking assessment after lumbar puncture in normal-pressure hydrocephalus: a delayed improvement over 3 days. J. Neurosurg. 126, 148–157 (2017).
  46. Tran, D. et al. A closer look at spatiotemporal convolutions for action recognition. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 6450–6459 (IEEE, 2018).
  47. van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).
  48. Lafferty, C. K. & Britt, J. P. Off-target influences of arch-mediated axon terminal inhibition on network activity and behavior. Front. Neural Circuits 14, 10 (2020).
  49. Miao, C. et al. Hippocampal remapping after partial inactivation of the medial entorhinal cortex. Neuron 88, 590–603 (2015).
  50. Carta, I., Chen, C. H., Schott, A. L., Dorizan, S. & Khodakhah, K. Cerebellar modulation of the reward circuitry and social behavior. Science 363, eaav0581 (2019).
  51. Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 1097–1105 (NIPS, 2012).
  52. Hinton, G. E. & Salakhutdinov, R. R. Reducing the dimensionality of data with neural networks. Science 313, 504–507 (2006).
  53. Johnson, J., Alahi, A. & Fei-Fei, L. Perceptual losses for real-time style transfer and super-resolution. In Proc. European Conference on Computer Vision 694–711 (Springer, 2016).
  54. Alaverdashvili, M. & Whishaw, I. Q. A behavioral method for identifying recovery and compensation: hand use in a preclinical stroke model using the single pellet reaching task. Neurosci. Biobehav. Rev. 37, 950–967 (2013).
  55. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
  56. Fisher, R. A. The use of multiple measurements in taxonomic problems. Ann. Eugenics 7, 179–188 (1936).
  57. Wahl, A. S. et al. Optogenetically stimulating intact rat corticospinal tract post-stroke restores motor control through regionalized functional circuit formation. Nat. Commun. 8, 1187 (2017).
  58. Cortes, C. & Vapnik, V. Support-vector networks. Mach. Learn. 20, 273–297 (1995).
  59. Brattoli, B., Buechler, U. & Ommer, B. Source code of uBAM: first release (v.1.0). https://github.com/utabuechler/uBAM (2020); https://doi.org/10.5281/zenodo.4304070

Download references

Acknowledgements

This work was supported in part by German Research Foundation (DFG) projects 371923335 and 421703927 to B.O., as well as by the Branco Weiss Fellowship Society in Science and the Swiss National Science Foundation (grant no. 192678) to A.-S.W.

Author information

Affiliations

Authors

Contributions

B.B., U.B. and B.O. developed uBAM. B.B. and U.B. implemented and evaluated the framework and M.D. and P.R. the VAE. A.-S.W., L.F. and F.H. conducted the biomedical experiments and validated the results. B.B., U.B. and B.O. prepared the figures with input from A.-S.W. All authors contributed to writing the manuscript.

Corresponding author

Correspondence to Björn Ommer.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Machine Intelligence thanks Ahmet Arac, Sven Dickinson and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Qualitative comparison with the state-of-the-art in motion magnification.

To compare our results with Oh et al. (ref. 30), we show five clips from different impaired subjects before and after magnification for both methods. First, we re-synthesize the healthy reference behaviour with the appearance of the impaired subject, so that differences in posture can be studied directly (first row; see Methods). The second row shows the query impaired sequence. The third and fourth rows show the magnified frame using the method of Oh et al. (ref. 30) and our approach, respectively. The magnified results, highlighted by magenta markers, show that the method of Oh et al. corrupts the subject's appearance, while ours emphasises the differences in posture without altering the appearance. (Details in Supplementary Information.)

Extended Data Fig. 2 Quantitative comparison with the state-of-the-art in motion magnification.

a: Mean-squared difference (white = 0) between the original query frame and its magnification, using our method and the approach of Oh et al. (ref. 30). For impaired subjects, our method modifies only the leg posture, while healthy subjects are left unaltered. The method of Oh et al. (ref. 30) mostly changes the background and alters impaired and healthy subjects indiscriminately. b: The fraction of frames per subject and video sequence with substantial deviation from the healthy reference behaviour, plotted as a distribution of scores. c: Mean and standard deviation of the deviation scores per cohort and approach. (Details in Supplementary Information.)
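The per-pixel comparison in panel a can be sketched as follows. This is a minimal illustration, not the paper's code: a channel-averaged mean-squared difference map between an original frame and its magnification, computed on toy arrays standing in for real video frames.

```python
import numpy as np

def mse_map(original, magnified):
    """Per-pixel mean-squared difference between an original frame and
    its magnification, averaged over colour channels. A map that is zero
    everywhere indicates the magnification left the frame unchanged."""
    diff = (original.astype(np.float64) - magnified.astype(np.float64)) ** 2
    return diff.mean(axis=-1)  # H x W map

# Toy example: a magnification that only alters a small region should
# produce a difference map that is zero everywhere else.
orig = np.zeros((4, 4, 3))
mag = orig.copy()
mag[1, 1] = [3.0, 0.0, 0.0]  # hypothetical local change to one pixel
m = mse_map(orig, mag)       # m[1, 1] = (9 + 0 + 0) / 3 = 3.0
```

A magnification method that "modifies only the leg posture" would concentrate the non-zero values of such a map on the legs, whereas background changes spread them across the whole frame.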

Extended Data Fig. 3 Abnormal postures before and after magnification.

We show that our magnification supports spotting abnormal postures by applying a generic classifier to our behaviour-magnified frames. This doubles the number of detected abnormal postures without introducing a substantial number of false positives. In particular, we train a one-class linear SVM on ImageNet features of only one group (that is, healthy) and predict abnormalities on healthy and impaired frames before and after magnification. The ratio of abnormalities is unaltered within the healthy cohort (~2%), while it doubles in the impaired cohort (5.7% to 11.7%), showing that our magnification method can detect and magnify small deviations without artificially introducing abnormalities. (Details in Supplementary Information.)
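The one-class abnormality detection described above can be sketched with scikit-learn. This is a hedged illustration only: random vectors stand in for the ImageNet CNN features, and the shifted "impaired" cohort is synthetic, so the numbers do not reproduce the paper's 2%/5.7%/11.7% ratios.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Placeholder features: in the paper these are ImageNet CNN activations
# of video frames; here they are random vectors of matching shape.
healthy_train = rng.normal(0.0, 1.0, size=(200, 64))   # training group only
test_healthy = rng.normal(0.0, 1.0, size=(100, 64))
test_impaired = rng.normal(2.0, 1.0, size=(100, 64))   # shifted = "abnormal"

# One-class SVM fitted on the healthy group alone; nu upper-bounds the
# fraction of training samples treated as outliers.
clf = OneClassSVM(nu=0.05).fit(healthy_train)

# predict() returns -1 for outliers ("abnormal postures"), +1 for inliers.
ratio_healthy = float(np.mean(clf.predict(test_healthy) == -1))
ratio_impaired = float(np.mean(clf.predict(test_impaired) == -1))
```

With features drawn from the training distribution the abnormality ratio stays near `nu`, while the shifted cohort is flagged far more often, mirroring the unaltered-healthy versus doubled-impaired pattern reported in the figure.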

Extended Data Fig. 4 Qualitative evaluation of our posture encoding on the rat grasping dataset.

Projection of our posture encoding to a 2D embedding of 1,000 randomly chosen postures using t-SNE. Similar postures are located close to each other, and the grasping action can be reconstructed by following the circle clockwise (best viewed by zooming in on the digital version of this figure). (Details in Supplementary Information.)
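A t-SNE projection of this kind can be produced in a few lines. This sketch uses random placeholder vectors in place of the learned posture encodings (and fewer samples than the 1,000 in the figure); only the projection step matches the visualisation described above.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Placeholder for the learned posture encodings of video frames;
# in the paper these are activations of the posture encoder.
postures = rng.normal(size=(300, 64))

# Project the high-dimensional encodings to 2D for visualisation.
# perplexity balances local vs global structure; init="pca" gives a
# more stable layout than random initialisation.
emb = TSNE(n_components=2, perplexity=30, init="pca",
           random_state=0).fit_transform(postures)
```

Plotting `emb` with one marker per frame (for example with matplotlib) and inspecting neighbourhoods reveals whether similar postures land close together, as the figure shows for the rat grasping data.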

Extended Data Fig. 5 Comparison with PCA of posture encoding.

a: A single video clip projected onto the two most important factors of variation using PCA, directly on the RGB input (left) and on our representation (right). Consecutive frames are connected by straight lines colourised according to the time within the video; every four frames we plot the original frame. PCA sorts the frames over time automatically, with each cycle overlapping the previous one. Our representation separates different postures better, as reflected by the circular shape of the embedding. b: Same as a, but including more videos, with each colour representing a different subject. Here, PCA is strongly biased towards subject appearance: it separates subjects and does not allow behaviour to be compared. c: We reduce the appearance bias by normalising each video with its mean appearance. The result still shows subject separation and no similarity of posture across subjects. d: Applying PCA to our posture representation Eπ instead of directly to the video frames shows no subject bias; only similar postures are near each other in the 2D space. (Details in Supplementary Information.)
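The appearance bias in panel b can be demonstrated on synthetic data. In this sketch (an assumption-laden toy, not the paper's data), frames of two hypothetical subjects differ by a constant per-subject appearance offset; PCA on the raw pixels then encodes subject identity in its first component rather than posture.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Two hypothetical "subjects": their flattened frames share the same
# posture variation (the noise) but carry a strong constant appearance
# offset, mimicking fur colour or lighting differences.
frames_a = rng.normal(0.0, 1.0, size=(50, 256)) + 5.0  # subject A
frames_b = rng.normal(0.0, 1.0, size=(50, 256)) - 5.0  # subject B
frames = np.vstack([frames_a, frames_b])

# PCA directly on the frames: the first principal component aligns with
# the appearance offset, cleanly separating the two subjects.
pc_frames = PCA(n_components=2).fit_transform(frames)
sep = abs(pc_frames[:50, 0].mean() - pc_frames[50:, 0].mean())
```

An appearance-free posture encoding removes this offset, so the same projection would mix the subjects and group frames by posture instead, which is the behaviour reported for Eπ in panel d.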

Extended Data Fig. 6 Disentanglement comparison with simple baseline.

We transfer posture from one subject (rows) to others with a different appearance (columns). a: A baseline model that uses the average video frame as the appearance, which is subtracted from each frame to extract the posture. b: Disentanglement using our custom VAE for extracting posture and appearance. Checking for consistency in posture along a row and for similarity in appearance along a column shows that disentanglement is a hard problem: a pixel-based representation cannot solve the task, while our model produces more detailed and realistic images. (Details in Supplementary Information.)

Extended Data Fig. 7 DeepLabCut trainset size.

We train DLC models on a growing number of training samples. The model is evaluated as described in Fig. 2 of the main manuscript. Note the limited gain in performance despite the number of annotations increasing by more than an order of magnitude. (Details in Supplementary Information.)

Extended Data Fig. 8 Comparison with R3D.

In addition to JAABA and DLC, we also compare our method with R3D, another model that is very popular for video classification. We extract R3D features and evaluate the representation using the same protocol as for our method; our model proves more suited to behaviour analysis. More information on the evaluation protocol can be found in the Methods section of the main manuscript. (Details in Supplementary Information.)

Extended Data Fig. 9 Regress Key-points.

We show qualitative results for the regression from our posture representation to key-points, alongside the end-to-end inferred key-points from DLC. This experiment used 14 key-points; for clarity, we show only six: the wrist (yellow), the start of the first finger (purple) and the tip of each finger. The ground-truth location is shown with a circle and the model's detection with a cross. Even though our representation was not trained on key-point detection, for some frames we recover key-points as well as, or even better than, DLC, which was trained end-to-end on the task. We study the gap in performance in more detail in the Supplementary Information (Supplementary Fig. 3).
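Regressing key-points from a frozen representation can be sketched as below. This is a minimal, hedged illustration: random encodings stand in for the posture representation, the targets are synthetic 2D coordinates of 14 key-points, and ridge regression is one reasonable choice of linear read-out (the exact regressor used in the paper may differ).

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Placeholder data: n frames, d-dimensional posture encodings, and
# k key-points whose (x, y) coordinates are flattened to 2k targets.
n, d, k = 500, 128, 14
encodings = rng.normal(size=(n, d))
W = rng.normal(size=(d, 2 * k))                       # hidden linear map
keypoints = encodings @ W + 0.01 * rng.normal(size=(n, 2 * k))

# Linear read-out from the frozen representation to key-point
# coordinates; the encoder itself is never trained on this task.
reg = Ridge(alpha=1.0).fit(encodings[:400], keypoints[:400])
pred = reg.predict(encodings[400:])
err = float(np.abs(pred - keypoints[400:]).mean())    # held-out error
```

If the representation already encodes posture well, even this simple linear read-out recovers key-point locations accurately, which is the point of the comparison with the end-to-end trained DLC detector.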

Extended Data Fig. 10 Typical high/low scoring grasps with optogenetics.

Given the classifier that produced Fig. 5b, we score all test sequences from the same animal and show two typical sequences with high and low classification scores. A positive score indicates that the sequence was predicted as light-on, a negative score that it was predicted as light-off. Both sequences are correctly classified, as indicated by the ground truth (‘GT’) and classifier score (‘SVM-Score’). The sequence on the left shows a missed grasp, consistent with light-on inhibitory behaviour, while the same animal performs a successful grasp in the light-off sequence on the right. Note that the classifier cannot see the fibre optics, since this area was cropped out before being passed to the classifier. (Details in Supplementary Information.)
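The signed "SVM-Score" used above can be sketched with a linear SVM's decision function. This toy uses random placeholder features in place of the uBAM behaviour encodings of grasp sequences, so only the sign convention (positive = light-on, negative = light-off) matches the figure.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Placeholder behaviour features for light-on and light-off sequences;
# in the paper these are encodings of grasping-behaviour videos.
X_on = rng.normal(1.0, 1.0, size=(100, 32))
X_off = rng.normal(-1.0, 1.0, size=(100, 32))
X = np.vstack([X_on, X_off])
y = np.array([1] * 100 + [0] * 100)  # 1 = light-on, 0 = light-off

# Linear SVM; its signed decision value plays the role of the
# "SVM-Score": positive predicts light-on, negative light-off.
clf = LinearSVC(C=1.0).fit(X, y)
scores = clf.decision_function(X)
```

Ranking sequences by `scores` and inspecting the extremes is exactly how the two typical high- and low-scoring grasps in the figure were selected.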

Supplementary information

Supplementary Information

Supplementary Figs. 1–3, Tables 1–6 and Discussion.

Reporting Summary


About this article


Cite this article

Brattoli, B., Büchler, U., Dorkenwald, M. et al. Unsupervised behaviour analysis and magnification (uBAM) using deep learning. Nat Mach Intell 3, 495–506 (2021). https://doi.org/10.1038/s42256-021-00326-x

Download citation
