

A-SOiD, an active-learning platform for expert-guided, data-efficient discovery of behavior

Abstract

To identify and extract naturalistic behavior, two approaches have become popular: supervised classification and unsupervised clustering. Each carries its own strengths and weaknesses (for example, user bias, training cost, complexity and action discovery), which the user must weigh when choosing between them. Here, an active-learning platform, A-SOiD, blends these strengths and, in doing so, overcomes several of their inherent drawbacks. A-SOiD iteratively learns user-defined groups with a fraction of the usual training data, while attaining expansive classification through directed unsupervised classification. In socially interacting mice, A-SOiD outperformed standard methods despite requiring 85% less training data. Additionally, it isolated ethologically distinct mouse interactions via unsupervised classification. We observed similar performance and efficiency using nonhuman primate and human three-dimensional pose data. In both cases, the transparency of A-SOiD’s cluster definitions revealed the defining features of the supervised classification through a game-theoretic approach. To facilitate use, A-SOiD comes as an intuitive, open-source interface for efficient segmentation of user-defined behaviors and discovered sub-actions.


Fig. 1: Human annotations of social interactions cannot be easily represented in unsupervised embeddings.
Fig. 2: Active learning improves data efficiency and overall performance via different feature weighting.
Fig. 3: Benchmark results against state-of-the-art supervised classification methods.
Fig. 4: Unsupervised clustering can be used to discover and integrate behavioral expressions in previously unspecific data.
Fig. 5: Efficient segmentation of monkey behavioral repertoire.


Data availability

The social mice dataset (CalMS21) used in this study is available online (https://doi.org/10.22002/D1.1991)24,51. The single-monkey dataset used in this study is available in our GitHub repository (https://github.com/YttriLab/asoid_paper/demo_dataset) thanks to the laboratories of J. Zimmermann and B. Hayden at the University of Minnesota. The human dataset is part of the public NTU-RGB+D Action Recognition Dataset made available by the ROSE Laboratory at the Nanyang Technological University, Singapore38.

Code availability

The app, further documentation and the open-source code written in Python can be found at https://github.com/YttriLab/A-SOID (ref. 52) and are licensed under a modified BSD-3 license that permits unrestricted use for non-commercial applications. The code used to generate the figures in this paper is open source and available in a GitHub repository at https://github.com/YttriLab/asoid_paper (ref. 53).

References

  1. Mathis, A. et al. DeepLabCut: markerless pose estimation of user-defined body parts with deep learning. Nat. Neurosci. 21, 1281–1289 (2018).

  2. Lauer, J. et al. Multi-animal pose estimation, identification and tracking with DeepLabCut. Nat. Methods 19, 496–504 (2022).

  3. Pereira, T. D. et al. SLEAP: a deep learning system for multi-animal pose tracking. Nat. Methods 19, 486–495 (2022).

  4. Graving, J. M. et al. DeepPoseKit, a software toolkit for fast and robust animal pose estimation using deep learning. eLife 8, e47994 (2019).

  5. Segalin, C. et al. The mouse action recognition system (MARS) software pipeline for automated analysis of social behaviors in mice. eLife 10, e63720 (2021).

  6. Bala, P. C. et al. Automated markerless pose estimation in freely moving macaques with OpenMonkeyStudio. Nat. Commun. 11, 1–12 (2020).

  7. Nilsson, S. R. O. et al. Simple Behavioral Analysis (SimBA) - an open source toolkit for computer classification of complex social behaviors in experimental animals. Preprint at bioRxiv https://doi.org/10.1101/2020.04.19.049452 (2020).

  8. Hsu, A. I. & Yttri, E. A. B-SOiD, an open-source unsupervised algorithm for identification and fast prediction of behaviors. Nat. Commun. 12, 1–13 (2021).

  9. Luxem, K. et al. Identifying behavioral structure from deep variational embeddings of animal motion. Commun. Biol. 5, 1–15 (2022).

  10. Wiltschko, A. B. et al. Mapping sub-second structure in mouse behavior. Neuron 88, 1121–1135 (2015).

  11. Schweihoff, J. F. et al. DeepLabStream enables closed-loop behavioral experiments using deep learning-based markerless, real-time posture detection. Commun. Biol. 4, 1–11 (2021).

  12. Kane, G. A., Lopes, G., Saunders, J. L., Mathis, A. & Mathis, M. W. Real-time, low-latency closed-loop feedback using markerless posture tracking. eLife 9, 1–29 (2020).

  13. Nourizonoz, A. et al. EthoLoop: automated closed-loop neuroethology in naturalistic environments. Nat. Methods 17, 1052–1059 (2020).

  14. Klibaite, U. et al. Deep phenotyping reveals movement phenotypes in mouse neurodevelopmental models. Mol. Autism 13, 1–18 (2022).

  15. Giancardo, L. et al. Automatic visual tracking and social behaviour analysis with multiple mice. PLoS ONE 8, e74557 (2013).

  16. De Chaumont, F. et al. Computerized video analysis of social interactions in mice. Nat. Methods 9, 410–417 (2012).

  17. Hong, W. et al. Automated measurement of mouse social behaviors using depth sensing, video tracking, and machine learning. Proc. Natl Acad. Sci. USA 112, E5351–E5360 (2015).

  18. Kabra, M., Robie, A. A., Rivera-Alba, M., Branson, S. & Branson, K. JAABA: interactive machine learning for automatic annotation of animal behavior. Nat. Methods 10, 64–67 (2013).

  19. von Ziegler, L., Sturman, O. & Bohacek, J. Big behavior: challenges and opportunities in a new era of deep behavior profiling. Neuropsychopharmacology 46, 33–44 (2020).

  20. Berman, G. J., Choi, D. M., Bialek, W. & Shaevitz, J. W. Mapping the stereotyped behaviour of freely moving fruit flies. J. R. Soc. Interface 11, 20140672 (2014).

  21. Marshall, J. D. et al. Continuous whole-body 3D kinematic recordings across the rodent behavioral repertoire. Neuron 109, 420–437 (2021).

  22. Goodwin, N. L., Nilsson, S. R., Choong, J. J. & Golden, S. A. Toward the explainability, transparency, and universality of machine learning for behavioral classification in neuroscience. Curr. Opin. Neurobiol. 73, 102544 (2022).

  23. Berman, G. J. Measuring behavior across scales. BMC Biol. 16, 1–11 (2018).

  24. Sun, J. J. et al. The multi-agent behavior dataset: mouse dyadic social interactions. Preprint at arXiv https://doi.org/10.48550/arXiv.2104.02710 (2021).

  25. Karashchuk, P. et al. Anipose: a toolkit for robust markerless 3D pose estimation. Cell Rep. 36, 109730 (2021).

  26. Dunn, T. W. et al. Geometric deep learning enables 3D kinematic profiling across species and environments. Nat. Methods 18, 564–573 (2021).

  27. Günel, S. et al. Deepfly3D, a deep learning-based approach for 3D limb and appendage tracking in tethered, adult Drosophila. eLife 8, e48571 (2019).

  28. Todd, J. G., Kain, J. S. & de Bivort, B. L. Systematic exploration of unsupervised methods for mapping behavior. Phys. Biol. 14, 015002 (2017).

  29. Lundberg, S. & Lee, S.-I. A unified approach to interpreting model predictions. GitHub https://github.com/slundberg/shap (2017).

  30. Lundberg, S. M. et al. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2, 56–67 (2020).

  31. Mathis, M. W. & Mathis, A. Deep learning tools for the measurement of animal behavior in neuroscience. Curr. Opin. Neurobiol. 60, 1–11 (2020).

  32. Bohnslav, J. P. et al. DeepEthogram, a machine learning pipeline for supervised behavior classification from raw pixels. eLife 10, e63377 (2021).

  33. Chawla, N. V., Bowyer, K. W., Hall, L. O. & Kegelmeyer, W. P. SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002).

  34. Blagus, R. & Lusa, L. SMOTE for high-dimensional class-imbalanced data. BMC Bioinform. 14, 1–16 (2013).

  35. Maldonado, S., López, J. & Vairetti, C. An alternative SMOTE oversampling strategy for high-dimensional datasets. Appl. Soft Comput. 76, 380–389 (2019).

  36. Winslow, J. T. Mouse social recognition and preference. Curr. Protoc. Neurosci. 22, 1–8 (2003).

  37. Yang, M., Loureiro, D., Kalikhman, D. & Crawley, J. N. Male mice emit distinct ultrasonic vocalizations when the female leaves the social interaction arena. Front. Behav. Neurosci. 7, 159 (2013).

  38. Shahroudy, A., Liu, J., Ng, T. T. & Wang, G. NTU RGB+D: a large scale dataset for 3D human activity analysis. Preprint at arXiv https://doi.org/10.48550/arXiv.1604.02808 (2016).

  39. Sturman, O. et al. Deep learning-based behavioral analysis reaches human accuracy and is capable of outperforming commercial solutions. Neuropsychopharmacology 45, 1942–1952 (2020).

  40. Datta, S. R., Anderson, D. J., Branson, K., Perona, P. & Leifer, A. Computational neuroethology: a call to action. Neuron 104, 11–24 (2019).

  41. McInnes, L., Healy, J. & Melville, J. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. Preprint at arXiv https://doi.org/10.48550/arXiv.1802.03426 (2020).

  42. Becht, E. et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat. Biotechnol. 37, 38–44 (2018).

  43. Packer, J. S. et al. A lineage-resolved molecular atlas of C. elegans embryogenesis at single-cell resolution. Science 365, eaax1971 (2019).

  44. van Unen, V. et al. Mass cytometry of the human mucosal immune system identifies tissue- and disease-associated immune subsets. Immunity 44, 1227–1239 (2016).

  45. Campello, R. J., Moulavi, D. & Sander, J. Density-based clustering based on hierarchical density estimates. In Pacific-Asia Conference on Knowledge Discovery and Data Mining https://doi.org/10.1007/978-3-642-37456-2_14 (2013).

  46. Stringer, C. et al. Spontaneous behaviors drive multidimensional, brainwide activity. Science 364, eaav7893 (2019).

  47. Harris, C. R. et al. Array programming with NumPy. Nature 585, 357–362 (2020).

  48. Maisson, D. J.-N. et al. Widespread coding of navigational variables in prefrontal cortex. Curr. Biol. 33, 3478–3488 (2023).

  49. Voloh, B. et al. Hierarchical action encoding in prefrontal cortex of freely moving macaques. Cell Rep. 42, 113091 (2023).

  50. Friard, O. & Gamba, M. BORIS: a free, versatile open-source event-logging software for video/audio coding and live observations. Methods Ecol. Evol. 7, 1325–1330 (2016).

  51. Sun, J. J., Kennedy, A., Zhan, E., Yue, Y. & Perona, P. Task programming: learning data efficient behavior representations. Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. https://doi.org/10.1109/CVPR46437.2021.00290 (2021).

  52. Tillmann, J. F., Hsu, A. I., Schwarz, M. K. & Yttri, E. A. A-SOiD: an active learning platform for expert-guided, data efficient discovery of behavior. Zenodo https://doi.org/10.5281/zenodo.10210509 (2023).

  53. Tillmann, J. F., Hsu, A. I., Schwarz, M. K. & Yttri, E. A. Paper code repository for an active learning platform for expert-guided, data efficient discovery of behavior. Zenodo https://doi.org/10.5281/zenodo.10257993 (2023).


Acknowledgements

We thank staff at the laboratories of J. Zimmermann and B. Hayden at the University of Minnesota for their patience and for unpublished 3D OpenMonkeyStudio pose data. We also thank A. Kennedy, J. Sun, D. Anderson and their teams at the California Institute of Technology and Northwestern University, USA, who created the CalMS21 dataset as a comprehensive resource for benchmarking current and future approaches to the classification of social behavior in mice. Portions of the research in this paper used the NTU-RGB+D Action Recognition Dataset made available by the ROSE Laboratory at the Nanyang Technological University, Singapore.

This work was supported by the German Research Foundation (SFB1089 P02, P03 and B06 to M.K.S.; SPP 2041 SCHW1578/2-1 to M.K.S.). Research in the Schwarz laboratory was also supported by the Verein zur Förderung der Epilepsieforschung e.V. (M.K.S. and J.F.T.) and by the program Netzwerke 2021, an initiative of the Ministry of Culture and Science of the State of North Rhine-Westphalia (M.K.S.). Research in the Yttri laboratory was supported by the Brain Research Foundation, the Kaufman Foundation and the US-Israel Binational Science Foundation (E.A.Y. and A.I.H.).

Author information

Authors and Affiliations

Authors

Contributions

J.F.T. and A.I.H. contributed equally to this paper. M.K.S. and E.A.Y. contributed equally and are both corresponding authors. J.F.T., A.I.H., M.K.S. and E.A.Y. wrote and reviewed the paper. J.F.T. proposed the idea and A.I.H. designed the core functionality and main analysis scripts. Both J.F.T. and A.I.H. created the app and worked on analysis and figures. J.F.T. annotated the primate data. E.A.Y. and M.K.S. provided support and funds.

Corresponding authors

Correspondence to Martin K. Schwarz or Eric A. Yttri.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editor: Nina Vogt, in collaboration with the Nature Methods team. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Active-learning framework on user-defined data.

a) The A-SOiD GUI offers step-by-step navigation for running A-SOiD on a user’s own data. b) First, users select the origin/type of their pose estimation data (SLEAP, DLC or CALMS21) and upload their dataset, including a previously labeled ground truth. Right, users then enter basic parameters (frame rate, resolution) and the behavioral categories of interest contained in the ground-truth dataset, and sub-select individuals/animals and key points/body parts as the basis for feature extraction. c) After input of a temporal reference frame (that is, bout length) for feature extraction, chosen using a histogram as shown in Fig. 1 (top), features are extracted and a number of splits are provided to evaluate later classification training. d) In the active-learning segment, a classifier is trained by the iterative addition of low-confidence predictions. Here, refinement labels are taken directly from the remaining ground truth. During each iteration, the model’s performance is evaluated on held-out test data for multiple splits. This process can be viewed live for each iteration. e) Finally, once training is complete, users can upload new unlabeled data and apply the previously trained model for classification. f) After classification, the app allows users to go through the results and view a brief report. g) Users can also discover conserved patterns in their ground-truth data by selectively clustering annotation classes with directed unsupervised classification. Subtypes of interest can then be exported to create a new training set and used to train a classifier.
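
For readers who want a concrete picture of the refinement loop in panel d, below is a minimal sketch of iterative low-confidence querying with a scikit-learn random forest. The file names, seed ratio, iteration count and per-iteration sample budget are illustrative assumptions, not A-SOiD’s implementation or defaults.

```python
# Minimal active-learning sketch: train on a small seed set, query the pool samples the
# classifier is least confident about, add their ground-truth labels, and retrain.
# Array files and hyperparameters below are hypothetical placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = np.load("features.npy"), np.load("labels.npy")        # hypothetical feature/label arrays
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Small labeled seed set; the remaining training frames act as the "unlabeled" pool.
seed = rng.choice(len(X_train), size=max(1, int(0.01 * len(X_train))), replace=False)
pool = np.setdiff1d(np.arange(len(X_train)), seed)

for iteration in range(20):                                   # assumed maximum number of iterations
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X_train[seed], y_train[seed])
    print(iteration, f1_score(y_test, clf.predict(X_test), average="macro"))

    # Rank pool samples by prediction confidence and refine the least confident ones.
    confidence = clf.predict_proba(X_train[pool]).max(axis=1)
    query = pool[np.argsort(confidence)[:200]]                # assumed per-iteration sample budget
    seed = np.concatenate([seed, query])
    pool = np.setdiff1d(pool, query)
```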

Extended Data Fig. 2 Extended embeddings and feature windows.

a) Different embeddings of the features extracted from the CalMS21 dataset (feature window = 400 ms, or 12 frames). Left, UMAP embedding as seen in Fig. 1g. Middle, principal component analysis (PCA, n = 2). Right, t-distributed stochastic neighbor embedding (t-SNE, n = 2). The annotations for each underlying behavior are superimposed onto the embedding (attack = red, investigation = orange, mount = blue, other = dark gray). Insets show the underlying distribution separated by behavior (light gray). b) 2D UMAP embedding of the CalMS21 features provided with the dataset. These features include the pose estimation of all body parts and 32 additional features extracted with TREBA[53]. c) 2D UMAP embeddings of feature bins across a range of 2 to 150 frames. Note that 2 frames (top left) is the minimum at a 30 Hz frame rate, and 150 frames (bottom right) is considered very coarse for most observations. d) Adjusted mutual information score (AMI, black line) between the cluster assignments and the original human annotations for a minimum cluster size ranging from 0.5 to 10.0% in 0.5% intervals. An AMI score of 1.0 indicates a perfect overlap even if the number of groups is different; that is, if investigation was perfectly represented by two clusters instead of one. In addition, we investigated the total number of clusters (gray, dashed line), as we expected a higher split to be more likely to capture the subtle differences between the behaviors. e) Selected HDBSCAN clusterings of the embedded features (400 ms) in d. Left, the highest AMI was reached at a minimum cluster size of 6.0% (AMI = 0.267). Right, the highest number of clusters was found at a minimum cluster size of 1.5% (AMI = 0.195). Colors show identified clusters within each plot. Note that samples that cannot be confidently associated with a cluster are collectively annotated as noise (dark gray) and can therefore span the entire embedding. f) 2D histogram of the assigned cluster groups (y axis) in d in relation to their ground-truth annotations. Each histogram is normalized so that the sum of each column (ground-truth behavior, for example, attack) is 1.0.
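
The embedding, clustering and scoring steps in panels c–f can be outlined as follows, assuming the umap-learn, hdbscan and scikit-learn packages. The input arrays are hypothetical and the sweep simply mirrors the 0.5 to 10.0% minimum-cluster-size range described above; it does not reproduce A-SOiD’s exact feature pipeline.

```python
# Sketch: embed windowed pose features with UMAP, cluster with HDBSCAN over a range of
# minimum cluster sizes, and score each clustering against human annotations with the
# adjusted mutual information (AMI). Input files are hypothetical placeholders.
import numpy as np
import hdbscan
import umap
from sklearn.metrics import adjusted_mutual_info_score

features = np.load("window_features.npy")       # hypothetical (n_frames, n_features) array
annotations = np.load("annotations.npy")        # hypothetical per-frame ground-truth labels

embedding = umap.UMAP(n_components=2, random_state=0).fit_transform(features)

results = []
for pct in np.arange(0.5, 10.5, 0.5):           # minimum cluster size as % of all samples
    min_size = int(len(embedding) * pct / 100)
    labels = hdbscan.HDBSCAN(min_cluster_size=min_size).fit_predict(embedding)
    assigned = labels >= 0                       # HDBSCAN marks unassigned samples as -1 (noise)
    ami = adjusted_mutual_info_score(annotations[assigned], labels[assigned])
    results.append((pct, ami, labels.max() + 1))

best = max(results, key=lambda r: r[1])
print(f"best AMI {best[1]:.3f} at min_cluster_size {best[0]}% ({best[2]} clusters)")
```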

Extended Data Fig. 3 More candidates for active-learning refinement appear at behavior transitions.

a) Transition matrix for the frame annotation immediately preceding refinement candidates throughout A-SOiD (left), compared with a random frame selection of these behaviors (right). b) Adjusted mutual information score as a metric to quantify similarity between the prior frame (t-1) and the refinement candidate/random selection (t). n = 20 random seeds for sampling. Boxes span the first to third quartiles, with whiskers extending below the first and above the third quartile by 1.5 times the interquartile range. One-sided t-test, p = 8.887e-32. c) Transition matrix for the frame annotation immediately following refinement candidates throughout A-SOiD (left), in contrast to a random frame selection of these behaviors (right). d) Adjusted mutual information score as a metric to quantify similarity between the next frame (t+1) and the refinement candidate/random selection. n = 20 random seeds for sampling. Boxes span the first to third quartiles, with whiskers extending below the first and above the third quartile by 1.5 times the interquartile range. One-sided t-test, p = 1.154e-34.
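
The transition-matrix and AMI comparison above can be reproduced in outline as follows; the per-frame label array, the indices of refinement candidates and the size-matched random baseline are placeholders used only for illustration.

```python
# Sketch: compare the annotation at frame t-1 with the annotation at frames selected for
# refinement, versus a size-matched random selection of frames. Inputs are placeholders.
import numpy as np
from sklearn.metrics import adjusted_mutual_info_score

labels = np.load("frame_annotations.npy")          # hypothetical per-frame integer labels
candidates = np.load("refinement_frames.npy")      # hypothetical refinement-candidate frame indices
candidates = candidates[candidates > 0]            # each candidate needs a valid preceding frame

def transition_matrix(frames, labels, n_classes):
    """Row-normalized counts of (label at t-1, label at t) for the given frames."""
    counts = np.zeros((n_classes, n_classes))
    np.add.at(counts, (labels[frames - 1], labels[frames]), 1)
    return counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)

n_classes = int(labels.max()) + 1
rng = np.random.default_rng(0)
random_frames = rng.choice(np.arange(1, len(labels)), size=len(candidates), replace=False)

print(transition_matrix(candidates, labels, n_classes))
print("AMI (candidates):", adjusted_mutual_info_score(labels[candidates - 1], labels[candidates]))
print("AMI (random):", adjusted_mutual_info_score(labels[random_frames - 1], labels[random_frames]))
```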

Extended Data Fig. 4 SHAP-replicated decision paths of two instances where A-SOiD, but not a classifier trained on the full training data at once, correctly predicted “attack” and “investigation”.

a) Multi-output decision plot for an instance where the human annotation was “attack” but A-SOiD iteration 0 predicted “investigation”. The features on the x axis are ranked and sorted by their impact on the cumulative prediction probability on the y axis. The highest final prediction probability among the four color-coded lines reveals that the prediction was “investigation”. b) Multi-output decision plot for an instance where the human annotation was “investigation” but A-SOiD iteration 0 predicted “other”. The features on the x axis are ranked and sorted by their impact on the cumulative prediction probability on the y axis. The highest final prediction probability among the four color-coded lines reveals that the prediction was “other”. c) Similar to a), a multi-output decision plot for an instance where the human annotation was “attack” but a classifier trained on the full training data at once incorrectly predicted “investigation”. d) Similar to b), a multi-output decision plot for an instance where the human annotation was “investigation” but full training at once incorrectly predicted “other”. e) Similar to a) and c), a multi-output decision plot for an instance where the human annotation was “attack” and A-SOiD iteration 20 correctly predicted “attack”. f) Similar to b) and d), a multi-output decision plot for an instance where the human annotation was “investigation” and A-SOiD iteration 20 correctly predicted “investigation”. Red: “attack”; orange: “investigation”; blue: “mount”; black: “other”. Feature names are shown in black for inter-animal distance features, gray for speed features and teal for angular change features. R: resident; I: intruder.
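
Multi-output decision plots of this kind can be generated with the shap package for any tree-based classifier. The sketch below assumes a random forest, placeholder feature/label files and generic feature names; it is not the exact plotting code used for this figure, and it assumes an older shap version in which TreeExplainer returns one SHAP array per class.

```python
# Sketch: replicate a per-frame decision path across all classes with SHAP's TreeExplainer
# and a multi-output decision plot. Model, data and the chosen row are placeholders.
import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier

X, y = np.load("features.npy"), np.load("labels.npy")    # hypothetical feature/label arrays
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(clf)
shap_values = explainer.shap_values(X[:100])             # assumed: a list of per-class arrays

row = 0                                                  # e.g., a single misclassified frame
shap.multioutput_decision_plot(
    list(explainer.expected_value),                      # per-class base values
    shap_values,
    row_index=row,
    feature_names=[f"feature_{i}" for i in range(X.shape[1])],
    legend_labels=["attack", "investigation", "mount", "other"],
    legend_location="lower right",
)
```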

Extended Data Fig. 5 Benchmark results against state-of-the-art supervised classification methods.

a) Predicted ethograms (including “other”) using SimBA (blue), DeepEthogram (DEG, purple) and A-SOiD (pink) against the target human annotation (black) without “other”. Every fiftieth frame is shown to allow for better visualization. Session n = 19. b) Micro F1 scores across all behaviors (including classification of “other”) using SimBA (blue), DEG (purple) and A-SOiD (pink). Error bars represent +/- 3 SD across 20 seeds. c) Weighted macro F1 scores averaged across all behaviors (including classification of “other”) using SimBA (blue), DEG (purple) and A-SOiD (pink). Error bars represent +/- 3 SD across 20 seeds. d) The percentage of each behavior predicted by each algorithm (top left: SimBA; top right: DEG; bottom left: A-SOiD; bottom right: human annotation). e) The total number of frames predicted as “attack”, “investigation”, “mount” and “other” by each algorithm (top left: SimBA; top right: DEG; bottom left: A-SOiD; bottom right: human annotation). f) F1 scores for individual behaviors using SimBA (blue), DEG (purple) and A-SOiD (pink). Error bars represent +/- 3 SD across 20 seeds. g) F1 scores for individual behaviors using unbalanced training (unbal, black), random under-sampling (under, brown), SMOTE (maroon) and A-SOiD (pink). Error bars represent +/- 3 SD across 20 seeds.
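
The micro, weighted macro and per-class F1 scores reported in panels b, c, f and g correspond to standard scikit-learn metrics; a minimal sketch with placeholder prediction and ground-truth arrays is shown below.

```python
# Sketch of the benchmark metrics: micro, weighted macro and per-class F1 scores.
# The ground-truth and prediction arrays are hypothetical placeholders.
import numpy as np
from sklearn.metrics import f1_score

y_true = np.load("ground_truth.npy")      # hypothetical per-frame annotations (including "other")
y_pred = np.load("predictions.npy")       # hypothetical classifier output of the same length

behaviors = ["attack", "investigation", "mount", "other"]
print("micro F1:", f1_score(y_true, y_pred, average="micro"))
print("weighted macro F1:", f1_score(y_true, y_pred, average="weighted"))
for name, score in zip(behaviors, f1_score(y_true, y_pred, average=None)):
    print(f"F1 ({name}): {score:.3f}")
```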

Extended Data Fig. 6 Active learning speed across different sample sizes.

To estimate the total time it takes to run larger datasets with A-SOiD, we measured the runtime of our active-learning regime (max iterations = 20, max number of samples per iteration = 200, initial ratio = 0.01) across a range of subsets (0.3 to 1.0) of the CalMS21 dataset (number of features = 100). We then fit a linear function to the measurements to estimate the speed at increasing sample sizes. a) Total time A-SOiD takes to complete 20 iterations, including feature extraction of the training set. Given the fit, every 1,000 new samples increase the runtime by 3 s; running 1 million samples takes roughly 53 min. b) Isolated feature-extraction speed for each subset. Given the fit, every 1,000 samples increase the runtime by 2 s (1 million samples take about 28 min). Notably, we considerably optimized feature extraction by employing just-in-time compilation with the Python package numba 0.52.0 (https://github.com/numba/numba). However, feature extraction, which is run once at the beginning, remains the major speed bottleneck. The vertical dotted line indicates the original size of the dataset. Each subset was repeated 3 times, and the speed was averaged across seeds. Error bars represent the standard deviation.
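
As an illustration of the just-in-time compilation mentioned above, the sketch below JIT-compiles a simple pairwise-distance feature with numba and fits a line to runtime versus sample count; the feature, array shapes and timing loop are simplified assumptions rather than A-SOiD’s actual feature extraction.

```python
# Sketch: JIT-compile a simple per-frame distance feature with numba and fit a line to
# runtime versus sample count, mirroring the scaling estimate above. Illustrative only.
import time
import numpy as np
from numba import njit

@njit
def pairwise_distances(poses):
    """Euclidean distance between every pair of key points, per frame."""
    n_frames, n_points, _ = poses.shape
    n_pairs = n_points * (n_points - 1) // 2
    out = np.zeros((n_frames, n_pairs))
    for f in range(n_frames):
        k = 0
        for i in range(n_points):
            for j in range(i + 1, n_points):
                dx = poses[f, i, 0] - poses[f, j, 0]
                dy = poses[f, i, 1] - poses[f, j, 1]
                out[f, k] = np.sqrt(dx * dx + dy * dy)
                k += 1
    return out

sizes, runtimes = [], []
for n in (10_000, 50_000, 100_000, 500_000):
    poses = np.random.rand(n, 7, 2)        # synthetic (frames, key points, x/y) data
    pairwise_distances(poses[:10])         # warm-up call so compilation time is not measured
    start = time.perf_counter()
    pairwise_distances(poses)
    sizes.append(n)
    runtimes.append(time.perf_counter() - start)

slope, _ = np.polyfit(sizes, runtimes, 1)
print(f"~{slope * 1000:.3f} s of runtime per 1,000 additional samples")
```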

Extended Data Fig. 7 SHAP-replicated decision paths of two instances where A-SOiD, but not a classifier trained on the full training data at once, correctly predicted “jump” and “climb (S)”.

a) Multi-output decision plot for an instance where the human annotation was “jump” but A-SOiD iteration 0 predicted “walk”. The features on the x axis are ranked and sorted by their impact on the cumulative prediction probability on the y axis. The highest final prediction probability among the color-coded lines reveals that the prediction was “walk”. b) Multi-output decision plot for an instance where the human annotation was “climb (S)” but A-SOiD iteration 0 predicted “rear”. The features on the x axis are ranked and sorted by their impact on the cumulative prediction probability on the y axis. The highest final prediction probability among the color-coded lines reveals that the prediction was “rear”. c) Similar to a), a multi-output decision plot for an instance where the human annotation was “jump” but a classifier trained on the full training data at once incorrectly predicted “walk”. d) Similar to b), a multi-output decision plot for an instance where the human annotation was “climb (S)” but full training at once incorrectly predicted “rear”. e) Similar to a) and c), a multi-output decision plot for an instance where the human annotation was “jump” and A-SOiD iteration 20 correctly predicted “jump”. f) Similar to b) and d), a multi-output decision plot for an instance where the human annotation was “climb (S)” and A-SOiD iteration 20 correctly predicted “climb (S)”. Orange: “ceiling climb” (climb C); yellow: “sidewall climb” (climb S); pink: “jump”; green: “rear”; blue: “walk”. Feature names are shown in black for distance features, gray for speed features and teal for angular change features. R: right-sided body parts; L: left-sided body parts.

Extended Data Fig. 8 A-SOiD’s performance on data that contains a higher number of actions, including more complex actions (NTU-RGB-D60 dataset; see Methods).

Social human actions (a-e) and single human actions (f-h). a) Schematic representation of the pose estimation skeletons in the NTU-RGB-D60 dataset. b) A-SOiD performance (F1 score, 3 cross-validations, blue) on held-out test data at the last iteration (50 iterations; n_samples = 81045, 77% of the training set) plotted against a classifier trained with the same number of samples (partial, red) and against the full training set (full, black). See Supplementary Table 6 for details. c) Performance (F1 score, 3 cross-validations) on held-out test data plotted against active-learning iterations. Black: unweighted class average. The dashed line shows the unweighted class-average performance using all training annotations at once. Data presented as mean values +/- standard deviation across 3 random initializations. d) Stacked bar graph of the number of training annotations plotted against active-learning iterations (right-most: full annotation count). Bars represent average training samples across 3 seeds. The dashed line indicates the number of samples in the final iteration (77%, n = 81045). e) Confusion matrices showing where prediction mistakes were made for the last iteration (A-SOiD, top) and for a classifier trained on the same number of samples, randomly sampled from the training set (partial, bottom). Darker red shades along the diagonal indicate better algorithm performance in matching the ground truth. f) Performance across all 49 actions (F1 score, 3 cross-validations, blue) on held-out test data at the last iteration (90 iterations; n_samples = 546245, 87% of the training set) plotted against the performance of a classifier trained with the same number of samples (partial, red) and with the full training set (full, black). Data presented as mean values +/- standard deviation across 3 random initializations. See Supplementary Table 7 for details. g) Performance (F1 score, 3 seed cross-validations) on held-out test data plotted against active-learning iterations. Ten representative actions, including the worst and best performing, are shown. Dashed lines show performance using all training annotations at once. Black: unweighted class average across all 49 actions. Error bars represent the standard deviation across 3 initializations. h) Same as e, but for the 49-action subset.

Extended Data Fig. 9 A-SOiD performance using 3, 5 or 7 key points.

a) Identical to the configuration used in Fig. 2, five key points were used (nose, neck, two hips and tail base). Data presented as mean values +/- standard deviation across 20 random initializations. b) Average A-SOiD performance using only three key points (nose, neck and tail base). Data presented as mean values +/- standard deviation across 20 random initializations. c) Average A-SOiD performance using all seven key points (nose, two ears, neck, two hips and tail base). Data presented as mean values +/- standard deviation across 20 random initializations. d) Average F1 score difference between five key points and either all seven key points or the reduced three key points over active-learning iterations.

Supplementary information

Supplementary Information

Supplementary Discussion, Supplementary Fig. 1, Supplementary Note and Supplementary Tables 1–7.

Reporting Summary

Peer Review File

Supplementary Video 1

Video examples of two subclasses segmented from investigation that reflect anogenital investigation. On the left, one mouse directly approaches the anogenital area of another mouse, irrespective of the incoming angle (anogenital approach). On the right, one mouse investigates the anogenital area of another mouse while already being in close proximity to begin with (anogenital investigation).

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Tillmann, J.F., Hsu, A.I., Schwarz, M.K. et al. A-SOiD, an active-learning platform for expert-guided, data-efficient discovery of behavior. Nat Methods 21, 703–711 (2024). https://doi.org/10.1038/s41592-024-02200-1

