Understanding of animal collectives is limited by the ability to track each individual. We describe an algorithm and software that extract all trajectories from video, with high identification accuracy for collectives of up to 100 individuals. idtracker.ai uses two convolutional networks: one that detects when animals touch or cross and another for animal identification. The tool is trained with a protocol that adapts to video conditions and tracking difficulty.
idtracker.ai is open-source and free software (license GPL v.3). The source code and the instructions for its installation are available at https://gitlab.com/polavieja_lab/idtrackerai. A quick-start user guide and a detailed explanation of the GUI can be found at http://idtracker.ai/. The software is also provided as Supplementary Software.
Processed data that can be used to reproduce all figures and tables can be found at http://idtracker.ai/. Lossless compressed videos can be downloaded from the same page. Raw videos are available from the corresponding author upon reasonable request. A library of single-individual zebrafish images for use in testing identification methods can also be found at http://idtracker.ai/. Two example videos, one of 8 adult zebrafish and one of 100 juvenile zebrafish, are included as part of the quick-start user guide.
We thank A. Groneberg, A. Laan and A. Pérez-Escudero for discussions; J. Baúto, R. Ribeiro, P. Carriço, T. Cruz, J. Couceiro, L. Costa, A. Certal and I. Campos for assistance in software, arena design and animal husbandry; and A. Bruce (Monash University, Melbourne, Australia), N. Blüthgen (Technische Universität Darmstadt, Darmstadt, Germany), C. Ferreira, A. Laan and M. Iglesias-Julios (Champalimaud Foundation, Lisbon, Portugal) for videos of ants, flies and zebrafish fights. This study was supported by Congento LISBOA-01-0145-FEDER-022170, NVIDIA (M.G.B., F.H. and G.G.d.P.), PTDC/NEU-SCC/0948/2014 (G.G.d.P.) and Champalimaud Foundation (G.G.d.P.). F. R.-F. acknowledges an FCT PhD fellowship.
The authors declare no competing interests.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Integrated supplementary information
(a) Holding grid used to record 184 juvenile zebrafish (TU strain, 31 dpf) in separated chambers (60-mm-diameter Petri dishes). (b) Sample frame showing the individuals used to create the dataset and the individuals used as social context (n = 46 videos corresponding to n = 184 different individuals; ~18,000 frames per individual). (c) Summary of the individual-images dataset. The dataset is composed of a total of ~3,312,000 uncompressed, grayscale, labeled images (52 × 52 pixels).
Supplementary Figure 2 Single-image identification accuracy for different group sizes and different variations of the identification network.
Each network is trained from scratch using 3,000 temporally uncorrelated images per animal (90% for training and 10% for validation) and then tested with 300 new temporally uncorrelated images to compute the single-image identification accuracy (Supplementary Notes). We train and test each network five times. For every repetition, the individuals of the group and the images of each individual are selected randomly. Images are extracted from videos of 184 different animals recorded in isolation (Supplementary Fig. 1). Colored lines with markers represent single-image accuracies (mean ± s.d., n = 5) for network architectures with different numbers of convolutional layers (a; see Supplementary Table 2 for the architectures) and different sizes and numbers of fully connected layers (b; see Supplementary Table 3 for the architectures). The black solid line with diamond markers shows the accuracy for the network used to identify images in idtracker.ai (see Supplementary Table 1, identification convolutional neural network).
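The image budget in this caption can be illustrated with a minimal Python sketch. It assumes a pool of about 18,000 images per animal (the per-individual figure quoted for the dataset) and uses plain random sampling as a stand-in for the selection of temporally uncorrelated images, which is described in the Supplementary Notes and not reproduced here. The helper name `split_indices` is hypothetical.

```python
import random

IMAGES_PER_ANIMAL = 3000   # training + validation images per animal (from the caption)
TRAIN_FRACTION = 0.9       # 90% training, 10% validation (from the caption)
TEST_IMAGES = 300          # held-out test images per animal (from the caption)


def split_indices(pool_size, rng):
    """Return disjoint train/validation/test index lists for one animal.

    Assumption: uniform random sampling without replacement stands in for
    the paper's selection of temporally uncorrelated images.
    """
    needed = IMAGES_PER_ANIMAL + TEST_IMAGES
    if pool_size < needed:
        raise ValueError("image pool is smaller than the requested split")
    chosen = rng.sample(range(pool_size), needed)
    n_train = round(IMAGES_PER_ANIMAL * TRAIN_FRACTION)
    train = chosen[:n_train]
    validation = chosen[n_train:IMAGES_PER_ANIMAL]
    test = chosen[IMAGES_PER_ANIMAL:]
    return train, validation, test


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
train, validation, test = split_indices(18000, rng)
print(len(train), len(validation), len(test))
```

Repeating the call with five different seeds, and with a fresh random choice of animals each time, would mirror the five repetitions mentioned in the caption.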
(a) Front view of the experimental setup used to record zebrafish in groups and in isolation. (b) Side view of the same setup with the light diffuser rolled up. (c) Close-up view of the custom-made circular tank used to record the groups of 10, 60 and 100 juvenile zebrafish. (d) Sample frame from a video of 60 animals (n = 3 videos of 10 zebrafish, n = 3 videos of 60 zebrafish, and n = 3 videos of 100 zebrafish).
(a) Exterior view of the setup used to record flies in groups. (b) Top view of the same setup with the diffuser rolled up. (c) Close-up view of one of the two arenas used (arena 1). (d) Sample frame from a video of 100 flies (n = 1 group of 38 flies, n = 2 groups of 60 flies, n = 1 group of 72 flies, n = 2 groups of 80 flies, and n = 3 groups of 100 flies; all animals were different for each group).
Comparison between the accuracy estimated automatically by idtracker.ai and the accuracy computed by human validation of the videos (Supplementary Notes). The estimated accuracy is computed over the validated portion of the video. Blue dots represent the videos referenced in Supplementary Tables 5–7.
Supplementary Figure 6 Accuracy as a function of the minimum number of images in the first global fragment used for training.
To study the effect of the minimum number of images per individual in the first global fragment used to train the identification network, we created synthetic videos using images of 184 individuals recorded in isolation (Supplementary Fig. 1). Each synthetic video consists of 10,000 frames, where the number of images in every individual fragment was drawn from a gamma distribution, and the crossing fragments lasted for three frames (Supplementary Notes). The parameters were set as follows: θ = [2,000, 1,000, 500, 250, 100], k = [0.5, 0.35, 0.25, 0.15, 0.05], number of individuals = [10, 60, 100]. For every combination of these parameters we ran three repetitions. In total, we computed both the cascade of training and identification protocols and the residual identification for 225 synthetic videos (5 × 5 × 3 parameter combinations × 3 repetitions). (a) Identification accuracy for simulated (empty markers) and real videos (color markers) as a function of the minimum number of images in the first global fragment. The number next to each color marker indicates the number of animals in the video. The accuracy of the real videos was obtained by manual validation (Supplementary Tables 5–7). In some videos, animals are almost immobile for long periods of time because of low-humidity conditions. The individual fragments acquired during these periods potentially encode less information that is useful for identifying the animals. To account for this, we corrected the number of images in the individual fragments by considering only frames in which the animals were moving with a speed of at least 0.75 BL/s. We observed that idtracker.ai was more likely to have higher accuracy when the minimum number of images in the first global fragment used for training was > 30. (b) Distributions of the number of images per individual fragment for real videos of zebrafish, and their fits to a gamma distribution. (c) Distributions of speeds of zebrafish and fruit fly videos.
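The parameter grid and the gamma-distributed fragment lengths described in this caption can be sketched as follows. This is a minimal illustration, not the authors' simulation code: it assumes the conventional parameterization in which k is the shape and θ is the scale of the gamma distribution, and the helper `fragment_lengths` and its rounding and truncation rules are hypothetical.

```python
import itertools
import random

THETAS = [2000, 1000, 500, 250, 100]   # theta values from the caption (assumed: scale)
KS = [0.5, 0.35, 0.25, 0.15, 0.05]     # k values from the caption (assumed: shape)
GROUP_SIZES = [10, 60, 100]            # number of individuals
REPETITIONS = 3
FRAMES = 10_000                        # frames per synthetic video
CROSSING_FRAMES = 3                    # duration of each crossing fragment


def fragment_lengths(k, theta, rng, total_frames=FRAMES):
    """Draw individual-fragment lengths for one animal until the video is full.

    Assumptions: lengths are rounded to whole frames, are at least 1 frame
    long, and the last fragment is truncated at the end of the video.
    """
    lengths = []
    used = 0
    while used < total_frames:
        n = max(1, round(rng.gammavariate(k, theta)))
        n = min(n, total_frames - used)
        lengths.append(n)
        used += n + CROSSING_FRAMES  # each fragment is followed by a crossing
    return lengths


# One entry per synthetic video: every (theta, k, group size) combination,
# repeated three times.
conditions = list(itertools.product(THETAS, KS, GROUP_SIZES, range(REPETITIONS)))
print(len(conditions))  # 225, matching the caption

lengths = fragment_lengths(k=0.5, theta=2000, rng=random.Random(1))
```

Small k values put most of the probability mass on very short fragments, which is the regime in which the first global fragment offers few images per individual.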
Human-validated accuracy of tracking results obtained at six different resolutions. Pixel counts per animal are given at the identification stage. At the segmentation stage there are fewer pixels per animal: approximately 25 and 300, compared with 100 and 600, respectively, at the identification stage.
Human-validated accuracy of tracking results obtained at seven different values of the s.d. of a Gaussian filtering of the video.
Background images from two different experiments with 60 zebrafish (n = 1 experiment for each condition). Left, our standard setup; right, the setup after switching off the IR LEDs in two walls and covering the light diffuser on the same side with a black cloth. Human-validated accuracy of tracking results is given below the images. Each background image is computed as the average of equally spaced frames along the video, with a period of 100 frames.
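The background computation described in this caption amounts to a pixel-wise mean over every 100th frame. The sketch below uses plain Python lists so that it runs without dependencies. The function name `background_image` and the representation of a frame as a list of rows are assumptions made for illustration, not the idtracker.ai implementation.

```python
def background_image(frames, period=100):
    """Pixel-wise average of every `period`-th frame.

    `frames` is assumed to be a sequence of grayscale frames, each a list
    of rows of pixel intensities with identical dimensions.
    """
    sampled = frames[::period]
    if not sampled:
        raise ValueError("the video contains no frames")
    height = len(sampled[0])
    width = len(sampled[0][0])
    count = len(sampled)
    return [
        [sum(frame[row][col] for frame in sampled) / count for col in range(width)]
        for row in range(height)
    ]


# Toy video: 250 frames of 2 x 2 pixels, where every pixel of frame i equals i.
toy_video = [[[i, i], [i, i]] for i in range(250)]
background = background_image(toy_video)  # averages frames 0, 100 and 200
print(background)
```

With frames stored in a NumPy array of shape (frames, height, width), the same average could be written as `video[::100].mean(axis=0)`.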
Each colored line represents the attack score of an individual (see the Methods for the definition of ‘attack score’).
Supplementary Figure 11 Correlation between the average distance to the center of the tank and the average speed for two milling groups of 100 juvenile zebrafish.
(a) Probability density of the location in the tank of three representative individuals depicted in (b) as gray markers. (b) Average speed along the video as a function of the average distance to the center of the tank for all the fish in the group. Each black dot represents an individual; the gray markers are the individuals depicted in (a). The blue dashed line is the line of best fit to the data (R² = 0.5686, Pearson's r; P = 10⁻¹⁹, two-sided P value using Wald test with t-distribution of the test statistic). (c) Same as in (a) for a different video. (d) Same as in (b) for a different video (R² = 0.6934, Pearson's r; P = 7 × 10⁻²⁷, two-sided P value using Wald test with t-distribution of the test statistic).
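The line of best fit and the R² value reported in panels (b) and (d) can be illustrated with an ordinary least-squares fit, where R² is the square of Pearson's r. The sketch below is dependency-free and uses made-up toy values, not data from the paper. The function name `fit_line` is hypothetical, and the P value is not computed here.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least-squares line y = slope * x + intercept, plus Pearson's r."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Toy values only: average distance to the tank center vs. average speed.
distance = [1.0, 2.0, 3.0, 4.0, 5.0]
speed = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept, r = fit_line(distance, speed)
r_squared = r ** 2
print(slope, intercept, r_squared)
```

The caption's wording for the P value matches the description that `scipy.stats.linregress` gives for its two-sided test of a zero slope, although the caption itself does not name the tool used.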
Supplementary Figs. 1–11, Supplementary Tables 1–12 and Supplementary Note 1
Supplementary_software.zip contains two folders: (1) idtrackerai-1.0.3-alpha, which is the code for the idtracker.ai software at the time of publication (see https://gitlab.com/polavieja_lab/idtrackerai.git for the latest version), and (2) idtracker.ai_Figures_and_Tables_code, which includes the code to reproduce the panels in Figs. 1 and 2, as well as the Supplementary Figures and Supplementary Tables.
Cite this article
Romero-Ferrero, F., Bergomi, M.G., Hinz, R.C. et al. idtracker.ai: tracking all individuals in small or large collectives of unmarked animals. Nat Methods 16, 179–182 (2019). https://doi.org/10.1038/s41592-018-0295-5