Brief Communication

idtracker.ai: tracking all individuals in small or large collectives of unmarked animals

Abstract

Our understanding of animal collectives is limited by our ability to track each individual. We describe an algorithm and software that extract all trajectories from video, with high identification accuracy for collectives of up to 100 individuals. idtracker.ai uses two convolutional networks: one that detects when animals touch or cross and another for animal identification. The tool is trained with a protocol that adapts to video conditions and tracking difficulty.
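The identification idea can be illustrated with a toy sketch. The core notion is that between crossings an animal produces a sequence of single-animal images (a fragment), and per-image classifier outputs can be accumulated over the fragment to assign one identity. The code below is a simplified stand-in, not the authors' implementation: the "softmax outputs" are synthetic numbers, and summing log-probabilities is one simple accumulation rule, not necessarily the exact rule used by idtracker.ai.

```python
import numpy as np

# Toy sketch: assign an identity to an "individual fragment" (a sequence
# of single-animal images between crossings) by combining per-image
# classifier outputs. The probabilities below are synthetic; in
# idtracker.ai they would come from the identification network.
rng = np.random.default_rng(0)

n_identities = 5
n_images_in_fragment = 40

# Fake per-image scores that mostly favour identity 2, with noise.
logits = rng.normal(0.0, 1.0, size=(n_images_in_fragment, n_identities))
logits[:, 2] += 2.0

# Softmax: per-image probability over the candidate identities.
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Accumulate evidence across the fragment: sum of log-probabilities,
# i.e. the log of the product of the per-image probabilities.
log_evidence = np.log(probs).sum(axis=0)
assigned_identity = int(np.argmax(log_evidence))
print(assigned_identity)
```

Accumulating over many images in a fragment is what makes fragment-level assignment far more reliable than any single-image prediction, even when individual images are ambiguous.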


Code availability

idtracker.ai is open-source and free software (license GPL v.3). The source code and the instructions for its installation are available at https://gitlab.com/polavieja_lab/idtrackerai. A quick-start user guide and a detailed explanation of the GUI can be found at http://idtracker.ai/. The software is also provided as Supplementary Software.

Data availability

Processed data that can be used to reproduce all figures and tables can be found at http://idtracker.ai/. Lossless compressed videos can be downloaded from the same page. Raw videos are available from the corresponding author upon reasonable request. A library of single-individual zebrafish images for use in testing identification methods can also be found at http://idtracker.ai/. Two example videos, one of 8 adult zebrafish and one of 100 juvenile zebrafish, are also included as part of the quick-start user guide.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Acknowledgements

We thank A. Groneberg, A. Laan and A. Pérez-Escudero for discussions; J. Baúto, R. Ribeiro, P. Carriço, T. Cruz, J. Couceiro, L. Costa, A. Certal and I. Campos for assistance in software, arena design and animal husbandry; and A. Bruce (Monash University, Melbourne, Australia), N. Blüthgen (Technische Universität Darmstadt, Darmstadt, Germany), C. Ferreira, A. Laan and M. Iglesias-Julios (Champalimaud Foundation, Lisbon, Portugal) for videos of ants, flies and zebrafish fights. This study was supported by Congento LISBOA-01-0145-FEDER-022170, NVIDIA (M.G.B., F.H. and G.G.d.P.), PTDC/NEU-SCC/0948/2014 (G.G.d.P.) and Champalimaud Foundation (G.G.d.P.). F. R.-F. acknowledges an FCT PhD fellowship.

Author information

F.R.-F., M.G.B. and G.G.d.P. devised the project and algorithms and analyzed data. F.R.-F. and M.G.B. wrote the code with help from F.H. M.G.B. managed the code architecture and GUI. F.R.-F. managed testing procedures. R.H. built setups and conducted experiments with help from F.R.-F. G.G.d.P. supervised the project. M.G.B. wrote the supplementary material with help from F.R.-F., R.H., F.H. and G.G.d.P., and G.G.d.P. wrote the main text with help from F.R.-F., M.G.B. and F.H.

Competing interests

The authors declare no competing interests.

Correspondence to Gonzalo G. de Polavieja.

Integrated supplementary information

  1. Supplementary Figure 1 Training dataset of individual images.

    (a) Holding grid used to record 184 juvenile zebrafish (TU strain, 31 dpf) in separate chambers (60-mm-diameter Petri dishes). (b) Sample frame showing the individuals used to create the dataset and the individuals used as social context (n = 46 videos corresponding to n = 184 different individuals; ~18,000 frames per individual). (c) Summary of the individual-images dataset. The dataset comprises a total of ~3,312,000 uncompressed, grayscale, labeled images (52 × 52 pixels).

  2. Supplementary Figure 2 Single-image identification accuracy for different group sizes and different variations of the identification network.

    Each network is trained from scratch using 3,000 temporally uncorrelated images per animal (90% for training and 10% for validation) and then tested with 300 new temporally uncorrelated images to compute the single-image identification accuracy (Supplementary Notes). We train and test each network five times. For every repetition, the individuals of the group and the images of each individual are selected randomly. Images are extracted from videos of 184 different animals recorded in isolation (Supplementary Fig. 1). Colored lines with markers represent single-image accuracies (mean ± s.d., n = 5) for network architectures with different numbers of convolutional layers (a; see Supplementary Table 2 for the architectures) and different sizes and numbers of fully connected layers (b; see Supplementary Table 3 for the architectures). The black solid line with diamond markers shows the accuracy for the network used to identify images in idtracker.ai (see Supplementary Table 1, identification convolutional neural network).

  3. Supplementary Figure 3 Experimental setup for recording zebrafish videos.

    (a) Front view of the experimental setup used to record zebrafish in groups and in isolation. (b) Side view of the same setup with the light diffuser rolled up. (c) Close-up view of the custom-made circular tank used to record the groups of 10, 60 and 100 juvenile zebrafish. (d) Sample frame from a video of 60 animals (n = 3 videos of 10 zebrafish, n = 3 videos of 60 zebrafish, and n = 3 videos of 100 zebrafish).

  4. Supplementary Figure 4 Experimental setup used to record fruit fly videos.

    (a) Exterior view of the setup used to record flies in groups. (b) Top view of the same setup with the diffuser rolled up. (c) Close-up view of one of the two arenas used (arena 1). (d) Sample frame from a video of 100 flies (n = 1 group of 38 flies, n = 2 groups of 60 flies, n = 1 group of 72 flies, n = 2 groups of 80 flies, and n = 3 groups of 100 flies; all animals were different for each group).

  5. Supplementary Figure 5 Automatic estimation of identification accuracy.

    Comparison between the accuracy estimated automatically by idtracker.ai and the accuracy computed by human validation of the videos (Supplementary Notes). The estimated accuracy is computed over the validated portion of the video. Blue dots represent the videos referenced in Supplementary Tables 5–7.

  6. Supplementary Figure 6 Accuracy as a function of the minimum number of images in the first global fragment used for training.

    To study the effect of the minimum number of images per individual in the first global fragment used to train the identification network, we created synthetic videos using images of 184 individuals recorded in isolation (Supplementary Fig. 1). Each synthetic video consists of 10,000 frames, where the number of images in every individual fragment was drawn from a gamma distribution, and the crossing fragments lasted for three frames (Supplementary Notes). The parameters were set as follows: θ = [2,000, 1,000, 500, 250, 100], k = [0.5, 0.35, 0.25, 0.15, 0.05], number of individuals = [10, 60, 100]. For every combination of these parameters we ran three repetitions. In total, we computed both the cascade of training and identification protocols and the residual identification for 225 synthetic videos. (a) Identification accuracy for simulated (empty markers) and real videos (colored markers) as a function of the minimum number of images in the first global fragment. The number next to each colored marker indicates the number of animals in the video. The accuracy of the real videos was obtained by manual validation (Supplementary Tables 5–7). In some videos, animals are almost immobile for long periods of time because of low-humidity conditions. Potentially, the individual fragments acquired during these periods encode less information that is useful for identifying the animals. To account for this, we corrected the number of images in the individual fragments by considering only frames in which the animals were moving at a speed of at least 0.75 BL/s. We observed that idtracker.ai was more likely to achieve higher accuracy when the minimum number of images in the first global fragment used for training was >30. (b) Distributions of the number of images per individual fragment for real videos of zebrafish, and their fits to a gamma distribution. (c) Distributions of speeds in zebrafish and fruit fly videos.

  7. Supplementary Figure 7 Performance as a function of resolution.

    Human-validated accuracy of tracking results obtained at six different resolutions. Pixels per animal are given at the identification stage; fewer pixels per animal are available at the segmentation stage: approximately 25 and 300 pixels per animal, compared with 100 and 600 at the identification stage, respectively.

  8. Supplementary Figure 8 Performance after application of Gaussian blurring.

    Human-validated accuracy of tracking results obtained at seven different values of the s.d. of a Gaussian filter applied to the video.

  9. Supplementary Figure 9 Performance with inhomogeneous light conditions.

    Background images corresponding to two different experiments with 60 zebrafish (n = 1 experiment per condition). The left image shows our standard setup; the right image shows the same setup after switching off the IR LEDs on two walls and covering the light diffuser on the same side with a black cloth. Human-validated accuracy of tracking results is given below the images. The background image is computed as the average of equally spaced frames sampled every 100 frames along the video.

  10. Supplementary Figure 10 Attack score over time for seven pairs of fish staged to fight.

    Each colored line represents the attack score of an individual (see the Methods for the definition of ‘attack score’).

  11. Supplementary Figure 11 Correlation between the average distance to the center of the tank and the average speed for two milling groups of 100 juvenile zebrafish.

    (a) Probability density of the location in the tank of three representative individuals depicted in (b) as gray markers. (b) Average speed along the video as a function of the average distance to the center of the tank for all the fish in the group. Each black dot represents an individual; the gray markers are the individuals depicted in (a). The blue dashed line is the line of best fit to the data (R² = 0.5686, Pearson's r, and P = 10^−19, two-sided P value using a Wald test with t-distribution of the test statistic). (c) Same as in (a) for a different video. (d) Same as in (b) for a different video (R² = 0.6934, Pearson's r, and P = 7 × 10^−27, two-sided P value using a Wald test with t-distribution of the test statistic).
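The statistic reported in the legend above (Supplementary Fig. 11) can be sketched as follows. SciPy's `linregress` returns exactly the quantities described: the correlation coefficient (whose square gives R²) and a two-sided P value from a Wald test with t-distribution of the test statistic. The data below are synthetic stand-ins for the per-fish averages, not the paper's measurements; the slope and noise level are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in data: one point per fish, mimicking a positive
# relation between distance to the tank centre and average speed.
rng = np.random.default_rng(1)
n_fish = 100
distance_to_centre = rng.uniform(0.0, 1.0, n_fish)   # arbitrary units
average_speed = 2.0 * distance_to_centre + rng.normal(0.0, 0.3, n_fish)

# Linear fit; linregress reports r and a two-sided P value computed
# with a Wald test with t-distribution of the test statistic.
fit = stats.linregress(distance_to_centre, average_speed)
r_squared = fit.rvalue ** 2
print(f"R^2 = {r_squared:.3f}, p = {fit.pvalue:.3g}")
```

With per-individual averages as the data points, each fish contributes one observation, so n in the test equals the group size (100 here, matching the group sizes in the figure).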

Supplementary information

  1. Supplementary Text and Figures

    Supplementary Figs. 1–11, Supplementary Tables 1–12 and Supplementary Note 1

  2. Reporting Summary

  3. Supplementary Software

    Supplementary_software.zip contains two folders: (1) idtrackerai-1.0.3-alpha, which is the code for the idtracker.ai software at the time of publication (see https://gitlab.com/polavieja_lab/idtrackerai.git for the latest version), and (2) idtracker.ai_Figures_and_Tables_code, which includes the code to reproduce the panels in Figs. 1 and 2, as well as the Supplementary Figures and Supplementary Tables.


About this article


DOI

https://doi.org/10.1038/s41592-018-0295-5

Figures

Fig. 1: Tracking by identification in idtracker.ai.
Fig. 2: Using idtracker.ai to study small and large animal groups.