Abstract

Pattern recognition and classification of images are key challenges throughout the life sciences. We combined two approaches for large-scale classification of fluorescence microscopy images. First, using the publicly available data set from the Cell Atlas of the Human Protein Atlas (HPA), we integrated an image-classification task into a mainstream video game (EVE Online) as a mini-game named Project Discovery. Participation by 322,006 gamers over 1 year provided nearly 33 million classifications of subcellular localization patterns, including patterns that were not previously annotated by the HPA. Second, we used deep learning to build an automated Localization Cellular Annotation Tool (Loc-CAT). This tool classifies proteins into 29 subcellular localization patterns, deals efficiently with multi-localization proteins and performs robustly across different cell types. Combining the annotations of gamers and deep learning, we applied transfer learning to create a boosted learner that can characterize subcellular protein distribution with an F1 score of 0.72. We found that engaging players of commercial computer games provided data that augmented deep learning and enabled scalable and readily improved image classification.
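As a rough illustration of how an F1 score can be computed for multi-label localization predictions, the sketch below thresholds per-class scores and compares the resulting label sets with reference annotations. The abstract does not state which averaging scheme underlies the reported value, so micro-averaging is an assumption here, and the class names, scores and cutoffs are hypothetical values chosen only for the example.

```python
from typing import Dict, List, Set


def apply_cutoffs(scores: Dict[str, float], cutoffs: Dict[str, float]) -> Set[str]:
    """Keep every label whose score reaches its own per-class cutoff."""
    return {label for label, score in scores.items() if score >= cutoffs[label]}


def micro_f1(predicted: List[Set[str]], truth: List[Set[str]]) -> float:
    """Micro-averaged F1: pool true/false positives and false negatives over all images."""
    tp = sum(len(p & t) for p, t in zip(predicted, truth))
    fp = sum(len(p - t) for p, t in zip(predicted, truth))
    fn = sum(len(t - p) for p, t in zip(predicted, truth))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical per-class cutoffs and classifier scores (not taken from the paper).
cutoffs = {"nucleus": 0.5, "cytosol": 0.4, "mitochondria": 0.6}
scores = [
    {"nucleus": 0.9, "cytosol": 0.45, "mitochondria": 0.1},
    {"nucleus": 0.2, "cytosol": 0.3, "mitochondria": 0.7},
    {"nucleus": 0.55, "cytosol": 0.1, "mitochondria": 0.2},
]
# Hypothetical reference annotations; an image may carry several labels.
truth = [{"nucleus", "cytosol"}, {"mitochondria", "cytosol"}, {"nucleus"}]

predicted = [apply_cutoffs(s, cutoffs) for s in scores]
print(round(micro_f1(predicted, truth), 3))
```

Because each image is represented as a set of labels, a protein localized to several compartments is scored on every label it should carry, which is the situation multi-localization proteins create.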

Acknowledgements

We acknowledge the staff of the Human Protein Atlas program for valuable contributions. We acknowledge the EVE Development team, the University of Reykjavik and the University of Iceland for assistance with the game implementation. We acknowledge MMOS Sàrl for serving images and managing response collection, and CCP hf and MMOS Sàrl for financially supporting the image storage and serving throughout Project Discovery. Funding to E.L. was provided by the Knut and Alice Wallenberg Foundation.

Author information

Author notes

    • Devin P Sullivan
    • Casper F Winsnes

    These authors contributed equally to this work.

Affiliations

  1. Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH - Royal Institute of Technology, Stockholm, Sweden.

    • Devin P Sullivan
    • Casper F Winsnes
    • Lovisa Åkesson
    • Martin Hjelmare
    • Mikaela Wiking
    • Rutger Schutten
    • Emma Lundberg
  2. CCP hf, Reykjavik, Iceland.

    • Linzi Campbell
    • Hjalti Leifsson
    • Scott Rhodes
    • Andie Nordgren
    • Bergur Finnbogason
  3. Science for Life Laboratory, School of Computer Science and Communication, KTH - Royal Institute of Technology, Stockholm, Sweden.

    • Kevin Smith
  4. MMOS Sàrl, Monthey, Switzerland.

    • Bernard Revaz
    • Attila Szantner
  5. Department of Genetics, Stanford University, Stanford, California, USA.

    • Emma Lundberg
  6. Chan Zuckerberg Biohub, San Francisco, California, USA.

    • Emma Lundberg

Contributions

A.S., B.R., B.F., A.N. and E.L. conceived the study. M.H., A.S., B.F., E.L., D.P.S. and C.F.W. developed the methodology for the study. A.S. and B.R. developed the citizen science engine. L.C., H.L., S.R. and B.F. developed the game narrative and implementation. Project Discovery was played by thousands of players of EVE Online. D.P.S., L.Å., M.W., R.S. and E.L. provided game support. C.F.W., K.S. and D.P.S. developed the machine learning. D.P.S., C.F.W. and E.L. carried out data analysis and investigation. D.P.S., C.F.W. and E.L. wrote the manuscript. D.P.S. and C.F.W. created the figures. E.L. supervised and administered the project and acquired funding.

Competing interests

A.S. and B.R. are founders of MMOS Sàrl.

Corresponding author

Correspondence to Emma Lundberg.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1–5

  2. Life Sciences Reporting Summary

Excel files

  1. Supplementary Table 1

    Comparison of protein subcellular localization methods from fluorescent microscopy images

  2. Supplementary Table 2

    Project Discovery optimized per-class cutoffs

  3. Supplementary Table 3

    Rods & Rings localized proteins found by Project Discovery

  4. Supplementary Table 4

    Loc-CAT optimized per-class cutoffs

  5. Supplementary Data Set 2

    SLF feature names used in Loc-CAT DNN

Zip files

  1. Supplementary Data Set 1

    HPA version 14 “gold standard” annotations

Text files

  1. Supplementary Data Set 3

    Expert reannotation results

About this article

DOI

https://doi.org/10.1038/nbt.4225