Phenotate: crowdsourcing phenotype annotations as exercises in undergraduate classes

  • A Correction to this article was published on 18 June 2020



Computational documentation of genetic disorders is highly reliant on structured data for differential diagnosis, pathogenic variant identification, and patient matchmaking. However, most information on rare diseases (RDs) exists in freeform text, such as academic literature. To increase availability of structured RD data, we developed a crowdsourcing approach for collecting phenotype information using student assignments.


We developed Phenotate, a web application for crowdsourcing disease phenotype annotations through assignments for undergraduate genetics students. Using student-collected data, we generated composite annotations for each disease through a machine learning approach. These annotations were compared with those from clinical practitioners and with curated gold-standard data.
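To make the idea of a composite annotation concrete, the sketch below merges several students' Human Phenotype Ontology (HPO) term selections for one disease by simple frequency voting. This is a hypothetical stand-in, not the machine learning model Phenotate uses; the function name `composite_annotation`, the `min_fraction` threshold, and the example HPO term IDs are illustrative assumptions.

```python
from collections import Counter


def composite_annotation(student_sets, min_fraction=0.5):
    """Merge several students' HPO term sets for one disease into one set.

    Hypothetical stand-in: keeps every term chosen by at least
    `min_fraction` of students. This is a plain vote, not the
    machine learning approach used by Phenotate.
    """
    if not student_sets:
        return set()
    # Count each term at most once per student.
    counts = Counter(term for terms in student_sets for term in set(terms))
    needed = min_fraction * len(student_sets)
    return {term for term, n in counts.items() if n >= needed}


if __name__ == "__main__":
    # Illustrative HPO term IDs for three students annotating one disease.
    students = [
        {"HP:0001166", "HP:0001519", "HP:0001083"},
        {"HP:0001166", "HP:0001519"},
        {"HP:0001166", "HP:0002650"},
    ]
    print(sorted(composite_annotation(students)))
```

With three students and the default threshold, a term needs at least 1.5 votes, so only `HP:0001166` and `HP:0001519` survive. A learned model can instead weight students or terms differently, which a fixed vote cannot.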


Deploying Phenotate in five undergraduate genetics courses, we collected annotations for 22 diseases. Student-sourced annotations showed strong similarity to gold standards, with F-measures ranging from 0.584 to 0.868. Furthermore, clinicians used Phenotate annotations to identify diseases with accuracy comparable to that achieved with other annotation sources and gold standards. For six disorders, no gold standards were available, so Phenotate produced some of the first structured annotations for them. Students also demonstrated the ability to research RDs.
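The F-measure (F1) reported above is the harmonic mean of precision P and recall R, F = 2PR / (P + R). A minimal sketch of scoring one annotation set against a gold standard follows. It assumes exact matching of HPO term IDs; this abstract does not state whether the comparison gave partial credit for related ontology terms, and the function name `f_measure` is hypothetical.

```python
def f_measure(predicted, gold):
    """Return (precision, recall, F1) of a predicted term set vs. a gold standard.

    Assumes exact matching of term IDs; ontology-aware partial credit
    (e.g. treating a parent HPO term as a near match) is not modelled.
    """
    predicted, gold = set(predicted), set(gold)
    true_pos = len(predicted & gold)
    if true_pos == 0:
        # No overlap: precision, recall and F1 are all zero.
        return 0.0, 0.0, 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    student = {"A", "B", "C", "D"}     # placeholder term IDs
    gold = {"A", "B", "C", "E", "F"}
    p, r, f1 = f_measure(student, gold)
    print(round(p, 3), round(r, 3), round(f1, 3))
```

Here 3 of 4 submitted terms are correct (P = 0.75) and 3 of 5 gold terms are found (R = 0.6), giving F1 of about 0.667. Because F1 penalizes both extra and missing terms, it summarizes agreement better than either precision or recall alone.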


Phenotate enables crowdsourcing RD phenotypic annotations through educational assignments. Presented as an intuitive web-based tool, it offers pedagogical benefits and augments the computable RD knowledgebase.


Fig. 1: The Phenotate annotator and comparator user interfaces.
Fig. 2
Fig. 3

Change history

  • 18 June 2020

    The original version of this Article contained an incorrect supplementary file in the Excel file format. This has now been replaced in the HTML version of the Article.




We thank Orion Buske, Marta Gîrdea, and other members of the Centre for Computational Medicine for their guidance during the development stage of this project. Furthermore, we thank the clinical geneticists who have submitted annotations to the project. We also thank Peter Roy, Karim Mekhail, Alistair Dias, Bernard Duncker, and Nagham Abdalahad for integrating Phenotate into their classes. We thank their students, as well as Chloe Ng, for contributing annotations. We also thank Sana Tonekaboni for her advice on ML methods, Andrei Turinsky for his advice on statistics, and Jixuan Wang for his assistance in integrating Phenotate into LMP408. We used web-based calculators on Social Science Statistics to compute P values. This project has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under the ERA-NET Cofund action number 643578, E-Rare3; the Canadian component of the work was supported by the Canadian Institutes of Health Research (CIHR) and Genome Canada. A.X.L. received funding to work on Phenotate from a University of Toronto Faculty of Medicine Comprehensive Research for Medical Students (CREMS) Scholarship.

Author information



Corresponding author

Correspondence to Michael Brudno, PhD.

Ethics declarations


The authors declare no conflicts of interest.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.



About this article


Cite this article

Chang, W.H., Mashouri, P., Lozano, A.X. et al. Phenotate: crowdsourcing phenotype annotations as exercises in undergraduate classes. Genet Med (2020).



  • rare diseases
  • phenotype
  • crowdsourcing
  • medical education
  • machine learning