Computational documentation of genetic disorders is highly reliant on structured data for differential diagnosis, pathogenic variant identification, and patient matchmaking. However, most information on rare diseases (RDs) exists in freeform text, such as academic literature. To increase the availability of structured RD data, we developed a crowdsourcing approach for collecting phenotype information using student assignments.
We developed Phenotate, a web application for crowdsourcing disease phenotype annotations through assignments for undergraduate genetics students. Using student-collected data, we generated composite annotations for each disease through a machine learning approach. These annotations were compared with those from clinical practitioners and gold standard curated data.
Deploying Phenotate in five undergraduate genetics courses, we collected annotations for 22 diseases. Student-sourced annotations showed strong similarity to gold standards, with F-measures ranging from 0.584 to 0.868. Furthermore, clinicians used Phenotate annotations to identify diseases with accuracy comparable to that achieved with other annotation sources and gold standards. For six disorders, no gold standards were available, allowing us to create some of the first structured annotations for them, while students demonstrated the ability to research RDs.
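As a rough illustration of how an F-measure summarizes agreement between a student-sourced annotation and a gold standard, the sketch below treats each annotation as a set of phenotype term identifiers and scores exact overlap. This is an assumption made for illustration only: the abstract does not state how terms were matched, and the study may have used a different scheme (for example, one that accounts for ontology structure). The function name `f_measure` and the term identifiers are hypothetical placeholders, not data from the study.

```python
def f_measure(predicted, gold):
    """Return (precision, recall, F1) for two collections of term IDs.

    Assumes exact-match comparison of identifiers; this is a simplification
    and not necessarily the matching scheme used in the study.
    """
    predicted, gold = set(predicted), set(gold)
    if not predicted or not gold:
        return 0.0, 0.0, 0.0
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted)  # share of predicted terms that are correct
    recall = true_positives / len(gold)          # share of gold terms that were recovered
    if true_positives == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Hypothetical placeholder identifiers in an HPO-like format.
student = {"HP:0000001", "HP:0000002", "HP:0000003", "HP:0000004"}
gold = {"HP:0000002", "HP:0000003", "HP:0000004", "HP:0000005", "HP:0000006"}

precision, recall, f1 = f_measure(student, gold)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

In this toy case three of the four student terms appear in the gold standard, and three of the five gold terms are recovered, so the F-measure sits between the two rates and penalizes whichever is lower.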
Phenotate enables crowdsourcing RD phenotypic annotations through educational assignments. Presented as an intuitive web-based tool, it offers pedagogical benefits and augments the computable RD knowledgebase.
We thank Orion Buske, Marta Gîrdea, and other members of the Centre for Computational Medicine for their guidance during the development stage of this project. Furthermore, we thank the clinical geneticists who have submitted annotations to the project. We also thank Peter Roy, Karim Mekhail, Alistair Dias, Bernard Duncker, and Nagham Abdalahad for integrating Phenotate into their classes. We thank their students, as well as Chloe Ng, for contributing annotations. We also thank Sana Tonekaboni for her advice on ML methods, Andrei Turinsky for his advice on statistics, and Jixuan Wang for his assistance in integrating Phenotate into LMP408. We used web-based calculators on Social Science Statistics to compute P values. This project has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under the ERA-NET Cofund action number 643578, E-Rare3; the Canadian component of the work was supported by the Canadian Institutes of Health Research (CIHR); and Genome Canada. A.X.L. received funding to work on Phenotate from a University of Toronto Faculty of Medicine Comprehensive Research for Medical Students (CREMS) Scholarship.
The authors declare no conflicts of interest.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Chang, W.H., Mashouri, P., Lozano, A.X. et al. Phenotate: crowdsourcing phenotype annotations as exercises in undergraduate classes. Genet Med (2020). https://doi.org/10.1038/s41436-020-0812-7
- rare diseases
- medical education
- machine learning