
Novel phenotype–disease matching tool for rare genetic diseases

Genetics in Medicine (2018)




Purpose

To improve the accuracy of matching patients to rare genetic diseases based on their phenotypes.


Methods

We introduce two new methods for prioritizing candidate genetic diseases. They are based on the integrated semantic similarity (method 1) and the ontological overlap (method 2) between the phenotypes expressed by a patient and the phenotypes annotated to known diseases.
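To make "semantic similarity between phenotypes" concrete, the sketch below scores a patient against diseases using the classic Resnik measure (the information content of the most informative common ancestor of two ontology terms), aggregated over term sets with a symmetric best-match average. This is a generic illustration, not the authors' integrated method 1: the toy ontology, the term names, the disease annotations, and the choice of best-match averaging are all hypothetical assumptions made for the example.

```python
import math

# Hypothetical toy phenotype ontology (child term -> parent terms).
# Term names are illustrative only; they are not real HPO identifiers.
PARENTS = {
    "phenotypic_abnormality": [],
    "abnormal_nervous_system": ["phenotypic_abnormality"],
    "seizure": ["abnormal_nervous_system"],
    "focal_seizure": ["seizure"],
    "generalized_seizure": ["seizure"],
    "intellectual_disability": ["abnormal_nervous_system"],
    "abnormal_skeleton": ["phenotypic_abnormality"],
    "scoliosis": ["abnormal_skeleton"],
}

# Hypothetical disease annotations (disease -> annotated phenotype terms).
DISEASES = {
    "disease_A": {"focal_seizure", "intellectual_disability"},
    "disease_B": {"generalized_seizure", "scoliosis"},
    "disease_C": {"scoliosis"},
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    result, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def information_content():
    """IC(t) = -log p(t), with p(t) the fraction of diseases annotated to t
    or to any descendant of t (annotations propagate up the ontology)."""
    counts = dict.fromkeys(PARENTS, 0)
    for terms in DISEASES.values():
        for t in set().union(*(ancestors(x) for x in terms)):
            counts[t] += 1
    n = len(DISEASES)
    return {t: -math.log(c / n) for t, c in counts.items() if c > 0}


IC = information_content()


def resnik(t1, t2):
    """Resnik similarity: IC of the most informative common ancestor.
    Terms never reached by any annotation are given IC 0 here."""
    return max(IC.get(t, 0.0) for t in ancestors(t1) & ancestors(t2))


def best_match_average(patient, disease):
    """Symmetric best-match average of Resnik scores between two term sets."""
    p2d = sum(max(resnik(p, d) for d in disease) for p in patient) / len(patient)
    d2p = sum(max(resnik(d, p) for p in patient) for d in disease) / len(disease)
    return (p2d + d2p) / 2


def rank_diseases(patient):
    """Rank candidate diseases by decreasing similarity to the patient."""
    scores = {name: best_match_average(patient, terms) for name, terms in DISEASES.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    print(rank_diseases({"seizure", "intellectual_disability"}))
```

For the hypothetical patient {seizure, intellectual_disability}, disease_A ranks first: one term matches exactly, and the less specific "seizure" still earns partial credit through its shared ancestry with "focal_seizure".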


Results

We evaluated the performance of our methods on two simulated data sets and one set of patient data derived from electronic health records. Both methods performed significantly better than previous methods at correctly prioritizing candidate diseases in all three sets. Our methods are freely available as a web application ( to aid diagnosis of genetic diseases.
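The abstract does not say which metrics were used to judge "correctly prioritizing candidate diseases". As a hedged illustration only, the sketch below computes two common ranking measures: top-k accuracy (how often the true diagnosis appears among the first k candidates) and mean reciprocal rank. The case data and disease identifiers are hypothetical, and these metrics are assumptions for the example, not the paper's evaluation protocol.

```python
import math
from typing import Dict, List, Sequence


def rank_of_truth(ranked: Sequence[str], truth: str) -> float:
    """1-based rank of the correct diagnosis; infinity if it was not returned."""
    return ranked.index(truth) + 1 if truth in ranked else math.inf


def top_k_accuracy(cases: List[Dict], k: int) -> float:
    """Fraction of cases whose correct diagnosis is ranked within the top k."""
    return sum(rank_of_truth(c["ranked"], c["truth"]) <= k for c in cases) / len(cases)


def mean_reciprocal_rank(cases: List[Dict]) -> float:
    """Average of 1/rank; a diagnosis missing from the list contributes 0."""
    return sum(1 / rank_of_truth(c["ranked"], c["truth"]) for c in cases) / len(cases)


# Hypothetical evaluation cases: each pairs the true disease with a ranked list.
CASES = [
    {"truth": "D1", "ranked": ["D1", "D2", "D3"]},
    {"truth": "D2", "ranked": ["D3", "D2", "D1"]},
    {"truth": "D3", "ranked": ["D1", "D2", "D4"]},
]

if __name__ == "__main__":
    for k in (1, 2):
        print(f"top-{k} accuracy: {top_k_accuracy(CASES, k):.3f}")
    print(f"MRR: {mean_reciprocal_rank(CASES):.3f}")
```

Treating a missing diagnosis as infinite rank keeps it from counting as a hit at any k and makes its reciprocal-rank contribution zero.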


Conclusion

Our methods can capture the diagnostic information embedded in the phenotype ontology, consider all phenotypes exhibited by a patient, and are more robust than existing methods when phenotypes are specified incorrectly or imprecisely. These methods can assist in the diagnosis of rare genetic diseases and help interpret the results of DNA tests.
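One reason an ontology-aware approach can tolerate imprecisely specified phenotypes is that terms can be propagated to their ancestors before overlap is measured. The sketch below illustrates this with a generic ontological-overlap score: both term sets are expanded to include ancestors, and the overlap is assessed with a one-sided hypergeometric test. It is not claimed to be the authors' method 2. The toy ontology and term names are hypothetical, and this naive test ignores the fact that propagated ancestor terms are not independent of one another.

```python
from math import comb

# Hypothetical toy ontology (child term -> parent terms); names are illustrative.
PARENTS = {
    "root": [],
    "nervous": ["root"],
    "seizure": ["nervous"],
    "focal_seizure": ["seizure"],
    "ataxia": ["nervous"],
    "skeletal": ["root"],
    "scoliosis": ["skeletal"],
    "short_stature": ["skeletal"],
    "cardiac": ["root"],
    "arrhythmia": ["cardiac"],
}


def expand(terms):
    """Propagate terms to all of their ancestors, dropping the uninformative root."""
    out, stack = set(), list(terms)
    while stack:
        t = stack.pop()
        if t not in out:
            out.add(t)
            stack.extend(PARENTS[t])
    out.discard("root")
    return out


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N terms, K disease terms, n patient terms)."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)


def overlap_test(patient, disease):
    """Return (overlap size, one-sided p-value) for the propagated term sets."""
    universe = len(PARENTS) - 1  # every term except the root
    p, d = expand(patient), expand(disease)
    shared = len(p & d)
    return shared, hypergeom_sf(shared, universe, len(d), len(p))


if __name__ == "__main__":
    patient = {"seizure"}                  # imprecise: the disease lists focal_seizure
    disease = {"focal_seizure", "ataxia"}
    print("exact-term overlap:", len(patient & disease))
    print("propagated overlap and p-value:", overlap_test(patient, disease))
```

Exact term matching finds no overlap between "seizure" and "focal_seizure", whereas the propagated sets share two terms ("seizure" and "nervous"), so the imprecise annotation still contributes evidence.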



Acknowledgements

The authors would like to acknowledge Alka Chandel, Parth Divekar, and Diana Epperson for helping to query and organize clinical data from the i2b2 database. This study was partially funded by the Center for Pediatric Genomics, Cincinnati Children’s Hospital Medical Center, and by National Institutes of Health (NIH) grant U01 HG008666.

Author information


  1. Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, USA

    • Jing Chen PhD
    • Anil Jegga DVM, MRes
    • Pete S. White PhD
  2. Division of Biostatistics and Bioinformatics, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA

    • Huan Xu MS
  3. Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA

    • Kejian Zhang MD, MBA
    • Pete S. White PhD
    • Ge Zhang MD, PhD
  4. Division of Human Genetics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, USA

    • Kejian Zhang MD, MBA
    • Ge Zhang MD, PhD




The authors declare no conflicts of interest.

Corresponding authors

Correspondence to Jing Chen PhD or Ge Zhang MD, PhD.
