Diagnosing monogenic diseases facilitates optimal care, but can involve the manual evaluation of hundreds of genetic variants per case. Computational tools like Phrank expedite this process by ranking all candidate genes by their ability to explain the patient’s phenotypes. To use these tools, busy clinicians must manually encode patient phenotypes from lengthy clinical notes. With 100 million human genomes estimated to be sequenced by 2025, a fast alternative to manual phenotype extraction from clinical notes will become necessary.


We introduce ClinPhen, a fast, high-accuracy tool that automatically converts clinical notes into a prioritized list of patient phenotypes using Human Phenotype Ontology (HPO) terms.


ClinPhen shows superior accuracy and 20× speedup over existing phenotype extractors, and its novel phenotype prioritization scheme improves the performance of gene-ranking tools.


While a dedicated clinician can process 200 patient records in a 40-hour workweek, ClinPhen does the same in 10 minutes. Compared with manual phenotype extraction, ClinPhen saves an additional 3–5 hours per Mendelian disease diagnosis. Providers can now add ClinPhen’s output to each summary note attached to a filled testing laboratory request form. ClinPhen makes a substantial contribution to improvements in efficiency critically needed to meet the surging demand for clinical diagnostic sequencing.
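The throughput figures quoted above can be put on a per-record basis with a short calculation. The sketch below uses only the numbers stated in the abstract (200 records, a 40-hour workweek, 10 minutes); the variable names are illustrative. The resulting ratio compares ClinPhen with manual review, so it is a different quantity from the 20× speedup reported against other automated extractors.

```python
# Figures quoted in the abstract; variable names are illustrative only.
RECORDS = 200
CLINICIAN_HOURS = 40        # one 40-hour workweek
CLINPHEN_MINUTES = 10       # time for the same 200 records

# Manual review: minutes spent per record.
clinician_min_per_record = CLINICIAN_HOURS * 60 / RECORDS

# ClinPhen: seconds spent per record.
clinphen_sec_per_record = CLINPHEN_MINUTES * 60 / RECORDS

# Ratio of total manual time to total ClinPhen time for the same workload.
manual_to_clinphen_ratio = (CLINICIAN_HOURS * 60) / CLINPHEN_MINUTES

print(f"Manual:   {clinician_min_per_record:.0f} min per record")
print(f"ClinPhen: {clinphen_sec_per_record:.0f} s per record")
print(f"Ratio:    {manual_to_clinphen_ratio:.0f}x")
```

Under these figures, manual extraction takes about 12 minutes per record, against roughly 3 seconds per record for ClinPhen.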





We thank Julia Buckingham and Morgan Danowski for assistance with obtaining patient data; Paul McDonagh and Margaret Bray for introductions and data sharing; Charlie Curnin, Marta Maria Majcherska, and Colleen McCormack for facilitating access to patient data; and Bejerano Lab members and Elijah Kravets for project feedback. Clinicians’ research was supported by the National Institutes of Health (NIH) Common Fund, Office of Strategic Coordination/Office of the NIH Director Awards U01HG007690, U01HG007708, U01HG007530, and U01HG007942. Manton Center sequence analysis and diagnosis were supported by NIH 1U54HD090255 IDDRC Molecular Genetics Core grant. The Duke UDN site is funded by NIH grant U01HG007672 (principal investigators: V. Shashi and D.B. Goldstein). UCLA’s J.A.M.-A. and R.S. were supported by UDN grant HG007703-05. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. All computational tool building was supported by a Stanford Bio-X Undergraduate Summer Research Program (C.A.D.), a Bio-X Stanford Interdisciplinary Graduate Fellowship (J.B.), and by the Defense Advanced Research Projects Agency (DARPA) and the Stanford Pediatrics Department (G.B.).

Author information

Author notes

  1. These authors contributed equally: Johannes Birgmeier, Cole A. Deisseroth


  1. Department of Computer Science, Stanford University, Stanford, CA, USA

    • Cole A. Deisseroth, Johannes Birgmeier MS & Gill Bejerano PhD
  2. Department of Pediatrics, Stanford School of Medicine, Stanford, CA, USA

    • Ethan E. Bodle MD, Dena R. Matalon MD, Jonathan A. Bernstein MD, PhD & Gill Bejerano PhD
  3. Stanford Center for Undiagnosed Diseases, Stanford, CA, USA

    • Jennefer N. Kohler MS, LCGC & Matthew T. Wheeler MD, PhD
  4. Department of Biomedical Data Science, Stanford University, Stanford, CA, USA

    • Yelena Nazarenko BA & Gill Bejerano PhD
  5. The Manton Center for Orphan Disease Research, Division of Genetics and Genomics, Boston Children’s Hospital, Harvard Medical School, Boston, MA, USA

    • Casie A. Genetti MS, CGC, Catherine A. Brownstein MPH, PhD, Klaus Schmitz-Abe PhD & Alan H. Beggs PhD
  6. Department of Pediatrics, Duke University School of Medicine, Durham, NC, USA

    • Kelly Schoch MS, CGC, Heidi Cope MS, CGC & Vandana Shashi MBBS, MD
  7. Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA

    • Rebecca Signer MS, CGC & Julian A. Martinez-Agosto MD, PhD
  8. Department of Pediatrics, Division of Medical Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA

    • Julian A. Martinez-Agosto MD, PhD
  9. Department of Psychiatry, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA

    • Julian A. Martinez-Agosto MD, PhD
  10. Department of Medicine, Stanford School of Medicine, Stanford, CA, USA

    • Matthew T. Wheeler MD, PhD
  11. Department of Developmental Biology, Stanford University, Stanford, CA, USA

    • Gill Bejerano PhD




  1. Undiagnosed Diseases Network


    The authors declare no conflicts of interest.

    Corresponding authors

    Correspondence to Jonathan A. Bernstein MD, PhD or Gill Bejerano PhD.
