Phen-Gen: combining phenotype and genotype to analyze rare disorders

Javed, Asif; Agrawal, Saloni; Ng, Pauline C

doi:10.1038/nmeth.3046

Brief Communication
Published: 03 August 2014

Nature Methods volume 11, pages 935–937 (2014)

Abstract

We introduce Phen-Gen, a method that combines patients' disease symptoms and sequencing data with prior domain knowledge to identify the causative genes for rare disorders. Simulations revealed that the causal variant was ranked first in 88% of cases when it was a coding variant—a 52% advantage over a genotype-only approach—and Phen-Gen outperformed other existing prediction methods by 13–58%. If disease etiology was unknown, the causal variant was assigned the top rank in 71% of simulations. Phen-Gen is available at http://phen-gen.org/.

**Figure 1: Comparison with VAAST, eXtasy and VAAST+PHEVOR.**

Acknowledgements

This work was supported by the Agency for Science, Technology and Research (A*STAR), Singapore. We thank Radboud University Nijmegen Medical Centre for sharing the data sets from 100 patients with intellectual disability, particularly J. de Ligt for his help with these data. We also thank S. Köhler for his help with Phenomizer, N. Jinawath for her help interpreting patient symptoms, and S. Prabhakar and N. Clarke for their comments on the genomic predictor. We thank S. Prabhakar, S. Davila, A. Wilm and R. del Rosario for their comments on the manuscript.

Author information

Authors and Affiliations

Computational and Systems Biology Group, Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore
Asif Javed, Saloni Agrawal & Pauline C Ng

Contributions

A.J. conceived of and designed the project, designed and implemented the analysis framework, implemented methods, conducted experiments, interpreted results, wrote the initial manuscript and revised and proofread the paper. S.A. implemented methods, conducted experiments, set up the web server and revised and proofread the paper. P.C.N. conceived of and designed the project, revised and proofread the paper and supervised the project.

Corresponding authors

Correspondence to Asif Javed or Pauline C Ng.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Overall workflow.

Patient disease symptoms are matched against known disorders, and the probability of a symptomatic match is assigned to the genes implicated in each disorder. These probabilities are propagated to interacting genes by a random walk with restart on the interaction network. In parallel, the patient’s sequencing data are analyzed: the damaging impact of each variant is estimated and pooled within genes. The two predictions are then combined to implicate the gene(s) involved.
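
The propagation step named above can be illustrated with a generic random walk with restart. The sketch below is a textbook formulation in plain Python, not Phen-Gen's code: the restart probability of 0.3, the rule that each gene passes its score in equal shares to its neighbours, the toy four-gene network and the function name `random_walk_with_restart` are all illustrative assumptions.

```python
def random_walk_with_restart(neighbors, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Propagate prior gene scores over an undirected network by random walk with restart.

    neighbors: dict mapping every gene to a list of its interaction partners.
    seeds: dict mapping genes (all present in `neighbors`) to non-negative prior scores.
    Iterates p <- (1 - restart) * W p + restart * p0, where W sends each gene's
    score in equal shares to its neighbours, until the L1 change drops below tol.
    """
    genes = list(neighbors)
    total = sum(seeds.values())
    p0 = {g: seeds.get(g, 0.0) / total for g in genes}  # normalized restart vector
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {g: restart * p0[g] for g in genes}
        for g in genes:
            if neighbors[g]:  # an isolated gene does not pass its score on
                share = (1 - restart) * p[g] / len(neighbors[g])
                for partner in neighbors[g]:
                    nxt[partner] += share
        converged = sum(abs(nxt[g] - p[g]) for g in genes) < tol
        p = nxt
        if converged:
            break
    return p


# Toy path network A-B-C-D; the symptom-match prior sits entirely on gene A.
network = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = random_walk_with_restart(network, {"A": 1.0})
for gene in sorted(scores, key=scores.get, reverse=True):
    print(gene, round(scores[gene], 3))
```

Scores decay with network distance from the seeded gene, so interaction partners of a symptom-matched gene receive some prior weight even when they are not linked to any known disorder themselves. A larger restart value keeps the scores closer to the seeds; a smaller one spreads them further.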

Supplementary Figure 2 Distribution of SIFT and PolyPhen-2 scores for damaging and benign nonsynonymous mutations.

The distribution of SIFT and PolyPhen-2 scores for HGMD-reported damaging nonsynonymous mutations and neutral nonsynonymous fixed substitutions inferred from human-chimp alignment are shown. The plots indicate general agreement between the two methods.

Supplementary Figure 3 Deleteriousness predictions around splice site.

The figure depicts the probability of deleteriousness around donor and acceptor sites for splice site mutations.

Supplementary Figure 4 Probability of deleteriousness using the genomic predictor.

The figure illustrates the predicted deleteriousness of different combinations of five annotations: GERP++ (G), PhyloP (P), near-genic (N), transcription factor binding sites (T), and DNase hypersensitive sites (D). The predictions are binned by the number of annotations (shown on the x-axis), and each bin is further sorted canonically according to the aforementioned order of annotations.
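
One reading of "canonically sorted" is lexicographic order by each annotation's position in the list G, P, N, T, D. The sketch below enumerates the combinations in that order. The interpretation, the helper name `binned_combinations` and its `include_empty` switch (the caption does not say whether the no-annotation case is plotted) are assumptions.

```python
from itertools import combinations

# Annotation codes in the caption's order: GERP++, PhyloP, near-genic,
# transcription factor binding sites, DNase hypersensitive sites.
ANNOTATIONS = ("G", "P", "N", "T", "D")


def binned_combinations(annotations=ANNOTATIONS, include_empty=False):
    """Group every annotation combination by size, each bin in canonical order.

    itertools.combinations yields subsets in lexicographic order of the input
    positions, which is the canonical ordering with respect to ANNOTATIONS.
    """
    start = 0 if include_empty else 1
    return {
        k: ["".join(combo) for combo in combinations(annotations, k)]
        for k in range(start, len(annotations) + 1)
    }


bins = binned_combinations()
for size, combos in bins.items():
    print(size, " ".join(combos))
```

With five annotations this gives 31 non-empty combinations (5, 10, 10, 5 and 1 per bin).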

Supplementary Figure 5 Confidence intervals for positive and benign mutation set combinations.

The 90% confidence intervals for different combinations of genomic annotations are shown, in the same order as in Supplementary Figure 4. The four sub-figures represent the combinations of the two positive sets (HGMD and GWAS) with the two neutral sets (common variation in dbSNP and Complete Genomics MAF > 0.30).

Supplementary Figure 6 Histograms of the null distribution of deleteriousness of genes.

For each gene, a cutoff is taken at the top 1 percentile of its null distribution of damaging-variant scores. Histograms of this cutoff across all genes are shown for dominant and recessive inheritance patterns, for both the coding and genomic predictors. Most genes do not harbor any putative damaging variants, so the distributions are dominated by the leftmost bar, which has been truncated for better visual representation.
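
A per-gene cutoff at the top 1 percentile of a null distribution could be computed and applied roughly as follows. This is a minimal sketch: the nearest-rank percentile definition, the strict `>` comparison, the helper names `null_cutoff` and `exceeds_null`, and the toy scores are assumptions, since the caption does not describe how the null distribution is built or compared.

```python
import math


def null_cutoff(null_scores, percentile=99.0):
    """Nearest-rank percentile of a gene's null score distribution.

    Returns the smallest observed score such that at least `percentile` percent
    of the null scores are less than or equal to it.
    """
    ordered = sorted(null_scores)
    rank = max(1, math.ceil(percentile / 100 * len(ordered)))
    return ordered[rank - 1]


def exceeds_null(score, null_scores, percentile=99.0):
    """True if a patient's gene score lies strictly above the gene's null cutoff."""
    return score > null_cutoff(null_scores, percentile)


# Toy null distribution for one gene (made-up values): most null individuals
# score 0.0 because they carry no putative damaging variant in this gene.
toy_null = [0.0] * 950 + [0.2] * 30 + [0.9] * 20
cutoff = null_cutoff(toy_null)
print("cutoff:", cutoff)
print(exceeds_null(0.95, toy_null), exceeds_null(0.5, toy_null))
```

A gene with no damaging variants in the null set gets a cutoff of 0.0, which is what produces the tall leftmost bar when cutoffs are histogrammed across genes.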

Supplementary Figure 7 Performance of variant predictors.

The distribution of damaging probabilities assigned to different classes of HGMD variants is shown. The top three panels use the coding predictor; the bottom panel applies the genomic predictor to noncoding regulatory variants. The histograms show the distribution of the scored variants, and the pie charts on the right show the breakdown of omitted and predicted variants in each category. Common variants (white) were observed in 1000 Genomes, ESP, or dbSNP with MAF 0.01. "Commonly mutated genes" (light green) marks variants that failed to exceed the null distribution of their gene. "Missed" (dark blue) marks variants that fell outside our regions of interest.
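
The pie-chart categories can be read as a sequence of filters. The sketch below is hypothetical: the field names, the order of the checks, the "predicted" label and the `>=` comparison for the MAF rule (the operator in the caption did not survive extraction) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScoredVariant:
    # Hypothetical fields chosen for illustration only.
    population_maf: Optional[float]  # highest MAF across 1000 Genomes/ESP/dbSNP, None if unseen
    in_region_of_interest: bool
    gene_score: float                # gene-level damaging score in the patient
    gene_null_cutoff: float          # per-gene cutoff from the null distribution


def pie_category(v: ScoredVariant, maf_threshold: float = 0.01) -> str:
    """Assign a variant to one of the pie-chart categories (assumed check order)."""
    if v.population_maf is not None and v.population_maf >= maf_threshold:
        return "common"
    if not v.in_region_of_interest:
        return "missed"
    if v.gene_score <= v.gene_null_cutoff:
        return "commonly mutated gene"
    return "predicted"


examples = [
    ScoredVariant(0.05, True, 0.9, 0.5),
    ScoredVariant(None, False, 0.0, 0.0),
    ScoredVariant(0.001, True, 0.4, 0.5),
    ScoredVariant(None, True, 0.8, 0.5),
]
print([pie_category(v) for v in examples])
```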

Supplementary Figure 8 Prediction of heterozygous variants.

The figure depicts how compound heterozygous variants are evaluated. When both damaging variants reside within the coding region, the coding predictor is used to estimate their damaging impact. When one or both variants lie outside the exon boundaries, both variants are evaluated using the genomic predictor.
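
The predictor-selection rule above fits in a few lines. In this sketch the `Variant` type, the function name `score_compound_heterozygote` and the stand-in predictors are hypothetical; real coding and genomic predictors would be passed in as callables.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Variant:
    variant_id: str
    in_coding_region: bool  # True if the variant lies within exon boundaries


Predictor = Callable[[Variant], float]


def score_compound_heterozygote(
    first: Variant,
    second: Variant,
    coding_predictor: Predictor,
    genomic_predictor: Predictor,
) -> Tuple[str, Tuple[float, float]]:
    """Score both variants of a compound heterozygous pair with one predictor.

    The coding predictor is used only if both variants are coding; if either
    falls outside the exons, both are scored with the genomic predictor.
    """
    if first.in_coding_region and second.in_coding_region:
        name, predictor = "coding", coding_predictor
    else:
        name, predictor = "genomic", genomic_predictor
    return name, (predictor(first), predictor(second))


# Stand-in predictors returning fixed made-up scores, for demonstration only.
def toy_coding(v: Variant) -> float:
    return 0.9


def toy_genomic(v: Variant) -> float:
    return 0.4


exonic_a = Variant("var1", True)
exonic_b = Variant("var2", True)
intronic = Variant("var3", False)
print(score_compound_heterozygote(exonic_a, exonic_b, toy_coding, toy_genomic))
print(score_compound_heterozygote(exonic_a, intronic, toy_coding, toy_genomic))
```

Scoring both variants with the same predictor presumably keeps the two scores comparable, though the caption does not state the reason. How the pair is then merged into a gene-level score is not covered by this caption, so the sketch stops at returning both scores.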

Supplementary Figure 9 Phen-Gen and VAAST comparison for phenotypically heterogeneous disorders.

The comparison of Phen-Gen and VAAST in simulations using 44 phenotypically heterogeneous disorders and nonsynonymous mutations in HGMD is shown. Both panels depict the ability of each method to narrow the search for the true gene to within 1, 5 and 10 genes. For Phen-Gen, each bar is split into the predictive power of the genotypic prediction and the additional advantage gained from disease symptoms. VAAST uses only the genomic data and can assign the same rank to multiple genes at the top of the ordering. For a fair comparison, the true gene was assigned the worst, average and best rank among its equally ranked peers, and the three components of the bar reflect performance in these scenarios.
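
The tie handling described above can be made concrete. In this sketch, which illustrates the idea rather than reproducing the authors' evaluation code, the best rank places the true gene first among its tied peers, the worst places it last, and the average is the midpoint; top-k rates then count simulations whose rank is at most k. The function names, the made-up scores and the exact float equality test for ties are assumptions.

```python
from typing import Dict, Iterable, Tuple


def tied_ranks(scores: Dict[str, float], true_gene: str) -> Tuple[int, float, int]:
    """Best, average and worst rank of true_gene among genes with an equal score.

    Higher scores rank better; rank 1 is the top of the list.
    """
    target = scores[true_gene]
    better = sum(1 for s in scores.values() if s > target)
    tied = sum(1 for s in scores.values() if s == target)  # includes true_gene
    best = better + 1
    worst = better + tied
    return best, (best + worst) / 2, worst


def top_k_rates(ranks: Iterable[float], ks=(1, 5, 10)) -> Dict[int, float]:
    """Fraction of simulations whose true-gene rank is within the top k."""
    ranks = list(ranks)
    return {k: sum(r <= k for r in ranks) / len(ranks) for k in ks}


# Made-up example: four genes tie for the top score.
toy_scores = {"G1": 5.0, "G2": 5.0, "G3": 5.0, "TRUE": 5.0, "G5": 2.0}
print(tied_ranks(toy_scores, "TRUE"))
print(top_k_rates([1, 3, 7, 12]))
```

Under the average-rank scenario, a true gene tied with three others at the top gets rank 2.5, so it counts toward the top-5 and top-10 bins but not the top-1 bin.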

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–9, Supplementary Tables 1–7 and Supplementary Note (PDF 5144 kb)

About this article

Cite this article

Javed, A., Agrawal, S. & Ng, P. Phen-Gen: combining phenotype and genotype to analyze rare disorders. Nat Methods 11, 935–937 (2014). https://doi.org/10.1038/nmeth.3046

Received: 01 April 2014
Accepted: 16 June 2014
Published: 03 August 2014
Issue Date: September 2014
DOI: https://doi.org/10.1038/nmeth.3046
