Synopsis

Subject Categories: Computational methods | Molecular Biology of Disease

Molecular Systems Biology 4 Article number: 189  doi:10.1038/msb.2008.27
Published online: 6 May 2008
Citation: Molecular Systems Biology 4:189

Network-based global inference of human disease genes

Xuebing Wu1, Rui Jiang1, Michael Q Zhang1,2 & Shao Li1

  1. MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST/Department of Automation, Tsinghua University, Beijing, China
  2. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA

Correspondence to: Shao Li, MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST/Department of Automation, Tsinghua University, Beijing 100084, China. Tel.: +86 010 62797035; Fax: +86 010 62786911; Email: shaoli@mail.tsinghua.edu.cn

Received 18 September 2007; Accepted 17 March 2008; Published online 6 May 2008

Article highlights

  • The global concordance between the human protein network and the phenotype network is modeled, and a method is proposed that robustly predicts causative genes with high accuracy for many phenotypes and outperforms most other disease gene-finding methods currently available.
  • The proposed method is applicable to genetically uncharacterized phenotypes and is effective in the genome-wide scan of disease genes for phenotypes lacking known associated loci.
  • The nonlinear extension of the linear regression model can be used to explore gene-cooperative behavior in complex diseases, and three putative pairs of cooperative genes in breast cancer are identified.
  • The predicted genetic landscape of human diseases reveals the global modular organization of phenotype-genotype relationships, and the genome-wide prioritization of candidate genes for over 5000 human phenotypes is released to facilitate future discovery of disease genes.

Synopsis

The identification of genes responsible for human diseases is of great importance both for understanding human disease pathogenesis and for improving clinical practice. Traditional gene-mapping approaches, such as linkage analysis and association studies (Botstein and Risch, 2003), although successful in identifying causative genes for many Mendelian diseases, have less power to identify genes for complex and common diseases such as autism, inflammatory bowel disease, diabetes, coronary heart disease, various cancers, and many others. In addition, these methods generally yield a large genomic region containing tens or even hundreds of candidate genes that need to be analyzed further, a task that is often expensive and laborious. Therefore, one of the greatest challenges in human genetics is to computationally prioritize candidate genes from the results of gene-mapping studies (or even across the whole genome), to assist biologists in identifying causative genes.

Evidence from many sources suggests that similar phenotypes are caused by functionally related genes, a property referred to as the modular nature of human genetic diseases (Oti and Brunner, 2007). Many studies show that the causative genes for the same or similar diseases generally reside in the same biological module, whether a protein complex (Lage et al, 2007), a pathway (Wood et al, 2007), or a subnetwork of protein interactions (Lim et al, 2006). With this understanding, we propose a regression model that integrates the human protein–protein interaction network and disease phenotype similarities with known gene–phenotype associations to infer novel gene–phenotype associations. Our method, named CIPHER (Correlating protein Interaction network and PHEnotype network to pRedict disease genes), can robustly uncover causative genes with high accuracy for many phenotypes, outperforming most other disease gene-finding methods currently available.
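The idea of correlating the two networks can be illustrated with a small sketch. This synopsis does not spell out the model, so the details below are assumptions made for illustration only: the closeness of a candidate gene to a phenotype is taken to be a sum of a Gaussian kernel exp(-d^2) over shortest-path hop counts d to that phenotype's known genes, and the candidate's score is the Pearson correlation between the query phenotype's similarity profile and the candidate's closeness profile. The function names, the toy network, and the similarity values are all hypothetical.

```python
import math
from collections import deque


def hop_distances(adj, source):
    """Breadth-first search: hop counts from `source` to every reachable protein."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def concordance_scores(adj, similarity_to_query, known_genes, candidates):
    """Score each candidate by how well its network closeness to the known
    genes of other phenotypes tracks the query's phenotype similarities."""
    phenotypes = sorted(known_genes)
    sim = [similarity_to_query[p] for p in phenotypes]
    scores = {}
    for gene in candidates:
        dist = hop_distances(adj, gene)
        # Assumed kernel: exp(-d^2), summed over a phenotype's known genes;
        # unreachable genes contribute nothing.
        closeness = [
            sum(math.exp(-dist[g] ** 2) for g in known_genes[p] if g in dist)
            for p in phenotypes
        ]
        scores[gene] = pearson(sim, closeness)
    return scores


# Hypothetical toy interaction network: a chain X - A - B - C - E - Y.
edges = [("X", "A"), ("A", "B"), ("B", "C"), ("C", "E"), ("E", "Y")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

# Known causative genes of three phenotypes, and the (made-up) similarity
# of each phenotype to a query phenotype that has no known genes itself.
known_genes = {"P1": {"A"}, "P2": {"B"}, "P3": {"E"}}
similarity_to_query = {"P1": 0.9, "P2": 0.6, "P3": 0.1}

scores = concordance_scores(adj, similarity_to_query, known_genes, ["X", "Y"])
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy case candidate X sits next to the genes of the phenotypes most similar to the query, so it ranks above Y, which sits next to the gene of a dissimilar phenotype. Note that the query phenotype needs no known genes of its own; only its similarities to other phenotypes are used.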

CIPHER is applicable to phenotypes without any known causative genes because it takes advantage of information from phenotypically similar diseases. Further, CIPHER is effective in prioritizing candidate genes from the whole genome rather than from genetic loci alone, which enables us to perform genome-wide scans of causative genes for most of the recorded human phenotypes. In contrast, many existing methods rely on a list of known causative genes for the query phenotype or are limited to phenotypes with associated loci, which currently comprise less than half of the recorded human phenotypes (McKusick, 2007). The regression model of CIPHER can be further extended to the nonlinear case to explore gene-cooperative behavior in complex diseases. As an application example, three putative pairs of cooperative genes in breast cancer are identified.

A draft genetic landscape of human diseases is predicted by CIPHER. The inferred genome-wide molecular basis for 1126 phenotypes with at least one known causative gene in our data reveals the modular organization of human genotype–phenotype relationships (Figure 3). The modular disease landscape shows that a set of functionally related genes is implicated in a set of genetically overlapping phenotypes, suggesting interesting and meaningful connections between gene functions and specific disease categories. We further depict a much more comprehensive landscape for human diseases, including more than 14 000 human proteins and more than 5000 phenotypes. All these data are publicly available online. We hope the predicted genetic landscape will facilitate the discovery of disease genes in the future.

Acknowledgements

We thank Dr HG Brunner and his laboratory for generously providing us with the phenotype network data, and Dr Xuegong Zhang, Dr Yuanlie Lin and members of our laboratory for useful discussions. This study is supported by MOST of China (nos 2006AA02Z311 and 2006BAI08B05-05), NSFC (nos 90709013 and 60721003), and the 985 fund of Tsinghua University. MQZ is also partly supported by the Chang Jiang Scholarship programme and by NIH HG06916.

References

  1. Botstein D, Risch N (2003) Discovering genotypes underlying human phenotypes: past successes for Mendelian disease, future approaches for complex disease. Nat Genet 33: 228–237
  2. Lage K, Karlberg EO, Storling ZM, Olason PI, Pedersen AG, Rigina O, Hinsby AM, Tumer Z, Pociot F, Tommerup N, Moreau Y, Brunak S (2007) A human phenome–interactome network of protein complexes implicated in genetic disorders. Nat Biotechnol 25: 309–316
  3. Lim J, Hao T, Shaw C, Patel AJ, Szabo G, Rual JF, Fisk CJ, Li N, Smolyar A, Hill DE, Barabasi AL, Vidal M, Zoghbi HY (2006) A protein–protein interaction network for human inherited ataxias and disorders of Purkinje cell degeneration. Cell 125: 801–814
  4. McKusick VA (2007) Mendelian inheritance in man and its online version, OMIM. Am J Hum Genet 80: 588–604
  5. Oti M, Brunner HG (2007) The modular nature of genetic diseases. Clin Genet 71: 1–11
  6. Wood LD, Parsons DW, Jones S, Lin J, Sjoblom T, Leary RJ, Shen D, Boca SM, Barber T, Ptak J, Silliman N, Szabo S, Dezso Z, Ustyanksky V, Nikolskaya T, Nikolsky Y, Karchin R, Wilson PA, Kaminker JS, Zhang Z et al (2007) The genomic landscapes of human breast and colorectal cancers. Science 318: 1108–1113