  • Letter
Association of genes to genetically inherited diseases using data mining

This article has been updated

Abstract

Although approximately one-quarter of the roughly 4,000 genetically inherited diseases currently recorded in the respective databases (LocusLink, ref. 1; OMIM, ref. 2) are already linked to a region of the human genome, about 450 have no known associated gene. Finding disease-related genes requires laborious examination of hundreds of possible candidate genes (sometimes, these are not even annotated; see, for example, refs 3, 4). The public availability of the human genome draft sequence (ref. 5) has fostered new strategies to map molecular functional features of gene products to complex phenotypic descriptions, such as those of genetically inherited diseases. Owing to recent progress in the systematic annotation of genes using controlled vocabularies (ref. 6), we have developed a scoring system for the possible functional relationships of human genes to 455 genetically inherited diseases that have been mapped to chromosomal regions without assignment of a particular gene. In a benchmark of the system with 100 known disease-associated genes, the disease-associated gene was among the 8 best-scoring genes with a 25% chance, and among the best 30 genes with a 50% chance, showing that there is a relationship between the score of a gene and its likelihood of being associated with a particular disease. The scoring also indicates that for some diseases, the chance of identifying the underlying gene is higher.
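The benchmark figures above can be read as top-k hit rates: for each benchmark disease, the candidate genes in the mapped region are ranked by score, and one asks how often the known gene falls within the best k. The following is a minimal sketch of that evaluation only, not the authors' scoring system. The function names, gene labels and scores are all hypothetical and invented for illustration.

```python
from typing import Dict, List


def rank_of_gene(scores: Dict[str, float], gene: str) -> int:
    """Return the 1-based rank of `gene` among candidates sorted by descending score."""
    ordered = sorted(scores, key=lambda g: scores[g], reverse=True)
    return ordered.index(gene) + 1


def top_k_hit_rate(ranks: List[int], k: int) -> float:
    """Fraction of benchmark cases whose known disease gene ranks within the top k."""
    if not ranks:
        return 0.0
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Toy benchmark (assumed data): each entry pairs the candidate-gene scores
# for one mapped region with the gene already known to cause the disease.
benchmark = [
    ({"GENE_A": 0.9, "GENE_B": 0.4, "GENE_C": 0.1}, "GENE_A"),
    ({"GENE_D": 0.2, "GENE_E": 0.8, "GENE_F": 0.5}, "GENE_D"),
]

ranks = [rank_of_gene(scores, known) for scores, known in benchmark]
print(ranks)
print(top_k_hit_rate(ranks, 1), top_k_hit_rate(ranks, 3))
```

Under this reading, the abstract's statement that the known gene was among the 8 best-scoring genes with a 25% chance would correspond to `top_k_hit_rate(ranks, 8)` being about 0.25 over the 100 benchmark genes.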

Figure 1: Components used for deriving associations between phenotypic features and gene functions.
Figure 2: Example of the analysis of 'spinocerebellar ataxia-8, infantile, with sensory neuropathy'.

Change history

  • 05 June 2002

    New versions of the three supplementary information files were placed on the site. These versions contain no new information; the changes were strictly cosmetic.

References

  1. Pruitt, K.D. & Maglott, D.R. RefSeq and LocusLink: NCBI gene-centered resources. Nucleic Acids Res. 29, 137–140 (2001).

  2. Hamosh, A., Scott, A.F., Amberger, J., Valle, D. & McKusick, V.A. Online Mendelian Inheritance in Man (OMIM). Hum. Mutat. 15, 57–61 (2000).

  3. Garcia, C.K. et al. Autosomal recessive hypercholesterolemia caused by mutations in a putative LDL receptor adaptor protein. Science 292, 1394–1398 (2001).

  4. Zhou, B., Westaway, S.K., Levinson, B., Johnson, M.A., Gitschier, J. & Hayflick, S.J. A novel pantothenate kinase gene (PANK2) is defective in Hallervorden-Spatz syndrome. Nature Genet. 28, 345–349 (2001).

  5. Lander, E.S. et al. International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).

  6. The Gene Ontology Consortium. Gene Ontology: tool for the unification of biology. Nature Genet. 25, 25–29 (2000).

  7. Zimmermann, H.J. Fuzzy Set Theory and its Applications 3rd edn (Kluwer Academic, Boston, 1996).

  8. Hogenesch, J.B. et al. A comparison of the Celera and Ensembl predicted gene sets reveals little overlap in novel genes. Cell 106, 413–415 (2001).

  9. Plaitakis, A., Flessas, P., Natsiou, A.B. & Shashidharan, P. Glutamate dehydrogenase deficiency in cerebellar degenerations: clinical, biochemical and molecular genetic aspects. Can. J. Neurol. Sci. 20, S109–S116 (1993).

  10. Cargill, M. et al. Characterization of single-nucleotide polymorphisms in coding regions of human genes. Nature Genet. 22, 231–238 (1999).

  11. Altschul, S.F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).

Acknowledgements

We thank Y.P. Yuan, J. Reina, D. Torrents, M. Suyama and other members of our group for helpful discussions. We are grateful to the US National Library of Medicine for kind licensing of MEDLINE, to NLM annotators for their extensive work in annotating MEDLINE papers with MeSH terms, and to the developers of RefSeq, LocusLink and Gene Ontology.

Author information

Corresponding author

Correspondence to Peer Bork.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

About this article

Cite this article

Perez-Iratxeta, C., Bork, P. & Andrade, M. Association of genes to genetically inherited diseases using data mining. Nat Genet 31, 316–319 (2002). https://doi.org/10.1038/ng895

  • DOI: https://doi.org/10.1038/ng895
