  • An Erratum to this article was published on 01 June 2006

Abstract

The identification of genes involved in health and disease remains a challenge. We describe a bioinformatics approach, together with freely accessible, interactive and flexible software termed Endeavour, to prioritize candidate genes underlying biological processes or diseases, based on their similarity to known genes involved in these phenomena. Unlike previous approaches, ours generates distinct prioritizations for multiple heterogeneous data sources, which are then integrated, or fused, into a global ranking using order statistics. In addition, it offers the flexibility of including additional data sources. Validation of our approach revealed that it was able to efficiently prioritize 627 genes in disease data sets and 76 genes in biological pathway sets, identify candidates for 16 mono- or polygenic diseases, and discover regulatory genes of myeloid differentiation. Furthermore, the approach identified a novel gene involved in craniofacial development from a 2-Mb chromosomal region that is deleted in some patients with DiGeorge-like birth defects. The approach described here offers an alternative integrative method for gene discovery.
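The abstract describes two steps: ranking candidates separately in each data source by their similarity to known (training) genes, then fusing those rankings into one global ranking with order statistics. The Python sketch below illustrates that idea under stated assumptions; it is not Endeavour's implementation. Per-source similarity is approximated by a hypothetical mean Jaccard overlap of annotation sets (`rank_by_similarity`), and fusion applies the recursive formula for the joint cumulative distribution of uniform order statistics, in the form used by Stuart et al. (ref. 31), to rank ratios (rank divided by the number of ranked genes). All function names, gene IDs and annotation sets are illustrative.

```python
from math import factorial


def rank_by_similarity(training, candidates, annotations):
    """Rank candidates (best first) by the mean Jaccard overlap between
    their annotation sets and those of the training genes.

    Hypothetical stand-in for a per-source similarity measure.
    """
    def jaccard(a, b):
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    def score(gene):
        terms = annotations.get(gene, set())
        return sum(jaccard(terms, annotations.get(t, set())) for t in training) / len(training)

    # sorted() is stable even with reverse=True, so ties keep input order.
    return sorted(candidates, key=score, reverse=True)


def q_statistic(rank_ratios):
    """Joint cumulative probability of N uniform order statistics.

    Q = N! * V_N, with V_0 = 1 and
    V_k = sum_{i=1..k} (-1)^(i-1) * V_{k-i} / i! * r_{N-k+1}^i,
    where the rank ratios r_1 <= ... <= r_N are sorted ascending.
    A small Q means the gene ranks near the top in most sources.
    """
    r = sorted(rank_ratios)
    n = len(r)
    v = [1.0] + [0.0] * n
    for k in range(1, n + 1):
        x = r[n - k]  # r_{N-k+1} in the 1-based notation above
        v[k] = sum((-1) ** (i - 1) * v[k - i] / factorial(i) * x ** i
                   for i in range(1, k + 1))
    return factorial(n) * v[n]


def fuse_rankings(rankings):
    """Fuse per-source rankings (source name -> gene IDs, best first) into a
    global ranking ordered by ascending Q.

    Assumption: a gene absent from a source gets the worst rank ratio, 1.0.
    """
    positions = [({g: i for i, g in enumerate(ranked, start=1)}, len(ranked))
                 for ranked in rankings.values()]
    genes = set().union(*(pos for pos, _ in positions))
    scores = {
        gene: q_statistic([pos[gene] / n if gene in pos else 1.0 for pos, n in positions])
        for gene in genes
    }
    # Break exact ties by gene ID so the output is deterministic.
    return sorted(genes, key=lambda g: (scores[g], g))


if __name__ == "__main__":
    training = ["T1", "T2"]
    candidates = ["A", "B", "C"]
    terms = {"T1": {"x", "y"}, "T2": {"x", "z"}, "A": {"x", "y"}, "B": {"z"}, "C": {"w"}}
    rankings = {
        "annotation": rank_by_similarity(training, candidates, terms),  # A, B, C
        "text": ["A", "C", "B"],
        "expression": ["B", "A", "C"],
    }
    print(fuse_rankings(rankings))  # prints ['A', 'B', 'C']
```

Because a lower Q indicates a gene that is ranked consistently near the top across sources, the fused list is sorted in ascending order of Q. Treating a gene that is missing from a source as ranked last there is only one possible convention; others, such as computing Q over the available sources only, would change how incomplete data are penalized.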

References

  1. Genomics. Microarrays—guilt by association. Science 302, 240–241 (2003).
  2. Bioinformatics in the post-sequence era. Nat. Genet. 33 Suppl., 305–310 (2003).
  3. Funding high-throughput data sharing. Nat. Biotechnol. 22, 1179–1183 (2004).
  4. A similarity-based method for genome-wide prediction of disease-relevant human genes. Bioinformatics 18 Suppl. 2, S110–S115 (2002).
  5. Association of genes to genetically inherited diseases using data mining. Nat. Genet. 31, 316–319 (2002).
  6. POCUS: mining genomic sequence annotation to predict disease genes. Genome Biol. 4, R75 (2003).
  7. Integration of text- and data-mining using ontologies successfully selects disease gene candidates. Nucleic Acids Res. 33, 1544–1552 (2005).
  8. Speeding disease gene discovery by sequence based candidate prioritization. BMC Bioinformatics 6, 55 (2005).
  9. Genome-wide identification of genes likely to be involved in human genetic disease. Nucleic Acids Res. 32, 3108–3114 (2004).
  10. Exploring relationships and mining data with the UCSC Gene Sorter. Genome Res. 15, 737–741 (2005).
  11. PathwayVoyager: pathway mapping using the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. BMC Genomics 6, 60 (2005).
  12. TOUCAN 2: the all-inclusive open source workbench for regulatory sequence analysis. Nucleic Acids Res. 33, W393–W396 (2005).
  13. Computational detection of cis-regulatory modules. Bioinformatics 19 Suppl. 2, II5–II14 (2003).
  14. Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc. Natl. Acad. Sci. USA 96, 2907–2912 (1999).
  15. Gene expression-based high-throughput screening (GE-HTS) and application to leukemia differentiation. Nat. Genet. 36, 257–263 (2004).
  16. BCL6 suppresses RhoA activity to alter macrophage morphology and motility. J. Cell Sci. 118, 1873–1883 (2005).
  17. Hepatocyte growth factor is a regulator of monocyte-macrophage function. J. Immunol. 166, 1241–1247 (2001).
  18. Fas death receptor signaling represses monocyte numbers and macrophage activation in vivo. J. Immunol. 173, 7584–7593 (2004).
  19. The 22q11 deletion syndromes. Hum. Mol. Genet. 9, 2421–2426 (2000).
  20. Dissecting contiguous gene defects: TBX1. Curr. Opin. Genet. Dev. 15, 279–284 (2005).
  21. DiGeorge syndrome phenotype in mice mutant for the T-box gene, Tbx1. Nat. Genet. 27, 286–291 (2001).
  22. TBX1 is responsible for cardiovascular defects in velo-cardio-facial/DiGeorge syndrome. Cell 104, 619–629 (2001).
  23. Tbx1 haploinsufficieny in the DiGeorge syndrome region causes aortic arch defects in mice. Nature 410, 97–101 (2001).
  24. The zebrafish van gogh mutation disrupts tbx1, which is involved in the DiGeorge deletion syndrome in humans. Development 130, 5043–5052 (2003).
  25. A novel 22q11.2 microdeletion in DiGeorge syndrome. Am. J. Hum. Genet. 64, 659–666 (1999).
  26. The development and evolution of the pharyngeal arches. J. Anat. 199, 133–141 (2001).
  27. VEGF: a modifier of the del22q11 (DiGeorge) syndrome? Nat. Med. 9, 173–182 (2003).
  28. TXTGate: profiling gene groups with text-based information. Genome Biol. 5, R43 (2004).
  29. BIND: the Biomolecular Interaction Network Database. Nucleic Acids Res. 31, 248–250 (2003).
  30. A genetic algorithm for the detection of new cis-regulatory modules in sets of coregulated genes. Bioinformatics 20, 1974–1976 (2004).
  31. A gene-coexpression network for global discovery of conserved genetic modules. Science 302, 249–255 (2003).
  32. The Zebrafish Book. A Guide for the Laboratory Use of Zebrafish (University of Oregon Press, Eugene, Oregon, 1994).
  33. The shaping of pharyngeal cartilages during early development of the zebrafish. Dev. Biol. 203, 245–263 (1998).
  34. Ca(V)1.2 calcium channel dysfunction causes a multisystem disorder including arrhythmia and autism. Cell 119, 19–31 (2004).
  35. Missense mutations in CRELD1 are associated with cardiac atrioventricular septal defects. Am. J. Hum. Genet. 72, 1047–1052 (2003).
  36. Identification and functional analysis of a caveolin-3 mutation associated with familial hypertrophic cardiomyopathy. Biochem. Biophys. Res. Commun. 313, 178–184 (2004).
  37. Mutations in LRRK2 cause autosomal-dominant parkinsonism with pleomorphic pathology. Neuron 44, 601–607 (2004).
  38. Mutations in the pleckstrin homology domain of dynamin 2 cause dominant intermediate Charcot-Marie-Tooth disease. Nat. Genet. 37, 289–294 (2005).
  39. Point mutations of the p150 subunit of dynactin (DCTN1) gene in ALS. Neurology 63, 724–726 (2004).
  40. Identification of an angiogenic factor that when mutated causes susceptibility to Klippel-Trenaunay syndrome. Nature 427, 640–645 (2004).
  41. ABCC9 mutations identified in human dilated cardiomyopathy disrupt catalytic KATP channel gating. Nat. Genet. 36, 382–387 (2004).
  42. Heterozygous missense mutations in BSCL2 are associated with distal hereditary motor neuropathy and Silver syndrome. Nat. Genet. 36, 271–276 (2004).
  43. NIPBL, encoding a homolog of fungal Scc2-type sister chromatid cohesion proteins and fly Nipped-B, is mutated in Cornelia de Lange syndrome. Nat. Genet. 36, 636–641 (2004).
  44. Exclusion of linkage to the CDL1 gene region on chromosome 3q26.3 in some familial cases of Cornelia de Lange syndrome. Am. J. Med. Genet. 101, 120–129 (2001).
  45. Positional identification of TNFSF4, encoding OX40 ligand, as a gene that influences atherosclerosis susceptibility. Nat. Genet. 37, 365–372 (2005).
  46. Functional variants of OCTN cation transporter genes are associated with Crohn disease. Nat. Genet. 36, 471–475 (2004).
  47. Mutations in the glucocerebrosidase gene and Parkinson's disease in Ashkenazi Jews. N. Engl. J. Med. 351, 1972–1977 (2004).
  48. A missense single-nucleotide polymorphism in a gene encoding a protein tyrosine phosphatase (PTPN22) is associated with rheumatoid arthritis. Am. J. Hum. Genet. 75, 330–337 (2004).
  49. The gene encoding 5-lipoxygenase activating protein confers risk of myocardial infarction and stroke. Nat. Genet. 36, 233–239 (2004).
  50. Family-based association between Alzheimer's disease and variants in UBQLN1. N. Engl. J. Med. 352, 884–894 (2005).

Acknowledgements

We wish to thank all groups and consortia that made their data freely available: Ensembl, NCBI (EntrezGene and Medline), Gene Ontology, BIND, KEGG, Atlas, InterPro, BioBase, the Disease Probabilities from Lopez-Bigas and Ouzounis (ref. 9) and the Prospectr scores from Euan Adie (ref. 8). We also thank the following people for their help in particular areas: Robert Vlietinck with the manuscript, Patrick Glenisson with text mining, Joke Allemeersch and Gert Thijs with the order statistics and Camilla Esguerra with the zebrafish experiments. S.A., D.L. and P.V.L. are sponsored by the Research Foundation Flanders (FWO). This work is supported by Flanders Institute for Biotechnology (VIB), Instituut voor de aanmoediging van Innovatie door Wetenschap en Technologie in Vlaanderen (IWT) (STWW-00162), Research Council KULeuven (GOA-Ambiorics, IDO genetic networks), FWO (G.0229.03 and G.0413.03), IUAP V-22, K.U.L. Excellentiefinanciering CoE SymBioSys (EF/05/007), EU NoE Biopattern and EU EST BIOPTRAIN to Y.M., and by the FWO (G.0405.06), GOA/2006/11 and GOA/2001/09, Squibb and EU LSHB-CT-2004-503573 to P.C.

Author information

Author notes

    Stein Aerts, Diether Lambrechts, Sunit Maity, Peter Van Loo and Bert Coessens contributed equally to this work.

Affiliations

  1. Laboratory of Neurogenetics, Department of Human Genetics, Flanders Interuniversity Institute for Biotechnology (VIB), University of Leuven, Herestraat 49, bus 602, 3000 Leuven, Belgium.

    • Stein Aerts
    • Bassem Hassan

  2. The Center for Transgene Technology and Gene Therapy, Flanders Interuniversity Institute for Biotechnology (VIB), University of Leuven, Herestraat 49, bus 602, 3000 Leuven, Belgium.

    • Diether Lambrechts
    • Sunit Maity
    • Frederik De Smet
    • Peter Carmeliet

  3. Human Genome Laboratory, Department of Human Genetics, Flanders Interuniversity Institute for Biotechnology (VIB), University of Leuven, Herestraat 49, bus 602, 3000 Leuven, Belgium.

    • Peter Van Loo
    • Peter Marynen

  4. Bioinformatics Group, Department of Electrical Engineering (ESAT-SCD), University of Leuven, Belgium.

    • Stein Aerts
    • Peter Van Loo
    • Bert Coessens
    • Leon-Charles Tranchevent
    • Bart De Moor
    • Yves Moreau

Authors

Stein Aerts, Diether Lambrechts, Sunit Maity, Peter Van Loo, Bert Coessens, Frederik De Smet, Leon-Charles Tranchevent, Bart De Moor, Peter Marynen, Bassem Hassan, Peter Carmeliet & Yves Moreau

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Stein Aerts.

Supplementary information

PDF files

  1. Supplementary Fig. 1

    Variability of the performance of Endeavour, evaluated for each data source.

  2. Supplementary Fig. 2

    Pairwise dependency of the data sources.

  3. Supplementary Fig. 3

    Endeavour is not biased toward well-characterized genes.

  4. Supplementary Table 1

    Selection criteria and training sets for the 10 monogenic and 6 polygenic diseases.

  5. Supplementary Table 2

    Prioritization of 1,048 test genes located on chromosome 3 using training genes of congenital heart defects (CHD), arrhythmias (AR) and cardiomyopathies (CM).

  6. Supplementary Methods

  7. Supplementary Notes

About this article

DOI

https://doi.org/10.1038/nbt1203
