Today's biological research, it is often agreed, is awash in data. Large-scale experiments generate reams of information, and the struggle is often in using this information to achieve biological insight. For instance, are the data from the many genetic screens in model organisms being optimally used to understand human biology and disease?

According to Melissa Haendel at Oregon Health and Science University, Portland, we could be doing better. “We spend a lot of money on model-organism research, but the data [are not] really very accessible to clinicians, and vice versa for clinical research,” she says. In a recently published paper, Haendel and colleagues report a bioinformatics approach that could help these research communities make better use of each other's resources.

The essence of their approach is to enable computational comparison of phenotypes between species. To do this, it is necessary to describe phenotypes in a standardized way. Although some model-organism databases use controlled vocabularies, or ontologies, to describe the phenotypes associated with particular genes, alleles or genotypes, human clinical phenotypes are typically described using unstandardized terms. Such textual descriptions are difficult or impossible for a computer program to effectively search and compare to other phenotypic descriptions.

“What we tried to do here is to represent the human disease data in the same way as phenotype data [are] recorded in some model organisms,” says Haendel, “and see if we could get these two types of data to talk to each other.” The researchers selected 11 human disease genes from the Online Mendelian Inheritance in Man (OMIM) database and annotated them using what is called EQ terminology: 'E' applies to an entity, such as an anatomical part, a process or a molecular entity, and 'Q' applies to a quality that can be ascribed to that entity.
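To make the EQ idea concrete, here is a minimal Python sketch of how one such annotation might be represented. The class name `EQ`, the helper `phenotypic_profile` and the example terms are hypothetical illustrations, not the researchers' actual data model; real annotations would use ontology term identifiers rather than free strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EQ:
    """One entity-quality statement (hypothetical representation)."""
    entity: str   # an anatomical part, a process or a molecular entity
    quality: str  # a quality ascribed to that entity


def phenotypic_profile(*annotations):
    """Collect EQ statements into an unordered, duplicate-free profile."""
    return frozenset(annotations)


# Illustrative terms only; these are not taken from OMIM or any ontology.
example_profile = phenotypic_profile(
    EQ("eye", "decreased size"),
    EQ("retina", "degenerate"),
    EQ("eye", "decreased size"),  # duplicates collapse automatically
)
```

Because each statement is an immutable pair rather than free text, two descriptions can be compared by simple set operations instead of string matching.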

The complete set of EQ descriptions for a particular gene makes up its phenotypic profile. The researchers then compared the phenotypic profiles of the 11 EQ-annotated human disease genes to the phenotypes recorded in similarly annotated mouse and zebrafish databases, using several information theory-based metrics to quantify how similar one phenotypic profile is to another.
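The article does not name the specific metrics, so the following Python sketch shows only one illustrative information theory-based measure, under stated assumptions: each EQ statement is weighted by its information content (rarer statements count for more), and two profiles are scored by the weighted share of statements they have in common. The function names, gene labels and EQ terms are all hypothetical, and the sketch matches statements exactly rather than exploiting relationships between ontology terms as a real implementation likely would.

```python
import math
from collections import Counter


def information_content(profiles):
    """IC of each EQ statement: -log2 of the fraction of profiles carrying it."""
    n = len(profiles)
    counts = Counter(eq for profile in profiles.values() for eq in profile)
    return {eq: -math.log2(count / n) for eq, count in counts.items()}


def ic_weighted_similarity(a, b, ic):
    """IC of shared statements divided by IC of all statements in either profile."""
    shared = sum(ic[eq] for eq in a & b)
    total = sum(ic[eq] for eq in a | b)
    return shared / total if total else 0.0


def rank_matches(query, candidates, ic):
    """Return candidate names ordered from most to least similar to the query."""
    scores = {name: ic_weighted_similarity(query, profile, ic)
              for name, profile in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Made-up profiles; each EQ statement is an (entity, quality) tuple.
profiles = {
    "human_gene": {("eye", "decreased size"), ("heart", "malformed")},
    "fish_gene_a": {("eye", "decreased size"), ("fin", "absent")},
    "fish_gene_b": {("eye", "decreased size"), ("heart", "malformed")},
    "fish_gene_c": {("fin", "absent")},
}
ic = information_content(profiles)
candidates = {k: v for k, v in profiles.items() if k != "human_gene"}
ranking = rank_matches(profiles["human_gene"], candidates, ic)
```

In this toy data set the statement shared by three of the four profiles carries little information, so a match on it alone contributes far less to the score than a match on a rarer statement.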

Although the closest sequence orthologs of the human disease genes were not always identified as having the phenotype most similar to that of the human disease, the results nevertheless suggest that the approach will be useful for obtaining biologically relevant information. In five out of eleven comparisons, the zebrafish ortholog was among the ten phenotypes most similar to the human one, as reported by two metrics; for the mouse, this was true in four out of ten comparisons. Additionally, in within-species tests, other alleles or pathway members were identified as most similar to a query gene based on phenotypic comparison alone.

How could such comparisons be used to better understand human disease? The most exciting application will be in identifying candidate human disease genes, says Haendel. “If you have a human disease and you don't know the genetic basis for it, you could take a standardized phenotypic description of that disease and query the model organism databases for similar phenotypes where you would then be able to assess the genetic basis,” she says. A reciprocal search, in which the phenotype of a mutant model organism is used to search for a similar human phenotype, could also be useful for identifying animal models for human disease.

An extension of the EQ annotations to the entire OMIM database is in progress, says Haendel, and once in place should allow many researchers to work toward bringing these exciting possibilities to fruition.