Sir

Mikhail V. Blagosklonny and Arthur B. Pardee in their Concepts essay “Unearthing the gems” (Nature 416, 373; 2002) discuss the emergence of 'conceptual biology' — the iterative process of analysing existing facts and models available in published literature to generate new hypotheses — and its value in integrating existing information conceptually as an essential part of scientific research. This process has parallels with the process of drug discovery by iteration of existing biomedical literature.

Scientists have traditionally worked in discrete communities, creating discipline-specific language. The natural consequence is that today we are faced with an overwhelming array of nomenclature for genes, proteins, drugs and even diseases.

The β-amyloid converting enzyme BACE, for example, has the synonyms ASP2, memapsin 2 and β-secretase, because different research groups working in different research communities each gave it their own name. Catatonic schizophrenia is the more common term for what used to be known as schizophrenic catalepsy, schizophrenic flexibilitas cerea, catatonic dementia praecox or schizophrenic catatonia. The drug pindolol has more than 30 synonyms, which include LB46, Carvisken, Calvisken and Prindolol.
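One way to picture the consequence for searching is a small synonym table that expands any one name into a query covering all of them. The following is only a minimal sketch: the table holds just the names listed above, and the `SYNONYMS` dictionary and `expand_query` function are hypothetical illustrations, not part of any existing search tool.

```python
# Illustrative synonym table built only from the names mentioned in the text.
# A real resource would hold far more entries (pindolol alone has 30+ names).
SYNONYMS = {
    "BACE": ["BACE", "ASP2", "memapsin 2", "β-secretase"],
    "pindolol": ["pindolol", "LB46", "Carvisken", "Calvisken", "Prindolol"],
}


def expand_query(term: str) -> str:
    """Return an OR query covering every known synonym of `term`.

    Hypothetical helper: if the term is unknown, it is searched as given.
    """
    wanted = term.casefold()
    for names in SYNONYMS.values():
        # Match the term against any name in the group, ignoring case.
        if any(wanted == name.casefold() for name in names):
            return " OR ".join(f'"{name}"' for name in names)
    return f'"{term}"'


print(expand_query("ASP2"))
# "BACE" OR "ASP2" OR "memapsin 2" OR "β-secretase"
```

A searcher who knows only one of the names would, under this scheme, still retrieve literature filed under the others.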

In some areas, such as the naming of chemicals, there are controls that prevent names from being used more than once. In others, the same name has been used for different entities. The term 'hunk' is not only an English word, but also a cell type (human natural killer) and a gene (hormonally upregulated Neu-associated kinase). A search for the term 'hunk' in PubMed gives references to all three contexts.
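The 'hunk' ambiguity can be illustrated by a deliberately naive sketch that guesses which sense a passage intends from the words around the term. The cue-word lists and the `guess_sense` function below are assumptions made purely for illustration; a practical system would draw on a curated vocabulary rather than a hand-written list.

```python
import re

# Hypothetical cue words for two scientific senses of 'hunk'.
SENSES = {
    "cell type": {"natural", "killer", "cell", "cells", "cytotoxicity"},
    "gene": {"kinase", "neu", "gene", "expression", "hormonally"},
}


def guess_sense(text: str) -> str:
    """Pick the sense whose cue words overlap most with the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {sense: len(words & cues) for sense, cues in SENSES.items()}
    best = max(scores, key=scores.get)
    # With no cue words present, the term may simply be the English word.
    return best if scores[best] > 0 else "unknown"


print(guess_sense("Hunk is a Neu-associated kinase"))      # gene
print(guess_sense("HuNK cells showed killer activity"))    # cell type
print(guess_sense("a hunk of cheese"))                     # unknown
```

Counting shared words is far too crude for real literature, but it shows the kind of contextual information that a plain keyword search discards.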

The problem for scientists trying to perform 'conceptual' searches precisely and comprehensively is self-evident. It may be reasonably straightforward (though time-consuming) to search in a 'concept-driven manner' within their area of expertise, in which they are likely to be familiar with the language. It is considerably more difficult to search outside their own discipline, as they may not know where to look and will be less familiar with variable nomenclature. As soon as the name of a compound, disease or target is changed, data associated with that entity could easily be lost. Delving back into history to recover those potentially valuable sources of information can often be impossible.

To address this all-too-familiar problem, many groups are applying standards to biomedical nomenclature. Expert committees are reaching consensus on the naming of gene families, diseases and compounds, and there are consortia such as Gene Ontology (http://www.geneontology.org), which is defining a controlled vocabulary for the association of genes across different species. From the opposite perspective, there are resources such as the Metathesaurus of the Unified Medical Language System (http://www.nlm.nih.gov/research/umls), which brings together numerous classification systems so that all alternative names and meanings defined by different groups are in one repository.

These are all extremely valuable resources, but what the biomedical community needs is the ability to use such resources to search the vast sources of information more effectively, to extract more meaning.

The annotation of Medline with medical subject headings (MeSH) is probably the best attempt so far (http://www.nlm.nih.gov/mesh), as it at least helps users to link their search term to abstracts containing different terms with the same meaning. What is still needed is a way to control the context of the search, so that terms having different meanings in different contexts can be retrieved appropriately. We also need ways to enable scientists to cross disciplines and search in areas outside their expertise, so that they can extract information critical for new discoveries. Knowledge-based systems will no doubt provide the best opportunity in this regard.