Sir

A recent News report in Nature1 mentioned a workshop that we organized at the National Institutes of Health on the prospects for databases that describe signal transduction pathways and, more specifically, those that define protein–protein interactions.

The complete genomic DNA sequence of an organism in principle reveals its full coding potential, and makes it possible to describe comprehensively the structures and functions of its proteins and their organization into the pathways and networks that control cellular behaviour2. The vast literature on these topics seems likely to grow exponentially as tools for rapid proteomic analysis are refined. We feel that the value of this information, and its ability to serve as the basis for modelling cellular responses to external signals, will depend on its organization into a readily accessible electronic format.

One approach to such a database makes use of the observation that a common theme in cellular events is the assembly of proteins into complexes through specific modular interactions3. A growing number of such interactions are mediated by domains and recognition motifs that can be readily identified by primary sequence analysis, so the interactions themselves are predictable4. This notion can be extended to encompass the interactions of other types of macromolecules with one another and with small molecules. Although not all cellular phenomena can be described in these terms, the concept provides a useful starting point from which to organize data.

The purpose of the workshop was to review ongoing efforts to design databases of protein–protein interactions, and to solicit input on the best way forward. We regarded the creation of a centralized, freely available, public submission database as an achievable and highly desirable goal for the next generation of cellular analysis. Such an undertaking will be complex and prone to numerous pitfalls, but we believe it is an inevitable evolution of current biological databases. Furthermore, we consider it essential if we are to understand more fully how cellular function is controlled.

A transcript of the workshop will be posted on the NIGMS website. As this initiative proceeds, we will solicit broader input from the scientific community.