A large-scale protein-function database

To the editor

In 2009, the Journal of Biological Chemistry published nearly 37,000 pages containing the data and analyses of biological entities. Biochemistry contributed 12,000 pages, the Proceedings of the National Academy of Sciences 22,568 and the European Journal of Biochemistry 7,446... Imagine if all of the protein-function data in those pages, and more, had been efficiently deposited to a database that was accessible, free of charge, worldwide. This is the primary objective of the Strenda Committee (Standards for the Reporting of Enzymological Data)1,2,3.

The rate at which data is acquired frequently outstrips the capacity of the human mind to house it. Instead, we mine it. The ability to electronically cull the majority of mankind's knowledge of the functioning of a particular biomolecule at the push of a button would be an acutely effective, efficient research tool. Consider the benefits of crossing such information against single nucleotide polymorphism databases to identify the biochemical lesions associated with disease-linked mutations, or to associate the functional consequences of mutations with changes in the structures housed in the Protein Data Bank. Additionally, as systems biologists strive to integrate large swaths of metabolism, ready access to initial-rate, equilibrium and regulatory data will prove immensely useful. Perhaps the greatest value of such a database lies in the myriad ways in which it would integrate into the daily activities of individuals worldwide. One cannot help but wonder what fraction of the protein-function literature is obscured or even lost to the researcher by imprecise search engines and retrieval strategies.

In October of 2003, a group of scientists gathered under the auspices of the Beilstein Institut for the Advancement of Chemical Sciences to address a problem of common interest: the large-scale collection of protein-function data. It was realized that such an endeavor would require developing community-based standards for the reporting of protein-function data and that an electronic form, acting as a portal for the deposition of data as it enters the literature, would provide a mechanism for the growth of a protein-function database to parallel the efforts of the scientific community.

Strenda has worked extensively with the scientific community to formulate recommendations to authors for the reporting of enzymological data (Box 1). These recommendations are the result of in-depth discussions that took place at each of five annual Experimental Standard Conditions of Enzyme Characterizations conferences, international meetings comprising about 50 invitees from the academy, industry and editorial boards. It is hoped that the recommendations will prove an asset to authors and journals alike by clearly articulating the community standards; so far they have been adopted by 14 journals.

The relationship of structure to function is among the most powerful in molecular science, yet an initiative to construct a protein-function database on the scale of the Protein Data Bank does not yet exist. The time for such an effort has come, and Strenda and the Beilstein Institute stand ready to assist in its implementation. Coupling the electronic submission of data contained in an article to its publication has been crucial to the development of the extant, large-scale databases. Toward this end, Strenda has developed an electronic data-entry form for the deposition of protein-function data that can be viewed at the Strenda website. We encourage readers to view the form and share their thoughts regarding its design and construction, with the goal of developing it to the point at which it can become part of our routine publication practices.


  1. Apweiler, R. et al. Trends Biochem. Sci. 30, 11–12 (2005).

  2. Armstrong, R.N. Biochemistry 47, 1–2 (2008).

  3. Taylor, C.F. et al. Nat. Biotech. 26, 889–896 (2008).


Author information



Corresponding author

Correspondence to Thomas S Leyh.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

About this article

Cite this article

Apweiler, R., Armstrong, R., Bairoch, A. et al. A large-scale protein-function database. Nat Chem Biol 6, 785 (2010).
