Main

Creating pre-competitive, collaborative industrial consortia has precedents in areas other than pharmaceuticals. The Single Nucleotide Polymorphism (SNP) Consortium (TSC) broke new ground as a consortium formed in 1999 by companies engaged in, or related to, pharmaceutical research1. TSC had a two-year target to generate, and place in the public domain, a map of 300,000 human SNPs. The potential importance of a human SNP map in the area of health-care delivery has been reviewed recently2. TSC funded the work at major academic genome centers. After 18 months, TSC is exceeding this target by a considerable margin, and the results are in the public domain to be used by all researchers, academic and industrial.

It is now proposed that a similar pre-competitive consortium could tackle structural genomics. A group of multinational pharmaceutical and other companies and the Wellcome Trust are attempting to form a charitable organization, the Structural Genomics Consortium (SGC).

The motivation is to accelerate the flow of human protein structures into the public database. This will benefit biological research in general and particularly within the pharmaceutical area. Ascribing function to all of the human genes will provide numerous potential drug targets. The genome sequence requires extensive annotation that will be powered by the acquisition of extensive basic knowledge. The latter process is pre-competitive for the pharmaceutical industry, that is, for drug discovery or, more appropriately, drug creation. Most drugs are, and will be in the future, small molecules, with molecular weights below 500 daltons. The key target classes are broadly enzymes, receptors and ion channels. Within the many families of such proteins there are as yet few examples of three-dimensional protein structures. However, when structures of sufficient resolution have been generated, they have proven valuable in creating drugs. Examples are HIV protease inhibitors3 and influenza neuraminidase inhibitors4.

There is a large disparity between the number of open reading frames (ORFs) in the human genome (of the order of 100,000) and the number of human proteins for which three-dimensional structural information is available (considerably fewer than 1,000). Moreover, fewer than 200 novel structures are completed each year and few of these are human proteins. At this rate, generating structures for all human proteins would take over 1,000 years. Recognition that a substantial increase in the rate of completing novel structures is needed has provided the impetus for creating the SGC. The aim of the SGC will be to substantially accelerate production of novel structures, mainly of human proteins, but also of other proteins related to human health, resulting in a broader and deeper cross-section of such structures, all of which will be placed in the public databases. The goal will be an increased representation of structures across protein families rather than coverage of fold space. To achieve this goal it is important that structural genomics should now be tackled on an industrial scale and with an industrial approach to organization. The interim SGC group is evaluating the costs that would be involved in achieving the goal.
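The time estimate above can be checked with a short calculation. The sketch below is illustrative only: the ORF count and the 200-per-year ceiling come from the text, while the figure of 100 human structures per year is a hypothetical value chosen to stand in for "few of these are human proteins".

```python
def years_to_cover(total_proteins: int, structures_per_year: float) -> float:
    """Years needed to solve one structure per protein at a constant rate."""
    if structures_per_year <= 0:
        raise ValueError("structures_per_year must be positive")
    return total_proteins / structures_per_year


ORF_ESTIMATE = 100_000        # from the text: "of the order of 100,000" ORFs
NOVEL_PER_YEAR = 200          # from the text: upper bound on novel structures per year
ASSUMED_HUMAN_PER_YEAR = 100  # hypothetical: the text says only that "few" are human

# Best case: every novel structure solved each year is a human protein.
best_case_years = years_to_cover(ORF_ESTIMATE, NOVEL_PER_YEAR)

# Assumed case: only half of the novel structures are human proteins.
assumed_case_years = years_to_cover(ORF_ESTIMATE, ASSUMED_HUMAN_PER_YEAR)

print(f"Best case: {best_case_years:.0f} years")
print(f"Assumed human-only rate: {assumed_case_years:.0f} years")
```

Even the best case gives 500 years, and any rate below 100 human structures per year pushes the estimate past 1,000 years, which is consistent with the figure quoted in the text.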

Improvements in the methods and speed of solving protein structures using synchrotron radiation, coupled with automation of the process and improved informatics, allow the possibility of an increase in the numbers of novel structures. The initial rate-limiting step is large-scale production of proteins of suitable quality for structural studies and their crystallization.

The success of TSC has provided a model for creating the SGC, but differences in the scientific requirements will influence the detailed approach for the SGC. Producing a SNP map depended mainly on focused genome sequencing. The existing academic genome centers that were organized for that purpose were able to undertake the work as an extension of their large sequencing efforts, and TSC provided funding from a contribution of $3 million per company, with the Wellcome Trust contributing $14 million. Analogous centers for large-scale structural genomics do not currently exist. However, there are synchrotron facilities that offer a high-throughput approach to the rapid collection of X-ray diffraction data combined with rapid methods for solving structures. Since there is scope for creating new beam lines, data collection should not be a rate-limiting step.

Complementary facilities for the production of large numbers of proteins in a quality suitable for crystallization are required. The SGC plans to fund such facilities in addition to high-throughput data collection and structure solving. The SGC program is intended to be complementary to the NIH initiative in structural biology and to other efforts in structural genomics around the world.

The SGC will aim to begin operation early in 2001 and to provide the necessary funding to appropriate work centers (either academic or specialist companies) for at least three years. Even with a substantial increase in the numbers of novel human structures added to the public database each year, a complete set of structures for human proteins will be some way off, and there is scope for many concurrent efforts, both private and public.

Associations with Structural Genomics

A.R.W., an independent consultant, was an organizer of TSC and is the facilitator of the interim SGC. Organizations or individuals interested in participation should contact A.R.W.