Paris

The United States is turning to European bioinformatics facilities to help it meet its researchers' future needs for databases of protein sequences.

European institutions are set to be the main recipients of a three-year, $15-million grant from the US National Institutes of Health (NIH) to set up a global database of information on protein sequence and function, known as the United Protein Databases, or UniProt.

Sound structure: the planned UniProt database will be a valuable resource for protein research. Credit: SWISS-PROT

The NIH is scheduled to announce this week that two-thirds of the grant will go to help maintain two protein databases, Swiss-Prot and Trembl. Swiss-Prot is a curated protein-sequence database that strives to provide a high level of annotation, including descriptions of function, structure and variants. Trembl is a computer-annotated supplement to the main database that contains sequences not yet integrated into Swiss-Prot. The two databases were developed by groups at the European Bioinformatics Institute (EBI) near Cambridge in the UK, and the Swiss Institute of Bioinformatics (SIB), which is based at Geneva and Lausanne.

The remainder of the NIH money will support the Protein Information Resource (PIR) database in the United States, which is maintained by the Georgetown University Medical Center in Washington DC and will now merge with its erstwhile European rivals.

“It's the first time that most of the money from such a large, NIH-funded infrastructure project goes outside the United States,” says Rolf Apweiler, Swiss-Prot coordinator at the EBI. “This is remarkable, and shows that Americans recognize that the centre of gravity for protein sequence data is in Europe.”

Swiss-Prot began life in 1986 as an augmented version of the PIR database. It has grown to be the world's largest protein sequence database, and, according to some researchers, the most useful — for both the quality of additional editorial and functional information that it carries, and the richness of its links to genomic and other bioinformatics databases.

The NIH is backing Swiss-Prot because it recognizes that the community is best served by focusing efforts globally on one well-developed database, rather than developing a rival, officials close to the decision say.

Under the terms of the grant, PIR will stop maintaining its own database but will assist with the care and feeding of UniProt. Existing data held on PIR will be integrated into Trembl and Swiss-Prot. The grant will enable PIR's organization to remain the same size as it is now, but the funding will allow the EBI and the SIB to expand their protein bioinformatics programmes markedly.

For the biologist at the bench, the move is likely to help guarantee the availability of a sustainable and reliable source of data on protein sequences and functions, as the amount of protein data rapidly expands. Database officials say that the new funding will result in a better quality of curated entries, and more sophisticated tools to help users to navigate the large data sets.

But the news that the NIH is backing the project may create some political embarrassment in Europe, given the notorious reluctance of the European Union to support bioinformatics infrastructure (see Nature 402, 1; 1999; doi:10.1038/46847).

“It is ironic that the United States gives us money, whereas the European Union seems to expect such infrastructure to live on manna from heaven,” says one European expert in biology and information technology. The US investment inevitably also means that the United States will have a greater say in the control of Europe's bioinformatics infrastructure, the expert points out.