Nature Biotechnology 24, 115 (2006)

In it for the long haul

Six weeks ago, the rights to one of biology's premier public databases were quietly sold to an informatics startup. The database in question, the Biomolecular Interaction Network Database (BIND), is arguably the most comprehensive freely accessible protein-protein interaction database available to the research community. Yet through a combination of bureaucratic delays, Canadian government fiscal nitpicking and a lack of community consensus, this important resource now finds itself on life support, its survival precariously linked to that of Unleashed Informatics, a private venture founded last April with little more than $1.0 million in seed funding from Sun Microsystems.

BIND is a database of molecular associations that collates high-throughput data submissions and hand-curated information from the scientific literature. Although the database has existed since 1998, it really started making headlines in 2003 when the Blueprint Initiative, a project of Toronto-based Mount Sinai Hospital's Samuel Lunenfeld Research Institute, obtained $17.3 million in federal and Ontario government funding and another $7.8 million from the private sector to “assemble man's biomolecular knowledge on one open source database for all researchers to access free of charge.”

By 2005, Blueprint's staff of curators, software developers and administrators had grown to 68. The number of interactions lodged in the database had risen to >180,000. And 77 scientific journals had signed on to publish BIND accession numbers in their papers. All seemed to be progressing well until Blueprint's principal investigator, Chris Hogue, started looking around for new funding.

It soon became apparent that Genome Canada was unwilling to stump up the additional $20.8 million Hogue estimated was needed to maintain the database over the next four years. While Genome Canada president Martin Godbout cited problems with BIND's “management, budget justification and financial plan,” Hogue countered that the real sticking point was Genome Canada's requirement that Blueprint secure “matched funding” from another source. Unfortunately, the most likely provider of such funding, the Ontario government, declined to back the project because it was in the midst of restructuring how it doled out funds.

Forced to lay off half of his staff to keep the project alive, Hogue was left scrambling to find alternative sources of funding. In June, an interim solution appeared to have been found when Blueprint's mirror node in Singapore offered to take over database operations. By November, however, this arrangement had also fallen through. Hogue announced the termination of all BIND curation activities and the dismissal of all remaining staff. Last month, Blueprint Asia closed its doors, leaving BIND under the sole control of Unleashed Informatics, which has agreed to maintain the existing data as an open access resource.

One might argue that BIND offers nothing more than a cautionary tale about Canadian research funding. After all, researchers have several other protein interaction databases available, including the European Bioinformatics Institute's (EBI) IntAct (http://www.embl-ebi.ac.uk/intact/index.jsp), the University of California's Database of Interacting Proteins (DIP; http://dip.doe-mbi.ucla.edu/) and the Munich Institute for Bioinformatics' MPact (http://mips.gsf.de/genre/proj/mpact/index.html). But that would be wrong. BIND's predicament is not just about Canadian politics and Canadian research. Only ~7% of BIND's users were based in Canada; the majority originated from the United States and Europe. And BIND was not just another protein interaction database. It was a unique resource, not only because of its comprehensiveness (as of January 23, ~200,000 interactions compared with ~60,000 and ~56,000 in IntAct and DIP, respectively), but also because of the quality of its data and its hyperlinks to the scientific literature.

If the failure of BIND highlights anything, it is the endemic and longstanding problem of providing sustainable long-term financial support for databases. When it comes to financial insecurity, no database project, big or small, is immune. According to a survey by Nature (435, 1010–1011, 2005), of 89 databases listed in the Molecular Biology Database collection in 2000, last year nearly two-thirds (51) were struggling financially and seven had already closed. Last May, the decision of the US National Institute of General Medical Sciences (NIGMS) to halve the $5 million originally requested by the Alliance for Cellular Signaling (AfCS; http://www.signaling-gateway.org/) necessitated the closure of the AfCS's curation office at Duke University and required Nature Publishing Group to assume full editorial control of AfCS's Molecular Pages. Even the EBI has had to shuffle its own financial reserves to keep databases like InterPro afloat; and confirmation of renewed funding for major databases, such as ArrayExpress, often has to wait until the eleventh hour.

A first step to addressing these problems would be to bring together representatives of the major funding agencies (e.g., the European Union and Wellcome Trust in Europe, and the National Science Foundation and NIGMS in the United States) to engage in a high-level discussion on long-term funding goals for international databases. A report on long-lived databases published in September by the US National Science Board provides a summary of the most important issues (http://www.nsf.gov/pubs/2005/nsb0540/start.jsp).


Second, a mechanism needs to be outlined that would allow funding agencies to recognize databases that both reflect community consensus standards (e.g., GEO and ArrayExpress are compliant with the Minimum Information About a Microarray Experiment standard; http://www.mged.org) and have matured into indispensable community resources. These are the databases that should be prioritized for longer-term funding (subject of course to regular reappraisal and peer review).

There's no doubt that BIND, with its links to the literature and its high-quality molecular interaction data set, was a valuable resource. But the protein interaction field is still developing its consensus standards (e.g., see the Proteomics Standards Initiative; http://psidev.sourceforge.net/) and is to some extent still exploring its boundaries. Perhaps it was too early to unite community efforts into one molecular interaction database. Or perhaps BIND was ahead of its time. Whatever the case, funders must now formulate a strategy to ensure that other databases, especially those bankrolled with millions of dollars of public money, avoid a similar fate.


