Marvin Cassman and his colleagues, in their Commentary “Barriers to progress in systems biology” (Nature 438, 1079; 2005), discuss the development of standards in systems-biology research. We agree with the need for well-curated databases, for software systems that can work together to analyse such data, and for integrated models that can deliver the fruits of systems-based research to laboratory biologists. But we have concerns about the proposed solution, which is presented as a ‘top-down’ approach that ignores many existing and emerging standards. It seems to rest on false assumptions about the research community, the very community it is intended to serve.
Cassman and his colleagues argue that standards are needed because much software developed in research settings cannot be reused by other research groups or by working biologists, who are not appropriately trained. But the community has many excellent quantitative scientists and software developers. With the advent of genomics, an increasing number of physicists, mathematicians, statisticians, computer scientists and engineers have joined the ranks of biologists.
It is not a lack of training that shapes software design, but the realities of writing software in a research environment, where producing a professional software system is not the primary goal. As fields mature and the methods used to generate the data become well known and established, it is both appropriate and valuable to have standardized, easy-to-use software. But standardized approaches are not always appropriate for software that supports new research applying novel methods in unexpected ways.
Our collective experience, gained through the Microarray Gene Expression Data Society and the BioConductor project, clearly demonstrates that flexible systems are needed and that most initial efforts are neither well documented nor widely used. But that is not a bad thing — as science charts a particular path, the appropriate tools, if given room to evolve, do emerge and rise to the top, becoming better documented and more robust.
Even for the relatively straightforward task of assembling and annotating genome-sequencing data, computationally elegant solutions to software interoperability (such as the common object request broker architecture, or CORBA) were ultimately abandoned in favour of FASTA-formatted sequence data and tab-delimited output from various analytical tools strung together using Perl. It wasn't elegant or pretty, but it delivered what was needed in a way that sophisticated users at various locations could replicate and adapt to suit their needs. Combined with well-engineered databases and websites to provide access, the genome projects also delivered the fruits of their work to the broader community in a form that has been extremely useful and continues to evolve. Attempting to engineer all this in advance, particularly when the field and its tools were evolving so rapidly, would quite simply have failed.
We believe that the centralized approach proposed by Cassman and colleagues would not fare well compared with more democratic, community-based approaches that recognize and include research-driven development efforts. Creating a rigid standard before a field has matured can, at best, produce a failed and unused standard and, at worst, stifle innovation.