Many of those working with microarrays now recognize that one way to dispel uncertainty about experimental findings is to be more transparent about methodology and data. This realization has transformed the field. For instance, after some initial resistance, almost every major commercial vendor has made the sequences and annotations of its probes publicly available, to the considerable benefit of the community as a whole.

This awareness has also manifested itself in the drive to develop shared resources for pooling experimental data and systems for clearly defining how these data were obtained. A leading force in this regard is the Microarray Gene Expression Data (MGED) Society, which in 2001 put forward a proposal for experimental annotation standards known as minimum information about a microarray experiment (MIAME), designed to record key details about factors such as sample preparation and experimental design. These standards were embraced by many, and several leading journals, including Cell, The Lancet and Nature, demand MIAME compliance for all submissions that include microarray data. However, some aspects of MIAME have proved problematic.

“I think almost all academic biologists embrace the concept of openly sharing data,” says Catherine Ball of Stanford University in California, the current president of the MGED Society. “But embracing the process and actually taking part are very different, and it can be difficult to fully annotate your data.” According to Gavin Sherlock, also of Stanford and MGED, part of the problem was MAGE-ML (microarray and gene expression markup language), the XML-based language initially developed for recording MIAME data. “Nobody can look at it, nobody can read it, nobody can edit it,” he says. “It's very difficult to use.” These difficulties show up in how readily researchers upload data to public databases, another practice strongly advocated by MGED.

Catherine Ball believes simpler software tools could encourage better MIAME compliance.

The ArrayExpress database of the European Bioinformatics Institute in Cambridge, UK, is strictly MIAME-compliant, and receives considerably fewer submissions than the non-MIAME-compliant Gene Expression Omnibus (GEO) of the National Center for Biotechnology Information in Bethesda, Maryland. MGED is now poised to release a much simpler format for data submission, and Ball is hopeful that this, along with other user-friendly software tools, will make a difference.

But, fundamentally, compliance comes down to the effort scientists are able and willing to put in. All of the MicroArray Quality Control project's data are being deposited into both GEO and ArrayExpress, and although this has proved an onerous task, Leming Shi of the US Food and Drug Administration sees clear rewards in the effort. “Depositing the data may be a painful process, but we have to do it for the sake of the community,” he says. “The more information we have in the future, the better.”