Not a moment too soon, the microarray community has issued guidelines that will make their data much more useful and accessible. Nature and the Nature research journals will respond accordingly.
You read a paper with a fascinating conclusion about the expression of several genes. You decide to run some of the same experiments on your system of choice. But when you wade through hundreds of pages of supplementary information, you find that crucial details needed for replication are missing.
Welcome to the exciting but frustrating world of DNA microarray research. Microarrays are plastic or glass chips spotted with tiny amounts of thousands of probes, used to query the activity levels of that many genes in any tissue or organism at one time. Variables at every step of the experiment often make comparison between papers virtually impossible. Microarray papers also place a considerable strain on the refereeing process; the vast amounts of data make critical review a monumental task.
Yet referees sometimes feel they are not given enough details, leading cautious reviewers to think that they must reanalyse the primary data set. In other cases, the primary data provided are in proprietary software and so are impossible to comment on. Many journals allowed authors to put the huge data files on their own websites for the review process, until it became clear that unscrupulous authors compromised the anonymity of referees by tracking who had visited the website.
In a move to remedy these problems, the international Microarray Gene Expression Data (MGED) group has written an open letter to scientific journals proposing standards for publication. Other members of the microarray community welcomed these steps, which are designed to clarify the Minimum Information About a Microarray Experiment (MIAME) guidelines (Nature Genetics 29, 365–371; 2001).
For authors, the proposal provides a checklist of variables that should be included in every microarray publication, at http://www.mged.org/Workgroups/MIAME/miame_checklist.html. This checklist, with all variables completed, would be supplied as supplementary information at the time of submission. The MGED group suggests that journals require submission of microarray data to either of two databases emerging as the main public repositories: GEO (http://www.ncbi.nlm.nih.gov/geo/) or ArrayExpress (http://www.ebi.ac.uk/arrayexpress).
Harried editors can rejoice that, at last, the community is taming the unruly beast that is microarray information. Therefore, all submissions to Nature and the Nature family of journals received on or after 1 December that contain new microarray experiments must be accompanied by five compact disks, mailed to the editor. These disks should include the necessary information, compliant with the MIAME standard. The information must be supplied in a format that can be read by widely available software packages. Data integral to the paper's conclusions should be submitted to the ArrayExpress or GEO databases, with accession numbers, where available, supplied at or before acceptance for publication.
How much data should authors provide to the community? Specifically, do other researchers really need to recreate the exact microarray just to test the expression level of a few key genes, which could presumably be done through other methods? Perhaps with further evolution and standardization of microarray technology, the need to specify so many variables will decrease, but the MGED standards are surely appropriate for the current state of the field.