Disparities between microarray data from different groups working on similar samples have led many to question the validity of this widely adopted technology. Although the 'minimal information about a microarray experiment' (MIAME) guidelines set standards for the publication of microarray data, they do not address experimental reproducibility. As gene-expression data rapidly accumulate in the public domain, three papers in Nature Methods provide a timely investigation into the reproducibility of microarray data and suggest that, with appropriate caution, such data can be used with confidence.

One of the main issues when comparing microarray data is consideration of the metrics generated by different technology platforms. There is a tremendous choice of platforms available and much diversity in protocols for sample preparation, imaging and analysis. Furthermore, whereas some groups report the absolute level of expression of a particular gene, others compare the relative transcription of genes. This makes meaningful comparisons of gene-expression data from different sources challenging.

The three papers investigate different aspects of microarray reproducibility. Larkin et al. directly compared the performance of two microarray platforms (an in-house-developed two-colour cDNA array and a commercial oligonucleotide array) in a study of the effects of chronic and acute exposure to angiotensin II on cardiac gene expression in mice. Irizarry et al. studied the impact of inter-laboratory variation by providing identical RNA samples to a consortium of ten laboratories, each of which processed them according to its own protocols, and then comparing the results obtained from three widely used microarray platforms. Finally, the Toxicogenomics Research Consortium (TRC) used in-house and commercial microarrays with identical RNA samples to assess the variability caused by sample handling, imaging and data analysis.

The studies show that results across platforms are remarkably consistent. Larkin et al. report that most genes had similar expression patterns on both arrays, but that the relative amplitude of expression was greater on the commercial array. Some genes had divergent expression patterns between platforms, but principal-components analysis clustered these genes by experimental treatment rather than platform. Mapping probes from both arrays to the genome revealed that the two platforms interrogated different sequences for these divergent genes; Larkin et al. suggest that the presence of poorly annotated or non-annotated splice variants might explain this inconsistency.

Considerable variation between laboratories using identical RNA samples was identified by both Irizarry et al. and the TRC study, although the TRC study showed that reproducibility improved markedly after standardizing protocols for RNA labelling, hybridization, array processing, data acquisition and normalization.

All three papers agree that using a standard procedure to normalize data relative to controls provides a more meaningful value and eliminates technical variability caused by probe and target molecules. Moreover, the TRC study showed that the use of gene-ontology nodes to analyse groups of genes in lieu of direct gene-by-gene comparison identified significant biological themes even with low levels of correlation between data from different platforms and laboratories.
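To illustrate why expressing data relative to controls can make measurements from different platforms comparable, here is a minimal sketch in Python. It is not the procedure used in any of the three papers. It assumes an idealized model in which each probe scales the true signal by its own fixed gain, with no noise, background or saturation; the gene names, intensities and gains are hypothetical values chosen for illustration only.

```python
import math

def measure(sample, gain):
    """Simulate raw intensities: true abundance times a probe-specific gain (assumed model)."""
    return {gene: sample[gene] * gain[gene] for gene in sample}

def log2_ratios(treated, control):
    """Per-gene log2(treated / control) for genes measured in both samples."""
    return {g: math.log2(treated[g] / control[g]) for g in treated if g in control}

# Hypothetical true transcript abundances (arbitrary units).
true_control = {"geneA": 100.0, "geneB": 200.0, "geneC": 50.0}
true_treated = {"geneA": 400.0, "geneB": 100.0, "geneC": 50.0}

# Hypothetical probe gains: platform B's probes respond very differently per gene.
gain_a = {"geneA": 1.0, "geneB": 1.0, "geneC": 1.0}
gain_b = {"geneA": 3.0, "geneB": 0.5, "geneC": 8.0}

# Raw treated intensities disagree between platforms...
raw_a = measure(true_treated, gain_a)
raw_b = measure(true_treated, gain_b)

# ...but ratios to the matched control cancel the probe-specific gain.
ratios_a = log2_ratios(raw_a, measure(true_control, gain_a))
ratios_b = log2_ratios(raw_b, measure(true_control, gain_b))

for gene in sorted(ratios_a):
    print(gene, raw_a[gene], raw_b[gene], ratios_a[gene], ratios_b[gene])
```

Under this assumed model the raw intensities differ between the two simulated platforms, whereas the control-relative log ratios coincide. Real arrays depart from the model, so normalization would be expected to reduce platform effects rather than remove them entirely.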

Despite some disagreement, the authors reach a consensus that standardization of experimental and analytical procedures is warranted. These studies should boost confidence that robust and reproducible results can be obtained using microarrays.