Following up on its publications in Nature Biotechnology four years ago (http://www.nature.com/nbt/focus/maqc/index.html), the Microarray Quality Control (MAQC) consortium publishes the results of its second phase of assessment (MAQC-II) on p. 827, in conjunction with ten accompanying papers in The Pharmacogenomics Journal (http://www.nature.com/tpj/journal/v10/n4/index.html). The new work assesses the capabilities and limitations of microarray data analysis methods—so-called genomic classifiers—in identifying gene signatures representative of a specific pathological condition.

All in all, >30,000 genomic classifier models were built by combining one of 17 data preprocessing and normalization methods, one of 9 methods for filtering out problematic data, one of >33 techniques for picking 'signature' genes, one of >24 algorithms for discerning patterns from those genes, and one of 6 methods for testing the robustness of the results. Thirty-six research teams sought gene signatures predictive of 13 'endpoints' potentially relevant to preclinical or clinical applications within 6 massive microarray datasets derived from toxicological studies of chemicals in rodents and from expression profiles of human cancer patients.
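To make the scale of this exercise concrete, the sketch below shows how such a grid of candidate models can be assembled combinatorially. It assumes scikit-learn-style components; the particular normalizers, gene selectors and classifiers named here are illustrative stand-ins, not the methods actually used by the consortium.

```python
# A minimal sketch of assembling classifier models combinatorially, in the
# spirit of MAQC-II. The specific components are illustrative assumptions.
from itertools import product

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, RobustScaler
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

normalizers = {"zscore": StandardScaler(), "robust": RobustScaler()}
selectors = {
    "anova_top50": SelectKBest(f_classif, k=50),
    "mi_top50": SelectKBest(mutual_info_classif, k=50),
}
classifiers = {
    "logreg": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(n_neighbors=5),
}

def build_model_grid():
    """Enumerate every normalization x gene-selection x classifier combination."""
    for (n_name, norm), (s_name, sel), (c_name, clf) in product(
        normalizers.items(), selectors.items(), classifiers.items()
    ):
        yield f"{n_name}+{s_name}+{c_name}", Pipeline(
            [("normalize", norm), ("select", sel), ("classify", clf)]
        )

def evaluate(X, y):
    """Score each candidate model with stratified cross-validation."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    return {
        name: cross_val_score(model, X, y, cv=cv).mean()
        for name, model in build_model_grid()
    }
```

With only two choices per stage, this toy grid already yields eight models; at the consortium's scale of dozens of options per stage, the combinatorial explosion to >30,000 models follows directly.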

As discussed on p. 810, one key finding of MAQC-II is that the classifier models perform remarkably similarly in predicting outcomes, irrespective of the approach used. On the other hand, the overall success of the classifiers depends on the endpoint being predicted. For example, predictions were in general much worse for breast cancer and multiple myeloma, which have highly heterogeneous genetic backgrounds, than for liver toxicity or neuroblastoma.

Perhaps most striking of all, some data analysis teams were consistently better at predictions than others. This may relate to simple errors associated with manipulating such large datasets. But insufficient tuning of the parameters used in a classifier model is also a likely contributor. In this sense, MAQC-II was as much an exercise in sociology as in technology. The human element in classifier implementation is key.
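One concrete safeguard against both insufficient tuning and its opposite failure mode, over-optimistic tuning on the same data used for evaluation, is nested cross-validation. The sketch below assumes scikit-learn; the estimator and parameter grid are illustrative, not those used by any MAQC-II team.

```python
# A minimal sketch of disciplined parameter tuning via nested cross-validation.
# The SVC estimator and its parameter grid are illustrative assumptions.
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

pipeline = Pipeline([("normalize", StandardScaler()), ("classify", SVC())])
param_grid = {
    "classify__C": [0.1, 1, 10, 100],           # regularization strength
    "classify__gamma": ["scale", 0.01, 0.001],  # RBF kernel width
}

inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

# The inner loop picks parameters; the outer loop estimates performance on
# data the tuning step never saw, so the reported score is not inflated.
tuned_model = GridSearchCV(pipeline, param_grid, cv=inner_cv)

def honest_score(X, y):
    """Return a cross-validated accuracy unbiased by the parameter search."""
    return cross_val_score(tuned_model, X, y, cv=outer_cv).mean()
```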

Thus, a key take-home message is that classifier protocols need to be more tightly described and more tightly executed. In this respect, regulatory agencies and scientific journals can promote good practice. A clear need exists for greater meticulousness, both in documenting the parameters of the particular classifier model used and in detailing the procedures for normalization, batch-effect correction, quality control and the handling of data flagged during quality control. Greater attention to detail will not only enhance the reproducibility of research but also facilitate the progression of this technology toward the clinic.
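As one illustration of what tighter description might mean in practice, a classifier protocol can be archived in machine-readable form alongside its results. The sketch below assumes scikit-learn; the record fields are illustrative, not a standard mandated by MAQC-II or any regulatory agency.

```python
# A minimal sketch of machine-readable documentation for a classifier
# protocol. The record fields are illustrative assumptions.
import json

import sklearn
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

pipeline = Pipeline([
    ("normalize", StandardScaler()),
    ("classify", LogisticRegression(C=1.0, max_iter=1000)),
])

protocol_record = {
    "sklearn_version": sklearn.__version__,  # pin the software version
    "steps": [name for name, _ in pipeline.steps],
    "parameters": {k: repr(v) for k, v in pipeline.get_params(deep=True).items()},
    "validation": "5-fold stratified cross-validation, random_state=0",
}

# Archive the record alongside the results so the model can be rebuilt exactly.
with open("classifier_protocol.json", "w") as fh:
    json.dump(protocol_record, fh, indent=2)
```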