The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models

Journal name:
Nature Biotechnology
Volume:
28
Pages:
827–838
DOI:
doi:10.1038/nbt.1665

Abstract

Gene expression data from microarrays are being applied to predict preclinical and clinical endpoints, but the reliability of these predictions has not been established. In the MAQC-II project, 36 independent teams analyzed six microarray data sets to generate predictive models for classifying a sample with respect to one of 13 endpoints indicative of lung or liver toxicity in rodents, or of breast cancer, multiple myeloma or neuroblastoma in humans. In total, >30,000 models were built using many combinations of analytical methods. The teams generated predictive models without knowing the biological meaning of some of the endpoints and, to mimic clinical reality, tested the models on data that had not been used for training. We found that model performance depended largely on the endpoint and team proficiency and that different approaches generated models of similar performance. The conclusions and recommendations from MAQC-II should be useful for regulatory agencies, study committees and independent investigators that evaluate methods for global gene expression analysis.

Figures

  1. Experimental design and timeline of the MAQC-II project.

    Numbers (1–11) order the steps of analysis. Step 11 indicates when the original training and validation data sets were swapped to repeat steps 4–10. See main text for description of each step. Every effort was made to ensure the complete independence of the validation data sets from the training sets. Each model is characterized by several modeling factors and seven internal and external validation performance metrics (Supplementary Tables 1 and 2). The modeling factors include: (i) organization code; (ii) data set code; (iii) endpoint code; (iv) summary and normalization; (v) feature selection method; (vi) number of features used; (vii) classification algorithm; (viii) batch-effect removal method; (ix) type of internal validation; and (x) number of iterations of internal validation. The seven performance metrics for internal validation and external validation are: (i) Matthews correlation coefficient (MCC); (ii) accuracy; (iii) sensitivity; (iv) specificity; (v) area under the receiver operating characteristic curve (AUC); (vi) mean of sensitivity and specificity; and (vii) root mean squared error (r.m.s.e.). Standard deviations (s.d.) of the metrics are also provided for internal validation results.
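As an illustration of the seven performance metrics listed in this caption, the following is a minimal Python sketch using the standard textbook definitions. The function names (`binary_metrics`, `auc`, `rmse`) and the example counts are hypothetical and are not taken from the MAQC-II analysis code; the study's exact conventions (for example, how undefined MCC values are handled) may differ.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Metrics derived from confusion-matrix counts (hypothetical helper)."""
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    # MCC is undefined when any marginal count is zero; return 0 in that case
    # (an assumption of this sketch, not a rule stated in the source).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "mcc": mcc,
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mean_sens_spec": (sensitivity + specificity) / 2,
    }


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rmse(labels, scores):
    """Root mean squared error between 0/1 labels and continuous predictions."""
    return math.sqrt(sum((y - s) ** 2 for y, s in zip(labels, scores)) / len(labels))


example = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```

All of the count-based metrics come from the same four numbers, whereas AUC and r.m.s.e. require continuous prediction values.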

  2. Model performance on internal validation compared with external validation.

    (a) Performance of 18,060 models that were validated with blinded validation data. (b) Performance of 13 candidate models. r, Pearson correlation coefficient; N, number of models. Candidate models with binary and continuous prediction values are marked as circles and squares, respectively, and the standard error estimate was obtained using 500-times resampling with bagging of the prediction results from each model. (c) Distribution of MCC values of all models for each endpoint in internal (left, yellow) and external (right, green) validation performance. Endpoints H and L (sex of the patients) are included as positive controls and endpoints I and M (randomly assigned sample class labels) as negative controls. Boxes indicate the 25th and 75th percentiles, and whiskers indicate the 5th and 95th percentiles.
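The resampling-based standard error mentioned in panel b can be sketched as follows. This is a minimal illustration that assumes "resampling with bagging" means drawing bootstrap samples (with replacement) of a model's prediction results and recomputing MCC on each; the function names, the seed and the toy data are hypothetical, and the study's actual procedure may differ in detail.

```python
import math
import random
import statistics


def mcc_from_labels(y_true, y_pred):
    """MCC from paired 0/1 labels and 0/1 predictions (0 if undefined)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def bootstrap_se(y_true, y_pred, n_resamples=500, seed=0):
    """Standard error of MCC estimated from bootstrap resamples."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(y_true)
    values = []
    for _ in range(n_resamples):
        # Draw n sample indices with replacement and recompute the metric.
        idx = [rng.randrange(n) for _ in range(n)]
        values.append(
            mcc_from_labels([y_true[i] for i in idx], [y_pred[i] for i in idx])
        )
    return statistics.stdev(values)


# Toy data (made up): 20 samples, 4 of them misclassified.
truth = [1] * 10 + [0] * 10
preds = [1] * 8 + [0] * 2 + [0] * 8 + [1] * 2
se = bootstrap_se(truth, preds)
```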

  3. Performance, measured using MCC, of the best models nominated by the 17 data analysis teams (DATs) that analyzed all 13 endpoints in the original training-validation experiment.

    The median MCC value for an endpoint, representative of the level of predictability of the endpoint, was calculated based on values from the 17 data analysis teams. The mean MCC value for a data analysis team, representative of the team's proficiency in developing predictive models, was calculated based on values from the 11 non-random endpoints (excluding negative controls I and M). Red boxes highlight candidate models. Lack of a red box in an endpoint indicates that the candidate model was developed by a data analysis team that did not analyze all 13 endpoints.
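The two summaries described in this caption (a median per endpoint across teams, and a mean per team across the non-random endpoints) could be computed along the following lines. This is a sketch only: the team names and MCC values are invented for illustration, and `summarize` is a hypothetical helper.

```python
import statistics

# Endpoints I and M are the negative controls named in the caption.
NEGATIVE_CONTROLS = {"I", "M"}


def summarize(mcc_by_team):
    """Return (median MCC per endpoint, mean MCC per team).

    mcc_by_team maps team -> {endpoint: MCC of that team's best model}.
    Team means exclude the negative-control endpoints.
    """
    endpoints = sorted({e for scores in mcc_by_team.values() for e in scores})
    endpoint_median = {
        e: statistics.median([s[e] for s in mcc_by_team.values() if e in s])
        for e in endpoints
    }
    team_mean = {
        team: statistics.mean(
            [v for e, v in scores.items() if e not in NEGATIVE_CONTROLS]
        )
        for team, scores in mcc_by_team.items()
    }
    return endpoint_median, team_mean


# Invented values for three hypothetical teams and three endpoints.
toy = {
    "DAT_A": {"A": 0.8, "H": 1.0, "I": 0.0},
    "DAT_B": {"A": 0.6, "H": 0.9, "I": 0.1},
    "DAT_C": {"A": 0.7, "H": 0.95, "I": -0.1},
}
endpoint_median, team_mean = summarize(toy)
```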

  4. Correlation between internal and external validation is dependent on data analysis team.

    Pearson correlation coefficients between internal and external validation performance in terms of MCC are displayed for the 14 teams that submitted models for all 13 endpoints in both the original (x axis) and swap (y axis) analyses. The unusually low correlation in the swap analysis for DAT3, DAT11 and DAT36 is a result of their failure to accurately predict the positive control endpoint H, likely due to operator errors (Supplementary Table 6).
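For reference, the Pearson correlation coefficient used in this figure follows the usual definition, sketched below for one team's internal and external MCC values. The function name `pearson_r` and the numbers are hypothetical and serve only to show the calculation.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Invented internal vs. external MCC values for one team's models.
internal_mcc = [0.95, 0.80, 0.40, 0.10, 0.55]
external_mcc = [0.90, 0.70, 0.35, 0.00, 0.50]
r = pearson_r(internal_mcc, external_mcc)
```

A value of r close to 1 would indicate that a team's internal validation estimates track its external validation results across endpoints.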

  5. Effect of modeling factors on estimates of model performance.

    (a) Random-effect models of external validation performance (MCC) were developed to estimate a distinct variance component for each modeling factor and several selected interactions. The estimated variance components were then divided by their total in order to compare the proportion of variability explained by each modeling factor. The endpoint code contributes the most to the variability in external validation performance. (b) The BLUP plots of the corresponding factors having proportion of variation larger than 1% in a. Endpoint abbreviations (Tox., preclinical toxicity; BR, breast cancer; MM, multiple myeloma; NB, neuroblastoma). Endpoints H and L are the sex of the patient. Summary normalization abbreviations (GA, genetic algorithm; RMA, robust multichip analysis). Classification algorithm abbreviations (ANN, artificial neural network; DA, discriminant analysis; Forest, random forest; GLM, generalized linear model; KNN, K-nearest neighbors; Logistic, logistic regression; ML, maximum likelihood; NB, Naïve Bayes; NC, nearest centroid; PLS, partial least squares; RFE, recursive feature elimination; SMO, sequential minimal optimization; SVM, support vector machine; Tree, decision tree). Feature selection method abbreviations (Bscatter, between-class scatter; FC, fold change; KS, Kolmogorov-Smirnov algorithm; SAM, significance analysis of microarrays).
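The normalization step described in panel a (dividing each estimated variance component by the total) and the 1% cutoff used for panel b can be illustrated with a short sketch. The component values below are invented and do not reproduce the estimates in the figure; fitting the random-effect model itself is outside the scope of this example.

```python
def variance_proportions(components):
    """Divide each variance component by the total of all components."""
    total = sum(components.values())
    if total <= 0:
        raise ValueError("total variance must be positive")
    return {name: value / total for name, value in components.items()}


def major_factors(proportions, threshold=0.01):
    """Factors whose share of variability exceeds the threshold (1% here)."""
    return sorted(
        (name for name, share in proportions.items() if share > threshold),
        key=lambda name: proportions[name],
        reverse=True,
    )


# Hypothetical variance-component estimates (not values from the study).
toy_components = {
    "endpoint": 0.120,
    "organization": 0.015,
    "classification algorithm": 0.004,
    "feature selection method": 0.0005,
    "residual": 0.0605,
}
shares = variance_proportions(toy_components)
selected = major_factors(shares)
```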

Author information

Affiliations

  1. National Center for Toxicological Research, US Food and Drug Administration, Jefferson, Arkansas, USA.

    • Leming Shi,
    • Zhining Wen,
    • Minjun Chen,
    • Huixiao Hong,
    • Roger G Perkins,
    • James C Fuscoe,
    • Weigong Ge,
    • Stephen C Harris,
    • Zhiguang Li,
    • Jie Liu,
    • Zhichao Liu,
    • Baitang Ning,
    • Qiang Shi,
    • Brett Thorn,
    • Lei Xu,
    • Lun Yang,
    • Min Zhang &
    • Weida Tong
  2. Center for Devices and Radiological Health, US Food and Drug Administration, Silver Spring, Maryland, USA.

    • Gregory Campbell,
    • Weijie Chen,
    • Brandon D Gallas,
    • Gene A Pennello,
    • Reena Philip,
    • Lakshmi Vishnuvajjala,
    • Francisco Martinez-Murillo,
    • Frank W Samuelson,
    • Rong Tang,
    • Zivana Tezak &
    • Uwe Scherf
  3. Expression Analysis Inc., Durham, North Carolina, USA.

    • Wendell D Jones &
    • Joel Parker
  4. Department of Physiology and Biophysics and HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Medical College of Cornell University, New York, New York, USA.

    • Fabien Campagne
  5. Wake Forest Institute for Regenerative Medicine, Wake Forest University, Winston-Salem, North Carolina, USA.

    • Stephen J Walker
  6. Z-Tech, an ICF International Company at NCTR/FDA, Jefferson, Arkansas, USA.

    • Zhenqiang Su,
    • Hong Fang,
    • Feng Qian,
    • Dhivya Arasappan,
    • Joseph Meehan &
    • Joshua Xu
  7. SAS Institute Inc., Cary, North Carolina, USA.

    • Tzu-Ming Chu,
    • Li Li,
    • Wenjun Bao,
    • Wendy Czika,
    • Kelci Miclaus,
    • Padraic Neville,
    • Pei-Yi Tan &
    • Russell D Wolfinger
  8. Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, Maryland, USA.

    • Federico M Goodsaid,
    • Sue Jane Wang,
    • Mat Soukup,
    • Jialu Zhang &
    • Li Zhang
  9. Breast Medical Oncology Department, University of Texas (UT) M.D. Anderson Cancer Center, Houston, Texas, USA.

    • Lajos Pusztai
  10. Myeloma Institute for Research and Therapy, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA.

    • John D Shaughnessy Jr,
    • Bart Barlogie &
    • Yiming Zhou
  11. Department of Pediatric Oncology and Hematology and Center for Molecular Medicine (CMMC), University of Cologne, Cologne, Germany.

    • André Oberthuer,
    • Matthias Fischer,
    • Frank Berthold &
    • Yvonne Kahlert
  12. The Hamner Institutes for Health Sciences, Research Triangle Park, North Carolina, USA.

    • Russell S Thomas
  13. National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, North Carolina, USA.

    • Richard S Paules,
    • Pierre R Bushel,
    • Jeff Chou &
    • Jennifer Fostel
  14. Roche Palo Alto LLC, South San Francisco, California, USA.

    • Mark Fielden
  15. Biomedical Informatics Center, Northwestern University, Chicago, Illinois, USA.

    • Pan Du &
    • Simon M Lin
  16. Fondazione Bruno Kessler, Povo-Trento, Italy.

    • Cesare Furlanello,
    • Giuseppe Jurman,
    • Samantha Riccadonna &
    • Roberto Visintainer
  17. Department of Mathematics & Statistics, South Dakota State University, Brookings, South Dakota, USA.

    • Xijin Ge
  18. CMINDS Research Center, Department of Electrical and Computer Engineering, University of Massachusetts Lowell, Lowell, Massachusetts, USA.

    • Dalila B Megherbi &
    • Manuel Madera
  19. Department of Pathology, UT M.D. Anderson Cancer Center, Houston, Texas, USA.

    • W Fraser Symmans
  20. Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, Georgia, USA.

    • May D Wang,
    • Richard A Moffitt,
    • R Mitchell Parry,
    • John H Phan &
    • Todd H Stokes
  21. Systems Analytics Inc., Waltham, Massachusetts, USA.

    • John Zhang,
    • Jun Luo,
    • Eric Wang &
    • Matthew Woods
  22. Hoffmann-La Roche, Nutley, New Jersey, USA.

    • Hans Bitter
  23. Department of Theoretical Bioinformatics, German Cancer Research Center (DKFZ), Heidelberg, Germany.

    • Benedikt Brors,
    • Dilafruz Juraeva,
    • Roland Eils &
    • Frank Westermann
  24. Computational Life Science Cluster (CLiC), Chemical Biology Center (KBC), Umeå University, Umeå, Sweden.

    • Max Bylesjo &
    • Johan Trygg
  25. GlaxoSmithKline, Collegeville, Pennsylvania, USA.

    • Jie Cheng
  26. Medical Systems Biology Research Center, School of Medicine, Tsinghua University, Beijing, China.

    • Jing Cheng
  27. Almac Diagnostics Ltd., Craigavon, UK.

    • Timothy S Davison
  28. Swiss Institute of Bioinformatics, Lausanne, Switzerland.

    • Mauro Delorenzi &
    • Vlad Popovici
  29. Department of Biological Sciences, University of Southern Mississippi, Hattiesburg, Mississippi, USA.

    • Youping Deng
  30. Global Pharmaceutical R&D, Abbott Laboratories, Souderton, Pennsylvania, USA.

    • Viswanath Devanarayan
  31. National Center for Computational Toxicology, US Environmental Protection Agency, Research Triangle Park, North Carolina, USA.

    • David J Dix,
    • Fathi Elloumi,
    • Richard Judson &
    • Zhen Li
  32. Department of Bioinformatics and Genomics, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain.

    • Joaquin Dopazo
  33. HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Medical College of Cornell University, New York, New York, USA.

    • Kevin C Dorff &
    • Piali Mukherjee
  34. Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey, USA.

    • Jianqing Fan &
    • Yang Feng
  35. MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST / Department of Automation, Tsinghua University, Beijing, China.

    • Shicai Fan,
    • Xuegong Zhang,
    • Rui Jiang,
    • Ying Liu &
    • Lu Meng
  36. Institute of Pharmaceutical Informatics, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, China.

    • Xiaohui Fan,
    • Yiyu Cheng,
    • Jianping Huang &
    • Shao Li
  37. Roche Palo Alto LLC, Palo Alto, California, USA.

    • Nina Gonzaludo
  38. Department of Biostatistics, UT M.D. Anderson Cancer Center, Houston, Texas, USA.

    • Kenneth R Hess
  39. Department of Electrical Engineering & Computer Science, University of Kansas, Lawrence, Kansas, USA.

    • Jun Huan,
    • Brian Quanz &
    • Aaron Smalter
  40. Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland, USA.

    • Rafael A Irizarry &
    • Matthew N McCall
  41. Center for Biologics Evaluation and Research, US Food and Drug Administration, Bethesda, Maryland, USA.

    • Samir Lababidi,
    • Jennifer G Catalano,
    • Jing Han &
    • Raj K Puri
  42. Golden Helix Inc., Bozeman, Montana, USA.

    • Christophe G Lambert
  43. Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA.

    • Yanen Li
  44. SABiosciences Corp., a Qiagen Company, Frederick, Maryland, USA.

    • Guozhen Liu &
    • Xiao Zeng
  45. Cogenics, a Division of Clinical Data Inc., Morrisville, North Carolina, USA.

    • Edward K Lobenhofer
  46. Ligand Pharmaceuticals Inc., La Jolla, California, USA.

    • Wen Luo
  47. GeneGo Inc., Encinitas, California, USA.

    • Yuri Nikolsky,
    • Weiwei Shi,
    • Richard J Brennan &
    • Tatiana Nikolskaya
  48. Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA.

    • Nathan D Price &
    • Jaeyun Sung
  49. Spheromics, Kontiolahti, Finland.

    • Andreas Scherer
  50. The Center for Bioinformatics and The Institute of Biomedical Sciences, School of Life Science, East China Normal University, Shanghai, China.

    • Tieliu Shi,
    • Chang Chang,
    • Jian Cui,
    • Junwei Wang &
    • Chen Zhao
  51. National Center for Biotechnology Information, National Institutes of Health, Bethesda, Maryland, USA.

    • Danielle Thierry-Mieg &
    • Jean Thierry-Mieg
  52. Rockefeller Research Laboratories, Memorial Sloan-Kettering Cancer Center, New York, New York, USA.

    • Venkata Thodima
  53. CapitalBio Corporation, Beijing, China.

    • Jianping Wu,
    • Liang Zhang,
    • Sheng Zhu &
    • Qinglan Sun
  54. Department of Statistics, North Carolina State University, Raleigh, North Carolina, USA.

    • Yichao Wu
  55. SRA International (EMMES), Rockville, Maryland, USA.

    • Qian Xie
  56. Helwan University, Helwan, Egypt.

    • Waleed A Yousef
  57. Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA.

    • Sheng Zhong
  58. Agilent Technologies Inc., Santa Clara, California, USA.

    • Anne Bergstrom Lucas &
    • Stephanie Fulmer-Smentek
  59. F. Hoffmann-La Roche Ltd., Basel, Switzerland.

    • Andreas Buness
  60. Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, California, USA.

    • Rong Chen
  61. Department of Pathology and Laboratory Medicine and HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Medical College of Cornell University, New York, New York, USA.

    • Francesca Demichelis
  62. Cedars-Sinai Medical Center, UCLA David Geffen School of Medicine, Los Angeles, California, USA.

    • Xutao Deng &
    • Charles Wang
  63. Vavilov Institute for General Genetics, Russian Academy of Sciences, Moscow, Russia.

    • Damir Dosymbekov &
    • Marina Tsyganova
  64. DNAVision SA, Gosselies, Belgium.

    • Laurent Gatto
  65. École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland.

    • Darlene R Goldstein
  66. State Key Laboratory of Multi-phase Complex Systems, Institute of Process Engineering, Chinese Academy of Sciences, Beijing, China.

    • Li Guo
  67. Abbott Laboratories, Abbott Park, Illinois, USA.

    • Donald N Halbert
  68. Nuvera Biosciences Inc., Woburn, Massachusetts, USA.

    • Christos Hatzis
  69. Winthrop P. Rockefeller Cancer Institute, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA.

    • Damir Herman
  70. Virginia Tech, Blacksburg, Virginia, USA.

    • Roderick V Jensen
  71. BioMath Solutions, LLC, Austin, Texas, USA.

    • Charles D Johnson
  72. Bioinformatic Program, University of Toledo, Toledo, Ohio, USA.

    • Sadik A Khuder
  73. Department of Mathematics, University of Bayreuth, Bayreuth, Germany.

    • Matthias Kohl
  74. Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina, USA.

    • Jianying Li
  75. Pediatric Department, Stanford University, Stanford, California, USA.

    • Li Li
  76. College of Chemistry, Sichuan University, Chengdu, Sichuan, China.

    • Menglong Li
  77. University of Texas Southwestern Medical Center (UTSW), Dallas, Texas, USA.

    • Quan-Zhen Li
  78. Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain.

    • Ignacio Medina &
    • David Montaner
  79. Millennium Pharmaceuticals Inc., Cambridge, Massachusetts, USA.

    • George J Mulligan
  80. RTI International, Atlanta, Georgia, USA.

    • Grier P Page
  81. Takeda Global R & D Center, Inc., Deerfield, Illinois, USA.

    • Xuejun Peng
  82. Novartis Institutes of Biomedical Research, Cambridge, Massachusetts, USA.

    • Ron L Peterson
  83. W.M. Keck Center for Collaborative Neuroscience, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA.

    • Yi Ren
  84. Entelos Inc., Foster City, California, USA.

    • Alan H Roter
  85. Biomarker Development, Novartis Institutes of BioMedical Research, Novartis Pharma AG, Basel, Switzerland.

    • Martin M Schumacher &
    • Frank Staedtler
  86. Genedata Inc., Lexington, Massachusetts, USA.

    • Joseph D Shambaugh
  87. Affymetrix Inc., Santa Clara, California, USA.

    • Richard Shippy
  88. Department of Chemistry and Chemical Engineering, Hefei Teachers College, Hefei, Anhui, China.

    • Shengzhu Si
  89. Institut Jules Bordet, Brussels, Belgium.

    • Christos Sotiriou
  90. Biostatistics, F. Hoffmann-La Roche Ltd., Basel, Switzerland.

    • Guido Steiner
  91. Lilly Singapore Centre for Drug Discovery, Immunos, Singapore.

    • Yaron Turpaz
  92. Microsoft Corporation, US Health Solutions Group, Redmond, Washington, USA.

    • Silvia C Vega
  93. Data Analysis Solutions DA-SOL GmbH, Greifenberg, Germany.

    • Juergen von Frese
  94. Cornell University, Ithaca, New York, USA.

    • Wei Wang
  95. Division of Pulmonary and Critical Care Medicine, Department of Medicine, University of Toledo Health Sciences Campus, Toledo, Ohio, USA.

    • James C Willey
  96. Bristol-Myers Squibb, Pennington, New Jersey, USA.

    • Shujian Wu
  97. OpGen Inc., Gaithersburg, Maryland, USA.

    • Nianqing Xiao

Consortia

  1. MAQC Consortium

    • Leming Shi,
    • Gregory Campbell,
    • Wendell D Jones,
    • Fabien Campagne,
    • Zhining Wen,
    • Stephen J Walker,
    • Zhenqiang Su,
    • Tzu-Ming Chu,
    • Federico M Goodsaid,
    • Lajos Pusztai,
    • John D Shaughnessy Jr,
    • André Oberthuer,
    • Russell S Thomas,
    • Richard S Paules,
    • Mark Fielden,
    • Bart Barlogie,
    • Weijie Chen,
    • Pan Du,
    • Matthias Fischer,
    • Cesare Furlanello,
    • Brandon D Gallas,
    • Xijin Ge,
    • Dalila B Megherbi,
    • W Fraser Symmans,
    • May D Wang,
    • John Zhang,
    • Hans Bitter,
    • Benedikt Brors,
    • Pierre R Bushel,
    • Max Bylesjo,
    • Minjun Chen,
    • Jie Cheng,
    • Jing Cheng,
    • Jeff Chou,
    • Timothy S Davison,
    • Mauro Delorenzi,
    • Youping Deng,
    • Viswanath Devanarayan,
    • David J Dix,
    • Joaquin Dopazo,
    • Kevin C Dorff,
    • Fathi Elloumi,
    • Jianqing Fan,
    • Shicai Fan,
    • Xiaohui Fan,
    • Hong Fang,
    • Nina Gonzaludo,
    • Kenneth R Hess,
    • Huixiao Hong,
    • Jun Huan,
    • Rafael A Irizarry,
    • Richard Judson,
    • Dilafruz Juraeva,
    • Samir Lababidi,
    • Christophe G Lambert,
    • Li Li,
    • Yanen Li,
    • Zhen Li,
    • Simon M Lin,
    • Guozhen Liu,
    • Edward K Lobenhofer,
    • Jun Luo,
    • Wen Luo,
    • Matthew N McCall,
    • Yuri Nikolsky,
    • Gene A Pennello,
    • Roger G Perkins,
    • Reena Philip,
    • Vlad Popovici,
    • Nathan D Price,
    • Feng Qian,
    • Andreas Scherer,
    • Tieliu Shi,
    • Weiwei Shi,
    • Jaeyun Sung,
    • Danielle Thierry-Mieg,
    • Jean Thierry-Mieg,
    • Venkata Thodima,
    • Johan Trygg,
    • Lakshmi Vishnuvajjala,
    • Sue Jane Wang,
    • Jianping Wu,
    • Yichao Wu,
    • Qian Xie,
    • Waleed A Yousef,
    • Liang Zhang,
    • Xuegong Zhang,
    • Sheng Zhong,
    • Yiming Zhou,
    • Sheng Zhu,
    • Dhivya Arasappan,
    • Wenjun Bao,
    • Anne Bergstrom Lucas,
    • Frank Berthold,
    • Richard J Brennan,
    • Andreas Buness,
    • Jennifer G Catalano,
    • Chang Chang,
    • Rong Chen,
    • Yiyu Cheng,
    • Jian Cui,
    • Wendy Czika,
    • Francesca Demichelis,
    • Xutao Deng,
    • Damir Dosymbekov,
    • Roland Eils,
    • Yang Feng,
    • Jennifer Fostel,
    • Stephanie Fulmer-Smentek,
    • James C Fuscoe,
    • Laurent Gatto,
    • Weigong Ge,
    • Darlene R Goldstein,
    • Li Guo,
    • Donald N Halbert,
    • Jing Han,
    • Stephen C Harris,
    • Christos Hatzis,
    • Damir Herman,
    • Jianping Huang,
    • Roderick V Jensen,
    • Rui Jiang,
    • Charles D Johnson,
    • Giuseppe Jurman,
    • Yvonne Kahlert,
    • Sadik A Khuder,
    • Matthias Kohl,
    • Jianying Li,
    • Li Li,
    • Menglong Li,
    • Quan-Zhen Li,
    • Shao Li,
    • Zhiguang Li,
    • Jie Liu,
    • Ying Liu,
    • Zhichao Liu,
    • Lu Meng,
    • Manuel Madera,
    • Francisco Martinez-Murillo,
    • Ignacio Medina,
    • Joseph Meehan,
    • Kelci Miclaus,
    • Richard A Moffitt,
    • David Montaner,
    • Piali Mukherjee,
    • George J Mulligan,
    • Padraic Neville,
    • Tatiana Nikolskaya,
    • Baitang Ning,
    • Grier P Page,
    • Joel Parker,
    • R Mitchell Parry,
    • Xuejun Peng,
    • Ron L Peterson,
    • John H Phan,
    • Brian Quanz,
    • Yi Ren,
    • Samantha Riccadonna,
    • Alan H Roter,
    • Frank W Samuelson,
    • Martin M Schumacher,
    • Joseph D Shambaugh,
    • Qiang Shi,
    • Richard Shippy,
    • Shengzhu Si,
    • Aaron Smalter,
    • Christos Sotiriou,
    • Mat Soukup,
    • Frank Staedtler,
    • Guido Steiner,
    • Todd H Stokes,
    • Qinglan Sun,
    • Pei-Yi Tan,
    • Rong Tang,
    • Zivana Tezak,
    • Brett Thorn,
    • Marina Tsyganova,
    • Yaron Turpaz,
    • Silvia C Vega,
    • Roberto Visintainer,
    • Juergen von Frese,
    • Charles Wang,
    • Eric Wang,
    • Junwei Wang,
    • Wei Wang,
    • Frank Westermann,
    • James C Willey,
    • Matthew Woods,
    • Shujian Wu,
    • Nianqing Xiao,
    • Joshua Xu,
    • Lei Xu,
    • Lun Yang,
    • Xiao Zeng,
    • Jialu Zhang,
    • Li Zhang,
    • Min Zhang,
    • Chen Zhao,
    • Raj K Puri,
    • Uwe Scherf,
    • Weida Tong &
    • Russell D Wolfinger

Competing financial interests

Many of the MAQC-II participants are employed by companies that manufacture gene expression products and/or perform testing services.

    Supplementary information

    PDF files

    1. Supplementary Text and Figures (4M)

      Supplementary Tables 3–8, Supplementary Data and Supplementary Figs. 1–13

    Excel files

    1. Supplementary Table 1 (15M)

      UniqueModels19779_PerformanceMetrics

    2. Supplementary Table 2 (12M)

      Swap_UniqueModels13287_PerformanceMetrics
