Analysis

Repeatability of published microarray gene expression analyses

Abstract

Given the complexity of microarray-based gene expression studies, guidelines encourage transparent design and public data availability. Several journals require public data deposition, and several public databases exist. However, not all data are publicly available, and even when they are, it is unknown whether the published results can be reproduced by independent scientists. Here we evaluated whether the data analyses in 18 articles on microarray-based gene expression profiling published in Nature Genetics in 2005–2006 could be repeated. One table or figure from each article was independently evaluated by two teams of analysts. We reproduced two analyses in principle and six partially or with some discrepancies; ten could not be reproduced. The main reason for failure to reproduce was data unavailability, and discrepancies were mostly due to incomplete data annotation or incomplete specification of data processing and analysis. Repeatability of published microarray studies is apparently limited. Stricter publication rules enforcing public data availability and explicit description of data processing and analysis should be considered.

Author information

Affiliations

  1. Clinical and Molecular Epidemiology Unit, Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Ioannina 45110, Greece.

    • John P A Ioannidis
  2. Biomedical Research Institute, Foundation for Research and Technology–Hellas, Ioannina 45110, Greece.

    • John P A Ioannidis
  3. Center for Genetic Epidemiology and Modeling, Tufts Medical Center and Department of Medicine, Tufts University School of Medicine, Boston, Massachusetts 02111, USA.

    • John P A Ioannidis
  4. Department of Biostatistics, Section on Statistical Genetics, University of Alabama at Birmingham, Birmingham, Alabama 35294, USA.

    • David B Allison
    • Issa Coulibaly
    • Xiangqin Cui
    • Tapan Mehta
    • Grier P Page
  5. Department of Biochemistry, Stanford University School of Medicine, Stanford, California 94305, USA.

    • Catherine A Ball
    • Michael Nitzberg
  6. Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, Massachusetts 02115, USA.

    • Aedín C Culhane
  7. Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts 02115, USA.

    • Aedín C Culhane
  8. Genomic Medicine, Faculty of Medicine, Imperial College London, Hammersmith Hospital, Du Cane Road, London W12 0NN, UK.

    • Mario Falchi
  9. Department of Twin Research & Genetic Epidemiology, St. Thomas' Campus, King's College London, Lambeth Palace Road, London SE1 7EH, UK.

    • Mario Falchi
  10. Fondazione Bruno Kessler, via Sommarive 18, 38100 Povo-Trento, Italy.

    • Cesare Furlanello
    • Giuseppe Jurman
  11. Medical Research Council Clinical Sciences Centre Microarray Centre, Hammersmith Hospital, Du Cane Road, London W12 0NN, UK.

    • Laurence Game
    • Jon Mangion
    • Enrico Petretto
  12. Statistics and Epidemiology Unit, RTI International, Atlanta, Georgia 30341, USA.

    • Grier P Page
  13. Department of Epidemiology, Public Health and Primary Care, Faculty of Medicine, Imperial College, Praed Street, London W2 1PG, UK.

    • Enrico Petretto
  14. European Molecular Biology Laboratory Heidelberg, Meyerhofstraße 1, 69117 Heidelberg, Germany.

    • Vera van Noort

Authors

  1. John P A Ioannidis
  2. David B Allison
  3. Catherine A Ball
  4. Issa Coulibaly
  5. Xiangqin Cui
  6. Aedín C Culhane
  7. Mario Falchi
  8. Cesare Furlanello
  9. Laurence Game
  10. Giuseppe Jurman
  11. Jon Mangion
  12. Tapan Mehta
  13. Michael Nitzberg
  14. Grier P Page
  15. Enrico Petretto
  16. Vera van Noort

Corresponding author

Correspondence to John P A Ioannidis.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1 and 2, Supplementary Table 1