  • Review Article
Design issues for cDNA microarray experiments

Key Points

  • Microarray experiments are widely used to quantify and compare gene expression on a large scale. It is therefore important that they are designed with care, because aspects of their design affect both the validity of their results and experimental efficiency.

  • Several scientific and practical issues (for example, the amount of material available) will affect the choice of experimental design.

  • A key choice in microarray design is whether to use direct or indirect comparisons; that is, whether to make comparisons within or between slides.

  • Dye-swap experiments allow the experimenter to minimize the bias that arises from systematic differences between green and red intensities.

  • Different experimental designs are appropriate in different experimental contexts; examples of single-factor and multifactorial designs are discussed.

  • Replication of microarray experiments is important because averaging over replicates reduces variability, and because data obtained from replicated experiments can be analysed using formal statistical methods. Not all forms of replication are equal; technical and biological replicates are compared.

  • The issue of sample size is problematic in microarray experiments because the variance of the relative expression levels across hybridizations varies greatly from gene to gene.
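
The dye-swap point above can be illustrated with a small sketch. It assumes, purely for illustration, that the dye bias is additive on the log2 scale and identical in both hybridizations; the function name and the numbers are hypothetical and are not taken from the article.

```python
def dye_swap_estimate(m_forward, m_swapped):
    """Combine the log ratios from a dye-swap pair of slides.

    m_forward: log2(R/G) with sample A labelled red and sample B green.
    m_swapped: log2(R/G) with the labels exchanged (B red, A green).
    Under the additive-bias assumption, the bias cancels in the difference.
    """
    return (m_forward - m_swapped) / 2


# Hypothetical values: true log2(A/B) and a red-versus-green dye bias.
true_effect = 1.0
dye_bias = 0.3

m_forward = true_effect + dye_bias    # A in red:  +effect + bias
m_swapped = -true_effect + dye_bias   # A in green: -effect + bias

estimate = dye_swap_estimate(m_forward, m_swapped)
print(estimate)
```

If the bias differs between the two slides, or depends on spot intensity, the cancellation is only partial, so this should be read as the idealized case.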

Abstract

Microarray experiments are used to quantify and compare gene expression on a large scale. As with all large-scale experiments, they can be costly in terms of equipment, consumables and time. Therefore, careful design is particularly important if the resulting experiment is to be maximally informative, given the effort and the resources. What then are the issues that need to be addressed when planning microarray experiments? Which features of an experiment have the most impact on the accuracy and precision of the resulting measurements? How do we balance the different components of experimental design to reach a decision? For example, should we replicate, and if so, how?

Figure 1: Direct versus indirect designs.
Figure 2: Dye-swap replications.
Figure 3: Averaging replicates reduces variability.
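
As a rough sketch of why the choice between direct and indirect comparisons matters, the following compares the variance of an estimated log ratio log2(A/B) under the two designs. It assumes that every slide yields a log ratio with the same variance `sigma2` and that slides are independent; these assumptions, and the function names, are illustrative only.

```python
def var_direct(sigma2, n_slides):
    """Variance of log2(A/B) averaged over n_slides slides,
    each hybridizing A and B together on the same slide."""
    return sigma2 / n_slides


def var_indirect(sigma2, n_slides_per_sample):
    """Variance of log2(A/R) - log2(B/R), where each sample is hybridized
    against a common reference R on n_slides_per_sample slides."""
    return 2 * sigma2 / n_slides_per_sample


sigma2 = 1.0  # hypothetical per-slide variance of a log ratio

# Two slides in total under either design.
direct = var_direct(sigma2, n_slides=2)                # A vs B, twice
indirect = var_indirect(sigma2, n_slides_per_sample=1) # A vs R, B vs R

print(direct, indirect, indirect / direct)
```

Under these assumptions, the direct design is more precise for the same number of slides, although practical constraints such as the amount of material available may still favour a reference design.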

Acknowledgements

We thank S. Dudoit and N. Thorne for discussions and assistance during the course of this review. We also thank M. J. Callow from the Lawrence Berkeley National Laboratory and members of J. Ngai's lab — D. Lin, E. Diaz and J. Scolnick — for providing the data used in the figures. In addition, we are grateful to D. Bowtell for feedback on many design issues. This work was supported in part by the National Institutes of Health.

Author information

Corresponding author

Correspondence to Terry Speed.

Related links

FURTHER INFORMATION

Terry Speed's lab

Glossary

COMPETITIVE HYBRIDIZATION

A hybridization in which a mixture of differently labelled target cDNA fragments is hybridized together in the presence of a common probe or collection of probes.

LOG RATIO

The logarithm, usually to the base 2, of the ratio of the measured signal intensities in the two channels of a two-colour microarray experiment. If we denote these two signals by R (red channel) and G (green channel), then their log ratio is log2(R/G).
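
A minimal sketch of this definition follows. The function name and the intensity values are hypothetical, and the sketch assumes that both intensities are strictly positive (for example, after any background correction).

```python
import math


def log_ratio(red, green):
    """Return log2(R/G) for one spot on a two-colour microarray."""
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive to take a log ratio")
    return math.log2(red / green)


# Hypothetical spot intensities.
print(log_ratio(800, 200))   # red four times green
print(log_ratio(200, 800))   # green four times red
print(log_ratio(500, 500))   # equal intensities
```

One reason for working on the log scale is symmetry: a fourfold excess of red and a fourfold excess of green give values of equal size and opposite sign.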

CHANNEL

cDNA microarrays have paired hybridization intensity measurements that are taken from two wavelength bands after laser excitation at two wavelengths. These two sources of data are known as channels. By contrast, measurements of radiolabelled hybridization products are single channel, as are those from Affymetrix microarrays.

VARIANCE

The most common statistical measure of variability of a random quantity or random sample about its mean. Its scale is the square of the scale of the random quantity or sample. The square root of the variance is known as the standard deviation.
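
To make the definition concrete, the sketch below computes the sample variance and standard deviation of some hypothetical replicate log ratios for a single gene. It also shows the variance of their mean under the assumption that the replicates are independent and share a common variance; the data are invented for illustration.

```python
import math
import statistics

# Hypothetical log2 ratios for one gene across five replicate hybridizations.
replicate_log_ratios = [0.8, 1.2, 1.0, 1.4, 0.6]

# Sample variance about the mean (divides by n - 1).
sample_variance = statistics.variance(replicate_log_ratios)

# The standard deviation is the square root of the variance.
standard_deviation = math.sqrt(sample_variance)

# Assuming independent replicates with a common variance, the variance of
# their average is the single-replicate variance divided by n.
variance_of_mean = sample_variance / len(replicate_log_ratios)

print(sample_variance, standard_deviation, variance_of_mean)
```

Note that the variance is on the squared scale of the log ratios, whereas the standard deviation is on the same scale as the log ratios themselves.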

LOOP DESIGN

A design that involves mRNA samples labelled 1, 2, 3,...,n, hybridized together in pairs (1,2), (2,3), ..., (n − 1,n), (n,1).
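
The pairing rule in this definition can be written as a short function. The name `loop_design` is hypothetical, and the sketch only enumerates which samples share a slide; it makes no claim about how dyes are assigned within each pair.

```python
def loop_design(n):
    """Return the hybridization pairs (1,2), (2,3), ..., (n-1,n), (n,1)
    for mRNA samples labelled 1..n."""
    if n < 2:
        raise ValueError("a loop design needs at least two samples")
    # i % n + 1 maps 1 -> 2, 2 -> 3, ..., n -> 1, which closes the loop.
    return [(i, i % n + 1) for i in range(1, n + 1)]


pairs = loop_design(4)
print(pairs)

# Each sample appears on exactly two slides.
appearances = {s: sum(s in pair for pair in pairs) for s in range(1, 5)}
print(appearances)
```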

SUMMARY STATISTIC

A numerical summary of some aspect of an experiment, typically an estimate of a parameter.

POWER CALCULATION

A calculation that leads to the probability that a null hypothesis that is being tested will be rejected in favour of the alternative, under specified assumptions that imply that the alternative hypothesis is true.
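
As an illustration of such a calculation, the sketch below computes the power of a two-sided z-test that the mean log ratio of one gene is zero, based on `n` replicate slides. It assumes normally distributed log ratios with a known standard deviation `sigma`; this is a strong simplification, and the function name and parameter values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(delta, sigma, n, alpha=0.05):
    """Power of a two-sided z-test of 'mean log ratio = 0'.

    delta: true mean log ratio under the alternative hypothesis.
    sigma: assumed known standard deviation of a single log ratio.
    n:     number of independent replicate slides.
    alpha: significance level of the test.
    """
    if sigma <= 0 or n < 1:
        raise ValueError("sigma must be positive and n at least 1")
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = delta * sqrt(n) / sigma
    # Probability that the test statistic falls in either rejection region.
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)


# Hypothetical scenario: twofold change (delta = 1 on the log2 scale).
powers = {n: z_test_power(delta=1.0, sigma=1.0, n=n) for n in (2, 4, 8)}
for n, power in powers.items():
    print(n, round(power, 3))
```

Because the variance differs greatly from gene to gene, a single value of `sigma` cannot represent every gene on an array, which is one reason why sample-size calculations are difficult in this setting.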

MEDIAN

The middle value in a set of numbers ordered from smallest to largest. If the set contains an even number of values, the median is the average of the middle two after ordering.

About this article

Cite this article

Yang, Y., Speed, T. Design issues for cDNA microarray experiments. Nat Rev Genet 3, 579–588 (2002). https://doi.org/10.1038/nrg863

  • DOI: https://doi.org/10.1038/nrg863
