Key Points
- Microarray experiments are widely used to quantify and compare gene expression on a large scale. They must therefore be designed with care, because aspects of their design affect both the validity of the results and experimental efficiency.
- Several scientific and practical issues (for example, the amount of material available) will affect the choice of experimental design.
- A key choice in microarray design is whether to use direct or indirect comparisons; that is, whether to make comparisons within or between slides.
- Dye-swap experiments allow the experimenter to minimize the systematic bias that arises from differences between green and red intensities.
- Different experimental designs are appropriate in different experimental contexts; examples of single-factor and multifactorial designs are discussed.
- Replication of microarray experiments is important because it reduces variability, and data obtained from replicated experiments can be analysed using formal statistical methods. Not all forms of replication are equal; technical and biological replicates are compared.
- The issue of sample size is problematic in microarray experiments because the variance of the relative expression levels across hybridizations varies greatly from gene to gene.
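The dye-swap point can be illustrated with a small numerical sketch. It is not taken from the article. It assumes, for simplicity, that the dye bias is additive on the log2 scale and identical on both slides; the function name and all numbers are hypothetical.

```python
import math

def dye_swap_estimate(m_forward, m_swapped):
    """Combine the log ratios from a dye-swap pair of slides.

    m_forward: log2(R/G) with sample A in red and sample B in green.
    m_swapped: log2(R/G) with the dyes swapped (B in red, A in green).
    Under the additive-bias assumption, half the difference cancels the bias.
    """
    return (m_forward - m_swapped) / 2.0

# Hypothetical values, chosen only for illustration.
true_log_ratio = 1.0   # assumed true log2(A/B)
dye_bias = 0.3         # assumed additive red-versus-green bias on the log2 scale

m_forward = true_log_ratio + dye_bias    # slide 1: A red, B green
m_swapped = -true_log_ratio + dye_bias   # slide 2: B red, A green

estimate = dye_swap_estimate(m_forward, m_swapped)
print(f"single slide: {m_forward:.2f}, dye-swap pair: {estimate:.2f}")
```

Under this assumption a single slide overstates the log ratio by the dye bias, whereas the dye-swap pair recovers the assumed true value. If the bias differs between slides, the cancellation is only partial.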
Abstract
Microarray experiments are used to quantify and compare gene expression on a large scale. As with all large-scale experiments, they can be costly in terms of equipment, consumables and time. Therefore, careful design is particularly important if the resulting experiment is to be maximally informative, given the effort and the resources. What then are the issues that need to be addressed when planning microarray experiments? Which features of an experiment have the most impact on the accuracy and precision of the resulting measurements? How do we balance the different components of experimental design to reach a decision? For example, should we replicate, and if so, how?
Acknowledgements
We thank S. Dudoit and N. Thorne for discussions and assistance during the course of this review. We also thank M. J. Callow from the Lawrence Berkeley National Laboratory and members of J. Ngai's lab — D. Lin, E. Diaz and J. Scolnick — for providing the data used in the figures. In addition, we are grateful to D. Bowtell for feedback on many design issues. This work was supported in part by the National Institutes of Health.
Glossary
- COMPETITIVE HYBRIDIZATION
  A mixture of differently labelled target cDNA fragments that are hybridized together in the presence of a common probe or collection of probes.
- LOG RATIO
  The logarithm, usually to the base 2, of the ratio of the measured signal intensities in the two channels of a two-colour microarray experiment. If we denote these two signals by R (red channel) and G (green channel), then their log ratio is log2(R/G).
- CHANNEL
  cDNA microarrays have paired hybridization intensity measurements that are taken from two wavelength bands after laser excitation at two wavelengths. These two sources of data are known as channels. By contrast, measurements of radiolabelled hybridization products are single channel, as are the Affymetrix microarrays.
- VARIANCE
  The most common statistical measure of variability of a random quantity or random sample about its mean. Its scale is the square of the scale of the random quantity or sample. The square root of the variance is known as the standard deviation.
- LOOP DESIGN
  A design that involves mRNA samples labelled 1, 2, 3,...,n, hybridized together in pairs (1,2), (2,3), ..., (n − 1,n), (n,1).
- SUMMARY STATISTIC
  A numerical summary of some aspect of an experiment, typically an estimate of a parameter.
- POWER CALCULATION
  A calculation that leads to the probability that a null hypothesis that is being tested will be rejected in favour of the alternative, under specified assumptions that imply that the alternative hypothesis is true.
- MEDIAN
  The middle value in a set of numbers ordered in value from smallest to largest. If there are an even number of numbers, the median is the average of the middle two after ordering.
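Several of the glossary definitions translate directly into a few lines of code. The following is a minimal sketch using only the Python standard library. The helper names `log_ratio` and `loop_design_pairs` and the example numbers are hypothetical, and the variance shown divides by the number of values (the population form).

```python
import math
from statistics import median, pstdev, pvariance

def log_ratio(red, green):
    """Base-2 log ratio log2(R/G) of the red and green channel intensities."""
    return math.log2(red / green)

def loop_design_pairs(n):
    """Hybridization pairs of a loop design: (1,2), (2,3), ..., (n-1,n), (n,1)."""
    return [(i, i % n + 1) for i in range(1, n + 1)]

# Hypothetical intensities: red is four times green, so the log ratio is 2.
print(log_ratio(800.0, 200.0))

# A loop design on four mRNA samples.
print(loop_design_pairs(4))

# Median of an even number of values is the average of the middle two.
values = [3, 1, 4, 2]
print(median(values))

# Variance and its square root, the standard deviation.
print(pvariance(values), pstdev(values))
```

Note that `loop_design_pairs` records only which samples share a slide; the assignment of dyes within each pair is a separate design decision.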
Cite this article
Yang, Y., Speed, T. Design issues for cDNA microarray experiments. Nat Rev Genet 3, 579–588 (2002). https://doi.org/10.1038/nrg863