A key problem in analyzing data from microarray experiments involves deciding when two expression levels corresponding to the same gene or expressed sequence tag are significantly different. A first approximation is to declare significance whenever the fold-difference between two expression levels exceeds some predefined threshold, but this approach ignores the fact that the variability of the ratio is linked to the overall expression level. A second approximation is to use multiple replicate arrays to build up an estimate of the standard deviation of the ratio on a gene-by-gene basis, but this requires many arrays. We propose a middle ground. Beginning from first principles governing the behavior of loose strands of complementary DNA on microarrays, we derive a simple parametric model describing the type of variability that should be expected from chance alone, thus providing a null distribution for testing significance. When the variance of the log ratio is plotted as a function of the log intensity, the shape of the function is given by an exponential decay plus a constant. We estimate the parameters of the model on an array-by-array basis, using within-array replication, and then use replication between arrays to reduce the variation associated with our results. We illustrate the model using both simulations and data from a preliminary study of bladder cancer.
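The variance model described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the parameterization or the fitting procedure. The parameter names `a`, `b`, and `c`, the grid search over the decay rate, and the normal-approximation z-score are all hypothetical choices made here for clarity.

```python
import math


def variance_model(log_intensity, a, b, c):
    """Variance of the log ratio as an exponential decay plus a constant.

    a, b, c are hypothetical parameter names (amplitude, decay rate, floor).
    """
    return a * math.exp(-b * log_intensity) + c


def fit_variance_model(log_intensities, variances, b_grid):
    """Fit a, b, c by grid search over b (an assumed, simple fitting scheme).

    Once b is fixed the model is linear in a and c, so each candidate b
    is scored with an ordinary least-squares fit of a and c.
    """
    n = len(log_intensities)
    best = None
    for b in b_grid:
        e = [math.exp(-b * x) for x in log_intensities]
        sum_e = sum(e)
        sum_ee = sum(v * v for v in e)
        sum_y = sum(variances)
        sum_ey = sum(ei * yi for ei, yi in zip(e, variances))
        det = n * sum_ee - sum_e * sum_e
        if abs(det) < 1e-12:
            continue  # degenerate design (e.g. b == 0), skip this candidate
        a = (n * sum_ey - sum_e * sum_y) / det
        c = (sum_y - a * sum_e) / n
        sse = sum((a * ei + c - yi) ** 2 for ei, yi in zip(e, variances))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    if best is None:
        raise ValueError("no usable decay rate in b_grid")
    _, a, b, c = best
    return a, b, c


def z_score(log_ratio, log_intensity, a, b, c):
    """Scale a log ratio by the model's standard deviation at that intensity.

    Treating the result as approximately standard normal under the null
    is an assumption of this sketch.
    """
    return log_ratio / math.sqrt(variance_model(log_intensity, a, b, c))
```

With such a fit, the same fold-difference yields a smaller score at low intensity, where the modeled variance is large, than at high intensity, which is the dependence that a fixed fold-change threshold ignores.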
Baggerly, K., Coombes, K., Hess, K. et al. Modeling significance and reproducibility on high-density cDNA microarrays. Nat Genet 27 (Suppl 4), 40 (2001). https://doi.org/10.1038/86990