Statistical practice in high-throughput screening data analysis

Malo, Nathalie; Hanley, James A; Cerquozzi, Sonia; Pelletier, Jerry; Nadon, Robert

doi:10.1038/nbt1186

Review Article
Published: 07 February 2006

Nathalie Malo^1,2, James A Hanley^2, Sonia Cerquozzi^1, Jerry Pelletier^3 & Robert Nadon^1,4

Nature Biotechnology volume 24, pages 167–175 (2006)

Abstract

High-throughput screening is an early critical step in drug discovery. Its aim is to screen a large number of diverse chemical compounds to identify candidate 'hits' rapidly and accurately. Few statistical tools are currently available, however, to detect quality hits with a high degree of confidence. We examine statistical aspects of data preprocessing and hit identification for primary screens. We focus on concerns related to positional effects of wells within plates, choice of hit threshold and the importance of minimizing false-positive and false-negative rates. We argue that replicate measurements are needed to verify assumptions of current methods and to suggest data analysis strategies when assumptions are not met. The integration of replicates with robust statistical methods in primary screens will facilitate the discovery of reliable hits, ultimately improving the sensitivity and specificity of the screening process.
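
To illustrate how the choice of hit threshold interacts with a robust statistical method, here is a minimal Python sketch of a robust z-score rule. It is not taken from the article: the median and MAD (median absolute deviation) scoring, the cutoff of 3, the function names and the plate readings are all assumptions chosen for illustration.

```python
from statistics import median

# Scales the MAD so that it estimates the standard deviation for normal data.
MAD_TO_SD = 1.4826


def robust_z_scores(values):
    """Return (x - median) / (1.4826 * MAD) for each measurement."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; robust z-scores are undefined")
    scale = MAD_TO_SD * mad
    return [(v - center) / scale for v in values]


def select_hits(values, threshold=3.0):
    """Return indices of wells whose |robust z-score| meets the threshold."""
    scores = robust_z_scores(values)
    return [i for i, z in enumerate(scores) if abs(z) >= threshold]


# Hypothetical plate readings: mostly inactive compounds and one strong inhibitor.
plate = [100.0, 98.0, 102.0, 101.0, 99.0, 97.0, 103.0, 100.0, 40.0]
hits = select_hits(plate)
print(hits)
```

Because the median and MAD are barely affected by the extreme well, that well stands out clearly, whereas a mean and standard deviation computed from the same plate would be pulled toward it. Raising or lowering `threshold` trades false positives against false negatives, which is the tension the abstract highlights.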

**Figure 2: Typical location of controls on a 96-well plate.**

**Figure 3: Titration series in a translation assay.**

**Figure 4: Presence of edge effects in a high-throughput screen.**

**Figure 5: Replicates, false-positive and false-negative rates.**

**Figure 6: Verification of the assumptions of normally distributed data with constant variance among compounds.**

**Figure 7: Verification of the assumption that the within-compound variances follow an inverse gamma distribution.**

Acknowledgements

We thank Jing Liu and Janie Lapointe for generating the Figure 3 data. This work was supported by the “Informatics and Chemical Genomics” funding to R.N. under the Genome Quebec Phase II Bioinformatics Consortium program.

Author information

Authors and Affiliations

McGill University and Genome Quebec Innovation Centre, 740 avenue du Docteur Penfield, Montreal, H3A 1A4, Quebec, Canada
Nathalie Malo, Sonia Cerquozzi & Robert Nadon
McGill University Department of Epidemiology, Biostatistics, and Occupational Health, 1020 Pine Avenue West, Montreal, H3A 1A4, Quebec, Canada
Nathalie Malo & James A Hanley
McGill University Department of Biochemistry, 3655 Promenade Sir William Osler, Montreal, H3A 1A4, Quebec, Canada
Jerry Pelletier
McGill University Department of Human Genetics, 1205 avenue du Docteur Penfield N5/13, Montreal, H3A 1B1, Quebec, Canada
Robert Nadon

Corresponding author

Correspondence to Robert Nadon.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

About this article

Cite this article

Malo, N., Hanley, J., Cerquozzi, S. et al. Statistical practice in high-throughput screening data analysis. Nat Biotechnol 24, 167–175 (2006). https://doi.org/10.1038/nbt1186

Issue Date: 01 February 2006
DOI: https://doi.org/10.1038/nbt1186
