Abstract
Whereas genomic data are universally machine-readable, data from imaging, multiplex biochemistry, flow cytometry and other cell- and tissue-based assays usually reside in loosely organized files of poorly documented provenance. This arises because the relational databases used in genomic research are difficult to adapt to rapidly evolving experimental designs, data formats and analytic algorithms. Here we describe an adaptive approach to managing experimental data based on semantically typed data hypercubes (SDCubes) that combine hierarchical data format 5 (HDF5) and extensible markup language (XML) file types. We demonstrate the application of SDCube-based storage using ImageRail, a software package for high-throughput microscopy. Experimental design and its day-to-day evolution, not rigid standards, determine how ImageRail data are organized in SDCubes. We applied ImageRail to collect and analyze drug dose-response landscapes in human cell lines at single-cell resolution.
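The idea of pairing a numeric hypercube with an XML description of what each axis means can be illustrated with a minimal sketch. It uses only the Python standard library, and a nested list stands in for the HDF5 dataset. The element names (`hypercube`, `axis`, `label`), the helper functions `describe_axes` and `lookup`, and all numeric values are assumptions made for illustration; they are not the SDCube schema or the ImageRail API.

```python
import xml.etree.ElementTree as ET


def describe_axes(axes):
    """Build an XML description of a hypercube's axes.

    axes: list of (name, semantic_type, labels) tuples, outermost axis first.
    Element and attribute names are hypothetical, not the SDCube schema.
    """
    root = ET.Element("hypercube")
    for name, semantic_type, labels in axes:
        axis = ET.SubElement(root, "axis", name=name, type=semantic_type)
        for label in labels:
            ET.SubElement(axis, "label").text = str(label)
    return ET.tostring(root, encoding="unicode")


def lookup(cube, xml_text, **selection):
    """Index a nested-list cube by axis labels instead of integer positions."""
    root = ET.fromstring(xml_text)
    value = cube
    for axis in root.findall("axis"):
        labels = [label.text for label in axis.findall("label")]
        # Translate the requested label into a position along this axis.
        value = value[labels.index(str(selection[axis.get("name")]))]
    return value


# Toy 2 x 3 cube: 2 drugs x 3 doses, one made-up measurement per condition.
# In an SDCube-like layout the numbers would live in an HDF5 dataset;
# a nested list is used here so the sketch has no dependencies.
axes = [
    ("drug", "treatment", ["drugA", "drugB"]),
    ("dose_uM", "concentration", [0.1, 1.0, 10.0]),
]
cube = [
    [0.95, 0.60, 0.20],
    [0.98, 0.90, 0.75],
]

design_xml = describe_axes(axes)
print(design_xml)
print(lookup(cube, design_xml, drug="drugB", dose_uM=1.0))
```

Because the axis semantics live in the XML rather than in a fixed table schema, adding a new experimental factor in this sketch only means adding another `axis` element and another level of nesting, which is the kind of flexibility the abstract contrasts with relational databases.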
Acknowledgements
This work was supported by US National Institutes of Health grants HG006097, HG005693 and GM68762. We thank G. Danuser, T. Mitchison and M. Eisenstein for help with the manuscript; Applied Precision Inc., C. Brown and K. Teplitz for help with instrumentation; and G. Odell and J. Baker for inspiration.
Author information
Contributions
B.L.M., M.P.M. and J.L.M. programmed the software. B.L.M., M.N., J.L.M. and P.K.S. developed the method and wrote the manuscript.
Ethics declarations
Competing interests
P.K.S. is a founder and stockholder in Glencoe Software, a private company that develops software based on Open Microscopy Environment standards. Glencoe developed the OMERO server mentioned in this article. P.K.S. is a member of the Board of Directors of Applied Precision Inc., which manufactured the scanning microscope used in this study.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–8, Supplementary Table 1 and Supplementary Notes 1–2 (PDF 12736 kb)
Supplementary Software 1
SDCube Programming Library 1.0: Java-based programming library to read and write data in the SDCube format. (ZIP 3968 kb)
Supplementary Software 2
ImageRail 1.0: image analysis software for high-throughput microscopy using SDCubes for single-cell and experimental design data management. (ZIP 38287 kb)
About this article
Cite this article
Millard, B., Niepel, M., Menden, M. et al. Adaptive informatics for multifactorial and high-content biological data. Nat Methods 8, 487–492 (2011). https://doi.org/10.1038/nmeth.1600
DOI: https://doi.org/10.1038/nmeth.1600