Adaptive informatics for multifactorial and high-content biological data

Abstract

Whereas genomic data are universally machine-readable, data from imaging, multiplex biochemistry, flow cytometry and other cell- and tissue-based assays usually reside in loosely organized files of poorly documented provenance. This arises because the relational databases used in genomic research are difficult to adapt to rapidly evolving experimental designs, data formats and analytic algorithms. Here we describe an adaptive approach to managing experimental data based on semantically typed data hypercubes (SDCubes) that combine hierarchical data format 5 (HDF5) and extensible markup language (XML) file types. We demonstrate the application of SDCube-based storage using ImageRail, a software package for high-throughput microscopy. Experimental design and its day-to-day evolution, not rigid standards, determine how ImageRail data are organized in SDCubes. We applied ImageRail to collect and analyze drug dose-response landscapes in human cell lines at single-cell resolution.
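The pairing described above, an XML description of the experimental design that points at numeric arrays, can be illustrated with a minimal sketch. It is not the SDCube library's API or file layout: the element names, attribute names, data paths and the `mean_by_dose` helper are hypothetical, and a plain dictionary stands in for the HDF5 array store so that the example runs with the Python standard library alone.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML describing the experimental design for two wells.
# Element and attribute names are illustrative, not the SDCube schema.
DESIGN_XML = """
<experiment name="demo_plate">
  <sample id="well_A01">
    <treatment name="drug" value="0.0" units="uM"/>
    <measurement name="nuclear_intensity" data_path="well_A01/nuclear_intensity"/>
  </sample>
  <sample id="well_A02">
    <treatment name="drug" value="1.0" units="uM"/>
    <measurement name="nuclear_intensity" data_path="well_A02/nuclear_intensity"/>
  </sample>
</experiment>
"""

# Stand-in for the binary array store (assumption: a real implementation
# would keep these single-cell values in HDF5 datasets at the same paths).
ARRAY_STORE = {
    "well_A01/nuclear_intensity": [10.0, 12.0, 14.0],
    "well_A02/nuclear_intensity": [4.0, 6.0, 8.0],
}


def mean_by_dose(design_xml, store, measurement):
    """Return {drug dose: mean single-cell value} for one measurement."""
    root = ET.fromstring(design_xml)
    result = {}
    for sample in root.findall("sample"):
        # The design metadata supplies the dose for this sample.
        dose = float(sample.find("treatment[@name='drug']").get("value"))
        # The metadata also says where the per-cell values live.
        entry = sample.find(f"measurement[@name='{measurement}']")
        values = store[entry.get("data_path")]
        result[dose] = sum(values) / len(values)
    return result


dose_means = mean_by_dose(DESIGN_XML, ARRAY_STORE, "nuclear_intensity")
print(dose_means)
```

Because the design lives in the XML rather than in a fixed table schema, adding a treatment or measurement in this sketch only means adding an element and an array; the query code does not need a schema migration.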

Figure 1: Challenges in management of multidimensional data.
Figure 2: SDCubes are built from a collection of linked data modules that can encode diverse experimental data with varying requirements.
Figure 3: Annotated and simplified screen shots from ImageRail software.
Figure 4: Exploring different dimensions of a multivariate drug and ligand dose-response series using SDCubes.
Figure 5: Single-cell analysis of drug-ligand dose responses uncovers cell-to-cell heterogeneity.

Acknowledgements

This work was supported by US National Institutes of Health grants HG006097, HG005693 and GM68762. We thank G. Danuser, T. Mitchison and M. Eisenstein for help with the manuscript; Applied Precision Inc., C. Brown and K. Teplitz for help with instrumentation; and G. Odell and J. Baker for inspiration.

Author information

Contributions

B.L.M., M.P.M. and J.L.M. programmed the software. B.L.M., M.N., J.L.M. and P.K.S. developed the method and wrote the manuscript.

Corresponding author

Correspondence to Peter K Sorger.

Ethics declarations

Competing interests

P.K.S. is a founder and stockholder in Glencoe Software, a private company that develops software based on Open Microscopy Environment standards. Glencoe developed the OMERO server mentioned in this article. P.K.S. is a member of the Board of Directors of Applied Precision Inc., which manufactured the scanning microscope used in this study.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–8, Supplementary Table 1 and Supplementary Notes 1–2 (PDF 12736 kb)

Supplementary Software 1

SDCube Programming Library 1.0: Java-based programming library to read and write data in the SDCube format. (ZIP 3968 kb)

Supplementary Software 2

ImageRail 1.0: image analysis software for high-throughput microscopy using SDCubes for single-cell and experimental design data management. (ZIP 38287 kb)

About this article

Cite this article

Millard, B., Niepel, M., Menden, M. et al. Adaptive informatics for multifactorial and high-content biological data. Nat Methods 8, 487–492 (2011). https://doi.org/10.1038/nmeth.1600

  • DOI: https://doi.org/10.1038/nmeth.1600
