Data-driven modelling of signal-transduction networks

Janes, Kevin A.; Yaffe, Michael B.

doi:10.1038/nrm2041

Review Article
Published: 01 November 2006


Nature Reviews Molecular Cell Biology volume 7, pages 820–828 (2006)


Key Points

New experimental techniques are generating complex data sets that characterize signal-transduction networks. Inspection and intuition alone can no longer extract the maximal amount of information that is embedded within these data.
'Data-driven models' are mathematical approaches that provide simplified representations of complex data sets. They are based solely on analysis of the data themselves, without requiring any assumptions about the underlying mechanisms.
This User's guide introduces three data-driven modelling approaches: clustering, principal components analysis (PCA), and partial least squares (PLS). Clustering provides a means for data organization, whereas PCA is a method for data condensation and PLS is a technique for data prediction.
Clustering groups together observations that have similar projections in the high-dimensional space defined by the signalling variables. Similarity can be defined by several different distance metrics, such as Euclidean distance (for absolute distances) and Pearson distance (for correlations).
PCA and PLS factorize a data set into products of a scores vector and a loadings vector, one pair for each principal component. In PCA, the loadings vectors are the eigenvectors of the data covariance matrix that have the largest eigenvalues, so the scores and loadings vectors maximize the variance that is captured from the starting data matrix. By contrast, PLS calculates scores and loadings vectors to maximize the covariance between a matrix of independent variables and a matrix of dependent variables.
Data-driven models are poised to become standard tools in analysing signalling networks as complex protein data sets become easier to acquire and more difficult to interpret.
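The contrast between the two distance metrics named in the Key Points can be seen on toy signalling profiles. The sketch below is a minimal illustration, not the authors' code; it assumes NumPy, uses hypothetical activity values, and takes the Pearson distance as one minus the Pearson correlation coefficient, which is a common convention.

```python
import numpy as np

def euclidean_distance(a, b):
    """Straight-line distance between the endpoints of two vectors from a common origin."""
    return float(np.linalg.norm(np.asarray(a, float) - np.asarray(b, float)))

def pearson_distance(a, b):
    """One minus the Pearson correlation: compares direction after mean-centring,
    so overall magnitude and offset are ignored."""
    r = np.corrcoef(a, b)[0, 1]
    return float(1.0 - r)

# Hypothetical time courses for three signals (arbitrary units)
kinase_a = [1.0, 2.0, 3.0, 4.0]
kinase_b = [10.0, 20.0, 30.0, 40.0]   # same trend as kinase_a, tenfold larger
kinase_c = [4.0, 3.0, 2.0, 1.0]       # similar magnitude, opposite trend

print(euclidean_distance(kinase_a, kinase_b))  # large: magnitudes differ
print(euclidean_distance(kinase_a, kinase_c))  # smaller: magnitudes are similar
print(pearson_distance(kinase_a, kinase_b))    # close to 0: perfectly correlated
print(pearson_distance(kinase_a, kinase_c))    # close to 2: perfectly anti-correlated
```

Under Euclidean distance, kinase_a is grouped with kinase_c (similar absolute levels); under Pearson distance, it is grouped with kinase_b (similar shape).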

Abstract

New technologies are permitting large-scale quantitative studies of signal-transduction networks. Such data are hard to understand completely by inspection and intuition. 'Data-driven models' help users to analyse large data sets by simplifying the measurements themselves. Data-driven modelling approaches such as clustering, principal components analysis and partial least squares can derive biological insights from large-scale experiments. These models are emerging as standard tools for systems-level research in signalling networks.


**Figure 1: Alternative representations of a systems biology data set.**

**Figure 2: Clustering of row and column vectors by different distance metrics.**
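Clustering of row and column vectors by different distance metrics can be sketched with hierarchical clustering. This is an illustrative example under stated assumptions, not the procedure used for Figure 2: it assumes SciPy's `linkage` and `fcluster` with average linkage, a hypothetical 4 × 4 data matrix, and SciPy's `"correlation"` metric, which is one minus the Pearson correlation.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical data matrix: rows = experimental conditions, columns = measured signals
X = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [10.0, 21.0, 29.0, 40.0],   # roughly row 0's trend at about tenfold magnitude
    [4.0, 3.0, 2.0, 1.0],
    [40.0, 29.0, 21.0, 10.0],   # roughly row 2's trend at about tenfold magnitude
])

def cluster(vectors, metric, n_clusters=2):
    """Average-linkage hierarchical clustering of the rows of `vectors`,
    cut into at most n_clusters groups."""
    Z = linkage(vectors, method="average", metric=metric)
    return fcluster(Z, t=n_clusters, criterion="maxclust")

# Row vectors (conditions): each row is a point in the space of signals
rows_euclid = cluster(X, "euclidean")      # groups rows by absolute level
rows_pearson = cluster(X, "correlation")   # groups rows by shape of the profile
# Column vectors (signals): transpose so that each signal becomes an observation
cols_pearson = cluster(X.T, "correlation")

print(rows_euclid, rows_pearson, cols_pearson)
```

With these values, Euclidean distance pairs rows 0 and 2 (low magnitude) against rows 1 and 3 (high magnitude), whereas the correlation metric pairs rows 0 and 1 against rows 2 and 3 (rising versus falling profiles).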

**Figure 3: Principal components identified by PCA and PLS.**
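The difference between the first principal components of PCA and PLS can be sketched on synthetic data. This is a minimal illustration, assuming NumPy, hypothetical signals and the single-response case of PLS (often called PLS1). Here the first PLS weight vector is proportional to the cross-product of the centred X matrix with the centred response. Full PLS implementations, such as the NIPALS algorithm, extract further components by deflation, which this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50

# Hypothetical signals: x0 varies a lot but is unrelated to the response,
# x1 varies little but drives the response y.
x1 = rng.normal(scale=1.0, size=n)
x1 -= x1.mean()
x0 = rng.normal(scale=5.0, size=n)
x0 -= x0.mean()
x0 -= (x0 @ x1) / (x1 @ x1) * x1   # make x0 exactly uncorrelated with x1 in this sample
X = np.column_stack([x0, x1])
y = 2.0 * x1 + 0.1 * rng.normal(size=n)

Xc = X - X.mean(axis=0)   # mean-centre the independent variables
yc = y - y.mean()         # mean-centre the dependent variable

# PCA: the first loadings vector is the top right singular vector of Xc,
# i.e. the direction of maximal variance in X alone.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
p_pca = Vt[0]

# PLS with a single response: among unit-length w, the weight vector that maximizes
# the covariance between the scores Xc @ w and y is proportional to Xc.T @ yc.
w_pls = Xc.T @ yc
w_pls = w_pls / np.linalg.norm(w_pls)

def summarize(v):
    """Variance of the scores along v and the magnitude of their covariance with y."""
    t = Xc @ v
    return t.var(ddof=1), abs(np.cov(t, yc)[0, 1])

for name, v in [("PCA", p_pca), ("PLS", w_pls)]:
    var_t, cov_ty = summarize(v)
    print(f"{name}: |direction|={np.round(np.abs(v), 2)}, "
          f"var(scores)={var_t:.2f}, |cov(scores, y)|={cov_ty:.2f}")
```

PCA aligns its first component with the high-variance but uninformative signal x0. PLS instead aligns with x1, the signal that covaries with the response.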


References

1. Janes, K. A. et al. A high-throughput quantitative multiplex kinase assay for monitoring information flow in signaling networks: application to sepsis-apoptosis. Mol. Cell Proteomics 2, 463–473 (2003).
2. Kingsmore, S. F. Multiplexed protein measurement: technologies and applications of protein and antibody arrays. Nature Rev. Drug Discov. 5, 310–320 (2006).
3. Ong, S. E. & Mann, M. Mass spectrometry-based proteomics turns quantitative. Nature Chem. Biol. 1, 252–262 (2005).
4. Irish, J. M., Kotecha, N. & Nolan, G. P. Mapping normal and cancer cell signalling networks: towards single-cell proteomics. Nature Rev. Cancer 6, 146–155 (2006).
5. Gaudet, S. et al. A compendium of signals and responses triggered by prodeath and prosurvival cytokines. Mol. Cell Proteomics 4, 1569–1590 (2005). References 3–5 are excellent reviews on emerging technologies for large-scale studies of signal-transduction networks.
6. Janes, K. A. et al. The response of human epithelial cells to TNF involves an inducible autocrine cascade. Cell 124, 1225–1239 (2006). This study applied data-driven modelling to a large-scale proteomic compendium and showed that tumour necrosis factor induces a regulated, interdependent cascade of autocrine cytokines.
7. Jones, R. B., Gordus, A., Krall, J. A. & MacBeath, G. A quantitative protein interaction network for the ErbB receptors using protein microarrays. Nature 439, 168–174 (2006).
8. Blagoev, B., Ong, S. E., Kratchmarova, I. & Mann, M. Temporal analysis of phosphotyrosine-dependent signaling networks by quantitative proteomics. Nature Biotechnol. 22, 1139–1145 (2004).
9. Irish, J. M. et al. Single cell profiling of potentiated phospho-protein networks in cancer cells. Cell 118, 217–228 (2004).
10. Natarajan, M., Lin, K. M., Hsueh, R. C., Sternweis, P. C. & Ranganathan, R. A global analysis of cross-talk in a mammalian cellular signalling network. Nature Cell Biol. 8, 571–580 (2006). The first data-driven analysis of the one- and two-ligand screens for macrophage signalling that was organized by the Alliance for Cell Signaling. The results show how crosstalk is widespread but not uniformly distributed across all ligands and signalling molecules.
11. Bray, D. Reasoning for results. Nature 412, 863 (2001).
12. Janes, K. A. & Lauffenburger, D. A. A biological approach to computational models of proteomic networks. Curr. Opin. Chem. Biol. 10, 73–80 (2006).
13. Pawson, T. Specificity in signal transduction: from phosphotyrosine–SH2 domain interactions to complex cellular systems. Cell 116, 191–203 (2004).
14. Hunter, T. Signaling — 2000 and beyond. Cell 100, 113–127 (2000).
15. Janes, K. A. et al. Cue-signal-response analysis of TNF-induced apoptosis by partial least squares regression of dynamic multivariate data. J. Comput. Biol. 11, 544–561 (2004).
16. D'Haeseleer, P. How does gene expression clustering work? Nature Biotechnol. 23, 1499–1501 (2005).
17. Yeung, K. Y., Fraley, C., Murua, A., Raftery, A. E. & Ruzzo, W. L. Model-based clustering and data transformations for gene expression data. Bioinformatics 17, 977–987 (2001).
18. Yeung, K. Y., Haynor, D. R. & Ruzzo, W. L. Validating clustering for gene expression data. Bioinformatics 17, 309–318 (2001).
19. Schuldiner, M. et al. Exploration of the function and organization of the yeast early secretory pathway through an epistatic miniarray profile. Cell 123, 507–519 (2005).
20. Perlman, Z. E. et al. Multidimensional drug profiling by automated microscopy. Science 306, 1194–1198 (2004).
21. Bjorklund, M. et al. Identification of pathways regulating cell size and cell-cycle progression by RNAi. Nature 439, 1009–1013 (2006).
22. Gilchrist, M. et al. Systems biology approaches identify ATF3 as a negative regulator of Toll-like receptor 4. Nature 441, 173–178 (2006).
23. Geladi, P. & Kowalski, B. R. Partial least-squares regression — a tutorial. Anal. Chim. Acta 185, 1–17 (1986). The classic review on partial least squares. The tutorial is presented in the context of spectroscopy, but the analytical approaches can be applied equally well to biological systems.
24. Briggman, K. L., Abarbanel, H. D. & Kristan, W. B. Jr. Optical imaging of neuronal populations during decision-making. Science 307, 896–901 (2005).
25. Hallem, E. A. & Carlson, J. R. Coding of odors by a receptor repertoire. Cell 125, 143–160 (2006).
26. Butte, A. The use and analysis of microarray data. Nature Rev. Drug Discov. 1, 951–960 (2002).
27. Tanaka, M. et al. An unbiased cell morphology-based screen for new, biologically active small molecules. PLoS Biol. 3, e128 (2005).
28. Knight, Z. A. et al. A pharmacological map of the PI3-K family defines a role for p110α in insulin signaling. Cell 125, 733–747 (2006).
29. Haggarty, S. J., Koeller, K. M., Wong, J. C., Butcher, R. A. & Schreiber, S. L. Multidimensional chemical genetic analysis of diversity-oriented synthesis-derived deacetylase inhibitors using cell-based assays. Chem. Biol. 10, 383–396 (2003).
30. Hirai, M. Y. et al. Integration of transcriptomics and metabolomics for understanding of global responses to nutritional stresses in Arabidopsis thaliana. Proc. Natl Acad. Sci. USA 101, 10205–10210 (2004).
31. Liu, G., Swihart, M. T. & Neelamegham, S. Sensitivity, principal component and flux analysis applied to signal transduction: the case of epidermal growth factor mediated signaling. Bioinformatics 21, 1194–1202 (2005).
32. Janes, K. A. et al. A systems model of signaling identifies a molecular basis set for cytokine-induced apoptosis. Science 310, 1646–1653 (2005).
33. Nguyen, D. V. & Rocke, D. M. Tumor classification by partial least squares using microarray gene expression data. Bioinformatics 18, 39–50 (2002).
34. Jessen, F., Lametsch, R., Bendixen, E., Kjaersgard, I. V. & Jorgensen, B. M. Extracting information from two-dimensional electrophoresis gels by partial least squares regression. Proteomics 2, 32–35 (2002). These three papers are the first applications of PLS for classification (references 33 and 34) and prediction (reference 32) using biological networks.
35. Hood, L., Heath, J. R., Phelps, M. E. & Lin, B. Systems biology and new technologies enable predictive and preventative medicine. Science 306, 640–643 (2004).
36. Goncalves, A. et al. Postoperative serum proteomic profiles may predict metastatic relapse in high-risk primary breast cancer patients receiving adjuvant chemotherapy. Oncogene 25, 981–989 (2006).
37. Linke, S. P., Bremer, T. M., Herold, C. D., Sauter, G. & Diamond, C. A multimarker model to predict outcome in tamoxifen-treated breast cancer patients. Clin. Cancer Res. 12, 1175–1183 (2006).
38. Liao, J. C. et al. Network component analysis: reconstruction of regulatory signals in biological systems. Proc. Natl Acad. Sci. USA 100, 15522–15527 (2003). This paper is the first introduction of NCA and its proof-of-principle application to biological networks.
39. Martens, H. & Martens, M. Multivariate Analysis of Quality: An Introduction (John Wiley & Sons, Chichester, 2001).
40. Grossman, R. L., Kamath, C., Kegelmeyer, P., Kumar, V. & Namburu, R. Data Mining for Scientific and Engineering Applications (Kluwer Academic, Dordrecht, 2001).
41. Gilman, A. G. et al. Overview of the Alliance for Cellular Signaling. Nature 420, 703–706 (2002).
42. Pradervand, S., Maurya, M. R. & Subramaniam, S. Identification of signaling components required for the prediction of cytokine release in RAW 264.7 macrophages. Genome Biol. 7, R11 (2006).
43. Kitano, H. Systems biology: a brief overview. Science 295, 1662–1664 (2002).
44. MacQueen, J. B. in Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability 281–297 (University of California Press, Berkeley, 1967).
45. Bezdek, J. C. Pattern Recognition with Fuzzy Objective Function Algorithms (Plenum, New York, 1981).

Acknowledgements

The work cited in this review was supported by grants from the National Institutes of Health to M.B.Y. and an American Cancer Society postdoctoral fellowship to K.A.J.

Author information

Authors and Affiliations

Cell Decision Processes Center, Massachusetts Institute of Technology, Cambridge, 02139, Massachusetts, USA
Kevin A. Janes
Department of Cell Biology, Harvard Medical School, Boston, 02115, Massachusetts, USA
Kevin A. Janes
Center for Cancer Research and Departments of Biology and Biological Engineering, Cell Decision Processes Center, Massachusetts Institute of Technology, Cambridge, 02139, Massachusetts, USA
Michael B. Yaffe


Corresponding author

Correspondence to Michael B. Yaffe.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Glossary

Matrix: A table of numbers. Alternatively, a matrix can be viewed as an arrangement of row or column vectors.
Vector: A mathematical quantity that has both magnitude (or length) and direction. The entries of a vector specify the magnitude of its projection in different directions.
Linear algebra: A branch of mathematics that involves linear manipulations of vectors and matrices.
Transformation: A mathematical function that can be applied to vectors and matrices.
Row vector: A vector that is composed of one entire row of a matrix with dimensions that are specified by the matrix columns.
Euclidean distance: A measure of the straight-line geometric distance between the endpoints of two vectors that point from a common origin.
Column vector: A vector that is composed of one entire column of a matrix with dimensions that are specified by the matrix rows.
Pearson distance: A measure of the difference in direction between two vectors that point from a common origin, after each vector has been mean-centred. It is usually defined as one minus the Pearson correlation coefficient.
k-means clustering: A clustering technique in which observations are partitioned into a pre-specified number (k) of clusters, each represented by its centroid (the mean of its members). Each observation is assigned to the cluster with the nearest centroid.
Eigenvalue: A mathematical quantity that provides the scaling factor for an eigenvector of a given transformation. For PCA, the eigenvalues of the covariance matrix quantify how much of the overall measured variation each principal component captures.
Scores vector: The principal component vector that describes how strongly each observation projects along the principal component.
Loadings vector: The principal component vector that describes how strongly each measured signal contributes to the principal component.
Unsupervised analysis: A type of computational learning approach in which the expected output is not specified. Hierarchical clustering and PCA are unsupervised analyses.
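The k-means procedure defined in this glossary can be sketched in a few lines. This is a minimal batch (Lloyd-style) version written for illustration, not the implementation used in the cited studies. It assumes NumPy, Euclidean distance, random initialization from the observations and a hypothetical two-group data set. Other k-means variants differ in how and when centroids are updated.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal batch k-means with Euclidean distance.

    Returns labels (cluster index for each observation) and the k centroids.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize the centroids with k distinct, randomly chosen observations
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each observation joins the cluster with the nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical observations forming two well-separated groups
X = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
              [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]])
labels, centroids = kmeans(X, k=2)
print(labels)
print(centroids)
```

Because the result can depend on the initial centroids, k-means is commonly rerun from several random starts, and the partition with the smallest within-cluster spread is kept.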


About this article

Cite this article

Janes, K., Yaffe, M. Data-driven modelling of signal-transduction networks. Nat Rev Mol Cell Biol 7, 820–828 (2006). https://doi.org/10.1038/nrm2041


Issue Date: 01 November 2006
DOI: https://doi.org/10.1038/nrm2041

Supplementary information

Supplementary information S1 (box) (PDF 153 kb)

Supplementary information S2 (box) (PDF 171 kb)

Supplementary information S3 (box) (PDF 139 kb)

Supplementary information S4 (box) (PDF 136 kb)
