Key Points
- Network propagation transforms a short list of candidate genes into a genome-wide profile of gene scores that are based on proximity to candidates in a gene network.
- This transformation greatly improves the power of genetic association, providing a universal amplifier for genetic analysis.
- Mathematically, network propagation is simplifying and unifying: methods based on random walks, information diffusion and electrical resistance are all variations of the same machinery.
- Network propagation methods can be used to identify genes and genetic modules that underlie human disease.
Abstract
Biological networks are powerful resources for the discovery of genes and genetic modules that drive disease. Fundamental to network analysis is the concept that genes underlying the same phenotype tend to interact; this principle can be used to combine and to amplify signals from individual genes. Recently, numerous bioinformatic techniques have been proposed for genetic analysis using networks, based on random walks, information diffusion and electrical resistance. These approaches have been applied successfully to identify disease genes, genetic modules and drug targets. In fact, all these approaches are variations of a unifying mathematical machinery — network propagation — suggesting that it is a powerful data transformation method of broad utility in genetic research.
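To make the transformation concrete, the following is a minimal sketch of one common variant of network propagation, a random walk with restart. It is not taken from the article: the function name `propagate`, the restart probability of 0.5, the column normalisation and the five-gene toy network are all illustrative assumptions.

```python
import numpy as np

def propagate(adjacency, seed_scores, restart=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart on an undirected network (illustrative sketch)."""
    A = np.asarray(adjacency, dtype=float)
    degree = A.sum(axis=0)
    if np.any(degree == 0):
        raise ValueError("every node needs at least one edge")
    # Column-normalise: column j holds the step probabilities out of node j.
    W = A / degree
    # Initial scores: the candidate (seed) genes, scaled to sum to 1.
    p0 = np.asarray(seed_scores, dtype=float)
    p0 = p0 / p0.sum()
    p = p0.copy()
    for _ in range(max_iter):
        # Take one step along the edges, then restart at the seeds with
        # probability `restart`.
        p_next = (1 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Hypothetical toy network: five genes on a chain, 0 - 1 - 2 - 3 - 4.
adjacency = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
])
seeds = np.array([1, 0, 0, 0, 0])  # gene 0 is the only candidate

scores = propagate(adjacency, seeds)
for gene, score in enumerate(scores):
    print(f"gene {gene}: {score:.4f}")
```

In this sketch a single candidate gene yields a score for every gene in the network, and the scores fall off with network distance from the candidate. The iteration converges to the solution of the linear system p = (1 - restart) W p + restart p0, so for small networks the same profile could be obtained with a direct linear solve.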
Acknowledgements
The authors gratefully acknowledge J. Huang and M. Ruffalo for assistance with figures for this manuscript. They also thank E. Eisenberg for assistance with references for this manuscript. This work was initiated while the authors attended a Network Biology workshop as part of a semester on Algorithmic Challenges in Genomics at the Simons Institute for the Theory of Computing at University of California, Berkeley, USA.
Ethics declarations
Competing interests
B.J.R. is a founder of Medley Genomics. The other authors declare they have no competing interests.
Glossary
- Nodes
  The objects modelled by a network. In biological networks, nodes can represent proteins, genes, metabolites, RNA molecules, or even diseases and phenotypes.
- Edges
  Relationships between pairs of nodes in a network, for example, molecular interactions between the genes or proteins that correspond to these nodes. Two nodes sharing an edge are said to be adjacent, neighbours, or directly connected by it.
- Network propagation
  A family of stochastic processes that trace the flow of information through a network over time.
- Random walks
  Mathematical formalization of the paths resulting from taking successive random steps. Classical examples of random walks are Brownian motion, the fortune of a gambler flipping a coin or fluctuations of the stock market. In the context of networks, a random walk typically describes a process in which a 'walker' moves from one node to another with a probability that is proportional to the weight of the edge connecting the nodes.
- Kernels
  Symmetric similarity functions with the property that one can assign vectors (in some abstract space) to their arguments such that the similarity of two elements is the dot product of their corresponding vectors.
- Disease module
  A network module, the member genes of which are associated with a particular disease.
- False positives
  Error in prediction whereby negative examples are predicted to be positive. For example, when predicting disease genes, a false positive would correspond to a non-disease gene that is wrongly predicted to be disease-related.
- False negatives
  Error in prediction whereby positive examples are predicted to be negative. For example, when predicting disease genes, a false negative would correspond to a disease gene that is missed and predicted to be unrelated.
- Edge weight
  An abstract measure of the 'strength' of the connection between a pair of nodes in a network, typically represented as a real number between 0 and 1.
- Adjacency matrix
  A matrix representation of a network such that the (i,j) entry denotes whether nodes i and j are adjacent (in which case its value is 1) or not (value 0).
- Orthology
  The evolutionary relationship between two genes in two species that have descended from a common ancestor.
- Classifier
  A machine-learning algorithm that predicts the class of a sample given some characteristics of it. For example, a classifier can aim to distinguish between disease and non-disease genes based on their network proximity to known disease or non-disease genes.
- Network modules
  Regions of a network with some topological property; for example, a set of nodes that densely interact with one another.
- Node degree
  The number of other nodes that are adjacent (that is, directly connected) to a node.
- Similarity matrix
  A matrix with rows and columns that represent the same set of objects such that the (i,j) entry denotes some similarity measure (for example, as obtained from network propagation) between the corresponding elements.
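Several of the glossary terms above can be made concrete on a small example. The sketch below is illustrative only: the four-gene network, the edge weights, the value of `alpha` and the choice of symmetric normalisation (picked here so that the propagation-derived similarity matrix is symmetric and therefore a kernel) are assumptions, not details from the article.

```python
import numpy as np

# Hypothetical toy network: four genes joined by weighted, undirected edges.
genes = ["A", "B", "C", "D"]
edges = [("A", "B", 1.0), ("B", "C", 0.5), ("C", "D", 1.0), ("B", "D", 0.5)]

index = {gene: i for i, gene in enumerate(genes)}
n = len(genes)

# Edge weights: entry (i, j) is the strength of the connection between i and j.
weights = np.zeros((n, n))
for u, v, w in edges:
    weights[index[u], index[v]] = w
    weights[index[v], index[u]] = w

# Adjacency matrix: 1 if nodes i and j share an edge, 0 otherwise.
adjacency = (weights > 0).astype(int)

# Node degree: the number of neighbours of each node.
degree = adjacency.sum(axis=1)

# Random walk: the walker moves from i to j with probability proportional
# to the weight of the edge between them, so each row sums to 1.
strength = weights.sum(axis=1)
transition = weights / strength[:, None]

# Similarity matrix obtained from propagation (random walk with restart on the
# symmetrically normalised weights); alpha is an assumed restart probability.
alpha = 0.5
normalised = weights / np.sqrt(np.outer(strength, strength))
similarity = alpha * np.linalg.inv(np.eye(n) - (1 - alpha) * normalised)

# Kernel property: each gene can be assigned a vector (a row of `vectors`)
# such that dot products between vectors reproduce the similarities.
vectors = np.linalg.cholesky(similarity)

print("degree:", dict(zip(genes, degree.tolist())))
print("similarity matrix:")
print(np.round(similarity, 3))
```

The Cholesky factorisation succeeds only for symmetric positive-definite matrices, so it doubles as a check that this particular similarity matrix satisfies the kernel definition given above.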
Cite this article
Cowen, L., Ideker, T., Raphael, B. et al. Network propagation: a universal amplifier of genetic associations. Nat Rev Genet 18, 551–562 (2017). https://doi.org/10.1038/nrg.2017.38
DOI: https://doi.org/10.1038/nrg.2017.38