Abstract
The identification of key functional biological networks from high-dimensional genomics data is pivotal for cancer research. Here, we introduce FDRnet, a method for detecting molecular subnetworks in cancer that addresses several challenges in pathway analysis. FDRnet detects key subnetworks by solving a mixed-integer linear programming problem that uses a given upper bound on the false discovery rate (FDR) as a budget constraint and minimizes a conductance score to find dense subgraphs around seed genes. A large-scale benchmark study was performed on both simulated and cancer genomics data. FDRnet outperformed other methods in detecting functionally homogeneous subnetworks in a scale-free biological network, in controlling the FDRs of the genes in detected subnetworks, in computational efficiency and in integrating multi-omics data. By overcoming the limitations of existing approaches, FDRnet can facilitate the detection of key functional pathways in cancer and other genetic diseases.
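The formulation summarized in the abstract (minimize conductance around a seed gene, subject to an FDR budget) can be illustrated on a toy graph. The following is a minimal sketch, not the authors' mixed-integer linear program: it replaces the solver with an exhaustive search that is only feasible for a handful of nodes, it assumes the FDR budget is expressed as a bound on the mean of per-gene local FDR values, and it uses the standard graph conductance (cut edges divided by the smaller side volume). All function names, the graph and the local FDR values are hypothetical.

```python
from itertools import combinations


def conductance(adj, nodes):
    """Cut edges divided by the smaller of the two side volumes (degree sums)."""
    inside = set(nodes)
    cut = sum(1 for u in inside for v in adj[u] if v not in inside)
    vol_in = sum(len(adj[u]) for u in inside)
    vol_out = sum(len(adj[u]) for u in adj if u not in inside)
    denom = min(vol_in, vol_out)
    return cut / denom if denom else float("inf")


def mean_local_fdr(lfdr, nodes):
    """Assumed budget quantity: average local FDR over the selected genes."""
    return sum(lfdr[u] for u in nodes) / len(nodes)


def is_connected(adj, nodes):
    """Depth-first search restricted to the candidate node set."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def best_subnetwork(adj, lfdr, seed, bound):
    """Exhaustive stand-in for the optimization step (toy graphs only)."""
    others = [u for u in adj if u != seed]
    best, best_score = None, float("inf")
    for k in range(len(others) + 1):
        for extra in combinations(others, k):
            cand = (seed,) + extra
            if mean_local_fdr(lfdr, cand) > bound:
                continue  # violates the FDR budget
            if not is_connected(adj, cand):
                continue  # a subnetwork must be connected
            score = conductance(adj, cand)
            if score < best_score:
                best, best_score = set(cand), score
    return best, best_score


# Hypothetical toy network: two triangles joined by the edge c-d.
adj = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
# Hypothetical local FDR values: a, b, c look perturbed; d, e, f do not.
lfdr = {"a": 0.01, "b": 0.02, "c": 0.03, "d": 0.8, "e": 0.9, "f": 0.85}

best, score = best_subnetwork(adj, lfdr, seed="a", bound=0.1)
print(sorted(best), round(score, 4))
```

In this toy example the budget excludes every candidate that contains d, e or f, and among the remaining connected sets the triangle around the seed has the lowest conductance because only one edge leaves it.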
Data availability
The breast cancer somatic mutation and copy number data (dbGaP study accession no. phs000178) were downloaded from the TCGA Firehose website (https://gdac.broadinstitute.org). The iRefIndex9.0 PPI network, the BioGRID v3.5.187 PPI network and the ReactomeFI v2019 PPI network were downloaded from http://compbio-research.cs.brown.edu/pancancer/hotnet2/, https://thebiogrid.org and https://reactome.org, respectively, without any restriction. For the lymphoma study, the gene expression data and the interactome data (HPRD PPI network) were obtained from the BioNet package (https://www.bioconductor.org/packages/release/bioc/html/BioNet.html) without any restriction. Source data are provided with this paper.
Code availability
The software and user manual are available at https://github.com/yangle293/FDRnet (https://doi.org/10.5281/zenodo.4121885; ref. 61) and www.acsu.buffalo.edu/~yijunsun/lab/FDRnet.html.
References
Beroukhim, R. et al. The landscape of somatic copy-number alteration across human cancers. Nature 463, 899–905 (2010).
The Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70 (2012).
Bailey, M. H. et al. Comprehensive characterization of cancer driver genes and mutations. Cell 173, 371–385 (2018).
Lawrence, M. S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013).
Dees, N. D. et al. MuSiC: identifying mutational significance in cancer genomes. Genome Res. 22, 1589–1598 (2012).
Stransky, N. et al. The mutational landscape of head and neck squamous cell carcinoma. Science 333, 1157–1160 (2011).
Chapman, M. A. et al. Initial genome sequencing and analysis of multiple myeloma. Nature 471, 467–472 (2011).
Raphael, B. J., Dobson, J. R., Oesper, L. & Vandin, F. Identifying driver mutations in sequenced cancer genomes: computational approaches to enable precision medicine. Genome Med. 6, 5 (2014).
Ideker, T., Ozier, O., Schwikowski, B. & Siegel, A. F. Discovering regulatory and signalling circuits in molecular interaction networks. Bioinformatics 18, S233–S240 (2002).
Dittrich, M. T., Klau, G. W., Rosenwald, A., Dandekar, T. & Müller, T. Identifying functional modules in protein–protein interaction networks: an integrated exact approach. Bioinformatics 24, 223–231 (2008).
Vandin, F., Upfal, E. & Raphael, B. J. Algorithms for detecting significantly mutated pathways in cancer. J. Comput. Biol. 18, 507–522 (2011).
Ciriello, G., Cerami, E., Sander, C. & Schultz, N. Mutual exclusivity analysis identifies oncogenic network modules. Genome Res. 22, 398–406 (2012).
Iorio, F. et al. Pathway-based dissection of the genomic heterogeneity of cancer hallmarks’ acquisition with SLAPenrich. Sci. Rep. 8, 1–16 (2018).
Sohler, F., Hanisch, D. & Zimmer, R. New methods for joint analysis of biological networks and expression data. Bioinformatics 20, 1517–1521 (2004).
Nacu, Ş., Critchley-Thorne, R., Lee, P. & Holmes, S. Gene expression network analysis and applications to immunology. Bioinformatics 23, 850–858 (2007).
Leiserson, M. D. et al. Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes. Nat. Genet. 47, 106–114 (2015).
Reyna, M. A., Leiserson, M. D. & Raphael, B. J. Hierarchical HotNet: identifying hierarchies of altered subnetworks. Bioinformatics 34, i972–i980 (2018).
Razick, S., Magklaras, G. & Donaldson, I. M. iRefIndex: a consolidated protein interaction database with provenance. BMC Bioinformatics 9, 405 (2008).
Giurgiu, M. et al. CORUM: the comprehensive resource of mammalian protein complexes—2019. Nucleic Acids Res. 47, D559–D563 (2019).
Beisser, D., Klau, G. W., Dandekar, T., Müller, T. & Dittrich, M. T. BioNet: an R-package for the functional analysis of biological networks. Bioinformatics 26, 1129–1130 (2010).
Qiu, Y.-Q., Zhang, S., Zhang, X.-S. & Chen, L. Detecting disease associated modules and prioritizing active genes based on high throughput data. BMC Bioinformatics 11, 26 (2010).
Gu, J., Chen, Y., Li, S. & Li, Y. Identification of responsive gene modules by network-based gene clustering and extending: application to inflammation and angiogenesis. BMC Syst. Biol. 4, 47 (2010).
Barabasi, A.-L. & Oltvai, Z. N. Network biology: understanding the cell’s functional organization. Nat. Rev. Genet. 5, 101–113 (2004).
Oughtred, R. et al. The BioGRID interaction database: 2019 update. Nucleic Acids Res. 47, D529–D541 (2019).
Jassal, B. et al. The reactome pathway knowledgebase. Nucleic Acids Res. 48, D498–D503 (2020).
Watson, I. R., Takahashi, K., Futreal, P. A. & Chin, L. Emerging patterns of somatic mutations in cancer. Nat. Rev. Genet. 14, 703–718 (2013).
Mermel, C. H. et al. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol. 12, R41 (2011).
Forbes, S. A. et al. COSMIC: somatic cancer genetics at high-resolution. Nucleic Acids Res. 45, D777–D783 (2016).
Olivier, M., Hollstein, M. & Hainaut, P. TP53 mutations in human cancers: origins, consequences and clinical use. Cold Spring Harb. Perspect. Biol. 2, a001008 (2010).
Khatri, P. & Drăghici, S. Ontological analysis of gene expression data: current tools, limitations and open problems. Bioinformatics 21, 3587–3595 (2005).
Dustin, D., Gu, G. & Fuqua, S. A. W. ESR1 mutations in breast cancer. Cancer 125, 3714–3728 (2019).
Toy, W. et al. ESR1 ligand-binding domain mutations in hormone-resistant breast cancer. Nat. Genet. 45, 1439–1445 (2013).
Martínez-Iglesias, O., Alonso-Merino, E. & Aranda, A. Tumor suppressive actions of the nuclear receptor corepressor 1. Pharmacol. Res. 108, 75–79 (2016).
Soutourina, J. Transcription regulation by the Mediator complex. Nat. Rev. Mol. Cell Biol. 19, 262–274 (2018).
Eyboulet, F. et al. Mediator links transcription and DNA repair by facilitating Rad2/XPG recruitment. Genes Dev. 27, 2549–2562 (2013).
Rosenwald, A. et al. The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. New Engl. J. Med. 346, 1937–1947 (2002).
Chapuy, B. et al. Molecular subtypes of diffuse large B cell lymphoma are associated with distinct pathogenic mechanisms and outcomes. Nat. Med. 24, 679–690 (2018).
Keshava Prasad, T. et al. Human Protein Reference Database—2009 update. Nucleic Acids Res. 37, D767–D772 (2008).
Xu-Monette, Z. Y. et al. Mutational profile and prognostic significance of TP53 in diffuse large B-cell lymphoma patients treated with R-CHOP: report from an international DLBCL Rituximab-CHOP Consortium Program Study. Blood 120, 3986–3996 (2012).
Lenz, G. & Staudt, L. M. Aggressive lymphomas. New Engl. J. Med. 362, 1417–1429 (2010).
Phelan, J. D. et al. A multiprotein supercomplex controlling oncogenic signalling in lymphoma. Nature 560, 387–391 (2018).
Munoz, J., Dhillon, N., Janku, F., Watowich, S. S. & Hong, D. S. STAT3 inhibitors: finding a home in lymphoma and leukemia. Oncologist 19, 536–544 (2014).
Hatzi, K. et al. A hybrid mechanism of action for BCL6 in B cells defined by formation of functionally distinct complexes at enhancers and promoters. Cell Rep. 4, 578–588 (2013).
Benson, A. R., Gleich, D. F. & Leskovec, J. Higher-order organization of complex networks. Science 353, 163–166 (2016).
Yin, H., Benson, A. R., Leskovec, J. & Gleich, D. F. Local higher-order graph clustering. In Proc. 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 555–564 (ACM, 2017); https://doi.org/10.1145/3097983.3098069
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57, 289–300 (1995).
Efron, B., Tibshirani, R., Storey, J. D. & Tusher, V. Empirical Bayes analysis of a microarray experiment. J. Am. Stat. Assoc. 96, 1151–1160 (2001).
Efron, B. & Tibshirani, R. Using specially designed exponential families for density estimation. Ann. Stat. 24, 2431–2461 (1996).
Strimmer, K. fdrtool: a versatile R package for estimating local and tail area-based false discovery rates. Bioinformatics 24, 1461–1462 (2008).
Langaas, M., Lindqvist, B. H. & Ferkingstad, E. Estimating the proportion of true null hypotheses, with application to DNA microarray data. J. R. Stat. Soc. B 67, 555–572 (2005).
Efron, B. Large-scale simultaneous hypothesis testing: the choice of a null hypothesis. J. Am. Stat. Assoc. 99, 96–104 (2004).
Hong, W.-J., Tibshirani, R. & Chu, G. Local false discovery rate facilitates comparison of different microarray experiments. Nucleic Acids Res. 37, 7483–7497 (2009).
Albert, R. Scale-free networks in cell biology. J. Cell Sci. 118, 4947–4957 (2005).
Dao, P. et al. Inferring cancer subnetwork markers using density-constrained biclustering. Bioinformatics 26, i625–i631 (2010).
Colak, R. et al. Dense graphlet statistics of protein interaction and random networks. In Pacific Symposium on Biocomputing 178–189 (World Scientific, 2009); https://doi.org/10.1142/9789812836939_0018
Adams, W. P. & Sherali, H. D. Linearization strategies for a class of zero-one mixed integer programming problems. Oper. Res. 38, 217–226 (1990).
Fan, N. & Pardalos, P. M. Multi-way clustering and biclustering by the ratio cut and normalized cut in graphs. J. Combin. Optim. 23, 224–251 (2012).
Dilkina, B. N. & Gomes, C. P. Solving connected subgraph problems in wildlife conservation. In 7th International Conference on the Integration of Constraint Programming, Artificial Intelligence and Operations Research 102–116 (ACM, 2010); https://doi.org/10.1007/978-3-642-13520-0_14
IBM, Inc. CPLEX Optimizer Studio 12.7 (2016); https://www.ibm.com/analytics/cplex-optimizer
Andersen, R., Chung, F. & Lang, K. Local graph partitioning using PageRank vectors. In 47th Annual IEEE Symposium on Foundations of Computer Science 475–486 (IEEE, 2006); https://doi.org/10.1109/FOCS.2006.44
Yang, L. FDRnet 1.0.0 (version 1.0.0) (2020); https://doi.org/10.5281/zenodo.4121885
Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).
Acknowledgements
This work is supported in part by NIH R01AI125982 (Y.S.), NIH R01DE024523195 (Y.S.) and NIH R01CA241123 (S.G.).
Author information
Contributions
L.Y., S.G. and Y.S. designed the study. L.Y., R.C. and Y.S. performed the data analysis. S.G. performed the biological discussions. L.Y., S.G. and Y.S. wrote the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information: Nature Computational Science thanks the anonymous reviewers for their contribution to the peer review of this work.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Editor recognition statement: Fernando Chirigati was the primary editor on this Article and managed its editorial process and peer review in collaboration with the rest of the editorial team.
About this article
Cite this article
Yang, L., Chen, R., Goodison, S. et al. An efficient and effective method to identify significantly perturbed subnetworks in cancer. Nat Comput Sci 1, 79–88 (2021). https://doi.org/10.1038/s43588-020-00009-4