An efficient and effective method to identify significantly perturbed subnetworks in cancer

Yang, Le; Chen, Runpu; Goodison, Steve; Sun, Yijun

doi:10.1038/s43588-020-00009-4

Article
Published: 14 January 2021

Nature Computational Science volume 1, pages 79–88 (2021)

Abstract

The identification of key functional biological networks from high-dimensional genomics data is pivotal for cancer research. Here, we introduce FDRnet, a method for the detection of molecular subnetworks in cancer, which addresses several challenges in pathway analysis. FDRnet detects key subnetworks by solving a mixed-integer linear programming problem, using a given upper bound of false discovery rate (FDR) as a budget constraint, and minimizing a conductance score to find dense subgraphs around seed genes. A large-scale benchmark study was performed on both simulation and cancer genomics data. FDRnet outperformed other methods in the ability to detect functionally homogeneous subnetworks in a scale-free biological network, to control FDRs of the genes in detected subnetworks, to improve computational efficiency and to integrate multi-omics data. By overcoming the limitations of existing approaches, FDRnet can facilitate the detection of key functional pathways in cancer and other genetic diseases.
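The two quantities named in the abstract, a conductance score and an FDR budget, can be illustrated with a minimal Python sketch. It is not the authors' mixed-integer linear program. It assumes the standard graph-theoretic definition of conductance (cut edges divided by the smaller of the two volumes) and, as a labeled assumption, treats the FDR of a subnetwork as the mean of per-gene local FDR values. The toy graph, the gene names and the local FDR values are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Set

# Undirected graph stored as adjacency sets (hypothetical toy representation).
Graph = Dict[Hashable, Set[Hashable]]


def conductance(graph: Graph, nodes: Iterable[Hashable]) -> float:
    """Standard conductance: cut(S, rest) / min(vol(S), vol(rest))."""
    inside = set(nodes)
    # Edges with exactly one endpoint inside the candidate subnetwork.
    cut = sum(1 for u in inside for v in graph[u] if v not in inside)
    vol_in = sum(len(graph[u]) for u in inside)
    vol_out = sum(len(graph[u]) for u in graph if u not in inside)
    denom = min(vol_in, vol_out)
    return cut / denom if denom > 0 else float("inf")


def within_fdr_budget(local_fdr: Dict[Hashable, float],
                      nodes: Iterable[Hashable],
                      bound: float) -> bool:
    """Assumption: subnetwork FDR is estimated as the mean local FDR."""
    members = list(nodes)
    return sum(local_fdr[g] for g in members) / len(members) <= bound


# Two triangles joined by one edge (C-D); all values are made up.
toy_graph: Graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E", "F"},
    "E": {"D", "F"},
    "F": {"D", "E"},
}
toy_local_fdr = {"A": 0.02, "B": 0.05, "C": 0.10,
                 "D": 0.90, "E": 0.80, "F": 0.95}

if __name__ == "__main__":
    for candidate in ({"A", "B"}, {"A", "B", "C"}, {"A", "B", "C", "D"}):
        print(sorted(candidate),
              round(conductance(toy_graph, candidate), 3),
              within_fdr_budget(toy_local_fdr, candidate, bound=0.1))
```

In this toy setting the triangle {A, B, C} has a lower conductance than {A, B} and stays within a budget of 0.1, whereas adding D breaks the budget. A method of the kind described in the abstract would search for such a set around a seed gene by optimization rather than by enumerating candidates.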

**Fig. 1: Overview of the proposed method.**

**Fig. 2: Comparison of six methods in terms of their abilities to detect target genes and modular structures and to control FDRs of identified subnetworks using simulation data.**

**Fig. 3: Detecting significantly mutated subnetworks in breast cancer using The Cancer Genome Atlas copy number and somatic mutation data.**

**Fig. 4: Detecting pathways differentially expressed between germinal center B-cell like (GCB) and activated B-cell like (ABC) diffuse large B-cell lymphoma using gene expression data.**

**Fig. 5: Running time of six methods applied to simulation, breast cancer and lymphoma data.**

Data availability

The breast cancer somatic mutation and copy number data (dbGaP study accession no. phs000178) were downloaded from the TCGA Firehose website (https://gdac.broadinstitute.org). The iRefIndex9.0 PPI network, the BioGRID v3.5.187 PPI network and the ReactomeFI v2019 PPI network were downloaded from http://compbio-research.cs.brown.edu/pancancer/hotnet2/, https://thebiogrid.org and https://reactome.org, respectively, without any restriction. For the lymphoma study, the gene expression data and the interactome data (HPRD PPI network) were obtained from the BioNet package (https://www.bioconductor.org/packages/release/bioc/html/BioNet.html) without any restriction. Source data are provided with this paper.

Code availability

The software and user manual are available at https://github.com/yangle293/FDRnet (https://doi.org/10.5281/zenodo.4121885; ref. 61) and www.acsu.buffalo.edu/~yijunsun/lab/FDRnet.html.

References

Beroukhim, R. et al. The landscape of somatic copy-number alteration across human cancers. Nature 463, 899–905 (2010).
The Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70 (2012).
Bailey, M. H. et al. Comprehensive characterization of cancer driver genes and mutations. Cell 173, 371–385 (2018).
Lawrence, M. S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013).
Dees, N. D. et al. MuSiC: identifying mutational significance in cancer genomes. Genome Res. 22, 1589–1598 (2012).
Stransky, N. et al. The mutational landscape of head and neck squamous cell carcinoma. Science 333, 1157–1160 (2011).
Chapman, M. A. et al. Initial genome sequencing and analysis of multiple myeloma. Nature 471, 467–472 (2011).
Raphael, B. J., Dobson, J. R., Oesper, L. & Vandin, F. Identifying driver mutations in sequenced cancer genomes: computational approaches to enable precision medicine. Genome Med. 6, 5 (2014).
Ideker, T., Ozier, O., Schwikowski, B. & Siegel, A. F. Discovering regulatory and signalling circuits in molecular interaction networks. Bioinformatics 18, S233–S240 (2002).
Dittrich, M. T., Klau, G. W., Rosenwald, A., Dandekar, T. & Müller, T. Identifying functional modules in protein–protein interaction networks: an integrated exact approach. Bioinformatics 24, 223–231 (2008).
Vandin, F., Upfal, E. & Raphael, B. J. Algorithms for detecting significantly mutated pathways in cancer. J. Comput. Biol. 18, 507–522 (2011).
Ciriello, G., Cerami, E., Sander, C. & Schultz, N. Mutual exclusivity analysis identifies oncogenic network modules. Genome Res. 22, 398–406 (2012).
Iorio, F. et al. Pathway-based dissection of the genomic heterogeneity of cancer hallmarks’ acquisition with SLAPenrich. Sci. Rep. 8, 1–16 (2018).
Sohler, F., Hanisch, D. & Zimmer, R. New methods for joint analysis of biological networks and expression data. Bioinformatics 20, 1517–1521 (2004).
Nacu, Ş., Critchley-Thorne, R., Lee, P. & Holmes, S. Gene expression network analysis and applications to immunology. Bioinformatics 23, 850–858 (2007).
Leiserson, M. D. et al. Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes. Nat. Genet. 47, 106–114 (2015).
Reyna, M. A., Leiserson, M. D. & Raphael, B. J. Hierarchical HotNet: identifying hierarchies of altered subnetworks. Bioinformatics 34, i972–i980 (2018).
Razick, S., Magklaras, G. & Donaldson, I. M. iRefindex: a consolidated protein interaction database with provenance. BMC Bioinformatics 9, 405 (2008).
Giurgiu, M. et al. CORUM: the comprehensive resource of mammalian protein complexes—2019. Nucleic Acids Res. 47, D559–D563 (2019).
Beisser, D., Klau, G. W., Dandekar, T., Müller, T. & Dittrich, M. T. BioNet: an R-package for the functional analysis of biological networks. Bioinformatics 26, 1129–1130 (2010).
Qiu, Y.-Q., Zhang, S., Zhang, X.-S. & Chen, L. Detecting disease associated modules and prioritizing active genes based on high throughput data. BMC Bioinformatics 11, 26 (2010).
Gu, J., Chen, Y., Li, S. & Li, Y. Identification of responsive gene modules by network-based gene clustering and extending: application to inflammation and angiogenesis. BMC Syst. Biol. 4, 47 (2010).
Barabasi, A.-L. & Oltvai, Z. N. Network biology: understanding the cell’s functional organization. Nat. Rev. Genet. 5, 101–113 (2004).
Oughtred, R. et al. The BioGRID interaction database: 2019 update. Nucleic Acids Res. 47, D529–D541 (2019).
Jassal, B. et al. The reactome pathway knowledgebase. Nucleic Acids Res. 48, D498–D503 (2020).
Watson, I. R., Takahashi, K., Futreal, P. A. & Chin, L. Emerging patterns of somatic mutations in cancer. Nat. Rev. Genet. 14, 703–718 (2013).
Mermel, C. H. et al. Gistic2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol. 12, R41 (2011).
Forbes, S. A. et al. COSMIC: somatic cancer genetics at high-resolution. Nucleic Acids Res. 45, D777–D783 (2016).
Olivier, M., Hollstein, M. & Hainaut, P. TP53 mutations in human cancers: origins, consequences and clinical use. Cold Spring Harb. Perspect. Biol. 2, a001008 (2010).
Khatri, P. & Drăghici, S. Ontological analysis of gene expression data: current tools, limitations and open problems. Bioinformatics 21, 3587–3595 (2005).
Dustin, D., Gu, G. & Fuqua, S. A. W. ESR1 mutations in breast cancer. Cancer 125, 3714–3728 (2019).
Toy, W. et al. ESR1 ligand-binding domain mutations in hormone-resistant breast cancer. Nat. Genet. 45, 1439–1445 (2013).
Martínez-Iglesias, O., Alonso-Merino, E. & Aranda, A. Tumor suppressive actions of the nuclear receptor corepressor 1. Pharmacol. Res. 108, 75–79 (2016).
Soutourina, J. Transcription regulation by the Mediator complex. Nat. Rev. Mol. Cell Biol. 19, 262–274 (2018).
Eyboulet, F. et al. Mediator links transcription and DNA repair by facilitating Rad2/XPG recruitment. Genes Dev. 27, 2549–2562 (2013).
Rosenwald, A. et al. The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. New Engl. J. Med. 346, 1937–1947 (2002).
Chapuy, B. et al. Molecular subtypes of diffuse large B cell lymphoma are associated with distinct pathogenic mechanisms and outcomes. Nat. Med. 24, 679–690 (2018).
Keshava Prasad, T. et al. Human Protein Reference Database—2009 update. Nucleic Acids Res. 37, D767–D772 (2008).
Xu-Monette, Z. Y. et al. Mutational profile and prognostic significance of TP53 in diffuse large B-cell lymphoma patients treated with R-CHOP: report from an international DLBCL Rituximab-CHOP Consortium Program Study. Blood 120, 3986–3996 (2012).
Lenz, G. & Staudt, L. M. Aggressive lymphomas. New Engl. J. Med. 362, 1417–1429 (2010).
Phelan, J. D. et al. A multiprotein supercomplex controlling oncogenic signalling in lymphoma. Nature 560, 387–391 (2018).
Munoz, J., Dhillon, N., Janku, F., Watowich, S. S. & Hong, D. S. STAT3 inhibitors: finding a home in lymphoma and leukemia. Oncologist 19, 536–544 (2014).
Hatzi, K. et al. A hybrid mechanism of action for BCL6 in B cells defined by formation of functionally distinct complexes at enhancers and promoters. Cell Rep. 4, 578–588 (2013).
Benson, A. R., Gleich, D. F. & Leskovec, J. Higher-order organization of complex networks. Science 353, 163–166 (2016).
Yin, H., Benson, A. R., Leskovec, J. & Gleich, D. F. Local higher-order graph clustering. In Proc. 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 555–564 (ACM, 2017); https://doi.org/10.1145/3097983.3098069
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57, 289–300 (1995).
Efron, B., Tibshirani, R., Storey, J. D. & Tusher, V. Empirical Bayes analysis of a microarray experiment. J. Am. Stat. Assoc. 96, 1151–1160 (2001).
Efron, B. & Tibshirani, R. Using specially designed exponential families for density estimation. Ann. Stat. 24, 2431–2461 (1996).
Strimmer, K. fdrtool: a versatile R package for estimating local and tail area-based false discovery rates. Bioinformatics 24, 1461–1462 (2008).
Langaas, M., Lindqvist, B. H. & Ferkingstad, E. Estimating the proportion of true null hypotheses, with application to DNA microarray data. J. R. Stat. Soc. B 67, 555–572 (2005).
Efron, B. Large-scale simultaneous hypothesis testing: the choice of a null hypothesis. J. Am. Stat. Assoc. 99, 96–104 (2004).
Hong, W.-J., Tibshirani, R. & Chu, G. Local false discovery rate facilitates comparison of different microarray experiments. Nucleic Acids Res. 37, 7483–7497 (2009).
Albert, R. Scale-free networks in cell biology. J. Cell Sci. 118, 4947–4957 (2005).
Dao, P. et al. Inferring cancer subnetwork markers using density-constrained biclustering. Bioinformatics 26, i625–i631 (2010).
Colak, R. et al. Dense graphlet statistics of protein interaction and random networks. In Pacific Symposium on Biocomputing 178–189 (World Scientific, 2009); https://doi.org/10.1142/9789812836939_0018
Adams, W. P. & Sherali, H. D. Linearization strategies for a class of zero-one mixed integer programming problems. Oper. Res. 38, 217–226 (1990).
Fan, N. & Pardalos, P. M. Multi-way clustering and biclustering by the ratio cut and normalized cut in graphs. J. Combin. Optim. 23, 224–251 (2012).
Dilkina, B. N. & Gomes, C. P. Solving connected subgraph problems in wildlife conservation. In 7th International Conference on the Integration of Constraint Programming, Artificial Intelligence and Operations Research 102–116 (ACM, 2010); https://doi.org/10.1007/978-3-642-13520-0_14
IBM, Inc. CPLEX Optimizer Studio 12.7 (2016); https://www.ibm.com/analytics/cplex-optimizer
Andersen, R., Chung, F. & Lang, K. Local graph partitioning using PageRank vectors. In 47th Annual IEEE Symposium on Foundations of Computer Science 475–486 (IEEE, 2006); https://doi.org/10.1109/FOCS.2006.44
Yang, L. FDRnet 1.0.0 (version 1.0.0) (2020); https://doi.org/10.5281/zenodo.4121885
Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).

Acknowledgements

This work is supported in part by NIH R01AI125982 (Y.S.), NIH R01DE024523195 (Y.S.) and NIH R01CA241123 (S.G.).

Author information

Authors and Affiliations

Department of Computer Science and Engineering, The State University of New York at Buffalo, Buffalo, NY, USA
Le Yang, Runpu Chen & Yijun Sun
Department of Health Sciences Research, Mayo Clinic, Jacksonville, FL, USA
Steve Goodison
Department of Microbiology and Immunology, The State University of New York at Buffalo, Buffalo, NY, USA
Yijun Sun
Department of Biostatistics, The State University of New York at Buffalo, Buffalo, NY, USA
Yijun Sun

Contributions

L.Y., S.G. and Y.S. designed the study. L.Y., R.C. and Y.S. performed the data analysis. S.G. performed the biological discussions. L.Y., S.G. and Y.S. wrote the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Steve Goodison or Yijun Sun.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Computational Science thanks the anonymous reviewers for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Editor recognition statement Fernando Chirigati was the primary editor on this Article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Supplementary information

Supplementary Information.

Supplementary Data 1 Genes and subnetworks identified by FDRnet applied to breast cancer data.

Supplementary Data 2 Genes and subnetworks identified by HotNet2 applied to breast cancer data.

Supplementary Data 3 Genes and subnetworks identified by hHotNet applied to breast cancer data.

Supplementary Data 4 Genes and subnetworks identified by FDRnet applied to lymphoma gene expression data.

Source data

Source Data Fig. 2

Source Data Fig. 3

Source Data Fig. 4

About this article

Cite this article

Yang, L., Chen, R., Goodison, S. et al. An efficient and effective method to identify significantly perturbed subnetworks in cancer. Nat Comput Sci 1, 79–88 (2021). https://doi.org/10.1038/s43588-020-00009-4

Received: 21 May 2020
Accepted: 02 December 2020
Published: 14 January 2021
Issue Date: January 2021
DOI: https://doi.org/10.1038/s43588-020-00009-4
