Similarity network fusion for aggregating data types on a genomic scale

Abstract

Recent technologies have made it cost-effective to collect diverse types of genome-wide data. Computational methods are needed to combine these data to create a comprehensive view of a given disease or a biological process. Similarity network fusion (SNF) solves this problem by constructing networks of samples (e.g., patients) for each available data type and then efficiently fusing these into one network that represents the full spectrum of underlying data. For example, to create a comprehensive view of a disease given a cohort of patients, SNF computes and fuses patient similarity networks obtained from each of their data types separately, taking advantage of the complementarity in the data. We used SNF to combine mRNA expression, DNA methylation and microRNA (miRNA) expression data for five cancer data sets. SNF substantially outperforms single data type analysis and established integrative approaches when identifying cancer subtypes and is effective for predicting survival.
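
As a rough illustration of the two steps described above (one sample-similarity network per data type, then fusion into a single network), the following Python sketch may help. It is not the authors' Matlab or R implementation: the adaptive Gaussian kernel, the neighbourhood size `k`, the scaling factor `mu`, the iteration count and every function name are assumptions chosen for illustration, and the two "data types" are synthetic.

```python
import numpy as np


def affinity(X, k=3, mu=0.5):
    """Sample-by-sample affinity from a (samples x features) matrix.

    Assumption: a Gaussian kernel whose width adapts to each pair's
    local neighbourhood; k and mu are illustrative defaults.
    """
    # Pairwise Euclidean distances between samples.
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    # Mean distance from each sample to its k nearest neighbours (self excluded).
    knn_mean = np.sort(D, axis=1)[:, 1:k + 1].mean(axis=1)
    sigma = mu * (knn_mean[:, None] + knn_mean[None, :] + D) / 3.0
    return np.exp(-(D ** 2) / (2.0 * sigma ** 2 + 1e-12))


def row_normalize(W):
    """Scale each row so that it sums to 1."""
    return W / W.sum(axis=1, keepdims=True)


def knn_sparsify(W, k):
    """Keep each row's k largest affinities, zero the rest, then row-normalize."""
    S = np.zeros_like(W)
    idx = np.argsort(W, axis=1)[:, -k:]
    rows = np.arange(W.shape[0])[:, None]
    S[rows, idx] = W[rows, idx]
    return row_normalize(S)


def fuse(affinities, k=3, iterations=20):
    """Fuse two or more affinity matrices by iterative cross-diffusion."""
    if len(affinities) < 2:
        raise ValueError("fusion needs at least two data types")
    P = [row_normalize(W) for W in affinities]    # dense network per data type
    S = [knn_sparsify(W, k) for W in affinities]  # local (kNN) network per data type
    for _ in range(iterations):
        P_next = []
        for v in range(len(P)):
            # Diffuse the average of the *other* networks through this type's local network.
            others = [P[u] for u in range(len(P)) if u != v]
            avg = sum(others) / len(others)
            P_next.append(row_normalize(S[v] @ avg @ S[v].T))
        P = P_next
    fused = sum(P) / len(P)
    return (fused + fused.T) / 2.0  # symmetrize the final network


# Two synthetic "data types" measured on the same 20 samples (made-up values).
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 10)
expr = rng.normal(size=(20, 5)) + 3.0 * labels[:, None]
meth = rng.normal(size=(20, 4)) + 3.0 * labels[:, None]
fused = fuse([affinity(expr), affinity(meth)])
```

In this sketch the fused matrix could then be passed to any clustering method that accepts a similarity matrix in order to propose sample groups; which method to use, and how many groups to request, is left open here.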

Figure 1: Illustrative example of SNF steps.
Figure 2: Patient similarities for each of the data types independently compared to SNF fused similarity.
Figure 3: Comparison of the SNF approach to iCluster and concatenation.

Acknowledgements

This study used data generated by TCGA and METABRIC; we thank TCGA, the Cancer Research UK and the British Columbia Cancer Agency Branch for sharing these invaluable data with the scientific community. We thank N. Jabado, M. Wilson and J. Rommens for feedback on the manuscript, and B. Sousa for help with the figures. This study was partially funded by the Government of Canada through Genome Canada and the Ontario Genomics Institute (OGI-068) to M.B.; A.G. is funded by the SickKids Research Institute. Z.T. was supported by NSF IIS-1360568.

Author information

Contributions

B.W. and A.G. conceived of and designed the approach. B.W. performed the data analysis, implemented the method in Matlab and performed all computational experiments. A.M.M. performed data preparation. F.D. wrote the R code that is distributed with the paper. M.F. assisted with network visualization and analysis. Z.T. helped with method design and theoretical framework. B.H.-K. assisted in preparation and analysis of the METABRIC data. B.W., M.B. and A.G. wrote the manuscript.

Corresponding author

Correspondence to Anna Goldenberg.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–20, Supplementary Table 1, Supplementary Notes 1–3 and Supplementary Results (PDF 6804 kb)

Supplementary Software

Similarity Network Fusion for aggregating multiple data types (ZIP 415 kb)

Supplementary Data

TCGA cancer datasets after pre-processing (ZIP 81276 kb)

About this article

Cite this article

Wang, B., Mezlini, A., Demir, F. et al. Similarity network fusion for aggregating data types on a genomic scale. Nat Methods 11, 333–337 (2014). https://doi.org/10.1038/nmeth.2810
