Resource

Interactome INSIDER: a structural interactome browser for genomic studies

  • Nature Methods volume 15, pages 107–114 (2018)
  • doi:10.1038/nmeth.4540

Abstract

We present Interactome INSIDER, a tool to link genomic variant information with structural protein–protein interactomes. Underlying this tool is the application of machine learning to predict protein interaction interfaces for 185,957 protein interactions with previously unresolved interfaces in human and seven model organisms, including the entire experimentally determined human binary interactome. Predicted interfaces exhibit functional properties similar to those of known interfaces, including enrichment for disease mutations and recurrent cancer mutations. Through 2,164 de novo mutagenesis experiments, we show that mutations of predicted and known interface residues disrupt interactions at a similar rate and much more frequently than mutations outside of predicted interfaces. To spur functional genomic studies, Interactome INSIDER (http://interactomeinsider.yulab.org) enables users to identify whether variants or disease mutations are enriched in known and predicted interaction interfaces at various resolutions. Users may explore known population variants, disease mutations, and somatic cancer mutations, or they may upload their own set of mutations for this purpose.
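The enrichment question described in the abstract (are mutations found in interface residues more often than expected by chance?) can be illustrated with a minimal sketch. This is not the Interactome INSIDER implementation: the function name `interface_enrichment`, its parameters, and the counts in the usage line are all hypothetical, and the sketch assumes each mutated residue position is counted once, so a one-sided hypergeometric test applies.

```python
from math import comb


def interface_enrichment(mut_in_interface, mut_total, res_in_interface, res_total):
    """Return (enrichment ratio, one-sided hypergeometric p-value).

    Hypothetical helper for illustration only.
    mut_in_interface: mutated positions that fall in interface residues
    mut_total:        all mutated positions in the protein set
    res_in_interface: residues annotated as interface
    res_total:        all residues considered
    """
    if not 0 < res_in_interface < res_total:
        raise ValueError("need 0 < res_in_interface < res_total")
    if not 0 < mut_total <= res_total:
        raise ValueError("need 0 < mut_total <= res_total")
    if not 0 <= mut_in_interface <= min(mut_total, res_in_interface):
        raise ValueError("mut_in_interface is out of range")

    # Ratio of the observed interface fraction among mutations to the
    # fraction expected if mutations were spread uniformly over residues.
    observed_fraction = mut_in_interface / mut_total
    expected_fraction = res_in_interface / res_total
    ratio = observed_fraction / expected_fraction

    # P(X >= mut_in_interface) when mut_total positions are drawn without
    # replacement from res_total residues, res_in_interface of them interface.
    upper = min(res_in_interface, mut_total)
    tail = sum(
        comb(res_in_interface, i) * comb(res_total - res_in_interface, mut_total - i)
        for i in range(mut_in_interface, upper + 1)
    )
    p_value = tail / comb(res_total, mut_total)
    return ratio, p_value


# Made-up counts: 50 mutated positions, 20 of them in 200 interface
# residues, out of 1,000 residues in total.
ratio, p_value = interface_enrichment(20, 50, 200, 1000)
print(f"enrichment = {ratio:.2f}, p = {p_value:.3g}")
```

A ratio above 1 with a small p-value would suggest enrichment in interfaces under these assumptions. Recurrent mutations at the same position, or comparisons across several interface resolutions, would need a different model.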

Acknowledgements

The authors would like to thank G. Hooker, D. Bindel, and K. Weinberger for helpful discussions and J. VanEe for technical support. This work was supported by National Institute of General Medical Sciences grants (R01 GM097358, R01 GM104424, R01 GM124559); National Cancer Institute grant (R01 CA167824); Eunice Kennedy Shriver National Institute of Child Health and Human Development grant (R01 HD082568); National Human Genome Research Institute grant (UM1 HG009393); National Science Foundation grant (DBI-1661380); and Simons Foundation Autism Research Initiative grant (367561) to H.Y.

Author information

Author notes

    • Michael J Meyer, Juan Felipe Beltrán & Siqi Liang

    These authors contributed equally to this work.

Affiliations

  1. Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, USA.

    • Michael J Meyer, Juan Felipe Beltrán, Siqi Liang, Aaron Rumack, Xiaomu Wei & Haiyuan Yu

  2. Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, USA.

    • Michael J Meyer, Juan Felipe Beltrán, Siqi Liang, Robert Fragoza, Aaron Rumack, Jin Liang & Haiyuan Yu

  3. Tri-Institutional Training Program in Computational Biology and Medicine, New York, New York, USA.

    • Michael J Meyer

  4. Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, USA.

    • Robert Fragoza

  5. Department of Medicine, Weill Cornell College of Medicine, New York, New York, USA.

    • Xiaomu Wei

Authors

  1. Michael J Meyer

  2. Juan Felipe Beltrán

  3. Siqi Liang

  4. Robert Fragoza

  5. Aaron Rumack

  6. Jin Liang

  7. Xiaomu Wei

  8. Haiyuan Yu

Contributions

M.J.M., J.F.B., S.L., and H.Y. conceived the study. H.Y. oversaw all aspects of the study. M.J.M., J.F.B., S.L., and A.R. performed computational analyses. M.J.M. and J.F.B. designed ECLAIR. J.F.B. designed the web interface. R.F., J.L., and X.W. performed laboratory experiments. M.J.M. wrote the manuscript with input from J.F.B., S.L., and H.Y. All authors edited and approved the final manuscript.

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Haiyuan Yu.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1–11 and Supplementary Notes 1–7

  2. Life Sciences Reporting Summary

Excel files

  1. Supplementary Table 1

    Comparison of ECLAIR using docking benchmark 4.0

  2. Supplementary Table 2

    PSI-MI binary evidence codes

  3. Supplementary Table 3

    Training and Testing Sets

  4. Supplementary Table 4

    Feature Selection

  5. Supplementary Table 5

    Full sub-classifier training

  6. Supplementary Table 6

    Comparison of ECLAIR performance with and without co-evolution

  7. Supplementary Table 7

    ECLAIR prediction category performance using docking benchmark 4.0

  8. Supplementary Table 8

    Initially-trained ECLAIR vs. fully-trained ECLAIR performance

Zip files

  1. Supplementary Software

    ECLAIR software