Global insights into cellular organization and genome function require comprehensive understanding of the interactome networks that mediate genotype–phenotype relationships [1,2]. Here we present an ‘all-by-all’ reference interactome map of human binary protein interactions, or ‘HuRI’. With approximately 53,000 protein–protein interactions, HuRI has approximately four times as many such interactions as there are high-quality curated interactions from small-scale studies. The integration of HuRI with genome [3], transcriptome [4] and proteome [5] data enables cellular function to be studied within most physiological or pathological cellular contexts. We demonstrate the utility of HuRI in identifying the specific subcellular roles of protein–protein interactions. Inferred tissue-specific networks reveal general principles for the formation of cellular context-specific functions and elucidate potential molecular mechanisms that might underlie tissue-specific phenotypes of Mendelian diseases. HuRI is a systematic proteome-wide reference that links genomic variation to phenotypic outcomes.
HuRI, Lit-BM and all previously published human interactome maps from CCSB are available at http://interactome-atlas.org. The PPI data from this publication are also available through IntAct (https://www.ebi.ac.uk/intact/) with the identifier IM-25472. All HuRI-related networks from this study are available at NDExbio.org (https://tinyurl.com/networks-HuRI-paper). The raw and analysed proteomic data have been deposited in the PRIDE repository (https://www.ebi.ac.uk/pride/) with the accession number PXD012321.
Analysis code is available at github.com/CCSB-DFCI/HuRI_paper.
Vidal, M., Cusick, M. E. & Barabási, A.-L. Interactome networks and human disease. Cell 144, 986–998 (2011).
Rolland, T. et al. A proteome-scale map of the human interactome network. Cell 159, 1212–1226 (2014).
Amberger, J. S., Bocchini, C. A., Schiettecatte, F., Scott, A. F. & Hamosh, A. OMIM.org: Online Mendelian Inheritance in Man (OMIM®), an online catalog of human genes and genetic disorders. Nucleic Acids Res. 43, D789–D798 (2015).
Melé, M. et al. The human transcriptome across tissues and individuals. Science 348, 660–665 (2015).
Thul, P. J. et al. A subcellular map of the human proteome. Science 356, eaal3321 (2017).
Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
The FANTOM Consortium and the RIKEN PMI and CLST (DGT). A promoter-level mammalian expression atlas. Nature 507, 462–470 (2014).
Regev, A. et al. The human cell atlas. eLife 6, e27041 (2017).
Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
Venter, J. C. et al. The sequence of the human genome. Science 291, 1304–1351 (2001).
Wan, C. et al. Panorama of ancient metazoan macromolecular complexes. Nature 525, 339–344 (2015).
Hein, M. Y. et al. A human interactome in three quantitative dimensions organized by stoichiometries and abundances. Cell 163, 712–723 (2015).
Huttlin, E. L. et al. Architecture of the human interactome defines protein communities and disease networks. Nature 545, 505–509 (2017).
Rual, J.-F. et al. Towards a proteome-scale map of the human protein-protein interaction network. Nature 437, 1173–1178 (2005).
Braun, P. et al. An experimentally derived confidence score for binary protein-protein interactions. Nat. Methods 6, 91–97 (2009).
Venkatesan, K. et al. An empirical framework for binary interactome mapping. Nat. Methods 6, 83–90 (2009).
Chen, Y.-C., Rajagopala, S. V., Stellberger, T. & Uetz, P. Exhaustive benchmarking of the yeast two-hybrid system. Nat. Methods 7, 667–668 (2010).
Choi, S. G. et al. Maximizing binary interactome mapping with a minimal number of assays. Nat. Commun. 10, 3907 (2019).
Eyckerman, S. et al. Design and application of a cytokine-receptor-based interaction trap. Nat. Cell Biol. 3, 1114–1119 (2001).
Cassonnet, P. et al. Benchmarking a luciferase complementation assay for detecting protein complexes. Nat. Methods 8, 990–992 (2011).
Mosca, R., Céol, A. & Aloy, P. Interactome3D: adding structural details to protein networks. Nat. Methods 10, 47–53 (2013).
Tompa, P., Davey, N. E., Gibson, T. J. & Babu, M. M. A million peptide motifs for the molecular biologist. Mol. Cell 55, 161–169 (2014).
Sambourg, L. & Thierry-Mieg, N. New insights into protein-protein interaction data lead to increased estimates of the S. cerevisiae interactome size. BMC Bioinformatics 11, 605 (2010).
Leid, M. et al. Purification, cloning, and RXR identity of the HeLa cell factor with which RAR or TR heterodimerizes to bind target sequences efficiently. Cell 68, 377–395 (1992).
Willy, P. J. et al. LXR, a nuclear receptor that defines a distinct retinoid response pathway. Genes Dev. 9, 1033–1045 (1995).
Kovács, I. A. et al. Network-based prediction of protein interactions. Nat. Commun. 10, 1240 (2019).
Baryshnikova, A. Systematic functional annotation and visualization of biological networks. Cell Syst. 2, 412–421 (2016).
Graham, D. B. et al. TMEM258 is a component of the oligosaccharyltransferase complex controlling ER stress and intestinal inflammation. Cell Rep. 17, 2955–2965 (2016).
Yamamoto, Y., Yoshida, A., Miyazaki, N., Iwasaki, K. & Sakisaka, T. Arl6IP1 has the ability to shape the mammalian ER membrane in a reticulon-like fashion. Biochem. J. 458, 69–79 (2014).
Abdel-Salam, G. M. H. et al. A homozygous IER3IP1 mutation causes microcephaly with simplified gyral pattern, epilepsy, and permanent neonatal diabetes syndrome (MEDS). Am. J. Med. Genet. A. 158A, 2788–2796 (2012).
Jeong, H., Mason, S. P., Barabási, A.-L. & Oltvai, Z. N. Lethality and centrality in protein networks. Nature 411, 41–42 (2001).
Capra, J. A., Williams, A. G. & Pollard, K. S. ProteinHistorian: tools for the comparative analysis of eukaryote protein origin. PLOS Comput. Biol. 8, e1002567 (2012).
Pan, J. et al. Interrogation of mammalian protein complex structure, function, and membership using genome-scale fitness screens. Cell Syst. 6, 555–568 (2018).
Yu, H. et al. High-quality binary protein interaction map of the yeast interactome network. Science 322, 104–110 (2008).
Kim, D. K. et al. EVpedia: a community web portal for extracellular vesicles research. Bioinformatics 31, 933–939 (2015).
Hessvik, N. P. & Llorente, A. Current knowledge on exosome biogenesis and release. Cell. Mol. Life Sci. 75, 193–208 (2018).
Imjeti, N. S. et al. Syntenin mediates SRC function in exosomal cell-to-cell communication. Proc. Natl Acad. Sci. USA 114, 12495–12500 (2017).
Calderone, A., Castagnoli, L. & Cesareni, G. mentha: a resource for browsing integrated protein-interaction networks. Nat. Methods 10, 690–691 (2013).
Kiran, M. & Nagarajaram, H. A. Global versus local hubs in human protein-protein interaction network. J. Proteome Res. 12, 5436–5446 (2013).
Yang, L. et al. Comparative analysis of housekeeping and tissue-selective genes in human based on network topologies and biological properties. Mol. Genet. Genomics 291, 1227–1241 (2016).
Paulson, J. N. et al. Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data. BMC Bioinformatics 18, 437 (2017).
Bossi, A. & Lehner, B. Tissue specificity and the human protein interaction network. Mol. Syst. Biol. 5, 260 (2009).
Barshir, R., Shwartz, O., Smoly, I. Y. & Yeger-Lotem, E. Comparative analysis of human tissue interactomes reveals factors leading to tissue-specific manifestation of hereditary diseases. PLOS Comput. Biol. 10, e1003632 (2014).
Sahni, N. et al. Widespread macromolecular interaction perturbations in human genetic disorders. Cell 161, 647–660 (2015).
Reynolds, J. J., Walker, A. K., Gilmore, E. C., Walsh, C. A. & Caldecott, K. W. Impact of PNKP mutations associated with microcephaly, seizures and developmental delay on enzyme activity and DNA strand break repair. Nucleic Acids Res. 40, 6608–6619 (2012).
Landrum, M. J. et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 46, D1062–D1067 (2018).
Bhatnagar, S. et al. TRIM37 is a new histone H2A ubiquitin ligase and breast cancer oncoprotein. Nature 516, 116–120 (2014).
Olivé, M. et al. New cardiac and skeletal protein aggregate myopathy associated with combined MuRF1 and MuRF3 mutations. Hum. Mol. Genet. 24, 3638–3650 (2015).
Novarino, G. et al. Exome sequencing links corticospinal motor neuron disease to common neurodegenerative disorders. Science 343, 506–511 (2014).
Yang, X. et al. Widespread expansion of protein interaction capabilities by alternative splicing. Cell 164, 805–817 (2016).
The authors gratefully acknowledge, in memoriam, support and insight from D. Allinger. We thank P. Porras Millan and the IntAct team for their help in disseminating our PPI data via IntAct, before and after publication. We thank U. Braunschweig, J. Ellis and B. J. Blencowe for help with data analysis. We also thank Q. Zhu, O. G. Troyanskaya, J. Pan and C. Kadoch for sharing co-expression and co-fitness data, respectively. We thank K. S. Tuttle for help with graphics. This work was primarily supported by the National Institutes of Health (NIH) National Human Genome Research Institute (NHGRI) grant U41HG001715 (M.V., F.P.R., D.E.H., M.A.C., G.D.B. and J.T.) with additional support from NIH grants P50HG004233 (M.V. and F.P.R.), U01HL098166 (M.V.), U01HG007690 (M.V.), R01GM109199 (M.A.C.), Canadian Institute for Health Research (CIHR) Foundation Grants (F.P.R. and J. Rak), the Canada Excellence Research Chairs Program (F.P.R.) and an American Heart Association grant 15CVGPS23430000 (M.V.). D.-K.K. was supported by a Banting Postdoctoral Fellowship through the Natural Sciences and Engineering Research Council (NSERC) of Canada and by the Basic Science Research Program through the National Research Foundation (NRF) of Korea funded by the Ministry of Education (2017R1A6A3A03004385). C. Pons was supported by a Ramon Cajal fellowship (RYC-2017-22959). G.M.S. was supported by NIH Training Grant T32CA009361. M.V. is a Chercheur Qualifié Honoraire from the Fonds de la Recherche Scientifique (FRS-FNRS, Wallonia-Brussels Federation, Belgium).
J.C.M. is a founder and CEO of seqWell, Inc; F.P.R. and M.V. are shareholders and scientific advisors of seqWell, Inc.
Peer review information: Nature thanks Ulrich Stelzl and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
a, Number of protein-coding genes in hORFeome v9.1 and GTEx (tissues), FANTOM (cell types) and HPA (cell lines) transcriptome projects. The number of genes in hORFeome v9.1 is on par with the number of genes expressed in three comprehensive individual transcriptome sequencing studies and includes 94% of the genes with robust evidence of expression in all three. b, Overlap between hORFeome v9.1 and intersection of transcriptomes in a. c, Individual and combined recovery of PRSv1 and RRSv1 pairs by Y2H assay versions (n = 252, 270). d, Coloured squares showing which protein pairs were detected in PRSv1 (left) and RRSv1 (right) by Y2H assay versions. e, Recovery rates of Lit-BM and PPIs from screens of a 2,000-by-2,000 gene test space per Y2H assay version in MAPPIT. f, Cumulative PPI count performing three screens with each Y2H assay version in the test space compared to nine screens with Y2H assay version 1. g, h, MAPPIT and GPCA recovery of Lit-BM and PPIs from screens of Space III when split by screen at an RRS rate of 1% (g) or across a range of thresholds (h). All error bars in c, e and g, are 68.3% Bayesian confidence interval; shaded error band in h is standard error of proportion and n = between 101 and 395 pairs successfully tested for each category. i, Number of proteins in HuRI, detected with each additional screen.
a, Categorization of literature-curated PPIs into distinct subsets based on the experimental methods in which they were detected and the number of pieces of experimental evidence. b–e, Results of testing the different categories of literature-curated pairs in Y2H (b, d) and MAPPIT (c, e) in which the pairs have been further divided into high-throughput (HT) and low-throughput (LT) subsets (b, c). There were between n = 191 and n = 471 successfully tested PPIs for each category. BM, binary multiple; BS, binary singleton; NB, non-binary. Error bars are standard error of proportion.
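Several of these captions report error bars as the standard error of a proportion. As a minimal sketch, assuming the textbook binomial form sqrt(p(1 - p)/n) is what is meant, the calculation could look like the following. The function name and the counts are hypothetical and are not taken from the paper or its analysis code.

```python
import math


def proportion_standard_error(successes: int, total: int) -> tuple[float, float]:
    """Return (fraction, standard error) for a binomial proportion.

    Assumes the textbook estimator sqrt(p * (1 - p) / n).
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= successes <= total:
        raise ValueError("successes must lie between 0 and total")
    p = successes / total
    se = math.sqrt(p * (1 - p) / total)
    return p, se


# Hypothetical example: 40 of 200 successfully tested pairs score positive.
fraction, se = proportion_standard_error(40, 200)
print(f"recovery = {fraction:.3f} +/- {se:.3f}")
```

Note that this estimator collapses to zero when the observed fraction is 0 or 1, which may be one reason an interval-based measure is preferred for small categories.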
a, b, Fraction of PPIs with N or C terminus <10 Å (a) or <20 Å (b) to PPI interface, for PPIs with known structure in and not in HuRI (n = 37–1,891 PPIs). Error bars are standard error of proportion. The structure of UBE2D3 bound to RNF115 illustrates an example of a PPI found only by Y2H assay version 3 (PDB code 5ULH). c, MAPPIT recovery rates of HuRI and Lit-BM PPIs that were also detected in HuRI by the number of screens each pair was detected in. Error bars are 68.3% Bayesian confidence interval (n = 22–793 PPIs successfully tested in each category). d, MAPPIT recovery rates of Lit-BM PPIs that were also detected in HuRI, for increasing number of pieces of experimental evidence per PPI. Error bars are 68.3% Bayesian confidence interval (n = 24–61 PPIs successfully tested in each category). e, f, Distributions of interaction interface area (e) or number of atomic contacts (f) by the number of HuRI screens in which a PPI is detected, with box plots showing median, interquartile range (IQR), and 1.5× IQR (with outliers); n = 1,004 PPIs. g, Left, examples of within-complex interactions detected in HuRI (purple) and BioPlex (orange). Right, fraction of HuRI PPIs between proteins of protein complexes that link proteins of the same complex, split by PPIs found in single and multiple screens (dark purple). Error bars are standard error of proportion; n = 1,042 and 775 PPIs, for single and multiple screens, respectively. h, Number of screens each PPI in HuRI was detected in, split by Y2H assay version. i, Number of Y2H assay versions each PPI in HuRI was detected in. j, Estimates of the size of the total binary protein interactome and the fraction covered by HuRI, right and left, respectively, as a function of the minimum number of publications per gene and the minimum number of pieces of evidence for the Lit-BM reference. Error bands are 68.3% Bayesian confidence interval; n ≥ 170 Lit-BM PPIs.
Intra-complex PPIs are shown for protein complexes from CORUM as found in BioPlex (orange) or HuRI (purple). HuRI PPIs are further distinguished into PPIs found in single (light purple) or multiple screens (dark purple).
a, Examples of protein pairs in HuRI with high interaction profile similarity and both high (left) and low (right) sequence identity. b, The number of pairs of proteins in HuRI and 100 random networks at increasing Jaccard similarity cutoffs. Box plots are as in Extended Data Fig. 3e. c, Enrichment over random networks of the sum of Jaccard similarities of pairs of proteins in HuRI at increasing thresholds of sequence identity. Error bars are 95% confidence intervals, centre is relative to mean of random networks. d, Fraction of PSN edges that are also PPIs in HuRI, split by the PPIs involving no, one or two self-interacting proteins (SIPs), at increasing Jaccard similarity cutoffs. Error bars are standard error of proportion. e, f, Enrichment over random networks of the PPI count (left) or sum of Jaccard similarities (right) of HuRI PPIs or PSN pairs, respectively, at increasing co-expression (e) and co-fitness (f) cutoffs. Error bars are 95% confidence interval, centre is relative to mean of random networks. g, Functional modules in HuRI (top) and its PSN (bottom) with functional annotations. h, Heat maps of PPI counts, ordered by number of publications, for our previous human interactome maps and Lit-BM. i, Fraction of genes with at least one PPI for biomedically interesting genes. j, Heat maps of HuRI and Lit-BM PPI counts between proteins, ordered by number of publications, restricted to PPIs involving genes from the corresponding gene set. k, Schematic of relation between variables: observed PPI degree, abundance, number of publications, and lethality. l, Correlation matrices. PPI datasets refer to their network degree. m, Degree distribution of various PPI networks. n, Empirical determination of significance of correlation between various network degrees and gene properties. HuRI-2s, subset of HuRI found in at least two screens. n = 13,441–53,704 PPIs per network.
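The interaction profile similarity referred to in this caption is quantified with the Jaccard index. A minimal sketch of the idea follows, assuming that a protein's interaction profile is simply its set of direct partners and that pairs at or above a cutoff are kept. The function names, the toy edge list and the cutoff are hypothetical, and the paper's own pipeline may treat details such as self-interactions differently.

```python
from collections import defaultdict
from itertools import combinations


def interaction_profiles(ppis):
    """Map each protein to the set of its direct interaction partners."""
    partners = defaultdict(set)
    for a, b in ppis:
        partners[a].add(b)
        partners[b].add(a)
    return dict(partners)


def jaccard(a, b):
    """Jaccard index of two sets: shared members over all members."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def profile_similarity_pairs(ppis, cutoff):
    """Return {(p, q): score} for protein pairs with Jaccard score >= cutoff."""
    partners = interaction_profiles(ppis)
    pairs = {}
    for p, q in combinations(sorted(partners), 2):
        score = jaccard(partners[p], partners[q])
        if score >= cutoff:
            pairs[(p, q)] = score
    return pairs


# Hypothetical toy network: A and B share partners X and Y.
toy_ppis = [("A", "X"), ("A", "Y"), ("B", "X"), ("B", "Y"), ("B", "Z"), ("C", "Z")]
psn = profile_similarity_pairs(toy_ppis, cutoff=0.5)
for pair, score in sorted(psn.items()):
    print(pair, round(score, 3))
```

Comparing every pair of proteins is quadratic in network size, so a network of HuRI's scale would in practice call for restricting the comparison to proteins that share at least one partner.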
a, Odds ratios of proteins in different subcellular compartments and PPI datasets. n = 125–3,941 proteins per compartment, two-tailed Fisher’s exact test. b, The subnetwork of HuRI involving extracellular vesicle proteins. Names of high-degree proteins are shown. c, Number of PPIs in HuRI between extracellular vesicle proteins (purple arrow) compared to the distribution from randomized networks (grey). d, Western blots of SDCBP (left) and ACTB (loading control, right) in wild-type and three knockout (KO) cell lines (#7–#9), repeated twice in two independent laboratories. The full scanned image, obtained with a ChemiDoc MP imager (Bio-Rad), is displayed. Cell line #8 was used for extracellular vesicle proteomics. e, Fraction of proteins in which abundance in extracellular vesicles was significantly reduced in the SDCBP-knockout cell line, split by proteins interacting and not interacting with SDCBP as identified in HuRI. Error bars are standard error of proportion (n = 6 interactors, 638 non-interactors, *P = 0.042, one-tailed empirical test). f, Schematic illustrating that the number of HuRI PPIs between proteins from two different compartments should correlate with the enrichment of both compartment pairs to overlap, if co-localization annotation is incomplete. g, Scatter plot showing, for each pair of subcellular compartments, odds ratios quantifying the enrichment for proteins located in both compartments versus the enrichment of the density of PPIs between proteins located to either compartment. Size of points is scaled by the standard error of the x axis variable. Regression line and 95% confidence interval are shown. h, The z-score of the regression slope of g compared to those of random networks.
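Panel a reports odds ratios assessed with a two-tailed Fisher’s exact test. As an illustrative sketch only, the test on a single 2×2 contingency table can be written with the standard library as below. The table layout, the function name and the counts are assumptions for illustration. The two-tailed P value here sums the probabilities of all tables that are no more likely than the observed one, which is a common convention but not necessarily the one used in the paper.

```python
from math import comb


def fisher_exact_two_tailed(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Fisher's exact test on the 2x2 table [[a, b], [c, d]].

    Returns (sample odds ratio, two-tailed P value).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Small tolerance so ties with the observed table are counted despite rounding.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(p_value, 1.0)


# Hypothetical counts: rows = in dataset / not in dataset,
# columns = in compartment / not in compartment.
odds_ratio, p_value = fisher_exact_two_tailed(3, 1, 1, 3)
print(odds_ratio, round(p_value, 4))
```

For tables with counts in the thousands, an established statistics library would be a more practical choice than this direct enumeration.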
a, Examples of genes displaying different levels of TiP gene expression across the GTEx tissue panel (left). Box plots are as in Extended Data Fig. 3e. n = 90–779 samples per tissue. Equation to calculate tissue-preferential expression for every gene–tissue pair and the maximum TiP value for every gene (middle). Number of genes showing tissue-preferential expression for increasing tissue-preferential expression cutoffs (right). b, Relative number of TiP genes for every tissue for increasing tissue-preferential expression cutoffs. c, d, Differences in number of TiP genes after removal of testis before TiP value calculation per tissue (TiP value cutoff = 2) (c) and in total for increasing tissue-preferential expression cutoffs (d). e, Number of TiP genes and number of TiP genes that are also exclusively expressed in one tissue for increasing tissue-preferential expression cutoffs. sglTis, single tissue.
Extended Data Fig. 8 PPIs between TiP proteins and uniformly expressed proteins likely adapt basic cellular processes to mediate cellular context-specific functions.
a, TiP protein coverage by CCSB PPI networks for increasing levels of tissue-preferential expression. Shaded error bars are proportional to standard error of proportion, n ≥ 233 genes. b, Spearman correlation coefficients and 95% confidence intervals for correlations between degree or betweenness and tissue specificity for HuRI and Lit-BM (n = 6,684 and 4,971 proteins). c, Fraction of HuRI and Lit-BM that involve TiP proteins compared to fraction of genome that are TiP genes for increasing levels of tissue-preferential expression. d, Number of PPIs in HuRI, involving proteins in GTEx, in which both proteins are expressed in the same tissue, and the mean of the tissue-specific subnetworks. Error bar denotes s.d. e, Test for enrichment of TiP–TiP PPIs (left) and significance of average shortest path between TiP proteins (middle) in each tissue subnetwork, number of TiP proteins in each subnetwork, interacting with other TiP proteins, being part of keratin (KRT) or late-cornified envelope (LCE) protein family (right). f, g, Transcript expression levels across the BLUEPRINT haematopoietic cell lineage (f) and GTEx tissue panel (g) for three candidate genes predicted to function in apoptosis. EG, oesophagus gastroesophageal. h, Histogram of number of untransfected cells and their time of death (left) without (top) and with (bottom) addition of TRAIL. Time of death of cells expressing OTUD6A–GFP fusions versus OTUD6A expression measured as fluorescence (right) without (top) and with (bottom) addition of TRAIL. i, Apoptosis-related network context of OTUD6A and C6ORF222 in HuRI, unfiltered (left) and filtered using colon transverse or mature eosinophil transcript levels (right).
a, Histogram of the number of Mendelian diseases showing symptoms in several tissues. b, Test for enrichment of causal proteins associated with tissue-specific Mendelian diseases to interact with TiP proteins of affected tissues. c, Network neighbourhood of uniformly expressed causal proteins of tissue-specific diseases found to interact with TiP proteins in HuRI, indicating PPI perturbation by mutations. d, Causal genes split by mutation found to perturb PPI to TiP protein (dashed) or not (solid). e, Expression profile of PNKP and interactors in brain tissues and PPI perturbation pattern of disease causing (Glu326Lys) and benign (Pro20Ser) mutation. Yeast growth phenotypes on SC-Leu-Trp (top) or SC-Leu-Trp-His+3AT media (bottom) are shown; green or grey protein symbols denote preferentially or not expressed, respectively.
Extended Data Fig. 10 Mutations in uniformly expressed causal proteins associated with tissue-specific Mendelian diseases perturb interactions to TiP proteins.
Expression profile and interaction perturbation profile of nine causal proteins and their interaction partners. Top, affected tissues were selected for display. Middle, control of activation domain and Gal4 DNA-binding domain plasmid presence and cell density by spotting yeast colonies on SC-Leu-Trp media. Bottom, detection of PPIs by spotting yeast on SC-Leu-Trp-His+3AT media, in which yeast growth indicates PPIs. Red letters denote causal proteins; grey protein symbols denote interaction partners not expressed in affected tissues; black and grey alleles denote pathogenic and not pathogenic, respectively; green protein symbols denote TiP interaction partners in affected tissues.
This file contains methods, supplementary notes and supplementary references.
This file contains all supplementary tables 1-29, which were referenced in the manuscript.
This file contains a Supplementary Tables Guide 1-29.
Cite this article
Luck, K., Kim, D., Lambourne, L. et al. A reference map of the human binary protein interactome. Nature 580, 402–408 (2020). https://doi.org/10.1038/s41586-020-2188-x