The cell is a multi-scale structure with modular organization across at least four orders of magnitude1. Two central approaches for mapping this structure—protein fluorescent imaging and protein biophysical association—each generate extensive datasets, but of distinct qualities and resolutions that are typically treated separately2,3. Here we integrate immunofluorescence images in the Human Protein Atlas4 with affinity purifications in BioPlex5 to create a unified hierarchical map of human cell architecture. Integration is achieved by configuring each approach as a general measure of protein distance, then calibrating the two measures using machine learning. The map, known as the multi-scale integrated cell (MuSIC 1.0), resolves 69 subcellular systems, of which approximately half are to our knowledge undocumented. Accordingly, we perform 134 additional affinity purifications and validate subunit associations for the majority of systems. The map reveals a pre-ribosomal RNA processing assembly and accessory factors, which we show govern rRNA maturation, and functional roles for SRRM1 and FAM120C in chromatin and RPS3A in splicing. By integration across scales, MuSIC increases the resolution of imaging while giving protein interactions a spatial dimension, paving the way to incorporate diverse types of data in proteome-wide cell maps.
A web portal is available at http://nrnb.org/music with links to all major resources used for this study. These include the MuSIC map (https://doi.org/10.18119/N9188W); the immunofluorescence (HPA) and AP–MS data (BioPlex 2.0) on which the map is based; and data for the AP–MS pull-down experiments performed as follow-up. The new AP–MS data have also been included as part of the larger compendium of protein interactions in the next version of the BioPlex resource (BioPlex 3.029). AP–MS data, including filtered and unfiltered interaction lists as well as raw mass spectrometry data, are also available at http://bioplex.hms.harvard.edu. The image data and associated metadata can also be found in the HPA database (https://www.proteinatlas.org). The Gene Expression Omnibus (GEO) accession number for eCLIP data generated in this study is GSE171553. Source data are provided with this paper.
The MuSIC pipeline is available at https://github.com/idekerlab/MuSIC along with a detailed step-by-step guide to building a MuSIC map.
Harold, F. M. Molecules into cells: specifying spatial architecture. Microbiol. Mol. Biol. Rev. 69, 544–564 (2005).
Mori, H. & Cardiff, R. D. Methods of immunohistochemistry and immunofluorescence: converting invisible to visible. In The Tumor Microenvironment, Methods in Molecular Biology Vol. 1458 (eds Ursini-Siegel, J. & Beauchemin, N.) 1–12 (Humana Press, 2016).
Aebersold, R. & Mann, M. Mass-spectrometric exploration of proteome structure and function. Nature 537, 347–355 (2016).
Thul, P. J. et al. A subcellular map of the human proteome. Science 356, eaal3321 (2017).
Huttlin, E. L. et al. Architecture of the human interactome defines protein communities and disease networks. Nature 545, 505–509 (2017).
Schaffer, L. V. & Ideker, T. Mapping the multiscale structure of biological systems. Cell Syst. 12, 622–635 (2021).
Ouyang, W. et al. Analysis of the Human Protein Atlas Image Classification competition. Nat. Methods 16, 1254–1261 (2019).
Grover, A. & Leskovec, J. node2vec: scalable feature learning for networks. In KDD ’16: Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 855–864 (2016).
Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning Vol. 1 (MIT Press, 2016).
Fortunato, S. & Hric, D. Community detection in networks: a user guide. Phys. Rep. 659, 1–44 (2016).
Go, C. D. et al. A proximity-dependent biotinylation map of a human cell. Nature 595, 120–124 (2021).
Meyers, R. M. et al. Computational correction of copy number effect improves specificity of CRISPR–Cas9 essentiality screens in cancer cells. Nat. Genet. 49, 1779–1784 (2017).
Deckert, J. et al. Protein composition and electron microscopy structure of affinity-purified human spliceosomal B complexes isolated under physiological conditions. Mol. Cell. Biol. 26, 5528–5543 (2006).
Charenton, C., Wilkinson, M. E. & Nagai, K. Mechanism of 5′ splice site transfer for human spliceosome activation. Science 364, 362–367 (2019).
Yoshikatsu, Y. et al. NVL2, a nucleolar AAA-ATPase, is associated with the nuclear exosome and is involved in pre-rRNA processing. Biochem. Biophys. Res. Commun. 464, 780–786 (2015).
Chaudhuri, S. et al. Human ribosomal protein L13a is dispensable for canonical ribosome function but indispensable for efficient rRNA methylation. RNA 13, 2224–2237 (2007).
Tafforeau, L. et al. The complexity of human ribosome biogenesis revealed by systematic nucleolar screening of pre-rRNA processing factors. Mol. Cell 51, 539–551 (2013).
Eppens, N. A. et al. Deletions in the S1 domain of Rrp5p cause processing at a novel site in ITS1 of yeast pre-rRNA that depends on Rex4p. Nucleic Acids Res. 30, 4222–4231 (2002).
De Silva, D., Tu, Y.-T., Amunts, A., Fontanesi, F. & Barrientos, A. Mitochondrial ribosome assembly in health and disease. Cell Cycle 14, 2226–2250 (2015).
Blencowe, B. J. et al. The SRm160/300 splicing coactivator subunits. RNA 6, 111–120 (2000).
The UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 47, D506–D515 (2019).
Pavan Kumar, P. et al. Phosphorylation of SATB1, a global gene regulator, acts as a molecular switch regulating its transcriptional activity in vivo. Mol. Cell 22, 231–243 (2006).
Pomeranz Krummel, D. A., Oubridge, C., Leung, A. K. W., Li, J. & Nagai, K. Crystal structure of human spliceosomal U1 snRNP at 5.5 Å resolution. Nature 458, 475–480 (2009).
Fleckner, J., Zhang, M., Valcárcel, J. & Green, M. R. U2AF65 recruits a novel human DEAD box protein required for the U2 snRNP-branchpoint interaction. Genes Dev. 11, 1864–1872 (1997).
Van Nostrand, E. L. et al. A large-scale binding and functional map of human RNA-binding proteins. Nature 583, 711–719 (2020).
Van Nostrand, E. L. et al. Robust, cost-effective profiling of RNA binding protein targets with single-end enhanced crosslinking and immunoprecipitation (seCLIP). In mRNA Processing, Methods in Molecular Biology Vol. 1648 (ed. Shi, Y.) 177–200 (Humana Press, 2017).
Stryer, L. Fluorescence energy transfer as a spectroscopic ruler. Annu. Rev. Biochem. 47, 819–846 (1978).
Wang, T. et al. Gene essentiality profiling reveals gene networks and synthetic lethal interactions with oncogenic Ras. Cell 168, 890–903 (2017).
Huttlin, E. L. et al. Dual proteome-scale networks reveal cell-specific remodeling of the human interactome. Cell 184, 3022–3040 (2021).
Williams, S. G. & Hall, K. B. Human U2B″ protein binding to snRNA stemloops. Biophys. Chem. 159, 82–89 (2011).
Huang, G., Liu, Z., van der Maaten, L. & Weinberger, K. Q. Densely connected convolutional networks. Preprint at https://arxiv.org/abs/1608.06993 (2016).
Nusinow, D. P. et al. Quantitative proteomics of the Cancer Cell Line Encyclopedia. Cell 180, 387–402 (2020).
We thank C. Ng, A. Palmer, Q. Zhang, Y. Quan, members of the laboratories of T.I. and E.L., the Human Protein Atlas and J. Swedlow for discussion and comments; M. Dow for helping us to improve the MuSIC GitHub repository and test the MuSIC pipeline; and the Cell Profiling facility and C. Stadler at the Science for Life Laboratory for help with in situ fractionation. This work was supported by the National Institutes of Health (NIH) under grants U54 CA209891, U01 MH115747, P41 GM103504 and R01 HG009979 to T.I., F99 CA264422 to Y.Q., U24 HG006673 to E.L.H., S.P.G. and J.W.H., U41 HG009889 and R01s HL137223 and HG004659 to G.W.Y. and R50 CA243885 to J.F.K.; by a gift from Google Ventures to J.W.H. and S.P.G.; by the Erling-Persson family foundation, Knut and Alice Wallenberg Foundation (2016.0204) and the Swedish Research Council (2017-05327) to E.L.; and by the Belgian Fonds de la Recherche Scientifique (F.R.S./FNRS), the Université Libre de Bruxelles (ULB), the European Joint Programme on Rare Diseases (‘RiboEurope’ and ‘DBAcure’), the Région Wallonne (SPW EER) (‘RIBOcancer’), the Internationale Brachet Stiftung and the Epitran COST action (CA16120) to D.L.J.L.
T.I. is a co-founder of Data4Cure, is on the Scientific Advisory Board and has an equity interest. T.I. is on the Scientific Advisory Board of Ideaya BioSciences and has an equity interest. G.W.Y. is a co-founder, a member of the Board of Directors, on the Scientific Advisory Board, an equity holder and a paid consultant for Locanabio and Eclipse BioInnovations. G.W.Y. is a visiting professor at the National University of Singapore. The terms of these arrangements have been reviewed and approved by the University of California San Diego in accordance with its conflict-of-interest policies. E.L. is on the Scientific Advisory Boards of Cartography Biosciences, Nautilus Biotechnology and Interline Therapeutics, and has an equity interest in all of these. J.W.H. is a co-founder of Caraway Therapeutics, is on the Scientific Advisory Board and has an equity interest. J.W.H. is Founding Scientific Advisor for Interline Therapeutics.
Peer review information Nature thanks Jason Swedlow and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
a, Histogram showing distribution in number of antibodies per protein over 661 proteins included in MuSIC. b, Histogram showing distribution in antibody quality scores over antibodies used in this study. c, Immunofluorescence images for alternative antibodies (columns) targeting the same protein (rows). Colours represent immunostained protein (green), cytoskeleton (red), or nucleus (blue). Images show high reproducibility for different antibodies against the same protein. d, Comparison of localizations for proteins in MuSIC (HEK293 cells, red) versus all proteins assayed by HPA in any cell line (grey). Localizations as defined by the HPA project4.
a, Embedding immunofluorescence (IF) images using DenseNet. The 1024-dimension feature vector for each IF image was extracted from a DenseNet-12131 model trained to classify the IF image into one or several of 28 pre-defined protein localization classes from HPA. b, Two-dimensional visualization (UMAP, n_neighbours = 5) for the 1,451 image embeddings associated with the 661 proteins in MuSIC. c, Ability of different image embedding methods (coloured curves) to generate image-image similarities (cosine similarity) in agreement with protein-protein interactions in BioPlex 2.0. d, Node2vec8 workflow. The feature vector generated by node2vec captures the pattern of interaction neighbourhood for the respective node in input network. e, Embedding AP–MS data using node2vec. The input network to node2vec was constructed by treating each protein as a node and assigning edges between protein pairs that were identified as physically interacting in the AP–MS data. The two-dimensional visualization (UMAP, n_neighbours = 5) for AP–MS embeddings associated with 661 proteins in MuSIC is shown at right. f, Network showing all proteins (grey) that physically interact with SNRPC and SNRPB2 (blue) in BioPlex 2.0. SNRPC and SNRPB2 do not physically interact, but the cosine similarity of their embedded features is 0.93 due to shared interaction neighbourhood. In many cases of two proteins with high node2vec similarity but without direct interaction in AP–MS data, we found that neither protein had yet been tagged as bait for an affinity purification experiment. In these cases, the node2vec embedding suggests gaps in existing AP–MS data. g, Ability of different AP–MS embedding methods to generate protein-protein similarities (cosine similarity) in agreement with protein pairwise similarities computed from HPA images.
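The observation in panel f, that two proteins can score as highly similar without interacting directly, can be illustrated with a minimal sketch. It is not the MuSIC pipeline: the protein names and edges are hypothetical, and each protein is represented by the 0/1 row of its direct interaction partners, a crude stand-in for a learned node2vec embedding.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def neighbourhood_vectors(edges):
    """Map each node to a 0/1 vector marking its direct interaction partners."""
    nodes = sorted({n for edge in edges for n in edge})
    index = {n: i for i, n in enumerate(nodes)}
    vectors = {n: [0] * len(nodes) for n in nodes}
    for a, b in edges:
        vectors[a][index[b]] = 1
        vectors[b][index[a]] = 1
    return vectors

# Hypothetical toy network: P1 and P2 share partners X, Y and Z
# but have no edge between them; P3 sits in a separate neighbourhood.
toy_edges = [("P1", "X"), ("P1", "Y"), ("P1", "Z"),
             ("P2", "X"), ("P2", "Y"), ("P2", "Z"),
             ("P3", "W")]
vectors = neighbourhood_vectors(toy_edges)
sim_shared = cosine_similarity(vectors["P1"], vectors["P2"])
sim_unrelated = cosine_similarity(vectors["P1"], vectors["P3"])
print(sim_shared, sim_unrelated)
```

In this toy case P1 and P2 receive a high similarity purely from their shared partners, which is the kind of signal the caption interprets as a possible gap in existing AP–MS data.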
a, b, Protein pairs ranked by similarity in AP–MS embedding enrich for the most similar protein pairs in IF (a), and vice versa (b). c, Calibrating physical diameter, D, of subcellular components against the number of proteins, C, assigned to the corresponding Gene Ontology (GO) terms. d, Supervised model (random forest) estimates physical proximity (nm) of all pairs of proteins from their IF and AP–MS embeddings. e, Performance of model in recovering protein-protein distances in GO in five-fold cross validation (red, Pearson’s r). Equivalent calculation for random feature sets (grey). Statistics calculated using two-sided paired t-test. Data are presented as mean values +/- standard deviation.
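The calibration in panel c relates a component's diameter, D, to the number of proteins, C, annotated to it. A minimal sketch of one way such a calibration could be fitted is given here. It assumes a power-law relationship (a straight line in log–log space), which is an assumption of this sketch rather than something stated in the caption, and it uses synthetic values, not the literature collection in the Supplementary Tables.

```python
import math

def fit_loglog(counts, diameters):
    """Least-squares fit of log10(D) = slope * log10(C) + intercept."""
    xs = [math.log10(c) for c in counts]
    ys = [math.log10(d) for d in diameters]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict_diameter(count, slope, intercept):
    """Diameter predicted for a component with `count` proteins."""
    return 10 ** (slope * math.log10(count) + intercept)

# Synthetic toy values generated from D = 10 * C**0.5 (not measured data).
toy_counts = [1, 4, 16, 64, 256]
toy_diameters = [10 * c ** 0.5 for c in toy_counts]
slope, intercept = fit_loglog(toy_counts, toy_diameters)
predicted = predict_diameter(100, slope, intercept)
print(round(slope, 3), round(intercept, 3), round(predicted, 1))
```

A fitted curve of this kind would give each training pair of proteins a target distance in nanometres, which a supervised model such as the random forest in panel d could then learn to predict from the embeddings.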
a, Using multi-scale community detection, protein systems of increasing sizes are discovered as the threshold for protein-protein distance is progressively increased. b, CliXO community detection has four parameters (depth α, y-axis; breadth β, x-axis; minimum modularity m and modularity significance z, red circle-backslash symbol) that affect the sensitivity with which communities are identified and thus the size of the hierarchy. c, d, Dot plots in which each dot is a community hierarchy generated with a particular set of parameters. The selection for MuSIC is highlighted in red. This selection was among several that were optimal based on enrichment for protein-protein interactions in Human Cell Map (c) and co-essentialities from DepMap (d). Examples of other parameter sets are shown in blue. e, Map from Fig. 2 with system colour showing enrichment for co-essentialities among protein pairs that are specific to that system. Enrichment of each system is assessed empirically, using 1,000 randomized hierarchies, followed by Benjamini–Hochberg multiple test correction to obtain FDR (orange gradient).
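The idea in panel a, that larger systems emerge as the distance threshold is raised, can be sketched with a deliberately simplified stand-in for CliXO: connected components of the graph that links every protein pair closer than the threshold. The proteins, distances and thresholds are hypothetical, and CliXO itself is more elaborate than this.

```python
def communities_at_threshold(distances, threshold):
    """Connected components of the graph linking pairs with distance <= threshold."""
    nodes = sorted({n for pair in distances for n in pair})
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent pointers to the component root, shortening the path on the way.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for (a, b), d in distances.items():
        if d <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())

# Hypothetical pairwise distances (nm) among five toy proteins;
# pairs that are not listed are treated as unlinked.
toy_distances = {
    ("A", "B"): 10, ("C", "D"): 15,
    ("A", "C"): 100, ("B", "D"): 120,
    ("A", "E"): 900, ("D", "E"): 1000,
}
hierarchy = {t: communities_at_threshold(toy_distances, t) for t in (20, 150, 1000)}
for t, communities in hierarchy.items():
    print(t, communities)
```

Each community found at a small threshold is contained in one found at a larger threshold, so the successive partitions nest into a hierarchy.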
a, Distributions of protein-protein distance z-scores among the seven proteins in the PRRPA system for IF (top, red) or AP–MS (bottom, blue) modalities, calibrated to all such distances, respectively (grey). Statistics calculated using one-sided Mann–Whitney U test. b, Specific recovery of new AP–MS interactions within PRRPA is shown (dark blue bar), in comparison to interactions between proteins in PRRPA and other proteins organized under the same parent systems (“Ribosome” and “Ribosome biogenesis assembly”, light blue bar), or between proteins in PRRPA and those organized elsewhere in MuSIC (grey bar). c, Mature 28S/18S rRNA ratio under siRNAs targeting each PRRPA protein (green) versus scrambled siRNA (grey), n = 3 biological replicates. FDR from two-sided t-test with Benjamini–Hochberg correction. Data are presented as mean values +/- standard deviation. d–i, Western blot analysis (d, e, Simple western assay; f–i, SDS–PAGE) of target protein abundance after treating HEK293T cells with respective siRNA for 72 h (Supplementary Tables 6, 7). The siRNAs highlighted in red were selected to assess the perturbation of mature rRNA ratio (28S/18S rRNA) when knocking down target protein, with protein knockdown efficiency confirmed using western blot in three additional biological replicates. For source data, see Supplementary Fig. 1 (gel; d–i) and Supplementary Fig. 2 (total RNA profiles; c).
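The one-sided Mann–Whitney U test in panel a asks whether distances within a system tend to be smaller than distances in general. A minimal sketch of that comparison is given here, under stated assumptions: the values are hypothetical, the tested direction (within-system distances smaller) is assumed, and the P value uses a normal approximation without tie or continuity correction, which is only appropriate for reasonably large samples.

```python
import math
from statistics import NormalDist

def mann_whitney_u_less(sample, background):
    """U statistic counting (sample, background) pairs where the sample value is smaller.

    Ties count as half a pair. Returns (U, approximate one-sided P value).
    """
    u = 0.0
    for x in sample:
        for y in background:
            if x < y:
                u += 1.0
            elif x == y:
                u += 0.5
    n1, n2 = len(sample), len(background)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p_value = 1 - NormalDist().cdf(z)  # large U supports "sample is smaller"
    return u, p_value

# Hypothetical distance z-scores: within one system versus all other pairs.
within_system = [1, 2, 3]
all_pairs = [4, 5, 6, 7]
u_stat, p_approx = mann_whitney_u_less(within_system, all_pairs)
print(u_stat, round(p_approx, 4))
```

For real analyses an established implementation with exact small-sample handling would be preferable to this approximation.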
a, Categorization of proteins in “Ribosome biogenesis community” by whether they have been previously identified in human ribosome biogenesis. Excludes PRRPA proteins described in Fig. 5b–d. b, Structure of human pre-rRNA and probes used for northern blot. In eukaryotes, 3 out of 4 mature rRNAs (18S, 5.8S, and 28S rRNAs) are produced from a single long polycistronic precursor (47S) synthesized by RNA polymerase I. The mature rRNAs are interspersed with the 5′ and 3′ external transcribed spacers (ETS) and internal transcribed spacer (ITS) 1 and 2. The probes used in the northern blot (5′-ETS, ITS1, and ITS2) are indicated and colour-coded. c, Total RNA extracted from the indicated cell line, which was transfected with a DsiRNA specific to the target protein for 72 h and analysed by northern blotting with probes specific to the 5′-ETS, ITS1, and ITS2 sequences (Supplementary Table 8). As controls, cells were either untreated, transfected with a scrambled silencer, or transfected with a silencer targeting UTP18 (positive control involved in small ribosomal subunit biogenesis). Heat map colour shows the percentage of each pre-rRNA species with respect to the scramble control. For gel source data, see Supplementary Fig. 1. d, For protein baits in new AP–MS experiments (x axis), fraction of interacting preys that fall within the Ribosome biogenesis community (blue bars) versus elsewhere (grey bars). Only new AP–MS interactions are considered for this analysis. RNPS1 does not belong to Ribosome biogenesis community and serves as a negative control. e, IF images showing similar cytoplasmic staining for proteins in “Mito-cyto ribosomal cluster.” Cytoplasmic staining is dim for MRPS9, MRPS14 and MRPS31 compared to their predominant mitochondrial locations. Colours represent immunostained protein (green), cytoskeleton (red) and nucleus (blue). f, g, Corresponding distributions of protein-protein distance z-scores for IF (f, red) or AP–MS (g, blue), calibrated to all such distances, respectively (grey). Statistics calculated using one-sided Mann–Whitney U test. h, Two-dimensional projection of proteins in Mito-cyto ribosomal cluster, as in Fig. 5f. Proteins coloured according to known affiliations to cytoplasmic ribosome or mitochondrial ribosome. i, Validated AP–MS interactions in Mito-cyto ribosomal cluster. Note that only one out of seven proteins was previously tagged as bait in BioPlex 2.0 (light blue node), thus most physical associations (dark blue edges) among protein pairs were newly identified in this study.
a, IF images showing similar nucleoplasm and nuclear speckles signals among proteins in the “Chromatin regulation complex.” Colours represent immunostained protein (green) and cytoskeleton (red). b, Distributions of pairwise protein distance z-scores among the proteins in the Chromatin regulation complex for IF (top, red) or AP–MS (bottom, blue) modalities, calibrated to all such distances, respectively (grey). Statistics calculated using one-sided Mann–Whitney U test. c, Immunofluorescent proteins (rows) imaged in HEK293 cells, untreated (left) or treated (right) with in situ fractionation to remove soluble cytoplasmic and loosely held nuclear proteins. Chromatin-binding proteins remain after treatment. Blue, nucleus; other colours as in a. For image source data, see Supplementary Fig. 3. d, IF images showing similar nucleoplasm signals among proteins in “RNA splicing complex 3.” e, Similar display for RNA splicing complex 3 as in b. f, Comparison of 500 top differentially expressed mRNAs (absolute fold change) resulting from shRNA knockdown of each of five genes (see Supplementary Table 9 for file accessions). Bar chart shows number of differential mRNAs shared by different gene groups indicated by black dots beneath each bar. One-sided one-sample t-test. g, Comparison among the top 10 pathways (Gene Ontology Biological Process) returned from Gene Set Enrichment Analysis using the top 500 differentially expressed transcripts. Bar chart shows number of enriched pathways shared by different gene groups indicated by black dots beneath each bar. One-sided one-sample t-test. h, eCLIP workflow. RBP, RNA-binding protein. NGS, next generation sequencing.
a, b, Examples of proteins with strong AP–MS protein interactions that have very different IF localization patterns. Colours represent immunostained protein (green) and cytoskeleton (red). c, Degree of co-essentiality for gene pairs within PRRPA (teal bar) shown in comparison to remaining pairs of genes assigned to the more general system that contains it, “Ribosome biogenesis community” (green bar), as well as all other gene pairs in MuSIC (grey bar). d, Similar analysis as in (c) for “RNA splicing complex 3.” Parent systems are “RNA processing complex 1” and “RNA splicing complex family.” e, Protein co-abundance for MuSIC systems, calculated from the median Pearson correlation of pairwise protein abundance over 375 diverse cell lines32. The plot shows all systems with fewer than 20 proteins and co-abundance measurements for >50% of protein pairs. Significance is assessed empirically (one-sided), using 1,000 randomized MuSIC hierarchies, followed by Benjamini–Hochberg multiple test correction to obtain FDR (colour of bar). Protein co-abundance for a system provides evidence for its presence in cell types beyond HEK293.
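The significance procedure in panel e, an empirical one-sided test against randomized hierarchies followed by Benjamini–Hochberg correction, can be sketched as follows. This is a minimal illustration with hypothetical scores. The add-one correction in the empirical P value is an assumption of this sketch, and the function names are not taken from the MuSIC code.

```python
def empirical_p_value(observed, null_scores):
    """One-sided empirical P value: share of randomized scores >= observed.

    The +1 in numerator and denominator (an assumption here) avoids P = 0.
    """
    exceed = sum(1 for s in null_scores if s >= observed)
    return (exceed + 1) / (len(null_scores) + 1)

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values (FDR) in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical co-abundance score for one system and scores from four randomizations.
p_single = empirical_p_value(5, [1, 2, 3, 6])

# Hypothetical P values for four systems, corrected jointly.
fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
print(p_single, [round(q, 4) for q in fdr])
```

With 1,000 randomizations, as in the caption, the smallest P value this estimator can return is 1/1001, which bounds how small the resulting FDR values can be.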
This file consists of three supplementary figures and provides the source gel data (Supplementary Figure 1), total RNA profiles (Supplementary Figure 2) and source in situ fractionation data (Supplementary Figure 3).
This file contains Supplementary Methods and Supplementary References.
MuSIC proteins and associated data.
Literature collection of subcellular components used for calibrating physical diameter (related to Extended Data Fig. 3c).
MuSIC systems and associated data.
Literature collection of subcellular components used for validating MuSIC estimated diameter (related to Fig. 3b).
866 reproducible and significant (IDR cut-off of 0.01, Fisher’s Exact test P ≤ 0.001, fold enrichment ≥8) eCLIP peaks of RPS3A.
Sequences of siRNA and DsiRNA used in this study.
Antibodies used in this study.
Sequences of northern blot probes used for pre-rRNA analysis (related to Fig. 5e and Extended Data Fig. 6b, c).
ENCODE file accessions used for RNA-seq analysis (related to Extended Data Fig. 7f, g).
About this article
Cite this article
Qin, Y., Huttlin, E.L., Winsnes, C.F. et al. A multi-scale map of cell structure fusing protein images and interactions. Nature (2021). https://doi.org/10.1038/s41586-021-04115-9