Abstract
The cell cycle, over which cells grow and divide, is a fundamental process of life. Its dysregulation has devastating consequences, including cancer (refs. 1–3). The cell cycle is driven by precise regulation of proteins in time and space, which creates variability between individual proliferating cells. To our knowledge, no systematic investigations of such cell-to-cell proteomic variability exist. Here we present a comprehensive, spatiotemporal map of human proteomic heterogeneity by integrating proteomics at subcellular resolution with single-cell transcriptomics and precise temporal measurements of individual cells in the cell cycle. We show that around one-fifth of the human proteome displays cell-to-cell variability, identify hundreds of proteins with previously unknown associations with mitosis and the cell cycle, and provide evidence that several of these proteins have oncogenic functions. Our results show that cell cycle progression explains less than half of all cell-to-cell variability, and that most cycling proteins are regulated post-translationally, rather than by transcriptomic cycling. These proteins are disproportionately phosphorylated by kinases that regulate cell fate, whereas non-cycling proteins that vary between cells are more likely to be modified by kinases that regulate metabolism. This spatially resolved proteomic map of the cell cycle is integrated into the Human Protein Atlas and will serve as a resource for accelerating molecular studies of the human cell cycle and cell proliferation.
Data availability
HPA images, including the images from the FUCCI screening, are available as .jpg files from the HPA website (v20, https://www.proteinatlas.org). Uncompressed images were annotated using IDR metadata templates and deposited in the BioImage Archive (accession S-BIAD34, https://www.ebi.ac.uk/biostudies/BioImages/studies/S-BIAD34). The bulk RNA-seq data for tissue samples are available at www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-2836/. The single-cell RNA-seq data are available at GEO with accession GSE146773. HPA Cell Atlas imaging, transcriptome, and proteome data, along with interpretation and classification, are available in the HPA (www.proteinatlas.org/humancell). Source data are provided with this paper.
Code availability
The CellProfiler pipeline for image analysis and the code for generating the polar-coordinate pseudotime model are available at https://github.com/CellProfiling/SingleCellProteogenomics, and the single-cell sequencing quantification pipeline is available at https://github.com/CellProfiling/FucciSingleCellSeqPipeline.
Change history
05 August 2022
A Correction to this paper has been published: https://doi.org/10.1038/s41586-022-05180-4
References
Malumbres, M. & Barbacid, M. Cell cycle, CDKs and cancer: a changing paradigm. Nat. Rev. Cancer 9, 153–166 (2009).
Massagué, J. G1 cell-cycle control and cancer. Nature 432, 298–306 (2004).
Hartwell, L. H. & Kastan, M. B. Cell cycle control and cancer. Science 266, 1821–1828 (1994).
Barnum, K. J. & O’Connell, M. J. in Cell Cycle Control Vol. 1170 (eds Noguchi, E. & Gadaleta, M. C.), 29–40 (Springer, 2014).
Weinberg, R. A. The retinoblastoma protein and cell cycle control. Cell 81, 323–330 (1995).
Morgan, D. O. Principles of CDK regulation. Nature 374, 131–134 (1995).
Teixeira, L. K. & Reed, S. I. Ubiquitin ligases and cell cycle control. Annu. Rev. Biochem. 82, 387–414 (2013).
King, R. W., Deshaies, R. J., Peters, J. M. & Kirschner, M. W. How proteolysis drives the cell cycle. Science 274, 1652–1659 (1996).
Cho, R. J. et al. Transcriptional regulation and function during the human cell cycle. Nat. Genet. 27, 48–54 (2001).
Whitfield, M. L. et al. Identification of genes periodically expressed in the human cell cycle and their expression in tumors. Mol. Biol. Cell 13, 1977–2000 (2002).
Boström, J. et al. Comparative cell cycle transcriptomics reveals synchronization of developmental transcription factor networks in cancer cells. PLoS One 12, e0188772 (2017).
Lane, K. R. et al. Cell cycle-regulated protein abundance changes in synchronously proliferating HeLa cells include regulation of pre-mRNA splicing proteins. PLoS One 8, e58456 (2013).
Ohta, S. et al. The protein composition of mitotic chromosomes determined using multiclassifier combinatorial proteomics. Cell 142, 810–821 (2010).
Ly, T. et al. A proteomic chronology of gene expression through the cell cycle in human myeloid leukemia cells. eLife 3, e01630 (2014).
Pagliuca, F. W. et al. Quantitative proteomics reveals the basis for the biochemical specificity of the cell-cycle machinery. Mol. Cell 43, 406–417 (2011).
Ly, T., Endo, A. & Lamond, A. I. Proteomic analysis of the response to cell cycle arrests in human myeloid leukemia cells. eLife 4, e04534 (2015).
Karlsson, J., Kroneis, T., Jonasson, E., Larsson, E. & Ståhlberg, A. Transcriptomic characterization of the human cell cycle in individual unsynchronized cells. J. Mol. Biol. 429, 3909–3924 (2017).
Scialdone, A. et al. Computational assignment of cell-cycle stage from single-cell transcriptome data. Methods 85, 54–61 (2015).
Bar-Joseph, Z. et al. Genome-wide transcriptional analysis of the human cell cycle identifies genes differentially regulated in normal and cancer cells. Proc. Natl Acad. Sci. USA 105, 955–960 (2008).
Dominguez, D. et al. A high-resolution transcriptome map of cell cycle reveals novel connections between periodic genes and cancer. Cell Res. 26, 946–962 (2016).
Grant, G. D. et al. Identification of cell cycle-regulated genes periodically expressed in U2OS cells and their regulation by FOXM1 and E2F transcription factors. Mol. Biol. Cell 24, 3634–3650 (2013).
Peña-Diaz, J. et al. Transcription profiling during the cell cycle shows that a subset of Polycomb-targeted genes is upregulated during DNA replication. Nucleic Acids Res. 41, 2846–2856 (2013).
Cooper, S. et al. Membrane-elution analysis of content of cyclins A, B1, and E during the unperturbed mammalian cell cycle. Cell Div. 2, 28 (2007).
Davis, P. K., Ho, A. & Dowdy, S. F. Biological methods for cell-cycle synchronization of mammalian cells. Biotechniques 30, 1322–1331 (2001).
Sakaue-Sawano, A. et al. Visualizing spatiotemporal dynamics of multicellular cell-cycle progression. Cell 132, 487–498 (2008).
Zielke, N. & Edgar, B. A. FUCCI sensors: powerful new tools for analysis of cell proliferation. Wiley Interdiscip. Rev. Dev. Biol. 4, 469–487 (2015).
Thul, P. J. et al. A subcellular map of the human proteome. Science 356, eaal3321 (2017).
Uhlen, M. et al. Towards a knowledge-based Human Protein Atlas. Nat. Biotechnol. 28, 1248–1250 (2010).
Suzuki, C. et al. ANLN plays a critical role in human lung carcinogenesis through the activation of RHOA and by involvement in the phosphoinositide 3-kinase/AKT pathway. Cancer Res. 65, 11314–11325 (2005).
Shaffer, S. M. et al. Rare cell variability and drug-induced reprogramming as a mode of cancer drug resistance. Nature 546, 431–435 (2017).
Collins, E. J. et al. Post-transcriptional circadian regulation in macrophages organizes temporally distinct immunometabolic states. Genome Res. (in the press).
Robles, M. S., Cox, J. & Mann, M. In-vivo quantitative proteomics reveals a key contribution of post-transcriptional mechanisms to the circadian regulation of liver metabolism. PLoS Genet. 10, e1004047 (2014).
Fischer, M. et al. p53 and cell cycle dependent transcription of kinesin family member 23 (KIF23) is controlled via a CHR promoter element bound by DREAM and MMB complexes. PLoS One 8, e63187 (2013).
Varjosalo, M. et al. The protein interaction landscape of the human CMGC kinase group. Cell Rep. 3, 1306–1320 (2013).
Pearce, L. R., Komander, D. & Alessi, D. R. The nuts and bolts of AGC protein kinases. Nat. Rev. Mol. Cell Biol. 11, 9–22 (2010).
Wright, P. E. & Dyson, H. J. Intrinsically disordered proteins in cellular signalling and regulation. Nat. Rev. Mol. Cell Biol. 16, 18–29 (2015).
Sellers, K. et al. Pyruvate carboxylase is critical for non-small-cell lung cancer proliferation. J. Clin. Invest. 125, 687–698 (2015).
Oyinlade, O. et al. Targeting UDP-α-d-glucose 6-dehydrogenase inhibits glioblastoma growth and migration. Oncogene 37, 2615–2629 (2018).
The Cancer Genome Atlas Research Network. The Cancer Genome Atlas pan-cancer analysis project. Nat. Genet. 45, 1113–1120 (2013).
Uhlen, M. et al. A pathology atlas of the human cancer transcriptome. Science 357, eaan2507 (2017).
Nilsson, P. et al. Towards a human proteome atlas: high-throughput generation of mono-specific antibodies for tissue profiling. Proteomics 5, 4327–4337 (2005).
Uhlen, M. et al. A proposal for validation of antibodies. Nat. Methods 13, 823–827 (2016).
Edfors, F. et al. Enhanced validation of antibodies for research applications. Nat. Commun. 9, 4130 (2018).
Stadler, C., Skogs, M., Brismar, H., Uhlén, M. & Lundberg, E. A single fixation protocol for proteome-wide immunofluorescence localization studies. J. Proteomics 73, 1067–1078 (2010).
Williams, E. et al. The Image Data Resource: a bioimage data integration and publication platform. Nat. Methods 14, 775–781 (2017).
Carpenter, A. E. et al. CellProfiler: image analysis software for identifying and quantifying cell phenotypes. Genome Biol. 7, R100 (2006).
Picelli, S. et al. Smart-seq2 for sensitive full-length transcriptome profiling in single cells. Nat. Methods 10, 1096–1098 (2013).
The External RNA Controls Consortium. The External RNA Controls Consortium: a progress report. Nat. Methods 2, 731–734 (2005).
Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Köster, J. & Rahmann, S. Snakemake-a scalable bioinformatics workflow engine. Bioinformatics 34, 3600 (2018).
Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).
McInnes, L., Healy, J. & Melville, J. UMAP: uniform manifold approximation and projection for dimension reduction. Preprint at ArXiv https://arxiv.org/abs/1802.03426 (2018).
Traag, V. A., Waltman, L. & van Eck, N. J. From Louvain to Leiden: guaranteeing well-connected communities. Sci. Rep. 9, 5233 (2019).
Ietswaart, R., Gyori, B. M., Bachman, J. A., Sorger, P. K. & Churchman, L. S. GeneWalk identifies relevant gene functions for a biological context using network representation learning. Preprint at bioRxiv https://doi.org/10.1101/755579 (2019).
La Manno, G. et al. RNA velocity of single cells. Nature 560, 494–498 (2018).
Bergen, V., Lange, M., Peidli, S., Wolf, F. A. & Theis, F. J. Generalizing RNA velocity to transient cell states through dynamical modelling. Nat. Biotechnol. 38, 1408–1414 (2020).
Tirosh, I. et al. Dissecting the multicellular ecosystem of metastatic melanoma by single-cell RNA-seq. Science 352, 189–196 (2016).
Hsiao, C. J. et al. Characterizing and inferring quantitative cell cycle phase in single-cell RNA-seq data analysis. Genome Res. 30, 611–621 (2020).
Talevich, E., Shain, A. H., Botton, T. & Bastian, B. C. CNVkit: genome-wide copy number detection and visualization from targeted DNA sequencing. PLOS Comput. Biol. 12, e1004873 (2016).
Talevich, E. & Shain, A. H. CNVkit-RNA. Copy number inference from RNA-sequencing data. Preprint at bioRxiv https://doi.org/10.1101/408534 (2018).
Ashburner, M. et al. Gene ontology: tool for the unification of biology. Nat. Genet. 25, 25–29 (2000).
Dennis, G. Jr et al. DAVID: database for annotation, visualization, and integrated discovery. Genome Biol. 4, 3 (2003).
Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).
Merico, D., Isserlin, R., Stueker, O., Emili, A. & Bader, G. D. Enrichment map: a network-based method for gene-set enrichment visualization and interpretation. PLoS One 5, e13984 (2010).
Eden, E., Navon, R., Steinfeld, I., Lipson, D. & Yakhini, Z. GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC Bioinformatics 10, 48 (2009).
Santos, A., Wernersson, R. & Jensen, L. J. Cyclebase 3.0: a multi-organism database on cell-cycle regulation and phenotypes. Nucleic Acids Res. 43, D1140–D1144 (2015).
Croft, D. et al. Reactome: a database of reactions, pathways and biological processes. Nucleic Acids Res. 39, D691–D697 (2011).
Binns, D. et al. QuickGO: a web-based tool for Gene Ontology searching. Bioinformatics 25, 3045–3046 (2009).
Szklarczyk, D. et al. STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 43, D447–D452 (2015).
Cowley, G. S. et al. Parallel genome-scale loss of function screens in 216 cancer cell lines for the identification of context-specific genetic dependencies. Sci. Data 1, 140035 (2014).
Trapnell, C., Pachter, L. & Salzberg, S. L. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25, 1105–1111 (2009).
Bray, N. L., Pimentel, H., Melsted, P. & Pachter, L. Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 34, 525–527 (2016).
Jarzab, A. et al. Meltome atlas-thermal proteome stability across the tree of life. Nat. Methods 17, 495–503 (2020).
UniProt Consortium. UniProt: a hub for protein information. Nucleic Acids Res. 43, D204–D212 (2015).
Mészáros, B., Erdos, G. & Dosztányi, Z. IUPred2A: context-dependent prediction of protein disorder as a function of redox state and protein binding. Nucleic Acids Res. 46, W329–W337 (2018).
Hornbeck, P. V. et al. PhosphoSitePlus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse. Nucleic Acids Res. 40, D261–D270 (2012).
Eid, S., Turk, S., Volkamer, A., Rippmann, F. & Fulle, S. KinMap: a web-based tool for interactive navigation through human kinome data. BMC Bioinformatics 18, 16 (2017).
Kampf, C., Olsson, I., Ryberg, U., Sjöstedt, E. & Pontén, F. Production of tissue microarrays, immunohistochemistry staining and digitalization within the human protein atlas. JoVE 63, 3620 (2012).
Leonetti, M. D., Sekine, S., Kamiyama, D., Weissman, J. S. & Huang, B. A scalable strategy for high-throughput GFP tagging of endogenous human proteins. Proc. Natl Acad. Sci. USA 113, E3501–E3508 (2016).
Feng, S. et al. Improved split fluorescent proteins for endogenous protein labeling. Nat. Commun. 8, 370 (2017).
Pinello, L. et al. Analyzing CRISPR genome-editing experiments with CRISPResso. Nat. Biotechnol. 34, 695–697 (2016).
Kruskal, W. H. & Wallis, W. A. Use of ranks in one-criterion variance analysis. J. Am. Stat. Assoc. 47, 583–621 (1952).
Hunter, J. D. Matplotlib: A 2D graphics environment. Comput. Sci. Eng. 9, 90–95 (2007).
Oliphant, T. E. Python for scientific computing. Comput. Sci. Eng. 9, 10–20 (2007).
Harris, C. R. et al. Array programming with NumPy. Nature 585, 357–362 (2020).
Semple, J. W. et al. An essential role for Orc6 in DNA replication through maintenance of pre-replicative complexes. EMBO J. 25, 5150–5158 (2006).
Izumi, M. et al. The Mcm2-7-interacting domain of human mini-chromosome maintenance 10 (Mcm10) protein is important for stable chromatin association and origin firing. J. Biol. Chem. 292, 13008–13021 (2017).
Li, J. et al. ZNF32 contributes to the induction of multidrug resistance by regulating TGF-β receptor 2 signaling in lung adenocarcinoma. Cell Death Dis. 7, e2428 (2016).
St-Denis, N. et al. Phenotypic and interaction profiling of the human phosphatases identifies diverse mitotic regulators. Cell Rep. 17, 2488–2501 (2016).
Tran, P. V. Dysfunction of intraflagellar transport proteins beyond the primary cilium. J. Am. Soc. Nephrol. 25, 2385–2386 (2014).
Xu, Y. et al. Effect of estrogen sulfation by SULT1E1 and PAPSS on the development of estrogen-dependent cancers. Cancer Sci. 103, 1000–1009 (2012).
Acknowledgements
We acknowledge the entire staff of the HPA program. We acknowledge S. Ito and H. Masai for providing the stable U2OS FUCCI cell line; the Eukaryotic Single Cell Genomics (ESCG) facility at SciLifeLab for single-cell sequencing; M. Otrocka for access to imaging infrastructure; S. Besson and F. Wong for help with annotating the imaging data using IDR metadata templates (https://idr.openmicroscopy.org); P. Ranefall and C. Wählby for providing support for establishing the CellProfiler pipeline; A. Kundaje for providing support at Stanford University; and L. M. Smith for providing access to computational resources. Funding was provided by the Knut and Alice Wallenberg Foundation (2016.0204) and the Swedish Research Council (2017-05327) to E.L.
Author information
Contributions
E.L. conceived the study. D.M., D.P.S., and E.L. developed the methodology for the study. D.M., L.S., R.S., C.G., and P.T. carried out the immunofluorescent experimental work and D.M., F.J., and U.A. contributed to the cell atlas implementation. C.G., N.H.C., and M.D.L. performed the GFP tagging and analysis. D.M., A.B., and C.S. performed the siRNA antibody validation and growth assays. D.M., A.J.C., D.P.S., and E.L. carried out protein imaging data analysis and investigation. A.J.C. performed scRNA-seq, proteogenomic, protein disorder, and upstream kinase enrichment analyses. T.L. reviewed the code and wrote the algorithm method sections. F.D. analysed the bulk RNA-seq data. M.A., C.Z., and A.M. carried out the gene expression association analysis. C.G. performed the yeast homologue analysis. C.L. and F.P. provided the tissue data. D.M., A.J.C., and E.L. wrote the manuscript. B.A., D.P.S., O.C., U.A., and P.T. revised the manuscript. D.M., A.J.C., C.G., and D.P.S. created the figures. M.U. initiated the HPA project and provided antibodies. E.L. supervised and administered the project and acquired funding.
Ethics declarations
Competing interests
M.U. is a co-founder of Atlas Antibodies. The other authors declare that they have no conflict of interest.
Additional information
Peer review information Nature thanks the anonymous reviewers for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 Validation examples of CCD and non-CCD proteins using co-localization with mNG-tagged proteins and siRNA gene silencing.
a, The specificity of several dozen antibodies (45 in total with 10 examples presented here; Supplementary Table 3) targeting proteins with CCD expression was validated by co-localization with mNG-tagged protein. Scale bars, 10 μm. b, The specificity of several dozen antibodies (79 in total with 10 examples presented here; Supplementary Table 3) targeting proteins with non-CCD variable expression was validated by co-localization with mNG-tagged protein. Scale bars, 10 μm. c, The specificity of 29 antibodies (three presented here; Figs. 1c, 4b, Supplementary Table 3) was validated with siRNA-mediated gene silencing, which resulted in significantly lower staining intensity. Scale bars, 10 μm. For box plots: centre line, median; box, Q1 and Q3; whiskers, 1.5× IQR below Q1 and above Q3; points, outliers. TTC21B, P = 6.3 × 10^−31; DUSP19, P = 0.0025; NET1, P = 9.7 × 10^−8 by binomial one-sided tests. TTC21B, n = 620, 358; DUSP19, n = 125, 147; NET1, n = 162, 149 independent cells for control and siRNA, respectively. d, The specificity of 321 antibodies was validated by independent antibody staining, as exemplified by SCIN. Scale bars, 10 μm.
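The box plot convention stated in this legend (centre line at the median, box from Q1 to Q3, whiskers reaching 1.5× IQR beyond the box, remaining points drawn as outliers) can be illustrated with a minimal Python sketch. The function name `box_plot_stats` is hypothetical, and two details are assumptions rather than statements from the paper: quartiles use linear interpolation, and whiskers end at the most extreme data point inside the 1.5× IQR fences.

```python
import statistics


def box_plot_stats(values):
    """Summary statistics for a conventional box plot (illustrative only)."""
    data = sorted(values)
    # Assumption: quartiles by linear interpolation between data points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Assumption: whiskers stop at the last data point inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


example = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])
print(example["outliers"])  # [50]
```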
Extended Data Fig. 2 Time-lapses of cell cycle regulated proteins.
a, The specificity of the anti-ANLN antibody was shown by co-localization with mNG-tagged ANLN. Scale bars, 10 μm. b, Time-lapse microscopy for mNG-tagged ANLN protein demonstrates an increase in nuclear expression over the course of the cell cycle with a peak in late G2. During mitosis and cytokinesis, ANLN localizes to the cell membrane and is involved in the formation of the cleavage furrow (t = 17–18 h and t = 35 h). Scale bar, 10 μm. c, The specificity of the anti-AURKB antibody was validated by co-localization with mNG-tagged AURKB. Scale bars, 10 μm. d, Time-lapse microscopy for mNG-tagged AURKB protein demonstrates an increase in protein expression over the course of the cell cycle with a peak in late G2. During mitosis, AURKB localizes to the mitotic chromosomes at hours 13 and 29. Scale bars, 10 μm. e, The specificity of the anti-GATAD1 antibody was validated by co-localization with mNG-tagged GATAD1. Scale bars, 10 μm. f, Time-lapse microscopy for mNG-tagged GATAD1 protein demonstrates a decrease in nuclear abundance throughout the cell cycle (mitosis at 35 h). Scale bars, 10 μm. g, Time-lapse microscopy for mNG-tagged KIFC1 protein demonstrates a CCD translocation from the nucleoplasm to the kinetochores in late G2 into mitosis at hours 14 and 30 and in early G1. Scale bars, 10 μm. h, Time-lapse microscopy for mNG-tagged NET1 protein demonstrates a CCD increase in protein abundance and translocation from the nucleoplasm to the nucleolus (mitosis at 35 h). Scale bars, 10 μm.
Extended Data Fig. 3 Flow cytometry of FUCCI cells.
Gating strategy used to sort U2OS FUCCI cells into three clusters representing three distinct segments of the cell cycle. This strategy distinguished the cells expressing the red tag (CDT1, G1) from the cells expressing the green tag (GMNN, S and G2). Cells that expressed both markers simultaneously, marking the S-transition, were clustered in a third group. The RNA velocities calculated for these cells (Fig. 2c, Extended Data Fig. 7a, b) validate this sorting strategy, as a cell cycle reflecting interphase progression is clearly observed.
Extended Data Fig. 4 Variation of distribution, bimodality, and cell cycle dependence in distinct high- and low-expressing cell populations.
a, Scatterplot showing the three clusters generated by k-means clustering based on kurtosis and skewness as features for CCD proteins. b, Scatterplot showing the three clusters generated by k-means clustering based on kurtosis and skewness as features for non-CCD proteins. c, Violin plots and histograms showing the population distributions of the normalized mean intensity of each cell per protein for three selected CCD proteins (GATA6, CCNB1, and DEF6). d, Bimodal protein distributions were evaluated for cell cycle dependence separately in both low- and high-expressing cells if the two populations were determined to be distinct. This determination was performed using a Kruskal–Wallis test, adjusted for multiple testing, and if they had greater than a twofold difference in expression between them. e, GATA6 expression over the cell cycle. While GATA6 produces a bimodal population intensity distribution, it represents a single population of cells and was evaluated for cell cycle dependence as such, rather than as two populations of high- and low-expressing cells. f, SLC25A42 expression over the cell cycle is exhibited as two distinct populations of high- and low-expressing cells (left). The low-expressing cells (centre) have CCD expression, forming a second harmonic over the cell cycle, while the high-expressing cells (right) do not display correlation to the cell cycle. g, HPSE expression over the cell cycle has two distinct populations of high- and low-expressing cells that both display non-CCD expression, which may point to a temporal component related to a different cellular process.
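The rule in panel d (two populations are treated as distinct if a Kruskal–Wallis test separates them and their expression differs by more than twofold) can be sketched as follows. This is a simplified illustration, not the authors' pipeline: the function names are hypothetical, the fold difference is taken between medians (an assumption), the significance level `alpha` is an assumed value, and the multiple-testing adjustment mentioned in the legend is omitted.

```python
import math
import statistics


def kruskal_wallis_two_groups(a, b):
    """Kruskal-Wallis H test for two groups; returns (H, p_value)."""
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    n = len(pooled)
    rank_sums = [0.0, 0.0]
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of the 1-based ranks i+1 .. j.
        avg_rank = (i + 1 + j) / 2
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    correction = 1 - tie_term / (n**3 - n)
    if correction == 0:  # every value identical: no evidence of a difference
        return 0.0, 1.0
    h = 12 / (n * (n + 1)) * (
        rank_sums[0] ** 2 / len(a) + rank_sums[1] ** 2 / len(b)
    ) - 3 * (n + 1)
    h = max(h / correction, 0.0)
    # Chi-square survival function with one degree of freedom.
    return h, math.erfc(math.sqrt(h / 2))


def populations_are_distinct(low, high, alpha=0.01, min_fold=2.0):
    """Assumed rule: significant test AND more than twofold median difference."""
    _, p_value = kruskal_wallis_two_groups(low, high)
    fold = statistics.median(high) / statistics.median(low)  # needs median(low) > 0
    return p_value < alpha and fold > min_fold


low_cells = [float(v) for v in range(1, 11)]
high_cells = [float(v) for v in range(30, 40)]
print(populations_are_distinct(low_cells, high_cells))  # True
```

In a proteome-wide screen the P values would be adjusted for multiple testing before the threshold is applied, as the legend states.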
Extended Data Fig. 5 Validation of the FUCCI cell model.
a, Images from a time-lapse microscopy series of FUCCI cells show that the length of the cell cycle phases marked by the different colours of the FUCCI tags are 10.8 h for G1, 2.6 h for S-transition, and 11.9 h for G2. U2OS FUCCI cells allow monitoring of the cell cycle by expressing two fluorescently tagged cell cycle markers: CDT1 during G1 phase (red), GMNN during S and G2 phases (green), and both during the S-transition (yellow). Scale bars, 10 μm. b, The polar coordinate model transfers the FUCCI marker information into a linear model of pseudotime. The tagged CDT1 (red) and GMNN (green) log-intensities are displayed across the pseudotime that is calculated from them. The solid line is the moving average across 100 cells (264,863 total), and light shading fills between the 25th and 75th percentiles of intensity. c, Cell division time can also be predicted from scRNA-seq data, and so we performed a comparison between the pseudotime calculated using FUCCI marker intensities in this work (x-axis, colour mapping) and the pseudotime inferred from scRNA-seq markers using the tool Peco (ref. 59; y-axis). The most notable feature of this comparison is the more continuous nature of the pseudotime calculated using FUCCI marker intensities, indicating that the experimental FUCCI marker intensities provide a better basis than inferred pseudotime for the integration of single-cell proteomic and transcriptomic data. d, CNVs identified in single-cell transcriptomic data correlate to the cell cycle, thus validating the FUCCI model. e, BrdU incorporation (blue) into FUCCI cells. Incorporation of the synthetic nucleotide BrdU occurs during DNA replication in S-phase of proliferating cells and remains through the rest of the cell cycle. Here, G1 cells with red nuclei show no incorporation of BrdU, whereas most of the S and G2 cells with green nuclei show incorporation of BrdU (bottom left and centre; G1, n = 45; S-trans, n = 32; S and G2, n = 73 independent cells).
For box plots: centre line, median; box, Q1 and Q3; whiskers, 1.5× IQR below Q1 and above Q3; points, outliers. Based on this, we can estimate the proportion of cells in each phase (bottom right). These results show the validity of the FUCCI model. Scale bar, 10 μm.
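The idea behind the polar-coordinate model in panel b can be illustrated with a minimal Python sketch: each cell's (GMNN, CDT1) log-intensity pair is converted to an angle around a centre point, and the angle is rescaled to a fraction of the cycle. This is not the authors' implementation (which is in the repository named under Code availability). The function name, the use of the mean as the centre, the starting angle, and the uniform mapping from angle to hours are all assumptions made for illustration; only the phase durations come from panel a.

```python
import math

# Phase durations from panel a: G1 + S-transition + G2 (hours).
CYCLE_HOURS = 10.8 + 2.6 + 11.9


def polar_pseudotime(log_green, log_red, start_angle=-0.75 * math.pi):
    """Fraction of the cycle (0 to 1) for each cell, from FUCCI log-intensities.

    Assumptions: the centre is the mean of each channel, and the cycle starts
    where both markers are low (start_angle points to the lower left).
    """
    centre_green = sum(log_green) / len(log_green)
    centre_red = sum(log_red) / len(log_red)
    fractions = []
    for green, red in zip(log_green, log_red):
        angle = math.atan2(red - centre_red, green - centre_green)
        # With green on x and red on y, cells travel clockwise:
        # red only (G1) -> both (S-transition) -> green only (G2).
        fractions.append(((start_angle - angle) % (2 * math.pi)) / (2 * math.pi))
    return fractions


# Four toy cells: G1, S-transition, G2, and a cell just before division.
toy_green = [0.0, 1.0, 1.0, 0.1]
toy_red = [1.0, 1.0, 0.0, 0.0]
toy_fractions = polar_pseudotime(toy_green, toy_red)
# Assumption: uniform angular speed when converting to hours.
toy_hours = [f * CYCLE_HOURS for f in toy_fractions]
print([round(h, 1) for h in toy_hours])
```

A moving average over cells sorted by this pseudotime, as drawn in panel b, would then summarize each marker's trend across the cycle.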
Extended Data Fig. 6 Validation of variability observed with antibody staining.
a, Box plot showing gene expression levels for all proteins that exhibit cell-to-cell heterogeneity and all proteins in the HPA Cell Atlas (mapped proteome). At a significance threshold of P < 0.01, there is no significant difference in gene expression levels between the antibodies included in this study and all antibodies used in the HPA Cell Atlas (P = 0.03, two-sample Student’s t-test; variation, n = 1,607; mapped proteome, n = 9,806 independent proteins). b, Bar plot showing staining intensity levels for antibodies included in this study and all proteins in the HPA Cell Atlas (mapped proteome). The average immunofluorescence signal intensity measured for proteins showing cell-to-cell variation and for all proteins in the HPA Cell Atlas shows no significant difference for low signal antibodies. Low signal antibodies are not enriched among the cell-to-cell variability dataset (strong, P = 1.12 × 10^−5; moderate, P = 0.0004; weak, P = 0.99 by binomial one-sided tests; asterisk denotes significance). c, Since the HPA antibodies are purified on affinity columns coupled with their corresponding antigen, antibody concentration after purification can serve as a proxy for antibody affinity (albeit not a perfect proxy). Box plot showing antibody concentration levels for all proteins that showed cell-to-cell heterogeneity and all proteins in the HPA Cell Atlas (mapped proteome). There is no significant difference between the antibodies included in this study and all antibodies used in the HPA Cell Atlas, hence we can conclude that the cell-to-cell heterogeneity is probably not due to differences in antibody affinity. The average antibody concentration for the antibodies published on the HPA Cell Atlas is 0.1710 mg ml^−1, and the average concentration for the antibodies used in this study is 0.1712 mg ml^−1 (P = 0.1084, two-sample Student’s t-test; variation, n = 1,415; mapped proteome, n = 14,942 independent proteins).
d, Box plot showing the variance for all proteins that showed cell-to-cell heterogeneity and microtubules. Proteins showing heterogeneity show significantly higher variance than the variance of the microtubules measured from each well (P = 1.6 × 10^−292, one-sided Kruskal–Wallis test; n = 1,180 independent IF stains). e, Box plot showing the Gini index for all proteins that showed cell-to-cell heterogeneity and for microtubules. Proteins displaying heterogeneity show significantly higher Gini indexes than microtubules (P < 5 × 10^−324, one-sided Kruskal–Wallis test; n = 1,180 independent IF stains). f, Gene ontology enrichment analysis for CCD proteins shows significantly enriched terms for the biological processes domain. Each circle represents a GO term, and line width corresponds to the number of genes that overlap between the two connected gene sets. Similar terms are grouped and labelled. g, GO enrichment analysis for non-CCD enzymes shows enrichment for basic metabolic functions, while CCD enzymes are enriched for cell cycle functions. h, Subcellular localizations of CCD and non-CCD proteins. Asterisk denotes enrichment relative to the HPA-mapped proteome. For CCD proteins: cleavage furrow, P = 0.0003; cytokinetic bridge, P = 4.3 × 10^−93; kinetochores, P = 3.5 × 10^−5; midbody ring, P = 2.3 × 10^−15; midbody, P = 9 × 10^−36; mitotic chromosome, P = 9.8 × 10^−66; mitotic spindle, P = 6.3 × 10^−41; nucleoli, P = 0.005. For non-CCD proteins: intermediate filaments, P = 1.9 × 10^−8; mitochondria, P = 1.9 × 10^−11; nuclear bodies, P = 3.01 × 10^−5; nucleoli, P = 8.8 × 10^−7. i, Protein–protein interaction (PPI) network of CCD proteins and CCD transcripts using the STRING database (ref. 70). Proteins with known associations to the cell cycle (by GO term, Reactome pathway, or Cyclebase phenotype) are represented as green squares (left).
Transcriptionally regulated CCD proteins are tightly clustered in the centre of the network, with an extended network of novel CCD genes and particularly ones that are not transcriptionally regulated (right). For box plots: centre line, median; box, Q1 and Q3; whiskers, 1.5× IQR below Q1 and above Q3; points, outliers.
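Panel e compares Gini indexes between variable proteins and microtubules. As a reminder of what this statistic measures, the following minimal Python sketch computes the Gini index of a set of non-negative per-cell intensities: it is 0 when every cell has the same intensity and approaches 1 when the signal is concentrated in a few cells. The function name is hypothetical and the paper's own implementation may differ in detail.

```python
def gini_index(values):
    """Gini index of non-negative values (0 = perfectly even)."""
    data = sorted(values)
    n = len(data)
    total = sum(data)
    if n == 0 or total == 0:
        return 0.0
    # Standard rank-weighted formula on the sorted values.
    weighted = sum((rank + 1) * v for rank, v in enumerate(data))
    return 2 * weighted / (n * total) - (n + 1) / n


print(gini_index([1, 1, 1, 1]))  # 0.0
print(gini_index([0, 0, 0, 1]))  # 0.75
```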
Extended Data Fig. 7 Filtering of non-cycling cells for RNA-seq analysis, selection of percentage variance cutoffs, and UMAP stability analysis.
a, Dimensionality reduction of scRNA-seq data by UMAP with RNA velocities overlaid in arrows and coloured according to clustering. Each point represents a cell, and cluster 5 (brown) appears to be cells that are sequestered from cycling after G1. b, Testing cutoffs of between 1% and 10% additional percentage variance explained by the cell cycle in scRNA-seq data over random shows that 8% is an appropriate cutoff. At cutoffs 1–3%, the cycle is completely removed from non-CCD proteins, but it appears that there are enough false-positive calls to impact the structure of the UMAP for CCD-only genes. At 4–8%, there is a small eddy in the UMAP for non-CCD genes but no cohesive cycle, and at 9–10%, there is a distinct cycle encompassing all cells. We chose to use a conservative final cutoff of 8% for calling both CCD transcripts and proteins to err on the side of having fewer false positives while balancing precision and recall for these analyses. c, Confirmation that the UMAP structure for scRNA-seq data is stable across various parameter selections. d, BIRC5 and UBE2C provide anecdotal examples of cycling genes (orange) with differential cycling between one or more transcript isoforms (green). Line, moving mean; darker shade, 25th to 75th percentile range; lighter shade, 10th to 90th percentile range; points, individual cell data.
Extended Data Fig. 8 Illustrations of percentage variance analysis, comparison to LASSO, and batch effect analysis.
a, Percentage variance explained for RNA is shown against the Gini index for the gene (blue, CCD; red, non-CCD). b, Randomization analysis of the protein IF data (left) and scRNA-seq data (right) for each gene was used to determine whether a protein or RNA was CCD (blue) or non-CCD (red). The significance scores, adjusted for multiple testing, on the vertical axis show that nearly all proteins and RNAs are significantly different from random, so the requirement of 8% additional percentage variance explained by the cell cycle over random was the predominant criterion. c, Examples of NFAT5 protein IF data (blue points and trace) and randomizations of cell order in pseudotime (red points and trace). The examples shown (from left to right) produced the minimum, first quartile, median, third quartile and maximum percentage variance explained by the random fluctuations. d, LASSO analysis of marker genes was overly conservative compared to the pseudotime analysis in this work. A higher false-negative rate for calling CCD genes (top) and proteins (bottom) leaves the cyclic UMAP pattern expected of CCD genes and proteins (left) visible among the non-CCD ones (right). e, Principal component analysis showed no discernible batch effect between the three plates with 384 cells each, and instead the cell cycle phases roughly assigned by FACS sorting provide clear separation in the first two principal components (PCs). Results are shown before and after filtering out the non-cycling cells. f, Comparing the individual batches to the combined data for RNA-seq again confirms that no batch effects were present in the RNA-seq data. Each plot contains relative RNA expression (0 to 1, y-axes) versus cell division time (0 to 25.3 h, x-axes). Line, moving mean; darker shade, 25th to 75th percentile range; lighter shade, 10th to 90th percentile range; points, individual cell data.
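The randomization analysis in panels b and c can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the periodic moving-mean window, the number of permutations and the use of the mean of the random scores are all hypothetical choices; only the idea of comparing variance explained by pseudotime order against shuffled cell orders, and the 8% cutoff, come from the legend.

```python
import numpy as np


def moving_mean(values, window):
    """Centred moving mean with wrap-around (assumes a periodic cycle)."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    pad = window // 2
    padded = np.pad(values, pad, mode="wrap")
    return np.convolve(padded, np.ones(window) / window, mode="valid")


def percent_variance_explained(values, window):
    """Percentage of total variance captured by the moving-mean trend."""
    total_var = np.var(values)
    if total_var == 0:
        return 0.0
    residual_var = np.var(values - moving_mean(values, window))
    return 100.0 * (1.0 - residual_var / total_var)


def additional_variance_over_random(pseudotime, intensity, window=21,
                                    n_perm=200, seed=0):
    """Variance explained in pseudotime order minus the mean over shuffles."""
    rng = np.random.default_rng(seed)
    ordered = np.asarray(intensity, dtype=float)[np.argsort(pseudotime)]
    observed = percent_variance_explained(ordered, window)
    # Shuffling cell order destroys any relationship with pseudotime.
    random_scores = np.array([
        percent_variance_explained(rng.permutation(ordered), window)
        for _ in range(n_perm)
    ])
    return observed - random_scores.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    t = rng.uniform(0.0, 1.0, 500)                       # synthetic pseudotime
    cycling = np.sin(2 * np.pi * t) + rng.normal(0, 0.3, 500)
    flat = rng.normal(0, 1.0, 500)
    cutoff = 8.0  # percent, as stated in the legend
    print(additional_variance_over_random(t, cycling) > cutoff)
    print(additional_variance_over_random(t, flat) > cutoff)
```

In this sketch a gene or protein would be called CCD only when its additional percentage variance exceeds the cutoff, which mirrors the conservative 8% threshold described above.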
Extended Data Fig. 9 Similarity of temporal expression profiles indicates associations in time.
Bottom, examples of temporal expression profiles for single-cell protein (blue) and RNA expression (orange). Line, moving mean; darker shade, 25th to 75th percentile range; lighter shade, 10th to 90th percentile range; points, individual cell data. The compartment for which the protein abundance was measured is denoted above: nuc, nucleus; cyto, cytoplasm; cell, entire cell. The proteins of interest are shown in green, and microtubules are shown in red. Similar temporal profiles between proteins indicate compartmentalization and association in time. For the G1 group, well-known CCD proteins such as ORC6, which is required for cell entry into S phase87, and MCM10, which is required for DNA replication88, were found to have similar patterns to the novel CCD protein ZNF32, whose overexpression has been associated with a shorter survival time in lung adenocarcinoma cells89. The group that has peak expression at the end of G1 includes known proteins such as CCNE1, along with the novel CCD protein DUSP19 (Fig. 3), a phosphatase whose depletion results in increased mitotic defects90. The G2 group includes known proteins, such as PC (Fig. 3), CCNB1, AURKB (Extended Data Fig. 2c, d) and BUB1B. Novel CCD proteins in this group include the phosphatase DUSP18 (Fig. 3); the transcription factor NFAT5 (Fig. 1F); a retrograde intraflagellar transport (IFT) protein TTC21B (Fig. 3), which has been implicated in several ciliopathies and may be implicated in additional roles beyond the primary cilium91; the oestrogen-sulfating enzyme PAPSS1 (Fig. 3), for which overexpression has been reported to affect proliferation92; the methyltransferase N6AMT1 (Fig. 3); the uncharacterized protein PHLDB1 (Fig. 3); the enzyme DPH2; and transcription factor FLI1 (Fig. 3). Scale bar, 10 μm.
Extended Data Fig. 10 Clustering of gene expression in tissue and tumours for known and novel CCD proteins.
a, Hierarchical clustering of bulk transcript expression (log-transformed TPM) in various normal and cancer tissue types for known CCD proteins. The expression levels of the proliferation markers MCM6, CDK1, PCNA, MCM2 and KI67 are highlighted at the top as a general measure of the proliferative activity of the tissues. Four clusters are identified: c1 and c2 contain mostly normal tissues with midrange expression of the proliferation markers; c3 contains tissues with high expression of the proliferation markers, including tumours; and c4 contains normal tissues with low proliferative activity. b, Box plots displaying the average transcript level for known CCD proteins in the four different clusters from a. There is a significant difference in gene expression levels between the different clusters (c4–c1, P = 0.04; c4–c3, P = 0.006; c3–c2, P < 2 × 10−16; c3–c1, P = 3.5 × 10−8; c4–c2, P = 4 × 10−14; c2–c1, P = 6.9 × 10−14) by one-sided Kruskal–Wallis test. n = 238 independent tissues for each cluster. c, Hierarchical clustering of bulk transcript expression (log-transformed TPM) in various normal and cancer tissue types for novel CCD proteins. The expression levels of the proliferation markers MCM6, CDK1, PCNA, MCM2 and KI67 are highlighted at the top as a general measure of the proliferative activity of the tissues. Four clusters are identified: c1 contains normal tissues with midrange expression of the proliferation markers and tissues with high expression of the proliferation markers, including tumours; c2 contains bone marrow and bone marrow cancer tissues; c3 contains cerebral tissues with testis; and c4 contains normal tissues with low proliferative activity. d, Box plots displaying the average transcript level for novel CCD proteins in the four different clusters from c. 
There is a significant difference between gene expression in the different clusters, except between clusters c2 and c4 (c4–c1, P = 4.3 × 10−14; c4–c2, P = 0.6; c4–c3, P < 2 × 10−16; c1–c2, P = 2.9 × 10−12; c1–c3, P = 0.001; c3–c2, P = 3.7 × 10−14) by one-sided Kruskal–Wallis test. n = 301 independent tissues for each cluster. For box plots: centre line, median; box, Q1 and Q3; whiskers, 1.5× IQR below Q1 and above Q3; points, outliers.
Supplementary information
Supplementary Tables
This file contains Supplementary Tables 1-16.
About this article
Cite this article
Mahdessian, D., Cesnik, A.J., Gnann, C. et al. Spatiotemporal dissection of the cell cycle with single-cell proteogenomics. Nature 590, 649–654 (2021). https://doi.org/10.1038/s41586-021-03232-9
DOI: https://doi.org/10.1038/s41586-021-03232-9