A central question in the post-genomic era is how genes interact to form biological pathways. Measurements of gene dependency across hundreds of cell lines have been used to cluster genes into ‘co-essential’ pathways, but this approach has been limited by ubiquitous false positives. In the present study, we develop a statistical method that enables robust identification of gene co-essentiality and yields a genome-wide set of functional modules. This atlas recapitulates diverse pathways and protein complexes, and predicts the functions of 108 uncharacterized genes. Validating top predictions, we show that TMEM189 encodes plasmanylethanolamine desaturase, a key enzyme for plasmalogen synthesis. We also show that C15orf57 encodes a protein that binds the AP2 complex, localizes to clathrin-coated pits and enables efficient transferrin uptake. Finally, we provide an interactive webtool for the community to explore our results, which establish co-essentiality profiling as a powerful resource for biological pathway identification and discovery of new gene functions.
The Achilles project 18Q3 release is publicly available at https://ndownloader.figshare.com/files/12704099 or https://depmap.org/portal/download/all under release ‘DepMap Public 18Q3’ and file ‘gene_effect.csv’. The HUGO Gene Nomenclature Committee Database is accessible at https://www.genenames.org. The STRING database is accessible at https://string-db.org. The CORUM database is accessible at https://mips.helmholtz-muenchen.de/corum. The hu.MAP database is accessible at http://proteincomplexes.org. The DoRothEA database is accessible at https://saezlab.github.io/dorothea. The COXPRESdb database is accessible at https://coxpresdb.jp. Data supporting the findings of the present study are available upon reasonable request. Lipidomic raw data, acquisition methods and quantitative results are available as Supplementary Data 5–7. The raw MS proteomic data have been deposited to the ProteomeXchange Consortium via the PRIDE (ref. 86) partner repository (http://www.ebi.ac.uk/pride) with the dataset identifier PXD023558. Source data are provided with this paper.
Code to generate co-essential gene pairs, co-essential modules, modules with cancer-type-specific dependencies and the 2D layout is available at https://github.com/kundajelab/coessentiality.
Barabási, A.-L. & Oltvai, Z. N. Network biology: understanding the cell’s functional organization. Nat. Rev. Genet. 5, 101–113 (2004).
Chuang, H.-Y., Hofree, M. & Ideker, T. A decade of systems biology. Annu. Rev. Cell Dev. Biol. 26, 721–744 (2010).
Stuart, J. M., Segal, E., Koller, D. & Kim, S. K. A gene-coexpression network for global discovery of conserved genetic modules. Science 302, 249–255 (2003).
Costanzo, M. et al. A global genetic interaction network maps a wiring diagram of cellular function. Science 353, aaf1420 (2016).
Horlbeck, M. A. et al. Mapping the genetic landscape of human cells. Cell 174, 953–967.e22 (2018).
Hartwell, L. H., Hopfield, J. J., Leibler, S. & Murray, A. W. From molecular to modular cell biology. Nature 402, C47–C52 (1999).
Harrow, J. et al. GENCODE: the reference human genome annotation for the ENCODE Project. Genome Res. 22, 1760–1774 (2012).
Carpenter, A. E. & Sabatini, D. M. Systematic genome-wide screens of gene function. Nat. Rev. Genet. 5, 11–22 (2004).
Alonso, J. M. & Ecker, J. R. Moving forward in reverse: genetic technologies to enable genome-wide phenomic screens in Arabidopsis. Nat. Rev. Genet. 7, 524–536 (2006).
Mohr, S. E., Smith, J. A., Shamu, C. E., Neumüller, R. A. & Perrimon, N. RNAi screening comes of age: improved techniques and complementary approaches. Nat. Rev. Mol. Cell Biol. 15, 591–600 (2014).
Shalem, O., Sanjana, N. E. & Zhang, F. High-throughput functional genomics using CRISPR–Cas9. Nat. Rev. Genet. 16, 299–311 (2015).
Tong, A. H. et al. Systematic genetic analysis with ordered arrays of yeast deletion mutants. Science 294, 2364–2368 (2001).
Tong, A. H. Y. et al. Global mapping of the yeast genetic interaction network. Science 303, 808–813 (2004).
Costanzo, M. et al. The genetic landscape of a cell. Science 327, 425–431 (2010).
Bassik, M. C. et al. A systematic mammalian genetic interaction map reveals pathways underlying ricin susceptibility. Cell 152, 909–922 (2013).
Rosenbluh, J. et al. Genetic and proteomic interrogation of lower confidence candidate genes reveals signaling networks in β-catenin-active cancers. Cell Syst. 3, 302–316.e4 (2016).
Shen, J. P. et al. Combinatorial CRISPR–Cas9 screens for de novo mapping of genetic interactions. Nat. Methods 14, 573–576 (2017).
Han, K. et al. Synergistic drug combinations for cancer identified in a CRISPR screen for pairwise genetic interactions. Nat. Biotechnol. 35, 463–474 (2017).
Du, D. et al. Genetic interaction mapping in mammalian cells using CRISPR interference. Nat. Methods 14, 577–580 (2017).
Boettcher, M. et al. Dual gene activation and knockout screen reveals directional dependencies in genetic networks. Nat. Biotechnol. 36, 170–178 (2018).
Wang, T. et al. Gene essentiality profiling reveals gene networks and synthetic lethal interactions with oncogenic ras. Cell 168, 890–903.e15 (2017).
Rauscher, B. et al. Toward an integrated map of genetic interactions in cancer cells. Mol. Syst. Biol. 14, e7656 (2018).
McDonald, E. R. 3rd et al. Project DRIVE: a compendium of cancer dependencies and synthetic lethal relationships uncovered by large-scale, deep RNAi screening. Cell 170, 577–592.e10 (2017).
Pan, J. et al. Interrogation of mammalian protein complex structure, function, and membership using genome-scale fitness screens. Cell Syst. 6, 555–568.e7 (2018).
Boyle, E. A., Pritchard, J. K. & Greenleaf, W. J. High-resolution mapping of cancer cell networks using co-functional interactions. Mol. Syst. Biol. 14, e8594 (2018).
Kim, E. et al. A network of human functional gene interactions from knockout fitness screens in cancer cells. Life Sci. Alliance 2, e201800278 (2019).
Meyers, R. M. et al. Computational correction of copy number effect improves specificity of CRISPR–Cas9 essentiality screens in cancer cells. Nat. Genet. 49, 1779–1784 (2017).
Tsherniak, A. et al. Defining a cancer dependency map. Cell 170, 564–576.e16 (2017).
Aitkin, A. C. On least squares and linear combination of observations. Proc. R. Soc. Edinb. 55, 42–48 (1935).
Yang, J. et al. Genomic inflation factors under polygenic inheritance. Eur. J. Hum. Genet. 19, 807–812 (2011).
Moll, U. M. & Petrenko, O. The MDM2–p53 interaction. Mol. Cancer Res. 1, 1001–1008 (2003).
Wang, X. et al. PHLDA2 is a key oncogene-induced negative feedback inhibitor of EGFR/ErbB2 signaling via interference with AKT signaling. Oncotarget 9, 24914 (2018).
Furukawa, T., Tanji, E., Xu, S. & Horii, A. Feedback regulation of DUSP6 transcription responding to MAPK1 via ETS2 in human cells. Biochem. Biophys. Res. Commun. 377, 317–320 (2008).
Rickman, D. S., Schulte, J. H. & Eilers, M. The expanding world of N-MYC–driven tumors. Cancer Discov. 8, 150–163 (2018).
McInnes, L., Healy, J., Saul, N. & Großberger, L. UMAP: Uniform Manifold Approximation and Projection. J. Open Source Softw. 3, 861 (2018).
Coifman, R. R. & Lafon, S. Diffusion maps. Appl. Comput. Harmon. Anal. 21, 5–30 (2006).
Ruepp, A. et al. CORUM: the comprehensive resource of mammalian protein complexes. Nucleic Acids Res. 36, D646–D650 (2008).
Drew, K. et al. Integration of over 9,000 mass spectrometry experiments builds a global map of human protein complexes. Mol. Syst. Biol. 13, 932 (2017).
Szklarczyk, D. et al. The STRING database in 2017: quality-controlled protein–protein association networks, made broadly accessible. Nucleic Acids Res. 45, D362–D368 (2017).
Okamura, Y. et al. COXPRESdb in 2015: coexpression database for animal species by DNA-microarray and RNAseq-based expression data with multiple quality assessment systems. Nucleic Acids Res. 43, D82–D86 (2015).
Garcia-Alonso, L., Holland, C. H., Ibrahim, M. M., Turei, D. & Saez-Rodriguez, J. Benchmark and integration of resources for the estimation of human transcription factor activities. Genome Res. 29, 1363–1375 (2019).
Nepusz, T., Yu, H. & Paccanaro, A. Detecting overlapping protein complexes in protein–protein interaction networks. Nat. Methods 9, 471–472 (2012).
Saxton, R. A. & Sabatini, D. M. mTOR signaling in growth, metabolism, and disease. Cell 169, 361–371 (2017).
Shoemaker, C. J. et al. CRISPR screening using an expanded toolkit of autophagy reporters identifies TMEM41B as a novel autophagy factor. PLoS Biol. 17, e2007044 (2019).
Breslow, D. K. et al. A CRISPR-based screen for Hedgehog signaling provides insights into ciliary function and ciliopathies. Nat. Genet. 50, 460–471 (2018).
Blomen, V. A. et al. Gene essentiality and synthetic lethality in haploid human cells. Science 350, 1092–1096 (2015).
Nagan, N. & Zoeller, R. A. Plasmalogens: biosynthesis and functions. Prog. Lipid Res. 40, 199–229 (2001).
Vaz, F. M. et al. Mutations in PCYT2 disrupt etherlipid biosynthesis and cause a complex hereditary spastic paraplegia. Brain 142, 3382–3397 (2019).
Horibata, Y. et al. EPT1 (selenoprotein I) is critical for the neural development and maintenance of plasmalogen in humans. J. Lipid Res. 59, 1015–1026 (2018).
Contrepois, K. et al. Cross-platform comparison of untargeted and targeted lipidomics approaches on aging mouse plasma. Sci. Rep. 8, 17747 (2018).
Schüssler-Fiorenza Rose, S. M. et al. A longitudinal big data approach for precision health. Nat. Med. 25, 792–804 (2019).
Snyder, F., Lee, T.-C. & Wykle, R. L. in The Enzymes of Biological Membranes, Vol. 2, Biosynthesis and Metabolism (ed. Martonosi, A. N.) 1–58 (Springer US, 1985).
Zoeller, R. A. et al. Mutants in a macrophage-like cell line are defective in plasmalogen biosynthesis, but contain functional peroxisomes. J. Biol. Chem. 267, 8299–8306 (1992).
Gao, J. et al. Fatty acid desaturase4 of Arabidopsis encodes a protein distinct from characterized fatty acid desaturases. Plant J. 60, 832–839 (2009).
Motley, A., Bright, N. A., Seaman, M. N. J. & Robinson, M. S. Clathrin-mediated endocytosis in AP-2-depleted cells. J. Cell Biol. 162, 909–918 (2003).
Huttlin, E. L. et al. Architecture of the human interactome defines protein communities and disease networks. Nature 545, 505–509 (2017).
Huttlin, E. L. et al. The BioPlex Network: a systematic exploration of the human interactome. Cell 162, 425–440 (2015).
Chan, E. M. et al. WRN helicase is a synthetic lethal target in microsatellite unstable cancers. Nature 568, 551–556 (2019).
Ariazi, E., Ariazi, J., Cordera, F. & Jordan, V. Estrogen receptors as therapeutic targets in breast cancer. Curr. Top. Med. Chem. 6, 181–202 (2006).
Fletcher, M. N. C. et al. Master regulators of FGFR2 signalling and breast cancer risk. Nat. Commun. 4, 2464 (2013).
Roman, S. D. et al. Estradiol induction of retinoic acid receptors in human breast cancer cells. Cancer Res. 53, 5940–5945 (1993).
Zhang, Y.-W. et al. Acquisition of estrogen independence induces TOB1-related mechanisms supporting breast cancer cell proliferation. Oncogene 35, 1643–1656 (2016).
Ascierto, P. A. et al. The role of BRAF V600 mutation in melanoma. J. Transl. Med. 10, 85 (2012).
Garraway, L. A. et al. Integrative genomic analyses identify MITF as a lineage survival oncogene amplified in malignant melanoma. Nature 436, 117–122 (2005).
Perotti, V. et al. NFATc2 is an intrinsic regulator of melanoma dedifferentiation. Oncogene 35, 2862–2872 (2016).
Harris, M. L., Baxter, L. L., Loftus, S. K. & Pavan, W. J. Sox proteins in melanocyte development and melanoma. Pigment Cell Melanoma Res. 23, 496–513 (2010).
Gallego-García, A. et al. A bacterial light response reveals an orphan desaturase for human plasmalogen synthesis. Science 366, 128–132 (2019).
Werner, E. R. et al. The TMEM189 gene encodes plasmanylethanolamine desaturase which introduces the characteristic vinyl ether double bond into plasmalogens. Proc. Natl Acad. Sci. USA 117, 7792–7798 (2020).
Piano, V. et al. Discovery of inhibitors for the ether lipid-generating enzyme AGPS as anti-cancer agents. ACS Chem. Biol. 10, 2589–2597 (2015).
Zhu, C. et al. The fusion landscape of hepatocellular carcinoma. Mol. Oncol. 13, 1214–1225 (2019).
Chen, J. & Wagner, E. J. snRNA 3′ end formation: the dawn of the integrator complex. Biochem. Soc. Trans. 38, 1082–1087 (2010).
Boeing, S. et al. Multiomic analysis of the UV-induced DNA damage response. Cell Rep. 15, 1597–1610 (2016).
Luck, K. et al. A reference map of the human binary protein interactome. Nature 580, 402–408 (2020).
Shifrut, E. et al. Genome-wide CRISPR screens in primary human T cells reveal key regulators of immune function. Cell 175, 1958–1971.e15 (2018).
Povey, S. et al. The HUGO gene nomenclature committee (HGNC). Hum. Genet. 109, 678–680 (2001).
Collard, F. et al. A conserved phosphatase destroys toxic glycolytic side products in mammals and yeast. Nat. Chem. Biol. 12, 601–607 (2016).
Braverman, N. et al. Human PEX7 encodes the peroxisomal PTS2 receptor and is responsible for rhizomelic chondrodysplasia punctata. Nat. Genet. 15, 369–376 (1997).
Doench, J. G. et al. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nat. Biotechnol. 34, 184–191 (2016).
Ashburner, M. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25–29 (2000).
The Gene Ontology Consortium. Expansion of the gene ontology knowledgebase and resources. Nucleic Acids Res. 45, D331–D338 (2017).
Yu, J. et al. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat. Genet. 38, 203–208 (2006).
Storey, J. D. & Tibshirani, R. Statistical significance for genomewide studies. Proc. Natl Acad. Sci. USA 100, 9440–9445 (2003).
Liu, Y. & Xie, J. Cauchy combination test: a powerful test with analytic p-value calculation under arbitrary dependency structures. J. Am. Statist. Assoc. https://doi.org/10.1080/01621459.2018.1554485 (2019).
Liu, Y. et al. ACAT: a fast and powerful p value combination method for rare-variant analysis in sequencing studies. Am. J. Hum. Genet. 104, 410–421 (2019).
Haghverdi, L., Buettner, F. & Theis, F. J. Diffusion maps for high-dimensional single-cell analysis of differentiation data. Bioinformatics 31, 2989–2998 (2015).
Perez-Riverol, Y. et al. The PRIDE database and related tools and resources in 2019: improving support for quantification data. Nucleic Acids Res. 47, D442–D450 (2019).
We thank R. Zoeller (Boston University Medical Center) for providing RAW.12 cells and the parent RAW264.7 cell line. We thank E. Boyle, J. Donnelly, M. Pearson, G. Anderson, S. Simpkins, T. Ideker and members of the Bassik and Kundaje laboratories for helpful discussions. This work was supported by a National Institutes of Health (NIH) Director’s New Innovator award (no. 1DP2HD084069-01 to M.C.B.), a grant from NIH/ENCODE (no. 5UM1HG009436-02 to A.K. and M.C.B.), a Stanford Bio-X Bowes Fellowship (to M.W.), and a Stanford School of Medicine Dean’s Postdoctoral Fellowship and a Jane Coffin Childs Postdoctoral Fellowship (to R.A.K.).
The authors declare no competing interests.
Peer review information Nature Genetics thanks Francesco Iorio and the other, anonymous, reviewer for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
a. The concept of co-essentiality: (left) a pair of functionally related genes are both essential in some cell lines and both non-essential in other lines. Essentiality can be quantified from CRISPR screens as the logarithm of the growth effect of the gene’s knockout (intuitively, the number of times fewer cells with the knockout doubled during the screen, compared to control cells). (Right) a pair of unrelated genes have uncorrelated essentiality across cell lines. b. Simulation of how biological relatedness between cell lines inflates Pearson’s correlation p-values. Duplicating each point 10 times with slight noise (analogous to duplicating each screen in 10 related lines) makes the previously non-significant (p = 0.6) blue correlation highly significant (p = 0.007) and the significant red correlation (p = 7 × 10⁻⁵) substantially more so (p = 2 × 10⁻¹⁰³), despite similar correlation magnitudes.
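The inflation described in panel b can be illustrated with a minimal sketch. It is not the authors' simulation and does not reproduce the p-values quoted in the caption. The helper names (`pearson_r`, `approx_p_value`, `duplicate_with_noise`) are hypothetical, the data are random draws, and the p-value uses the Fisher z-transform normal approximation in place of the exact test, so the output is only indicative.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value via the Fisher z-transform (normal approximation).

    Assumes the n points are independent, which is exactly the assumption
    that duplicated (related) cell lines violate. Requires |r| < 1 and n > 3.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def duplicate_with_noise(values, copies, sd, rng):
    """Repeat every value `copies` times, adding Gaussian noise to each copy."""
    return [v + rng.gauss(0, sd) for v in values for _ in range(copies)]


def main():
    rng = random.Random(0)
    # 20 hypothetical cell lines; the two genes are simulated as unrelated.
    x = [rng.gauss(0, 1) for _ in range(20)]
    y = [rng.gauss(0, 1) for _ in range(20)]
    r = pearson_r(x, y)
    print("original   r=%.3f p=%.3g" % (r, approx_p_value(r, len(x))))

    # Each line becomes 10 near-identical "related" lines.
    x10 = duplicate_with_noise(x, 10, 0.05, rng)
    y10 = duplicate_with_noise(y, 10, 0.05, rng)
    r10 = pearson_r(x10, y10)
    print("duplicated r=%.3f p=%.3g" % (r10, approx_p_value(r10, len(x10))))


if __name__ == "__main__":
    main()
```

The correlation magnitude barely changes after duplication, but the test treats the 200 points as independent, so the p-value shrinks even though no new information was added.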
Quantile-quantile plots for Pearson’s correlation and GLS p-values (an alternate visualization of the p-value histograms in Fig. 1b). The observed p-values (y), sorted from largest to smallest, are plotted against the uniform distribution of p-values (x) expected under the null hypothesis.
Histograms of genes’ number of co-essential partners at 1% and 10% FDR as a function of the gene’s average essentiality (pre-bias-correction CERES score) across lines.
Extended Data Fig. 4 GLS improves recall of known functional interactions in co-essential gene pairs with and without PCA-based bias correction.
Enrichment of interactions from GLS- and Pearson’s-based co-essentiality using the DepMap dataset, as well as co-expression using the COXPRESdb dataset, in CORUM, hu.MAP and STRING, considering the top 1-10 partners per gene, similar to Fig. 2a but including GLS- and Pearson’s-based co-essentiality done both with and without PCA-based bias correction.
F1 score (harmonic mean of precision and recall) for various values of the module density parameter d on CORUM, hu.MAP and STRING. F1 scores represent the performance of a binary network based on the modules (that is “are genes A and B in the same module?”) at predicting a binary network based on the benchmark dataset (that is “are genes A and B partners in the benchmark dataset?”).
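As a rough illustration of the pair-level F1 score described above, the sketch below turns a list of modules into a binary "same module" network and scores it against a benchmark set of pairs. The function names and the toy gene labels are hypothetical, and the choice of which gene pairs enter the evaluation (the gene universe) is an assumption not taken from the caption.

```python
from itertools import combinations


def normalize_pair(a, b):
    """Represent an unordered gene pair as a sorted tuple."""
    return (a, b) if a <= b else (b, a)


def module_pairs(modules):
    """All unordered gene pairs that share at least one module."""
    pairs = set()
    for genes in modules:
        for a, b in combinations(sorted(set(genes)), 2):
            pairs.add((a, b))
    return pairs


def f1_score(predicted, benchmark):
    """Harmonic mean of precision and recall over sets of gene pairs."""
    true_positives = len(predicted & benchmark)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(benchmark)
    return 2 * precision * recall / (precision + recall)


# Toy example with placeholder gene labels (not real data).
modules = [["A", "B", "C"], ["C", "D"]]
benchmark = {normalize_pair(a, b) for a, b in [("B", "A"), ("C", "D"), ("D", "E")]}
score = f1_score(module_pairs(modules), benchmark)
print("F1 = %.3f" % score)
```

In this toy case the modules imply four pairs, two of which are in the benchmark, so precision is 2/4 and recall is 2/3.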
Enrichment of syntenic (both genes on same chromosome) and non-syntenic co-essential pairs for annotated interactions in the CORUM, hu.MAP and STRING databases, using the same benchmarking strategy as in Fig. 2a.
Extended Data Fig. 7 Number of genes assigned putative functions by various co-essentiality module detection methods, after excluding syntenic modules.
Number of genes in non-syntenic clusters/modules at least N-fold enriched for some GO term with at least 5 total genes present across all clusters/modules, excluding the gene itself from the enrichment calculation, for various N from 10 to 1000.
Extended Data Fig. 8 Strength of correct functional predictions of our modules versus same-size Pearson.
Maximum GO term enrichment across all correctly predicted GO terms, for each of the n = 1407 genes correctly predicted by both our modules and same-size Pearson, shown as a boxplot (left) and swarmplot (right). Boxplot centre represents median, bounds of box represent 25th and 75th percentiles, and minima and maxima represent the minimum and maximum values, respectively.
Extended Data Fig. 9 Additional functional characterization of TMEM189 suggests a secondary role in sphingolipid biosynthesis.
a. Abundances (relative to Safe-targeting sgRNA control #1) of very long chain sphingomyelin species (with acyl chain length indicated on x-axis) in cell extracts prepared from HeLa cells transduced with indicated sgRNAs. sgSafe data and sgTMEM189 data are from the same data set represented in Fig. 4c. n = 4 biologically independent cell extracts. Data are presented as mean ± s.d. b. Volcano plot of mass spectrometric (TMT) analysis of TMEM189-GFP immunoprecipitates. Data are from the same mass spectrometry analysis as data shown in Fig. 5d.
Example use case for the interactive web tool (http://coessentiality.net). A gene, KRAS, was selected using the dropdown menu at top left and is marked with a red arrow in the scatterplot below. Genes selected for analysis – KRAS and its gene neighborhood – are designated with red points in the main panel (left). The heatmap panel (top right) shows that KRAS-mutant lines (selected for display using the search bar above the heat map and indicated as black marks in the “Mutation” bar above the heatmap) are enriched in a cluster (far right) that is marked by increased essentiality of KRAS. The pathway enrichment panel (bottom right) shows strong enrichments for Ras signaling and related pathways. The points in the main panel have also been selected in the tissue search bar (top middle) to be colored according to the average essentialities of each gene in kidney-derived cell lines. Gene sets can also be either saved or uploaded as csv files using the respective buttons in the top center (under “Gene set download/upload”). Some web colors and font sizes were optimized for display in this figure.
Supplementary Fig. 1, Table 1, Notes, Methods and References
Spreadsheet of significant co-essential interactions at 10% per-gene FDR. List of all co-essential gene pairs identified in the present study, with the number of PubMed citations (as of October 2019) and chromosome location for each gene, and the direction of the gene correlation (positive (+) or negative (−)). Second sheet includes manual module annotation references. Two-tailed P values were computed via GLS (Methods); multiple testing correction was performed using the Benjamini–Hochberg correction, with significance set at a per-gene FDR threshold of 10%.
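The Benjamini–Hochberg step mentioned above can be sketched as follows. This is a generic implementation of the procedure, not the authors' code. Reading "per-gene FDR" as applying the correction separately to each gene's partner p-values is an assumption, and the input layout (a dict of dicts) and the names `benjamini_hochberg` and `significant_partners` are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_partners(p_by_gene, fdr=0.10):
    """Apply BH within each gene's partner list (assumed 'per-gene' reading)."""
    hits = {}
    for gene, partner_p in p_by_gene.items():
        partners = list(partner_p)
        q_values = benjamini_hochberg([partner_p[x] for x in partners])
        hits[gene] = [x for x, q in zip(partners, q_values) if q <= fdr]
    return hits


# Placeholder gene labels and p-values, for illustration only.
example = {"GENE1": {"A": 0.01, "B": 0.04, "C": 0.03, "D": 0.5}}
print(significant_partners(example, fdr=0.10))
```

Correcting within each gene keeps a gene with many strong partners from consuming the discovery budget of genes with few.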
Spreadsheet of co-essential modules. List of all 5,229 co-essential modules and their constituent genes, with the top three most-enriched GO terms (relative to a background set consisting of all genes in any module, and excluding GO terms with fewer than five genes in this background set) and their associated enrichments and P values, the value of d used to define the module and a link to the heatmap of batch-corrected essentiality data across 485 cell lines. The second sheet includes manual module annotation references.
Uncharacterized gene functional predictions. List of uncharacterized genes that are present in co-essential modules >100-fold enriched (and with Bonferroni’s corrected P < 0.05) for a GO term, the UniProt annotation score and number of PubMed citations for each gene (as of October 2019), and the set of genes in each cluster that is and is not annotated with the most-enriched GO term. Two-tailed P values were computed using a hypergeometric test.
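A GO-term enrichment filter of the kind described above can be sketched with a hypergeometric test. The sketch computes a one-sided upper-tail probability, whereas the description above reports two-tailed P values, so it is an approximation of the idea and not the authors' calculation. The function names, the thresholds as parameters, and the toy counts are assumptions.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for n genes drawn without replacement from N, K annotated."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total


def fold_enrichment(k, N, K, n):
    """Observed annotated fraction in the module over the background fraction."""
    return (k / n) / (K / N)


def bonferroni(p, n_tests):
    """Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, p * n_tests)


def passes_filter(k, N, K, n, n_tests, min_fold=100.0, alpha=0.05):
    """True if the module is > min_fold enriched with corrected p < alpha."""
    p = bonferroni(hypergeom_upper_tail(k, N, K, n), n_tests)
    return fold_enrichment(k, N, K, n) > min_fold and p < alpha


# Toy numbers: background of 10 genes, 3 annotated, module of 4 with 2 annotated.
p = hypergeom_upper_tail(2, 10, 3, 4)
print("p = %.4f, fold = %.2f" % (p, fold_enrichment(2, 10, 3, 4)))
```

The counts use exact integer arithmetic through `math.comb`, which avoids floating-point underflow for large backgrounds at the cost of speed.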
Lipidomics data. Lipid species concentrations for indicated lipids measured using Lipidyzer platform in indicated cell lines. QC1, QC2 and QC3 indicate quality controls (Methods).
Raw lipidomics data.
Quantitative lipidomics data.
Acquisition methods for lipidomics.
MS data for proteomic analysis of C15orf57 and TMEM189 interactomes. Proteomic data, including complete list of proteins and enrichment P values from two-sided Student’s t-test, for C15orf57 and TMEM189 interactome analyses in Fig. 4 and Extended Data Fig. 9.
Cancer-type-specific module dependencies. List of 444 differentially essential modules across 16 tissue types, ranked by P value. Two-tailed P values were computed via GLS and the Aggregated Cauchy Association Test (Methods); multiple testing correction was performed using the Benjamini–Hochberg correction, with significance set at an FDR threshold of 10%.
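The Cauchy combination underlying the Aggregated Cauchy Association Test can be sketched in a few lines. How the per-gene GLS p-values are weighted and grouped into modules is described in the Methods and not reproduced here; the sketch assumes equal weights by default and p-values strictly between 0 and 1, and the function name `cauchy_combine` is hypothetical.

```python
import math


def cauchy_combine(p_values, weights=None):
    """Combine p-values with the Cauchy combination statistic.

    T = sum_i w_i * tan((0.5 - p_i) * pi) / sum_i w_i
    combined p = 0.5 - atan(T) / pi
    """
    if weights is None:
        weights = [1.0] * len(p_values)
    total_weight = sum(weights)
    statistic = sum(
        w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values)
    ) / total_weight
    return 0.5 - math.atan(statistic) / math.pi


# Placeholder p-values for the genes of one hypothetical module.
print(cauchy_combine([0.01, 0.5]))
print(cauchy_combine([0.2, 0.3, 0.4]))
```

A single p-value is returned unchanged, and one very small p-value dominates the combined result, which suits modules where only a few member genes show a strong tissue-specific effect.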
Example use cases of the co-essential browser. Guide to use of the co-essential browser showing how to navigate the webtool in the context of multiple use cases, including gene lookup, gene set selection and gene list upload.
About this article
Cite this article
Wainberg, M., Kamber, R.A., Balsubramani, A. et al. A genome-wide atlas of co-essential modules assigns function to uncharacterized genes. Nat Genet 53, 638–649 (2021). https://doi.org/10.1038/s41588-021-00840-z