  • Article

A genome-wide atlas of co-essential modules assigns function to uncharacterized genes

Abstract

A central question in the post-genomic era is how genes interact to form biological pathways. Measurements of gene dependency across hundreds of cell lines have been used to cluster genes into ‘co-essential’ pathways, but this approach has been limited by ubiquitous false positives. In the present study, we develop a statistical method that enables robust identification of gene co-essentiality and yields a genome-wide set of functional modules. This atlas recapitulates diverse pathways and protein complexes, and predicts the functions of 108 uncharacterized genes. Validating top predictions, we show that TMEM189 encodes plasmanylethanolamine desaturase, a key enzyme for plasmalogen synthesis. We also show that C15orf57 encodes a protein that binds the AP2 complex, localizes to clathrin-coated pits and enables efficient transferrin uptake. Finally, we provide an interactive webtool for the community to explore our results, which establish co-essentiality profiling as a powerful resource for biological pathway identification and discovery of new gene functions.

Fig. 1: Construction of a genome-wide co-essentiality network.
Fig. 2: GLS improves recall of known functional interactions in co-essential gene pairs and modules.
Fig. 3: Co-essential modules recapitulate known pathways and nominate new pathway members.
Fig. 4: TMEM189 encodes the enzyme PEDS required for synthesis of plasmalogen lipids.
Fig. 5: C15orf57 is required for efficient clathrin-mediated endocytosis of transferrin.
Fig. 6: Identification of cancer-type-specific module dependencies.

Data availability

The Achilles project 18Q3 release is publicly available at https://ndownloader.figshare.com/files/12704099 or https://depmap.org/portal/download/all under release ‘DepMap Public 18Q3’ and file ‘gene_effect.csv’. The HUGO Gene Nomenclature Committee Database is accessible at https://www.genenames.org. The STRING database is accessible at https://string-db.org. The CORUM database is accessible at https://mips.helmholtz-muenchen.de/corum. The hu.MAP database is accessible at http://proteincomplexes.org. The DoRothEA database is accessible at https://saezlab.github.io/dorothea. The COXPRESdb database is accessible at https://coxpresdb.jp. Data supporting the findings of the present study are available upon reasonable request. Lipidomic raw data, acquisition methods and quantitative results are available as Supplementary Data 5–7. The raw MS proteomic data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository (http://www.ebi.ac.uk/pride; ref. 86) with the dataset identifier PXD023558. Source data are provided with this paper.

Code availability

Code to generate co-essential gene pairs, co-essential modules, modules with cancer-type-specific dependencies and the 2D layout is available at https://github.com/kundajelab/coessentiality.

References

  1. Barabási, A.-L. & Oltvai, Z. N. Network biology: understanding the cell’s functional organization. Nat. Rev. Genet. 5, 101–113 (2004).

  2. Chuang, H.-Y., Hofree, M. & Ideker, T. A decade of systems biology. Annu. Rev. Cell Dev. Biol. 26, 721–744 (2010).

  3. Stuart, J. M., Segal, E., Koller, D. & Kim, S. K. A gene-coexpression network for global discovery of conserved genetic modules. Science 302, 249–255 (2003).

  4. Costanzo, M. et al. A global genetic interaction network maps a wiring diagram of cellular function. Science 353, aaf1420 (2016).

  5. Horlbeck, M. A. et al. Mapping the genetic landscape of human cells. Cell 174, 953–967.e22 (2018).

  6. Hartwell, L. H., Hopfield, J. J., Leibler, S. & Murray, A. W. From molecular to modular cell biology. Nature 402, C47–C52 (1999).

  7. Harrow, J. et al. GENCODE: the reference human genome annotation for the ENCODE Project. Genome Res. 22, 1760–1774 (2012).

  8. Carpenter, A. E. & Sabatini, D. M. Systematic genome-wide screens of gene function. Nat. Rev. Genet. 5, 11–22 (2004).

  9. Alonso, J. M. & Ecker, J. R. Moving forward in reverse: genetic technologies to enable genome-wide phenomic screens in Arabidopsis. Nat. Rev. Genet. 7, 524–536 (2006).

  10. Mohr, S. E., Smith, J. A., Shamu, C. E., Neumüller, R. A. & Perrimon, N. RNAi screening comes of age: improved techniques and complementary approaches. Nat. Rev. Mol. Cell Biol. 15, 591–600 (2014).

  11. Shalem, O., Sanjana, N. E. & Zhang, F. High-throughput functional genomics using CRISPR–Cas9. Nat. Rev. Genet. 16, 299–311 (2015).

  12. Tong, A. H. et al. Systematic genetic analysis with ordered arrays of yeast deletion mutants. Science 294, 2364–2368 (2001).

  13. Tong, A. H. Y. et al. Global mapping of the yeast genetic interaction network. Science 303, 808–813 (2004).

  14. Costanzo, M. et al. The genetic landscape of a cell. Science 327, 425–431 (2010).

  15. Bassik, M. C. et al. A systematic mammalian genetic interaction map reveals pathways underlying ricin susceptibility. Cell 152, 909–922 (2013).

  16. Rosenbluh, J. et al. Genetic and proteomic interrogation of lower confidence candidate genes reveals signaling networks in β-catenin-active cancers. Cell Syst. 3, 302–316.e4 (2016).

  17. Shen, J. P. et al. Combinatorial CRISPR–Cas9 screens for de novo mapping of genetic interactions. Nat. Methods 14, 573–576 (2017).

  18. Han, K. et al. Synergistic drug combinations for cancer identified in a CRISPR screen for pairwise genetic interactions. Nat. Biotechnol. 35, 463–474 (2017).

  19. Du, D. et al. Genetic interaction mapping in mammalian cells using CRISPR interference. Nat. Methods 14, 577–580 (2017).

  20. Boettcher, M. et al. Dual gene activation and knockout screen reveals directional dependencies in genetic networks. Nat. Biotechnol. 36, 170–178 (2018).

  21. Wang, T. et al. Gene essentiality profiling reveals gene networks and synthetic lethal interactions with oncogenic ras. Cell 168, 890–903.e15 (2017).

  22. Rauscher, B. et al. Toward an integrated map of genetic interactions in cancer cells. Mol. Syst. Biol. 14, e7656 (2018).

  23. McDonald, E. R. 3rd et al. Project DRIVE: a compendium of cancer dependencies and synthetic lethal relationships uncovered by large-scale, deep RNAi screening. Cell 170, 577–592.e10 (2017).

  24. Pan, J. et al. Interrogation of mammalian protein complex structure, function, and membership using genome-scale fitness screens. Cell Syst. 6, 555–568.e7 (2018).

  25. Boyle, E. A., Pritchard, J. K. & Greenleaf, W. J. High-resolution mapping of cancer cell networks using co-functional interactions. Mol. Syst. Biol. 14, e8594 (2018).

  26. Kim, E. et al. A network of human functional gene interactions from knockout fitness screens in cancer cells. Life Sci. Alliance 2, e201800278 (2019).

  27. Meyers, R. M. et al. Computational correction of copy number effect improves specificity of CRISPR–Cas9 essentiality screens in cancer cells. Nat. Genet. 49, 1779–1784 (2017).

  28. Tsherniak, A. et al. Defining a cancer dependency map. Cell 170, 564–576.e16 (2017).

  29. Aitken, A. C. On least squares and linear combination of observations. Proc. R. Soc. Edinb. 55, 42–48 (1935).

  30. Yang, J. et al. Genomic inflation factors under polygenic inheritance. Eur. J. Hum. Genet. 19, 807–812 (2011).

  31. Moll, U. M. & Petrenko, O. The MDM2–p53 interaction. Mol. Cancer Res. 1, 1001–1008 (2003).

  32. Wang, X. et al. PHLDA2 is a key oncogene-induced negative feedback inhibitor of EGFR/ErbB2 signaling via interference with AKT signaling. Oncotarget 9, 24914 (2018).

  33. Furukawa, T., Tanji, E., Xu, S. & Horii, A. Feedback regulation of DUSP6 transcription responding to MAPK1 via ETS2 in human cells. Biochem. Biophys. Res. Commun. 377, 317–320 (2008).

  34. Rickman, D. S., Schulte, J. H. & Eilers, M. The expanding world of N-MYC–driven tumors. Cancer Discov. 8, 150–163 (2018).

  35. McInnes, L., Healy, J., Saul, N. & Großberger, L. UMAP: Uniform Manifold Approximation and Projection. J. Open Source Softw. 3, 861 (2018).

  36. Coifman, R. R. & Lafon, S. Diffusion maps. Appl. Comput. Harmon. Anal. 21, 5–30 (2006).

  37. Ruepp, A. et al. CORUM: the comprehensive resource of mammalian protein complexes. Nucleic Acids Res. 36, D646–D650 (2008).

  38. Drew, K. et al. Integration of over 9,000 mass spectrometry experiments builds a global map of human protein complexes. Mol. Syst. Biol. 13, 932 (2017).

  39. Szklarczyk, D. et al. The STRING database in 2017: quality-controlled protein–protein association networks, made broadly accessible. Nucleic Acids Res. 45, D362–D368 (2017).

  40. Okamura, Y. et al. COXPRESdb in 2015: coexpression database for animal species by DNA-microarray and RNAseq-based expression data with multiple quality assessment systems. Nucleic Acids Res. 43, D82–D86 (2015).

  41. Garcia-Alonso, L., Holland, C. H., Ibrahim, M. M., Turei, D. & Saez-Rodriguez, J. Benchmark and integration of resources for the estimation of human transcription factor activities. Genome Res. 29, 1363–1375 (2019).

  42. Nepusz, T., Yu, H. & Paccanaro, A. Detecting overlapping protein complexes in protein–protein interaction networks. Nat. Methods 9, 471–472 (2012).

  43. Saxton, R. A. & Sabatini, D. M. mTOR signaling in growth, metabolism, and disease. Cell 169, 361–371 (2017).

  44. Shoemaker, C. J. et al. CRISPR screening using an expanded toolkit of autophagy reporters identifies TMEM41B as a novel autophagy factor. PLoS Biol. 17, e2007044 (2019).

  45. Breslow, D. K. et al. A CRISPR-based screen for Hedgehog signaling provides insights into ciliary function and ciliopathies. Nat. Genet. 50, 460–471 (2018).

  46. Blomen, V. A. et al. Gene essentiality and synthetic lethality in haploid human cells. Science 350, 1092–1096 (2015).

  47. Nagan, N. & Zoeller, R. A. Plasmalogens: biosynthesis and functions. Prog. Lipid Res. 40, 199–229 (2001).

  48. Vaz, F. M. et al. Mutations in PCYT2 disrupt etherlipid biosynthesis and cause a complex hereditary spastic paraplegia. Brain 142, 3382–3397 (2019).

  49. Horibata, Y. et al. EPT1 (selenoprotein I) is critical for the neural development and maintenance of plasmalogen in humans. J. Lipid Res. 59, 1015–1026 (2018).

  50. Contrepois, K. et al. Cross-platform comparison of untargeted and targeted lipidomics approaches on aging mouse plasma. Sci. Rep. 8, 17747 (2018).

  51. Schüssler-Fiorenza Rose, S. M. et al. A longitudinal big data approach for precision health. Nat. Med. 25, 792–804 (2019).

  52. Snyder, F., Lee, T.-C. & Wykle, R. L. in The Enzymes of Biological Membranes, Vol. 2, Biosynthesis and Metabolism (ed. Martonosi, A. N.) 1–58 (Springer US, 1985).

  53. Zoeller, R. A. et al. Mutants in a macrophage-like cell line are defective in plasmalogen biosynthesis, but contain functional peroxisomes. J. Biol. Chem. 267, 8299–8306 (1992).

  54. Gao, J. et al. Fatty acid desaturase4 of Arabidopsis encodes a protein distinct from characterized fatty acid desaturases. Plant J. 60, 832–839 (2009).

  55. Motley, A., Bright, N. A., Seaman, M. N. J. & Robinson, M. S. Clathrin-mediated endocytosis in AP-2-depleted cells. J. Cell Biol. 162, 909–918 (2003).

  56. Huttlin, E. L. et al. Architecture of the human interactome defines protein communities and disease networks. Nature 545, 505–509 (2017).

  57. Huttlin, E. L. et al. The BioPlex Network: a systematic exploration of the human interactome. Cell 162, 425–440 (2015).

  58. Chan, E. M. et al. WRN helicase is a synthetic lethal target in microsatellite unstable cancers. Nature 568, 551–556 (2019).

  59. Ariazi, E., Ariazi, J., Cordera, F. & Jordan, V. Estrogen receptors as therapeutic targets in breast cancer. Curr. Top. Med. Chem. 6, 181–202 (2006).

  60. Fletcher, M. N. C. et al. Master regulators of FGFR2 signalling and breast cancer risk. Nat. Commun. 4, 2464 (2013).

  61. Roman, S. D. et al. Estradiol induction of retinoic acid receptors in human breast cancer cells. Cancer Res. 53, 5940–5945 (1993).

  62. Zhang, Y.-W. et al. Acquisition of estrogen independence induces TOB1-related mechanisms supporting breast cancer cell proliferation. Oncogene 35, 1643–1656 (2016).

  63. Ascierto, P. A. et al. The role of BRAF V600 mutation in melanoma. J. Transl. Med. 10, 85 (2012).

  64. Garraway, L. A. et al. Integrative genomic analyses identify MITF as a lineage survival oncogene amplified in malignant melanoma. Nature 436, 117–122 (2005).

  65. Perotti, V. et al. NFATc2 is an intrinsic regulator of melanoma dedifferentiation. Oncogene 35, 2862–2872 (2016).

  66. Harris, M. L., Baxter, L. L., Loftus, S. K. & Pavan, W. J. Sox proteins in melanocyte development and melanoma. Pigment Cell Melanoma Res. 23, 496–513 (2010).

  67. Gallego-García, A. et al. A bacterial light response reveals an orphan desaturase for human plasmalogen synthesis. Science 366, 128–132 (2019).

  68. Werner, E. R. et al. The TMEM189 gene encodes plasmanylethanolamine desaturase which introduces the characteristic vinyl ether double bond into plasmalogens. Proc. Natl Acad. Sci. USA 117, 7792–7798 (2020).

  69. Piano, V. et al. Discovery of inhibitors for the ether lipid-generating enzyme AGPS as anti-cancer agents. ACS Chem. Biol. 10, 2589–2597 (2015).

  70. Zhu, C. et al. The fusion landscape of hepatocellular carcinoma. Mol. Oncol. 13, 1214–1225 (2019).

  71. Chen, J. & Wagner, E. J. snRNA 3′ end formation: the dawn of the integrator complex. Biochem. Soc. Trans. 38, 1082–1087 (2010).

  72. Boeing, S. et al. Multiomic analysis of the UV-induced DNA damage response. Cell Rep. 15, 1597–1610 (2016).

  73. Luck, K. et al. A reference map of the human binary protein interactome. Nature 580, 402–408 (2020).

  74. Shifrut, E. et al. Genome-wide CRISPR screens in primary human T cells reveal key regulators of immune function. Cell 175, 1958–1971.e15 (2018).

  75. Povey, S. et al. The HUGO gene nomenclature committee (HGNC). Hum. Genet. 109, 678–680 (2001).

  76. Collard, F. et al. A conserved phosphatase destroys toxic glycolytic side products in mammals and yeast. Nat. Chem. Biol. 12, 601–607 (2016).

  77. Braverman, N. et al. Human PEX7 encodes the peroxisomal PTS2 receptor and is responsible for rhizomelic chondrodysplasia punctata. Nat. Genet. 15, 369–376 (1997).

  78. Doench, J. G. et al. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nat. Biotechnol. 34, 184–191 (2016).

  79. Ashburner, M. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25–29 (2000).

  80. The Gene Ontology Consortium. Expansion of the gene ontology knowledgebase and resources. Nucleic Acids Res. 45, D331–D338 (2017).

  81. Yu, J. et al. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat. Genet. 38, 203–208 (2006).

  82. Storey, J. D. & Tibshirani, R. Statistical significance for genomewide studies. Proc. Natl Acad. Sci. USA 100, 9440–9445 (2003).

  83. Liu, Y. & Xie, J. Cauchy combination test: a powerful test with analytic p-value calculation under arbitrary dependency structures. J. Am. Statist. Assoc. https://doi.org/10.1080/01621459.2018.1554485 (2019).

  84. Liu, Y. et al. ACAT: a fast and powerful p value combination method for rare-variant analysis in sequencing studies. Am. J. Hum. Genet. 104, 410–421 (2019).

  85. Haghverdi, L., Buettner, F. & Theis, F. J. Diffusion maps for high-dimensional single-cell analysis of differentiation data. Bioinformatics 31, 2989–2998 (2015).

  86. Perez-Riverol, Y. et al. The PRIDE database and related tools and resources in 2019: improving support for quantification data. Nucleic Acids Res. 47, D442–D450 (2019).

Acknowledgements

We thank R. Zoeller (Boston University Medical Center) for providing RAW.12 cells and the parent RAW264.7 cell line. We thank E. Boyle, J. Donnelly, M. Pearson, G. Anderson, S. Simpkins, T. Ideker and members of the Bassik and Kundaje laboratories for helpful discussions. This work was supported by a National Institutes of Health (NIH) Director’s New Innovator award (no. 1DP2HD084069-01 to M.C.B.), a grant from NIH/ENCODE (no. 5UM1HG009436-02 to A.K. and M.C.B.), a Stanford Bio-X Bowes Fellowship (to M.W.), and a Stanford School of Medicine Dean’s Postdoctoral Fellowship and a Jane Coffin Childs Postdoctoral Fellowship (to R.A.K.).

Author information

Contributions

M.W., R.A.K., A.B., A.K. and M.C.B. contributed to conceptualizing the project. M.W. developed the method for identifying co-essential gene pairs using GLS and performed benchmarking analyses. R.A.K. annotated clusters according to biological pathways. M.W. and R.M.M. generated the co-essential modules using ClusterONE, with guidance from R.A.K. R.A.K. performed the experiments with help from M.G., K.S. and M.M.D. A.B. created the 2D visualization and the webtool with help from A.S. and W.M., and guidance from R.A.K. N.S.-A. contributed to analysis of tissue-selective module dependencies. L.J., J.C. and R.J. performed the proteomic analysis. D.H. performed the lipidomic analysis. M.W., R.A.K., A.B., A.K. and M.C.B. wrote the original draft. M.P.S., A.K. and M.C.B. supervised the work. All authors edited and reviewed the paper. A.K. and M.C.B. acquired the funding.

Corresponding authors

Correspondence to Michael C. Bassik or Anshul Kundaje.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Genetics thanks Francesco Iorio and the other, anonymous, reviewer for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Co-essentiality profiling and the limitations of Pearson’s correlation.

a. The concept of co-essentiality. Left: both genes of a functionally related pair are essential in some cell lines and both are non-essential in others. Essentiality can be quantified from CRISPR screens as the logarithm of the growth effect of the gene’s knockout (intuitively, how many fewer times cells carrying the knockout doubled during the screen compared with control cells). Right: a pair of unrelated genes has uncorrelated essentiality across cell lines. b. Simulation of how biological relatedness between cell lines inflates Pearson’s correlation p-values. Duplicating each point 10 times with slight noise (analogous to duplicating each screen in 10 related lines) makes the previously non-significant blue correlation (p = 0.6) highly significant (p = 0.007), and makes the significant red correlation (p = 7 × 10⁻⁵) substantially more so (p = 2 × 10⁻¹⁰³), despite similar correlation magnitudes.
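The duplication effect in panel b can be reproduced in miniature. This is not the authors' simulation: the sample size (30 points), true slope (0.4), jitter (standard deviation 0.05) and random seeds are arbitrary assumptions, and Pearson p-values are approximated with Fisher's z-transform so that only the Python standard library is needed.

```python
import math
import random

def pearson_r(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fisher_p(r, n):
    """Two-sided p-value for r via Fisher's z-transform (normal approximation)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

def duplicate_with_noise(x, y, copies=10, sd=0.05, seed=0):
    """Replace each (x, y) point with `copies` slightly jittered copies."""
    rng = random.Random(seed)
    xd, yd = [], []
    for a, b in zip(x, y):
        for _ in range(copies):
            xd.append(a + rng.gauss(0, sd))
            yd.append(b + rng.gauss(0, sd))
    return xd, yd

# Hypothetical data: 30 "cell lines", weakly related essentiality scores
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(30)]
y = [0.4 * a + rng.gauss(0, 1) for a in x]
xd, yd = duplicate_with_noise(x, y)

r1, r2 = pearson_r(x, y), pearson_r(xd, yd)
p1, p2 = fisher_p(r1, len(x)), fisher_p(r2, len(xd))
print(f"original:   n={len(x)}, r={r1:.3f}, p={p1:.2g}")
print(f"duplicated: n={len(xd)}, r={r2:.3f}, p={p2:.2g}")
```

Because the ten jittered copies of each point are nearly identical, r barely changes while the nominal sample size grows tenfold, so the p-value shrinks even though no new biological information was added.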

Extended Data Fig. 2 Quantile-quantile plots for Pearson’s and GLS.

Quantile-quantile plots for Pearson’s correlation and GLS p-values (an alternate visualization of the p-value histograms in Fig. 1b). The observed p-values (y), sorted from largest to smallest, are plotted against the uniform distribution of p-values (x) expected under the null hypothesis.
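Why generalized least squares (GLS) p-values stay calibrated where Pearson-style p-values do not can be illustrated with a null simulation: pairs of unrelated genes whose essentiality profiles share the same cell-line covariance Σ. This sketch assumes NumPy, a hypothetical block covariance (30 groups of 10 closely related lines, within-group correlation 0.99) that is treated as known, a no-intercept regression and a normal approximation to the t cutoff; how the study itself obtains Σ is not described in this section. GLS is implemented here in its textbook form (ref. 29): ordinary least squares after whitening by the inverse Cholesky factor of Σ.

```python
import numpy as np

def block_covariance(groups, copies, rho):
    """Hypothetical cell-line covariance: `groups` clusters of `copies`
    closely related lines, with correlation `rho` within each cluster."""
    block = (1 - rho) * np.eye(copies) + rho * np.ones((copies, copies))
    return np.kron(np.eye(groups), block)

def slope_t(x, y):
    """Column-wise t-statistics for the no-intercept regression y = b * x."""
    sxx = np.sum(x * x, axis=0)
    b = np.sum(x * y, axis=0) / sxx
    resid = y - b * x
    s2 = np.sum(resid * resid, axis=0) / (x.shape[0] - 1)
    return b / np.sqrt(s2 / sxx)

rng = np.random.default_rng(0)
groups, copies, sims = 30, 10, 200
n = groups * copies
sigma = block_covariance(groups, copies, rho=0.99)
L = np.linalg.cholesky(sigma)  # lower-triangular, sigma = L @ L.T

# Null model: each column is one unrelated gene pair, both drawn from N(0, sigma)
x = L @ rng.standard_normal((n, sims))
y = L @ rng.standard_normal((n, sims))

# Naive test that ignores sigma (closely related to Pearson's correlation test)
t_naive = slope_t(x, y)
# GLS: whiten both profiles with L^{-1}, then run ordinary least squares
t_gls = slope_t(np.linalg.solve(L, x), np.linalg.solve(L, y))

fpr_naive = np.mean(np.abs(t_naive) > 1.96)
fpr_gls = np.mean(np.abs(t_gls) > 1.96)
print(f"false-positive rate at nominal 5%: naive {fpr_naive:.2f}, GLS {fpr_gls:.2f}")
```

In this setup the whitened statistics are exact null t-statistics, so roughly 5% exceed the cutoff, whereas the naive statistics reject far more often; plotting both sets of p-values against the uniform distribution would show the kind of contrast the QQ plots are designed to reveal.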

Extended Data Fig. 3 Number of co-essential partners per gene by average gene essentiality.

Histograms of genes’ number of co-essential partners at 1% and 10% FDR as a function of the gene’s average essentiality (pre-bias-correction CERES score) across lines.

Extended Data Fig. 4 GLS improves recall of known functional interactions in co-essential gene pairs with and without PCA-based bias correction.

Enrichment of interactions from GLS- and Pearson’s-based co-essentiality using the DepMap dataset, and from co-expression using the COXPRESdb dataset, in CORUM, hu.MAP and STRING, considering the top 1–10 partners per gene. Similar to Fig. 2a, but showing GLS- and Pearson’s-based co-essentiality computed both with and without PCA-based bias correction.
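The top-k benchmark can be sketched as follows. The exact enrichment statistic is not spelled out in this caption, so the sketch assumes one plausible definition: the fraction of unique (gene, top-k partner) pairs that appear in a benchmark such as CORUM, divided by the fraction of all possible gene pairs that appear there. The toy genes, rankings and benchmark are hypothetical.

```python
from itertools import combinations

def topk_pairs(partners, k):
    """Unique unordered pairs of each gene with its k top-ranked partners."""
    return {frozenset((gene, other))
            for gene, ranked in partners.items()
            for other in ranked[:k]}

def enrichment_at_k(partners, benchmark, genes, k):
    """Benchmark hit rate among top-k pairs divided by the all-pairs hit rate."""
    pairs = topk_pairs(partners, k)
    hit_rate = sum(p in benchmark for p in pairs) / len(pairs)
    all_pairs = [frozenset(p) for p in combinations(genes, 2)]
    background = sum(p in benchmark for p in all_pairs) / len(all_pairs)
    return hit_rate / background

# Hypothetical toy data: partners ranked by co-essentiality significance
genes = list("ABCDEF")
benchmark = {frozenset("AB"), frozenset("CD")}
partners = {"A": ["B", "C"], "B": ["A", "E"], "C": ["D", "A"],
            "D": ["C", "F"], "E": ["F", "A"], "F": ["E", "B"]}
for k in (1, 2):
    print(k, round(enrichment_at_k(partners, benchmark, genes, k), 3))
```

Pairs are deduplicated so that a reciprocal top hit (A lists B and B lists A) is counted once.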

Extended Data Fig. 5 Benchmarking of cluster density d.

F1 score (harmonic mean of precision and recall) for various values of the module density parameter d on CORUM, hu.MAP and STRING. F1 scores represent the performance of a binary network based on the modules (that is “are genes A and B in the same module?”) at predicting a binary network based on the benchmark dataset (that is “are genes A and B partners in the benchmark dataset?”).
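The F1 comparison above reduces to set arithmetic on gene pairs. A minimal sketch, assuming modules are given as (possibly overlapping) gene sets, for example the output for one value of d, and the benchmark as a set of unordered pairs; restricting the benchmark to genes present in both resources is omitted for brevity, and the toy inputs are hypothetical.

```python
from itertools import combinations

def comembership_pairs(modules):
    """All unordered gene pairs that share at least one (possibly overlapping) module."""
    return {frozenset(pair) for module in modules
            for pair in combinations(sorted(module), 2)}

def module_f1(modules, benchmark_pairs):
    """F1 of 'same module?' calls against 'benchmark partners?' labels."""
    predicted = comembership_pairs(modules)
    true_pos = len(predicted & benchmark_pairs)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(benchmark_pairs)
    return 2 * precision * recall / (precision + recall)

# Hypothetical modules for a single value of d, and hypothetical benchmark pairs
modules = [{"A", "B", "C"}, {"D", "E"}]
benchmark = {frozenset("AB"), frozenset("AC"), frozenset("DF")}
print(round(module_f1(modules, benchmark), 3))
```

Sweeping d would mean regenerating `modules` for each value and recomputing the score.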

Extended Data Fig. 6 Benchmarking of syntenic versus non-syntenic genes.

Enrichment of syntenic (both genes on the same chromosome) and non-syntenic co-essential pairs for annotated interactions in the CORUM, hu.MAP and STRING databases, using the same benchmarking strategy as in Fig. 2a.

Extended Data Fig. 7 Number of genes assigned putative functions by various co-essentiality module detection methods, after excluding syntenic modules.

Number of genes in non-syntenic clusters/modules that are at least N-fold enriched for some GO term, for N from 10 to 1,000. Only GO terms with at least five genes across all clusters/modules are considered, and each gene is excluded from its own enrichment calculation.

Extended Data Fig. 8 Strength of correct functional predictions of our modules versus same-size Pearson.

Maximum GO term enrichment across all correctly predicted GO terms, for each of the n = 1407 genes correctly predicted by both our modules and same-size Pearson, shown as a boxplot (left) and swarmplot (right). Boxplot centre represents median, bounds of box represent 25th and 75th percentiles, and minima and maxima represent the minimum and maximum values, respectively.

Extended Data Fig. 9 Additional functional characterization of TMEM189 suggests a secondary role in sphingolipid biosynthesis.

a. Abundances (relative to Safe-targeting sgRNA control #1) of very long chain sphingomyelin species (with acyl chain length indicated on x-axis) in cell extracts prepared from HeLa cells transduced with indicated sgRNAs. sgSafe and sgTMEM189 data are from the same dataset represented in Fig. 4c. n = 4 biologically independent cell extracts. Data are presented as mean ± s.d. b. Volcano plot of mass spectrometric (TMT) analysis of TMEM189-GFP immunoprecipitates. Data are from the same mass spectrometry analysis as the data shown in Fig. 5d.

Extended Data Fig. 10 A web tool for interactive exploration of the co-essential network.

Example use case for the interactive web tool (http://coessentiality.net). A gene, KRAS, was selected using the dropdown menu at top left and is marked with a red arrow in the scatterplot below. Genes selected for analysis – KRAS and its gene neighborhood – are designated with red points in the main panel (left). The heatmap panel (top right) shows that KRAS-mutant lines (selected for display using the search bar above the heat map and indicated as black marks in the “Mutation” bar above the heatmap) are enriched in a cluster (far right) that is marked by increased essentiality of KRAS. The pathway enrichment panel (bottom right) shows strong enrichments for Ras signaling and related pathways. The points in the main panel have also been selected in the tissue search bar (top middle) to be colored according to the average essentialities of each gene in kidney-derived cell lines. Gene sets can also be either saved or uploaded as csv files using the respective buttons in the top center (under “Gene set download/upload”). Some web colors and font sizes were optimized for display in this figure.

Supplementary information

Supplementary Information

Supplementary Fig. 1, Table 1, Notes, Methods and References

Reporting Summary

Supplementary Data 1

Spreadsheet of significant co-essential interactions at 10% per-gene FDR. List of all co-essential gene pairs identified in the present study, with the number of PubMed citations (as of October 2019) and chromosome location for each gene, and the direction of the gene correlation (positive (+) or negative (−)). Second sheet includes manual module annotation references. Two-tailed P values were computed via GLS (Methods); multiple testing correction was performed using the Benjamini–Hochberg correction, with significance set at a per-gene FDR threshold of 10%.
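The per-gene FDR described above can be sketched by applying the Benjamini–Hochberg adjustment separately to each gene's p-values against all of its candidate partners. How the study reconciles the two per-gene calls for a pair (for example, requiring significance for either gene or for both) is not stated here, so this sketch only reports significant partners per gene; the strict `< fdr` cutoff and the toy p-values are assumptions.

```python
def bh_adjust(pvalues):
    """Benjamini–Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def significant_partners(pvals_by_gene, fdr=0.10):
    """For each gene, the partners passing BH at `fdr` among that gene's own tests."""
    result = {}
    for gene, partner_p in pvals_by_gene.items():
        names = list(partner_p)
        q = bh_adjust([partner_p[name] for name in names])
        result[gene] = {name for name, qi in zip(names, q) if qi < fdr}
    return result

# Hypothetical GLS p-values of gene G against four candidate partners
print(significant_partners({"G": {"A": 0.01, "B": 0.04, "C": 0.03, "D": 0.5}}))
```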

Supplementary Data 2

Spreadsheet of co-essential modules. List of all 5,229 co-essential modules and their constituent genes, with the top three most-enriched GO terms (relative to a background set consisting of all genes in any module, and excluding GO terms with fewer than five genes in this background set) and their associated enrichments and P values, the value of d used to define the module and a link to the heatmap of batch-corrected essentiality data across 485 cell lines. The second sheet includes manual module annotation references.

Supplementary Data 3

Uncharacterized gene functional predictions. List of uncharacterized genes that are present in co-essential modules >100-fold enriched (and with Bonferroni-corrected P < 0.05) for a GO term, the UniProt annotation score and number of PubMed citations for each gene (as of October 2019), and the set of genes in each cluster that is and is not annotated with the most-enriched GO term. Two-tailed P values were computed using a hypergeometric test.
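The GO-term filter above (>100-fold enrichment and Bonferroni-corrected P < 0.05) can be sketched with a hypergeometric test. Two assumptions: "two-tailed" is taken to mean summing the probabilities of all outcomes no more likely than the observed one (one common convention), and the number of tests used for the Bonferroni correction is a hypothetical placeholder, as are the module and background sizes.

```python
import math

def hypergeom_pmf(j, N, K, n):
    """P(X = j) when n genes are drawn from N, of which K carry the GO term."""
    return math.comb(K, j) * math.comb(N - K, n - j) / math.comb(N, n)

def go_enrichment(k, n, K, N):
    """Fold enrichment and two-sided hypergeometric P value for one module and term.

    k: module genes with the term; n: module size;
    K: background genes with the term; N: background size.
    """
    fold = (k * N) / (n * K)
    support = range(max(0, n + K - N), min(n, K) + 1)
    pmf = {j: hypergeom_pmf(j, N, K, n) for j in support}
    # Two-sided: total probability of outcomes no more likely than the observed k
    cutoff = pmf[k] * (1 + 1e-7)
    p = sum(prob for prob in pmf.values() if prob <= cutoff)
    return fold, min(p, 1.0)

# Hypothetical example: 4 of a 5-gene module carry a term held by 10 of 20,000 genes
fold, p = go_enrichment(k=4, n=5, K=10, N=20_000)
n_tests = 10_000  # hypothetical number of (module, term) tests for Bonferroni
passes = fold > 100 and p * n_tests < 0.05
print(f"fold={fold:.0f}, p={p:.2e}, passes filter: {passes}")
```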

Supplementary Data 4

Lipidomics data. Lipid species concentrations for indicated lipids measured using Lipidyzer platform in indicated cell lines. QC1, QC2 and QC3 indicate quality controls (Methods).

Supplementary Data 5

Raw lipidomics data.

Supplementary Data 6

Quantitative lipidomics data.

Supplementary Data 7

Acquisition methods for lipidomics.

Supplementary Data 8

MS data for proteomic analysis of C15orf57 and TMEM189 interactomes. Proteomic data, including the complete list of proteins and enrichment P values from a two-sided Student’s t-test, for C15orf57 and TMEM189 interactome analyses in Fig. 4 and Extended Data Fig. 9.

Supplementary Data 9

Cancer-type-specific module dependencies. List of 444 differentially essential modules across 16 tissue types, ranked by P value. Two-tailed P values were computed via GLS and the Aggregated Cauchy Association Test (Methods); multiple testing correction was performed using the Benjamini–Hochberg correction, with significance set at an FDR threshold of 10%.
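The Aggregated Cauchy Association Test (ACAT) combines several p-values by mapping each to a Cauchy variate and summing. A minimal sketch of the combination formula from Liu and colleagues (refs 83 and 84), assuming equal weights by default; which per-gene GLS p-values the study combines for each module is specified in its Methods, not in this section.

```python
import math

def acat(pvalues, weights=None):
    """Cauchy combination (ACAT) of p-values; equal weights by default."""
    if weights is None:
        weights = [1.0 / len(pvalues)] * len(pvalues)
    total = sum(weights)
    # Each p-value maps to a standard Cauchy variate tan((0.5 - p) * pi)
    stat = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvalues))
    # Under the null, stat is approximately Cauchy with scale `total`
    return 0.5 - math.atan(stat / total) / math.pi

print(acat([0.2]))             # a single p-value is returned (up to rounding)
print(acat([1e-6, 0.5, 0.9]))  # dominated by the smallest p-value
```

Because the Cauchy tail is heavy, the combined p-value is driven by the smallest inputs, and the approximation holds under arbitrary dependence among the p-values, which matters when genes in a module are correlated. For p-values near machine precision, tan((0.5 - p)π) loses accuracy, so a production version would need special handling there.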

Supplementary Video 1

Example use cases of the co-essential browser. Guide to the co-essential browser showing how to navigate the webtool in multiple use cases, including gene lookup, gene set selection and gene list upload.

Source data

Source Data Fig. 4

Unprocessed western blots.

Source Data Fig. 5

Unprocessed western blots.

About this article

Cite this article

Wainberg, M., Kamber, R.A., Balsubramani, A. et al. A genome-wide atlas of co-essential modules assigns function to uncharacterized genes. Nat Genet 53, 638–649 (2021). https://doi.org/10.1038/s41588-021-00840-z

  • DOI: https://doi.org/10.1038/s41588-021-00840-z
