Mutant phenotypes for thousands of bacterial genes of unknown function

Abstract

One-third of all protein-coding genes from bacterial genomes cannot be annotated with a function. Here, to investigate the functions of these genes, we present genome-wide mutant fitness data from 32 diverse bacteria across dozens of growth conditions. We identified mutant phenotypes for 11,779 protein-coding genes that had not been annotated with a specific function. Many genes could be associated with a specific condition because the gene affected fitness only in that condition, or with another gene in the same bacterium because they had similar mutant phenotypes. Of the poorly annotated genes, 2,316 had associations that have high confidence because they are conserved in other bacteria. By combining these conserved associations with comparative genomics, we identified putative DNA repair proteins; in addition, we propose specific functions for poorly annotated enzymes and transporters and for uncharacterized protein families. Our study demonstrates the scalability of microbial genetics and its utility for improving gene annotations.

Fig. 1: High-throughput genetics for 32 bacteria.
Fig. 2: Identification of conserved phenotypes.
Fig. 3: Genetic overviews for a condition or a class of proteins.
Fig. 4: Conserved functional associations for genes encoding uncharacterized protein families.

References

  1. Chang, Y.-C. et al. COMBREX-DB: an experiment centered database of protein function: knowledge, predictions and knowledge gaps. Nucleic Acids Res. 44, D330–D335 (2016).

  2. Schnoes, A. M., Brown, S. D., Dodevski, I. & Babbitt, P. C. Annotation error in public databases: misannotation of molecular function in enzyme superfamilies. PLOS Comput. Biol. 5, e1000605 (2009).

  3. Deutschbauer, A. et al. Towards an informative mutant phenotype for every bacterial gene. J. Bacteriol. 196, 3643–3655 (2014).

  4. Deutschbauer, A. et al. Evidence-based annotation of gene function in Shewanella oneidensis MR-1 using genome-wide fitness profiling across 121 conditions. PLoS Genet. 7, e1002385 (2011).

  5. Nichols, R. J. et al. Phenotypic landscape of a bacterial cell. Cell 144, 143–156 (2011).

  6. Price, M. N. et al. The genetic basis of energy conservation in the sulfate-reducing bacterium Desulfovibrio alaskensis G20. Front. Microbiol. 5, 577 (2014).

  7. Langridge, G. C. et al. Simultaneous assay of every Salmonella typhi gene using one million transposon mutants. Genome Res. 19, 2308–2316 (2009).

  8. van Opijnen, T., Bodi, K. L. & Camilli, A. Tn-seq: high-throughput parallel sequencing for fitness and genetic interaction studies in microorganisms. Nat. Methods 6, 767–772 (2009).

  9. Wetmore, K. M. et al. Rapid quantification of mutant fitness in diverse bacteria by sequencing randomly bar-coded transposons. MBio 6, e00306–15 (2015).

  10. Liu, H. et al. Magic pools: parallel assessment of transposon delivery vectors in bacteria. mSystems 3, e00143–17 (2018).

  11. Rubin, B. E. et al. The essential gene set of a photosynthetic organism. Proc. Natl Acad. Sci. USA 112, E6634–E6643 (2015).

  12. Melnyk, R. A. et al. Novel mechanism for scavenging of hypochlorite involving a periplasmic methionine-rich peptide and methionine sulfoxide reductase. MBio 6, e00233–15 (2015).

  13. Smith, A. M. et al. Quantitative phenotyping via deep barcode sequencing. Genome Res. 19, 1836–1842 (2009).

  14. Rensing, C., Pribyl, T. & Nies, D. H. New functions for the three subunits of the CzcCBA cation-proton antiporter. J. Bacteriol. 179, 6871–6879 (1997).

  15. Hottes, A. K. et al. Bacterial adaptation through loss of function. PLoS Genet. 9, e1003617 (2013).

  16. Haft, D. H. et al. TIGRFAMs and genome properties in 2013. Nucleic Acids Res. 41, D387–D395 (2013).

  17. Finn, R. D. et al. Pfam: the protein families database. Nucleic Acids Res. 42, D222–D230 (2014).

  18. Baker, J. L. et al. Widespread genetic switches and toxicity resistance proteins for fluoride. Science 335, 233–235 (2012).

  19. Keseler, I. M. et al. EcoCyc: fusing model organism databases with systems biology. Nucleic Acids Res. 41, D605–D612 (2013).

  20. Hillenmeyer, M. E. et al. Systematic analysis of genome-wide fitness data in yeast reveals novel gene function and drug action. Genome Biol. 11, R30 (2010).

  21. Rabus, R., Reizer, J., Paulsen, I. & Saier, M. H. Jr Enzyme I(Ntr) from Escherichia coli. A novel enzyme of the phosphoenolpyruvate-dependent phosphotransferase system exhibiting strict specificity for its phosphoryl acceptor, NPr. J. Biol. Chem. 274, 26185–26191 (1999).

  22. van Opijnen, T., Dedrick, S. & Bento, J. Strain dependent genetic networks for antibiotic-sensitivity in a bacterial pathogen with a large pan-genome. PLoS Pathog. 12, e1005869 (2016).

  23. Chen, S. H., Byrne, R. T., Wood, E. A. & Cox, M. M. Escherichia coli radD (yejH) gene: a novel function involved in radiation resistance and double-strand break repair. Mol. Microbiol. 95, 754–768 (2015).

  24. Lopes-Kulishev, C. O. et al. Functional characterization of two SOS-regulated genes involved in mitomycin C resistance in Caulobacter crescentus. DNA Repair (Amst.) 33, 78–89 (2015).

  25. Gwon, G. H. et al. Crystal structure of a Fanconi anemia-associated nuclease homolog bound to 5′ flap DNA: basis of interstrand cross-link repair by FAN1. Genes Dev. 28, 2276–2290 (2014).

  26. Justice, S. S., Hunstad, D. A., Cegelski, L. & Hultgren, S. J. Morphological plasticity as a bacterial survival strategy. Nat. Rev. Microbiol. 6, 162–168 (2008).

  27. da Rocha, R. P., Paquola, A. C. de M., Marques Mdo, V., Menck, C. F. M. & Galhardo, R. S. Characterization of the SOS regulon of Caulobacter crescentus. J. Bacteriol. 190, 1209–1218 (2008).

  28. Abella, M., Campoy, S., Erill, I., Rojo, F. & Barbé, J. Cohabitation of two different lexA regulons in Pseudomonas putida. J. Bacteriol. 189, 8855–8862 (2007).

  29. Cirz, R. T., O’Neill, B. M., Hammond, J. A., Head, S. R. & Romesberg, F. E. Defining the Pseudomonas aeruginosa SOS response and its role in the global response to the antibiotic ciprofloxacin. J. Bacteriol. 188, 7101–7110 (2006).

  30. Wiegmann, K. et al. Carbohydrate catabolism in Phaeobacter inhibens DSM 17395, a member of the marine roseobacter clade. Appl. Environ. Microbiol. 80, 4725–4737 (2014).

  31. Brouns, S. J. J. et al. Identification of the missing links in prokaryotic pentose oxidation pathways: evidence for enzyme recruitment. J. Biol. Chem. 281, 27378–27388 (2006).

  32. Johnsen, U. et al. D-Xylose degradation pathway in the halophilic archaeon Haloferax volcanii. J. Biol. Chem. 284, 27290–27303 (2009).

  33. Stephens, C. et al. Genetic analysis of a novel pathway for D-xylose metabolism in Caulobacter crescentus. J. Bacteriol. 189, 2181–2185 (2007).

  34. Kanehisa, M., Sato, Y., Kawashima, M., Furumichi, M. & Tanabe, M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 44, D457–D462 (2016).

  35. Overbeek, R. et al. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res. 42, D206–D214 (2014).

  36. Iwamoto, R. & Imanaga, Y. Direct evidence of the Entner–Doudoroff pathway operating in the metabolism of D-glucosamine in bacteria. J. Biochem. 109, 66–69 (1991).

  37. Ghrist, A. C. & Stauffer, G. V. The Escherichia coli glycine transport system and its role in the regulation of the glycine cleavage enzyme system. Microbiology 141, 133–140 (1995).

  38. Figueira, R. et al. Adaptation to sustained nitrogen starvation by Escherichia coli requires the eukaryote-like serine/threonine kinase YeaG. Sci. Rep. 5, 17524 (2015).

  39. Tagourti, J., Landoulsi, A. & Richarme, G. Cloning, expression, purification and characterization of the stress kinase YeaG from Escherichia coli. Protein Expr. Purif. 59, 79–85 (2008).

  40. Thorgersen, M. P. et al. Molybdenum availability is key to nitrate removal in contaminated groundwater environments. Appl. Environ. Microbiol. 81, 4976–4983 (2015).

  41. Ray, J. et al. Complete genome sequence of Cupriavidus basilensis 4G11, isolated from the Oak Ridge Field Research Center site. Genome Announc. 3, e00322–15 (2015).

  42. Vaccaro, B. J. et al. Novel metal cation resistance systems from mutant fitness analysis of denitrifying Pseudomonas stutzeri. Appl. Environ. Microbiol. 82, 6046–6056 (2016).

  43. Kovach, M. E. et al. Four new derivatives of the broad-host-range cloning vector pBBR1MCS, carrying different antibiotic-resistance cassettes. Gene 166, 175–176 (1995).

  44. Baba, T. et al. Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol. Syst. Biol. 2, 2006.0008 (2006).

  45. Kuehl, J. V. et al. Functional genomics with a comprehensive library of transposon mutants for the sulfate-reducing bacterium Desulfovibrio alaskensis G20. MBio 5, e01041–14 (2014).

  46. Zane, G. M., Yen, H. C. & Wall, J. D. Effect of the deletion of qmoABC and the promoter-distal gene encoding a hypothetical protein on sulfate reduction in Desulfovibrio vulgaris Hildenborough. Appl. Environ. Microbiol. 76, 5500–5509 (2010).

  47. Kahm, M., Hasenbrink, G., Lichtenberg-Frate, H., Ludwig, J. & Kschischo, M. grofit: fitting biological growth curves with R. J. Stat. Softw. 33, 1–21 (2010).

  48. Bankevich, A. et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19, 455–477 (2012).

  49. Bashir, A. et al. A hybrid approach for the automated finishing of bacterial genomes. Nat. Biotechnol. 30, 701–707 (2012).

  50. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).

  51. Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One 9, e112963 (2014).

  52. Tritt, A., Eisen, J. A., Facciotti, M. T. & Darling, A. E. An integrated pipeline for de novo assembly of microbial genomes. PLoS One 7, e42304 (2012).

  53. Hunt, M. et al. Circlator: automated circularization of genome assemblies using long sequencing reads. Genome Biol. 16, 294 (2015).

  54. Eddy, S. R. Accelerated profile HMM searches. PLOS Comput. Biol. 7, e1002195 (2011).

  55. Wu, M. & Scott, A. J. Phylogenomic analysis of bacterial and archaeal sequences with AMPHORA2. Bioinformatics 28, 1033–1034 (2012).

  56. Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2—approximately maximum-likelihood trees for large alignments. PLoS One 5, e9490 (2010).

  57. Sagawa, S., Price, M. N., Deutschbauer, A. M. & Arkin, A. P. Validating regulatory predictions from diverse bacteria with mutant fitness data. PLoS One 12, e0178258 (2017).

  58. Overbeek, R., Fonstein, M., D’Souza, M., Pusch, G. D. & Maltsev, N. The use of gene clusters to infer functional coupling. Proc. Natl Acad. Sci. USA 96, 2896–2901 (1999).

  59. Aziz, R. K. et al. The RAST Server: rapid annotations using subsystems technology. BMC Genomics 9, 75 (2008).

  60. Caspi, R. et al. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases. Nucleic Acids Res. 44, D471–D480 (2016).

  61. Price, M. N. & Arkin, A. P. PaperBLAST: text mining papers for information about homologs. mSystems 2, e00039–17 (2017).

  62. Marchler-Bauer, A. et al. CDD: NCBI’s conserved domain database. Nucleic Acids Res. 43, D222–D226 (2015).

  63. Kent, W. J. BLAT—the BLAST-like alignment tool. Genome Res. 12, 656–664 (2002).

  64. Li, C.-L. et al. DNA binding and cleavage by the periplasmic nuclease Vvn: a novel structure with a known active site. EMBO J. 22, 4014–4025 (2003).

  65. Ananthaswamy, H. N. The release of endonuclease I from Escherichia coli by a new cold shock procedure. Biochem. Biophys. Res. Commun. 76, 289–298 (1977).

  66. Lopes, J., Gottfried, S. & Rothfield, L. Leakage of periplasmic enzymes by mutants of Escherichia coli and Salmonella typhimurium: isolation of ‘periplasmic leaky’ mutants. J. Bacteriol. 109, 520–525 (1972).

  67. Nossal, N. G. & Heppel, L. A. The release of enzymes by osmotic shock from Escherichia coli in exponential phase. J. Biol. Chem. 241, 3055–3062 (1966).

Acknowledgements

We thank V. Lo, W. Shao, and K. Keller for technical assistance with the Fitness Browser website. Sequencing was performed at: the Vincent J. Coates Genomics Sequencing Laboratory (University of California at Berkeley), supported by NIH S10 Instrumentation Grants S10RR029668, S10RR027303, and OD018174; the DOE Joint Genome Institute; the College of Biological Sciences UCDNA Sequencing Facility (UC Davis); and the Institute for Genomics Sciences (University of Maryland). Studies of novel isolates were conducted by ENIGMA and were supported by the Office of Science, Office of Biological and Environmental Research of the US Department of Energy, under contract DE-AC02-05CH11231. The other data collection was supported by Laboratory Directed Research and Development (LDRD) funding from Berkeley Laboratory, provided by the Director, Office of Science, of the US Department of Energy under contract DE-AC02-05CH11231 and a Community Science Project from the Joint Genome Institute to M.J.B., J.B., A.P.A., and A.M.D. The work conducted by the US Department of Energy Joint Genome Institute, a DOE Office of Science User Facility, is supported by the Office of Science of the US Department of Energy under contract no. DE-AC02-05CH11231.

Author information

Contributions

A.M.D., A.P.A., M.N.P., M.J.B., and J.B. conceived the project. A.M.D., A.P.A., M.J.B., and J.B. supervised the project. A.M.D. led the experimental work. A.M.D., K.M.W., R.J.W., R.A.M., M.C., J.R., J.V.K., H.L., H.K.C., J.S.L., Y.S., Z.E., and H.S. collected data. R.C. isolated bacteria. M.N.P. and A.M.D. analysed the fitness data. R.A.M., R.J.W., and M.N.P. assembled genomes. B.E.R. provided resources and advice on S. elongatus experiments. G.M.Z. and J.D.W. generated gene deletion mutants in Pseudomonas stutzeri RCH2. A.V. edited the manuscript and provided advice. M.N.P., M.J.B., and A.M.D. wrote the paper.

Corresponding authors

Correspondence to Matthew J. Blow, Adam P. Arkin or Adam M. Deutschbauer.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Examples of nitrogen source and stress fitness experiments.

a, The utilization of D-alanine or cytosine by Azospirillum brasilense Sp245. Each point shows the fitness of a gene in the two conditions. The data are the average of two biological replicates for each nitrogen source. Amino acid synthesis genes were identified using the top-level role in TIGRFAMs. The genes for D-alanine utilization were a D-amino acid dehydrogenase (AZOBR_RS08020), an ABC transporter operon (AZOBR_RS08235:RS08260), and a LysR family regulator (AZOBR_RS21915). The genes for cytosine utilization were cytosine deaminase (AZOBR_RS31895) and two ABC transporter operons (AZOBR_RS06950:RS06965 and AZOBR_RS31875:RS31885). b, Zinc stress in S. loihica PV-4. We compare fitness in rich medium with added zinc(II) sulfate to fitness in plain rich medium. The LB data are the average of two biological replicates. The highlighted genes include a putative heavy metal efflux pump (CzcCBA or Shew_3358:Shew_3356), a hypothetical protein at the beginning of the czc operon (CzcX), a zinc-responsive regulator (ZntR or Shew_3411), and another heavy metal efflux gene related to arsP or DUF318 (Shew_3410). CzcX lacks homology to any characterized protein, but homologues in other strains of Shewanella are also specifically important for resisting zinc stress. In both panels, the lines show x = 0, y = 0, and x = y.
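The kind of two-condition comparison described in panel a can be sketched in a few lines of Python. This is only an illustration and not the authors' pipeline: the gene names, the fitness values, and both cut-offs (`important` and `neutral`) are hypothetical, and the helper names `average_replicates` and `condition_specific` are introduced here for the example.

```python
from statistics import mean


def average_replicates(replicates):
    """Average per-gene fitness across replicate experiments (a list of dicts)."""
    genes = set.intersection(*(set(r) for r in replicates))
    return {g: mean(r[g] for r in replicates) for g in genes}


def condition_specific(fit_x, fit_y, important=-1.0, neutral=0.5):
    """Genes with low fitness in condition x but near-zero fitness in condition y.

    Both cut-offs are illustrative assumptions, not values from the paper.
    """
    return sorted(
        g for g in fit_x.keys() & fit_y.keys()
        if fit_x[g] < important and abs(fit_y[g]) < neutral
    )


# Hypothetical fitness values, two replicates per nitrogen source.
d_alanine = average_replicates([
    {"geneA": -3.1, "geneB": -2.8, "geneC": 0.1},
    {"geneA": -2.9, "geneB": -3.0, "geneC": -0.1},
])
cytosine = average_replicates([
    {"geneA": 0.2, "geneB": -2.7, "geneC": -3.4},
    {"geneA": 0.0, "geneB": -2.9, "geneC": -3.0},
])

print(condition_specific(d_alanine, cytosine))  # genes needed only on D-alanine
print(condition_specific(cytosine, d_alanine))  # genes needed only on cytosine
```

In this toy data set, `geneB` is important in both conditions (as an amino acid synthesis gene might be) and is therefore not reported as specific to either nitrogen source.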

Extended Data Fig. 2 Phenotypes versus types of genes.

We categorized proteins in our data set by their type of annotation or by whether they have homologues in the same genome (‘paralogues’). For each category, we show the fraction of genes that have statistically significant phenotypes, and more specifically the fractions that have strong phenotypes (|fitness| > 2 and statistically significant) or are significantly detrimental to fitness (fitness > 0). Genes with high or moderate similarity to another gene in the same genome (paralogues with alignment score above 30% of the self-alignment score) were less likely to have a phenotype (25% versus 32%, P < 10⁻¹⁵, Fisher’s exact test), which is likely to reflect genetic redundancy.
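The categories and the test named in this caption can be illustrated with a short, self-contained Python sketch. It assumes that statistical significance has already been decided elsewhere and is passed in as a flag, because the significance criterion is not given in this section. The function names are hypothetical, and the counts in the example are toy numbers chosen only to echo the 25% versus 32% comparison; they are not the study's gene counts.

```python
from math import comb


def is_paralogue(score, self_score, threshold=0.30):
    """True if the alignment score to another gene in the same genome exceeds
    30% of the gene's self-alignment score (the cut-off given in the caption)."""
    return score > threshold * self_score


def classify(fitness, significant):
    """Return the caption's phenotype categories that apply to one gene."""
    labels = set()
    if significant:
        labels.add("significant")
        if abs(fitness) > 2:
            labels.add("strong")
        if fitness > 0:
            labels.add("detrimental")
    return labels


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def prob(x):
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    p_obs = prob(a)
    lo = max(0, row1 - (n - col1))
    hi = min(row1, col1)
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Toy counts (hypothetical): 25 of 100 paralogues and 32 of 100 other genes
# have a phenotype.
p = fisher_exact_two_sided(25, 75, 32, 68)
print(f"P = {p:.3f}")
print(classify(-3.0, True), classify(1.2, True), classify(-3.0, False))
```

With only 100 genes per group the difference is far from significant; the very small P value in the caption reflects the much larger number of genes in the real data set.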

Extended Data Fig. 3 Known DNA repair genes are important for cisplatin resistance.

We compared the growth of a gene deletion strain and the wild-type bacterium under varying cisplatin concentrations. We show all replicate growth curves for each genotype. We believe the higher overall growth for some of the wild-type experiments (for example, top middle) is random. We observe this phenomenon consistently for some bacteria and we speculate that this is due to varying oxygen content across the microplate. a, E. coli radD (n = 6 independent experiments per strain). b, D. shibae Dshi_2244 (n = 3 independent experiments for wild-type and n = 6 independent experiments for the mutant). c, Phaeobacter inhibens PGA1_c08960 (n = 4 independent experiments for wild-type and n = 6 independent experiments for the mutant). Dshi_2244 and PGA1_c08960 are orthologues of MmcB (DUF1052) from C. crescentus (ref. 24).

Extended Data Fig. 4 EndA, DUF3584, and a FAN1-like VRR-NUC domain protein are important for cisplatin resistance.

As in Extended Data Fig. 3, comparing cisplatin sensitivity of a gene deletion mutant to the wild-type bacterium. a, E. coli endA knockout. cycA encodes an amino acid transporter and is not expected to have a phenotype on cisplatin and is used as a control. Each growth curve is the average of 12 replicate wells and the dashed lines show 95% confidence intervals from the t-test. b, A deletion of S. oneidensis MR-1 SO4008, a member of the DUF3584 protein family (n = 6 independent experiments per strain). c, A deletion of P. stutzeri RCH2 Psest_2235 (n = 4 independent experiments per strain). Psest_1636 is not expected to be involved in DNA repair and is used here as a control. Psest_2235 is a FAN1-like VRR-NUC domain protein (ref. 25).

Extended Data Fig. 5 The nuclease domain of EndA is important for cisplatin resistance.

We assayed the growth of an E. coli endA− Keio collection deletion mutant carrying one of three different vectors: an empty vector with no insert (endA− + empty), a complementation vector carrying a wild-type copy of endA (endA− + endA), and a complementation vector with a mutant version of endA with an alanine at position 84 instead of histidine (endA− + mutant endA). A mutation of this conserved histidine residue in a close homologue from Vibrio vulnificus has been reported to eliminate nearly all nuclease catalytic activity (ref. 64). As a control, we assayed the wild-type, parental E. coli strain carrying a vector with no insert (wt + empty). We performed these growth assays on three separate microplates (Plate #1, #2, #3). n = 3 independent experiments per strain in Plate #1; n = 4 independent experiments per strain in Plates #2 and #3. We added 20 µg ml⁻¹ gentamicin to each assay to maintain selection for the plasmids (pBBR1-MCS5 and derivatives). Although the catalytic activity of EndA (endonuclease I) appears to be important for resisting cisplatin, it is not clear how EndA would be involved in DNA repair if it is located in the periplasm, as previously believed (refs. 65–67). We speculate that EndA relocates to the cytoplasm upon DNA damage or that EndA degrades broken DNA that enters the periplasm and would otherwise damage the membrane.

Extended Data Fig. 6 Members of protein family UPF0126 are important for growth on glycine.

Growth comparison of gene deletion mutants in UPF0126 versus wild-type bacteria in minimal defined medium. a, SO1319 from S. oneidensis MR-1, with either ammonium chloride (n = 6 independent experiments per strain) or glycine as the sole source of nitrogen (n = 12 independent experiments per strain). b, PGA1_c00920 from P. inhibens, with glycine as the sole source of carbon (n = 8 independent experiments for wild-type and n = 16 independent experiments for the mutant). c, Psest_1636 from P. stutzeri RCH2, with either ammonium chloride (n = 4 independent experiments per strain) or glycine (n = 8 independent experiments per strain) as the sole source of nitrogen. The Psest_2235 deletion strain is used as a control and is not expected to have a phenotype in these conditions.

Extended Data Fig. 7 PGA1_c00920 partially rescues the glycine growth defect of an E. coli cycA mutant.

CycA is a glycine transporter from E. coli and a mutant in this gene has reduced uptake of glycine (ref. 37). We investigated whether a member of the UPF0126 protein family could rescue the glycine growth defect of an E. coli cycA deletion strain. We introduced different plasmids into the E. coli cycA Keio collection deletion background: an empty plasmid with no insert (cycA− + empty), a plasmid with a wild-type allele of the E. coli cycA gene (cycA− + cycA), and a plasmid with PGA1_c00920 from P. inhibens (cycA− + PGA1_c00920). We compared the growth of these strains and a wild-type E. coli control (wt + empty) in defined media with either ammonium chloride (n = 2 independent experiments per strain) or glycine as the sole source of nitrogen (n = 4 independent experiments per strain). PGA1_c00920 partially rescues the glycine-specific growth defect of the cycA− deletion strain.

Extended Data Fig. 8 Overexpression of members of protein family UPF0060 confers resistance to thallium.

We introduced three plasmids into wild-type E. coli: a plasmid control with no insert (Empty vector), a plasmid carrying RR42_RS34240 from C. basilensis 4G11, and a plasmid carrying Pf6N2E2_2547 from P. fluorescens FW300-N2E2. We assayed the growth of these strains in LB at 30 °C with varying concentrations of thallium(I) acetate (n = 6 independent experiments per strain). We added 50 µg ml⁻¹ kanamycin to each assay to maintain selection for the plasmids (pFAB2286 and derivatives). RR42_RS34240 and Pf6N2E2_2547 are members of the UPF0060 protein family.

Extended Data Fig. 9 Relevance to all bacteria.

We selected 2,593 hypothetical or vaguely annotated proteins from diverse bacterial species, compared them to the protein-coding genes for which we have fitness data (using protein BLAST), and identified potential orthologues as best hits that were homologous over at least 75% of each protein’s length. We show the fraction of these proteins that have a potential orthologue with each type of phenotype and that is above a given level of amino acid sequence similarity. Similarity was defined as the ratio of the alignment’s bit score to the score from aligning the query to itself.
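The orthologue filter and the similarity measure described here can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the `Hit` record and its field names are hypothetical, and the sketch reads "homologous over at least 75% of each protein's length" as requiring the aligned region of the single best hit to cover at least 75% of both the query and the subject.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One protein BLAST hit; all field names are hypothetical."""
    subject: str
    bit_score: float
    query_start: int    # 1-based, inclusive alignment coordinates
    query_end: int
    subject_start: int
    subject_end: int
    query_len: int
    subject_len: int


def coverage(start, end, length):
    """Fraction of a protein covered by the aligned region."""
    return (end - start + 1) / length


def potential_orthologue(hits, self_score, min_cov=0.75):
    """Return (subject, similarity) for the best hit, or None.

    The best hit (highest bit score) must cover at least `min_cov` of both
    proteins. Similarity is the hit's bit score divided by the bit score of
    the query aligned to itself.
    """
    if not hits:
        return None
    best = max(hits, key=lambda h: h.bit_score)
    q_cov = coverage(best.query_start, best.query_end, best.query_len)
    s_cov = coverage(best.subject_start, best.subject_end, best.subject_len)
    if q_cov < min_cov or s_cov < min_cov:
        return None
    return best.subject, best.bit_score / self_score


# Hypothetical hits for a 200-residue query whose self-alignment scores 400 bits.
good = Hit("geneX", 240.0, 11, 190, 5, 184, 200, 210)
weak = Hit("geneY", 90.0, 1, 200, 1, 200, 200, 205)
short = Hit("geneZ", 300.0, 1, 50, 1, 50, 200, 400)

print(potential_orthologue([good, weak], self_score=400.0))
print(potential_orthologue([good, short], self_score=400.0))
```

In the second call the highest-scoring hit covers only a quarter of the query, so no potential orthologue is reported; falling back to the next-best hit would be a different design choice.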

Extended Data Fig. 10 Alternative ways of computing cofitness.

a, The effect of rescaling the cofitness values by the number of generations in six bacteria. For each of the six bacteria, we identified all pairs of protein-coding genes that were assigned to the same TIGR subrole, were more than 20 kb apart, and had fitness data. This gave 1,711–9,406 pairs per bacterium. We also selected a random subset of pairs that were assigned to different TIGR subroles, were more than 20 kb apart, and had fitness data (1,559–8,881 pairs per bacterium). For each pair, we compared the original cofitness values to the rescaled cofitness (computed from fitness values that were divided by the number of generations). b, The effect of averaging fitness scores from replicate experiments on the cofitness values.
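The three variants compared in this figure can be sketched in plain Python. The sketch assumes that cofitness is the Pearson correlation between two genes' fitness values across experiments, which this section does not itself state. All fitness values, generation counts, and condition labels below are hypothetical, as are the helper names.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rescale(fitness, generations):
    """Divide each experiment's fitness value by its number of generations."""
    return [f / g for f, g in zip(fitness, generations)]


def average_replicates(fitness, conditions):
    """Average the fitness values of experiments that share a condition label."""
    grouped = {}
    for f, label in zip(fitness, conditions):
        grouped.setdefault(label, []).append(f)
    return [sum(v) / len(v) for v in grouped.values()]


# Hypothetical data: six experiments, i.e. three conditions with two replicates.
gene1 = [-2.0, -1.8, 0.1, 0.0, -0.9, -1.1]
gene2 = [-1.5, -1.9, 0.3, -0.2, -1.2, -0.8]
generations = [4, 4, 6, 6, 5, 5]
conditions = ["A", "A", "B", "B", "C", "C"]

original = pearson(gene1, gene2)
rescaled = pearson(rescale(gene1, generations), rescale(gene2, generations))
averaged = pearson(average_replicates(gene1, conditions),
                   average_replicates(gene2, conditions))
print(f"original={original:.3f} rescaled={rescaled:.3f} averaged={averaged:.3f}")
```

Comparing `original` with `rescaled` across many gene pairs corresponds to panel a, and comparing `original` with `averaged` corresponds to panel b.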

Supplementary information

Supplementary Information

This file contains a Supplementary Table guide, Supplementary Figures 1-5, Supplementary Notes 1-6 and Supplementary References

Reporting Summary

Supplementary Tables

This file contains Supplementary Tables 1-22. The tables are provided in a single Excel file with separate tabs for each table

Cite this article

Price, M.N., Wetmore, K.M., Waters, R.J. et al. Mutant phenotypes for thousands of bacterial genes of unknown function. Nature 557, 503–509 (2018). https://doi.org/10.1038/s41586-018-0124-0

  • DOI: https://doi.org/10.1038/s41586-018-0124-0
