Mutant phenotypes for thousands of bacterial genes of unknown function

Price, Morgan N.; Wetmore, Kelly M.; Waters, R. Jordan; Callaghan, Mark; Ray, Jayashree; Liu, Hualan; Kuehl, Jennifer V.; Melnyk, Ryan A.; Lamson, Jacob S.; Suh, Yumi; Carlson, Hans K.; Esquivel, Zuelma; Sadeeshkumar, Harini; Chakraborty, Romy; Zane, Grant M.; Rubin, Benjamin E.; Wall, Judy D.; Visel, Axel; Bristow, James; Blow, Matthew J.; Arkin, Adam P.; Deutschbauer, Adam M.

doi:10.1038/s41586-018-0124-0

Article
Published: 16 May 2018

Morgan N. Price¹,
Kelly M. Wetmore¹,
R. Jordan Waters²,
Mark Callaghan¹,
Jayashree Ray¹,
Hualan Liu¹,
Jennifer V. Kuehl¹,
Ryan A. Melnyk¹,
Jacob S. Lamson¹,
Yumi Suh¹,
Hans K. Carlson¹,
Zuelma Esquivel¹,
Harini Sadeeshkumar¹,
Romy Chakraborty³,
Grant M. Zane⁴,
Benjamin E. Rubin⁵,
Judy D. Wall⁴,
Axel Visel²,⁶,
James Bristow²,
Matthew J. Blow²,
Adam P. Arkin¹,⁷ &
Adam M. Deutschbauer¹,⁸

Nature volume 557, pages 503–509 (2018)

Abstract

One-third of all protein-coding genes from bacterial genomes cannot be annotated with a function. Here, to investigate the functions of these genes, we present genome-wide mutant fitness data from 32 diverse bacteria across dozens of growth conditions. We identified mutant phenotypes for 11,779 protein-coding genes that had not been annotated with a specific function. Many genes could be associated with a specific condition because the gene affected fitness only in that condition, or with another gene in the same bacterium because they had similar mutant phenotypes. Of the poorly annotated genes, 2,316 had associations that have high confidence because they are conserved in other bacteria. By combining these conserved associations with comparative genomics, we identified putative DNA repair proteins; in addition, we propose specific functions for poorly annotated enzymes and transporters and for uncharacterized protein families. Our study demonstrates the scalability of microbial genetics and its utility for improving gene annotations.

**Fig. 1: High-throughput genetics for 32 bacteria.**

**Fig. 2: Identification of conserved phenotypes.**

**Fig. 3: Genetic overviews for a condition or a class of proteins.**

**Fig. 4: Conserved functional associations for genes encoding uncharacterized protein families.**

References

Chang, Y.-C. et al. COMBREX-DB: an experiment centered database of protein function: knowledge, predictions and knowledge gaps. Nucleic Acids Res. 44, D330–D335 (2016).
Schnoes, A. M., Brown, S. D., Dodevski, I. & Babbitt, P. C. Annotation error in public databases: misannotation of molecular function in enzyme superfamilies. PLOS Comput. Biol. 5, e1000605 (2009).
Deutschbauer, A. et al. Towards an informative mutant phenotype for every bacterial gene. J. Bacteriol. 196, 3643–3655 (2014).
Deutschbauer, A. et al. Evidence-based annotation of gene function in Shewanella oneidensis MR-1 using genome-wide fitness profiling across 121 conditions. PLoS Genet. 7, e1002385 (2011).
Nichols, R. J. et al. Phenotypic landscape of a bacterial cell. Cell 144, 143–156 (2011).
Price, M. N. et al. The genetic basis of energy conservation in the sulfate-reducing bacterium Desulfovibrio alaskensis G20. Front. Microbiol. 5, 577 (2014).
Langridge, G. C. et al. Simultaneous assay of every Salmonella typhi gene using one million transposon mutants. Genome Res. 19, 2308–2316 (2009).
van Opijnen, T., Bodi, K. L. & Camilli, A. Tn-seq: high-throughput parallel sequencing for fitness and genetic interaction studies in microorganisms. Nat. Methods 6, 767–772 (2009).
Wetmore, K. M. et al. Rapid quantification of mutant fitness in diverse bacteria by sequencing randomly bar-coded transposons. MBio 6, e00306–e00315 (2015).
Liu, H. et al. Magic pools: parallel assessment of transposon delivery vectors in bacteria. mSystems 3, e00143–17 (2018).
Rubin, B. E. et al. The essential gene set of a photosynthetic organism. Proc. Natl Acad. Sci. USA 112, E6634–E6643 (2015).
Melnyk, R. A. et al. Novel mechanism for scavenging of hypochlorite involving a periplasmic methionine-rich peptide and methionine sulfoxide reductase. MBio 6, e00233–15 (2015).
Smith, A. M. et al. Quantitative phenotyping via deep barcode sequencing. Genome Res. 19, 1836–1842 (2009).
Rensing, C., Pribyl, T. & Nies, D. H. New functions for the three subunits of the CzcCBA cation-proton antiporter. J. Bacteriol. 179, 6871–6879 (1997).
Hottes, A. K. et al. Bacterial adaptation through loss of function. PLoS Genet. 9, e1003617 (2013).
Haft, D. H. et al. TIGRFAMs and genome properties in 2013. Nucleic Acids Res. 41, D387–D395 (2013).
Finn, R. D. et al. Pfam: the protein families database. Nucleic Acids Res. 42, D222–D230 (2014).
Baker, J. L. et al. Widespread genetic switches and toxicity resistance proteins for fluoride. Science 335, 233–235 (2012).
Keseler, I. M. et al. EcoCyc: fusing model organism databases with systems biology. Nucleic Acids Res. 41, D605–D612 (2013).
Hillenmeyer, M. E. et al. Systematic analysis of genome-wide fitness data in yeast reveals novel gene function and drug action. Genome Biol. 11, R30 (2010).
Rabus, R., Reizer, J., Paulsen, I. & Saier, M. H. Jr Enzyme Iᴺᵗʳ from Escherichia coli. A novel enzyme of the phosphoenolpyruvate-dependent phosphotransferase system exhibiting strict specificity for its phosphoryl acceptor, NPr. J. Biol. Chem. 274, 26185–26191 (1999).
van Opijnen, T., Dedrick, S. & Bento, J. Strain dependent genetic networks for antibiotic-sensitivity in a bacterial pathogen with a large pan-genome. PLoS Pathog. 12, e1005869 (2016).
Chen, S. H., Byrne, R. T., Wood, E. A. & Cox, M. M. Escherichia coli radD (yejH) gene: a novel function involved in radiation resistance and double-strand break repair. Mol. Microbiol. 95, 754–768 (2015).
Lopes-Kulishev, C. O. et al. Functional characterization of two SOS-regulated genes involved in mitomycin C resistance in Caulobacter crescentus. DNA Repair (Amst.) 33, 78–89 (2015).
Gwon, G. H. et al. Crystal structure of a Fanconi anemia-associated nuclease homolog bound to 5′ flap DNA: basis of interstrand cross-link repair by FAN1. Genes Dev. 28, 2276–2290 (2014).
Justice, S. S., Hunstad, D. A., Cegelski, L. & Hultgren, S. J. Morphological plasticity as a bacterial survival strategy. Nat. Rev. Microbiol. 6, 162–168 (2008).
da Rocha, R. P., Paquola, A. C. de M., Marques Mdo, V., Menck, C. F. M. & Galhardo, R. S. Characterization of the SOS regulon of Caulobacter crescentus. J. Bacteriol. 190, 1209–1218 (2008).
Abella, M., Campoy, S., Erill, I., Rojo, F. & Barbé, J. Cohabitation of two different lexA regulons in Pseudomonas putida. J. Bacteriol. 189, 8855–8862 (2007).
Cirz, R. T., O’Neill, B. M., Hammond, J. A., Head, S. R. & Romesberg, F. E. Defining the Pseudomonas aeruginosa SOS response and its role in the global response to the antibiotic ciprofloxacin. J. Bacteriol. 188, 7101–7110 (2006).
Wiegmann, K. et al. Carbohydrate catabolism in Phaeobacter inhibens DSM 17395, a member of the marine roseobacter clade. Appl. Environ. Microbiol. 80, 4725–4737 (2014).
Brouns, S. J. J. et al. Identification of the missing links in prokaryotic pentose oxidation pathways: evidence for enzyme recruitment. J. Biol. Chem. 281, 27378–27388 (2006).
Johnsen, U. et al. d-xylose degradation pathway in the halophilic archaeon Haloferax volcanii. J. Biol. Chem. 284, 27290–27303 (2009).
Stephens, C. et al. Genetic analysis of a novel pathway for d-xylose metabolism in Caulobacter crescentus. J. Bacteriol. 189, 2181–2185 (2007).
Kanehisa, M., Sato, Y., Kawashima, M., Furumichi, M. & Tanabe, M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 44, D457–D462 (2016).
Overbeek, R. et al. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res. 42, D206–D214 (2014).
Iwamoto, R. & Imanaga, Y. Direct evidence of the Entner–Doudoroff pathway operating in the metabolism of d-glucosamine in bacteria. J. Biochem. 109, 66–69 (1991).
Ghrist, A. C. & Stauffer, G. V. The Escherichia coli glycine transport system and its role in the regulation of the glycine cleavage enzyme system. Microbiology 141, 133–140 (1995).
Figueira, R. et al. Adaptation to sustained nitrogen starvation by Escherichia coli requires the eukaryote-like serine/threonine kinase YeaG. Sci. Rep. 5, 17524 (2015).
Tagourti, J., Landoulsi, A. & Richarme, G. Cloning, expression, purification and characterization of the stress kinase YeaG from Escherichia coli. Protein Expr. Purif. 59, 79–85 (2008).
Thorgersen, M. P. et al. Molybdenum availability is key to nitrate removal in contaminated groundwater environments. Appl. Environ. Microbiol. 81, 4976–4983 (2015).
Ray, J. et al. Complete genome sequence of Cupriavidus basilensis 4G11, isolated from the Oak Ridge Field Research Center site. Genome Announc. 3, e00322–15 (2015).
Vaccaro, B. J. et al. Novel metal cation resistance systems from mutant fitness analysis of denitrifying Pseudomonas stutzeri. Appl. Environ. Microbiol. 82, 6046–6056 (2016).
Kovach, M. E. et al. Four new derivatives of the broad-host-range cloning vector pBBR1MCS, carrying different antibiotic-resistance cassettes. Gene 166, 175–176 (1995).
Baba, T. et al. Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol. Syst. Biol. 2, 2006.0008 (2006).
Kuehl, J. V. et al. Functional genomics with a comprehensive library of transposon mutants for the sulfate-reducing bacterium Desulfovibrio alaskensis G20. MBio 5, e01041–14 (2014).
Zane, G. M., Yen, H. C. & Wall, J. D. Effect of the deletion of qmoABC and the promoter-distal gene encoding a hypothetical protein on sulfate reduction in Desulfovibrio vulgaris Hildenborough. Appl. Environ. Microbiol. 76, 5500–5509 (2010).
Kahm, M., Hasenbrink, G., Lichtenberg-Frate, H., Ludwig, J. & Kschischo, M. grofit: fitting biological growth curves with R. J. Stat. Softw. 33, 1–21 (2010).
Bankevich, A. et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19, 455–477 (2012).
Bashir, A. et al. A hybrid approach for the automated finishing of bacterial genomes. Nat. Biotechnol. 30, 701–707 (2012).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One 9, e112963 (2014).
Tritt, A., Eisen, J. A., Facciotti, M. T. & Darling, A. E. An integrated pipeline for de novo assembly of microbial genomes. PLoS One 7, e42304 (2012).
Hunt, M. et al. Circlator: automated circularization of genome assemblies using long sequencing reads. Genome Biol. 16, 294 (2015).
Eddy, S. R. Accelerated profile HMM searches. PLOS Comput. Biol. 7, e1002195 (2011).
Wu, M. & Scott, A. J. Phylogenomic analysis of bacterial and archaeal sequences with AMPHORA2. Bioinformatics 28, 1033–1034 (2012).
Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2—approximately maximum-likelihood trees for large alignments. PLoS One 5, e9490 (2010).
Sagawa, S., Price, M. N., Deutschbauer, A. M. & Arkin, A. P. Validating regulatory predictions from diverse bacteria with mutant fitness data. PLoS One 12, e0178258 (2017).
Overbeek, R., Fonstein, M., D’Souza, M., Pusch, G. D. & Maltsev, N. The use of gene clusters to infer functional coupling. Proc. Natl Acad. Sci. USA 96, 2896–2901 (1999).
Aziz, R. K. et al. The RAST Server: rapid annotations using subsystems technology. BMC Genomics 9, 75 (2008).
Caspi, R. et al. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases. Nucleic Acids Res. 44, D471–D480 (2016).
Price, M. N. & Arkin, A. P. PaperBLAST: text mining papers for information about homologs. mSystems 2, e00039–17 (2017).
Marchler-Bauer, A. et al. CDD: NCBI’s conserved domain database. Nucleic Acids Res. 43, D222–D226 (2015).
Kent, W. J. BLAT—the BLAST-like alignment tool. Genome Res. 12, 656–664 (2002).
Li, C.-L. et al. DNA binding and cleavage by the periplasmic nuclease Vvn: a novel structure with a known active site. EMBO J. 22, 4014–4025 (2003).
Ananthaswamy, H. N. The release of endonuclease I from Escherichia coli by a new cold shock procedure. Biochem. Biophys. Res. Commun. 76, 289–298 (1977).
Lopes, J., Gottfried, S. & Rothfield, L. Leakage of periplasmic enzymes by mutants of Escherichia coli and Salmonella typhimurium: isolation of ‘periplasmic leaky’ mutants. J. Bacteriol. 109, 520–525 (1972).
Nossal, N. G. & Heppel, L. A. The release of enzymes by osmotic shock from Escherichia coli in exponential phase. J. Biol. Chem. 241, 3055–3062 (1966).

Acknowledgements

We thank V. Lo, W. Shao, and K. Keller for technical assistance with the Fitness Browser website. Sequencing was performed at: the Vincent J. Coates Genomics Sequencing Laboratory (University of California at Berkeley), supported by NIH S10 Instrumentation Grants S10RR029668, S10RR027303, and OD018174; the DOE Joint Genome Institute; the College of Biological Sciences UCDNA Sequencing Facility (UC Davis); and the Institute for Genomics Sciences (University of Maryland). Studies of novel isolates were conducted by ENIGMA and were supported by the Office of Science, Office of Biological and Environmental Research of the US Department of Energy, under contract DE-AC02-05CH11231. The other data collection was supported by Laboratory Directed Research and Development (LDRD) funding from Berkeley Laboratory, provided by the Director, Office of Science, of the US Department of Energy under contract DE-AC02-05CH11231 and a Community Science Project from the Joint Genome Institute to M.J.B., J.B., A.P.A., and A.M.D. The work conducted by the US Department of Energy Joint Genome Institute, a DOE Office of Science User Facility, is supported by the Office of Science of the US Department of Energy under contract no. DE-AC02-05CH11231.

Author information

Authors and Affiliations

Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
Morgan N. Price, Kelly M. Wetmore, Mark Callaghan, Jayashree Ray, Hualan Liu, Jennifer V. Kuehl, Ryan A. Melnyk, Jacob S. Lamson, Yumi Suh, Hans K. Carlson, Zuelma Esquivel, Harini Sadeeshkumar, Adam P. Arkin & Adam M. Deutschbauer
Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
R. Jordan Waters, Axel Visel, James Bristow & Matthew J. Blow
Climate and Ecosystem Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
Romy Chakraborty
Department of Biochemistry, University of Missouri, Columbia, MO, USA
Grant M. Zane & Judy D. Wall
Division of Biological Sciences, University of California, San Diego, CA, USA
Benjamin E. Rubin
School of Natural Sciences, University of California, Merced, CA, USA
Axel Visel
Department of Bioengineering, University of California, Berkeley, CA, USA
Adam P. Arkin
Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
Adam M. Deutschbauer

Contributions

A.M.D., A.P.A., M.N.P., M.J.B., and J.B. conceived the project. A.M.D., A.P.A., M.J.B., and J.B. supervised the project. A.M.D. led the experimental work. A.M.D., K.M.W., R.J.W., R.A.M., M.C., J.R., J.V.K., H.L., H.K.C., J.S.L., Y.S., Z.E., and H.S. collected data. R.C. isolated bacteria. M.N.P. and A.M.D. analysed the fitness data. R.A.M., R.J.W., and M.N.P. assembled genomes. B.E.R. provided resources and advice on S. elongatus experiments. G.M.Z. and J.D.W. generated gene deletion mutants in Pseudomonas stutzeri RCH2. A.V. edited the manuscript and provided advice. M.N.P., M.J.B., and A.M.D. wrote the paper.

Corresponding authors

Correspondence to Matthew J. Blow, Adam P. Arkin or Adam M. Deutschbauer.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Examples of nitrogen source and stress fitness experiments.

a, The utilization of d-alanine or cytosine by Azospirillum brasilense Sp245. Each point shows the fitness of a gene in the two conditions. The data are the average of two biological replicates for each nitrogen source. Amino acid synthesis genes were identified using the top-level role in TIGRFAMs. The genes for d-alanine utilization were a d-amino acid dehydrogenase (AZOBR_RS08020), an ABC transporter operon (AZOBR_RS08235:RS08260), and a LysR family regulator (AZOBR_RS21915). The genes for cytosine utilization were cytosine deaminase (AZOBR_RS31895) and two ABC transporter operons (AZOBR_RS06950:RS06965 and AZOBR_RS31875:RS31885). b, Zinc stress in S. loihica PV-4. We compare fitness in rich medium (LB) with added zinc(II) sulfate to fitness in plain rich medium. The LB data are the average of two biological replicates. The highlighted genes include a putative heavy metal efflux pump (CzcCBA or Shew_3358:Shew_3356), a hypothetical protein at the beginning of the czc operon (CzcX), a zinc-responsive regulator (ZntR or Shew_3411), and another heavy metal efflux gene related to arsP or DUF318 (Shew_3410). CzcX lacks homology to any characterized protein, but homologues in other strains of Shewanella are also specifically important for resisting zinc stress. In both panels, the lines show x = 0, y = 0, and x = y.

Extended Data Fig. 2 Phenotypes versus types of genes.

We categorized proteins in our data set by their type of annotation or by whether they have homologues in the same genome (‘paralogues’). For each category, we show the fraction of genes that have statistically significant phenotypes, and more specifically the fractions that have strong phenotypes (|fitness| > 2 and statistically significant) or are significantly detrimental to fitness (fitness > 0). Genes with high or moderate similarity to another gene in the same genome (paralogues with alignment score above 30% of the self-alignment score) were less likely to have a phenotype (25% versus 32%, P < 10⁻¹⁵, Fisher’s exact test), which is likely to reflect genetic redundancy.
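The phenotype categories in this caption can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline: the `GeneResult` record, the function names, and the idea that significance arrives as a precomputed boolean are all assumptions. Only the |fitness| > 2 cutoff and the fitness > 0 rule come from the caption.

```python
from dataclasses import dataclass

STRONG_CUTOFF = 2.0  # "|fitness| > 2", as stated in the caption


@dataclass
class GeneResult:
    """One gene in one experiment (hypothetical record layout)."""
    fitness: float     # gene fitness value for this experiment
    significant: bool  # result of a significance test supplied by the caller


def categorize(results):
    """Return the set of phenotype categories supported by a gene's experiments."""
    categories = set()
    for r in results:
        if not r.significant:
            continue  # only statistically significant results count
        categories.add("significant")
        if abs(r.fitness) > STRONG_CUTOFF:
            categories.add("strong")
        if r.fitness > 0:
            # mutants grow better, so the intact gene is detrimental to fitness
            categories.add("detrimental")
    return categories


def fraction_with(genes, category):
    """Fraction of genes (dict: name -> list of GeneResult) in a category."""
    if not genes:
        return 0.0
    hits = sum(1 for results in genes.values() if category in categorize(results))
    return hits / len(genes)
```

Applying `fraction_with` separately to genes with and without paralogues would give the two proportions that a test such as Fisher's exact test compares.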

Extended Data Fig. 3 Known DNA repair genes are important for cisplatin resistance.

We compared the growth of a gene deletion strain and the wild-type bacterium under varying cisplatin concentrations. We show all replicate growth curves for each genotype. We believe the higher overall growth for some of the wild-type experiments (for example, top middle) is random. We observe this phenomenon consistently for some bacteria and we speculate that this is due to varying oxygen content across the microplate. a, E. coli radD (n = 6 independent experiments per strain). b, D. shibae Dshi_2244 (n = 3 independent experiments for wild-type and n = 6 independent experiments for the mutant). c, Phaeobacter inhibens PGA1_c08960 (n = 4 independent experiments for wild-type and n = 6 independent experiments for the mutant). Dshi_2244 and PGA1_c08960 are orthologues of MmcB (DUF1052) from C. crescentus²⁴.

Extended Data Fig. 4 EndA, DUF3584, and a FAN1-like VRR-NUC domain protein are important for cisplatin resistance.

As in Extended Data Fig. 3, comparing cisplatin sensitivity of a gene deletion mutant to the wild-type bacterium. a, E. coli endA knockout. cycA encodes an amino acid transporter and is not expected to have a phenotype on cisplatin and is used as a control. Each growth curve is the average of 12 replicate wells and the dashed lines show 95% confidence intervals from the t-test. b, A deletion of S. oneidensis MR-1 SO4008, a member of the DUF3584 protein family (n = 6 independent experiments per strain). c, A deletion of P. stutzeri RCH2 Psest_2235 (n = 4 independent experiments per strain). Psest_1636 is not expected to be involved in DNA repair and is used here as a control. Psest_2235 is a FAN1-like VRR-NUC domain protein²⁵.
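For readers who want to reproduce the dashed lines in panel a, the following sketch computes a mean and a t-based 95% confidence interval for one time point measured in 12 replicate wells. It assumes the interval is the usual mean ± t × standard error. The caption does not spell out the formula, and the function name is hypothetical.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t for 11 degrees of freedom (12 wells).
T_CRIT_95_DF11 = 2.201


def mean_ci_12_wells(od_values):
    """Return (lower, mean, upper) for one time point across 12 replicate wells."""
    if len(od_values) != 12:
        raise ValueError("expected readings from exactly 12 wells")
    mean = statistics.fmean(od_values)
    # standard error of the mean from the sample standard deviation
    sem = statistics.stdev(od_values) / math.sqrt(len(od_values))
    half_width = T_CRIT_95_DF11 * sem
    return mean - half_width, mean, mean + half_width
```

Calling the function once per time point yields the averaged growth curve together with its lower and upper bounds.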

Extended Data Fig. 5 The nuclease domain of EndA is important for cisplatin resistance.

We assayed the growth of an E. coli endA− Keio collection deletion mutant carrying one of three different vectors: an empty vector with no insert (endA− + empty), a complementation vector carrying a wild-type copy of endA (endA− + endA), and a complementation vector with a mutant version of endA with an alanine at position 84 instead of histidine (endA− + mutant endA). A mutation of this conserved histidine residue in a close homologue from Vibrio vulnificus has been reported to eliminate nearly all nuclease catalytic activity⁶⁴. As a control, we assayed the wild-type, parental E. coli strain carrying a vector with no insert (wt + empty). We performed these growth assays on three separate microplates (Plate #1, #2, #3). n = 3 independent experiments per strain in Plate #1; n = 4 independent experiments per strain in Plates #2 and #3. We added 20 µg ml⁻¹ gentamicin to each assay to maintain selection for the plasmids (pBBR1-MCS5 and derivatives). Although the catalytic activity of EndA (endonuclease I) appears to be important for resisting cisplatin, it is not clear how EndA would be involved in DNA repair if it is located in the periplasm, as previously believed⁶⁵,⁶⁶,⁶⁷. We speculate that EndA relocates to the cytoplasm upon DNA damage or that EndA degrades broken DNA that enters the periplasm and would otherwise damage the membrane.

Extended Data Fig. 6 Members of protein family UPF0126 are important for growth on glycine.

Growth comparison of gene deletion mutants in UPF0126 versus wild-type bacteria in minimal defined medium. a, SO1319 from S. oneidensis MR-1, with either ammonium chloride (n = 6 independent experiments per strain) or glycine as the sole source of nitrogen (n = 12 independent experiments per strain). b, PGA1_c00920 from P. inhibens, with glycine as the sole source of carbon (n = 8 independent experiments for wild-type and n = 16 independent experiments for the mutant). c, Psest_1636 from P. stutzeri RCH2, with either ammonium chloride (n = 4 independent experiments per strain) or glycine (n = 8 independent experiments per strain) as the sole source of nitrogen. The Psest_2235 deletion strain is used as a control and is not expected to have a phenotype in these conditions.

Extended Data Fig. 7 PGA1_c00920 partially rescues the glycine growth defect of an E. coli cycA mutant.

CycA is a glycine transporter from E. coli and a mutant in this gene has reduced uptake of glycine³⁷. We investigated whether a member of the UPF0126 protein family could rescue the glycine growth defect of an E. coli cycA deletion strain. We introduced different plasmids into the E. coli cycA Keio collection deletion background: an empty plasmid with no insert (cycA− + empty), a plasmid with a wild-type allele of the E. coli cycA gene (cycA− + cycA), and a plasmid with PGA1_c00920 from P. inhibens (cycA− + PGA1_c00920). We compared the growth of these strains and a wild-type E. coli control (wt + empty) in defined media with either ammonium chloride (n = 2 independent experiments per strain) or glycine as the sole source of nitrogen (n = 4 independent experiments per strain). PGA1_c00920 partially rescues the glycine-specific growth defect of the cycA− deletion strain.

Extended Data Fig. 8 Overexpression of members of protein family UPF0060 confers resistance to thallium.

We introduced three plasmids into wild-type E. coli: a plasmid control with no insert (Empty vector), a plasmid carrying RR42_RS34240 from C. basilensis 4G11, and a plasmid carrying Pf6N2E2_2547 from P. fluorescens FW300-N2E2. We assayed the growth of these strains in LB at 30 °C with varying concentrations of thallium(I) acetate (n = 6 independent experiments per strain). We added 50 µg ml⁻¹ kanamycin to each assay to maintain selection for the plasmids (pFAB2286 and derivatives). RR42_RS34240 and Pf6N2E2_2547 are members of the UPF0060 protein family.

Extended Data Fig. 9 Relevance to all bacteria.

We selected 2,593 hypothetical or vaguely annotated proteins from diverse bacterial species, compared them to the protein-coding genes for which we have fitness data (using protein BLAST), and identified potential orthologues as best hits that were homologous over at least 75% of each protein’s length. We show the fraction of these proteins that have a potential orthologue with each type of phenotype and that is above a given level of amino acid sequence similarity. Similarity was defined as the ratio of the alignment’s bit score to the score from aligning the query to itself.
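The two criteria in this caption, coverage of at least 75% of each protein and similarity as a ratio of bit scores, can be illustrated with a short sketch. The `Hit` record and the function names are hypothetical, and how the alignment fields are extracted from protein BLAST output is left to the reader.

```python
from dataclasses import dataclass

MIN_COVERAGE = 0.75  # "at least 75% of each protein's length"


@dataclass
class Hit:
    """Summary of one protein alignment (hypothetical record layout)."""
    bit_score: float      # bit score of query versus subject
    self_score: float     # bit score of the query aligned to itself
    aligned_query: int    # residues of the query covered by the alignment
    query_len: int
    aligned_subject: int  # residues of the subject covered by the alignment
    subject_len: int


def similarity(hit):
    """Ratio of the alignment's bit score to the query's self-alignment score."""
    return hit.bit_score / hit.self_score


def is_potential_orthologue(hit):
    """True if the alignment covers at least 75% of both proteins."""
    return (hit.aligned_query / hit.query_len >= MIN_COVERAGE
            and hit.aligned_subject / hit.subject_len >= MIN_COVERAGE)


def best_orthologue(hits):
    """Take the best hit by bit score; keep it only if it passes the coverage test."""
    if not hits:
        return None
    best = max(hits, key=lambda h: h.bit_score)
    return best if is_potential_orthologue(best) else None
```

Counting the proteins whose best orthologue exceeds a chosen `similarity` threshold, for each phenotype type, gives the fractions that the figure plots.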

Extended Data Fig. 10 Alternative ways of computing cofitness.

a, The effect of rescaling the cofitness values by the number of generations in six bacteria. For each of the six bacteria, we identified all pairs of protein-coding genes that were assigned to the same TIGR subrole, were more than 20 kb apart, and had fitness data. This gave 1,711–9,406 pairs per bacterium. We also selected a random subset of pairs that were assigned to different TIGR subroles, were more than 20 kb apart, and had fitness data (1,559–8,881 pairs per bacterium). For each pair, we compared the original cofitness values to the rescaled cofitness (computed from fitness values that were divided by the number of generations). b, The effect of averaging fitness scores from replicate experiments on the cofitness values.
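As a rough illustration of the two alternatives compared here, the sketch below treats cofitness as the Pearson correlation between two genes' fitness profiles across experiments. That definition is an assumption in this excerpt, as are the function names and the input layout. It then shows the rescaling by generations and the averaging of replicates.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length fitness profiles."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


def rescale(profile, generations):
    """Divide each experiment's fitness value by that experiment's generations."""
    return [f / g for f, g in zip(profile, generations)]


def average_replicates(profile, groups):
    """Average fitness within each group of replicate experiment indices."""
    return [sum(profile[i] for i in idx) / len(idx) for idx in groups]


def cofitness(a, b, generations=None, groups=None):
    """Cofitness of two genes, optionally after rescaling and/or averaging."""
    if generations is not None:
        a, b = rescale(a, generations), rescale(b, generations)
    if groups is not None:
        a, b = average_replicates(a, groups), average_replicates(b, groups)
    return pearson(a, b)
```

Comparing `cofitness(a, b)` with `cofitness(a, b, generations=...)` for same-subrole and different-subrole gene pairs mirrors the comparison described in panel a. Passing `groups` corresponds to panel b.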

Supplementary information

Supplementary Information

This file contains a Supplementary Table guide, Supplementary Figures 1–5, Supplementary Notes 1–6 and Supplementary References.

Reporting Summary

Supplementary Tables

This file contains Supplementary Tables 1–22. The tables are provided in a single Excel file with separate tabs for each table.

About this article

Cite this article

Price, M.N., Wetmore, K.M., Waters, R.J. et al. Mutant phenotypes for thousands of bacterial genes of unknown function. Nature 557, 503–509 (2018). https://doi.org/10.1038/s41586-018-0124-0

Received: 05 October 2016
Accepted: 09 April 2018
Published: 16 May 2018
Issue Date: 24 May 2018
DOI: https://doi.org/10.1038/s41586-018-0124-0
