Retroelement-guided protein diversification abounds in vast lineages of Bacteria and Archaea

Paul, Blair G.; Burstein, David; Castelle, Cindy J.; Handa, Sumit; Arambula, Diego; Czornyj, Elizabeth; Thomas, Brian C.; Ghosh, Partho; Miller, Jeff F.; Banfield, Jillian F.; Valentine, David L.

doi:10.1038/nmicrobiol.2017.45

Letter
Published: 03 April 2017

Blair G. Paul¹,
David Burstein²,
Cindy J. Castelle²,
Sumit Handa³,
Diego Arambula⁴,
Elizabeth Czornyj⁴,
Brian C. Thomas²,
Partho Ghosh³,
Jeff F. Miller⁴,⁵,⁶,
Jillian F. Banfield²,⁷,⁸ &
David L. Valentine¹,⁹ (ORCID: orcid.org/0000-0001-5914-9107)

Nature Microbiology volume 2, Article number: 17045 (2017)

Abstract

Major radiations of enigmatic Bacteria and Archaea with large inventories of uncharacterized proteins are a striking feature of the Tree of Life¹⁻⁵. The processes that led to functional diversity in these lineages, which may contribute to a host-dependent lifestyle, are poorly understood. Here, we show that diversity-generating retroelements (DGRs), which guide site-specific protein hypervariability⁶⁻⁸, are prominent features of genomically reduced organisms from the bacterial candidate phyla radiation (CPR) and as yet uncultivated phyla belonging to the DPANN (Diapherotrites, Parvarchaeota, Aenigmarchaeota, Nanoarchaeota and Nanohaloarchaea) archaeal superphylum. From reconstructed genomes we have defined monophyletic bacterial and archaeal DGR lineages that expand the known DGR range by 120% and reveal a history of horizontal retroelement transfer. Retroelement-guided diversification is further shown to be active in current CPR and DPANN populations, with an assortment of protein targets potentially involved in attachment, defence and regulation. Based on observations of DGR abundance, function and evolutionary history, we find that targeted protein diversification is a pronounced trait of CPR and DPANN phyla compared to other bacterial and archaeal phyla. This diversification mechanism may provide CPR and DPANN organisms with a versatile tool that could be used for adaptation to a dynamic, host-dependent existence.

**Figure 1: Prevalence of DGRs identified in groundwater metagenomes.**

**Figure 2: Phylogeny of DGRs and radiation of novel lineages.**

**Figure 3: Putative functional classes of DGR variable proteins.**

References

1. Rinke, C. et al. Insights into the phylogeny and coding potential of microbial dark matter. Nature 499, 431–437 (2013).
2. Castelle, C. J. et al. Genomic expansion of domain Archaea highlights roles for organisms from new phyla in anaerobic carbon cycling. Curr. Biol. 25, 690–701 (2015).
3. Brown, C. T. et al. Unusual biology across a group comprising more than 15% of domain bacteria. Nature 523, 208–211 (2015).
4. Hug, L. A. et al. A new view of the tree of life. Nat. Microbiol. 1, 16048 (2016).
5. Anantharaman, K. et al. Thousands of microbial genomes shed light on interconnected biogeochemical processes in an aquifer system. Nat. Commun. 7, 13219 (2016).
6. Liu, M. et al. Reverse transcriptase-mediated tropism switching in Bordetella bacteriophage. Science 295, 2091–2094 (2002).
7. Doulatov, S. et al. Tropism switching in Bordetella bacteriophage defines a family of diversity-generating retroelements. Nature 431, 476–481 (2004).
8. Guo, H., Arambula, D., Ghosh, P. & Miller, J. F. Diversity-generating retroelements in phage and bacterial genomes. Microbiol. Spectr. http://dx.doi.org/10.1128/microbiolspec.MDNA3-0029-2014 (2014).
9. Comolli, L. R., Baker, B. J., Downing, K. H., Siegerist, C. E. & Banfield, J. F. Three-dimensional analysis of the structure and ecology of a novel, ultra-small archaeon. ISME J. 3, 159–167 (2009).
10. Baker, B. J. et al. Enigmatic, ultrasmall, uncultivated Archaea. Proc. Natl Acad. Sci. USA 107, 8806–8811 (2010).
11. Luef, B. et al. Diverse uncultivated ultra-small bacterial cells in groundwater. Nat. Commun. 6, 6372 (2015).
12. Gong, J., Qing, Y., Guo, X. & Warren, A. Candidatus sonnebornia yantaiensis, a member of candidate division OD1, as intracellular bacteria of the ciliated protist Paramecium bursaria (Ciliophora, oligohymenophorea). Syst. Appl. Microbiol. 37, 35–41 (2014).
13. Nelson, W. C. & Stegen, J. C. The reduced genomes of parcubacteria (OD1) contain signatures of a symbiotic lifestyle. Front. Microbiol. 6, 713 (2015).
14. Valentine, D. L. Adaptations to energy stress dictate the ecology and evolution of the Archaea. Nat. Rev. Microbiol. 5, 316–323 (2007).
15. Paul, B. G. et al. Targeted diversity generation by intraterrestrial Archaea and archaeal viruses. Nat. Commun. 6, 6585 (2015).
16. Le Coq, J. & Ghosh, P. Conservation of the C-type lectin fold for massive sequence variation in a treponema diversity-generating retroelement. Proc. Natl Acad. Sci. USA 108, 14649–14653 (2011).
17. Arambula, D. et al. Surface display of a massively variable lipoprotein by a Legionella diversity-generating retroelement. Proc. Natl Acad. Sci. USA 110, 8212–8217 (2013).
18. Pfreundt, U., Kopf, M., Belkin, N., Berman-Frank, I. & Hess, W. R. The primary transcriptome of the marine diazotroph Trichodesmium erythraeum IMS101. Sci. Rep. 4, 6187 (2014).
19. Miller, J. L. et al. Selective ligand recognition by a diversity-generating retroelement variable protein. PLoS Biol. 6, e131 (2008).
20. Handa, S., Paul, B. G., Valentine, D. L., Miller, J. F. & Ghosh, P. Conservation of the C-type lectin fold for accommodating massive sequence variation in archaeal diversity-generating retroelements. BMC Struct. Biol. 16, 13 (2016).
21. Nimkulrat, S. et al. Genomic and metagenomic analysis of diversity-generating retroelements associated with Treponema denticola. Front. Microbiol. 7, 852 (2016).
22. Guo, H. et al. Diversity-generating retroelement homing regenerates target sequences for repeated rounds of codon rewriting and protein diversification. Mol. Cell 31, 813–823 (2008).
23. Sharon, I. et al. Time series community genomics analysis reveals rapid shifts in bacterial species, strains, and phage during infant gut colonization. Genome Res. 23, 111–120 (2013).
24. Guo, H. et al. Target site recognition by a diversity-generating retroelement. PLoS Genet. 7, e1002414 (2011).
25. Minot, S., Grunberg, S., Wu, G. D., Lewis, J. D. & Bushman, F. D. Hypervariable loci in the human gut virome. Proc. Natl Acad. Sci. USA 109, 3962–3966 (2012).
26. Schillinger, T., Lisfi, M., Chi, J., Cullum, J. & Zingler, N. Analysis of a comprehensive dataset of diversity generating retroelements generated by the program DiGReF. BMC Genomics 13, 430 (2012).
27. Ye, Y. Identification of diversity-generating retroelements in human microbiomes. Int. J. Mol. Sci. 15, 14234–14246 (2014).
28. Zimmerly, S. & Wu, L. An unexplored diversity of reverse transcriptases in bacteria. Microbiol. Spectr. https://dx.doi.org/10.1128/microbiolspec.MDNA3-0058-2014 (2015).
29. Xu, Q. et al. A distinct type of pilus from the human microbiome. Cell 165, 690–703 (2016).
30. Anantharaman, V., Koonin, E. V. & Aravind, L. Regulatory potential, phyletic distribution and evolution of ancient, intracellular small-molecule-binding domains. J. Mol. Biol. 307, 1271–1292 (2001).
31. Peng, Y., Leung, H. C., Yiu, S. M. & Chin, F. Y. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics 28, 1420–1428 (2012).
32. Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010).
33. Finn, R. D., Clements, J. & Eddy, S. R. HMMER web server: interactive sequence similarity searching. Nucleic Acids Res. 39, W29–W37 (2011).
34. Kelley, L. A. & Sternberg, M. J. Protein structure prediction on the Web: a case study using the Phyre server. Nat. Protoc. 4, 363–371 (2009).
35. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
36. Ultsch, A. & Moerchen, F. ESOM-maps: Tools for Clustering, Visualization, and Classification with Emergent SOM, Technology Report, Department of Mathematics and Computer Science No. 46 (University of Marburg, 2005).
37. Dick, G. J. et al. Community-wide analysis of microbial genome sequence signatures. Genome Biol. 10, 1–16 (2009).
38. Raes, J., Korbel, J. O., Lercher, M. J., von Mering, C. & Bork, P. Prediction of effective genome size in metagenomic samples. Genome Biol. 8, R10 (2007).
39. Zuker, M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 31, 3406–3415 (2003).
40. Li, W. & Godzik, A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659 (2006).
41. Burge, S. W. et al. Rfam 11.0: 10 years of RNA families. Nucleic Acids Res. 41, D226–D232 (2013).
42. Simon, D. M. & Zimmerly, S. A diversity of uncharacterized reverse transcriptases in bacteria. Nucleic Acids Res. 36, 7219–7229 (2008).
43. Eddy, S. Profile hidden Markov models. Bioinformatics 14, 755–763 (1998).
44. Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree: computing large minimum evolution trees with profiles instead of a distance matrix. Mol. Biol. Evol. 26, 1641–1650 (2009).
45. Thompson, J. D., Gibson, T. & Higgins, D. G. Multiple sequence alignment using ClustalW and ClustalX. Curr. Protoc. Bioinformatics http://dx.doi.org/10.1002/0471250953.bi0203s00 (2002).
46. Guindon, S. et al. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59, 307–321 (2010).

Acknowledgements

This research was funded by National Science Foundation grant no. OCE-1046144 to D.L.V., National Institutes of Health grant no. R01 AI096838 to J.F.M. and P.G., and by the US Department of Energy (DOE), Office of Science, Office of Biological and Environmental Research under award no. DE-AC02-05CH11231 (Sustainable Systems Scientific Focus Area; Lawrence Berkeley National Laboratory operated by the University of California) and award no. DE-SC0004918 (Systems Biology Knowledge Base Focus Area). Sequencing was performed at the US DOE Joint Genome Institute, a DOE Office of Science User Facility, supported under contract no. DE-AC02-05CH11231. Metatranscriptomes were sequenced at the DOE-supported Environmental Molecular Sciences Laboratory at Pacific Northwest National Laboratory. B.G.P. was supported by a postdoctoral fellowship from the Center for Dark Energy Biosphere Investigations (C-DEBI). D.B. was supported by a long-term EMBO fellowship. The authors thank K. Anantharaman for assistance with genome binning, A. Singh and C.T. Brown, who aided in examining CPR and DPANN genomes, and C. Magnabosco for offering insights on phylogenetic reconstruction. This is C-DEBI contribution no. 361.

Author information

Authors and Affiliations

1. Marine Science Institute, University of California, Santa Barbara, 93106, California, USA
Blair G. Paul & David L. Valentine
2. Department of Earth and Planetary Science, University of California, Berkeley, 94720, California, USA
David Burstein, Cindy J. Castelle, Brian C. Thomas & Jillian F. Banfield
3. Department of Chemistry and Biochemistry, UC San Diego, La Jolla, 92093, California, USA
Sumit Handa & Partho Ghosh
4. Department of Microbiology, Immunology and Molecular Genetics, University of California, Los Angeles, 90095, California, USA
Diego Arambula, Elizabeth Czornyj & Jeff F. Miller
5. Molecular Biology Institute, University of California, Los Angeles, 90095, California, USA
Jeff F. Miller
6. California NanoSystems Institute, University of California, Los Angeles, 90095, California, USA
Jeff F. Miller
7. Earth Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, 94720, California, USA
Jillian F. Banfield
8. Department of Environmental Science, Policy, and Management, University of California, Berkeley, 94720, California, USA
Jillian F. Banfield
9. Department of Earth Science, UC Santa Barbara, Santa Barbara, 93106, California, USA
David L. Valentine

Contributions

B.G.P. and D.L.V. developed the project. B.G.P., D.B., C.J.C., B.C.T. and J.F.B. performed reassembly, read mapping and annotation of the metagenomic and metatranscriptomic data sets. B.G.P., D.B., C.J.C., E.C., D.A., S.H., P.G., J.F.M., J.F.B. and D.L.V. conducted bioinformatic analyses on DGR sequences. B.G.P., D.B., C.J.C., J.F.B. and D.L.V. wrote the manuscript.

Corresponding author

Correspondence to David L. Valentine.

Ethics declarations

Competing interests

J.F.M. is a cofounder, equity holder and chair of the scientific advisory board of AvidBiotics Inc., a biotherapeutics company in San Francisco. No other authors declare competing financial interests.

Supplementary information

Supplementary Information

Supplementary Figures 1–9 (PDF 1731 kb)

Supplementary Tables 1–10

Supplementary Table 1: DGRs that appear to be active based on read mapping and a stem-loop-like sequence. Ns substitutions linked to template repeat (TR) adenines were inferred from read mapping to the variable region (VR), and putative DGR stem-loops were predicted using the Mfold DNA folding server (see Methods). The number of stem-loops is shown incrementally for the same DGR. The 3′ distance from VR to the beginning of the stem-loop is given in nucleotides. (XLSX 211 kb)

Supplementary Table 2: Metatranscriptomic read-mapping analysis for DGRs that recruited at least ten perfectly matching transcripts. Relative proportions are given for transcripts mapping to DGRs versus the whole contig, and separately for transcripts mapping to TR versus the sum for all other DGR features.

Supplementary Table 3: Annotation details for DUF1566 (PF07603) containing DGR variable proteins. Variable protein length is given in amino acids. Transmembrane (TM) predictions are shown as “yes”, “no”, or “signal peptide”. The best hit from HMMER is listed with its corresponding e-value. Phyre2 values are given as per cent confidence (conf) and per cent coverage of the variable protein (covg).

Supplementary Table 4: Taxonomic affiliations of AAA_5 ATPase (PF07728) domain-containing DGR variable proteins. Rows are coloured by domain. Best hits were retrieved using phmmer searches against the UniProt database.

Supplementary Table 5: DGR-containing scaffolds and feature coordinates, including RT, VP (up to three), VR (up to three), and TR. Genome bin affiliations are given for each scaffold.

Supplementary Table 6: DGR-containing scaffolds and feature coordinates for scaffolds with more than one DGR cassette (up to three distinct DGRs for a single scaffold).

Supplementary Table 7: DGR-containing scaffolds and feature annotations for DGRs with split/interrupted RT open reading frames.

Supplementary Table 8: Variable proteins with homology to known Pfam families or UniProtKB database representatives. NA (not applicable) indicates that no significant hit was returned from the database.

Supplementary Table 9: Index of reverse transcriptase (RT) tree labels. Representatives listed under Database as “Genbank” have tree labels that are NCBI accession numbers.

Supplementary Table 10: DGR-containing scaffolds and corresponding GenBank accession codes.

Supplementary Data 1

All DGR-containing sequences that are described in this study, which were derived from draft genomes. (TXT 44046 kb)

Supplementary Data 2

Reverse transcriptase protein sequences for all DGRs from draft genomes. (TXT 259 kb)

Supplementary Data 3

All DGR-targeted variable protein sequences. (TXT 255 kb)

Supplementary Data 4

Reverse transcriptase tree that corresponds to Fig. 2. (TXT 20 kb)

Supplementary Data 5

The reverse transcriptase multiple sequence alignment used to construct the phylogenetic tree in Fig. 2. (TXT 157 kb)

Supplementary Data 6

DGR-containing sequences as assembled metagenomic fragments, which are not contained in a draft genome. (TXT 36182 kb)

About this article

Cite this article

Paul, B., Burstein, D., Castelle, C. et al. Retroelement-guided protein diversification abounds in vast lineages of Bacteria and Archaea. Nat Microbiol 2, 17045 (2017). https://doi.org/10.1038/nmicrobiol.2017.45

Received: 07 September 2016
Accepted: 03 March 2017
Published: 03 April 2017
DOI: https://doi.org/10.1038/nmicrobiol.2017.45