Unusual biology across a group comprising more than 15% of domain Bacteria

  • Nature volume 523, pages 208–211 (09 July 2015)
  • doi:10.1038/nature14486


A prominent feature of the bacterial domain is a radiation of major lineages that are defined as candidate phyla because they lack isolated representatives. Bacteria from these phyla occur in diverse environments [1] and are thought to mediate carbon and hydrogen cycles [2]. Genomic analyses of a few representatives suggested that metabolic limitations have prevented their cultivation [2–6]. Here we reconstructed 8 complete and 789 draft genomes from bacteria representing >35 phyla and documented features that consistently distinguish these organisms from other bacteria. We infer that this group, which may comprise >15% of the bacterial domain, has shared evolutionary history, and describe it as the candidate phyla radiation (CPR). All CPR genomes are small and most lack numerous biosynthetic pathways. Owing to divergent 16S ribosomal RNA (rRNA) gene sequences, 50–100% of organisms sampled from specific phyla would evade detection in typical cultivation-independent surveys. CPR organisms often have self-splicing introns and proteins encoded within their rRNA genes, a feature rarely reported in bacteria. Furthermore, they have unusual ribosome compositions. All are missing a ribosomal protein often absent in symbionts, and specific lineages are missing ribosomal proteins and biogenesis factors considered universal in bacteria. This implies different ribosome structures and biogenesis mechanisms, and underlines unusual biology across a large part of the bacterial domain.
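The claim that divergent 16S rRNA gene sequences let organisms evade amplicon surveys can be illustrated with a small in-silico primer check. The sketch below is not the authors' pipeline; it is a minimal illustration that assumes a primer "binds" when some site in the gene has at most one mismatch against the primer's IUPAC codes. The primer shown is the commonly used 515F sequence; the two gene fragments, the mismatch threshold, and all function names are hypothetical and chosen only for illustration.

```python
# IUPAC nucleotide codes mapped to the set of bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def mismatches(primer, site):
    """Count positions where the site base is not allowed by the primer code."""
    if len(primer) != len(site):
        raise ValueError("primer and site must have equal length")
    return sum(base not in IUPAC[code] for code, base in zip(primer, site))


def best_site(primer, sequence):
    """Slide the primer along the sequence; return (fewest mismatches, offset)."""
    windows = range(len(sequence) - len(primer) + 1)
    return min((mismatches(primer, sequence[i:i + len(primer)]), i) for i in windows)


def would_amplify(primer, sequence, max_mismatches=1):
    """Crude check: the primer binds if some site is within the mismatch budget."""
    if len(sequence) < len(primer):
        return False
    return best_site(primer, sequence)[0] <= max_mismatches


PRIMER_515F = "GTGCCAGCMGCCGCGGTAA"  # commonly used 16S V4 forward primer

# Hypothetical gene fragments, made up for illustration (not real CPR sequences).
conserved = "TTACGTGCCAGCAGCCGCGGTAATACG"
divergent = "TTACGTGCTAGCAGCTGCGGTTATACG"

print(would_amplify(PRIMER_515F, conserved))  # True
print(would_amplify(PRIMER_515F, divergent))  # False
```

Real primer evaluation also weighs mismatch position (3' mismatches matter more) and both primers of a pair, which this sketch deliberately ignores.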


Change history

  • Corrected online 29 January 2016

    Extended Data Table 1 was corrected on 25 January 2016



Data deposits

DNA and RNA sequences have been deposited in the NCBI Sequence Read Archive under accession number SRP050083, and genome sequences have been deposited in NCBI BioProject under accession number PRJNA273161 (first versions described here). Genomes are also available through ggKbase: http://ggkbase.berkeley.edu/CPR-complete-draft/organisms. ggKbase is a ‘live data’ site, thus annotations and genomes may be improved after publication.
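As a convenience, the two accession numbers above can be turned into NCBI Entrez E-utilities search queries. This is a minimal sketch that only builds the query URLs and performs no network request; the helper name `esearch_url` is hypothetical, and whether the records still resolve under these identifiers is not verified here.

```python
from urllib.parse import urlencode

# Base endpoint of the NCBI E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db, term):
    """Build an ESearch query URL for one Entrez database and search term."""
    return EUTILS_ESEARCH + "?" + urlencode({"db": db, "term": term, "retmode": "json"})


# Accession numbers as stated in the data deposits paragraph.
sra_query = esearch_url("sra", "SRP050083")
bioproject_query = esearch_url("bioproject", "PRJNA273161")

print(sra_query)
print(bioproject_query)
```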


References

  1. New perspective on uncultured bacterial phylogenetic division OP11. Appl. Environ. Microbiol. 70, 845–849 (2004).
  2. Fermentation, hydrogen, and sulfur metabolism in multiple uncultivated bacterial phyla. Science 337, 1661–1665 (2012).
  3. Small genomes and sparse metabolisms of sediment-associated bacteria from four candidate phyla. MBio 4, e00708–e00713 (2013).
  4. Metabolic interdependencies between phylogenetically novel fermenters and respiratory organisms in an unconfined aquifer. ISME J. 8, 1452–1463 (2014).
  5. Insights into the phylogeny and coding potential of microbial dark matter. Nature 499, 431–437 (2013).
  6. Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes. Nature Biotechnol. 31, 533–538 (2013).
  7. Genomic expansion of domain archaea highlights roles for organisms from new phyla in anaerobic carbon cycling. Curr. Biol. 25, 690–701 (2015).
  8. Diverse, uncultivated ultra-small bacterial cells in groundwater. Nature Commun. 6, 6372 (2015).
  9. Homing endonuclease genes: the rise and fall and rise again of a selfish element. Curr. Opin. Genet. Dev. 14, 609–615 (2004).
  10. Multiple self-splicing introns in the 16S rRNA genes of giant sulfur bacteria. Proc. Natl Acad. Sci. USA 109, 4203–4208 (2012).
  11. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 41, D590–D596 (2013).
  12. Bacterial ribosomal RNA in pieces. Mol. Microbiol. 57, 318–325 (2005).
  13. Toxic introns and parasitic intein in Coxiella burnetii: legacies of a promiscuous past. J. Bacteriol. 190, 5934–5943 (2008).
  14. Extremely acidophilic protists from acid mine drainage host Rickettsiales-lineage endosymbionts that have intervening sequences in their 16S rRNA genes. Appl. Environ. Microbiol. 69, 5512–5518 (2003).
  15. ‘Candidatus Sonnebornia yantaiensis’, a member of candidate division OD1, as intracellular bacteria of the ciliated protist Paramecium bursaria (Ciliophora, Oligohymenophorea). Syst. Appl. Microbiol. 37, 35–41 (2014).
  16. Ultra-high-throughput microbial community analysis on the Illumina HiSeq and MiSeq platforms. ISME J. 6, 1621–1624 (2012).
  17. Structural RNA Homology Search and Alignment using Covariance Models (Washington Univ. in Saint Louis, 2009).
  18. Omic approaches in microbial ecology: charting the unknown. Microbe 8, 353–360 (2013).
  19. Uniting the classification of cultured and uncultured bacteria and archaea using 16S rRNA gene sequences. Nature Rev. Microbiol. 12, 635–645 (2014).
  20. Inactivation of ribosomal protein genes in Bacillus subtilis reveals importance of each ribosomal protein for cell proliferation and cell differentiation. J. Bacteriol. 194, 6282–6291 (2012).
  21. Comparative analysis of ribosomal proteins in complete genomes: an example of reductive evolution at the domain scale. Nucleic Acids Res. 30, 5382–5390 (2002).
  22. Signature protein of the PVC superphylum. Appl. Environ. Microbiol. 80, 440–445 (2014).
  23. Phylogenomics of prokaryotic ribosomal proteins. PLoS ONE 7, e36972 (2012).
  24. Initiator proteins for the assembly of the 50S subunit from Escherichia coli ribosomes. Proc. Natl Acad. Sci. USA 79, 7238–7242 (1982).
  25. A gripping tale of ribosomal frameshifting: extragenic suppressors of frameshift mutations spotlight P-site realignment. Microbiol. Mol. Biol. Rev. 73, 178–210 (2009).
  26. Structures of the bacterial ribosome at 3.5 Å resolution. Science 310, 827–834 (2005).
  27. Ribosomal protein L1 recognizes the same specific structural motif in its target sites on the autoregulatory mRNA and 23S rRNA. Nucleic Acids Res. 33, 478–485 (2005).
  28. Assembly of bacterial ribosomes. Annu. Rev. Biochem. 80, 501–526 (2011).
  29. Iron-reducing bacteria accumulate ferric oxyhydroxide nanoparticle aggregates that may support planktonic growth. ISME J. 7, 338–350 (2013).
  30. Acetate availability and its influence on sustainable bioremediation of uranium-contaminated groundwater. Geomicrobiol. J. 28, 519–539 (2011).
  31. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics 28, 1420–1428 (2012).
  32. Fast gapped-read alignment with Bowtie 2. Nature Methods 9, 357–359 (2012).
  33. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010).
  34. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26, 2460–2461 (2010).
  35. UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics 23, 1282–1288 (2007).
  36. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res. 40, D109–D114 (2012).
  37. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28, 27 (2000).
  38. Community genomic analyses constrain the distribution of metabolic traits across the Chloroflexi phylum and indicate roles in sediment carbon cycling. Microbiome 1, 22 (2013).
  39. Extraordinary phylogenetic diversity and metabolic versatility in aquifer sediment. Nature Commun. 4, 2120 (2013).
  40. Community-wide analysis of microbial genome sequence signatures. Genome Biol. 10, R85 (2009).
  41. Prediction of effective genome size in metagenomic samples. Genome Biol. 8, R10 (2007).
  42. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
  43. Candidate phylum TM6 genome recovered from a hospital sink biofilm provides genomic insights into this uncultivated phylum. Proc. Natl Acad. Sci. USA 110, E2390–E2399 (2013).
  44. Targeted access to the genomes of low-abundance organisms in complex microbial communities. Appl. Environ. Microbiol. 73, 3205–3214 (2007).
  45. Dissecting biological ‘dark matter’ with single-cell genetic analysis of rare and uncultivated TM7 microbes from the human mouth. Proc. Natl Acad. Sci. USA 104, 11889–11894 (2007).
  46. Infernal 1.0: inference of RNA alignments. Bioinformatics 25, 1335–1337 (2009).
  47. The Comparative RNA Web (CRW) Site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs. BMC Bioinformatics 3, 2 (2002).
  48. Rfam 11.0: 10 years of RNA families. Nucleic Acids Res. 41, D226–D232 (2013).
  49. Efficient parameter estimation for RNA secondary structure prediction. Bioinformatics 23, i19–i28 (2007).
  50. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28, 1647–1649 (2012).
  51. Pfam: the protein families database. Nucleic Acids Res. 42, D222–D230 (2014).
  52. Protein structure prediction on the Web: a case study using the Phyre server. Nature Protocols 4, 363–371 (2009).
  53. Meeting report: the terabase metagenomics workshop and the vision of an Earth microbiome project. Stand. Genomic Sci. 3, 243–248 (2010).
  54. PrimerProspector: de novo design and taxonomic analysis of barcoded polymerase chain reaction primers. Bioinformatics 27, 1159–1161 (2011).
  55. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).
  56. Accelerated profile HMM searches. PLOS Comput. Biol. 7, e1002195 (2011).
  57. FastTree 2—approximately maximum-likelihood trees for large alignments. PLoS ONE 5, e9490 (2010).
  58. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
  59. ProtTest: selection of best-fit models of protein evolution. Bioinformatics 21, 2104–2105 (2005).
  60. Dendroscope 3: an interactive tool for rooted phylogenetic trees and networks. Syst. Biol. 61, 1061–1067 (2012).
  61. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 18, 821–829 (2008).
  62. ESOM-Maps: tools for clustering, visualization, and classification with Emergent SOM. Technical Report no. 46 (Dept. of Mathematics and Computer Science, University of Marburg, Germany, 2005).



We thank J. Cate and S. Moore for input into the ribosomal protein analysis, J. Doudna and E. Nawrocki for suggestions on the rRNA insertion analysis, and M. Markillie and R. Taylor for assistance with RNA sequencing. Research was supported by the US Department of Energy (DOE), Office of Science, Office of Biological and Environmental Research under award number DE-AC02-05CH11231 (Sustainable Systems Scientific Focus Area and DOE-JGI) and award number DE-SC0004918 (Systems Biology Knowledge Base Focus Area). L.A.H. was partially supported by a Natural Sciences and Engineering Research Council postdoctoral fellowship. DNA sequencing was conducted at the DOE Joint Genome Institute, a DOE Office of Science User Facility, via the Community Science Program. RNA sequencing was performed at the DOE-supported Environmental Molecular Sciences Laboratory at Pacific Northwest National Laboratory.

Author information


  1. Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA

    • Christopher T. Brown
  2. Department of Earth and Planetary Science, University of California, Berkeley, California 94720, USA

    • Laura A. Hug
    • Brian C. Thomas
    • Itai Sharon
    • Cindy J. Castelle
    • Andrea Singh
    • Jillian F. Banfield
  3. School of Earth Sciences, The Ohio State University, Columbus, Ohio 43210, USA

    • Michael J. Wilkins
  4. Department of Microbiology, The Ohio State University, Columbus, Ohio 43210, USA

    • Michael J. Wilkins
    • Kelly C. Wrighton
  5. Earth Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA

    • Kenneth H. Williams
    • Jillian F. Banfield
  6. Department of Environmental Science, Policy, and Management, University of California, Berkeley, California 94720, USA

    • Jillian F. Banfield



Samples and geochemical measurements were taken by M.J.W., K.C.W. and K.H.W. B.C.T. assembled the metagenome data. I.S. implemented the ABAWACA algorithm. C.T.B. and J.F.B. binned the data and carried out the ESOM binning validation. J.F.B. closed and curated the complete genomes. C.T.B., L.A.H. and B.C.T. conducted the rRNA gene insertion analysis. C.T.B. and L.A.H. performed phylogenetic analyses. M.J.W. and K.C.W. conducted the RNA sequencing. C.T.B. carried out the 16S rRNA gene copy number, primer binding and transcript analyses. C.T.B. and J.F.B. carried out the ribosomal protein analyses. C.T.B., L.A.H., C.J.C. and J.F.B. conducted the metabolic analysis. A.S. and B.C.T. provided bioinformatics support. C.T.B. and J.F.B. drafted the manuscript. All authors reviewed the results and approved the manuscript.

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Jillian F. Banfield.

Supplementary information

PDF files

  1. Supplementary Information

    This file contains a guide to Supplementary Figure 1, Supplementary Tables 1–10 and the Supplementary Data (see separate files).

  2. Supplementary Figure

    This file contains Supplementary Figure 1 (see the Supplementary Information file for details).

Excel files

  1. Supplementary Tables

    This file contains Supplementary Tables 1–10 (see the Supplementary Information file for details).

Zip files

  1. Supplementary Data

    This zipped file contains the Supplementary Data (see the Supplementary Information file for details).

