Main

Relatively little is known about the functioning of complex microbial communities, largely due to the difficulty in culturing most microbes (Rappe and Giovannoni, 2003). Although metagenomics can provide information on the genetic capabilities of the entire community, it is difficult to connect predicted gene functions to specific organisms using metagenomics (Morales and Holben, 2011). One method to address these challenges is single-cell genomics, where a single microbial cell is isolated from a sample, lysed, and its genome amplified by multiple displacement amplification (MDA; Lasken, 2012; Blainey, 2013).

When working with environmental or clinical samples, it is generally impractical to use fresh material, so most samples must be preserved in some way before they are studied by single-cell genomics. A possible difficulty is that preservation methods could damage the DNA and thereby reduce genome recovery. The effect of treatments on the cells is especially important in single-cell genomics because the process is already known to produce incomplete genomes even with fresh cells (Marcy et al., 2007). The average estimated genome completeness of 650 single-cell genomes that are publicly available in IMG (https://img.jgi.doe.gov) is 41%, based on the presence of conserved single-copy genes using the method of Rinke et al. (2013).
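
The idea behind a marker-based completeness estimate can be illustrated with a minimal sketch: completeness is taken as the percentage of an expected set of conserved single-copy genes that is detected in a genome. The function name and the marker names below are placeholders chosen for illustration; they are not the marker set or the implementation of Rinke et al. (2013).

```python
def estimate_completeness(observed_genes, marker_set):
    """Percent of expected single-copy marker genes found in a genome.

    observed_genes: gene names detected in the single-cell assembly.
    marker_set: conserved single-copy genes expected in a complete genome
                (hypothetical here; a real analysis uses a curated set).
    """
    expected = set(marker_set)
    if not expected:
        raise ValueError("marker_set must not be empty")
    # Genes outside the marker set are ignored; duplicates count once.
    found = expected & set(observed_genes)
    return 100.0 * len(found) / len(expected)


# Illustrative input only: five placeholder markers, two of them detected.
markers = ["rpoB", "recA", "gyrB", "rpsC", "rplB"]
observed = ["rpoB", "gyrB", "dnaK"]
completeness = estimate_completeness(observed, markers)
```

With these placeholder inputs, two of five markers are detected, so the estimate is 40%.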

The purpose of this study was to test the effects of common sample treatments on the recovery of genomes from single cells. Three methods of sample preservation were tested: cryopreservation with 20% glycerol as a cryoprotectant, preservation in 70% ethanol, and preservation in 4% paraformaldehyde. Because paraformaldehyde treatment causes crosslinks between nucleic acids and proteins, we tested cells exposed to 4% paraformaldehyde with and without having their crosslinks reversed by heat treatment. Additionally, there is demand for obtaining single-cell genomes from particular taxa within a microbial community. A common approach for this type of isolation makes use of fluorescence in situ hybridization (FISH) to identify the target cells (Podar et al., 2007; Haroon et al., 2013). Thus, cells that had undergone a typical FISH protocol without prior fixation (Yilmaz et al., 2010) were included as an additional treatment to separate the FISH protocol from the fixation steps.

To determine whether the GC content of a genome would affect the results, three bacterial strains were chosen with differing genomic G+C contents: Pedobacter heparinus DSM 2366, 42% GC; Escherichia coli K12-MG1655, 51% GC; and Meiothermus ruber DSM 1279, 63% GC. All three have complete genome sequences available and share the same cell structure (Gram negative), which controls for major differences in lysis efficiency. Forty cells of each of the three strains underwent each of the five treatments outlined above (Figure 1).

Figure 1

Schematic of the experimental workflow. For clarity, the treatments are color coded throughout the figure: blue, Cryo (cryopreservation); green, FISH (FISH treatment); yellow, EtOH (ethanol fixation); orange, PFA xlinks (paraformaldehyde fixation with crosslinks reversed); red, PFA (paraformaldehyde fixation with crosslinks intact); brown, Pos (positive controls, wells each with 100 cryopreserved cells); purple, Neg (no-cell negative controls). The layout used for the single-cell isolations in a 384-well plate is shown. Each strain/treatment combination was sorted separately, with one plate per strain. The MDA panel plots the Cp (inflection point) values of the real-time MDA amplification curves for all three bacterial species combined. The MDA reaction was run for 16 h, so wells that showed no amplification by that time are indicated as 16+ on the graph. Each column in the chart summarizes data from 120 separate MDA reactions, except for the negative control, which comprises 60 reactions. The 16S rRNA gene screening panel indicates the percentage of wells for each treatment that produced a 16S rRNA gene sequence from the reference organism.

All cells underwent alkaline lysis and MDA (Woyke et al., 2011). The time at which the inflection point of the real-time amplification curve occurs (crossing point; Cp value) correlates with the completeness of the recovered single-cell genomes (Supplementary Figure S1). Cp values ranked as follows: cryopreservation < FISH < ethanol < paraformaldehyde with crosslinks reversed < paraformaldehyde with crosslinks intact = negative controls (Figure 1). Although the real-time kinetics showed some amplification, in no case did the MDA products from the paraformaldehyde treatments, with or without crosslink reversal, yield a 16S rRNA gene amplicon by PCR (Figure 1), indicating that the MDA products were likely due to nonspecific amplification of primers. For each strain, eight cells were selected for shotgun sequencing from each of the three treatments that produced the expected 16S rRNA gene sequence (cryopreservation, ethanol, and FISH); the cells chosen were those with the earliest Cp times within each treatment.
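
As an illustration of how a crossing point can be read from a real-time amplification curve, the sketch below takes the Cp to be the time of steepest fluorescence increase, which is the inflection point of a sigmoidal curve. This is a simplified assumption for illustration; the instrument software used in the study may define Cp differently, and the function name, the `min_rise` parameter, and the example values are hypothetical.

```python
def crossing_point(times_h, fluorescence, min_rise=0.0):
    """Estimate Cp as the time (h) of the steepest fluorescence increase.

    times_h must be strictly increasing. Returns None when the curve does
    not rise by more than min_rise over the run, which corresponds to a
    well reported as '16+' after a 16 h reaction.
    """
    if len(times_h) != len(fluorescence) or len(times_h) < 2:
        raise ValueError("need two equally long series with >= 2 points")
    if fluorescence[-1] - fluorescence[0] <= min_rise:
        return None  # no detectable amplification
    best_slope = 0.0
    best_time = None
    for i in range(1, len(times_h)):
        dt = times_h[i] - times_h[i - 1]
        slope = (fluorescence[i] - fluorescence[i - 1]) / dt
        if slope > best_slope:
            best_slope = slope
            # Report the midpoint of the interval with the steepest rise.
            best_time = (times_h[i] + times_h[i - 1]) / 2
    return best_time


# Made-up curve: the steepest rise lies between 2 h and 3 h.
cp = crossing_point([0, 1, 2, 3, 4], [0, 1, 3, 9, 10])
```

A smoothed or model-fitted curve would give a more robust estimate on noisy data; the finite-difference version is kept here only because it makes the definition explicit.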

To eliminate any biases due to varying sequencing depth, the sequences were randomly subsampled to a coverage depth of 315× for each single amplified genome (SAG; Supplementary Figure S2). The reads were mapped to the reference genomes, a de novo assembly was performed, and the assembled contigs were also mapped to the reference genomes.
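
The subsampling step can be sketched as follows, assuming reads of uniform length so that depth is approximately (number of reads × read length) / genome size. The read length, the function names, and the fixed random seed are assumptions made for this example and are not taken from the study.

```python
import random


def reads_for_depth(genome_size_bp, read_length_bp, target_depth):
    """Number of reads giving roughly target_depth-fold coverage."""
    return (target_depth * genome_size_bp) // read_length_bp


def subsample_reads(reads, genome_size_bp, read_length_bp,
                    target_depth=315, seed=0):
    """Randomly pick reads, without replacement, down to the target depth.

    If fewer reads are available than needed, all reads are returned.
    The seed only makes the example reproducible.
    """
    n = reads_for_depth(genome_size_bp, read_length_bp, target_depth)
    if n >= len(reads):
        return list(reads)
    rng = random.Random(seed)
    return rng.sample(list(reads), n)


# Toy example: a 1,000 bp "genome" and 100 bp reads need 3,150 reads for 315x.
toy_reads = [f"read_{i}" for i in range(5000)]
subset = subsample_reads(toy_reads, genome_size_bp=1000, read_length_bp=100)
```

Sampling the same nominal depth for every SAG means that differences in genome recovery reflect the treatment rather than the amount of sequence data.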

Cryopreservation resulted in SAGs with the highest percentage of the genome recovered for all three strains of bacteria (Figure 2; Supplementary Figure S3). FISH treatment produced reduced genome coverage, and ethanol preservation resulted in the lowest amount of the genome recovered. These treatments are significantly different from each other (Figure 2; Supplementary Figure S3). Despite the trend for cells with a higher GC content to have higher MDA Cp values, there is no clear effect of GC content on the amount of genome recovered by the various treatments (Supplementary Figures S1 and S3).
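
For readers unfamiliar with the test behind the significance statement, the sketch below computes the F statistic of a one-way ANOVA from first principles: the variance between treatment means is divided by the variance within treatments. The function name and the numbers are made up for illustration and are not data from this study; converting F to a P value additionally requires the F distribution, which is omitted here.

```python
def one_way_anova_f(groups):
    """Return the F statistic for a one-way ANOVA over lists of values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need >= 2 groups and more values than groups")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of individual values around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Made-up values standing in for three treatment groups.
f_stat = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

A large F indicates that the treatment means differ by more than the cell-to-cell scatter within each treatment would explain.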

Figure 2

(a) Coverage plots for each organism/treatment combination. Cryo, cryopreservation; EtOH, ethanol preservation; FISH, FISH treatment. The horizontal black lines in the center of each plot represent the complete reference genomes, and the vertical colored lines indicate which parts of the genome were recovered in the assemblies. Redder colors indicate that more single cells contained that region in their assembly. (b) Percentage of the genome recovered for each treatment, averaged over the single cells from all three organisms; error bars indicate one standard deviation. Light gray bars show genome recovery determined by mapping the reads to the reference genomes; a base pair was considered recovered if at least 10 reads mapped to it. Dark gray bars show genome recovery determined by mapping contigs produced by de novo assembly to the reference genomes. The treatments are significantly different from each other (ANOVA; P<0.0001).
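
The read-mapping criterion in the caption (a base pair counts as recovered when at least 10 reads cover it) can be made concrete with a short sketch. It assumes that alignments are already available as 0-based, half-open (start, end) intervals on a single linear reference; the function name and the toy intervals are hypothetical, and a real analysis would derive depth from the output of a read mapper.

```python
def fraction_recovered(genome_length, alignments, min_depth=10):
    """Fraction of reference positions covered by >= min_depth reads.

    alignments: iterable of (start, end) pairs, 0-based and half-open.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (genome_length + 1)
    for start, end in alignments:
        start = max(0, start)
        end = min(genome_length, end)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    depth = 0
    recovered = 0
    for pos in range(genome_length):
        depth += diff[pos]  # running sum gives the depth at this position
        if depth >= min_depth:
            recovered += 1
    return recovered / genome_length


# Toy reference of 100 bp: the first half has 10 reads, the second half 9.
toy_alignments = [(0, 50)] * 10 + [(50, 100)] * 9
recovery = fraction_recovered(100, toy_alignments)
```

In the toy example only the first half of the reference reaches the threshold, so the recovered fraction is 0.5 even though every position is covered by some reads.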

The lack of amplicons from cells treated with paraformaldehyde is likely due to the crosslinks preventing the phi29 polymerase from accessing the DNA. The heat treatment intended to reverse the crosslinks was either insufficient to do so or caused enough DNA strand breakage and depurination that the DNA could no longer be amplified.

The FISH process is intended to improve access of the oligonucleotide probe to its RNA or DNA target site in the cells. As this might also give the phi29 polymerase greater access to the DNA, one would expect the MDA from this treatment to be the most efficient. However, the fact that the MDA kinetics were delayed and that genome recovery was reduced compared with the cryopreserved samples indicates that other factors must be involved. Because the washing steps should remove the components of the FISH buffers before MDA, these components would not directly affect the MDA process. Thus, the reduced genome recovery likely results from damage to the DNA during the FISH treatment. Since the FISH process can denature high-AT regions of DNA at the temperatures used, the resulting single-stranded DNA could be more susceptible to damage (Blake and Delcourt, 1996). This interpretation is supported by the genome recovery in the low-GC organism P. heparinus being lower than in the other two organisms (Supplementary Figure S3). In addition, it has been shown that formamide can degrade purine nucleosides (Saladino et al., 1996). Small amounts of damage could explain the reduction in the genome recovered by the MDA.

The significant reduction in genome recovery from ethanol-preserved cells is challenging to explain. It is unlikely that ethanol carried over and inhibited the MDA reaction, because the cells were washed twice before sorting and the sorting process itself results in significant dilution of the sample (Rodrigue et al., 2009). Since there is no known mechanism for ethanol to damage the DNA, the substantial reduction in the amount of the genome recovered must be due to the polymerase having restricted access to the DNA. Ethanol causes proteins to denature and precipitate (Yoshikawa et al., 2012). These precipitated proteins could aggregate around the DNA and prevent access by the polymerase.

Although single-cell genomics has great potential to provide insight into the vast number of microbes that have not been cultivated, sample handling can greatly impact the completeness of single-cell genomes. Our results suggest that samples that have been archived by preservation in paraformaldehyde will be unsuitable for the production of single-cell genomes and that ethanol-preserved samples are likely to produce single-cell genomes of reduced quality. Thus, we recommend use of cryopreserved specimens for best results and fixation-free FISH (Yilmaz et al., 2010; Haroon et al., 2013) if targeted flow sorting is to be employed.