Targeted detection of Dehalococcoides mccartyi microbial protein biomarkers as indicators of reductive dechlorination activity in contaminated groundwater

Dehalococcoides mccartyi (Dhc) bacterial strains expressing active reductive dehalogenase (RDase) enzymes play key roles in the transformation and detoxification of chlorinated pollutants, including chlorinated ethenes. Site monitoring regimes traditionally rely on qPCR to assess the presence of Dhc biomarker genes; however, this technique alone cannot directly report dechlorination activity. To supplement gene-centric approaches and provide a more reliable proxy for dechlorination activity, we sought to demonstrate a targeted proteomics approach that can characterize Dhc-mediated dechlorination in groundwater contaminated with chlorinated ethenes. Targeted peptides were selected using axenic cultures of Dhc strains 195, FL2, and BAV1. These experiments yielded 37 peptides from housekeeping and structural proteins (i.e., GroEL, EF-Tu, rpL7/L2, and the S-layer), as well as from proteins involved in reductive dechlorination (i.e., FdhA, TceA, and BvcA). Applying targeted proteomics to a defined bacterial consortium and to contaminated groundwater samples resulted in the detection of FdhA peptides, which revealed active dechlorination with Dhc strain-level resolution, and of RDase peptides, which indicated specific reductive dechlorination steps. These results show that targeted proteomics can be applied to groundwater samples to provide protein-level information about Dhc dechlorination activity.


Sample preparation for global and targeted proteomics
Filtered cells from axenic cultures of Dhc strains 195, FL2, and BAV1 (n = 2 biological replicates), the BDI consortium, and the M17, M18, 97, 116, and 129 groundwater samples (n = 1) were processed by adding 2 mL of SDS lysis buffer (4% SDS in 100 mM Tris-HCl, pH 8.0) to the Sterivex cartridges, followed by incubation in a water bath at 97 °C for 15 minutes and then at room temperature for 1 hour. The SDS lysis buffer was recovered and the filters were rinsed once more with fresh lysis buffer. As previously described, proteins were extracted from the cell lysates by trichloroacetic acid (TCA) precipitation, denatured, subjected to reduction and blocking of disulfide bonds, and proteolytically digested with trypsin. 1
Frozen filter membranes with biomass from the 33NA4 groundwater sample (n = 1) were removed from the cartridges, cut into ~1 cm pieces with a sterilized razor blade, and suspended in 5 mL of SDS lysis buffer (5% SDS in 50 mM Tris-HCl, pH 8.5; 0.15 M NaCl; 0.1 mM EDTA; 1 mM MgCl2; 50 mM DTT). Cells were heat-lysed as described earlier 2 and the supernatant containing the whole-cell lysate was transferred to new tubes. Proteins were then precipitated with TCA. The lysate mixtures were centrifuged at 21,000 × g for 20 min to obtain a protein pellet, which was washed with chilled acetone, air-dried, and solubilized in 6 M guanidine buffer. 3 Following protein solubilization, proteolysis was initiated with trypsin. All peptide solutions were desalted on 200 µL C18 stage tips (Thermo Scientific, Waltham, MA) and stored at −80 °C prior to global proteomics analysis. For targeted proteomics runs, aliquots of the processed samples were loaded directly onto capillary back columns and desalted off-line.

Protein identification by database searching
Tandem MS spectra from pure cultures of Dhc strains 195, FL2, and BAV1 and from the BDI consortium culture were searched against individual or concatenated databases of Dhc strain proteomes downloaded from UniProt in 02/2017 (strains 195, GT, VS, CBDB1, and BAV1). The IGS Annotation Engine (http://ae.igs.umaryland.edu/cgi/index.cgi; Reference: PMID:21677861) was used for structural and functional annotation of the Dhc strain FL2 protein sequences, and the web-based tool Manatee (http://manatee.sourceforge.net/) was used to view and download the protein annotations. The tryptic digest of the BDI consortium was searched against a database assembled from the proteomes of the six Dhc strains and Dehalobacter restrictus DSM 9455 (SI Table S5). Spectral data collected from groundwater samples were searched against a database encompassing the proteomes of bacterial isolates known to coexist with Dhc or to inhabit aquifers and sediments (SI Table S6).
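As an illustration of how several per-strain proteomes can be combined into one concatenated search database, the minimal Python sketch below merges FASTA text and prefixes each header with a strain label so that a matched protein can be traced back to its strain. The strain labels, the `strain|accession` header scheme, and the function names are hypothetical assumptions; this is not the database-building pipeline used for the searches described above.

```python
from typing import Dict, List, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs."""
    records: List[Tuple[str, str]] = []
    header = None
    seq_lines: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq_lines)))
            header, seq_lines = line[1:], []
        else:
            seq_lines.append(line)
    if header is not None:
        records.append((header, "".join(seq_lines)))
    return records


def concatenate_proteomes(proteomes: Dict[str, str]) -> str:
    """Merge per-strain FASTA texts into one database.

    Each header is prefixed with its (hypothetical) strain label so that a
    matched protein can be traced back to the strain it came from.
    """
    lines: List[str] = []
    for strain, fasta_text in proteomes.items():
        for header, sequence in parse_fasta(fasta_text):
            lines.append(f">{strain}|{header}")
            lines.append(sequence)
    return "\n".join(lines) + "\n"


# Toy input: multi-line sequences are joined into a single line per record.
db = concatenate_proteomes({
    "Dhc_195": ">P1\nMAKQ\nLLF\n",
    "Dhc_BAV1": ">P2\nMSTK\n",
})
print(db)
```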
In addition to common contaminant proteins, reversed protein sequences were appended to the databases as decoys to estimate the false discovery rate (FDR) at the spectrum level. For standard database searching, tandem fragmentation spectra (MS/MS) were searched with the MyriMatch v2.2 algorithm 4 using parameters described previously. 5 The resulting peptide-spectrum matches were then imported, filtered, and assembled into proteins with IDPicker v3.1 6 software. To achieve a final peptide-level confidence > 99% (FDR < 1%), proteins were required to be identified by at least two distinct peptide sequences and a minimum of two spectra.
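The target-decoy logic above can be sketched as follows, assuming each peptide-spectrum match carries a single score where higher is better and that FDR is estimated as decoy hits divided by target hits at or above a score cutoff. The function names and the `rev_` decoy prefix are hypothetical; IDPicker additionally performs protein grouping and parsimony, which this simplified per-protein count omits.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def with_reversed_decoys(records: List[Tuple[str, str]],
                         prefix: str = "rev_") -> List[Tuple[str, str]]:
    """Append a reversed-sequence decoy for every target protein."""
    return records + [(prefix + header, seq[::-1]) for header, seq in records]


def fdr_score_cutoff(psms: List[Tuple[float, bool]], max_fdr: float = 0.01) -> float:
    """Lowest score cutoff whose estimated FDR (decoys / targets at or above
    the cutoff) is <= max_fdr. psms are (score, is_decoy); higher is better.
    Returns inf if no cutoff qualifies."""
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = decoys = 0
    cutoff = float("inf")
    for i, (score, is_decoy) in enumerate(ranked):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Evaluate only after the last PSM sharing this score, so ties are safe.
        last_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if last_of_tie and targets > 0 and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


def confident_proteins(psms: List[Tuple[str, str, float, bool]], cutoff: float,
                       min_peptides: int = 2, min_spectra: int = 2) -> Set[str]:
    """Proteins with >= min_peptides distinct peptides and >= min_spectra
    spectra among target PSMs passing the cutoff.
    psms are (protein, peptide, score, is_decoy)."""
    peptides: Dict[str, Set[str]] = defaultdict(set)
    spectra: Dict[str, int] = defaultdict(int)
    for protein, peptide, score, is_decoy in psms:
        if is_decoy or score < cutoff:
            continue
        peptides[protein].add(peptide)
        spectra[protein] += 1
    return {p for p in peptides
            if len(peptides[p]) >= min_peptides and spectra[p] >= min_spectra}
```

Walking down the ranked list and keeping the lowest qualifying score maximizes the number of accepted matches while the estimated FDR stays within the limit.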

Global proteomics data analysis
Protein intensity values for each global proteomics dataset were calculated with IDPQuantify 7 by summing the MS1-level intensities of the peptide precursors identified by IDPicker. Extracted ion chromatograms (XICs) were generated using a retention time tolerance of ±30 s and an m/z tolerance of ±10 ppm. Protein abundance values were normalized by dividing protein intensities by protein length (i.e., number of amino acids), applying a log2 transformation, and performing a mean central tendency adjustment with the software platform InfernoRDN (https://omics.pnl.gov/software/infernordn).
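A minimal sketch of the quantification arithmetic, assuming NumPy, a proteins × runs intensity matrix in which zero or NaN marks an unquantified protein, and that the mean central tendency adjustment amounts to subtracting each run's mean log2 abundance. The function names are hypothetical, and the sketch does not reproduce IDPQuantify or InfernoRDN themselves.

```python
import numpy as np


def xic_window(mz: float, rt_s: float, ppm: float = 10.0, rt_tol_s: float = 30.0):
    """m/z and retention-time bounds used to extract one precursor's XIC."""
    delta_mz = mz * ppm / 1e6  # 10 ppm of m/z 1000 is 0.01
    return (mz - delta_mz, mz + delta_mz), (rt_s - rt_tol_s, rt_s + rt_tol_s)


def protein_intensities(precursors: dict) -> dict:
    """Sum MS1 precursor intensities per protein: {protein: [intensities]}."""
    return {protein: float(sum(values)) for protein, values in precursors.items()}


def normalize(intensity: np.ndarray, lengths: np.ndarray) -> np.ndarray:
    """Length-normalize, log2-transform and mean-center a proteins x runs matrix.

    Zeros are treated as missing (NaN); each run's mean log2 abundance,
    ignoring NaN, is subtracted from every protein in that run.
    """
    values = np.where(intensity > 0, intensity, np.nan).astype(float)
    per_residue = values / lengths[:, None]  # divide each row by protein length
    log2 = np.log2(per_residue)
    return log2 - np.nanmean(log2, axis=0, keepdims=True)
```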
Using the Perseus software, 8 stochastically sampled proteins were removed from the pure-culture datasets of Dhc strains 195, FL2, and BAV1 by requiring each quantified protein to be observed in both biological replicates of a strain. For the BDI consortium and groundwater sample sets (n = 2 and 3 technical replicates, respectively), proteins observed in at least one run were retained for comparison with the targeted results, because their sporadic identification by global proteomics may have reflected low biological abundance, in which case they could still be detectable by LC-MRM-MS. Missing values were then imputed with random numbers drawn from a simulated Gaussian distribution representing low-abundance proteins (down-shift of 2.5 and width of 0.3). All proteins identified by LC-MS/MS were clustered at > 85% amino acid sequence identity with the UClust algorithm of the analysis tool USearch v10.0. 9 Venn diagrams were generated with the web application jvenn (http://jvenn.toulouse.inra.fr/app/index.html).
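The replicate filters and the low-abundance imputation can be sketched with NumPy as below. The sketch assumes, following the usual Perseus convention, that the down-shift and width are expressed in standard deviations of each run's observed log2 values; the function names and the fixed random seed are hypothetical.

```python
import numpy as np


def quantified_in_all(values: np.ndarray) -> np.ndarray:
    """Rows (proteins) with a value in every column (biological replicate)."""
    return ~np.isnan(values).any(axis=1)


def quantified_in_any(values: np.ndarray) -> np.ndarray:
    """Rows (proteins) with a value in at least one column (technical run)."""
    return ~np.isnan(values).all(axis=1)


def impute_low_abundance(values: np.ndarray, down_shift: float = 2.5,
                         width: float = 0.3, seed: int = 0) -> np.ndarray:
    """Replace NaN in each column with draws from a normal distribution centred
    down_shift SDs below that column's observed mean, with SD = width * SD."""
    rng = np.random.default_rng(seed)
    imputed = values.copy()
    for j in range(values.shape[1]):
        column = values[:, j]
        missing = np.isnan(column)
        observed = column[~missing]
        if missing.any() and observed.size > 1:
            mu, sd = observed.mean(), observed.std(ddof=1)
            imputed[missing, j] = rng.normal(mu - down_shift * sd, width * sd,
                                             missing.sum())
    return imputed
```

Pure-culture matrices would be subset with `quantified_in_all`, consortium and groundwater matrices with `quantified_in_any`, before imputation.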

RDase phylogenetic tree construction
To provide insight into the diversity of the RDase sequences present in the proteomes of Dhc strains 195, FL2, and BAV1, their phylogenetic relationships were evaluated with the software MEGA 7. 10 A total of 52 RDase sequences and two outgroup RDase sequences from Desulfitobacterium hafniense strain Y51 (Q8L172) and Dehalobacter restrictus DSM 9455 (AHF10441) were aligned with the MUSCLE algorithm. 11 All alignment positions containing gaps or missing data were eliminated, leaving a total of 56 amino acid positions in the final dataset. A phylogenetic tree was then constructed with the Maximum Likelihood method based on the JTT matrix-based model. 10,12 Initial trees for the heuristic search were obtained automatically by applying the Neighbor-Joining and BioNJ algorithms to a matrix of pairwise distances estimated using the JTT model, and then selecting the topology with the superior log-likelihood value. The relative confidence of phylogenetic groupings was estimated from 1000 bootstrap replicates of the dataset. 13 The tree was rooted with the outgroup RDase sequences from Desulfitobacterium hafniense strain Y51 and Dehalobacter restrictus DSM 9455.
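Complete deletion of gapped columns and bootstrap resampling of alignment columns can be sketched in plain Python as below. Maximum Likelihood inference under JTT (performed in MEGA 7) is not reproduced; `clade_support` assumes each bootstrap tree has already been reduced to a set of clades. The gap and missing-data symbols and all function names are assumptions.

```python
import random
from typing import Dict, FrozenSet, Iterator, List, Set

MISSING = set("-?")  # assumed gap / missing-data symbols


def complete_deletion(alignment: Dict[str, str]) -> Dict[str, str]:
    """Drop every column that holds a gap or missing-data symbol in any sequence."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must have equal length")
    n_cols = lengths.pop()
    keep = [i for i in range(n_cols)
            if all(seq[i] not in MISSING for seq in alignment.values())]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


def bootstrap_alignments(alignment: Dict[str, str], n: int = 1000,
                         seed: int = 0) -> Iterator[Dict[str, str]]:
    """Yield n pseudo-alignments built by resampling columns with replacement."""
    rng = random.Random(seed)
    n_cols = len(next(iter(alignment.values())))
    for _ in range(n):
        cols = [rng.randrange(n_cols) for _ in range(n_cols)]
        # The same resampled columns are applied to every sequence.
        yield {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def clade_support(replicate_clades: List[Set[FrozenSet[str]]],
                  clade: FrozenSet[str]) -> float:
    """Percentage of bootstrap trees (given as sets of clades) containing `clade`."""
    hits = sum(clade in clades for clades in replicate_clades)
    return 100.0 * hits / len(replicate_clades)
```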

Unipept tools and Protein BLAST searches
The Peptidome Clustering tool of the web application Unipept 3.2 14 was used to calculate pairwise similarity percentages between the in silico-generated peptidomes of six Dhc strain proteome databases (i.e., Dhc strains 195, FL2, VS, GT, BAV1, and CBDB1) and the peptidomes of representative bacterial isolates obtained from groundwater, aquifer, sediment, or soil (see SI for additional information). Peptidome similarity percentages were calculated with the minimum similarity method and then clustered with the UPGMA algorithm. To assess whether other protein records stored at UniProt or NCBI could produce the selected peptides before monitoring them in groundwater, their in silico specificity was evaluated using the Tryptic Peptide Analysis tool of Unipept 3.2 (equating isoleucine and leucine residues) and Protein BLAST searches against non-redundant protein sequences (replacing the N-terminus of each peptide with either a K or an R residue). Peptides were deemed Dhc-specific if neither in silico search found them in the proteins of any other bacterial species.

qPCR of groundwater samples
Sterivex 0.22 µm filter units were used to concentrate biomass from groundwater samples M17, M18, 97, 116, and 129, using volumes of 225 mL, 553 mL, 1000 mL, 1000 mL, and 1000 mL, respectively. DNA was isolated from the Sterivex cartridges using the standard protocol of the MoBio PowerLyzer PowerSoil Kit (MoBio, Carlsbad, CA). DNA concentrations were quantified with the Qubit dsDNA BR Assay (Life Technologies, Grand Island, NY) according to the manufacturer's instructions. DNA solutions were stored at −80 °C until qPCR measurements. qPCR analyses targeting the 16S rRNA genes of total bacteria, Dehalococcoides sp., Dehalobacter sp., and Dehalogenimonas sp. were conducted using a QuantStudio 12K Flex Real-Time PCR System.
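Converting Ct values into gene copies per mL of groundwater combines a standard curve with the dilution factor and the filtered volumes listed above. The sketch below assumes a log-linear standard curve (Ct = slope × log10 copies + intercept), averages the triplicate copy estimates, and introduces hypothetical elution and template volumes; it is a generic standard-curve calculation, not necessarily the method of reference 16.

```python
from statistics import mean
from typing import Sequence, Tuple


def fit_standard_curve(log10_copies: Sequence[float],
                       ct: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    x_bar, y_bar = mean(log10_copies), mean(ct)
    sxx = sum((x - x_bar) ** 2 for x in log10_copies)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(log10_copies, ct))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def amplification_efficiency(slope: float) -> float:
    """Efficiency from the standard-curve slope (1.0 means 100%)."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_per_ml(ct_replicates: Sequence[float], slope: float, intercept: float,
                  dilution_factor: float, elution_ul: float, template_ul: float,
                  filtered_ml: float) -> float:
    """Gene copies per mL of filtered groundwater.

    Copies per reaction (mean of replicate estimates) are scaled by the
    dilution factor, by the fraction of the DNA extract used as template
    (hypothetical elution and template volumes), and by the groundwater
    volume filtered through the Sterivex unit.
    """
    per_reaction = mean(10 ** ((ct - intercept) / slope) for ct in ct_replicates)
    return per_reaction * dilution_factor * (elution_ul / template_ul) / filtered_ml
```

For example, a 1:10 dilution of a 1000 mL sample would use `dilution_factor=10` and `filtered_ml=1000`; a slope near −3.32 corresponds to roughly 100% amplification efficiency.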
Primers and probes for each gene-targeted assay were designed using Primer Express version 3 and Geneious versions R6–R11. 15 Sample dilutions of 1:10, 1:100, and 1:1000 were prepared in nuclease-free water to test for the presence of interfering contaminants. The Dhc 16S rRNA gene assay was used to evaluate contaminant interference in each sample. Based on the qPCR results, the most diluted sample that showed no contaminant interference and gave the best fit within the template DNA standard curve was chosen for qPCR quantification. Gene copy numbers per mL were quantified in triplicate using the method described in 16.

Other peak groups (e.g., the one marked with a red arrowhead) were also observed in the XICs of strain FL2 but were determined to be interfering signals when the XICs of the same peptide in strain 195 were analyzed.

Supplementary Figure S5. Similarity matrix calculated with the Unique Peptide Finder application of Unipept 3.2 (https://unipept.ugent.be/) between the tryptic peptidomes of cultured strains of Dhc (including the three analyzed here) and other bacteria that have been isolated or identified in groundwater or aquifer materials. The E. coli proteome was also added as a distant comparison. Color scale: peptidome similarity (0% to 100%).
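The peptidome comparison and Dhc-specificity screen described under "Unipept tools and Protein BLAST searches" can be approximated as below. The sketch assumes a fully specific trypsin digest with no missed cleavages, peptides of 5 to 50 residues, isoleucine mapped to leucine, and a "minimum" similarity defined as shared peptides divided by the size of the smaller peptidome; Unipept's exact implementation may differ, and the function names are hypothetical.

```python
import re
from typing import Dict, Iterable, Set

TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")  # cleave after K/R unless next is P


def tryptic_peptides(sequence: str, min_len: int = 5, max_len: int = 50) -> Set[str]:
    """Fully specific in silico trypsin digest (no missed cleavages), with
    isoleucine mapped to leucine so that I and L are treated as equal."""
    seq = sequence.upper().replace("I", "L")
    return {p for p in TRYPSIN_SITE.split(seq) if min_len <= len(p) <= max_len}


def peptidome(proteins: Iterable[str]) -> Set[str]:
    """Union of the tryptic peptides of all proteins in a proteome."""
    peptides: Set[str] = set()
    for protein in proteins:
        peptides |= tryptic_peptides(protein)
    return peptides


def min_similarity(a: Set[str], b: Set[str]) -> float:
    """Shared peptides as a percentage of the smaller peptidome."""
    if not a or not b:
        return 0.0
    return 100.0 * len(a & b) / min(len(a), len(b))


def is_dhc_specific(peptide: str, other_peptidomes: Dict[str, Set[str]]) -> bool:
    """True if the peptide (I/L-equated) occurs in none of the non-Dhc peptidomes."""
    key = peptide.upper().replace("I", "L")
    return all(key not in peptides for peptides in other_peptidomes.values())
```

A UPGMA tree could then be built from distances of 100 minus similarity, for example with SciPy's `scipy.cluster.hierarchy.linkage(condensed_distances, method="average")`, where average linkage corresponds to UPGMA.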