Novel enzymes for biodegradation of polycyclic aromatic hydrocarbons identified by metagenomics and functional analysis in short-term soil microcosm experiments

Polycyclic aromatic hydrocarbons (PAHs) are highly toxic, carcinogenic substances. On soils contaminated with PAHs, crop cultivation, animal husbandry and even the survival of microflora in the soil are greatly perturbed, depending on the degree of contamination. Most microorganisms cannot tolerate PAH-contaminated soils, however, some microbial strains can adapt to these harsh conditions and survive on contaminated soils. Analysis of the metagenomes of contaminated environmental samples may lead to discovery of PAH-degrading enzymes suitable for green biotechnology methodologies ranging from biocatalysis to pollution control. In the present study, our goal was to apply a metagenomic data search to identify efficient novel enzymes in remediation of PAH-contaminated soils. The metagenomic hits were further analyzed using a set of bioinformatics tools to select protein sequences predicted to encode well-folded soluble enzymes. Three novel enzymes (two dioxygenases and one peroxidase) were cloned and used in soil remediation microcosms experiments. The experimental design of the present study aimed at evaluating the effectiveness of the novel enzymes on short-term PAH degradation in the soil microcosmos model. The novel enzymes were found to be efficient for degradation of naphthalene and phenanthrene. Adding the inorganic oxidant CaO2 further increased the degrading potential of the novel enzymes for anthracene and pyrene. We conclude that metagenome mining paired with bioinformatic predictions, structural modelling and functional assays constitutes a powerful approach towards novel enzymes for soil remediation.

risk management activities has been to develop and apply efficient methods to remediate PAH polluted sites.Amongst the risk reduction methodologies, biodegradation based treatment (bioremediation) is emerging as an efficient and environmental-friendly option which employs microorganisms or enzymes for risk mitigation [9][10][11] , whereas physical and chemical treatment methods are usually costly and highly chemical-or energy-intensive.
Numerous microorganisms (bacteria, fungi, algae) have the potential to transform or degrade PAH contaminants, among which, bacteria-and fungi-facilitated biodegradation has been studied most widely 2 .However, enzyme-based bioremediation technologies offer several advantages (i.e.greater specificity and higher mobility of the enzymes, non-dependency on expensive coenzymes or cofactors of enzymatic bioremediation) over the use of bacteria or fungi [12][13][14] .In addition, these technologies can operate efficiently over a broader range of environmental parameters such as pH, temperature and ionic strength 14 .
In the most common microbial degradation pathways, the initiating important steps are usually catalysed by redox enzymes, such as mono-and dioxygenases and peroxidases, as shown in Fig. 1.
Ring-hydroxylating dioxygenases catalysing the initial oxidation are widely known enzymes having significant role in the biodegradation of polycyclic aromatic hydrocarbons 15,16 .All gentisate dioxygenases (GDOs) characterized to date belong to the bicupin family 17 .Different peroxidase enzymes with PAH pollutants removal ability were also detected 14 .The application of enzymes for degradation of pollutants demands some requirements regarding the enzyme properties: increased stability, substrate specificity, and kinetic properties 14,18 .Meeting of these requirements is supported by novel engineering tools, such as metagenomics and metabolic engineering.The application of genomic information, gained from metagenomes of diverse environmental samples is a widely researched field today.Species, adapted to particular environmental conditions may contain enzymes, which are well-applicable in biotechnology, including chemical manufacturing and environmental pollution control.
Coal gasification plants produced lighting gas and coking coal in the second half of the XIXth and the first half of the XXth centuries.The industrial areas of those former factories are highly contaminated by the by-products of the manufacturing process, including polycyclic aromatic hydrocarbons (PAHs).E.g. one of the oldest coal gasification plants in North America was the Victoria Gas Company on the Vancouver Island, in the Rock Bay, British Columbia, Canada.The plant ceased production in 1952, but its former area is still contaminated by numerous pollutants.A large metagenomic study was performed in this area in 2005 by Carleton University, analyzing the soil samples from the gasification plant 19 .The short reads, identified from the DNA strands by Illumina HiSeq 2000 sequencing were deposited in the NCBI SRA archive by the accession numbers of SRX1419392 and SRX1419010.In our present study, we have selected these metagenomic sequence databases as starting databases.
In the present study, our major goal was to identify new PAH-decomposing enzymes from the above metagenomic databases, particularly focusing on the families of dioxygenases and peroxidases.We aimed to use artificial intelligence tools for distilling the hidden common properties of PAH-metabolizing enzymes, and based on these properties, we aimed to identify novel enzymes using the methodology described in Szalkai and Grolmusz 20 .
The method described in Szalkai and Grolmusz 20 applies a training, a search and an assembly phase.For the training phase, we have used well-known PAH-degrading enzymes from the families of dioxygenases, peroxygenases, and oxydoreductases.From the existing, already annotated enzymes we have prepared HMM-profiles by www.nature.com/scientificreports/applying multiple string alignments and hidden Markov models.This approach facilitates finding novel enzymes with a prescribed potential function, it is more flexible and less restrictive than searching by simple sequential similarity.The constructed profiles were used for searching short reads from the Rock Bay metagenome by the hmmsearch program 21,22 .The hits were collected from the metagenome, and an assembly program was applied for finding longer protein fragments by using the famous Megahit program 23,24 .The best constructs were chosen for further processing.
The functional characterization and evaluation of PAH biodegradation in soil are much more complicated than in liquid cultures because of the heterogeneity and complexity of soil.Thus, many biotic and abiotic factors should be considered when studying soil bioremediation.Numerous studies demonstrated that the biodegradation rate of organic pollutants in soils depends on the environmental parameters, the soil characteristics, the type of microorganism and enzymes, as well as the nature and chemical structure of the PAH compound to be removed.Manipulation and optimization of these factors and the addition of biosurfactants 25,26 , nanoparticles 27,28 and peroxides 29,30 .may increase biodegradation efficiency.Therefore, to design and develop a bioremediation technology, a number of factors are to be taken into account.
For several representatives of the newly identified enzymes, we have also performed functional analysis.The objectives of the functionality study were: (1) to assess the efficiency of the novel enzymes for degradation of PAH pollutants in short-term soil microcosms, (2) to investigate the effect of calcium peroxide (CaO 2 ) as an additive in order to decide whether it can enhance the degradation, (3) to evaluate the microbial activity after bioremediation in soil.Our long-term goal is to create an enzyme formulation that can be successfully used to remediate soils contaminated with polycyclic aromatic hydrocarbons (PAHs).Enzyme activity assays with a purified enzyme on a given substrate under artificial conditions (controlled pH, temperature, composition of an assay buffer) would not provide satisfying answer to the question whether the enzyme could function properly under the conditions where we would like to apply it in the future.For this reason, we have chosen to perform microcosm experiments, where the experimental circumstances more closely mimic the real conditions.Our data indicate that these novel enzymes can efficiently degrade most of the PAH pollutants in soil microcosms, especially the low molecular weight PAHs (LMW PAHs) (e.g.naphthalene, anthracene, phenanthrene), and even some high molecular weight PAH (HMW PAH) contaminants (e.g.pyrene) and a cost-effective inorganic oxidant may increase the efficiency of the degradation.Our study presents a pipeline as a useful resource for identifying potent novel enzymes for PAH biodegradation in soils.

Metagenomic analysis
In the metagenomic sequence search, we needed to solve the following problem: based on the already identified PAH-decomposing enzymes, new, still unknown, possibly better enzymes needed to be found.If we had filtered according to sequence homology with known PAH-decomposing enzymes, then we would have identified from the metagenome only slightly different "new" enzymes, relative to the already known ones.Consequently, we have chosen a more subtle, artificial intelligence-based method, as follows: (i) First, we identified six potentially PAH-degrading enzyme groups, from database-and literature search.
The members of these classes are listed in Supplementary Tables S1-S6.(ii) Next, multiple alignments were performed for each of the six classes separately, using the Clustal Omega software 23,31,32 with default parameters.(iii) Next, we used the hmmbuild tool 21,22 for constructing six hidden Markov models (HMM), one for each class, using the Clustal-aligned structures from step (ii).These models made possible the search for unknown sequences, having the common properties of the proteins in the starting classes.(iv) In the following step, we used the hmmsearch program 21,22 and applied our HMM models created in (iii) on the metagenomic short reads databases deposited in the NCBI SRA archive by the accession numbers of SRX1419392 and SRX1419010 (representing DNA sequences from soil samples of a coal gasification plant 19 ).We identified short reads in these databases with the highest similarity to our HMM profiles.The selected short reads were listed according to E-values increasing; we then applied a cut-off value of 0.01, that is, only those short reads were returned, which had an E-value of 0.01 or less.(v) The identified short reads were assembled into the longest possible sequences by using the Megahit program 23,24 .Only those results were retained which had both start-and stop codons.(vi) The identified protein sequences were compared by the NCBI BLAST program against the RefSeq nonredundant protein sequences.Sequences with very high similarity with the known proteins were discarded as not novel.
The description of the hmmbuild and hmmsearch components of the HMMER3 suite is at the site http:// hmmer.org/ docum entat ion.html.
The amino acid sequences of the hits retained after step (vi) are listed in Supplementary Table S7.

Bioinformatic analysis of hit sequences
In order to identify potentially well folded enzymes, the hit sequences were analyzed by several bioinformatic predictors, such as PROSO II 33 , Protein-sol 34 , ESPRESSO 35 , MEMEX 36 , CRYSTALP2 37 , IUPRED 38 .Hit protein sequences were classified according to solubility, expressability, crystallizability and flexibility.We analyzed the metagenomic hit sequences in sequence alignments and with bioinformatic prediction methods.We employed a set of four criteria related to sequence novelty, crystallizability and potential for expression in E. coli.These criteria are further detailed in the Results and Discussion section.The sequences were further analyzed by multiple sequence alignment with ClustalW.The presence of the amino acids which are essential in catalytic activity was important during selection of hits for further detailed analysis.Three-dimensional models of several hit proteins were built with SWISS-MODEL server 39 .

Cloning, expression and purification
For protein expression the relevant protein coding genes were obtained by gene synthesis, sequences were ordered from GenScript, cloned into plasmid pET15b.This plasmid encodes an amino-terminally His-tagged variant of the enzymes.Optimisation experiments with respect to the bacterial strains and induction conditions were carried out and the conditions described below were used.The conditions used for protein production were also selected on the basis of preliminary experiments.E. coli Rosetta cells with the recombinant pET15b were grown at 37 °C overnight in 5 ml LB-media with chloramphenicol (34 μg/ml) and carbenicillin/ampicillin (50 μg/ml) then transferred into 500 ml fresh LB media without antibiotics.The cells were grown to an OD600 nm of 0.8 and the expression of the proteins was induced by adding isopropyl-1-thio-b-D-galactopyrano-side (IPTG) (0.7 mM).The protein expression took place at 25 °C.The cells were harvested about 5 h after IPTG addition by centrifugation (4000 rpm, 20 min, 4 °C).
For purification of enzyme proteins, E. coli cells were resuspended into lysis buffer (Tris/HCl (50 mM, pH 8.0), 300 mM NaCl) and disrupted by using sonication.Intact cells and cell fragments were removed by centrifugation (11,000 rpm, 30 min, 4 °C).The enzymes were purified from the crude extracts by using nickel affinity chromatography.We have followed the purification of enzymes, obtained by using nickel affinity chromatography, by SDS-PAGE electrophoretic gels.The proteins were eluted by using buffers containing 50 mM HEPES (pH 7.5), 300 mM KCl, 5 mM β-mercaptoethanol and 250 mM imidazole.The solutions were dialyzed into 140 mM NaCl, 30 mM Tris/HCl (pH 7.5) containing buffer.Aliquots were stored at -80 °C in this buffer (140 mM NaCl, 30 mM Tris/HCl (pH 7.5) and were used for the experiments.

Soil microcosms to study the functionality and the efficiency of the enzymes
Laboratory soil microcosms were set up to study the functionality and efficiency of the nominated enzymes for the biodegradation of PAH compounds in soil 40 .The 7-day long experiments were conducted using 24 glass reactors of 500 ml volume.Each reactor contained 200 g of the contaminated matrix.
The contaminated matrix was an aleuritic sand (obtained from Elgoscar Environmental technology Plc., Hungary), artificially spiked with naphthalene, phenanthrene, anthracene and pyrene dissolved in gasoil.The spiked gasoil contamination of the soil was 8000 mg/kg.After spiking, the concentrations of the PAH compounds in the soil were determined by gas chromatography coupled with mass spectrometry (GC MS) using Agilent 6890 5973 N Autosampler System according to the 21470-84:2002 standard.These concentrations were 12.5 mg/kg, 25.9 mg/kg, 46.2 and 52.0 mg/kg (dry weight) for naphthalene, phenanthrene, anthracene and pyrene, respectively.
In the experiments, 1 mg purified enzyme in 5 ml buffer solution as enzyme-based inoculant was added into 200 g contaminated soil.Nitrogen and phosphorous, as nutrient supply, were not applied in the soil microcosms.20 mmol CaO 2 was added to 12 microcosms in order to reduce bacterial activity and enhance degradation of the PAH compounds.Each treatment was investigated both in the presence and absence of calcium peroxide.CaO 2 (CAS: 78403-22-2) was purchased from Sigma-Aldrich Merck (Product No: 466271).
The experimental set-up also contained an untreated control series without the enzyme-based inoculant and calcium peroxide treatment.20 mmol CaO 2 treatment was also tested without enzymes.Each treatment was run in triplicate.Table 1 shows the applied experimental treatments.

Methods applied to study the efficiency of the enzymes
To determine the concentration of the contaminants in the soil as well the biological activity, samples have been collected from each microcosm after the completion of the experiments.
The concentration of PAH compounds during the experiments was determined by gas chromatography mass spectrometry (GC MS).www.nature.com/scientificreports/For microbial activity measurement, the Biolog EcoPlate™ (Biolog Inc., Hayward, CA, USA) was applied.The Biolog plate is a microtiter plate including 96 wells, which contain 31 various carbon source substrates in triplicate.These substrates are the most useful carbon sources for the physiological profiling of heterotrophic soil bacteria communities.The consumption of a certain substrate by a microbial population group results in a characteristic response pattern, thus various metabolic patterns are developed.The measurement was carried out as described by Feigl et al. 41 with slight modifications.5 g soil from the reactors was suspended in 45 ml 0.85% sterile NaCl solution and it was shaken at 22 °C for 30 min at 150 rpm.After the samples were settled for 10 min, then 1 ml supernatant was diluted in 9 ml 0.85% sterile NaCl solution.125 μl of this suspension was transferred into each well of the EcoPlate, and it was incubated at 25 °C in the dark.The absorbance of the wells was measured every 24 h for 120 h at 490 nm wavelength with DIALAB EL800 Microplate Reader (Dialab GmbH, Wiener Neudorf, Austria).Average Well Colour Development (AWCD) was calculated for all carbon sources with the following equation: where, C is the absorbance value of the substrate-containing well, R is the absorbance of the control well, n is the substrate number.The calculated AWCD represents a quantitative measure of the general potential metabolic activity indicator of the microbial community.

Statistical analysis
The statistical evaluation of the datasets was carried out with TIBCO Statistica™ 13.5.Software.One-way analysis of variance (ANOVA) was performed to investigate whether the treatments (enzymes, calcium peroxide or their combined application) had any effect on the examined parameter such as concentrations of PAHs and microbial activity; all p values less than 0.05 were considered statistically significant.To compare the treatments Fisher's least significant difference (LSD) test was carried out.Letters in alphabetical order are used to mark the significat effects on figures, where "a" represents the smallest average value of the data sets in all separate comparative experiments.Columns assigned with the same letter indicate that there was no significant difference between them.

Metagenomic hits from the dioxygenase and catalase/peroxidase family of enzymes
Table 2 represents a list of hits from both the dioxygenases and catalase-peroxidase families.
For the selection of hits that will be used in further detailed experimental work, we set a number of criteria.Four criteria were considered, two of these were based on sequence analysis and two other criteria were based on bioinformatic predictions.First, only full-length sequences were selected, in which the representative domains and motifs were all present as compared to homologous sequences from the NCBI database.For this criterion, we also required that the metagenomic DNA sequence contain the STOP codon (shown as "end sign" in Table 2).Second, in order to identify enzymes with potential novelty in their amino acid sequence, we selected hit sequences that show considerable amino acid variations as compared to enzyme sequences already present in the NCBI database.In this criterion, we considered only metagenomic sequences showing less than 92% of sequence identity (match) with sequences already present in databases.The hits fulfilling these two criteria are all included in Table 2 (for hit sequences see Supplementary Table S7).The third criterion related to the crystallizability potential.Crystallizability potential was included in the criteria since it usually indicates a well-folded protein and may lead to high resolution structural insights in further studies.In this respect, we used the CrystalP2 software and removed those hits that were predicted to be non-crystallizable from the further investigations.For the fourth criterion, we required that the proteins could be expressed in E. coli, as based on bioinformatic expression prediction methods.
Both the ESPRESSO and MEMEX servers, which estimates protein expression predicted that all of our selected protein can be expressed in E. coli.Disorder predictions for the metagenomic hit sequences showed only short segments with appreciable disorder, potentially constituting flexible linker regions (see Supplementary Figure S1).We have identified PAH1_99, PAH1_105 and PAH6_39 as sequences fulfilling the four criteria.To assess the evolutionary relationships of these three proteins, we have constructed evolutionary trees using the NCBI blastp server (https:// blast.ncbi.nlm.nih.gov/ Blast.cgi). Figure 2 shows these results.PAH1_99 and PAH6_39 show up as distinct branches, well separated from other parts in the evolutionary tree, arguing for some character of novelty.PAH1_105 shows closer relationship with already known sequences.In further details, we focused on sequence and structural alignments to select potential novel enzymes with preserved catalytic sites as compared to previously described representatives of the dioxygenase and catalase/peroxidase enzyme families.

Structural analysis of the dioxygenase metagenomic hits
Figure 3A shows the alignment of the six full-length sequence hits from the dioxygenase enzyme family.Among these six sequences, only the sequences denoted as PAH1_99, PAH1_102 and PAH1_105 were predicted to be crystallizable.A close inspection of this alignment revealed an intriguing variation at a specific sequence position (highlighted in yellow on Fig. 3A).At this position, the protein possesses either glycine or alanine amino acid and according to the literature this variation has a significant effect on the substrate selectivity of the dioxygenase 42 .Ferraroni et al. created this mutation in the enzyme Pseudaminobacter salicylatoxidans salicylate 1,2-dioxygenase, which also belongs to the dioxygenase enzyme family.They found that if glycine is present at this position, then the enzyme can bind several substrates in the active site, while dioxygenases with alanine in this position oxidize only gentisate 42 .www.nature.com/scientificreports/Our aim was to investigate one representative from both the alanine and the glycine containing variations.PAH1_105 is the single hit with a glycine residue in this critical position that was also predicted to be crystallizable, hence this sequence was selected for further characterizations.From among the two crystallizable hits with an alanine in the critical position, PAH1_99 and PAH1_102, we have selected PAH1_99 which shows more variability as seen in the sequence identity (match) parameter (cf Table 2) as compared to the sequences present in the database.
Next, we created a three-dimensional structural model of the two selected dioxygenases using the SWISS-MODEL server.In both cases the template identified in the SWISS-MODEL server was the same, namely Pseudaminobacter salicylatoxidans salicylate 1,2-dioxygenase (PDB ID: 3NW4).Global Model Quality Estimation (GQME) scores were 0.83 and 0.72 in case of PAH1_99 and PAH1_105, while the QMEAN scores were − 1.27 and − 1.47.Figure 3B shows the alignment for the two metagenomic hits and their three-dimensional template sequences: most residues strictly conserved in dioxygenase sequences based on the literature 42 are also conserved in the two novel metagenomic hits (cf red background).www.nature.com/scientificreports/ The three-dimensional fold predicted with the SWISS-MODEL server is shown in Fig. 4A (homotetramers and one subunit, respectively, PAH1_105: orange, PAH1_99: greencyan, template: yellow), while Fig. 4B presents a close up from the active site, indicating several differences in amino acid positions of the two metagenomic hits.These enzymes have a homo-tetrameric structure containing a catalytic Fe(II) ion coordinated by three histidine residues in the N-terminal region 43 .The predicted folds of the two metagenomic hits are identical with minor differences observed at one loop position (a flexible loop in the template sequence-shown on Figs. 3 and 4B).
The wild-type Pseudaminobacter salicylatoxidans salicylate 1,2-dioxygenase protein sequence contains a Gly amino acid in the position 106.It was found that protein with G106A mutation oxidized only gentisate, while 1-hydroxy-2-naphthoate and salicylate were not converted.The amino acid residue Gly106 is located inside the enzyme active site cavity but does not directly interact with the substrates.In the case of the G106A mutation, based on the crystal structure of the complex, a different binding mode was observed for salicylate compared to the wild-type enzyme.The salicylate in the G106A variant coordinated to the catalytically active Fe(II) ion in an unusual and unproductive manner, since salicylate is unable to displace the hydrogen bond formed between Trp104 and Asp174 in the G106A variant.Presumably, such inefficient substrate binding may generally limit the substrate spectrum of wild type GDOs 42 .

Structural analysis of the catalase-peroxidase metagenomic hits
In case of catalase-peroxidases all of the metagenomic hits were predicted as crystallizable (see Table 2), so we have selected PAH6_39 which shows more variability as compared to the sequences present in the database.Figure 5A shows the alignment of the three full-length sequence hits from the catalase-peroxidase enzyme family.A three-dimensional structural model of the selected catalase-peroxidase was also built using the SWISS-MODEL server.The template identified in the SWISS-MODEL server was the Synechococcus elongatus catalase-peroxidase KatG (PDB ID: 3WNU).Global Model Quality Estimation (GQME) score was 0.91 and the QMEAN score was -1.32. Figure 5B shows the alignment for the selected catalase-peroxidase and their three-dimensional template sequences: most residues strictly conserved in catalase-peroxidase sequences according to the literature 44 are also conserved in the novel metagenomic hits (cf red background).
The three-dimensional fold predicted with the SWISS-MODEL server is shown in Fig. 6 (homodimers and one subunit, respectively, PAH6_39: blue, template: yellow).The predicted fold of the metagenomic hit is identical to the template with minor differences observed at one loop position (a flexible loop in the template sequenceshow on Figs. 5 and 6B).
Based on the bioinformatic analysis and the structural modelling, we have expressed and purified two dioxygenases (PAH1_99 and PAH1_105) and one catalase-peroxidase (PAH6_39).Following the experimental protocol details in the Methods section, we have obtained the following yields from 0.5 L medium: 59 mg, 42 mg and 49 mg, for the enzymes PAH1_99, PAH1_105 and PAH6_39 respectively.These enzyme preparations were used in the further experiments.

Performance of the novel enzymes in PAH degradation
In order to test the functionality of the novel enzymes in soil samples, we set up a series of microcosm experiments.Using soil samples spiked with known amounts of different PAH contaminants (12.5 mg/kg, 25.9 mg/ kg, 46.2 and 52.0 mg/kg for naphthalene, phenanthrene, anthracene and pyrene, respectively, cf.Methods), the effect of the enzyme addition on the concentration of contaminants was determined following an incubation period of 7 days.
As shown on Fig. 7A, even in the absence of the added enzymes a considerable degradation of PAH compounds was observed in all of the soil microcosm experiments, suggesting the potential of an inherent microbial activity in the soils.Addition of the novel enzymes (PAH1_99, PAH1_105 and PAH6_39), predicted from metagenome searches and sequence alignments, led to further significant increase in PAH degradation only for naphthalene and phenanthrene.Anthracene and pyrene degradation was not increased by the novel enzymes in this setup.
The suggestion for the inherent microbial activity on the soils was reinforced by the Biolog EcoPlate™ test (cf.Figure 7B) where the various metabolic patterns were estimated with the AWCD values 45 .Some significant increase in the AWCD values upon addition of the novel enzyme argue for the positive effect of the enzyme proteins on the microbes in the soil samples.
In these experiments, we could establish significant remediation activity of the novel enzymes for some, but not all PAHs tested (see also Table 3 for pollutant removal efficiency data in %, calculated from the results in Fig. 7A).It was also observed that enzyme protein addition increased the inherent microbial activity.Hence, it was not straightforward to decide whether the positive effect of the novel enzymes is due to their specific catalytic activities or to their potential to stimulate the inherent microbial activity as carbon and nitrogen sources.
Towards further insights, we set up another set of experiments to investigate if combining enzyme addition with a simple inorganic oxidant molecule may increase the beneficial effects on PAH degradation.We selected CaO 2 for these experiments since this compound compares preferably to other oxidizing agents in terms of stability and cost efficiency.
Despite the fact that according to other studies 46,47 calcium peroxide proved to be extremely effective in removing petroleum hydrocarbons and polycyclic aromatic hydrocarbons from the soil, the results of our microcosm experiments displayed otherwise in some cases.Figure 8A shows that addition of CaO 2 leads to preservation of the original spiked concentrations of the PAH pollutants.This effect is most probably due to destroying the soil-inherent microbial flora as the metabolic activity observed in the non-treated soil is fully erased upon addition of CaO 2 (Fig. 8B).
However, a combination of the inorganic oxidant CaO 2 with the novel enzymes displayed increased successful degradation of the more resistant PAHs anthracene and pyrene compounds tested in our microcosm experiments (Fig. 8A).Anthracene concentrations drop by approx.57-70%, while pyrene concentrations are decreased by approx.56-66% upon the combined treatment of the soils with CaO 2 and the novel enzymes (see also Table 3 for pollutant removal efficiency data in %, calculated from the results in Fig. 8).This result also strongly suggests that the novel enzymes can provide beneficial PAH degradation in the absence of a soil-inherent microbial activity.In agreement with these findings, 48 reported similar results showing that sterilization upon peroxide addition increased the degradation of pyrene because of the removal of competition from indigenous microbes.
The degradation efficiency for four PAH pollutants (naphthalene, phenanthrene, anthracene, and pyrene) was calculated comparing to the analytically determined initial concentration values (Table 3).The results show that the microflora of the control group properly adapted, and the decreased PAH concentrations could be attributed to the biodegradation by indigenous microbes.Moreover, our results demonstrated that the degradation efficiency generally decreased with the increasing number of aromatic rings.Applied enzymes enhanced the degradation efficiency, whilst adding CaO 2 decreased the efficiency at PAHs associated with lower molecular masses.Remarkably, at naphthalene, the degradation efficiency was around 95% upon enzyme treatments without peroxide, while at the combined application of enzymes and CaO 2 , it reduced to 73-83%.The degradation www.nature.com/scientificreports/efficiency had a similar pattern in the case of phenanthrene, where it was 68-71% with the addition of enzymes and 41-53% with enzymes plus CaO 2 application.On the other hand, in the case of anthracene, CaO 2 had a positive effect since the 44-46% degradation efficiency with enzymes was further increased to 57-70% at the combined treatment.The highest favourable outcome of CaO 2 was detected at the pyrene removal, where the degradation efficiency was 56-65% with enzymes and CaO 2 , whilst the enzymes alone degraded only 12-17% of this pollutant.
The degradation efficiency generally negatively correlated with the number of aromatic rings of the pollutants, so the degradation efficiency was the highest for naphthalene and the lowest for pyrene without the use of peroxide.This observation is in line with the expectation that more complex, condensed aromatic ring structures are more resistant to degradation.The combined administration of peroxide and enzymes was the most beneficial for highly stable pyrene containing four fused aromatic rings.Among these treatments, the PAH1_105 1,2-dioxygenase enzyme exhibited the effectiveness in the presence of CaO 2 .
Similarly to our research, some studies showed that the chemical oxidations by hydrogen peroxide represent promising pretreatment before the bioremediation for the removal of hydrophobic contaminants [49][50][51] ; however, these treatments have not been used in enzymatic bioremediation.Liao et al. (2019) demonstrated that chemical oxidation is a great pretreatment option coupled with bioremediation to remove PAHs from soil 49 .They found that applying potassium permanganate enhanced the removal efficiency among other oxidants (activated persulfate, Fenton, and modified Fenton).On the other hand, permanganate decreased the microbial diversity and delayed the population recovery, whilst the Fenton treatment applied hydrogen peroxide, which only had a slight impact on indigenous microbial diversity.
As several studies reported [52][53][54] , applying specific enzymes in bioremediation offers some advantages compared to the use of microbial cells, including enhanced specificity, simplified handling and storage, higher   mobility, and activity even in the presence of high concentrations of toxic contaminants.Furthermore, the enzymes do not require nutrient supplementation and may be applicable under extreme environmental conditions.Although the enzyme-based bioremediation methods seem advantageous clean-up technologies, some limitations of these technologies have been demonstrated [54][55][56] .Free enzymes are generally unstable, may be degraded and consequently exist for only a short time in a soil environment; the individually applied free enzymes cannot completely degrade the contaminants.
Although, in many cases, enzymes can usually carry out only the first transformation steps, these first biotransformation steps are usually the limiting factors for further biodegradation 57 .Therefore, the direct application of enzymes in contaminated environmental matrices, particularly in soils contaminated with highly degradable PAH compounds, is a feasible option, as our research illustrates.
The Biolog EcoPlates™ allows for detecting metabolic activities and physiological diversity of microbial communities in the environment 41,58 .Our study also demonstrated the applicability of the Biolog EcoPlate™ as a  www.nature.com/scientificreports/reliable tool for evaluating the activity of PAH-contaminated and remediated soils' microbial community and a necessary method to complement the chemical analysis of the contaminants was well demonstrated.
In conclusion, the applied novel enzymes effectively degraded the contaminants; the used CaO 2 slightly reduced the degradation rate in the case of naphthalene and phenanthrene while enhancing the removal of anthracene and pyrene.The novel enzyme-mediated bioremediation can be a feasible and efficient option in nutrient-poor contaminated soils with low biological activity.

Conclusions
Here, we demonstrated that an artificial intelligence-based method for identifying new PAH-degrading enzymes from still undescribed microorganisms was successful in in vitro studies.The Hidden Markov Model, prepared from known PAH degrading enzymes efficiently identified the sequence signature patterns in still unknown enzymes with PAH-degrading functionality.Cloning and expression of the novel enzymes was followed by functional microcosms experiments to check the soil remediation capability of the newly identified enzymes.We found the highest pyrene removal efficiency for the PAH1_105 enzyme (59%) whose effect was significantly different from PAH1_99 and PAH6_39.This result is in agreement with the prediction from the sequence and structural alignments (Figs. 2 and 3) since PAH1_105 is associated with a point mutation involved in substrate selectivity.
The combined application of oxidizing agent, such as calcium peroxide, with enzymes, was significantly advantageous in the case of poorly degradable pyrene and anthracene.Interestingly, CaO 2 on its own was not efficient due to its strong detrimental effect on soil-inherent microbial activity.However, enzymatic degradation and removal of the PAH components was significantly increased in combination with peroxide treatment.

Figure 2 .
Figure 2. Phylogenetic trees of the newly identified proteins (highlighted by yellow shade).Protein sequence alignments and phylogenetic trees were generated using NCBI blastp (https:// blast.ncbi.nlm.nih.gov/ Blast.cgi).Bar at the bottom of the figure provides a scale the amount of change (number of changes or 'substitutions' divided by the length of the sequence).

Figure 3 .
Figure 3. Dioxygenase sequence alignments.Panel A Amino acid sequences of the six dioxygenases which were found during metagenomic search.The amino acids important in substrate specificity were highlighted by yellow.Glycine in this position allows several substrates binding in the active site, while dioxygenases with alanine in this position oxidize only gentisate 42 .Panel B Amino acid sequence alignment of the two selected dioxygenases (PAH1_99 and PAH1_105) and Pseudaminobacter salicylatoxidans salicylate 1,2-dioxygenase.Highly conserved amino acids based on literature 42 are shown in red.Yellow color indicates the G/A point mutation which influences the enzyme substrate specificity.The flexible (loop) regions of the template were highlighted by purple, and the main differences in the three-dimensional structure between the template and the model were highlighted by ocher.

Figure 4 .
Figure 4. Three-dimensional models of the two dioxygenases (PAH1_99 and PAH1_105) built based on Pseudaminobacter salicylatoxidans salicylate 1,2-dioxygenase (PDB ID: 3NW4).Panel A The merged homotetrameric structures of the template (yellow), and the models of PAH1_99 (greencyan) and PAH1_105 (orange).Panel B A close up of a monomer with the active site.Red arrows point on the main differences in the three-dimensional structure between the template and the models (the sequences are highlighted by ocher on Fig. 2B).The flexible (loop) regions of the template were colored by purple.Panel C Amino acids which interact with the substrate gentisate based on Pseudaminobacter salicylatoxidans salicylate 1,2-dioxygenase three-dimensional structure (PDB ID: 3NW4).Colours: PAH1_99 (greencyan) and PAH1_105 (orange) with the substrate gentisate (2,5-dihydroxybenzoic acid) (magenta).The Fe(II) ion is indicated by a gray sphere.Polar contacts were labelled by black dashed lines.(First amino acid belongs to PAH1_99, the second to PAH1_105).

Figure 5 .
Figure 5. Catalase-peroxidase sequence alignments.Panel A Multiple sequence alignment of the three catalaseperoxydase amino acid sequences which were found during metagenomic search.Panel B Amino acid sequence alignment of the selected catalase-peroxydase PAH6_39 and Synechococcus elongates catalase-peroxidase.Highly conserved amino acids based on literature 44 labelled by red.The residues of the catalytic Met-Tyr-Trp adduct are highlighted in yellow.The flexible (loop) regions of the template were highlighted by purple, and the main differences in the three-dimensional structure between the template the model were highlighted by ocher.

Figure 6 .
Figure 6.Three-dimensional models of the catalase-peroxidase PAH6_39 built based on Synechococcus elongatus catalase-peroxidase (PDB ID: 3WNU).Panel A The merged homodimeric structure of the template and the model, where the PAH6_39 was labelled by blue and the Synechococcus elongatus catalase-peroxidase was labelled by yellow.Haems in the active sites shown in orange.Panel B A close-up of the monomers.The color scheme is same as in case of the homodimer.Na ions were labelled by gray spheres.The flexible (loop) regions of the template were colored by purple.Red arrows point on the main differences in the threedimensional structure between the template and the models (the corresponding sequences are highlighted in ocher yellow on Fig.5B).

Figure 7 .
Figure 7. PAH degradation (Panel A) and microbiological activity (Panel B) in the soils treated with the novel enzymes.Data represent averages of three replicates and error bars are standard deviations.Red lines show the initial concentrations of the specific PAH compounds.Letters on the columns indicate significant differences (p < 0.05), data in columns with different letters are statistically different, data in columns with the same letters are not statistically different.

Figure 8 .
Figure 8. PAH degradation (Panel A) and microbiological activity (Panel B) in the soils treated with CaO 2 and/ or enzymes.Data represent averages of three replicates and error bars are standard deviations.Red lines show the initial concentrations of the specific PAH compounds.Letters on the columns indicate significant differences (p < 0.05).
Open access funding provided by Budapest University of Technology and Economics.KT, BV, and VG were partially supported by the Ministry of Innovation and Technology of Hungary from the National Research, Development and Innovation Fund, financed under the ELTE TKP 2021-NKTA-62 funding scheme.KT, BV, VG and BGV were partially funded by the National Research, Development and Innovation Fund under the contract

Table 1 .
Treatment conditions of the microcosm experiments.

Table 2 .
General properties and bioinformatic predictions of metagenomic hit enzymes.

Table 3 .
The average pollutant degradation efficiency of the different treatments compared to the initial pollutant concentrations determined by GC MS.