Identification of an unauthorized genetically modified bacteria in food enzyme through whole-genome sequencing

Recently, the unexpected presence of a viable unauthorized genetically modified bacterium in a commercialized food enzyme (protease) product originating from a microbial fermentation process has been notified at the European level (RASFF 2019.3332). This finding was made possible thanks to the use of the next-generation sequencing technology, as reported in this study. Whole-genome sequencing was used to characterize the genetic modification comprising a sequence from the pUB110 shuttle vector (GenBank: M19465.1), harbouring antimicrobial resistance genes conferring a resistance to kanamycine, neomycin and bleomycin, flanked on each side by a sequence coding for a protease (GenBank: WP_032874795.1). In addition, based on these data, two real-time PCR methods, that can be used by enforcement laboratories, specific to this unauthorized genetically modified bacterium were developed and validated. The present study emphasizes the key role that whole-genome sequencing can take for detection of unknown and unauthorized genetically modified microorganisms in commercialized microbial fermentation products intended for the food and feed chain. Moreover, current issues encountered by the Competent Authorities and enforcement laboratories with such unexpected contaminations and the importance of performing official controls were highlighted.


Results and Discussion
A food enzyme (protease) product commercialized on the EU market, used in this study, was recently suspected to be contaminated by unauthorized GMM. On the one hand, starting from DNA extracted from the food enzyme matrix, the presence of bacterial DNA belonging to the Bacillus genus was demonstrated by PCR amplification and Sanger sequencing of the bacterial 16 S rRNA gene region 8 (Fig. 1 step 1). In addition, the potential GM nature of this contamination was suspected based on a positive real-time PCR signal for the aminoglycoside adenyltransferase (aadD) gene conferring a resistance to both kanamycin (KanR) and neomycine (NeoR) (GenBank: M19465.1) ( Fig. 1 step 1) 10 . The presence of the full-length size of this aadD gene was subsequently confirmed through nested-PCR combined with Sanger sequencing, highlighting potential health and environmental risks associated to this food enzyme product in light of AMR acquisition concerns 9,10 . On the other hand, importantly, a viable bacterial strain was isolated from the tested food enzyme product 8 . This bacterial strain was subjected to the same analysis as above applied on DNA extracted from the food enzyme matrix. Similarly to the Figure 1. Schematic representation of the workflow, composed of three main successive steps, applied on the tested food enzyme product, allowing to identify unauthorized GMM by whole-genome sequencing (WGS) and subsequently to develop event-specific real-time PCR methods. (1) DNA extracted from the FE preparation was tested for the presence of bacterial DNA as well as the presence of AMR genes frequently harboured by GMM used to produce FE. (2) Living microbial strains, earlier isolated from the FE preparation, were tested for the presence of bacterial DNA and subsequent determination of their genus/species as well as for the presence of AMR genes frequently harboured by GMM used to produce FE. (3) The bacterial strains identified in (2) as carrying AMR genes was characterized by a WGS strategy using a de novo assembly analysis in order to demonstrate the presence of a viable unauthorized GMM in the tested FE preparation. With the generated sequences, real-time PCR methods specific to this GMM were developed to be used by enforcement laboratories.
results observed using DNA extracted from the food enzyme matrix, this bacterial strain was shown to belong to the Bacillus genus 8 as well as to carry the aadD gene ( Fig. 1 step 2, Table 1, Supplementary file 1). The presence of a viable GM Bacillus strain in the tested sample was consequently strongly suspected, but, these results were insufficient to undoubtedly prove the presence of this GMM and characterize its associated genetic modifications. GMM identification using WGS. Using an Illumina MiSeq system (250 bp paired-end reads), WGS applied on DNA from the bacterial strain isolated from the food enzyme product ( Fig. 1 step 3, Supplementary file 2) generated 714,637 paired-end raw reads. Following read trimming, 589,817 high-quality reads (average Phred score of 37) were retained to perform a de novo assembly, allowing to generate 430 contiguous sequences (contigs) of which 47 were longer than 1,000 bases with a k-mer coverage of at least 10x. Contig sizes ranged from 56 bp to 457,195 bp, with an N50 value of 291,658.
On the one hand, the generated contigs presented a correspondence to the Bacillus genus, and, more precisely, surprisingly to the B. velezensis species (RefSeq: NZ_CP011937.1) instead of the expected B. subtilis species that was labelled as being the producer organism of the commercialized neutral protease. This identification was based on three observations. Firstly, when using the assembly for typing against the B. subtilis MLST schema hosted by the PubMLST.org web-based platform, a perfect match to sequence type 140 was obtained for which only a single isolate was present in the database (PubMLST: ATCC 12321) annotated as the species B. velezensis. Secondly, a k-mer based classification of sequencing reads against an in-house dump of all complete genomes in the RefSeq Microbial Genomes database indicated the presence of B. velezensis (Supplementary file 3 4). B. velezensis species is not listed by EFSA (2018) as being used in the food and feed industry to produce food and feed additives, enzymes and flavourings intended for the EU market 30 . However, this species, for which the wild-type is harmless for human and closely related to B. amyloliquefaciens and B. subtilis, has previously been described as highly valuable for producing enzymes, including proteases, for the agro-industrial sector [31][32][33][34][35][36] .
On the other hand, the generated contigs were blasted against the aadD gene, conferring KanR and NeoR, that was earlier detected by real-time PCR as well as nested-PCR followed by Sanger sequencing analysis 10 (Table 1, Supplementary file 1). A contig of 349,285 bp with a k-mer coverage of 59.434x was identified as harbouring the targeted AMR gene (Fig. 2, Supplementary file 5). In order to identify the putative transgenic insertion, the regions flanking this AMR gene were then characterized and compared to the reference genome of B. velezensis (RefSeq: NZ_CP011937.1). In the reference genome, a region of 2,385 bp from position 2,460,164 to 2,462,548, with >99% sequence identity, composed of a gene coding for a protease (GenPept: WP_032874795.1; RS12020 in Fig. 2) as well as part of a gene coding for an acetyltransferase (GenPept: WP_032874793.1; RS12025 in Fig. 2), was replaced by a fragment of 9,141 bp in the genome of the isolated bacterial strain containing the region of Figure 2. Schematic representation of the identified transgenic insertion. The pUB110 shuttle vector (green) harbours the aadD gene conferring a resistance to kanamycin (KanR) (purple) and the ble gene conferring a resistance to bleomycin (BleoR) (yellow). Blue rectangles represent annotated genes on the reference genome. The region indicated in orange contains a gene coding for a protease (RS12020) and a part of a gene coding for an acetyltransferase (RS12025). The latter, indicated by a small dark red rectangle in the GM consists out of a full (RS12025) and interrupted (RS12025a) copy. The red region is unique in the wild-type while this red region is duplicated, on both sides of the pUB110 shuttle vector, in the GMM. The dark and hatched rectangles indicate the regions targeted by the left (L) and right (R) event-specific real-time PCR methods developed and validated in this study. (2020) 10:7094 | https://doi.org/10.1038/s41598-020-63987-5 www.nature.com/scientificreports www.nature.com/scientificreports/ 2,385 bp in duplicate. Since the tested food enzyme product was commercialized as a protease, the duplication of this region can therefore be explained by the aim of the manufacturers to increase protease yield during the production process. Between these duplicated regions, a sequence of 4,102 bp, with a query coverage and identity of 100%, matching to the pUB110 shuttle vector (GenBank: M19465.1) harbouring the aadD gene (GenBank: AAA88361.1), conferring KanR and NeoR, that was earlier identified by real-time PCR and nested-PCR ( Fig. 1), and the ble gene (RefSeq: NG_047557.1), conferring a resistance to bleomycin (BleoR) was characterized (Fig. 2, Supplementary file 5). This pUB110 shuttle vector, originating from Staphylococcus aureus, and the identified AMR genes were previously reported as being highly used in GMM producing bacterial fermentation products in the food and feed chain, especially for selection of strains of interest 10,37 . In addition, the observed left and right transgene flanking regions of the inserted fragment of 9,141 bp as well as the left and right transgene flanking regions of the pUB110 shuttle vector were confirmed by PCR followed by Sanger sequencing (Supplementary files 5,6). Based on all these results, the presence of a genetic modification specific to a viable GMM in the commercialized food enzyme product was therefore demonstrated. These results, communicated to the Belgian Federal Agency for the Safety of the Food Chain, have led to the RASFF 2019.3332 notification at the EU level.
Regarding the bioinformatics methodology, compared to a read-mapping analysis, a de novo assembly analysis was the most relevant strategy to identify and characterize an unknown and unauthorized GMM for two reasons. Firstly, no reference sequence is required, representing an advantage e.g. in the present study due to the unavailability of a reference sequence for the identified GMM. This approach is also advantageous when the species identity of the GMM host is difficultly identifiable, as exemplified in this study with the Bacillus strain 8 . Secondly, through reconstructing a contig containing the transgenic insertion in the wild-type B. velezensis genome, an unnatural association of sequence elements could be inferred, providing strong evidence of the presence of a GMM. Without an available reference sequence for a specific GMM, a read-mapping analysis cannot provide this type of crucial information. Indeed, only the presence of sequences belonging either to the pUB110 shuttle vector or to B. velezensis could then have been demonstrated, but no link between the pUB110 shuttle vector and B. velezensis could have been established (Supplementary file 4).

Development of event-specific real-time PCR methods based on WGS data. Based on character-
ization of the transgenic insertion into B. velezensis, two event-specific real-time PCR methods were developed and validated, allowing to specifically target cost-and time-efficiently the unauthorized GMM discovered in the present study ( Fig. 1 step 3). These two event-specific methods were designed to cover either the left or the right transgene flanking region of the inserted pUB110 shuttle vector ( Fig. 2; Table 1; Supplementary file 5). For each real-time PCR method, an amplicon with the expected size and sequence was obtained (Supplementary file 7). The performance of these real-time PCR methods was then investigated.
First, the specificity of these real-time PCR methods was tested using, as positive control, DNA from the isolated GM B. velezensis RASFF 2019.3332 strain as well as, as negative controls, DNA from eighty-five wild-type microbial strains frequently used to produce microbial fermentation products 9,10 , DNA from six different wild-type B. velezensis strains, DNA from the vitamin B2-producing GM B. subtilis RASFF 2014.1249 strain, DNA from plant (Zea mays) and DNA from human. As expected, these event-specific real-time PCR methods presented a positive signal only for the positive control, confirming their specificity (Table 2). Second, the sensitivity of these real-time PCR methods was assessed using DNA from the GM B. velezensis RASFF 2019.3332 strain at different estimated full genome copy numbers (6 × 10 6 , 6 × 10 4 , 6 × 10 2 , 60, 12, 6, 1, 0.1 and 0) ( Table 3). For both real-time PCR methods, a positive signal was observed at as low as one estimated full genome copy, demonstrating their high sensitivity. Third, the applicability of these real-time PCR methods was tested using DNA from the commercialized food enzyme product in which the GM B. velezensis RASFF 2019.3332 strain (sample n°1) was detected as well as a commercialized vitamin B2 feed additive product (RASFF 2014.1249) (sample n°2). As expected, both real-time PCR methods presented a positive signal for the sample n°1 and a negative signal for the sample n°2 (Supplementary file 6). Based on all these results, the two proposed event-specific real-time PCR methods were evaluated as specific, sensitive and applicable, allowing enforcement laboratories to easily target the GM B. velezensis RASFF 2019.3332 strain in commercialized microbial fermentation products. If necessary, following to additional optimisation and validation steps, these real-time PCR methods could be combined into a duplex assay.

Conclusion
Following a first-line PCR-based screening analysis targeting generic markers, including 16S rRNA gene region for bacterial presence and key AMR genes frequently harboured by GMM, the potential presence of GMM in a commercialized food enzyme preparation of protease was previously suspected 8,10 . On this basis, a bacterial strain, isolated from this suspicious sample, and its associated genetic modifications were successfully characterized in this study by WGS combined to a de novo assembly analysis. The relevance of the proposed analytical workflow on the tested microbial fermentation product was thus demonstrated, including in particular the crucial role of the first-line PCR-based screening step targeting key AMR genes to assess the potential presence of a GMM. WGS applied on the isolated bacterial strain allowed to generate data in order to fully characterize the transgenic insertion, including the transgene flanking regions and unnatural associations of elements, which indubitably proved the presence of a viable GMM. The present study allowed therefore to demonstrate for the first time the presence of an unknown and unauthorized GMM in food enzyme products commercialized on the EU market. Since the identified GMM is viable and carries full-length AMR genes with flanking regions, health risks related to AMR acquisition clearly need to be considered [17][18][19] . This finding in conjunction with the previous RASFF notification related to the presence of a living GMM in vitamin B2 feed additives 5,6 strongly emphasises, in particular to the Competent Authorities, the importance of enforcement laboratories to control microbial fermentation products in order to guarantee the safety of the food and feed chain. For this purpose, while using (2020) 10 Continued a technology largely mastered by enforcement laboratories, two real-time PCR methods targeting specifically the protease-producing GMM identified in the analysed food enzyme product have been here developed and validated. This case study emphasizes also concerns at several levels associated to the freedom of choice for consumers, the traceability and the safety of the food and feed chain. Indeed, the requirements of the current EU legislation are insufficient because no tool to monitor the unauthorized presence of GMM is provided to enforcement laboratories and Competent Authorities. As mandatory for GM plants authorized for commercialization on the EU food and feed chain, available identification methods specific to trace GM strains in microbial fermentation products would be helpful 16 . Due to the lack of appropriate tools and the confidentiality of GMM dossiers, enforcement laboratories are thus not able to verify the respect of criteria recommended by EFSA, such as the absence of AMR genes, and consequently the safety of the food and feed chain 17 .
The success of the present study relied on the use of WGS followed by a de novo assembly analysis. Although it is still beyond routine activities for most enforcement laboratories, this approach may yet be considered as  www.nature.com/scientificreports www.nature.com/scientificreports/ cost-and time-efficient and state-of-the-art. It requires however a minimum of bioinformatics infrastructure and expertise for data analysis. Therefore, in case of suspicious samples, it could be envisaged that WGS is performed by "sentinel" enforcement laboratories with the necessary expertise in order to characterize sequences of interest, followed by the development of real-time PCR methods that successively can easily be implemented by other peripheral enforcement laboratories [38][39][40] . In addition, the proposed WGS strategy is only recommendable for isolated bacterial strains. Nonetheless, the isolation step of cultivable bacterial strains represents a bottleneck. Therefore, targeted sequencing approaches, including DNA walking, or metagenomics approaches can be developed for the detection and characterization of GMM when bacterial isolation is not achieved [25][26][27][28][29] . However, specific requirements are necessary for the use of these culture-independent alternatives. For the targeted sequencing approaches, a minimum of prior knowledge is mandatory. On this basis, a DNA walking strategy can be developed to anchor on key AMR genes earlier detected during the first-line PCR-based screening. The unknown regions surrounding these AMR genes can thus be characterized, as previously performed for unknown and unauthorized GM plants [25][26][27][28][29] . For metagenomics, even if no prior knowledge is required similarly to WGS, this promising approach is still currently in its infancy mainly due to the following bottlenecks that are challenging its implementation 26,[41][42][43][44][45][46] . Indeed, the low abundance of DNA of interest present in the total DNA extract, as frequently encountered with GMO contamination, complicated its identification. To overcome such issue, a very high sequencing depth is usually required, increasing dramatically the time and the cost of the analysis. Moreover, metagenomics requires significant development at both wet-and dry-lab levels as well as important computational capacities for such type of complex bioinformatics analysis. However, all these limitations are expected to be overcome in the near future as successfully illustrated by pioneer studies in other problematics, including foodborne pathogen detection [41][42][43][44][45][46] .

Materials and methods
GMM isolation of the tested FE product. The commercialized FE product containing a neutral protease in a maltodextrin and corn starch carrier (Pureferm, Batch number TAP25114738, The Alchemist's Pantry, https://thealchemistspantry.com/product/pureferm/) in a solid powder form was collected. This FE product (1 g) was mixed into Brain-Heart Infusion broth for bacterial growing overnight at 37 °C. Following a 1:10,000 dilution, 100 µl of this liquid was plated on nutrient agar for bacterial growth overnight at 37 °C.
Microbial strains. Eighty-five wild-type bacterial and fungal strains were obtained from several collections including Sciensano, the Belgian co-ordinated collections of micro-organisms, the Research Institute for Agriculture, Fisheries and Food, the Convention of Biological Diversity, the American Type Culture Collection and the Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH (Table 2). These microbial strains correspond to the majority of microorganisms reported by EFSA (2018) as being used in the food and feed industry to produce food and feed additives, enzymes and flavourings 9,30 . Six wild-type Bacillus velezensis strains were also collected from the Belgian co-ordinated collections of micro-organisms ( Table 2). The vitamin B2-producing GM B. subtilis RASFF 2014.1249 strain, previously isolated from a commercialized feed additive, was obtained from Sciensano collection ( Table 1). The GM B. velezensis isolated in this study was associated to the RASFF 2019.3332 notification number (Table 1).  www.nature.com/scientificreports www.nature.com/scientificreports/ DNA extraction, concentration and purity. DNA extraction from wild-type microbial strains (Table 2), the GM B. subtilis RASFF 2014.1249 strain (Table 2), the commercialized food enzyme product and the wild-type Zea mays was performed as previously described [8][9][10]47 . Human DNA was purchased from ThermoFisher (4312660) ( Table 2).
According to the manufacturer's instructions, DNA from the bacterial isolate of the GM B. velezensis RASFF 2019.3332 strain was extracted using the Genomic-tip 100/G kit (QIAGEN) and then visualized by capillary electrophoresis using the Tapestation 4200 device with the associated genomic DNA Screen Tape and reagents (Agilent) (Supplementary file 2).
Each DNA concentration was measured by spectrophotometry using Nanodrop ® 2000 (ThermoFisher) and each DNA purity was evaluated using the A260/A280 and A260/A230 ratios. WGS analysis. The DNA library was prepared using the Nextera XT DNA library preparation kit (Illumina) according to manufacturer's instructions. The sequencing was carried out on an Illumina MiSeq system with the V3 chemistry, obtaining 250 bp paired-end reads. The generated data (SRA number: PRJNA575813) were analyzed via an in-house instance of the Galaxy Workflow Management System 48 , for which a public instance is also available at https://galaxy.sciensano.be. The quality of the generated raw data was evaluated using FastQC 0.11.4 with default parameters. The raw data were trimmed with Trimmomatic 0.36 49 with SLIDINGWINDOW:4:20 and MINLEN:150 as settings. The quality of the trimmed reads was evaluated using FastQC with default parameters.
For the de novo assembly, contigs were generated from the trimmed reads using SPAdes 3.8 50 with the starting k-mers set at 117, 121 and 127 (other parameters were left at default values). These k-mer values were selected using VelvetOptimiser 2.5.5 51 with default settings to optimize for the largest assembly N50. Additional assembly statistics were afterwards generated using Quast 4.1 52 .
For species identification, the assembled contigs were used for typing against the B. subtilis MLST schema using the PubMLST.org web-based platform 53 . Additionally, the trimmed paired reads were analyzed with Kraken2 2.0.7-beta 54 with default parameters against an in-house dump of all complete genomes from the NCBI RefSeq Microbial Genomes database (database retrieved 18/02/2019) 55 57 with the '-a' option enabled on the mapped reads, and then extracting the statistics from the resulting tabular file with an in-house script.
For characterization of the transgenic insertion, the assembly was web-based blasted against the aminoglycoside adenyltransferase (aadD) gene (GenBank: M19465.1; AAA88361.1) that confers both KanR and NeoR, identifying a single contig carrying the aadD gene. This contig was then blasted using the megablast program against the NCBI nucleotide database with default parameters to identify and characterize the transgenic insertion (Fig. 2, Supplementary file 3). The location of the transgenic insertion was then determined by aligning the flanking regions against the single representative reference genome of B. velezensis (RefSeq: NZ_CP011937.1) using a local installation of blastn 2.7.1 58 with default settings. A visualization focusing on the region containing the transgenic insertion was constructed using Circos 0.69-6 based on the reference genome annotation and blastn results 59 (Fig. 2). Alignment of the transgenic insertion against the NCBI nucleotide database gave a match of length 4,101 bp and sequence identity of >99% with the pUB110 shuttle vector (GenBank: M37273.1). Lastly, trimmed reads were mapped against the shuttle vector sequence and de novo assembly and visualized with Tablet as specified above (Supplementary file 5).
PCR and nested-PCR assays. Each assay was performed in a standard 25 µl reaction volume containing 1X Green DreamTaq PCR Master Mix (ThermoFisher Scientific), 400 nM of each primer (Eurogentec) and 10 ng of DNA from the isolated GM B. velezensis RASFF 2019.3332 strain (Table 1; Supplementary file 1). The PCR program consisted of a single cycle of 1 min at 95 °C (initial denaturation) followed by 35 amplification cycles of 30 sec at 95 °C (denaturation), 30 sec at 60 °C (annealing) and 1 min at 72 °C (extension) and finishing by a single cycle of 5 min at 72 °C (final extension). The run was performed on a Swift MaxPro Thermal Cycler (Esco). For each assay, a "No Template Control" (NTC) was included. The final PCR products were visualized by capillary electrophoresis using the Tapestation 4200 device with the associated D1000 or D5000 Screen Tape and reagents (Agilent) (Supplementary file 6). The generated PCR products were purified using USB ExoSAP-IT PCR Product Cleanup (Affymetrix) according to the manufacturer's instructions, in order to be sequenced on a Genetic Analyzer 3500 using the Big Dye Terminator Kit v3.1 (Applied Biosystems) (Supplementary file 6).
To verify the WGS data related to the characterization of the genetic modification observed in the isolated bacterial strain, primers were designed on the observed left and right transgene flanking regions of the inserted fragment of 9,141 bp as well as on the left and right transgene flanking regions of the pUB110 shuttle vector using the software Primer3 (Table 1; Supplementary file 5).
Real-time PCR assays. Each real-time PCR assay was performed in a standard 25 µl reaction volume containing 1X TaqMan ® PCR Mastermix (Diagenode), 400 nM of each primer (Eurogentec), 200 nM of the probe and 5 µl of DNA (Table 1). The real-time PCR program consisted of a single cycle of DNA polymerase activation for 10 min at 95 °C followed by 45 amplification cycles of 15 sec at 95 °C (denaturing step) and 1 min at 60 °C or 64 °C (annealing-extension step). All runs were performed on an a CFX96 Touch Real-Time PCR Detection System (BioRad). For each assay, a NTC was included. Primers and probes targeting the AMR genes were previously published 9,10,60 .
For the real-time PCR methods targeting either the left or the right transgene flanking region of the insertion identified in the characterized GM B. velezensis (RASFF 2019.3332), primers and probes were designed in this

Data availability
All data generated or analyzed during this study are included in the published article and its supplementary information files or are available from the corresponding author. Regarding the RASFF2019.3332 notification, information related to the availability of the GM bacterial strain and associated recombinant DNA are available from the corresponding author.