Abstract
Background noise in metagenomic studies is often of high importance and its removal requires extensive post-analytic bioinformatic filtering. This is relevant because significant signals may be lost due to a low signal-to-noise ratio. The presence of plasmid residues, which are frequently present in reagents as contaminants, has not been investigated so far but may pose a substantial bias. Here we show that plasmid sequences from different sources are omnipresent in molecular biology reagents. Using a metagenomic approach, we identified the polymerase gene (pol) of equine infectious anemia virus in human samples and traced it back to the expression plasmid used for generation of a commercial reverse transcriptase. We found fragments of multiple other expression plasmids in human samples as well as in commercial polymerase preparations. Plasmid contamination sources included the production chain of molecular biology reagents as well as contamination of reagents from the environment or from human handling of samples and reagents. Retrospective analyses of published metagenomic studies revealed an inaccurate signal-to-noise differentiation. Hence, the plasmid sequences that seem to be omnipresent in molecular biology reagents may misguide conclusions derived from genomic/metagenomic datasets and thus also clinical interpretations. Critical appraisal of metagenomic data sets for the possibility of plasmid background noise is required to identify reliable and significant signals.
Introduction
Metagenomics dramatically changed our view on the composition of microbial communities in a diversity of ecosystems, including particularly the gut-associated microbiome. The large-scale, indiscriminate sequencing used in metagenomics allows all microbial components within well-defined ecosystems to be mapped comprehensively. However, an important and often ignored pitfall of this approach is that every sequence present in a given sample is sequenced. Genomic sequences that may have been inadvertently introduced into samples during processing will be sequenced with a similar efficiency as target sequences, and this background noise may mask signals obtained from target sequences. This is a common contamination problem, as exemplified in studies that used Whole Genome Amplification (WGA), where as few as 30% of all reads originated from target DNA1.
Genomic background noise may originate from very different sources and may be introduced at multiple points during sample preparation. For example, bacteriophage ΦX174 genomic DNA is added to samples as a positive control and, despite standard filtering of these sequences, it may still be found in published metagenomic samples2. More commonly, however, genomic background noise is introduced inadvertently into the test system. For example, cross-species contamination with bacterial and mammalian DNA has been reported frequently in metagenomic studies3,4,5,6. Background nucleic acids are commonly introduced inadvertently by human handling of samples7,8, via air9,10, commercial enzymes11,12,13,14,15,16,17, DNA extraction kits18,19,20, Ultrapure Water (UPW) systems21,22,23 or paper points24. Even plain buffer solutions used in metagenomics may be a source of foreign DNA25. The ensuing high background noise reduces the generalizability of findings and hampers differentiation from meaningful signals26,27,28,29,30,31. Extensive and stringent post-analytical bioinformatic filtering of data sets is, therefore, required to ensure a clean look at the biological system. Nevertheless, if filtering parameters are too stringent, there is a risk of eliminating biologically meaningful signals from data sets.
An important but widely ignored source of foreign genomic sequences is the enzyme preparations used for next-generation sequencing (NGS). Enzymes such as polymerases are generated recombinantly in prokaryotic hosts with the use of inducible expression vectors. Residual DNA introduced by protein expression, particularly bacterial DNA, has been reported since the early 1990s32,33,34,35,36,37. The most widely used enzyme, Taq polymerase, is estimated to contain between 10^2 and 10^5 genome equivalents of bacterial DNA per unit of enzyme11,38,39. In addition to bacteria-derived residues, murine13,40,41 and human retroviruses42 as well as bacterial phage-like DNA sequences12 were identified in further enzyme preparations.
The relevance of residual expression vectors for NGS, however, has not been elucidated so far. Plasmids are naturally occurring circular pieces of extrachromosomal DNA, which can replicate independently of the host DNA and can be transferred into samples via multiple routes43,44. Complete expression vector sequences have not been identified in commercially available enzyme preparations, although fragments occasionally interfered with sequencing studies14,15,17,41,42,45. For example, Tenover et al. had to use a native Taq polymerase to avoid false-positive test results when evaluating samples for the presence of antibiotic resistance genes such as BlaTem-1, because most expression vectors used for generation of the enzyme also carried this resistance gene46. The erroneous identification of antibiotic resistance genes potentially has far-reaching consequences such as misguided patient treatment47.
In a recent metagenomic virome study, we frequently found signatures of natural plasmids in human samples48. More remarkably, we also found sequences of a horse retroviral pathogen in these human samples. The pol gene of the Equine Infectious Anemia Virus (EIAV) was found in all urine and pharyngeal lavage samples collected from healthy human volunteers. EIAV is a species-specific lentivirus that infects Equidae and causes an immunodeficiency syndrome similar to that caused by the Human Immunodeficiency Virus (HIV-1). The immunological, virological and medical implications of the common presence of an equine retrovirus in human samples would be far-reaching49. Still, viruses are in general very specific for their hosts, and cross-species infections are rare events.
The aim of the present study is, therefore, to evaluate the source and the biological relevance of this finding. Surprisingly, we uncovered a common theme in NGS workflows – introduction of foreign plasmid DNA from very different and multiple sources into samples tested with NGS methods. Our findings may have far-reaching biological consequences for wide areas of life sciences.
Results
Equine Infectious anemia virus pol sequences are derived from extrinsic plasmids
In a previous study, we detected contigs containing the polymerase (pol) gene of the retrovirus Equine infectious anemia virus (EIAV) in all evaluated human samples from healthy volunteers (n = 4)48. EIAV is a retrovirus that infects Equidae; it has not been reported to infect humans or to cause zoonotic disease so far49. A phylogenetic analysis of the detected sequences in relation to those of other lentiviruses such as Human Immunodeficiency Virus-1 pol (HIV-1; NC_001802.1), Feline Immunodeficiency Virus pol (FIV; NC_001482.1) and Maedi/Visna pol strain kv1772 (NC_001452.1) showed a high similarity of the detected sequences with the pol gene of the EIAV clone CL 22 strain (ID: M87581.1; Fig. 1). Further alignment showed no genetic variation among the pol sequences we found, which is highly unusual for retroviruses, given their high mutation rates. Only in comparison with the standard strain EIAV Wyoming was a small number of nucleotide differences identified.
All fragments found corresponded to only a part of the pol gene of EIAV reference strains (1.667 kb). Furthermore, the pol sequences identified were flanked by a CmR sequence (Chloramphenicol acetyltransferase; ID: EDS05563.1) and, in the case of the longest contig available, by an additional BlaTem-1 resistance-encoding sequence (ID: WP_000027050.1, Fig. 2A). Further assembly of the sequences flanking EIAV pol revealed additional elements indicative of an expression vector, including a Histidine-Tag, a Ribosomal Binding Site (RBS), a lac operator, a T5 promoter and a lambda t0 as well as an rrnB T1 terminator (Fig. 2B).
To validate the presence of a vector and to identify the source of contamination, we tested all laboratory consumables and clinical samples used previously by Thannesberger et al. with a PCR assay specific for the EIAV pol sequences found. Surprisingly, all of these samples were negative for EIAV pol sequences (Fig. 3A). To exclude the presence of an RNA template of the EIAV pol sequences, the samples were tested again after reverse transcription with the Omniscript RT Kit (Qiagen, Hilden, Germany). All reverse-transcribed samples then tested positive for EIAV pol sequences, including the non-template control of the reaction mix (Fig. 3B). Therefore, we suspected that the RT kit used (Omniscript RT Kit) was the source of the EIAV pol sequences. To validate this hypothesis, we treated all of these samples with a different reverse transcriptase (iScript cDNA Synthesis Kit, Biorad, California, USA) and repeated the same experiment. These experiments yielded uniformly negative test results (data not shown), which further indicates that the Omniscript RT Kit was the source of the EIAV pol sequences.
In order to quantify the overall genomic background noise present during the virome testing procedure, a qPCR was designed that is specific for the CmR resistance gene found frequently in the EIAV contigs. Three different time steps, reflecting the enzymatic treatments incorporated in the standard workflow of the VIPEP method, were tested and designated T0, T1 and T2. T0 contained the reverse transcription mix (Omniscript RT Kit) before reverse transcription, T1 was sampled after reverse transcription, and T2 after multiple displacement amplification (MDA) of 1 µl of T1 with the REPLI-g Mini Kit (Qiagen, Hilden, Germany). The plasmid copy number increased from 39,249 per µl at T0 to 383,045 copies at T1 and 245,444,045 copies at T2.
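The amplification of the plasmid background along the workflow can be made explicit with a short calculation. The following Python sketch uses only the three copy numbers reported above; treating them as directly comparable quantities is an assumption made here for illustration, and the helper name `fold_change` is hypothetical.

```python
# Copy numbers reported for the CmR qPCR at each workflow step.
# Assumption: the three values are expressed in comparable units.
copies = {"T0": 39_249, "T1": 383_045, "T2": 245_444_045}


def fold_change(before: float, after: float) -> float:
    """Return how many times larger `after` is than `before`."""
    if before <= 0:
        raise ValueError("the starting copy number must be positive")
    return after / before


rt_fold = fold_change(copies["T0"], copies["T1"])     # reverse transcription
mda_fold = fold_change(copies["T1"], copies["T2"])    # MDA
total_fold = fold_change(copies["T0"], copies["T2"])  # whole workflow

print(f"T0 -> T1: {rt_fold:.1f}-fold")
print(f"T1 -> T2: {mda_fold:.1f}-fold")
print(f"T0 -> T2: {total_fold:.1f}-fold")
```

Under this assumption, the MDA step contributes far more to the increase in plasmid copies than the reverse transcription step.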
Characterization of omnipresent natural and artificial plasmid residues in NGS reagents
All contigs available from the previous study were then re-evaluated in silico for plasmid features such as selection markers and origins of replication to assess the possible presence of additional artificial expression vectors. We found multiple other sequences exhibiting characteristics of expression vectors (Fig. 4). Of 4956 contigs from twelve samples, 1.61% (n = 80) contained plasmid sequences. These sequences were found in samples as diverse as human urine (n = 4), pharyngeal lavages (n = 4), technical replicate groups (n = 2) and a non-template control (n = 1). The relative abundance of plasmid background ranged from 0.16% in the Non-Template Control (NTC) up to 20.83% in one patient sample. Interestingly, the urine samples had a higher plasmid background with a mean of 11.67% (Max: 20.83%; Min: 2.65%; SD: 8.97%) compared to the pharyngeal lavage samples with a mean of 4.67% (Max: 10.47%; Min: 2.65%; SD: 4.42%). The urine technical replicates contained more plasmid residues than the pharyngeal lavage technical replicates (6.757% vs. 4.225%) (Fig. 5).
Characterization of plasmid residues
Of the 80 contigs with plasmid signatures, 41% (n = 33) had an origin of replication, 63% (n = 51) a selection marker and 52% (n = 42) an insert. Apart from the EIAV-coding expression vector, three other artificial expression vectors could be identified by their inserts. Of these inserts, 19% contained a human-mouse chimeric Bicaudal 1 gene (n = 8), 11% the UL-32 gene of the Cytomegalovirus (n = 5) and 5% the leukemia fusion protein AML1-MTG8 (n = 2). All contigs with a specific insert were aligned, and the consensus sequence displayed in SnapGene Viewer gave a predicted plasmid map (Fig. 5). The plasmids coding for the Bicaudal 1 chimera and UL-32 genes were identical to those used for other studies in our laboratory and were, therefore, identified as laboratory contaminants. A BLAST search of the 2268 bp fragment of “Und_TR29_len2635”, found in the Und sample (Undetermined contigs), showed a 99% query coverage with Homo sapiens mRNA for AML1-MTG8 fusion protein (GenBank: D13979.1). The source of this plasmid remains unknown.
Natural plasmid residues are derived from a variety of sources
Besides the artificial plasmids, naturally occurring plasmids from different species were found in all twelve samples (n = 12). The most frequent plasmid was from Micrococcus spp. (92%) followed by Serratia spp. (50%), Burkholderia spp. (42%), Ralstonia spp. (25%), Acinetobacter spp. (25%), Mucilaginibacter spp. (17%), Streptococcus spp. (17%), Enterobacter spp. (8%) and Cupriavidus spp. (8%; Table 1). The plasmid sequences we found from Serratia marcescens pUO901 (ID: NG_047232.1) and Enterobacter cloacae pEC005 (ID: NG_050201.1) coded only for antibiotic resistances. The first was identified as an aminoglycoside-(3)-N-acetyltransferase (AAC(3)s), whereas the latter coded for a Class A extended-spectrum beta-lactamase TEM-157 (Table 1). These plasmids are likely from natural sources.
Detection of plasmid residues in commercially available polymerases
To evaluate whether plasmid residues are commonly present in commercially available polymerase preparations, we tested Taq polymerases (n = 4), high-fidelity polymerases (n = 2) and qPCR mastermixes (n = 7) for the presence of an origin of replication (pBM1/pUC19/pBR322/ColE1) and selection markers (blaTEM-1; CmR). An origin of replication and an ampicillin resistance gene were found in two polymerase preparations (HotStarTaq, EvaGreen). An origin of replication alone was found in one polymerase preparation (iTaq Universal Probes Supermix). A chloramphenicol resistance gene was not found in any of the polymerase preparations tested. The methodology did not incorporate a negative control to check whether a positive signal could be obtained. Therefore, laboratory cross-contamination could not be excluded entirely, although it is unlikely because the PCR mastermixes were prepared in a CleneCab PCR Workstation (Herolab, Wiesloch, Germany) and highly specific primers were used. To confirm our findings, enzyme preparations that had tested positive for plasmid residues were used as template and amplified with a polymerase preparation that had previously tested negative for plasmids (GoTaq G2 Hot Start Polymerase; Promega). The HotStarTaq was still positive for Ori and ampicillin resistance, and the EvaGreen 2X qPCR Express Mix-ROX remained positive only for Ori, indicating the possible presence of artificial expression plasmids. All Taq enzymes from BioRad that had previously tested positive now tested negative and were, therefore, considered negative for plasmid presence (Table 2).
Analysis of metagenomics studies
Finally, we analyzed previously published metagenomic data sets of human gut and plasma samples as well as a data set generated with different whole genome amplification kits50,51,52 for the presence of plasmid residues. In this retrospective analysis, natural plasmid residues were found in most data sets, most commonly with Acinetobacter sp. and Escherichia sp. as source organisms (Table 1 and Table 2). The highest diversity of plasmids was found in metagenomic data focusing on the fecal microbiome53. Metagenomic studies analyzing high-biomass samples, such as microbiome studies, are in particular expected to contain a higher amount and diversity of natural plasmids than studies of low-biomass samples (e.g. plasma). Remarkably, a plasmid highly similar to that of Xuhuaishuia manganoxidans strain DY6-4 was detected in several samples of two unrelated metagenomic studies, although this bacterium has so far been found only in the Pacific Clarion-Clipperton Fracture Zone51 (Table 3).
Discussion
The presence of bacterial DNA residues in commercially available enzymes, DNA extraction kits and other molecular grade reagents has been recognized recently21,26,41,52. The presence of plasmids in molecular biology reagents, however, has remained unnoticed so far. We found natural and artificial plasmid residues in most tested NGS reagents, including particularly recombinantly generated enzyme preparations. Sources of these plasmids included laboratory contaminants as well as bacteria and expression vectors used for the generation of recombinant proteins. Plasmid sequences have been identified frequently in NGS studies, but may have been attributed erroneously to bacteria. Hence, plasmid sequences present in clinical and environmental samples may have far-reaching consequences.
Metagenomic studies are increasingly used in addition to standard PCR assays to address clinical questions, as reviewed in Klymiuk & Steininger54. Enzymes used for these assays are generated recombinantly in prokaryotic systems. Plasmid sequences may misguide clinical treatment decisions and adversely affect patient outcome. For example, antimicrobial resistance testing is increasingly supplemented by testing bacterial isolates for the presence of genes that confer resistance55. In the studies analyzed, common antibiotic resistance gene sequences from Enterobacter cloacae and Serratia marcescens were found. These two pathogens are increasingly resistant to multiple or most antimicrobial drug classes, and the presence of resistance genes in clinical samples would be neither surprising nor questioned14,15,17,45. Consequently, the choice of antimicrobial treatment would be misguided towards reserve antimicrobials that are more toxic than standard ones. At least one patient death was documented in association with a false-positive test result caused by a contaminated mastermix56.
Misguidance of clinical decisions may also be associated with false-positive PCR results. We found EIAV sequences in all human samples and could identify the plasmid used for the generation of the reverse transcriptase as the source of these sequences. Identification of a horse retrovirus in human samples was implausible, which guided our investigation in the right direction. In general, the presence of host-specific viral, genomic or plasmid DNA (e.g. Xuhuaishuia manganoxidans strain DY6-4) in samples derived from other hosts should be questioned for its plausibility. Still, recombinant reverse transcriptase is also used in PCR assays for detection of EIAV in horse samples, and this pol sequence is used as target in several detection assays57. A positive test result would be plausible, and negative controls would test negative because they are usually not treated with a reverse transcriptase. In case of a single positive EIAV test result, however, all horses of the stable would be culled.
Elimination of plasmid sequences from molecular biology reagents is difficult and costly. Natural plasmids from bacteria such as Ralstonia sp., Bradyrhizobium sp. and Legionella sp. are common contaminants in Ultrapure Water and are difficult to avoid21. Contamination of reagents from the human body may remain unnoticed. In one of our recent metagenomic studies, we found plasmid fragments from Ralstonia sp., Burkholderia sp., Enterobacter sp., Acinetobacter sp., and Micrococcus sp.48. The first two were likely introduced with water, whereas the latter were likely introduced through human handling, as these microbes are part of the normal human skin flora58. Previously, we found Bicaudal-1 and UL32 protein expression plasmids in human samples48. These plasmids were very likely contaminants, as our research group used them in another research study. In addition, prokaryotic expression plasmids are commonly used to generate enzymes for molecular biology and are difficult to eliminate. For example, we identified the plasmid used for the generation of the EIAV reverse transcriptase as a pDS56/RBSII-based expression vector by its backbone59. Nevertheless, we also found differences in the level of contamination between the enzyme preparations from different manufacturers, which indicates that reducing this background signal is feasible.
A possible, inexpensive and feasible solution to the problem of plasmid residues in metagenomic studies may be to test technical replicates of the samples and the negative controls in parallel and to subtract, during bioinformatic analysis, the signals that are detectable in both samples and controls. Databases that comprehensively annotate the different expression vectors used for recombinant generation of proteins are important in this respect. Furthermore, specification of the type and sequences of expression plasmids used in the package inserts of every molecular biology reagent would be helpful. Nevertheless, most production processes of enzymes are proprietary and, in our experience, companies are very hesitant to provide this information.
Another solution, presented by de Goffau and colleagues, would be to use different isolation kits during sample preparation to control if the results are reproducible60.
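The negative-control subtraction and the replicate check described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in this study: plasmid hits are represented as plain sets of identifiers, and all identifiers and function names below are hypothetical.

```python
from typing import Dict, Set


def subtract_background(
    sample_hits: Dict[str, Set[str]],
    control_hits: Set[str],
) -> Dict[str, Set[str]]:
    """Drop every plasmid ID that is also detected in the negative control."""
    return {name: hits - control_hits for name, hits in sample_hits.items()}


def reproducible_hits(replicate_a: Set[str], replicate_b: Set[str]) -> Set[str]:
    """Keep only plasmid IDs detected in both technical replicates."""
    return replicate_a & replicate_b


# Hypothetical plasmid identifiers, for illustration only.
ntc = {"expr_vector_CmR", "Ralstonia_plasmid"}
samples = {
    "urine_1": {"expr_vector_CmR", "Micrococcus_plasmid", "Ralstonia_plasmid"},
    "lavage_1": {"expr_vector_CmR", "Streptococcus_plasmid"},
}

cleaned = subtract_background(samples, ntc)
for name, hits in cleaned.items():
    print(name, sorted(hits))
```

A set difference is the simplest possible rule. It discards a signal entirely once it appears in the control, which may be too strict when a genuine sample signal and a reagent contaminant share the same sequence.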
In conclusion, we found that plasmid sequences are frequently present in molecular biology reagents. The sources for this background noise in metagenomic studies are diverse and include contamination of reagents from the environment, cross-contamination in the laboratory from purposely generated plasmids, as well as plasmids used for the generation of enzymes. The amount and type of plasmids found in metagenomics studies may greatly vary upon pre-treatment of samples (e.g. use of different enzymes). The presence of these plasmids in samples may have far-reaching consequences including the misguidance of therapeutic decisions in human and veterinary medicine – particularly when unexpected. Our observations open up whole new avenues to identifying and appropriately addressing these potential issues. Background plasmid noise may be eliminated for example from signals by use of appropriate negative controls, manufacturers of enzymes and recombinant proteins may inform customers of the possible presence of plasmid traces, and metagenomic data will be interpreted even more cautiously.
Methods
Urine and pharyngeal lavage samples from healthy human volunteers were collected in a sterile collection cup (Greiner Bio-One GmbH, Kremsmünster, Austria) as described previously48. Lavages were collected by asking the volunteer to gargle with 10 ml of sterile, physiologic sodium chloride solution (0.9% NaCl Mini-Plasco isotonic solution, B. Braun-Austria GmbH, Maria Enzersdorf, Austria) for a minimum of one minute and collecting the lavage fluid in a sterile tube. Samples were kept on ice and processed directly. Nucleic acids were enriched with Vivaspin 20 50,000 MWCO PES ultracentrifugation columns (Sartorius, Aubagne, France) at 4000 g and 4 °C. Total DNA and RNA were then purified with the Roche High Pure Viral Nucleic Acid Kit (Roche, Mannheim, Germany) and reverse transcribed with either the iScript cDNA Synthesis Kit (Bio-Rad, Hercules, USA) or the Omniscript RT Kit (Qiagen, Hilden, Germany) according to the manufacturers' instructions. The samples were cryopreserved at −80 °C until testing.
Plasmid detection
For the detection of Equine Infectious Anemia Virus pol sequences, a PCR assay was developed that amplifies a 723 bp fragment of the EIAV pol gene corresponding to the fragment detected previously, with the following primers: forward: 5′-CGG-AAG-AGG-CAC-AAA-AAG-AG-3′; reverse: 5′-GAC-CAG-GTA-CCC-AAG-CAA-AA-3′. The PCR mix contained 0.125 µl OneTaq Hot Start DNA Polymerase (New England Biolabs, Ipswich, USA), 500 µM of each primer, 5 µl ThermoPol Buffer (New England Biolabs, Ipswich, USA), 200 µM of each dNTP (ThermoFisher, Waltham, USA) and 1 µl DNA or cDNA template per 25 µl reaction mix. Amplification started with an initial denaturation step at 94 °C for 5 minutes, followed by 40 cycles of denaturation at 94 °C for 60 seconds, annealing at 51 °C for 30 seconds and extension at 68 °C for 60 seconds, followed by a final extension of 7 minutes. The PCR product was then separated on a 1.0% agarose gel in Tris-acetate-EDTA (TAE) buffer and visualized with a ChemiDoc XRS+ System (Biorad, California, USA).
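As a quick plausibility check of the primer pair given above, basic primer properties can be computed directly from the printed sequences. The following Python sketch is illustrative only: the Wallace rule (2 °C per A/T, 4 °C per G/C) is a rough textbook estimate that is assumed here and was not part of the published method, and the function names are hypothetical.

```python
def clean_primer(primer: str) -> str:
    """Keep only the bases of an unlabeled primer written as 5'-XXX-XXX-3'."""
    return "".join(ch for ch in primer.upper() if ch in "ACGT")


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in the sequence."""
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature in °C by the Wallace rule (an assumption)."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Primer sequences exactly as printed in the text.
fwd = clean_primer("5′-CGG-AAG-AGG-CAC-AAA-AAG-AG-3′")
rev = clean_primer("5′-GAC-CAG-GTA-CCC-AAG-CAA-AA-3′")

for label, seq in (("forward", fwd), ("reverse", rev)):
    print(label, seq, len(seq), f"GC {gc_percent(seq):.0f}%", f"Tm ~{wallace_tm(seq)} °C")
```

Note that `clean_primer` is only suitable for unlabeled primers, because dye names such as FAM or TAMRA contain letters that would be mistaken for bases.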
For quantitative analyses of plasmid copies, a qPCR assay amplifying part of the chloramphenicol acetyltransferase (CmR) encoding gene was designed with the online tool GenScript Real-time PCR (TaqMan) Primer Design (https://www.genscript.com/ssl-bin/app/primer). The 20 µl reaction mix contained 9 µl iTaq Universal Probes Supermix (Bio-Rad, Hercules, USA), 300 nM primers (Forward: 5′-GAC-GGT-GAG-CTG-GTG-ATA-TG-3′; Reverse: 5′-TGT-GTA-GAA-ACT-GCC-GGA-AA-3′), 200 nM of the CmR Probe (5′-FAM-CGC-TCT-GGA-GTG-AAT-ACC-ACG-ACG-TAMRA-3′) and 5 µl template. The reaction was performed in a 96-well optical microtiter plate (Life Technologies, Carlsbad, CA, USA) and amplified in a StepOnePlus Real-Time PCR System (Thermo Fisher Scientific, Waltham, MA, USA). The reaction mix was pipetted into a MicroAmp Fast 96-Well Reaction Plate 0.1 ml (Applied Biosystems, California, USA), and 5 µl of template was added afterwards. The cycling conditions included an initial denaturation step at 95 °C for 2 minutes, followed by 40 cycles of denaturation for 15 seconds at 95 °C and 20 seconds extension time at 60 °C. Every run of the CmR qPCR included a serial dilution of the plasmid pDONR221 from 3 × 10^1 to 3 × 10^6 copies per well for calculation of a standard curve and quantification of target sequences. Each DNA sample was analyzed in triplicate, and at least 12 negative controls, containing only the reaction mix with 1 µl ddH2O as template, were included in each run.
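Quantification against a plasmid dilution series is conventionally done by fitting a linear standard curve of Ct against log10 of the copy number. The Python sketch below illustrates this under stated assumptions: the dilution steps follow the text, but the Ct values are invented, idealized numbers (about 3.32 cycles per ten-fold dilution), and the function names are hypothetical rather than those of the instrument software.

```python
import math
from typing import List, Tuple


def fit_standard_curve(copies: List[float], ct_values: List[float]) -> Tuple[float, float]:
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to estimate copies per well."""
    return 10 ** ((ct - intercept) / slope)


def efficiency(slope: float) -> float:
    """Amplification efficiency; 1.0 corresponds to perfect doubling per cycle."""
    return 10 ** (-1.0 / slope) - 1.0


# Dilution series as in the text: 3 x 10^1 to 3 x 10^6 copies per well.
standards = [3 * 10 ** k for k in range(1, 7)]
# Hypothetical, idealized Ct values (NOT measured data).
hypothetical_ct = [35.0 - 3.32 * (k - 1) for k in range(1, 7)]

slope, intercept = fit_standard_curve(standards, hypothetical_ct)
print(f"slope {slope:.2f}, intercept {intercept:.2f}, efficiency {efficiency(slope):.2%}")
print(f"Ct 30.0 -> {copies_from_ct(30.0, slope, intercept):.0f} copies per well")
```

Copies per well obtained this way still have to be divided by the template volume to arrive at copies per µl.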
In order to test commercially available polymerases for the presence of plasmid sequences, a specific pan-Ori primer pair (Forward: 5′-AGT-TCG-GTG-TAG-GTC-GTT-CG-3′; Reverse: 5′-GCC-TAC-ATA-CCT-CGC-TCT-GC-3′) was designed with the online primer design tool Primer3 v.0.4.0 (http://bioinfo.ut.ee/primer3-0.4.0/primer3/). This PCR assay allowed detection of pBM1, pBR322, ColE1 and pUC19 in one reaction. The commonly used penicillin resistance gene blaTEM-1 was detected by a PCR using a primer pair designed by Lee and colleagues (Forward: 5′-CTA-CGA-TAC-GGG-AGG-GCT-TA-3′, Reverse: 5′-ATA-AAT-CTG-GAG-CCG-GTG-AG-3′)53. For the detection of chloramphenicol resistance (CmR), the same primer pair was used as for the qPCR described above. Cycling conditions and set-up of reaction mixes followed the enclosed manufacturer’s manual, except that no template was added. All PCR reactions consisted of 30 cycles with 30 seconds denaturation at 95 °C, 30 seconds annealing at 60 °C and 25 seconds extension time at 72 °C. The time needed for initial denaturation and final extension as well as the primer, MgCl2 and dNTP concentrations varied with the polymerase or mastermix used. Cycling times for high-fidelity polymerases such as Q5 and iProof were shorter (10 seconds denaturation and 20 seconds extension time). As positive control for Ampicillin and Ori presence, 1 µl of a 1 ng/µl pcDNA3.1(+) dilution was used as template. The (RT)-qPCR mastermixes were pipetted according to each manufacturer’s manual. The same cycling conditions were used as for the PCR reaction.
To exclude false-positive results, 0.125 µl to 0.2 µl of pure enzyme was used as template for amplification with the GoTaq G2 DNA Polymerase (Promega, Madison, Wisconsin, USA), which had no detectable plasmid residues. Cycling conditions included a 2 minute initial denaturation step at 95 °C, followed by 30 cycles of 30 seconds denaturation at 95 °C, 30 seconds annealing at 60 °C and 25 seconds extension time at 72 °C, with a final extension time of 5 minutes at 72 °C. A 25 µl reaction consisted of 5 µl Colorless Flexi Buffer, 0.5 µl of each 10 µM primer, 0.5 µl of 10 µM dNTP Mix, 0.125 µl GoTaq G2 Enzyme and 2.5 µl 25 mM MgCl261. qPCR mastermixes that tested positive were re-evaluated by testing a 50 µl reaction instead of 25 µl in order to incorporate more enzyme.
Every mastermix had been pipetted in a CleneCab PCR Workstation (Herolab, Wiesloch, Germany) to avoid the introduction of foreign nucleic acids. The templates had been added on ice while the clean cab itself had been decontaminated for 20 min with the use of UV irradiation.
Bioinformatics and Statistical Evaluation
The EIAV pol sequence detected in our previous study was analyzed in comparison with reference sequences including EIAV pol Wyoming (ID: AF016316.1), EIAV pol Liaoning (ID: AF327877.1), EIAV pol Vaccine Strain (ID: gb|AF327878.1), EIAV pol V70 Strain (ID: gi 9929860), EIAV pol V26 Strain (ID: gi 9929867), EIAV pol (ID: NC_001450.1), EIAV pol Clone 22 (ID: M87581.1), EIAV pol Miyazaki2011-A (ID: JX003263.1) and other lentiviruses such as Human Immunodeficiency Virus-1 pol (HIV-1; ID: NC_001802.1), Feline Immunodeficiency Virus pol (FIV; ID: NC_001482.1) and Maedi/Visna pol strain kv1772 (ID: NC_001452.1) with the software package CLC Main Workbench 7 (Qiagen, Hilden, Germany).
In order to evaluate contigs for further potential plasmid contaminations, sequences were screened for common plasmid features including origins of replication (F1, pBR322, pUC19, p15a, ColE1, SV40), selection markers (Chloramphenicol, Ampicillin (BlaTem-1), Kanamycin (Tn5), Streptomycin (aadA), Puromycin (pac) and Hygromycin (hph)), promoters (T7, T3, Sp6, AmpR, CMV, tet, LacI, polyhedrin, SV40), terminators (rrnB T1-T2, lambda), protein tags (Histidine, HA, Streptavidin) and primer binding sites (pBluescript SK, pBluescript KS, M13 pUC and other commonly used primer sites). All plasmid sequences were searched from 5′ to 3′ as well as from 3′ to 5′. Sequences with at least one of these characteristics were analyzed further with the SnapGene Viewer software (GSL Biotech LLC, Chicago, USA), which automatically annotates plasmid features. All sequences attributed to plasmids were analyzed via their annotated features and classified into artificial vectors or natural plasmid fragments (see Fig. 5A).
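Searching a contig in both orientations amounts to looking for a feature and for its reverse complement. The following Python sketch shows this for exact matches only. It is a simplified stand-in for the annotation step described above, not the SnapGene Viewer algorithm, and the example contig and the hexahistidine-coding motif are illustrative assumptions.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_feature(contig: str, feature: str) -> List[Tuple[str, int]]:
    """Return (strand, 0-based start) for every exact match on either strand.

    A palindromic feature is reported on both strands at the same position.
    """
    contig = contig.upper()
    feature = feature.upper()
    hits = []
    for strand, query in (("+", feature), ("-", reverse_complement(feature))):
        start = contig.find(query)
        while start != -1:
            hits.append((strand, start))
            start = contig.find(query, start + 1)
    return hits


# Illustrative motif: one possible coding sequence of a 6x histidine tag.
his_tag = "CATCACCATCACCATCAC"
# Hypothetical contig carrying the tag on the forward strand.
contig = "GGATCC" + his_tag + "TTAAGC"

print(find_feature(contig, his_tag))
print(find_feature(reverse_complement(contig), his_tag))
```

Exact matching misses features that carry sequencing errors or point mutations, so a real screen would typically rely on alignment with a mismatch tolerance.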
As final step, known plasmid sequences had been searched in the short read metagenome sequence data of all samples, which was described earlier by Thannesberger and colleagues48 as well as published raw data from other metagenomics studies50,51,52. We used the previously described bioinformatic pipeline48 which estimates the coverage along the plasmids and rejects short regions of unspecific coverage. All plasmid sequences from the NCBI RefSeq database, release 77, had been used as reference54.
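The idea of estimating coverage along a plasmid reference while rejecting short regions of unspecific coverage can be illustrated as follows. This Python sketch is not the published pipeline48: the interval representation of read alignments, the function names and the `min_run` threshold of 200 bases are all assumptions chosen for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open (start, end) on the plasmid reference


def depth_profile(length: int, alignments: List[Interval]) -> List[int]:
    """Per-base read depth computed with a difference array."""
    diff = [0] * (length + 1)
    for start, end in alignments:
        start, end = max(0, start), min(length, end)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    depth, running = [], 0
    for delta in diff[:length]:
        running += delta
        depth.append(running)
    return depth


def covered_runs(depth: List[int], min_depth: int = 1) -> List[Interval]:
    """Contiguous half-open runs whose depth is at least min_depth."""
    runs, run_start = [], None
    for pos, d in enumerate(depth):
        if d >= min_depth and run_start is None:
            run_start = pos
        elif d < min_depth and run_start is not None:
            runs.append((run_start, pos))
            run_start = None
    if run_start is not None:
        runs.append((run_start, len(depth)))
    return runs


def filtered_breadth(length: int, alignments: List[Interval], min_run: int = 200) -> float:
    """Fraction of the reference covered after dropping runs shorter than min_run."""
    runs = covered_runs(depth_profile(length, alignments))
    kept = sum(end - start for start, end in runs if end - start >= min_run)
    return kept / length


# Hypothetical alignments on a 1000 bp plasmid reference.
reads = [(0, 150), (100, 400), (700, 750)]
print(covered_runs(depth_profile(1000, reads)))
print(filtered_breadth(1000, reads))
```

In this toy example the isolated 50-base stretch is discarded as unspecific, and only the contiguous 400-base region counts towards the covered fraction.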
Ethics Statement
All healthy donors of clinical samples (urine and saliva) provided written informed consent. All experimental protocols had been approved by the Medical University of Vienna.
Method Statement
All methods had been carried out in accordance with relevant guidelines and regulations.
Abbreviated Summary
Due to the increasing sequencing throughput enabled by Next-Generation Sequencing (NGS), the analysis of all microbial genomes present in a single sample became possible (metagenomics). The indiscriminate sequencing of all nucleic acid sequences present in a sample by metagenomics poses the risk of attributing biological significance to contaminating sequences as well as biasing the biological signal through a technical signal. Thus, research conclusions and clinical decisions may be misguided significantly. We found that background plasmid sequences are present in every biological sample and have previously been interpreted erroneously as clinically significant biological differences. Through recognition of this significant background in metagenomic studies, however, we show how to devise effective countermeasures such as labelling commercial reagents for the presence of plasmids used for generation of recombinant proteins, and specifying these.
References
Raghunathan, A. et al. Genomic DNA Amplification from a Single Bacterium. Appl. Environ. Microbiol. 71, 3342–3347 (2005).
Mukherjee, S., Huntemann, M., Ivanova, N., Kyrpides, N. C. & Pati, A. Large-scale contamination of microbial isolate genomes by Illumina PhiX control. Stand. Genomic Sci. 10, 18 (2015).
Merchant, S., Wood, D. E. & Salzberg, S. L. Unexpected cross-species contamination in genome sequencing projects. PeerJ 2, e675 (2014).
Kryukov, K. & Imanishi, T. Human contamination in public genome assemblies. PLoS One 11, 1–11 (2016).
Longo, M. S., O’Neill, M. J. & O’Neill, R. J. Abundant human DNA contamination identified in non-primate genome databases. PLoS One 6, 1–4 (2011).
Strong, M. J. et al. Microbial Contamination in Next Generation Sequencing: Implications for Sequence-Based Analysis of Clinical Samples. PLoS Pathog. 10, 1–6 (2014).
Malmström, H., Storå, J., Dalén, L., Holmlund, G. & Götherström, A. Extensive human DNA contamination in extracts from ancient dog bones and teeth. Mol. Biol. Evol. 22, 2040–2047 (2005).
Pilli, E. et al. Monitoring DNA Contamination in Handled vs. Directly Excavated Ancient Human Skeletal Remains. PLoS One 8, 1–6 (2013).
Luksamijarulkul, P., Kiennukul, N. & Vatthanasomboon, P. Laboratory facility design and microbial indoor air quality in selected hospital laboratories. Southeast Asian J. Trop. Med. Public Health 45, 746–755 (2014).
Padua, R. A., Parrado, A., Larghero, J. & Chomienne, C. UV and clean air result in contamination-free PCR. Leuk. Off. J. Leuk. Soc. Am. Leuk. Res. Fund, U.K 13, 1898–1899 (1999).
Rand, K. & Houck, H. Taq polymerase contains bacterial DNA of unknown origin. Mol. Cell. Probes 4, 445–450 (1990).
Newsome, T., Li, B. J., Zou, N. & Lo, S. C. Presence of Bacterial Phage-Like DNA Sequences in Commercial Taq DNA Polymerase Reagents. J. Clin. Microbiol. 42, 2264–2267 (2004).
Sato, E., Furuta, R. A. & Miyazawa, T. An endogenous murine leukemia viral genome contaminant in a commercial RT-PCR kit is amplified using standard primers for XMRV. Retrovirology 7, 110 (2010).
Perron, A., Raymond, P. & Simard, R. The occurrence of antibiotic resistance genes in Taq polymerases and a decontamination method applied to the detection of genetically modified crops. Biotechnol. Lett. 28, 321–325 (2006).
Song, J. S. et al. Removal of contaminating TEM-la beta-lactamase gene from commercial Taq DNA polymerase. J. Microbiol. 44, 126–128 (2006).
Blainey, P. C. & Quake, S. R. Digital MDA for enumeration of total nucleic acid contamination. Nucleic Acids Res. 39, 1–9 (2011).
Koncan, R. et al. Learning from mistakes: Taq polymerase contaminated with β-lactamase sequences results in false emergence of Streptococcus pneumoniae containing TEM. J. Antimicrob. Chemother. 60, 702–703 (2007).
Evans, G. E. et al. Contamination of Qiagen DNA extraction kits with Legionella DNA. J. Clin. Microbiol. 41, 3452–3453 (2003).
Mohammadi, T., Reesink, H. W., Vandenbroucke-Grauls, C. M. J. E. & Savelkoul, P. H. M. Removal of contaminating DNA from commercial nucleic acid extraction kit reagents. J. Microbiol. Methods 61, 285–288 (2005).
Smuts, H., Kew, M., Khan, A. & Korsman, S. Novel Hybrid Parvovirus-Like Virus, NIH-CQV/PHV, Contaminants in Silica Column-Based Nucleic Acid Extraction Kits. J. Virol. 88, 1398–1398 (2014).
Kulakov, L. A., McAlister, M. B., Ogden, K. L., Larkin, M. J. & O’Hanlon, J. F. Analysis of bacteria contaminating ultrapure water in industrial systems. Appl. Environ. Microbiol. 68, 1548–1555 (2002).
Mcalister, M., Kulakov, L., O’hanlon, J., Larkin, M. & Ogden, K. Survival and nutritional requirements of three bacteria isolated from ultrapure water. J. Ind. Microbiol. Biotechnol. 29, 75–82 (2002).
Shen, H., Rogelj, S. & Kieft, T. L. Sensitive, real-time PCR detects low-levels of contamination by Legionella pneumophila in commercial reagents. Mol. Cell. Probes 20, 147–153 (2006).
van der Horst, J. et al. Sterile paper points as a bacterial DNA-contamination source in microbiome profiles of clinical samples. J. Dent. 41, 1297–1301 (2013).
Loeffler, J. et al. Contamination Occurring in Fungal PCR Assays. J. Clin. Microbiol. 37, 1200–1202 (1999).
Glassing, A. et al. Inherent bacterial DNA contamination of extraction and sequencing reagents may affect interpretation of microbiota in low bacterial biomass samples. Gut Pathog. 8, 24 (2016).
Salter, S. J. et al. Reagent and laboratory contamination can critically impact sequence-based microbiome analyses. BMC Biol. 12, 87 (2014).
Glassing, A. et al. Changes in 16s RNA Gene Microbial Community Profiling by Concentration of Prokaryotic DNA. J. Microbiol. Methods 119, 239–242 (2015).
Lauder, A. P. et al. Comparison of placenta samples with contamination controls does not provide evidence for a distinct placenta microbiota. Microbiome 4, 29 (2016).
Drengenes, C. et al. Laboratory contamination in airway microbiome studies. Eur. Respir. J. 48 (2016).
Fischer, M. et al. Efficacy Assessment of Nucleic Acid Decontamination Reagents Used in Molecular Diagnostic Laboratories. PLoS One 11, e0159274 (2016).
Bottger, E. C. Frequent contamination of Taq polymerase with DNA. Clin. Chem. 36, 1258–1259 (1990).
Cimino, G. D., Metchette, K. C., Isaacs, S. T. & Zhu, Y. S. More false-positive problems. Nature 345, 773–774 (1990).
Deragon, J. M., Sinnett, D., Mitchell, G., Potier, M. & Labuda, D. Use of γ irradiation to eliminate DNA contamination for PCR. Nucleic Acids Res. 18, 6149 (1990).
Sarkar, G. & Sommer, S. S. Shedding light on PCR contamination. Nature 343, 27 (1990).
Tilburg, J. J. H. C. et al. Contamination of commercial PCR master mix with DNA from Coxiella burnetii. J. Clin. Microbiol. 48, 4634–4635 (2010).
Maiwald, M., Ditton, H.-J., Sonntag, H.-G. & von Knebel Doeberitz, M. Characterization of contaminating DNA in Taq polymerase which occurs during amplification with a primer set for Legionella 5S ribosomal RNA. Mol. Cell. Probes 8, 11–14 (1994).
Lyons, S. R., Griffen, A. L. & Leys, E. J. Quantitative Real-Time PCR for Porphyromonas gingivalis and Total Bacteria. J. Clin. Microbiol. 38, 2362–2365 (2000).
Meier, A., Persing, D. H., Finken, M. & Bottger, E. C. Elimination of Contaminating DNA within Polymerase Chain Reaction Reagents: Implications for a General Approach to Detection of Uncultured Pathogens. J. Clin. Microbiol. 31, 646–652 (1993).
Tuke, P. W., Tettmar, K. I., Tamuri, A., Stoye, J. P. & Tedder, R. S. PCR master mixes harbour murine DNA sequences. caveat emptor! PLoS One 6, 1–6 (2011).
Zheng, H., Jia, H., Shankar, A., Heneine, W. & Switzer, W. M. Detection of murine leukemia virus or mouse DNA in commercial RT-PCR reagents and human DNAs. PLoS One 6 (2011).
Monleau, M., Plantier, J. C. & Peeters, M. HIV contamination of commercial PCR enzymes raises the importance of quality control of low-cost in-house genotypic HIV drug resistance tests. Antivir. Ther. 15, 121–126 (2010).
Rosano, G. L. & Ceccarelli, E. A. Recombinant protein expression in Escherichia coli: Advances and challenges. Front. Microbiol. 5, 1–17 (2014).
Gokcezade, J., Sienski, G. & Duchek, P. Efficient CRISPR/Cas9 plasmids for rapid and versatile genome editing in Drosophila. G3 (Bethesda). 4, 2279–82 (2014).
Chiang, C. S., Liu, C. P., Weng, L. C., Wang, N. Y. & Liaw, G. J. Presence of β-lactamase gene TEM-1 DNA sequence in commercial Taq DNA polymerase. J. Clin. Microbiol. 43, 530–531 (2005).
Tenover, F. C., Huang, B. O., Rasheed, J. K. & Persing, D. H. Development of PCR Assays To Detect Ampicillin Resistance Genes in Cerebrospinal Fluid Samples Containing Haemophilus influenzae. J. Clin. Microbiol. 32, 2729–2737 (1994).
Patel, R., Grogg, K. L., Edwards, W. D., Wright, A. J. & Schwenk, N. M. Death from inappropriate therapy for Lyme disease. Clin. Infect. Dis. 31, 1107–1109 (2000).
Thannesberger, J. et al. Viruses comprise an extensive pool of mobile genetic elements in eukaryote cell cultures and human clinical samples. FASEB J., https://doi.org/10.1096/fj.201601168R (2017).
Cook, R. F., Leroux, C. & Issel, C. J. Equine infectious anemia and equine infectious anemia virus in 2013: A review. Vet. Microbiol. 167, 181–204 (2013).
Bedarf, J. R. et al. Functional implications of microbial and viral gut metagenome changes in early stage L-DOPA-naïve Parkinson’s disease patients. Genome Med. 9, 39 (2017).
Law, J. et al. Identification of Hepatotropic Viruses from Plasma Using Deep Sequencing: A Next Generation Diagnostic Tool. PLoS One 8 (2013).
Thoendel, M. et al. Impact of Contaminating DNA in Whole Genome Amplification Kits Used for Metagenomic Shotgun Sequencing for Infection Diagnosis. J. Clin. Microbiol. JCM.02402-16, https://doi.org/10.1128/JCM.02402-16 (2017).
Lee, C., Kim, J., Shin, S. G. & Hwang, S. Absolute and relative QPCR quantification of plasmid copy number in Escherichia coli. J. Biotechnol. 123, 273–280 (2006).
O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: Current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733–D745 (2016).
Wang, L., Liu, Y., Shi, X., Wang, Y. & Zheng, Y. a manganese-oxidizing bacterium isolated from deep-sea sediments from the Pacific Polymetallic Nodule Province. 1521–1526, https://doi.org/10.1099/ijsem.0.000912 (2017).
Czurda, S., Smelik, S., Preuner-Stix, S., Nogueira, F. & Lion, T. Occurrence of fungal DNA contamination in PCR reagents: Approaches to control and decontamination. J. Clin. Microbiol. 54, 148–152 (2016).
Klymiuk, I. et al. A Physicians’ Wish List for the Clinical Application of Intestinal Metagenomics. PLOS Med. 11, e1001627 (2014).
Bilgilier, C. et al. Prospective multicentre clinical study on inter- and intrapatient genetic variability for antimicrobial resistance of Helicobacter pylori. Clin. Microbiol. Infect. 1–6, https://doi.org/10.1016/j.cmi.2017.06.025 (2017).
Rys, P. N. & Persing, D. H. Preventing false positives: Quantitative evaluation of three protocols for inactivation of polymerase chain reaction amplification products. J. Clin. Microbiol. 31, 2356–2360 (1993).
de Goffau, M. et al. Recognizing the reagent microbiome. Nat. Microbiol. 3, 851–853 (2018).
Promega. GoTaq® G2 Hot Start Polymerase. 13, 4–5 (2007).
Acknowledgements
This study was supported by research grants from the Austrian Science Fund (P25353-B21 and P28102-B30).
Author information
Authors and Affiliations
Contributions
Nikolai Wally (first author): designed research, performed research, analyzed data, wrote the paper.
Corresponding author
Ethics declarations
Competing Interests
The authors declare no competing interests.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Wally, N., Schneider, M., Thannesberger, J. et al. Plasmid DNA contaminant in molecular reagents. Sci Rep 9, 1652 (2019). https://doi.org/10.1038/s41598-019-38733-1
DOI: https://doi.org/10.1038/s41598-019-38733-1