Assessment of the plasmidome of an extremophilic microbial community from the Diamante Lake, Argentina

Diamante Lake, located at 4589 m.a.s.l. in the Andean Puna, constitutes an extreme environment. It is exposed to multiple extreme conditions such as an unusually high concentration of arsenic (over 300 mg L⁻¹) and low oxygen pressure. Microorganisms thriving in the lake display specific genotypes that facilitate survival, which include a multitude of plasmid-encoded resistance traits. Hence, the genetic information provided by the plasmids contributes essentially to understanding adaptation to different stressors. Though plasmids from cultivable organisms have already been analyzed at the sequence level, the impact of the entire plasmid-borne genetic information on such a microbial ecosystem is not known. This study aims at assessing the plasmidome from Diamante Lake, which facilitates the identification of potential hosts and the prediction of gene functions as well as the ecological impact of mobile genetic elements. The deep-sequencing analysis revealed a large fraction of previously unknown DNA sequences, the majority of which encoded putative proteins of unknown function. Remarkably, functions related to the oxidative stress response, DNA repair, as well as arsenic and antibiotic resistance were annotated. Additionally, all necessary capacities related to plasmid replication, mobilization and maintenance were detected. Sequences characteristic of megaplasmids and other already known plasmid-associated genes were identified as well. The study highlights the potential of the deep-sequencing approach specifically targeting plasmid populations, as it allows the ecological impact of plasmids from cultivable and non-cultivable microorganisms to be evaluated, thereby contributing to the understanding of the distribution of resistance factors within an extremophilic microbial community.

Plasmids are self-replicating, extrachromosomal, mobile genetic elements of ecological importance, as they may confer functions or beneficial traits enabling their hosts to thrive in a given environment and, equally important, they can act as horizontal gene transfer vehicles 1 , thereby contributing to the spread of genetic information within a microbial community. Thus, plasmids act as an important evolutionary driving force by accelerating genome innovation and allowing the acquisition of evolutionary novelty 2 .
Myriad studies have addressed bacterial and archaeal plasmids and have revealed the typical functions ensuring plasmid replication, mobilization and maintenance, as well as other accessory characteristics [3][4][5][6][7][8][9][10][11][12] . Only quite recently, however, was it recognized that characterizing them as a whole is necessary to understand their ecological impact in microbial ecosystems [13][14][15][16][17][18] . In this regard, the total plasmid DNA present in an ecological niche was defined as the "plasmidome" 19 . Indeed, plasmidome studies based on culture-independent methods substantiated the significance of extrachromosomal genetic elements in different environments, such as the bovine rumen and wastewater treatment plants [13][14][15] . We have recently reported the study of the plasmidome of the Puquio de Campo Naranja, an extreme environment of

Materials and methods
Sampling and biomass purification. Samples were aseptically taken from red biofilms attached to gaylussite crystals at the bottom of submerged microbialites in the Diamante Lake, Catamarca, Argentina (26°00′51.04″S, 67°01′46.42″W) in September 2019 (Fig. 1). Microbialites were found at a distance of 2 m from the lake shore. Samples from three randomly chosen sites were taken and pooled to ensure representativeness. Stored in sterile plastic flasks at 4 °C, the pooled samples were processed within a week. Permission for sample collection was granted by the Secretaría de Medio Ambiente, Catamarca, Argentina (No. 22935/2016). Microorganisms were separated from the sample by using the protocol described by Perez et al. 18 . Pellets obtained from biomass purification were kept at −20 °C until plasmid DNA extraction.
DNA extractions. The plasmid DNA was isolated by using the Large-Construct kit as recommended by the manufacturer (Qiagen, Hilden, Germany).
In parallel, metagenomic DNA was extracted from red biofilm samples by using the FastDNA Spin kit for soil as recommended by the manufacturer (MP Biomedicals, CA, USA). The extracted metagenomic DNA served as template for 16S rRNA gene amplicon sequencing.
Sequencing, quality control and assembly. Illumina shotgun paired-end sequencing libraries were generated from isolated plasmid DNA using the Nextera XT sample preparation kit as recommended by the manufacturer (Illumina, CA, USA). The MiSeq system together with the MiSeq reagent kit version 3 (600-cycle) was used for the plasmidome sequencing as recommended by the manufacturer (Illumina). The quality control of raw sequence reads was carried out with FastQC v0.11.9 and the reads were quality-filtered using Trimmomatic v0.38.0 29 . Finally, the reads were de novo assembled using SPAdes v3.9.0 with the --meta parameter to call the metaSPAdes module 30 . The Recycler algorithm was used to assemble cyclic sequences, which are likely plasmids, phages and other circular elements, from the assembly graphs provided by SPAdes 31 . The bioinformatic analysis pipeline described by Kothari et al. 17 was also used to identify complete closed circular contigs. The circular elements obtained in both cases were compared to DoriC 10, a database of replication origins in prokaryotic genomes including chromosomes and plasmids 32 .
Bioinformatic analysis. The reads generated by sequencing of the plasmid DNA were aligned with the metagenome contigs using the Bowtie2 tool 33 . Metagenome contigs were assembled using SPAdes v3.9.0 from sequencing data of three independent red biofilm samples taken on another occasion and published by Saona et al. 26 .
Annotation and labeling of all the relevant genomic characteristics on plasmidome contigs were done with Prokka v1.14.5 34 . The assembled plasmidome dataset was submitted to the MG-RAST server 35 for functional and taxonomic analysis. Comparisons with the SEED subsystem database were performed using a maximum E-value of 10⁻⁵ . The deduced functional profile of the red biofilm plasmidome was compared with the one derived from the metagenome mentioned above 26 by employing the software STAMP (Statistical Analysis of Metagenomic Profiles) 36 . Comparisons with other plasmidomes were also performed 15,18 .
Both the known plasmid sequences from the NCBI database and the domains related to plasmid replication and mobilization were assessed as described previously by Perez et al. 18 . In addition, the plasmidome contigs were compared to the TADB 2.0 database by blastn in order to identify toxin-antitoxin (TA) systems 37 . Furthermore, the Prokka annotation file of the plasmidome was subjected to Conditional Reciprocal Best BLAST (crb-blast) against plasmid gene sequences from the ACLAME database with an E-value ≤ 10⁻³ 38 . Hits with an identity ≥ 70% and an alignment coverage ≥ 90% were selected. Similarly, putative genes encoding metal resistance and virulence factors were searched using the BacMet 39 and VFDB 40 databases, respectively.
Due to the high arsenic concentration found in the lake 22 , arsenic resistance-related genes were annotated separately. For this purpose, the amino acid sequences were downloaded from UniProt and subjected to Position-Specific Iterated BLAST (PSI-BLAST) 41 . CD-HIT v4.8.1 was used for creating non-redundant datasets 42 and Clustal Omega v1.2.4 for sequence alignments 43 . Profile hidden Markov models were built and searched against the translated plasmidome gene sequences identified with Prokka by using HMMER 3.3 (cut-off E-value < 10⁻³ ) 44 .
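The hit selection described above (identity, alignment coverage and E-value cut-offs) can be illustrated with a short filter. This is a minimal sketch rather than the authors' script: the six-column tab-separated layout, the function name `filter_hits` and the example rows are hypothetical, coverage is assumed to be measured relative to the query length, and the E-value cut-off is assumed to be 1e-3.

```python
import csv
import io

# Cut-offs quoted in the text; the E-value cut-off is assumed to be 1e-3.
MAX_EVALUE = 1e-3
MIN_IDENTITY = 70.0   # percent identity
MIN_COVERAGE = 90.0   # percent of the query length covered by the alignment


def filter_hits(tsv_text):
    """Return hits that pass the identity, coverage and E-value cut-offs.

    Hypothetical tab-separated columns:
    query, target, identity_percent, alignment_length, query_length, evalue
    """
    kept = []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comments
        query, target = row[0], row[1]
        identity = float(row[2])
        coverage = 100.0 * int(row[3]) / int(row[4])
        evalue = float(row[5])
        if identity >= MIN_IDENTITY and coverage >= MIN_COVERAGE and evalue <= MAX_EVALUE:
            kept.append((query, target, identity, round(coverage, 1), evalue))
    return kept


# Invented example rows, one per failure mode.
EXAMPLE = "\n".join([
    "gene_1\tplasmid_gene_A\t85.0\t290\t300\t1e-50",  # passes all three cut-offs
    "gene_2\tplasmid_gene_B\t65.0\t300\t300\t1e-40",  # identity too low
    "gene_3\tplasmid_gene_C\t92.0\t150\t300\t1e-20",  # coverage too low
    "gene_4\tplasmid_gene_D\t75.0\t280\t300\t0.5",    # E-value too high
])

if __name__ == "__main__":
    for hit in filter_hits(EXAMPLE):
        print(hit)
```

Applying the three cut-offs jointly, as here, is the stricter reading of the text; whether coverage refers to the query or the target sequence is not stated in the source.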
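The final step of this workflow, keeping only profile matches below the E-value cut-off, can be sketched as follows. The code assumes the whitespace-delimited per-target table that HMMER 3 writes with `--tblout`, in which the first, third and fifth columns are the target name, the query (profile) name and the full-sequence E-value. The function name and the example rows are hypothetical and do not come from the study.

```python
def parse_tblout(lines, max_evalue=1e-3):
    """Collect (target, profile, evalue) for hits with E-value < max_evalue.

    Only the first five whitespace-separated columns are read:
    target name, accession, query name, accession, full-sequence E-value.
    """
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the commented header/footer
        fields = line.split()
        target, profile, evalue = fields[0], fields[2], float(fields[4])
        if evalue < max_evalue:
            hits.append((target, profile, evalue))
    return hits


# Invented rows in the assumed layout (real tables carry further columns).
EXAMPLE_TBLOUT = [
    "# target name  accession  query name  accession  E-value  score  bias",
    "PLASMID_00012  -  arsC_profile  -  3.2e-41  140.1  0.0",
    "PLASMID_00345  -  acr3_profile  -  5.0e-04   18.7  0.2",
    "PLASMID_00900  -  arsA_profile  -  2.0e-02    9.3  0.1",
]

if __name__ == "__main__":
    for target, profile, evalue in parse_tblout(EXAMPLE_TBLOUT):
        print(target, profile, evalue)
```

A permissive cut-off such as this one favors sensitivity, which fits a search for divergent archaeal homologs, at the cost of requiring manual inspection of weak hits.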
The Resistance Gene Identifier (RGI) software was employed for prediction of antibiotic-resistance genes using the Comprehensive Antibiotic Resistance database (CARD) 45 as a reference.
The ISEScan software pipeline was used to search for mobile elements such as insertion sequences 46 , and the HMM profiles downloaded from the TnpPred website 47 were used with HMMER 3.3 to predict prokaryotic transposases.
Amplicon sequencing and taxonomic analysis. 16S rRNA gene amplicon sequencing was performed using the above-described primers partially covering the 16S rRNA gene sequence. The MiSeq system together with the MiSeq reagent kit version 3 (600-cycle) was used for sequencing of the amplicons as recommended by the manufacturer (Illumina). Data quality control and analysis were performed using the QIIME software 48 . First, paired-end reads were joined with PEAR v0.9.6 49 . Quality filtering was performed using the split_libraries_fastq.py script. Forward and reverse primers were removed using cutadapt v1.16 50 . USEARCH v11 51 was used for zero-radius operational taxonomic unit (zOTU) determination. Taxonomy was assigned against the SILVA 132 database 52 .

Results and discussion
Sequencing and assembly output. Illumina sequencing of the Diamante Lake plasmidome generated 1,071,941 paired-end reads, of which 941,587 passed quality filtering. The SPAdes assembler produced 13,492 contigs (> 500 bp) corresponding to roughly 16.9 Mb (largest contig 20,415 bp) (Supplementary Table S1). This dataset is smaller than the one previously reported for a similar extremophile community from Puquio de Campo Naranja (135,813 contigs, 127.9 Mb) 18 . Thirty-nine closed replicons were predicted by the Recycler software (> 1000 bp), the largest consisting of 3313 bp, but none displayed a known plasmid origin of replication when compared to the DoriC 10.0 database. The bioinformatic pipeline used to detect circularity produced 20 circular contigs. The largest comprised 8295 bp and the smallest 2025 bp. Only one of them showed a known plasmid origin of replication (87% similarity, alignment length of 70 nt), corresponding to pSN found in Haloterrigena thermotolerans strain H13 (DoriC ID: pORI00000477). Only hypothetical proteins could be annotated from the open reading frames present in the circular elements.
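Summary figures of this kind (number of contigs above a length cut-off, total assembled bases, largest contig) can be recomputed from any assembly FASTA file. The following is a minimal sketch, not the authors' pipeline: the function names and the toy sequences are hypothetical, and the 500 bp cut-off is applied as a strict "greater than", following the "> 500 bp" wording in the text.

```python
def read_fasta_lengths(lines):
    """Yield (name, length) for each record in an iterable of FASTA lines."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            header = line[1:].split()
            name, length = (header[0] if header else ""), 0
        elif line and name is not None:
            length += len(line)  # sequences may span several lines
    if name is not None:
        yield name, length


def assembly_summary(lines, min_length=500):
    """Count contigs longer than min_length and report total and largest size."""
    lengths = [n for _, n in read_fasta_lengths(lines) if n > min_length]
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "largest_bp": max(lengths, default=0),
    }


# Toy assembly: contigs of 600, 400 and 1200 bp (the last spans two lines).
TOY_FASTA = [">c1", "A" * 600, ">c2", "C" * 400, ">c3", "G" * 700, "T" * 500]

if __name__ == "__main__":
    print(assembly_summary(TOY_FASTA))
```

For a real file, the same function can be called on an open file handle, for example `assembly_summary(open("contigs.fasta"))`, where the file name is only a placeholder.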
Functional analysis. MG-RAST analysis revealed that the Diamante Lake plasmidome contains a large fraction of unknown DNA, as 18,314 sequences (41.5%) code for predicted proteins with known functions but 25,800 sequences (58.5%) encode putative proteins of unknown function. Thus, such deep-sequencing approaches are suited to detecting novel proteins with so far undiscovered functions. In our previous plasmidome analysis from an AME, 39% of the predicted proteins could not be functionally annotated 18 . The difference between the two environments, which display rather similar environmental characteristics, is possibly due to the large proportion of archaea present in the Diamante Lake, as archaeal genomes typically contain a higher fraction of "dark matter" than bacterial genomes. The isolation and cultivation of most archaea, and accordingly the experimental characterization of archaeal gene products, is challenging 53 . Likewise, Sentchilo et al. 15 reported 52 and 66% of coding sequences without assigned function in two wastewater treatment plant plasmidomes.
From the functional SEED assignment, only 5196 predicted proteins were annotated (28.4%), most of them covering basic metabolic functions such as DNA, RNA and protein metabolism (Fig. 2A). It is noteworthy that among the proteins involved in DNA metabolism those related to DNA repair were rather diverse (Fig. 2B). It is well known that DNA repair plays a key role as an adaptive mechanism to withstand the high UV irradiation in the Andean Puna [54][55][56][57][58] .
Although on a smaller scale, the subsystems "Stress Response" and "Virulence, Disease and Defense" were also represented (Fig. 2A). With respect to stress response, predicted proteins involved in the response to oxidative stress were most frequently found (57.4%) (Supplementary Fig. S1). As with the DNA repair mechanisms, oxidative stress response systems contribute to protecting microorganisms from UV-mediated damage. Of the assignments to the second group mentioned above, 82.4% corresponded to the subsystem "Resistance to antibiotics and toxic compounds", with arsenic resistance systems prevailing (67.4%). The arsenic resistance included predicted proteins for an arsenate reductase (ArsC), an arsenical pump-driving ATPase (ArsA), an arsenical resistance operon trans-acting repressor (ArsD) and an arsenical-resistance protein ACR3 (Fig. 3).
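The percentages quoted in this and the preceding paragraph can be checked with simple arithmetic. The sketch below uses only counts stated in the text; the variable and function names are hypothetical. That the 28.4% figure is relative to the 18,314 proteins with known functions (rather than to all predicted proteins) is an inference from the numbers, not a statement made in the source.

```python
# Counts taken from the text.
known_function = 18_314      # predicted proteins with known functions
unknown_function = 25_800    # putative proteins of unknown function
seed_annotated = 5_196       # proteins with a SEED subsystem assignment

total_predicted = known_function + unknown_function


def percent(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


pct_known = percent(known_function, total_predicted)
pct_unknown = percent(unknown_function, total_predicted)
# Assumed denominator: proteins with known functions (see lead-in).
pct_seed_of_known = percent(seed_annotated, known_function)

if __name__ == "__main__":
    print(total_predicted)    # 44114
    print(pct_known)          # 41.5
    print(pct_unknown)        # 58.5
    print(pct_seed_of_known)  # 28.4
```

Relative to all 44,114 predicted proteins, the SEED-annotated fraction would be smaller still, in line with the remark in the conclusions that the percentages of SEED assignments were even lower.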
The unique characteristics of this environment, such as its location at a high altitude, the exposure to extreme conditions and the peculiarities of its microbial composition, which includes a major proportion of archaea, may explain the relatively few functional annotations during grouping into SEED categories.
Functional comparison between plasmidomes. The predicted functional profile of the Diamante Lake plasmidome was compared to the one derived from the Puquio de Campo Naranja plasmidome 18 . The "RNA Metabolism", "DNA Metabolism", "Phages, Prophages, Transposable elements, Plasmids" and "Cell Division and Cell Cycle" subsystems were more frequently represented in Diamante Lake than in Puquio de Campo Naranja, while "Carbohydrates", "Cell Wall and Capsule", "Clustering-based subsystems", "Stress Response" and "Respiration" were more abundant in Puquio de Campo Naranja. No significant differences in the abundances of other subsystems were observed, suggesting a certain degree of similarity of the predicted functional profiles for both of the AMEs (Fig. 4). Diamante Lake is known to be among the aqueous environments displaying high arsenic concentrations 22,25,59 , which is consistent with the repeatedly noted presence of arsenic resistance genes. Proportional differences between "Resistance to antibiotics and toxic compounds" categorizable protein coding genes belonging to the "Virulence, Disease and Defense" SEED subsystem showed that arsenic resistance is more abundant in the Diamante Lake plasmidome than in that of the Puquio de Campo Naranja (Fig. 5). In addition, the comparison of the former with the wastewater treatment plant plasmidome 15 revealed that arsenic resistance traits were again more abundant in the Diamante Lake plasmidome (Fig. 5).

Figure 4. Predicted functional profiles derived from the Diamante Lake plasmidome (red bars) and the Puquio de Campo Naranja plasmidome (orange bars). Percent SEED categorizable protein-encoding genes and pairwise proportional differences calculated using STAMP. Fisher's exact test was used and corrected P-values were calculated using Storey's FDR. Only the statistically significant SEED subsystems are shown (q < 0.05).

Figure 5. Predicted functional profiles derived from the Diamante Lake plasmidome (red bars), the Puquio de Campo Naranja plasmidome (orange bars) and the one derived from the wastewater treatment plant plasmidome in Visp, Switzerland (green bars). Percent "Resistance to antibiotics and toxic compounds" categorizable protein coding genes belonging to the "Virulence, Disease and Defense" SEED subsystem, and pairwise proportional differences calculated using STAMP. Fisher's exact test was used and corrected P-values were calculated using Benjamini-Hochberg's FDR. Only the statistically significant SEED subsystems are shown (q < 0.05).
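The kind of pairwise comparison summarized in these figures (a test on the proportion of genes assigned to each subsystem in two samples, followed by multiple-testing correction) can be sketched in plain Python. This is an illustration of a two-sided Fisher's exact test with Benjamini-Hochberg correction, not a reimplementation of STAMP, and Storey's FDR is not covered. All function names are hypothetical, and the counts in `sample_1` and `sample_2` are invented for demonstration; they are not data from the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def compare_profiles(counts_a, counts_b):
    """Map each subsystem to its corrected p-value for sample A versus B."""
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    names = sorted(set(counts_a) | set(counts_b))
    pvals = []
    for name in names:
        a, c = counts_a.get(name, 0), counts_b.get(name, 0)
        pvals.append(fisher_exact_two_sided(a, total_a - a, c, total_b - c))
    return dict(zip(names, benjamini_hochberg(pvals)))


# Invented counts of genes per category in two hypothetical plasmidomes.
sample_1 = {"Arsenic resistance": 60, "Copper homeostasis": 10, "Beta-lactamase": 19}
sample_2 = {"Arsenic resistance": 20, "Copper homeostasis": 45, "Beta-lactamase": 35}

q_values = compare_profiles(sample_1, sample_2)
significant = [name for name, q in q_values.items() if q < 0.05]

if __name__ == "__main__":
    for name, q in q_values.items():
        print(f"{name}: q = {q:.3g}")
```

Each category is tested against all remaining categorized genes of the same sample, so the result depends on which genes are counted in the denominator, a choice that should be reported alongside the q-values.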
Plasmid-purification advantage. Plasmids usually represent only a small fraction of the total DNA in a given environment, owing to their low rate of occurrence and low copy number. Hence, although they are incidentally recorded by conventional metagenomic sequencing methods, experimental plasmid purification prior to sequencing allows an analysis specifically targeting plasmid populations in a culture-independent manner, at best without losing information 60 . The results obtained are in line with this notion, as only 52% of plasmidome reads aligned with the metagenome contigs from Saona et al. 26 described above. The same applies to the Puquio de Campo Naranja plasmidome, in which alignment reached only 30% 18 . Thus, our study supports the view that plasmid purification prior to sequencing more satisfactorily meets the requirements for comprehensively assessing the ecological importance of plasmid-borne sequences. The pairwise comparison aimed at distinguishing the plasmid gene pool from the metagenomic one (Fig. 6) accordingly revealed that the "Phages, Prophages, Transposable elements, Plasmids" and "Membrane Transport" subsystems are more frequently represented in the plasmidome.
Plasmid backbone functions: replication, mobilization and maintenance. In order to identify plasmid-like traits within the plasmidome, we focused on the search for Pfam domains related to plasmid replication and MOB-type relaxase families, which are related to plasmid mobilization.
RHH_1 and DUF1424 were detected as the main Pfam domains of plasmid replication in the Diamante Lake plasmidome, followed by RepL (Table 1). Likewise, the RHH_1 and RepL protein families were also the most abundant in the plasmidome from Puquio de Campo Naranja 18 . This was not the case for DUF1424, a family of several archaeal proteins that seems to be present exclusively in Halobacterium and Haloferax species. Although the function of the latter family is unknown, its members are probably Rep proteins owing to the presence of conserved functional motifs 61,62 .
Rep_1 and Rep_3 are the major families of replication initiation proteins. They have been reported among the most abundant in plasmidomes from wastewater treatment plants and a rat cecum 15,16,63,64 . In this study, domains belonging to the Rep_1 and Rep_3 families were also detected, but with a lower hit rate. Domains of replication initiation proteins from other known families were not detected (Table 1). Possibly, there are replication systems for which the molecular details and the mechanisms are currently unknown, particularly as most of the contributing microorganisms were not cultured and because of the taxonomic composition dominated by specific taxa, i.e. halobacteria 25 .

Figure 6. Predicted functional profiles derived from the plasmidome (red bars) and the metagenome (blue bars) of Diamante Lake red biofilm. Percent SEED categorizable protein-encoding genes and pairwise proportional differences calculated using STAMP. G-test (w/Yates') was used and corrected P-values were calculated using Benjamini-Hochberg's FDR. Only the statistically significant SEED subsystems are shown (q < 0.05).

The most abundant relaxase families in the plasmidome were MOBC and MOBP, for which 41 and 18 protein domain matches, respectively, were counted. MOBT, MOBV and MOBM were also present (3, 2 and 1 protein domain matches, respectively) (Table 2). Mobilization elements have been reported in most of the previous plasmidome analyses [14][15][16]63 ; however, the classification into relaxase MOB families proposed by Garcillán-Barcia et al. 65,66 was not performed in those studies. Meanwhile, 29 protein domain matches were counted for the MOBT family in the Puquio de Campo Naranja plasmidome 18 , and Kothari et al. 17 reported the MOBQ and MOBP families as the most abundant in circular plasmids from groundwater plasmidomes. In addition to the above plasmid replication and mobilization entries, sequences harboring genes involved in plasmid maintenance, such as loci corresponding to toxin-antitoxin (TA) systems, were identified (identity and coverage at least 85%). All of the TA systems belong to type II TA loci (Supplementary Table S2). Only a single complete system could be annotated, i.e. a toxin with the respective antitoxin (T2787-AT2787), both located in the same contig (NODE_2880). It corresponds to the VapBC family, where the toxin is a PIN-domain ribonuclease (145 aa) and the antitoxin is a transcription factor (98 aa) 67 .
Interestingly, this system is known from Haloquadratum walsbyi DSM 16790, a halophilic archaeon that was isolated from a solar saltern in Brac del Port (Alicante, Spain), and it was found to dominate most of the thalassic NaCl-saturated environments 68 .
In the previous plasmidome studies, TA systems were not taken into consideration. Only Kothari et al. 17 reported the YoeB-YefM and RelE/StbE-RelB/StbD type II TA systems in some circular plasmids from groundwater plasmidomes. YoeB and RelE are ribosome-dependent RNase toxins that bind directly to the A site of the ribosome, where they cleave ribosome-associated mRNA 69 .
Plasmid accessory functions: antibiotic resistance and arsenic resistance. Sequence analysis of the plasmidome from the Puquio de Campo Naranja revealed that antibiotic resistance traits are widespread in this extreme pristine environment, as 123 putative antibiotic resistance genes (ARGs) were annotated 18 . In the present study, only 8 ARGs could be classified, conveying resistance to 10 drug classes, among them macrolides, carbapenems, cephalosporins and penams (Supplementary Table S3). Such a noticeable difference in the number of ARGs found in similar extreme environments is probably due to regional differences in microbial composition. The metabolic processes and the cell walls of bacteria and archaea display significant differences, offering an explanation for the fact that a number of antibiotics are effective against the former but do not threaten the latter 70 . Moreover, because of its clinical relevance, the study of antibiotic-resistance mechanisms is far more pressing in the bacterial domain, as pathogenic archaea have not yet been identified 71 . So far, only a relationship between the severity of periodontal disease and the relative abundance of the archaeon Methanobrevibacter oralis has been reported 72 .
The bias necessarily introduced by the existing databases, which were developed from the information currently available, as well as the dominance of archaea in the microbial community studied, interferes with the analysis of the resistome encoded by the Diamante Lake plasmidome. Thus, it cannot be excluded that the lack of relevant knowledge is the reason for the low number of identifiable ARGs and virulence factors.
Regarding resistance to metals, a search in the BacMet database produced no hits, possibly also for the above reason, as the database consists solely of bacterial entries. However, our manual annotation disclosed the presence of arsenic resistance genes. Arsenic is toxic to a wide range of microorganisms; however, several bacteria and archaea possess detoxification systems enabling growth even under high As concentrations 73 . The most common resistance system is encoded by the ars operon, for which different genetic organizations have been described among prokaryotes 74 . In addition to the extrusion systems, composed of the gene products encoded by arsA, arsB, arsC, arsD, arsR and acr3, another mechanism involving a putative arsenite(III)-methyltransferase (ArsM) was reported in Halobacterium sp. NRC-1 75 . We identified 28 proteins possibly related to arsenic resistance; ten of them had been automatically annotated as "hypothetical proteins" by Prokka (Supplementary Table S4). Hence, automatic annotation carries the risk of less accurate results or of detecting fewer genes than actually exist. It should be emphasized that genes enabling microorganisms to conduct anaerobic arsenate respiration (arr) and arsenite oxidation (aio) were not detected in the plasmidome, as these are usually encoded by the chromosome.
As already mentioned, the genetic organization of the ars operons can vary among diverse microorganisms. In the Diamante Lake plasmidome, arsC (arsenate reductase) and acr3 (arsenite efflux transporter) genes were present twice in close proximity, but the most relevant finding was the arsADR gene cluster of contig 116 (Supplementary Table S4). The genetic arrangement agrees with that described for pHLAC01 and pNRC100 of Halorubrum lacusprofundi ATCC 49239 and Halobacterium sp. NRC-1, respectively (Supplementary Fig. S2). In all cases, the arsDA and arsR genes are transcribed in opposite directions and the absence of the gene encoding the arsenite transporter ArsB is noticeable. This otherwise unusual operon structure is apparently characteristic of the haloarchaea 74,75 .
Plasmid databases: NCBI and ACLAME. Sequences belonging to 24 megaplasmids described in 13 strains of halophilic archaea isolated from different saline environments were found (Table 3). Most of the matches were with the plasmid of Halobacterium sp. DL1 (315 kb), which was isolated from a freshwater pond (NZ_CP007061.1). Fourteen matches were detected between the plasmidome sequences and the sequences of the plasmid pHLAC01 (431 kb) of Halorubrum lacusprofundi ATCC 49239, which was isolated from Deep Lake, a hypersaline Antarctic site. It is noteworthy that one of these matches (identity 92%, length 4123 bp) corresponds to contig 116 of the plasmidome, which harbors the arsDA genes (Supplementary Fig. S3). Thirteen sequences match plasmid pHTIA (330 kb) of Halorhabdus tiamatea SARL4B, which was isolated from the Shaban deep-sea hypersaline anoxic lake in the Red Sea 76 . Thus, such plasmid sequences are evidently preserved in different high-salinity environments.
When the red biofilm plasmidome genes annotated by Prokka were compared with plasmid genes of the ACLAME database, 125 matches corresponded to genes of 11 megaplasmids of five different halophilic archaeal strains and one actinobacterial strain (Rhodococcus sp. RHA1) (Supplementary Table S5). Most of them were related to DNA metabolism, transposition and recombination. Genes related to arsenic resistance, DNA repair and plasmid partitioning were also identified. Thus, a function could be attributed to many, but not all, of the hypothetical proteins identified by Prokka. In any case, the comparison further validated the approach, as it confirmed the presence of known plasmid-associated genes in our plasmidome dataset.

Mobile genetic elements: transposases and insertion sequences.
A total of 532 insertion sequences (IS) were identified in the Diamante Lake plasmidome, with IS200/IS605- and IS5-like elements being the most frequent, followed by members of the IS4, IS6, IS630 and ISH3 families (Supplementary Table S6). The first five are the main IS families spreading in Halobacteria, which is the dominant class of the studied community 77 . Most of the archaeal IS fall into families detected in Bacteria, while others, such as members of the ISH3 family, are restricted to Archaea 78 . Two new potential IS not attributable to any of the known families were classified as well (GenBank accession OK172335 and OK172336). In the Puquio de Campo Naranja plasmidome, a much lower number of IS (28) was reported. Again, most of them were assigned to the IS5, IS630 and IS4 families 18 . The absence of Tn3 family transposases in both plasmidomes is conspicuous, as this family is one of the most abundant in bacterial genomes and Tn3 elements preferentially transpose into plasmids 79 . The presence of so many IS elements is in line with the notion that the plasmidome substantially contributes to genome evolution as well as adaptation processes by facilitating the acquisition of novel genes and beneficial traits 80,81 .
Taxonomic analysis. Although plasmids can be transferred between different microorganisms, the taxonomic assignment of the plasmidome contigs allows an estimation of potential hosts. Eighty-eight percent were assigned to Archaea, while 11% were assigned to Bacteria and the remaining 1% to Eukarya. Among the Archaea, the phylum Euryarchaeota (99.85%) dominates.
When the phylum distribution of the plasmidome was compared with 16S rRNA sequencing data from the corresponding metagenomic DNA sample, the phylum Euryarchaeota again stood out with the highest relative zOTU abundance (Fig. 7A). In both, the class Halobacteria dominated. With respect to the Bacteria, the Proteobacteria (36%), the Firmicutes (22.3%) and the Actinobacteria (10.5%) comprised most of the contigs assigned in the plasmidome. In the metagenome, the Proteobacteria (66.9%) and the Firmicutes (17.3%) were likewise the phyla with the highest relative zOTU abundance; the Bacteroidetes, however, ranked third (10.9%) (Fig. 7B). Notably, plasmid contigs of Actinobacteria, Chloroflexi and Deinococcus-Thermus were obtained, whereas the 16S rRNA analysis did not disclose members of these phyla. A possible explanation might be horizontal transfer of plasmid-borne genes between bacterial phyla, or the existence of plasmids with a wide host range. On the other hand, the 16S rRNA analysis indicates the presence of members of the phylum Halanaerobiaeota, but no plasmidome contig could be assigned to it, which is possibly due to a bias in plasmid databases when a phylum is not well represented or to the absence of plasmids within the taxon.

Conclusions
It is currently possible to study plasmid elements in the course of a conventional metagenomic analysis, but an approach that specifically targets plasmid populations makes it possible to overcome the inherent constraints of the bioinformatic tools applied for the analysis of plasmids from total community DNA. From this perspective, and from the comparison with the metagenome of the same community, this study showed that part of the plasmid information will not be detected when experimental plasmid purification is not carried out prior to sequencing. Furthermore, a large fraction of genes with an unknown function was present in the plasmidome dataset, as at least 58.5% of the predicted proteins were hypothetical. In addition, the percentages of SEED assignments were even lower. The relatively few functional annotations may reflect the peculiarities of the extreme environment, which harbors a microbial community that is dominated by archaea. On the other hand, functions related to the response to oxidative stress and DNA repair were annotated, which agrees with the requirement for adaptive mechanisms enabling the hosts to withstand the exposure to the high UV irradiation in the Andean Puna. Comparison of the Diamante Lake plasmidome to that of Puquio de Campo Naranja revealed a certain degree of similarity between the predicted functional profiles of both AMEs. However, striking differences with respect to antibiotic and arsenic resistance were detected. Sequences pointing to arsenic resistance are more abundant in the Diamante Lake plasmidome, a finding that also holds in comparison with the plasmidome derived from a wastewater treatment plant that contains large quantities of effluents of the chemical/pharmaceutical industry. Our results reflect the high amount of arsenic present in the environment under investigation.
Traits expected to be found in a plasmid pool were detected, such as Pfam domains related to plasmid replication, MOB-type relaxase families related to plasmid mobilization and genes belonging to type II toxin-antitoxin systems related to plasmid maintenance. Moreover, there are sequences known from megaplasmids of halophilic archaea isolated from different saline environments, which provides further evidence for known plasmid-associated genes in the obtained dataset. The results presented here, along with the detection of numerous IS elements, favor the opinion that the plasmidome facilitates the mobility and the transfer of genes within such extreme microbial communities.

Data availability
The sequence data of the Diamante Lake plasmidome and 16S rRNA gene amplicon have been deposited at NCBI (National Center for Biotechnology Information) under the accession numbers SRR13795604, SRR13795605 and SRR13795606.