Metagenomic survey of the taxonomic and functional microbial communities of seawater and sea ice from the Canadian Arctic

Climate change has accelerated the decline of Arctic sea ice since 2001, resulting in increased primary production and a longer ice-free season within the Northwest Passage. The taxonomic and functional microbial community composition of the seawater and sea ice of the Canadian Arctic remains poorly characterized. Bacterial communities from the bottom layer of sea ice cores and surface water from 23 locations around Cornwallis Island, NU, Canada, were extensively screened. The bacterial 16S rRNA gene was sequenced for all samples while shotgun metagenomics was performed on selected samples. Bacterial community composition showed large variation throughout the sampling area both for sea ice and seawater. Seawater and sea ice samples harbored significantly distinct microbial communities, both at different taxonomic levels and at the functional level. A key difference between the two sample types was the dominance of algae in sea ice samples, as shown by the higher relative abundance of algae and photosynthesis-related genes in the metagenomic datasets and the higher chl a concentrations. The relative abundances of various OTUs and functional genes were significantly correlated with multiple environmental parameters, highlighting many potential environmental drivers and ecological strategies.

Scientific Reports | 7:42242 | DOI: 10.1038/srep42242

The objectives of the present survey were to 1) characterize bacterial community structures in seawater and the immediate bottom layer of sea ice cores (referred to as sea ice) recovered near Cornwallis Island in the Northwest Passage, and 2) relate the relative abundance of key taxa and functional genes to environmental parameters. In order to attain these goals, the 16S rRNA gene was amplified and sequenced from sea ice and seawater samples taken at 23 locations around Cornwallis Island in the Canadian high Arctic, while 16 selected samples (6 sea ice and 10 seawater samples) were also subjected to shotgun metagenomic sequencing.

Materials and Methods
Sample collection. Surface water and bottom-ice samples were collected at 23 stations surrounding Cornwallis Island, in Lancaster Sound, Wellington Channel and adjacent channels, between 74.10° and 75.93° N and 92.64° and 101.34° W (Fig. 1). The sampling was carried out between May 4 and 18, 2011, which corresponded to the period of the ice algal bloom. At each station, seven or eight ice cores were collected using a 9 cm diameter manually-driven ice corer (Mark II coring system, Kovacs Enterprises, Lebanon, NH, USA). The bottom 3 cm section of each core was cut using a clean stainless steel saw and brought back to the shore laboratory for analysis (core volume ≈ 190 ml). Three cores, used for biomass measurements and DNA analyses, were pooled together and melted with the addition of 0.2 μm sterile-filtered surface seawater collected at the time of sampling (referred to as "sea ice" samples). This procedure was performed to dilute the thick microbial mats found at the bottom of the cores and maintain osmotic pressure in the samples to be filtered for DNA extraction. The three other cores, used for chemical analyses, were placed in a sterile bag and melted without the addition of filtered seawater. Sterile gloves were always worn when manipulating the cores. All cores were slowly melted at 4 °C in the dark. Surface water (the layer of water directly under the ice) was collected using a submersible pump mounted on an articulated arm deployed at the sea ice-seawater interface (referred to as "seawater" samples). At the time of sampling, snow thickness and ice thickness were measured at 5 sites near where the cores were collected.
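The stated core volume can be checked by treating the bottom section as an ideal cylinder, 9 cm in diameter and 3 cm tall. The sketch below is only an arithmetic check under that idealized geometry; the function name is hypothetical and Python is used here purely for illustration.

```python
import math

def core_section_volume_ml(diameter_cm: float, height_cm: float) -> float:
    """Volume of a cylindrical core section in ml (1 cm^3 = 1 ml)."""
    radius_cm = diameter_cm / 2.0
    return math.pi * radius_cm ** 2 * height_cm

# Dimensions taken from the text: 9 cm corer, bottom 3 cm of each core.
volume = core_section_volume_ml(9.0, 3.0)
print(f"{volume:.1f} ml")  # 190.9 ml
```

Under this assumption the result agrees with the approximate value of 190 ml given for a single core section.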
Sample processing and analyses. Sea ice and seawater samples were analyzed for chlorophyll a (chl a), bacterial abundance, particulate organic carbon (POC), dissolved organic carbon (DOC), dissolved nitrogen (DN), macro-nutrients (NO3 + NO2, PO4, Si(OH)4) and salinity. Chl a fluorescence was read on a 10AU Turner Design fluorometer calibrated using pure chl a extract (Sigma-Aldrich, Oakville, ON, Canada) and concentrations were calculated according to Parsons et al. 14. Bacterial abundance was measured by flow cytometry as detailed in Belzile et al. 15. POC was measured on a Carlo Erba NC2500 elemental analyzer (Carlo Erba Reagents, Val de Reuil, France). Dissolved organic carbon and nitrogen were measured using a Shimadzu TOC-VCHP analyzer with a TNM-1 Total Nitrogen module (Shimadzu Corp., Kyoto, Japan). Analyses were systematically checked against deep Sargasso Sea reference water from Hansell's Certified Reference Materials (University of Miami, Miami, FL, USA). Samples for nutrient analysis were frozen at −80 °C and later analyzed on a SmartChem 200 chemistry analyzer (Unity Scientific, Brookfield, CT, USA). Additional details on the methods can be found in Michel and Niemi 16.
DNA extraction. DNA extractions were performed on duplicate samples of the melted sea ice (50-100 ml) or seawater samples (1000 ml) filtered on 0.22 μm nitrocellulose filters (Millipore, Billerica, MA, USA). The filters were immediately frozen and stored at −80 °C until DNA and RNA extraction. The PowerWater RNA Isolation Kit (MoBio Laboratories, Carlsbad, CA, USA) was used for seawater nucleic acid extraction of the filters according to the manufacturer's protocol, modified for the presence of lysis-resistant organisms and the omission of DNase treatment for total nucleic acid extraction. The protocol for DNA extraction of the sea ice filters was based on a modified CTAB (cetyltrimethylammonium bromide) method of Ausubel et al. 17 to address the large amounts of exopolymeric substances found on the sea ice filters. Volumes of reagents were tripled in comparison to the original protocol to ensure full submersion of filters. DNA was quantified by the Picogreen method (Life Technologies, Burlington, ON, Canada) on a Tecan Magellan Fluorimeter (Tecan Group Ltd., Männedorf, Switzerland). DNA extracts were stored at −80 °C until further analysis.
Amplification and sequencing of the bacterial 16S rRNA gene. Libraries for sequencing were prepared according to Illumina's "16S Metagenomic Sequencing Library Preparation" guide (Part # 15044223 Rev. B), with the exception of using Qiagen HotStar MasterMix for the first PCR ("amplicon PCR") and halving reagent volumes for the second PCR ("index PCR"). The template-specific primers (without the overhang adapter sequence) were 515f (5′-GTGCCAGCMGCCGCGGTAA-3′) and 806R (5′-GGACTACHVGGGTWTCTAAT-3′). The first PCR was carried out for 25 cycles with an annealing temperature of 55 °C. Diluted pooled samples were loaded on an Illumina MiSeq and sequenced using a 500-cycle MiSeq Reagent Kit v3.
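Both primers contain degenerate positions (M, H, V and W), so each primer name stands for a small pool of oligonucleotides. The following sketch, which is an illustration and not part of the published workflow, expands the two sequences quoted above using the standard IUPAC ambiguity codes; the function and variable names are hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand_primer(primer: str) -> list[str]:
    """Return every unambiguous sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]

primer_515f = "GTGCCAGCMGCCGCGGTAA"
primer_806r = "GGACTACHVGGGTWTCTAAT"

print(len(expand_primer(primer_515f)))  # 2  (one M position: A or C)
print(len(expand_primer(primer_806r)))  # 18 (H x V x W = 3 x 3 x 2)
```

The same expansion can be used when checking in silico whether a reference 16S rRNA sequence is matched by at least one variant of each primer.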
Metagenomic sequencing. Following UPGMA cluster analysis of the samples based on the 16S rRNA gene dataset (not shown), representative samples of major clusters were selected for metagenomic sequencing. Each DNA library was prepared for sequencing using the Ion Xpress Plus Fragment Library Kit (Life Technologies) with the Ion Xpress Barcode Adapters 1-16 (Life Technologies), using the Ion Shear Plus Reagents. Size selection was performed using a Pippin Prep instrument (SAGE Science, Beverly, MA) set to "tight" collection with a base pair target setting of 315 to collect fragments with a median of 200-300 bp as recommended in the "Ion Xpress Plus gDNA Fragment Library Preparation" manual (Life Technologies publication number 4471989, revision L). Barcoded libraries were pooled in an equimolar ratio in groups of three. A total of 3.50 × 10⁷ molecules were used in an emulsion PCR using the Ion Xpress 200 Template Kit (Life Technologies) as described in Sanschagrin and Yergeau 18. Sequencing of the pooled libraries was performed using the PGM system with the Ion Sequencing 200 kit and 316 chips.
16S rRNA gene sequence data analysis. Sequences were analyzed through our internal rRNA short amplicon analysis pipeline as previously described 19,20. Briefly, reads were filtered, overlapping paired-end reads were assembled, and the assembled sequences were clustered at 97% identity. Taxonomy was then assigned to each cluster based on the Greengenes taxonomy (Greengenes v13_5), and OTU tables were generated, filtered to exclude eukaryotes and chloroplasts, and normalized as described in McMurdie and Holmes 21. This normalized OTU table was used for downstream analysis and for computing alpha and beta diversity metrics. Sequencing statistics are presented in Supplementary Table S1.
Shotgun metagenomic sequence data analysis. Shotgun metagenomic sequences were submitted to MG-RAST 3.0 22, where they were de-replicated using the method of Gomez-Alvarez et al. 23 and trimmed using the dynamic trimming method of Cox et al. 24 in a way that each individual sequence would contain a maximum of 5 bases below a Phred score of 15. Within MG-RAST, significant matches were defined as having 60% sequence identity over at least 15 amino acids or 50 bp and with an e-value below 10⁻⁵. MG-RAST classifies sequences using the "subsystems technology", in which each sequence is assigned to a manually curated subsystem 25. The subsystems are grouped in categories in a hierarchical fashion, ranging from "Functions" (most detailed category, e.g. "Photosystem II CP47 protein (PsbB)") to "Level 1" (least detailed category, e.g. "Photosynthesis"), with intermediate categories "Level 2" (e.g. "Electron transport and photophosphorylation") and "Level 3" (e.g. "Photosystem II").
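The two thresholds described above can be illustrated with a short sketch. It is a simplified stand-in and not MG-RAST's own code: the trimming function keeps the longest read prefix containing at most 5 bases below Phred 15 (the published dynamic trimming method may select the retained segment differently), and the filter applies the stated identity, length and e-value cut-offs. All function and parameter names are hypothetical.

```python
def trim_read(qualities: list[int], min_phred: int = 15, max_low: int = 5) -> int:
    """Length of the longest prefix with at most `max_low` bases whose
    Phred score is below `min_phred` (prefix-based simplification)."""
    low = 0
    for i, q in enumerate(qualities):
        if q < min_phred:
            low += 1
            if low > max_low:
                return i  # cut just before the base that exceeds the limit
    return len(qualities)

def is_significant(identity_pct: float, aln_len: int, evalue: float,
                   protein: bool = True) -> bool:
    """Apply the cut-offs quoted in the text: >= 60% identity over at least
    15 amino acids (or 50 bp for nucleotide matches), e-value below 1e-5."""
    min_len = 15 if protein else 50
    return identity_pct >= 60.0 and aln_len >= min_len and evalue < 1e-5

# Made-up quality string: 10 good bases, 6 poor bases, 4 good bases.
quals = [30] * 10 + [10] * 6 + [30] * 4
print(trim_read(quals))                 # 15: the sixth poor base is cut
print(is_significant(75.0, 20, 1e-8))   # True
print(is_significant(59.9, 20, 1e-8))   # False: identity too low
```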

Statistical analyses. All statistical analyses were carried out in R (The R Foundation for Statistical Computing, Vienna, Austria). Paired and non-paired t-tests were performed using the "t.test" function (or the non-parametric "wilcox.test" if needed), Spearman rank-order correlations (rₛ) using the "cor.test" function, dissimilarity calculations using the "vegdist" function of the vegan library, principal coordinate analyses (PCoA) using the "cmdscale" function and Permanova using the "adonis" function of the vegan library. Unweighted Pair Group Method with Arithmetic Mean (UPGMA) clustering was carried out based on the UniFrac distance matrices using the "agnes" function of the cluster library.
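As an illustration of the rank-order correlation used here, the sketch below computes Spearman's coefficient as the Pearson correlation of average ranks, which handles ties. It is a plain-Python stand-in for the R call named above, it does not compute a P-value, and the salinity and OTU values are invented for the example.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data: salinity and relative abundance of one OTU at 5 stations.
salinity = [28.1, 30.5, 31.2, 32.0, 33.4]
otu_abundance = [0.02, 0.05, 0.04, 0.09, 0.12]
print(round(spearman(salinity, otu_abundance), 3))  # 0.9
```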

Results
Biochemical characteristics. A summary of the biochemical characteristics of sea ice and seawater is presented in Table 1, showing significantly higher concentrations of chl a, particulate organic carbon, dissolved organic carbon and nitrogen, bacterial abundances, and nutrient (N and P) concentrations in the sea ice compared to seawater (t-tests, p < 0.05). In contrast, Si(OH)4 and salinity were significantly higher in the seawater as compared to the sea ice (t-tests, p < 0.001).
Taxonomic community profile. Not surprisingly, when plotting principal coordinates calculated from UniFrac distances (based on 16S rRNA gene sequences) a significant difference (Permanova tests: F = 146.66, P < 0.001) was observed between the microbial communities in seawater and sea ice samples (Fig. 2a). The microbial communities associated with the sea ice were more spatially variable than the microbial communities in seawater (average weighted UniFrac distance between samples of 0.253 for sea ice vs. 0.233 for seawater) (Fig. 2a). Seawater samples were significantly richer (Chao 1 richness: 128.1 for sea ice and 322.1 for seawater, t = 17.4, P < 0.001) and more diverse than sea ice samples (Inverse Simpson diversity index: 0.869 for sea ice and 0.963 for seawater; t = 7.7, P < 0.001).
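For readers unfamiliar with the two alpha diversity metrics, the sketch below shows how they can be computed from a vector of OTU counts. The text does not state which Chao1 variant was used, so the bias-corrected form here is an assumption, and the counts are invented. Two common Simpson-based indices are returned: 1 − D, which lies between 0 and 1, and 1/D, which is always at least 1; values below 1, such as those quoted above, fall in the range of the 1 − D form.

```python
def chao1(counts):
    """Bias-corrected Chao1 richness estimate from OTU counts.
    S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)), where F1 and F2 are the
    numbers of singleton and doubleton OTUs."""
    counts = [c for c in counts if c > 0]
    s_obs = len(counts)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return s_obs + f1 * (f1 - 1) / (2.0 * (f2 + 1))

def simpson_indices(counts):
    """Return (1 - D, 1 / D), where D is the sum of squared proportions."""
    total = sum(counts)
    d = sum((c / total) ** 2 for c in counts if c > 0)
    return 1.0 - d, 1.0 / d

# Hypothetical OTU counts for one sample: 9 observed OTUs,
# 3 singletons and 2 doubletons.
sample = [50, 30, 10, 5, 2, 2, 1, 1, 1]
print(chao1(sample))  # 10.0
gini_simpson, inverse_simpson = simpson_indices(sample)
print(0.0 <= gini_simpson < 1.0, inverse_simpson >= 1.0)  # True True
```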
Similar differences between the sea ice and seawater samples were visible in the metagenomic dataset, both for the species-level taxonomic table (Fig. 2b) and the MG-RAST "Function" level subsystem table (Fig. 2c). Although the differences between sea ice and seawater samples were visually less clear for the function-level ordination (Fig. 2c), Permanova tests revealed that there were significant differences between seawater and sea ice samples for both the taxonomic and function tables (F = 11.49, P < 0.001 and F = 3.55, P < 0.001, respectively). The seawater samples showed less variability than the sea ice for both the taxonomic and function tables (average Bray-Curtis distance between samples of 0.491 and 0.700 for sea ice vs. 0.307 and 0.460 for seawater, respectively) (Fig. 2b,c).
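The within-group variability quoted above is an average of pairwise Bray-Curtis distances. A minimal sketch of that calculation follows; it uses invented abundance vectors, is written in plain Python and not with the R functions used in the study, and its function names are hypothetical.

```python
from itertools import combinations

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors:
    sum(|a_i - b_i|) / (sum(a) + sum(b))."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(a) + sum(b)
    return numerator / denominator

def mean_pairwise_distance(samples):
    """Average Bray-Curtis distance over all unordered sample pairs."""
    pairs = list(combinations(samples, 2))
    return sum(bray_curtis(a, b) for a, b in pairs) / len(pairs)

# Hypothetical abundance tables (rows = samples, columns = taxa).
variable_group = [[10, 0, 5], [5, 5, 5], [0, 12, 3]]
uniform_group = [[6, 5, 4], [5, 5, 5], [5, 6, 4]]

# The more heterogeneous group has the larger mean pairwise distance.
print(mean_pairwise_distance(variable_group) > mean_pairwise_distance(uniform_group))  # True
```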
In the 16S rRNA gene datasets, the sea ice was dominated by OTUs related to Alteromonadaceae, Polaribacter, and Colwelliaceae (Fig. 4a), while the seawater was dominated by OTUs related to Nitrosopumilus, Flavobacteriales and Oceanospirillaceae (Fig. 4b). The 15 most abundant OTUs depicted in Fig. 4 accounted in most cases for more than 80% of the OTUs for the sea ice, but less than 60% of the OTUs for the seawater (Fig. 4), in line with the higher diversity observed in seawater.
The metagenomic datasets were also compared at a finer level (MG-RAST "Level 2" and "Level 3"). There were 160 "Level 2" categories and 988 "Level 3" categories; therefore, only categories having a direct link to our data are presented in Fig. 6. The relative abundance of sequences related to "Coenzyme B12 biosynthesis" was significantly higher in the seawater as compared to the sea ice (t = 3.61, P = 0.00295), while the sequences related to both photosystem I and photosystem II had significantly higher relative abundance in sea ice as compared to seawater (Wilcoxon W = 60 and P = 0.000250 for both) (Fig. 6). The relative abundance of genes related to "Proteorhodopsin", "Ammonia assimilation" and "Nitrate and nitrite ammonification" was variable between sampling sites and was not significantly different between seawater and sea ice (Fig. 6).
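The Wilcoxon statistics reported for the photosystem genes are what one would expect if every sea ice metagenome ranked above every seawater metagenome. The sketch below checks this arithmetic, assuming that all 6 sea ice and 10 seawater metagenomes entered the test, that there were no ties, and that an exact two-sided test was used; none of these details is stated explicitly, and the function name is hypothetical.

```python
from math import comb

def exact_p_complete_separation(n1: int, n2: int) -> float:
    """Two-sided exact rank-sum P-value when all values in one group
    exceed all values in the other (no ties): 2 / C(n1 + n2, n1)."""
    return 2.0 / comb(n1 + n2, n1)

n_ice, n_water = 6, 10       # metagenome counts given in the study design
w_max = n_ice * n_water      # largest possible W statistic
p = exact_p_complete_separation(n_ice, n_water)

print(w_max)         # 60
print(f"{p:.6f}")    # 0.000250
```

Under these assumptions both numbers match the reported W = 60 and P = 0.000250, and the same value for photosystem I and photosystem II follows directly, since complete separation yields one fixed P-value for a given pair of group sizes.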

Environmental drivers. Correlation and canonical correspondence analyses (CCA) were used to identify linkages between OTUs, functional genes and measured environmental variables/constituents (described in Table 1). Given the expected significant differences between sea ice and seawater constituents, the two sample types were treated separately during the analyses. The effect of environmental variables on the microbial community structure was tested by CCA with forward selection of variables to be included in the model. For sea ice 16S rRNA gene datasets, salinity, total bacteria abundance, depth and POC were selected in the model. For the seawater 16S rRNA gene datasets, total N and Si(OH)4 were selected. For the MG-RAST "Function" dataset, NO2 + NO3 was selected for the sea ice, while no variable was significant for the seawater. Many OTUs showed significant correlations with environmental variables. In the sea ice 16S rRNA gene datasets, 461 Spearman correlations were significant (P < 0.05), with 387 being negative and 74 being positive. Salinity was the factor most often positively correlated with OTUs (15 times), while POC was the factor most often negatively correlated with OTUs (71 times). In the seawater 16S rRNA gene dataset, 392 Spearman correlations were significant (175 positive, 217 negative). Si(OH)4

Discussion
In the present study, we compared seawater and the immediately overlying sea ice at 23 locations around Cornwallis Island in the Canadian high Arctic. It is one of the first studies to simultaneously examine linked seawater and sea ice samples using shotgun metagenomics and 16S rRNA gene sequencing and to relate these genomic data to potential environmental drivers. As expected, seawater and sea ice exhibited large differences in terms of chemistry, community composition and functional gene content. A key difference between the two sample types was the dominance of algae in sea ice samples, as shown by the higher relative abundance of algae and photosynthesis-related genes in the shotgun metagenomic datasets and the high sea ice chl a concentrations. Metagenomic datasets from this study showed that these algae were mainly diatoms, confirming visual observations during sampling and corresponding with ice algal taxonomic studies near the study area 26. The high biomass of primary producers in the sea ice is likely a key driver of the observed differences in microbial communities and nutrient availability 27. Indeed, primary producers, such as algae, are responsible for the increased availability of labile carbon 28, benefiting heterotrophs, such as Gammaproteobacteria, and this taxon was significantly more abundant in sea ice as compared to seawater. A symbiotic relationship between algae and bacteria through the production of vitamin B12 has been proposed 29, and exopolymeric substances exuded by algae 30,31 act as osmo- and cryoprotectants, probably allowing for the survival of bacteria that are not necessarily cold-adapted 32-34. However, in our metagenomic dataset, the relative abundance of genes related to the biosynthesis of vitamin B12 was significantly higher in the seawater, while the relative abundance of genes related to osmoregulation varied greatly from sample to sample, with no significant differences between sea ice and seawater.
The mean values and ranges of the physico-chemical indicators are similar to values previously reported for surface water in the Northwest Passage 35. Among these indicators, salinity and nutrients were shown to be correlated with the abundance of various microbial taxa, even though variability in environmental parameters was small 35. Interestingly, in the present study, the number of sea ice OTUs and functions that had significant negative correlations with environmental parameters was much higher than the number of OTUs and functions having significant positive correlations. In the seawater datasets, the numbers of positive and negative significant correlations were more balanced, suggesting different ecological strategies or susceptibilities to environmental drivers between the microbial communities inhabiting seawater and overlying sea ice.
Most results showed higher variability in microbial communities in sea ice samples as compared to seawater. Sea ice is highly variable in terms of physicochemical properties, especially within its microenvironments 36, as well as snow cover depth, resulting in an uneven distribution (e.g. abundance and species composition) of algae and other microorganisms at the ice bottom 37,38. Accordingly, we observed variability in physical and chemical characteristics of sea ice samples, which resulted in higher variability in microbial communities when compared to the more homogeneous seawater samples. Still, the water mass in the Canadian Arctic archipelago is quite variable regarding dynamics and nutrient content 39.
Although previous studies used different PCR primers 7,40, sampled multi-year ice as compared to first-year ice here 12,33,41,42 and sampled later in summer as compared to our springtime sampling in the midst of the algal bloom 11, the community composition of the surface water and bottom layer of the sea ice samples of this study was comparable to previous studies of Arctic and Antarctic sea ice and surface water. The Gammaproteobacteria and Bacteroidetes were abundant in sea ice samples, similar to multi-year ice sampled near the geographic North Pole 12 and in first-year ice from a Norwegian fjord study 43. Other taxa that were present, but at lower abundances, were Actinobacteria, Alpha- and Betaproteobacteria, which are also common to other first-year and multi-year sea ice samples 10. As in the sea ice, both Proteobacteria and Bacteroidetes were relatively abundant in seawater samples, similar to previous reports from Antarctic and Arctic seawater 8,12,35,44.
In conclusion, our study has shown that the microbial communities and their associated functional genes present around Cornwallis Island in the Canadian high Arctic are very different between sea ice and seawater, even though they were quite variable between sampling sites. The functional differences observed could be at the root of the different capacities of sea ice and seawater to degrade hydrocarbons, as recently shown using samples from the same area 45.