Spatial variations in the molecular diversity of dissolved organic matter in water moving through a boreal forest in eastern Finland

Dissolved organic matter (DOM) strongly affects water quality within boreal forest ecosystems. However, how the quality of DOM itself changes spatially is not well understood. In this study, to examine how the diversity of DOM molecules varies in water moving through a boreal forest, the number of DOM molecules in different water samples (i.e., rainwater, throughfall, soil water, groundwater, and stream water) was determined using Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS) in Norway spruce and Scots pine stands in eastern Finland during May and June 2010. The number of molecular compounds identified by FT-ICR MS (molecular diversity) ranged from 865 to 2,194, revealing large DOM molecular diversity in the water samples. Additionally, some of the molecular compounds were shared between different water samples. The DOM molecular diversity correlated linearly with the number of low-biodegradable molecules, such as lignin-like molecules (lignins), but not with dissolved organic carbon concentration. The number of lignins shared between different sampling locations was larger than that of any other biomolecular class. Our results suggest that low-biodegradable molecules, especially lignins, regulate spatial variations in DOM molecular diversity in boreal forests.

1. The number of molecular compounds detectable by FT-ICR MS (i.e., molecular diversity) and subsequently the number of each biomolecular class vary with water movement (i.e., rainwater to stream water via throughfall, soil water, and groundwater).
2. The molecular compounds identified as the same DOM compound exist at different sampling locations.
3. DOM molecular diversity increases with increasing DOC concentration.

Results
Molecular diversity and biomolecule classification of DOM. The number of DOM molecular compounds detected as m/z peaks (molecular diversity) was 1,293 ± 307 SD on average across all of the water samples and ranged from 865 to 2,194 (Fig. 1a). The coefficient of variation (CV) for the molecular diversity was 23%. The molecular diversity did not differ significantly among sampling locations (p > 0.05).
Each DOM molecular compound was assigned to one of seven biomolecular classes (i.e., lipids, proteins, aminosugars/carbohydrates (As/Ch), unsaturated hydrocarbons (UH), lignin-like molecules (lignins), tannin-like molecules (tannins), or condensed aromatic structures (CAS)), or it was noted that it could not be assigned to any class. Our results revealed that the number of lignins was larger than that of any other class and only a few molecules of As/Ch were found in every sample except the bulk deposition sample obtained in May (Fig. 2). The number of lignins was especially large in throughfall, groundwater-PT, and stream water samples in June. The numbers of lipids or proteins were the second largest after lignins in the majority of the water samples; however, in some samples, the number of CAS was second to lignins.
Relationships of biomolecular classes and DOC concentration to molecular diversity. DOM molecular diversity increased significantly with increasing numbers of lignins, tannins, and CAS in both May and June (Fig. 3a: r = 0.83, p < 0.001 in May, r = 0.77, p < 0.001 in June; 3b: r = 0.86, p < 0.001 in May, r = 0.77, p < 0.001 in June; 3c: r = 0.77, p < 0.01 in May, r = 0.76, p < 0.001 in June). The proportion of the sum of lignins, tannins, and CAS in the molecular diversity in soil water was generally high (28-56%) compared with those in bulk deposition and throughfall (20-31%), although not in comparison to the throughfall in June (56%).
The DOC concentration across all of the water samples was 11.2 ± 10.4 mg C l⁻¹ on average, ranging from 0.7 to 36.4 mg C l⁻¹ (Fig. 1b). The CV for DOC concentration was 93%. The DOC concentration differed significantly among the sampling locations (Fig. 1b: p < 0.01). There was no correlation between DOC concentration and DOM molecular diversity (Fig. 4: r = 0.09, p = 0.742 in May, r = 0.09, p = 0.722 in June).
Spatial variation in DOM molecular composition. The number of molecular compounds detected as the same m/z peak between different water samples ranged from 153 to 1,010. The proportion of those compounds in the molecular diversity ranged from 10% to 60% depending on sampling location. Additionally, the number of lignins shared between different water samples was larger than that of any other biomolecular class and ranged from 52 to 445. The dendrogram including all species of molecular compounds detected by FT-ICR MS in May (Fig. 5a) showed that bulk deposition and throughfall formed one small cluster, indicating that these samples in May had many molecular compounds in common. The dendrograms for the seven biomolecular classes in May (Fig. 5b-h) also showed that lignins detected in the bulk deposition and throughfall formed one small cluster. O-horizon soil water (SW O (O1), SW Y (O1) and SW Y (O2) in Fig. 5), groundwater-PT and stream water were grouped into the same cluster in the lignins, tannins and CAS dendrograms in May (Fig. 5f-h). The bulk deposition and throughfall did not form one small cluster in the dendrogram including all molecular compounds identified in June (Fig. 6a); however, they were grouped into the same cluster when the dendrogram was cut at the dissimilarity level of 0.95. On the other hand, the dendrograms for the seven biomolecular classes in June (Fig. 6b-h) showed that lignins detected in the bulk deposition and throughfall were separated into different clusters.
Throughfall, groundwater-PT and stream water formed one cluster in the lignins, tannins and CAS dendrograms (Fig. 6f-

Discussion
The results of the FT-ICR MS analysis revealed that thousands of different DOM molecules existed in each water sample collected from our study site, i.e., the Kangasvaara catchment (Fig. 1a). FT-ICR MS is one of the most advanced instruments available for the detection of ionizable DOM because it has ultrahigh mass resolution. The linear relationships between DOM molecular diversity and the numbers of lignins, tannins, and CAS (Fig. 3) suggest that low-biodegradable molecules regulate changes in DOM molecular diversity 36,37 . Biodegradable molecules, such as hydrophilic organic matter leached from trees, are preferentially utilized by microorganisms, resulting in an abundance of low-biodegradable molecules, such as hydrophobic aromatic compounds, remaining in soils 29,38 . Hur et al. 39 suggested that microbial utilization of labile components, such as simple carbohydrates and amino acids, produced humic-like aromatic components in the DOM extracted from leaf litter. These processes would also be reflected in the generally higher proportion of the sum of lignins, tannins, and CAS in the molecular diversity in soil water than in bulk deposition and throughfall. However, some low-biodegradable molecules can be preferentially adsorbed by soil particles 18,40 . This might be reflected in the variations in the number of low-biodegradable molecules among replicates of soil water samples (see Figs S1 and S2). Similar patterns were found among the lignins, tannins and CAS dendrograms (Figs 6 and 7), implying that the dynamics of low-biodegradable molecules were similar within boreal forest ecosystems.
Comparisons of the m/z values between different water samples revealed that common molecular compounds existed at different sampling locations (hypothesis 2). We found that the number of lignins shared between different sampling locations was larger than that of any other biomolecular class. Additionally, correlation analyses between the cophenetic matrices of Jaccard's distance (Fig. 7) showed that the pattern of the degree of similarity among sampling locations for lignins reflects that for all molecular compounds detected by FT-ICR MS. Indeed, similar patterns were found between the dendrograms for lignins and all molecular compounds in May and June (Figs 5 and 6). These results indicate that lignins could regulate spatial variations in the number of the common molecular compounds. Lignins are phenolic polymers that originate mainly from vascular plants in terrestrial ecosystems and are considered to be recalcitrant organic matter because they are degraded at a lower rate than other substances originating from plants, such as cellulosic and non-cellulosic polysaccharides and proteins 41,42 . This presumably allows lignins to exist as the molecular compounds common to different sampling locations in our study site.
On the other hand, a number of different molecular compounds existed between sampling locations since common molecular compounds accounted for less than 50% of the molecular diversity in the majority of the water samples. This may be related to the fact that the quality of DOM is altered by several processes. For instance, tree canopies and forest soils release DOM molecular compounds into rainwater; whereas in rainwater, they are microbially converted to lower molecular-weight compounds and/or some are retained in mineral soils    O 11 as an example of isomers, which represent two major refractory substances, i.e., lignin and carboxyl rich alicyclic molecule, and explained that these substances have different reactivities and sources though they have the same molecular formula. Therefore, it is possible that even molecular compounds considered lignins in the van Krevelen diagram were misidentified and consequently a larger number of different molecular compounds existed between sampling locations than observed. This is also supported by Reemtsma 47 , who states that a molecule plotted in the region of one of the biomolecular classes in the diagram does not necessarily belong to that class because the O/C and H/C ratios are insufficient criteria to ascribe a molecule to a certain biomolecular class. However, several studies have reported that lignins are major components of refractory DOM in terrestrial freshwater 12,30,48 . In forests, litter is considered to be an important source of soil DOM 49 and especially coniferous litter, such as spruce and pine needle litter, could release lignin-derived DOM components with a high aromaticity into the soils via microbial decomposition 50 . These findings support our conclusion that lignins could regulate spatial variations in the number of common molecular compounds and consequently molecular diversity in boreal forests.
Vertical profiles in DOM molecular diversity and DOC concentration within the Kangasvaara catchment (Fig. 1) revealed that variations in DOM molecular diversity were much smaller than those in DOC concentration. Presumably reflecting this result, molecular diversity did not change in association with DOC concentration (Fig. 4) (hypothesis 3). These results demonstrate that DOC concentration alone does not sufficiently explain spatial variations in molecular diversity. This could be attributed to the fact that molecular compositions detected by FT-ICR MS do not provide quantitative information on the molecular compounds present in DOM 51 . Alternatively, it could be attributed to the fact that not all DOM molecular compounds in water samples are detectable because the enrichment procedure of DOM from water samples, such as the C18 solid phase extraction method used, is incomplete 47 and also because non-ionizable organic compounds are not characterized by FT-ICR MS 30 . Reemtsma and These 52 suggested the possibility that high molecular-weight components of humic and fulvic acids were less effectively ionized by ESI than low molecular-weight components. Additionally, Piccolo and Spiteller 53 stated that when utilizing ESI to detect DOM molecular compounds, the dominance of hydrophobic compounds in humic substances may inhibit ESI of hydrophilic compounds due to supramolecular associations of low-molecular organic compounds. However, Stenson et al. 54 used ESI coupled to FT-ICR MS to detect molecular compounds in Suwannee River fulvic acids and reported that ions generated by ESI were partially representative of the entire humic substances studied. In any case, our results suggest that DOM molecular diversity does not increase or decrease with DOC concentration in water samples collected from boreal forests.
This study presents preliminary but unprecedented results on the changes in the molecular diversity and composition of DOM across water movement in a boreal forest. At this stage, we could not examine temporal variations in the quality of DOM. However, common DOM molecular compounds were detected in the same location between May and June and the number of the lignins common to both May and June was larger than that of any other biomolecular class for any water sample (Fig. S3). Therefore, it is possible that lignins regulate not only spatial but also temporal variations in molecular diversity. Further research is needed on the temporal variations and on the factors driving the spatial and temporal variations in molecular diversity and the mechanisms of preservation and alteration of DOM molecular compounds across water movement throughout the year in boreal ecosystems.

Methods
Site description. This study was conducted in the Kangasvaara catchment (Fig. S4) located in eastern Finland (63°51′N, 28°58′E). The area and mean elevation of the catchment are 56 ha and 187 m a.s.l., respectively. The catchment has a perennial brook. The mean slope gradient of the catchment is 7%. From 1981 to 2000, the mean annual air temperature was 2.3 °C and precipitation was 527 mm, 35% of which was snowfall 55 . The soils in the catchment mainly consist of iron podzols (haplic podzols) and fibric histosols (peat), which have developed on stony till material. The underlying bedrock consists of gneiss granite and granodiorite. The proportion of peatland is 8% of the catchment area 56 . Most of the forests on the upland mineral soils (97%) were old-growth forests dominated by Norway spruce (Picea abies L. Karst.), Scots pine (Pinus sylvestris L.), and white and silver birch (Betula pubescens Ehrh. and Betula pendula Roth., respectively). European aspen (Populus tremula L.) was also found in the catchment 56 . The experimental stands in the catchment were an old-growth mixed forest (F O ) dominated by Norway spruce and a 12-year-old Scots pine plantation (F Y ). The plantation was established after harvesting a portion of the Norway spruce stands (Fig. S4). A 50 × 50 m plot was established in each stand. Norway spruce, Scots pine, and deciduous trees accounted for 53%, 33%, and 14%, respectively, of the total stand volume of 260 m³ (Fig. S4). Throughfall was sampled using 20 collectors of the same type used for bulk deposition, placed systematically around the F O plots. The bulk deposition and throughfall samples were collected for a week in each month and emptied after the week.
The soil water was collected under the O-horizon, at a depth of 15 cm that was comparable to the E-horizon, and at a depth of 35 cm that was comparable to the B-horizon, using tension (60 kPa) lysimeters that consisted of 67-mm-long porous cups with an outer diameter of 12 mm (P80, Hoechst CeramTec AG, Germany). Three lysimeters (L1, L2, and L3) were placed 1-2 m apart in every horizon and each stand, and soil water from each location was collected for the same week as bulk deposition and throughfall. In May, no water samples were obtained from two lysimeters in the E- and B-horizons in F O , one lysimeter in the O-horizon, three lysimeters in the E-horizon, and two lysimeters in the B-horizon in F Y because of low rainfall. In June, no water samples were obtained from three lysimeters in the O-horizon in F O , three lysimeters in the O-horizon and one lysimeter in the B-horizon in F Y .
The groundwater was obtained from groundwater tubes installed at depths of 0.7 m in the peat layer (groundwater-PT), 2.5 m in the moraine (groundwater-MO), and 2.4 m in the mineral soil under the peat layer (groundwater-MI). The depth of the peat layer around the groundwater-PT ranged from 1.5 to 2.1 m. Each groundwater tube was made of polyamide and had an inner diameter of 30 mm and an outer diameter of 40 mm. The stream water samples were collected at the outlet of the Kangasvaara catchment (Fig. S4). The groundwater and stream water samples were taken on the same day as the other water samples, when temperature, rainfall, and 10-day antecedent rainfall were 8.1 °C in May and 13.5 °C in June, 0.8 mm in May and 0 mm in June, and 27.3 mm in May and 57.3 mm in June, respectively.
All samples were filtered using pre-combusted glass fiber filters with a nominal pore size of 0.7 μm (GF/F, Whatman, Japan) and frozen until analysis.
Chemical and data analyses. DOC concentrations were measured using the combustion catalytic oxidation method (TOC-5000; SHIMADZU Corp., Japan). DOM molecular compounds were analyzed using electrospray ionization (ESI) coupled to high-resolution FT-ICR MS. In this study, an FT-ICR mass spectrometer with a 9.4-T magnetic field was used (APEX-Q-94e, Bruker Daltonics Inc., MA, USA). Filtered samples were treated with the C18 solid phase extraction (SPE) method to remove inorganic salts 57 . Water samples extracted by this method were diluted with deionized water and methanol to yield a final sample composition of 50/50 (v/v) of methanol to water. The extraction efficiency ranged from 32% to 64%. The ionization efficiency was enhanced by adding ammonium hydroxide before ESI 23 . Samples were injected into the FT-ICR mass spectrometer using a syringe pump at an infusion rate of 100 μl h⁻¹. All samples were analyzed in the negative ion mode. The electrospray voltage was optimized for each sample. Ions were accumulated in a hexapole for 2 s before they were transferred to the ICR cell, and the 100 transients collected using a 2 M Word time domain were co-added. All spectra (Fig. S5) were externally calibrated using an arginine standard and internally calibrated using fatty acids. Mass lists were produced using a signal-to-noise ratio (S/N) cut-off of 4. Isotope peaks were removed from the list. The molecular formula calculator (Molecular Formula Calculator ver. 1.0; ©NHMFL, 1998) was used to assign an expected molecular formula for each m/z value with a mass accuracy ≤ 1 ppm 23,27 . Only m/z values in the range of 180-500 were inserted into the molecular formula calculator. The following conditions were used for formula assignment: C = 0-∞; H = 0-∞; O = 0-∞; N = 0-5; S = 0-3; P = 0-3; DBE ≥ 0 27 .
After the formula assignment by the molecular formula calculator, some formulas not likely to be observed in natural water were eliminated based on rules described in Kujawinski and Behn 58 and Wozniak et al. 34 .
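The assignment criteria described above (mass accuracy ≤ 1 ppm, m/z 180-500, N ≤ 5, S ≤ 3, P ≤ 3, DBE ≥ 0) can be illustrated with a minimal Python sketch. This is not the Molecular Formula Calculator used in this study. The function names, the example formula C10H12O5, and the "measured" m/z value are hypothetical. The sketch also assumes singly charged deprotonated ions, [M-H]-, in negative ion mode, and a DBE expression that treats N and P as trivalent.

```python
# Monoisotopic masses (u) of the most abundant isotope of each element.
MONO_MASS = {
    "C": 12.0, "H": 1.00782503, "O": 15.99491462,
    "N": 14.00307400, "S": 31.97207100, "P": 30.97376163,
}
ELECTRON_MASS = 0.00054858  # u


def neutral_mass(formula):
    """Monoisotopic mass of a neutral molecule given as {element: count}."""
    return sum(MONO_MASS[el] * n for el, n in formula.items())


def mz_deprotonated(formula):
    """Theoretical m/z of the singly charged [M-H]- ion (assumed ion type)."""
    return neutral_mass(formula) - MONO_MASS["H"] + ELECTRON_MASS


def ppm_error(measured_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def dbe(formula):
    """Double bond equivalents, treating N and P as trivalent (assumption)."""
    c = formula.get("C", 0)
    h = formula.get("H", 0)
    n = formula.get("N", 0)
    p = formula.get("P", 0)
    return c - h / 2 + (n + p) / 2 + 1


def passes_criteria(measured_mz, formula, tol_ppm=1.0):
    """Check a candidate formula against the criteria stated in the text."""
    limits_ok = (formula.get("N", 0) <= 5
                 and formula.get("S", 0) <= 3
                 and formula.get("P", 0) <= 3)
    in_window = 180.0 <= measured_mz <= 500.0
    error = ppm_error(measured_mz, mz_deprotonated(formula))
    return limits_ok and in_window and abs(error) <= tol_ppm and dbe(formula) >= 0


# Hypothetical example: a lignin-like candidate and an invented measured peak.
candidate = {"C": 10, "H": 12, "O": 5}
measured = 211.0613
print(round(mz_deprotonated(candidate), 4), dbe(candidate),
      passes_criteria(measured, candidate))
```

The additional elimination rules taken from Kujawinski and Behn 58 and Wozniak et al. 34 are not reproduced here.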
To identify which biomolecular class each molecular compound belonged to, a van Krevelen diagram was used based on the elemental ratios of the expected molecular formulas, i.e., the oxygen-to-carbon (O/C) and hydrogen-to-carbon ratios (H/C) 26 (Fig. S6). Here, we excluded the molecules that had multi-candidate formulas assigned and thus did not have a particular range of O/C and H/C values that falls into a biomolecular class. The excluded molecules accounted for 40% of the total m/z peaks on average, with variability ranging from 19% to 55%, and tended to increase with increasing molecular weight. Following the protocols proposed by Grannas et al. 27 and Hockaday et al. 28
DOC concentration and the number of molecular compounds detected were compared among sampling locations using a one-way ANOVA. The number of replicates ranged from 2 to 4 at each location. Tukey-Kramer post-hoc tests were used to compare sample means. Normality (Shapiro-Wilk test) and homogeneity (Bartlett test) were verified in advance. Jaccard similarity coefficients were calculated to examine how many molecular compounds were the same between different sampling locations, where coefficient = 1 indicates that two water samples share all of the molecular compounds while coefficient = 0 indicates that there are no common molecular compounds. Then, dendrograms of different sampling locations were obtained for all molecular compounds detectable by FT-ICR MS and for each biomolecular class by calculating the Jaccard's distance between samples and subsequently using Ward's method. Additionally, correlation coefficients were calculated between cophenetic matrices of Jaccard's distance for all molecular compounds and for each biomolecular class to examine which biomolecular class dendrogram exhibits a similar pattern to the dendrogram for all molecular compounds. Statistical analyses in this study were conducted using R (version 2.13.1 59 ).
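The Jaccard similarity coefficient and the corresponding distance used for the dendrograms can be sketched in a few lines of Python. The analyses in this study were run in R, so this is only an illustration. The function names, sample labels, and formulas below are hypothetical. Each sample is treated as a set of detected compounds (presence/absence only), and the distance is taken as 1 minus the similarity.

```python
from itertools import combinations


def jaccard_similarity(a, b):
    """Shared compounds divided by all compounds found in either sample."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        raise ValueError("both samples are empty")
    return len(a & b) / len(union)


def jaccard_distance_matrix(samples):
    """Pairwise Jaccard distances (1 - similarity) as a nested dict."""
    names = list(samples)
    dist = {name: {name: 0.0} for name in names}
    for x, y in combinations(names, 2):
        d = 1.0 - jaccard_similarity(samples[x], samples[y])
        dist[x][y] = dist[y][x] = d
    return dist


# Hypothetical presence lists of assigned formulas for three water samples.
samples = {
    "bulk_deposition": {"C10H12O5", "C12H14O6", "C15H18O7"},
    "throughfall": {"C10H12O5", "C12H14O6", "C16H20O8"},
    "stream_water": {"C15H18O7", "C18H22O9"},
}

dist = jaccard_distance_matrix(samples)
for name, row in dist.items():
    print(name, {other: round(d, 2) for other, d in row.items()})
```

A distance matrix of this form is the input that a hierarchical clustering routine with Ward's method would take; the clustering step and the cophenetic correlation are not reproduced in this sketch.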