Abiotic and past climatic conditions drive protein abundance variation among natural populations of the caddisfly Crunoecia irrorata

Deducing impacts of environmental change on species and the populations they form in nature is an important goal in contemporary ecology. Achieving this goal is hampered by our limited understanding of the influence of naturally occurring environmental variation on the molecular systems of ecologically relevant species, as the pathways underlying fitness-affecting plastic responses have primarily been studied in model organisms and under controlled laboratory conditions. Here, to test the hypothesis that proteome variation systematically relates to variation in abiotic conditions, we establish such relationships by profiling the proteomes of 24 natural populations of the spring-dwelling caddisfly Crunoecia irrorata. We identified protein networks whose abundances correlated with environmental (abiotic) gradients such as in situ pH, oxygen- and nitrate concentrations but also climatic data such as past thermal minima and temperature seasonality. Our analyses suggest that variations in abiotic conditions induce discrete proteome responses such as the differential abundance of proteins associated with cytoskeletal function, heat-shock proteins and proteins related to post-translational modification. Identifying these drivers of proteome divergence characterizes molecular “noise”, and positions it as a background against which molecular signatures of species’ adaptive responses to stressful conditions can be identified.

Organisms are able to maintain fitness via physiological and phenotypic plasticity by continuously sensing and integrating environmental cues [1][2][3] . These responses allow them to adjust to environmental variation, a feat that influences the outcome of evolution and the likelihood of extinction [4][5][6] . Due to the reversible nature of plasticity and its contribution to fitness, it has the potential to moderate the loss of global biodiversity during environmental change [7][8][9][10] . The advantage of species' plasticity during such change depends on the threshold-limits of their more-or less robust molecular systems that underlie plastic responses 11,12 . In many, if not all species, these networks of interacting macromolecules are likely tuned to the fluctuations in the environment over space and time. Accordingly, substantial variation in gene expression and protein abundance profiles within and between populations can be observed [13][14][15][16][17][18] . Attributing this variation to environmental cues experienced by the organism can not only give insights into the molecular mechanisms underlying regulatory plastic responses [19][20][21] but also inform about population health and how a population at one site responds to environmental change not encountered by another population. Often, the environmental cues and underlying molecular pathways involved are largely unknown but increasingly studied by identifying large-scale gene expression changes, providing fundamental knowledge on adaptive processes in response to environmental challenges such as global warming and pollution 22-25 . It has been repeatedly shown that the correlation between transcript and protein abundance is modest [26][27][28] . Phenomena such as pre, co-and post-transcriptional and translational modification 29,30 , protein turnover 31 and the inherently stochastic nature of gene expression contribute to this pronounced difference 32,33 . Accordingly, the molecular signatures of plasticity triggered by environmental (selective) pressure might frequently only be detectable at the protein level and against a pre-established background of environmentally induced proteome variation. Our knowledge on the concrete environmental cues that induce protein abundance variation among www.nature.com/scientificreports/ natural populations is limited due to a lack of field-based studies that relate abundance variation to variation in environmental conditions encountered at the sites.
In this study, we proposed to tackle two fundamental questions. First, what is the standing protein abundance variation between natural populations of different geographic regions? Second, are differences in the abiotic environment linked to this variation, and, if so, do functional profiles of proteins reflect these differences? To this end, 24 populations of the caddisfly Crunoecia irrorata (Trichoptera: Lepidostomatidae, Curtis 1834) were used as a study system for two reasons: (1) The increasing need to monitor population health of non-model species (i.e. species without sequenced genomes) via systems-wide approaches and (2) the continuous exposure of larval individuals to surrounding aqueous conditions. Population-wide liquid chromatography-tandem mass spectrometry (LC-MS/MS) was used to examine abundance variation patterns amongst commonly identified proteins within this species. We tested the hypothesis that this variation can be systematically attributed to abiotic measurements including elevation, geographic distance and past climatic conditions. As cellular conditions change rapidly, often in seconds or minutes 34,35 , functional physiological changes must necessarily occur before or during transcriptional and translational changes. We therefore focus on abiotic point-measurements, i.e. we measured abiotic variables at the same time as organismal sampling to obtain a "physiological snapshot" of these natural populations. Such a characterization of the "normal state" of a given system-the ranges of its variation 36 -could aid in defining baselines of molecular responses that must be exceeded in order to be considered potentially adaptive changes or physiological responses triggered by changes in the environment.

Material and methods
Organism collection and environmental variables. Eight C. irrorata larval individuals of similar sizes were collected from 24 (near-) natural freshwater springs, i.e. from eight springs in three study regions across Germany: Rhoen Biosphere Reserve (R), Harz National Park (H) and Black Forest (BF) (Supplementary Table S1). Sites were distributed in a randomized design between low (275 m) and high (967 m) elevations. Individuals from one spring were pooled into one sample since pooling has been shown to match mean protein abundances of the individuals making up the pool 37 , thereby reducing the biological variation compared with that between individuals 38 . After identification, larvae were washed with dH 2 O and transferred on-site into 5.0 ml Protein LoBind tubes (Eppendorf, Germany) containing 1 ml of RNA-later (Invitrogen). RNA-later was chosen, since snap-freezing was infeasible and, despite its limitations, proteins of various tissues are well preserved and biological information is largely kept intact [39][40][41] . Samples were stored at ~ 4 °C in a portable fridge (Dometic TropiCool TC 33) until storage at − 20 °C was possible. From each spring, additional larvae were collected, conserved in EtOH and their head capsule widths (n = 88) were measured using a Leica S9i stereo microscope, serving as a proxy control for possible instar stage distribution differences between sampling regions. Abiotic variables were measured at each spring using a multi-parameter portable meter (MultiLine Multi 3650 IDS). Inorganic nutrient contents were determined from 30 ml of spring water via Ion Chromatography (IC) using a 940 Professional IC Vario ONE/SeS/PP (Metrohm) and iron (Fe) content via Inductively Coupled Plasma Emission Spectrometry (ICP-OES) using a 5100 ICP-OES (Agilent Technologies). In order to assess the influence of past climatic differences between sites on protein abundance profiles, we collected macroclimate data (Bio-Clim1-19) from the WorldClim v.2 42 database based on spring coordinates using the raster package v.3.0.12 43 (resolution: 5-min of a longitude/latitude degree), presenting various indices of environmental data calculated from 30 years of average monthly data .
Protein extraction. Prior to protein extraction, samples were thawed on ice and centrifuged at 2,800g.
RNA-later was decanted and larvae transferred from RNA-later into 1.5 ml protein LoBind tubes containing 1 ml cold 10 × phosphate-buffered saline (PBS). After vortexing for 10 s, larvae were transferred into 1.5 ml protein LoBind tubes containing 400 µl cold lysis buffer (1% sodium deoxycholate [SDC], 10 mM TCEP, 100 mM Tris, pH 8.5 [adjusted with NaOH/HCI]). Samples were in-tube homogenized with a sterile glass pestle followed by 10 min incubation at 4 °C and 3 × 1 ultrasonication (Bandelin Sonoplus HD270). Samples were spun at 2800g for 10 min at − 4 °C, and 200 µl supernatant was subsequently transferred to a new tube. To desalt absorbed RNAlater, trichloroacetic acid (TCA)/acetone precipitation was performed according to the protocol by Luis Sanchez 47 , resulting in a pellet that was dissolved in 200 µl lysis buffer. After 10 × 1 s sonication, samples were incubated for 10 min at 95 °C at 10 g in a Thermomixer C (PCR 96 heating block, Eppendorf). At this point, protein concentrations were determined via a Bicinchoninic acid assay (BCA; Thermo Scientific Pierce BCA Protein Assay Kit) according to manufacturer's instructions. After letting samples cool down at room temperature, they were spun down at 2800g for 10 s. Four µl of 0.75 M chloroacetamide solution was added to each sample and incubated at 37 °C for 10 min at 28 g and again spun down at 2800g for 10 s. After checking if the pH of each sample was around 8, 1 µg trypsin (Sequencing Grade Modified Trypsin, Promega) was added to 50 µg extracted proteins per sample which then were digested overnight at 37 °C and 300 rpm. Samples were acidified with 50 µl 5% trifluoroacetic acid (TFA) and peptides were purified using PreOmics cartridges (Martinsried) according to manufacturer's instructions. Eluted peptides were transferred to a 96-well plate and concentrated to dryness by applying vacuum for 2 h. Peptides were subsequently dissolved in 20 µl 0.1% formic acid by 10 × 1 s ultrasonication and shaking at 28 g at 25 °C for 5 min. After spinning dissolved peptides down at 1800g for 10 min, peptide concentrations were determined based on absorbance values using a SPECTROstar Nano Absorbance Plate Reader (BMG Labtech). Protein and peptide concentrations are given in Supplementary Table S2. Peptides were diluted to a concentration of 0.5 µg/µl in 0.1% formic acid. IRT peptides (Biognosys AG, Schlieren, Switzerland) were added to the wells to control for LC-MS performance, and samples were stored at − 20 °C prior to LC-MS/ MS analysis. www.nature.com/scientificreports/ LC-MS/MS analysis. Samples were subjected to LC-MS/MS analysis using an Orbitrap Fusion Lumos Tribrid Mass Spectrometer fitted with an EASY-nLC 1200 (both Thermo Fisher Scientific) and a custom-made column heater set to 60 °C. Peptides were resolved using an RP-HPLC column (75 µm × 36 cm) packed in-house with C18 resin (ReproSil-Pur C18-AQ, 1.9 µm resin; Dr. Maisch GmbH) at a flow rate of 0.2 µl/min. The following gradient was used for peptide separation: from 5 to 12% B over 5 min to 35% B over 65 min to 50% B over 20 min to 95% B followed by 20 min at 95% B. Buffer A was 0.1% formic acid in water, and buffer B was 80% acetonitrile, 0.1% formic acid in water. The mass spectrometer was operated in Data-Dependent Acquisition (DDA) mode with a cycle time of 3 s between master scans. Each master scan was acquired in the Orbitrap at a resolution of 120,000 full width at half maximum (at 200 m/z, MS1) and a scan range from 375 to 1600 m/z followed by MS/MS (MS2) scans of the most intense precursors in the linear ion trap at "Rapid" scan rate with isolation of the quadrupole set to 1.4 m/z. Maximum ion injection time was set to 50 ms (MS1) and 35 ms (MS2) with an AGC target of 1.0E6 and 1.0E4, respectively. Monoisotopic precursor selection (MIPS) was set to peptide, and the intensity threshold was set to 5.0E3. Peptides were fragmented by HCD (higher-energy collisional dissociation) with collision energy set to 35%, and one microscan was acquired for each spectrum. The dynamic exclusion duration was set to 30 s.

Protein identification.
Raw spectra (Thermo raw files) were submitted to an Andromeda 44 search in Max-Quant v1.6.10.43 45 . The "match between runs" option was enabled (match time window: 0.7 min, alignment time window: 20 min). Instrument type was set to Orbitrap, precursor mass tolerance to 15 ppm and fragment ion tolerance to 0.05 Da. Enzyme specificity was set to fully tryptic, with a maximum of two missed cleavages. MS/MS spectra were searched against a previously described database consisting of translated gene-prediction sequences of Trichoptera species 46 , additionally including candidate protein-coding regions of Rhyacophila fasciata 47 ; BioProject: PRJNA219600). All searches included a contaminants database (as implemented in Max-Quant, 267 sequences). For protein identification, unique and razor peptides were used. The peptide spectrummatch false discovery rate (FDR) and the protein FDR were set to 0.01 (based on the target-decoy approach). Oxidation of methionine (M) and acetylation (Protein N-term) were specified as variable and carbamidomethylation of cysteines (C) as fixed modifications. Enzyme specificity was set to "Trypsin/P". Minimum peptide length was set to 7. The "evidence.txt" and "summary.txt" output files were used for LC-MS/MS quality control using R package artMS v.1.4.2 48 . To investigate identification differences between the three regions, we analyzed samples in three independent MaxQuant runs (settings as above  Fig. S3), making data imputation unreliable. Therefore, protein raw intensities were calculated through summation of peptide intensities 51 . Raw intensity values were log 2 -transformed, quantile normalization was performed using function normalize.quantiles from package preprocessCore v.1.46.0 52 and missing values (n = 1512 = 5.37%) imputed using function impute.knn from package impute v.1.60.0 53 . These relative abundance values were tested for association with abiotic variables: first, unsupervised non-metric multidimensional scaling (nMDS) ordination was performed to visualize clustering of protein abundances between populations and regions. A Bray-Curtis 54 distance matrix was calculated based on relative protein abundance values for each population and used as input to the metaMDS function implemented in vegan v.2.5.6 55 (Supplementary Fig. S4). Weighted gene co-expression network analysis (WGCNA) v.1.68 56 was used to identify suites of co-regulated proteins in an unsupervised way. These protein networks could identify important functional groups of proteins and candidate proteins underlying plastic responses, similar to gene expression analyses 57,58 .
A sample network was constructed to identify outlying samples with a standardized connectivity score of less than − 2.5 59 . A signed protein co-abundance network was constructed for all 24 populations independent of sampling region. The network was constructed with a soft threshold power (β) of 18 as this value was found to be appropriate by the function pickSoftThreshold to reach a scale-free topology index (R 2 ) of at least 0.90. We used the Dynamic Tree Cut approach to merge highly correlated modules using a height-cut of 0.20 60 . The abundances of these modules can be summarized as the abundance of a single "eigengene", calculated as the first principal component of the abundances of all proteins in a module across samples 60 . These eigengenes were correlated (Pearson's correlation) with abiotic variables to identify module-environment relationships ( Supplementary Fig. S5). Analogous to the in situ abiotic variables, WGCNAs were independently performed for bioclimatic variables (WGCNA BC , Supplementary Fig. S6) and for differentially abundant proteins (DAPs) between the three sampling regions (WGCNA DAP , Supplementary Fig. S19). DAPs were identified using the Linear Models for Microarray Data (LIMMA) library 61 . A linear model was fit for each protein via the function lmFit and contrasts from the model fit, and summary statistics were computed via the functions contrasts.fit and eBayes. We considered proteins differentially abundant between regions if their log 2 fold change (FC) was < − 2 or > 2 and their adjusted p-value < 0.05. We computed GLMs by regressing protein abundances against each continuous, abiotic response variable (i.e. abiotic gradient) using the stan_glm function in the rstanarm package v2.17.4 62 . We termed a positive or negative abundance change of a specific protein with an abiotic gradient a "protein reaction norm", akin to the concept of a reaction norm concept in the study of phenotypic plasticity and life-history theory 1,63,64 . In this case, it describes the sensitivity of a protein, or of a set of proteins, to a change in some specific abiotic variable. Resulting protein reaction norms were assessed independently via scatterplots.
To assess the effect of geographic distance on protein abundance divergence between populations, a Mantel test www.nature.com/scientificreports/ was performed between geographic-and (Euclidean) protein-distance matrices in ade4 v.1.7.13 65 . Geographic distances between springs were calculated based on longitude and latitude of springs using package geosphere v.1.5.10 66 and the implemented function distm (Vincentry great circle distances). Pairwise spectrum distances (cosine distances) between MS/MS runs (proteome-wide distances independent of peptide-identification and therefore of the homology-based protein database) were computed in R using the DISMS2 algorithm 67 .
Functional annotation of protein sequences. The first majority protein ID of each identified protein was parsed from the MaxQuant output and corresponding amino-acid sequences extracted from the database using function ucsc-faSomeRecords as implemented in Bioconda 68 . All sequences were queried against the Swiss-Prot database 69 (accessed 10 January 2020) using stand-alone blastp v.2.2.28 with default parameters (E-value 0.001). Protein domains were determined using stand-alone InterProScan (IPS) v.5.39-77.0 70 with default parameters. All proteins were assigned to existing Cluster of Orthologous Groups (COGs) 71 and Gene Ontology (GO) terms in EggNOG v5.0 72 via eggNOG-Mapper 73 (Taxonomic scope: Insecta, Orthologs: all orthologs, GO evidence: non-electronic terms, E-value: 0.001). To test whether any GO terms were overrepresented in WGCNA modules and DAPs, we sorted proteins in selected modules by their Gene Significance (GS) for the significantly correlated abiotic variable and DAPs by their adjusted p-values and performed rank-based tests for each GO term assigned to these proteins by applying Kolmogorov-Smirnov tests via package topGO v.2.38.1 74 ("weight01" algorithm, nodeSize = 10). Individual proteins associated with significant GO terms of interest can be easily identified using the "Supplemental R script". To further identify and substantiate biological functions and processes associated with modules, a domain-based analysis was conducted. To this end, the IPS Protein Families (Pfam) annotations 75 of the proteins belonging to a certain module were used as entry for domain-centric GO (dcGO) enrichment (FDR p-value < 0.01) 76 .

Results
Global drivers of protein abundance differentiation. Mass spectra matched to 4536 distinct peptide sequences (mean number of amino acids: 16, range 8-44). These peptide sequences mapped to 1173 proteins that were commonly identified in all 24 C. irrorata populations. Protein identifications largely overlapped (Supplementary Fig. S7b) and pairwise peptide-level correlations between samples were above 0.8. Protein abundance profiles exhibited variation between populations but showed association with sampling regions, whereby populations from any one region clustered together mostly along axis 2 of the nMDS (Fig. 1a). Therefore, proteins with significant, positive eigenvector loadings along this axis may play a dominant role in proteome differentiation over geographic range. The top proteins with positive loadings (> 0.005, n = 69) were enriched for GO terms small molecule metabolic process (GO:0044281, BP), oxidoreductase activity (GO:0016491, MF) and mitochondrion (GO:0005739, CC). More similar protein abundance profiles were found between populations occupying adjacent springs when compared to springs further apart (p Mantel = 0.006, Supplementary Fig. S8) and pairwise spectrum distances between LC-MS/MS runs were positively correlated with environmental distances between springs (p Mantel = 0.009), whereby spectral profiles were more similar between springs of similar abiotic environmental conditions (Fig. 1b). Environmental distance between springs did not correlate with geographic distance (p = 0.33; Supplementary  Fig. S9). One-hundred eighty-two DAPs were identified between sampling regions. These proteins were primarily enriched for GO terms related to primary metabolic process (GO:0044238; e.g. mitochondrial-processing peptidase subunit beta and probable NADH dehydrogenase [ubiquinone] iron-sulfur protein 6) and gene expression www.nature.com/scientificreports/ (GO:0010467; e.g. chromodomain-helicase-DNA-binding protein Mi-2 homolog and nuclear pore complex protein Nup93). To investigate if abundances of DAPs between regions are influenced by abiotic gradients, we related all 182 DAPs to the abiotic variables (Fig. 2a). The "green DAP " module emerged as the only set of co-regulated and differentially abundant proteins that significantly correlated with pH (r greenDAP = 0.42), nitrate (r greenDAP = -0.45) and sulfate (r greenDAP = -0.41) as shown in Fig. 2c. Protein families showing the highest frequencies across the "green DAP " subset of DAPs were O: Post-translational modification, protein turnover, and chaperones, E: amino acid transport and metabolism and G: carbohydrate transport and metabolism (Fig. 2b). This subset of DAPs is functionally similar to the protein family distribution of all identified DAPs ( Supplementary Fig. S20). Significantly changing proteins with the highest gene significance (GS) values in this module also show different, sometimes opposite, reaction norms between regions (Fig. 2d).
Protein reaction norms. We observed baseline abundance differences between geographical regions ( Figs. 2d and 3). For example, the reaction norm of phosphoglycerate mutase (member of the "yellow" module), is a pH-sensitive enzyme 77,78 and had a higher baseline abundance in populations from BF springs which had significantly higher mean pH than springs from R and H (Supplementary Fig. S10a). Multiple proteins showed pronounced abundance changes in relation to abiotic conditions (Figs. 2d, 4b-d). For example, 13 proteins showed increasing and 20 decreasing reaction norms according to spring elevation and pH variation was accompanied by 71 increasing and 5 decreasing norms of reaction. Temperature variation was accompanied by 29 increasing and 33 decreasing reaction norms whereby a high frequency of decreasing proteins belonged to the chaperonin Cpn60/TCP-1 family and a high frequency of increasing proteins to the calreticulin/calnexin family. www.nature.com/scientificreports/ Both families play a role in the physiological integration of temperature variation, both in cold 79,80 and warm 81 environments. Interestingly, this feature (i.e. protein's abundances change as a factor of changing abiotic conditions) can also be observed for DAPs (Fig. 2).
Global network analyses. The first WGCNA assigned all 1173 proteins to eight co-expression modules (designated randomly by colors). Five module eigengenes were significantly correlated with distinct abiotic variables (Fig. 4a): both, "brown" (n = 477) and "red" (n = 59) modules correlated with magnesium (r brown = − 0.46, r red = − 0.42) and sulfate (r brown = − 0.43, r red = − 0.49). "Green" (n = 81), "yellow" (n = 133) and "grey" (n = 73) modules correlated positively with oxygen (r green = 0.4), pH (r yellow = 0.47) and nitrate (r grey = 0.47), respectively. Correlations between module membership and protein significance values were used to further corroborate these relationships, with correlations ranging from 0.5 for the "yellow" module to 0.66 for the "grey" module ( Fig. 4f-i). Gene Ontology enrichment of the highly underrepresented "brown" module detected many significant GO terms across all GO categories (Cellular Compartment (CC), Molecular Function (MF) and BF). Gene Ontology terms within BP and MF included many terms associated with unfolded protein stimulus [e.g., chaperone binding (GO:0051082), chaperone-mediated protein folding (GO:0061077) and cellular response to topologically incorrect protein (GO:0035967)]. The "red" module was associated with the same abiotic variables as the "brown" module, but had no enriched GO terms. The "green" module was enriched with four GO terms in the BP category: Intestinal stem cell homeostasis (GO:0036335), sleep (GO:0030431), response to endoplasmic reticulum stress (GO:0034976) and RNA interference (GO:0016246) and one GO term in the CC category: Intracellular membrane-bounded organelle (GO:0043231). Lastly, the "yellow" module was enriched with only one GO term in the BP category: Electron transport chain (GO:0022900) but with three GO terms in the CC category: Cytoplasmic part (GO:0044444), intracellular membrane bounded organelle (GO:0043231) and endomembrane system Domain-centric GO enrichment of the "yellow" module (Pfam slim level = 3) computed significantly enriched GO terms such as regulation of sodium ion transport (GO:0002028), pH reduction (GO:0045851) and retina layer formation (GO:0010842). The "green" module included significant GO terms such as oxidoreductase activity (GO:0016491), oxidation-reduction process (GO:0055114) and response to oxidative stress (GO:0,006,979). The "grey" module, related negatively to pH and positively to nitrate concentrations was enriched for similar GO terms as the "yellow" module such as pH reduction (GO:0045851), regulation of intracellular pH (GO:0051453) but also terms related to nitrogen such as response to xenobiotic stimulus (GO:0009410) and cellular nitrogen compound metabolic process (GO:0034641). The "yellow" module had higher frequencies of proteins associated with C: energy production and conversion, Z: cytoskeleton and O: post-translational modification/protein turnover, and chaperones. The "green" module included 39.5% proteins associated with O: post-translational modification/ protein turnover, and chaperones and the "grey" module showed COG underrepresentation and many proteins with S: unknown function (Fig. 2e,f).
The WGCNA BC assigned all proteins to six co-expression modules (designated randomly by colors; Fig. 5a). An overview of COG frequency distributions and eigengene correlations with BioClim variables is given in Fig. S17. The strongest relationship in the WGCNA BC emerged between the "green BC " module (n = 214) and bio6 (r = 0.72), representing the minimum temperatures (°C) of the coldest month experienced by C. irrorata populations in the corresponding sites over 30 years (Fig. 5c,d). This module contained a high frequency of proteins related to the cytoskeleton COG family (Fig. 5b) and was enriched for GO terms related to the regulation of cytoskeleton and tissue integrity, including terms such as regulation of microtubule-based process (GO:0032886), illustrating the relationship between a protein's module membership score (x-axis) and the protein's significance for the abiotic variable (y-axis). Higher correlations between these parameters indicate stronger associations of the (f) "yellow", (g) "red", (h) "green" and (i) "grey" modules with their associated abiotic variables (pH, magnesium, oxygen and nitrate, respectively). The module with the second highest frequency of proteins related to the cytoskeleton COG family was the "pinkBC" module (n = 43; Fig. S22). Its eigengene expression correlated negatively (p < 0.05) with bio2 (mean diurnal range; r = − 0.43) and bio4 (temperature seasonality; r = − 0.53), indicating that temperature fluctuations (monthly and seasonal) directly influence species' molecular responses, additionally to determining their distributional range 82 . Interestingly, the "black BC " module consisted to 43% of proteins associated with the COG family O: post-translational modification, protein turnover, and chaperones (Fig. S22). Like the "pink BC " module, it also correlated negatively (p < 0.05) with bio2 (r = − 0.51) and bio4 (r = − 0.51), hinting at the importance of PTMs and molecular chaperones in adaptive mechanisms to temperature fluctuations.

Discussion
Relating the most variable proteins that drive proteome differentiation among populations to their associated protein families and functions can provide broad insight into which physiological systems respond dynamically to abiotic variation. As a result, any environmental change that is to be deduced from molecular profiles that are associated with any of these physiological systems are potentially confounded. In this study, the differentiation among population proteomes was associated with environmental heterogeneity between springs, geographic distance between populations and past climatic conditions. These findings indicate that the presence or absence of spectra associated with certain peptides/proteins may be an important indicator of environmental differences and that relative protein abundance differences are influenced by the geographic distance between sites. Even though no relationship between protein abundances and environmental distance emerged, protein reaction norms and association of protein modules with abiotic variation show that protein abundances of C. irrorata populations are regulated by the species' integration of variation in abiotic conditions commonly experienced in its habitat.
The absence of such a relationship (i.e. the majority of identified proteins did not exhibit a relation or a norm of reaction) was to be expected since cells maintain their own internal environment, e.g. via ion-transport 83 . Proteins associated with energy acquisition and storage (e.g. hexamerin-1.1 84   www.nature.com/scientificreports/ Fig. S21), indicating, for example, that nutrient availability differs between sites and affects populational protein abundances. Further, protein families showing high frequencies among all DAPs likely contribute substantially to proteome-divergence. High frequencies of DAPs belong to the O: Post-translational modification, protein turnover, and chaperones, T: signal transduction mechanisms and C: energy production and conversion COG families. These findings point to differences in cellular environmental sensing 88 , and post-translational modification (PTM), regulatory modifications that may change or suppress protein functions 89 , between sampling regions. Our findings therefore substantiate the importance of PTMs in integrating environmental cues and regulating physiological processes, suggesting that PTM profile differences between natural populations are substantial and are agents of plasticity. As a detailed discussion of all significant module-abiotic relationships and annotation profiles is beyond the scope of this research article, we focused the discussion on certain protein reaction norms and protein networks that corresponded with differences in abiotic conditions of springs. We first establish functional connections between modules and abiotic variables as this information is of great relevance to studies investigating molecular abundance and activity differentiation between populations 90,91 . Functional relationships between modules and abiotic variation. We highlight functional relationships between modules and abiotic variables with the examples of the "yellow" module, related to pH, and the "green" module, related to oxygen concentration. Proteins in the "yellow" module show functional profiles related to cell-internal and external pH conditions: Member proteins such as succinate dehydrogenase and peptidyl-prolyl cis-trans isomerase are activated by changes in pH and modulate intracellular pH homeostasis, respectively 92,93 . Enrichment for electron transport chain (GO:0022900) of member proteins such as electron transfer flavoprotein [subunits alpha/beta (C-and N-terminal)] indicate a relationship between aqueous pH and the electron transport system (ETS). Fifteen proteins in this module are found in this oligomeric enzyme complex within the inner mitochondrial membrane and of these, seven show increasing reaction norms with increasing pH (Supplementary Information 1 Figs. S11 and S12). The ETS of freshwater invertebrates is dependent on external pH, whereby its activity is reduced at lower pH 94 . The relationship between this module and pH variation is further corroborated when analyzing its Pfam and COG family distributions ( Supplementary  Fig. S12b, Fig. 2d). Higher frequencies of proteins with a zinc finger (LIM type) domain, filamin/ABP280 repeat structure and a gelsolin-like domain such as galectin-8 and a high frequency of proteins related to the Z: cytoskeleton family (n = 22) likely reflect the pH-dependent regulation of the structure and contractility of the actin-based cytoskeleton, as proteins with these domains and within this family are involved in the control of cytoskeletal function [95][96][97][98] . These actin-binding proteins and their ability to cross-link actin filaments is pHdependent [99][100][101] and their regulation of the cytoskeleton represents an integral response of organisms to various environmental contexts including changes in pH 102,103 . Proteins in the "green" module, associated with variation in oxygen concentration, were characterized by high frequencies of heat shock protein 70 family and insect cuticle domains (Supplementary Fig. S13a) and were predominantly associated with the COG family O: Post-translational modification, protein turnover, and chaperones (Fig. 2e). Translated, unfolded proteins require oxygen to form disulfide bonds 104 which may explain why changes in oxygen levels correlated with changes in abundance of proteins associated with unfolded protein metabolic activity. The relationship between the insect cuticle and oxygen concentration is less understood but our data points to key proteins involved in regulating cuticle characteristics such as permeability in response to varying oxygenation of freshwater, potentially to release reactive oxygen species (ROS) 105 or to facilitate penetration of dissolved oxygen 106 . Multiple enriched GO terms such as oxidoreductase activity (GO:0016491), oxidoreductase activity, acting on CH 2 -OH group of donors (GO:0016614) and oxidation-reduction process (GO:0055114) and GO terms related to externally-induced oxidative stress such as response to oxidative stress (GO:0006979) and response to abiotic stimulus (GO:0009628) were enriched for this module, providing further evidence for this relationship. The relation between GO enrichment of intestinal stem cell homeostasis (GO:0036335) and regulation of stem cell differentiation (GO:2000736) in this module and varying oxygen levels of springs suggests that C. irrorata larvae actively regulate oxygen levels in the low oxygen/ ROS niches in which insect stem cells reside, potentially to avoid the effects of ROS on stem cells such as DNA damage and senescence 107,108 .
Overall, the outlined functional relationships indicate the coordinated regulation of protein abundances in response to abiotic variation, validating the present approach as a way to connect protein dynamics with changes in abiotic conditions. Given this functional relationship between pH variation and the "yellow" module, testable hypotheses about adaptation and phenotypically plastic responses to pH variation can be formulated. For example, the enriched GO term retina layer formation (GO:0010842) in the "yellow" module may reflect a previous observation that retinal gene expression changes in response to a drop in local pH 109 . The module contained e.g. paxillin (PF00412) and obscurin (Unc-89), a signaling (COG: T) protein containing a Immunoglobulin I-set domain (PF07679), whose deletion affects eye development in zebrafish 110 . This finding hints at the pH-induced regulation of stemmata physiology in the Trichoptera larval ocelli, a potential adaptation to live underwater.
Variation of biomarker proteins. A consequence of the relation between adaptive physiology and environment is that baselines and activity levels of protein biomarkers may shift according to abiotic variation of the studied system. Eco-toxicological studies showed that enzymatic variability across time and/or space is often not fully explained by stress factors such as pollution 111,112 . Additionally, abundance measures of protein biomarkers are regularly applied in eco-toxicological research including freshwater invertebrates 113 . Members include e.g. the cytochrome P450 mixed function oxidases and enzymes such as catalase and filamin-A [114][115][116][117] . Catalase, filamin-A, ATP-citrate(pro-5)-lyase and the two cytochromes Cyp62a and Cyp4c1 exhibited significant reaction norms in relation to geographic elevation (Supplementary Fig. S14) www.nature.com/scientificreports/ ticides such as dichlorodiphenyltrichloroethane (DDT) [118][119][120] . Reaction norms of these two cytochromes over an elevation range hint at a potential presence of organophosphorus insecticides in groundwater-aquifers in Central Europe. Moreover, it might indicate that the contemporary application of organophosphorus insecticides in agricultural practice results in groundwater contamination as anthropogenic input of xenobiotics should increase with decreased elevation and corresponding increased settlement density. The described biomarker norms of reaction exemplify that sources and nature of variability in organism-environment relationships need to be understood and made explicit; i.e. used as information rather than dismissed as noise in eco-toxicological analyses of natural populations.
Adjusting for different baseline abundances. In comparative field-studies that investigate the systems-wide regulation of proteins in response to environmental change (e.g. comparing populations experiencing drought vs. no drought or a shift in mean temperature vs. no shift), the influence of that change on organismal physiology is commonly identified through the log 2 fold-change (FC) values of pairwise comparisons between abundances of identified proteins 121,122 . Our study indicates that these compared abundances are directly influenced by abiotic variation and, therefore, by environmental differences between (sampling) sites (e.g. Fig. 2). Importantly, our data show that these DAPs may precisely be differentially abundant because of differences in abiotic conditions between individual sites. Results of pairwise comparisons may therefore over-, or even underestimate the differences between compared groups if only one studied factor between regions is considered in the sampling design (e.g. polluted vs. non-polluted). Including information on baseline abundance shifts due to confounding variables could, for example, be achieved by reducing the set of DAPs by those that also show abundance increases or decreases with abiotic conditions. One could account for the influence of con-founding abiotic variables by subtracting the coefficients of the regression model from the original abundances used as input for e.g. LIMMA differential analysis, followed by a new differential analysis on adjusted abundances. This value should then be representative of the protein abundances of populations independent from the influence of pH differences between regions. Since we did not investigate regional differences such as pollution specifically, and, therefore, do not expect abundance differences to be associated with other than abiotic differences, future studies applying such an adjustment are warranted.
Temperature tolerance. Despite being a key environmental variable in determining the physiology of ectothermic organisms, in situ water temperature was not a key abiotic factor regulating a protein-network. In contrast, proteins regulating the cytoskeleton appear to be related to past extreme climatic conditions. "Green BC " eigengene expression increased with higher minimum extreme temperatures and annual mean temperature (Fig. 5). Cytoskeletal integrity is perturbed by a variety of cell stress responses and plays a crucial role in maintenance of cell homeostasis 123,124 . This relationship echoes previous findings that show cytoskeletal regulation in response to temperature change via e.g. associated cryoprotective dehydration [102][103][104][105][106] . Population-wide abundance changes of cytoskeletal proteins in response to previously experienced thermal extremes indicates heritable protein abundance variation between populations attributable to heritable gene expression variation 125,126 , supporting the notion that extreme environmental temperatures exert a strong selective pressure on ectothermic species 127 . In support of this finding, the "green BC " and "pink BC " modules, both with high numbers of proteins associated with cytoskeletal functioning (Supplementary Fig. S17), show decreasing eigengene expression with increasing temperature seasonality, i.e. the standard deviation of the mean monthly temperature. These results indicate that cytoskeleton regulation is integral to organismal functioning when organisms experience less-variable temperature regimes (i.e. longer stretches of warmer or colder temperatures). Integral parts of the cellular stress response include the evolutionary conserved heat-shock-and cold-shockresponse (HSR and CSR), characterized by the expression of heat-shock proteins (Hsps). Putatively cold-adapted species have been shown to constitutively express Hsps to facilitate protein folding at low temperatures 128,129 , a phenomenon also observed in C. irrorata 46 . Here, we observed constitutive and increasing but also decreasing Hsp reaction norms with increasing in situ temperatures ( Fig. 4e; Supplementary Fig. S15). Chaperones with decreasing norms of reaction such as Hsp83 and Hsp90 might be integral for alleviating cold-induced protein denaturing and increased protein folding time. These patterns indicate evolutionarily sub-divided roles within the Hsp gene family in putatively cold-adapted species, with certain Hsps functioning during HSRs and others during CSRs. In light of the present data, chaperonins belonging to the Chaperonin Cpn60/TCP-1 protein family such as Hsp68 could be indicative of cold-adaptation of C. irrorata, especially since chaperonins govern cell growth at cold temperatures and springs show constant cold temperatures 79,130,131 . Species from such stable and cold environments might be ill-adapted for climate-induced warming of aquatic ecosystems 132,133 . Somewhat counterintuitively, recent cellular evidence indicates that, for certain aquatic species, the stable cold environment might be a conditioning factor such that they evolved to be able to protect themselves from temperature-induced cellular damage 129,134,135 . Limitations and implications for the study of natural populations. This work assumes that if two populations have a different quantity of protein it means that the proportion of individuals which express that protein differs between populations. Inter-individual differences are not only pronounced but directly influence the population-level outcome of environmental change 136,137 . Consequently, even in the absence of a significant population-level response to the environment, some individuals may still respond plastically to changing conditions. Due to this strong inter-individual variation, it may be worthwhile to apply omics-wide assessments on pooled samples directly, if the goal is to obtain a "screening" of the physiological status of populations. Overall, this approach can help keeping sample sizes smaller than inter-individual studies, and, as shown in this study, does not apparently mask abundance patterns in response to environmental variation including established Scientific RepoRtS | (2020) 10:15538 | https://doi.org/10.1038/s41598-020-72569-4 www.nature.com/scientificreports/ biomarker proteins. The systematic characterization of such relationships in a non-targeted approach, using a non-model organism, has the potential to give unprecedented insights into the molecular functioning of a variety of ecologically relevant species. We further recognize that there are many other sources of confounding factors that complicate the interpretation of the observed molecular responses. Our study design did not account for biotic influences on molecular profiles such as predator presence 138 , parasitism status 139 and microbial differences between sites and individuals or populations 140,141 . Further, since we used whole-animal lysates, information on GO term enrichment at certain life stages or tissues were likely missed. Lastly, information on other abiotic and biotic factors such as prevailing thermal conditions or co-abundant species are additionally needed to further refine baseline levels of protein abundances and to understand how natural environmental conditions might mask and alter the molecular responses of natural populations.

conclusion
The primary findings of this study are that the variation in proteome profiles of pooled population samples are attributable to abiotic but also geographic and past climatic differences between sites. Variables such as pH, oxygen concentration, past thermal minima and seasonality "explained" a part of the variation and need to be considered when comparing sites experiencing differential environmental change. Additionally, protein families related to energy allocation, cellular information processing and PTM regulation contribute to proteome divergence overall and likely play a vital role in physiological adjustment to varying abiotic conditions. More fundamentally, protein norms of reaction are ubiquitous and often reflect the functional roles these proteins have within multicellular organisms. Additionally, signatures of local adaptation (e.g. warming temperatures or differences in oxygenation) can be gleaned through such comparative analyses, whereby abundance increases of cytoskeleton-related proteins in populations experiencing warmer temperatures at a historical timescale are just one example. This considerable variability in protein abundance patterns likely distorts abundance differences in comparative field studies if abiotic differences are not accounted for, potentially leading to over-or under-estimation of environmental change experienced by organisms. Integrating this natural variation appears to be an important step in systems-wide biomarker development and for disentangling the differences between populations experiencing differential environmental change.

Data availability
We provided all raw, transformed and filtered data required to interpret, replicate, and build upon our findings in the Supplementary Information files 1 and 2 (combining quantitative and annotation information). An R script, detailing all commands used in data analysis can be found in the "Supplementary Information". Mass spectrometry raw data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository 142 with the data set identifier PXD017959.