Differential gene expression of Australian Cricotopus draysoni (Diptera: Chironomidae) populations reveals seasonal association in detoxification gene regulation

Understanding the molecular mechanisms of organismal response to human-derived ecosystem change is recognised as a critical tool in monitoring and managing impacts, especially in freshwater systems. Fundamental to this approach is to determine the genes involved in responding to ecosystem change and detect modifications to their expression and activity in natural populations. Potential targets for this approach include well-known detoxification genes that are upregulated in response to stress. Here, we tested whether expression of such genes varied in association with differences in ecosystem health and could be detected in the field. We sampled populations of the freshwater midge, Cricotopus draysoni, from two geographically proximate sites in southeast Queensland, Australia, which differed in their ecosystem health, at multiple time points. We assessed transcriptome-level differential gene expression and predicted greatest differential expression between sites, associated with organismal responses to local physico-chemical factors. In contrast, we observed a clear and dramatic difference in gene expression – including of known detoxification genes – between time points, specifically between periods at the start and end of the austral summer rainfall when in-stream water levels are most different. These data suggest that these waterways experience greatest pollution load when water levels are high following rainfall events.

Recent advances in genomic sequencing technologies have enabled researchers to ask deeper questions about the interactions between genes and the environment than ever before 1 . Of particular importance is understanding how organisms respond and adapt to human-driven ecosystem change 2 . In freshwater habitats especially, monitoring ecosystem responses to changes in, for example, surrounding land use, pollution, and land clearing, is critical for appropriate management and protection. Whilst harnessing gene-environment interactions is not necessarily a new idea in aquatic monitoring 3 , the last decade has seen a significant shift toward utilising and integrating molecular 'big data' to improve ecosystem assessments. Predominately this has involved using high throughput sequencing technologies to enhance biodiversity estimates via metabarcoding ('Biomonitoring 2.0'- 4,5 ) and tracking adaptive shifts by connecting genomics to ecotoxicology under the so-called ecotoxicogenomic approach 2,6,7 .
Identifying adaptive shifts aims to detect sublethal effects of ecosystem change at an earlier stage than traditional biomonitoring, potentially allowing more effective remediation and management 8 . Fundamental to this approach is to determine the genes and biochemical pathways involved in responding to ecosystem change (e.g., detoxification genes) and, critically, detect modifications to their expression and activity in natural populations 9 . Such genes/enzymes can then be used as biomarkers for monitoring populations. A vast literature base describes genes, GST and/or HSP) in response to potential increased pesticide/herbicide runoff from surrounding agriculture at this site relative to Cedar Creek.

Results
Cricotopus draysoni was sampled equally across both sites and seasons (Table 1), although the species was collected in May 2014 only from Cedar Creek, and in October 2014 only from North Pine River. In total, ten samples were sequenced from each site, with five samples from each season at each site. An additional single pooled sample was used as a deep-sequenced reference to improve assembly quality. Numbers of retained reads post-filtering ranged from 15 M to 25 M, with 1.1 billion reads retained for the pooled reference sequence. Analysis of an initial assembly using all samples suggested the sample CED7 from Cedar Creek was an outlier (Supplementary Fig. S2) and so was removed from further analysis. Re-assembly of the remaining samples resolved 38076 transcripts with a mean contig length of 888 bp and N50 of 1503 (Table 2). Blastx search results for the full assembly (38076 transcripts) returned 25978 and 31306 hits to the SwissProt and UniRef90 databases, respectively. Blastp searches of 24112 ORF's inferred by TransDecoder returned 19269 and 22624 hits to the above databases, respectively. Searches against Pfam and SignalP databases retrieved 18165 and 2233 hits, respectively. Within these results, there were numerous hits to selected detoxification gene groups, with greatest numbers of hits to cytochrome p450 (470 hits to 74 genes) and general esterases (268 hits to 63 genes) (Supplementary Table S1). In contrast, ecdysone receptors (three hits to one gene), cholinesterases (4 hits to two genes) and AChE's (15 hits to one gene) were rarer in the dataset.
PCA of total read counts showed no separation according to site on the first three principal components (first two principal components shown in Fig. 2a). In contrast, a clear division between seasons ('End wet'/'Start wet') is evident on PC1 for total gene counts (Fig. 2b). The trend was the same for the above mentioned detoxification gene groups: read counts were not partitioned by sample site for any of the groups assessed; however, putative seasonal trends were observed among most detoxification gene groups (Fig. 3). All analyses hereafter were conducted to investigate this apparent seasonal trend in gene expression. We constructed new assemblies for each season to ensure observed trends were not associated with differences in the number of resolved transcripts between seasons. No evidence was observed for such differences, with similar numbers of transcripts and quality measures obtained for both seasons (Table 2).
EdgeR analysis of log2-transformed normalized read counts for selected detoxification gene groups (FC ≥ 2; FDR ≤ 0.001) revealed several differentially expressed genes between the seasons, from a total of 400 differentially expressed genes. Specifically, three HSP's, two GST's and an esterase were upregulated in the 'End wet' , whereas a cytochrome p450 and another esterase were upregulated in the 'Start wet' (Table 3). More stringent global tests of gene expression (FC ≥ 2; FDR ≤ 0.00001) resolved 99 genes that were differentially expressed between seasons. Sample correlation matrices supported all samples within each season as highly correlated, with clear differences between seasons (Fig. 4, Supplementary Table S2). Two clusters of genes were differentially expressed between seasons: Cluster 1 (84 genes) was upregulated in the 'End wet' period, while Cluster 2 (15 genes) was upregulated during the 'Start wet' (Fig. 5). Plots of expression values of genes within each cluster were highly similar across  Table S3). Genes that failed to return hits to the SwissProt database also returned no hits to the nr database, and may be novel gene candidates. The only detoxification gene retained in the more stringent analysis was a single putative glutathione S-transferase, which was strongly expressed across all samples in the 'End wet' but near-absent in 'Start wet' samples. Other genes upregulated in the 'End wet' are thought to be involved in immune response (apolipophorin), learning and memory (Ca 2+ /calmodulin responsive adenylate cyclase), metal binding (DnaJ homolog), extracellular matrix (papilin), and transmembrane proteins (post-GPI attachment). In contrast, genes upregulated in the 'Start wet' were largely involved in cell signalling (multidrug resistance-associated protein, FERM and PDZ domain-containing protein, insulin receptor substrate), adhesion (Zasp, cadherin-related family member), and amino acid catalysis (bifunctional glutamate/ proline tRNA ligase). Distributions of GO terms associated with differentially expressed transcripts were clearly different between seasons, showing several GO terms present in the 'End wet' but absent in the 'Start wet' , especially among Cellular Component and Molecular Function categories (Fig. 6).

Discussion
This study set out to extend and test in a field environment the results of laboratory-based ecotoxicogenomic research that demonstrated differential expression of detoxification genes in response to pollutants. We sampled populations of C. draysoni -known to tolerate significant ecological impact -from two locations of divergent stream 'health' at several time points throughout 2014-2015. We expected to observe greatest differential expression between sites, associated with organismal responses to local physico-chemical factors. In contrast, we observed a clear and dramatic difference in expression between time points, notably between the start and end of the austral summer rainfall ('Start/End wet') when in-stream water levels differ most. This included marked  seasonal variation in expression of known detoxification gene groups, with GST supported as most significantly differentially expressed. Taken together, these data imply a distinct seasonality in putative organismal responses to pollution in southeast Queensland and supports the argument that ecotoxicogenomics can shed light on responses that otherwise would be overlooked by traditional biomonitoring surveys. Seasonality in pollution regimes in the Australian subtropics is not uncommon, nor unexpected given the distinct seasonal variation in rainfall. However, the period at which pollution might be expected to be at its peak can vary markedly between river systems 44 . For example, increased levels of pollutants have been recorded entering waterways as runoff from surrounding land during high rainfall events 45,46 . Additionally, pollutants that are retained in sediments can be mobilised by the hydraulic action of increased rainfall/flow, thereby becoming more biologically available during wet periods 47 . In contrast, pollution levels in other systems can peak during dry periods due to low waterway discharge, in which case high rainfall acts to dilute pollutants and flush the Figure 3. Principal Components Analysis of TMM-normalized read counts among samples for selected detoxification gene groups. Analysis was conducted on each gene group by site and season, and plots are paired thus: glutathione S-transferase -site (a), season (b); cytochrome p450 -site (c), season (d); heat shock proteins -site (e), season (f); acetylcholinesterase -site (g), season (h); cholinesterase -site (i), season (j); and general esterases -site (k), season (l). Symbols in plots of analysis by site are: unfilled circles -Cedar Creek, unfilled triangles -North Pine River; and by season: filled circles -end wet, filled triangles -start wet.  Table 3. All differentially expressed putative detoxification genes identified by edgeR analysis between 'End wet' and 'Start wet' season samples (log2; FC ≥ 2, FDR ≤ 0.001). Entries are sorted by log fold change (logFC), and gene names were assigned according to Blast annotations from the SwissProt database (<1e-5).
system 47,48 . Seasonal trends in detoxification gene expression or biomarker activity in various macroinvertebrates has been recorded, especially for AChE, with less variability observed for GST [49][50][51] . When observed together, the two trends can be considered linked: changes to instream pollution concentrations driving changes in the activity of detoxification genes and their products in organisms. This study revealed a significant association between season -as defined by rainfall patterns -and gene expression. Specifically, rainfall peaked in 2014 during February-March before falling sharply prior to the commencement of sampling in April ('End wet' , Supplementary Fig. S1). At this time, water levels in both creeks remained high, but had dropped noticeably by May. The majority of differentially expressed detoxification genes were upregulated during this period; including, GSTs, HSPs and a general esterase. Moreover, the majority of other genes also were upregulated during this season. Rainfall was low throughout the rest of austral Autumn and Winter until a pulse of increased rainfall in August. This period corresponded with lowest water levels at both sites; both were visited and sampled between throughout this period, but C. draysoni was rare 40 . Filamentous green algal growth appeared greatest at both sites during this period: generally a sign of increased nutrient load (personal observation, 2014). Water levels remained low when C. draysoni became more abundant in October -just prior to the onset of austral summer rain ('Start wet') -but had increased by the final collection, made in January 2015. This period of lower (but increasing) water levels was associated with fewer differentially expressed genes overall, mostly characterised by developmental genes. These data suggest that these waterways experience greatest pollution load when water levels are high following rainfall events, caused by either runoff from surrounding agricultural areas, or by disturbance of sediments.
The lack of a spatial trend in either general gene expression patterns or in those for specific detoxification genes rejects our initial hypothesis that gene expression would be correlated with sample site, and hence ecosystem health. Specifically, we expected that detoxification genes would be upregulated in North Pine River, associated with the putative increased level of impact at this site relative to Cedar Creek; however, this was not observed. This contrasts with previous studies that have demonstrated spatial variation in detoxification gene expression both between uncontaminated sites 52 and between sites that differ in toxicity level 53 . Our data suggests instead that the influence of seasonal trends in rainfall at both sites overrides any spatial differences between them. However, these results also demand re-evaluation of our initial assumptions about the difference in ecosystem health between sites. We relied on previous water quality and ecosystem health assessments conducted across the catchment over the past 17 years [41][42][43] . Cedar Creek was consistently healthier than North Pine River at the sampled sites, although the former has degraded since monitoring began in 2001, while the latter has remained the same 43 . Possibly Cedar Creek has deteriorated further since the last monitoring assessment in 2012, which makes the lack of spatial pattern in the current study even more important as it suggests that the two sites may now be equivalent in stream health. This raises some concerns over waterway management in the upper Cedar Creek catchment and warrants further investigation. Given that the current sample site is located near the headwaters, close to the D' Aguilar National Park boundary (Fig. 1), likely sources of impact should be relatively easy to identify. Aside from putative detoxification genes, seasonal differences in expression revealed no immediately obvious biological patterns. There were more than five times as many differentially expressed genes in the 'End wet' season than during the 'Start wet' period. In both seasons, differentially expressed genes ranged across a variety of molecular pathways, with a majority of genes in the 'Start wet' associated with Biological Process GO categories, particularly general developmental pathways. All larvae were size-selected to control for age (instar), so this seasonal trend in developmental genes is not related simply to life stage. Possibly there were different nutrient profiles present between seasons, which may have driven different molecular pathways to be upregulated. These data indicate other growth and developmental responses to changes in the instream environment associated with rainfall, but this requires further study to understand what the respective roles of these genes are. Moreover, several transcripts were differentially expressed between seasons, but did not match any genes in SwissProt, Uniref90 or nr databases. These may represent novel chironomid genes that too respond to environmental change associated with rainfall and should be a focus of future research in this area.
There are some critical caveats relevant to this study. Firstly, we lacked actual water quality data for either site during the sampling period, which limits the strength of our inferences of the relationship between rainfall,  o rg a n e lle p a rt sy n a p se sy n a p se p a rt a n tio xi d a n t a u xi lia ry tr a n sp o rt p ro te in b in d in g ca ta ly tic e le ct ro n ca rr ie r e n zy m e re g u la to r m o le cu la r tr a n sd u ce r st ru ct u ra l m o le cu le tr a n sc ri p tio n re g u la to r tr a n sl a tio n re g u la to r tr a n sp o rt e r a n a to m ic a l st ru ct u re fo rm a tio n b io lo g ic a l a d h e si o n b io lo g ic a l re g u la tio n ce llu la r co m p o n e n t b io g e n e si s ce llu la r co m p o n e n t o rg a n iz a tio n ce llu la r p ro ce ss  pollution, and gene expression. Further, we lack data at both sites concerning sediment pollutant levels, and monitoring the episodic mobilization of sediment-bound pollutants requires near-continuous sampling to be informative 54,55 . Taking water and sediment samples at the same time as insects were sampled would have added strength to the interpretation of patterns in this study, but unfortunately outside the scope of this project. However, given the overall robustness of the historical monitoring data upon which we based our site choice, our initial assumptions were sufficient to formulate testable hypotheses. Secondly, estimating gene expression is not always equivalent to actual protein production, as translation may be inhibited prior to complete protein formation 56 . This means that, although a gene may be expressed or upregulated, the gene product may not be biologically active. This is a common assumption that underlies all gene expression studies, and only further proteomic analysis will resolve this issue 57 . Nevertheless, we believe the results presented here are robust: all analyses support a strong seasonal trend in general and detoxification gene expression.
In the context of understanding how tolerance to pollution has evolved in Cricotopus, this study suggests that in C. draysoni both well-known detoxification gene pathways and putative novel genes may be involved in responding to seasonal changes in water quality. Given that the ability to tolerate in-stream pollution was supported as ancestral to the genus 37 , and that expression profiles of co-located species were near-identical 40 , this suggests that potentially all Cricotopus species possess these well-known genes. However, in more ecologically sensitive species (e.g., C. hillmani Drayson & Cranston 2015), detoxification gene expression may be reduced or inhibited, thereby limiting their ecological tolerance and restricting their distribution. Future work should focus on increasing the transcriptomic knowledge of this genus, both by exploring differential expression within species co-located with C. draysoni (e.g., C. albitarsis, C. parbicinctus), and by assessing expression profiles of more sensitive species, to identify genes and pathways common across species versus those associated only with tolerance or sensitivity.
In conclusion, this study has provided strong evidence for the differential expression of detoxification genes in larval chironomid populations associated with seasonal rainfall trends in southeast Queensland. We detected several HSP's, GST's and a general esterase that were upregulated when water levels were high, whereas a cytochrome P450 and another general esterase were upregulated during drier periods. We propose that this reflects increased runoff from surrounding areas entering streams during periods of higher rainfall. The lack of any observed correlation between expression patterns and ecosystem health may reflect deterioration of what was considered the healthier site (Cedar Creek). Together, this study demonstrates an important phenomenon -seasonal variation in sublethal impacts on aquatic macroinvertebrate populations -that must be considered by future monitoring studies.

Methods
Field collections were conducted between April 2014 and January 2015 at Cedar Creek and North Pine River in southeast Queensland, Australia. As noted in Krosch (2017) 40 , rainfall data was concordant between sites over the sampling period, including a late-winter pulse in rainfall in mid-August (Bureau of Meteorology, www.bom. gov.au, accessed July 2016; Supplementary Fig. S1). Sampling occurred at multiple time points over a ten month period, structured especially around the beginning and end of the austral summer rainfall peak when water levels are highest at both sites (Table 1). This provided a set of biological replicates for among-site comparisons to control for within-site variation, and also allowed assessment of trends associated with season/rainfall. Samples were thus segregated for analysis accordingly: April-May 2014 = 'End wet'; October 2014-January 2015 = 'Start wet' .
Here, we utilise RNAseq data for C. draysoni obtained in a previous comparative transcriptomic study to explore patterns of differential expression within-species 40 (BioProject PRJNA350713, Transcriptome Shotgun Assembly Accession GFNI00000000, Short Read Archive Accessions in Table 1). Details of specimen collection, preservation, processing, RNA extraction, sequencing and read filtering are described in Krosch & Bryant (2015) 58 and Krosch (2017) 40 . Cleaned reads were assembled with Trinity Version 2014-04-13pl package 59 . Assembly quality was assessed using distributed scripts in Trinity to calculate number of contigs, N50 and median contig length. Additionally, transcriptome completeness was assessed via BUSCO analysis 60 against the distributed set of arthropod single-copy orthologous genes, and by estimating the number of full length transcripts using Blastx searches against the SwissProt database. Open Reading Frames (ORFs) were inferred with TransDecoder Version 2.0.1 59 . Transcripts and protein annotation was conducted via searches against the Pfam database 61 using HMMER 62 , the SignalP Version 4.1 database 63 , and the SwissProt and Uniref90 databases using Blastx and Blastp searches 64 , and compiled into a report as per the Trinotate Version 2.0.2 pipeline 59 (Supplementary Dataset 1). Gene Ontology (GO) terms were assigned by Trinotate to each transcript according to HMMER, SignalP, Blastx and Blastp search results. Principal Components Analysis (PCA) among samples was conducted using the PtR script distributed with Trinity based on TMM-normalized read counts for all transcripts, and for subsets of read counts for transcripts that returned Blast hits to specific detoxification gene groups (GST, cytochrome p450, HSP, ecdysone receptor, AChE, cholinesterase and general esterases).
We conducted differential expression analysis using the edgeR pipeline within Trinity, based on log2-transformation of expression counts. Expression results for putative detoxification gene groups were extracted based on Blast annotations, and those with Fold Change (FC) ≥ 2 and False Discovery Rate (FDR) ≤ 0.001 were reported. A more stringent (FC ≥ 2; FDR ≤ 0.00001) global test of differential expression was also conducted to identify other major molecular pathways that contributed to any overall seasonal trend in gene expression. Sample correlation matrices and heatmaps of differentially expressed genes versus samples were produced using PtR. Blast annotations for differentially expressed genes were extracted from the global Trinotate report; any transcripts that returned no hits to SwissProt or Uniref90 databases were searched manually against the nr database using the online Blastn portal (https://blast.ncbi.nlm.nih.gov/Blast.cgi -Accessed April 2017). GO terms were extracted for genes that were differentially expressed at each time point and their distribution across GO categories was compared using the online tool WEGO 65 .
Scientific RepoRts | 7: 14263 | DOI:10.1038/s41598-017-14736-8 Data availability. All sequence data, including raw reads and assemblies, originated from a prior study by the first author and are available on GenBank. Accession numbers for Short Read Archive and Transcriptome Shotgun Assembly database entries are provided in text. The Trinotate report of Blast/Pfam/SignalP search results and gene ontologies is provided in Supplementary Dataset 1, and differential expression data are available from the corresponding author on reasonable request.