Regional heterogeneity impacts gene expression in the subarctic zooplankter Neocalanus flemingeri in the northern Gulf of Alaska

Marine pelagic species are being increasingly challenged by environmental change. Their ability to persist will depend on their capacity for physiological acclimatization. Little is known about limits of physiological plasticity in key species at the base of the food web. Here we investigate the capacity for acclimatization in the copepod Neocalanus flemingeri, which inhabits the Gulf of Alaska, a heterogeneous and highly seasonal environment. RNA-Seq analysis of field-collected pre-adults identified large regional differences in expression of genes involved in metabolic and developmental processes and response to stressors. We found that lipid synthesis genes were up-regulated in individuals from Prince William Sound and down-regulated in the Gulf of Alaska. Up-regulation of lipid catabolic genes in offshore individuals suggests they are experiencing nutritional deficits. The expression differences demonstrate physiological plasticity in response to a steep gradient in food availability. Our transcriptional analysis reveals mechanisms of acclimatization that likely contribute to the observed resilience of this population.


Fig. 3. Relative gene expression in in individual Neocalanus flemingeri CVs.
Heatmap of z-scores of differentially expressed genes (DEGs, rows) that did not annotate at an E-value 1e-05 or lower (n=3,365). Color-coding for each gene indicates the magnitude of differential expression calculated as the z-score (scale bottom right). Each column represents relative expression of an individual with collection station identified above. All stations include three individuals with the exception of GAK9 (see Fig. 4B, Supplementary Figure 1). Genes were ordered by similarity of expression pattern as shown by the dendrogram (left). DEGs were identified by GLM test with p ≤ 1.5 after FDR correction.

Supplementary Tables
Supplementary Table 1

. Summary of RNA-Seq results for Neocalanus flemingeri CV individuals collected at sample locations within Prince William Sound and
Gulf of Alaska. RNA-Seq was performed on pre-adults (copepodite stage CV) collected from six stations (three individuals per station) within Prince William Sound (PWS) and Gulf of Alaska (GAK). For each individual, number of RNA-Seq reads, highly-quality reads and NCBI accession numbers are provided. Raw reads have been deposited at National Center for Biotechnology Information (NCBI) under Bioproject No. PRJNA496596.

Stations
Sample

Development of de novo assemblies and of a reference transcriptome
A separate assembly was generated for each of the 18 individuals using Trinity software (v. 2.0.6) on the National Center for Genome Analysis Support's (NCGAS; Indiana University, Bloomington, IN, USA) Mason Linux cluster. The initial parameters of Trinity were set to: --seqType fq --CPU 32 --max_memory 200G --min_contig_length 300 --normalize_max_read_cov 50. The minimum sequence length in the assembly was set to 300 bp. For each assembly, summary of the statistics was obtained using the script TrinityStats.pl (v2.0.6). Each assembly was tested for completeness using "Bench-marking universal single-copy orthologs" (BUSCO) software (v1.22) to identify "core genes", single copy genes highly conserved among eukaryotes, and thus expected to be present in a complete assembly. BUSCO analysis was performed using the Arthropoda dataset consisting of 2,675 single-copy orthologs.
These individual assemblies were used in the genetics analysis using genetic markers (three mitochondrial and one nuclear marker).
The de novo assembly from a GAK1 individual (GAK1-S83R1) was selected as the reference transcriptome to identify differentially expressed genes (DEGs).
Annotation of transcripts and functional analysis were performed in different steps.

Supplementary Results
The assembly statistics for the 18 de novo assemblies were comparable with similar number of transcripts and N50 lengths (Supplementary data 4). In all assemblies, the longest transcript exceeded 20,000 bp with the exception of two individual (GAK9 and GAK14; Supplementary data 4). BUSCO analysis identified approximately 40% complete orthologs in each assembly. On average, 10-12% of these genes present as more than one copy (duplicated) and an additional 12-16% of core genes were fragmented in each assembly (Supplementary data 4). The number of missing core genes ranged from 25% to 45% (Supplementary data 4).  Fig.   2A). The majority (60%) of the transcripts with KEGG annotations were involved in metabolic pathways including amino-acid biosynthesis, lipid and carbohydrate metabolism, and nucleotide metabolism. Mapping rate for the 18 RNA-Seq libraries to the GAK1 reference transcriptome was high and ranged between 82% and 95% with an average of 91.5% ( Supplementary Fig. 1). Ambiguous mapping which was ~58% (reads that aligned > 1 time), is likely to be a result of the large number of multiple isoforms assembled by Trinity.