Effects of cold acclimation and dsRNA injections on Gs1l gene splicing in Drosophila montana

Alternative splicing, in which one gene produce multiple transcripts, may influence how adaptive genes respond to specific environments. A newly produced transcriptome of Drosophila montana shows the Gs1-like (Gs1l) gene to express multiple splice variants and to be down regulated in cold acclimated flies with increased cold tolerance. Gs1l’s effect on cold tolerance was further tested by injecting cold acclimated and non-acclimated flies from two distantly located northern and southern fly populations with double stranded RNA (dsRNA) targeting Gs1l. While both populations had similar cold acclimation responses, dsRNA injections only effected the northern population. The nature of splicing expression was then investigated in the northern population by confirming which Gs1l variants are present, by comparing the expression of different gene regions and by predicting the protein structures of splices using homology modelling. We find different splices of Gs1l not only appear to have independent impacts on cold acclimation but also elicit different effects in populations originating from two very different environments. Also, at the protein level, Gs1l appears homologous to the human HDHD1A protein and some splices might produce functionally different proteins though this needs to be verified in future studies by measuring the particular protein levels. Taken together, Gs1l appears to be an interesting new candidate to test how splicing influences adaptations.


Supplementary information: Description of the data analysis of the transcriptome data from which only the information of Gs1l gene was used in this study.
Sample collection 21 days old cold acclimated and non-acclimated (control) D. montana female and male flies (treatments as described in the methods of the manuscript) were collected from the Korpilahti (Finland, 62°00'N, 25°34'E) mass-bread population for RNA extractions (done in the same way as described in the methods). We used three replicates each consisting of pooled sample of 3 flies and library preparation was carried out according to Illumina TruSeq® Stranded mRNA Sample Preparation Guide. Samples were sequenced with Illumina TruSeq technology with 150 bp forward and reverse reads using Finnish Functional Genomics Center (FFCG) in Turku (https://www.btk.fi/). For each of the 12 samples, a total of 60-90 M paired-end reads were sequenced.

Quality checking and trimming
The quality of raw reads was checked with FastQC 0.11.4 1 . Illumina TruSeq Adapters ('Illumina adapter sequences' 2009) were detected and removed with CutAdapt 1.9.2.dev0 (with Python 2.7.6) 2 . Quality was rechecked with FastQC, after which reads were trimmed to length of 140 bp with Trimmomatic 0.35 3 . Next, bases falling below a quality score 30 were cut from the beginning (LEADING:30) and the end of the read (TRAILING:30). Finally, 17 bp wide reading frame, "a sliding window", was slid through the read starting from the 5'-end, and when average quality within the sliding window dropped below 19, the rest of the read was discarded (SLIDINGWINDOW: 17:19). After this step, reads shorter than 85 bp were discarded (MINLEN:85), in order to minimize expression bias by short, aggressively trimmed reads 4 . On average, a total of 6 % of the reads were lost in the trimming steps.

Mapping the reads to the genome
Before mapping, a bowtie2-index 5 was built. Then Tophat 2.0.13 6 was used for mapping with --readedit-dist and -N 4 with respect to D. montana genome with its annotation, for each sample in each lane separately (12 samples x 5 lanes = 60). Following this read mapping step, the mapping percentages were inspected and when no discrepancies were detected the mapped reads files of different lanes were merged within samples with SAMtools merge 1.3 7 . Mapping success was 70-81 %.

Constructing the annotation
The annotation used in this study had been constructed with RNA-seq reads from two previous studies 8,9 . To build a custom transcript-annotation, where the majority of transcripts from the RNAseq experiment were included, Cufflinks 2.2.1 10 was used as follows. Aligned reads and annotation served as inputs to Cufflinks and it was run separately for each sample. Then the sample-specific annotations were combined to construct a common annotation for all the samples with cuffmerge 2.2.1. The few transcripts whose direction (+ or -) Cufflinks could not determine (Cufflinks symbol) were converted to forward (+) direction for convenience. Transcripts having non-canonical bases at the splicing sites were assumed to be technical noise rather than result of actual non-canonical splicing. Thus, these transcripts were filtered out using gffread, a program included in the Cufflinks package, with canonical requirement -N and including the genome, option -g. Also, strandedness of the transcripts was checked and transcripts overlapping features in the original annotation but going to the different direction were discarded. The overlap between cufflinks-transcripts determined from the RNA-seq data and genes in the original annotation was inspected with bedtools 2.17.0 11 , and a number of transcripts spanning two or more annotated genes were found. Those were assumed to be of non-biological origin and 12,13 .
A final step in constructing the transcript-annotation was removal of rRNA transcripts. The rRNA sequences were obtained from D. virilis ("dvir-all-miscRNA-r1.2.fasta" from FlyBase 14 and blasted (v. 2.3.0) with blastn (provided by NCBI Resource Coordinators, ww.blast.ncbi.nlm.nih.gov) against the transcript-annotation. rRNA transcripts were then removed from the annotation. This final transcript-annotation was then used as a reference to remap quality-trimmed reads to the genome with Tophat with the same parameters as before. Then, the potential genes were extracted from this transcript-annotation with gffread, forming gene-annotation to be used for counting reads mapped to genes. Results using the updated annotation were compared to those conducted with the original annotation to assess the impact of the new annotation. Finally, a reference for transcript calculation was constructed from the updated annotation with RSEM 1.3.0 15 with rsem-prepare-reference.

Counting the reads and filtering
Read counts for genes were obtained separately using HTSeq (v. 0.6.1) 16 with option --stranded=no, using the corresponding prepared annotation described above. Read counts for transcripts were obtained with RSEM (v. 1.3.0), using trimmed reads and prepared reference as inputs. The genes and transcripts were filtered according to the read counts removing those with negligible expression and those whose expression was limited to one sex. A generalized, modified approach of maximumbased methods was chosen: the threshold was counts per million (cpm) greater than 1 for at least two samples of each sex and cpm greater than 0 for every sample. Features with lower expression were excluded from further analysis. This relatively harsh filtering was applied in order to improve false discovery rate (FDR) control 17 .

Differential expression (DE)
Filtered read counts were used as inputs for DE analysis between control/acclimated and male/female flies. Differential expression was calculated using R with Bioconductor package edgeR 18 . First, samples were grouped according to their gene counts with principal coordinate analysis, including all the (filtered) genes (edgeR function plotMDS with top = Inf), to visualize and confirm the split between sexes and acclimation treatment. To check the quality of data, betweensample variance i.e. dispersion in gene expression was then examined with respect to expression levels and visualized with edgeR function plotBCV with default parameters. Read counts were normalized using the TMM (trimmed mean of M-values) method 19 . Then a generalized linear model (GLM) with a negative binomial distribution was fit to the data. Factors included were sex and treatment (cold acclimated or control). A within-gene between-sample variance parameter, tag wise dispersion, was individually estimated for each gene. Tag wise dispersion was then used in a GLM likelihood ratio test to determine if a gene was significantly down-or upregulated between cold acclimated and control flies, i.e. DE in cold acclimation. P-values from the GLM likelihood ratio test were multiple tests corrected using Benjamine and Hochberg's equation to control false discovery rate (FDR) 20 with a significance threshold level of < 0.05.
For transcripts, the DE analysis in cold acclimation was similar to the gene level analyses described above, but after calculating p-values for individual transcripts with edgeR, the function perGeneQValue from R-package DEXseq 3.5 21,22 was used to convert significance values to the gene-level, to avoid overestimating the significance values of genes with more transcripts. This approach captured genes with differential transcript usage 23 , but it also included genes with differential expression at the gene level, without any differential transcript usage.   Figure S3) are given in parentheses. Intron 2 area included in variant T1 is marked with grey shading.       Table S2. Sample sizes for the Critical thermal minimum (CTmin) analysis. Fly treatments are dsRNA injection ("dsRNA"), control with no injection ("C(NI)") and control with a Ringer buffer injection ("C(B)"). Fly treatments are dsRNA injection ("dsRNA"), control with no injection ("C(N)") and control with a Ringer buffer injection ("C(B)").