Background & Summary

Dreissena polymorpha and D. rostriformis bugensis, known as the zebra and quagga mussels, have been model species in ecology and ecotoxicology since the early 1980s. These invasive species are now commonly found in freshwaters of the northern hemisphere. D. polymorpha is listed among the 100 of the World’s Worst Invasive Alien Species identified by the IUCN (http://www.iucngisd.org/gisd/100_worst.php). It spread from the Ponto-Caspian basin to northern and western Europe in the 19th century, and to North America and the Middle East (Turkey) in the late 20th century, while D. r. bugensis colonized both North America and Western Europe between the late 20th and the beginning of the 21st century1,2. Both species can reach high densities in invaded ecosystems3, causing substantial ecological and economic damage4,5. Their distribution and invasion dynamics are well documented, as are the ecological features of colonized ecosystems. Dreissenids have thus become model species in invasion ecology, used to identify invasion pathways and the genetic mechanisms associated with colonization at various spatial and temporal scales.

Their ability to tolerate a wide range of environmental contaminants and their presence in almost every major hydrosystem in Europe and North America have also led dreissenids to become model species for ecotoxicologists, who identified them as the freshwater counterpart of Mytilus6. Indeed, being abundant, sessile filter feeders, these mussels bioaccumulate contaminants present in the water column and are good candidates for inclusion in biomonitoring programs. They make it possible to evaluate both the presence of contaminants, through bioaccumulation measurements7,8, and their effects, through the use of biomarkers, either in the field or in the laboratory9,10,11.

A bibliographic search with the keyword “dreissena” performed on the Web of Science on 22 January 2019 revealed that 3,667 articles had been published since 1990, with ca. 180 articles each year since 2010. Over the same period, these articles were cited 97,000 times, with an h-index of 119. Their distribution among scientific topics (Fig. 1) reflects this dual interest in ecology and ecotoxicology. However, of the eleven BioProjects present in NCBI for Dreissena, most focus on foot, gill or mantle tissues, whereas few concern the digestive gland. Moreover, the majority of these studies are not de novo RNA-seq but microarray studies, and almost none of them made sequences available in accessible repositories. In the present study, the digestive gland was chosen for its importance in ecotoxicology, owing to its central roles in detoxification and energy metabolism12.

Fig. 1

Top 10 fields treemap for Dreissena publications.

Indeed, genetic studies on zebra and quagga mussels have mainly addressed their spread dynamics from a population genetics perspective13,14,15, the development of genetic markers for species identification, and bivalve phylogeny and phylogeography13,16,17,18,19. When the focus of a study was to elucidate toxic effect pathways, only a very limited set of genes was targeted, corresponding to the main processes involved in the organism’s response to contaminants20,21.

Studies of several populations of the two species can pave the way for several new research possibilities. The most promising and useful ones are to make available a reference transcriptome for proteogenomic studies, to identify sequences for developing new sets of biomarkers, and to better understand the acclimation mechanisms occurring during the invasion process and involved in tolerance to contamination.

Methods

Sampling and RNA extraction

Sampling was performed at 12 sites representative of the diversity of French dreissenid populations. Sites were chosen according to their varying contamination levels, physico-chemical quality and topographic characteristics (see Supplementary Table 1). Five bivalves were sampled per population, and digestive glands were dissected in situ and stored in RNAlater (Qiagen) at 4 °C. Total RNA was extracted using the RNeasy MinElute kit (Qiagen) according to the manufacturer’s protocol with slight modifications. Briefly, tissue was disrupted first with a pestle and then with a bead beater with glass beads (200 µm) for 1 min at maximum speed in 1 mL of RLT buffer supplemented with 40 mM DTT. Samples were centrifuged for 3 min at 20,000 × g. The Qiagen protocol was followed for the subsequent steps. Genomic DNA was removed by on-column DNase digestion and total RNA was eluted in ultrapure water. RNA purity and quantity were assessed by OD measurements (OD at 260 nm and OD ratios 260/280 and 260/230) and RNA integrity was checked using a Bioanalyzer 2100 (Agilent, CA, USA). Total RNA samples were stored at −80 °C. Samples were sent to the GeT-PlaGe core facility on dry ice.

RNA sequencing

RNA-seq was performed at the GeT-PlaGe core facility, INRA Toulouse. RNA-seq libraries were prepared according to Illumina’s protocols using the Illumina TruSeq Stranded mRNA sample prep kit. The 79 individuals were sequenced separately on 8 lanes of one flow cell. Briefly, mRNA was selected using poly-T beads. RNA was then fragmented and used to generate double-stranded cDNA, and adapters were ligated for sequencing. Eleven cycles of PCR were applied to amplify the libraries. Library quality was assessed using a Fragment Analyzer and libraries were quantified by qPCR using the Kapa Library Quantification Kit. Sequencing was performed on an Illumina HiSeq3000 using a paired-end read length of 2 × 150 bp with the Illumina HiSeq3000 sequencing kits.

De novo assembly and annotation

Read quality of the RNA-seq libraries was evaluated using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc). Reads from 45 D. polymorpha and 34 D. rostriformis bugensis individuals were cleaned, filtered and de novo assembled for each species with DRAP (De novo RNA-seq Assembly Pipeline, version 1.7)22 using the Oases assembler23. Contigs were kept if their expression level was at least 1 FPKM. Contigs were then aligned with NCBI BLAST (version 2.2.26, e-value threshold of 1e-5) against RefSeq, Swiss-Prot and three databases derived from Ensembl (Crassostrea gigas, Lottia gigantea and Lingula anatina sequences) to retrieve the corresponding annotations. Contigs were also processed with RNAmmer (version 1.2, standard parameters)24 to find ribosomal genes, with RepeatMasker (version open-4-0-3, -engine crossmatch -gccalc -species Crassostrea gigas parameters, http://www.repeatmasker.org) to list the repeats they contain, and with InterProScan (version 4.8, -goterms -pathways parameters)25 for gene ontology and structural annotation. Reads were aligned back to contigs with BWA (version 0.7.12, standard parameters, mem algorithm)26. The resulting SAM files were compressed, sorted and indexed with SAMtools (view, sort and index programs, version 1.1, standard parameters)27. Contig expression counts were generated from the BAM files with SAMtools (idxstats program, version 1.1, standard parameters) and merged with UNIX commands (cut, paste). Alignment files were then filtered for duplicates with SAMtools (rmdup program, version 1.1, standard parameters) before variant calling (SNPs and INDELs). The resulting BAM files were processed with GATK (version v3.0–0-g6bad1c6, -glm BOTH parameter) following the best practices found on the GATK website28. All results were uploaded to an RNAbrowse instance29 and can be accessed from the web at http://ngspipelines.toulouse.inra.fr:9014/.
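The expression threshold mentioned above can be illustrated with a minimal sketch. It assumes the usual definition of FPKM (fragments per kilobase of contig per million mapped fragments); the function names and the toy values are hypothetical, and the filtering actually applied inside DRAP may differ in its details.

```python
def fpkm(fragments: int, contig_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of contig per million mapped fragments."""
    if contig_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("contig length and library size must be positive")
    # 1e9 = 1e3 (bp per kilobase) * 1e6 (fragments per million)
    return fragments * 1e9 / (contig_length_bp * total_mapped_fragments)


def keep_expressed(counts: dict, lengths: dict, min_fpkm: float = 1.0) -> list:
    """Return the contigs whose FPKM reaches min_fpkm (hypothetical helper)."""
    total = sum(counts.values())
    if total == 0:
        return []
    return [name for name, n in counts.items()
            if fpkm(n, lengths[name], total) >= min_fpkm]


# Toy library of one million mapped fragments (illustrative values only).
counts = {"contig_a": 999_999, "contig_b": 1, "contig_c": 0}
lengths = {"contig_a": 1_000, "contig_b": 2_000, "contig_c": 500}
kept = keep_expressed(counts, lengths)
```

In this toy case a single fragment on a 2,000 bp contig corresponds to 0.5 FPKM, so that contig falls below the threshold.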
Further KEGG annotation was performed with the KAAS interface (KEGG Automatic Annotation Server) using contig sequences in FASTA format.
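The count-merging step of the pipeline (SAMtools idxstats output combined with cut and paste) can be sketched in Python as follows. This is an illustrative equivalent, not the commands actually run: it assumes the standard four-column, tab-separated idxstats layout (reference name, sequence length, mapped reads, unmapped reads), and the sample names and helper functions are hypothetical.

```python
import csv
import io


def parse_idxstats(text: str) -> dict:
    """Map contig name -> mapped read count from `samtools idxstats` output."""
    counts = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        # Skip malformed lines and the trailing "*" line (unplaced reads).
        if len(row) < 3 or row[0] == "*":
            continue
        counts[row[0]] = int(row[2])
    return counts


def merge_counts(per_sample: dict):
    """Build a contig-by-sample count table; missing contigs count as 0."""
    samples = sorted(per_sample)
    contigs = sorted({c for counts in per_sample.values() for c in counts})
    header = ["contig"] + samples
    rows = [[c] + [per_sample[s].get(c, 0) for s in samples] for c in contigs]
    return header, rows


# Toy idxstats outputs for two hypothetical samples.
s1 = "c1\t1000\t10\t0\nc2\t500\t3\t1\n*\t0\t0\t7\n"
s2 = "c1\t1000\t4\t0\nc2\t500\t0\t0\n*\t0\t0\t2\n"
header, rows = merge_counts({"s1": parse_idxstats(s1), "s2": parse_idxstats(s2)})
```

Unlike a positional paste, this version joins on contig names, so it does not rely on every file listing contigs in the same order.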

Biomarker exploration was carried out using the BLAST tool implemented in the NGSpipeline interface. Biomarker query sequences came from closely related species (i.e. molluscs). Candidates were selected by the user on the basis of score, e-value, identity and length. When several candidates existed, the sequence with the highest score was selected.
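The selection rule described above (filter candidates, then keep the highest-scoring one) can be sketched as follows. The selection was done manually by the user and no cut-offs are reported, so the thresholds, the `Hit` record and the contig names below are purely hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Hit:
    contig: str
    identity: float   # percent identity of the alignment
    length: int       # alignment length
    evalue: float
    bitscore: float


def best_candidate(hits: Iterable[Hit],
                   max_evalue: float = 1e-5,      # assumed cut-off, not from the study
                   min_identity: float = 0.0,     # assumed cut-off, not from the study
                   min_length: int = 0) -> Optional[Hit]:
    """Filter hits on e-value, identity and length, then keep the top score."""
    kept = [h for h in hits
            if h.evalue <= max_evalue
            and h.identity >= min_identity
            and h.length >= min_length]
    return max(kept, key=lambda h: h.bitscore, default=None)


hits = [
    Hit("contig_A", identity=71.0, length=310, evalue=1e-40, bitscore=180.0),
    Hit("contig_B", identity=78.5, length=420, evalue=1e-90, bitscore=320.0),
    Hit("contig_C", identity=95.0, length=40, evalue=0.5, bitscore=35.0),
]
best = best_candidate(hits, min_identity=60.0, min_length=100)
```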

Data Records

Raw reads and assemblies were gathered in a single NCBI BioProject (PRJNA507340), which includes all BioSamples used for transcriptome assembly (Table 1)30,31,32. All datasets are also available online through the sequencing platform web interface (Table 2)33. Dataset annotations (contigs, KEGG, GO) are available on Figshare (Table 3)34.

Table 1 BioProject deposit. The BioProject gathers all BioSamples, SRA30 and TSA31,32 records related to this Data Descriptor.
Table 2 NGSPipeline deposit.
Table 3 Figshare deposit.

Technical Validation

Extraction and RNA integrity

Total RNA purity was assessed with a Nanodrop ND-100 Spectrophotometer (Nanodrop Technologies, Wilmington, USA), and RNA samples with 260/280 and 260/230 ratios above 1.8 were kept. RNA integrity was evaluated with a Bioanalyzer (Agilent RNA 6000 Nano kit). Because of the non-conventional 18S/28S ribosomal ratio in bivalves, sample quality was evaluated using both the 18S/28S ratio and the electropherogram (Supplementary Fig. S1). The absence of degradation fragments in the 5S, Fast and Inter regions was a selection criterion for sequencing.
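The purity criterion above amounts to a simple filter on the two absorbance ratios. A minimal sketch, assuming a strict "greater than 1.8" comparison; the sample identifiers and measurements are invented for illustration:

```python
def passes_purity(ratio_260_280: float, ratio_260_230: float,
                  threshold: float = 1.8) -> bool:
    """True when both absorbance ratios exceed the purity threshold."""
    return ratio_260_280 > threshold and ratio_260_230 > threshold


# Hypothetical measurements: (260/280, 260/230) per sample.
measurements = {
    "sample_01": (2.05, 2.10),
    "sample_02": (1.95, 1.60),   # fails on 260/230
    "sample_03": (1.75, 2.00),   # fails on 260/280
}
retained = [name for name, (r280, r230) in measurements.items()
            if passes_purity(r280, r230)]
```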

De novo transcriptome assembly validation

A total of 94,217 contigs were assembled, ranging from 200 to 40,000 bp, with an average length of 2,314 bp for D. polymorpha and 1,972 bp for D. r. bugensis (Table 4). Assembly validation aims to verify the correspondence between contigs and assembled reads and between contigs and the proteome of a phylogenetically related species, and to check whether the contigs contain the expected single-copy proteins. First, reads were mapped back to contigs for each sample to monitor realignment rates, which were above 95% for both species (Table 5). Second, Crassostrea gigas proteins were aligned to contigs using BLAT (v. 35x1)35. Proteins aligned with at least 50% identity over more than 50% of the protein length were counted to measure the share of proteins correctly reconstructed by the assembly (Table 6). Last, contigs were processed with BUSCO version 3.0.236 using the metazoa OrthoDB (v. 9) database to check for expected proteins (Table 7), which showed that almost 95% of the expected metazoan BUSCO sequences were found complete in our de novo transcriptomes.
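The protein-based check can be expressed as a small counting routine. The sketch below assumes one best alignment per protein, summarised as (percent identity, aligned residues, protein length), and reads the criterion as identity of at least 50% with coverage strictly above 50%; the exact comparison operators, the data layout and the example values are assumptions rather than the authors' implementation.

```python
def reconstructed_share(best_alignments: dict, n_proteins: int,
                        min_identity: float = 50.0,
                        min_coverage: float = 0.5) -> float:
    """Fraction of reference proteins recovered by the assembly.

    best_alignments maps protein id -> (percent identity,
    aligned residues, protein length) for its best contig alignment.
    Proteins without any alignment are simply absent from the mapping.
    """
    if n_proteins <= 0:
        raise ValueError("n_proteins must be positive")
    recovered = sum(
        1
        for identity, aligned, length in best_alignments.values()
        if identity >= min_identity and aligned / length > min_coverage
    )
    return recovered / n_proteins


# Four hypothetical reference proteins, one of them with no alignment at all.
alignments = {
    "prot_1": (80.0, 300, 400),   # passes both criteria
    "prot_2": (45.0, 390, 400),   # identity too low
    "prot_3": (90.0, 100, 400),   # coverage too low
}
share = reconstructed_share(alignments, n_proteins=4)
```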

Table 4 Assembly metrics.
Table 5 Individual realignment statistics.
Table 6 Crassostrea gigas proteins.
Table 7 BUSCO analysis.

Annotation quality

More than half of the assembled contigs were annotated with an e-value under 1e-5 (59% of 44,538 D. polymorpha contigs and 57% of 49,679 D. rostriformis bugensis contigs). Among the top 5 species found as best annotation, Crassostrea gigas was the best-hit species, matching approximately 53% of D. polymorpha and D. r. bugensis sequences (Fig. 2). Within the “others” category, some non-bivalve sequences may be present because of the tissue analyzed (the digestive gland may contain food being processed). Functional annotation assigned GO terms to 35.4% and KEGG annotations to 21.6% of D. polymorpha contigs, and GO terms to 34.2% and KEGG annotations to 20.6% of D. r. bugensis contigs. Full contig annotations as well as functional annotations are available in the Figshare deposit in CSV format for quick and easy reuse (Table 3).
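As a quick consistency check on these figures, the per-species contig numbers can be combined with the reported percentages. Because the percentages are rounded to the nearest point, the derived contig counts are approximations, not values reported by the authors.

```python
# Contig totals and annotated fractions as reported in the text.
contigs = {"D. polymorpha": 44_538, "D. r. bugensis": 49_679}
annotated_fraction = {"D. polymorpha": 0.59, "D. r. bugensis": 0.57}


def annotated_counts(contigs: dict, fractions: dict) -> dict:
    """Approximate number of annotated contigs per species."""
    return {sp: round(contigs[sp] * fractions[sp]) for sp in contigs}


total_contigs = sum(contigs.values())              # matches the 94,217 contigs assembled
counts = annotated_counts(contigs, annotated_fraction)
overall_fraction = sum(counts.values()) / total_contigs
```

The two species totals add up to the 94,217 contigs reported for the assembly, and the pooled annotation rate comes out at roughly 58%, consistent with "more than half".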

Fig. 2

Most represented species hits. Top 5 best species hits for (A) Dreissena polymorpha and (B) Dreissena rostriformis bugensis.

Usage Notes

We present here the first whole de novo transcriptomes of the two invasive species Dreissena polymorpha and D. rostriformis bugensis, which will be fully available in the TSA repository. One of the strengths of the proposed transcriptomes lies in the conditions under which they were obtained: by investigating individuals from several populations inhabiting contrasting hydrosystems, we encompassed as much as possible of the transcriptome variability induced by environmental conditions (pollution, biotic interactions, physico-chemistry, climate…), yielding an mRNA set that is representative of this ecological heterogeneity. This approach also led to the identification of numerous sequence variants (insertions, deletions and single-nucleotide polymorphisms), since the studied populations have been established for a long time and may have accumulated mutations.

In ecotoxicology, biomarkers involved in responses to oxidative stress, heat shock or xenobiotic exposure are used as indicators of stress. Multibiomarker approaches have now become standard, given that the diversity of contaminants to which organisms are exposed can generate a wide diversity of biological responses. Here, we first focused on the biomarkers most frequently used in ecotoxicology, which are listed in Table 8. Biomarkers were screened and manually checked, and all were found to be expressed in the digestive glands of both species. This table also shows the closest species hits (blastn); the biomarkers selected here exhibit relatively high homology with the corresponding sequences in other molluscs, supporting our contig assignments. The high level of sequence conservation observed here may facilitate the development and adaptation of new measurement assays from existing assays in closely related species. Interspecies sequence alignments were also examined and indicated a high level of homology between the two species (Table 9). Likewise, this may lead to the development of biomarker-based assays that, because of sequence similarity, could be cross-compatible between D. polymorpha and D. r. bugensis.

Table 8 Biomarker best contig hits and closest species hits.
Table 9 Interspecies sequence homologies.

By providing Dreissena sequences through a user-friendly interface, we open the way to further exploration of dreissenid molecular mechanisms through the development of such biomarker assays, primer design for targeted expression analysis, or promising proteogenomic studies when coupled with mass spectrometry analysis.