Integrated de novo transcriptome of Culex pipiens mosquito larvae as a resource for genetic control strategies

Mastrantonio, Valentina; Libro, Pietro; Di Martino, Jessica; Matera, Michele; Bellini, Romeo; Castrignanò, Tiziana; Urbanelli, Sandra; Porretta, Daniele

doi:10.1038/s41597-024-03285-1

Download PDF

Data Descriptor
Open access
Published: 09 May 2024

Integrated de novo transcriptome of Culex pipiens mosquito larvae as a resource for genetic control strategies

Valentina Mastrantonio¹,
Pietro Libro²,
Jessica Di Martino ORCID: orcid.org/0000-0003-0505-4190²,
Michele Matera ORCID: orcid.org/0009-0006-5759-6492^3,4,
Romeo Bellini⁵,
Tiziana Castrignanò²,
Sandra Urbanelli¹ &
…
Daniele Porretta¹

Scientific Data volume 11, Article number: 471 (2024) Cite this article

351 Accesses
Metrics details

Subjects

Abstract

We present a de novo transcriptome of the mosquito vector Culex pipiens, assembled by sequences of susceptible and insecticide resistant larvae. The high quality of the assembly was confirmed by TransRate and BUSCO. A mapping percentage until 94.8% was obtained by aligning contigs to Nr, SwissProt, and TrEMBL, with 27,281 sequences that simultaneously mapped on the three databases. A total of 14,966 ORFs were also functionally annotated by using the eggNOG database. Among them, we identified ORF sequences of the main gene families involved in insecticide resistance. Therefore, this resource stands as a valuable reference for further studies of differential gene expression as well as to identify genes of interest for genetic-based control tools.

The rise of baobab trees in Madagascar

Article Open access 15 May 2024

Cicer super-pangenome provides insights into species evolution and agronomic trait loci for crop improvement in chickpea

Article 23 May 2024

Complexity of avian evolution revealed by family-level genomes

Article Open access 01 April 2024

Background & Summary

Insecticide resistance is a serious problem in the control of vectors and vector-borne diseases (VBDs)¹. In the last decades, many efforts have been addressed to reveal the mechanisms underlying insecticide resistant phenotypes and develop new tools alternative to chemical compounds. Genetically-based tools (e.g., RNA interference, CRISPR-Cas9) have been identified as a cutting-edge alternative to manage insecticide resistance because they enhance our ability to detect, functionally validate, and inhibit putative resistance genes and/or mutations^2,3. Likewise, these tools are proving effective in suppressing critical biological functions, inducing sex ratio distortion, and driving the spreading of desired genes within populations^3,4,5,6. However, these technologies have a primary constraint to be effectively applied: the need for available target sequences. Increasing high-quality -omics resources is therefore mandatory, especially for non-model species.

Transcriptomes have recently been considered valuable genomic resources^{7,8,9,10,11,12}. Contrary to genomic data revealing what genes are present within cells, transcriptomes provide evidence about what genes are expressed under certain conditions. By comparing transcriptome data of different species, life stages, tissues, or conditions, we can thus elucidate the molecular pathways underlying specific phenotypes and identify candidate genes for traits of interest^{13,14,15,16,17}. Nonetheless, this possibility to explore the multiple facets of transcriptomes mainly relies on high-quality references for comparison.

Here, we furnish a de novo assembled transcriptome of the mosquito Culex pipiens (ecotype pipiens), the primary vector of several pathogens such as West Nile virus, Japanese encephalitis, and lymphatic filariasis across the temperate Northern Hemisphere^18,19. This species belongs to the Culex pipiens complex, that also include Cx. quinquefasciatus, Cx. pipiens pallens, Cx. australicus and Cx. globocoxitus²⁰ and it is commonly found in urban and natural habitats. Due to its vector competence, Cx. pipiens has always been a priority in chemicals control applications. Recently, concern about this species has also increased because genetic resistance to the insecticide diflubenzuron (DFB), the principal larvicide used to control Cx. pipiens, has been found in populations across its geographic range^{21,22,23,24,25}. Genetic analyses showed that DFB resistance was associated with a mechanism already observed in other relevant arthropod pests^2,26,27, where point mutations in the chitin-synthase I gene (chs1) cause aminoacidic substitutions from wildtype Isoleucine into mutated Leucine, Methionine or Phenylalanine (i.e., I1043L; I1043M; I1043F)^{2,20,22,28,29}. A deep analysis of the homozygous I1043M-resistant strain of Cx. pipiens established from field populations also revealed that resistant mosquitoes had an increased cuticle thickness and constitutively up-regulate the chs1 gene³⁰.

Since including transcripts expressed under different conditions improves the broad use of the transcriptomic references, the de novo transcriptome of Cx. pipiens presented in this paper has been assembled by including in the dataset susceptible and I1043M-resistant 4-^th instar larvae, which represent the target of DFB applications³¹. The bioinformatic analysis was conducted following field-best practices, employing specialized tools for quality control, sequence alignment, and transcriptomic assembly. In particular, we adopted widely recognized methodologies to ensure an accurate and comprehensive representation of the Cx. pipiens transcriptome. Subsequent annotation was performed using standard transcriptomic annotation procedures, ensuring a thorough understanding of the functionality of the identified transcripts. Furthermore, ORF sequences belonging to detoxifying gene families associated with insecticide resistance, such as ABC transporters, cytochrome P450, glutathione-S-transferases, UDP-glucosyltransferase, cuticular proteins and heat shock proteins were identified within the Cx. pipiens transcriptome.

Our resource will be a valuable reference for future studies about insecticide resistance and future comparative transcriptomic studies about mosquito biology. We cannot exclude that other mechanisms beyond target site mutation in the chs 1 gene could be associated with DFB resistance in Cx. pipiens. As shown by several studies, the analysis of constitutive differentially expressed genes in susceptible and resistant individuals is a main approach to identifying genes associated with insecticide resistance^32,33,34,35. Likewise, the analysis of differentially expressed genes under insecticide exposure could allow us to identify a larger pool of transcripts and reveal further genes associated with insecticide resistance. From a technological perspective, the transcriptome of Cx. pipiens will be also useful to identify new genes of interest for genetic control tools.

Methods

Mosquitoes

Mosquitoes used to generate the dataset were obtained from colonies established from field populations of Cx. pipiens. Immature mosquitoes were collected in the sites of Parma (Lat. 44,768382; Long. 10,319429) and Forlì (Lat. 44,21091241; Long. 12,05524495) to establish the DFB susceptible and resistant colonies, respectively. In each site, at least five breeding sites were sampled to have a representative sample of the population³⁰. The susceptible colony was homozygous for the wild-type allele I1043, while the resistant colony was homozygous for the resistant allele I1043M³⁰. Mosquitoes from 5^th generation of both colonies were used for the current study.

Larval stages of both colonies were maintained in plastic trays containing 2 L of spring water and daily fed with fish food (0.85 mg/larva) (Tetra® Goldfish Granules). Adults were maintained in 45 × 45 × 45 cm Bugdorm insect-rearing cages (Watkins & Doncaster, UK) and daily fed with 10% sucrose solution and water ad libitum. Artificial blood meal was also provided to females to mature and lay eggs (Hemotek Ltd, UK). Immature and adult stages were all reared in a thermostatic chamber with constant conditions of temperature, relative humidity (RH), and photoperiod (L:D) (i.e., T = 26 ± 1 °C; RH = 70%; L:D = 16:8 hours light:dark). Two thousand L₁ larvae of each strain were put in plastic trays filled with 2 L of spring water, and the trays were set up in triplicate. In each tray, the larvae were maintained as described above, and at L₄, a pool of 10 larvae was collected and immediately stored in RNA later buffer (Thermo fisher Scientific, Ravenna, Italy) at −80 °C until RNA extraction^36,37,38.

RNA isolation, library preparation and sequencing

Pooled RNA was extracted from 6 pools of whole-body 4^th-instar larvae of Cx. pipiens (i.e, three pools for each colony). Extraction was performed using whole-body tissues to include a larger pool of RNAs in the de novo transcriptome assembly of Culex pipiens. RNA extraction was performed using the NucleoSpin RNA Plus XS kit (Macherey-Nagel, AG), following the manufacturer’s instructions. A step including a DNase treatment was also performed during RNA extraction. After RNA isolation, the quality and integrity of RNAs were checked by QubitTM fluorimetry and the 5200 Fragment Analyzer (Agilent Technologies, USA). Libraries preparation and sequencing were performed by the Polo GGB (Polo d’Innovazione di Genomica, Genetica e Biologia), Perugia, Italy (http://www.pologgb.com/) using a NextSeq Illumina® platform. The libraries were prepared in accordance with the QIAseqTM Stranded mRNA Selected Kit Handbook for Illumina Paired-End Indexed Sequencing. Briefly, oligo-dT probes were covalently attached to the surface of the magnetically charged Pure mRNA Beads, and the bound RNA was then washed and eluted to provide a highly enriched pool of mRNA. Then the enriched mRNA was fragmentated by heat according to input RNA quality and approximate insert size. After fragmentation, enriched mRNA was converted to cDNA using an RNase H in combination with random primers. Once the synthesis of the first strand finished, the second-strain synthesis was performed using 5′ phosphorylated random primers. Samples underwent adapter ligation, and then the indexed libraries were amplified, purified, and validated. After a normalization step, libraries were pooled in equal volumes for sequencing. The Illumina NextSeq 550 sequencing system was used through the Illumina chemistry V2,2x75bp paired end run. On average 43.7 million reads for each library were obtained and they are available at the NCBI Sequence Read Archive (project ID PRJEB47420).

Pre-assembly processing stage

The bioinformatics analyses described below were conducted using high-performance computing systems through the ELIXIR-IT HPC@CINECA’s call^{39,40,41,42,43}. The workflow of the bioinformatics pipelines, adapted from two previous studies^44,45, is illustrated in Fig. 1. A total of 437,233,732 pairs of reads was produced through Illumina sequencing. All of them underwent a cleaning analysis process to prepare data for assembly. The FastQC 0.11.5 tool (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc) was used to assess the quality of the initial reads, which allowed to estimate the quality profiles of the RNAseq samples. Quality estimates with FastQC were performed on both raw and trimmed data. In order to eliminate adapter sequences and low-quality bases, an initial analysis of the raw reads was conducted using Trimmomatic, v.0.39⁴⁶ (setting the option SLIDINGWINDOW: 4: 15, MINLEN: 36, and HEADCROP: 13). All reads that were unpaired were excluded from further analysis. Executing the trimming process, a total of 391,907,976 clean reads were retained for constructing the de novo transcriptome assembly (which equates to approximately 90% of the original raw reads, as shown in detail in Table 1). To provide a summary quality assessment metrics view for processed data quality, results were combined across all samples and compiled into a report using the MultiQC21 v.1.9 software tool⁴⁷ (see steps 2–4 of Fig. 1). The analysis results were deposited on figshare (see Imagefile 2 on Table 2).

Table 1 Summary of the 6 libraries deposited in the European Nucleotide Archive (ENA, BioProject: PRJEB47420)⁶⁷, in terms of number of raw and trimmed reads per sample.

Full size table

Table 2 Overview of produced data files and their access on figshare⁶⁸.

Full size table

De novo transcriptome assembly and quality assessment

Since the species under study lacks a specific annotated reference genome⁴⁸, we proceeded with the de novo assembly of the transcriptome. We used the assembler rnaSPAdes⁴⁹, a tool for de novo transcriptome assembly from RNA-Seq data, available in the SPAdes v.3.14.1 package. Using rnaSPAdes, we generated a total of 50,416 assembled transcripts, with a CG content of 47% and an N50 of 2,409 bp, as presented in Table 3.

Table 3 Statistics on rnaSPAdes, CD-HIT-est and Corset outputs, evaluated with the Transrate assembly validator.

Full size table

In order to ensure the elimination of assembly redundancies, two filtering steps were executed. The initial step involved the utilization of CD-HIT-est (v. 4.8.1) on the output obtained from rnaSPAdes. The resulting data was then uploaded to figshare for further analysis (see Datafile 2 on Table 2). Subsequently, we employed Corset (v. 1.06)⁵⁰, a tool that has demonstrated its efficacy in a previous study⁵¹, to generate the final assembly. The Corset output showed an N50 of 2208 bp (Table 3). As a result of removing redundancies in an efficient way (two steps of removal), the final assembly contained about 63.94% of the original transcripts.

The evaluation of the assembly results involved two validation phases: one conducted after the assembly process to assess the initial assembly, and another performed after removing redundancies to evaluate the quality of the final non-redundant assembly. To accomplish this, two distinct software tools were employed, namely TransRate⁵² (v. 1.0.3) and BUSCO (Benchmarking Universal Single-Copy Orthologs)⁵³ (v. 5.4.4). These tools generate a range of metrics that serve as indicators for identifying potential errors in the assembly process and provide evidence regarding the transcriptome’s quality. BUSCO offers a quantitative measure of transcriptome completeness and quality by comparing gene content against evolutionarily informed expectations derived from databases of near-universal and highly conserved protein orthologs. In this study, BUSCO analysis was performed using five orthologous gene databases: Arthopoda, Metazoa, Eukaryota, Insecta and Diptera. The degree of transcriptome completeness, as assessed by BUSCO, is presented in Table 4, while Fig. 2 illustrates the distribution of completed, fragmented, and missing genes from the four databases. Furthermore, the quality assessment of the Corset output involved the use of HISAT2 (v. 2.1.0)⁵⁴ to map trimmed reads back to the reference transcriptome (unigenes). The results of all the validation phases can be found in Table 2 and are extensively discussed in the subsequent “Technical Validation” section.

Table 4 The BUSCO (vs. 5) validation, though the gVolante web server to five databases.

Full size table

Generation of the full-length transcriptomes

Following the initial validation and evaluation phase, which involved the use of TransRate and BUSCO, the output of the assembly procedure served as the input for CD-HIT-est program⁵⁵. CD-HIT-est is a hierarchical clustering tool utilized to eliminate redundant transcripts and fragmented assemblies, ensuring the generation of unique genes during the de novo assembly process. The default parameters of CD-HIT-est were employed, setting a 95% similarity threshold. To further refine the final transcriptome dataset, a subsequent hierarchical clustering phase was performed using Corset. Subsequently, the Corset output, after the validation phase with Hisat2, was subjected to analysis with TransDecoder^56,57 (v. 5.7.0), a widely recognized tool for identifying long open reading frames (ORFs) within assembled transcripts. The default parameters of TransDecoder were used, enabling ORF prediction on both strands of assembled transcripts regardless of the sequenced library. Additionally, TransDecoder ranked the ORFs based on their completeness and assessed if the 5′ end was incomplete by examining the presence of any length of amino acid codons upstream of a start codon (M) without a stop codon. The “Longest ORF” rule was adopted, selecting the translation start site with the highest 5 AUG (relative to the inframe stop codon).

Transcriptome annotation

The contigs obtained from the assembly were subjected to ORFs prediction performed with Transdecoder (vs. 5.5.0). For the predicted ORFS of the de novo assembly, we employed a variety of annotations. They were aligned against the Nr, SwissProt, and TrEMBL databases using the DIAMOND algorithm to extract the most relevant annotations.DIAMOND⁵⁸ is an open-source algorithm that offers a significant improvement in speed compared to BLASTX, making it highly suitable for short reads. With a comparable level of sensitivity, DIAMOND employs a double indexing approach to exhaustively determine all meaningful alignments for a given query. Traditional sequence comparison programs, such as BLASTX, typically follow a seed-and-extend paradigm. This two-phase approach involves searching for matches of seed sequences (short segments of the query sequence) in the reference database, followed by an extension phase aimed at computing a complete alignment. To configure DIAMOND, we employed the following parameter settings: DIAMOND-fast DIAMOND BLASTX-t 48 -k 250 -min-score 40 for a faster analysis, and DIAMOND-sensitive: DIAMOND BLASTX -t 48 -k 250 -sensitive -min-score 40 for a more sensitive analysis. These settings ensured efficient and accurate annotation of the assembly.

For each database, Nr, SwissProt, and TrEMBL, we selected the best annotation, resulting in the creation of an annotation matrix. The predicted ORFs s were analysed using BLASTX, the tool performing nucleotide-to-protein sequence searches. The use of BLASTX against Nr, TrEMBL, and SwissProt yielded the following results: 40576 (92.7%), 41507 (94.8%), and 27289 (62.3%) contigs, respectively. These results are in line with raw reads mapping observed in the de-novo transcriptomes annotation of other species^11,59,60. Detailed information on the resulting datasets can be found in Table 5. Furthermore, an overview of the data files and datasets generated in this study, along with relevant details on the data repository and access numbers, is summarised in Table 2.

Table 5 Summary of homology annotation hits performed with BLASTX and on three different databases: Nr, SwissProt and TrEMBL.

Full size table

The results of the BLASTX annotation yielded a total of 27,281 sequences that were simultaneously mapped to the Nr, SwissProt, and TrEMBL databases. Venn diagrams illustrating the overlap of annotations in different databases are presented in Fig. 3, showcasing the redundancy of annotations for both DIAMOND BLASTX. Additionally, Figs. 4, 5 display the top ten most represented species and gene product hits obtained from BLASTX by aligning the transcripts with the reference database Nr. Furthermore, the ten most represented species and the ten hits of the gene product obtained with BLASTX by mapping the transcripts against the reference database Nr are shown in Fig. 4.

The total number of predicted ORFs obtained from the transcriptome assembly were also mapped onto another database of functional annotations: eggNOG (Evolutionary genealogy of genes: Non-supervised Orthologous Groups)⁶¹. The eggNOG database incorporates various taxonomic levels of orthologous groups (OGs) of proteins with functional annotations, using an algorithm that exploits previous orthologous group (COG) methodologies. Of the 43,793 total predicted ORFs obtained in our analyses, 14,966 (or 34.2%) were annotated in the eggNOG database. For details, see Datafile 11 in Table 2.

Selection of ORF sequences belonging to gene families related to insecticide resistance

Regulatory changes of genes involved in the oxidation, conjugation and extrusion of chemical compounds is a main mechanism associated with insecticide resistance⁶². By using an ad-hoc parsing script, we searched within the annotation file of predicted ORFs for gene families known be involved in insecticide resistance. Then, we reported in Table 6 the number of found predicted ORFs in the NR annotation file by this search. Furthermore, in Table 2, we added the link to the FASTA file deposited in figshare (‘Data file 18’), including the nucleotide sequences of predicted ORFs of gene families related to insecticide resistance. Further research comparing mosquitoes exposed to insecticide and under control conditions would be necessary identify additional genes involved in DFB resistance and/or to validate the data presented here.

Table 6 Number of ORFs with Nr annotations belonging to gene families of interest known for insecticide resistance⁵⁰.

Full size table

Comparison with other species through the orthologues

The comparison of orthologous genes with those of closely related is a crucial step in validating the quality of a de novo transcriptome assembly, as outlined by several evidence in literature^63,64,65. Here, therefore we compared ORFs predicted from the de novo transcriptome of Culex pipiens with all proteins potentially transcribed based on the assembled genomes of Aedes aegypti, Anopheles gambiae, Culex pipiens pallens and Culex quinquefasciatus. The identification and orthological grouping of all proteins of the various species was performed with OrthoFinder⁶⁶ (v. 2.5.5). This approach also served to assess the completeness of the assembly on the basis of sequence similarity. OrthoFinder allows the identification of orthogroups, defined as a set of genes that descend from a single gene of the last common ancestor within species groups. A total of 136,663 genes were identified, grouped into 18,163 orthologues. Of these, only 12,764 genes were classified as species-specific and were grouped into 4,337 deduced orthologues. In fact, orthogroup detection showed considerable overlap in sequences in all five groups. Over 40% (7471) of the transcripts identified as putative orthologues were shared between all five species (Fig. 5). Consequently, a relatively low proportion of transcripts were identified as unique to a given assemblage (i.e., “species-specific” or “assembly-specific”); notably only 248 transcripts (1.4%) in Aedes aegypti, 410 transcripts (2.3%) in Anopheles gambiae, 243 (1.3%) in Culex pipiens pallens, 183 (1.0%) in Culex quinquefasciatus and 3253 transcripts (17.9%) in Culex pipiens were classified as species-specific. Therefore, the marked level of sequence overlap observed between the transcriptomes further validates the completeness and quality of the assemblage presented in this study. In addition to providing inference of the completeness of the assemblage, these results represent the first transcriptome-level comparison of four ecologically important Culicidae species. Interestingly, we found no marked difference in the number of overlapping sequences between the focal species in terms of their phylogenetic distance/proximality and each other (Fig. 5).

Data Records

All raw data generated in this project have been deposited in the European Nucleotide Archive (ENA, BioProject: PRJEB47420). The de novo transcriptome assembly resource is available on figshare (link: Data file 3 in Table 2). The datasets containing all files produced in this transcriptome assembly and annotation pipeline (rnaSPAdes transcriptome assemblies, unigenes and functional annotation files) have also been deposited in the figshare archive (links to the pipeline results are listed in Table 2).

Technical Validation

Raw reads quality and validation of the assembly

The overall data quality was assessed using FastQC for all samples before and after cutting. Among the FastQC results, the average quality scores at each base position were above 35 (see image file 1 in Table 2). Validation of the transcriptome assembly was performed using two validation tools: BUSCO and TransRate. The results of the validation steps are shown in Table 3. BUSCO analysis was performed on five databases: Arthopoda, Metazoa, Eukaryota, Insecta and Diptera. The details of BUSCO are listed in Table 4 and some of them are represented, in the form of a histogram, in Fig. 2. A further validation evaluation was performed by mapping the clipped reads against the de novo assembled transcriptome of Culex pipiens. The HISAT2 results showed an even higher percentage of 86% (Fig. 6), confirming the high quality of the assembly. The final transcriptome (unigenes) obtained after CD-HIT-est comprised a total of 41,054 transcripts and an N50 of 2254 bp, with a completeness value of over 80% for the BUSCO evaluation in every database interrogated.

Quality control of annotation

The transcriptome was functionally annotated by running DIAMOND and eggNOG. The application of DIAMOND for annotation resulted in the identification of 43,793 predicted ORFs (for BlastX analysis) shared between the three databases. EggNOG is a comprehensive orthologous gene database that provides detailed functional information for genes within each orthologous group. The EggNOG database includes a wide range of sequenced genomes from different species, providing a robust evolutionary context for our data analysis. The EggNOG analysis provided valuable insights through COG (Cluster of Orthologous Groups) assignments and KEGG (Kyoto Encyclopedia of Genes and Genomes) annotations.

Code availability

All the software programs used in this article (de novo transcriptome assembly, pre- and post-assembly steps and transcriptome annotation) are listed with the version in the Methods paragraph. In case of no details on parameters the programs were used with the default settings.

References

Nauen, R. Insecticide resistance in disease vectors of public health importance. Pest Manag Sci. 63, 628e33 (2007).
Article Google Scholar
Douris, V. et al. Resistance mutation conserved between insects and mites unravels the benzoylurea insecticide mode of action on chitin biosynthesis. Proc. Natl. Acad. Sci. USA 113, 14692–14697 (2016).
Article ADS CAS PubMed PubMed Central Google Scholar
Gantz, V. M. & Akbari, O. S. Gene editing technologies and applications for insects. Curr. Opin. Insect Sci. 28, 66–72 (2018).
Article PubMed PubMed Central Google Scholar
Burt, A. Site-specific selfish genes as tools for the control and genetic engineering of natural populations. Proc. Biol. Sci. 270, 921–928 (2003).
Article CAS PubMed PubMed Central Google Scholar
Airs, P. M. & Bartholomay, L. C. RNA Interference for Mosquito and Mosquito-Borne Disease Control. Insects 8, 4 (2017).
Article PubMed PubMed Central Google Scholar
Lester, P. J. et al. The potential for a CRISPR gene drive to eradicate or suppress globally invasive social wasps. Sci. Rep. 10, 12398 (2020).
Article ADS CAS PubMed PubMed Central Google Scholar
Gupta A. K. & Gupta U. D. Next Generation Sequencing and Its Applications. In: Animal Biotechnology, eds. Verma, A. S. & Singh, A. pp. 345–367, (Academic Press, 2014).
De Marco, L. et al. The choreography of the chemical defensome response to insecticide stress: insights into the Anopheles stephensi transcriptome using RNA-Seq. Sci. Rep. 7, 41312 (2017).
Article ADS PubMed PubMed Central Google Scholar
Morandin, C. et al. De novo transcriptome assembly and its annotation for the black ant Formica fusca at the larval stage. Sci Data 5, 180282 (2018).
Article CAS PubMed PubMed Central Google Scholar
Prado-Álvarez, M. et al. De novo transcriptome reconstruction in aquacultured early life stages of the cephalopod Octopus vulgaris. Sci Data 9, 609 (2022).
Article PubMed PubMed Central Google Scholar
Palomba, M. et al. De novo transcriptome assembly and annotation of the third stage larvae of the zoonotic parasite Anisakis pegreffii. BMC Res Notes 15, 223 (2022).
Article CAS PubMed PubMed Central Google Scholar
Chowdhury, M. A. A. et al. Integrated transcriptome catalog of Tenualosa ilisha as a resource for gene discovery and expression profiling. Sci Data 10, 214 (2023).
Article CAS PubMed PubMed Central Google Scholar
Pankey, M. S., Minin, V. N., Imholte, G. C., Suchard, M. A. & Oakley, T. H. Predictable transcriptome evolution in the convergent and complex bioluminescent organs of squid. Proc Natl Acad Sci USA 111, E4736–E4742 (2014).
Article ADS CAS PubMed PubMed Central Google Scholar
Ingham, V. A. et al. Dissecting the organ specificity of insecticide resistance candidate genes in Anopheles gambiae: known and novel candidate genes. BMC Genomics 15, 1018 (2014).
Article PubMed PubMed Central Google Scholar
Bharati, M. & Saha, D. Differential expression of carboxylesterases in larva and adult of Culex quinquefasciatus Say (Diptera: Culicidae) from sub-Himalayan West Bengal, India. Int J Trop Insect Sci 38, 303–312 (2018).
Article Google Scholar
De Marco, L. et al. Transcriptome of larvae representing the Rhipicephalus sanguineus complex. Mol. Cell. Probes S0890- 8508, 30013–5 (2017).
Google Scholar
Zhang, H. et al. Transcriptome profiling of a beach-adapted wild legume for dissecting novel mechanisms of salinity tolerance. Sci Data 5, 180290 (2018).
Article CAS PubMed PubMed Central Google Scholar
Brugman, V. A. et al. The Role of Culex pipiens L. (Diptera: Culicidae) in Virus Transmission in Europe. Int J Environ Res Public Health 15, 389 (2018).
Article PubMed PubMed Central Google Scholar
Farajollahi, A., Fonseca, D. M., Kramer, L. D. & Marm Kilpatrick, A. “Bird biting” mosquitoes and human disease: a review of the role of Culex pipiens complex mosquitoes in epidemiology. Infect Genet Evol. 11(7), 1577–85 (2011).
Article PubMed PubMed Central Google Scholar
Harbach, R. E. Culex pipiens: species versus species complex taxonomic history and perspective. J Am Mosq Control Assoc. 28, 10–23 (2012).
Article PubMed Google Scholar
Grigoraki, L. et al. Striking diflubenzuron resistance in Culex pipiens, the prime vector of West Nile virus. Sci Rep 7, 11699 (2017).
Article ADS PubMed PubMed Central Google Scholar
Porretta, D. et al. Focal distribution of diflubenzuron resistance mutations in Culex pipiens mosquitoes from northern Italy. Acta Trop. 193, 106–112 (2019).
Article CAS PubMed Google Scholar
Fotakis, E. A. et al. Identification and detection of a novel point mutation in the Chitin Synthase gene of Culex pipiens associated with diflubenzuron resistance. PLoS Negl. Trop. Dis. 14, e0008284 (2020).
Article CAS PubMed PubMed Central Google Scholar
Guz, N., Çağatay, N. S., Fotakis, E. A., Durmuşoğlu, E. & Vontas, J. Detection of diflubenzuron and pyrethroid resistance mutations in Culex pipiens from Muğla, Turkey. Acta Trop. 203, 105294 (2020).
Article CAS PubMed Google Scholar
Vereecken, S. et al. Phenotypic insecticide resistance statusof the Culex pipiens complex: a European perspective. Parasites Vectors 15, 423 (2022).
Article CAS PubMed PubMed Central Google Scholar
Van Leeuwen et al. Population bulk segregant mapping uncovers resistance mutations and the mode of action of a chitin synthesis inhibitor in arthropods. Proc. Natl. Acad. Sci. USA 109, 4407–4412 (2012).
Article ADS PubMed PubMed Central Google Scholar
Tadatsu, M. et al. A mutation in chitin synthase I associated with etoxazole resistance in the citrus red mite Panonychus citri (Acari: Tetranychidae) and its uneven geographical distribution in Japan. Pest. Manag. Sci. 78, 4028–4036 (2022).
Article CAS PubMed Google Scholar
Mastrantonio, V. et al. Evolution of adaptive variation in the mosquito Culex pipiens: Multiple independent origins of insecticide resistance mutations. Insects 12, 676 (2021).
Article PubMed PubMed Central Google Scholar
Porretta, D. et al. Historical samples reveal a combined role of agriculture and public-health applications in vector resistance to insecticides. Pest Manag Sci 78, 1567–1572 (2022).
Article CAS PubMed PubMed Central Google Scholar
Lucchesi, V. et al. Cuticle modifications and over-expression of the chitin-synthase gene in diflubenzuron resistant phenotype. Insects 13, 1109 (2022).
Article PubMed PubMed Central Google Scholar
Merzendorfer, H. Chitin synthesis inhibitors: old molecules and new developments. Insect Sci 20, 121–138 (2013).
Article CAS PubMed Google Scholar
Li, C. X. et al. Identification of genes involved in pyrethroid-, propoxur-, and dichlorvos- insecticides resistance in the mosquitoes, Culex pipiens complex (Diptera: Culicidae). Acta Tropica 157, 84–95 (2016).
Article CAS PubMed Google Scholar
Meng, J., Chen, X. & Zhang, C. Transcriptome-based identification and characterization of genes responding to imidacloprid in Myzus persicae. Sci Rep 9, 13285 (2019).
Article ADS PubMed PubMed Central Google Scholar
Su, H. et al. Comparative transcriptome profiling reveals candidate genes related to insecticide resistance of Glyphodes pyloalis. Bulletin of Entomological Research. 110(1), 57–67 (2020).
Article CAS PubMed Google Scholar
Wondji, C. S., Hearn, J., Irving, H., Wondji, M. J. & Weedall, G. RNAseq-based gene expression profiling of the Anopheles funestus pyrethroid-resistant strain FUMOZ highlights the predominant role of the duplicated CYP6P9a/b cytochrome P450s. G3 (Bethesda). 12(1), jkab352 (2022).
CAS PubMed Google Scholar
Cassone, B. J. et al. Gene expression divergence between malaria vector sibling species Anopheles gambiae and An. coluzzii from rural and urban Yaoundé Cameroon. Mol Ecol. 23(9), 2242–59 (2014).
Article CAS PubMed PubMed Central Google Scholar
Wimalasiri-Yapa, B. M. C. R. et al. Differences in gene expression in field populations of Wolbachia-infected Aedes aegypti mosquitoes with varying release histories in northern Australia. PLoS Negl Trop Dis 17(3), e0011222 (2023).
Article CAS PubMed PubMed Central Google Scholar
Main, B. J. et al. Genetic variation associated with increased insecticide resistance in the malaria mosquito, Anopheles coluzzii. Parasites Vectors 11, 225 (2018).
Article PubMed PubMed Central Google Scholar
Castrignanò, T. et al. ELIXIR-IT HPC@ CINECA: high-performance computing resources for the bioinformatics community. BMC Bioinformatics 21, 1–17 (2020).
Article Google Scholar
Picardi, E., D’Antonio, M., Carrabino, D., Castrignanò, T. & Pesole, G. ExpEdit: a webserver to explore human RNA editing in RNA-Seq experiments. Bioinformatics 27, 1311–1312 (2011).
Article CAS PubMed Google Scholar
Chiara, M. et al. CoVaCS: a consensus variant calling system. BMC Genom. 19, 1–9 (2018).
Article CAS Google Scholar
Castrignano, T. et al. ASPIC: a web resource for alternative splicing prediction and transcript isoforms characterization. Nucleic Acids Research 34, W440–W443 (2006).
Article CAS PubMed PubMed Central Google Scholar
Castrignanò, T. et al. The MEPS server for identifying protein conformational epitopes. BMC bioinformatics 8, 1–5 (2007).
Article Google Scholar
Libro, P. et al. First brain de-novo transcriptome of Tyrrenhian tree frog, Hyla sarda, for the study of dispersal-related behavioral variation. Front. Ecol. Evol. 10, 1–6 (2022).
Article ADS Google Scholar
Libro, P. et al. De novo transcriptome assembly and annotation for gene discovery in Salamandra salamandra at the larval stage. Sci. Data 10, 330 (2023).
Article CAS PubMed PubMed Central Google Scholar
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
Article CAS PubMed PubMed Central Google Scholar
Ewels, P., Magnusson, M., Lundin, S. & Käller, M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32, 3047–3048 (2016).
Article CAS PubMed PubMed Central Google Scholar
Liu, W. et al. Chromosome-level assembly of Culex pipiens molestus and improved reference genome of Culex pipiens pallens (Culicidae, Diptera). Mol Ecol Resour 23, 486–498 (2023).
Article CAS PubMed Google Scholar
Bushmanova, E., Antipov, D., Lapidus, A. & Prjibelski, A. D. rnaSPAdes: a de novo transcriptome assembler and its application to RNA-Seq data. Gigascience 8, giz100 (2019).
Article PubMed PubMed Central Google Scholar
Davidson, N. M. & Oshlack, A. Corset: enabling differential gene expression analysis for de novo assembled transcriptomes. Genome Biol. 15, 1–14 (2014).
Google Scholar
Chiocchio, A. et al. Brain de novo transcriptome assembly of a toad species showing polymorphic anti-predatory behaviour. Sci. Data 9, 619 (2022).
Article CAS PubMed PubMed Central Google Scholar
Smith-Unna, R., Boursnell, C., Patro, R., Hibberd, J. M. & Kelly, S. TransRate: Reference-free quality assessment of de novo transcriptome assemblies. Genome Res. 26, 1134–1144 (2016).
Article CAS PubMed PubMed Central Google Scholar
Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: Assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
Article PubMed Google Scholar
Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat biotechnol 37, 907–915 (2019).
Article CAS PubMed PubMed Central Google Scholar
Fu, L., Niu, B., Zhu, Z., Wu, S. & Li, W. CD-HIT: Accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150–3152 (2012).
Article CAS PubMed PubMed Central Google Scholar
Signal, B. & Kahlke, T. Borf: Improved ORF prediction in de-novo assembled transcriptome annotation. BioRxiv 2021–04 (2021).
Tang, S., Lomsadze, A. & Borodovsky, M. Identification of protein-coding regions in RNA transcripts. Nucleic Acids Res. 43, 78 (2015).
Article Google Scholar
Buchfink, B., Xie, C. & Huson, D. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2015).
Article CAS PubMed Google Scholar
Palomba, M. et al. De novo transcriptome assembly of an Antarctic nematode for the study of thermal adaptation in marine parasites. Scientific Data 10, 720 (2023).
Article CAS PubMed PubMed Central Google Scholar
Chabikwa, T. G. et al. De novo transcriptome assembly and annotation for gene discovery in avocado, macadamia and mango. Sci Data 7, 9 (2020).
Article CAS PubMed PubMed Central Google Scholar
Muller, J. et al. eggNOG v2. 0: extending the evolutionary genealogy of genes with enhanced non-supervised orthologous groups, species and functional annotations. Nucleic Acids Res. 38, D190–D195 (2010).
Article ADS CAS PubMed Google Scholar
Ranganathan, M., Narayanan, M., Kumarasamy, S. Importance of metabolic enzymes and their role in insecticide resistance. In: New and future development in biopesticide research: Biotechnological exploration. Springer, Singapore, pp. 243–260 (2022).
Mangul, S. et al. Transcriptome assembly and quantification from ion torrent rna-seq data. BMC Genomics, 15(S5). https://doi.org/10.1186/1471-2164-15-s5-s7 (2014).
O’Neil, S. T. et al. Population-level transcriptome sequencing of nonmodel organisms Erynnis propertius and Papilio zelicaon. BMC Genomics 11, 310, https://doi.org/10.1186/1471-2164-11-310 (2010).
Article CAS PubMed PubMed Central Google Scholar
Carruthers, M. et al. De novo transcriptome assembly, annotation and comparison of four ecological and evolutionary model salmonid fish species. BMC Genomics, 19(1). https://doi.org/10.1186/s12864-017-4379-x (2018).
Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 1–14 (2019).
Article Google Scholar
Libro, P. Identification of metabolic resistance mechanisms in diflubenzuron resistant Culex pipiens mosquitoes. European Nucleotide Archive (ENA) https://www.ebi.ac.uk/ena/browser/view/PRJEB47420 (2023).
Libro, P. Culex pipiens data collection. figshare. https://doi.org/10.6084/m9.figshare.c.6748110.v1 (2023).

Download references

Acknowledgements

We thank Valentina Lucchesi for technical help and Mark Eltenton for English revision. We acknowledge the CINECA and the ELIXIR-ITA HPC@CINECA initiative for providing HPC resources to our project ELIX5_porretta P.I. Daniele Porretta. This study was funded by European Union’s Horizon (INFRAVEC) research and innovation programme under grant agreement No 731060 (https://infravec2.eu/) (Grant No. 8006) and the funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. D.P. was supported by EU funding within the NextGeneration EU-MUR PNRR Extended Partnership initiative on Emerging Infectious Diseases (Project no. PE00000007, INF-ACT). V.M and S.U. received funds from the Italian Ministry for Education, University and Research (PRIN project: 2017J8JR57).

Author information

Authors and Affiliations

Department of Environmental Biology, Sapienza University of Rome, 00185, Rome, Italy
Valentina Mastrantonio, Sandra Urbanelli & Daniele Porretta
Department of Ecological and Biological Sciences, Tuscia University, Largo dell’Università snc, 01100, Viterbo, Italy
Pietro Libro, Jessica Di Martino & Tiziana Castrignanò
Envu, 2022 ES Deutschland GmbH, Germany, Monheim, Germany
Michele Matera
Department of Vector Biology, Liverpool School of Tropical Medicine, Pembroke Place, Liverpool, L3 5QA, United Kingdom
Michele Matera
Centro Agricoltura Ambiente “G. Nicoli”, Via Sant’Agata 835, 40014, Crevalcore, Italy
Romeo Bellini

Authors

Valentina Mastrantonio
View author publications
You can also search for this author in PubMed Google Scholar
Pietro Libro
View author publications
You can also search for this author in PubMed Google Scholar
Jessica Di Martino
View author publications
You can also search for this author in PubMed Google Scholar
Michele Matera
View author publications
You can also search for this author in PubMed Google Scholar
Romeo Bellini
View author publications
You can also search for this author in PubMed Google Scholar
Tiziana Castrignanò
View author publications
You can also search for this author in PubMed Google Scholar
Sandra Urbanelli
View author publications
You can also search for this author in PubMed Google Scholar
Daniele Porretta
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

V.M., D.P., S.U., conceived the study; V.M., S.U. and D.P. designed the experiment; M.M. and R.B. performed sampling and sample preparation; T.C. designed and coordinated the bioinformatic analysis; P.L., J.D.M. and T.C. performed reads quality assessment, reads alignment on transcriptome, transcriptome annotation and validation; V.M., P.L. and T.C. wrote the manuscript; V.M., P.L., J.D.M., T.C., M.M., R.B., S.U. and D.P. reviewed the manuscript.

Corresponding author

Correspondence to Tiziana Castrignanò.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Mastrantonio, V., Libro, P., Di Martino, J. et al. Integrated de novo transcriptome of Culex pipiens mosquito larvae as a resource for genetic control strategies. Sci Data 11, 471 (2024). https://doi.org/10.1038/s41597-024-03285-1

Download citation

Received: 07 August 2023
Accepted: 19 April 2024
Published: 09 May 2024
DOI: https://doi.org/10.1038/s41597-024-03285-1