Abstract
Mussels of order Unionida are a group of strictly freshwater bivalves with nearly 1,000 described species widely dispersed across world freshwater ecosystems. They are highly threatened showing the highest record of extinction events within faunal taxa. Conservation is particularly concerning in species occurring in the Mediterranean biodiversity hotspot that are exposed to multiple anthropogenic threats, possibly acting in synergy. That is the case of the dolphin freshwater mussel Unio delphinus Spengler, 1793, endemic to the western Iberian Peninsula with recently strong population declines. To date, only four genome assemblies are available for the order Unionida and only one European species. We present the first genome assembly of Unio delphinus. We used the PacBio HiFi to generate a highly contiguous genome assembly. The assembly is 2.5 Gb long, possessing 1254 contigs with a contig N50 length of 10 Mbp. This is the most contiguous freshwater mussel genome assembly to date and is an essential resource for investigating the species’ biology and evolutionary history that ultimately will help to support conservation strategies.
Similar content being viewed by others
Background & Summary
The application of genomics approaches to study non-model organisms is deemed a key approach to assess biodiversity and guide conservation1,2,3,4. Whole genome assemblies provide access to a species’ “entire genetic code”, thus representing the most comprehensive framework to efficiently decipher a species’ biology5,6. Genomic resources allow accurate definition of conservation units, identification of genetic elements with conservation relevance, inference of adaptive potential, assessment of population health, as well as provide predictive assessments of the impact of human-mediated threats and climate change3,5,7,8. Consequently, assembled genomes and other genomic tools are key resources to study and guide conservative actions and management planning.
Bivalves of the Order Unionida (known as freshwater mussels) are commonly found throughout most of the world’s freshwater ecosystems, where they play key ecological roles (e.g., nutrient and energy cycling and retention)9,10,11 and provide important services (e.g., water clearance, sediment mixing, pearls, and other raw materials)9,10,12. Despite their indisputable importance for freshwater ecosystems, freshwater mussels are among the most threatened taxa, with many populations worldwide having well-documented records of continuous declines over the last decades, as well as of many local and global extinctions13,14,15. Threatened species with limited distributions, such as the dolphin freshwater mussel U. delphinus Spengler, 1793 (Unionida: Unionidae) only found in the western Iberian Peninsula region (Fig. 1), represent particularly urgent but challenging targets for conservation16. The dolphin freshwater mussel, only recently recognised as a valid species17, has been strongly affected by a series of human-mediated actions over the last decades, including habitat destruction, dams or barrier construction, pollution, poor river management, water depletion, the introduction of invasive species, among others16,18. All these pressures are further augmented by the effects of climate change, especially the increasing interannual variability of water cycles over the last decades, which is particularly evident in the Mediterranean region19,20. As a consequence, the area of occurrence of the dolphin freshwater mussel has been reduced by almost one-third from its historical distribution18. This concerning trend has triggered an unprecedented effort to understand the threats and promote and implement conservation policies. These are critically dependent on the understanding the multiple aspects of the species’ biology, such as its life history, reproductive demands, ecological requirements, and its abiotic and biotic interactions13,16,18,21.
Recent efforts have focused on providing a thorough characterization of the species’ genetic diversity, population structure, and evolutionary history21,22,23. Despite the undeniable achievements of these early molecular studies, the availability of large-scale and more biologically informative genomic resources is almost inexistent, not only for U. delphinus but also for all freshwater mussels. In fact, for approximately 1000 known species, only four whole genome assemblies24,25,26,27 and less than 20 transcriptomes are currently available28,29,30,31,32,33,34,35,36,37,38,39,40,41. Recently, the first transcriptome assemblies of five threatened European freshwater mussel species have been published, including the gill transcriptome of the dolphin freshwater mussel41. This transcriptome was a fundamental tool to begin studying the evolutionary and adaptive traits of the species. However, single tissue RNA-seq approaches only capture a small fraction of the genetic information. Conversely, whole genome sequence assemblies represent a highly informative and fruitful resource to investigate and decipher multiple aspects of the species’ biology.
Here, we provide the first whole genome assembly of the dolphin freshwater mussel, U. delphinus. This is the most contiguous freshwater mussel genome assembly available, and the first Unionidae genome assembly from a European species. This genome provides a unique tool for an in-depth exploration of the many molecular mechanisms governing the biology of this species, which will ultimately guide conservation genomic studies to protect the critically declining population trend.
Methods
Animal sampling
One individual of Unio delphinus was collected in the Rabaçal River in Portugal (Table 1) and transported alive to the laboratory, where tissues were separated, flash-frozen, and stored at −80 °C. The whole shell and preserved tissues of the individual are deposited at CIIMAR tissue and mussels’ collection, under the voucher code: BIV7592.
DNA extraction, library construction, and sequencing
For PacBio HiFi sequencing, mantle tissue was sent to Brigham Young University (BYU), where high-molecular-weight DNA extraction and PacBio HiFi library construction and sequencing were performed, following the manufacturer’s recommendations (https://www.pacb.com/wp-content/uploads/Procedure-Checklist-Preparing-HiFi-SMRTbell-Libraries-using-SMRTbell-Express-Template-Prep-Kit-2.0.pdf). Size-selection was conducted on the SageELF system. Sequencing was performed on four single-molecule, real-time (SMRT) cells using Sequel II system v.9.0, with a run time of 30 h, and 2.9 h pre-extension. The circular consensus analysis was performed in SMRT® Link v9.0 (https://www.pacb.com/wp-content/uploads/SMRT_Link_Installation_v90.pdf) under default settings (Table 2).
For short read Illumina sequencing, extracted genomic DNA was sent to Macrogen Inc. where a standard Illumina Truseq Nano DNA library preparation and whole genome sequencing of 150 bp paired-end (PE) reads was achieved using an Illumina HiSeq X machine (Table 2).
Pre-assembly processing
Illumina PE short read quality was evaluated using FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and after, reads were quality trimmed with Trimmomatic v.0.3842, specifying the parameters “LEADING: 5 TRAILING: 5 SLIDINGWINDOW: 5:20 MINLEN: 36”. The quality of the clean reads was re-validated in FastQC. The clean reads were used to estimate genome size, heterozygosity and repetitive content using Jellyfish v.2.2. and GenomeScope v2.043 specifying a k-mer length of 25.
Mitochondrial genome assembly
PacBio HiFi reads were used to retrieve a whole mitochondrial genome (mtDNA) assembly by applying a pipeline recently developed by our group44. Briefly, all Unionida mtDNA assemblies available on NCBI were retrieved (Fasta format; Supplementary_File1) and used as a reference mitogenome database. All the raw PacBio HiFi reads were mapped to the mitogenome database using Minimap2 v.2.1745, specifying parameters (-ax asm20). The output sam file was converted to bam and sorted using Samtools v.1.946, with options “view” and “sort”, respectively. Samtools “view” was also used to retrieve only the mapped reads with parameter (-F 0 × 04) and after the bam file was converted to fastq format using the option “bam2fq”. The resulting PacBio HiFi mtDNA reads were corrected using Hifiasm v.0.13-r30847,48 with parameters (–write-ec). The corrected reads were assembled using Unicycler v.0.4.849, a software package optimised for circular assemblies, with default parameters. Mitogenome annotation was produced using MitoZ v.3.450 with parameters (--genetic_code 5--clade Mollusca), using the PE reads for coverage plotting.
Genome assembly
The overall pipeline used to obtain the genome assembly and annotation is provided in Fig. 2.
Firstly, PacBio HiFi reads were assembled using multiple software optimized for PacBio HiFi reads, i.e., Hifiasm 0.16.1-r37547,48 with default parameters, Flye v.2.8.351 with parameters (–pacbio-hifi), NextDenovo v.2.4.0 (https://github.com/Nextomics/NextDenovo) with parameters (read_type = hifi) and Peregrine-2021 v0.4.352 with default parameters. After, the overall quality of each assembly was assessed using Benchmarking Universal Single-Copy Orthologs (BUSCO) v.5.2.253 with Eukaryota and Metazoa databases and Quality Assessment Tool for Genome Assemblies (QUAST) v.5.0.254 (Fig. 2). Hifiasm 0.16.1-r375 produced the best results of the tested assemblies and thus was selected for further analyses. Since the genome size was larger than predicted by the GenomeScope report, several new assemblies were produced with this Hifiasm 0.16.1-r375, testing a range of parameters (l = 3; s = 0.50, 0.45, 0.35), following the authors’ recommendations (https://hifiasm.readthedocs.io/en/latest/faq.html#p-large). Given that reducing the similarity threshold for duplicate haplotigs (i.e., parameter -l and -s) had little impact on the overall statistic, the assembly with default parameters was chosen for further analysis. To separate poorly resolved pseudo-haplotypes, purge_dups v.1.2.555 was applied, first with default parameters and after by manually adjusting the transition between haploid and diploid cut-off (i.e., parameter -m of option calcuts) to 30, 32 and 25 in three independent runs. In all the runs the lower and upper bound for read depth were always maintained, i.e., 5 and 87, respectively. All the cutoff values were determined by inspection of the k-mer plot produced by the K-mer Analysis Toolkit (KAT) tool56. The influence of purge_dups v.1.2.5 was evaluated using BUSCO v.5.2.2 with Eukaryota and Metazoa databases and QUAST v.5.0.2. Since purge_dups v.1.2.5 did not remove any duplicates (neither with the default nor adjusted cutoffs) the Hifiasm 0.16.1-r375 default assembly was selected as the final assembly. To evaluate the quality of the final assembly, several metrics and software were used. Besides BUSCO v.5.2.2 and QUAST v.5.0.2 metrics, completeness, heterozygosity, and collapsing of repetitive regions were evaluated using a k-mer distribution with KAT56. Moreover, read-back mapping was performed for the PE using with Burrows-Wheeler Aligner (BWA) v.0.7.17-r119857, for long reads with Minimap2 v.2.17 and for RNA-seq (SRR1926176441) with Hisat2 v.2.2.058. To inspect the genome for possible contamination, we used BlobTools v.1.1.159. Briefly, a blast search of the final genome assembly was conducted against the RefSeq60 database, using the BLASTX function from DIAMOND v.2.0.11.14961, following authors’ instructions59. The blast output, as well as the alignment of PE short reads against the genome performed with BWA v.0.7.17, were used as input to run BlobTools, with contamination screening at Phylum level.
Masking of repetitive elements, gene models predictions and annotation
Before masking repetitive elements, a de novo library of repeats was created for the final genome assembly, with RepeatModeler v.2.0.13362. Subsequently, the genome was soft masked combining the de novo library with the ‘Bivalvia’ libraries from Dfam_consensus-20170127 and RepBase-20181026, using RepeatMasker v.4.0.73463.
The masked assembly was used for gene prediction, performed using BRAKER2 pipeline v2.1.664. First, RNA-seq data from U. delphinus was retrieved from GenBank (SRR1926176441) (the same individual used for the genome assembly), quality trimmed with Trimmomatic v.0.3839 (parameters described above) and aligned to the masked genome, using Hisat2 v.2.2.0 with the default parameters. Moreover, the complete proteome of 14 mollusc species and three reference Metazoa genomes (Homo sapiens, Ciona intestinalis, Strongylocentrotus purpuratus), were used as supplementary evidence for gene prediction, downloaded from public databases (Table 3). BRAKER2 pipeline was applied, specifying parameters “–etpmode; –softmasking;”. The resulting predictions file (braker.gtf) was filtered to retain only predictions with RNA-Seq and/or protein evidence (using auxiliary scripts selectSupportedSubsets.py) and after converted to.gff3 using the Augustus auxiliary script gtf2gff.pl. Gene predictions were processed using a series of auxiliary scripts from Another Gtf/Gff Analysis Toolkit (AGAT) v.0.8.06365. Briefly, gene predictions were clean with agat_convert_sp_gxf2gxf.pl, renamed with agat_sp_manage_functional_annotation.pl, overlapping prediction corrected with agat_sp_fix_overlaping_genes.pl and coding sequence regions (CDS) with <100 amino acid and incomplete gene predictions (i.e., without start and/or stop codons) were corrected and/or removed with agat_sp_add_start_and_stop.pl and agat_sp_filter_incomplete_gene_coding_models.pl, respectively. Finally, the overall statistics of the processed predictions were retrieved using agat_sp_statistics.pl and the predicted genes, protein, CDS and exon sequences were retrieved using agat_sp_extract_sequences.pl. The protein sequences were next used for functional annotation, using InterProScan v.5.44.8066, as well as BLASTP searches against the RefSeq database60. BLASTP homology searches were obtained using DIAMOND v.2.0.11.14961, specifying the parameters “-k 1, -b 20, -e 1e-5,–sensitive,–outfmt 6”. To validate the set of proteins obtained, the BUSCO scores were estimated based on the protein set, using the Eukaryota and Metazoa databases, as described previously.
Data Records
The raw reads sequencing outputs were deposited at the NCBI Sequence Read Archive with the accession’s numbers: SRR23060683, SRR23060685, SRR23060678 and SRR23060675 for PacBio CCS HiFi; SRR23060686 for Illumina PE67. The Genome assembly is available under accession number JAQISU00000000068. BioSample accession number is SAMN32554582 and BioProject PRJNA91785569. The remaining information was uploaded to figshare. In detail, the files uploaded to figshare70 include the final unmasked and masked genome assemblies (Ude_BIV7592_haploid.fa and Ude_BIV7592_haploid_SM.fa), the two pseudohaplotypes genome assemblies generated by Hifiasm assembler (Ude_BIV7592_pseudohaplotype_1.fas.gz and Ude_BIV7592_pseudohaplotype_2.fas.gz), the annotation file (Ude_BIV7592_annotation_v4.gff3), predicted genes (Ude_BIV7592_genes_v4.fasta), predicted messenger RNA (Ude_BIV7592_mrna_v4.fasta), predicted open reading frames (Ude_BIV7592_cds_v4.fasta), predicted proteins (Ude_BIV7592_proteins_v4.fasta), as well as full table reports for Braker gene predictions and InterProScan functional annotations (Ude_BIV7592_annotation_v4_InterPro_report.txt) and RepeatMasker predictions (Ude_BIV7592_annotation_v4_RepeatMasker.tbl).
Technical Validation
Raw datasets and pre-assembly processing quality control
Raw sequencing outputs general statistics are provided in Table 2. GenomeScope2 estimated genome size was ~2.31 Gb and heterozygosity levels of ~0.64% (Fig. 3a), both within the values observed for other Unionidae genomes available24,25,26,27.
Genome assembly metrics
Hifiasm produced the overall most contiguous and complete (accessed under BUSCO scores) genome assembly of all the tested assemblers (Table 4). Both Flye and Peregrine-2021 were very inefficient in collapsing haplotypes, resulting in unexpectedly large assemblies with high levels of duplicated BUSCO scores (Table 4). Conversely, Hifiasm and NextDenovo efficiently resolve duplicates while ensuring high complete BUSCO scores (Table 4). Additionally, Hifiasm produced a much more contiguous genome assembly, with an almost 5-fold increased N50 length (Table 4). Although the BUSCO scores of the Hifiasm assembly had residual percentages of duplicated sequences, considering the increased genome size compared with GenomeScope estimation, as well as the genome sizes of other Unionidae assemblies (Table 5), we tested several similarity thresholds for duplicates in Hifiasm. The impact of the resulting assemblies on the overall statistics was limited, i.e., -s 0.50-0.35, or had no impact at all, i.e., -l 3 (Table 4). Although two of the assemblies, i.e., -s 0.50 and -s 0.45, show a slight increase in the N50 length (Table 4), given the overall little impact in the final genome size, we opted to use the Hifiasm default assembly as the final assembly. Moreover, purg-dups software did not remove any additional sequences from the Hifiasm default assembly, suggesting that reducing the similarity threshold for duplicate haplotigs (option -s) might be over-purging the assembly.
The final genome assembly has a total length of ~2.5 Gbp, which is relatively larger than the GenomeScope size estimation, i.e., ~2.31 Gbp (Table 5, Fig. 3a). Although unexpected, the fact is that from all the primary assemblies here produced (from different software and Hifiasm parameters), none had a total length close to those estimated from GenomeScope (Tables 4–5). The alternative haplotypes assemblies produced by Hifiasm have a total length similar to the GenomeScope estimations, however, the complete BUSCO scores were reduced for these assemblies with no significate impact on duplicates (Table 5). On the other hand, purge-dups did not report any duplicated sequences in the assembly, which further support that Hifiasm efficiently resolved the haplotype variants. Moreover, the few genome assemblies available for freshwater mussels, show considerable distinct genome sizes (up to 696Mbp difference in size), even within the family Unionidae (Table 5). Consequently, the discrepancies between GenomeScope and the final genome size are likely a consequence of short read-based k-mer frequency spectrum analyses inaccurate estimation of the genome size.
The assembly here presented also shows, the most contiguous freshwater mussel genome assembly available to date, with a contig N50 length of ~ 10 Mbp, which represents a ~5-fold increase in N50 length regarding the only other PacBio-based genome assembly, i.e., from P. streckersoni25 (Table 5). The levels of completeness reported by BUSCOs scores are also within those observed for other freshwater mussel genome assemblies, with nearly no fragmented nor missing hits for both the eukaryotic and metazoan curated lists of near-universal single-copy orthologous (Table 5). The KAT k-mer analyses revealed a low level of k-mer duplication (blue, green, purple, and orange in Fig. 3b), with a high level of haplotype uniqueness (red in Fig. 3b) and a similar k-mer distribution to GenomeScope2 (performed with Illumina PE reads Fig. 3a,b). Both short-read, RNA-seq and long-read back-mapping percentages resulted in an almost complete mapping (Table 5). Finally, BlobTools Read Coverage Plots (ReadCovPlot) shows a dominance of hits with Mollusca (41.68%), followed by two groups with a similar hit percentage, i.e., undefined (27.41%) and Arthropoda (22.81%) (Fig. S1). The high values of undefined hits are expected given the overwhelming low number of closely related species with annotated genomes available on NCBI. Only 16 bivalves’ genomes have annotations available of NCBI, none of which belong to freshwater mussels or Palaeoheterodonta. In fact, annotations are only available for two higher-level bivalve clades, the vast majority for Pteriomorphia (n = 12) and the remaining for Imparidentia (n = 4). Moreover, this low and biased representation of annotated references most likely also explains the apparent contamination with Arthropoda (Fig. S1), as unspecific hits with unrelated taxa have been observed in other recent freshwater mussel genome assemblies24. Nevertheless, to deeply scrutinize for possible contaminations, the percentage of phyla representation was also quantified from the U. delphinus predicted proteins, using the RefSeq BLASTP search outputs (Fig. S2, Supplementary File 2). The results show the dominance of hits with Mollusca, with other taxa having residual representation and low percentages of identity, thus unlikely to represent contaminations (Fig. S2, Supplementary File 2).
Overall, these general statistics validate the high completeness, low redundancy, and quality of the final genome assembly.
Repeat masking, gene models prediction, and annotation
RepeatModeler/RepeatMasker masked 52.83% of the genome, a value within those observed for other Unionida genome assemblies and close to the GenomeScope estimation (Table 6, Fig. 3a). Unlike the results observed in previous freshwater mussel’s genome assemblies24,25, most repeats are classified as DNA elements (21.92%, ~ 549 Mgp), rather than unclassified (16.32%, ~ 408 Mgp), with the remaining categories having similar percentages (Table 6). These results might be a consequence of PacBio HiFi reads efficiency in resolving repetitive regions thus facilitating their classification. BRAKER2 gene prediction identified 44,382 CDS, which is close to the predictions of the other freshwater mussel assemblies (Table 5). BUSCO scores for protein predictions showed almost no missing hits for either of the near-universal single-copy orthologous databases used (Table 5). The number of functionally annotated genes was 32,089, which is similar to the number of annotated genes for the Margaritifera margaritifera genome assembly (Table 5)24. Overall, the numbers of both predicted and annotated genes are within the expected range for bivalves (reviewed in71), as well as within the records of other freshwater mussel assemblies (Table 5)24,25,26,27.
Code availability
All software with respective versions and parameters used for producing the resources here presented (i.e., transcriptome assembly, pre and post-assembly processing stages, and transcriptome annotation) are listed in the methods section. Software programs with no parameters associated were used with the default settings.
References
Allendorf, F. W., Hohenlohe, P. A. & Luikart, G. Genomics and the future of conservation genetics. Nature Reviews Genetics 2010 11:10 11, 697–709 (2010).
Formenti, G. et al. The era of reference genomes in conservation genomics. Trends Ecol Evol 37, 197–202 (2022).
Hohenlohe, P. A., Funk, W. C. & Rajora, O. P. Population genomics for wildlife conservation and management. Mol Ecol 30, 62–82 (2021).
Meek, M. H. & Larson, W. A. The future is now: Amplicon sequencing and sequence capture usher in the conservation genomics era. Mol Ecol Resour 19, 795–803 (2019).
Paez, S. et al. Reference genomes for conservation. Science (1979) 377, 364–366 (2022).
Stephan, T. et al. Darwinian genomics and diversity in the tree of life. Proc Natl Acad Sci USA 119 (2022).
van Oppen, M. J. H. & Coleman, M. A. Advancing the protection of marine life through genomics. PLoS Biol 20, e3001801 (2022).
Bertorelle, G. et al. Genetic load: genomic estimates and applications in non-model animals. Nature Reviews Genetics 2022 23:8 23, 492–503 (2022).
Vaughn, C. C., Nichols, S. J. & Spooner, D. E. Community and foodweb ecology of freshwater mussels. 27, 409–423, https://doi.org/10.1899/07-058.1 (2015).
Vaughn, C. C. Ecosystem services provided by freshwater mussels. Hydrobiologia 2017 810:1 810, 15–27 (2017).
Lopes-Lima, M. et al. Biology and conservation of freshwater bivalves: Past, present and future perspectives. Hydrobiologia 735, 1–13, https://doi.org/10.1007/s10750-014-1902-9 (2014).
Haag, W. R. North American Freshwater Mussels: Natural History, Ecology, and Conservation. (Cambridge University Press, 2012).
Lopes-Lima, M. et al. Conservation status of freshwater mussels in Europe: state of the art and future challenges. Biological Reviews 92, 572–607 (2017).
Cuttelod, A., Seddon, M. & Neubert, E. European red list of non-marine molluscs. (Publications Office of the European Union Luxembourg, 2011).
Lopes-Lima, M. et al. Conservation of freshwater bivalves at the global scale: diversity, threats and research needs. Hydrobiologia 810, 1–14 (2018).
Lopes-Lima, M. et al. Setting the stage for new ecological indicator species: A holistic case study on the Iberian dolphin freshwater mussel Unio delphinus Spengler, 1793. Ecol Indic 111, 105987 (2020).
Araujo, R. et al. Las náyades de la península Ibérica As náiades da Península Ibérica The naiads of the Iberian Peninsula. 27, 7–72 (2009).
Araujo, R., Feo, C., Pou, Q. & Campos, M. Conservation of two endangered European freshwater mussels (Bivalvia: Unionidae): a three-year, semi-natural breeding experiment. Nautilus (Philadelphia) 129, 126–135 (2015).
Robson, B. J., Chester, E. T., Mitchell, B. D. & Matthews, T. G. Disturbance and the role of refuges in mediterranean climate streams. Hydrobiologia 719, 77–91 (2013).
Cid, N. et al. High Variability Is a Defining Component of Mediterranean-Climate Rivers and Their Biota. Water 2017, Vol. 9, Page 52 9, 52 (2017).
Froufe, E. et al. Who lives where? Molecular and morphometric analyses clarify which Unio species (Unionida, Mollusca) inhabit the southwestern Palearctic. Org Divers Evol 16, 597–611 (2016).
Fonseca, M. M., Lopes-Lima, M., Eackles, M. S., King, T. L. & Froufe, E. The female and male mitochondrial genomes of Unio delphinus and the phylogeny of freshwater mussels (Bivalvia: Unionida). Mitochondrial DNA B Resour 1, 954–957 (2016).
Araujo, R., Buckley, D., Nagel, K. O., García-Jiménez, R. & Machordom, A. Species boundaries, geographic distribution and evolutionary history of the Western palaearctic freshwater mussels Unio (Bivalvia: Unionidae). Zool J Linn Soc 182, 275–299 (2018).
Gomes-dos-Santos, A. et al. The Crown Pearl: a draft genome assembly of the European freshwater pearl mussel Margaritifera margaritifera (Linnaeus, 1758). DNA Research 28 (2021).
Smith, C. H. A High-Quality Reference Genome for a Parasitic Bivalve with Doubly Uniparental Inheritance (Bivalvia: Unionida). Genome Biol Evol 13 (2021).
Rogers, R. L. et al. Gene family amplification facilitates adaptation in freshwater unionid bivalve Megalonaias nervosa. Mol Ecol 30, 1155–1173 (2021).
Renaut, S. et al. Genome Survey of the Freshwater Mussel Venustaconcha ellipsiformis (Bivalvia: Unionida) Using a Hybrid De Novo Assembly Approach. Genome Biol Evol 10, 1637–1646 (2018).
Roznere, I., Sinn, B. T. & Watters, G. T. The Amblema plicata Transcriptome as a Resource to Assess Environmental Impacts on Freshwater Mussels. Freshwater Mollusk Biology and Conservation 21, 57–64 (2018).
Wang, R. et al. Rapid development of molecular resources for a freshwater mussel, Villosa lienosa (Bivalvia:Unionidae), using an RNA-seq-based approach. 31, 695–708, https://doi.org/10.1899/11-149.1 (2015).
Luo, Y. et al. Transcriptomic Profiling of Differential Responses to Drought in Two Freshwater Mussel Species, the Giant Floater Pyganodon grandis and the Pondhorn Uniomerus tetralasmus. PLoS One 9, e89481 (2014).
Patnaik, B. B. et al. Sequencing, De Novo Assembly, and Annotation of the Transcriptome of the Endangered Freshwater Pearl Bivalve, Cristaria plicata, Provides Novel Insights into Functional Genes and Marker Discovery. PLoS One 11, e0148622 (2016).
Wang, X., Liu, Z. & Wu, W. Transcriptome analysis of the freshwater pearl mussel (Cristaria plicata) mantle unravels genes involved in the formation of shell and pearl. Molecular Genetics and Genomics 292, 343–352 (2017).
Yang, Q. et al. Histopathology, antioxidant responses, transcriptome and gene expression analysis in triangle sail mussel Hyriopsis cumingii after bacterial infection. Dev Comp Immunol 124, 104175 (2021).
Bertucci, A. et al. Transcriptomic responses of the endangered freshwater mussel Margaritifera margaritifera to trace metal contamination in the Dronne River, France. Environmental Science and Pollution Research 24, 27145–27159 (2017).
Robertson, L. S., Galbraith, H. S., Iwanowicz, D., Blakeslee, C. J. & Cornman, R. S. RNA sequencing analysis of transcriptional change in the freshwater mussel Elliptio complanata after environmentally relevant sodium chloride exposure. Environ Toxicol Chem 36, 2352–2366 (2017).
Capt, C. et al. Deciphering the Link between Doubly Uniparental Inheritance of mtDNA and Sex Determination in Bivalves: Clues from Comparative Transcriptomics. Genome Biol Evol 10, 577–590 (2018).
Huang, D., Shen, J., Li, J. & Bai, Z. Integrated transcriptome analysis of immunological responses in the pearl sac of the triangle sail mussel (Hyriopsis cumingii) after mantle implantation. Fish Shellfish Immunol 90, 385–394 (2019).
Capt, C., Renaut, S., Stewart, D. T., Johnson, N. A. & Breton, S. Putative Mitochondrial Sex Determination in the Bivalvia: Insights From a Hybrid Transcriptome Assembly in Freshwater Mussels. Front Genet 10, 840 (2019).
Chen, X., Bai, Z. & Li, J. The Mantle Exosome and MicroRNAs of Hyriopsis cumingii Involved in Nacre Color Formation. Marine Biotechnology 21, 634–642 (2019).
Cornman, R. S., Robertson, L. S., Galbraith, H. & Blakeslee, C. Transcriptomic Analysis of the Mussel Elliptio complanata Identifies Candidate Stress-Response Genes and an Abundance of Novel or Noncoding Transcripts. PLoS One 9, e112420 (2014).
Gomes-dos-Santos, A. et al. The gill transcriptome of threatened European freshwater mussels. Sci Data 9, 494 (2022).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
Ranallo-Benavidez, T. R., Jaron, K. S. & Schatz, M. C. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat Commun 11, 1–10 (2020).
Machado, A. M. et al. A genome assembly of the Atlantic chub mackerel (Scomber colias): a valuable teleost fishing resource. GigaByte 2022, 1–21 (2022).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Danecek, P. et al. Twelve years of SAMtools and BCFtools. Gigascience 10, 1–4 (2021).
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods 18, 170–175 (2021).
Cheng, H. et al. Haplotype-resolved assembly of diploid genomes without parental data. Nature Biotechnology 2022 40:9 40, 1332–1335 (2022).
Wick, R. R., Judd, L. M., Gorrie, C. L. & Holt, K. E. Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads. PLoS Comput Biol 13, e1005595 (2017).
Meng, G., Li, Y., Yang, C. & Liu, S. MitoZ: a toolkit for animal mitochondrial genome assembly, annotation and visualization. Nucleic Acids Res 47, e63–e63 (2019).
Kolmogorov, M., Yuan, J., Lin, Y. & Pevzner, P. A. Assembly of long, error-prone reads using repeat graphs. Nature Biotechnology 2019 37:5 37, 540–546 (2019).
Chin, C.-S. & Khalak, A. Human Genome Assembly in 100 Minutes. bioRxiv 705616, https://doi.org/10.1101/705616 (2019).
Manni, M., Berkeley, M. R., Seppey, M., Simão, F. A. & Zdobnov, E. M. BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes. Mol Biol Evol 38, 4647–4654 (2021).
Gurevich, A., Saveliev, V., Vyahhi, N. & Tesler, G. QUAST: quality assessment tool for genome assemblies. Bioinformatics 29, 1072–1075 (2013).
Guan, D. et al. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics 36, 2896–2898 (2020).
Mapleson, D., Accinelli, G. G., Kettleborough, G., Wright, J. & Clavijo, B. J. KAT: A K-mer analysis toolkit to quality control NGS datasets and genome assemblies. Bioinformatics 33, 574–576 (2017).
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. (2013).
Kim, D., Langmead, B. & Salzberg, S. L. HISAT: A fast spliced aligner with low memory requirements. Nat Methods 12, 357–360 (2015).
Laetsch, D. R. & Blaxter, M. L. BlobTools: interrogation of genome assemblies. F1000Res 6, 1287 (2017).
Pruitt, K. D., Tatusova, T. & Maglott, D. R. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 35, D61–D65 (2007).
Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat Methods 12, 59–60 (2015).
Smit, A. & Hubley, R. RepeatModeler. www.repeatmasker.org (2015).
Smit, A. & Hubley, R. RepeatMasker. www.repeatmasker.org (2015).
Brůna, T., Hoff, K. J., Lomsadze, A., Stanke, M. & Borodovsky, M. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genom Bioinform 3, 1–11 (2021).
Dainat, J., Hereñú, D. & Pucholt, P. AGAT: Another Gff Analysis Toolkit to handle annotations in any GTF/GFF format. Zenodo https://doi.org/10.5281/zenodo.4205393 (2020).
Quevillon, E. et al. InterProScan: Protein domains identifier. Nucleic Acids Res 33, W116–W120 (2005).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP417152 (2023).
Gomes-dos-Santos, A. et al. Unio delphinus voucher BIV7592, whole genome shotgun sequencing project. Genbank https://identifiers.org/nucleotide:JAQISU000000000 (2023).
NCBI BioProject https://identifiers.org/ncbi/bioproject:PRJNA917855 (2023).
Gomes-dos-Santos, A. et al. PacBio Hi-Fi genome assembly of the Iberian dolphin freshwater mussel Unio delphinus Spengler, 1793. figshare https://doi.org/10.6084/m9.figshare.21878946 (2023).
Gomes-dos-Santos, A., Lopes-Lima, M., C. Castro, L. F. & Froufe, E. Molluscan genomics: the road so far and the way forward. Hydrobiologia 847, 1705–1726 (2020).
Lehner, B. & Grill, G. Global river hydrography and network routing: Baseline data and new approaches to study the world’s large river systems. Hydrol Process 27, 2171–2186 (2013).
Acknowledgements
AGS was funded by the Portuguese Foundation for Science and Technology (FCT) under the grants SFRH/BD/137935/2018 and COVID/DB/152933/2022, which also supported MLL (2020.03608.CEECIND) and EF (CEECINST/00027/2021). This research was developed under the project EdgeOmics - Freshwater Bivalves at the Edge: Adaptation genomics under climate-change scenarios (PTDC/CTA-AMB/3065/2020) funded by FCT through national funds. Additional strategic funding was provided by FCT UIDB/04423/2020 and UIDP/04423/2020. We thank the two anonymous reviewers for the helpful remarks and suggestions, which have significantly improved the manuscript.
Author information
Authors and Affiliations
Contributions
E.F., M.L.L. and L.F.C.C. designed and conceived this work. M.L.L. and A.T. collected the samples. A.G.S. and A.M.M. carry on all the analysis. A.G.S. and E.F. wrote the first version of the manuscript. All authors read, revised, and approved the final manuscript.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Gomes-dos-Santos, A., Lopes-Lima, M., Machado, M.A. et al. PacBio Hi-Fi genome assembly of the Iberian dolphin freshwater mussel Unio delphinus Spengler, 1793. Sci Data 10, 340 (2023). https://doi.org/10.1038/s41597-023-02251-7
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41597-023-02251-7