PacBio Hi-Fi genome assembly of the Iberian dolphin freshwater mussel Unio delphinus Spengler, 1793

Gomes-dos-Santos, André; Lopes-Lima, Manuel; Machado, M. André; Teixeira, Amílcar; C. Castro, L. Filipe; Froufe, Elsa

doi:10.1038/s41597-023-02251-7

Data Descriptor
Open access
Published: 01 June 2023

Scientific Data volume 10, Article number: 340 (2023)

Abstract

Mussels of order Unionida are a group of strictly freshwater bivalves with nearly 1,000 described species widely dispersed across world freshwater ecosystems. They are highly threatened, showing the highest record of extinction events within faunal taxa. Conservation is particularly concerning in species occurring in the Mediterranean biodiversity hotspot that are exposed to multiple anthropogenic threats, possibly acting in synergy. That is the case of the dolphin freshwater mussel Unio delphinus Spengler, 1793, endemic to the western Iberian Peninsula, which has recently undergone strong population declines. To date, only four genome assemblies are available for the order Unionida and only one for a European species. We present the first genome assembly of Unio delphinus. We used PacBio HiFi sequencing to generate a highly contiguous genome assembly. The assembly is 2.5 Gb long, comprising 1254 contigs with a contig N50 length of 10 Mbp. This is the most contiguous freshwater mussel genome assembly to date and is an essential resource for investigating the species’ biology and evolutionary history, which will ultimately help to support conservation strategies.

Background & Summary

The application of genomics approaches to study non-model organisms is deemed a key approach to assess biodiversity and guide conservation^1,2,3,4. Whole genome assemblies provide access to a species’ “entire genetic code”, thus representing the most comprehensive framework to efficiently decipher a species’ biology^5,6. Genomic resources allow accurate definition of conservation units, identification of genetic elements with conservation relevance, inference of adaptive potential, and assessment of population health, and they provide predictive assessments of the impact of human-mediated threats and climate change^3,5,7,8. Consequently, assembled genomes and other genomic tools are key resources to study and guide conservation actions and management planning.

Bivalves of the Order Unionida (known as freshwater mussels) are commonly found throughout most of the world’s freshwater ecosystems, where they play key ecological roles (e.g., nutrient and energy cycling and retention)^9,10,11 and provide important services (e.g., water clearance, sediment mixing, pearls, and other raw materials)^9,10,12. Despite their indisputable importance for freshwater ecosystems, freshwater mussels are among the most threatened taxa, with many populations worldwide having well-documented records of continuous declines over the last decades, as well as many local and global extinctions^13,14,15. Threatened species with limited distributions, such as the dolphin freshwater mussel U. delphinus Spengler, 1793 (Unionida: Unionidae), found only in the western Iberian Peninsula region (Fig. 1), represent particularly urgent but challenging targets for conservation¹⁶. The dolphin freshwater mussel, only recently recognised as a valid species¹⁷, has been strongly affected by a series of human-mediated actions over the last decades, including habitat destruction, dam or barrier construction, pollution, poor river management, water depletion, and the introduction of invasive species, among others^16,18. All these pressures are further augmented by the effects of climate change, especially the increasing interannual variability of water cycles over the last decades, which is particularly evident in the Mediterranean region^19,20. As a consequence, the area of occurrence of the dolphin freshwater mussel has been reduced by almost one-third from its historical distribution¹⁸. This concerning trend has triggered an unprecedented effort to understand the threats and to promote and implement conservation policies. These are critically dependent on understanding multiple aspects of the species’ biology, such as its life history, reproductive demands, ecological requirements, and its abiotic and biotic interactions^13,16,18,21.

Recent efforts have focused on providing a thorough characterization of the species’ genetic diversity, population structure, and evolutionary history^21,22,23. Despite the undeniable achievements of these early molecular studies, large-scale and more biologically informative genomic resources are almost non-existent, not only for U. delphinus but for all freshwater mussels. In fact, for approximately 1000 known species, only four whole genome assemblies^24,25,26,27 and fewer than 20 transcriptomes are currently available^{28,29,30,31,32,33,34,35,36,37,38,39,40,41}. Recently, the first transcriptome assemblies of five threatened European freshwater mussel species have been published, including the gill transcriptome of the dolphin freshwater mussel⁴¹. This transcriptome was a fundamental tool to begin studying the evolutionary and adaptive traits of the species. However, single-tissue RNA-seq approaches only capture a small fraction of the genetic information. Conversely, whole genome sequence assemblies represent a highly informative and fruitful resource to investigate and decipher multiple aspects of the species’ biology.

Here, we provide the first whole genome assembly of the dolphin freshwater mussel, U. delphinus. This is the most contiguous freshwater mussel genome assembly available, and the first Unionidae genome assembly from a European species. This genome provides a unique tool for an in-depth exploration of the many molecular mechanisms governing the biology of this species, which will ultimately guide conservation genomic studies aimed at protecting its critically declining populations.

Methods

Animal sampling

One individual of Unio delphinus was collected in the Rabaçal River in Portugal (Table 1) and transported alive to the laboratory, where tissues were separated, flash-frozen, and stored at −80 °C. The whole shell and preserved tissues of the individual are deposited at CIIMAR tissue and mussels’ collection, under the voucher code: BIV7592.

Table 1 MixS descriptors for the Unio delphinus specimen used for whole genome sequencing.

DNA extraction, library construction, and sequencing

For PacBio HiFi sequencing, mantle tissue was sent to Brigham Young University (BYU), where high-molecular-weight DNA extraction and PacBio HiFi library construction and sequencing were performed, following the manufacturer’s recommendations (https://www.pacb.com/wp-content/uploads/Procedure-Checklist-Preparing-HiFi-SMRTbell-Libraries-using-SMRTbell-Express-Template-Prep-Kit-2.0.pdf). Size-selection was conducted on the SageELF system. Sequencing was performed on four single-molecule, real-time (SMRT) cells using Sequel II system v.9.0, with a run time of 30 h, and 2.9 h pre-extension. The circular consensus analysis was performed in SMRT® Link v9.0 (https://www.pacb.com/wp-content/uploads/SMRT_Link_Installation_v90.pdf) under default settings (Table 2).

Table 2 General statistics of raw sequencing reads used for the Unio delphinus genome assembly.

For short read Illumina sequencing, extracted genomic DNA was sent to Macrogen Inc. where a standard Illumina Truseq Nano DNA library preparation and whole genome sequencing of 150 bp paired-end (PE) reads was achieved using an Illumina HiSeq X machine (Table 2).

Pre-assembly processing

Illumina PE short read quality was evaluated using FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/), and reads were then quality trimmed with Trimmomatic v.0.38⁴², specifying the parameters “LEADING:5 TRAILING:5 SLIDINGWINDOW:5:20 MINLEN:36”. The quality of the clean reads was re-validated in FastQC. The clean reads were used to estimate genome size, heterozygosity and repetitive content using Jellyfish v.2.2. and GenomeScope v2.0⁴³, specifying a k-mer length of 25.
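To illustrate the idea behind k-mer based genome profiling, the following is a minimal sketch in Python. It is not the Jellyfish or GenomeScope implementation: it counts plain (non-canonical) k-mers in memory and uses a naive estimate, total k-mers divided by the depth of the main peak, whereas GenomeScope fits a statistical model to the k-mer spectrum. The function names and the `min_depth` cutoff are assumptions made for illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer of length k across a list of read strings."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_histogram(counts):
    """Map each multiplicity to the number of distinct k-mers seen that often."""
    return Counter(counts.values())


def estimate_genome_size(histogram, min_depth=2):
    """Naive estimate: usable k-mers divided by the depth of the main peak.

    K-mers below min_depth are treated as likely sequencing errors
    (an illustrative assumption, not a GenomeScope parameter).
    """
    usable = {d: n for d, n in histogram.items() if d >= min_depth}
    peak_depth = max(usable, key=usable.get)
    total_kmers = sum(d * n for d, n in usable.items())
    return total_kmers / peak_depth
```

With real data the histogram would come from a dedicated k-mer counter rather than from Python strings; the sketch only shows how a depth peak translates into a size estimate.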

Mitochondrial genome assembly

PacBio HiFi reads were used to retrieve a whole mitochondrial genome (mtDNA) assembly by applying a pipeline recently developed by our group⁴⁴. Briefly, all Unionida mtDNA assemblies available on NCBI were retrieved (Fasta format; Supplementary_File1) and used as a reference mitogenome database. All the raw PacBio HiFi reads were mapped to the mitogenome database using Minimap2 v.2.17⁴⁵, specifying parameters (-ax asm20). The output sam file was converted to bam and sorted using Samtools v.1.9⁴⁶, with options “view” and “sort”, respectively. Samtools “view” was also used to retrieve only the mapped reads with parameter (-F 0x04), and the bam file was then converted to fastq format using the option “bam2fq”. The resulting PacBio HiFi mtDNA reads were corrected using Hifiasm v.0.13-r308^47,48 with parameters (--write-ec). The corrected reads were assembled using Unicycler v.0.4.8⁴⁹, a software package optimised for circular assemblies, with default parameters. Mitogenome annotation was produced using MitoZ v.3.4⁵⁰ with parameters (--genetic_code 5 --clade Mollusca), using the PE reads for coverage plotting.
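The read-retrieval step relies on the SAM FLAG field, in which bit 0x4 marks an unmapped read. As a hedged illustration of what excluding that flag does, the sketch below filters SAM text lines in plain Python. It is not a replacement for Samtools, and the function name and the choice to pass header lines through are assumptions made for this example.

```python
UNMAPPED = 0x4  # SAM FLAG bit set when the read is unmapped


def keep_mapped(sam_lines):
    """Yield SAM lines whose FLAG lacks the 0x4 bit.

    Header lines (starting with '@') are passed through unchanged.
    """
    for line in sam_lines:
        if line.startswith("@"):
            yield line
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if not flag & UNMAPPED:
            yield line
```

A read with FLAG 4 is dropped, while reads with FLAG 0 (forward strand) or 16 (reverse strand) are kept, since neither has the 0x4 bit set.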

Genome assembly

The overall pipeline used to obtain the genome assembly and annotation is provided in Fig. 2.

Firstly, PacBio HiFi reads were assembled using multiple software packages optimized for PacBio HiFi reads, i.e., Hifiasm 0.16.1-r375^47,48 with default parameters, Flye v.2.8.3⁵¹ with parameters (--pacbio-hifi), NextDenovo v.2.4.0 (https://github.com/Nextomics/NextDenovo) with parameters (read_type = hifi) and Peregrine-2021 v0.4.3⁵² with default parameters. Afterwards, the overall quality of each assembly was assessed using Benchmarking Universal Single-Copy Orthologs (BUSCO) v.5.2.2⁵³ with Eukaryota and Metazoa databases and Quality Assessment Tool for Genome Assemblies (QUAST) v.5.0.2⁵⁴ (Fig. 2). Hifiasm 0.16.1-r375 produced the best results of the tested assemblies and thus was selected for further analyses. Since the genome size was larger than predicted by the GenomeScope report, several new assemblies were produced with Hifiasm 0.16.1-r375, testing a range of parameters (-l 3; -s 0.50, 0.45, 0.35), following the authors’ recommendations (https://hifiasm.readthedocs.io/en/latest/faq.html#p-large). Given that reducing the similarity threshold for duplicate haplotigs (i.e., parameters -l and -s) had little impact on the overall statistics, the assembly with default parameters was chosen for further analysis. To separate poorly resolved pseudo-haplotypes, purge_dups v.1.2.5⁵⁵ was applied, first with default parameters and then by manually adjusting the transition between haploid and diploid cut-off (i.e., parameter -m of option calcuts) to 30, 32 and 25 in three independent runs. In all the runs, the lower and upper bounds for read depth were kept fixed at 5 and 87, respectively. All the cutoff values were determined by inspection of the k-mer plot produced by the K-mer Analysis Toolkit (KAT) tool⁵⁶. The influence of purge_dups v.1.2.5 was evaluated using BUSCO v.5.2.2 with Eukaryota and Metazoa databases and QUAST v.5.0.2.
Since purge_dups v.1.2.5 did not remove any duplicates (neither with the default nor adjusted cutoffs), the Hifiasm 0.16.1-r375 default assembly was selected as the final assembly. To evaluate the quality of the final assembly, several metrics and software were used. Besides BUSCO v.5.2.2 and QUAST v.5.0.2 metrics, completeness, heterozygosity, and collapsing of repetitive regions were evaluated using a k-mer distribution with KAT⁵⁶. Moreover, read-back mapping was performed for the PE reads using Burrows-Wheeler Aligner (BWA) v.0.7.17-r1198⁵⁷, for long reads with Minimap2 v.2.17 and for RNA-seq (SRR19261764⁴¹) with Hisat2 v.2.2.0⁵⁸. To inspect the genome for possible contamination, we used BlobTools v.1.1.1⁵⁹. Briefly, a blast search of the final genome assembly was conducted against the RefSeq⁶⁰ database, using the BLASTX function from DIAMOND v.2.0.11.149⁶¹, following authors’ instructions⁵⁹. The blast output, as well as the alignment of PE short reads against the genome performed with BWA v.0.7.17, were used as input to run BlobTools, with contamination screening at Phylum level.
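Contiguity is compared throughout this work using the contig N50, one of the statistics reported by QUAST. As a reminder of its definition, the following minimal sketch (an illustration, not QUAST's code) computes N50 as the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig that brings the cumulative sum
        # to at least half of the total assembly length.
        if running * 2 >= total:
            return length
    raise ValueError("no contig lengths given")


print(n50([2, 3, 4, 5, 6]))  # 5
```

In the toy example, the total length is 20, and the two longest contigs (6 and 5) together reach 11, which is at least half of the total, so the N50 is 5.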

Masking of repetitive elements, gene models predictions and annotation

Before masking repetitive elements, a de novo library of repeats was created for the final genome assembly, with RepeatModeler v.2.0.1⁶². Subsequently, the genome was soft masked combining the de novo library with the ‘Bivalvia’ libraries from Dfam_consensus-20170127 and RepBase-20181026, using RepeatMasker v.4.0.7⁶³.

The masked assembly was used for gene prediction, performed using BRAKER2 pipeline v2.1.6⁶⁴. First, RNA-seq data from U. delphinus was retrieved from GenBank (SRR19261764⁴¹) (the same individual used for the genome assembly), quality trimmed with Trimmomatic v.0.38 (parameters described above) and aligned to the masked genome, using Hisat2 v.2.2.0 with the default parameters. Moreover, the complete proteomes of 14 mollusc species and three reference Metazoa genomes (Homo sapiens, Ciona intestinalis, Strongylocentrotus purpuratus), downloaded from public databases, were used as supplementary evidence for gene prediction (Table 3). The BRAKER2 pipeline was applied, specifying parameters “--etpmode; --softmasking;”. The resulting predictions file (braker.gtf) was filtered to retain only predictions with RNA-Seq and/or protein evidence (using the auxiliary script selectSupportedSubsets.py) and then converted to .gff3 using the Augustus auxiliary script gtf2gff.pl. Gene predictions were processed using a series of auxiliary scripts from Another Gtf/Gff Analysis Toolkit (AGAT) v.0.8.0⁶⁵. Briefly, gene predictions were cleaned with agat_convert_sp_gxf2gxf.pl, renamed with agat_sp_manage_functional_annotation.pl, overlapping predictions were corrected with agat_sp_fix_overlaping_genes.pl, and coding sequence regions (CDS) with <100 amino acids and incomplete gene predictions (i.e., without start and/or stop codons) were corrected and/or removed with agat_sp_add_start_and_stop.pl and agat_sp_filter_incomplete_gene_coding_models.pl, respectively. Finally, the overall statistics of the processed predictions were retrieved using agat_sp_statistics.pl and the predicted genes, protein, CDS and exon sequences were retrieved using agat_sp_extract_sequences.pl. The protein sequences were next used for functional annotation, using InterProScan v.5.44.80⁶⁶, as well as BLASTP searches against the RefSeq database⁶⁰.
BLASTP homology searches were obtained using DIAMOND v.2.0.11.149⁶¹, specifying the parameters “-k 1, -b 20, -e 1e-5, --sensitive, --outfmt 6”. To validate the set of proteins obtained, the BUSCO scores were estimated based on the protein set, using the Eukaryota and Metazoa databases, as described previously.
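The filtering criteria applied to the gene models (a minimum of 100 amino acids and the presence of both start and stop codons) can be expressed compactly. The sketch below is an illustrative check on a single CDS nucleotide string and is not how AGAT implements these filters; the function name, the standard ATG start codon and the set of stop codons are assumptions made for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumed standard nuclear stop codons


def is_complete_cds(cds, min_aa=100):
    """Return True if a CDS has a start codon, a stop codon, a length that is
    a multiple of three, and encodes at least min_aa amino acids."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        return False
    has_start = cds.startswith("ATG")
    has_stop = cds[-3:] in STOP_CODONS
    n_aa = len(cds) // 3 - 1  # codons excluding the terminal stop
    return has_start and has_stop and n_aa >= min_aa
```

For instance, a CDS made of ATG, 99 further sense codons and a terminal TAA encodes exactly 100 amino acids and passes, whereas the same sequence without the stop codon is rejected.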

Table 3 List of proteomes used for BRAKER2 gene prediction pipeline.

Data Records

The raw read sequencing outputs were deposited at the NCBI Sequence Read Archive with the accession numbers: SRR23060683, SRR23060685, SRR23060678 and SRR23060675 for PacBio CCS HiFi; SRR23060686 for Illumina PE⁶⁷. The genome assembly is available under accession number JAQISU000000000⁶⁸. BioSample accession number is SAMN32554582 and BioProject PRJNA917855⁶⁹. The remaining information was uploaded to figshare. In detail, the files uploaded to figshare⁷⁰ include the final unmasked and masked genome assemblies (Ude_BIV7592_haploid.fa and Ude_BIV7592_haploid_SM.fa), the two pseudohaplotype genome assemblies generated by the Hifiasm assembler (Ude_BIV7592_pseudohaplotype_1.fas.gz and Ude_BIV7592_pseudohaplotype_2.fas.gz), the annotation file (Ude_BIV7592_annotation_v4.gff3), predicted genes (Ude_BIV7592_genes_v4.fasta), predicted messenger RNA (Ude_BIV7592_mrna_v4.fasta), predicted open reading frames (Ude_BIV7592_cds_v4.fasta), predicted proteins (Ude_BIV7592_proteins_v4.fasta), as well as full table reports for Braker gene predictions and InterProScan functional annotations (Ude_BIV7592_annotation_v4_InterPro_report.txt) and RepeatMasker predictions (Ude_BIV7592_annotation_v4_RepeatMasker.tbl).

Technical Validation

Raw datasets and pre-assembly processing quality control

General statistics of the raw sequencing outputs are provided in Table 2. GenomeScope2 estimated a genome size of ~2.31 Gb and a heterozygosity level of ~0.64% (Fig. 3a), both within the values observed for other Unionidae genomes available^24,25,26,27.

Genome assembly metrics

Hifiasm produced the overall most contiguous and complete (as assessed by BUSCO scores) genome assembly of all the tested assemblers (Table 4). Both Flye and Peregrine-2021 were very inefficient in collapsing haplotypes, resulting in unexpectedly large assemblies with high levels of duplicated BUSCO scores (Table 4). Conversely, Hifiasm and NextDenovo efficiently resolved duplicates while ensuring high complete BUSCO scores (Table 4). Additionally, Hifiasm produced a much more contiguous genome assembly, with an almost 5-fold increased N50 length (Table 4). Although the BUSCO scores of the Hifiasm assembly had residual percentages of duplicated sequences, considering the increased genome size compared with the GenomeScope estimation, as well as the genome sizes of other Unionidae assemblies (Table 5), we tested several similarity thresholds for duplicates in Hifiasm. The impact of these settings on the overall statistics was limited, i.e., -s 0.50-0.35, or absent, i.e., -l 3 (Table 4). Although two of the assemblies, i.e., -s 0.50 and -s 0.45, show a slight increase in the N50 length (Table 4), given the overall small impact on the final genome size, we opted to use the Hifiasm default assembly as the final assembly. Moreover, purge_dups did not remove any additional sequences from the Hifiasm default assembly, suggesting that reducing the similarity threshold for duplicate haplotigs (option -s) might be over-purging the assembly.

Table 4 Unio delphinus genome assemblies tests’ general statistics.

Table 5 General statistics of the Unio delphinus final genome assembly (p_ctg); Unio delphinus alternative haplotypes genome assemblies (hap1 and hap2); other published freshwater mussels genome assemblies.

The final genome assembly has a total length of ~2.5 Gbp, which is somewhat larger than the GenomeScope size estimation, i.e., ~2.31 Gbp (Table 5, Fig. 3a). Although unexpected, none of the primary assemblies produced here (from different software and Hifiasm parameters) had a total length close to that estimated by GenomeScope (Tables 4–5). The alternative haplotype assemblies produced by Hifiasm have a total length similar to the GenomeScope estimation; however, the complete BUSCO scores were reduced for these assemblies, with no significant impact on duplicates (Table 5). On the other hand, purge_dups did not report any duplicated sequences in the assembly, which further supports that Hifiasm efficiently resolved the haplotype variants. Moreover, the few genome assemblies available for freshwater mussels show considerably distinct genome sizes (up to 696 Mbp difference in size), even within the family Unionidae (Table 5). Consequently, the discrepancy between the GenomeScope estimate and the final genome size is likely a consequence of inaccurate genome size estimation by short-read-based k-mer frequency spectrum analyses.

The assembly presented here is also the most contiguous freshwater mussel genome assembly available to date, with a contig N50 length of ~10 Mbp, which represents a ~5-fold increase in N50 length relative to the only other PacBio-based genome assembly, i.e., from P. streckersoni²⁵ (Table 5). The levels of completeness reported by BUSCO scores are also within those observed for other freshwater mussel genome assemblies, with nearly no fragmented or missing hits for both the eukaryotic and metazoan curated lists of near-universal single-copy orthologs (Table 5). The KAT k-mer analyses revealed a low level of k-mer duplication (blue, green, purple, and orange in Fig. 3b), with a high level of haplotype uniqueness (red in Fig. 3b) and a similar k-mer distribution to GenomeScope2 (performed with Illumina PE reads; Fig. 3a,b). Short-read, RNA-seq and long-read back-mapping all resulted in an almost complete mapping (Table 5). Finally, BlobTools Read Coverage Plots (ReadCovPlot) show a dominance of hits with Mollusca (41.68%), followed by two groups with a similar hit percentage, i.e., undefined (27.41%) and Arthropoda (22.81%) (Fig. S1). The high values of undefined hits are expected given the overwhelmingly low number of closely related species with annotated genomes available on NCBI. Only 16 bivalve genomes have annotations available on NCBI, none of which belong to freshwater mussels or Palaeoheterodonta. In fact, annotations are only available for two higher-level bivalve clades, the vast majority for Pteriomorphia (n = 12) and the remaining for Imparidentia (n = 4). Moreover, this low and biased representation of annotated references most likely also explains the apparent contamination with Arthropoda (Fig. S1), as unspecific hits with unrelated taxa have been observed in other recent freshwater mussel genome assemblies²⁴.
Nevertheless, to scrutinize possible contamination more closely, the percentage of phyla representation was also quantified from the U. delphinus predicted proteins, using the RefSeq BLASTP search outputs (Fig. S2, Supplementary File 2). The results show the dominance of hits with Mollusca, with other taxa having residual representation and low percentages of identity, and thus unlikely to represent contamination (Fig. S2, Supplementary File 2).
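Quantifying phylum representation from a set of best hits amounts to a simple tally. The sketch below assumes that each predicted protein has already been paired with the phylum of its best RefSeq hit; how that taxonomic assignment is obtained is outside the scope of the example, and the function name and input layout are hypothetical.

```python
from collections import Counter


def phylum_percentages(hits):
    """Return the percentage of hits per phylum, most frequent first.

    hits: iterable of (query_id, phylum) pairs, one best hit per query.
    """
    counts = Counter(phylum for _, phylum in hits)
    total = sum(counts.values())
    return {phylum: 100.0 * n / total for phylum, n in counts.most_common()}
```

Applied to four hypothetical proteins, three assigned to Mollusca and one to Arthropoda, the function returns 75.0 and 25.0, respectively.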

Overall, these general statistics validate the high completeness, low redundancy, and quality of the final genome assembly.

Repeat masking, gene models prediction, and annotation

RepeatModeler/RepeatMasker masked 52.83% of the genome, a value within those observed for other Unionida genome assemblies and close to the GenomeScope estimation (Table 6, Fig. 3a). Unlike the results observed in previous freshwater mussel genome assemblies^24,25, most repeats are classified as DNA elements (21.92%, ~549 Mbp), rather than unclassified (16.32%, ~408 Mbp), with the remaining categories having similar percentages (Table 6). These results might be a consequence of the efficiency of PacBio HiFi reads in resolving repetitive regions, thus facilitating their classification. BRAKER2 gene prediction identified 44,382 CDS, which is close to the predictions of the other freshwater mussel assemblies (Table 5). BUSCO scores for protein predictions showed almost no missing hits for either of the near-universal single-copy ortholog databases used (Table 5). The number of functionally annotated genes was 32,089, which is similar to the number of annotated genes for the Margaritifera margaritifera genome assembly (Table 5)²⁴. Overall, the numbers of both predicted and annotated genes are within the expected range for bivalves (reviewed in⁷¹), as well as within the records of other freshwater mussel assemblies (Table 5)^24,25,26,27.
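The base-pair figures given for each repeat class follow directly from the masked percentages and the assembly length. The short sketch below reproduces this arithmetic under the assumption of a rounded assembly length of 2,500 Mbp (the text gives ~2.5 Gbp), so the results differ slightly from the rounded figures quoted in the text; the function name is illustrative.

```python
def masked_bases_mbp(assembly_mbp, percent):
    """Convert a masked percentage into megabase pairs for a given assembly length."""
    return assembly_mbp * percent / 100.0


ASSEMBLY_MBP = 2500  # assumed rounded length; the exact value is in Table 5

dna_elements = masked_bases_mbp(ASSEMBLY_MBP, 21.92)   # about 548 Mbp
unclassified = masked_bases_mbp(ASSEMBLY_MBP, 16.32)   # about 408 Mbp
all_repeats = masked_bases_mbp(ASSEMBLY_MBP, 52.83)    # about 1,321 Mbp
```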

Table 6 RepeatMasker report of the content of repetitive elements in the Unio delphinus genome assembly.

Code availability

All software, with the respective versions and parameters used for producing the resources presented here (i.e., genome assembly, pre- and post-assembly processing stages, and genome annotation), is listed in the Methods section. Software programs with no parameters associated were used with the default settings.

References

Allendorf, F. W., Hohenlohe, P. A. & Luikart, G. Genomics and the future of conservation genetics. Nature Reviews Genetics 11, 697–709 (2010).
Formenti, G. et al. The era of reference genomes in conservation genomics. Trends Ecol Evol 37, 197–202 (2022).
Hohenlohe, P. A., Funk, W. C. & Rajora, O. P. Population genomics for wildlife conservation and management. Mol Ecol 30, 62–82 (2021).
Meek, M. H. & Larson, W. A. The future is now: Amplicon sequencing and sequence capture usher in the conservation genomics era. Mol Ecol Resour 19, 795–803 (2019).
Paez, S. et al. Reference genomes for conservation. Science 377, 364–366 (2022).
Stephan, T. et al. Darwinian genomics and diversity in the tree of life. Proc Natl Acad Sci USA 119 (2022).
van Oppen, M. J. H. & Coleman, M. A. Advancing the protection of marine life through genomics. PLoS Biol 20, e3001801 (2022).
Bertorelle, G. et al. Genetic load: genomic estimates and applications in non-model animals. Nature Reviews Genetics 23, 492–503 (2022).
Vaughn, C. C., Nichols, S. J. & Spooner, D. E. Community and foodweb ecology of freshwater mussels. 27, 409–423, https://doi.org/10.1899/07-058.1 (2015).
Vaughn, C. C. Ecosystem services provided by freshwater mussels. Hydrobiologia 810, 15–27 (2017).
Lopes-Lima, M. et al. Biology and conservation of freshwater bivalves: Past, present and future perspectives. Hydrobiologia 735, 1–13, https://doi.org/10.1007/s10750-014-1902-9 (2014).
Haag, W. R. North American Freshwater Mussels: Natural History, Ecology, and Conservation. (Cambridge University Press, 2012).
Lopes-Lima, M. et al. Conservation status of freshwater mussels in Europe: state of the art and future challenges. Biological Reviews 92, 572–607 (2017).
Cuttelod, A., Seddon, M. & Neubert, E. European red list of non-marine molluscs. (Publications Office of the European Union Luxembourg, 2011).
Lopes-Lima, M. et al. Conservation of freshwater bivalves at the global scale: diversity, threats and research needs. Hydrobiologia 810, 1–14 (2018).
Lopes-Lima, M. et al. Setting the stage for new ecological indicator species: A holistic case study on the Iberian dolphin freshwater mussel Unio delphinus Spengler, 1793. Ecol Indic 111, 105987 (2020).
Araujo, R. et al. Las náyades de la península Ibérica As náiades da Península Ibérica The naiads of the Iberian Peninsula. 27, 7–72 (2009).
Araujo, R., Feo, C., Pou, Q. & Campos, M. Conservation of two endangered European freshwater mussels (Bivalvia: Unionidae): a three-year, semi-natural breeding experiment. Nautilus (Philadelphia) 129, 126–135 (2015).
Robson, B. J., Chester, E. T., Mitchell, B. D. & Matthews, T. G. Disturbance and the role of refuges in mediterranean climate streams. Hydrobiologia 719, 77–91 (2013).
Cid, N. et al. High Variability Is a Defining Component of Mediterranean-Climate Rivers and Their Biota. Water 9, 52 (2017).
Froufe, E. et al. Who lives where? Molecular and morphometric analyses clarify which Unio species (Unionida, Mollusca) inhabit the southwestern Palearctic. Org Divers Evol 16, 597–611 (2016).
Fonseca, M. M., Lopes-Lima, M., Eackles, M. S., King, T. L. & Froufe, E. The female and male mitochondrial genomes of Unio delphinus and the phylogeny of freshwater mussels (Bivalvia: Unionida). Mitochondrial DNA B Resour 1, 954–957 (2016).
Araujo, R., Buckley, D., Nagel, K. O., García-Jiménez, R. & Machordom, A. Species boundaries, geographic distribution and evolutionary history of the Western palaearctic freshwater mussels Unio (Bivalvia: Unionidae). Zool J Linn Soc 182, 275–299 (2018).
Gomes-dos-Santos, A. et al. The Crown Pearl: a draft genome assembly of the European freshwater pearl mussel Margaritifera margaritifera (Linnaeus, 1758). DNA Research 28 (2021).
Smith, C. H. A High-Quality Reference Genome for a Parasitic Bivalve with Doubly Uniparental Inheritance (Bivalvia: Unionida). Genome Biol Evol 13 (2021).
Rogers, R. L. et al. Gene family amplification facilitates adaptation in freshwater unionid bivalve Megalonaias nervosa. Mol Ecol 30, 1155–1173 (2021).
Renaut, S. et al. Genome Survey of the Freshwater Mussel Venustaconcha ellipsiformis (Bivalvia: Unionida) Using a Hybrid De Novo Assembly Approach. Genome Biol Evol 10, 1637–1646 (2018).
Roznere, I., Sinn, B. T. & Watters, G. T. The Amblema plicata Transcriptome as a Resource to Assess Environmental Impacts on Freshwater Mussels. Freshwater Mollusk Biology and Conservation 21, 57–64 (2018).
Wang, R. et al. Rapid development of molecular resources for a freshwater mussel, Villosa lienosa (Bivalvia:Unionidae), using an RNA-seq-based approach. 31, 695–708, https://doi.org/10.1899/11-149.1 (2015).
Luo, Y. et al. Transcriptomic Profiling of Differential Responses to Drought in Two Freshwater Mussel Species, the Giant Floater Pyganodon grandis and the Pondhorn Uniomerus tetralasmus. PLoS One 9, e89481 (2014).
Patnaik, B. B. et al. Sequencing, De Novo Assembly, and Annotation of the Transcriptome of the Endangered Freshwater Pearl Bivalve, Cristaria plicata, Provides Novel Insights into Functional Genes and Marker Discovery. PLoS One 11, e0148622 (2016).
Wang, X., Liu, Z. & Wu, W. Transcriptome analysis of the freshwater pearl mussel (Cristaria plicata) mantle unravels genes involved in the formation of shell and pearl. Molecular Genetics and Genomics 292, 343–352 (2017).
Yang, Q. et al. Histopathology, antioxidant responses, transcriptome and gene expression analysis in triangle sail mussel Hyriopsis cumingii after bacterial infection. Dev Comp Immunol 124, 104175 (2021).
Bertucci, A. et al. Transcriptomic responses of the endangered freshwater mussel Margaritifera margaritifera to trace metal contamination in the Dronne River, France. Environmental Science and Pollution Research 24, 27145–27159 (2017).
Robertson, L. S., Galbraith, H. S., Iwanowicz, D., Blakeslee, C. J. & Cornman, R. S. RNA sequencing analysis of transcriptional change in the freshwater mussel Elliptio complanata after environmentally relevant sodium chloride exposure. Environ Toxicol Chem 36, 2352–2366 (2017).
Capt, C. et al. Deciphering the Link between Doubly Uniparental Inheritance of mtDNA and Sex Determination in Bivalves: Clues from Comparative Transcriptomics. Genome Biol Evol 10, 577–590 (2018).
Huang, D., Shen, J., Li, J. & Bai, Z. Integrated transcriptome analysis of immunological responses in the pearl sac of the triangle sail mussel (Hyriopsis cumingii) after mantle implantation. Fish Shellfish Immunol 90, 385–394 (2019).
Capt, C., Renaut, S., Stewart, D. T., Johnson, N. A. & Breton, S. Putative Mitochondrial Sex Determination in the Bivalvia: Insights From a Hybrid Transcriptome Assembly in Freshwater Mussels. Front Genet 10, 840 (2019).
Chen, X., Bai, Z. & Li, J. The Mantle Exosome and MicroRNAs of Hyriopsis cumingii Involved in Nacre Color Formation. Marine Biotechnology 21, 634–642 (2019).
Cornman, R. S., Robertson, L. S., Galbraith, H. & Blakeslee, C. Transcriptomic Analysis of the Mussel Elliptio complanata Identifies Candidate Stress-Response Genes and an Abundance of Novel or Noncoding Transcripts. PLoS One 9, e112420 (2014).
Gomes-dos-Santos, A. et al. The gill transcriptome of threatened European freshwater mussels. Sci Data 9, 494 (2022).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
Ranallo-Benavidez, T. R., Jaron, K. S. & Schatz, M. C. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat Commun 11, 1–10 (2020).
Machado, A. M. et al. A genome assembly of the Atlantic chub mackerel (Scomber colias): a valuable teleost fishing resource. GigaByte 2022, 1–21 (2022).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Danecek, P. et al. Twelve years of SAMtools and BCFtools. Gigascience 10, 1–4 (2021).
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods 18, 170–175 (2021).
Cheng, H. et al. Haplotype-resolved assembly of diploid genomes without parental data. Nature Biotechnology 40, 1332–1335 (2022).
Wick, R. R., Judd, L. M., Gorrie, C. L. & Holt, K. E. Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads. PLoS Comput Biol 13, e1005595 (2017).
Meng, G., Li, Y., Yang, C. & Liu, S. MitoZ: a toolkit for animal mitochondrial genome assembly, annotation and visualization. Nucleic Acids Res 47, e63–e63 (2019).
Kolmogorov, M., Yuan, J., Lin, Y. & Pevzner, P. A. Assembly of long, error-prone reads using repeat graphs. Nature Biotechnology 37, 540–546 (2019).
Chin, C.-S. & Khalak, A. Human Genome Assembly in 100 Minutes. bioRxiv 705616, https://doi.org/10.1101/705616 (2019).
Manni, M., Berkeley, M. R., Seppey, M., Simão, F. A. & Zdobnov, E. M. BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes. Mol Biol Evol 38, 4647–4654 (2021).
Gurevich, A., Saveliev, V., Vyahhi, N. & Tesler, G. QUAST: quality assessment tool for genome assemblies. Bioinformatics 29, 1072–1075 (2013).
Guan, D. et al. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics 36, 2896–2898 (2020).
Mapleson, D., Accinelli, G. G., Kettleborough, G., Wright, J. & Clavijo, B. J. KAT: A K-mer analysis toolkit to quality control NGS datasets and genome assemblies. Bioinformatics 33, 574–576 (2017).
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. (2013).
Kim, D., Langmead, B. & Salzberg, S. L. HISAT: A fast spliced aligner with low memory requirements. Nat Methods 12, 357–360 (2015).
Laetsch, D. R. & Blaxter, M. L. BlobTools: interrogation of genome assemblies. F1000Res 6, 1287 (2017).
Pruitt, K. D., Tatusova, T. & Maglott, D. R. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 35, D61–D65 (2007).
Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat Methods 12, 59–60 (2015).
Smit, A. & Hubley, R. RepeatModeler. www.repeatmasker.org (2015).
Smit, A. & Hubley, R. RepeatMasker. www.repeatmasker.org (2015).
Brůna, T., Hoff, K. J., Lomsadze, A., Stanke, M. & Borodovsky, M. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genom Bioinform 3, 1–11 (2021).
Dainat, J., Hereñú, D. & Pucholt, P. AGAT: Another Gff Analysis Toolkit to handle annotations in any GTF/GFF format. Zenodo https://doi.org/10.5281/zenodo.4205393 (2020).
Quevillon, E. et al. InterProScan: Protein domains identifier. Nucleic Acids Res 33, W116–W120 (2005).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP417152 (2023).
Gomes-dos-Santos, A. et al. Unio delphinus voucher BIV7592, whole genome shotgun sequencing project. Genbank https://identifiers.org/nucleotide:JAQISU000000000 (2023).
NCBI BioProject https://identifiers.org/ncbi/bioproject:PRJNA917855 (2023).
Gomes-dos-Santos, A. et al. PacBio Hi-Fi genome assembly of the Iberian dolphin freshwater mussel Unio delphinus Spengler, 1793. figshare https://doi.org/10.6084/m9.figshare.21878946 (2023).
Gomes-dos-Santos, A., Lopes-Lima, M., C. Castro, L. F. & Froufe, E. Molluscan genomics: the road so far and the way forward. Hydrobiologia 847, 1705–1726 (2020).
Lehner, B. & Grill, G. Global river hydrography and network routing: Baseline data and new approaches to study the world’s large river systems. Hydrol Process 27, 2171–2186 (2013).

Acknowledgements

AGS was funded by the Portuguese Foundation for Science and Technology (FCT) under the grants SFRH/BD/137935/2018 and COVID/DB/152933/2022. FCT also supported MLL (2020.03608.CEECIND) and EF (CEECINST/00027/2021). This research was developed under the project EdgeOmics - Freshwater Bivalves at the Edge: Adaptation genomics under climate-change scenarios (PTDC/CTA-AMB/3065/2020) funded by FCT through national funds. Additional strategic funding was provided by FCT UIDB/04423/2020 and UIDP/04423/2020. We thank the two anonymous reviewers for their helpful remarks and suggestions, which significantly improved the manuscript.

Author information

Authors and Affiliations

CIIMAR/CIMAR — Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Avenida General Norton de Matos, S/N, P, 4450-208, Matosinhos, Portugal
André Gomes-dos-Santos, M. André Machado, L. Filipe C. Castro & Elsa Froufe
Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre 1021/1055, 4169-007, Porto, Portugal
André Gomes-dos-Santos, M. André Machado & L. Filipe C. Castro
BIOPOLIS Program in Genomics, Biodiversity and Ecosystems, CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Campus de Vairão, Universidade do Porto, 4485-661, Vairão, Portugal
Manuel Lopes-Lima
IUCN SSC Mollusc Specialist Group, c/o IUCN, David Attenborough Building, Pembroke St, Cambridge, England
Manuel Lopes-Lima
Centro de Investigação de Montanha (CIMO), Instituto Politécnico de Bragança, Bragança, Portugal
Amílcar Teixeira

Authors

André Gomes-dos-Santos
Manuel Lopes-Lima
M. André Machado
Amílcar Teixeira
L. Filipe C. Castro
Elsa Froufe

Contributions

E.F., M.L.L. and L.F.C.C. designed and conceived this work. M.L.L. and A.T. collected the samples. A.G.S. and A.M.M. carried out all the analyses. A.G.S. and E.F. wrote the first version of the manuscript. All authors read, revised, and approved the final manuscript.

Corresponding authors

Correspondence to André Gomes-dos-Santos, Manuel Lopes-Lima or Elsa Froufe.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Gomes-dos-Santos, A., Lopes-Lima, M., Machado, M.A. et al. PacBio Hi-Fi genome assembly of the Iberian dolphin freshwater mussel Unio delphinus Spengler, 1793. Sci Data 10, 340 (2023). https://doi.org/10.1038/s41597-023-02251-7

Received: 16 January 2023
Accepted: 18 May 2023
Published: 01 June 2023
DOI: https://doi.org/10.1038/s41597-023-02251-7
