The survey and reference assisted assembly of the Octopus vulgaris genome

Zarrella, Ilaria; Herten, Koen; Maes, Gregory E.; Tai, Shuaishuai; Yang, Ming; Seuntjens, Eve; Ritschard, Elena A.; Zach, Michael; Styfhals, Ruth; Sanges, Remo; Simakov, Oleg; Ponte, Giovanna; Fiorito, Graziano

doi:10.1038/s41597-019-0017-6

Data Descriptor
Open access
Published: 01 April 2019

Ilaria Zarrella¹^na1,
Koen Herten^2,3^na1,
Gregory E. Maes^3,4,5^na2,
Shuaishuai Tai ORCID: orcid.org/0000-0001-5241-1782⁶^na1,
Ming Yang⁶^na1,
Eve Seuntjens⁷,
Elena A. Ritschard⁸,
Michael Zach⁸,
Ruth Styfhals^7,9,
Remo Sanges⁹,
Oleg Simakov⁸^na2,
Giovanna Ponte^1,9^na2 &
Graziano Fiorito⁹^na2

Scientific Data volume 6, Article number: 13 (2019)

Abstract

The common octopus, Octopus vulgaris, is an active marine predator known for the richness and plasticity of its behavioral repertoire, and remarkable learning and memory capabilities. Octopus and other coleoid cephalopods, cuttlefish and squid, possess the largest nervous system among invertebrates, both for cell counts and body to brain size. O. vulgaris has been at the center of a long tradition of research into diverse aspects of its biology. To leverage research in this iconic species, we generated 270 Gb of genomic sequencing data, complementing those available for the only other sequenced congeneric octopus, Octopus bimaculoides. We show that both genomes are similar in size, but display different levels of heterozygosity and repeats. Our data give a first quantitative glimpse into the rate of coding and non-coding regions and support the view that hundreds of novel genes may have arisen independently despite the close phylogenetic distance. We furthermore describe a reference-guided assembly and an open genomic resource (CephRes-gdatabase), opening new avenues in the study of genomic novelties in cephalopods and their biology.

Design Type(s)	species comparison design • sequence analysis objective • sequence assembly objective
Measurement Type(s)	whole genome sequencing assay
Technology Type(s)	DNA sequencing
Factor Type(s)
Sample Characteristic(s)	Octopus vulgaris • testis • ocean biome

Machine-accessible metadata file describing the reported data (ISA-Tab format)

Background & Summary

Octopus vulgaris is a benthic, neritic species belonging to the class Cephalopoda. It occurs from the coastline to the outer edge of the continental shelf, inhabiting various marine habitats at depths spanning from 0 to 200 m. O. vulgaris is one of the most widely distributed species belonging to the genus, and is an important commercially harvested resource for human consumption. It is found worldwide in temperate and tropical waters^1,2,3. Throughout its distribution range, the animal undertakes limited seasonal migrations: mostly found in deep waters in winter and shallow waters in summer.

O. vulgaris is perhaps the most famous and best studied of all octopus species, largely due to the initiative of Professor John Z. Young^4,5. From the late 1940s, Young carried out at the Stazione Zoologica Anton Dohrn of Napoli (Italy) a systematic analysis of the neural structures underlying behavioural plasticity in this animal^6,7. Based on this contribution, the anatomy of the O. vulgaris nervous system⁸ and its physiology and life history^4,9,10,11 have been well characterized. It is the phenomenological proximity of behavioral traits and phylogenetic distance in respect to higher vertebrates that guaranteed the short, but wide success of cephalopods^5,10. O. vulgaris in particular became a “model of the brain”^12,13, and more recently a case for studying the evolution of cognition in invertebrates^{7,14,15,16,17,18,19}. Researchers still use O. vulgaris as an organism to study behavioural and neural plasticity including learning and memory recall^5,12,20, regeneration^21,22,23,24 and sophisticated cognition^{7,14,15,16,17,25}.

Currently available genomic resources for molluscs are scarce, considering the species abundance and the commercial value of the phylum Mollusca. Publicly available molluscan genomes include a dozen representatives from bivalves and gastropods^{26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42} and, to date, only three cephalopods: the California two-spot octopus Octopus bimaculoides⁴³ and, more recently, Callistoctopus minor⁴⁴ and Euprymna scolopes⁴⁵.

Although the first step towards cephalopod genetics was made over 30 years ago⁴⁶, cephalopod research is only slowly entering the genomics era^10,47. Obtaining high quality cephalopod genomes has been impeded due to their large size (e.g., O. bimaculoides: 2.7 Gb; Gregory, 2018 - Animal Genome Size Database, http://www.genomesize.com), heterozygosity and high abundance of repeat regions^43,47,48. However, several collaborative genome projects are currently underway for a variety of cephalopod species such as the nautilus, Sepia officinalis, Idiosepius paradoxus and Doryteuthis pealeii.

Cephalopods arose more than 500 Mya and diverged into over 800 current living species with highly diversified life styles and body plans⁴⁸. Translocations, duplications, exon shuffling and gene conversions occurred within the cephalopod genome during evolution, which might explain the development of different morphological novelties, such as the prehensile arms, the unique jet propulsion system, the ink sac and sophisticated sensory and neural systems⁴⁹. The analysis of O. bimaculoides genome revealed an extensive expansion of particular gene families, including protocadherins and the C2H2 superfamily of zinc-finger transcription factors⁴³, as well as novel octopus-specific genes expressed in specialized structures such as suckers, skin and brain (for review see also Shigeno et al.¹⁸). These genome-level novelties are accompanied by other sophisticated innovations such as extensive RNA editing, particularly in the nervous system cells^50,51,52. Furthermore, partial genome sequencing of several cephalopods showed that repeat elements, in particular transposable elements, are abundant^53,54. Indeed, the genome of O. bimaculoides revealed that over 45% of the genome is comprised of repetitive elements⁴³.

The study of cephalopod biological innovations^10,18,43,55 is driven by the unique scientific value of these animals for evolutionary genomics, neuroscience and cognition^{7,10,18,25,43,55,56,57,58} which continues the heritage of the discovery of the action potential in the squid giant axon, a seminal contribution to neuroscience⁵⁹. Furthermore, the phylogenetic relationships within the cephalopods have not yet been fully elucidated and biological research would benefit from more cephalopod genomes^60,61.

In line with those previous and current efforts, and to promote data sharing among cephalopod researchers^10,47, we present the sequence and draft assembly of the common octopus, Octopus vulgaris, genome. It is noteworthy to report that the two species (i.e., O. vulgaris and O. bimaculoides), although both belonging to the same genus, go through a substantially different life cycle since the paralarval stage is absent in O. bimaculoides⁶². Therefore, the two species represent different biological and physiological adaptations among closely related species. The genomic sequencing of both octopus species and our online platform to browse these data will allow for future comparative genomics studies, revealing key genomic innovations and facilitating the discovery of the molecular basis of intricate processes such as learning, regeneration and the evolution of complex brains.

Methods

Genomic DNA preparation

An adult male belonging to the species O. vulgaris Cuvier, 1797 (450 g body weight) was caught by fishermen from the Bay of Naples in 2011^1,2 and immediately humanely killed^63,64. Given the high rate of heterozygosity in marine organisms^65,66, tissue from a single individual was used to extract the genomic DNA (to avoid contamination, spermatophores were used). Spermatophores in octopus are stored within the Needham’s sac, a structure that was dissected following Chapko and coworkers⁶⁷. Tissue (124 mg) was used to extract the genomic DNA following the phenol-chloroform extraction protocol recommended by the Beijing Genomics Institute (BGI)-Shenzhen. Briefly, tissue lysis occurred overnight at 56 °C after adding 3.0 ml of lysis buffer containing proteinase K (300 μg; Sigma-Aldrich, Saint Louis, Missouri, United States) and RNase A (100 μg; Sigma-Aldrich, Saint Louis, Missouri, United States). DNA was then extracted with phenol (2X), phenol:chloroform and chloroform, and was subsequently precipitated. Genomic DNA was dissolved in TE buffer to reach a final concentration of 1 μg/μl.

Genome sequencing and quality control

A total of four genomic DNA libraries (with different insert sizes: 170, 250, 500 and 800 bp) were constructed following the Illumina library preparation protocols. Briefly, to construct the paired-end libraries, DNA was fragmented by Adaptive Focused Acoustics technology (Covaris) and tested via gel-electrophotometry. The fragmented DNA was combined with End Repair Mix (20 °C for 30 min). After purification, DNA ends were blunted and an A base was added to the 3′ ends. DNA adaptors with a single T-base 3′-end overhang were ligated to the above products. Ligation products were run on 2% agarose gels to recover the target fragments, which were purified from the gels (Qiagen Gel Extraction kit, 28704). Several rounds of PCR amplification with PCR Primer Cocktail and PCR Master Mix were performed to enrich the adapter-ligated DNA fragments. The PCR products were then selected by running another 2% agarose gel to recover the target fragments, which were gel-purified (QIAquick Gel Extraction kit, QIAGEN). The final library was quantified by assessing the average molecule length (Agilent 2100 Bioanalyzer), and by Real-Time qRT-PCR. A total of 277 Gb of raw data were generated by Illumina HiSeq 2000 at BGI.

All libraries were sequenced in paired-end mode with read lengths of 100 bp or 150 bp. Reads were filtered and trimmed using SOAPnuke software (https://github.com/BGI-flexlab/SOAPnuke)⁶⁸, which yielded 250 Gb of data. Low-quality reads, reads with adaptor sequences and duplicated reads were removed; where the quality of bases at the head or tail of the reads was low, reads were trimmed directly from 100 bp to 95 bp (PE100) or from 150 bp to 145 bp (PE150). The remaining high-quality data were used in further analysis. SGA PreQC v0.10.14⁶⁹ modules were run per library and on the combined libraries to estimate various genome parameters (Table 1 and Table 2).
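The fixed-length trimming step can be illustrated with a minimal Python sketch. It is not SOAPnuke's implementation: the function name `trim_read` and the default of removing five bases from the 3′ tail are assumptions made for illustration, since the text does not state how the five trimmed bases were split between head and tail.

```python
def trim_read(seq, qual, head=0, tail=5):
    """Drop `head` bases from the 5' end and `tail` bases from the 3' end.

    `seq` and `qual` are the sequence and quality strings of one FASTQ record.
    The head/tail split is an assumption; only the total (5 bases) comes
    from the text (100 bp -> 95 bp, 150 bp -> 145 bp).
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    if head < 0 or tail < 0 or head + tail > len(seq):
        raise ValueError("invalid trimming lengths")
    end = len(seq) - tail
    return seq[head:end], qual[head:end]


# Toy PE100 and PE150 reads (made-up bases and qualities).
pe100_seq, pe100_qual = trim_read("A" * 100, "I" * 100)
pe150_seq, pe150_qual = trim_read("C" * 150, "I" * 150, head=2, tail=3)
print(len(pe100_seq), len(pe150_seq))
```

Keeping the sequence and quality strings in step is the important design point: trimming one without the other produces an invalid FASTQ record.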

Table 1 Main statistics from O.

Table 2 k-mer = 17 raw read statistics for Octopus vulgaris genome data.

Draft genome assembly

We applied Assembly By Short Sequences 2.0.2 (ABySS^70,71) with both k-mer sizes suggested by SGA PreQC. The quality of the assemblies (ABySS kmer41 and ABySS kmer81) was evaluated with QUAST 4.3⁷². A summary of various statistics is shown in Table 3. Based on the QUAST analysis, the optimal k-mer size for the ABySS assembly was estimated to be 81. Since a high heterozygosity rate of the genome was predicted based on these initial results, the Redundans 0.13c⁷³ tool was used to reduce the number of ABySS contigs from the initial assemblies. Redundans reduces the contig set by removing highly similar contigs. These highly similar contigs originate from different alleles of the same genomic position, but are too divergent for the de Bruijn graph method to assemble them into the same contig (too much variation inside one k-mer). Redundans collapses these redundant contigs and scaffolds the reduced set into single genomic locations. Redundans reduced the number of scaffolds of the draft genome more than sevenfold, while improving assembly statistics (see Table 3).
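The reason heterozygosity splits a de Bruijn graph assembly can be seen in a toy example: a single-base difference between two alleles changes every k-mer that overlaps it, so up to k k-mers become allele-specific and the graph forms two parallel paths. The sketch below only illustrates that counting argument. It is not part of ABySS or Redundans, and the sequences and function names are invented for the example.

```python
def kmers(seq, k):
    """Return the set of all k-mers in `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def allele_specific_kmers(allele_a, allele_b, k):
    """Return the k-mers found only in allele A and only in allele B."""
    a, b = kmers(allele_a, k), kmers(allele_b, k)
    return a - b, b - a


# Two made-up alleles differing at one position (C -> A at index 5).
allele_a = "ACGTACCGGTT"
allele_b = "ACGTAACGGTT"
only_a, only_b = allele_specific_kmers(allele_a, allele_b, k=3)
print(sorted(only_a))  # k-mers lost when the variant is present
print(sorted(only_b))  # k-mers created by the variant
```

With larger k (41 or 81, as used here) each variant affects proportionally more k-mers, which is one way to read the remark about too much variation inside one k-mer.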

Table 3 Assembly statistics for Octopus vulgaris.

Reference Assisted Scaffolding

Given the availability of a relatively good reference genome of a related species (O. bimaculoides)⁴³, a reference assisted scaffolding tool was used to optimize the genome. The reduced scaffolds were aligned to the O. bimaculoides genome using blastn⁷⁴ of the blast+ toolkit 2.8.0-alpha. These alignments were used by chromosomer 0.1.3 (https://github.com/gtamazian/Chromosomer) to scaffold the reduced scaffolds according to the given genome.
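As a rough illustration of reference-assisted scaffolding, the Python sketch below takes blastn hits in the standard 12-column tabular format (`-outfmt 6`), keeps the best-scoring hit per scaffold, and orders scaffolds along each reference sequence. This is a deliberate simplification and should not be read as Chromosomer's algorithm. The function names and the best-bitscore rule are assumptions, and the input lines are invented.

```python
from collections import defaultdict


def best_hits(blast_lines):
    """Keep, for each query scaffold, the hit with the highest bitscore.

    Expects BLAST tabular columns: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore.
    """
    best = {}
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip comments or malformed lines
        query, ref = fields[0], fields[1]
        sstart, send = int(fields[8]), int(fields[9])
        bitscore = float(fields[11])
        if query not in best or bitscore > best[query][3]:
            strand = "-" if sstart > send else "+"
            best[query] = (ref, min(sstart, send), strand, bitscore)
    return best


def order_scaffolds(best):
    """Group scaffolds by reference sequence and sort them by position."""
    layout = defaultdict(list)
    for query, (ref, pos, strand, _score) in best.items():
        layout[ref].append((pos, query, strand))
    return {ref: [(q, s) for _pos, q, s in sorted(items)]
            for ref, items in layout.items()}


# Invented example hits: scafB aligns in reverse orientation upstream of scafA.
hits = [
    "scafA\tref1\t95.0\t1000\t50\t0\t1\t1000\t5000\t5999\t0.0\t1500",
    "scafB\tref1\t93.0\t800\t56\t0\t1\t800\t1800\t1001\t0.0\t1100",
    "scafB\tref2\t90.0\t200\t20\t0\t1\t200\t10\t209\t1e-50\t250",
]
print(order_scaffolds(best_hits(hits)))
```

A real tool must also handle scaffolds with conflicting or fragmented hits, which this sketch ignores.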

Assessment of draft genomes

An assessment of the draft genomes (ABySS, Redundans and chromosomer) was performed by looking for highly conserved genes using BUSCO 3.0.2⁷⁵. The Metazoa odb9 database was used, supplying 978 orthologs. The number of complete orthologs increased with each improvement of the assembly (Table 3), confirming the gain in assembly quality of the final chromosomer version. The final genome build has over 50% complete BUSCOs, and 10% fragmented BUSCOs (orthologs found, but scattered over multiple scaffolds).
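BUSCO percentages follow directly from counts over the 978 orthologs of the Metazoa odb9 set. The short sketch below shows the arithmetic. The counts passed in are hypothetical placeholders chosen to land near the reported "over 50% complete, 10% fragmented" figures, not values taken from Table 3, and `busco_percentages` is an illustrative helper rather than part of BUSCO.

```python
def busco_percentages(complete, fragmented, total=978):
    """Convert BUSCO counts into complete/fragmented/missing percentages."""
    if complete < 0 or fragmented < 0 or complete + fragmented > total:
        raise ValueError("counts are inconsistent with the total")
    missing = total - complete - fragmented
    return {
        "complete": round(100 * complete / total, 1),
        "fragmented": round(100 * fragmented / total, 1),
        "missing": round(100 * missing / total, 1),
    }


# Hypothetical counts for illustration only.
print(busco_percentages(complete=489, fragmented=98))
```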

Data Records

The draft genome assemblies of O. vulgaris shown in Table 3 have been made publicly available on the genome browser and data repository of the Association for Cephalopod Research that initiated this work (http://www.cephalopodresearch.org/ceph_gdatab/) in collaboration with the Department of Molecular Evolution and Development, University of Vienna. This web resource is based on the browser originally designed by University of California, Santa Cruz (UCSC)⁷⁶ and will be maintained and curated to keep track of all present and upcoming octopus genomes. It includes comparative genomics tracks such as read mapping and whole genome alignment between the two octopus species. Raw reads have also been deposited to the NCBI SRA⁷⁷. The reference-guided assembly has been deposited at GenBank⁷⁸ and its original version is also provided in the associated FigShare record (chromosomer.fa) together with its annotation (gene_models.chromosomer.gff), and other assemblies listed in Table 3 (Octopus vulgaris genome assemblies⁷⁹). Table 2 and Table 3 summarize statistics about the O. vulgaris genome as deduced from our current sequencing data and Fig. 1 shows the kmer (17mer) distribution determining the overall sequencing depth (Table 1 and 2).

Technical Validation

Quality control

The quantity and integrity of the genomic DNA were analysed via agarose gel electrophoresis and with a NanoDrop spectrophotometer (Thermo Fisher Scientific; concentration of 1 μg/μl, A₂₆₀/A₂₈₀ = 1.84 and A₂₃₀/A₂₆₀ = 2.2). DNA integrity was analysed with Agilent Bioanalyzer 2100.

Quality control DNA library

To assess the quality of the Illumina reads, FastQC (www.bioinformatics.babraham.ac.uk/projects/fastqc) was run on all raw data. Trimmomatic v0.36⁸⁰ was not able to identify any significant adaptor sequence contamination within the raw data. The data were mapped to the PhiX control library (Illumina, Inc) using Bowtie2 v2.3.4⁸¹ and no matches were found.

Sequencing depth assessment

We ran jellyfish 2.2.10⁸² on the raw read data using a kmer size of 17 bp. This resulted in a sequencing depth histogram (Fig. 1) with a depth peak of around 76x. Using the kmer depth curve and the cumulative read depth (Fig. 1), genome size, repetitiveness and heterozygosity were estimated independently of the genome assemblies (see Tables 2 and 3). The genome was estimated to be around 2.4 Gb in length with a relatively high heterozygosity rate (>1.1%) and large repetitiveness (>50%).
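A common back-of-the-envelope estimate of genome size divides the total number of k-mer occurrences by the depth at the main histogram peak. The Python sketch below applies this to a two-column histogram (multiplicity, number of distinct k-mers), the layout written by `jellyfish histo`. It is not necessarily the procedure used for the estimates reported here. The `min_depth` cut-off for discarding error k-mers is an assumption, and the toy histogram is invented.

```python
def parse_histo(lines):
    """Parse 'multiplicity count' lines into a {multiplicity: count} dict."""
    histo = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # ignore blank or malformed lines
        histo[int(parts[0])] = int(parts[1])
    return histo


def find_peak(histo, min_depth=5):
    """Depth with the most distinct k-mers, ignoring the low-depth error tail."""
    candidates = {d: n for d, n in histo.items() if d >= min_depth}
    if not candidates:
        raise ValueError("no histogram entries at or above min_depth")
    return max(candidates, key=candidates.get)


def estimate_genome_size(histo, min_depth=5):
    """Total k-mer occurrences (above the error cut-off) divided by peak depth."""
    peak = find_peak(histo, min_depth)
    total = sum(d * n for d, n in histo.items() if d >= min_depth)
    return total / peak


# Toy histogram: an error spike at low depth and a main peak at depth 10.
toy = parse_histo(["1 1000", "2 100", "9 10", "10 50", "11 10", "20 5"])
print(find_peak(toy), estimate_genome_size(toy))
```

In a heterozygous genome the histogram can show a second peak at roughly half the main depth, so the choice of peak affects the estimate. With the values reported above, a genome of about 2.4 Gb at a 76x peak corresponds to roughly 1.8 × 10¹¹ counted k-mer occurrences.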

Genome properties and future steps

To gain information on the genetic distance between the two closely related species O. vulgaris and O. bimaculoides, we mapped all the available raw sequence data from O. vulgaris against the genome of O. bimaculoides⁸³ and found that 74–84% of the data aligned, but that a high percentage (20–50%) was able to align multiple times. The significant proportion of multiple mapping reads suggests that, similar to the O. bimaculoides genome, O. vulgaris genome has a large number (at least 50%) of repetitive elements, confirmed by the cumulative read depth analysis (Fig. 1). Ab initio repeat analysis using dnaPipeTE⁸⁴ revealed similar classes of octopus specific short interspersed nuclear elements (SINE) to be over-represented (Fig. 2), yet the proportions were strikingly different, despite the close phylogenetic distance. This indicates high activity of repetitive elements in the common octopus genome.

Profiling O. bimaculoides regions with read coverage from O. vulgaris, we found that 23,509 O. bimaculoides genes were covered at 90% or more of their coding sequence length by O. vulgaris reads (Fig. 3). Approximately 50% of those genes had a Pfam annotation, including gene families previously reported to have undergone major expansions in the O. bimaculoides genome, such as zinc fingers and protocadherins. This is in strong contrast to only 1,570 O. bimaculoides genes with no O. vulgaris read coverage, with just 14% of those having a Pfam annotation. Those candidates represent very recent novel or highly diverged genes and their number indicates a relatively high rate of novel gene formation in octopus genomes. To investigate non-coding evolution among cephalopods, we furthermore compared the mapping rates to non-repetitive non-coding regions of 100 bp and longer. Again, we found the majority of those loci are covered at 90% length or higher. However, the relative proportion of O. bimaculoides regions not covered by any reads was higher than for the genes, indicating a higher turnover rate for the non-coding, potentially regulatory, sequences (Fig. 3).
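The 90% threshold used above is a breadth-of-coverage measure: the fraction of a feature's length overlapped by at least one read. A minimal Python sketch of that calculation follows. It treats the feature as a single half-open interval, whereas a real coding sequence would span several exons, and the function name and coordinates are illustrative assumptions.

```python
def covered_fraction(feature_start, feature_end, read_intervals):
    """Fraction of [feature_start, feature_end) covered by at least one read.

    `read_intervals` is an iterable of half-open (start, end) alignments.
    """
    length = feature_end - feature_start
    if length <= 0:
        raise ValueError("feature must have positive length")
    # Clip reads to the feature and sort them by start position.
    clipped = sorted(
        (max(start, feature_start), min(end, feature_end))
        for start, end in read_intervals
        if end > feature_start and start < feature_end
    )
    covered = 0
    current_start = current_end = None
    for start, end in clipped:
        if current_end is None or start > current_end:
            # Close the previous merged block and open a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / length


# Invented example: three reads over a 100 bp feature.
fraction = covered_fraction(100, 200, [(90, 150), (140, 180), (195, 260)])
print(fraction, fraction >= 0.9)
```

Merging overlapping reads before summing avoids counting the same base twice, which would otherwise inflate coverage in deeply sequenced regions.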

To evaluate the completeness of our assemblies, raw reads were mapped using Bowtie2 v2.3.4 against both ABySS kmer81 and kmer41 assemblies. For the ABySS kmer41 assembly, at least 99.94% of all the reads were mapped while the percentage of uniquely mapped reads was only around 33–50%. For the ABySS kmer81 assembly, percentages were at least 98% and between 31 and 57%, respectively.

We used our assemblies to estimate whole-genome divergences between the available octopod genomes. Mapping of the scaffolds of 10 kb and longer against the O. bimaculoides genome using MEGABLAST resulted in an overall sequence similarity of 92.4% in the aligned regions of 1 kb and above (Fig. 4). This divergence of around 8% between the two species is higher than the estimated heterozygosity rate of 1.1% in O. vulgaris and lower than the divergence between O. bimaculoides⁸³ and the recently released data of C. minor (82.4% similarity) (Fig. 4, and ref.⁸⁵) from a different genus, providing the first whole-genome divergence estimates within this clade.
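One way to summarise many alignments into a single similarity figure is a length-weighted mean of percent identity over alignments above a length cut-off, with divergence taken as 100 minus that value. The sketch below shows this calculation. Whether the reported 92.4% was computed with length weighting is not stated, so the weighting is an assumption, and the alignment values are invented.

```python
def weighted_identity(alignments, min_length=1000):
    """Length-weighted mean percent identity of alignments >= min_length.

    `alignments` is an iterable of (percent_identity, alignment_length).
    Returns None when no alignment passes the length filter.
    """
    kept = [(pid, length) for pid, length in alignments if length >= min_length]
    total_length = sum(length for _pid, length in kept)
    if total_length == 0:
        return None
    return sum(pid * length for pid, length in kept) / total_length


# Invented alignments: the 500 bp hit falls below the 1 kb cut-off.
identity = weighted_identity([(90.0, 1000), (95.0, 3000), (80.0, 500)])
print(identity, 100 - identity)
```

Applied to the figures above, a similarity of 92.4% corresponds to a divergence of 7.6%, which the text rounds to around 8%.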

Our assemblies confirm that abundant repeat regions make it difficult to improve the genome based on the currently available sequence data. Future steps will include proximity-ligation based scaffolding (e.g., Dovetail, PhaseGenomics) or long-read sequencing technologies (e.g., PacBio) to optimize the current assemblies.

References

De Luca, D., Catanese, G., Procaccini, G. & Fiorito, G. An integration of historical records and genetic data to the assessment of global distribution and population structure in Octopus vulgaris. Front. Ecol. Evol 2, 55 (2014).
De Luca, D., Catanese, G., Procaccini, G. & Fiorito, G. Octopus vulgaris (Cuvier, 1797) in the Mediterranean Sea: Genetic diversity and population structure. Plos One 11, e0149496 (2016).
Amor, M. D. et al. Morphological assessment of the Octopus vulgaris species complex evaluated in light of molecular-based phylogenetic inferences. Zool. Scr. 46, 275–288 (2017).
Wells, M. J. Octopus: physiology and behaviour of an advanced invertebrate. (Springer Science & Business Media, 1978).
Marini, G., De Sio, F., Ponte, G. & Fiorito, G. In Learning and Memory: A Comprehensive Reference (Second Edition) Vol. Volume 1 - Learning Theory and Behavior (Menzel, Randolf - volume Editor) (ed John H. Byrne) 441–462 (Academic Press, Elsevier, 2017).
Borrelli, L. & Fiorito, G. In Learning and Memory: A Comprehensive Reference (ed. Byrne, J. J.) 605–627 (Academic Press, 2008).
Edelman, D. B. & Seth, A. K. Animal consciousness: a synthetic approach. Trends Neurosci. 32, 476–484 (2009).
Young, J. Z. The anatomy of the nervous system of Octopus vulgaris. (Oxford University Press, 1971).
Zarrella, I., Ponte, G., Baldascino, E. & Fiorito, G. Learning and memory in Octopus vulgaris: a case of biological plasticity. Curr. Opin. Neurobiol. 35, 74–79, https://doi.org/10.1016/j.conb.2015.06.012 (2015).
O’Brien, C. E., Roumbedakis, K. & Winkelmann, I. E. The Current State of Cephalopod Science and Perspectives on the Most Critical Challenges Ahead From Three Early-Career Researchers. Front. Physiol. 9, 700, https://doi.org/10.3389/fphys.2018.00700 (2018).
Nixon, M. & Young, J. Z. The brains and lives of Cephalopods. 1-392 (Oxford University, 2003).
Young, J. Z. Computation in the learning system of cephalopods. Biol. Bull. 180, 200–208 (1991).
Young, J. Z. A Model of the Brain. (Clarendon Press, 1964).
Mather, J. A. Cognition in cephalopods. Adv. Stud. Behav. 24, 317–353 (1995).
Mather, J. A. Cephalopod consciousness: behavioural evidence. Conscious. Cogn. 17, 37–48 (2008).
Mather, J. A., Anderson, R. C. & Wood, J. B. Octopus: The Ocean’s Intelligent Invertebrate. 1–208 (Timber Press, 2010).
Mather, J. A. & Dickel, L. Cephalopod complex cognition. Curr. Opin. Behav. Sci. 16, 131–137 (2017).
Shigeno, S., Andrews, P. L. R., Ponte, G. & Fiorito, G. Cephalopod Brains: An Overview of Current Knowledge to Facilitate Comparison With Vertebrates. Front. Physiol. 9, 952, https://doi.org/10.3389/fphys.2018.00952 (2018).
Edelman, D. B. How Octopuses See the World and Other Roads Less Traveled: Necessity Versus Sufficiency and Evolutionary Convergence in the Study of Animal Consciousness. J. Shellfish Res. 30, 1001 (2011).
Young, J. Z. In Cephalopod Neurobiology (eds Abbott, J. N., Williamson, R. & Maddock, L.) 431–443 (Oxford University Press, 1995).
Imperadore, P. Nerve regeneration in the cephalopod mollusc Octopus vulgaris: a journey into morphological, cellular and molecular changes including epigenetic modifications. PhD Thesis (Università della Calabria, 2017).
Imperadore, P. & Fiorito, G. Cephalopod Tissue Regeneration: consolidating over a century of knowledge. Front. Physiol. 9, 593, https://doi.org/10.3389/fphys.2018.00593 (2018).
Imperadore, P. et al. Nerve regeneration in the cephalopod mollusc Octopus vulgaris: label-free multiphoton microscopy as a tool for investigation. J. Royal Soc. Interface 15, 20170889, https://doi.org/10.1098/rsif.2017.0889 (2018).
Zullo, L., Fossati, S. M., Imperadore, P. & Nödl, M. T. Molecular determinants of Cephalopod muscles and their implication in muscle regeneration. Front. Cell. Dev. Biol. 5, 53 (2017).
Huffard, C. L. Cephalopod neurobiology: an introduction for biologists working in other model systems. Invert. Neurosci. 13, 11–18 (2013).
Barghi, N., Concepcion, G. P., Olivera, B. M. & Lluisma, A. O. Structural features of conopeptide genes inferred from partial sequences of the Conus tribblei genome. Mol. Genet. Genomics 291, 411–422, https://doi.org/10.1007/s00438-015-1119-2 (2016).
Chen, H. et al. Sequencing and de novo assembly of the Asian clam (Corbicula fluminea) transcriptome using the Illumina GAIIx method. Plos One 8, e79516 (2013).
Du, X. et al. The pearl oyster Pinctada fucata martensii genome and multi-omic analyses provide insights into biomineralization. Gigascience 6, gix059, https://doi.org/10.1093/gigascience/gix059 (2017).
Gómez-Chiarri, M., Warren, W. C., Guo, X. & Proestou, D. Developing tools for the study of molluscan immunity: the sequencing of the genome of the eastern oyster, Crassostrea virginica. Fish Shellfish Immunol 46, 2–4 (2015).
Modica, M. V., Lombardo, F., Franchini, P. & Oliverio, M. The venomous cocktail of the vampire snail Colubraria reticulata (Mollusca, Gastropoda). BMC Genomics 16, 441 (2015).
Murgarella, M. et al. A First Insight into the Genome of the Filter-Feeder Mussel Mytilus galloprovincialis. Plos One 11, e0151561, https://doi.org/10.1371/journal.pone.0151561 (2016).
Peñarrubia, L., Sanz, N., Pla, C., Vidal, O. & Viñas, J. Using Massive Parallel Sequencing for the Development, Validation, and Application of Population Genetics Markers in the Invasive Bivalve Zebra Mussel (Dreissena polymorpha). Plos One 10, e0120732, https://doi.org/10.1371/journal.pone.0120732 (2015).
Raghavan, N. & Knight, M. The snail (Biomphalaria glabrata) genome project. Trends Parasitol. 22, 148–151 (2006).
Sadamoto, H. et al. De novo sequencing and transcriptome analysis of the central nervous system of mollusc Lymnaea stagnalis by deep RNA sequencing. Plos One 7, e42546 (2012).
Schell, T. et al. An Annotated Draft Genome for Radix auricularia (Gastropoda, Mollusca). Genome Biol. Evol. 9, 585–592, https://doi.org/10.1093/gbe/evx032 (2017).
Simakov, O. et al. Insights into bilaterian evolution from three spiralian genomes. Nature 493, 526 (2013).
Sun, J. et al. Adaptation to deep-sea chemosynthetic environments as revealed by mussel genomes. Nat. Ecol. Evol. 1, 0121 (2017).
Uliano-Silva, M. et al. A hybrid-hierarchical genome assembly strategy to sequence the invasive golden mussel, Limnoperna fortunei. Gigascience 7, gix128, https://doi.org/10.1093/gigascience/gix128 (2018).
Wang, S. et al. Scallop genome provides insights into evolution of bilaterian karyotype and development. Nat. Ecol. Evol. 1, 0120 (2017).
Xing, Q. et al. Genome-wide identification, characterization and expression analyses of two TNFRs in Yesso scallop (Patinopecten yessoensis) provide insight into the disparity of responses to bacterial infections and heat stress in bivalves. Fish Shellfish Immunol. 52, 44–56, https://doi.org/10.1016/j.fsi.2016.03.010 (2016).
Zhang, G. et al. The oyster genome reveals stress adaptation and complexity of shell formation. Nature 490, 49 (2012).
Zhou, X. et al. The complete mitochondrial genome of Pomacea canaliculata (Gastropoda: Ampullariidae). Mitochondrial DNA A 27, 884–885 (2016).
Albertin, C. B. et al. The octopus genome and the evolution of cephalopod neural and morphological novelties. Nature 524, 220–224 (2015).
Kim, B.-M. et al. The genome of common long-arm octopus Octopus minor. Gigascience 7, giy119, https://doi.org/10.1093/gigascience/giy119 (2018).
Belcaid, M. et al. Symbiotic organs shaped by distinct modes of genome evolution in cephalopods. Proc. Natl. Acad. Sci. USA., 201817322, https://doi.org/10.1073/pnas.1817322116 (2019).
Walker, W. F. & Ford Doolittle, W. 5S rRNA sequences from four marine invertebrates and implications for base pairing models of metazoan sequences. Nucleic Acids Res 11, 5159–5164 (1983).
Albertin, C. B. et al. Cephalopod genomics: A plan of strategies and organization. Stand. Genomic Sci. 7, 175–188 (2012).
Kröger, B., Vinther, J. & Fuchs, D. Cephalopod origin and evolution: A congruent picture emerging from fossils, development and molecules. BioEssays 33, 602–613 (2011).
Packard, A. Cephalopods and fish: the limits of convergence. Biol. Rev. 47, 241–307 (1972).
Garrett, S. & Rosenthal, J. J. C. RNA Editing Underlies Temperature Adaptation in K⁺ Channels from Polar Octopuses. Science 335, 848 (2012).
Liscovitch-Brauer, N. et al. Trade-off between Transcriptome Plasticity and Genome Evolution in Cephalopods. Cell 169, 191–202.e111, https://doi.org/10.1016/j.cell.2017.03.025 (2017).
Garrett, S. C. & Rosenthal, J. J. C. A Role for A-to-I RNA Editing in Temperature Adaptation. Physiology Bethesda 27, 362–369 (2012).
Yoshida, M.-A. et al. Genome structure analysis of molluscs revealed whole genome duplication and lineage specific repeat variation. Gene 483, 63–71 (2011).
Petrosino, G. The transcriptional landscape of the nervous system of Octopus vulgaris. PhD Thesis (Università degli Studi di Napoli Federico II, 2015).
Lee, P. N., Callaerts, P., de Couet, H. G. & Martindale, M. Q. Cephalopod Hox genes and the origin of morphological novelties. Nature 424, 1061–1065 (2003).
Nakajima, R., Shigeno, S., Zullo, L., De Sio, F. & Schmidt, M. R. Cephalopods between science, art, and engineering: a contemporary synthesis. Front. Commun 3, 20 (2018).
Shigeno, S., Parnaik, R., Albertin, C. B. & Ragsdale, C. W. Evidence for a cordal, not ganglionic, pattern of cephalopod brain neurogenesis. Zool. Lett 1, 26 (2015).
Shigeno, S., Takenori, S. & Boletzky, S. V. In Cephalopods - Present and Past (eds Tanabe, K., Shigeta, Y., Sasaki, T. & Hirano, H.) 23–34 (Tokai University Press, 2010).
Keynes, R. D. The role of giant axons in studies of the nerve impulse. BioEssays 10, 90–93 (1989).
Allcock, A. L., Lindgren, A. & Strugnell, J. The contribution of molecular data to our understanding of cephalopod evolution and systematics: a review. J. Nat. Hist. 49, 1373–1421 (2015).
Sanchez, G. et al. Genus-level phylogeny of cephalopods using molecular markers: current status and problematic areas. PeerJ 6, e4331, https://doi.org/10.7717/peerj.4331 (2018).
Sweeney, M. J., Roper, C. F., Mangold, K. M., Clark, M. & Boletzky, S. V. “Larval” and juvenile cephalopods: a manual for their identification. Smithson. Contrib. Zool. 513,1–282 (1992).
Andrews, P. L. R. et al. The identification and management of pain, suffering and distress in cephalopods, including anaesthesia, analgesia and humane killing. J. Exp. Mar. Biol. Ecol. 447, 46–64 (2013).
Fiorito, G. et al. Guidelines for the Care and Welfare of Cephalopods in Research - A consensus based on an initiative by CephRes, FELASA and the Boyd Group. Lab. Anim. 49, 1–90 (2015).
DeWoody, J. & Avise, J. Microsatellite variation in marine, freshwater and anadromous fishes compared with other animals. J. Fish Biol. 56, 461–473 (2000).
Bazin, E., Glémin, S. & Galtier, N. Population size does not influence mitochondrial genetic diversity in animals. Science 312, 570–572 (2006).
Chapko, M. K. et al. Devilfish. A Practical Guide to the Dissection of Octopus. 1–65 (Wayne Senior High School, 1962).
Chen, Y. et al. SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data. Gigascience 7, gix120, https://doi.org/10.1093/gigascience/gix120 (2018).
Simpson, J. T. Exploring genome characteristics and sequence quality without a reference. Bioinformatics 30, 1228–1235 (2014).
Jackman, S. D. et al. ABySS 2.0: resource-efficient assembly of large genomes using a Bloom filter. Genome Res. 27, 768–777 (2017).
Simpson, J. T. et al. ABySS: a parallel assembler for short read sequence data. Genome Res. 19, 1117–1123 (2009).
Gurevich, A., Saveliev, V., Vyahhi, N. & Tesler, G. QUAST: quality assessment tool for genome assemblies. Bioinformatics 29, 1072–1075 (2013).
Pryszcz, L. P. & Gabaldón, T. Redundans: an assembly pipeline for highly heterozygous genomes. Nucleic Acids Res. 44, e113, https://doi.org/10.1093/nar/gkw294 (2016).
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
Waterhouse, R. M. et al. BUSCO Applications from Quality Assessments to Gene Prediction and Phylogenomics. Mol. Biol. Evol. 35, 543–548 (2018).
Kent, W. J. et al. The Human Genome Browser at UCSC. Genome Res. 12, 996–1006 (2002).
NCBI Sequence Read Archive, http://identifiers.org/ncbi/insdc.sra:SRP162501 (2018).
Octopus vulgaris isolate Ov12053, whole genome shotgun sequencing project. GenBank, http://identifiers.org/ncbi/insdc:RXHP00000000 (2018).
Zarrella, I. et al. The survey and reference assisted assembly of the Octopus vulgaris genome. figshare https://doi.org/10.6084/m9.figshare.c.4266164 (2019).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357 (2012).
Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011).
NCBI Assembly, https://www.ncbi.nlm.nih.gov/assembly/GCF_001194135.1/ (2015).
Goubert, C. et al. De Novo Assembly and Annotation of the Asian Tiger Mosquito (Aedes albopictus) Repeatome with dnaPipeTE from Raw Genomic Reads and Comparative Analysis with the Yellow Fever Mosquito (Aedes aegypti). Genome Biol. Evol. 7, 1192–1205, https://doi.org/10.1093/gbe/evv050 (2015).
Kim, B.-M. et al. Supporting data for “The genome of common long-arm octopus Octopus minor”. GigaScience Database, https://doi.org/10.5524/100503 (2018).

Acknowledgements

We acknowledge the contribution of Drs A. Cole (currently at University of Vienna), N. Tublitz (Oregon University), and C. Di Cristo (University of Sannio and CephRes) for their generous support at an earlier stage of this project. The contribution of the State Key Laboratory of Agricultural Genomics (grant No. 2011DQ782025) to GEM is also appreciated. EAR and OS are supported by a grant from the Austrian Science Fund (FWF): P30686-B29. We thank the COST Action (FA1301) that facilitated the networking initiative underlying this study. Genome browsers are integrated in the CephRes databases-website platform (http://www.cephalopodresearch.org/ceph_gdatab/) based on a bilateral agreement with the University of Vienna. This study was supported by BGI-Shenzhen and by the Association for Cephalopod Research ‘CephRes’, a non-profit organization.

Author information

These authors contributed equally: Ilaria Zarrella, Koen Herten, Shuaishuai Tai and Ming Yang.
These authors jointly supervised this work: Gregory E. Maes, Oleg Simakov, Giovanna Ponte and Graziano Fiorito.

Authors and Affiliations

Association for Cephalopod Research ‘CephRes’, Napoli, Italy
Ilaria Zarrella & Giovanna Ponte
Genomics Core, KU Leuven, Leuven, Belgium
Koen Herten
Laboratory for Cytogenetics and Genome Research, Center for Human Genetics, KU Leuven, Leuven, Belgium
Koen Herten & Gregory E. Maes
Laboratory of Biodiversity and Evolutionary Genomics, KU Leuven, Leuven, Belgium
Gregory E. Maes
Centre for Sustainable Tropical Fisheries and Aquaculture, Comparative Genomics Centre, College of Science and Engineering, James Cook University, Townsville, 4811, QLD, Australia
Gregory E. Maes
BGI Genomics, BGI-Shenzhen, Shenzhen, 518083, China
Shuaishuai Tai & Ming Yang
Laboratory of Developmental Neurobiology, Department of Biology, KU Leuven, Leuven, Belgium
Eve Seuntjens & Ruth Styfhals
Department of Molecular Evolution and Development, University of Vienna, Vienna, Austria
Elena A. Ritschard, Michael Zach & Oleg Simakov
Department of Biology and Evolution of Marine Organisms, Stazione Zoologica Anton Dohrn Napoli, Napoli, Italy
Ruth Styfhals, Remo Sanges, Giovanna Ponte & Graziano Fiorito

Contributions

I.Z. and G.P. provided material for DNA sequencing and drafted a first version of the manuscript. S.T. and M.Y. generated the data and performed a preliminary analysis. K.H. and G.E.M. further analysed the data, ran the subsequent steps, and developed a further version of the manuscript. O.S. contributed to the conception of the work, to data analysis and to writing the paper. M.Z. established the database resource. G.F. and G.P. conceived the work, performed the sampling and contributed to writing the paper. All authors contributed to writing and editing the final manuscript.

Corresponding authors

Correspondence to Oleg Simakov or Graziano Fiorito.

Ethics declarations

Competing Interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files associated with this article.

About this article

Cite this article

Zarrella, I., Herten, K., Maes, G.E. et al. The survey and reference assisted assembly of the Octopus vulgaris genome. Sci Data 6, 13 (2019). https://doi.org/10.1038/s41597-019-0017-6

Received: 18 October 2018
Accepted: 04 February 2019
Published: 01 April 2019
DOI: https://doi.org/10.1038/s41597-019-0017-6
