The marbled crayfish Procambarus virginalis is a unique freshwater crayfish characterized by very recent speciation and parthenogenetic reproduction. Marbled crayfish also represent an emerging invasive species and have formed wild populations in diverse freshwater habitats. However, our understanding of marbled crayfish biology, evolution and invasive spread has been hampered by the lack of freshwater crayfish genome sequences. We have now established a de novo draft assembly of the marbled crayfish genome. We determined the genome size at approximately 3.5 gigabase pairs and identified >21,000 genes. Further analysis confirmed the close relationship to the genome of the slough crayfish, Procambarus fallax, and also established a triploid AA’B genotype with a high level of heterozygosity. Systematic fieldwork and genotyping demonstrated the rapid expansion of marbled crayfish on Madagascar and established the marbled crayfish as a potent invader of freshwater ecosystems. Furthermore, comparative whole-genome sequencing demonstrated the clonality of the population and their genetic identity with the oldest known stock from the German aquarium trade. Our study closes an important gap in the phylogenetic analysis of animal genomes and uncovers the unique evolutionary history of an emerging invasive species.
Freshwater crayfish are important keystone species that play critical roles in the maintenance of their ecosystems1,2. Taxonomically, they belong to the order of decapod crustaceans, which includes crabs, lobsters, prawns and shrimps. Surprisingly, however, complete genome sequences from these ecologically and economically important groups remain to be established. Currently, the only crustacean genomes available are those of the water flea (Daphnia pulex) and the sand flea (Parhyale hawaiensis)3,4, leaving decapods as a major gap in the phylogenetic analysis of genomes.
The marbled crayfish Procambarus virginalis (Fig. 1a) is a freshwater crayfish species5 that holds a unique position among decapod crustaceans due to its parthenogenetic mode of reproduction6. Marbled crayfish are descendants of the sexually reproducing slough crayfish Procambarus fallax and reproduce by apomictic parthenogenesis7,8. We have previously suggested that marbled crayfish originated through an evolutionarily very recent macromutation in P. fallax, consistent with the first known appearance of marbled crayfish in the German aquarium trade in 1995 (ref. 8). Subsequent distribution via the pet trade and anthropogenic releases resulted in increasing numbers of wild populations in several countries9,10,11,12,13,14,15,16. The propagation of marbled crayfish is facilitated by their parthenogenetic mode of reproduction and their high fecundity17, which together allow the establishment of large populations from single animals18. Marbled crayfish may therefore serve as a model for the spread of invasive species. However, our understanding of marbled crayfish distribution, origins, diversification and ability to adapt to new environments is severely limited by the lack of genetic information.
Available genome sequences of parthenogenetic animals are currently limited to certain nematodes19,20,21,22 and the bdelloid rotifer Adineta vaga23. Their analysis revealed several interesting features that are likely to reflect important strategies for the evolutionary robustness of these parthenogenetic animals, including the presence of allelic regions on the same chromosome in A. vaga23 and substantial heterozygosity, combined with the loss of key meiosis genes in Diploscapter pachys and Diploscapter coronatus21,22. However, these features were identified in genomes that have been shaped by asexual reproduction for millions of years. Due to its very young evolutionary age, the marbled crayfish provides a unique opportunity to analyse the structure of a recently formed parthenogenetic genome and to track its evolution at a very early stage.
Here, we provide a draft genome assembly of the marbled crayfish to investigate the genome structure, evolutionary history, population structure and invasive spread of this unique animal.
Previous studies based on microsatellite analyses and karyotyping have shown that the marbled crayfish is a triploid organism with 276 chromosomes, which corresponds to the exact triplicate number of the haploid set of chromosomes in P. fallax24. Furthermore, marbled crayfish represent an evolutionarily very young species8,24, which contrasts with other known parthenogenetic animals, such as bdelloid rotifers and asexually reproducing nematodes, and suggests that the three genome copies are still highly similar. We therefore assumed that the marbled crayfish genome represents a triplicate version of the original 1 N genotype from P. fallax. To quantitatively determine the marbled crayfish genome size, we analysed the DNA content of haemocytes by flow cytometry (Fig. 1b). Haploid genome size estimates using human and mouse blood cells as internal references suggested genome sizes of 3.9 and 3.5 gigabase pairs (Gbp), respectively (Fig. 1c and Supplementary Fig. 1). An in silico genome size estimate based on k-mer frequencies provided a slightly lower, but overall consistent value (3.3 Gbp; Supplementary Fig. 1). Taken together, these findings suggest that the 1 N-equivalent genome size of the marbled crayfish is approximately 3.5 Gbp.
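The k-mer-based estimate rests on a simple relation: the total number of k-mer instances in the reads, divided by the depth at which the main peak of the k-mer depth histogram lies, approximates the genome length. The Python sketch below illustrates this under simplifying assumptions (a user-chosen error cutoff `min_depth` and a single dominant peak); the function name and the toy histogram are hypothetical and do not reproduce the tool used in the study.

```python
from typing import Dict


def kmer_genome_size(histogram: Dict[int, int], min_depth: int) -> float:
    """Estimate genome size from a k-mer depth histogram.

    histogram maps k-mer depth -> number of distinct k-mers seen at that depth.
    Depths below min_depth are treated as sequencing-error k-mers and ignored.
    """
    usable = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # Depth of the main peak: the depth shared by the most distinct k-mers.
    peak_depth = max(usable, key=usable.get)
    # Total k-mer instances = sum over depths of (depth x distinct k-mers).
    total_kmers = sum(depth * n for depth, n in usable.items())
    return total_kmers / peak_depth


# Toy histogram (hypothetical values): an error spike at depths 1-2
# and a main peak at depth 40.
toy = {1: 1_000_000, 2: 200_000, 20: 100, 30: 500, 40: 1_000, 50: 500, 60: 100}
print(kmer_genome_size(toy, min_depth=5))  # 2200.0
```

In real data the histogram would come from a k-mer counter run on the sequencing reads, and the choice of `min_depth` (usually the first valley after the error spike) noticeably affects the result.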
To establish the genome sequence of the marbled crayfish, we used genomic DNA from a single animal of the ‘Petshop’ laboratory strain8 to prepare various libraries for Illumina sequencing and obtained 350 Gbp of DNA sequence (Supplementary Table 1). After contig assembly, scaffolding and gap filling, we generated a draft genome sequence with a total length of 3.3 Gbp and an N50 (length-weighted median sequence length) of 39.4 kilobases (kb). Benchmarking with universal single-copy orthologues25 suggested that the quality of the marbled crayfish genome assembly was comparable to that of other, recently published arthropod genomes26,27,28 (Supplementary Fig. 2). Phylogenetic placement among various published arthropod genomes confirmed that, of the species analysed, the marbled crayfish is most closely related to the crustaceans D. pulex and P. hawaiensis (Fig. 1d).
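As a reminder of how the N50 value is defined, the sketch below sorts sequence lengths in decreasing order and returns the length at which the cumulative sum first reaches half of the total assembly length. It is a generic illustration, not code from the assembly pipeline used here; the function name is hypothetical.

```python
from typing import List


def n50(lengths: List[int]) -> int:
    """Length L such that sequences of length >= L contain at least half the assembly."""
    if not lengths:
        raise ValueError("empty assembly")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first sequence that brings the cumulative sum to >= 50%.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: cumulative sum always reaches the total")


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

Because long sequences dominate the cumulative sum, N50 rewards contiguity rather than the number of sequences, which is why it is used to compare draft assemblies.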
The gene length in marbled crayfish averaged 6.7 kb. Average exon and intron sizes were 0.3 kb and 2 kb, respectively, thus placing marbled crayfish gene lengths between those of P. hawaiensis and D. pulex. Gene annotation was performed using the MAKER genome annotation pipeline29, which provided important starting points for further analysis. For example, we detected multiple genomic locations for cellulase genes of the GH9 family (Fig. 2a). Genomically encoded cellulase genes are relatively rare in higher animals, but are generally assumed to play a key role in the omnivorous diet of freshwater crayfish30. Furthermore, repeat annotation detected 484,313 repeats that were subclassified into 7 major categories and covered 8.8% of the annotated genome assembly (Fig. 2b). Repeat coverage is likely to increase substantially in future versions of the marbled crayfish genome assembly, as the fragmentation of the current assembly represents a major bottleneck for algorithmic repeat detection. The most recent version of the genome assembly and annotation can be accessed through a dedicated internet portal (http://marmorkrebs.dkfz.de).
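Repeat coverage, such as the 8.8% reported above, is the fraction of assembly bases overlapped by at least one annotated repeat; overlapping annotations must be merged so that no base is counted twice. The sketch below assumes zero-based, half-open [start, end) intervals per scaffold; the function names and the toy data are hypothetical and not taken from the study's annotation pipeline.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # zero-based, half-open [start, end)


def covered_bases(intervals: List[Interval]) -> int:
    """Number of bases covered by the union of possibly overlapping intervals."""
    covered = 0
    cur_start, cur_end = None, None
    for start, end in sorted(intervals):
        if cur_end is not None and start <= cur_end:
            # Overlapping or adjacent: extend the current merged interval.
            cur_end = max(cur_end, end)
        else:
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered


def repeat_fraction(repeats: Dict[str, List[Interval]],
                    scaffold_lengths: Dict[str, int]) -> float:
    """Fraction of assembly bases overlapped by at least one repeat annotation."""
    total = sum(scaffold_lengths.values())
    covered = sum(covered_bases(repeats.get(name, [])) for name in scaffold_lengths)
    return covered / total


lengths = {"scaf1": 100, "scaf2": 100}
repeats = {"scaf1": [(0, 10), (5, 20)], "scaf2": [(50, 60), (60, 70)]}
print(repeat_fraction(repeats, lengths))  # 0.2
```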
In parallel, we also established the marbled crayfish transcriptome from a normalized sequencing library that was generated from several distinct tissues. Benchmarking again confirmed that the quality was comparable to other, recently published arthropod transcriptomes (Supplementary Fig. 2). The transcriptome consists of 22,338 transcripts (Fig. 2c), which corresponds closely to the numbers of predicted genes (21,772) and messenger RNAs (22,205) in the genome assembly. Comparisons with other publicly available transcriptomes revealed homologues for the majority (81%) of predicted proteins, while 19% (n = 4,306) of the predicted proteins were classified as unique (Fig. 2d).
The analysis of two other parthenogenetic genomes had revealed the presence of homologous but diverged blocks that reflect genome rearrangements typically associated with asexual reproduction19,23. However, the average copy number of the 1,066 universal single-copy orthologues was 1.01, which argues against the existence of divergent homologues. Similarly, the coverage distribution of genes showed only a single peak (Fig. 3a). Finally, we could only detect a very low number (n = 66) of collinear genes on different scaffolds, all of which had rather high E-values (Supplementary Table 2), indicating that they probably represent artefacts. Together, these findings suggest that the genome rearrangements described in longstanding parthenogens are not detectable in the marbled crayfish genome, which is consistent with its very young evolutionary age.
Additional features of the marbled crayfish genome were revealed by the analysis of heterozygous sequence variants. The global rate of heterozygosity was 0.53%, which is relatively high compared with other sequenced genomes, including P. fallax from the pet trade (0.03%; Fig. 3b). Furthermore, the allelic frequency of marbled crayfish sequence variants peaked at 0.33 (Fig. 3c), in agreement with heterozygous positions in a triploid genome. In contrast, the frequencies of P. fallax and P. alleni sequence variants and polymorphisms peaked at 0.5 and 1.0 (Fig. 3c), reflecting their diploid nature and polymorphisms towards the marbled crayfish genome, respectively. Finally, the triploid marbled crayfish genome showed a negligible fraction (0.15%) of triallelic sequence polymorphisms (Fig. 3d). Together, these findings provide strong support for an AA’B genotype that may have originated from an autopolyploid gamete8.
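The allele-frequency argument can be made explicit: at a heterozygous site in a genome of ploidy p, the variant allele is carried by k of the p copies, so its expected read fraction is k/p (0.5 for a diploid; 1/3 or 2/3 for a triploid). The sketch below lists these expected values and locates the mode of a binned frequency histogram; the bin count, the function names and the toy frequencies are illustrative assumptions, not the study's data or code.

```python
from typing import List


def expected_het_frequencies(ploidy: int) -> List[float]:
    """Expected variant-allele fractions at heterozygous sites: k/ploidy, k = 1..ploidy-1."""
    return [k / ploidy for k in range(1, ploidy)]


def histogram_peak(frequencies: List[float], bins: int = 20) -> float:
    """Centre of the most populated bin of an allele-frequency histogram on [0, 1]."""
    counts = [0] * bins
    for f in frequencies:
        counts[min(int(f * bins), bins - 1)] += 1  # f == 1.0 falls in the last bin
    best = max(range(bins), key=counts.__getitem__)
    return (best + 0.5) / bins


# Hypothetical triploid-like data: most heterozygous sites near 1/3.
freqs = [0.32] * 50 + [0.34] * 30 + [0.66] * 10 + [0.50] * 5
peak = histogram_peak(freqs)
closest = min(expected_het_frequencies(3), key=lambda e: abs(e - peak))
print(peak, closest)
```

A peak near 1/3 rather than 0.5 is what distinguishes the triploid marbled crayfish profile from the diploid P. fallax and P. alleni profiles.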
Previous studies have suggested that marbled crayfish reproduce by apomictic parthenogenesis31,32, which should result in the establishment of a genetically homogeneous population. We therefore sequenced the genomes of four additional marbled crayfish from diverse sources: (1) an animal from the longest-known stock (‘Heidelberg’, founded in 1995); (2) an animal from a German wild population caught in 2013 (‘Moosweiher’); (3) an animal from a market purchase in Madagascar (‘Madagascar 1’); and (4) an animal from an American laboratory stock, which originated from another pet shop purchase in Germany17 (‘Petshop 2’). In addition, we generated genome sequences from two closely related species, P. fallax (four animals from an aquarium supplier) and P. alleni (one animal from an aquarium supplier). Sequencing and mapping to the marbled crayfish reference genome resulted in genome coverages ranging from 16× to 72× (Supplementary Table 3). We then used single-nucleotide polymorphisms to analyse phylogenetic relationships among the nine sequenced animals. The results confirmed the clonality of the four analysed marbled crayfish genomes and their separation from P. fallax and P. alleni (Fig. 4). Taken together, our findings illustrate a unique path of animal genome evolution that involves genome duplication, triploidy and clonal expansion.
It has been suggested that, despite their clonality and very recent emergence, marbled crayfish are successful invaders of new territories and environments9,11,18. For example, in 2007, a novel crayfish was found in the capital of Madagascar. The animals were characterized as marbled crayfish based on their morphology and DNA sequencing of a 16S mitochondrial DNA fragment9,10. However, there were no further reports on their genetic characteristics and potential spread. The availability of genome sequences for the closely related and morphologically similar species P. fallax and P. alleni allowed us to identify genomic sites with a high degree of sequence diversity among the three species, which we used to confirm the identity of the animals on Madagascar and to track their spread. Fieldwork was conducted in two phases, with a first series of collections in the central highland, followed by a more comprehensive study covering large parts of the country (Supplementary Table 4). In a pilot analysis, we sequenced polymerase chain reaction amplicons for a mitochondrial (cytochrome b; 214 bp) and a nuclear (Dnmt1; 220 bp) locus from 24 independent animals that were collected in four regions of the central highland (Fig. 5a). The results showed 100% sequence identity with the marbled crayfish reference sequence for all analysed samples and substantial sequence differences towards P. fallax and P. alleni (Fig. 5a). These findings unambiguously classify the collected animals as marbled crayfish. Additionally, our systematic field collections and morphological analyses detected large populations of marbled crayfish (Fig. 5b) in diverse freshwater habitats, such as lakes and rice fields on the central highland, as well as in swamps close to the coastline (Supplementary Table 4 and Fig. 5c).
We therefore analysed an additional 25 animals from 8 diverse regions by DNA sequencing of cytochrome b and Dnmt1 and identified only 6 mismatches among >20,000 bases analysed in total (Supplementary Table 5), which is commensurate with the normal level of polymerase chain reaction and/or sequencing errors. We estimate that between 2007 and 2017, the size of the marbled crayfish distribution area increased about 100-fold, from 10³ km² to more than 10⁵ km² (Fig. 5c), and that the current population on Madagascar comprises millions of animals.
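The error-rate argument is simple arithmetic: 6 mismatches in more than 20,000 sequenced bases corresponds to fewer than 3 mismatches per 10,000 bases (6/20,000 = 3 × 10⁻⁴). The sketch below shows how such mismatch counts could be tallied for amplicon sequences that are already aligned, without gaps, to a reference window; the function name and the gap-free assumption are hypothetical simplifications, not the procedure used in the study.

```python
from typing import List, Tuple


def count_mismatches(reference: str, reads: List[str]) -> Tuple[int, int]:
    """Return (mismatches, compared_bases) for reads pre-aligned to the reference window.

    Assumes gap-free alignments of equal length; positions with 'N' in either
    sequence are skipped as uninformative.
    """
    mismatches = compared = 0
    ref = reference.upper()
    for read in reads:
        if len(read) != len(ref):
            raise ValueError("reads must be pre-aligned to the reference window")
        for ref_base, read_base in zip(ref, read.upper()):
            if "N" in (ref_base, read_base):
                continue
            compared += 1
            mismatches += ref_base != read_base
    return mismatches, compared


mm, n = count_mismatches("ACGTACGT", ["ACGTACGT", "ACGAACGT", "ACNTACGT"])
print(mm, n, mm / n)
```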
To further characterize the marbled crayfish population on Madagascar, we used whole-genome sequencing. The sequencing of 5 animals from diverse collection sites and mapping to the marbled crayfish reference genome resulted in genome coverages ranging from 17× to 36× (Supplementary Table 3). Sequence comparisons revealed extremely low numbers of polymorphisms in the analysed marbled crayfish genomes (Fig. 6a). In marked contrast, the P. fallax genome showed a substantial number of polymorphisms towards the marbled crayfish reference genome sequence (Fig. 6a). These results provide additional, strong support for the clonality of the marbled crayfish population.
To further explore the relationship between the animals found on Madagascar and the German stocks of marbled crayfish, we obtained two additional whole-genome sequences of animals from Germany (Supplementary Table 3). Our final dataset thus consisted of 11 genome sequences from diverse sources (Supplementary Table 6). Genetic variants were extracted and filtered for single base substitutions, and mapping artefacts were eliminated by remapping sequencing reads from the genome reference individual (see Methods for details). This identified a strikingly low number of only 416 single-nucleotide variants (SNVs) across animals from highly diverse sources (Supplementary Table 7). The maximum number of non-synonymous SNVs per animal was four (Supplementary Table 7), which further illustrates the extremely low genetic complexity of the marbled crayfish population. Finally, the comparison of SNVs also provided interesting insight into the relationships of the sequenced animals. The results showed an overlapping distribution of animals from Germany and Madagascar (Fig. 6b), indicating that the Malagasy population originates from a German stock. In addition, a separate cluster was formed by two aquarium stocks that were independently founded by animals from different stores of the same German pet shop chain more than ten years ago (Fig. 6b).
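Relationships between animals can be summarized by the number of SNV sites at which each pair of animals differs; this is the kind of pairwise polymorphism information from which distance-based trees are built, as described in the Methods. The sketch below computes such counts from per-animal genotype calls. The VCF-style genotype strings (e.g. '0/0/1'), the function names and the toy data are assumptions for illustration only; unphased genotypes are normalized by sorting their alleles so that '0/1/0' and '0/0/1' compare as equal.

```python
from itertools import combinations
from typing import Dict, Tuple


def canonical(genotype: str) -> str:
    """Sort alleles of an unphased genotype so '0/1/0' and '0/0/1' compare equal."""
    return "/".join(sorted(genotype.split("/")))


def pairwise_snv_differences(
        genotypes: Dict[str, Dict[str, str]]) -> Dict[Tuple[str, str], int]:
    """genotypes[animal][site] -> unphased genotype string (e.g. '0/0/1').

    Returns, for each pair of animals, the number of shared sites at which
    their genotypes differ.
    """
    differences = {}
    for a, b in combinations(sorted(genotypes), 2):
        shared = genotypes[a].keys() & genotypes[b].keys()
        differences[(a, b)] = sum(
            canonical(genotypes[a][s]) != canonical(genotypes[b][s]) for s in shared)
    return differences


calls = {
    "animal_A": {"s1": "0/0/0", "s2": "0/0/1", "s3": "0/0/0"},
    "animal_B": {"s1": "0/0/0", "s2": "0/1/0", "s3": "0/0/1"},
    "animal_C": {"s1": "0/0/1", "s2": "0/0/0"},
}
print(pairwise_snv_differences(calls))
```

The resulting counts could then be passed to any distance-based tree or clustering method; with only a few hundred SNVs in total, most pairwise distances among marbled crayfish would be expected to be very small.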
In summary, our findings establish the marbled crayfish as a potent invader of freshwater ecosystems and demonstrate a unique genetic structure of the invasive population.
Our study establishes a genome assembly for a decapod crustacean, thus providing an important resource for further research on an economically and scientifically important group of animals. The marbled crayfish is a particularly important example due to its recent emergence, parthenogenetic mode of reproduction and invasive potential. Our results show that the marbled crayfish genome consists of two almost identical copies of one genotype and a third copy of a comparatively divergent, but still homologous genotype. These findings are consistent with the model that the marbled crayfish genome originated from an autopolyploid P. fallax gamete and the mating of two distantly related P. fallax individuals (Supplementary Fig. 3), possibly from distant populations and in captivity. Alternative hypotheses involving allopolyploid formation with P. alleni appear unlikely due to the lack of hybrid morphological features7 and the considerable genetic differences. Interestingly, triploidy and heterozygosity might provide a significant evolutionary advantage for marbled crayfish, as they could buffer the effects of deleterious genetic mutations (Muller’s ratchet33) and also increase the capacity for rapid adaptation34. Evolution of the marbled crayfish genome towards effective haploidy, as predicted by the Meselson effect35,36, was not detectable. This is probably explained by the very young evolutionary age of marbled crayfish and represents an important difference from the genomes of other asexually reproducing animals, such as Meloidogyne incognita19 and A. vaga23.
Our results unambiguously demonstrate the clonality of the marbled crayfish genome, consistent with the proposed mode of reproduction by apomictic parthenogenesis31,32. The generation of genetic diversity will be shaped by a complex set of factors, including the intrinsic mutability of the genome, environmental mutagens, genetic drift and selective pressure. All these factors are known to play an important role in the evolution of tumour genomes37,38. The analysis of mutations in marbled crayfish populations provides an opportunity to detect the generation, fixation and elimination of genetic changes with particularly high sensitivity and robustness and could therefore disentangle the specific contributions of individual factors. As such, it will be interesting to further explore marbled crayfish as a model system for clonal genome evolution in cancer39,40.
Our results also provide detailed genetic information about the marbled crayfish population on Madagascar. While increasing numbers of marbled crayfish introductions have been reported in northern countries11,12,13,14,15,16, there are currently no scientific records of active range expansions. This contrasts with the situation on Madagascar, where rapid invasive spread is not only facilitated by anthropogenic distribution, but also supported by favourable environmental conditions, such as relatively high temperatures and a dense network of freshwater habitats41,42 (Supplementary Fig. 4). The marbled crayfish population on Madagascar will have to be monitored closely to prevent any adverse impact on the unique local freshwater communities43,44. This includes seven endemic crayfish species45,46 in habitats that are adjacent to or overlapping with the current distribution range of marbled crayfish.
Finally, our results also demonstrate that the Madagascar population is genetically homogeneous and extremely similar to the oldest known stock of marbled crayfish founded in Germany in 1995. These findings support the notion that the global marbled crayfish population represents a single clone. The rapid invasion of diverse habitats is particularly noteworthy, as it appears to be independent of genetic variants, which are generally considered to be the major determinants of ecological adaptation47. This suggests that alternative mechanisms, such as stochastic epigenetic variation and/or epigenetic plasticity48,49, play a prominent role in the rapid adaptation of marbled crayfish.
Genome size estimation by flow cytometry
A detailed protocol for cell preparation can be found in the Supplementary Materials. Briefly, 100 µl aliquots of cells treated with 2 µl RNase A stock solution (50 mg ml–1) were mixed with 5 µl propidium iodide stock solution (1 mg ml–1). Samples were diluted with 100 µl 1× phosphate buffered saline and briefly mixed. Propidium-iodide-stained cells were counted and the fluorescence intensity per cell was measured. After determining the cell density of each sample (cell counts µl–1), equal volumes of stained cells from different organisms were mixed together and analysed again with the flow cytometer. The haploid genome size was obtained by dividing the median fluorescence signal per haploid genome of the crayfish cells by that of the reference cells and multiplying this ratio by the known haploid genome size of the reference organism.
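The calculation can be written out explicitly. Assuming that propidium iodide fluorescence is proportional to DNA content and that the measured cells are not replicating, the signal per haploid genome copy of the sample (median fluorescence divided by ploidy) is compared with that of the reference and scaled by the reference's known haploid genome size. In the sketch below, the ploidy values, the reference genome size and the example numbers are caller-supplied assumptions, and the function name is hypothetical.

```python
def haploid_genome_size(sample_median: float, sample_ploidy: int,
                        reference_median: float, reference_ploidy: int,
                        reference_haploid_gbp: float) -> float:
    """1N genome size (Gbp) of a sample from median propidium iodide fluorescence.

    Assumes fluorescence is proportional to DNA content and that the medians
    reflect non-replicating cells of the stated ploidy.
    """
    sample_per_copy = sample_median / sample_ploidy
    reference_per_copy = reference_median / reference_ploidy
    return sample_per_copy / reference_per_copy * reference_haploid_gbp


# Hypothetical numbers: triploid crayfish cells vs diploid reference cells.
print(haploid_genome_size(330.0, 3, 200.0, 2, reference_haploid_gbp=3.0))
```

For triploid crayfish haemocytes measured against diploid human or mouse blood cells, `sample_ploidy=3` and `reference_ploidy=2` would apply; using two references, as in the study, gives an independent check on the proportionality assumption.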
Library generation and genome sequencing
A detailed protocol for DNA isolation and quality control can be found in the Supplementary Materials. Library generation, sequencing and pre-processing of read data were performed by Eurofins MWG. DNA from one individual female from a laboratory strain (Petshop 1) was used for sequencing. Fragmentation of shotgun libraries was performed on a Covaris E210 according to the manufacturer’s instructions. Ligation products were size-selected by agarose gel electrophoresis, with a targeted insert size of 500 bp. Library generation for long jumping distance (LJD) sequencing was performed following a mate-pair library protocol provided by Illumina. The protocol was modified using adaptor-guided ligation of genomic fragments, which achieves higher accuracy. Targeted insert sizes were 3, 8, 20 and 40 kb.
In total, six shotgun libraries and six LJD libraries were produced. Clusters were generated using an Illumina cBot; the subsequent sequencing differed between the shotgun and LJD runs. Shotgun libraries were sequenced on a HiSeq 2500 platform (HiSeq Control Software 220.127.116.11) with 2× 150 bp paired-end reads. LJD libraries were sequenced on an Illumina HiSeq 2000 with 2× 100 bp paired-end reads. Additionally, one MiSeq library with a 900-bp insert size was generated in the German Cancer Research Center Genomics and Proteomics Core Facility following a standard MiSeq protocol and sequenced with a 300-bp read length.
Genome assembly and annotation
A detailed description can be found in the Supplementary Materials.
A set of 138 single-copy orthologues was extracted from recently published arthropod genomes. To estimate genetic divergence, a multiple sequence alignment was calculated with ClustalW (2.1)50. Alignments were then reduced to conserved blocks using Gblocks (0.91b)51. Phylogenetic tree construction was based on maximum likelihood estimation from PhyML (20120412)52 with default parameters. Branch support was calculated using a Shimodaira–Hasegawa-like procedure.
Analysis of genetic variants
Sequencing data of all individuals (marbled crayfish, P. fallax and P. alleni) were mapped to the marbled crayfish reference genome using Bowtie2 (2.2.6)53. Subsequently, the subset of sequences longer than 10 kb was extracted, resulting in a total of 6.7 × 10⁸ callable sites comprising about 19% of the genome assembly. Variant calling was performed using Freebayes (0.9.21-g7dd41db). Potential mapping-error sites were filtered out by remapping shotgun reads and eliminating sites with no reference allele observations. Variants were categorized as biallelic or triallelic according to the number of distinct alleles observed at the site. Additionally, polymorphisms were defined as SNVs relative to the reference genome in one allele (marbled crayfish) or in up to two alleles (P. fallax and P. alleni). Before polymorphism calling, sites were restricted to non-heterozygous loci, as determined by variant calling in the genome individual.
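The site categories described above can be illustrated with per-allele read counts. The sketch below flags sites where the reference allele is never observed (the mapping-error filter described above) and otherwise labels a site by the number of alleles with adequate read support. The `min_reads` threshold, the function name and the labels are hypothetical; the study worked from Freebayes output rather than from raw counts like these.

```python
from typing import Dict


def classify_site(ref_base: str, allele_counts: Dict[str, int], min_reads: int = 2) -> str:
    """Label a site from per-allele read counts.

    Sites where the reference allele is never observed are flagged as likely
    mapping errors; otherwise the site is labelled by how many alleles
    (the reference plus alternatives with >= min_reads reads) are present.
    """
    if allele_counts.get(ref_base, 0) == 0:
        return "filtered"
    alleles = {ref_base} | {a for a, n in allele_counts.items() if n >= min_reads}
    return {1: "invariant", 2: "biallelic", 3: "triallelic"}.get(len(alleles), "multiallelic")


print(classify_site("A", {"A": 20, "G": 10}))          # biallelic
print(classify_site("A", {"A": 10, "G": 8, "T": 7}))   # triallelic
print(classify_site("A", {"G": 30}))                   # filtered
```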
Population genetic mutations were quantified by mapping read data to the reference genome, as described above. Polymorphic sites were further restricted, based on data from the reference strain, to positions with a minimum reference coverage of 10 and a maximum reference coverage of 200. Furthermore, each site was required to have a valid genotype (as determined by Freebayes) in all samples. Samples were filtered for at most one heterozygous substitution relative to the reference genome. For each site, a minimum coverage of eight was required for each sample to account for differences in sequencing yield. Phylogenetic trees were generated based on pairwise polymorphism information between different Procambarus species and between different marbled crayfish animals.
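The site-level filters listed above translate directly into a predicate. The sketch below applies the reference-coverage window (10–200), the requirement of a valid genotype in every sample and the per-sample minimum coverage of eight; the per-sample limit on heterozygous substitutions is not modelled. The data layout (dictionaries keyed by sample name, `None` for a missing genotype) and the function name are assumptions.

```python
from typing import Dict, Optional


def site_passes(reference_coverage: int,
                genotypes: Dict[str, Optional[str]],
                coverages: Dict[str, int],
                min_ref_cov: int = 10,
                max_ref_cov: int = 200,
                min_sample_cov: int = 8) -> bool:
    """Apply reference-coverage, valid-genotype and per-sample coverage filters to one site."""
    # Coverage window in the reference (genome) individual.
    if not min_ref_cov <= reference_coverage <= max_ref_cov:
        return False
    for sample, genotype in genotypes.items():
        # Every sample must carry a valid genotype call ...
        if genotype is None:
            return False
        # ... supported by at least min_sample_cov reads.
        if coverages.get(sample, 0) < min_sample_cov:
            return False
    return True


print(site_passes(50, {"mc1": "0/0/0", "pf1": "0/1"}, {"mc1": 20, "pf1": 9}))  # True
```

The upper coverage bound is a common guard against collapsed repeats, where reads from several genomic copies pile up on one assembled locus.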
Determination of heterozygosity levels
High-coverage sequence data (~72×) from the marbled crayfish individual used for genome assembly were used to estimate heterozygosity levels. Non-normalized read-mapping data, as described in the variant analysis, were extracted from sequences of ≥10 kb. Variant calling was performed using Freebayes with the parameter settings described in the Supplementary Materials. Heterozygous positions were extracted after filtering for a variant quality of at least 30 and a coverage of at least 15. Heterozygosity information for other genomes, estimated by a similar approach, was extracted from publications26,27,54. The heterozygosity level of P. fallax was estimated by assembling high-coverage reads into a raw contig assembly. Because the total read coverage was lower than for marbled crayfish, the coverage cutoff was set to 10. Heterozygosity levels were calculated as the total number of heterozygous single-nucleotide polymorphisms divided by the actual number of nucleotides (excluding ambiguous bases).
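The heterozygosity calculation described above can be sketched as follows: heterozygous SNP calls passing the quality (≥30) and coverage (≥15) thresholds are counted and divided by the number of unambiguous (A, C, G or T) bases in the analysed sequences. The tuple layout for variant calls and the function name are hypothetical. For scale, a heterozygosity of 0.53% corresponds to roughly one heterozygous site per 190 bases.

```python
from typing import Iterable, Tuple

# One variant call: (is_heterozygous_snp, variant_quality, read_coverage)
Call = Tuple[bool, float, int]


def heterozygosity(calls: Iterable[Call], sequences: Iterable[str],
                   min_qual: float = 30.0, min_cov: int = 15) -> float:
    """Heterozygous SNPs passing filters divided by unambiguous (ACGT) bases."""
    het = sum(1 for is_het, qual, cov in calls
              if is_het and qual >= min_qual and cov >= min_cov)
    unambiguous = sum(1 for seq in sequences for base in seq.upper() if base in "ACGT")
    return het / unambiguous


calls = [(True, 40.0, 20), (True, 25.0, 20), (True, 40.0, 10),
         (False, 50.0, 30), (True, 30.0, 15)]
print(heterozygosity(calls, ["ACGTNNACGT", "acgtac"]))
```

For the lower-coverage P. fallax data, the same function would be called with `min_cov=10`.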
Fieldwork was conducted on Madagascar between March 2016 and March 2017 and covered 15 regions, 33 sites and 88 sampling stations. Sites were selected based on biogeographical parameters with a wide range of different habitats, as well as feedback from awareness campaigns and interactions with local residents. Two to five sampling stations were chosen per site, according to the availability of water access and local collaborators, such as fishermen or crayfish collectors. Animals were caught in lakes, streams, ponds, rice field channels and swamps for one to three hours during the morning using traditional fishing tools, such as creels (50 cm × 30 cm × 30 cm) called ‘tandroho’ and/or nets called ‘harato’. Ecological characteristics of the habitats and physicochemical parameters of the water were noted. Collected animals (up to 100 per catch) were morphologically analysed to preliminarily confirm their identity. Tissue samples from three to five animals per sampling station were preserved in ethanol for genotyping.
Crayfish genomic DNA was isolated and purified from 100 mg abdominal musculature using a Tissue Ruptor and the DNeasy Blood and Cell Culture Kit (both Qiagen), following the manufacturer’s instructions. For genotyping, two fragments were amplified: a 274-bp fragment of the mitochondrial cytochrome b gene, using primers 5′-CAG GAC GTG CTC CGA TTC ATG-3′ and 3′-GAC CCA GAT AAC TTC ATC CCA G-5′, and a 334-bp fragment of the nuclear Dnmt1 gene, using primers 5′-GCT TTC TGG TCT CGT ATG GTG-3′ and 3′-CTG CAC ACA GCC TAA GAT GC-5′. Polymerase chain reactions were performed in thin-wall tubes using a Bio-Rad Peltier Thermal Cycler. Five microlitres of genomic DNA were added to a reaction mixture (25 µl final volume) containing 2 µl (10 µmol) of reverse and forward primers, 1 µl of deoxynucleoside triphosphates (10 mmol), 0.5 µl of FireTaq blue polymerase (1 U µl–1; Steinbrenner), 2.5 µl of 10× Reaction Buffer and 14 µl of water. Samples were preheated at 96 °C for 3 min, followed by amplification under the following conditions: denaturation at 96 °C for 30 s, annealing at 57 °C for 30 s and elongation at 72 °C for 30 s. A total of 30 cycles were performed, followed by a final elongation step at 72 °C for 3 min. The resulting polymerase chain reaction amplicons were analysed and purified by agarose gel electrophoresis and cloned using the TOPO TA Cloning Kit (Invitrogen) according to the manufacturer’s instructions. Purified plasmids were sequenced by GATC Biotech and the sequences were aligned in FinchTV version 1.4.0.
Life Sciences Reporting Summary
Further information on experimental design is available in the Life Sciences Reporting Summary.
All sequencing data have been deposited as a National Center for Biotechnology Information BioProject (accession number PRJNA356499).
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The authors thank S. Wolf and the German Cancer Research Center Genomics and Proteomics Core Facility for whole-genome sequencing, S. Hagemann and S. Tönges for samples, and G. Raddatz, G. Vogt and J. Jones for helpful discussions. The authors also thank O. Simakov, C. Städele and A. Vidal-Gadea for critical comments on the manuscript, and K.F. Lyko for graphical support. This study was supported by the Ministry of Ecology, Environment and Forest of Antananarivo, Madagascar (research permit 262/16/MEEF/SG/DGF/DSAP/SCB.Re). Grant support was provided by Deutsche Forschungsgemeinschaft to W.S. (STE 937/9-1).