The chromosome-level genome of Cherax quadricarinatus

Red claw crayfish (Cherax quadricarinatus) is an aquatic crustacean with considerable potential for commercial culture and an ideal model for studying the mechanisms of sex determination. To provide better genomic resources, we assembled a chromosome-level genome with a size of 5.26 Gb and a contig N50 of 144.33 kb. Nearly 90% of the sequences were anchored to 100 chromosomes, making this the high-quality crustacean genome with the largest number of chromosomes reported to date. The genome contains 78.69% repeat sequences and 20,460 protein-coding genes, of which 82.40% were functionally annotated. This chromosome-scale genome should be a valuable reference for the assembly of other complex genomes and for studies of crustacean evolution.


Background & Summary
Crustaceans are a diverse and ancient group of arthropods 1 . They are not only essential components of marine and freshwater environments but also interesting models for evolutionary and developmental biology. However, owing to their high complexity, assembling complete and accurate crustacean genomes is difficult, let alone genomes at the chromosome level 2 .
Cherax quadricarinatus, also known as the red claw crayfish, is a large tropical freshwater crustacean of significant commercial interest for global aquaculture 3 . Intersexuality appears relatively widespread among gonochoristic crustaceans and has been reported in several crayfish species 4 . In red claw crayfish, intersex individuals undergo a dramatic morphological and physiological sex shift, which makes the species a fascinating model for studying the mechanisms underlying sex determination and differentiation in crustaceans. Although a genome of this species has been reported previously, that assembly is incomplete and fragmented (assembled genome size, 3.24 Gb; contig N50, 33 kb), which still limits in-depth studies 5 . Here, we de novo assembled a chromosome-level genome of red claw crayfish with an assembled size of 5.26 Gb and a contig N50 of 144,316 bp. This high-quality genome will enrich the genomic resources for crustaceans and provide basic data for further genome-wide selective breeding.

Methods
Sample collection and genomic sequencing. All samples used in this study were from a healthy adult male red claw crayfish farmed at Honghai Co., LTD., Zhejiang, China. Fresh muscle and haemolymph were used for whole-genome sequencing and Hi-C sequencing, respectively. Seven tissues (muscle, intestine, eyestalk, hepatopancreas, gills, stomach, and antennal gland) were used for transcriptomic sequencing. Isolation of DNA/RNA, construction of libraries and genomic sequencing were carried out according to protocols from https://www.protocols.io/widgets/doi?uri=dx.doi.org/10.17504/protocols.io.bs8inhue.
For whole-genome sequencing (WGS), the genomic DNA was sonicated into ~250 bp fragments, which were used to build a 100 bp paired-end (PE100) sequencing library. The library was then sequenced on the BGISEQ-500 platform, generating 280.51 Gb of raw data, which covered ~58X of the estimated genome (Table 1).
www.nature.com/scientificdata www.nature.com/scientificdata/
For PacBio Continuous Long Reads (CLR) sequencing, seven sequencing libraries were constructed using ~20 kb high-quality DNA fragments. All libraries were sequenced on the PacBio Sequel II platform, which generated 568.55 Gb of raw data with a read N50 of 17,393 bp (Table 1).
For construction of the Hi-C libraries, DNA was fixed with formaldehyde solution, isolated from nuclei, and digested with MboI; the digested fragments were then labeled with biotinylated nucleotides. Eight libraries were sequenced on the BGISEQ-500 platform, producing a total of 542.71 Gb of raw data, which covered ~105X of the estimated genome (Table 1).
Seven RNA libraries were constructed according to the protocols and sequenced on the BGISEQ-500 platform, generating a total of 136.96 Gb of raw data (Table 1).
Genome survey. Raw PE100 reads were first filtered by SOAPnuke (v1.6.5) 6 with the parameters "-M 1 -d -A 0.4 -n 0.05 -l 10 -q 0.4 -Q 2 -G -5 0", and 240 Gb of clean data were retained (Table 1). Jellyfish (v2.2.6) 7 was then used to count 17-mers, and GenomeScope 8 was used to estimate the genome size, heterozygosity, and repeat content at 4.74 Gb, 0.86% and 85.6%, respectively (Fig. 1a).
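The idea behind a k-mer based genome survey can be illustrated with a minimal Python sketch. It is not the Jellyfish or GenomeScope implementation: GenomeScope fits a mixture model to the k-mer spectrum, whereas this sketch only applies the naive rule "total k-mers divided by the depth of the main peak". The function names and the `min_depth` cutoff are hypothetical, and for a heterozygous genome the tallest peak may be the heterozygous one, so the result is only a rough guide.

```python
from collections import Counter


def count_kmers(reads, k=17):
    """Count every k-mer in a list of read strings (toy stand-in for a k-mer counter)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_histogram(counts):
    """Map multiplicity -> number of distinct k-mers seen with that multiplicity."""
    return Counter(counts.values())


def estimate_genome_size(histogram, min_depth=2):
    """Naive estimate: total k-mers / main-peak depth.

    Multiplicities below min_depth are ignored because they are assumed
    (hypothetically) to be dominated by sequencing errors.
    """
    usable = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(usable, key=usable.get)  # depth with the most distinct k-mers
    total_kmers = sum(depth * n for depth, n in usable.items())
    return total_kmers / peak_depth
```

In practice the histogram would come from a dedicated k-mer counter run on the clean reads, and the model-based estimate should be preferred over this back-of-the-envelope value.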
Chromosome karyotyping. The number and length of chromosomes in red claw crayfish were obtained by a karyotyping experiment using 15 adult males, according to the published pipeline 9 . Chromosomes were measured using the Adobe Photoshop CS6 measurement tools at a magnification of 600×. The chromosome pairs were classified following the nomenclature of Levan (1964) 10 by the arm ratio r (long arm/short arm) into m = metacentric (r = 1-1.7), sm = submetacentric (r = 1.7-3), st = subtelocentric (r = 3-7), and a = acrocentric (r > 7). The karyotype formula of the male red claw crayfish is n = 100 = 36 m + 33 sm + 14 st + 17 t (Fig. 1b), and the arm-length data are listed in Supplementary Table 1.
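The arm-ratio classification described above can be sketched in a few lines of Python. This is an illustration only, not the authors' measurement pipeline: the function names are hypothetical, and assigning the boundary values (1.7, 3 and 7) to the lower class is an assumption, since the text gives the ranges without stating which side each boundary belongs to. The last class is labeled "a" as in the classification list; the karyotype formula in the text appears to write the same class as "t".

```python
from collections import Counter


def classify_chromosome(long_arm, short_arm):
    """Return the centromere class from the arm ratio r = long arm / short arm."""
    if long_arm <= 0 or short_arm < 0 or long_arm < short_arm:
        raise ValueError("expected long_arm >= short_arm >= 0 and long_arm > 0")
    if short_arm == 0:
        return "a"  # no measurable short arm: treated here as r > 7
    r = long_arm / short_arm
    if r <= 1.7:
        return "m"   # metacentric
    if r <= 3:
        return "sm"  # submetacentric
    if r <= 7:
        return "st"  # subtelocentric
    return "a"       # acrocentric (r > 7)


def karyotype_formula(arm_pairs):
    """Tally classes over (long_arm, short_arm) pairs, e.g. {'m': 2, 'sm': 1}."""
    return Counter(classify_chromosome(long_arm, short_arm)
                   for long_arm, short_arm in arm_pairs)
```

For example, `karyotype_formula([(1.0, 1.0), (2.0, 1.0), (8.0, 1.0)])` tallies one chromosome in each of the m, sm and a classes.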
Genome assembly. Reads longer than 5 kb were retained from the raw PacBio CLR reads and corrected by Canu (v1.5) 11 ; the draft genome was then assembled from the corrected reads by Wtdbg2 12 with the parameters "-p 21 -E 2 -S 4 -s 0.05 -L 5000 -X 40". The draft genome was further polished by Pilon 13 using clean PE100 reads with default parameters, giving an assembly with a size of 5.26 Gb and a contig N50 of 144.33 kb (Table 2).
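For readers unfamiliar with the two length-based quantities used here, the following Python sketch shows a read-length filter and the standard N50 definition (the length of the sequence at which the cumulative sum of lengths, sorted from longest to shortest, first reaches half of the total). The function names are hypothetical and the sketch operates on plain lists of lengths rather than on sequence files.

```python
def keep_long_reads(read_lengths, min_length=5000):
    """Keep only reads strictly longer than min_length (5 kb in the text)."""
    return [length for length in read_lengths if length > min_length]


def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
```

The same `n50` function applies equally to read lengths, contig lengths or scaffold lengths; only the input list changes.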
Table 2. Summary of the genome assembly of red claw crayfish.
Based on the polished genome, 84.34 Gb of Hi-C data passed quality control by HiC-Pro (v2.8.0) 14 and were then used for chromosome reconstruction with Juicer (v1.5) 15 and 3D-DNA (3D de novo assembly) 16 . To obtain more precise chromosomes, we manually adjusted the assembly according to the chromosomal interaction heatmap in Juicebox 17 (Fig. 2). Finally, a total of 4.70 Gb of sequence was anchored to 100 chromosomes, the longest being 142.95 Mb and the shortest 18.54 Mb (Supplementary Table 2). Linear regression of the karyotyping and assembly results showed a high correlation (R² = 0.9874) between the physical length and the sequence length of the 100 chromosomes (Fig. 1c), supporting this assembly as the high-quality crustacean genome with the largest number of chromosomes reported to date.
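The comparison between karyotype-measured lengths and assembled lengths rests on an ordinary least-squares fit and its coefficient of determination. A minimal Python sketch of that calculation is given below; the function name is hypothetical, the inputs are assumed to be two equal-length lists (physical length and sequence length per chromosome), and how the chromosomes are paired between the two data sets is not specified here.

```python
def linear_fit_r2(x, y):
    """Least-squares fit y = slope * x + intercept; returns (slope, intercept, r2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain at least two distinct values")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy ** 2 / (sxx * syy)  # for simple linear regression, R^2 = r^2
    return slope, intercept, r2
```

An R² close to 1 indicates that the relative chromosome sizes in the assembly track the relative sizes measured under the microscope.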
These genes were then functionally annotated by BLAST searches against the NCBI non-redundant protein (NR), TrEMBL, Gene Ontology (GO), SwissProt, and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Finally, 16,859 genes, accounting for 82.40% of the total, were successfully annotated in at least one public functional database (Table 7).
tRNAscan-SE 30 was used to annotate the tRNAs based on annotated features such as isotype, anticodon, and tRNAscan-SE bit score. The rRNA sequences were annotated using homologous references from closely related species. MiRNAs and snRNAs were predicted by INFERNAL 31 based on the covariance models of the Rfam database. In total, 6,954 non-coding RNAs were predicted, including 25 miRNA, 1,448 rRNA, 5,023 tRNA and 458 snRNA genes (Table 8).

Data Records
The genomic WGS sequencing data were deposited in the SRA at NCBI under accessions SRR22412649 32 , SRR22412641 33 .
The final chromosome assembly was deposited in GenBank at NCBI under accession JAPQEV000000000 50 . The genome annotation file is available on figshare 51 .
Table 7. Summary of gene annotation in red claw crayfish.

Technical Validation
The quality and quantity of total DNA were checked using agarose gel electrophoresis, and the concentration was determined using a NanoDrop 2000 spectrophotometer. RNA integrity was evaluated using an Agilent 2100 Bioanalyzer (Agilent Technologies, CA, USA). The sample used in our study had an RNA integrity number (RIN) larger than 8. To further assess the quality of the genome, clean PE100 reads were aligned back to the genome by BWA 52 , giving a mapping rate as high as 99.03%. The depth and GC content were also analyzed statistically within a 10 kb sliding window. Moreover, 85.7% complete and 6.2% fragmented BUSCOs 53 (Benchmarking Universal Single-Copy Orthologs, v4.0) in the arthropoda_odb9 database were identified, a noticeable improvement over the previous version (81.3%).
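The windowed GC-content check can be sketched as follows. This is a minimal illustration, not the script used in the study: the function name is hypothetical, the default step equal to the window size (non-overlapping windows) is an assumption because the text does not give the step, and windows that contain no A/C/G/T bases (for example runs of N in gaps) are skipped.

```python
def gc_content_windows(sequence, window=10_000, step=None):
    """Return (start, gc_fraction) for each window along a sequence.

    GC fraction is computed over called bases (A, C, G, T) only.
    """
    if step is None:
        step = window  # assumption: non-overlapping windows
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    seq = sequence.upper()
    results = []
    for start in range(0, len(seq), step):
        chunk = seq[start:start + window]
        called = sum(chunk.count(base) for base in "ACGT")
        if called == 0:
            continue  # window is entirely ambiguous bases
        gc = chunk.count("G") + chunk.count("C")
        results.append((start, gc / called))
    return results
```

Plotting these fractions against per-window read depth is a common way to spot contamination or collapsed repeats, which typically show up as clusters away from the main cloud of points.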

Code availability
No specific code was developed in this work. The parameters of all commands and pipelines used for data processing are described in the Methods section. Where no detailed parameters are given for a piece of software, the default parameters were used, as suggested by the developer.