A chromosome-level genome assembly of the Asian arowana, Scleropages formosus

Li, Jia; Bian, Chao; Hu, Yinchang; Mu, Xidong; Shen, Xueyan; Ravi, Vydianathan; Kuznetsova, Inna S.; Sun, Ying; You, Xinxin; Qiu, Ying; Zhang, Xinhui; Yu, Hui; Huang, Yu; Xu, Pao; Gu, Ruobo; Xu, Junmin; Orbán, László; Venkatesh, Byrappa; Shi, Qiong

doi:10.1038/sdata.2016.105

Download PDF

Data Descriptor
Open access
Published: 06 December 2016

A chromosome-level genome assembly of the Asian arowana, Scleropages formosus

Jia Li¹^na1,
Chao Bian ORCID: orcid.org/0000-0001-9904-721X¹^na1,
Yinchang Hu²^na1,
Xidong Mu²^na1,
Xueyan Shen³^na1,
Vydianathan Ravi ORCID: orcid.org/0000-0003-0807-7697⁴^na1,
Inna S. Kuznetsova^3,5,
Ying Sun¹,
Xinxin You¹,
Ying Qiu¹,
Xinhui Zhang¹,
Hui Yu¹,
Yu Huang¹,
Pao Xu⁶,
Ruobo Gu^6,7,
Junmin Xu^1,7,
László Orbán³,
Byrappa Venkatesh⁴ &
…
Qiong Shi^1,7,8

Scientific Data volume 3, Article number: 160105 (2016) Cite this article

2940 Accesses
11 Citations
4 Altmetric
Metrics details

Subjects

Abstract

Asian arowana (Scleropages formosus), an ancient teleost belonging to the Order Osteoglossomorpha, has been a valuable ornamental fish with some varieties. However, its biological studies and breeding germplasm have been remarkably limited by the lack of a reference genome. To solve these problems, here we report high-quality genome sequences of three common varieties of Asian arowana (the golden, red and green arowana). We firstly generated a chromosome-level genome assembly of the golden arowana, on basis of the genetic linkage map constructed with the restriction site-associated DNA sequencing (RAD-seq). In addition, we obtained draft genome assemblies of the red and green varieties. Finally, we annotated 22,016, 21,256 and 21,524 protein-coding genes in the genome assemblies of golden, red and green varieties respectively. Our data were deposited in publicly accessible repositories to promote biological research and molecular breeding of Asian arowana.

Design Type(s)	individual genetic characteristics comparison design • strain comparison design
Measurement Type(s)	genome assembly • chromosome-level assembly
Technology Type(s)	whole genome sequencing • restriction site-associated DNA sequencing
Factor Type(s)	subspecies • tissue
Sample Characteristic(s)	Scleropages formosus

Machine-accessible metadata file describing the reported data (ISA-Tab format)

Chromosome-scale assembly and high-density genetic map of the yellow drum, Nibea albiflora

Article Open access 15 October 2021

The first high-quality genome assembly and annotation of Anthocidaris crassispina

Article Open access 10 August 2024

A chromosome-level genome assembly and evolutionary analysis of Coregonus ussuriensis Berg

Article Open access 18 July 2024

Background & Summary

Asian arowana or Asian dragonfish (Scleropages formosus; Osteoglossidae, Osteoglossiformes, Osteoglossomorpha) represents an ancient lineage of teleost fish. The phylum Osteoglossomorpha is one of the basal branching lineages of teleosts with fossil records from the late Jurassic¹. Because of its bright and shiny body colour, arowana has been considered as one of the most expensive ornamental fishes around the world². At least five naturally occurring colour varieties of arowana have been identified. Among them, the red, golden and green varieties are the most popular. Within the three varieties, the red arowana has the highest commercial value, whilst the green one has the lowest price. In recent times the population size of arowana in its native habitat has declined due to overfishing. As a result, Asian arowana has been listed as an endangered species by the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES) Appendix I since 1980 (ref. 3). It therefore raises people’s awareness about this endangered ornamental fish to some extent.

The Asian arowana, as a well-known representative of the Osteoglossiformes, possesses many primitive characters such as toothed tongue bone. On the other hand, arowana has evolved several derived traits such as mouth brooding and air-breathing function of the swim bladder. It is therefore considered as an important model for exploration of the teleost diversity. Furthermore, the genetic basis of body colour variations and the sex-determining mechanisms of arowana remain largely unknown.

Our previous study published in Scientific Reports⁴ has reported a primary genome assembly of a female golden arowana and the draft genomes of the green and red varieties. The genomic and transcriptome comparisons among these three varieties have interpreted the possible molecular mechanism of colour variations. In addition, a potential sex chromosome was identified, revealing a solid clue for the sex-determination of arowana.

To construct these genomes, we extracted genomic DNAs from the three varieties of arowana and subsequently sequenced up to 100-fold coverage using the Illumina HiSeq2000 platform. After filtering low quality reads, we applied the SOAPdenovo2.2 (ref. 5) to assemble the clean reads of the three varieties separately. We obtained scaffold N50 values of 5.96 Mb (golden arowana), 1.63 Mb (red arowana) and 1.85 Mb (green arowana) respectively. The assembled genome sizes are 779, 753 and 759 Mb respectively, which are consistent with the estimates by k-mer analyses (Table 1). By using de novo-assembled transcipts and the CEGMA method⁶, we confirmed good completeness and accuracy of the three assembled genomes.

Table 1 Overview of genome assembly and annotation for the three varieties of Asian arowana.

Full size table

To construct chromosome-level assembly of the golden arowana, we performed restriction site-associated DNA sequencing (RAD-seq), on basis of 94 F2 individuals from red grad 1 and Malaysian golden crossback, to generate a high-density genetic linkage map. Ultimately, we identified a total of 22,881 single-nucleotide polymorphisms (SNPs), in which 5,740 SNPs were clustered into 25 linkage groups (Fig. 1). The genetic linkage map spanned a genetic distance of 3,240 cM. According to the refined SNPs and their corresponding scaffolds, we calculated the size of the chromosomes-level assembly up to 683 Mb, which accounts for 87.7% of the achieved genome assembly. The high-quality chromosome-level assembly of golden arowana, along with the high-density genetic linkage map, will provide a valuable resource for further comparative genomic studies.

**Figure 1: Construction of the 25 linkage groups (or pseudo-chromosomes) of golden arowana based on RAD-seq.**

Transposable elements (TEs) in the three assembled genomes were examined. We observed that TEs account for 27, 27 and 28% of the genome assemblies in golden, red and green varieties, respectively (Table 1). Multiple methods for gene annotation, including ab initio, transciptome and homology based gene prediction, were employed to generate refined gene sets, which contain 22,016 (golden arowana), 21,256 (red arowana) and 21,524 (green arowana) protein-coding genes, respectively.

The availability of a high-quality reference genome of the golden arowana provides a good opportunity for biological characterization and molecular breeding of arowana.

Methods

These methods are expanded from detailed descriptions in our previous work⁴.

Fish sample preparation

Experiments in China were approved by the Institutional Review Board on Bioethics and Biosafety of BGI and performed according to related guidelines. Animal experiments in Singapore were approved by Institutional Animal Care and Use Committee (approval ID: TLL(F)-10-003) of Temasek Life Sciences Laboratory and performed in accordance with its guidelines.

Fish samples for Illumina genome sequencing

Second filial generation (F2) individual samples of arowana, including one golden arowana (within 2 years old), one red arowana (within one-year-old) and one green arowana (within one-year-old), were obtained from Pearl River Fisheries Research Institute, Chinese Academy of Fishery Sciences, China.

Fish samples for RAD sequencing

The Qian Hu fish farm (Singapore) obtained F1 hybrid individuals that originated from crossing two unrelated and genetically divergent founder (F0) Asian arowana grandparents (Red grade 1×Malaysian golden varieties). Previously, we have generated two mapping families by crossing two pairs of these F1 hybrid brooders. The fostering methods of arowana are described in detail in our related work⁴. We used 94 F2 progenies and their parents for RAD sequencing to construct the high-density genetic map.

Library construction, sequencing and genome assembly

Library construction and sequencing

High-quality genomic DNAs of three arowana varieties were extracted from the mixture of three tissues (muscle, skin and liver) independently using Puregene Tissue Core Kit A (Qiagen, MD, USA) for constructing libraries with different insert-sizes (170 bp to 40 kb; see more details in Table 2). Overall, we generated 16 pair-end libraries (8 for the golden arowana, 4 for the red arowana and 4 for the green variety, respectively) with the standard operating procedure provided by Illumina (San Diego, USA). Pair-end sequencing with the whole genome sequencing (WGS) strategy was performed on the Illumina HiSeq2000 platform using the standard operating procedure.

Table 2 Construction of the sequencing libraries.

Full size table

Raw data filtering

Raw reads were subjected to quality filtering (Raw Data Clean Procedure). Details for the disciplines of filtering were described in our related work⁴. The SOAPfilter v2.2 software (http://soap.genomics.org.cn/index.html) with the optimized parameters (-y -p -g 1 -o clean -M 2 -f 0) was chosen to remove low-quality bases and PCR-replicates as well as adaptor sequences. Finally, we obtained 74.07 Gb (golden arowana), 75.60 Gb (red arowana) and 60.40 Gb (green arowana) of clean reads.

k-mer analysis

A k-mer indicates a K-bp length nucleotide segment of sequencing reads. A raw sequencing read with a total length of L bp contains (L-K+1) k-mers⁷. Details of the k-mer analysis were provided in our related work⁴. Finally, we estimated that the genome sizes of the golden, red and green varieties are 822, 949 and 897 Mb, respectively.

Genome assembly

The filtered reads were assembled using SOAPdenovo2 v2.04.4 (ref. 5) software with optimized parameters (pregraph -K 25 -d 1; contig -M 1; scaff -F -b 1.5 -p16) to generate contigs and original scaffolds. The generated genome assemblies span approximately 779, 753 and 759 Mb for the golden, red and green varieties, respectively. Finally, the scaffold N50 values reached 5.96, 1.63 and 1.85 Mb for the golden, red and green varieties respectively (Table 1).

RAD sequencing, genetic map construction, and chromosome-level assembly

RAD sequencing

High-quality genomic DNAs were extracted from the scales and/or fin clips of the 94 progeny individuals and their parents for RAD sequencing by using Mag Attract HMW DNA Kit (Qiagen, MD, USA). The DNAs were subsequently digested with the restriction endonuclease EcoRI and divided into 3 RAD libraries⁸. Sequencing was performed by an Illumina HisSeq2000 platform. Finally, 72.8 Gb of reads with 101-bp length (the average depth is 1×) were obtained.

RAD SNP calling

After performing the above-mentioned Raw Data Clean Procedure to filter the adaptor sequences and remove low-quality reads, we re-aligned the clean reads onto the golden assembly (reference) using SOAP2 v2.21 (ref. 9) software with optimized parameters (-m 100 -×888 -s 35 -l 32 -v 3 -p4). The SNPs were called using the Samtools¹⁰ in each individual. Those low-quality SNPs with the missing rates higher than 30% were discarded.

Genetic map construction

SNPs showing significant Mendelian segregation distortion (χ2 test, P<0.01, d.f.=1) were discarded. Then, the filtered SNPs were uploaded into JoinMap v4.1 (ref. 11) and analyzed with default parameters. The pairwise recombination estimations and logarithm of odds (LOD) scores of all SNPs were calculated, and SNPs pairs were then clustered into linkage groups at a LOD threshold of 10.0. Map distances (cM) were counted using the Regression mapping algorithm with the Kosambi function. All the SNP markers were clustered into 25 linkage groups (Fig. 1), which is consistent with our previously reported chromosome karyotype (2n=48 and one additional W chromosome)¹².

Anchoring the genome assembly to the linkage groups

A total of 5,740 SNP markers which located on 194 scaffolds were used for chromosomal-level assembly. The relative order between anchored scaffolds on each chromosome was determined by the following disciplines. For those scaffolds with the number of SNPs more than two, the two SNPs with the best quality on each scaffold were chosen to determine the order and the orientation. However, the orientation of those scaffolds with only one SNP was not ordered due to the lack of enough markers. Subsequently these ordered scaffolds were directly anchored to the chromosomes. Finally, 87.65% of the assembled genome sequences were able to be anchored onto the 25 pairs of chromosomes (Fig. 1, Table 3).

Table 3 statistics of the anchored scaffolds for the golden variety.

Full size table

Analysis of repetitive sequences in the draft assembly

Firstly, a de novo repeat library was built using the RepeatModeller v1.04 (http://www.repeatmasker.org/RepeatModeler.html) and LTR_FINDER¹³ with default parameters. Subsequently, the RepeatMasker v3.2.9 (ref. 14) was used to align our sequences with the Repbase TE v14.04 (ref. 15) and the de novo repeat libraries to recognize transposable elements (TEs). The Tandem Repeat Finder v4.04 (ref. 16) with optimized parameters (Match=2, Mismatch=7, Delta=7, PM=80, PI=10, Minscore=50, and MaxPerid=2000) was performed to annotate tandem repeats. Furthermore, the RepeatProteinMask software v3.2.2 (http://www.repeatmasker.org) was conducted to identify TE relevant proteins in our generated assemblies. Finally, the TEs were estimated to account for 27.34, 27.93 and 28.04% of the golden, red and green arowanan genomes, respectively (Table 1).

Genome annotation

In brief, we applied three independent methods to predict gene sets.

Homology annotation

We used the protein sequences from eight reported genomes (Table 4) to detect homology-based genes. All the protein sequences were downloaded from Ensembl (release 75). An all-against-all TblastN search was performed with an e-value of 10⁻⁵. Best hits in each of the analyzed genome were searched, and the potential gene structures were then predicted by using Genewise2.2.0 (ref. 17). Those genes with length less than 150 bp, or prematurely terminated or frame-shifted, were discarded.

Table 4 Versions of genome assemblies of the eight vertebrate species used for homology annotation.

Full size table

De novo annotation

We implemented de novo similarity-based gene prediction using AUGUSTUS v.2.5 (ref. 18) with optimized parameters (pre-trained with 1,500 randomly selected genes from the homology annotation set). To avoid pseudogene annotation, the repetitive regions of three arowana varieties were masked as ‘N’ seqeunces. Then AUGUSTUS v2.5 and GENSCAN v1.0 (ref. 19) were performed on the three-draft repeat-mask genome sequences. Subsequently, three gene sets were filtered using the same method applied for the homology annotation.

RNA-seq annotation

RNAs were isolated independently from the skin tissues of three arowana varieties for RNA-seq. RNA-seq libraries were constructed using the Illumina mRNA-Seq Prep Kit (San Diego, CA, USA). Subsequent RNA sequencing was performed using the Illumina sequencing platform. Finally, 8.4 Gb of data were generated. After discarding the low quality reads, we used the Tophat1.2 (ref. 20) software to align filtered reads onto the corresponding genome sequences separately. Then the Cufflink²¹ software was employed to determine potential gene structures of the achieved alignments of Tophat.

Integration of annotation results

After merging all results generated from the above three methods, we applied GLEAN²² to obtain a comprehensive and non-redundant gene set. Expression levels of the GLEAN genes and the alignments of Tophat were calculated using the Cuffdiff package of Cufflink²¹ software with optimized parameters (-FDR 0.05 --geometric-norm TRUE -compatible-hits-norm TURE). To find the best hit of each deduced protein, we employed BlastP with an e-value of 10⁻⁵ to map the protein sequences of GLEAN results against the SwissPort and TrEMBL databases²³ (Uniprot release 2011.06). The predicted protein sequences of the three arowana varieties were then aligned against the public databases (Pfam, PRINTS, ProDom and SMART) for detection of function motifs and domains. Ultimately, the GLEAN gene sets were filtered by the following three steps: 1) discarded genes with the length less than 150 bp, 2) discarded genes recognized as TEs, and 3) removed genes that were only obtained from the de novo method but without functional assignments and with an expression value lower than 1.

Data Records

In our previous work⁴, annotated genome sequences of the golden, red and green arowana varieties were uploaded to GenBank under the GenBank assembly accession numbers GCA_001624265.1 (Data Citation 1), GCA_001624255.1 (Data Citation 2) and GCA_001624245.1 (Data Citation 3), respectively. The RAD-seq raw read files were stored at NCBI SRA under experiment accession numbers SRX1728941 to SRX1728946 (Data Citation 4). The RNA-seq raw read files can be found at NCBI SRA SRX1668426 to SRX1668432 (Data Citation 5). In this paper, the gene annotation information of three varieties of arowana are available from Dryad (Data Citation 6). The chromosome annotation of golden arowana are available from Dryad (Data Citation 6). Data are given in tabular, tab-delimited value format.

Technical Validation

We took the following two steps to assess the generated genome assemblies. The first approach is Transcriptome evaluation. We used the Trinity²⁴ software to de novo assemble the RNA sequences of skin tissues from three varieties, and then utilized the BLAT²⁵ (E-value=10e⁻⁶, identity=90% and coverage>90%) to align the genome assemblies (Table 5). The second approach is CEGMA⁶ (Core Eukaryotic Genes Mapping Approach; http://korflab.ucdavis.edu/Datasets/genome_completeness, version 2.3) assessment with 248 conserved Core Eukaryotic Genes (CEGs) subjected to evaluation of the gene space completeness within the three assemblies. The evaluation results revealed a high coverage of more than 90% of gene coding-regions and over 95% of core eukaryotic genes, confirming the high-level completeness of the three genome assemblies (Table 6).

Table 5 Assessing the completeness of gene regions in the three genome assemblies by RNA-seq of skin tissue.

Full size table

Table 6 Coverage rates of core eukaryotic genes in the three assembled genomes by CEGMA.

Full size table

Usage Notes

Except for the Joinmap v4.1 that was ran on a Windows system, the other softwares were run on a linux system. The optimized parameters are provided in the main text.

Additional Information

How to cite this article: Li, J. et al. A chromosome-level genome assembly of the Asian arowana, Scleropages formosus. Sci. Data 3:160105 doi: 10.1038/sdata.2016.105 (2016).

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

Kumazawa, Y. & Nishida, M. Molecular phylogeny of osteoglossoids: a new model for Gondwanian origin and plate tectonic transportation of the Asian arowana. Molecular Biology & Evolution 17, 1869–1878 (2000).
Article CAS Google Scholar
Scott, D. B. C. & Fuller, J. D. The reproductive biology of Scleropages formosus (Müller & Schlegel) (Osteoglossomorpha, Osteoglossidae) in Malaya, and the morphology of its pituitary gland. Journal of Fish Biology 8, 45–53 (2006).
Article Google Scholar
Greenwood, P. H. Phyletic studies of teleostean fishes, with a provisional classification of living forms. Bull.amer.mus.nat.hist 131, 455 (1966).
Google Scholar
Bian, C. et al. The Asian arowana (Scleropages formosus) genome provides new insights into the evolution of an early lineage of teleosts. Scientific Reports 6, 24501 (2016).
Article ADS CAS Google Scholar
Luo, R. et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. GigaScience 1, 18 (2012).
Article Google Scholar
Genis, P., Keith, B., Zemin, N., Thomas, K. & Ian, K. Assessing the gene space in draft genomes. Nucleic acids research 37, 289–297 (2009).
Article Google Scholar
Song, L. et al. Draft genome of the Chinese mitten crab, Eriocheir sinensis. GigaScience 5, 1–3 (2016).
Article ADS CAS Google Scholar
Miller, M. R., Dunham, J. P., Amores, A., Cresko, W. A. & Johnson, E. A. Rapid and cost-effective polymorphism identification and genotyping using restriction site associated DNA (RAD) markers. Genome research 17, 240–248 (2007).
Article CAS Google Scholar
Li, R. et al. SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics (Oxford, England) 25, 1966–1967 (2009).
Article CAS Google Scholar
Li, H., Handsaker, B., Wysoker, A., Fennell, T. & Ruan, J. The Sequence Alignment-Map format and SAMtools. Bioinformatics (Oxford, England) 25, 2078–2079 (2009).
Article Google Scholar
van Ooijen, J. W. Joinmap 4: software for the calculation of genetic linkage maps in experimental populations (ed. Kyazma B.V.) (Wageningen, Netherlands, 2006).
Google Scholar
Shen, X. Y. et al. The first transcriptome and genetic linkage map for Asian arowana. Molecular ecology resources 14, 622–635 (2014).
Article CAS Google Scholar
Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic acids research 35, W265–W268 (2007).
Article Google Scholar
Chen, N. Using RepeatMasker to Identify Repetitive Elements in Genomic Sequences (John Wiley & Sons, Inc., 2004).
Book Google Scholar
Jurka, J. et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenetic & Genome Research 110, 462–467 (2005).
Article CAS Google Scholar
Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic acids research 27, 573–580 (1999).
Article CAS Google Scholar
Birney, E., Clamp, M. & Durbin, R. GeneWise and Genomewise. Genome research 14, 988–995 (2004).
Article CAS Google Scholar
Mario, S. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic acids research 34, 435–439 (2006).
Article Google Scholar
Burge, C. & Karlin, S. Prediction of complete gene structures in human genomic DNA. Journal of Molecular Biology 268, 78–94 (1997).
Article CAS Google Scholar
Trapnell, C., Pachter, L. & Salzberg, S. L. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics (Oxford, England) 25, 1105–1111 (2009).
Article CAS Google Scholar
Trapnell, C. et al. Differential analysis of gene regulation at transcript resolution with RNA-seq. Nature Biotechnology 31, 46–53 (2013).
Article CAS Google Scholar
Elsik, C. G. et al. Creating a honey bee consensus gene set. Genome Biology 8, 90–105 (2007).
Article Google Scholar
Bairoch, A. & Apweiler, R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic acids research 28, 45–48 (2000).
Article CAS Google Scholar
Grabherr, M. G. et al. Trinity: reconstructing a full-length transcriptome without a genome from RNA-Seq data. Nature Biotechnology 29, 644–652 (2013).
Article Google Scholar
Kent, W. J. BLAT--the BLAST-like alignment tool. Genome research 12, 656–664 (2002).
Article CAS Google Scholar

Data Citations

NCBI Assembly GCA_001624265.1 (2016)
NCBI Assembly GCA_001624255.1 (2016)
NCBI Assembly GCA_001624245.1 (2016)
NCBI Sequence Read Archive SRX1728941 to SRX1728946 (2016)
NCBI Sequence Read Archive SRX1668426 to SRX1668432 (2016)
Li, J. Dryad Digital Repository https://doi.org/10.5061/dryad.9rp2b (2016)

Download references

Acknowledgements

We acknowledge the financial support from China Special Project on the Integration of Industry, Education and Research of Guangdong Province (No 2013B090800017) and National Science Infrastructure Platform of China (No 2016DKA30470). This research project was also supported by the Biomedical Research Council of A*STAR of Singapore, China 863 project (No 2014AA093501), China Shenzhen Science and Technology program (no CXB201108250095A), Shenzhen Special Program for Industrial Development (No JSGG20141020113728803), and internal research grants from the Temasek Life Sciences Laboratory of Singapore.

Author information

Jia Li, Chao Bian, Yinchang Hu, Xidong Mu, Xueyan Shen and Vydianathan Ravi: These authors contributed equally to this work

Authors and Affiliations

Shenzhen Key Lab of Marine Genomics, Guangdong Provincial Key Lab of Molecular Breeding in Marine Economic Animals, BGI, Shenzhen, 518083, China
Jia Li, Chao Bian, Ying Sun, Xinxin You, Ying Qiu, Xinhui Zhang, Hui Yu, Yu Huang, Junmin Xu & Qiong Shi
Key Laboratory of Tropical & Subtropical Fishery Resource Application & Cultivation, Ministry of Agriculture, Pearl River Fisheries Research Institute, Chinese Academy of Fishery Sciences, Guangzhou, 510380, China
Yinchang Hu & Xidong Mu
Reproductive Genomics Group, Temasek Life Sciences Laboratory, Singapore, 117604, Singapore
Xueyan Shen, Inna S. Kuznetsova & László Orbán
Institute of Molecular and Cell Biology, A*STAR, Biopolis, Singapore, 138673, Singapore
Vydianathan Ravi & Byrappa Venkatesh
Department of Cytology and Histology, Laboratory of Chromosome Structure and Function, Biological Faculty, Saint Petersburg State University, Saint-Petersburg, 198504, Russia
Inna S. Kuznetsova
Freshwater Fisheries Research Center, Chinese Academy of Fishery Sciences, Wuxi, 214081, China
Pao Xu & Ruobo Gu
BGI-Zhenjiang Institute of Hydrobiology, Zhenjiang, 212000, China
Ruobo Gu, Junmin Xu & Qiong Shi
Marine Research Center, College of Life Sciences, Shenzhen University, Shenzhen, 518060, China
Qiong Shi

Authors

Jia Li
View author publications
You can also search for this author in PubMed Google Scholar
Chao Bian
View author publications
You can also search for this author in PubMed Google Scholar
Yinchang Hu
View author publications
You can also search for this author in PubMed Google Scholar
Xidong Mu
View author publications
You can also search for this author in PubMed Google Scholar
Xueyan Shen
View author publications
You can also search for this author in PubMed Google Scholar
Vydianathan Ravi
View author publications
You can also search for this author in PubMed Google Scholar
Inna S. Kuznetsova
View author publications
You can also search for this author in PubMed Google Scholar
Ying Sun
View author publications
You can also search for this author in PubMed Google Scholar
Xinxin You
View author publications
You can also search for this author in PubMed Google Scholar
Ying Qiu
View author publications
You can also search for this author in PubMed Google Scholar
Xinhui Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Hui Yu
View author publications
You can also search for this author in PubMed Google Scholar
Yu Huang
View author publications
You can also search for this author in PubMed Google Scholar
Pao Xu
View author publications
You can also search for this author in PubMed Google Scholar
Ruobo Gu
View author publications
You can also search for this author in PubMed Google Scholar
Junmin Xu
View author publications
You can also search for this author in PubMed Google Scholar
László Orbán
View author publications
You can also search for this author in PubMed Google Scholar
Byrappa Venkatesh
View author publications
You can also search for this author in PubMed Google Scholar
Qiong Shi
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

Q.S., B.V., L.O., C.B. and Yi.H. initiated and conceived the arowana genome project. Q.S., B.V., L.O., Yi.H., X.M., Y.S., I.S.K., X.Y., Y.Q., X.Z., H.Y. and Y.H. collected the samples for WGS and RAD sequencing. C.B. and J.L. performed the genome assemblies and annotations of three arowana varieties. B.V., V.R., C.B. and J.L. constructed the phylogenetic trees. B.V. and V.R. analyzed the Hox gene families. P.X., R.G. and J.X. contributed to the planning of the whole project or various parts of it. J.L., C.B., Q.S., B.V., L.O., V.R., X.S. and X.Y. wrote the manuscript.

Corresponding authors

Correspondence to László Orbán, Byrappa Venkatesh or Qiong Shi.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

ISA-Tab metadata

Rights and permissions

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0 Metadata associated with this Data Descriptor is available at http://www.nature.com/sdata/ and is released under the CC0 waiver to maximize reuse.

Reprints and permissions

About this article

Cite this article

Li, J., Bian, C., Hu, Y. et al. A chromosome-level genome assembly of the Asian arowana, Scleropages formosus. Sci Data 3, 160105 (2016). https://doi.org/10.1038/sdata.2016.105

Download citation

Received: 25 May 2016
Accepted: 15 September 2016
Published: 06 December 2016
DOI: https://doi.org/10.1038/sdata.2016.105

This article is cited by

Evidence that DNA repair genes, a family of tumor suppressor genes, are associated with evolution rate and size of genomes
- Konstantinos Voskarides
- Harsh Dweep
- Charalambos Chrysostomou
Human Genomics (2019)
Divergence, evolution and adaptation in ray-finned fish genomes
- Chao Bian
- Yu Huang
- Qiong Shi
Science China Life Sciences (2019)

Subjects

Abstract

Similar content being viewed by others

Chromosome-scale assembly and high-density genetic map of the yellow drum, Nibea albiflora

The first high-quality genome assembly and annotation of Anthocidaris crassispina

A chromosome-level genome assembly and evolutionary analysis of Coregonus ussuriensis Berg

Background & Summary

Methods

Fish sample preparation

Fish samples for Illumina genome sequencing

Fish samples for RAD sequencing

Library construction, sequencing and genome assembly

Library construction and sequencing

Raw data filtering

k-mer analysis

Genome assembly

RAD sequencing, genetic map construction, and chromosome-level assembly

RAD sequencing

RAD SNP calling

Genetic map construction

Anchoring the genome assembly to the linkage groups

Analysis of repetitive sequences in the draft assembly

Genome annotation

Homology annotation

De novo annotation

RNA-seq annotation

Integration of annotation results

Data Records

Technical Validation

Usage Notes

Additional Information

References

References

Data Citations

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding authors

Ethics declarations

Competing interests

ISA-Tab metadata

ISA-Tab metadata

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Evidence that DNA repair genes, a family of tumor suppressor genes, are associated with evolution rate and size of genomes

Divergence, evolution and adaptation in ray-finned fish genomes

Search

Quick links