Chromosome-level assembly of Gymnocypris eckloni genome

Wang, Fayan; Wang, Lihan; Liu, Dan; Gao, Qiang; Nie, Miaomiao; Zhu, Shihai; Chao, Yan; Yang, Chaojie; Zhang, Cunfang; Yi, Rigui; Ni, Weilin; Tian, Fei; Zhao, Kai; Qi, Delin

doi:10.1038/s41597-022-01595-w

Data Descriptor
Open access
Published: 02 August 2022

Scientific Data volume 9, Article number: 464 (2022)

Abstract

Gymnocypris eckloni is widely distributed in isolated lakes and the upper reaches of the Yellow River and plays a significant role in the trophic web of freshwater communities. In this study, we generated a chromosome-level genome of G. eckloni using PacBio, Illumina and Hi-C sequencing data. The genome consists of 23 pseudo-chromosomes that contain 918.68 Mb of sequence, with a scaffold N50 length of 43.54 Mb. In total, 23,157 genes were functionally annotated, representing 94.80% of the total predicted protein-coding genes. Phylogenetic analysis showed that G. eckloni is most closely related to Cyprinus carpio, with an estimated divergence time of ~34.8 million years ago. We thus provide a high-quality chromosome-level genome assembly for G. eckloni. This genome will serve as a valuable genomic resource for future research on the evolution and ecology of the schizothoracine fish in the Qinghai-Tibetan Plateau.

Measurement(s)	Genome
Technology Type(s)	Whole Genome Sequencing
Sample Characteristic - Organism	Gymnocypris eckloni
Sample Characteristic - Environment	fresh water
Sample Characteristic - Location	Little Yellow River

Background & Summary

The Qinghai-Tibetan Plateau (QTP) is the highest and one of the largest plateaus on Earth, covering 2.5 × 10⁶ square kilometers, with most of its area lying at an elevation of 3000–5000 m. The intensive uplift of the QTP, which resulted from the collision of the Indian and Eurasian plates, had a profound impact on the climate and environment^1,2. Characterized by high altitude, low oxygen partial pressure (hypoxia), low temperatures, dramatic temperature fluctuations, and high UV radiation, the QTP environment poses harsh challenges to its endemic animals^3,4. Recent comparative genomic studies of animals endemic to the QTP have provided valuable clues to the molecular mechanisms of environmental adaptation^4,5,6,7,8. However, genomic information for fish species of the QTP is still lacking.

Schizothoracine fish (Teleostei: Cyprinidae) are the largest and most diverse taxon within the QTP ichthyofauna, and their radiation has been correlated with the plateau’s rapid upheaval^9,10. The schizothoracine fish, confined to regions at either high altitudes or high latitudes, have evolved a number of unique traits (i.e., degeneration of body scales, slow growth, and late sexual maturity) that adapt them to the extreme environment of the QTP, and they play significant roles in the trophic web of QTP freshwater communities^10,11,12,13. The schizothoracine fish have therefore been accepted as ideal models for studying the molecular mechanisms underlying adaptation to harsh environments^11,12,13.

The schizothoracine fish comprise 11 or 12 genera and approximately 100 species, and are mainly distributed in cold tributaries and lakes of the QTP and adjacent areas at 2000 m above sea level^10,11. Phylogenetic analysis based on morphological traits revealed that the schizothoracine fishes can be divided into three groups (primitive, specialized and highly specialized)¹⁰, a division that was proposed to be associated with the tectonic uplift of the QTP^14,15,16. Previous studies have shown that chromosome numbers in the schizothoracine fish range from 90 to 446 and that almost all species are polyploid^17,18,19,20. A recent genomic study confirmed that Schizothorax o’connori of Schizothoracinae is a young tetraploid that underwent a fourth whole-genome duplication (4R WGD) after the teleost-specific third WGD (3R WGD)²¹. Other studies indicated that the globin gene superfamily, the toll-like receptor family, and interferon regulatory factors in a representative species from this subfamily underwent adaptive evolution in response to the plateau environment, specifically gene loss and gain events resulting from genome and/or gene duplications^13,22,23,24. Gymnocypris eckloni is a representative species of the highly specialized schizothoracine fish that is widely distributed in isolated lakes and the upper reaches of the Yellow River, and is very well adapted to the plateau’s aquatic environment^9,10. Investigating the genomic evolution of G. eckloni may shed light on the molecular mechanisms underlying high-altitude adaptation in schizothoracine fish of the QTP.

In the present study, we integrated PacBio long-read sequencing, Illumina short-read sequencing, and high-throughput chromosome conformation capture (Hi-C) technology to generate a high-quality chromosome-level reference genome for G. eckloni. The reference genome obtained in this study will provide a foundation for future investigations on the evolution and adaptation of schizothoracine fish.

Methods

Experimental fish and sequencing

Genomic DNA of G. eckloni was extracted from muscle samples of a healthy female individual obtained from the Native Fish Artificial Proliferation and Release Station, Xunhua, Qinghai Province, China (Fig. s1). For genome assembly, two libraries with insert sizes of 300 bp and 20 kb were constructed using an Illumina TruSeq Nano DNA Library Prep Kit and a SMRTbell Template Prep Kit, respectively. The two libraries were subsequently sequenced on an Illumina HiSeq X Ten instrument and a PacBio Sequel platform, respectively²⁵. On the PacBio platform, a total of 312.2 Gb of long reads was generated, and 239.0 Gb of subreads (334.6 × coverage) with an average length of 23,706 bp were obtained after removing adaptors from the polymerase reads (Table 1). On the Illumina HiSeq X Ten platform, a total of 251.7 Gb of short reads was generated. After filtering, 215.2 Gb (231.2 × coverage) of clean Illumina data were retained for the genome survey.
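Fold coverage is the number of sequenced bases divided by the genome size. The sketch below shows only that arithmetic. The genome size constant is an assumption (the 17-mer survey estimate quoted in the assembly section), and because the text does not state which genome size underlies the reported depths, the result need not match them exactly.

```python
def sequencing_depth(total_bases_gb: float, genome_size_mb: float) -> float:
    """Return fold coverage: sequenced bases divided by genome size."""
    return (total_bases_gb * 1_000.0) / genome_size_mb


# Assumption: genome size taken from the 17-mer survey estimate (927.13 Mb).
GENOME_SIZE_MB = 927.13

# Clean Illumina data volume as given in the text (215.2 Gb).
illumina_depth = sequencing_depth(215.2, GENOME_SIZE_MB)
print(f"Illumina clean data: ~{illumina_depth:.1f}x")
```

The same function applies to the PacBio and Hi-C data volumes in Table 1.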

Table 1 Sequencing data used for the G. eckloni genome assembly.

To conduct chromosome-level assembly of the G. eckloni genome, a Hi-C library was generated using the Mbo I restriction enzyme following a previously described standard protocol with minor modifications²⁶. In brief, purified DNA from the fresh muscle sample was digested with Mbo I, labelled by incubation with Biotin-14-dATP (Thermo Fisher Scientific, USA), and then ligated with T4 DNA Ligase. After overnight incubation to reverse crosslinks, the ligated DNA was sheared into 200–600 bp fragments, blunt-end repaired and A-tailed, and then purified by biotin-streptavidin-mediated pull-down. Finally, the Hi-C libraries were quantified and sequenced on the Illumina NovaSeq6000 platform (Illumina, USA) using a PE-150 module, generating a total of 257.3 Gb (275.8 × coverage) of clean data after applying the same filtering criteria as for the short reads (Table 1).

To provide transcript evidence for genome structure annotation, we conducted RNA-seq on muscle, skin, gill, liver, gut, spleen, kidney, heart, eye and blood samples. RNA was extracted from all samples using an Ambion MagMAX-96 total RNA isolation kit (Life Sciences, United States), and DNase I treatment was performed to eliminate DNA contamination. After quality assessment of the extracted RNA using a NanoPhotometer® spectrophotometer (Implen, United States), RNA-seq libraries were constructed according to the protocol and sequenced on an Illumina HiSeq4000 in paired-end 150 bp mode, resulting in a total of 66.43 Gb of clean transcriptome data (Table 1).

De novo assembly of G. eckloni genome

We used the k-mer method to survey the genomic features of G. eckloni. The k-mer count histogram was obtained from the Illumina paired-end sequencing data using Jellyfish v2.99²⁷. Based on a total of 169,021,371,761 17-mers and a peak 17-mer depth of 181, the genome size of G. eckloni was estimated to be 927.13 Mb, with an estimated heterozygosity rate of approximately 1.82% (Table s1).
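A common first-pass k-mer estimate divides the total number of k-mers by the depth of the main peak. The sketch below applies that ratio to the two figures given above. It is not necessarily the exact procedure used here: the plain ratio gives about 933.8 Mb, slightly above the reported 927.13 Mb, which presumably reflects a correction (for example, for error k-mers) that this text does not describe.

```python
def kmer_genome_size(total_kmers: int, peak_depth: int) -> float:
    """First-pass genome size estimate in bp: total k-mers / main peak depth."""
    return total_kmers / peak_depth


# Values taken from the text: number of 17-mers and the peak 17-mer depth.
estimate_bp = kmer_genome_size(169_021_371_761, 181)
print(f"Uncorrected estimate: {estimate_bp / 1e6:.2f} Mb")
```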

The 239.0 Gb of subreads from the PacBio Sequel platform were assembled using wtdbg2²⁸. The assembly was then polished with Quiver²⁹ and with Pilon³⁰, the latter using the 215.2 Gb of clean Illumina HiSeq reads. This produced a 918.45 Mb genome assembly consisting of 3,170 contigs with a contig N50 of 4.19 Mb (Table 2).

Table 2 The statistics of length and number for the de novo assembled G. eckloni genome.

Hi-C technology was applied to conduct the chromosome-level genome assembly of G. eckloni. Clean reads from the Hi-C library were aligned to the contig-level genome with the end-to-end algorithm implemented in Bowtie2 v2.3.5, following the HiC-Pro strategy^31,32. The Juicer v1.6.2 and 3D de novo assembly (3D-DNA) pipelines were then used to assemble the contigs into a chromosome-level genome^33,34. The assembled sequences were anchored and oriented onto 23 pseudo-chromosomes, which ranged in size from 15.91 to 89.39 Mb (Fig. 1 and Table s2) and covered ~98.52% of the whole genome. The final G. eckloni genome comprised 711 scaffolds with a total length of 918,681,488 bp, a contig N50 of 4.19 Mb, and a scaffold N50 of 43.54 Mb (Table 2).
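The contig and scaffold N50 values quoted above follow the usual definition: the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the assembly length. A minimal sketch of that definition, using toy lengths rather than the actual assembly:

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first sequence that brings the running sum to >= 50%.
        if running * 2 >= total:
            return length
    raise ValueError("n50() requires at least one sequence length")


# Toy example (hypothetical lengths, not taken from the assembly).
toy_lengths = [10, 8, 6, 4, 2]
print(n50(toy_lengths))
```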

The completeness of the genome assembly was assessed using Benchmarking Universal Single-Copy Orthologs (BUSCO, version 5.3.2)³⁵ and CEGMA³⁶. The BUSCO analysis based on the actinopterygii_odb10 database showed that 87.5% (single-copy: 83.0%, duplicated: 4.5%) of the 3,640 single-copy orthologs were identified as complete, 1.3% were fragmented, and 11.2% were missing from the assembled genome. The CEGMA analysis identified 221 conserved genes (89.11% of the core eukaryotic genes) in the assembly. In addition, Illumina short reads were mapped to the assembled genome using BWA³⁷ to evaluate its completeness: 93.40% of the reads could be mapped, covering 96.34% of the assembled genome.
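The completeness figures can be cross-checked with simple arithmetic: the BUSCO categories should sum to 100%, and the CEGMA percentage is the number of genes found divided by the size of the core set. The sketch below assumes the standard CEGMA core set of 248 genes, which is not stated in the text.

```python
# BUSCO category percentages as given in the text.
busco = {"single_copy": 83.0, "duplicated": 4.5, "fragmented": 1.3, "missing": 11.2}

complete_pct = busco["single_copy"] + busco["duplicated"]
total_pct = sum(busco.values())

# Assumption: CEGMA core set of 248 core eukaryotic genes.
CEGMA_CORE_GENES = 248
cegma_pct = 221 / CEGMA_CORE_GENES * 100

print(f"BUSCO complete: {complete_pct:.1f}% (categories sum to {total_pct:.1f}%)")
print(f"CEGMA: {cegma_pct:.2f}%")
```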

Repetitive element and non-coding gene annotation in the G. eckloni genome

A combined strategy of homology alignment and de novo searching was applied to identify repeats across the whole genome. Tandem repeats were extracted by ab initio prediction using TRF (http://tandem.bu.edu/trf/trf.html). For homology-based prediction, RepeatMasker (http://www.repeatmasker.org/) and its in-house script RepeatProteinMask were run against Repbase (http://www.girinst.org/repbase) with default parameters to extract repeat regions. Additionally, a de novo repetitive element database was built by ab initio prediction with LTR_FINDER (http://tlife.fudan.edu.cn/ltr_finder/), RepeatScout (http://www.repeatmasker.org/), and RepeatModeler (http://www.repeatmasker.org/RepeatModeler.html) using default parameters. All repeat sequences with lengths > 100 bp and gap ‘N’ content < 5% were then used to construct the raw transposable element (TE) library. A custom library (a combination of Repbase and our de novo TE library, processed by uclust to yield a non-redundant library) was supplied to RepeatMasker for DNA-level repeat identification. The results revealed that 47.63% of the G. eckloni genome was annotated as repetitive elements (Table s3), of which LTRs were the most abundant, with a total length of 356.79 Mb, accounting for 38.84% of the whole genome. SINEs were the rarest, with a total length of 2.37 Mb, representing 0.26% of the whole genome (Table s4).
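The length and gap criteria for the raw TE library can be expressed as a small filter. This is a sketch of the stated thresholds only (length > 100 bp, ‘N’ content < 5%); the function name and the way sequences are passed in are hypothetical and not part of the original pipeline.

```python
def keep_for_te_library(seq: str, min_len: int = 100, max_n_frac: float = 0.05) -> bool:
    """Return True if a candidate repeat passes the length and gap filters."""
    if len(seq) <= min_len:
        return False  # must be strictly longer than 100 bp
    n_frac = seq.upper().count("N") / len(seq)
    return n_frac < max_n_frac  # gap 'N' content must be below 5%


# Hypothetical candidate sequences for illustration.
candidates = {
    "rep_a": "ACGT" * 30,             # 120 bp, no gaps
    "rep_b": "ACGT" * 25,             # exactly 100 bp, too short
    "rep_c": "N" * 10 + "A" * 110,    # 120 bp, about 8.3% N
}
kept = [name for name, seq in candidates.items() if keep_for_te_library(seq)]
print(kept)
```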

The tRNAs were predicted using tRNAscan-SE (http://lowelab.ucsc.edu/tRNAscan-SE/), and the rRNA genes were predicted using BLAST. In total, 12,157 tRNAs were predicted by tRNAscan-SE, and 1,780 rRNA genes were annotated using the BLASTN tool (E-value of 1E-10)³² against human rRNA sequences. Other ncRNAs, including miRNAs and snRNAs, were identified by searching against the Rfam database with default parameters using Infernal (http://infernal.janelia.org/) (Table s5).

Annotation of protein-coding genes

Gene predictions were conducted through a combination of homology-based, de novo, and transcriptome-based prediction methods. For homology-based predictions, the protein sequences of seven fish species, including Oryzias latipes, Ctenopharyngodon idellus, Ictalurus punctatus, Cyprinus carpio, Takifugu rubripes, Danio rerio, and Astyanax mexicanus, were downloaded from the Ensembl database (http://asia.ensembl.org/index.html). Protein sequences were aligned to the genome using TblastN v2.2.26³⁸ with an e-value of 1e⁻⁵. Matching proteins were then aligned to the homologous genome sequences using GeneWise v2.4.1³⁹ to obtain accurate spliced alignments (referred to as “Homolog” in Table 3), which were subsequently used to predict the gene structure of each protein region. RNA-sequencing data derived from nine tissues and blood samples were assembled using Trinity v2.1.1⁴⁰, and the assembled transcripts were aligned against the G. eckloni genome using the Program to Assemble Spliced Alignments (PASA)⁴¹ (referred to as “PASA” in Table 3). To optimize the genome annotation, RNA-seq reads from different tissues were aligned to the G. eckloni genome using TopHat v2.0.11 with default parameters to identify exon regions and splice positions⁴². The alignment results were then used as input for Cufflinks v2.2.1 with default parameters for genome-based transcript assembly⁴³ (referred to as “Cufflinks” in Table 3). Finally, EvidenceModeler v1.1.1 was used to combine the gene models into weighted consensus gene structures with repetitive elements masked⁴¹. Additionally, PASA was used to update the final gene models, adding information on alternatively spliced sites and untranslated regions (UTRs) (referred to as “Pasa-update” in Table 3). Ultimately, a total of 24,430 protein-coding genes were predicted in the G. eckloni genome. The average transcript length was 16,219.34 bp, with an average coding sequence (CDS) length of 1,536.71 bp. The average exon number per gene was 8.88, with an average exon length of 173.00 bp and an average intron length of 1,862.69 bp (Table 3). The statistics of the gene models, including gene, CDS, intron, and exon lengths, in G. eckloni were comparable to those of closely related species (Table s6 and Fig. 2).
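The reported gene-model averages are roughly self-consistent, which can be checked with a short calculation. The sketch assumes that the "transcript length" spans exons plus introns and that a gene with n exons has n − 1 introns; neither is stated explicitly in the text. Because products of averages differ from averages of products, only approximate agreement is expected.

```python
# Averages as given in the text.
avg_exons_per_gene = 8.88
avg_exon_len_bp = 173.00
avg_intron_len_bp = 1862.69

# Approximate CDS length: exons per gene times exon length.
cds_estimate_bp = avg_exons_per_gene * avg_exon_len_bp

# Assumption: n exons imply n - 1 introns, and transcript length includes introns.
avg_introns_per_gene = avg_exons_per_gene - 1
transcript_estimate_bp = cds_estimate_bp + avg_introns_per_gene * avg_intron_len_bp

print(f"CDS estimate: {cds_estimate_bp:.2f} bp (reported 1,536.71 bp)")
print(f"Transcript estimate: {transcript_estimate_bp:.2f} bp (reported 16,219.34 bp)")
```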

Table 3 Gene annotation of G. eckloni genome via three methods.

The NR, SwissProt⁴⁴, InterPro⁴⁵, and Kyoto Encyclopedia of Genes and Genomes (KEGG)⁴⁶ public databases were used for the functional annotation of protein-coding genes, using the BLASTX and BLASTN utilities⁴⁶ with an e-value threshold of 1e⁻⁵. Protein function was additionally predicted from conserved protein domains in the InterPro database using the InterProScan tool⁴⁷. A total of 23,157 genes (94.8%) were successfully annotated by at least one public database (Table s7 and Fig. 3).

Evolutionary and comparative genomic analysis

To examine G. eckloni evolution, we used OrthoMCL⁴⁸ to cluster its genes with those from 13 other vertebrates: Astyanax mexicanus, Ictalurus punctatus, Danio rerio, C. carpio, Ctenopharyngodon idella, Oreochromis niloticus, Oryzias latipes, Takifugu rubripes, Gallus gallus, Homo sapiens, Mus musculus, Xenopus tropicalis, and Petromyzon marinus. From these 14 species, we identified 597 one-to-one single-copy genes, which were used to construct a maximum likelihood (ML) tree using RAxML with the GTRGAMMA model⁴⁹. Divergence times between species were calculated using the MCMCTree program implemented in the PAML package⁵⁰. According to the time-calibrated phylogeny, the age of the most recent common ancestor (MRCA) of the teleost fish was estimated to be 211.8–254.1 million years ago. G. eckloni was most closely related to C. carpio, with which it shared an MRCA ~34.8 million years ago (Fig. 4).
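One-to-one single-copy genes are those gene families in which every species contributes exactly one member. The sketch below illustrates that selection on toy data; the dictionary layout, species labels and family names are hypothetical and do not reflect OrthoMCL's actual output format.

```python
def single_copy_families(families, species):
    """Return IDs of families with exactly one gene in every listed species.

    families: {family_id: {species_name: [gene_ids]}}
    """
    selected = []
    for family_id, members in families.items():
        if all(len(members.get(sp, [])) == 1 for sp in species):
            selected.append(family_id)
    return selected


# Toy subset of species and families (hypothetical identifiers).
species = ["G_eckloni", "C_carpio", "D_rerio"]
families = {
    "fam1": {"G_eckloni": ["g1"], "C_carpio": ["c1"], "D_rerio": ["d1"]},
    "fam2": {"G_eckloni": ["g2", "g3"], "C_carpio": ["c2"], "D_rerio": ["d2"]},
    "fam3": {"G_eckloni": ["g4"], "C_carpio": ["c3"]},
}
single_copy = single_copy_families(families, species)
print(single_copy)
```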

A total of 24,619 gene families were identified among the 14 species (Table s8), of which 2,739 core gene families were shared by all 14 species and 856 gene families, comprising 1,488 genes, were unique to G. eckloni. Analysis of gene family expansion and contraction revealed 464 expanded (1,650 genes) and 743 contracted (192 genes) gene families in G. eckloni relative to its MRCA (Fig. 4). The expanded gene families included ABC transporters, Peroxisome, Herpes simplex virus 1 infection, Staphylococcus aureus infection, Axon guidance, Dorso-ventral axis formation, Pertussis, Legionellosis, and Rap1 signaling pathway, among others; the contracted gene families included Tight junction, Systemic lupus erythematosus, Pathogenic Escherichia coli infection, Gap junction, Alcoholism, Pertussis, Ascorbate and aldarate metabolism, and NOD-like receptor signaling pathway, among others.
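The distinction between core and species-specific gene families can be illustrated with a presence/absence check. This is a sketch on toy data with hypothetical identifiers, not the procedure or file format actually used.

```python
def classify_families(families, species, focal):
    """Split family IDs into core (present in all species) and unique to `focal`.

    families: {family_id: {species_name: [gene_ids]}}
    """
    core, unique = [], []
    all_species = set(species)
    for family_id, members in families.items():
        present = {sp for sp in species if members.get(sp)}
        if present == all_species:
            core.append(family_id)
        elif present == {focal}:
            unique.append(family_id)
    return core, unique


# Toy data (hypothetical identifiers).
species_set = ["G_eckloni", "C_carpio", "D_rerio"]
toy_families = {
    "famA": {"G_eckloni": ["g1"], "C_carpio": ["c1", "c2"], "D_rerio": ["d1"]},
    "famB": {"G_eckloni": ["g2", "g3"]},
    "famC": {"C_carpio": ["c3"], "D_rerio": ["d2"]},
}
core_ids, unique_ids = classify_families(toy_families, species_set, "G_eckloni")
print(core_ids, unique_ids)
```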

Data Records

All raw data of the whole genome have been deposited into the National Center for Biotechnology Information (NCBI) SRA database (Experiments for SRP377513) under BioProject accession number PRJNA835611⁵¹. The assembled genome has been deposited at DDBJ/ENA/GenBank under the accession JAMHKY000000000⁵². Data on gene family expansion and contraction, gene functional annotations, repeat annotation and the results of the evolutionary analysis have been deposited at Figshare⁵³.

Technical Validation

RNA integrity

The transcriptomes for nine tissues and blood from three fish individuals were sequenced. Before constructing RNA-Seq libraries, RNA purity was analyzed with a NanoPhotometer Spectrophotometer (Implen, United States). The RNA concentration was quantified with a Qubit RNA Assay Kit in a Qubit 2.0 Fluorometer (Life Technologies, United States). RNA integrity was analyzed using an RNA Nano 6000 Assay Kit and an Agilent Bioanalyzer 2100 (Agilent Technologies, United States). The total amount of RNA, RNA integrity and rRNA ratio were used to estimate the quality, content and degradation level of the RNA samples. In the present study, RNA samples with a total RNA amount ≥ 10 μg, an RNA integrity number ≥ 8, and an rRNA ratio ≥ 1.5 were used to construct the sequencing libraries.
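The three acceptance thresholds can be written as a single predicate. The sketch below encodes only the cut-offs stated above; the class, field names and sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RnaSample:
    name: str
    total_ug: float    # total RNA amount in micrograms
    rin: float         # RNA integrity number
    rrna_ratio: float  # rRNA ratio


def passes_qc(sample: RnaSample) -> bool:
    """Apply the stated thresholds: >= 10 ug, RIN >= 8, rRNA ratio >= 1.5."""
    return sample.total_ug >= 10 and sample.rin >= 8 and sample.rrna_ratio >= 1.5


# Hypothetical samples for illustration only.
samples = [
    RnaSample("liver_1", total_ug=14.2, rin=9.1, rrna_ratio=1.8),
    RnaSample("gill_1", total_ug=12.0, rin=7.4, rrna_ratio=1.6),
    RnaSample("eye_1", total_ug=8.5, rin=8.6, rrna_ratio=1.7),
]
accepted = [s.name for s in samples if passes_qc(s)]
print(accepted)
```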

Comparative genomic analyses

The protein sequences of 13 vertebrates, including A. mexicanus, I. punctatus, D. rerio, C. carpio, C. idella, O. niloticus, O. latipes, T. rubripes, G. gallus, H. sapiens, M. musculus, X. tropicalis, and P. marinus, were downloaded from the Ensembl database (Release 98). Orthologous relationships between the genes from G. eckloni and the 13 other vertebrates were inferred through all-against-all protein sequence similarity searches using OrthoMCL⁴⁸. Only the longest predicted transcript per locus was retained. In the all-against-all BLASTP comparisons, a cutoff e-value of 1e⁻⁵ was used. The MCL inflation index was set to 1.5.
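Retaining only the longest transcript per locus avoids counting splice isoforms as separate genes during clustering. A minimal sketch of that step, assuming records arrive as (locus, transcript ID, sequence) tuples; this input layout and the identifiers are hypothetical.

```python
def longest_per_locus(records):
    """Return {locus: transcript_id} keeping the longest sequence per locus.

    records: iterable of (locus_id, transcript_id, sequence) tuples.
    Ties keep the transcript seen first.
    """
    best = {}
    for locus, transcript_id, seq in records:
        if locus not in best or len(seq) > len(best[locus][1]):
            best[locus] = (transcript_id, seq)
    return {locus: tid for locus, (tid, _) in best.items()}


# Hypothetical protein records for illustration.
records = [
    ("gene1", "gene1.t1", "MKT"),
    ("gene1", "gene1.t2", "MKTAYIAK"),
    ("gene2", "gene2.t1", "MSE"),
]
selected = longest_per_locus(records)
print(selected)
```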

For each gene family, an alignment was produced using Muscle (http://www.drive5.com/muscle/), and ambiguously aligned positions were trimmed using Gblocks (http://molevol.cmima.csic.es/castresana/Gblocks.html). The tree was inferred using RAxML⁴⁹. The best-scoring ML tree was inferred by a rapid bootstrap algorithm and ML searches after performing 1000 rapid bootstrap replications. Divergence times between species were calculated using the MCMCTree program implemented in the PAML package⁵⁰. The divergence times for D. rerio vs C. idella (48–75 Ma), A. mexicanus vs C. carpio (137–174 Ma), C. carpio vs T. rubripes (206–252 Ma), G. gallus vs X. tropicalis (347.6–358.3 Ma), T. rubripes vs G. gallus (413–443 Ma), and G. gallus vs P. marinus (515–646 Ma) were obtained from the TimeTree database and then used to calibrate the divergence dates of other nodes on the phylogenetic tree⁵⁴.
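In MCMCTree, joint lower and upper bounds on a node age are commonly written as a node label of the form 'B(lower,upper)' in the input tree. The sketch below converts the six TimeTree ranges listed above into such labels. The time unit of 100 million years is an assumption (a frequent choice, but not stated in the text), and how the labels were actually placed on the tree is not described here.

```python
# Calibration ranges in Ma, as listed in the text.
calibrations_ma = {
    ("D. rerio", "C. idella"): (48.0, 75.0),
    ("A. mexicanus", "C. carpio"): (137.0, 174.0),
    ("C. carpio", "T. rubripes"): (206.0, 252.0),
    ("G. gallus", "X. tropicalis"): (347.6, 358.3),
    ("T. rubripes", "G. gallus"): (413.0, 443.0),
    ("G. gallus", "P. marinus"): (515.0, 646.0),
}


def mcmctree_bound(lower_ma: float, upper_ma: float, unit_ma: float = 100.0) -> str:
    """Format a lower/upper bound pair as a quoted 'B(l,u)' node label.

    Assumption: one time unit equals `unit_ma` million years (default 100).
    """
    return f"'B({lower_ma / unit_ma:g},{upper_ma / unit_ma:g})'"


labels = {pair: mcmctree_bound(lo, hi) for pair, (lo, hi) in calibrations_ma.items()}
for pair, label in labels.items():
    print(" vs ".join(pair), label)
```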

Based on the divergence times and phylogenetic relationships, CAFE⁵⁵ was used to analyze the expansion and contraction of gene families in the G. eckloni genome, using the gene families identified by OrthoMCL. The phylogenetic tree topology and branch lengths were taken into account when inferring the significance of changes in gene family size along each branch. Enrichment analyses based on Gene Ontology (GO) and KEGG annotations were performed to identify the functional implications of the expanded and contracted genes (Fisher’s exact test, adjusted p-value < 0.05).
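For an enrichment analysis, Fisher's exact test asks how likely it is to see at least k pathway genes among n selected genes when K of the N background genes belong to the pathway. The sketch below computes the one-sided (over-representation) p-value from the hypergeometric distribution. The one-sided form is an assumption, the counts are a toy example, and the multiple-testing adjustment is omitted because the method is not specified in the text.

```python
from math import comb


def fisher_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided p-value P(X >= k) for X ~ Hypergeometric(N, K, n).

    k: selected genes in the pathway    n: selected genes in total
    K: background genes in the pathway  N: background genes in total
    """
    upper = min(n, K)
    denom = comb(N, n)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible tables contribute nothing to the sum.
    numer = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numer / denom


# Toy example (hypothetical counts): 3 of 4 selected genes fall in a pathway
# that contains 5 of the 20 background genes.
p_value = fisher_enrichment_p(k=3, n=4, K=5, N=20)
print(f"p = {p_value:.4f}")
```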

Code availability

All software used in this work is in the public domain, and the parameters are described in the Methods. Where no detailed parameters are given for a piece of software, the default parameters suggested by its developers were used.

References

Li, J. & Fang, X. Uplift of the Tibetan Plateau and environmental changes. Chinese Science Bulletin 44, 2117–2124 (1999).
Favre, A. et al. The role of the uplift of the Qinghai-Tibetan Plateau for the evolution of Tibetan biotas. Biol Rev Camb Philos Soc 90, 236–253 (2015).
Scheinfeldt, L. B. & Tishkoff, S. A. Living the high life: high-altitude adaptation. Genome Biol 11, 133 (2010).
Qiu, Q. et al. The yak genome and adaptation to life at high altitude. Nat Genet 44, 946–949 (2012).
Chen, N. et al. Ancient genomes reveal tropical bovid species in the Tibetan Plateau contributed to the prevalence of hunting game until the late Neolithic. Proc Natl Acad Sci USA 117, 28150–28159 (2020).
Qu, Y. et al. Ground tit genome reveals avian adaptation to living at high altitudes in the Tibetan plateau. Nat Commun 4, 2071 (2013).
Ge, R. L. et al. Draft genome sequence of the Tibetan antelope. Nat Commun 4, 1858 (2013).
Yu, H. et al. Genomic evidence for the Chinese mountain cat as a wildcat conspecific (Felis silvestris bieti) and its introgression to domestic cats. Sci Adv 7 (2021).
Chen, Y. F. & Cao, W. Y. in Fauna Sinica, Osteichthyes, Cypriniformes III. (ed P.Q. Yue) 273-390. (Science Press, 2000).
Wu, Y. F. & Wu, C. Z. The fishes of the Qinghai – Xizang plateau. (Science and Technology Press, 1992).
Qi, D. et al. Convergent, parallel and correlated evolution of trophic morphologies in the subfamily schizothoracinae from the Qinghai-Tibetan plateau. PLoS One 7, e34070 (2012).
Qi, D. et al. Transcriptome Analysis Provides Insights Into the Adaptive Responses to Hypoxia of a Schizothoracine Fish (Gymnocypris eckloni). Front Physiol 9, 1326 (2018).
Xia, M. et al. Changes of hemoglobin expression in response to hypoxia in a Tibetan schizothoracine fish, Schizopygopsis pylzovi. J Comp Physiol B 186, 1033–1043 (2016).
Cao, W. X., Chen, Y. Y., Wu, Y. F. & Zhu, S. Q. in Studies on the Period, Amplitude and Type of the Uplift of the Qinghai–Xizang Plateau (ed Chinese Academy of Sciences The Team of the Comprehensive Scientific Expedition to the Qinghai-Xizang Plateau) 118-130 (Science Press, 1981).
Li, Y. et al. High altitude adaptation of the schizothoracine fishes (Cyprinidae) revealed by the mitochondrial genome analyses. Gene 517, 169–178 (2013).
Yonezawa, T., Hasegawa, M. & Zhong, Y. Polyphyletic origins of schizothoracine fish (Cyprinidae, Osteichthyes) and adaptive evolution in their mitochondrial genomes. Genes Genet Syst 89, 187–191 (2014).
Zan, R. G., Liu, W. G. & Song, Z. Tetraploid-hexaploid relationship in Schizothoracinae. Acta Genet. Sin. 12, 137–142 (1985).
Yu, X. Y., Li, Y. C. & Zhou, T. Karyotype studies of cyprinid fishes in China -Comparative study of the karyotypes of 8 species of schizothoracine fishes. Journal of Wuhan University 2, 97–104 (1990).
Yang, S. et al. Morphogenesis of blood cell lineages in Ya-fish (Schizothorax prenanti). Chinese Journal of Zoology 50, 231–242 (2015).
Dai, Y. & Han, H. Karyological analysis of two species in the subfamily schizothoracinae (Cypriniformes: Cyprinidae) from China, with notes on karyotype evolution in schizothoracinae. Turkish Journal of Fisheries and Aquatic Sciences 18, 175–186 (2018).
Xiao, S. et al. Genome of Tetraploid Fish Schizothorax o’connori Provides Insights into Early Re-diploidization and High-Altitude Adaptation. iScience 23, 101497 (2020).
Qi, D. et al. Adaptive evolution of interferon regulatory factors is not correlated with body scale reduction or loss in schizothoracine fish. Fish Shellfish Immunol 73, 145–151 (2018).
Qi, D. et al. Duplication of toll-like receptor 22 in teleost fishes. Fish Shellfish Immunol 94, 752–760 (2019).
Chen, Q. C. et al. A new pattern of hemoglobin switching in teleost fish-study of the embryonic hemoglobin in the Schizopygopsis pylzovi. Acta Hydrobiologica Sinica 44, 1199–1207 (2020).
Peng, Y. et al. Chromosome-level genome assembly of the Arctic fox (Vulpes lagopus) using PacBio sequencing and Hi-C technology. Mol Ecol Resou (2021).
Belton, J. M. et al. Hi-C: a comprehensive technique to capture the conformation of genomes. Methods 58, 268–276 (2012).
Marcais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011).
Ruan, J. & Li, H. Fast and accurate long-read assembly with wtdbg2. Nat Methods 17, 155–158 (2020).
Chin, C. S. et al. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat Methods 10, 563–569 (2013).
Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One 9, e112963 (2014).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat Methods 9, 357–359 (2012).
Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol 16, 259 (2015).
Durand, N. C. et al. Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments. Cell Syst 3, 95–98 (2016).
Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95 (2017).
Simao, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
Parra, G., Bradnam, K. & Korf, I. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics 23, 1061–1067 (2007).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Gertz, E. M., Yu, Y. K., Agarwala, R., Schaffer, A. A. & Altschul, S. F. Composition-based statistics and translated nucleotide searches: improving the TBLASTN module of BLAST. BMC Biol 4, 41 (2006).
Birney, E., Clamp, M. & Durbin, R. GeneWise and Genomewise. Genome Res 14, 988–995 (2004).
Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol 29, 644–652 (2011).
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol 9, R7 (2008).
Kim, D. et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol 14, R36 (2013).
Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res 34, W435–439 (2006).
UniProt Consortium, T. UniProt: the universal protein knowledgebase. Nucleic Acids Res 46, 2699 (2018).
Finn, R. D. et al. InterPro in 2017-beyond protein family and domain annotations. Nucleic Acids Res 45, D190–D199 (2017).
Kanehisa, M. et al. Data, information, knowledge and principle: back to metabolism in KEGG. Nucleic Acids Res 42, D199–205 (2014).
Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236–1240 (2014).
Li, L., Stoeckert, C. J. Jr. & Roos, D. S. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res 13, 2178–2189 (2003).
Stamatakis, A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22, 2688–2690 (2006).
Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol 24, 1586–1591 (2007).
NCBI Sequence Read Archive https://identifiers.org/insdc.sra:SRP377513 (2022).
Qi, D. Gymnocypris eckloni isolate SKLPE_202101, NCBI Assembly, https://identifiers.org/ncbi/insdc.gca:GCA_024082105.1 (2022).
Qi, D. Chromosome-level assembly of Gymnocypris eckloni genome, figshare https://doi.org/10.6084/m9.figshare.19633674.v2 (2022).
Kumar, S., Stecher, G., Suleski, M. & Hedges, S. B. TimeTree: A Resource for Timelines, Timetrees, and Divergence Times. Mol Biol Evol 34, 1812–1819 (2017).
De Bie, T., Cristianini, N., Demuth, J. P. & Hahn, M. W. CAFE: a computational tool for the study of gene family evolution. Bioinformatics 22, 1269–1271 (2006).

Acknowledgements

This work was supported by grants from the Natural Science Foundation of Qinghai Science & Technology Department in China (No. 2020-ZJ-907) and the National Natural Science Foundation of China (No. 31960127).

Author information

These authors contributed equally: Fayan Wang, Lihan Wang.

Authors and Affiliations

State Key Laboratory of Plateau Ecology and Agriculture, Qinghai University, Xining, 810016, China
Fayan Wang, Lihan Wang, Dan Liu, Qiang Gao, Miaomiao Nie, Chaojie Yang, Cunfang Zhang, Rigui Yi, Weilin Ni & Delin Qi
College of Eco-Environmental Engineering, Qinghai University, Xining, 810016, China
Shihai Zhu
Animal Science Department of Agriculture and Animal Husbandry College, Qinghai University, Xining, 810016, China
Yan Chao
Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, 810001, China
Fei Tian & Kai Zhao

Contributions

D.L.Q. and K.Z. planned the project. F.Y.W., L.H.W., M.M.N., R.G.Y., and W.L.N. performed the experiments. D.L.Q., Q.G., C.J.Y. and F.T. performed the data analyses. C.F.Z., S.H.Z. and Y.C. assisted with sampling and experimentation. D.L.Q. and F.Y.W. wrote and revised the manuscript. All authors read, edited and approved the final manuscript.

Corresponding author

Correspondence to Delin Qi.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary information of Chromosome-level assembly of Gymnocypris eckloni genome

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Wang, F., Wang, L., Liu, D. et al. Chromosome-level assembly of Gymnocypris eckloni genome. Sci Data 9, 464 (2022). https://doi.org/10.1038/s41597-022-01595-w

Received: 07 June 2022
Accepted: 25 July 2022
Published: 02 August 2022
DOI: https://doi.org/10.1038/s41597-022-01595-w
