High-quality genome assembly of a C. crossoptilon and related functional and genetics data resources

Wu, Siwen; Wang, Kun; Dou, Tengfei; Yuan, Sisi; Wu, Dong-Dong; Ge, Changrong; Jia, Junjing; Su, Zhengchang

doi:10.1038/s41597-024-03087-5


Data Descriptor
Open access
Published: 27 February 2024


Scientific Data volume 11, Article number: 247 (2024)


Abstract

There are four species in the Crossoptilon genus, which inhabit regions ranging from very low to very high altitudes across China, and all are at varying levels of risk of extinction. To better understand the genetic basis of high-altitude adaptation and the genetic changes caused by population bottlenecks, we assembled the genome (~1.02 Gb) of a white eared pheasant (WT) (Crossoptilon crossoptilon) individual inhabiting high altitudes (3,000~7,000 m) in northwestern Yunnan Province, China, using a combination of Illumina short reads, PacBio long reads and Hi-C reads. The assembly has a contig N50 of 19.63 Mb and only six gaps. To provide further resources for gene annotation as well as functional and population genetics analyses, we sequenced the transcriptomes of 20 major tissues of the WT individual and re-sequenced another 10 WT individuals and one individual of the blue eared pheasant (Crossoptilon auritum), a species inhabiting intermediate altitudes (1,500~3,000 m). Our assembled WT genome, transcriptome data and DNA sequencing data can serve as valuable resources for studying the biology and evolution of these endangered species and for developing conservation strategies to protect them.


Background & Summary

Crossoptilon, belonging to the family Phasianidae in the order Galliformes, is a rare but important genus endemic to China¹. There are four species in the Crossoptilon genus: the Tibetan eared pheasant (TB) (C. harmani), the white eared pheasant (WT) (C. crossoptilon), the blue eared pheasant (BL) (C. auritum) and the brown eared pheasant (BR) (C. mantchuricum)^2,3. TBs are found only in southeastern Tibet at high altitudes (more than 6,000 m); BRs are mainly distributed in the mountains of Beijing, Shanxi and Hebei at low altitudes (20~1,000 m)²; BLs occur only in the mountains of Qinghai, Gansu and Sichuan provinces and the Ningxia Hui Autonomous Region at intermediate altitudes (1,500~3,000 m)²; and WTs are distributed in Qinghai, Sichuan and Yunnan provinces and the Tibet Autonomous Region of China at high altitudes (3,000~7,000 m)². All four species are of high commercial value but at varying levels of risk of extinction, and thus are listed as national key protected animals in China. They are also excellent model organisms for studying the genetic basis of altitude adaptation in closely related species and the genetic changes that occur during population bottlenecks of endangered species. However, studies of the four species are rare and mostly limited to single genes, partial sequences^4,5 or mitochondrial DNA sequences^1,6. Although the genome of a BR individual was sequenced and assembled in 2020⁷ using Illumina short reads and fragment libraries, with a contig N50 of 0.11 Mb, a scaffold N50 of 3.63 Mb and a BUSCO complete value⁸ of 95.1%, this assembly is not sufficiently contiguous or accurate for many genomic studies of the Crossoptilon species. Therefore, a high-quality genome assembly of a Crossoptilon species is urgently needed. Moreover, although re-sequencing data of varying numbers of BR and BL individuals are available⁷, no WT or TB individuals have been sequenced so far, hampering population genetics studies of the Crossoptilon species.

To at least partially fill these gaps, we assembled a high-quality, chromosome-level genome of a female WT individual using a combination of Illumina short reads, PacBio long reads and Hi-C reads. The resulting assembly has a total length of 1.02 Gb, a contig N50 of 19.6 Mb, a scaffold N50 of 29.6 Mb, a complete BUSCO value⁸ of 97.2% and only six gaps. To facilitate the annotation and functional analysis of the genome, we also sequenced the transcriptomes of 20 major tissues of the WT individual. Moreover, we re-sequenced another 10 WT individuals and one BL individual for population genetics and comparative genomics analyses. The nearly gap-free WT genome, together with the large volumes of transcriptome and DNA sequencing data, can be valuable resources for studying the biology and evolution of these endangered and valuable species and for developing conservation strategies for them.

Methods

Sample information

Blood samples of a total of 10 WT individuals (five males and six females) aged about 10 months were collected from Diqing Tibetan Autonomous Prefecture, Yunnan Province, China, and subjected to Illumina paired-end short-read DNA sequencing. A female individual was collected from the same area for whole-genome assembly, and its samples were subjected to Illumina paired-end short-read DNA sequencing, PacBio long-read sequencing and Hi-C paired-end short-read sequencing. In addition, 20 tissues (Heart, Liver, Spleen, Lung, Kidney, Pancreas, Gizzard, Glandular, Crops, Ovary, Abdominal fat, Rectum, Duodenum, Cecum, Skin, Small intestine, Brain, Cerebellum, Chest muscle, Leg muscle) of this WT individual were subjected to paired-end RNA-seq. Furthermore, a blood sample of a BL individual (female) was collected from Guangzhou Zoo, China, and subjected to Illumina paired-end short-read DNA sequencing.

Ethics approval

All the experimental procedures were approved by the Animal Care and Use Committee of the Yunnan Agricultural University (approval ID: YAU202103047). The care and use of animals fully complied with local animal welfare laws, guidelines, and policies.

Short-read DNA sequencing

Two milliliters of blood were drawn from the wing vein of each bird into a centrifuge tube containing anticoagulant (EDTA-2K) and stored at −80 °C until use. Genomic DNA (10 µg) was extracted from each blood sample using a DNA extraction kit (DP326, TIANGEN Biotech, Beijing, China) and fragmented using a Bioruptor Pico System (Diagenode, Belgium). DNA fragments of around 350 bp were selected using SPRI beads (Beckman Coulter, IN, USA). DNA-sequencing libraries were prepared using Illumina TruSeq® DNA Library Prep Kits (Illumina, CA, USA) following the vendor’s instructions. The libraries were subjected to 150-cycle paired-end sequencing on an Illumina NovaSeq 6000 platform (Illumina, CA, USA); the individual used for genome assembly was sequenced at 102X coverage.
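The reported coverage values can be related to read counts with the usual back-of-the-envelope definition, mean depth = total sequenced bases / genome size. The sketch below is illustrative only: it assumes 150-bp reads on both mates of each pair and uses the ~1.02 Gb assembly size as the genome size, whereas the reported depths may have been computed with a different genome-size estimate or after trimming.

```python
def sequencing_depth(n_read_pairs: int, read_length_bp: int, genome_size_bp: float) -> float:
    """Mean depth = total sequenced bases / genome size (two reads per pair)."""
    total_bases = 2 * n_read_pairs * read_length_bp
    return total_bases / genome_size_bp


def read_pairs_for_depth(target_depth: float, read_length_bp: int, genome_size_bp: float) -> int:
    """Inverse of sequencing_depth: read pairs needed for a target mean depth."""
    return round(target_depth * genome_size_bp / (2 * read_length_bp))


GENOME_SIZE_BP = 1.02e9  # assumed genome size: the ~1.02 Gb WT assembly
READ_LENGTH_BP = 150     # 150-cycle paired-end reads

pairs = read_pairs_for_depth(102, READ_LENGTH_BP, GENOME_SIZE_BP)
print(pairs)  # 346800000 read pairs for ~102X
print(sequencing_depth(pairs, READ_LENGTH_BP, GENOME_SIZE_BP))  # 102.0
```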

PacBio long-read DNA sequencing

High-molecular-weight DNA was extracted from the blood sample of the selected female WT individual using NANOBIND® DNA Extraction Kits (PacBio, CA, USA) following the vendor’s instructions. DNA fragments of about 25 kb were size-selected using a BluePippin system (Sage Science, MA, USA). Sequencing libraries were prepared from the size-selected fragments using SMRTbell® prep kits (PacBio, CA, USA) following the vendor’s instructions and subsequently sequenced on a PacBio Sequel II platform (PacBio, CA, USA) at 91X coverage.

Transcriptome sequencing

One to two grams of each tissue (Heart, Liver, Spleen, Lung, Kidney, Pancreas, Gizzard, Glandular, Crops, Ovary, Abdominal fat, Rectum, Duodenum, Cecum, Skin, Small intestine, Brain, Cerebellum, Chest muscle, Leg muscle) were collected from the selected female WT individual into centrifuge tubes, immediately frozen in liquid nitrogen and stored at −80 °C until use. Total RNA was extracted from each tissue (or mixed tissues) using TRIzol reagent (TIANGEN Biotech, Beijing, China) according to the manufacturer’s instructions. RNA-sequencing libraries for each tissue were prepared using Illumina TruSeq® RNA Library Prep Kits (Illumina, CA, USA) following the vendor’s instructions. The libraries were subjected to 150-cycle paired-end sequencing on an Illumina NovaSeq 6000 platform, yielding a total of 936,231,391 read pairs.

Hi-C sequencing

Five milliliters of blood were drawn from the wing vein of the selected WT individual into a Streck Cell-Free DNA BCT collection tube (Streck, USA), stored at 4 °C and used within 24 hours. Hi-C libraries were constructed using the Phase Genomics Animal Hi-C kit following the vendor’s instructions and subsequently sequenced on an Illumina NovaSeq 6000 platform at a sequencing depth of 81X.

Cleaning of raw sequencing reads

For the short reads, we removed possible adapters and low-quality portions using TrimGalore (https://github.com/FelixKrueger/TrimGalore), retaining reads longer than 50 bp and using a quality cutoff of 10 (length > 50, q > 10). For the long reads, we discarded reads shorter than 5,000 bp.
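The two read filters can be sketched in Python as follows. This is a simplified illustration, not the TrimGalore code: it assumes Phred+33 quality encoding, implements only 3′ quality trimming with the BWA-style running-sum rule that Cutadapt (the trimming engine wrapped by TrimGalore) applies, omits adapter removal, and assumes a read is kept when it is at least 50 bp long after trimming. The function names are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

# A FASTQ record reduced to (name, sequence, quality string); quality is Phred+33.
Read = Tuple[str, str, str]


def quality_trim_index(quals: List[int], cutoff: int) -> int:
    """Return the 3' trim position using the BWA-style running-sum rule:
    scan from the 3' end, add (cutoff - q), and cut where the sum peaks."""
    running, best, cut = 0, 0, len(quals)
    for i in range(len(quals) - 1, -1, -1):
        running += cutoff - quals[i]
        if running < 0:
            break
        if running > best:
            best, cut = running, i
    return cut


def clean_short_reads(reads: Iterable[Read], q_cutoff: int = 10,
                      min_len: int = 50) -> Iterator[Read]:
    """Quality-trim 3' ends, then drop reads shorter than min_len (adapters not handled)."""
    for name, seq, qual in reads:
        phred = [ord(c) - 33 for c in qual]
        end = quality_trim_index(phred, q_cutoff)
        if end >= min_len:
            yield name, seq[:end], qual[:end]


def filter_long_reads(reads: Iterable[Read], min_len: int = 5000) -> Iterator[Read]:
    """Discard long reads shorter than min_len bp."""
    return (r for r in reads if len(r[1]) >= min_len)


# Usage: a 60-bp read whose last 5 bases have Phred 2 ('#'); other bases are Phred 40 ('I').
short = [("r1", "ACGTACGTAC" * 6, "I" * 55 + "#" * 5)]
cleaned = list(clean_short_reads(short))
print(len(cleaned[0][1]))  # 55

long_reads = [("a", "A" * 4999, "I" * 4999), ("b", "C" * 5000, "I" * 5000)]
print([name for name, _, _ in filter_long_reads(long_reads)])  # ['b']
```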

Quality assessment of sequencing data

We used FastQC (0.12.1) (http://www.bioinformatics.babraham.ac.uk/projects/fastqc) to assess the quality of all types of sequencing data from the WT and BL individuals.

Contig assembly and scaffolding

We assembled contigs from the PacBio long reads using wtdbg2 (2.5)⁹ (parameters: -x ccs -g 1g -X 98 -e 6) and polished the contigs with the Illumina DNA short reads of the WT individual, also using wtdbg2 (2.5)⁹ (default settings). We then used SALSA^10,11 to scaffold the contigs with the Hi-C short reads (parameters: -e AAGCTT -m yes -i 4 -s 1000000000 -c 500). We filled the gaps in the scaffolds using PBJelly¹² with the PacBio long reads (parameters: --minMatch 8 --minPctSimilarity 70 --bestn 1 --nCandidates 20 --maxScore -500), and then performed two rounds of polishing: first with Racon (1.4.21)¹³ using the PacBio long reads (default settings), and second with NextPolish (1.4.0)¹⁴ using the Illumina DNA short reads of the selected WT individual (default settings).
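To make the order of the contig and scaffolding steps concrete, the hypothetical Python helper below only builds (it does not run) command lines for the wtdbg2 layout, wtpoa-cns consensus and SALSA2 scaffolding steps, passing through the parameters reported above. File names are placeholders, and the input/output flags (-i and -fo for wtdbg2/wtpoa-cns; -a, -l, -b and -o for SALSA2's run_pipeline.py) follow those tools' README examples as we understand them, so they should be checked against the installed versions. Short-read polishing, PBJelly gap filling, Racon and NextPolish are omitted; the polished contig file fed to SALSA2 is an assumed name.

```python
import shlex
from typing import List


def contig_and_scaffold_commands(hifi_reads: str, hic_bed: str,
                                 prefix: str = "wt") -> List[List[str]]:
    """Build, but do not execute, the reported assembly commands (dry run)."""
    polished = f"{prefix}.polished.fa"  # hypothetical output of short-read polishing
    return [
        # 1. Contig layout from PacBio CCS reads with the reported settings
        #    (-g 1g: expected genome size of ~1 Gb).
        ["wtdbg2", "-x", "ccs", "-g", "1g", "-X", "98", "-e", "6",
         "-i", hifi_reads, "-fo", prefix],
        # 2. Consensus sequences from the contig layout.
        ["wtpoa-cns", "-i", f"{prefix}.ctg.lay.gz", "-fo", f"{prefix}.raw.fa"],
        # 3. Hi-C scaffolding with SALSA2; reported parameters passed through verbatim.
        ["python", "run_pipeline.py", "-a", polished, "-l", f"{polished}.fai",
         "-b", hic_bed, "-o", f"{prefix}_salsa",
         "-e", "AAGCTT", "-m", "yes", "-i", "4", "-s", "1000000000", "-c", "500"],
    ]


for cmd in contig_and_scaffold_commands("hifi_reads.fa.gz", "hic_alignments.bed"):
    print(shlex.join(cmd))
```

Keeping the commands as argument lists (rather than shell strings) avoids quoting problems if they are later passed to `subprocess.run`.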

Quality evaluation of the assembly

We masked repeats in the WT assembly using WindowMasker (2.11.0)¹⁵ to estimate the repeat rate, and estimated the heterozygosity of the genome using Jellyfish (2.3.0)¹⁶ and GenomeScope¹⁷. To assess contiguity, we used QUAST (5.0.2)¹⁸ to calculate the contig N50 and scaffold N50. To assess structural accuracy, we used Asset (https://github.com/dfguan/asset) to calculate the reliable block N50, and BUSCO (5.1.3)⁸ against the aves_odb10 database to calculate the false duplication rate. To assess base accuracy, we used Merqury (1.3)¹⁹ to calculate the k-mer QV (k = 17) and k-mer completeness, used BWA (0.7.17)²⁰ to map the short reads of the selected WT individual to the assembly, and used SAMtools (1.10)²¹ to analyze the mapping results. To assess functional completeness, we used BUSCO (5.1.3)⁸ against the aves_odb10 database. To plot the Hi-C contact heatmap of the scaffolds, we mapped the Hi-C paired-end short reads to the assembly using BWA (0.7.17)²⁰, processed the alignments with SAMtools (1.10)²¹ and Pairtools (0.3.0) (https://github.com/open2c/pairtools), and visualized the heatmap with HiGlass²². Unless otherwise stated, default settings were used for each tool.
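The contiguity metrics can be illustrated with a short sketch. N50 is the length of the sequence at which the cumulative length, summing from the longest sequence down, first reaches half of the total assembly length. The sketch assumes that gaps are represented as runs of N in the scaffold FASTA (a common convention) and splits scaffolds at any such run; QUAST's own handling of N runs may differ. The toy sequences are hypothetical.

```python
import re
from typing import Dict, List

GAP_RUN = re.compile(r"[Nn]+")  # assumed gap representation: one or more N/n bases


def n50(lengths: List[int]) -> int:
    """Length at which the running total (longest first) first reaches half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def split_into_contigs(scaffolds: Dict[str, str]) -> List[str]:
    """Break every scaffold at N runs to recover its contigs."""
    return [piece for seq in scaffolds.values() for piece in GAP_RUN.split(seq) if piece]


def count_gaps(scaffolds: Dict[str, str]) -> int:
    """Number of N runs across all scaffolds."""
    return sum(len(GAP_RUN.findall(seq)) for seq in scaffolds.values())


# Toy example: one scaffold with a single 10-bp gap, plus a gap-free scaffold.
scaffolds = {
    "scaf1": "ACGT" * 5 + "N" * 10 + "ACGT" * 10,  # 20 bp + gap + 40 bp = 70 bp
    "scaf2": "ACGT" * 3,                           # 12 bp
}
contigs = split_into_contigs(scaffolds)
print(sorted(len(c) for c in contigs))            # [12, 20, 40]
print(n50([len(c) for c in contigs]))             # 40 (contig N50)
print(n50([len(s) for s in scaffolds.values()]))  # 70 (scaffold N50)
print(count_gaps(scaffolds))                      # 1
```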

Data Records

The Illumina DNA paired-end short reads, PacBio long reads, Hi-C paired-end short reads and RNA-seq paired-end short reads of the different tissues of the selected WT individual are available at NCBI SRA under accession number PRJNA956489²³. The re-sequencing paired-end short reads of the other 10 WT individuals are available at NCBI SRA under accession number PRJNA956570²⁴. The re-sequencing paired-end short reads of the BL individual are available at NCBI SRA under accession number PRJNA1039024²⁵. The assembled genome of the WT individual is available at GenBank under accession number PRJNA956505 (assembly GCA_036346035.1)²⁶.

Technical Validation

Quality evaluation of the sequencing data

We generated Illumina DNA paired-end short reads, PacBio long reads and Hi-C paired-end short reads for a female WT individual. As shown in Table 1, the Illumina DNA paired-end short reads have a read length of 150 bp and a sequencing depth of 102X; the PacBio long reads have an average read length of 10 kbp and a sequencing depth of 91X; and the Hi-C paired-end short reads have a read length of 150 bp and a sequencing depth of 81X. In addition, we sequenced the transcriptomes of 20 tissues (Heart, Liver, Spleen, Lung, Kidney, Pancreas, Gizzard, Glandular, Crops, Ovary, Abdominal fat, Rectum, Duodenum, Cecum, Skin, Small intestine, Brain, Cerebellum, Chest muscle, Leg muscle) of the WT individual and re-sequenced another 10 WT individuals. As shown in Table 1, the RNA-seq reads have a read length of 150 bp, with a total of 936,231,391 read pairs. The re-sequencing reads of the 10 WT individuals have a read length of 150 bp and an average sequencing depth of 31X, and those of the BL individual have a read length of 150 bp and a sequencing depth of 52X.

Table 1 Summary of raw sequencing data.


Figure 1a–e shows the quality assessment of the different sequencing data. All the Illumina DNA paired-end short reads, Hi-C paired-end reads, RNA-seq reads and re-sequencing reads (WT and BL) have Phred quality scores greater than 35 (Fig. 1a–d), corresponding to a base-call accuracy greater than 99.9% (https://www.illumina.com/documents/products/technotes/technote_Q-Scores.pdf); hence these reads are of very high quality. As the PacBio long reads do not come with Phred quality scores, we evaluated their quality using their length distribution. As shown in Fig. 1e, the PacBio long reads have an average length of about 10 kbp, indicating that they are of high quality.
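The link between a Phred score and base-call accuracy is the standard definition Q = -10·log10(P), where P is the probability that a base call is wrong. The sketch below applies it to the Q35 threshold mentioned above and also shows how FASTQ quality characters map to scores, assuming the Phred+33 encoding used by current Illumina output.

```python
import math


def phred_to_error(q: float) -> float:
    """Probability of an incorrect base call for Phred score q: P = 10^(-q/10)."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred score for error probability p: Q = -10 * log10(p)."""
    return -10 * math.log10(p)


def mean_phred(quality_string: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    scores = [ord(c) - offset for c in quality_string]
    return sum(scores) / len(scores)


accuracy_q35 = 1 - phred_to_error(35)
print(f"{accuracy_q35:.3%}")         # 99.968%
print(round(error_to_phred(0.001)))  # 30
print(mean_phred("IIII?"))           # 38.0 ('I' = Q40, '?' = Q30)
```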

Evaluation of the quality of the assembled WT genome

Using the short and long sequencing reads, we assembled the genome of the WT individual into 805 contigs with a contig N50 of 19.63 Mb and a total contig length of 1.02 Gb, comparable to those of the chicken (Gallus gallus) genome assemblies GRCg6a and GRCg7b/w and to the previously assembled BR genome⁷ (1.01 Gb) (Table 2). Using the Hi-C paired-end short reads, we further scaffolded the contigs into 643 scaffolds with a scaffold N50 of 29.59 Mb (Table 2). We assessed the quality of the assembly using the criteria proposed by the Vertebrate Genomes Project (VGP) consortium²⁷ and compared it with the chicken assemblies GRCg6a and GRCg7b/w, the best-studied bird genomes. These criteria cover genome features (heterozygosity and repeat rates), contiguity (assembly size, N50 and gaps), structural accuracy (reliable block N50 and false duplication rate), base accuracy (k-mer QV, k-mer completeness and short-read mapping rate) and functional completeness (BUSCO completeness) (Table 2).

Table 2 Evaluation of the quality of the assembled WT individual genome.


The heterozygosity rate of the assembled WT genome is 0.54% and its repeat rate is 20.6%, both comparable to those of the GRCg6a and GRCg7b/w assemblies (Table 2). In terms of contiguity, the contig N50 of the assembly (19.6 Mb) is slightly larger than those of the GRCg6a (17.7 Mb) and GRCg7b/w (18.8/17.7 Mb) assemblies. The scaffold N50 (29.6 Mb) is larger than that of the GRCg6a assembly (20.8 Mb) but smaller than those of the GRCg7b/w assemblies (90.9/90.6 Mb) (Table 2). Our assembly contains only six gaps, far fewer than the GRCg6a (500,945) and GRCg7b/w (463/409) assemblies (Table 2), indicating that it is almost gapless. In terms of structural accuracy, the reliable block N50 of our assembly (14.6 Mb) is comparable to those of avian genomes recently assembled by the VGP consortium²⁷, and its false duplication rate⁸ (0.3%) is slightly lower than those of the GRCg6a (0.4%) and GRCg7b/w (0.4%) assemblies (Table 2), indicating very high structural accuracy. In terms of base accuracy, the k-mer QV of our assembly is 42.0, suggesting that the consensus base accuracy is greater than 99.99%¹⁹ (Table 2). The k-mer completeness of our assembly (defined as the fraction of reliable k-mers in highly accurate short-read data that are also found in the assembly¹⁹) is 95.3% (Table 2), comparable to those of recent VGP assemblies²⁷. To further evaluate base accuracy, we mapped the Illumina short reads of the WT individual to the assembly and found that 99.3% of the reads could be mapped (Table 2), suggesting high base accuracy. In terms of functional completeness, our assembly has a higher BUSCO completeness value⁸ (97.2%) than the GRCg6a (96.6%) and GRCg7b/w (96.6%/96.8%) assemblies (Table 2), suggesting high functional completeness.
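The QV and completeness figures follow the k-mer definitions in the Merqury paper¹⁹: when some of the assembly's k-mers never occur in the read set, the probability that a base is correct is estimated as (1 - asm_only/total)^(1/k), the error rate E is one minus that, and QV = -10·log10(E). The sketch below encodes these formulas with hypothetical k-mer counts (not the values behind Table 2), together with the QV-to-accuracy conversion underlying the ">99.99%" statement for QV 42.0.

```python
import math


def merqury_style_qv(asm_only_kmers: int, total_asm_kmers: int, k: int = 17) -> float:
    """QV from k-mers found in the assembly but not in the reads."""
    if asm_only_kmers == 0:
        return math.inf  # no unsupported k-mers: estimated error rate is zero
    p_correct_base = (1 - asm_only_kmers / total_asm_kmers) ** (1 / k)
    error_rate = 1 - p_correct_base
    return -10 * math.log10(error_rate)


def kmer_completeness(solid_read_kmers_in_asm: int, solid_read_kmers: int) -> float:
    """Fraction of reliable (solid) read k-mers that also occur in the assembly."""
    return solid_read_kmers_in_asm / solid_read_kmers


def qv_to_accuracy(qv: float) -> float:
    """Consensus base accuracy implied by a QV: 1 - 10^(-QV/10)."""
    return 1 - 10 ** (-qv / 10)


# Hypothetical counts: 1,000 of 1,000,000 assembly 17-mers are absent from the reads.
print(round(merqury_style_qv(1_000, 1_000_000), 1))  # 42.3
print(f"{qv_to_accuracy(42.0):.4%}")                  # 99.9937%
print(kmer_completeness(953, 1_000))                  # 0.953
```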
To further check whether our assembly is at the chromosome level, we plotted the Hi-C interaction heatmap of the scaffolds. As shown in Fig. 1f, almost all the scaffolds form squares along the diagonal of the heatmap, indicating that our assembly is at the chromosome level, although we lack genetic markers to assign the scaffolds to specific chromosomes.
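The diagonal squares in a Hi-C heatmap arise because most read pairs link positions on the same scaffold (cis contacts). As an illustration only, the sketch below parses records in the 4DN .pairs text layout that Pairtools writes (readID, chrom1, pos1, chrom2, pos2, ...; header lines start with '#'), reports the cis fraction, and bins contacts into a sparse matrix. The records are made up, and this is not how HiGlass renders the figure.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

Contact = Tuple[str, int, str, int]  # (scaffold1, pos1, scaffold2, pos2)


def parse_pairs(lines: Iterable[str]) -> Iterator[Contact]:
    """Read contacts from .pairs-style lines: readID chrom1 pos1 chrom2 pos2 ..."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.split()
        yield fields[1], int(fields[2]), fields[3], int(fields[4])


def cis_fraction(contacts: Iterable[Contact]) -> float:
    """Share of contacts whose two ends fall on the same scaffold."""
    total = cis = 0
    for s1, _, s2, _ in contacts:
        total += 1
        cis += s1 == s2
    return cis / total if total else 0.0


def bin_contacts(contacts: Iterable[Contact], bin_size: int) -> Counter:
    """Sparse, symmetric contact counts keyed by an ordered pair of (scaffold, bin)."""
    counts: Counter = Counter()
    for s1, p1, s2, p2 in contacts:
        a, b = (s1, p1 // bin_size), (s2, p2 // bin_size)
        counts[(a, b) if a <= b else (b, a)] += 1
    return counts


example = """## pairs format v1.0
#columns: readID chrom1 pos1 chrom2 pos2 strand1 strand2
r1 scaf1 100 scaf1 5000 + -
r2 scaf1 200 scaf1 300 + +
r3 scaf2 10 scaf2 20 - -
r4 scaf1 1 scaf2 2 + -
""".splitlines()

contacts = list(parse_pairs(example))
print(cis_fraction(contacts))  # 0.75
print(bin_contacts(contacts, 1000)[(("scaf1", 0), ("scaf1", 5))])  # 1
```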

Usage Notes

Our almost gapless assembly of the WT genome can be used together with other high-quality bird genome assemblies to study many important questions in bird biology and evolution. The WT genome can be compared with the previously assembled BR genome⁷ to reveal the genetic bases of their adaptation to high- and low-altitude niches, respectively. The WT genome can also be used as a reference to call single nucleotide variants in populations of the Crossoptilon species, thereby identifying selective sweeps in their genomes. The RNA-seq data generated from the 20 tissues of the WT individual can be used to annotate its assembled genome and the previously assembled BR genome⁷, as well as in other functional analyses of the species. The WT and BL re-sequencing data, together with other available re-sequencing data from BR and BL populations⁷, can be used to identify signatures of natural selection in the WT, BL and BR genomes and thus reveal the genetic bases of their adaptation to high-, intermediate- and low-altitude niches, respectively. These data can also be used to reveal genetic changes during population bottlenecks of these endangered species, as previously demonstrated⁷, which can inform more effective conservation strategies for these endangered and valuable species.

Code availability

All genome assembly code and the corresponding pipeline description are available at https://github.com/zhengchangsulab/A-genome-assebmly-and-annotation-pipeline.

References

1. Li, X., Huang, Y. & Lei, F. Comparative mitochondrial genomics and phylogenetic relationships of the Crossoptilon species (Phasianidae, Galliformes). BMC Genomics 16, 42, https://doi.org/10.1186/s12864-015-1234-9 (2015).
2. Xin, L., Guangmei, Z. & Binyuan, G. A preliminary investigation on taxonomy, distribution and evolutionary relationship of the eared pheasants, Crossoptilon. Dong Wu Xue Bao [Acta Zoologica Sinica] 44, 131–137 (1998).
3. Zheng, Z. A complete checklist of species and subspecies of the Chinese birds. (Science Press, 1994).
4. Aiping, W., Wei, D., Zhengwang, Z. & Xiangjiang, Z. Phylogenetic relationships of the avian genus Crossoptilon. Dong Wu Xue Bao [Acta Zoologica Sinica] 51, 898–902 (2005).
5. Li, X. et al. Assessment of genetic diversity in Chinese eared pheasant using fluorescent-AFLP markers. Mol Phylogenet Evol 57, 429–433, https://doi.org/10.1016/j.ympev.2010.05.024 (2010).
6. Ren, Q. et al. Complete mitochondrial genome of the Blue Eared Pheasant, Crossoptilon auritum (Galliformes: Phasianidae). Mitochondrial DNA A DNA Mapp Seq Anal 27, 615–617, https://doi.org/10.3109/19401736.2014.908371 (2016).
7. Wang, P. et al. Genomic Consequences of Long-Term Population Decline in Brown Eared Pheasant. Mol Biol Evol 38, 263–273, https://doi.org/10.1093/molbev/msaa213 (2021).
8. Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212, https://doi.org/10.1093/bioinformatics/btv351 (2015).
9. Ruan, J. & Li, H. Fast and accurate long-read assembly with wtdbg2. Nat Methods 17, 155–158, https://doi.org/10.1038/s41592-019-0669-3 (2020).
10. Ghurye, J., Pop, M., Koren, S., Bickhart, D. & Chin, C. S. Scaffolding of long read assemblies using long range contact information. BMC Genomics 18, 527, https://doi.org/10.1186/s12864-017-3879-z (2017).
11. Ghurye, J. et al. Integrating Hi-C links with assembly graphs for chromosome-scale assembly. PLoS Comput Biol 15, e1007273, https://doi.org/10.1371/journal.pcbi.1007273 (2019).
12. English, A. C. et al. Mind the gap: upgrading genomes with Pacific Biosciences RS long-read sequencing technology. PLoS One 7, e47768, https://doi.org/10.1371/journal.pone.0047768 (2012).
13. Vaser, R., Sović, I., Nagarajan, N. & Šikić, M. Fast and accurate de novo genome assembly from long uncorrected reads. Genome Res 27, 737–746, https://doi.org/10.1101/gr.214270.116 (2017).
14. Hu, J., Fan, J., Sun, Z. & Liu, S. NextPolish: a fast and efficient genome polishing tool for long-read assembly. Bioinformatics 36, 2253–2255, https://doi.org/10.1093/bioinformatics/btz891 (2020).
15. Morgulis, A., Gertz, E. M., Schäffer, A. A. & Agarwala, R. WindowMasker: window-based masker for sequenced genomes. Bioinformatics 22, 134–141, https://doi.org/10.1093/bioinformatics/bti774 (2006).
16. Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770, https://doi.org/10.1093/bioinformatics/btr011 (2011).
17. Vurture, G. W. et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 33, 2202–2204, https://doi.org/10.1093/bioinformatics/btx153 (2017).
18. Gurevich, A., Saveliev, V., Vyahhi, N. & Tesler, G. QUAST: quality assessment tool for genome assemblies. Bioinformatics 29, 1072–1075, https://doi.org/10.1093/bioinformatics/btt086 (2013).
19. Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol 21, 245, https://doi.org/10.1186/s13059-020-02134-9 (2020).
20. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760, https://doi.org/10.1093/bioinformatics/btp324 (2009).
21. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079, https://doi.org/10.1093/bioinformatics/btp352 (2009).
22. Kerpedjiev, P. et al. HiGlass: web-based visual exploration and analysis of genome interaction maps. Genome Biol 19, 125, https://doi.org/10.1186/s13059-018-1486-1 (2018).
23. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP432961 (2023).
24. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP433016 (2023).
25. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP471294 (2023).
26. Wu, S. et al. Genome assembly of a white eared pheasant individual. GenBank https://identifiers.org/insdc.gca:GCA_036346035.1 (2023).
27. Rhie, A. et al. Towards complete and error-free genome assemblies of all vertebrate species. Nature 592, 737–746, https://doi.org/10.1038/s41586-021-03451-0 (2021).


Acknowledgements

This work was supported by the National Natural Science Foundation of China (U2002205 and U1702232), Yunling Scholar Training Program of Yunnan Province (2014NO48), Yunling Industry and Technology Leading Talent Training Program of Yunnan Province (YNWR-CYJS-2015-027), Natural Science Foundation of Yunnan Province (2019IC008 and 2016ZA008), and Department of Bioinformatics and Genomics of the University of North Carolina at Charlotte.

Author information

These authors contributed equally: Siwen Wu, Kun Wang, Tengfei Dou.

Authors and Affiliations

Department of Bioinformatics and Genomics, College of Computing and Informatics, the University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
Siwen Wu, Sisi Yuan & Zhengchang Su
Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming, 650201, Yunnan, China
Kun Wang, Tengfei Dou, Changrong Ge & Junjing Jia
State Key Laboratory of Genetic Resources and Evolution/Key Laboratory of Healthy Aging Research of Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
Kun Wang & Dong-Dong Wu


Contributions

C.G., J.J. and Z.S. conceived and supervised the project; K.W., T.D. and D.W. collected tissue samples and conducted molecular biology experiments; S.W. and S.Y. assembled the genome; S.W. and Z.S. performed data analysis; and S.W. and Z.S. wrote the manuscript.

Corresponding authors

Correspondence to Changrong Ge, Junjing Jia or Zhengchang Su.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Wu, S., Wang, K., Dou, T. et al. High-quality genome assembly of a C. crossoptilon and related functional and genetics data resources. Sci Data 11, 247 (2024). https://doi.org/10.1038/s41597-024-03087-5


Received: 04 December 2023
Accepted: 21 February 2024
Published: 27 February 2024
DOI: https://doi.org/10.1038/s41597-024-03087-5