High quality assemblies of four indigenous chicken genomes and related functional data resources

Wu, Siwen; Wang, Kun; Dou, Tengfei; Yuan, Sisi; Yan, Shixiong; Xu, Zhiqiang; Liu, Yong; Jian, Zonghui; Zhao, Jingying; Zhao, Rouhan; Zi, Xiannian; Gu, Dahai; Liu, Lixian; Li, Qihua; Wu, Dong-Dong; Jia, Junjing; Su, Zhengchang; Ge, Changrong

doi:10.1038/s41597-024-03126-1

Download PDF

Data Descriptor
Open access
Published: 15 March 2024

High quality assemblies of four indigenous chicken genomes and related functional data resources

Siwen Wu^1,2^na1,
Kun Wang ORCID: orcid.org/0000-0001-9681-1593¹^na1,
Tengfei Dou¹^na1,
Sisi Yuan²^na1,
Shixiong Yan¹^na1,
Zhiqiang Xu¹,
Yong Liu¹,
Zonghui Jian¹,
Jingying Zhao¹,
Rouhan Zhao¹,
Xiannian Zi¹,
Dahai Gu¹,
Lixian Liu¹,
Qihua Li¹,
Dong-Dong Wu ORCID: orcid.org/0000-0001-7101-7297^3,4,
Junjing Jia¹^na2,
Zhengchang Su ORCID: orcid.org/0000-0003-4636-3440²^na2 &
…
Changrong Ge¹^na2

Scientific Data volume 11, Article number: 300 (2024) Cite this article

937 Accesses
2 Altmetric
Metrics details

Subjects

An Author Correction to this article was published on 08 April 2024

This article has been updated

Abstract

Many lines of evidence indicate that red jungle fowl (RJF) is the primary ancestor of domestic chickens. Although multiple versions of RJF (galgal2-galgal5 and GRCg6a) and commercial chickens (GRCg7b/w and Huxu) genomes have been assembled since 2004, no high-quality indigenous chicken genomes have been assembled, hampering the understanding of chicken domestication and evolution. To fill the gap, we sequenced the genomes of four indigenous chickens with distinct morphological traits in southwest China, using a combination of short, long and Hi-C reads. We assembled each genome (~1.0 Gb) into 42 chromosomes with chromosome N50 90.5–90.9 Mb, amongst the highest quality of chicken genome assemblies. To provide resources for gene annotation and functional analysis, we also sequenced transcriptomes of 10 tissues for each of the four chickens. Moreover, we corrected many mis-assemblies and assembled missing micro-chromosomes 29 and 34–39 for GRCg6a. Our assemblies, sequencing data and the correction of GRCg6a can be valuable resources for studying chicken domestication and evolution.

Genome assembly in the telomere-to-telomere era

Article 22 April 2024

Emx2 underlies the development and evolution of marsupial gliding membranes

Article Open access 24 April 2024

Complexity of avian evolution revealed by family-level genomes

Article 01 April 2024

Background & Summary

Evidence shows that red jungle fowl (RJF) (Gallus gallus) is the primary ancestor of domestic chickens (Gallus gallus domesticus) all over the world¹. Since the release of the initial draft genome assembly (galgal2) of an RJF individual², multiple improved assemblies (galgal3-galgal5 and GRCg6a) have been developed^3,4. More recently, the Vertebrate Genomes Project (VGP) also assembled pseudo-haplotype genomes (GRCg7b and GRCg7w) of a hybrid individual from a broiler mother and a layer father using long sequencing reads and multiple scaffolding data^5,6. There are also several assemblies for indigenous chickens deposited in GenBank, such as Yeonsan Ogye chicken (Ogye1.0) and Tibetan chicken (ASM2537063v1). However, with a contig N50 < 1 M and lack of chr29 and chr34–39, their quality is quite low. Li, et al. have assembled a so-called pan-genome for 20 chicken breeds using short, long and Hi-C reads⁷. However, with a contig N50 of 5.89–16.72 Mb and a complete BUSCO⁸ value of 92.4%-95.3%, this assembly does not allow the identification of subtle differences in gene compositions of chickens. Besides, they failed to assemble many micro-chromosomes such as 29 and 34–39. Recently, gap-less genome assemblies for a Chinese broiler (Huxu chicken)⁹ and a Silkie chicken¹⁰ have been made using PacBio HiFi, Oxford Nanopore long reads and multiple scaffolding data. In another effort, Rice, et al. generated a pangenome graph based on 30 assembled chicken genomes, including the GRCg7b/w and Huxu assemblies, among others¹¹. Although these assemblies provide a good foundation for understanding various aspects of chicken biology and guiding poultry breeding, more high-quality assemblies for indigenous chickens (traditionally domesticated village chickens) are necessary to further the understanding of chicken domestication and evolution.

To fill the gaps, we sequenced and assembled genomes of four individual indigenous chickens of Daweishan, Hu, Piao and Wuding breeds from Yunnan province in southwestern China, one of the major geographical places where the domesticated chickens originated¹. These chicken breeds have been developed by less-intensive traditional family-based artificial selection in villages in isolated mountainous areas in the province since 2000–6000 BC¹². Each breed possesses distinct morphological traits: Daweishan chickens have a miniature body size (0.5–0.8 kg for female and 0.8–1.2 kg for male adults); Hu chickens have a large body size (3 kg for female and 6 kg for male adults) with extraordinarily stout legs; Piao chickens have a short tail (a rumpless phenotype); and Wuding chickens have a middle-sized body and are good at running. Using a combination of Illumina short, PacBio or Oxford Nanopore long and Hi-C reads, we assembled each genome at the chromosome-level with a contig N50 of 16.2–25.1 Mb, chromosome N50 of 90.5–90.9 Mb and a complete BUSCO value of 96.5%-96.7%. Evaluations from multiple aspects of criteria proposed by the Vertebrate Genome Project⁵ suggest that our assemblies are amongst the highest quality of chicken genome assemblies. To provide resources for gene annotation and functional analysis of indigenous chickens, we also sequenced the transcriptomes of 10 tissues for each of the four indigenous chickens. In addition, we corrected many mis-assemblies in micro-chromosomes (31–33) and assembled micro-chromosomes 29 and 34–39 for RJF (GRCg6a), providing a more complete assembly for the primary ancestor of domestic chickens. Our assembled high-quality indigenous chicken genomes, related functional data, and corrections of the GRCg6a assembly can be valuable resources for the community to reveal the genetic basis of the important and interesting traits of these chicken breeds as well as to study the evolution and domestication process of chickens.

Methods

Chickens

A female individual of Daweishan chicken aged 10 months, a female individual of Hu chicken aged 7 months, a female individual of Piao chicken aged 10 months and a female individual of Wuding chicken aged 10 months were collected from corresponding chicken breed populations from the Experimental Breeding Chicken Farm of the Yunnan Agricultural University (Yunnan, China). Each individual chicken was subject to Illumina short-reads DNA sequencing, PacBio or Oxford Nanopore long-reads sequencing, Hi-C sequencing and Illumina short-reads RNA sequencing.

Ethics statement

All the experimental procedures were approved by the Animal Care and Use Committee of the Yunnan Agricultural University (approval ID: YAU202103047). The care and use of animals fully complied with local animal welfare laws, guidelines, and policies.

Short-reads DNA sequencing

Two milliliters of blood were drawn from the wing vein of each chicken in a centrifuge tube containing anticoagulant (EDTA-2K) and stored at −80 °C until use. Genomic DNA (10 µg) in each blood sample was extracted using a DNA extraction kit (DP326, TIANGEN Biotech, Beijing, China) and fragmented using a Bioruptor Pico System (Diagenode, Belgium). DNA fragments around 350 bp were selected using SPRI beads (Beckman Coulter, IN, USA). DNA-sequencing libraries were prepared using Illumina TruSeq® DNA Library Prep Kits (Illumina, CA, USA) following the vendor’s instructions. The libraries were subject to 150 cycles paired-end sequencing on an Illumina Novaseq 6000 platform (Illumina, CA, USA) at 100X coverage.

PacBio long-reads sequencing

Two milliliters of blood were drawn from the wing vein of a female Hu chicken (H3) in a centrifuge tube with anticoagulant (EDTA-2K) and stored at −80 °C until use. High molecular weight DNA was extracted from each blood sample using NANOBIND® DNA Extraction Kits (PacBio, CA, USA) following the vendor’s instructions. DNA fragments of about 25 kb were size-selected using a BluePippin system (Sage Science, MA, USA). Sequencing libraries were prepared for the DNA fragments using SMRTbell® prep kits (PacBio, CA, USA) following the vendor’s instructions, and subsequently sequenced on a PacBio Sequel II platform (PacBio, CA, USA) at 36X coverage.

Oxford Nanopore long-reads sequencing

Two milliliters of blood were drawn from the wing vein of a female Daweishan (F025), Piao (P17) or Wuding (W17) chicken in a centrifuge tube with anticoagulant (EDTA-2K) and stored at −80°C until use. High molecular weight DNA from each blood sample was prepared using Ultra-Long Sequencing Kits (Oxford Nanopore Technology (ONT), Oxford, UK) following the vendor’s instructions. The integrity of DNA was determined using pulsed field electrophoresis. DNA fragments of about 20 kb were size-selected using a BluePippin system (Sage Science, MA, USA). Sequencing libraries were prepared for the DNA fragments using ONT Template prep kit (SQK-LSK109) and NEB Next FFPE DNA Repair Mix kit, following the vendors’ instructions. The libraries were sequenced on a Nanopore PromethION P48 platform (ONT, Oxford, UK) at ~100X coverage using ONT sequencing kits (EXP-FLP001.PRO.6).

Hi-C sequencing

Five milliliter of blood were drawn from the wing vein of the selected Daweishan (F025), Piao (P17), Hu chicken (H3) or Wuding (W17) chickens in a Streck Cell-free DNA BCT collecting vessel (Streck Corporate, USA), and stored at 4 °C and used in 24 hours. Hi-C libraries were constructed using Phase Genomics’ Animal Hi-C kit following the vendor’s instructions and subsequently sequenced on an Illumina’s Novaseq 6000 platform at a sequencing depth of ~100X.

Transcriptome sequencing

One to two grams of various tissues were collected from the selected female individual chicken of each breed in a centrifuge tube and immediately frozen in liquid nitrogen, then stored at −80 °C until use. Total RNA from each tissue sample were extracted using TRlzol reagents (TIANGEN Biotech, Beijing China) according to the manufacturer’s instructions. RNA-sequencing libraries for each tissue as well as for the mixture of all the tissues collected from a chicken were prepared using Illumina TruSeq® RNA Library Prep Kits (Illumina, San Diego) following the vendor’s instructions. The libraries were subject to 150 cycles paired-end sequencing on an Illumina Novaseq 6000 platform at a sequencing depth of 80X. Additional individual chicken was randomly selected from each breed population and the same types of tissues were collected. Total RNA from each tissue sample were extracted in the same way as described above. Equal weight of total RNA of each tissue was mixed for preparing an RNA-seq library as described above.

Quality assessment of sequencing data

We used FastQC (0.12.1) (http://www.bioinformatics.babraham.ac.uk/projects/fastqc) to evaluate the quality of our different kinds of sequencing data. Default parameters were used in the tool.

Contig assembling

We filtered out PacBio/Oxford Nanopore long reads shorter than 5,000 bp in each library, and assembled contigs using Wtdbg (2.5)¹³ with the remaining reads for each chicken. Parameters used in the tool for each chicken are summarized in Table S1.

Scaffolding and gap filling

We bridged the contigs by 100 Ns and obtained scaffolds using SALSA^14,15 with the Hi-C sequencing reads. We filled the gaps in the scaffolds using PBJelly¹⁶ with PacBio/Oxford Nanopore long reads, and made two rounds of polishing on the resulting scaffolds for each chicken, first by using Racon (1.4.21)¹⁷ with the long reads, and second by using NextPolish (1.4.0)¹⁸ with the paired-end short reads from the same individual chicken. Parameters used in each tool for each chicken are summarized in Table S1.

Chromosome-level genome assembling

To sort the assembled scaffolds into chromosomes, we used the chromosomes of the GRCg7b assembly as templates, except for chr16 where we used that in the Huxu chicken assembly (GGswu), since this chromosome is much more completely assembled in the Huxu reference than in GRCg7b (4.9 Mb vs 2.7 Mb). Specifically, we mapped the assembled scaffolds in each chicken to the templates using blastn (2.11.0)¹⁹. The parameters used in the tool for each chicken are summarized in Table S1. We consider a scaffold as being mapped to a chromosome if the ratio of the mapped length of a scaffold over the minimum of the length of the query scaffold and the length of the target chromosome was greater than 0.5. We ordered and orientated the scaffolds based on their mappings on a template chromosome. We concatenated the scaffolds mapped to the same chromosome by 500 Ns according to their mapping orders. The remaining scaffolds that could not be mapped to the template chromosomes, were considered as unplaced scaffolds. In this way, we sorted the assembled scaffolds of the four chickens into 42 chromosomes (including two sex chromosomes and one mitochondrial chromosome).

During the mapping, we found that, in two cases, one part of a scaffold was mapped to one template chromosome while the other part mapped to another chromosome. Each section was connected by 100 Ns, suggesting that two parts of two different chromosomes were incorrectly concatenated into a scaffold by the scaffolding tool, probably due to the physical proximity of the territories of the two chromosomes in the nuclei. We thus manually split the two parts and sorted them into their corresponding mapped template chromosomes. In addition, there is no scaffold in Hu chicken’s primary assembly, which can be mapped to the mitochondrial chromosome of GRCg7b. Thus, it appears that the assembly tools did not assemble the mitochondrial chromosome for Hu chicken for some unknown reason. We therefore mapped the short reads of Hu chicken to the mitochondrial chromosome of GRCg7b, and then assembled the mitochondrial chromosome for Hu chicken using Abyss (2.2.5)²⁰ with default settings using the mapped short reads.

Quality evaluation of assemblies

We masked the repeats of the four assemblies using WindowMasker (2.11.0)²¹, and estimated the heterozygosity of each assembly using Jellyfish (2.3.0)²² and GenomeScope²³. To estimate the continuity of each assembly, we used QUAST (5.0.2)²⁴ to calculate the contig N50, scaffold N50 and chromosome N50. To estimate the structural accuracy, we used Asset (https://github.com/dfguan/asset) to calculate the reliable block N50 and used BUSCO (5.1.3)⁸ to calculate the false duplications in each assembly. To estimate the base accuracy, we used Merqury (1.3)²⁵ to calculate the k-mer QV and k-mer completeness for the four assemblies (k = 17), used BWA (0.7.17)²⁶ to map the short reads to the four assemblies, and used and SAMtools (1.10)²⁷ to analyze the mapping results. To estimate the functional completeness, we used BUSCO (5.1.3)⁸ to assess each assemblies’ completeness against the avian gene set and used STAR (2.7.0)²⁸ to map the mRNA short reads to each assembly and calculate the mRNA completeness value. To plot the heatmap of the chromosomes of each assembly, we mapped the Hi-C paired-end reads to the assembly using BWA (0.7.17)²⁶, used SAMtools (1.10)²⁷ and Pairtools (0.3.0) (https://github.com/open2c/pairtools) to analyze the mapping results, and used Higlass²⁹ to plot the heatmap for each assembly. Default parameters were used in each tool.

Correction of GRCg6a assembly

Since chromosomes 29 and 34–39 of the most recent RJF genome assembly (GRCg6a) are still missing and some assembled chromosomes might contain mis-assemblies, we assembled these missing chromosomes to provide a better reference assembly for RJF. We validated these newly assembled chromosomes using the GRCg7b reference as the template, except chr16 for which we used that in the Huxu chicken assembly (GGswu) for the aforementioned reason. Specifically, we mapped all the assembled chromosomes and unplaced contigs of GRCg6a to the templates using blastn (2.11.0)¹⁹ with the same parameters used in the four indigenous chickens. If a chromosome of GRCg6a was completely mapped to the corresponding template chromosome, then we kept it intact. If the contigs of a chromosome from GRCg6a were mapped to different template chromosomes (indicating possible mis-scaffolding), we split them between the Ns that concatenated them, and sorted the split contigs to the mapped template chromosomes. Finally, we assembled the missing chromosomes and mis-assembled ones by ordering, orientating and concatenating (by 500 Ns) the split and unplaced contigs based on their mappings on the template chromosomes.

Data Records

All the Illumina short DNA sequencing reads, PacBio or Oxford Nanopore long reads, Hi-C reads and the Illumina short RNA-seq reads of different tissues of the four indigenous chickens are available at NCBI SRA database with accession number PRJNA865263³⁰. The assembled genomes of Daweishan, Hu, Piao and Wuding chicken have passed NCBI’s quality evaluation and are available at NCBI GenBank under the BioProject number PRJNA865263^31,32,33,34. Details of each sample in the BioProject are summarized in Table S2. The corrected and assembled missing chromosomes of GRCg6a are available at the figshare database³⁵.

Technical Validation

Quality evaluation of the sequencing data of the four chicken breeds

To assemble the genome of an individual of each of the four indigenous chicken breeds, we generated Illumina paired-end short reads, PacBio or Oxford Nanopore long reads and Hi-C paired-end short reads for each selected individual. As shown in Table 1, the Illumina paired-end DNA short reads have a length 150 bp and cover from 89X in Hu chicken to 143X in Daweishan chicken genomes. The long reads in Hu chicken were generated using PacBio sequencing with an average length 12.6 kbp, covering 36X of the genome. The long reads in the other three chickens were produced using Oxford Nanopore sequencing with an average length ~12 kbp, covering about 100X of the genomes. For the Hi-C paired-end short reads, the sequencing length is 150 bp with about 100X sequencing depth for all the four chickens. To provide resources for gene annotation, we also sequenced transcriptomes of 10 tissues in each of the four chickens using paired-end RNA-seq with a sequencing length 150 bp and 257–280 million reads.

Table 1 Summary of sequencing data from the four chicken breeds.

Full size table

Figure 1 shows the results of quality assessment of these different kinds of sequencing data of Wuding chicken. All the Illumina reads including genomic paired-end short reads, Hi-C paired-end short reads and RNA-seq paired-end short reads of the Wuding chicken individual have a phred quality score greater than 35, suggesting that the base accuracy of all these reads is greater than 99.9% (https://www.illumina.com/documents/products/technotes/technote_Q-Scores.pdf). For the Oxford Nanopore long reads of the Wuding chicken individual, the peak phred quality is about 12. Similar results were obtained for the sequencing data of the other three chickens (Supplementary Figures). Of note, PacBio sequencing reads do not come with phred quality scores, thus, the long reads of Hu chicken were evaluated using length distribution, which indicates the data is of high quality.

Evaluation of the quality of the assemblies of the four indigenous chicken genomes

Using a combination of Illumina short reads (89–143X) and PacBio (36X) or Oxford Nanopore (101-110X) long reads (Table 1), we assembled the genomes of a female individual of the Daweishan, Hu, Piao and Wuding chicken into 462–1,364 contigs (Table S3) with a contig N50 of 23.0, 16.2, 25.1 and 21.5 Mb, respectively. Hu chickens’ smaller contig N50 (16.2 Mb) might be due to the shorter PacBio reads (50 bp-90.8 kbp) and a shallower sequencing depth (36X) than those of Oxford Nanopore reads (Table 1). The contig N50 values of the Daweishan, Piao and Wuding assemblies are larger than those of the GRCg6a, GRCg7b and GRCg7w assemblies (17.7–18.8 Mb) (Table 2). The total length of the contigs (>1 Gb) for each chicken is comparable with those of GRCg6a, GRCg7b and GRCg7w assemblies (Table 2). We scaffolded the contigs using Hi-C reads (102-112X) for each chicken (Table 1), resulting in 308-1,088 scaffolds, with a scaffold N50 of 74.3, 28.9, 62.8 and 71.1 Mb for the Daweishan, Hu, Piao and Wuding chickens, respectively (Table 2). Using the GRCg7b chromosomes and chr16 of the Huxu chicken assembly (GGswu) as the template, we ordered and oriented the scaffolds into 39 autosomal chromosomes, two sex chromosomes W and Z and one mitochondrial genome for each chicken (Table S3), with a chromosome N50 of 90.5, 90.7, 90.5 and 90.9 Mb for the Daweishan, Hu, Piao and Wuding chickens, respectively, comparable to those of the GRCg6a and GRCg7b/w assemblies (Table 2).

Table 2 Evaluation of the four assemblies using six categories of criteria.

Full size table

Recently, the VGP consortium proposed six categories of criteria for evaluating the quality of a chromosome-level assembly, including genome (degree of heterozygosity and repeats), continuity, structural accuracy, base accuracy, functional completeness, and chromosomal assignment status⁵. We thus further evaluated the quality of each of our assembled genomes using these criteria (Table 2). For the genome evaluation, we found that Piao chicken had the highest heterozygosity of 0.9%, Hu chicken the lowest heterozygosity of 0.7%, and both Daweishan and Wuding chicken a middle heterozygosity of 0.8%. Repeats consist of from 19.9 to 20.4% of all the four assembled genomes, which are similar to those of GRCg6a (20.4%), GRCg7b (20.5%) and GRCg7w (20.2%). For the continuity evaluation, both the contig N50 (16.2–25.1 Mb) and chromosome N50 (90.5–90.9 Mb) of our assemblies are comparable with those of GRCg6a (17.7 and 91.3 Mb), GRCg7b (18.8 and 90.9 Mb) and GRCg7w (17.7 and 90.6 Mb). There are 388, 515, 506 and 312 gaps in the Daweishan, Hu, Piao and Wuding assemblies, respectively, which are substantially fewer than those in GRCg6a (500,945) and are comparable with those in GRCg7b (463) and GRCg7w (409). For the structural accuracy evaluation, we identified reliable blocks and false duplications of the assemblies. As shown in Table 2, we achieved a reliable block N50 > 13.5 Mb except for Hu chicken (2.9 Mb), and a false duplication rate of 0.3–0.4%. The values of both parameters are comparable to those of the recent VGP assemblies of 16 species of six major vertebrate lineages⁵.

For the base accuracy evaluation, we first computed the k-mer QVs of our assemblies, which is the log-scaled probability of consensus errors in the assembly²⁵. We found that the k-mer QVs of the Daweishan, Piao and Wuding chicken assemblies were greater than 41.5 and the value of the Hu chicken assembly was 38.4, suggesting that the consensus base accuracy is greater than 99.99% and 99.90% for the former three assemblies and the Hu chicken assembly²⁵, respectively, which is comparable to those obtained by the VGP assemblies⁵. We next calculated k-mer completeness, which is defined as the fraction of reliable k-mers in highly accurate short reads data that are also found in the assembly²⁵. As shown in Table 2, the k-mer completeness for all the four assemblies is greater than 92.8%, also comparable to those of the recent VGP assemblies⁵. Since our assemblies are the mosaics of the paternal and maternal homologous chromosomes that differ in heterozygous sites, to further evaluate the completeness of the assemblies, we mapped short reads from each individual chicken to its assembled genome and found that the mapping rates were greater than 99.2% for all the assemblies. These results indicate that all our four assemblies have achieved high base accuracy.

For the functional completeness evaluation, all of the four assemblies have >96.5% BUSCO completeness⁸, which are comparable to those of GRCg6a (96.6%), GRCg7b (96.6%) and GRCg7w (96.8%). We next mapped the RNA-seq reads from multiple tissues of each chicken to its assembled genome and found that the mapping rates were at least 95.0% for all the four assembled genomes except for Hu chicken (91.3%). These results indicate that the assemblies are of high completeness. For the chromosomal assignment status evaluation, although there are still some unplaced contigs in each of our assemblies, the total length (non-N bp) of the contigs assigned to chromosomes are greater than 98% for all the four assemblies, which are similar to those of GRCg6a and GRCg7b/w. Thus, most of our contigs are assigned to chromosomes.

Additionally, we plotted the Hi-C interaction heatmaps of the autosomes and sex chromosomes (W and Z) of each of the four assemblies. As shown in Fig. 2, for all the four genomes, most assembled chromosomes including the micro-chromosomes (chr11-39) form a squared box along the main diagonal of the heatmap matrix, with the exception of a few very small chromosomes such as chr31 and chr35. Moreover, the assembled chromosomes in each of the four assembled genomes display high collinearity with those of the GRCg7b reference (Fig. 3), indicating that the structures of the assembled genomes are consistent. Moreover, our assembled mitochondrial genomes of Daweishan, Hu, Piao and Wuding chickens have a length of 16.0, 16.5, 16.3 and 16.8 kbp, respectively, which are similar to those of GRCg6a (16.8 kbp), GRCg7b (16.8 kbp) and GRCg7w (16.8 kbp) (Table 2). Taken together, these results indicate that we have achieved chromosome-level assembly for all four chicken genomes with very high quality.

Assembly of missing micro-chromosomes and correction of mis-assemblies in GRCg6a

The current RJF’s reference genome (GRCg6a) lacks assemblies of micro-chromosomes 29 and 34–39. We found that contigs of these chromosomes were either mistakenly assembled into chr31, chr32 and chr33 or were unplaced (Methods). We assembled these missing chromosomes and corrected the mis-assembled chr31, chr32 and chr33 of GRCg6a using the corresponding chromosomes in the GRCg7b assembly as templates (Methods). Table 3 summarizes the contig compositions and lengths of these assembled RJF micro-chromosomes in comparison with those of the GRCg7b assembly. Interestingly, except for chr35, chr36 and chr39, these assembled RJF micro-chromosomes are much longer than their corresponding ones in GRCg7b, suggesting that they might be more complete. Chr1-28, chr30, the two sex chromosomes and the mitochondrial chromosome in GRCg6a were consistent with those of GRCg7b, and thus were kept intact. The assembled GRCg6a chromosomes also display high collinearity with those of GRCg7b (Fig. 3), indicating that the structures of the two assembled genomes are consistent.

Table 3 Contig compositions and lengths of the assembled RJF chr29, chr31-chr39 in comparison with the corresponding chromosomes of the GRCg7b assembly.

Full size table

Varying lengths and high G/C contents of assembled micro-chromosomes chr16 and chr29-39

We compared the lengths of the assembled chromosomes of the four indigenous chickens with those of RJF. As shown in Fig. 4a,the four indigenous chickens and RJF have similar lengths of all the assembled chromosomes, except for micro-chromosomes 16 and 29–39 that show highly varying lengths. These results indicate that the lengths of all the assembled macro-chromosomes (chr1-10), most micro-chromosomes (chr11-15, 17–26) and sex chromosome (chrW and chrZ) of the five chickens are consistent, and thus are sufficiently assembled. Our assemblies of chr33, chr35, chr36, chr38 and chr39 are much longer than those of RJF, but our assemblies of chr27, chr29, chr31, chr32, chr34 and chr37 are shorter than those of RJF. These micro-chromosomes have higher G/C contents than macro-chromosomes (Fig. 4b). Thus, the difficulty to better assemble these micro-chromosomes might be at least partially due to their higher G/C contents, since genome regions with high G/C content are difficult to sequence using the technologies we used to generate the DNA reads. The lengths of the assembled chr16 in Daweishan and Piao chickens are comparable to that of RJF; however, those in Hu and Wuding chickens are only 73% that of RJF (Fig. 4a,Table S3), even though we used chr16 (4.9 Mb) in Huxu chicken as the template to assemble the same chromosome in the four indigenous chickens (Methods). When a shorter chr16 (2.7 Mb) in GRCg7b was used as the template, the resulting chr16 assemblies in all the four indigenous chickens only had lengths <60% that of RJF (data not shown). Thus, in absence of accurate longer reads, chromosomes can be more accurately assembled by using more complete chromosomes as the templates. Interestingly, chr16 in Huxu chicken (4.9 Mb) is much longer than that in Silkie chicken (3.3 Mb), although chromosomes of both chickens were assembled using PacBio HiFi long reads, suggesting that different chicken breeds might have a variable chr16 that harbors highly varying repeat regions in the major histocompatibility complex (MHC)³⁶. Thus, the variable lengths of the assembled chr16 in the different chicken breeds might be due to a combination of factors including their highly varying repetitive major MHC regions, relatively more duplicated genes³⁶, and higher G/C contents (Fig. 4b). More efforts are needed in the future to completely assemble the micro-chromosomes in the indigenous chickens using new sequencing technologies.

Code availability

The genome assembly pipeline, codes and the documentation are available at https://github.com/zhengchangsulab/A-genome-assebmly-and-annotation-pipeline.

Change history

08 April 2024
A Correction to this paper has been published: https://doi.org/10.1038/s41597-024-03203-5

References

Wang, M. S. et al. 863 genomes reveal the origin and domestication of chicken. Cell Res 30, 693–701, https://doi.org/10.1038/s41422-020-0349-y (2020).
Article CAS PubMed PubMed Central Google Scholar
Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature 432, 695–716, https://doi.org/10.1038/nature03154 (2004).
Warren, W. C. et al. A New Chicken Genome Assembly Provides Insight into Avian Genome Structure. G3 (Bethesda) 7, 109–117, https://doi.org/10.1534/g3.116.035923 (2017).
Article CAS PubMed Google Scholar
Schmid, M. et al. Third Report on Chicken Genes and Chromosomes 2015. Cytogenetic and genome research 145, 78–179, https://doi.org/10.1159/000430927 (2015).
Article PubMed Google Scholar
Rhie, A. et al. Towards complete and error-free genome assemblies of all vertebrate species. Nature 592, 737–746, https://doi.org/10.1038/s41586-021-03451-0 (2021).
Article ADS CAS PubMed PubMed Central Google Scholar
Smith, J. et al. Fourth Report on Chicken Genes and Chromosomes 2022. Cytogenetic and genome research 162, 405–528, https://doi.org/10.1159/000529376 (2022).
Article PubMed Google Scholar
Li, M. et al. De Novo Assembly of 20 Chicken Genomes Reveals the Undetectable Phenomenon for Thousands of Core Genes on Microchromosomes and Subtelomeric Regions. Mol Biol Evol 39, https://doi.org/10.1093/molbev/msac066 (2022).
Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212, https://doi.org/10.1093/bioinformatics/btv351 (2015).
Article CAS PubMed Google Scholar
Huang, Z. et al. Evolutionary analysis of a complete chicken genome. Proc Natl Acad Sci USA 120, e2216641120, https://doi.org/10.1073/pnas.2216641120 (2023).
Article CAS PubMed PubMed Central Google Scholar
Zhu, F. et al. A chromosome-level genome assembly for the Silkie chicken resolves complete sequences for key chicken metabolic, reproductive, and immunity genes. Communications biology 6, 1233, https://doi.org/10.1038/s42003-023-05619-y (2023).
Article CAS PubMed PubMed Central Google Scholar
Rice, E. S. et al. A pangenome graph reference of 30 chicken genomes allows genotyping of large and complex structural variants. BMC Biol 21, 267, https://doi.org/10.1186/s12915-023-01758-0 (2023).
Article CAS PubMed PubMed Central Google Scholar
Miao, Y. W. et al. Chicken domestication: an updated perspective based on mitochondrial genomes. Heredity (Edinb) 110, 277–282, https://doi.org/10.1038/hdy.2012.83 (2013).
Article CAS PubMed Google Scholar
Ruan, J. & Li, H. Fast and accurate long-read assembly with wtdbg2. Nat Methods 17, 155–158, https://doi.org/10.1038/s41592-019-0669-3 (2020).
Article CAS PubMed Google Scholar
Ghurye, J., Pop, M., Koren, S., Bickhart, D. & Chin, C. S. Scaffolding of long read assemblies using long range contact information. BMC Genomics 18, 527, https://doi.org/10.1186/s12864-017-3879-z (2017).
Article CAS PubMed PubMed Central Google Scholar
Ghurye, J. et al. Integrating Hi-C links with assembly graphs for chromosome-scale assembly. PLoS Comput Biol 15, e1007273, https://doi.org/10.1371/journal.pcbi.1007273 (2019).
Article CAS PubMed PubMed Central Google Scholar
English, A. C. et al. Mind the gap: upgrading genomes with Pacific Biosciences RS long-read sequencing technology. PLoS One 7, e47768, https://doi.org/10.1371/journal.pone.0047768 (2012).
Article ADS CAS PubMed PubMed Central Google Scholar
Vaser, R., Sović, I., Nagarajan, N. & Šikić, M. Fast and accurate de novo genome assembly from long uncorrected reads. Genome Res 27, 737–746, https://doi.org/10.1101/gr.214270.116 (2017).
Article CAS PubMed PubMed Central Google Scholar
Hu, J., Fan, J., Sun, Z. & Liu, S. NextPolish: a fast and efficient genome polishing tool for long-read assembly. Bioinformatics 36, 2253–2255, https://doi.org/10.1093/bioinformatics/btz891 (2020).
Article CAS PubMed Google Scholar
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J Mol Biol 215, 403–410, https://doi.org/10.1016/s0022-2836(05)80360-2 (1990).
Article CAS PubMed Google Scholar
Jackman, S. D. et al. ABySS 2.0: resource-efficient assembly of large genomes using a Bloom filter. Genome Res 27, 768–777, https://doi.org/10.1101/gr.214346.116 (2017).
Article CAS PubMed PubMed Central Google Scholar
Morgulis, A., Gertz, E. M., Schäffer, A. A. & Agarwala, R. WindowMasker: window-based masker for sequenced genomes. Bioinformatics 22, 134–141, https://doi.org/10.1093/bioinformatics/bti774 (2006).
Article CAS PubMed Google Scholar
Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770, https://doi.org/10.1093/bioinformatics/btr011 (2011).
Article CAS PubMed PubMed Central Google Scholar
Vurture, G. W. et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 33, 2202–2204, https://doi.org/10.1093/bioinformatics/btx153 (2017).
Article CAS PubMed PubMed Central Google Scholar
Gurevich, A., Saveliev, V., Vyahhi, N. & Tesler, G. QUAST: quality assessment tool for genome assemblies. Bioinformatics 29, 1072–1075, https://doi.org/10.1093/bioinformatics/btt086 (2013).
Article CAS PubMed PubMed Central Google Scholar
Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol 21, 245, https://doi.org/10.1186/s13059-020-02134-9 (2020).
Article CAS PubMed PubMed Central Google Scholar
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760, https://doi.org/10.1093/bioinformatics/btp324 (2009).
Article CAS PubMed PubMed Central Google Scholar
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079, https://doi.org/10.1093/bioinformatics/btp352 (2009).
Article CAS PubMed PubMed Central Google Scholar
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21, https://doi.org/10.1093/bioinformatics/bts635 (2013).
Article CAS PubMed Google Scholar
Kerpedjiev, P. et al. HiGlass: web-based visual exploration and analysis of genome interaction maps. Genome Biol 19, 125, https://doi.org/10.1186/s13059-018-1486-1 (2018).
Article CAS PubMed PubMed Central Google Scholar
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP487534 (2023).
Siwen, W. K. W. et al. Assemblies of four indigenous chickens in southwest China. GenBank https://identifiers.org/ncbi/insdc.gca:GCA_030849555.1 (2023).
Siwen, W. K. W. et al. Assemblies of four indigenous chickens in southwest China. GenBank https://identifiers.org/ncbi/insdc.gca:GCA_030979905.1 (2023).
Siwen, W. K. W. et al. Assemblies of four indigenous chickens in southwest China. GenBank https://identifiers.org/ncbi/insdc.gca:GCA_030914265.1 (2023).
Siwen, W. K. W. et al. Assemblies of four indigenous chickens in southwest China. GenBank https://identifiers.org/ncbi/insdc.gca:GCA_030914275.1 (2023).
Siwen, W. K. W. et al. Assembly for missing chromosomes and correction for chr31~33 for chicken GRCg6a. figshare https://doi.org/10.6084/m9.figshare.24438283.v1 (2023).
Cheng, Y. & Burt, D. W. Chicken genomics. Int J Dev Biol 62, 265–271, https://doi.org/10.1387/ijdb.170276yc (2018).
Article CAS PubMed Google Scholar

Download references

Acknowledgements

This work was supported by the National Natural Science Foundation of China (U2002205 and U1702232), Yunling Scholar Training Program of Yunnan Province (2014NO48), Yunling Industry and Technology Leading Talent Training Program of Yunnan Province (YNWR-CYJS-2015-027), Natural Science Foundation of Yunnan Province (2019IC008 and 2016ZA008), and Department of Bioinformatics and Genomics of the University of North Carolina at Charlotte.

Author information

These authors contributed equally: Siwen Wu, Kun Wang, Tengfei Dou.
These authors jointly supervised this work: Junjing Jia, Zhengchang Su, Changrong Ge.

Authors and Affiliations

Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming, Yunnan, 650201, China
Siwen Wu, Kun Wang, Tengfei Dou, Shixiong Yan, Zhiqiang Xu, Yong Liu, Zonghui Jian, Jingying Zhao, Rouhan Zhao, Xiannian Zi, Dahai Gu, Lixian Liu, Qihua Li, Junjing Jia & Changrong Ge
Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
Siwen Wu, Sisi Yuan & Zhengchang Su
State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
Dong-Dong Wu
Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
Dong-Dong Wu

Authors

Siwen Wu
View author publications
You can also search for this author in PubMed Google Scholar
Kun Wang
View author publications
You can also search for this author in PubMed Google Scholar
Tengfei Dou
View author publications
You can also search for this author in PubMed Google Scholar
Sisi Yuan
View author publications
You can also search for this author in PubMed Google Scholar
Shixiong Yan
View author publications
You can also search for this author in PubMed Google Scholar
Zhiqiang Xu
View author publications
You can also search for this author in PubMed Google Scholar
Yong Liu
View author publications
You can also search for this author in PubMed Google Scholar
Zonghui Jian
View author publications
You can also search for this author in PubMed Google Scholar
Jingying Zhao
View author publications
You can also search for this author in PubMed Google Scholar
Rouhan Zhao
View author publications
You can also search for this author in PubMed Google Scholar
Xiannian Zi
View author publications
You can also search for this author in PubMed Google Scholar
Dahai Gu
View author publications
You can also search for this author in PubMed Google Scholar
Lixian Liu
View author publications
You can also search for this author in PubMed Google Scholar
Qihua Li
View author publications
You can also search for this author in PubMed Google Scholar
Dong-Dong Wu
View author publications
You can also search for this author in PubMed Google Scholar
Junjing Jia
View author publications
You can also search for this author in PubMed Google Scholar
Zhengchang Su
View author publications
You can also search for this author in PubMed Google Scholar
Changrong Ge
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

J.J., Z.S. and C.G. supervised and conceived the project; K.W., T.D., S.Y²., Z.X., Y.L., Z.J., J.Z., R.Z., X.Z., D.G., L.L., Q.L. and D.W. collected tissue samples and conducted molecular biology experiments; S.W. and S.Y¹. assembled and corrected the genomes; S.W. and Z.S. performed data analysis; and S.W. and Z.S. wrote the manuscript.

Corresponding authors

Correspondence to Junjing Jia, Zhengchang Su or Changrong Ge.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Table 1

Table 2

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Wu, S., Wang, K., Dou, T. et al. High quality assemblies of four indigenous chicken genomes and related functional data resources. Sci Data 11, 300 (2024). https://doi.org/10.1038/s41597-024-03126-1

Download citation

Received: 24 November 2023
Accepted: 05 March 2024
Published: 15 March 2024
DOI: https://doi.org/10.1038/s41597-024-03126-1

Subjects

Abstract

Similar content being viewed by others

Genome assembly in the telomere-to-telomere era

Emx2 underlies the development and evolution of marsupial gliding membranes

Complexity of avian evolution revealed by family-level genomes

Background & Summary

Methods

Chickens

Ethics statement

Short-reads DNA sequencing

PacBio long-reads sequencing

Oxford Nanopore long-reads sequencing

Hi-C sequencing

Transcriptome sequencing

Quality assessment of sequencing data

Contig assembling

Scaffolding and gap filling

Chromosome-level genome assembling

Quality evaluation of assemblies

Correction of GRCg6a assembly

Data Records

Technical Validation

Quality evaluation of the sequencing data of the four chicken breeds

Evaluation of the quality of the assemblies of the four indigenous chicken genomes

Assembly of missing micro-chromosomes and correction of mis-assemblies in GRCg6a

Varying lengths and high G/C contents of assembled micro-chromosomes chr16 and chr29-39

Code availability

Change history

08 April 2024

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding authors

Ethics declarations

Competing interests

Additional information

Supplementary information

Table 1

Table 2

Rights and permissions

About this article

Cite this article

Share this article

Search

Quick links