Transcriptomic datasets of Verticillium wilt resistant and non-resistant Gossypium barbadense varieties during pathogen inoculation

Cotton is a significant cash crop and the primary source of natural fiber globally. Among the numerous diseases encountered in cotton production, Verticillium wilt is one of the most serious, caused by the pathogen Verticillium dahliae (V. dahliae). Unfortunately, there are no effective targeted methods to combat this disease. Genomic resources for Verticillium wilt resistance primarily exist in Gossypium barbadense (G. barbadense). Regrettably, there have been limited transcriptomic comparisons between V. dahliae-resistant and -susceptible varieties of G. barbadense due to the scarcity of susceptible resources. In this study, we conducted a transcriptome analysis on both V. dahliae-resistant and -susceptible varieties of G. barbadense at the 0, 12, 24 and 48 hours after V. dahliae inoculation. This comparative transcriptome analysis yielded high-quality data and offered new insights into the molecular mechanisms underlying cotton’s resistance against this destructive pathogen.


Background & Summary
Cotton, as one of the prominent fiber crops, serves as vital renewable natural fiber source, which contributes to approximately 35% of global fiber production 1,2 .In 2022, China becomes the second-largest cotton-producing country worldwide, yielding a total production of 5.977 million tons 3 .However, various diseases cause the loss of cotton production, with cotton Verticillium wilt (CVW) emerging as one of the most widespread and great threat to cotton production 4 .Over 40% of the cotton planting area in China being affected by V. dahliae, and this disease has become a main threat to the sustainable development of cotton production 5 .
CVW is a soil-borne vascular fungal disease, which is primarily caused by the pathogen of V. dahliae 6,7 .Root exudates induces V. dahliae germinaion and V. dahliae initiates invasion of the host plant through the root system.The pathogen then penetrates the root epidermal gaps via the root tip or natural wounds, ultimately growing vertically into the root vascular tissue.Subsequently, the spores of V. dahliae rapidly spreads to the cotton stems and leaves, resulting in gradual wilting and potential death of the cotton plant 8,9 .The microsclerotia of V. dahliae can survive up to 10 years, coupled with its wide range of hosts and the lack of effective fungicides for prevention or eradication, poses significant challenges in controlling the disease 5 .Consequently, the most effective, economical, safe, and environmentally friendly integrated disease management approach for controlling Verticillium wilt is planting and breeding resistant cultivars.
Gossypium barbadense (G.barbadense) and Gossypium hirsutum (G.hirsutum) are two major cultivated allopolyploid species, originating from natural hybridization and polyploidization events between diploid A0 and D5 genomes approximately 1-2 million years ago (MYA) 10 .These two cotton species exhibit significant differences in traits, such as stress tolerance, fiber quality, and lint yield 11 .These variations can be attributed to different processes of natural selection and human domestication.G. hirsutum is extensively cultivated due to its superior yield performance and better adaptability, contributing to over 90% of the global cotton fiber production 12,13 .V.dahilae impacts more than half of the cotton acreage, and G. hirsutum is notably susceptible to V. dahliae.This susceptibility leads to a substantial reduction in both fiber quality and yield 14 .Notably, G. barbadense demonstrates high resistance or immunity to cotton V. dahliae, making it a valuable resource for identifying key genes associated with Verticillium wilt resistance and uncovering the corresponding regulatory mechanisms 15,16 .However, the lack of V. dahliae-susceptible G. barbadense germplasm resources has limited the research focus to comparing V. dahliae-resistant G. barbadense with either V. dahliae-resistant G. hirsutum or V. dahliae-susceptible G. hirsutum varieties [17][18][19][20][21][22] .Currently, there is only one study comparing transcriptomic differences between V. dahliae-resistant and -susceptible varieties of G. barbadense, but the transcriptome data has not been released 16 .The rare bioinformatic data limited the deep understanding of how immune network of G. barbadense play role in defeating the invasion of V. dahliae.
In this study, we performed a comparison of the dynamic changes in the transcriptomes of resistant and susceptible G. barbadense varieties before and after inoculation with V. dahliae using RNA-seq.Our study demonstrates that the disease-resistant verity exhibits higher expression levels of genes involved in the suberin biosynthetic process, phenylpropanoid biosynthetic process, and other biological pathways associated with the establishment of cell physical barriers under normal growth conditions.Furthermore, we observed that the transcriptome response of the resistant variety is less sensitive even after invasion by Verticillium wilt, suggesting that its stronger physical barrier may play a significant role in conferring resistance against V. dahliae.

Methods
plant material.Two Gossypium barbadense varieties, namely V. dahliae-resistant G. barbadense 'Xinhai 47'(BR) and the V. dahliae-susceptible G. barbadense 'Shidahaigan 1' (BS) were selected for pathogenicity assays and RNA-seq analysis.To facilitate proper growth of the seedlings, the germinated seeds were incubated in controlled conditions in an incubator with a photoperiod of 16 hours of light and 8 hours of darkness; maintaining day and night temperatures of 25 °C and 23 °C respectively.Disease resistance identification.Two-leaf-stage cotton seedlings were infected with the highly aggressive defoliating V. dahliae strain "V991" by using the root irrigation method 23 .The strain was kindly provided by Associate Professor Yanfei Sun (Shihezi University).The concentration of "V991" spore liquid was adjusted to 106 spores/mL by using a hemocytometer, followed by the application of 20 mL per plant into the nutrient bowls.Subsequently, the plants were incubated for further cultivation.At 14 days post-infection (dpi), the BR plants exhibited mild symptoms of Verticillium wilt, characterized by slight yellowing of a few lower leaves.In contrast, the BS plants displayed more severe symptoms, including significant yellowing and leaf loss, with some plants eventually succumbing to the infection (Fig. 1a).The investigation of disease index and quantification of fungal biomass was conducted with reference to previous studies 23,24 .The disease index of BS was notably higher than that of BR at both 14 and 21 dpi (Fig. 1b).Assessing the cotton stem sections revealed browning in both transverse and longitudinal stem sections of BS plants after 14 days of V. dahliae inoculation, indicating a more severe condition compared to BR plants (Fig. 1c).Additionally, quantification of fungal biomass in the stems illustrated significantly higher accumulation in BS plants compared to BR plants at 14 and 21 dpi (Fig. 1d).Overall, these results convincingly demonstrated the significantly higher resistance of BR to V. dahliae than to BS.
RNA extraction, Library preparation and Sequencing.Root samples from both the BR and BS genotypes were collected at 0, 12, 24 and 48 hours after V. dahliae inoculation separately.The samples were immediately frozen in liquid nitrogen.All samples were harvested from five cotton plants with three independent biological replicates.In total, 24 samples (2 genotypes × 4-time points × 3 biological replicates) were subjected to RNA-seq.For total RNA extraction, 0.1 g of the finely ground samples was accurately weighed and using the plant polysaccharide polyphenol total RNA extraction kit from TianGen Biotech Co., Ltd.(Beijing).From the extracted total RNA, 3 µg of qualified RNA was used for further processing.Specifically, mRNA was purified using magnetic beads with Okigo (dT) oligo for enrichment.Subsequently, the mRNA was fragmented, and first-strand cDNA synthesis was performed using a random primer with a six-base sequence.The second-strand cDNA synthesis was carried out by adding buffer, dNTPs, and DNA polymerase I. Subsequently, the double-stranded cDNA was purified using AMPure XP beads.To prepare the cDNA library, the purified double-stranded cDNA underwent end repair, A-tailing, and sequencing adapter ligation.Fragment size selection was performed using AMPure XP beads, and PCR amplification was conducted to obtain the cDNA library.The resulting library was subjected to RNA sequencing on the Illumina HiSeq 4000 platform, generating 150 bp paired-end reads.A total of 48 clean reads were obtained from the 24 cotton samples.The quality assessment of these raw RNA-seq reads (48 in total) was conducted using FastQC18 software, and the summarized results were generated using MultiQC software tool 25 .The raw sequencing data underwent data quality control with the fastp program, which involved the removal of sequences containing adapters, poly-N sequences, and low-quality data 26 .

Differential expression genes (DeGs) identification and Go enrichment.
The clean reads were aligned to the Gossypium barbadense L. (Hai7124) genome 27 using hisat2 with default parameters 28  (Version 2.0.1) was used to calculate the read count of each gene, and gene expression levels were quantified using TPM 29 (transcripts per kilobase of exon model per million mapped reads).Differentially expressed genes (DEGs) were identified using the DESeq R package (1.18.0) 30 based on the criteria of |log2(fold-change)| ≥ 2 and an adjusted P value < 0.05.The gene Ontology (GO) enrichment analysis of DEGs was performed using the cluster-Profiler 31 R package, with a significance threshold set at an adjusted p-value < 0.05.

Validation of DeGs by quantitative real-time pCR (qRT-pCR).
We selected 6 DEGs and designed gene specific primers online using the qPrimerDB website (https://biodb.swu.edu.cn/qprimerdb/), with amplified fragments ranging in length from 80 to 250 bp.The primers used in qRT-PCR were listed in Supplementary Table S1.We performed the qRT-PCR experiment on Roche LightCycle 480II, using SYBR Green Mix.Additionally, GbUBQ7 (DQ116441.1)was used as the internal reference gene and the relative expression level of the target gene is determined through Livak's 2 −△△CT method 32 .

Data Records
24 raw RNA-seq data reported in this paper have been deposited in the Genome Sequence Archive (Genomics, Proteomics & Bioinformatics 2021) in National Genomics Data Center (Nucleic Acids Res 2021), China National Center for Bioinformation/Beijing Institute of Genomics, Chinese Academy of Sciences (GSA: CRA012121) 33 that are publicly accessible at https://ngdc.cncb.ac.cn/gsa/browse/CRA012121.Other data, such as the breakdown of each sample file on GSA, lists of raw gene counts, TPM of the 24 samples, the information of DEGs and results of GO enrichment analysis are also available at figshare 34 .(https://doi.org/10.6084/m9.figshare.24097941.v2).

technical Validation
Quality assessment of sequencing data.The MultiQC analysis revealed that the quality scores of all sequences exceeded 30(Fig.2a).Moreover, the per-sequence quality scores predominantly concentrated within the range of 35-40 (Fig. 2b), indicating a base error rate of less than 0.03%.The GC distribution of all sequences exhibited a normal distribution closely resembling the theoretical distribution (Fig. 2c), providing further evidence of the high quality of the reads.Throughout the data processing, an average of 5.9 million adapter sequences were filtered out from each sample, resulting in an average sample size of clean reads of 8.83 Gb (range 6.96-10.88Gb).The percentage of clean reads was calculated as 99.4% (Table 1).
principal component analysis (pCA) and different basal transcriptome identification.To evaluate the transcriptional differences between the BR and BS varieties, we performed principal component analysis (PCA) using the TPM values from 24 samples.The results demonstrated a clear distinction between the BR and BS varieties, regardless of V. dahliae inoculation (Fig. 3a, Figshare File 2 34 ).Remarkably, even in the absence of V. dahliae inoculation, the BR and BS samples exhibited distinct separation based on PC1.This observation suggests a potential association between the transcriptional dissimilarities of the two varieties and their distinct levels of disease resistance.A total of 1117 DEGs were identified between the BR and BS varieties at 0 hours post-inoculation (hpi), prior to V. dahliae infection.Of these DEGs, 643 exhibited significantly higher expression levels in BR than in BS, whereas 474 displayed significantly lower expression levels in BR (Fig. 3b, Figshare File 3 34 ).Among the top 20 GO terms in 643 DEGs, genes associated with physical barrier formation, such as GbCYP86A1 (GB_A07G1113), GbCER1 (GB_A10G0724) in suberin biosynthetic process were identified.Furthermore, a GO term related to Verticillium wilt resistance, which included cellular response to antibiotic pathways and melatonin biosynthetic process, was also significantly enriched in 643 DEGs (Fig. 3c, Figshare File 4 34 ).The most enriched processes among the 474 DEGs with higher expression in BS were related to carbon-oxygen lyase activity and isoprenoid biosynthesis (Fig. 3d, Figshare File 5 34 ).

Analysis of differentially expressed genes (DeGs) involved in the response to V. dahliae inoculation.
To investigate the molecular mechanisms underlying the resistance of different varieties of G. barbadense to V. dahliae infection, we compared the gene expression changes at 12, 24, and 48 hpi with an uninfected sample (0 hpi).A total of 1080, 469, 548 DEGs were identified at 12, 24, and 48 hpi in the BR variety respectively.Similarly, in the BS variety, we found 997, 1147, and 944 DEGs at the same time points (Fig. 4a, Figshare File 6 34 ).Interestingly, BR demonstrated a high number of induced genes at all three time points, whereas BS displayed the opposite pattern.Analysis revealed 1498 nonredundant DEGs in BR and 2129 in BS, in response to V. dahliae at all three time points post-inoculation (Fig. 4b-c, Figshare File 7 34 ).
Among the top 20 GO terms, 8 terms were shared between BR and BS.The most significantly enriched terms included photosynthesis, light reaction, and response to high light intensity (Fig. 5a, Figshare File 8 34 ).Additionally, BS exhibited specific enrichment in phenylpropanoid metabolic process, response to hydrogen peroxide, and flavonoid metabolic process.These GO terms have previously been reported to play a role in cotton resistance to V. dahliae (Fig. 5b, Figshare File 9 34 ).The findings suggest that BR displayed insensitivity to the pathogen, whereas BS exhibited a rapid activation of resistance-related transcriptome responses upon inoculation with V. dahliae.
Validation results of RNA-seq data by qRT-pCR.Although the log2(fold-change) value from qRT-PCR did not precisely match that of RNA-seq, there was a strong positive correlation (R 2 = 0.854) observed between the results of RNA-seq and qRT-PCR expression analyses, indicating the reliability of the RNA-seq results (Fig. 6).
In summary, we have presented a collection of high-quality transcriptome data illustrating the response of G. barbadense to V. dahliae.Our preliminary analysis indicates a potential association between the resistance of G. barbadense to V. dahliae and the presence of physical barriers.We believe that this transcriptome data offers valuable insights into the molecular mechanisms that underlie cotton's resistance to V. dahliae, providing a novel approach for identifying key genes involved in such resistance.

Fig. 1
Fig. 1 Comparison of Verticillium dahliae (V.dahliae) resistance.(a) The phenotype of BR and BS at 14 days post-infection (dpi) with V. dahliae, visual observations were made.(b) The disease index of both varieties was assessed at 14 and 21 dpi.(c) The phenotype of stems from BR and BS plants was examined 14 days after inoculation with V. dahliae.(d) The relative fungal biomass in the stems of BR and BS at 14 days postinoculation (dpi) was determined.All data presented are expressed as mean ± standard deviation based on three independent biological replicates.Statistical analysis was performed using the T-test (P < 0.05, *P < 0.01).

Fig. 2
Fig. 2 Evaluation of sequence quality in raw FASTQ data.The quality scores of all 48 raw RNA-seq reads were assessed using FastQC, and the results were compiled and summarized using MultiQC.(a) Displays the distribution of read counts for mean sequence quality.(b) Represents the distribution of mean quality scores.(c) Illustrates the per-sequence GC content.

Fig. 3
Fig. 3 Basal transcriptomes of Xinhai 47 (BR), a resistant variety, and shidahaigan 1 (BS), a susceptible variety.(a) Principal component analysis (PCA) based on TPM values for 24 samples.(b) Heatmap showing the expression patterns of 1117 differentially expressed genes (DEGs) between BR and BS without Verticillium dahliae (V.dahliae) inoculation.(c) GO enrichment of highly expressed DEGs in uninfected BR plants compared to BS plants.(d) GO enrichment of highly expressed DEGs in uninfected BS plants compared to BR plants.

Fig. 4
Fig. 4 Identification of differentially expressed genes (DEGs) in Xinhai 47 (BR) and shidahaigan 1 (BS) at 12, 24, and 48 h after Verticillium dahliae infection.(a) The number of DEGs in BR and BS after V. dahliae infection at 12, 24, 48 hpi.(b) Venn diagram of DEGs at each time point in BR after inoculation with V. dahliae.(c) Venn diagram of DEGs at each time point in BS after inoculation with V. dahliae.

Fig. 5
Fig. 5 Gene Ontology (GO) analysis was performed on the 1498 and 2129 differentially expressed genes (DEGs) identified in the BR and BS varieties, respectively, after infection with Verticillium dahliae (V.dahliae).(a) GO analysis of DEGs in the BR variety following inoculation with V. dahliae.(b) GO analysis of DEGs in the BS variety following inoculation with V. dahliae.

Fig. 6
Fig. 6 Validation of RNA-seq results by qRT-PCR analysis.(a) Fold change of six DEGs through qRT-PCR and RNA-seq.(b) Pearson correlation of the expression fold change of six DEGs between the RNA-seq and RT-qPCR assays.