Comparative transcriptome analysis of fiber and nonfiber tissues to identify the genes preferentially expressed in fiber development in Gossypium hirsutum

Cotton is an important natural fiber crop and economic crop worldwide. The quality of cotton fiber directly determines the quality of cotton textiles. Identifying cotton fiber development-related genes and exploring their biological functions will not only help to better understand the elongation and development mechanisms of cotton fibers but also provide a theoretical basis for the cultivation of new cotton varieties with excellent fiber quality. In this study, RNA sequencing technology was used to construct transcriptome databases for different nonfiber tissues (root, leaf, anther and stigma) and fiber developmental stages (7 days post-anthesis (DPA), 14 DPA, and 26 DPA) of upland cotton Coker 312. The sizes of the seven transcriptome databases constructed ranged from 4.43 to 5.20 Gb, corresponding to approximately twice the genome size of Gossypium hirsutum (2.5 Gb). Among the obtained clean reads, 83.32% to 88.22% could be compared to the upland cotton TM-1 reference genome. By analyzing the differential gene expression profiles of the transcriptome libraries of fiber and nonfiber tissues, we obtained 1205, 1135 and 937 genes with significantly upregulated expression at 7 DPA, 14 DPA and 26 DPA, respectively, and 124, 179 and 213 genes with significantly downregulated expression. Subsequently, Gene Ontology (GO) enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) metabolic pathway analyses were performed, which revealed that these genes were mainly involved in catalytic activity, carbohydrate metabolism, the cell membrane and organelles, signal transduction and other functions and metabolic pathways. Through gene annotation analysis, many transcription factors and genes related to fiber development were screened. Thirty-six genes were randomly selected from the significantly upregulated genes in fiber, and expression profile analysis was performed using qRT-PCR. The results were highly consistent with the gene expression profile analyzed by RNA-seq, and all of the genes were specifically or predominantly expressed in fiber. Therefore, our RNA sequencing-based comparative transcriptome analysis will lay a foundation for future research to provide new genetic resources for the genetic engineering of improved cotton fiber quality and for cultivating new transgenic cotton germplasms for fiber quality improvement.

www.nature.com/scientificreports/ gel electrophoresis showed that the RNA from seven tissues exhibited two complete rRNA bands and that the 28S rRNA band was approximately twice as bright as the 18S rRNA band. The results indicated that the integrity of these total RNAs was relatively complete, and no degradation was observed (Fig. S1). The concentration and purity of the total RNA were measured with a Nanodrop 2000 spectrophotometer. The concentration of each sample was higher than 150 ng/μL, the OD260/280 was approximately 1.9, and the OD260/230 was greater than 2.0. Therefore, the RNA concentration and purity of these samples were relatively high, reaching the standard for A-level library construction (Table S1). The next step of database building and sequencing was then conducted.
Transcriptome sequencing and comparison with the reference genome. On the Illumina HiSeq™ 2000 platform, high-throughput RNA sequencing generated 343 million raw reads from seven cotton tissues, with more than 46.59 million reads per tissue. A total of 332 million clean reads (96.79%) were obtained from the total raw reads after discarding adapters, low-quality reads and raw reads filtered according to an N greater than 5%. Two biological replicates were evaluated for each line, and the results indicated that the RNA-seq data were in good agreement (0.936 < R2 < 0.982). For all sequence data, the average Q20, Q30, and GC contents were 98.88%, 93.39% and 43.86%, respectively, and the error rates for all samples ranged from 0.01 to 0.02%. Overall, 83.33%-88.22% of the high-quality clean reads were mapped to the reference genome of Gossypium hirsutum TM-1 using TopHat29 (Table 1). The distribution of the numbers of genes expressed in each tissue sample (Table S2) and the sequencing coverage of the gene transcripts ( Fig. S2) were analyzed. There were 3774-4481 genes expressed in high abundance. Among the mapped reads, 78.75-88.03% were distributed in exon regions, 1.54-7.81% were located in introns, and 10.09-17.26% were located in intergenic regions. Three types of mapping data were obtained: (1) multiple mapping (21)(22)(23)(24)(25).66%) and unique mapping reads (60.24-65.4%), (2) forward mapping (36.84-39.38%) and reverse mapping reads (36.75-39.22%), and (3) nonspliced reads (56.68-61.93%) and spliced reads (13.23-21.65%) ( Table 1). A total of 76,772 genes were identified, and no fewer than 52,565 genes were expressed in the seven libraries. To test the correlations among the experimental samples, the expression of the genes in these seven libraries was analyzed using the Pearson correlation coefficient (PCC) (Fig. S3a). The results showed that 7 DPA fibers presented the greatest similarity to 14 DPA fibers (similarity as high as 0.732), followed by 26 DPA fibers (0.605), while the similarity to nonfiber tissues (root, leaf, anther and stigma) was lower. Therefore, gene expression was most similar in fibers from different elongation periods (7,14 and 26 DPA); however, the gene expression between different tissues showed significant changes.
To further confirm the relationships among these seven different tissue libraries, principal component analysis (PCA) was performed on the expressed genes (Fig. S3b). The results showed that the expression of genes differed mainly among different tissues, except for the genes that were expressed in different periods in the fiber tissue. In the PCA, the gene expression pattern was different between different tissues, which was conducive to screening for cotton fiber superiority or specifically expressed genes and to some extent indirectly verified the reliability of our RNA-seq data.  Screening of DEGs and overall transcriptome sequencing analysis. To screen DEGs in different tissues of cotton, we used the fragments per kilobase of transcript per million fragments (FPKM) method to measure the gene expression levels. According to the different FPKM values of the expressed genes in each tissue, the screening parameters for DEGs were set as follows: p value < 0.05 and |Log2(Fold Change)| ≥ 2. If the |Log-2Fold Change)| value was larger, it meant that the expression difference of the gene between samples was higher, that is, high-abundance expression. According to the FPKM value of each gene in different tissues, cluster analysis and differential gene expression profiling were performed. From our data, a volcano plot and a histogram showing detailed information about the number of DEGs in each pairwise comparison were generated (Fig. 3). According to the sequencing results, volcano plots were drawn and screened according to the significant difference criteria (difference in gene expression > 2 and FDR ≤ 0.05), and statistically significant differences in gene expression were measured. The volcano plots showed that many genes were upregulated and downregulated in the pairwise comparisons ( Fig. 3a-c). Histograms were generated to summarize the significant DEGs identified in the pairwise comparisons of all samples (Fig. 3d). Subsequently, the transcriptome data from the pairwise comparisons were compared with gene annotation databases; 889-5017 genes showed matches in the Gene Ontology (GO) database, and 765-4329 genes showed matches in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database ( Table 2).

Screening of DEGs from Venn diagrams between fiber and nonfiber tissues at different developmental stages.
To identify genes that were specifically or predominantly expressed during the period of fiber development and elongation, transcriptome data were used to generate heatmaps of DEGs, and Venn diagrams showing detailed information about the numbers of DEGs in each pairwise comparison were produced (Fig. 4). Overall, 7 DPA and 14 DPA fibers showed high similarity in their gene expression profiles, but they presented less similarity to the 26 DPA fibers; however, the similarity between the fiber profiles was higher than that between the nonfiber tissues (Fig. 4a). These results were consistent with the results of the PCC. Comparative analysis of DEGs between fiber and nonfiber tissues revealed 1411 DEGs in 7 DPA fibers (Fig. 4b), 1405 DEGs in 14 DPA fibers (Fig. 4c) and 1219 DEGs in 26 DPA fibers (Fig. 4d). Among the DEGs, there were 1205, 1135 and   (Fig. 5a-c), respectively. During fiber elongation and development, among biological processes, the DEGs were significantly enriched in metabolic processes, cellular processes, single organism processes, and biological regulation processes. In the cell component category, the DEGs were mainly distributed in cells, membranes, and organelles. Among molecular functions, the DEGs were mainly enriched in functional groups, such as catalytic activity, binding, transport activity, and molecular function regulation. These results showed that the above functional groups could play an important role in fiber development and elongation.
In the KEGG metabolic pathway analysis, 455, 431 and 333 DEGs were significantly clustered into 30, 30 and 33 metabolic pathways in 7, 14 and 26 DPA fibers (Fig. 6a-c), respectively, which mainly included metabolic pathways such as those involved in signal transduction, carbohydrate metabolism, protein translation and processing, transportation and catabolism. These results showed that the above functional groups and metabolic pathways could play an important role in the stage of fiber development and elongation.

Analysis of related functional genes during fiber elongation and development. Fiber cell
metabolism is very active during the fiber development and elongation stage, which is mainly manifested in the overexpression of three major groups of genes: (1) cytoskeleton-related genes; (2) cell wall structure and cellulose biosynthesis-related genes; and (3) energy and carbohydrate metabolism-related genes. A number of related functional genes that were differentially expressed during fiber elongation and development were selectively analyzed. The expression patterns of genes associated with fiber elongation development are shown in Fig. 7 and Table S3. Many upregulated genes were associated with cytoskeletal components, such as actin (GhACT ), tubu- www.nature.com/scientificreports/ lin (GhTUBA4, GhTUBB2, GhTUBB6), kinesin (GhPSS1), clathrin (GhCLC2), and dynamin-related proteins GhDRP1), and these genes were predominantly or specifically expressed in cotton fiber tissues. The metabolic activities of various carbohydrate substances promote the rapid elongation of fibers, involving enzymes such as cellulose synthase (GhCESA5), sucrose synthase (GhSUS), glucosidase (GhGBA), phosphate uridylyltransferase (GhUGP1), polygalacturonase (GhADPG1) and other carbohydrate synthase or modification enzymes. Compared with nonfiber tissues, these genes were significantly upregulated at different fiber elongation stages. We also identified a number of genes involved in fatty acid metabolism and secondary metabolic processes, such as ultralong-chain enoyl coenzyme A reductase (GhECR2), fatty acyl coenzyme A reductase (GhFAR3), long-chain acyl coenzyme A synthase (GhLACS1), ultralong-chain-3-hydroxyacyl coenzyme A dehydratase (GhHACD3), alkane hydroxylase (GhMAH1), and phenylpropane compounds (GhACO1). In particular, most of the fatty acid metabolism-related genes were significantly upregulated in the early stage of fiber elongation (7 DPA), indicating that these genes play important roles in the early stage of cotton fiber elongation.  (18) transcription factors (Fig. 8a). Among these transcription factors, 148 were significantly upregulated in fibers. Putative homologs of genes related to fiber development were identified in seven different cotton tissues (Fig. 8b-e). GhMYB114, GhMYB1, GhMYB3 www.nature.com/scientificreports/ and GhCPC-like belong to the MYB transcription factor family. Among these genes, GhMYB114, GhMYB1 and GhMYB3 presented similar expression patterns and were significantly upregulated in 14 DPA fibers, with lower expression in the other fiber periods, while their expression levels in nonfiber tissues were very low. GhCPC-like was significantly upregulated in 7 DPA fibers. PTI-6 belongs to the AP2/EREBP transcription factor family, and three PTI-6-homologous genes (GhERF34, GhERF38 and GhERF84) showed the same expression pattern and were upregulated in fiber tissue. However, their expression was significantly upregulated in the early stage of fiber elongation, and the expression level gradually decreased with the passage of developmental time. Two other AP2/EREBP transcription factors, RAP2 (GhTINT) and ERF (GhWIN1) also presented this expression pattern. Similarly, in the bHLH, bZIP and WRKY families, many transcription factors were found to be upregulated in fibers but showed gradually decreased expression with fiber development and were expressed at low levels or not at all in nonfiber tissues.

Analysis of dominant or specific expression genes during fiber elongation and development.
A large number of genes were expressed in different developmental stages and participated in the regulation of fiber cell elongation and development. In this study, some functional genes specifically or preferentially expressed in the process of fiber elongation and development were analyzed. Based on the p value ≤ 0.05, the screening parameters between fibers and nonfibers were |Log2(Fold Change)| ≥ 2, and the screening parameters between fibers at different elongation and development stages were 0.5 < |Log2(Fold Change)| < 1.5. Corresponding parameters were set to screen fiber-dominant genes. The expression pattern of dominant expression genes in fiber elongation development is shown in Table S4 and Fig. 9. A total of 330, 128 and 278 highly expressed genes were screened in Fiber_7, Fiber_14 and Fiber_26, respectively; among them, 154 (46.67%), 43 (33.60%) and 96 (34.53%) had clear gene annotations, and the rest were genes with unknown function. Simultaneously, 206 genes with high-abundance expression in the whole elongation and development period of fiber were also screened, of which 114 (55.34%) had clear gene annotations. Therefore, there were a large number of fiber-dominant or fiber-specifically expressed genes in the process of fiber elongation and development, which has not been reported previously.
Validation of RNA-seq data by qRT-PCR. To further validate the reliability of the RNA-seq results and the accuracy of the DEGs, twelve upregulated DEGs were randomly selected from each of the 7 DPA, 14 DPA and 26 DPA fiber transcriptome libraries, for a total of 36 genes. qRT-PCR was used to analyze the differential expression of genes between different tissues of cotton. Cotton Sad1 (NCBI Reference Sequence: NM_001327106.1) was used for relative gene expression normalization 24 . The qRT-PCR results showed that the 36 upregulated genes were all expressed specifically or predominantly in fibers ( Fig. 10 and Fig. S5), among which 19 were specifically or preferentially expressed in 7 DPA fibers (e.g., CotAD_98043, CotAD_14327, CotAD_51137). Six upregulated genes were specifically or predominantly expressed in 14 DPA fibers, including CotAD_46959, CotAD_27919, CotAD_22244, CotAD_23413, CotAD_10228 and CotAD_36479. The remaining 11 upregulated genes were expressed specifically or preferentially in 26 DPA fibers. Comparative analysis of www.nature.com/scientificreports/ the qRT-PCR results and RNA-seq data revealed slight differences, but the expression trends of DEGs in different tissues were highly similar between the two groups of data. This result showed that the identification of genes that were specifically or predominantly expressed in fiber by comparing DEGs between different tissues resulted in improved accuracy. In conclusion, the transcriptome database of different cotton tissues constructed in this study presented high reliability.

Discussion
Transcriptome analysis of fiber and nonfiber tissues in cotton. Cotton fiber is an ideal model for studying cell elongation and cell wall construction in plants. Cotton fiber elongation is regulated in a complex, orderly manner involving multiple genes and pathways. Prominent progress in molecular biology research on cotton fiber development is the isolation of some cotton fiber-specific or abundantly expressed genes. With the development of new technology and bioinformatics, expression profile analysis with RNA-seq technology as the main method has played an important role in cotton fiber development research. In this study, we used RNA-seq technology to sequence the transcriptome of seven different cotton tissues (root, leaf, anther, stigma, 7 DPA fiber, 14 DPA fiber, and 26 DPA fiber) to screen for highly abundant genes during fiber elongation and development.
In this study, the analysis of DEGs was performed from the transcriptome data of 7 DPA fiber and nonfiber tissues. There were 1,205 upregulated genes with significant expression differences found in 7 DPA fibers; many of these genes had been previously reported in fibers, such as E6 25      www.nature.com/scientificreports/ analysis of 7 DPA fibers, indicating that lipid metabolism, signal transduction, catalytic activity, the cell wall, and the cytoskeleton played important roles in the rapid and late elongation of cotton fibers.

Fiber development-related transcription factors regulate cotton fiber elongation and development.
A cotton fiber consists of a single epidermal cell of the ovule that undergoes specialized initial differentiation, elongation, thickening and dehydration maturation to form a mature epidermal fiber 39 . A large number of genes are required for fiber differentiation and development, but it is thus far unclear how these genes control and regulate fiber development. Transcription factors play an important regulatory role in the growth and evolution of cotton fiber. In this study, a total of 1,467 transcription factors were identified from families such as MYB, bHLH, bZIP, and TCP. Among these transcription factors, 148 were significantly upregulated in fibers. GhMYB7, an R2R3-MYB transcription factor, was highly expressed during fiber elongation. A crosssectional assay of basal stems revealed that the cell wall thickness of vessels and interfascicular fibers was higher in transgenic lines overexpressing GhMYB7 than in the wild type. This gene may be involved in regulating secondary cell wall biosynthesis in cotton fibers 40 . GhMYB25 and GhMYB25-like were mainly expressed at high levels during the initial differentiation and early elongation of the fiber, and their expression decreased with the onset of rapid elongation of the fiber. The silencing of GhMYB25 is associated with the production of short fibers in cotton 41 , while the suppression of GhMYB25-like produces fiber-less cotton 42 . GhMYB46 showed the highest expression in 20 DPA fibers. Overexpression of GhMYB46 leads to ectopic secondary cell wall (SCW) deposition in transgenic plants and could activate the promoter of the SCW cellulose synthase gene to regulate the elongation and development of cotton fiber 43 . GhMYB109 is specifically expressed in the differentiation and elongation stages of cotton fiber primordial cells, and antisense-mediated suppression of GhMYB109 leads to a substantial reduction of fiber length 44 . GhDEL65, a transcription factor of the bHLH protein family from upland cotton, is a homologous gene of Arabidopsis GLABRA3 (GL3), which is expressed at a significant level in the early stage of fiber elongation; ectopic expression partially rescues hair body development. Ectopic expression of GhDEL65 partially rescues the development of trichomes in the Arabidopsis gl3 mutant 45 . GhFSN1, one of the transcription factors of the NAC family, has been found to exhibit high levels of transcript accumulation in 15-28 DPA fibers, but the expression of this gene is undetectable or very weak in other tissues of cotton. This gene regulates the formation of the SCW in cotton fiber 46 . GhTCP14a, a TCP transcription factor, is expressed at a high level during the rapid elongation period (6-12 DPA) of cotton fibers and regulates the rapid elongation development of the cotton fiber 47 . In this study, a large number of transcription factors were expressed either preferentially or specifically in cotton fibers. However, further research is required to determine how transcription factors regulate the elongation and development of cotton fibers.

Fiber development-related functional genes regulate cotton fiber elongation and development.
Researchers have successfully cloned and identified many cotton genes with fiber-dominant expression and studied their gene functions. These genes were all in the high-abundance expression database of fibers screened in this study (Table S3). For example, GhTUB1 is preferentially expressed in the elongation and development stages of fibers 33 . GhACT1 is mainly expressed in fiber cells, and its suppression disrupts the actin cytoskeleton and causes reduced fiber elongation, indicating that GhACT1 plays an important role in the period of fiber elongation but does not play a role in the initial period of fiber cell development 31 . GhPEL76 is a pectate lyase-like gene. Expression analysis (qRT-PCR) results showed that GhPEL76 is mainly expressed in cotton fibers, and its expression levels are significantly different in long-and short-fiber varieties. Virus-induced GhPEL76 silencing shortens fiber length, suggesting that GhPEL76 has a positive regulatory effect on fiber elongation 48 . GhLTPG1, a GPI-anchored lipid transporter gene, was cloned from Gossypium hirsutum and shown to be significantly expressed in the period of rapid fiber elongation. After heterologous expression in Arabidopsis, the number of leaf epidermal hairs was significantly increased. In contrast, silencing of the GhLTPG1 gene in upland cotton by using RNAi technology resulted in a significantly shortened fiber length, reduced polar lipid content and inhibited expression of genes related to fiber elongation 49 . GhFIM2 is an actin gene that is expressed predominantly in the overlapping portion of the fiber elongation and SCW synthesis phases. In cotton, overexpression of the GhFIM2 gene increases the expression level in the actin bundle in the fiber elongation period and accelerates the rate of fiber elongation so that the length of mature fibers is increased 50 . GhEXPA8 is an EXPAN-SIN family gene that is expressed at a high level during the rapid elongation (7-25 DPA) of fiber. Overexpression of GhEXPA8 can increase the fiber length and mark value to enhance fiber tenacity 51 . In this study, a large number of genes were screened for significantly high expression levels in the fiber elongation and development period, providing many new gene resources for genetic engineering for cotton fiber quality improvement.

Analysis of highly abundant expressed genes in fiber elongation development.
Through analysis of fibers and nonfibers transcriptome data, a total of 1324 highly expressed genes in fiber were screened, of which 330, 128, and 278 genes were predominantly expressed in 7, 14 and 26 dpa fibers. A total of 407 genes had clear functional annotations, but only a few genes have been reported in research. After sorting out the data and analyses, we found a large number of unreported new genes that are expressed in high abundance in fibers (Table S4). For example, CotAD_24107 belongs to the proline-rich cell wall protein family genes, and its homologous gene PdPRP is preferred in immature poplar. Overexpression of PdPRP promotes secondary wall deposition and induces the expression of genes involved in microfibril angle and secondary wall biosynthesis 52 .
In this study, we found that this gene was expressed in high abundance in cotton fibers, especially in the period of fiber secondary wall thickening (14 DPA). CotAD_27598 is a protodermal factor gene. In Arabidopsis, AtPDF2 is specifically expressed in bud epidermal cells and plays an indispensable role in bud differentiation 53,54 . This study showed that this gene was expressed in extremely high abundance during the differentiation stage of fibroblasts www.nature.com/scientificreports/ (7 DPA), and it was speculated that it played an important role in the initial stage of fibroblast development. The homologous gene MtKCS of CotAD_37982 was predominantly expressed in the epidermal cells of the bud apical meristem, leaf primordium, and floral organs of Medicago truncatula and was mainly involved in the biosynthesis of very-long-chain fatty acids. However, overexpression of very-long-chain fatty acids can enhance the biosynthesis of cytokinins and promote the process of cell differentiation 55 . In this study, it was found that this gene was specifically expressed in fibers and was expressed in high abundance during the differentiation period of fiber primordial cells (7 DPA). It was speculated that this gene had the function of promoting fibroblast differentiation. CotAD_37677 is a bidirectional sugar transporter gene. Its homologous gene OsSWEET3a functions as a glucose transporter and is predominantly expressed in the basic vascular bundles of rice seedlings 56 . This gene was mainly secondary to cotton fiber development. It was expressed in abundance during the wall thickening period (26 DPA), and we speculated that it transported glucose to fiber cells for cellulose biosynthesis. CotAD_60133 encoded a skewing-related protein. Its homologous gene AtSPR1 is related to directional cell expansion and functions by regulating cortical microtubule dynamics 57,58 , indicating that this gene may be regulated by microtubule dynamics to regulate the expansion and change of fiber cells. In conclusion, this study used transcriptome data analysis of fiber and nonfiber tissues to screen a large number of fiber dominant expression genes, which can be used as candidate genes for the improvement of cotton fiber quality.
To further verify the comprehensiveness and reliability of the transcriptome sequencing data from different cotton tissues obtained in this study, we selected 36 genes from the 7, 14 and 26 DPA genes that were significantly upregulated in the fiber to analyze the expression patterns among different cotton tissues by qRT-PCR. The significantly upregulated genes identified in fiber were all specifically or predominantly expressed in fiber. Among these genes, CotAD_46044 (E6) 25 , CotAD_46959 (GhPRP1) 59 , CotAD_27919 (GhEXPA) 51 , CotAD_14327 (GhGASL3) 60 , CotAD_63563 (GhGLP1) 61 , CotAD_05318 (GhXTH7) 62 , CotAD_01886 (GhPEL) 63 , CotAD_49061 (GhKCS) 64 , CotAD_20528 (GhFb) 26 , CotAD_69249 (GhFLA7) and CotAD_1612 (GhFLA12) 37 have been previously reported to be specifically or preferentially expressed in the period of fiber elongation and secondary wall thickening and play an important role in cotton fiber elongation. The remaining 25 genes that were specifically or predominantly expressed in fiber had not been previously reported, so they could be used as candidate genes for further functional studies. By comparing the expression patterns of the selected fiber-upregulated genes between different tissues according to the qRT-PCR and RNA-seq data, we found that the gene expression patterns revealed by the two types of analysis were very similar. Therefore, the transcriptome libraries of different tissues constructed in this study were comprehensive and reliable and provided new genetic resources for the genetic engineering of cotton fiber quality improvement.

Materials and methods
Plant materials. Coker 312 was upland cotton cultivar and preserved in our laboratory, which was commonly used as test material in cotton research. In our study, using Coker 312 as material to identify the genes preferentially expressed in fiber development in Gossypium hirsutum. Our research contents complied with local and national regulations. Coker 312 was planted in a greenhouse. The roots and leaves were collected at 15, 25, and 35 days after germination, and the surface of the materials was cleaned with ddH 2 O. The root materials from the three periods were mixed together as the root tissue material and wrapped with aluminum foil, and the leaf material was subjected to the same procedure. The material was frozen in liquid nitrogen for 5 min and then transferred to a − 80 °C refrigerator for storage. Anthers were collected from cotton at − 3 ~ 0 DPA, stigmas were collected on the day of flowering, and fibers were collected at 7, 14, and 26 DPA (removing ovules). These samples were quickly placed in liquid nitrogen for freezing treatment for approximately 10 min and then transferred to a − 80 °C freezer for storage.
RNA extraction, cDNA library construction and RNA sequencing. Total RNA was extracted from each sample using the RNAprep Pure Plant Kit (Polysaccharides & Polyphenolics-rich) (TIANGEN, Beijing, China) following the manufacturer's protocol. We collected 14 samples from two biological replicates of each sample. Electrophoresis in a 1% agarose gel was used to determine whether the total RNA presented genomic DNA contamination, degradation or impurity. The concentration, purity and RNA integrity of the total RNA were further determined using a Kaiao K5500 spectrophotometer (Kaiao, Beijing, China) and an Agilent 2100 Bioanalyzer (Agilent Technologies, CA, USA). After the total RNA samples were qualified, oligo (dT) magnetic beads were used to enrich the mRNA. Fragmentation buffer (Agilent, CA, USA) was added to the obtained mRNA to generate short fragments. Then, the first strand of cDNA was synthesized with six-base random primers, and the second strand of cDNA was synthesized by adding buffer, dNTPs, RNase H and DNA polymerase I (NEB, MA, USA). The QIAQuick PCR kit was used for purification, and elution was conducted with EB buffer (QIAGEN, Germany). The purified double-stranded cDNA was then treated by terminal repair and A base and sequencing adapter addition (Illumina, CA, USA). Subsequently, a fragment of approximately 150 bp was recovered by agarose gel electrophoresis, and PCR amplification was performed to complete the library preparation. Finally, the constructed cDNA libraries were sequenced on the Illumina HiSeq 2000 platform.
Bioinformatics analysis of RNA-seq data to identify DEGs. The initial results of transcriptome sequencing were in the form of original images, which could be employed for base recognition and transformation using CASAVA (v1.8) software to obtain the original sequence data 65 . The quality of the original sequence was evaluated by detecting the base sequencing error rate and G+C distribution. Higher-quality clean reads were obtained by collating and filtering to remove adaptor tags, reads with an N content greater than 5% and lowerquality ultrashort reads. In this study, the whole-genome data of Gossypium hirsutum TM-1 were used as a reference genome sequence (http:// grand. crica as. com. cn/ home), and clean reads from seven different tissues were www.nature.com/scientificreports/ used for genome location analysis using TopHat2 software 66 . StringTie software was used to reassemble all clean reads for the prediction of new transcripts 67 . Gene expression level quantification was estimated via the FPKM (fragments per kilobase of transcript sequence per million fragments mapped) method using HTSeq software 68 . DEGseq software was used to identify the DEGs of different tissues according to a p value ≤ 0.05 and |Log2(Fold Change)| ≥ 2 69 . Finally, the statistical results for the DEGs among tissues were obtained.
Functional classification of DEGs. The functional analysis of DEGs was conducted with GO and KEGG annotation. The GOseq R (v4.0.2) software package 70 (https:// www.r-proje ct. org/)was used for the GO enrichment analysis of DEGs. We used KOBAS (v3.0) software (http:// kobas. cbi. pku. edu. cn/ kobas3/) to test the statistical abundance of DEGs in the KEGG database. The GO terms and KEGG pathways with corrected p values ≤ 0.05 were considered the thresholds to determine the significant enrichment of DEGs. The main functions and metabolic pathways of DEGs were preliminarily hypothesized.
Verification of RNA-seq results by qRT-PCR. To verify the accuracy of the obtained differential gene expression patterns between fiber and nonfiber tissues, we screened 36 genes that were significantly upregulated in fibers for verification analysis by qRT-PCR. These DEG-specific primers were designed with the Primer3 online tool (http:// bioin fo. ut. ee/ prime r3-0. 4.0/) (Table S5). According to the manufacturer's instructions, cDNA synthesis was performed from 1 μg of total RNA in a 20 μL reaction mixture using a PrimerScript RT kit (TAKARA, Dalian, China). The 20 μL reactions were performed using 10 μL of SYBR Premix Ex Taq II (TLi RanseH Plus) (TAKARA, Dalian, China), 0.8 μL of 10 mM forward and reverse primers each, 7.4 µL of ddH 2 O and 1 μL of cDNA template, after which amplification reactions were conducted. The cotton Sad1 gene was used as an internal reference gene. The qRT-PCR conditions were as follows: 95 °C for 30 s, 40 cycles of 95 °C for 5 s and 60 °C for 34 s. Three biological and technical replicates were performed for each sample to verify the results of the qRT-PCR test, and relative gene expression levels were quantified via the 2-ΔΔCt method.