Transcriptome profiling of Capsicum annuum using Illumina- and PacBio SMRT-based RNA-Seq for in-depth understanding of genes involved in trichome formation

Trichomes, specialized epidermal cells located in aerial parts of plants, play indispensable roles in resisting abiotic and biotic stresses. However, the regulatory genes essential for multicellular trichrome development in Capsicum annuum L. (pepper) remain unclear. In this study, the transcript profiles of peppers GZZY-23 (hairy) and PI246331 (hairless) were investigated to gain insights into the genes responsible for the formation of multicellular trichomes. A total of 40,079 genes, including 4743 novel genes and 13,568 differentially expressed genes (DEGs), were obtained. Functional enrichment analysis revealed that the most noticeable pathways were transcription factor activity, sequence-specific DNA binding, and plant hormone signal transduction, which might be critical for multicellular trichome formation in hairy plants. We screened 11 DEGs related to trichome development; 151 DEGs involved in plant hormone signal transduction; 312 DEGs belonging to the MYB, bHLH, HD-Zip, and zinc finger transcription factor families; and 1629 DEGs predicted as plant resistance genes (PRGs). Most of these DEGs were highly expressed in GZZY-23 or trichomes. Several homologs of trichome regulators, such as SlCycB2, SlCycB3, and H, were considerably upregulated in GZZY-23, especially in the trichomes. The transcriptomic data generated in this study provide a basis for future characterization of trichome formation in pepper.


Results
Trichome phenotypic analyses. Trichomes are specialized epidermal cells, which are present on the leaves, stems, carpopodium, and calyx of pepper 3 . Here, a new variety of pepper, designated as GZZY-23, was identified, which exhibits large, dense, and long trichomes on the surface of the stems and leaves (Fig. 1A). However, another pepper variety PI246331 was glabrous on the leaves and stems (Fig. 1A).
To characterize the trichome phenotypes, we conducted a scanning electron microscopy (SEM) assay. We found that long macro-trichomes and sporadic short trichomes cover the stem surface of GZZY-23 (Fig. 1B). PI246331 had sporadic short trichomes on the stems and presented glabrous phenotype (Fig. 1B). Different from the trichomes in Arabidopsis and tomato 46,47 , the long trichomes on GZZY-23 stems were simple multicellular trichomes, which had no branches, glands, or prickles (Fig. 1A,B).

Sequencing and de novo transcriptome assembly. To identify differentially expressed genes (DEGs)
that possibly led to multicellular trichome formation, nine cDNA libraries of PI246331, GZZY-23, and Trichome were sequenced by Illumina RNA-seq, and one cDNA library was constructed by SMRT technology. For SMRT sequencing, the Pacific Biosciences Sequel platform was used to obtain 27,748,483 subreads. 299,864 circular consensus sequence (CCS) reads were identified, with an average length of 1654 bp and an N50 length of 1988 bp, including 226,497 full length (FL) reads, 192,252 full-length non-chimeric (FLNC) reads, and 192,094 FLNC reads with polyA. The average length of the FLNC reads with polyA was 1301 bp, and its N50 length was 1547 bp (Table S1).
For Illumina sequencing, 54.37 Gb clean reads were obtained with an average of 6.04 Gb for each sample, after removing low quality reads and trimming adapter sequences (Table S2). The average Q30 contents were 88.1% to 91.6%, and GC contents were 41.9% to 44.2%, respectively (Table S2). These data indicate that the sequencing results were accurate. To assess the quality of the RNA-Seq data, the reads were mapped to the Capsicum annuum. L Zunla-1 reference genome. Among them, 87%-93% clean reads in each sample were successfully mapped to the reference genome (Table S2). Of these, the proportion of mapped reads among PI246331, GZZY-23, and Trichome samples was 71% to 84% of the unique mapping ratio. The percentage of the unique mapping reads was above 75% in each sample, suggesting that the RNA-Seq libraries exhibited high quality. In addition,  (Table S3). These results demonstrated that the assembly quality of RNA-Seq data was satisfactory.
Gene structure and function annotation. In (Tables 1 and S5). The number of transcripts greater than 2 kb generated by SMRT sequencing was significantly higher than that produced from the reference genomes (Table 1) (Table S5). Alternative splicing (AS) can enhance transcriptome plasticity and proteome diversity. In plants, AS occurs at different stages of development 48 . It is well known that long reads generated by SMRT sequencing platform are suitable for widely and accurately identifying AS forms 49,50 . In this study, 4427 AS events were identified in the transcripts by ASTALAVISTAA Stalavista 51 . The main types of AS included exon skipping (ES), intron retention (IR), alternative donor sites (AD), and alternative acceptor sites (AA) (Table S6). Among the main types of alternatively spliced transcripts, IR was found to be predominant, accounting for 26.84% of the AS transcripts, followed by AA (18.86%), ES (12.26%), and AD (8.53%) (Table S6).
Long non-coding RNAs (LncRNAs), which are not translated into proteins, widely exist in plants and play important roles in plant growth and development 52 . Here, 2227 LncRNAs were predicted (Table S7). Additionally, 10,327 alternative polyadenylation (APA) sites in 5947 genes were identified using full-length transcriptome APA detection software Tapis 53 (Table S8).
Gene expression analysis based on Illumina and SMRT data. 194,591,984 clean read pairs produced by Illumina sequencing were aligned to the newly constructed transcript library using Bowtie software 54 . The gene expression patterns of PI246331, GZZY-23, and Trichome samples were determined using the fragments per kilobase million (FPKM) values and RSEM software 55 . The results revealed 28,704, 29,127, and 28,835 genes expressed in PI246331, GZZY-23, and Trichome, respectively ( Fig. 2A). Among these expressed gene clusters, the FPKM values of 75% of the genes were < 25, approximately (Fig. 2B), and those of approximately 3% of the genes were > 200 (Fig. 2B). It was clear that the gene expression levels in different materials displayed a similar trend (Fig. 2B). The gene expression distribution analysis showed that among the 31,389 expressed genes, 26,354 (84%) genes were expressed in all three materials, whereas a small number of genes were expressed only in one condition (2.4% of PI246331, 2.1% of GZZY-23, and 3.3% of Trichome) ( Fig. 2A). Furthermore, 965 (3.1%) genes were expressed in both GZZY-23 and Trichome, but not expressed in PI246331 ( Fig. 2A).
DEGs were statistically evaluated according to the DESeq2 method 56 . By comparing in pairs, 13,568 DEGs were obtained, of which 2955 DEGs were discovered in PI246331 vs. GZZY-23, 9824 DEGs in GZZY-23 vs. Trichome, and 11,813 DEGs in PI246331 vs. Trichome (Fig. 2C,D). Among these DEGs, the expression of 2049 (1176 upregulated and 873 downregulated) DEGs was both upregulated or downregulated in GZZY-23 and Trichome compared with those in PI246331, respectively. Furthermore, 3722 and 4608 DEGs were both upregulated or downregulated in Trichome compared with those in GZZY-23 and PI246331, respectively (Tables S9, S10, and S11). Moreover, 1038 DEGs overlapped in the three comparisons (Fig. 2C), including 521 and 474 DEGs most highly expressed in GZZY-23 and Trichome, respectively (Tables S9, S10, and S11). It was obvious that there were more genes upregulated in the three comparisons, indicating that many genes involved in trichome formation were highly expressed in GZZY-23, especially in Trichome.
To verify the sequencing data, 12 DEGs, including CaH (Capana00g000811), CaCycB2 (Capana10g002051), and CaCycB3 (Capana06g000649), were selected for qRT-PCR verification. As expected, Capana07g000998, Capana10g001433, and Capana11g000748 were highly expressed in GZZY-23 (Fig. 6). The expression of the other DEGs was the highest in the trichomes. Generally, the expression patterns determined by qRT-PCR were consistent with those obtained by RNA-Seq (Fig. 6), which confirmed the accuracy of the RNA-Seq results reported in this study. Rich factor is the proportion of the deferentially expressed genes in the pathway. The higher the rich factor, the higher the degree of enrichment. The FDR is the false discovery rate, ranging from 0 to 1; the closer the value to zero, the more significant enrichment.  www.nature.com/scientificreports/ DEGs related to signal transduction. In this study, a total of 151 DEGs were enriched in plant hormone signal transduction pathway in the three comparisons (Fig. 7). In auxin (indoleacetic acid, IAA) signal transduction, most DEGs encoding auxin response factor (ARF) and auxin responsive GH3 gene family (GH3) were upregulated in Trichome, whereas DEGs encoding auxin responsive protein IAA (Aux/IAA) and auxin influx carrier (AUX1) were downregulated in Trichome (Fig. 7). In cytokinine (CTK) signal transduction, DEGs encoding two-component response regulator ARR-A family (ARR-A) were upregulated in PI246331, whereas they were downregulated in Trichome. DEGs encoding two-component response regulator ARR-B family (ARR-B) were upregulated in Trichome (Fig. 7). In gibberellin (GA) signal transduction, DEGs encoding GA receptor GID1 and phytochrome-interacting factor 4 (TF) were upregulated in Trichome (Fig. 7). In brassinosteroid (BR) signal transduction, all DEGs were downregulated in Trichome (Fig. 7). In jasmonic acid (JA) signal transduction, 1 and 5 DEGs encoding jasmonate ZIM domain-containing protein (JAZ) were upregulated and downregulated in Trichome, whereas DEGs encoding jasmonic acid-amino synthetase (JAR1) and jasmonate ZIM domain-containing protein (JAZ) were upregulated (Fig. 7). Moreover, all DEGs involved in abscisic acid (ABA), ethylene (ETH), and salicylic acid (SA) signal transduction, were upregulated in trichomes, except two DEGs encoding the ABA receptor PYR/PYL family (PYR/PYL) and transcription factor TGA (Fig. 7).

Discussion
In the last two decades, molecular mechanisms involved in trichome formation have been investigated for the purpose of analyzing the factors controlling cell fate and plant cell differentiation 6,15 . However, the molecular regulation basis of multicellular trichome formation in pepper remains elusive. Similar to trichomes in tomatoes, pepper trichomes are multicellular structures. GZZY-23 is a hairy pepper variety with a high density of long non-glandular trichomes on the leaves and stems, and a few short glandular trichomes (Fig. 1A,B). PI246331, on the other hand, has sporadic short trichomes on the stem, showing a hairless phenotype (Fig. 1B). Therefore, these two materials are ideal candidates, making it easier to study the formation and development of multicellular trichomes. RNA-Seq has become one of the most widely used tools based on next-generation sequencing technology, and has been used to analyze transcriptional regulation during trichome formation in plants such as tomato, cucumber, and tobacco 33,57,58 . The proportion of full-length transcripts from Illumina RNA-Seq assembly is small, and inaccuracy in gene structure characterization resulting from misassembly is a common problem 44 . SMRT sequencing developed by PacBio can help achieve full-length sequencing without full-length post-sequencing assembly, and the error rate of PacBio reads can be overcome by correction with Illumina RNA-Seq 42 . Thus, the hybrid approach combining both short-read next-generation sequencing technology and long-read SMRT sequencing, as a high throughput and cost-effective approach of sequence determination, has considerably improved the quality, efficiency, and speed of gene discovery. In the present study, transcriptome sequencing and analysis were performed to reveal the molecular mechanisms of pepper trichome formation using a hybrid approach combining both Illumina-and SMRT-based RNA-Seq; 40,079 genes were obtained, which contained 35,336 previously annotated genes and 4,743 novel genes (Tables 2 and S5). Since there are many gaps in the Zunla-1 genome data, it cannot effectively predict all transcripts, especially long transcripts. However, long transcripts can be efficiently obtained using SMRT sequencing. Therefore, in this study, we found that SMRT sequencing produced more long transcripts, especially transcripts larger than 2 kb, compared with those produced by the reference genome (Table 1). A previous study showed that lncRNAs are involved in the regulation of trichome formation 59 . Therefore, it was necessary to screen lncRNAs and determine their possible roles in pepper trichome initiation. Here, 2,227 lncRNAs were identified based on PacBio sequencing data (Table S7). AS and APA are two ubiquitous post-transcriptional regulations that could generate transcriptome and subsequently proteomic diversity in eukaryotes 60 . PacBio sequencing has enabled the identification of the complexity of AS in plants at the genome-wide level and the identification of cleavage sites for polyadenylation, which is important for gene annotation 43,61 . Thus, using PacBio sequencing, we obtained 4,427 AS events in this study www.nature.com/scientificreports/ (Table S6). It was possible that these differential splicing-related genes are involved in the splicing regulation during trichome formation. PacBio sequencing was also used for obtaining long reference sequence and for profiling 10,327 APA sites from 5,947 genes in pepper (Table S8). The data presented a comprehensive view of APA at the genome-wide level.
Previous studies have showed that many genes essential for multicellular trichome formation were highly expressed in hairy materials, especially in trichomes, such as Wo, SlCycB2, SlCycB3, NtCycB2, H, and CaH 25,32,33 . Thus, according to the comparative analysis of the three groups, DEGs highly expressed in trichomes might participate in multicellular trichome formation in pepper. Here, 1,038 overlapping DEGs were identified in the three comparisons (Fig. 2C). Of which, 521 and 474 DEGs were most highly expressed in GZZY-23 and Trichome, respectively (Tables S9, S10, and S11). Interestingly, many homologs of known trichome regulators were also highly expressed in GZZY-23, especially in trichomes, such as CaH and CaCycB2 (Fig. 6; Tables S9, S10, and S11).
The formation of multicellular trichomes may be controlled in a distinct pathway from unicellular trichomes. However, multicellular trichomes in Solanaceous species may share a common regulatory pathway, as they are mediated by HD-ZIP IV, SlCycB2, and C2H2 26 . In the present study, the homologs of SlCycB2 and H were also found to be significantly differentially expressed in the three comparisons. CaCycB2 (Capana10g002051), CaCycB3 (Capana06g000649), and CaH (Capana00g000811) were upregulated in GZZY-23, especially in trichomes ( Fig. 6; Tables S9, S10, and S11). Many genes belonging to the HD-ZIP transcription factor family were highly expressed in GZZY-23, especially in trichomes (e.g., Capana01g001442 and Capana03g001526) (Table S15). Therefore, genes involved in pepper trichome formation may also function in the formation of trichomes in other Solanaceae members.
Several studies have reported that transcription factors are involved in trichome formation, including MYB, bHLH, HD-Zip, and C2H2 25,26,58 . In this study, transcription factor activity, sequence-specific DNA binding (GO:0003700), and sequence-specific DNA binding (GO:0043565) were obviously enriched (Fig. 3A). Moreover, 11 DEGs involved in trichome differentiation (GO: 0010026) and trichome morphogenesis (GO: 0010090) were identified in the three comparison groups (Table S14). Of these, Chr06.1236, Capana00g002415, and Capana00g003240 belonged to the MYB family. Capana07g001455 encoded the HD-ZIP transcription factor (Tables S9, S10 and S11). Here, 312 TFs belonging to the bHLH, bZIP, C2H2, MYB, and HD-Zip families were significantly upregulated or downregulated in GZZY-23 or Trichome (Table S15). The differentially expressed TFs highly expressed in trichomes were identified, and most of these TFs showed a higher expression in GZZY-23 than in PI246331, suggesting that they possibly participate in the regulation of multicellular trichome formation, such as CaH (Figs. 5 and 6). Interestingly, some TFs that were highly expressed in GZZY-23 were downregulated in trichomes, suggesting that these genes may regulate trichome initiation ( Fig. 5; Table S15). Studies have shown that the formation and development of trichomes are regulated by hormonal signals such as the GA, CK, and IAA signals 15,26 . The initiation of trichome development can be induced in the glabrous GA-deficient mutant gal-3 by applying exogenous GA 57 . The application of CK resulted in the formation of trichomes on the stem of the inflorescence 62 . The suppression of SlIAA15 in tomato can reduce the formation of trichomes, and this demonstrated that auxin is also required for multicellular trichome initiation 63 . Many genes related to the signaling pathway of phytohormones were also changed in the present study. We identified 151 DEGs involved in hormone signaling (Fig. 7). Most of those involved in SA, ABA, and ETH were upregulated in trichomes (Fig. 7).
Plant trichomes frequently function as the first line of defense against biotic and abiotic stresses through physical and/or chemical resistance 26,64,65 . Overexpression of a weed (Solanum americanum) proteinase inhibitor SaPIN2a in tobacco increased glandular trichome density and branches, which enhanced resistance to Helicoverpa armigera and Spodoptera litura 65 . In addition, overexpression of Wo in tobacco not only induced the formation of trichomes, but also increased plant resistance to aphids, suggesting that leaf trichome density positively correlated with herbivore resistance 26 . GO enrichment analysis showed that response to stress (GO:0006950), defense response (GO:0006952) and chitin metabolic process (GO:0006030) were enriched by upregulated DEGs in the comparisons PI246331 vs. GZZY-23 and PI246331 vs. Trichome (Fig. 3A,B and Table S12). Moreover, in our KEGG pathway analysis, upregulated DEGs were predicted to function in MAPK signaling pathway-plant (ko04016) and plant-pathogen interaction (ko04626) in the three comparisons ( Fig. 4 and Table S13), which was similar to the results of previous studies 26 . We thus speculate that the trichomes in GZZY-23 leaves and stems (Fig. 1A,B) may increase the resistance to herbivores. Previous studies demonstrated that hormones not only participate in trichome development, but also play an essential role in plant defense against herbivores 15,26,57,62,63,66 . Here, numerous genes related to ABA, ETH, JA, and SA signaling pathway were significantly upregulated in GZZY-23, especially in trichomes (Fig. 7), which may also contribute to herbivore resistance in pepper. Moreover, we found that the expression level of many genes participating in chloroplast formation and photosynthesis was repressed in GZZY-23, especially in Trichomes (Table S9, S10, S11). Therefore, it can be implied that the photosynthetic activity in GZZY-23 and Trichomes may be insufficient. Likewise, the declined photosynthetic activity in tobacco plants with more trichomes was observed due to quick cell proliferation in multicellular trichomes 26 .

Conclusions
To our knowledge, this is the first study to perform comparative transcriptome analysis between two Capsicum annuum L. cultivars to identify biological processes and functional gene activities involved in trichome development. We screened 312 DEGs that may be involved in trichome formation, belonging to the bHLH, MYB, HD-ZIP, and zinc finger protein families. Moreover, trichome regulators CaCycB2, CaCycB3, and CaH were considerably upregulated in the cultivar GZZY-23, especially in trichomes. The DEG analysis of plant hormone signal transduction indicated that plant multicellular trichome development may be mainly dependent on the IAA and CTK signaling pathways, and plant stress may be dependent on the ETH, ABA, and SA signaling pathways. In addition, 1629 DEGs were predicted as PRGs. Most of these genes were upregulated in GZZY-23 www.nature.com/scientificreports/ or trichomes. An extensive characterization of pepper trichomes will not only help understand the underlying molecular mechanisms involved in multicellular trichome development but also pave the way for creating new pepper varieties with desired trichome growth and density. Library construction and sequencing. Total  RNA-Seq data and differential expression analysis. The raw RNA-Seq data obtained after sequencing of cDNA libraries were quality checked with the FastQC package to eliminate low quality reads with only adaptor, unknown nucleotides of > 5%, or Q20 of < 20%. The high-quality clean reads screened from raw reads were mapped independently to the reference genome Capsicum annuum L. www.nature.com/scientificreports/ instructions. We performed quantitative real-time PCR (qRT-PCR) in 96-well plates on Thermo Fisher Scientific Biosystems QuantStudio 5 Real-Time PCR system (Applied Biosystem, MA, USA) using the SYBR Premix Ex Taq Kit (Takara, Dalian, China). Ubiquitin-conjugating protein CaUbi3 (Accession Number: AY486137.1) was used as the internal reference 75 . Three independent biological replicates were analyzed. The relative expression level of the selected genes was calculated using the 2 -∆∆CT method 76 . All primers for qRT-PCR were designed according to the transcript sequences using Primer Premier 5.0 and the primers used in this experiment are listed in Table S17.

Materials and methods
Identification of AS events and lncRNAs. AS events were identified using ASTALAVISTA program from Illumina and SMRT RNA-Seq data in this study. The Cuffdiff tool was used to identify these AS events using transcript models obtained from cufflinks 77 . The transcripts were excluded, which displayed a read number of less than 25% uniquely mapped reads compared with that of the surrounding exons, insufficient read coverage (read number < 30) at junction sites, and putative AS events that would lead to very short proteins in the alternate ORF 78 . Moreover, the SMRT RNA-Seq data were used to identify lncRNAs as described previously 79 .
After discarding the transcripts without strand information and transcripts that overlapped with known genes, the transcripts of longer than 200 nucleotides with FPKM higher than 0.5 in multiple-exons or 2 in single-exon in at least one sample were screened. The coding potential of the remaining transcripts was predicted using the coding potential calculator (CPC) and coding-non-coding index (CNCI) program; the transcripts with CPC scores ≤ 0 or CNCI ≤ 0 were retained as candidate lncRNAs.
Identification of TFs and PRGs. TFs, like MYB, bHLH, HD-ZIP, and zinc finger proteins were identified using PlantTFDB (http:// plant tfdb. gao-lab. org/), which included the sequences of 58 transcription factor families from 165 plant species 80 . The UniGene sequence was compared with the transcription factor database by Blastx alignment and the gene with the best E value < e −5 was selected as the annotation information for the UniGenes. There were more than 112 resistance genes and 104,335 candidate resistance genes in the Plant Resistance Genes database (http:// prgdb. crg. eu/ wiki/) 81 . The sequences of UniGenes were compared with those in the PRG database by Blastx alignment, and the best of these with an E value < 10 -5 was screened as the annotation information of the UniGene.

Data availability
The collection of seed complied with local and national guidelines and permissions of seed were obtained. The RNA-Seq data supporting the results of this article have been uploaded to the Sequence Read Archive of NCBI (National Center for Biotechnology Information). It could be accessed via the NCBI SRA database with accession numbers of PRJNA658737.