Peppers (Capsicum annuum L.), belonging to the Solanaceae family, are one of the most economically important crops globally. Like other crops, peppers are threatened by diverse environmental conditions due to different pathogens and abiotic stresses. High-quality reference genomes with massive datasets of transcriptomes from various conditions can provide clues to preferred agronomic traits for breeding. However, few global gene expression profiling datasets have been published to examine the environmental stress-resistant mechanisms in peppers. In this study, we report the RNA-seq analyses of peppers treated with heat, cold, salinity, and osmotic stress at six different time points. RNA-seq libraries from 78 RNA samples containing three biological replicates per time point for each of the abiotic stresses and a mock control were constructed. A total of 204.68 Gb of transcriptome data were verified by differentially expressed genes and gene ontology enrichment analysis. Analyses of the transcriptome data in this study will provide useful information for basic studies of various stimuli to facilitate the development of stress-resistant pepper cultivars.
|Measurement(s)||RNA • transcriptome|
|Technology Type(s)||RNA sequencing|
|Factor Type(s)||time point measurement|
|Sample Characteristic - Organism||Capsicum annuum|
Machine-accessible metadata file describing the reported data: https://doi.org/10.6084/m9.figshare.11440659
Background & Summary
Abiotic stresses, such as heat, cold, drought, and salinity, which affect the condition of the soil can decrease crop quality, reduce crop production, and threaten food security. Plants respond to abiotic stresses via dynamic and complex reactions that accompany molecular, cellular, and physiological changes in plant tissues1. To understand the responses of plants to abiotic stresses, diverse crop breeding approaches have been applied from traditional breeding methods to variable -omics methods, such as next-generation sequencing (NGS).
Since the development of NGS, the transcriptome has been widely studied to gain insights into the molecular mechanisms by which plant species adapt to their environment. Currently, transcriptome data analyses of plants are performed in various organisms under diverse conditions, including exposure to abiotic stresses. Most transcriptome studies involving abiotic stresses have been performed in model plants, with a few studies examining crops treated with one or two different stresses at a certain plant development stage2,3,4,5,6,7,8. Therefore, limited comparative transcriptome analyses for plants responding to different abiotic stresses have been performed.
Peppers (Capsicum spp.) are an economically important crop belonging to the Solanaceae family. Over the last two decades, the production and cultivation of chili peppers have steadily increased worldwide, reaching 3.8 million ha of land for cultivation and 40.7 million tons of peppers produced in 2017 (FAO; www.fao.org). Recently, multiple reference pepper genomes and transcriptomes have been published; these datasets can be used to obtain abundant information on pepper breeding traits9,10,11. However, few studies have examined pepper breeding in diverse conditions, such as during exposure to abiotic and biotic stresses. Comprehensive transcriptome analyses under such diverse conditions are required to obtain a wide variety of gene expression profiles and identify complex gene expression networks.
In this study, we present transcriptome analyses of peppers subjected to four major environment stresses—heat, cold, drought, and salinity—at the same time points and at the same plant stages. We describe in detail the construction of 78 RNA-seq libraries for heat-, cold-, mannitol-, and NaCl-treated and untreated control samples at 0, 3, 6, 12, 24, and 72 h. A total of 204.68 Gb of transcriptome data were generated using transcriptome analysis pipelines consisting of quality control, quantification, and differential gene expression analyses. A principal components analysis (PCA) test, hierarchical clustering of gene expression data and gene ontology (GO) enrichment analysis were used to infer the quality of the RNA-seq data and the characteristics of samples in each treatment. The extensive transcriptome data obtained will provide valuable information for future studies of crops exposed to abiotic stresses.
Overview of experimental design
The third or fourth leaves were collected from four pepper plants per each biological replicate. Leaves were harvested at 3, 6, 12, 24, and 72 h after treatment. Mock controls were simultaneously collected with each abiotic treatment sample at 0, 3, 6, 12, 24, and 72 h. Marker gene expression for each condition was confirmed for 78 RNA samples by RT-PCR (Fig. 1). Subsequently, RNA-seq libraries were constructed and sequenced. Transcriptome data were used to conduct a quality assessment and aligned to Capsicum annuum cv. CM334 reference genome (v.1.6). The workflows for the abiotic stress treatment and transcriptome data analysis pipeline are presented in Fig. 1.
Plant materials and treatment
Two weeks after germination, the pepper seedlings were transplanted into a 32-plug tray (6 cm in diameter by 6.5 cm in height) and maintained in a growth room at 24 ± 1 °C with a 16-h light and 8-h dark photoperiod. At the six-true-leaf stage, plants were subjected to a temperature of 10 °C or 40 °C to mimic cold or heat stress, respectively. For salinity stress, plants were treated with 50 mL of a 400 mM NaCl solution; for osmotic stress, the peppers were treated with 50 mL of 400 mM mannitol. For transcriptome profiling, the third or fourth leaves from four plants were harvested per replicate at 0, 3, 6, 12, 24, and 72 h after treatment (Fig. 1). Three biological replicates at each time point per condition were collected. The leaf samples were flash-frozen in liquid nitrogen and stored at −80 °C until RNA isolation.
RNA extraction, library construction, and sequencing
Total RNA was extracted from pepper leaf samples (100 mg) using the Trizol reagent (Ambion, USA), according to the manufacturer’s instructions. To perform RNA quality control, RNA was quantified spectrophotometrically using a NanoDrop 2000 spectrophotometer (Thermo Scientific, USA), and RNA integrity was verified by agarose gel electrophoresis. Marker gene expression for each treatment was confirmed by RT-PCR analysis using primers specific for each marker gene: heat stress (CaWRKY)12, cold stress (CaDhn)13, salinity stress (CaPR10)14, and osmotic stress (CaDhn)13 (Fig. 1). RT-PCR was performed using a GeneAtlas thermo-cycler G-02 device (Astec, Japan) using rTaq DNA polymerase (Elpis, Korea) as described by the manufacturer. The gene expression level was normalized to the expression of the CaActin gene and was calculated relative to mock control. Values were calculated following three replications with standard deviations (Fig. 1). Five micrograms of each RNA sample were used to generate a strand-specific library containing inserts of approximately 150–200 bp in size, as previously described15. In total, 78 cDNA libraries from five treatments (i.e., the four abiotic stresses and a mock control) were constructed for transcriptome analysis (Table 1). For RNA sequencing, 150-nt, paired-end sequencing was conducted using a HiSeq 2500 platform (Illumina, USA) at Macrogen (Korea).
Data preprocessing, gene quantification, and GO enrichment analysis
The raw RNA sequences were filtered and trimmed using cutadapt16 and the NGS QC Toolkit17 to remove low-quality bases and adapter sequences. After filtering, the trimmed reads were assessed using FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/), and quality results were then merged with multiQC18 using default parameters. The preprocessed reads were aligned to C. annuum v.1.6 reference genome (GenBank Accessions: AYRZ00000000)19 and the annotation gene model v.2.0 (http://peppergenome.snu.ac.kr) using Hisat v2-2.1.0 software20 and default parameters. Transcriptome quantification was performed using featureCounts21 to calculate the transcript read counts. Raw read counts were normalized using TMM methods22. PCA was performed using previously published code with modification23.
We used the edgeR with three package (estimateDisp, glmQLFit, and glmQLFTest) to conduct differential expression analysis24. To design a model formula, all the experimental factors such as time points and treatments were combined into one factor. Then, we found genes that respond differently between the treatment and the mock at any time points. Genes with an adjusted p-value < 0.05 and fold change (FC) > |2| were considered differentially expressed genes (DEGs). The DEGs for each point in treatments were collected to union sets of the DE gene across the time points. The top 30 genes of DEGs for each stress were displayed into heatmap using pheatmap package25. GO enrichment analysis for DEGs were performed based on the functional annotation of Arabidopsis genome (TAIR10, http://www.arabidopsis.org). The best hit proteins were mapped by the best match of BLASTP with filtering category (query coverage ≥60%, subject coverage ≥60% and identity ≥60%). The GO enrichment analysis for best hit proteins was performed using clusterProfiler26 in the R package with org.At.tair.db for Arabidopsis annotation package27 (See the file “Programs and code information.docx” at figshare). The enriched GO terms were obtained by FDR < 0.05 and minGSize = 10. Then, the stress-related GO terms were visualized using ggplot228. All expression profiling, DEG information and the results of GO enrichment analysis are available at figshare27 (see the files “Normalized TMM.zip”, “DEG result for each stress.zip” and “GO enrichment analysis result.zip” at figshare).
The RNA-seq raw data of 78 samples are deposited at the NCBI Sequence Read Archive (SRA) with identifier SRP18779429. The gene expression quantification data of all the samples was deposited at Gene Expression Omnibus (GEO) database with identifier30. The combined additional files and information generating this study have been uploaded to figshare27.
The quality of the RNA-seq data was assessed by investigating the mean quality score per position and per sequence, as well as the GC content and read length distribution using FastQC and multiQC18. The assessment plots are shown in Fig. 2. The quality scores of the bases per position were higher than the Phred quality score of 25, and all reads were greater than the quality score of 20. The GC content of all samples was shown as a normal distribution; these data indicate a lack of sequence contamination during the sequencing process. These statistics revealed that the raw reads were of high quality. Alignments of the preprocessed reads had a high mapping rate, which on average 70.14% and 90.38% in the gene model (v.2.0) and reference genome of C. annuum (v.1.6), respectively27 (see the file “Statistical summary of RNA-seq for each sample.xlsx” at figshare).
Analysis of transcriptome data
To quantify global gene expression patterns for multiple abiotic stresses, the mapped reads were calculated into read counts for the individual pepper genes. The distributions of all samples for normalized read counts were compared and are shown as a boxplot in Fig. 3a. These distributions were similar between the samples. A PCA analysis revealed that the first two PCs explained most of the variance, and samples from each treatment belonged to the same cluster with similar patterns (Fig. 3b). We further investigated the global gene expression profiles by performing analyses of the DEGs related to each abiotic stress and compared the results to those of the mock control27 (See the file “DEG result for each stress.zip” at figshare). As shown in Fig. 3c, the y-axis depicts the fold change in the log2-transformed data, and the x-axis represents the log2-transformed average counts per million reads (CPM). Upregulated DEGs are highlighted in red, whereas downregulated DEGs are shown in blue, with an adjusted p-value < 0.05 and FC > |2|. The gene expression patterns for the 30 top-ranked DEGs are shown in a heatmap (Fig. 4a). We identified a total of 12,494 DEGs shared and unique between stress treatments (Fig. 4b). To validate plant responses to each abiotic stress, GO enrichments were analyzed27 (See the file “GO enrichment analysis result.zip” at figshare) and we represented stress-responsive GO enrichments showing conserved and unique GO terms by comparison of each treatment (Fig. 4c). The distinctive patterns of gene expression and GO enrichments suggest that these data would be useful for comparing changes in gene expression for other abiotic stresses.
Codes that were used for the RNA-seq data processing are available at figshare27. Software and their versions were described in Methods.
Zhu, J. K. Abiotic stress signaling and responses in plants. Cell 167, 313–324 (2016).
Coolen, S. et al. Transcriptome dynamics of Arabidopsis during sequential biotic and abiotic stresses. Plant J. 86, 249–267 (2016).
Rasmussen, S. et al. Transcriptome responses to combinations of stresses in Arabidopsis. Plant Physiol. 161, 1783–1794 (2013).
Cohen, S. P. & Leach, J. E. Abiotic and biotic stresses induce a core transcriptome response in rice. Sci. Rep. 9, 6273 (2019).
Ma, J. et al. Transcriptomics analyses reveal wheat responses to drought stress during reproductive stages under field conditions. Front. Plant Sci. 8, 592 (2017).
Liu, Z. et al. Temporal transcriptome profiling reveals expression partitioning of homeologous genes contributing to heat and drought acclimation in wheat (Triticum aestivum L.). BMC Plant Biol. 15, 152 (2015).
Gong, L. et al. Transcriptome profiling of the potato (Solanum tuberosum L.) plant under drought stress and water-stimulus conditions. PLoS One 10, e0128041 (2015).
Kang, W. H. & Yeom, S. I. Genome-wide Identification, classification, and expression analysis of the receptor-like protein family in tomato. Plant Pathol. J. 34, 435–444 (2018).
Kim, S. et al. New reference genome sequences of hot pepper reveal the massive evolution of plant disease-resistance genes by retroduplication. Genome Biol. 18, 210 (2017).
Kim, M. S. et al. Global gene expression profiling for fruit organs and pathogen infections in the pepper. Capsicum annuum L. Sci. Data 5, 180103 (2018).
Kang, W. H., Kim, S., Lee, H. A., Choi, D. & Yeom, S. I. Genome-wide analysis of Dof transcription factors reveals functional characteristics during development and response to biotic stresses in pepper. Sci. Rep. 6, 33332 (2016).
Dang, F. F. et al. CaWRKY40, a WRKY protein of pepper, plays an important role in the regulation of tolerance to heat stress and resistance to Ralstonia solanacearum infection. Plant Cell Environ. 36, 757–774 (2013).
Kim, S. Y. et al. Identification of a CaRAV1 possessing an AP2/ERF and B3 DNA-binding domain from pepper leaves infected with Xanthomonas axonopodis pv. glycines 8ra by differential display. Biochim. Biophys. Acta 1729, 141–146 (2005).
Park, C. J. et al. Pathogenesis-related protein 10 isolated from hot pepper functions as a ribonuclease in an antiviral pathway. Plant J. 37, 186–198 (2004).
Zhong, S. et al. High-throughput illumina strand-specific RNA sequencing library preparation. Cold Spring Harb. Protoc. 2011, 940–949 (2011).
Martin, M. Cutadapt removes adapter sequences From high-throughput sequencing reads. EMBnet. Journal 17, 10–12 (2011).
Patel, R. K. & Jain, M. NGS QC Toolkit: a toolkit for quality control of next generation sequencing data. PLoS One 7, e30619 (2012).
Ewels, P., Magnusson, M., Lundin, S. & Kaller, M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32, 3047–3048 (2016).
Kim, S. et al. Capsicum annuum cultivar CM334, whole genome shotgun sequencing project. GenBank, https://identifiers.org/ncbi/insdc:AYRZ00000000 (2017).
Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods 12, 357–360 (2015).
Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).
Robinson, M. D. & Oshlack, A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 11, R25 (2010).
Sollner, J. F. et al. An RNA-Seq atlas of gene expression in mouse and rat normal tissues. Sci. Data 4, 170185 (2017).
Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
Kolde, R. pheatmap: Pretty Heatmaps, https://CRAN.R-project.org/package=pheatmap (2015).
Yu, G., Wang, L. G., Han, Y. & He, Q. Y. ClusterProfiler: an R package for comparing biological themes among gene clusters. OMICS 16, 284–287 (2012).
Kang, W. H. et al. Transcriptome profiling of abiotic responses to heat, cold, salt, and osmotic stress of Capsicum annuum L. figshare. https://doi.org/10.6084/m9.figshare.10264832.v5 (2019).
Wickham, H. ggplot2: elegant graphics for data analysis. Springer-Verlag New York (2009).
NCBI Sequence Read Archive, https://identifiers.org/ncbi/insdc.sra:SRP187794 (2019).
NCBI Gene Expression Omnibus, https://identifiers.org/GEO:GSE132824 (2019).
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Korean Government (NRF-2017R1E1A1A01072843 and NRF-2019R1C1C1007472) and the Korean Research Institute of Bioscience and Biotechnology initiative program. N.K. and J.L. were supported by a scholarship from the BK21 Plus Program from the Ministry of Education. We appreciate the assistance from the KOBIC Research Support Program.
The authors declare no competing interests.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Kang, W., Sim, Y.M., Koo, N. et al. Transcriptome profiling of abiotic responses to heat, cold, salt, and osmotic stress of Capsicum annuum L.. Sci Data 7, 17 (2020). https://doi.org/10.1038/s41597-020-0352-7