Transcriptomic profile of tobacco in response to Phytophthora nicotianae infection

Black shank, caused by Phytophthora nicotianae (P. nicotianae), is a serious disease of cultivated tobacco (Nicotiana tabacum) worldwide. The interactions between tobacco and P. nicotianae are complex and the outcomes of the interactions depend on the tobacco genotype, P. nicotianae strain, and environmental conditions. In this study, we used RNA-sequencing (RNA-Seq) to investigate and compare transcriptional changes in the stems of tobacco upon inoculation with P. nicotianae strain race 0. We used two tobacco varieties: RBST (named from resistance to black shank and tobacco mosaic virus), which was resistant to the P. nicotianae strain race 0, and Honghuadajinyuan (HD), which was susceptible to P. nicotianae race 0. Samples were collected 12 and 72-hour post inoculation (hpi). Analysis of differentially expressed genes (DEGs) and significantly enriched GO terms indicated that several basic defense mechanisms were suppressed in both varieties, which included response to wounding (GO: 0009611), and defense response to fungus (GO: 0050832). We also found some genes that may especially be related to mechanisms of resistance in RBST, such as the one encoding a chitinase. These results will provide a valuable resource for understanding the interactions between P. nicotianae and tobacco plants.

The amount of P. nicotianae in the infected samples. It is unknown whether there were different trends for the change in the number of the pathogen during its infection of different tobaccos, we analyzed the read numbers that were derived from P. nicotianae. There were 1.9% and 2.3% reads were derived from P. nicotianae for HD in 12 h and 72 h. There were 1.4% and 2.2% reads were derived from P. nicotianae for RBST in 12 h and 72 h. The late stage of the infection had more reads of P. nicotianae than that at the early stage. The trends were similar for the two tobaccos. However, the susceptible variety HD had more reads of P. nicotianae than that the resistant variety RBST had at the early stage of infection.
Differentially expressed genes for each variety at two time points after infection. The gene expression profiles of healthy HD and RBST stems were used as controls. All genes and their gene expression data are listed in Supplementary Table S1 and Supplementary Table S2. If the gene expression in infected stems recorded a 4-fold (or more) difference relative to the control (P < 0.05), this gene was regarded as the differentially expressed gene (DEG).
For HD, there were 2,778 DEGs at 12 h, and 2,728 DEGs at 72 h after inoculation. The number of upregulated DEGs at 12 h after inoculation was 1,368, and 1,152 at 72 h (Fig. 2a). For RBST, the number of DEGs decreased from 691 at 12 h after inoculation to 550 at 72 h after inoculation. The number of upregulated DEGs was lower than that of downregulated DEGs at 12 h (121 vs. 570 genes) and 72 h (58 vs. 492 genes) after inoculation (Fig. 2b).

Samples
Raw reads Clean reads Clean bases Q20 (%) Q30 (%) Notes As shown in Fig. 2, a Venn diagram was generated from the DEG lists at 12 h, and 72 h after inoculation to identify shared members. There were 2,011 differentially expressed genes at all time points in HD. The number of upregulated and downregulated DEGs were 842 and 1,169, respectively. For RBST, there were 373 DEGs that could be identified at all-time points. The number of upregulated and downregulated DEGs were 20 and 353, respectively.
As shown in Fig. 2 Shared differentially expressed genes during P. nicotianae infection. The above-mentioned two lists of shared DEGs were further analyzed for commonalities and differences. As shown in the Venn diagram in Fig. 3, there were 264 common DEGs in two lists of DEGs. We were especially interested in the features of the 15-upregulated common DEGs (Supplementary Table S3). One gene, TR75953_c0_g12, was predicted to encode dehydration-responsive element-binding protein. This gene was involved in abscisic acid (ABA) -independent defense against pathogens in Arabidopsis thaliana and tobacco 13,14 . Another gene, TR30085_c0_g2, encode a zinc   Table S4). One gene, TR75736_c2_g1, which encodes a chitinase, is usually involved in the response to fungal pathogens 16 . Gene ontology (GO) enrichment analysis of differentially expressed genes. Within 49,935 identified unigenes, the protein products of 34,065 unigenes were annotated with at least one GO term. Using these 34,065 unigenes as references, 526 unique upregulated DEGs in HD_12 h, and 310 unique upregulated DEGs in HD_72 h, were enriched in 53 and 41 GO terms, respectively. These 53 GO terms were involved in "plastid part (GO: 0044435)" and "chloroplast thylakoid (GO: 0009534)" (Supplementary Table S5). These 41 GO terms were involved in "biological regulation (GO: 0065007)" and "gibberellin biosynthetic process (GO: 0009686)" (Supplementary Table S6). Similarly, 101 unique upregulated DEGs in RBST_12 h, and 38 unique upregulated DEGs in RBST_72 h, were enriched in 6 and 6 GO terms (Supplementary Table S7). These GO terms at 12 h were involved in "cell junction assembly (GO: 0034329) and "secondary cell wall (GO: 0009531)". The GO terms at 72 h were involved in "defense response, incompatible interaction (GO: 0009814)" and "innate immune response (GO: 0045087)".
Only five upregulated DEGs were specifically identified and two GO terms were specifically enriched in RBST. They were "exochitinase activity (GO: 0035885)" and "endochitinase activity (GO: 0008843)".
Quantitative RT-PCR (qRT-PCR) validation of differentially expressed genes. A subset of 10 genes, which responded to P. nicotianae infection, were selected for quantitative real-time PCR (qRT-PCR) analyses ( Supplementary Fig. S1). qRT-PCR analyses showed the trends of expression to be consistent with those found by RNA-Seq.

Discussion
Black shank, as one of the most important plant diseases, causes crop losses worldwide 17 . In order to identify tobacco genes involved in broad-spectrum resistance to tobacco black shank, a previous study using suppression subtractive hybridization (SSH) was performed in Nicotiana megalosiphon to identify differentially expressed cDNAs, and 48 differentially expressed genes were discovered 18 . In this study, we used next-generation sequencing approaches to investigate the gene expression changes associated with the characteristic disease development process induced in tobacco plants by P. nicotianae. The results will assist in the discovery and annotation of important genes in plant defense response, physiology and metabolism. RNA-Seq analysis identified a total of 49,935 unigenes in tobacco. GO analysis showed that 34,065 of the unigenes could be grouped into 49 known GO terms. Our analyses of GO terms generally represented the main biological GO classification and ensured the integrity of the downstream functional analyses of the candidate genes.
It is worth mentioning that various unique GO terms were enriched at different time points. For example, the cellular component GO terms plastid part (GO: 0044435) and chloroplast thylakoid (GO: 0009534) were only enriched in HD after 12 h of infection. At 72 h, GO terms biological regulation (GO: 0065007) and gibberellin biosynthetic process (GO: 0009686) were enriched. Close examination of these GO terms may provide molecular insight into the mechanism of response to pathogens at different times after infection. The chloroplast is a photosynthesis-related organelle. Photosynthesis has been reported to modulate plant defense responses induced by pathogen infection 19 . At 72 h, some new biological processes were enhanced, such as gibberellin biosynthetic process. Previous studies have proven that the function of plant hormone signal transduction plays an important role in the defense response 20 . It is interesting that increasing the content of gibberellin in N. benthamiana plants could enhance the susceptibility of these plants to infectious pathogens 21 . One gene, TR75953_c0_g12, was upregulated in the HD both at 12 h and 72 h. This gene was involved in abscisic acid (ABA)-independent defense against pathogens in Arabidopsis thaliana and tobacco 13,14 . ABA has been considered a negative regulator of disease resistance. These may be some reasons why HD is susceptible to P. nicotiana. Similarly, some special GO terms for RBST were identified at different time points. Cell junction assembly (GO: 0034329) was enriched at 12 h, and innate immune response (GO: 0045087) was enriched at 72 h. Our results further revealed that several genes related to innate immunity, e.g., ADR1 and RPM1 22,23 , were also induced during systemic symptom development. These results indicated that the two varieties had different mechanisms to respond to P. nicotianae infection at different time points.
Our differentially expressed gene (DEG) analysis provided further insight into the molecular mechanism for common biological processes between resistant and susceptible tobaccos after P. nicotianae infection. Analysis of significantly enriched GO terms indicated that several biological processes are common between the two varieties. Most of the upregulated DEGs are enriched in cellular component GO terms, which were related to organelle construction. On the other hand, the downregulated DEGs belong to biological processes: response to wounding (GO: 0009611) etc. These downregulated biological processes belonged to defense-related pathways. And these pathways played important roles in defense response 24 . Those down-regulated genes in these pathways were likely to decrease basal defense response and reduce disease resistance to P. nicotianae infection in both resistant and susceptible tobaccos. It is well known that plant pathogens have evolved to secrete effectors, which can manipulate the host immune system and suppress host defense 25 . Previous papers have evidenced that some resistance (R) genes can be activated by specific effectors, which is shown by the accumulation of higher levels of R genes transcripts. Therefore, we paid attention to those putative R genes that could possibly be induced by P. nicotianae infection of the two varieties. Among these genes, TR29096_c1_g6 and TR29096_c0_g1 were commonly up-regulated in HD and RBST. These R proteins directly or indirectly detect bacterial or fungal effectors, which will activate downstream signaling and lead to pathogen resistance 26 .
The variety RBST has high levels of resistance to P. nicotianae race 0. However, why it could avoid the development of symptoms was not clear. In this study, we found some genes and biological processes that may be related to this resistant mechanism. Gene TR75736_c2_g1, which encodes chitinase, participated in exochitinase activity (GO: 0035885) and endochitinase activity (GO: 0008843). The expression level of this gene was 10 times higher than that of the control. In vitro assays in rice indicated that the chitinase showed antifungal activity with a clear inhibitory effect on the growth of the pathogen Rhizoctonia solani 27 . The gene (TR24473_c0_g2) encoding plasma membrane protein, was upregulated in RBST. The plasma membrane is involved in important cellular process that determines signal response of plant to pathogen such as hypersensitive response 28 . There was no overlap between the 48 genes identified by SSH study 18 and genes identified by our RNA-seq study. This may due to the different species used in these studies. The SSH study was performed in Nicotiana megalosiphon.
Genetic resources for resistance to black shank in the varieties of tobacco currently in use for production are mainly derived from Florida 301 29 and N. plumbaginifolia 30 . The varieties currently used to produce commercial flue-cured tobacco are not completely immune to black shank, and the effects can still be seen when pathogen levels exceed a critical threshold. Therefore, to reduce losses in tobacco production, it will be essential to perform more studies to identify genes and pathways involved in resistance to black shank.
In conclusion, the gene expression patterns of HD and RBST after P. nicotianae infection provide a solid foundation for future studies of the molecular mechanisms underlying the response of tobacco to black shank.

Materials and Methods
Inoculation of tobacco with P. nicotianae. The resistant tobacco breeding line RBST and the susceptible tobacco cultivar Honghuadajinyuan (HD) were used in the experiments. The two varieties of N. tabacum were developed by our institution. The detailed development processes of the two varieties had been described in the Supplement Materials and Methods. A field-isolate of P. nicotianae race 0 was used for all inoculations throughout this study. The inoculum and the protocol for inoculation under greenhouse conditions were prepared as described 31 . The infected stem was harvested at two time points, 12 and 72 h post inoculation.
Sample collection and library preparation. All three biological replicate samples were used for RNA extraction. Total RNA was extracted from inoculated as well as noninoculated plants using TRIzol reagent (Invitrogen Corp., Carlsbad, CA). RNA purifications were performed using an RNeasy Mini Kit (Qiagen, Chatsworth, CA). Library preparation was carried according to the Illumina Hiseq RNA sample preparation kit (Illumina, San Diego, CA). All original data were deposited in the NCBI Sequence Read Archive database (accession number: SRP074868).
De novo assembly and annotation of transcriptomes. Raw data (raw reads) in fastq format were first processed through the NGS QC Toolkit 32 . In this step, clean data (clean reads) were obtained by removing reads containing adapter, reads containing ploy-N, and low quality reads from raw data. At the same time, Q20, Q30, and GC-content of the clean data were calculated. All downstream analyses were based on clean data. The black shank resistance gene in RBST was derived from N. plumbaginifolia 30 . The three varieties that had reference genomes do not have the introgression segment with resistance gene conferring resistance to P. nicotianae race 0 33 . So de novo assembled transcriptome was necessary. The transcriptome was de novo assembled using Trinity with the default parameters 34 . The candidates with the most probable longest ORF were generated from the Trinity assembly result using TransDecoder (https://transdecoder.github.io/). Protein sequences corresponding to the coding sequences of unigenes were obtained and searched against the NCBI non-redundant (nr) database, using blastp with a cut-off E-value of 10 −5 . Gene Ontology (GO) annotation was based on the Gene Ontology Database of Arabidopsis thaliana 35 . WEGO software was used to perform functional annotation analyses at three gene ontology levels (Biological Process, Molecular Function and Cellular Component) 36 .
Detection of differentially expressed genes. RSEM 37 , an accurate transcript quantification tool for RNA-Seq data, was utilized to quantify transcripts. Briefly, clean data were mapped back onto the assembled transcriptome. The read count for each gene was obtained from the mapping results. Differential expression analysis between inoculated and noninoculated plants was performed using the DESeq 38 and EdgeR 39 package. The resulting P values were adjusted using the Benjamini and Hochberg's approach for controlling the false discovery rate (FDR) 40 . The overlap between the sets of genes found by DESeq and edgeR with FDR < 0.05 and the absolute value of Log2 (Ratio) ≥ 2 were assigned as differentially expressed genes (DEGs). Venn Diagrams were drawn with VENNY (http://bioinfogp.cnb.csic.es/tools/venny).
Gene ontology enrichment analyses. We identified GO terms at three levels (biological process, molecular function and cellular component). GO terms enriched in genes from the differentiation analyses were identified with KOBAS software 41 . GO terms with a false discovery rate (FDR) 42 of less than 0.05 were considered over-represented.
Quantitative real-time PCR (qRT-PCR) validation. To validate the DEG results, 10 DEGs were randomly selected and qRT-PCR analysis was performed. Total RNA was isolated using a TRIzol kit (Invitrogen, USA). First-strand cDNA synthesis was performed using the PrimeScriptTM First Strand cDNA Synthesis Kit (Takara, Japan). The actin gene (GenBank no. X63603) was used as the internal control. Primer sets were designed using Primer Premier 6.0 software 43 (Supplementary Table S10). qRT-PCR was performed usinga SYBR Green qPCR kit (New England Biolab) according to the manufacturer's instructions. All qRT-PCR experiments were performed in triplicate using independent samples. Expression quantification and data analysis were performed using the 2 −ΔΔCt method 44 .
Data availability. The RNA-Seq raw data were deposited in the NCBI Sequence Read Archive (SRA) with the accession number SRP074868.