De-novo transcriptome assembly and analysis of lettuce plants grown under red, blue or white light

Lettuce (Lactuca sativa) is grown in various parts of the world for use as a leafy vegetable. Although the use of light-emitting diode (LED) in controlled plant production systems has been successfully used to enhance nutritional quality and plant growth efficiently, the molecular basis of lettuce’s response to varying light spectra is not studied. Using next-generation sequencing, we have analyzed the transcriptomes of leaf lettuce (Lactuca sativa var. ‘New Red Fire’) grown hydroponically in a modular agricultural production system under three different types of LED lighting: red, blue, and white light. Illumina HiSeq sequencing platform was used to generate paired-end sequence reads (58 Gb raw and 54 Gb clean data) of the transcriptome of lettuce leaves exposed to varying light spectra. The de novo assembled final transcriptome contained 74,096 transcripts. Around 53% and 39% of the assembled transcripts matched to the UniProt and RefSeq RNA sequences, respectively. The validation of the differentially expressed transcripts using RT-qPCR showed complete agreement with RNA-Seq data for 27 transcripts. A comparison of the blue versus red light treatments showed the highest number of significantly differentially expressed transcripts. Among the transcripts significantly up-regulated in blue-light-exposed leaves compared to white-light-exposed leaves, ~ 26% were involved in the ‘response to stress’. Among the transcripts significantly upregulated under red light compared to white light, ~ 6% were associated with ‘nucleosome assembly’ and other processes, such as ‘oxidation–reduction process’ and ‘response to water deprivation’ were significantly enriched. Thus, the result from the current study provides deeper insights into differential gene expression patterns and associated functional aspects under varying light qualities.


Materials and methods
Plant materials and treatments. Lettuce seeds Lactuca sativa L. cultivar 'New Red Fire' , was procured from Vesey Seeds, Canada. The experiments were conducted in compliance with the relevant international guidelines. No additional permissions are required to use the above-mentioned seed material as they were procured from a commercial source.
The seeds were sown in cell plug trays (60 × 41 × 5 cm) containing rockwool pellets (Grodan AO cubes Canada). The seedlings were grown for 15 d in a germination chamber (Percival Model GR-36L, USA) with 24 h photoperiod at a temperature of 20 ± 1 °C. The seedlings were transferred to a Modular Agricultural Production System (MAPS) which was custom-built by JGS Ltd. and the University of Guelph in conjunction with Com Dev International Ltd. (Canada) and Intravision Group AS (Norway) and were grown hydroponically for 35 days under conditions of 20 ± 1 °C temperature, 70 ± 10% relative humidity (Philips Humidifier AC2729/90, The Netherlands), and carbon dioxide levels of 500 µmol mol −1 (Module SCD30, Sensiron, Switzerland). Modified Hoagland nutrient solution 25 was used in recycling mode with a pH maintained at 5.5 ± 0.2 and EC 1.5 dS m −1 . The three treatments included exposure of lettuce to 100% red (Wavelength 630 nm; LUMILEDS, Philips Lumileds Lighting Company, The Netherlands), blue (Wavelength 460 nm; LUMILEDS, Philips Lumileds Lighting Company, The Netherlands), and white light (Wavelength 400-700 nm with peak at 550 nm; LUMILEDS, Philips Lumileds Lighting Company, The Netherlands) produced by LEDs separately in each level of the MAPS. The photosynthetic photon flux density was measured using a photometer (LI-250A, LI-COR Inc, USA) at a distance of 20 cm above the benchtop. The light intensity was adjusted to 250 µmol m −2 s -1 . Light spectral distribution was scanned using a spectroradiometer (RPS-900R, International Light Technologies, USA) at a distance of 20 cm above the tabletop. A light and dark 18:6 photoperiod cycle was used throughout the experiment.
A total of 54 seedlings were planted at each level for each treatment constituting 162 plants in total. The nutrients were supplied from the same tank for all the treatments and all the parameters were kept identical for the treatments except for light quality. Leaf samples were collected for RNA analysis from the lettuce plants on day 32 of the transfer of seedlings to the MAPS. Fully developed leaf samples were collected using a pair of sterilized scissors from 10 independent replicate plants to form one pooled biological replicate. For maintaining the homogeneity of the samples, multiple leaf samples were collected and pooled from each plant to have an unbiased representation of the leaf transcriptome. For each treatment, two such pooled biological replicates were collected constituting six samples for three treatments. The tissue samples were harvested using sterile scissors, wrapped in labeled aluminum foil, and snap-frozen in liquid nitrogen to preserve the biological status of the collected tissues. . Total RNA concentration and purity was measured using the NanoDrop™ spectrophotometer (Thermo Scientific, USA). The library was prepared for sequencing using the TruSeq RNA Sample Prep Kit v2 (Illumina Inc., USA). 200 ng total RNA was used as a starting material for purification using the oligo-dT beads. mRNA having a poly (A) tail were fragmented using the Elute, Prime, Fragment mix (Illumina Inc., USA). First-strand cDNA was synthesized by First Strand Master Mix, Super Script II reverse transcription kit (Invitrogen, CA, USA) under the following conditions: 25 °C for 10 min, 42 °C for 50 min, 70 °C for 15 min, followed by addition of second strand master mix, for the synthesis of the second strand at 16 °C for 1 h. Subsequently, the fragmented DNA was subjected to repair using the End Repair Mix by incubating the samples at 30 °C for 30 min. The end-repaired cDNA was further purified using Ampure XP Beads (Agencourt, Beckman Coulter, USA). Further, the samples were subjected to A-tailing and the RNA index adapter was added. The end-repaired cDNA was purified with Ampure XP Beads. Several rounds of PCR amplification with PCR Primer Cocktail and PCR Master Mix were performed to enrich the cDNA fragments. Then the PCR products were purified with Ampure XP Beads. The libraries were amplified on cBot to generate the cluster on the flowcell (TruSeq PE Cluster Kit V3-cBot-HS, Illumina, USA). The amplified flowcell was then sequenced to obtain paired-end reads of 150 bp using the Illumina HiSeq 2000 sequencer.
Quality analysis and trimming of raw data. A total of 58 Gb paired-end sequencing data of 150 bp read length (163.07 million reads) was checked for quality using FastQC v0.11.4 (https:// www. bioin forma tics. babra ham. ac. uk/ proje cts/ fastqc/) before and after trimming. The raw data was trimmed and filtered for lowquality reads based on base-quality score and length, using Prinseq-lite v0.20.4 26 . The minimum Phred quality score and read length considered for trimming was 20 (average quality score for each read) and 50 bp, respectively, allowing maximum Ns of 2.
De-novo transcriptome assembly, filtering, and quality assessment. After the stringent quality filtering and trimming of the raw data, 54 Gb of sequencing data (162.73 million reads) was used for de-novo transcriptome assembly using Trinity v2.4.0 27 with default parameters. The assembled transcriptome was quantified using RSEM tool within Trinity, and lowly expressed transcripts having a TPM of < 1 were excluded. Further, identical transcripts having ≥ 99% identity were removed from the raw assembly using CD-HIT 28 .
The quality of the filtered assembly was assessed by aligning the filtered reads back to the filtered assembly using Bowtie2 29 . Homology-based annotation of the filtered assembly was performed using BLASTx 30 against UniProt/SwissProt 31 and BLASTN against NCBI-RefSeq [https:// www. ncbi. nlm. nih. gov/ refseq/] RNA sequences (both the databases were downloaded in June 2017), with an E-value threshold of 1e-10 and maximum of one target sequence.
Quantification and differential analysis of genes and transcripts. Genes and transcripts were quantified using the filtered assembly across different conditions by RSEM package 32 within Trinity with default parameters. Differential expression analysis was performed at both gene and transcript levels using edgeR package 33 using RSEM calculated fragment counts. Significantly differentially expressed gene and isoform clusters between conditions were derived using |log2 fold|≥ 1 (absolute fold ≥ 2) and FDR corrected p-value of 0.01. Transcripts differentially expressed in both red and blue light compared to the white light-treated samples were considered for heatmap generation. The heatmap was constructed using the TPM values of the corresponding transcripts in ClustVis tool 34 by Euclidean distance matrix and average linkage method.
Validation of differentially expressed transcripts using RT-qPCR. A total of 28 significantly differentially expressed transcripts were selected across the comparisons for validation using reverse transcriptionquantitative polymerase chain reaction (RT-qPCR). Ten differentially expressed transcripts (5 up-and 5 downregulated) in each of the red and blue light conditions compared to white light were selected. Another set of 8 transcripts commonly differentially expressed (5 up-and 3 down-regulated) in both blue and red light compared to white light treatment was also selected for validation. Further, two transcripts (TRINITY_DN16902_c3_g1_ i2 and TRINITY_DN14773_c0_g2_i1) that showed no change in their expression in both blue and red light conditions compared to the white light were selected as the control. A total of nine PCR experiments were performed across all the samples. The primer details for all the transcripts considered for validation are provided in Supplementary File 1. Reverse transcription was performed using 250 ng of total RNA (iScript Reverse Transcription Supermix, Biorad, USA), followed by quantitative PCR on a Biorad Real-Time System (CFX96, Biorad, USA) using SYBR green real-time PCR mix (Biorad, USA). The PCR mix was incubated at 95 °C for 5 min and amplification was performed using the following cycling parameters: 40 cycles at 95 °C for 20 s, 60 °C for 20 s, and 72 °C for 20 s, and default melt curve setting were followed. The fold change of the transcripts between the test and control conditions was calculated using 2 −△△ CT method 35 . Functional analysis of differentially expressed isoforms. Significant  www.nature.com/scientificreports/ functional annotations. BLASTx 30 was used for matching the differentially expressed transcripts against Uni-Prot/SwissProt protein database with an E-value of 1e-10, and the results were used to obtain Gene Ontology annotations using Blast2GO 36 . The alignments considered were 20 with a word size of 3, while performing BLASTx 30 . Additionally, to gain better insights into the molecular mechanisms associated with light treatment, the best-matched Arabidopsis thaliana proteins (to the significantly differentially expressed isoforms) were analyzed for the enriched annotations, using DAVID Bioinformatics Resources 6.8 37 , and annotations and pathways enriched with a p < 0.05 were considered significant.

Results
RNA isolation, transcriptome sequencing, and de-novo assembly. The extracted RNA concentration was in the range of 210-409 ng/µl and the optical density 260/280 absorbance ratio was ranging between 2.06 and 2.12 indicating high quality of the RNA samples. Illumina sequencing produced around 27 million reads per sample, and after stringent quality filtering, 99.8% of the data was retained across different sample groups corresponding to blue, red, and white light treatment (Table 1). After stringent filtering, high-quality sequence data from all the samples were used for de-novo transcriptome assembly. The raw assembly contained 125,444 transcripts with an N50 value of 1656 bp ( Table 2). Assembled transcripts were filtered to remove low-quality transcripts having TPM (Transcripts per Kilobase Million) value of less than 1, and transcripts with 99% identity. A total of 74,096 transcripts having a length ranging from 201 to 13,702 bp (mean: 1072 bp) were obtained in the filtered assembly. The N50 value of the filtered assembly was found to be 1594 bp ( Table 2).
Quality assessment and annotation of filtered assembly. The quality assessment of the filtered assembly was performed by aligning the reads back to the transcriptome assembly and matching the assembled transcripts to the UniProt/SwissProt and RefSeq databases. Approximately, 95% of the filtered reads were aligned to the generated assembly (Table 3), indicating the completeness of the assembly. Furthermore, approximately 53% (39,242) and 39% (28,653) of the assembled transcripts showed similarity with UniProt/SwissProt protein Quantification of genes and transcripts and differential expression analysis. Transcripts with a minimum TPM value of 1 were considered as expressed. Plants grown under red light treatment expressed the highest number while the plants grown under white light expressed the lowest number of transcripts. Furthermore, lettuce grown under the red light resulted in the expression of the highest number of unique transcripts compared to the ones that were grown under blue or white light ( Table 4). The number of common and unique transcripts expressed in different samples is shown in Fig. 1.
A total of 2279 transcripts were found to be significantly differentially expressed between lettuce grown under blue and white light. Among these, 1388 were upregulated, whereas 891 were downregulated in plants grown under blue light compared to white light. A list of top 20 differentially expressed transcripts along with   www.nature.com/scientificreports/ their best-matched SwissProt sequences is provided in Table 5. Furthermore, a total of 1751 transcripts were significantly differentially expressed (upregulated: 547; downregulated: 1204) between lettuce grown under red and white light.  Fig. 2A) or red and white light (Fig. 2B) against the log-average (A-values, i.e., the average level counts for each gene across the two samples). The heatmap constructed using differentially expressed transcripts in lettuce grown under blue or red or white light treatment showed a clear clustering of the samples of the same groups (Fig. 3).

Validation of differentially expressed transcripts.
A total of 28 significantly differentially expressed transcripts were considered for validation using RT-qPCR. However, one transcript, TRINITY_DN17554_c0_ g6_i2 was excluded as it showed multiple bands. All 27 transcripts when tested using RT-qPCR analysis, showed complete agreement with the RNA sequencing data. The transcripts were differentially expressed with a minimum |log2 fold|> 2 according to RT-qPCR (Table 7).  (Fig. 4). The biological processes, such as photosynthesis, cell cycle, secondary metabolic process, signal transduction, and protein folding were represented by the upregulated unigenes by both blue and red light exposure. Approximately, 26% of the significantly upregulated transcripts in blue vs. white light were found to be involved in 'response to stress' . Further, the upregulated unigenes were found to be involved in various metabolic and biosynthetic processes, such as cellular nitrogen metabolic processes, carbohydrate metabolism, and lipid metabolism. A few of the transcripts were also found to be involved in biological processes related to transport, cell differentiation and morphogenesis, and homeostasis (Fig. 5A). Furthermore, the downregulated transcripts were found to be involved in various processes (Fig. 5B). The transcripts significantly upregulated in plants grown under red vs. white light were found to be involved in processes such as nucleosome assembly, oxidation-reduction process, cell cycle as well as various DNA replication-related processes (Fig. 5C). Interestingly, many similar biological processes were downregulated by both red and blue light treatment ( Fig. 5B and D). A few of these processes include 'oxidation-reduction' , 'response to light' , 'response to cytokinin' , 'response to water deprivation' and 'photosynthesis' . A complete list of GO annotations (with at least two isoforms involved) along with the transcript identifiers is provided in Supplementary file 4. As the A. thaliana genome is well annotated, we performed GO and pathway enrichment analysis using the best matched Arabidopsis proteins to understand the mechanisms underlying the blue or red light exposure. The analysis identified several GO biological processes and KEGG (Kyoto Encyclopedia of Genes and Genomes database 39 ) pathways to be significantly enriched in each pair-wise comparison. We found 'response to water deprivation' to be enriched significantly both in blue and red-light treatment groups. Interestingly, 'response

Discussion
RNA-sequencing of paired-end data corresponding to leaf lettuce grown under blue, red, or white light was generated using Illumina HiSeq sequencing platform. The raw data was filtered for low-quality reads, and a de-novo transcriptome assembly was constructed. The low-coverage (TPM < 1) and redundant sequences (99% identity) were filtered out from the raw assembly to obtain the final assembly. The filtered assembly contained a set of 74,096 unigenes with an N50 value of 1594, which was comparable to a recent study published on red leaf lettuce 40 . Based on the overlapping of transcripts across different light conditions, we found that the overall transcriptome of the lettuce plant appears to be conserved. A total of 24,475 transcripts were shared across the tested treatment sets consisting of RNA isolated from leaf lettuce grown under blue, red, and white light. Furthermore, a total of 5404 transcripts were expressed exclusively in plants grown under red light, whereas 3759 transcripts were exclusively expressed in plants grown under blue light, which indicates that the red light treatment may activate the expression of a large number of isoforms compared to that of blue or white light treatment.
Around 95% of the filtered reads were aligned back to the assembled transcriptome, indicating a high degree of accuracy of the generated assembly. The final filtered assembly was functionally annotated, and approximately 53% of the assembled transcripts matched to the UniProt/SwissProt sequences with high accuracy (E-value of 1e-10).
Differential expression analysis indicated a significant number of genes to be up/down regulated between blue/red and white light treated lettuce samples. Cation/H + exchanger 20 (CHX20) was the most significantly upregulated unigene when lettuce was grown under blue light treatment, whereas translation initiation factor 2, small GTP-binding protein (FUG1) was the most significantly downregulated isoform. CHX20 is a member of putative Na + /H + antiporter family and is involved in osmoregulation and possibly pH modulation. FUG1  24 . In this study, although the metabolite analysis was performed at three-time points zero, one, and seven days, the gene expression studies were performed at zero and one-day time points following exposure to different light qualities 24 . Kitazaki et al. documented the changes in transcriptome occurring at the very early stage, (0 and 24 h) in the third leaf of lettuce plants. However, we have investigated the gene expression at a stage when the lettuce is fully developed and ready for harvest. Some of the genes, CHS, CHI, CHI-like 1, F3H F'3H, DFR, and FLS were upregulated at the 24 h timepoint in blue light exposed plants 24 . However, we have noticed a significant difference in the expression of genes and pathways (Tables 5, 6 and 7, Fig. 5) which were not recorded in the earlier report 24 . The reason for this could be due to differences in the time point chosen for the gene expression study. Also, the early light responsive genes may not maintain the same level of differential regulation even during the late maturation stage of thirty-two days under three different light qualities in a controlled MAPS system. The morphological and developmental changes in lettuce grown under three different light qualities are documented in an earlier report 41 . The blue light exposed lettuce plants were significantly taller and intense red in color, whereas, red light exposure resulted in green coloured leaves with significantly higher number of leaves per plant compared to the other two counterparts 41 . The accumulation of red color due to the accumulation of anthocyanin in response to exposure to blue light has been reported in Arabidopsis 42 , tomato 43 , pear 44,45 , Brassica napus 46 , lettuce 47 , many plant species [48][49][50][51][52] . A broader time course analysis at various developmental stages of plants following different light exposure would help further understanding of gene regulation linked to pigment accumulation.
The functional annotation analysis of the differentially expressed unigenes indicated 11 biological processes to be common in red and blue light treated leaf samples. A few of these include photosynthesis, secondary metabolic process, and cell cycle. A study by Manivannan et al., showed an increase in the production of secondary metabolites by blue light followed by red light treatment of Scrophularia kakudensis, a potential medicinal plant 53 . Furthermore, several biological processes enriched by the differentially expressed unigenes in the current study have been also reported by other studies in response to light stimulation 54 .
To further understand the role of differentially expressed unigenes, we performed functional annotation using Arabidopsis protein identifiers that matched the de-novo assembled unigenes. The results showed enrichment of similar GO annotations that were obtained by the functional analysis of differentially expressed unigenes. The biological processes, such as 'response to blue light' , 'circadian rhythm' , and 'photosynthesis' were significantly enriched by the differentially expressed genes by blue light treatment, which is in agreement with the published studies 53,54 . The stomata are important channels for the exchange of materials, such as gas and water with the www.nature.com/scientificreports/ external environment 55 . A study by Muneer and co-workers showed that the blue LEDs were more efficient in opening and closing stomata. Further, the authors also showed an increase in the number of stomata in plants grown under blue light 56 . The current study found enrichment of genes that are involved in 'regulation of stomatal movement' and they are upregulated in lettuce plants grown under blue light. In the current study, around 9 genes that were differentially expressed by blue light treatment were involved in flower development, which is in agreement with the results from a study by Ye et al., that revealed that blue light exposure regulates flowering 54 . Different light intensities are known to affect the plant defense-related mechanisms 24 . In the current study, blue light treatment affected many biological processes associated with the defense mechanism of plants, such as response to wounding, pathogenic bacteria, salicylic acid and hypersensitive response. There are reports of the modulation of circadian rhythm by light and dark conditions 57 , especially blue light 58 . In the current study, differentially regulated genes by blue light were enriched for 'circadian rhythm' related processes when compared to white and red light treatment.
The current study used RNA sequencing to explore the effect of blue, red, and white light on leaf lettuce. The assembled transcriptome was of good quality, as evident from the alignment of the RNA-Seq reads. Further, more than 50% of the assembled unigenes matched with the sequences available in well-known public databases. The differentially expressed unigenes between blue/red light and white light treatment were found to be enriched in various important physiological processes, such as photosynthesis, plant defense response, and circadian rhythm. The obtained genes can be further used to unravel their underlying roles in a specific wavelength of light during the developmental stages of lettuce.