Genome-wide identification and expression analyses of genes involved in raffinose accumulation in sesame

Sesame (Sesamum indicum L.) is an important oilseed crop. However, multiple abiotic stresses severely affect sesame growth and production. Raffinose family oligosaccharides (RFOs), such as raffinose and stachyose, play an important role in desiccation tolerance of plants and developing seeds. In the present study, three types of key enzymes responsible for the biosynthesis of RFOs, galactinol synthase (GolS), raffinose synthase (RafS) and stachyose synthase (StaS), were identified at the genome-wide scale in sesame. A total of 7 SiGolS and 15 SiRS genes were identified in the sesame genome. Transcriptome analyses showed that SiGolS and SiRS genes exhibited distinct expression profiles in different tissues and seed developmental stages. Comparative expression analyses under various abiotic stresses indicated that most SiGolS and SiRS genes were significantly regulated by drought, osmotic, salt, and waterlogging stresses, but were only slightly affected by cold stress. The up-regulation of several SiGolS and SiRS genes by multiple abiotic stresses suggests their active involvement in sesame abiotic stress responses. Taken together, these results shed light on RFO-mediated abiotic stress resistance in sesame and provide a useful framework for improving abiotic stress resistance of sesame through genetic engineering.


Results
Identification of raffinose biosynthesis pathway genes in sesame. Seven GolS genes were identified from the sesame genome database (Sinbase, http://ocri-genomics.org/Sinbase/index.html) by a BLAST search using the protein sequences of GolSs from Arabidopsis. As shown in Table 1, we named the obtained GolS sequences SiGolS1 to SiGolS7 according to their positions from the top to the bottom of the sesame linkage groups (LGs). Fifteen putative raffinose synthase genes were also identified in the sesame genome and designated SiRS1 to SiRS15. All the identified SiGolSs and SiRSs were checked manually for the presence of the Glyco_trans_8 (PF01501) and Raffinose synthase (PF05691) Pfam domains, respectively. Detailed information on the SiGolS and SiRS genes, including locus ID, linkage group distribution, length of the coding sequence, molecular weight (MW), and theoretical isoelectric point (pI), is listed in Table 1.
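The molecular weights reported in Table 1 can be approximated directly from a protein sequence. The following is a minimal sketch, assuming the standard average residue masses of the 20 common amino acids plus one water molecule per chain; the function name `molecular_weight` is illustrative, and the tool actually used to compute MW and pI is not stated in the text.

```python
# Average residue masses (Da) of the 20 standard amino acids
# (amino acid mass minus one water, as incorporated into a peptide chain).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # Da, added once for the free N- and C-termini


def molecular_weight(protein: str) -> float:
    """Return the approximate average molecular weight (Da) of a protein."""
    # Normalise case and drop a trailing stop-codon marker if present.
    seq = protein.strip().upper().rstrip("*")
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER_MASS
```

Dividing the result by 1000 gives the value in kDa. Ambiguous residues such as X or B are rejected rather than guessed.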
SiGolS and SiRS genes were mapped to the 16 sesame linkage groups based on the coordinates of Sinbase loci. As shown in Fig. 1, the SiGolS and SiRS genes were unevenly distributed among 10 of the 16 LGs of the sesame genome, except for SiRS15, which was located on an unanchored scaffold. Sequencing analysis of the sesame genome revealed that regions derived from the recent whole-genome duplication cover approximately 50% of the current sesame genome assembly 26 . We therefore analyzed the segmental duplication events of SiGolS and SiRS genes. A total of 5 SiGolSs (SiGolS1, 3, 4, 5, and 7) and 6 SiRSs (SiRS2, 4, 5, 6, 8, and 11) were detected as segmentally duplicated genes. As shown in Supplementary Fig. S1, these segmentally duplicated SiGolS and SiRS genes were located on duplicated segments on 8 LGs. To investigate the evolutionary relationship of GolSs from sesame and other plant species, a Neighbor-joining tree was constructed from the protein sequences of 34 GolSs from sesame, Arabidopsis (AtGolS1-7), tomato (SlGolS1-4), maize (ZmGolS1-3), poplar (PtrGolS1-9), rice (Oswsi76 and OsGolS1), and Brachypodium distachyon (BdGolS1 and BdGolS2). GolS proteins could be classified into 5 groups (GolS-I to GolS-V) according to the phylogenetic tree (Fig. 2A). SiGolS proteins were distributed in all groups except GolS-III and GolS-V. Generally, SiGolSs have a closer relationship with SlGolSs than with AtGolSs, in accordance with the current understanding of their evolutionary history 26 . Notably, the clade GolS-V contained no GolS homologs from sesame, Arabidopsis, tomato or poplar (Fig. 2A), suggesting that group GolS-V is specific to monocot species.

Structural and phylogenetic analyses of
To obtain further insight into the structural features of SiGolS genes, the exon/intron organization was analyzed by GSDS v2.0. As shown in Figure S3B SiGolSs, 10 conserved motifs were captured by MEME v4.11.0 (Supplementary Fig. S3C), and the details of the sequence logo of each motif are presented in Supplementary Fig. S4. Generally, SiGolSs in the same subfamilies showed similar motifs, indicating that the classification of SiGolS families was supported by motif analyses. S5). The other thirteen SiRSs belong to the putative RafS enzyme family. Phylogenetic analysis based on the full-length amino acid sequences of RSs from sesame and six other plants clearly separated the RSs into 6 groups (RS-I to RS-VI) (Fig. 2B). SiRS proteins were distributed in all groups. Similar to SiGolSs, SiRSs have a closer relationship with RSs from tomato and poplar. Two putative StaSs in sesame (SiRS7 and SiRS15) clustered with all the reference StaSs from Arabidopsis (AtRS4), Vigna angularis (VaStaS), Pisum sativum (PsStaS), Cucumis melo (CmStaS) and Alonsoa meridionalis (AmStaS) in group RS-VI 15,29,30 , indicating that group RS-VI may be specific to the StaS enzyme family.
Exon-intron organization of the SiRS family was also investigated to reveal the structural diversity of these genes (Supplementary Fig. S6B). The number of introns in SiRSs varied from 3 to 14. In general, SiRSs clustered in the same group showed similar gene structures (Supplementary Fig. S6B). All SiRS genes in group RS-II have 4 or 5 exons, while SiRSs in group RS-IV have 13 or 14 exons. The MEME program was then used to predict putative conserved motifs in SiRSs. A total of 20 putative motifs were detected (Supplementary Figs S6C and S7). As expected, SiRSs in the same group have similar motif organization, indicating a link between evolutionary relationship and conserved motifs.

Expression profiles of SiGolS and SiRS genes in different tissues.
To investigate the expression patterns of SiGolS and SiRS genes, their transcript levels in four tissue samples (capsule, leaf, root and stem) and in seed samples at different developmental stages were retrieved from the Sesame Functional Genomics Database (SesameFG, http://www.ncgr.ac.cn/SesameFG). Heatmaps were generated by hierarchical clustering of the RPKM values for each gene (Fig. 3). The SiGolS and SiRS genes displayed very diverse expression across the samples, except for SiRS11, which was not expressed in any tissue.
Among the 7 SiGolS genes, SiGolS3 and SiGolS4 displayed high expression, whereas SiGolS2 and SiGolS5 showed relatively low expression in almost all tissues (Fig. 3A). SiGolS1 displayed high expression levels in capsule, leaf and root, but relatively low expression levels in stem and during seed development. SiGolS6 and SiGolS7 exhibited high expression levels during seed development and relatively low expression levels in leaf and stem (Fig. 3A). The SiRS genes generally exhibited high expression levels in all tissues and developing seeds, except that SiRS1, 8 and 10 displayed relatively low expression levels during seed development (Fig. 3B). In particular, SiRS4 and SiRS5 were constitutively expressed at a relatively high level across all tissues. SiRS8 and SiRS10 exhibited specifically high expression in leaf and stem, whereas SiRS14 and SiRS15 exhibited specifically low expression in stem and leaf, respectively. It is worth noting that over half of the SiRS genes displayed lower expression at the late stage of seed development than at the early stage (Fig. 3B).
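The expression values behind these heatmaps are RPKM values (reads per kilobase of transcript per million mapped reads). As a brief illustration of how such a value is derived from raw counts, the sketch below applies the standard RPKM definition; the log2 transform with a pseudocount of 1 is a common preprocessing choice before clustering and is an assumption here, not a step stated in the text. All input numbers in the usage note are hypothetical.

```python
import math


def rpkm(read_count: float, gene_length_bp: float, total_mapped_reads: float) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    # 1e9 = 1e3 (bp -> kb) * 1e6 (reads -> million reads)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def log2_rpkm(value: float, pseudocount: float = 1.0) -> float:
    """Log-transform an RPKM value; the pseudocount keeps zero expression finite."""
    return math.log2(value + pseudocount)
```

For example, a hypothetical 2,000 bp gene with 500 mapped reads in a library of 10 million mapped reads has an RPKM of 25, and a gene with no mapped reads has a log2-transformed value of 0.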

Expression profiles of SiGolS and SiRS genes in response to abiotic stresses.
GolS and RS genes have been reported to respond to various abiotic stresses 14,17 . Thus, the expression patterns of these genes in response to drought and waterlogging stresses in the roots of genotypes with contrasting tolerance levels were first examined using two separate transcriptome datasets 31,32 . In these datasets, expression data were available for 6 SiGolSs (all except SiGolS5) under drought stress and for 5 SiGolSs (all except SiGolS2 and SiGolS5) under waterlogging stress. Expression data were available for 13 SiRSs (all except SiRS8 and SiRS11) under both drought and waterlogging stresses. Although some similar expression patterns were observed, SiGolSs and SiRSs showed complex expression patterns in response to drought and waterlogging stresses in the two contrasting genotypes, as evidenced by the cluster analyses in the heatmaps (Figs 4 and 5).
Expression of most of the SiGolS and SiRS genes was repressed under waterlogging stress in both the waterlogging-tolerant (WT) variety Zhongzhi No. 13 and the waterlogging-susceptible (WS) variety ZZM0563 (Fig. 5). All the SiGolSs showed decreased transcript levels at all time points under waterlogging stress in the WT and WS varieties, except that SiGolS6 was transiently up-regulated at 3 h after waterlogging stress in the WT variety. Six SiRSs (SiRS1, 6, 7, 12, 13, and 14) were down-regulated under waterlogging stress in both WT and WS varieties. SiRS3 and 9 were up-regulated at 9 h and 15 h after waterlogging stress in WT and WS varieties. SiRS2, 4 and 5 were down-regulated only at 3 h after waterlogging stress in WT and WS varieties. SiRS10 exhibited relatively higher transcript accumulation at 3 h and 15 h after waterlogging stress in the WT variety, but showed no significant difference in the WS variety (Fig. 5).
To extend our understanding of SiGolS and SiRS genes in response to other important abiotic stresses that impair sesame production, 5 SiGolSs (SiGolS1, 2, 4, 6, 7) and 12 SiRSs (SiRS1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14) that showed different expression patterns in different organs and in response to abiotic stresses in the transcriptome data were chosen for qPCR analysis of their expression in shoots under osmotic, salinity, and cold treatments. Under osmotic stress, SiGolS2 and SiGolS7 were significantly up-regulated (fold change > 2) at all treatment time points, while the other 3 SiGolSs exhibited no significant change (Fig. 6A and Supplementary Fig. S8). Under salt treatment, 3 SiGolSs (SiGolS2, 4, and 7) were significantly induced (fold change > 2) throughout the treatment period, whereas SiGolS1 was up-regulated only at 2 h of treatment. Under cold treatment, SiGolS2 and SiGolS7 were significantly repressed at all treatment time points. As shown in Fig. 6B and Supplementary Fig. S9, 4 SiRSs (SiRS4, 5, 6, and 12) were significantly (fold change > 2) up-regulated, while SiRS8 was significantly (fold change > 2) down-regulated, at all osmotic stress time points. SiRS5 and SiRS6 were significantly induced (fold change > 2) throughout the salt treatment period, whereas SiRS2, 7, 10, 12 and 14 were up-regulated or down-regulated only at particular time points. Under cold treatment, most of the SiRSs were not significantly affected, while SiRS7 and SiRS14 were significantly (fold change > 2) up-regulated. Together, these results indicate that most of the SiGolS and SiRS genes respond to osmotic and/or salt stresses, but are only slightly affected by cold stress (Fig. 6, Supplementary Figs S8 and S9).
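The distinction drawn above, between genes regulated throughout a treatment and genes regulated only at particular time points, can be expressed as a simple rule over the fold changes relative to 0 h. The sketch below is illustrative only: the text states a fold change > 2 criterion for up-regulation, the symmetric cutoff for down-regulation (fold change < 1/2) is an assumption, and the function name and category labels are hypothetical.

```python
def classify_response(fold_changes, threshold=2.0):
    """Classify a gene from its fold changes (treated vs. 0 h) at each time point.

    fold_changes: expression ratios, one per stress time point (e.g. 2 h, 6 h, 12 h).
    threshold:    a gene is 'up' if ratio > threshold and, by assumption,
                  'down' if ratio < 1 / threshold.
    """
    if not fold_changes:
        raise ValueError("at least one time point is required")
    up = [fc > threshold for fc in fold_changes]
    down = [fc < 1.0 / threshold for fc in fold_changes]
    if all(up):
        return "up-regulated throughout"
    if all(down):
        return "down-regulated throughout"
    if any(up) or any(down):
        return "responsive at particular time points"
    return "no significant change"
```

With hypothetical ratios of 3.1, 4.0 and 2.5 a gene is classed as up-regulated throughout, whereas ratios of 1.1, 2.6 and 0.9 place it among the genes responsive only at particular time points. Statistical significance testing is not covered by this rule.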
Galactinol and raffinose content in sesame exposed to osmotic stress. Finally, the changes in the content of galactinol and raffinose in sesame under osmotic stress treatment were investigated. Two-week-old seedlings were treated with 15% PEG 6000, and shoot samples were harvested at 0, 2, 5 and 9 days after treatment. As shown in Fig. 7, the content of both galactinol and raffinose clearly increased under osmotic stress, and the most obvious accumulation of galactinol and raffinose was observed at 6 days after treatment. During the osmotic stress treatment, the amount of galactinol increased continuously, whereas the raffinose content peaked at 6 days and decreased subsequently.

Discussion
Sesame is widely grown in arid and semi-arid areas that face frequent drought and increasing soil salinization due to intensive irrigation and fertilizer use 33 . Although sesame is a resilient crop that is fairly resistant to several abiotic stresses, including drought, salt and heat, it is highly sensitive to environmental stresses during its vegetative stage, which directly affects its yield potential 27,28,34,35 . More importantly, the molecular mechanisms underlying sesame responses to abiotic stresses are poorly understood 36 . Raffinose family oligosaccharides (RFOs), which accumulate during seed development and in plants exposed to abiotic stresses, perform a critical function in desiccation tolerance of developing seeds and plants 6,7 . Although some RFO biosynthesis-related genes (such as GolSs and RSs) have been studied in many plants 7,15,17,37 , little is known about the GolS and RS gene families in sesame. Herein, a total of 7 SiGolSs and 15 SiRSs were identified genome-wide in sesame and classified into 5 and 6 subgroups, respectively, according to their phylogenetic relationships (Fig. 2). This classification is consistent with previous studies of the GolS family in poplar, tomato and Brachypodium distachyon 21,38 . Furthermore, 13 RafSs and 2 StaSs (SiRS7 and 15) were distinguished within the SiRS gene family based on the presence of the characteristic StaS insertion 15 . The phylogenetic classification of GolS and RS was also supported by conserved motif and gene structure analyses. Protein and nucleotide sequence analyses showed that members of the same subgroup of the GolS and RS gene families harbored similar motifs and exon-intron organizations (Supplementary Figs S3 and S6). These typical characteristics of the two gene families were also observed in other plants, such as maize, poplar, tomato and Brachypodium distachyon 20,21,38 .
Collectively, similar conserved motifs and exon-intron organizations shared in the same subgroup indicate that SiGolSs and SiRSs in the same group had a closer relationship during the evolution process.
Based on transcriptome data, comprehensive expression profiles of SiGolS and SiRS genes at different developmental stages and in different tissues were revealed (Fig. 3). We found that some SiGolS and SiRS genes exhibited tissue- and developmental stage-specific expression patterns, indicating their possible roles in specific growth or developmental stages. For example, SiRS8 and 10 exhibited specifically higher expression in leaf and stem (Fig. 3B). RFO accumulation during seed development is thought to be important for desiccation tolerance during seed maturation and for longevity in the dehydrated state 18,39 . Seeds of the AtRS4 and AtRS5 double mutant showed a total loss of RFOs and a five-day delay in germination in darkness, suggesting that RFOs also act as a galactose store in seeds and are necessary for rapid germination in the dark 40 . Among sesame GolS and RS genes, 3 SiGolSs (SiGolS3, 6 and 7) and 2 SiRSs (SiRS4 and 5) displayed relatively higher transcript levels during seed development, suggesting that these RFO synthetic genes may be involved in the sesame seed development process.
RFOs also accumulate under multiple abiotic stress conditions and function as osmolytes to stabilise cell components and/or act as reactive oxygen species (ROS) scavengers 3,9 . We also found that galactinol and raffinose accumulated significantly under osmotic stress in sesame. Increasing evidence indicates that RFO synthesis-related genes, especially GolS, are important in the physiology of plant stress resistance. Expression analyses of GolS and RS gene family members in Arabidopsis, rice, maize, poplar, and tomato suggested that many GolS and RS genes showed transcriptional changes under drought, high-salinity, and cold stresses 17,19-21 . Moreover, analyses of transgenic plants revealed particular members of the GolS and RS gene families to be key players in plant abiotic stress resistance. AtGolS2 was up-regulated by drought and salt stresses, and overexpression of AtGolS2 not only enhanced tolerance to drought, salt, chilling and oxidative stresses in transgenic Arabidopsis 7,17 , but also improved drought stress tolerance in the monocot model Brachypodium distachyon and in rice 22,41 . In particular, overexpression of AtGolS2 reduced yield losses under field drought conditions in different environments and in different rice genetic backgrounds, which suggests that AtGolS2 is a useful biotechnological tool to improve drought tolerance in rice 22 . Based on in silico analysis and our qPCR analysis, 6, 2, and 4 SiGolSs were regulated by drought stress (4 up-regulated genes and 2 down-regulated genes), osmotic stress (2 up-regulated genes), and salinity stress (4 up-regulated genes), respectively (Figs 4 and 6). Among the 7 sesame GolS genes, SiGolS2 and SiGolS7 showed a closer phylogenetic relationship with AtGolS2 (Fig. 2A), and exhibited amino acid identities of 81% and 70%, respectively, to the protein encoded by AtGolS2. Moreover, SiGolS2 and SiGolS7 were significantly up-regulated under both osmotic and salinity stresses (Fig. 6A and Supplementary Fig. S8), suggesting that these SiGolSs might be positively involved in the drought and salt tolerance of sesame. Our study also found that 10, 8 and 11 SiRSs were regulated by drought stress (5 up-regulated genes and 5 down-regulated genes), osmotic stress (5 up-regulated genes and 3 down-regulated genes) and salt stress (4 up-regulated genes and 7 down-regulated genes), respectively (Figs 4 and 6). Among these genes, SiRS5 and SiRS6 were commonly up-regulated by drought, osmotic, and salinity stresses. In contrast, SiRS8 and SiRS9 were down-regulated under both osmotic and salinity stresses. Additionally, we found that 3 SiGolSs (SiGolS2, 6 and 7) and 7 SiRSs (SiRS4, 5, 6, 7, 13, 14 and 15) could be induced by drought stress in different genotypes (Fig. 4). All this evidence points to the involvement of these genes in the response to abiotic stresses in sesame, and they could therefore be targeted for further functional analysis. Interestingly, the expression of SiGolS and SiRS genes was only slightly affected by cold stress, except for SiGolS2 and SiGolS7 (Fig. 6; Supplementary Figs S8 and S9), which could be explained by the fact that sesame is native to warm areas.
Sesame is highly susceptible to waterlogging stress, and waterlogging is a significant environmental constraint to sesame production in China and Korea 32 . However, the expression of genes involved in RFO biosynthesis under waterlogging stress is largely unknown. Herein, we provide the first insight into the waterlogging responsiveness of GolS and RS gene family members. Most of the SiGolSs were down-regulated under waterlogging stress in the two genotypes. GolSs act as a switch between inositol metabolism and RFO biosynthesis. Down-regulation of many SiGolS genes under waterlogging stress may divert myo-inositol away from the RFO synthetic pathway and into O-methyl-inositol (OMI) synthesis, with OMI acting as a stress tolerance molecule 9 .
The results presented here should be helpful for uncovering the function of the RFO synthetic pathway in abiotic stress resistance in sesame. In conclusion, 7 SiGolS and 15 SiRS genes from sesame have been characterized based on evolutionary, conserved protein motif, and gene structure analyses. The expression profiles of SiGolS and SiRS genes reveal their involvement in sesame seed development and responses to abiotic stresses. Together, these data provide abundant information for the functional characterization of SiGolS and SiRS genes and advance our understanding of RFO-mediated abiotic stress tolerance in sesame.

Methods
Sequence identification and phylogenetic analysis. Protein sequences of genes involved in raffinose biosynthesis in Arabidopsis, such as AtGolSs and AtRSs, were used as queries to search against the protein database at Sinbase (Sesamum indicum genome database, http://ocri-genomics.org/Sinbase/index.html) 42 .
Gene structure and motif identification of SiGolSs and SiRSs. Exon and intron structures of these genes were investigated by comparing the coding sequences with their corresponding genomic sequences from the Sinbase database, and visualized using GSDS 2.0 (http://gsds.cbi.pku.edu.cn/index.php) 45 . The duplication pattern of each SiGolS and SiRS gene was analyzed using MCScanX software (http://chibba.pgml.uga.edu/mcscan2/) as described previously 46 . Conserved motifs in SiGolSs and SiRSs were identified using MEME v4.11.4 (http://meme-suite.org/tools/meme).

Plant growth and stress treatment.
To measure the transcript levels of the sesame GolS and RS family members under various abiotic stresses, seeds of sesame cultivar Zhongzhi No. 13 were germinated and grown hydroponically in a growth chamber with a 16 h light/8 h dark cycle. For osmotic and salt stress treatments, two-week old seedlings were treated with 15% PEG 6000 and 150 mM NaCl. For cold stress, seedlings were transferred to a growth chamber at 4 °C. Shoot samples from five randomly selected plants were collected (as one biological replicate) at 0 h (pretreatment), and at 2 h, 6 h and 12 h after stress treatments. For each treatment and time point, three replicates were used for RNA extraction.
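For readers preparing comparable treatment solutions, the amounts involved follow from simple arithmetic. The sketch below assumes that the 15% PEG 6000 concentration is weight per volume (the text does not specify w/v or w/w) and uses a molar mass of 58.44 g/mol for NaCl; the function names are illustrative.

```python
NACL_MOLAR_MASS = 58.44  # g/mol


def nacl_grams(concentration_mM: float, volume_L: float) -> float:
    """Mass of NaCl (g) needed for a given millimolar concentration and volume."""
    return concentration_mM / 1000.0 * NACL_MOLAR_MASS * volume_L


def peg_grams(percent_wv: float, volume_L: float) -> float:
    """Mass of PEG (g) for a percent (w/v) solution; 1% w/v = 10 g per litre."""
    return percent_wv * 10.0 * volume_L
```

Under these assumptions, one litre of 150 mM NaCl requires about 8.77 g of NaCl, and one litre of 15% (w/v) PEG 6000 requires 150 g of PEG.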
Expression profile analyses of SiGolSs and SiRSs. Total RNA was isolated using the EASYspin Plus kit (Aidlab, China) according to the manufacturer's instructions. For real-time quantitative RT-PCR (qPCR) analysis, first-strand cDNAs were synthesized from DNaseI-treated total RNA using the HiScript II 1st Strand cDNA Synthesis kit (Vazyme, China) according to the manufacturer's instructions. qPCR was performed on a Roche LightCycler 480 real-time PCR system using the ChamQ SYBR qPCR Master Mix (Vazyme, China) according to the manufacturer's protocol. The sesame Histone H3.3 gene (SIN_1004293) was used as the endogenous control 47 . The relative expression levels were calculated as described previously 48 . The qPCR assays were performed with three replicates. The gene-specific primers are listed in Supplementary Table S1.
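The calculation cited for relative expression is not reproduced in the text. A widely used approach for qPCR data normalized to a single endogenous control is the 2^-ΔΔCt method, and the sketch below assumes that method purely for illustration; whether it matches the cited procedure is an assumption, as are the function name and the Ct values in the usage note. It also assumes roughly 100% amplification efficiency for both primer pairs.

```python
def relative_expression(ct_target_treated: float, ct_ref_treated: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Fold change of a target gene by the 2^-ddCt method.

    ct_target_*: Ct of the gene of interest (e.g. a SiGolS or SiRS gene).
    ct_ref_*:    Ct of the endogenous control (here, Histone H3.3).
    'control' is the 0 h sample; 'treated' is a stress time point.
    """
    delta_ct_treated = ct_target_treated - ct_ref_treated
    delta_ct_control = ct_target_control - ct_ref_control
    delta_delta_ct = delta_ct_treated - delta_ct_control
    # Each Ct unit corresponds to a two-fold difference in template amount.
    return 2.0 ** (-delta_delta_ct)
```

With hypothetical Ct values of 22 (target) and 18 (reference) in a treated sample, and 24 and 18 in the 0 h sample, the function returns a four-fold induction. In practice the Ct values of replicates would be averaged before, or the fold changes averaged after, this step.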
Expression patterns of SiGolS and SiRS genes in capsule, leaf, root, stem, and seeds at different stages were examined in a set of transcriptome data downloaded from Sesame Functional Genomics Database (SesameFG, http://www.ncgr.ac.cn/SesameFG). Expression data of SiGolS and SiRS genes under drought stress were extracted from the transcriptome data of two sesame varieties (drought-tolerant cultivar ZZM0635 and drought-sensitive cultivar ZZM4782) under drought stress at flowering stage 31 . Expression data of SiGolS and SiRS genes under waterlogging stress were extracted from the transcriptome data of two sesame varieties (waterlogging-tolerant cultivar Zhongzhi No. 13 and the waterlogging-susceptible cultivar ZZM0563) under waterlogging stress at flowering stage 32 . The hierarchical cluster analyses of gene expression were performed using Cluster 3.0 software 49 , and heatmaps were visualized with TreeView 50 .
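To make the clustering step concrete, the following is a minimal pure-Python sketch of agglomerative hierarchical clustering of expression profiles. It is not the Cluster 3.0 implementation: the choice of Pearson correlation distance with average linkage is an assumption (the text does not state which similarity metric or linkage was selected), and the gene names and values in the usage note are hypothetical.

```python
import math
from itertools import combinations


def pearson_distance(x, y):
    """Return 1 - Pearson correlation between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 1.0  # a flat profile is treated as uncorrelated
    return 1.0 - cov / (sd_x * sd_y)


def average_linkage(profiles):
    """Cluster genes bottom-up; profiles maps gene name -> list of values.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of gene names.
    """
    def cluster_distance(a, b):
        # Average linkage: mean distance over all cross-cluster gene pairs.
        pairs = [(g, h) for g in a for h in b]
        total = sum(pearson_distance(profiles[g], profiles[h]) for g, h in pairs)
        return total / len(pairs)

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        # Merge the closest pair of clusters at each step.
        a, b = min(combinations(clusters, 2), key=lambda p: cluster_distance(*p))
        merges.append((a, b, cluster_distance(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges
```

Given three hypothetical profiles, geneA = [1, 2, 3, 4], geneB = [2, 4, 6, 8] and geneC = [4, 3, 2, 1], the first merge joins geneA and geneB (perfectly correlated), and geneC joins last. The merge history corresponds to the dendrogram drawn beside a heatmap.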
Quantification of galactinol and raffinose content. Quantification of galactinol and raffinose content in sesame was performed by liquid chromatography-mass spectrometry (LC-MS) at Wuhan Metware Biotechnology Co.,Ltd (Wuhan, China) as described by 51 , with small modifications. Briefly, shoot samples from five plants were harvested after stress treatment and immediately frozen in liquid nitrogen. Then, samples were crushed and extracted overnight at 4 °C with 1.0 ml 70% aqueous methanol. After filtering, the extracts were analyzed by LC-MS. Details of the methods for the quantification of galactinol and raffinose content by LC-MS are provided in Supplementary Methods S1.