Insights into the species-specific metabolic engineering of glucosinolates in radish (Raphanus sativus L.) based on comparative genomic analysis

Glucosinolates (GSLs) and their hydrolysis products present in Brassicales play important roles in plants against herbivores and pathogens as well as in the protection of human health. To elucidate the molecular mechanisms underlying the formation of species-specific GSLs and their hydrolysed products in Raphanus sativus L., we performed a comparative genomics analysis between R. sativus and Arabidopsis thaliana. In total, 144 GSL metabolism genes were identified, and most of these GSL genes have expanded through whole-genome and tandem duplication in R. sativus. Crucially, the differential expression of FMOGS-OX2 in the root and silique correlates with the differential distribution of major aliphatic GSL components in these organs. Moreover, MYB118 expression specifically in the silique suggests that aliphatic GSL accumulation occurs predominantly in seeds. Furthermore, the absence of the expression of a putative non-functional epithiospecifier (ESP) gene in any tissue and the nitrile-specifier (NSP) gene in roots facilitates the accumulation of distinctive beneficial isothiocyanates in R. sativus. Elucidating the evolution of the GSL metabolic pathway in R. sativus is important for fully understanding GSL metabolic engineering and the precise genetic improvement of GSL components and their catabolites in R. sativus and other Brassicaceae crops.

activity against bacteria and fungi in plants 12,13 and to effectively decrease the carcinogenic risk of colon and lung cancer 14,15 . In contrast, epithionitriles and nitriles have been shown to have little potential for conferring health-benefits 16,17 .
Radish (Raphanus sativus L., 2n = 2x = 18), a member of the Brassiceae tribe in the plant family Brassicaceae 14 , is a historically cultivated crop worldwide. R. sativus is a relative of B. rapa and B. oleracea. The main GSLs in the seeds of B. rapa and B. oleracea are progoitrin and gluconapin 18,19 , both of which are aliphatic; however, in the roots of B. rapa and B. oleracea, the predominant GSL is gluconasturtiin, which is aromatic 19 . Glucoraphenin, which accounts for 70-95% of the total GSLs in seeds of R. sativus, and glucoraphasatin, which can account for 50-90% of the total GSLs in roots, are aliphatic GSLs specific to R. sativus [20][21][22] . Aliphatic GSLs are predominant in the seeds of R. sativus, which has also been demonstrated in B. rapa and B. oleracea 18,19 . In B. rapa and B. oleracea, the major GSL hydrolysis products are nitriles and epithionitriles rather than ITCs 23,24 . In contrast, ITCs are preferentially produced in R. sativus, whereas a small amount of nitriles and epithionitriles are formed 25 .
In the model plant A. thaliana, many genes have been investigated in the context of controlling GSL biosynthesis 2,26 . Furthermore, in comparative studies with A. thaliana, the genes that control GSL biosynthesis in Brassicaceae vegetables have been identified and characterized. One hundred two and 105 GSL biosynthesis genes were identified in B. rapa 27 and B. oleracea 28 , respectively. Moreover, 87 and 104 GSL biosynthesis genes were reported in two R. sativus genomes 29,30 , respectively. The greatest variation between the GSL profiles of B. rapa, B. oleracea and R. sativus is mainly attributed to the duplication or loss of two genes, GRS1 and AOP 28,29,31 . GRS1 catalyses the dehydrogenation reaction to generate the unsaturated 4-carbon GSL from glucoerucin or glucoraphanin to obtain glucoraphasatin or glucoraphenin, respectively; these reactions are specific to R. sativus 31 . AOP2 and AOP3 catalyse the formation of alkenyl and hydroxyalkyl GSLs, respectively 28 . Only one AOP2 is functional in B. oleracea, and no AOP2 homologue has been identified in R. sativus; however, three copies of AOP2 have been identified in B. rapa, and no AOP3 homologue has been identified in these three vegetables. Therefore, low amounts of sinigrin and gluconapin were found in R. sativus [27][28][29] . Epithiospecifier protein (ESP) and nitrile-specifier protein (NSP) can catalyse GSL to produce nitriles and epithionitriles, respectively 32,33 , but the enzyme cofactor ESP is inactivated in R. sativus 34 . Although several studies on R. sativus genes have involved GSL biosynthetic and degradation pathways [29][30][31] , the mechanism responsible for the existence and distribution of both species-specific GSLs and the hydrolysis products of these GSLs remains unclear, which leads to the questions of why R. sativus seeds show substantial accumulation of aliphatic GSLs rather than aromatic and indole GSLs and why low amounts of nitriles exist in R. sativus. Furthermore, the tissue-specific distribution of the two dominating GSLs (glucoraphenin and glucoraphasatin) remains to be elucidated in R. sativus.
In this study, we systematically identified GSL metabolism genes in R. sativus. Furthermore, we present several interesting observations that may explain the formation and distribution of species-specific GSLs and their hydrolysed products in R. sativus. The results of this study contribute to a more thorough understanding of how to precisely genetically modify and improvement of the the GSL metabolism pathway in R. sativus and its relatives.

Results
Identification and analysis of GSL genes in R. sativus. By comparing the draft R. sativus genome 35 with the A. thaliana genome, we discovered 144 GSL genes, which were slightly fewer than the 161 GSL genes present in B. rapa, greater than the 113 GSL genes of Arabidopsis lyrata and the 117 GSL genes of B. oleracea, and two-fold higher than the 68 found in A. thaliana ( Fig. 1 and Supplementary Table S1). Notably, R. sativus homologues corresponding to 17A. thaliana GSL genes were not identified in this study. These A. thaliana genes include MAM3, IPMI-SSU3, IPMDH3, CYP79F2, three FMOGS-OX genes and two transcription factors (Supplementary Table S1). In addition, TGG2 was not found in the 'XYB36-2' genome, but two TGG2 genes were reported by Jeong et al. 30 . Regarding gene numbers, there was less variation, with 40% more side-chain elongation genes observed in R. sativus compared to A. thaliana, whereas in the co-substrate pathway, the number of genes found in R. sativus was 2.33 times the number present in A. thaliana (Fig. 1). As aliphatic GSLs are the major GSLs present in R. sativus, the loss, retention and expansion of the GSL genes may have led to the accumulation of species-specific GSLs in R. sativus (Fig. 2). Expansion of the GSL metabolic genes in R. sativus. Gene copy numbers can be expanded in four major ways: genome duplication, segmental duplication, tandem duplication and via transposable elements 36 . Most of the retained genes (36 out of 51) were present as multi-copies in R. sativus (Supplementary Table S1), which may have resulted from tandem duplication or whole-genome triplication (WGT) occurring after the divergence of R. sativus and A. thaliana. Of the 36 expanded genes, 25 AtGSL genes have syntenic relationships to 51 RsGSL genes (Supplementary Table S2). Furthermore, 16 of the 36 over-retained AtGSL genes were tandemly duplicated and distributed in 17 tandem arrays in R. sativus (Supplementary Table S3). These data showed that tandem duplication and WGT substantially contributed to the expansion of GSL metabolic genes in R. sativus.
We analysed the number and the ratio of single-copy to multi-copy homologous genes to reveal the retention status of the three types of GSL core structure formation genes in R. sativus after the WGT. The gene copy number ratios between R. sativus and A. thaliana were 2 (6:3), 0.8 (5:4) and 1 (4:4) for aliphatic, indole and aromatic GSL core structure formation genes, respectively (Table 1). Furthermore, the downstream genes (GGP1, SUR1, UGT74C1 and ST5b), which retained more than one copy, were more frequently duplicated than upstream genes in the pathway for aliphatic GSL core structure biosynthesis in R. sativus (Fig. 2).
Notably, the ST genes were found to be highly expanded in R. sativus, and most of the ST copies (8 out of 14) in R. sativus were tandemly duplicated. R. sativus contained 11 copies of the ST5b gene, which encodes desulfoglucosinolate sulfotransferase, which is involved in the final step of GSL core structure biosynthesis 37 . The phylogenetic tree of the ST genes in R. sativus and its closely related species revealed that ST5a and ST5b, especially the latter, have undergone the greatest amount of expansion in R. sativus, B. rapa and B. oleracea. The copies of ST5b were clustered into three groups, two of which included only ST5b copies from R. sativus, B. rapa and B. oleracea, implying that they were inherited from a common ancestor and diverged via tandem duplication (Fig. 3).
Myrosinase activity Only FMOGS-OX2 was identified as a single copy gene in R. sativus. Transcriptome analysis indicated that FMOGS-OX2 is highly expressed in the silique but minimally expressed in the other tissues of R. sativus. FMOGS-OX2 was also retained as one copy in B. rapa and B. oleracea and is expressed in all tissues except the roots. In addition, one FMOGS-OX4 gene was identified in B. rapa, but this gene was minimally expressed in all tissues (Fig. 4a). MYB115 and MYB118 play key roles in the aliphatic GSL biosynthetic pathway, acting as negative regulators in the benzoyloxy GSL pathway 39 . We have found that MYB115 is lost but that MYB118 is present in 1, 2 and 1 copies in R. sativus, B. rapa and B. oleracea, respectively. Interestingly, MYB118 was expressed only in the silique in B. rapa and R. sativus and was expressed highly in the silique but at low levels in other tissues in B. oleracea (Fig. 4b).
Putative non-functional ESP and tissue-specific expression of NSP genes in R. sativus. When tissues of Brassicales plants are damaged, myrosinase is released, and glucosinolates are degraded to isothiocyanate, thiocyanate, or nitrile derivatives. Although two R. sativus myrosinase genes have been cloned by Hara et al. 40 and a total of 11R. sativus myrosinase genes were identified by Mitsui et al. 29 , it remains unclear why non-nitrile products exist in R. sativus. We found one ESP gene in R. sativus; we also identified two copies of ESP in B. rapa and five copies in B. oleracea. The RNA-seq data indicated that the ESP genes were not expressed in R. sativus but were expressed in both B. rapa and B. oleracea (Fig. 5). NSP, whose product promotes the breakdown of glucosinolates into nitrile derivatives, was found to be retained as one copy in R. sativus and as two copies in B. rapa. Interestingly, in R. sativus, the NSP5 gene was expressed at a minimal level in the flowers, seedpods and callus and was not expressed in the roots, but one of the copies in B. rapa was highly expressed (Fig. 5).

Discussion
We identified the counterparts of most of the A. thaliana GSL metabolic genes, which are present in various copy numbers in R. sativus. Compared with the GSL genes identified in two previously reported genomes, the copy numbers were the same for most of the identified genes 29,30 , with the exception of TGG2. The two R. sativus TGG2 genes reported by Jeong et al. 30 are not considered to be homologues, since these two genes were identified by using homologue searches based on the sequence of AT4G11150, which does not encode the TGG2 gene in A. thaliana. The corresponding homologue of TGG2 (Rs303830) identified by Jeong et al. 30 were identified in the 'XYB36-2' and ' Aokubi' genomes, but they represented the homologues of TGG1 in these two genomes 29 .
The qualitative and quantitative changes in the GSL profiles of R. sativus may be due to the loss, expansion, non/neo-functionality and tissue/stage-specific expression of GSL genes. According to our comprehensive and comparative analysis based on genome and transcriptome data from this study and previous research, these changes corresponded to the evolution of the R. sativus genome [29][30][31]35 . In terms of side-chain elongation, the absence of MAM3 led to a lack of long-chain aliphatic GSLs in R. sativus, and this result is consistent with Mitsui et al. 29 . We also found that the loss of CYP79F2, of which the knockout decreases the abundance of long-chain aliphatic GSLs in A. thaliana, may also result in short-side-chain aliphatic GSLs in R. sativus 41,42 . Considering that functional MAM3 genes were found in B. rapa and B. oleracea, the loss of CYP79F2 might be the main reason for the predominance of short-side-chain aliphatic GSLs observed in these two species. In addition, no homologues of IPMI-SSU3 or IPMDH3 from A. thaliana were found in R. sativus. The IPMIs, which are composed of a large subunit and a small subunit, catalyse the isomerization reaction. IPMI-SSU3 encodes the small subunit. However, knockout mutants of IPMI-SSU3 result in small changes in GSL levels in A. thaliana 43 . IPMDH3 is a predicted enzyme that has a similar function to IPMDH1 44 , which promotes glucosinolate accumulation 45,46 . Therefore, the loss of IPMI-SSU3 and IPMDH3 is likely to be offset by IPMI-LSU1, IPMI-LSU2 and IPMDH1.
Moreover, duplicated genes may enhance the potential for the quantitative variation of a particular trait 47 . Interestingly, most of the GSL genes were present in multiple copies. Syntenic analysis demonstrated that GSL genes have increased their copy numbers through WGT. In addition to WGT, some genes also expanded in number through tandem duplication, such as ST5b, which is present in 11 copies in R. sativus. Notably, expansion was observed for more aliphatic GSL core structure formation genes than for indole and aromatic GSL genes. Further, the biosynthesis genes for the core structures of aliphatic GSLs that were present in downstream locations were more over-retained than those in upstream locations. The redundancy of the downstream genes may guarantee products for successful synthesis of aliphatic GSLs. Furthermore, with respect to the secondary modification step, three FMOGS-OX genes (FMOGS-OX1, FMOGS-OX3 and FMOGS-OX4) and two AOP genes (AOP2 and AOP3) were lost. FMOGS-OX1-4 S-oxygenates both short-and long-chained methylthioalkyl GSLs to produce the corresponding methylsulfinylalkyl GSLs 38 . In R. sativus, we identified one copy of FMOGS-OX2, which catalyses the S-oxygenation of glucoraphasatin and converts glucoraphasatin into glucoraphenin. The results of the GSL content analyses performed in previous   catabolite of glucoraphenin 51 . Additionally, we found no AOP2 or AOP3 genes in R. sativus, as reported by Mitsui et al. 29 . The loss of these two genes prevents the conversion of S-oxygenated and hydroxyalkyl GSLs into downstream GSLs, which is the reason for the accumulation of glucoraphasatin and glucoraphenin in R. sativus.
In addition, two transcription factors (MYB76, MYB115) were lost in R. sativus. Although aliphatic GSL levels will increase with increased MYB76 expression levels, a knockout mutant of MYB76 was reported to exhibit no significant change in GSLs in A. thaliana 52 . MYB115 and MYB118 are functionally redundant and interact to control the expression of GLS genes 39 . Therefore, the loss of MYB76 and MYB115 does not completely abolish GSL biosynthesis. While MYB115 and MYB118 negatively regulate aliphatic GSLs in A. thaliana 39 , the double myb115-myb118 mutant exhibits increased levels of most of the short-chain aliphatic GSLs with the exception of 4-methylthiobutyl GSL (glucoerucin, GER) 39 . Moreover, GER content increased in the single myb115 mutant but significantly decreased in the myb118 mutant 39 . That is, MYB118 promotes the accumulation of GER but decreases the levels of other aliphatic GSLs in A. thaliana. GER is a precursor of the predominant aliphatic GSLs in R. sativus, B. rapa and B. oleracea 53 . Several studies have shown that aliphatic GSLs, especially GER and its downstream products, are highly predominant in seeds in the three Brassica vegetables. In addition, in other organs, the percentages of aliphatic GSLs are decreased 18,19,53 . Considering that MYB118 was expressed only in the silique, we inferred that its silique-specific expression results in the high accumulation of aliphatic GSLs in seeds of these three vegetables. Nonetheless, further work is required to elucidate the molecular basis of the regulation of GSL biosynthesis through MYB118. The ESP and NSP proteins divert the myrosinase-catalysed hydrolysis of an aliphatic GSL from the formation of ITC to the production of epithionitrile and nitrile 54 . O'Hare et al. found that there were no ESP proteins in R. sativus 34 . We identified one ESP gene showing no detectable expression in any organ of R. sativus, indicating that its function might have been lost in R. sativus. We also observed one NSP5 gene with no expression profile in the roots and minimal expression in other organs, and we found no NSP1-4 genes within the R. sativus genome. The non-expression of the ESP gene and the lack of expression of the NSP gene in the roots and the leaves were inferred to be the major reasons for the accumulation of distinctive beneficial isothiocyanates, such as sulforaphene, rather than epithionitriles or nitriles.

Materials and Methods
Data resources. The A. thaliana genome and annotation data were downloaded from TAIR10 (http://www. arabidopsis.org/) 55 . B. rapa and B. oleracea assembly and annotation data were downloaded from the BRAD database (http://brassicadb.org/brad/) 56 . The genome data for A. lyrata were obtained from the JGI database. The three whole-genome sequences of R. sativus sequenced by Zhang et al. 35 , Mitsui et al. 29 and Jeong et al. 30 were downloaded from the BRAD database 56 , NODAI Radish genome database (http://www.nodai-genome-d.org/) 29 and Radish Genome database (http://www.radish-genome.org/) 30 , respectively. B. rapa and B. oleracea RNA-seq data were obtained from the Gene Expression Omnibus (GEO) database with accession numbers GSE43245 and GSE42891, respectively. R. sativus RNA-seq data 35 are available at EMBL/NCBI/SRA (PRJNA413464).

Analysis of GSL genes in R. sativus.
To detect the generation mechanism of the expanded genes, we identified syntenic orthologues using SynOrths based on both sequence similarity and the collinearity of flanking genes 61 . We identified tandem duplicate genes using the same standard with GSL gene identification, and only one unrelated gene was allowed to exist between the two genes in a tandem array 62 . CLUSTALW 63 was employed for sequence alignment. The phylogenetic tree of the ST gene family members of R. sativus and other species was constructed using the neighbour-joining method with MEGA (version 7.0.21) software 64 .