Comparative genomic and transcriptomic analyses of Family-1 UDP glycosyltransferase in three Brassica species and Arabidopsis indicates stress-responsive regulation

In plants, UGTs (UDP-glycosyltransferases) glycosylate various phytohormones and metabolites in response to biotic and abiotic stresses. Little is known about stress-responsive glycosyltransferases in plants. Therefore, it is important to understand the genomic and transcriptomic portfolio of plants with regard to biotic and abiotic stresses. Here, we identified 140, 154, and 251 putative UGTs in Brassica rapa, Brassica oleracea, and Brassica napus, respectively, and clustered them into 14 major phylogenetic groups (A–N). Fourteen major KEGG pathways and 24 biological processes were associated with the UGTs, highlighting them as unique modulators against environmental stimuli. Putative UGTs from B. rapa and B. oleracea showed a negative selection pressure and biased gene fractionation pattern during their evolution. Polyploidization increased the intron proportion and number of UGT-containing introns among Brassica. The putative UGTs were preferentially expressed in developing tissues and at the senescence stage. Differential expression of up- and down-regulated UGTs in response to phytohormone treatments, pathogen responsiveness and abiotic stresses, inferred from microarray and RNA-Seq data in Arabidopsis and Brassica broaden the glycosylation impact at the molecular level. This study identifies unique candidate UGTs for the manipulation of biotic and abiotic stress pathways in Brassica and Arabidopsis.

Studying the similarities and differences at the genomic and transcriptomic levels to infer the function and evolution of various biological processes is known as comparative genomics and transcriptomics. The Brassicaceae family in the plant kingdom comprises a set of diverse species of great economic, agronomic, and scientific importance, including the model plant Arabidopsis 1

Identified Putative UGTs in Brassica.
We identified 545 putative UGTs in three Brassica species, with a PSPG (Plant Secondary Product Glycosyltransferase) motif at their C-terminal from the projected proteomes (Supplementary Table S1), having a sequence similarity of 40% to 98% with the Arabidopsis UGTs ( Supplementary  Fig. S1). Out of these 545 putative UGTs, 140 belong to B. rapa, 154 to B. oleracea, and 251 to B. napus. The PHMMR tool used for the identification of UGTs produced consistent results in terms of the UDPGT domain with PFAM00201. The average peptide lengths and molecular weights of the putative UGTs were found to be almost similar across the species, with an average length and weight of 438 aa and 49 kDa in B. napus, 457 aa and 51 kDa in B. oleracea, and 454 aa and 51 kDa in B. rapa (Supplementary Table S1). In B. rapa, Bra034848.1 was the gene with a maximum length of 1183 aa among all three Brassica species, and it had a PSPG motif. All the UGT sequences started with a methionine and were full-length sequences. The percentage of GC content was also found to be similar among all Brassica species putative UGTs, with a range of 44.36% to 45% (Supplementary Table S1). As per the calculated amino acid composition for the putative UGTs in all three Brassica species, the amino acids, leucine, valine, glycine and aspartate were predominant across all the UGT proteins (Table S1). Cysteine and tyrosine were found minimally.
Phylogenetic Analysis and Comparison. The final alignment file contained the aligned positions with two highly conserved sites, 490 variables, and 17 singleton sites. The constructed phylogenetic tree suggested a classification into 14 major groups (A-N), as seen in all higher plants such as maize, cotton, peach, and apple ( Fig. 1A) 12,14,15,17 . Two newly discovered groups (O and P) were absent in all three Brassica species as well as Arabidopsis, but are present in Z. mays 15 . The PSPG motif was found to vary in each phylogenetic group, but was similar across all three Brassica species and Arabidopsis (Supplementary Table S1 and Fig. 1B). The histidine at positions 10 and 19 in the PSPG motif was the most conserved site across all three Brassica species (Fig. 1B).
Across the Brassica and Arabidopsis species, groups D, E, H, and L had the largest numbers of UGTs (Fig. 2). Phylogenetic groups E and L in B. napus were the most expanded ones among all the compared groups, with 48 and 61 genes, respectively (Fig. 2). Groups I, J, K, M, and N were found to be conserved among all three Brassica species (Fig. 2). B. napus had the maximum number of putative UGTs in each group, while B. rapa and B. napus had almost the similar number of genes in their phylogenetic groups. In B. oleracea, group F was found to be absent.

KEGG Pathway Mapping and Go Annotation. The Go ontology analysis of all the identified putative
UGTs from all three-brassica species associated them with 13 molecular function and 24 biological processes shown in (Fig. 3A and B). In general, fourteen major KEGG pathways (steroid hormone biosynthesis, metabolism of xenobiotics by cytochrome P450, zeatin biosynthesis, phenylpropanoid biosynthesis, pentose and glucuronate The evolutionary history was inferred using the Neighbor-Joining method. The optimal tree with the sum of branch length = 4816.24407500 is shown. The tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree. The evolutionary distances were computed using the number of differences method and are in the units of the number of amino acid differences per sequence. The analysis involved 567 amino acid sequences. All positions with less than 80% site coverage were eliminated. That is, fewer than 20% alignment gaps, missing data, and ambiguous bases were allowed at any position. There were a total of 171 positions in the final dataset. Evolutionary analyses were conducted in MEGA7. (B) PSPG motif of each phylogenetic group is shown. interconversions, tryptophan metabolism, drug metabolism -other enzymes, flavone and flavanol biosynthesis, ascorbate and aldarate metabolism, glucosinolate biosynthesis, drug metabolism -cytochrome P450, retinol metabolism, porphyrin and chlorophyll metabolism, and anthocyanin biosynthesis) were found to be linked with the putative UGTs (Supplementary Table S2). Phylogenetic group wise KEGG pathway mapping was also investigated and listed in (Supplementary Table S2). Each group UGTs were found to be involved for specific KEGG pathways. The putative UGTs involvement in various biological processes and their molecular functions is presented in group wise ( Supplementary Fig. S2). Overall most of the putative UGTs were found to be involved in quercetin 3-O-and 7-O glucosyltransferases, hormonal glycosylation and various flavonoid glucuronidation and their biosynthesis.
Chromosomal distribution, Duplication, and Divergence. All the chromosomes in all three Brassica species contained UGTs (Fig. 4), although the numbers varied among them. In B. napus, some genes were found without any chromosome number and were included in the random category (Fig. 4). A comparison between B. rapa and B. napus AA chromosomes showed that the putative UGTs of B. napus followed the exact pattern of chromosome involvement as its progenitor B. rapa (Fig. 4). The same was observed for CC genome from B. oleracea and B. napus.  The maximum clustering of the putative UGTs in all three Brassica species was observed in phylogenetic groups E, L, and H (Fig. 4). Cluster size was found to be variable for each chromosome in all three Brassica species (Fig. 4).
We found 28 clusters of tandemly duplicated UGTs in B. rapa (Supplementary Table S3, Supplementary  Fig. S4). The maximum clusters consisted of two to four UGT genes. Some clusters had five to seven UGT genes, for example Cluster_17 and Cluster_13 ( Supplementary Fig. S4). For B. rapa, almost all the clusters belonging to the same phylogenetic groups, except some clusters, such as Cluster_171, which had a gene from phylogenetic  Table S3). We also found 17 clusters of tandemly duplicated UGTs in B. oleracea ( Supplementary Fig. S4, Supplementary Table S3). The cluster size in B. oleracea was 2 to 4 genes per cluster. We also found that some cluster genes belonged to two different phylogenetic groups, such as the genes in Cluster_1498, Cluster_453, and Cluster_1455 in B. oleracea.
We found that 25 duplicated pairs of putative UGTs diverged between B. rapa (AA) and B. oleracea (CC) genomes at the time of divergence from their progenitor (Table 1). Almost all the duplicated UGT gene pairs belonged to those genes that had undergone a genes that had undergone a whole-genome duplication (WGD) or segmental duplication, and all the pairs had a Ka/Ks ratio of less than 1, indicating the purifying selection acting on these genes. The calculation of the divergence time of the duplicated UGTs spanned from 1.41 to 21.07 million years ago (MYA) ( Table 1). Duplicated gene pairs 2,5,11, 17, 18, and 21 diverged around 13.65 to 21.07 MYA, while fourteen gene pairs (1, 3, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 19, 20, 22, 24, and 25) diverged around 1 to 5 MYA. Only two UGT genes pairs diverged around 5 to 9 MYA. Gene synteny and subgenomic fractionation of the putative UGTs in Brassica. B. rapa showed paleo hexaploidy, with three subgenomes that share the same diploid ancestor as that of the model species A. thaliana 47 . Here, we provide B. rapa-orthologous syntenic genes in B. napus, B. oleracea, and the model plant Arabidopsis. We found, 100 UGT genes of B. rapa were having the syntenic genes in all three-compared species including Arabidopsis (Supplementary Table S3). Overall, All the syntenic genes belonged to tPCK (the ancestral genome of Brassica species) chromosomes 1 to 7 and the following syntenic blocks:  Table S3). The maximum number of syntenic UGT genes was retained in the fractioned subgenome (LF) of B. rapa, B. oleracea, and B. napus (Supplementary Table S3). The minimum number of orthologs was contained in the more fractioned subgenome (MF2) of all three Brassica species. Each phylogenetic group of the putative UGTs belongs to different syntenic blocks, for example, in group E, different syntenic blocks (A, F, T, J, O, X, N) were detected (Supplementary Table S3). Similarity in the syntenic sequences of the putative UGTs among LF, medium fractionated subgenomes (MF1), and MF2 subgenomes are shown in (Fig. 5).
Exon/Intron organization of the putative UGTs in Brassica. Of Table S4). Except Bra034848, which had 11 introns, almost all the remaining UGTs had 1-4 introns per gene. In each phylogenetic group, the intron numbers were found to vary among the three Brassica species. The maximum number of introns was found in E, D, L, and H, whereas the minimum number of introns was found in B, C, F, I, M, and N groups, in all three Brassica species (Supplementary Table S4). Intron phases in some phylogenetic groups were also found to be conserved; for example, all the introns in group G UTGs were found in phase 1 in B. rapa and B. oleracea (Supplementary Table S4). Intron size varied in each phylogenetic group across the three Brassica species (Supplementary Table S4). Interestingly, B. rapa and B. oleracea followed the same pattern of introns in each phylogenetic group, for example, the group A UGTs did not have any intron in either species. After mapping the introns to the amino acid sequence alignments, at least 10 independent intron insertion events can be observed in intron-containing UGTs, following an established method 12,17 . These insertion events can be serially numbered I-1 to I-10, according to their positions. Intron insertion in phase 1 between 150-200 aligned amino acids was found to be highly conserved across the three Brassica species and Arabidopsis 10 ( Supplementary Fig. S4). For group G, in the UGTs from all three Brassica species, a highly conserved intron-insertion region among the amino acids could be observed. A deletion of the conserved intron-insertion region was observed in group E UGTs for all three Brassica species ( Supplementary Fig. S4).
Transcriptome analysis of tissue-specific expression. The RNA-seq data (GSE43245) from B. rapa tissues showed that the UGTs showed a variable expression in all the seven investigated tissues (callus, flower, silique, leaf, root, and stem) ( Supplementary Fig. S5A). The maximum expression of these UGTs was observed in callus, flower, root, and silique tissues. The same results for UGT gene expression were observed in the 127 investigated anatomical parts in Arabidopsis ( Supplementary Fig. S5B), with the maximum expression observed in silique (endosperm and testa) and root (root protoplast, root cortex, and root stele cells). Some UGT genes in Arabidopsis, such as UGT76F2, UGT85A1, UGT85A5, UGT87A2, and UGT89A2 were preferentially expressed in the leaf epidermis and guard cells ( Supplementary Fig. S5B). UGT72B7 genes in Arabidopsis were maximally expressed in the shoot vascular and phloem companion cells. UGT76E3 and UGT84B1 were preferentially expressed in the seed endosperm in Arabidopsis. Overall, each UGT gene showed a unique expression pattern, and therefore, we cannot link the expression patterns with the phylogenetic groups (Fig. S4B). The investigated expression of the Arabidopsis UGTs across 10 developmental stages (germinated seed, seedlings, young rosette, developed rosette, bolting, young flower, developed flower, Flower/silique, silique, and senescence) suggested that the UGTs had the maximum expression at the senescence stage ( Supplementary Fig. S5C).

Transcriptome analysis of Plasmodiophora brassicae. Glycosylation of fungal-responsive pathway
proteins was evaluated using RNA-seq data in Arabidopsis and B. rapa. RNA-seq data at 24 hai and 48 hai stages, in response to P. brassicae, were investigated in Arabidopsis 48 . The maximum number of UGT genes were found to be downregulated upon early and late infection by P. brassicae, in both Arabidopsis and B. rapa (Supplementary  Table S5, Supplementary Fig. S6A). Only 9 UGT genes (UGT71B3P, UGT71C1, UGT72D1, UGT72E1, UGT72E2, UGT74B1, UGT74C1, and UGT76F2) were found to be upregulated, based on the Reads Per Kilobase of transcript per Million mapped reads (RPKM) values for infection with P. brassicae. Some genes were up-and downregulated simultaneously, but at different stages of infection, for example the UGT74E2 gene was found to be upregulated after 48 hai, whereas downregulated at 24 hai, compared to the control. UGT71C1, UGT72E1, and UGT72E2 showed the maximum TPM values against P. brassicae infection. The UGT79B5 gene in Arabidopsis was maximally downregulated, with RPKM reduction from 229 to 106, against the clubroot pathogen.
Based on the RPKM values, the maximum number of downregulated UGTs against P. brassicae was observed in the clubroot-resistant line compared to the susceptible line of B. rapa 49 (Supplementary Table S5, Supplementary Fig. S6B). All the downregulated UGTs had low RPKM values against early and late infectious stages of P. brassicae. Twenty-one UGTs were found to be upregulated against the clubroot disease. Interestingly, UGTs from group M were found to be upregulated in both Arabidopsis and B. rapa against clubroot. Differentially expressed UGT genes for Phytohormone Glycosylation. Diversity in the putative UGTs to glycosylate phytohormones was evaluated using microarray analysis in Arabidopsis, and further confirmed in B. rapa for MeJA-treated RNA-seq data. All the 118 UGTs were differentially expressed against 2,4-D + trichostatin, ABA + SA, indole acetic acid (IAA), MeJA, SA, and zeatin (Fig. 6). The fold change and p-values are provided in Fig. 6. UGT71C3, UGT74D1, and UGT90A2 genes were downregulated in the hypocotyl segments treated with 2,4-D and trichostatin (a histone deacetylase inhibitor) (Fig. 6). UGT72E2, UGT74B1, and UGT79B7 were found to be upregulated in response to 2,4-epibrassionolide and in glucose-treated seedlings (Fig. 6). A differential expression of UGT71B7, UGT85A5, and UGT87A2 was observed in the cell suspension cultures under 25 uM ABA and 300 uM SA after 3 h of treatment. Interestingly, UGT72E1 was downregulated in the leaf samples treated with 50 mM ABA for 3 h, and cell samples treated with 25 uM ABA after 3 and 24 h. An upregulation of UGT72D1, UGT73B4, and UGT76E12 was observed in the seeds of ABA-hypersensitive mutant (ahg1-1), germinated on a medium containing 0.5 uM ABA. A downregulation of UGT71C1 and UGT73D1 was observed in the root xylem pole pericycle protoplast samples transferred to 5uM IAA solution for 3 h. UG74E2 and UGT75B1 were maximally downregulated in the root samples of Col-0 grown for 6 days under continuous light and then transferred to 1 uM IAA for 8 and 12 h. UGT71B7, UGT72B3, UGT72E1, AT3G11340, UGT78D2, and UGT87A2 were the most downregulated UGT genes in the leaf disc samples of Ler and Penta plants grown under short-day light conditions and sprayed with 10 uM MeJA. UGT71D1, UGT73B3, UGT79B2, and UGT90A1 genes were upregulated under 10 uM MeJA-treated leaf disk samples from Ler and Penta plants of Arabidopsis. UGT74D1 and UGT74E2 were found to be downregulated in the leaf discs samples treated with 10 uM naphthalene acetic acid NAA (Fig. 6). UGT73E4 and UGT76E12 were upregulated in response to Col-0 plant samples treated with 75 uM 12-oxo-phytodienoic acid (OPDA) for 4 h. A consistent upregulation of UGT76B1 and down regulation of UGT89C1 was observed in rosette leaf samples taken from plants of 16 different genetic background, treated with 0.3 mM SA. UGT75C1 and UGT78D3 were the only downregulated UGT genes in 21-day old whole plants, which overexpressed the ARR22 (Col-0 background) and were treated with 20 uM trans-zeatin for 3 h (Fig. 6). To compare this data, we found only the RNA-seq (GSM1243356) dataset of B. rapa leaves treated with 0.2 mM MeJA for ZSN5 cultivar. We found 50 upregulated and 39 downregulated UGT genes in response to MeJA exposure (Supplementary Table S6, Supplementary Fig. S6). Bra031290, Bra023872, Bra023594, Bra009143, and Bra005641 were the most down regulated UGT genes based on the RPKM values under MeJA treatment. Bra039547, Bra015435, and Bra012892 were highly upregulated in response to MeJA (Supplementary Table S6, Supplementary Fig. S7).

Differentially expressed Arabidopsis UGT genes under Biotic and Abiotic stresses.
Overall, the maximum number of UGT genes in Arabidopsis was found to be upregulated in response to pathogens such as Alternaria brassiciola, Blumeria graminis, Escherichia coli, G. cichoracearum, Golovinomyces orontii, Hyaloperonospora arabidopsidis, Plectospherella cuccumerina, P. syringae, Rhizoctonia solani, and Xanthomonas campestris (Fig. 7). Interestingly, UGT73B3, UGT73B4, and UG73B5 were differentially upregulated in response to all the above-mentioned pathogens. UGT71C3, UGT74E2, and UGT85A1 were highly upregulated in rosette leaf samples treated with the pathogens, A. brassicicola and B. graminis. A very low level of gene regulation of UGTs was observed against E. coli, H. arabidopsidis, and R. solani in Arabidopsis. The UGT89C1 gene was highly downregulated in response to G. cichoracearum-treated whole rosette leaf samples after 18 h and 36 h in cultivars of Col-0 and edr1 genetic background (Fig. 7). UGT73B3 and UGT89C1 were upregulated against G. orontii infection in the rosette leaf samples in plants of Col-0 and eds16-1 genetic background. UGT73B3, UGT76C1, UGT85A1, and UGT87A1 responded positively in response to P. cucumerina treatment in rosette leaf samples of plants with agb1-1 genetic background. An upregulation of UGT73D1 was observed against P. syringae pv. phaseolicola in the leaves of Col-0 plants. The maximum number of UGT genes was found to be upregulated in response to P. syringae treatment in plants (Fig. 7). Again, UGT73B3 responded positively in terms of gene expression against X. campestris treatment in the leaves of plants with Ws-4 genetic background.
A diverse range of differentially expressed UGT genes of Arabidopsis was observed against cold, drought, heat, hypoxia, osmotic, salt, and submergence conditions (Fig. 8). UGT73C1, UGT78D3, UGT79B2, UGT84A3, and UGT90A1 were dominantly expressed in the aerial tissue samples of sf2 plants, grown at 21 °C under 24 h illumination and then exposed to cold (4 °C) for 24 h. A downregulation of UGT72E1 and UGT74F2 was observed in the aerial tissue samples of Col-0 plants, grown at 20 °C under 8 h light/16 h dark for 28 days and then exposed to cold (4 °C) for 10 days. UGT72B1 and UGT75B1 were downregulated, whereas UGT78D3 and UGT90A1 were upregulated in rosettes of atcsp1-1 (a knockdown mutant) plants maintained at 4 °C. In drought, UGT73B3, UGT73B5, UGT73C1, UGT74E2, and UGT79B1 were highly uregulated in leaf samples from wild type (Col 0) plants, which had not been watered for 7 days. UGT79B2, AT4G31780, UGT85A1, UGT85A2, UGT86A2, and UGT90A1 were downregulated in hsf1: hsf3 (mutant plants significantly impaired with respect to the induction of HS gene expression) leaf samples after 1-h heat treatment (37 °C). In response to hypoxia, UGT71B1, UGT71B2, UGT71B7, UGT74D1, UGT76E1, UGT76F2, UGT78D1, UGT87A1, and UGT88A1 were highly downregulated in ANAC102(KO-1) (mutant) plants exposed to low oxygen treatment for 4 h in a dark chamber. UGT73B3, UGT73B5, and UGT73B6 were found to be upregulated in response to the low oxygen treatment. A very low level of regulation of UGT genes was observed in response to 300 mM mannitol-treated leaf samples of col-0 plants. The maximum number of UGT genes was found to be highly downregulated, such as UGT71B1, UGT71C1, UGT71C2, UGT71B1, UGT71D1, UGT72B2, UGT74D1, UGT74F2, UGT76E9, UGT84A2, and UGT84A4, in root epidermis and lateral root cap protoplast samples treated for 3 h (Fig. 8). A downregulation of UGT75C1 and UGT78D1 was observed in the leaf samples shifted from cold to freezing conditions. A high expression of putative UGTs in B. oleracea was observed in root and pod tissues in comparison to the leaf and flower tissues (Fig. 9). A high expression of Bo3g039900, Bo2g098930, Bo1g007610, Bo3g018000, Bo3g091760, and Bo9g090770 genes was observed in root tissues (Fig. 9). Bo9g169250, Bo5g124870, Bo1g056770, Bo1g007610, Bo3g078830 were dominantly expressed in pod tissue of B. oleracea (Fig. 9). Two UGTs i.e. Bo3g091760 and Bo1g056770 were maximally expressed only in flower, while Bo1g007610, Bo4g031150, Bo4g049480, Bo4g030990, and Bo9g090770 genes were expressed in leaf tissues (Fig. 9). The UGTs from the following phylogenetic groups (E, D, H, L, K, A and M) were abundantly expressed in calcium limited seedling tissues of root, leaf, and flower (Fig. 9). Some key regulated UGTs were following i.e. Bo3g078820, Bo2g098930, Bo7g021600, Bo9g176810, Bo1g056770, Bo4g030990, Bo9g090770, and Bo5g008070 against low calcium treatment (Fig. 9).
The salt treatment in B. napus decreased the putative UGTs expression at different levels in leave and root tissues (Fig. 10). A 12 h salt treated leaves and root tissues abruptly decreased the expression of maximum number of UGTs. BnaA03g36450D, BnaA07g04240D, BnaC03g72710D, BnaA08g28350D, and BnaC05g05040D UGTs showed an increase in their expression upon 1 h salt treatment only, while the same genes were showing a decreased expression upon 12 h stress treatment (Fig. 10).

Discussion
The 14 identified phylogenetic groups (A-N) of all the putative UGTs in the three Brassica species were similar to Arabidopsis, which indicates that this family has not expanded in terms of new phylogenetic groups after separation from Arabidopsis 10,11 (Figs 1 and 2). The presence of newly identified O and P groups in higher plants such as maize, but their absence in the three Brassica species and Arabidopsis indicates that they have been lost at some stage during the evolution of these plants 13 (Figs 1 and 2). An expansion of D, E, H, and L groups in all three Brassica species compared to Arabidopsis, indicates that multiple functions are associated with these groups of UGTs and they have a broad substrate specificity 13 . Groups B, C, and O were not as expanded as others, which suggests that they have a conserved substrate specificity 13 (Figs 1 and 2). Surprisingly, group H expanded only in the three Brassica species and Arabidopsis, whereas it contracted in all other eudicots, indicating that these UGTs are more required in Brassica species but have limited functions in other plants (Figs 1 and 2). An expansion, conservation, and reduction of UGT genes in each phylogenetic group of the eudicots reflects the physiological challenges that plants have to overcome for survival on land 13 . The absence of group F was observed in B. oleracea, indicating that these UGTs have been lost during evolution, and the same results were found for G. hirsutum, where group C was found missing 14 .
Structural investigation of the PSPG motif in each phylogenetic group revealed the role of specific amino acid residues that are highly conserved at positions 1 (W), 4 (Q), 10 (H), 19 (H), and 44 (Q) (Fig. 1B). The occurrence of these specific amino acids at these positions in the sequence provides certain evolutionary and functional information that could be helpful for enzyme discovery 13 .
The involvement of all 545 putative UGTs in Brassica to manipulate 14 major KEGG pathways and 24 biological process suggests that this family is highly important for plants to cope with environment stimuli and cellular homeostasis (Fig. 3A,B). The maximum involvement of putative UGTs in flavonoid glucuronidation and their biosynthesis indicates their mode of action through flavonoid glycosylation i.e quercetin 3-O-, 7-O glycosylation etc. (Supplementary Fig. S2 and Supplementary Table S2). The maximum involvement of UGT genes in flavonoid glucuronidation, in response to toxic substances and ABA stimulus, indicates that they are unique modulators molecules in plants that interact with the environment 52 . Several genes for ABA glycosylation have already been functionally characterized in Arabidopsis, Phaseolus vulgaris, and Vigna angularis 41,44,53,54 . Their involvement in plant growth and development has already been verified, for instance, in the in seed and fiber development in soybean and cotton 14,16 .
A varied chromosomal distribution, but the same pattern of chromosome sharing in B. rapa AA genome to B. napus AA genome, and B. oleracea CC genome to B. napus CC genome, reflects the occurrence of recent gene duplication events and close phylogenetic relationships, which is consistent with the findings for Arabidopsis, soybean, and cotton UGTs 10,14,16 . The expansion of the UGT gene family is primarily due to WGD events 12,16 . Tandemly duplicated UGT genes in B. rapa and B. oleracea suggest that chromosomal rearrangements took a part in the versatility of plant secondary metabolism, and the same results was observed in mustard family, where 45% and 48% of the glucosinolate UGT loci in Arabidopsis and Aethionema descend from tandem duplication 55 (Supplementary Fig. S3). All the duplicated gene pairs had a Ka/Ks ratio <1, indicating that UGT genes might have preferentially conserved function and structure under selection pressure 11,56 (Table 1). The sixteen pairs of diverged duplication pairs of UGT genes between B. rapa and B. oleracea around 1 to 9 MYA indicate a recent divergence during Brassica triplication events (5-9 MYA) 57 Table S3). The minimum number of UGTs in MF (MF1 and Mf2) subgenomes reflected the round of gene loss in these subgenomes 11 . It seems that LF experienced one round of gene loss in UGTs and retained more genes than MF1 and MF2, which experienced two rounds of gene loss 47 . Some UGTs were not contained in all three subgenomes, suggesting that these UGTs evolved individually and later became a part of the genome. The maximum number of duplicated regions were observed in MF1 and MF2 UGTs in Brassica, suggesting that these genes remained highly conserved during subgenome fractionation and rounds of genome duplication (Fig. 5) 47 .
An increased number of intron-containing UGTs in B. napus, compared to B. rapa and B. oleracea, suggested that polyploidization increased the proportion of introns during the hybridization of genomes (Supplementary Table S4). The results are in line with those on cotton polyploidization, where duplicated genes were found to evolve independently after the polyploid formation 59 . Conserved intron between (150-200 amino acids) was the most widespread and oldest intron observed in most members of the different phylogenetic groups in all three Brassica species (Supplementary Fig. S4). The same intron insertion was observed in Z. mays, P. persica, flax, soybean, and cotton UGTs 10,14-17 . Most of the intron insertions were observed in phase 1, across all three compared Brassica gene structures, suggesting that the majority of conserved introns are ancient elements and that their phases remain stable 13,14,17 (Supplementary Fig. S4). Intron size within each phylogenetic group appears to be variable (Table S4), suggesting that the intron size has been gene-specific during evolution 13 .
The maximum expression of UGTs in callus, flower root, and silique suggested their active involvement in phytohormones, which are highly active in developing tissues ( Supplementary Fig. S4A). The same results were observed in cotton and soybean, where the maximum expression of these genes was observed in developing seeds and fiber 14,16 . The individual gene expression of each UGT in a particular tissue reflected their specific role in various plant parts. The maximum expression of UGTs at the senescence stage in Arabidopsis suggested their involvement in the biological ageing of plant tissues, and the results were supported by up-regulated UGT genes at the senescence stage in cotton 60 (Supplementary Fig. S5B).
To date, UGT genes have been well studied with respect to their functional role in the hypersensitive response against P. syringae, F. graminearum mycotoxin inactivation, DON resistance from Fusarium head blight, in tomato, Arabidopsis, wheat, and rice 24,26,[61][62][63][64] . The maximum number of downregulated UGTs genes in response to infection with P. brassicae at early and late infection stages suggested their possibly role in SA and JA crosstalk and glycosylation [24][25][26] (Supplementary Fig. S6A,B). A very few number of up-regulated UGT genes upon infection to P. brassicae suggested that these genes are also involved in some metabolite glycosylation, which is indirectly involved in pathogen resistance.
The differentially up-regulated UGT73B3, UGT73B4, and UG73B5 genes in response to all the studied pathogens suggest that they are the key genes regulated under all pathogen infection in Arabidopsis (Fig. 7). They belong to phylogenetic group D, the members of which preferably glycosylate the SA (pathogen responsive hormone) 72 . In tomato, UGT73B3 and UGT73B5 have been functionally characterized against the P. syringae hypersensitive response, based on SA-dependent induction 24 . In wheat, UGT73C5 has been successfully transformed from Arabidopsis to inactivate the deoxynivalenol mycotoxin against Fusarium head blight 61 . Thus, we propose that the D group UGTs are key regulatory genes against pathogen response in all three Brassica species. The maximum up-regulation of UGT71C3, UGT74E2, and UGT85A1 in response to A. brassicicola and B. graminis also confirmed their involvement in IBA and zeatin glycosylation 37,38 . Highly responsive UGT87A1 against P. cucumerina infection in Arabidopsis suggested the ABA glycosylation of this gene against the pathogen 53 .
Until now, few reports have been published about functional characterization of UGT genes against stress responsiveness in all three-brassica species. The availability of Arabidopsis data provides sufficient evidence about UGTs importance in plant physiology and a maximum homology of all three Brassica UGTs with Arabidopsis is another opportunity to validate this data in brassica species (Supplementary Fig. S1). Recently, a BrUGT74B1 was found to be involved in phytoalexins biosynthesis which have an important role in plant disease resistance 73 . Previously, an overexpression of the same genes in B. napus increased the aliphatic and indolic glucosinolates levels by 1.7 and 1.5 folds in leaves against Sclerotinia sclerotiorum and Botrytis cinereal pathogens 74 . In general, a decreased expression by maximum UGTs under biotic and abiotic stresses in all the observed datasets suggested that glycosylation mechanism is hindered by putative UGTs upon stress in B. napus (Fig. 10). Based on UGTs expression against various biotic and abiotic stresses, some key UGTs and their close homologs could be the potential candidates for functional characterization i.e. BnaCnng01870D and BnaC01g31820D (blackleg resistance), BnaC01g42100D BnaC03g42360D, BnaA08g11370D, BnaA08g28350D (stem rot resistance), BnaA02g20270D, BnaC07g05000D, and BnaA08g20880D (salt stress resistance), BnaC04g14940, BnaC04g14950D, BnaCnng47330D, BnaA04g18440D, BnaC09g49560D, BnaC04g08280D and BnaC05g36720D (drought stress resistance) (Fig. 10).

Methods
Data resources used. One hundred and eighteen UGT sequences of Arabidopsis were retrieved from (http://www.p450.kvl.dk/UGT.shtml). Genomic, proteomic, and cDNA sequences, as well as the annotation data for B. rapa, B. oleracea, and B. napus was downloaded from BioMart Ensemble plants (https://plants.ensembl. org/index.html) and Uniprot (http://www.uniprot.org/). The Hidden Markov Model (HMMER) (http://www. ebi.ac.uk/Tools/hmmer/) web server was used to retrieve the genes containing UDPGT (PF00201.17) domain with significant hits in all three Brassica species. The BRAD database (http://brassicadb.org/brad/) and the Bolbase database (http://ocri-genomics.org/bolbase) were used for gene syntenic analysis. Expression Array and RNA Sequencing data were obtained from the Gene Expression database with the following accession numbers: GSE43245 in B. rapa for expression in tissues, GSE74044 for the clubroot disease, GSE51363 for expression against Methyl JA in B. napus. Some additional datasets were personally requested and obtained for the club root disease RNA-seq from the following authors 48 .
Identification of Putative UGTs. In the reference proteomes of all three Brassica species, UDPGT domain encoding genes were identified through a PHMMER profile corresponding to the PFAM 00201.17 domain, using HMMER web search restricted by taxonomy and significant cut off E-values (0.01 for sequence and 0.03 for hit). The selected protein sequences were screened through 44 amino acid-based conserved PSPG motifs, high quality sequences were aligned using the MUSCLE alignment tool in MEGA7 (http://www.megasoftware.net/). These quality sequences were used to construct the specific UGT profiles for all three Brassica species, using the hmmbuild module in the HMMER program. With these models, the final datasets of the UGT genes were identified from each Brassica proteome. The gene name, chromosomal location, % GC content, peptide length, and mass data were obtained from the Biomart Ensemble Plants database. Gene ontology (metabolic process, integral components, and transferase activity) and enzyme class data were retrieved using the ID mapping tool in Uniprot database. Amino acid composition of all the putative UGTs was retrieved using the data explorer tool in MEGA7 software.
Phylogenetic Analysis and Comparison. The downloaded sequences were aligned with MUSCLE using the MEGA7 software. Sequences that were too short, too divergent, or too long were removed from the input file after the initial alignment, and the remaining sequences were then re-aligned. The obtained alignment file contained only those sequences similar to the desired PSPG motif. Phylogenetic analysis was performed using the Neighbor-joining statistical method with 1000 bootstrap replicates in MEGA7. A 100% data coverage was used to construct the phylogeny. The phylogenetic tree was visualized using the iTOL website (http://itol.embl.de/). To compare the phylogenetic groups of the putative UGTs in all three Brassica species, we collected the published data on the number of putative UGTs and the number of phylogenetic groups in Arabidopsis 10 .
KEGG Pathway Mapping and Go Annotation. Gene ontology and KEGG pathway mapping of all the putative UGTs in the three Brassica species were conducted using the Blast2Go tool (https://www.blast2go.com/). Individual UGTs from each phylogenetic group in all three Brassica species were blasted against Arabidopsis in the NCBI database following with mapping and annotation. This analysis provided us the sequence similarity with Arabidopsis and the Go annotation (molecular function, and biological process) for each group and gene. KEGG pathway mapping based on phylogenetic groups provided us with enzyme-specific pathway information, EC code, and the responsible genes in each group and each UGT gene of all three Brassica species.
Chromosomal distribution, Duplication, and Divergence. The physical location of each UGT on the chromosomes was retrieved using the start and stop positions of the genes taken from BioMart. Mapchart 2.2 was used to visualize the UGT gene distribution on each chromosome in all three Brassica species. A gene cluster was defined as two or more copies located in a chromosomal region of length <200 kb 75 . A lot is known about the dominant pattern of UGT duplication by segmental duplication and WGD in maize and peach 15,16 . To detect the tandem duplication pattern of the putative UGTs, we used the Plant Tandem Duplicated Genes database (http:// ocri-genomics.org/PTGBase/). Tandem repeats of each gene were retrieved using the Gene ID as the query in that specific species cluster. The cluster name and its graphical representation was recorded and saved.
To estimate the divergence of the duplicated UGT genes between B. rapa (AA) and B. oleracea (CC) genomes, the duplicated pairs were detected in the Plant gene duplication database (http://chibba.agtec.uga.edu/duplication/) using the locus IDs as a search tool and the Ka (non-synonymous)/Ks (synonymous rate) values were recorded for each duplicated pair in a 100-kb display range. The R-value was taken as 1. Exon/Intron organization of the putative UGTs in Brassica. The exon/intron organization for each phylogenetic group was illustrated using the Gene structure display server (GSDS) program (http://gsds.cbi.pku. edu.cn/), by aligning the coding and genomic sequences obtained from BioMart. Introns were classified based on structure, phase, length, and number. A UGT intron map was constructed in accordance with a previously established method 12,16,17 . The splice sites of the intron-containing UGTs were mapped on all aligned sequences of intron-containing UGT peptides using the PIECE web tool (https://wheat.pw.usda.gov/piece/). Intron distribution graph was built for each phylogenetic group on the basis of the introns contained in each group.
Transcriptome analysis for tissue-specific expression. To check tissue-specific expression of the putative UGTs in all three Brassica species, the RNA-Seq data (GSE43245) of B. rapa was selected and the expression of the putative UGTs in different tissues, such as callus, flower, leaves, roots, silique, and stem, was obtained and analyzed according to the phylogenetic groups. A heatmap of the data was generated using the distance function (Euclidean) and hierarchical clustering (Average) on the Heatmapper website (http://heatmapper.ca/). To compare tissue-specific expression data of B. rapa, we analyzed the RNA-Seq and microarray data for expression of 118 UGTs across 10 developmental stages and 105 anatomical parts, using the Genevestigator database.

Transcriptome analysis against Plasmodiophora brassicae.
To infer the role of the putative UGTs against the obligate biotrophic protist, P. brassicae, in B. rapa and Arabidopsis, RNA seq data were obtained at two timepoints: an early timepoint, at which the pathogen had colonized the root, and a later timepoint, at which more than 60% of the host root cells were colonized and the root morphology was drastically altered 48 . In Arabidopsis, RNA-Seq data at early stages of infection and within a relatively short period of 24 and 48 h post-inoculation (hpi) were retrieved from a previous study 48 . The retrieved RNA-seq data against P. brassicae in Arabidopsis was compared with the obtained RNA-seq data from near-isogenic lines carrying clubroot-resistant and susceptible alleles in B. rapa in response to P. brassicae during the early infection stages 49 . Heatmaps were generated to understand the graphical representation of gene expression. Differential expression of Phytohormone Glycosylation. The glycosylation of phytohormones is an essential mechanism to control the level of active hormones in a cell. To check the expression of individual UGTs against auxins, cytokinins, SA, and JA, microarray datasets were obtained from GENEVESTIGATOR ® (https:// genevestigator.com/gv/) for Arabidopsis, and the results were further confirmed using B. rapa RNA-seq data (Only MeJA data available). Genes were declared to be differentially expressed if they show a fold-change of at least 1.5 and also satisfied p < 0.05 after adjustment for multiple testing 77,78 . Thus, we generated the heatmaps with fold-change value of 1.5 and p < 0.05 against all the treatments. Down-and up-regulated expressed genes were shown to propose the functional characterization of UGTs for phytohormone glycosylation.

Differential expression against Biotic and Abiotic stresses in Arabidopsis.
To contribute to the existing knowledge for future studies and examine the link of glycosylation phenomena with various biological processes in the plants, we checked the differential expression of these genes against various biotic and abiotic stress treatments. By using the fold-change value of 1.5 and p < 0.05, we identified the differentially expressed genes against A. brassiciola, B. graminis, E. coli, G. cichoracearum, G. orontii, H  P. syringae, R. solani, and X campestris pathogens, as well as cold, drought, heat, hypoxia, osmotic, salt, and wounding stresses. Heatmaps of the down-and up-regulated UGTs were generated using Genevestigator. B. oleracea and B. napus. The following SRA datasets PRJNA227258 79 (calcium-limited seedlings expression in 102043 and 107140 genotypes) and SRP040796 80 (leaf, flower, root and pod expression) were obtained and analyzed by Galaxy (https://usegalaxy.org/) RNA-seq analysis webtool in B. oleracea for differential gene expression in tissues and calcium limited plants. The UGTs involvement against abiotic stresses (drought and salt) and biotic stresses (blackleg and stem rot) in polyploid B. napus was evaluated by analyzing the following SRA datasets i.e. SRP051461 81 (drought), SRP028575 82 (salinity), GSE77723 83 (blackleg) and GSE81545 84 (stem rot). Hence, we generated the individual heatmaps in B. oleracea and B. napus using the fold-change value of 1.5 and p < 0.05. Data Availability statement. All data generated or analyzed during this study are included in this published article (and its supplementary information files).