Shaping the evolutionary tree of green plants: evidence from the GST family

Glutathione-S-transferases (GSTs) are encoded by a large gene family that is ubiquitous in aerobic species, and they catalyze the conjugation of electrophilic substrates to glutathione (GSH). GSTs are divided into different classes, both in plants and animals. In plants, GSTs function in several pathways, including those related to secondary metabolite biosynthesis, hormone homeostasis and defense from pathogens, and they contribute to the prevention and detoxification of damage from heavy metals and herbicides. Here, 1107 GST protein sequences from 20 different plant species with sequenced genomes were analyzed. Our analysis assigns 666 unclassified GST proteins to specific classes, underlining the wide heterogeneity of this gene family. Moreover, we highlight the presence of further subclasses within each class. Regarding the GST-Tau class, one possible subclass appears to include all the Tau members of ancestral plant species. Moreover, the results highlight the presence of members of the Tau class in Marchantiophytes and confirm previous observations on the absence of GST-Tau in Bryophytes and green algae. These results support the hypothesis of a paraphyletic origin of Bryophytes, but also suggest that Marchantiophytes may be on the same branch leading to higher plants, depicting an alternative model for green plant evolution.

The Zeta class is linked to tyrosine degradation, catalyzing the GSH-dependent conversion of maleylacetoacetate to fumarylacetoacetate. The Theta class is similar to the corresponding mammalian class 9 and is present in bacteria, insects, plants, fish, and mammals 20 .
The Lambda and Dhar classes were identified by comparing the human Omega GSTs against the Arabidopsis genome 17 .
Finally, the Mapeg class includes the microsomal GSTs, with transferase and peroxidase activities 21 .
Recently, six more GST classes have been identified in plants: TCHQD, EF1Bγ, URE2p, Omega-like, Iota and Hemerythrin 19 . Members of the URE2p class were found in Physcomitrella patens, in Selaginella moellendorffii and in bacteria, probably because of horizontal gene transfer events in bacteria, while the Iota GST class was found only in Physcomitrella patens and in Selaginella moellendorffii 19 . Hemerythrin GSTs are non-heme iron binding proteins found in metazoans, prokaryotes, protozoans, and fungi 22 , which act in detoxification from heavy metals by catalyzing the conjugation of GSH with metal ions 19 .
A phylogenetic analysis carried out both in monocots (maize and rice) and in dicots (soybean and Arabidopsis) demonstrated that the Zeta and Theta classes are monophyletic groups in monocots, dicots and mammals, suggesting that their origin might predate the divergence between plants and animals 23 . The Zeta and Theta classes have undergone one or two duplication events, with at most three paralogs in maize, rice, soybean and Arabidopsis. The Phi and Tau classes show differences between monocots and dicots due to the extensive gene duplication events that monocots and dicots underwent after their divergence. Extensive duplications also resulted in gene clusters sharing high similarity within small genome regions. The reasons why these extensive gene duplications were retained are still unknown 23 .
To reveal the organization of this relevant family in plants, 1107 GSTs from 20 different plant species with sequenced genomes were analyzed (Table 1). Two green algae, two Bryophytes, one Marchantiophyta, one Lycopodiophyta, one Gymnosperm, three monocots and ten dicots, including the reference plant species Arabidopsis thaliana (family Brassicaceae), were examined.
In order to associate the unclassified GSTs with specific classes, the collection was analyzed by a multiple protein sequence alignment using Muscle 24 and an associated phylogenetic tree based on the maximum likelihood method 25 (Fig. 1). The analysis defined the class association of the 666 unclassified GSTs (Table 2). One GST (kfl00659_0030) from Klebsormidium flaccidum (Klebsormidiales) and two GSTs (213211, 49816) from Micromonas pusilla (Chlorophyta) were placed in the Tau class, as also summarized in Table 2.
In Liu et al., 2013, the authors suggested that GST-Tau genes were absent in algae and Bryophytes and helped Tracheophytes to colonize land. Interestingly, our preliminary results also show that two GSTs (Mapoly0031s0032.1, Mapoly0118s0009.1) of Marchantia polymorpha (Marchantiophyta) belong to the Tau class.
Table 3 shows the results of further analyses on the assignment of these 5 sequences to a specific GST class. A BLASTp analysis 26 , against all the other GST protein sequences collected here and against the UNIPROTkb 27 database, highlighted that the two Marchantia polymorpha GST-Tau sequences (Mapoly0031s0032.1, Mapoly0118s0009.1) are indeed significantly similar to other members of the Tau class. This result also holds for one of the two Micromonas pusilla sequences (213211), although with lower significance (low score and identity values).
On the other hand, the sequence from Klebsormidium flaccidum (kfl00659_0030) and the remaining one from Micromonas pusilla (49816) showed a significant alignment with members of the Mapeg class (Table 3).
A domain search using the InterPro tool 28 (Figure S1) showed that the Micromonas pusilla sequence (213211), placed in the Tau class by both the phylogenetic tree and the BLASTp analysis, is actually an Omega-like GST.
In Liu et al., 2013, the presence of the GST-Tau class in plants from Lycophytes to higher plants suggested that this class of proteins helped plants to colonize land. We confirmed the absence of Tau GSTs in all Bryophytes by a multiple sequence alignment and an associated phylogenetic tree of all the available GSTs from this division together with the 1107 proteins from our collection (data not shown). This study highlighted the presence of two Tau GSTs in the Marchantiophyta division. This evidence supports the hypothesis of a paraphyletic origin for Bryophytes [29][30][31] (Fig. 2), in contrast with the general assumption that Bryophytes and Marchantiophytes form a clade separate from the one that gave rise to higher plants, and it also suggests that Marchantiophytes could indeed belong to the lineage leading to higher plants.
Tau subclasses. The data collected in this research clearly highlight the expansion of the GST-Tau class compared to other GST classes 8 (Fig. 1). In the work of Wagner 32 , the authors suggested that GST-Tau in Arabidopsis could be divided into three subclasses. In order to further investigate the expansion of the Tau class, the pairwise similarity of these proteins was computed in Arabidopsis thaliana (Fig. 3) and in Solanum lycopersicum (Table S2).
For further confirmation, two independent phylogenetic trees, one for Arabidopsis and one for tomato (Fig. 4), were drawn. The trees support our results from the pairwise similarity matrices. Subsequently, a phylogenetic tree (Fig. 5) was built with fewer species than the one in Fig. 1, including only Arabidopsis, S. lycopersicum, V. vinifera, three monocots (maize, rice and greater duckweed), S. moellendorffii and M. polymorpha. The latter two species are considered ancestral plant species 33 . The figure shows the specific grouping into five subclasses, numbered from subclass 1 to 5, already detected in the species-specific analysis of tomato Tau GSTs. Subclass 5 does not include GSTs from Arabidopsis.
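The step from a pairwise similarity matrix to candidate subclasses can be illustrated with a short sketch. It is not the procedure used in this study, which relies on inspection of the matrices and on the phylogenetic trees: the single-linkage rule, the threshold value, the function name `group_by_similarity` and the toy sequence names and similarity values are all assumptions made for illustration.

```python
from itertools import combinations


def group_by_similarity(names, similarity, threshold):
    """Single-linkage grouping: two sequences end up in the same group
    when a chain of pairs with similarity >= threshold connects them."""
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the group representative (with path halving).
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(names, 2):
        if similarity[frozenset((a, b))] >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical toy data: four sequences and their pairwise similarities.
names = ["tau_a", "tau_b", "tau_c", "tau_d"]
similarity = {
    frozenset(("tau_a", "tau_b")): 0.82,
    frozenset(("tau_a", "tau_c")): 0.35,
    frozenset(("tau_a", "tau_d")): 0.30,
    frozenset(("tau_b", "tau_c")): 0.40,
    frozenset(("tau_b", "tau_d")): 0.33,
    frozenset(("tau_c", "tau_d")): 0.75,
}
groups = group_by_similarity(names, similarity, threshold=0.6)
print(groups)
```

With these toy values the two high-similarity pairs form two groups. Raising the threshold splits groups and lowering it merges them, which is why groupings of this kind are best checked against a phylogenetic tree, as done above.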
In the work of Dixon and Edwards 34 , all Arabidopsis GSTs were assigned a specific role. Considering these functional assignments, subclass 1 includes nine Arabidopsis GSTs (AT3G43800.1, AT1G78370. [...] Figure S2), shows that GSTs from B. oleracea are not included in subclass 5, and suggests that the absence of members of subclass 5 could be a common feature in Brassicaceae. 47 GSTs are included in subclass 5 (Fig. 5).
Table 2. Number of GSTs per species and per class. Class types as in Table 1. In brackets, the number of GSTs per class before the assignment resulting from the reported analyses.

Discussion
This analysis of 1107 GSTs from plants with sequenced genomes yields a wide phylogenetic tree that provides insights into the organization of the different GST classes and highlights the presence of subclasses within the major classes currently described.
Beyond the assignment of 666 unclassified proteins to specific GST classes, the main result of this study is the support it gives to the paraphyletic origin of Bryophytes, in contrast with the general assumption that Bryophytes and Marchantiophytes form a clade separate from the one that gave rise to higher plants. Moreover, the results indicate that Marchantiophytes could indeed belong to the lineage leading to higher plants.
The study includes an analysis of the GST-Tau class, which revealed at least 5 subclasses, and an attempt to define the function of these subclasses. The results highlight the presence of a GST-Tau subclass including all the GST sequences from ancestral species, suggesting a primordial functionality for the members of this subclass. Finally, a possible subclass, including genes associated with abscission, appears to be absent in Brassicaceae.
Phylogenetic Analysis. Multiple alignments were obtained using Muscle 24 with default parameters (gap open penalty -2.9, gap extension penalty 0). The phylogenetic tree was built with RAxML 25 , using the maximum likelihood method with the PROTCATBLOSUM62 model (BLOSUM62 substitution matrix) and the bootstrap option. Finally, the tree was edited with iTOL v3 49 .
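As a rough illustration of how these two steps might be chained, the sketch below only builds the command lines. It assumes a MUSCLE v3-style command-line interface (`-in`, `-out`) and the standard `raxmlHPC` executable. The file names, the run name, the random seed, the number of bootstrap replicates and the use of the rapid bootstrap mode (`-f a`) are assumptions, since the text states only that default alignment parameters, the PROTCATBLOSUM62 model and the bootstrap option were used.

```python
import shlex


def muscle_command(fasta_in, alignment_out):
    """MUSCLE v3-style invocation with default parameters (assumed interface)."""
    return ["muscle", "-in", fasta_in, "-out", alignment_out]


def raxml_command(alignment, run_name, bootstraps=100, seed=12345):
    """RAxML maximum likelihood search with rapid bootstrapping.

    The model name comes from the text; the seed, the replicate count and
    the '-f a' mode are illustrative assumptions.
    """
    return [
        "raxmlHPC",
        "-f", "a",                 # rapid bootstrap plus best ML tree search
        "-m", "PROTCATBLOSUM62",   # substitution model named in the text
        "-p", str(seed),           # seed for the parsimony starting trees
        "-x", str(seed),           # seed for rapid bootstrapping
        "-#", str(bootstraps),     # number of bootstrap replicates
        "-s", alignment,
        "-n", run_name,
    ]


# Hypothetical file names, printed rather than executed.
commands = [
    muscle_command("gst_proteins.fasta", "gst_proteins.aln.fasta"),
    raxml_command("gst_proteins.aln.fasta", "gst_tree"),
]
for command in commands:
    print(shlex.join(command))
```

Each list can be passed to `subprocess.run(command, check=True)` on a machine where both programs are installed. Keeping the arguments as lists avoids shell quoting issues with file names.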
In order to obtain the pairwise distances between GST-Tau protein sequences, we used "protdist" from PHYLIP with the JTT matrix 50 . All the alignments, trees and matrices were built using shorter identifiers to indicate each gene. The conversion table between the original gene IDs and the codes used here is reported in the supplemental Table 1.
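To make the notion of a pairwise distance matrix concrete, the following sketch computes the uncorrected p-distance (the fraction of differing residues over the aligned, ungapped positions). This is a simpler substitute for the JTT-based distance estimated by protdist, not a reimplementation of it. The function names and the three short aligned sequences are hypothetical.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Fraction of differing residues over positions where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differing = 0
    for residue_a, residue_b in zip(seq_a, seq_b):
        if residue_a == gap or residue_b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if residue_a != residue_b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return differing / compared


def distance_matrix(aligned):
    """Return {(name_a, name_b): p-distance} for every ordered pair of sequences."""
    names = list(aligned)
    return {(a, b): p_distance(aligned[a], aligned[b]) for a in names for b in names}


# Hypothetical aligned fragments keyed by short identifiers.
aligned = {
    "g1": "MAEEV-KLL",
    "g2": "MAEDVKKLL",
    "g3": "MSDDV-RLL",
}
matrix = distance_matrix(aligned)
for (a, b), distance in sorted(matrix.items()):
    print(a, b, round(distance, 3))
```

Unlike the p-distance, a model-based distance such as the one from protdist corrects for multiple substitutions at the same site, so the values differ most for divergent pairs.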
Class assignment for ambiguous cases. To determine the class of the three putative GST-Tau sequences from the two algae and of the two putative Tau GSTs from the Marchantiophyta, we performed a BLASTp 26 search with default parameters against the entire GST collection considered here. A BLASTp search against UNIPROTkb 27 was also performed using default parameters. The M. pusilla putative GST-Tau was further investigated by an InterProScan 28 analysis with default parameters.
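A best-hit reading of such a BLASTp search can be sketched as follows. The sketch assumes the results were exported in the standard 12-column tabular format (`-outfmt 6`). It also assumes that the class of each subject sequence is available in a lookup table, and that the best hit is the one with the highest bit score. The query names, subject names and scores are made up for illustration and are not results from this study.

```python
import csv
import io


def best_hits(tabular_text):
    """Keep, for each query, the non-self hit with the highest bit score.

    Expects BLAST tabular output with the standard columns: qseqid, sseqid,
    pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        if query == subject:
            continue  # ignore self hits when the query is part of the database
        hit = {
            "subject": subject,
            "identity": float(row[2]),
            "evalue": float(row[10]),
            "bitscore": float(row[11]),
        }
        if query not in best or hit["bitscore"] > best[query]["bitscore"]:
            best[query] = hit
    return best


def assign_classes(best, class_of):
    """Give each query the class of its best hit, or 'unclassified' if unknown."""
    return {q: class_of.get(hit["subject"], "unclassified") for q, hit in best.items()}


# Hypothetical tabular rows and class lookup.
rows = [
    ("query_1", "ref_tau_1", "48.2", "210", "100", "3", "1", "208", "5", "214", "1e-60", "195"),
    ("query_1", "ref_mapeg_1", "25.0", "120", "85", "4", "10", "125", "8", "130", "2e-05", "42.1"),
    ("query_2", "ref_mapeg_1", "41.0", "140", "80", "2", "3", "141", "1", "140", "3e-30", "110"),
]
tabular_text = "\n".join("\t".join(row) for row in rows)
class_of = {"ref_tau_1": "Tau", "ref_mapeg_1": "Mapeg"}

hits = best_hits(tabular_text)
assigned = assign_classes(hits, class_of)
print(assigned)
```

A single best hit is only a first indication. A low score or identity, as noted for one of the M. pusilla sequences, calls for additional evidence such as a domain search.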