Genome-wide analysis of HECT E3 ubiquitin ligase gene family in Solanum lycopersicum

E3 ubiquitin ligases have long intrigued researchers because of their heterogeneity and their role in mediating ubiquitin transfer to substrate proteins. HECT (Homologous to the E6-AP Carboxyl Terminus) E3 ligases are spatially and temporally regulated for substrate specificity, interaction with E2 ubiquitin-conjugating enzymes, and chain specificity during ubiquitylation. However, the role of HECT E3 ubiquitin ligases in plant development and stress responses has rarely been explored. We conducted an in-silico genome-wide analysis to identify HECT E3 ligase members in tomato and to predict their structural and functional features. Fourteen HECT E3 ligases were identified and analyzed for their physicochemical parameters, phylogenetic relationships, structural organization, tissue-specific gene expression patterns, and protein interaction networks. Our comprehensive analysis revealed conservation of the HECT domain throughout the gene family, a close evolutionary relationship with HECT genes of other plant species, and the involvement of HECT E3 ubiquitin ligases in tomato plant development and stress responses. We speculate that the HECT gene family has indispensable biological significance, given its extensive participation in several plant cellular and molecular pathways.


Results
Identification and characterization of HECT gene family in tomato. A total of 14 HECT family members were identified in S. lycopersicum (Taxonomy ID: 4081) by searching with the profile HMM of the HECT domain retrieved from the Pfam database. Profile Hidden Markov Models (HMMs) are built with the HMMER program, which converts a multiple sequence alignment into position-specific scores and uses this probabilistic model to assess sequence similarity among proteins 34 . An e-value cutoff of 0.01 was used together with otherwise default parameters, and redundant sequences were excluded from the search results. The fourteen tomato HECT members were named SlHECT 1 to SlHECT 14 according to their positions on chromosomes 1 to 12. Analysis of the physicochemical properties (Supplementary Table S1) showed that the proteins ranged from 286 to 3757 amino acids in length, with molecular weights from 33,008.62 to 412,777.85 Da and pI values from 4.65 to 8.3. All the identified proteins were predicted to be stable, and their aliphatic index varied from 81.57 to 96.28. The GRAVY values of all the proteins were < 0, indicating that the proteins are hydrophilic. The chromosomal location, gene size, number of introns and exons, and HECT domain information of the 14 putative candidates were also evaluated (Supplementary Table S1). The HECT proteins were predicted to localize mainly to the nucleus and cytoplasm, and five members were predicted to localize to the chloroplast, which suggests that these proteins may be involved in organelle-specific signaling pathways.
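The e-value filtering step described above can be sketched as a small parser over HMMER's per-target table. This is a minimal sketch, assuming the HMMER3 `hmmsearch --tblout` layout (target name in the first column, full-sequence E-value in the fifth); the gene identifiers are hypothetical, and keeping one entry per target name is only one possible reading of how redundant sequences were excluded.

```python
from typing import Dict, Iterable


def filter_tblout(lines: Iterable[str], max_evalue: float = 0.01) -> Dict[str, float]:
    """Return {target name: best full-sequence E-value} for hits passing the cutoff.

    Assumes HMMER3 `hmmsearch --tblout` output: whitespace-separated columns,
    comment lines starting with '#', target name in column 1 and the
    full-sequence E-value in column 5.
    """
    hits: Dict[str, float] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment/header lines
        fields = line.split()
        target, evalue = fields[0], float(fields[4])
        if evalue > max_evalue:
            continue  # hit does not meet the E-value cutoff
        # Hypothetical redundancy handling: one entry per target name,
        # retaining the best (smallest) E-value.
        if target not in hits or evalue < hits[target]:
            hits[target] = evalue
    return hits


# Hypothetical tblout lines (identifiers and scores are illustrative only).
example = [
    "# target name  accession  query name  accession  E-value ...",
    "geneA.1 - HECT - 1.2e-80 270.1 0.3 2e-80 269.5 0.3 1.1 1 0 0 1 1 1 1 -",
    "geneB.1 - HECT - 0.5 5.0 0.1 0.9 4.1 0.1 1.4 1 0 0 1 1 1 0 -",
]
print(filter_tblout(example))  # {'geneA.1': 1.2e-80}
```

Only the first five columns are read, so free-text descriptions at the end of each line do not affect parsing.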

Phylogenetic analysis.
To evaluate the evolutionary relatedness of the HECT gene family of S. lycopersicum with those of Arabidopsis thaliana, Oryza sativa, Populus trichocarpa, Vitis vinifera, Sorghum bicolor, Zea mays, Mus musculus, and Homo sapiens, a phylogenetic tree was constructed using the Maximum Likelihood (ML) method with 1000 bootstrap replicates (Fig. 1). The identified HECT genes in all the organisms were named according to their positions on the chromosomes (Supplementary Table S3). In all cases, the HECT domain was located at the C-terminus, and the HECT gene family was classified based on the domains present at the N-terminus (Supplementary Figure S9) 35,36 . The HECT proteins were divided into six classes according to the presence of NEDD4 subfamily domains (Class I), HERC subfamily domains (Class II), only the HECT domain (Class III), armadillo repeats (Class IV), ubiquitin-associated domains (Class V), and other domains (Class VI). Classes I and II comprised 20 and 5 HECT members, respectively, all from mouse and human. Five tomato HECT members, SlHECT 8, 9, 10, 12, and 13, were classified into Class III. Many Class III members were similar to, and mapped closely with, Class V members. SlHECT 9 from Class III and VvHECT6 from Class V shared high similarity despite carrying different domains. Similarly, SlHECT 10, 12, and 13 of Class III were closely related to SlHECT 6 and 7 of Class V. Most SlHECT members, namely SlHECT 1, 3, 4, 5, 6, 7, and 14, were classified under Class V, indicating their participation in ubiquitination. Class IV contained no members from human or mouse. Class III, IV, and V members showed significant conservation, which suggests a common divergence mechanism that produced HECT members with different functional domains. We observed a clear separation between the plant and animal HECT members.
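The six-class scheme above is essentially a rule on which N-terminal domains accompany the C-terminal HECT domain. The following is a minimal rule-based sketch, assuming domain annotations are available as plain name strings (for example, from SMART or Pfam output). The marker domains for Classes I and II follow the NEDD4 (C2, WW) and HERC (SPRY, APC10, Cyt-b5, ZZ, WD40) descriptions given in this paper; the precedence order for proteins carrying markers of several classes, and treating domains such as IQ as "other" (Class VI), are assumptions for illustration.

```python
from typing import Iterable

# Marker domains per class, following the class descriptions in the text.
NEDD4_DOMAINS = {"C2", "WW"}                               # Class I
HERC_DOMAINS = {"SPRY", "APC10", "Cyt-b5", "ZZ", "WD40"}   # Class II


def classify_hect(domains: Iterable[str]) -> str:
    """Assign a HECT protein to Class I-VI from its annotated domain names.

    The precedence order (I, II, IV, V, then VI) is an assumption for
    proteins that carry markers of more than one class.
    """
    extra = set(domains) - {"HECT"}  # domains other than the HECT domain
    if not extra:
        return "III"  # HECT domain only
    if extra & NEDD4_DOMAINS:
        return "I"
    if extra & HERC_DOMAINS:
        return "II"
    if "ARM" in extra:
        return "IV"  # armadillo repeats
    if "UBA" in extra:
        return "V"  # ubiquitin-associated domain
    return "VI"  # any other domain combination


print(classify_hect({"HECT"}))              # III
print(classify_hect({"WW", "C2", "HECT"}))  # I
print(classify_hect({"UBA", "HECT"}))       # V
print(classify_hect({"IQ", "HECT"}))        # VI
```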
Although all the sequences contained the conserved HECT domain, their domain organization was quite different, as reflected in their distant evolutionary mapping. Many motifs found in the mouse and human HECT members were also present in plant HECT members. We speculate that the HECT domain has been conserved among all HECT members while the family has evolved according to specific biological requirements. This suggests that genes can be selectively modified to participate in specific cellular or molecular mechanisms.
www.nature.com/scientificreports/
Chromosomal localization and promoter analysis. The HECT genes were distributed across 8 of the 12 chromosomes of S. lycopersicum (Fig. 2). Chromosome 9 carried the highest number of genes, and most genes were localized at the distal ends of the chromosomes. Chromosomal recombination is a key mechanism for gene duplication and for diversification of the gene pool and noncoding sequences 37 . The acquisition of new functional domains in the HECT gene family could arise from recombination at the distal ends of the chromosomes, and genes located at these distal ends may be more prone to gene family expansion and functional divergence. We analyzed the promoter regions of the fourteen tomato HECT genes using the PlantCARE database 38 to identify cis-regulatory elements and transcription factor binding sites that carry information on gene expression in response to environmental stimuli or on involvement in cellular pathways (Fig. 3a). The promoter analysis predicted regulatory roles of the HECT gene family in tomato plant development. The identified elements were associated with hormonal pathways, plant development, abiotic stresses, defense, and stress responsiveness (Fig. 3b).
All the HECT genes harbor elements that play major roles in light responsiveness and abiotic stress. The abiotic stress-related elements were CAACTG (drought inducibility), CCGAAA (low-temperature responsiveness), AAACCA (low-oxygen response or anaerobic induction), and CCCCCG (no-oxygen response or anoxic-specific inducibility). Most of the HECT genes contain elements for responses to hormones such as auxin, gibberellins, abscisic acid, salicylic acid, and methyl jasmonate, suggesting their participation in hormonal pathways (Fig. 3 and Supplementary Table S2). We also found elements contributing to tomato plant development, namely GCCACT (meristem expression), TGAGTCA (endosperm expression), CAAT(A/T)ATTG (differentiation of palisade mesophyll cells), GATGATGTGG (zein metabolism regulation), CATGCATG (seed-specific regulation), and CAAAGATATC (circadian control). The GTTTTCTTAC element is associated with defense and stress responsiveness.
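To illustrate what detecting these elements involves, the sketch below searches a promoter sequence for the element sequences listed above on both strands. It is a simplified exact-match scan, not PlantCARE's own algorithm. The element sequences come from the text; the element labels used as dictionary keys are shorthand chosen here, and the IUPAC code "W" (A or T) encodes the CAAT(A/T)ATTG element.

```python
import re
from typing import Dict, List, Tuple

# Element sequences listed in the text; IUPAC "W" = A or T (CAAT(A/T)ATTG).
CIS_ELEMENTS: Dict[str, str] = {
    "drought-inducibility": "CAACTG",
    "low-temperature": "CCGAAA",
    "anaerobic induction": "AAACCA",
    "anoxic-specific inducibility": "CCCCCG",
    "meristem expression": "GCCACT",
    "endosperm expression": "TGAGTCA",
    "palisade mesophyll differentiation": "CAATWATTG",
    "zein metabolism regulation": "GATGATGTGG",
    "seed-specific regulation": "CATGCATG",
    "circadian control": "CAAAGATATC",
    "defense and stress responsiveness": "GTTTTCTTAC",
}
IUPAC_REGEX = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]"}
COMPLEMENT = str.maketrans("ACGTW", "TGCAW")


def reverse_complement(motif: str) -> str:
    return motif.translate(COMPLEMENT)[::-1]


def to_regex(motif: str) -> str:
    return "".join(IUPAC_REGEX[base] for base in motif)


def scan_promoter(promoter: str) -> Dict[str, List[Tuple[int, str]]]:
    """Return {element: [(0-based start on the given sequence, strand), ...]}."""
    promoter = promoter.upper()
    hits: Dict[str, List[Tuple[int, str]]] = {}
    for name, motif in CIS_ELEMENTS.items():
        strands = [("+", motif)]
        rc = reverse_complement(motif)
        if rc != motif:  # palindromic elements are reported only once
            strands.append(("-", rc))
        found = []
        for strand, pattern in strands:
            # Zero-width lookahead so overlapping occurrences are all reported.
            for match in re.finditer(f"(?={to_regex(pattern)})", promoter):
                found.append((match.start(), strand))
        if found:
            hits[name] = sorted(found)
    return hits


print(scan_promoter("ttCAACTGtt"))  # {'drought-inducibility': [(2, '+')]}
```

Scanning both strands matters because promoter elements are often functional in either orientation; palindromic elements such as CATGCATG read the same on both strands and would otherwise be counted twice.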
Motif-based analysis and genomic organization. Ten novel conserved motifs were discovered in the fourteen HECT gene family members using a two-component finite mixture model in the MEME suite (Fig. 4a). The identified motifs were related to the HECTc superfamily and were mainly associated with eukaryotic ubiquitin-protein ligase activity (Supplementary Table S18). The widths of the discovered motifs ranged from 21 to 193 amino acids (Supplementary Table S4). Motifs 1, 4, and 7 were present in all 14 SlHECT sequences and may represent the core HECT domain responsible for ubiquitin ligase activity. Interestingly, motif 10 was present in all the sequences except SlHECT 10, and motif 5 was present in all the sequences except SlHECT 9 and 10. This selective presence of motifs could underlie functional divergence. Motifs 6 and 9 appeared exclusively in SlHECT 2, 8, and 11, while motifs 2 and 3 were present in SlHECT 4, 6, 7, 10, 12, and 14. The discovered motifs were further scanned in the HECT protein sequences of the reference organisms (Supplementary Table S3 and Supplementary Figure S9) to assess evolutionary and structural relatedness (Supplementary Table S4, Supplementary Figure S5). A few of the motifs discovered in tomato were also present in the HECT gene families of other species. Motifs 1 and 10 were present in almost all the plant HECT protein sequences, suggesting a conserved hallmark function. Motif 5 was present in only a few HECT protein sequences, which may reflect the evolutionary closeness of the organisms that share it. The analysis of the discovered motifs predicts significant structural and functional conservation of tomato HECT proteins across organisms. The results imply that the core HECT domain of E3 ligases has been modified, and sometimes expanded, to participate in diverse cellular and biological pathways.
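MEME discovers motifs by expectation maximization over a mixture model, which this sketch does not reproduce. It only illustrates how a discovered motif, once represented as a log-odds position-specific scoring matrix (PSSM), can be scored along another protein sequence, the general idea behind scanning tomato motifs in other species' HECT proteins. The uniform background, the pseudocount of 1, and the motif sites are all assumptions for illustration; this is not MEME's or AME's actual scoring.

```python
import math
from typing import Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(sites: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Log2-odds PSSM from equal-length aligned motif sites (uniform background)."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("motif sites must be aligned and of equal length")
    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background
    pssm = []
    for col in range(width):
        # Pseudocounts give residues unseen in the sites a finite score.
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for site in sites:
            counts[site[col]] += 1
        total = sum(counts.values())
        pssm.append({aa: math.log2((c / total) / background) for aa, c in counts.items()})
    return pssm


def best_hit(pssm: List[Dict[str, float]], sequence: str) -> Tuple[int, float]:
    """Return (0-based start, score) of the highest-scoring window.

    The sequence must contain only the 20 standard amino acids.
    """
    width = len(pssm)
    best_start, best_score = -1, float("-inf")
    for start in range(len(sequence) - width + 1):
        score = sum(pssm[i][sequence[start + i]] for i in range(width))
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score


pssm = build_pssm(["LPRA", "LPKA", "LPRA"])  # hypothetical motif sites
start, score = best_hit(pssm, "MSTLPRAGG")
print(start)  # 3
```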
We found distinct and complex exon-intron organization patterns in the tomato HECT gene family members (Fig. 4b). SlHECT 12 lacked introns, whereas SlHECT 1 had the largest number of exons. Approximately half of the SlHECT genes had two or fewer introns, and one-third had 14-18 exons. The presence of numerous exons may reflect diverse functional capabilities and the expansion of the HECT gene family in tomato. The arrangement of introns and exons suggests that the structural framework is conserved throughout the gene family.
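Intron counts and lengths such as those drawn by GSDS follow directly from exon coordinates. The helper below is a hypothetical sketch of that bookkeeping, assuming 1-based, inclusive exon coordinates for a single transcript (the convention used in GFF3 files); the coordinates in the example are invented.

```python
from typing import Dict, List, Tuple


def gene_structure(exons: List[Tuple[int, int]]) -> Dict[str, object]:
    """Summarize one transcript from 1-based, inclusive exon coordinates.

    Returns the exon count, intron count and intron lengths (bases strictly
    between consecutive exons after sorting by start position).
    """
    ordered = sorted(exons)
    intron_lengths = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        gap = next_start - prev_end - 1
        if gap <= 0:
            raise ValueError("exons overlap or abut; cannot infer an intron")
        intron_lengths.append(gap)
    return {
        "exons": len(ordered),
        "introns": len(intron_lengths),
        "intron_lengths": intron_lengths,
    }


print(gene_structure([(1, 100), (201, 300), (401, 450)]))
# {'exons': 3, 'introns': 2, 'intron_lengths': [100, 100]}
print(gene_structure([(1, 858)]))  # an intronless gene, like SlHECT 12
# {'exons': 1, 'introns': 0, 'intron_lengths': []}
```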
The architecture of conserved residues in the HECT domain and three-dimensional structure prediction. The crystal structure of human HECT Nedd4 (neural precursor cell expressed developmentally down-regulated protein 4) 21,39 was used as a reference for the 14 SlHECT members (Supplementary Figure S6b). The three-dimensional HECT domain structures of all 14 SlHECT members were aligned with Nedd4 (PDB ID: 4BBN) chain A to understand the structural organization of the tomato HECT domain (Supplementary Figure S6a). A sequence logo of the conserved regions in the S. lycopersicum HECT domains was generated to gain insights into structural relatedness among SlHECT members (Supplementary Figure S14). Ubiquitin transfer begins with the binding of the ubiquitin-loaded E2 ubiquitin-conjugating enzyme to the HECT domain, followed by a thioester exchange that transfers ubiquitin from the E2 to the catalytic cysteine of the E3 ubiquitin ligase 21,40 . The ubiquitin is then transferred from the catalytic cysteine to a lysine residue of either a substrate protein or another ubiquitin molecule (polyubiquitination), forming an isopeptide bond 41,42 . We observed significant conservation between the tomato HECT members and Nedd4, and the N-lobe of the HECT domain comprises the binding site for the E2 ubiquitin-conjugating enzyme. The N-terminal region is loosely conserved, which facilitates interaction with a range of E2 enzymes with different specificities 43 . Most of the conserved residues were located towards the C-lobe, where the critically conserved residues L282 and V291 were observed. The catalytic cysteine residues were present in the C-terminal region of the HECT domain, and the amino acid alignment at the C-terminus was altered compared with the Nedd4 protein, indicating functional divergence.
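The height of each stack in a sequence logo reflects how conserved an alignment column is. The sketch below computes per-column information content as log2(20) minus the Shannon entropy of the residue frequencies, ignoring gaps. It omits the small-sample correction that WebLogo can apply, and the four-sequence alignment is hypothetical.

```python
import math
from collections import Counter
from typing import List


def column_information(alignment: List[str]) -> List[float]:
    """Per-column information content (bits) of aligned protein sequences.

    IC = log2(20) - H, where H is the Shannon entropy of the residue
    frequencies in the column. Gaps ('-') are ignored and all-gap columns
    score 0. No small-sample correction is applied.
    """
    width = len(alignment[0])
    info = []
    for col in range(width):
        residues = [seq[col] for seq in alignment if seq[col] != "-"]
        if not residues:
            info.append(0.0)
            continue
        n = len(residues)
        entropy = -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())
        info.append(math.log2(20) - entropy)
    return info


# Hypothetical alignment; column 2 is a fully conserved cysteine.
ic = column_information(["LCV", "LCA", "LC-", "ICG"])
print([round(x, 2) for x in ic])  # [3.51, 4.32, 2.74]
```

A fully conserved column reaches the maximum of log2(20), about 4.32 bits, which is why invariant positions stand out as the tallest stacks in a logo.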
The structural modifications observed in the tomato HECT proteins may reflect adaptation to organism-specific cellular pathways and, therefore, the acquisition of specific functional characteristics. Structural characterization of HECT proteins provides insights into their possible conformations and their possible interactions with substrates and ubiquitin, and eventually helps to determine their cellular functions 44 .
A known structure of a related molecule can serve as a template for computationally modeling the structure of a target protein on the basis of sequence similarity 45,46 . We performed comparative modeling of the 14 SlHECT proteins using Phyre2, a web-based structure prediction server (Fig. 5) 45 , with templates selected at high confidence (Supplementary Table S7). Sequence identity to the templates ranged from 27% to 54%, and secondary-structure features such as α-helices, β-strands, disordered regions, and transmembrane (TM) helices were computed for all the predicted models. α-Helices made up the major proportion of the secondary structure (47-66%), while β-strands and disordered regions accounted for 1-11% and 10-48%, respectively. A few members had up to 9% transmembrane helices. The structural variations in the HECT domains may correspond to functional diversity. The number of exposed residues in the pockets varied among the tomato HECT gene family members depending on their structural configuration. These varied structural organizations suggest conservation of the core ubiquitination function alongside participation in diverse biological processes in tomato. All the models were evaluated with ERRAT and Verify3D, with scores of 54.63-100 and 35.09-83.65%, respectively, indicating that model quality varied across the family. The QMEAN Z-scores of the structures ranged from −10.2 to −1.69. Ramachandran plots of the predicted models showed that 81.5-94.30% of the residues were in the most favored region, 5.40-13.70% were in the allowed region, and up to 3.10% were in the disallowed region (Supplementary Table S8).
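Percentages such as the 47-66% α-helix content above are fractions of residues assigned to each state. A minimal sketch, assuming a hypothetical one-letter per-residue encoding (H helix, E strand, D disordered, C other coil) with exactly one label per residue; Phyre2 reports disorder separately from secondary structure, so this single-label encoding is a simplification.

```python
from collections import Counter
from typing import Dict


def ss_composition(states: str) -> Dict[str, float]:
    """Percentage of residues per state in a per-residue annotation string.

    Hypothetical encoding: 'H' alpha-helix, 'E' beta-strand, 'D' disordered,
    'C' other coil; each residue carries exactly one label.
    """
    if not states:
        raise ValueError("empty annotation")
    counts = Counter(states)
    return {s: round(100.0 * counts.get(s, 0) / len(states), 1) for s in "HEDC"}


print(ss_composition("HHHHHHEECD"))  # {'H': 60.0, 'E': 20.0, 'D': 10.0, 'C': 10.0}
```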

PPI network construction and gene ontology analysis.
We constructed a protein-protein interaction (PPI) network using the STRING database, with a minimum confidence score of 0.70 and the number of interactors limited to 50. The tomato PPI network consisted of 133 nodes and 930 edges (Fig. 6a). Of the 133 nodes, 14 represent the SlHECT gene family members (blue), and the interacting partners of the respective HECT proteins are shown in red. Large protein complexes are often tightly regulated, and identifying modules or clusters of such proteins based on node density and interconnectivity can help to understand crosstalk among proteins and their participation in molecular pathways 47 . We observed that SlHECT 1, 2, 3, 5, 8, and 11 interacted actively with proteins involved in various biological processes. The protein interaction map depicts the regulatory interactions of the HECT gene family in tomato. We used the MCODE plug-in of Cytoscape to search for locally dense regions in the PPI network, with a cluster score threshold of ≥ 5. Three clusters of the tomato PPI network, with cluster scores of 5.455, 5.200, and 5.158, were selected for gene ontology analysis (Supplementary Tables S11 and S12). The functional annotation revealed the involvement of HECT gene family members in ubiquitination, fundamental biological processes, and molecular mechanisms related to tomato plant development and responses to external stimuli. Most of the HECT members and their interacting partners participate in protein binding, ubiquitin transferase activity, ribosome binding, and translation. In addition, pathway analysis of the genes in the modules discovered by MCODE was performed using the KEGG Automatic Annotation Server (KAAS).
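The two network steps above, confidence filtering and cluster scoring, can be sketched on a toy graph. This is a minimal sketch, assuming interactions arrive as (protein, protein, confidence) tuples with confidence on a 0-1 scale; the protein names are hypothetical. The score follows the MCODE complex-score definition as we understand it (subgraph density multiplied by the number of vertices), with the loop-free density convention as an assumption. MCODE's vertex weighting and cluster-growing steps are not reproduced, and the 50-interactor limit is not modeled.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set, Tuple

Edge = FrozenSet[str]


def filter_edges(raw: Iterable[Tuple[str, str, float]], min_score: float = 0.70) -> Set[Edge]:
    """Keep undirected, non-self interactions with confidence >= min_score."""
    return {frozenset((a, b)) for a, b, score in raw if a != b and score >= min_score}


def complex_score(nodes: Set[str], edges: Set[Edge]) -> float:
    """MCODE-style complex score: subgraph density multiplied by vertex count.

    Density uses the loop-free maximum |V|(|V|-1)/2 (an assumption), so a
    complete subgraph of n proteins scores exactly n.
    """
    n = len(nodes)
    if n < 2:
        return 0.0
    internal = sum(1 for edge in edges if edge <= nodes)  # edges inside the cluster
    density = internal / (n * (n - 1) / 2)
    return density * n


# Hypothetical network: a 5-protein clique plus one low-confidence edge.
members = {"P1", "P2", "P3", "P4", "P5"}
raw = [(a, b, 0.9) for a, b in combinations(sorted(members), 2)] + [("P1", "X", 0.5)]
edges = filter_edges(raw)
print(len(edges), complex_score(members, edges))  # 10 5.0
```

Under this definition a score of at least 5 corresponds, for example, to a fully connected group of five proteins or to a larger but sparser module.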
A total of 93 genes were analyzed, and the annotated genes and associated pathways were divided into five broad groups: 45 genes in genetic information processing, 3 in environmental information processing, 15 in cellular processes, 9 in organismal systems, and 92 in human diseases (Supplementary Table S13). Because a gene can map to pathways in more than one group, these counts sum to more than 93. Most of the genes participated in pathways related to different human diseases and to genetic information processing (Fig. 6c). The tomato HECT genes and their interacting partners were least involved in environmental information processing pathways. These data suggest that the characteristics of the HECT domain, and its core function, have been maintained in both plants and animals. The tomato HECT gene family and its interacting partners could therefore participate in critical cellular processes and pathways.
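Why the category counts can exceed the number of genes is easiest to see with a tally over (gene, category) pairs. The sketch below is hypothetical: it assumes KAAS results have already been reduced to one top-level KEGG category per annotated pathway, and the gene identifiers are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def genes_per_category(assignments: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct genes per KEGG top-level category from (gene, category) pairs."""
    members: Dict[str, Set[str]] = defaultdict(set)
    for gene, category in assignments:
        members[category].add(gene)  # a set ignores repeat pathways within a category
    return {category: len(genes) for category, genes in members.items()}


# Hypothetical assignments for two genes.
pairs = [
    ("g1", "Genetic information processing"),
    ("g1", "Human diseases"),
    ("g2", "Human diseases"),
    ("g2", "Human diseases"),  # a second pathway in the same category
]
counts = genes_per_category(pairs)
print(counts)  # {'Genetic information processing': 1, 'Human diseases': 2}
print(sum(counts.values()), "category memberships for", len({g for g, _ in pairs}), "genes")
```

Here two genes yield three category memberships, the same effect that makes the five group totals sum to more than the 93 analyzed genes.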
Gene expression analysis in vegetative and reproductive tissues. The expression of the tomato HECT E3 ubiquitin ligase gene family was analyzed in vegetative and reproductive tissues (Fig. 7). Cluster 1 genes were predominantly expressed in most tissues, but a few Cluster 1 genes were only moderately expressed in flower tissues. Expression was also examined under biotic stress conditions (including Meloidogyne javanica, Funneliformis mosseae, Tomato Yellow Leaf Curl Virus, and Virus-Induced Gene Silencing of Argonaute genes), abiotic stress conditions (sun, shade, and heat shock), and hormonal exposure (cytokinin, auxin, indole acetic acid, and 1-aminocyclopropane-1-carboxylic acid). The abiotic stress treatments significantly changed the expression profiles of the HECT gene family members (Fig. 8). As noted above, Cluster 1 genes were expressed in vegetative tissues, and they were selectively under-expressed in reproductive tissues. Cluster 2 members were mostly expressed at lower levels in reproductive tissues and leaves. SlHECT 6, 7, and 10 were selectively expressed in reproductive tissues under abiotic stress. The expression profiles of the HECT gene family members were also affected by biotic stress (Fig. 8). Root tissues showed lower Cluster 1 expression after 5 and 15 days of Meloidogyne javanica infection than under normal conditions, which may indicate a defense response. Leaf tissues infected with Tomato Yellow Leaf Curl Virus showed higher Cluster 1 expression, whereas SlHECT 8 and 10 expression did not change upon Funneliformis mosseae treatment. These findings suggest the involvement of Cluster 1 genes in the biotic stress response in tomato.
Except for SlHECT 6 and 10, no Cluster 2 member was significantly expressed in response to biotic stress, and SlHECT 13 was expressed exclusively in fruit tissues infected with Funneliformis mosseae. Cluster 2 genes were thus selectively expressed under biotic stress exposure and present a classical example of HECT E3 ligase gene family evolution for the biotic stress response. Cluster 1 genes were expressed in all tissues under all hormonal treatments, suggesting activation of HECT E3 ligases in response to hormones (Fig. 9). The genes with a partial HECT domain were not expressed under hormone exposure, and SlHECT 6 and 13 were selectively expressed in root and fruit tissues. Fruit tissues treated with IAA and ACC showed mild gene expression, and SlHECT 5 was under-expressed compared with the other members in fruit tissues. SlHECT 7 and 10 showed comparatively lower expression but were expressed in almost all tissues. We did not find selective changes in Cluster 1 genes under cytokinin and auxin treatments. Our results strongly suggest that Cluster 1 genes participate in stress and hormone responses.
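Expression heatmaps like those in Figs. 7-9 typically display values after a log transform and per-gene scaling, so that genes with very different absolute levels can be compared on one color scale. The sketch below shows one common preprocessing choice, log2(x + 1) followed by a per-gene z-score; it is an assumption for illustration rather than the exact TomExpress or figure pipeline, and the values are hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List


def heatmap_rows(expression: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """log2(x + 1)-transform each gene's values, then z-score them across samples."""
    rows: Dict[str, List[float]] = {}
    for gene, values in expression.items():
        logged = [math.log2(v + 1.0) for v in values]
        mu, sd = mean(logged), pstdev(logged)
        # A gene with identical values in every sample gets an all-zero row.
        rows[gene] = [0.0 if sd == 0 else (x - mu) / sd for x in logged]
    return rows


# Hypothetical normalized expression values for two genes across three tissues.
rows = heatmap_rows({"geneX": [0.0, 1.0, 3.0], "geneY": [7.0, 7.0, 7.0]})
print([round(z, 3) for z in rows["geneX"]])  # [-1.225, 0.0, 1.225]
print(rows["geneY"])  # [0.0, 0.0, 0.0]
```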

Discussion
E3 ubiquitin ligases form the largest and a crucial class of enzymes in the ubiquitin-proteasome degradation pathway. The HECT E3 gene family has been identified and studied in several higher plants, such as Arabidopsis thaliana, Brassica rapa, Brassica oleracea, Glycine max, Zea mays, Physcomitrella patens, Oryza sativa, Solanum 17,18,22,24-26 . To our knowledge, the HECT gene family has not been characterized in tomato since its genome was completely annotated in 2012; we therefore aimed to identify and delineate the HECT E3 ligases present in S. lycopersicum 48 . A total of fourteen genes with a core HECT domain were identified in tomato using in-silico analyses. The physicochemical characteristics indicated that the tomato HECT E3 ligases are stable and hydrophilic. Most SlHECT gene family members were predicted to localize to the nucleus and cytoplasm, and a few to the chloroplast. Previous studies have reported localization of HECT E3 ligases in both the nucleus and cytoplasm. For instance, mouse Nedd4 contains a sequence spanning amino acids 402-413 that directs its localization to the nucleus 49 . In addition, the human Nedd4-related protein WWP1 translocates to the nucleus in a context-dependent manner when co-expressed with human Notch1, implying that co-expression with other proteins can change localization patterns 50 . The phylogenetic analysis revealed a close evolutionary relationship between tomato HECT members and the HECT gene family members of other plants. Our analysis provides insights into the conservation and divergence of HECT gene families in plants and animals. The conservation of the core HECT domain and motifs in plants, the addition of new motifs, and the expansion of the gene family could result from sub-functionalization, in which the functions of an ancestral gene are partitioned between duplicated copies so that both copies are retained 51 .
Evolutionary factors could lead to alternative mechanisms for partitioning ancestral functions. However, we did not observe any duplication or syntenic event in the tomato HECT gene family. The HECT gene family was classified based on the different domains present at the N-terminus of the HECT E3 ligases 35 . The NEDD4 subfamily (Class I) mainly comprises WW and C2 domains, which help to identify substrates for degradation. The HERC subfamily (Class II), comprising SPRY, APC10, Cyt-b5, ZZ, and WD40 domains, contributes to ubiquitin-mediated protein degradation. The remaining classes (Classes III-VI) were formed based on the presence of the HECT domain and associated domains, which include a repertoire of domains such as Ubiquitin-Associated (UBA), Ubiquitin Interacting Motif (UIM), Ubiquitin (UBQ), Armadillo-repeat (ARM), and IQ domains that help in ubiquitin ligase activity 28,36,52 . ARM, IQ, UBA, and UIM domains have also been found in the Arabidopsis thaliana HECT gene family, suggesting a close functional association 27 . The phylogenetic analysis suggests distinct roles for tomato HECT members in recognizing substrates for protein degradation or in binding ubiquitin moieties to form polyubiquitin chains. In contrast, most members of the mouse and human HECT E3 gene families were present in Classes I and II, suggesting major roles in human diseases in addition to ubiquitin ligase activity. To further ascertain the molecular and biological functions associated with the HECT gene family members, we constructed a protein-protein interaction network 53 . Our results indicate that the HECT gene family influences ATP binding, isopeptidase activity, histone binding, protease activity, and numerous other molecular and biological processes in tomato.
HECT E3 ubiquitin ligases have been shown to regulate trichome development 54 , leaf senescence 55 , biotic stress responses 56 , cell growth, proliferation, autophagy, DNA repair, antiviral responses, and many diseases 40 . Our results agree with previous studies and predict broader involvement in tomato cellular and genetic processes. The promoter analysis of the tomato HECT gene family revealed a range of cis-regulatory elements participating in various developmental and environmental response mechanisms, pointing to roles of SlHECT ligases in defense and stress mechanisms, abiotic stress responses, plant development, and hormonal regulation. Every member harbored light-responsive elements, suggesting that the HECT E3 ligase gene family contributes to light signaling and phytohormone pathways. Previous reports on plant E3 ligases have suggested roles in hormonal signaling and plant development, as well as in responses to abiotic stress factors 14,15 . These findings were further supported by the protein-protein interaction, gene ontology, and gene expression analyses. The gene ontology data indicated that tomato HECT E3 ligases are involved in biological regulation and various cellular and metabolic processes, consistent with the presence of various interaction motifs upstream of the HECT domain that have roles in ubiquitin binding, regulation, and localization 22 . All the SlHECT members were annotated with catalytic activity, a key molecular function of the HECT domain, which catalyzes both Ub-substrate and Ub-Ub linkage 12 . Mapping the SlHECT genes onto the tomato chromosomes showed that chromosome 9 carried the highest number of SlHECT genes and that the genes were located in distal regions. Genes in distal regions are prone to genetic recombination and functional diversification 57 .
We did not find any duplication among tomato HECT gene family members. Motifs are short, functionally important stretches of DNA, RNA, or protein sequence that are widespread, are inferred to share the same biological function, and serve as key elements of molecular evolution 58 . The presence of certain motifs in all the tomato members implies that they have been conserved through evolution, whereas the sparse distribution of other motifs suggests specialized functions in plant cellular mechanisms. The exon-intron arrangement is known to contribute to structural divergence, especially in duplicated genes 59 . The complex exon-intron organization of the HECT gene family members suggests the presence of diverse functional motifs responsible for their involvement in different molecular mechanisms. Sequence-structure-function relationships help to unravel presumptive interactions with ligands and other proteins 44 . The 14 SlHECT members share a conserved C-terminal HECT domain but differ in their proportions of α-helices and disordered regions. Structural differences influence the target-binding capabilities of proteins, which are directly related to their three-dimensional structures 60 . The GO-driven functional annotation predicted three predominant molecular functions in which all the members participate: protein binding, ubiquitin-protein transferase activity, and ubiquitin-ubiquitin ligase activity. Interestingly, the S. lycopersicum HECT gene family exclusively participates in molecular functions such as DNA-dependent ATPase activity, histone binding, and isopeptidase activity 61-63 . Previous studies on ubiquitin-mediated protein degradation have reported additional roles of the ubiquitin machinery in DNA repair and signal transduction 35 .
The GO cellular component category indicated that all the HECT E3s and their interacting proteins localize to the ubiquitin ligase complex, consistent with their roles in ubiquitin ligase activity. In contrast, members of the tomato PPI modules were localized mainly to ribosomes, in addition to the cytoplasm and nucleus. Similarly, KEGG pathway analysis revealed that most of the proteins in the PPI networks participate in genetic information processing pathways, mainly ubiquitin-mediated proteolysis and proteasomal degradation.
We used the TomExpress RNA sequencing database to extract the expression profiles of the tomato HECT gene family under environmental stress and developmental conditions 64 . Cluster 1 genes showed a strong constitutive expression profile, indicating an active role in regulating protein quality, plant development, and stress responses. HECT genes have been found to regulate seed size and crop yield in Brassica napus and cell death in Brassica rapa 65 . A similar constitutive expression profile of HECT genes, associated with cellular pathways, was observed in Brassica oleracea 26 . A few Cluster 2 genes were selectively expressed under abiotic and biotic stress conditions, whereas Cluster 1 genes were expressed in all tissues, suggesting that HECT gene family members are critically required for generating responses to external stimuli. The HECT gene family was found to participate in responses to abiotic stresses such as cold and drought in Malus domestica 25 . Our results agree with previous studies on HECT gene family members in other plant species. The promoter and gene ontology analyses were consistent with the gene expression analysis, supporting the involvement of HECT gene family members in plant growth, development, and stress responses. The diverse motifs and functional domains of the tomato HECT gene family may enable its members to participate in numerous molecular mechanisms.
Identifying and characterizing the HECT gene family members helps to explore their genetic expansion, distribution, and involvement in tomato plant development. We speculate that the HECT gene family is strongly involved in the growth and development of the tomato plant. The HECT gene family can be targeted to elucidate molecular mechanisms related to development, plant immunity, adaptation, and drought and salinity stress responses. This work provides preliminary evidence for future studies of E3 ubiquitin ligases in plants.

Materials and methods
Identification and characterization of HECT gene family in Solanum lycopersicum. Candidate proteins of the HECT family of E3 ubiquitin ligases were extracted using the HMMER program, downloaded from HMMER (http://hmmer.org/) 66 . The HMM profile of the HECT domain was obtained from the Pfam database (https://pfam.xfam.org/) and used to search for putative candidates 67 . The HMM search was performed against the S. lycopersicum reference database (iTAG2.4) with default parameters and an e-value cutoff of 0.01. The presence of the HECT domain in the candidate proteins was validated using the Simple Modular Architecture Research Tool (SMART) (http://smart.embl-heidelberg.de/) 68 . Physicochemical parameters of the HECT proteins, such as molecular weight, number of amino acids, instability index, aliphatic index, and grand average of hydropathicity (GRAVY), were computed with the ExPASy ProtParam tool (https://web.expasy.org/protparam/) 69 . The chromosomal locations and intron-exon counts were determined using PhytoMine (https://phytozome.jgi.doe.gov/phytomine/begin.do), an InterMine interface to Phytozome 70 .
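Two of the ProtParam indices used here have simple closed forms: GRAVY is the mean Kyte-Doolittle hydropathy value over all residues (negative values indicate hydrophilic proteins), and the aliphatic index of Ikai (1980) weights the mole percentages of Ala, Val, Ile, and Leu. The sketch below re-implements both for illustration; it is not the ProtParam code itself, it assumes sequences contain only the 20 standard residues, and it omits the instability index, which requires a 400-entry dipeptide weight table.

```python
from typing import Dict

# Kyte-Doolittle hydropathy values, the scale ProtParam uses for GRAVY.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue.

    Raises KeyError for non-standard residue letters.
    """
    seq = sequence.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(sequence: str) -> float:
    """Aliphatic index (Ikai, 1980): X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)),
    where X is the mole percent of each residue."""
    seq = sequence.upper()

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / len(seq)

    return mole_percent("A") + 2.9 * mole_percent("V") + 3.9 * (mole_percent("I") + mole_percent("L"))


print(round(gravy("KEKE"), 2))            # -3.7
print(round(aliphatic_index("AVIL"), 1))  # 292.5
```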

Subcellular localization and gene ontology analysis. The Balanced subCellular Localization predictor (BaCelLo) (http://gpcr.biocomp.unibo.it/bacello/) and the Protein Subcellular Localization Prediction System LocTree3 (https://rostlab.org/services/loctree3/) were used to determine the subcellular localization of the candidate proteins 71,72 . The Gene Ontology (GO) terms of the target proteins were identified using the Protein ANalysis Through Evolutionary Relationships (PANTHER) server (http://www.pantherdb.org/) 73 .
Chromosomal localization and analysis of promoter sequences. The lengths of the 12 tomato chromosomes were retrieved from the Ensembl Plants database (https://plants.ensembl.org/index.html) 74 . The chromosomal positions of the candidate genes were retrieved from the PhytoMine server and mapped onto the chromosomes of S. lycopersicum using MapChart 2.32 software, downloaded from MapChart (https://www.wur.nl/en/show/Mapchart.htm) 70,75 . The promoter sequences (2000 bp) of the tomato HECT gene family were extracted from Phytozome (https://phytozome.jgi.doe.gov/pz/portal.html) 70 . These promoter sequences were represented as a word cloud using the WordArt tool (https://wordart.com) and were analyzed for the presence of different motifs using the PlantCARE database (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/) 38,76 .
Motif and gene structure analysis. Novel conserved motifs were discovered using the probabilistic and discrete algorithms of the MEME suite (http://meme-suite.org/tools/meme), a motif-based sequence analysis tool 58,77 . A maximum motif width (number of characters in a sequence pattern) of 200 amino acids and a limit of 10 motifs were used for the motif search. The identified motifs were analyzed for their  81 . Motif enrichment analysis was performed using Analysis of Motif Enrichment (AME) (http://meme-suite.org/tools/ame) to determine the relative enrichment of the discovered motifs in the reference organisms 82 . The HECT gene structures and annotated features such as CDS and intron lengths were characterized and visualized using the Gene Structure Display Server (GSDS) v2.0 (http://gsds.cbi.pku.edu.cn/) 83 .
Multiple sequence alignment and identification of conserved residues in the HECT domain. To assess the alignment of the HECT domains retrieved from the putative tomato candidates, multiple sequence alignment (MSA) was performed in Jalview 2.11.0, downloaded from Jalview (http://www.jalview.org/), using human HECT Nedd4 (neural precursor cell expressed developmentally down-regulated protein 4) as a reference, whose sequence (PDB ID: 4BBN, chain A) was retrieved from the RCSB Protein Data Bank (PDB) (https://www.rcsb.org/) 84,85 . The sequence alignment was visualized as a logo using WebLogo 3 (http://weblogo.threeplusone.com/) and as a three-dimensional structure using UCSF Chimera 1.14 software (https://www.cgl.ucsf.edu/chimera/) 86-88 .
Protein structure prediction for structural characterization. The peptide sequences of the putative targets were submitted to the web-based Protein Homology/analogy Recognition Engine (Phyre2) version 2.0 (http://www.sbg.bio.ic.ac.uk/~phyre2/html/page.cgi?id=index) to predict the structural attributes and models of the tomato HECT gene family 45 . UCSF Chimera 1.14 (https://www.cgl.ucsf.edu/chimera/) was used to visualize the predicted models and generate graphic images 88 . The quality of the predicted models was assessed using the Structure Analysis and Verification Server (SAVES) v6.0 (https://saves.mbi.ucla.edu/) and Qualitative Model Energy ANalysis (QMEAN) (https://swissmodel.expasy.org/qmean/) 89-92 .
SAVES v6.0 provides stereochemical assessment through PROCHECK, evaluation of the compatibility of the atomic model (3D) with its own amino acid sequence through VERIFY3D, and an overall quality factor derived from the statistics of non-bonded atomic interactions through ERRAT 89-91. QMEAN estimates the "degree of nativeness" of a model, that is, how likely the model is to be of comparable quality to an experimental structure, and expresses the resulting quality score as a Z-score 92.
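The WebLogo 3 logos described in the alignment step scale each column by its information content. The sketch below illustrates that calculation under simplifying assumptions: a 20-letter amino acid alphabet, gaps ignored, an equiprobable background, and no small-sample correction (WebLogo can apply one, so its heights may differ slightly). The function names and the toy alignment are hypothetical and are not taken from the HECT data.

```python
import math
from collections import Counter

GAP_CHARS = "-."

def column_residues(column: str) -> list[str]:
    """Return the non-gap residues of one alignment column, upper-cased."""
    return [c for c in column.upper() if c not in GAP_CHARS]

def column_information(column: str, alphabet_size: int = 20) -> float:
    """Information content in bits: log2(alphabet_size) minus Shannon entropy.

    Assumes an equiprobable background and omits any small-sample correction.
    """
    residues = column_residues(column)
    if not residues:
        return 0.0
    total = len(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in Counter(residues).values())
    return math.log2(alphabet_size) - entropy

def logo_heights(alignment: list[str], alphabet_size: int = 20) -> list[dict[str, float]]:
    """Letter heights per column: residue frequency times column information."""
    heights = []
    for i in range(len(alignment[0])):
        column = "".join(seq[i] for seq in alignment)
        residues = column_residues(column)
        info = column_information(column, alphabet_size)
        total = len(residues)
        heights.append({aa: n / total * info for aa, n in Counter(residues).items()} if total else {})
    return heights

# Hypothetical toy alignment (not real HECT sequences).
toy = ["HCF", "HCY", "HCF", "QC-"]
for pos, stack in enumerate(logo_heights(toy), start=1):
    print(pos, {aa: round(h, 2) for aa, h in stack.items()})
```

A column that is invariant across all sequences reaches the maximum of log2(20), about 4.32 bits, and is drawn as a single tall letter; variable columns yield short, mixed stacks.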
Phylogenetic tree construction. Evolutionary relationships were assessed using Molecular Evolutionary Genetics Analysis (MEGA) (https://www.megasoftware.net/) 93. Multiple sequence alignment was performed with ClustalW using default parameters in MEGA X, and the alignment was used to construct a phylogenetic tree by the Maximum Likelihood method with 1000 bootstrap replicates. The tree was visualized with Interactive Tree Of Life (iTOL) v6.3 (https://itol.embl.de/) 94.
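Bootstrap support in MEGA X comes from repeating tree inference on pseudo-replicate alignments. A minimal sketch of the resampling and of the support calculation follows, assuming the standard nonparametric bootstrap in which alignment columns are drawn with replacement; tree inference itself is left to MEGA, and the helper names, sequence labels, and toy inputs are hypothetical.

```python
import random

def bootstrap_replicate(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Build one pseudo-replicate by sampling alignment columns with replacement."""
    length = len(next(iter(alignment.values())))
    # The same column indices are used for every sequence, so sites stay aligned.
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}

def clade_support(replicate_clades: list[set[frozenset[str]]], clade: frozenset[str]) -> float:
    """Percentage of replicate trees whose clade sets contain the given clade."""
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return 100.0 * hits / len(replicate_clades)

# Hypothetical aligned sequences; 1000 replicates mirror the setting in the text.
aln = {"seq_A": "MKV-LC", "seq_B": "MRVQLC", "seq_C": "MKIQLC"}
rng = random.Random(42)
replicates = [bootstrap_replicate(aln, rng) for _ in range(1000)]
# Each replicate would then be passed to ML tree inference; the clade sets of the
# resulting trees are what clade_support() consumes.
```

The support value printed on a node is simply the percentage of replicate trees in which that grouping reappears.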
Tissue-specific expression patterns in the HECT gene family. Expression profiles of the tomato HECT ubiquitin ligases were queried using the latest RNA sequencing data pipeline of the TomExpress database (http://tomexpress.toulouse.inra.fr/) 64. Gene expression was analyzed in vegetative and reproductive tissues, under biotic (Meloidogyne javanica, Funneliformis mosseae, Tomato Yellow Leaf Curl Virus, virus-induced gene silencing of Argonaute genes) and abiotic (sun, shade, and heat shock) stress conditions, and after hormone treatments (cytokinin, auxin, indole acetic acid, 1-aminocyclopropane-1-carboxylic acid). The data were visualized as heat maps generated in the TomExpress database.
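TomExpress renders its heat maps itself. As a hedged illustration of how expression values are commonly prepared for such displays, the sketch below applies a log2(x + 1) transform and then scales each gene's row to Z-scores across conditions; whether TomExpress uses exactly these steps is an assumption, and the sample names and values are invented.

```python
import math
import statistics

def log2_transform(values: list[float], pseudocount: float = 1.0) -> list[float]:
    """log2(x + pseudocount); the pseudocount keeps zero counts finite."""
    return [math.log2(v + pseudocount) for v in values]

def row_zscores(values: list[float]) -> list[float]:
    """Center and scale one gene's values across conditions (population SD)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        # A gene with constant expression carries no relative pattern.
        return [0.0] * len(values)
    return [(v - mean) / sd for v in values]

# Invented normalized counts for one gene across four hypothetical samples.
tissues = ["root", "leaf", "flower", "fruit"]
counts = [0.0, 15.0, 63.0, 3.0]
scaled = row_zscores(log2_transform(counts))
for tissue, z in zip(tissues, scaled):
    print(f"{tissue:>6}: {z:+.2f}")
```

Row scaling makes the color of each cell reflect where a gene is relatively high or low, at the cost of hiding differences in absolute expression between genes.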

Protein-protein interaction (PPI) network construction and cluster analysis.
To assess functional associations among HECT family members and with other related proteins, protein-protein interaction (PPI) networks were constructed using the STRING v11 (Search Tool for the Retrieval of Interacting Genes) database (https://string-db.org/) 95. The peptide sequences of the tomato target proteins were used as input, and the maximum number of first-shell interactors for each protein was set to no more than 50 in the basic settings of the STRING interface. The minimum required interaction score was set to "high confidence" (0.700), with other parameters at their defaults. The interaction network data were imported into Cytoscape (https://cytoscape.org/) for visualization and cluster analysis 96. The Cytoscape plug-in Molecular COmplex DEtection (MCODE) was used to identify densely connected modules (clusters) with potential functional significance within the PPI networks, with a k-core value of 2, a node score cut-off of 0.2, and other settings at their defaults. Modules with a cluster score ≥ 5 were selected for further analysis.
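The filtering and clustering settings above can be made concrete with a small sketch. It assumes interactions are available as (protein, protein, combined score) triples on STRING's 0 to 1 scale, keeps edges scoring at least 0.700, prunes the graph to its 2-core, and scores a cluster as density multiplied by node count, following the published MCODE definition with density computed for a graph without self-loops. It does not reproduce MCODE's vertex weighting, node score cut-off, or haircut and fluff steps, and all protein identifiers are hypothetical.

```python
def filter_edges(edges: list[tuple[str, str, float]], min_score: float = 0.7) -> set[frozenset[str]]:
    """Keep undirected, non-self edges whose combined score meets the cut-off."""
    return {frozenset((a, b)) for a, b, score in edges if score >= min_score and a != b}

def k_core(edges: set[frozenset[str]], k: int = 2) -> dict[str, set[str]]:
    """Repeatedly delete nodes with fewer than k neighbours; return the k-core adjacency."""
    adj: dict[str, set[str]] = {}
    for edge in edges:
        a, b = tuple(edge)
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    while True:
        weak = [node for node, nbrs in adj.items() if len(nbrs) < k]
        if not weak:
            return adj
        for node in weak:
            # Removing a node also removes it from each remaining neighbour's set.
            for nbr in adj.pop(node):
                adj[nbr].discard(node)

def cluster_score(adj: dict[str, set[str]], nodes: set[str]) -> float:
    """MCODE-style score: density (no self-loops assumed) multiplied by node count."""
    n = len(nodes)
    if n < 2:
        return 0.0
    m = sum(len(adj[v] & nodes) for v in nodes) // 2
    density = m / (n * (n - 1) / 2)
    return density * n

# Hypothetical STRING-style triples (identifiers are placeholders).
edges = [("P1", "P2", 0.91), ("P2", "P3", 0.95), ("P1", "P3", 0.82),
         ("P3", "P4", 0.75), ("P4", "P5", 0.40)]
core = k_core(filter_edges(edges), k=2)
print(sorted(core), cluster_score(core, set(core)))
```

Under this definition a complete subgraph of n proteins scores exactly n, so a threshold of 5 is met by, for example, a five-protein clique or a larger but sparser module; the three-protein toy cluster above would be discarded.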
Functional annotation and pathway analysis of PPI network. Blast2GO, a functional analysis module of OmicsBox 1.3.11 (https:// www. biobam. com/ omics box/), was used for performing Gene Ontologybased functional annotation with the nodes (protein) in each module 97 . Peptide sequences (in FASTA format) of all the interacting proteins in each cluster were retrieved from UniProt (https:// www. unipr ot. org/) and run in Blast2GO, using the default parameters 98 . For a deep understanding of the roles played by these proteins in biological systems, pathway analysis was performed in a Kyoto Encyclopedia of Genes and Genomes (KEGG) web- www.nature.com/scientificreports/ based service, KEGG Automatic Annotation Server (KAAS) (https:// www. genome. jp/ kegg/ kaas/), and pathways were reconstructed by BLAST comparisons in the KEGG database 99 .

Data availability
All data generated or analyzed during this study are included in this published article (and its Supplementary Information files).