Genome-wide identification, evolution, and transcript profiling of Aldehyde dehydrogenase superfamily in potato during development stages and stress conditions

The Aldehyde dehydrogenase (ALDH) superfamily comprises a group of enzymes that scavenge toxic aldehydes by converting them into the corresponding non-toxic carboxylic acids. A genome-wide study in potato identified a total of 22 ALDH genes, grouped into ten families and distributed unevenly across 10 of the 12 chromosomes. Evolutionary analysis of ALDH proteins from different plant species showed that ALDH2 and ALDH3 are the most abundant families in plants, while ALDH18 is the most distantly related. Gene expression analysis revealed that StALDH genes are expressed in a highly tissue-specific manner and respond divergently to various abiotic, biotic, and hormonal treatments. Structural modelling and functional analysis of selected StALDH members revealed conservation of their secondary structures and cofactor binding sites. Taken together, our findings provide comprehensive information on the ALDH gene family in potato that will help build a framework for further functional studies.


Results
Genome-wide analysis of potato identifies 22 putative ALDH members. A total of 22 putative ALDH proteins were identified in S. tuberosum by homology search in the Solanaceae Genomics Network (https://solgenomics.net/) (Appendix 1 and 2). NCBI Conserved Domain Database and Pfam analyses confirmed the presence of the conserved ALDH domain (PF00171), the defining feature of the ALDH superfamily, in all identified candidates (Fig. S1). PROSITE analysis and multiple sequence alignment (Fig. 1) confirmed the presence of the conserved cysteine active site (PS00070) and glutamic acid active site (PS00687) in most StALDH proteins. Fourteen of the 22 proteins (StALDH2B2, StALDH2B6, StALDH2B7, StALDH2C1, StALDH3F1, StALDH3F2, StALDH3H1, StALDH5F1, StALDH7A1, StALDH10A1, StALDH10A2, StALDH11A1, StALDH12A1, and StALDH22A1) contain both the cysteine and glutamic acid active-site residues; StALDH6B1, StALDH6B2, StALDH18A1, and StALDH18A2 contain only the cysteine active site; and the remaining four (StALDH2B1, StALDH2B3, StALDH2B4, and StALDH2B5) contain no conserved active site in their domain structure (Fig. 1 and Table S1). Because the catalytic glutamic acid residue functions as a general base in hydrolytic ALDHs 23, the absence of both the glutamic acid and cysteine residues in these four hydrolytic ALDHs may reflect incomplete sequences. ALDH6 and ALDH18 enzymes, on the other hand, lack the catalytic glutamate residue because they have Coenzyme A (CoA)-dependent acylating and Δ1-pyrroline-5-carboxylate synthetase activity, respectively 12. All identified StALDH proteins were grouped into ten families (ALDH2, ALDH3, ALDH5, ALDH6, ALDH7, ALDH10, ALDH11, ALDH12, ALDH18, and ALDH22) and named according to the criteria established by the AGNC 11.
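The active-site screen above can be approximated locally by translating a PROSITE pattern into a regular expression and scanning each candidate sequence. The Python sketch below is illustrative only; the study itself used the PROSITE server and Clustal Omega alignment. It handles only basic PROSITE syntax (x, [..], {..}, repeat counts, and the < and > terminus anchors). The demo pattern and sequence are hypothetical; the actual PS00070 and PS00687 pattern strings should be copied from their PROSITE entries.

```python
import re

# One PROSITE element: x (any residue), [..] (allowed set), {..} (forbidden set)
# or a single residue, optionally followed by a repeat count such as (2) or (2,4).
_ELEMENT = re.compile(r"(x|\[[A-Z]+\]|\{[A-Z]+\}|[A-Z])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a basic PROSITE pattern (e.g. 'C-x(2)-[DE]-{P}.') into a regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""  # N-terminal anchor
        suffix = "$" if element.endswith(">") else ""    # C-terminal anchor
        match = _ELEMENT.fullmatch(element.strip("<>"))
        if match is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        residue, low, high = match.groups()
        if residue == "x":
            residue = "."
        elif residue.startswith("{"):
            residue = "[^" + residue[1:-1] + "]"
        if high:
            residue += "{" + low + "," + high + "}"
        elif low:
            residue += "{" + low + "}"
        parts.append(prefix + residue + suffix)
    return "".join(parts)


def find_motif(sequence, pattern):
    """Return (1-based start, matched segment) for non-overlapping matches."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence.upper())]


# Hypothetical demo pattern and sequence (not PS00070 or PS00687).
print(prosite_to_regex("C-x(2)-[DE]-{P}."))        # C.{2}[DE][^P]
print(find_motif("AACKLDQW", "C-x(2)-[DE]-{P}."))  # [(3, 'CKLDQ')]
```

Because `re.finditer` does not report overlapping hits, this sketch can miss a second site that overlaps the first; for an active-site presence check that is usually sufficient.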
Among the ten StALDH families, ALDH2 is the largest, with eight members, while ALDH5, ALDH7, ALDH11, ALDH12, and ALDH22 have only one member each (Table S1). ALDH3 has three members, and each of the remaining families has two. The largest protein, StALDH18A1, is 717 aa long with a molecular weight of 77.47 kDa, while the smallest, StALDH6B2, is 88 aa long and 9.35 kDa (Table S1). StALDH gene length ranged from 2005 nt (StALDH2B3) to 9423 nt (StALDH18A1). The predicted isoelectric point (pI) of the putative StALDH proteins ranged from 5.10 (StALDH2C1) to 10.00 (StALDH2B5). Most StALDH proteins were predicted to localize mainly to the cytosol and chloroplast, followed by the mitochondria, plasma membrane, nucleus, and extracellular space (Table S1).
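The molecular weights in Table S1 were predicted with ExPASy. As a minimal sketch of the underlying arithmetic, an average molecular weight can be computed by summing average residue masses and adding one water molecule for the free termini. The residue masses below are standard average values; ExPASy's exact mass table and rounding may differ slightly, so treat the result as an approximation.

```python
# Standard average residue masses in daltons (free amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524  # average mass of H2O, added once for the free N- and C-termini


def average_molecular_weight(sequence):
    """Average molecular weight (Da) of an unmodified protein sequence."""
    seq = sequence.upper()
    unknown = set(seq) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER


print(round(average_molecular_weight("G"), 2))  # 75.07 (glycine)
```

As a rough consistency check, StALDH18A1 (717 aa, 77.47 kDa) averages about 108 Da per residue, a typical value for proteins.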
All identified StALDH members are distributed unevenly across ten chromosomes. The chromosomal map shows that the 22 putative StALDH genes are located unequally on 10 of the 12 potato chromosomes (Fig. 2). Chromosome 6 carries the most genes (five), followed by chromosomes 1, 3, and 5 with three each. Chromosome 12 contains two StALDH genes, and chromosomes 2, 4, 8, and 9 each contain a single gene. No StALDH genes were found on chromosomes 10 and 11. Gene duplication analysis 24 helps explain the expansion of the StALDH gene families (Fig. 2 and Table S2). Three tandem duplication gene pairs (StALDH2B4|StALDH2B5, StALDH2B5|StALDH2B6, and StALDH18A1|StALDH18A2) and three whole-genome duplication (WGD)/segmental duplication events (StALDH2B2|StALDH2B6, StALDH2B6|StALDH2B7, and StALDH10A1|StALDH10A2) were identified.
Exon-intron structure, conserved domain, and motif analysis of StALDHs. The expansion of ALDH family members in potato was further explored by constructing an unrooted phylogenetic tree (Fig. 3A). Members of each StALDH class clustered together in a separate clade, except StALDH2B1 and StALDH2B3, probably because of their partial sequences (Fig. 3A). This suggests that the individual ALDH classes separated before species-specific expansion. (Fig. 3C). Ten highly conserved motifs longer than 10 amino acids were identified among the 22 StALDH proteins using the online MEME motif search tool (Fig. S2 and Table S3). All StALDH proteins except StALDH2B3 and StALDH6B2 contain at least one conserved motif, and StALDH2B2, StALDH2B6, StALDH2B7, StALDH2C1, StALDH10A1, and StALDH10A2 contain all ten (Fig. S2). Motifs 1 and 9 were the most widespread, each found at 15 sites, followed by motifs 3 and 5 (14 sites each), while motifs 7 and 8 were found at only 8 sites (Table S3).
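The tandem-duplication criterion used in this study (adjacent homologous StALDH genes on the same chromosome separated by no more than one gene, as stated in Materials and methods) can be sketched as a pairwise scan. This is a hypothetical illustration: the `Locus` records, gene names, and rank values are invented, and homology is supplied as precomputed pairs (for example, pairs passing the ≥ 80% similarity threshold).

```python
from itertools import combinations
from typing import NamedTuple


class Locus(NamedTuple):
    gene: str
    chromosome: str
    rank: int  # order of the gene among ALL annotated genes on its chromosome


def tandem_pairs(loci, homologous_pairs, max_intervening=1):
    """Return sorted gene pairs that qualify as tandem duplicates.

    loci: iterable of Locus records for the candidate genes.
    homologous_pairs: set of frozensets, each holding two homologous gene names.
    max_intervening: maximum number of genes allowed between the pair.
    """
    pairs = []
    for a, b in combinations(loci, 2):
        if a.chromosome != b.chromosome:
            continue  # tandem duplicates must share a chromosome
        if frozenset((a.gene, b.gene)) not in homologous_pairs:
            continue  # only homologous genes count
        intervening = abs(a.rank - b.rank) - 1
        if intervening <= max_intervening:
            pairs.append(tuple(sorted((a.gene, b.gene))))
    return sorted(pairs)


# Hypothetical example: ranks and homology calls are invented.
loci = [
    Locus("geneA", "chr1", 10),
    Locus("geneB", "chr1", 11),
    Locus("geneC", "chr1", 13),
    Locus("geneD", "chr2", 5),
]
homologous = {frozenset(p) for p in [("geneA", "geneB"), ("geneB", "geneC"),
                                     ("geneA", "geneC"), ("geneA", "geneD")]}
print(tandem_pairs(loci, homologous))  # [('geneA', 'geneB'), ('geneB', 'geneC')]
```

Here geneA|geneB are adjacent and geneB|geneC are separated by one gene, so both qualify; geneA|geneC are two genes apart and geneA|geneD sit on different chromosomes, so they are excluded. Ranks must count all annotated genes, not only ALDH genes, for the "no more than one gene" rule to mean what the text describes.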
Several of the identified genes/proteins appeared incomplete or truncated, with short protein length (less than 300 aa), molecular mass below 32 kDa, and a shorter conserved ALDH domain. This suggests that these proteins may be non-functional or the products of pseudogenes, although a few of them showed significant tissue-specific expression (Fig. S3); two algae (Chlamydomonas reinhardtii and Ostreococcus tauri); and two mammals (Homo sapiens and Mus musculus) ALDH members. A maximum-likelihood phylogenetic tree was constructed using 335 protein sequences from all the above-mentioned species (Fig. 4). These ALDH proteins grouped into 19 families (ALDH1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 16, 18, 19, 21, 22, 23, and 24). Interestingly, most proteins from the same family clustered together irrespective of the source organism. Most StALDH proteins shared a common core of plant ALDH families, which were distributed mainly in 10 major plant-specific families plus one minor subgroup (SlALDH19) present only in tomato (Fig. 4 and Table 1). ALDH families 2 and 3 formed the largest clusters, indicating their abundance in all studied species, while families 5, 12, and 22 had the fewest members (Fig. 4). Overall, because members of each ALDH family from both monocot and dicot plants clustered together in the phylogenetic tree, plant ALDH genes appear to have evolved before the divergence of monocots (rice and maize) and dicots (Arabidopsis, soybean, tomato, mustard, grape, and potato) (Fig. 4).
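The truncation criteria above (under 300 aa and under 32 kDa) amount to a simple filter. In this sketch the two thresholds come from the text and the length and mass values are those reported for StALDH6B2 and StALDH18A1 in Table S1; requiring both conditions together is our assumption, and the shorter-domain criterion is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    length_aa: int
    mass_kda: float


def flag_truncated(candidates, max_length_aa=300, max_mass_kda=32.0):
    """Return names of candidates shorter than max_length_aa AND lighter than max_mass_kda."""
    return [
        c.name
        for c in candidates
        if c.length_aa < max_length_aa and c.mass_kda < max_mass_kda
    ]


# Length and mass values as reported in Table S1 for these two proteins.
candidates = [
    Candidate("StALDH6B2", 88, 9.35),
    Candidate("StALDH18A1", 717, 77.47),
]
print(flag_truncated(candidates))  # ['StALDH6B2']
```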

StALDH genes are abundantly expressed in fruits.
The expression profiles of the identified StALDH genes were analyzed in thirteen tissues: roots, tubers, shoots, leaves, flowers, petioles, sepals, petals, carpels, stamens, immature fruits, mature fruits, and the inside of the fruit (mesocarp and endocarp). StALDH genes exhibited differential, tissue-specific expression patterns. Among the 16 StALDH genes analyzed, StALDH2B2 showed the highest expression in almost all tissues except tuber, carpel, stamen, and fruits, where StALDH5F1 was the most highly expressed (Fig. 5). Some StALDH genes display highly tissue-specific expression; for example, StALDH3F2 is petal-specific and StALDH2C1 is leaf-specific (Fig. 5A), and these tissue-specific genes are not highly expressed in other tissues or organs. Other StALDH genes were highly expressed in multiple tissues: StALDH5F1 in root, tuber, and shoot; StALDH6B1 in flower, petal, and stamen; StALDH7A1 in root, flower, carpel, and stamen; and StALDH11A1 in leaf, sepal, and petal. Some genes, such as StALDH3H1, StALDH10A2, StALDH3F1, and StALDH22A1, showed very low expression in all tissues. Global expression analysis showed that StALDH genes were most abundant in fruit (immature, mature, and mesocarp and endocarp), flowers, and carpels, with a median FPKM above 25, two-fold higher than in the root (Fig. 5B). StALDH5F1 showed remarkably high expression across most tissues, with a median FPKM of 76, followed by StALDH2B2 with a median FPKM of 67.
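The tissue-level summary above (median FPKM per tissue, compared with root) can be sketched as follows. The FPKM values are invented placeholders, not data from Fig. 5, and the tissue names are illustrative keys.

```python
from statistics import median


def tissue_medians(fpkm):
    """fpkm maps tissue -> {gene: FPKM}; return tissue -> median FPKM across genes."""
    return {tissue: median(values.values()) for tissue, values in fpkm.items()}


def ratio_to_reference(medians, reference="root"):
    """Divide each tissue median by the reference tissue median (reference must be > 0)."""
    ref = medians[reference]
    if ref <= 0:
        raise ValueError("reference median must be positive")
    return {tissue: value / ref for tissue, value in medians.items()}


# Placeholder values for three hypothetical genes.
fpkm = {
    "root": {"g1": 10.0, "g2": 12.0, "g3": 14.0},
    "mature_fruit": {"g1": 20.0, "g2": 26.0, "g3": 30.0},
}
medians = tissue_medians(fpkm)
print(medians)  # {'root': 12.0, 'mature_fruit': 26.0}
print(ratio_to_reference(medians))
```

Using the median rather than the mean keeps one very highly expressed gene (such as StALDH5F1) from dominating the tissue-level summary.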
StALDH transcripts showed differential expression patterns in response to various abiotic and biotic stress elicitors and hormonal treatments. To better understand the function of StALDH genes under abiotic stress, we analyzed their transcript profiles under three abiotic stress conditions: salinity, dehydration, and heat (Fig. 6A). Among the 16 StALDH genes, StALDH3H1 and StALDH12A1 were highly upregulated under all three abiotic stress conditions. A cluster of genes (StALDH3F2, StALDH10A1, StALDH10A2, and StALDH18A2) showed medium to high upregulation in response to salinity and dehydration. Beyond this cluster, StALDH11A1 showed the highest upregulation, with transcripts increasing more than 10-fold in response to salt, dehydration, and heat.
Figure 4. Phylogenetic analysis of potato ALDH members. ALDH proteins from various species including Arabidopsis, rice, soybean, maize, field mustard, grape, potato, tomato, black cottonwood, moss, green algae, mouse, and human were collected from databases. A total of 335 protein sequences from 12 different species were aligned by ClustalW, and a maximum-likelihood tree was constructed using MEGA X (https://www.megasoftware.net/) with 1000 bootstrap replicates. Bootstrap values greater than 50 are shown at the branching points to indicate significant clustering. ALDH members from different species are indicated by different colours and summarized in the middle of the tree. The tree was divided into 11 families based on their clustering pattern, and each ALDH family number is indicated.
Based on the median fold change in expression of all StALDH genes, overall StALDH transcript abundance was mostly upregulated in response to dehydration stress and mostly downregulated in response to heat stress (Fig. 6B). To assess biotic stress responsiveness, StALDH expression was examined after treatment with β-aminobutyric acid (BABA), benzothiadiazole (BTH), and a pathogen (Fig. 6C). StALDH6B1 is the only member upregulated under all three biotic stress conditions. A cluster of StALDH2B2, StALDH2B7, StALDH5F1, and StALDH22A1 was upregulated by both BTH and pathogen treatment. Some genes were upregulated by only one treatment: StALDH2B6 and StALDH2C1 only by BABA; StALDH11A1, StALDH18A1, StALDH18A2, StALDH3F2, StALDH10A1, and StALDH12A1 only by BTH; and StALDH3H1 only by pathogen treatment. In response to BABA, StALDH2B6 showed the highest upregulation, more than 4-fold. StALDH10A2 was downregulated by all three biotic stress elicitors. Overall, the median fold change of all StALDH genes indicated mostly upregulation after BTH induction and downregulation in response to BABA and pathogens (Fig. 6D).
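The per-condition summaries in Fig. 6B and 6D rely on the median fold change across StALDH genes. A minimal sketch follows, assuming fold changes are taken as log2 ratios of treated to control expression with a pseudocount of 1; both choices are assumptions, since the exact transformation behind the figures is not stated here, and the expression values are hypothetical.

```python
from math import log2
from statistics import median


def median_log2_fold_change(treated, control, pseudocount=1.0):
    """Median over genes of log2((treated + pc) / (control + pc)).

    treated, control: dicts mapping gene -> expression value for the same genes.
    """
    changes = [
        log2((treated[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in control
    ]
    return median(changes)


def direction(value):
    """Label a median log2 fold change as up, down, or unchanged."""
    return "up" if value > 0 else "down" if value < 0 else "unchanged"


control = {"a": 3.0, "b": 7.0, "c": 15.0}  # hypothetical
treated = {"a": 7.0, "b": 15.0, "c": 3.0}  # hypothetical
m = median_log2_fold_change(treated, control)
print(m, direction(m))  # 1.0 up
```

The pseudocount keeps genes with zero expression from producing infinite ratios; its value changes the result for lowly expressed genes, so it should match whatever the original analysis used.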

(GA3) (Fig. 6E). Surprisingly, all 22 StALDH genes were downregulated in response to BAP treatment. StALDH12A1, and StALDH18A1 were upregulated in response to the other three hormone treatments (IAA, ABA, and GA3). A cluster of two genes (StALDH3F2 and StALDH22A1) was downregulated in response to all four hormonal treatments. Total StALDH transcript abundance was extremely low after BAP treatment, as previously observed for StTPSs 26, whereas the other three hormone treatments led to upregulation (Fig. 6F).

Abiotic stress responsiveness of selected StALDH genes was verified using qRT-PCR. Global expression analysis of StALDH genes under various stress conditions revealed that StALDH transcripts are regulated differently depending on the type of environmental stimulus. This differential expression was verified in response to salt (NaCl), drought (mannitol), and heat stress. Quantitative RT-PCR was performed for 11 highly stress-inducible StALDH genes (StALDH-2B6, 2C1, 3H1, 5F1, 6B1, 7A1, 10A1, 11A1, 12A1, 18A2, and 22A1). Most of these genes were upregulated in response to all three treatments (Fig. 7A). Only StALDH10A1 and StALDH11A1 showed significant downregulation, in response to drought and heat, respectively. StALDH12A1 transcripts were the most highly upregulated in all three conditions, followed by StALDH7A1 and StALDH2B6. These results suggest that StALDH transcripts are frequently upregulated under stress.
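The method used to convert Ct values into relative expression is not described in this section. A common choice is the Livak 2^(−ΔΔCt) method with a single reference gene, sketched below under that assumption; the Ct values are hypothetical, and the method assumes near-100% amplification efficiency for both target and reference.

```python
from statistics import mean


def relative_expression(target_treated, reference_treated, target_control, reference_control):
    """Livak 2^(-ddCt) fold change of a target gene, treated vs control.

    Each argument is a list of Ct values from replicate reactions.
    """
    delta_ct_treated = mean(target_treated) - mean(reference_treated)
    delta_ct_control = mean(target_control) - mean(reference_control)
    delta_delta_ct = delta_ct_treated - delta_ct_control
    return 2 ** (-delta_delta_ct)


# Hypothetical Ct values for one target gene and one reference gene (three replicates).
fold = relative_expression([21.75, 22.0, 22.25], [18.0, 18.0, 18.0],
                           [24.75, 25.0, 25.25], [18.0, 18.0, 18.0])
print(fold)  # 8.0
```

Here the target crosses threshold three cycles earlier under treatment after normalization (ΔΔCt = −3), which corresponds to an 8-fold upregulation.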
To test whether this pattern holds in other species, we analyzed the expression of ALDH members from two widely studied model plants, Arabidopsis and rice. Most AtALDH members were upregulated in response to salinity, drought, and osmotic stress, whereas temperature fluctuation (either cold or heat) mostly resulted in downregulation (Fig. 7B). Similarly, OsALDH transcripts were highly upregulated in response to different degrees of dehydration stress (Fig. 7C), followed by salinity and cold stress. Overall, abiotic stress-induced transcript changes in ALDH superfamily members appear to be evolutionarily conserved in both monocot and dicot plant species.
Promoters of StALDH genes contain various abiotic stress- and hormone-responsive cis-elements. Cis-regulatory elements are crucial factors influencing gene expression and regulation 27. In total, 22 cis-regulatory elements, including 16 abiotic stress-responsive and 6 phytohormone-responsive elements, were identified in the putative promoter regions of StALDH genes (Fig. S4). The presence of various hormone-responsive elements (ABRE, ARE, GARE motif, TC-rich elements, TCT motifs, and TGA elements) in StALDH promoters suggests that hormones such as abscisic acid, gibberellin, auxin, jasmonic acid, and salicylic acid may influence StALDH expression. Our analysis revealed that the promoter of StALDH2B4 and
Homology modelling of representative StALDH proteins. Self-optimized prediction method with alignment (SOPMA) predicted varying proportions of alpha helices, extended strands, beta turns, and random coils in the StALDH protein structures (Table S4). Alpha helices (29.12-56.67%) dominated the predicted secondary structures, followed by random coils (30.00-43.68%), extended strands (9.17-20.88%), and beta turns (4.15-11.36%). Protein glycosylation is another important aspect of protein structure that regulates a wide range of biological processes such as protein folding, signalling, stability, conformation, and cell-cell interactions 28. Glycosylation analysis predicted potential N-glycosylation sites in 12 of the 16 analyzed StALDH proteins; StALDH12A1, StALDH18A1, and StALDH18A2 have the highest number, with three sites each (Table S5).
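The N-glycosylation counts in Table S5 came from a prediction tool that is not named in this section. The sketch below only scans for the classic N-X-S/T sequon, written in PROSITE syntax as N-{P}-[ST]-{P} (PS00001). A sequon is necessary but not sufficient for glycosylation, so dedicated predictors that weigh sequence context will generally report fewer sites. The sequences are hypothetical.

```python
import re

# Zero-width lookahead so that overlapping sequons (e.g. in "NNSSA") are all reported.
SEQUON = re.compile(r"(?=N[^P][ST][^P])")


def n_glyc_sequons(sequence):
    """Return 1-based positions of Asn residues in N-{P}-[ST]-{P} sequons."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]


print(n_glyc_sequons("ANGSAKNPTANVTP"))  # [2]
print(n_glyc_sequons("NNSSA"))           # [1, 2]
```

In the first example, the Asn at position 7 is followed by Pro and the Asn at position 11 is followed by a Ser/Thr that is itself followed by Pro, so only position 2 qualifies.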
To examine their 3D structural arrangement, four abiotic stress-responsive proteins (StALDH3H1, StALDH10A1, StALDH11A1, and StALDH12A1) were selected for homology modelling using the templates Rattus norvegicus ALDH (PDB: 1AD3), Solanum lycopersicum ALDH (PDB: 4I9B), Streptococcus mutans ALDH (PDB: 1EUH), and Zea mays ALDH (PDB: 6D97), respectively (Fig. 8). The homology models were validated by MolProbity Ramachandran plot analysis (Fig. S5), which placed 95.1%, 97.8%, 96.0%, and 97.7% of the residues of StALDH3H1 (Fig. 8A), StALDH10A1 (Fig. 8B), StALDH11A1 (Fig. 8C), and StALDH12A1 (Fig. 8D), respectively, in favored regions, supporting the accuracy of the models. The models showed that the overall structures of the four proteins are very similar, sharing common strands and helices of the Rossmann fold (Fig. 8). However, a few notable differences were observed in the length and conformation of the oligomerization site, the angles of alpha helices and beta sheets, and the N-terminal tail. StALDH11A1 showed a longer loop in the oligomerization domain and more curved coils in the catalytic and coenzyme binding sites (Fig. 8C) than the other proteins. The surface charge distribution of the selected proteins was computed with the Adaptive Poisson-Boltzmann Solver (APBS) package and is shown in two surface views rotated by 180° (Fig. 8A-D). Blue indicates positive charge, red negative charge, and white neutral. Marked differences were observed in the distribution of positively and negatively charged amino acids on the surfaces of the selected proteins.

Discussion
Potato is a good source of dietary fiber and other essential nutrients and is a staple food for more than a billion people in over 100 countries 29,30. Consequently, increasing potato yield is important for meeting the nutritional demands of a growing global population 31. Because potato is a stress-sensitive crop, its ability to withstand various abiotic and biotic stresses is essential for its role as a major food source in the near future. The complete genome sequence of potato was made available in 2011 22. In this study, we performed a comprehensive investigation of ALDH members in potato to explore their functional association with various abiotic and biotic stress conditions. ALDH genes have been identified in both prokaryotic and eukaryotic organisms and in almost all plant species 12. Previously, 16, 20, 22, 53, 23, 23, and 29 ALDH genes have been identified in Arabidopsis 4, rice 7, maize 15, soybean 9, grape 17, mustard 8, and tomato 14, respectively (Table 1). We identified 22 ALDH genes in potato (genome size of 840 Mb), more than in Arabidopsis (16) and rice (20), which have smaller genomes, but fewer than in mustard and grape (23 each). This suggests that the total number of ALDH genes may correlate with genome size (Table 1). A scatter plot with regression analysis showed a significant correlation (R² = 0.6128) between the number of ALDH genes and genome size in the selected organisms, excluding Z. mays, H. sapiens, and M. musculus (Fig. S6).
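The genome-size regression (Fig. S6) reduces to an ordinary least-squares fit and its coefficient of determination, R² = 1 − SS_res / SS_tot. The sketch below uses placeholder numbers rather than the Table 1 values.

```python
def linear_fit_r2(x, y):
    """Ordinary least-squares fit y = a + b*x; return (a, b, R^2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least two points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot


# Placeholder data, not the Table 1 genome sizes or ALDH gene counts.
print(linear_fit_r2([1.0, 2.0, 3.0], [1.0, 3.0, 2.0]))  # (1.0, 0.5, 0.25)
```

An R² of 0.6128, as reported above, would mean that genome size accounts for roughly 61% of the variance in ALDH gene number among the included species.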
Plant ALDH genes are grouped mainly into 14 families, of which only ten (families 2, 3, 5, 6, 7, 10, 11, 12, 18, and 22) were found in potato, consistent with the ALDH families previously identified in Arabidopsis, rice, maize, soybean, and grape. Interestingly, ALDH19 has so far been reported only in tomato 14, and families ALDH21, 23, and 24 only in mosses and algae (Table 1 and Fig. 4). Gene families expand through whole-genome duplication, tandem duplication, or segmental duplication 32. We observed both WGD/segmental duplication and tandem duplication events contributing to the expansion of the potato ALDH gene family. Two tandem duplication events had previously been recorded in each of O. sativa (OsALDH2-1|2-2 and OsALDH3-1|3-2) 7 and V. vinifera (VvALDH5F1|5F2|5F3 and VvALDH6B3|6B5) 17. In the current study, we also found two tandem duplication events (StALDH2B4|StALDH2B5|StALDH2B6 and StALDH18A1|StALDH18A2). In addition, we found three WGD/segmental duplication events that took place approximately 61.7, 25.8, and 18.8 million years ago (Mya). Although 9 of the 22 StALDH genes emerged from duplication events, their function and expression cannot always be inferred from their common ancestors. Six of the 22 StALDH genes (StALDH2B1, StALDH2B3, StALDH2B4, StALDH2B5, StALDH3F1, and StALDH6B2) have partial/truncated sequences and thus lack the conserved active-site residues/ALDH domain (Table S1 and Fig. 1). However, three of them (StALDH2B3, StALDH2B4, and StALDH2B5) showed significant transcript abundance in stamen and mature fruit (Fig. S3). These genes may therefore be pseudogenes or may have undergone neo- or subfunctionalization, which requires further experimental confirmation. Phylogenetic analysis of S. tuberosum ALDH members together with those of other plant and non-plant species revealed that members of the same family clustered together.
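Duplication dates such as those quoted above are commonly estimated from the synonymous substitution rate (Ks) of each duplicated pair as T = Ks / (2λ), where λ is the clock rate in substitutions per synonymous site per year. The method and λ used for the 61.7, 25.8, and 18.8 Mya estimates are not given in this section, so the Ks and rate in this sketch are placeholders.

```python
def divergence_time_mya(ks, rate_per_site_per_year):
    """Estimate duplication time in million years as T = Ks / (2 * lambda)."""
    if rate_per_site_per_year <= 0:
        raise ValueError("substitution rate must be positive")
    return ks / (2 * rate_per_site_per_year) / 1e6


# Placeholder Ks and rate; substitute the values appropriate for the species.
print(round(divergence_time_mya(0.3, 1.5e-8), 2))  # 10.0
```

The factor of 2 accounts for substitutions accumulating independently along both copies after duplication.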
This suggests that plant ALDH genes evolved before the divergence of dicots (Arabidopsis, grape, soybean, tomato, potato, and mustard) and monocots (rice and maize). Moreover, our phylogenetic analysis showed that ALDH families 2, 5, 7, and 10 are closely related, whereas family 18 is the most distantly related (Fig. 4). In addition, ALDH2 and ALDH3 are the two largest families, while ALDH12 and ALDH18 are the smallest across the twelve analyzed species.
Plant ALDH genes play significant roles in environmental adaptation, and their expression changes when plants are exposed to a variety of stressors such as dehydration, high salinity, heat, and oxidative stress 12,33. Expression profiling of StALDH genes can therefore provide clues to their functions under different stress conditions. Different plant ALDH genes are expressed in different tissues and developmental stages. In an earlier study, MdALDH3F1 and MdALDH10A8 were highly expressed during apple fruit development 34. The expression of VvALDH2B8, VvALDH3H5, and VvALDH18B1 increased significantly during grape development and ripening 17. GmALDH3H2 and GmALDH3H4 showed high expression in soybean flowers 9. In Solanum tuberosum, high expression was observed in fruit (immature, mature, and mesocarp and endocarp), flower, and carpel, with a median FPKM above 25 (Fig. 5), while StALDH2B2 and StALDH5F1 showed remarkably high expression in almost all tissues. These results are consistent with the observation that different ALDH genes have distinct, tissue-specific expression patterns. ALDH genes have previously been found to be upregulated by drought, salinity, and heat stress in various organisms. OsALDH2-4 and GmALDH2B2 transcripts were highly upregulated in response to drought stress in rice and soybean, respectively 7,9; VvALDH2B4 and VvALDH2B8 were upregulated in response to drought and salinity stress in grape 17; and PtALDH3H4 and PtALDH6B4 in black cottonwood were upregulated in response to heat stress 18. In our study, 50% (8/16) of the StALDHs were upregulated under salinity stress, with StALDH18A2 showing the highest upregulation (almost three-fold).
In response to dehydration stress, 62.5% (10/16) of the genes were upregulated, with StALDH10A2 and StALDH18A2 showing the maximum upregulation (1.5-fold) (Fig. 6A). Similarly, 56.25% (9/16) of the StALDH genes were upregulated in response to heat. Moreover, the abiotic stress-specific upregulation of StALDH12A1, StALDH7A1, and StALDH2B6 was confirmed by qRT-PCR in a Bangladeshi potato variety (Fig. 7A), and abiotic stress-induced upregulation appears to be evolutionarily conserved in Arabidopsis and rice (Fig. 7B,C). To the best of our knowledge, the role of ALDH in biotic stress has not been investigated thoroughly. Most StALDH genes were upregulated in response to different biotic stress elicitors (Fig. 6C). Among the 16 members, StALDH6B1 was upregulated under all three biotic stress conditions. Phytohormones play important roles in responses to various stress conditions. Previously, AtALDH3I1 and AtALDH7B4 from Arabidopsis and BrALDH12A1 from Brassica rapa showed significant upregulation in response to ABA treatment. We observed a similar pattern of upregulation for most StALDH transcripts in response to IAA, ABA, and GA3 treatments. Interestingly, BAP treatment resulted in complete downregulation of all StALDHs (Fig. 6E). This result suggests that BAP might act as a key negative regulator of ALDH gene transcription, similar to its effect on TPS genes 26. Moreover, the various phytohormone- and stress-responsive cis-acting regulatory elements in the putative StALDH promoter regions may be directly related to the observed expression profile. The promoters of ALDH7 genes in different Brassicaceae species contain a conserved ACGT-containing motif, a dehydration-responsive element (DRE), and a C-repeat/low-temperature-responsive element (CRT), which mediate induction by salt, dehydration, and ABA in leaves 35.
From the cis-regulatory element analysis, we found that StALDH2B2, StALDH3H1, StALDH5F1, StALDH10A1, StALDH11A1, StALDH12A1, and StALDH18A2 contain cis-elements that play a critical role in the drought stress response 36. This result is consistent with our abiotic stress expression findings, as these genes are upregulated under drought. The cellular functions of a protein depend on its 3D folded structure and protein-ligand interactions 15. Homology-based modelling of ALDH proteins has previously been used to gain insight into their function in rice 10, maize 15, and tomato 14. In the present study, we analyzed the structures of four abiotic stress-specific proteins, focusing on structural variation in the oligomerization sites and charge distribution on the outer surface. Having identified the putative StALDH proteins along with their transcript profiles and subcellular compartments, we propose a cellular model of ALDH-mediated stress tolerance in potato (Fig. 9). Abiotic and biotic stresses generally cause a disproportionate increase in ROS production and induce oxidative stress 37. Oxidative stress, in turn, triggers lipid peroxidation, which produces aldehydes as by-products. These ROS-induced aldehydes form a vicious circle that further amplifies the destructive effects of reactive species 38. Excessive ROS-induced aldehydes cause several downstream modifications, including depletion of reduced glutathione, protein oxidation/modification, mitochondrial dysfunction, and nutrient stress, which ultimately lead to endoplasmic reticulum (ER) stress. ER stress promotes protein unfolding, leading to metabolic remodelling, inflammatory responses, cytotoxicity, and even DNA damage, thereby threatening cellular viability 37. To counteract the deleterious effects of reactive aldehydes, StALDH proteins may catalyze the conversion of aldehydes to carboxylic acids.
Alternatively, these aldehydes can be reduced by aldose reductase 39 or neutralized through glutathione conjugation catalyzed by glutathione S-transferases 40.

Materials and methods
Identification, characterization, and nomenclature of ALDH genes in potato. To identify Solanum tuberosum ALDH proteins, a BLASTp search was conducted with a stringent E-value cutoff (≤ 1e-3) using previously identified ALDH protein sequences of Arabidopsis 4, rice 7, and tomato 14 as queries, together with an InterPro ID (IPR015590) search in the Solanaceae Genomics Network database (https://solgenomics.net/). The NCBI Conserved Domain Database (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) and Pfam server (http://pfam.xfam.org/) were used to confirm the presence of the conserved ALDH domain (PF00171) in the resulting protein sequences. The presence of the ALDH cysteine active site (PS00070) and glutamic acid active site (PS00687) was confirmed using PROSITE (http://prosite.expasy.org/) and multiple sequence alignment with Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/). All confirmed potato ALDH members from S. tuberosum were named according to the protocol of the ALDH Gene Nomenclature Committee (AGNC) 11 and designated StALDH. According to the AGNC criteria, protein labels (ALDH) were followed by a family designation number (1, 2, 3, etc.), a subfamily designation letter (A, B, C, etc.), and a gene number assigned according to chromosomal order.
Figure 9. Illustration of the possible role of StALDH in aldehyde detoxification under stress conditions. Different abiotic (heat, cold, salinity, drought, flood, heavy metal) and biotic (pathogen and insects) stresses affect various parts of potato plants and induce intercellular reactive oxygen species (ROS) accumulation. ROS in turn form a vicious cycle by generating reactive aldehydes that can be either detoxified by ALDH or result in tissue degeneration. The figure was generated using Adobe Illustrator (https://www.adobe.com/products/illustrator.html).
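The first identification step, a BLASTp search filtered by E-value, can be reproduced from BLAST+ tabular output. This sketch assumes the default 12-column `-outfmt 6` layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and uses hypothetical query and subject IDs; the InterPro, CDD, Pfam, and PROSITE confirmation steps are not reproduced.

```python
import csv
import io


def subjects_passing_evalue(tabular_text, max_evalue=1e-3):
    """Collect unique subject IDs from BLAST+ -outfmt 6 text with E-value <= max_evalue."""
    hits = set()
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        subject_id, evalue = row[1], float(row[10])  # column 2 = sseqid, column 11 = evalue
        if evalue <= max_evalue:
            hits.add(subject_id)
    return sorted(hits)


# Hypothetical hits: query IDs, subject IDs, and scores are invented.
blast_out = (
    "AtQuery1\tsubject_1\t45.2\t480\t250\t6\t1\t480\t5\t484\t1e-50\t400\n"
    "AtQuery1\tsubject_2\t24.0\t80\t60\t2\t5\t85\t1\t80\t0.5\t30.1\n"
    "OsQuery1\tsubject_1\t51.0\t470\t220\t4\t3\t472\t2\t471\t2e-45\t380\n"
)
print(subjects_passing_evalue(blast_out))  # ['subject_1']
```

Collecting subjects into a set merges hits found by several queries (here, Arabidopsis and rice queries both recovering `subject_1`) into a single non-redundant candidate list.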
Amino acid sequences more than 40% identical to previously identified ALDH sequences were grouped into the same family, and sequences with more than 60% similarity were grouped into the same protein subfamily. Protein sequences with less than 40% similarity were grouped as a novel ALDH protein family. Physical properties such as polypeptide length, pI, and molecular weight were predicted using ExPASy
Genomic organization and duplication analysis of StALDH genes. The genomic positions of all 22 StALDH genes were illustrated with CIRCOS software 44 based on the positional information in Table S1. For synteny analysis, syntenic blocks containing StALDH genes were retrieved from the Plant Genome Duplication Database (http://chibba.agtec.uga.edu/duplication/index/downloads) 34. Duplication events were predicted by considering ≥ 80% sequence similarity among the ALDH proteins 40,45. Tandem duplication events were predicted as adjacent homologous StALDH genes on the same chromosome separated by no more than one gene 17,34. Duplicated StALDH gene pairs falling within recognized syntenic blocks were defined as whole-genome or segmental duplications 9,46 55 . Seedlings (15 days old) were subjected to 150 mM NaCl for salinity, 260 mM mannitol for drought, or kept at 37 ± 1 °C for heat treatment. Samples from each treatment (three replicates) were harvested after 24 h, frozen in liquid nitrogen, and stored at −80 °C until RNA isolation.
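The AGNC grouping thresholds quoted above can be sketched with a simple percent-identity calculation over a pairwise alignment. Assumptions: identity is computed over gap-free aligned columns (other denominators are also common); the same identity value is used for both the family and subfamily thresholds, although the text mentions similarity for the latter; and a value of exactly 40% is treated as novel because the text does not specify the boundary. The aligned fragment is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


def agnc_group(identity):
    """Map identity to the grouping rule quoted above (>60% subfamily, >40% family)."""
    if identity > 60:
        return "same subfamily"
    if identity > 40:
        return "same family"
    return "novel family"


ident = percent_identity("ACDE-F", "ACDQGF")  # hypothetical aligned fragment
print(ident, agnc_group(ident))  # 80.0 same subfamily
```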

Conclusion
In conclusion, we identified a total of 22 putative ALDH members, which were grouped into ten families. These genes were investigated in detail with respect to their classification, genomic organization, subcellular localization, structure, evolution, promoter elements, and protein modelling. Moreover, analysis of their expression profiles across potato tissues and under biotic and abiotic stress treatments broadens our understanding of this multifunctional protein family. Collectively, this study lays the groundwork for the functional characterization of potato ALDH genes. Unlike previous reports, the current study covers a wider perspective on the detoxification of reactive aldehydes, which will pave the way for future studies aimed at understanding stress alleviation pathways in plants.