Introduction

As the terminal components of the stress signal transduction chain, heat shock transcription factors (Hsfs) play an important role in mediating the expression of genes responsive to various abiotic stresses, especially heat stress (HS)1. Hsfs specifically recognize the binding motif ‘AGAAnnTTCT’, known as the heat stress element (HSE), which is conserved in the promoters of HS-inducible genes2.
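As an illustration of how the HSE consensus can be located in a promoter sequence, the following minimal Python sketch scans a DNA string for ‘AGAAnnTTCT’. The function name and the example sequence are hypothetical and are not taken from this study; the cis-element analysis reported here relied on a database search rather than on this code.

```python
import re

# HSE consensus from the text: AGAAnnTTCT, where n is any nucleotide.
# The lookahead lets finditer report overlapping occurrences as well.
HSE_PATTERN = re.compile(r"(?=(AGAA[ACGT]{2}TTCT))")


def find_hse(promoter: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched sequence) for every HSE hit."""
    seq = promoter.upper()
    return [(m.start(), m.group(1)) for m in HSE_PATTERN.finditer(seq)]


# Hypothetical promoter fragment, for illustration only.
example = "ttgcAGAAgcTTCTaaacgAGAAttTTCTgg"
for start, site in find_hse(example):
    print(start, site)
```

Because the reverse complement of AGAAnnTTCT is again AGAAnnTTCT, scanning one strand is sufficient for this particular consensus.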

The structure of Hsfs is similar to that of other transcription factors. A classical Hsf is composed of an N-terminal, highly conserved DNA-binding domain (DBD), an oligomerization domain (OD), a nuclear localization signal (NLS), a nuclear export signal (NES), a repressor domain (RD) and a C-terminal activator peptide motif (AHA)2. The DBD is characterized by a central helix-turn-helix motif that specifically binds to the HSEs in the promoters of target genes; it is the most conserved domain throughout evolution3. The OD, with a pattern of hydrophobic heptad repeats referred to as the HR-A/B region, forms a coiled-coil structure that mediates the oligomerization of Hsfs2. According to the length of the linker between the DBD and HR-A/B regions and the number of amino acids inserted into the HR-A/B region, plant Hsfs are grouped into three main classes (A, B and C)4. The NLS and NES mediate intracellular distribution, and the activity of Hsfs depends on the shuttling balance between nucleus and cytoplasm5. The RD is characterized by the tetrapeptide LFGV in the C-terminus of class B Hsfs (except HsfB5) and functions as a repressor motif through interaction with a corepressor6. The C-terminal AHA motifs confer the transcriptional activator function of Hsfs; they are specific to class A Hsfs and are not found in class B or C2,4.

Numerous studies have indicated that Hsfs play important roles in plant responses to various environmental stresses, including heat, cold, drought, salinity and oxidative stresses3. Within class A, the members of the HsfA1 subclass are considered master regulators of the HS response. Tomato HsfA1a has been reported as the single master regulator of thermotolerance7, and it is required for the nuclear retention and transcriptional activation of HsfA28. However, the situation may differ among plant species. For example, analyses of single and multiple mutants of the four Arabidopsis HsfA1s (a, b, d and e) revealed no single master HsfA1 regulating thermotolerance9. In many plant species, such as tomato, Arabidopsis, rice, Populus trichocarpa and Salix suchowensis, HsfA2 is the most strongly induced Hsf under heat stress10,11,12,13. In heat-stressed cells, HsfA2 becomes the dominant Hsf after long-term HS or repeated cycles of HS and recovery14. Although HsfA2 can improve thermotolerance in the absence of HsfA1s when overexpressed in a quadruple knockout Arabidopsis mutant (athsfA1a,b,d,e)15, hetero-oligomerization of HsfA1 and HsfA2 forms a superactivator complex that activates downstream HS-inducible genes more strongly than either factor alone10. In contrast, HsfA4 and HsfA5 can also form oligomers, but their activities are inhibited. HsfA4 controls ROS levels and is considered an antiapoptotic factor, whereas HsfA5 specifically represses HsfA4 by inhibiting its DNA-binding activity and thus acts as a proapoptotic factor16. HsfA3 is involved in drought and salt stress responses and acts downstream of DREB2A in the transcriptional cascade17. In contrast to class A Hsfs, class B Hsfs have no typical transcriptional activator activity because they lack an activator domain2. In tomato, the HS-induced HsfB1 acts as a coactivator of HsfA1a by assembling into an enhanceosome-like complex14. Tomato HsfA1a, HsfA2 and HsfB1 form a triad that activates the expression of HS-inducible genes during the HS response and recovery2. HsfB1 from Arabidopsis, however, is described as a repressor of HS-responsive Hsfs18. This indicates that Hsf members, even orthologous genes, play divergent roles in different species. Recently, genome-wide expression profiles of Hsfs have been analyzed in many plant species, including the two woody species P. trichocarpa and S. suchowensis13,19. However, little is known about Hsfs in the desert poplar, P. euphratica.

P. euphratica, together with two other woody species with sequenced genomes, P. trichocarpa and S. suchowensis, belongs to the Salicaceae family. Their morphological characteristics and stress tolerances differ markedly. As a species native to desert regions, P. euphratica shows outstanding adaptability to various abiotic stresses20.

To explore the potential roles of P. euphratica Hsfs in abiotic stress responses, the current study identified 32 PeuHsf genes and analyzed their evolutionary relationships, gene structures, conserved domains, cis-acting elements and expression patterns across different tissues and under various abiotic stresses. In addition, the expression profiles of PeuHsfs under different abiotic stresses were compared with those of the well-studied Hsfs in Arabidopsis. Overall, the present study helps to explain the outstanding stress tolerance of P. euphratica and provides a basis for further functional studies of PeuHsfs.

Results

Genome-wide identification of the Hsf genes in P. euphratica

To identify the Hsf genes in P. euphratica, the conserved Hsf domain (PF00447) from the Pfam database was used to search the P. euphratica genome. In addition, the amino acid sequences of 27 AtHsfs and 31 PtHsfs were used as queries in a BLASTP search against the P. euphratica genome. After the conserved DBD and the coiled-coil structure were confirmed using the SMART database, a total of 32 PeuHsfs were identified (Table 1). The PeuHsfs were assigned to subfamilies according to the length of the linker between the DBD and HR-A/B regions and the number of amino acids inserted into the HR-A/B region. The classification was also confirmed using the Heatster database (http://www.cibiv.at/services/hsf/)2. The identified PeuHsfs encode proteins ranging from 207 to 737 amino acids (aa) in length (average of 384 aa), with molecular weights (MW) of 23.89 to 81.18 kDa (average of 42.91 kDa) and isoelectric points (pI) of 4.72 to 9.28. Among the 32 PeuHsf proteins, the percentage of negatively charged residues (Asp + Glu) ranged from 8.8% to 17.7% and the percentage of positively charged residues (Arg + Lys) ranged from 7.8% to 16.7%. Based on the instability index, all the PeuHsf proteins were predicted to be unstable. In addition, the aliphatic index ranged from 57.58 to 79.06 and the grand average of hydropathicity (GRAVY) ranged from −0.903 to −0.415 (Table 1).

Table 1 The Hsf genes identified in P. euphratica.

Phylogenetic relationship of PeuHsf genes

To explore the evolutionary characteristics of the PeuHsf genes, an unrooted phylogenetic tree was generated using the Hsf protein sequences from P. euphratica, P. trichocarpa, S. suchowensis and A. thaliana. Compared with the other two Salicaceae species (P. trichocarpa and S. suchowensis) and the model plant Arabidopsis, the desert poplar P. euphratica has the largest Hsf family (Fig. 1a). According to the phylogenetic tree, the PeuHsfs could be grouped into three main classes (A, B and C) (Fig. 1b). Class A was the largest, consisting of 19 members from nine subclasses (A1–A9); class B consisted of 12 members from five subclasses (B1–B5); and class C contained only one gene (Fig. 1). Class A PeuHsfs ranged from 331 to 737 aa in length (average of 449 aa) and were thus longer than those of class B (207 to 368 aa, average of 285 aa) and class C (338 aa) (Table 1). All three Salicaceae species analyzed contained Hsf members in subclass B5, whereas Arabidopsis had none. Compared with P. trichocarpa, P. euphratica contains one more member in each of subclasses A5, A7 and B2, but one fewer in each of subclasses A8 and B3 (Fig. 1 and Supplementary Table S1).

Figure 1

Hsf gene family and their phylogenetic relationships in P. euphratica (Peu), P. trichocarpa (Pt), S. suchowensis (Ssu) and A. thaliana (At).

(a) The Hsf members from the four species were classified into three main classes (A, B and C). (b) The phylogenetic tree was constructed from the full-length amino acid sequences of the Hsf proteins in the four species using the neighbor-joining (NJ) method with 1,000 bootstrap replicates. Bootstrap support values are indicated on each node.

Structural analysis of PeuHsfs

Gene structures

The gene structures of PeuHsfs were analyzed by comparing the cDNA sequences with the genomic DNA sequences. As shown in Fig. 2a, two PeuHsfs (-A2 and -A5c) consisted of four exons, two PeuHsfs (-A1a and -A1d) consisted of three exons, and PeuHsf-A1b consisted of a single exon with no intron. The other 27 PeuHsfs each consisted of two exons. Introns can be classified as phase 0, phase 1 or phase 2 depending on their position relative to the reading frame. All the two-exon PeuHsfs showed the same intron phase (phase 0). Overall, the gene structures and intron phases were highly conserved among the members of each subclass of the PeuHsf family.
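The intron phase classification mentioned above can be derived from the lengths of the coding portions of the exons. The sketch below is a minimal illustration under the standard definition of intron phase; the function name and the exon lengths in the example are hypothetical and are not values from Table 1 or Fig. 2.

```python
def intron_phases(cds_exon_lengths: list[int]) -> list[int]:
    """Phase of each intron, given coding-exon lengths in transcript order.

    Phase 0: the intron lies between two codons.
    Phase 1: the intron follows the first base of a codon.
    Phase 2: the intron follows the second base of a codon.
    """
    phases = []
    coding_bases = 0
    # There is no intron after the last exon, so it is skipped.
    for length in cds_exon_lengths[:-1]:
        coding_bases += length
        phases.append(coding_bases % 3)
    return phases


# Hypothetical two-exon gene whose first coding exon is 300 bp long:
# 300 is a multiple of 3, so the single intron is phase 0.
print(intron_phases([300, 450]))
```

Only the coding part of each exon should be counted; UTR bases must be excluded before the lengths are passed in.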

Figure 2

Gene structures (a) and conserved protein motifs (b) of the members of the PeuHsf family. (a) Grey boxes, blank boxes and black lines represent UTRs, exons and introns, respectively. The numbers 0, 1 and 2 on the black lines indicate intron phases. (b) A total of 15 conserved motifs were identified using MEME. Details are listed in Supplementary Table S2.

Conserved motifs

Based on the known information on Hsfs in Arabidopsis, six conserved domains (DBD, HR-A/B, NLS, NES, AHA and RD) were identified in the PeuHsf proteins (Table 2). The DBD, which binds downstream target genes, was present in all the PeuHsfs. It comprised three α-helices and four β-sheets arranged as α1-β1-β2-α2-α3-β3-β4 (Supplementary Fig. S1). All PeuHsf members contained an NLS, which directs nuclear localization of the transcription factor. With the exception of the members of subclasses A3 and A9, all class A PeuHsfs had NES domains. AHA motifs were detected only in class A PeuHsfs (except PeuHsf-A9); four proteins in subclasses A2 and A4 had two AHA motifs, while PeuHsf-A3 had a longer AHA motif (W-X17-W-X20-W-X15-W) than the other members (Table 2).

Table 2 Functional domains of PeuHsfs.

In addition, the conserved motifs of PeuHsfs were analyzed using MEME. Among the 15 detected motifs, motifs 1 and 2 were present in almost all of the PeuHsfs; these two motifs cover the highly conserved DBD. Motif 3 represented the coiled-coil structure in class A and C PeuHsfs, whereas it was replaced by motif 5 in class B PeuHsfs. Moreover, motif 6 corresponded to the AHA motif near the C-terminus of class A PeuHsfs (Fig. 2b).

Expansion of Hsf family in P. euphratica

Duplication events

To analyze the expansion of Hsf genes in P. euphratica, the PeuHsfs were mapped onto scaffolds based on publicly available information in the P. euphratica genome database. The 32 PeuHsfs were mapped onto 31 scaffolds (Supplementary Fig. S2). Synteny analysis was performed to identify duplicated blocks. Based on the chromosomal locations, duplicated blocks, phylogenetic relationships, gene structures and conserved motifs of the PeuHsfs, a total of 10 paralogous pairs were identified in the PeuHsf family (Table 3). With the exception of one pair (PeuHsf-A5b/PeuHsf-A5c) generated by a tandem duplication event, the other nine pairs were generated by whole genome duplication (WGD) (Supplementary Fig. S2). Two pairs (PeuHsf-A1a/PeuHsf-A1d and PeuHsf-A5b/PeuHsf-A5c) arose most recently (~0.49 MYA), three pairs (PeuHsf-A4a/PeuHsf-A4c, PeuHsf-A6a/PeuHsf-A6b and PeuHsf-A7a/PeuHsf-A7b) arose ~13 MYA and the other five pairs arose 17–20 MYA (Table 3). By comparison, the divergence dates of the P. trichocarpa Hsf paralogous pairs were more concentrated: one pair arose ~7.87 MYA, five pairs ~13 MYA and four pairs 17–20 MYA19.
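The dating procedure is not restated in this section. A common approach converts the synonymous distance (Ks) of a paralogous pair into a divergence time as T = Ks/(2λ), where λ is the synonymous substitution rate per site per year. The sketch below illustrates this conversion only; the rate of 9.1 × 10^-9, often quoted for Populus, and the function name are assumptions, not parameters confirmed by this study.

```python
# ASSUMPTION: synonymous substitution rate (substitutions per site per year).
# This value is often quoted for Populus but is not stated in this section.
ASSUMED_RATE = 9.1e-9


def divergence_time_mya(ks: float, rate: float = ASSUMED_RATE) -> float:
    """Approximate divergence time in million years: T = Ks / (2 * rate)."""
    if ks < 0 or rate <= 0:
        raise ValueError("ks must be non-negative and rate must be positive")
    return ks / (2 * rate) / 1e6


# Hypothetical Ks value, for illustration only.
print(round(divergence_time_mya(0.182), 2))
```

The factor of two reflects that substitutions accumulate independently along both duplicated copies after the duplication event.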

Table 3 Divergence between paralogous PeuHsf gene pairs.

Among the four analyzed species, 10, 10, 5 and 4 Hsf paralogous pairs were detected in P. euphratica, P. trichocarpa, S. suchowensis and A. thaliana, respectively (Supplementary Table S3). Relative to the total number of Hsfs in each species, the proportions of duplicated Hsfs were high in the two Populus species (10 pairs among 31 PtHsfs or 32 PeuHsfs). Although the same number of Hsf paralogous pairs was identified in the closely related species P. euphratica and P. trichocarpa, the composition of the pairs differed markedly: there were two subclass A1 pairs in P. euphratica but only one in P. trichocarpa, a subclass B5 pair was found only in P. euphratica, and pairs in subclasses A8 and B3 were found only in P. trichocarpa (Supplementary Table S3).

Ka/Ks values

Subsequently, the ratio of non-synonymous (Ka) to synonymous (Ks) substitution rates was analyzed to test for Darwinian positive selection. The Ka/Ks ratios of all 10 PeuHsf paralogous pairs were less than 0.6 (Table 3), indicating that the PeuHsf family has undergone purifying selection. Compared with S. suchowensis and A. thaliana, the two Populus species showed a wide range of Ka/Ks values, and the average Ka/Ks in P. euphratica was larger than that in P. trichocarpa (Fig. 3a). When the synonymous distance was also considered, the synonymous distances of all the Hsf paralogous pairs were less than 0.5 in Salicaceae (P. euphratica, P. trichocarpa and S. suchowensis) but more than 0.7 in A. thaliana, so that the AtHsf paralogous pairs were separated from the PeuHsf, PtHsf and SsuHsf paralogous pairs in the distribution (Fig. 3b).
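The interpretation of Ka/Ks used above follows the usual convention: a ratio below 1 suggests purifying selection, a ratio above 1 suggests positive selection, and a ratio of 1 is consistent with neutral evolution. A minimal sketch of this rule is given below; the function name, the tolerance parameter and the example values are hypothetical and are not taken from Table 3.

```python
def selection_mode(ka: float, ks: float, tol: float = 0.0) -> str:
    """Classify a gene pair by its Ka/Ks ratio using the usual convention."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form a Ka/Ks ratio")
    ratio = ka / ks
    if abs(ratio - 1.0) <= tol:
        return "neutral"
    return "purifying" if ratio < 1.0 else "positive"


# Hypothetical pair with Ka = 0.06 and Ks = 0.20 (ratio 0.3).
print(selection_mode(0.06, 0.20))
```

A pair-wide ratio below 1 does not exclude positive selection acting on a short region, which is why a sliding window analysis is a useful complement.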

Figure 3

Estimates of Ka/Ks ratios in pairwise comparisons in four species Hsf families.

(a) Average molecular evolutionary rate (Ka/Ks) for Hsf genes in P. euphratica (Peu), P. trichocarpa (Ptr), S. suchowensis (Ssu) and A. thaliana (At). The red line in each box indicates the average Ka/Ks value in each species. (b) Scatter plots of the Ka/Ks ratios of the duplicated Hsf genes in the four species; the y and x axes denote the Ka/Ks ratio and the synonymous distance of each pair, respectively. Details of Ka and Ks in the four species are listed in Supplementary Table S3.

Furthermore, a sliding window analysis of the Ka/Ks ratios was carried out for pairwise comparison. No results were obtained for two pairs (PeuHsf-A1a/PeuHsf-A1d and PeuHsf-A5b/PeuHsf-A5c) because their sequences were nearly identical. Among the other eight PeuHsf paralogous pairs, the N-terminus of the DBD in the PeuHsf-A1c/PeuHsf-A1d pair showed a pronounced peak above 1, indicating positive selection, while all the other regions were under purifying selection (Fig. 4). In contrast, all the DBD and HR-A/B regions in the Arabidopsis Hsf paralogous pairs were conserved, with very low Ka/Ks ratios, except that the region downstream of HR-A/B in the AtHsf-A1a/AtHsf-A1c pair showed a peak above 1 (Supplementary Fig. S3).

Figure 4

Sliding window plots of representative duplicated Hsf genes in P. euphratica.

The window sizes were 90 bp and 60 bp separately.

Variety of cis-acting elements in the promoter regions of PeuHsf genes

The 2 kb sequences upstream of the translation initiation sites of the PeuHsfs were searched in the PlantCARE database to identify potential cis-acting elements. As shown in Fig. 5, the identified cis-acting elements were classified into three main functional classes: stress, hormone and development. Among the stress-related cis-acting elements, HSE, the anaerobic induction element (ARE), TC-rich repeats and the MYB binding site involved in drought inducibility (MBS) were detected in the promoters of 27, 26, 26 and 21 PeuHsfs, respectively. HSE was the most enriched of these elements, with an enrichment level of 2.7 (a total of 73 HSEs in 27 PeuHsf promoters, i.e. 73/27), followed by ARE (enrichment level 2.3), TC-rich repeats (enrichment level 1.9) and MBS (enrichment level 1.7). Among the hormone-related cis-acting elements, a total of 39 MeJA responsive elements (CGTCA-motif), 35 salicylic acid responsive elements (TCA-element), 29 gibberellin responsive elements (GARE-motif), 22 abscisic acid responsive elements (ABRE) and 22 auxin responsive elements (TGA-element) were detected in the promoters of 23, 24, 17, 16 and 15 PeuHsfs, respectively. Among the development-related cis-acting elements, 23 PeuHsfs had circadian-related elements (enrichment level 2.7) and 16 PeuHsfs contained meristem-related cis-acting elements (CAT-box or CCGTCC-box). Three PeuHsfs (-A1b, -A1c and -A7a) had leaf development-related elements (HD-Zip1 or HD-Zip2). In addition, three PeuHsfs (-A5a, -A6a and -A7b) contained the MYB binding site involved in flavonoid biosynthetic gene regulation (MBSI).
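The enrichment level used above is the total number of copies of an element divided by the number of promoters that carry at least one copy (e.g. 73/27 for HSE). The following sketch shows this calculation; the function name and the per-gene counts are hypothetical, constructed only so that the totals reproduce the HSE figures given in the text.

```python
def enrichment_level(copies_per_gene: dict[str, int]) -> float:
    """Total element copies divided by the number of promoters with >= 1 copy."""
    carriers = [n for n in copies_per_gene.values() if n > 0]
    if not carriers:
        return 0.0
    return sum(carriers) / len(carriers)


# Hypothetical per-promoter counts: 27 carriers holding 73 copies in total
# (19 promoters with 3 copies, 8 with 2), plus 5 promoters without the element.
counts = {f"gene{i}": 3 for i in range(19)}
counts.update({f"gene{i}": 2 for i in range(19, 27)})
counts.update({f"gene{i}": 0 for i in range(27, 32)})
print(round(enrichment_level(counts), 1))
```

Promoters lacking the element are excluded from the denominator, so the value describes copy number per carrying promoter rather than per gene in the family.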

Figure 5

cis-acting elements in the promoter region of PeuHsf genes.

(a) Number of each cis-acting element in the promoter region (2 kb upstream of translation initiation site) of PeuHsf genes. (b) The statistics of total number of PeuHsf genes including corresponding cis-acting elements (red dot) and total number of cis-acting elements in PeuHsf gene family (black box). Based on the functional annotation, the cis-acting elements were classified into three major classes: stress-, hormone-, or development-related cis-acting elements. (HSE, heat stress responsive element; MBS, MYB binding site involved in drought inducibility; LTR, low temperature responsive element; ARE, essential for the anaerobic induction; GC-motif, enhancer-like element involved in anoxic specific inducibility; WUN-motif, wound responsive element; TC-rich repeats, defense responsive element; Box-W1, fungal elicitor responsive element; AuxRR-core and TGA-element, auxin responsive element; ABRE, abscisic acid responsive element; TATC-box, GARE-motif and P-box, gibberellin responsive element; ERE, ethylene responsive element; TCA-element, salicylic acid responsive element; CGTCA-motif, MeJA responsive element; circadian, circadian control; CAT-box, meristem expression; CCGTCC-box, meristem specific activation; as-2-box, shoot-specific expression and light responsiveness; HD-Zip1, differentiation of the palisade mesophyll cells; HD-Zip2, control of leaf morphology development; as1, root-specific expression; RY-element, seed-specific regulation; MBSI, MYB binding site involved in flavonoid biosynthetic genes regulation.)

Expression profiles of PeuHsfs across different tissues and response to various abiotic stresses

Tissue-specific expression

We then analyzed the spatial and temporal expression profiles of PeuHsfs in eight tissues: shoot tip (ST), young leaf (YL), mature leaf (ML), stem (S), young root (YR), old root (OR), female catkin (FC) and male catkin (MC). Because of the high sequence similarity within four pairs (A1a/A1d, A5b/A5c, A7b/A7c and B2a/B2d), gene-specific primers could not be designed to distinguish the expression of the two genes in each pair. Among the 28 analyzed PeuHsfs, three of the four members of subclass B4 were highly expressed in shoot tip. Eight PeuHsfs (-A1b, -A1c, -A2, -A4b, -A7a, -B1, -B2c and -B5b) were more highly expressed in mature leaf than in young leaf. All four members of subclass B4 were highly expressed in stem. Eight PeuHsfs (-A1b, -A1c, -A4a, -A5a, -A5b/c, -B1, -B3 and -B5a) were highly expressed in both young root and old root. Notably, PeuHsf-B3 was highly expressed in female catkin but scarcely expressed in male catkin (Fig. 6a).

Figure 6

Expression patterns of PeuHsf genes across different tissues (a) and under various abiotic stresses (b), determined by qRT-PCR. (a) The expression of PeuHsfs in shoot tip (ST), young leaf (YL), mature leaf (ML), stem (S), young root (YR), old root (OR), female catkin (FC) and male catkin (MC) of P. euphratica. (b) The expression patterns of PeuHsfs after treatment for 0, 1, 6 or 12 h with drought (25% PEG w/v), salt (300 mM NaCl), heat (42 °C), cold (4 °C) or 100 μM abscisic acid (ABA). For the tissue comparison, the expression of each PeuHsf in a given tissue was compared with its median value across the tissues. For the treatments, the expression of each PeuHsf at 1, 6 or 12 h was compared with the control at 0 h. The colors correspond to log2-transformed values; green indicates down-regulation and red indicates up-regulation. The expression patterns of orthologous Hsf genes in Arabidopsis across various tissues and under abiotic stresses are shown in Supplementary Fig. S4 and Supplementary Fig. S5. Primers used for qRT-PCR are listed in Supplementary Table S3.
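The two normalizations described in the legend of Fig. 6 (tissue values relative to the across-tissue median, and treatment values relative to the 0 h control, both on a log2 scale) can be sketched as follows. The function names and the input numbers are hypothetical; the inputs are assumed to be relative expression levels already derived from the qRT-PCR data.

```python
import math
from statistics import median


def log2_vs_median(expr_by_tissue: dict[str, float]) -> dict[str, float]:
    """log2 ratio of each tissue value to the median across tissues."""
    ref = median(expr_by_tissue.values())
    return {t: math.log2(v / ref) for t, v in expr_by_tissue.items()}


def log2_vs_control(expr_by_hour: dict[int, float]) -> dict[int, float]:
    """log2 ratio of each time point to the 0 h control."""
    control = expr_by_hour[0]
    return {h: math.log2(v / control) for h, v in expr_by_hour.items() if h != 0}


# Hypothetical relative expression values, for illustration only.
print(log2_vs_control({0: 1.0, 1: 8.0, 6: 2.0, 12: 0.5}))
print(log2_vs_median({"ST": 1.0, "YL": 2.0, "ML": 4.0}))
```

Positive values correspond to up-regulation and negative values to down-regulation; values of zero cannot be log-transformed and would need separate handling.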

Stress responses

To identify the roles of PeuHsfs in stress responses, the expression patterns of PeuHsfs were analyzed in the leaves of P. euphratica seedlings treated with drought (25% PEG w/v), salt (300 mM NaCl), heat (42 °C), cold (4 °C) or ABA (Fig. 6b). As their name suggests, PeuHsfs responded strongly to heat stress: a total of 10 PeuHsfs (-A2, -A5b/c, -A6a, -A6b, -A7a, -A9, -B1, -B2b, -B5b and -C1) were promptly induced by heat stress at 1 h. Aside from heat stress, most of the PeuHsfs (e.g. PeuHsf-A4b, -A6a, -B1, -B3, -B5a, -B5b and -C1) were responsive to both drought and salinity stresses. In contrast to this strong responsiveness to heat, drought and salt stresses, only three PeuHsfs (-A6a, -B4c and -B5a) were promptly induced after 1 h of cold stress. Notably, PeuHsf-A6a was promptly and significantly induced by all the treatments tested in this study (Fig. 6b).

Expression divergence between paralogous pairs

The expression patterns of duplicated PeuHsf genes have diverged during evolution. As indicated in Figs 6 and 7, most PeuHsfs in paralogous pairs showed different expression patterns. Among the 10 PeuHsf paralogous pairs, the genes in two pairs (PeuHsf-A1a/PeuHsf-A1d and PeuHsf-A5b/PeuHsf-A5c) could not be distinguished by qRT-PCR in our study because of the high sequence similarity within each pair. The genes in three pairs (PeuHsf-A6a/PeuHsf-A6b, PeuHsf-B4a/PeuHsf-B4c and PeuHsf-B4b/PeuHsf-B4d) showed similar expression patterns across different tissues (Fig. 6a) but differed markedly under various stresses (Fig. 6b). In contrast, the genes in the remaining five PeuHsf paralogous pairs showed different patterns both across tissues and under stress conditions (Fig. 6).

Figure 7

Stress response models of the Hsf family in P. euphratica (a) and A. thaliana (b). (a) The results are based on our qRT-PCR data in Fig. 6b. (b) The results are based on publicly available microarray data (see Supplementary Fig. S5).

Divergence of expression patterns between P. euphratica and A. thaliana

To reveal the potential mechanism underlying the outstanding stress tolerance of P. euphratica, the stress response profiles of PeuHsfs were compared with those of AtHsfs (see Supplementary Fig. S5). As shown in Fig. 7, the responses of Hsf genes in the two species differed markedly. In Arabidopsis, in addition to the 9 Hsfs that responded to heat stress, 15, 6, 9 and 8 Hsfs were highly induced by cold, drought, salt and ABA treatments, respectively (Fig. 7b). In P. euphratica, by contrast, more Hsfs were induced by heat, salt and drought stresses: a total of 21 PeuHsfs were induced by heat stress, and 22 and 19 PeuHsfs were induced by salt and drought stresses, respectively. A total of 10 and 9 PeuHsfs were induced by cold and ABA treatments, respectively. The orthologous genes in the two species showed divergent expression patterns. For example, AtHsfA2 was induced by heat, cold and salt stresses, whereas PeuHsf-A2 was induced by heat, drought, salt and ABA treatments. In Arabidopsis, three class B AtHsfs (-B1, -B2a and -B2b) were induced by all of the heat, cold, drought, salt and ABA treatments, while in P. euphratica only two class A PeuHsfs (-A6a and -A9) were induced by all five treatments (Fig. 7).

Discussion

As a class of ubiquitous transcription factors, the Hsf gene family plays an important role in various biological processes including plant development and stress tolerance. In this study, the evolution and divergence of the Hsf genes in desert poplar, P. euphratica, were studied to identify specific and subtle changes in these genes resulting in subfunctionalization or possible neofunctionalization.

After a comprehensive analysis, a total of 32 Hsf genes were identified in P. euphratica (Table 1). The Hsf family in P. euphratica was larger than that of other Salicaceae species such as P. trichocarpa and S. suchowensis13,19. Phylogenetic analysis revealed that the Hsf genes in P. euphratica followed a distribution pattern similar to that in other plant species (Fig. 1). Based on the previous classification, the PeuHsfs were classified into three main classes (A, B and C). Classes A and B were further divided into subclasses A1–A9 and B1–B5. Unlike P. trichocarpa, P. euphratica is native to desert areas and has great tolerance to drought and salinity20. Compared with the stress-sensitive poplar P. trichocarpa, Hsf subclasses A5, A7 and B2 were larger in P. euphratica (Fig. 1). Moreover, the members of these subclasses had abundant stress-related cis-acting elements (Fig. 5) and showed pronounced stress responses (Fig. 6), which might contribute to the outstanding stress tolerance of P. euphratica.

It has been reported that intron densities are significantly lower in genes whose expression is rapidly induced in response to stresses. Introns can delay transcript production in at least three ways: 1) splicing, 2) the added length of the nascent transcript, or 3) the added energetic cost of the increased transcript length21. Our analysis shows that most PeuHsfs contain one or two introns, which may favor their prompt induction under stress conditions. Interestingly, HsfA2 in P. euphratica has three introns, while its orthologs in P. trichocarpa and S. suchowensis have two and one intron(s), respectively13,19.

Gene duplication plays a critical role in the generation of new genes, increasing the number of these genes and dispersing them in the genome22. In this study, the expansion of the PeuHsf family was primarily due to WGD events. The duplication events in the PeuHsf gene family might lead to neofunctionalization, subfunctionalization or nonfunctionalization. Most duplicated PeuHsfs display different spatial expression patterns or stress responses (Fig. 6), indicating that PeuHsfs are undergoing subfunctionalization or neofunctionalization. Because Salicaceae plants are dioecious and cross-pollinated, gene flow in these plants is stronger than in self-pollinated plants such as Arabidopsis, which provides opportunities for subfunctionalization or neofunctionalization of duplicated genes. Although P. euphratica, P. trichocarpa and S. suchowensis all belong to the Salicaceae family, the genomes of P. euphratica and P. trichocarpa are considerably larger than that of S. suchowensis, which might be the reason for the broad range of Ka/Ks values in P. euphratica and P. trichocarpa (Fig. 3).

Over the long course of evolution, selection pressure plays an important role in shaping gene families, resulting in different evolutionary patterns among gene families or even within one gene family23. Our study shows that the Ka/Ks ratios in the PeuHsf HR-A/B regions were smaller than those in the DBDs (especially in the paralogous pairs PeuHsf-A1c/PeuHsf-A1d and PeuHsf-B2c/PeuHsf-B2d) (Fig. 4), indicating that the HR-A/B region is relatively conserved in the PeuHsf family. As the recognition and binding region, the DBD plays a crucial role in determining the specific downstream HS-inducible genes. The relatively high variation in the DBDs between PeuHsf paralogous pairs may enhance the binding specificity and diversity of the P. euphratica Hsf family.

Among the various cis-acting elements in the promoter regions of PeuHsfs, HSE was the most abundant (Fig. 5). Of the 32 members of the PeuHsf family, 27 contained a total of 73 HSEs in their promoter regions, giving a high enrichment level (2.7). Based on our expression analysis (Fig. 6), 12 PeuHsfs (-A2, -A6a, -A6b, -A7a, -A7b, -A7c, -B1, -B2a, -B2b, -B2d, -B5b and -C1) were highly induced by heat stress; their promoter regions contained 37 HSEs, a higher enrichment level (3.1) than that of all HSE-containing genes (2.7). These results suggest a positive correlation between cis-acting elements and gene expression patterns in the PeuHsf gene family.

In P. euphratica, approximately 60% or more of the Hsf genes were induced by heat, drought or salt stress (21, 19 and 22 PeuHsfs, respectively), considerably more than in Arabidopsis (9, 6 and 9 AtHsfs, respectively) (Fig. 7). These expression patterns of PeuHsfs might reflect an adaptive mechanism that P. euphratica has evolved to cope with the frequent heat, drought and salt stresses of desert areas. In tomato and Arabidopsis, HsfA2 has a high potential to activate the expression of Hsp genes. It has been identified as the dominant Hsf and accumulates strongly under heat stress7. In P. euphratica, PeuHsfA2 was also highly induced by heat stress, reaching ~500-fold at 1 h after heat stress (Fig. 6). This result implies that HsfA2 might also be the dominant Hsf in P. euphratica, as in tomato and Arabidopsis.

The members of the Hsf family show different stress responses across plant species. P. euphratica HsfA2 was responsive to heat, drought, salt and ABA treatments (Figs 6 and 7), while Arabidopsis HsfA2 was responsive to heat, cold and salt stresses (Fig. 7, Supplementary Fig. S5), indicating that HsfA2 might be involved in an ABA-independent pathway in Arabidopsis but an ABA-dependent pathway in P. euphratica. Notably, some heat-insensitive Hsfs are involved in tolerance to other stresses. For instance, overexpression of wheat HsfA4a enhanced Cd tolerance but did not improve thermotolerance in transgenic rice24. In Arabidopsis, HsfA4a did not respond to heat, but could be induced by cold, drought and salt stresses (Fig. 7b). Consistently, overexpression of a dominant negative mutant form of AtHsfA4a increased oxidative stress sensitivity in transgenic Arabidopsis25. In contrast, HsfA4a in P. euphratica was induced not only by drought and salt stresses but also by heat stress. Moreover, the other members of the A4 subclass (PeuHsfA4b and PeuHsfA4c) were also induced by both heat and other stresses (Fig. 7a). This implies that these Hsf genes show broad stress response patterns in the desert poplar.

It has been shown that HsfA1 and HsfA2 can form a superactivator complex that activates downstream genes more strongly than the individual factors10, although overexpression of AtHsfA2 also improved thermotolerance in the quadruple knockout Arabidopsis mutant (athsfA1a, b, d, e)15. In Arabidopsis, only one member of the A1 subclass (A1d) responded to multiple stresses (heat, cold, salt and ABA) (Fig. 7b), and it might cooperate with HsfA2 under these conditions. In contrast, more A1 members in P. euphratica (A1a/d, A1b and A1c) responded to heat, drought and salt stresses, similar to the expression pattern of PeuHsfA2 (responsive to heat, drought, salt and ABA treatments) (Fig. 7a). The similar expression patterns of PeuHsfA1s and PeuHsfA2 may allow more superactivator complexes to form under heat, drought and salt stresses in P. euphratica, which might be one reason for the high stress tolerance of P. euphratica in arid and saline environments.

In conclusion, the Hsf gene family has been identified and analyzed in detail in the desert poplar, P. euphratica. The analysis suggests that the Hsf gene family in P. euphratica has diverged during evolution and is widely responsive to various abiotic stresses. These findings help explain the remarkable adaptability of P. euphratica to severe desert environments and provide a basis for future functional analysis of PeuHsfs.

Methods

Identification of Hsfs in P. euphratica and other plant species

For Hsf identification, the Hsf domain (PF00447) from the Pfam database (http://pfam.xfam.org/) was used to search the P. euphratica genome (http://me.lzu.edu.cn/stpd/#main_tabs=0)20. In addition, the Hsf protein sequences of A. thaliana26 and P. trichocarpa19 were used as queries in a BLASTP search against the P. euphratica genome. The Simple Modular Architecture Research Tool (SMART, http://smart.embl-heidelberg.de/) was used to analyze the DBD and the coiled-coil structure.

Sequence analysis

Full-length Hsf protein sequences from four species, the model species A. thaliana and three Salicaceae species (P. euphratica (this study), P. trichocarpa19 and Salix suchowensis13), were aligned using Clustal X2.127, and the phylogenetic tree was constructed using MEGA 528 with the neighbor-joining (NJ) method and 1,000 bootstrap replicates. The P. euphratica Hsfs (PeuHsfs) were named according to their subfamily classification and their phylogenetic relationships with the AtHsfs and PtHsfs (see Supplementary Table S1). The gene structures, including exons and introns, were displayed using the Gene Structure Display Server (GSDS, http://gsds.cbi.pku.edu.cn/index.php). The conserved motifs of PeuHsfs were defined by Multiple Em for Motif Elicitation (MEME, http://meme-suite.org/). The parameters of the PeuHsf proteins (e.g. molecular weight, isoelectric point, instability index, aliphatic index and GRAVY) were obtained from the ExPASy database (https://www.expasy.org/). The instability index provides an estimate of the stability of the protein in a test tube. A protein whose instability index is smaller than 40 is predicted to be stable, whereas a value above 40 predicts that the protein may be unstable. The aliphatic index of a protein is the relative volume occupied by the amino acids with an aliphatic side chain (alanine, valine, isoleucine and leucine). The GRAVY value of a protein or peptide is calculated by summing the hydropathy values of all amino acid residues and dividing by the number of residues in the sequence. An increasingly positive score indicates greater hydrophobicity.
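The GRAVY and aliphatic index definitions above can be illustrated with a minimal Python sketch. It is not the ExPASy implementation: it assumes the standard Kyte-Doolittle hydropathy scale and the usual aliphatic index formula, X(Ala) + 2.9 × X(Val) + 3.9 × (X(Ile) + X(Leu)), with X in mole percent. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values, the scale commonly used for GRAVY.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Sum of residue hydropathy values divided by sequence length."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Relative volume occupied by aliphatic side chains (A, V, I, L)."""
    seq = seq.upper()
    n = len(seq)

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / n

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def is_predicted_stable(instability_index: float) -> bool:
    """Threshold stated in the text: an index below 40 predicts a stable protein."""
    return instability_index < 40
```

A positive `gravy` result corresponds to a more hydrophobic protein and a negative one to a more hydrophilic protein.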

Chromosome location and duplication analysis

The PeuHsf genes were mapped onto P. euphratica scaffolds based on the publicly available information in the P. euphratica genome database. For Hsf genes in A. thaliana and P. trichocarpa, the duplication events were obtained from the Plant Genome Duplication Database (http://chibba.agtec.uga.edu/duplication/). For Hsf genes in P. euphratica and S. suchowensis, the duplication events were identified from duplicated blocks detected by synteny analysis using MicroSyn29. A duplicated block was defined as a region where three or more conserved homologs were located within 15 genes up- and down-stream between scaffolds30.
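The duplicated-block criterion can be sketched as a simple window check. This is a simplified illustration, not the MicroSyn algorithm. It assumes each scaffold is given as an ordered list of gene identifiers, that homology is supplied as a set of unordered gene pairs, and that the two anchor Hsf genes themselves are not counted among the conserved homologs. All names are hypothetical.

```python
from typing import FrozenSet, List, Sequence, Set


def flanking_genes(order: Sequence[str], gene: str, window: int = 15) -> List[str]:
    """Return up to `window` genes on each side of `gene`, excluding `gene` itself."""
    genes = list(order)
    i = genes.index(gene)
    return genes[max(0, i - window):i] + genes[i + 1:i + 1 + window]


def is_duplicated_block(order_a: Sequence[str], gene_a: str,
                        order_b: Sequence[str], gene_b: str,
                        homologs: Set[FrozenSet[str]],
                        window: int = 15, min_conserved: int = 3) -> bool:
    """True if at least `min_conserved` flanking genes around gene_a
    have a homolog among the flanking genes around gene_b."""
    flank_a = flanking_genes(order_a, gene_a, window)
    flank_b = flanking_genes(order_b, gene_b, window)
    conserved = sum(
        1 for g in flank_a
        if any(frozenset((g, h)) in homologs for h in flank_b)
    )
    return conserved >= min_conserved


# Toy example with invented gene identifiers.
scaffold_a = ["a1", "a2", "hsfA", "a3", "a4"]
scaffold_b = ["b1", "hsfB", "b2", "b3", "b4"]
pairs = {frozenset(("a1", "b1")), frozenset(("a3", "b2")), frozenset(("a4", "b4"))}
block_found = is_duplicated_block(scaffold_a, "hsfA", scaffold_b, "hsfB", pairs)
```

Counting each flanking gene at most once keeps a cluster of tandem copies on one scaffold from inflating the number of conserved homologs.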

In silico analysis of cis-acting elements of PeuHsfs

The cis-acting elements in the promoter regions (2 kb upstream of the translation initiation site) of the PeuHsf genes were identified using PlantCARE31.

Calculation of Ka/Ks values

The paralogous pairs were aligned using Clustal X2.1 and analyzed using PAL2NAL (http://www.bork.embl.de/pal2nal/) to calculate the nonsynonymous (Ka) and synonymous (Ks) substitution rates. The divergence time (T) was calculated as T = Ks/(2 × 9.1 × 10^-9) × 10^-6 million years ago (MYA), where 9.1 × 10^-9 is the synonymous substitution rate per site per year used for Populus22.
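As a worked example of the divergence-time formula, the following minimal sketch converts a Ks value into an age in million years. The function name and the example Ks value are hypothetical. The rate constant is the one given in the text.

```python
POPULUS_RATE = 9.1e-9  # synonymous substitutions per site per year, from the text


def divergence_time_mya(ks: float, rate: float = POPULUS_RATE) -> float:
    """T = Ks / (2 * rate), converted from years to million years."""
    years = ks / (2.0 * rate)
    return years / 1e6


# Hypothetical paralogous pair with Ks = 0.25: roughly 13.7 MYA.
example_age = divergence_time_mya(0.25)
```

The factor of 2 reflects that substitutions accumulate independently along both lineages after the duplication.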

Plant growth conditions and treatments

P. euphratica plants were grown in the greenhouse at the Chinese Academy of Forestry (Beijing, China). For tissue-specific analysis, shoot tip (ST), young leaf (YL), mature leaf (ML), stem (S), young root (YR), old root (OR), female catkin (FC) and male catkin (MC) were collected from P. euphratica. For the abiotic stress treatments, 4-month-old P. euphratica seedlings were water-cultured in Hoagland solution32. The seedlings were treated with 25% (w/v) polyethylene glycol 6000 (PEG6000, for drought stress), 300 mM NaCl (for salt stress), 42 °C (for heat stress), 4 °C (for cold stress), or 100 μM abscisic acid (ABA). The dosages of the abiotic stress and hormone treatments were based on treatments used in poplar19,33. Considering the outstanding salt tolerance of P. euphratica, the concentration of NaCl was increased from 150 mM to 300 mM. Leaves pooled from eight individuals were collected at four time points (0, 1, 6 and 12 h) during the treatments and frozen immediately in liquid nitrogen until use. Three biological replicates were performed for each sample.

The expression data for Arabidopsis Hsf genes in response to various abiotic stresses and hormone treatment were downloaded from the AtGenExpress database (http://jsp.weigelworld.org/expviz/expviz.jsp). The Arabidopsis seedlings were treated at 38 °C (for heat stress, with recovery at 25 °C) or 4 °C (for cold stress), or with 150 mM NaCl (for salt stress), 300 mM mannitol (for osmotic stress), 1.5 μg/ml bleomycin + 22 μg/ml mitomycin (for genotoxic stress), 10 μM methyl viologen (for oxidative stress), puncturing with pins (for wounding stress), or 10 μM ABA. For drought stress, the Arabidopsis seedlings were exposed to a dry air stream (clean bench) for 15 min until 10% of fresh weight was lost, and then incubated in closed vessels in the climate chamber34. To compare the responses of Hsf genes to abiotic stresses and hormone treatment between P. euphratica and A. thaliana, significantly induced genes were selected using the following criteria: P < 0.05 and log2(fold change) ≥ 1.
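The selection criteria can be expressed as a small filter. The sketch below is illustrative only. It assumes expression results are held as a mapping from gene name to a (log2 fold change, P value) pair, and the gene names and numbers are invented placeholders rather than data from this study.

```python
from typing import Dict, Set, Tuple


def is_significantly_induced(log2_fold_change: float, p_value: float) -> bool:
    """Criteria from the text: P < 0.05 and log2(fold change) >= 1."""
    return p_value < 0.05 and log2_fold_change >= 1.0


def induced_genes(results: Dict[str, Tuple[float, float]]) -> Set[str]:
    """Return the names of genes that pass both thresholds."""
    return {
        gene for gene, (log2_fc, p) in results.items()
        if is_significantly_induced(log2_fc, p)
    }


# Placeholder values for one hypothetical treatment.
heat_results = {
    "geneX": (3.2, 0.001),   # induced
    "geneY": (0.8, 0.010),   # significant, but below the 2-fold threshold
    "geneZ": (2.1, 0.200),   # large change, but not significant
}
heat_induced = induced_genes(heat_results)
```

Applying the same filter to each treatment in both species gives sets of induced genes that can be compared directly, for example with set intersection.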

RNA extraction and qRT-PCR analysis

Total RNA was isolated from P. euphratica materials using the RNeasy Plant Mini Kit (Qiagen) according to the manufacturer’s protocol. Approximately 4 μg of total RNA was reverse transcribed using the SuperScript III reverse transcription kit (Invitrogen) to generate cDNA. qRT-PCR was performed on the LightCycler® 480 Real Time PCR System (Roche) using the SYBR Premix Ex Taq™ Kit (Takara) according to the manufacturer’s procedure. The PeuActin gene was used as an internal control. The final threshold cycle (Ct) value for each sample was the mean of four technical replicates. All primers used in this study are listed in Supplementary Table S3.
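The text does not state which model was used to convert Ct values into relative expression. As an illustration only, the sketch below assumes the widely used 2^-ΔΔCt approach, with the internal control (here PeuActin) as the reference gene and the untreated sample as the calibrator. The function name and all Ct values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def relative_expression(target_ct_treated: Sequence[float],
                        reference_ct_treated: Sequence[float],
                        target_ct_control: Sequence[float],
                        reference_ct_control: Sequence[float]) -> float:
    """Fold change by the 2^-ddCt method (an assumption, not stated in the text).

    Each argument holds the technical-replicate Ct values for one
    gene/sample combination; replicates are averaged first.
    """
    delta_treated = mean(target_ct_treated) - mean(reference_ct_treated)
    delta_control = mean(target_ct_control) - mean(reference_ct_control)
    delta_delta_ct = delta_treated - delta_control
    return 2 ** (-delta_delta_ct)


# Invented Ct values with four technical replicates each.
fold_change = relative_expression(
    target_ct_treated=[22.0, 22.0, 22.0, 22.0],
    reference_ct_treated=[18.0, 18.0, 18.0, 18.0],
    target_ct_control=[24.0, 24.0, 24.0, 24.0],
    reference_ct_control=[18.0, 18.0, 18.0, 18.0],
)
```

In this invented example the target crosses the threshold two cycles earlier after treatment while the reference is unchanged, which corresponds to a four-fold induction under the assumed model.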

Statistical analyses

Statistical analyses of the qRT-PCR data were carried out using SPSS 16.0 software (SPSS Inc, Chicago, IL, USA). Data were compared using Student’s t test. Differences were considered significant at P < 0.05.

Additional Information

How to cite this article: Zhang, J. et al. Molecular evolution and expression divergence of the Populus euphratica Hsf genes provide insight into the stress acclimation of desert poplar. Sci. Rep. 6, 30050; doi: 10.1038/srep30050 (2016).