Genome-wide identification and analysis of the heat shock transcription factor family in moso bamboo (Phyllostachys edulis)

Heat shock transcription factors (HSFs) are central elements in the regulatory network that controls plant heat stress response. They are involved in multiple transcriptional regulatory pathways and play important roles in heat stress signaling and responses to a variety of other stresses. We identified 41 members of the HSF gene family in moso bamboo, which were distributed non-uniformly across its 19 chromosomes. Phylogenetic analysis showed that the moso bamboo HSF genes could be divided into three major subfamilies; HSFs from the same subfamily shared relatively conserved gene structures and sequences and encoded similar amino acids. All HSF genes contained HSF signature domains. Subcellular localization prediction indicated that about 80% of the HSF proteins were located in the nucleus, consistent with the results of GO enrichment analysis. A large number of stress response–associated cis-regulatory elements were identified in the HSF upstream promoter sequences. Synteny analysis indicated that the HSFs in the moso bamboo genome had greater collinearity with those of rice and maize than with those of Arabidopsis and pepper. Numerous segmental duplicates were found in the moso bamboo HSF gene family. Transcriptome data indicated that the expression of a number of PeHsfs differed in response to exogenous gibberellin (GA) and naphthalene acetic acid (NAA). A number of HSF genes were highly expressed in the panicles and in young shoots, suggesting that they may have functions in reproductive growth and the early development of rapidly-growing shoots. This study provides fundamental information on members of the bamboo HSF gene family and lays a foundation for further study of their biological functions in the regulation of plant responses to adversity.

www.nature.com/scientificreports/ domain (also known as the HSF domain) consists of three α-helix bundles and four strands of reverse-parallel β-folded layers that form a tight sphere. This structure ensures the specific binding of HSFs to HSEs in target gene promoters, thereby regulating the transcription and expression of heat shock genes 9 . The OD consists of two hydrophobic 7-peptide repeat regions, HR-A and HR-B, and HSFs are divided into three subfamilies (HsfA-C) based on differences in the number of amino acids inserted between HR-A and HR-B. HsfAs and HsfCs have 21 and 7 amino acid residues between HR-A and HR-B, respectively, whereas HsfBs have none 10,11 . The nuclear localization signal (NLS) is composed of basic amino acids and directs the transport of HSFs into the nucleus 12 , whereas the leucine-rich nuclear export signal (NES) facilitates their export from the nucleus to the cytoplasm 13 . The CTAD is the least conserved region and contains AHA motifs that consist of aromatic, large hydrophobic, and acidic amino acid residues. The AHA sequence is specific to the class A subfamily, and it is generally required for transcriptional activation. HsfBs and HsfCs do not contain AHA motifs and thus lack a transcriptional activation function 14 .
The first plant HSF gene was cloned from tomato in 1990. As more completed plant genome sequences became available, HSF family members were identified in Arabidopsis (21 HSF genes) 10 , tomato (26) 15 , cabbage (52) 16 , pepper (25) 17 , poplar (31) 18 , maize (25) 19 , and rice (25) 20 . Detailed functional studies of individual HSF subfamilies have been performed in Arabidopsis, tomato, and other model plants, and experiments have confirmed that HSFs have important roles in plant responses to heat and other abiotic stresses. Arabidopsis HsfAs were shown to be transcriptional activators, and AtHsfA1a regulates the synthesis of heat shock proteins (Hsps) during the heat-response phase to minimize plant growth damage at high temperatures 21 . The overexpression of Arabidopsis AtHsfA2 22 and tomato SlHsfA3 23 significantly increased both basal and acquired heat tolerance. The expression of AtHsfA6a increased nearly 150-fold under salt stress compared with control conditions 24 . By contrast, HsfB1 and HsfB2b from subfamily B were shown to be transcriptional repressors, and rice OsHsfB2b was found to significantly reduce plant salt tolerance after NaCl treatment 20,25 . Under drought stress, plants with upregulated expression of AtHsfA1b showed significant increases in yield and harvest index. Plants that overexpressed HsfA1b also had enhanced resistance to pathogenic bacteria, indicating that this gene is involved in the regulation of both abiotic and biotic stress resistance 26 .
Moso bamboo (Phyllostachys edulis) is a non-timber forestry species from the subfamily Bambusoideae of the family Poaceae and is native to China 27 . It has a wide distribution area and a high economic value, and it plays an important role in soil and water conservation, soil carbon sequestration and oxygen dynamics, and climate regulation 28 . However, water availability in moso bamboo forests is entirely dependent on natural precipitation, and its two most important water-demanding growth stages, 'asparagus gestation' in the autumn and 'asparagus emergence' in the spring, are most sensitive to heat and drought stress 28,29 . High temperatures and drought not only damage moso bamboo growth and development, thereby reducing the yield of bamboo shoots, but also reduce the carbon stocks of moso bamboo 30 . The intensification of global climate change is predicted to significantly alter the level and quality of ecological services to forests 31 , which will aggravate the effects of water stress on moso bamboo and significantly limit its yield.
To date, there have been no studies of the HSF transcription factor family in moso bamboo. Here, we used a bioinformatics approach to identify and characterize HSF genes and their encoded proteins in the moso bamboo genome 32,33 ; we analyzed their evolutionary relationships, conserved domains, gene structures, promoter elements, duplication patterns, tissue-specific expression, and responses to external hormone treatments. Our results lay the groundwork for further study of heat stress response mechanisms in bamboo.

Materials and methods
Identification and sequence analysis of HSF proteins from moso bamboo. Genomic data from moso bamboo were downloaded from the P. edulis genome database (http:// parrot. genom ics. cn/ gigadb/ pub/ 10. 5524/ 100001_ 101000/ 100,498/ assem bly_ fasta/ Bamboo. HIC. genome. fasta. gz). A hidden Markov model of the HSF DBD (PF00447) was obtained from the Pfam database (http:// pfam. xfam. org/) and used as the seed model for an HMMER3 search (http:// hmmer. janel ia. org/) of the downloaded bamboo protein sequence data (E ≤ 10 −20 ) 34 , and redundant genes were removed to produce a set of preliminary HSF candidate sequences. To verify that these candidates were HSFs, we used the normal mode of SMART (http:// smart. embl-heide lberg. de/) 35 sequence analysis with default search parameters and a Batch search (E < 0.001) of the Pfam (http:// pfam. xfam. org) 36 database to filter out sequences that lacked complete HSF_DNA-bind (PF00447.17) domains. The confirmed HSF genes were renamed according to their positions on the moso bamboo chromosomes. Subcellular localization predictions were generated using CELLO with default parameters (http:// cello. life. nctu. edu. tw/) 37 , and the ExPASy ProtParam tool (https:// web. expasy. org/ protp aram/) 38 was used with default parameters to predict protein physicochemical parameters such as molecular weight (MW) and isoelectric point (pI).
Sequence alignment and phylogenetic tree construction. Whole genome information for Arabidopsis and rice was downloaded from the TAIR10 database (http:// www. arabi dopsis. org/ index. jsp) and the Rice Genome Annotation Project database (http:// rice. plant biolo gy. msu. edu). Maize and pepper genomic data were downloaded from the Ensembl database (http:// asia. ensem bl. org/ index. html). Twenty-one Arabidopsis HSF proteins and 25 rice HSF proteins were identified from HMMER3 searches of the corresponding local protein databases 34 . The Arabidopsis and rice HSF sequences were combined with those from moso bamboo, a multiple protein sequence alignment was produced with ClustalX 2.0 (http:// www. clust al. org/ clust al2/). The comparison parameter is the multiple comparison mode (other parameters are default), and the resulting alignment was used to construct a maximum likelihood (ML) phylogenetic tree in MEGA 7.0 with 1000 bootstrap replicates 39 . Intraspecific classification of the moso bamboo HSF sequences was based on this interspecific phylogenetic tree. www.nature.com/scientificreports/ The amino acid sequences of conserved domains were compared and edited using Jalview software (V2.10.5) (http:// www. jalvi ew. org/) 40 , and the Jalview output was submitted to JPred (http:// www. compb io. dundee. ac. uk/ jabaws) for protein secondary structure prediction using default parameters 41 . Gene structures, motif identification, and conserved domains. The intron-exon distributions of the moso bamboo HSF genes were obtained using GFF annotation files from the moso bamboo genome. Conserved amino acid sequences of HSF proteins were analyzed using the online MEME tool (http:// meme-suite. org/) 42 . MEME analysis parameters included a minimum width ≥ 6, a maximum width of 50, and a motif number of 10; all other parameters were set to default values. Conserved domains in the HSFs were predicted using the NCBI Conserved Domain Database (https:// www. ncbi. nlm. nih. gov/ cdd/) (E-value < 0.001, other parameters set to defaults), and DOG 2.0 (http:// dog. biocu ckoo. org) was used for protein structure visualization with default parameters 43 . Chromosomal locations, genomic duplications, and Ka/Ks ratios. Chromosome lengths and gene locations were obtained from the moso genome annotation file, and MG2C v.2 (http:// mg2c. iask. in/ mg2c_ v2.0/) was used to visualize the gene locations on chromosomes 44 . The moso bamboo protein sequences were aligned to one another or to the protein sequences from Arabidopsis, rice, maize, or pepper using TBtools software 45 . MCScanX 46 was used with default parameters to identify gene duplication events and syntenic relationships among the HSF proteins, and the results were visualized using Circos and Dual Synteny Plot in TBtools 45 .
For Ka/Ks analysis, thirteen homologous gene pairs were identified by BLASTn using two criteria: (1) > 75% sequence similarity and (2) an alignable region > 75% of the length of the longer sequence 47 . KaKs_Calculator2.0 was used to calculate the synonymous substitution rate (Ks), nonsynonymous substitution rate (Ka), and Ka/Ks ratio between homologous gene pairs 48 . Evolutionary divergence times within the bamboo HSF gene family were calculated using the bamboo-specific divergence time formula T = Ks/2λ (where λ = 6.5 × 10 −9 ) 32 .
Three-dimensional (3D) structural modeling of HSF family proteins. The PDB database (http:// www. rcsb. org/) was used to retrieve protein models homologous to those of the HSF proteins. Swiss Model (https:// www. swiss model. expasy. org/) was then used with default parameters to predict the protein tertiary structures by homology modeling, and the quality of the resulting models was assessed using SAVES v.5.0 (http:// servi cesn. mbi. ucla. edu/ SAVES/).

Protein interaction network predictions and GO enrichment analyses. The HSF protein
sequences were uploaded to the STRING database (https:// string-db. org/) for node comparison, and relationships among important proteins were predicted based on rice protein interactions. Cytoscape (V3.7.1) was used to visualize the resulting network 50 .
GOATOOLS (http:// github. com/ tangh aibao/ GOato ols) 51 was used to assign GO annotations to HSFs, and Fisher's exact test was used to identify biological functions enriched in the PeHsfs relative to the full GO database. A Bonferroni 52 multiple testing correction was used to minimize false positives, and functions were considered to be significantly enriched when their Bonferroni-corrected P-values (Padjust) were < 0.05.

Results
Identification of HSF genes in moso bamboo. Forty-four putative HSF candidate genes were obtained from an HMMER3 search of the bamboo protein database using the plant HSF-type DBD model (Pfam PF00447) with an E-value threshold of ≤ 10 −20 . We removed redundant genes and verified the presence of conserved domains and motifs to arrive at a final set of 41 HSF family members. The genes were renamed PeHsf01-PeHsf41 based on their positions on the chromosomal scaffolds ( Table 1).
The predicted physicochemical properties of the amino acid sequences showed that the 41 HSF genes encoded proteins containing 159 (PeHsf03) to 512 (PeHsf30) amino acids and their molecular weights ranged from 17.49 (PeHsf03) to 56.33 kDa (PeHsf32). Approximately 90% of the HSF proteins had molecular weights of 30-50 kDa. Their predicted isoelectric points (pI) ranged from 4.72 (PeHsf04) to 9.31 (PeHsf39). Instability index calculations predicted that 39 (95%) of the HSF proteins were stable in vitro, with the exceptions of PeHsf05 and PeHsf21, which had instability indices of 32.96 and 37.27, respectively. Aliphatic amino acid indices showed that the thermal stability of the proteins ranged from 57.01 to 83. 25 www.nature.com/scientificreports/ were relatively minor. The grand average of hydropathicity (GRAVY) scores of all HSF proteins were negative, indicating that they were hydrophilic proteins. CELLO subcellular localization predictions suggested that about 80% of the HSF proteins were located in the nucleus.

Phylogenetic analysis and sequence alignment of the PeHsf proteins.
To clarify the evolutionary relationships among the HSF proteins, amino acid sequences of 41 moso bamboo HSFs, 25 rice HSFs, and 21 Arabidopsis HSFs were used to construct an ML phylogenetic tree 53 ( Fig. 1 and Supplementary Table S1). Based on well-established Arabidopsis and rice HSF family classifications 54 , the HSF proteins were divided into three major subfamilies, HsfA (blue), HsfB (green), and HsfC (yellow). The HsfA subfamily was the largest, with 48 members across the three species, whereas the HsfC subfamily was the smallest, with 12 members. There was only one Arabidopsis HsfC family member (AtHsfC1), suggesting that the C subfamily had expanded in monocots. The interspecific phylogenetic tree indicated that the PeHsfs included members of all three subfamilies: 20 HsfAs, 14 HsfBs, and 7 HsfCs. A multiple sequence alignment of the 41 PeHsf proteins was generated to investigate the presence and locations of conserved protein domains. All HSF family members contained a highly conserved DBD at the amino terminus that consisted of approximately 100 amino acids (Fig. 2). The protein secondary structure was predicted to contain three α-helix bundles (α1-α3) and four reverse-parallel β-folds (β1-β4). The DBD domain can specifically recognize and precisely localize heat stress elements, and similar phenomena are found in other plants.
Gene structures, motif identification, and conserved domain analysis. The number of introns in each moso bamboo HSF gene ranged from one to three (Fig. 3). Approximately 75% of the genes contained a single intron, whereas six genes contained two introns and four contained three introns. With the exception of PeHsf31, all HsfCs contained only one intron. Structures of HSF genes from the same subfamily were generally similar.
We identified up to 10 highly conserved motifs in each HSF protein using MEME (Fig. 4). All HSFs contained motifs 1-3, which constituted the most highly conserved portions of the DBD. The relative positions of the motifs were similar for most sequences, with the exception of the HsfB PeHsf03, which lacked motif 2. In addition, both class B and C subfamily members did not contain motif 6, which is associated with transcriptional activation.
The structures of the PeHsf proteins were further characterized using the NCBI Conserved Domain Database (Fig. 5). All PeHsf proteins contained highly conserved DBDs at the N terminus between amino acids 7 and 155, the presence of which served as confirmation of their identity as HSF proteins. Some proteins contained additional conserved domains. For example, PeHsf10, PeHsf27, and PeHsf41 contained a bZIP-like transcription factor domain, and PeHsf17 and PeHsf36 contained a PRK10246 domain. Therefore, multiple functions may have evolved in some members of the moso bamboo HSF family.
Gene duplication events are prevalent in all species; they can give rise to new functional genes and drive species evolution 56 . We therefore used MCScanX genome synteny analysis to explore duplications within the moso bamboo HSF gene family (Fig. 7). Twenty-seven gene pairs appeared to have arisen from segmental duplications, and only PeHsf07 and PeHsf31 had no duplicates in the genome. Previous studies have shown that two or more highly similar genes located in close proximity to one another are likely to be tandem duplicates 57 . As shown in Fig. 6, two such pairs, PeHsf01-PeHsf02 and PeHsf28-PeHsf29, were present on scaffolds 3 and 18, respectively. www.nature.com/scientificreports/ To further investigate gene duplications in the HSF gene family, we performed genome-to-genome synteny analysis between moso bamboo and four representative plants (Supplementary Table S3). These included two dicots, Arabidopsis and Capsicum annuum (pepper) (Fig. 8a), and two monocots, O. sativa (rice) and Zea mays (maize) (Fig. 8b). Four, 3, 58, and 52 moso bamboo HSF genes were syntenic with those of Arabidopsis, pepper, rice, and maize, respectively. Furthermore, the rice HSF genes all had corresponding orthologs in moso bamboo, and most of them had more than two orthologs, suggesting that moso bamboo has undergone additional wholegenome duplication events during its evolution.
To investigate evolutionary constraints and selection pressures on the PeHsf genes, we calculated Ka, Ks, and Ka/Ks for 13 homologous PeHsf gene pairs ( Table 2). The synonymous substitution rate (Ks) represents the background base substitution rate, and Ks values can therefore be used to predict the timing of whole-genome duplication events. The Ks values of the PeHsf gene pairs ranged from 0.1023 to 1.0833, indicating that a largescale PeHsf gene duplication event occurred as early as 83.33 million years ago (MYA) and as recently as 7.86 MYA. The Ka/Ks values of the gene pairs were all less than 1.0, and these genes may have undergone strong purifying selection during evolution.
Promoter cis-element analysis of PeHsf genes. cis-acting elements are non-coding DNA sequences in gene promoters that regulate the transcription of their associated genes. We identified ten cis-acting elements in the 1500 base pairs upstream of the PeHsf genes using PlantCARE software (Fig. 9). Most were stress-response elements, and all promoters except that of PeHsf09 contained at least one MYB drought stress response element. The next most common element was the ABRE element, which is involved in abscisic acid responses. Other www.nature.com/scientificreports/ elements present in the PeHsf promoters included the MBS element (involved in drought, high salt, and low temperature responses), the TC-rich repeat (involved in defense and stress responses), the SP1 light response element, the P-box and TGA gibberellin response elements, the TCA salicylic acid response element, and the CGTCA motif (involved in methyl jasmonate response). These results suggest that the expression of moso bamboo HSF genes is regulated by cis-elements associated with plant developmental processes and abiotic stress responses.
Transcription profiles of the PeHsf genes. We used published transcriptome data to investigate the expression patterns of the PeHsf genes (1) in seedling roots treated with gibberellic acid (GA) or naphthalene acetic acid (NAA), (2) in roots, rhizomes, panicles, and leaves, and (3) in young shoots of different heights (20, 50, and 100 cm). In (Fig. 10 and Supplementary Table S4), blue indicates low transcript abundance and red indicates high transcript abundance. Under GA treatment (Fig. 10a), the expression of PeHsf17, PeHsf24, PeHsf26, PeHsf06, PeHsf22, PeHsf33, PeHsf20, PeHsf15, PeHsf08, and PeHsf36 was downregulated compared with the control, indicating that GA repressed the expression of these genes. By contrast, the expression of PeHsf27, PeHsf01, and PeHsf31 was elevated under GA treatment but reduced under NAA treatment (Fig. 10b), indicating that the root expression of individual PeHsf genes differed between the hormone treatments. Similarly, PeHsf expression differed markedly among plant organs (Fig. 10c). In general, more than half of the genes were highly expressed in panicles, and four genes (PeHsf27, PeHsf11, PeHsf10, and PeHsf41) had expression levels greater than 100 TPM in this organ. Seven genes (PeHsf33, PeHsf13, PeHsf03, PeHsf35, PeHsf09, PeHsf16, and PeHsf29) were highly expressed only in roots, suggesting that they may participate in the development and/or function of bamboo roots. PeHsf36 and PeHsf30 were highly expressed only in leaves, and a large proportion of PeHsf genes were either not expressed or expressed only at low levels in the rhizome. Expression analysis of bamboo shoots at different germination stages (Fig. 10d) showed that many PeHsfs were expressed highly in 20-cm shoots, at intermediate levels in 50-cm shoots, and at the lowest level in 100-cm shoots. In general, these PeHsfs were specifically expressed during rapid shoot growth, suggesting that they may have important functions during this time.  Table S2) .  Table S5). Eight molecular function, one cellular component, and eleven biological process GO terms were enriched in the PeHsfs relative to the complete GO database. Thirty-six proteins were annotated with the enriched GO term heat response, which had an enrichment factor of 0.09, confirming that heat response is a primary function of the PeHsf proteins. GO enrichment results suggested that PeHsf transcription factors were also involved in macromolecule biosynthesis, RNA biosynthesis, nitrogen compound metabolism, cellular biosynthetic processes, primary metabolic processes, and the regulation of RNA metabolic processes. The largest number of genes (37) were associated with the term "regulation of cellular macromolecule biosynthetic process".

Construction of a PeHsf protein interaction network.
We used the STRING database to predict potential interactions among the PeHsf proteins ( Fig. 12 and Supplementary Table S6). There were 19 nodes in the PeHsf protein interaction network, each of which interacted with multiple other nodes. Some proteins exhibited direct interactions, e.g. PeHsf14 and PeHsf16, and others showed more complex multi-gene interactions, e.g. PeHsf01, PeHsf16, and PeHsf23. PeHsf01, PeHsf24, PeHsf25, PeHsf27, and PeHsf40 were predicted to be central nodes, radiating numerous connections to other nodes.
Homology modeling of PeHsf tertiary structures. PeHsf15 from subfamily HsfA, PeHsf28 from subfamily HsfB, and PeHsf26 from subfamily HsfC were selected for tertiary structural homology modeling. The three-dimensional structures of the three subfamilies shared a number of similarities (Fig. 13). The highly con- www.nature.com/scientificreports/ served DNA binding domain (DBD), a distinctive feature of HSFs, was visible as a spherical region formed from three α-helix bundles and four strands of reverse-parallel β-folded layers. Different sequences had different convoluted structures. In PeHsf15, ARG91 was linked to the substrate via hydrogen bonds (Fig. 13a). By contrast, PeHsf28 (Fig. 13b) and PeHsf26 (Fig. 13c), representing the HsfB and HsfC families, showed no hydrogen bonding with the substrate owing to their lack of AHA motifs.

Discussion
HSFs are a specific class of transcription factors that play an important role in plant resistance to various stress injuries [58][59][60] . The HSF gene family has been studied in Arabidopsis 10 , tomato 15 , cabbage 16 , pepper 17 , poplar 18 , maize 19 , and rice 20 , and specific molecular functions of the HSFs have been validated in the model plants Arabidopsis and rice. However, HSFs have not previously been investigated in the economically important and widely distributed graminaceous species moso bamboo. At present, the genome draft of moso bamboo is almost complete, enabling the comprehensive characterization of important gene families 33,34 . Here, we identified 41 PeHsf genes from moso bamboo and classified them into three subfamilies (A-C) based on phylogenetic analysis (Fig. 1). Gene structure analysis revealed that some members within the same subfamily had structural differences. For example, PeHsf31 in the HsfC subfamily contained three introns, whereas the other HsfCs had only one. PeHsf31 may therefore have experienced the splicing or insertion of gene fragments during its evolution 61,62 . Nonetheless, similar conserved sequences and gene structures among HSF subfamily members suggests that genes within a subfamily may generally have similar biological functions. The highly conserved plant HSF DBD is located at the N terminus, where it enables the precise localization and recognition of the heat stress element (HSE) in target gene promoters 63,64 . Here, multiple sequence alignment and secondary structure prediction showed that a DBD consisting of three α-helices and four β-folds was www.nature.com/scientificreports/ present in all PeHsf proteins (Fig. 2). Interestingly, other conserved domains were also present in some PeHsf proteins, and further experimental verification is needed to determine whether this is a manifestation of gene family functional diversification. Tertiary structure analysis demonstrated that the portion of the transcription factors that interacts with nucleic acids was conserved among subfamilies. HsfB and HsfC members did not contain AHA motifs and therefore probably lack transcriptional activation function, consistent with the results of previous studies 14 .
Gene duplication plays an important role in evolution by facilitating the generation of new genes and gene functions. There are three main evolutionary modes of gene duplication 10 : segmental duplication, tandem duplication, and translocation events. Segmental and tandem duplication most commonly underlie the expansion of plant gene families 57,65 . The number of HSF genes was significantly higher in moso bamboo (41) than in the model plants Arabidopsis (21) and rice (25), suggesting that gene duplication had occurred, consistent with previous reports of whole-genome duplication in moso bamboo 32,66 . We therefore performed intra-and inter-genomic collinearity analyses of the HSFs. Evolutionary analysis has shown that whole genome duplication (WGD) and segmental duplication play an important role in the expansion of the HSF gene family. Within the moso bamboo genome, there were 29 duplicated gene pairs among the HSF genes, including 27 segmental duplicate pairs and two tandem duplicate pairs. Segmental duplications therefore dominated the expansion of the HSF gene family in moso bamboo. By comparison, a previous study reported nine segmental duplicate pairs in 25 rice HSF genes 54 . Synteny analysis of the moso bamboo genome with four other sequenced plant genomes showed that there was significant collinearity of HSF family members between bamboo and the monocots maize and rice. Only a few  www.nature.com/scientificreports/ HSF members were collinear between bamboo and the dicots Arabidopsis and pepper. This result is consistent with the evolutionary relationship between dicot and monocot plants.
There are four main fates of duplicated genes 67 . First, both genes may retain their original function, leading to functional redundancy. Alternatively, duplicates with the same ancestral function may gradually develop different functions in a process referred to as subfunctionalization. In a less common scenario, one of the duplicates may acquire a wholly new function (neofunctionalization). Finally, the intermediate evolutionary stage between subfunctionalization and neofunctionalization preserves genes that are essential for plant growth. We found that many PeHsf homologous gene pairs were expressed at similar levels ( Fig. 10), suggesting that duplication of the HSF genes in moso bamboo has primarily resulted in functional redundancy.
Ka and Ks values were calculated for thirteen homologous, segmentally duplicated gene pairs. Ks values indicated that large-scale PeHsf gene duplication events had occurred between ~ 7.86 and 83.33 MYA. Interestingly, about 62% of the homologous gene pairs appeared to have undergone duplication far earlier than 12 MYA, consistent with the occurrence of a separate whole-genome duplication (WGD) event prior to the WGD previously reported to have occurred 7-12 MYA 32 . In addition, all the homologous gene pairs had Ka/Ks values less than 1, indicating that the PeHsf genes were subject to purifying selection during their evolutionary history.
Flowering is the most important life history trait in angiosperm plants, and bamboo is no exception. Unlike many other bamboos, moso bamboo produces a single flowering spike approximately every fifty years, blooms sporadically, and dies after flowering. Several studies have demonstrated that HSFs modulate the expression of stress-related proteins such as heat shock proteins (HSPs) that play important roles in plant stress response 68 . HSPs such as HSP70 and HSP90 are involved in flowering time regulation and vernalization pathways that promote flowering by inhibiting FLC expression 69 . Based on HSF expression profiling in different bamboo tissues (Fig. 10c), we found that many PeHsfs showed relatively high expression levels in the panicle. For example, orthologs of rice OsHsfB2b (PeHsf27, PeHsf10, and PeHsf38) were highly expressed in panicles, and a previous study has shown that OsHsfB2b acts as a negative regulator of plant drought response 20 . These PeHsf genes may therefore be involved not only in panicle development but also in the response of moso bamboo to drought stress. Some PeHsf genes showed little to no expression in any tissue tested, suggesting that they may act in other  www.nature.com/scientificreports/ tissues or at other developmental stages. Interestingly, some PeHsfs were highly expressed in leaves but expressed at low levels in early 20-cm bamboo shoots, suggesting that they are mainly involved in cell differentiation and elongation, rather than in the process of lignification.
Multiple cis-acting elements located in gene promoters play a crucial role in signaling, and synergistic interactions among them can regulate complex biological processes. The drought-responsive MYB element was present in the promoter regions of almost all PeHsf genes, as were ABRE, MBS, P-box, TC-rich repeat, and SP1 elements, indicating that PeHsf expression is likely to be affected by abiotic factors such as high temperature, drought, high salinity, and light. Examination of expression profiles of seedling roots under NAA and GA treatments revealed that most PeHsf genes were expressed at different levels under treated and control conditions. Some genes, (e.g., PeHsf40, PeHsf31, and PeHsf13) were upregulated by GA treatment, whereas others (e.g., PeHsf33, PeHsf06, PeHsf03, PeHsf35, and PeHsf26) were downregulated. Interestingly, the promoter analysis ( Fig. 9) indicated that these genes all contained cis-elements involved in ABA response. In rice, multiple HSF genes are induced by exogenous gibberellin 3 (GA3) and abscisic acid (ABA) 70 , and multiple ABRE and P-box elements responsive to ABA and GA, respectively, were present in the PeHsf promoters. Promoter analysis and transcriptomic results therefore suggest that HSF genes are involved not only in the regulation of heat response, but also in responses to drought, salt, and various exogenous hormones (NAA, GA, and ABA), making them important transcription factors for plant resistance to multiple stresses. Figure 9. Diagram of cis-acting elements identified in the 1,500-bp upstream promoter regions of the PeHsf genes. Rectangular boxes of different colors represent individual cis-acting elements, and some elements overlap with one another. PlantCARE (http:// bioin forma tics. psb. ugent. be/ webto ols/ plant care/ html/) was used to identify cis-acting elements in the 1500-bp promoter region upstream of each gene's transcription start site, and the results were visualized using TBtools (v1.0697).

Conclusions
We systematically identified and analyzed 41 moso bamboo HSF genes and divided them into three subfamilies, each of which had similar gene structures and sequences. Evolutionary analysis indicated that segmental duplications associated with whole-genome duplication events were responsible for much of the expansion of the moso bamboo HSF gene family. Transcriptomic analyses confirmed that expression of some PeHsfs responded to exogenous GA and NAA application, and the high expression of other PeHsfs in 20-cm shoots suggested that they may function in the rapid growth and early development of bamboo shoots. The PeHsf gene family appears to be functionally diverse, with roles in bamboo growth and development, stress response, and the functions of specific tissues and organs. These results provide a foundation for the subsequent exploration of bamboo HSF gene functions.  Figure 11. GO enrichment analysis of the PeHsf proteins relative to the GO database. The horizontal axis indicates the enrichment factor, and the size of the circle indicates the number of genes annotated with a given GO term. GOATOOLS (http:// github. com/ tangh aibao/ GOato ols) was used to assign GO annotations to HSFs, and Fisher's exact test was used to identify biological functions enriched in the PeHsfs relative to the full GO database. Visualization were performed using the Majorbio online platform (https:// cloud. major bio. com).