Genome-wide identification, transcriptome analysis and alternative splicing events of Hsf family genes in maize

Heat shock transcription factor (Hsf) plays a transcriptional regulatory role in plants during heat stress and other abiotic stresses. 31 non-redundant ZmHsf genes from maize were identified and clustered in the reference genome sequenced by Single Molecule Real Time (SMRT). The amino acid length, chromosome location, and presence of functional domains and motifs of all ZmHsfs sequences were analyzed and determined. Phylogenetics and collinearity analyses reveal gene duplication events in Hsf family and collinearity blocks shared by maize, rice and sorghum. The results of RNA-Seq analysis of anthesis and post-anthesis periods in maize show different expression patterns of ZmHsf family members. Specially, ZmHsf26 of A2 subclass and ZmHsf23 of A6 subclass were distinctly up-regulated after heat shock (HS) at post-anthesis stage. Nanopore transcriptome sequencing of maize seedlings showed that alternative splicing (AS) events occur in ZmHsf04 and ZmHsf17 which belong to subclass A2 after heat shock. Through sequence alignment, semi-quantitative and quantitative RT-PCR, we found that intron retention events occur in response to heat shock, and newly splice isoforms, ZmHsf04-II and ZmHsf17-II, were transcribed. Both new isoforms contain several premature termination codons in their introns which may lead to early termination of translation. The ZmHsf04 expression was highly increased than that of ZmHsf17, and the up-regulation of ZmHsf04-I transcription level were significantly higher than that of ZmHsf04-II after HS.


Results
Identification and sequence analysis of ZmHsfs. A total of 58 ZmHsf proteins were found by HMM and Blastp searches in the latest released maize genome. Among these, 31 non-redundant ZmHsf proteins containing DBD and HR-A/B regions were identified by the SMART program. 25 identified ZmHsfs were previously numbered according to their chromosomal locations 33 . Our newly identified Hsfs, ZmHsf26, 27, 28, 29, 30 and 31, were also named according to their chromosomal location too. The WoLFPSORT website analysis of their amino acid sequences predicts that most of the ZmHsfs are localized in the nucleus, ZmHsf23, ZmHsf10 and ZmHsf11 are located to the cytoplasm, chloroplast and ER, respectively. The length of ZmHsfs proteins varies from 250 to 622 amino acids, the molecular weight is between 27309.07~68553.79 Da and the pI value varies from 4.7 to 9.53 (Table 1).
Multiple sequence alignment illustrates the conservation of the DBD domain and HR-A/B regions of ZmHsfs (Fig. 1). The secondary structure of all ZmHsf proteins contains 3 α-helical structures and 4 β-fold structures in the DBD domain. Based on the length of flexible linkers between the HR-A and HR-B regions, 16, 10 and 5 members of ZmHsf were assigned to class A, B and C, respectively. There are 21 inserted residues in class A and 7 in class C between the HR-A and HR-B regions and no insertion in class B.
Conserved protein motifs and gene structure of ZmHsfs. The results of phylogenetic tree, motif identification and gene structure analysis of 31 ZmHsf proteins were merged using TBtools to assess conserved motifs and gene structure. All ZmHsfs contain motifs 1, 2, 3 and 4 except ZmHsf31, which is lack of motif 4. All members of class A include motif 5 and 6 except ZmHsf15, which does not have motif 6. All members of class C contain motif 6 ( Fig. 2). Motifs 1, 2 and 3 function in the DBD domain, motif 4 functions in the HR-B domain, and motif 6 functions in the NLS domain. The Hsf domains were found in all ZmHsf proteins. Most class A members have longer introns and a larger number of exons than members of class B and C (Fig. 2).

Analysis of phylogenetic tree and collinearity of AtHsfs, OsHsfs, SbHsfs and ZmHsfs.
A phylogenetic tree was constructed from a multiple sequence alignment of full-length proteins from 31 maize ZmHsfs, 21 Arabidopsis AtHsfs, 25 rice OsHsfs and 25 sorghum SbHsfs (Fig. 3). The class and subclass of the ZmHsfs are highlighted with different background colors and different line colors, respectively (Fig. 3). There are 16, 10 and 5 ZmHsf proteins in class A, B and C, respectively. Same to the two Gramineae species, class A has more members than class B or C in maize, and the subclass A2 has the most members in class A. Compared with the dicot Arabidopsis, the three Gramineae genomes lack members of subclass A7, A8 and B3, and contain fewer members of subclass A1 and more subclass A2 and class C (Fig. 3).
Through homologous blast of maize amino acid sequences performed with MCScan toolkit, 7783 collinear genes, 1152 tandem repeat genes and 415 collinear blocks were identified (Table S3). Chromosome 1 contains the most ZmHsfs of class A and B. Chromosomes 2, 4 and 10 only contain class B genes. The majority of ZmHsfs www.nature.com/scientificreports www.nature.com/scientificreports/ were located at the end of chromosomes (Fig. 4a). Pairs of paralogous ZmHsf gene arose from whole genome duplication or segmental duplication in subclass A1, A2, B1, B2, C1 and C2, respectively (Fig. 4a). The most members might arise from transposition and have no tandem repeat genes. The results of homologous blast show the collinearity among maize, rice and sorghum (Fig. 4b). There are more collinear genes and blocks between maize and sorghum than between maize and rice (Table S4 and S5). Most ZmHsfs distribute in the regions in which there are more than 10 consecutive collinear genes (highlighted by blue lines), excepting ZmHsf03, ZmHsf11 and ZmHsf19 of subclass B2 and ZmHsf24 of subclass A9 (Fig. 4b). Maize chromosome 1 shares the most collinear blocks with chromosome 1 of sorghum bicolor and chromosome 3 of rice (Fig. 4b).
Expression analysis of ZmHsfs under heat stress. Transcriptome sequencing analysis of the ZmHsf family was performed by measuring plants TPM levels both anthesis and post-anthesis stage under HS (Fig. 5). Most of ZmHsf members responded to HS, and expression patterns are similar in the same subclass. Total eight members of subclass A2 (ZmHsf26, ZmHsf04, ZmHsf01, ZmHsf05) and subclass B2 (ZmHsf25, ZmHsf03, ZmHsf11, ZmHsf19) were induced by both HS1 and HS2 treatments. In addition, the expressions of such genes were all improved after HS treatment: ZmHsf23 of subclass A6, ZmHsf02 of subclass A9, ZmHsf08 of subclass B1 and ZmHsf29 of subclass C2. The transcription levels of ZmHsf17 of subclass A2 and ZmHsf13 of subclass C2, were up-regulated in the anthesis stage but down-regulated in the post-anthesis stage after HS treatments. To further confirm transcription changes, qRT-PCR was done for 24 ZmHsfs responsive to HS, and the results of qRT-PCR were consistent with that of transcriptome sequencing analysis (Fig. S1).
Alternative splicing (AS) analysis of ZmHsfs. The third generation full length transcriptome sequencing was used for AS analysis in the leaves of maize seedlings after HS treatment. The results showed that AS events occurred only in ZmHsf04 and ZmHsf17 in the form of intron retention, causing ZmHsf04-II and ZmHsf17-II www.nature.com/scientificreports www.nature.com/scientificreports/ with the retention lengths of intron 229 bp and 141 bp, respectively. New splicing sites were identified in the first intron both ZmHsf04 and ZmHsf17 genes. The sequences of retained regions contain several new 'TAG' and 'TAA' stop codons (Fig. 6A,B). The intron retention probably generated small peptides which retain only a truncated DBD domain, because the first intron was between the third α-helical and β-fold structures of the DBD domain. Compared the C-terminal amino acid sequences of ZmHsf04-II and ZmHsf17-II hypothetical small peptides, an additional Leucine-rich hydrophobic motif was found in ZmHsf04-II but not in ZmHsf17-II ( Fig. S2) The AS events in leaves were confirmed at seedling, anthesis and post-anthesis stages by semi-quantity RT-PCR and quantity RT-PCR experiments. Specific primers were designed in both ends of ZmHsf04 and ZmHsf17 genes and used with semi-quantitative PCR assays to assess the response of different isoforms to HS treatment. Two pairs of primers which can amplify two different isoforms respectively were designed and used for quantitative RT-PCR. Both isoforms of ZmHsf04 and ZmHsf17 were significantly induced by HS treatment (Fig. 6C,D), and the up-regulation were most significant at the seedling stages (Fig. 6E,F). The ZmHsf04 expression have a far higher increase than ZmHsf17 after HS treatment. The expression increasing in ZmHsf04-I were higher than that in ZmHsf04-II (Fig. 6C,E). Differently, the ZmHsf17-I expression decreased after HS treatment at post-anthesis stage, and the ZmHsf17-II expressions were up-regulated slightly after HS treatment at both anthesis and post-anthesis stages (Fig. 6D,F).

Discussion
Since three Hsfs were identified from HS treated tomato cells, the Hsf family genes has been studied as one of key regulators of plant thermotolerance 43 46 and 82 TaHsfs in wheat 24 . All genes identified by HMM research or BLASTP possess canonical DBD domain and OD domains. Recently the maize B73 inbred line was genome-sequenced and assembled using single-molecule PacBio sequencing technology, making it more precise for gene annotation and discovery of new genes 30 .
Base on homology comparison with rice, 31 ZmHsfs were identified from the maize 'B73RefGen_V4′ and assigned into three class A, B and C, which contain 7, 3 and 2 subclasses, respectively. Similar to rice and sorghum bicolor, class A of maize has more members than the other classes, and subclass A2, with 5 members, is the most abundant subclass. In wheat, 18 members of TaHsfA2 were recently identified 24 . Differently, Arabidopsis has only one HsfA2 gene, but four HsfA1 genes 44 . As the dominate Hsf, HsfA2 is strongly induced by HS and has been improved to be a key regulator of abiotic stress response by recent research 24,45,46 . Speaking from structure (Fig. 2), all members of the subclass A2 possess the AHA motif (motif 7), which has a transcriptional activation function. Meanwhile the AHA motifs are different in subclass A1, A4 and A6. Subcellular localization predictions indicate that ZmHsf23 and ZmHsf10 of subclass A6 are not localized to the nucleus. Previously, AtHsfA6a has been simultaneously found in both cytoplasm and nucleus and appeared to be translocated into the nucleus after salt stress 47 .
Compared to the model dicot Arabidopsis, class B and C were relatively more abundant in monocots 24,33,45,46,48 , this was verified in maize in this study (Fig. 3). Curiously, in the maize B73 genome, two Hsfs are aimed to subclasses A1, A4d, A9, B1, B2b and C1a, respectively, but there is only one member of these subclasses in both rice and sorghum bicolor (Fig. 3). These perhaps because the maize genome underwent a whole genome duplication event about 5 to 12 mya and has continuously expanded over the last 3 million years via long terminal repeat    37 . Gene loss and retention after duplication events are presumably the results of enhanced resistance to environmental stress 49 , redundant duplicates of ZmHsfs in subclasses A1 and A2 may be associated with the thermotolerance-regulating. Our previous research found that obvious differences existed in the regulating function and expression patterns of two highly homologous HsfA1 genes, ZmHsf12 50 and ZmHsf06 40 .
The flowering, pollination and grain filling of maize are all highly susceptible to high temperature 51 . Transcriptome analysis of maize leaves under HS in both anthesis and post-anthesis stages showed that the  www.nature.com/scientificreports www.nature.com/scientificreports/ expression levels of some ZmHsfs changed significantly. The three most distinctly up-regulated subclasses were A2, A6 and B2, similarly in wheat 24 . These can be proved by the close phylogenetic relationship of subclass A2 and A6 in our phylogenetic tree (Fig. 3). As one of key regulators for heat stress response, the up-regulation of ZmHsfA2 was expected. Subclass A2 were also induced by HS in rice and sorghum 45,46 . The ZmHsf05 of subclass A2 was induced by HS treatment in maize, and rescued the reduced thermotolerance of the athsfa2 mutant in Arabidopsis. In ZmHsf05-overexpressing lines of Arabidopsis, the basal and acquired thermotolerance of plants were all enhanced 41 . Up-regulations of the four ZmHsfs of subclass B2 were observed in maize after heat treatment, similar to the expression of both SlyHsf03 and SlyHsf10 of subclass B2 in tomato, which increased dramatically after heat treatment for 1-2 h 23 . The ZmHsf12 and ZmHsf06 of subclass A1 were down-regulated in both anthesis and post-anthesis stages, differing from previous studies in maize seedlings 33 . The expression levels of TaHsfA1a was down-regulated after heat or drought stress in wheat seedlings too 24 . In Arabidopsis, HsfA1s, act as transcriptional activator, can be induced at the early HS response stage 25,52 . Those suggest that Hsf can demonstrate different expression responses during different developmental stages. In our experiment, ZmHsf17 of subclass A2 and ZmHsf13 of subclass C2 were observed to have different responses to HS at anthesis and post-anthesis stages, both transcriptional levels of two genes increased in anthesis stage and decreased in post-anthesis stage under HS treatment. Different express patterns suggest diversity and complexity of regulating roles.
Almost all Hsfs have the conserved DBD domain which containing two independent exons separated by one intron 53 . The conserved intron locates in the end of the central helix-turn-helix motif (H2-T-H3) which is necessary for specific recognition of the palindromic HSEs in plant 53 . AS of pre-messenger RNA is the important post-transcriptional regulation mechanism that often widely happen during various developmental stages of plants 54 . Previous studies verified that intron retention is the predominant form of AS in plants and exon skipping is the most frequent AS event in mammals 55 . More than 6000 Arabidopsis genes and 1000 grape genes have displayed AS patterns under salt and high-temperature stress, respectively 56,57 . In Arabidopsis, three spliced variants of AtHsfA2 were identified after different HS treatment. Except for full spliced AtHsfA2-I, AtHsfA2-II  www.nature.com/scientificreports www.nature.com/scientificreports/ and AtHsfA2-III have different C-terminal amino acid sequences due to different parts of the first intron were retained 58,59 . AtHsfA2-II without Leucine-rich hydrophobic motif (LRM) in C-terminal was degraded by nonsense-mediated mRNA decay (NMD) 58 . However, AtHsfA2-III with the LRM in C-terminal could be translated into a small truncated AtHsfA2 protein at HS recovered stage. This "small AtHsfA2" protein was shown to bind to the TATA box-proximal clusters of HSE in the HsfA2 promoter to activate the transcription of HsfA2 59 .
In our experiment, only ZmHsf04 and ZmHsf17 were found to have AS events after HS, in the form of intron retention. ZmHsf04-II and ZmHsf17-II retained different partial introns which containing several premature termination codons. The two truncated isoforms of ZmHsf04 and ZmHsf17 might be involved in different transcriptional regulation process. Based on the amino acid sequences, we speculated that ZmHsf17-II without the LRM may be degraded by NMD, like the AtHsfA2-II of Arabidopsis 58 . However, ZmHsf04-II with the LRM in C-terminal could be translated and activate the transcription of ZmHsf04-I, like the AtHsfA2-III of Arabidopsis 59 . In previous reports, overexpression of ZmHsf04-I in Arabidopsis up-regulated the expression of heat and other related genes, the thermotolerance and salt tolerance of transgene plants were all enhanced 42 . At the same time, Quantitative RT-PCR showed that the expression increase of ZmHsf04 were far higher than ZmHsf17 after HS in three development periods, the expression of ZmHsf17 were also observed in normal plants, these verify above speculation further. The full spliced HsfA2 protein contain typical complete functional domains and motifs and have been considered as the major transcriptional regulatory forms 57 . In rice, the full spliced OsHsfA2d-I could participated in the unfolded protein response by regulating expression of OsBiP1, but OsHsfA2d-II could not, because of its incomplete DBD domain and LRM motif 60 . The regulating functions of different isoforms both ZmHsf04 and ZmHsf17 need more deeply research.

Conclusion
There is remarkable functional diversity in the members of plant Hsf family after the long-term adaptation to high temperature 11 . We identified and classified all possible ZmHsfs and found the orthologous and paralogous genes through amino acid sequence alignment, the results were valuable for the functional study of ZmHsfs. We analyzed the expression pattern of all the ZmHsfs responsive to heat stress at anthesis and post-anthesis stages through RNA-seq and quantitative RT-PCR, and found some up-regulated Hsf genes from subclass A2, A6 and B2. Intron retention events which often occurring within DBD domain were found in ZmHsf04 and ZmHsf17 of subclass A2 through full length transcriptome sequencing. The two truncated isoforms both ZmHsf04 and ZmHsf17 have different expression patterns after HS treatment in plants of seedlings, anthesis and post-anthesis stages. We infer that different spliced variants of Hsf genes in maize may involve in different transcriptional regulation process.

Materials and Methods
Plant culture and heat stress. Maize inbred variety 'H21(♀ Huangzao 4 × H84 ♂)' were selected for this study. Maize inbred variety 'H21' is one of the main inbred lines of China and it is widely used as a parent in breeding hybrid. Inbred H21 has typical characteristics like early maturing, compact plant type and strong combining ability, meanwhile it has good drought and disease resistance 61,62 . The plants were grown in large pots containing nutritive soil in a greenhouse and used for RNA sequencing experiments. When the seedlings grew to two-leaves-old, some plants were subjected to HS at 42 °C for 30 min (HS0). When pollens were released, some plants were subjected to same HS as seedlings (HS1). After 3 weeks, the other plants were subjected to HS either (HS2), and the plants growing under the normal conditions of different stages were performed as CK0, CK1 and CK2. The second leaves or the flag leaves were sampled quickly and frozed in liquid nitrogen and stored until RNA extraction.
Identification and bioinformatic analysis of Hsfs in maize. The maize reference genome (Maize B73 RefGen_V4) was downloaded from the MaizeGDB database on the website ftp://ftp.ensemblgenomes.org/ pub/release-41/plants/fasta/zea_mays/dna/. 47 non-redundant Hsfs amino acid sequences were downloaded from PlantTFDB and blasted against the amino acid sequences retrieved from the maize reference genome. Additionally, the Hsf-type DBD domain (Pfam: PF00447) was used as a query in BLASTP (P = 0.001) search for probable Hsf protein sequences in the maize genome reference data base. 58 maize Hsf proteins were collected after homologous comparison and HMM research. Results from the two searches described above were blasted, respectively, against the UniprotKB/Swiss-prot data base on the NCBI blastp suite and the NCBI batch CDD (https://www.ncbi.nlm.nih.gov/Structure/bwrpsb/bwrpsb.cgi). Redundant sequences were discarded. Using SMART, sequences without a DBD domain or HR-A/B domain were eliminated. Motif elicitation of Hsf proteins was completed using the MEME suite (http://meme-suite.org/tools/meme). The pI and MW of identified Hsf proteins were calculated using the Expasy website (https://web.expasy.org/compute_pi/). WoLFPSORT was used to predict subcellular localization motifs in amino acid sequences of the identified Hsf proteins (https://wolfpsort. hgc.jp/).

Multiple sequence alignment and phylogenetic tree. The amino acid sequences of Hsf proteins from
Arabidopsis, Oryza sativa, Sorghum bicolor and Zea mays were examined together. MEGA 7.0 was used for multiple sequence alignment and the phylogenetic tree was constructed using the Neighbor joining method with a bootstrap value of 1000. The phylogenetic tree was modified using Figtree software. The physical location and gene duplication events were assigned based on the maize genome annotation results from the MCScan toolkit in TBtools 63 . MCScan toolkit was also used to analyze the collinearity of rice, sorghum and maize Hsf proteins and plot collinear genes and blocks on the chromosomes 63 .
RNA extraction and RNA sequencing analysis. Total RNA of leaves from different treatments were extracted and purified using the Total RNA Extractor kit and RNase-free DNase I (Sangon, China). The RNA (2020) 10:8073 | https://doi.org/10.1038/s41598-020-65068-z www.nature.com/scientificreports www.nature.com/scientificreports/ quality and integrity were estimated with an Agilent 2100 Bioanalyzer (Agilent, USA), and the RNA quantity was measured with a NanoDrop ND-1000 spectrophotometer (Thermo scientific, USA). The cDNA libraries were constructed using 2 μg of fragmented RNA from different samples with slight modifications of previously published method 24 . Refer to the instruction manual of VAHTS mRNA-seq V3 library prep Kit for Illumina (Vazyme, China) for the specific construction process. The four cDNA libraries were sequenced using Hiseq XTen sequencers (Illumina, USA). A data quality assessment of the raw reads was attained with FastQC and Trimmomatic. The clean reads were mapped to the maize reference genome sequence (Maize B73 RefGen_V4), and the transcription-level expression was analyzed with HISAT, StringTie and Ballgown according to a previously reported standard process 64 . Gene expression levels were calculated in TPM (Transcripts per million) using StringTie and Ballgown, and the heat map of gene expression levels was plotted using TBtools.
Quantitative RT-PCR for validation of RNA-Seq. The first strand cDNA was synthesized with 1 μg RNA by reverse transcription PCR. A SYBR Premix ExTaqTM kit (Takara, Japan) was used for quantitative RT-PCR assays in the ABI 7500 (Applied Biosystem, USA) according to the manufacturer instructions. The reaction procedure included pre-denaturation at 95 °C for 10 min, followed by 40 cycles of denaturation at 95 °C for 5 s and annealing/extension at 60 °C for 40 s. Three biological replicates were performed for each group. After the reaction, the data were analyzed using the 2 −ΔΔCt method 65,66 and plotted with Microsoft Excel 2010. The expression level of CK1 was set as 1. For statistical analysis, each dataset was repeated at least three times. The Actin2 gene was used as an endogenous control. All of the primers used in quantitative RT-PCR are listed in Table S1. Nanopore sequencing and alternative splicing analysis. The second leaves of two-leaf-old maize seedlings were sampled after HS for nanopore sequencing experiments. RNA extraction, cDNA library construction and long-read sequencing were performed according to the standard protocols of Oxford Nanopore Technologies (ONT) 67,68 . Ribosomal RNA and low-quality raw reads less than 500 bp in length were removed. Minmap2 software was used to map all full-length reads to the reference genome and remove the redundant transcript reads were removed. The alternative splicing events in both CK0 and HS0 treatments were identified using Astalavista software 69 . The structures of different isoforms were analyzed by sequence alignment in DNAman software. The RNA samples of maize leaves from seedling, anthesis and post-anthesis stages were used to analyze the transcriptional levels of different isoforms. The transcription abundances of the cDNA from different isoforms were tested by semi-quantity and quantity RT-PCR methods, respectively. According to the operation manual, the 2 × Taq Plus Master Mix II (Dye Plus) kits were used for semi-quantity RT-PCR experiments. The primers used in semi-quantity and quantity RT-PCR were listed in Table S2.