Genome-wide identification, characterization, and expression patterns analysis of the SBP-box gene family in wheat (Triticum aestivum L.)

SQUAMOSA promoter-binding protein (SBP)-box genes encode a family of plant-specific transcription factors that play roles in plant growth and development. The characteristics of SBP-box genes in rice (Oryza sativa) and Arabidopsis have been reported, but their potential roles in wheat (Triticum aestivum) are not fully understood. In this study, 48 SBP-box genes (TaSBPs) were identified; they were located on all wheat chromosomes except for 4B and 4D. Six TaSBPs were identified as tandem duplication genes that formed three tandem duplication pairs, while 22 were segmentally duplicated genes that formed 16 segmental duplication pairs. Subcellular localization prediction showed that TaSBPs were located in the nucleus. Among the 48 TaSBPs, 24 were predicted to be putative targets of TamiR156. Phylogenetic analysis showed that TaSBPs, AtSBPs, and OsSBPs that shared similar functions were clustered into the same subgroups. The phylogenetic relationships between the TaSBPs were supported by the identification of highly conserved motifs and gene structures. Four types of cis-elements (transcription-related, development-related, hormone-related, and abiotic stress-related elements) were found in the TaSBP promoters. Expression profiles indicated that most TaSBPs participate in flower development and abiotic stress responses. This study establishes a foundation for further investigation of TaSBP genes and provides novel insights into their biological functions.


Scientific Reports | (2020) 10:17250 | https://doi.org/10.1038/s41598-020-74417-x | www.nature.com/scientificreports/

Ectopic expression of TaSPL16 in Arabidopsis delays the emergence of vegetative leaves, increases organ size, and affects yield-related traits 14. These studies have indicated that SBP-box genes function in the regulation of plant development and growth. SBP-box genes are also among the conserved plant TFs that are targeted by microRNAs (miRNAs), especially miR156/157 family members 15. For example, 10 AtSBP genes are predicted or verified to be targeted by miR156 in Arabidopsis 8. In rapeseed (Brassica napus), 44 BnSBPs were predicted to be targeted by miR156 16. In rice, there are 11 SBP-box genes that are targets of OsmiR156, and tissue-specific interactions have been revealed between OsmiR156 and OsSBP genes 17. However, there are few reports on whether miRNA regulation is conserved in wheat SBP-box genes with miRNA-binding sites.
To date, a large number of SBP-box genes have been identified in different plants: for example, there are 16 in Arabidopsis, 18 in rice 18, 29 in maize 19, and 19 in grapevine 20. Wheat is one of the most important food crops worldwide. Compared with other plant species, the identification and functional analysis of the SBP-box gene family in wheat is less advanced. In this study, we conducted a genome-wide identification of SBP-box genes in wheat, performed a phylogenetic analysis, and classified the genes into subgroups to explore the evolution of the SBP-box gene family. The exon-intron structures, conserved motifs, and expression patterns were also analyzed. This study establishes a foundation for further analysis of SBP-box genes in wheat and other plant species.

Identification of SBP-box genes in wheat.
To identify the SBP genes in wheat, we performed a Hidden Markov Model (HMM) search, and 48 non-redundant SBP genes were identified in the wheat genome (Supplementary Table S1). The number of SBP genes in hexaploid wheat (TaSBP genes; 48) was much higher than that in rice (18), maize (31), and Arabidopsis (16) 18,19. The 48 TaSBP genes were named TaSBP1A to TaSBP19D according to their distribution on chromosomes and genomic homology. All chromosomes contained at least one TaSBP gene, except for chromosomes 4B and 4D (Fig. 1). As shown in Fig. 1 and Supplementary Table S2, 22 segmentally duplicated genes were identified; these formed 16 segmental duplication pairs. Meanwhile, three tandem duplication pairs were derived from chromosomal tandem duplication. The 48 genes, with the exception of TaSBP11A, TaSBP11B, and TaSBP11D, were verified by expressed sequence tags (ESTs) deposited in the National Center for Biotechnology Information (NCBI) database, and 36 TaSBP genes constituted 12 sets, with every set including three homologous genes, one each in the A, B, and D sub-genomes.
The physical features of the TaSBP genes were predicted. The protein length varied from 192 (TaSBP2A) to 1124 (TaSBP16D) amino acids; the isoelectric point varied from 5.73 (TaSBP6A) to 9.87 (TaSBP2D); and the molecular weight varied from 20.117 kDa (TaSBP2B) to 123.141 kDa (TaSBP16D). Detailed information is presented in Supplementary Table S1.

Subcellular localization of TaSBP proteins.
The subcellular localization prediction showed that all TaSBP proteins were located in the nucleus (Supplementary Table S1). To determine whether wheat TaSBPs localize to the nucleus, we cloned three TaSBPs from Chinese Spring and assessed the subcellular localization of the encoded proteins by transient expression assays in wheat protoplasts, using translational fusions to GFP. As shown in [...]

Multiple alignment and phylogenetic analysis of TaSBPs.
All the TaSBP proteins were aligned using ClustalW. As shown in Fig. 3, each TaSBP protein contained a complete SBP domain with two Zn finger motifs and one nuclear localization signal (NLS) region. The first Zn finger motif was a CCCH-type motif, and the second was a CCHC-type motif.
To further evaluate the phylogenetic relationships of TaSBPs and other plant SBPs, we selected 263 SBP sequences from ten species according to previous studies 16,18,19,21,22, and constructed a phylogenetic tree based on an alignment of the full-length protein sequences. According to the phylogenetic analysis (Fig. 4), SBPs from these ten plant species could be classified into eight subgroups. The largest subgroup (I) contained 48 SBP members. The smallest subgroup (VIII) contained 14 members. Except for subgroups II and VIII, the other subgroups contained at least three TaSBPs. As shown in Fig. 4, subgroups II and VIII only contained dicot SBPs, while subgroups IV and VI only had monocot SBP members. Based on the phylogenetic analysis, all TaSBP members were classified into seven subgroups (Fig. 5A).
Conserved motifs, gene structure, and sites targeted by miR156.
The SBP domain forms the core of SBP transcription factors and binds to the promoters of their downstream genes. In total, 10 conserved motifs were identified and designated motifs 1 to 10 (Fig. 5B). Among them, motifs 1, 2, and 4 were in the basic region and the hinge region of the SBP domain. Motifs 3, 6, 7, and 10 were only found in subgroup V; motif 5 was only present in subgroups VI and VIII; and motif 9 was only found in subgroup VIII. The structure of the TaSBP genes was also examined to help elucidate gene function (Fig. 5C). The number of exons ranged from 2 to 11; subgroups I, II, III, IV, V, and VII only contained 2-4 exons, while subgroups VI and VIII had more than 10 exons. The TaSBP genes in the same subgroup shared similar gene structures.
To identify the miR156-mediated post-transcriptional regulation of TaSBP genes, we searched the coding sequences (CDSs) and 3′-untranslated region (UTR) sequences of all TaSBPs for miR156-binding sites. The results showed that 24 TaSBPs (half of the TaSBPs) had miR156-binding sites (sequences that were complementary to the mature TamiR156 sequences), with 19 in the CDSs and 5 in the 3′-UTR regions (Fig. 6).
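The idea of searching CDS and 3′-UTR sequences for sites complementary to a mature miRNA can be illustrated with a minimal Python sketch. This is not the prediction tool used in the study, which is not named in this section. The miR156 sequence below, the mismatch cutoff, and the function names are all assumptions chosen for illustration, and real target prediction also weighs position-specific pairing rules that are ignored here.

```python
# Sketch: scan a transcript for windows that are (nearly) complementary to a miRNA.
# ASSUMPTIONS (not from the source): the mature miR156 sequence below and the
# mismatch cutoff are illustrative; G:U wobble pairs are counted as mismatches.

MIR156 = "UGACAGAAGAGAGUGAGCAC"  # assumed mature miR156 sequence, 5'->3'

COMPLEMENT = {"A": "T", "U": "A", "T": "A", "G": "C", "C": "G"}


def target_site(mirna: str) -> str:
    """Return the DNA-sense sequence (5'->3') perfectly complementary to the miRNA."""
    return "".join(COMPLEMENT[base] for base in reversed(mirna.upper()))


def find_sites(transcript: str, mirna: str = MIR156, max_mismatches: int = 2):
    """Return (start, window, mismatches) for every window within the cutoff."""
    site = target_site(mirna)
    seq = transcript.upper().replace("U", "T")
    hits = []
    for start in range(len(seq) - len(site) + 1):
        window = seq[start:start + len(site)]
        mismatches = sum(a != b for a, b in zip(window, site))
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits


# Hypothetical transcript fragment containing one near-complementary site.
example = "AAAA" + "GTGCTCTCTCTCTTCTGTCA" + "CCCC"
hits = find_sites(example)
```

Running `find_sites` separately on the CDS and on the 3′-UTR of each gene would give the kind of per-region site counts reported above.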
Cis-acting elements in the promoters of TaSBP genes.
Cis-acting elements in gene promoters are crucial regions involved in transcription factor binding for the initiation of transcription. To further explore the possible biological functions of TaSBP genes, the 2-kb upstream promoter regions of all TaSBP genes were used to predict cis-acting elements using the PlantCARE database. Four types of cis-acting elements (transcription-related, development-related, hormone-related, and abiotic stress-related elements) were identified (Fig. 7). Transcription-related cis-elements, including the TATA-box and CAAT-box, were found in all the TaSBP genes. Development-related cis-elements included meristem-specific regulatory elements (CCGTCC-box and CAT-box). Hormone-related cis-elements included the methyl jasmonate (MeJA)-responsive element (CGTC) [...]

Analysis of the expression patterns of the TaSBP genes.
To obtain the temporal and spatial expression patterns of TaSBP genes, the expression profiles were analyzed using high-throughput data from previous research 18. As shown in Fig. 8, 95.83% (46/48) of TaSBP genes were detected in at least one tissue. Further, 89.58% were highly expressed in the inflorescence, especially when two nodes or internodes were visible and when the stem reached its maximum length.
To elucidate the roles of these TaSBP genes in response to abiotic stresses, expression profiles of TaSBP genes under different abiotic stresses were also examined. The results showed that the expression of 79.17% (38/48) of the TaSBP genes was detected and some of them were highly expressed in response to heat and drought stresses. The phylogenetically similar genes shared similar expression patterns. For example, the subgroup I genes had similar expression patterns, and the subgroup VIII genes were expressed in all tissues.
To elucidate the roles of TaSBP genes in wheat growth and development, we examined the relative expression levels of 10 TaSBP genes (at least one TaSBP gene was chosen from each subgroup) in four tissues (roots, stems, leaves, and inflorescences, collected at the heading stage) (Fig. 9A) and under different abiotic stresses (Fig. 9B). All the TaSBP genes were detected in at least one of the tissues examined, but at different expression levels. All of them were highly expressed in inflorescences. In addition, TaSBP10A and TaSBP12B/D were highly expressed in stems, and TaSBP19B/D was mainly expressed in leaves. These results suggest that these genes may play different roles in wheat growth and development.
To explore the potential roles of TaSBP genes under different abiotic stresses, seedlings were subjected to heat, cold, ABA, salt, and drought treatments. The changes in the transcript levels of the genes were analyzed using quantitative real-time (RT)-PCR. The results showed that all of them responded to the abiotic stresses (Fig. 9B). The expression levels of TaSBP9B and TaSBP19B/D were significantly down-regulated under the different abiotic stresses compared with the control. Under heat treatment, the expression levels of TaSBP1A/B/D and TaSBP16A were significantly up-regulated. Under cold stress, the expression levels of TaSBP6A, TaSBP16A, and TaSBP17B/D were significantly up-regulated, while those of TaSBP9A/D, TaSBP9B, TaSBP10A, and TaSBP19B/D were significantly down-regulated.

It has been proposed that SBP-box genes are plant specific 3. In the present study, 48 wheat SBP-box genes were identified, accounting for 0.63% of all annotated wheat genes, which is more than in rice (0.45%), Arabidopsis (0.58%), maize (0.57%), and B. distachyon (0.51%) 18,19,23. In terms of sub-genomes, there are 16 members in each of the wheat A, B, and D sub-genomes; the number in each sub-genome is similar to that in rice (18), S. bicolor (17), S. italica (18), and B. distachyon (16) 18,23,24. Duplication analysis showed that 16 segmental duplication pairs were formed by 22 TaSBPs and that three tandem duplication pairs were formed by 6 TaSBPs. Duplication promoted the expansion of the TaSBP gene family, which may be why wheat has more SBP-box genes than other plant species.
Multiple sequence analysis showed that each TaSBP contained two Zn finger motifs (CCCH and CCHC types) and one NLS region in the SBP domain; these constitute the main identifying characteristics of SBP-box proteins. Genes within the same phylogenetic subgroup shared a similar length, gene structure, and motif composition. For example, all subgroup I, II, IV, and VII members had three exons, and motifs 3, 6, 7, and 10 were only found in subgroup V. Therefore, the similar gene structure and motif composition of SBP-box genes in wheat might reflect their evolutionary relationships. In the promoters of these TaSBP-box genes, four kinds of cis-acting elements (transcription-related, development-related, hormone-related, and abiotic stress-related elements) were detected, and phylogenetically similar genes shared the same cis-elements.
In addition, phylogenetic analysis revealed that the TaSBP-box proteins had a close evolutionary relationship with other plant SBP-box proteins, especially those of monocot plants. SBPs from ten plant species could be classified into eight subgroups (Fig. 4); two subgroups, II and VIII, only contained dicot SBPs, while IV and VI only had monocot SBP members, indicating a closer relationship among the monocot SBPs.

SBP-box genes play important roles in plant development and growth.
SBP-box [...] phase 26; AtSPL8 affects pollen sac development and also controls gynoecium development 27; the miR156-SPL3 module controls FLOWERING LOCUS T expression to regulate ambient temperature-responsive flowering 28. In rice, OsSPL14 is highly expressed in the reproductive stage and promotes flower development; it also affects panicle branching 29; OsSPL8 (OsLG1) controls ligule development and inflorescence architecture 30,31. In tomato, LeSPL/CNR is crucial for normal fruit development and ripening 32. In wheat, TaSPL20 and TaSPL21, corresponding to TaSBP13D and TaSBP11D in this study, respectively, were highly expressed in the lemma and palea 12. Ectopic expression of TaSPL20 or TaSPL21 in rice revealed that these genes have similar functions in increasing the number of primary branches, secondary branches, grain number, and panicle length 12. Ectopic expression of TaSPL16 (TaSBP15B in this study) in Arabidopsis delays the emergence of vegetative leaves, promotes early flowering, increases organ size, and affects yield-related traits 14. TaSPL8 (TaSBP4D in this study) in wheat affects lamina joint development and plant architecture 13. In this study, we found that 89.58% of TaSBP genes were highly expressed in the inflorescence according to ArrayExpress data, especially when two nodes or internodes were visible and when the stem reached its maximum length. Additionally, in our quantitative RT-PCR analysis of different tissues, all of the 10 selected TaSBP genes were highly expressed in inflorescences. These results suggest that TaSBP genes might play important roles in plant development and growth.

Conservation of miR156-binding sites in SBP-box genes.
miRNAs play key roles in regulating the expression of target genes. Most studies show that overexpression of miR164, miR159a, miR319, miR319, and miR399 affects members of the NAC, MYB, TCP, GAMYB, and WRKY transcription factor families, respectively 13. Regarding the SBP-box gene family, tissue-specific interactions between OsmiR156 and OsSBP genes were found in rice 17. Previous studies indicated that miR156 plays important roles in plant development and growth. For example, miR156 directly represses the expression of SBP-box genes that function in the juvenile-to-adult transition in wheat and Arabidopsis 33,34. miR156 also plays important roles in controlling flowering, leaf development, and plant architecture by targeting SBP-box genes. For example, overexpression of miR156 delays Arabidopsis flowering and decreases apical dominance by regulating SBP-box genes 35. In wheat, overexpression of tae-miR156 leads to increased tiller number and severe defects in spikelet formation 36.
In the present study, target prediction showed that 24 TaSBPs have miR156-binding sites and that phylogenetically similar genes shared the same miR156-binding site. SBP-box genes with a miRNA-binding site existed across many subgroups (I, III, IV, VI, and VII) in wheat, suggesting that the miRNA-binding sites have been conserved because of their functional importance. A previous study showed that tae-miR156-TaSPL3/17 interact with DWARF53 to regulate TEOSINTE BRANCHED1 (TaTB1) and BARREN STALK1 (TaBA1) expression, thereby regulating wheat tillering and spikelet development 36. The wheat TaSPL16 (TaSBP15B in this study) gene has miR156-binding sites in its terminal exon 14. It has been reported that miR156 is responsible for the temporal expression of SPL13 during vegetative development 35. As in Arabidopsis and Brassica napus, a previous study reported that the homologous genes in wheat are predicted to be targets of miR156 16. Moreover, in the present study, the sites complementary to miR156 were located in the CDS of 19 TaSBP genes and in the 3′-UTR regions of five TaSBP genes. These results suggest that the miR156-binding site in SBP-box genes is conserved across plant species.

Materials and methods
Data retrieval and identification of SBP-box genes.
To identify the SBP-box genes in wheat, the HMMER profile of the SBP-box-binding domain (PF03110) was obtained from the Pfam database (https://pfam.xfam.org/) and searched against the protein sequences of wheat with an E-value threshold of < 1e-5. The SBP-box protein sequences of 16 Arabidopsis and 19 rice SBP-box genes 18 were retrieved from the Ensembl Plants database and used to conduct a BLASTP search against the protein sequences of wheat with an E-value threshold of < 1e-5 and an identity of 50%. After BLASTP, a self-BLAST and manual correction were performed to remove alternative splicing events and redundancy. Finally, the NCBI Conserved Domains Database (CDD; https://www.ncbi.nlm.nih.gov/cdd) and the Simple Modular Architecture Research Tool (SMART) program (https://smart.embl.de/) were used to confirm the putative SBP-box proteins. The subcellular location of SBP-box proteins was predicted using the CELLO web tool (https://cello.life.nctu.edu.tw/). The theoretical isoelectric point and molecular weight of SBP-box proteins were predicted using the ExPASy tool (https://www.expasy.org/). To further verify the existence of TaSBP genes in wheat, we performed BLASTN 37 to search for expressed sequence tags (ESTs) using the CDSs of TaSBP genes.
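The thresholding step applied to the BLASTP results can be sketched in Python. The sketch assumes the hits were saved in the standard 12-column BLAST tabular format (subject ID in column 2, percent identity in column 3, E-value in column 11); the source does not state which output format was used, and the function name and sequence identifiers below are hypothetical.

```python
import csv
import io


def filter_blast_hits(tabular_text, max_evalue=1e-5, min_identity=50.0):
    """Return sorted subject IDs with at least one hit passing both thresholds.

    ASSUMPTION: input is BLAST 12-column tabular text
    (qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore).
    """
    kept = set()
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        subject_id = row[1]
        identity = float(row[2])
        evalue = float(row[10])
        if evalue < max_evalue and identity >= min_identity:
            kept.add(subject_id)
    return sorted(kept)


# Hypothetical hits: only the first row passes both thresholds.
example_hits = (
    "querySBP1\twheatProt1\t78.5\t120\t20\t1\t1\t120\t5\t124\t1e-40\t250\n"
    "querySBP1\twheatProt2\t42.0\t110\t60\t2\t1\t110\t9\t118\t1e-12\t90\n"
    "querySBP2\twheatProt3\t65.0\t40\t14\t0\t1\t40\t3\t42\t0.002\t35\n"
)
candidates = filter_blast_hits(example_hits)
```

Candidates retained by such a filter would still need the redundancy removal and domain confirmation steps described above.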
The protein sequences, cDNA sequences, DNA sequences, upstream 2-kb genomic DNA sequences, and CDSs of SBP-box genes used in this study were downloaded from the Ensembl Plants database (https://plants.ensembl.org/index.html) for further analysis.

Subcellular localization of TaSBPs.
To check the subcellular localization of the TaSBP4B, TaSBP9B, and TaSBP10A proteins in wheat protoplasts, GFP fusion proteins were constructed. The full-length cDNAs of TaSBP4B, TaSBP9B, and TaSBP10A were each cloned in frame with GFP to generate the constructs. These recombinant plasmids were transformed into wheat protoplasts using a polyethylene glycol (PEG)-mediated transient transformation system 38. Visualization of the fluorescent proteins was performed using an Olympus IX83-FV1200 confocal microscope with excitation wavelengths of 460/480 nm for GFP and 633 nm for chloroplasts.

Gene structure and conserved motif analyses.
The exon-intron structure of TaSBP genes was graphically displayed using the Gene Structure Display Server 44 with the CDSs and DNA sequences of TaSBPs. The amino acid sequences of TaSBPs were used to predict the conserved motifs using the MEME Suite web server 45 with the maximum number of motifs set at 10 and the optimum width of motifs from 5 to 200 amino acids.

[...] These data were used to analyze the expression profiles of TaSBP genes in 15 tissues, i.e. the root when the cotyledon emerged, three leaves were visible, and the stem reached its maximum length; the stem when the nodes or internodes were visible, half of the flowers were open, and elongation had begun; the leaf when the main shoot and axillary shoots were visible (with three nodes), the cotyledon emerged, and the whole plant fruit had formed; the inflorescence when the flowers opened, two nodes or internodes were visible, and the stem reached maximum length; and the grain when 30-50% of the whole plant grain had formed, 70-100% of the whole plant grain had formed, and the whole plant grain had ripened.

[...] 47. To normalize the total amount of cDNA present in each reaction, the wheat ACTIN gene was co-amplified as an endogenous control for calibration of the relative expression 48.
The relative expression level was calculated using the $2^{-\Delta\Delta C_T}$ method 49.

Figure legend (expression profiles of TaSBP genes). The tissues were as follows: the root when the cotyledon emerged (a), three leaves were visible (b), and the stem reached its maximum length (c); the stem when two nodes or internodes were visible (d), half of the flowers were open (e), and elongation had begun (f); the leaf when the main shoot and axillary shoots were visible (with three nodes) (g), the cotyledon emerged (h), and the whole plant grain had formed (i); the inflorescence when the flowers opened (j), two nodes or internodes were visible (k), and the stem reached its maximum length (l); the grain when 30-50% of the whole plant grain had formed (m), 70-100% of the whole plant grain had formed (n), and the whole plant grain had ripened (o). The abiotic stresses were as follows: (p) normal condition, (q) heat stress for 1 h, (r) heat stress for 6 h, (s) drought stress for 1 h, (t) drought stress for 6 h, (u) heat and drought stress combination for 1 h, (v) heat and drought stress combination for 6 h.
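For readers unfamiliar with the relative quantification used for the quantitative RT-PCR data, the calculation can be sketched in Python as follows. The function name and the Ct values in the example are hypothetical; the reference gene corresponds to the ACTIN control mentioned in the methods, and the sketch assumes amplification efficiencies close to 100%, as the method itself does.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene relative to the control sample (2^-ΔΔCt).

    ΔCt  = Ct(target) - Ct(reference gene), computed per sample
    ΔΔCt = ΔCt(treated) - ΔCt(control)
    """
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_treated - delta_control
    return 2 ** -delta_delta


# Hypothetical Ct values: the target crosses threshold two cycles earlier
# under treatment while the reference gene is unchanged.
fold_change = relative_expression(22.0, 18.0, 24.0, 18.0)
```

A value above 1 indicates up-regulation relative to the control and a value below 1 indicates down-regulation; identical ΔCt values in both samples give exactly 1.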

Conclusions
In this study, we systematically identified TaSBP genes in the wheat genome. Forty-eight TaSBPs were identified, and each contained a conserved SBP-box domain. The chromosome locations, gene and protein structures, subcellular localization, phylogenetic relationships, miR156-binding sites, and cis-elements were also characterized. The TaSBP expression levels in different tissues suggested that they are involved in flower development.
Quantitative RT-PCR analysis showed that the tested TaSBP genes were highly expressed in inflorescences and in response to abiotic stressors. This study establishes a foundation for further investigation of TaSBP genes and provides novel insights into their biological functions.
Received: 12 January 2020; Accepted: 21 September 2020

Figure 9. Results of quantitative RT-PCR of 10 TaSBP genes (A) in different tissues and (B) under different abiotic stresses. The horizontal and vertical co-ordinates represent four different tissues/abiotic stresses and the relative expression, respectively. Statistically significant differences are indicated: *P < 0.05; **P < 0.01 (Student's t-test).