Genome-wide identification and expression analysis of bZIP gene family in Carthamus tinctorius L.

The basic leucine zipper (bZIP) is a widely known transcription factors family in eukaryotes. In plants, the role of bZIP proteins are crucial in various biological functions such as plant growth and development, seed maturation, response to light signal and environmental stress. To date, bZIP protein family has been comprehensively identified in Arabidopsis, castor, rice, ramie, soybean and other plant species, however, the complete genome-wide investigation of Carthamus tinctorius-bZIP family still remains unexplained. Here, we identified 52 putative bZIP genes from Carthamus tinctorius using a draft genome assembly and further analyzed their evolutionary classification, physicochemical properties, Conserved domain analysis, functional differentiation and the investigation of expression level in different tissues. Based on the common bZIP domain, CtbZIP family were clustered into 12 subfamilies renamed as (A–J, S, X), of which the X is a unique subfamily to Carthamus tinctorius. A total of 20 conserved protein motifs were found in CtbZIP proteins. The expression profiling of CtbZIP genes deciphered their tissue-specific pattern. Furthermore, the changes in CtbZIP transcript abundance suggested that their transcription regulation could be highly influenced by light intensity and hormones. Collectively, this study highlights all functional and regulatory elements of bZIP transcription factors family in Carthamus tinctorius which may serve as potential candidates for functional characterization in future.


Classification of the CtbZIP proteins based on phylogram.
We constructed a phylogenetic tree to elucidate the evolutionary relationship among bZIP TFs of Carthamus tinctorius L., Arabidopsis thaliana, Oryza sativa and Ricinus communis (Fig. 1). Arabidopsis thaliana bZIP family has been classified into 13 subfamilies 4 . The bZIP TFs of most of plant species are classified according to the subfamilies of Arabidopsis. For example, the bZIP proteins of Oryza sativa were divided into 10 subfamilies 5 , Ricinus communis into 9 26 and Camellia sinensis into 11 27 . We divided the 52 CtbZIP TFs into 12 subfamilies (CtbZIP-A, CtbZIP-B, CtbZIP-C, CtbZIP-D, CtbZIP-E, CtbZIP-F, CtbZIP-G, CtbZIP-H, CtbZIP-I, CtbZIP-J, CtbZIP-S and CtbZIP-X) on the basis of the classification of Arabidopsis. However, CtbZIP13, CtbZIP14, CtbZIP20 and CtbZIP46 could not be aggregated into any subfamily thus were grouped together into a separate branch named as subfamily X. None of CtbZIP  28 showed one or more of the intact conserved domains (bZIP_1, bZIP_2 and bZIP Maf while 20 conserved motifs were identified using MEME software 29 the names and sequence logos of which are illustrated in Fig. 2. We counted the width and E value of each conservative motif using TB tools 30 (Fig. 3A), and the distribution number of motifs in each subfamily was depicted (Fig. 3B). In terms of size, motif 20 was the shortest (20 aa) while motif 3, 11, 12, 17 and 18 were longest having 50 aa each. The motif average width lied around 38 aa. Interestingly, motif1 and motif2 were recognized as bZIP conserved domains and could be found in all of the subfamilies, however some subfamilies also had unique motif compositions (Fig. 3B,C). For example, subfamily A possesses a unique motif6, whereas motif11 is unique to subfamily D, motif 17 in subfamily I and motif13 in subfamily S. For the safflower specific subfamily X, CtbZIP20 and CtbZIP46 specifically contain motif 19, which are associated next to the N-terminus of the amino acid sequence and substantially identical to the bZIP conserved domain (Fig. 3C). All of these motifs indicate the group-specific functions for members in each group.
Functional differentiation of CtbZIP TFs. Some motifs of bZIP TFs participate in a variety of physiological processes. To understand their function in the biological processes, we predicted the function of CtbZIP TFs in silico using Gene Ontology (GO) terms 31 . All of 52 CtbZIP TFs were analyzed, 45 of which categorized into three primary GO functional categories, biological processes (BP), molecular function (MF) and cellular   5B) and participate in the process of nitrogen metabolism. A number of CtbZIP TFs might respond to various abiotic stresses. All CtbZIP TFs have transcriptional regulatory activity, this allows them to regulate the growth and development of safflower. Based on these findings, the function of CtbZIPs may be associated with various biosynthetic and metabolic processes in response to abiotic and biotic stresses to affect the development of various tissues and organs.
Expression profiles and network analysis of CtbZIP TFs. The bZIP TFs are not only the most widely distributed and most conserved eukaryotic transcription factors, but their function is also diverse. The safflower bZIP TFs have a variety of functions and there are synergistic effects in the exercise of their functions. In order to explore the expression profiles and the interaction among the CtbZIP TFs, we analyzed their expression variation in different tissues, including roots, stems, leaves, flowers, DAF10-seeds, DAF13-seeds and DAF20-seeds by heatmap (Fig. 6). We noticed that CtbZIP13 highly expresses in roots. CtbZIP6 and 25 transcripts are abundant while that of CtbZIP40, 23 and 29 are less in stem. CtbZIP13 and 25 have higher expression in flowers than in other samples. High expression of CtbZIP5 is observed in DAF13-seeds. Similarly, CtbZIP52 highly expresses in DAF20-seeds. However, the expression levels of CtbZIP22 is almost the same in all of the 7 samples. The varied expression pattern indicates functional divergence of different groups of CtbZIP TFs. These results indicate that the functions of CtbZIP family are differentiated with differentiation in their expression. We quantified the expressions of all 52 CtbZIP TFs in different tissues and seeds (of various developmental stages). The expression networks (p ≤ 0.05) (Fig. 7) were constructed using BioLayout Express 3D 3.2 software 32 . The CtbZIP TFs are a complex family with 51 nodes and 1,199 edges. Among them, 43 transcripts (85%) are more tended to have associated expression and form a co-expression network whereas the other 8 transcripts also exhibit weak co-expression. The network is composed of 5 clusters; the largest cluster contains sixteen transcripts, while the smallest cluster contains eight. There is a certain degree of related expression trend between these clusters and this tendency was statistically significant. These results indicate that although the functions

Expression analysis of CtbZIP TFs in various tissues.
To further verify the authenticity of the expression pattern, we detected the expression level of 52 CtbZIP genes in different tissues of safflower including roots, stems, leaves, flowers, seeds, cotyledons and hypocotyls using RT-qPCR (Fig. 8). The results showed that the CtbZIP25 gene is highly expressed in all tissues and we speculated that it may be involved in various stages of plant growth and development. The CtbZIP13 is highly expressed in root and might play a role in root growth. In seeds, CtbZIP52 has the highest expression and might regulate the development of seeds. Likewise, CtbZIP25and CtbZIP30 have higher expression in hypocotyls. The expression level of CtbZIP6 and CtbZIP25 peak in stem and they may affect the growth of the stem. Conversely, the expression level of CtbZIP2, CtbZIP23, CtbZIP31 and CtbZIP34 is relatively low in all tissues, among which CtbZIP34 is the lowest in roots while CtbZIP2, CtbZIP22, CtbZIP31 and CtbZIP47 are the lowest in stems. Similarly, CtbZIP23 and CtbZIP47 are the lowest in leaves, CtbZIP23 in flowers and CtbZIP19, CtbZIP20 and CtbZIP23 in seeds have the lowest expression. However, CtbZIP22 gene expresses in cotyledon and hypocotyl after seed germination. This indicates that the CtbZIP22 gene is specifically involved in seed germination. In short, the results of RT-qPCR show that the expression pat-

Discussion
Safflower is an important plant used for ornamental, food, feed and medicinal purposes. In terms of tolerance for abiotic stresses such as water deficit, it is a tough plant however, for increasing demand of edible oil as well its vast pharmaceutical properties, its improvement seeks comprehensive understanding through omics. Omics by combining genomics, transcriptomics, proteomics and metabolomics (as solving a puzzle) attempts to obtain a clear picture of molecular and biochemical circuitries underlying primary and secondary metabolites/products 33 . Our genomic survey identified 52 members in Carthamus tinctorius bZIP TF family. These TFs constitute a large families in all organisms reported to date. CtbZIPs also look a big gene family however, as compared to Arabidopsis (78 TFs), rice (89), maize (125), Brassica napus (247) and soybean (131), safflower got a relatively small bZIP family. Based on phylogenetic reconstruction (Fig. 1), we categorized CtbZIPs into 13 subfamilies (A-J, S and X) according to their relevance in Arabidopsis 4 , rice 5 , Ricinus communis 26 and Camellia sinensis 27 . This categorization was further supported protein structure analyses. None of CtbZIP proteins clustered into subfamily K and M indicating loss of these proteins throughout safflower evolution.
CtbZIPs protein structure analyses revealed 20 motifs in total, same as reported in Manihot esculanta 38 , which were named sequentially from motif1 to motif20 (Figs. 2, 3). Relating their motifs to some known motifs revealed some functions of CtbZIP TFs. The motif2 was further identified as the extension of the leu zipper region, closely related to motif1. The motif4 was a new highly conserved cysteine-rich sequence which might be involved in protein-protein or protein-DNA interactions. In most of the cases, motif1 and motif2 conserved domains are located next to each other, however, some motifs are located far from each other. The maximum distance between two motifs is found in CtbZIP45 of subfamily D. In addition, there are three motifs (motif4, 5, 13) between bZIP domains in subfamily E of CtbZIP TFs, and motif1 and motif2 together with three motifs form a conserved structural group, as the subfamily E of OsbZIPs 5 . The same situation exists in the subfamily I of CtbZIP TFs, motif1 and motif2 together with motif4, 5, 9, 17 form a conserved structural group, but motif9 is not between motif1 and motif2. The conserved groups of E and I subfamilies exist near the C-terminus which predicts that the functions of subfamily E and subfamily I could make a significant difference with other subfamilies. CtbZIP26 only contains the bZIP domain (motif1 and motif2) in the subfamily H, which confirms that the function of CtbZIP26 is more conservative. The motif11 in subfamily D is a conserved structure of Dog1 (PF14144) also found in Arabidopsis bZIP 4 . This family appears to be a highly specific controller of seed dormancy. On one hand, MEME results further prove that outcomes of Hidden Markov Model (HMM) have high reliability. On the other hand, they also reveal the functional diversity of CtbZIP family. These analyses are an important starting point for further functional verification.
The genome-wide expression prediction of CtbZIPs genes flaunted their differential transcript level in various developmental stages and tissues. As shown in Fig. 6, there seems a vast level of divergence in expression pattern with respect to tissue type and seed stage. The varied expression pattern indicates functional divergence of different groups of CtbZIP TFs, which predicts that the functions of CtbZIP family vary with variation in their expression. We quantified the expressions of all 52 CtbZIP TFs in different tissues and seeds (of various developmental stages). The network is composed of 5 clusters as shown in Fig. 7. There is a certain degree of related expression trend between these clusters and this tendency was statistically significant. These results indicate that although the functions and expressions of CtbZIP family members have dramatically diverged, they retain to some extent, www.nature.com/scientificreports/ the tendency of correlated expression and functional cooperation. To verify the transcript abundance of CtbZIPs genes, we used RT-qPCR and evaluated their expression in root, stem, leaf, flower, seed, cotyledon and hypocotyl (Fig. 8). The results of RT-qPCR showed that the expression pattern of safflower is consistent with the predicted expression. According to this expression pattern, the function of CtbZIP TFs can be more effectively estimated.
In the process of plant growth and development, light and hormone are the key factors that directly affect these two processes. At present, it has been confirmed that the A subfamily bZIP members of Arabidopsis thaliana are mainly involved in ABA signaling 39 whereas H and G subfamilies regulate photoresponse 14,40 . In rice, Values are average of three replicates ± SD. Asterisks indicate significant difference applying ANOVA (p < 0.05, p < 0.01 and p < 0.001).

Scientific RepoRtS
| (2020) 10:15521 | https://doi.org/10.1038/s41598-020-72390-z www.nature.com/scientificreports/ OsbZIP12 has been reported as a positive regulator of ABA signalling 41 while in Medicago esculenta, bZIP11, 27, 52 and 64 were upregulated at time points of ABA treatment 38 . In Ipomoea trifida, eight bZIP genes were upregulated at least in one tissue type as well as one time point, in response to ABA treatment 42 . AtbZIP16 has been reported to regulate early development of seedling by integrating hormone and light signalling pathways thereby promoting germination as well elongation of hypocotyl 43 . Under RL (Red Light) treatment, ClabZIP6 and ClabZIP56 were significantly induced while ClabZIP37 and ClabZIP22 were repressed in leaves of Citrullus lanatus 44 . Figure 9 depicts that changes in expression of CtbZIPs under GA3 and light reveal that some of CtbZIP genes might be directly or indirectly affected by light intensity and hormones. These results provide a basis for further exploration of the function of CtbZIP TFs. In summary, our study provides genome-wide analysis of the safflower bZIP family. We accurately screened 52 CtbZIP TFs, and divided them into 12 subclasses by identifying the conserved homology between them. Their basic physical and chemical properties were analyzed including ORF, number of amino acids and conserved structural positions. A total of 20 conserved structures are found in CtbZIP TFs family. All CtbZIP TFs contain a typical conserved bZIP_1 domain. For the enrichment analysis of the CtbZIP TFs, we found that 45 of the 52 CtbZIPs were enriched, and among the 45, none of the genes were individually enriched into a certain GO functional category. Six CtbZIP TFs were enriched in three major categories CC, BP and MF, and 39 CtbZIP TFs are enriched in BP and MF. A total of four clusters within the CtbZIP TFs were discovered, which constitute a complex interplay network. The expression patterns of the CtbZIP family were predicted and verified by heat map and qRT-PCR. This study improves our understanding of safflower bZIP TFs and lays the foundation of cultivating new cultivars of safflower through molecular breeding methods.

Methods
Plant materials and treatments. The JiHong No. 1 safflower seeds purchased from safflower edge Co.
Ltd. in Xinjiang of China, were cultivated in experimental field of Jilin Agricultural University for multiplication. The collected seeds of safflower were germinated in soil and allowed to grow at 23 ± 2 °C in growth room. It takes about 7 days to sprout cotyledons and hypocotyls, flowers in approximately 100 days while seeds in about 135 days. For light treatment, some safflower plants were grown under normal light radiation (16.8 MJ/m 2 ) while another set of plants under weak light radiation (5.04 MJ/m 2 ). For GA3 treatment, the plants that grew after flowering were sprayed with 50 mg/L GA3 once daily for 5 days. Each experimental group was sprayed simultaneously at 10 am. We collected various tissues, such as leaf, stem, root, flower, cotyledon, hypocotyl and seeds, immediately froze in liquid nitrogen and stored at − 80 °C for further use.
Identification and characterization of CtbZIP TFs. The sequences of CtbZIP were obtained from the safflower genome database (Accession: PRJNA399628 ID: 399628). We downloaded HMM profile of bZIP_1 (PF00170) from Pfam database 28 (https ://pfam.xfam.org/) and the similar sequence of bZIP_1 was searched using Hidden Markov Model (HMM) as the query (P < 0.001). To avoid missing possible bZIP members, NCBI BLAST was performed using the known Arabidopsis bZIP sequences (downloaded from the TAIR, https ://www. arabi dopsi s.org/), as queries against the safflower genome database 26 . All of the possible bZIP TFs were screened according to the significant e-value < 1 × 10 -5 in our data. In addition, the conserved bZIP domains were predicted using SMART 45 (https ://smart .embl-heide lberg .de/) and Search Pfam 28 (https ://pfam.xfam.org/searc h/ seque nce) in all of the possible bZIP TFs. Therefore, the high-confidence bZIP TFs were screened, which were named as CtbZIP. Afterwards, we analyzed the physical and chemical properties of the predicted high-confidence CtbZIP TFs by ProtParam online tool 46 (https ://www.expas y.org/). Phylogenetic analysis of the CtbZIP proteins. The bZIP protein sequences of Arabidopsis and Ricinus communis were downloaded from database of PlantTFDB (https ://plant tfdb.cbi.pku.edu.cn) and that of rice were downloaded from the Rice Genome Annotation Project 47 (https ://rice.plant biolo gy.msu.edu/index .shtml ). Multiple alignment of the full-length bZIP sequences of safflower, Arabidopsis, rice and Ricinus communis was executed using Clustal X 2.0 program 48 and saved in the Clustal X file format. Using MEGA 7.0 program 49 , we constructed a cladogram tree with 1,000 bootstrap replications and Neighbor-joining algorithm. The phylogenetic tree was modified using the iTOL online software 50 (https ://itol.embl.de/login .cgi).

Motifs analysis of CtbZIP proteins.
We searched the open reading frames of CtbZIP genes through the ORF finder at NCBI (https ://www.ncbi.nlm.nih.gov/gorf/gorf.html). CtbZIP transcripts were analyzed in the Pfam 28 (https ://pfam.sange r.ac.uk/) protein database. Analysis of the conserved motifs in safflower CtbZIP TFs were further carried out by multiple EM for motif elicitation software (MEME 29 ) (https ://meme.sdsc.edu/meme/ cgi-bin/meme.cgi) with default parameters. The maximum number of motifs was set to 20 and motif width to 6-50aa. Whereafter a conservative structure was generated using TBtools 30 (https ://www.tbtoo ls.com/). The related motif information used is listed in Table S2.
Gene ontology annotations of CtbZIP TFs. The functions of the CtbZIP TFs were categorized in silico using Blast2GO software 31 (https ://www.blast 2go.com/). The GO functional categorization of 52 CtbZIP TFs was used into each subcategory for enrichment analysis. The enrichment of the number of CtbZIP transcripts categorized into each subcategory was determined by Chi-square test.
Network analysis of the CtbZIP TFs. The construction of the co-expression network is conceptually simple and intuitive. Through the similarity of gene expression, the possible interactions of gene products can Gene expression patterns analysis. To investigate the CtbZIP gene family expression patterns, the highthroughput safflower transcriptome sequencing data were used to analyze the CtbZIP gene expression patterns in various tissues for roots, stems, leaves, flowers and DAF10, 13 and 20 seeds. The expression estimations of CtbZIP genes were normalized and represented in the form of RPKM (reads per kilo base per million mapped reads), and fold change (log2) values were calculated through the ratio of gene expression to draw heatmaps with R 51 and TBtools 30 software.
RNA extraction and cDNA synthesis. The experimental materials (various tissues: root, stem, leaf, flower, seed, cotyledon, hypocotyl) were pulverized adequately and put into centrifuge tubes. Total RNA of various tissues was isolated using Trizol (Invitrogen, Carlsbad, CA, USA), according to the instructions of the manufacturer. The extracted total RNA was treated with RNase-free DNase (Promega, USA) to remove the genomic DNA contamination. RNA quality was checked on OD260/280 values by Nano Drop 2000 (ThermoFisher Scientific, Beijing, China) and 1.2% agarose gel electrophoresis. The cDNA was synthesized from total RNA isolated from various tissues using the PrimeScript RT reagent kit with gDNA Eraser (Takara, Japan), according to the manufacturer's protocols. First, 2 μL 5 × DNA Eraser buffer, 1 μL gDNA Eraser, 2 μL total RNA (about 1,000 ng) and 5 μL RNase free ddH 2 O were mixed in tube and incubated at 42 °C for 2 min to remove DNA. The purified RNA was reverse-transcribed into cDNA by adding 4 μL 5 × PrimeScript buffer, 1 μL PrimeScript enzyme mix I, 1 μL RT primer mix and 4 μL RNase free ddH 2 O into the above-mentioned reaction and incubated at 37 °C for 15 min followed by 85 °C for 15 s. The cDNA was stored at − 20 °C.

Real-time fluorogenic quantitative PCR.
Real-time fluorogenic quantitative PCR (RT-qPCR) was performed using SYBR Premix Ex Taq II kit (Takara, Japan) and Stratagene Mx3000P thermocycler (Agilent) to monitor DNA products. The most stable housekeeping reference gene (EF1α) was selected for the expression analysis in various tissues. The relative expression of CtbZIP was normalized to the expression of EF1α and expressed relative to the level in various treatment. Gene-specific primers designed for the CtbZIP genes are listed in Table S3. RT-qPCR amplification was performed in 15 μL reaction volume containing 500 ng template cDNA (1 μL), 0.3 μL primer (10 m), 7.5 μL SYBR Premix Ex Taq (2×), 0.3 μL ROX Reference Dye (10 m), and 5.6 μL DEPC ddH 2 O. RT-qPCR profile was set as an initial denaturation at 95 °C for 5 min, followed by 40 cycles of 95 °C for 5 s and annealing at 60 °C for 30 s. The fold change in relative expression level was calculated using 2 − CT method.

Statistical analysis.
The experiment was designated for three random replications. All data were analyzed by one-way analysis of variance (ANOVA) and all means were separated at the P < 0.05 level. The different tissues and GA3 treatment by the biological significance of the differential expression were analyzed.