Identification of Alfalfa SPL gene family and expression analysis under biotic and abiotic stresses

The SQUAMOSA promoter binding-like protein (SPL) is a specific transcription factor that affects plant growth and development. The SPL gene family has been explored in various plants, but information about these genes in alfalfa is limited. This study, based on the whole genome data of alfalfa SPL, the fundamental physicochemical properties, phylogenetic evolution, gene structure, cis-acting elements, and gene expression of members of the MsSPL gene family were analyzed by bioinformatics methods. We identified 82 SPL sequences in the alfalfa, which were annotated into 23 genes, including 7 (30.43%) genes with four alleles, 10 (43.47%) with three, 3 (13.04%) with two, 3 (13.04%) with one allele. These SPL genes were divided into six groups, that are constructed from A. thaliana, M. truncatula and alfalfa. Chromosomal localization of the identified SPL genes showed arbitary distribution. The subcellular localization predictions showed that all MsSPL proteins were located in the nucleus. A total of 71 pairs of duplicated genes were identified, and segmental duplication mainly contributed to the expansion of the MsSPL gene family. Analysis of the Ka/Ks ratios indicated that paralogs of the MsSPL gene family principally underwent purifying selection. Protein–protein interaction analysis of MsSPL proteins were performed to predict their roles in potential regulatory networks. Twelve cis-acting elements including phytohormone and stress elements were detected in the regions of MsSPL genes. We further analyzed that the MsSPLs had apparent responses to abiotic stresses such as drought and salt and the biotic stress of methyl jasmonate. These results provide comprehensive information on the MsSPL gene family in alfalfa and lay a solid foundation for elucidating the biological functions of MsSPLs. This study also provides valuable on the regulation mechanism and function of MsSPLs in response to biotic and abiotic stresses.


Results
Identification of MsSPL gene family in alfalfa. Twenty-three MsSPL genes were identified from the alfalfa allele haploid database, and these genes were named MsSPL1-MsSPL23 according to the location of chromosomes ( Fig. 1 and Table 1). The allele of each MsSPL was named with the gene names of "-1" to "-4", including 7 genes with 4 alleles (30.43%), 10 genes with 3 alleles (43.47%), 3 genes with 2 alleles (13.04%), 3 genes with 1 allele (13.04%), MsSPL15 have two tandem duplicators. In addition, some genes also contain paralogs from different homologous chromosomes, and these were regarded as different copies of one number of the SPL gene family.
The basic physicochemical properties of 23 genes including alleles, tandem duplicates and paralogous were shown in supplementary Table S1. The predicted physicochemical properties of amino acid sequences indicated that the 23 SPL genes encode proteins containing 72 (MsSPL5) to 1026 (MsSPL7-2) amino acids with molecular weights ranging from 8447.27 (MsSPL5-1,-2,-3) to 113,299.24 Da (MsSPL1-1). The overall mean of the hydrophilic (GRAVY) scores of all SPL proteins was negative, indicating that they are all hydrophilic proteins. The predicted that isoelectric points ranged from 5.33 (MsSPL9-3) to 10.6 (MsSPL5). The subcellular localization predictions showed that all MsSPL proteins were located in the nucleus. The secondary structure prediction of alfalfa SPL proteins showed that random coil is the main component of the secondary structure of MsSPL protein (Supplementary Table S2), accounting for 71.01-45.83%, followed by extended chain and α-helix, accounting for 37.20-11.58% and 33.33-7.71%, respectively. The tertiary structure of MsSPL protein showed that the MsSPL protein family was dominated by random coils (Supplementary Table S2), which was consistent with the prediction of the secondary structure. The modeling and prediction of the tertiary structure of MsSPL proteins provide a theoretical reference for the subsequent research of MsSPL proteins family.
Chromosome mapping of SPL gene in alfalfa. Physical location of 23 genes (82 SPL sequences) were unevenly distributed on 26 chromosomes (2n = 4x = 32) of alfalfa (Fig. 1). The most numerous of them were on chromosome chr 4.2 with 7 SPL sequences, chromosomes chr 1.4, chr 3.1, chr 3.2, chr 5.1, chr 5.3, and chr 5.4 all had only one SPL sequence distribution, and the rest of the chromosomes had a range of two to five SPL sequences.
Phylogeny of alfalfa SPL protein family. To analyze the evolutionary relationships of the alfalfa SPL protein family, 16 A. thaliana SPL proteins from A. thaliana and 28 SPL proteins from M. truncatula were selected to construct a phylogenetic tree together with 23 alfalfa SPL proteins identified in this study (Supplementary Table S3). Results showed that these SPL proteins were classified into six clades based on phylogenetic relationship (Fig. 2). The remaining 6 subgroups all have MsSPL members, of which the I subgroup contains 27 members, the largest number; the II, III, IV, V, VI subgroups contain 7, 20, 6, 2, 5 MsSPL members. There are a total of 19  MsSPL gene structure analysis, conserved motif, and binding domain analysis. Further gene structure analysis was performed on alfalfa SPL genes. In the MsSPL gene family, the number of introns ranges from 0 to 4 (Fig. 3b), while, 10 out of 82 alleles and paralogs contain no intron, 9 contain one intron, 25 contain two introns and 38 contain more than three introns. The structure of alleles in the same subfamily was similar, but the structure of alleles in different subfamilies was different, which indicated the diversity of gene evolution trend and the diversity of gene structure.
The similarity and diversity of motifs in different MsSPL proteins were explored, MEME software was used to predict and Tbtools to predict the structural protein domains. The results showed that MsSPL2-1 and MsSPL10-1 had two SBP structural domains, the rest had one SBP structural domain ( Supplementary Fig. S1). Ten relatively  www.nature.com/scientificreports/ conserved motifs (named motif 1-10) were identified (Fig. 3d), and the conserved motifs of members of the same subfamily were similar, the subfamily with the largest number of exons also contained the most motifs (Fig. 3c). 88% (72) genes of the 82 alleles and paralogs contained multiple motif structures, and 12% (10) genes contained only one motif structure, among which motif 1, motif 3, motif 4, and motif 8 were the most conserved and shared by most MsSPL proteins. In addition to common motifs, each group of motifs also has certain specificity, and there are similar conserved motifs among subfamilies, such as motif 2, motif 6, and motif 10, which are only found in I subgroup. The presence of subfamily-specific conserved motifs in the subfamily may play an critical role in the functional specificity of the subfamilies. www.nature.com/scientificreports/ Analysis of cis-acting elements. To further predict the potential functions of the alfalfa SPL gene family, the promoter regions of the alfalfa SPL gene family were analyzed and found that it contained multiple types of cis-acting elements (Fig. 4), among which the stress response elements included drought, low temperature, wound, and defense and stress element; hormone response elements included growth gibberellin, abscisic acid, salicylic acid, methyl jasmonate response element. As seen in Fig. 4 that the ditribution of cis-element between MsSPL alleses promoters is different, indicating that MsSPL genes may participate in plant abiotic stress

Analysis of SPL protein interaction in alfalfa.
Using protein network interactions to connect unknown functional proteins into protein interaction networks will help further understand the biological functions of proteins. Therefore, in this study, M. truncatula was used as a background to predict the potential interacting proteins associated with the protein function of MsSPLs. The expected number of edges in the interaction network graph was 11, the average local clustering coefficient: was 0.832, the protein-protein interaction enrichment P-value was 0.00948, which was considered reasonable for the results. A total of 10 functional molecules directly related to MsSPL proteins and 10 potential interacting proteins were identified (Table 2), and all 10 interacting proteins were SQUAMOSA promoter-binding proteins (Fig. 5). It has interactions with superoxide dismutase, floral meristem control frond (LFY) protein, Ubiquitin-conjugating enzyme, MADS-box transcription factor growth regulator, DNA/RNA binding protein kin17, eukaryotic translation initiation factor 2c, plastocyanin et al. have an interaction relationship. It is speculated that the MsSPL proteins may be synergistically involved in the process of disease resistance and stress resistance in plants with these interacting proteins.

Analysis of the duplication events of the alfalfa MsSPL gene.
Gene duplication is one of the important mechanisms for plants to acquire and create new genes, and it is also the main driving force in the evolution of genomes. Gene duplication usually has two duplication modes: segmental duplication and tandem duplication. The alfalfa MsSPL gene duplication event was studied, tandem duplication and segmental duplication were analyzed. It was found that there were 71 pairs of gene duplications in the alfalfa SPL gene family (Supplementary Table S6), including MsSPL15-2/MsSPL15-5 and MsSPL15-2/MsSPL15-3 two pairs of tandem duplicatiors (Table 3), MsSPL1-1/MsSPL1-2 and other 69 pairs of segmental duplicators. The Ka/Ks (non-synonymous/synonymous) value is widely used to represent gene selection pressure and evolutionary rate: a Ka/Ks value greater than 1 indicates evolutionary accelerated positive selection, and Ka/ Ks = 1 indicates neutrality gene drift, Ka/Ks < 1 indicates purification selection under functional constraints. To further analyze the selection pressure of the alfalfa SPL gene family, the non-synonymous substitution rate Ka, synonymous substitution rate Ks, and Ka/Ks ratio of 71 pairs of homologous genes were analyzed (Supplementary Table S6). The homologous gene Ka/Ks ratio is < 1, ranging from 0.042 (MsSPL8-3/8-4) to 0.844 (MsSPL13-2/13-4, MsSPL13-3/13-4), indicating that the alfalfa SPL gene has undergone a great purification selection pressure so that its function can be maintained. It reflected that they do not have much functional differentiation in the evolution process and are highly conserved. The Ka/Ks ratio of the three pairs of homologs, MsSPL13-1/13-4, MsSPL13-3/13-4, and MsSPL15-5/13-4, was > 1, indicating that they had undergone positive evolutionary selection, and these three pairs of genes may be responsible for differentiating new functions ( Table 3).
The ancestors of the legume family originated about 67 million years ago, whereas the divergence time of Papilionoideae Subfamily, which includs the genus alfalfa, was approximately 34-63.7 MYA. The evolutionary divergence time (millions of years, MYA) between the genes of alfalfa SPLs was calculated using the formula T = Ks/2λ × 10 -6 (λ = 6.5 × 10 -9 ). The results showed that   www.nature.com/scientificreports/

Discussion
As a plant-specific transcription factor, SPL has a highly conserved SBP domain. SPL gene family has been identified in Arabidopsis (16) 30 and other species. In this study, 82 SPL sequences were identified by bioinformatics methods. These sequences were divided into 23 genes containing 4 (30.43%), 3 (43.47%), 2 (13.04%) and 1 (13.04%) alleles ( Table 1). As a type of transcription factor unique to higher plants, the expression product of SPLs should be located in the nucleus and play a regulatory role in the expression of downstream genes. The subcellular localization predictions showed that all MsSPL proteins were located in the nucleus. The results were consistent with the subcellular localization results of SPL gene family in Arabidopsis 4 . This suggests that SPL may function as a nuclear transcription factor. Analysis of SPL evolution in different species helps to predict gene function. By protein multiple sequence alignment and phylogenetic tree analysis, the identified 23 MsSPL proteins were classified into six subfamilies with the SPL proteins of A. thaliana (16) and M. truncatula (28) (Fig. 2), each subgroup from the A. thaliana, M. truncatula, and M. sativa have an unequal number of proteins, with 19 orthologs including MsSPL9 and MtSPL04, MsSPL7 and MtSPL06, MsSPL1 and MtSPL01 et al., and 9 paralogs pairs are AtSPL02 and AtSPL03, AtSPL04 and AtSPL05, which indicated that the MsSPLs is closely related to MtSPLs. The function of the gene is closely related to its structure. Analysis of gene structure and conserved motifs showed that MsSPL alleles and paralogs have similar motifs and exons/introns, which not only verified the construction of phylogenetic tree, but also further supported the conserved evolutionary characteristics of SPL gene family. 10 out of 82 alleles and paralogs contained no intron, and 72 contained one or more introns (Fig. 3). Previous studies reviewed that www.nature.com/scientificreports/ lacking introns in the coding regions is gengeally associated with rapid changing expression levels during cell division and cell differentiation, contributing to the generation of a new generation genes 31 .
Chromosome location analysis showed that 82 MsSPL sequences were unevenly distributed on 26 chromosomes (Fig. 1). Segmental duplication and tandem duplication were the main driving forces for gene acquisition of new functions and family expansion 32 . Among them, MsSPL15-2/MsSPL 15-5, MsSPL15-2/MsSPL15-3, two pairs of tandem duplication genes (Table 3), and 69 pairs of segmental duplication genes, indicating that segmental duplication is the main driving force for the evolutionary expansion of the MsSPL gene family. New research showed that angiosperms also had a whole genome duplication event 20 million years ago 33 . The analysis showed that the duplication time of the three pairs of genes, MsSPL15-2/15-5, MsSPL15-2/15-3, and MsSPL15-2/14, coincided with the time of the whole genome duplication event. The Ka/Ks ratio of 96% of homologous genes was less than 1 (Supplementary Table S6), indicating that purification selection played an important role in the evolution of SPL transcription factors in alfalfa and was highly conserved.
Using protein network interactions to connect proteins of unknown function to protein interaction networks will contribute to further understanding of protein biological functions enriched by protein network interactions, and dynamic regulatory networks between various biological logic molecules in cells 34,35 . Therefore, it is necessary to predict the potential interacting proteins of MsSPL proteins and their functions. AES82580 is a superoxide dismutase protein, MsSPL8 interact with AES82580 (Fig. 5), indicating that these proteins may be involved in scavenging free radicals and repairing related damaged cells. Plastocyanin is a class of coppercontaining electron transfer proteins involved in plant photosynthesis, growth and development, and adaptation to the environment 36 . MsSPL8 also interact with plastocyanin, suggesting that these proteins may be related to plastocyanin. LFY is a unique transcription factor in plants. It acts as the downstream of auxin response factor to promote the initiation of flower primordia and the formation of flower organs 37,38 . Our research analysis showed that multiple MsSPLs interact with LFY transcription factors. This is consistent with Gao et al. 28,39,40 that SPLs induce expression in floral organs.
A growing number of studies have shown that SPL transcription factors play an important regulatory role in inducing plant development and participating in plant biological functions. SPLs were found to be involved in wheat spike development in wheat 5 . SPLs are involved in early anther development in A. thaliana 4 . Citrus SPL gene was able to promote A. thaliana flowering independently of photoperiod 41 . The MtmiR156/MtSPL module in M. truncatula is involved in developing leaves, branches, and seed pods 42 . Most BpSPL genes in Betula platyphylla had high transcription levels in leaves, female inflorescences, and male inflorescences 12 . Most studies have shown that SPL transcription factors play an important role in responding to biotic and abiotic stresses. In lychee, 10 LcSPLs were highly expressed in cold response, and only LcSPL1 and LcSPL2 were associated with agedependent flowering in response to cold 43 . The adaptability to salt and drought stress environments was improved in birch trees overexpressing BpSPL9 44 . In response to exogenous hormone treatment, including indoleacetic acid (IAA), gibberellic acid (GA3), methyl jasmonic acid (Me JA), and abscisic acid (ABA), SmSPL6 exhibited different expression patterns 45 . Studies have shown that cis-acting elements in the promoter region have an impact on the number, type, and distribution of gene expression in different regulatory roles [46][47][48] . Therefore, the analysis of promoter regulatory elements is essential to study the function of specific genes. A large number of auxin-responsive elements, salicylic acid cis-acting elements, methyl jasmonate-responsive elements, abscisic acid-responsive elements, gibberellin cis-acting elements, and elements involved in abiotic stress response were predicted in the promoter region of MsSPL genes (Fig. 4). These include defense and stress, low-temperature, wound response, and drought response elements that play key roles in plant responses to biotic and abiotic stresses, as well as plant signal transduction. Previous studies mainly focused on the role of SPL genes in growth and development. In contrast, few studies on the function of SPL genes under stress, drought, salt, and methyl jasmonate stress treatment of alfalfa, the results of qRT-PCR showed that under drought stress, MsSPL1-4/4-1/7-3/5-4/23-2/21-3 contain ABA response elements, and these genes are significantly up-regulated under drought stress, indicating that may belong to the abscisic acid-dependent pathway; MsSPL9-2/18-3 do not contain drought response elements, and the genes are down-regulated, MsSPL20-1/17-4 contained drought-responsive elements, and gene expression was up-regulated, indicating that the drought-responsive elements of these four genes may be positively correlated with gene expression (Fig. 6). Studies have shown that high-salt environment can accelerate the accumulation of ABA in plants, and ABA accumulation can induce the expression of ABA-responsive element genes, thereby making plants resistant. In our study, 12 MsSPLs contained ABA-responsive elements, and all 12 genes tended to be up-regulated under salt stress (Fig. 7). This is consistent with the study of Irfan 49 . It is worth noting that the MsSPL3-3 contains five methyl jasmonate response elements. Under Me JA treatment, the MsSPL3-3 was significantly down-regulated, and the MsSPL4-1/7-3/17-4/18-3/23-2/21-3 without the methyl jasmonate response element were significantly up-regulated. (Fig. 8), which indicated that the methyl jasmonate response element in the promoter region was negatively correlated with possible gene expression. This indicated that MsSPL genes actively respond to biotic and abiotic stresses.

Conclusion
A detailed analysis of the phylogeny, gene structure, conserved motifs, protein interactions, cis-acting elements, and expression profiles of members of the MsSPL gene family was carried out. A total of 23 MsSPL genes were identified from the alfalfa genome in this study. Together with A. thaliana and M. truncatula to construct a phylogenetic tree, the SPL proteins can be divided into six subgroups. There are similar gene structures and conserved motifs in the same subgroup, and they are more closely related to M. truncatula. Segmental duplication is the main form of gene family expansion, and purifying selection plays an major role in the evolution of SPL transcription factors in alfalfa. Analysis of cis-acting elements revealed that MsSPLs are regulated by plant hormones and various stresses. QRT-PCR analysis showed that the MsSPL gene family had different spatiotemporal www.nature.com/scientificreports/ expression patterns under drought stress, salt stress, and methyl jasmonate stress. Collectively, our data add to the understanding of the genetic evolutionary relationships and biological functions of MsSPLs. Ultimately, the results of this study lay the foundation for further revealing the functional characteristics of the SPL gene family.

Materials and methods
Plant material and stress treatments. The seeds of alfalfa variety "Sundeli" provided by Jiuquan Future Physicochemical property analysis and chromosomal location analysis of alfalfa SPL. MsSPL amino acid residues, overall average hydrophilicity, isoelectric point, and molecular weight were analyzed using the ExPaSy protein server (https:// web. expasy. org/ protp aram/) 52 . WoLF PSORT (https:// wolfp sort. hgc. jp/) 53 was used to predict the subcellular location of MsSPL proteins. The location information of all MsSPL genes in the alfalfa genome was extracted, and the online tool TBtools was used to map the MsSPL genes to the corresponding chromosomes.

Phylogenetic classification analysis. Based on multiple sequence alignments of MsSPL, MtSPL and
AtSPL proteins could be divided into various groups. We performed phylogenetic analysis with MEGA 7.0 54 . The phylogenetic tree image was enhanced by the Evolview online program (http:// www. evolg enius. info/ evolv iew) 55 .
Gene structure and conserved motif analysis. The  www.nature.com/scientificreports/ Analysis of gene duplication and divergence time. Using NCBI-Protein Blast to align all amino acid sequences of the alfalfa SPL gene online and screen genes with a coverage rate of > 75% and identity of > 75%, this pair of genes is regarded as a duplicated gene. In addition, in a 100 kb region, two genes separated from multiple genes are considered tandem repeats 60 . The "Simple Ka/Ks Calculator" in TBtools (https:// github. com/ CJ-Chen/ TBtoo ls) was used to calculate the nonsynonymous (Ka) and synonymous (Ks) substitution rates of gene duplication pairs. The ratio of Ka/Ks to judge the selection pressure of replicating genes. Ka/Ks < 1 means purification selection, Ka/Ks = 1 means neutral selection, and Ka/Ks > 1 means positive selection 61 . The divergence time between SPL gene pairs is expressed in million years ago (MYA), and the calculation formula for divergence time is T = Ks/2λ (λ = 6.5 × 10 -9 ), where λ represents each synonym per year Synonymous mutation substitution rate for points.
Ethics approval and consent to participate. This study does not include human or animal subjects.

Statement on guidelines.
All experimental studies and experimental materials involved in this research are in full compliance with relevant institutional, national, and international guidelines and legislation.

Data availability
The alfalfa genome data, CDS sequences, and protein sequences used in this experiment were downloaded at www.nature.com/scientificreports/ Correspondence and requests for materials should be addressed to X.W.
Reprints and permissions information is available at www.nature.com/reprints.
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.