Evolution of an intron-poor cluster of the CIPK gene family and expression in response to drought stress in soybean

Calcium ion is an intracellular messenger that plays a central role in signal transduction pathways. Calcineurin B-like proteins (CBLs) and CBL-interacting protein kinases (CIPKs) signal network have shown different functions in the Ca2+ signaling process. In this work, we identified the entire soybean (Glycine max) CIPK gene family, which comprised 52 genes and divided into four subgroups (I to IV) based on phylogeny. The gene structural analysis separated these 52 genes into an intron-rich clade and an intron-poor clade. Chromosomal location analysis resulted in the identification of 22 duplicated blocks and six tandem duplication events. Phylogenetic classification of 193 CIPK proteins from representative plant species suggested that the intron-poor clade of CIPKs originated in seed plants. Analysis of global gene expression patterns of soybean CIPK family revealed that most intron-poor soybean CIPK genes are drought-inducible; a finding that was further confirmed using qRT-PCR. Our study provides a foundation for further functional analysis to reveal the roles that CIPKs and more specifically the intron-poor clade play in drought tolerance in soybean.

drought, and was found to enhance plant tolerance to drought when was overexpressed in Arabidopsis 20 . These findings suggest that CIPK family plays important roles bringing about plant tolerance to environmental stresses.
The CIPK gene family has been analyzed in Arabidopsis 21,22 , poplar (Populus trichocarpa) 22 , maize (Zea mays) 23 , rice (Oryza sativa) 24 , and canola (Brassica napus) 25 . These studies reported that the CIPK gene family could be divided into an intron-rich clade and an intron-poor clade, in which a subgroup in Arabidopsis, maize, rice and canola CIPK genes were induced by drought [23][24][25][26] . Although soybean (Glycine max var. Williams 82) has been sequenced 1 , no specific study has been reported on characterization of the CIPK gene family at the genome level to obtain a general perspective into their potential biological functions, especially in response to drought stress. In this study, we identified 52 soybean CIPK family members and found that intron-poor CIPK genes might be originated in seed plants. We also analyzed the gene expression patterns of the CIPK genes using publicly available microarray data and found that several CIPK genes are drought-inducible, from which 18 genes were confirmed using quantitative real-time polymerase chain reaction (qRT-PCR). Taken together, our data implicate soybean CIPK gene family in drought tolerance and point into several candidate genes for further functional characterization towards improving drought tolerance in soybean.

Materials and Methods
Genome-wide identification of CIPK gene family in soybean. To identify soybean CIPK proteins, all protein sequences were downloaded from the soybean genome (Wm82.a2.v1) from Phytozome V10 (http:// phytozome.jgi.doe.gov/pz/portal.html#!info?alias=Org_Gmax). Hidden Markov Models were used to search for putative soybean CIPK proteins. A HMM profile of the NAF domain (PF03822), the signature domain of CIPKs, was first downloaded from Pfam (http://pfam.xfam.org/) 27 and used to search for soybean CIPKs by HMMER 3.0 28 . Each CIPK candidate sequence was examined for the presence of the NAF domain and protein kinase domain to be considered as a member of soybean CIPK family. The putative CIPK family members were further examined using Pfam and SMART domain detection softwares (http://smart.embl-heidelberg.de/smart/ set_mode.cgi?GENOMIC=1) 29 . Molecular weight (MW) and isoelectric point (pI) of each protein sequence were calculated using ExPASy (http://web.expasy.org/compute_pi/) 30 .

Multiple alignment and phylogenetic tree construction. The protein sequences of all the 52
GmCIPK family members were aligned with ClustalX and constructed using Neighbor-Joining (NJ) method by MEGA6.06 31 . The bootstrap values for phylogenetic tree were based on 1000 replicates. Protein sequences from soybean, Arabidopsis, grape, rice, amborella, gymnosperm plants (pine, gnetum, ephedra, welwitschia and ginkgo), spikemoss, fern, moss and green algae were aligned with ClustalX. The phylogenetic tree was constructed with MEGA6.06 using the NJ method, and bootstrap analysis using 1000 replicates with the pairwise deletion and Poisson model.

Exon-Intron structure analysis and identification of conserved motifs. Gene structure analysis of
GmCIPK subgroup was performed using Gene Structure Display Server (GSDS 2.0, http://gsds.cbi.pku.edu.cn/) 32 by aligning the cDNAs with the corresponding genomic DNA sequences. Motifs analysis was performed with the MEME program (http://meme-suite.org/tools/meme) 33 . The parameters were as follows: number of repetitions, any; maximum numbers of motifs, 30; and the optimum motif widths, between 6 and 200 residues.
Chromosomal location and gene duplication. The chromosomal location image of soybean CIPK genes was generated by Mapchart 2.30 (www.wageningenur.nl/en/show/Mapchart-2.30.htm). The chromosomal position information of soybean CIPK genes was collected from the phytozome database (http://phytozome.jgi.doe. gov/pz/portal.html). Duplication patterns of GmCIPK genes were assigned based on their locations. The tandem duplicated genes were defined as an array of two or more genes located on the same chromosome and separated by five or fewer genes in a 100-kb region 34 . Genes located on duplicated chromosomal blocks were considered as segmental duplication. The information for segmental duplication was obtained from the SoyBase browser (http://soybase.org/gb2/gbrowse/gmax2.0/) 35 .

Microarray analysis of soybean CIPK genes expression. Gene expression pattern of soybean CIPK
gene family under drought stress, were downloaded from the Gene Expression Omnibus (GEO) database (http:// www.ncbi.nlm.nih.gov/geo/) at the National Center for Biotechnology Information (NCBI). The two microarray data sets represented expressions of 48 CIPK genes in soybean leaves at a vegetative stage (GSE29663) and a reproductive stage (GSE40604) under drought. The heatmap of soybean CIPKs was generated using Multi experiment viewer 4.8 (MeV 4.8) software (http://www.tm4.org/mev.html) 36 .
Plant materials, growth conditions and drought treatment. Soybean (var. Williams 82) was used in this study. Soybean seedlings were grown in a growth chamber at 25 °C with a photoperiod of 12 h/12 h, and a light intensity of 180 μ mol m −2 s −1 . The seedlings were watered every two days before drought treatment 20 . At 15 days post germination, drought treatment was initiated and this was set as the 0 day of drought treatment. Leaves, stems and roots samples were collected at 0, 4, 8 and 12 days, respectively, after as the initiation of drought stress ( Figure S2). Control seedlings were continuously watered every two days. The collected samples were frozen in liquid nitrogen immediately and stored at − 80 °C for further analysis. Three biological replicates, each contained three plants, were used for each treatment or control.
RNA extraction and gene expression assay by qRT-PCR. Total  Wilmington, DE). First strand cDNA was synthesized from 2 μ g total RNA using cDNA reverse transcription kit (Applied Biosystems, Foster city, CA) with RNase Inhibitor (RNase out, Invitrogen Life Technologies, Carlsbad, CA) according to the manufacturer's instructions. Specific primers for the 18 soybean CIPK genes (Table S1) were designed using Primer 3 software (http://bioinfo.ut.ee/primer3/). Ribosomal protein s20e gene (RSP s20e, Glyma.03G142300) was used as a reference gene 37 . The real-time qRT-PCR was conducted using a Power SYBR Green PCR Master Mix Kit (Applied Biosystems, Foster city, CA) on an ABI 7900HT Fast Real-Time PCR System (Applied Biosystems, Foster city, CA). The PCR reactions were performed according to the manufacturer's Neighbor-joining method was used with 1000 bootstrap replicates. These soybean CIPK genes were divided into four subgroups (I-IV) with different colored branches. Exon and intron analysis was performed using GSDS2.0 (B). The yellow boxes represent exons and the black lines represent introns. The blue boxes represent upstream/ downstream-untranslated regions. The scale bars of introns, exons and untranslated regions are included at the bottom of the graph. (C) Classification of CIPK genes into intron-poor clade (green dots) and intron-rich clades (blue squares). Genes with intron number less than 3 were grouped into the intron-poor clade, and genes with intron number more than 8 were grouped into the intron-rich clade.

Results
Soybean genome encodes 52 CIPK genes. We identified 52 CIPK gene family members (GmCIPK1 to GmCIPK52) that contain both NAF and kinase domains, the characteristic features of CIPK proteins ( Table 1). The 52 CIPK genes are distributed across 19 chromosomes (chromosome 1-20, except chromosome 12) ( Table 1). These proteins range in size between 306 to 528 amino acids. The relative molecular weights of these CIPK kinase proteins varied from 35.14 to 59.71 kD. Most of these proteins (82.69%) have high isoelectric points (pI > 7.0). The detail information about other parameters was provided in Table 1.
Phylogenetic and gene structural analysis of the soybean CIPK gene family. The evolutionary relationship among the 52 soybean CIPK members is shown in Fig. 1A. The phylogenetic analysis classified the 52 CIPK family members into four subgroups; I, II, III and IV (Fig. 1A). Subgroup IV is the largest one and contains 35 members. The other three subgroups contain 17 members in total (5 in subgroup I, 7 in subgroup II, and 5 subgroup III).
To further investigate the structural diversity of the CIPK genes in soybean, the exon/intron organization of the GmCIPK genes was analyzed. The GmCIPK gene members were clearly divided into an intron-rich clade (> 8 introns per gene) and an intron-poor clade (< 3 introns per gene) 23,26 . All the intron-poor clade members belong to subgroup IV and all intron-rich members relate to subgroup I, II and III (Fig. 1B). In subgroup IV, only GmCIPK7 and −51 contain two introns; GmCIPK9, − 28, − 33, −37 and − 47 contain one intron; the other members in subgroup IV are intronless. Most members in subgroup III contain 11 introns except GmCIPK43, which contains 14 introns. Seven members in subgroup II varied in intron numbers from 9 (GmCIPK18) to 14 (GmCIPK20 and − 34). All genes in subgroup I contain 13 introns (Fig. 1C).
Conserved motifs were also analyzed for all the 52 soybean CIPK proteins using MEME software 33 . Totally thirty motifs were identified (Fig. 2) and the details of each motif were shown in Figure S1. All soybean CIPK proteins contained motif 10 or motif 15 annotated as the NAF domain. All proteins in subgroup I and II contain motif 14, but only 4 CIPK proteins in the subgroup IV, CIPK1, − 23, − 49 and − 51, have motif 14. Only five proteins all in subgroup I, contain motif 17 (Fig. 2).

Chromosomal location analysis and gene duplication.
To determine chromosomal locations and duplication events, all the 52 CIPK genes were mapped to 19 out of the 20 soybean chromosomes, except chromosome 12 (Fig. 3). The 52 CIPK genes were not distributed evenly in these 19 chromosomes. Chromosomes 1, 5, 16, 19 and 20 contain one CIPK gene, while chromosome 13 contains most CIPK genes (6) among all soybean chromosomes.
Gene duplication events have driven the expansion of soybean CIPK genes, with 41 genes found in 22 duplicated blocks and only 11 GmCIPK genes located outside of the duplicated blocks (Fig. 3) region on chromosome 2, 9, 13, 15 and 18, respectively, which were resulted from tandem duplications and were all intron-poor genes.

Evolution analysis of CIPK in plants.
To investigate the origin and evolution of CIPKs, we built a NJ phylogenetic tree using 193 full-length protein sequences containing a NAF (PF03822) domain from 14 representative plant species. Among the 193 proteins, only one CIPK protein was found in green algae, 7 in moss, 8 in fern, 5 in spikemoss, 26 in pine, 3 in ginkgo, 2 in gnetum, 3 in welwitschia, 2 in ephedra, 33 in rice, 7 in amborella, 18 in grapevine, 26 in Arabidopsis, and 52 in soybean. The CIPK proteins in rice, Arabidopsis, soybean and grapevine, all being angiosperm, were divided into four subgroups (Fig. 4). However, CIPK proteins in a green algae, moss, fern, and spikemoss were all grouped in subgroup I and II. All the 35 soybean intron-poor genes were clustered in subgroup IV (Fig. 1A). In addition, we found that some of gymnosperms plants (pine, ginkgo, gnetum,  While our qRT-PCR data are overall consistent with the microarray results (Fig. 5), a few genes displayed distinct expression patterns. This includes, for example, GmCIPK31, which was down-regulated in qRT-PCR assays of leave samples, but was up-regulated in the two microarray experiments.

Discussion
Calcium plays a key role in plant signal transduction responding to environment stresses. Plant protein kinases such as calcium-dependent protein kinases (CDPKs) play central role in mediating plant response to stress signaling 39,40 . The calcium sensor calcineurin B-like proteins (CBLs) and their target kinase CBL-interacting protein kinases (CIPKs) system function together to regulate plant environmental stresses, such as drought 10 . Arabidopsis CBL1-and CBL9-CIPK23 complexes control abscisic acid (ABA)-regulated drought tolerance 41 . CIPK family had been analyzed in some model plants and major crops [22][23][24][25][26] , but no detailed information about soybean CIPK gene family is available. In this study, we identified 52 GmCIPK genes in soybean (Table 1), which is twice as much as that in Arabidopsis, and more than that in most of the other plant species with sequenced Genomes. The large size of the CIPK gene family in soybean could be attributed to the whole-genome duplication events occurred approximately 59 and 13 million years ago (Mya) 1 .
The soybean CIPK gene family has significantly expanded in its evolutionary history. Similar to Arabidopsis 26 and maize 23 , the soybean CIPK proteins were divided into four subgroups based on the phylogenetic classification (Fig. 1A). The soybean CIPK genes are clearly divided into intron-rich (subgroups I, II and III) and intron-poor (subgroup IV) clades (Fig. 1). This finding suggests that similar intron gain and loss events contributed to the structural evolution of the CIPK gene family before the eudicot-monocot divergence. All of the 52 GmCIPK The full-length of 193 CIPK protein sequences from soybean, grape, Arabidopsis, rice, amborella, ginkgo, pine, gnetum, ephedra, welwitschia, fern, spikemoss, moss and green algae were used to construct the phylogenetic tree using MEGA with the Neighbor-joining (NJ) method. Bootstrap values (on nodes) were calculated using 1000 replicates. Subfamilies are highlighted with different colors. The CIPK proteins in soybean were marked by green dots. A green algae CIPK protein was used as an outgroup and marked with a red triangle. Eudicot CIPK proteins were marked with dots. Monocotyledons were marked with squares.
proteins contain the signature NAF domain (Fig. 2) 13 . Our analysis of chromosomal locations and duplication events implicates gene duplication, especially segmental duplication and tandem duplication as the major evolutionary mechanisms responsible for soybean CIPK expansions (Fig. 3). Interestingly, all genes contributed by tandem duplication events were intron-poor genes. However, segmental duplication events occurred both in intron-poor genes and intron-rich genes, an observation similar to what was previously reported in Arabidopsis 26 . Tandem duplications have been found to be associated with gene families that regulate plant responses to stresses 42 , but little information is available with regard to the relationship between intron-poor gene clade and plant adaptation to environmental stresses.
Our phylogenetic analysis suggests that CIPKs originated in green algae, but expanded along the evolutionary trajectory to angiosperms. It is interesting that the intron-poor CIPK group was evolved much later, first appeared in the seed plants, very likely derived from loss of introns in the intron-rich members, as the CIPKs in the more ancient lineages, such as in green algae 14 , moss, fern and spikemoss were all in the intron-rich group (Fig. 4). This may suggests that when seed plants evolved, there was a great force for environmental stress adaptation. It has been previously reported that the intron-poor clade of the Hsp90 gene family in Populus displayed differential expression patterns upon exposure to various abiotic stresses, particularly drought stress 43 . Plant CIPK genes could be induced by different stresses, such as drought 23,44 , salt 45 , and cold 18 . Various functional studies of plant CIPK genes provided clear evidence for their implication in stress responses. For example, overexpression of SiCIPK24(SISOS2) in tomato enhanced salt tolerance 46 . In Similarly, overexpression of GhCIPK6 in Arabidopsis significantly increased the tolerance to drought, salt, and ABA 20 . Chaves-Sanjuan et al. 47 described the CIPK protein structure and its regulatory mechanism during plant response to environmental stimuli. Consistent with a role of CIPK gene family in drought tolerance, a substantial number of soybean CIPKs changed mRNA abundance upon drought application as revealed by microarray analysis (Fig. 5). This was further verified in qRT-PCR assays (Fig. 6). The detailed gene expression analysis of soybean CIPK gene family in different tissues provided intriguing insight into their roles in responding to drought stress. Our qRT-PCR data showing differential expression patterns of 18 CIPK genes in three diverse tissues provide an indication of distinctive functional roles in of soybean CIPKs in different tissues in response to drought. In this context it may be important to mention that the unique and overlapping biological functions of CIPK gene family in different tissues are still unexplored 24,48 . Interestingly, we found the majority of drought-responsive CIPK genes in our qRT-PCR assays belong to the intron-poor gene clade in subgroup IV. This result suggests that expansion of intron-poor clade of CIPK genes may be an adaptive feature for drought stress 26 , but the mechanism underlying this adaptation remains elusive.
Among the up-regulated GmCIPK genes, we identified CIPK9, −12, −24, and − 49 as the most highly expressed genes under drought stress in the three tissues tested. These genes represent bona find targets for improving soybean tolerance to drought and deserve further analysis to reveal their functional roles in drought response and the underlying molecular mechanisms, which is currently underway.