Genetic diversity at the Dhn3 locus in Turkish Hordeum spontaneum populations with comparative structural analyses

We analysed Hordeum spontaneum accessions from 21 different locations to understand the genetic diversity of HsDhn3 alleles and the effects of single-base mutations on the intrinsically disordered structure of the resulting polypeptide (HsDHN3). HsDHN3 was found to be YSK2-type, with a low-frequency 6-aa deletion at the beginning of Exon 1. Diversity was relatively high in the intron region of HsDhn3 compared with the two exon regions. We found that subtle differences in the K segments led to changes in the chemical properties of the amino acids. Predicted protein interaction profiles suggest the presence of a protein-binding site in HsDHN3 that coincides with the K1 segment. Comparison of DHN3 with those of closely related cereals showed that all of them contain a nuclear localization signal sequence flanking the K1 segment and a novel conserved region located between the S and K1 segments [E(D/T)DGMGGR]. We found that H. vulgare, H. spontaneum, and Triticum urartu DHN3s have a greater number of phosphorylation sites for protein kinase C than other cereal species, which may be related to stress adaptation. Our results suggest that the nature and extent of mutations in the conserved K1 and K2 segments are likely to be key factors in the protection of cells.

Dhn genes demonstrated that their allelic variations have originated from deletion or duplication of Φ domains (named the Φ-segments) in the K-segment and single nucleotide polymorphisms through the entire gene 14 . The Φ-segments are variable motifs rich in polar amino acids (Gly or Ala/Pro) located between or before the K-segment. There are five different subgroups of dehydrins (YnSK2, Kn, KnS, SKn, and Y2Kn types) based on the number and position of the conserved segments 17 . The Dehydrin3 (Dhn3) and Dehydrin4 (Dhn4) genes are located on barley chromosome 6H as consecutive genes, while other Dhn genes are distributed on 3H, 4H, 5H and 6H 14 . Expression patterns of barley Dhn genes have been found to correlate with the known regulatory element compositions in their sequences 14 . Barley Dhn3 and Dhn4 have been reported as early responsive genes to drought and other stress factors [18][19][20][21] . Both genes were rapidly up-regulated in drought-tolerant barley (Hordeum vulgare L. "Chalbori"), and their transfer to and over-expression in Arabidopsis conferred tolerance on this plant 18 . There are few studies on the expression profile of Dhn genes of H. spontaneum under drought. However, Suprunova et al. 22 clearly showed that Dhn3 expression is induced in response to drought, similar to barley.
Dehydrins are hydrophilic due to the presence of polar and charged amino acids (Ala, Gly, Lys, Asp, Glu, and Ser). They do not have stable three-dimensional structures and are therefore intrinsically disordered proteins (IDPs) 23 . Despite having high flexibility and minimal secondary structure, IDPs are an important class of proteins, functionally related to cell signalling, transcription, and assembly of protein complexes 24 . As cryoprotectants, dehydrins are known to interact with membrane phospholipids, metal ions, and water during stressful conditions [25][26][27][28] . Recent studies also suggest that conserved motifs of dehydrins, such as K-segments, have a role in their intrinsically disordered structure and in their binding affinity to other molecules and cell membranes 24,29,30 . IDPs such as dehydrins often gain structural stability when bound to ligands such as membranes, and they may change their oligomeric state when bound to ions 31 . Because of their flexible nature, disordered regions of proteins are difficult to study experimentally by X-ray diffraction, and often by nuclear magnetic resonance (NMR) spectroscopy, so predictive tools are often used in conjunction 32 .
Although there have been many investigations related to structure and functions of dehydrins, the genetic diversity of individual dehydrin proteins in natural populations and its effects on the resultant polypeptide structure are not well known. In the present study, we have selected DHN3 from the dehydrin family for characterization of biochemical features in comparison to other cereal species. Additionally, we identified indels and SNPs within HsDhn3 alleles to understand the range of mutations in H. spontaneum populations of Turkish origin, which represent an important gene pool.

Materials and Methods
Plant material. H. spontaneum L. accessions collected from 21 locations, mostly in Southeastern Turkey, and Hordeum vulgare L. cv. Tokak 157/37 (TK157/37) were used in the study (see Supplementary Table S1). Seeds were germinated and grown in pots filled with soil in a growth chamber (Angelantoni, Ekochl 700) under a short-day photoperiod (8 h light/16 h dark), at a temperature of 25 °C and 50-60% relative humidity. One plant was used from each accession.
DNA extraction and isolation of Dhn3 alleles. We extracted genomic DNA from the leaves of wild barley seedlings following the CTAB method 33 . Specific primers (forward: 5′ AGGCAACCAAGATCAACACC 3′ and reverse: 5′ TTCTGCAAGGTAGCCAGACC 3′) were designed with Primer3 software to amplify the whole sequence of the Dhn3 gene, based on the sequence of H. vulgare cv. Dicktoo in the GenBank database (AF043089.1). The specificity of the designed primers was confirmed by BLASTN analysis 34 . Genomic DNA amplifications were performed in a 25 μl reaction containing 0.5 U of DreamTaq DNA polymerase (Thermo Scientific EP0702), 200 μM of each dNTP, 0.4 μM of primer, 2 mM MgCl2 and 50 ng genomic DNA. Thermocycling was performed at 95 °C for 5 min, followed by 35 cycles of 95 °C for 1 min, 61 °C for 40 s and 72 °C for 1 min, and a final extension at 72 °C for 10 min. The resulting amplicons were purified using the Wizard® SV Gel and PCR Clean-Up System (Promega, USA) and cloned into the pTZ57R/T plasmid vector (Thermo Scientific, USA) according to the manufacturer's instructions. The cloned HsDhn3 fragments were sequenced by the dideoxy chain termination method using an ABI Prism 310 Genetic Analyzer (Applied Biosystems, USA).
DNA sequence analysis. The sequencing chromatograms were examined with Chromas Lite 2.1.1 (Technelysium Pty Ltd, Australia) and converted to FASTA format. The vector sequences were removed using the web-based VecScreen. The nucleotide and predicted amino acid sequences were compared with sequences in the GenBank and EMBL databases, respectively, using BLAST. The intron was identified by alignment to the known Dhn3 CDS of cv. Dicktoo (AF043089.1). Amino acid sequence alignments of the predicted DHN3 polypeptide and nucleotide sequence alignments were performed using CLUSTALW2 35 with default parameters. We identified SNPs among the HsDhn3 DNA sequences from different genotypes using MEGA 6.06 software 36 .
Measurement of nucleotide diversity. Nucleotide diversity (π) between the genotypes was calculated as the average pairwise nucleotide difference per site between two sequences according to Nei 37 (1987) using the MEGA 6.06 software. The number of unique haplotypes (h) and haplotype diversity (Hd) were measured using DnaSP version 5.10.01 38 .
Protein sequence analyses. We analysed the physical and chemical properties of DHN3, including molecular weight, theoretical isoelectric point (pI), stability index and hydropathicity index (according to the amino acid scale values of Kyte and Doolittle 39 ), using the ProtParam tool from Expasy 40 . Putative protein kinase C (PKC) and casein kinase 2 (CK2) phosphorylation sites were predicted using NetPhosK 1.0 41 . The protein sequences of the DHN3 variants were submitted to the IntFOLD server 42,43 to generate alternative 3D models using the latest methodology 44 . Predictions of the intrinsically disordered (natively unstructured) regions in the sequences were generated using DISOclust 45 , and likely disordered and protein binding regions were predicted with DISOPRED3 46,47 .
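The two diversity statistics named under "Measurement of nucleotide diversity" can be illustrated with a minimal Python sketch. This is not the MEGA or DnaSP implementation; it assumes pre-aligned sequences of equal length, skips any site where either sequence of a pair has a gap (pairwise deletion, an assumption made here), and uses the standard Nei (1987) forms: π as the mean per-site difference over all sequence pairs, and Hd = n/(n − 1) · (1 − Σ p_i²). The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pairwise_diff_per_site(a, b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence carries a gap ('-') are skipped
    (pairwise deletion; an assumption of this sketch).
    """
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        return 0.0
    return sum(x != y for x, y in sites) / len(sites)


def nucleotide_diversity(seqs):
    """Nei's pi: mean pairwise per-site difference over all sequence pairs."""
    pairs = list(combinations(seqs, 2))
    return sum(pairwise_diff_per_site(a, b) for a, b in pairs) / len(pairs)


def haplotype_diversity(seqs):
    """Nei's Hd = n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(f * f for f in freqs))


# Toy alignment (hypothetical data, not HsDhn3 sequences).
toy = ["ACGT", "ACGA", "ACGT"]
pi = nucleotide_diversity(toy)
hd = haplotype_diversity(toy)
```

Because gap treatment and site filtering differ between programs, values from such a sketch need not match those reported by MEGA or DnaSP for the same alignment.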

Results
Sequence Diversity. The H. spontaneum Dhn3 (HsDhn3) gene typically has 486 bp of coding sequence in two exons and 439 bp of noncoding sequence (59-bp 5′ UTR, 113-bp intron and 267-bp 3′ UTR), like Hordeum vulgare Dhn3 (Fig. 1). We sequenced a 692-bp region of the Dhn3 allele including all the coding regions (195-bp Exon 1 and 291-bp Exon 2) and 206 bp of noncoding regions. Across all sequences, we detected a total of 29 SNPs in the coding regions and intron of HsDhn3. The variation in the intron, with one SNP every 14 bp on average, was about one-and-a-half-fold as high as in the coding regions, with one SNP every 23 bp on average. Only one gap, 18 bp in length, was observed in the sequenced region of HsDhn3, in Exon 1. This gap was observed in only three H. spontaneum genotypes. The nucleotide diversity of the whole HsDhn3 sequence was estimated by Nei's 37 π statistic to be 0.00684. Diversity was higher in the intron region (π = 0.01247) than in the exon regions (π = 0.00290 and 0.00720 for Exon 1 and Exon 2, respectively). Sixteen haplotypes were observed across all regions of Dhn3, with the highest number of haplotypes (12) observed in Exon 2. The allelic variation was measured as haplotype diversity (Hd; Table 1). The lowest value was found in Exon 1 (Hd: 0.458), while Exon 2 showed the highest variability (Hd: 0.824). The Hd value was smaller in the intron (0.795) than in Exon 2.
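The SNP spacing figures quoted above follow from simple division, sketched here in Python. The region lengths and the total of 29 SNPs are taken from the text; the intron count of 8 is inferred as 29 minus the 21 coding-region SNPs reported with Table 1, and is not quoted directly from the paper. The helper name is hypothetical.

```python
# Region lengths (bp) as reported in the text.
CODING_BP = 195 + 291  # Exon 1 + Exon 2 = 486 bp
INTRON_BP = 113

# SNP counts: 29 in total, 21 reported for the coding region.
# The intron count is inferred here (29 - 21 = 8), not quoted.
TOTAL_SNPS = 29
CODING_SNPS = 21
INTRON_SNPS = TOTAL_SNPS - CODING_SNPS


def bp_per_snp(length_bp, n_snps):
    """Average spacing between SNPs in a region, in base pairs."""
    return length_bp / n_snps


coding_spacing = bp_per_snp(CODING_BP, CODING_SNPS)
intron_spacing = bp_per_snp(INTRON_BP, INTRON_SNPS)

# Ratio of SNP densities (intron relative to coding).
fold = coding_spacing / intron_spacing
```

Under these inputs the spacings round to 23 bp and 14 bp, and the density ratio falls between 1.5 and 1.7.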
The SNP number, nucleotide diversity, and haplotype diversity were also calculated for each sub-region of the HsDhn3 gene (Table 1). These sub-regions are highly conserved regions containing one Y-segment, one S-segment, and two K-segments. There was also a spacer, named Ksp, between the two K-segments. Ten of the 21 SNPs were observed in the Ksp region. The Ksp region also showed the highest number of haplotypes and the highest haplotype diversity. Variation in K1 and K2 was detected at one SNP every 10 and 9 bp, respectively. Nucleotide diversity was higher in K1 (π = 0.01354) than in K2 (π = 0.01058). The lowest variation among the regions was observed in Y, with π = 0.00414.
The HsDhn3 gene is GC rich, with a GC content of 66.9%. In total, 21 SNPs were detected within the coding region of HsDhn3 (Table 1). Regarding the nature of the base mutations, transitions accounted for 76.2% of the total and transversions for 23.8% (Table 2). A/G substitutions had the highest percentage, at 57.1%. Six SNPs were synonymous, while 15 SNPs were non-synonymous and led to amino acid replacements.
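The transition/transversion split can be made concrete with a short Python sketch. A transition exchanges two purines (A, G) or two pyrimidines (C, T); any purine-pyrimidine exchange is a transversion. The counts of 16 transitions and 5 transversions used below are inferred from the reported percentages of 21 SNPs, not quoted from Table 2, and the function name is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Return 'transition' or 'transversion' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError(f"not a valid substitution: {ref}>{alt}")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def percent(count, total):
    """Percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


# Counts inferred from the reported percentages (assumption of this sketch).
N_SNPS = 21
transition_pct = percent(16, N_SNPS)
transversion_pct = percent(5, N_SNPS)
```

With 21 coding SNPs, 16 transitions and 5 transversions reproduce the reported 76.2% and 23.8%.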

Biochemical features and motif structure of DHN3 in H. spontaneum. The molecular weight of Hordeum spontaneum DHN3 (HsDHN3) varied from 15.72 kDa to 16.22 kDa, with 155 or 161 amino acid residues, respectively (see Supplementary Table S2 online). HsDHN3s had between 9 and 11 putative protein kinase C (PKC) phosphorylation sites, similar to H. vulgare (see Supplementary Table S2 online). PKC sites were outside the conserved motifs, except for a serine residue within the S-segment (Fig. 2). The proteins were highly stable, with instability index values much lower than 40 40 . All HsDHN3s were found to be highly hydrophilic, with GRAVY values ranging from −1.020 to −1.128, and also basic, with theoretical pIs varying from 7.99 to 8.90. Similar to the DHN3 protein of cultivated barley, HsDHN3 is YSK2-type, containing one Y-segment, one S-segment and two K-segments (Fig. 2). The Y-segment sequences (DEYGNPV) were the same as in cultivated barley 14 , with the exception of the LH1 variant, which contains DEYGYPV, where Asn is replaced by Tyr. S-segments are Ser-rich conserved motifs, typically described as RSGSSSSSSS 14 , and are interrupted by an intron (Fig. 1). Although S-segments appear to be conserved in all H. spontaneum genotypes, Ser was replaced by Thr in H. vulgare cv. TK157/37 (Fig. 2).
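The GRAVY values quoted above are the mean Kyte-Doolittle hydropathy over all residues of a sequence. The Python sketch below shows that calculation; it is an illustration rather than the ProtParam implementation, the function name is hypothetical, and the example input is the 15-mer barley K1 consensus given in the text, not a full HsDHN3 sequence.

```python
# Kyte-Doolittle hydropathy scale (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathy: mean scale value over all residues."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


# Barley K1-segment consensus as given in the text.
K1_CONSENSUS = "RKKGLKDKIKEKLPG"
k1_gravy = gravy(K1_CONSENSUS)

# A Lys -> Arg replacement lowers the score (Arg: -4.5, Lys: -3.9),
# i.e. the sequence becomes more hydrophilic on this scale.
k1_lys_to_arg = gravy("RRKGLKDKIKEKLPG")
```

On this scale Arg scores lower than Lys, which is why a Lys-to-Arg replacement shifts a sequence toward greater predicted hydrophilicity.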
In barley, there are two 15-mer Lys-rich consensus K-segments, RKKGLKDKIKEKLPG and EKKGIMDKIKEKLPG, named the K1-segment and the K2-segment, respectively 14 . In the K1-segment, Asp is replaced by Glu in the K102, K169, K394, and LK8 variants (Fig. 2). In addition, the LK8 variant has an amino acid change of Gly to Ser. Another non-synonymous substitution in the K1-segment is the replacement of Lys by Arg in the LH4 variant. Regarding the Φ-segments, conserved GHFQ, GDQQ, YGQH, and YGQQ sequences were found between the Y-S and K1-K2 segments in all HsDHN3 variants, similar to cv. Dicktoo, with the exception of a Cys substitution at position 101 in the Φ-segment of the AA3 variant. In addition, four amino acid substitutions, Gly to Ala, Thr to Ile, Thr to Ala, and Gly to Ser, were observed between the K1- and K2-segments at positions 111, 112, 130, and 139, respectively.
HsDHN3 is hydrophilic due to the presence of K-segments (Fig. 3A). Additionally, the region between positions 40 and 150 was found to be both hydrophilic and disordered in all HsDHN3s (Fig. 3B). At position 145, a Lys is replaced by an Arg in TR4982, which results in increased hydrophilicity (Fig. 3A).
Comparison of DHN3 sequences in cereal species. We compared the predicted DHN3 proteins from H. spontaneum with those of other closely related cereals in terms of their general biochemical properties (Table 3). All DHN3 proteins were YSK2-type, with the exception of T. urartu DHN3 (YSK-type). The number of amino acids varied from 154 (S. bicolor) to 183 residues (B. distachyon), while molecular weights ranged between 15.73 kDa (A. tauschii) and 18.93 kDa (T. urartu). All DHN3 proteins were stable, with an instability index (II) under 40, with the exception of T. urartu (41.12). The most basic protein among the DHN3s was T. urartu DHN3, with a pI of 10.22. The number of predicted phosphorylation sites varied from 4 to 15 for PKC and from 1 to 5 for CK2 in DHN3s (Table 3). All the DHN3 proteins were found to be highly hydrophilic, with GRAVY values ranging from −0.946 to −1.145. H. spontaneum variants contain on average 55.5% charged and polar amino acids. The most frequent amino acid is Gly, a non-polar one, which constitutes 26.7% of the amino acid content. The frequencies of Cys and Phe are less than 1%. Cys was found only in the H. spontaneum variant AA3 and is a rare amino acid in dehydrins (Fig. 2). Trp residues were not detected in any of the DHN3 proteins (see Supplementary Table S3 online).
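Amino acid composition figures of the kind reported above (for example, the Gly percentage or the share of charged residues) can be tallied with a few lines of Python. This is a sketch only: the function names are hypothetical, the residue class used for "charged" (Asp, Glu, Lys, Arg) is an assumption and may differ from the grouping used for Supplementary Table S3, and the example input is the barley K1 consensus from the text rather than a full DHN3 sequence.

```python
from collections import Counter


def composition_percent(seq):
    """Percentage of each amino acid present in a protein sequence."""
    seq = seq.upper()
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


def percent_in_class(seq, residues):
    """Percentage of residues that belong to a given set of amino acids."""
    seq = seq.upper()
    return 100.0 * sum(aa in residues for aa in seq) / len(seq)


# Assumed class definition for this sketch (not taken from the paper).
CHARGED = set("DEKR")

K1_CONSENSUS = "RKKGLKDKIKEKLPG"
k1_composition = composition_percent(K1_CONSENSUS)
k1_charged = percent_in_class(K1_CONSENSUS, CHARGED)
```

For the 15-residue K1 consensus, six Lys residues give 40% Lys, and nine of the fifteen residues fall in the assumed charged class.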
Comparisons of the predicted DHN3 proteins in different cereal species indicate that the Y-, S-, and K-segments are highly conserved sequences (Fig. 4). A consensus motif of [V/T]D[E/Q]YGNP (the Y-segment), located near the N-terminus, was found in all cereal DHN3s. Val, the first amino acid of the Y-segment, was replaced by Ile and Leu in T. urartu and S. bicolor, respectively. In addition, an E/Q to V substitution is present in O. sativa DHN3. The S-segment (RSGSSSSSS) was conserved intact, with an extra Ser in T. urartu, A. tauschii and O. sativa.
The NLS peptide (RRKK), placed just upstream of the K1-segment (the first K-segment), was found in all cereal DHN3s (Fig. 4). Although the K1-segment (RKKGIKDKIKEKLPG) was highly conserved in all cereals, some amino acid substitutions were found within it. The non-polar amino acid Ile was replaced with Leu and Met, which are also non-polar. The positively charged Lys was substituted with the non-charged Gly in the S. bicolor DHN3 protein. In addition, Asp was replaced by Glu in B. distachyon, Z. mays, S. bicolor and O. sativa. Although DHN3s have two highly conserved and Lys-rich segments, named K1 and K2, in all other cereals, the K2-segment occurring at the C-terminus (EKKGIMDKIKEKLPG) was not detected in T. urartu DHN3. Two substitutions were found in the K2-segments at the same amino acid position: an Ile residue was replaced by Leu and Phe in B. distachyon and O. sativa, respectively. Another highly conserved region, [E(D/T)DGMGGR], not previously reported, was found between the S-segment and the K1-segment in all cereal DHN3s (Fig. 4). Only a single amino acid replacement, Asp to Thr, was found in the S. bicolor DHN3 for this conserved sub-sequence.
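The motifs described in this comparison can be located in a protein sequence with regular expressions. The Python sketch below translates the bracketed consensus notation from the text into regex character classes and uses the exact K1 and K2 consensus strings, so variant K-segments would need looser patterns. The function name is hypothetical, and the test sequence is a toy construct assembled from the consensus motifs, not a real DHN3 sequence.

```python
import re

# Patterns taken from the consensus motifs given in the text.
MOTIFS = {
    "Y-segment": r"[VT]D[EQ]YGNP",
    "S-segment": r"RSGSSSSSS",
    "conserved region": r"E[DT]DGMGGR",
    "NLS": r"RRKK",
    "K1-segment": r"RKKGIKDKIKEKLPG",
    "K2-segment": r"EKKGIMDKIKEKLPG",
}


def find_motifs(seq):
    """Map each motif name to a list of (0-based start, matched text)."""
    seq = seq.upper()
    return {
        name: [(m.start(), m.group()) for m in re.finditer(pattern, seq)]
        for name, pattern in MOTIFS.items()
    }


# Toy sequence built from the consensus motifs (hypothetical, for
# illustration only), in the order Y, S, conserved region, NLS, K1, K2.
toy_protein = (
    "MVDEYGNPAA" + "RSGSSSSSS" + "EDDGMGGR" + "RRKK"
    + "RKKGIKDKIKEKLPG" + "GG" + "EKKGIMDKIKEKLPG"
)
hits = find_motifs(toy_protein)
```

Note that `re.finditer` reports non-overlapping matches, which is sufficient here because each motif occurs once in the toy sequence.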
Structural predictions for DHN3 variants. As expected, the 3D models predicted by the IntFOLD server strongly suggest that all HsDHN3 variants are mostly unstructured. No high-quality globular 3D models were obtained; all models were highly variable, and most were neither folded nor compact. The DISOclust results from the IntFOLD server, shown in Fig. 3B, also confirmed the extent of the intrinsic disorder for each of the variants, reflected in the large variations in the 3D locations of residues across the multiple alternative 3D models. The results in Fig. 3C and Supplementary Fig. S1 indicate putative protein binding regions in the first 10-15 residues and the last 10-15 residues, with a peak in the region around residue ~80. Intrinsically disordered regions in proteins often coincide with protein binding sites, and the latest version of the DISOPRED method provides confidence scores for protein binding residues. Differences in protein binding profiles were observed between, for example, cultivated barley (TK157/37) and TR4982, with varying peak sizes occurring in the K1-segment around amino acid position ~80 (see Supplementary Fig. S1). The varying confidence scores indicate that the SNPs may affect the putative protein binding function of HsDHN3. Interestingly, the protein binding regions coincide with point mutations in the KxKIxEKLPx subsequence and in the C-terminal K-segment (Fig. 3C).

Discussion
Dehydrins play a fundamental role in the response of plants to different abiotic stresses, especially dehydration, salinity and low temperatures, by accumulating in vegetative tissues. They are the best-investigated group within the LEA proteins, with characterized multilocus families including ten members in Arabidopsis 13 , eight members in rice 49 , fifty-four unigenes in wheat 50 , and thirteen members in barley 14,15 . Dehydrins are characterized by the presence and copy number of several conserved motifs named the K-, S-, and Y-segments. DHN3 from wild barley is YSK2-type and structurally highly similar to that of cultivated barley. Interestingly, an 18-bp deletion in Exon 1 was found in only three out of 21 H. spontaneum genotypes: TR4982 (Çanakkale), TR47002 (İzmir) and TR49085 (Adıyaman). A difference in polypeptide size as a result of the indel was previously reported in H. vulgare cv. Himalaya and cv. Dicktoo 14 . Dehydrins are known to be located in different compartments of the cell, including the cytoplasm, nucleus, mitochondria, chloroplast, and the vicinity of the plasma membrane 31 . YSK2-type dehydrins have both cytoplasmic and nuclear localizations but are mostly found in the nucleus 51,52 . Goday et al. 51 also reported the in vitro interaction of a maize DHN5 homolog, RAB17, with an SV40 NLS signal for its import into the nucleus. In our study, an "RRKK" motif, postulated as a nuclear localization signal (NLS), was found just upstream of the first K-segment of nine cereal DHN3s (Fig. 4). Further experimental data are needed to confirm the functionality of the NLS sequence as well as the exact localization of DHN3s in barley. In general, cereal DHN3s were found to be stable proteins, except that of T. urartu (with an instability index of 41.12). The presence of only one K-segment in T. urartu DHN3 (TuDHN3), in contrast to the other dehydrins, may have a negative effect on protein stability.
On the other hand, TuDHN3 was also the most basic protein, with the highest molecular weight (18.93 kDa). T. urartu is a wild diploid wheat and the progenitor of the A genome of bread wheat. Despite the sparse T. urartu literature, LEA proteins have recently been found to be associated with cold tolerance in this species 53 . The NetPhosK 1.0 program predicted that HsDHN3 might be specifically phosphorylated by protein kinase C at 9-11 sites, which were mainly Thr residues. YnSKn-type DHNs are predominantly phosphorylated by protein kinase C group proteins, rather than CK2s 29 . Compared with other cereal species, H. vulgare and H. spontaneum had one of the highest occurrences of PKC phosphorylation sites, second only to T. urartu. In particular, phosphorylation by protein kinase C at K-segments has been found to be associated with the membrane binding functions of dehydrins 29 . Therefore, both HsDHN3 and TuDHN3 are good candidates for investigating membrane-dehydrin interactions. Amino acid changes led to the occurrence of a new phosphorylation site in two H. spontaneum accessions, LK8 and K169, through replacement of Thr at position 112. Brini et al. 52 found that the phosphorylation pattern in wheat DHNs was related to abiotic stress tolerance. In particular, higher phosphorylation indicated higher tolerance to drought and salinity. This suggests that the extra phosphorylation site may play a role in the drought tolerance of LK8 and K169.
The amino acid composition of HsDHN3 showed a high proportion of Gly residues (26.7%), which, together with the lack of a hydrophobic core and other factors, confers flexibility on the protein. Moreover, 55.5% of HsDHN3 amino acids were polar amino acids with hydrophilic character, and this was also supported by the GRAVY results, with calculated values ranging from −1.020 to −1.128. In general, DHN3s are Gly-rich proteins and are reported in the literature to be deficient in Trp and Cys. We found a Cys residue in the H. spontaneum variant AA3. Among the cereal DHN3s, S. bicolor and T. urartu contained one Cys (0.6%) and one Trp (0.6%) residue. Intrinsically disordered proteins are also significantly depleted in Cys and Trp 54 , typically to less than 1%, compared with the average folded protein in the Protein Data Bank. In general, His residues are rarely found in proteins and constitute approximately 2% of the amino acid content 55,56 . Nevertheless, dehydrins contain a higher proportion of His residues. For example, His content ranged from 3.2% to 13.5% in Arabidopsis DHNs 56 . We found that HsDHN3s were relatively His-rich proteins, containing 8.1% His residues. Moreover, conserved His residues were adjacent to both K-segments, forming Gly-His (GH) or Gln-His (QH) motifs in HsDHN3. Eriksson et al. 29 reported that the ionization state of His residues flanking the K-segments modulates the affinity of dehydrins for cellular membranes in a pH-dependent manner. His residues were not concentrated as motifs in cereal DHN3s, as earlier reported for a Citrus dehydrin 56 .
The HsDHN3 protein variants are all likely to be intrinsically disordered (natively unstructured), with likely protein binding sites near the N- and C-termini and surrounding residue 80. The C-terminal site and the site around residue 80 also coincide with predictions of alpha helices, so these regions may undergo a disorder-to-order transition on protein binding. Importantly, the point mutations are observed to occur within these protein binding regions, and therefore the amino acid substitutions in these sequence variants may affect protein interactions. Similarly, the protein binding regions coincide with point mutations in the KxKIxEKLPx subsequence and in the C-terminal K-segment, a region highly conserved in plants 27 . Disordered regions often become ordered on binding, so it is of interest to predict secondary structures to determine whether local structures may form during protein-protein interactions. The specific nature of these interactions is not yet known, although the predicted helices are unlikely to form coiled-coil interactions according to results obtained from Pcoils. DHNs are also known to interact with lipids, membranes, metal ions, water, ice and DNA 31 . They function as cryoprotectants and have binding properties that allow chaperone activities. However, the exact mechanism and details of these interactions are not completely clear. Recently, dehydrin-dehydrin binding has been demonstrated in two plant species 57,58 . Yeast two-hybrid assays confirmed that K-segments and His residues are required for dimerization of Opuntia DHN1 58 . In our study, the peaks shown in Fig. 3C and Supplementary Fig. S1 indicated disordered residues that may fold or become ordered upon protein binding. These regions are therefore likely to be the dimerization sites in the HsDHN3 variants.
In this study, we report the genetic structure and diversity of near-complete Dhn3 alleles from native H. spontaneum plants. By taking advantage of the additional data, we were able to compare predicted DHN3 sequences with those of other closely related cereals, which allowed us to distinguish polymorphisms and motif structures. Most of the SNPs identified occurred in non-coding and inter-segment positions and resulted in non-synonymous mutations. However, point mutations in several variants have resulted in amino acids with opposite chemical properties, as seen in the substitution of Met (sulphur-containing and hydrophobic) for Lys (basic, polar, and positively charged), or Gly (aliphatic and nonpolar) for Ser (non-aromatic, hydroxyl-containing, and polar). Dehydrins, as IDPs, are not globular folded molecules; however, they are proposed to be rich in functionality because of their flexibility and modularity. Bioinformatics tools such as IntFOLD, DISOclust and DISOPRED are well suited to deducing the nature and functional properties of plant IDPs, and act as a guide for further experimental studies. From the predictions in our work, we have shown the likely presence of at least one protein binding site in barley DHN3. Furthermore, point mutations within the conserved sequences in H. spontaneum variants affected the predicted protein binding profile. Our results may contribute to future experimental designs to resolve the interactions of barley DHNs with known and undetermined ligands, which lead to their diverse functions in plant cells as cryoprotectants and chaperones.