Introduction

The formation of transcriptionally silent chromatin, so-called heterochromatin, is critical for genomic stability and cell differentiation1,2,3. The higher order structure of heterochromatin is maintained by a nonhistone chromosomal protein, termed heterochromatin protein 1 (HP1)4,5,6. HP1 family proteins contain two functionally distinct globular domains, the N-terminal chromodomain (CD) and the C-terminal chromo-shadow domain (CSD). The CD directly binds to lysine 9 methylated histone H3 (H3K9me), a hallmark of heterochromatin7,8,9, whereas the CSD is responsible for HP1 homodimerization, which mediates the condensation of nucleosomes containing H3K9me10. The CD contains an N-terminal three-stranded β sheet and a C-terminal α helix that together form a globular domain, and it binds to H3K9me through a cage formed by three aromatic residues11,12. For example, in the mouse HP1β CD, tyrosine 20, tryptophan 41 and phenylalanine 44 form the aromatic cage that recognizes the methylated lysine residue of H3K9me12.

In mice there are three HP1 isoforms, HP1α, HP1β and HP1γ with highly conserved CDs that are thought to share a similar structural feature to interact with H3K9me. By contrast, the CDs of these isoforms are connected to a less-conserved N-terminal tail, which is hypothesized to contribute to their isoform-specific function6. In HP1α, in particular, there are four successive serine residues in the N-terminal tail that are constitutively phosphorylated in vivo13. While HP1α CD itself can bind to H3K9me, its affinity becomes much stronger when the N-terminal tail is phosphorylated13. However, the molecular mechanisms underlying this enhancement remain elusive.

To reveal the role of the phosphorylated tail in HP1 binding to H3K9me, we have compared NMR structures of the CD connected to the N-terminal tail in both its phosphorylated form (phos-NCD) and its un-phosphorylated form (unmod-NCD). The only noticeable effect of phosphorylation on the CD was a change in the dynamics of the N-terminal tail, while the tertiary structure of the structured domain remained intact: that is, the flexibility of the segment containing successive phosphorylated serine residues was significantly reduced as compared with the un-phosphorylated segment. Small angle X-ray scattering (SAXS) experiments revealed that both phos-NCD and unmod-NCD hold an extended string-like structure, but the string of phos-NCD is more extended than that of unmod-NCD. The role of the phosphorylated serine residues in HP1 binding to H3K9me was further confirmed by binding experiments using a series of N-terminal tail truncated CDs with or without phosphorylation. In addition, replica exchange molecular dynamics (REMD) simulations were performed for both phos-NCD and unmod-NCD in their free states and in complex with H3K9me to reveal their characteristic extended string-like structures. REMD enabled exhaustive sampling of the flexible N-terminal tail of NCD and the flexible C-terminal region of the H3K9me peptide interacting with CD in an aqueous environment. We constructed the structural ensemble from 2.4-μs simulations for each of the four simulation systems to illustrate the detailed atomic features of the flexible N-terminal tail. Together, these results uncover a novel structural role of phosphorylation to ensure and specify protein-protein interactions.

Results and Discussion

Structures of phos-NCD and unmod-NCD

To examine the structural basis for the role of N-terminal tail phosphorylation in HP1α, we prepared NMR samples of 13C-, 15N-labeled unmod-NCD and phos-NCD, both of which comprised amino acids 1–80 of HP1α. The phos-NCD sample was prepared by co-expression with casein kinase II. The phosphorylation state of NCD was checked by MALDI-TOF MS, which verified that all four serine residues in HP1α’s N-terminal tail were phosphorylated.

The HSQC spectra of unmod-NCD and phos-NCD showed well resolved NMR signals (Supplementary Fig. S1). Their structures were determined by using distance restraints estimated from NOEs (739 for phos-NCD and 652 for unmod-NCD), hydrogen-bond restraints (18 for phos-NCD and 15 for unmod-NCD) and dihedral restraints (94 for phos-NCD and 98 for unmod-NCD) (Table 1). Notably, for both proteins, all medium- and long-range NOEs and hydrogen bonds were observed only in each CD portion, comprising amino acids 20–80, indicating that neither N-terminal tail region (amino acids 1–19) held tertiary structure.

Table 1 Structural statistics table.

The solution structures of both unmod-NCD (Fig. 1a) and phos-NCD (Fig. 1b) are well resolved except for their tails (Supplementary Figs S2 and S3); both CD structures of amino acids 20–67 are well defined with backbone root-mean-square-deviations (RMSDs) of 0.42 ± 0.10 Å in unmod-NCD and 0.40 ± 0.11 Å in phos-NCD and with heavy atom RMSDs of 1.02 ± 0.08 Å in unmod-NCD and 0.92 ± 0.11 Å in phos-NCD (Table 1). The CD structures (amino acids 20–75) of unmod-NCD and phos-NCD, consisting of three β strands and a C-terminal α helix, are well superimposed with an RMSD of 1.43 Å (Fig. 1c) and the aromatic cage structures formed by Tyr 20, Trp 41 and Phe 44 are also well defined (Fig. 1d). It is, therefore, apparent that the CD forms a domain structure independent of the N-terminal tail and irrespective of its phosphorylation state.

Figure 1

The 20 best calculated NMR structures of unmod-NCD (a) and phos-NCD (b) of HP1α and their superposition (c). Unmod-NCD is shown in blue and phos-NCD is shown in red. (d) Superposition of the aromatic cage formed by Tyr 20, Trp 41 and Phe 44 and backbones of unmod-NCD (top) and phos-NCD (bottom).

Although NMR revealed that both proteins have essentially the same CD structure and a similarly disordered N-terminal tail (Fig. 1c), unmod-NCD seems to adopt a more broadly distributed random N-terminal tail structure as compared with phos-NCD. For the tail regions of both unmod-NCD and phos-NCD, distance restraints were obtained only from similar sets of sequential NOEs, without medium- or long-range NOEs. The different distributions of the disordered tail structures were therefore not caused by the static NMR structural data, but may be due to the simulated annealing calculations, which included repulsive forces between atoms. The phosphorylated serine residue is bulkier than unphosphorylated serine; as a result, the four phosphorylated serine residues in the N-terminal tail of phos-NCD tend to form a more extended string than the un-phosphorylated tail, owing to the successive repulsions of the bulky phosphate groups.

Dynamics and chemical shift changes of unmod-NCD and phos-NCD

To compare the dynamic structures of the tails of phos-NCD and unmod-NCD, we examined {1H}-15N heteronuclear Overhauser effects (hetero NOEs)14,15 (Fig. 2a; unmod-NCD, blue bars; phos-NCD, red bars). The {1H}-15N hetero NOE of backbone 15N spins reflects dynamic fluctuations of individual N-H bonds on the pico- to nanosecond scale14,15. The CD portion (amino acids 20–80) showed essentially the same hetero NOE values in both protein forms: nearly 0.6 for the 20–70 amino acid region, with gradually decreasing values for the 71–77 amino acid region and negative values for the C-terminal 78–80 amino acid region. These data correspond well with the NMR structures, in which both CDs hold a globular rigid structure with a flexible C-terminus. The dynamic character observed for both forms is typical of globular proteins. Both CD portions are thus dynamically and structurally independent of the phosphorylation state of their N-terminal tails.
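For readers less familiar with the measurement, the steady-state {1H}-15N hetero NOE of a residue is conventionally taken as the ratio of its peak intensity recorded with proton saturation to that recorded without. The Python sketch below assumes that convention. The function names, the example intensities and the classification cut-offs are hypothetical choices for illustration, loosely guided by the values quoted above (about 0.6 for the rigid CD core, negative values for the flexible C-terminus); they are not taken from the authors' analysis.

```python
def hetero_noe(i_sat: float, i_ref: float) -> float:
    """Return the steady-state NOE as I_sat / I_ref.

    i_sat: peak intensity recorded with 1H saturation.
    i_ref: peak intensity of the reference spectrum (no saturation).
    """
    if i_ref == 0:
        raise ValueError("reference intensity must be non-zero")
    return i_sat / i_ref


def classify_mobility(noe: float, core_cutoff: float = 0.55) -> str:
    """Label a residue by its NOE value (cut-offs are illustrative assumptions)."""
    if noe >= core_cutoff:
        return "rigid, core-like"
    if noe > 0.0:
        return "partially restricted"
    return "highly flexible"


if __name__ == "__main__":
    # Made-up intensities, chosen only to exercise the three classes.
    for i_sat, i_ref in [(61.0, 100.0), (45.0, 100.0), (-20.0, 100.0)]:
        noe = hetero_noe(i_sat, i_ref)
        print(f"NOE = {noe:+.2f}: {classify_mobility(noe)}")
```

In this picture, a tail residue with a positive but sub-core value falls in the intermediate class, which is the regime the text associates with an extended string rather than a fully flexible coil.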

Figure 2

NMR comparison between unmod-NCD and phos-NCD.

(a) {1H}-15N hetero NOE values for unmod-NCD (blue bars) and phos-NCD (red bars). (b) Chemical shift indices of unmod-NCD (blue bars) and phos-NCD (red bars). The chemical shift index of the residue at the i-th position is calculated as $\Delta C_{\alpha} - C_{\beta} = \left[\{C_{\alpha}(i-1) + C_{\alpha}(i) + C_{\alpha}(i+1)\} - \{C_{\beta}(i-1) + C_{\beta}(i) + C_{\beta}(i+1)\}\right]/3$. (c) Difference in chemical shift between phos-NCD and unmod-NCD. Δδ of the 1H and 15N chemical shifts upon phosphorylation was calculated according to $\Delta\delta = \{(\Delta\delta_{^{1}\mathrm{H}})^{2} + (\Delta\delta_{^{15}\mathrm{N}}/5)^{2}\}^{1/2}$.

Surprisingly, however, both N-terminal tails exhibited positive hetero NOE values for amino acids 1–19 (Fig. 2a), suggesting that they are not as flexible as the C-terminus of the CD. In addition, all amino acids in the phosphorylated tail exhibited relatively high hetero NOE values between 0.4 and 0.5, suggesting that this portion is more likely to be an extended string rather than the entirely flexible random coil observed for the C-terminus.

In unmod-NCD, the region 9ADSSSSED16 had slightly reduced hetero NOE values as compared with phos-NCD, suggesting that the successive unphosphorylated serine residues in the tail behave as a flexible chain as compared with the phosphorylated ones. In unmod-NCD, a basic segment comprising 3KKTKR7 and an acidic segment comprising 15EDEEE19 are connected by this flexible linker, which might enable the segments to dynamically and intra-molecularly interact with each other.

Upon phosphorylation of the four serine residues, the serine and acidic segments form a long negatively charged segment comprising 10DpSpSpSpSEDEEE19, the whole of which behaves like an extended string owing to both a series of electrostatic repulsions between neighboring amino acids and the repulsion between the successive bulky phosphate groups of the serine residues as observed in the NMR structure. In this case, the N-terminal 3KKTKR7 segment might dynamically and intra-molecularly interact with the 10DpSpSpSpS14 segment by means of a presumed short hairpin formed by 8TA9. Thus, the phosphorylated tail of CD would adopt an extended longer structure as compared with the un-phosphorylated tail.

The extended string-like structures of the N-terminal tails were supported by the chemical shift indices of Cα – Cβ (Fig. 2b), which are markers of secondary structure16,17. Indeed, both unmod-NCD and phos-NCD showed essentially the same pattern of indices for their CD portions, consistent with their determined tertiary structures. As compared with the C-terminal amino acids (73–80), which show no apparent secondary structure, the N-terminal tails in both protein forms are likely to adopt a partially extended structure. Both N-terminal tails have similar patterns of basic and acidic segments; however, the phosphorylated serine segment of 11pSpSpSpSED16 seems to have a significantly more extended conformation as compared with the un-phosphorylated one: in the phosphorylated N-terminal tail, both the basic segment comprising 3KKTKR7 and the phosphorylated serine and acidic segment comprising 11pSpSpSpSEDEEE19 are likely to be partially extended. The chemical shift indices of the N-terminal tail of both unmod-NCD and phos-NCD also correspond well to the proposed string-like structures indicated by the hetero NOE values.
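The three-residue index defined in the caption of Fig. 2 can be sketched in a few lines of Python. This is a minimal illustration, assuming that the per-residue Cα and Cβ inputs are secondary shifts (observed minus random-coil values); the caption does not state which reference was used, so that is an assumption here, as are the function name and the example numbers. With secondary shifts as input, negative values are conventionally read as extended or β-like and positive values as helical.

```python
from typing import List, Optional


def smoothed_ca_cb_index(d_ca: List[float], d_cb: List[float]) -> List[Optional[float]]:
    """Three-residue average of (C-alpha minus C-beta) values.

    Entry i combines residues i-1, i and i+1, following the caption formula.
    The first and last residues lack a neighbor and are returned as None.
    """
    if len(d_ca) != len(d_cb):
        raise ValueError("C-alpha and C-beta lists must have equal length")
    n = len(d_ca)
    index: List[Optional[float]] = [None] * n
    for i in range(1, n - 1):
        ca_sum = d_ca[i - 1] + d_ca[i] + d_ca[i + 1]
        cb_sum = d_cb[i - 1] + d_cb[i] + d_cb[i + 1]
        index[i] = (ca_sum - cb_sum) / 3.0
    return index


if __name__ == "__main__":
    # Made-up secondary shifts (ppm) for a short, extended-like stretch.
    d_ca = [-0.5, -0.6, -0.4, -0.5]
    d_cb = [0.5, 0.7, 0.6, 0.4]
    print(smoothed_ca_cb_index(d_ca, d_cb))
```

The averaging window smooths residue-to-residue noise, which is why the index is better suited to comparing segments (for example the serine stretch in the two forms) than single residues.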

Upon phosphorylation, the chemical environment of the N-terminal tail was altered, as shown by the chemical shift changes (Fig. 2c). By contrast, and consistent with both NMR structures, no significant chemical shift changes were observed in the CD portions except for the N-terminal Tyr20 and Val21. In the N-terminal tail, phosphorylation induced large chemical shift changes in the successive serine residues; however, significant chemical shift changes were also observed for Thr5, Arg7, Thr8 and Ala9 in the basic segment and Asp16 and Glu17 in the acidic segment, suggesting that the chemical environment of each of these segments was significantly altered upon phosphorylation. This is likely to be caused by the different interactions between the basic and acidic segments in phos-NCD as compared with unmod-NCD, as stated above.
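The combined 1H/15N chemical shift change used in Fig. 2c follows the formula given in the figure caption, with the 15N difference scaled down by a factor of 5. A minimal Python sketch is shown below; the function name and the example shift differences are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def combined_shift_change(d_h_ppm: float, d_n_ppm: float, n_scale: float = 5.0) -> float:
    """Combine 1H and 15N shift differences into a single value (ppm).

    Implements sqrt(d_H**2 + (d_N / n_scale)**2), with n_scale = 5 as in
    the caption of Fig. 2.
    """
    return math.sqrt(d_h_ppm ** 2 + (d_n_ppm / n_scale) ** 2)


if __name__ == "__main__":
    # Made-up per-residue differences (1H ppm, 15N ppm).
    example = {"residue A": (0.03, 0.20), "residue B": (0.00, 5.00)}
    for name, (d_h, d_n) in example.items():
        print(name, round(combined_shift_change(d_h, d_n), 3))
```

Scaling the 15N term keeps the much wider 15N chemical shift range from dominating the combined value.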

We tried to examine direct interactions between the basic segment 3KKTKR7 and the phosphorylated serine segment 10DpSpSpSpS14 in phos-NCD or the acidic segment of 15EDEEE19 in unmod-NCD by homo-nuclear NOE spectroscopy with several mixing times; however, no interactions were detected. The presumed interaction of the basic segment with the acidic or phosphorylated segment might be too dynamic to show any detectable NOE between them.

In the presumed folded back conformation of unmod-NCD, the basic segment of 3KKTKR7 could potentially also interact with Tyr20 and Val21 in the CD, which neighbor the acidic segment 15EDEEE19, because both amino acids exhibited significant chemical shift changes upon phosphorylation (Fig. 2c). Both Tyr20 and Val21 also exhibited substantial chemical shift changes upon binding to the H3K9me peptide (see below). For unmod-NCD, therefore, the intra-molecular interaction between the basic segment 3KKTKR7 and the acidic segment plus Tyr20 and Val21 is likely to mimic the inter-molecular interaction between the H3K9me peptide and the acidic segment plus Tyr20 and Val21.

It might be possible that the differences in chemical shift observed between unmod-NCD and phos-NCD originate from different dimer or multimer associations due to their N-terminal tails. We checked the HSQC spectra of 10-fold diluted samples of both proteins, but no significant signal changes were observed.

Small angle X-ray scattering (SAXS) of CD, unmod-NCD and phos-NCD

To confirm the structural differences between unmod-NCD and phos-NCD, small angle X-ray scattering (SAXS) experiments were carried out on CD, unmod-NCD and phos-NCD. To check inter-particle interference, SAXS data were collected at three different protein concentrations, 5, 10 and 15 mg/ml (Supplementary Fig. S4). The radius of gyration, Rg, and the molecular weight estimated from the forward scattering intensity, I(0), of CD, unmod-NCD and phos-NCD are summarized in Fig. 3. For all three proteins, Rg did not depend on protein concentration, indicating that there was no apparent inter-particle interference in the SAXS data. The estimated molecular weights of CD, unmod-NCD and phos-NCD were close to the exact Mw of 7.6, 9.8 and 10.1 kDa, respectively, indicating that all proteins existed as monomers and that the data did not contain scattering from aggregated protein. The Rg values of the CD at the three different protein concentrations were estimated as 14.0 ± 0.3, 13.8 ± 0.3 and 13.7 ± 0.2 Å, which are close to the Rg value calculated from the NMR structure of the CD (15.0 Å). This indicates that the structure of CD is globular, as observed for the NMR structure.
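As a rough illustration of how an Rg value can be obtained from a structural model for comparison with SAXS, the Python sketch below computes the unweighted radius of gyration of a set of atomic coordinates. It is a simplification under stated assumptions: all atoms are weighted equally, and no hydration layer or scattering contrast is modeled, whereas dedicated SAXS software accounts for these. The function name and the toy coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def radius_of_gyration(coords: Sequence[Point]) -> float:
    """Root-mean-square distance of the points from their centroid."""
    n = len(coords)
    if n == 0:
        raise ValueError("at least one coordinate is required")
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    mean_sq = sum(
        (p[0] - cx) ** 2 + (p[1] - cy) ** 2 + (p[2] - cz) ** 2 for p in coords
    ) / n
    return math.sqrt(mean_sq)


if __name__ == "__main__":
    # Toy example: two points 2 Angstrom apart.
    print(radius_of_gyration([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]))
```

A concentration-independent experimental Rg, as reported above, is what makes such a model-versus-data comparison meaningful in the first place.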

Figure 3

P(r) functions for CD (black), unmod-NCD (blue) and phos-NCD (red) (top) and the radius of gyration Rg and estimated molecular weight from the forward scattering intensity I(0), for CD, unmod-NCD and phos-NCD (bottom).

We converted the scattering profiles at 5 mg/ml to the P(r) function and estimated the maximum particle dimension, Dmax (Fig. 3). For the CD, the P(r) function showed a single peak corresponding to the pair distribution of atoms within a globular protein. On the other hand, the P(r) functions of both unmod-NCD and phos-NCD showed an extended tail that was not observed for the CD. The Dmax values of unmod-NCD (79 Å) and phos-NCD (94 Å) were larger than that of the CD (56 Å). The extended tail of the P(r) function and the larger Dmax for unmod-NCD and phos-NCD are due to scattering from the N-terminal tail region in each case. Further comparison showed that the tail of the P(r) function reached longer distances for phos-NCD than for unmod-NCD, and that the Rg and Dmax of phos-NCD were larger than those of unmod-NCD. These data indicate that the conformation of the N-terminal region of phos-NCD is more extended than that of unmod-NCD.

Notably, the P(r) functions calculated from the NMR structures predicted Dmax values of 82 Å for unmod-NCD and 107 Å for phos-NCD (Supplementary Fig. S5). The NMR structures gave a larger Dmax for phos-NCD than for unmod-NCD, which corresponded well qualitatively with the SAXS data; however, the magnitudes of Dmax derived from the NMR structures were larger than the observed SAXS values. This suggests that the N-terminal tail structure of both unmod-NCD and phos-NCD in solution is more compact than that in the static NMR structures. Comparison of the static NMR structures of the two N-terminal tails reflects only the bulkiness of the phosphate groups in phos-NCD.
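To make the relationship between a structural model, its pair-distance distribution and Dmax concrete, the Python sketch below histograms all interatomic distances of a coordinate set and takes the largest distance as Dmax. This is only a crude stand-in for a calculated P(r): it ignores scattering weights and hydration, and an experimental P(r) is obtained by indirect Fourier transformation of the scattering curve rather than from coordinates. Function names, the bin width and the toy coordinates are assumptions made for illustration.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def pair_distances(coords: Sequence[Point]) -> List[float]:
    """All unique pairwise distances between the points."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def dmax(coords: Sequence[Point]) -> float:
    """Largest pairwise distance, i.e. the maximum dimension of the model."""
    return max(pair_distances(coords))


def pair_distance_histogram(coords: Sequence[Point], bin_width: float = 2.0) -> List[int]:
    """Counts of pairwise distances per bin; bin k covers [k*w, (k+1)*w)."""
    distances = pair_distances(coords)
    n_bins = int(max(distances) / bin_width) + 1
    counts = [0] * n_bins
    for d in distances:
        counts[int(d / bin_width)] += 1
    return counts


if __name__ == "__main__":
    toy = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 1.0)]
    print(dmax(toy), pair_distance_histogram(toy))
```

For an ensemble of models, the same functions could be applied per model and the results pooled, which is closer in spirit to comparing a flexible tail with solution scattering.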

Binding between H3K9me and N-terminally truncated CDs

To confirm the extended string-like structure together with the folded back dynamic structures of the N-terminal tail of both unmod-NCD and phos-NCD, we examined the effect of N-terminal deletion on the binding of NCD to a H3K9me peptide (22-mer, comprising amino acids 1–21 of histone H3 with a C-terminal tyrosine for quantitative analysis) by isothermal titration calorimetry (ITC) (Fig. 4, Supplementary Fig. S6). Phos-NCD, comprising amino acids 1–80 of HP1α, bound to H3K9me with approximately 10-fold stronger affinity (Kd = 0.17 μM) than unmod-NCD (Kd = 1.77 μM). Systematic deletion mutants showed that the CD alone, NCDΔ1–19 (amino acids 20–80), had the weakest binding activity (Kd = 13.3 μM), suggesting that the N-terminal tail mediates binding between H3K9me and CD. Addition of the acidic segment to the CD, NCDΔ1–14 (amino acids 15–80), led to about 5-fold stronger binding than the CD alone, and further addition of the serine segment, NCDΔ1–9 (amino acids 10–80), resulted in about 83-fold stronger binding than the CD alone. Notably, the phosphorylated NCDΔ1–9 form, phos-NCDΔ1–9, showed the strongest binding of all mutants (Kd = 40 nM), about 330-fold stronger than the CD. These data suggest that the acidic segment 15EDEEE19 together with the phosphorylated 10DpSpSpSpS14 region is responsible for enhancing the interaction with the H3K9me peptide, 1ARTKQTAR(Kme)STGGKAPRKQLA21(Y) (hereafter, for clarity, italic font is used for the amino acids of H3 and normal font for HP1), where Kme9 fits into the aromatic cage formed by Tyr20, Trp41 and Phe44 and the 8R(Kme)STGGKAPRK18 segment of the histone H3 tail seems to interact with the 10DpSpSpSpSEDEEE19 segment in the phos-NCD tail.

Figure 4

Binding affinities of various N-terminal deleted mutants of unmod-NCD and phos-NCD by ITC experiments.

(a) Upper panels: raw data for heat measured upon injection of the H3K9me peptide into each protein. Lower panels: integrated heat of injections. The solid line shows the best fit of the data. (b) Schematic view showing the deletion mutants together with their Kd values.

As compared with NCDΔ1–9, NCDΔ1–4 (amino acids 5–80), which contains five additional amino acids, 5TKRTA9, showed about 4-fold weaker binding to H3K9me, suggesting that the two amino acids 6KR7 might intra-molecularly bind to the acidic 15EDEEE19 segment, thereby inhibiting interactions with the 8R(Kme)STGGKAPRK18 segment of the H3K9me peptide. However, when the serine residues of NCDΔ1–4 were phosphorylated (phos-NCDΔ1–4), binding to H3K9me was recovered to a value similar to that of un-phosphorylated NCDΔ1–9. As compared with NCDΔ1–9, the basic segment 3KKTKR7 in unmod-NCD might intra-molecularly bind more strongly to the acidic string 15EDEEE19, causing 6-fold weaker binding to the H3K9me peptide. In phos-NCD, however, the phosphorylated serine residues would stiffen the string, thereby inhibiting the folded-back interaction of the basic string 3KKTKR7 with the acidic string 15EDEEE19. Thus, the observed order of binding affinity (Kd values in parentheses) was phos-NCDΔ1–9 (0.04 μM) > phos-NCD (0.17 μM) ~ NCDΔ1–9 (0.16 μM) ~ phos-NCDΔ1–4 (0.38 μM) > NCDΔ1–4 (0.76 μM) > unmod-NCD (1.77 μM) ~ NCDΔ1–14 (2.52 μM) > CD (13.3 μM).
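The fold enhancements quoted above follow directly from ratios of the measured Kd values. The Python sketch below tabulates them relative to the isolated CD and, as an additional illustration, converts each ratio into a binding free energy difference via ΔΔG = RT ln(Kd,variant/Kd,CD). The Kd values are those reported in the text; the temperature of 298.15 K is an assumption (the measurement temperature is not stated in this section), and the dictionary keys and function names are hypothetical labels.

```python
import math

# Kd values in micromolar, as reported in the text.
KD_UM = {
    "phos-NCD_del1-9": 0.04,
    "NCD_del1-9": 0.16,
    "phos-NCD": 0.17,
    "phos-NCD_del1-4": 0.38,
    "NCD_del1-4": 0.76,
    "unmod-NCD": 1.77,
    "NCD_del1-14": 2.52,
    "CD": 13.3,
}

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)
T_ASSUMED = 298.15    # K; assumed, not taken from the source


def fold_vs_cd(name: str) -> float:
    """How many times tighter the construct binds than the isolated CD."""
    return KD_UM["CD"] / KD_UM[name]


def ddg_vs_cd(name: str, temperature: float = T_ASSUMED) -> float:
    """Free energy difference relative to the CD (kcal/mol; negative = tighter)."""
    return R_KCAL * temperature * math.log(KD_UM[name] / KD_UM["CD"])


if __name__ == "__main__":
    for name in sorted(KD_UM, key=KD_UM.get):
        print(f"{name:16s} Kd = {KD_UM[name]:5.2f} uM  "
              f"fold = {fold_vs_cd(name):6.1f}  ddG = {ddg_vs_cd(name):+.2f} kcal/mol")
```

Expressing the series on a free energy scale makes the contributions of the acidic segment, the serine segment and phosphorylation approximately additive, which can be easier to compare than raw fold changes.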

Comparison of CD, unmod-NCD and phos-NCD bound to H3K9me peptide

Next, we examined the structural changes that occur in CD, unmod-NCD and phos-NCD upon binding to H3K9me peptide by NMR (Fig. 5). Notably, the isolated CD and the CD region in phos-NCD showed essentially the same chemical shift changes upon binding to H3K9me peptides (Fig. 5a), although the binding affinities differed considerably: phos-NCD bound H3K9me about 80 times more strongly than the CD. This suggests that the structure of the CD portion in phos-NCD bound to H3K9me is not altered by the presence of the phosphorylated tail; in other words, the tail does not interfere with the CD but enhances its interaction with the H3K9me peptide. In addition, Glu19 and Tyr20 showed substantial chemical shift changes after binding to H3K9me peptide (Fig. 5a), indicating that these two residues contribute to the interaction with 8R(Kme)9 of H3, as will be shown below in the complex structure.

Figure 5

Chemical shift changes, chemical shift indices and {1H}-15N hetero NOE values.

(a) Changes in 1H and 15N chemical shifts of CD (green), unmod-NCD (blue) and phos-NCD (red) upon binding to the H3K9me peptide, calculated according to $\Delta\delta = \{(\Delta\delta_{^{1}\mathrm{H}})^{2} + (\Delta\delta_{^{15}\mathrm{N}}/5)^{2}\}^{1/2}$; signals that disappeared after binding to H3K9me are indicated as open blue (unmod-NCD) or open green (CD) bars over 1.2 ppm. (b) Chemical shift indices of the complex of phos-NCD bound to H3K9me peptide, where the chemical shift index of the residue at the i-th position is calculated as $\Delta C_{\alpha} - C_{\beta} = \left[\{C_{\alpha}(i-1) + C_{\alpha}(i) + C_{\alpha}(i+1)\} - \{C_{\beta}(i-1) + C_{\beta}(i) + C_{\beta}(i+1)\}\right]/3$. (c) {1H}-15N hetero NOE values for phos-NCD in its free state (black bars) and its complex with H3K9me (gray bars).

The chemical shift changes in the N-terminal tail of phos-NCD on binding to H3K9me were small; however, as compared with the changes of the C-terminal portion, significant changes were observed for the four phosphorylated serine residues and Asp16 and Glu17 in addition to Glu19 (Fig. 5a). These residues are probably responsible for the interaction with the 8R(Kme)STGGKAPRK18 segment of H3K9me.

As compared with CD and phos-NCD, unmod-NCD showed greater spectral changes upon binding to H3K9me (Fig. 5a). On formation of the complex of unmod-NCD and H3K9me, many NMR signals in the tail region in addition to the CD region disappeared, as indicated by open blue bars. The broadening and disappearance of signals may be caused by conformational fluctuations upon binding to H3K9me; such fluctuations were not observed in the complexes of CD and phos-NCD with H3K9me. Strikingly, the signals of Arg7 and Thr8 in unmod-NCD disappeared upon binding to H3K9me (Fig. 5a), suggesting that the conformations of these two amino acids fluctuate in the complex. As observed in the experiments with tail deletion mutants, the basic segment 3KKTKR7 seems likely to interact with the acidic segment 15EDEEE19 plus Tyr20 and Val21 in the un-phosphorylated tail. Upon binding to H3K9me, the acidic 15EDEEE19 segment plus Tyr20 and Val21 might interact dynamically with both the H3K9me basic segment 8R(Kme)STGGKAPRK18 and the basic 3KKTKR7 segment of unmod-NCD. In the case of unmod-NCD, even though Kme9 of H3 is specifically captured in the aromatic cage formed by Tyr20, Trp41 and Phe44 of the CD portion, binding of the backbone of the 8R(Kme)STGGKAPRK18 segment to the acidic 15EDEEE19 segment plus Tyr20 and Val21 seems to be dynamically inhibited by intra-molecular binding of the basic segment 3KKTKRT8. This interference might lead to disappearance of the signals for Arg7, Thr8, Glu19 and Tyr20 in unmod-NCD upon binding to the H3K9me peptide. Partial binding of the basic segment of unmod-NCD to the acidic segment would cause changes in the complex structure of unmod-NCD and H3K9me in the CD portion. Consistent with this, we obtained broadened signals for amino acids in the H3K9me binding sites of the CD portion, including Val21, Val22, Lys42, Gly43, Glu54, Lys55, Asn56, Leu57, Asp58, Cys59, Glu61 and Ser64 in addition to Tyr20. Indeed, the amino acids with signals that disappeared corresponded well with those that showed significant chemical shift changes in CD and phos-NCD upon binding to the H3K9me peptide.

NMR of phos-NCD bound to H3K9me peptide

The chemical shift indices of phos-NCD bound to H3K9me remained essentially the same as those of free phos-NCD (Fig. 5b), indicating that the secondary structures of phos-NCD did not change upon binding to H3K9me. However, small but significant differences in chemical shift indices were observed in two regions: namely, 19EYVV22 and 56NLD58. As described in the complex structure below, both regions are involved in interacting with the H3K9me peptide: the backbone of 5QTARKme9 of H3 is sandwiched between the 19EYVV22 and 56NLD58 backbones of the CD portion in phos-NCD.

In the complex of phos-NCD bound to H3K9me, almost all portions except two regions, 3KKTKRT8 and 19EY20, showed essentially the same hetero NOE as free phos-NCD (Fig. 5c). In the first case, the basic 3KKTKR7 segment is probably inhibited from interacting with the phosphorylated 10DpSpSpSpS14 segment by the 8R(Kme)STGGKAPRK18 segment of the histone H3 tail, so it is now more freely exposed to solvent; as a result, the flexibility of the basic 3KKTKRT8 segment seems to be lowered. In the second case, the flexibility of the Glu19 and Tyr20 region is reduced by binding to H3K9me. As shown below, the backbone of the Glu19 and Tyr20 portion is stabilized by hydrogen bonding to the backbone of 5QTARKme9 in H3K9me; thus, the two amino acids are stiffened by binding.

Structural analysis of phos-NCD bound to H3K9me peptide

To determine the tertiary structure of phos-NCD bound to the H3K9me peptide, we prepared an NMR sample of a complex formed between 13C-, 15N-labeled phos-NCD (residues 1–80) of HP1α phosphorylated at serine residues 11–14 and the unlabeled H3K9me peptide (residues 1–18) of histone H3. A total of 637 distance restraints estimated from NOEs, 7 hydrogen-bond restraints (for phos-NCD), 99 dihedral restraints (for phos-NCD) and 43 intermolecular NOEs were used to determine the structure (Table 1).

The NMR structures of phos-NCD bound to the H3K9me peptide were well defined (Fig. 6a, Table 1, Supplementary Fig. S7). Figure 6c shows superposition of the structures of free phos-NCD (red) and phos-NCD (green) bound to H3K9me (magenta). Upon binding to H3K9me, the structure of the CD moiety is not altered as indicated by the RMSD value of 0.97 Å between the two structures. The three methyl groups of H3K9 are surrounded by three aromatic residues, Tyr20, Trp41 and Phe44 of phos-NCD and the aromatic cage structure in the complex is essentially the same as that in phos-NCD, Drosophila HP1a and mouse HP1β (Supplementary Fig. S8).

Figure 6

Solution structures of phos-NCD bound to the H3K9me peptide.

(a) Superposition of the 20 best NMR structures. (b) Lowest energy structure of the complex. phos-NCD is shown as a green ribbon; the H3K9me peptide is shown as a magenta stick. (c) Structural comparison between unbound phos-NCD (red) and phos-NCD (green) bound to the H3K9me peptide (magenta).

The backbone of 5QTARKS10 in the H3K9me peptide interacts with two backbone strands of phos-NCD (Supplementary Fig. S8): the Asp58 and Asn56 strand; and the Glu18, Glu19, Tyr20 and Val21 strand. These amino acids of phos-NCD showed significant chemical shift changes upon binding to H3K9me. The mode of backbone interaction is very similar to that observed between the Drosophila CD and the H3K9me peptide. In the present NMR structure determination, there were no direct interactions such as homo-nuclear NOEs between the phosphorylated HP1 tail and the histone H3 tail. However, two presumed interacting structures were obtained (Supplementary Fig. S8): one shows interactions between Arg8 of the H3K9me peptide and phosphorylated Ser14 of phos-NCD, together with Arg17 of the H3K9me peptide and phosphorylated Ser11 of phos-NCD; the other shows interactions between Arg8 of the H3K9me peptide and Glu15 of phos-NCD, together with Arg17 of the H3K9me peptide and Asp10 of phos-NCD.

REMD simulations of unmod-NCD and phos-NCD, with and without the H3K9me peptide

To provide more detailed structures to interpret the results obtained from the NMR and SAXS experiments, we generated structural ensembles for four different systems (the complex forms of unmod-NCD and phos-NCD and the unbound forms of unmod-NCD and phos-NCD) by replica exchange molecular dynamics (REMD) simulations. For each system, 48 replicas were sufficient to connect the ensemble at 290.53 K to that at 443.15 K, and the potential energy distributions of neighboring replicas overlapped significantly, giving an average exchange rate of 0.1 (Supplementary Fig. S9). During a simulation time of 50 ns, each replica traveled from the lowest to the highest temperature several times, yielding a well converged structural ensemble at 293.15 K (the second lowest temperature among the 48 replicas; Supplementary Fig. S9).
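The replica set-up described above can be illustrated with a short Python sketch. It builds a temperature ladder and evaluates the standard Metropolis criterion for swapping two replicas, P = min(1, exp[(β_i − β_j)(E_i − E_j)]). The geometric spacing of temperatures is an assumption made here (the spacing scheme is not stated in the text), although it does place the second replica at approximately 293.15 K for the reported end points. The energies in the example are made up, and energies are assumed to be in kcal/mol.

```python
import math
from typing import List

K_B = 0.0019872041  # Boltzmann constant, kcal/(mol K)


def geometric_ladder(t_min: float, t_max: float, n_replicas: int) -> List[float]:
    """Temperatures with a constant ratio between neighbors (assumed scheme)."""
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio ** i for i in range(n_replicas)]


def exchange_probability(e_i: float, e_j: float, t_i: float, t_j: float) -> float:
    """Metropolis acceptance probability for swapping replicas i and j.

    e_i, e_j: potential energies of the configurations currently at
    temperatures t_i and t_j (kcal/mol).
    """
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


if __name__ == "__main__":
    ladder = geometric_ladder(290.53, 443.15, 48)
    print("first three temperatures (K):", [round(t, 2) for t in ladder[:3]])
    print("aggregate sampling (ns):", 48 * 50)
    # Made-up energies for two neighboring replicas.
    print("swap probability:", exchange_probability(-1010.0, -1000.0, ladder[0], ladder[1]))
```

With 48 replicas of 50 ns each, the aggregate sampling per system is 2400 ns, which matches the 2.4-μs figure given in the Introduction.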

During the REMD simulations, the core region of CD (20–67) maintained its structure, with root mean square fluctuations (RMSFs) for Cα atoms of 1.15, 1.15, 0.75 and 0.73 Å for unmod-NCD and phos-NCD without the H3K9me peptide and unmod-NCD and phos-NCD in complex with the peptide, respectively. Although phosphorylation of the N-terminal serine residues scarcely affected the fluctuations, formation of the complex significantly stiffened the core region. The H3K9me peptide connects the N-terminal (residues 17–20) and the C-terminal (residues 55–58) regions and stabilizes the flexible loop via interactions with the aromatic cage (residues 20, 41 and 44). The Cα RMSFs of the C-terminus after fitting the core region were 9.19, 8.81, 5.10 and 5.39 Å, respectively, for the systems in the above order, indicating the intrinsically flexible structure of the C-terminus. The differences between the unbound and peptide-bound structures may simply reflect the difference in flexibility of the core regions. On the other hand, the Cα RMSFs of the N-terminal tail strongly depended on both the phosphorylation state and the binding of the H3K9me peptide (10.78, 12.69, 7.62 and 5.39 Å). These observations agree with the results from our experiments. In the unbound state, phosphorylation tends to make the N-terminal tail more extended, corresponding to the increase in the radius of gyration observed in the SAXS experiments. In contrast, peptide binding reduces the RMSFs more for phos-NCD, indicating stronger interactions between the N-terminal tail and the H3K9me peptide.
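A per-atom RMSF of the kind reported above is the root-mean-square deviation of each atom from its own average position over the ensemble. The Python sketch below shows the calculation for frames that are assumed to be already superimposed on the core region; the fitting step itself is not implemented here. Function names and the toy frames are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Frame = Sequence[Point]


def rmsf_per_atom(frames: Sequence[Frame]) -> List[float]:
    """RMSF of each atom over pre-aligned frames (same atom order in every frame)."""
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for a in range(n_atoms):
        # Average position of atom a over the ensemble.
        mean = [sum(f[a][k] for f in frames) / n_frames for k in range(3)]
        # Mean squared displacement from that average.
        msd = sum(
            sum((f[a][k] - mean[k]) ** 2 for k in range(3)) for f in frames
        ) / n_frames
        result.append(math.sqrt(msd))
    return result


def mean_rmsf(frames: Sequence[Frame], atom_indices: Sequence[int]) -> float:
    """Average RMSF over a selection, e.g. the core or the N-terminal tail."""
    values = rmsf_per_atom(frames)
    return sum(values[i] for i in atom_indices) / len(atom_indices)


if __name__ == "__main__":
    # Toy ensemble: atom 0 is fixed, atom 1 moves along x.
    frames = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
              [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]]
    print(rmsf_per_atom(frames))
```

Because the tail values are computed after fitting on the core, they measure motion of the tail relative to the domain rather than overall tumbling.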

In terms of the behavior of the flexible N-terminal tail, the structural ensembles obtained from the REMD simulations for unmod-NCD and phos-NCD in their free states were compared. The distribution of the far N-terminus (Cα atom of Met1) was broadened by phosphorylation of the N-terminal serine residues (Supplementary Fig. S9d). This was also observed in the statistical analysis in Supplementary Fig. S9e, where the distribution of the minimum distance between the Cα atoms of residues 1–10 (the edge of the N-terminal tail) and those of residues 19–21/56–59 (the atoms that form contacts with the H3K9me peptide in the complex form) was calculated for unmod-NCD and phos-NCD. Unmod-NCD has a high frequency of contacts between the two regions, whereas phos-NCD exhibits a lower frequency of contacts. This indicates that phosphorylation breaks favorable interactions between the N-terminal tail and the core region and thus distributes the N-terminal tail over a wider range of space. Supplementary Fig. S10 shows ten representative structures of unmod-NCD and phos-NCD, which are extremely flexible but appear to contain non-specific electrostatic interactions between the basic (3–8) and acidic (15–19) segments in unmod-NCD and between the basic (3–8) and phosphorylated serine and acidic (10–19) segments in phos-NCD.
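The minimum-distance analysis described above can be sketched in Python as follows. For each frame, the smallest distance between any Cα atom of one group and any Cα atom of the other is recorded, and the fraction of frames below a cut-off gives a contact frequency. The 8 Å cut-off, the function names and the toy coordinates are assumptions made for illustration and are not taken from the analysis in the Supplementary Figures.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def min_distance(group_a: Sequence[Point], group_b: Sequence[Point]) -> float:
    """Smallest distance between any atom of group_a and any atom of group_b."""
    return min(math.dist(a, b) for a in group_a for b in group_b)


def min_distance_series(frames_a: Sequence[Sequence[Point]],
                        frames_b: Sequence[Sequence[Point]]) -> List[float]:
    """Minimum inter-group distance for each frame of the ensemble."""
    if len(frames_a) != len(frames_b):
        raise ValueError("both groups need the same number of frames")
    return [min_distance(a, b) for a, b in zip(frames_a, frames_b)]


def contact_frequency(frames_a: Sequence[Sequence[Point]],
                      frames_b: Sequence[Sequence[Point]],
                      cutoff: float = 8.0) -> float:
    """Fraction of frames whose minimum distance is within the cut-off (assumed 8 A)."""
    series = min_distance_series(frames_a, frames_b)
    return sum(1 for d in series if d <= cutoff) / len(series)


if __name__ == "__main__":
    # Toy ensemble: the groups are close in frame 1 and far apart in frame 2.
    tail = [[(0.0, 0.0, 0.0)], [(0.0, 0.0, 0.0)]]
    site = [[(3.0, 4.0, 0.0)], [(30.0, 40.0, 0.0)]]
    print(min_distance_series(tail, site), contact_frequency(tail, site))
```

Histogramming the per-frame series, rather than reducing it to a single frequency, gives the kind of probability distribution that is compared between unmod-NCD and phos-NCD.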

REMD simulations demonstrated that, in the H3K9me peptide-bound state, the phosphorylated 11pSpSpSpS14 segment in phos-NCD showed strong electrostatic interactions with the basic residues Lys14, Arg17 and Lys18 in the H3K9me peptide. Figure 7a clearly shows that the phosphorylation strongly enhanced the interactions between the N-terminal tail and the histone tail. Figure 7b is a snapshot of the REMD simulation for phos-NCD bound to the H3K9me peptide, exhibiting salt bridges between phosphorylated Ser12 and Lys18 and between phosphorylated Ser13 and Arg17.

Figure 7

Results of REMD simulations for unmod-NCD and phos-NCD bound to the H3K9me peptide.

(a) Probability distributions of the minimum distance between the Cα atoms of K14/R17/K18 of the H3K9me peptide (the basic residues on the C-terminal side of K9me) and those of the four serines (11–14) of NCD, for unmod-NCD (blue) and phos-NCD (red) in the REMD simulation of the bound state. (b) Representative structure of phos-NCD complexed with H3K9me obtained in the REMD simulation of the bound state. Ionic interactions between pS12 and K18 and between pS13 and R17 are indicated by dotted cyan lines.

Conclusion

The recognition mode of the chromodomains of HP1 family proteins for histone H3K9me has been described in detail. A cage formed by three aromatic amino acids (for example, Tyr24, Trp45 and Tyr48 in Drosophila HP1a11, and Tyr20, Trp41 and Phe44 in mouse HP1β chromodomains12) is responsible for capturing the methyl moiety on lysine 9 in the methylated histone H3 tail, and the backbone of amino acids 5–8 of the N-terminal histone tail is sandwiched by two β strands in the chromodomains11,12. In this context, the structure of HP1α phos-NCD bound to the H3K9me peptide shows a similar binding mode with an aromatic cage formed by Tyr20, Trp41 and Phe44. However, these features do not explain the enhancement in affinity due to phosphorylation of the N-terminal tail of HP1α. Here, we have demonstrated that the HP1α tail segment containing four phosphorylated serine residues, 10DpSpSpSpS14, together with the acidic amino acids 15EDEEE19, behaves like a negatively charged extended string that interacts with the positively charged segment 8RKmeSTGGKAPRK18 in the tail of histone H3. Systematic truncation of the N-terminal tail of unmod-NCD and phos-NCD suggested that the CD protein with an entirely negative 10DpSpSpSpSEDEEE19 string-like tail has the strongest affinity for the H3K9me peptide, binding more than 300 times more tightly than the isolated CD. These ten negative amino acids seem to form a negatively charged extended string that can directly interact with the positively charged histone H3 tail containing K9me.

This mode of interaction between two isolated extended string-like proteins is rather peculiar among currently known protein-protein interactions. It is well known that proteins that are intrinsically disordered in the unbound state form a rigid structure on their target proteins by a coupled folding and binding mechanism18,19,20,21,22. Depending on the protein’s amino acid sequence, the folded structure on the target protein is polymorphic, occurring as an amphipathic helix23,24 or an extended strand25,26,27; however, the complex overall adopts a globular structure.

In the case of the interaction between the tail of phos-NCD and the histone H3 tail, two extended string-like interactions have a dominant effect on binding activity; however, no rigid complex globular structure was observed for these strings. Even in the interaction between the two tails, each tail seemed to retain a dynamic character. It is likely that the two fluctuating strings are dynamically interacting with each other. This dynamic string-like behavior forms the basis of the enhancement in interaction between HP1α and histone H3 resulting from phosphorylation of the tail of the HP1α chromodomain.

Methods

Purification of CD without and with the phosphorylated or un-phosphorylated tail

Recombinant CDs without and with the phosphorylated or un-phosphorylated tail were prepared according to a previously described method13. In our previous study, phos-NCD was confirmed by LC-MS/MS to contain only phosphorylation of Ser11, Ser12, Ser13 and Ser14. Proteins were expressed in Escherichia coli strain BL21 (DE3) star containing the expression plasmid pCold/Amp (HP1αCD) with or without pRSFduet/Kan (CK2) grown in LB medium. Each of the 15N-labeled or 13C/15N-labeled proteins was expressed in M9 minimal medium containing 15N-ammonium chloride with or without 13C-glucose.

The harvested cells were re-suspended in Buffer A (50 mM Na phosphate buffer (pH 7.0), 500 mM NaCl, 10% glycerol, 1 mM 2-mercaptoethanol), lysed on ice by sonication and centrifuged. The supernatant of each protein solution was then applied to Ni-NTA Superflow (Qiagen) equilibrated with Buffer B (50 mM Na phosphate buffer (pH 7.0), 1 M NaCl, 30 mM imidazole, 1 mM 2-mercaptoethanol) and each His-tagged sample was eluted with Buffer C (50 mM Na phosphate buffer (pH 7.0), 1 M NaCl, 1 mM 2-mercaptoethanol, 400 mM imidazole). Each eluted His-tagged sample was dialyzed against Buffer D (50 mM Na phosphate buffer (pH 7.0), 300 mM NaCl, 1 mM 2-mercaptoethanol) and digested with thrombin at 4 °C overnight. The protein solution was again loaded onto the Ni-NTA column. Fractions passing through the column were concentrated and loaded on a HiLoad 26/60 Superdex 75 pg column equilibrated with Buffer E (50 mM Na phosphate buffer (pH 7.0), 500 mM NaCl, 1 mM 2-mercaptoethanol). Fractions were collected and dialyzed against Buffer F (20 mM KPB (pH 6.8), 10 mM NaCl, 10 or 100% D2O, 5 mM DTT). Each sample was checked by mass spectrometry, using a MALDI-TOF Autoflex™ (Bruker Daltonics), to confirm that phos-NCD contained four phosphorylated residues. In addition, the HSQC spectra of unmod-NCD and phos-NCD showed that only the signals of Ser11, Ser12, Ser13 and Ser14 are downshifted, as is typical of direct phosphorylation17, whereas those of Thr5 and Thr8 are upshifted, indicating that neither threonine residue was directly phosphorylated but that both signals were indirectly changed by the phosphorylation of the four serine residues (Supplementary Fig. S11).

Preparation of CD with an N-terminally truncated tail

Plasmids encoding the CD with a truncated tail were obtained by PCR using plasmid pCold/Amp (HP1αCD) as a template with appropriate primer sets and PrimeSTAR® Max DNA polymerase (Takara Bio.). Protein expression and purification were performed as described above with small modifications. For protease digestion, each sample was concentrated with a Millipore Amicon® Ultra (MWCO 5,000) and cleaved with thrombin at room temperature for 2 days in 50 mM Na phosphate buffer (pH 7.0), 300 mM NaCl, 1 mM 2-mercaptoethanol. Each sample was checked by mass spectrometry using a MALDI-TOF Autoflex™ (Bruker Daltonics) to confirm its modification state. During preparation of the phosphorylated D10 mutant, the sample obtained was 5 amino acids longer than expected; thus, the thrombin recognition site in the vector was changed to an HRV3C recognition site to produce phosphorylated and un-phosphorylated D10 with two amino acids, GP, attached to the N-terminus. All other samples contained three amino acids, GSH, at their N-terminus after cleavage by thrombin.

NMR spectroscopy

The protein concentrations were 0.5 mM in 20 mM KPB (pH 6.8), 10 mM NaCl, 5 mM d-DTT and 10% or 100% D2O. The NMR experiments were performed at 25 °C on Bruker Avance 600 MHz and 800 MHz spectrometers, both equipped with a 5-mm triple-resonance pulsed-field gradient cryoprobe. The 1H chemical shifts were referenced to 2,2-dimethyl-2-silapentane-5-sulfonate. The 15N and 13C chemical shifts were referenced indirectly to 2,2-dimethyl-2-silapentane-5-sulfonate using the absolute frequency ratios.

Backbone and side chain resonances were assigned by the following experiments: 2D 1H–15N HSQC and 13C HSQC; 3D HNCO, 3D HN(CO)CA, 3D HNCA, 3D HNCACB, 3D CBCA(CO)NH, 3D HBHA(CO)NH, 3D HCCH-TOCSY, 3D HCCH-COSY, 2D (HB)CB(CGCD)HD and 2D (HB)CB(CGCDCE)HE; and 3D 15N NOESY-HSQC (100 ms) and 3D 13C NOESY-HSQC (100 ms). For the complex, intramolecular distance restraints were obtained from 3D 15N and 13C NOESY–HSQC spectra. Resonance assignments and intramolecular distance restraints for the unlabeled H3K9me3 peptide were obtained from 2D NOESY and TOCSY with a 13C-filtered or 13C/15N filtered pulse scheme28. Intermolecular distance restraints were obtained from 3D 13C/15N X-filtered NOESY spectra29. The NOE mixing time in all NOESY experiments was set to 180 ms. All NMR spectra were processed by using the program NMRPipe30 and analyzed using the program Olivia (M. Yokochi, S. Sekiguchi, & F. Inagaki, Hokkaido University, Sapporo, Japan) with angle restraints determined by TALOS+ and structures calculated by CYANA v2.131,32. CYANA was used to compute seven cycles, each with 600 structures. Each conformer was subjected to 10,000 steps of torsion angle dynamics per cycle. Ultimately, the 20 lowest energy structures were selected to represent each structure. Ramachandran plot statistics for the structures were calculated by using PROCHECK-NMR33. The statistics of the structures are summarized in Table 1. The structures were visualized by using MOLMOL34 and PyMOL (DeLano, W. L. The PyMOL Molecular Graphics System, http://www.pymol.org).

Preparation of histone H3 peptides

All of the histone H3 peptides used in this study were purchased from Sigma Genosys.

Isothermal titration calorimetry (ITC) experiments

Protein solution (40 μM) was loaded into the cell (active cell volume 1.4 ml) of a VP-ITC isothermal titration calorimeter (Microcal, INC.). The solution was titrated with 400 μM ligand solution via a 250 μl titration syringe. Experiments were carried out at 20 °C. The ligand solution was prepared in the same buffer as the protein (20 mM KPB (pH 6.80), 10 mM NaCl). The heat of dilution generated by the ligand was subtracted and the binding isotherms were fitted to a one-site binding model by using Origin 7 Software (Microcal, INC.). From the values of Kd and ΔH, the thermodynamic parameters ΔG and ΔS were calculated according to the basic thermodynamic equations ΔG = RT ln Kd and ΔG = ΔH − TΔS, where R is the gas constant and T is the absolute temperature.
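As a worked illustration of this last step, the sketch below derives ΔG and ΔS from a fitted Kd and ΔH using the standard relations ΔG = RT ln Kd and ΔG = ΔH − TΔS. The function name, the choice of kcal/mol units and the example inputs in the usage note are assumptions made for illustration; they are not values from this study.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def itc_thermodynamics(kd_molar, delta_h_kcal, temp_celsius=20.0):
    """Return (delta_g, delta_s, minus_t_delta_s) from Kd and delta H.

    kd_molar      : dissociation constant in mol/l (1 M standard state)
    delta_h_kcal  : binding enthalpy in kcal/mol
    temp_celsius  : experimental temperature (20 degrees C in the ITC runs)
    """
    t = temp_celsius + 273.15
    delta_g = R_KCAL * t * math.log(kd_molar)  # delta G = RT ln Kd
    delta_s = (delta_h_kcal - delta_g) / t     # from delta G = delta H - T delta S
    return delta_g, delta_s, -t * delta_s
```

For example, `itc_thermodynamics(1e-6, -10.0)` treats a hypothetical 1 μM binder with ΔH = −10 kcal/mol; the returned ΔS is in kcal/(mol K).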

Small angle X-ray scattering (SAXS)

SAXS measurements were performed at 20 °C with a MicroMax007HF X-ray generator (Rigaku). A PILATUS100K detector (DECTRIS) at a distance of 561 mm from the sample was used to measure scattering intensities. Each sample solution was transferred to a quartz-window cell with a 1 mm path length. Circular averaging of the scattering intensities was carried out to obtain one-dimensional scattering data I(q) as a function of q (q = 4πsinθ/λ, where 2θ is the scattering angle and λ is the X-ray wavelength, 1.5418 Å). To check for interparticle interference, I(q) data were collected at three different protein concentrations (5, 10 and 15 mg/ml in 50 mM KPB (pH 6.8), 500 mM NaCl and 5 mM DTT). The total exposure times were 6 hours for 5 mg/ml and 3 hours for both 10 mg/ml and 15 mg/ml. Because the intensity profile did not indicate a concentration effect, the correction for interparticle interference was not applied. To estimate the molecular weight of samples, I(q) data were collected for hen egg white lysozyme (5.2 mg/ml in 50 mM KPB (pH 6.8), 500 mM NaCl). The data were processed by using the software applications embedded in the ATSAS package [http://www.embl-hamburg.de/biosaxs/software.html]. The radius of gyration Rg and forward scattering intensity I(0) were estimated from the Guinier plot35 of I(q) in the smaller angle region of qRg < 1.3. The distance distribution function P(r) was calculated in the program GNOM36, where the experimental I(q) data were used in a q-range from 0.031 to 0.250 Å−1. The maximum particle dimension Dmax was estimated from the P(r) function as the distance r at which P(r) falls to zero35. The I(q) and Rg of 20 models corresponding to the NMR structures were calculated by using the program CRYSOL37 and averaged. The molecular weight of the sample was estimated by comparing I(0)/c (where c is the protein concentration) of the sample to that of lysozyme.
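The Guinier analysis and the molecular weight estimate described above can be sketched with a plain least-squares fit of ln I(q) against q², using the Guinier approximation ln I(q) = ln I(0) − q²Rg²/3. This is an illustrative stand-in for the ATSAS tools, not a reproduction of them: the function names are hypothetical, the caller is assumed to have already restricted the data to the qRg < 1.3 region, and the lysozyme molecular weight of 14.3 kDa is a commonly quoted value that is not stated in the source.

```python
import math


def scattering_vector(two_theta_rad, wavelength=1.5418):
    """q = 4*pi*sin(theta)/lambda, with 2*theta the scattering angle (radians)."""
    return 4.0 * math.pi * math.sin(two_theta_rad / 2.0) / wavelength


def guinier_fit(q, intensity):
    """Least-squares fit of ln I(q) = ln I(0) - (Rg^2 / 3) q^2.

    Assumption: q and intensity only cover the Guinier region (q*Rg < 1.3).
    Returns (Rg, I0).
    """
    x = [qi * qi for qi in q]
    y = [math.log(i) for i in intensity]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("non-negative Guinier slope; Rg is undefined")
    intercept = mean_y - slope * mean_x
    return math.sqrt(-3.0 * slope), math.exp(intercept)


def molecular_weight(i0, conc, i0_std, conc_std, mw_std_kda=14.3):
    """Molecular weight from I(0)/c relative to a standard protein.

    Assumption: mw_std_kda = 14.3 kDa for hen egg white lysozyme.
    """
    return mw_std_kda * (i0 / conc) / (i0_std / conc_std)
```

In practice the Guinier range is usually refined iteratively, because the upper q limit depends on the Rg being estimated.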

Replica exchange molecular dynamics (REMD) simulations

Replica exchange molecular dynamics (REMD) simulations38,39 for the four systems, unmod-NCD (amino acids 1–80) and phos-NCD (amino acids 1–80) complexed with the H3K9me peptide (amino acids 1–22) and unmod-NCD and phos-NCD without the peptide, were performed in explicit solvent. The starting structure of the CD region was built by homology modeling using MODELLER40 and the crystal structure (PDB: 1Q3L), whereby the structure of residues 60–64 was modified to be an α-helix on the basis of the NMR structures and the disordered regions (residues 1–20 and 75–80 for NCD and 1–4 and 17–22 for the H3K9me peptide) were built simply as random coil. The C-termini of both chains were capped with an N-methyl group. The protein was immersed in a rectangular box (56.6 × 58.3 × 85.4 Å3 on average) filled with TIP3P waters41 together with sodium and chloride ions to reproduce the ionic concentration used in the SAXS experiments, resulting in 40,055, 41,722, 34,683 and 36,263 atoms in total for the complex forms of unmod-NCD and phos-NCD and the apo forms of unmod-NCD and phos-NCD, respectively.

REMD simulations were performed with the MD program MARBLE42 with an extension for REMD (P = 1 atm; T is described below). The force field used in these computations was the CHARMM 36 all-atom parameter set43 together with the parameters for methylated lysine44. Electrostatic interactions were calculated using the particle-mesh Ewald method45. The cut-off length of the Lennard-Jones potential was 10 Å. The symplectic integrator for rigid bodies was used to constrain the bond lengths and angles involving hydrogen atoms42, allowing a time step of 2.0 fs. Forty-eight replicas were used in each simulation, with temperatures ranging from 290.53 K to 443.15 K at intervals generated on an exponential scale. The highest temperature, 443.15 K, was chosen in order to sample all possible structures including the flexible N-terminal region. To avoid dissociation of the H3K9me peptide, distance restraints were applied between Nζ of K9me and the centers of mass of the aromatic rings comprising the aromatic cage (residues 20, 41 and 44) with a harmonic force constant of 0.15 kcal/mol/Å2. Simulations of 50 ns were carried out for each of the four systems, using Metropolis trials to exchange temperatures between neighboring replicas every 10 ps. The simulation trajectories at T = 293.15 K (the second lowest temperature of the 48 replicas) were selected from all of the replicas as the structural ensembles for analysis.
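The replica temperatures and the exchange test can be sketched as follows. The ladder assumes that "exponential scale" means a geometric progression between the two stated end points, which reproduces 293.15 K as the second-lowest temperature. The acceptance rule is the standard potential-energy Metropolis criterion for temperature REMD; it omits any pressure-volume term, and it is not claimed to be the exact form implemented in MARBLE. The function names are hypothetical.

```python
import math

K_B = 1.987204e-3  # Boltzmann constant in kcal/(mol K)


def temperature_ladder(t_min=290.53, t_max=443.15, n_replicas=48):
    """Geometrically spaced temperatures from t_min to t_max (inclusive)."""
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio ** i for i in range(n_replicas)]


def exchange_probability(e_i, e_j, t_i, t_j):
    """Metropolis acceptance probability for swapping neighboring replicas.

    e_i, e_j : potential energies (kcal/mol) of the replicas at t_i and t_j
    Uses delta = (beta_i - beta_j) * (e_j - e_i), P = min(1, exp(-delta)).
    Assumption: the pressure-volume contribution is neglected.
    """
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_j - e_i)
    if delta <= 0.0:
        return 1.0  # always accept; also avoids overflow in exp
    return math.exp(-delta)
```

With these settings, 48 replicas of 50 ns each amount to 2.4 μs of aggregate sampling per system, and an exchange attempt every 10 ps gives 5,000 attempts per replica pair over a run.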

Additional Information

Accession codes: Coordinate and structural factors have been deposited in the Protein Data Bank under accession code 2RVL for unmod-NCD, 2RVM for phos-NCD in its unbound state and 2RVN for the phos-NCD bound to H3K9me. NMR constraints have been deposited in the Biological Magnetic Resonance Bank under entry 11604 for unmod-NCD, 11605 for phos-NCD in its unbound state and 11606 for the phos-NCD bound to H3K9me. http://www.nature.com/srep

How to cite this article: Shimojo, H. et al. Extended string-like binding of the phosphorylated HP1α N-terminal tail to the lysine 9-methylated histone H3 tail. Sci. Rep. 6, 22527; doi: 10.1038/srep22527 (2016).