To better understand the recognition mechanism of RISC and the repertoire of guide-target interactions we introduced G:U wobbles and mismatches at various positions of the microRNA (miRNA) ‘seed’ region and performed all-atom molecular dynamics simulations of the resulting Ago-miRNA:mRNA ternary complexes. Our simulations reveal that many modifications, including combinations of multiple G:U wobbles and mismatches in the seed region, are admissible and result in only minor structural fluctuations that do not affect overall complex stability. These results are further supported by analyses of HITS-CLIP data. Lastly, introduction of disruptive mutations revealed a bending motion of the PAZ domain along the L1/L2 ‘hinge’ and a subsequent opening of the nucleic-acid-binding channel. Our findings suggest that the spectrum of a miRNA's admissible targets is different from what is currently anticipated by the canonical seed-model. Moreover, they provide a likely explanation for the previously reported sequence-dependent regulation of unintended targeting by siRNAs.
MiRNAs are short RNAs, approximately ~22 nucleotides (nts) in length, which post-transcriptionally regulate their protein-coding targets in a sequence-dependent manner1. The typical transcribed precursor molecule (pri-miRNA) assumes a double-stranded RNA (dsRNA) conformation with a characteristic hairpin-like structure; the latter is pre-processed in the nucleus by the “microprocessor” complex into a pre-miRNA before being shuttled into the cytoplasm by XPO52. Mirtrons represent an exception to this rule: their pre-miRNAs are directly generated from introns during the splicing of nascent mRNAs3. In the cytoplasm, the pre-miRNA hairpin is cleaved by the DICER endonuclease to form dsRNA, 20–25 nts in length, with 3′ overhangs; one of the two strands, the miRNA, is incorporated into the RISC, also known as the miRNA ribonucleoprotein complex (miRNP), where the interaction with the target mRNA takes place1. This interaction typically results in degradation of the target mRNA or inhibition of its translation by the ribosome4. Originally believed to form heteroduplexes mainly with the 3′ untranslated region (3′UTR) of the target, miRNAs have since been shown to also target protein-coding regions and 5′UTRs5,6,7,8,9,10,11,12.
MiRNAs have been shown to be involved in many fundamental processes that include developmental timing13,14,15,16, the induction of organ asymmetry17, tumor suppression and oncogenic activity18,19,20, invasion and metastasis21, modulation of embryonic stem cell differentiation9, neurodegeneration22 etc. Moreover, extensive work has revealed tissue- and cellular-state-dependent miRNA profiles in cancers23,24,25,26,27,28,29,30, cardiovascular disease31,32, immunity33, Alzheimer's34,35, Tourette's syndrome36, schizophrenia37, and others. Given their involvement in such diverse contexts, and their ever-increasing numbers38, understanding the identity and cardinality of a given miRNA's targets represents an important endeavor.
Ever since the first report on the lin-4:lin-14 heteroduplex14,16, it was clear that the 5′ region of a miRNA played a central role in the recognition of its target. This region, spanning positions 2–7 from the miRNA's 5′ end, was originally referred to as the ‘core element’ but eventually became known as the ‘seed.’ Its apparent importance has been a central component of many computer-based miRNA target prediction schemes39,40,41,42,43,44,45,46 whereby the presence of the reverse complement of a miRNA's seed sequence is used as a filter before generating lists of candidate mRNA targets. Methods that do not rely on such filtering schemes have also been in use6,47. All computational attempts to tackle the problem have met with various degrees of success and for all practical purposes the problem remains open48,49.
Some of the very early work13,16 indicated that formation of functioning heteroduplexes did not require a strict reverse-complementarity relationship between the miRNA-seed sequence and its target. Since then, evidence has continued to accumulate steadily in support of such an ‘expanded’ interaction mode9,50,51,52,53,54,55,56,57,58,59,60,61,62,63, in turn suggesting a potential repertoire of targets for a given miRNA that is more diverse than the ‘seed’-reverse-complementarity constraint might suggest. We revisit this very question with the help of molecular dynamics (MD) simulations and the recently released crystal structure of ternary complexes of eubacterial Thermus thermophilus Ago (TtAgo)64,65.
TtAgo is considered an appropriate model for studying the properties of Ago complexes thanks to the high structural and functional similarities with the eukaryotic Ago families. For this study, we introduced selected modifications to the currently available Ago-DNA:mRNA co-crystal structure. The collection of heteroduplexes to simulate was informed by previously reported non-canonical examples6,9,13,14,16,52,62 and includes heteroduplexes with a) G:U wobbles in the seed in conjunction with adjacent Watson-Crick pairs; b) G:U wobbles in the seed but without adjacent Watson-Crick pairs; c) a single bulge on the target side (mRNA) at each of several different seed locations; and, d) a single bulge on the guide side (miRNA) at each of several different seed locations. To build our Ago-miRNA:mRNA ternary complexes we replaced the bases of the guide DNA by the corresponding RNA (miRNA). Each MD simulation spanned a minimum of 100 ns, generating a trajectory that is sufficiently long for the current analysis of the mRNA recognition dynamics and also for observing any potential conformational changes (discussed further below).
The Ago-complex is stable in the presence of multiple seed-region G:U wobbles
In earlier in vivo studies, the impact of G:U wobbles in the seed region (positions 2–7) was examined in D. melanogaster66 and in C. elegans52 and led to different conclusions. The fruit-fly study examined the impact of one, two and three G:U wobbles and concluded that “[…] a G:U wobble in the seed region is always detrimental […]”66. On the other hand, the worm study examined the impact of one and two wobbles in the seed region and found the mutants to still be functionally regulated by the targeting miRNA lsy-652. In addition to these two studies, luciferase assays were used to demonstrate the validity of functional heteroduplexes involving as many as five G:U wobbles in the seed6. For our studies, we created a first mutant with three G:U wobbles at positions 2, 3 and 4 of the seed region (Mutant #1 in Table 1). In a second experiment, we added a fourth G:U wobble at position 5 of the seed region (Mutant #2). Our analysis shows that in both configurations the resulting structures are stable and their Ago backbones compared to the backbone of the native Ago ternary complex exhibit small RMSD values (~2–3Å) (see Fig. 1c, Fig. 2a, and Supp. Fig. S1). In the mutants, study of position 6 or 7 of the seed region reveals that the conformations of Watson-Crick pairs remain intact between guide strand and the target strand (Supp. Fig. S2).
The Ago-complex is stable in the presence of multiple seed-region G:U wobbles and no compensating Watson-Crick pairs immediately adjacent to the seed
As can be seen from Figure 1b and Table 1, the heteroduplexes of Mutants #1 and #2 contain two Watson-Crick pairs immediately past the seed region, at positions 8 and 9. In order to determine the extent to which these two base pairs play a compensatory role that contributes to the stability of the complex we removed both and repeated the previous simulations (Mutant #1 → Mutant #3, Mutant #2 → Mutant #4). The two resulting heteroduplexes, were they stable, would rely primarily on coupling that spans the seed region and is rooted in the presence of three- (case of Mutant #3) and four- (case of Mutant #4) G:U wobbles respectively. Our simulations show that these mutants are indeed stable: the RMSD values of the Ago backbones from the wild-type remain low and reach a plateau of ~2.5Å after only ~40 ns (Fig. 1c, Fig. 2b, and Supp. Fig. S1). This indicates that the Watson-Crick pairs already present in the seed region (at positions 6 and 7) are sufficient for maintaining the overall stability of the heteroduplex – see also base pair distances in Supp. Fig. S2
The Ago-complex is stable in the presence of only partial seed-region coupling and no compensating Watson-Crick pairs immediately adjacent to it
The observed stability of the ternary complex in the presence of multiple G:U wobbles and without any compensating Watson-Crick pairs adjacent to the seed prompted us to also examine a somewhat extreme situation. In particular, we mutated the miRNA's adenosine at position 7 of the seed to a cytosine, thus “breaking” the base pairing at that location (Mutant #5) – shown in cyan in Table 1. The resulting heteroduplex, if realized, would be brought about by only five base pairs in the seed region, with four of them being G:U wobbles, and without any compensating Watson-Crick pairs beyond it. Interestingly, and somewhat surprisingly, we found that this arrangement also leads to a stable structure. In fact, the resulting RMSD is only slightly larger than the wild-type arrangement, remaining well below 3Å for the length of the simulation (green curve of Fig. 2b).
The Ago-complex is stable in the presence of a seed-region bulge on the messenger–RNA–side of the heteroduplex
For let-7, the second miRNA ever reported, it was shown that it regulates the heterochronic gene lin-41 by binding to two locations of lin-41's 3′UTR15. These two target locations, referred to as LCS1 and LCS2, were later demonstrated in vivo to be simultaneously required for lin-41's regulation62. For the purpose of this discussion, the heteroduplex formed between let-7 and LCS1 contains a bulge on the lin-41 (mRNA) side between positions 4 and 5 of the seed region15,62. Subsequently, examples of functioning heteroduplexes comprising messenger-RNA-side bulges in the seed region were reported and validated for mouse Oct4 (between seed positions 4 and 5) and mouse Sox2 (between seed positions 5 and 6), and concomitant physiological effects were shown for these heteroduplexes9. We thus sought to investigate the impact on the stability of the Ago complex that a bulge located on the target-side (mRNA) might have as a function of the bulge's actual location within the seed. Notably, in these experiments we maintained the three G:U wobbles that were previously introduced in the seed region and removed the two Watson-Crick pairs that were originally adjacent to the seed at positions 8 and 9; arguably this generates a rather demanding context for the complex stability in our simulation study. We investigated three bulge placements in the seed region: between seed positions 6 and 5 (Mutant #6), between seed positions 5 and 4 (Mutant #7), and, finally, between seed positions 4 and 3 (Mutant #8). We found that all three placements of the bulge generate stable structures, with a slight dependence on the actual position of the bulge within the seed's span. The resulting RMSD from the wild-type arrangement is small and ranges between 2 and 3Å (Fig. 2c). These findings demonstrate that more extreme and challenging scenarios than the one reported very recently63 (namely, the presence of heteroduplexes containing a bulge between positions 5 and 6 of the mRNA) are also possible.
The Ago-complex stability is affected minimally by a miRNA–side bulge in the seed region
In one of the very early publications on miRNA-driven RNA interference13 it was shown that bulged lin-4:lin-14 heteroduplexes, with the bulge being on the side of the targeting miRNA, at position 6 of the seed, were functional and sufficient for lin-14 temporal gradient formation in C. elegans. More recently, similarly bulged interactions were shown for mouse miRNA:mRNA heteroduplexes6,9. In this group of experiments we investigated the impact on the stability of the Ago-complex of a single bulge that is increasingly closer to the 5′ end of the miRNA at seed positions 6, 5 and 4 (Mutants #9, #10 and #11, respectively). Just as before, we maintained in all experiments the three G:U wobbles introduced in the seed region thus creating an extreme context for our simulation study. We also preserved a single Watson-Crick base pair immediately adjacent to the seed, at position 8. In all three cases, the resulting RMSD values were similar to what we observed prior to having introduced the miRNA-side bulge (Fig. 2d). Also, there was some distortion of individual base pairs when the bulge was placed at position 4 of the seed (Mutant #11) – see Fig. 1c. Bulge placements at seed positions 6 and 4 (Mutant #9 and Mutant #11, respectively) led to slightly larger RMSD values (~4 Å). Placement of the bulge at position 5 (Mutant #10) exhibited smaller structural deviations from the native structure for both the Ago protein and the RNA heteroduplex compared to the other two placements (Fig. 2d).
Disruptive mutations lead to a large bending motion of PAZ domain along the L1/L2 ‘hinge’ and a subsequent opening of the nucleic-acid-binding channel
We also examined whether our 100 ns simulations are long enough to capture large Ago-complex motions, as would be the case when attempting to simulate unsuitable, disruptive mutations. In order to address this question, we introduced several G→C mutations in the seed region aimed at “disrupting” the structure of the complex. Each G→C mutation broke a triple bond and led to non-bonded bases between the guide (miRNA) strand and the target (mRNA) strand. The first mutation we introduced broke the G-C bond at position 8 immediately adjacent to the seed (Mutant #12). Three more mutations gradually increased the number of non-bonded bases inside the seed region from one (Mutant #13) to two (Mutant #14) to three (Mutant #15), while maintaining the mutation at position 8 – see Methods and Supp. Table S1 for details. Not surprisingly, as the number of non-bonded bases increased, the stability of the miRNA-mRNA heteroduplex decreased (Supp. Fig. S3). The average RMSD of each nucleotide in the mRNA strand increased significantly and in proportion to the number of mismatches (Supp. Fig. S4). The comparable structural stability of the wild-type and mutants with a single non-bonded base supports earlier experimental work showing that a single nucleotide mismatch at the seed region only slightly reduces the cleavage activity of Ago complexes65. On the contrary, for Mutant #15 (which contains four G-C disruptions) a mere ~10 ns of simulation sufficed to disrupt most of the base pairing and base stacking, even for the canonical Watson-Crick base pairs (Supp. Fig. S5). The severe distortion of the backbone in the guide-target duplex caused the overall “decoupling” of the miRNA-mRNA heteroduplex, which in turn indicates that nucleation at the seed region cannot be achieved in the mutant with the four G-C-disruptions. The final snapshot of the wild type and of the four-G-C-disruption mutant are shown in Fig. 3a (see also Supplement). Mutant #15 also helped us observe significantly large motions of the PAZ domain. Strikingly, the PAZ domain bent away from both the N-domain (rotated by 55.7° and translated by 1.7Å – Fig. 3b) and the PIWI-domain (rotated by 42.5° and translated by 5.3Å – Fig. 3c), via two “hinges” close to PAZ in the L1 and L2 regions. The rotation and translation of the PAZ domain caused the nucleic-acid-binding channel to open between the PAZ and PIWI lobes, indicating that the bending motion of the PAZ domain plays a pivot role in the miRNA recognition process (see Supplement notes for more details).
The findings persist when simulating the 15-nt ternary complex
Lastly, we carried out simulations with the longer 15-nt complex67 – see Methods, Supp. Table 1, and Supp. Fig. S7–S9. In all cases, we were able to recapitulate the observations we made with the 11-nt complex. The extra 3′-compensatory pairing of the longer 15-nt complex has a minor contribution on maintaining the stability of the heteroduplex when the seed region bonds are broken (Supp. Fig. S7–S9). Our results are consistent with a recent finding that the complementary base-pairing beyond the seed region is not relevant for the repression of the cog-1 3′UTR and other C. elegans 3′UTRs by the lsy-6 miRNA68,69. It is noteworthy that, just like the case of Mutant #15 above, the same four G-C disruptions in the longer 15-nt complex (Mutant #19) result in large distortions in the seed region (Supp. Fig. S8, S9) and a disruption of the complex. This indicates that lack of nucleating base pairs in the seed region cannot be rescued even in the presence of a significant number of compensatory bonds beyond the seed.
Analyses of HITS-CLIP data corroborate the findings of the molecular dynamics simulations
HITS-CLIP (i.e. high-throughput sequencing of RNAs isolated by cross-linking immunoprecipitation) is a method that was introduced recently55 for the analysis of miRNA targets from mouse brain: following ultraviolet irradiation, Ago was immunoprecipitated under stringent conditions; as expected, the Ago protein was cross-linked with miRNAs, resulting in complexes of ~110 kDa, and further with mRNAs, resulting in larger size complexes of ~130 kDa. In the original report55, the immunoprecipitation was carried out using two distinct monoclonal antibodies and with biological replicates, and gave rise to a total of 10 datasets, five with enriched miRNAs and five with enriched mRNAs. We created genomic maps for the sequenced reads, paired up the maps of matching ~110 kDa and ~130 kDa sets and examined the data for support of the canonical miRNA targeting model and of an ‘expanded’ model that allows one or more G:U wobbles in the seed region (see Methods). Table 2 shows the results for these two models for the Brain A (Ago antibody 2A8) and Brain D (Ago antibody 7G1-1*) datasets. For each of the shown miRNA seeds, the Table lists the P-value that the matching HITS-CLIP ~130 kDa dataset could support the corresponding targeting model accidentally. As can be seen, HITS-CLIP data indeed support the ‘expanded model.’ For some of the miRNAs, e.g. miR-449a-5p, miR-222-3p, etc., the expanded model is a better fit for the HITS-CLIP data.
We have presented a series of molecular dynamics simulations on Ago ternary complexes that focused on investigating the influence of seed-located wobbles, bulges and combinations thereof on the structural stability of the Ago-miRNA:mRNA complex and the motion of its domains, and, by extension its ability to cleave its target. We found that introduction of multiple G:U wobbles in the seed region only minimally affects the miRNA-mRNA heteroduplex and does not compromise the stability of the complex. With regard to bulge insertions in the seed region, and for a variety of possible arrangements, we find that they are tolerated on both the miRNA and the mRNA sides. Seed-region bulges that occur on the miRNA side of the heteroduplex give rise to slight distortions in the nucleic acid duplex and induce somewhat larger conformational changes but do not disrupt the complex. Seed-region bulges that occur on the mRNA side appear to be better tolerated by comparison. We also find that arrangements involving simultaneously multiple G:U wobbles and a single bulge lead to stable structures as well. Moreover, we examined the impact of artificially introduced disruptive mutations to the seed region and found a novel recognition mechanism that involves an important bending motion of the PAZ domain along the L1/L2 ‘hinge’ link followed by the opening of the nucleic-acid-binding channel. Lastly, we made use of several distinct publicly available HITS-CLIP datasets and found that they corroborate the conclusions of our current molecular simulations.
Our analyses provide additional evidence in support of and are consistent with earlier work that emphasized the importance of strong base-pairing interactions spanning positions 2 through 7 of a miRNA, or a subset of those positions (e.g. in vivo examples involving lin-4 and let-7 comprising seed region bulges). However, it is important to also realize that as our molecular dynamics analyses show such strong interactions can be realized in a multitude of ways that obviate the requirement that the exact reverse complement of the miRNA's seed sequence be present in the target. In turn, this suggests that a given miRNA can give rise to non-canonical functioning heteroduplexes with targets that do not contain the miRNA seed. Taken together, these findings indicate that the spectrum of potential targets for a miRNA can admit a wide-spectrum of seed-less targets and thus substantially differs from what is anticipated by the canonical seed model. Consequently, our findings indicate that similar conclusions can be drawn about the potential spectrum of a given siRNA's targets, considering that user-designed siRNAs and miRNAs share the same pathway downstream of the DICER cleavage. In other words, it follows that those mRNAs harboring sequences that are proximal to the seed of the transfected siRNA, either because they would induce G:U wobbles or the introduction of a bulge in the seed region, could also be down-regulated by the siRNA.
Molecular dynamics simulations
Following similar protocols as in our previous studies70,71,72,73, the X-ray crystal structure of wild-type TtAgo bound to a 21-nt guide DNA and a 20-nt target RNA complex (PDB entry: 3F73, released in 2008.12) was used as the starting structure for the MD simulation65. The DNA and the RNA strands can only be partly traced from position 1 to 11, and the base coordinates at position 10 and 11 are not available in the reported crystal structure65. Therefore, the missing coordinates at position 10 and 11 were built from the known backbone structures. We repeated the simulations using the more recently released Ago complex with longer traceable guide-target duplex (length of the duplex is 15-bp, positions 2–16, PDB entry: 3HK2, released in 2009.10)67. The Ago-miRNA:mRNA complexes were generated by replacing the bases of the guide DNA by corresponding RNA (deoxyribose was replaced by ribose in A, C, and G whereas T was replaced by U). All the Ago complexes were solvated in ~110×100×90Å3 water boxes. A total of 32 Na+ ions and 29 Cl− ions were added to neutralize and mimic the biological environment (100 mM NaCl concentration). The solvated systems contain approximately 100,000 atoms. We utilized the NAMD274 package for the MD simulations with the NPT ensemble. The CHARMM (parameter set c32b1) force field was used for the protein and nucleic acid75,76, and the TIP3P water model was used as the explicit solvent77. The Particle Mesh Ewald (PME) method78 as applied to treat the long-range electrostatic interactions and a 12 Å cutoff was employed for the van der Waals interactions. All the Ago complexes systems were equilibrated via a 20,000-step energy minimization to remove bad contacts. The minimized configurations were used as the starting point for 1-ns NPT MD equilibrations with 0.5 fs time-step at 1 atm and 310 K. The equilibrated configurations were then subjected to production runs for a minimum of 100 ns. The time step for all production runs was 1.5 fs with SHAKE/RATTLE algorithm79.
Analyses of HITS-CLIP data
We used BWA80 to quality-trim and map on the mouse genome all 10 sets of reads (Brain A through E at ~110 kDa, and Brain A through E at ~130 kDa) that resulted from the deep-sequencing of the two immunoprecipitations of the biological replicates55. The mapping process allowed up to 2 mismatches and excluded all the reads that could not be mapped uniquely on the genome. More than 48 million reads in total were processed, of which almost 32 million were mapped uniquely. From each of the mRNA-enriched sets (~130 kDa) we only kept locations that had at least twenty reads mapped to them. We treated the products from the two arms of a miRNA separately and used the coordinates of each miRNA's 5p and 3p products (Release 18 of miRBase38) to identify the top-30 most abundant miRNAs in the five miRNA-enriched sets (~110 kDa). Not unexpectedly, and given that our searches were carried out using a miRBase release that was significantly more enriched than the one used in the original work55, we found some of the top-spots to be occupied by miRNAs that were added to miRBase only recently. For the rest of the analyses, the read sets were paired up: Brain A ~110 kDa Brain A ~130 kDa, Brain B ~110 kDa Brain B ~130 kDa, etc. Targets for the top miRNAs of a given ~110 kDa read set were sought in the genomic maps of the matching ~130 kDa set among locations to which 20 or more reads mapped. We focused only on those HITS-CLIP reads that mapped on the exons of known mouse protein-coding genes. We then carried out two types of searches. In one, we sought the exact reverse complement of the seed of a top-ranking miRNA (canonical model) in the mRNA maps. In the second, we examined whether the mRNA maps support an ‘expanded’ model of miRNA:mRNA interactions where one or more G:U wobbles are allowed in the seed region. To this end, we replaced every G in the seed by a pyrimidine (C or T; represented as Y) in the reverse complement of the seed, and every T in the seed by a purine (A or G; represented as R) and computed the P-value of HITS-CLIP accidentally supporting targeting by the corresponding miRNA under each of the two models (see Supplement for details).
We thank Payel Das, Jingyuan Li, Bruce Berne, and Chunli Yan for helpful discussions. This work is supported by the IBM Blue Gene Science program. The work of Y.Z., H-W.C., P.L., P.C. and I.R was supported by Thomas Jefferson University startup funds and by a William M. Keck Foundation grant to IR.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareALike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/