M3: an integrative framework for structure determination of molecular machines


We present a broadly applicable, user-friendly protocol that incorporates sparse and hybrid experimental data to calculate quasi-atomic-resolution structures of molecular machines. The protocol uses the HADDOCK framework, accounts for extensive structural rearrangements at both the domain and atomic levels, and accepts input from all structural and biochemical experiments whose data can be translated into interatomic distances and/or molecular shapes.


Figure 1: Workflow of the integrative structure determination protocol M3.
Figure 2: Application to the yeast RNA polymerase (pol) II demonstrates M3's ability to translate sparse data into a structural model.
Figure 3: Structure determination of the Box C/D RNP underpins the robustness of the M3 protocol.




This work was supported by the EMBL, the EU FP7 ITN project RNPnet (contract number 289007) and the DFG grant CA294/3-2. E.K. acknowledges support from the Alexander von Humboldt Foundation through a Humboldt Research Fellowship for Postdoctoral Researchers. We thank J. Kirkpatrick for critical reading of the manuscript and B. Simon for discussion and support with CNS. A.M.J.J.B. acknowledges funding from the European H2020 e-Infrastructure grants West-Life (grant no. 675858) and BioExcel (grant no. 675728).

Author information

Authors and Affiliations



E.K. designed the studies, developed software, performed structure calculations, analyzed and interpreted data and wrote the manuscript; J.P.G.L.M.R. developed software; A.G. analyzed experiments; A.M.J.J.B. provided software and assisted in software development; T.C. designed the studies, assisted in data interpretation, wrote the manuscript and supervised the project.

Corresponding author

Correspondence to Teresa Carlomagno.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Sparse experimental data leads to a non-normal right (positive) skewed Eexp distribution.

a. When only sparse experimental data are available, the global search generates few structures with significantly low Eexp. b. Low-Eexp structures can be distinguished from the rest of the population by transforming the Eexp values into ln(Eexp). This transformation leads to a left (negative) skewed distribution. c. Structures with significantly low Eexp can be isolated as outliers (green circles) by using a box-and-whisker plot in which the whiskers are extended by two IQRs. The green line indicates the median.
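The selection described in this caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the M3 implementation: the function name `low_energy_outliers`, the energy values and the quartile convention (the "inclusive" method of the standard library) are all hypothetical choices; the caption specifies only the ln transformation and the two-IQR whisker extension.

```python
import math
import statistics


def low_energy_outliers(e_exp, whisker=2.0):
    """Return indices of structures whose ln(Eexp) lies below Q1 - whisker * IQR.

    e_exp   -- restraint energies, one per structure (must be > 0)
    whisker -- whisker length in IQR units (2.0 follows the caption)
    """
    log_e = [math.log(e) for e in e_exp]
    # Quartile convention is an assumption; the caption does not specify one.
    q1, _median, q3 = statistics.quantiles(log_e, n=4, method="inclusive")
    lower_fence = q1 - whisker * (q3 - q1)
    return [i for i, value in enumerate(log_e) if value < lower_fence]


# Hypothetical energies: most structures fit poorly, two fit far better.
energies = [950.0, 1000.0, 1020.0, 1050.0, 1100.0,
            1150.0, 1200.0, 1300.0, 12.0, 15.0]
print(low_energy_outliers(energies))  # indices of the low-Eexp outliers
```

Working on ln(Eexp) rather than Eexp compresses the long right tail of poorly fitting structures, so that the few well-fitting ones fall clearly below the lower fence.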

Supplementary Figure 2 The completeness of the input data can be probed by box-and-whisker statistics.

The heptameric Arp2/3 protein complex was used to test the performance of the M3 protocol with respect to the number of restraints. a. Graphical representation of building block separation prior to global search; Arp2/3 monomers are named after their chain IDs (as given in 1k8k). The yellow dashes correspond to 30 inter-monomer NOE distances. b. Normalized ln(Eexp) distributions for global-search runs using 50 (blue), 30 (green) and 10 (grey) NOEs. The runs with 50/30/10 NOEs resulted in 119/58/0 outliers. The outliers of the runs with 50 and 30 NOEs have a precision of 2.1 ± 1.0 Å and 7.2 ± 3.0 Å, respectively. c. The top ten structures from the global search step using 30 NOEs (superimposed on chain A). The precision of the ensemble of the 10 lowest-energy structures is reported in the figure; the accuracy with respect to 1k8k is 2.5 ± 2.2 Å (Cα-RMSD).

Supplementary Figure 3 Use of complementary structural information leads to a converged ensemble.

The human U4 Sm proteins-RNA complex (4wzj) was used to test the performance of different types of restraints. a. Graphical representation of the positions between which distances can be measured either by NMR, i.e. methyl groups of the ILV residues (represented by spheres) and PRE label locations (pink pentagons), or by XL-MS, i.e. NZ atoms of the lysine side chains (metallic blue circles). b. XL-MS restraints during the global search step resulted in no outliers, whereas runs using mPREs generated 9 and 7 outliers for 100% and 50% assigned methyl groups, respectively. c. The 70 local search structures, obtained after the global search using mPRE data for 50% assigned methyl groups, were grouped into 7 clusters. The two best-scoring structures of cluster 2 (dark green circles) display a significantly better χ with respect to the SAXS curve. d. The precision of the final selected ensemble is reported in the figure; the accuracy with respect to 4wzj is 2.8±0.8 Å (Cα- and P-RMSD).

Supplementary Figure 4 Sparse distance restraints result in a native-like ensemble for all but one monomer.

a. Graphical representation of the separation of the building blocks of RNA polymerase II prior to the global search with 50 inter-protein (yellow dashed lines) and 5 protein-nucleic acid (salmon dashed lines) restraints. Due to the small number of restraints, the interactions between Rpb1-Rpb3, Rpb2-Rpb7, Rpb2-Rpb10, Rpb2-Rpb11, Rpb3-Rpb7 and Rpb2-Rpb6 are each described by only one distance. b. Scoring by ln(Eexp) identified three conformers to be passed to the local search step. c. The 30 local search conformers were separated into two clusters. Cluster 1 contains the ensemble of 13 structures (dark green circles) with the best fit to the EM map (mean ccor > 0.94). The precision of the ensemble of 13 structures, including Rpb11, is given in the figure (for clarity we depicted only the best 10 structures); the accuracy with respect to 1i6h is 7.7±1.2 Å (Cα/P-RMSD). The orientation of all monomers but Rpb11 is similar to 1i6h (Supplementary Figure 5).

Supplementary Figure 5 The RNA pol II structures resulting from the local search step prior to the shape-driven selection differ in the orientation of Rpb5 and Rpb11.

a. Representative structures of cluster 1 and 2. Major differences are related to the orientation of Rpb5 (light gray) and Rpb11 (black). b. In cluster 1 the relative orientation of the monomers Rpb11 and Rpb3 is predicted incorrectly. As a result, one restraint is violated between two lysine side chains (dashed yellow line). c. The restraint #41 (shown in b) is violated in all structures of cluster 1 (distance >> 16.4 Å). In this panel, e on the x-axis indicates a structure that is selected for the final ensemble. The order of the structures represented on the x-axis is random.

Supplementary Figure 6 a-b. Evaluation of global conformational sampling for RNA Pol II.

Due to the limited number of degrees of freedom and experimental restraints, the energy surface could be sampled with only 500 structures (a); extension of the sampling to 1000 structures (b) did not generate any structure with a better fit to the experimental data or a significantly different geometry. c-d. A decrease in the Eexp values after local search indicates convergence of physical and restraint forces close to the native structure. For the U4 Sm proteins-RNA complex, Eexp decreases upon refinement of the interaction interfaces, as expected when searching the space close to the native structure (c); in contrast, for RNA Pol II the Eexp values increase upon refinement of the interfaces, indicating conflicting physical and restraint forces; this is expected when searching the space far from the native structure (d). e-f. Distribution of energy values for the structures of RNA Pol II calculated during local search. Restraint (e) and physical (force-field, f) energies are plotted against the i-RMSD from the structure with the highest ccor for each structure generated during local search. The lack of correlation between ln(Eexp) and Eff is evident.

Supplementary Figure 7 Eexp analysis of global search solutions for the Box C/D RNP in its substrate-bound form.

The global search of the conformational space of the Box C/D enzyme in the substrate-bound form was driven by three restraint classes: PRE-derived distances, SANS-derived RNA shape and connectivity restraints. To ensure equal weighting of each term in the selection process, the Eexp terms, which span different value ranges, were individually normalized over [0,1] and then summed (Methods).
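The equal-weighting step described above can be illustrated with a short Python sketch. It assumes plain min-max scaling of each term over the set of structures, which is one natural reading of "normalized over [0,1]"; the term names, the values and the helper functions `min_max` and `combined_score` are hypothetical, and the authoritative procedure is the one given in the Methods.

```python
def min_max(values):
    """Scale a list of energies linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A term that is identical for all structures cannot discriminate.
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def combined_score(terms):
    """Sum the individually normalized terms for each structure.

    terms -- dict mapping a term name to its per-structure energies;
             all lists must have the same length and ordering.
    """
    normalized = [min_max(values) for values in terms.values()]
    return [sum(per_structure) for per_structure in zip(*normalized)]


# Hypothetical energies for three structures and three restraint classes.
terms = {
    "pre_distances": [120.0, 80.0, 200.0],
    "sans_shape": [0.9, 0.3, 0.6],
    "connectivity": [5.0, 5.0, 15.0],
}
scores = combined_score(terms)
best = min(range(len(scores)), key=scores.__getitem__)
print(scores, best)  # lower combined score means better overall agreement
```

Without the per-term scaling, the term with the largest numerical range (here the hypothetical PRE term) would dominate the sum regardless of how informative the other terms are.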

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–7, Supplementary Table 1 and Supplementary Note 1. (PDF 2518 kb)

Life Sciences Reporting Summary

Life Sciences Reporting Summary. (PDF 129 kb)

Supplementary Protocol

M3 manual. (PDF 338 kb)

Supplementary Software

HADDOCK-M3 software. (ZIP 2929 kb)

Supplementary Data

Restraint files, starting structures and final models. (ZIP 27585 kb)


Cite this article

Karaca, E., Rodrigues, J., Graziadei, A. et al. M3: an integrative framework for structure determination of molecular machines. Nat Methods 14, 897–902 (2017).
