Structure reveals why genome folding is necessary for site-specific integration of foreign DNA into CRISPR arrays

Abstract

Bacteria and archaea acquire resistance to viruses and plasmids by integrating fragments of foreign DNA into the first repeat of a CRISPR array. However, the mechanism of site-specific integration remains poorly understood. Here, we determine a 560-kDa integration complex structure that explains how Pseudomonas aeruginosa Cas (Cas1–Cas2/3) and non-Cas proteins (for example, integration host factor) fold 150 base pairs of host DNA into a U-shaped bend and a loop that protrude from Cas1–2/3 at right angles. The U-shaped bend traps foreign DNA on one face of the Cas1–2/3 integrase, while the loop places the first CRISPR repeat in the Cas1 active site. Both Cas3 proteins rotate 100 degrees to expose DNA-binding sites on either side of the Cas2 homodimer, which each bind an inverted repeat motif in the leader. Leader sequence motifs direct Cas1–2/3-mediated integration to diverse repeat sequences that have a 5′-GT. Collectively, this work reveals new DNA-binding surfaces on Cas2 that are critical for DNA folding and site-specific delivery of foreign DNA.

Fig. 1: Cryo-EM structure of the type I-F CRISPR integration complex.
Fig. 2: Foreign DNA constrains the Cas2/3 linker against conserved Cas1 residues.
Fig. 3: The Cas2 homodimer simultaneously coordinates four dsDNA helices critical to CRISPR integration.
Fig. 4: Sequence motifs in the leader and IHF proteins facilitate Cas1–2/3-based integration into diverse repeat sequences.
Fig. 5: I-F CRISPR integration complex suggests a mechanism for primed acquisition and Cascade impact on integration.
Fig. 6: DNA is a flexible scaffold that controls DNA mobilization.

Data availability

The data that support the findings of this study are available from the corresponding author upon request. Curated raw multi-frame movies have been deposited along with a reference gain file under accession number EMPIAR-11659. Cryo-EM maps were deposited in the Electron Microscopy Data Bank under accession number EMD-29280. The atomic model of the type I-F integration complex was deposited in the PDB under accession number 8FLJ. Plasmids generated in this study are available from Addgene. Source data are provided with this paper.

Code availability

Code is available at https://github.com/WiedenheftLab/.

References

  1. Koonin, E. V. & Krupovic, M. Evolution of adaptive immunity from transposable elements combined with innate immune systems. Nat. Rev. Genet. 16, 184–192 (2015).

  2. McClintock, B. The origin and behavior of mutable loci in maize. Proc. Natl Acad. Sci. USA 36, 344–355 (1950).

  3. Nuñez, J. K., Bai, L., Harrington, L. B., Hinder, T. L. & Doudna, J. A. CRISPR immunological memory requires a host factor for specificity. Mol. Cell 62, 824–833 (2016).

  4. Fagerlund, R. D. et al. Spacer capture and integration by a type I-F Cas1–Cas2-3 CRISPR adaptation complex. Proc. Natl Acad. Sci. USA 114, 201618421 (2017).

  5. Wright, A. V. et al. Structures of the CRISPR genome integration complex. Science 357, 1113–1118 (2017).

  6. Obergfell, K. P. & Seifert, H. S. Mobile DNA in the pathogenic Neisseria. Microbiol. Spectrum 3, https://doi.org/10.1128/microbiolspec.mdna3-0015-2014 (2014).

  7. Laxmikanthan, G. et al. Structure of a Holliday junction complex reveals mechanisms governing a highly regulated DNA transaction. eLife 5, e14313 (2016).

  8. Lee, H. & Sashital, D. G. Creating memories: molecular mechanisms of CRISPR adaptation. Trends Biochem. Sci. 47, 464–476 (2022).

  9. Wang, J. et al. Structural and mechanistic basis of PAM-dependent spacer acquisition in CRISPR–Cas systems. Cell 163, 840–853 (2015).

  10. Nuñez, J. K., Harrington, L. B., Kranzusch, P. J., Engelman, A. N. & Doudna, J. A. Foreign DNA capture during CRISPR–Cas adaptive immunity. Nature 527, 535–538 (2015).

  11. Xiao, Y., Ng, S., Nam, K. H. & Ke, A. How type II CRISPR–Cas establish immunity through Cas1–Cas2-mediated spacer integration. Nature 550, 137–141 (2017).

  12. Jackson, S. A. et al. CRISPR–Cas: adapting to change. Science 356, eaal5056 (2017).

  13. Mojica, F. J. M., Díez-Villaseñor, C., García-Martínez, J. & Almendros, C. Short motif sequences determine the targets of the prokaryotic CRISPR defence system. Microbiology 155, 733–740 (2009).

  14. Kim, S. et al. Selective loading and processing of prespacers for precise CRISPR adaptation. Nature 579, 141–145 (2020).

  15. Hu, C. et al. Mechanism for Cas4-assisted directional spacer acquisition in CRISPR–Cas. Nature 598, 515–520 (2021).

  16. Ramachandran, A., Summerville, L., Learn, B. A., DeBell, L. & Bailey, S. Processing and integration of functionally oriented prespacers in the Escherichia coli CRISPR system depends on bacterial host exonucleases. J. Biol. Chem. 295, 3403–3414 (2020).

  17. Liao, C. et al. Spacer prioritization in CRISPR–Cas9 immunity is enabled by the leader RNA. Nat. Microbiol. 7, 530–541 (2022).

  18. McGinn, J. & Marraffini, L. A. CRISPR–Cas systems optimize their immune response by specifying the site of spacer integration. Mol. Cell 64, 616–623 (2016).

  19. Wang, R., Li, M., Gong, L., Hu, S. & Xiang, H. DNA motifs determining the accuracy of repeat duplication during CRISPR adaptation in Haloarcula hispanica. Nucleic Acids Res. 44, 4266–4277 (2016).

  20. Goren, M. G. et al. Repeat size determination by two molecular rulers in the type I-E CRISPR array. Cell Rep. 16, 2811–2818 (2016).

  21. Linheiro, R. S. & Bergman, C. M. Testing the palindromic target site model for DNA transposon insertion using the Drosophila melanogaster P-element. Nucleic Acids Res. 36, 6199–6208 (2008).

  22. Santiago-Frangos, A., Buyukyoruk, M., Wiegand, T., Krishna, P. & Wiedenheft, B. Distribution and phasing of sequence motifs that facilitate CRISPR adaptation. Curr. Biol. 31, 3515–3524 (2021).

  23. Kieper, S. N., Almendros, C. & Brouns, S. J. J. Conserved motifs in the CRISPR leader sequence control spacer acquisition levels in type I-D CRISPR–Cas systems. FEMS Microbiol. Lett. 366, 2016–2020 (2019).

  24. Rollie, C., Graham, S., Rouillon, C. & White, M. F. Prespacer processing and specific integration in a type I-A CRISPR system. Nucleic Acids Res. 46, 1007–1020 (2018).

  25. Yosef, I., Goren, M. G. & Qimron, U. Proteins and DNA elements essential for the CRISPR adaptation process in Escherichia coli. Nucleic Acids Res. 40, 5569–5576 (2012).

  26. Wei, Y., Chesne, M. T., Terns, R. M. & Terns, M. P. Sequences spanning the leader-repeat junction mediate CRISPR adaptation to phage in Streptococcus thermophilus. Nucleic Acids Res. 43, 1749–1758 (2015).

  27. Wright, A. V. & Doudna, J. A. Protecting genome integrity during CRISPR immune adaptation. Nat. Struct. Mol. Biol. 23, 876–883 (2016).

  28. Westra, E. R. et al. Parasite exposure drives selective evolution of constitutive versus inducible defense. Curr. Biol. 25, 1043–1049 (2015).

  29. Makarova, K. S. et al. Evolutionary classification of CRISPR–Cas systems: a burst of class 2 and derived variants. Nat. Rev. Microbiol. 18, 67–83 (2020).

  30. Richter, C. et al. Priming in the type I-F CRISPR–Cas system triggers strand-independent spacer acquisition, bi-directionally from the primed protospacer. Nucleic Acids Res. 42, 8516–8526 (2014).

  31. Datsenko, K. A. et al. Molecular memory of prior infections activates the CRISPR/Cas adaptive bacterial immunity system. Nat. Commun. 3, 945 (2012).

  32. Xiao, Y. et al. Structure basis for RNA-guided DNA degradation by Cascade and Cas3. Science 361, eaat0839 (2018).

  33. Nicholson, T. J. et al. Bioinformatic evidence of widespread priming in type I and II CRISPR–Cas systems. RNA Biol. 16, 566–576 (2019).

  34. Dillard, K. E. et al. Assembly and translocation of a CRISPR-Cas primed acquisition complex. Cell 175, 934–946.e15 (2018).

  35. Li, M., Wang, R., Zhao, D. & Xiang, H. Adaptation of the Haloarcula hispanica CRISPR–Cas system to a purified virus strictly requires a priming process. Nucleic Acids Res. 42, 2483–2492 (2014).

  36. Semenova, E. et al. Highly efficient primed spacer acquisition from targets destroyed by the Escherichia coli type I-E CRISPR–Cas interfering complex. Proc. Natl Acad. Sci. USA 113, 7626–7631 (2016).

  37. Fineran, P. C. et al. Degenerate target sites mediate rapid primed CRISPR adaptation. Proc. Natl Acad. Sci. USA 111, E1629–E1638 (2014).

  38. Rice, P. A., Yang, S., Mizuuchi, K. & Nash, H. A. Crystal structure of an IHF–DNA complex: a protein-induced DNA U-turn. Cell 87, 1295–1306 (1996).

  39. Rohs, R. et al. Origins of specificity in protein–DNA recognition. Annu. Rev. Biochem. 79, 233–269 (2010).

  40. Zayed, H. The DNA-bending protein HMGB1 is a cellular cofactor of Sleeping Beauty transposition. Nucleic Acids Res. 31, 2313–2322 (2003).

  41. Little, A. J., Corbett, E., Ortega, F. & Schatz, D. G. Cooperative recruitment of HMGB1 during V(D)J recombination through interactions with RAG1 and DNA. Nucleic Acids Res. 41, 3289–3301 (2013).

  42. Nash, H. A. & Robertson, C. A. Purification and properties of the Escherichia coli protein factor required for lambda integrative recombination. J. Biol. Chem. 256, 9246–9253 (1981).

  43. Lavoie, B. D. & Chaconas, G. Site-specific HU binding in the Mu transpososome: conversion of a sequence-independent DNA-binding protein into a chemical nuclease. Genes Dev. 7, 2510–2519 (1993).

  44. Chalmers, R., Guhathakurta, A., Benjamin, H. & Kleckner, N. IHF modulation of Tn10 transposition: sensory transduction of supercoiling status via a proposed protein/DNA molecular spring. Cell 93, 897–908 (1998).

  45. Haniford, D. B. Transpososome dynamics and regulation in Tn10 transposition. Crit. Rev. Biochem. Mol. Biol. 41, 407–424 (2006).

  46. Whitfield, C. R., Wardle, S. J. & Haniford, D. B. The global bacterial regulator H-NS promotes transpososome formation and transposition in the Tn5 system. Nucleic Acids Res. 37, 309–321 (2009).

  47. Liu, D., Haniford, D. B. & Chalmers, R. M. H-NS mediates the dissociation of a refractory protein–DNA complex during Tn10/IS10 transposition. Nucleic Acids Res. 39, 6660–6668 (2011).

  48. van Gent, D. C., Hiom, K., Paull, T. T. & Gellert, M. Stimulation of V(D)J cleavage by high mobility group proteins. EMBO J. 16, 2665–2670 (1997).

  49. Rowland, S.-J., Stark, W. M. & Boocock, M. R. Sin recombinase from Staphylococcus aureus: synaptic complex architecture and transposon targeting. Mol. Microbiol. 44, 607–619 (2002).

  50. Alonso, J. C., Weise, F. & Rojo, F. The Bacillus subtilis histone-like protein Hbsu is required for DNA resolution and DNA inversion mediated by the β recombinase of plasmid pSM19035. J. Biol. Chem. 270, 2938–2945 (1995).

  51. Petit, M.-A., Ehrlich, D. & Jannière, L. pAMβ1 resolvase has an atypical recombination site and requires a histone-like protein HU. Mol. Microbiol. 18, 271–282 (1995).

  52. Rojo, F. & Alonso, J. C. The β recombinase of plasmid pSM19035 binds to two adjacent sites, making different contacts at each of them. Nucleic Acids Res. 23, 3181–3188 (1995).

  53. Walker, M. W. G., Klompe, S. E., Zhang, D. J. & Sternberg, S. H. Transposon mutagenesis libraries reveal novel molecular requirements during CRISPR RNA-guided DNA integration. Preprint at bioRxiv https://doi.org/10.1101/2023.01.19.524723 (2023).

  54. Rollins, M. F. et al. Cas1 and the Csy complex are opposing regulators of Cas2/3 nuclease activity. Proc. Natl Acad. Sci. USA 114, 201616395 (2017).

  55. Wang, X. et al. Structural basis of Cas3 inhibition by the bacteriophage protein AcrF3. Nat. Struct. Mol. Biol. 23, 868–870 (2016).

  56. Wiedenheft, B. et al. Structural basis for DNase activity of a conserved protein implicated in CRISPR-mediated genome defense. Structure 17, 904–912 (2009).

  57. Kunin, V., Sorek, R. & Hugenholtz, P. Evolutionary conservation of sequence and secondary structures in CRISPR repeats. Genome Biol. 8, R61 (2007).

  58. Nethery, M. A. et al. CRISPRclassify: repeat-based classification of CRISPR loci. CRISPR J. 4, 558–574 (2021).

  59. Dhingra, Y., Suresh, S. K., Juneja, P. & Sashital, D. G. PAM binding ensures orientational integration during Cas4-Cas1-Cas2-mediated CRISPR adaptation. Mol. Cell 82, 4353–4367.e6 (2022).

  60. Ali Azam, T., Iwata, A., Nishimura, A., Ueda, S. & Ishihama, A. Growth phase-dependent variation in protein composition of the Escherichia coli nucleoid. J. Bacteriol. 181, 6361–6370 (1999).

  61. Montaño, S. P., Pigli, Y. Z. & Rice, P. A. The Mu transpososome structure sheds light on DDE recombinase evolution. Nature 491, 413–417 (2012).

  62. Maertens, G. N., Hare, S. & Cherepanov, P. The mechanism of retroviral integration from X-ray structures of its key intermediates. Nature 468, 326–329 (2010).

  63. Rollie, C., Schneider, S., Brinkmann, A. S., Bolt, E. L. & White, M. F. Intrinsic sequence specificity of the Cas1 integrase directs new spacer acquisition. eLife 4, e08716 (2015).

  64. Béguin, P., Chekli, Y., Sezonov, G., Forterre, P. & Krupovic, M. Sequence motifs recognized by the casposon integrase of Aciduliprofundum boonei. Nucleic Acids Res. 47, 6386–6395 (2019).

  65. Makarova, K. S., Wolf, Y. I. & Koonin, E. V. Classification and nomenclature of CRISPR-Cas systems: where from here? CRISPR J. 1, 325–336 (2018).

  66. Deveau, H. et al. Phage response to CRISPR-encoded resistance in Streptococcus thermophilus. J. Bacteriol. 190, 1390–1400 (2008).

  67. Künne, T. et al. Cas3-derived target DNA degradation fragments fuel primed CRISPR adaptation. Mol. Cell 63, 852–864 (2016).

  68. Musharova, O. et al. Prespacers formed during primed adaptation associate with the Cas1–Cas2 adaptation complex and the Cas3 interference nuclease–helicase. Proc. Natl Acad. Sci. USA 118, e2021291118 (2021).

  69. Wiegand, T. et al. Reproducible antigen recognition by the type I-F CRISPR–Cas System. CRISPR J. 3, 378–387 (2020).

  70. Vorontsova, D. et al. Foreign DNA acquisition by the I-F CRISPR–Cas system requires all components of the interference machinery. Nucleic Acids Res. 43, 10848–10860 (2015).

  71. Cavazzana-Calvo, M. et al. Gene therapy of human severe combined immunodeficiency (SCID)-X1 disease. Science 288, 669–672 (2000).

  72. Strecker, J. et al. RNA-guided DNA insertion with CRISPR-associated transposases. Science 365, 48–53 (2019).

  73. Klompe, S. E., Vo, P. L. H., Halpin-Healy, T. S. & Sternberg, S. H. Transposon-encoded CRISPR–Cas systems direct RNA-guided DNA integration. Nature 571, 219–225 (2019).

  74. Shipman, S. L., Nivala, J., Macklis, J. D. & Church, G. M. Molecular recordings by directed CRISPR spacer acquisition. Science 353, aaf1175 (2016).

  75. Schmidt, F., Cherepkova, M. Y. & Platt, R. J. Transcriptional recording by CRISPR spacer acquisition from RNA. Nature 562, 380–385 (2018).

  76. Rollins, M. F. et al. Structure reveals a mechanism of CRISPR-RNA-guided nuclease recruitment and anti-CRISPR viral mimicry. Mol. Cell 74, 132–142.e5 (2019).

  77. Sinkunas, T. et al. Cas3 is a single-stranded DNA nuclease and ATP-dependent helicase in the CRISPR/Cas immune system. EMBO J. 30, 1335–1342 (2011).

  78. Huo, Y. et al. Structures of CRISPR Cas3 offer mechanistic insights into Cascade-activated DNA unwinding and degradation. Nat. Struct. Mol. Biol. 21, 771–777 (2014).

  79. Herzik, M. A., Wu, M. & Lander, G. C. High-resolution structure determination of sub-100 kDa complexes using conventional cryo-EM. Nat. Commun. 10, 1032 (2019).

  80. Mastronarde, D. N. Automated electron microscope tomography using robust prediction of specimen movements. J. Struct. Biol. 152, 36–51 (2005).

  81. Punjani, A., Rubinstein, J. L., Fleet, D. J. & Brubaker, M. A. CryoSPARC: algorithms for rapid unsupervised cryo-EM structure determination. Nat. Methods 14, 290–296 (2017).

  82. Suloway, C. et al. Automated molecular microscopy: the new Leginon system. J. Struct. Biol. 151, 41–60 (2005).

  83. Punjani, A., Zhang, H. & Fleet, D. J. Non-uniform refinement: adaptive regularization improves single-particle cryo-EM reconstruction. Nat. Methods 17, 1214–1221 (2020).

  84. Scheres, S. H. W. & Chen, S. Prevention of overfitting in cryo-EM structure determination. Nat. Methods 9, 853–854 (2012).

  85. Tan, Y. Z. et al. Addressing preferred specimen orientation in single-particle cryo-EM through tilting. Nat. Methods 14, 793–796 (2017).

  86. Liebschner, D. et al. Macromolecular structure determination using X-rays, neutrons and electrons: recent developments in Phenix. Acta Crystallogr. D Struct. Biol. 75, 861–877 (2019).

  87. Mirdita, M. et al. ColabFold: making protein folding accessible to all. Nat. Methods 19, 679–682 (2022).

  88. Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. Features and development of Coot. Acta Crystallogr. D Biol. Crystallogr. 66, 486–501 (2010).

  89. Nicholls, R. A. Conformation-independent Comparison of Protein Structures. PhD thesis, Univ. of York (2011).

  90. Williams, C. J. et al. MolProbity: more and better reference data for improved all-atom structure validation. Protein Sci. 27, 293–315 (2018).

  91. Goddard, T. D. et al. UCSF ChimeraX: meeting modern challenges in visualization and analysis. Protein Sci. 27, 14–25 (2018).

  92. Pettersen, E. F. et al. UCSF ChimeraX: structure visualization for researchers, educators, and developers. Protein Sci. 30, 70–82 (2021).

  93. Sagendorf, J. M., Markarian, N., Berman, H. M. & Rohs, R. DNAproDB: an expanded database and web-based tool for structural analysis of DNA–protein complexes. Nucleic Acids Res. 48, D277–D287 (2020).

  94. Biswas, A., Staals, R. H. J., Morales, S. E., Fineran, P. C. & Brown, C. M. CRISPRDetect: a flexible algorithm to define CRISPR arrays. BMC Genomics 17, 356 (2016).

  95. Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinform. 11, 119 (2010).

  96. Abby, S. S., Néron, B., Ménager, H., Touchon, M. & Rocha, E. P. C. MacSyFinder: a program to mine genomes for molecular systems with an application to CRISPR-Cas systems. PLoS ONE 9, e110726 (2014).

  97. Couvin, D. et al. CRISPRCasFinder, an update of CRISRFinder, includes a portable version, enhanced performance and integrates search for Cas proteins. Nucleic Acids Res. 46, W246–W251 (2018).

  98. Li, W. & Godzik, A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659 (2006).

  99. Grant, C. E., Bailey, T. L. & Noble, W. S. FIMO: scanning for occurrences of a given motif. Bioinformatics 27, 1017–1018 (2011).

  100. Gouveia-Oliveira, R., Sackett, P. W. & Pedersen, A. G. MaxAlign: maximizing usable data in an alignment. BMC Bioinform. 8, 312 (2007).

  101. Finn, R. D., Clements, J. & Eddy, S. R. HMMER web server: interactive sequence similarity searching. Nucleic Acids Res. 39, W29–W37 (2011).

  102. Suzek, B. E. et al. UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics 31, 926–932 (2015).

  103. Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).

  104. Schneider, T. D. & Stephens, R. M. Sequence logos: a new way to display consensus sequences. Nucleic Acids Res. 18, 6097–6100 (1990).

  105. Crooks, G. E., Hon, G., Chandonia, J.-M. & Brenner, S. E. WebLogo: a sequence logo generator. Genome Res. 14, 1188–1190 (2004).

Acknowledgements

Thanks to members of the B.W. laboratory for feedback and discussions. We thank M. Matyszewski and J. Jeliazkov for helpful discussions. Thanks to C. Hophan-Nichols for computational support. A.S.-F. is a postdoctoral fellow of the Life Science Research Foundation that is supported by the Simons Foundation. A.S.-F. is supported by the Postdoctoral Enrichment Program Award from the Burroughs Wellcome Fund. This work was supported by National Institutes of Health, United States grant 1K99GM147842 (A.S.-F.). L.T. and A.B.G. are supported by Montana State University’s Undergraduate Scholars Program, and by the NIH NIGMS IDeA program (P20GM103474). This work was performed using the cryo-EM facility at Montana State University (NSF 1828765 and the M.J. Murdock Charitable Trust). Microscopy was also performed at the National Center for CryoEM Access and Training (NCCAT) and the Simons Electron Microscopy Center located at the New York Structural Biology Center, supported by the NIH Common Fund Transformative High Resolution Cryo-Electron Microscopy program (U24 GM129539), and by grants from the Simons Foundation (SF349247) and NY State Assembly. Research in the Wiedenheft laboratory is supported by the NIH (R35GM134867), the M.J. Murdock Charitable Trust, a young investigator award from Amgen, the Montana State University Agricultural Experimental Station (USDA NIFA) and a sponsored research agreement from VIRIS Detection Systems. Molecular graphics and analyses were performed with UCSF ChimeraX, developed by the Resource for Biocomputing, Visualization, and Informatics at the University of California, San Francisco, with support from National Institutes of Health R01-GM129325 and the Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases. Funders had no role in the conceptualization, designing, data collection, analysis, decision to publish or preparation of the manuscript.

Author information

Contributions

A.S.-F. carried out conceptualization, data curation, formal analysis, funding acquisition, investigation, methodology, project administration, resources, supervision, validation, visualization and writing (original draft). W.S.H., T.W. and M.B. performed data curation, investigation, methodology, visualization and writing (review and editing). A.B.G. and R.A.W. undertook investigation and methodology. L.T. carried out visualization. C.C.G. contributed to software, resources and writing (review and editing). K.N. and E.T.E. performed investigation and resources. G.C.L. undertook methodology, supervision, visualization and writing (review and editing). B.W. was responsible for funding acquisition, project administration, resources, supervision, visualization and writing (review and editing).

Corresponding author

Correspondence to Blake Wiedenheft.

Ethics declarations

Competing interests

B.W. is the founder of SurGene and VIRIS Detection Systems. B.W. and A.S.-F. are inventors on patent applications related to CRISPR–Cas systems and applications thereof. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Structural & Molecular Biology thanks Elizabeth Kellogg for her contribution to the peer review of this work. Dimitris Typas was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Cryo-EM sample preparation, imaging and processing for type I-F integration complex.

a, Sequence-level schematic of DNA used to assemble the integration complex. The length of each motif is listed. The latter two thirds of the CRISPR repeat and second spacer (grey dashed box) could not be resolved in the cryo-EM reconstruction. See also Supplementary Table 1. b, Size-exclusion chromatography (SEC) profile (Superdex 75 16/600, Cytiva) of IHF heterodimer purified as described in the Methods, and SDS-PAGE gel (inset). c, SEC profile (Superdex 200 10/300, Cytiva) of Cas1–2/3 heterohexamer purified as described in the Methods, and SDS-PAGE gel (inset). d, The I-F integration complex was assembled from purified DNAs, IHF and Cas1–2/3 as described in the Methods, and the assembled complex was further purified by SEC (Superdex 200 10/300, Cytiva). Individual fractions were collected along the elution profile, then concentrated and stored separately for further analysis and imaging. e, Individual SEC fractions were analyzed by SDS-PAGE to determine which fractions contained all the proteins necessary for a complete complex. f, Individual SEC fractions were phenol-chloroform extracted, and the aqueous layer was analyzed by Urea-PAGE to determine which fractions contained all four DNA strands necessary for a complete complex. The fraction chosen for cryo-EM analysis is indicated with a dotted purple box. g, Image processing pipeline applied to a small subset of the 10,740 total micrographs of the type I-F integration complex, to generate an initial model for template picking. Scale bar represents 100 nm. h, Final image processing pipeline for the type I-F integration complex. i, Viewing direction distribution plot depicting particle orientations present in the final reconstruction. More populated views are shown in red, and less populated views are shown in blue. j, 3D Fourier Shell Correlation (3DFSC) of the final I-F integration complex reconstruction. The global resolution at the FSC = 0.143 threshold, 3.48 Å, is indicated by a dashed line. k, Local resolution estimation of the cryo-EM reconstruction calculated by cryoSPARC (ref. 81). The purification of proteins, assembly of the integration complex, and analysis of these samples by SDS-PAGE or Urea-PAGE were performed once. Micrographs were collected on two separate occasions with similar results.
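
As a rough illustration of how a global resolution is read off an FSC curve at the 0.143 threshold (panel j), the sketch below finds the first crossing of a one-dimensional FSC curve and interpolates linearly between the two bracketing shells. It is not the 3DFSC or cryoSPARC procedure; the function name and the toy curve are assumptions made for illustration only.

```python
def resolution_at_threshold(spatial_freqs, fsc_values, threshold=0.143):
    """Return the resolution (in Å) at which an FSC curve first drops below `threshold`.

    spatial_freqs: shell frequencies in 1/Å, in increasing order.
    fsc_values:    FSC value for each shell.
    Returns None if the curve never crosses the threshold.
    """
    shells = list(zip(spatial_freqs, fsc_values))
    for (f0, c0), (f1, c1) in zip(shells, shells[1:]):
        if c0 >= threshold > c1:
            # Linear interpolation between the two shells that bracket the crossing.
            fraction = (c0 - threshold) / (c0 - c1)
            crossing = f0 + fraction * (f1 - f0)
            if crossing <= 0:
                return None
            return 1.0 / crossing
    return None


# Hypothetical toy curve, not data from this study.
toy_freqs = [0.1, 0.2, 0.3, 0.4]
toy_fsc = [1.0, 0.9, 0.243, 0.043]
toy_resolution = resolution_at_threshold(toy_freqs, toy_fsc)
```

With real half-map data the curve would be sampled far more densely, so the interpolation step matters less than in this four-point example.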

Extended Data Fig. 2 Cas1-2/3 undergoes a large structural rearrangement during integration.

a, The Cas3 domains of the Cas1-2/3 complex have undergone a ~100° rotation in the structure of the integration complex compared with Cas1-2/3 alone (ref. 54). The positions of the first (residue 90) and last (residue 110) residues of the Cas2/3 linker are shown (Cas2/3a, red; Cas2/3b, tomato). The linker residues were not resolved in the previously determined pseudo-atomic model of Cas1-2/3 alone (ref. 54) and, for clarity, are not shown for either complex. The outward rotation of the Cas3 domains unveils new DNA-binding sites on two opposing faces of the Cas2 dimer. b, Zoom-in on the atomic fit of the Cas2/3 linker to the cryo-EM map. The Cas2/3 linker is disordered in the absence of Cas1 (PDB: 5B7I; ref. 55), but becomes ordered in the I-F integration complex structure because the foreign DNA packs it against the Cas1 beta hairpins. c, A sequence logo depicting the conservation of Cas2/3 linker residues (top). The Cas1 residues that contact the Cas2/3 linker are conserved in Cas1 proteins associated with type I-F CRISPR loci (middle), but are not conserved in the closely related Cas1 proteins associated with type I-E CRISPR loci that have similar IHF motif-containing leaders (ref. 22). Residues are numbered according to P. aeruginosa PA14 Cas1 and Cas2/3 proteins.
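
A domain rotation such as the ~100° Cas3 movement in panel a is usually quoted as the angle of the rigid-body rotation relating two superposed structures. The sketch below shows only the final step, recovering the angle from a 3×3 rotation matrix via θ = arccos((tr R − 1)/2). It is a generic illustration, not the authors' measurement; the helper names are hypothetical, and the test matrix is synthetic rather than derived from any deposited model.

```python
import math


def rotation_angle_degrees(rotation):
    """Angle (degrees) of a proper 3x3 rotation matrix, from its trace."""
    trace = rotation[0][0] + rotation[1][1] + rotation[2][2]
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_theta))


def rotation_about_z(angle_degrees):
    """Synthetic rotation about the z axis, used here only as a test input."""
    theta = math.radians(angle_degrees)
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


synthetic_angle = rotation_angle_degrees(rotation_about_z(100.0))
```

In practice the matrix would come from a least-squares superposition of the domain in the two states; that step is omitted here.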

Extended Data Fig. 3 Conservation analysis of Cas1-2/3 residues involved in DNA binding and integration.

a, Conservation of Cas1 and Cas2/3 residues involved in binding the foreign DNA, or catalyzing the strand transfer reaction, or catalyzing the degradation of nucleic acids. See Fig. 2. b, Conservation of basic and polar Cas1 and Cas2/3 residues involved in accommodating the DNA duplexes bound by the Cas1-2/3 complex during integration. See Fig. 3. Residues are numbered according to P. aeruginosa Cas1 and Cas2/3 proteins.

Extended Data Fig. 4 Cas1-2/3 predominantly recognizes IR motifs, CRISPR repeat and foreign DNA through non-sequence-specific interactions.

a, Splayed 3′ ends of the foreign DNA are directed into the Cas1 transesterification active site. The product of the first strand-transfer reaction is shown in the Cas1a* active site (top), and the 3′ OH of the other end of the foreign DNA is positioned in the Cas1b* active site (bottom). The cryo-EM map is shown in transparent grey. b, Zoom-in on the Cas1-2/3 contacts to the CRISPR repeat (ChimeraX contacts command with default parameters). Most protein contacts occur to the DNA backbone and minor groove. Cas1 residue E184 appears to probe nucleotide G1 of the repeat. c, Zoom-in on the Cas1-2/3 contacts to the IR leader motifs. Most protein contacts occur to the DNA backbone and minor groove. d, DNAproDB analysis of Cas1-2/3 interactions with the 3′ ends of foreign DNA and the CRISPR leader-repeat junction. For clarity, only protein interactions to the nucleobases are shown93. e, DNAproDB analysis of Cas1-2/3 interactions with the IR leader motifs93. For clarity, only protein interactions to the nucleobases are shown. f, Zoom-in on the atomic fit of the base-pairs around the leader-repeat junction (-3 and +3 bps, coordinated by Cas1a*) to the cryo-EM map. Tension in the DNA loop at the leader-repeat junction has been released in the post-integration structure by a physical separation of base-pairs, as measured by an increase in base step rise. This tension may further pull the leaving 3′ OH out of the Cas1 transesterification active site, to inhibit disintegration of the foreign DNA from the repeat. The approximate local base step rise was calculated using the http://web.x3dna.org/ webserver. g, A bioinformatic analysis of the first repeat from 24,940 CRISPR loci reveals that a 5′ GT dinucleotide is strongly conserved across most CRISPR subtypes. Similarly, a 5′ GT is present at the spacer-end of the repeat (seen as AC-3′ on the sense strand) within certain CRISPR subtypes (I-D, II-C, III-C, III-D, V-B, V-E, V-K, VI-A), but it is not broadly conserved.
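The dinucleotide tally described in panel g can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names and the toy repeat sequences are hypothetical, and it only shows the logic that a 5′ GT on the spacer side appears as AC-3′ on the sense strand, that is, as a 5′ GT on the reverse complement.

```python
# Minimal sketch (not the authors' pipeline): tally how often a repeat
# carries a 5' GT at its leader side and at its spacer side.
# All sequences below are hypothetical toy data.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gt_fractions(repeats):
    """Return (leader_side, spacer_side) fractions of repeats with a 5' GT.

    Leader side: the sense strand starts with GT.
    Spacer side: the reverse complement starts with GT, which is the same
    as the sense strand ending in AC-3'.
    """
    seqs = [r.upper() for r in repeats if len(r) >= 2]
    if not seqs:
        raise ValueError("no usable repeat sequences")
    leader = sum(s.startswith("GT") for s in seqs)
    spacer = sum(reverse_complement(s).startswith("GT") for s in seqs)
    return leader / len(seqs), spacer / len(seqs)


# Hypothetical toy repeats, written 5' to 3' on the sense strand.
toy_repeats = ["GTTCACTGCC", "GTCGCACTAC", "ATTTCAATCC", "GTTTTAGAAC"]
leader_frac, spacer_frac = gt_fractions(toy_repeats)
print(f"leader-side 5' GT: {leader_frac:.2f}, spacer-side 5' GT: {spacer_frac:.2f}")
```

A per-subtype breakdown such as the one in the panel would apply the same tally to repeats grouped by CRISPR subtype.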

Extended Data Fig. 5 Purification of structure-guided mutants of Cas1-2/3.

a, SEC profile of a new preparation of wildtype Cas1-2/3 and all variants purified in the same manner on a Superdex 200 16/600 (Cytiva). An excess of free Strep-tagged Cas1 elutes at approximately 82 mL. b, SDS-PAGE gel of the Cas1-2/3 hetero-hexamer peak for all purified Cas1-2/3 variants. The SDS-PAGE gel of all Cas1-2/3 samples was run twice with similar results.

Source data

Extended Data Fig. 6 Validation of Cas1-2/3 interactions with the foreign DNA and IR motifs.

a, Time-course integration reactions to test the role of Cas1H25 in splaying the foreign DNA ends. Integration reactions were performed with trimmed foreign DNA (lacking a PAM) in triplicate, resolved on denaturing polyacrylamide gels. Timepoints were taken at 0, 1, 2, 4 and 8 minutes. Reactions were stopped by the addition of phenol. A 32P-labelled DNA that is shorter (140-160 bp) than the full-length CRISPR is present in some DNA preparations (also see Extended Data Fig. 6c). Full-length CRISPR DNA and the leader- and spacer-side integration products do not overlap with this band. Further, Cas1-2/3, foreign DNA and IHF are in excess over the 32P-labelled DNA. The 140-160 bp band does not interfere with the quantification or generation of integration products. b, Quantification of time-course experiments to determine the role of Cas1 residue H25 in integration. The mean and standard deviation of three replicate experiments are shown. The Cas1H25A mutant integrates splayed and fully complementary foreign DNA fragments less efficiently than WT Cas1-2/3, suggesting that H25 steers the non-nucleophilic DNA strand away from the Cas1 active site. These results mirror the previously published effect of a type I-E Cas1-2 tyrosine wedge mutation10. c, Time-course integration reactions to test the role of Cas2 residues in recognition of the IR motifs in the leader, performed as in panel a. d, Quantification of time-course experiments to determine the role of Cas2 residues K11, R12, R55 and N56 in integration. The mean and standard deviation of three replicate experiments are shown. The Cas2R55E,N56D/3 mutant retains a small amount of integration activity. The Cas2K11D,R12E/3 and Cas2K11D,R12E,R55E,N56D/3 mutants do not integrate DNA into the I-F CRISPR. Quantification of leader- (grey circles) or spacer-side (white circles) integration events from all three replicate gels. Individual dots for each triplicate reaction are shown, and some dots overlap.
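As an illustration of the summary statistic reported in panels b and d, the sketch below computes the mean and standard deviation of three replicates at each timepoint. The percentages are invented toy values, the function name is hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the legend does not state which estimator was used.

```python
# Minimal sketch: mean and standard deviation per timepoint for triplicate
# integration reactions. All values are hypothetical toy data, and the
# sample standard deviation (n - 1) is an assumption.
from statistics import mean, stdev


def summarize_timecourse(replicates):
    """Map each timepoint (minutes) to (mean, sample SD) of its replicates."""
    return {
        minutes: (mean(values), stdev(values))
        for minutes, values in sorted(replicates.items())
    }


# Percent of substrate converted to integration product (toy numbers),
# at the timepoints named in the legend: 0, 1, 2, 4 and 8 minutes.
toy_timecourse = {
    0: [0.0, 0.0, 0.0],
    1: [10.0, 20.0, 30.0],
    2: [25.0, 30.0, 35.0],
    4: [40.0, 45.0, 50.0],
    8: [55.0, 60.0, 65.0],
}

for minutes, (avg, sd) in summarize_timecourse(toy_timecourse).items():
    print(f"{minutes} min: mean = {avg:.1f}%, SD = {sd:.1f}%")
```

With only three replicates the standard deviation is itself a rough estimate, which is one reason the individual data points are worth showing alongside the mean, as panel d does.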

Source data

Extended Data Fig. 7 PAM blocks Cas-mediated integration of foreign DNA into CRISPR repeat.

a, Schematic summarizing the step-wise strand-transfer reactions catalyzed by a Cas integrase. In the absence of putative host nucleases that trim PAMs, the foreign DNA fragments that contain a PAM stall the reaction at leader-side integration. However, the foreign DNA fragments that have been trimmed (no PAM) proceed through leader- and spacer-side integration. b-d, Endpoint integration reactions performed with a PAM-containing foreign DNA in triplicate, resolved on denaturing polyacrylamide gels. The X1 and X2 lanes signify lanes that were not further analyzed for this manuscript. e-g, Endpoint integration reactions performed with a trimmed foreign DNA in triplicate, resolved on denaturing polyacrylamide gels. The X1 and X2 lanes signify integration substrates that were not analyzed for this manuscript. h, i, Quantification of leader- (grey circles) or spacer-side (white circles) integration events from all three replicate gels. Individual dots for each triplicate reaction are shown, and some dots overlap. Three independent gels were run for PAM-containing or trimmed foreign DNA integration reactions with similar results.

Source data

Extended Data Fig. 8 Control reactions for integration assay and generation of 32P-labelled ladder.

a, Scheme of nine CRISPR fragments used for in vitro integration assays. Each CRISPR locus contains two repeats and two spacers. Leader motifs are color-coded and annotated (IR, inverted repeat; DR, direct repeat; IHF, IHF binding site; LAS, Leader anchoring site). To simplify the pictograms shown here and in subsequent panels, a single-colored rectangle was used to represent a given collection of leader motifs, and a single diamond was used to represent a CRISPR locus composed of two repeats and two spacers. b, The four CRISPR repeats tested in the integration assays have diverse palindromes (yellow), and a wide range of GC-content. c, Control reactions in which all components necessary for integration, except Cas1-2/3, were incubated. The overexposed gels show that the majority of the 32P signal for a given integration substrate DNA corresponds to the full-length strands. d, A custom 32P-labelled DNA ladder was made by mixing the degradation products generated by individual restriction enzyme digests of different 32P-labelled integration substrate DNAs. X1 and X2 signify integration substrates that were not analyzed for this manuscript. e, Schematics of the nine CRISPRs tested in Extended Data Fig. 5. The arrows identify locations of off-target integration reactions. Most off-target integration reactions occur by spurious integration at DNA motifs (for example, second CRISPR repeat, IHFdistal site, or upstream motifs) found near the ends of the CRISPR DNA target. Previous deep-sequencing of similar integration reactions has shown that the second repeat is a common off-target integration site, and that IHF blocks integration at the IHF binding site22. Urea-PAGE gels were run once.

Source data

Extended Data Fig. 9 Validation of Cas1-2/3 interactions with the repeat.

a, Time-course integration reactions to compare the rate of integration into I-E and I-F repeats downstream of an I-F leader. b, Quantification of time-course experiments to determine the impact on integration of the 19 mutations associated with swapping the I-F repeat for the I-E repeat. Leader-side integration is indistinguishable, but spacer-side integration is slower into the I-E repeat. The mean and standard deviation of three replicate experiments are shown. c, Time-course integration reactions to measure the impact of I-F repeat mutations on integration rate. d, Time-course integration reactions to measure the impact of I-F repeat mutations on integration rate, in the context of a Cas1E184A mutation. The Cas1E184A mutation is expected to disrupt 5′ G recognition, but also impacts stability of the Cas1-2/3 complex (Extended Data Fig. 5). e, Quantification of time-course experiments to determine the impact of I-F repeat mutations on integration rate, in the context of WT Cas1-2/3. The mean and standard deviation of three replicate experiments are shown. f, Quantification of time-course experiments to determine the impact of I-F repeat mutations on integration rate, in the context of Cas1E184A-2/3. In panels e and f, no integration occurs into either the ‘G1A,T2A,A28C’ or ‘G1A,T2A’ repeat mutations, and the datapoints for these plots overlap at roughly Y = 0 over the time course. The mean and standard deviation of three replicate experiments are shown. Each urea-PAGE gel was run once.

Source data

Supplementary information

Supplementary Information

Supplementary Table 1.

Reporting Summary

Peer Review File

Supplementary Video 1

Cas1–2/3 undergoes a large conformational change to unveil Cas2-leader binding sites.

Supplementary Video 2

Overview of how the I-F CRISPR leader and IHF guide Cas1–2/3-mediated integration of foreign DNA at the first CRISPR repeat.

Supplementary Video 3

The strained IHF-mediated DNA bends sway in relation to the Cas1–2/3 integrase.

Source data

Source Data Fig. 4

Unprocessed gel images.

Source Data Extended Data Fig. 1

Unprocessed gel images.

Source Data Extended Data Fig. 5

Unprocessed gel images.

Source Data Extended Data Fig. 6

Unprocessed gel images.

Source Data Extended Data Fig. 7

Unprocessed gel images.

Source Data Extended Data Fig. 8

Unprocessed gel images.

Source Data Extended Data Fig. 9

Unprocessed gel images.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Santiago-Frangos, A., Henriques, W.S., Wiegand, T. et al. Structure reveals why genome folding is necessary for site-specific integration of foreign DNA into CRISPR arrays. Nat Struct Mol Biol 30, 1675–1685 (2023). https://doi.org/10.1038/s41594-023-01097-2

  • DOI: https://doi.org/10.1038/s41594-023-01097-2
