Foreign DNA capture during CRISPR–Cas adaptive immunity


Bacteria and archaea generate adaptive immunity against phages and plasmids by integrating foreign DNA of specific 30–40-base-pair lengths into clustered regularly interspaced short palindromic repeat (CRISPR) loci as spacer segments1,2,3,4,5,6. The universally conserved Cas1–Cas2 integrase complex catalyses spacer acquisition using a direct nucleophilic integration mechanism similar to retroviral integrases and transposases7,8,9,10,11,12,13. How the Cas1–Cas2 complex selects foreign DNA substrates for integration remains unknown. Here we present X-ray crystal structures of the Escherichia coli Cas1–Cas2 complex bound to cognate 33-nucleotide protospacer DNA substrates. The protein complex creates a curved binding surface spanning the length of the DNA and splays the ends of the protospacer to allow each terminal nucleophilic 3′-OH to enter a channel leading into the Cas1 active sites. Phosphodiester backbone interactions between the protospacer and the proteins explain the sequence-nonspecific substrate selection observed in vivo2,3,4. Our results uncover the structural basis for foreign DNA capture and the mechanism by which Cas1–Cas2 functions as a molecular ruler to dictate the sequence architecture of CRISPR loci.


CRISPR loci are defined by repetitive elements that are separated by similarly sized spacer sequences acquired from foreign DNA during the adaptation stage of CRISPR–Cas adaptive immunity6,14. CRISPR transcripts generated from the loci assemble with Cas proteins to detect and cleave foreign nucleic acids bearing sequence complementarity to the spacer segment1,5,15,16,17,18,19. In E. coli, expression of the Cas1–Cas2 protein complex triggers acquisition of new 33-base-pair (bp) spacers at the A/T-rich leader end of the CRISPR locus7,8,9,10,20. How the Cas1–Cas2 complex selects 33-bp protospacers of variable sequences and activates the 3′-OH ends for integration remains unknown. As the Cas1–Cas2 complex is sufficient to initiate spacer acquisition and adaptation of the CRISPR–Cas immune system, we hypothesized that the protein complex alone must provide the structural basis for the unknown mechanism of spacer length determination.

To determine how protospacer variation influences the efficiency of Cas1–Cas2-mediated spacer acquisition, we used an in vitro integration assay to test versions of a 33 bp sequence with constant overall length but different 3′ single-stranded overhang lengths12. The protospacer sequence is derived from the M13 bacteriophage genome and is highly acquired into the E. coli CRISPR locus after infection8. Unexpectedly, protospacers with overhanging 3′ nucleotides are strongly preferred by the Cas1–Cas2 complex over a completely double-stranded 33 bp protospacer (Fig. 1a and Extended Data Fig. 1a, b). Single-stranded DNA and substrates with 5′ overhangs are poor substrates for integration, highlighting the ability of Cas1–Cas2 to select specific DNA substrates before integration12. The most preferred protospacer DNA for in vitro integration consists of five overhanging nucleotides on each 3′ end (Extended Data Fig. 1). To determine the molecular basis of Cas1–Cas2 protospacer capture, we assembled Cas1–Cas2 complexes with the preferred protospacer substrate and determined crystal structures of the complex in the presence and absence of Mg2+ at 3.0 Å and 3.2 Å resolutions, respectively (Extended Data Fig. 2 and Extended Data Table 1).
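The constant-span design can be made concrete. If the 33-nt footprint is held fixed and each 3′ end overhangs by n nucleotides, each strand is 33 − n nt and the duplex is 33 − 2n bp. The minimal sketch below builds such a pair from a hypothetical 33-nt parent top strand. It assumes the overhangs are created by shortening each strand at its 5′ end, which is what the 28-nt crystallization oligonucleotides in the Methods suggest. The function names and the design rule are ours, not the authors' protocol.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase ACGT DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def overhang_substrate(parent_top: str, n: int) -> tuple[str, str]:
    """Hypothetical design rule: keep the footprint of parent_top fixed and give
    both 3' ends an n-nt single-stranded overhang by shortening each strand
    at its 5' end. Returns (top, bottom), both written 5'->3'."""
    span = len(parent_top)
    if n < 0 or 2 * n >= span:
        raise ValueError("overhangs must leave a duplex region")
    top = parent_top[n:]  # top loses n nt at its 5' end
    # bottom lacks partners for the top strand's last n nt (top 3' overhang),
    # while its own 3'-terminal n nt face the removed top 5' nt (bottom 3' overhang)
    bottom = reverse_complement(parent_top[: span - n])
    return top, bottom


def expected_lengths(span: int, n: int) -> tuple[int, int]:
    """(length of each strand in nt, length of the duplex in bp)."""
    return span - n, span - 2 * n
```

For n = 5 and a 33-nt span, expected_lengths gives two 28-nt strands around a 23-bp duplex. These are the same lengths as the five-nucleotide overhang oligonucleotides listed in the Methods.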

Figure 1: Overall architecture and active site positioning of 3′-OH nucleophile.

a, A representative agarose gel of in vitro integration reactions using increasing lengths of 3′ single-strand (ss) protospacer DNA overhangs. Per cent integration values are the average of three independent experiments. kb, kilobases; nt, nucleotide; S.C., supercoiled pCRISPR; Band X, relaxed pCRISPR byproduct (ref. 12). b, The overall architecture of Cas1–Cas2 bound to protospacer DNA. The line segments indicate the length of the DNA, spanning a total of 33 nucleotides. c, Stick configurations of the two Cas1 active sites (blue subunits in b) that coordinate the nucleophilic 3′-OH ends of the protospacer (green arrow). Supplementary Information contains the full image for a.


The structures reveal a hexameric protein architecture comprising four copies of Cas1 and two copies of Cas2, in which the protospacer spans the central Cas2 dimer and terminates within individual Cas1 subunits on each end of the complex (Fig. 1b). Structural superposition of the Cas1–Cas2 complex with and without bound DNA reveals a DNA-induced change in Cas1 subunit orientation in which each Cas1 dimer rotates ~10° in opposing directions against the central Cas2 hub (Extended Data Fig. 3a, b). Cas1–Cas2 protospacer capture positions each single-stranded protospacer 3′ end within a channel leading directly to a Cas1 active site. Simulated annealing omit maps show clear electron density for the double-helical region and the five-nucleotide overhangs on each end of the protospacer (Extended Data Fig. 4a–c). The constrained protein channel guiding each DNA strand from its double-helical region to the single-strand-accommodating Cas1 active site explains the specificity of Cas1–Cas2 for five-nucleotide 3′ overhang substrates (Fig. 1a and Extended Data Fig. 1). Two of the four Cas1 subunits, coloured green in Fig. 1b, are not occupied with the protospacer 3′ ends and are probably non-catalytic, since the 3′-OH nucleophile and the scissile phosphodiester bond of the target DNA must be in the same active site for direct nucleophilic integration.
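The text does not say how the ~10° Cas1 rotation was measured. One common approach is to superpose the apo and DNA-bound complexes on the Cas2 hub and then superpose each Cas1 dimer. The angle can then be read from the resulting rotation matrix R using cos θ = (tr R − 1)/2. The sign relative to a chosen reference axis distinguishes opposing directions. The sketch below assumes R has already been obtained from a superposition program. The helper names and the right-hand sign convention are our assumptions.

```python
import math

Matrix = list[list[float]]


def rotation_angle_deg(r: Matrix) -> float:
    """Rotation angle of a proper 3x3 rotation matrix: cos(theta) = (trace(R) - 1) / 2."""
    trace = r[0][0] + r[1][1] + r[2][2]
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp floating-point noise
    return math.degrees(math.acos(cos_theta))


def rotation_axis(r: Matrix) -> tuple[float, float, float]:
    """Unit rotation axis from the antisymmetric part R - R^T (undefined at 0 and 180 degrees)."""
    x = r[2][1] - r[1][2]
    y = r[0][2] - r[2][0]
    z = r[1][0] - r[0][1]
    norm = math.sqrt(x * x + y * y + z * z)
    if norm < 1e-12:
        raise ValueError("axis is undefined for rotations of 0 or 180 degrees")
    return x / norm, y / norm, z / norm


def signed_angle_deg(r: Matrix, reference_axis: tuple[float, float, float]) -> float:
    """Angle signed by whether the rotation axis points along or against reference_axis."""
    angle = rotation_angle_deg(r)
    if angle == 0.0:
        return 0.0
    axis = rotation_axis(r)
    dot = sum(a * b for a, b in zip(axis, reference_axis))
    return angle if dot >= 0.0 else -angle


def rotation_about_z(deg: float) -> Matrix:
    """Hypothetical helper for testing: rotation matrix about the z axis."""
    t = math.radians(deg)
    return [[math.cos(t), -math.sin(t), 0.0],
            [math.sin(t), math.cos(t), 0.0],
            [0.0, 0.0, 1.0]]
```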

In the active sites, the 3′ terminal base is involved in a stacking interaction with Y217 that positions the nucleophilic 3′-OH ends of the protospacer near the conserved metal-binding residues E141, H208 and D221 (Fig. 1c). Although we cannot assign density for Mg2+ in the active sites, these three residues have been shown previously to coordinate a Mn2+ ion in the active site of Cas1 from Pseudomonas aeruginosa21. Furthermore, alanine mutations at these positions disrupt in vivo spacer acquisition7,8,10. Thus, the observed positioning of the 3′-OH nucleophiles and catalytic residues probably represents the active configuration of the nucleoprotein complex immediately before spacer integration.

All interactions between Cas1–Cas2 and protospacer DNA involve coordination of the phosphate backbone rather than base-specific contacts, consistent with the variable sequence selection of protospacers that is essential for resistance to diverse foreign sequences2,3,4. Two central regions of the Cas1–Cas2 complex, which we term the ‘arginine clamp’ and the ‘arginine channel’, stabilize the protospacer (Fig. 2a–d). The arginine clamp interacts with the middle of the duplex region, where four Arg residues coordinate each DNA strand: Cas1 R41 and Cas2 R16, R77 and R78 (Fig. 2c). Reverse charge mutations of Cas1 R41 and Cas2 R16 and R78 drastically reduce spacer acquisition in vivo, whereas the Cas2 R77E mutant functions similarly to wild-type Cas2 (Fig. 2e). Thus, Cas1 R41 and Cas2 R16 and R78 are the key constituents of the arginine clamp. The contribution of Cas2 to protospacer DNA binding supports the previous hypothesis that the main function of Cas2 is to form a non-catalytic scaffold within the Cas1–Cas2 complex10.

Figure 2: Coordination of protospacer DNA within the complex.

a, Electrostatic potential surface representation of the Cas1–Cas2 complex with the protospacer shown in yellow. b, Close up of the arginine channel that stabilizes the ssDNA overhang. c, Stick configuration representation of arginine clamp residues that coordinate the protospacer duplex region. d, Map of amino acid residues that coordinate the protospacer phosphodiester backbone (black dots). Residue colours indicate Cas1–Cas2 protomers from Fig. 1b. e, Agarose gels of in vivo spacer acquisition assays of arginine channel and clamp mutant proteins. WT, wild type. f, Plot of per cent in vitro integration of either double-stranded DNA (dsDNA; black) or 5-nucleotide (nt) overhang (blue) protospacers with wild-type Cas1, Cas1(R59D) or Cas1(R66D) complexed with Cas2. g, Fluorescence polarization binding assays of a 5-nucleotide overhang protospacer with the same mutants in f complexed with Cas2. The calculated relative binding affinities (Kd) are indicated. Error bars represent the standard deviation of three independent experiments. Data in panels e–g are results of at least three biological replicates. Supplementary Information contains the full images for e.


Cas1 residues R66, R84, R245 and R248 line the arginine channel that stabilizes the junction where the duplex region terminates and the single-stranded DNA overhang enters the active site. Reverse charge mutations of each arginine lining the arginine channel disrupt spacer acquisition in vivo (Fig. 2e). In addition, purified Cas1 R59D or R66D proteins complexed with wild-type Cas2 are highly defective in integrating 33-bp duplex or five-nucleotide overhang protospacer substrates in vitro (Fig. 2f). Fluorescence polarization assays demonstrate that the mutant complexes exhibit dramatically reduced affinity for protospacer DNA, highlighting the critical role of this part of the Cas1–Cas2 complex for protospacer capture and complex stability (Fig. 2g).

The Cas1–Cas2–DNA crystal structures uncover a protein wedge that terminates the protospacer double-stranded DNA region and allows single-stranded DNA overhangs to enter the arginine channel. A stacking interaction of the 5′ terminal base (adenine 6 in Fig. 3a, b) with Y22 of Cas1 stabilizes protospacer duplex unwinding, directing each single-stranded 3′ overhang to sharply bend ~90° away from the duplex and into the active site channel (Fig. 3b). A mutation of Y22 to alanine reduces spacer acquisition in vivo, whereas a phenylalanine mutation has near wild-type levels of acquisition, consistent with a specific role for Cas1 Y22 base-stacking in protospacer strand splaying (Fig. 3c). Sequence alignment of representative Cas1 proteins in type I CRISPR systems reveals that Y22 is not universally conserved in other bacteria, suggesting that additional or different Cas1 residues may stabilize the splayed ends in other CRISPR–Cas systems (Extended Data Fig. 5).

Figure 3: Mechanism of protospacer DNA end separation.

a, The 5-nucleotide splayed protospacer sequence used for crystallization to determine the trajectory of the displaced non-nucleophilic strand. Cas1 Y22, involved in base stacking at the fork, is shown in blue. b, Close up of the DNA fork showing the base stacking interaction of Y22 with the terminal adenine nucleotide of the non-nucleophilic strand. The nucleotides are numbered from 5′ to 3′ of each DNA strand shown in a. The grey mesh shows the 2Fo − Fc density contoured at 2.2σ of the first ejected nucleotide of the displaced strand. The arrows indicate the opposite trajectories of each strand. c, Agarose gel of in vivo acquisition assay of co-expressed wild-type (WT) Cas1 or the indicated Cas1 mutant with Cas2. Quantification is the mean of three independent experiments ± standard deviation. d, Plot of per cent integration of increasing number of splayed nucleotides at the protospacer ends using wild-type Cas1 (blue) or Cas1(Y22A) (blue) complexed with Cas2. Error bars represent the standard deviation of three independent experiments. Supplementary Information contains the full image for c.


The observed stacking interaction raises the possibility that fully duplexed protospacers are separated by Cas1 Y22, thereby displacing the 5′ end of the duplex, which we term the non-nucleophilic strand, from the nucleophilic strand carrying the 3′-OH. DNA transposases and retroviral integrases also utilize end fraying to isolate the reactive DNA strands for chemistry within enzyme active sites22,23,24. To test this potential activity of Cas1–Cas2, we introduced an increasing number of mismatches at the ends of the 33 bp protospacer to disrupt end base pairing and assayed their potential for in vitro integration (Fig. 3d and Extended Data Fig. 6a, b). Similar to the 3′ overhang substrates, the 4- and 5-nucleotide frayed ends are highly preferred, presumably due to the lower energy required for capture of these substrates compared to perfectly duplexed ends (Fig. 3d). The complex containing the Cas1 Y22A mutant regains marginal activity with substrates containing 5- or 6-nucleotide splayed ends, suggesting that Y22 steers the non-nucleophilic DNA strand away from the active site (Fig. 3d). Notably, the displaced non-nucleophilic strand is not cleaved into a shorter fragment by Cas1–Cas2, as the protospacer ends are not processed during integration (Extended Data Fig. 6c).
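The mismatch series keeps the 33-nt length constant while unpairing an increasing number of terminal base pairs at each end. The minimal sketch below generates such a series and counts the unpaired terminal positions. It assumes mismatches are made by placing the identical base opposite each terminal top-strand base, so that A faces A and so on. The actual sequences are those in Extended Data Fig. 6b and may follow a different rule. The helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase ACGT DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def splayed_substrate(top: str, n: int) -> tuple[str, str]:
    """Hypothetical rule: equal-length strands with n mismatched pairs at each end."""
    length = len(top)
    if n < 0 or 2 * n > length:
        raise ValueError("need 0 <= 2n <= length")
    bottom = list(reverse_complement(top))  # bottom[k] faces top[length - 1 - k]
    # Bottom 5'-terminal n nt face the top 3' end; bottom 3'-terminal n nt face the top 5' end.
    for k in [*range(n), *range(length - n, length)]:
        bottom[k] = top[length - 1 - k]  # identical base opposite: never a Watson-Crick pair
    return top, "".join(bottom)


def terminal_mismatches(top: str, bottom: str) -> tuple[int, int]:
    """Count unpaired positions at the top 5' end (left) and top 3' end (right)."""
    length = len(top)

    def paired(i: int) -> bool:
        return (top[i], bottom[length - 1 - i]) in WATSON_CRICK

    left = 0
    while left < length and not paired(left):
        left += 1
    right = 0
    while right < length and not paired(length - 1 - right):
        right += 1
    return left, right
```

Applied to the splayed crystallization oligonucleotides in the Methods (the 33-nt ssDNA1 read against the 33-nt ssDNA2), terminal_mismatches reports five unpaired positions at each end.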

To determine the trajectory of the displaced non-nucleophilic strand after end-splaying, we crystallized Cas1–Cas2 with a protospacer with five-nucleotide frayed ends on both sides (Fig. 3a, b). The electron density at the fork is similar to the structures described above, except that we observe the first nucleotide of the displaced non-nucleophilic strand pointing in the opposite direction from the nucleophilic single-stranded DNA strand. Clear electron density is not observed for the remaining nucleotides of the displaced strand, indicating that they are not stabilized by the complex.

An alternative crystal form grown in the presence of Mg2+ reveals secondary Cas1–DNA interactions that provide additional insight into the mechanism of Cas1–Cas2 genomic DNA target binding and subsequent integration. In addition to the two Cas1 ‘catalytic’ active sites carrying the 3′-OH ends of the protospacer, the ‘non-catalytic’ Cas1 active sites interact with the protospacer DNA from a symmetry mate, revealing a possible mode of target DNA coordination during integration (Fig. 4a and Extended Data Fig. 7a). The non-catalytic Cas1 engages the DNA minor groove through contacts with α-helix 7, causing a slight kink in the DNA compared with the crystal form lacking Mg2+ (Extended Data Fig. 7b). A close-up of the active site shows continuous density connecting Mg2+ with E141, H208, D221 and a backbone phosphate of the presumed target DNA, capturing a snapshot of scissile phosphodiester bond coordination before integration (Fig. 4a).

Figure 4: Model of protospacer DNA integration.

a, View of crystal packing from a symmetry mate complex (grey) showing coordination of the symmetry DNA along a Cas1 active site. The inset is a magnified view of the coordination of the phosphodiester backbone with metal-binding residues E141, H208 and D221. The mesh represents an Fo − Fc density map for a Mg2+ ion, contoured at 2.2σ. b, c, Model of protospacer DNA integration into target DNA (black) and positioning of the scissile phosphate (green arrow) and the 3′-OH nucleophile in the Cas1 active site.


Because integration must occur in the active site that coordinates the 3′-OH of the protospacer DNA, we modelled the protein–DNA interactions from the non-catalytic Cas1 active sites into the catalytic Cas1 active sites. This reveals the positioning of the nucleophilic 3′-OH of the protospacer ends for attacking the scissile phosphodiester bond in the modelled DNA (Fig. 4b, c). Further work will be needed to shed light on how the complex specifically recognizes the leader-repeat region of the CRISPR locus for integration, as recently observed in vitro11,12,13.

Together, these data explain key aspects of Cas1–Cas2 integrase-mediated acquisition of new DNA into bacterial genomes. First, we show that the substrates for integration are double-stranded DNA. Importantly, however, optimal substrates include a central 23 bp helical region flanked by five single-stranded nucleotides on each 3′ end. If substrates for CRISPR integration come from single-stranded DNA products of RecBCD, as recently suggested, they must somehow anneal or otherwise become double stranded before Cas1–Cas2 capture20. It remains unclear how the Cas1–Cas2 complex recognizes the AAG protospacer adjacent motif during protospacer selection, since the terminal nucleotides containing the 3′-OH nucleophiles are coordinated similarly in the Cas1 active sites (Fig. 1). Second, the Cas1–Cas2 integrase architecture specifies the precise length of integrated DNA, ensuring uniformity of spacer lengths within CRISPR loci. Finally, the structure-based model of DNA target sequence positioning suggests that in addition to catalysing the integration reaction, Cas1 plays a role in binding the target CRISPR locus. Target binding could possibly disrupt the structural symmetry observed in the crystal structure to coordinate the sequence-specific integration reactions at the leader-end of the CRISPR locus. Insights into target site recognition may offer strategies for altering or enhancing integration site specificity, with implications for use of the Cas1–Cas2 integrase as a genome-modifying technology.


No statistical methods were used to predetermine sample size. The experiments were not randomized and the investigators were not blinded to allocation during experiments and outcome assessment.

Cas1, Cas2 and DNA preparation

The Cas1 and Cas2 proteins from E. coli K12 (MG1655) were cloned and separately purified as previously described10. Single-stranded DNA (ssDNA) oligonucleotides purchased from Integrated DNA Technologies were annealed in 20 mM HEPES-NaOH, pH 7.5, 25 mM KCl, 10 mM MgCl2 by heating at 95 °C for 3 min and slow cooling to room temperature. The pCRISPR DNA target for in vitro integration was constructed as previously described12. The DNA substrates used for crystallization were gel-purified before complex formation. The sequences for the five-nucleotide overhang substrates used for crystallization are: ssDNA1, 5′-ATTTACTACTCGTTCTGGTGTTTCTCGT-3′; and ssDNA2, 5′-AAACACCAGAACGAGTAGTAAATTGGGC-3′. The sequences for the five-nucleotide splayed substrates are: ssDNA1, 5′-TAAACATTTACTACTCGTTCTGGTGTTTCTCGT-3′; and ssDNA2, 5′-CATCTAAACACCAGAACGAGTAGTAAATTGGGC-3′.
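As a consistency check on the sequences above, the sketch below finds the register with the longest run of Watson–Crick pairs and classifies the unpaired ends as 3′ overhangs, 5′ overhangs or splayed (unpaired on both strands). It is a simplification: it ignores G·T wobble pairs and assumes a single contiguous duplex. The function names are hypothetical.

```python
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def longest_duplex(top: str, bottom: str) -> tuple[int, int, int]:
    """Both strands 5'->3'. The bottom strand is reversed so bottom_rev[j] faces
    top[j + offset]. Returns (offset, start on top strand, duplex length)."""
    bottom_rev = bottom[::-1]
    best = (0, 0, 0)
    for offset in range(-len(bottom) + 1, len(top)):
        run = 0
        for j, base in enumerate(bottom_rev):
            i = j + offset
            if 0 <= i < len(top) and (top[i], base) in WATSON_CRICK:
                run += 1
                if run > best[2]:
                    best = (offset, i - run + 1, run)
            else:
                run = 0
    return best


def describe_ends(top: str, bottom: str) -> dict[str, int]:
    offset, start, length = longest_duplex(top, bottom)
    end = start + length  # exclusive end of the duplex on the top strand
    top_left, bottom_left = start, start - offset  # top 5' side / bottom 3' side
    top_right, bottom_right = len(top) - end, len(bottom) - (end - offset)  # top 3' / bottom 5'
    span = max(len(top), len(bottom) + offset) - min(0, offset)
    return {
        "duplex_bp": length,
        "span_nt": span,
        "left_splayed": min(top_left, bottom_left),
        "left_bottom_3prime_overhang": max(bottom_left - top_left, 0),
        "left_top_5prime_overhang": max(top_left - bottom_left, 0),
        "right_splayed": min(top_right, bottom_right),
        "right_top_3prime_overhang": max(top_right - bottom_right, 0),
        "right_bottom_5prime_overhang": max(bottom_right - top_right, 0),
    }
```

For the five-nucleotide overhang pair, describe_ends reports a 23-bp duplex with a 5-nt 3′ overhang on each end (33-nt span). For the splayed pair, it reports the same 23-bp duplex with five unpaired nucleotides on both strands at each end.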

In vivo acquisition and in vitro integration assays

The in vivo acquisition assays were performed as previously described7. The in vitro integration reactions were conducted as previously described with slight modifications12. After pre-incubation of equimolar Cas1 and Cas2 at 4 °C, 100 nM of the resulting Cas1–Cas2 complex was incubated with 100 nM protospacer DNA for an additional 10–15 min at room temperature. The integration reaction was initiated by the addition of 300 ng (~5 nM) pCRISPR, incubated at 37 °C for 1 h and quenched with DNA loading buffer supplemented with EDTA at a final concentration of 20 mM. The reaction products were analysed on 1.5% agarose gels. Per cent integration activity values were determined by dividing the band intensity of the relaxed pCRISPR product by the summed intensity of all bands detected, as quantified with Image Lab Software (Bio-Rad). We note that the integration activity could reflect a mixture of half-site and full-site integration products, as described previously12.
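The quantification above reduces to a ratio of band intensities per lane, averaged over independent experiments. The minimal sketch below shows that calculation. Band intensities are assumed to have been exported from densitometry software. The function names are hypothetical, and we assume a sample (n − 1) standard deviation because the text does not specify which form was used.

```python
import statistics


def percent_integration(product_intensity: float, band_intensities: list[float]) -> float:
    """Relaxed-pCRISPR product intensity as a percentage of all detected bands in a lane.

    band_intensities must include the product band itself.
    """
    total = sum(band_intensities)
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return 100.0 * product_intensity / total


def summarize_replicates(values: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation (n - 1 denominator) across experiments."""
    if len(values) < 2:
        raise ValueError("need at least two replicates for a standard deviation")
    return statistics.mean(values), statistics.stdev(values)
```

For a lane whose detected bands have intensities 30, 50 and 20 (arbitrary units), with the first being the relaxed product, percent_integration(30, [30, 50, 20]) returns 30.0.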

Complex formation, crystallization and structure determination

Purified Cas1 and Cas2 were incubated with protospacer DNA at equimolar concentrations (50 μM) in buffer A (500 mM KCl, 20 mM HEPES-NaOH, pH 7.5, 1 mM DTT, 10 mM EDTA), followed by overnight dialysis at 4 °C against buffer B (100 mM KCl, 20 mM HEPES-NaOH, pH 7.5, 1 mM DTT, 5 mM EDTA). The dialysed sample was applied on a Superdex 75 10/300 column (GE Healthcare) in buffer B. Peak fractions were pooled and concentrated to ~3 mg ml−1 for crystallization. Optimized crystals were grown by hanging-drop vapour diffusion at room temperature in two different conditions, as described in the text. The Mg2+-containing crystals grew with a gem-like morphology in 50 mM MES, pH 6.1, 10% isopropanol and 20 mM MgCl2. The Mg2+-free crystals grew as rods in 100 mM sodium citrate tribasic pH 5.6, 200 mM sodium acetate and 8% PEG 8000 (w/v). The crystals were briefly transferred into a drop containing either 25% ethylene glycol (with Mg2+ crystals) or 30% glycerol (without Mg2+ crystals) for cryoprotection and frozen in liquid nitrogen. The Cas1–Cas2 complex with a splayed DNA substrate crystallized in the same conditions as the Mg2+-free crystals.

X-ray diffraction data were collected under cryogenic conditions at beamline 8.3.1 at the Lawrence Berkeley National Laboratory Advanced Light Source. Initial phases were obtained by sequential molecular replacement using individual protein components of the Cas1–Cas2 apo structure (Protein Data Bank (PDB) accession number 4P6I) as search models. Following initial placement of two Cas1 dimers and a dimer of Cas2, phases were improved by performing one round of rigid body refinement in PHENIX25. The resulting maps showed clear unbiased density for protospacer DNA, and subsequent model building was performed through iterative rounds of building in Coot26 and refinement in PHENIX with NCS restraints on the protein subunits. The asymmetric unit of the three structures contains one copy of the Cas1–Cas2 complex bound to protospacer DNA. Statistics for the final crystal structures are reported in Extended Data Table 1. The final structures are missing clear density for the loop connecting α6 and α7 of Cas1. We assume this loop to be highly disordered as it is also not observed in the apo E. coli Cas1 crystal structure (PDB 3NKD) and the apo Cas1–Cas2 complex (PDB 4P6I)10,27.

Fluorescence polarization

Fluorescence polarization assays were performed in 20 mM HEPES-NaOH, pH 7.5, 25 mM KCl, 5 mM EDTA, 1 μg ml−1 BSA and 1 mM DTT. Cas1–Cas2 were complexed and purified over gel filtration for all binding assays. The 3′-fluorescein labelled DNA substrate was added to the protein solution at a final concentration of 5 nM and the DNA–protein mixture was allowed to incubate for 30 min at 22 °C. Measurements were made by excitation at 485 nm and monitoring emission at 535 nm. Data were fit to a binding isotherm to obtain Kd. Each experiment was conducted in triplicate and error bars represent the standard deviation.

Sequence alignment

The cas1 sequences were obtained from the National Center for Biotechnology Information (NCBI) Gene Data Bank. A representative cas1 from each CRISPR type I subtype was chosen based on previous subtype assignments, and the alignment was generated using MAFFT28,29. The organisms chosen for the alignment are: Escherichia coli K-12, Cronobacter dublinensis str. 582, Erwinia amylovora, Yersinia pestis biovar Antiqua str. B42003004, Yersinia kristensenii, Hafnia alvei, Sulfolobus solfataricus, Thermotoga maritima, Pseudothermotoga lettingae, Deferribacter desulfuricans, Desulfovibrio vulgaris, Bacillus halodurans, Bacillus cereus, Synechocystis sp. PCC 6803, Cyanothece sp. PCC 8802 and Limnoraphis robusta.
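Checking whether a residue such as E. coli Y22 is conserved requires mapping its residue number to an alignment column, skipping gaps. The minimal sketch below does this for an alignment already parsed into a dictionary of gapped sequences (FASTA parsing of the MAFFT output is not shown). It assumes residue numbering starts at the first residue of the full-length sequence in the alignment. The toy alignment in the test is invented for illustration and is not Cas1 data. The function names are hypothetical.

```python
def column_of_residue(aligned_seq: str, residue_number: int, gap: str = "-") -> int:
    """0-based alignment column holding the given 1-based residue of aligned_seq."""
    seen = 0
    for column, symbol in enumerate(aligned_seq):
        if symbol != gap:
            seen += 1
            if seen == residue_number:
                return column
    raise ValueError(f"sequence has fewer than {residue_number} residues")


def residues_in_column(alignment: dict[str, str], column: int) -> dict[str, str]:
    """Symbol (residue or gap) found in each aligned sequence at one column."""
    return {name: seq[column] for name, seq in alignment.items()}
```

With the real alignment, calling column_of_residue on the aligned E. coli Cas1 sequence with residue_number=22 would locate the column boxed in Extended Data Fig. 5.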

Accession codes

Data deposits

Atomic coordinates and structure factors for the reported crystal structures have been deposited at the Protein Data Bank under accession codes 5DS4 (no Mg2+), 5DS5 (with Mg2+) and 5DS6 (splayed DNA).


1. Barrangou, R. et al. CRISPR provides acquired resistance against viruses in prokaryotes. Science 315, 1709–1712 (2007)
2. Mojica, F. J., Diez-Villasenor, C., Garcia-Martinez, J. & Soria, E. Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J. Mol. Evol. 60, 174–182 (2005)
3. Bolotin, A., Quinquis, B., Sorokin, A. & Ehrlich, S. D. Clustered regularly interspaced short palindrome repeats (CRISPRs) have spacers of extrachromosomal origin. Microbiology 151, 2551–2561 (2005)
4. Pourcel, C., Salvignol, G. & Vergnaud, G. CRISPR elements in Yersinia pestis acquire new repeats by preferential uptake of bacteriophage DNA, and provide additional tools for evolutionary studies. Microbiology 151, 653–663 (2005)
5. Garneau, J. E. et al. The CRISPR/Cas bacterial immune system cleaves bacteriophage and plasmid DNA. Nature 468, 67–71 (2010)
6. van der Oost, J., Westra, E. R., Jackson, R. N. & Wiedenheft, B. Unravelling the structural and mechanistic basis of CRISPR–Cas systems. Nature Rev. Microbiol. 12, 479–492 (2014)
7. Yosef, I., Goren, M. G. & Qimron, U. Proteins and DNA elements essential for the CRISPR adaptation process in Escherichia coli. Nucleic Acids Res. 40, 5569–5576 (2012)
8. Datsenko, K. A. et al. Molecular memory of prior infections activates the CRISPR/Cas adaptive bacterial immunity system. Nature Commun. 3, 945 (2012)
9. Swarts, D. C., Mosterd, C., van Passel, M. W. & Brouns, S. J. CRISPR interference directs strand specific spacer acquisition. PLoS ONE 7, e35888 (2012)
10. Nuñez, J. K. et al. Cas1-Cas2 complex formation mediates spacer acquisition during CRISPR–Cas adaptive immunity. Nature Struct. Mol. Biol. 21, 528–534 (2014)
11. Arslan, Z., Hermanns, V., Wurm, R., Wagner, R. & Pul, U. Detection and characterization of spacer integration intermediates in type I-E CRISPR–Cas system. Nucleic Acids Res. 42, 7884–7893 (2014)
12. Nuñez, J. K., Lee, A. S., Engelman, A. & Doudna, J. A. Integrase-mediated spacer acquisition during CRISPR–Cas adaptive immunity. Nature 519, 193–198 (2015)
13. Rollie, C., Schneider, S., Brinkmann, A. S., Bolt, E. L. & White, M. F. Intrinsic sequence specificity of the Cas1 integrase directs new spacer acquisition. eLife 4, 10.7554/eLife.08716 (2015)
14. Heler, R., Marraffini, L. A. & Bikard, D. Adapting to new threats: the generation of memory by CRISPR–Cas immune systems. Mol. Microbiol. 93, 1–9 (2014)
15. Brouns, S. J. et al. Small CRISPR RNAs guide antiviral defense in prokaryotes. Science 321, 960–964 (2008)
16. Carte, J., Wang, R., Li, H., Terns, R. M. & Terns, M. P. Cas6 is an endoribonuclease that generates guide RNAs for invader defense in prokaryotes. Genes Dev. 22, 3489–3496 (2008)
17. Haurwitz, R. E., Jinek, M., Wiedenheft, B., Zhou, K. & Doudna, J. A. Sequence- and structure-specific RNA processing by a CRISPR endonuclease. Science 329, 1355–1358 (2010)
18. Deltcheva, E. et al. CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III. Nature 471, 602–607 (2011)
19. Jinek, M. et al. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science 337, 816–821 (2012)
20. Levy, A. et al. CRISPR adaptation biases explain preference for acquisition of foreign DNA. Nature 520, 505–510 (2015)
21. Wiedenheft, B. et al. Structural basis for DNase activity of a conserved protein implicated in CRISPR-mediated genome defense. Structure 17, 904–912 (2009)
22. Savilahti, H., Rice, P. A. & Mizuuchi, K. The phage Mu transpososome core: DNA requirements for assembly and function. EMBO J. 14, 4893–4903 (1995)
23. Scottoline, B. P., Chow, S., Ellison, V. & Brown, P. O. Disruption of the terminal base pairs of retroviral DNA during integration. Genes Dev. 11, 371–382 (1997)
24. Katz, R. A., Merkel, G., Andrake, M. D., Roder, H. & Skalka, A. M. Retroviral integrases promote fraying of viral DNA ends. J. Biol. Chem. 286, 25710–25718 (2011)
25. Adams, P. D. et al. PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr. D 66, 213–221 (2010)
26. Emsley, P. & Cowtan, K. Coot: model-building tools for molecular graphics. Acta Crystallogr. D 60, 2126–2132 (2004)
27. Babu, M. et al. A dual function of the CRISPR–Cas system in bacterial antivirus immunity and DNA repair. Mol. Microbiol. 79, 484–502 (2011)
28. Makarova, K. S. et al. Evolution and classification of the CRISPR–Cas systems. Nature Rev. Microbiol. 9, 467–477 (2011)
29. Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013)



We thank G. Meigs and the 8.3.1 beamline staff at the Advanced Light Source for assistance with data collection, J. Chen for input on experimental design and members of the Doudna laboratory for comments and discussions. The 8.3.1 beamline is supported by UC Office of the President, Multicampus Research Programs and Initiatives grant MR-15-328599 and Program for Breakthrough Biomedical Research, which is partially funded by the Sandler Foundation. This project was funded by US National Science Foundation grant No. 1244557 to J.A.D. and by NIH grant AI070042 to A.N.E. J.K.N. and L.B.H. are supported by US National Science Foundation Graduate Research Fellowships and J.K.N. by a UC Berkeley Chancellor’s Graduate Fellowship. P.J.K. is supported as a Howard Hughes Medical Institute Fellow of the Life Sciences Research Foundation. J.A.D. is an Investigator of the Howard Hughes Medical Institute and a member of the Center for RNA Systems Biology.

Author information




J.K.N. and L.B.H. conducted the crystallography, biochemistry and in vivo spacer acquisition assays. J.K.N., L.B.H. and P.J.K. collected the X-ray diffraction data and determined the crystal structures. J.K.N., L.B.H., P.J.K., A.N.E. and J.A.D. designed the study, analysed all data and wrote the manuscript.

Corresponding author

Correspondence to Jennifer A. Doudna.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Extended data figures and tables

Extended Data Figure 1 Effect of overhang length on integration efficiency.

a, A plot of the per cent integration of protospacers ± standard deviation with varying 3′ single-stranded DNA extensions. A representative gel is shown in Fig. 1a. b, Protospacer sequences used for the assays described in a and Fig. 1a, with the red nucleotides indicating the 3′ overhang regions.

Extended Data Figure 2 Assembly of Cas1–Cas2 complex bound to protospacer DNA.

a, Gel filtration chromatogram of pre-assembled Cas1–Cas2 complex with protospacer DNA containing five-nucleotide 3′ overhangs. The dotted lines indicate the peak fractions of the Cas1–Cas2 complex bound to DNA (first peak) and excess, unbound DNA (second peak); for comparison, the elution profile of the Cas1–Cas2 complex without DNA is shown in d. b, c, The fractions from peak 1 (~12 ml) and peak 2 (~15 ml) were analysed by Coomassie-stained SDS–PAGE (b) and 12% urea-PAGE (c) to confirm the presence of Cas1, Cas2 and protospacer DNA. d, Gel-filtration chromatogram of assembled Cas1–Cas2 without protospacer DNA. e, Coomassie-stained SDS–PAGE of the peak fractions from d. Supplementary Information contains the full images for b, c and e.

Extended Data Figure 3 Conformational dynamics upon protospacer DNA binding.

a, An overlay of the DNA-bound Cas1–Cas2 structure with the apo Cas1–Cas2 (grey, PDB 4P6I). b, Vector lines depicting the conformational changes the Cas1–Cas2 complex undergoes upon protospacer DNA binding compared to the apo complex (PDB 4P6I). The Cas1 subunits rotate towards the direction of the arrows.

Extended Data Figure 4 Omit maps of the protospacer DNA.

a, Simulated annealing Fo − Fc omit electron density map of the entire protospacer DNA using the ‘no Mg2+’ map and model. b, c, Simulated annealing Fo − Fc omit electron density maps of the terminal five nucleotides in the active sites of the structures with Mg2+ (b) or without Mg2+ (c) in the crystallization condition. The maps are contoured at 2.0σ.

Extended Data Figure 5 Sequence alignment of Cas1 proteins in type I CRISPR systems.

Sequence alignments of Cas1 from representative organisms with type I CRISPR systems. The E. coli sequence is displayed at the top. The dots indicate the residues described in this study, with the red dots indicating the metal-binding residues. The box highlights the non-universal conservation of the E. coli Y22 residue in the β1 region of type I CRISPR systems. The secondary structure representations shown are for the E. coli Cas1.

Extended Data Figure 6 Integration of protospacer substrates with splayed ends.

a, Representative agarose gel of in vitro integration reactions using increasing lengths of splayed ends. The average per cent integration of three independent experiments is plotted in Fig. 3d. b, Sequences of protospacers used in the integration assays in a. c, A 12% denaturing polyacrylamide gel of protospacers after incubation with Cas1–Cas2 for 1 h at 37 °C in integration assay buffer conditions. The indicated DNA substrates are radiolabelled at the 5′ end. Supplementary Information contains the full images for a and c. nt, nucleotide.

Extended Data Figure 7 Crystallographic packing of the complex bound to Mg2+.

a, View of the symmetry mates (grey) contacting the non-catalytic Cas1 subunits (green). Catalytic Cas1 subunits are shown in blue, Cas2 in yellow and DNA is shown in salmon and red. b, Superposition of our two crystal structures, with or without Mg2+, shows a slight DNA kink in the structure bound to Mg2+ (dotted box). This region contacts α-helix 7 of a symmetry mate, as described in the text.

Extended Data Table 1 Summary of X-ray crystallography data collection and refinement

Supplementary information

This file contains uncropped gel images with size marker indications.


Cite this article

Nuñez, J., Harrington, L., Kranzusch, P. et al. Foreign DNA capture during CRISPR–Cas adaptive immunity. Nature 527, 535–538 (2015).
