Clustered regularly interspaced short palindromic repeats (CRISPR)-associated Cas9 is an effector protein that targets invading DNA and plays a major role in the prokaryotic adaptive immune system. Although Streptococcus pyogenes CRISPR–Cas9 has been widely studied and repurposed for applications including genome editing, its origin and evolution are poorly understood. Here, we investigate the evolution of Cas9 from resurrected ancient nucleases (anCas) in extinct firmicutes species that last lived 2.6 billion years before the present. We demonstrate that these ancient forms were much more flexible in their guide RNA and protospacer-adjacent motif requirements compared with modern-day Cas9 enzymes. Furthermore, anCas portrays a gradual palaeoenzymatic adaptation from nickase to double-strand break activity, exhibits high levels of activity with both single-stranded DNA and single-stranded RNA targets and is capable of editing activity in human cells. Prediction and characterization of anCas with a resurrected protein approach uncovers an evolutionary trajectory leading to functionally flexible ancient enzymes.
We have made available sequencing data. Other data supporting the findings of this study are available from the corresponding authors upon reasonable request. Plasmids for gene editing in human cells are available through Addgene (Supplementary Table 14 and https://www.addgene.org/Raul_Perez-Jimenez/). Plasmids for HT-PAMDA experiments are available through Addgene (Supplementary Tables 8–10 and www.addgene.org/Benjamin_Kleinstiver/). Sequencing data for PAM determination and gene editing experiments will be made available through the National Center for Biotechnology Information Sequence Read Archive (NCBI SRA) under BioProject ID PRJNA832610. Sequencing data for HT-PAMDA experiments will be made available through NCBI SRA under BioProject ID PRJNA832159. Source data are provided with this paper.
Python script used for structure analysis can be found at spectrumBar.py and code used for sequence analysis is available via Zenodo at https://zenodo.org/record/3710516#.Y1plwHbMKUl.
This work has been supported by grant nos. PID2019-109087RB-I00 (to R.P.-J.) and RTI2018-101223-B-I00 and PID2021-127644OB-I00 (to L.M.) from the Spanish Ministry of Science and Innovation. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement no. 964764 (to R.P.-J.). The content presented in this document represents the views of the authors, and the European Commission has no liability in respect to the content. We acknowledge financial support from the Spanish Foundation for the Promotion of Research of Amyotrophic Lateral Sclerosis. A.F. acknowledges Spanish Center for Biomedical Network Research on Rare Diseases (CIBERER) intramural funds (no. ER19P5AC756/2021). F.J.M.M. acknowledges research support by Conselleria d’Educació, Investigació, Cultura i Esport from Generalitat Valenciana, research project nos. PROMETEO/2017/129 and PROMETEO/2021/057. M.M. acknowledges funding from CIBERER (grant no. ER19P5AC728/2021). The work has received funding from the Regional Government of Madrid (grant no. B2017/BMD3721 to M.A.M.-P.) and from Instituto de Salud Carlos III, cofunded with the European Regional Development Fund ‘A way to make Europe’ within the National Plans for Scientific and Technical Research and Innovation 2017–2020 and 2021–2024 (nos. PI17/1659, PI20/0429 and IMP/00009; to M.A.M.-P.). B.P.K. was supported by an MGH ECOR Howard M. Goodman Award and NIH P01 HL142494. We thank H. Stutzman for assistance with cloning plasmids, and Z. Herbert and M. Berkeley from the Molecular Biology Core Facilities at the Dana-Farber Cancer Institute for assistance with NextSeq sequencing.
R.P.-J. and B.A.-L. are co-inventors on a patent application (European Patent Application EP21382474) filed by CIC nanoGUNE and licensed to Integra Therapeutics S.L. relating to work in this article. A.S.-M. and M.G. are cofounders of Integra Therapeutics S.L. B.P.K. is an inventor on patents and/or patent applications filed by Mass General Brigham that describe genome engineering technologies, including the HT-PAMDA method (WO2021151065). B.P.K. is a consultant for EcoR1 Capital and is an advisor to Acrigen Biosciences, Life Edit Therapeutics and Prime Medicine. The remaining authors declare no competing interests.
Peer review information
Nature Microbiology thanks the anonymous reviewers for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended Data Fig. 1 Posterior probability distribution for each inferred residue of all ancestral anCas endonucleases.
The residue with the highest posterior probability is assigned at each position. The average posterior probability of each anCas is indicated in brackets. In all cases, the average posterior probability is close to 1, except for FCA anCas, which shows an average value of 0.74.
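The assignment rule in this caption can be illustrated with a minimal sketch. It assumes the per-site posterior distributions are already available as dictionaries mapping residues to probabilities; the function name `most_probable_sequence` and the toy values are hypothetical and are not taken from the study or from any reconstruction software.

```python
from statistics import mean


def most_probable_sequence(site_posteriors):
    """Pick the highest-posterior residue at each site.

    Returns the assembled sequence and the mean of the per-site maxima.
    """
    residues = []
    best_probs = []
    for dist in site_posteriors:
        # dist maps residue -> posterior probability at this site
        residue, prob = max(dist.items(), key=lambda kv: kv[1])
        residues.append(residue)
        best_probs.append(prob)
    return "".join(residues), mean(best_probs)


# Hypothetical toy posteriors for three sites (not data from the study).
toy = [
    {"M": 0.98, "L": 0.02},
    {"K": 0.55, "R": 0.40, "Q": 0.05},
    {"D": 0.70, "E": 0.30},
]
sequence, avg = most_probable_sequence(toy)
print(sequence, round(avg, 3))
```

Averaging the per-site maxima gives one number per reconstructed node, which is the kind of summary value the caption reports in brackets.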
Extended Data Fig. 2 Alignment of the amino acid sequences from PAM interacting (PI) domain of anCas and SpCas9.
Percentage of identity of the different anCas sequences with respect to SpCas9.
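As a rough illustration of how a percentage of identity can be computed from a pairwise alignment, the sketch below counts matching residues over the columns where neither sequence has a gap. The choice of denominator is an assumption made here, since the caption does not state which convention was used, and the sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns with no gap in either sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap (assumed convention)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical aligned fragments (not sequences from the study).
identity = percent_identity("MKD-LE", "MRDAL-")
print(identity)
```

Other conventions, such as dividing by the full alignment length or by the length of the shorter sequence, would give different values for the same alignment.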
Extended Data Fig. 3 List of important mutations and domain organization of anCas compared to SpCas9.
Mutations of the main residues involved in PAM recognition are marked in blue. These alterations suggest possible differences in PAM recognition abilities in FCA anCas and possibly in BCA anCas. The bottom panel depicts the domain organization and structural alignment of SpCas9-FCA and SpCas9-PDCA anCas. anCas structures are colored grey and SpCas9 is colored by domain.
Extended Data Fig. 4 Structural predictions of anCas and SpCas9 by AlphaFold2.
Structures are colored by pLDDT score according to the color bar.
Extended Data Fig. 5 Activity of FCA anCas H838A endonuclease on a supercoiled DNA substrate.
(a) In vitro cleavage assay for anCas FCA H838A on a 4007 bp substrate at different reaction times showing nicked and linear fractions. (b) Quantification of total cleavage fraction at different reaction times and exponential fits (lines). (c) Quantification of fraction nicked at different times. (d) Quantification of DSB cleavage. Single-exponential fits were used to obtain kcleave and maximum fraction cleaved (amplitude). Values reported as mean, where n = 2.
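A single-exponential fit of this kind can be sketched as follows, assuming the common form f(t) = A(1 − exp(−k·t)), where k plays the role of kcleave and A is the amplitude. The caption does not give the exact equation or the fitting software, so the model form, the search bounds and the function name `fit_single_exponential` are all assumptions. For each trial k the best amplitude is obtained in closed form by linear least squares, and k itself is found by a coarse log-spaced scan followed by a golden-section refinement.

```python
import math


def fit_single_exponential(times, fractions, k_min=1e-4, k_max=1e2, grid=400):
    """Least-squares fit of f(t) = A * (1 - exp(-k * t)); returns (k, A)."""

    def amplitude_and_sse(k):
        g = [1.0 - math.exp(-k * t) for t in times]
        denom = sum(gi * gi for gi in g)
        # For fixed k the optimal amplitude is a linear least-squares solution.
        amp = sum(y * gi for y, gi in zip(fractions, g)) / denom if denom > 0 else 0.0
        sse = sum((y - amp * gi) ** 2 for y, gi in zip(fractions, g))
        return amp, sse

    # Coarse scan over log-spaced rate constants.
    log_lo, log_hi = math.log(k_min), math.log(k_max)
    ks = [math.exp(log_lo + i * (log_hi - log_lo) / (grid - 1)) for i in range(grid)]
    best = min(range(grid), key=lambda i: amplitude_and_sse(ks[i])[1])

    # Golden-section refinement in log(k) between the neighbours of the best grid point.
    lo = math.log(ks[max(best - 1, 0)])
    hi = math.log(ks[min(best + 1, grid - 1)])
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(60):
        c = hi - phi * (hi - lo)
        d = lo + phi * (hi - lo)
        if amplitude_and_sse(math.exp(c))[1] < amplitude_and_sse(math.exp(d))[1]:
            hi = d
        else:
            lo = c
    k = math.exp((lo + hi) / 2.0)
    return k, amplitude_and_sse(k)[0]


# Synthetic, noise-free time course generated with k = 0.5 and A = 0.8
# (hypothetical values, not measurements from the study).
times = [0.0, 1.0, 2.0, 5.0, 10.0, 30.0]
fractions = [0.8 * (1.0 - math.exp(-0.5 * t)) for t in times]
k_fit, a_fit = fit_single_exponential(times, fractions)
print(round(k_fit, 3), round(a_fit, 3))
```

With only two replicates per time point, as in this figure, such a fit would typically be applied to the mean values.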
Extended Data Fig. 6 PAM determination of anCas enzymes.
(a) Example of an in vitro cleavage assay used to obtain the 278 bp fragment for NGS analysis. (b) WebLogo of the different PAMs recognized by anCas and SpCas9. (c) Heatmaps illustrating the total reads for each of the possible 256 NNNN PAMs, analyzed from NGS of the 278 bp cleaved DNA fragments from panel a. (d) In vitro cleavage assay using the PAM sequence TCC.
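The read totals behind a heatmap like panel c can be tabulated with a short sketch. It assumes the 4-nt PAM has already been extracted from each read; the extraction step, the function name `count_pams` and the example calls are hypothetical and do not reproduce the authors' pipeline.

```python
from collections import Counter
from itertools import product


def count_pams(pam_calls, length=4, alphabet="ACGT"):
    """Tally reads for every possible PAM of the given length.

    PAMs that were never observed get a count of zero, and calls containing
    characters outside the alphabet (for example 'N') are ignored.
    """
    all_pams = ["".join(p) for p in product(alphabet, repeat=length)]
    observed = Counter(pam_calls)
    return {pam: observed.get(pam, 0) for pam in all_pams}


# Hypothetical PAM calls (not data from the study).
calls = ["TGGA", "AGGT", "TGGA", "CCCC", "NGGA"]
table = count_pams(calls)
print(len(table), table["TGGA"])
```

Enumerating all 4^4 = 256 combinations up front keeps zero-count PAMs in the table, so every heatmap cell has a defined value.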
Extended Data Fig. 7 HT-PAMDA determined PAM profiles of younger anCas enzymes.
PAM profiles of anCas enzymes and SpCas9 as determined by HT-PAMDA. Rate constants corresponding to Cas cleavage activity on each of the 256 NNNN PAMs are illustrated as mean log10 values of cleavage reactions against two unique spacer sequences. For comparison, the SpCas9 profile is re-plotted from Fig. 4.
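The summary statistic described here, a mean of log10 rate constants across spacers, can be sketched as below. The dictionaries of per-PAM rate constants, the function name `mean_log10_rates` and the numbers are hypothetical; the HT-PAMDA analysis itself is not reproduced.

```python
import math


def mean_log10_rates(rates_by_spacer):
    """Average log10 rate constants per PAM across spacers.

    Only PAMs measured for every spacer are reported; rates must be positive.
    """
    shared = set.intersection(*(set(r) for r in rates_by_spacer))
    n = len(rates_by_spacer)
    return {
        pam: sum(math.log10(r[pam]) for r in rates_by_spacer) / n
        for pam in sorted(shared)
    }


# Hypothetical rate constants for two spacers (not data from the study).
spacer_1 = {"AGGA": 1e-2, "ACCA": 1e-5}
spacer_2 = {"AGGA": 1e-3, "ACCA": 1e-5}
profile = mean_log10_rates([spacer_1, spacer_2])
print(profile)
```

Averaging in log space corresponds to the log of the geometric mean, so a single spacer with a very high rate dominates less than it would in an arithmetic mean.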
Extended Data Fig. 8 Trans-activity of FCA anCas, BCA anCas and SpCas9 on M13 phage ssDNA.
Nonspecific M13 ssDNA cleavage with sgRNA and a complementary (or non-complementary) 85-nt ssDNA activator that has no sequence homology to the M13 circular ssDNA. FCA anCas can cleave the ssDNA substrate in the presence of the activator, whereas BCA anCas and SpCas9 do not cleave the same substrate.
Extended Data Fig. 9 Analysis of the in vivo activity of anCas variants.
Alignments, generated with Jalview, of the wild-type and the most frequent edited alleles (indels) detected by Mosaic Finder in (a) OCA2 and (b) TYR genes after NHEJ cell repair in HEK 293T cells. Heatmaps are shown underneath the alignments highlighting the frequencies of the seven most frequent alleles generated after cleavage and repair with SpCas9, PDCA, PCA, SCA and BCA anCas, normalized with respect to the total number of indels for each Cas. The guide, the PAM and the theoretical DSB site are marked in the figure. For the mutation nomenclature of each allele, we consider the first nucleotide of the PAM as +1. Numbers within the allele names represent the length of the insertion or deletion at the exact location indicated by the first number. Example: -4Ins1, insertion of 1 nucleotide four bases upstream of the PAM.
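The normalization used for these heatmaps can be illustrated with a minimal sketch: each allele count is divided by the total number of indel reads for that nuclease, and the seven most frequent alleles are kept. The allele names other than -4Ins1, the counts and the function name `top_allele_frequencies` are hypothetical, and the "Del" label format is an assumption modeled on the caption's insertion example.

```python
from collections import Counter


def top_allele_frequencies(indel_counts, top_n=7):
    """Return the top_n alleles with counts normalized to the total indel count."""
    total = sum(indel_counts.values())
    if total == 0:
        return []
    ranked = Counter(indel_counts).most_common(top_n)
    # Normalize against all indels, not only the alleles that are kept.
    return [(allele, count / total) for allele, count in ranked]


# Hypothetical indel counts for one nuclease (not data from the study).
counts = {
    "-4Ins1": 40, "-3Del2": 20, "-5Del7": 12, "-4Del1": 10,
    "-6Del11": 8, "-3Ins2": 5, "-4Del4": 3, "-9Del15": 2,
}
top7 = top_allele_frequencies(counts)
print(top7[0], len(top7))
```

Because the denominator includes every indel, the seven reported frequencies need not sum to 1.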
Extended Data Fig. 10 Traffic Light Reporter cleavage assay targeting gene TLR.
The relative NHEJ frequency is estimated by the number of RFP-positive cells and is normalized to SpCas9. Bars represent the average value of two independent experiments indicated by the black dots.
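One way to compute such a relative frequency is sketched below. It assumes that each experiment's value is divided by the SpCas9 value from the same experiment before averaging; the caption does not say whether normalization was done per experiment or on the averages, so this is an assumption, and the percentages and the function name `relative_nhej` are hypothetical.

```python
from statistics import mean


def relative_nhej(rfp_by_nuclease, reference="SpCas9"):
    """Normalize RFP-positive fractions to the reference nuclease, per experiment.

    Returns a dict mapping nuclease name to (mean ratio, list of per-experiment ratios).
    """
    ref = rfp_by_nuclease[reference]
    result = {}
    for name, values in rfp_by_nuclease.items():
        if len(values) != len(ref):
            raise ValueError(f"{name}: expected {len(ref)} experiments")
        ratios = [v / r for v, r in zip(values, ref)]
        result[name] = (mean(ratios), ratios)
    return result


# Hypothetical RFP-positive percentages from two experiments (not data from the study).
data = {"SpCas9": [10.0, 8.0], "BCA anCas": [5.0, 6.0]}
normalized = relative_nhej(data)
print(normalized["BCA anCas"])
```

The per-experiment ratios correspond to the individual dots in a bar plot of this type, and their mean to the bar height.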
Supplementary Figs. 1–5, Tables 1–16, Notes 1–4 and references.
Source Data Fig. 1
Amino acid sequences of ancestral anCas.
Source Data Fig. 4
HT-PAMDA data summary.
Source Data Fig. 5
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Alonso-Lerma, B., Jabalera, Y., Samperio, S. et al. Evolution of CRISPR-associated endonucleases as inferred from resurrected proteins. Nat Microbiol 8, 77–90 (2023). https://doi.org/10.1038/s41564-022-01265-y