To the Editor:
We recently described the genome-wide, unbiased identification of double-strand breaks (DSBs) enabled by sequencing (GUIDE-seq) technology, a sensitive method for detecting global off-target DSBs induced by RNA-guided CRISPR–Cas9 nucleases in living cells¹. The experimental component of GUIDE-seq is straightforward and encompasses capture of a double-stranded oligodeoxynucleotide (dsODN) into Cas9-induced DSBs in cells, selective amplification of these integration events, and next-generation sequencing of genomic DNA adjacent to the dsODN. However, analysis of the resulting sequencing data is a multistep process that, as described in our original published report¹, required multiple custom-built software components. Here we describe guideseq, a streamlined, open-source Python package that enables any user to readily perform analysis of GUIDE-seq experiment data (Fig. 1a). The software is simple to use and requires only basic technical knowledge to set up and run.
The guideseq software performs analysis based on raw sequencing data and a sample manifest in YAML format (http://yaml.org/). The sample manifest organizes the required information for bioinformatic analysis of GUIDE-seq runs, including the location of raw sequencing read files, the names of the biological samples and control, the sequences of dual-index barcodes, and the intended target site sequence.
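As a rough illustration of the information such a manifest carries, the following sketch checks a manifest that has already been parsed into a Python dictionary (for example, by a YAML library). The key names (`undemultiplexed`, `samples`, `barcode1`, `barcode2`, `target`), file paths, and barcode values are placeholders assumed for this example and may not match what the guideseq package expects; the repository documentation remains the authority on the actual format.

```python
# Hypothetical manifest, shown as the dictionary a YAML parser would return.
# Key names, paths, and sequences are illustrative assumptions only.
EXAMPLE_MANIFEST = {
    "undemultiplexed": {
        "forward": "data/undemux.r1.fastq.gz",
        "reverse": "data/undemux.r2.fastq.gz",
        "index1": "data/undemux.i1.fastq.gz",
        "index2": "data/undemux.i2.fastq.gz",
    },
    "samples": {
        "control": {"target": None, "barcode1": "CTCTCTAC", "barcode2": "CTCTCTAT"},
        "site1": {
            "target": "GAGTCCGAGCAGAAGAAGAANGG",
            "barcode1": "TAGGCATG",
            "barcode2": "TAGATCGC",
        },
    },
}


def validate_manifest(manifest):
    """Return a list of problems; an empty list means nothing is missing."""
    problems = []

    # Location of the raw sequencing read files.
    reads = manifest.get("undemultiplexed", {})
    for key in ("forward", "reverse", "index1", "index2"):
        if not reads.get(key):
            problems.append(f"missing raw read file: {key}")

    # Names of the biological samples and the control.
    samples = manifest.get("samples", {})
    if "control" not in samples:
        problems.append("no control sample defined")

    for name, sample in samples.items():
        # Dual-index barcodes must be non-empty and contain only A, C, G, T.
        for key in ("barcode1", "barcode2"):
            barcode = sample.get(key) or ""
            if not barcode or set(barcode) - set("ACGT"):
                problems.append(f"{name}: invalid {key}")
        # Every non-control sample needs an intended target site sequence.
        if name != "control" and not sample.get("target"):
            problems.append(f"{name}: missing target site sequence")

    return problems
```

Calling `validate_manifest(EXAMPLE_MANIFEST)` on the placeholder manifest above returns an empty list; removing the control entry or a barcode adds a corresponding message.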
In an initial step, our GUIDE-seq analysis pipeline prepares sequencing reads for alignment by demultiplexing a pooled multisample sequencing run into sample-specific read files. PCR duplicates are consolidated based on 8-bp unique molecular indexes (UMIs) in order to improve quantitative interpretation of GUIDE-seq read counts (https://github.com/aryeelab/umi). Next, off-target identification is performed through read alignment, site identification, false positive filtering, and reporting steps (Supplementary Methods). Off-target cleavage sites are sorted by GUIDE-seq read count, and figures of the sequence alignments are produced (Fig. 1b). The pipeline can either be run end-to-end with a single command or, if preferred, the component steps can be executed individually.
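The read-preparation and ranking ideas in this paragraph can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the code used by guideseq or the aryeelab/umi package: the `Read` record, the exact-match barcode lookup, and the choice to treat reads with the same UMI and identical sequence as PCR duplicates are all assumptions made for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal read record used only for this sketch."""
    index1: str  # first index barcode
    index2: str  # second index barcode
    umi: str     # 8-bp unique molecular index
    seq: str     # genomic portion of the read


def demultiplex(reads, barcode_to_sample):
    """Split a pooled run into per-sample lists by exact dual-index match."""
    by_sample = defaultdict(list)
    for read in reads:
        sample = barcode_to_sample.get((read.index1, read.index2))
        if sample is not None:  # reads with unknown barcode pairs are dropped
            by_sample[sample].append(read)
    return dict(by_sample)


def consolidate_umis(reads):
    """Keep the first read seen for each (UMI, sequence) pair.

    Assumption: reads sharing both values are PCR duplicates of one molecule.
    """
    unique = {}
    for read in reads:
        unique.setdefault((read.umi, read.seq), read)
    return list(unique.values())


def rank_sites(site_counts):
    """Sort a {site: read_count} mapping by read count, highest first."""
    return sorted(site_counts.items(), key=lambda item: item[1], reverse=True)
```

For example, two reads from the same sample that share a UMI and sequence collapse to one after `consolidate_umis`, while a read with a different UMI is retained as an independent molecule, so downstream read counts reflect distinct capture events rather than amplification depth.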
The guideseq Python package is provided under an open-source (AGPLv3) license and should broadly enable researchers to analyze GUIDE-seq experiments. Source code, installation, and up-to-date running instructions will be maintained at http://github.com/aryeelab/guideseq (see Supplementary Note for version of instructions at the time of this publication).
Author contributions
S.Q.T. conceived and developed the initial GUIDE-seq analysis algorithm. M.J.A. developed UMI processing and PCR deduplication code. V.V.T. developed the software package infrastructure, filtering and visualization modules, and wrote documentation with input from M.J.A. and S.Q.T. J.K.J. and M.J.A. supervised the project. S.Q.T., V.V.T., J.K.J., and M.J.A. wrote the manuscript.
References
1. Tsai, S.Q. et al. Nat. Biotechnol. 33, 187–197 (2015).
Acknowledgements
J.K.J. is supported by an NIH Director's Pioneer Award (DP1 GM105378) and the Jim and Ann Orr MGH Research Scholar Award. M.J.A. is supported by the MGH Department of Pathology.
Ethics declarations
Competing interests
J.K.J. is a consultant for Horizon Discovery. J.K.J. has financial interests in Editas Medicine, Hera Testing Laboratories, Poseida Therapeutics, and Transposagen Biopharmaceuticals. J.K.J.'s interests were reviewed and are managed by Massachusetts General Hospital and Partners HealthCare in accordance with their conflict of interest policies. S.Q.T., J.K.J. and M.J.A. are co-founders of Beacon Genomics, a company that is commercializing methods for determining nuclease specificity. S.Q.T. and J.K.J. have filed a patent application for the GUIDE-seq method.
Additional information
Editor's note: This article has been peer-reviewed.
Supplementary information
Supplementary Materials
Supplementary Methods and Supplementary Note (PDF 191 kb)
Cite this article
Tsai, S., Topkar, V., Joung, J. et al. Open-source guideseq software for analysis of GUIDE-seq data. Nat Biotechnol 34, 483 (2016). https://doi.org/10.1038/nbt.3534