Open-source guideseq software for analysis of GUIDE-seq data

Tsai, Shengdar Q; Topkar, Ved V; Joung, J Keith; Aryee, Martin J

doi:10.1038/nbt.3534


Correspondence
Published: 06 May 2016


Shengdar Q Tsai^1,2,3,4^na1, Ved V Topkar^1,2,3^na1, J Keith Joung^1,2,3,4 & Martin J Aryee^1,2,4,5 (ORCID: orcid.org/0000-0002-6848-1344)

Nature Biotechnology volume 34, page 483 (2016)


To the Editor:

We recently described the genome-wide, unbiased identification of double-strand breaks (DSBs) enabled by sequencing (GUIDE-seq) technology, a sensitive method for detecting global off-target DSBs induced by RNA-guided CRISPR–Cas9 nucleases in living cells¹. The experimental component of GUIDE-seq is straightforward and encompasses capture of a double-stranded oligodeoxynucleotide (dsODN) into Cas9-induced DSBs in cells, selective amplification of these integration events, and next-generation sequencing of genomic DNA adjacent to the dsODN. However, analysis of the resulting sequencing data is a multistep process that, as described in our original published report¹, required multiple custom-built software components. Here we describe guideseq, a streamlined, open-source Python package that enables any user to readily perform analysis of GUIDE-seq experiment data (Fig. 1a). The software is simple to use and requires only basic technical knowledge to set up and run.

**Figure 1: Overview of software analysis pipeline for processing of GUIDE-seq data and example output visualization.**

The guideseq software performs analysis based on raw sequencing data and a sample manifest in YAML format (http://yaml.org/). The sample manifest organizes the required information for bioinformatic analysis of GUIDE-seq runs, including the location of raw sequencing read files, the names of the biological samples and control, the sequences of dual-index barcodes, and the intended target site sequence.
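As a rough illustration of what such a manifest must carry, the sketch below models one as a plain Python dictionary (the kind of structure a YAML parser typically returns) and checks it for the four kinds of information listed above. All key names (`read_files`, `samples`, `barcode1`, `barcode2`, `target`), the file paths, and the sequences are hypothetical placeholders, not necessarily the schema used by guideseq; the repository documentation remains the authoritative reference.

```python
# Hypothetical manifest layout. Key names, paths, and sequences are
# illustrative assumptions, not the actual guideseq schema.
manifest = {
    "read_files": {
        "forward": "data/run1_R1.fastq.gz",
        "reverse": "data/run1_R2.fastq.gz",
        "index1": "data/run1_I1.fastq.gz",
        "index2": "data/run1_I2.fastq.gz",
    },
    "samples": {
        # The control sample has barcodes but no intended target site.
        "control": {"barcode1": "CTCTCTAC", "barcode2": "CTCTCTAT", "target": None},
        "site1": {
            "barcode1": "TAGGCATG",
            "barcode2": "TAGATCGC",
            "target": "GAGTCCGAGCAGAAGAAGAANGG",
        },
    },
}

REQUIRED_READ_FILES = ("forward", "reverse", "index1", "index2")
DNA_ALPHABET = set("ACGTN")


def validate_manifest(m):
    """Return a list of problems found in the manifest (empty if none)."""
    problems = []

    # Locations of the raw sequencing read files.
    read_files = m.get("read_files", {})
    for key in REQUIRED_READ_FILES:
        if not read_files.get(key):
            problems.append(f"missing read file: {key}")

    # Names of the biological samples and the control.
    samples = m.get("samples", {})
    if "control" not in samples:
        problems.append("no control sample defined")

    for name, sample in samples.items():
        # Dual-index barcode sequences.
        for key in ("barcode1", "barcode2"):
            barcode = sample.get(key) or ""
            if not barcode or set(barcode.upper()) - DNA_ALPHABET:
                problems.append(f"{name}: invalid {key}")

        # Intended target site sequence (not required for the control).
        target = sample.get("target")
        if name != "control" and not target:
            problems.append(f"{name}: missing target site sequence")
        elif target and set(target.upper()) - DNA_ALPHABET:
            problems.append(f"{name}: target contains non-DNA characters")

    return problems


problems = validate_manifest(manifest)
print(problems if problems else "manifest looks complete")
```

Checking the manifest before any reads are processed is a design choice of this sketch: a missing barcode or target sequence is cheaper to catch at this point than after demultiplexing and alignment.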

In an initial step, our GUIDE-seq analysis pipeline prepares sequencing reads for alignment by demultiplexing a pooled multisample sequencing run into sample-specific read files. PCR duplicates are then consolidated based on 8-bp unique molecular indexes (UMIs) to improve quantitative interpretation of GUIDE-seq read counts (https://github.com/aryeelab/umi). Next, off-target identification is performed through read alignment, site identification, false-positive filtering, and reporting steps (Supplementary Methods). Off-target cleavage sites are sorted by GUIDE-seq read count, and figures of the sequence alignments are produced (Fig. 1b). The pipeline can be run end-to-end with a single command or, if preferred, its component steps can be executed individually.
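The two read-preparation steps described above can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the guideseq or aryeelab/umi implementation: reads are represented as dictionaries with hypothetical keys (`index1`, `index2`, `umi`, `seq`), barcodes must match exactly, and reads sharing a UMI are collapsed to the most frequently observed sequence. The barcodes and read sequences are made-up examples.

```python
from collections import Counter, defaultdict


def demultiplex(reads, sample_barcodes):
    """Group reads by sample using their (index1, index2) barcode pair.

    reads: iterable of dicts with keys 'index1', 'index2', 'umi', 'seq'
    sample_barcodes: {sample_name: (barcode1, barcode2)}
    Reads whose barcode pair matches no sample are dropped.
    """
    lookup = {pair: name for name, pair in sample_barcodes.items()}
    by_sample = defaultdict(list)
    for read in reads:
        name = lookup.get((read["index1"], read["index2"]))
        if name is not None:
            by_sample[name].append(read)
    return dict(by_sample)


def consolidate_by_umi(reads):
    """Collapse PCR duplicates: keep one representative sequence per UMI.

    The representative is the most frequently observed sequence among
    reads sharing that UMI (a simplification chosen for this sketch).
    """
    groups = defaultdict(list)
    for read in reads:
        groups[read["umi"]].append(read["seq"])
    return {umi: Counter(seqs).most_common(1)[0][0] for umi, seqs in groups.items()}


# Hypothetical dual-index barcodes for two samples.
sample_barcodes = {
    "control": ("CTCTCTAC", "CTCTCTAT"),
    "site1": ("TAGGCATG", "TAGATCGC"),
}

# Made-up reads from a pooled run, each carrying an 8-bp UMI.
reads = [
    {"index1": "TAGGCATG", "index2": "TAGATCGC", "umi": "AACCGGTT", "seq": "GAGTCCGAGC"},
    # PCR duplicate of the first read (same UMI, same sequence).
    {"index1": "TAGGCATG", "index2": "TAGATCGC", "umi": "AACCGGTT", "seq": "GAGTCCGAGC"},
    # Same molecule again, with a sequencing error in the last base.
    {"index1": "TAGGCATG", "index2": "TAGATCGC", "umi": "AACCGGTT", "seq": "GAGTCCGAGA"},
    # A distinct molecule from the same sample (different UMI).
    {"index1": "TAGGCATG", "index2": "TAGATCGC", "umi": "TTGGCCAA", "seq": "GAGTCCGAGC"},
    # A read from the control sample.
    {"index1": "CTCTCTAC", "index2": "CTCTCTAT", "umi": "ACGTACGT", "seq": "CCCTTTAAAG"},
    # A read with an unrecognized barcode pair; it is discarded.
    {"index1": "GGGGGGGG", "index2": "GGGGGGGG", "umi": "ACGTACGT", "seq": "CCCTTTAAAG"},
]

by_sample = demultiplex(reads, sample_barcodes)
consolidated = {name: consolidate_by_umi(rs) for name, rs in by_sample.items()}

for name, molecules in consolidated.items():
    print(name, len(by_sample[name]), "reads ->", len(molecules), "unique molecules")
```

In this toy run the four `site1` reads collapse to two molecules, which is the point of UMI consolidation: the count reflects independent dsODN capture events, not how strongly each fragment was amplified.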

The guideseq Python package is provided under an open-source (AGPLv3) license and should broadly enable researchers to analyze GUIDE-seq experiments. Source code, installation, and up-to-date running instructions will be maintained at http://github.com/aryeelab/guideseq (see Supplementary Note for version of instructions at the time of this publication).

Author contributions

S.Q.T. conceived and developed the initial GUIDE-seq analysis algorithm. M.J.A. developed UMI processing and PCR deduplication code. V.V.T. developed the software package infrastructure, filtering and visualization modules, and wrote documentation with input from M.J.A. and S.Q.T. J.K.J. and M.J.A. supervised the project. S.Q.T., V.V.T., J.K.J., and M.J.A. wrote the manuscript.

References

1. Tsai, S.Q. et al. Nat. Biotechnol. 33, 187–197 (2015).


Acknowledgements

J.K.J. is supported by an NIH Director's Pioneer Award (DP1 GM105378) and the Jim and Ann Orr MGH Research Scholar Award. M.J.A. is supported by the MGH Department of Pathology.

Author information

Shengdar Q Tsai and Ved V Topkar: These authors contributed equally to this work.

Authors and Affiliations

Molecular Pathology Unit, Massachusetts General Hospital, Charlestown, Massachusetts, USA
Shengdar Q Tsai, Ved V Topkar, J Keith Joung & Martin J Aryee
Center for Cancer Research, Massachusetts General Hospital, Charlestown, Massachusetts, USA
Shengdar Q Tsai, Ved V Topkar, J Keith Joung & Martin J Aryee
Center for Computational and Integrative Biology, Massachusetts General Hospital, Charlestown, Massachusetts, USA
Shengdar Q Tsai, Ved V Topkar & J Keith Joung
Department of Pathology, Harvard Medical School, Boston, Massachusetts, USA
Shengdar Q Tsai, J Keith Joung & Martin J Aryee
Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
Martin J Aryee


Corresponding authors

Correspondence to J Keith Joung or Martin J Aryee.

Ethics declarations

Competing interests

J.K.J. is a consultant for Horizon Discovery. J.K.J. has financial interests in Editas Medicine, Hera Testing Laboratories, Poseida Therapeutics, and Transposagen Biopharmaceuticals. J.K.J.'s interests were reviewed and are managed by Massachusetts General Hospital and Partners HealthCare in accordance with their conflict of interest policies. S.Q.T., J.K.J. and M.J.A. are co-founders of Beacon Genomics, a company that is commercializing methods for determining nuclease specificity. S.Q.T. and J.K.J. have filed a patent application for the GUIDE-seq method.

Additional information

Editor's note: This article has been peer-reviewed.

Supplementary information

Supplementary Materials

Supplementary Methods and Supplementary Note (PDF 191 kb)


About this article

Cite this article

Tsai, S., Topkar, V., Joung, J. et al. Open-source guideseq software for analysis of GUIDE-seq data. Nat Biotechnol 34, 483 (2016). https://doi.org/10.1038/nbt.3534

Issue Date: May 2016
