Despite recent improvements in sequencing methods, there remains a need for assays that provide high sequencing depth and comprehensive variant detection. Current methods (refs. 1–4) are limited by the loss of native modifications, short read length, high input requirements, low yield or long protocols. In the present study, we describe nanopore Cas9-targeted sequencing (nCATS), an enrichment strategy that uses targeted cleavage of chromosomal DNA with Cas9 to ligate adapters for nanopore sequencing. We show that nCATS can simultaneously assess haplotype-resolved single-nucleotide variants, structural variations and CpG methylation. We apply nCATS to four cell lines, to a cell-line-derived xenograft, and to normal and paired tumor/normal primary human breast tissue. Median sequencing coverage was 675× using a MinION flow cell and 34× using the smaller Flongle flow cell. nCATS sequencing requires only ~3 μg of genomic DNA and can target a large number of loci in a single reaction. The method will facilitate the use of long-read sequencing in research and in the clinic.
Sequencing data from all non-primary patient samples for this study can be retrieved from the SRA, under the BioProject ID PRJNA531320.
The computational code used in all of the analysis is hosted on GitHub (see https://github.com/timplab/Cas9Enrichment, https://github.com/isaclee/nanopore-methylation-utilities).
Karamitros, T. & Magiorkinis, G. Multiplexed targeted sequencing for Oxford Nanopore MinION: a detailed library preparation procedure. Methods Mol. Biol. 1712, 43–51 (2018).
Leija-Salazar, M. et al. Evaluation of the detection of GBA missense mutations and other variants using the Oxford Nanopore MinION. Mol. Genet. Genom. Med. 7, e564 (2019).
Gabrieli, T. et al. Selective nanopore sequencing of human BRCA1 by Cas9-assisted targeting of chromosome segments (CATCH). Nucleic Acids Res. 46, e87 (2018).
Giesselmann, P. et al. Analysis of short tandem repeat expansions and their methylation state with nanopore sequencing. Nat. Biotechnol. 37, 1478–1481 (2019).
Kozarewa, I., Armisen, J., Gardner, A. F., Slatko, B. E. & Hendrickson, C. L. Overview of target enrichment strategies. Curr. Protoc. Mol. Biol. 112, 7.21.1–7.21.23 (2015).
Eberle, M. A. et al. A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree. Genome Res. 27, 157–164 (2017).
ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
Sternberg, S. H., Redding, S., Jinek, M., Greene, E. C. & Doudna, J. A. DNA interrogation by the CRISPR RNA-guided endonuclease Cas9. Nature 507, 62–67 (2014).
Forbes, S. A. et al. COSMIC: somatic cancer genetics at high-resolution. Nucleic Acids Res. 45, D777–D783 (2017).
Lee, I. et al. Simultaneous profiling of chromatin accessibility and methylation on human cell lines with nanopore sequencing. Preprint at bioRxiv https://doi.org/10.1101/504993 (2018).
Messier, T. L. et al. Histone H3 lysine 4 acetylation and methylation dynamics define breast cancer subtypes. Oncotarget 7, 5094–5109 (2016).
Welcsh, P. L. & King, M. C. BRCA1 and BRCA2 and the genetics of breast and ovarian cancer. Hum. Mol. Genet. 10, 705–713 (2001).
Li, H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27, 2987–2993 (2011).
Luo, R. et al. Clair: Exploring the limit of using a deep neural network on pileup data for germline variant calling. Preprint at bioRxiv https://doi.org/10.1101/865782 (2019).
Simpson, J. T. et al. Detecting DNA cytosine methylation using nanopore sequencing. Nat. Methods 14, 407–410 (2017).
Martin, M. et al. WhatsHap: fast and accurate read-based phasing. Preprint at bioRxiv https://doi.org/10.1101/085050 (2016).
Tate, J. G. et al. COSMIC: the catalogue of somatic mutations in cancer. Nucleic Acids Res. 47, D941–D947 (2019).
Martignano, F. et al. GSTP1 methylation and protein expression in prostate cancer: diagnostic implications. Dis. Markers 2016, 4358292 (2016).
Kabir, N. N., Rönnstrand, L. & Kazi, J. U. Keratin 19 expression correlates with poor prognosis in breast cancer. Mol. Biol. Rep. 41, 7729–7735 (2014).
Wang, X.-M., Zhang, Z., Pan, L.-H., Cao, X.-C. & Xiao, C. KRT19 and CEACAM5 mRNA-marked circulated tumor cells indicate unfavorable prognosis of breast cancer patients. Breast Cancer Res. Treat. 174, 375–385 (2019).
Noguchi, S. et al. Detection of breast cancer micrometastases in axillary lymph nodes by means of reverse transcriptase-polymerase chain reaction. Comparison between MUC1 mRNA and keratin 19 mRNA amplification. Am. J. Pathol. 148, 649–656 (1996).
Sedlazeck, F. J. et al. Accurate detection of complex structural variations using single-molecule sequencing. Nat. Methods 15, 461–468 (2018).
Zook, J. M. et al. Extensive sequencing of seven human genomes to characterize benchmark reference materials. Sci. Data 3, 160025 (2016).
Kolmogorov, M., Yuan, J., Lin, Y. & Pevzner, P. A. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 37, 540–546 (2019).
Vaser, R., Sović, I., Nagarajan, N. & Šikić, M. Fast and accurate de novo genome assembly from long uncorrected reads. Genome Res. 27, 737–746 (2017).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Chaisson, M. J. P. et al. Multi-platform discovery of haplotype-resolved structural variation in human genomes. Preprint at bioRxiv https://doi.org/10.1101/193144 (2018).
Audano, P. A. et al. Characterizing the major structural variant alleles of the human genome. Cell 176, 663–675.e19 (2019).
Dixon, J. R. et al. Integrative detection and analysis of structural variation in cancer genomes. Nat. Genet. 50, 1388–1398 (2018).
Timp, W. & Feinberg, A. P. Cancer as a dysregulated epigenome allowing cellular growth advantage at the expense of the host. Nat. Rev. Cancer 13, 497–510 (2013).
Jeffares, D. C. et al. Transient structural variations have strong effects on quantitative traits and reproductive isolation in fission yeast. Nat. Commun. 8, 14061 (2017).
Krueger, F. & Andrews, S. R. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics 27, 1571–1572 (2011).
Hansen, K. D., Langmead, B. & Irizarry, R. A. BSmooth: from whole genome bisulfite sequencing reads to differentially methylated regions. Genome Biol. 13, R83 (2012).
This work was supported by funding from the National Institutes of Health, National Human Genome Research Institute (grant no. R01 HG009190).
J.G., E.R., R.B. and A.H. are employees of Oxford Nanopore Technologies. W.T. has two patents licensed to Oxford Nanopore Technologies (US patent nos. 8,748,091 and 8,394,584). T.G., I.L., F.S. and W.T. have received travel funds to speak at symposia organized by Oxford Nanopore Technologies.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Integrated supplementary information
(a) Coverage and reads at off-target site (first locus from Supplementary Table 3), identified in sequencing run TG_09. (b) Pair-wise alignment showing similarity between guideRNA and the off-target cleavage site.
Supplementary Fig. 2 True positive variants and false positive variants demonstrating the impetus for dual-strand filter.
Left: Two real variants which are supported by data on both strands. Right: Example of two false positive variants resulting from a sequencing error on only one strand.
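The logic of a dual-strand filter can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `StrandSupport` record, the `passes_dual_strand_filter` function and the thresholds `min_alt_reads` and `min_alt_fraction` are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class StrandSupport:
    """Read counts at one candidate variant site, split by strand (hypothetical record)."""
    fwd_ref: int
    fwd_alt: int
    rev_ref: int
    rev_alt: int


def passes_dual_strand_filter(site, min_alt_reads=3, min_alt_fraction=0.2):
    """Keep a candidate only if the alternate allele is supported on BOTH strands.

    The thresholds are illustrative assumptions, not values from the study.
    """
    for alt, ref in ((site.fwd_alt, site.fwd_ref), (site.rev_alt, site.rev_ref)):
        depth = alt + ref
        # Reject if this strand has no coverage or too few alternate reads.
        if depth == 0 or alt < min_alt_reads:
            return False
        # Reject if the alternate allele is only a small fraction on this strand.
        if alt / depth < min_alt_fraction:
            return False
    return True


# Toy counts: one variant seen on both strands, one seen only on the forward strand.
real_variant = StrandSupport(fwd_ref=20, fwd_alt=18, rev_ref=22, rev_alt=19)
one_strand_error = StrandSupport(fwd_ref=21, fwd_alt=17, rev_ref=40, rev_alt=0)

print(passes_dual_strand_filter(real_variant))
print(passes_dual_strand_filter(one_strand_error))
```

A candidate that appears on only one strand is rejected, which is the pattern expected from a strand-specific sequencing error.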
The single false positive variant from high-coverage sequencing data that passes dual-strand filtering. This variant is present in a highly thymidine-dense region. Note this variant falls within a repetitive region of the genome masked by RepeatMasker, thus the lowercase reference.
Single nucleotide high-confidence variant calls (nanopolish passing dual strand filter) at two other enriched sites on chr17 (KRT19 and 30kb piece of BRCA1). Reads were phased to show only variants passing dual-strand filter using the ‘phase-reads’ module of nanopolish. Tumor reads were phased into haplotypes using only variants from the corresponding normal sample.
Supplementary Fig. 5 Methylation line plots, read-level plots and per-CpG plots for five loci in GM12878 enrichment data.
(a) Line and dot plot of methylation calls made by bismark (WGBS Illumina data: GEO: GSE86765) and nanopolish (Cas9-targeted nanopore data) at all CpGs in the targeted regions. Gene models plotted below for orientation. (b) Read-level methylation plots for five loci in GM12878. (c) Per CpG scatter plot comparing methylation calls made by bismark (WGBS Illumina data: GEO: GSE86765) and nanopolish (Cas9-targeted nanopore data) at all CpGs in the targeted regions. r=0.81 across all 5 sites.
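A per-CpG comparison of this kind amounts to matching CpG positions between the two call sets and computing a Pearson correlation over the shared sites. The sketch below assumes each call set is a simple mapping from CpG position to methylation frequency; the function names and the toy values are hypothetical and are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def per_cpg_correlation(calls_a, calls_b):
    """Correlate methylation frequencies at CpG positions present in both call sets."""
    shared = sorted(set(calls_a) & set(calls_b))
    return pearson_r([calls_a[p] for p in shared], [calls_b[p] for p in shared])


# Toy values only: position -> methylation frequency (0 to 1).
wgbs_calls = {100: 0.9, 150: 0.8, 210: 0.1, 300: 0.5}
nanopore_calls = {100: 0.85, 150: 0.7, 210: 0.2, 300: 0.4, 999: 0.3}

r = per_cpg_correlation(wgbs_calls, nanopore_calls)
print(round(r, 2))
```

Position 999 is ignored because it has no counterpart in the other call set, so only sites covered by both methods contribute to r.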
Read-level methylation plots for captured loci in breast cell lines (MCF-10-A, MDA-MB-231, MCF-7).
Normalized expression data (read counts) for three breast cell lines from existing RNA-seq data (GEO: GSE75168).
Read-level methylation plots for 5 captured loci in fresh breast tissue (reduction mammoplasty, cell-line-derived xenograft, paired tumor/normal). Tumor/normal samples are segregated into haplotypes using only variants from the normal sample.
Reads at a small (< 10kb) common structural variant on chromosome 5 from breast cell line nanopore enrichment data (deletion at chromosome 7 is included as main Fig. 4a).
Comparing methylation patterns at heterozygous deletions on chromosome 5 and chromosome 7 in MDA-MB-231 cell line data.
Coverage plots around two large heterozygous deletions in GM12878 (RunID: TG_07). Yellow triangles show points of Cas9 cleavage. Blue lines show coverage of reads assigned to paternal haplotype and red lines show coverage of reads assigned to maternal haplotype. (In both cases, the distance between cuts on the deleted allele is ~10kb and distance between cuts on non-deleted allele is ~80kb).
Comparing paternal and maternal coverage at two sites in GM12878 using a single cut each side (RunID: TG_01) at sites with no heterozygous SVs between guideRNAs. Unlike at the sites of large heterozygous deletions, we do not see a dramatic bias towards either parental allele.
Left: Reads from BRCA1 enrichment run with DNA extracted using the Masterpure kit (Lucigen, Cat#MC85200). Right: Reads from BRCA1 enrichment run with DNA extracted using the Nanobind kit (Circulomics, Cat#NB-900-001-0).
(a) Whole-genome PacBio data around BRCA1 in GM12878 from publicly available data (SRA: SRR9001768 - SRR9001773). (b) Comparison of the three unannotated heterozygous indels found in GM12878 between Cas9-nanopore enrichment data (top) and whole-genome PacBio data (bottom).
Methylation analysis using nanopolish on each of the two alleles of BRCA1 in GM12878. Reads from enrichment run using Circulomics CBB Nanobind kit for DNA preparation shown.
Supplementary Figs. 1–15.
The gRNA sequences and details for each sequencing run. Sheet 1: gRNA sequences and target sites for the targeted regions for methylation, SV and SNV interrogation. Sheet 2: details of flow cell, sequencer, sample and gRNAs used in each sequencing run.
Coverage table for all sites across each of the sequencing runs. Read count, average coverage and on-target percentage for the 10 enrichment sites across sequencing runs.
Off-target analysis with SURVIVOR: off-target analysis for the GM12878 sequencing run using multiple gRNAs (RunID: TG_09), using the bincov tool from SURVIVOR (Jeffares, D. C. et al. Nat. Commun. 8, 14061 (2017)). On-target loci are colored orange. Maximum coverage shows the highest coverage reached in the specific locus.
SNV calls with different coverage in GM12878: sensitivity/TPR and F1 score of SNVs detected by different tools at different coverage levels in the enriched 140 kb from GM12878 (RunID: TG_09); 174 annotated SNVs exist in these regions. Analysis limited to SNVs, through comparison with the platinum genome dataset in GM12878. TPR, true positive rate (sensitivity). F1 score is the harmonic mean of precision and recall.
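The metrics named in this caption follow directly from counts of true positives, false positives and false negatives. The sketch below is a minimal illustration, assuming variants are represented as hashable tuples; the `snv_metrics` function and the toy variant coordinates are hypothetical and are not taken from the study.

```python
def snv_metrics(called, truth):
    """Compute precision, recall (TPR/sensitivity) and F1 for a set of SNV calls."""
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # called and present in the truth set
    fp = len(called - truth)   # called but absent from the truth set
    fn = len(truth - called)   # present in the truth set but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy variants as (chromosome, position, ref, alt); coordinates are made up.
truth_set = [("chr17", 101, "A", "G"), ("chr17", 250, "C", "T"),
             ("chr17", 391, "G", "A"), ("chr17", 480, "T", "C")]
call_set = [("chr17", 101, "A", "G"), ("chr17", 250, "C", "T"),
            ("chr17", 391, "G", "A"), ("chr17", 777, "A", "T")]

metrics = snv_metrics(call_set, truth_set)
print(metrics)
```

Because F1 is a harmonic mean, it stays low whenever either precision or recall is low, which makes it a stricter summary than sensitivity alone.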
SNVs called in MDA-MB-231 MinION data (three loci). Sheet 1: SNVs in the MDA-MB-231 cell line identified anew using Nanopolish from nanopore enrichment data at three loci (TP53, BRAF, KRAS). Sheet 2: Nanopolish variants from sheet 1 passing dual-strand filter (high-confidence MDA-MB-231 variants).
Sniffles SV calls in three breast cell lines: Sniffles SV calls from enrichment data in the three breast cell lines. For both deletions the ploidy was called as heterozygous (het) in MDA-MB-231 and homozygous (homo) in MCF-7.
Sniffles calls of large SVs in GM12878. Left: reference calls from LongRanger 2.1 analysis of 10x Genomics data from the GIAB consortium. Right: Sniffles SV calls in GM12878. het, heterozygousGT*; homo, homozygous. Note the settings of Sniffles were adjusted to ensure that the genotypes of large deletions in GM12878 were correctly called (see Methods).
Indels in GM12878 BRCA1 enrichment data. Sheet 1: all indels called between assemblies of the BRCA1 haplotypes in GM12878. DNA isolated using the Circulomics Nanobind CBB kit (RunID: TG_08). Sheet 2: indels from sheet 1 filtered for length ≥3 nt, removing indels resulting from differences in homopolymer length. Indels not previously annotated are colored blue. Comparison with annotated variants from the platinum genomes 2017 hybrid dataset for Hg38 human assembly (Eberle et al. Genome Res. 27(1), 157–164 (2017)).
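The sheet 2 filter described above can be approximated by a simple rule: keep an indel only if it is at least 3 nt long and is not a run of a single base. The sketch below is an assumption-laden illustration rather than the authors' code. In particular, treating "inserted or deleted sequence consists of one repeated base" as the test for a homopolymer-length difference is a simplification, and the function names are hypothetical.

```python
def is_homopolymer(seq):
    """True if the sequence is a run of one repeated base (e.g. 'AAAA')."""
    return len(set(seq.upper())) == 1


def keep_indel(indel_seq, min_length=3):
    """Keep indels of at least min_length nt that are not single-base runs.

    indel_seq is the inserted or deleted sequence. The homopolymer test is a
    simplifying assumption for this example.
    """
    return len(indel_seq) >= min_length and not is_homopolymer(indel_seq)


# Toy inserted/deleted sequences, not calls from the study.
candidates = ["A", "TT", "AAAA", "ACG", "GATTACA"]
kept = [seq for seq in candidates if keep_indel(seq)]
print(kept)
```

Short indels and single-base runs are removed because homopolymer length is a common source of error in nanopore data, so differences of that kind are weak evidence for a true variant.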
Cite this article
Gilpatrick, T., Lee, I., Graham, J.E. et al. Targeted nanopore sequencing with Cas9-guided adapter ligation. Nat Biotechnol (2020). https://doi.org/10.1038/s41587-020-0407-5