Acquired genomic structural variants (SVs) are major hallmarks of cancer genomes, but they are challenging to reconstruct from short-read sequencing data. Here we exploited the long reads of the nanopore platform using our customized pipeline, Picky (https://github.com/TheJacksonLaboratory/Picky), to reveal SVs of diverse architecture in a breast cancer model. We identified the full spectrum of SVs with superior specificity and sensitivity relative to short-read analyses, and uncovered repetitive DNA as the major source of variation. Examination of genome-wide breakpoints at nucleotide resolution uncovered micro-insertions as the common structural features associated with SVs. Breakpoint density across the genome is associated with the propensity for interchromosomal connectivity and was found to be enriched in promoters and transcribed regions of the genome. Furthermore, we observed an over-representation of reciprocal translocations from chromosomal double-crossovers through phased SVs. We demonstrate that Picky analysis is an effective tool for comprehensive detection of SVs in cancer genomes from long-read data.
Sudmant, P. H. et al. An integrated map of structural variation in 2,504 human genomes. Nature 526, 75–81 (2015).
Feuk, L., Carson, A. R. & Scherer, S. W. Structural variation in the human genome. Nat. Rev. Genet. 7, 85–97 (2006).
1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Bochukova, E. G. et al. Large, rare chromosomal deletions associated with severe early-onset obesity. Nature 463, 666–670 (2010).
Diskin, S. J. et al. Copy number variation at 1q21.1 associated with neuroblastoma. Nature 459, 987–991 (2009).
Edwards, P. A. Fusion genes and chromosome translocations in the common epithelial cancers. J. Pathol. 220, 244–254 (2010).
Menghi, F. et al. The tandem duplicator phenotype as a distinct genomic configuration in cancer. Proc. Natl. Acad. Sci. USA 113, E2373–E2382 (2016).
Weischenfeldt, J., Symmons, O., Spitz, F. & Korbel, J. O. Phenotypic impact of genomic structural variation: insights from and for human disease. Nat. Rev. Genet. 14, 125–138 (2013).
Stankiewicz, P. & Lupski, J. R. Structural variation in the human genome and its role in disease. Annu. Rev. Med. 61, 437–455 (2010).
Chaisson, M. J. et al. Resolving the complexity of the human genome using single-molecule sequencing. Nature 517, 608–611 (2015).
Goodwin, S., McPherson, J. D. & McCombie, W. R. Coming of age: ten years of next-generation sequencing technologies. Nat. Rev. Genet. 17, 333–351 (2016).
Alkan, C., Coe, B. P. & Eichler, E. E. Genome structural variation discovery and genotyping. Nat. Rev. Genet. 12, 363–376 (2011).
Mills, R. E. et al. Mapping copy number variation by population-scale genome sequencing. Nature 470, 59–65 (2011).
Sović, I. et al. Fast and sensitive mapping of nanopore sequencing reads with GraphMap. Nat. Commun. 7, 11307 (2016).
Spies, N. et al. Genome-wide reconstruction of complex structural variants using read clouds. Nat. Methods 14, 915–920 (2017).
Cretu Stancu, M. et al. Mapping and phasing of structural variation in patient genomes using nanopore sequencing. Nat. Commun. 8, 1326 (2017).
Sedlazeck, F.J. et al. Accurate detection of complex structural variations using single molecule sequencing. bioRxiv Preprint at https://www.biorxiv.org/content/early/2017/07/28/169557 (2017).
Jain, M. et al. Improved data analysis for the MinION nanopore sequencer. Nat. Methods 12, 351–356 (2015).
Deamer, D., Akeson, M. & Branton, D. Three decades of nanopore sequencing. Nat. Biotechnol. 34, 518–524 (2016).
Jain, M., Olsen, H. E., Paten, B. & Akeson, M. The Oxford Nanopore MinION: delivery of nanopore sequencing to the genomics community. Genome Biol. 17, 239 (2016).
Jain, M. et al. Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat. Biotechnol. 36, 338–345 (2018).
Gazdar, A. F. et al. Characterization of paired tumor and non-tumor cell lines established from patients with breast cancer. Int. J. Cancer 78, 766–774 (1998).
Li, H. Minimap2: fast pairwise alignment for long nucleotide sequences. arXiv Preprint at https://arxiv.org/abs/1708.01492 (2017).
Kiełbasa, S. M., Wan, R., Sato, K., Horton, P. & Frith, M. C. Adaptive seeds tame genomic sequence comparison. Genome Res. 21, 487–493 (2011).
Frith, M. C., Hamada, M. & Horton, P. Parameters for accurate genome alignment. BMC Bioinformatics 11, 80 (2010).
Layer, R. M., Chiang, C., Quinlan, A. R. & Hall, I. M. LUMPY: a probabilistic framework for structural variant discovery. Genome Biol. 15, R84 (2014).
Stephens, P. J. et al. Complex landscapes of somatic rearrangement in human breast cancer genomes. Nature 462, 1005–1010 (2009).
Pendleton, M. et al. Assembly and diploid architecture of an individual human genome via single-molecule technologies. Nat. Methods 12, 780–786 (2015).
Bignell, G. R. et al. Architectures of somatic genomic rearrangement in human cancer amplicons at sequence-level resolution. Genome Res. 17, 1296–1303 (2007).
Campbell, P. J. et al. Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing. Nat. Genet. 40, 722–729 (2008).
Cahill, D., Connor, B. & Carney, J. P. Mechanisms of eukaryotic DNA double strand break repair. Front. Biosci. 11, 1958–1976 (2006).
Howarth, K. D. et al. Array painting reveals a high frequency of balanced translocations in breast cancer cell lines that break in cancer-relevant genes. Oncogene 27, 3345–3359 (2008).
Kalhor, R., Tjong, H., Jayathilaka, N., Alber, F. & Chen, L. Genome architectures revealed by tethered chromosome conformation capture and population-based modeling. Nat. Biotechnol. 30, 90–98 (2011).
Branco, M. R. & Pombo, A. Intermingling of chromosome territories in interphase suggests role in translocations and transcription-dependent associations. PLoS Biol. 4, e138 (2006).
Tjong, H. et al. Population-based 3D genome structure analysis reveals driving forces in spatial genome organization. Proc. Natl. Acad. Sci. USA 113, E1663–E1672 (2016).
Chung, I. F. et al. DriverDBv2: a database for human cancer driver gene research. Nucleic Acids Res. 44, D975–D979 (2016).
The authors thank P. Shreckengast for collecting the HCC1187 cells; C. Robinett and A. Lau for their comments on the manuscript; and B. Hanson and M. Bolisetty for their help in setting up the initial nanopore runs. Research reported in this publication was partially supported by the National Cancer Institute of the National Institutes of Health under Award Number P30CA034196. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
L.G., C.-H.W., and C.-L.W. have received a few batches of reagent from Oxford Nanopore. C.-L.W. has received travel and accommodation support from Oxford Nanopore as an invited speaker at the Oxford Nanopore user meeting.
Integrated supplementary information
Each blue dot represents a single 2D nanopore run. N = 13
(a) The total counts and (b) the log-likelihood of the adjacent SVs phased by the multi-breakpoint long reads. Red count indicates observation > 2X expected. Blue count indicates observation < 0.5X expected. N = 2,374.
Nanopore read-to-genome alignments, junction sequences and affected genes are shown for each SV class. Micro-homologous sequences shared between junctions are highlighted in red boxes. (a) TDJ. (b) INS. (c) DEL. (d) INV. (e) TLC. The t(1;8) translocation identified here at base resolution is consistent with the translocation identified previously by spectral karyotyping (SKY)32. (f) PCR fragments amplified across the breakpoints of each SV shown in (a)-(e) were analyzed by Bioanalyzer (Agilent Technologies). L: molecular size markers. Independent repeats = 2.
(a) Summary of the SVs validated by the PCR strategy. (b) Numbers of SVs called by LUMPY from different depths of short-read data. *: deletion and DEL in INDEL. **: thresholds used in SV calling by LUMPY (see Online Methods). ***: not called by the standard LUMPY pipeline. (c) Numbers of high-confidence SVs previously described in HCC1187 that were detected by nanopore sequencing.
(a) PCR products corresponding to different haplotypes in two validated SVs. Independent repeats = 2. (b) Reads supporting both the SV and the normal genotype at the same locus, visualized in the IGV browser. (c) Heterozygosity analysis of 50 randomly selected loci from each of the seven SV types.
Supplementary Figure 6 A comprehensive comparison of SV detection in long-read and short-read analyses.
LR, long-read. SR, short-read. (a) Numbers of SVs found in each data set and their overlaps. (b) Distributions of the SV span size.
Overview of the different components and features of Picky, Sniffles and NanoSV. "Yes" indicates that the pipeline can report the SV type; "N/A" indicates that it cannot.
(a) The span distribution of DEL, INS and INDEL. (b) Relative percentages of repeats across different span sizes in simple DEL. (c) Relative percentages of repeats across different span sizes in simple INS.
Supplementary Figure 9 Selected cases of micro-insertions from nanopore results confirmed by PacBio sequencing.
(a) A 36 bp insertion associated with a 329 bp deletion on chromosome 20. (b) A 75 bp insertion associated with a 3,262 bp deletion on chromosome X.
Supplementary Figure 10 Distribution of the SV breakpoints along the genomic features of transcription.
(a) Enrichment of breakpoints from each SV class. (b) Distributions of the breakpoints from different types of TDCs.
(a) Histogram of gene expression for the SVs-genes (log2 transformed). (b) Histogram of gene expression for the control genes, chosen to have a similar expression profile and an equivalent number of genes to the SVs-genes (log2 transformed). (c) MDS plot of the expression of the SVs-genes by sample-wise permutation. Red, non-TNBC tissues, n = 851. Green, TNBC tissues, n = 113. (d) MDS plot of the expression of the control genes. All data are from the breast carcinoma (BRCA) dataset within The Cancer Genome Atlas (TCGA). Red, non-TNBC tissues, n = 851. Green, TNBC tissues, n = 113.
For High-scoring Segment Pair (HSP) i, the read segment and the reference segment are denoted by Q_i and S_i, respectively. A linked alignment extension with 3 segments has 3 HSPs, indicated by Q_1:S_1, Q_2:S_2 and Q_3:S_3. Each segment span is denoted by (start,end] as per the UCSC 0-start, half-open coordinate system. sDiff between reference segments S_i and S_{i+1} is given by S_{i+1}(start) - S_i(end). qDiff between read segments Q_i and Q_{i+1} is given by Q_{i+1}(start) - Q_i(end).
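The sDiff and qDiff definitions in the caption above can be illustrated with a minimal sketch. The `HSP` tuple and the `segment_diffs` function are hypothetical names introduced here for illustration and are not taken from the Picky code base. The sketch also assumes that all segments lie on the same reference sequence and strand, which the caption does not state.

```python
from typing import List, NamedTuple, Tuple


class HSP(NamedTuple):
    """One high-scoring segment pair (hypothetical container for illustration)."""
    q_start: int  # read (query) segment start
    q_end: int    # read (query) segment end
    s_start: int  # reference (subject) segment start
    s_end: int    # reference (subject) segment end


def segment_diffs(hsps: List[HSP]) -> List[Tuple[int, int]]:
    """Return (sDiff, qDiff) for each pair of consecutive HSPs.

    sDiff = S_{i+1}(start) - S_i(end)
    qDiff = Q_{i+1}(start) - Q_i(end)
    Assumes the HSPs are ordered along the read and share one
    reference sequence and strand.
    """
    diffs = []
    for cur, nxt in zip(hsps, hsps[1:]):
        s_diff = nxt.s_start - cur.s_end
        q_diff = nxt.q_start - cur.q_end
        diffs.append((s_diff, q_diff))
    return diffs


# Three linked segments, Q1:S1, Q2:S2 and Q3:S3 (made-up coordinates).
hsps = [
    HSP(q_start=0, q_end=100, s_start=1000, s_end=1100),
    HSP(q_start=100, q_end=250, s_start=1600, s_end=1750),
    HSP(q_start=260, q_end=400, s_start=1750, s_end=1890),
]
print(segment_diffs(hsps))  # [(500, 0), (0, 10)]
```

In this made-up example, the first junction has a 500-base gap on the reference with no gap on the read, and the second has 10 extra read bases with no gap on the reference. How such values map to SV classes depends on thresholds that are not given in this caption.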
(a) The ratio of the observed versus expected instances of all 1,024 5-mers. The 4 under-called homopolymers are highlighted. (b) The annotated current trace for the segment harboring the basecalled deletion. The trace clearly indicates the existence of the two homopolymers (marked (A)20 and (T)18) rather than a deletion flanked by (A)5 and (T)5.
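An observed-versus-expected 5-mer ratio of the kind plotted in panel (a) could be computed roughly as follows. This is a sketch under stated assumptions: the caption does not say how the expected counts were derived, so the code assumes that "expected" means the 5-mer frequency in a matching reference sequence and "observed" means the frequency in basecalled reads. The function names are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict, Optional

K = 5  # 4**5 = 1,024 possible 5-mers


def kmer_counts(seq: str, k: int = K) -> Counter:
    """Count all k-mers in seq that consist only of A, C, G and T."""
    seq = seq.upper()
    valid = set("ACGT")
    return Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= valid
    )


def observed_expected_ratio(
    read_seq: str, ref_seq: str, k: int = K
) -> Dict[str, Optional[float]]:
    """Ratio of observed (read) to expected (reference) k-mer frequency.

    ASSUMPTION: expected frequencies come from the reference sequence.
    A k-mer absent from the reference gets None, since its ratio is undefined.
    """
    obs = kmer_counts(read_seq, k)
    exp = kmer_counts(ref_seq, k)
    obs_total = sum(obs.values())
    exp_total = sum(exp.values())
    ratios: Dict[str, Optional[float]] = {}
    for kmer in map("".join, product("ACGT", repeat=k)):
        if exp[kmer] == 0 or obs_total == 0:
            ratios[kmer] = None
        else:
            ratios[kmer] = (obs[kmer] / obs_total) / (exp[kmer] / exp_total)
    return ratios


# Toy example: the read shortens a 10-base poly(A) run to 6 bases.
ref = "A" * 10 + "CGT" * 3
read = "A" * 6 + "CGT" * 3
ratios = observed_expected_ratio(read, ref)
print(len(ratios), ratios["AAAAA"] < 1.0)
```

In the toy example the homopolymer 5-mer AAAAA is under-represented in the read relative to the reference, giving a ratio below 1, which is the pattern the caption describes for under-called homopolymers.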
Overview of the process of assigning breakpoints to their corresponding genomic features on the basis of the gene model.
Supplementary Figures 1–14
Supplementary Note 1
Summary of the 15 nanopore runs in this study
Summary of the mapping and SV-calling results
List of seven SV types detected in nanopore data
SVs selected for validation analysis
Details of nanopore sequencing kits, devices, and software
List of all primers used in this study
About this article
Cite this article
Gong, L., Wong, CH., Cheng, WC. et al. Picky comprehensively detects high-resolution structural variants in nanopore long reads. Nat Methods 15, 455–460 (2018). https://doi.org/10.1038/s41592-018-0002-6