In complex DNA samples, the sequences of biological interest often comprise only a fraction of the total DNA, which creates a need for enrichment strategies to avoid wasting sequencing bandwidth on uninformative reads. Two new papers in Nature Biotechnology report methods for targeted sequencing of complex DNA samples, achieved in real time during nanopore sequencing runs.
Strategies for targeted sequencing generally involve enrichment of the DNA of interest before sequencing, which is achieved by selective capture or amplification of the intended DNA. However, the oligonucleotides for target DNA enrichment are not readily customizable on a case-by-case basis, which motivates the pursuit of more adaptable methods for targeted sequencing.
Published in 2016, ‘Read Until’ is a method of targeted nanopore sequencing whereby the targeting occurs during the sequencing itself: by collecting data in real time for the first part of a read, ‘unwanted’ DNA molecules can be ejected from the pore by reversing the current in that channel, thus freeing up the channel for a new DNA molecule. As only intended DNA molecules are sequenced to full length, the sequencing bandwidth is focused on the DNA sequences of interest.
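The decision loop described above can be sketched in a few lines. This is a minimal illustration, not the published implementation: the function names, the `(channel, prefix)` data layout and the toy classifier are all hypothetical, and a real system would talk to the sequencer's control software rather than return a list.

```python
from typing import Callable, Iterable, List, Tuple


def read_until_decisions(
    read_prefixes: Iterable[Tuple[int, str]],
    is_target: Callable[[str], bool],
) -> List[Tuple[int, str]]:
    """Decide, per channel, whether to keep sequencing or eject the molecule.

    read_prefixes: (channel number, data from the first part of the read).
    is_target: hypothetical classifier that returns True for wanted DNA.
    """
    decisions = []
    for channel, prefix in read_prefixes:
        # Wanted molecules are sequenced to full length; unwanted ones are
        # ejected (in hardware, by reversing the current in that channel).
        action = "continue" if is_target(prefix) else "eject"
        decisions.append((channel, action))
    return decisions


def toy_is_target(prefix: str) -> bool:
    """Stand-in classifier (assumption): targets start with 'ACG'."""
    return prefix.startswith("ACG")


decisions = read_until_decisions([(1, "ACGTTA"), (2, "TTGACA")], toy_is_target)
print(decisions)
```

The classifier is passed in as a parameter because it is the step that differs between implementations: it may compare raw signal or base-called sequence against a reference.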
The original implementation of Read Until used the raw sequencing signals (the pattern of electrical changes as the DNA passes through the pore) and compared them to simulated raw signals derived from a reference sequence database to determine whether sequencing of that DNA molecule should continue. However, the original Read Until algorithm was computationally intensive and hence became infeasible for reference databases larger than tens of kilobases. Both new studies sought to develop more efficient bioinformatic strategies that extend real-time targeted sequencing to complex genomics applications.
In their study, Kovaka et al. devised a method called UNCALLED, which leverages a Ferragina–Manzini index to efficiently search for sequences within a DNA reference database that are compatible with the emerging raw signals.
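To give a sense of why a Ferragina–Manzini (FM) index makes such searches efficient, the following is a textbook sketch of FM-index backward search over a plain nucleotide string. It is not UNCALLED's algorithm, which works from raw signal; the function names and the toy reference are assumptions made for illustration, and the naive suffix sorting used here would not scale to megabase references.

```python
def build_fm_index(text):
    """Build a small FM index (suffix array, C table, occurrence table)."""
    text = text + "$"  # sentinel that sorts before A, C, G and T
    # Naive suffix array construction; fine for a toy example only.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # Burrows-Wheeler transform: the character preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # C[c]: number of characters in the text that are smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return sa, C, occ


def count_occurrences(pattern, index):
    """Count exact matches of pattern using backward search."""
    sa, C, occ = index
    lo, hi = 0, len(sa)  # half-open range of matching suffixes
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


index = build_fm_index("ACGTACGA")  # toy reference sequence
print(count_occurrences("ACG", index))
```

Each query step costs a constant number of table lookups, so the search time depends on the length of the query rather than on the size of the reference.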
The authors demonstrated the applicability of their approach for targeted sequencing of Saccharomyces cerevisiae DNA within a ZymoBIOMICS metagenomic pool also containing DNA from seven species of bacteria. By ejecting any DNA molecules that mapped to the combined 29 Mb of bacterial reference sequences, the authors achieved 4.46-fold enrichment of S. cerevisiae sequence relative to an untargeted control experiment.
For application to human targeted sequencing, Kovaka et al. used an 18 Mb reference database of 148 genes linked to human hereditary cancer predisposition. In this case, the reference database was used as the targeted list rather than the excluded list, and the team demonstrated 5.5-fold enrichment of this gene set from human DNA.
Finally, although not demonstrated in practice, Kovaka et al. used simulations to report that UNCALLED would be feasible for enrichment using larger reference databases, such as a 111 Mb reference consisting of the COSMIC set of 717 genes mutated in cancer.
In their study, Payne et al. instead operated in regular nucleotide space; hence, their method, termed readfish, focused on optimizing the efficiency of base-calling during sequencing, which was achieved by using a graphics processing unit (GPU).
Similarly to Kovaka et al., Payne et al. showed that S. cerevisiae sequence could be enriched from a ZymoBIOMICS metagenomic sample, achieving 5.7-fold enrichment. As an elegant illustration that the lack of requirement for target-specific oligonucleotides provides valuable adaptability of targeting, Payne et al. showed that the reference database could be dynamically updated during a sequencing run to focus the sequencing bandwidth on DNA from species that had not yet reached a chosen threshold of coverage.
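The idea of updating targets during a run can be sketched as follows. This is a hypothetical illustration and not readfish's code or configuration: the function names, the coverage values and the threshold are invented for the example.

```python
from typing import Dict, Set


def species_still_needed(coverage: Dict[str, float], threshold: float) -> Set[str]:
    """Return the species whose coverage is still below the chosen threshold."""
    return {name for name, depth in coverage.items() if depth < threshold}


def decide(mapped_species: str, needed: Set[str]) -> str:
    """Keep sequencing reads from species that still need coverage."""
    return "continue" if mapped_species in needed else "eject"


# Illustrative numbers only (assumption), not results from either study.
threshold = 30.0
coverage = {"S. cerevisiae": 3.0, "E. coli": 45.0, "B. subtilis": 28.0}

needed_early = species_still_needed(coverage, threshold)

# Later in the run, one species crosses the threshold and drops off the list.
coverage["B. subtilis"] = 31.0
needed_late = species_still_needed(coverage, threshold)

print(sorted(needed_early), sorted(needed_late))
```

Because the target set is recomputed from running coverage totals, no new reagents are needed to change what is enriched; only the list consulted at decision time changes.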
Payne et al. additionally showed that readfish is applicable to complex targeted human DNA sequencing by experimentally demonstrating several-fold enrichment of various subsets of genomic DNA: a panel of 717 COSMIC genes (89.9 Mb) and a whole-exome panel (176 Mb), as well as several whole chromosomes representing up to half of the entire genome.
Furthermore, both teams showed that various single-nucleotide variants and structural variants could be identified in the targeted human genome sequencing data.
Overall, real-time targeted sequencing holds great promise as a rapid and adaptable sequencing approach for various applications in both human and microbial genomics.
Kovaka, S. et al. Targeted nanopore sequencing by real-time mapping of raw electrical signal with UNCALLED. Nat. Biotechnol. https://doi.org/10.1038/s41587-020-0731-9 (2020)
Payne, A. et al. Readfish enables targeted nanopore sequencing of gigabase-sized genomes. Nat. Biotechnol. https://doi.org/10.1038/s41587-020-00746-x (2020)
Logsdon, G. A., Vollger, M. R. & Eichler, E. E. Long-read human genome sequencing and its applications. Nat. Rev. Genet. 21, 597–614 (2020)
Burgess, D.J. Complex targeted sequencing in real time. Nat Rev Genet 22, 67 (2021). https://doi.org/10.1038/s41576-020-00324-6