Long-read sequencing powers a more complete reading of genomic information.
This June, we published a special issue highlighting the success of the Telomere-to-Telomere (T2T) Consortium in presenting the first complete human genome. This achievement was made possible by a wide range of experimental and computational efforts. Among them was long-read sequencing, the main sequencing technology responsible for generating the T2T data, which arguably laid the foundation of this feat. Yet the work from the T2T Consortium is only one example of the vast number of discoveries long-read sequencing is enabling in reading genomes, transcriptomes and epigenomes in humans and other species. For its momentous methodological advancement and broad application, we have chosen long-read sequencing as our Method of the Year 2022.
Since the advent of next-generation sequencing nearly two decades ago, the pace of technological innovation has never slowed. Although powerful algorithms strive to connect short reads relying on overlapping sequences, the sheer length and complexity of many genomes pose severe hurdles in generating complete sequences, often resulting in many missing parts and errors. This motivated the development of various strategies for long-read sequencing. The two most widely used commercial technologies are Pacific Biosciences’ Single Molecule Real-Time (SMRT) sequencing (average read length ~20 kb with >99.9% accuracy for HiFi reads) and Oxford Nanopore Technologies’ nanopore sequencing (average read length ~100 kb for ultra-long reads, ~99% accuracy for R10.4). Their distinct sequencing principles and approaches to data generation yield sequencing reads with varied lengths, error rates and throughputs. Depending on the application, researchers may find one long-read sequencing technology better suited to their research goals and resource requirements, and both techniques are continually evolving. In a Technology Feature in this issue, Vivien Marx highlights voices from several researchers developing and applying long-read sequencing in various areas, including interesting stories from its early days and perspectives on the future.
As in many other fields where new technologies are emerging, computational methods play a vital role in translating the rich information embedded in long-read sequences into biological discoveries. A Comment from Michael Schatz and colleagues highlights such developments. Active method development is ongoing for many long-read data analysis tasks, ranging from identifying different bases and chemical modifications in DNA and RNA to genome assembly and genome variation detection. One promising direction is to apply advanced statistical and machine learning methods, which have shown remarkable performance in other fields for many computationally challenging tasks. They are increasingly becoming the core elements of the toolbox for long-read data analysis, and we expect the trend to continue into the future.
Enabled by the multitude of method developments, long-read sequencing has found applications in almost all the major areas of genomics. Karen Miga, who co-leads the T2T Consortium, and colleagues present a Comment on applying long-read sequencing in discovering and analyzing genetic variation. As demonstrated by their T2T work, long-read data shines light on many previously dark regions of the genome, such as telomeres and other highly repetitive regions and complex structural variations. With the launch of other large-scale endeavors such as the Vertebrate Genomes Project, more high-quality genomes from human and other species are on the horizon.
Besides genomes, the study of transcriptomes, which are dynamic and tissue- and cell-type-specific in nature, also benefits considerably from long-read sequencing. As explained in a Comment from Hagen Tilgner and colleagues, long-read sequencing holds the potential to unveil the hidden complexity of transcriptomes, such as isoform structure and expression, down to the level of a single cell. Given the paramount role of gene regulation and intra- and intermolecular interaction in isoform diversity, this knowledge will lead to a more quantitative and complete understanding of transcriptomic dynamics and its underlying mechanisms.
Another exciting dimension of genomics where long-read sequencing is seeing substantial traction is epigenomics and epitranscriptomics. A Comment from Eva Maria Novoa and colleagues provides an overview of this fast-moving area, which is boosted by long-read sequencing’s ability to detect chemical modifications in DNA and RNA. Unlike standard chemical- or antibody-based detection methods, direct analysis of the nanopore sequencing signal, as an example, has been shown to enable reading of different types of modifications. Considering the vast number of different DNA and RNA modifications, with many being underdetected and understudied, long-read sequencing opens a door to exciting discoveries about their distribution and functional significance.
The final Comment comes from Mads Albertsen, who highlights the surging area of applying long-read sequencing to microbial genomics and metagenomics. One common challenge when studying microbial genomes is that samples are often composed of a community of microbes, with individual species hard to separate or culture. With the help of long-read sequencing, high-quality metagenome-assembled genomes are now more than ever within reach. Such efforts will greatly accelerate our exploration of genomic information spanning the whole tree of life.
Despite its power, long-read sequencing technology is not yet perfect. Besides the unceasing race to generate longer reads with higher accuracy, optimizing cost-effectiveness is another crucial consideration for improving its accessibility to more research communities. Nor does the technology exist in isolation. Combined with other genomic methods, long-read sequencing has nurtured new frontiers for method development and biological research. We hope you share our excitement when reading this special issue, in which we also cover a number of Methods to Watch. We wish you a very happy 2023!
Cite this article
Method of the Year 2022: long-read sequencing. Nat Methods 20, 1 (2023). https://doi.org/10.1038/s41592-022-01759-x