The timely detection and surveillance of infectious diseases and responses to pandemics are crucial but challenging. Whole-genome sequencing (WGS) is a common tool for pathogen identification and tracking, establishing transmission routes and outbreak control.

At the turn of 2019/20, Wu et al.1 used metagenomic RNA sequencing to identify the aetiology of a then-unknown respiratory disease in a single patient in Wuhan, China, where several cases of severe respiratory infection had been reported. The authors identified the potential causative pathogen as a new coronavirus by reconstructing the viral genome from the bronchoalveolar lavage fluid sample of the patient. In early January 2020, the viral genome sequence was released, which facilitated the development of rapid molecular diagnostic assays worldwide. Subsequently, the virus (now known as SARS-CoV-2, which causes the ongoing coronavirus disease 2019 (COVID-19) pandemic) rapidly spread globally, and there has been an immediate effort to study viral transmission and evolution using WGS. Examples of such efforts include SARS-CoV-2 Sequencing for Public Health Emergency Response, Epidemiology and Surveillance (SPHERES) in the United States and the COVID-19 Genomics UK (COG-UK) consortium2 in the United Kingdom.

The latter consortium was launched in March 2020 as a nationwide genomic surveillance network that aims to track viral transmission, identify viral mutations and integrate viral data with health data2. By June 2020, the consortium had sequenced >20,000 SARS-CoV-2 genomes and defined transmission lineages based on phylogeny. Open data sharing and standardized lineage definitions (Global Initiative on Sharing All Influenza Data (GISAID)) were established to enable global efforts in detecting emerging lineages and mutations that are relevant for outbreak control and vaccine development on an international level3. By the end of June 2020, >57,000 SARS-CoV-2 genomes from around 100 different countries had been deposited in the GISAID database. To overcome the challenges of data analysis and interpretation, user-friendly web-based applications were designed to assign lineages (Pangolin COVID-19 Lineage Assigner) and to interactively visualize the circulating lineages on national and international scales (for example, Microreact and Nextstrain).

The international effort towards open data sharing is of major scientific benefit, enabling monitoring of SARS-CoV-2 evolution in nearly real time and on a global level. Korber et al.4 developed a bioinformatics pipeline to track changes in the SARS-CoV-2 spike glycoprotein, which mediates host cell entry and is a key vaccine target. The pipeline monitors changes in the amino acid sequence of spike over time to identify variants that are concomitantly increasing in frequency in different geographic locations. The analysis, which was enabled by data from GISAID, suggested that a SARS-CoV-2 variant carrying a particular spike mutation (D614G) became globally dominant over a period of one month. Comparison of different regions revealed a consistent pattern of the G614 variant replacing a previously established D614 variant, which might indicate positive selection. The viral genome data were linked with patient clinical information, which showed that the G614 variant might be associated with higher viral loads but not with disease severity. Updated data and current global counts of the spike 614 variants are available in the COVID-19 Viral Genome Analysis Pipeline.
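The core idea of tracking a spike variant over time and across regions can be illustrated with a minimal Python sketch. This is not the pipeline of Korber et al.; the function names, the monthly binning, the simple "later share exceeds earlier share" criterion and the toy records are all assumptions made for illustration only.

```python
from collections import defaultdict
from datetime import date


def variant_frequency_by_period(records, variant="G"):
    """Return the share of sequences carrying `variant` at spike position 614,
    keyed by (region, (year, month)).

    `records` is an iterable of (region, collection_date, residue_at_614)
    tuples. This input layout is a hypothetical simplification.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [variant count, total count]
    for region, collected, residue in records:
        key = (region, (collected.year, collected.month))
        counts[key][1] += 1
        if residue == variant:
            counts[key][0] += 1
    return {key: hits / total for key, (hits, total) in counts.items()}


def regions_with_rising_variant(frequencies):
    """List regions where the variant share in the latest sampled month is
    higher than in the earliest sampled month (a deliberately crude criterion)."""
    by_region = defaultdict(list)
    for (region, period), freq in frequencies.items():
        by_region[region].append((period, freq))
    rising = []
    for region, series in by_region.items():
        series.sort()  # chronological order by (year, month)
        if len(series) >= 2 and series[-1][1] > series[0][1]:
            rising.append(region)
    return sorted(rising)


# Made-up toy records, not real surveillance data.
toy_records = [
    ("Region A", date(2020, 2, 10), "D"),
    ("Region A", date(2020, 2, 20), "D"),
    ("Region A", date(2020, 3, 5), "G"),
    ("Region A", date(2020, 3, 15), "G"),
    ("Region A", date(2020, 3, 25), "D"),
    ("Region B", date(2020, 2, 12), "D"),
    ("Region B", date(2020, 3, 1), "D"),
]

frequencies = variant_frequency_by_period(toy_records)
print(regions_with_rising_variant(frequencies))
```

A variant whose share rises in many independently sampled regions at once is the kind of signal described above; a real analysis would also need to account for sampling bias, small counts and statistical significance, which this sketch ignores.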

Genomic surveillance can generate a rich source of information for tracking pathogen transmission and evolution on both national and international levels. More importantly, the recent application of genomics in surveillance of COVID-19 highlighted its usefulness in the nearly real-time response to a public health crisis.