In June 2020, the World Health Organization (WHO) SARS-CoV-2 evolution working group was established to track SARS-CoV-2 variants and their specific genetic changes and to monitor viral characteristics and their impact on medical and non-medical countermeasures, including vaccines against COVID-19. In November 2021, this working group transitioned to a formal WHO Technical Advisory Group on Virus Evolution (TAG-VE), with the aim of developing and implementing a global risk-monitoring framework for SARS-CoV-2 variants, based on a multidisciplinary approach that includes in silico, virological, clinical and epidemiological data.

Tracking variants

The main role of the TAG-VE is to function as an integrative forum for the exchange of information from global surveillance and research studies to monitor early warning signals, and to perform timely assessment of the need for public health action in response to emerging variants1. It uses a Delphi consensus method to establish which emerging variants are considered variants of interest (VOIs) or variants of concern (VOCs) (definitions, ref. 2). To avoid stigmatizing countries that first identify and report variants, a naming scheme that follows WHO guidelines3 and assigns Greek letters to VOIs and VOCs was adopted for global discourse in June 2021 (ref. 4). As of 26 March 2022, eight VOIs and five VOCs had been designated, and they have been further described as previously or currently circulating variants to reflect changes in their epidemiology over time2,5.

The first available data on emerging variants that are assessed by TAG-VE are viral sequences and their associated metadata shared on publicly accessible genetic sequence databases (e.g., GISAID, GenBank, the European Nucleotide Archive, and the DNA Data Bank of Japan). Because all current VOIs and VOCs originated from ancestral variants, available sequences are still compared with that of the 2019 index virus (GISAID accession ID: EPI_ISL_402124), with the primary focus on regions of the genome known to encode proteins that are important for viral function, immunity or infection. Mutations in the gene encoding the spike protein are given highest priority, as they have the highest probability of being clinically important. The spike protein contains the receptor-binding domain that is essential for docking of the virus on host cells, as well as major determinants of both viral transmissibility (for example, the polybasic cleavage site) and antigenic makeup (the N-terminal domain and receptor-binding domain). However, mutations in genes encoding proteins other than spike have also been found to be important and should not be neglected6. The constellation of mutations detected in any emerging variant is compared with a list of annotated mutations that are suspected or known to have a role in one or more of the viral characteristics included in the definitions of VOI and VOC: transmissibility, immune escape, disease severity, detectability and susceptibility to available treatments. Figure 1 depicts the evolution of SARS-CoV-2 and the variant-defining sets of amino acid substitutions in spike seen in VOCs. Thus far, researchers have studied more than two dozen spike proteins, including those with phenotypes that confer immune escape (Fig. 2).
The list of mutations and their associated phenotypic impact, directly demonstrated in laboratory studies or inferred through the use of in silico methods, is updated regularly on the basis of timely research presentations from the WHO partner network, as well as searches of preprints and peer-reviewed published literature2. This annotated list of mutations has proven useful because the same mutations have arisen independently in different lineages of SARS-CoV-2, which strongly suggests viral adaptation to the human host and selective pressure from population immunity.
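Comparing a variant's constellation against the annotated list can be sketched as a simple lookup. This is a minimal illustration, assuming the list is held as a mapping from spike substitution to phenotype labels; the entries, labels and function names (`ANNOTATED`, `annotate`, `unannotated`) are hypothetical and simplified, not the WHO list or its format.

```python
# Illustrative, simplified annotations: substitution -> phenotype labels.
# NOT the WHO annotated list; entries are examples only.
ANNOTATED: dict[str, set[str]] = {
    "D614G": {"infectivity"},
    "E484K": {"immune escape"},
    "N501Y": {"ACE2 binding"},
    "P681H": {"S1-S2 cleavage"},
}


def annotate(constellation, annotated=ANNOTATED):
    """Group the substitutions found in the annotated list by phenotype label."""
    hits: dict[str, list[str]] = {}
    for sub in constellation:
        for phenotype in annotated.get(sub, set()):
            hits.setdefault(phenotype, []).append(sub)
    return {phenotype: sorted(subs) for phenotype, subs in hits.items()}


def unannotated(constellation, annotated=ANNOTATED):
    """Substitutions with no annotation yet: candidates for follow-up study."""
    return sorted(set(constellation) - set(annotated))


# Hypothetical constellation, for illustration only.
variant = ["E484K", "N501Y", "A222V"]
print(annotate(variant))
print(unannotated(variant))
```

Matching only says which known signals are present; as the next paragraph of the article stresses, a match does not by itself establish an increased threat.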

Fig. 1: Overview of VOCs and amino acid substitutions in spike.

Spike structure of the index virus (far left), together with key domains (highlighted in different colors), including the N-terminal domain (NTD), the receptor-binding domain (RBD), the S1–S2 junction (including the polybasic cleavage site), and the heptapeptide repeat domains (HR1 and HR2). For the five VOCs (middle and right), blue lines with dots indicate the locations of variation in each domain, and blue dots in the adjacent protein structures indicate the locations of variation in that structure. The VOCs are plotted chronologically according to date of identification.

Notably, the detection of a new constellation of mutations in a variant does not necessarily translate into an increased public health threat, which suggests that some mutations, or combinations of mutations, may impart a fitness cost rather than a benefit. For example, the VOI Theta had a constellation of mutations that alerted scientists because it included top-ranked mutations such as E484K, N501Y, D614G and P681H in spike (Fig. 2), but its spread remained very limited. To establish whether a new variant poses a serious threat, TAG-VE looks for early epidemiological signals of spread and clinical signals derived from surveillance data or specific studies. This includes assessing how quickly cases are increasing, the geographical areas and population subgroups in which a variant emerges and spreads, and changes in disease severity indicators, such as hospitalization. Careful consideration is also given to evidence of transmissibility relative to that of other circulating variants, including secondary attack rates observed in household transmission studies. Finally, the assessment of the threat posed by a variant also needs to consider vaccine- or infection-derived population immunity, which has progressively grown to high levels in many countries and has become more complex owing to the different permutations of hybrid immunity, that is, the immunity elicited by both vaccination and infection7. Therefore, studies of the prevalence of re-infections or vaccine breakthroughs are also reviewed, where available.
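Two of the epidemiological signals mentioned above, growth relative to other variants and household secondary attack rates, reduce to simple arithmetic. The sketch below assumes weekly sequence counts for the new variant and for all other variants, and estimates the weekly growth advantage as the least-squares slope of the log-odds of the variant's share. This is one common approach, not necessarily the method TAG-VE uses, and the function names and counts are hypothetical.

```python
import math


def log_odds_growth(variant_counts, other_counts):
    """Least-squares slope of log(variant / other) per time step (e.g. per week).

    Assumes all counts are positive and at least two time points; zero counts
    would need a continuity correction or a proper logistic regression.
    """
    ys = [math.log(v / o) for v, o in zip(variant_counts, other_counts)]
    ts = range(len(ys))
    t_mean = sum(ts) / len(ys)
    y_mean = sum(ys) / len(ys)
    num = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    den = sum((t - t_mean) ** 2 for t in ts)
    return num / den


def secondary_attack_rate(secondary_cases, household_contacts):
    """Share of exposed household contacts who became infected."""
    return secondary_cases / household_contacts


def relative_sar(cases_new, contacts_new, cases_ref, contacts_ref):
    """SAR of the new variant divided by SAR of a reference variant."""
    return (secondary_attack_rate(cases_new, contacts_new)
            / secondary_attack_rate(cases_ref, contacts_ref))


# Made-up weekly counts in which the new variant's odds double each week.
r = log_odds_growth([5, 10, 20, 40], [500, 500, 500, 500])
print(f"slope {r:.3f}/week, odds ratio per week {math.exp(r):.2f}")
print(relative_sar(30, 100, 20, 100))
```

A slope r means the odds of sampling the variant rather than the others multiply by exp(r) each week. Neither number separates intrinsic transmissibility from immune escape, which is why the article pairs them with immunity data.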

Fig. 2: Amino acid substitutions in spike with known impact.

Key amino acid substitutions (along top of plot), the lineages in which they occur (left margin), and whether each falls in the N-terminal domain, receptor-binding domain or S1–S2 polybasic cleavage site. Dark blue, VOC; light blue, VOI.

Access to samples

The experience gained since the launch of the virus evolution working group, which later became the TAG-VE, has shown that the speed with which new SARS-CoV-2 variants spread can outpace the current ability to assess their threat. Variant sequence determination has now become part of global surveillance, but systematic phenotypic characterization of emerging variants is a critical component that still needs to be added to the core surveillance toolbox. It is therefore important to define the gaps that must be addressed to enable better and more timely responses to new variant threats.

Although genomic indicators for transmissibility and immune escape of VOCs are conceptually relatively straightforward to assess, the wide diversity of assays, the lack of centralized biobanks of viruses and clinical specimens, the difficulties in international shipping of materials, and the globally fragmented funding landscape make it challenging to compare data generated in real time. To improve the global capacity to rapidly perform phenotypic characterization, viral isolates of emerging variants should be promptly generated and shared so that researchers in different laboratories can work with viruses that carry the same constellation of mutations. To address this challenge, the WHO is developing a bio-hub system and recently established a BioHub facility8. Its aim is to offer a reliable, safe and transparent mechanism for WHO Member States to voluntarily share novel biological materials, without replacing or competing with existing systems such as EVAg (European Virus Archive Global) and the US National Institutes of Health BEI (Biodefense and Emerging Infections) repository.

Sequence data

Another important challenge is the availability, representativeness and quality of genetic sequence data, which vary with sample quality and processing, the experience of the researchers handling samples, the platforms, protocols and analytical tools used, and access to all of these. This challenge is exemplified by the large bias in the volume of sequences contributed to global databases by a small number of countries, as well as by the release of sequences with low-confidence genomic regions and gaps. For example, amplicon-based methods are sensitive to mutations in primer-binding sites, which can cause amplification failures that lead to sequence errors or poor-quality sequence in certain regions of the genome. Continued viral diversification calls for sustained, if not enhanced, investment from governments in the capacity of reference laboratories to meet the global public health demand for high-quality sequences and viral characterization of SARS-CoV-2, capacity that can then be used to address future public health threats posed by any infectious disease.
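A basic completeness check on a consensus genome can flag the low-confidence regions and gaps described above. This is a minimal sketch assuming ambiguous positions are encoded as `N`; the thresholds (`max_n_fraction`, `max_n_run`) are hypothetical placeholders, not a GISAID or WHO criterion, and real pipelines also examine raw-read depth, frameshifts and unexpected private mutations.

```python
def qc_summary(seq, max_n_fraction=0.05, max_n_run=500):
    """Completeness check on a consensus genome where ambiguous bases are 'N'.

    The default thresholds are hypothetical placeholders, not an official criterion.
    """
    seq = seq.upper()
    longest = current = 0
    for base in seq:
        if base == "N":
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    # An empty sequence is treated as entirely missing.
    n_fraction = seq.count("N") / len(seq) if seq else 1.0
    return {
        "n_fraction": n_fraction,
        "longest_n_run": longest,
        "pass": n_fraction <= max_n_fraction and longest <= max_n_run,
    }


# A toy sequence with a 600-base dropout, e.g. from failed amplicons.
toy = "ACGT" * 5000 + "N" * 600 + "ACGT" * 2000
print(qc_summary(toy))
```

The toy genome passes on overall missingness (about 2%) but fails on the contiguous gap, which is the pattern a localized amplicon dropout produces; checking only the total fraction would miss it.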

Verification of sequence quality through analysis of raw reads is an essential quality check, and it becomes even more important for the detection of recombination between SARS-CoV-2 genomes. Verified detections of SARS-CoV-2 circulating recombinant forms have increased since the Omicron VOC emerged, most likely because of both the greater availability of genomic surveillance and natural factors. The larger number of lineage-defining mutations in Omicron makes recombinant forms easier to detect, and re-infection with immune-escape variants increases the chances of co-infection and therefore of recombination, especially if two variants co-circulate in the community at high levels. Although it is impossible to predict whether circulating recombinant forms with specific genomic breakpoints will become more transmissible, a recombinant can spread widely only if it inherits from its parental viruses traits that provide a selective advantage.
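Why more lineage-defining mutations make recombinants easier to detect can be seen in a simple marker-based scan: each site that distinguishes the two putative parents is assigned to one of them, and a switch in assignment brackets a candidate breakpoint. More distinguishing sites bracket breakpoints more tightly. The sketch assumes known marker positions for each parent; the positions, bases and function names are hypothetical, and any candidate found this way would still need verification from raw reads.

```python
def assign_sites(query, markers_a, markers_b):
    """Assign each site that distinguishes parent A from parent B.

    All arguments map genome position -> alternative base; a position missing
    from a mapping means that genome carries the reference base there.
    """
    calls = []
    for pos in sorted(set(markers_a) | set(markers_b)):
        a, b, q = markers_a.get(pos), markers_b.get(pos), query.get(pos)
        if a == b:
            continue  # uninformative: both parents agree at this site
        if q == a:
            calls.append((pos, "A"))
        elif q == b:
            calls.append((pos, "B"))
        # otherwise the query matches neither parent and the site is skipped
    return calls


def candidate_breakpoints(calls):
    """Intervals (last site of one parent, first site of the other) where assignment switches."""
    return [(p1, p2) for (p1, s1), (p2, s2) in zip(calls, calls[1:]) if s1 != s2]


# Hypothetical marker positions and bases, for illustration only.
parent_a = {100: "T", 200: "G", 300: "A"}
parent_b = {150: "A", 400: "C", 500: "T"}
query = {100: "T", 200: "G", 400: "C", 500: "T"}
calls = assign_sites(query, parent_a, parent_b)
print(calls)
print(candidate_breakpoints(calls))
```

A single miscalled base can mimic a switch, which is one reason the article stresses raw-read verification before a recombinant is reported.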

Infectivity and virulence

Immune escape of VOCs can be shown through live virus-neutralization studies, and results from different laboratories have been similar. In contrast, there are many methodological gaps in how to assess infectivity and virulence in vitro before clinical and epidemiological data become available. Some critical sites and amino acid substitutions that influence determinants of infectivity, such as receptor binding, cleavage at the S1–S2 junction of the spike protein, and cellular entry, have been identified (Table 1), but this list is not comprehensive. In vivo animal studies can help elucidate specific features that may be difficult to estimate from epidemiological data. One example is cross-neutralization between variants, which is challenging to assess in humans because, unlike experimental animals, people may have unknown prior exposures. Animal models can also help in the assessment of virulence without the confounding effects of background immunity, but it remains to be seen how animal data correlate with disease severity in humans9,10,11,12.
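Neutralization results are commonly summarized as geometric mean titers (GMTs) and the fold reduction between the reference virus and a variant. The sketch below assumes titers are reciprocal dilutions above the assay's limit of detection (censored values would need explicit handling), and the example titers are made up.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def fold_reduction(titers_ref, titers_variant):
    """GMT against the reference virus divided by GMT against the variant.

    Values > 1 indicate reduced neutralization of the variant. Titers are
    assumed to be reciprocal dilutions above the limit of detection.
    """
    return geometric_mean(titers_ref) / geometric_mean(titers_variant)


# Made-up titers for the same three sera tested against both viruses.
ref = [160, 320, 640]
variant = [20, 80, 80]
print(round(fold_reduction(ref, variant), 2))
```

For paired sera, GMT(reference)/GMT(variant) equals the geometric mean of the per-serum ratios, so the two common ways of reporting fold change agree; differences between laboratories usually come from assay, cell line and serum panel rather than from this arithmetic.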

Table 1: Amino acid substitutions in spike associated with laboratory evidence of phenotypic impact

Disease severity in the clinical setting can be especially difficult to determine rapidly and accurately because of ascertainment and inclusion biases, differences in quality of care, and the confounding and collider biases inherent in observational studies. Interoperable electronic health records will have a key role, as they should allow easy linkage of sequencing data with patients’ clinical records. Unbiased and systematic collection of clinical and epidemiological data, with comprehensive biological sampling and rigorous characterization of virulence and correlates of protection, remains the gold standard for variant threat assessment. Generating such data from well-designed studies across diverse healthcare settings and geographies is expected to remain challenging, especially once the response to COVID-19 is de-escalated in the post-acute phase of the pandemic. Integrating such studies into the work plans of newly established routine surveillance systems may help ensure that clinical and epidemiological studies continue in the future.

Until November 2021, the emergence of VOCs such as Alpha and Delta was associated mainly with an increase in transmissibility, most likely driven by viral adaptation to the human host, with modest degrees of immune escape13. Emerging evidence on Omicron suggests that immune escape was a substantial driver of its displacement of Delta, and that its selective advantage was driven by increasing population immunity, in addition to its increased transmissibility14. Although the lower virulence of Omicron was most likely a chance event, the preservation of cellular immunity that protects against severe disease is likely to be a recurring theme15,16. In addition, hybrid immunity in people who have experienced a breakthrough infection seems to broaden immune responses, which suggests that population immunity may be able to tolerate considerable continued evolution of SARS-CoV-217,18. However, because transmission and virulence are uncoupled for SARS-CoV-2, it cannot be assumed that the next variant will be less virulent. Future variants may have virulence similar to, higher than or lower than that of Omicron.

An early warning system

A recent retrospective analysis suggested that some of the key variant-defining mutations could have been detected much earlier, which highlights the potential of early warning bioinformatics tools applied to globally shared data, an approach that is still in its infancy19. The identification of key mutations could then trigger a downstream virus-, variant- or mutation-characterization pipeline. Machine learning algorithms to predict the potential impact of key mutations are being developed and validated, but full assessment of variants still requires epidemiological data and in vivo and/or in vitro experiments, as a VOC cannot be identified from genomic data alone.
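The article does not specify how such early warning tools work. As a loose illustration only, a rule could flag mutations whose sampled frequency is rising and that have appeared independently in several lineages, the convergence pattern noted earlier for the annotated list. The data layout, thresholds (`window`, `min_fold`, `min_lineages`) and names here are hypothetical, and a flag would only trigger further characterization, not establish a VOC.

```python
from dataclasses import dataclass


@dataclass
class MutationSeries:
    name: str
    carrying: list[int]  # sequences carrying the mutation, per week
    lineages: set[str]   # distinct lineages in which it has been observed


def flag_mutations(series, totals, window=2, min_fold=2.0, min_lineages=2):
    """Flag mutations whose frequency rose >= min_fold between the previous and
    the most recent `window` weeks and that occur in >= min_lineages lineages.

    `totals` holds all sequences per week and is assumed to be positive in both
    windows. All thresholds are hypothetical.
    """
    if len(totals) < 2 * window:
        raise ValueError("need at least 2 * window weeks of data")
    recent_total = sum(totals[-window:])
    before_total = sum(totals[-2 * window:-window])
    flagged = []
    for s in series:
        recent = sum(s.carrying[-window:]) / recent_total
        before = sum(s.carrying[-2 * window:-window]) / before_total
        rising = recent > 0 and (before == 0 or recent / before >= min_fold)
        if rising and len(s.lineages) >= min_lineages:
            flagged.append(s.name)
    return flagged


# Hypothetical weekly data for three mutations.
weekly_totals = [1000, 1000, 1000, 1000]
candidates = [
    MutationSeries("mut_1", [2, 3, 10, 20], {"lineage_x", "lineage_y"}),
    MutationSeries("mut_2", [50, 50, 50, 50], {"lineage_x", "lineage_y"}),
    MutationSeries("mut_3", [0, 0, 5, 9], {"lineage_x"}),
]
print(flag_mutations(candidates, weekly_totals))
```

Rising frequency in sequence databases is affected by where and how much sequencing is done, so in practice such a signal would need to be read alongside the sampling biases described in the Sequence data section.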

The future of the pandemic is difficult to predict, for several reasons. First, unlike variants of other respiratory viruses, such as human influenza viruses, SARS-CoV-2 VOCs have so far not emerged from the most recently dominant circulating virus20. Second, chronic infections in some patients can accelerate intra-host evolution21. Third, a wide range of susceptible mammals, including cervids, mustelids and rodents, may act as secondary reservoirs with the potential for reverse zoonoses22. Fourth, too much of the world’s vulnerable population remains unvaccinated, and current vaccines are suboptimal in preventing transmission. Given these challenges in predicting the evolution of the virus, targeted surveillance of suspected high-risk populations, such as chronically infected patients, and improved detection systems, for example through a One Health approach that also includes wastewater and animal surveillance, are critical for the early detection of future variants.

The evolving virus and the uncertainty of predicting the trajectory of the pandemic call for strengthened surveillance and continued monitoring of SARS-CoV-2. The TAG-VE will continue to critically appraise state-of-the-art methodologies for predicting further evolution of SARS-CoV-2 and will continue to rapidly determine the threat levels posed by new variants. The pandemic is not over, and SARS-CoV-2 is spreading at a high level globally. Now is the time to enhance global sequencing capacities, focusing on widening coverage to include geographical areas and populations that have previously been blind spots, and to build a global consensus toward continued, concerted multidisciplinary efforts, under the leadership of the WHO R&D Blueprint for action to prevent epidemics, to track and assess the threat posed by future SARS-CoV-2 variants.