Skip to main content

Thank you for visiting You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

Global dynamics of SARS-CoV-2 clades and their relation to COVID-19 epidemiology


Expansion of COVID-19 worldwide increases interest in unraveling genomic variations of novel SARS-CoV-2 virus. Metadata of 408,493 SARS-CoV-2 genomes submitted to GISAID database were analyzed with respect to genomic clades and their geographic, age, and gender distributions. Of the currently known SARS-CoV-2 clades, clade GR was the most prevalent worldwide followed by GV then GH. Chronological analysis revealed expansion in SARS-CoV-2 clades carrying D614G mutations with the predominance of the newest clade, GV, in the last three months. D614G clades prevail in countries with more COVID-19 cases. Of them, the clades GH and GR were more frequently recovered from severe or deceased COVID-19 cases. In contrast, G and GV clades showed a significantly higher prevalence among asymptomatic patients or those with mild disease. Metadata analysis showed higher (p < 0.05) prevalence of severe/deceased cases among males than females and predominance of GR clade in female patients. Furthermore, severe disease/death was more prevalent (p < 0.05) in elderly than in adults/children. Higher prevalence of the GV clade in children compared to other age groups was also evident. These findings uniquely provide a statistical evidence on the adaptation-driven evolution of SARS-CoV-2 leading to altered infectivity, virulence, and mortality.


Late in December 2019, an outbreak of atypical pneumonia of unknown etiology was described in Wuhan province in China. A novel coronavirus named “Severe Acute Respiratory Syndrome CoronaVirus 2 (SARS-CoV-2)” was then identified as the etiologic agent1,2. Later, the disease was designated COrona VIrus Disease-2019 (COVID-19)3. The rapid expansion of COVID-19 cases in number and geographic distribution prompted the World Health Organization (WHO) to declare a global health emergency. Containment of the disease was hindered by the lack of antiviral treatment, lack of vaccines and existence of asymptomatic carriers. On March 11, 2020, COVID-19 was officially classified by the WHO as a pandemic.

After declaration of COVID-19 as pandemic, there was a global interest in exploring genomic variations in the novel virus. The first genomic sequence of SARS-CoV-2 was reported by Wu and colleagues2. Subsequently, publicly available resources were developed to provide dynamic and updated data on SARS-CoV-2 genome, thus offering an extraordinary opportunity for comparative genomic studies. Among the open access repositories of SARS-Cov-2 genomic sequences are the Global Initiative for Sharing All Influenza Data (GISAID) database (, National Center for Biotechnology Information database (NCBI) (, and Virus Pathogen Resource database (ViPR) ( Genome analysis tools were also provided by several platforms such as The China National Center for Bioinformation ( Nextstrain project (, and CoV-GLUE (

According to GISAID nomenclature system, most of the currently sequenced SARS-CoV-2 genomes were clustered into one of seven major clades. Such clades include L, to which SARS-CoV-2 virus reference strain belongs, S, V, G, GH, GR, and GV. They exhibit few changes in relation to the reference strain (GenBank accession number NC_045512, GISIAD accession ID: EPI_ISL_402124). Such changes include: L84S in NS8 for clade S; coexisting L37F and G251V mutations in NSP6 and NS3, respectively for clade V; D614G mutation in the spike protein (S) for clade G. In addition to D614G, NS3-Q57H, N-G204R and S-A222V mutations characterize the clades GH, GR and GV, respectively. Genomes that don’t belong to any of the seven major clades had the designation “O clade”.

Given that most of the immune-based therapeutics and diagnostics of COVID-19 are based on the protein sequence of Wuhan reference strain spike8, their efficacy could potentially be affected by genomic variations and the associated altered viral phenotype. Moreover, the influence of genetic mutations on the infectivity and/or virulence of SARS-CoV-2 is yet to be established9. The acquisition of mutations imparting higher infectivity, virulence and/or immunological resistance is thus an eminent threat. Accordingly, active genomic surveillance and close monitoring of the genomic sequence dynamics of SARS-CoV-2 is urgently required to: (a) trace the pattern of geographic spread of the virus during the ongoing pandemic10,11; (b) ensure the effectiveness of vaccines and immune-based diagnostic or therapeutic interventions currently in use or under investigation9; and (c) identify putative therapeutic targets12,13,14.

Geographic, gender, and age discrepancy of COVID-19 disease outcome have been reported by several studies15,16,17,18. Whether this correlates to SARS-CoV-2 genomic variation is still unclear. In addition to laboratory investigations, statistical approaches correlating the distribution of viral clades in different groups to disease severity might provide a good evidence on this bias. The current study aims to analyze the geographic, gender and age distribution of SARS-CoV-2 genomic clades with respect to COVID-19 disease epidemiology.


Geographic distribution of SARS-CoV-2 clades

As of January 25, 2021, WHO reported a total number of 98,925,221 confirmed COVID-19 cases and 2,127,294 deaths19. The calculated world case fatality rate (CFR) was 2.15%. Of all continents, the highest number of COVID-19 cases was reported from North America while most deaths were in Europe. South America had the highest calculated CFR.

GR was the most common clade (34.0%), followed by GV (22.3%) and GH (21.4%). Lower prevalence was noted for their parent clade, G, (15.8%). Other less common clades including L, S, and V were identified in 1.2%, 2.1% and 1.5% of the submitted genomes, respectively. About 1.7% of the genomes were not clustered into any of the major clades and thus had the designation “clade O”.

Analysis of the continent distribution of SARS-CoV-2 clades (Fig. 1) showed that clade GR was the most frequently identified among the genomes submitted from four continents namely Africa (41.1%), Asia (52.7%), Oceania (74.8%), and South America (66.8%). Clade GH was the most common in North America (59.0%). In Europe, both GR (35.5%) and the newly emerged clade, GV (34.6%) predominated.

Figure 1

Continent distribution of various SARS-CoV-2 clades. The figure shows the predominance of clade GR in Africa, Asia, Oceania, and South America and the predominance of clade GH in North America. The clades GR and GV predominated in Europe.

The number of coexisting clades was compared between countries with respect to different disease epidemiology parameters including the number of cases, total number of deaths and CFRs. Viral strains belonging to all known clades coexisted in 31 countries (21.1%). Of them, 61.3% reported above median local values for the studied disease epidemiology parameters.

Mann–Whitney test showed a significant difference in the distribution of the number of coexisting clades in the two groups with respect to the total number of cases (P-value < 0.001), total deaths (P-value < 0.001) and CFRs (P-value = 0.020). Higher medians of the number of coexisting clades were shown in the group of countries where above median cases, deaths and CFRs were recorded.

The impact of the distribution of individual clades on the disease epidemiology was also analyzed. Distribution bias of some clades was noted, as shown in Table 1. This was statistically significant for all clades with all disease epidemiology parameters.

Table 1 Geographical distribution of SARS-CoV-2 clades with respect to disease epidemiology parameters.

Among all studied cases, patient’s clinical status were specified for only 2,634. Based on the provided data, such cases were grouped into asymptomatic/mild cases and severe/deceased cases. Although clades GH, and GR were significantly more prevalent among viral genomes isolated from severe/deceased cases, L clade showed the same distribution but with P-value > 0.05. In contrast all other clades showed higher prevalence in asymptomatic/mild cases than severe/deceased ones. This was statistically significant for all clades except clade V (Table 2).

Table 2 Distribution of SARS-CoV-2 clades with respect to patient’s clinical status.

Analysis based on the chronological distribution of SARS-CoV-2 clades was done for 404,496 cases for which the exact date of collection was available. The analysis showed that clade L predominated at the beginning of the pandemic. Thereafter, new clades evolved including the clades S and V. Viral clades carrying D614G mutation also emerged. Of them, clade G first emerged and soon split into the clades GH and GR. A gradual regression of all clades was then noted with an expansion of clade GR that predominated the scene for six months till the emergence of the last clade, “GV”, by which it was rapidly outweighed. Currently, GV clade is distributed in at least 49 countries predominantly in the United Kingdom (73.3% of the reported GV cases). The global chronological distribution of SARS-CoV-2 clades is shown in Fig. 2. The origins and the evolution time of SARS-CoV-2 clades inferred from the genomes submitted to GISAID are shown in Table 3.

Figure 2

Global chronological distribution of SARS-CoV-2 clades in the period from December 2019 till December 2020. Percentages are used for labelling the predominant clade in each month. The emergence of D614G mutation is marked by a red asterisk.

Table 3 The origins and the evolution time of SARS-CoV-2 clades.

Gender distribution of SARS-CoV-2 clades

Analysis of 96,350 cases (Males = 49,454, Females = 46,896) for which patient gender was specified showed gender distribution bias for some clades (Table 4). This was statistically significantly for clades L and G that showed higher prevalence in males than females. Similarly, the clades GR and GV were more frequently isolated from females than males with P-value less than 0.05.

Table 4 Gender Distribution of SARS-CoV-2 clades.

The severity of cases in both genders were compared in 2495 cases for which both gender and patient’s clinical status are known. Among the group of cases for which the clinical status was recorded as severe or deceased, the number of male patients was significantly higher than female patients (49.9% versus 35.4%, P-value < 0.001). Clinical status of patients infected by SARS-CoV-2 of different clades in different gender groups is shown in Fig. 3.

Figure 3

Clinical status of patients infected by SARS-CoV-2 of different clades in different gender groups.

Age distribution of SARS-CoV-2 clades

The distribution of the genomes that belonged to different clades in different age groups was analyzed among 95,848 cases for which the patient age was specified (Table 5). As shown in Table 5, viral isolates belonging to clade GR were more common in adult patients than other groups. Meanwhile, children were the age group from which the clades GH and GV were more frequently isolated. Isolation of all other clades was commonest in elderly patients compared to others. Clinical status of patients of different age groups from which viral genomes belonging to different clades were isolated is shown in Fig. 4.

Table 5 Age groups distribution of SARS-CoV-2 clades.
Figure 4

Clinical status of patients infected by SARS-CoV-2 of different clades in different age groups.

A significant correlation was found between age groups and patient’s clinical status. The analysis included 2,524 cases for which both patient age and clinical status are known (Fig. 4). Severe/deceased cases were significantly more prevalent in elderly than in adults (71.9% vs 31.6%, Pearson Chi-Square P-value < 0.001) or in children (71.9% vs 3.4%, Pearson Chi-Square P-value < 0.001). They were also more frequently reported among adults compared to children in a statistically significant manner (31.6 vs 3.4%, Fisher’s Exact test P-value < 0.001).


A relatively higher genomic stability was reported for SARS-CoV-2 compared to SARS-CoV20. Nevertheless, SARS-CoV-2 genomes sequenced so far were clustered into at least seven major clades, as defined by GISAID database. Whether the genetic variability in SARS-CoV-2 clades arises due to an ongoing adaptation or merely due to genetic drift is still unknown. Lack of distinct evolutionary patterns or signatures in SARS-CoV-2 genomes was reported21, while independently emerged recurrent mutations were also identified22, suggesting an ongoing adaptation. Whether this possible adaptation provides more fitness for transmission and/or virulence is a matter of concern. In the current study, the metadata of 408,493 SARS-CoV-2 genomes submitted to GISAID EpiCoV database as of January 25, 2021 were analyzed with respect to genomic clades and their geographic, age, and gender distribution.

Most of the genomes belonged to one of seven major clades namely L, S, V, G, GH, GR or GV. In addition, genomes that were not clustered to any of the seven major clades (clade O) were also identified. About 93.5% of the genomes belonged to the clades with D614G mutation including the clades G, GH, GR and GV. Of them Clade GR was the most frequently identified followed by GV and GH. Earlier in January, clade G characterized by spike D614G mutation was identified and rapidly predominated the pandemic. The mutation was found to be located in a heavily glycosylated residue in the viral spike that is highly conserved in this species23. Theoretical evidence strongly suggests that mutations in this region could be coupled to altered capacity for host cell membrane fusion23,24,25, an effect that should also lead to higher person to person transmission and pathogenicity. An experimental evidence was later provided by Korber and colleagues9, who could link this mutation to greater infectivity and higher viral loads in COVID-19 patients. Sub-clusters of clade G then started to evolve including the clades GH, GR and more recently clade GV. The analysis of the chronological distribution of SARS-COV-2 clades in the current study showed that there was much expansion in the number of sequenced genomes that were clustered into the GR clade compared to clade G. A regression in the number of genomes clustered into clade GH was also evident. The newly introduced clade, GV, could also outweigh clade GR in the last few months suggesting higher fitness for transmission by the newer clades compared to their ancestral one. Based on the mentioned facts, the hypothesis of an adaptation-driven genetic evolution is stronger. However, an experimental evidence, providing comparison between clades, is yet to be established.

Adequate scientific elucidation of the reasons behind the rapid transmission and higher mortality rates of COVID-19 in some geographic regions compared to others is still demanding. Apart from public health issues, intrinsic factors related to viral genome may be implicated. Whether the geographic distribution bias of SARS-CoV-2 clades is related to the discrepancy of COVID-19 disease severity observed worldwide is still unclear26. In agreement with others21,27, a geographic distribution bias of SARS-CoV-2 clades was evident in the current analysis. The predominance of certain clades in different continents with respect to local disease epidemiology parameters was also analyzed. GH clade predominated North America were the highest number of cases was reported while GR predominated South America, the top ranked continent with respect to CFR. Both GR and GV were equally prevalent in Europe from which most deaths were reported. Coexistence of all clades was evident in 21.1% of the contributing countries accompanied, in most cases, by relatively higher COVID-19 cases, deaths and CFRs.

Tracking the distribution of individual clades in different countries with respect to disease epidemiology parameters showed higher prevalence of all clades with D614G mutation among the group of countries that showed above median total number of cases than others. With respect to the case fatality rates, only the clades G and GV were more frequently identified among the genomes submitted from the group of countries showing below median CFRs. Such findings suggest higher transmission of viral strains whose genome belong to all clades with D614G mutation. Higher virulence of clades GH and GR compared to G and GV is also suspected. To further examine this hypothesis, the distribution of all clades among viral genomes from patients with asymptomatic or mild disease and those from severe disease or deceased patients was analyzed. The clades GH and GR significantly showed higher prevalence among the group of severe disease or deceased patients. This is in line with the previous finding of higher viral loads in patients infected by SARS-CoV-2 virus strains harboring D614G genomic mutations9. In addition, lower prevalence of the clades S, G and GV among severe or deceased cases was also statistically significant. In agreement with this finding, clade S was also found to be less prevalent among the group of countries that showed above median values for the studied epidemiologic parameters. Although the reference strain of SARS-CoV-2 belonged to the L clade that also had higher prevalence at the beginning of the pandemic, clade S was found to be evolutionarily more related to animal coronaviruses28. In agreement with our findings, this suggests higher fitness for clade L compared to clade S from which it had rapidly evolved early in the pandemic. Together, our findings support the previous hypothesis of Brufsky about possible ongoing competition between viral clades of varying virulence during the current pandemic24.

Our analysis of genomes metadata showed higher prevalence of severe or deceased cases among male patients than females in a statistically significant manner. The worse disease outcome of male patients was also reported by others15,16,17,18. Several assumptions have been made by scientists to justify this gender bias. Among them are female’s superior immune response29 and higher angiotensin converting enzyme type 2 (ACE2) activity in male or ovariectomized animal models30. ACE2 is the main receptor for SARS-CoV-2 spike through which it attaches to target cells31. Wambier and colleagues assumed androgen receptor genetic variation as a likely reason32. The receptor is thought to regulate transcription of the transmembrane protease serine 2 (TMPRSS2), responsible for S protein priming that allows viral fusion to host cell membranes31. To explain the role of the genomic variation of SARS-CoV-2 in the gender-biased COVID-19 outcome, the distribution of SARS-CoV-2 clades in viral genomes from male versus female patients was analyzed. Gender bias was evident for some clades. Strikingly, the clades GR and GV were found to be more significantly more prevalent in female patients than males. The least susceptible gender group are thus found to be show higher susceptibility to the newer SARS-CoV-2 clades.

Consistent with previous reports16,33,34,35, our analysis showed that severe disease or death was significantly more prevalent in elderly than in adults and children. This was previously explained by existence of comorbidities, immune senescence36 and alterations in ACE2 receptors37. Mild disease in children was also reported by many studies15,38. Contributing factors may include lower maturity and function of ACE2 receptors39 and viral co-infection that leads to limited replication of SARS-CoV-2 in the respiratory tract40. Interestingly, clade GV showed the highest prevalence among children compared to other age groups. Being the last to emerge among all clades, this further supports the adaptation-driven evolution hypothesis where new clades become more infectious to the least susceptible age group.


The current analysis provides a statistical evidence on an ongoing adaptation-driven SARS-CoV-2 evolution whose outcome is higher viral infectivity and/or virulence. This is suggested by the biased distribution of the newer clades in geographic regions from which higher number of cases and deaths as well as higher CFRs were reported. More frequent isolation of the newer clades from the least susceptible populations including females and children was also noted. Given that the newer clades are thought to have higher virulence (GR according to the current study) and/or infectivity (GV according to the current study), this suggests that further evolution of the virus may put such groups at higher risk for COVID-19 worse outcome. However, it is worth mentioning that a successful genome-based epidemiologic analysis is limited by the inadequate and imbalanced number of genomes deposited in open access databases. Some constraints in this respect are the lack of whole genome sequencing facilities and data sharing policies by some countries. Accordingly, an experimental evidence is required to confirm or role out our hypothesis. Future studies are also recommended to address the impact of climate and lock down strategies on COVID-19 epidemiology.


SARS-CoV-2 genomes metadata

Metadata of all SARS-CoV-2 genomes submitted to the GISAID database (, were accessed in January 25, 2021 (n = 419,256). Only genomes of viruses isolated from humans and those for which genomic clades were specified (n = 408,493) were selected for analysis. Metadata of genomes included information on collection date, geographic location, patient gender, patient age, patient clinical status and viral genome clade. The genomes were submitted by labs from 146 countries around the world. The continent distribution of the genomes included in the current study was as follows: 6,008 from Africa, 27,473 from Asia, 262,934 from Europe, 88,574 from North America, 18,358 from Oceania, and 5,178 from South America. Genomic clades were inferred by GISAID database and defined according to its nomenclature system at the time of data collection outlined in (

For age-based comparisons, entries for which patient age are available (n = 95,848) were classified into three age groups including children (up to 18 years), adults (18–64 years) and elderly (65 years or more). Cases for which patients clinical status were clearly specified were grouped into asymptomatic or mild group and severe or deceased group.

Disease epidemiology data

Data of the disease epidemiology including total number of cases and total number of deaths in different countries were obtained from COVID‐19 situation dashboard of the World Health Organization available at ( accessed in January 25, 2021.

The calculated median number of cases in the countries from which SARS-CoV-2 genomes were submitted to the database was 107,841 while the median number of deaths was 1,532 and that of the CFR was 1.6%. Contributing countries were grouped into two groups according to the relation between the national values of each of the disease epidemiology parameters to the median.

Statistical analyses

Categorical data were expressed as percentages, while the median was used to describe the central tendency of the non-normally distributed numerical data. Group comparisons were done using Mann–Whitney U-test for numerical data and Chi-square (χ2) or Fisher’s exact test for categorical data. All statistical analyses were performed using the Statistical Package for Social Sciences (SPSS) software version 20.0 (IBM Corp., Armonk, NY, USA). P-value of less than 0.05 (two-tailed) was considered to be statistically significant.

Data availability

The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.


  1. 1.

    Gorbalenya, A. E. et al. Severe acute respiratory syndrome-related coronavirus: The species and its viruses—a statement of the coronavirus study group. bioRxiv (2020).

    Article  Google Scholar 

  2. 2.

    Wu, F. et al. A new coronavirus associated with human respiratory disease in China. Nature 579, 265–269. (2020).

    ADS  CAS  Article  PubMed  PubMed Central  Google Scholar 

  3. 3.

    World Health Organization. Novel Coronavirus (2019-nCoV): Situation Report, 22 (World Health Organization, 2020).

    Google Scholar 

  4. 4.

    Shu, Y. & McCauley, J. GISAID: Global initiative on sharing all influenza data—from vision to reality. Eurosurveillance 22, 30494. (2017).

    Article  PubMed  PubMed Central  Google Scholar 

  5. 5.

    Zhao, W. M. et al. The 2019 novel coronavirus resource. Yi Chuan = Hereditas 42, 212–221. (2020).

    Article  PubMed  Google Scholar 

  6. 6.

    Hadfield, J. et al. Nextstrain: Real-time tracking of pathogen evolution. Bioinformatics 34, 4121–4123. (2018).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  7. 7.

    Singer, J., Gifford, R., Cotten, M. & Robertson, D. CoV-GLUE: A web application for tracking SARS-CoV-2 genomic variation. Preprints (2020).

    Article  Google Scholar 

  8. 8.

    Wang, C. et al. The establishment of reference sequence for SARS-CoV-2 and variation analysis. J. Med. Virol. 92, 667–674. (2020).

    CAS  Article  PubMed  Google Scholar 

  9. 9.

    Korber, B. et al. Tracking changes in SARS-CoV-2 Spike: Evidence that D614G increases infectivity of the COVID-19 virus. Cell (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  10. 10.

    Giovanetti, M., Angeletti, S., Benvenuto, D. & Ciccozzi, M. A doubt of multiple introduction of SARS-CoV-2 in Italy: A preliminary overview. J. Med. Virol. (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  11. 11.

    Castillo, A. E. et al. Phylogenetic analysis of the first four SARS-CoV-2 cases in Chile. J. Med. Virol. (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  12. 12.

    Zhou, Y. et al. Network-based drug repurposing for novel coronavirus 2019-nCoV/SARS-CoV-2. Cell Discov. 6, 14. (2020).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  13. 13.

    Chen, Y. W., Yiu, C. B. & Wong, K. Y. Prediction of the SARS-CoV-2 (2019-nCoV) 3C-like protease (3CL (pro)) structure: Virtual screening reveals velpatasvir, ledipasvir, and other drug repurposing candidates. F1000Research 9, 129. (2020).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  14. 14.

    Robson, B. Computers and viral diseases. Preliminary bioinformatics studies on the design of a synthetic vaccine and a preventative peptidomimetic antagonist against the SARS-CoV-2 (2019-nCoV, COVID-19) coronavirus. Comput. Biol. Med. 119, 103670. (2020).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  15. 15.

    Guan, W. J. et al. Clinical characteristics of coronavirus disease 2019 in China. N. Engl. J. Med. 382, 1708–1720. (2020).

    CAS  Article  PubMed  Google Scholar 

  16. 16.

    Jin, J. M. et al. Gender differences in patients with COVID-19: Focus on severity and mortality. Front. Public Health 8, 152. (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  17. 17.

    Richardson, S. et al. Presenting characteristics, comorbidities, and outcomes among 5700 patients hospitalized with COVID-19 in the New York City area. JAMA (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  18. 18.

    Shi, Y. et al. Host susceptibility to severe COVID-19 and establishment of a host risk score: Findings of 487 cases outside Wuhan. Crit. Care 24, 108. (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  19. 19.

    World Health Organization. WHO coronavirus disease (COVID-19) dashboard. Accessed January 25, 2020.

  20. 20.

    Jia, Y. et al. Analysis of the mutation dynamics of SARS-CoV-2 reveals the spread history and emergence of RBD mutant with lower ACE2 binding affinity. BioRxiv (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  21. 21.

    Chiara, M., Horner, D. S., Gissi, C. & Pesole, G. Comparative genomics suggests limited variability and similar evolutionary patterns between major clades of SARS-CoV-2. BioRxiv (2020).

    Article  Google Scholar 

  22. 22.

    Van Dorp, L. et al. Emergence of genomic diversity and recurrent mutations in SARS-CoV-2. Infect. Genet. Evol. 83, 104351. (2020).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  23. 23.

    Walls, A. C. et al. Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein. Cell 181, 281–292. (2020).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  24. 24.

    Brufsky, A. Distinct viral clades of SARS-CoV-2: Implications for modeling of viral spread. J. Med. Virol. (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  25. 25.

    Wrapp, D. et al. Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation. Science 367, 1260–1263. (2020).

    ADS  CAS  Article  PubMed  PubMed Central  Google Scholar 

  26. 26.

    Baud, D. et al. Real estimates of mortality following COVID-19 infection. Lancet. Infect. Dis 20, 773. (2020).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  27. 27.

    Joshi, M. et al. Genomic variations in SARS-CoV-2 genomes from Gujarat: Underlying role of variants in disease epidemiology. bioRxiv (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  28. 28.

    Tang, X. et al. On the origin and continuing evolution of SARS-CoV-2. Natl. Sci. Rev. 7, 1012–1023. (2020).

    Article  Google Scholar 

  29. 29.

    Schurz, H. et al. The X chromosome and sex-specific effects in infectious disease susceptibility. Hum. Genom. 13, 2. (2019).

    Article  Google Scholar 

  30. 30.

    Liu, J. et al. Sex differences in renal angiotensin converting enzyme 2 (ACE2) activity are 17beta-oestradiol-dependent and sex chromosome-independent. Biol. Sex Differ. 1, 6. (2010).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  31. 31.

    Hoffmann, M. et al. SARS-CoV-2 cell entry depends on ACE2 and TMPRSS2 and is blocked by a clinically proven protease inhibitor. Cell 181, 271–280. (2020).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  32. 32.

    Wambier, C. G. et al. Androgen sensitivity gateway to COVID-19 disease severity. Drug Dev. Res. (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  33. 33.

    Chen, N. et al. Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: A descriptive study. Lancet 395, 507–513. (2020).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  34. 34.

    Zhang, J. J. et al. Clinical characteristics of 140 patients infected with SARS-CoV-2 in Wuhan, China. Allergy 75, 1730–1741. (2020).

    CAS  Article  PubMed  Google Scholar 

  35. 35.

    Wang, D. et al. Clinical characteristics of 138 hospitalized patients with 2019 novel coronavirus-infected pneumonia in Wuhan, China. JAMA (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  36. 36.

    Alpert, A. et al. A clinically meaningful metric of immune age derived from high-dimensional longitudinal monitoring. Nat. Med. 25, 487–495. (2019).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  37. 37.

    Koff, W. C. & Williams, M. A. Covid-19 and immunity in aging populations—a new research agenda. N. Engl. J. Med. (2020).

    Article  PubMed  Google Scholar 

  38. 38.

    Wu, Z. & McGoogan, J. M. Characteristics of and important lessons from the coronavirus disease 2019 (COVID-19) outbreak in China: Summary of a report of 72314 cases from the chinese center for disease control and prevention. JAMA (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  39. 39.

    Dong, Y. et al. Epidemiology of COVID-19 among children in China. Pediatrics (2020).

    Article  PubMed  Google Scholar 

  40. 40.

    Brodin, P. Why is COVID-19 so mild in children?. Acta Paediatr. 109, 1082–1083. (2020).

    CAS  Article  PubMed  Google Scholar 

Download references

Author information




S.M.H. and W.F.E. planned and designed the research, analyzed the data, wrote and revised the manuscript. A.S.K. contributed to data curation as well as writing, reviewing and editing of the manuscript. A.M.N. contributed to research design, data curation as well as writing, reviewing and editing of the manuscript. All authors approved the final version of the manuscript.

Corresponding author

Correspondence to Walid F. Elkhatib.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Hamed, S.M., Elkhatib, W.F., Khairalla, A.S. et al. Global dynamics of SARS-CoV-2 clades and their relation to COVID-19 epidemiology. Sci Rep 11, 8435 (2021).

Download citation


By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.


Quick links

Nature Briefing

Sign up for the Nature Briefing newsletter — what matters in science, free to your inbox daily.

Get the most important science stories of the day, free in your inbox. Sign up for Nature Briefing