Which mammalian species are at risk of being infected by SARS-CoV-2: an ACE2 perspective

SARS-CoV-2 can transmit efficiently in humans, but it is less clear which other mammalian species are at high risk of being infected. SARS-CoV-2 contains a Spike (S) protein that uses mammalian ACE2 receptors to mediate cell entry; a species with a human-like ACE2 receptor is therefore at risk of being infected by SARS-CoV-2. We compared 131 mammalian ACE2 genes and 15 coronavirus S proteins. We showed that global similarity, reflected by the phylogenetic relationship from the ACE2 gene alignment, is a poor predictor of high-risk mammals, whereas local ACE2 similarity at key binding sites highlights several high-risk mammals. Both SARS-CoV and SARS-CoV-2 likely have a bat origin; however, direct bat-to-human transmission is unlikely because bat and human ACE2 receptors differ, and various other mammals share similar or better ACE2 receptor homology with humans. Furthermore, by comparing key binding sites on the S protein of SARS-like coronaviruses in high-risk mammals, we found high similarity in S protein binding domains between SARS-CoV-2 and Pangolin-CoV but not Civets-CoV, and high similarity between SARS-CoV and Civets-CoV but not Pangolin-CoV. Hence, evolutionary adaptation of the bat virus in different intermediate hosts could have allowed it to acquire distinct high binding potential between the S protein and human-like ACE2 receptors.

Introduction
SARS-CoV-2 poses a serious global health emergency. Since its emergence in Wuhan city, Hubei province of China in December 2019, the viral outbreak has resulted in over 2.8 million confirmed cases and 195,000 deaths worldwide 1 . While it is evident that SARS-CoV-2 can transmit efficiently from person to person, it is less clear which other mammalian species are at high risk of being infected. Both SARS-CoV and SARS-CoV-2 genomes encode a spike (S) protein that uses the human ACE2 receptor to mediate viral entry into the host cell 2-7 . Mechanistically, two S protein domains are involved when a Betacoronavirus infects mammalian cells that express ACE2 receptors: the S1 domain interacts with the ACE2 receptor 8,9 , and the S2 domain undergoes structural rearrangements to mediate membrane fusion 4 . The ACE2 receptor-binding domains that interact with the S1 domain are primarily located in α-helix 1 and β-sheet 5 4 . Specifically, previous SARS-CoV studies have identified many ACE2 genetic hotspots that affect viral infectivity, and the efficacy of the interaction between S proteins and ACE2 receptors is a good predictor of the severity of Betacoronavirus infection 4 and potentially plays an important role in subsequent viral replication 10,11 . Hence, determining the key sites on the human ACE2 receptor involved in binding SARS-CoV-2 S proteins, and how they differ from those of other mammalian ACE2 receptors, is crucial for understanding viral infectivity, host range, and the patterns of viral evolutionary adaptation that have allowed interspecies transmission and infection of human ACE2-expressing cells. Answering these questions is important to 1) improve our ability to predict and control future pandemics, 2) facilitate efforts in the development of a potential SARS-CoV-2 vaccine, and 3) manage and protect wildlife and  12-15 .
In human cells, ACE2 receptors bind efficiently only to the S protein of SARS-CoV TOR2, and not to that of Civet-CoV SZ3 or of the less severe GD variant 4 . Furthermore, early studies found that differences in binding domains explain why S protein-mediated SARS-CoV infection, which is efficient in humans, is also efficient in palm civets but not in rats 4 , and that introducing point mutations in the ACE2 gene variably changed the efficiency of receptor binding in palm civets, humans, and rats. For example, experimentally mutating rat ACE2 His353 to human Lys353 changed the rat ACE2 receptor from one that binds the S protein poorly into one that binds it efficiently, introducing human ACE2 residues 82 to 84 into rat ACE2 increased S1 protein binding, and changing human ACE2 Met82 into rat ACE2 Asn82 interfered with S protein-mediated entry. Furthermore, changing human ACE2 Lys31, Tyr41, Asp355, and Arg357 also interfered with S protein binding. In contrast, other ACE2 site differences, such as glycine 354 in human versus aspartic acid 354 in palm civet, and the differences in residues 90-93 that potentially interact with S protein residue 479, did not contribute to a difference in the efficacy of S protein binding 10 .
Hence, although major sequence differences are found between palm civet and human ACE2 in α-helix 1 (residues 30-40) and in α-helix 3 (residues 90-93), they do not contribute to differences in the efficacy of S protein binding. In brief, mutation experiments by Li, et al. 3 show that conservation of Lys31 and Tyr41 on α-helix 1, residues 82-84 in the vicinity of α-helix 3, and residues 353-357 in β-sheet 5 of human ACE2 may be crucial for efficient S protein binding in SARS-CoV, because replacing them alters the binding dynamics between the S protein and the ACE2 receptor.
To illustrate the importance of contrasting human and other mammalian ACE2 at key sites crucial for S protein binding, Figure 1 compares ACE2 gene similarities in a sample of mammalian species in two ways: one is global similarity, reflected by the phylogenetic tree from the ACE2 alignment (Fig. 1A), and the other is local similarity at the ACE2 key sites shown to be crucial for S protein binding in the mutation experiments by Li, et al. 3 (Fig. 1B). Figure 1A shows that the human ACE2 gene shares higher global similarity with the ACE2 genes of other Primates and of species in the Rodentia order than with those of species in the Carnivora, Artiodactyla, and Chiroptera orders. This emphasizes that global ACE2 gene comparison is not a good predictor of which species are at high risk of being infected by SARS-CoV, because it fails to recover two established observations: that the progenitor of SARS-CoV is likely a bat (Chiroptera order) virus, and that the probable intermediate host is the masked palm civet (Carnivora order). In contrast, by comparing the ACE2 gene at key residues that are involved in human SARS-CoV infection, we found that certain species in the Carnivora, Artiodactyla, and Chiroptera orders are at higher risk of being infected by SARS-CoV-2, even though they share less global ACE2 similarity with humans than do other Primates and Rodentia species. Indeed, global similarity between palm civet ACE2 and human ACE2 does not identify the palm civet as a high-risk species, but key-site similarity comparison does suggest the palm civet as a high-risk intermediate host.
Similarly, given that key binding sites in human ACE2 form contacts with the SARS-CoV-2 S protein, it is important to understand how well different mammalian ACE2 receptors conserve these binding sites, in order to find species at high risk of SARS-CoV-2 infection that may act as intermediate hosts. ACE2 binding by SARS-CoV and SARS-CoV-2 shares overall structural similarity, likely constrained by the structure of ACE2 6 . However, the spike protein is notably different between SARS-CoV and SARS-CoV-2, with only about 75% homology 5,16 . Based on two recent crystal structure experiments 6,7 , different residues on the ACE2 receptor are involved in SARS-CoV-2 and SARS-CoV infections; these differences are summarized in Table 1. In brief, ACE2 Ser19 forms a new hydrogen bond with SARS-CoV-2 S protein Ala475, whereas SARS-CoV lacks this interaction 7 . ACE2 Asp30 forms a salt bridge with the SARS-CoV-2 S protein at Lys417 but does not interact with the SARS-CoV S protein 6 . Other noteworthy differences are ACE2 Glu35 and Arg393, which interact only with the SARS-CoV-2 S protein (Table 1). These differences may contribute to a difference in binding affinity between the two viruses and suggest distinct viral infectivity and host range. Additionally, because the S proteins of SARS-CoV-2 and SARS-CoV differ structurally, they form different contacts even when the same ACE2 residues are involved in binding. For example, human ACE2 Gln24 interacts with the SARS-CoV-2 S protein at Leu472 but with the SARS-CoV S protein at Asn473 6 , and ACE2 Leu79 and Met82 interact with the SARS-CoV-2 S protein at Phe486 but with the SARS-CoV S protein at Leu472. While Gln24 was not previously documented to affect S protein binding in SARS-CoV 10 , the difference in its binding to SARS-CoV-2 may contribute to a difference in binding affinity between the two viruses 7 .
Furthermore, the conservation of several ACE2 hotspots may be critical to achieve efficient S protein binding; examples are Lys31 and Lys353 7 , both of which form salt bridges with the S proteins of SARS-CoV-2 and of SARS-CoV. In total, 15 residues in the ACE2 receptor's binding region are potentially crucial for binding the SARS-CoV-2 S protein, four of which interact only with the SARS-CoV-2 S protein (Table 1). As of the time this manuscript was submitted, studies that mutate human ACE2 interface residues to assess their binding potential to SARS-CoV-2 are scarce. Hence, we cannot rule out the possibility that all ACE2 interface residues listed in Table 1
Besides comparing ACE2 genes to understand the host range of SARS-CoV-2 and SARS-CoV, another important question is how a bat SARS-like coronavirus was transmitted to humans. Given that there are distinct sites on the S1 receptor-binding domain between SARS-CoV-2 and SARS-CoV, we wish to know which mammalian coronaviruses have the same sites as the two human SARS coronaviruses.
Answering this question facilitates the identification of 1) a progenitor that preceded the common ancestor of SARS-CoV-2 and SARS-CoV, and 2) which mammalian coronaviruses may evolve to exploit these similarities to infect humans. SARS-CoV-2 might have evolved from the bat RaTG13 coronavirus because the two share high S protein sequence similarity. Notably, unlike SARS-CoV, the SARS-CoV-2 S protein contains a four-residue motif (Gly482, Val483, Glu484, and Gly485) in the binding ridge that may facilitate contact with the N-terminal helix of the human ACE2 receptor. The bat RaTG13 virus also contains this four-residue motif 7 . This may explain why RaTG13 could use human ACE2 as its receptor 7 , in which case the simplest assumption is that there was no intermediate host. However, the S protein binding domain of the pangolin Betacoronavirus Pangolin-CoV also shares high sequence similarity with that of SARS-CoV-2, and it also contains the 482-485 four-residue motif 7 . Additionally, the ACE2 receptor is more similar between humans and pangolins than between humans and bats.
Indeed, pangolins have been proposed to be an intermediate host 17 .
If the intermediate host for SARS-CoV-2 is distinct from that of SARS-CoV, then it should meet two criteria.
First, the host should share high similarity with humans in the ACE2 binding domains and therefore be at high risk of SARS-CoV-2 infection. Second, the SARS-like coronaviruses that are specific to high-risk mammals should share similarities in the S1 domains with human SARS-CoV-2 and with bat RaTG13 but not with SARS-CoV. Many residues in the RaTG13 S protein are not fine-tuned for binding human ACE2 6,7 . Changing bat RaTG13 S protein Lys486 and Tyr493 to SARS-CoV-2 S protein Phe486 and Gln493, respectively, enhanced human ACE2 recognition 7 . Similarly, changing RaTG13 Lys479 to SARS-CoV Asn479 also increased human ACE2 binding. Moreover, residues Leu455 and Asn501 are found to favor human ACE2 binding and are conserved between RaTG13 and SARS-CoV-2 7 , and residues Tyr442, Leu472, Asn479, Thr487 and Tyr505 were also identified as critical sites involved in ACE2 binding 18 . Table 1 details all other S protein residues on SARS-CoV-2 and on SARS-CoV that interact with the human ACE2 receptor. Although the S protein amino acid sites may differ, many interact with the same ACE2 residues; some of these residues are identical amino acids (in blue, Table 1), whereas others are different amino acids that share similar chemical properties (underlined, Table 1) 6 . In total, there are 15 SARS-CoV-2 S protein residues and 13 SARS-CoV S protein residues that are critical to ACE2 binding (Table 1). Determining the differences in S protein binding sites of SARS-CoV-2, SARS-CoV, and SARS-related viruses in other mammalian species may shed light on the evolution of the Betacoronaviruses that mediated animal-to-human transmission.
Table 1. Contacting residues at the ACE2/SARS-CoV-2 (columns 1 and 2) and ACE2/SARS-CoV (columns 3 and 4) crystal structure interfaces, retrieved from Shang, et al. 7 and Lan, et al. 6 . Each row specifies the contacts formed between a human ACE2 residue and an S protein residue (e.g., row 2; ACE2 S19 - SARS-CoV-2 A475). In red are distinct residues on the ACE2 receptor interface for binding with SARS-CoV-2 and SARS-CoV S proteins. In blue are shared S protein amino acids between the two SARS coronaviruses that interact with the same ACE2 residues, although the position of the S protein residue differs between the two viruses. Underlined are different S protein amino acids interacting with the same ACE2 residues, where the interactions share similar biochemical properties.
We performed comparative gene analyses to trace differences in ACE2 binding regions for ACE2 genes in 131 mammalian species across 18 orders. We found that the ACE2 receptors are most strongly conserved among Primates. Among other mammals, the human ACE2 receptor's key residues for SARS-CoV-2 binding are at least as well conserved as in bat species (Chiroptera order) in specific species from the Artiodactyla, Rodentia, Carnivora, Perissodactyla, Pholidota, Lagomorpha, Proboscidea, Sirenia, and Tubulidentata orders. Next, we compared sequence similarity at S protein key sites of SARS-CoV-2 and SARS-CoV with that of SARS-like coronaviruses that infect 12 high-risk mammalian species. These include species from the Primates, Perissodactyla, Artiodactyla, Carnivora, Chiroptera, Rodentia and Pholidota orders; the Lagomorpha, Proboscidea, Sirenia, and Tubulidentata orders were ruled out because there are no records of mammalian-specific SARS-like coronaviruses for these species. Based on sequence similarity, we found that the key S protein residues of SARS-CoV are most similar to those of palm civet Civet-CoV, whereas those of SARS-CoV-2 are most similar to those of Pangolin-CoV, and both human coronaviruses share S protein similarities with bat RaTG13 but not with other mammalian coronaviruses. Together, our results suggest that the progenitors of both SARS-CoV-2 and SARS-CoV are of bat origin.
The bat coronavirus infected two distinct species from different orders, palm civets and pangolins, because both share high similarity with humans at key ACE2 sites and host viruses that share key sites with the human SARS S1 binding domains. The rapid evolution of the bat virus in each intermediate host allowed it to independently acquire high binding potential between the S protein and human-like ACE2 receptors. Indeed, the S protein binding sites are highly similar between SARS-CoV-2 and Pangolin-CoV but not Civets-CoV, and highly similar between SARS-CoV and Civets-CoV but not Pangolin-CoV.

Retrieving mammalian ACE2 genes and putative S protein binding regions
The nucleotide sequences of 296 ACE2 gene variants from 131 mammalian species were retrieved from the National Center for Biotechnology Information (NCBI) Nucleotide Database (https://www.ncbi.nlm.nih.gov/). The database was queried for records containing "ACE2" as a gene name and "Mammalia" as a taxonomic class, excluding whole-genome and chromosome-wide results. Next, each record was searched for /product="angiotensin-converting enzyme 2", and records lacking it were removed. For each entry, only the coding DNA sequence (CDS) region was extracted in FASTA format. The nucleotide CDS was translated into amino acids using DAMBE7 19 . The ACE2 amino acid sequences were first aligned with MAFFT 20 using the slow but accurate G-INS-i option; then a match-mismatch heat-map was generated for each site and a total similarity score (the number of matching amino acid sites between human and each mammalian species) was calculated.
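The total similarity score described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the study: the function name `key_site_similarity`, the convention that a gap never counts as a match, and the toy sequences are all assumptions introduced here.

```python
def key_site_similarity(reference, other, columns):
    """Count alignment columns where `other` matches `reference`.

    `reference` and `other` are two rows of the same alignment (equal
    length, gaps written as '-'); `columns` holds 0-based alignment
    column indices of the key sites. Assumption: a gap is never a match.
    Returns the total score and the per-site match pattern (one heat-map row).
    """
    if len(reference) != len(other):
        raise ValueError("aligned sequences must have equal length")
    pattern = []
    for col in columns:
        ref_aa, other_aa = reference[col], other[col]
        pattern.append(ref_aa == other_aa and ref_aa != "-")
    return sum(pattern), pattern


# Toy aligned fragments, made up for illustration (not real ACE2 data).
human = "STIEEQAKTF"
other = "STTEEQAK-F"
score, heat_row = key_site_similarity(human, other, [0, 2, 5, 8])
print(score, heat_row)
```

Each boolean in `heat_row` corresponds to one matching or mismatching cell of the heat-map, and `score` is the total similarity for that species.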
Similarly, to compare S protein binding sites of SARS-CoV-2 and SARS-CoV with those of other coronaviruses, S protein amino acid sequences were first aligned with MAFFT 20 using the slow but accurate G-INS-i option, then site-specific amino acids were retrieved to generate the heat-map.
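Retrieving site-specific amino acids from an alignment requires translating residue numbers of the ungapped reference sequence (e.g., the positions listed in Table 1) into alignment columns, because gaps shift the coordinates. The following Python sketch shows one way to do this; the helper names and the toy alignment are hypothetical and are not taken from the study.

```python
def residue_to_column(aligned_reference):
    """Map 1-based residue numbers of the ungapped reference sequence
    to 0-based alignment columns, skipping gap characters ('-')."""
    mapping = {}
    residue_number = 0
    for column, symbol in enumerate(aligned_reference):
        if symbol != "-":
            residue_number += 1
            mapping[residue_number] = column
    return mapping


def residues_at_sites(aligned_reference, aligned_other, sites):
    """Return the residues of `aligned_other` found at the alignment
    columns that correspond to reference residue numbers `sites`."""
    mapping = residue_to_column(aligned_reference)
    return [aligned_other[mapping[site]] for site in sites]


# Toy alignment, made up for illustration: the reference has a gap at column 2.
reference = "MK-LTV"
other = "MRALSV"
picked = residues_at_sites(reference, other, [2, 3, 4])
print(picked)
```

Numbering against the reference (here, the human or SARS-CoV-2 sequence) keeps the site labels stable no matter how many gaps the aligner inserts.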

Phylogenetic reconstruction
Only one ACE2 isoform was used for each species for phylogenetic reconstruction. A non-predicted variant was used when available, but when only predicted variants were available, we picked one among those whose amino acid identities at the ACE2 receptor key sites are conserved (picked ACE2 variants are listed in Supplemental file S1). Similarly, one representative coronavirus was picked out of the several available strains (e.g., SARS-CoV Urbani). Then, the amino acid sequences for these ACE2 variants and coronavirus Spike proteins were aligned with MAFFT 20 using the slow but accurate G-INS-i option.
Three phylogenetic trees were constructed from the MAFFT-aligned amino acid sequences with the maximum-likelihood-based PHYML approach: one tree for the aligned ACE2 genes of 131 mammalian species (bootstrap = 500, model = JTT + G + I + F), another for the ACE2 genes of 13 sample mammalian species (bootstrap = 100, model = JTT + G + I + F), and a third for 15 mammalian SARS-like coronaviruses (bootstrap = 100, model = WAG + G + I + F). All were constructed using the PHYML program 21 implemented in DAMBE. The tree improvement option "-s" was set to "BEST" (best of NNI and SPR search). The "-o" option was set to "tlr", which optimizes the topology, the branch lengths and the rate parameters.
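For readers who run PhyML from the command line rather than through DAMBE, the settings above correspond roughly to the argument list built below. This is a sketch under stated assumptions: the text only names the "-s" and "-o" options, so the remaining flags follow the standard PhyML command-line interface as we understand it, the executable name `phyml` and the file names are hypothetical, and how DAMBE passes these settings internally is not described in the text. The function only builds the argument list and does not run anything.

```python
def phyml_arguments(alignment_path, model, bootstrap):
    """Build a PhyML command line for a `model` + G + I + F protein tree.

    Assumptions: the executable is called 'phyml' and the alignment is
    in PHYLIP format; only '-s BEST' and '-o tlr' are stated in the text.
    """
    return [
        "phyml",
        "-i", alignment_path,   # input alignment (PHYLIP format)
        "-d", "aa",             # amino acid data
        "-m", model,            # substitution matrix, e.g. JTT or WAG
        "-f", "e",              # +F: empirical amino acid frequencies
        "-a", "e",              # +G: estimate the gamma shape parameter
        "-v", "e",              # +I: estimate the proportion of invariable sites
        "-b", str(bootstrap),   # number of bootstrap replicates
        "-s", "BEST",           # best of NNI and SPR tree searches
        "-o", "tlr",            # optimize topology, branch lengths, rate parameters
    ]


# Hypothetical file names for the 131-species ACE2 tree and the coronavirus tree.
ace2_args = phyml_arguments("ace2_131_species.phy", "JTT", 500)
spike_args = phyml_arguments("spike_15_viruses.phy", "WAG", 100)
print(" ".join(ace2_args))
```

The resulting list can be handed to `subprocess.run` on a machine where PhyML is installed.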

Results
The human ACE2 receptor is most conserved among Primates, but variably more conserved in species of other orders in comparison to bat species of the Chiroptera order.
Between the two human viruses, the S protein binding sites of SARS-CoV-2 are only weakly shared by SARS-CoV (Fig. 3A), and vice versa (Fig. 3B). This suggests that the two viruses are evolutionarily distinct. Furthermore, the S protein binding sites of SARS-CoV-2 share the highest similarity with those of Pangolin-CoV strain Guangdong (GD), which was sequenced with high coverage (Fig. 3A), but share lower similarity with those of Pangolin-CoV strain Guangxi (GX), which was flagged as poor quality by GISAID; in contrast, the S protein binding domain of SARS-CoV shares the highest similarity with that of Civets-CoV (Fig. 3B).
Additionally, both viral S protein binding domains share a medium degree of similarity with that of the bat coronavirus RaTG13 from the species Rhinolophus affinis, but not with those of other bat SARS-like viruses from Rhinolophus sinicus and Rhinolophus ferrumequinum. The S proteins of the other SARS-like coronaviruses, including the human MERS-CoV, share little similarity with those of either SARS-CoV-2 or SARS-CoV. Together, these results suggest that both SARS-CoV-2 and SARS-CoV likely originated from bats (Chiroptera order, Rhinolophidae family) but have distinct intermediate hosts.
In terms of similarity in S protein binding domains, the RaTG13 virus of Rhinolophus affinis is less similar to the human viruses than are the viruses of two specific species, palm civets and pangolins. While the S protein binding domain of Civets-CoV is more similar to that of SARS-CoV, the S protein binding domain of Pangolin-CoV is more similar to that of SARS-CoV-2. An additional piece of evidence for this distinct zoonotic transmission of SARS-CoV-2 and SARS-CoV is the 482-485 motif, which is present in bat RaTG13, SARS-CoV-2, and Pangolin-CoV GD, but not in SARS-CoV or Civets-CoV (Fig. 3A). Furthermore, both groups are more closely related to bat SARS-like coronaviruses than to the human MERS coronavirus with a presumed camel origin, and both human SARS coronaviruses are more distant from the SARS-like coronaviruses of other mammals, such as those infecting canines, bovines, camels, equines, rats, and felines (Fig. 4). More importantly, Figure 4A shows that while SARS-CoV-2 is more similar to bat RaTG13 than to Pangolin-CoV in overall S protein sequence, the S protein binding domains are more similar between SARS-CoV-2 and Pangolin-CoV than between SARS-CoV-2 and RaTG13. Hence, the S protein of Pangolin-CoV is better adapted to infect humans than is the S protein of RaTG13, even though globally SARS-CoV-2 is more similar to RaTG13. The notion that local similarities are more important than global similarities therefore holds both for the identification of high-risk mammals (Fig. 1) and for coronavirus infectivity (Fig. 4).

Discussion
We performed comparative gene analyses to trace differences at key SARS-CoV-2 binding sites on the ACE2 receptors of 131 mammalian species across 18 orders. Similarity among mammalian ACE2 genes was measured in two ways: by global similarity, reflected by the phylogenetic tree from the ACE2 gene alignment, and by key-site similarity to human ACE2. While global similarity is not a good predictor of which species are at high risk of being infected by SARS-CoV-2, local ACE2 similarity highlights the mammals that are at high risk (Fig. 1). We next compared the SARS-CoV-2 and SARS-CoV S proteins with those of SARS-like coronaviruses in 12 high-risk mammalian species from the Perissodactyla, Artiodactyla, Carnivora, Rodentia, and Pholidota orders.
However, species from the Lagomorpha, Proboscidea, Sirenia, and Tubulidentata orders are ruled out because there are no documented SARS-like coronaviruses that infect species from these orders.
Given that distinct key sites on the S proteins of SARS-CoV-2 and SARS-CoV are important for human ACE2 binding, we wished to know which high-risk mammals host SARS-like coronaviruses that share key sites with the two human SARS coronaviruses. We found that the key S protein residues of SARS-CoV are most shared by Civet-CoV, similar to those of Pangolin-CoV, weakly shared by bat RaTG13 and by SARS-CoV-2, and not shared by other mammalian-specific viruses. In contrast, the key S protein residues of SARS-CoV-2 are most shared by Pangolin-CoV GD, less shared by Pangolin-CoV GX and bat RaTG13, weakly shared by Civet-CoV and SARS-CoV, and not shared by other mammalian-specific viruses (Fig. 3). These findings corroborate a recent study 22 that found that the entire SARS-CoV-2 S1 receptor-binding domain, from residues 451 to 509, shares higher similarity with Pangolin-CoV GD than with bat RaTG13 and SARS-CoV.
Because the S proteins of human SARS coronaviruses and of bat SARS-like coronaviruses share close phylogenetic relationships (Fig. 4), both SARS-CoV-2 and SARS-CoV are indeed likely to have a bat origin 2,23 , possibly from the Rhinolophidae family. However, direct transmission from bats to humans is unlikely because 1) key residues on bat ACE2 receptors are less similar to those of humans (Fig. 2C) than are those on the ACE2 receptors of mammalian species from other orders (Fig. 2), and 2) key residues on the bat SARS-like coronavirus S protein are less similar to those on human SARS coronaviruses than are those on other mammalian-specific SARS-like coronaviruses (Fig. 3). Indeed, our results suggest that the intermediate hosts of SARS-CoV-2 and SARS-CoV are likely distinct, namely the previously proposed palm civets for SARS-CoV 4,12 and pangolins for SARS-CoV-2. Importantly, based on global similarity, the S protein of SARS-CoV-2 is more closely related to that of bat RaTG13 than to that of Pangolin-CoV (Fig. 4), similar to what recent studies have shown 6,25 . However, here we showed that the key S protein sites involved in ACE2 receptor binding are more alike between SARS-CoV-2 and Pangolin-CoV than between SARS-CoV-2 and RaTG13. Hence, the S protein of Pangolin-CoV is better adapted to the human ACE2 receptor than is that of RaTG13, even though SARS-CoV-2 is more similar to RaTG13 globally. The notion that key-site differences are more important than global similarities is therefore relevant to determining both high-risk mammals (Fig. 1) and S protein infectivity (Fig. 4). Hence, rapid evolution of the bat virus in the intermediate hosts allowed it to acquire higher binding potential between the viral S protein and mammalian ACE2 receptors that resemble the human ACE2 receptor, leading to evolutionary differences between the SARS-CoV-2 and SARS-CoV S proteins.
The corroborative evidence for this proposition is the observed similarity in S protein binding sites between SARS-CoV-2 and Pangolin-CoV but not Civets-CoV, and between SARS-CoV and Civets-CoV but not Pangolin-CoV. An additional piece of evidence for this differential adaptation is the presence of the S protein 482-485 motif, which allows favorable human ACE2 recognition 7 , in SARS-CoV-2, bat RaTG13, and Pangolin-CoV, but not in SARS-CoV or Civets-CoV (Fig. 3A).

Declarations
Supplemental File S2 contains the Supplemental figure legends.

AUTHOR CONTRIBUTIONS
Y.W. and X.X. designed the study and wrote the manuscript. Y.W., P.A., and H.F. collected the data and Y.W. and P.A. analyzed the data. P.A. prepared all figures. X.X. supervised the study. All authors reviewed the manuscript.

Figure 2
The 15 key sites on the human ACE2 receptor involved in SARS-CoV-2 S protein binding (see Table 1) are compared with the ACE2 receptors of 131 mammalian species belonging to A) Primates, B) Artiodactyla, C) Chiroptera, D) Carnivora, E) Rodentia, and F) 13 other orders.
All ACE2 genes are aligned with MAFFT. At the site-specific residues (ACE2 binding hotspots similarity with hACE2), matching and mismatching amino acids between mammalian ACE2 and human ACE2 are shown in blue and red, respectively. In the total similarity column, mammalian ACE2 receptors are highlighted in blue for perfect site similarity (14 to 15 matching sites), in green for high similarity (12 to 13 matching sites), in light green for medium similarity (10 to 11 matching sites), in yellow for medium similarity (8 to 9 matching sites), in orange for medium low similarity (6 to 7 matching sites), and in red for low similarity (5 or fewer matching sites) when compared to the human ACE2 receptor.
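The colour classes in this legend amount to a simple binning of the total similarity score. A minimal Python sketch of that binning for the 15 ACE2 key sites is given below; the function name `similarity_colour` is hypothetical, and the thresholds are copied from the legend.

```python
def similarity_colour(matching_sites, total_sites=15):
    """Return the legend colour for a total similarity score out of
    15 ACE2 key sites, using the thresholds stated in the legend."""
    if not 0 <= matching_sites <= total_sites:
        raise ValueError("score must lie between 0 and the number of key sites")
    # (lower bound of the bin, colour), checked from the highest bin down.
    bins = [
        (14, "blue"),         # 14 to 15 matching sites
        (12, "green"),        # 12 to 13
        (10, "light green"),  # 10 to 11
        (8, "yellow"),        # 8 to 9
        (6, "orange"),        # 6 to 7
        (0, "red"),           # 5 or fewer
    ]
    for lower_bound, colour in bins:
        if matching_sites >= lower_bound:
            return colour


colours = [similarity_colour(score) for score in (15, 13, 10, 9, 6, 5)]
print(colours)
```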

Figure 3
MAFFT-aligned amino acid comparisons at key binding residues (see Table 1) between 14 SARS-like coronavirus S proteins and A) the SARS-CoV-2 S protein and B) the SARS-CoV (SARS Urbani) S protein (16 viruses in total). Matching and mismatching amino acid residues are shown in blue and red, respectively. Total similarity designates the total number of matching amino acid residues. When other SARS-like coronavirus S proteins are compared to the S protein of SARS-CoV-2 and of SARS-CoV, respectively, S protein binding domains are highlighted in blue for perfect site similarity (14 to 15, and 12 to 13 matching sites), in green for high similarity (12 to 13, and 10 to 11 matching sites), in light green for medium similarity (10 to 11, and 8 to 9 matching sites), in yellow for medium similarity (8 to 9, and 6 to 7 matching sites), in orange for medium low similarity (6 to 7, and 4 to 5 matching sites), and in red for low similarity (5 or fewer, and 3 or fewer matching sites). In yellow is the 482-485 GVEG motif that is distinctly found in SARS-CoV-2.

Figure 4
Phylogenetic reconstruction using 15 MAFFT-aligned SARS and SARS-like coronaviruses infecting mammalian species (of the Artiodactyla, Carnivora, Chiroptera, Perissodactyla, Primates, Pholidota and Rodentia orders), using the maximum-likelihood-based PHYML approach, with best model = WAG + G + I + F and bootstrap = 100. The Pangolin-CoV GX strain was excluded in favor of the Pangolin-CoV GD strain, which has better coverage. The ratio appended to each species name indicates the total similarity (shown in Fig. 3) to A) 15 key S protein sites in SARS-CoV-2 and B) 13 key S protein sites in SARS-CoV.
