Origin and evolution of pathogenic coronaviruses.

Severe acute respiratory syndrome coronavirus (SARS-CoV) and Middle East respiratory syndrome coronavirus (MERS-CoV) are two highly transmissible and pathogenic viruses that emerged in humans at the beginning of the 21st century. Both viruses likely originated in bats, and genetically diverse coronaviruses that are related to SARS-CoV and MERS-CoV were discovered in bats worldwide. In this Review, we summarize the current knowledge on the origin and evolution of these two pathogenic coronaviruses and discuss their receptor usage; we also highlight the diversity and potential of spillover of bat-borne coronaviruses, as evidenced by the recent spillover of swine acute diarrhoea syndrome coronavirus (SADS-CoV) to pigs.

likely the major natural reservoirs of alphacoronaviruses and betacoronaviruses 24 .

Animal origin and evolution of SARS-CoV
At the beginning of the SARS epidemic, almost all early index patients had animal exposure before developing disease. After the causative agent of SARS was identified, SARS-CoV and/or anti-SARS-CoV antibodies were found in masked palm civets (Paguma larvata) and animal handlers in a market place 12,16,39-42 . However, later, wide-reaching investigations of farmed and wild-caught civets revealed that the SARS-CoV strains found in market civets were transmitted to them by other animals 16,39 . In 2005, two teams independently reported the discovery of novel coronaviruses related to human SARS-CoV, which were named SARS-CoV-related viruses or SARS-like coronaviruses, in horseshoe bats (genus Rhinolophus) 15,43 . These discoveries suggested that bats may be the natural hosts for SARS-CoV and that civets were only intermediate hosts. Subsequently, many coronaviruses phylogenetically related to SARS-CoV (SARSr-CoVs) were discovered in bats from different provinces in China and also from European, African and Southeast Asian countries 15,20,38,43-54 (Fig. 4; Supplementary Fig. S1a). According to the ICTV criteria, only the strains found in Rhinolophus bats in European countries, Southeast Asian countries and China are SARSr-CoV variants. Those from Hipposideros bats in Africa are less closely related to SARS-CoV and should be classified as a new coronavirus species 54 . These data indicate that SARSr-CoVs have wide geographical spread and might have been prevalent in bats for a very long time. A 5-year longitudinal study revealed the coexistence of highly diverse SARSr-CoVs in bat populations in one cave of Yunnan province, China 18,20,55 . This location is a diversity hot spot, and the SARSr-CoVs in this location contain all the genetic diversity found in other locations of China. Furthermore, the viral strains that exist in this one location contain all genetic elements that are needed to form SARS-CoV (Fig. 5).
As no direct progenitor of SARS-CoV was found in bat populations despite 15 years of searching and as RNA recombination is frequent within coronaviruses 56 , it is highly likely that SARS-CoV newly emerged through recombination of bat SARSr-CoVs in this or other yet-to-be-identified bat caves. This hypothesis is consistent with previous data showing that a direct progenitor of SARS-CoV emerged before 2002 (refs 42,57,58). Recombination analysis also strongly supported the hypothesis that the civet SARS-CoV strain SZ3 arose through recombination of two existing bat strains, WIV16 and Rf4092 (ref. 20).
Furthermore, WIV16, the closest relative to SARS-CoV found in bats, likely arose through recombination of two other prevalent bat SARSr-CoV strains 20 . The most frequent recombination breakpoints are within the S gene, which encodes the spike (S) protein that contains the receptor-binding domain (RBD), and upstream of orf8, which encodes an accessory protein 20,58,59 . Given the prevalence and great genetic diversity of bat SARSr-CoVs, their close coexistence and the frequent recombination of the coronaviruses, it is expected that novel variants will emerge in the future 60,61 . Because there were no SARS cases in Yunnan province during the SARS outbreak, we hypothesize that the direct progenitor of SARS-CoV was produced by recombination within bats and then transmitted to farmed civets or another mammal, which then transmitted the virus to civets by faecal-oral transmission. When the virus-infected civets were transported to Guangdong market, the virus spread in market civets and acquired further mutations before spillover to humans.

Coronaviruses form enveloped and spherical particles of 100-160 nm in diameter. They contain a positive-sense, single-stranded RNA (ssRNA) genome of 27-32 kb in size. The 5'-terminal two-thirds of the genome encodes a polyprotein, pp1ab, which is further cleaved into 16 nonstructural proteins that are involved in genome transcription and replication. The 3' terminus encodes structural proteins, including envelope glycoproteins spike (S), envelope (E), membrane (M) and nucleocapsid (N). In addition to the genes encoding structural proteins, there are accessory genes that are species-specific and dispensable for virus replication. Here, we compare prototypical and representative strains of four coronavirus genera: feline infectious peritonitis virus (FIPV), Rhinolophus bat coronavirus HKU2, severe acute respiratory syndrome coronavirus (SARS-CoV) strains GD02 and SZ3 from humans infected during the early phase of the SARS epidemic and from civets, respectively, SARS-CoV strain hTor02 from humans infected during the middle and late phases of the SARS epidemic, bat SARS-related coronavirus (SARSr-CoV) strain WIV1, Middle East respiratory syndrome coronavirus (MERS-CoV), mouse hepatitis virus (MHV), infectious bronchitis virus (IBV) and bulbul coronavirus HKU11.

www.nature.com/nrmicro

Variability of SARS-CoV in humans and civets
The genome sequences of SARS-CoVs from market civets are almost identical to the genomes of human SARS-CoVs 42,62 . However, two genes show major variation. The first variable region is located in the S gene. The SARS-CoV S protein is functionally divided into two subunits, denoted S1 and S2, which are responsible for receptor binding and fusion with the cellular membrane, respectively 1 . S1 is further divided into the amino-terminal domain (S1-NTD) and the carboxy-terminal domain (S1-CTD). The S1-CTD functions as the RBD and is responsible for binding ACE2 and entering cells 7,63,64 . Two amino acid residues in the RBD, 479 and 487, were identified to be essential for ACE2-mediated SARS-CoV infection and critical for virus transmission from civets to humans 76,78 .
The second major location of variation is the accessory gene orf8 (Fig. 5). On the basis of SARS spread, the 2002-2003 SARS outbreak can be divided into three phases, with the early phase characterized by a limited number of localized cases, followed by a middle phase during which a superspreader event occurred in a hospital and finally the late phase of international spread 62 . The viral genomes from early-phase patients contain two genotypes of orf8, one with a complete orf8 (369 nucleotides) and the other containing an 82-nucleotide deletion. By contrast, viral genomes from late-phase patients and most of the genomes from middle-phase patients contain a split orf8 (orf8a and orf8b) owing to a 29-nucleotide deletion; two exceptions were found in middle-phase genomes, one containing an 82-nucleotide deletion in orf8 and the other with the whole orf8 deleted. The human isolates from 2004 and all civet SARS-CoV genomes have a complete orf8 except one civet strain with an 82-nucleotide deletion 62 . These data indicate that orf8 genes underwent adaptations during transmission from animals to humans during the SARS epidemic. A limited functional analysis suggested that the ORF8a protein is dispensable for SARS-CoV replication in Vero ...

Variability of bat SARSr-CoVs
SARS-CoVs and bat SARSr-CoVs mainly vary in three regions: S, ORF8 and ORF3 (Fig. 5). Bat SARSr-CoVs share high sequence identity with SARS-CoV in the S2 region but are highly variable in the S1 region. Compared with human and civet SARS-CoV, bat SARSr-CoV S1 can be divided into two clades: clade 1, which is found only in Yunnan province, has the same size S protein as human and civet isolates 18-20,51 , whereas clade 2, which is found in many locations, has a shorter S protein owing to deletions of 5, 12 or 13 amino acids 15 .
The second variable region is located in ORF8. Most bat SARSr-CoVs retain an intact orf8 (366 or 369 nucleotides) and share 47.7-100% sequence identity among themselves and 50.6-98.4% with SARS-CoV in civets and early-phase patients. A split orf8 (364 nucleotides) owing to a 5-nucleotide deletion was found in one bat SARSr-CoV strain, similar to that of SARS-CoVs from middle-phase and late-phase patients 20 . The European bat SARSr-CoV has completely lost orf8 (ref. 45). These data show that the orf8 genes in bat SARSr-CoVs are constantly evolving in their natural reservoirs. Considering the variability of orf8 in bats, civets and humans, investigating the function of orf8 is a priority, particularly the contribution of these different variants to viral pathogenesis.
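As a rough illustration of the orf8 genotypes reported for SARS-CoV genomes from humans and civets, the following Python sketch labels an orf8 region by the size of its deletion relative to the complete 369-nucleotide gene. The function name, the labels and the choice to classify by length alone are assumptions made for illustration; real genotyping would rely on sequence alignment rather than on length.

```python
FULL_ORF8_LENGTH = 369  # complete orf8 reported for early-phase human and civet genomes

# Deletion sizes reported in the text for SARS-CoV genomes, mapped to illustrative labels.
KNOWN_DELETIONS = {
    0: "complete orf8",
    29: "split orf8 (orf8a and orf8b)",
    82: "orf8 with 82-nucleotide deletion",
    FULL_ORF8_LENGTH: "orf8 deleted",
}


def classify_orf8(length: int) -> str:
    """Return an illustrative genotype label for an orf8 region of the given length."""
    if not 0 <= length <= FULL_ORF8_LENGTH:
        raise ValueError("length outside the range expected for SARS-CoV orf8")
    deletion = FULL_ORF8_LENGTH - length
    # Any deletion size not listed in the text is reported as unclassified.
    return KNOWN_DELETIONS.get(deletion, f"unclassified ({deletion}-nucleotide deletion)")


if __name__ == "__main__":
    for n in (369, 340, 287, 0, 366):
        print(n, "->", classify_orf8(n))
```

A bat SARSr-CoV orf8 of 366 nucleotides, which is treated as intact in the text, would be reported as unclassified here; this shows the limits of a length-only rule.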
The third variable region is in ORF3. The SARS-CoV genome encodes a 154-amino acid ORF3b, which is an interferon antagonist. Bat SARSr-CoVs and SARS-CoV are highly similar in ORF3a (96.4-98.9% amino acid identity), but bat SARSr-CoVs have different sizes of ORF3b (54-154 amino acids) (a large part of the region encoding ORF3b overlaps with the ORF3a coding region) 20,70 . ORF3b retains the anti-interferon function in some bat SARSr-CoVs but has lost the function in other bat SARSr-CoVs 70 .
A novel accessory gene, named orfx and located between orf6 and orf7, was identified in the genomes of several bat SARSr-CoVs from Yunnan province 18-20 (Fig. 5). A preliminary study indicated that ORFX is involved in an anti-interferon response 71 .

Receptor usage of SARS-CoV and SARSr-CoV
ACE2 binding is a critical determinant for the host range of SARS-CoV 72,73 . Electron microscopic studies have shown that the SARS-CoV S protein forms a clover-shaped trimer, with three S1 heads and a trimeric S2 stalk 74,75 . The RBD is located on the tip of each S1 head. The RBD binds to the outer surface of ACE2, away from its zinc-chelating enzymatic site 77,141 (Fig. 6a). Different SARS-CoV strains isolated from several hosts vary in their binding affinities for human ACE2 and consequently in their infectivity of human cells 76,78 (Fig. 6b). The epidemic strain hTor02 was isolated from humans during the late phase of the outbreak in 2002-2003. It has a high affinity for human ACE2 and high infectivity in human cells, and consequently, it was transmitted efficiently between humans 62 . Strains cSz02 and cHb05 were isolated from palm civets in 2002-2003 and 2005, respectively. Both have low affinity for human ACE2 and low infectivity in human cells but have high affinity for civet ACE2 and high infectivity in civet cells 12,79 . Strain hcGd03 was isolated from both humans and palm civets in 2003-2004 and has moderate affinity for human ACE2 and moderate infectivity in human cells; it infected humans but did not transmit between humans 80 . Strain hHae08 was isolated from human cell culture and has high affinity for human ACE2 and high infectivity in human cells 81 . Understanding the molecular basis for human receptor usage by different SARS-CoV strains is crucial for understanding the cross-species transmission of SARS-CoV and for epidemiological monitoring of potential future outbreaks.

Fig. 4 | ... The strain Zhejiang2013 (GenBank No. KF636752) was used as a root. b | By contrast, Middle East respiratory syndrome-related coronaviruses (MERSr-CoVs) form two major viral lineages, L1 and L2. L1 is found in humans and camels, and L2 is found only in camels. Two small clusters, B1 (bat 1) and B2, and one single virus, SA, from South Africa, were found in bats. The phylogenetic tree of MERSr-CoVs is based on published trees 94,139 and was reconstructed using full-genome alignment of all coding regions using the same method as above. HKU4-1 (EF065505) and HKU5-1 (EF065509), two 2c betacoronaviruses, served as the root of the tree. Detailed phylogenetic trees and grouping information can be found in Supplementary Fig. S1. MERS-CoV, Middle East respiratory syndrome coronavirus.

SARS-CoV mutations that affect human and civet receptor binding.
Crystal structures of the SARS-CoV RBD complexed with human ACE2 revealed that the SARS-CoV RBD contains a core structure and a receptor-binding motif (RBM) 82,141 (Fig. 6a). Two virus-binding hot spots have been identified at the interface of the RBD and human ACE2, centring on ACE2 residues Lys31 (hot spot 31) and Lys353 (hot spot 353) 83,84 (Fig. 6b). They both consist of a salt bridge (between Lys31 and Glu35 for hot spot 31 and between Lys353 and Asp38 for hot spot 353); both salt bridges are buried in hydrophobic pockets and contribute a substantial amount of energy to RBD-ACE2 binding as well as filling voids at the RBD-ACE2 interface. Naturally selected RBM mutations all interact with the hot spots (Fig. 6b; Table 1) and affect RBD-ACE2 binding.
Mutations in RBM residue 479 had an important role in the civet-to-human transmission of SARS-CoV 42,76,78,85 . Residue 479 is an asparagine in strains hTor02, hcGd03 and hHae08 but is a lysine in strain cSz02 and an arginine in strain cHb05 (Table 1). Asn479 is located near hot spot 31, without interfering with the structure of hot spot 31 (ref. 85) (Fig. 6b, c). However, a change to Lys479 leads to steric and electrostatic interference with hot spot 31, reducing the binding affinity between the SARS-CoV RBD and human ACE2. By contrast, Arg479 reaches the vicinity of hot spot 353 and forms a salt bridge with ACE2 residue Asp38 (ref. 83) (Fig. 6d). Hence, strains hTor02, hcGd03 and hHae08 (all of which contain Asn479) and strain cHb05 (which contains Arg479) recognize human ACE2 and infect human cells efficiently, whereas strain cSz02 (which contains Lys479) recognizes human ACE2 inefficiently and infects human cells inefficiently. The above structural analyses are supported by biochemical, functional and epidemiological data 42,76,78,83-85 . Because of residue differences between human ACE2 and civet ACE2, both Asn479 and Lys479 fit well into the interface between the RBD and civet ACE2, although Arg479 fits even better 83,85 ; consequently, strains hTor02, cSz02, hcGd03 and cHb05 (which contain either Asn479, Lys479 or Arg479) recognize civet ACE2 and infect civet cells efficiently 79 . In sum, Asn479 and Arg479 are viral adaptations to human ACE2, whereas Lys479 is incompatible with human ACE2; Arg479 is a viral adaptation to civet ACE2, whereas Asn479 and Lys479 are also compatible with civet ACE2.
Mutations in RBM residue 487 had an important role in the human-to-human transmission of SARS-CoV. Residue 487 is a threonine in strain hTor02 but is a serine in the other strains isolated from humans and civets. The methyl group of Thr487 interacts with hot spot 353 in human ACE2 by providing stacking support for the formation of the salt bridge between Lys353 and Asp38; consequently, strain hTor02 recognizes human ACE2 efficiently and was transmitted between humans during the 2002-2003 SARS epidemic. By contrast, Ser487 cannot provide support to hot spot 353, and hence the other strains isolated from humans and civets recognize human ACE2 inefficiently. Consequently, neither cSz02 nor hcGd03 was transmitted between humans. The above structural analyses are supported by biochemical, functional and epidemiological data 42,76,78,83-85 . Because of residue differences between human ACE2 and civet ACE2, Ser487 fits well into the RBD-civet ACE2 interface, although still not as well as Thr487 (refs 83,85); consequently, strains cSz02, hcGd03 and cHb05 (which contain Ser487) recognize civet ACE2 and infect civet cells efficiently 79 . In sum, Thr487 is a viral adaptation to both human and civet ACE2, and Ser487 is much more compatible with civet ACE2 than with human ACE2 (Fig. 6b).
RBM residues 442, 472 and 480 also contribute to receptor recognition and host range of SARS-CoV, although not as much as residues 479 and 487. Detailed structural, biochemical and functional analyses showed that Phe442, Phe472 and Asp480 are viral adaptations to human ACE2, whereas Tyr442, Leu472 or Pro472, and Gly480 are viral adaptations to civet ACE2 (refs 72,83).
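The residue-level adaptations summarized above can be collected into a small lookup table. The Python sketch below counts how many of a strain's listed RBM residues match the human-adapted or civet-adapted residues named in the text. The data structure, the function name and the use of a simple count are assumptions made for illustration; the count is not a measure of binding affinity, and residues described as merely compatible (for example Asn479 and Lys479 with civet ACE2) are deliberately not scored.

```python
# Residues described in the text as viral adaptations to human or civet ACE2
# (three-letter amino acid codes, keyed by RBM position).
HUMAN_ADAPTED = {442: {"Phe"}, 472: {"Phe"}, 479: {"Asn", "Arg"}, 480: {"Asp"}, 487: {"Thr"}}
CIVET_ADAPTED = {442: {"Tyr"}, 472: {"Leu", "Pro"}, 479: {"Arg"}, 480: {"Gly"}, 487: {"Thr"}}

# Residues 479 and 487 of four strains, as stated in the text.
STRAINS = {
    "hTor02": {479: "Asn", 487: "Thr"},
    "hcGd03": {479: "Asn", 487: "Ser"},
    "cSz02": {479: "Lys", 487: "Ser"},
    "cHb05": {479: "Arg", 487: "Ser"},
}


def count_adapted(residues: dict, adapted: dict) -> int:
    """Count positions whose residue is listed as an adaptation to the given receptor."""
    return sum(1 for pos, aa in residues.items() if aa in adapted.get(pos, set()))


if __name__ == "__main__":
    for name, residues in STRAINS.items():
        human = count_adapted(residues, HUMAN_ADAPTED)
        civet = count_adapted(residues, CIVET_ADAPTED)
        print(f"{name}: human-adapted={human}, civet-adapted={civet}")
```

Only positions 479 and 487 are filled in for the strains because those are the residues the text gives per strain; the other positions could be added from Table 1.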

Fig. 5 | Variable regions in different SARS-CoV and bat SARSr-CoV isolates.
Variability and thus species adaptation mainly affect three severe acute respiratory syndrome coronavirus (SARS-CoV) and SARS-related coronavirus (SARSr-CoV) proteins: the spike protein (S) (both the S1 amino-terminal domain (S1-NTD) and the S1 receptor-binding domain (S1-RBD) show variability), ORF3 (3a and 3b) and ORF8 (8a and 8b). SARS-CoV GD02 and hTor02 represent strains that were isolated from patients during the early, and middle or late phase of the SARS epidemic in 2002-2003, respectively; SARS-CoV SZ3 is a representative of strains isolated from civets in 2003 and 2004 (refs 42,62). All bat SARSr-CoVs, except HKU3 and Rp3, were discovered in Yunnan province during 2011-2015. On the basis of deletions in the RBD, bat SARSr-CoVs can be divided into two clades. Those without a deletion and thus an identical size in S1 to SARS-CoV can be further divided into four genotypes: genotype 1, represented by WIV16, is highly similar to SARS-CoV in both the NTD and the RBD; genotype 2, represented by WIV1, differs in NTD from SARS-CoV; genotype 3, represented by Rs4231, differs in RBD from SARS-CoV; and genotype 4, represented by SHC014 and Rs4084, differs in both NTD and RBD from SARS-CoV 20 . The differences in S influence species-specific receptor binding, whereas differences in the accessory proteins, including potentially the newly discovered ORFX (X), mainly affect immune responses and viral immune evasion. Adapted from ref. 20, CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/).

Salt bridge
A structure in proteins that forms a bond between oppositely charged residues that are sufficiently close to each other to experience electrostatic attraction.

To corroborate the importance of these residues for SARS-CoV binding to either human or civet ACE2, two SARS-CoV S proteins, hOptimize and cOptimize, were rationally designed: the former contains all of the human ACE2-adapted residues (Phe442, Phe472, Asn479, Asp480 and Thr487), whereas the latter contains the civet ACE2-adapted residues (Tyr442, Pro472, Arg479, Gly480 and Thr487). These two S proteins demonstrate exceptionally high affinity for human ACE2 and civet ACE2, confirming that the human ACE2-adapted and civet ACE2-adapted RBM residues help determine SARS-CoV host range 72,83 . ... receptor binding, proteolytic cleavage of S and potentially other mutations that affect virion and trimer stability may also be important for virus transmissibility in different hosts, and these factors need to be studied further.

Fig. 6 | Receptor recognition by SARS-CoV and MERS-CoV. a | Severe acute respiratory syndrome coronavirus (SARS-CoV) uses its receptor-binding domain (RBD) (as shown in the structure of strain hTor02, containing core structure (cyan) and receptor-binding motif (RBM; magenta)) to bind human angiotensin-converting enzyme 2 (ACE2; green; Protein Data Bank ID: 2AJF). ACE2 is a peptidase with zinc (blue) in its active centre. b | Several residues in the host and viral receptor, as well as two salt bridges that stabilize the structure (dotted lines) and form two binding hot spots, are crucial for binding of the severe acute respiratory syndrome (SARS) epidemic strain hTor02. Hydrophobic residues surrounding the two salt bridges are present in the structure but are not shown in the figure. c | By contrast, the SARS-related coronavirus (SARSr-CoV) strain bWIV1, which was isolated from bats and can infect both civet and human cells, differs in residues 442, 472 and 487. The mutation from threonine to asparagine in residue 487 introduces a polar side chain and is predicted to interfere with binding at hot spot 353. The model shown here was built on the basis of the structure of hTor02 RBD complexed with human ACE2 (Protein Data Bank ID: 2AJF), in which residues 442, 472 and 487 were mutated from those in strain hTor02 to those in strain bWIV1. d | The bat SARSr-CoV strain bRsSHC014 can also infect human and civet cells; it carries an alanine in position 487, and the short side chain of this residue does not support the structure of hot spot 353. The model was built on the basis of the structure of cOptimize RBD complexed with human ACE2 (Protein Data Bank ID: 3SCJ), in which residues 442, 480 and 487 were mutated from those in strain cOptimize to those in strain bWIV1. e | The Middle East respiratory syndrome coronavirus (MERS-CoV) RBD (core structure in cyan and RBM in magenta) binds human dipeptidyl peptidase 4 (DPP4; green; Protein Data Bank ID: 4KR0). Structure figures were made using PyMOL 115 . Modelled mutations in panels c and d were performed using Coot 140 . Panels a-d are adapted from ref. 83 ...

SARSr-CoV mutations that affect receptor binding.
To date, numerous SARSr-CoV strains have been identified from bats 15,16,18-20 . These bat SARSr-CoVs are the likely progenitors of SARS-CoV that infected humans and civets, and hence understanding their interactions with human or civet ACE2 is critical for tracing the origins of SARS-CoV and for preventing and controlling future SARS-CoV outbreaks in humans. The RBD sequences of these bat SARSr-CoVs fall into three major groups; the representative strains from each group are bHKU3 (isolated in 2005), bWIV1 (isolated in 2013) and bRsSHC014 (isolated in 2013) (Table 1). Strains bWIV1 and bRsSHC014, but not strain bHKU3, use both human and civet ACE2 and hence can infect both human and civet cells 16,18-20,86,87 . Strain bHKU3 has a truncated RBM (Table 1), which distorts the structure of the RBM and abolishes its binding to human and civet ACE2. Neither strain bWIV1 nor strain bRsSHC014 contains truncations in its RBM, and hence, their RBMs likely retain the same structure as SARS-CoV RBMs. Here, we analysed the potential interactions between these two strains (bWIV1 and bRsSHC014) and human ACE2 by building homology structural models of their RBDs complexed with human ACE2, focusing on residues 479 and 487 (Fig. 6c, d). Strain bWIV1 contains Asn479 and Asn487 in its RBM. Whereas Asn479 is a viral adaptation to human ACE2, the polar side chain of Asn487 may have unfavourable interactions with the aliphatic portion of residue Lys353 in human ACE2, which is part of hot spot 353 (Fig. 6c). Strain bRsSHC014 contains Arg479 and Ala487 in its RBM. Whereas Arg479 is a viral adaptation to human ACE2, the small side chain of Ala487 does not provide support to the structure of hot spot 353 (Fig. 6d). Therefore, although both bWIV1 and bRsSHC014 can infect human airway cells, they bind human ACE2 less well than hTor02 and produce less severe symptoms than the epidemic strain of SARS-CoV in vivo 88,89 .
Similarly, both bWIV1 and bRsSHC014 can infect civet cells, but they bind civet ACE2 less well than cSz02. Thus, it is predicted that both strains will be attenuated compared with early-phase or late-phase human SARS epidemic viruses. Future evolution of bat SARSr-CoV strains bWIV1 and bRsSHC014 in crucial RBM residues may allow them to cross the species barriers between bats, civets and humans, posing potential health threats.

Origin and evolution of MERS-CoV
Whereas the emergence of SARS involved palm civets, most of the early MERS index cases had contact with dromedary camels. Indeed, MERS-CoV strains isolated from camels were almost identical to those isolated from humans 90-95 . Moreover, MERS-CoV-specific antibodies were highly prevalent in camels from the Middle East, Africa and Asia 13,14,96-103 . MERS-CoV infections were detected in camel serum samples collected in 1983 (ref. 100), suggesting that MERS-CoV was present in camels at least 30 years ago. Genomic sequence analysis indicated that MERS-CoV, Tylonycteris bat coronavirus HKU4 and Pipistrellus bat coronavirus HKU5 are phylogenetically related (denoted as betacoronavirus lineage C) 21 . The viruses in this lineage have identical genomic structures and are highly conserved in their polyproteins and most structural proteins, but their S proteins and accessory proteins are highly variable. MERSr-CoVs were found in at least 14 bat species from two bat families, Vespertilionidae and Nycteridae. However, none of these MERSr-CoVs is a direct progenitor of MERS-CoV, as their S proteins differ substantially from that of MERS-CoV 98,104-106 .
To understand the evolutionary relationships between MERS-CoV and MERSr-CoVs, we constructed a phylogenetic tree on the basis of the alignment of all the coding regions (Fig. 4b; Supplementary Fig. S1b). The phylogenetic tree contains two main clusters and several small clades or strains. Overall, the genetic diversity within the L1 and L2 viral lineages is low, indicating that humans and camels have been infected by viruses from the same source within a short time period. The L1 viruses include human and camel MERS-CoVs mainly ... be other yet-to-be-identified viruses that are circulating in nature and directly contributed to the emergence of MERS-CoV in humans and camels. Hopefully, such viruses will be found in bats in the future. Not surprisingly, recombination events have taken place in the evolution and emergence of MERS-CoV 94,105,107-109 . Phylogenetic trees constructed using genes encoding orf1ab and S were incongruent with the tree topology of the complete genome, suggesting potential recombination in these genes 108 . Numerous recombinations imply that MERS-CoV originated from the exchange of genetic elements between different viral ancestors, including those isolated from camels and the assumed natural host bats 94,105,107,110,111 .

Variability of human and camel MERS-CoV
The full-length genomic sequences of MERS-CoVs isolated from humans and camels are almost identical (>99% identity). The major variations are located in S, ORF4b and ORF3, particularly in African camel MERS-CoVs 94 . Substitutions of a few amino acid residues were found in the S protein of some camel MERS-CoVs, but none of them was located in the RBD 94,112 . Neutralization assays indicated that camel sera that are positive for MERS-CoV can completely neutralize the human MERS-CoV strains, suggesting that MERS-CoVs isolated from humans and camels are antigenically similar to each other 94 . MERS-CoVs from both humans and camels contain variable ORF3 and ORF4 proteins with different lengths owing to either terminal truncations or internal deletions 94 . ORF4b is known to be an interferon antagonist 113,114 . MERS-CoV isolates from West African camels with a truncated ORF4b gene replicate less efficiently in human cell culture and are less pathogenic in human DPP4 transgenic mice 94 . Curiously, deletion of the orf4 gene in the human MERS-CoV strain EMC did not substantially reduce virus replication, although it induced a stronger interferon response 94 . Another study demonstrated that the deletion of orf3-orf5 dramatically attenuated MERS-CoV virulence, primarily through increased host responses, including disrupted cellular processes, increased activation of the interferon pathway and robust inflammation 115 .

Variability of bat MERSr-CoVs
To date, bat MERSr-CoVs and human and camel MERS-CoVs share the same genomic structures but differ substantially in their genomic sequences 105,106,110,111,116 . The highest overall genomic sequence identity between bat MERSr-CoV and human and camel MERS-CoV is ~85%. On the basis of their genomic sequences, several bat MERSr-CoV strains discovered in China, such as Ii-MERSr-CoV, Ve-MERSr-CoV and Hy-MERSr-CoV, have just reached the taxonomic threshold to be considered the same species as MERS-CoV 106,110,111 .
Compared with human and camel MERS-CoV, bat MERSr-CoVs vary most in S and accessory genes. The sequence identity of the S protein between bat MERSr-CoVs and human and camel MERS-CoVs is approximately 45-65%, with even lower sequence identity in the RBD region 110,111 . The size of these S proteins differs in these strains, mainly because of deletions in their RBD region and/or the S1 and S2 boundary. These deletions are considered to be related to the differences in receptor binding and cell entry 111,116 . The accessory genes, including those encoding ORF3, ORF4a, ORF4b and ORF5, are also highly variable in length and sequence between bat MERSr-CoVs and human and camel MERS-CoVs, suggesting substantial evolution of these genes in their natural hosts 105,106,110,111,116 .
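Sequence identity values such as those quoted above are computed from aligned sequences. The Python sketch below shows one common way to do this for two pre-aligned sequences; it is not the method used by the cited studies. The function name, the convention that a gap opposite a residue counts as a mismatch and the toy sequences are all assumptions made for illustration.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where both sequences have a gap are ignored; a gap opposite a
    residue is counted as a mismatch (one of several possible conventions).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap and b == gap:
            continue  # column carries no information for this pair
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns in the alignment")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Hypothetical toy alignment, not real spike protein sequences.
    print(percent_identity("MFIFLLFLTL", "MFIF--FLSL"))
```

Because deletions in the RBD or at the S1 and S2 boundary appear as gap columns, the choice of how gaps are counted can noticeably change the reported identity for S proteins of different sizes.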

Receptor usage of MERS-CoV and MERSr-CoV
In contrast to SARS-CoV, which uses ACE2 as its receptor, MERS-CoV uses DPP4. Similar to SARS-CoV S1-CTD, MERS-CoV S1-CTD functions as the viral RBD 10,117 . Like the SARS-CoV S1-CTD, the MERS-CoV S1-CTD also contains two subdomains, a core structure and an RBM 9,118-120 (Fig. 6e). The core structures of these two S1-CTDs are similar to each other, with both containing a five-stranded β-sheet as the main scaffold. However, their RBMs differ substantially: whereas the SARS-CoV RBM mainly contains loops, the MERS-CoV RBM mainly contains a four-stranded β-sheet. The structural differences between MERS-CoV and SARS-CoV RBMs account for the different receptor specificities of the two viruses 121 .
Like the interactions between SARS-CoV and ACE2, the interactions between MERS-CoV and DPP4 have been extensively examined. DPP4 from humans, camels, horses and bats can function as a receptor for MERS-CoV, whereas DPP4 from mice, hamsters and ferrets cannot 112,122-125 . Key residue differences between human DPP4 and the DPP4 from other species affect the species specificities of MERS-CoV. For example, two residues (288 and 330) in mouse DPP4 and five residues (291, 295, 336, 341 and 346) in hamster DPP4 are largely responsible for the incompatibility of mouse and hamster DPP4s with MERS-CoV 112,123 . Mutating these residues to the corresponding residues in human DPP4 makes mouse and hamster DPP4 functional receptors for MERS-CoV. On the other hand, MERS-CoV and MERSr-CoVs have been isolated from camels and bats, respectively. MERS-CoV strains isolated from humans and camels are highly similar to each other, and they both use human DPP4 efficiently 112 . MERSr-CoVs from bats in general share only ~60-70% sequence identity with MERS-CoV in the RBD, and only some of these bat viruses, including HKU4, recognize DPP4 as the receptor 110,111,126 . However, they bind DPP4 less efficiently than MERS-CoV. Mutating three residues in the HKU4 RBD (540, 547 and 558) substantially increased its affinity for human DPP4 (ref. 127). Overall, as in the case of SARS-CoV, receptor recognition is a crucial determinant of the host range of MERS-CoV.

SADS-CoV
From 28 October 2016 to 2 May 2017, swine acute diarrhoea syndrome (SADS) was observed in four pig breeding farms in Guangdong province, with mortality of up to 90% in piglets 5 days old or younger. A novel HKU2-related bat coronavirus, named SADS-CoV, was identified as the causative agent 34,128,129 . The SADS-CoV isolates from piglets of the four farms were almost identical and shared 95% identity with Rhinolophus bat coronavirus HKU2 (ref. 130), indicating the bat origin of this pig virus. Immediately after the SADS outbreak, SADS-related CoVs (SADSr-CoVs) with 96-98% sequence identity to SADS-CoV were detected in 9.8% of anal swabs collected from different Rhinolophus species in Guangdong province during 2013-2016. Although genetically highly similar, bat SADSr-CoVs show high genetic diversity in the S gene, with 72-92% nucleotide and 80-98% amino acid identity to SADS-CoV. Receptor analysis indicated that none of the known coronavirus receptors, ACE2, DPP4 and aminopeptidase N, is essential for SADS-CoV entry 34 . The mechanism of transmission of SADS-CoV from bats to pigs and the pathogenesis of bat-originated SADSr-CoVs in pigs need further exploration. This is the first documented spillover of a bat coronavirus that caused severe disease in domestic animals, although molecular evolution data suggested that PEDV probably originated in bats 37,38 .

Conclusions and future perspectives
The collected data on genetic evolution, receptor binding and pathogenesis demonstrated that SARS-CoV most likely originated in bats through sequential recombination of bat SARSr-CoVs. Recombination likely occurred in bats before SARS-CoV was introduced into Guangdong province through infected civets or other infected mammals from Yunnan. The introduced SARS-CoV underwent rapid mutations in S and orf8 and successfully spread in market civets. After several independent spillovers to humans, some of the strains underwent further mutations in S and became epidemic during the SARS outbreak in 2002-2003. However, a recent serological investigation revealed the presence of antibodies against the SARSr-CoV nucleocapsid in humans living around a bat cave but who had not shown clinical signs of disease, suggesting that the virus can infect humans through frequent contact 131 .
A similar scenario might have happened for MERS-CoV. Since its outbreak in 2012, MERSr-CoVs and related viruses (HKU4 and HKU5) have been found in different bat species in five continents 17,21,106,110,111,116,126,127,132 . The ORF1ab of these viruses is highly similar to MERS-CoV ORF1ab, but they are highly diverse in their S proteins. Surprisingly, some bat MERSr-CoVs and HKU4 can use the same receptor, DPP4, as MERS-CoV 110,111,126,127 . Given the massive number of coronaviruses carried by different bat species, the high plasticity in receptor usage and other features such as adaptive mutation and recombination, frequent interspecies transmission from bats to animals and humans is expected.
Currently, no clinical treatments or prevention strategies are available for any human coronavirus. Given the conserved RBDs of SARS-CoV and bat SARSr-CoVs, some anti-SARS-CoV strategies in development, such as anti-RBD antibodies or RBD-based vaccines, should be tested against bat SARSr-CoVs. Recent studies demonstrated that anti-SARS-CoV strategies worked against only WIV1 and not SHC014 (refs 71,88,89). In addition, little information is available on HKU3-related strains that have much wider geographical distribution and bear truncations in their RBD. Similarly, anti-S antibodies against MERS-CoV could not protect from infection with a pseudovirus bearing the bat MERSr-CoV S 111 . Furthermore, little is known about the replication and pathogenesis of these bat viruses. Thus, future work should be focused on the biological properties of these viruses using virus isolation, reverse genetics and in vitro and in vivo infection assays. The resulting data would help the prevention and control of emerging SARS-like or MERS-like diseases in the future.
It is widely accepted that many viruses have existed in their natural reservoirs for a very long time. The constant spillover of viruses from natural hosts to humans and other animals is largely due to human activities, including modern agricultural practices and urbanization. Therefore, the most effective way to prevent viral zoonosis is to maintain the barriers between natural reservoirs and human society, in keeping with the 'one health' concept.