Introduction

Coronaviruses cause respiratory and intestinal infections in animals and humans1. They were not considered to be highly pathogenic to humans until the outbreak of severe acute respiratory syndrome (SARS) in 2002 and 2003 in Guangdong province, China2,3,4,5, as the coronaviruses that circulated before that time in humans mostly caused mild infections in immunocompetent people. Ten years after SARS, another highly pathogenic coronavirus, Middle East respiratory syndrome coronavirus (MERS-CoV), emerged in Middle Eastern countries6. SARS coronavirus (SARS-CoV) uses angiotensin-converting enzyme 2 (ACE2) as a receptor and primarily infects ciliated bronchial epithelial cells and type II pneumocytes7,8, whereas MERS-CoV uses dipeptidyl peptidase 4 (DPP4; also known as CD26) as a receptor and infects unciliated bronchial epithelial cells and type II pneumocytes9,10,11. SARS-CoV and MERS-CoV were transmitted directly to humans from market civets and dromedary camels, respectively12,13,14, and both viruses are thought to have originated in bats15,16,17,18,19,20,21. Extensive studies of these two important coronaviruses have not only led to a better understanding of coronavirus biology but have also been driving coronavirus discovery in bats globally21,22,23,24,25,26,27,28,29,30,31. In this Review, we focus on the origin and evolution of SARS-CoV and MERS-CoV. Specifically, we emphasize the ecological distribution, genetic diversity, interspecies transmission and potential for pathogenesis of SARS-related coronaviruses (SARSr-CoVs) and MERS-related coronaviruses (MERSr-CoVs) found in bats, as this information can help prepare countermeasures against future spillover and pathogenic infections in humans with novel coronaviruses.

Coronavirus diversity

Coronaviruses are members of the subfamily Coronavirinae in the family Coronaviridae and the order Nidovirales (International Committee on Taxonomy of Viruses). This subfamily is divided into four genera (Alphacoronavirus, Betacoronavirus, Gammacoronavirus and Deltacoronavirus) on the basis of their phylogenetic relationships and genomic structures (Fig. 1). The alphacoronaviruses and betacoronaviruses infect only mammals. The gammacoronaviruses and deltacoronaviruses infect birds, but some of them can also infect mammals24. Alphacoronaviruses and betacoronaviruses usually cause respiratory illness in humans and gastroenteritis in animals. The two highly pathogenic viruses, SARS-CoV and MERS-CoV, cause severe respiratory syndrome in humans, and the other four human coronaviruses (HCoV-NL63, HCoV-229E, HCoV-OC43 and HKU1) induce only mild upper respiratory diseases in immunocompetent hosts, although some of them can cause severe infections in infants, young children and elderly individuals1,28,29. Alphacoronaviruses and betacoronaviruses can pose a heavy disease burden on livestock; these viruses include porcine transmissible gastroenteritis virus32, porcine epidemic diarrhoea virus (PEDV)33 and the recently emerged swine acute diarrhoea syndrome coronavirus (SADS-CoV)34. On the basis of current sequence databases, all human coronaviruses have animal origins: SARS-CoV, MERS-CoV, HCoV-NL63 and HCoV-229E are considered to have originated in bats; HCoV-OC43 and HKU1 likely originated from rodents28,29. Domestic animals may have important roles as intermediate hosts that enable virus transmission from natural hosts to humans. In addition, domestic animals themselves can suffer disease caused by bat-borne or closely related coronaviruses: genomic sequences highly similar to PEDV were detected in bats35,36,37,38, and SADS-CoV is a recent spillover from bats to pigs34 (Fig. 2).
Currently, 7 of 11 ICTV-assigned Alphacoronavirus species and 4 of 9 Betacoronavirus species were identified only in bats (Fig. 3). Thus, bats are likely the major natural reservoirs of alphacoronaviruses and betacoronaviruses24.

Fig. 1: The genomes, genes and proteins of different coronaviruses.

Coronaviruses form enveloped and spherical particles of 100–160 nm in diameter. They contain a positive-sense, single-stranded RNA (ssRNA) genome of 27–32 kb in size. The 5'-terminal two-thirds of the genome encodes a polyprotein, pp1ab, which is further cleaved into 16 non-structural proteins that are involved in genome transcription and replication. The 3' terminus encodes structural proteins, including envelope glycoproteins spike (S), envelope (E), membrane (M) and nucleocapsid (N). In addition to the genes encoding structural proteins, there are accessory genes that are species-specific and dispensable for virus replication. Here, we compare prototypical and representative strains of four coronavirus genera: feline infectious peritonitis virus (FIPV), Rhinolophus bat coronavirus HKU2, severe acute respiratory syndrome coronavirus (SARS-CoV) strains GD02 and SZ3 from humans infected during the early phase of the SARS epidemic and from civets, respectively, SARS-CoV strain hTor02 from humans infected during the middle and late phases of the SARS epidemic, bat SARS-related coronavirus (SARSr-CoV) strain WIV1, Middle East respiratory syndrome coronavirus (MERS-CoV), mouse hepatitis virus (MHV), infectious bronchitis virus (IBV) and bulbul coronavirus HKU11.

Fig. 2: Animal origins of human coronaviruses.

Severe acute respiratory syndrome coronavirus (SARS-CoV) is a new coronavirus that emerged through recombination of bat SARS-related coronaviruses (SARSr-CoVs)20. The recombined virus infected civets and humans and adapted to these hosts before causing the SARS epidemic42,62. Middle East respiratory syndrome coronavirus (MERS-CoV) likely spilled over from bats to dromedary camels at least 30 years ago100 and since then has been prevalent in dromedary camels. HCoV-229E and HCoV-NL63 usually cause mild infections in immunocompetent humans. Progenitors of these viruses have recently been found in African bats133,134, and the camelids are likely intermediate hosts of HCoV-229E134,135. HCoV-OC43 and HKU1, both of which are also mostly harmless in humans, likely originated in rodents. Recently, swine acute diarrhoea syndrome (SADS) emerged in piglets. This disease is caused by a novel strain of Rhinolophus bat coronavirus HKU2, named SADS coronavirus (SADS-CoV)34; there is no evidence of infection in humans. Solid arrows indicate confirmed data. Broken arrows indicate potential interspecies transmission. Black arrows indicate infection in the intermediate animals, yellow arrows indicate a mild infection in humans, and red arrows indicate a severe infection in humans or animals.

Fig. 3: Phylogenetic relationships in the Coronavirinae subfamily.

The highly human-pathogenic coronaviruses belong to the subfamily Coronavirinae from the family Coronaviridae. The viruses in this subfamily group into four genera (prototype or representative strains shown): Alphacoronavirus (purple), Betacoronavirus (pink), Gammacoronavirus (green) and Deltacoronavirus (blue). Classic subgroup clusters are labelled 1a and 1b for the alphacoronaviruses and 2a–2d for the betacoronaviruses. The tree is based on published trees of Coronavirinae25,136 and reconstructed with sequences of the complete RNA-dependent RNA polymerase-coding region of the representative coronaviruses (maximum likelihood method under the GTR + I + Γ model of nucleotide substitution as implemented in PhyML, version 3.1 (ref.137)). Only nodes with bootstrap support above 70% are shown. IBV, infectious bronchitis virus; MERS-CoV, Middle East respiratory syndrome coronavirus; MHV, mouse hepatitis virus; PEDV, porcine epidemic diarrhoea virus; SARS-CoV, severe acute respiratory syndrome coronavirus; SARSr-CoV, SARS-related coronavirus.

Animal origin and evolution of SARS-CoV

At the beginning of the SARS epidemic, almost all early index patients had animal exposure before developing disease. After the causative agent of SARS was identified, SARS-CoV and/or anti-SARS-CoV antibodies were found in masked palm civets (Paguma larvata) and animal handlers in a market place12,16,39,40,41,42. However, later, wide-reaching investigations of farmed and wild-caught civets revealed that the SARS-CoV strains found in market civets were transmitted to them by other animals16,39. In 2005, two teams independently reported the discovery of novel coronaviruses related to human SARS-CoV, which were named SARS-CoV-related viruses or SARS-like coronaviruses, in horseshoe bats (genus Rhinolophus)15,43. These discoveries suggested that bats may be the natural hosts for SARS-CoV and that civets were only intermediate hosts. Subsequently, many coronaviruses phylogenetically related to SARS-CoV (SARSr-CoVs) were discovered in bats from different provinces in China and also from European, African and Southeast Asian countries15,20,38,43,44,45,46,47,48,49,50,51,52,53,54 (Fig. 4; Supplementary Fig. S1a). According to the ICTV criteria, only the strains found in Rhinolophus bats in European countries, Southeast Asian countries and China are SARSr-CoV variants. Those from Hipposideros bats in Africa are less closely related to SARS-CoV and should be classified as a new coronavirus species54. These data indicate that SARSr-CoVs have wide geographical spread and might have been prevalent in bats for a very long time. A 5-year longitudinal study revealed the coexistence of highly diverse SARSr-CoVs in bat populations in one cave of Yunnan province, China18,20,55. This location is a diversity hot spot, and the SARSr-CoVs in this location contain all the genetic diversity found in other locations of China. Furthermore, the viral strains that exist in this one location contain all genetic elements that are needed to form SARS-CoV (Fig. 5). 
As no direct progenitor of SARS-CoV was found in bat populations despite 15 years of searching and as RNA recombination is frequent within coronaviruses56, it is highly likely that SARS-CoV newly emerged through recombination of bat SARSr-CoVs in this or other yet-to-be-identified bat caves. This hypothesis is consistent with previous data showing that a direct progenitor of SARS-CoV emerged before 2002 (refs42,57,58). Recombination analysis also strongly supported the hypothesis that the civet SARS-CoV strain SZ3 arose through recombination of two existing bat strains, WIV16 and Rf4092 (ref.20). Furthermore, WIV16, the closest relative to SARS-CoV found in bats, likely arose through recombination of two other prevalent bat SARSr-CoV strains20. The most frequent recombination breakpoints are within the S gene, which encodes the spike (S) protein that contains the receptor-binding domain (RBD), and upstream of orf8, which encodes an accessory protein20,58,59. Given the prevalence and great genetic diversity of bat SARSr-CoVs, their close coexistence and the frequent recombination of the coronaviruses, it is expected that novel variants will emerge in the future60,61. Because there were no SARS cases in Yunnan province during the SARS outbreak, we hypothesize that the direct progenitor of SARS-CoV was produced by recombination within bats and then transmitted to farmed civets or another mammal, which then transmitted the virus to civets by faecal–oral transmission. When the virus-infected civets were transported to Guangdong market, the virus spread in market civets and acquired further mutations before spillover to humans.
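The idea of locating recombination breakpoints can be illustrated with a sliding-window comparison of one aligned genome against two candidate parental strains. The sketch below is only an illustration under stated assumptions: the source does not say which recombination-analysis method was used, the sequences are made-up toy strings rather than viral data, and all function names are hypothetical.

```python
def window_identity(a, b, start, size):
    """Fraction of matching positions between aligned sequences a and b in one window."""
    pairs = zip(a[start:start + size], b[start:start + size])
    return sum(1 for x, y in pairs if x == y) / size


def closest_parent_by_window(query, parents, size, step):
    """For each window, report which candidate parent is most similar to the query.

    `parents` maps a label to an aligned sequence of the same length as `query`.
    Returns a list of (window_start, best_label) tuples.
    """
    calls = []
    for start in range(0, len(query) - size + 1, step):
        scores = {name: window_identity(query, seq, start, size)
                  for name, seq in parents.items()}
        calls.append((start, max(scores, key=scores.get)))
    return calls


def breakpoints(calls):
    """Window starts at which the closest parent changes."""
    return [calls[i][0] for i in range(1, len(calls)) if calls[i][1] != calls[i - 1][1]]


# Toy, made-up aligned sequences (assumption: not real viral data).
parent_a = "A" * 20
parent_b = "C" * 20
recombinant = parent_a[:10] + parent_b[10:]  # left half from A, right half from B

calls = closest_parent_by_window(
    recombinant, {"parent_a": parent_a, "parent_b": parent_b}, size=5, step=5
)
print(calls)
print(breakpoints(calls))
```

With real genomes, the window and step sizes would have to be much larger, and a change in the closest parent would be treated as a candidate breakpoint to be tested statistically, not as proof of recombination.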

Fig. 4: Phylogenetic analysis of SARSr-CoVs and MERSr-CoVs.

a | The figure shows a simplified phylogenetic tree of severe acute respiratory syndrome-related coronaviruses (SARSr-CoVs) from bats. SARSr-CoVs cluster into three lineages, L1–L3, and human severe acute respiratory syndrome coronaviruses (SARS-CoVs) embed in L1. Two individual SARSr-CoVs do not cluster into these lineages: YN, a virus isolated from Yunnan province, China, and BG, a virus from Bulgaria, Europe. The tree is based on published trees20,138 and reconstructed using sequences of the complete RNA-dependent RNA polymerase-coding region (maximum likelihood method under the GTR + I + Γ model of nucleotide substitution as implemented in PhyML, version 3.1 (ref.137)). The strain Zhejiang2013 (GenBank No. KF636752) was used as a root. b | By contrast, Middle East respiratory syndrome-related coronaviruses (MERSr-CoVs) form two major viral lineages, L1 and L2. L1 is found in humans and camels, and L2 is found only in camels. Two small clusters, B1 (bat 1) and B2, and one single virus, SA, from South Africa, were found in bats. The phylogenetic tree of MERSr-CoVs is based on published trees94,139 and reconstructed using full-genome alignment of all coding regions using the same method as above. HKU4-1 (EF065505) and HKU5-1 (EF065509), two 2c betacoronaviruses, served as the root of the tree. Detailed phylogenetic trees and grouping information can be found in Supplementary Fig. S1. MERS-CoV, Middle East respiratory syndrome coronavirus.

Fig. 5: Variable regions in different SARS-CoV and bat SARSr-CoV isolates.

Variability, and thus species adaptation, mainly affects three severe acute respiratory syndrome coronavirus (SARS-CoV) and SARS-related coronavirus (SARSr-CoV) proteins: the spike protein (S) (both the S1 amino-terminal domain (S1-NTD) and the S1 receptor-binding domain (S1-RBD) show variability), ORF3 (3a and 3b) and ORF8 (8a and 8b). SARS-CoV GD02 and hTor02 represent strains that were isolated from patients during the early phase and the middle or late phase of the SARS epidemic in 2002–2003, respectively; SARS-CoV SZ3 is a representative of strains isolated from civets in 2003 and 2004 (refs42,62). All bat SARSr-CoVs, except HKU3 and Rp3, were discovered in Yunnan province during 2011–2015. On the basis of deletions in the RBD, bat SARSr-CoVs can be divided into two clades. Those without a deletion, which therefore have an S1 of identical size to that of SARS-CoV, can be further divided into four genotypes: genotype 1, represented by WIV16, is highly similar to SARS-CoV in both the NTD and the RBD; genotype 2, represented by WIV1, differs in NTD from SARS-CoV; genotype 3, represented by Rs4231, differs in RBD from SARS-CoV; and genotype 4, represented by SHC014 and Rs4084, differs in both NTD and RBD from SARS-CoV20. The differences in S influence species-specific receptor binding, whereas differences in the accessory proteins, including potentially the newly discovered ORFX (X), mainly affect immune responses and viral immune evasion. Adapted from ref.20, CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/).

Variability of SARS-CoV in humans and civets

The genome sequences of SARS-CoVs from market civets are almost identical to the genomes of human SARS-CoVs42,62. However, two genes show major variation. The first variable region is located in the S gene. The SARS-CoV S protein is functionally divided into two subunits, denoted S1 and S2, which are responsible for receptor binding and fusion with the cellular membrane, respectively1. S1 is further divided into the amino-terminal domain (S1-NTD) and the carboxy-terminal domain (S1-CTD). The S1-CTD functions as the RBD and is responsible for binding ACE2 and entering cells7,63,64. Two amino acid residues in the RBD, 479 and 487, were identified as essential for ACE2-mediated SARS-CoV infection and critical for virus transmission from civets to humans76,78.

The second major location of variation is the accessory gene orf8 (Fig. 5). On the basis of SARS spread, the SARS 2002–2003 outbreak could be divided into three phases, with the early phase characterized by a limited number of localized cases, followed by a middle phase during which a superspreader event occurred in a hospital and finally the late phase of international spread62. The viral genomes from early-phase patients contain two genotypes of orf8, one with a complete orf8 (369 nucleotides) and the other containing an 82-nucleotide deletion. By contrast, viral genomes from late-phase patients and most of the genomes from middle-phase patients contain a split orf8 (orf8a and orf8b) owing to a 29-nucleotide deletion; two exceptions were found in middle-phase genomes, one containing an 82-nucleotide deletion in orf8 and the other with the whole orf8 deleted. The human isolates from 2004 and all civet SARS-CoV genomes have a complete orf8 except one civet strain with an 82-nucleotide deletion62. These data indicate that orf8 genes underwent adaptations during transmission from animals to humans during the SARS epidemic. A limited functional analysis suggested that the ORF8a protein is dispensable for SARS-CoV replication in Vero E6 cells but may have a role in modulating endoplasmic reticulum stress, inducing apoptosis and inhibiting interferon responses in host cells20,65,66,67,68,69. Whether and how these adaptations were involved in SARS-CoV virulence are not fully clarified.
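A short arithmetic check helps explain why a deletion can split an open reading frame. Because codons are read in triplets, a deletion whose length is not a multiple of three shifts the downstream reading frame. The sketch below applies this general rule to the orf8 lengths quoted above; the function name is hypothetical, and the code makes no claim about the protein products beyond what the text states.

```python
def preserves_reading_frame(deletion_nt):
    """A deletion keeps the downstream reading frame only if its length is a multiple of 3."""
    return deletion_nt % 3 == 0


COMPLETE_ORF8_NT = 369  # length of the complete orf8 quoted in the text

for deletion in (29, 82):  # deletion sizes quoted in the text
    remaining = COMPLETE_ORF8_NT - deletion
    print(f"{deletion}-nt deletion: {remaining} nt remain, "
          f"frame preserved: {preserves_reading_frame(deletion)}")
```

Neither 29 nor 82 is divisible by three, so both deletions shift the frame downstream of the deletion site. This is consistent with the 29-nucleotide deletion yielding the split orf8a and orf8b described above.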

Variability of bat SARSr-CoVs

SARS-CoVs and bat SARSr-CoVs mainly vary in three regions: S, ORF8 and ORF3 (Fig. 5). Bat SARSr-CoVs share high sequence identity with SARS-CoV in the S2 region but are highly variable in the S1 region. Compared with human and civet SARS-CoV, bat SARSr-CoV S1 can be divided into two clades: clade 1, which is found only in Yunnan province, has the same size S protein as human and civet isolates18,19,20,51, whereas clade 2, which is found in many locations, has a shorter size S protein owing to deletions of 5, 12 or 13 amino acids in length15,43,44,45,48,50. Among the sequenced bat SARSr-CoVs, those with deletions in their RBDs show 78.2–80.2% amino acid sequence identity with SARS-CoV in the S protein, whereas those without deletions are much more closely related to SARS-CoV, with 90.0–97.2% amino acid sequence identity.
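Identity percentages such as those above are computed from a pairwise alignment. The following minimal sketch shows one common convention, assuming that columns with a gap in only one sequence count as mismatches; alignment tools differ on this point, and the source does not state which convention was used. The peptide strings are invented for illustration.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity between two aligned sequences of equal length.

    Columns where both sequences have a gap are ignored; columns where only
    one sequence has a gap count as mismatches (an assumed convention).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy, made-up aligned peptides; the second carries a two-residue deletion.
reference = "MFIFLLFLTL"
with_deletion = "MFI--LFLSL"
print(percent_identity(reference, with_deletion))  # 7 of 10 columns match
```

Under this convention, deletions lower the identity value directly, so part of the gap between the two clades reflects the deletions themselves and not only substitutions.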

The second variable region is located in ORF8. Most bat SARSr-CoVs retain an intact orf8 (366 or 369 nucleotides) and share 47.7–100% sequence identity among themselves and 50.6–98.4% with SARS-CoV in civets and early-phase patients. A split orf8 (364 nucleotides) owing to a 5-nucleotide deletion was found in one bat SARSr-CoV strain, similar to that of SARS-CoVs from middle-phase and late-phase patients20. The European bat SARSr-CoV has completely lost orf8 (ref.45). These data show that the orf8 genes in bat SARSr-CoVs are constantly evolving in their natural reservoirs. Considering the variability of orf8 in bats, civets and humans, investigating the function of orf8 is a priority, particularly the contribution of these different variants to viral pathogenesis.

The third variable region is in ORF3. The SARS-CoV genome encodes a 154-amino acid ORF3b, which is an interferon antagonist. Bat SARSr-CoVs and SARS-CoV are highly similar in ORF3a (96.4–98.9% amino acid identity), but bat SARSr-CoVs have different sizes of ORF3b (54–154 amino acids) (a large part of the region encoding ORF3b overlaps with the ORF3a coding region)20,70. ORF3b retains the anti-interferon function in some bat SARSr-CoVs but has lost the function in other bat SARSr-CoVs70.

A novel accessory gene, named orfx and located between orf6 and orf7, was identified in the genomes of several bat SARSr-CoVs from Yunnan province18,19,20 (Fig. 5). A preliminary study indicated that ORFX is involved in an anti-interferon response71.

Receptor usage of SARS-CoV and SARSr-CoV

ACE2 binding is a critical determinant for the host range of SARS-CoV72,73. Electron microscopic studies have shown that the SARS-CoV S protein forms a clover-shaped trimer, with three S1 heads and a trimeric S2 stalk74,75. The RBD is located on the tip of each S1 head. The RBD binds to the outer surface of ACE2, away from its zinc-chelating enzymatic site77,141 (Fig. 6a). Different SARS-CoV strains isolated from several hosts vary in their binding affinities for human ACE2 and consequently in their infectivity of human cells76,78 (Fig. 6b). The epidemic strain hTor02 was isolated from humans during the late phase of the outbreak in 2002–2003. It has a high affinity for human ACE2 and high infectivity in human cells, and consequently, it was transmitted efficiently between humans62. Strains cSz02 and cHb05 were isolated from palm civets in 2002–2003 and 2005, respectively. Both have low affinity for human ACE2 and low infectivity in human cells but have high affinity for civet ACE2 and high infectivity in civet cells12,79. Strain hcGd03 was isolated from both humans and palm civets in 2003–2004 and has moderate affinity for human ACE2 and moderate infectivity in human cells; it infected humans but did not transmit between humans80. Strain hHae08 was isolated from human cell culture and has high affinity for human ACE2 and high infectivity in human cells81. Understanding the molecular basis for human receptor usage by different SARS-CoV strains is crucial for understanding the cross-species transmission of SARS-CoV and for epidemiological monitoring of potential future outbreaks.

Fig. 6: Receptor recognition by SARS-CoV and MERS-CoV.

a | Severe acute respiratory syndrome coronavirus (SARS-CoV) uses its receptor-binding domain (RBD) (as shown in the structure of strain hTor02, containing core structure (cyan) and receptor-binding motif (RBM; magenta)) to bind human angiotensin-converting enzyme 2 (ACE2; green; Protein Data Bank ID: 2AJF). ACE2 is a peptidase with zinc (blue) in its active centre. b | Several residues in the host and viral receptor, as well as two salt bridges that stabilize the structure (dotted lines) and form two binding hot spots, are crucial for binding of the severe acute respiratory syndrome (SARS) epidemic strain hTor02. Hydrophobic residues surrounding the two salt bridges are present in the structure but are not shown in the figure. c | By contrast, the SARS-related coronavirus (SARSr-CoV) strain bWIV1, which was isolated from bats and can infect both civet and human cells, differs in residues 442, 472 and 487. The mutation from threonine to asparagine in residue 487 introduces a polar side chain and is predicted to interfere with binding at hot spot 353. The model shown here was built on the basis of the structure of hTor02 RBD complexed with human ACE2 (Protein Data Bank ID: 2AJF), in which residues 442, 472 and 487 were mutated from those in strain hTor02 to those in strain bWIV1. d | The bat SARSr-CoV strain bRsSHC014 can also infect human and civet cells; it carries an alanine in position 487, and the short side chain of this residue does not support the structure of hot spot 353. The model was built on the basis of the structure of cOptimize RBD complexed with human ACE2 (Protein Data Bank ID: 3SCJ), in which residues 442, 480 and 487 were mutated from those in strain cOptimize to those in strain bWIV1. e | The Middle East respiratory syndrome coronavirus (MERS-CoV) RBD (core structure in cyan and RBM in magenta) binds human dipeptidyl peptidase 4 (DPP4; green; Protein Data Bank ID: 4KR0). Structure figures were made using PyMOL115. 
Modelled mutations in panels c and d were performed using Coot140. Panels a–d are adapted from ref.83: this research was originally published in The Journal of Biological Chemistry. Wu, K. L., Peng, G. Q., Wilken, M., Geraghty, R. J. & Li, F. Mechanisms of host receptor adaptation by severe acute respiratory syndrome coronavirus. J. Biol. Chem. 2012; 287:8904–8911. © American Society for Biochemistry and Molecular Biology.

SARS-CoV mutations that affect human and civet receptor binding

Crystal structures of the SARS-CoV RBD complexed with human ACE2 revealed that the SARS-CoV RBD contains a core structure and a receptor-binding motif (RBM)82,141 (Fig. 6a). Two virus-binding hot spots have been identified at the interface of the RBD and human ACE2, centring on ACE2 residues Lys31 (hot spot 31) and Lys353 (hot spot 353)83,84 (Fig. 6b). They both consist of a salt bridge (between Lys31 and Glu35 for hot spot 31 and between Lys353 and Asp38 for hot spot 353); both salt bridges are buried in hydrophobic pockets and contribute a substantial amount of energy to RBD–ACE2 binding as well as filling voids at the RBD–ACE2 interface. Naturally selected RBM mutations all interact with the hot spots (Fig. 6b; Table 1) and affect RBD–ACE2 binding.

Table 1 Mutations in the receptor-binding motif of SARS-CoV

Mutations in RBM residue 479 had an important role in the civet-to-human transmission of SARS-CoV42,76,78,85. Residue 479 is an asparagine in strains hTor02, hcGd03 and hHae08 but is a lysine in strain cSz02 and an arginine in strain cHb05 (Table 1). Asn479 is located near hot spot 31, without interfering with the structure of hot spot 31 (ref.85) (Fig. 6b, c). However, a change to Lys479 leads to steric and electrostatic interference with hot spot 31, reducing the binding affinity between the SARS-CoV RBD and human ACE2. By contrast, Arg479 reaches the vicinity of hot spot 353 and forms a salt bridge with ACE2 residue Asp38 (ref.83) (Fig. 6d). Hence, strains hTor02, hcGd03 and hHae08 (all of which contain Asn479) and strain cHb05 (which contains Arg479) recognize human ACE2 and infect human cells efficiently, whereas strain cSz02 (which contains Lys479) recognizes human ACE2 inefficiently and infects human cells inefficiently. The above structural analyses are supported by biochemical, functional and epidemiological data42,76,78,83,84,85. Because of residue differences between human ACE2 and civet ACE2, both Asn479 and Lys479 fit well into the interface between the RBD and civet ACE2, although Arg479 fits even better83,85; consequently, strains hTor02, cSz02, hcGd03 and cHb05 (which contain either Asn479, Lys479 or Arg479) recognize civet ACE2 and infect civet cells efficiently79. In sum, Asn479 and Arg479 are viral adaptations to human ACE2, whereas Lys479 is incompatible with human ACE2; Arg479 is a viral adaptation to civet ACE2, whereas Asn479 and Lys479 are also compatible with civet ACE2.

Mutations in RBM residue 487 had an important role in the human-to-human transmission of SARS-CoV. Residue 487 is a threonine in strain hTor02 but is a serine in the other strains isolated from humans and civets. The methyl group of Thr487 interacts with hot spot 353 in human ACE2 by providing stacking support for the formation of the salt bridge between Lys353 and Asp38; consequently, strain hTor02 recognizes human ACE2 efficiently and was transmitted between humans during the 2002–2003 SARS epidemic. By contrast, Ser487 cannot provide support to hot spot 353, and hence the other strains isolated from humans and civets recognize human ACE2 inefficiently. Consequently, neither cSz02 nor hcGd03 was transmitted between humans. The above structural analyses are supported by biochemical, functional and epidemiological data42,76,78,83,84,85. Because of residue differences between human ACE2 and civet ACE2, Ser487 fits well into the RBD–civet ACE2 interface although still not as well as Thr487 (refs83,85); consequently, strains cSz02, hcGd03 and cHb05 (which contain Ser487) recognize civet ACE2 and infect civet cells efficiently79. In sum, Thr487 is a viral adaptation to both human and civet ACE2, and Ser487 is much more compatible with civet ACE2 than with human ACE2 (Fig. 6b).

RBM residues 442, 472 and 480 also contribute to receptor recognition and host range of SARS-CoV although not as much as residues 479 and 487. Detailed structural, biochemical and functional analyses showed that Phe442, Phe472 and Asp480 are viral adaptations to human ACE2, whereas Tyr442, Leu472 or Pro472, and Gly480 are viral adaptations to civet ACE2 (refs72,83). To corroborate the importance of these residues for SARS-CoV binding to either human or civet ACE2, two SARS-CoV S proteins, hOptimize and cOptimize, were rationally designed: the former contains all of the human ACE2-adapted residues (Phe442, Phe472, Asn479, Asp480 and Thr487), whereas the latter contains the civet ACE2-adapted residues (Tyr442, Pro472, Arg479, Gly480 and Thr487). These two S proteins demonstrate exceptionally high affinity for human ACE2 and civet ACE2, confirming that the human ACE2-adapted and civet ACE2-adapted RBM residues help determine SARS-CoV host range72,83. In addition to receptor binding, proteolytic cleavage of S and potentially other mutations that affect virion and trimer stability may also be important for virus transmissibility in different hosts, and these factors need to be studied further.
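The residue assignments described in this section can be collected into a small lookup table, which makes it easy to check which of the five RBM positions in a given S protein carry a host-adapted residue. The sketch below is bookkeeping only: the residue sets are transcribed from the text, whereas the counting function and all variable names are hypothetical, and a simple count is not a predictor of binding affinity.

```python
# Residues described in the text as adaptations to each host's ACE2.
HUMAN_ADAPTED = {442: {"Phe"}, 472: {"Phe"}, 479: {"Asn", "Arg"}, 480: {"Asp"}, 487: {"Thr"}}
CIVET_ADAPTED = {442: {"Tyr"}, 472: {"Leu", "Pro"}, 479: {"Arg"}, 480: {"Gly"}, 487: {"Thr"}}

# The two rationally designed S proteins, as listed in the text.
H_OPTIMIZE = {442: "Phe", 472: "Phe", 479: "Asn", 480: "Asp", 487: "Thr"}
C_OPTIMIZE = {442: "Tyr", 472: "Pro", 479: "Arg", 480: "Gly", 487: "Thr"}


def count_adapted(profile, adapted):
    """Count the RBM positions in `profile` whose residue is in the host-adapted set."""
    return sum(1 for pos, res in profile.items() if res in adapted.get(pos, set()))


for name, profile in (("hOptimize", H_OPTIMIZE), ("cOptimize", C_OPTIMIZE)):
    print(name,
          "human-adapted:", count_adapted(profile, HUMAN_ADAPTED),
          "civet-adapted:", count_adapted(profile, CIVET_ADAPTED))
```

By construction, each designed protein matches all five positions for its target host. The overlap between the two tables comes from Arg479 and Thr487, which the text describes as adaptations to both human and civet ACE2.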

SARSr-CoV mutations that affect receptor binding

To date, numerous SARSr-CoV strains have been identified from bats15,16,18,19,20. These bat SARSr-CoVs are the likely progenitors of SARS-CoV that infected humans and civets, and hence understanding their interactions with human or civet ACE2 is critical for tracing the origins of SARS-CoV and for preventing and controlling future SARS-CoV outbreaks in humans. The RBD sequences of these bat SARSr-CoVs fall into three major groups; the representative strains from each group are bHKU3 (isolated in 2005), bWIV1 (isolated in 2013) and bRsSHC014 (isolated in 2013) (Table 1). Strains bWIV1 and bRsSHC014, but not strain bHKU3, use both human and civet ACE2 and hence can infect both human and civet cells16,18,19,20,86,87. Strain bHKU3 has a truncated RBM (Table 1), which distorts the structure of the RBM and abolishes its binding to human and civet ACE2. Neither strain bWIV1 nor strain bRsSHC014 contains truncations in its RBM, and hence, their RBMs likely retain the same structure as SARS-CoV RBMs. Here, we analysed the potential interactions between these two strains (bWIV1 and bRsSHC014) and human ACE2 by building homology structural models of their RBDs complexed with human ACE2, focusing on residues 479 and 487 (Fig. 6c, d). Strain bWIV1 contains Asn479 and Asn487 in its RBM. Whereas Asn479 is a viral adaptation to human ACE2, the polar side chain of Asn487 may have unfavourable interactions with the aliphatic portion of residue Lys353 in human ACE2, which is part of hot spot 353 (Fig. 6c). Strain bRsSHC014 contains Arg479 and Ala487 in its RBM. Whereas Arg479 is a viral adaptation to human ACE2, the small side chain of Ala487 does not provide support to the structure of hot spot 353 (Fig. 6d). Therefore, although both bWIV1 and bRsSHC014 can infect human airway cells, they bind human ACE2 less well than hTor02 and produce less severe symptoms than the epidemic strain of SARS-CoV in vivo88,89. 
Similarly, both bWIV1 and bRsSHC014 can infect civet cells, but they bind civet ACE2 less well than cSz02. Thus, it is predicted that both strains will be attenuated compared with early-phase or late-phase human SARS epidemic viruses. Future evolution of bat SARSr-CoV strains bWIV1 and bRsSHC014 in crucial RBM residues may allow them to cross the species barriers between bats, civets and humans, posing potential health threats.

Origin and evolution of MERS-CoV

Whereas the emergence of SARS involved palm civets, most of the early MERS index cases had contact with dromedary camels. Indeed, MERS-CoV strains isolated from camels were almost identical to those isolated from humans90,91,92,93,94,95. Moreover, MERS-CoV-specific antibodies were highly prevalent in camels from the Middle East, Africa and Asia13,14,96,97,98,99,100,101,102,103. MERS-CoV infections were detected in camel serum samples collected in 1983 (ref.100), suggesting that MERS-CoV was present in camels at least 30 years ago. Genomic sequence analysis indicated that MERS-CoV, Tylonycteris bat coronavirus HKU4 and Pipistrellus bat coronavirus HKU5 are phylogenetically related (denoted as betacoronavirus lineage C)21. The viruses in this lineage have identical genomic structures and are highly conserved in their polyproteins and most structural proteins, but their S proteins and accessory proteins are highly variable. MERSr-CoVs were found in at least 14 bat species from two bat families, Vespertilionidae and Nycteridae. However, none of these MERSr-CoVs is a direct progenitor of MERS-CoV, as their S proteins differ substantially from that of MERS-CoV98,104,105,106.

To understand the evolutionary relationships between MERS-CoV and MERSr-CoVs, we constructed a phylogenetic tree on the basis of the alignment of all the coding regions (Fig. 4b; Supplementary Fig. S1b). The phylogenetic tree contains two main clusters, designated lineages L1 and L2, and several small clades or strains. Overall, the genetic diversity within the L1 and L2 viral lineages is low, indicating that humans and camels have been infected by viruses from the same source within a short time period. The L1 viruses include human and camel MERS-CoVs that have caused outbreaks in human populations, mainly from the Middle East (the United Arab Emirates, the Kingdom of Saudi Arabia, Oman and Jordan) and from two Asian countries (South Korea and Thailand). It is worth noting that the cases reported in South Korea and Thailand were related to those in the Middle East. The L2 viruses include camel MERS-CoVs from Africa (Nigeria, Burkina Faso, Ethiopia and Morocco); these viruses have not caused any human infection. Clearly, these two viral lineages share a common ancestor but have diverged in their potential to cause human infections. The MERSr-CoV strain Neoromicia/5038 (GenBank No. MF593268) isolated in South Africa was the closest relative to MERS-CoVs in the phylogenetic tree. Overall, all the MERSr-CoVs isolated from bats support the hypothesis that MERS-CoV originated from bats. However, given the phylogenetic gap between the bat MERSr-CoVs and human and camel MERS-CoVs, there are likely to be other yet-to-be-identified viruses that circulate in nature and contributed directly to the emergence of MERS-CoV in humans and camels. Hopefully, such viruses will be found in bats in the future.

Not surprisingly, recombination events have taken place in the evolution and emergence of MERS-CoV94,105,107,108,109. Phylogenetic trees constructed using genes encoding orf1ab and S were incongruent with the tree topology of the complete genome, suggesting potential recombination in these genes108. Numerous recombinations imply that MERS-CoV originated from the exchange of genetic elements between different viral ancestors, including those isolated from camels and the assumed natural host bats94,105,107,110,111.
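The idea of incongruent tree topologies can be made concrete with a small sketch. The Python example below counts the clades that are present in one rooted tree but not in the other (a simple clade-based distance in the spirit of the Robinson–Foulds metric). The nested-tuple tree encoding, the function names and the four taxon labels are assumptions made purely for illustration; they are not data or methods from the studies cited above.

```python
def collect_clades(tree):
    """Return (leaves, clades) for a rooted tree written as nested tuples.

    A leaf is a string; an internal node is a tuple of subtrees.
    Each clade is the frozenset of leaf names below one internal node.
    """
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves = frozenset()
    clades = set()
    for child in tree:
        child_leaves, child_clades = collect_clades(child)
        leaves |= child_leaves
        clades |= child_clades
    clades.add(leaves)  # the clade defined by this internal node
    return leaves, clades


def clade_distance(tree_a, tree_b):
    """Number of clades found in exactly one of the two trees."""
    leaves_a, clades_a = collect_clades(tree_a)
    leaves_b, clades_b = collect_clades(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must contain the same taxa")
    return len(clades_a ^ clades_b)


# Hypothetical taxa: the placement of the two bat viruses differs
# between the whole-genome tree and the S-gene tree.
genome_tree = ((("human", "camel"), "bat_A"), "bat_B")
s_gene_tree = ((("human", "camel"), "bat_B"), "bat_A")

print(clade_distance(genome_tree, genome_tree))  # 0: identical topology
print(clade_distance(genome_tree, s_gene_tree))  # 2: incongruent topology
```

A non-zero distance between a gene tree and the whole-genome tree is only a hint of recombination; in practice, incongruence must also be weighed against branch support and confirmed with dedicated recombination analyses.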

Variability of human and camel MERS-CoV

The full-length genomic sequences of MERS-CoVs isolated from humans and camels are almost identical (>99% identity). The major variations are located in S, ORF4b and ORF3, particularly in African camel MERS-CoVs94. Substitutions of a few amino acid residues were found in the S protein of some camel MERS-CoVs, but none of them was located in the RBD94,112. Neutralization assays indicated that camel sera that are positive for MERS-CoV can completely neutralize the human MERS-CoV strains, suggesting that MERS-CoVs isolated from humans and camels are antigenically similar to each other94. MERS-CoVs from both humans and camels contain variable ORF3 and ORF4 proteins with different lengths owing to either terminal truncations or internal deletions94. ORF4b is known to be an interferon antagonist113,114. MERS-CoV isolates from West African camels with a truncated ORF4b gene replicate less efficiently in human cell culture and are less pathogenic in human DPP4 transgenic mice94. Curiously, deletion of the orf4 gene in the human MERS-CoV strain EMC did not substantially reduce virus replication, although it induced a stronger interferon response94. Another study demonstrated that the deletion of orf3–orf5 dramatically attenuated MERS-CoV virulence, primarily through increased host responses, including disrupted cellular processes, increased activation of the interferon pathway and robust inflammation115.

Variability of bat MERSr-CoVs

The bat MERSr-CoVs identified to date share the same genomic structures as human and camel MERS-CoVs but differ substantially in their genomic sequences105,106,110,111,116. The highest overall genomic sequence identity between bat MERSr-CoV and human and camel MERS-CoV is ~85%. On the basis of their genomic sequences, several bat MERSr-CoV strains discovered in China, such as Ii-MERSr-CoV, Ve-MERSr-CoV and Hy-MERSr-CoV, have just reached the taxonomic threshold to be considered the same species as MERS-CoV106,110,111.

Compared with human and camel MERS-CoV, bat MERSr-CoVs vary most in S and accessory genes. The sequence identity of the S protein between bat MERSr-CoVs and human and camel MERS-CoVs is approximately 45–65%, with even lower sequence identity in the RBD region110,111. The size of these S proteins differs in these strains, mainly because of deletions in their RBD region and/or the S1 and S2 boundary. These deletions are considered to be related to the differences in receptor binding and cell entry111,116. The accessory genes, including those encoding ORF3, ORF4a, ORF4b and ORF5, are also highly variable in length and sequence between bat MERSr-CoVs and human and camel MERS-CoVs, suggesting substantial evolution of these genes in their natural hosts105,106,110,111,116.
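Percentage identities such as those quoted above are derived from pairwise sequence alignments. As a minimal sketch, the Python function below computes identity over the aligned columns in which neither sequence has a gap. This is only one of several conventions in use, and the function name and toy sequences are illustrative assumptions rather than real coronavirus data.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences.

    Columns in which either sequence has a gap are skipped, so
    insertions and deletions do not count as mismatches here.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == gap or residue_b == gap:
            continue  # skip gapped columns
        compared += 1
        if residue_a == residue_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment: 8 ungapped columns, 7 of them identical.
print(percent_identity("ACGT-ACGT", "ACGTTACGA"))  # 87.5
```

Because gapped columns are skipped under this convention, deletions such as those described for the RBD region do not lower the computed identity, which is one reason why identity values should be read together with the alignment itself.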

Receptor usage of MERS-CoV and MERSr-CoV

In contrast to SARS-CoV, which uses ACE2 as its receptor, MERS-CoV uses DPP4. Similar to SARS-CoV S1-CTD, MERS-CoV S1-CTD functions as the viral RBD10,117. Like the SARS-CoV S1-CTD, the MERS-CoV S1-CTD also contains two subdomains, a core structure and an RBM9,118,119,120 (Fig. 6e). The core structures of these two S1-CTDs are similar to each other, with both containing a five-stranded β-sheet as the main scaffold. However, their RBMs differ substantially: whereas the SARS-CoV RBM mainly contains loops, the MERS-CoV RBM mainly contains a four-stranded β-sheet. The structural differences between MERS-CoV and SARS-CoV RBMs account for the different receptor specificities of the two viruses121.

Like the interactions between SARS-CoV and ACE2, the interactions between MERS-CoV and DPP4 have been extensively examined. DPP4 from humans, camels, horses and bats can function as a receptor for MERS-CoV, whereas DPP4 from mice, hamsters and ferrets cannot112,122,123,124,125. Key residue differences between human DPP4 and the DPP4 from other species affect the species specificities of MERS-CoV. For example, two residues (288 and 330) in mouse DPP4 and five residues (291, 295, 336, 341 and 346) in hamster DPP4 are largely responsible for the incompatibility of mouse and hamster DPP4s with MERS-CoV112,123. Mutating these residues to the corresponding residues in human DPP4 makes mouse and hamster DPP4 functional receptors for MERS-CoV. On the other hand, MERS-CoV and MERSr-CoVs have been isolated from camels and bats, respectively. MERS-CoV strains isolated from humans and camels are highly similar to each other, and they both use human DPP4 efficiently112. MERSr-CoVs from bats in general share only ~60–70% sequence identity with MERS-CoV in the RBD, and only some of these bat viruses, including HKU4, recognize DPP4 as the receptor110,111,126. However, they bind DPP4 less efficiently than MERS-CoV. Mutating three residues in the HKU4 RBD (540, 547 and 558) substantially increased its affinity for human DPP4 (ref.127). Overall, as in the case of SARS-CoV, receptor recognition is a crucial determinant of the host range of MERS-CoV.

SADS-CoV

From 28 October 2016 to 2 May 2017, swine acute diarrhoea syndrome (SADS) was observed in four pig breeding farms in Guangdong province, with a mortality up to 90% for piglets 5 days or younger. A novel HKU2-related bat coronavirus, named SADS-CoV, was identified as the causative agent34,128,129. The SADS-CoV isolates from piglets of the four farms were almost identical and shared 95% identity with Rhinolophus bat coronavirus HKU2 (ref.130), indicating the bat origin of this pig virus. Immediately after the SADS outbreak, SADS-related CoVs (SADSr-CoVs) with 96–98% sequence identity to SADS-CoV were detected in 9.8% of anal swabs collected from different Rhinolophus species in Guangdong province during 2013–2016. Although genetically highly similar, bat SADSr-CoVs show high genetic diversity in the S gene, with 72–92% nucleotide and 80–98% amino acid identity to SADS-CoV. Receptor analysis indicated that none of the known coronavirus receptors, ACE2, DPP4 and aminopeptidase N, are essential for SADS-CoV entry34. The mechanism of transmission of SADS-CoV from bats to pigs and the pathogenesis of bat-originated SADSr-CoVs in pigs need further exploration. This is the first documented spillover of a bat coronavirus that caused severe diseases in domestic animals, although molecular evolution data suggested PEDV probably originated in bats37,38.

Conclusions and future perspectives

The collected data on genetic evolution, receptor binding and pathogenesis demonstrated that SARS-CoV most likely originated in bats through sequential recombination of bat SARSr-CoVs. Recombination likely occurred in bats before SARS-CoV was introduced into Guangdong province through infected civets or other infected mammals from Yunnan. The introduced SARS-CoV underwent rapid mutations in S and orf8 and successfully spread in market civets. After several independent spillovers to humans, some of the strains underwent further mutations in S and became epidemic during the SARS outbreak in 2002–2003. However, a recent serological investigation revealed the presence of antibodies against the SARSr-CoV nucleocapsid in humans living around a bat cave but who had not shown clinical signs of disease, suggesting that the virus can infect humans through frequent contact131.

A similar scenario might have happened for MERS-CoV. Since its outbreak in 2012, MERSr-CoVs and related viruses (HKU4 and HKU5) have been found in different bat species on five continents17,21,106,110,111,116,126,127,132. The ORF1ab of these viruses is highly similar to MERS-CoV ORF1ab, but they are highly diverse in their S proteins. Surprisingly, some bat MERSr-CoVs and HKU4 can use the same receptor, DPP4, as MERS-CoV110,111,126,127. Given the massive number of coronaviruses carried by different bat species, the high plasticity in receptor usage and other features such as adaptive mutation and recombination, frequent interspecies transmission from bats to animals and humans is expected.

Currently, no clinical treatments or prevention strategies are available for any human coronavirus. Given the conserved RBDs of SARS-CoV and bat SARSr-CoVs, some anti-SARS-CoV strategies in development, such as anti-RBD antibodies or RBD-based vaccines, should be tested against bat SARSr-CoVs. Recent studies demonstrated that anti-SARS-CoV strategies worked against only WIV1 and not SHC014 (refs71,88,89). In addition, little information is available on HKU3-related strains that have much wider geographical distribution and bear truncations in their RBD. Similarly, anti-S antibodies against MERS-CoV could not protect from infection with a pseudovirus bearing the bat MERSr-CoV S protein (ref.111). Furthermore, little is known about the replication and pathogenesis of these bat viruses. Thus, future work should focus on the biological properties of these viruses using virus isolation, reverse genetics and in vitro and in vivo infection assays. The resulting data would help the prevention and control of emerging SARS-like or MERS-like diseases in the future.

It is widely accepted that many viruses have existed in their natural reservoirs for a very long time. The constant spillover of viruses from natural hosts to humans and other animals is largely due to human activities, including modern agricultural practices and urbanization. Therefore, the most effective way to prevent viral zoonosis is to maintain the barriers between natural reservoirs and human society, in keeping with the ‘One Health’ concept.