Main

As coronaviruses are common in mammals and birds5, we used the whole-genome sequence of SARS-CoV-2 (strain WHCV; GenBank accession number MN908947) in a Blast search of SARS-related coronavirus sequences in available mammalian and avian viromic, metagenomic and transcriptomic data. We identified 34 closely related contigs in a set of viral metagenomes from pangolins (Extended Data Table 1), and therefore focused our subsequent search on SARS-related coronaviruses in pangolins.

We obtained lung tissues from 4 Chinese pangolins (Manis pentadactyla) and 25 Malayan pangolins (Manis javanica) from a wildlife rescue centre during March–August 2019, and analysed them for SARS-related coronaviruses using reverse-transcription polymerase chain reaction (RT–PCR) with primers that target a conserved region of betacoronaviruses. RNA from 17 of the 25 Malayan pangolins generated the expected PCR product, whereas RNA from the Chinese pangolins did not amplify. The virus-positive Malayan pangolins were all from the first transport. These pangolins were brought into the rescue centre at the end of March, and gradually showed signs of respiratory disease, including shortness of breath, emaciation, lack of appetite, inactivity and crying. Furthermore, 14 of the 17 pangolins that tested positive for viral RNA died within one and a half months of testing. Plasma samples of four PCR-positive and four PCR-negative Malayan pangolins were used in the detection of IgG and IgM antibodies against SARS-CoV-2 using a double-antigen sandwich enzyme-linked immunosorbent assay (ELISA). One of the PCR-positive samples reacted strongly, showing an optical density at 450 nm (OD450) value of 2.17 (cut-off value = 0.11) (Extended Data Table 2). The plasma remained positive at the dilution of 1:80, which suggests that the pangolin was naturally infected with a virus similar to SARS-CoV-2. The other three PCR-positive pangolins had no detectable antibodies against SARS-CoV-2. It is possible that these pangolins died during the acute stage of disease, before the appearance of antibodies. Histological examinations of tissues from four betacoronavirus-positive Malayan pangolins revealed diffuse alveolar damage of varying severity in the lung, compared with lung tissue from a betacoronavirus-negative Malayan pangolin.
In one case, alveoli were filled with desquamated epithelial cells and some macrophages with haemosiderin pigments, with considerably reduced alveolar space, leading to the consolidation of the lung. In other cases, similar changes were more focal (Fig. 1, Extended Data Fig. 1). The severe case also had exudate with red blood cells and necrotic cell debris in bronchioles and bronchi. Focal mononuclear-cell infiltration was seen in the bronchioles and bronchi in two of the cases, and haemorrhage was seen in the bronchioles and small bronchi in one case (Extended Data Figs. 1–3). Hyaline membrane and syncytia were not detected in the alveoli of the four cases we examined.

Fig. 1: Pathological changes in the lungs of pangolins that are potentially induced by pangolin-CoV.

a–d, Histological changes in the lung tissues are compared between a virus-negative Malayan pangolin (a) and three Malayan pangolins naturally infected with pangolin-CoV (b–d) (original magnification × 1,000). Proliferation and desquamation of alveolar epithelial cells and haemosiderin pigments are seen in tissues from all three infected pangolins and severe capillary congestion is seen in one of them (c). e, Viral particles are seen in double-membrane vesicles in the transmission electron microscopy image taken from Vero E6 cell culture inoculated with supernatant of homogenized lung tissue from one pangolin, with morphology indicative of coronavirus (inserts at the top right corner of e). Scale bar, 200 nm.

To isolate the virus, supernatant from homogenized lung tissue from one dead Malayan pangolin was inoculated into Vero E6 cells. Obvious cytopathogenic effects were observed in cells after a 72-h incubation. Viral particles were detected by transmission electron microscopy: most of these particles were inside double-membrane vesicles, with a few outside of them. They showed typical coronavirus morphology (Fig. 1e). RT–PCR targeting the spike (S) and RdRp genes produced the expected PCR products: these PCR products had approximately 84.5% and 92.2% nucleotide sequence identity, respectively, to the partial S and RdRp genes of SARS-CoV-2.

Illumina RNA sequencing was used to identify viruses in the lung from 12 pangolins (including four that were reported previously15). Mapping sequence data to the reference SARS-CoV-2 WHCV genome identified coronavirus sequence reads in nine samples (Extended Data Table 3). For one sample, higher genome coverage was obtained by remapping the total reads to the reference genome (Extended Data Fig. 4). We obtained the complete coronavirus genome (29,825 bp), which we designated pangolin-CoV, using the assembled contigs, short sequence reads and targeted PCR analysis. The full S gene was sequenced in six PCR-positive samples, which revealed the presence of only four nucleotide differences in the sequence alignment among these samples (Extended Data Fig. 5); this indicates that only one type of coronavirus was present in the batch of study samples. The predicted S, E, M and N genes of pangolin-CoV are 3,798, 228, 669 and 1,260 bp in length, respectively, and the proteins they encode share 90.7%, 100%, 98.6% and 97.8% amino acid identity with the equivalent proteins of SARS-CoV-2 (Table 1).

Table 1 Genomic comparison of pangolin-CoV with SARS-CoV-2, SARS-CoV and bat SARS-related coronaviruses

In a Simplot analysis of whole-genome sequences, we found that pangolin-CoV was highly similar to SARS-CoV-2 and RaTG13, with sequence identity between 80 and 98% (except for the S gene) (Fig. 2). Further comparative analysis of the S gene sequences suggests that there were recombination events among some of the SARS-related coronaviruses that we analysed. In the region of nucleotides 1–914, pangolin-CoV is more similar to the bat SARS-related coronaviruses ZXC21 and ZC45, whereas in the remaining part of the gene pangolin-CoV is more similar to SARS-CoV-2 and RaTG13 (Fig. 2). In particular, the receptor-binding domain (RBD) of the S protein of pangolin-CoV has only one amino acid difference with SARS-CoV-2. Overall, these data indicate that SARS-CoV-2 might have originated from the recombination of a virus similar to pangolin-CoV and a virus similar to RaTG13 (Fig. 2). To further support this conclusion, we assessed the evolutionary relationships among betacoronaviruses in the full genome, the RdRp and S genes, and in different regions of the S gene (Fig. 2c, Extended Data Fig. 6). The topologies mostly showed the clustering of pangolin-CoV with SARS-CoV-2 and RaTG13; SARS-CoV-2 and RaTG13 form a subclade within this cluster (Fig. 2c). However, pangolin-CoV and SARS-CoV-2 grouped together in the phylogenetic analysis of the RBD. Conflicts in cluster formation among phylogenetic analyses of different regions of the genome serve as a strong indication of genetic recombination, as has previously been seen for SARS-CoV and Middle East respiratory syndrome coronavirus (MERS-CoV)6,7.

Fig. 2: Genome characterization of pangolin-CoV.

a, Similarity plot of the full-length genomes and S gene sequences of pangolin-CoV against sequences of SARS-CoV-2 strain WIV02, as well as RaTG13, ZC45 and ZXC21. Although pangolin-CoV has a high sequence identity to SARS-CoV-2 and RaTG13 in most regions of the S gene, it is more similar to ZXC21 and ZC45 at the 5′ end. SARS-rCoV, SARS-related coronavirus. Parameters for the similarity plots are: window, 500 bp; step, 50 bp; gap strip, on; Kimura (2 parameter); T/t 2.0. b, Because of the presence of genetic recombination, there is discrepancy in cluster formation among the outcomes of phylogenetic analyses of different regions of the S gene. c, Phylogeny of coronaviruses closely related to SARS-CoV-2, based on full genome sequences. The phylogenetic tree was constructed using RAxML with the substitution model GTRGAMMAI and 1,000 bootstrap replicates. Numbers (>70) above or below branches are percentage bootstrap values for the associated nodes. The scale bar represents the number of substitutions per site. Red circles indicate the pangolin coronavirus sequences generated in this study, and blue triangles indicate SARS-CoV-2 sequences from humans.

As the S proteins of both SARS-CoV and SARS-CoV-2 have previously been shown to specifically recognize angiotensin-converting enzyme 2 (ACE2) during entry into host cells2,8, we conducted molecular binding simulations of the interaction of the S proteins of the four closely related SARS-related coronaviruses with ACE2 proteins from humans, civets and pangolins. As expected, the RBD of SARS-CoV binds efficiently to ACE2 from humans and civets in the molecular binding simulation. In addition, this RBD appears to be capable of binding ACE2 of pangolins. By contrast, the S proteins of SARS-CoV-2 and pangolin-CoV can potentially recognize only the ACE2 of humans and pangolins (Extended Data Fig. 7).

SARS-CoV-2 is one of three known zoonotic coronaviruses (the others are SARS-CoV and MERS-CoV) that infect the lower respiratory tract and cause severe respiratory syndromes in humans7,9. Thus far, SARS-CoV-2 has been more contagious, but less deadly, than SARS-CoV10: the total number of human infections by SARS-CoV-2 far exceeds that of SARS-CoV11. Epidemiological investigations of the SARS-CoV-2 outbreak have shown that some of the initial patients were associated with the Huanan seafood market, where live wildlife was also sold10. No animals thus far have been implicated as carriers of the virus. SARS-CoV-2 forms a cluster with SARS-CoV and bat SARS-related coronaviruses (Fig. 2c). In addition, a bat coronavirus (RaTG13) has about 96% sequence identity to SARS-CoV-2 at the whole-genome level2. Therefore, it is reasonable to assume that bats are the native host of SARS-CoV-2, as has previously been suggested for SARS-CoV and MERS-CoV12,13. The SARS-related coronavirus identified in the present study and in the metagenomic assemblies of viral sequences from Malayan pangolins14 is genetically related to SARS-CoV-2, but is unlikely to be directly linked to the current outbreak because of its substantial sequence differences from SARS-CoV-2. However, a virus related to pangolin-CoV appears to have donated the RBD to SARS-CoV-2. SARS-related coronavirus sequences have previously been detected in dead Malayan pangolins15. These sequences appear to be from the same virus (pangolin-CoV) that we identified in the present study, as judged from their sequence similarity. Here we provide evidence for the potential for pangolins to act as the zoonotic reservoir of SARS-CoV-2-like coronaviruses. However, the pangolins we studied here showed clinical signs of disease. In general, a natural reservoir host does not show severe disease, whereas an intermediate host may have clinical signs of infection16.
Although a SARS-CoV-2-like coronavirus was detected in the lungs of these pangolins, a direct association between the clinical signs or pathology and active virus replication is not available as we lack evidence from immunohistochemistry or in situ hybridization experiments. The experimental infection of healthy pangolins with pangolin-CoV would provide more definitive answers; however, as pangolins are protected it is difficult to carry out such experiments. Further studies are needed to confirm the role of pangolins in the transmission of SARS-related coronaviruses.

As the RBD of pangolin-CoV is nearly identical to that of SARS-CoV-2, the virus in pangolins presents a potential future threat to public health. Pangolins and bats are both nocturnal animals, eat insects and share overlapping ecological niches17,18, which makes pangolins an ideal intermediate host for some SARS-related coronaviruses. Therefore, more systematic and long-term monitoring of SARS-related coronaviruses in pangolins and related animals should be implemented to identify the potential animal source of SARS-CoV-2 in the current outbreak.

Our findings support the call for stronger enforcement of regulations against the illegal trade in pangolins. Owing to the demand for their meat as a delicacy and their scales for use in traditional medicine in China, the illegal smuggling of pangolins from Southeast Asia to China is widespread18. International co-operation in the implementation of stricter regulations against illegal wildlife trade and consumption of game meat should be encouraged, as this will increase the protection of endangered animals and help to prevent future outbreaks of diseases caused by SARS-related coronaviruses.

Methods

No statistical methods were used to predetermine sample size.

Metagenomic analysis and viral genome assembly

We collected viromic, metagenomic and transcriptomic data of different mammals and birds in public databases, including the NCBI Sequence Read Archive (SRA) and the European Nucleotide Archive (ENA), to search for potential coronavirus sequences. The raw reads from the public databases and some in-house metagenomic datasets were trimmed using fastp (v.0.19.7)19 to remove adaptor and low-quality sequences. The clean reads were mapped to the SARS-CoV-2 reference sequence (MN908947) using BWA-MEM (v.0.7.17)20 with >30% matches. The mapped reads were collected for downstream analyses. Contigs were de novo-assembled using Megahit (v.1.0.3)21 and identified as related to SARS-CoV-2 using BLASTn with E-values < 1 × 10⁻⁵ and sequence identity >90%.
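The final filtering step can be sketched as follows. This is a minimal illustration rather than the script used in the study: it assumes BLASTn was run with tabular output (-outfmt 6), in which the default columns place percent identity third and the E-value eleventh. The function name and the example rows are hypothetical.

```python
def filter_blast_hits(lines, max_evalue=1e-5, min_identity=90.0):
    """Keep tabular BLAST hits with E-value < max_evalue and identity > min_identity."""
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        pident = float(fields[2])   # percent identity (column 3)
        evalue = float(fields[10])  # E-value (column 11)
        if evalue < max_evalue and pident > min_identity:
            kept.append((fields[0], fields[1], pident, evalue))
    return kept


# Hypothetical rows, for illustration only (not data from the study).
example = [
    "contig_1\tMN908947\t91.20\t1200\t100\t5\t1\t1200\t300\t1499\t0.0\t1500",
    "contig_2\tMN908947\t85.00\t800\t120\t0\t1\t800\t5000\t5799\t1e-150\t700",
    "contig_3\tMN908947\t95.00\t40\t2\t0\t1\t40\t100\t139\t0.002\t50",
]
hits = filter_blast_hits(example)
print([hit[0] for hit in hits])
```

Only the first row passes both thresholds: the second fails on identity and the third on E-value.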

Samples

Pangolins used in the study were confiscated by Customs and Department of Forestry of Guangdong Province in March and August 2019. They included four Chinese pangolins (M. pentadactyla) and 25 Malayan pangolins (M. javanica). The first transport confiscated contained 21 Malayan pangolins, and the second transport contained 4 Malayan pangolins and 4 Chinese pangolins. These pangolins were sent to the wildlife rescue centre, and were mostly inactive and crying, and eventually died in custody despite exhaustive rescue efforts. Tissue samples were taken from the lung of pangolins that had just died for histological and virological examinations.

Pathological examinations

Histological examinations were performed on lung tissues from five Malayan pangolins. In brief, the tissues collected were cut into small pieces and fixed in 10% buffered formalin for 24 h. They were washed free of formalin, dehydrated in ascending grades of ethanol and cleared with chloroform, and then embedded with molten paraffin wax in a template. The tissue blocks were sectioned with a microtome. The sections were transferred onto grease-free glass slides, deparaffinized and rehydrated through descending grades of ethanol and distilled water. They were stained with a haematoxylin and eosin staining kit (Baso Diagnostics, Wuhan Servicebio Technology). Finally, the stained slides were mounted with coverslips and examined under an Olympus BX53 equipped with an Olympus PM-C 35 camera.

Virus isolation and RT–PCR analysis

Lung tissue extract from pangolins was inoculated into Vero E6 cells for virus isolation. The cell line was tested free of mycoplasma contamination using LookOut Mycoplasma PCR Detection Kit (SIGMA), and was authenticated by microscopic morphologic evaluation. Cultured cell monolayers were maintained in Dulbecco’s Modified Eagle Medium (DMEM) and Ham’s F-12. The inoculum was prepared by grinding the lung tissue in liquid nitrogen, diluting it 1:2 with DMEM, filtering it through a 0.45-μm filter (Merck Millipore), and treating it with 16 μg/ml trypsin solution. After incubation at 37 °C for 1 h, the inoculum was removed from the culture and replaced with fresh culture medium. The cells were incubated at 37 °C and observed daily for cytopathic effects.

Viral RNA was extracted from the lung tissue using the QIAamp Viral RNA Mini kit (Qiagen) following the manufacturer-recommended procedures, and examined for coronavirus by RT–PCR using a pair of primers (F: 5′-TGGCWTATAGGTTYAATGGYATTGGAG-3′, R: 5′-CCGTCGATTGTGTGWATTTGSACAT-3′) designed to amplify the S gene of betacoronavirus.
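Both primers contain IUPAC degenerate bases (W, Y and S), so each represents a small pool of concrete oligonucleotides. The sketch below enumerates those pools using the standard IUPAC nucleotide codes; the function name is hypothetical and the code is illustrative only.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_primer(primer):
    """Return every concrete sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(bases) for bases in product(*choices)]


forward = "TGGCWTATAGGTTYAATGGYATTGGAG"
reverse = "CCGTCGATTGTGTGWATTTGSACAT"

forward_variants = expand_primer(forward)
reverse_variants = expand_primer(reverse)
print(len(forward_variants), len(reverse_variants))  # 8 4
```

The forward primer has three two-fold degenerate positions (2 × 2 × 2 = 8 sequences) and the reverse primer has two (2 × 2 = 4 sequences).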

Transmission electron microscopy

Cell cultures that showed cytopathic effects were examined for the viral particles using transmission electron microscopy. Cells were collected from the culture by centrifugation at 1,000g for 10 min, and fixed initially with 2.5% glutaraldehyde solution at 4 °C for 4 h, and again with 1% osmium tetroxide. They were dehydrated with graded ethanol and embedded with PON812 resin. Sections (80 nm in thickness) were cut from the resin block and stained with uranyl acetate and lead citrate sequentially. The negative stained grids and ultrathin sections were observed under a HT7800 transmission electron microscope (Hitachi).

Serological test

Plasma samples from eight Malayan pangolins were tested for anti-SARS-CoV-2 antibodies using a double-antigen ELISA kit for the detection of antibodies against SARS-CoV-2 by Hotgen, following manufacturer-recommended procedures. The assay was designed for the detection of both IgG and IgM antibodies against SARS-CoV-2 in humans and animals, and marketed as a supplementary diagnostic tool for COVID-19. It uses the capture of antibodies against SARS-CoV-2 by the S1 antigen precoated on ELISA plates, and the detection of the antibodies through the use of horseradish peroxidase-conjugated RBD. Both the S1 antigen and RBD fragment were expressed in eukaryotic cells. Data generated by the test developer have shown a 95% detection rate in the analysis of sera from over 200 patients with COVID-19. The assay has an inter-test variation of ≤15%, and no cross-reactivities with sera or plasma from patients positive for SARS-CoV, common and avian influenza viruses, mycoplasma and chlamydia. Fifty microlitres of plasma was analysed in duplicate, together with two negative controls and one positive control. The reaction was read on a Synergy HTX Multi-Mode Microplate Reader (BioTek) at 450/630 nm, with optical density (OD) values being calculated. The cut-off OD value for positivity was 0.105 + mean OD from the negative controls, and the cut-off value for OD for the positive control was set at ≥ 0.5. Positive samples were tested again with serial-diluted plasma.
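The cut-off rule described above can be written out as a short calculation. This is a sketch under stated assumptions: the control readings and sample names are hypothetical, only the sample OD of 2.17 comes from the text, and a sample is assumed to be called positive when its OD is at or above the cut-off, which the kit description does not state explicitly.

```python
def elisa_cutoff(negative_ods, offset=0.105):
    """Cut-off OD = offset + mean OD of the negative controls."""
    return offset + sum(negative_ods) / len(negative_ods)


def classify_plate(sample_ods, negative_ods, positive_od, min_positive_od=0.5):
    """Return (cutoff, {sample: is_positive}); raise if the positive control fails."""
    if positive_od < min_positive_od:
        raise ValueError("positive control below validity threshold; plate is invalid")
    cutoff = elisa_cutoff(negative_ods)
    # Assumption: a sample is positive when its OD is at or above the cut-off.
    calls = {name: od >= cutoff for name, od in sample_ods.items()}
    return cutoff, calls


# Hypothetical control readings; only the 2.17 sample value comes from the text.
cutoff, calls = classify_plate(
    sample_ods={"pangolin_A": 2.17, "pangolin_B": 0.02},
    negative_ods=[0.004, 0.006],
    positive_od=1.8,
)
print(round(cutoff, 3), calls)
```

With a mean negative-control OD of 0.005, the rule gives a cut-off of about 0.11, which is consistent with the value reported for the assay run in the main text.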

Metagenomic sequencing

The lung tissue was homogenized by vortexing with silica beads in 1 ml of phosphate-buffered saline. The homogenate was centrifuged at 10,000g for 5 min, with the supernatant being filtered through a 0.45-μm filter (Merck Millipore) to remove large particles. The filtrate or virus culture supernatant was used in RNA extraction with the QIAamp Viral RNA Mini kit. cDNA was synthesized from the extracted RNA using PrimeScript II reverse transcriptase (Takara) and random primers, and amplified using Klenow Fragment (New England Biolabs). Sequencing libraries were prepared with NEBNext Ultra DNA Library Prep Kit for Illumina (New England Biolabs), and sequenced paired-end (150-bp) on an Illumina NovaSeq 6000. Specific PCR assays were used to fill genome sequence gaps, using primers designed based on sequences flanking the gap.

Phylogenetic analysis

Multiple sequence alignments of all sequence data were constructed using MAFFT v.7.221 (ref. 22). The phylogenetic relationship of the viral sequences was assessed using RAxML v.8.0.14 (ref. 23). The best-fit evolutionary model for the sequences in each dataset was identified using ModelTest24. Potential recombination events and the location of possible breakpoints in betacoronavirus genomes were detected using Simplot (version 3.5.1)25 and RDP 4.99 (ref. 26).
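The idea behind a similarity plot can be illustrated with a simplified sliding-window calculation. The sketch below is not Simplot: it reports uncorrected percent identity per window, whereas the analysis in Fig. 2 used the Kimura two-parameter model. It does mirror the window and step parameters (500 bp and 50 bp by default) and the gap-strip option. The function name and the toy sequences are hypothetical.

```python
def sliding_identity(query, reference, window=500, step=50):
    """Percent identity per window along a pairwise alignment, with gap columns removed."""
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    # Gap strip: drop every alignment column with a gap in either sequence.
    columns = [
        (q, r)
        for q, r in zip(query.upper(), reference.upper())
        if q != "-" and r != "-"
    ]
    points = []
    for start in range(0, len(columns) - window + 1, step):
        chunk = columns[start:start + window]
        matches = sum(1 for q, r in chunk if q == r)
        # Report each window at its midpoint, as a similarity plot does.
        points.append((start + window // 2, 100.0 * matches / window))
    return points


# Toy aligned sequences with a single mismatch, scanned with a small window.
profile = sliding_identity("ACGTACGTAC", "ACGTTCGTAC", window=4, step=2)
print(profile)
```

A drop in identity to one reference sequence that coincides with a rise in identity to another, as seen at the 5′ end of the S gene, is the pattern that suggests a recombination breakpoint.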

Molecular simulation of interactions between RBD and ACE2

The interaction between the RBD of the S protein of SARS-related coronavirus and the ACE2 of humans, civets, and pangolins was examined using molecular dynamics simulation. The crystal structure of the SARS-CoV RBD in complex with human ACE2 was downloaded from the Protein Data Bank (PDB code 2AJF; ref. 27). The structures of the complexes formed by ACE2 of civets or pangolins and the RBD of SARS-CoV-2, RaTG13 and pangolin-CoV were made using the MODELLER program28, and superimposed with the template (PDB code 2AJF). The sequence identity of SARS-CoV RBD (PDB code 6ACD) to the RBD of SARS-CoV-2, RaTG13 and pangolin-CoV was 76.5%, 76.8% and 74.2%, respectively, and the sequence identity of the human ACE2 protein to that of pangolins and civets was 85.4% and 86.9%, respectively.

The molecular dynamics simulations of RBD–ACE2 complexes were carried out using the AMBER 18 suite29 and ff14SB force field30. After two-stage minimization, NVT and NPT-MD, a 30-ns production molecular dynamics simulation was run, with the time step set to 2 fs and coordinate trajectories saved every 3 ps. The MM-GBSA31 approach was used to calculate the binding free energy of each ACE2 protein to the RBD of the S protein, using the Python script MMPBSA.py32 built into the AMBER 18 suite. The last 300 frames of all simulations were extracted to calculate the binding free energy that excludes the contributions of disulfide bonds.
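The simulation settings above determine how many frames each trajectory contains and how much simulated time the last 300 frames cover. The following arithmetic sketch works this out in integer femtoseconds; the function and key names are hypothetical.

```python
FS_PER_PS = 1_000
FS_PER_NS = 1_000_000


def trajectory_summary(length_ns=30, timestep_fs=2, save_every_ps=3, last_frames=300):
    """Derive step and frame counts from the production-run settings."""
    total_steps = length_ns * FS_PER_NS // timestep_fs
    steps_per_frame = save_every_ps * FS_PER_PS // timestep_fs
    total_frames = total_steps // steps_per_frame
    window_ps = last_frames * save_every_ps  # time spanned by the analysed frames
    return {
        "total_steps": total_steps,
        "steps_per_frame": steps_per_frame,
        "total_frames": total_frames,
        "analysis_window_ps": window_ps,
        "analysis_start_ps": length_ns * 1_000 - window_ps,
    }


summary = trajectory_summary()
print(summary)
```

With these settings, a 30-ns run takes 15,000,000 integration steps and stores 10,000 frames, so the 300 frames used for MM-GBSA correspond to the final 900 ps of each trajectory (from 29.1 ns onwards).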

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.