a, Genome organization of coronaviruses including the pangolin coronaviruses obtained in this study, with the predicted ORFs shown in different colours (ORF1a is omitted for clarity). The pangolin coronavirus strain GX/P2V is shown with its sequence length. For comparison, the human sequences NC_045512.2 and NC_004718.3, and the bat sequences MG772933.1, GQ153541.1 and KC881006.1 are included (see Extended Data Table 6 for sources). b, Phylogeny of the subgenus Sarbecovirus (genus Betacoronavirus; n = 53) estimated from the concatenated ORF1ab, S, E, M and N genes. Red circles indicate the pangolin coronavirus sequences generated in this study (Extended Data Table 1). GD/P1L is the consensus sequence re-assembled from previously published raw data (ref. 7). Phylogenies were estimated by maximum likelihood under the GTRGAMMA nucleotide substitution model, with 1,000 bootstrap replicates. Scientific names of the bat hosts are given at the end of the sequence names and abbreviated as follows: C. plicata, Chaerephon plicata; R. affinis, Rhinolophus affinis; R. blasii, Rhinolophus blasii; R. ferrumequinum, Rhinolophus ferrumequinum; R. monoceros, Rhinolophus monoceros; R. macrotis, Rhinolophus macrotis; R. pearsoni, Rhinolophus pearsoni; R. pusillus, Rhinolophus pusillus; R. sinicus, Rhinolophus sinicus. Palm civet (P. larvata, Paguma larvata; species unspecified for the Civet007 and PC4-13 sequences) and human (H. sapiens, Homo sapiens) sequences are also shown.
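The concatenation step named in panel b (joining the ORF1ab, S, E, M and N alignments into a single alignment before tree estimation) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `concatenate_alignments`, the dictionary layout, the gap-padding of taxa missing a gene, and the toy sequences are all assumptions made for illustration only.

```python
from typing import Dict, List

# Gene order as listed in the legend; everything else here is hypothetical.
GENE_ORDER: List[str] = ["ORF1ab", "S", "E", "M", "N"]


def concatenate_alignments(
    per_gene: Dict[str, Dict[str, str]],
    gene_order: List[str] = GENE_ORDER,
    gap: str = "-",
) -> Dict[str, str]:
    """Join per-gene aligned sequences into one concatenated row per taxon.

    per_gene maps gene name -> {taxon name -> aligned sequence}.
    A taxon missing from a gene is padded with gap characters
    (an assumption of this sketch, not something the legend states).
    """
    gene_lengths: Dict[str, int] = {}
    for gene in gene_order:
        lengths = {len(seq) for seq in per_gene[gene].values()}
        if len(lengths) != 1:
            # Within one gene every row must have the same number of columns.
            raise ValueError(f"{gene}: rows are empty or differ in length")
        gene_lengths[gene] = lengths.pop()

    taxa = sorted({taxon for gene in gene_order for taxon in per_gene[gene]})
    return {
        taxon: "".join(
            per_gene[gene].get(taxon, gap * gene_lengths[gene])
            for gene in gene_order
        )
        for taxon in taxa
    }


# Toy input with made-up taxa and sequences (not real viral data).
toy = {
    "ORF1ab": {"taxon_A": "ATG-", "taxon_B": "ATGC"},
    "S": {"taxon_A": "GG", "taxon_B": "GA"},
    "E": {"taxon_A": "T"},  # taxon_B lacks this gene and is gap-padded
    "M": {"taxon_A": "CC", "taxon_B": "C-"},
    "N": {"taxon_A": "A", "taxon_B": "G"},
}
supermatrix = concatenate_alignments(toy)
```

The resulting dictionary holds one equal-length row per taxon, which is the form a maximum likelihood program expects as input once written out as FASTA or PHYLIP.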