From: Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic

Fig. 1

a, Breakpoints identified by 3SEQ, shown as the percentage of sequences (out of 68) that support a particular breakpoint position. Note that breakpoints can be shared between sequences if they are descendants of the same recombination events. Pink, green and orange bars show breakpoint-free regions (BFRs), with region A (nt 13,291–19,628) showing two trimmed segments yielding region A′ (nt 13,291–14,932, 15,405–17,162, 18,009–19,628). Regions B and C span nt 3,625–9,150 and 9,261–11,795, respectively. Concatenated region A′BC is non-recombinant region 1 (NRR1). Open reading frames are shown above the breakpoint plot, with the variable-loop region indicated in the S protein.

b, Similarity plot between SARS-CoV-2 and several selected sequences including RaTG13 (black), SARS-CoV (pink) and two pangolin sequences (orange). The shaded region corresponds to the S protein.

c, Maximum likelihood phylogenetic trees rooted on a 2007 virus sampled in Kenya (BtKy72; root truncated from images), shown for five BFRs of the sarbecovirus alignment. Nucleotide positions for phylogenetic inference are 147–695, 962–1,686 (first tree), 3,625–9,150 (second tree, also BFR B), 9,261–11,795 (third tree, also BFR C), 12,443–19,638 (fourth tree) and 23,631–24,633, 24,795–25,847, 27,702–28,843 and 29,574–30,650 (fifth tree). Relevant bootstrap values are shown on branches, and grey-shaded regions show sequences exhibiting phylogenetic incongruence along the genome. S. China corresponds to Guangxi, Yunnan, Guizhou and Guangdong provinces. N. China corresponds to Jilin, Shanxi, Hebei and Henan provinces, and the N. China clade also includes one sequence sampled in Hubei Province in 2004.
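The coordinates given for regions A′, B and C in panel a can be turned into a concatenated segment by simple slicing. The following is only a minimal sketch, not the authors' pipeline: it assumes the positions are 1-based, inclusive columns of an aligned sequence held as a plain Python string, and it joins the segments in the order the name A′BC suggests. The function names `segment_length`, `extract` and `nrr1` are hypothetical.

```python
# Coordinates copied from the caption; assumed to be 1-based and inclusive.
REGION_A_PRIME = [(13291, 14932), (15405, 17162), (18009, 19628)]
REGION_B = [(3625, 9150)]
REGION_C = [(9261, 11795)]


def segment_length(segments):
    """Total number of positions covered by a list of (start, end) pairs."""
    return sum(end - start + 1 for start, end in segments)


def extract(aligned_seq, segments):
    """Concatenate the given 1-based, inclusive segments of one sequence."""
    # Python slices are 0-based and end-exclusive, hence start - 1 and end.
    return "".join(aligned_seq[start - 1:end] for start, end in segments)


def nrr1(aligned_seq):
    """Concatenate A', B and C in name order (an assumption of this sketch)."""
    return extract(aligned_seq, REGION_A_PRIME + REGION_B + REGION_C)


if __name__ == "__main__":
    # Toy sequence standing in for one row of an alignment.
    toy = "ACGT" * 8000
    print(segment_length(REGION_A_PRIME), segment_length(REGION_B),
          segment_length(REGION_C), len(nrr1(toy)))
```

Under these assumptions the three regions cover 5,020, 5,526 and 2,535 positions, so the concatenation is 13,081 positions long. If the coordinates refer to a reference genome rather than to alignment columns, they would need to be mapped onto the alignment first.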

