Main

At least 30 outbreaks of Ebola virus disease (EVD) have been identified since the late 1970s, the most severe of which affected Guinea, Sierra Leone and Liberia from December 2013 to June 20161,2. Guinea experienced a new outbreak of EVD in 2021, which started in Gouéké—a town about 200 km away from the epicentre of the 2013–2016 outbreak. The probable index case was a 51-year-old nurse, an assistant of the hospital midwife in Gouéké. On 21 January 2021, she was admitted to hospital in Gouéké suffering from headache, asthenia, nausea, anorexia, vertigo and abdominal pain. She was diagnosed with malaria and salmonellosis and was released two days later. Feeling ill again once at home, she attended a private clinic in Nzérékoré (40 km away) and visited a traditional healer, but died three days later. In the week after her death, her husband—as well as other family members who attended her funeral—fell ill, and four of them died. They were reported as the first suspect cases by the national epidemic alert system on 11 February. On 12 February, blood was taken from two suspect cases admitted to hospital in Nzérékoré. On 13 February, both of these patients were confirmed to have EVD by the laboratory in Guéckédou, which used a commercial real-time polymerase chain reaction with reverse transcription (qRT–PCR) assay (RealStar Filovirus Screen Kit, Altona Diagnostics). On 13 February, the husband of the index case—who travelled more than 700 km from Gouéké to Conakry, the capital city of Guinea, for treatment—was admitted to the Centre de Traitement Epidémiologique (CTEpi) in Nongo, Ratoma Commune. He presented with fever, nausea, asthenia, abdominal pain and lumbar pain and was strongly suspected to have EVD. A blood sample was analysed on the same day and was found to be positive for Ebola Zaire (Zaire ebolavirus; EBOV) according to the GeneXpert molecular diagnostic platform (Xpert Ebola test, Cepheid) and by an in-house qRT–PCR assay. 
Laboratory confirmation of EVD in the three suspect cases led to the official declaration of the epidemic on 14 February. By 5 March, 14 confirmed cases and 4 probable cases of EVD had been identified, leading to 9 deaths—including 5 confirmed cases as reported by the Agence Nationale de la Sécurité Sanitaire (ANSS) of Guinea. After a period of 25 days without new cases, two new cases were reported around Nzérékoré on 1 and 3 April, and on 19 June 2021 the outbreak was declared to be over. In total, 16 confirmed cases were reported, among which 12 people died.

Genomic characterization of the virus that caused the 2021 epidemic of EVD in Guinea was of immediate importance to public health. First, because diagnostic tools, therapeutics and vaccines with proven effectiveness in recent EVD outbreaks—such as in Guinea (2013–2016) and in the Equateur and North-Kivu/Ituri provinces of the Democratic Republic of the Congo (DRC) (2018–2020)—have primarily been developed for EBOV3,4,5. Second, to identify whether the outbreak resulted from a new zoonotic transmission event or from the resurgence of a viral strain that had circulated in a previous EBOV outbreak: it is known that EBOV can persist in the bodily fluids of patients who have survived EVD and can be at the origin of new transmission chains6,7,8. Although the Xpert Ebola test was developed to detect only EBOV strains and the in-house qRT–PCR assay uses a probe that is specifically designed to detect EBOV9, additional confirmation by sequence analysis was sought by targeting a short fragment in the viral protein 35 region of the sample from the patient who was hospitalized in Conakry. The phylogenetic tree (Supplementary Fig. 1) underscores that this highly conserved region can discriminate between Ebola virus species, and analysis confirmed that the virus that caused the new outbreak was of the species Zaire ebolavirus. This confirmed that available vaccines and the vast majority of molecular-diagnostic tools and therapeutics could be immediately applied.

To gain further insight into the genomic make-up of the viruses causing this outbreak, 11 complete or near-complete (greater than 95% recovery) and 8 partial (greater than 65% recovery) genomic sequences from 12 of the 14 confirmed cases were obtained by 3 different laboratories using different next-generation sequencing technologies (Table 1). To facilitate the public-health response and the evaluation of existing medical countermeasures, sequencing results were made publicly available on 12 March through joint posting (https://virological.org/c/ebolavirus/guinea-2021/44). Blood and swab samples from 14 patients with confirmed EVD, sampled from 12 February to 4 March, were processed by the following methods: hybridization capture technology and sequencing on Illumina iSeq100, an amplicon-based protocol with EBOV-specific primer pools and sequencing on MinION (Oxford Nanopore Technologies (ONT)), and a hybrid-capture-based approach using a probe panel that included EBOV-specific targets followed by TruSeq exome enrichment, as previously described5. The data generated between the three groups were pooled and the sequence that had the highest quality was chosen for each patient. This enabled us to reconstruct 12 high-quality EBOV genomes that covered 82.9–99.9% of the reference genome (KR534588) (Table 1). The consensus EBOV sequences with the highest genome recovery (greater than 82.9%) from 12 different patients were used in further analyses.
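
As an illustration of the recovery thresholds quoted above, the following minimal sketch computes genome recovery for a consensus sequence. It assumes that recovery means the percentage of reference-aligned positions carrying an unambiguous base call; the text does not state the exact definition used by the laboratories, and the function names and toy sequence are hypothetical.

```python
def genome_recovery(consensus: str) -> float:
    """Percentage of positions with an unambiguous base (A, C, G or T).

    Assumes the consensus is already aligned to the reference, so that
    its length equals the reference length and uncovered sites are 'N'.
    """
    if not consensus:
        raise ValueError("consensus sequence is empty")
    called = sum(1 for base in consensus.upper() if base in "ACGT")
    return 100.0 * called / len(consensus)


def classify_recovery(recovery_percent: float) -> str:
    """Apply the thresholds quoted in the text (>95% and >65%)."""
    if recovery_percent > 95.0:
        return "complete or near-complete"
    if recovery_percent > 65.0:
        return "partial"
    return "below reported thresholds"


# Hypothetical 20-position consensus; 'N' marks sites without coverage.
toy_consensus = "ACGTNNACGTACGTACGTAC"
recovery = genome_recovery(toy_consensus)  # 18 of 20 positions are called
print(recovery, classify_recovery(recovery))
```

Under this definition, a genome covering 82.9% of the reference would fall in the partial category, whereas one covering 99.9% would count as complete or near-complete.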

Table 1 Patient and sample characteristics and sequencing results obtained by the laboratories involved in the study

Maximum likelihood phylogenetic reconstruction places the 12 genomes from the 2021 outbreak of EVD in Guinea as a single cluster among the EBOV viruses that were responsible for the 2013–2016 outbreak in West Africa (Figs. 1, 2). The genomes from the 2021 outbreak share 10 substitutions (compared with KJ660346) that were accumulated during the 2013–2016 outbreak, including the A82V marker mutation for human adaptation in the glycoprotein that arose when the virus spread to Sierra Leone11,12. These patterns provide strong evidence of a direct link to human cases from the 2013–2016 outbreak rather than a new spillover from an animal reservoir. The 2021 lineage is nested within a clade that predominantly consists of genomes sampled from Guinea in 2014 (Fig. 2). The branch by which the 2021 cluster diverges from the previous outbreak exhibits only 12 substitutions, which is far fewer than would be expected from the evolution of EBOV during 6 years of sustained human-to-human transmission (Fig. 3). Using a local molecular-clock analysis, we estimate a 6.4-fold (95% highest posterior density (HPD) interval: 3.3-fold, 10.1-fold) lower rate along this branch. For comparison, we also estimate a 5.5-fold (1.6-fold, 10.8-fold) lower rate along the branch leading to the 2016 cases, which were linked to a patient who survived the disease and in whom the virus persisted for more than 500 days7,13. Rather than a constant long-term low evolutionary rate, some degree of latency or dormancy during persistent infection seems to be a more likely explanation for the low divergence of the genomes from the 2021 epidemic. We tested whether the 12 genomes from the 2021 epidemic, which were sampled over a time period of less than one month, contained sufficient temporal signal to estimate the time to most recent common ancestor (tMRCA) (Supplementary Fig. 2); however, we did not identify statistical support for sufficient divergence accumulation over this short timescale. 
We therefore calibrated our analysis using an evolutionary rate that reflects EBOV evolution under sustained human-to-human transmission (as estimated by the local molecular-clock analysis). This resulted in a tMRCA estimate of 22 January 2021 (95% HPD interval: 29 December 2020, 10 February 2021).

Fig. 1: Maximum likelihood phylogenetic reconstruction for 55 representative genomes from previous outbreaks of Zaire ebolavirus and 12 genomes from the 2021 outbreak in Guinea.

Most clades for single or multiple closely related outbreaks are collapsed and internal node support is proportional to the size of the internal node circles. The clades or tip circles are labelled with the locations and years of the outbreaks, and coloured according to the (first) year of detection.

Fig. 2: Maximum likelihood phylogenetic reconstruction for 1,065 genomes sampled during the 2013–2016 West African outbreak and 12 genomes from the 2021 outbreak in Guinea.

A colour gradient (from purple to green for increasing divergence) is used to colour the tip circles. The 2021 genomes are shown with a larger circle in yellow.

Fig. 3: Temporal divergence plot showing genetic divergence from the root over time.

This plot relates to the tree shown in Fig. 2. The regression is exclusively fitted to genomes sampled between 2014 and 2015. The same colours are used for the data points as in Fig. 2. The dashed yellow lines highlight how the 2021 data points deviate from the relationship between sampling time and sequence divergence. According to this relationship, about 95 substitutions (95% prediction interval: 88–101) are expected on the branch ancestral to the 2021 cluster, whereas only 12 are inferred on this branch.
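
The gap between expected and inferred substitutions can be turned into a rough fold-reduction in rate with simple arithmetic. The sketch below is only a back-of-the-envelope check using the numbers in this caption; it is not the local molecular-clock analysis, which estimates the rate reduction in a Bayesian framework, and the variable names are illustrative.

```python
# Values quoted in the caption of Fig. 3.
expected_substitutions = 95           # regression-based expectation
prediction_interval = (88, 101)       # 95% prediction interval
observed_substitutions = 12           # inferred on the ancestral branch

# Ratio of expected to observed substitutions over the same time span.
fold_reduction = expected_substitutions / observed_substitutions
fold_range = tuple(
    bound / observed_substitutions for bound in prediction_interval
)

print(f"approximate fold reduction: {fold_reduction:.1f}")
print(f"range from prediction interval: {fold_range[0]:.1f} to {fold_range[1]:.1f}")
```

The resulting ratio of roughly eightfold lies inside the 95% HPD interval reported for the local molecular-clock estimate (3.3-fold to 10.1-fold), so the two views of the data are broadly consistent.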

These results open up a new perspective on the relatively rare observation of EBOV re-emergence. It is assumed that all known filovirus outbreaks in humans are the result of independent zoonotic transmission events from bat reservoir species or from intermediate or amplifying hosts such as apes and duikers6. Here we clearly show that, even almost five years after the declaration of the end of an epidemic, new outbreaks could also be the result of transmission from humans who were infected during a previous epidemic. The viruses from the 2021 outbreak fall within the lineage of EBOV viruses obtained from humans during the 2014–2016 outbreak; as such, it is very unlikely that this new outbreak has an animal origin or is the result of a new cross-species transmission with the same lineage that remained latent in this natural host, which in that scenario would be at the basis of the West African cluster. The limited genomic divergence between 2014–2015 and 2021 is compatible with a slow long-term evolutionary rate. However, a relatively long phase of latency might be more likely than continuous slow replication. Independent of the mechanistic explanation, the virus most probably persisted at a low level in a human who had survived previous infection. Plausible scenarios of EBOV transmission to the index case include: sexual transmission by exposure to EBOV in semen from a male survivor; contact with body fluids from a survivor who had a relapse of symptomatic EVD (for example during healthcare—the index case was a healthcare worker); or relapse of EVD in the index case—although she was not known to have been infected previously, she could have had an asymptomatic or pauci-symptomatic EBOV infection during the previous outbreak. A detailed investigation of the family of the index case by anthropologists revealed that she was not known to have had EVD previously, nor were her husband or close relatives. 
However, among more distantly related family members, 25 individuals had EVD during the previous outbreak. Only five survived, and the index case apparently had no recent contact with this part of the family. Consultation of the hospital registers in Gouéké showed that all patients seen by the index case in January 2021 were in good health and were still in good health in March 2021. However, the index case also performed informal consultations outside the hospital environment, which could not be verified. An alternative scenario is that the nurse was not the actual index case, but was part of a small, unrecognized chain of human-to-human transmission in this area of Guinea. However, the diversity of the currently available genomes is limited, and molecular-clock analysis suggests a recent tMRCA, with a mean estimate close to the time that the nurse was first hospitalized and a 95% HPD boundary around the beginning of the year. This provides some reassurance that the outbreak was detected early.

The 2013–2016 outbreak in West Africa was the largest and most complex recent outbreak of EBOV, and involved more than 28,000 cases, 11,000 deaths and an estimated 17,000 survivors, mostly in Guinea, Liberia and Sierra Leone2. This large outbreak provided new information about the disease itself as well as about the medical, social and psychological implications for patients who survived the disease14,15,16. It was also possible to estimate, to some extent, the proportions of asymptomatic or pauci-symptomatic infections and to identify their role in specific unusual transmission chains17,18,19. Although the main route of human-to-human transmission of EBOV is direct contact with infected bodily fluids from symptomatic or deceased patients, some transmission chains in this outbreak were associated with viral persistence in semen3. Several studies demonstrated viral persistence in more than 50% of male survivors at 6 months after discharge from Ebola treatment units (ETU), and the maximum duration of persistence in semen has been reported to be up to 500–700 days after ETU discharge in a small number of male EVD survivors9,20,21,22. Transmission through other bodily fluids (such as breast milk and cervicovaginal fluids) is also suspected8,23,24,25. Furthermore, some immunological studies among survivors suggest a continuous or intermittent EBOV antigenic stimulation due to persistence of an EBOV reservoir in some survivors26,27, although this was not confirmed in another study28. Cases of relapse of EVD have also been sporadically reported and could be the origin of large transmission chains, as recently reported in the North-Kivu outbreak in DRC29. For example, the presence of EBOV RNA, 500 days after ETU discharge, in the breast milk of a woman who was not pregnant when she developed EVD has recently been reported. 
She attended the hospital owing to complications at 8 months of pregnancy, and a breast milk sample that was taken 1 month after delivery tested positive for EBOV RNA9. These examples illustrate that healthcare workers can be exposed to EBOV when taking care of patients who survived EVD but have an unrecognized relapse of their infection. The 2021 outbreak now highlights that viral persistence and reactivation is not limited to a two-year period, but can also occur on much longer timescales with late reactivation.

Active genomic surveillance has already shown the resurgence of previous strains in other outbreaks of the disease. For example, two EBOV variants circulated simultaneously within the same region during the recent 2020 outbreak in Equateur province, DRC30. Moreover, strains from the two consecutive outbreaks in Luebo, DRC, in 2007 and 2008, are so closely related that it is difficult to exclude the possibility that the epidemic observed in 2008 was due to a resurgence event from a patient who survived EVD in the 2007 outbreak31,32. However, the limited genomic sampling does not allow for a formal test of this hypothesis.

Although the majority of EVD outbreaks remained limited both in the number of cases and in geographic spread, the two largest outbreaks in West Africa (December 2013–June 2016) and in eastern DRC (August 2018–June 2020) infected thousands of individuals over wide geographic areas, leading to large numbers of EVD survivors. This means that the risk of resurgence is higher than ever before. Continued surveillance of EVD survivors is therefore warranted to monitor the reactivation and relapse of EVD infection and the potential presence of the virus in bodily fluids. This work and associated communications must be conducted with the utmost care for the wellbeing of EVD survivors. During the 2013–2016 outbreak in Guinea, patients who survived EVD had a mixed experience after discharge from ETUs. On the one hand, they were regarded as heroes by non-governmental organizations and became living testimonies of a possible recovery33,34. On the other hand, they experienced different forms of stigmatization, such as rejection by family and friends, refusal of involvement in collective work, loss of jobs and housing, and sometimes self-isolation from social life and workplaces35. The human origin of the 2021 EVD outbreak, and the associated shift in our perception of EBOV emergence, call for careful attention to survivors of the disease. The concern that survivors will be stigmatized as a source of danger should be a matter of scrupulous attention36. This is especially true for the area of Gouéké, which is only 9 km away from Womey, a village that is emblematic of the violent reaction of the population towards the EVD response team during the 2013–2016 epidemic37.

Since the 2013–2016 EVD outbreak in Western Africa, genome sequencing has become a major component of the response to outbreaks10,38,39,40,41. The establishment of in-country sequencing and the building of capacity enabled a timely characterization of EBOV strains in the 2021 outbreak in Guinea. In addition to the importance of appropriate healthcare measures focused on survivors, the late resurgence of the virus also highlights the urgent need for further research into potent antiviral agents that can eradicate the latent virus reservoir in patients with EVD, and into efficient vaccines that provide long-term protection. In parallel, vaccination could also be considered to boost protective antibody responses in survivors of the disease27. The vaccination of populations in areas with previous EBOV outbreaks could also be promoted to prevent secondary cases.

Methods

Ethics statement

Diagnostic specimens were collected as part of the emergency response from the Ministry of Health of Guinea, and therefore consent for sample collection was waived. All preparation of samples for sequencing, genomic analysis and data analysis was performed on anonymized samples identifiable only by their laboratory or epidemiological identifier.

Confirmation of Ebola virus species by sequence analysis of the VP35 fragment at CERFIG

Viral RNA was extracted from 140 µl of whole blood collected from the patient hospitalized in Conakry, using the Nuclisens kit (Biomerieux) and following the manufacturer’s instructions. Amplification of a small fragment of the VP35 region was attempted in a semi-nested PCR with a modified protocol as previously described4. First-round VP35 PCR products from positive samples were barcoded and pooled using the Native Barcoding Kit EXP-NBD104 (ONT). Sequencing libraries were generated from the barcoded products using the Genomic DNA Sequencing Kit SQK-LSK109 (ONT) and were loaded onto an R9 flow cell on a MinION (ONT). Genetic data were collected for 1 h. Basecalling, adapter removal and demultiplexing of .fastq files were performed with MinKNOW, v.4.1.22. Fastq reads >Q11 were mapped against a virus database with the Genome Detective tool (https://www.genomedetective.com/app/typingtool/virus/). The generated consensus sequence was used for further analysis. For phylogenetic inference, we retrieved one sequence per outbreak from the haemorrhagic fever virus (HFV) database, to which we added the newly generated VP35 sequence of the new outbreak. Phylogenetic analyses were performed by maximum likelihood using IQ-TREE with 1,000 bootstraps for branch support42,43. The general time-reversible (GTR) model with a discrete gamma distribution was used as the nucleotide substitution model.

Full-length genome sequencing of the new Ebola viruses

Genome sequencing at CERFIG

Whole-genome sequencing was attempted on viral extracts from samples that were positive for the glycoprotein (GP) and nucleoprotein (NP) targets of Zaire ebolavirus on the GeneXpert molecular diagnostic platform (Xpert Ebola Assay). We extracted total nucleic acid using the QIAamp Viral RNA Mini Kit (Qiagen). After DNase treatment with TURBO DNA-free Kit (Ambion) and clean-up with RNA Clean & Concentrator Kit (Zymo Research), RNA was converted to double-stranded cDNA (ds-cDNA) using the SuperScript IV First-Strand Synthesis System (Invitrogen) and NEBNext mRNA Second Strand Synthesis Module (New England Biolabs). The resulting ds-cDNA was enzymatically fragmented with NEBNext dsDNA Fragmentase (New England Biolabs) and converted to dual indexed libraries with the NEBNext Ultra II DNA Library Prep Kit for Illumina (New England Biolabs) and NEBNext Multiplex Oligos for Illumina (New England Biolabs). To enrich EBOV in the libraries, we performed two rounds of hybridization capture (16 h at 65 °C) with custom-made biotinylated RNA baits (120 nucleotides, 2-fold tiling; Arbor Biosciences) covering representative genomes for Zaire ebolavirus (KC242801), Sudan ebolavirus (KC242783), Reston ebolavirus (NC_004161), Taï Forest ebolavirus (NC_014372), Bundibugyo ebolavirus (KC545395) and Marburg marburgvirus (FJ750956), following the myBaits Hybridization Capture for Targeted NGS protocol (v.4.01). After the second round, capture products were quantified using the Qubit 3.0 Fluorometer with Qubit dsDNA HS Assay Kit (Invitrogen), and pooled in equimolar amounts for sequencing on an Illumina iSeq using iSeq 100 i1 Reagents (2 × 150 cycles). Sequencing reads were filtered (adapter removal and quality filtering) with Trimmomatic44 (settings: LEADING:30 TRAILING:30 SLIDINGWINDOW:4:30 MINLEN:40), merged with ClipAndMerge (https://github.com/apeltzer/ClipAndMerge), and mapped to the Zaire ebolavirus RefSeq genome (NC_002549) using BWA-MEM45. 
Mapped reads were sorted and deduplicated with SortSam and MarkDuplicates from the Picard suite (Broad Institute, Picard; http://broadinstitute.github.io/picard). We generated consensus sequences using Geneious Prime 2020.2.3 (https://www.geneious.com), in which unambiguous bases were called when at least 90% of at least 20 unique reads were in agreement (20×, 90%). For samples with few mapped reads (0001, 0002, 0010, 0030), we also called a consensus at 2×, 90% and 5×, 90%.
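
The consensus-calling rule described above (20×, 90%) can be sketched for a single alignment column as follows. This is an illustrative re-implementation under stated assumptions, not the Geneious Prime algorithm: the function name and the string-based pileup are hypothetical, and reads are assumed to be deduplicated already.

```python
from collections import Counter


def call_base(read_bases: str, min_depth: int = 20,
              min_agreement: float = 0.90) -> str:
    """Call one consensus position from the bases of the unique reads.

    Returns the majority base when at least `min_depth` reads cover the
    position and at least `min_agreement` of them agree; otherwise 'N'.
    """
    bases = [b for b in read_bases.upper() if b in "ACGT"]
    depth = len(bases)
    if depth < min_depth:
        return "N"
    base, count = Counter(bases).most_common(1)[0]
    return base if count / depth >= min_agreement else "N"


# Hypothetical pileup columns (one character per unique read).
print(call_base("A" * 19 + "G"))              # 20 reads, 95% agreement
print(call_base("A" * 17 + "G" * 3))          # 20 reads, 85% agreement
print(call_base("A" * 5, min_depth=5))        # relaxed 5x, 90% setting
```

Lowering `min_depth` to 2 or 5 corresponds to the relaxed settings used for samples with few mapped reads, at the cost of weaker support for each called base.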

Genome sequencing at PFHG

Sequencing at PFHG was performed using a mobile MinION facility deployed by BNITM to Guinea at the beginning of March 2021. A total of 13 EBOV-positive initial diagnostic samples processed at the Laboratoire des Fièvres Hémorragiques Virales de Gueckédou and the Laboratoire Régional de l’Hôpital de Nzérékoré were used for sequencing. If RNA from diagnostic procedures performed by the peripheral laboratories was not sent to PFHG, samples were inactivated and RNA was extracted from 50 µl of EDTA whole blood, 70 µl of plasma from EDTA blood or 140 µl of wet swabs using the QIAamp Viral RNA Mini Kit (Qiagen) following the manufacturer’s instructions. Tiled primers generating overlapping products combined with a highly multiplexed PCR protocol were used for amplicon generation10. At the start of deployment, three different primer pools (V3 or pan_10_EBOV, V4 or pan_EBOV and Zaire-PHE or EBOV-Zaire-PHE) were tested and results were combined for the optimal recovery of consensus. A new primer pool V5 (EBOV-Makona-V5) was further designed and implemented to increase consensus recovery. Primer pools V3, V4 and V5 were designed by the ARTIC network and Zaire-PHE primer pools by Public Health England (PHE). For V3, 62 primers were used, while for V4 and V5, 61 primer pairs were used, to amplify products of around 400 nt in length. For Zaire-PHE, 71 primer pairs were used to amplify products of around 350 nt in length for the approximately 20 kb viral genome. All primer pools used can be found in Supplementary Table 1. The multiplex PCR was performed as described by the most up-to-date ARTIC protocol for nCoV-2019 amplicon sequencing (nCoV-2019 sequencing protocol V3 (LoCost) V.3 (https://artic.network/ncov-2019)), adapted to include the EBOV-specific primer sets. 
In brief, RNA was directly used for cDNA synthesis using the LunaScript RT SuperMix (New England Biolabs) and the cDNA generated was used as template in the multiplex PCR, which was performed in two reaction pools using Q5 Hot Start DNA Polymerase (New England Biolabs). The resulting amplicons from the two PCR pools were pooled in equal volumes and the pooled amplicons were diluted 1:10 with nuclease-free water.

Sequencing libraries were prepared, barcoded and multiplexed using the Ligation Sequencing Kit (SQK-LSK109) from ONT combined with the Native Expansion pack (EXP-NBD104, EXP-NBD114, EXP-NBD196) following the ARTIC Network’s library preparation protocol (nCoV-2019 sequencing protocol v3 (LoCost) V.3 (https://artic.network/ncov-2019)). For the preparation of fewer than 11 samples, each sample was prepared in multiples to achieve the library concentration required for sequencing. In brief, the diluted pooled amplicons were end-repaired using the Ultra II End Prep Module (New England Biolabs) followed by barcode ligation using the Blunt/TA Ligase Master Mix and one unique barcode per sample. Equal volumes from each native barcoding reaction were pooled and subjected to bead clean-up using 0.4× AMPure beads. The pooled barcoded amplicons were quantified using the Qubit Fluorometer (Thermo Fisher Scientific) and AMII adapter ligation was performed using the Quick T4 DNA Ligase (New England Biolabs) followed by an additional bead clean-up. The adapter-ligated barcoded amplicon pool was quantified using the Qubit Fluorometer (Thermo Fisher Scientific) aiming for a minimum recovery of 15 ng of sequencing library to load onto the flow cell.

Sequencing libraries were sequenced using R9.4.1 Flow Cells (FLO-MIN106D, ONT) on the Mk1C device (ONT) using MinKNOW v.21.02.2 with real-time high accuracy base-calling and stringent demultiplexing (minimum barcoding score = 60). Within the barcoding options, barcoding on both ends and mid-read barcodes were both switched on. Reads were demultiplexed and binned in a barcode specific folder only if a barcode above the minimum barcoding score was identified on both read ends and if mid-read barcodes were not identified. Sequencing runs were stopped after around 24 h, and base-calling was allowed to finish before data handling.
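
The stringent demultiplexing rule described above can be written as a simple predicate. The sketch below is not MinKNOW code: the `BarcodeCall` record and its field names are hypothetical, the score comparison is assumed to be inclusive (score of at least 60), and requiring the same barcode at both ends is an additional assumption made for the illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BarcodeCall:
    """Hypothetical per-read barcode classification result."""
    front_barcode: str
    front_score: float
    rear_barcode: str
    rear_score: float
    mid_read_barcode_found: bool


def assign_barcode(call: BarcodeCall, min_score: float = 60.0) -> Optional[str]:
    """Return the barcode bin for a read, or None if the read is rejected."""
    if call.mid_read_barcode_found:
        return None  # mid-read barcodes suggest a chimeric read
    if call.front_score < min_score or call.rear_score < min_score:
        return None  # barcode must pass the score threshold at both ends
    if call.front_barcode != call.rear_barcode:
        return None  # assumption: both ends must carry the same barcode
    return call.front_barcode


read = BarcodeCall("barcode03", 72.5, "barcode03", 65.0, False)
print(assign_barcode(read))
```

Rejecting reads that fail any of these checks reduces yield but lowers the risk of assigning reads to the wrong patient sample.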

Bioinformatics data analysis was performed as per the ARTIC protocol using a combination of the ARTIC EBOV (https://artic.network/ebov/ebov-bioinformatics-sop.html) and ARTIC SARS-CoV-2 (https://artic.network/ncov-2019/ncov2019-bioinformatics-sop.html) pipelines. A few minor modifications to the ARTIC bioinformatics protocol were incorporated. The two initial steps described, base-calling with Guppy and demultiplexing, were omitted as these were both done on the Mk1C device in real-time during the sequencing run; the bioinformatics analysis therefore started from the read-filtering step (ARTIC Guppyplex). In brief, the ARTIC Guppyplex program was used to collect reads for each barcode into a single fastq file, while applying a length filter to remove chimeric reads. Reads were filtered based on length with a minimum (option: --min-length) and maximum (option: --max-length) length cut-off based on the amplicon size used (for V3, V4 and V5 primer pools: --min-length 400 and --max-length 700; for Zaire-PHE primer pool: --min-length 350 and --max-length 650). The quality check was omitted because only reads with a quality score of greater than 7 were processed. After merging and filtering, the ARTIC MinION pipeline was used to obtain the consensus sequences. The data were normalized to 200 and, using the --scheme-directory option, the pipeline was directed to the respective primer scheme used for each barcode. Reads were aligned to the NCBI reference KJ660347 (Zaire ebolavirus isolate H.sapiens-wt/GIN/2014/Makona-Gueckedou-C07) for data generated using V3, V4, and V5 primer pools and to NC_002549.1 (Zaire ebolavirus isolate Ebola virus/H.sapiens-tc/COD/1976/Yambuku-Mayinga) for data generated using Zaire-PHE primer pools.
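
The length-based read filter described above can be illustrated with a few lines of Python. This sketch only mimics the effect of the --min-length and --max-length cut-offs and is not the ARTIC Guppyplex implementation; the dictionary and function names are hypothetical, and treating both cut-offs as inclusive is an assumption.

```python
# Length windows (in nucleotides) quoted in the text for each primer pool.
LENGTH_WINDOWS = {
    "V3": (400, 700),
    "V4": (400, 700),
    "V5": (400, 700),
    "Zaire-PHE": (350, 650),
}


def filter_by_length(reads, primer_pool):
    """Keep reads whose length lies within the window for the primer pool.

    Reads much longer than one amplicon are likely chimeric, and much
    shorter reads are likely truncated products or primer artefacts.
    """
    min_length, max_length = LENGTH_WINDOWS[primer_pool]
    return [read for read in reads if min_length <= len(read) <= max_length]


# Hypothetical reads represented only by their lengths.
toy_reads = ["A" * n for n in (300, 400, 550, 700, 900)]
kept = filter_by_length(toy_reads, "V3")
print([len(read) for read in kept])
```

The narrower window for the Zaire-PHE pool reflects its shorter amplicons of around 350 nt.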

Sequencing at IPD

Viral RNA was extracted from 140 µl of whole blood samples using the QIAamp Viral RNA Mini Kit (Qiagen) according to the manufacturer’s instructions and eluted in nuclease-free water to a final volume of 60 µl. Extracted RNA was tested using qRT–PCR as previously described46. The DNA library was prepared and enriched using the Illumina RNA Prep with Enrichment (L) Tagmentation kit (Illumina) according to the manufacturer’s recommendations with a pan-viral probe panel that included EBOV-specific targets5. The purified libraries were pooled and sequenced on the Illumina MiSeq platform using the MiSeq Reagents Kit v3 (Illumina) according to the manufacturer’s instructions. Illumina sequence reads were quality trimmed by Prinseq-lite and consensus EBOV genome sequences were generated using an in-house de novo genome assembly pipeline.

Phylogenetic analysis of full-length genome sequences

Phylogenetic inference

The new EBOV genome sequences were embedded in different datasets for subsequent analyses. For phylogenetic reconstruction, we used a Zaire Ebola virus dataset consisting of 55 representative genomes from previous outbreaks and a Makona virus dataset consisting of 1,065 genomes sampled from Guinea, Sierra Leone and Liberia between 2014 and 2015. Multiple sequence alignment was performed using MAFFT47. We identified 6 T-to-C mutations in the genome from patient 11 that were indicative of mutations induced by adenosine deaminases acting on RNA. According to previous recommendations48, we masked these positions in this genome in all further analyses. Maximum likelihood trees were reconstructed using IQ-TREE under the GTR model with gamma (G) distributed rate variation among sites49. Temporal divergence plots of genetic divergence from the root of phylogenies against sampling time were constructed using TempEst50. To construct the temporal divergence plot for the Guinean 2021 genome data, we used a tree reconstructed under an HKY+G model.
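
A temporal divergence plot of this kind rests on a linear regression of root-to-tip divergence against sampling date, in which the slope approximates the evolutionary rate and the x-intercept approximates the date of the root. The sketch below shows that regression in plain Python; it is not TempEst itself, and the dates and divergences are hypothetical values constructed to follow a rate of 0.001 substitutions per site per year.

```python
def fit_root_to_tip(dates, divergences):
    """Ordinary least-squares fit of divergence against sampling date.

    Returns (slope, intercept); the slope is in substitutions per site
    per year when dates are decimal years.
    """
    n = len(dates)
    mean_t = sum(dates) / n
    mean_d = sum(divergences) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, divergences))
    slope = sxy / sxx
    intercept = mean_d - slope * mean_t
    return slope, intercept


# Hypothetical genomes sampled in 2014-2015 (decimal years, subs per site).
dates = [2014.2, 2014.5, 2014.8, 2015.1, 2015.4]
divergences = [0.0003, 0.0006, 0.0009, 0.0012, 0.0015]

rate, intercept = fit_root_to_tip(dates, divergences)
root_date = -intercept / rate              # date at which divergence is zero
expected_2021 = rate * 2021.1 + intercept  # extrapolated divergence in 2021

print(f"rate: {rate:.4f} subs/site/year, root date: {root_date:.1f}")
print(f"expected divergence in early 2021: {expected_2021:.4f}")
```

A genome sampled in 2021 that falls far below the extrapolated line, as the 2021 Guinea genomes do, has accumulated fewer substitutions than sustained transmission would predict.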

Local molecular-clock model analysis

We used BEAST to fit a local molecular-clock model to a dataset consisting of 1,020 dated Makona virus genomes and one of the 2021 genomes (patient 1)51,52. We specified a separate rate on the tip branch for this genome as well as on the tip branch for a genome in a 2016 outbreak. We used the Skygrid coalescent model as a flexible nonparametric tree prior and an HKY+G substitution model53.

Guinea 2021 tMRCA estimation

Temporal signal was evaluated using the BETS procedure54. We estimated a slightly lower log marginal likelihood for a model that uses tip dates (−26,063.6) than for a model that assumes sequences are sampled at the same time (−26,062.1), which indicates no support for temporal signal in these data. These BEAST analyses were performed using an exponential growth model, a strict molecular-clock model and an HKY+G substitution model. We specified a lognormal prior with a mean of 1 and a standard deviation of 5 on the population size and a Laplace prior with a scale of 100 on the growth rate. Default priors were used for all other parameters. For the estimation of divergence time, we used a normal prior on the substitution rate with a mean of 0.001 and a standard deviation of 0.00004 based on the background EBOV rate estimated by the local molecular-clock analysis.
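
The BETS comparison reduces to a log Bayes factor, which is the difference between the two log marginal likelihoods reported above. The short sketch below works through that arithmetic and also derives the central 95% range of the normal rate prior; treating any positive log Bayes factor as support for temporal signal is a simplifying assumption made for this illustration, and the variable names are hypothetical.

```python
# Log marginal likelihoods reported in the text.
log_ml_tip_dates = -26063.6     # model using the sampling dates
log_ml_isochronous = -26062.1   # model assuming identical sampling times

# Log Bayes factor in favour of the dated model.
log_bayes_factor = log_ml_tip_dates - log_ml_isochronous
supports_temporal_signal = log_bayes_factor > 0  # simplified decision rule

# Normal prior on the substitution rate (substitutions per site per year).
prior_mean = 0.001
prior_sd = 0.00004
prior_95_range = (prior_mean - 1.96 * prior_sd, prior_mean + 1.96 * prior_sd)

print(f"log Bayes factor: {log_bayes_factor:.1f}")
print(f"temporal signal supported: {supports_temporal_signal}")
print(f"95% prior range for the rate: {prior_95_range[0]:.7f} to {prior_95_range[1]:.7f}")
```

Because the log Bayes factor is negative, the dated model is not favoured, which is why the divergence-time estimate relies on an informative rate prior rather than on the sampling dates alone.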

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.