Real-time Metagenomic Analysis of Undiagnosed Fever Cases Unveils a Yellow Fever Outbreak in Edo State, Nigeria

Fifty patients with unexplained fever and poor outcomes presented at Irrua Specialist Teaching Hospital (ISTH) in Edo State, Nigeria, an area endemic for Lassa fever, between September 2018 - January 2019. After ruling out Lassa fever, plasma samples from these epidemiologically-linked cases were sent to the African Centre of Excellence for Genomics of Infectious Diseases (ACEGID), Redeemer’s University, Ede, Osun State, Nigeria, where we carried out metagenomic sequencing which implicated yellow fever virus (YFV) as the etiology of this outbreak. Twenty-nine of the 50 samples were confirmed positive for YFV by reverse transcriptase-quantitative polymerase chain reaction (RT-qPCR), 14 of which resulted in genome assembly. Maximum likelihood phylogenetic analysis revealed that these YFV sequences formed a tightly clustered clade more closely related to sequences from Senegal than sequences from earlier Nigerian isolates, suggesting that the YFV clade responsible for this outbreak in Edo State does not descend directly from the Nigerian YFV outbreaks of the last century, but instead reflects a broader diversity and dynamics of YFV in West Africa. Here we demonstrate the power of metagenomic sequencing for identifying ongoing outbreaks and their etiologies and informing real-time public health responses, resulting in accurate and prompt disease management and control.


Results
Demographics. Of the 50 samples tested, 31(62%) were male and 14(28%) were female. Their ages ranged from 3 to 60 years with a mean age of 19.4 years ( Table 1).

SYBR based YfV qRt-pcR.
Twenty-nine out of the fifty plasma samples were found positive for YFV by Table S1). phylogenetic analysis. We assembled fourteen YFV genomes mostly from samples with lower Ct values as these had sufficient YFV reads for genome assembly. Maximum likelihood phylogenetic analysis revealed that the 2018 YFV sequences formed a tightly clustered clade (Fig. 1a), more closely related to sequences from other West African countries (mean pairwise identity = 95.2%) than to earlier  Nigerian sequences (mean pairwise identity = 91.4%). Bayesian analysis estimated the tMRCA of the fourteen (14) 2018 sequences from Nigeria as 3 years (95% HPD interval), while the tMRCA of the pre-1991 sequences from Nigeria is 195 years (95% HPD interval) ( Supplementary Fig. S1).

RT-qPCR (Supplementary
Metagenomic analysis. Metagenomic analysis revealed YFV as the only virus with reads from across the viral genome and appearing in multiple samples. Other viruses with small numbers of reads suggested by Kraken classification were human herpesvirus 6A, human herpesvirus 7 and mastadenovirus C (Fig. 2). The reads initially classified as herpesvirus, however, consisted of oligonucleotide repeats with weak similarity to terminal regions of the betaherpesvirus genome. These reads are, however, non-specific, and match several other genera in a standard NCBI BLAST search (after removing the low-complexity filter), and they therefore likely represent false Kraken positive classification of nonspecific template with tandem repeats. In contrast, mastadenovirus C reads did appear to be specific matches, but they totaled only three read-pairs in a single sample.
We separately noted that one of the 6 negative (water) controls -labeled NTC5 in Fig. 2 -contains a small number of reads from YFV. Only one sample (0054) processed in the same batch as NTC5 was positive for YFV, and this sample was positive by qPCR before being processed for sequencing. This argues that 0054 was a true positive that likely contaminated the negative control. However, to remove any ambiguity, we did not further include 0054 in the case counts or phylogenetic analyses reported here.

Discussion
We established the presence of YFV in Edo State within three days of receiving initial samples, and we shared this information immediately with the referring hospital (ISTH) and the Nigeria Center for Disease Control (NCDC). Based in part on these findings, NCDC and the Nigeria Federal Ministry of Health declared an outbreak in Edo state the following day 4 , prompting more samples to be sent for diagnosis.
Notably, these are the first sequence data reported to date of recent Nigerian YFV cases and the only complete Nigerian YFV genomes from patient samples collected after 1950. Our phylogenetic analysis suggests that the YFV clade responsible for the 2018 outbreak in Edo State does not descend directly from the Nigerian YFV outbreaks of the last century but is instead part of the broader diversity of YFV in West Africa. These results are supported by the few available historical whole genome sequences (Fig. 1b), but a detailed understanding of the recent yellow fever outbreak in Nigeria will require sequencing of stored samples from other states and from the past decades.
The parallel detection of yellow fever on unbiased metagenomic sequencing of multiple samples illustrates the power of genomics in explaining a suspected outbreak. In circumstances where sample collection is unplanned, simultaneous analysis of multiple samples can improve the sensitivity for the true etiology; in this case, we suspect the short period of viraemia relative to the duration of symptoms in YF 8 partly explains the absence of detectable viral nucleic acid in a subset of samples.
This ability to rapidly identify and characterize a re-emerging virus -in an unusual cluster identified by local health officials -highlights the value of in-country genomics capacity. Serology using ELISA is the current method of choice for yellow fever diagnosis by the WHO. This diagnosis method despite its limitations is done in   www.nature.com/scientificreports www.nature.com/scientificreports/ very few selected laboratories and cost about one thousand US dollars per sample. The integration of genomics capacity into the established, but siloed, pathogen-specific diagnostic platforms developed over the past 20 years provides exciting opportunities for public health surveillance.

Materials and Methods
Sample collection and testing. In 2018 clinicians and public health authorities at Irrua Specialist Teaching Hospital (ISTH) in Edo State, Nigeria noted a pattern among a series of fifty (50) patients who had common clinical presentations, poor outcomes, a lack of clear diagnosis and resident in contiguous local government areas. All patient samples tested negative for Lassa virus by RT-qPCR. Aliquots of plasma were sent to the African Center of Excellence for Genomics of Infectious Disease (ACEGID) at Redeemer's University for further genomic investigation.
Sample preparation and sequencing. Plasma samples were inactivated in AVL and viral RNA was extracted according to the QiAmp viral RNA mini kit (Qiagen) manufacturer's instructions. Extracted RNA was treated with Turbo DNase to remove contaminating DNA, followed by cDNA synthesis with random hexamers. Sequencing libraries were prepared using the Nextera XT kit (Illumina) as previously described 9

Metagenomic analysis of viral infections.
Kraken 10 was used to perform an initial taxonomic classification of all viral taxa present in the sample using a database that encompassed the known diversity of all viruses that infect humans, as previously described 11 . Alignment to reference genomes was performed for each virus species identified by Kraken as present in one or more samples in order to confirm and obtain de-duplicated counts of classified reads. Alignment was performed with Novoalign (http://www.novocraft.com) using the following parameters: "-r Random -l 30 -g 40 -x 20 -t 502" and Picard (http://broadinstitute.github.io/picard) was used to mark and remove duplicates.
Genome assembly and maximum likelihood phylogenetic analysis. Viral genomes were assembled using viral-ngs v1.21.2 12,13 on the DNAnexus platform, and MAFFT v7.310 14 was used to align these genomes with all African YFV genomes available in GenBank as of 19th January 2019 (including a small number of non-African sequences as an outgroup). Using Geneious 2019.0.4 15 , 678 bp of the prM/E region of the genome was identified as having the most coverage by earlier GenBank sequences from Africa; this region was extracted and used to infer a maximum likelihood tree using IQTREE v1.5.5 16 . We used a Tamura-Nei nucleotide-substitution model with a gamma distribution of rate variation among sites 17 and ultrafast bootstrapping 18 . For each pair of aligned 678 bp prM/E regions, we calculated a percent nucleotide identity. We then averaged this value over all pairs in a given branch of the maximum likelihood tree to report "mean pairwise identity" within a putative clade.
Bayesian phylogenetic analysis. Time-scaled Bayesian phylogenetic analysis was carried out using a Markov chain Monte Carlo (MCMC) algorithm implemented in the BEAST v1.10.4 19 package with BEAGLE 20 to improve run-time. The evolutionary and demographic processes were estimated from the sampling dates of the prM/E sequences using a model that incorporated a General Time Reversible (GTR) + Gamma distribution (four categories) model with "(1 + 2),3" codon partitioning, an uncorrelated relaxed clock with log-normal distribution 21 , and a Bayesian skyline coalescent tree prior distribution 22 . All Bayesian analyses were run for 200 million MCMC steps, with parameters and trees sampled every 10000 generations. The uncertainty in our parameter estimates was assessed by calculating the effective sample size (ESS) and the 95% highest probability density (HPD) values using the TRACER v1.6.0 23 program. Maximum clade credibility (MCC) trees summarizing all MCMC samples were generated by TreeAnnotator v1.10.4 24 software, with a burn-in rate of 10%. FigTree v1.4.4 25 was used to view and annotate the MCC tree. This analysis used the same 678 bp of the prM/E region of the YFV genome as was used for the maximum likelihood approach above, possibly biasing our time to most common recent ancestor (tMRCA) estimates. ethical approval. The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Institutional Review Boards of ISTH (Edo State, Nigeria), Redeemer's University (Osun State, Nigeria) and Harvard University (Massachusetts, USA). Excess plasma samples used for clinical testing were obtained under a waiver of consent granted by the ISTH Research Ethics Committee (ISTHREC).

Data availability
All sequences from this study are available on GenBank with accession numbers MK457700 -MK457703 and MN211301 -MN211311.