Hepatitis B virus (HBV) is a major cause of human hepatitis. There is considerable uncertainty about the timescale of its evolution and its association with humans. Here we present 12 full or partial ancient HBV genomes that are between approximately 0.8 and 4.5 thousand years old. The ancient sequences group either within or in a sister relationship with extant human or other ape HBV clades. Generally, the genome properties follow those of modern HBV. The root of the HBV tree is projected to between 8.6 and 20.9 thousand years ago, and we estimate a substitution rate of 8.04 × 10⁻⁶ to 1.51 × 10⁻⁵ nucleotide substitutions per site per year. In several cases, the geographical locations of the ancient genotypes do not match present-day distributions. Genotypes that today are typical of Africa and Asia, and a subgenotype from India, are shown to have an early Eurasian presence. The geographical and temporal patterns that we observe in ancient and modern HBV genotypes are compatible with well-documented human migrations during the Bronze and Iron Ages (refs. 1, 2). We provide evidence for the creation of HBV genotype A via recombination, and for a long-term association of modern HBV genotypes with humans, including the discovery of a human genotype that is now extinct. These data expose a complexity of HBV evolution that is not evident when considering modern sequences alone.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Allentoft, M. E. et al. Population genomics of Bronze Age Eurasia. Nature 522, 167–172 (2015).
Damgaard, P. d. B. et al. 137 ancient human genomes from across the Eurasian steppes. Nature https://doi.org/10.1038/s41586-018-0094-2 (2018).
Lai, C. L., Ratziu, V., Yuen, M.-F. & Poynard, T. Viral hepatitis B. Lancet 362, 2089–2094 (2003).
Schweitzer, A., Horn, J., Mikolajczyk, R. T., Krause, G. & Ott, J. J. Estimations of worldwide prevalence of chronic hepatitis B virus infection: a systematic review of data published between 1965 and 2013. Lancet 386, 1546–1555 (2015).
Murhekar, M. V., Murhekar, K. M. & Sehgal, S. C. Epidemiology of hepatitis B virus infection among the tribes of Andaman and Nicobar Islands, India. Trans. R. Soc. Trop. Med. Hyg. 102, 729–734 (2008).
Locarnini, S., Littlejohn, M., Aziz, M. N. & Yuen, L. Possible origins and evolution of the hepatitis B virus (HBV). Semin. Cancer Biol. 23, 561–575 (2013).
Littlejohn, M., Locarnini, S. & Yuen, L. Origins and evolution of hepatitis B virus and hepatitis D virus. Cold Spring Harb. Perspect. Med. 6, a021360 (2016).
Kramvis, A. Genotypes and genetic variability of hepatitis B virus. Intervirology 57, 141–150 (2014).
Hannoun, C., Horal, P. & Lindh, M. Long-term mutation rates in the hepatitis B virus genome. J. Gen. Virol. 81, 75–83 (2000).
Zhou, Y. & Holmes, E. C. Bayesian estimates of the evolutionary rate and age of hepatitis B virus. J. Mol. Evol. 65, 197–205 (2007).
Paraskevis, D. et al. Dating the origin of hepatitis B virus reveals higher substitution rate and adaptation on the branch leading to F/H genotypes. Mol. Phylogenet. Evol. 93, 44–54 (2015).
Zehender, G. et al. Enigmatic origin of hepatitis B virus: an ancient travelling companion or a recent encounter? World J. Gastroenterol. 20, 7622–7634 (2014).
Kramvis, A. et al. Relationship of serological subtype, basic core promoter and precore mutations to genotypes/subgenotypes of hepatitis B virus. J. Med. Virol. 80, 27–46 (2008).
MacDonald, D. M., Holmes, E. C., Lewis, J. C. & Simmonds, P. Detection of hepatitis B virus infection in wild-born chimpanzees (Pan troglodytes verus): phylogenetic relationships with human and other primate genotypes. J. Virol. 74, 4253–4257 (2000).
Nielsen, R. et al. Tracing the peopling of the world through genomics. Nature 541, 302–310 (2017).
Rasmussen, S. et al. Early divergent strains of Yersinia pestis in Eurasia 5,000 years ago. Cell 163, 571–582 (2015).
Feldman, M. et al. A high-coverage Yersinia pestis genome from a sixth-century Justinianic plague victim. Mol. Biol. Evol. 33, 2911–2923 (2016).
Reid, A. H., Fanning, T. G., Hultin, J. V. & Taubenberger, J. K. Origin and evolution of the 1918 “Spanish” influenza virus hemagglutinin gene. Proc. Natl Acad. Sci. USA 96, 1651–1656 (1999).
Duggan, A. T. et al. 17th century variola virus reveals the recent history of smallpox. Curr. Biol. 26, 3407–3412 (2016).
Kahila Bar-Gal, G. et al. Tracing hepatitis B virus to the 16th century in a Korean mummy. Hepatology 56, 1671–1680 (2012).
Patterson Ross, Z. et al. The paradox of HBV evolution as revealed from a 16th century mummy. PLoS Pathog. 14, e1006750 (2018).
Bond, W. W. et al. Survival of hepatitis B virus after drying and storage for one week. Lancet 317, 550–551 (1981).
Rasmussen, M. et al. Ancient human genome sequence of an extinct Palaeo-Eskimo. Nature 463, 757–762 (2010).
Simmonds, P. & Midgley, S. Recombination in the genesis and evolution of hepatitis B virus genotypes. J. Virol. 79, 15467–15476 (2005).
Bouckaert, R. et al. BEAST 2: a software platform for Bayesian evolutionary analysis. PLOS Comput. Biol. 10, e1003537 (2014).
Simmonds, P. Reconstructing the origins of human hepatitis viruses. Phil. Trans. R. Soc. Lond. B 356, 1013–1026 (2001).
Tedder, R. S., Bissett, S. L., Myers, R. & Ijaz, S. The ‘Red Queen’ dilemma—running to stay in the same place: reflections on the evolutionary vector of HBV in humans. Antivir. Ther. 18, 489–496 (2013).
Duchêne, S., Holmes, E. C. & Ho, S. Y. W. Analyses of evolutionary dynamics in viruses are hindered by a time-dependent bias in rate estimates. Proc. R. Soc. Lond. B 281, 20140732 (2014).
Zehender, G. et al. Reliable timescale inference of HBV genotype A origin and phylodynamics. Infect. Genet. Evol. 32, 361–369 (2015).
Hannoun, C., Söderström, A., Norkrans, G. & Lindh, M. Phylogeny of African complete genomes reveals a West African genotype A subtype of hepatitis B virus and relatedness between Somali and Asian A1 sequences. J. Gen. Virol. 86, 2163–2167 (2005).
Pickrell, J. K. et al. Ancient west Eurasian ancestry in southern and eastern Africa. Proc. Natl Acad. Sci. USA 111, 2632–2637 (2014).
Ghosh, S. et al. Unique hepatitis B virus subgenotype in a primitive tribal community in eastern India. J. Clin. Microbiol. 48, 4063–4071 (2010).
Basu, A., Sarkar-Roy, N. & Majumder, P. P. Genomic reconstruction of the history of extant populations of India reveals five distinct ancestral components and a complex structure. Proc. Natl Acad. Sci. USA 113, 1594–1599 (2016).
Drexler, J. F. et al. Bats carry pathogenic hepadnaviruses antigenically related to hepatitis B virus and capable of infecting human hepatocytes. Proc. Natl Acad. Sci. USA 110, 16151–16156 (2013).
Geer, L. Y. et al. The NCBI BioSystems database. Nucleic Acids Res. 38, D492–D496 (2010).
Bell, T. G., Yousif, M. & Kramvis, A. Bioinformatic curation and alignment of genotyped hepatitis B virus (HBV) sequence data from the GenBank public database. Springerplus 5, 1896 (2016).
Bronk Ramsey, C. Bayesian analysis of radiocarbon dates. Radiocarbon 51, 337–360 (2009).
Reimer, P. J. et al. IntCal13 and Marine13 radiocarbon age calibration curves 0–50,000 years cal bp. Radiocarbon 55, 1869–1887 (2013).
Lindgreen, S. AdapterRemoval: easy cleaning of next-generation sequencing reads. BMC Res. Notes 5, 337 (2012).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2015).
Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinformatics 10, 421 (2009).
Drosten, C., Weber, M., Seifried, E. & Roth, W. K. Evaluation of a new PCR assay with competitive internal control sequence for blood donor screening. Transfusion 40, 718–724 (2000).
Willerslev, E. & Cooper, A. Review Paper. Ancient DNA. Proc. R. Soc. Lond. B 272, 3–16 (2005).
Jónsson, H., Ginolhac, A., Schubert, M., Johnson, P. L. F. & Orlando, L. mapDamage2.0: fast approximate Bayesian estimates of ancient DNA damage parameters. Bioinformatics 29, 1682–1684 (2013).
Orlando, L., Gilbert, M. T. P. & Willerslev, E. Reconstructing ancient genomes and epigenomes. Nat. Rev. Genet. 16, 395–408 (2015).
Briggs, A. W. et al. Removal of deaminated cytosines and detection of in vivo methylation in ancient DNA. Nucleic Acids Res. 38, e87 (2010).
Kearse, M. et al. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28, 1647–1649 (2012).
Needleman, S. B. & Wunsch, C. D. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443–453 (1970).
Rice, P., Longden, I. & Bleasby, A. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 16, 276–277 (2000).
Martin, D. P., Murrell, B., Golden, M., Khoosal, A. & Muhire, B. RDP4: detection and analysis of recombination patterns in virus genomes. Virus Evol. 1, vev003 (2015).
Martin, D. & Rybicki, E. RDP: detection of recombination amongst aligned sequences. Bioinformatics 16, 562–563 (2000).
Padidam, M., Sawyer, S. & Fauquet, C. M. Possible emergence of new geminiviruses by frequent recombination. Virology 265, 218–225 (1999).
Martin, D. P., Posada, D., Crandall, K. A. & Williamson, C. A modified bootscan algorithm for automated identification of recombinant sequences and recombination breakpoints. AIDS Res. Hum. Retroviruses 21, 98–102 (2005).
Smith, J. M. Analyzing the mosaic structure of genes. J. Mol. Evol. 34, 126–129 (1992).
Posada, D. & Crandall, K. A. Evaluation of methods for detecting recombination from DNA sequences: computer simulations. Proc. Natl Acad. Sci. USA 98, 13757–13762 (2001).
Gibbs, M. J., Armstrong, J. S. & Gibbs, A. J. Sister-scanning: a Monte Carlo procedure for assessing signals in recombinant sequences. Bioinformatics 16, 573–582 (2000).
Boni, M. F., Posada, D. & Feldman, M. W. An exact nonparametric method for inferring mosaic structure in sequence triplets. Genetics 176, 1035–1047 (2007).
Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).
Guindon, S. et al. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59, 307–321 (2010).
Ronquist, F. & Huelsenbeck, J. P. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19, 1572–1574 (2003).
Rambaut, A., Lam, T. T., Max Carvalho, L. & Pybus, O. G. Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen). Virus Evol. 2, vew007 (2016).
Bouckaert, R. R. & Drummond, A. J. bModelTest: Bayesian phylogenetic site model averaging and model comparison. BMC Evol. Biol. 17, 42 (2017).
Duchêne, S., Duchêne, D., Holmes, E. C. & Ho, S. Y. W. The performance of the date-randomization test in phylogenetic analyses of time-structured virus data. Mol. Biol. Evol. 32, 1895–1906 (2015).
Kass, R. E. & Raftery, A. E. Bayes Factors. J. Am. Stat. Assoc. 90, 773–795 (1995).
Rambaut, A., Suchard, M. A., Xie, D. & Drummond, A. J. Tracer v1.6. https://github.com/beast-dev/tracer/releases/tag/v1.6 (2017).
Sanchez, G. et al. Human (Clovis)–gomphothere (Cuvieronius sp.) association ∼ 13,390 calibrated yBP in Sonora, Mexico. Proc. Natl Acad. Sci. USA 111, 10972–10977 (2014).
Bourgeon, L., Burke, A. & Higham, T. Earliest human presence in North America dated to the Last Glacial Maximum: new radiocarbon dates from Bluefish Caves, Canada. PLoS ONE 12, e0169486 (2017).
Andernach, I. E., Nolte, C., Pape, J. W. & Muller, C. P. Slave trade and hepatitis B virus genotypes and subgenotypes in Haiti and Africa. Emerg. Infect. Dis. 15, 1222–1228 (2009).
Kayser, M. et al. Melanesian and Asian origins of Polynesians: mtDNA and Y chromosome gradients across the Pacific. Mol. Biol. Evol. 23, 2234–2244 (2006).
B.B. thanks D. Tserendulam for help, wisdom and guidance. E.W. thanks St John’s College, Cambridge for facilitating scientific discussion. We thank S. Rankin and the staff of the University of Cambridge High Performance Computing service and the National High-throughput Sequencing Centre (Copenhagen). This work was supported by: The Danish National Research Foundation, The Danish National Advanced Technology Foundation (The Genome Denmark platform, grant 019-2011-2), The Villum Kann Rasmussen Foundation, KU2016, European Union FP7 programme ANTIGONE (grant agreement No. 278976), European Union Horizon 2020 research and innovation programmes, COMPARE (grant agreement No. 643476), VIROGENESIS (grant agreement No. 634650) and the Lundbeck Foundation. The National Reference Center for Hepatitis B and D Viruses is supported by the German Ministry of Health via the Robert Koch Institute (Berlin). B.B. was supported by Taylor Family-Asia Foundation Endowed Chair in Ecology and Conservation Biology. A.D.M.E.O. was supported by N-RENNT of the Ministry of Science and Culture of Lower Saxony, Germany.
Nature thanks P. Simmonds, B. Shapiro, C. Pepperell and the other anonymous reviewer(s) for their contribution to the peer review of this work.
Extended data figures and tables
The frequencies of the mismatches observed between the HBV reference sequences (Supplementary Table 3) and the reads are shown as a function of distance from the 5′ end. C > T (5′) and G > A (3′) mutations are shown in red and blue, respectively. All other possible mismatches are shown in grey. Insertions are shown in purple, deletions in green and clippings in orange. The count of reads matching HBV for each sample is shown in parentheses. a, Damage patterns for RISE563, DA222, DA119, RISE254, DA195, DA27, DA51, RISE386, RISE387, DA29, DA45, RISE416 and RISE154. b, Damage patterns for DA222 without (left) and with (right) USER treatment. c, Damage patterns with 10, 20, 50, 100, 200, 500 and 1,000 reads sampled from RISE563, in which each opaque line corresponds to one replicate set of reads.
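As an illustration of the quantity plotted in the caption above, the following is a minimal sketch of how a C > T mismatch frequency could be tallied as a function of distance from the 5′ end of each read. The function name, the input format (gap-free reference/read string pairs oriented along the read) and the choice to normalize by the number of reference C sites at each position are assumptions made for this example; the source does not state how its frequencies were computed beyond citing mapDamage2.0.

```python
from collections import Counter


def c_to_t_frequency(alignments, max_pos=25):
    """Tally C>T mismatch frequency by distance from the 5' end of each read.

    `alignments` is a hypothetical list of (reference, read) string pairs of
    equal length, without gaps, oriented 5'->3' along the read.
    Assumption: frequency = (ref C read as T) / (ref C sites) per position.
    """
    c_sites = Counter()  # positions at which the reference base is C
    c_to_t = Counter()   # positions at which a reference C is read as T
    for ref, read in alignments:
        for pos, (ref_base, read_base) in enumerate(zip(ref, read)):
            if pos >= max_pos:
                break
            if ref_base == "C":
                c_sites[pos] += 1
                if read_base == "T":
                    c_to_t[pos] += 1
    # Only positions with at least one reference C are reported.
    return {pos: c_to_t[pos] / c_sites[pos] for pos in sorted(c_sites)}


if __name__ == "__main__":
    toy = [("CACG", "TACG"), ("CCGT", "CTGT"), ("ACGT", "ACGT")]
    print(c_to_t_frequency(toy))
```

In authentic ancient DNA the resulting frequency is expected to be highest at position 0 and to decay with distance from the read end; the G > A signal at the 3′ end could be tallied in the same way on reversed reads.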
This figure shows 26 Orthohepadnaviridae sequences (dataset 1, see Methods), including the ancient HBV sequences. Ancient genotype A sequences are shown in red, the ancient genotype B sequence in orange, ancient genotype D sequences in blue and novel genotype sequences in green. The tree was constructed in PhyML (ref. 60), optimizing for topology, branch lengths and rates, with 100 bootstraps (see Methods). Internal nodes with <70% bootstrap support are shown as polytomies.
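The display rule in the caption above, in which internal nodes with less than 70% bootstrap support are shown as polytomies, can be sketched as follows. This is an illustrative sketch only: the `Node` class and `collapse_low_support` function are hypothetical names, branch lengths are ignored for brevity, and nothing here is taken from PhyML or from the authors' plotting code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str = ""
    support: Optional[float] = None  # bootstrap percentage; None for tips and root
    children: List["Node"] = field(default_factory=list)


def collapse_low_support(node, threshold=70.0):
    """Return a copy of the tree in which weakly supported internal nodes
    are dissolved, so their children attach directly to the parent."""
    new_children = []
    for child in node.children:
        collapsed = collapse_low_support(child, threshold)
        is_internal = bool(collapsed.children)
        weak = collapsed.support is not None and collapsed.support < threshold
        if is_internal and weak:
            # Dissolve the node: its children become part of a polytomy.
            new_children.extend(collapsed.children)
        else:
            new_children.append(collapsed)
    return Node(node.name, node.support, new_children)


if __name__ == "__main__":
    # ((A,B)55,(C,D)90): the 55% node is dissolved, the 90% node is kept.
    tree = Node(children=[
        Node(support=55, children=[Node("A"), Node("B")]),
        Node(support=90, children=[Node("C"), Node("D")]),
    ])
    print([child.name or "(internal)" for child in collapse_low_support(tree).children])
```

A real implementation would also need to add the dissolved node's branch length to each of its children so that root-to-tip distances are preserved.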
RDP4 (ref. 51) was used to analyse the set of 12 ancient sequences plus a representative set of 15 modern human and non-human primate sequences (see Methods). The seven recombination programs used by RDP4 suggested that all genotype A sequences are recombinants, with the genotype D sequence HBV-DA51 as the minor parent and an unknown major parent. The obvious interpretation is that recombination formed an ancestor of the oldest sequences, evidence of which is still present in the less-ancient and the modern representatives. The figure shows the graphical evidence and predicted recombination break-point distribution for the two oldest genotype A sequences, HBV-RISE386 and HBV-RISE387, according to three of the RDP4 methods (MaxChi, Bootscan and RDP). In all subplots, the predicted location of the break points is shown as a dashed vertical line and the surrounding grey area shows the 99% confidence interval for the break point. Subplots on the same row share their y axis and those in the same column share their x axis. a, HBV-RISE386 analysed by MaxChi. b, HBV-RISE386 analysed by Bootscan. c, HBV-RISE386 analysed by RDP. d, HBV-RISE387 analysed by MaxChi. e, HBV-RISE387 analysed by Bootscan. f, HBV-RISE387 analysed by RDP.
The sequences from dataset 2 (see Methods) and the ancient sequences were aligned in MAFFT (ref. 59). The tree was constructed in PhyML (ref. 60), optimizing for topology, branch lengths and rates, with 100 bootstraps (see Methods). Internal nodes with <70% bootstrap support are shown as polytomies. Ancient genotype A sequences are shown in red, ancient genotype B sequences in orange, ancient genotype D sequences in blue and novel genotype sequences in green. Taxon names indicate: genotype or subgenotype, GenBank accession number, age, abbreviation of country of sequence origin, region of sequence origin, host species and optional additional remarks. Note that the maximum likelihood tree shows topological uncertainty (polytomies) in areas where the BEAST2 (ref. 25) tree (Fig. 2) is well resolved. This is the case for two reasons. First, BEAST2 always produces a fully resolved binary topology without polytomies. Second, and more important, BEAST2 creates a time tree and uses tip dates to constrain the possible topologies under consideration. Thus, BEAST2 can treat certain topologies as unlikely or impossible, whereas maximum likelihood cannot use the tip dates and therefore inherently has greater uncertainty regarding tree topology.
a, Regression of root-to-tip distances and ages performed in SciPy (http://www.scipy.org). One hundred and twenty-four branch lengths were extracted using TempEst (ref. 62) from trees inferred using neighbour joining, maximum likelihood and Bayesian methods. Shaded areas show 95% confidence intervals. Slopes are 1.01 × 10⁻⁵, 1.20 × 10⁻⁵ and 4.21 × 10⁻⁶, and correlation coefficients are 0.45 (R² = 0.2), 0.36 (R² = 0.13) and 0.51 (R² = 0.26), for maximum likelihood, Bayesian and neighbour joining trees, respectively. b, Date randomization tests under the strict clock model. The median and 95% HPD interval for the substitution rates are given. The rate for the correctly dated tree is shown in red. Dates were randomized within all sequences, within the ancient sequences only, and within each genotype. We performed three replicates of each. None of the 95% HPD intervals for the randomized runs overlaps with the 95% HPD intervals for the correctly dated runs, suggesting the presence of a temporal signal in the data.
This file is in PDF format and contains: Three Supplementary Tables: SI Tables 1 and 2 describe the number of reference genomes and accession numbers of sequences used to design capture probes. SI Table 3 contains additional information for the HBV positive samples. A Supplementary Methods section, showing: 1) An investigation into the dependence of damage patterns on the number of reads, 2) Lists of accession numbers for sequences included in the different analyses, and 3) The three phylogenetic trees used for the regression analysis, inferred using neighbour joining, maximum likelihood and Bayesian methods.
About this article
Nature Reviews Microbiology (2019)