Hepatitis B virus (HBV) is a major cause of human hepatitis. There is considerable uncertainty about the timescale of its evolution and its association with humans. Here we present 12 full or partial ancient HBV genomes that are between approximately 0.8 and 4.5 thousand years old. The ancient sequences group either within or in a sister relationship with extant human or other ape HBV clades. Generally, the genome properties follow those of modern HBV. The root of the HBV tree is projected to between 8.6 and 20.9 thousand years ago, and we estimate a substitution rate of 8.04 × 10⁻⁶–1.51 × 10⁻⁵ nucleotide substitutions per site per year. In several cases, the geographical locations of the ancient genotypes do not match present-day distributions. Genotypes that today are typical of Africa and Asia, and a subgenotype from India, are shown to have an early Eurasian presence. The geographical and temporal patterns that we observe in ancient and modern HBV genotypes are compatible with well-documented human migrations during the Bronze and Iron Ages1,2. We provide evidence for the creation of HBV genotype A via recombination, and for a long-term association of modern HBV genotypes with humans, including the discovery of a human genotype that is now extinct. These data expose a complexity of HBV evolution that is not evident when considering modern sequences alone.
HBV is transmitted perinatally or horizontally via blood or genital fluids3. The estimated global prevalence is 3.6%, ranging from 0.01% (UK) to 22.38% (South Sudan)4. In high endemicity areas, in which prevalence is over 8%, 70–90% of the adult population show evidence of past or present infection5 (http://www.who.int/mediacentre/factsheets/fs204/en/). The young and the immunocompromised are most likely to develop chronic HBV infection, which can result in high viraemia over years to decades3. Approximately 257 million people are chronically infected and around 887,000 people died in 2015 owing to associated complications (http://www.who.int/mediacentre/factsheets/fs204/en/).
Despite the prevalence and public health impact of HBV, its origin and evolution remain unclear6,7. Inference of HBV nucleotide substitution rates is complicated by the fact that the virus genome consists of four overlapping open reading frames8, and that mutation rates differ between phases of chronic infection9. Studies based on heterochronous sequences, sampled over a relatively short time period, find higher substitution rates, whereas rates estimated using external calibrations tend to be lower, leading to a wide range of estimated HBV substitution rates (7.72 × 10⁻⁴–3.7 × 10⁻⁶ substitutions per site per year)10,11,12. Human HBV is classified into at least nine genotypes (A–I) based on sequence similarity of at least 92.5% within genotypes13, with a heterogeneous global distribution7,8 (Fig. 1a). Attempts to explain the origin of genotypes using human migrations have been inconclusive. The hypothesis that HBV co-evolved with modern humans as they left Africa 60–100 thousand years ago (ka) has been contested owing to the basal phylogenetic position of genotypes F and H, which are found exclusively in the Americas6. HBV also infects non-human primates, and the human and other great ape HBVs are interspersed in the phylogenetic tree, possibly owing to cross-species transmission14. Given the variability of estimated substitution rates, the incongruence of the tree topology with some human migrations and the mixed topology of the non-human primate and human HBV sequences in the phylogenetic tree, there remains considerable uncertainty about the evolutionary history of HBV.
Recent advances in the sequencing of ancient DNA (aDNA) have yielded important insights into human evolution, past population dynamics15 and diseases16,17. However, ancient sequences have been recovered for only a handful of exogenous human viruses, including influenza virus (sample approximately 100 years old)18, variola virus (sample approximately 350 years old)19 and HBV (samples approximately 340 and 450 years old)20,21. The knowledge gained from these cases emphasizes the general importance of ancient sequences for the direct study of long-term viral evolution. HBV has several characteristics that make it a good candidate for detection in an aDNA virus study: its extended high viraemia during chronicity3, the relative stability of its virion22, and its small, circular and partially double-stranded DNA genome8.
Shotgun sequence data were previously generated from 167 Bronze Age1 and 137 predominantly Iron Age2 individuals from central to western Eurasia with sample ages of approximately 7.1–0.2 thousand years (kyr). We identified reads that matched the HBV genome in 25 samples (Table 1, Extended Data Table 1a and Supplementary Table 3), spanning a period of almost 4,000 years, from several different cultures and with a broad geographical range (Fig. 1b, Table 1, Extended Data Table 1a and Supplementary Table 3). Using TaqMan PCR, we tested two samples (DA195 and DA222) with high genome coverage and two samples (DA85 and DA89) with low genome coverage for the presence of HBV. The high-coverage samples tested positive, whereas the low-coverage samples tested negative (Extended Data Table 1b). This is consistent with shotgun sequencing being more effective than targeted PCR for analysing highly degraded DNA23. On the basis of the availability of sample material, libraries from 14 samples were selected for targeted enrichment (capture) of HBV DNA fragments (Supplementary Tables 1, 2). This resulted in increased genome coverage and an average 2.4-fold increase in the number of HBV-positive reads (Extended Data Table 1a and Supplementary Table 3). We obtained 17.9–100% HBV genome coverage from the sequence data, with genomic depth ranging from 0.4× to 89.2× (Table 1 and Extended Data Table 1a). We selected 12 samples for phylogenetic analyses. Criteria for inclusion were at least 50% genome coverage and clear aDNA damage patterns after capture (Extended Data Fig. 1).
For an initial phylogenetic grouping, we estimated a maximum likelihood tree using the ancient HBV genomes together with modern human, non-human primate, rodent and bat HBV genomes (dataset 1, see Methods). All ancient viruses fell within the diversity of Old World primate HBV genotypes, which includes all human and other great ape genotypes with the exception of human genotypes F and H (Extended Data Fig. 2).
Recombination is known to occur in HBV24. We found strong evidence that an ancient sequence (HBV-DA51) and an unknown parent recombined to form the ancient genotype A sequences. Although this cannot literally be the case owing to sample ages, the logical interpretation is that an ancestor of HBV-DA51 was involved in the recombination. The same recombination is also suggested for the two modern genotype A sequences that were included in the analysis. The ancient genotype B (HBV-DA45), a modern genotype B and two modern genotype C sequences were not similarly flagged, which suggests that the possible recombination occurred after genotypes A, B and C had diverged. The predicted recombination break points (Extended Data Table 2 and Extended Data Fig. 3) correspond closely to the polymerase gene. It is therefore possible that the polymerase from an unknown parent and the remainder of the genome from an HBV-DA51 ancestor recombined to form the now-ubiquitous genotype A about 7.4–9 ka (Fig. 2, Extended Data Table 3b and Methods). Similar recombination events that involved the creation of genotypes E, G and a currently circulating B/C recombinant have previously been identified24.
For detailed phylogenetic analyses, we used a set of 112 reference human and non-human primate HBV sequences (dataset 2, see Methods). A maximum likelihood phylogenetic tree based on these reference sequences and the 12 ancient sequences was constructed (Extended Data Fig. 4). Regression of root-to-tip genetic distances against sampling dates, as well as date randomization tests, showed a clear temporal signal in the data (Extended Data Fig. 5 and Supplementary Figs. 1–3), suggesting that molecular clock models can be applied. A dated coalescent phylogeny was constructed using BEAST225 (Fig. 2). The molecular clock was calibrated using tip dates. Strict and relaxed log-normal molecular clocks were tested with coalescent constant, exponential and Bayesian skyline population priors (Extended Data Table 3a). Model comparisons favoured a relaxed molecular clock model with log-normally distributed rate variation and a coalescent exponential population prior (Extended Data Table 3a). The median root age of the resulting tree is estimated to be 11.6 kyr (95% highest posterior density (HPD) interval: 8.6–15.3 kyr) and the median clock rate is 1.18 × 10⁻⁵ substitutions per site per year (95% HPD interval: 9.21 × 10⁻⁶–1.45 × 10⁻⁵ substitutions per site per year). Under a strict molecular clock, a coalescent Bayesian skyline population prior was favoured, in which case the median root age is 15.6 kyr (95% HPD interval: 13.7–17.8 kyr) and the median substitution rate is 9.48 × 10⁻⁶ substitutions per site per year (95% HPD interval: 8.3 × 10⁻⁶–1.07 × 10⁻⁵ substitutions per site per year) (Extended Data Table 3a–c).
Under all model parameterizations used here, the substitution rate that we find is lower than rates estimated from phylogenies built using either modern heterochronous sequences10 or sequences from mother-to-child transmissions26 but higher than rates inferred using external calibrations based on human migrations11. A lower rate is consistent with previous work27 in which it was shown that, although mutation rates may be high, mutations within an individual often revert back to the genotype consensus and thus rarely lead to long-term sequence change. It is also consistent with the time-dependent rate phenomenon, observed for many viruses, which suggests that short-term evolutionary rates are higher than long-term rates28.
The ancient HBV genome data enable us to formally evaluate hypotheses concerning HBV origins using path sampling of calibrated phylogenies based on appropriate external divergence date assumptions. We tested several calibration points that would be implied by a co-expansion of HBV with humans after leaving Africa for support of congruence between migrations and geographical locations of HBV clades11. We find weak evidence for the split of the F and H clade occurring between 13.4 and 25.0 ka under a strict, but not a relaxed, clock model. We do not find support for the divergence of subgenotype C3 strains between 5.1 and 12.0 ka (hypothesized to have led to its distribution in different regions of Polynesia11) or for divergence of Haitian A3 strains from other genotype A strains between 0.2 and 0.5 ka under either strict or relaxed clock models (Extended Data Table 3d).
In the dated coalescent phylogeny, four ancient sequences (from youngest to oldest: HBV-DA119, HBV-DA195, HBV-RISE386 and HBV-RISE387) group with genotype A. The first three are well within the 7.5% nucleotide divergence criterion that was used to delimit membership in HBV genotypes, and HBV-RISE387 is right on this limit (7.51%)13 (Extended Data Table 4a). The three oldest samples lack a six-nucleotide insertion at the carboxyl end of the core gene (C) that is present in all modern genotype A viruses8 (Table 2). HBV-RISE387 encodes a stop codon in its pre-core peptide that would have ablated the expression of the immune modulator HBe antigen (HBeAg), a phenomenon that is known to occur in modern HBV infections (Table 2). This characteristic viral mutant is usually found in chronic HBV carriers who seroconverted from HBeAg to anti-HBe. RISE386 and RISE387 have archaeological dates only about 100 years apart and both come from the Bulanovo site in Russia, but their viruses have only 93.34% sequence identity (Extended Data Table 4b), which indicates the existence of substantial localized HBV diversity about 4.2 ka.
The ancient sequence HBV-DA45 phylogenetically groups with genotype B and has 97.65% sequence identity with modern genotype B (Extended Data Table 4a).
Sequences HBV-DA27, HBV-DA29, HBV-DA51 and HBV-DA222 phylogenetically group with the modern genotype D. They have high sequence identity (96.99–98.74%) with modern genotype D sequences (Extended Data Table 4a), and have the typical 33-nucleotide deletion in the preS1 region of the S gene, encoding the three HBV surface proteins8 (Table 2).
Sequences HBV-RISE154, HBV-RISE254 and HBV-RISE563 are in a sister relationship with the chimpanzee–gorilla HBV clade (Fig. 2). HBV-RISE254 and HBV-RISE563 have the same 33-nucleotide deletion in the preS1 sequence that is shared with non-human primate HBVs and human genotype D (Table 2). HBV-RISE563 does not encode a functional pre-core peptide (Table 2). On the basis of sequence similarity across the whole genome, HBV-RISE563 and HBV-RISE254 together might be classified as a new human HBV genotype that is extinct today, and HBV-RISE154 might possibly be classified as another (Extended Data Table 4). However, HBV-RISE154 has low genome coverage, which precludes an exact calculation. The sister relationship of these three sequences with modern chimpanzee and gorilla HBVs could be interpreted as a consequence of relatively recent transmission(s) of HBV from humans to non-human primates14. However, other scenarios and confounding factors are possible, as these lineages are deeply separated in the tree. Incomplete lineage sorting combined with viral extinction (possibly boosted by massive recent reductions in great ape populations) should be considered. More data on current and, if possible, ancient HBVs will be necessary to reach definitive conclusions.
The geographical locations of some of the ancient virus genotypes do not match the present-day genotype distribution, and also do not match dates and/or locations inferred in previous studies of HBV. Although the data presented here are limited, they provide important spatiotemporal reference points in the evolutionary history of HBV. Their synopsis suggests a more complicated ancestry of present-day genotypes than previously assumed, especially in light of recent insights into the history of human migration.
We find genotype A in south-western Russia by 4.3 ka (in samples RISE386 and RISE387) in individuals belonging to the Sintashta culture, and in a Hungarian sample (DA195) from the Scythian culture. The western Scythians are related to the Bronze Age cultures of western steppe populations2 and their shared ancestry suggests that the modern genotype A may descend from this ancient Eurasian diversity and not, as previously hypothesized, from African ancestors29,30. This is also consistent with the phylogeny (Fig. 2), as well as the fact that the three oldest ancient genotype A sequences (HBV-DA195, HBV-RISE386 and HBV-RISE387) lack the six-nucleotide insertion found in the youngest (HBV-DA119) and in all modern genotype A sequences. The ancestors of subgenotypes A1 and A3 could have been carried into Africa subsequently, via migration from western Eurasia31.
The ancient HBV genotype D sequences were all found in Central Asia. HBV-DA27, found in Kazakhstan and dated to 1.6 ka, falls basal to the modern subgenotype D5 sequences that today are found in the Paharia tribe from eastern India32. DA27 and the Paharia people in India are linked by their East Asian ancestry2,33.
Based on the observation that genotypes go extinct and can be created by recombination, the ancient sequence data show that the diversity that we observe today is only a subset of the diversity that has ever existed. These data support a scenario in which all present-day HBV diversity arose only after the split of the Old World and New World genotypes (25–13.4 ka). Any attempt to interpret the currently known HBV tree based on human migrations that happened before this event will necessarily result in anomalies that cannot be reconciled, such as the basal position of genotypes F and H and the apical position of subgenotype C4, which is found exclusively in indigenous Australians8. If HBV did co-evolve with ancient modern humans as they left Africa as previously proposed6, most of the pattern of earlier diversity has been replaced by changes that happened after the split of the Old and New World genotypes. Genotypes F and H would therefore be remnants of the earlier now-extinct diversity, and the arrival of subgenotype C4 in Australia would have taken place long after the split between Old and New World genotypes, as supported by the tree in Fig. 2. Alternatively, there could have been a New World origin of HBV or the virus could have been introduced into humans from a different host. Our data do not allow us to speculate either way.
To our knowledge, we report the oldest exogenous viral sequences recovered from DNA of humans or any vertebrate, and show that it is possible to recover viral sequences from samples of this age. We show that humans throughout Eurasia were widely infected with HBV for thousands of years. Despite the age of the samples and the imperfect diagnostic test, our dataset contained a high proportion of HBV-positive individuals. The actual ancient prevalence during the Bronze Age and thereafter might have been higher, reaching or exceeding the prevalence typically found in contemporary indigenous populations5. This clearly establishes the potential of HBV as a powerful proxy tool for research into human spread and interactions. The data from ancient genomes reveal aspects of complexity in HBV evolution that are not apparent when only modern sequences are considered. They show the existence of ancient HBV genotypes in locations incongruent with their present-day distribution, contradicting previously suggested geographical or temporal origins of genotypes or sub-genotypes; evidence for the creation of genotype A via recombination and the emergence of the genotype outside Africa; at least one now-extinct human genotype; ancient genotype-level localized diversity; and that the viral substitution rate obtained from modern heterochronously sampled sequences is probably misleading. Together, these findings suggest that the difficulty in formulating a coherent theory for the origin and spread of HBV may be due to genetic evidence of an earlier evolutionary scenario being overwritten by relatively recent alterations, as has previously been suggested in the context of recombination24. The lack of ancient sequences limits our understanding of the evolution of HBV and very probably of other viruses.
Discovery of additional ancient viral sequences may provide a clearer picture of the true origin and early diversification of HBV, enable us to address questions of palaeo-epidemiology, and broaden our understanding of the contributions of natural and cultural changes (including migrations and medical practices) to human disease burden and mortality.
No statistical methods were used to predetermine sample size. The experiments were not randomized and investigators were not blinded to allocation during experiments and outcome assessment.
The following HBV datasets were used in the present study. Full listings of accession numbers are given in the Supplementary Methods.
Dataset 1 comprises 26 HBV genomes, covering all species in the Orthohepadnaviridae. This includes one sequence each from the human HBV genotypes (A–J), orangutan, chimpanzee, gorilla, gibbon, woolly monkey, woodchuck, ground squirrel, Arctic ground squirrel and horseshoe bat, four sequences from roundleaf bats, and three sequences from tent-making bats, largely following a previous publication34.
Dataset 2 comprises 124 HBV genomes, from humans and non-human primates. This set contains 92 sequences from a previous publication11 (excluding their incomplete sequences), 7 additional genotype D sequences, the Korean mummy genotype C sequence20, the 12 ancient sequences from the present study and 12 full genomes selected from a set of 9,066 full HBV genomes downloaded from NCBI35 on 24 August 2017 (Entrez query: hepatitis b virus[organism] not rna[title] not clone[title] not clonal[title] not patent[title] not recombinant[title] not recombination[title] and 3000:4000[sequence length]) corresponding to the closest, non-artificial match for each of the ancient sequences. Dates for these sequences were acquired by looking for a date of sample collection in the NCBI entry, or the paper in which the sequence was first published. If a range of dates was mentioned, the mean was used. If no date of sample collection was found in this way, either the year of the publication of the paper, or the year of addition of the sequence to GenBank was used, whichever was earlier.
Dataset 3 comprises 124 HBV genomes, from humans, non-human primates and a variety of other Orthohepadnaviridae host species, including woolly monkey, roundleaf bat, tent-making bat, ground squirrel, Arctic ground squirrel, woodchuck and snow goose. This set contains 113 sequences that were obtained from the union of 91 sequences from Paraskevis et al.11 and 29 from Drexler et al.34, plus 11 additional sequences (giving 124 sequences in total).
Dataset 4 comprises 3,505 HBV genomes. Of these, 3,384 are from a previous publication36, divided into ten human genotypes. To these, we added 17 chimpanzee, 56 gorilla, 12 gibbon and 36 orangutan full HBV genome sequences downloaded from NCBI on 18 January 2017, resulting in 14 genome categories.
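The date-assignment rule described for the modern sequences of dataset 2 can be written as a small function. This is a minimal sketch rather than the code used in the study: the function name and argument layout are hypothetical, and the mean of a reported date range is taken here as the midpoint of its first and last year.

```python
from typing import Optional, Tuple


def assign_sample_year(
    collection_years: Optional[Tuple[float, float]],
    publication_year: Optional[int],
    genbank_year: Optional[int],
) -> float:
    """Return the year used as the tip date of a modern sequence.

    collection_years is (first, last); a single collection year is (y, y).
    """
    if collection_years is not None:
        first, last = collection_years
        # A range of collection dates is replaced by its mean.
        return (first + last) / 2
    # No collection date: fall back to the publication year or the
    # GenBank submission year, whichever is earlier.
    fallbacks = [y for y in (publication_year, genbank_year) if y is not None]
    if not fallbacks:
        raise ValueError("no date information available for this sequence")
    return min(fallbacks)
```

For example, a sequence collected between 1998 and 2002 would be assigned the year 2000, whereas a sequence without a collection date that was published in 2005 and deposited in 2004 would be assigned 2004.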
Dating of ancient samples
Sample ages were determined by direct ¹⁴C dating. These ages were calibrated using OxCal37 (version 4.3) with the IntCal13 curve38. Table 1 shows the ¹⁴C age and standard deviation for each sample. This is followed by the median probability calibrated age before present (cal. BP). RISE386 was ¹⁴C dated twice, with ages (standard deviation) of 3,740 (33) and 3,775 (34); a rounded mean of 3,758 (34) was used for its calibration. DA29 was dated at 822 years using ¹⁴C and also at about 700 years using multi-proxy methods: the former date was used for consistency. The dates for DA119, DA222, RISE548, RISE556, RISE568 and RISE597 are best estimates, based on sample context.
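The rounded mean used for the two radiocarbon measurements of RISE386 can be reproduced with simple arithmetic. The sketch below assumes that both the age and the standard deviation were averaged and rounded half up, which matches the reported value; it is not an error-weighted combination of the two dates, and the helper names are hypothetical.

```python
import math


def round_half_up(x: float) -> int:
    """Round to the nearest integer, with .5 always rounded upwards."""
    return math.floor(x + 0.5)


def rounded_mean(values) -> int:
    """Arithmetic mean of the values, rounded half up."""
    return round_half_up(sum(values) / len(values))


ages = [3740, 3775]  # uncalibrated radiocarbon ages of RISE386
sds = [33, 34]       # their standard deviations

print(rounded_mean(ages), rounded_mean(sds))  # 3758 34
```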
Data and data processing
We analysed 101 Bronze Age samples published in Allentoft et al.1, 137 predominantly Iron Age samples published in Damgaard et al.2 and 66 additional samples from the Bronze Age. A total of 114.58 × 10⁹ Illumina HiSeq 2500 sequencing reads were processed.
AdapterRemoval39 (version 2.1.7) was used with its default settings to remove adaptors from all sequences, to trim N bases from the ends of reads and to trim bases with quality ≤ 2. Reads were aligned against a human genome (GRCh38, https://www.ncbi.nlm.nih.gov/grc/human) using BWA40 (version 0.7.15-r1140, mem algorithm). Reads that did not match the human genome were then mapped against the NCBI viral protein reference database containing 274,038 viral protein sequences (downloaded on 31 August 2016) using DIAMOND41 (version 0.8.25). Protein matches were grouped into their corresponding viruses. Reads matching HBV were found in 25 samples.
The non-human reads from the samples that had more than three reads matching HBV using DIAMOND were selected for a subsequent BLAST42 (version 2.4.0) analysis. A BLAST database was made from dataset 3, and samples were matched using BLASTn (with arguments -task blastn -evalue 0.01). Matching reads with bit scores greater than 50 for all samples (except DA222 (70) and DA45 (55)) were selected for subsequent processing. The number of reads selected from the BLAST matches, per sample, is shown in Table 1, with additional detail in Extended Data Table 1. Across all samples 11,149 reads matched against HBV sequences.
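The bit-score selection described above can be sketched as follows. Several points are assumptions rather than details given in the text: that the BLAST results were written in the standard 12-column tabular format (`-outfmt 6`, with the bit score in the last column), that the values in parentheses for DA222 and DA45 are the cut-offs applied to those two samples, and the function and constant names themselves.

```python
DEFAULT_MIN_BITSCORE = 50
# Assumed per-sample cut-offs, read from "(except DA222 (70) and DA45 (55))".
SAMPLE_MIN_BITSCORE = {"DA222": 70, "DA45": 55}


def select_reads(sample, blast_lines):
    """Return the IDs of reads with a hit above the sample's bit-score cut-off.

    blast_lines is an iterable of tab-separated BLAST lines in the default
    tabular column order: qseqid, sseqid, pident, length, mismatch, gapopen,
    qstart, qend, sstart, send, evalue, bitscore.
    """
    cutoff = SAMPLE_MIN_BITSCORE.get(sample, DEFAULT_MIN_BITSCORE)
    selected = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        read_id = fields[0]
        bitscore = float(fields[11])
        if bitscore > cutoff:  # "greater than", as stated in the text
            selected.add(read_id)
    return selected
```

Collecting read IDs in a set means that a read with several qualifying hits is counted once.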
Real-time PCR was established using primers and TaqMan probes as previously described43, which were used to amplify a 91-base-pair amplicon of the HBV genome. Primers and probe were added to QuantiTect PCR mix (Qiagen #204343) in a final concentration of 400 nM or 200 nM, respectively, in a total reaction volume of 25 μl, including 5 μl template. Using the Roche LC480 or Agilent Mx3006p instruments, PCRs were incubated for 15 min at 95 °C followed by 45 cycles of 15 s at 94 °C and 60 s at 60 °C, measuring fluorescence from the 6-carboxy-fluorescein/BHQ1-labelled probe and the passive dye (ROX) at the end of each cycle.
Careful precautions were taken to prevent PCR contamination. PCR mastermixes were prepared in dedicated aDNA clean laboratory facilities, in which no prior targeted work had been carried out on HBV. aDNA extracts and non-template controls (NTCs) were added into PCR reactions in this location too, and were not subsequently opened. Positive control material was handled in laboratories in a physically separated building. Here, standard material, diluted to 5–50 copies per reaction, was added to duplicate PCR reactions along with additional NTCs.
Fourteen samples with sufficient sample material were selected for virus capture (DA27, DA29, DA45, DA51, DA85, DA89, DA119, DA195, DA222, RISE254, RISE386, RISE416, RISE568 and RISE556). The viral reference genomes for probes were selected as follows. The International Committee for Taxonomy of Viruses (ICTV) 2012 listed 2,618 viral species. As many had no associated reference genomes or merely partial sequence information, we selected 2,599 sequences of full-length viral genomes, available from GenBank (June 2014), representing viral species found in vertebrates excluding fish. Sequences shorter than 1,000 nucleotides were discarded. Sequences with identical length and organism identification were regarded as duplicates and thus reduced to one. For a number of specific viral taxa for which a large number of similar reference sequences are available, we manually selected representative genomes or genome segments (Supplementary Tables 1, 2). For example, among 72 available hepatitis C virus genome sequences, we selected one genome per subtype (subtypes 1a–1c, 1g; 2a–c, 2i, 2k; 3a, 3b, 3i, 3k; 4a–4d, 4f, 4g, 4k–4r, 4t; 5a; 6a–6u; 7a). Likewise, 12 HIV-1 genomes were selected to represent group M (subtypes A–D, F1, F2, H, J, K, N, O and P). For influenza A virus, we included only sequences from segment 7 and segment 5 that encode the conserved matrix proteins M1/M2 and the nucleocapsid protein NP, respectively. We selected 82 M1/M2 segments and 115 NP segments among the available segment sequences. All available segments were included from genomes belonging to Arenaviridae, Bunyavirales and Reoviridae. For members of Poxviridae for which full genomes were unavailable (skunkpox, raccoonpox and volepox viruses) sequences representing the conserved gene encoding the DNA-dependent RNA polymerase were included (n = 22). In addition, two partial genomes of squirrelpox virus were included.
By mistake, two and nine partial sequences were included from Iridoviridae (1.5–2.5 kb) and Coronaviridae (1.3–14.5 kb), respectively, already represented by full genomes. Likewise, sequences representing Merkel cell polyomavirus and KI polyomavirus were not included among the reference genomes used for probe design. SeqCap EZ hybridization probes were designed and synthesized by Roche NimbleGen based on the resulting reference sequences.
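The two automatic filters applied to the candidate reference genomes (a minimum length of 1,000 nucleotides, and removal of entries with identical length and organism identification) can be sketched as below. The record type, field names and function name are hypothetical, and the choice to keep the first of each set of duplicates is an assumption; the manual selection of representative genomes is not covered.

```python
from typing import Iterable, List, NamedTuple


class Reference(NamedTuple):
    accession: str
    organism: str
    sequence: str


def filter_references(records: Iterable[Reference],
                      min_length: int = 1000) -> List[Reference]:
    """Drop short sequences and collapse (organism, length) duplicates."""
    seen = set()
    kept = []
    for rec in records:
        length = len(rec.sequence)
        if length < min_length:
            continue  # sequences shorter than 1,000 nucleotides are discarded
        key = (rec.organism, length)
        if key in seen:
            continue  # same organism and same length: treated as a duplicate
        seen.add(key)
        kept.append(rec)  # the first record seen for each key is retained
    return kept
```

Using organism and length as the duplicate key is a coarse criterion: two distinct genomes of the same species that happen to have the same length would be collapsed into one.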
Capture was performed on double-indexed libraries prepared from aDNA, following the manufacturer’s protocol (version 4.3) with the following modifications. In brief, 1.8 to 2.2 μg of pooled libraries were hybridized at 47 °C for 65–70 h with low complexity C0T-1 DNA, specific P5/P7 adaptor-blocking oligonucleotides each containing a hexamer motif of inosine nucleotides to match individually indexed adapters, hybridization buffer containing 10% formamide, and the capture probes. Dynabeads M-270 (Invitrogen) were used to recover the hybridized library fragments. After washing and eluting the libraries, the post-capture PCR amplification was performed with KAPA Uracil+ polymerase (Kapa Biosystems). PCR cycling conditions were as follows: 1 cycle of 3 min at 95 °C, followed by 14 cycles of: 20 s denaturation at 98 °C, 15 s annealing at 65 °C and 30 s elongation at 72 °C, ending with 5 min at 72 °C. The amplified captured libraries were purified using AMPureXP beads (Agencourt).
Shotgun sequencing data were generated as previously described1. Sequencing of target-enriched libraries was performed on an Illumina HiSeq 2500 (80-bp single-read, V4 chemistry).
The resulting reads were compared to dataset 2 using BLASTn (with arguments -task blastn -evalue 0.01). Matching reads with bit scores greater than 50 for all samples (except DA222 (70) and DA45 (55)) were selected for subsequent processing. In total, 6,757 reads matched HBV in the capture data.
The following evidence leads us to believe that the ancient HBV sequences are authentic and that the possibility of contamination can be excluded.
(1) Standard precautions for working with aDNA were applied44.
(2) Sequences were checked for typical aDNA damage patterns using mapDamage45 (version 2.0.6). Whenever sufficient amounts of data were available (>200 HBV reads), we found C>T mutations at the 5′ end, typical of aDNA46 (see Extended Data Fig. 1a, c).
(3) Capture was performed on sample DA222 DNA extracts with and without pre-treatment by uracil-specific excision reagent (USER)47. After USER treatment (3 h at 37 °C) of the aDNA extract, the damage pattern is eliminated (Extended Data Fig. 1b).
(4) As the ancient viruses are from three different HBV genotypes (A, B and D) and a clade in sister relationship to chimpanzee and gorilla HBVs, any argument that samples were contaminated would have to account for this diversity as well as the sequence novelty.
(5) HBV sequences were identified in 25 of 304 analysed samples (Table 1), showing that the findings cannot be due to a ubiquitous laboratory contaminant.
(6) Despite the low frequency of positive samples, we sequenced extraction blanks to provide additional evidence against the possibility that the HBV sequences stemmed from sporadic incorporation, amplification and sequencing of background reagent contaminants into the aDNA libraries. The negative extraction controls were amplified for 40 PCR cycles, and BLAST was used to match the read sequences against dataset 3, with the same parameters used for the ancient samples. Because the ancient HBV-positive reads used to assemble genomes all had bit scores of at least 50 (see ‘Data and data processing’), we filtered the negative extraction control BLAST output for reads with a bit score ≥ 45. No reads (out of 23 million) matched any HBV genome at that level.
(7) HBV is a blood-borne virus that is mainly transmitted by exposure to infectious blood and that does not occur in the environment3, making contamination during archaeological excavation extremely unlikely.
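The damage signal referred to in point (2) is an excess of C>T mismatches near the 5′ end of reads. The sketch below only illustrates how such a per-position frequency can be counted; it is not the mapDamage implementation. It assumes ungapped read/reference pairs that are already oriented from the 5′ end of the read, and the function name is hypothetical.

```python
def c_to_t_frequency(aligned_pairs, n_positions=10):
    """Per-position C>T frequency over the first n_positions of each read.

    aligned_pairs is an iterable of (reference_segment, read) strings of
    equal orientation, without gaps. For each position, the result is the
    fraction of reference C bases that are read as T, or None when no
    reference C was observed at that position.
    """
    ref_c = [0] * n_positions
    c_to_t = [0] * n_positions
    for ref, read in aligned_pairs:
        for i in range(min(n_positions, len(ref), len(read))):
            if ref[i].upper() == "C":
                ref_c[i] += 1
                if read[i].upper() == "T":
                    c_to_t[i] += 1
    return [c_to_t[i] / ref_c[i] if ref_c[i] else None
            for i in range(n_positions)]
```

In authentic aDNA the frequency is expected to be highest at the first position and to decay along the read; after USER treatment, as in point (3), this elevation should largely disappear.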
Reads from the original sequencing and from the capture were aligned to a reference genome (Supplementary Table 3) in Geneious48 (version 9) using the settings 'medium sensitivity/fast' and 'iterate up to 5 times'. Because aDNA damage often clusters towards read termini46, the resulting alignments were carefully curated by hand to remove non-matching termini of reads if the majority of the read showed a very good match with the reference sequence.
All reads used to construct the ancient HBV consensus sequences were matched against the full NCBI nucleotide database (downloaded 28 December 2016) using BLAST. Of these reads, 97.5% had HBV as their top match. All ancient consensus sequences were matched against the full HBV genomes of dataset 4 with the Needleman–Wunsch algorithm49, as implemented in EMBOSS50 (version 6.6.0.0). For each ancient sequence, the percentage of sequence identity with the most similar representative of each modern genotype and four non-human primate species is listed in Extended Data Table 4a. The Needleman–Wunsch algorithm was also used to calculate the pairwise sequence similarity between all ancient sequences (Extended Data Table 4b).
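The percentage identities above come from global pairwise alignments. The following is a minimal sketch of the Needleman–Wunsch algorithm with a percent-identity calculation, intended only to illustrate the principle. It uses a simple linear gap penalty and arbitrary scores (assumptions made here), whereas the EMBOSS implementation uses a substitution matrix with separate gap-opening and gap-extension penalties, so values will not be identical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a, b):
    """Identical aligned positions as a percentage of alignment length."""
    x, y = needleman_wunsch(a, b)
    matches = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    return 100.0 * matches / len(x)
```

The dynamic-programming table is quadratic in sequence length, which remains manageable for HBV-sized genomes of a few thousand nucleotides.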
The recombination detection program51 version 4 (RDP4) was used to search for evidence of recombination within the 12 ancient sequences and a selection of 15 modern human and non-human primate sequences (Supplementary Methods). Recombination with HBV-RISE387 as the recombinant and HBV-DA51 as one parent was suggested at positions 1567–2256 by seven recombination methods (RDP52, GENECONV53, BootScan54, MaxChi55, Chimaera56, SiScan57 and 3Seq58) with P values from 1.179 × 10−6 to 5.336 × 10−11 (Extended Data Table 2). The same recombination was suggested for all four ancient genotype A and two modern genotype A sequences. Graphical evidence of the recombination and the predicted break point distribution for sequences HBV-RISE386 and HBV-RISE387 from three methods (MaxChi, Bootscan and RDP) is shown in Extended Data Fig. 3.
Initial maximum likelihood phylogenies
An initial maximum likelihood tree was generated to ascertain whether the ancient sequences fall within the primate HBV clades. Dataset 1 and the ancient sequences were aligned in MAFFT59 (version 7). The maximum likelihood tree was constructed using PhyML60 (version 20160116), optimizing topology, branch lengths and rates. We used a general time reversible (GTR) substitution model, with base frequencies determined by maximum likelihood, and a maximum likelihood-estimated proportion of invariant sites and 100 bootstraps (Extended Data Fig. 2). Furthermore, a maximum likelihood tree (Extended Data Fig. 4) was generated based on a MAFFT alignment of dataset 2 and the ancient sequences, using the same parameters as above.
Dated coalescent phylogenies
To check for a temporal signal in the data, root-to-tip regressions and date randomization tests were performed. For the root-to-tip regression, input trees were calculated using dataset 2 with the addition of a woolly monkey sequence (GenBank accession number AF046996) as an outgroup. Three phylogenetic algorithms were used: neighbour joining, maximum likelihood (PhyML) and Bayesian (MrBayes61 (version 3.2.5)) methods (Supplementary Figs. 1–3). Root-to-tip distances were extracted using TempEst62 (version 1.5). For maximum likelihood and Bayesian methods, root-to-tip distances (in substitutions per site) were extracted from optimized tree topologies (maximum likelihood and maximum clade credibility trees, respectively). For the neighbour joining method, root-to-tip distances were averaged over 1,000 bootstrap replicates. Regression analyses were performed with Scipy (version 0.16.0; http://www.scipy.org). For the date randomization tests, we used three different approaches to randomize tip dates. First, tip dates were randomized between all sequences in the phylogeny. Second, tip dates were randomized only among the ancient sequences presented in this Letter, as well as the Korean mummy sequence (GenBank accession number JN315779), while the modern sequences retained their correct ages. Third, dates were randomized within a clade. For each of the three approaches, we performed three independent randomizations. This resulted in a total of nine analyses, which were run for 100,000,000 generations each, under the relaxed log-normal clock model and coalescent exponential tree prior. We also ran the same analyses under a strict clock and coalescent Bayesian skyline tree prior, which were run for 20,000,000 generations. We used a GTR substitution model with unequal base frequencies, four gamma rate categories, estimated gamma distribution of rate variation and estimated proportion of invariant sites, as found by bModelTest63 (version 1.0.4).
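The root-to-tip regression described above can be sketched as an ordinary least-squares fit of root-to-tip distance against sampling time, in which the slope estimates the substitution rate. The sketch below is written in plain Python rather than with Scipy so that it is self-contained; the function name and the example ages and distances are hypothetical and are not taken from the analysed trees.

```python
import math


def root_to_tip_regression(ages_bp, distances):
    """Least-squares regression of root-to-tip distance on sampling time.

    ages_bp   : sample ages in years before present (0 for modern samples)
    distances : root-to-tip distances in substitutions per site
    Returns (slope, intercept, r), where the slope is in substitutions per
    site per year and r is the Pearson correlation coefficient.
    """
    # Older samples were collected earlier, so sampling time is minus the age.
    times = [-age for age in ages_bp]
    n = len(times)
    mean_t = sum(times) / n
    mean_d = sum(distances) / n
    s_tt = sum((t - mean_t) ** 2 for t in times)
    s_dd = sum((d - mean_d) ** 2 for d in distances)
    s_td = sum((t - mean_t) * (d - mean_d) for t, d in zip(times, distances))
    slope = s_td / s_tt
    intercept = mean_d - slope * mean_t
    r = s_td / math.sqrt(s_tt * s_dd)
    return slope, intercept, r


# Hypothetical, perfectly clock-like data: 1e-5 substitutions/site/year.
ages = [4000, 2000, 1000, 0]
dists = [0.01, 0.03, 0.04, 0.05]
slope, intercept, r = root_to_tip_regression(ages, dists)
```

A positive slope with a clearly non-zero correlation is the pattern expected when older sequences lie closer to the root, which is what a temporal signal amounts to in this test.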
None of the analyses using the relaxed clock converged (effective sample size < 200). This is most probably because the mis-specification of the dates leads to incongruence between the sequence and time information. Under the strict clock model, all runs converged, and none of the 95% HPD intervals of the root age overlapped between the randomized and the non-randomized runs, fulfilling the criteria for evidence of a temporal signal64.
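The three randomization schemes and the non-overlap criterion can be sketched as follows. This is an illustrative outline only: the helper names, clade labels and tip dates are hypothetical, and the BEAST2 runs that produce the HPD intervals are not reproduced here.

```python
import random


def shuffle_within(dates, indices, rng):
    """Return a copy of dates with the values at `indices` permuted."""
    indices = list(indices)
    values = [dates[i] for i in indices]
    rng.shuffle(values)
    shuffled = list(dates)
    for i, value in zip(indices, values):
        shuffled[i] = value
    return shuffled


def randomize_all(dates, rng):
    """Scheme 1: permute tip dates among all sequences."""
    return shuffle_within(dates, range(len(dates)), rng)


def randomize_ancient(dates, is_ancient, rng):
    """Scheme 2: permute dates among ancient tips; modern tips keep theirs."""
    ancient = [i for i, flag in enumerate(is_ancient) if flag]
    return shuffle_within(dates, ancient, rng)


def randomize_within_clades(dates, clades, rng):
    """Scheme 3: permute dates separately within each clade."""
    shuffled = list(dates)
    for clade in set(clades):
        members = [i for i, c in enumerate(clades) if c == clade]
        shuffled = shuffle_within(shuffled, members, rng)
    return shuffled


def intervals_overlap(a, b):
    """True if closed intervals a = (low, high) and b = (low, high) overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


rng = random.Random(1)
tip_ages = [0, 0, 0, 1500, 3000, 4500]           # hypothetical ages (years BP)
ancient_flags = [False, False, False, True, True, True]
clade_labels = ["A", "A", "D", "A", "D", "D"]    # hypothetical clades

ancient_only = randomize_ancient(tip_ages, ancient_flags, rng)
per_clade = randomize_within_clades(tip_ages, clade_labels, rng)
```

Each randomized set of dates would then be analysed in the same way as the correctly dated data, and a temporal signal is supported when `intervals_overlap` is false for every comparison between a randomized and the non-randomized 95% HPD interval.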
Dated phylogenies were estimated using BEAST225 (version 2.4.4, prerelease). We used a MAFFT alignment of dataset 2. Using bModelTest63, we selected a GTR substitution model with unequal base frequencies, four gamma rate categories, estimated gamma distribution of rate variation and estimated proportion of invariant sites. Proper priors were used throughout. Path sampling, as implemented in BEAST2, was performed to select between a strict or relaxed log-normal clock and a coalescent constant, exponential or coalescent Bayesian skyline tree prior (Extended Data Table 3a). Marginal likelihood estimates were compared using a Bayes factor test. A Bayes factor in the range of 3–20 implies positive support, 20–150 strong support and > 150 overwhelming support65. The relaxed log-normal clock model in combination with a coalescent exponential tree prior was favoured (Extended Data Table 3a). For the final tree, a Markov chain Monte Carlo analysis was run until parameters reached an effective sample size > 200, sampling every 2,000 generations. Convergence and mixing were assessed using Tracer66 (version 1.6). The final tree files were subsampled to contain 10,000 or 10,710 (for the relaxed log-normal clock, coalescent exponential tree prior) trees, with the first 25% of samples discarded as burn-in. Maximum clade credibility trees were made using TreeAnnotator25 (version 2.4.4, prerelease).
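The Bayes factor comparison can be illustrated with a short sketch. Path sampling yields a log marginal likelihood for each model, and the Bayes factor is the exponential of the difference between two such values. The function name, the label used below the 'positive' range and the example log marginal likelihoods are assumptions made for illustration; the thresholds are those given in the text.

```python
import math


def bayes_factor_support(log_ml_a, log_ml_b):
    """Classify the support for model A over model B.

    log_ml_a, log_ml_b : log marginal likelihoods of the two models.
    The comparison is done on the log scale to avoid overflow when the
    difference between the two values is large.
    """
    log_bf = log_ml_a - log_ml_b  # log of the Bayes factor (A over B)
    if log_bf > math.log(150):
        return "overwhelming"
    if log_bf >= math.log(20):
        return "strong"
    if log_bf >= math.log(3):
        return "positive"
    return "below the positive range"


# Hypothetical log marginal likelihoods.
support_small = bayes_factor_support(-100.0, -102.0)   # BF = e^2, about 7.4
support_medium = bayes_factor_support(-100.0, -104.0)  # BF = e^4, about 54.6
support_large = bayes_factor_support(-100.0, -106.0)   # BF = e^6, about 403
```

Because the thresholds are applied to the Bayes factor itself, a difference of only about 5 log-likelihood units (ln 150 ≈ 5.01) already falls in the highest category.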
To formally test the ‘Out of Africa’ hypothesis of HBV evolution, calibration points were tested using path sampling as implemented in BEAST2. Calibration points were constrained as follows. For the split of genotypes F and H, the most-recent common ancestor (MRCA) of all genotype F and H sequences was constrained using a uniform (13,400:25,000) distribution, as this is the range of estimates for when the Americas were first colonized67,68. For the split of subgenotype A3 in Haiti, the MRCA of FJ692598 and FJ692611 was constrained using a uniform (200:500) distribution, owing to the timing of the slave trade to Haiti69. For the split of C3 in Polynesia, the MRCA of X75656 and X75665 was constrained using a uniform (5,100:12,000) distribution, owing to the range of estimates for the MRCA of Polynesian populations11,70. Calibration points were tested under both a relaxed log-normal clock, coalescent exponential tree prior, and a strict clock, Bayesian skyline tree prior.
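A uniform calibration of this kind assigns equal prior density to every MRCA age between a lower and an upper bound, and zero density outside them. The following minimal sketch shows this for the first two calibrations listed above; the class and attribute names are hypothetical, and this is not how the constraints are specified in a BEAST2 input file.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class UniformCalibration:
    """Uniform prior on the age (in years) of a most-recent common ancestor."""
    label: str
    lower: float
    upper: float

    def __post_init__(self):
        # A uniform distribution requires lower < upper.
        if not self.lower < self.upper:
            raise ValueError(f"{self.label}: lower bound must be below upper bound")

    def log_density(self, age):
        """Log prior density of a proposed MRCA age."""
        if self.lower <= age <= self.upper:
            return -math.log(self.upper - self.lower)
        return float("-inf")  # ages outside the bounds are excluded


# Bounds as given in the text for two of the calibration points.
genotype_fh = UniformCalibration("F/H split", 13_400, 25_000)
a3_haiti = UniformCalibration("A3 in Haiti", 200, 500)

inside = genotype_fh.log_density(20_000)
outside = genotype_fh.log_density(10_000)
```

Under such a prior, any tree in which the constrained MRCA falls outside the bounds has zero prior probability, so the fit of the calibrated model reflects whether the sequence data and tip dates are compatible with the assumed age range.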
Further information on experimental design is available in the Nature Research Reporting Summary linked to this paper.
The complete sequences in this study have been deposited in the European Nucleotide Archive under sample accession numbers ERS2295383–ERS2295394. All other data are available from the corresponding author upon reasonable request.
Allentoft, M. E. et al. Population genomics of Bronze Age Eurasia. Nature 522, 167–172 (2015).
Damgaard, P. d. B. et al. 137 ancient human genomes from across the Eurasian steppes. Nature https://doi.org/10.1038/s41586-018-0094-2 (2018).
Lai, C. L., Ratziu, V., Yuen, M.-F. & Poynard, T. Viral hepatitis B. Lancet 362, 2089–2094 (2003).
Schweitzer, A., Horn, J., Mikolajczyk, R. T., Krause, G. & Ott, J. J. Estimations of worldwide prevalence of chronic hepatitis B virus infection: a systematic review of data published between 1965 and 2013. Lancet 386, 1546–1555 (2015).
Murhekar, M. V., Murhekar, K. M. & Sehgal, S. C. Epidemiology of hepatitis B virus infection among the tribes of Andaman and Nicobar Islands, India. Trans. R. Soc. Trop. Med. Hyg. 102, 729–734 (2008).
Locarnini, S., Littlejohn, M., Aziz, M. N. & Yuen, L. Possible origins and evolution of the hepatitis B virus (HBV). Semin. Cancer Biol. 23, 561–575 (2013).
Littlejohn, M., Locarnini, S. & Yuen, L. Origins and evolution of hepatitis B virus and hepatitis D virus. Cold Spring Harb. Perspect. Med. 6, a021360 (2016).
Kramvis, A. Genotypes and genetic variability of hepatitis B virus. Intervirology 57, 141–150 (2014).
Hannoun, C., Horal, P. & Lindh, M. Long-term mutation rates in the hepatitis B virus genome. J. Gen. Virol. 81, 75–83 (2000).
Zhou, Y. & Holmes, E. C. Bayesian estimates of the evolutionary rate and age of hepatitis B virus. J. Mol. Evol. 65, 197–205 (2007).
Paraskevis, D. et al. Dating the origin of hepatitis B virus reveals higher substitution rate and adaptation on the branch leading to F/H genotypes. Mol. Phylogenet. Evol. 93, 44–54 (2015).
Zehender, G. et al. Enigmatic origin of hepatitis B virus: an ancient travelling companion or a recent encounter? World J. Gastroenterol. 20, 7622–7634 (2014).
Kramvis, A. et al. Relationship of serological subtype, basic core promoter and precore mutations to genotypes/subgenotypes of hepatitis B virus. J. Med. Virol. 80, 27–46 (2008).
MacDonald, D. M., Holmes, E. C., Lewis, J. C. & Simmonds, P. Detection of hepatitis B virus infection in wild-born chimpanzees (Pan troglodytes verus): phylogenetic relationships with human and other primate genotypes. J. Virol. 74, 4253–4257 (2000).
Nielsen, R. et al. Tracing the peopling of the world through genomics. Nature 541, 302–310 (2017).
Rasmussen, S. et al. Early divergent strains of Yersinia pestis in Eurasia 5,000 years ago. Cell 163, 571–582 (2015).
Feldman, M. et al. A high-coverage Yersinia pestis genome from a sixth-century Justinianic plague victim. Mol. Biol. Evol. 33, 2911–2923 (2016).
Reid, A. H., Fanning, T. G., Hultin, J. V. & Taubenberger, J. K. Origin and evolution of the 1918 “Spanish” influenza virus hemagglutinin gene. Proc. Natl Acad. Sci. USA 96, 1651–1656 (1999).
Duggan, A. T. et al. 17th century variola virus reveals the recent history of smallpox. Curr. Biol. 26, 3407–3412 (2016).
Kahila Bar-Gal, G. et al. Tracing hepatitis B virus to the 16th century in a Korean mummy. Hepatology 56, 1671–1680 (2012).
Patterson Ross, Z. et al. The paradox of HBV evolution as revealed from a 16th century mummy. PLoS Pathog. 14, e1006750 (2018).
Bond, W. W. et al. Survival of hepatitis B virus after drying and storage for one week. Lancet 317, 550–551 (1981).
Rasmussen, M. et al. Ancient human genome sequence of an extinct Palaeo-Eskimo. Nature 463, 757–762 (2010).
Simmonds, P. & Midgley, S. Recombination in the genesis and evolution of hepatitis B virus genotypes. J. Virol. 79, 15467–15476 (2005).
Bouckaert, R. et al. BEAST 2: a software platform for Bayesian evolutionary analysis. PLOS Comput. Biol. 10, e1003537 (2014).
Simmonds, P. Reconstructing the origins of human hepatitis viruses. Phil. Trans. R. Soc. Lond. B 356, 1013–1026 (2001).
Tedder, R. S., Bissett, S. L., Myers, R. & Ijaz, S. The ‘Red Queen’ dilemma—running to stay in the same place: reflections on the evolutionary vector of HBV in humans. Antivir. Ther. 18, 489–496 (2013).
Duchêne, S., Holmes, E. C. & Ho, S. Y. W. Analyses of evolutionary dynamics in viruses are hindered by a time-dependent bias in rate estimates. Proc. R. Soc. Lond. B 281, 20140732 (2014).
Zehender, G. et al. Reliable timescale inference of HBV genotype A origin and phylodynamics. Infect. Genet. Evol. 32, 361–369 (2015).
Hannoun, C., Söderström, A., Norkrans, G. & Lindh, M. Phylogeny of African complete genomes reveals a West African genotype A subtype of hepatitis B virus and relatedness between Somali and Asian A1 sequences. J. Gen. Virol. 86, 2163–2167 (2005).
Pickrell, J. K. et al. Ancient west Eurasian ancestry in southern and eastern Africa. Proc. Natl Acad. Sci. USA 111, 2632–2637 (2014).
Ghosh, S. et al. Unique hepatitis B virus subgenotype in a primitive tribal community in eastern India. J. Clin. Microbiol. 48, 4063–4071 (2010).
Basu, A., Sarkar-Roy, N. & Majumder, P. P. Genomic reconstruction of the history of extant populations of India reveals five distinct ancestral components and a complex structure. Proc. Natl Acad. Sci. USA 113, 1594–1599 (2016).
Drexler, J. F. et al. Bats carry pathogenic hepadnaviruses antigenically related to hepatitis B virus and capable of infecting human hepatocytes. Proc. Natl Acad. Sci. USA 110, 16151–16156 (2013).
Geer, L. Y. et al. The NCBI BioSystems database. Nucleic Acids Res. 38, D492–D496 (2010).
Bell, T. G., Yousif, M. & Kramvis, A. Bioinformatic curation and alignment of genotyped hepatitis B virus (HBV) sequence data from the GenBank public database. Springerplus 5, 1896 (2016).
Bronk Ramsey, C. Bayesian analysis of radiocarbon dates. Radiocarbon 51, 337–360 (2009).
Reimer, P. J. et al. IntCal13 and Marine13 radiocarbon age calibration curves 0–50,000 years cal bp. Radiocarbon 55, 1869–1887 (2013).
Lindgreen, S. AdapterRemoval: easy cleaning of next-generation sequencing reads. BMC Res. Notes 5, 337 (2012).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2015).
Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinformatics 10, 421 (2009).
Drosten, C., Weber, M., Seifried, E. & Roth, W. K. Evaluation of a new PCR assay with competitive internal control sequence for blood donor screening. Transfusion 40, 718–724 (2000).
Willerslev, E. & Cooper, A. Review Paper. Ancient DNA. Proc. R. Soc. Lond. B 272, 3–16 (2005).
Jónsson, H., Ginolhac, A., Schubert, M., Johnson, P. L. F. & Orlando, L. mapDamage2.0: fast approximate Bayesian estimates of ancient DNA damage parameters. Bioinformatics 29, 1682–1684 (2013).
Orlando, L., Gilbert, M. T. P. & Willerslev, E. Reconstructing ancient genomes and epigenomes. Nat. Rev. Genet. 16, 395–408 (2015).
Briggs, A. W. et al. Removal of deaminated cytosines and detection of in vivo methylation in ancient DNA. Nucleic Acids Res. 38, e87 (2010).
Kearse, M. et al. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28, 1647–1649 (2012).
Needleman, S. B. & Wunsch, C. D. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443–453 (1970).
Rice, P., Longden, I. & Bleasby, A. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 16, 276–277 (2000).
Martin, D. P., Murrell, B., Golden, M., Khoosal, A. & Muhire, B. RDP4: detection and analysis of recombination patterns in virus genomes. Virus Evol. 1, vev003 (2015).
Martin, D. & Rybicki, E. RDP: detection of recombination amongst aligned sequences. Bioinformatics 16, 562–563 (2000).
Padidam, M., Sawyer, S. & Fauquet, C. M. Possible emergence of new geminiviruses by frequent recombination. Virology 265, 218–225 (1999).
Martin, D. P., Posada, D., Crandall, K. A. & Williamson, C. A modified bootscan algorithm for automated identification of recombinant sequences and recombination breakpoints. AIDS Res. Hum. Retroviruses 21, 98–102 (2005).
Smith, J. M. Analyzing the mosaic structure of genes. J. Mol. Evol. 34, 126–129 (1992).
Posada, D. & Crandall, K. A. Evaluation of methods for detecting recombination from DNA sequences: computer simulations. Proc. Natl Acad. Sci. USA 98, 13757–13762 (2001).
Gibbs, M. J., Armstrong, J. S. & Gibbs, A. J. Sister-scanning: a Monte Carlo procedure for assessing signals in recombinant sequences. Bioinformatics 16, 573–582 (2000).
Boni, M. F., Posada, D. & Feldman, M. W. An exact nonparametric method for inferring mosaic structure in sequence triplets. Genetics 176, 1035–1047 (2007).
Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).
Guindon, S. et al. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59, 307–321 (2010).
Ronquist, F. & Huelsenbeck, J. P. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19, 1572–1574 (2003).
Rambaut, A., Lam, T. T., Max Carvalho, L. & Pybus, O. G. Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen). Virus Evol. 2, vew007 (2016).
Bouckaert, R. R. & Drummond, A. J. bModelTest: Bayesian phylogenetic site model averaging and model comparison. BMC Evol. Biol. 17, 42 (2017).
Duchêne, S., Duchêne, D., Holmes, E. C. & Ho, S. Y. W. The performance of the date-randomization test in phylogenetic analyses of time-structured virus data. Mol. Biol. Evol. 32, 1895–1906 (2015).
Kass, R. E. & Raftery, A. E. Bayes Factors. J. Am. Stat. Assoc. 90, 773–795 (1995).
Rambaut, A., Suchard, M. A., Xie, D. & Drummond, A. J. Tracer v1.6. https://github.com/beast-dev/tracer/releases/tag/v1.6 (2017).
Sanchez, G. et al. Human (Clovis)–gomphothere (Cuvieronius sp.) association ∼ 13,390 calibrated yBP in Sonora, Mexico. Proc. Natl Acad. Sci. USA 111, 10972–10977 (2014).
Bourgeon, L., Burke, A. & Higham, T. Earliest human presence in North America dated to the Last Glacial Maximum: new radiocarbon dates from Bluefish Caves, Canada. PLoS ONE 12, e0169486 (2017).
Andernach, I. E., Nolte, C., Pape, J. W. & Muller, C. P. Slave trade and hepatitis B virus genotypes and subgenotypes in Haiti and Africa. Emerg. Infect. Dis. 15, 1222–1228 (2009).
Kayser, M. et al. Melanesian and Asian origins of Polynesians: mtDNA and Y chromosome gradients across the Pacific. Mol. Biol. Evol. 23, 2234–2244 (2006).
B.B. thanks D. Tserendulam for help, wisdom and guidance. E.W. thanks St John’s College, Cambridge for facilitating scientific discussion. We thank S. Rankin and the staff of the University of Cambridge High Performance Computing service and the National High-throughput Sequencing Centre (Copenhagen). This work was supported by: The Danish National Research Foundation, The Danish National Advanced Technology Foundation (The Genome Denmark platform, grant 019-2011-2), The Villum Kann Rasmussen Foundation, KU2016, European Union FP7 programme ANTIGONE (grant agreement No. 278976), European Union Horizon 2020 research and innovation programmes, COMPARE (grant agreement No. 643476), VIROGENESIS (grant agreement No. 634650) and the Lundbeck Foundation. The National Reference Center for Hepatitis B and D Viruses is supported by the German Ministry of Health via the Robert Koch Institute (Berlin). B.B. was supported by Taylor Family-Asia Foundation Endowed Chair in Ecology and Conservation Biology. A.D.M.E.O. was supported by N-RENNT of the Ministry of Science and Culture of Lower Saxony, Germany.
Nature thanks P. Simmonds, B. Shapiro, C. Pepperell and the other anonymous reviewer(s) for their contribution to the peer review of this work.
The authors declare no competing interests.
Extended data figures and tables
The frequencies of the mismatches observed between the HBV reference sequences (Supplementary Table 3) and the reads are shown as a function of distance from the 5′ end. C > T (5′) and G > A (3′) mutations are shown in red and blue, respectively. All other possible mismatches are shown in grey. Insertions are shown in purple, deletions in green and clippings in orange. The count of reads matching HBV for each sample is shown in parentheses. a, Damage patterns for RISE563, DA222, DA119, RISE254, DA195, DA27, DA51, RISE386, RISE387, DA29, DA45, RISE416 and RISE154. b, Damage patterns for DA222 without (left) and with (right) USER treatment. c, Damage patterns with 10, 20, 50, 100, 200, 500 and 1,000 reads sampled from RISE563, in which each opaque line corresponds to one replicate set of reads.
This figure shows 26 Orthohepadnaviridae sequences (dataset 1, see Methods), including the ancient HBV sequences. Ancient genotype A sequences are shown in red, the ancient genotype B sequence in orange, ancient genotype D sequences in blue and novel genotype sequences in green. The tree was constructed in PhyML60, optimizing for topology, branch lengths and rates, with 100 bootstraps (see Methods). Internal nodes with < 70% bootstrap support are shown as polytomies.
RDP451 was used to analyse the set of 12 ancient sequences plus a representative set of 15 modern human and non-human primate sequences (see Methods). The seven recombination programs used by RDP4 suggested that all genotype A sequences are recombinants, with the genotype D sequence HBV-DA51 as the minor parent and an unknown major parent. The obvious interpretation is that recombination formed an ancestor of the oldest sequences, evidence of which is still present in the less-ancient and the modern representatives. The figure shows the graphical evidence and predicted recombination break-point distribution for the two oldest genotype A sequences, HBV-RISE386 and HBV-RISE387, according to three of the RDP4 methods (MaxChi, Bootscan and RDP). In all subplots, the predicted location of the break points is shown as a dashed vertical line and the surrounding grey area shows the 99% confidence interval for the break point. Subplots on the same row share their y axis and those in the same column share their x axis. a, HBV-RISE386 analysed by MaxChi. b, HBV-RISE386 analysed by Bootscan. c, HBV-RISE386 analysed by RDP. d, HBV-RISE387 analysed by MaxChi. e, HBV-RISE387 analysed by Bootscan. f, HBV-RISE387 analysed by RDP.
The sequences from dataset 2 (see Methods) and the ancient sequences were aligned in MAFFT59. The tree was constructed in PhyML60, optimizing for topology, branch lengths and rates, with 100 bootstraps (see Methods). Internal nodes with < 70% bootstrap support are shown as polytomies. Ancient genotype A sequences are shown in red, ancient genotype B sequences in orange, ancient genotype D sequences in blue and novel genotype sequences in green. Taxon names indicate: genotype or subgenotype, GenBank accession number, age, abbreviation of country of sequence origin, region of sequence origin, host species and optional additional remarks. Note that the maximum likelihood tree shows topological uncertainty (polytomies) in areas where the BEAST225 tree (Fig. 2) is well resolved. This is the case for two reasons. First, BEAST2 always produces a fully resolved binary topology without polytomies. Second, and more important, BEAST2 creates a time tree and uses tip dates to constrain the possible topologies under consideration. Thus, BEAST2 can know that certain topologies are unlikely or impossible, whereas maximum likelihood cannot and thus inherently has greater uncertainty regarding tree topology.
a, Regression of root-to-tip distances and ages performed in Scipy (http://www.scipy.org). One hundred and twenty-four branch lengths were extracted using TempEst62 from trees inferred using neighbour joining, maximum likelihood and Bayesian methods. Shaded areas show 95% confidence intervals. Slopes are 1.01 × 10−5, 1.20 × 10−5 and 4.21 × 10−6, and correlation coefficients are 0.45 (R2 = 0.2), 0.36 (R2 = 0.13) and 0.51 (R2 = 0.26), for maximum likelihood, Bayesian and neighbour joining trees, respectively. b, Date randomization tests under the strict clock model. The median and 95% HPD interval for the substitution rates are given. The rate for the correctly dated tree is shown in red. Dates were randomized within all sequences, within the ancient sequences only, and within each genotype. We performed three replicates of each. None of the 95% HPD intervals for the randomized runs overlaps with the 95% HPD intervals for the correctly dated runs, suggesting the presence of a temporal signal in the data.
This file is in PDF format and contains: Three Supplementary Tables: SI Tables 1 and 2 describe the number of reference genomes and accession numbers of sequences used to design capture probes. SI Table 3 contains additional information for the HBV positive samples. A Supplementary Methods section, showing: 1) An investigation into the dependence of damage patterns on the number of reads, 2) Lists of accession numbers for sequences included in the different analyses, and 3) The three phylogenetic trees used for the regression analysis, inferred using neighbour joining, maximum likelihood and Bayesian methods.
About this article
Cite this article
Mühlemann, B., Jones, T.C., Damgaard, P. et al. Ancient hepatitis B viruses from the Bronze Age to the Medieval period. Nature 557, 418–423 (2018). https://doi.org/10.1038/s41586-018-0097-z