Introduction

The Scandinavian Peninsula (hereafter referred to as Scandinavia) gradually became accessible to humans when the Weichsel ice sheet melted away after the late glacial maximum (LGM) peaked between 27,000 and 21,000 years before present (BP). There is material evidence for sporadic human presence in ice-free areas around 16,000 BP, but from around 12,800 BP there was a proper expansion into Scandinavia1, and from 11,700 BP there was a continuous human presence in the southwestern part of the Peninsula2,3,4. However, the earliest human DNA from the Scandinavian Peninsula is only slightly younger than 10,000 years old5.

The early Mesolithic sites exhibit a rich material culture (lithics, bone and antler) that has been used to infer demography, mobility, social relations, use of technology, and the subsistence strategies6. While this has provided an understanding of the Early Mesolithic people in Scandinavia, we still don’t have enough molecular data to contextualise their oral microbiome profile, pathogen burden, diet related information, and paleoenvironmental utilisation.

Recently, metagenomic analysis of ancient DNA reads sequenced from ancient human samples have enabled researchers to obtain information about the historical human populations with a great depth. Ancient dental calculus has become the predominately used material to investigate oral microbiome, due to its structural integrity7. Lately, chewed pitch materials are also proved to be useful to access historical oral microbiome8.

Ancient dental calculus studies provide information about the oral microbial species in dental biofilm formation and maturation, and its association with dental health9,10,11. Moreover, changes in the oral microbiome through archaeological time periods can also be investigated. For example, the transition from hunter-gatherer societies to farmers brought a significant increase in human oral health pathologies, which are associated with dietary and lifestyle changes12. Several ancient DNA studies had investigated changes in oral microbiome composition from the Mesolithic period to modern day and their relation to dietary changes and population movements13,14,15,16, as well as pre and post Columbian contact17. It is also possible to identify dietary remains and DNA reads related to the paleoenvironment through ancient DNA and microscopic studies18,19,20.

Another promising source of ancient microbiome DNA investigated here are masticates, or pitch pieces that have been chewed by humans and therefore contain materials from the oral cavity. Previous studies had investigated the oral microbiome and dietary components from 5,700 years old chewed pitch from Denmark and successfully proved the potential usage in ancient DNA research domain5,8.

To better understand the Mesolithic community in Huseby Klev, we provide a metagenomic analysis of the pitch mastic pieces that are more than 9,500 years old, and that already have yielded information on Mesolithic demography5. This DNA represents the oldest known genomic and metagenomic material to date from humans and human activities in Scandinavia.

Our analysis shows the oral microbiome profile in three chewed pitch pieces along with high coverage genomes of human commensal oral microbes. By comparing our data with modern oral microbiomes, we demonstrate the increased abundance of oral microbes that could cause pathological conditions such as periodontitis. Moreover, we provide DNA evidence on dietary components and possible paleoenvironmental eukaryotes that represents the Huseby Klev archaeological site21.

Results

A snapshot of the Mesolithic oral microbiome

To have a broad overview of the metagenomic composition in our ancient pitch materials, we first compared the metagenomic profile of the samples (ble004, ble007, and ble008) against the taxonomic composition reported in the modern human oral cavity (N  = 725), modern human nasal cavity (N  = 93), modern human skin (N  = 42), modern human stool (N  = 500), modern human vagina (N  = 86), ancient human dental calculus (N = 395), and chewed pitch (N = 1) samples. The last sample is also chewed pitch but from a later period (5,858–5,661 cal. BP)8,9,10,11,14,15,16,17,19,22,23,24,25,26,27,28,29,30,31.

Phylum profiles of the ancient samples show a pattern closely related to oral microbiome samples. Also, the phylum-level profile of the pitch samples closely resembles the Syltholm pitch profile (Figure S1). In addition, the species' relative abundances show a remarkable presence of commensal human bacteria (Figure S2).

Subsequently, we calculated Bray–Curtis distances using species taxonomic composition to compare microbial diversity between the samples32 and visualised the distance matrix with the non-Metric Multidimensional Scaling (NMDS) method. The NMDS plot and Bray–Curtis distances show that microbial compositions in the ancient pitch material are closer to oral samples than to other parts of the human body. The ancient ble pitch samples cluster with modern oral microbiome samples in the NMDS plot along with the Syltholm chewed birch pitch (Fig. 1, Figure S3). In addition to these analyses, we quantified the potential oral microbiome contribution to the ancient mastic samples by using the same external dataset along with environmental sediment samples and we observed 39—57% of oral microbiome and ancient dental calculus contribution to ble chewed pitch pieces (Figure S4) 33,34,35,36.

Figure 1
figure 1

Non-metric multidimensional projection of the Bray–Curtis distance matrix based using the MetaPhlAn3 species abundance matrix. In this figure, we used in modern human oral cavity (N  = 725), modern human nasal cavity (N = 93), modern human skin (N = 42), modern human stool (N  = 500), modern human vagina (N  = 86), ancient human dental calculus (N  = 395), and chewed pitch (N  = 1) to characterise the microbiome composition in the ancient chewed materials. We can see that ancient chewed pitch materials are similar to the oral samples.

These results strongly suggest that the microbial profile of the pitch material originates from the oral cavity, corroborating that the mastic pieces were orally processed.

Validation of bacterial species presence

The taxonomic profiling mapped DNA reads in our libraries to 281 bacterial species (Table S1). To validate the presence of these bacteria in our ancient chewed pitch material, we aligned all reads to their corresponding reference genome. A homogeneous coverage across the genomes distinguishes bacterial presence from false positives.

We found 51 individual bacterial genomes with a minimum of 1 × depth of coverage and 40% breadth of coverage (Table 1). Most of these genomes correspond to pathogenic or opportunistic oral bacterial species. For example, we identified sequences from Porphyromonas gingivalis and Treponema denticola that belong to the red complex (a group of species that are found predominantly in periodontitis cases), Aggregatibacter actinomycetemcomitans (predominantly in sample ble004), and Streptococcus mutans. Furthermore, in all samples and predominantly in sample ble004 we note the presence of Haemophilus influenzae, Aggregatibacter actinomycetemcomitans, Cardiobacterium valvarum, Eikenella corrodens, and Kingella denitrificans.

Table 1 Genome reconstruction of ancient pathogens. Single stranded deamination rates were low for ble004 samples since damage-repaired libraries were included in the analysis. However, higher deamination rates observed in non-damage repaired ble004 DNA libraries confirm their authenticity. DOC: Mean depth of coverage, BOC: Breadth of coverage, δS: Single stranded deamination rate, SD: Standard deviation for delta S.

Authentication of ancient oral bacteria

We used three established criteria to authenticate the antiquity of the reads37,38: edit distance distribution, length distribution, and deamination patterns. It is expected that most of the sequences have edit distance at most 1. Also, we expect the majority of the sequences to be shorter than 100 nucleotides, and they should have deamination patterns towards the ends (Table S1, Figure S5). Edit distance describes the number of substitutions between a reference genome and aDNA reads thus providing an estimate for the evolutionary distance.

Table 1 shows the summary statistics after the genome alignment steps with at least 1 × depth of coverage. Note that several libraries of the ble004 sample were damage-repaired for human population genetics analysis, and thus shows a lower deamination rate. However, the non-damage repaired libraries of ble004 show the authentic deamination patterns.

De novo assembly of ancient bacteria

The metagenomic profile indicates presence of ancient oral bacteria, several of them likely opportunistic pathogens. Next, we reconstructed large portions of the ancient genomes by assembling de novo to further authenticate our data (Table S2). This strategy enabled us to assemble contigs that could be associated to a total of 49 unique bacterial species in our data.

Differentially abundant oral bacteria in ancient pitch materials

The types of bacterial species present in ancient chewing gum appears to be useful to explore the oral microbiome. This is in concordance with recent studies where some 15% of the salivary microbiome seems to get encapsulated in modern chewing gums39. Thus, the microbiome profiles from the pitch material should make it possible to identify the signs of dysbiosis in the microbiomes of the ancient individuals.

Using modern periodontitis, caries, and healthy salivary datasets40,41, we explored the presence of differentially abundant microbes that are specific to dysbiosis conditions. The analysis indicated 27 differentially abundant bacteria that could be used as markers for periodontitis and caries conditions (Fig. 2, Table S3). We found increased abundances of Actinomyces species along with Treponema denticola, Streptococcus anginosus, Slackia exigua, and Fusobacterium nucleatum, which are common in modern periodontitis cases. Concerning dental caries, we found increased abundance of Streptococcus sobrinus and Parascardovia denticolens compared to healthy oral microbiome profile.

Figure 2
figure 2

The Beta Binomial method applied to ancient mastics. Rows (Caries and Periodontitis) describing the two dysbiosis conditions tested. Each column represents a comparison condition. The condition control column represents the marker species that are significantly changed in each dysbiosis condition. Condition ble004, ble007 and ble008 columns represent the microbial species that were significant changes in ancient materials, compared to healthy cases.

With this method, we identified differentially abundant bacterial species in libraries ble004, ble007, and ble008 (Table S4). Several of those marker species were found in ancient masticated pitch samples such as Treponema denticola, Actinomyces johnsonii, Actinomyces timonensis, and Streptococcus anginosus (Table 2).

Table 2 Differential abundance of the marker species in BLE samples using the beta-binomial method. The taxa with increased differential abundance in the marker set is shown in bold. Markers: the taxa that we found differentially abundant in dysbiosis conditions.

Moreover, the results suggest a significant increase in the oral commensal bacterial species, in respect to healthy salivary samples such as Rothia dentocariosa, Gemella sanguinis, and Streptococcus sanguinis as well as Haemophilus influenzae, Aggregatibacter actinomycetemcomitans, Cardiobacterium valvarum, Eikenella corrodens, and Kingella denitrificans.

The ble004 sample is especially interesting, as eleven periodontitis-specific marker species show abundance patterns linked to disease. On the other hand, we found only one marker species (Actinomyces timonensis) linked to dental caries.

Machine learning predicts periodontitis-like oral microbiome in chewed pitch materials

We used a machine learning method to find the species that correlate with dysbiosis, and to predict sign of dysbiosis in ancient samples.

Using modern salivary microbiome data, we trained several Random Forests42 models for periodontitis and caries conditions, and selected the features (species) that contribute to the overall accuracy of the models. The best models classified the correct health status of the samples with an estimated error rate under 14%. The selected models identified 38 bacterial species (Table S6 for the periodontitis and caries models. They included most of the marker species that are differentially abundant in dysbiosis cases, and several additional species.

The clear results, including the low estimated error rates, allow us to infer the microbiome profile in ancient pitch materials. This was carried out by comparing the microbial abundances in the periodontitis and dental caries models, and calculating the probability values for concordance (Fig. 3). The probability values for the dental caries models were not high enough to conclude the presence of caries with sufficient certainty. However, the periodontitis models suggest the occurrence of periodontitis based on the oral microbiome from ancient pitch samples, with more than 70% probability. In the ble004 sample, the probability of the presence of a periodontitis-like microbiome is 84%. These results supported the conclusions from the differential abundance analysis.

Figure 3
figure 3

The class prediction probability values from at least six different best models were combined and the predictions are shown for Periodontitis models.

Eukaryotic species in ancient pitch pieces

Using five Kraken2 confidence interval thresholds, we authenticated DNA reads from 15 possible eukaryotic species (Table S7S11). In Table 3, only results from threshold 0.5 are shown. DNA reads mapped to hazelnut (Corylus avellana), red deer (Cervus elaphus), and brown trout (Salmo trutta) in ble004 and ble007; wolf (Canis lupus), hazelnut, red fox (Vulpes vulpes), arctic fox (Vulpes lagopus) in ble008 were present in all five threshold values. Eelgrass (Zostera marina) in ble004 and mallard (Anas platyrhynchos) in ble007 were present in four thresholds. Wolf, European robin (Erithacus rubecula), European crab apple (Malus sylvestris) in ble004 and European turtle dove (Streptopelia turtur) were present in three thresholds. Moreover, we authenticated DNA reads from mistletoe (Viscum album), apple (Malus domestica), tufted duck (Aythya fuligula), and limpet (Patella vulgata) in other Kraken2 thresholds.

Table 3 Authenticated eukaryotes using Kraken2 confidence interval threshold of 0.5.

These species passed the visual confirmation criteria: edit distance, length distribution, and deamination patterns.

In particular, wolf and hazelnut in ble008, mallard in ble007, and hazelnut in ble004 samples had more than 10,000 reads aligned to the respective reference sequences. Moreover, we found 1059 reads mapped to the mitochondrial genome of red fox in the ble008 sample.

Discussion

To broaden our understanding of human health and everyday life in Mesolithic Scandinavia, we investigated the metagenomic content of three pieces of ancient pitch material from one site. Using this data, we explored oral microbiome composition, diet related DNA reads, and resource utilisation.

The non-metric multidimensional scaling on Bray-Curtis32 distances grouped chewed pitch materials close to human oral microbiome and ancient dental calculus samples. Also, SourceTracker43 analysis showed a contribution of around 50% of modern oral microbiome and ancient dental calculus to ancient pitch samples. Moreover, there is a high relative abundance of species that are known to be commensal oral bacteria. This suggests that the DNA from chewed ancient pitch pieces is not specific to the pitch, nor to a specific tissue, but to most of the oral area in which the pitch was chewed.

To validate the presence of these commensal oral bacterial species we used two approaches. First, we compiled a large database curated from full length genomic sequences of Archaea, Bacteria, Viruses and Fungi, and used malt38 (an alignment-based metagenomic classification method) to confirm the presence of known oral commensal bacteria. Also, we de novo assembled the DNA reads and created draft ancient bacterial genomes. Therefore, our analysis proves that the metagenomic content in ancient pitch pieces is closer to oral microbiome profile and contains commensal oral bacterial species.

A general problem with ancient microbial paleopathology is that there can be bacterial presence without physical symptoms. Generally, the presence of pathogenic species correlates with dysbiotic states but not necessarily in a causal way. Thus, our results suggest dysbiosis but we cannot show disease symptoms.

Subsequently we examined the microbial species in pitch pieces that are over abundant. To do this, we used a modern salivary microbiome dataset that contains healthy, periodontitis, and dental caries conditions and ancient dental calculus samples that were identified as periodontitis.

Recently, Nearing et al.44 tested several differential abundance tools and showed a large variability between methods. Thus, they suggest using a consensus, that is, i. e. on results where several tools (based on different methods) agree.

In our study we applied two methods to identify bacterial species that correlate with dysbiosis conditions: a statistical method based on the beta-binomial distribution, and machine learning using Random Forests. The logic is, as suggested by Nearing et al.44, that the agreement between several techniques is a stronger argument for the detection of a given condition. Along the same line of reasoning, the reliability for dysbiosis detection increases when it is detected by several methods.

Using the modern salivary microbiome datasets, we identified specific bacterial species that are correlated with dysbiosis conditions and we search for the presence of these species in ancient chewed pitch pieces.

When we compare healthy and periodontitis modern samples, we observe increased abundance of Streptococcus, Actinomyces, Eubacterium, Dialister, and Treponema species along with a reduction of abundance of Neisseria, (Fig. 2) which is coherent with the findings of Li et al45.

Streptococcus mutans has been the most studied marker for dental caries, even though it is not considered to be the only cause of dental caries. For example, even in the absence of Streptococcus mutans, several acidophilic bacteria (e.g. Lactobacillus species) have been shown to co-occur with dental caries46,47.

Previous studies have shown that Treponema denticola, Slackia exigua48, Eubacterium brachy, and Streptococcus anginosus are likely associated with periodontitis. The predictor bacterial species from the best random forest models in this study identified most marker species and added additional potential markers. Moreover, in the caries random forest models, we found the marker Streptococcus mutans, which has been extensively studied in connection with dental caries cases. This is in line with the results of the beta-binomial test.

We confirmed the presence of the marker species in the ancient pitch material. We observed an increased abundance of periodontitis characteristic bacteria in all samples. Moreover, random forests models predict high probability for periodontitis in the Mesolithic individuals who chewed the pitch pieces. This is especially clear in ble004, where we had the highest number of differentially abundant markers.

The list of bacteria from the Syltholm chewed pitch material, which is a type of material similar to ours, indicates the presence of periodontitis specific bacterial taxa from the red-complex species. This sample dates to the Neolithic transition period but the Mesolithic human genetic component is compatible with our material8.

When contextualised, the results are compatible with what is generally known about the Mesolithic Scandinavian society. The Mesolithic population density was low49, with limited possibilities for pandemic-causing microbes to spread between humans, but not restricting the presence of bacteria from other sources than humans, like for example those causing systemic diseases including infective endocarditis. The wider use of the teeth, as tools, likely increased the risk for collecting periodontitis causing oral microbes.

The genetic traces we found complements the osteological and palaeobotanical results found from Huseby Klev and the species list in Table 3 reflects the palaeoenvironment of a settlement site located in the transitional zone between terrestrial and marine ecosystems. In particular, the remains of wolves, red fox, hazelnut, red deer and mallard were found in the several layers of Huseby Klev. In this perspective, the genetic data is consistent with the archaeological remains on site and reflects a Mesolithic environment21,50.

Moreover, other eukaryotic remains also fit the description of other Mesolithic sites in Europe. For example, hazelnut shells are frequently found at contemporary sites1,8,51. Also, reports show that limpets could be a dietary component in the Mesolithic period to complement meat from ungulates52. Bird bones were commonly used to produce tools in the Mesolithic. DNA sequences from Tufted duck, European robin, and mallard could therefore derive from birds used as food and/or raw material.

Even though remains of salmonids were not present, the presence of other species like cod (Gadus morhua) indicates that fishing was part of the subsistence strategy21. It has been suggested that salmonids are usually not detectable due to poor fish bone preservation53. Moreover, reference genomes for fishes are mostly not complete. Therefore, it is possible that the DNA reads assigned to trout because of scanty reference genomes.

Mistletoe is a hemiparasitic plant that infects hard and softwood trees. It is frequently found in Europe and often discovered by macroscopic analysis at Mesolithic sites in central Europe. Reports suggest that the plant could be used to produce poison for arrowheads, in addition to being utilised for medicinal purposes54. Crab apple pieces, cores and wood from apple trees are also found at the site41 as in several Mesolithic sites, and it is native to Europe.

In sample ble008, red fox, arctic fox and wolf were identified. These all three are members of the Canidae family, and they share a common ancestor. Thus separating these three species can be problematic. The presence of red fox is consistent with Mesolithic Scandinavian fauna, and their presence in the pitch samples is perhaps surprising but still consistent with osteological material from Huseby Klev. Furs of Canidae were probably used for clothing and the meat may have been consumed. Even though recent reports have demonstrated hunting of arctic foxes in Southern Poland 25,000 years ago55, the climate around Huseby Klev was too mild (annual mean temperature was 8℃) for this species to live56. We therefore used a competitive mapping approach and confirmed the presence of the mitochondrial genome of the red fox with 31% breadth of coverage.

Boethius et al.21 analysed strontium content of faunal and floral remains from Huseby Klev to trace evidence of mobility. The authors suggest that the residents of Huseby Klev were mobile, and used waterways to travel distances up to 20 km for hunting. The presence of terrestrial animal DNA in the pitch material indicates hunting of those species which is in agreement with the osteo-archaeological material from the Huseby Klev site.

Recently Mann et al.18 demonstrated the challenges in identifying eukaryotic reads in ancient samples. In essence, the relatively low number of aDNA sequences, and shared evolutionary ancestry in eukaryotic species create problems to authenticate species identifications. Moreover, they suggest that thousands of DNA reads would be needed for proper ancient DNA authentication. This is why we relied on stringent authentication criteria to identify the eukaryotic DNA reads.

In conclusion, we found that the chewed pitch material is an excellent source of ancient DNA, which can be used to understand diet, raw material use, and oral health conditions in prehistoric and early modern populations. We reconstructed several ancient bacterial genomes and found notable amounts of oral pathogens. Furthermore, we compared the microbial profile in chewed pitch mastic particles to modern human salivary microbiomes. Our results show that the bacterial profile is similar to modern periodontitis cases.

The samples also contained eukaryotic species like hazelnut, trout, mallard, red fox, limpet, and red deer. These species provide sources for reconstructing Mesolithic diet and paleoenvironment, while the presence of many of the species identified in this study is supported by archaeological finds. These DNA data may provide clues to specific dietary items but do not represent the whole variety of food sources used during the Scandinavian Mesolithic. They do, however, indicate what members of this particular group had been orally processing before or during their interaction with the pitch.

Our results portray a group of Mesolithic hunter-gatherers in south-western Scandinavia, where individuals likely suffered from dental diseases and were utilising resources from different domains through oral manipulation.

Materials and methods

The specific environmental/history/collection context

The Huseby Klev materials were unearthed and collected by archaeologists (including two of the co-authors of this article) during the excavation of this coastal hunter-fisher-gatherer site in the 90s50.

The material assemblage was rich and well preserved: human bones, animal bones, plant remains and pieces of masticated pitch were found. The exceptional preservation was presumably caused by a layer of clay which sealed the site into a geological time capsule. Pitch or “tar” pieces are a rather common find at Stone Age sites in Eurasia, and particularly in Scandinavia, where well preserved pieces of masticated pitch are found dating well into the Iron Age57. The pitch pieces were produced from birch bark tar, thus making birch a valuable resource during the entire Stone Age. There are about 90 masticated pitch pieces from the site, while tooth imprints in six of them have been cast and osteologically analysed. All the analysed pieces had imprints suggesting individuals younger than 20 years: three aged between 5 and 11 and three teenagers50. The pieces we used for the study were directly sampled from the material collection, which is under the guardianship of Bengt Nordqvist.

Experimental procedure

The DNA extraction and library building was performed at Stockholm University, AFL, in ancient DNA dedicated facilities, all designed to have positive pressure, separate airlock, HEPA filters for the incoming air, UV lights at the ceiling, and constant sodium 3% hypochlorite disinfection (NaOCl). Sampling of the masticated pitch pieces was performed in a “drill laboratory”, a separate room to the “main laboratory” where exclusively sampling of the materials was undertaken to minimise contamination. The material was irradiated in a UV oven at about 6 J/cm2 at 254 nm. When the necessary amount of the samples (between 50 and 100 mg) were collected into 2 mL screw cap tubes, the tubes were taken to the main laboratory for further extraction and library building. A blank was added at this stage. We checked the DNA concentration after extraction using Qubit 3.0 high sensitive double strand DNA fluorometric assay (Thermo Fisher Scientific). We did not observe any trace of contaminating DNA in the blank extractions.

In the main lab extraction was performed using two methods: the Yang extraction method58 which is commonly used for extraction of bone tissue (incubation at 55 °C in 1000 μL of buffer per sample containing 0.5 M EDTA pH8, 1 M Urea, and 10 μL of Proteinase K 10 mg/mL, followed by concentration of the supernatant using Amicon membrane filters and purification with MinElute spin columns to obtain 110 μL of final product) and the QIAamp PowerFecal DNA Kit (Qiagen) with which we obtained 100 μL of final product, following the instructions of the kit with minor adjustments.

The libraries were also built in the main laboratory, following a modified protocol by Meyer and Kircher59 for double-stranded blunt-end-repair libraries and damage-repaired libraries as in56 pre-treating the extracts with USER enzyme. Libraries were amplified with 10 µM index primers, later double-indexed, using AmpliTaq Gold 1000 Units 5 U/μL (Applied Biosystems) for blunt-end libraries, and AccuPrime™ Pfx DNA Polymerase (2.5 U/μL) (Invitrogen) polymerase for damage repair libraries (damage repair applied to only ble004 sample). We determined the number of cycles using quantitative PCR, with reagents from Thermo Scientific and Biomers. Libraries were sequenced on the Illumina Hiseq X platform at the SciLife centre in Stockholm and SciLife centre in Uppsala. The damage repaired libraries of the ble004 sample are shown in Table S12.

DNA libraries and external datasets

After receiving the sequences, AdapterRemoval60 was used to trim adapters, nucleotides less than 20 base quality, and filtered shorter than 10 nucleotides and subsequently subjected to human demography pipeline.

In this study we focused on DNA sequences that do not map to the human reference genome. To do this we first filtered human DNA reads from fastq files and we used cutadapt61 to trim remaining adapters, and nucleotides less than 20 base quality, and 10 nucleotides were also removed. Final sequencing depth of the fastq files is shown in Table S12.

In addition, we used published ancient and modern datasets to understand the microbial composition of ancient mastics. First, we used pre-calculated MetaPhlAn3 relative abundances of modern human microbiome samples using the curatedMetagenomicData package25,26,27,28,29,30,62,63. Next, we downloaded published DNA data from ancient dental calculus and chewed pitch studies8,9,10,11,14,15,16,17,19,22,23,24. At last, we used published studies that focus on sediments to identify environmental microbiome contribution to pitch samples28,33,34,64 (Table S13).

For differential abundance analysis, we used a dataset containing salivary microbiome data of 12 healthy, 12 caries, and 10 periodontitis patients40,41.

We downloaded the fastq files from European Nucleotide archive and processed paired-end DNA sequences using AdapterRemoval tool61 tool by trimming bases that have less than 20 phred quality score and collapsed forward and reverse reads if they had overlapping regions with 11 or more nucleotides.

Metagenomic profiling of the ancient mastics

We used MetaPhlAn3 with default options to profile the fastq files of the ancient samples62. This tool uses a marker database (that contains approximately 13,500 bacterial and archaeal, 3,500 viral, and 110 eukaryotic marker DNA fragments from ~ 17,000 reference genomes) to profile a DNA library and produces a microbial relative abundance table. This describes the proportion of DNA fragments that belong to a particular taxon, and a sequence alignment map (sam) file that contains the alignment information. After profiling each DNA library, we removed optical PCR duplicates from sam files with picard-tools65, sam files that correspond to each library were merged for each sample, and relative abundance tables were calculated with MetaPhlAn3 for each sample.

To further analyse the dataset, we prepared a set of reference and representative genomes of bacterial, archaeal, fungal, and viral species from the NCBI RefSeq database66 (N = 6133) and indexed them using the malt alignment tool38 and we aligned each fastq file to this indexed reference collection with the same tool. We used the dust-masker67 tool to mask repetitive regions that could create false-positive classifications. Then, we aligned the sequenced libraries to the database using malt38 with default options, and we obtained binary rma6 files. These files contain the alignment and taxonomy information for each sequence. Afterwards, we used MEGAN68 to extract the species absolute abundance values for each fastq file.

Comparing microbial profiles from ancient samples with modern human microbiome

To understand the metagenomic composition in ancient mastic material, we compared the relative microbial abundances from the ancient mastic samples against a dataset consists of modern human oral cavity (N  = 725), modern human nasal cavity (N  = 93), modern human skin (N = 42), modern human stool (N = 500), modern human vagina (N = 86), ancient human dental calculus (N  = 395), and chewed pitch (N  = 1). The site composition of the oral microbiome samples are anterior nares (N = 93), buccal mucosa (N  = 119), hard palate (N  = 1), keratinized gingiva (N  = 6), palatine tonsils (N = 6), saliva (N  = 5), subgingival plaque (N  = 72), supragingival plaque (N  = 127), and tongue dorsum (N  = 198). Some of the oral microbiome samples do not have any site information and therefore were not included in the list.

We calculated pairwise Bray–Curtis distances between each compositional vector using the vegdist function in the vegan package of the R statistical platform32,69,70. The distance matrix was visualised using a non-metric multidimensional scaling (NMDS) implemented in metaMDS in the vegan library with a K value of 7. The first three axes were selected for visualisation.

At last, we used SourceTracker43 to quantify the oral microbiome content in chewed pitch materials30. To do this, we selected species taxonomic nodes from the MetaPhlAn3 abundance matrix, and we marked the comparison dataset from modern human oral microbiome samples and ancient dental calculus samples as source materials. Additionally, we included sediment (N = 23), and shallow marine sediment (N = 21) as environmental sources. Then we used the SourceTracker tool with the default options, to quantify the source contribution to the ancient samples (Table S14S22).

Generating full genome ancient bacterial sequences

We collected all reads that were assigned to species’ taxonomy nodes and combined them into one fastq file for each species, with all reads that align to at least one reference genome of that species, and nothing else.

Based on the taxonomic assignment results, we selected the corresponding reference genomes (n = 281). And we aligned each fastq file to the respective reference genome, separately.

To produce full genome sequences, we extracted aDNA reads for each bacteria using malt-extract38.

Thereafter, DNA reads for each sample were merged and aligned to their respective bacterial full genome sequences using bwa71.

Sequence alignment files were filtered using SAMtools72 with alignment quality (q30) parameter to remove bad alignments, and DNA sequences with length less than 30 nucleotides were discarded from the alignment file. Optical PCR duplicates were removed using Picard-tools with the mark-duplicates option. Genome coverage was calculated using SAMtools72. Deamination patterns were obtained by the mapDamage tool73 with default options. User enzyme treated (damage repaired, dr) ble004 libraries were excluded from the analysis to calculate reliable deamination patterns. Results from these libraries are denoted with the suffix nondr. Finally, we generated genome authentication plots that contain edit distance distribution, length distribution, deamination patterns, and genome coverage plots. Damage repaired ble004 libraries were added in the analysis after we obtained reliable deamination patterns in non-damage repaired libraries.

Also, we co-assembled DNA reads using the megahit tool with the k-list parameter 21,29,39,59,79,99,11974. In essence, we de novo assembled all DNA reads that belong to one particular sample and we classified the contigs using krakenuniq tool with the full NT database that contains Eukaryotic full length genomes75,76. To calculate the authenticity of the contigs, we used pyDamage tool77. To assess the assembly quality, we used the Quast tool78 and specified reference genomes of each oral microbe that were used in the reference-based genome assembly step. With this approach we aimed to estimate the assembled proportion of each microbe (Table S2).

Beta-binomial modelling to find overabundant microbes in ancient pitch materials

A beta-binomial statistical model was used to detect which commensal oral microbes are overabundant in ancient mastics. We first used modern salivary microbiome datasets that contain healthy and dysbiosis cases (periodontitis and dental caries), restricted to the species that are commonly found in human oral cavities, as described in the Human Oral Microbiome Database. Then we used the beta-binomial method implemented in the corncob79,80 R package to build differential abundance models for each bacterial species. The cases where the differences between healthy and dysbiosis are statistically significant (multiple tests corrected using Benjamini–Hochberg method with a cut-off value of 0.05) suggests that these microbes could be markers of dysbiosis.

After selecting the marker species for dysbiosis in the modern salivary microbiome, we applied the same methodology to ancient mastics. First, the sequencing depth of ancient mastic libraries was scaled so the average depth of each library matched the mean value of the healthy salivary datasets. This way all distributions have the same order of magnitude and are therefore easier to compare.

Then, a beta-binomial test was applied to each sample, and significant taxa and models were extracted. Afterwards, we searched for marker species in the list of significant species, and we restricted the species list to the species in the Human Oral Microbiome Database.

Moreover, the same methodology was used to identify the differences between pitch materials and ancient dental calculus samples that had been identified with periodontitis17.

Machine learning to find predictors of dysbiosis

To estimate the overall health of the people who chewed the pitch material, we used a machine-learning classification method known as random forests42. Random forests aggregate predictions from different classification trees to reduce variance and to improve robustness75. In essence, the method creates several simple classification trees, each with a randomly chosen subset of features, and combines their predictions through voting.

In our case, the features correspond to the different species considered, and the data is the relative abundance of these species found in each sample. The labels used for training are the dysbiosis conditions: caries, periodontitis, or healthy. In this part, we calculated the relative abundance restricted to oral specific microbial species.

Each random forest is trained using only part of the data (i.e., bagging), and uses the remaining data to evaluate an out-of-bag error rate, which is numerically close to the N-fold classification error rate76. In other words, a low out-of-bagging error indicates that the random tree will generalise well for new data.

By evaluating the mean decrease in Gini impurity (MDG), random forests can rank each feature (i.e., species) sorting out which ones are more relevant to characterise the health condition. We used the MDG ranking to iteratively remove the least important feature, until five features remain, following Darst et al.81. In each iteration we created ten independent random forests, we averaged the MDG for each feature, and we discarded the feature with the lowest average MDG. This procedure was repeated until five features remained. With this approach, we kept only the features that contribute to the overall accuracy of the model.

This whole feature selection process was repeated three times. Finally, we sorted all the trees built in the feature selection process by their out-of-bagging error rate, and kept the best two models in each major iteration to make a model pool.

To have a simpler interpretation, we handled the periodontitis and caries cases separately, building two independent model pools.

We used the randomForest function of the R statistical package with default parameters to build random forests models70,82.

Multiple models trained with the same feature set were combined using the combine function in the randomForest library. This model pool was used to estimate the probability of each health condition for the ancient material, using the predict function over the relative abundance tables from the Mesolithic pitch materials. These probability values were then plotted using the ggplot2 and ggpubr libraries on the R statistical package83,83.

Eukaryotic taxonomic classification

To study the presence of eukaryotic species, we first removed the bacterial reads that we identified from the previous step. Then, we used Kraken284 to classify these reads using the full NCBI NT database that contains the all available reference sequences at that time.

We also used several confidence thresholds to filter the possible false positive classification (0.1—0.5). We manually restricted the resulting abundance table to keep only eukaryotic taxa with at least 10,000 DNA reads.

To authenticate these eukaryotic DNA reads, we extracted reference sequences for each eukaryote from the Kraken2 database. Then we used bwa with custom parameters (L = 16,500, N = 0.01, O = 2) to align the reads assigned to each taxa against their corresponding genome71.

After alignment, we removed nucleotides shorter than 30 bp, and validated the antiquity of the eukaryotic DNA reads using the same aDNA authentication protocol that we used for bacteria. We plotted the edit distance distribution, length distribution, and deamination plots for each set of DNA reads aligned to a eukaryotic species. Each plot was manually scored between 0 and 1 (1 for ideal and 0 for not ideal) based on the visual confirmation criteria: edit distribution should peak at low values and decline rapidly, there should be deamination patterns close to the reads’ extremes, and the distribution of DNA length should be concentrated in small values. Then, the entries were sorted according to the scores and to the number of DNA reads aligned to the reference genome (Table S7S11).

Data and code availability

Next generation sequencing data of ancient mastic materials are available on NCBI Sequence Read Archive via PRJNA994900 Bioproject accession number. The original codes, metagenomic abundance data and final random forests models are published with https://doi.org/10.5281/zenodo.10252965.