Identifying Archaeological Bone via Non-Destructive ZooMS and the Materiality of Symbolic Expression: Examples from Iroquoian Bone Points

Today, practical, functional and symbolic choices inform the selection of raw materials for worked objects. In cases where we can discern the origin of worked bone, tooth, ivory and antler objects in the past, we assume that similar choices are being made. However, morphological species identification of worked objects is often impossible due to the loss of identifying characteristics during manufacture. Here, we describe a novel non-destructive ZooMS (Zooarchaeology by Mass Spectrometry) method which was applied to bone points from Pre-Contact St. Lawrence Iroquoian village sites in southern Quebec, Canada. The traditional ZooMS technique requires destructive analysis of a sample, which can be problematic when dealing with artefacts. Here we instead extracted proteins from the plastic bags in which the points had been stored. ZooMS analysis revealed hitherto unexpected species, notably black bear (Ursus americanus) and human (Homo sapiens sapiens), used in point manufacture. These surprising results (confirmed through genomic sequencing) highlight the importance of advancing biomolecular research in artefact studies. Furthermore, they unexpectedly and exceptionally allow us to identify and explore the tangible, material traces of the symbolic relationship between bears and humans, central to past and present Iroquoian cosmology and mythology.


S2: mtDNA PCR Amplifications and Sequencing
Mitochondrial DNA Analysis Mitochondrial DNA (mtDNA) fragments were targeted via simplex PCR to further resolve the species identifications of the bear points. For DR-21 and DR-1662, we amplified a 217 bp fragment of the cytochrome b gene using forward primer UR-F6 (5' CAACATCCGAAAAACCCACCC 3') and UAM-R223 (5' AGTGAACATCTCGGCAAATATGGG 3') spanning positions 15311-15528 of the Ursus arctos mitochondrial genome (Genbank accession NC003427). Amplifications were performed in a 30 μL reaction containing nuclease free H2O, 1.5X of GeneAmp® 10X PCR Gold Buffer, 2.5 mM MgCl2 , 0.2 mM dNTP, 1 mg/ml BSA, 0.3 µM of each primer, 2.5 U of AmpliTaq® Gold LD DNA Polymerase (Applied Biosystems) and 3 μL of DNA extract. Amplification conditions involved an initial denaturation step of 95°C for 12 min, followed by 60 cycles of 52°C for 30 sec (annealing), 94°C for 30 sec (denaturing), and 72°C for 90 seconds (elongation), followed by a final annealing step of 72°C for 5 min. Amplified samples were visualized on a 2% agarose gel and sequenced at Eurofins UK. Resultant sequences were edited with ChromasPro to remove primer portions. Sequences were submitted to Genbank BLAST for initial identifications, and species and haplotype identifications were confirmed through multiple alignments in BioEdit 4 . The cytochrome b sequences from DR-21 and DR-1662 were aligned with 75 previously published bear sequences of extant and extinct American bear species [5][6][7][8] . The sequences from the two bone points both matched identically with each other and U. americanus A-East cytochrome b haplotype (Accession KM257060) 8 . The ancient mtDNA sequences were deposited in Genbank under Accessions MG696868-MG696869. No amplifications were observed within any of the blank extractions or negative controls run alongside the ancient samples.

S3: Library Creation and Illumina Sequencing
DNA extracts from five samples (DR-21, DR-1662 [bears], DR-894s, DR-1797s and MC-398s [humans]) were converted into double-stranded Illumina sequencing libraries for shotgun sequencing following the protocol by Meyer and Kircher 9 and modified according to Fortes and Paijmans 10 . Libraries were prepared using 25 μL of DNA, and individual P5 and P7 barcodes were included for each sample. For sample DR-1797s, libraries were prepared from both the pre-digestion and second digestion DNA extracts. 3 μL of the resulting libraries were amplified and indexed in a 25 μL reaction containing 1X AmpliTaq Buffer, 2mM MgCl 2 , 0.1 mg/ml BSA, 0.25mM dNTPs, 1.25U AmpliTaq Gold 360, 0.2μM IS4 Forward Primer and 0.2μM each of individually barcoded P7 Indexing Primer. Amplification thermal cycling conditions were as follows: 10 min at 94°C; between 15 and 19 cycles of 30 sec at 94°C, 45 sec at 60°C and 45 sec at 72°C, with a final extension step of 5 min at 72°C. Amplified libraries were purified using Qiagen MinElute PCR purification columns, quantified using a Qubit 2.0 Fluorometer, and quality assessed on an Agilent 2100 Bioanalyzer using a High Sensitivity Chip. Libraries were constructed from blank extracts, and negative controls were processed along with all samples to monitor for contamination. Indexed libraries were pooled in equimolar concentrations (along with blanks and negative controls) and single-end sequenced (with read length 80bp and/or 100bp) on a HiSeq2500 Illumina platform at the National High-throughput DNA Sequencing Centre, University of Copenhagen, Denmark. The library from DR-894s was sequenced twice (SE80 and SE100); data from both sequencing runs was combined before adaptor trimming and quality filtering. Sequencing results are presented in SI Table 1. Sequencing datafiles for the bone points, extraction blanks and library controls are available through the European Nucleotide Archive under Accession PRJEB23998.

S4: Data Processing and Read Mapping
The raw reads obtained from the sequenced libraries were trimmed for adapter and P5 index sequences using cutadapt v1.11. During P5 index trimming, one error in the index sequence was allowed (parameter −e 0.125). The reads were filtered to a minimum phred-scaled quality score of 20 (-q 20) and any sequence less than 30 bp in length (-m 30) or that did not match the correct P5 index was discarded from analysis. FastQ Screen (http://www.bioinformatics.babraham.ac.uk/projects/fastq_screen) was used for initial species identification, aligning the trimmed reads to three genomes: human (hd37d5), red deer/elk (Cervus elaphus hippelaphus GCA_002197005.1 Celaphus1.0) and polar bear (Ursus maritimus GCF_000687225.1). The FastQ Screen results confirmed the human genome as the most likely source of sequences for DR-894s and DR-1797s (Supplementary Figure 1). Likewise, the bear genome was confirmed as the most likely source of sequences for DR1662, and DR21, although the latter had <0.5% of the sequences mapping to any genome. No species identity, however, could be confirmed for MC-398s due to a lack of aligned reads.
The individual reads from the human bone points were then mapped to the human reference genome (hg19) using the Burrows-Wheeler Alignment 0.7.5a 12 with the following parameters (bwa aln -l 16500 -n 0.02 -o 2). Similarly, the individual reads from the bear bone points were mapped to the polar bear genome (Ursus maritimus GCF_000687225.1) using the same parameters. Reads were sorted and PCR duplicates removed using Samtools v0.1.19 12 ; mean fragment lengths were estimated using BAMStats (http://bamstats.sourceforge.net/). The percentage of endogenous DNA within all five bone points was extremely low. The human points DR-894s, DR-1797s and MC-398s displayed 2.94%, 2.19% and 0.01% of the sequences mapping to the human genome, respectively. As noted above, two separate DNA extracts were analysed from DR-1797s: the 'pre-digestion' fraction and the second digestion fraction. The library sequenced from the pre-digestion fraction yielded a greater quantity of endogenous DNA (2.19%) compared to the second digestion (0.27%) and thus, the data from former was used for subsequent genomic analysis. As noted from the FastQ Screen results, MC-398s had very few sequences mapping to the human genome (< 2000), and was excluded from further genomic analysis. The bear bone points displayed 0.06 % and 12.71% of sequences mapping to the polar bear genome for DR-21 and DR-1662, respectively (SI Table 1). Figure 1: Histograms illustrating the relative frequency of raw sequence read alignments to the human, red deer and polar bear genomes for the five bone points. Data labels indicate the percentage of sequences mapping uniquely to a single genome.

S5: Genomic Analysis
Data Authenticity Authentication of the ancient DNA sequences was undertaken through the assessment of postmortem degradation including short sequence length and characteristic misincorporation patterns, particularly the deamidation of cytosine (C-T) at the 5' ends of molecules 13 . Sequence length distributions and damage patterns were assessed through the mapDamage2.0 package (Jónsson et al., 2013) (SI Figure 2). The two bear samples (DR-21 and DR1662) and the two human bone point samples with over 1% endogenous DNA (DR-894s and DR-1797s) displayed average read lengths of 71 bp. An increase in C to T and G to A changes towards the 5' and 3' ends of reads was observed across all four samples, with the C to T misincorporation at the 5' read end ranging from 6.7-16.0% and complementary G to A frequencies ranging from 6.1-11.0% (SI Figure S2).

Molecular Sex Determination
Sex identification was undertaken on two human bone point libraries (DR-894s and DR-1797s) by calculating the ratio of reads aligning to the Y chromosome to reads aligning to both sex chromosomes (Ry) 14 . The 95% confidence interval (CI) was computed as Ry ± 1.96 × Ry × (1 -Ry)/(Nx + Ny), where Nx and Ny are the total number of reads aligning to the X chromosome and Y chromosome respectively. Using this method, both bone points could be identified as deriving from male individuals (SI Table 1).

Mitochondrial Genome Analysis
To determine mitochondrial haplogroups of the human bone points, we aligned the trimmed reads to the revised Cambridge Reference Sequence (rCRS) using mapping parameters described in S4, and a minimum mapping quality of 30 and called consensus mitochondrial genome sequences using Samtools mpileup (with parameters -B and -Q 30) and vcfutils.pl (vcf2fq). HaploFind 15 was used to identify defining mutations and assign sample haplogroups. Only one of the bone points (DR-894s) produced sufficient sequences (>1,000 mtDNA sequences) to obtain a confident mtDNA haplotype, corresponding to haplogroup C1, one of the major founding haplogroups of the Americas 16,17 .

Principal Components Analysis
PCA was performed using LASER version 2.04 18 on the two bone point samples identified as human via FastQ Screen. This comprises a two-step process in which modern reference samples and ancient samples are computed independently. In the first step, data comprising approximately 650K SNPS from 938 modern humans (HDGP_938) 19 is used to create a reference space by performing a standard PCA. In the second step, the ancient samples are mapped into the reference space by simulating sequencing data for each reference individual that matches the coverage pattern of the ancient sample. A second PCA is then performed, which includes both simulated reference data and the ancient samples. Finally, Procrustes analysis was used to project the results of the ancient sample PCA into the modern sample reference space. Due to the low endogenous DNA content of the bone samples, relatively few SNPs could be called for the two bone points (DR-894s 7585 SNPs; DR1797s 3094 SNPs, Figure 4 main text).

S6: ZooMS Bag Method Contamination Control
Two contamination issues commonly arose with the bag extraction method, those being coextraction of non-endogenous peptides (often keratin, see SI Figure 4) and plastic residues from the storage bags (see SI Figure 5). However, contaminants can be easily recognized and removed from downstream analysis. All bag spectra were checked against a list of known contaminant peptide masses (such as keratin and porcine trypsin), and those peaks were then excluded from further analysis. Keratin, the main structural component of hair, nails and the outer layer of skin, is a common laboratory contaminant and was the main source of contamination in the bag spectra. Being a common contaminant, the masses for human keratin peptides are widely available and any peaks matching with known keratin masses were excluded from further analysis. The keratin peaks also tend to be easily identifiable in the spectra as their intensity is generally greater than the actual sample collagen (SI Figure 4). The plastic residues pose less of a risk in terms of identification error as they generally form a distinct wave pattern of repeating peaks, usually 14 m/z apart and appearing at the lower molecular weight end of the spectra (SI Figure  5). Figure 4 Comparison of MALDI-TOF-MS spectra from the bag and destructive ZooMS methods for sample DR-491s depicting keratin contamination (all masses with *) and the lack of high molecular weight peaks in the bag extraction method. Peaks labelled in the destructive spectra are those used to identify the sample as bear; these peaks also appear in the bag spectra, however only keratin peaks have been labelled. SI Figure 5 Comparison of MALDI-TOF-MS spectra from the bag and destructive ZooMS methods for sample DR-1588s depicting likely plastic residue contamination in the bag extraction method. The lack of high molecular weight peaks in the bag extraction can also be seen. Peaks labelled in the destructive spectra are those used to identify the sample as white tailed deer.