A treponemal genome from an historic plague victim supports a recent emergence of yaws and its presence in 15th century Europe

Developments in techniques for identification of pathogen DNA in archaeological samples can expand our resolution of disease detection. Our application of a non-targeted molecular screening tool for the parallel detection of pathogens in historical plague victims from post-medieval Lithuania revealed the presence of more than one active disease in one individual. In addition to Yersinia pestis, we detected and genomically characterized a septic infection of Treponema pallidum pertenue, a subtype of the treponemal disease family recognised as the cause of the tropical disease yaws. Our finding in northern Europe of a disease that is currently restricted to equatorial regions is interpreted within an historical framework of intercontinental trade and potential disease movements. Through this we offer an alternative hypothesis for the history and evolution of the treponemal diseases, and posit that yaws be considered an important contributor to the sudden epidemic of late 15th century Europe that is widely ascribed to syphilis.

performed as previously described 7 . Samples were captured in individual wells, but blanks 54 were pooled and captured together separately from the samples. 55 56 Shotgun libraries and capture products were sequenced to a depth of approximately 10 57 million reads on an Illumina HiSeq4000 using a 50 bp paired--end kit. Screening and captured 58 blanks were sequenced to a depth of approximately 2 million on an Illumina NextSeq500 on 59 either a 75 bp paired--end mid--output kit or on a 75bp single--read kit, except for the T. 60 pallidum capture blanks for the first tooth from AGU007, which were sequenced to a depth 61 of 2 million reads on an Illumina HiSeq4000 using a 75bp single--read kit. 62 63 ii) Illumina read processing 64 Sequenced shotgun libraries and blanks were demultiplexed and mapped to the human 65 genome (hg19) in EAGER v1.92 to evaluate aDNA preservation 8 . In EAGER, reads had the 66 adapters clipped, and paired reads were merged using AdapterRemoval v2.2 9 . Reads were 67 filtered for a length of ≥ 30 bp and a minimum base quality of 20. Mapping was performed 68 by BWA version 0.7.12 with a seed length (--l) of 32 and a mapping stringency setting of 0.01 69 (--n). We removed read alignments with a mapping quality (--q) below 20 using SAMtools 70 (http://samtools.sourceforge.net/). Duplicates were removed with MarkDuplicates 71 (http://broadinstitute.github.io/picard/) and mapDamage v2.0 was used to examine DNA 72 damage 10 . The 26 samples had 373 -1,090,755 DNA fragments mapping to hg19 after 73 quality filtering and duplicate removal, and % endogenous DNA ranging from 0.01 to 32.61 74 (Table S2). Negative controls showed between 90 and 24,782 fragments mapping to hg19 75 after quality filtering (Table S2). 76 77 Shotgun libraries for samples and blanks, both UDG and nonUDG, were mapped in EAGER 78 against the CO92 reference genome for Y. pestis (NC_003143.1) as a verification of the qPCR 79 screening results 11 . EAGER was applied as described above, however, the BWA mapping 80 parameters were --l = 16, --n = 0.01 and read alignments with a mapping quality below 37 81 were removed. Results are presented in Table S3.  82   83 iii) qPCR re--evaluation 84 On account of the unexpectedly high number of mapping Y. pestis fragments in individual 85 AGU007, re--examination of the qPCR assay results revealed that AGU007 had a signal that 86 had been initially dismissed. AGU007's amplification curve indicated a quantity below the 87 lowest standard of 0.2 copies/µL ( Figure S1, Table S1), and the melting peak signal was also 88 very low as compared to the amplification standards and the other samples ( Figure S2). 89 90 91 92

103
A UDG library from 40 µL of extract was prepared for AGU007 and was shotgun sequenced 104 on an Illumina HiSeq 4000 with a 75bp single end kit. Mapping to hg19 and Y. pestis was 105 carried out as previously described, though with more stringent parameters (--l 32, --n 0.1, --q 106 37) for the Y. pestis mapping (Table S3) observed in either the UDG--treated or untreated libraries for AGU003, AGU009, and  117  AGU013, indicating these were likely false positives.  118   119 Aside from Y. pestis, numerous reads were assigned to other pathogenic taxa, including oral 120 microorganisms ( Figure S3). Verification of these signals would require targeted enrichment 121 for species and strains of the relevant taxa, and was not pursued here. 122 123 In addition to Y. pestis, the UDG library for AGU007 also demonstrated a signal for 124 Treponema pallidum that had not been observed in the initial screening of the nonUDG 125 library. Closer examination of the MALT data of the nonUDG library disclosed the presence 126 of 7 fragments matching to T. pallidum and 11 fragments matching to the Treponema genus 127 ( Figure S3). This was not reported as positive identification by HOPS since the level of 128 damage in the reads precluded proper evaluation of taxon assignment. 129 130 131 132

133
Numbers in each cell correspond to assigned reads. Cell fill colour indicates degree of taxon assignment 134 evaluation: gold pass the set edit distance filter, green pass this filter and also have terminal C to T damage, 135 and blue shows that reads with terminal C to T damage pass the edit distance filter. Grey indicates no positive 136 taxon assignment evaluation in HOPS.

137
Both the untreated and UDG--treated libraries were mapped in EAGER against the Nichols 138 syphilis reference genome (NC_021490.2) using the same programs and parameters as 139 previously described for mapping to CO92 14 . The untreated library was found to contain 27 140 mapping fragments after duplicate removal and quality filtering and the UDG--treated library 141 was determined to have 36 mapping fragments following duplicate removal and quality 142 filtering (Table S11). The other 25 AGU samples had mapping fragment numbers ranging 143 from 0--8 after duplicate removal and quality filtering. Screening blanks were also mapped in 144 EAGER using screening library parameters and programs, and were found to have 0--3 145 fragments mapping to Nichols (NC_021490.2) after duplicate removal and quality filtering, 146 consistent with the other samples found to be negative via screening. probes and probes with low sequence complexity were removed. This resulted in 1,125,985 155 unique probe sequences. This probe set was spread on two Agilent one--million feature 156 SureSelect DNA Capture Arrays. The capacity of the two arrays was filled by randomly 157 duplicating probes from the probe set. The arrays were turned into an in--solution DNA 158 capture library as described elsewhere 15 . 159 160 The treated and untreated libraries for AGU007 and its associated blanks were then in--161 solution captured for whole genomic Treponema pallidum DNA using the capture library 162 described above. The T. pallidum--captured untreated and UDG--treated libraries were 163 captured in individual wells and the associated blanks were pooled and captured separately 164 from the samples. 165 166 Mapping of captured products for Y. pestis and T. pallidum (first tooth) 167 168 Reads from Y. pestis--captured samples were mapped in EAGER against the CO92 reference 169 genome (NC_003143.1) 11 with stringent BWA mapping parameters (--l 32, --n 0.1, --q 37). 170 Samples had 119 -156,123 fragments mapping to CO92 after quality filtering and duplicate 171 removal, with mean genomic coverages ranging from 0 to 1.98--fold (Table S4 and S5). 172 Untreated libraries showed expected damage patterns ( Figure S4). Samples AGU003, 173 AGU009 and AGU013 had mean genomic coverages of 0 -0.01--fold following capture, 174 supporting the negative MALT/HOPS results. Samples AGU010, AGU020 and AGU025 had 175 low mean genomic coverages ranging from 0.28 to 0.77--fold. Captured blanks had between 176 5 and 90 mapping fragments, with duplication (cluster) factors ranging from 2.3 to 2645.7, 177 indicating that complexity of the library had been adequately explored. These numbers are 178 consistent with previously reported Y. pestis capture blank data in Bos et al. 2016. 179 180 The T. pallidum--captured untreated and UDG--treated libraries for AGU007 were sequenced 181 on a HiSeq4000 to a depth of approximately 40 million reads with a 75 bp paired--end kit. 182 The samples were mapped in EAGER to the Nichols genome (NC_021490.2) with the same 183 parameters as the Y. pestis mapping . The libraries had 31,882 and 126,552 mapping  184 fragments, respectively, following duplicate removal and quality filtering (Table S12). The 185 mean coverages of the untreated and UDG--treated libraries were 1.78X and 6.67X, 186 respectively, with the untreated library displaying the expected aDNA damage pattern 187 ( Figure  S5). The UDG--treated library covered 96.1% of the genome at 1X and 76.1% at 5X. 188 Captured blanks, sequenced on a HiSeq4000 with a 75bp single read kit, had 4 or fewer 189 reads mapping to the Nichols genome after quality filtering. 190 191 192 Processing and analysis of Second Set of Teeth from AGU 193 194 Further data analysis awaited the arrival of the second set of teeth: one each from 195 individuals AGU007, AGU010, AGU020 and AGU025. Following arrival, sampling, DNA 196 extraction, library preparation, indexing, amplification, and Y. pestis and T. pallidum capture 197 protocols were followed as previously described. No qPCR screening was performed for 198 these samples. Remaining aliquots of unindexed AGU010 and AGU020 UDG--treated libraries 199 from the first tooth were indexed and captured as part of the same batch following the 200 same processes. Untreated and UDG--treated screening libraries were sequenced on 201 separate runs on a HiSeq4000 using single--end 75 bp kits. Y. pestis--captured libraries were 202 paired end sequenced to a depth of 20 million reads on a NextSeq500 using a 75bp mid 203 output kit and T. pallidum--captured libraries were sequenced to a depth of approximately 204 40 million on a HiSeq4000 using a 75 bp paired--end kit. All negative controls were 205 sequenced to a depth of approximately 2 million on separate NextSeq500 runs using 75 bp 206 mid--output paired end kits. 207 208 Mapping of the untreated and UDG--treated libraries of the tooth samples to hg19 was 209 performed following the same procedures described previously. Shotgun sequencing 210 yielded endogenous DNA contents of 0.07--59%, which corresponded to 6,381 - 6,986,467 211 mapping fragments after duplicate removal and quality filtering (Table S6). Both AGU007 212 and AGU010 had better human DNA preservation in the second tooth, but the opposite was 213 observed for AGU020 and 025. Negative controls showed between 683 and 6,886 fragments 214 mapping to hg19 after quality filtering. Mapping of the untreated and UDG--treated shotgun 215 libraries to the Y. pestis CO92 genome was also performed, yielding 22 -6,394 mapping 216 fragments following duplicate removal and quality filtering (Table S7). UDG--treated shotgun 217 libraries were mapped using more stringent parameters than untreated shotgun libraries (--l  218 32, --n 0.1 and --q 37, and --l 16, --n 0.01 and --q 37, respectively). Negative controls had 11 or 219 fewer fragments mapping after quality filtering. 220 221 Mapping of shotgun libraries from the second set of teeth against the Nichols genome 222 (NC_021490.2) yielded 1 - 181 fragments following duplicate removal and quality filtering 223 (Table S13). Mapping parameters used were the same as for the shotgun library mapping to 224 Verification of the presence of Y. pestis plasmids pMT1 and pCD1 was also performed. 253 Mapping in EAGER using previous stringent parameters yielded coverages for the pMT1 254 from 4.02 to 63.75--fold, with 6284 to 79,105 fragments mapping after quality filtering. As 255 expected, the lowest values were observed for AGU020 and the highest for AGU007 Table  256 S9). No more than 4 uniquely mapping fragments were observed in the captured negative 257 controls. Mapping to pCD1 displayed similar results with the lowest values found in 258 AGU020, with 6983 unique mapping fragments and 6.08--fold coverage, and the highest in 259 AGU007, with 68,941 uniquely mapping fragments covering the pCD1 at 80.90--fold (Table  260 S10). No uniquely mapping fragments were found in the negative controls. 261 262 The T. pallidum--captured libraries of AGU007 and its associated blanks were mapped in 263 EAGER against Nichols (NC_021490.2) using the same parameters and programs as the Y. 264 pestis captures. AGU007 had 146,398 -330,505 fragments mapping after duplicate removal 265 and quality filtering, mean coverages of 9.3 to 21.3--fold and 92.1% to 97.6% of the genome 266 covered at 5--fold (Table S14). The nonUDG library showed the expected ancient DNA 267 damage pattern ( Figure  S5). One mapping fragment was detected in three of the captured 268 blanks after quality filtering. Each mapping fragment was analyzed using BLASTn and found 269 to map to regions conserved in multiple bacterial taxa. None of the top 10 matches were to 270 T. pallidum. 271 272 273

286 Phylogenetic Analysis of Yersinia pestis 287
Phylogenetic assessment was made following methods described in the main manuscript 288 with a genome dataset defined in Table S15. Y. pseudotuberculosis was used as the 289 outgroup. A full phylogeny is presented in figure S7, and a zoomed in phylogeny of the 290 post--Black Death lineages only is shown in Figure 4.   (Table S19). Each simulated read is 100bp long with 1bp tiling to the successive read. 303 Adapters of sequenced reads were trimmed and poor--quality reads were filtered. 304 Overlapping reads were merged and processed in EAGER version 1.92.58 19 . For AGU007, 305 included a minimum SNP coverage of 3--fold, genotyping quality of 30, and homozygosity 346 calling at 90% support. An N was inserted in positions where no base call could be made 347 based on the above parameters. Excluded regions are described elsewhere 37 . A total of 348 6,949 variant positions were identified in this set (Table S16). The unique variants associated 349 with AGU025 were analyzed using SnpEff 38 (Tables S17 and S18). The 4 SNPs in AGU025 350 affected four different proteins. Two of these changes, at positions 4,047,235 and 351 4,171,875, affecting the yjcD and aceB genes respectively, were synonymous. The SNP at 352 position 3,789,518, present in only AGU025, affected the mrcB gene, which is a penicillin 353 binding protein. The SNP at position 4,232,523 affects the fre gene --NADPH (FMN 354 reductase) and is present in both AGU025 and AGU020. One unique SNP in AGU025 at 355 position 1577025 was disregarded following visual inspection: It was restricted to the 356 terminal ends of reads at the edge of an uncovered region of the genome, and hence is 357 more likely to result from mismapping than actual biological variation. 358 The above procedures were followed for the treponemal dataset with 31 genomes (Table  359 S19), with exclusion of SNPs filtered for sites identified as recombinant (see above). The 360 unique variants observed in AGU007 were analyzed using SnpEff version 4.3t. The variant 361 type, its annotation, and effect were estimated using SnpSift and snpEff respectively. The 362 two unique variants 39267 and 523975 cause non--synonymous changes in the amino acid 363 residues of genes TPANIC_RS00150 (molecular chaperone GroEL), TPANIC_RS02370 364 (methyl--accepting chemotaxis protein) respectively. 365 366

367
Dating of the Yaws Clade 368 369 As input for TempEst, we used a maximum likelihood tree generated using RAxML 39 and an 370 AGU007 tip date of 1464 CE (oldest sigma 2 values as determined from radiocarbon dating). 371 The relationship between root--to--tip distance and date yielded an R 2 of approximately 0.053 372 (Figure S7