Biomolecular insights into North African-related ancestry, mobility and diet in eleventh-century Al-Andalus

Historical records document medieval immigration from North Africa to Iberia to create Islamic al-Andalus. Here, we present a low-coverage genome of an eleventh century CE man buried in an Islamic necropolis in Segorbe, near Valencia, Spain. Uniparental lineages indicate North African ancestry, but at the autosomal level he displays a mosaic of North African and European-like ancestries, distinct from any present-day population. Altogether, the genome-wide evidence, stable isotope results and the age of the burial indicate that his ancestry was ultimately a result of admixture between recently arrived Amazigh people (Berbers) and the population inhabiting the Peninsula prior to the Islamic conquest. We detect differences between our sample and a previously published group of contemporary individuals from Valencia, exemplifying how detailed, small-scale aDNA studies can illuminate fine-grained regional and temporal differences. His genome demonstrates how ancient DNA studies can capture portraits of past genetic variation that have been erased by later demographic shifts—in this case, most likely the seventeenth century CE expulsion of formerly Islamic communities as tolerance dissipated following the Reconquista by the Catholic kingdoms of the north.

We were able to sample three individuals for ancient DNA analysis, but only one individual, UE2298/MS060 (the "Segorbe Giant"), excavated in 1999 (Supplementary Figure   S2), yielded DNA for genomic analysis. This was a ~25-year-old male, whose burial stood out from the others in the cemetery in several ways. He was 184-190 cm tall, ~20-25 cm taller than any other individual buried at the same site, which led to his designation as "Giant". His grave was the deepest found in the cemetery, and was covered by a layer of rocks that protected the grave and contributed to the particularly good anatomical preservation of his remains 1 . This allowed for a detailed anthropological study, which concluded that he suffered from various nonlethal pathologies, impoverished nutrition and/or high-fever episodes during childhood 2 .
Nevertheless, these episodes of malnutrition and/or disease did not seem to handicap his wellabove-average growth. There is also evidence of porotic hyperostosis in his skeleton, most likely a result of some sort of anaemia 2 .
In order to investigate his diet and mobility patterns, we also collected tooth samples from twelve additional individuals from the necropolis for dietary stable isotopic analysis (Supplementary Table S1). Although the necropolis is dated to the 11 th -13 th centuries CE, all human samples collected for this study came from a context dated to the 11 th century CE.
Additionally, we collected seventeen bone fragments from animals found in the site (although one was later identified as human by ZooMS and therefore excluded from the dataset). The faunal assemblage from Segorbe might post-date the timeframe of the Islamic necropolis of Plaza de Almudín, possibly dating instead to the later Christian period. All samples were stored in the Museo Municipal de Arqueología y Etnología de Segorbe, from where they were selected.
Permissions for sample collection and analysis were agreed by the museum, and granted by the Direcció General de Cultura i Patrimoni (Conselleria d'Educació, Investigació, Cultura i Esport de la Generalitat Valenciana).

Sample processing, aDNA extraction and sequencing
We processed all the archaeological samples in clean rooms in the specialized Ancient DNA Facility at the University of Huddersfield. Surfaces and tools were frequently bleached, cleaned with LookOut® DNA Erase (SIGMA Life Sciences) and exposed to UV-radiation. We used full-body suits, gloves, hairnets and face masks at all stages. We subjected samples to UVradiation for a total of 60 minutes (30 minutes each side), and cleaned the sampling surfaces with air-abrasion using 5µm aluminium oxide powder and a SWAM-Blaster compressed air abrasive system. We used a hobby drill with a diamond-tipped circular saw to sample the roots of the teeth, which were powdered using a Mixer Mill (Retsch MM400; 45 seconds at a frequency of 30 Hz/s), aiming for ~0.15g of fine tooth powder.
We performed DNA extraction following the protocol of Yang et al. 3 with modifications by MacHugh et al. 4 , including four blanks (air, water and two extraction buffer controls) at different stages to control for contamination. We confirmed DNA extraction by DNA quantification with Qubit TM 3.0 Fluorometer (ThermoFisher Scientific), using the Qubit® dsDNA HS Assay Kit.
We performed library preparation following the protocol by Meyer and Kircher (2010), with modifications as described in Gamba et al. 6 and Cassidy et al. 7 . Initially we sequenced one USER TM -treated library on one tenth of an Illumina HiSeq4000 lane (100 cycles) to screen for endogenous aDNA content. We then ran three additional libraries (one of which was non-USER treated) on half an Illumina HiSeq4000 lane, sequenced for 100 cycles. Sequencing was performed at Macrogen, inc. (Seoul, South Korea).

Sequence data processing
We assessed raw read quality by processing fastq files with FastQC v.0.11 8 , and merged paired-end reads and removed sequencing adapters using leeHom 9 . We mapped reads both to the human genome reference (hg19, but modified to include rCRS instead of chrM) and to only the rCRS (revised Cambridge Reference Sequence, GenBank accession code: NC_012920) with BWA (Burrows-Wheeler Aligner) v.0.7.5a-r405 aln 10 (using the optimized settings for aDNA mapping: -l 16569, -n 0.01 and -o 2 11 ) and samse. We used samtools v.1.4 12 to sort bam files (sort), remove PCR duplicates (rmdup) and to exclude reads below mapping quality 30 and minimum read length of 30 base pairs (bp).
We performed quality control of the alignment with QualiMap v.2.2.1 13 . To confirm aDNA authenticity, we (1) checked for contamination (0-0.5%) on the mtDNA sequence of the nontreated library using schmutzi 14 ; (2) confirmed one single mtDNA haplotype across all libraries, consistent with one single donor (Supplementary Table S2 We followed two approaches to avoid the effect of SNP miscalls due to post-mortem misincorporations: (1) we downscaled base quality scores of positions likely affected by postmortem damage using the --rescale option in mapDamage, and (2) we soft-clipped the terminal 3 bp of reads with the trimBam option in bamUtil v. 1.0.14 19 , to control for potential reference bias resulting from downscaling base quality scores that could influence formal tests of admixture and qpAdm modelling. We then merged all libraries using picard MERGESAM (https://github.com/broadinstitute/picard). Published ancient samples included in the autosomal analysis were remapped and reanalysed alongside UE2298/MS060 to prevent possible batch effects due to differences in pipelines.

Analysis of mtDNA and Y-chromosome variation
We retrieved mtDNA variant positions (Supplementary Table S2 We classified Y-chromosome variation into haplogroups using Yleaf 24 , and checked mutations against the ISOGG (International Society of Genetic Genealogy) SNP index (as of June 2018) (Supplementary Table S2). Next, we used pathPhynder 25 to investigate the affinity of our medieval sample with present-day Y chromosomes. This approach is ideal for low-coverage ancient samples such as UE2298/MS060, which has only ~9000 reads aligning to the Y chromosome, because it uses all available variation within a present-day-dataset, including SNPs not yet catalogued in ISOGG, and therefore potentially increasing the number of SNPs overlapping with an ancient individual 25 . For this analysis, we merged a subset of 256 individuals from the Hallast et al. 26 and Solé-Morata et al. 27 datasets and estimated a neighbour-joining tree with the 'ape' 28 R 29 package. We used the pathPhynder software to assign informative SNPs to tree branches, which we subsequently called in the ancient sample, using them to infer the individual's most likely position within the phylogeny.
We compiled a dataset with ~1.2M SNPs for analysis including only ancient samples. We The following tests were performed using two datasets generated using different strategies to deal with post-mortem damage ("mapDamage --rescale" and "soft-clipping"

Phylogeographic analysis of mtDNA haplogroup U6
We built a phylogenetic tree of mtDNA haplogroup U6 based on a total of 330 modern (35 of which are newly published here) and 32 ancient sequences (one being UE2298/MS060) using MtPhyl v.5.003 software (http://eltsov.org) (Supplementary Table S4). The newly reported present-day mitogenomes comprise 19 sequences from Portugal and Spain (included in the dataset described above), 9 from Libyan Berbers (DNA extracted from buccal swabs with QIAamp ® DNA blood mini kit, by QIAGEN ® ), 3 from Italy (DNA extracted from saliva following a standard phenol/chloroform protocol), and 4 sampled in Germany for the KORA study 46 . We amplified and sequenced samples from Libya and Italy using the protocols described above for the Iberian dataset; sequences from Germany were generated as part of the KORA study. We assigned geographic origins of newly reported present-day samples based on the birthplace of the donors' maternal grandmother.
We excluded insertions at positions 309 and 315, indels between positions 515 and 522, and hotspot positions 16182, 16183 and 16519 from the tree, in accordance with PhyloTree recommendations. Node age estimates were calculated from present-day sequences using rho (ρ) statistic and maximum likelihood (ML). We excluded remaining indels (included in the phylogenetic reconstruction), as they are not considered by the models used for age calculations.
We used a mutation rate of one substitution in every 3,624 years, correcting for purifying selection 47 , estimating standard errors as in Saillard et al. 48 . We performed ML estimates of branch lengths using PAML 4 49 , assuming the HKY85 mutation model with gamma-distributed rates (discrete distribution of 32 categories) and considering two partitions so as to differentiate the fast-evolving hypervariable segments (HVS) I (16024-16400 bp) and II (44-340 bp) from the rest of the molecule.
We computed Bayesian skyline plots (BSPs) 50 for the complete modern U6 dataset using BEAST v.1.8.0 51 (100,000,000 interactions with a burn-in of 10,000,000 steps), applying a relaxed molecular clock with a mutation rate of 2.514x10 -8 mutations/site/year (previously calculated for U6 52 ), assuming a 28-year generation time 53 . We combined three independent runs with LogCombiner v.1.8.0, included in BEAST package.

Mobility isotope analysis: oxygen
We selected samples for enamel carbonate analysis based on tooth type (Supplementary   Table S1), avoiding teeth that would be affected by nursing enrichment where possible 54 . We avoided the top ~2mm of the tip of the crown where enamel forms first when sampling second molars and premolars to do our best to ensure that the majority of the bulk sample was made up 9 of enamel formed after nursing had probably ceased, although with possible 1 st /2 nd molars and 1 st /2 nd premolars some nursing effect may be present if these are indeed the earlier forming teeth.
Historical sources indicate that medieval populations were typically breastfed to at least two years of age 55 .
For enamel carbonate sampling and pre-treatment, we followed that described by

Identification of animal bone by collagen peptide mass fingerprinting (ZooMS)
ZooMS is a qualitative analytical technique for taxonomic identification of archaeological bone through collagen peptide mass fingerprinting 64,65 . Analysis was undertaken on a sub-sample of the extracted collagen from all seventeen animal samples. Approximately 0.5mg of extracted collagen was added to 50μL of AmBic (ammonium bicarbonate buffer, pH 8.0) and vortexed.
0.4μg of trypsin (Promega) was added and the samples were digested overnight at 37°C.
Afterwards, samples were centrifuged for 1 minute at 13,000rpm, following which 1μl of 5% TFA was added to stop enzymatic digestion. A C18 ZipTip (Agilent) was used for peptide extraction, and eluted using 50μl of 50% ACN in 0.5% TFA.
MALDI-TOF-MS analysis followed previously established protocols 64 , but using 1μl sample solution and 1μl matrix (α-cyano-hydroxycinnamic acid) solution. These were spotted in triplicate along with calibration standards onto a Bruker ground steel target plate and run on a Bruker ultraflex III MALDI TOF/TOF mass spectrometer. Spectra were analysed using the mMass software 66 . We assigned taxonomic identifications based on the presence of unambiguous m/z markers (Supplementary Table S12) from published sources 64,67,68 . One bone fragment, GOGa03, was determined to be a human and was not included in the faunal isotopic study.

Phylogeography of mtDNA haplogroup U6
The study of individual UE2298/MS060 prompted a renewed phylogeographic study of mtDNA haplogroup U6 (Supplementary Figure S4)  lineage in mainland Iberia. Today, U6a is found at very low frequencies in the Iberian Peninsula.
In our large dataset of complete modern mtDNA sequences (n=1104 from mainland Spain, the Balearic Islands, and mainland Portugal), U6a has an average frequency of 1.6% in the peninsula, with a peak of 3.57% in the south of Spain (Figure 1b)   , however a mixed C3/C4 diet is also a possibility for Africa 108 .

Diet
There is a general problem of equifinality when analysing consumer isotope values in the The human data, however, lack a significant positive correlation between δ 13 C and δ 15 N (r 2 =0.086, Pearson's r=0.33) usually characteristic of mixed terrestrial C3/marine protein intake 111 . Any marine component to the diet is likely to be overemphasised in δ 15 N values as some proportion of the carbon in collagen will reflect the consumption of carbohydrates and lipids in addition to protein (see Craig et al. 113 for discussion). A weak correlation between d 13 C and δ 15 N 16 values for collagen in similar contexts in Spain has therefore been taken to indicate some direct consumption of C4 crops by humans in a C3/marine protein-dominated system 114,115 . Given that humans from Plaza del Almudín possess collagen isotope values within the range of medieval populations from Galicia, who subsisted on both marine fish and C4 crops in addition to C3 terrestrial resources 115 , it is probable that this dietary combination was also consumed by the human population at Sergobe. In addition, enamel carbonate values that reflect whole diet (lipids, carbohydrates, proteins) are higher at Sergobe (δ 13 Cap mean = -6.5‰, n=8) than published apatite values for medieval populations from Spain that are interpreted as consuming C4 crops (Écija δ 13 Cap mean -11.5‰, n=40 116 ; Eivissa δ 13 Cap mean -9.6‰, n=6 114 ), which altogether serve to support this hypothesis.
Although he cannot be classed as an outlier, the collagen of UE2298/MS060 shows more negative d 13 C and lower d 15 N values than the majority of the humans from this assemblage and thus it is likely that the diet of this individual had a lower input of marine protein/C4 crops compared to others among the Segorbe population (Supplementary Figure S11).

Supplementary Figures
Supplementary Figure S1