Retroviral analysis reveals the ancient origin of Kihnu native sheep in Estonia: implications for breed conservation

Native animal breeds constitute an invaluable pool of genetic resources in a changing environment. Discovering native breeds and safeguarding their genetic diversity through specific conservation programs is therefore of high importance. Endogenous retroviruses have proved to be a reliable genetic marker for studying the demographic history of sheep (Ovis aries). Previous research has revealed two migratory episodes of domesticated sheep from the Middle East to Europe. The first episode included predominantly ‘primitive populations’, while the second and most recent is hypothesised to have included sheep with markedly improved wool production. To examine whether the recently discovered Kihnu native sheep in Estonia have historically been part of the first migratory episode and to what extent they have preserved primitive genetic characters, we analysed retroviral insertions in 80 modern Kihnu sheep and 83 ancient sheep from the Bronze Age to Modern Period (850 BCE–1950 CE). We identified that the Kihnu sheep have preserved ‘primitive’, ‘Nordic’, and other ‘ancient’ retrotypes that were present both in archaeological and modern samples, confirming their shared ancestry and suggesting that contemporary Kihnu native sheep originate from the first migratory episode. However, over the course of history, there has been a gradual decrease in the frequency of primitive retrotypes. Furthermore, Kihnu sheep possessed several ‘novel’ retrotypes that were absent in archaeological individuals, but were shared with improvement breeds, suggesting recent crossing within the last two centuries. To preserve these ancient lineages, our results are being applied in the conservation program of the Kihnu Native Sheep Society.

-1. Archaeological sites of ancient sheep samples (n=83) analysed in this study. The samples of Kihnu sheep (n=80) were collected from the primary population and from the collection herds in Kihnu and Manija islands and in mainland south-west Estonia (Pärnu County). The Kihnu native sheep breed is named after the Kihnu Island. Map by Geoportal [22]. Image processing: Adobe Illustrator CS5 v.15 [23].

S2.1.2. Sampling and extraction protocol for modern DNA
The blood samples were collected to 9 mL vacutainers with EDTA (Greiner BioOne, Kremsmünster, Austria) and stored at -20°C until DNA extraction. DNA was extracted from 200 μl of blood using the High Pure PCR Template Preparation Kit (Roche Diagnostics), following the manufacturer's protocol. DNA extraction was conducted at the Department of Zoology, University of Tartu, Estonia. PCR and post-PCR work was conducted at BioArCh, University of York, United Kingdom.

S2.1.3. Sampling and extraction protocol for ancient DNA
All pre-PCR work was conducted in the ancient DNA (aDNA) laboratory, separate from the PCR and post-PCR laboratories at BioArCh, University of York, United Kingdom. Sample preparation and DNA extraction followed the silica spin-column protocol [1] with slight modifications. Nondisposable equipment was decontaminated between the samples, and latex gloves and protective clothing were worn when handling the specimens.
Thirteen sheep bones were subsampled using a sterile saw blade. Subsampled bones were soaked in 6% sodium hypochlorite for 5 min, rinsed in HPLC water three times, and UV irradiated for 20 min on two sides. The bone piece was ground in a mortar using a pestle, and 100-150 mg of bone powder was weighed out for the following extraction.
To demineralize the sample, 1 ml of lysis buffer with a concentration of 0.5M EDTA and 0.5 mg/ml Proteinase K was added and pre-digested at 37°C for 45 min. After that, the supernatant was removed and another 1.8 ml of the lysis buffer was added to the sample and digested at 50°C overnight. After incubation was complete, the sample was centrifuged at 13000 rpm for 5-10 min until the bone powder was separated from the buffer solution. To concentrate the sample, the supernatant was transferred to an Amicon™ Ultra Centrifugal Filter 10K MWCO (Merck, Darmstadt, Germany) and centrifuged at 4400 rpm for at least 90 min, until the preferred concentrated volume left in the membrane was 50-100 μl. QIAGEN QiaQuick MinElute kits were used for DNA purification. To bind the DNA, around 500-650 μl (5X the sample volume) of PB buffer was added to the sample, transferred to a MinElute column, and centrifuged for 1 min at 6500 rpm. The flow-through was discarded. To wash the DNA, 500 μl of PE buffer was added to the column, centrifuged for 1 min at 6500 rpm, and flow-through discarded. Another 500 μl of PE buffer was added to the column, centrifuged for 1 min at 13000 rpm, and flow-through discarded. To dry the column, it was centrifuged for 1 min at 13000 rpm and then transferred to a collection tube. To elute the DNA, 30 μl of heated (65°C) EB buffer was loaded to the column, incubated at room temperature for 5-10 min, and centrifuged for 1 min at 13000rpm. For the second elution, another 80 μl of EB was loaded to the column, incubated, and centrifuged. DNA was stored in safe-lock tubes at -20°C. For this study, the second elution of DNA was used for PCR amplifications.
Work was conducted at BioArCh, Department of Archaeology, University of York, United Kingdom, following a conventional destructive ZooMS method [2]. Briefly, the specimens were sampled for 15-30 mg of small bone pieces, demineralized in 0.6M hydrochloric acid, and washed once in 200 μl 0.1M sodium hydroxide and trice in 200 μl 50 mM ammonium bicarbonate solution pH 8.0. After one-hour incubation in 100 μl ammonium bicarbonate at 65°C, 50 μl of a sample was digested with 1 μl trypsin by incubating overnight at 37°C. Following incubation, 1 μl of 5% trifluoroacetic acid solution (TFA) was added to terminate trypsin activity. Peptides were extracted using a C18 ZipTip pipette tip (Millipore) treated with 0.1% TFA washing solution and 50% acetonitrile / 0.1% TFA conditioning solution, and then eluted with 50 μl conditioning solution. One microliter of the sample was spotted on a ground steel plate in triplicate, mixed with 1 μl αcyano-4-hydroxycinnamic acid matrix solution. The plate was run on a calibrated Bruker Ultraflex III MALDI TOF/TOF mass spectrometer. Three spectra of each sample were averaged and analysed in mMass [3,4,5]. Individual peptides were identified manually according to previously published markers [6,7,8].

S2.1.5. Radiocarbon dating
Most samples were dated based on archaeological context, that is, through associated finds and site stratigraphy. Five samples with unclear context were radiocarbon dated by AMS in SUERC Radiocarbon Dating Laboratory and one sample in the Leibniz Laboratory for Radiometric Dating and Stable Isotope Research, Christian-Albrechts-University of Kielall calibrations according to IntCal20 atmospheric curve [9]; OxCal v.4.4.2 [10]; r:5. All six samples gave a successful and expected result (Supplementary Table S1-1 online). The radiocarbon date for sample 142OaKar6 is discussed in more detail in the main text (see Results -Retrotype distribution), and therefore, we present the calibration curve here ( Figure S2-2). For 14 samples, dating results have previously been reported in Rannamäe et al. [11,12], but recalibrated here according to IntCal20 atmospheric curve [9]; OxCal v.4.4.2 [10]; r:5.

S2.1.6. Primers and conditions for polymerase chain reaction (PCR)
We used PCR primers designed to amplify short degraded DNA fragments. Some primers had been previously published [13] and some were designed specifically for this study (Table S2-1). The same primers were used for both ancient and modern DNA. With a set of three primer pairs, three regions of each provirus were targeted: the 5′ and 3′ long terminal repeats (LTRs) including the genomic flanking region of the host; and the empty locus (EL), which is the empty genomic insertion site, that is, only the genomic flanking regions of the host (for schematic representation of the PCRs, see [14,15]). Essentially, only the presence of one of the two LTR regions (5′LTR or 3′LTR) would be enough to confirm the provirus insertion in the host genome: if 5′LTR is present/absent, then 3′LTR has to be the same and vice versa. Based on the test PCRs on modern samples, we decided to focus on amplifying the 5′LTR region and the EL, because the 3′LTR seemed to give partially unclear results (smear in gel images). However, if it was necessary to authenticate the presence/absence of the 5′LTR, we also tested the 3′LTR.

S2.1.7. Authentication
Contamination of PCR was monitored by the use of negative controls for every primer pair in each PCR. Preparation of PCR reactions was done separately from the post-PCR work.
To test the quality of the primers, we amplified 5′LTR and EL of the 'ancient' enJSRV-6 in eight samples. enJSRV-6 is an 'ancient' enJSRV and thus fixed in all sheep [14,16]. For this test, we used both previously published primers [15] and those from this study. All results were positive as expected. The primers designed for this study, suitable to amplify shorter fragments of the aDNA, gave even clearer gel images than the previously published primers and therefore we decided to use these primers for the rest of the modern DNA samples as well. To further verify the performance of the 5′LTR primers, we chose seven samples to repeat the PCR with the 3′LTR region. Consequently, all results agreed. Five PCR products were sequenced (Table S2-2), confirming the correct target region. For all the rest of the modern samples, one PCR was agreed to be sufficient for a positive amplification, while all negative amplifications were confirmed with a second PCR to monitor for allelic dropout as a cause of false negatives.
In ancient samples, we mostly focused on 5′LTR instead of the 3′LTR, because the latter gave occasionally poor results, especially for enJSRV-7 and enJS5F16. Poor function of the 3′LTR primers could derive from factors like duplication of the genomic DNA at the site of provirus integration or identical LTR regions within and between proviruses [14], but also of poor preservation of the DNA and difficulties in targeting ancient nuclear DNA with designated primers. For enJSRV-18 and enJSRV-8, the 3′LTR primers worked well, authenticating the results gained from 5′LTR. All the rest of the 5′LTR amplifications were repeated at least twice. Furthermore, 25 PCR products were sequenced to confirm some of the amplifications (Table S2-2). Only samples with high-confident results were selected for the analysis.
It is important to note that while the results from modern DNA samples can be considered reliable for the presence/absence of each provirus (supported by the quality of the DNA and several steps of authentication), results may be less reliable with ancient samples. In aDNA samples, it is almost impossible to confirm whether the amplification was negative because of the absent insertion or due to DNA degradationeven when PCRs were repeated several times. Nevertheless, we consider our results for the ancient samples reach a reliable level of confidence, not just because of the authentication steps taken, but also because of the distribution of the enJSRVs and retrotypes are consistent with previously published data. 6

S2.2.1. Zygosity
As the sheep is a diploid organism, the retrovirus insertion could be present in both chromosomes (homozygous) or in only one of them (heterozygous). In case of heterozygous individuals, we considered the insertion dominant (following [15,17]). Presence/absence of the insertion was shown as 1 or 0, respectively (Table S2-3). In both modern and ancient sheep, for enJSRV-18 and enJS5F16, both homozygous and heterozygous individuals were found, while for enJSRV-7, all individuals were heterozygous. For enJSRV-8, both homozygous and heterozygous individuals were present among Kihnu population, while in ancient samples this retroviral integration was not present in any time period (Tables S2-4, S2-5).

S2.2.2. Retrotype examples among the Kihnu native sheep
There were 18 maternal lineages in the primary population of the Kihnu native sheep. Currently (in August 2020), 12 of these lineages are still present (i.e., their living descendants) and altogether 480 breeding animals are being used in the breeding program.
Kihnu native sheep shared eight retrotypes with the ancient individuals from the Late Bronze Age to the Modern Period (R0-R7). Of these, R7 is related to the Modern Period and possibly to improvement breeds. The remaining seven retrotypes (R0-R6) we considered inherent to the indigenous population in Estonia and thus worth preserving in the current breeding program. Interestingly, individuals from these seven retrotypes feature good primitive morphological traits, which were already being valued and selected in the breeding program. Among the sheep tested for this study, some were from the primary population and some were their descendants (Supplementary Table S1-2 online). Importantly, many of those descendants belong to primitive or ancient retrotypes, indicating good selection strategy already in the earlier days of the breeding program.
Knowing the retrotypes of the tested breeding animals and combining this information with preferred phenotypic traits allows even better selection in the future.
Kihnu native breed is characterized by small hardy body build, short tail, chance of horns in both males and females, occasional occurrence of wattles, double layered coat of hairy outercoat and woolly undercoat, primitive fleece structure, variety of colours, prolificacy and strong maternal instinct, adaptation to local environmental conditions, and resistance to diseases and parasites. In general, native sheep have developed during hundreds of years in mutual relationship with the local environment and their morphological traits are in conjunction with local adaptations. Moreover, indigenous animals have adapted to use vegetation most economically and today the native sheep are helping to preserve semi-natural environments and biodiversity. Porter et al. have stated that "the difficulty of realizing and developing these traits, without incurring the inherent dangers of commercial exploitation, is a problem that must be addressed in the 21 st century" [18]. Therefore, it is very important that all these factors together with morphological traits and molecular information are considered in order to contribute to the conservation of overall diversity of the northern European sheep breeds. sheep; however, in their statement they refer to rather old studies [21]. In a later publication of Mason's Encyclopedia, a tail length of 14 cm or less (as in the Soay) to 25 cm (as in the Hebridean) and with 12-14 caudal vertebrae are stated as suitable measures for the NST [18]. Interestingly, it is also noted that the number of caudal vertebrae in non-short-tailed sheep would be 20 or more [18], leaving a kind of 'grey' area in-between the number of 14 and 20. Moreover, the length of the tail is not considered as a precise indicator of the number of the caudal vertebrae [18]. According to the records of the Kihnu Native Sheep Society, live Kihnu sheep averagely possess a 21-22 cm long tail, which fits well into the limits described by Porter et al. [18]. But, on the other hand, our preliminary observations on the number of caudal vertebrae in Kihnu sheep show it to be in average between 14-16, and thus leading to another discrepancy in the morphology of the tail. Nevertheless, even though the tail in the Kihnu sheep is not as short as in several other NST breeds, it is very much shorter than in modern improvement breeds.