The origin of sheep settlement in Western Mediterranean

The arrival of Neolithic culture in North Africa, especially that of domestic animals, has been documented essentially from archaeological records. Because data on sheep are scarce, we studied the genetic relationship between Moroccan and other Mediterranean sheep breeds by sequencing 628 bp of the mitochondrial DNA control region in 193 Moroccan individuals belonging to six breeds, together with 652 sequences from other breeds in Europe and the Middle East. Through Network analysis and an original phylogenetically derived method, the connection proportions of each Moroccan breed to foreign ones were estimated, highlighting the strong links between Moroccan and Iberian breeds. The first founders of the Moroccan sheep population originated 79% from Iberia and 21% from a territory between the Middle East and Africa. Their calculated expansion times were 7,100 and 8,600 years B.P., respectively. This suggests that Neolithization was introduced through a double influence, from Iberia and via another route, possibly Oriental or Sub-Saharan. The consequences of the environmental changes encountered by founders from Iberia were assessed using different neutrality tests. There are significant selection signatures in Moroccan and European breeds settled at high altitudes, and an erosion of nucleotide diversity in Moroccan breeds living in arid areas.


Newick-Extra results
Suppl. S4.C. PCR amplification and sequencing of the mitochondrial DNA D-loop region.
The three primers used for both polymerase chain reaction (PCR) and sequencing of the D-loop region are indicated in Table 1. These primers were designed using the software Primer3 (at primer3.ut.ee), starting from the complete mtDNA reference sequence of Ovis aries in GenBank (NC_001941.1). The first pair, D1F/D1R, amplifies the first 655 bp of the mtDNA control region, from 15391 to 16046, and the second pair, CR1F/D1R, amplifies 634 bp, from 15412 to 16046 (Suppl. S4.C. Table 1). Amplification reactions were carried out in a final volume of 20 μL of PCR mix containing 70-80 ng of genomic DNA, 4 μL of buffer (High-Fidelity), 200 μM of each deoxynucleotide triphosphate (dATP, dCTP, dGTP and dTTP), 10 pmol of each primer and 0.2 μL of Taq polymerase (Phusion High-Fidelity from ThermoFisher Scientific, Waltham, MA, USA). PCR thermal conditions are specified in Suppl. S4.C. Table 2. PCR products were checked by electrophoresis on a 1.5% agarose gel stained with ethidium bromide. Amplicons were treated with ExoSAP-IT to remove the remaining dNTPs and primers, as per the manufacturer's instructions (ThermoFisher Scientific, Waltham, MA, USA). Sequencing reactions were conducted using the Big-Dye Terminator Cycle Sequencing kit (Applied Biosystems) according to the thermal conditions summarized in Suppl. S4.C. Table 3. The products were purified on Sephadex columns (Sephadex™ LH-20, Sigma-Aldrich, St. Louis, MO, USA). Purified products were analyzed on an ABI Prism 3100 Genetic Analyzer (Applied Biosystems).
Suppl. S4.C. To briefly explain our strategy, let us imagine an ancestral population composed of individuals belonging to two haplogroups, A and B, including for example the individuals "a" and "b" belonging to haplogroups A and B, respectively. This population is assumed to have split into subpopulations 1 and 2, containing the progeny a1 b1 and a2 b2, respectively. The individuals a1 and a2 share mutations acquired by their ancestor a, and likewise b1 and b2 share those of their ancestor b. To quantify the proximity between subpopulations, we took into account the number of sister sequences retrieved in each pairwise combination, using the information contained in the topology of a phylogenetic tree. In order to explore the phylogenetic tree exhaustively, we constructed a program that (i) reads the topology expressed as a parenthetic tree (Newick), (ii) records all the common points between the breeds in any combination, at the level of the terminal branches, and (iii) summarizes the data in an Excel™ table. For example, in the simple parenthetic tree of Figure 1, (A, B, C), (D, E), we retained the following information: A has one point in common with B and one with C, B has one with C, and D has one with E. This program, named Newick-Extra and written in R, is available on request from the authors. Two levels of proximity were considered. The first corresponds to strict sister sequences (terminal branches), and the second to two series of embedded sequences. The resulting matrix gives, for each breed pair, the number of similarities; these counts were later transformed into proportions, giving an asymmetric matrix. The latter can be treated with distance measures applied to ordered variables.
Suppl. S4.D. Figure 1: Counting the two levels of connection numbers.
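The level 1 counting described above can be illustrated with a short sketch. It is written in Python for illustration and is not the authors' Newick-Extra program, which is in R and available from them on request. Several points are assumptions made only for this example: the function names, the convention that a sequence label starts with its breed name followed by an underscore, the restriction to between-breed pairs, and the row-wise normalization used to turn counts into proportions. Level 2 (embedded sequences) is not covered. The Figure 1 tree is wrapped in an outer pair of parentheses so that it forms a valid Newick string.

```python
from collections import Counter
from itertools import combinations


def parse_newick(text):
    """Parse a Newick string into nested lists; leaves are label strings."""
    text = text.strip().rstrip(";")
    pos = 0

    def read_label():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        # Drop any branch length after ':' and surrounding whitespace.
        return text[start:pos].split(":")[0].strip()

    def read_node():
        nonlocal pos
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume '('
            children = [read_node()]
            while text[pos] == ",":
                pos += 1
                children.append(read_node())
            pos += 1  # consume ')'
            read_label()  # skip internal node label and branch length
            return children
        return read_label()

    return read_node()


def sister_pairs(node):
    """Yield every pair of leaves attached directly to the same node."""
    if isinstance(node, str):
        return
    leaves = [child for child in node if isinstance(child, str)]
    yield from combinations(leaves, 2)
    for child in node:
        yield from sister_pairs(child)


def breed_of(label):
    """Hypothetical naming convention: 'Breed_index' -> 'Breed'."""
    return label.split("_")[0]


def connection_counts(tree):
    """Count sister pairs between different breeds, in both directions."""
    counts = Counter()
    for a, b in sister_pairs(tree):
        breed_a, breed_b = breed_of(a), breed_of(b)
        if breed_a != breed_b:
            counts[(breed_a, breed_b)] += 1
            counts[(breed_b, breed_a)] += 1
    return counts


def connection_proportions(counts):
    """Divide each count by its row total (assumed normalization)."""
    totals = Counter()
    for (row, _), n in counts.items():
        totals[row] += n
    return {pair: n / totals[pair[0]] for pair, n in counts.items()}


# Figure 1 example: pairs A-B, A-C, B-C and D-E are retained.
figure_tree = parse_newick("((A, B, C), (D, E));")
figure_pairs = list(sister_pairs(figure_tree))

# Toy tree with hypothetical breeds X, Y and Z (not real data).
toy_tree = parse_newick("((X_1, Y_1), (X_2, Y_2, Z_1), (X_3, Z_2));")
toy_props = connection_proportions(connection_counts(toy_tree))
for (row, col), p in sorted(toy_props.items()):
    print(f"{row} -> {col}: {p:.2f}")
```

Because each row is divided by its own total, the proportion of X connected to Y generally differs from the proportion of Y connected to X, which is one way the asymmetric matrix mentioned above can arise.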

Matrix of Level 2
It should be noted that treating the topology information at level 2 produced a new table of affinities between breeds (data not shown). The cluster analysis generated from this table scattered the Moroccan breeds across three groups, among Italian and Iberian breeds. As a result, level 2 gives a confusing picture relative to the Network analysis.