Genome-wide patterns of local adaptation in Drosophila melanogaster: adding intra European variability to the map

Signatures of spatially varying selection have been investigated both at the genomic and transcriptomic level in several organisms. In Drosophila melanogaster, the majority of these studies have analyzed North American and Australian populations, leading to the identification of several loci and traits under selection. However, populations in these two continents showed evidence of admixture that likely contributed to the observed population differentiation patterns. Thus, disentangling demography from selection is challenging when analyzing these populations. European populations could be a suitable system to identify loci under spatially varying selection provided that no recent admixture from African populations would have occurred. In this work, we individually sequence the genome of 42 European strains collected in populations from contrasting environments: Stockholm (Sweden), and Castellana Grotte, (Southern Italy). We found low levels of population structure and no evidence of recent African admixture in these two populations. We thus look for patterns of spatially varying selection affecting individual genes and gene sets. Besides single nucleotide polymorphisms, we also investigate the role of transposable elements in local adaptation. We concluded that European populations are a good dataset to identify loci under spatially varying selection. The analysis of the two populations sequenced in this work in the context of all the available D. melanogaster data allowed us to pinpoint genes and biological processes relevant for local adaptation. Identifying and analyzing populations with low levels of population structure and admixture should help to disentangle selective from non-selective forces underlying patterns of population differentiation in other species as well.

number of SNPs per gene and Z ST was eliminated (r = 0.01121; p-value = 0.1981) ( Figure   2 8 1 S1B, Supporting information). A total of 657 genes in the top 5% tail of the empirical Z ST 2 8 2 distribution were considered as candidate differentiated genes. genes was performed using topGO v.2. 16.0 (Alexa et al. 2006). We also performed gene set definition. In addition to the conventional Fisher's exact test or the Kolmogorov-Smirnov test 2 9 0 (classic), we used three other algorithms: elim, weight and weight01 to identify significantly 2 9 1 enriched GO biological processes. While in the classic enrichment test each GO term is tested independently, in the other three algorithms the hierarchical structure of the GO is taken into 2 9 3 account so that the enrichment score incorporates information about the whole GO topology. This eliminates local similarities and dependencies between GO terms and reduces the high 2 9 5 false-positive rate that is usually attained with classic GO enrichment tests (Alexa et al. 2006).

9 6
The elim method iteratively removes genes annotated to significant GO terms from their 2 9 7 ancestors, following a bottom-up approach. This allows for the identification of more specific 2 9 8 nodes in the GO graph. The Weight algorithm identifies the locally most significant terms in 2 9 9 the GO graph (and not necessarily the most specific terms). Weight01 is a combination of 3 0 0 weight and elim algorithms. We decided to look at the results yielded by elim, weight and 3 0 1 weight01 in order to explore which areas in the GO graph contain enriched GO terms instead 3 0 2 of finding the precise GO terms that are enriched. Only GO terms with 5 or more 3 0 3 differentiated genes were considered. We considered that a GO term was significantly Using the presence/absence information provided by Tlex2 we created a multi-sample VCF S4A, Supporting information). We found that the levels of differentiation were significantly 4 0 0 heterogeneous among chromosomal arms (ANOVA p-value <2e-16; Figure S4B, Supporting 4 0 1 information). Therefore, we used the empirical F ST distribution of each chromosomal arm to 1% or top 0.05% tails of the distribution were considered as candidates ( Figure S4C, Supporting information). analysed. In(2R)NS was present at low frequencies in both populations, while In(3R)P and 4 1 0 In(3L)P were either absent or present at low frequency (Table S4, Supporting information).

1 1
We thus only considered In(2L)t for the rest of this work. was significantly higher than the median F ST outside this inversion: ). However, we did not find a significant accumulation of candidate 4 1 8 differentiated loci inside the inversion ( Figure 6A).

1 9
We then analyzed the functional class of the differentiated SNPs. We first compared the the three categories that showed a more significant enrichment of candidate SNPs (Table S5, Supporting information). To identify the biological processes likely to be involved in population differentiation, we genes with a Z ST score in the top 5% of the distribution (Table S6, Supporting information).

3 9
We then performed GO enrichment analysis using an algorithm based on the classical and ovarian follicle cell migration, respectively. Interestingly, these processes are associated

5 7
To identify the most likely candidate SNPs to play a role in local adaptation, we focused on 4 5 8 the 22 genes that belong to the four GO terms that were consistently found to be 4 5 9 overrepresented across algorithms: response to insecticide, cuticle pigmentation, neuropeptide 4 6 0 signalling pathway, and cell wall macromolecule catabolic process (Table 1). For each gene 4 6 1 in these four significant GO terms, we identified the SNP with the highest F ST (Table 1). We In(2L)t inversion. However, this gene has been previously identified when analysing different The strongest F ST signal in the genome corresponds to a coding change in the Acetylcholinesterase (Ace) gene that is known to confer insecticide resistance and has been  mutation was also identified for Cyp6g1 (Table 1). This gene has also been repeatedly could represent an additional Cyp6g1 genetic variant involved in insecticide resistance ( Table   4  8  6 1). Finally, we found that two other insecticide resistance related genes Cyp6a8 and Gr8a also 4 8 7 have coding changes while Cyp6g2 has a mutation in the core promoter (Table 1).

8 8
Regarding cuticle pigmentation and neuropeptide signalling, all changes were non-coding Lehner B (2013) Genotype to phenotype: lessons from model organisms for human genetics. Nat Rev Genet 14, 168-178.