Introduction

Human mitochondrial DNA (mtDNA) variation has been intensely scrutinized in European populations since the late 1980s.1, 2, 3 The considerable data accumulated so far have allowed researchers to reconstruct the phylogeography of the principal lineages and make inferences concerning key events in the prehistory of European populations.4 However, mtDNA variation has seldom been studied at the micro-geographic level through dense sampling,5, 6, 7 even though exploring how genetic diversity is distributed among geographically close populations may complement studies that aim to identify patterns of variation on a larger scale (macro-regional, continental or global), by identifying local areas of genetic discontinuity and populations with peculiar gene pools.

In this context, Trentino and its populations may be regarded as a valuable case study. In fact, its geographical complexity and linguistic diversity provide us with the opportunity to assess the role of physical and cultural factors on human genetic variation in a micro-geographic framework. It is a relatively small area (6200 km2) located in North-Eastern Italy (Trentino-Alto Adige region) and a significant portion of its territory overlaps the eastern alpine system (Figure 1). Its complex territory, with 60% of the land located at an altitude >1000 m above sea level, may have influenced human mobility. However, the presence of numerous valleys, the Adige river and the Garda lake has always provided communication routes, favoring population movements and interactions. Such ambivalence is also reflected in the contrast between the central-western part of Trentino (for example, Adige, Non and Sarca valleys) where flat valley bottoms are more suitable for human occupation compared with the eastern zone (for example, Primiero, Fassa valleys and Lagorai group), which is composed mostly of mountains with steep slopes and thickly wooded areas.8, 9, 10 Trentino is also characterized by the presence of groups with a noticeable linguistic diversity. 
It hosts the Romance-language communities of Italian speakers (500 000 people), who speak five different dialects (Semi-ladino, Trentino occidentale, Trentino orientale, Trentino centrale and Trentino dell'Avisio),11, 12 and the Romance-speaking group of Ladins (belonging to the Rhaetian sub-branch) from the Fassa valley (about 9000 people),13 which is part of the Dolomitic Ladins (Ladìn) settled in other valleys of the eastern Italian Alps (Gardena and Badia in south Tyrol and Fodom and Ampezzo in Veneto).14 Finally, it hosts the German-speaking group of Cimbri, who are thought to derive from the Bavarian populations that colonized a vast territory, including some eastern Italian valleys (Veneto, 1053 AD) and Lavarone, Folgaria and the Luserna plateau in Trentino15 (1216 AD). Their language (Zimbar) belongs to the Southern Bavarian-Austrian dialects and is today spoken only in Luserna (300 inhabitants).16

Figure 1

Geographic location of the populations from Trentino analyzed in this study. P, plateau; V, valley. Linguistic groups: Italian-speaking [Semi-ladino (Non and Sole valleys), Trentino occidentale (Giudicarie valley), Trentino orientale (Primiero and Fersina valleys), Trentino centrale (Adige valley) and Trentino dell'Avisio (Fiemme valley) dialects]; Ladin-speaking (Fassa valley); German-speaking (Luserna plateau).

In this study, we investigate the variation of mtDNA in nine populations from Trentino, which are distributed throughout the territory and cover most of its linguistic diversity. Our objectives are to evaluate the extent of genetic variation across the Trentino region and understand whether and how geographic factors and linguistic diversity could have played a role in determining the genetic structure and the demographic profile of the populations under study. The results obtained highlight the importance of historical, cultural and environmental knowledge when studying the genetic structure of human populations, even when focusing on small geographical areas.

Materials and methods

Sampling and laboratory analyses

The data set comprises 393 individuals from nine locations: Adige valley (central region), Giudicarie, Non and Sole valleys (western region), Fassa, Fersina, Fiemme and Primiero valleys and the Luserna plateau (eastern region; see Figure 1). Sampling sites are located at an average distance of 71 km. In the framework of the BIOSTRE project (Biodiversity and history of populations from Trentino: http://laboratoriobagolini.it/en/search/projects/biostre/), blood samples and buccal swabs were collected from apparently healthy and unrelated individuals selected according to the place of birth of the sampled individual and of their parents and grandparents (at least three generations). The procedure and informed consent were reviewed and approved by the ‘Comitato Etico per la Sperimentazione con l'Essere Umano’ of the University of Trento (http://www.unitn.it/en/ateneo/2640/university-human-research-ethics-committee).

The DNA was extracted using the ‘Nucleic Acid Isolation System’ on a QuickGene-810 instrument, following the standard protocols for blood and swab samples (FUJIFILM; http://products.fujifilm.eu/products/life_science/nucleic_acids/index.html).

Following the protocol of Quintans et al.,17 we analyzed the variation at 17 single-nucleotide polymorphisms of the mtDNA coding region (3010, 3915, 3992, 4216, 4336, 4529, 4580, 4769, 4793, 6776, 7028, 10 398, 10 400, 10 873, 12 308, 12 705 and 14 766) in order to classify samples into 21 principal informative haplogroups (H*, H1, H2, H3, H4, H5, H6, H7, HV0, V, HV*, J/T, J1, J2, U*, K, R*, M*, N*, I and L3*). A region of 651 nucleotides of the control region (from 16 033 to 114) was sequenced, encompassing the hypervariable region 1 and part of region 2 [HVR-1 (from 16 033 to 16 569) and HVR-2 (from 57 to 114)], with positions numbered according to the revised Cambridge Reference Sequence (rCRS, GenBank accession NC_012920). A fragment of about 1.4 kb was amplified using the primers L15564 and H429, whereas L15990 and H155 were used as internal primers for sequencing (Castrì, unpublished; Supplementary Table S1). To classify the sequences belonging to the K haplogroup into the two sub-haplogroups K1 and K2, three additional positions of the HVR-2 (146, 152 and 263) were investigated by using the additional primer H429 (Supplementary Table S1). Fragment sizes were detected by the ABI PRISM 310 Genetic Analyzer (Applied Biosystems, Life Technologies Corporation, Carlsbad, CA, USA).
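Because mtDNA is circular, the sequenced stretch runs past the end of the reference numbering and continues from position 1. The short sketch below only checks the arithmetic behind the 651-nucleotide figure; the function name `circular_span` is hypothetical, and the genome length of 16 569 is the length of the rCRS.

```python
MT_LENGTH = 16569  # length of the rCRS (NC_012920)


def circular_span(start: int, end: int, genome_length: int = MT_LENGTH) -> int:
    """Count positions from start to end (1-based, inclusive) on a circular genome."""
    if start <= end:
        return end - start + 1
    # The interval wraps around the origin: tail of the genome plus its head.
    return (genome_length - start + 1) + end


print(circular_span(16033, 114))    # whole sequenced region -> 651
print(circular_span(16033, 16569))  # part lying before the origin -> 537
```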

The quality of the mtDNA data sequences was checked using the method suggested by Bandelt et al.18 (see Supplementary Table S2). The mtDNA sequences reported herein have been submitted to GenBank (accession numbers JQ623511-JQ623902).

Statistical analyses

To evaluate the overall level of interpopulation variation, we calculated the Fst parameter among the nine populations analyzed using the Arlequin software (version 3.5.1.2).19

The genetic relationships among populations were explored by different methods in order to detect possible genetic patterns related to geographic or linguistic diversity. First, we estimated the pairwise genetic distances between populations (Fst, Reynolds' distance)20 using the Arlequin software,19 and the resulting matrix was visualized in a Multi-Dimensional Scaling plot21 obtained with the SPSS software (release 16.0.1 for Windows, SPSS Inc.). Second, we performed a Correspondence analysis based on the haplogroup frequencies (see Supplementary Table S2) using the Past program (version 1.88).22 Finally, we partitioned genetic variance at different hierarchical levels of population subdivision according to geographic and language groups (see the legend to Figure 1) by means of a molecular analysis of variance23 using the Arlequin software.19
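As an illustration of the distance step, the sketch below applies Reynolds' transformation, D = -ln(1 - Fst), to a pairwise Fst matrix. It is a minimal sketch and not the Arlequin implementation: the function names are hypothetical, the example matrix is invented, and setting negative Fst estimates to zero before the transformation is an assumption made here for convenience.

```python
import math


def reynolds_distance(fst: float) -> float:
    """Reynolds' distance D = -ln(1 - Fst); negative estimates are set to 0 (assumption)."""
    return -math.log(1.0 - max(fst, 0.0))


def transform_matrix(fst_matrix):
    """Apply the transformation to every cell of a square Fst matrix."""
    return [[reynolds_distance(value) for value in row] for row in fst_matrix]


# Invented pairwise Fst values for three populations (illustration only).
toy_fst = [
    [0.000, 0.010, 0.040],
    [0.010, 0.000, 0.030],
    [0.040, 0.030, 0.000],
]
for row in transform_matrix(toy_fst):
    print([round(value, 4) for value in row])
```

For the small Fst values typical of this kind of data set, the transformed distance stays very close to Fst itself, so the choice mainly matters for strongly differentiated pairs.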

To quantify the correlation between geographical distances and genetic relatedness among populations, the Spearman's rho correlation value (r2) between geographic and genetic distance matrices was obtained using the Mantel program.24 Geographical distances (bidimensional matrix of the cumulative costs) were calculated as the shortest and easiest walking routes (cost surface) between each pair of locations. The routes were chosen taking into account the morphology of the landscape (based on the DTM at 40 m resolution) and the presence of natural barriers (slope inclination as a friction value), using the Grass GIS software with the algorithms r.walk and r.drain.23, 25 This approach was preferred to traditional approaches based on great circle or air distances among populations, as it is more appropriate for studies carried out on a large geographic scale in a mountainous environment.8, 9
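The logic of a Mantel test with a rank correlation can be sketched as follows. This is an illustrative permutation test written with the standard library only, not the Mantel program cited above; the function names, the one-sided P-value and the number of permutations are assumptions, and the toy matrices are invented.

```python
import random
from itertools import combinations


def ranks(values):
    """Average ranks (ties share the mean rank), as used for Spearman's rho."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def upper_triangle(matrix, order):
    """Upper-triangle cells after relabelling rows and columns by `order`."""
    return [matrix[order[i]][order[j]] for i, j in combinations(range(len(order)), 2)]


def mantel_spearman(genetic, geographic, permutations=999, seed=0):
    """Return (rho, one-sided P) by permuting population labels of one matrix."""
    rng = random.Random(seed)
    identity = list(range(len(genetic)))
    geo_ranks = ranks(upper_triangle(geographic, identity))

    def rho(order):
        return pearson(ranks(upper_triangle(genetic, order)), geo_ranks)

    observed = rho(identity)
    hits = 0
    for _ in range(permutations):
        shuffled = identity[:]
        rng.shuffle(shuffled)
        if rho(shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Invented example: four sites on a line, genetic distance proportional to geography.
positions = [0, 1, 3, 7]
toy_geo = [[abs(a - b) for b in positions] for a in positions]
toy_gen = [[0.01 * abs(a - b) for b in positions] for a in positions]
toy_rho, toy_p = mantel_spearman(toy_gen, toy_geo)
print(toy_rho, toy_p)
```

Rows and columns are permuted together because the cells of a distance matrix are not independent observations; shuffling individual cells would understate the P-value.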

The level of intrapopulation genetic variation was analyzed through the calculation of different indices, including haplotype diversity (HD), the mean number of pairwise differences (MNPDs) and the number of different haplotypes (H) using the Arlequin software.19 Furthermore, to overcome some limitations of the MNPD, which could overestimate diversity in populations that might have experienced effective size reductions (for example, bottlenecks), we also calculated the weighted mean pairwise difference within each haplogroup (weighted mean intralineage mean pairwise differences, WIMPs) as reported in Hurles et al.26
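The difference between MNPD and WIMP can be made concrete with a small sketch. The haplotype diversity and mean pairwise difference follow their usual definitions; the WIMP function is only an approximation of the idea, since weighting each haplogroup by its number of sequences and skipping haplogroups represented by a single sequence are assumptions made here, not details taken from Hurles et al.26 All names and the toy sequences are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1 - sum(f * f for f in freqs))


def differences(a, b):
    """Number of mismatching positions between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def mean_pairwise_differences(sequences):
    pairs = list(combinations(sequences, 2))
    return sum(differences(a, b) for a, b in pairs) / len(pairs)


def wimp(sequences, haplogroups):
    """Mean pairwise difference within haplogroups, weighted by haplogroup size (assumption)."""
    groups = defaultdict(list)
    for sequence, haplogroup in zip(sequences, haplogroups):
        groups[haplogroup].append(sequence)
    usable = [seqs for seqs in groups.values() if len(seqs) > 1]  # singletons skipped (assumption)
    total = sum(len(seqs) for seqs in usable)
    return sum(len(seqs) * mean_pairwise_differences(seqs) for seqs in usable) / total


# Invented example: two divergent lineages, each with little internal variation.
toy_sequences = ["AAAA", "AAAT", "TTTT", "TTTA"]
toy_haplogroups = ["H", "H", "K", "K"]
print(haplotype_diversity(toy_sequences))
print(mean_pairwise_differences(toy_sequences))
print(wimp(toy_sequences, toy_haplogroups))
```

In the toy data the MNPD is inflated by comparisons between the two lineages, whereas the within-lineage measure stays low, which is the situation the text describes for populations that retain a few divergent haplogroups after a size reduction.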

As a final step, we applied different methods to characterize the demographic profile of the populations under study. These include Fu's Fs neutrality test27 and the mismatch distribution (that is, the distribution of the number of pairwise differences between haplotypes),28, 29 which were obtained using the Arlequin software.19 Moreover, a Bayesian approach was used to infer the effective population size and the growth rate in Trentino populations. A Markov Chain Monte Carlo coalescent-based algorithm implemented in the Beast software (v1.4.8)30 was used; for each population, we set an upper limit of 100 000 for population size and a mutation rate of 4.3 × 10−6 mutations per site per generation (25 years), as reported in Soares et al.31 A Hasegawa-Kishino-Yano + G model of nucleotide substitution32 was used under the demographic model (exponential growth or constant population) that seemed most appropriate considering the demographic inferences obtained through the methods indicated above (see Table 2). We performed 200 000 000 Monte Carlo sampling steps among possible genealogical trees (20 000 000 burn-in) to ensure that convergence had been achieved and that effective sample sizes were adequately high. Mean values, confidence intervals and median values were inferred with the Tracer 1.5 software (http://tree.bio.ed.ac.uk/software/tracer/), whereas the modal value was inferred with the R software (ver. 2.11.1, http://www.r-project.org/).
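The mismatch distribution and the raggedness index that accompanies it can be sketched in a few lines. The code below is an illustration rather than the Arlequin implementation; it uses the usual definition of Harpending's index, r = sum over i = 1..d+1 of (x_i - x_(i-1))^2, where x_i is the relative frequency of pairs differing at i sites and d is the largest observed number of differences. Function names and example values are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(sequences):
    """Relative frequency of pairs differing at 0, 1, ..., d sites."""
    counts = Counter(
        sum(x != y for x, y in zip(a, b)) for a, b in combinations(sequences, 2)
    )
    total = sum(counts.values())
    return [counts.get(i, 0) / total for i in range(max(counts) + 1)]


def raggedness_index(distribution):
    """Harpending's raggedness: squared jumps between adjacent classes, up to class d+1."""
    padded = list(distribution) + [0.0]  # class d+1 is empty by definition
    return sum((padded[i] - padded[i - 1]) ** 2 for i in range(1, len(padded)))


# Invented distributions: a smooth unimodal curve versus a ragged multimodal one.
smooth = [0.1, 0.2, 0.4, 0.2, 0.1]
ragged = [0.4, 0.0, 0.3, 0.0, 0.3]
print(raggedness_index(smooth), raggedness_index(ragged))

toy_distribution = mismatch_distribution(["AAAA", "AAAT", "TTTT", "TTTA"])
print(toy_distribution)
```

A smooth, single-peaked curve yields a low index, while a curve with several separated peaks yields a high one, which is why the index is read alongside Fu's Fs when judging expansion against stationarity.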

Results

Mitochondrial variation in Trentino populations

A total of 191 different haplotypes were detected in 393 individuals (see Supplementary Table S2). Transitions occurred in 102 out of 110 total polymorphic sites, transversions in five (16188G, 16227C, 16305T, 16317T, 16318T) and both mutation types in three sites (16129G-C, 16147C-A, 16355C-A). Moreover, one deletion (del 16162) and two insertions (+44C and +57C) were detected in one, five and two subjects, respectively. Heteroplasmy was observed in two individuals (positions 16263C/T and 16327C/T).

The haplotypes were classified into 20 main haplogroups based on single-nucleotide polymorphism analysis (H*, H1, H2, H3, H4, H5, H6, H7, HV0, HV*, V, J/T, J1, J2, U*, K, R*, M*, N* and I) and were further classified into 92 sub-haplogroups (Supplementary Table S2) according to the updated phylogenetic tree of global human mtDNA33 (Build 10, http://www.phylotree.org) and the HaploGrep software34 (http://haplogrep.uibk.ac.at/). Five haplogroups (H, U, T, J and HV) occur in our data set at frequencies >5%. The prevalent mtDNA haplogroups are the same as those found in most European populations.35 Haplogroup H accounts for 41% of the total sequences (from 35 to 41% in Europe), haplogroup U for 16% (from 18 to 22%), haplogroup T for 12% (from 9 to 11%) and haplogroup J for 8% (from 8 to 9%). The H6 haplogroup, however, represents an interesting finding. It is rather rare in Europe (average of 1%, recalculated from Brandstätter et al.36) and mostly restricted to the near-East and the south Caucasus.37 Nevertheless, H6 is widespread in the Cimbri and the Fersina valley (20%), where it reaches a significantly higher frequency than in the other groups from Trentino (from 0 to 4%; P<0.05 in the differentiation test between all pairs of populations). Another observation worth noting concerns the near-eastern haplogroup K, which was found at a frequency of 19% in the Cimbri community, a value far from those observed in the other populations from Trentino (from 0 to 5%; P<0.05 in the differentiation test in four out of eight comparisons) and in eastern-central European populations (from 3% in Romania to 6% in Austria and Bavaria; recalculated from Achilli et al.38).

Intrapopulation diversity parameters are reported in Table 1. The populations from the Adige valley and Luserna show values at the two extremes for the number of different haplotypes, H [44 (79%) versus 11 (52%)] and HD (0.990±0.006 versus 0.919±0.034). It is worth noting that one of the highest MNPD values was observed among the Cimbri. Interestingly, the apparent contrast between the two measures of within-population diversity for the Cimbri disappeared when MNPD was replaced by WIMP. In fact, the Cimbri show one of the lowest values of weighted mean pairwise difference within each haplogroup (2.550) among those found in the total database (from 2.364 to 4.157). This supports the notion that WIMP may be a more appropriate measure of intrapopulation genetic diversity than MNPD when dealing with populations that are subject to genetic drift (see Discussion).39

Table 1 Intrapopulation diversity parameters in the nine populations analyzed

Interpopulation variation

The percentage of variation among Trentino populations was found to be close to 1% (Fst=0.0115; P-value=0.0005). Using the jackknife procedure, this parameter reached its minimum and maximum values when the populations from Luserna and the Adige valley, respectively, were removed from the data set (Fst=0.0089 and 0.0137).
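The leave-one-population-out logic of the jackknife can be sketched as below. To keep the example self-contained, the differentiation statistic is Nei's Gst computed from haplotype frequencies, a simpler stand-in for the AMOVA-based Fst reported here; the function names, the unweighted averaging across populations and the toy data are all assumptions.

```python
from collections import Counter


def frequencies(haplotypes):
    n = len(haplotypes)
    return {h: count / n for h, count in Counter(haplotypes).items()}


def gene_diversity(freqs):
    return 1.0 - sum(f * f for f in freqs.values())


def gst(populations):
    """Nei's Gst = (Ht - Hs) / Ht with unweighted population means (stand-in statistic)."""
    freqs = [frequencies(haps) for haps in populations.values()]
    hs = sum(gene_diversity(f) for f in freqs) / len(freqs)
    labels = set().union(*freqs)
    mean = {h: sum(f.get(h, 0.0) for f in freqs) / len(freqs) for h in labels}
    ht = gene_diversity(mean)
    return 0.0 if ht == 0 else (ht - hs) / ht


def jackknife_populations(populations, statistic):
    """Recompute the statistic after removing each population in turn."""
    return {
        removed: statistic({k: v for k, v in populations.items() if k != removed})
        for removed in populations
    }


# Invented example: populations A and B are alike, C carries a distinct haplotype.
toy_populations = {
    "A": ["h1", "h1", "h2", "h2"],
    "B": ["h1", "h2", "h1", "h2"],
    "C": ["h3", "h3", "h3", "h1"],
}
toy_jackknife = jackknife_populations(toy_populations, gst)
print(toy_jackknife)
```

The population whose removal lowers the statistic the most is the one contributing most to the overall differentiation, which is how the minimum obtained without Luserna is interpreted above.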

The genetic distances among populations were visualized in a bidimensional Multi-Dimensional Scaling plot (Figure 2). We can distinguish one central group composed of most populations under study (11/36 statistically significant genetic distances, see the legend to Figure 2). Two populations are external to this cluster: the Sole valley and, more markedly, the Cimbri. This pattern was also confirmed when the genetic relationships between populations were analyzed by the Correspondence analysis, which is based on haplogroup frequencies (Supplementary Figure S1).

Figure 2

Multi-dimensional scaling plot (MDS) of the Fst genetic distances in the nine populations analyzed. Stress value: 0.167. Adi, Adige; Fas, Fassa; Fer, Fersina; Fie, Fiemme; Giu, Giudicarie; Lus, Luserna; Non, Non; Pri, Primiero; Sol, Sole. Symbols: circle (Italian-speaking), triangle (German-speaking), rhombus (Ladin-speaking). Statistically significant genetic distances: Giu/Fer, Sol/Adi, Sol/Pri, Sol/Non, Sol/Fer, Sol/Lus, Lus/Pri, Lus/Giu, Lus/Non, Lus/Fie, Lus/Fas.

The populations analyzed were then compared with another 37 European populations (we considered the region from np 16 033 to 16 383 of the HVR-1; see Supplementary Figure S2 and Supplementary Table S3). Together with the populations from the Pusteria, Isarco and Gardena valleys (Ladins) from south Tyrol6 and the other Ladin group from Veneto40 (eastern Italy), the Cimbri from Luserna again behave as an outlier (with 40/45 significant genetic distances and an average Fst of 0.036). The remaining Trentino populations are scattered in the central group, showing a certain degree of genetic heterogeneity (14/28; average Fst, from 0.013 to 0.018). It is interesting to note that the Ladins from the Fassa valley analyzed in this study are genetically separated from the other Ladin groups studied by Thomas et al.6 and Vernesi et al.,40 with all the pairwise genetic distances among these groups of Ladins being statistically significant. Our data set was also compared with a subset of 25 European populations, six of which had been reported to be isolated in previous studies (Aromuns from Albania and Macedonia,41 the Csango from Hungary,42 the Sardi from Urzulei43 and the Ladins from the Badia and Gardena valleys in south Tyrol;6 Supplementary Figure S3 and Supplementary Table S3). The genetic differentiation of the Cimbri is among the highest we observed. In fact, only three populations [Csango (CSA), Aromuns from Macedonia (Aro5) and Sardi (Sar1) in Supplementary Figure S3] show a higher number of significant genetic distances (24/24; average Fst from 0.027 to 0.041) than observed in our group (22/24; average Fst=0.038).

The analysis of molecular variance showed that the percentage of variation obtained subdividing the data set into geographic or linguistic groups (see the legend to Figure 1) was very low and not statistically significant (data not shown). Accordingly, a substantial lack of correlation between genetic and geographic distance matrices was observed (r2=0.0119, P=0.344).

Demographic inferences

The demographic parameter estimates in the Trentino populations are reported in Table 2 and Figure 3. Fu's Fs and the mismatch distribution (with the associated Harpending's Raggedness index value, HRI) have been shown to be sensitive to the demographic history of populations.27, 29 Demographic expansion generally leads to significant, large negative values of Fu's Fs and to a unimodal distribution of nucleotide differences (with lower HRI), whereas stationary populations display less negative values of Fu's Fs and a multimodal distribution with larger HRI.44, 45 Based on these parameters, the populations from Luserna and the Fersina valley are close to the expectations of a stationary state (non-significant Fu's Fs and highest HRI values). By contrast, we observed clear signs of demographic expansion, although with different intensities, in the other populations. In fact, the Adige, Giudicarie, Non, Sole and Primiero valleys display stronger signals than the Fiemme and Fassa valleys. This pattern is mirrored by the estimates of effective population size, which ranged from 3293 (Luserna) to 73 217 (Adige), reaching relatively low values in the populations from the Fersina, Fassa and Fiemme valleys (from 3550 to 11 006). Similarly, growth rate estimates, which were not calculated for the Cimbri and Fersina because of the lack of signatures of expansion, attained a minimum of 1.8 × 10−3 and 1.9 × 10−3 in the Fassa and Fiemme valleys, respectively, and a maximum of 3.5 × 10−3 in the Non valley (Table 2).

Table 2 Estimates of some demographic parameters in the nine populations analyzed

Figure 3

Mismatch distribution in the nine populations analyzed (see also Table 2).

The small number of samples analyzed for the populations from Luserna (21) and the Fersina valley (25) compared with the other groups (from 40 to 63) raises the possibility that these results were biased by sample size. To verify this possibility, we extracted 1000 random samples of 21 individuals from each of the seven populations with the highest number of samples analyzed (see Table 2), and recalculated Fu's Fs, as well as the HD and MNPD parameters. The distributions of the values obtained for these parameters are provided in Supplementary Figures S4, S5 and S6. Even after this procedure, the populations with the strongest signal of demographic expansion (Adige, Giudicarie, Non and Sole valleys) show statistically significant values of Fu's Fs (Supplementary Figure S4), which makes us confident that the results obtained for Luserna and the Fersina valley are reliable.
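The resampling check can be sketched as follows. The example redraws 1000 subsamples of 21 individuals from a larger sample and recomputes haplotype diversity on each; drawing without replacement, the choice of statistic, the function names and the invented haplotype labels are assumptions made for illustration, and Fu's Fs is not reimplemented here.

```python
import random
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1 - sum(f * f for f in freqs))


def subsample_statistic(sample, size, statistic, replicates=1000, seed=0):
    """Statistic recomputed on random subsamples drawn without replacement (assumption)."""
    rng = random.Random(seed)
    return [statistic(rng.sample(sample, size)) for _ in range(replicates)]


# Invented sample of 60 individuals carrying 25 hypothetical haplotypes.
toy_sample = [f"h{i % 25}" for i in range(60)]
toy_values = sorted(subsample_statistic(toy_sample, 21, haplotype_diversity))

# Empirical 95% interval of the statistic at the reduced sample size.
print(toy_values[24], toy_values[974])
```

Comparing the value observed in a small population with the interval obtained by downsampling a larger one shows whether a difference between the two could be produced by sample size alone.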

Discussion

The genetic structure of Trentino populations

Understanding the way in which genetic variation is structured at micro-geographic level is an important objective for anthropologists and human geneticists. In fact, the reconstruction of the genetic structure at a finer scale may help in gaining a better understanding of whether geographical and/or cultural factors have influenced the genetic relations among populations in a short geographic range. This is even more true for areas such as Trentino where geographical and linguistic factors could have played an important role in shaping genetic diversity.

On the whole, Trentino populations show a level of genetic differentiation that is comparable to or even higher than that of widely distributed European populations. This can be inferred from the Multi-Dimensional Scaling of the genetic distances among 46 European populations, where the populations analyzed are dispersed across the plot (see Supplementary Figure S2). From a quantitative point of view, the mitochondrial variation among Trentino populations is higher (Fst=0.0117, recalculated considering the region spanning from np 16 033 to 16 383) than that obtained for nine geographically distant European populations (from Fst=0.0041 to 0.0098 using different data sets).

We could detect no statistically significant correlation between genetic and geographic distances, which indicates that the genetic differentiation among populations does not increase consistently with their geographic distance. This is in accordance with previous mtDNA studies conducted on a macro-geographic scale in Europe.46, 47 Nonetheless, our study supports a role for geography in shaping the genetic structure of Trentino populations, although in a more subtle way than would be expected. In fact, through the estimates of the demographic parameters, it was possible to highlight important differences between the central-western and the eastern parts of the region. The populations settled in the former area (the Adige, Giudicarie, Sole and Non valleys) show strong signs of expansion, whereas most inferences obtained for the latter zone fit a stationary demographic state (Luserna and Fersina valley) or indicate a lower degree of expansion (Fassa, Fiemme). The only exception is the Primiero sample, which shows strong signs of expansion and growth despite belonging to the eastern area.

These differences may be explained by geomorphological factors and also by archeological data.48

Compared to the eastern area, the geographical setting of the central-western region is characterized by wider valleys, which are more suitable for agricultural activities, lower altitudes and more accessible mountain passes. These characteristics, as well as the remarkable presence of good quality flint sources, may explain why the right side of the Adige valley has been densely populated since ancient times, as indicated by the abundance of archeological records in the Adige, Non and Giudicarie valleys.48 Accordingly, considerable archeological evidence dated from Paleolithic to protohistoric times has been recovered, leading scholars to suggest that human groups were constantly present during the Holocene all along the Adige valley.48 Moreover, the presence of exogenous raw materials (for example, obsidian, green stone), ornaments or symbolic items (for example, the shapes and decoration styles of the pottery) indicates contacts between communities located in the Adige valley and those settled on the Po plain and/or in the northern Alps.48 By contrast, geomorphological features have always represented an obstacle to the massive peopling of the eastern area. In fact, the Valsugana valley is the only densely populated area in this part of the territory.49

Linguistic diversity was found to be a poor predictor of mitochondrial genetic diversity in Trentino, a fact that has been observed on a continental scale,50 contrasting with previous studies of autosomal variation,51, 52 whereas the analysis of the European Y-chromosome gave conflicting results.53, 54

This is particularly evident when comparing the relationships between the Ladins from the Fassa valley and the Cimbri from Luserna with other populations from Trentino and Europe. The Ladins show a substantial genetic affinity with the Italian-speaking groups from Trentino. In fact, they are separated from the neighboring population of the Fiemme valley by their lowest pairwise Fst genetic distance (0.006), and share with it the highest number of haplotypes (9). Unfortunately, groups that are geographically close to Ladins have not been sampled in previous studies,6, 40 which hampers any comparison with the results obtained in the course of the present investigation. However, the genetic distance between the Ladins from the Fassa valley and those previously studied is relatively high [from 0.02 (with Ladins from the Badia valley, LadB) to 0.08 (with Ladins from Veneto, LadV); see Supplementary Figure S2]. Furthermore, the number of haplotypes shared by the Ladins from the Fassa valley and those from other alpine areas is relatively low (four with LadB, four with LadG and zero with LadV), and only one haplotype is shared simultaneously among all Ladin groups. This is consistent with the high diversity among Dolomitic Ladin communities reported by Thomas et al.6 On the whole, the data accumulated so far suggest that the genetic heterogeneity among Ladins could be due to the combined effect of scarce reciprocal genetic exchange, probably due to geographic isolation, and genetic drift.

The Cimbri seem to be set apart from all the other populations, both from Trentino and Europe, including linguistically related German groups from Bavaria (Supplementary Figure S2). Within our data set, the Cimbri show the lowest values of the parameters that describe within-population genetic variation (HD, number of haplotypes and WIMPs). They also show one of the lowest levels of HD among 60 other European populations (see Supplementary Table S3). Finally, they lack three of the most frequent European lineages (H*, H1 and H3), while they bear rare haplogroups at remarkably high frequencies (H6 and K1, see Supplementary Table S2).

Certainly, one should consider that the small number of individuals analyzed for the Cimbri, the lowest of our data set, could have biased the result. However, by applying the resampling procedure we showed that our results cannot be simply accounted for by a low sample size. Moreover, we sampled 7% of the current population of Luserna, the highest proportion in the data set (from 0.05% for the Adige valley to 2.5% for the Fersina valley, see Supplementary Table S4). Furthermore, a peculiar mitochondrial profile is not unexpected for this group, in which genetic drift could have played an important role in shaping the genetic structure in two different ways. First, the Cimbri from Luserna could have experienced a founder effect, a possibility indicated by historical sources that trace the origin of the community back (XIII century) to a few families that moved from Lavarone, the first Trentino area that was colonized by the Cimbri, to the Luserna plateau.15 Second, the community has always been characterized by a very small number of individuals, which reached a modest peak of 1005 inhabitants in the 1920s.15 Finally, cultural isolation could have increased the differentiation between the Cimbri and the other populations, a possibility supported by some lexical data, which indicate that the Cimbri language has maintained some words (for example, Libar) that were taken from the neighboring Romance-speaking populations during the first contacts (XIII century).55

In conclusion, through this study, we highlighted some important aspects of the genetic structure of populations of Trentino. The genetic data obtained in the course of this study made it possible to infer different demographic patterns in two longitudinal areas, which are seemingly related to geomorphological factors. Furthermore, although no evident relation between linguistic and genetic diversity was observed, the two linguistic isolates, Ladins and Cimbri, proved to be the key populations when drawing inferences of anthropological significance. The different way in which Ladins and Cimbri are genetically related to the populations from Trentino is of particular interest. In fact, this evidence points to the importance of carrying out a systematic study among the linguistic isolates of the Italian Alps to understand how they fit into the surrounding genetic background and identify the signatures of their original genetic ancestry.56, 57, 58