Introduction

Due to their uniparental modes of inheritance, the mitochondrial DNA (mtDNA) and male-specific region of the Y chromosome (MSY) have been extensively used to study differences between paternal and maternal histories of human populations and assess the influence of socio-cultural practices on sex-specific patterns of variation [1, 2]. However, direct comparisons of mtDNA and MSY diversity were hampered until recently by differences between methods used to detect variation, and by ascertainment bias in the choice of MSY single nucleotide polymorphisms (SNPs) [3]. In the last few years, various studies took advantage of the increasing availability of next-generation sequencing (NGS) platforms to compare unbiased MSY and mtDNA sequence data on a global scale [4,5,6,7]. Nevertheless, the sex-specific patterns disclosed by worldwide studies reflect average trends exhibited by different macro-regions, and do not explore the rich diversity of interactions between genetic variation and cultural practices that shape MSY and mtDNA variation at the local level [8].

Here, we used targeted NGS to generate unbiased sequence data from 2.3 Mb of the MSY in 170 males comprising seven small communities from SW Angola, whose complete mtDNA genomes have been previously investigated [9]. Despite being located in a relatively small geographic area (Fig. 1), these groups offer a unique framework to explore the relationships between socio-cultural practices, local variation in mtDNA versus MSY diversity, and continent-wide migratory processes. The !Xun from Kunene Province are Kx’a-speaking hunter gatherers who descend from the oldest population layer of southern Africa represented by the so-called “Khoisan” peoples [10]. The Kuvale, Himba, Tjimba, Twa, Kwisi, and Kwepe from the Angolan Namib Desert are all Bantu-speaking groups, who display striking socio-economic disparities that have been associated with different population histories. The Himba and Kuvale are two mildly polygynous pastoralist populations belonging to the broad Herero ethnic division, who arrived in SW Africa during the Bantu expansions [11]. The Kwepe, Twa, Kwisi, and Tjimba are marginalized ethnic minorities subsisting on small-scale pastoralism and foraging, who gravitate around the Himba and Kuvale and are best described as peripatetic peoples [12]. While the Himba-speaking Tjimba are commonly thought to be impoverished Himba [13], the Kuvale-speaking Kwisi and Twa have been associated with a hypothetical stratum of pre-Bantu foragers of unknown provenance, whose original language has been lost [14]. The Kwepe, who spoke Kwadi—a now extinct language belonging to the Khoe-Kwadi family—before shifting to Kuvale, were considered to be remnants of a pre-Bantu pastoralist migration introducing Khoe-Kwadi languages into southern Africa [15]. 
In spite of their different subsistence strategies, all Bantu-speaking groups of the Angolan Namib share a patrilocal residence pattern and a matrilineal descent-group system, which regulates important parts of social life, such as group membership, inheritance, and marriage behavior.

Fig. 1

Y chromosome phylogeny, haplogroup distribution, and map of the sampling locations. The phylogenetic tree was reconstructed in BEAST based on 2379 SNPs and is in accordance with the known Y chromosome topology [7, 27, 30]. Main haplogroup clades and their labels are shown with different colors. Age estimates are reported in italics near each node, with the mean TMRCA of main haplogroups shown with their corresponding color. A map of the sampling locations, re-used with permission from Oliveira et al. [9], is shown on the bottom left, and the haplogroup distribution per population is shown on the bottom right, with color-coding corresponding to the phylogenetic tree. In the map, the Angolan Namib province is delimited by a gray contour, country borders are shown in black, and the names of the main intermittent rivers are indicated in blue

In this study, we compare new data on MSY sequence variation with our previous findings on whole mtDNA genomes [9] to investigate whether the histories of these groups fit the expectations of earlier anthropological and linguistic hypotheses, and to study the causes and consequences of sex-biased processes for their genetic variation.

Material and methods

Samples

We analyzed 162 partial MSY sequences sampled from populations that inhabit the Angolan Namib Desert (including the Himba, Kuvale, Kwepe, Kwisi, Twa and Tjimba) and the Kunene Province (!Xun) (Fig. 1; Table S1), and eight additional sequences from Bantu-speaking individuals with other ethnic affiliations used exclusively in haplotype-based analysis. The samples were collected as described previously [16], with the donors’ written informed consent, the ethical clearance of ISCED and CIBIO/InBIO-University of Porto boards, and the support and permission of the Provincial Governments of Namibe and Kunene.

MSY Sequencing

Indexed libraries produced previously [9] were enriched for 2.3 Mb of target MSY as previously described [17]. Paired-end sequencing data of 107 bp length were generated on the Illumina HiSeq 2500 platform and standard Illumina base-calling was performed using Bustard. We trimmed Illumina adapters and merged completely overlapping paired sequences using leeHOM [18], and de-multiplexed the pooled sequencing data using deML [19]. The sequencing data were aligned to the human reference genome hg19 and SNPs were identified according to ref. [17]. Reads that aligned to the MSY captured region are available in the European Nucleotide Archive (https://www.ebi.ac.uk/ena) with the study accession number PRJEB27776. Y chromosome haplogroups were determined using yhaplo [20], which is based on the International Society of Genetic Genealogy (ISOGG) nomenclature of January 2016 (https://isogg.org/tree/index.html), with two modifications: (i) the SNP defining B-50f2(P) (Table S2) was corrected according to a recent ISOGG update (August 6th, 2017); and (ii) instead of using variant NC_000024.9:g.16251357G>A (not typed in this study), we used variant NC_000024.9:g.7595638T>A [21, 22] to define haplogroup E-V1245 (Table S2).

Data analysis

Genetic diversity indices, pairwise Φst values and Analyses of Molecular Variance (AMOVA) were computed in Arlequin v3.5 [23]. Non-metric multidimensional scaling (MDS), k-means and neighbor-joining (NJ) analyses based on pairwise Φst distance matrices were carried out in R, using the functions “isoMDS”, “kmeans” with several random starts, and “nj”, respectively. To determine the support of NJ partitions we generated bootstrap replicates with the function “boot.phylo” and used “stat.phist” (strataG v0.9.2) to recalculate Φst distances. Mantel tests were performed in R using the function “mantel” (package vegan) with 1,000 permutations of matrix elements to determine significance.
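To illustrate the logic of the Mantel test, the following Python sketch correlates the off-diagonal entries of two distance matrices and assesses significance by jointly permuting the rows and columns of one of them. It is not the R/vegan code used in the study: the function names (pearson, upper_triangle, mantel_test), the fixed random seed, and the one-sided p-value convention are assumptions made for this example.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def upper_triangle(matrix, order):
    """Upper-triangle entries of a square matrix after relabelling by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]]
            for i in range(n) for j in range(i + 1, n)]


def mantel_test(d1, d2, permutations=1000, seed=1):
    """One-sided permutation Mantel test; returns (r_observed, p_value)."""
    n = len(d1)
    identity = list(range(n))
    y = upper_triangle(d2, identity)
    r_obs = pearson(upper_triangle(d1, identity), y)
    rng = random.Random(seed)  # hypothetical fixed seed, for reproducibility
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # relabel the populations of the first matrix
        if pearson(upper_triangle(d1, order), y) >= r_obs:
            hits += 1
    # The observed arrangement is counted as one of the permutations.
    return r_obs, (hits + 1) / (permutations + 1)
```

Permuting population labels, rather than individual matrix cells, keeps each permuted matrix a valid distance matrix, which is why rows and columns are reordered together.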

A phylogenetic tree was constructed with BEAST v1.8 [24], using an A00 representative haplotype as outgroup [4]. To account for the absence of invariable sites in BEAST, we applied an invariant site correction. We used a strict clock and a mutation rate of 0.74 × 10−9 mutations/bp/year, as estimated by Karmin et al. [4] based on calibration with two ancient DNA sequences. Despite the uncertainty regarding the mutation rate and differences between genealogical and evolutionary-based estimates, the estimates calibrated with aDNA better match independently-dated events such as the Out-of-Africa expansion and the peopling of the Americas [3]. Furthermore, the use of aDNA calibrations yielded similar results in different studies [25]. Additional settings used in BEAST are reported in Table S3. We performed additional BEAST runs to build Bayesian Skyline plots (BSP) for different population groupings based on MSY and previously published mtDNA data [9] (see settings in Table S3). The estimates of effective population size (Ne) were obtained assuming the more conservative, lower values of generation time recommended by Fenner [26] for hunter-gatherer females and males (25 and 31 years, respectively), as they are likely more suitable for the traditional societies studied here than the corresponding values estimated for nation states.
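As a rough guide to the time resolution implied by the strict clock, the short Python sketch below converts the mutation rate quoted above into expected mutations per lineage over the targeted region. It is a back-of-the-envelope calculation, not a substitute for the Bayesian inference performed in BEAST; it assumes that the full 2.3 Mb are callable in every sample, and the function names are hypothetical.

```python
MUTATION_RATE = 0.74e-9   # mutations/bp/year, the rate quoted in the text
TARGET_LENGTH_BP = 2.3e6  # nominal size of the captured MSY region (assumed fully callable)


def mutations_per_year(rate=MUTATION_RATE, length_bp=TARGET_LENGTH_BP):
    """Expected number of mutations per year along a single lineage."""
    return rate * length_bp


def years_per_mutation(rate=MUTATION_RATE, length_bp=TARGET_LENGTH_BP):
    """Average waiting time, in years, between mutations on one lineage."""
    return 1.0 / mutations_per_year(rate, length_bp)


def divergence_time_years(pairwise_differences,
                          rate=MUTATION_RATE, length_bp=TARGET_LENGTH_BP):
    """Strict-clock split time of two lineages: d / (2 * rate * length)."""
    return pairwise_differences / (2.0 * rate * length_bp)
```

Under these assumptions a single lineage accumulates one mutation roughly every 590 years across the target region, which bounds the precision with which very recent nodes can be dated.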

Median-joining networks were computed with Network 5.0 (www.fluxus-engineering.com) and plotted with Network Publisher v2.1.1.2.

For comparative purposes, we merged the sequence data generated in this study (2.3 Mb) with (i) 447 partial MSY sequences (0.9 Mb) from other southern African groups [27], obtaining an overlap of 0.56 Mb, and (ii) 21 complete Y chromosomes of various origins in Africa [22, 28]. The merged datasets were used to build networks.

Results

We obtained 2.3 Mb of MSY sequence from each of 170 Angolan individuals, with a mean coverage of 28× (range 8–52×). After quality filtering, a total of 1854 SNPs were identified, of which only 66% are reported in dbSNP (build 150). A VCF file containing all SNPs and 154 non-variable nucleotide positions that are different from the reference sequence is available online (Supplementary datafile 1). An average of 6 nucleotides per individual (0.3%) were missing and were imputed with Beagle 4.0 [29] as described in Supplementary datafile 2. We used a simulated dataset to evaluate the performance of Beagle in imputing missing genotypes from haploid data and show that this imputation choice is suitable for empirical datasets where the amount of missing genotypes is below 25% (Fig. S1; Supplementary datafile 2).

MSY phylogeography in Angola

Figure 1 displays a Bayesian phylogenetic tree for the Angolan MSY sequences, which also includes an early-splitting A00 haplotype [4] (see also Fig. S2 for a network relating all Angolan haplotypes). The estimated split time of the A-L419 branch (143 kya), which comprises A-P262 and A-M51, corresponds to the time to the most recent common ancestor (TMRCA) of all Angolan sequences. This and the date of the split of the Angolan sequences from A00 (256 kya) are remarkably close to previous estimates based on high-coverage whole Y chromosomes sampled from other populations (Table S4) [4, 30]. Despite being sampled in a relatively small area, the Angolan lineages have very different phylogeographical characteristics, and belong to haplogroups that have been associated with three major population layers that settled southern Africa at different periods (see Table S2 for alternative haplogroup nomenclatures). A-P262, A-M51, and B-50f2(P) contain deep-rooting nodes and are associated with an early substrate of “Khoisan” foragers (>10 kya) speaking Kx’a and Tuu languages [27]. These represent 44% of the !Xun but only 0–9% of the genetic makeup of Bantu-speaking peoples from Angola (Fig. 1; Table S2). E-M293 sequences, which have been linked to a pre-Bantu migration of sheep pastoralists from East Africa (~2 kya) [31], have a TMRCA of 6.6 kya and are observed in varying frequencies among the Kwepe (7%), Kwisi (6%), Himba (2%), and !Xun (25%) (Fig. 1; Table S2). E-M180 and B-M109, which have been previously associated with the Bantu expansions [32, 33] (though B-M109 might also have existed in “Khoisan” groups before the arrival of Bantu speakers [27]), have TMRCAs close to 10 kya and represent 91–98% of the MSY sequences sampled among Bantu-speaking groups and 33% of the !Xun MSY sequences (Fig. 1; Table S2). In accordance with previous studies [7], subhaplogroup E-M180 displays a star-like branching pattern (Figs. 1 and S2), consistent with a rapid demographic expansion from a small ancestral population size.

An inspection of the molecular relationships between MSY haplotypes from different populations reveals that most lineages from Angola cluster together with other available sequences from southern Africa (Fig. S3). The only exceptions are the B-M109 sequences (Fig. S3g-h), which are grouped in a divergent monophyletic cluster that includes lineages previously found in SW Bantu groups from Namibia, and the B-50f2(P) haplotypes, which are very divergent from haplotypes found in the wider region of southern Africa or in Pygmy groups (Fig. S3c-d).

Intrapopulation diversity and demographic inferences

Table S1 presents summary statistics for MSY and mtDNA diversity. The MSY nucleotide diversity in the !Xun (πMSY = 1.2 × 10−4) is 2.5 times higher than in the Bantu-speaking groups (πMSY = 4.8 × 10−5), and similar to values calculated from previous studies for other “Khoisan” groups (πMSY = 1.5 × 10−4–1.8 × 10−4) [6, 27].
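For readers unfamiliar with the statistic, nucleotide diversity (π) can be understood as the average number of differences between two sampled sequences, divided by the number of sites surveyed. The Python sketch below illustrates this definition on toy data; it is not the Arlequin implementation used in the study, and the function name and input format (aligned haplotype strings of equal length) are assumptions made for the example.

```python
from itertools import combinations


def nucleotide_diversity(haplotypes, sequence_length):
    """Mean pairwise differences per site among aligned haplotypes.

    `haplotypes` holds one equal-length string per individual (variable
    sites only is sufficient); `sequence_length` is the total number of
    sites surveyed, including invariant ones.
    """
    if len(haplotypes) < 2:
        raise ValueError("at least two haplotypes are required")
    pairs = list(combinations(haplotypes, 2))
    total_differences = sum(
        sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs
    )
    return total_differences / len(pairs) / sequence_length
```

Because invariant positions enter only through sequence_length, the haplotypes can be restricted to the variable sites reported in a VCF file, provided that the total length of the callable region is known.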

In Bantu speakers, MSY vs. mtDNA diversity ratios accounting for differences in mutation rates of the two chromosomes (πMSY/πmtDNA) range from 0.09 to 0.5 (Table S1), indicating that Bantu peoples, like many other human populations, display less MSY diversity relative to mtDNA than expected in neutral demographic models without sex-biased processes [4, 6, 27, 34]. In contrast, the !Xun resemble other “Khoisan” groups in displaying comparable levels of diversity in both sexes (πMSY/πmtDNA = 1.11) [6, 27].

To better understand the present differences in levels of mtDNA and MSY diversity, we inferred the history of male and female effective population size (Ne) changes by using BSPs (Fig. 2). We found striking past population size differences between males and females in a pooled sample comprising all Bantu-speaking groups from the Angolan Namib (Fig. 2a). Starting from the past, Ne estimates based on mtDNA (Nef) remain stable for a long period of time (~20,000), and then show a sharp reduction with a minimum size (~2000) around 2 kya, followed by expansion to the present (Fig. 2a). In contrast, the male demographic profile is characterized by a recent expansion from a relatively low, more stable long-term population size (Nem ~3000) (Fig. 2a). Unlike the Bantu-speaking populations, the !Xun display almost overlapping female and male population sizes that start to decline around 10 kya with no traces of population recovery (Fig. 2b).

Fig. 2

Bayesian skyline plots (BSP) of effective population size change through time, based on mtDNA (red) and the MSY (black). Thick lines show the mean estimates and dashed lines show the 95% HPD intervals. The vertical line highlights the 2 ky before present mark. Effective sizes are plotted on a log scale. Generation times of 25 and 31 years were assumed for mtDNA and the MSY, respectively [26]

When population size changes among Bantu speakers with different subsistence patterns were compared, the peripatetic communities showed less pronounced differences in sex-specific Ne, and smaller post-bottleneck size recoveries than the pastoral populations (Fig. 2c, d). These differences persisted when the lower sample sizes of peripatetics were taken into account (Fig. S4).

As BSPs assume a single, isolated, panmictic population, and the Angolan groups are likely to be part of a network of structured populations, some inferred demographic events might have been more influenced by migration levels and sampling design than by real changes in population size [35]. To account for these confounding factors, we generated separate mtDNA and MSY BSPs for all individual groups (Fig. S5). The demographic profile of the Kuvale, who display high frequencies of “Khoisan”-related mtDNA haplogroups, remained similar to that of the Himba, who have similar sample sizes and do not show signs of “Khoisan” introgression in their mtDNA [9], suggesting that the differences between female and male BSPs of pastoralists are not exclusively due to admixture with resident foragers (Fig. S5). On the other hand, not all of the BSPs for individual peripatetic groups show the signs of post-bottleneck Ne recovery that were detected in the pooled “peripatetic” sample (Fig. S5).

Interpopulation diversity

To compare the levels of between-population divergence for the MSY with our previous results on mtDNA variation in the same populations [9], we carried out AMOVA based on different partitions of the data (Table S5). Although we found similar amounts of divergence between the !Xun and Bantu speakers (22.5% for the MSY vs. 16.6% for mtDNA), the genetic differentiation among Bantu speakers is much lower for the MSY than for mtDNA (4.4% for the MSY vs. 20.2% for mtDNA), even when the Kuvale are removed from the comparisons to eliminate the confounding effects of “Khoisan” lineages on the levels of population divergence (5.5% for the MSY vs. 18.8% for mtDNA) [9] (Table S5). Moreover, we found that the genetic differentiation among matriclans for mtDNA (50.8%) is much higher than for the MSY (2.5%) (Table S5), reflecting the structuring effect of the matriclanic system on mtDNA, but not on MSY variation [9].

The population relationships displayed in an MDS plot based on pairwise Φst values further reveal noticeable differences between the MSY and mtDNA (Fig. 3a), which are reflected in a lack of correlation between their corresponding Φst matrices (Mantel test, p-value = 0.091). In addition, there is a clear mismatch between the clustering patterns inferred from k-means analyses based on MSY and mtDNA (Fig. 3b, c). For mtDNA, the best k-means partition (k = 4; Φct = 20.9%; p-value < 0.006; Table S5) places the Kwisi and Twa in a separate group, and associates the Kwepe with the Himba. For MSY (k = 3; Φst = 14.4%; p-value < 0.014; Table S5), the Kwepe and the Kwisi are grouped with their northern Kuvale neighbors, while the southernmost Twa are grouped with the Himba (Figs. 1 and 3). Interestingly, the MSY clustering has remarkable parallels with the distribution of cultural traits such as language, dressing habits, and names of matriclans. For example, while the Twa tend to imitate the dressing habits of Himba women, the Kwisi and Kwepe try to mimic the characteristic attire of the Kuvale [14]. Moreover, the variety of Kuvale spoken by the Twa has clearly been influenced by the Himba language, while the Kwisi and Kwepe speak language varieties that are practically indistinguishable from mainstream Kuvale [9] (see NJ in Fig. S6a). This is also reflected in the significant correlation we found between lexicon-based linguistic distances [9] and MSY distances (Mantel test, p-value = 0.033). Finally, we have previously shown that peripatetic groups tend to replace their own clan names with those of their neighboring pastoral groups, leading to the shared use of matriclan labels by the Twa and Himba on the one hand, and by the Kwisi, Kwepe and Kuvale on the other [9]. 
Despite the genetic consistency of the matriclanic system within each group, this clan switching leads to quite different patterns of population relationships based on mtDNA variation and on the distribution of clan names (Fig. S6b-c). In contrast, we found that NJ trees constructed based on Φst distances for MSY and on distances based on clan name frequencies do have similar patterns (Fig. S6b, d), as confirmed by a significant correlation between the two distance matrices (Mantel test, p-value = 0.001).

Fig. 3

Multidimensional scaling (MDS) and k-means analysis based on Φst distances of mtDNA and the MSY. a MDS. The stress values are 0.003 and 0.006 for MSY and mtDNA, respectively. b k-means. Each color represents the cluster assigned to a population. c Percentage of variance explained by each k. The best k (i.e., the value where the percentage of the variance explained starts to plateau) is highlighted in gray

Discussion

The origins of MSY diversity in SW Angola

In accordance with our previous mtDNA study [9], the present MSY analysis reveals a major division between the Kx’a-speaking !Xun and the Bantu-speaking groups, whose paternal genetic ancestry does not display any old remnant lineages or a clear link to pre-Bantu eastern African migrants introducing Khoe-Kwadi languages and pastoralism into southern Africa (cf. [15]). This is especially evident in the distribution of the eastern African subhaplogroup E-M293 [31], which reaches the highest frequency in the !Xun (25%) and not in the formerly Kwadi-speaking Kwepe (7%). This observation, together with recent genome-wide estimates of 9–22% of eastern African ancestry in other Kx’a and Tuu-speaking groups [36], suggests that eastern African admixture was not restricted to present-day Khoe-Kwadi speakers. Alternatively, it is likely that the dispersal of pastoralism and Khoe-Kwadi languages involved a series of punctuated contacts that led to a wide variety of cultural, genetic and linguistic outcomes, including possible shifts to Khoe-Kwadi by originally Bantu-speaking peoples [37].

Although traces of an ancestral pre-Bantu population may yet be found in autosomal genome-wide studies, the extant variation in both uniparentally inherited markers strongly supports a scenario in which all groups of the Angolan Namib share most of their genetic ancestry with other Bantu groups but became increasingly differentiated within the highly stratified social context of SW African pastoral societies [11].

The influence of socio-cultural behaviors on the diversity of MSY and mtDNA

A comparison of the MSY variation with previous mtDNA results for the same groups [9] identifies three main sex-specific patterns. First, gene flow from the Bantu into the !Xun is much higher for male than for female lineages (31% for the MSY vs. 3% for mtDNA; Table S2, see also Fig. 2 of ref. [9]), similar to the reported male-biased patterns of gene flow from Bantu to “Khoisan”-speaking groups [34], and from non-Pygmies to Pygmies in Central Africa [38]. A comparable trend involving the introgression of MSY eastern African lineages was also found in the !Xun, who exhibit high frequencies of haplogroup E-M293 (25%) while retaining the mtDNA profile characteristic of southern African “Khoisan” populations [9]. These patterns may be explained by a context of social discrimination, in which women from food producing populations are prevented from moving into forager communities, while food producing men and forager women can have children, who will then mostly be raised in the mother’s group [38, 39]. However, the dominant Kuvale pastoralists, who show a high frequency of “Khoisan”-related mtDNA (53%), indicate that admixed children may also remain in the father’s group [9].

Secondly, the levels of intrapopulation diversity in the Bantu-speaking peoples from the Namib were found to be consistently higher for mtDNA than for the MSY, reflecting the marked association between the Bantu expansion and the relatively young MSY E-M180 haplogroup, which has no parallel in mtDNA [27, 40]. In contrast, the !Xun have a more diverse MSY haplogroup composition, combining deeply rooted lineages and younger clades obtained through recent admixture. Using BSP analysis, we found these patterns to be reflected in larger long-term Nef than Nem in Bantu speakers, and more equal sex-specific Ne in the !Xun (Fig. 2).

Global patterns showing that Nef was larger than Nem during a large part of human history have been explained by a number of sex-biased processes, including natural selection affecting the MSY or culturally influenced sex-specific demographic behaviors [1, 2, 3]. In the context of the Bantu expansions, these patterns have been mostly interpreted as the result of polygyny and/or higher levels of assimilation of females from resident forager communities [39, 41]. However, most groups from the Angolan Namib are only mildly polygynous [11] and ethnographic data suggest that the actual rates of polygyny in many populations may be insufficient to significantly reduce Nem [2, 42]. In addition, the finding of a large Nef/Nem ratio in the Himba (Fig. S5), who have almost no “Khoisan”-related mtDNA lineages [9], indicates that female-biased introgression cannot fully explain the observed patterns.

An alternative explanation may be sought in the prevailing matrilineal descent rules, which might have created a sex-specific structuring effect, similar to that proposed for patrilineal groups from Central Asia [43]. As we previously demonstrated [9], all Bantu-speaking groups sampled in the region have genetically consistent, highly structured matrilineal descent systems, with levels of genetic variation between matriclans that are 20 times higher for mtDNA than for the MSY (50.8% for mtDNA vs. 2.5% for the MSY; Table S5). Since ethnic groups are conglomerates of matriclans, they harbor a remarkable amount of mtDNA structure and have fragmented female populations that can inflate Nef estimates [44]. Under this hypothesis, and using the terminology proposed by Wakeley (1999) [45], the population size growth starting at ~2 kya that is detected in both the female and male BSPs (Fig. 2 and S5) would be associated with the “scattering phase” of the mtDNA tree. The separation of male and female BSPs before 2 kya would then correspond to Wakeley’s “collecting phase” [45] and reflect the inflation of Nef due to the large mtDNA differences between matriclans. Since the male pool is not structured, the MSY tree has no collecting phase and Nem remains essentially unchanged beyond 2 kya. In the future, it will be interesting to make a more comprehensive re-evaluation of the relationship between descent rules and Nem/Nef ratios across different Bantu populations, since studies in other regions of the world have shown that the more structured sex may not display the highest Ne if the extinction rate of clans is high [43, 46].

The third important sex-specific pattern observed in this study is the much lower amount of between-group differentiation for the MSY than for mtDNA among Bantu-speaking populations (4.4% for the MSY vs. 20.2% for mtDNA), in spite of the patrilocal residence patterns of all ethnic groups (Table S5). This difference can hardly be explained by unequal levels of introgression of “Khoisan” mtDNA lineages into the Bantu, since the percentage of mtDNA variation remains high (18.8%) when the Kuvale, who have high frequencies of “Khoisan”-related mtDNA, are excluded from the comparisons. It therefore seems more plausible that differentiation is higher in the mtDNA simply because there is more ancestral mtDNA than MSY variation that can be sorted among different populations [47]. Moreover, due to the matriclanic organization of all Bantu-speaking communities, factors enhancing inter-group differentiation, like kin-structured migration and kin-structured founder effects [48], would have been restricted to mtDNA. Finally, it is also likely that the discrepancy between among-group divergence of mtDNA vs. the MSY might have been influenced by higher migration rates in males than females. In fact, although all Bantu-speaking populations have patrilocal residence patterns, the observance of endogamy rules severely constrains the between-group mobility of females. In this context, the children from extramarital unions involving members from different populations tend to be raised in the mother’s group, effectively increasing male versus female migration rates. Moreover, it is likely that, in the highly hierarchized setting of the Namib, most inter-group extramarital unions would involve men from dominant groups and women from peripatetic communities. This hypothesis is indirectly supported by the finding that in MSY-based clusters (but not in mtDNA-based clusters) pastoralist populations are grouped together with peripatetic communities that share their cultural traits (Figs. S6 and 3b), suggesting that migration of MSY lineages follows a path that is similar to horizontally transmitted cultural features.

Taken together, our results highlight the importance of the matrilineal rule of descent in shaping sex-specific patterns of population diversity and differentiation, stressing the need to better understand how regularities disclosed at the global level are associated with demographic processes occurring at local scales.