Introduction

Wetlands are among the most threatened ecosystems in the world. They originally covered ~10.3% of Earth's land surface (Kingsford et al. 2016), but more than 70% of wetlands worldwide have been lost since 1900 (Davidson 2014; Kingsford et al. 2016). Habitat loss driven by the expansion of agriculture and urban areas, together with pollution and overharvesting, threatens the conservation of wetlands and their biodiversity and ecosystem services (Dudgeon et al. 2006; Vörösmarty et al. 2010). With increasing temperatures and climate unpredictability, water flow regimes in rivers and wetlands will change, imposing further challenges to wetland ecosystem conservation (Kingsford 2011).

In Africa, wetlands covered 5–7% of the continent in the early twentieth century, but 43% of these wetlands have already been lost (Junk et al. 2013; Kingsford et al. 2016). Most wetlands in Africa are situated in Central and East Africa and comprise four major river basins (the Nile, Niger, Congo and Zambezi), the Okavango Delta in Botswana, the Sudd in southern Sudan and Ethiopia, and several lakes, including Lake Chad and the Rift Valley lakes, such as Victoria, Tanganyika, Malawi, Turkana, Mweru and Albert (Kabii 2021).

Wetland plant species rely heavily on water for the dispersal of seeds and propagules, which may lead to a downstream unidirectional dispersal trend (Müller 1974; Boedeltje et al. 2004). This unidirectional trend may affect population demography and connectivity, leading to population extinction upstream (Anholt 1995; Honnay et al. 2010), and to accumulation of genetic diversity downstream (the unidirectional diversity hypothesis, Ritland 1989). In contrast, populations located upstream may present low genetic diversity due to poor compensation from downstream (Honnay et al. 2010). Some studies provide support for the unidirectional dispersal hypothesis. Examples include Myricaria laxiflora in the Three Gorges Valley of the Yangtze River (Liu et al. 2006), Heliconia metallica in Southeast Peru (Schleuning et al. 2011), Sparganium emersum in the Niers River, Germany–The Netherlands (Pollux et al. 2009), and Impatiens glandulifera in Wales and Ireland (Love et al. 2013). On the other hand, several studies found no genetic evidence of unidirectional dispersal in plant species (e.g., Markwith and Scanlon 2007; Chen et al. 2009, 2017). For instance, Hymenocallis coronaria, a macrophyte of rocky rivers in Southeast USA, shows no consistent downstream increase in genetic diversity and gene flow (Markwith and Scanlon 2007), opposing both the unidirectional dispersal hypothesis and Kimura’s one-directional stepping-stone model (Kimura and Weiss 1964). The stepping-stone model predicts that the correlation in allelic frequencies among populations decreases with distance, similar to isolation-by-distance (Wright 1943), because gene flow occurs among adjacent populations (Kimura 1953; Kimura and Weiss 1964).

The lack of support for the unidirectional genetic diversity hypothesis may reveal the complexity of population dynamics and gene flow in wetlands. Migration dynamics, which may determine the evolutionary success of a species, result from intricate variables including life-history traits and geographical, environmental and human factors. The pollination system may promote long-distance dispersal and so break the pattern in genetic diversity expected under water-mediated unidirectional dispersal of seeds and vegetative propagules. In addition, geological history and paleoclimatic changes may have affected demographic history and species distribution, leading to more complex spatial patterns of genetic diversity. The dynamics in effective population size through time and historical gene flow are essential to understand the current spatial patterns in genetic diversity, because past bottlenecks or population range retraction may decrease genetic diversity due to genetic drift (Holsinger and Weir 2009; Excoffier et al. 2009).

Most of the main modern rivers of Southeast Africa were derived from the Proto–Limpopo basin, which occupied an extensive area from the Central African plateau towards the Indian Ocean (Goudie 2005; Tweddle 2010). At the Cretaceous/Tertiary transition, the uplift of the Okavango, Kalahari and Zambezi axis changed the drainage, leading to the formation of the Limpopo, Zambezi and Okavango river basins (Goudie 2005; Moore et al. 2008). The propagation of the East African rift system in the Paleogene, c. 30–35 Ma, formed or altered the drainage systems of the main modern river basins and lakes in East Africa (Grove 1986; Nyblade and Brazier 2002; Chorowicz 2005). Nevertheless, these river basins still have a strong biological connection, as shown by the similarity of their fish species (Tweddle 2010; Danley et al. 2012). The Okavango and the upper Zambezi rivers, for instance, may be connected during extremely rainy years (Tweddle 2010).

During the Quaternary, several glaciations may have caused drier and colder climates in Africa (Shanahan and Zreda 2000; Ngomanda et al. 2009). The pollen fossil record indicates that Africa was cold and dry between ca. 35 and 15 kyr (Bonnefille et al. 1990), with lower lake levels. The driest period agrees with the Last Glacial Maximum (LGM), ca. 21 kyr (Gasse et al. 1989). Temperature and precipitation started to increase ca. 15 kyr, also agreeing with the end of the LGM (Bonnefille et al. 1990). Moreover, a period of strong aridity is recorded across East Africa ca. 13–11 cal. kyr (Bonnefille et al. 1995), matching the European Younger Dryas cold interval. A final glaciation, which occurred at 8.6 ± 0.5 kyr (Shanahan and Zreda 2000), corresponds to a widespread cold event in the early Holocene (Alley et al. 1997). The cycles of glacial and interglacial periods may have caused range contraction and expansion of wetland plant species, leading to an increase in the frequency of one or a few alleles due to the dispersal of genotypes at the edge of expansion (Shine et al. 2011). This expansion may result in lower genetic diversity in newly colonized areas (Hewitt 1996), due to the spread of alleles carried on the expanding wave front (allele surfing, Excoffier and Ray 2008; Arenas et al. 2013). The fast colonization of new areas, in addition to founder and density-dependent processes, may cause low genetic diversity and gradients from the centre towards the periphery of the distribution range (Excoffier et al. 2009; Waters et al. 2013). In wetlands, this may cause higher genetic diversity in populations upstream than downstream, depending on which places were more suitable during glacial periods when populations retracted. Thus, understanding how Quaternary climate changes affected the geographical distribution of lineages in wetlands may help to understand the dynamics and diversification of species and to unravel biogeographical patterns in wetlands, as we address here for the paper reed sedge.

Cyperus papyrus L. (Cyperaceae) is a perennial rhizomatous species using C4 photosynthesis, which is dominant in wetlands across East and Southeast Africa, occurring in large, dense and almost monospecific stands (Supplementary Fig. S1). The species evolved ca. 10.9 Ma (Besnard et al. 2009), has been used by humans since ancient times and continues to be harvested by local riparian communities for fuel, handicraft and building. The species reproduces both asexually by creeping rhizomes, and sexually via pollination by wind. The dispersal unit is a small 1-seeded nutlet that is dispersed by water, birds and wind (Opio et al. 2014). The clearing of wetlands for agricultural and urban areas is reducing and endangering papyrus populations and the ecosystem services provided by the species (Owino and Ryan 2007; Saunders et al. 2012). However, most studies about C. papyrus so far have been carried out in Kenya, shedding light on the ecology and conservation (e.g., Owino and Ryan 2007; Morrison and Harper 2009), use, management and impacts of anthropogenic disturbances (e.g., Jones et al. 2018), and population genetics (e.g., Terer et al. 2012; Triest et al. 2014; Geremew et al. 2018a; Mwaniki et al. 2019). A comprehensive genetic study in other river basins is still lacking, hindering the understanding of species evolution and the processes driving the distribution of genetic diversity to support conservation strategies.

Here, we address how past demographic dynamics shaped the contemporary spatial pattern in genetic diversity and population structure of C. papyrus in Southeast Africa river basins (Zambezi, Okavango, Messalo, Inkomati, Maputo and Lugela). We specifically test the hypothesis of unidirectional dispersal, i.e., that populations downstream have higher genetic diversity than populations upstream. We tested the hypothesis at two levels: across all populations in different river basins, and among populations within a single river basin (Zambezi). If this hypothesis holds, then we expect (i) that populations downstream have higher genetic diversity than upstream, (ii) higher admixture in populations downstream than upstream, and (iii) unidirectional migration from populations upstream towards populations downstream. On the other hand, because previous studies show strong isolation-by-distance in populations of C. papyrus in Kenya (Terer et al. 2012; Triest et al. 2014), if C. papyrus dispersal follows the stepping-stone model, we expect (iv) a stronger effect of isolation-by-distance than by-environment, (v) high admixture among populations due to stepping-stone gene flow, and (vi) higher migration between nearest-neighbour populations. We also expect low genetic diversity within populations and high differentiation among populations due to the high frequency of vegetative reproduction (Opio et al. 2014).

Material and methods

Sampling sites and design

We sampled 1172 individuals of Cyperus papyrus, from 22 populations distributed across five countries in Southeast Africa, comprising six different major river basins (Fig. 1 and Supplementary Table S1). These populations were integrated into nine rivers (Zambezi, Shire, Kafui, São Filipe, Futi, Inkomati, Licungo, Messalo and Okavango), two lakes (Malawi and Gumbua) and four swamps (Nhamiconda, Nhamadiba, Muindi and Luabo).

Fig. 1: Geographical distribution of the 22 sampling sites of Cyperus papyrus in Southeast Africa and the Bayesian genetic clustering of individuals based on 19 microsatellite loci (B).

A The geographical distribution of sampling sites onto the river basin shape file obtained from the Hydrologic Derivatives for Modelling and Analysis (HDMA, https://hidrosheds.cc.usgs.gov/hydro.php). Different river basins are highlighted by different colours following the legends. B Geographical distribution of the genetic clusters (K = 4). Each colour represents an inferred genetic cluster. The size of cluster chart section represents population co-ancestry for each cluster. The plot shows the individual ancestry for the most likely K = 4. Details of the populations are provided in Supplementary Table S1.

At each site, we collected leaves and shoots from at least 25 adults (and up to 70 in larger populations) along a continuous linear transect. To avoid sampling duplicates due to vegetative reproduction (Opio et al. 2014), the samples were collected at 3 m distance from each other, following Triest et al. (2014). We recorded the coordinates of the first and the last individuals of each transect using a Garmin GPS, and mapped populations onto the river basin shape file obtained from the Hydrologic Derivatives for Modelling and Analysis (HDMA, https://hidrosheds.cc.usgs.gov/hydro.php), using the Geographic Information System QGIS 3.16 (https://www.qgis.org/). Sampling was carried out from January to October 2017.

Genetic data

DNA was extracted from leaves dried in silica gel using the E.Z.N.A. SP Plant DNA Kit® (Omega BioTek, Norcross, GA). For genotyping, we used 19 microsatellite primers (C38, C4, C14, C52, C34, C10, C13F, C135, C5, C3, C28, C7, C56, C27, C12, C23, C15, C62 and C1) developed and optimized for C. papyrus (Triest et al. 2014). The polymerase chain reaction was performed following Triest et al. (2014), in MJ PTC-200 (Marshall Scientific, Hampton, NH) and Bio-Rad MyCycler (Hercules, CA) thermal cyclers. Genotyping was performed at the Macrogen facilities (Seoul, South Korea). Microsatellite allele sizes were scored using the software GeneMarker v3.0.0 (Holland and Parson 2011). Genotypes were first checked for genotyping errors due to stutter bands and allele dropout using Micro-Checker v2.2.3 (Van Oosterhout et al. 2004). The raw data showed no significant evidence of genotyping errors or null alleles.

Genetic diversity and population genetic structure

Because C. papyrus has vegetative reproduction, we first analysed multilocus genotypes to discard potential clones using GenAlEx v6.5 (Peakall and Smouse 2006). Our final dataset included 770 individuals (588 potential clones excluded).
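The clone-filtering step can be illustrated with a minimal Python sketch. It is not the GenAlEx procedure itself: the function name `remove_clones`, the sample identifiers and the allele sizes are hypothetical, and the sketch assumes that duplicates are matched within populations only and that there are no missing genotypes.

```python
from collections import defaultdict

def remove_clones(genotypes):
    """Keep one individual per distinct multilocus genotype within each population.

    genotypes: list of (individual_id, population, genotype) tuples, where
    genotype is a tuple of (allele_a, allele_b) pairs, one pair per locus.
    """
    seen = defaultdict(set)  # population -> set of genotypes already kept
    kept = []
    for ind, pop, geno in genotypes:
        # Sort alleles within each locus so (150, 152) and (152, 150) match.
        key = tuple(tuple(sorted(locus)) for locus in geno)
        if key not in seen[pop]:
            seen[pop].add(key)
            kept.append(ind)
    return kept

# Hypothetical two-locus genotypes (allele sizes in base pairs).
samples = [
    ("KAA_01", "KAA", ((150, 152), (200, 200))),
    ("KAA_02", "KAA", ((152, 150), (200, 200))),  # same genotype as KAA_01
    ("KAA_03", "KAA", ((150, 150), (200, 204))),
    ("OKA_01", "OKA", ((150, 152), (200, 200))),  # same genotype, other population
]
unique_ids = remove_clones(samples)
print(unique_ids)
```

In this toy example KAA_02 is dropped as a putative clone of KAA_01, whereas OKA_01 is retained because it comes from a different population.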

For each population, we calculated the following genetic parameters with the software FSTAT v2.9.3.2 (Goudet 1995): A, mean number of alleles per locus; Ar, allelic richness (El Mousadik and Petit 1996); Ho, observed heterozygosity; He, expected heterozygosity under Hardy–Weinberg Equilibrium (Nei 1978); and f, inbreeding coefficient. We tested whether genetic diversity differs among river basins using permutation tests (10,000 permutations) implemented in FSTAT v2.9.3.2 (Goudet 1995).
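As an illustration of these parameters, the sketch below computes Ho, Nei's (1978) unbiased He and a simple inbreeding coefficient for a single locus. It is a simplified stand-in, not the FSTAT implementation: here f is taken as 1 − Ho/He rather than the estimator FSTAT uses, and the function name and genotypes are hypothetical.

```python
from collections import Counter

def locus_diversity(genotypes):
    """Return (Ho, He, f) for one locus from diploid genotypes given as (a, b) pairs."""
    n = len(genotypes)
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    # Observed heterozygosity: proportion of heterozygous individuals.
    ho = sum(1 for a, b in genotypes if a != b) / n
    # Nei's (1978) unbiased expected heterozygosity.
    sum_p2 = sum((count / (2 * n)) ** 2 for count in allele_counts.values())
    he = (2 * n / (2 * n - 1)) * (1 - sum_p2)
    # Simplified inbreeding coefficient (assumption: f = 1 - Ho/He).
    f = 1 - ho / he if he > 0 else float("nan")
    return ho, he, f

# Hypothetical genotypes at one microsatellite locus.
toy_locus = [(150, 150), (150, 152), (152, 152), (150, 150)]
ho, he, f = locus_diversity(toy_locus)
print(round(ho, 3), round(he, 3), round(f, 3))
```

A positive f, as in this toy example, indicates a deficit of heterozygotes relative to Hardy–Weinberg expectations.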

To analyse the genetic structure and population differentiation, we applied a hierarchical AMOVA implemented in Arlequin v3.5 (Excoffier and Lischer 2010), partitioning the total variance into variance among river basins (FCT); among populations within river basins (FSC); among individuals in the total population (FIT); and among individuals within each population (FIS). Statistical significance was assessed with a nonparametric test using 10,000 random permutations. We also estimated global FST (Wright 1965) and RST (Slatkin 1995), which is based on the variance in allele sizes, to verify the contribution of stepwise-like mutations to genetic differentiation, and tested the hypothesis that FST = RST (Hardy et al. 2003) using the software SPAGeDi v1.4 (Hardy and Vekemans 2002).

We used Bayesian clustering simulation to assess the number of genetic clusters (K) and admixture among populations of C. papyrus from different river basins, using the software Structure v2.3.4 (Pritchard et al. 2000). We performed a burn-in period of 100,000 repetitions and then, 1,000,000 Markov Chain Monte Carlo (MCMC) repetitions of data collection, with admixture model of ancestry and correlated allele frequencies. To detect the consistency of results, ten independent runs were performed for each K value (K = 1–22). We assessed the most likely K supported by the data using the ΔK method (Evanno et al. 2005) implemented in Structure Harvester v0.6.94 (Earl and VonHoldt 2012).
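The ΔK statistic can be sketched in a few lines of Python. This is an illustrative reimplementation under stated assumptions, not the Structure Harvester code: it uses the mean log-likelihood of replicate runs to compute the second-order rate of change and divides by the standard deviation of L(K) across runs. The function name `delta_k` and all likelihood values are invented for the example.

```python
from statistics import mean, stdev

def delta_k(lnp):
    """lnp maps each K to a list of Ln P(D) values from replicate runs.

    Returns {K: deltaK} for every K that has both neighbours K-1 and K+1.
    """
    ks = sorted(lnp)
    means = {k: mean(lnp[k]) for k in ks}
    result = {}
    for k in ks:
        if k - 1 in means and k + 1 in means:
            # |L''(K)| = |L(K+1) - 2 L(K) + L(K-1)| on the mean likelihoods.
            second_order = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            sd = stdev(lnp[k])
            result[k] = second_order / sd if sd > 0 else float("inf")
    return result

# Invented log-likelihoods for three replicate runs per K.
toy = {
    1: [-5000.0, -5002.0, -4998.0],
    2: [-4600.0, -4604.0, -4596.0],
    3: [-4300.0, -4302.0, -4298.0],
    4: [-4250.0, -4254.0, -4246.0],
    5: [-4240.0, -4250.0, -4230.0],
}
dk = delta_k(toy)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

Because ΔK needs both neighbouring values, it is undefined for the smallest and largest K tested, which is one reason for testing a wide range of K.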

Demographic dynamics across the river basins in Southeast Africa

We first estimated contemporary effective population size (Ne) using the linkage disequilibrium method implemented in NeEstimator v2.1 and selecting a threshold lowest allele frequency value of 0.05 (Do et al. 2014).

We then estimated demographic parameters using coalescent analysis implemented in the software Migrate-N v5.0.4 (Beerli 2009), to understand the demographic dynamics through time and migration. We estimated the mutation-scaled effective population size (θ = 4μNe, the coalescent or mutation parameter for a diploid genome, where Ne is the effective population size and μ is the mutation rate), the number of migrants per generation from the scaled migration rate (M = 4Nem/θ, where m is the migration rate), and population divergence time. To test the hypothesis of migration, we performed the coalescent analysis with eight groups of populations, based on the river basins and the Structure clusters (see results, Fig. 1): Okavango (OKA population, cluster 3), Inkomati and Maputo (INK and FUT, cluster 4), Lugela (LUA, LIA and LIB, mainly clusters 2 and 3), Messalo (MEA and MEB, clusters 1, 2 and 4), middle Zambezi river basin (KAA, KAB, KAT, mainly clusters 2 and 3), lower Zambezi, Shire river (AWS, SHI, NHA, NHI, cluster 1), lower Zambezi, Gumbua (GUM, MAL, MOP, clusters 1, 2 and 3), and lower Zambezi, delta (SAL, SUA, MAR, MUI, clusters 1 and 2). Using the Brownian motion model, we ran two long chains, sampling 1,000,000 steps and recording 10,000 steps, with a burn-in of 100,000 steps per replicate. We included four independent chains with different temperatures (1.00, 1.50, 3.00 and 1,000,000.00) in a Metropolis-coupled MCMC procedure (Geyer and Thompson 1995). Bayesian estimates were computed individually for the 19 loci and summarized as weighted values over all loci, from which we selected the mean values. We ran two independent analyses and checked the consistency of results using Tracer v1.7.1 (Rambaut et al. 2018). To estimate divergence time and demographic parameters, we used the microsatellite mutation rate reported for a closely related lineage, Zea mays (Poaceae), of 7.7 × 10⁻⁴ (95% CI = 5.2 × 10⁻⁴ to 1.1 × 10⁻³) mutations per allele per generation (Vigouroux et al. 2002). For generation time we used 1 year, because C. papyrus’ stems take ~6 months to reach maturity and 9–12 months to reach flowering and senescence (Terer et al. 2012). However, considering the individual lifetime, because it is a perennial sedge, we also used generation times of 2 and 5 years.
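The conversions implied by these definitions can be written out as a short Python sketch. It only rearranges the formulas given in the text (θ = 4μNe and M = 4Nem/θ) and scales a time in generations by the three generation times considered. The values of θ, M and the number of generations are hypothetical, and the function names are illustrative.

```python
MU = 7.7e-4  # mutations per allele per generation (Vigouroux et al. 2002)

def ne_from_theta(theta, mu=MU):
    """Effective population size from theta = 4 * mu * Ne (diploid)."""
    return theta / (4.0 * mu)

def migrants_per_generation(theta, m_scaled):
    """4 * Ne * m recovered from M = 4 * Ne * m / theta, as defined in the text."""
    return theta * m_scaled

def years_from_generations(generations, generation_time_years):
    """Convert a divergence time in generations to years."""
    return generations * generation_time_years

theta_example = 3.08   # hypothetical theta estimate
m_example = 0.5        # hypothetical scaled migration rate M
ne = ne_from_theta(theta_example)
migrants = migrants_per_generation(theta_example, m_example)
# Hypothetical divergence of 6000 generations under 1, 2 and 5 year generations.
times = {g: years_from_generations(6000, g) for g in (1, 2, 5)}
print(ne, migrants, times)
```

Because divergence time in years scales linearly with generation time, the 5-year scenario yields dates five times older than the 1-year scenario for the same estimate in generations.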

Spatial pattern in genetic diversity and structure

We applied multiple regression analyses to test whether populations are isolated-by-distance or by-environment. To infer the effect of geographical and environmental distance on genetic differentiation, we performed Multiple Matrix Regression with Randomization (MMRR), following Wang (2013). We calculated pairwise linearized FST (Slatkin 1995) using Arlequin, and calculated the logarithm of the geographical distance between pairs of populations. For populations in the same river, the distance was calculated along the watercourse. For environmental variables, we obtained the 19 bioclimatic variables from WorldClim (https://www.worldclim.org/bioclim), and selected the bioclimatic variable with the highest loading on each of the first three factors, after performing a factor analysis with Varimax rotation. We selected (Supplementary Table S2): BIO4 (temperature seasonality), BIO10 (mean temperature of the warmest quarter) and BIO14 (precipitation of driest month). In addition, we obtained slope data for Africa from the digital elevation model with 3 arc-second resolution in the Digital Elevation Model database of the HDMA (https://www.sciencebase.gov/catalog/item/591f6d02e4b0ac16dbdde1c7).
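The logic of matrix regression with randomization can be sketched in pure Python as follows. This is a simplified illustration rather than the Wang (2013) implementation: it standardizes the lower triangles of the distance matrices, fits ordinary least squares, and tests each coefficient by jointly permuting the rows and columns of the response matrix, using the absolute coefficient as the test statistic (an assumption made for brevity). Population positions, environmental values and FST values are invented.

```python
import math
import random

def lower_triangle(matrix):
    return [matrix[i][j] for i in range(len(matrix)) for j in range(i)]

def standardize(values):
    m = sum(values) / len(values)
    sd = (sum((v - m) ** 2 for v in values) / (len(values) - 1)) ** 0.5
    return [(v - m) / sd for v in values]

def ols(y, xs):
    """Least-squares slopes of standardized y on standardized predictors xs."""
    p = len(xs)
    # Augmented normal equations, solved by Gauss-Jordan elimination.
    a = [[sum(u * v for u, v in zip(xs[i], xs[j])) for j in range(p)]
         + [sum(u * v for u, v in zip(xs[i], y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * z for x, z in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]

def mmrr(response, predictors, n_perm=999, seed=1):
    n = len(response)
    xs = [standardize(lower_triangle(m)) for m in predictors]
    observed = ols(standardize(lower_triangle(response)), xs)
    rng = random.Random(seed)
    exceed = [1] * len(xs)  # the observed arrangement counts once
    for _ in range(n_perm):
        order = list(range(n))
        rng.shuffle(order)
        # Permute rows and columns of the response matrix together.
        permuted = [[response[order[i]][order[j]] for j in range(n)] for i in range(n)]
        coefs = ols(standardize(lower_triangle(permuted)), xs)
        for k, (c, o) in enumerate(zip(coefs, observed)):
            if abs(c) >= abs(o):
                exceed[k] += 1
    return observed, [e / (n_perm + 1) for e in exceed]

def linearize(fst):
    return fst / (1.0 - fst)  # linearized FST (Slatkin 1995)

# Invented data: six populations along a river and one environmental variable.
coords = [0.0, 1.0, 2.5, 4.0, 6.0, 9.0]
env = [3.0, 1.0, 4.0, 1.5, 5.0, 2.0]
n = len(coords)
geo = [[math.log(abs(coords[i] - coords[j])) if i != j else 0.0
        for j in range(n)] for i in range(n)]
env_dist = [[abs(env[i] - env[j]) for j in range(n)] for i in range(n)]
fst = [[linearize(0.05 + 0.04 * geo[i][j]) if i != j else 0.0
        for j in range(n)] for i in range(n)]
coefs, p_values = mmrr(fst, [geo, env_dist])
print(coefs, p_values)
```

In this toy data set differentiation was built to depend on log distance only, so the distance coefficient is close to 1 and the environmental coefficient close to 0, which is the pattern expected under isolation-by-distance without isolation-by-environment.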

To test whether populations downstream have higher genetic diversity due to unidirectional dispersal model, we also calculated the distance of each population to the most upstream population (KAA) only for the Zambezi River, due to the higher number of populations and to the distribution of populations up and downstream (Fig. 1). We regressed the genetic diversity (He) and allelic richness (Ar) against geographical distance using Minitab® software. We also regressed the genetic differentiation among populations up and downstream against geographical distance to test for bidirectional stepping-stone model.
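The regression test of the unidirectional dispersal prediction can be illustrated with a minimal least-squares sketch in Python (the analysis itself was run in Minitab). Under unidirectional dispersal, diversity should increase with distance from the most upstream population, i.e., the slope should be positive. The distances and He values below are hypothetical.

```python
def linear_regression(x, y):
    """Ordinary least-squares fit of y on x; returns (slope, intercept, r2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2

# Hypothetical distances (km) from the most upstream population and He values.
distance_km = [0, 100, 200, 300, 400]
he_values = [0.50, 0.46, 0.44, 0.40, 0.35]
slope, intercept, r2 = linear_regression(distance_km, he_values)
print(slope, intercept, r2)
```

A negative slope, as in this invented example, would run against the prediction of higher diversity downstream.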

To visualize the spatial pattern in population genetic structure and identify corridors and barriers to gene flow, we used the Estimated Effective Migration Surfaces (EEMS) method, which explicitly represents genetic differentiation as a function of migration rates based on an isolation-by-distance model. The effective migration surface represents migration rates that would produce genetic dissimilarities similar to those observed in the data (Petkova et al. 2016). To run EEMS (https://github.com/dipetkov/eems), we used 22 demes and ran the analyses with a burn-in of 500,000 and an MCMC length of 10,000,000. We optimized EEMS parameters by adjusting the qEffctProposalS2 (0.1), qSeedsProposalS2 (1.5), mEffctProposalS2 (4.0), mSeedsProposalS2 (4.0) and mrateMuProposalS2 (0.5) such that the acceptance proportions for all parameters, except degrees of freedom, were within 10–40%, as suggested by Petkova et al. (2016). Using the rEEMSplots R package (Petkova et al. 2016), we plotted the surfaces of effective migration rates (m) and effective diversity (q), i.e., the expected genetic dissimilarity of two individuals sampled within the same deme, on a map of Southeast Africa.

Results

Genetic diversity and population genetic structure

Overall, we found low genetic diversity (He < 0.5) and allelic richness (Ar < 2.0) in all populations (Table 1). We also found a high and significant inbreeding coefficient in most populations (Table 1). He ranged from 0.280 to 0.497, but we found no significant difference among river basins (permutation test, p = 0.294). Although Maputo, Inkomati and Okavango had seemingly lower He, only one population was sampled in these river basins (Table 1). We also found no significant difference in allelic richness (p = 0.433) that was also apparently lower in those three river basins (Table 1). Inbreeding coefficient was also statistically similar among river basins (p = 0.345).

Table 1 Genetic diversity of 22 populations of Cyperus papyrus in Southeast Africa, based on 19 microsatellite loci.

We found high genetic differentiation among populations based on FST (0.204; p < 0.001) and RST (0.262; p < 0.001), that did not differ statistically (p = 0.112), suggesting that stepwise mutations did not cause a shift in mean allele sizes between populations. We found high and significant genetic differentiation among river basins (FCT = 0.070, p < 0.001) and among populations within river basins (FSC = 0.153, p < 0.001). However, variation was higher among individuals in total populations (FIT = 0.373, p < 0.001) and among individuals within each population (FIS = 0.204, p < 0.001). We also performed a hierarchical AMOVA based on Structure clusters (4 clusters) and found higher variation among clusters (FCT = 0.112, p < 0.001), and lower among populations within clusters (FSC = 0.099, p < 0.001), but inbreeding in total population (FIT = 0.363, p < 0.001) and within populations (FIS = 0.204, p < 0.001) were very similar to the former values obtained with the original populations.
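The reported fixation indices can be cross-checked with the standard multiplicative relationship among hierarchical F-statistics, (1 − FIT) = (1 − FIS)(1 − FSC)(1 − FCT). The short Python sketch below applies it to the values reported above; the function name is illustrative.

```python
def total_fixation(*level_values):
    """Combine hierarchical fixation indices: 1 - F_total = product of (1 - F_level)."""
    product = 1.0
    for f in level_values:
        product *= 1.0 - f
    return 1.0 - product

# Values reported in the text: FIS, FSC, FCT.
fit_basins = total_fixation(0.204, 0.153, 0.070)    # grouping by river basin
fit_clusters = total_fixation(0.204, 0.099, 0.112)  # grouping by Structure cluster
print(round(fit_basins, 3), round(fit_clusters, 3))
```

Both groupings reproduce the reported FIT values (0.373 and 0.363) to three decimal places, which shows that regrouping populations by cluster shifts variance between the among-group and within-group levels without changing the within-population component.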

Bayesian clustering showed that K = 4 was most likely to explain our data (Fig. 1 and Supplementary Fig. S2), but with high admixture among populations either from the same or different river basins. The unique populations from Maputo (FUT) and Inkomati (INK) river basins were grouped in cluster 4 (Fig. 1), although some individuals showed admixture with clusters 1 and 3. The Okavango (OKA) population was assigned to cluster 3, and Zambezi populations showed high admixture, with individuals assigned to the four clusters (Fig. 1). Populations from Lugela (LUA, LIA and LIB) river basin were assigned mainly to clusters 2 and 3, and Messalo populations (MEA and MEB) showed high admixture, with individuals assigned to clusters 1, 2 and 4.

Demographic dynamics across the river basins in Southeast Africa

Contemporary effective population size (Ne) estimated with NeEstimator was low in most populations, ranging from 0.5 to 91.9 (Table 1). Population AWS, from the Zambezi river basin (Shire River), had the highest estimate (Ne = 91.9, 95% CI = 37.8–inf). However, this may be a result of sampling error, because the upper confidence limit was infinite. Excluding AWS, the highest Ne was observed in the Okavango population (Ne = 19.6; 95% CI = 12.5–32.6), followed by Inkomati (Ne = 10.7; 95% CI = 6.2–17.4).

Coalescent analyses showed relatively high effective population sizes (Table 2). Populations from the Middle Zambezi had the highest Ne (1666), followed by the Lower Zambezi Delta (Ne = 1340) and Lower Zambezi Gumbua (Ne = 1184). Lineages from the Middle Zambezi diverged first, at ca. 6.6 kyr (considering a 1-year generation time) to 33.3 kyr (considering a 5-year generation time), followed by Lower Zambezi populations (ca. 5.3–26.8 kyr). Okavango, Lugela, Inkomati and Messalo lineages diverged only recently (Table 2).

Table 2 Demographic parameters of Cyperus papyrus groups of populations in Southeast Africa, based on the coalescent analysis performed with Migrate-N.

Overall, we found moderate to high migration between river basins or groups (Nem > 1.0 migrants per generation; Supplementary Table S3). Inkomati, Lugela and Okavango were sinks of migration, i.e., they received immigrants from the other basins but sent no migrants (Supplementary Table S3). Messalo and the Lower and Middle Zambezi populations were important sources of migrants and also received immigrants from each other.

Spatial pattern in genetic diversity and structure

We found high and significant genetic differentiation (FST) among all pairs of populations (Supplementary Table S4). Overall, the Okavango population was the most divergent, with the highest values of pairwise FST. Differentiation between populations was significantly related to geographical distance (r² = 0.192, p = 0.003, Supplementary Table S5). Although BIO14 (precipitation of the driest month, r² = 0.192, p < 0.036) had a significant relationship with pairwise FST, the regression coefficient was negative (b = −0.187), implying no isolation-by-environment. We found no significant relationship between FST and BIO4 (temperature seasonality), BIO10 (mean temperature of the warmest quarter) or slope (all p > 0.05, Supplementary Table S5).

Both genetic diversity and allelic richness were significantly related to distance from the most upstream population in the Zambezi river basin. However, genetic diversity (r² = 0.369, p = 0.021, Supplementary Fig. S3a) and allelic richness (r² = 0.394, p = 0.016, Supplementary Fig. S3b) decreased with geographical distance. Genetic differentiation was not significantly related to geographical distance (r² = 0.016, p = 0.227, Supplementary Fig. S3c) among populations from the Zambezi river basin.

Our results showed that genetic similarities between demes tended to decay faster with distance in the Middle Zambezi (populations KAA, KAB and KAT), Okavango (OKA), Maputo (FUT) and Inkomati (INK), and thus these demes have lower effective migration (Fig. 2). Populations of the Lower Zambezi (see Fig. 1), Lugela and Maputo showed slightly higher effective migration (Fig. 2). The diagnostic plot of the EEMS showed no deviations from the fitted isolation-by-distance model (Supplementary Fig. S4).

Fig. 2: Estimated effective migration for 22 populations of Cyperus papyrus in Southeast Africa.

The colour contour plot shows the posterior distribution of the migration rates on a log10 scale, relative to the overall migration rate across the geographical distribution. For instance, log(m) = 1 corresponds to an effective migration that is tenfold faster than the average.

Discussion

Seed dispersal and river basin connections partially explain the lack of unidirectional dispersal

Our findings show no support for unidirectional dispersal. Populations upstream and downstream from different river basins showed similar values of genetic diversity (He) and allelic richness (Ar), and admixture was similar in populations upstream and downstream, contrary to predictions i and ii, respectively. Migration was similar in both directions, upstream and downstream, contrary to what is expected under unidirectional dispersal (prediction iii).

We found high genetic differentiation among populations from the same river basin, and among river basins. However, we also found high admixture, which was neither restricted to neighbouring populations (prediction v) nor higher among downstream populations (prediction ii). Notably, our results showed admixture among populations from the Zambezi, Lugela and Messalo river basins, and between the Zambezi and Okavango. The Zambezi, Lugela and Messalo river basins are connected during extremely rainy years, leading to gene flow among populations from these different river basins. Similarly, the Okavango river basin can be connected to the middle Zambezi through the Chobe and Magwekwana rivers during exceptional flood years (Tweddle 2010), explaining the clustering in similar groups and common ancestry. Zambezi delta populations may also be connected due to flood episodes. The connection between river basins may also explain the high migration between river basins such as the Zambezi, Messalo and Lugela, and the lack of unidirectional migration (prediction iii). This connection was also evidenced by the coalescent analysis and the effective migration surface (EEMS method). Population KAT is upstream and KAB is downstream of Victoria Falls on the Zambezi, but they were assigned to the same Structure genetic clusters (2 and 3). Environmental conditions such as the high water velocity at the confluence of the Kafue and Zambezi rivers (Pinay 1988) may carry seeds over long distances, leading to high connectivity among populations and admixture.

The high admixture is evidence of ancient gene flow but also of ongoing gene flow among some populations. It is important to note that C. papyrus reproduces mainly by vegetative rhizome growth, and is wind-pollinated. Wind pollination may potentially promote long-distance pollen dispersal, although in fragmented populations wind pollination may be limited to short distances (Koenig and Ashley 2003). This local pollen dispersal may lead to highly variable seed production among populations, as well as to masting (sporadic production of large seed yields by a population), both of which are characteristic of wind-pollinated species (Satake and Iwasa 2002). The overharvesting of C. papyrus populations and loss of habitat (Nerima and Orikiriza 2013) may decrease the effectiveness of wind-pollen dispersal due to population fragmentation and isolation. We hypothesize that vegetative reproduction and limited pollen dispersal may contribute to the high differentiation among populations evidenced here. On the other hand, most long-distance gene flow may be due to seed dispersal. Hydrochory facilitates connectivity among populations of C. papyrus along the river basins (Triest et al. 2014; Terer et al. 2015). Water may be the major dispersal agent for C. papyrus, leading to long-distance but mostly unidirectional dispersal. However, birds are also important seed dispersers and may carry seeds over long distances, potentially offsetting the tendency towards unidirectional water-mediated dispersal. Seed dispersal by birds, together with the sporadic connection between river basins during extremely rainy years, may break the expected pattern of unidirectional dispersal and partially explain why the distribution of genetic diversity runs opposite to that expected under unidirectional dispersal. Our results showed that populations upstream in the Zambezi river basin had higher genetic diversity than populations downstream in the Zambezi delta, opposing the unidirectional dispersal hypothesis (Honnay et al. 2010). In addition, we found a tendency for higher migration between nearest groups or rivers, supporting stepping-stone dispersal (prediction vi). Moreover, the prevailing tropical easterly winds in Southeast Africa (Miller et al. 2019) may also account for pollen movement from populations downstream towards those upstream, leading to admixture and accumulation of genetic diversity upstream.

In line with prediction (iv), we found no evidence of isolation-by-environment and a strong effect of isolation-by-distance across all river basins, which is probably mediated by hydrochory leading to higher gene flow among neighbouring populations, as expected under the stepping-stone model. In fact, C. papyrus also shows isolation-by-distance in other river basins, such as in lakes and wetlands in Kenya (Terer et al. 2015).

Demographic dynamics due to climate changes during the Holocene may also explain the lack of unidirectional dispersal footprint

Our findings show that demographic dynamics during the Holocene may have shaped the spatial patterns in genetic diversity and differentiation in populations of C. papyrus from Southeast Africa. The coalescent model indicated that lineages from the Middle Zambezi were the first to diverge from the MRCA, at ca. 6.6 kyr, suggesting that Holocene climate changes are more closely related to the contemporary spatial pattern in genetic diversity and differentiation in C. papyrus than the geological history of the river basins. For instance, the capture of the middle Zambezi by the lower Zambezi during the Pliocene to Pleistocene transition (Nugent 1990; Moore et al. 2008) predates the contemporary lineages of C. papyrus.

In fact, the MRCAs coincide with the last glaciation detected in Africa (Shanahan and Zreda 2000), which corresponds to a widespread cold event in the early Holocene (Alley et al. 1997) and may have caused a contraction of the C. papyrus range and divergence of the main groups of the Zambezi River. The spread of these ancestral lineages may have caused the lower genetic diversity in the newly colonized areas downstream and the isolation-by-distance. Moreover, the climate became drier and more unpredictable in Southeast Africa in the last ca. 5 cal kyr BP, probably due to the return of the westerlies from the equator, the high-pressure cell, and the South Indian Ocean Convergence Zone, with the most pronounced arid event at ca. 2.8 cal kyr BP (Miller et al. 2019). The drier and more unpredictable climatic conditions may have caused cycles of population contraction and expansion in C. papyrus, leading to loss of genetic diversity in some populations, and also to lineage sorting and admixture. These processes may have led to higher genetic diversity in populations upstream than downstream, depending on which places were more suitable during glacial periods when populations contracted.

It is possible that populations upstream in the Zambezi river basin were in more stable sites and were the source of propagules to new areas downstream, leading to a pattern opposite to that expected under the unidirectional dispersal model. In addition, several episodes of colder and drier climate occurred in Southeast Africa (Tyson 1999), from ca. 4700 to 4200 BP and then from 3200 to 2500 BP, the Neoglacial period, a widespread event reported elsewhere in the world. The climate was very variable over the last 2000 years. From 2000 to 900 BP the climate was characterized by cooling, then warming from about 250 to 600 AD, and cooling again until 900 AD. From 900 to 1300 AD the climate was warm and variable, corresponding to the Medieval Warm Epoch (Tyson 1999). The five-century cooling from 1300 to 1850 AD corresponds to the Little Ice Age in Southeast Africa. These cycles of warmer and colder periods may explain the low genetic diversity and the variable migration among populations. While some river basins show more recent gene flow, in the last 570 years, most populations show negligible migration (<1.0 migrants per generation). The migration between distant and apparently isolated river basins, such as Messalo, Maputo and Inkomati, may be an effect of stepping-stone dispersal mediated by the connection between Messalo and Zambezi, but also raises the hypothesis of human-assisted migration.

The scenario of cycles of population contraction and expansion due to drier and colder periods in the African continent still holds if we consider a longer generation time, such as 5 years (Table 2). The African continent, including East Africa, was cold and dry between ca. 35 and 15 kyr, with the driest period occurring at ca. 21 kyr, coinciding with the Last Glacial Maximum (LGM; Bonnefille et al. 1990). Fossil pollen records also indicate large variability in precipitation during this time, which is consistent with the record of multiple glacial advances and retreats during the last glaciation (Shanahan and Zreda 2000). A period of strong aridity is also recorded across East Africa at 13–11 cal. kyr, coinciding with the European Younger Dryas cold interval (Bonnefille et al. 1990). It was followed by another glacial period in the early Holocene at ca. 8.6 kyr, the last glaciation detected in Africa (Shanahan and Zreda 2000), which corresponds to a widespread cold event in the early Holocene (Alley et al. 1997).
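
The dependence of the dating on generation time can be made explicit with a simple scaling. This is a sketch under the assumption that the coalescent analysis estimates divergence times in generations, with mutation rates expressed per generation; the symbols below are introduced here for illustration and are not taken from Table 2.

```latex
% t_gen: divergence time in generations (assumed output of the coalescent model)
% g:     assumed generation time in years
t_{\mathrm{years}} = t_{\mathrm{gen}} \times g ,
\qquad
\frac{t_{\mathrm{years}}(g_2)}{t_{\mathrm{years}}(g_1)} = \frac{g_2}{g_1}
```

Under this assumption, a longer generation time shifts every divergence date proportionally further into the past, so the older cold and dry intervals listed above become the relevant candidate events.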

Life-history and environmental conditions may explain the spatial pattern in genetic diversity

Our results reveal that the geographical location and environmental conditions of the site play an important role in the patterns of genetic diversity and differentiation. In fact, populations of C. papyrus have low genetic diversity, similar to Kenyan populations in East Africa (He = 0.412, SD = 0.149; Terer et al. 2015), and slightly lower than populations in Lake Naivasha, Kenya (He = 0.545, SD = 0.022; Triest et al. 2014). The species reproduces mainly by vegetative growth, which may lead to low genetic diversity and polymorphism, and to high differentiation among populations.
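
For reference, expected heterozygosity at a locus is conventionally defined as He = 1 - Σ p_i², where p_i is the frequency of allele i, and is averaged over loci. The sketch below illustrates this standard definition with made-up allele labels; it is not the software or the data used in this study, and the function names are hypothetical.

```python
from collections import Counter

def expected_heterozygosity(alleles, unbiased=False):
    """Expected heterozygosity (gene diversity) at one locus.

    `alleles` lists every sampled gene copy (two per diploid individual).
    With unbiased=True, the n / (n - 1) small-sample correction is applied,
    where n is the number of gene copies.
    """
    counts = Counter(alleles)
    n = sum(counts.values())
    he = 1.0 - sum((c / n) ** 2 for c in counts.values())
    if unbiased:
        he *= n / (n - 1)
    return he

def mean_expected_heterozygosity(loci, unbiased=False):
    """Average He over loci; `loci` is a list of per-locus allele lists."""
    values = [expected_heterozygosity(a, unbiased) for a in loci]
    return sum(values) / len(values)

# Toy example (assumed data): one polymorphic and one monomorphic locus.
locus_1 = ["A"] * 6 + ["B"] * 2   # frequencies 0.75 and 0.25
locus_2 = ["C"] * 8               # a single allele, so He = 0
he_locus_1 = expected_heterozygosity(locus_1)
he_mean = mean_expected_heterozygosity([locus_1, locus_2])
print(he_locus_1, he_mean)
```

A population dominated by a few clones tends to have few alleles at skewed frequencies, which pulls He towards zero, as in the monomorphic toy locus.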

The highest level of genetic diversity, found in the Gumbua population (He = 0.497), can be explained mainly by the physical-geographical conditions of the site. Gumbua Lake (a transitory lake) is crossed by the Shire River (Palamuleni et al. 2010). These conditions may allow the accumulation of genetic richness through the contribution of individuals from the upstream Shire River and the small tributaries around the lake. In contrast, the low level of genetic diversity in the population at the Inkomati river basin may be a result of intense anthropogenic disturbances in southern Mozambique during the last 42 years (Smith et al. 2008). Anthropogenic disturbances may also affect the spatial pattern in the population genetic diversity of C. papyrus. In wetlands, changes in landscape features may disrupt gene flow and affect spatial patterns in genetic diversity. For instance, dams for crop irrigation and urban development may affect genetic differentiation and bidirectional gene flow in Miscanthus lutarioriparius in the Yangtze River in China (Yan et al. 2016).

Similarly, populations downstream in the Zambezi river basin showed lower genetic diversity than populations upstream. Those downstream populations are located in the most disturbed area of Mozambique, close to urban, industrial and agricultural areas. Thus, the low contemporary effective population sizes and genetic diversity may be due to recent impacts on populations of C. papyrus. For instance, the C. papyrus population INK showed very low genetic diversity (0.289) and is intensively used by the local community for building material, household objects and cattle grazing (Jones et al. 2018). The construction of three dams along the middle Zambezi 70 years ago (Tumbare 2000) may have contributed to the genetic differentiation between populations, given the short generation time of C. papyrus, and may partially explain the low connectivity. Cyperus papyrus has a short life cycle (less than 2 years), seeds may germinate promptly after dispersal, and plants may flower within approximately 6 months (Rongoei and Outa 2016). Although the vegetative reproduction system of the species may affect generation time (Rongoei and Outa 2016), given the fast life cycle and the short time to first flowering and reproduction, anthropogenic impacts on genetic diversity are feasible. It is important to note that the NeEstimator-based analysis retrieved low values of Ne. This is most likely due to deviation from the model's assumptions of isolated and randomly mating populations. In addition, we found a high inbreeding coefficient, indicating selfing or biparental inbreeding. Furthermore, C. papyrus may have strong fine-scale spatial genetic structure due to limited seed dispersal, with very low neighbourhood sizes ranging from 5 to 11 (Geremew et al. 2018b). Thus, it is likely that the Ne estimates based on linkage disequilibrium reflect the effective size of local neighbourhoods more than the effective size of a whole population. Also, the low genetic diversity may constrain a reliable estimate of Ne.

Concluding remarks

Our findings show that the spatial pattern in genetic diversity and population structure of C. papyrus in Southeast Africa is the outcome of several interacting factors. We found no evidence of unidirectional dispersal, yet our data support stepping-stone dispersal, although not necessarily unidimensional. The past demographic dynamics in the Holocene, caused by several episodes of cool and dry climate interspersed with wet and hot periods, shaped the current pattern of genetic diversity and population structure among the adjacent river basins, leading to higher genetic diversity in populations upstream in the Zambezi river basin. The high level of admixture may reveal past and ongoing gene flow among C. papyrus populations due to seed dispersal and to the connection among adjacent river basins by floods during extremely wet years.

It is important to highlight the very low genetic diversity of C. papyrus populations in Southeast Africa, which may hamper the long-term viability of these populations. This may be due to the demographic dynamics, but also to anthropogenic disturbances, such as overharvesting that decreases population sizes (Nerima and Orikiriza 2013), pollution, and loss of habitat due to urban and agricultural expansion. It is also important to acknowledge that the number of populations sampled differed greatly among river basins, which may affect the comparison of genetic diversity among river basins and the analysis of unidirectional dispersal in different river basins. Unfortunately, we could not sample more populations in the Okavango because of the authorization requirements of the Botswana government for access to sampling sites, or in other river basins in Mozambique because of civil war. Future studies may include more populations from these other river basins and from the upper Zambezi to better understand the spatial distribution of genetic diversity. It is also important to highlight that this is the first study of the spatial pattern of genetic diversity in the river basins of Southeast Africa, and the first study of C. papyrus in the region. The Okavango Delta is unique in the world in that its waters flow towards the interior of the continent, into the Kalahari Desert in Botswana (Ramberg et al. 2006). Thus, a more in-depth study of this river basin is still needed. The Zambezi is an important river in the region, with a catchment area of ca. 1.32 million km² in eight countries, providing ecosystem services to hundreds of thousands of people (Tweddle 2010). The low genetic diversity detected for C. papyrus may give a clue about the impacts of human disturbances along the catchment of this river basin and may provide important data for designing sound conservation and management strategies.