Selective signatures and high genome-wide diversity in traditional Brazilian manioc (Manihot esculenta Crantz) varieties

Knowledge about genetic diversity is essential to promote effective use and conservation of crops, because it enables farmers to adapt their crops to specific needs and is the raw material for breeding. Manioc (Manihot esculenta ssp. esculenta) is one of the world’s major food crops and has the potential to help achieve food security in the context of on-going climate changes. We evaluated single nucleotide polymorphisms in traditional Brazilian manioc varieties conserved in the gene bank of the Luiz de Queiroz College of Agriculture, University of São Paulo. We assessed genome-wide diversity and identified selective signatures contrasting varieties from different biomes with samples of manioc’s wild ancestor M. esculenta ssp. flabellifolia. We identified signatures of selection putatively associated with resistance genes, plant development and response to abiotic stresses that might have been important for the crop’s domestication and diversification resulting from cultivation in different environments. Additionally, high neutral genetic diversity within groups of varieties from different biomes and low genetic divergence among biomes reflect the complexity of manioc’s evolutionary dynamics under traditional cultivation. Our results exemplify how smallholder practices contribute to conserve manioc’s genetic resources, maintaining variation of potential adaptive significance and high levels of neutral genetic diversity.


Results
SNP genotyping. Sequencing of the two ddGBS libraries resulted in 198,017,706 (NsiI-MspI) and 240,176,492 (PstI-MseI) raw reads. After demultiplexing and quality filtering, 153,858,282 reads for NsiI-MspI (mean of 1,672,372.6 reads per sample ± 406,184.6 SD) and 118,476,082 reads for PstI-MseI (mean of 1,287,783.5 reads per sample ± 540,792.9 SD) were used for SNP identification. The NsiI-MspI library resulted in 4790 SNPs (34.8 mean sequence depth per locus ± 21.3 SD, 2.2% of missing data) while the PstI-MseI library resulted in 10,121 SNPs (29.5 mean sequence depth per locus ± 16.8 SD, 4.8% of missing data). After merging these data sets and pruning markers with high LD we obtained a final set of 11,782 high-quality SNPs (31.2 mean sequence depth per locus ± 18.5 SD, 4% of missing data) (Supplementary Figure S1, Table S2).
Putative signatures of selection. The tests for selective signatures contrasting the wild and cultivated manioc identified a total of 2301 outlier SNPs (pcadapt: 1239; F ST : 588; FLK: 590; hapFLK: 590; XP-EHH: 364), of which 698 SNPs were identified by at least two different methods (Fig. 2a,c). When contrasting the groups of varieties per biome, we identified a total of 1673 outlier SNPs (pcadapt: 161; F ST : 550; FLK: 556; hapFLK: 590), of which 169 SNPs were identified by at least two different methods (Fig. 2b,d). Only two outlier SNPs were common for both criteria, and we considered that 865 outlier SNP loci showed putative signatures either of the selection of cultivated manioc from the wild ancestor, or for diversification of manioc in different cultivation environments. A total of 5174 effects were predicted for these outlier SNPs (Supplementary Table S3), of which 569 were within introns and 534 within exons (269 synonymous mutations and 265 non-synonymous). These numbers are greater than the number of outlier SNPs because the effects were predicted for all the alternative transcripts of the genes.
Among the loci with putative selective signatures, 680 SNPs were in 663 different predicted manioc genes. These genes showed a total of 1176 annotations distributed in 33 different GO classes ( Supplementary Fig. S2). The most frequent GO annotations were associated with the molecular functions of binding (215 genes) and catalytic activity (188 genes), and the biological process of metabolism (181 genes). Consistent with this, the most frequent enriched GO annotations were binding to ATP (58 genes) and biding to metal ion (49 genes) (Supplementary Table S4). A total of 69 manioc genes with outlier SNPs were similar to PRGdb 3.0 resistance genes (45 with identity > 90%) belonging to six different classes (according to their protein domains), and most of them (38) had a kinase domain. A total of 576 manioc genes with outlier SNPs were similar to Swiss-Prot proteins (306 with identity > 60%) related to many functions, including plant growth and development, organ sizes, root, flowering, and biotic and/or abiotic stresses. Many genes had similarities with proteins related to more general cellular processes, such as cell proliferation/elongation, transcription regulation, chloroplast activity, signaling, and ubiquitination. The variety of functions is exemplified by the descriptions for 21 of these proteins (Supplementary Material Appendix 1), that we used to guide our discussion. Supplementary Table S5 summarizes blastp results.  According to the AMOVAs, greater proportions of the genetic variance were found within groups ( Table 2). As expected, the greatest divergence among groups was observed between wild and cultivated manioc (φ ST = 0.32, p < 0.001), followed by the divergence between the groups of biomes when compared to wild manioc (φ CT = 0.31, p < 0.001). The divergence among different biomes was small, yet significant (φ ST = 0.03, p < 0.001). Pair-wise estimates of F ST between cultivated and wild manioc were high ( Table 3). The bitter varieties were slightly more divergent from wild manioc (F ST = 0.357) than the sweet varieties (F ST = 0.344), and the divergence between bitter and sweet varieties was much lower (F ST = 0.045), yet significant. All the biomes were highly divergent from wild manioc, with F ST ranging from 0.34 (Amazonia) to 0.43 (Pantanal). The divergence between biomes (a) (b) (d) (c) Figure 2. Summary of genome scans for signatures of selection considering different groups of manioc (Manihot esculenta) samples. Venn diagrams showing the number of outlier SNPs detected for each test (within parenthesis) and the overlap among them (numbers inside ellipses) for (a) wild and cultivated manioc, and (b) the groups of varieties per biome. The genomic context of outlier SNPs is illustrated in circular plots for (c) the groups of wild and cultivated manioc, and d) the groups of varieties per biome. Each manioc chromosome is represented by a different box (10 Mb tick sizes), and their names are coded according to the manioc genome Manihot esculenta v6 (NCBI PRJNA234389). The outlier SNPs are represented by dots for each test, which are shown in different layers. The outlier SNPs detected by at least two tests are highlighted in red.  Table 3). The high divergence between wild and cultivated manioc, and the low divergence among biomes were also evident in sNMF and DAPC (Fig. 3). There was no flatting point in the curve of cross-entropy estimates in sNMF, suggesting the absence of major genetic structure (Fig. 3a). Therefore, we evaluated the correspondence of the ancestry coefficients for K = 2, 3 and 5 with the respective groups of cultivated vs. wild, bitter vs. sweet vs. wild,   www.nature.com/scientificreports/ and the four biomes vs. wild (Fig. 3b). The most evident genetic structure was observed between the wild and cultivated manioc, with high admixture between the groups of bitter and sweet varieties or among biomes. DAPC results were similar, showing high divergence between wild and cultivated manioc, and a great overlap of varieties from different biomes (Fig. 3c,d). Besides the great genetic admixture, the DAPC plots also identify some highly divergent varieties from different biomes. Because DAPC maximizes between-group variations, Amazonia was somewhat more divergent in relation to the other biomes, just as suggested by the pairwise F ST estimates.

Discussion
There are many methods for the detection of selective signatures based on the significant deviation of outlier markers from the distribution of a given statistic measured under a specific model 53 . However, deviance from model assumptions and covariance with sampling strategy, demography, underlying genetic structure, and other specific factors may lead to the detection of false positives [53][54][55][56] . A general approach to account for these limitations is to combine the results of different outlier tests 57,58 . The variable number of outlier SNPs detected by the tests we performed reflect their different underlying models. We discuss below the possible biological significance of some outlier SNPs consistently identified by different tests.
The reduction of genetic diversity associated with domestication bottlenecks, or resulting from multiple founding effects during crops' dispersals 59,60 , might have affected plant defense and resistance mechanisms 61 . The presence of outlier SNPs in putative resistance genes from different classes suggests that the gene bank conserves important genetic resources for the crop. For example, the outlier SNPs Pst_1761 and Pst_6563 were identified in genes similar to the disease resistance proteins RPS2 and RPM1, which are involved in the response to bacterial blight. In manioc, "cassava bacterial blight" is caused by Xanthomonas campestris pv. manihotis (or X. axonopodis pv. manihotis) that is found in cultivation areas throughout the world and is one of the most serious diseases affecting the crop 11,62 . We also identified outlier SNPs (Nsi_172 and Pst_4209) in genes similar to www.nature.com/scientificreports/ proteins involved in resistance to powdery mildew fungus, which in manioc is known as ash disease and is caused by Oidium manihotis Henn 62 . Although ash disease is widespread, it is not considered of great importance due to its superficial lesions 62 . Agronomic trials would be important to confirm if some of the varieties conserved in the gene bank may be sources of resistance alleles for these diseases. Pest and disease resistance may be the principal weakness of manioc to adapt to climate change 63 .
Most of the putative manioc resistance genes had other functions probably due to their catalytic kinase domains, which play many key roles in eukaryotes 64 including signaling and plant defense responses 65,66 . Other outlier SNPs were in genes related to general cellular processes, such as ubiquitination (Nsi_2393, Nsi_3361 and Pst_4290) and transcriptional regulation (Nsi_3361, Pst_8996, Pst_9853 and Pst_9959). These processes are central in the expression and regulation of several other genes, and therefore are likely to be involved in adaptations, including domestication traits 67,68 .
Outlier SNPs in genes putatively involved with development may be genetic signatures of manioc domestication and selection for different traits of interest in response to distinct human preferences. Manioc was domesticated for its starchy roots, and we identified SNPs in genes putatively involved in root formation (Nsi_2393, Nsi_3361, Pst_4290, Pst_8996 and Pst_9959), organ size (Nsi_3361) and shape (Pst_1010), and in starch metabolism (Pst_5701). These SNPs may be of agronomic importance because the current major objectives of breeding for food and industrial uses include increasing the root/stem ratio, improving starch quality, and increasing starch content in roots 69,70 . We also found outlier SNPs in genes putatively involved in the development of shoots (Nsi_3361 and Nsi_3540) and branching (Pst_6259). Domestication often resulted in changes in shoot architecture that facilitate plant growth and harvesting 71,72 . Contrasting with the highly branched habit of ssp. flabellifolia 21 , farmers prefer sparsely branched manioc plants because they may provide thicker stem cuttings 73 . Some outlier SNPs were in genes putatively involved in flower development (Pst_7007, Pst_8971, Pst_9853 and Pst_9959), fertilization (Nsi_172 and Pst_4209), and gametogenesis (Nsi_3361). These selective signatures may result from the relaxation of selective pressures on sexual fertility caused by domestication for vegetative propagation 74 . Flowering is variable in manioc and about 60% of the pollen produced may remain viable, although manioc cultivars commonly have male sterility and low seed/flower ratios 11 . We also identified outlier SNPs putatively involved in cotyledon development (Nsi_2289 and Nsi_4509); this was in a manioc gene similar to Arabidopsis thaliana STY46, a protein kinase involved in chloroplast biogenesis and differentiation in cotyledons 75 . The seedlings of domesticated and wild manioc have contrasting functional differences 76 . The cotyledons are hypogeal in the seedlings of ssp. flabellifolia, providing additional opportunities for plant regrowth after damage. In domesticated manioc, cotyledons are epigeal, foliaceus, and photosynthetically active, collaborating with rapid plant growth 76 .
Growth resilience under distinct environments was essential for the spread and adaptation of crops to regions outside their domestication centers 71 . The outlier SNP Pst_29 was in a gene putatively involved in the metabolism of glycosinolates, secondary compounds that may act in plant defense to a wide range of enemies 77 . Ecological shifts to resource-richer cultivation environments may have relaxed the selection for chemical and physical defenses in some crops 24 . For manioc, natural selection might have been important to maintain defense mechanisms because cultivation in anthropic landscapes made plants more apparent to pests and herbivores 24 . This is especially important in the context of active human selection for low toxic sweet manioc and cultivation in herbivore and pathogen-rich tropical regions 24 , such as Amazonia and other Brazilian biomes. We also found outlier SNPs (Nsi_172, Pst_1010, Pst_3333, Pst_4209 and Pst_5985) possibly involved in cell proliferation and elongation. Genes controlling cell division are frequently associated with domestication and diversification processes because they often resulted in increased organ sizes 78,79 . The roots of cultivated manioc have greater amounts and larger starch granules, but not larger cell sizes, than the wild relatives 11,80 . However, the cessation of cell division and expansion may play an important role during manioc response to drought stress 81,82 . Another outlier SNP (Pst_7007) was in a gene that may act in responses to salt, drought, and cold stresses. Responses to abiotic stresses were important crop adaptations for the ecological shifts from the wild to the cultivation environment and when they started to be dispersed 24,71 . In the specific case of manioc, abiotic resistance is relevant to crop adaptability to marginal areas and might be key for manioc adaption to harsher future climates 63 .
We recognize that the type of genomic library and the limited sampling number of some groups of varieties may have introduced bias in the analyses 83,84 . Although the groups of manioc varieties are not true populations, some aspects, such as the possibility of crossings and incorporation of sexual plants into clonally propagated varieties, make these groups biologically meaningful. Moreover, genetic approaches provided interesting results even when using generic groups of manioc varieties 85,86 and limited sample sizes in other study systems 87,88 .
The selective signatures discussed above should be regarded as initial hypotheses and further genome-wide association studies and quantitative trait loci mapping are required to confirm their biological significance 71,89 . Nonetheless, this new information may guide future bottom-up approaches to characterize the genomic changes and their associated phenotypic effects 90 relevant to manioc evolution under domestication and to assist breeding strategies, contributing to our understanding about adaptive genes in crops.
The genetic diversity of varieties from different biomes (H E ranges from 0.285 to 0.312) was similar to that observed in previous studies and suggests that the gene bank conserves high levels of genetic diversity, despite its limited size. Within Embrapa Brazilian manioc germplasm bank, Albuquerque et al. 49 reported an overall H E = 0.29 (biallelic SNPs) and Ogbonna et al. 91 found H O ranging from 0.26 to 0.39 across genetic groups. Similar results were observed for the gene banks of the International Institute of Tropical Agriculture (IITA, H E = 0.334) and the International Center for Tropical Agriculture (CIAT, H E = 0.341) 43 . This high genetic diversity is explained by the complex evolutionary dynamics of the crop under traditional cultivation, that results in varieties consisting of a predominant clone plus individuals morphologically similar, but genetically distinct 28 . Although smallholder farmers propagate manioc exclusively by stem cuttings, crossings between different varieties may occur producing sexual seeds that become part of the soil seed bank and may sprout amid clonally propagated plants 27 www.nature.com/scientificreports/ harvesting, the farmers may incorporate stem cuttings of these sexual plants into their clonal stocks leading to the amplification and maintenance of genetic diversity in the local scale 30 . The high genetic diversity observed in the different biomes may be explained by the widespread occurrence of the incorporation of sexual plants to clonal varieties, as well as by the selection for distinct human preferences under diverse ecological and cultural contexts 68,93 . Genetic evidence for different human preferences in distinct ecogeographic contexts has already been reported in different regions of South America for manioc 20,94 and other crops 95,96 . The crop's reproductive biology and other traditional farming practices may explain the low to moderate genetic divergence among biomes (F ST ranging from − 0.025 to 0.065) and the high admixture observed in clustering analyses (Fig. 3). Exchange networks of manioc varieties have been reported at local and ample geographic scales 35,48 . These exchange networks facilitate geneflow between distinct varieties at local and broader geographical scales. Genetic admixture among varieties from distinct geographical locations are commonly associated with extensive exchange networks of manioc and other crops 36,97,98 . This low overall genetic divergence may also reflect the initial common dispersal of landraces from their domestication center in Amazonia 20,99 . The Amazonian varieties were somewhat more divergent in relation to the other biomes (Fig. 3d, Table 3), possibly because almost all bitter varieties were from this region. The influence of ecogeographic variation in the distribution of genetic diversity in manioc is variable 20,91,100 , but the existence of divergent varieties from different biomes may also reflect the selection under distinct ecological and cultural contexts. Nonetheless, the genetic divergence between bitter and sweet manioc seems to be higher in Amazonia than outside this region 85,86 . It is possible that divergent selective pressures for bitterness or sweetness are more relaxed outside Amazonia, because the crop's dispersals were not always accompanied by cultural appropriation 13 . Knowledge about adequate processing to avoid intoxication after the consumption of bitter manioc landraces is essential to achieve food security where people rely on manioc cultivation 39 .
The remarkable divergence between wild and cultivated manioc was expected given the long history of the crop diversification under human selection and cultivation 16 . This result may also reflect many founding events 59 that accompanied the rapid spread of the crop across the Neotropics 23 and the wide dispersal across the world. The observed genetic divergence was similar to our previous study 85 , but more geographically extensive sampling of wild populations would improve our understanding about the current population dynamics between wild and cultivated manioc. Recent genomic approaches evidenced introgressions from some wild relatives in the genome of cultivated manioc 44,45 . Because the primary gene pool of manioc consists of 13 species 101 , introgressions may have been contributing to the extant genetic diversity of the crop. Therefore, genome-wide studies in cultivated manioc and different wild Manihot species could greatly contribute to our understanding about the evolution of the crop.
In this study, we observed high levels of genome-wide diversity in manioc varieties from different Brazilian biomes. It is noteworthy that the genetic diversity also included putative adaptive variation, which may be associated with the crop's domestication and its cultivation in distinct environmental contexts with different human preferences. Some of the signatures of selection may be associated with resistance genes and agronomic traits of interest, which might have practical importance for breeding purposes. The varieties conserved in the gene bank can be used as sources for reintroductions into smallholder communities 102 : the highly genetic divergent varieties are important resources for their specific regions of origin, while the admixed varieties may adapt well to cultivation in various locations. This study reinforces the importance of ex situ collections for the conservation of the crops' genetic resources, although it is financially and technically challenging to maintain active gene banks 8,103 . Our study also highlights the necessity of maintaining traditional practices of cultivation, since they are often associated with the management of a great diversity of other native crops 102,104,105 . In the context of a changing world, the characterization and conservation of agrobiodiversity is essential for the appropriate management of their genetic resources and ultimately for food security, especially of poor people.

Methods
Plant material and DNA isolation. We sampled apical leaves of the 78 manioc varieties (1 plant/accession) from the gene bank at the Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, São Paulo, Brazil (22° 42′ 26.8″ S; 47° 38′ 17.8″ W). These varieties were originally collected in smallholder farmer communities in six Brazilian states, located in four different biomes: Amazonia (16), Cerrado (30), Atlantic Forest (27) and Pantanal (5) (Fig. 1, Supplementary Table S1). Of these 78 cultivated varieties, 40 were originally recognized by the farmers as sweet varieties, 13 as bitter varieties, but this information was unavailable for 25 varieties, which are identified in Fig. 1 and Supplementary Table S1 as ND (non-designated). We also evaluated other samples collected in a previous study 85 : eight M. esculenta ssp. flabellifolia samples collected in the center of manioc domestication in Rondônia state, and six Amazonian varieties (three bitter and three sweet). We used these samples as references for the crop's closest wild relative, and for the major groups of cultivated manioc, respectively. The collections of these samples were registered in the Brazilian National Council for Genetic Patrimony CGEN (numbers A7994B4 and AEA71DE), according to Brazilian Law 13123 (20 May 2015). These registers also characterize our study as basic scientific research, enabling the experiments that we performed. The plant materials and methods employed in the present study are in compliance with local and national regulations. Because the manioc varieties are conserved in vivo in the gene bank, they were not deposited in herbarium, however DNA vouchers are stored in our laboratory. In addition, because all manioc varieties have been cultivated for a long time in local communities of smallholder farmers, no formal taxonomic identification was performed at the time of sampling to establish the gene bank.
The leaves were dehydrated with silica gel in paper bags and stored at -20 °C. We obtained total genomic DNA from 50 mg of leaf samples following the protocol described by Doyle and Doyle 106 and inspected DNA quality with electrophoresis in agarose 1% (w/v) gels stained with ethidium bromide. We estimated DNA concentrations www.nature.com/scientificreports/ with dsDNA BR Assay quantification kit for Qubit3 fluorometer (Invitrogen), and normalized DNA concentrations to 25 ng•μL -1 .

Genomic libraries and SNP identification.
We prepared two double-digest genotyping-by-sequencing (ddGBS) libraries as described by Poland et al. 107 . Briefly, 175 ng of genomic DNA were digested with PstI and MseI for one library, and NsiI and MspI for the other (all enzymes from New England Biosciences). The restriction fragments were ligated to adaptors complimentary to each of the restriction sites (including 96-plex PstI or NsiI adaptor sets with unique 4-9 bp barcode sequences). Ligation products of each sample were pooled and enriched for fragments containing different adapters in both ends through PCR. The concentrations of the ddGBS libraries were estimated using the NEBNext Library Quant kit for Illumina (New England Biosciences) through real-time PCR, and the libraries' profiles were inspected using the DNA 12000 Analysis kit for Bioanalyzer 2100 (Agilent). Each ddGBS library was sequenced twice in the NextSeq550 (Illumina) platform (singleend, 150 bp).
We inspected the quality of raw reads using FastQC 108 , and due to a large amount of 3'-end adaptors we trimmed the reads to 100 and 80 bp for PstI-MseI and NsiI-MspI libraries, respectively. We performed read trimming and demultiplex according to specific barcodes using the module process_radtags of Stacks 1.42 109 . We aligned the demultiplexed reads against the manioc genome Manihot esculenta v6 (NCBI PRJNA234389) 47 using Bowtie 2.2.1 110 with the "end-to-end" and "sensitive" configurations. We identified SNPs separately for each library using SAMtools 0.1.19 111,112 and VCFtools 0.1.17 113 , retaining only biallelic markers and only one SNP per read to avoid explicit linkage. Candidate SNPs had sequence depth ≥ 5X, minor allele frequency ≥ 0.05, mapping quality ≥ 13 and were present in at least 90% of the samples. The SNPs identified in each library were merged using MergeVcfs from Picard (http:// broad insti tute. github. io/ picard). Then, we used VCFtools 0.1.17 113 and SAMtools 0.1.19 111 to filter out SNPs within 100 bp and within windows of 1000 SNP sites with linkage disequilibrium (LD) r 2 ≥ 0.8, and to retain the SNPs observed in Manihot esculenta v6 chromosomes.
Genome scans. We used five different methods (pcadapt, F ST , FLK, hapFLK, and XP-EHH) with distinct underlying models to detect putative selective signatures (outlier SNPs). These analyses were performed considering two scenarios: i) the groups of wild versus cultivated manioc (including sweet, bitter, and non-designated varieties), and ii) the groups of varieties per biomes (without wild samples, but with non-designated varieties). With this design we aimed to identify loci with putative selective signatures related to either the domestication of manioc or the diversification of the crop in different cultivation environments.
Pcadapt is based on a principal component analysis (PCA) without assuming any explicit genetic model 114 . The detected outlier SNPs are associated with the first K principal components with greater contribution to the observed genetic structure. We used pcadapt 4.03 114 for R 115 to test 1 to 20 K principal components. We detected outliers based on K = 2 for wild versus cultivated manioc, and K = 5 for the biomes (Supplementary Fig. S3). The estimation of genetic differentiation among populations, measured by F ST or related statistics, is a classic method for detecting selective signatures that reflect a broad range of scenarios, such as selection of standing variation or incomplete sweeps 116 . We estimated Weir and Cockerham's F ST 117 among the groups of samples for each locus using diveRsity 118 for R 115 . FLK is similar to F ST , but it accounts for hierarchical structures using a kinship matrix to model the covariance of the populations' allele frequencies 119 . We estimated FLK for each locus using hapFLK 1.4 (https:// forge-dga. jouy. inra. fr/ proje cts/ hapflk) based on the calculation of Reynold's 120 distances from our data.
The three methods above are based on individual markers, and we also applied two tests (hapFLK and XP-EHH) considering "long-range haplotypes" that account for variation in recombination rates by comparing haplotypes to other alleles in adjacent loci 121 . HapFLK was built upon FLK for the detection of positive selection, including incomplete sweeps, from multiple populations and is robust in the presence of bottlenecks and migration 116 . Complimentarily, the cross-population extended haplotype homozygosity (XP-EHH) detects loci that have swept to near fixation (hard sweeps) within a specific population 121 . The hard sweeps detected based on XP-EHH estimates assume one ancestral and one derived population. While this makes sense in the case of the groups of wild and cultivated manioc, it might not be straightforward when contrasting different biomes. Even if we considered manioc varieties from Amazonia as ancestral to the varieties from the other biomes, we would follow a different criterion (pairwise comparisons among biomes) in relation to the other tests performed for the identification of selective signatures that considered the overall genetic divergence between biomes. For that reason, this analysis was performed only considering the groups of cultivated and wild manioc. For these two analyses, we used fastPHASE 1.4 122 to estimate missing data and reconstruct haplotypes. We used a perl script (https:// github. com/ lstev ison/ vcf-conve rsion-tools) to convert VCF files to fastPHASE input format and then performed 20 runs of the expectation-maximization (EM) algorithm with default configurations. We estimated hapFLK simultaneously with FLK considering the same Reynold's distance matrix and using 15 EM runs. We estimated XP-EHH using the functions scan_hh(), ies2xpehh() and calc_candidate_regions() in rehh 2.0 123 for R 115 . We set the functions to use unpolarized data, and to consider the XP-EHH estimates in the direction of wild to cultivated manioc (positive selection of alleles in the cultivated samples).
We considered as outlier SNPs the loci with q-values ≤ 0.10 in pcadapt, loci at the top and bottom 2.5% of the F ST estimates (reflecting positive and balancing selection, respectively), and the loci at the top 5% of the FLK, hapFLK and XP-EHH estimates. Then, we considered as loci putatively under selection those SNPs that were detected in at least two of the five (four) methods used for the groups of wild versus cultivated manioc (groups of varieties per biome).
We evaluated the predicted effects of outlier SNPs using SnpEff 124 . We recovered the Gene Ontology (GO) annotations, which summarize the information about biological processes, molecular functions, and cellular  129 . Then, we further assessed putative functional annotations described in UniProt for the blastp hits with the arbitrary threshold of identity ≥ 90% (PRGdb) or ≥ 60% (Swiss-Prot).

Population genetics analyses.
We performed the analyses of genetic diversity and structure considering only the putatively neutral SNPs (those that were not detected as outliers by at least two of the tests described above). We estimated the genetic diversity [total number of alleles (A), percentage of polymorphic loci (%P), number of private alleles (PA), observed (H O ) and expected (H E ) heterozygosities] and the inbreeding coefficients (f) using diveRsity 118 and PopGenKit 130 for R 115 . H O , H E , and f confidence intervals were obtained with 1000 bootstraps. We evaluated the genetic variation within and among groups of varieties with analyses of molecular variance (AMOVA), estimated pairwise genetic divergence (F ST ) among groups, and obtained their associated significance based on 20,000 permutations using Arlequin 3.5 131 . We investigated the patterns of genetic structure using sparse non-negative matrix factorization (sNMF) 132 and discriminant analysis of principal components (DAPC) 133 . sNMF assumes that the genetic data originate from the admixture of K unknown parental populations and estimates ancestry coefficients from multilocus genotypes 132 . This analysis is similar to other model-based approaches, such as Structure 134 , with the advantage of being robust to departures from traditional population genetic model assumptions 132 . We tested from 1 to 10 K ancestral populations with 200,000 iterations, ten repetitions for each K value, and used the cross-entropy criterion to visualize the results of the simulations. We performed this analysis using LEA 135 for R 115 . Complimentarily, DAPC summarizes information from large datasets (like genotypes from thousands of SNPs) to assign individuals to clusters without a pre-defined genetic model 133 . DAPC is useful for the assessment of genetic structure based on SNP datasets because it maximizes the variation among groups while minimizing the correlations between the original variables (such as LD) 133 . We performed DAPCs in adegenet 136 for R 115 based on the groups of wild and cultivated manioc from different biomes, and only for the groups of varieties per biomes (without wild samples). We opted to use these groups in DAPC because sNMF was performed without any a priori classification and suggested no clear number of ancestral populations (see "Results"). We used the function optim.a.score() to apply the alpha-score optimization to obtain the number of principal components (PCs) that reduced over-fitting of DAPCs membership coefficients 136 . After this procedure we retained nine PCs in the analysis considering the four biomes plus wild manioc, and 11 PCs in the DAPC considering only the four biomes.

Data availability
Final SNP data uploaded as online Supplementary