Selection and adaptive introgression guided the complex evolutionary history of the European common bean

Domesticated crops have been disseminated by humans over vast geographic areas. Common bean (Phaseolus vulgaris L.) was introduced in Europe after 1492. Here, by combining whole-genome profiling, metabolic fingerprinting and phenotypic characterisation, we show that the first common bean cultigens successfully introduced into Europe were of Andean origin, after Francisco Pizarro’s expedition to northern Peru in 1529. We reveal that hybridisation, selection and recombination have shaped the genomic diversity of the European common bean in parallel with political constraints. There is clear evidence of adaptive introgression into the Mesoamerican-derived European genotypes, with 44 Andean introgressed genomic segments shared by more than 90% of European accessions and distributed across all chromosomes except PvChr11. Genomic scans for signatures of selection highlight the role of genes relevant to flowering and environmental adaptation, suggesting that introgression has been crucial for the dissemination of this tropical crop to the temperate regions of Europe.

Domesticated crops have been disseminated by humans over vast geographic areas. Common bean (Phaseolus vulgaris L.) was introduced in Europe after 1492. Here, by combining whole-genome profiling, metabolic fingerprinting and phenotypic characterisation, we show that the first common bean cultigens successfully introduced into Europe were of Andean origin, after Francisco Pizarro's expedition to northern Peru in 1529. We reveal that hybridisation, selection and recombination have shaped the genomic diversity of the European common bean in parallel with political constraints. There is clear evidence of adaptive introgression into the Mesoamerican-derived European genotypes, with 44 Andean introgressed genomic segments shared by more than 90% of European accessions and distributed across all chromosomes except PvChr11. Genomic scans for signatures of selection highlight the role of genes relevant to flowering and environmental adaptation, suggesting that introgression has been crucial for the dissemination of this tropical crop to the temperate regions of Europe.
Following the process of domestication, crops were spread by humans over vast geographic areas, where they adapted to new and often extreme environments 1 . The Columbian Exchange 2 started in 1492 with the transatlantic journey of Christopher Columbus. This large-scale set of reciprocal biological introductions between continents provides a paradigm for the rapid adaptation of crop plants to changing environments. Changes in flowering time and photoperiod sensitivity were selected in parallel in the common bean (Phaseolus vulgaris), maize (Zea mays), potato (Solanum tuberosum), to name a few crops that have undergone selection for these adaptive traits 1,3 . Among crops originating from the Americas, the common bean was rapidly adopted and successfully disseminated across Europe 4 and it is now possible to identify local European varieties with Andean and Mesoamerican origins [5][6][7][8][9][10] . Nowadays, common bean, as other food legumes, is crucial for main societal challenges and to promote the transition to plantbased diets 11 .
The introduction of the common bean to Europe from two distinct centres of origin offered an opportunity for widespread genepools hybridisation and recombination 9 . Studies of common bean evolution in Europe can exploit the parallel domestication processes and the major genetic differences between the two American genepools. This provides an ideal model to study the role of introgression admixture, particularly in the accessions of Mesoamerican origin. The occurrence of pervasive admixture among gene pools characterises also other centres of cultivation of common bean (see Supplementary Note 17 and Supplementary Fig. 24), and it is supported in Europe also by our phylogenetic network analysis (see Supplementary Note 18 and Supplementary Fig. 25-26 for more details).
Finally, to provide insight into the population structure of the American pure accessions, we considered their passport data (geographic distribution and country of origin) and phenotypic data on growth habits and photoperiod sensitivities. We identified a clear correspondence between the genetic groups in our sample from the Americas and the well-known common bean eco-geographic races: 18 M1 corresponded to the higher-altitude Durango and Jalisco races, which originated primarily in northern and southern Mexico, respectively; M2 corresponded to the lower-altitude Mesoamerican race, which is mostly photoperiod-insensitive and is distributed in lowland Mexico, in Central America and in the Caribbeans; A1 corresponded to the generally photoperiod-insensitive Nueva Granada race; A2 corresponded to the Peru race, which includes entries with vigorous climbing growth habits and photoperiod sensitivity; and A3 corresponded to the Chile race, which has also been identified in archaeological samples from northern Argentina that date from 2500 to 600 years ago 19 . The identification of these well-defined ancestral genetic groups in the Americas offers a robust basis to study the intergenepool and inter-race introgression that may have facilitated adaptation to European environments.

Asymmetric introgression and recombination between genepools underlie European common bean adaptation
Given the presence of admixed individuals, we studied the intergenepool hybridisation and introgression pattern associated with the evolutionary history of the common bean in Europe by genetic assignment at the chromosome level in ChromoPainter v2.0 20 (Supplementary Note 19). When compared to admixture analysis, this can provide information on recombination between markers and the size of regions that can be attributed to different ancestries. The 66 pure American accessions (Supplementary Data 4), from the five genetic groups identified using admixture, due to their low levels of admixture, were used as donor (reference/founder) populations for the chromosome painting of the European genotypes. On this basis, and for each European genotype, we attributed all the single-nucleotide polymorphisms (SNPs) and chromosomal regions to particular ancestries, taking into account within-accession recombination breakpoints. Using this approach, we were also able to detect recombination events between genepools at the whole-genome level, even in accessions that showed <1% introgression. Overall, 71 European accessions were attributed to the Andean gene pool (EU_AND) and 43 were assigned to the Mesoamerican gene pool (EU_MES), in agreement with the admixture analysis (Pearson correlation: r = 0.99, p < 0.01; Supplementary Fig. 27) and confirming previous knowledge about the prevalence of Andean genotypes in Europe 5,21 . Globally, the inferred amount of per-accession introgressed material differed between the EU_MES and EU_AND accessions (two-sided K-S test, p = 3.3 × 10 -3 ), showing median proportions of 4.7% and 9.2%, respectively. In the EU_MES accessions, 0.01-44.9% of the genome had introgressed from the other genepool, with only one EU_MES accession showing <1% genome introgression ( Fig. 3a; see also Supplementary Note 19). These proportions were similar in the EU_AND samples, ranging from 0% to 42.2% (Fig. 3a), although two EU_AND accessions showed no introgression and 10 EU_AND accessions showed <1% introgression from the other genepool (Fig. 3a). The pervasive effect of admixture in European individuals was confirmed by the presence of several accessions with >20% of their genome acquired by introgression in both the EU_MES (8 of 43 accessions, 18.6%) and EU_AND (11 of 71 accessions, 15.5%) groups (Fig. 3a). The inferred ancestry per each chromosome is represented in Supplementary Fig. 28-38.
The median length of the introgressed genomic segments was higher for the EU_AND accessions (EU_AND = 217 kb, EU_MES = 70 kb; Mann-Whitney test, p = 7.22 × 10 −10 , Fig. 3b, c), with more extended regions introgressed into EU_AND particularly on chromosomes PvChr02, PvChr05, PvChr06 and PvChr09 (Fig. 3b, c). We obtained very similar results when we repeated the analysis by excluding six accessions showing an admixture proportion > = 40% ( Supplementary  Fig. 39). The EU_AND accessions carried longer Mesoamerican introgressed haplotypes; the maximum length of the introgressed genomic blocks measured in the EU_AND chromosomes was always higher than what present in EU_MES individuals (Supplementary Table 7), supporting the more recent introgression of Mesoamerican genome Our results indicate that the first cultigen successfully disseminated across Europe was composed of Andean types. This is shown by the smaller introgression segments of Andean origin and the higher frequencies of Andean-derived common bean accessions in Europe. Our data are also consistent with available historical records. Indeed, the first unambiguous evidence for the introduction of common bean in Europe points to Andean cultivars 22 probably introduced into Spain by Francisco Pizarro in 1529 following the exploration of Peru. Piero Valeriano Bolzanio received common bean seeds from Giuliano de Medici (Pope Clement VII, 1523-1534), which had been donated to the same pope by Emperor Charles V's Spanish emissaries from Sicily (where the bean seeds were harvested). The very detailed writing of Piero Valeriano Bolzanio refers to common bean seeds, describing in depth several phenotypic traits supporting their Andean origin, as also recently proposed 23 . Valeriano documented his efforts, along with a network of collaborators in the north-east of Italy, Slovenia and Dalmatia, to grow and reproduce beans starting in 1532 22 , with the first report of a putative Mesoamerican genotype in Europe dated 1542 23 . Historical information and timelines, together with our data showing asymmetric introgression, suggest an earlier successful introduction and spread of the Andean genepool into Europe. This may also explain the high frequency of A1/Nueva Granada Andean ancestries (Fig. 1e) in Sicily, the south and north-east of Italy, Slovenia and Croatia, because they could have been among the first European areas to cultivate common bean with the earliest introduced Andean genotypes probably from the Nueva Granada race.
Adaptive differences among common beans in the New World may also have influenced the distributions in Europe. For example, M1/ Durango-Jalisco genotypes can be particularly photoperiod sensitive, and may therefore have failed to adapt well to many European environments, thus, limiting their dissemination, particularly in central and northern Europe (Fig. 1e). In contrast, southern Spain, southern Italy, Sicily, North Africa, Madeira Island, and the Canary Islands are characterised by mild winters. In these environments, photoperiodsensitive and late-flowering genotypes, or those adapted to warmer conditions, may have easily completed the crop cycle. As also reported by others 5,24 , we found that Mesoamerican genotypes are more frequent in specific European regions, particularly in south-eastern Europe ( Fig. 1e), which also suggests that the history of their introduction may have contributed to their current distribution. As with the role of Charles V and Pope Clement VII in the early dissemination of the Andean beans, the political subdivision of Europe and the Mediterranean basin in the 16th century may have influenced the dissemination of the Mesoamerican genepool. The Ottoman Empire dominated the southern shores of the Mediterranean, the Nile Basin, the Red Sea into eastern Africa, and south-eastern Europe, spanning the area from modern-day Greece to Austria. The prevalence of Mesoamerican genotypes in eastern Africa and China 24,25 may reflect their initial introduction into Africa from Spain during the Ottoman Empire, which extended its rule in north-eastern Africa and controlled the exchange of goods with China through the Silk Road. Although additional comparative studies between European and Chinese centres are required, our hypothesis is compatible with our results from a de-novo admixture analysis applied to Chinese landraces 25 , reported in Supplementary Note 17 ( Supplementary Fig. 24). The importance of political/cultural factors associated with the dissemination of common bean genotypes in Europe is compatible with the lack of significant spatial and ecological patterns between genetic, geographic and ecological distances. Indeed, the routes of dissemination based on cultural and political factors are often independent of geographic and environmental distances, making the occurrence of correlations between genetic distances and geographical or environmental differences less likely [26][27][28][29][30] (see also Supplementary Note 22).

Analysis of the environmental associations
We used the geographic distribution of the five ancestral components inferred by ChromoPainter in an association analysis with biogeographical variables (Supplementary Note 22 and Supplementary Data 6). Ancestral components of A3/Chile negatively correlated with latitude (Supplementary Table 9; r = -0.35, p = 0.0001) and were never observed above the 47 th parallel (Fig. 1e). Moreover, A3/Chile component was associated with warmer climates, particularly the maximum temperature in September (Supplementary Table 9; r = 0.29, p < 0.002). Although A3/Chile did not appear any more photoperiodsensitive than A1/Nueva Granada (see Supplementary Note 23 on flowering and metabolomics variability among genetic groups; Supplementary Table 10-11), some American A3/Chile individuals tend to flower later ( Fig. 2d and Supplementary Fig. 41) at higher latitudes when grown in Europe (Fig. 2e). Here, we suggest that although and European (EU; right) accessions, with the identification of two gene pools (identified as clusters 1 and 2) that show correspondence to the two main common bean gene pools based on our passport data (cluster 1 -Andean, cluster 2 -Mesoamerican), and several intermediates and admixed genotypes in Europe. The chloroplast ancestry assignment is shown for the accessions (triangles below the chart) when not consistent with the nuclear assignment b Admixture plots for the AM Mesoamerican accessions (K = 2) grouped by geographic origin (i.e., latitude and state), which identifies two main subgroups (M1 and M2). c Admixture plots for the AM Andean accessions (K = 4) grouped by geographic origin (i.e., latitude and state), which identifies three Andean genetic subgroups (A1, A2 and A3). A fourth cluster in four accessions, based on the whole-set ADMIXTURE analysis (K = 2), was induced by the occurrence of Mesoamerican alleles with AM_M1/AM_M2 components (see also supplementary Note 15). d Neighbour-joining tree and seed pictures of the 66 pure American accessions. e Spatial interpolation of the geographic distributions of the EU Mesoamerican (M1 and M2) and EU Andean (A1, A2 and A3) ancestry components in Europe, as inferred by ChromoPainter analysis. Maps were designed using the map tools implemented in different R packages, such as spatial, maps, fields, maptools, raster, rgdal. f Primary and secondary domestications of Mesoamerican and Andean genetic groups/races in America. Loss of photoperiod sensitivity during secondary domestication was a relevant factor for the introduction of the Andean A1/Nueva Granada and A3/Chile and for the Mesoamerican M2/Mesoamerica ancestries in Europe (solid arrow). Genetic group M1 (Durango-Jalisco race) was successfully introduced into Europe after introgression from other genetic groups characterised by little or no photoperiod sensitivity (dashed arrow). Genetic group A2 (Peru race) was not introduced into Europe due to its high photoperiod sensitivity (discontinuous and truncated line). Source data are provided as a Source Data file.
A3/Chile was successfully introduced in Europe, a residual sensitivity to the photoperiod might be still preserved in some European genotypes mainly belonging to this ancestry, that show delayed flowering at certain latitudes (Fig. 2e), that may have also influenced the dissemination of this common bean ancestry in Europe (Fig. 1e, f). However, compared to the Mesoamerican genotypes, A3/Chile was more uniformly distributed in Europe across different longitudes (Fig. 1e), which also supports the earlier introduction of Andean genotypes. Only a few weak associations with environmental variables were detected for the other genetic groups (Supplementary Note 22).

Analysis of genetic diversity in the European common bean
To disentangle how inter-genepool hybridisation have shaped the genetic diversity of the European common bean, and given the The development of a masked dataset allowed us to consider nucleotide diversity using the frequencies of two reconstructed nonadmixed populations of Andean and Mesoamerican origin. From each European genotype, all the Andean SNPs were separated from the Mesoamerican SNPs and included in the two masked datasets. Based on the unmasked and masked datasets, American common bean accessions showed moderately higher nucleotide diversity than European accessions, apparently due to the introduction bottleneck in Europe (Fig. 4a, b). Moreover, when compared to the Mesoamerican gene pool, the Andean gene pool showed an overall lower diversity in the primary centres of domestication (Americas) using both the masked and unmasked datasets (Fig. 4a, b). This confirms that the diversity of the Andean germplasm at the centre of origin might still reflect the bottleneck that occurred in the Andean wild populations during the expansion into South America before domestication 31 , as reflected in the domesticated pool 32 . Indeed, we detected~70% lower diversity (θ π /bp) in the Andean compared to the Mesoamerican accessions. Very similar results were obtained when we repeated the analysis by excluding six accessions showing an admixture proportion > = 40% (Supplementary Note 25).
When American and European genetic diversities were compared within each genepool using the unmasked dataset (AM_AND vs EU_AND and AM_MES vs EU_MES), due to the admixture, European diversity was always higher than American diversity, but the opposite was found when using the masked dataset (Fig. 4b). In other words, we e Proportions of the genetic memberships -P(AM_A1), P(AM_A2), P(AM_A3), P(AM_M1), P(AM_M2), P(SAND), and P(SMES)inferred from the donor accessions and composing the American and European accessions (grouped as mainly AM_A1, AM_A2, AM_A3_AM_M1, AM_M2, EU_A1, EU_A3, EU_M1, EU_M2, and EU_MIX) are shown in the pie charts below the corresponding groups and flowering data (number and percentage of individuals with delayed or no flowering) in northern and southern Europe, related to the corresponding groups. c, d box plots represent minimum, first quartile, median, third quartile and maximum. Source data are provided as a Source Data file. show how the Andean common beans from Europe are more diverse than those from America because of admixed ancestry with the Mesoamerican gene pool, as seen by comparing the genetic diversities of the unmasked and masked datasets (EU_AND). This comparison of the estimated levels of genetic diversity in Europe reflects the key role of inter-genepool hybridisation and recombination in shaping the diversity of the European common bean. The genetic diversity was higher in M1/Durango-Jalisco than M2/Mesoamerica accessions in Americas, and also in A2/Peru than A1/Nueva Granada accessions in Americas, whereas the amount of diversity in A3/Chile accessions was very low. Combined with the neighbourhood-joining tree shown in Fig. 1d, this indicates that the A2/Peru and M1/Durango-Jalisco races have been probably the earlier domesticated Andean and Mesoamerican populations, from which the other races arose by secondary domestication associated with the loss of photoperiod sensitivity (Fig. 1f). Indeed, earliness and loss/reduction of photoperiod sensitivity were important traits under selection during the expansion of the common bean in Europe. This is also suggested by our test for the occurrence of selection for flowering during the introduction of common bean in Europe (see also Supplementary Note 23). The genetic differentiation between American and European accessions for flowering time (PC1 based on flowering data across different European field and greenhouse trials) was measured using Q ST  hereditable metabolites and with the F ST distribution of the SNPs. We show that the Q ST for the flowering is in the top 97.5% of distribution of the Q ST for highly inheritable metabolites, being also an outlier (99.5%) compared to our F ST distribution, suggesting that flowering is likely a candidate trait that underwent a selection process 33,34 . Considering the Andean genepool, the successful introduction in Europe was connected to the domestication pattern at the centre of origin. The earlier, photoperiod-sensitive domesticated genotypes were less successfully disseminated in Europe. Indeed, the relationship between the American and European genetic groups of Andean origin (as defined by ChromoPainter), coupled with the phenotypic data for flowering (Fig. 2d), shows that the A2/Peru race was more photoperiod-sensitive and was not introduced into Europe successfully, due to the lack of adaptation (other than a single highly admixed accession, qA2 = 43.6%). In contrast, the remaining Andean genetic groups (A1/Nueva Granada and partially A3/Chile) became widespread in Europe. A different scenario was seen for the Mesoamerican genotypes, especially M1/Durango-Jalisco, where introgression appears to have been an important element in the dissemination of the common bean in Europe (Fig. 1f, 2d, e). M1/Durango-Jalisco showed very high levels of admixture in the European material due to introgression from the M2/ Mesoamerica and the A1/Nueva Granada and A3/Chile (Fig. 2e), which likely contributed to reduced photoperiod sensitivity compared to the American Durango-Jalisco counterpart (AM_M1) (Fig. 2d), supporting its dissemination throughout Europe (Fig. 1e, f). For the Andean genotypes, both the diversity pattern and photoperiod sensitivity (Fig. 2d) suggest at least two domestication steps occurred: primary domestication of photoperiod-sensitive populations (A2/Peru) and secondary domestication characterised by reduced photoperiod sensitivity (A3/Chile and particularly A1/Nueva Granada). This indicates that secondary domestication 35 was necessary for the successful dissemination of the Andean common bean in Europe (Fig. 1f). For the Mesoamerican genotypes, an open question is where and when the introgression from the Andean genepool occurred. We suggest this is likely to have happened in southern Europe and along the southern Mediterranean shore, where the warmer climate in winter may have favoured the Mesoamerican genotypes.

Proportion of introgressed
The average linkage disequilibrium (LD) decay in accessions from Europe and the Americas (Fig. 4c) is consistent with the historical differences between the genepools and the effects of high intergenepool hybridisation and introgression at the whole-genome scale in Europe (see also Supplementary Note 27). Admixture in Europe increased the molecular diversity (i.e., effective population size). It also generated new genome-wide admixture LD due to new combinations of alternative alleles in each genepool. Accordingly, inter-genepool hybridisation followed by recombination reduced LD at a long distance but, as expected, had a limited effect on LD decay at short distances because regions are directly inherited from the source populations 36 . When we compared the American and European accessions, LD decay was much faster over short distances (<1.5-2.0 Mb) in American genotypes. In contrast, there was faster LD decay over greater distances (>3 Mb) in European populations (Fig. 4c). This reflects higher historical rates of recombination in the American genotypes over short distances and the effect of recombination due to inter-genepool introgression in Europe over long distances. A similar pattern was seen when the Mesoamerican and Andean genepools were analysed separately ( Supplementary Fig. 60). However, the Andean accessions were characterised by higher baseline LD levels. Indeed, the AM_MES and AM_AND populations reached r 2 = 0.2 at~500 kb and~1 Mb, respectively, whereas r 2 = 0.2 was reached at~1.1 and~3.5 Mb for the EU_MES and EU_AND samples, respectively.

Synonymous and missense mutations
The ratios between missense and loss-of-function mutations over synonymous mutations were calculated to reveal patterns of genetic load across genepools and continents (see Supplementary Note 28). The total amount of synonymous, missense, stop-gain and stop-loss non-reference alleles within each accession is provided in Supplementary Data 7. We observed a clear pattern in the genetic load reflecting differences between the Andean and Mesoamerican origins, with the Andean accessions showing a higher genetic load due to the bottleneck before domestication (Fig. 4d). We observed a reduced genetic load in EU_AND for both the loss-of-function (EU_AND vs AM_AND; Mann-Whitney p = 0.03) and the missense mutations (EU_AND vs AM_AND; Mann-Whitney p = 0.005). Conversely, we did not observe a reduction of genetic load in EU_MES for both missense and loss-of-function mutations (Fig. 4d). This suggests that the relatively short period of inter-genepool hybridisation, followed by selfing and recombination, promoted the purging of deleterious alleles accumulated in the European Andean pool. The role of hybridisation and subsequent recombination was also supported by the pattern of long-range LD in Europe compared to the Americas (Fig. 4c). The pattern for private alleles (i.e., those not identified in other genepools or populations; Supplementary Note 29) in the American and European accessions for low-frequency mutations (<5%) revealed a 1.44fold higher frequency of non-synonymous over synonymous mutations in Europe ( Supplementary Fig. 61). This may have resulted from the pattern of crop dissemination, which was probably characterised by the exchange of small quantities of seeds and several sequential bottlenecks, followed by rapid population growth at the single-farm level, leading to the fixation of most mutations due to the small population size (i.e., a founder effect). In this demographic context, most mutations would be fixed rapidly at the local level (within the population grown by a single farmer). However, it is also possible that the purging of deleterious mutations, due to hybridisation following seed exchange among farmers and the co-occurrence of different varieties in the same fields 21 , facilitated the combined effects of natural and human selection against deleterious recessive alleles and the capture of valuable variants.

Selection and adaptive introgression
We defined putative adaptive introgressed loci (PAIL) as those showing signatures of adaptive introgression meeting the following requirements: (a) an excess of introgression based on Chromopainter, (b) a signature of selection detected using the hapFLK method, which analyses multiple populations, jointly considering their hierarchical structure 37 , and (c) an outlier F ST value between Europe and the Americas, suggesting different patterns of diversity between these regions. The level of genetic differentiation (F ST ) among genetic groups is reported in Supplementary   Supplementary Table 15 and Supplementary Data 8) have been identified as described in Supplementary Note 31. Although the hapFLK method allowed us to identify selection signatures across the genome, an outlier F ST value was used to define which selection signature represents significant differentiation at the genomic level between American and European populations, suggesting selection in Europe. The identification of excess introgression independently of hapFLK provides evidence for adaptive introgression and the identification of PAIL. We also considered the occurrence of inter-chromosomal LD across genomic regions, private to European genepools (Fig. 4e, Fig. 5, see also Supplementary Note 32, Supplementary Fig. 64-66, Supplementary Data 9-12 and Supplementary Table 16) as an interesting signal to define regions potentially involved in adaptive processes. We also identified, through a GWA analyses (see Supplementary Note 33-36) genomic regions associated to flowering and growth habit( Supplementary Fig. 67-75 and Supplementary Table 17). Finally, we suggest a putative role for candidate loci/ genes, combining our data on adaptive introgression regions and GWAS peaks with a literature search on the orthologous genes function (see Supplementary Note 37-52 for details). Adaptive introgression appears particularly important for the evolution of the European genotypes of Mesoamerican origin (EU_MES). We identified 44 Andean genomic regions with excess introgression (23 of which showed signals of adaptive introgression) that are shared by >90% of the European genotypes, spanning all chromosomes except PvChr11 (Supplementary Data 8; F(AND)) and ranging from~5 to~118.5 kb in length. An Andean allele frequency of 96% was detected along a genomic segment of PvChr01 (Chr01:46175616-46294040; Supplementary Data 8) that shows signs of adaptive introgression. This region contains 18 genes including Phvul.001G203400, which is orthologous to OVARIAN TUMOR DOMAIN-CONTAINING DEUBIQUITINATING ENZYME 5 (OTU5) (see Supplementary Note 43 and Supplementary Data 13; row 16). In Arabidopsis thaliana, the function of this gene is to recalibrate and maintain cellular inorganic phosphate homeostasis 38,39 . The common bean orthologue may therefore be involved in the phosphate starvation response, making it an interesting candidate for further testing. In the same region on chromosome Pv01, we also identified Phvul.001G204600 and Phvul.001G204700 (see Supplementary Note 41 and 50 and Supplementary Data 13; rows 29 and 30), which are orthologous to LUMINIDEPENDENS and NUTCRACKER, respectively (Fig. 5  The genes involved in the photoperiod, vernalisation, autonomous and gibberellin pathways that lead to the transition from vegetative to flowering are shown below the corresponding pathway. Additional genes belonging to secondary pathways that interact with the main regulatory flowering networks are shown in italic. Orthologues genes in common bean showing signatures of adaptive introgression, and those located in GWA peaks in our study are highlighted as follows: yellow hexagonscommon bean orthologues of LHY (Phvul.009G259400, Phvul.009G259650) and VRN1 and RTV1 (Phvul.011G050600) showing private interchromosomal linkage disequilibrium (LD) in the EU_M pool; pink hexagonscommon bean orthologues of LD (Phvul.001G204600), NUC (Phvul.001G204700), CGA1 and GNC (Phvul.003G137100) showing private inter-chromosomal LD in the EU_M pool; red outlinesat least one orthologous gene in common bean showing signature of selection, introgression and with a significant differentiation (F ST index) between American and European accessions (p < 0.05); orange outlinesat least one orthologous gene in common bean showing a signature of selection with no significant F ST (p < 0.05); blue asterisksat least one orthologous gene in common bean showing a signature of introgression; dashed blue outlinesat least one orthologous gene in common bean located within 50 kb centered on a significant GWA peak for days to flowering; dashed green outlinesat least one orthologous gene in common bean located within 50 kb centered on a significant GWA peak for growth habit; arrowspositive regulation of gene expression; truncated arrowsrepression of gene expression; solid linesdirect interactions; dashed linesindirect interactions in A. thaliana. Candidate genes for adaptation or post-domestication of the common bean in Europe, orthologous to those involved in flowering-related pathways, are shown in parentheses: UBP12/13 (Phvul.007G234000); LHY (Phvul.009G259400, Phvul.009G259650); LUX (Phvul.011G062100); PIL5 (Phvul.001G168700); CIB2 (Phvul.008G133600); LRB1 (Phvul.006G109600); DRIP1/2 (Phvul.001G157400, Phvul.007G177500); VRN1, RTV1 (Phvul.011G050600); UBC1/2 (Phvul.003G191900); LD (Phvul.001G204600); TFL1, ATC (Phvul.001G189200); GA2OX4 (Phvul.006G120700); CGA1, GNC (Phvul.003G137100); GAI, RGA1, RGL1, RGL2 (Phvul.001G230500); LMI1 (Phvul.001G184800, Phvul.001G184900); SIC (Phvul.008G182500); CRP (Phvul.008G142400); MYB30 (Phvul.008G041500); NUC (Phvul.001G154800, Phvul.001G204700, Phvul.011G074100); SUC9 (Phvul.004G085100, Phvul.004G085400, Phvul.004G085594); CYP715A1 (Phvul.007G071500).
This adaptive introgression region in the common bean genome is close to known regions associated with flowering time, such as the fin locus controlling determinacy, and explaining phenotypic variation also for flowering time 40 . The region also co-maps with Phvul.001G189200 (PvTFL1y; Pv01:44856139-44857862) 41 and shows linkage 40 to the Ppd locus controlling photoperiod sensitivity 42 . Wu et al. 25 recently identified several markers on chromosome Pv01 associated with flowering under different conditions, with one at~45.5 Mb. In our GWA, we identified a significant association between photoperiod, flowering time (see also Supplementary Note 36) and the marker S01_48049738, which is found~400 kb downstream of Phvul.001G221100 (Chr01:47642033-47647745) a gene that has been proposed as a candidate for the common bean Ppd locus 43,44 .
Overall, we identified 77 genes that are PAIL. These represent~11% of all genes (n = 681) showing signatures of selection in Europe (i.e., selection signatures identified with hapFLK and being in an F ST outlier window; n = 354) and/or excessintrogression (n = 404). Accordingly, 277 genes show selection in Europe but not excess introgression, and 327 show excess introgression but not selection in Europe. The 77 PAIL show enrichment in seven Gene Ontology categories including GO:0048523, negative regulation of cellular processes; GO:0010228, vegetative to reproductive phase transition of the meristem; GO:0042445, hormone metabolic processes; GO:0009657, plastid organisation; GO:0042440, pigment metabolic processes; GO:0009733, response to auxin; and GO:0070647, protein modification by small protein conjugation or removal (see Supplementary Note 39 and Supplementary Fig 76). Enrichment analysis also suggested that flowering has been an important target of adaptive introgression, highlighting the importance of genes that may be associated with adaptation to abiotic and biotic stress.
Adaptive introgression signals were identified in many P. vulgaris genes with a putative role related to flowering (Supplementary Note 40 and 41), including orthologues of genes involved in the four major A. thaliana flowering pathways (Fig. 5). Significant examples include Phvul.009G259400 and Phvul.009G259650 (see Supplementary Note 41 and Supplementary Data 13; rows 90 and 92), which are orthologues of A. thaliana LATE ELONGATED HYPOCOTYL (LHY), both located within the same adaptive introgressed region of chromosome PvChr09 (characterised by an Andean allele frequency of 96% in the European genotypes; Supplementary Data 8). Notably, the transcription factor encoded by LHY is a pivotal oscillator in the morning stage of the circadian clock. It is connected to the indirect suppression of the middle, evening, and night complex genes by CIRCADIAN CLOCK ASSO-CIATED 1 (CCA1) 45 (Fig. 5), with a putative function in regulating flowering that has also been proposed in the common bean 46 . In the EU_MES population, these two LHY orthologues show private and significant inter-chromosomal LD with Phvul.011G050600, which yields GWA signals for flowering time and adaptive introgression (see Supplementary Note 50 and Supplementary Data 13; row 97). The latter is an orthologue of the A. thaliana genes VERNALISATION 1 (VRN1) and RELATED TO VERNALISATION1 1 (RTV1) (Fig. 5), which are needed to activate the floral integrator genes following long-term exposure to cold temperatures 47 . The inter-chromosomal LD between these putative flowering genes, which is private to the EU_MES accessions, may be the  (Fig. 5). NUC encodes a transcription factor that positively regulates photoperiodic flowering by modulating sugar transport and metabolism via the FLOWERING LOCUS T (FT) gene 48,49 . The paralogous GNC and CGA1 genes act redundantly to promote greening downstream of the gibberellin signalling network 50 .

Discussion
We have shown that adaptive introgression was necessary for the successful dissemination and adaptation of the common bean in Europe. We combined genome resequencing, metabolomics, classical phenotyping and data analysis for chromosome-level genetic assignment and environmental association. Our data indicate that the Andean genepool was the first to be successfully introduced in Europe, most likely from Francisco Pizarro's expedition to northern Peru in 1529. Most of the Andean genetic background of the European common bean was contributed by the A1/Nueva Granada and A3/Chile races after secondary domestication, whereas the more photoperiodsensitive A2/Peru race contributed little to European common bean germplasm. The secondary domestication of these Andean races, related to the latitudinal expansion of the cultivation areas from the Andean centres of origin, facilitated the successful dissemination of the Andean common bean in the Old World. However, we propose that the adaptive introgression observed in Europe for individuals mainly belonging to the M1/Durango-Jalisco race was an important event that underpinned the successful dissemination of this Mesoamerican ancestry in Europe. Indeed, genomic analysis indicated that Andean genotypes were rapidly disseminated, whereas Mesoamerican genotypes were eventually disseminated in Europe following introgression from the Andean types. In addition to the flowering time data gathered from the European and American accessions, we also identified clear signatures of selection in common bean orthologues of genes representing the major flowering pathway and environmental adaptations, such as the OTU5 gene involved in the inorganic phosphate starvation response. These are interesting candidate loci for further validation. Finally, we propose that the dissemination of common bean accessions in Europe may have been influenced by political factors and constraints in the 16th century, including the interaction between political and religious powers in Western Europe and the subdivision of the European continent into Islamic and Christian countries.

Plant materials
Original seeds for 218 common bean accessions (P. vulgaris) were collected from international gene banks or individual institutional collections. We produced 199 single seed descent (SSD) lines by performing at least three cycles of self-fertilisation. For the remaining 19 accessions, one seed per accession was sampled directly from original seeds provided by the donor.

Experimental design and phenotyping
Plants were grown across 10 different environments in fields and greenhouses, applying long-day (7), short-day (2)  Classical phenotyping was carried out on the 199 SSD lines, focusing on two main traits: days to flowering (DTF), defined as the number of days from sowing until 50% of plants showed at least one open flower; and growth habit (GH), defined as determinacy vs indeterminacy on a single plant basis. Photoperiod sensitivity (PS) was calculated as the ratio between DTF in long-day and short-day experiments.
Descriptive statistics were calculated for the different phenotypic traits using R (https://cran.r-project.org/) or JMP 7.0.0 51 . The restricted maximum likelihood (REML) model implemented in JMP 7.00 was used to calculate least square means (LSM) and the best linear unbiased predictors (BLUPs) of each genotype. The REML model was also used to calculate the broad-sense heritability (h 2 B) for each quantitative trait by considering genotypes and environments as random effects. The distribution of DTF in each environment and Pearson's pairwise correlation between environments were calculated using the corrplot and PerformanceAnalytics R packages 52,53 .
Molecular phenotyping (see Supplementary Note 5) of the 199 accessions was performed on the first trifoliate fully expanded leaves harvested under long-day conditions in three biological replicates. Secondary metabolites were measured as described in Perez de Souza et al. 54 . For non-targeted metabolomics, chromatogram peak detection and integration were achieved using GeneData REFINER MS 10.0 (http://www.genedata.com). To explore the molecular phenotypic diversity, we performed non-targeted metabolic fingerprinting by high-throughput LC-MS analysis. Mass signals that were not detected in ≥50% of the samples and/or those with a peak intensity ≤1000 were excluded. Heritability was analysed as stated above, setting genotype and continent as random effects. The heritability was calculated based on 190 accessions (94 from the Americas and 96 from Europe) with more than one replicate.

Sequencing, variant calling and annotation
Genomic DNA was extracted from frozen young leaves of the 199 SSD lines grown in greenhouse and directly from seeds of the remaining 19 accessions using the DNeasy Plant Mini Kit (Qiagen, Hilden, Germany). It was sheared using a Covaris E220 device to fragments of~550 bp and PCR-free libraries were constructed using the KAPA HyperPrep Kit. Paired-end sequencing libraries were sequenced on Illumina HiSeq2500 or HiSeq4000 devices and labelled with different barcodes.
Sequencing data were aligned to the common bean reference genome v2.0 15 using BWA-mem v0.7.15 55 . Unmapped reads were mapped to the P. vulgaris chloroplast genome (NCBI NC_009259). In both cases, SNPs were called using SAMtools 56 and Genome Analysis ToolKit (GATK) v3.6 57,58 . In SAMtools, duplicated reads were removed with rmdup and SNPs were discovered with mpileup for filtered highquality alignments (-q 10) and bases (-Q 20) and then genotyped with BCFtools 59 . In GATK, duplicated reads were sorted and filtered with Picard v2.4.1 (http://broadinstitute.github.io/picard). Variants were then called according to GATK best practices and pre-filtered using the recommended parameters for hard filtering. Chromosomal and overlapping SNPs reported by both methods were retained and the genotypes produced by GATK were selected. We applied additional filtering with VCFtools 60 (-minDP 3-max-missing 0.5-maf 0.05) and excluded SNPs with proportions of heterozygous genotypes >0.01. SNPs were annotated with SnpEff v4.3s 61 .

Population structure analysis
Population structure analysis was applied to the SAMtools/GATK overlapping SNP callset, followed by filtering to retain only genomic positions with a QUAL score ≥30 and a global depth of coverage between 1/3 and 4 times the mean value. Individual genotypes called using two reads or fewer were marked as missing data. Imputation and phasing were performed with Beagle 62 . ADMIXTURE v1.3 13 was used for population structure analysis. The unphased variants were filtered by taking one SNP every 250 kb using VCFtools. In ADMIXTURE, we varied K from 1 to 20 in 20 replicates and applied the analysis independently over the whole sample of American and European (n = 218) accessions or using the American accessions only (AM, n = 104). We dealt with potential cryptic population structures within each pool as previously described 16,31,32 .
Population structure was inferred from chloroplast data using Bayesian Analysis of Population Structure (BAPS) v5.3 63,64 . Mixture analysis was used to determine the most probable K value according to the data. Clustering with linked loci analysis was chosen to account for the linkage between sites. Ten repetitions of the algorithm for each K value from 2 to 20 were applied. The relationships between genotypes were determined using the neighbour-joining method in MEGA X 65 with a bootstrap value of 1000. Gaps and missing data were excluded.

Chromosome painting
Chromosome painting was applied to the phased variants in Chro-moPainter v2.0 20 . The effective population size (Ne) and mutation rates (Mu) were estimated individually for each accession using 10 iterations of the expectation-maximisation algorithm in Chromo-Painter. The estimated parameters were fixed in a new round of analysis producing the final chromosome painting of the recipient haplotypes. Donor individuals were chosen as follows, according to their ancestry proportion inferred by admixture: (a) Mesoamerican individuals with a q value >0.99 in the admixture run (K = 3 using all American accessions), and (b) Andean individuals with a q value consistently >0.99 from K = 2 to K = 4 in the admixture run restricted to Andean accessions. Donors were subdivided into the five groups inferred by ADMIXTURE (AM_M1, AM_M2, AM_A1, AM_A2 and AM_A3) and were used to estimate their contribution to the ancestry of each SNP in the recipient individuals. Individual SNP probabilities were then combined in 10 kb non-overlapping sliding windows along chromosomes and each window in each recipient haplotype was assigned to one of the five donor groups if a probability ≥0.8 was observed (see also Supplementary Note 19). The total proportion of genetic material from the seven groups or "unknown" (genotypes assigned to none of the groups) was computed for each recipient individual and for each chromosome (both pairs). The final assignment of each recipient accession to the genepools was based on (a) the total proportion of windows attributed to Mesoamerica or Andes, and (b) the number of chromosomes assigned to the two genepools following the majority rule criterion.
The attribution of each genomic window to the seven groups was also used to estimate the length of the introgressed blocks within each European accession. Each haplotype of the EU_AND accessions was traversed, merging consecutive windows attributed to any of the Mesoamerican clusters. Bedtools 66 was used to join windows within a maximum distance between elements of 50 kb to deal with artificially broken introgressed blocks. The length of each Mesoamerican block in each EU_AND individual was recorded for each chromosome and was then filtered to remove blocks composed of single windows (10 kb). The final within-individual distribution of lengths was characterised by the median value due to the non-normal distribution of the data.
For spatial analyses, the ecological data (resolution~1 km2) were downloaded from WorldClim (http://www.worldclim.org) 67 for a total of 19 bioclimatic variables and 24 monthly variables (Supplementary Data 6). The vegan R package 68 was used to calculate the geographical and ecological distances, the Mantel statistics, and the spatial autocorrelation. Initially, the Mantel statistics were tested by 10 3 permutations and the autocorrelogram was calculated between 10 distance classes of nearly 540 km each, determining the significance of the correlation in each class by 9999 permutations. We then applied environmental association analysis with a multivariate correlation analysis between the p values (proportion of membership to the five genetic groups) assigned to each European accession and the ecological variables registered at the collection site.

Genetic diversity
The genetic diversity within groups of accessions, defined according to their geographic origin and genepool, was quantified using the theta estimator θπ 69 . The-site-pi VCFtools flag was used to obtain a per-SNP Article https://doi.org/10.1038/s41467-023-37332-z estimate that was subsequently filtered according to the genome annotation, including only positions located: (a) in callable regions, (b) in coding regions, and/or (c) in neutral regions (Supplementary Note 24). The per-site θπ estimate was then summed up and divided by the size of each specific region to calculate a global estimate. A raw estimate of θπ along chromosomes, averaged over 100 kb nonoverlapping windows, was also computed to highlight chromosomal regions with different levels of genetic diversity. To evaluate the stability of the θπ estimate at different missing data levels, a masked dataset was obtained by filtering introgressed alleles (identified by ChromoPainter) and alleles with an ambiguous assignment, within European accessions. The-site-pi and-missing-site commands in VCFtools were used to obtain a per-site θπ estimate and the proportion of missing data for each position, respectively. The global withingroup θπ was computed for the callable, coding and neutral genomic partitions, excluding regions with an average (over SNP) minimum mean proportion of non-masked individuals (PIND) from 0% to 100%.
To detect patterns of private alleles, missense and synonymous variants were screened in American and European accessions (Supplementary Note 28 and 29). Variants that were private to the European or American groups were retained and divided into those with low (<5%) and medium-high (>5%) within-sample frequencies. The genomic coordinates related to private alleles segregating at different frequencies in the American and European groups of accessions were intersected with the gene annotations, and the burden of missense and synonymous mutations was recorded for each gene element.
The magnitude of the genomic differentiation between and within America and Europe was evaluated using the Weir & Cockerham estimator F ST 70 . We estimated the baseline differentiation between and within the two continents. In addition, the F ST was then computed in 10 kb non-overlapping sliding windows between each pair of groups using VCFtools. The mean and the interquartile range (IQR) of the windows-based distribution were used as a point estimate of the differentiation between groups and to evaluate its dispersion.
Comparison of the genetic structure, molecular phenotype and flowering data Analysis of variance (ANOVA) was used to evaluate differences between the genetic subgroups using the first principal component related to DTF and PS (see also Supplementary Note 6 and 23) as representative phenotypic traits. Principal component 1 was obtained from the secondary metabolites with a high hereditability of H 2 > 0.65 (Supplementary Note 5 and 23) and this was used as a phenotype for comparison between the genetic subgroups.

Tagging the signatures of adaptation in Europe
The presence of excess of introgression and selection was investigated in Europe (Supplementary Note 21 and 31). To detect deviations from the frequencies expected in the absence of demographic and selection forces, the ChromoPainter output was parsed by tracing the assignment of each SNP to the corresponding Mesoamerican or Andean groups. For each SNP, we computed the proportion of haplotypes assigned to the Mesoamerican or Andean groups. We extracted the genomic coordinates of SNPs showing an unexpected proportion of Andean alleles (threshold: EU_A, 71 × 2 = 142 haplotypes, plus 50% of the Mesoamerican ones, EU_M, 43 × 2 × 0.5 = 43, Fobs ≥0.811). The putative SNP targets of Mesoamerican introgression events were identified according to the same rationale (threshold: EU_M, 43 × 2 = 86, plus the 50% of Andean ones, EU_A, 71 × 2 × 0.5 = 71, Fobs = 0.688). The Bedtools slop -b 2500 and merge -d 10000 functions were used to pass from SNP point coordinates to 5 kb regions and then merge into larger genomic blocks if the relative distance between them was <10 kb. Only genomic regions supported by at least three SNPs were retained.
The hapFLK 37 method was used to identify selection signatures. The local genomic differentiation along chromosomes, as measured by haplotypic F ST , was compared to the expectation given by the inferred genomic relationships between groups, considering the genetic drift within groups. Accessions were subdivided into the AM_A (n = 30), AM_M (n = 36), EU_A (n = 71) and EU_M (n = 43) groups and VCFtools was used to sample a single SNP every 250 kb. This set of SNPs was used to construct a neighbour-joining tree and a kinship matrix according to the Reynolds' genetic distance matrix between the four groups of accessions, constituting a genome-wide estimate of population structure. The hapFLK statistics were then computed independently on each chromosome over the complete SNP dataset and averaged over 20 expectation-maximisation cycles to fit the LD model. Initial analysis fixed the number of haplotype clusters to five. A second run was conducted, selecting the appropriate number of haplotype clusters based on the fastPHASE 71 cross-validation procedure, implemented in the imputeqc R package (https://github.com/ inzilico/imputeqc). VCFtools was used to extract a subset of SNPs spaced at least 100 kb apart on each chromosome, and five independent copies of the SNP set were generated, randomly masking 10% of the variants. We then used fastPHASE v1.4.8 to impute the missing genotypes in each dataset, setting the number of haplotype clusters to 5, 10, 20, 30, 40 and 50. The EstimateQuality function was used to compute the proportion of wrongly imputed genotypes (Wp) for each combination, and the K value, minimising the mean Wp proportion across the five SNP set replicates, was selected as the most supported number of haplotype clusters. The analysis was replicated using all or only American accessions. The scaling_chi2_hapflk.py script was used to scale hapFLK values and compute the corresponding p values. The significant SNPs (p < 10 -3 ; FDR < 0.05) were extracted and bedtools was used to create a 10 kb region centred on each significant SNP and to merge overlapping regions within a maximum distance of 5 kb. The two sets of regions were merged forming the extended set (constituted by the union of the two sets) and the restricted set (containing only regions supported by both runs). To pinpoint putative regions under selection in Europe, the extended and restricted sets were intersected with the F ST window analysis, and only regions containing at least one F ST window located in the top 5% or top 1% were retained.

Linkage disequilibrium
The relationship between linkage disequilibrium (LD) and physical distance along chromosomes was evaluated in America and Europe, and successively within the American subgroups. PopLDdecay 72 was used to compute the correlation (r2) between allele frequencies at pairs of SNPs along the chromosomes, setting a minimum minor allele frequency of 0.1 and a maximum distance between SNPs of 5 Mb.
The level of inter-chromosomal LD was also evaluated. VCFtools was used to sample one SNP every 10 kb and compute r 2 between pairs of markers located on different chromosomes. The analysis was performed independently over the American subgroups, using only SNPs that were segregating within each group of accessions with a minor allele frequency >0.05, and only pairs of SNPs showing an r 2 value ≥0.8 were retained. Multiple pairs of SNPs pointing to the same chromosomal regions were merged if within a distance of 100 kb from each other and only pairs of regions spanning at least 500 kb on each side were retained. The whole analysis was also repeated including only SNPs falling in the putative regions under selection, decreasing the minimum width of retained regions from 500 kb to 50 kb. Link plots showing high-LD regions were produced using Rcircos 73 .

Genome-wide association study
Genome-wide association study (GWAS) was carried out for the growth habit, flowering time and photoperiod sensitivity data (see Supplementary Note 6, 33-36). First, we ran a single-locus mixed linear model (MLM) in the R package MVP 74 (https://github.com/XiaoleiLiuBio/MVP).
Bonferroni correction at α = 0.05 was applied as the significance threshold for each trait. The analysis was then conducted using the multi-locus stepwise linear mixed-model (MLMM) 75 (https://github. com/Gregor-Mendel-Institute/MultLocMixMod). By applying a stepwise approach, this includes the most significant SNPs as cofactors in the mixed model. The mBonf criterion was used to identify the optimal results with Bonferroni correction at α = 0.05.
Investigation on the putative function of candidate genes for adaptation Common bean genes orthologous to A. thaliana and legume genes were identified using Orthofinder 76,77 (Supplementary Note 37). The putative function of poorly characterised genes was predicted based on orthologous relationships and literature screening. The orthologue and known genes involved in DTF, GH and PS were screened against the GWAS results. Genes within 50 kb on either side of each significant SNP associated with DTF and GH, and genes located within selection scan and introgression scan regions, were investigated by GO term enrichment analysis (biological process, cellular component and molecular function) using Metascape 78 (http://metascape.org).

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability
The raw sequence reads generated and analysed in this study have been deposited in the Sequence Read Archive (SRA) of the National Center of Biotechnology Information (NCBI) under BioProject number PRJNA573595. DOIs for the BEAN_ADAPT PV-core 2 accessions are provided in the Supplementary Data 1. Data from the databases/data repositories Phytozome v.2.1, NCBI NC_009259 (P. vulgaris chloroplast genome), WorldClim (http://www.worldclim.org), Zenodo (https:// zenodo.org/record/3236786#.Y49hDXbMK3A; vcfs for the whole genomes sequences data from Wu et al. 25 ) have been used and analyzed within this manuscript. Source data are provided with this paper.