Introduction

Since the 1960s, large collections of crop species and their close relatives have been assembled and maintained in international gene banks. This ensures the survival of genotypes that might otherwise be lost and gives plant breeders access to potentially valuable material. The exploitation of genetic diversity for crop improvement (together with concerns about genetic erosion) has been a major driving force for the exploration and ex situ conservation of plant genetic resources. To exploit such germplasm, however, accessions must be evaluated so that breeders can select the most appropriate material. Evaluation concerns traits that are potentially very useful for crop improvement but whose expression is affected by the environment. Such evaluation is consequently difficult, because elaborate, statistically designed field or laboratory experiments are needed to gather reliable data.

A major problem facing many international gene banks is the sheer size of their collections (Jackson, 1997). For example, the International Rice Genebank at the International Rice Research Institute (IRRI) currently holds almost 108 000 samples of rice, the vast majority of which are cultivated Asian rice, Oryza sativa. Evaluating the whole collection for a range of traits is impossible because of resource limitations. However, we have previously shown that it is possible to predict performance for various quantitative traits in diverse rice germplasm using molecular marker data (Virk et al., 1996a). This is possible because there are statistical associations between the presence or absence of individual markers and trait performance. These associations allow a regression equation to be developed that uses information from a set of informative markers to predict performance for a quantitative trait. We have used this approach to predict performance for several agronomically important traits in rice. We have also demonstrated that associations between markers and flowering time exist in annual beets (Beta vulgaris ssp. maritima: Virk et al., 1996b), and parallel studies have indicated that such associations exist for a range of traits in barley (Pakniyat et al., 1997).
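As an illustration of the kind of regression equation described above, the following Python sketch fits ordinary least squares to 0/1 marker scores from evaluated accessions and then predicts performance for an accession that has only been genotyped. It is a minimal sketch under stated assumptions: the function names and toy data are hypothetical, the choice of informative markers is not shown, and the published analyses were carried out in SAS rather than with this code.

```python
from typing import List, Sequence


def fit_marker_regression(markers: Sequence[Sequence[int]],
                          trait: Sequence[float]) -> List[float]:
    """Least-squares fit of trait = b0 + b1*m1 + ... + bk*mk.

    markers: one row per accession; each entry is 1 (band present) or 0 (absent).
    Returns [b0, b1, ..., bk], found by solving the normal equations
    (X'X) b = X'y with Gauss-Jordan elimination and partial pivoting.
    """
    X = [[1.0] + [float(v) for v in row] for row in markers]  # prepend intercept column
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * y for row, y in zip(X, trait))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(A[k][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("markers are collinear; drop redundant markers")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(coef: Sequence[float], marker_row: Sequence[int]) -> float:
    """Predicted trait value for an accession that has only been genotyped."""
    return coef[0] + sum(b * m for b, m in zip(coef[1:], marker_row))


# Hypothetical toy data: two informative markers scored on five evaluated accessions.
evaluated = [[0, 0], [1, 0], [0, 1], [1, 1], [1, 0]]
panicle_cm = [20.0, 23.0, 18.0, 21.0, 23.0]
coef = fit_marker_regression(evaluated, panicle_cm)
print(coef, predict(coef, [1, 1]))
```

In this toy example the trait was constructed as 20 + 3·m1 − 2·m2, so the fit recovers those coefficients; with real data the residual variation would, of course, be non-zero.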

The traits studied initially were typically those that are easy to measure in the field; for rice these were culm length, culm number, grain width, days to flowering, leaf length and panicle length (Virk et al., 1996a, b). We have since extended these studies and shown that performance for leaf rolling (a character associated with drought tolerance), micropropagation rate in vitro, callus production and shoot regeneration from callus is also associated with marker distribution in diverse germplasm (unpublished data from our laboratory).

In the present contribution, we investigate the genetic basis of the observed associations between marker loci and QTL that appear to have remained intact across very diverse rice landraces over thousands of years of domestication and selection. We have identified features of the rice genome that underlie these associations and allow accurate predictions of performance for quantitative traits to be made on the basis of marker data. The results shed light upon the evolution of cultivated rice, and indicate ways in which germplasm evaluation could be significantly improved in the future.

Materials and methods

Material

A diverse set of 48 accessions of Oryza sativa (from South and South-east Asia) was selected as described in Virk et al. (1996a). Initially, 200 accessions from the South and South-east Asian germplasm collection held at the IRRI Genetic Resources Center were selected at random and evaluated in a randomized plot experiment with two replicate blocks during the dry season (November 1993–May 1994). Quantitative data for 10 traits were recorded for each accession. These data were subjected to cluster analysis, and 48 accessions were identified by stratification to represent the diversity present in the original set of 200 accessions. The 48 accessions included representatives of isozyme groups I (16 accessions), II (11 accessions), V (1 accession) and VI (16 accessions) (Glaszmann, 1988), together with four accessions of unknown isozyme group. They originated from 10 countries: Bangladesh, Bhutan, India, Indonesia, Malaysia, Pakistan, Philippines, Sri Lanka, Thailand and Vietnam (see Virk et al., 2000). A mapping population of 60 doubled-haploid (DH) lines obtained from a cross between IR64 and Azucena (Huang et al., 1994; Maheswaran et al., 1997) was also used in this study. Morphological data for four quantitative traits (culm number, culm length, panicle length and days to 50% flowering) were scored on 10 representative plants of each of the 48 accessions and 60 DH lines.
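The stratification step can be illustrated with a short Python sketch. It assumes that cluster labels from a prior cluster analysis are already available, that picks are allocated to clusters in proportion to cluster size (largest-remainder rounding), and that accessions are drawn at random within each cluster. The allocation rule, function names and cluster labels are assumptions for illustration only; the actual procedure is the one described in Virk et al. (1996a).

```python
import random
from collections import defaultdict
from typing import Dict, List


def allocate(cluster_sizes: Dict[str, int], n_total: int) -> Dict[str, int]:
    """Proportional allocation of n_total picks across clusters.

    Each cluster first gets floor(n_total * size / total); the picks left over
    go to the clusters with the largest remainders (largest-remainder rule).
    """
    total = sum(cluster_sizes.values())
    if n_total > total:
        raise ValueError("cannot pick more accessions than were evaluated")
    alloc = {c: n_total * s // total for c, s in cluster_sizes.items()}
    leftover = n_total - sum(alloc.values())
    by_remainder = sorted(cluster_sizes,
                          key=lambda c: (n_total * cluster_sizes[c]) % total,
                          reverse=True)
    for c in by_remainder[:leftover]:
        alloc[c] += 1
    return alloc


def stratified_subset(labels: Dict[str, str], n_total: int, seed: int = 1) -> List[str]:
    """labels maps accession ID -> cluster label (hypothetical input format)."""
    members: Dict[str, List[str]] = defaultdict(list)
    for accession, cluster in labels.items():
        members[cluster].append(accession)
    alloc = allocate({c: len(m) for c, m in members.items()}, n_total)
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    chosen: List[str] = []
    for cluster in sorted(members):
        chosen.extend(rng.sample(sorted(members[cluster]), alloc[cluster]))
    return chosen


# Hypothetical example: 200 accessions falling into five clusters.
sizes = {"A": 70, "B": 50, "C": 40, "D": 25, "E": 15}
labels = {f"acc{i:03d}": c for i, c in enumerate(
    c for c, n in sizes.items() for _ in range(n))}
print(allocate(sizes, 48))
subset = stratified_subset(labels, 48)
```

Working in integer arithmetic for the quotas avoids floating-point rounding deciding which cluster receives an extra accession.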

AFLP analysis

The AFLP protocol developed by Vos et al. (1997) was followed with minor modifications (Virk et al., 1998). Fourteen primer combinations yielded 122 mapped AFLP markers. AFLP data had already been scored on the 48 accessions and the 60 DH lines; these data were used in our previous studies to construct an AFLP map (Virk et al., 1998) and to compare assessments of diversity made using mapped and anonymous (unmapped) markers (Virk et al., 2000).

Data analysis

First, to detect associations among the 122 mapped AFLP markers scored on the diverse set, a contingency chi-squared test was carried out for all possible pairs of markers (122 × 121/2 = 7381 pairs). To minimize false positives, a conservative test was used in which the probability (P) obtained from the standard (unadjusted) test of significance was multiplied by n – 1, where n is the number of markers (i.e. by 121). An association was declared significant wherever the adjusted P was less than 0.05.
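The pairwise screen can be sketched in Python as follows. It assumes a Pearson chi-squared test on the 2 × 2 table of band presence/absence without a continuity correction (the text does not say whether one was used), computes the 1-df upper-tail probability as erfc(√(χ²/2)), and applies the (n – 1) multiplier described above. Function and marker names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def chi2_2x2(x: Sequence[int], y: Sequence[int]) -> float:
    """Pearson chi-squared (1 df, no continuity correction) for two 0/1 markers."""
    n = len(x)
    table = [[0, 0], [0, 0]]
    for a, b in zip(x, y):
        table[a][b] += 1
    rows = [table[0][0] + table[0][1], table[1][0] + table[1][1]]
    cols = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    if 0 in rows or 0 in cols:
        return 0.0  # a monomorphic marker cannot show an association
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat


def upper_tail_p_1df(stat: float) -> float:
    """P(chi-squared with 1 df >= stat) = erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2.0))


def associated_pairs(scores: Dict[str, Sequence[int]],
                     alpha: float = 0.05) -> List[Tuple[str, str, float]]:
    """Test all marker pairs; multiply each P by (n - 1), keep adjusted P < alpha."""
    n = len(scores)
    hits = []
    for m1, m2 in combinations(sorted(scores), 2):
        p_adj = min(1.0, upper_tail_p_1df(chi2_2x2(scores[m1], scores[m2])) * (n - 1))
        if p_adj < alpha:
            hits.append((m1, m2, p_adj))
    return hits


print(math.comb(122, 2))  # number of pairs among 122 markers
```

The multiplier is capped at 1 so that adjusted values remain valid probabilities; this is a Bonferroni-style correction using n – 1 rather than the full number of pairs, as stated in the text.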

Secondly, linear regression analysis (SAS, 1990) was used to detect associations between individual AFLP markers and each quantitative trait in the diverse set, with the trait treated as the dependent variable and the AFLP marker genotype (scored as 1 for presence and 0 for absence) as the independent variable, as in Virk et al. (1996a,b). In the DH mapping population, a significant regression was interpreted as resulting solely from genetic linkage, whereas in the diverse set it may reflect statistical and/or genetic association.
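A single-marker regression of this kind can be sketched in a few lines of Python. With a single 0/1 predictor, the fitted intercept is the mean of band-absent accessions and the slope is the difference between band-present and band-absent means. The sketch returns the t statistic for a zero slope; the corresponding P value would be taken from a t distribution with n – 2 degrees of freedom, which is not computed here to keep the example to the standard library. Names and data are hypothetical, and this is not the SAS procedure used in the study.

```python
import math
from typing import Sequence, Tuple


def single_marker_regression(marker: Sequence[int],
                             trait: Sequence[float]) -> Tuple[float, float, float, float]:
    """Regress a trait on one 0/1 marker; return (intercept, slope, r_squared, t).

    With a 0/1 predictor the intercept equals the mean of band-absent accessions
    and the slope equals (mean of band-present) - (mean of band-absent).
    """
    n = len(marker)
    mx = sum(marker) / n
    my = sum(trait) / n
    sxx = sum((x - mx) ** 2 for x in marker)
    if sxx == 0:
        raise ValueError("marker is monomorphic in this sample")
    syy = sum((y - my) ** 2 for y in trait)
    if syy == 0:
        return my, 0.0, 0.0, 0.0  # constant trait: nothing to explain
    sxy = sum((x - mx) * (y - my) for x, y in zip(marker, trait))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy * sxy / (sxx * syy)
    # t statistic for H0: slope = 0, on n - 2 degrees of freedom.
    t = math.inf if r2 >= 1 else math.sqrt(r2 * (n - 2) / (1 - r2))
    return intercept, slope, r2, math.copysign(t, slope)


# Hypothetical marker scored on six accessions, with their panicle lengths (cm).
print(single_marker_regression([0, 0, 0, 1, 1, 1], [10, 11, 12, 14, 15, 16]))
```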

Results

Associations between pairs of markers

The ultimate aim of the present study was to discover how associations between marker loci and QTL have been maintained across diverse rice germplasm. First, however, we examined associations between different AFLP marker loci. All 122 AFLP markers that had been mapped using a DH population (Virk et al., 1998) were scored as present/absent across 48 diverse rice germplasm accessions. In this diverse material, chi-squared analysis revealed significant associations between the allelic states of some pairs of markers: of the 7381 pairs, 960 were significantly associated (adjusted P < 0.05). In 111 (11.6%) of these 960 cases, the two markers mapped to the same chromosome in the DH population. For the remaining 849 pairs, however, the markers mapped to different chromosomes, so their statistical association cannot be explained by genetic linkage.
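Partitioning the significant pairs by map position is simple bookkeeping; the Python sketch below shows it with a hypothetical marker-to-chromosome dictionary, and then applies the same arithmetic to the counts reported above (only those counts come from the study).

```python
from typing import Dict, Iterable, Tuple


def split_by_chromosome(pairs: Iterable[Tuple[str, str]],
                        chromosome_of: Dict[str, int]) -> Tuple[int, int]:
    """Count significant pairs whose markers map to the same vs different chromosomes."""
    same = different = 0
    for a, b in pairs:
        if chromosome_of[a] == chromosome_of[b]:
            same += 1
        else:
            different += 1
    return same, different


# Applying the same arithmetic to the counts reported in the text.
same, different = 111, 960 - 111
print(different, round(100 * same / (same + different), 1))  # 849 11.6
```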

These statistically associated but genetically unlinked markers were not distributed randomly over the genome; some of them occupied ‘blocks’ within rice chromosomes (Fig. 1). A block was defined as three or more distinct, closely linked AFLP loci occupying adjacent positions on the genetic map. Amongst these blocks, combinations of marker alleles have clearly been maintained across a set of very diverse rice landraces. For example, there are statistically significant associations between the allelic states of a block on each of chromosomes 2, 3, 9, 11 and 12; another example involves different blocks on chromosomes 3 and 9 (Fig. 1).
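The block definition can be expressed as a scan along each chromosome for runs of flagged markers. The Python sketch below assumes a map given as ordered (marker, cM) pairs and a set of markers flagged as taking part in significant associations with unlinked markers; because the text does not give a numerical distance for "closely linked", the max_gap_cm cut-off is a hypothetical, optional parameter.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def find_blocks(linkage_map: Dict[int, Sequence[Tuple[str, float]]],
                flagged: Set[str],
                min_size: int = 3,
                max_gap_cm: float = math.inf) -> Dict[int, List[List[str]]]:
    """Find runs of >= min_size flagged markers at adjacent map positions.

    linkage_map: chromosome -> [(marker, position in cM), ...] in map order.
    flagged: markers involved in significant associations with unlinked markers.
    max_gap_cm: optional, hypothetical cut-off for 'closely linked' neighbours.
    """
    blocks: Dict[int, List[List[str]]] = {}
    for chrom, ordered in linkage_map.items():
        run: List[str] = []
        last_pos = None
        for name, pos in ordered:
            extends_run = bool(run) and pos - last_pos <= max_gap_cm
            if name in flagged and (not run or extends_run):
                run.append(name)
            else:
                if len(run) >= min_size:
                    blocks.setdefault(chrom, []).append(run)
                run = [name] if name in flagged else []
            last_pos = pos
        if len(run) >= min_size:  # close a run that reaches the chromosome end
            blocks.setdefault(chrom, []).append(run)
    return blocks


# Hypothetical map fragment for two chromosomes.
toy_map = {2: [("a", 0), ("b", 5), ("c", 9), ("d", 30), ("e", 31)],
           3: [("f", 0), ("g", 2), ("h", 4), ("i", 6)]}
print(find_blocks(toy_map, {"a", "b", "c", "e", "f", "g", "i"}))
```

Any unflagged marker interrupts a run, which enforces the "adjacent positions" part of the definition; tightening max_gap_cm additionally splits runs at large map gaps.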

Fig. 1

Molecular map (Kosambi units) from the rice cross IR64 × Azucena showing the location of AFLP, RAPD and RFLP markers (Huang et al., 1994; Virk et al., 1998). Markers that are indica or japonica specific are in large font (Virk et al., 1998). (a) regions (blocks) on chromosomes 2, 3, 11 and 12 where there are statistically significant associations between the allelic states; (b) regions (blocks) on chromosomes 3 and 9 where there are statistically significant associations between allelic states.

Associations between markers and traits

Next, associations between markers and performance for four quantitative traits were examined in both the diverse germplasm and the DH population. Markers found to be associated with traits in either or both of the diverse set and the DH population, together with their chromosomal locations, are listed in Table 1. Single-marker regression analysis identified 20 markers that were significantly associated with panicle length in the diverse material (Table 2). Of these 20 markers, nine (45%) were also associated with this trait in the DHs, suggesting that the associations between these nine markers and panicle length in the diverse set resulted from genetic linkage. For the 11 markers that were associated in the diverse set but not in the DHs, however, there was no evidence that the association was a consequence of linkage.

Table 1 Markers found to be associated (P < 0.05) by regression analysis with one or more traits in either or both of the diverse set (a) and the doubled haploids (b)
Table 2 Number of mapped AFLP markers associated with each of the four quantitative traits in 48 diverse landrace accessions and 60 doubled-haploid (DH) lines of rice; for the landraces, the percentage of markers where the association results from genetic linkage is also given

Similarly, 55 markers were associated with performance for culm number in the diverse material (Table 2). Only nine of these (16%) were associated with the trait in the DHs, and these associations presumably resulted from linkage. Twenty-one markers were associated with days to 50% flowering in the diverse material, and seven (33%) of these were shown to be linked in the DHs (Table 2). Only two markers were associated with culm length in the diverse set, one of which resulted from genetic linkage, as shown in the DHs. Conversely, and perhaps unusually, for each of the four traits a small number (3–5) of markers were associated (and therefore genetically linked) in the DHs but were not detected as associated with the traits in the diverse material (Table 2).
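The comparison between the two analyses amounts to set operations on the lists of significant markers. The Python sketch below uses hypothetical marker names for the set comparison and then reproduces the percentages quoted above from the reported counts.

```python
from typing import Dict, Set


def compare_analyses(diverse: Set[str], dh: Set[str]) -> Dict[str, float]:
    """Compare markers significant in the landraces with those significant in the DH lines."""
    shared = diverse & dh
    return {
        "shared": len(shared),              # association attributable to linkage
        "diverse_only": len(diverse - dh),  # no evidence of linkage
        "dh_only": len(dh - diverse),       # linked QTL not detected in the landraces
        "pct_linked": 100.0 * len(shared) / len(diverse) if diverse else float("nan"),
    }


# Counts reported in the text: (associated in diverse set, of which also in DHs).
reported = {"panicle length": (20, 9), "culm number": (55, 9),
            "days to 50% flowering": (21, 7), "culm length": (2, 1)}
for trait, (n_diverse, n_shared) in reported.items():
    print(trait, n_diverse - n_shared, round(100 * n_shared / n_diverse))
```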

Discussion

The research presented here shows that associations between pairs of AFLP markers can be identified across diverse rice landraces. The proportion of associated pairs whose markers map to the same chromosome (11.6%) is greater than would be expected (8.3%, i.e. 1/12) given the 12 linkage groups in rice. While this difference is highly significant (χ2 = 13.1; P < 0.001), it may be influenced by an uneven distribution of markers between linkage groups. Mapping the markers in a DH population makes it clear that a large proportion of the associated markers are not genetically linked. Some of the statistically associated but genetically unlinked markers occupy blocks within rice chromosomes, and amongst these blocks combinations of marker alleles have clearly been maintained across the set of landraces constituting the diverse germplasm studied.
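The reported test statistic can be reproduced as a goodness-of-fit chi-squared on the 960 associated pairs (111 same-chromosome, 849 different-chromosome). The sketch assumes that 1/12 was used as the expected same-chromosome proportion, which is the value that reproduces both the 8.3% and χ2 = 13.1 given in the text; the function name is hypothetical.

```python
import math
from typing import Sequence


def goodness_of_fit_chi2(observed: Sequence[float], expected_props: Sequence[float]) -> float:
    """Pearson goodness-of-fit statistic for observed counts against expected proportions."""
    total = sum(observed)
    return sum((o - total * p) ** 2 / (total * p) for o, p in zip(observed, expected_props))


# 960 significant pairs: 111 on the same chromosome, 849 on different chromosomes.
# Assumed expected same-chromosome share: 1/12 (markers spread evenly over 12 groups).
chi2 = goodness_of_fit_chi2([111, 849], [1 / 12, 11 / 12])
p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-squared with 1 df
print(round(chi2, 1), p < 0.001)  # 13.1 True
```

The expected counts are 80 and 880, so the statistic is 31²/80 + 31²/880 ≈ 13.1, on 1 degree of freedom.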

In this study, we have assumed that the AFLP bands scored in the mapping population are allelic with the comigrating bands in the diverse germplasm; this was investigated in detail in only one case. One AFLP fragment that explains a large amount of the variation in culm number, both in the mapping population and across the diverse germplasm, was used to develop locus-specific primers, which amplified the sequence from both parents of the cross and from five landraces of the diverse material. The sequences were identical except for two short insertions/deletions present in some alleles. Comigrating AFLP bands derived from genotypes of the same species have previously been shown to be allelic at a very high frequency in rice (Nandi et al., 1997), barley (Waugh et al., 1997), potato (Rouppe Van Der Voort et al., 1997) and Arabidopsis (Alonso-Blanco et al., 1998).

In this research we have also identified, using classical QTL analysis, markers that are associated with performance for each of the four traits in the DH mapping population; these markers are genetically linked to loci controlling the relevant trait. For each trait, some of the markers identified using the DH population were also identified by association analysis of the diverse landrace material. For culm number, nine of the 12 markers identified in the DH population were also identified using the diverse material, and nine of the 13 markers associated with panicle length in the DH population were also associated with this trait in the diverse set. Association analysis of diverse material therefore identifies many of the same markers as analysis of the mapping population, and the markers common to both analyses provide information about the same QTL for a particular trait. Hence, as expected, associations between markers and traits in the diverse material are based on associations between alleles at marker loci and alleles at QTL.

A few markers associated with the four traits were detected using the DH population but not the diverse material. In some cases this may be misleading: particular markers may have fallen just outside the significance threshold used in the analyses and therefore are not included in the lists. Alternatively, polymorphism at a QTL between the two parents of the cross may have much greater significance in the mapping population than across the diverse set of landraces. This could occur because other QTL that contribute to variation among the landraces are monomorphic in this cross, so their effects cannot be detected in the DH population.

Conversely, some markers are associated with a trait in the diverse material but not in the DH population. This is partly because some of the QTL controlling the traits studied here may not be polymorphic in the particular cross (Azucena × IR64) used to produce this mapping population; such QTL, and the markers associated with them, would not be detected in the DH population. In addition, because marker alleles on different chromosomes can remain in association across diverse germplasm, a marker on one chromosome may also be associated with a QTL allele within an associated block on another chromosome.

The important question is how alleles at loci either within or outside blocks on different chromosomes have remained in association during the adaptive development of landraces. These alleles may have been maintained in association by both natural and human selection over the hundreds or thousands of years of rice landrace cultivation across a geographically wide area of the world. Alternatively, the associations may have been selected several or many times during evolution, perhaps as an adaptive response to similar environmental conditions but moulded by human intervention. The combination of migration and inbreeding, giving rise to linkage disequilibrium, may have been significant in the initial establishment of associations, but cannot alone account for their occurrence and maintenance over time in geographically diverse regions of the world. Clegg & Allard (1972) identified correlations among allele frequencies in divergent populations of wild oats amongst which migration was common, but in which different gene combinations conferred adaptation to a range of environments. Human-dispersed landrace populations of rice across South and South-east Asia will likewise have been exposed to environmental challenges as well as to varied human selection pressures. As in wild oats and wild barley (Clegg & Allard, 1972; Clegg et al., 1972), and as proposed by Lewontin (1964), selection in these rice landraces for phenotypes produced by epistatic interactions between alleles at different quantitative trait loci (even on different chromosomes) may have contributed to the associations between alleles at marker loci found in the present study.

We therefore suggest that associations, particularly amongst genes in different chromosomal blocks, illustrate the ‘coadaptation’ originally described by Dobzhansky (1970). Although Dobzhansky's ‘coadapted gene complexes’ resulted from recombination suppressed by small chromosomal inversions, he suggested that selection could favour the binding together of favourable allele combinations. It was subsequently demonstrated that selection, accompanied by recombination reduced through any or all of linkage, inbreeding and small population size, can result in nonrandom associations among alleles at different loci (strictly, gametic phase disequilibrium), giving rise to genes selected as coadapted units (Franklin & Lewontin, 1970; Allard et al., 1972). Some or all of these factors would seem to apply to rice landrace evolution. Allard (1988) later referred to the existence of ‘adaptive gene complexes’ in the context of crop plants, stating that ‘the picture of evolutionary change [in cultivated plants] is one of increasing numbers of favourably interacting alleles into large synergistic complexes’. Work on ‘Barley Composite Cross II’ led Allard (1988) to conclude that, in the generations following the formation of the initial composite cross, populations grown under standard agricultural conditions were restructured, with specific alleles being incorporated into multilocus complexes that included increasing numbers of loci. The major hypothesis derived from these studies is that, within inbreeding populations, it is the interaction between favourable alleles within these so-called gene complexes that gives rise to the similarly increased adaptation of different populations to their local environments. It is therefore no surprise that an apparently similar situation has been identified amongst rice landraces (Virk et al., 1996a) and subsequently in diverse barley germplasm (Pakniyat et al., 1997).
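Gametic phase disequilibrium between two loci is commonly summarized as D = p(AB) − p(A)p(B), with the standardized measure r² = D²/[p(A)(1 − p(A))p(B)(1 − p(B))]. The Python sketch below computes both from 0/1 scores, assuming each landrace is homozygous so that its scores can be read as a single haplotype and that absence of a dominant AFLP band stands in for the alternative allele. For a 2 × 2 table, the Pearson chi-squared statistic without continuity correction equals N·r², which links this measure to the pairwise contingency tests described in Materials and methods. The function name and data are hypothetical.

```python
from typing import Sequence, Tuple


def gametic_disequilibrium(x: Sequence[int], y: Sequence[int]) -> Tuple[float, float]:
    """Return (D, r^2) for two biallelic loci scored 0/1 on inbred lines.

    Assumes each line is homozygous, so its marker scores can be read as one
    haplotype; with dominant AFLP scoring, 0 (band absent) stands in for the
    alternative allele.
    """
    n = len(x)
    p_a = sum(x) / n                                    # frequency of band at locus A
    p_b = sum(y) / n                                    # frequency of band at locus B
    p_ab = sum(1 for a, b in zip(x, y) if a and b) / n  # band present at both loci
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0  # r^2 undefined for a monomorphic locus
    return d, r2


x = [1, 1, 1, 0, 0, 0, 1, 0]
y = [1, 1, 0, 0, 0, 0, 1, 1]
d, r2 = gametic_disequilibrium(x, y)
print(d, r2, len(x) * r2)  # 0.125 0.25 2.0
```

Here the last value, N·r² = 2.0, equals the Pearson chi-squared for the corresponding 2 × 2 table (observed counts 3, 1, 1, 3 against expected counts of 2 in each cell).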

It is interesting that Xiong et al. (1999) have more recently also observed chromosomal blocks in their study of QTL controlling domestication-related traits in rice. They report that QTL controlling a range of such traits tend to cluster in particular regions of certain chromosomes. It is difficult, however, to compare their results with those reported here, because theirs were obtained using progeny from a cross between O. sativa and a wild relative (O. rufipogon) with the aim of investigating changes that occurred during domestication. In the present work we have used genetic polymorphisms among a range of genotypes, all of which are domesticated, and the focus has been on identifying associations between chromosomal blocks that have been maintained during the diversification of rice after domestication. We have also considered whether adaptive gene complexes – and therefore associations – are largely a result of the separate evolutionary histories of the indica and japonica subgroups within O. sativa. We have previously identified 50 of the 122 AFLP markers as accounting for most of the indica/japonica differentiation (Virk et al., 1998). Of these 50, 18 mapped within blocks and 32 outside; this difference is on the borderline of significance at the 5% level (χ2 = 3.92). Given that substantially more indica/japonica markers fall outside blocks than within them, we consider that this differentiation may have played only a small part in maintaining linkage disequilibrium and underpinning the associations that we have found.
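The borderline result can be checked with a one-line goodness-of-fit calculation. The sketch assumes the null hypothesis was an equal (25 : 25) split of the 50 indica/japonica markers between block and non-block regions; this is an inference, chosen because it reproduces the reported χ2 = 3.92, and the text does not state the expected values explicitly.

```python
import math

observed = [18, 32]      # indica/japonica-differentiating markers inside vs outside blocks
expected = [25.0, 25.0]  # assumption: equal split expected under the null hypothesis
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-squared with 1 df
critical_5pct = 3.841               # chi-squared critical value, 1 df, alpha = 0.05
print(round(chi2, 2), round(p, 3), chi2 > critical_5pct)  # 3.92 0.048 True
```

The statistic only just exceeds the 5% critical value of 3.841, which is why the result is described as borderline.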

While the present research, like that of Allard (1988) and Pakniyat et al. (1997), identifies a situation that applies to inbreeding species, it remains to be seen whether such a pattern of adaptive evolution can be identified in outbreeding species. Although we have already shown that associations between markers and QTL exist within Beta (Virk et al., 1996b), which as a crop is generally outcrossing, this result was achieved using a particular wild form of beet that is largely inbreeding.

We have already argued that the large numbers of associations between markers and QTL that can be identified amongst diverse germplasm of different crops have practical significance (Virk et al., 1996a). These associations could be used in the early stages of a breeding programme for marker-assisted selection, and their existence could transform the whole process of germplasm evaluation by allowing germplasm performance for different quantitative traits to be described before elaborate field trials are undertaken. In addition, because there is evidence that some of the associations result from genetic linkage, some of the markers could be used to follow QTL through breeding programmes or could play a role in map-based cloning strategies.