Identification of quantitative trait loci (QTL) for complex traits has been hampered by the use of populations with a narrow genetic base. Most QTL mapping studies have used crosses made within a subspecies, either indica × indica or japonica × japonica. In this study we report the advantages of using a Multi-parent Advanced Generation Inter-Cross (MAGIC) global population, derived from a combination of eight indica and eight japonica elite parents, for QTL discovery for yield and grain quality traits. Genome-wide association study and interval mapping identified 38 and 34 QTLs, respectively, whereas Bayesian networking detected 60 QTLs with 22 marker-marker associations, 32 trait-trait associations and 65 marker-trait associations. Notably, nine known QTLs/genes (qPH1/OsGA20ox2, qDF3/OsMADS50, PL, QDg1, qGW-5b, grb7-2, qGL3/GS3, Amy6/Wx and OsNAS3) were consistently identified by all approaches for nine traits, and qDF3/OsMADS50 was co-located for both yield and days-to-flowering on chromosome 3. Moreover, we identified a number of candidate QTLs in only one or two analyses; these require further validation. The results indicate that this new population enabled the identification of significant QTLs and interactions for 16 traits through multiple approaches. Pyramided recombinant inbred lines provide a valuable source for integration into future breeding programs.
Rice is a major food crop for over half of the world's population, and Asian countries account for almost 90% of global rice production1. With the increase in world population, rice production has to be doubled2 by 2050. Rice production improved significantly after the development of semi-dwarf cultivars and hybrid rice3. However, in recent decades rice yield has not improved significantly and has not kept pace with projected rice production4. For food security, declining genetic gain, the narrow genetic base of modern rice varieties, biotic and abiotic stress pressure, and increasing demand for more and better-quality rice are among the concerns of rice breeders2,3,5. In practice, most economically important traits display a complex genetic architecture: they are under polygenic control and are often influenced by extensive genotype × environment (G×E) interactions.
Breeders and geneticists have traditionally used bi-parental populations for quantitative trait loci (QTL) mapping and varietal development. Many mapping studies have used bi-parental populations to detect QTLs for grain yield and quality traits because such populations are easy to develop and a wide range of statistical analysis tools is available6,7,8,9,10,11,12,13. Bi-parental populations such as Recombinant Inbred Lines (RILs), Backcross Inbred Lines (BILs), Near Isogenic Lines (NILs), Advanced Inter-Cross (AIC) and Doubled Haploid (DH) lines have been effective in mapping large-effect QTLs14,15,16,17,18,19,20. The weakness of bi-parental populations is that loci are mapped at low resolution because recombination is limited21, and additional mapping is still required to fine map QTLs with small effects. In contrast, association mapping exploits linkage disequilibrium (LD) to localize small- and large-effect QTLs in diverse populations. Facilitated by high-throughput genotyping, agronomic and grain quality QTLs have been mapped with high-density Single Nucleotide Polymorphism (SNP) markers through genome-wide association studies (GWAS)22,23,24,25. However, diverse populations introduce population structure, which can lead to spurious associations if it is not accounted for26,27.
An alternative approach is to create multi-parental populations derived from elite parents, in which each line represents a combination of alleles inherited from multiple parents. This broadens the genetic base and creates agronomically superior breeding lines through strategic recombination of genes/QTLs, thereby helping to select the best lines for targeted breeding programs. Multi-parent Advanced Generation Inter-Cross (MAGIC) populations have been developed in a number of crop species such as rice, corn, bread wheat, durum wheat, barley and chickpea28. A comprehensive review of the development and use of MAGIC populations has been provided28. Applications of MAGIC populations have been discussed and adopted within the rice community to develop multi-parental populations29,30,31. MAGIC involves intercrossing a number of parental lines for "n" generations in a mating design that combines the genomes of all parents in the progeny lines. It can be used for coarse mapping with low marker densities on lines derived from an early generation, and for fine mapping of QTLs using lines derived from more advanced generations32. In this study, QTL analysis of yield and related component traits, and of grain quality traits, was conducted in the MAGIC global population (MGP) developed at the International Rice Research Institute (IRRI). The main objectives of the study were to identify the loci responsible for higher grain yield, superior agronomic characters, good grain quality and biofortification, to map the QTLs at higher resolution, and to study their interactions. Based on the QTLs identified, tightly linked SNP markers can be used by breeders for marker-aided selection to precisely introduce beneficial QTLs into elite lines for crop improvement.
Trait variances and correlations
Nine traits (agronomic and biofortification traits) were measured in both the 2015 Dry Season (2015DS) and the 2016 Dry Season (2016DS), while 16 traits (agronomic, grain quality and biofortification traits) were measured in 2016DS. The MGP showed substantial variation for all traits in both 2015DS and 2016DS (Table S1). In 2015DS, among the parental lines, CSR30 had the highest Best Linear Unbiased Estimator (BLUE) values for number of productive tillers (PTN), grain iron (Fe) and grain zinc (Zn). Inia Tacuari had the highest BLUE values for grain weight per panicle (GWT) and chlorophyll content index (SPAD) in the flag leaf at maturity. Cypress, Samba Mahsuri + Sub1 and WAB 56–125 had the highest BLUE values for grain yield (GYLD), grain number per panicle (TGN) and panicle length (PNL), respectively. Colombia XXI, IR45427-2B-2-2B-1-1, IR77186-122-2-2-3 and IR77298-14-1-2-10 were shorter than 110 cm. Four lines had higher GYLD than the highest-yielding parent (10.08 tons/ha), while 1010 lines had lower GYLD than that parent. A total of 62 lines had higher GYLD than the top check variety (7.12 tons/ha), whereas 952 lines had lower GYLD than that variety. In 2016DS, among the parents, Colombia XXI had the highest BLUE values for PNL, grain length (GL) and GWT, and Shan-Huang Zhan-2 had the highest BLUE values for PTN and Fe content. IR73571-3B-11-3-K2 had the highest BLUE values for GYLD and amylose content (AC), IR4630-22-2-5-1-3 and IR45427-2B-2-2B-1-1 had the highest BLUE values for grain width (GW), and CSR30 had the highest BLUE value for Zn content. A total of 60 lines had higher GYLD than the top parent (8.40 tons/ha), whereas 1278 lines had lower GYLD than the top parent. A total of 243 lines had higher GYLD than the top check variety (6.44 tons/ha), whereas 1095 lines had lower GYLD than the top check variety.
Most of the parents flowered and matured early, except Samba Mahsuri + Sub1. In the MAGIC RILs, the ranges and means for the majority of traits were similar in the 2015DS and 2016DS trials. However, both means and ranges were higher for plant height (PHT), TGN, GWT, Zn and Fe in 2015DS, while the ranges of PTN, SPAD and GYLD were higher in 2016DS. For PNL, the range was higher in 2015DS but the mean was higher in 2016DS. The genotypic variance for all traits in both seasons was highly significant (p < 0.0001). The quantile-quantile (QQ) analyses showed nearly normal distributions for most of the measured traits. In the combined BLUE analysis (two-stage analysis in PBTools), genotypic variance was also significant for the nine traits common to the two dry seasons. Combined BLUE values of the nine common traits (2015DS and 2016DS) and BLUE values of the seven additional traits (2016DS) were used for further analyses. Several significant correlations were identified among the traits. Of the 36 possible pairwise correlations in 2015DS, 21 were positive and 15 negative, and GWT was significantly correlated with PNL and TGN at p < 0.05. Of the 120 possible pairwise correlations in 2016DS, 66 were positive and 54 negative, and 18 (15 positive and 3 negative) were significant at p < 0.05. At p < 0.05, GYLD was positively correlated with PHT, PNL, number of filled grains (FG) and GWT, and negatively correlated with Zn (Fig. S1A,B).
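The counts of possible correlations quoted above (36 for nine traits, 120 for 16 traits) are simply the numbers of unordered trait pairs. A minimal Python sketch of that arithmetic follows; the trait lists are assembled from the abbreviations used in the text and the function name is hypothetical.

```python
from itertools import combinations

def count_trait_pairs(traits):
    """Number of distinct unordered pairs, i.e. n * (n - 1) / 2."""
    return sum(1 for _ in combinations(traits, 2))

# Trait lists inferred from the abbreviations in the text (an assumption).
traits_common = ["PHT", "PTN", "PNL", "SPAD", "TGN", "GWT", "GYLD", "Zn", "Fe"]
traits_2016 = traits_common + ["DTF", "FG", "UF", "GW", "GL", "CHALKY", "AC"]

pairs_common = count_trait_pairs(traits_common)  # nine traits
pairs_2016 = count_trait_pairs(traits_2016)      # sixteen traits
```

The positive and negative counts reported for each season sum to these totals (21 + 15 and 66 + 54).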
Population structure analysis and linkage disequilibrium (LD)
For this population, the log likelihood estimated by STRUCTURE increased gradually from k = 1 to k = 5, with no obvious optimum. In contrast, the maximum of ΔK was observed at k = 2, suggesting that the population could be divided into two subgroups (Fig. S2A). However, because the ΔK value was very low, STRUCTURE did not identify any significant population structure in the MGP. Four principal components (PCs) were used to measure the variation in the population. The first PC explained 4.7% of the variation, while the remaining three PCs explained less than 1.5%. PC analysis showed no major clustering in the population, although Jinbubyeo and Inia Tacuari were separated from the rest of the population (Fig. S2B). The LD analysis, based on 66,309 SNP markers, showed extensive variability in the magnitude of allele frequency correlations (r2), reflecting variation in LD across chromosomes. Across physical distance groups, LD among intra-chromosomal marker pairs decayed on average within 200–400 Kb, at r2 ~0.24, about half of its initial value (Table S2). Therefore, the MGP shows no major population structure and low LD across the genome, and represents a useful genetic resource for genetic studies and for fine mapping of major-effect QTLs and genes in rice.
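The half-decay criterion used for LD (the distance at which average r2 falls to half of its maximum) can be illustrated with a short Python sketch. The binned values below are invented for illustration and are not taken from Table S2; the function name is hypothetical.

```python
def half_decay_distance(bins):
    """Return the first distance (kb) at which mean r2 drops to half of its maximum.

    `bins` is a list of (distance_kb, mean_r2) tuples. Returns None if the
    mean r2 never falls to half of the maximum within the supplied bins.
    """
    ordered = sorted(bins)                      # sort by distance
    threshold = max(r2 for _, r2 in ordered) / 2
    for distance_kb, mean_r2 in ordered:
        if mean_r2 <= threshold:
            return distance_kb
    return None

# Illustrative values only (assumed, not the study's data).
example_bins = [(50, 0.50), (100, 0.41), (200, 0.31), (400, 0.24), (800, 0.15)]
decay_kb = half_decay_distance(example_bins)
```

With these made-up bins the criterion is met in the 400 kb bin, which is the kind of reading that leads to a statement such as "LD decays within 200–400 Kb".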
Genome-wide association study
A genome-wide association study (GWAS) was carried out to detect significant QTLs for the 16 measured traits in the MGP. A total of 1,027 MAGIC RILs, 16 parents and 66,309 SNP markers were used in the association analysis. SNP markers significantly associated with the traits were detected at a threshold of p < 0.0001. All significant SNPs linked to a trait within a chromosomal region were considered a single significant QTL or genomic region. The significant QTLs for each trait are shown in Figs. 1(i) and S3. A total of 38 QTLs were significantly associated with the traits, and these QTLs were distributed on all chromosomes. The number of QTLs identified per trait varied from 1 to 5. The highest numbers of QTLs were identified for GW and PNL, on chromosomes 1, 2, 3, 5, 7 and 8. For the remaining traits a maximum of three QTLs were identified. The phenotypic variance explained (PVE) by these QTLs varied from ~3.2 to 39.8%, and 21 QTLs had a PVE of more than 10%. In several QTL regions multiple SNPs were identified for different traits, with clear peaks within wider confidence intervals, whereas chalkiness (CHALKY), PTN and number of unfilled grains (UF) had only one or two significant SNPs. Manhattan plots showed 25 significant QTLs for agronomic traits and 13 significant QTLs for grain quality and biofortification traits. qUF3 and qCLK4 had the smallest QTL effects (PVE < 5%), for UF and CHALKY, while qPHT1 had a large QTL effect (PVE ~40%) for PHT. Of the 38 QTLs, 22 had moderate to large effects (PVE > 10%) for PHT, days-to-flowering (DTF), PNL, GL, GW, TGN, AC and Zn. The remaining 16 QTLs had small effects (PVE < 10%) for PTN, SPAD, FG, UF, GWT, TGN, CHALKY and GYLD. In this study, GWAS identified a number of QTLs located within or near reported genomic regions, as well as newly detected QTLs across the genome.
The plant height QTL (qPHT1) was co-located with qPH1/OsGA20ox2, which underlies the semi-dwarf trait, while qDTF3 and qGYLD3, for DTF and GYLD, were located very close to major flowering activator genes (qDF3/OsMADS50, Hd9, Hd1). The grain quality QTLs QGL3, qGW5 and qAC6 were located close to GS3, qGW-5b and Wx, respectively. Meanwhile, QZn7 was co-located with qZn7.1/OsNAS3, a long-distance metal transporter for Zn (Table S3).
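The rule of treating all significant SNPs within a chromosomal region as one QTL can be sketched in Python as follows. This is only an illustration: the merging distance (`max_gap_bp`), the function name and the example SNPs are assumptions, since the text does not state how region boundaries were drawn.

```python
def group_into_qtls(snps, p_threshold=1e-4, max_gap_bp=200_000):
    """Merge significant SNPs on the same chromosome into QTL regions.

    `snps` is a list of (chromosome, position_bp, p_value) tuples.
    `max_gap_bp` is a hypothetical merging distance, not a value from the study.
    """
    significant = sorted(
        (s for s in snps if s[2] < p_threshold), key=lambda s: (s[0], s[1])
    )
    regions = []
    for chrom, pos, p in significant:
        last = regions[-1] if regions else None
        if last and last["chrom"] == chrom and pos - last["end"] <= max_gap_bp:
            last["end"] = pos                      # extend the current region
            if p < last["peak_p"]:                 # keep the most significant SNP as peak
                last["peak_pos"], last["peak_p"] = pos, p
        else:
            regions.append(
                {"chrom": chrom, "start": pos, "end": pos, "peak_pos": pos, "peak_p": p}
            )
    return regions

# Invented example SNPs (chromosome, position, p-value).
example_snps = [
    (1, 38_000_000, 1e-12),
    (1, 38_100_000, 1e-8),
    (1, 38_500_000, 0.02),   # not significant, ignored
    (3, 1_200_000, 5e-6),
    (3, 9_000_000, 3e-5),
]
qtl_regions = group_into_qtls(example_snps)
```

Here the two neighbouring chromosome 1 SNPs collapse into one region whose peak is the SNP with the smaller p-value, while the two distant chromosome 3 SNPs remain separate regions.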
Multi-parent interval mapping
Among the parents, Inia Tacuari and IR07F287 contributed the largest shares of their genomes to the progenies, while Cypress and Fedearroz50 contributed the smallest (Fig. S4A). In the genetic map, the number of SNP markers varied from 342 on chromosome 9 to 845 on chromosome 1 (Fig. S4B). A total of 89 QTLs were identified for 16 traits by interval mapping (IM) at p < 0.0001, and the number of significant QTLs was reduced to 34 after fitting the full model (Figs. 1(ii) and S5). IM detected 19 QTLs for agronomic traits and 15 QTLs for grain quality and biofortification traits. Four QTLs (qPHT1, qDTF3, QGL3 and qAC6) had moderate to large effects (PVE > 10%) for PHT, DTF, GL and AC. For PHT, qPHT1 was detected on chromosome 1 with a large effect (PVE of 38.7%). The three QTLs qDTF3, QGL3 and qAC6 had moderate effects, with PVE varying from 14.11 to 22.43% for DTF, GL and AC. The remaining 30 QTLs had small effects, with PVE varying from 2.37 to 8.72%. The PVE of the two GYLD QTLs, qGYLD2 and qGYLD3, varied from 2.8 to 4.41%, while that of QZn1.1 and QZn7 varied from 5.33 to 7.71% for Zn. qUF2 had the smallest QTL effect (PVE ~3%), for UF. Notably, the major QTLs detected by IM were consistent with those uncovered by GWAS. IM identified QTLs close to the major reported QTLs qPH1/OsGA20ox2, qDF3/OsMADS50, Hd9, Hd1, GS3, qGW-5b, Wx and qZn7.1/OsNAS3 for PHT, DTF, GYLD, GL, GW, AC and Zn (Table S4).
Bayesian genomic prediction network
The Bayesian Genomic Prediction Network (BN) analysis showed that causal predictive correlations had higher predictive power than genetic predictive correlations for all traits (Table 1). Moreover, BN showed the strength and direction of relationships among traits and markers (Fig. S6). A total of 60 QTLs were identified by BN, of which 31 were for agronomic traits and 29 for grain quality and biofortification traits. BN consistently identified the major reported QTLs uncovered by GWAS and IM (qPH1/OsGA20ox2, qDF3/OsMADS50, GS3, qGW-5b, Wx and qZn7.1/OsNAS3) for PHT, DTF, GYLD, GL, GW, AC and Zn (Fig. 2; Table S5). Further, a total of 73 nodes and 119 associations were observed in the BN of 16 traits: 22 marker-marker associations, 32 trait-trait associations and 65 marker-trait associations. In the averaged BN (strength > 0.5), the significant direct associations among traits were PHT~PNL:GWT:DTF, PTN~PHT:GWT, PNL~GWT, UF~TGN:DTF, GWT~TGN:UF, FG~PHT:TGN, GYLD~Zn:PHT:PTN:TGN:GWT:FG:DTF, GW~TGN:GWT:GL, GL~PNL:TGN, AC~PHT, CHALKY~TGN:DTF:GL:GW, Zn~FG and Fe~Zn:GL. Among the significant marker-trait associations, the number of significant markers per trait varied from one to eight. GW and Zn were each associated with eight markers, while only one marker was associated with FG.
Candidate QTLs/Genes analysis
Candidate gene analysis was carried out using peak SNP markers detected in at least two of the three analyses (GWAS, IM and BN). All known genes and fine-mapped QTLs at the significant markers are listed in Table 2. Ten QTLs (qPHT1, qDTF3, qPNL7, qCHP1, qGW5, qGW7, QGL3, qAC6, QZn7 and qGYLD3) were consistently identified in all three analyses, whereas qDTF6, qCHP4, qUF2, qGN4, qGW2, qGW3, qGW8, QZn1 and QZn5 were identified in at least two analyses. With respect to reported QTLs in these genomic regions, the plant height QTL qPHT1 was in close proximity to qPH1/OsGA20ox2, the semi-dwarf gene on chromosome 1. The two flowering QTLs qDTF3 and qDTF6 were in close proximity to qDF3/OsMADS50, a flowering activator gene, and Hd1, on chromosomes 3 and 6, respectively. The flowering QTL qDTF3 and the grain yield QTL qGYLD3 were both co-located with qDF3/OsMADS50 on chromosome 3. The panicle length QTL qPNL7 was positioned within PL on chromosome 7, while qCHP1 and qCHP4, for chlorophyll content index, were co-located with QDg1 and QDg4a on chromosomes 1 and 4. The grain number QTL qGN4 was co-located with gn-4 on chromosome 4, while qUF2 was a novel QTL for unfilled grain on chromosome 2. The grain width QTLs qGW3, qGW5 and qGW7 were co-located with qGL3/GS3, qGW-5b and grb7-2 on chromosomes 3, 5 and 7, whereas qGW2 and qGW8 have not been reported in QTL databases. The grain length QTL QGL3 was positioned within qGL3/GS3, which underlies grain shape, on chromosome 3. For grain quality and biofortification, qAC6 was positioned within the Amy6/Wx gene on chromosome 6 for AC, whereas QZn1, QZn5 and QZn7 were co-located with the metal transporter genes OsFRDL4, rMQTL5.2 and OsNAS3 on chromosomes 1, 5 and 7 for Zn (Fig. 3). In the gene association analysis, ten candidate genes were identified for GYLD on chromosome 3, whereas 78 candidate genes were identified on chromosome 6 for AC. A total of 22 candidate genes were associated with Zn on chromosome 7, while 10 candidate genes were associated with Zn on chromosome 5.
The top five candidate genes for grain yield, grain quality and biofortification traits are listed in Table S6.
MAGIC lines with multiple QTLs pyramided
In the MAGIC global population, phenotypic analysis showed wide variation for 16 traits across the dry seasons. QTL combinations derived from the contributions of the 16 founders were observed in the MAGIC RILs. Reshuffling of the founder genomes increased crossovers, which can break negative drag effects between two genetic loci. Out of 1,027 RILs, 72 lines combined high GYLD and Zn. Of these, 69 lines combined high GYLD and Zn with early flowering (DTF), and 18 lines combined high GYLD and Zn (~18 ppm) with early flowering (DTF), taller plants (PHT) and moderate AC. The corresponding QTLs and allelic combinations in these pyramided RILs are being investigated further. Based on acceptable yield and zinc level, the ten best multi-trait pyramided RILs were shortlisted and are presented in Table 3. These promising lines with multiple trait combinations provide a good genetic resource for breeding programs.
Most of the economically important traits in rice are quantitatively inherited33. Combining association and pedigree-based analyses is a good approach for identifying small- and large-effect QTLs, provided an appropriate mapping population is available. In previous studies, most mapping populations were not suited to applying both association and pedigree-based analyses14,15,21,22,26,34,35. The MAGIC global population is a unique genetic resource with wide genetic diversity, representing the indica and japonica subgroups without prominent population structure and with low LD28,29,30,31,36. Phenotypic analysis showed substantial variation for the 16 measured traits, as well as transgressive RILs for further genetic analysis. In the Pearson correlation analysis, GYLD was positively correlated with PHT, PNL, FG and GWT, and negatively correlated with Zn. Meng's group reported that population structure in a MAGIC population, being an intercrossed population, was negligible37. Consistent with this, no major clustering was observed by STRUCTURE or PC analysis in our study. The LD decay distance is an important factor in determining the resolution of association mapping, as rapid LD decay enhances the fine mapping of QTL regions38. Different LD decay rates have been reported for MAGIC rice populations in previous studies37,39,40. Our LD decay results indicated a high rate of recombination, with an average LD decay distance of around 300 kb (r2 = 0.24). Rapid LD decay increases mapping resolution, whereas non-significant population structure reduces spurious marker-trait associations28,30,31,40.
In this study, we used a unique mapping population with a large population size, adequate marker density and appropriate statistical models to detect significant QTL regions, although different SNP marker sets were used for the different analyses depending on the statistical model and the available computational power. Significant marker-trait associations and interactions were captured through the association and pedigree-based analyses. All analyses (GWAS, IM and BN) identified significant QTLs in close proximity to the known QTLs/genes qPH1/OsGA20ox2, qDF3/OsMADS50, PL, QDg1, qGW-5b, grb7-2, qGL3/GS3, Amy6/Wx and qZn7.1/OsNAS3 for PHT, DTF, GYLD, PNL, SPAD, GW, GL, AC and Zn across the genome (https://rapdb.dna.affrc.go.jp/; https://archive.gramene.org/qtl/; http://qtaro.abr.affrc.go.jp/). These results support the validity and appropriateness of the models used in the study. Aside from these QTLs, we also detected known and unknown QTLs across the genome in only one or two analyses. Because the analyses differ in statistical performance, each can detect QTLs that are not detected by the others. However, these QTLs require further validation before they can be incorporated into breeding programs.
Our study is the first report to explore the genetic architecture of grain yield and grain quality by combining association and pedigree-based analyses in a 16-way MAGIC rice population, although several studies have reported on yield and grain quality traits1,3,4,8,9,11,12,13. Many published studies note that most high-yielding varieties have a longer growth duration, which allows longer metabolic activity and grain filling41. In this study, QTLs for GYLD and DTF were co-located with qDF3/OsMADS50, a flowering activator gene, on chromosome 3. This result suggests a pleiotropic interaction between GYLD and DTF, consistent with previous studies31,41. Further, we explored the interactions among yield and quality traits through BN prediction, which revealed that PHT, DTF, GWT, TGN, PTN, Zn and FG were directly associated with GYLD. Consistent with previous reports, we detected negative correlations of GYLD with DTF and with Zn40,41,42. The low recombination rate in bi-parental populations is a limiting factor in breaking negative drag effects among traits21. In contrast, reshuffling of the 16 founder genomes helps break negative drag effects between two genetic loci in the population. For instance, we were able to select pyramided lines that combine high yield with short growth duration, and high yield with high zinc content.
In conclusion, the MAGIC global population provides a valuable genetic resource with multi-trait combinations. The promising lines with multiple traits are well suited to direct use in breeding. With this unique population, the combination of association and pedigree-based analyses was a powerful tool for identifying significant candidate QTLs as well as interactions among traits. In this study, we uncovered candidate QTLs at high mapping resolution, the interval regions of candidate QTLs, and the marker-marker, marker-trait and trait-trait associations of 16 measured traits. Significant markers identified consistently in all analyses can be used directly in marker-aided selection (MAS) to screen breeding lines for desirable traits in crop improvement programs. The validation of novel regions and candidate genes will be a focus of future research.
MAGIC global population
The MAGIC indica and japonica populations were developed at IRRI using eight elite founders from the indica pool and eight elite founders from the japonica pool. These founders possessed good grain quality, high yield potential, and biotic and abiotic stress tolerance. Both MAGIC populations followed the same development scheme29. The MAGIC global population was developed to expand diversity and increase recombination between the eight indica and eight japonica MAGIC pools through additional cycles of intercrossing. The eight-way F1s derived during the development of the MAGIC indica population were crossed to the eight-way F1s derived during the development of the MAGIC japonica population. A total of 150 sixteen-way crosses were advanced through selfing generations (S8) to create the MAGIC global population. The MAGIC global population is therefore representative of the 16 founders of the indica and japonica pools (Fig. 4).
Field trials and trait measurements
The MAGIC global population was grown during 2015DS and 2016DS at IRRI. We followed standard field management practices to raise a good crop43. During the ripening stage (about 30 days after flowering), 9 traits (agronomic and biofortification traits) and 16 traits (agronomic and grain quality traits, including biofortification traits) were measured in 2015DS and 2016DS, respectively. In the 2015DS trial, three uniform plants in the middle of each plot were measured for PHT, PTN and SPAD at maturity, and three panicles harvested from each plot were sampled to measure PNL, GWT and TGN. The inner twelve hills (3 × 4) were harvested for measurement of GYLD, which was adjusted to 14% moisture content. In 2016DS, seven additional traits (DTF, FG, UF, GW, GL, CHALKY and AC) were measured. In 2016DS, about 30–40 hills were harvested for GYLD after removing the last border row. Yield per plot was converted to tons/ha31. Zn and Fe were measured in milled rice in both dry seasons. AC was measured using a Skalar San++ System Segmented Flow Analyser (SFA), which consists of an autosampler and an amylose chemistry unit (manifold, proportioning pump and colorimeter with 620 nm filter). Grain physical appearance (GW, GL and CHALKY) was measured using a SeedCount SC5000 Image Analyzer. For grain Zn and Fe, milled rice samples weighing at least 3 g were subjected to X-ray fluorescence (XRF) analysis using a Bruker S2 Ranger. Measurements were made twice per sample and expressed in parts per million (ppm).
The statistical analyses of all measured traits were performed using PB Tools software (http://bbi.irri.org/) and R/Asreml. For the nine traits common to both dry seasons, adjusted means from the P-rep and AugRCB designs were first weighted by 1/mse, where mse is the error mean square. The weighted means were then used for a combined two-stage analysis in PB Tools based on the mse, standard error and number of replicates. The statistical significance of the seven additional traits measured in 2016DS was analysed using the AugRCB design in R/Asreml. Correlations, boxplots and basic statistical parameters were calculated in R. Skewed phenotypic data were normalized using the rankTransPheno function in the R/FRGEpistasis package. A total of 1027 genotypes common to the two dry seasons, together with the parents, were used for GWAS, IM and BN.
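The idea of weighting season-wise adjusted means by 1/mse can be illustrated with the simplified Python sketch below. It computes only an inverse-variance weighted mean for one genotype; the actual second stage in PB Tools fits a model across all genotypes, so this is an assumption-laden illustration with a hypothetical function name and invented numbers.

```python
def combined_mean(season_means, season_mse):
    """Inverse-variance weighted mean of one genotype's adjusted means.

    Each season's adjusted mean is weighted by 1/mse, so the season with the
    smaller error mean square contributes more to the combined value.
    """
    weights = [1.0 / mse for mse in season_mse]
    weighted_sum = sum(w * y for w, y in zip(weights, season_means))
    return weighted_sum / sum(weights)

# Invented example: yield (tons/ha) of one line in two seasons.
example = combined_mean([6.0, 8.0], [0.5, 1.0])
```

In this example the first season has half the error variance of the second and therefore twice the weight, pulling the combined value towards 6.0.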
Genotyping by sequencing (GBS) and SNP calling
Leaf samples of about 2 milligrams from 1330 genotypes, with replicates, were collected using the PlantTrak Hx sampling method. DNA was extracted using the oKtopure extraction protocol in the Genotyping Service Laboratory at IRRI. The DNA library was sent to Cornell University for SNP multiplex analysis using Illumina's GBS protocol44. The GBS pipeline was run by the Philippine Genome Center of the University of the Philippines using TASSEL software45 (Version 3.0.169). The sequence reads were aligned to the Nipponbare reference genome sequence (MSUv7) to derive the physical positions of markers. Post-processing steps were applied to the genotype data to generate quality SNPs by imposing various criteria31. After post-GBS filtering, different SNP datasets were generated for the different approaches. A set of 22,338 SNP markers was generated for pedigree-based analysis after filtering the parents at a minor allele frequency (MAF) of 1/16 with no missing data, while 66,309 SNP markers were generated for association analysis at a MAF of 0.05 and a call rate of 70%. From the 22,338 SNP markers, 8,110 were extracted for BN analysis based on MAF (0.05), r2 < 0.5 and no heterozygous calls, while 6,170 were binned and extracted for genetic mapping such that no two markers were closer than 0.1 cM (Fig. S7).
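The MAF and call-rate criteria used for the association dataset (MAF of 0.05, call rate of 70%) can be expressed as a per-marker filter. The Python sketch below is a minimal illustration and not the TASSEL pipeline; the genotype coding (0, 1, 2 copies of the alternate allele, `None` for a missing call) and the function names are assumptions.

```python
def call_rate(genotypes):
    """Fraction of samples with a non-missing genotype call."""
    return sum(g is not None for g in genotypes) / len(genotypes)

def minor_allele_frequency(genotypes):
    """MAF from calls coded as 0/1/2 alternate-allele copies (None = missing)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)

def passes_filter(genotypes, min_maf=0.05, min_call_rate=0.70):
    """Keep a marker only if it meets both the call-rate and MAF thresholds."""
    return (
        call_rate(genotypes) >= min_call_rate
        and minor_allele_frequency(genotypes) >= min_maf
    )

# Invented markers across ten samples.
polymorphic = [0, 0, 2, 2, None, 0, 2, 0, 0, 2]       # call rate 0.9, MAF ~0.44
monomorphic = [0] * 10                                 # MAF 0
sparse = [0, 2, None, None, None, None, 0, 2, 0, 2]    # call rate 0.6
```

Only the first marker would be retained under these thresholds; the second fails on MAF and the third on call rate.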
Population structure analysis and linkage disequilibrium
Population structure was analysed with 8,110 SNP markers using a model-based Bayesian clustering method implemented in STRUCTURE software46 (Version 2.3.4). The program was run with the following parameters: the number of groups in the panel (k) varying from 1 to 5; 10 runs for each k value; and, for each run, 10,000 burn-in iterations followed by 10,000 MCMC (Markov Chain Monte Carlo) iterations. The optimal number of clusters was estimated with the ΔK parameter47 in Structure Harvester48. In addition, principal component analysis was performed on 66,309 SNP markers with the R/SNPRelate package, and four PCs were retained. Clustering in the population was interpreted based on the percentage of variation explained by the different PCs. Intra-chromosomal linkage disequilibrium (LD) was calculated as r2 between pairs of markers using 66,309 SNPs in TASSEL v5.2.20. Marker pairs with statistically significant LD (pDiseq < 0.05) were considered in the LD decay analysis. The LD decay rate was measured as the distance at which the average r2 dropped to half of its maximum value12,24.
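The ΔK statistic used to choose the number of clusters can be sketched in Python as below. This follows the commonly used formulation (absolute second-order difference of the mean log likelihood, divided by the standard deviation across replicate runs) and is an illustrative assumption rather than the exact Structure Harvester code; the log-likelihood values are invented.

```python
from statistics import mean, stdev

def evanno_delta_k(lnp_by_k):
    """Compute delta-K for each interior K.

    `lnp_by_k` maps K to the list of ln P(D) values from replicate runs.
    delta-K(K) = |mean L(K+1) - 2 * mean L(K) + mean L(K-1)| / sd(L(K)).
    """
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in lnp_by_k or k + 1 not in lnp_by_k:
            continue  # delta-K is undefined at the smallest and largest K
        second_diff = abs(
            mean(lnp_by_k[k + 1]) - 2 * mean(lnp_by_k[k]) + mean(lnp_by_k[k - 1])
        )
        spread = stdev(lnp_by_k[k])
        delta[k] = second_diff / spread if spread > 0 else float("inf")
    return delta

# Invented ln P(D) values for three replicate runs per K.
example_lnp = {
    1: [-1000, -1002, -998],
    2: [-900, -901, -899],
    3: [-880, -884, -876],
    4: [-870, -875, -865],
}
delta_k = evanno_delta_k(example_lnp)
best_k = max(delta_k, key=delta_k.get)
```

Because ΔK is undefined at the smallest and largest K tested, a run over k = 1 to 5 can only select among k = 2 to 4, which is worth bearing in mind when a peak appears at k = 2 with a low ΔK value.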
Genome-wide association study
A genome-wide association study (GWAS) was performed for 16 traits using 66,309 SNPs and the mean BLUEs of each trait. All statistical analyses were performed using the PBTools and R/Asreml software packages (fitting linear mixed models by residual maximum likelihood, Version 3.0). GWAS was carried out using R/GAPIT (Genome Association and Prediction Integrated Tool)49. The compressed mixed linear model (MLM) method was applied to detect QTLs associated with each trait. This MLM corrects for cryptic relatedness using a kinship matrix and for population stratification and other fixed effects using principal components50. The default criteria implemented in GAPIT were used, with a significance threshold of p < 0.0001.
Multi-parent interval mapping
Multi-parent interval mapping was carried out for 16 traits using 6,170 SNP markers. Founder probabilities of the 16 parents and the percentage of recombination per chromosome were estimated using R/happy Version 2.3. The genetic map of the population was generated from the 6,170 SNP markers, at an average marker density of ~63 Kb, using R/mpMap. Significant QTLs were detected by interval mapping with the functions 'mpprob' and 'mpIM' through R/happy and R/mpMap51. Simple interval mapping (SIM) was carried out using adjusted means as the response. A QTL was considered significant in SIM if it passed a threshold of p < 0.0001. The effects of all QTLs were then estimated simultaneously with the function 'fit', by fitting all detected QTLs in a single (full) model with both fixed and random effects.
Bayesian genomic networking
The averaged Bayesian network in multiple QTLs analysis was conducted by using 8,110 SNP markers for 16 traits following the instructions of Scutari’s group52. The package lme4 was used to adjust for family structure while bnlearn was used to learn the model and perform predictions, and parallel to speed up learning. We encoded short labels to the marker names after preprocessing data file. Moreover, we identified which variables in the data are traits, which are markers, which contain variety IDs and pedigree information. The Bayesian network model was fitted by the ‘fit.the.model()’ function which takes the data and the type I error threshold alpha to use for structure learning as arguments. The type I error alpha was set at 0.01 in this study.
Candidate QTLs/genes analysis
Candidate QTLs/genes were identified using publicly available databases: RAP DB (https://rapdb.dna.affrc.go.jp/), QTARO (http://qtaro.abr.affrc.go.jp/) and GRAMENE (https://archive.gramene.org/qtl/). All candidate QTLs/genes in the significant genomic regions were searched, using the annotated Nipponbare reference genome (MSUv7) through Galaxy/IRRI Bioinformatics (http://galaxy.irri.org/), to provide additional insight into the genetic architecture of grain yield and grain quality traits. Gene association analysis was carried out for GYLD, AC and Zn within a 200 kb window around the peak SNP (100 kb on either side), using MAGMA Version 1.06 to detect significant candidate genes.
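Selecting annotated genes within 100 kb on either side of a peak SNP amounts to an interval-overlap query. The Python sketch below illustrates that step only (it does not reproduce the MAGMA gene association test); the gene identifiers and coordinates are placeholders, and the function name is hypothetical.

```python
def genes_near_peak(genes, chrom, peak_pos, flank_bp=100_000):
    """Return IDs of genes overlapping the window peak_pos +/- flank_bp.

    `genes` is a list of (gene_id, chromosome, start_bp, end_bp) tuples.
    """
    window_start = max(0, peak_pos - flank_bp)
    window_end = peak_pos + flank_bp
    return [
        gene_id
        for gene_id, gene_chrom, start, end in genes
        if gene_chrom == chrom and start <= window_end and end >= window_start
    ]

# Placeholder annotation records (not real MSUv7 loci).
example_genes = [
    ("gene_A", 6, 1_700_000, 1_705_000),
    ("gene_B", 6, 1_900_000, 1_903_000),
    ("gene_C", 7, 1_750_000, 1_752_000),
]
candidates = genes_near_peak(example_genes, chrom=6, peak_pos=1_765_000)
```

Only the gene on the same chromosome that overlaps the 200 kb window is returned; the overlap test also retains genes that straddle a window boundary.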
Bazrkar-Khatibani, L. et al. Genetic Mapping and Validation of Quantitative Trait Loci (QTL) for the Grain Appearance and Quality Traits in Rice (Oryza sativa) by Using Recombinant Inbred Line (RIL) Population. International Journal of Genomics, 1–13 (2019).
Ray, D. K. et al. Recent patterns of crop yield growth and stagnation. Nature Communications 3, 1293 (2012).
Xu, J. L. et al. SS1 (NAL1)- and SS2-mediated genetic networks underlying source-sink and yield traits in rice (Oryza sativa). PLoS ONE 10, e0132060 (2015).
Zhu, M. et al. QTL mapping using an ultra-high-density SNP map reveals a major locus for grain yield in an elite rice restorer R998. Scientific Reports 7, 10914 (2017).
Godfray, H. C. J. & Garnett, T. Food security and sustainable intensification. Phil. Trans. R. Soc. B 369 (2014).
Long-Biao, G. & Guo-You, Y. Use of Major Quantitative Trait Loci to Improve Grain Yield of Rice. Rice science 21, 65–82 (2014).
Marathi, B. et al. QTL analysis of novel genomic regions associated with yield and yield related traits in new plant type based recombinant inbred lines of rice (Oryza sativa). BMC Plant Biology 12, 137 (2012).
Zhou, S. et al. Mapping of QTLs for yield and its components in a rice recombinant inbred line population. Pakistan Journal of Botany 45, 183–189 (2013).
Mahender, A. et al. Rice grain nutritional traits and their enhancement using relevant genes and QTLs through advanced approaches. Springerplus 5, 2086 (2016).
Liu, G. F. et al. Genetic analysis of grain yield conditioned on its component traits in rice (Oryza sativa L.). Australian Journal of Agricultural Research 59, 189 (2008).
Huang, A., Xu, S. & Cai, X. Whole-Genome Quantitative Trait Locus Mapping Reveals Major Role of Epistasis on Yield of Rice. PLoS ONE 9, e87330 (2014).
Huang, X. H. et al. Genome-wide association studies of 14 agronomic traits in rice landraces. Nat Genet 42, 961–967 (2010).
Park, G. H., Kim, J.-H. & Kim, K.-M. QTL Analysis of Yield Components in Rice Using a Cheongcheong/Nagdong Doubled Haploid Genetic Map. American Journal of Plant Sciences 5, 1174–1180 (2014).
Doerge, R. W. Multifactorial Genetics: Mapping and analysis of quantitative trait loci in experimental populations. Nature Reviews Genetics 3, 43–52 (2002).
Lynch, M. & Walsh, B. Genetics and Analysis of Quantitative Traits. Am J Hum Genet 68, 548–549 (2001).
Rakshit, S., Zaide, P. H. & Mishra, S. K. Molecular markers and tagging of genes in crop plants. In Advances in plant physiology Scientific Publications, Jodhpur, India (ed. A. Hemantaranjan). 4, 205–223 (2002).
Collard, B. C. Y. et al. An introduction to markers, quantitative trait loci (QTL) mapping and marker assisted selection for crop improvement: the basic concepts. Euphytica 142, 169–196 (2005).
Loudet, O. et al. Bay-0 x Shahdara recombinant inbred lines population: a powerful tool for the genetic dissection of complex traits in Arabidopsis. Theoretical and Applied Genetics 104(6-7), 1173–1184 (2002).
Churchill, G. A. et al. The collaborative cross, a community resource for the genetic analysis of complex traits. Nat. Genet 36, 1133–1137 (2004).
Yalcin, B., Flint, J. & Mott, R. Using progenitor strain information to identify quantitative trait nucleotides in outbred mice. Genetics 171, 673–681 (2005).
Li, H. et al. Statistical properties of QTL linkage mapping in bi-parental genetic populations. Heredity 105, 257–267 (2010).
Myles, S. et al. Association mapping: critical considerations shift from genotyping to experimental design. The Plant Cell 21, 2194–2202 (2009).
Begum, H. et al. Genome-wide Association Mapping for Yield and Other Agronomic Traits in an Elite Breeding Population of Tropical Rice (Oryza sativa). PLoS One 10, e0119873 (2015).
Zhao, K. et al. Genome-wide association mapping reveals a rich genetic architecture of complex traits in Oryza sativa. Nat. Commun 2, 467 (2011).
Biscarini, F. et al. Genome-wide association study for traits related to plant and grain morphology, and root architecture in temperate rice accessions. PLoS one 11, 1–28 (2016).
Hirschhorn, J. N. & Daly, M. J. Genome-wide association studies for common diseases and complex traits. Nature Reviews Genetics 6, 95–108 (2005).
Keurentjes, J. J. B. et al. A comparison of population type used for QTL mapping in Arabidopsis thaliana. Plant Genet. Res 9, 185–188 (2011).
Huang, B. E. et al. MAGIC populations in crops: current status and future prospects. Theoretical and Applied Genetics 128, 999–1017 (2015).
Bandillo, N. et al. Development of multi-parent advanced generation intercross (MAGIC) populations for gene discovery in rice (Oryza sativa). Philipp. J. Crop Sci 35(1), 96 (2010).
Bandillo, N. et al. Multi-parent advanced generation inter-cross (MAGIC) populations in rice: progress and potential for Genetics research and breeding. Rice 6, 11 (2013).
Raghavan, C. et al. Approaches in characterizing genetic structure and mapping in a rice multiparental population. G3: Genes, Genomes, Genetics 7, 1721–1730 (2017).
Mackay, I. & Powell, W. Methods for linkage disequilibrium mapping in crops. Trends Plant Sci 12, 57–63 (2007).
Falconer, D. S. & Mackay, T. F. C. Introduction to Quantitative Genetics. 4th ed. Harlow, UK, Longman Group, 464 (1996).
Darvasi, A. & Soller, M. Advanced intercross lines, an experimental population for fine genetic mapping. Genetics 141, 1199–1207 (1995).
Keurentjes, J. J. B. et al. A comparison of population type used for QTL mapping in Arabidopsis thaliana. Plant Genet. Res 9, 185–188 (2011).
Mott, R. et al. A method for fine mapping quantitative trait loci in outbred animal stocks. PNAS 97, 12649–12654 (2000).
Meng, L. et al. Characterization of Three Rice Multi-parent Advanced Generation Intercross (MAGIC) Populations for Quantitative Trait Loci Identification. The Plant Genome 9 (2016).
Flint-Garcia, S. A., Thornsberry, J. M. & Buckler, E. S. Structure of linkage disequilibrium in plants. Annual Review of Plant Biology 54, 357–374 (2003).
Ogawa, D. et al. Haplotype-based allele mining in the Japan-MAGIC rice population. Scientific Reports 8, 4379 (2018).
Descalsota, G. I. L. et al. Genome-Wide Association Mapping in a Rice MAGIC Plus Population Detects QTLs and Genes Useful for Biofortification. Frontiers in Plant Science 9, 1347 (2018).
Li, F. et al. Genetic Basis Underlying Correlations Among Growth Duration and Yield Traits Revealed by GWAS in Rice (Oryza sativa L.). Frontiers in Plant Science 9, 650 (2018).
Swamy, B. P. M. et al. Identification of genomic regions associated with agronomic and biofortification traits in DH populations of rice. PLoS One 13, e0201756 (2018).
Elshire, R. J. et al. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS ONE 6, e19379 (2011).
Glaubitz, J. C. et al. A high capacity Genotyping by Sequencing analysis pipeline. PLoS One 9, e90346 (2014).
Pritchard, J. et al. Association mapping in structured populations. Am. J. Hum. Genet 67, 170–181 (2000).
Evanno, G., Regnaut, S. & Goudet, J. Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Mol Ecol 14, 2611–2620 (2005).
Earl, D. A. & vonHoldt, B. M. Structure Harvester: a website and program for visualizing STRUCTURE output and implementing the Evanno method. Cons. Genet Res 4, 359–361 (2012).
Lipka, A. E. et al. From association to prediction: Statistical methods for the dissection and selection of complex traits in plants. Curr. Opin. Plant Biol 24, 110–118 (2015).
Zhang, Z. et al. Mixed linear model approach adapted for genome-wide association studies. Nat. Genet 42, 355–360 (2010).
Huang, E. & George, A. R/mpMap: a computational platform for the genetic analysis of multi-parent recombinant inbred lines. Bioinformatics 27, 727–729 (2011).
Scutari, M. et al. Multiple Quantitative Trait Analysis Using Bayesian Networks. Genetics 198, 129–137 (2014).
The first author is supported by a Lee Foundation Scholarship. We acknowledge support from the Genotyping Services Laboratory for DNA extraction, the Genomic Diversity Facility, Biotechnology Resource Centre, Cornell University for genotyping-by-sequencing services, and the Philippine Genome Center, University of the Philippines, Philippines for running the GBS pipeline. We also acknowledge the support provided by the Biometrics and Bioinformatics teams and the Grain Quality and Nutrition Centre at IRRI.
The authors declare no competing interests.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Zaw, H., Raghavan, C., Pocsedio, A. et al. Exploring genetic architecture of grain yield and quality traits in a 16-way indica by japonica rice MAGIC global population. Sci Rep 9, 19605 (2019). https://doi.org/10.1038/s41598-019-55357-7