Genetic dissection of seedling vigour in a diverse panel from the 3,000 Rice (Oryza sativa L.) Genome Project

Seedling vigour (SV) is important for direct seeding rice (Oryza sativa L.), especially in a paddy-direct seeding system, but the genetic mechanisms behind the related traits remain largely unknown. Here, we used 744 germplasms, having at least two subsets, for the detection of quantitative trait loci (QTLs) affecting the SV-related traits tiller number, plant height, and aboveground dry weight at three sampling stages, 27, 34, and 41 d after sowing. A joint map based on GAPIT and mrMLM produced a satisfying balance between type I and II errors. In total, 42 QTL regions, containing 18 (42.9%) previously reported overlapping QTL regions and 24 new ones, responsible for SV were detected throughout the genome. Four QTL regions, qSV1a, qSV3e, qSV4c, and qSV7c, were delimited and harboured quantitative trait nucleotides that are responsible for SV-related traits. Favourable haplotype mining for the candidate genes within these four regions, as well as the early SV gene OsGA20ox1, was performed, and the favourable haplotypes were presented with donors from the 3,000 Rice Genome Project. This work provides new information and materials for the future molecular breeding of direct seeding rice, especially in paddy-direct seeding cultivation systems.

randomly selected from the sequenced accessions of the 3 K Rice Genome Project 25 . The field work was conducted at the experimental station in Shenzhen (22.6°N, 114.4°E), Guangdong Province during the late season of 2017. Approximately 120 germinated seeds for each accession were directly sown with 20 cm between rows, 15 cm between individuals and two seeds per hill in the paddy plot. At the one-leaf stage, 60 similar seedlings for each accession, with a single seedling in each hill, were maintained. The extra seedlings were removed. The tests were arranged in randomised plots with three replications. Field management was carried out according to the local practice. Specifically, ~113 kg ha −1 of carbamide was applied at the 2.5-leaf stage to supply nitrogen. Butachlor was applied at ~1.8 kg ha −1 for weed control during paddy preparation before seeding, and it was applied again after the 3-leaf stage.
The first sampling (stage A) was carried out at 27 DAS, and the second (stage B) and third (stage C) samplings were carried out at 34 and 41 DAS, respectively. From each accession, 10 uniform individuals, except for those on the boarders, were evaluated for SV traits. The SV traits, tiller number (TN), plant height (PH), and aboveground dry weight (DW), were measured for each individual.
Genotyping by sequencing and SNP extraction. The 744 accession panel was re-sequenced, with an average depth of more than 10× 25 . The cleaned reads were then mapped to the reference genome of 'Nipponbare' (IRGSP1.0), and ~14 M of high-quality SNPs were identified 25 . From these SNPs, a set of 2.9 M SNPs related to potential protein-coding areas was carefully selected. To build a SNP set for association studies, a subset of 27,921 SNPs was selected from the 2.9 M SNPs by choosing one SNP per 100 counts, as described in our previous GWAS mapping work 23 . QTL analysis, comparative mapping and haplotype analysis. Basic statistical analyses for SV traits were conducted with SAS 26 . For the graphing and plotting, both Excel and R scripts were adopted. The basic scenario of a compressed mixed linear model 27 implemented in GAPIT 28 was adopted for the association analysis between QTL-flanking markers and SV traits for the 744 sequenced accessions. The parameters for GAPIT were set in accordance with our previous report 23 . A relatively stringent threshold was adopted to identify significant correlations between the SNPs and SV traits with a −LOG 10 (P) value of 6.0. To minimise to the possibility of type II errors in QTL detection 29 , a relatively loose threshold of 3.0 was adopted for the QTL regions having supporting evidence from different traits or previous reports. The allelic effects were estimated by setting the Major.allele. zero = TRUE in GAPIT to identify the donors of favourable alleles and their effects on SV traits. We used the mrMLM package 24 to confirm and complement the mapping results from GAPIT.
Comparative mapping was carried out against a reference sequence map, the GRAMENE annotation sequence map 30 , to compare the QTLs detected in this study with previously reported QTLs or genes known to be associated with SV-related traits in rice.
Favourable haplotypes for candidate genes were investigated jointly with the aid of Perl and R scripts as described in our previous reports 22,23 , with minor modifications. The procedure included the following six major steps: (1) Determine the sub-regions defined by quantitative trait nucleotides (QTNs) with supporting evidence from both GAPIT and mrMLM; (2) Fill in the sub-regions with more SNPs from the original 2.9 M sets 23 , and then carry out GWASs using mrMLM within these sub-regions; (3) Screen the key QTNs with evidences from different SV-related traits; (4) Search for candidate genes harbouring these key QTNs for SV traits; (5) Carry out haplotype analyses for these candidate genes for SV-related traits to confirm the GWAS mapping results; and 6) Determine the top donors carrying favourable haplotypes. www.nature.com/scientificreports www.nature.com/scientificreports/

Results
Performances of SV-related traits in the sequenced accessions. As shown in Fig. 1a  progressive sampling stage. One unique trait was DW, for which the distribution range increased significantly from sampling stages A to C (Fig. 1c). However, the distribution range of TN and PH almost remained almost unchanged from sampling stages A to C. The pattern change for PH based on the sampling stages was between those of the previous two traits.
For each trait, all of the values from the different sampling stages were highly significantly correlated except for the correlations between PH and TN, and all of the correlations were in positive direction (Fig. 2). Especially, DW and TN were highly correlated having correlation coefficients greater than 0.60 at the first two sampling stages (27 and 34 DAS), but only ranged by 0.42-0.55 at the third sampling stage (41 DAS). However, the correlations between DW and PH (ranged by 0.35-0.59) were not as significant as the others.
Using the 27 K SNPs, sample clustering and a PCA were also performed. The PCA results for the 744 panel are shown in Fig. 3a, and the kinship between the 744 accessions is presented in Fig. 3b. Based on the PCA and the kinship analysis, the 744 accessions were divided into at least two large groups. Within each group, there were at least one or two subgroups. This was consistent with the two sub-species groupings of the rice accessions.
Linkage disequilibrium (LD) decay values were calculated throughout the genome, and the average values are shown in Supplementary Fig. 1. On average, LD blocks in this rice population of 744 germplasms extend up to ~50 kb.
Of the 31 SV-related QTL regions responsible for DW, 18, 12, and 8, as assessed by GAPIT, and 10, 8, and 7, as assessed by mrMLM, were detected at 27, 34, and 41 DAS, respectively. Only qSV2c was active throughout all three sampling stages according to GAPIT and it was confirmed by mrMLM at 27 and 34 DAS. The averaged −LOG 10 (P) values for SV-related QTL regions affecting DW at 27, 34, and 41 DAS, were 4.2, 4.0, and 3.9, respectively, with ranges of 3.0-6.7, 3.0-6.0, and 3.0-6.0, respectively, GAPIT, and 4.1, 3.6, and 3.9, respectively, with ranges of 3.0-6.9, 3.0-5.2, and 3.0-5.5, respectively, according to mrMLM. www.nature.com/scientificreports www.nature.com/scientificreports/ Approximately 11.5% of the SV-related QTL regions were responsible for TN throughout all three sampling stages (27,34, and 41 DAS). When we analysed PH with a higher heritability, the percentage of stable QTL regions reached 33.3%; however, the DW value dramatically decreased to 3.3% Table 2, for the 28 SV-related QTL regions responsible for TN, GAPIT detected 26 (15,11, and 15 for the three sampling stages, respectively), while mrMLM detected 14 (10, 7, and 12 for the three sampling stages, respectively). In a comparison between the two methods, 12 QTL regions overlapped (9, 7, and 11 for the three sampling stages, respectively). For the 33 QTL regions responsible for PH, GAPIT detected all of them, with 20, 26, and 21 for the three sampling stages, respectively, while mrMLM detected 27, with 12, 18, and 11 for the three sampling stages, respectively. The overlapping QTL regions between the two methods at the three sampling stages were 12, 18, and 11, respectively. For the 31 SV-related QTL regions responsible for DW, GAPIT detected 30, with 18, 12, and 8 for 27, 34, and 41 DAS, respectively, while mrMLM detected 23, with 10, 8, and 7 for the three sampling stages, respectively. The overlapping QTL regions between the two methods were 10, 8, and 6 at the three sampling stages, respectively.

Mapping results from GAPIT and mrMLM. As shown in
As shown in Supplementary Figs 2-4, Manhattan plots with similar patterns were obtained for both GAPIT and mrMLM. However, using the threshold settings, relatively fewer QTL regions with clearer backgrounds were obtained by mrMLM.
Haplotype analysis of candidate genes for QTL regions. Based on the above association mapping results, four QTL regions, qSV1a, qSV3e, qSV4c, and qSV7c, were selected for fine-mapping using more SNPs. The region harbouring qSV3e was split into two sub-regions, which contained candidate Os03g0799700, with multi-evidenced QTNs, and Os03g0856700, with a single SNP but large −LOG 10 (P) value of 13.0 based on the   Table). Similarly, the other three QTL regions were delimited and harboured multi-evidenced QTNs after a sub-region analysis. Candidate genes containing significant SNPs or exactly located SNPs having phenotypic haplotype effects, as well as their donors, were analysed and are shown in Table 3 and Figs 4-8. Os01g0166800, OS03g0799700 and Os03g0856700 (OsGA20ox1), Os04g0683600, and Os07g0600400 were determined to be candidate genes for qSV1a, qSV3e, qSV4c, and qSV7c, respectively, owing to significant differences in related traits among their different haplotypes. As shown in Fig. 4, haplotype 2 of Os01g0166800 appeared to be elite, affecting both TN and DW without affecting PH, while haplotype 5 affected all three traits. Haplotypes 1 and 6 of OS03g0799700 were recommended (Fig. 5), affecting TN and DW without significantly increasing PH, while haplotype 3 influenced all three traits. Two haplotypes of Os03g0856700 (OsGA20ox1) appeared in the 744 germplasms (Fig. 6). Phenotypic differences among the three trait values (TN, PH, and DW) between the two haplotypes were highly significant at all three sampling stages, except for marginal differences in PH at 34 and 41 DAS (Fig. 6b). Haplotype 2 behaved better than haplotype 1 for Os03g0856700. For qSV4c, haplotype 1 was better than haplotype 2, while for qSV7c, haplotypes 1 and 3 were better than haplotype 2. The favourable haplotypes of qSV4c and qSV7c affected all the three SV-related traits (Figs 6 and 7).
Among these QTLs, qSV2c was the most supported, and it was mapped with four reported loci, qGR2 for germination rate 31 , qSEV-2-2 for PH and plant dry weight 16 , qCSH2 for seedling height, and qSDW2 for SV-related traits under cold stress 32 . Another two QTL regions (qSV1d and qSV3e) were mapped with three reported loci. qSV1d was mapped with qSSL1b for seedling shoot length 33 , qRL-1 for root length 34 , and qLA-1 for leaf area 35 , while qSV3e was close to qPHS3-2/OsGA20ox1 for seedling height 36 , qSEV-3-4 for both PH and plant dry weight 16   www.nature.com/scientificreports www.nature.com/scientificreports/ and qSV-3-2 for shoot length and germination rate 13 . Seven QTL regions (qSV2a, qSV4b, qSV5c, qSV6b, qSV7c,  qSV8d, and qSV11a) were each mapped with two reported loci. The qSV2a was mapped with qFML2-1 for mesocotyl length 20 and qRL-2 for root length 35 ; qSV4b was mapped with qSDW4.2 for shoot dry weight 37   www.nature.com/scientificreports www.nature.com/scientificreports/ germination percentage 38 ; qSV5c was mapped with qFV-5-2 for seedling height 15 and qSEV-5-2 for both PH and plant dry weight 16 , although these two were detected in different tests of the same population. qSV6b was close to qDW-6 for seedling dry weight 34 and qLDW-6 for leaf biomass 35 ; qSV7c was mapped with qSDW5-1 for shoot dry weight 11 and qSL-7 for shoot length 35 ; qSV8d was mapped with qSV-8-2 for germination rate, shoot length, and root length 13 and with qPR-8 for biomass portioning to roots 35 , while qSV11a was mapped with qFML11-1 for mesocotyl length 20 and with qLN-11 for leaf number 35 . QTL regions for the SV-related traits that were identified in different mapping populations and diverse environments could be beneficial for the marker-assisted selection-based development of varieties with high levels of SV.

Dynamic expression of SV-related QTL regions and their pleiotropy for SV-related traits.
Improving SV has long been a focus of rice breeders. In particular, ESV (less than 28 DAS) is important for DSR under aerobic conditions in which they compete with weeds. In LSV, the tillering ability and biomass are the bases for the final yield components.
However, LSV is also based on the ESV. Thus, these two types of SV are highly associated. More than one third (42.9%) of the SV-related QTL regions detected in this work were mapped with reported ESV loci. However, four QTLs (qSV1d, qSV2c, qSV3e, and qSV4b) were not only mapped with more than one reported loci, but were also detected by both GAPIT and mrMLM, and all of them were responsible for PH. PH is an agronomic trait with a relatively high level of heritability. Thus, more QTL regions were responsible for PH (33 and 27 by GAPIT and mrMLM, respectively) compared with TN (26 and 23, respectively) and DW (30 and 23, respectively). In total, 11 www.nature.com/scientificreports www.nature.com/scientificreports/ of the loci at the QTL regions stably detected at all three sampling stages were responsible for PH. However, the number of stably detected QTL regions decreased to three and one for TN and DW traits, respectively.
One locus on chromosome 2 (qSV2c) affected all the three traits (TN, PH, and DW) throughout all three sampling stages (27,34,and 41 DAS). This locus is also consistent with the three previously reported loci (qGR2 31 , qCSH2, and qSDW2) 32 . This stably expressed locus could be very useful and will be analysed further. However, the −LOG 10 (P) values for qSV2c were not very high. They ranged from 3.0 to 3.4, with a mean value of 3.2, for TN, from 3.8 to 4.7, with a mean value of 4.4, for PH, and from 3.0 to 3.3, with a mean value of 3.1, for DW.
The number of SV-related QTL regions expressed under different sampling stages varied according to the studied traits. For example, there were totally 28 loci responsible for TN, 33 loci for PH, and 31 for DW. The numbers of TN-related QTL regions for the three different sampling stages varied by 16, 11, and 16, respectively, while the numbers of PH-related QTL regions varied by 20, 26, and 21, respectively for the three stages. However, the numbers of DW-related QTL regions decreased by 18, 12, and 9, respectively. A dynamic pattern was determined for the SV-related QTL regions. In total, 33.3% of PH-related QTL regions were detected in all three sampling stages, and 36.4% were responsible for PH at two stages. Only 10.7% of TN-related and 3.2% of DW-related QTL regions were detected in all three stages, and 32.1% of the former and 19.4% of the latter were detected at two stages.
Joint mapping of QTLs using GAPIT and mrMLM. GAPIT is based on a single dimension of genome scanning. We used the built-in maximum-likelihood method for the GWAS mapping of SV-related traits, while mrMLM was used because of its multi-locus nature. Thus, for trait variations controlled by multiple loci, the estimations of the loci effects by mrMLM should have greater confidence levels 24 . www.nature.com/scientificreports www.nature.com/scientificreports/ Using a common setting, in comparison with GAPIT, mrMLM detected relatively fewer numbers of loci with a similar statistical power, indicated by −LOG 10 (P) values, in most cases. Additionally, as shown in Supplementary  Figs 2-4, the threshold setting influenced QTL mapping. mrMLM focused on loci with high confidence levels and produced results with clearer backgrounds than GAPIT. This would be adequate for most molecular breeding or gene cloning applications. However, to determine the whole picture of a trait, a traditional package, such as GAPIT, should also be used. Conversely, as shown in this report, QTL regions can be located with GAPIT and then, supporting evidence obtained from mrMLM.
For the estimated allelic effects, of the 84 cases detected both by GAPIT and mrMLM, only 25 (29.8%) cases showed consistent effect directions. According to previous reports, with simulated data, a multi-loci method (mrMLM) is more powerful and more accurate in QTN effect estimation than single-locus methods 24 . All of these QTL regions were responsible for TN and PH rather than DW. Further experiments are required to clarify this inconsistency.
Additionally, as an example, we adopted SV mapping results from GAPIT with a suitable threshold setting to minimise possible type II errors. With the aid of mrMLM, the SV mapping results should have greater confidence levels. Thus, combining GAPIT and mrMLM is an option for our future GWAS mapping.
Candidate gene identification in important QTL regions. In our 744 accessions, the average LD decayed significantly to 0.3 at a physical distance of ~50 kb (Supplementary Fig. 1). This is much less than the previously reported LD distance of 75 kb for Xian/indica 39 , not to mention the 150-kb LD distance for tropical Implications in the development of DSR cultivars. Both ESV and LSV are important for cultivars used in DSR cultivating systems. The ESV is highly correlated to the ability of rice seedlings to compete with weeds, especially under the aerobic condition (as in UDS) 45 . In PDS, weed control is much easier than in UDS. The plant type, which develops during the period of LSV, including the TN, PH, and DW contribute to the final grain yield. Most alleles at important stably detected QTL regions and/or in regions with multi-evidenced QTNs, such as www.nature.com/scientificreports www.nature.com/scientificreports/ qSV1a, qSV3e, qSV4c, and qSV7c, increase the TN, PH, and DW (Table 1). Further validation of favourable alleles at the above four stable QTL regions detected throughout the three sampling stages will be performed.
A rice cultivar with strong SV-related traits is desirable for direct seeding. However, SV-related traits have not been selected for in crop improvement programs focused on conventional breeding owing to their complex nature and quantitative inheritances. Molecular markers are effective in increasing selection efficiency, particularly for quantitative traits that are simply inherited. In this study, except for qSV2c's roles in all three SV-related traits, there were 2 QTL regions (qSV5a and qSV9b) responsible for TN only, and 10 QTL regions (qSV1c, qSV1d, qSV4a, qSV5c, qSV6a, qSV6b, qSV7a, qSV7c, qSV8c, and qSV10b) responsible for PH only. A simple increase in PH would result in problems in the final maturation stage and may cause lodging, which is another key trait for DSR. A balance between TN and PH is favourable for increased yields. Thus, an appropriate combination of TN and PN for both a high SV at the seedling stage and a high yield potential at harvest could be achieved by pyramiding the QTLs underlying the two traits, such as qSV5a and qSV9b for TN and qSV1c and qSV4afor PH, after QTL validation using a marker-assisted selection approach.
With a high density of SNP markers, we identified 11 favourable haplotypes for four loci. These haplotypes, defined by key SNPs, can be transformed into breeder-friendly markers in the future. In particular, haplotype 2 of qSV1a, and haplotype 1, 4, and 6 of qSV3e offer more opportunities for the development of cultivars with greater biomasses, resulting from more tillers rather than taller plants. PDR cultivars with this kind of SV would not affect lodging resistance and would have relatively wider adaptations. In addition, elite donors for these favourable haplotypes were also identified ( Table 3). For example, IRIS_313-12135 can be used as the donor for haplotype 2 of qSV1a, and IRIS_313-12033, IRIS_313-11863, and IRIS_313-11807 can be used for haplotype 6 of qSV3e. IRIS_313-11453 can be used as a donor for haplotype 1 of qSV3e, while CX92 can be the donor for haplotype 4 of qSV3e. For OsGA20ox1, IRIS_313-10973 could be a good donor for the elite haplotype 2. www.nature.com/scientificreports www.nature.com/scientificreports/

Conclusions
Tremendous variations for three SV-related traits, TN, PH, and DW, existed in the studied rice germplasms. Using a GWAS of GAPIT and mrMLM, 42 QTL regions, including 18 overlapping previously reported QTL regions and 24 new ones, for the SV-related traits were identified. Five candidate genes were inferred by fine-mapping using more SNPs and a haplotype analysis, including the known gene OsGA20ox1 for qSV3e, which controls ESV and spikelet number per panicle at the maturing stage. Our results indicated that combining GAPIT and mrMLM is an option for GWAS mapping. Favourable alleles at stably expressed QTLs, such as qSV1a, qSV3e, qSV4c, and qSV7c, were mined, and their corresponding accessions were also identified. The results provide useful germplasms and genetic information for the future improvement of SV in rice.