Breeding for higher yield, early maturity, wider adaptability and waterlogging tolerance in soybean (Glycine max L.): A case study

Breeding for higher yield and wider adaptability are major objectives of soybean crop improvement. In the present study, 68 advanced breeding lines along with seven best checks were evaluated for yield and attributing traits in a group balanced block design. Three blocks were constituted based on the maturity duration of the breeding lines. High genetic variability for the twelve quantitative traits was found within and across the three blocks. Several genotypes outperformed the check varieties for yield and attributing traits. During the same crop season, one of the promising entries, NRC 128, was evaluated across seven locations for wider adaptability. It showed stable performance in the Northern Plain Zone, with > 20% yield superiority over the best check, PS 1347, and a 9.8% yield superiority over the best check in the Eastern Zone. Screening for waterlogging tolerance under artificial conditions revealed that NRC 128 was on par with the tolerant variety JS 97–52. Based on its yield superiority, wider adaptability and waterlogging tolerance, NRC 128 was released and notified by the Central Varietal Release Committee (CVRC) of India for cultivation across the Eastern and Northern Plain Zones of India.

When dealing with quantitative and complex traits such as grain yield, the effects of G × E interactions need to be considered for genotypic evaluation and varietal selection 6,7. Various stability models have been developed to understand G × E interaction patterns 8,9. Interactions become more complex as the number of environments and genotypes increases, and detailed analysis and understanding are not possible without a graphic approach. GGE (genotype main effect (G) plus genotype by environment interaction (GE)) is a multivariate, graphic-based stability model that has been extensively employed in stability analysis and in understanding genotype × environment interactions, most commonly for grain yield 10.
Waterlogging is a major abiotic stress that significantly affects world soybean production, causing 16% yield loss globally 11,12 and 18% yield loss in India 13. Further, weather simulation models based on global climate change project an increase in crop production losses due to flooding in the near future 14. During the last five cropping seasons, the major soybean growing regions of central India received more than 70% of their rainfall during August-September, when the crop is at the late vegetative or early reproductive stage in farmers' fields 15, indicating the potential threat of waterlogging stress to soybean production. The importance of breeding for waterlogging tolerance has been reported in India 15. JS 20-38, an advanced breeding line, has been identified as a potential donor for waterlogging tolerance 16. Genome-wide association mapping of waterlogging tolerance has identified a large number of favorable flood-tolerant alleles and new genetic sources for use in soybean breeding for waterlogging tolerance 17. To date, only one waterlogging tolerant variety, JS 97-52, has been notified for cultivation in the Central Zone and North-Eastern Zone of India. Using this variety in the breeding program, the varieties JS 20-29, JS 20-69 and JS 20-98 were developed with yield traits as the objective and were released for cultivation in the Central Zone of India. There is therefore a need to develop a variety that combines wider adaptability with waterlogging tolerance for other zones. In India, the soybean crop, especially in the Eastern Zone (Bihar, Chhattisgarh and Ranchi), is affected by waterlogging due to prolonged monsoon rains. In our earlier studies conducted at ICAR-IISR, cultivar JS 97-52 was reported as a waterlogging tolerant genotype, and it is being used as the tolerant check in evaluation studies conducted in India 18,19.
Very few researchers have evaluated Indian soybean genotypes for waterlogging tolerance, and those who did screened at either the vegetative or the reproductive stage 20 but not at both. In view of soybean improvement under a changing climate, the present study was undertaken to develop and evaluate several diverse breeding materials in order to identify a near-ideotype with higher yield potential, wider adaptability and waterlogging tolerance (Table 1).

Results
Genetic variability of quantitative traits. Significant genotypic differences (p < 0.05) were observed for the traits under study within the individual blocks, viz., the early, medium and late maturity blocks, except for days to maturity in the early block (Tables S1-S12). Pairwise comparison of the genotypes within the three groups was carried out using the LSD test (p < 0.05). In the early maturity block (block 1), with respect to grain yield per plant, entry G21 (8-101-3) (93.9 g) was significantly superior to both check varieties, JS 20-34 and JS 95-60, whereas G20 (6A-34-11) yielded 2470 g/plot, on par with the check variety JS 20-34 (2344 g/plot). In block 2, entry G42 (6A-18-3-1) yielded 2230 g/plot, which was significantly higher than the three checks JS 93-05, NRC 86 and JS 20-29. Similarly, in block 3, G54 (NRC 128) yielded 3833 g/plot, which was significantly higher than the rest of the tested entries in the block. However, for yield per plant it was on par with two tested entries, G57 and G62 (Table 2).
As observed from the violin plots (Fig. 1) and PCA (Fig. 2), inflorescence length was highest overall in the medium maturing group, followed by the early and late maturing groups. Number of nodes per plant, number of branches per plant, number of pods per plant and biomass were highest in the late maturing group, followed by the medium and early maturing groups. Traits such as 100 seed weight, harvest index, grain yield per plant and grain yield per plot were highest in the early maturing group, followed by the late and medium maturing groups. Variability among the other traits across the blocks is shown in Table S2. Days to flowering ranged from 27 to 48 days. IC 15089, an indigenous germplasm accession, flowered in only 27 days, followed by entry 13-2, derived from NRC 86 × MACS 330 (30 days). It matured earlier (89 days) than the other entries, whereas JS 97-52 took 48 and 100 days to flower and mature, respectively.
100 seed weight was highest (18.72 g) in G38 (6A-33-1-2), followed by G32 (8-94-3), and several entries exceeded the checks for this trait. Inflorescence length was highest (5.25 cm) in G2 (6A-47-1), followed by G22 (6A-47-4) (4.76 cm). Plant height also showed a wide range (39.2-102.8 cm); the highest plant height (102.8 cm) was recorded in genotype G26 (13-100), followed by G14 (100.8 cm) and JS 97-52 (86.27 cm). Biomass per plant ranged from 66.00 to 211.67 g and was highest in the line G54 (NRC 128) (Tables S13 and S14). Narrow differences between PCV and GCV indicated a lesser influence of the environment for all twelve traits.
Correlation analysis. Correlation of yield with other traits in the early maturing breeding lines revealed that grain yield per plant had a significant positive correlation with 100 seed weight (0.68***), biomass (0.90***), pods per plant (0.65***), branches per plant (0.66***) and days to flowering (0.41*). In the medium maturing lines, yield per plant was significantly associated with harvest index (0.72***), 100 seed weight (0.74***) and biomass (0.83), but showed a non-significant negative association with days to flowering (-0.22).
Cluster analysis. In the present study, days to flowering (52.25%) exhibited the greatest variation and contribution to diversity among genotypes, followed by days to maturity (10.38%) and plot yield (9.23%). Yield per plant (1.15%), harvest index (0.22%) and branches per plant (1.73%) contributed comparatively less to the total diversity (Table 3). The seventy-five genotypes, including seven checks, were grouped into five clusters based on D² values using Tocher's method. The distribution of genotypes into the various clusters is depicted in Fig. S1 and Table 4. Of the five clusters, cluster I was the largest, comprising 57 genotypes, followed by cluster II with 14 genotypes. Clusters III and V were represented by one genotype each, and cluster IV by two genotypes. The average intra- and inter-cluster D² values can be computed from the cluster diagram, in which the statistical distances among the 75 genotypes are shown (Table S15). Intra-cluster D² values ranged from zero to 6.79, with maximum distance in cluster I (8.04), followed by cluster IV (6.17). Among the inter-cluster D² values of the five clusters, the highest divergence was between clusters II and V (19.76), while the lowest was between clusters III and V (10.17). The cluster means for each of the 12 characters (Table S16) indicated that the mean for days to flowering was highest in cluster III (48.33) and lowest in cluster II (31.67), and a similar trend was noticed for days to maturity in the respective clusters. 100-seed weight was highest in cluster II (14.70 g) and lowest in cluster III (12.37 g). Cluster V recorded the highest plot grain yield (3833.33 g) and cluster IV the lowest (1064.33 g). Cluster III was characterized by the longest inflorescence (4.29 cm), while the shortest was recorded in cluster IV (1.49 cm). The number of pods per plant was highest in cluster V (70.0) and lowest in cluster II (29.77). Cluster V had desirable means for several characters and also ranked well with respect to contribution to the genetic diversity.
In the mean vs stability analysis, RSC 11-07 was the highest yielding entry (1755.50 kg/ha), followed by NRC 128, which produced 1720 kg/ha of grain yield, 9.76% higher than the best check JS 97-52 (1552 kg/ha) (Fig. 5a). The 100 seed weight of NRC 128 (13.2 g) was also higher than that of all the check entries (Table 8). On the other hand, RSC 11-07 was found to be the near-ideal genotype, followed by NRC 136, AMS 2014-1 and NRC 128, when mean performance and stability were considered simultaneously (Fig. 5b).
Similarly, in the Northern Plain Zone (NPZ), NRC 128 (L3) was evaluated along with five other promising entries, viz., PS 1613 (L1), PS 1611 (L2), PS 1347 (L4), Pusa 97-12 (L5) and SL 958 (L6). It ranked first in terms of mean performance and stability, yielding 2242 kg/ha, which was 20.6% higher than the best check PS 1347 (1782 kg/ha) (Fig. 6a). Further, NRC 128 ranked first with respect to the ideal genotype (Fig. 6b). The mean multi-location data for grain yield (kg/ha) for the NPZ and the Eastern Zone (EZ) are presented in Tables S17 and S18, respectively. The pooled ANOVA for genotypes evaluated across the two agro-climatic zones is presented in Table S19. The phenotype of NRC 128 is depicted in Fig. 7.

Selection of genotypes based on MGIDI index and genetic gain.
Offseason seed multiplication. To meet the seed requirement of farmers, 477 kg (4.77 quintal) of NRC 128 nucleus seed was sown at Belagavi, Karnataka, during February-May 2021. The variety was sown on an area of 7.2 hectares and produced 7020 kg (70.20 quintal) of seed, a productivity of 975 kg per hectare.
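The arithmetic behind these figures can be checked with a short sketch. The three input values are taken from the text above; the seeding rate and the multiplication ratio are derived here for illustration only and are not reported in the study.

```python
# Figures reported in the text for the off-season multiplication of NRC 128.
seed_sown_kg = 477.0        # nucleus seed sown
area_ha = 7.2               # area sown
seed_harvested_kg = 7020.0  # seed produced

# Productivity as reported in the text (kg of seed per hectare).
productivity_kg_per_ha = seed_harvested_kg / area_ha

# Derived quantities (not stated in the source; shown for illustration).
seed_rate_kg_per_ha = seed_sown_kg / area_ha
multiplication_ratio = seed_harvested_kg / seed_sown_kg

print(f"Productivity: {productivity_kg_per_ha:.0f} kg/ha")
print(f"Seeding rate: {seed_rate_kg_per_ha:.2f} kg/ha")
print(f"Multiplication ratio: {multiplication_ratio:.1f}x")
```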

Discussion
Given the narrow genetic base of soybean, the use of diverse parents and the development of large F2 populations play a vital role in the development of high yielding varieties. Employing exotic germplasm accessions in the hybridization program therefore helps to broaden the genetic base. Yield potential is built up by the progressive assembly of productivity genes, as opposed to genes for quality or for resistance to biotic and abiotic stresses 21. In India, mega soybean varieties such as Gaurav (JS 72-44), JS 335, JS 93-05 and JS 95-60 were bred through conventional breeding and contributed to enhanced soybean production 21. The annual genetic gain in seed yield of Indian soybean varieties from 1969 to 1993 was about 22 kg/ha 22. A similar gain trend was also seen in other reports 23.
To increase yield per se in soybean, conventional breeding must be reoriented towards the use of judiciously chosen parents and pre-bred diverse material in the crosses, sizeable F2 populations, three-way crosses, multiparent crosses and combination breeding 21. The present experiment was carried out in a group balanced block design (GBBD), which is more efficient than a regular randomized block design (RBD). The GBBD reduces experimental error by forming blocks based on the maturity of the genotypes, so that treatments are compared with a higher degree of precision 24. The present study evaluated early, medium and late maturing soybean advanced breeding lines for yield and attributing traits. The advanced breeding materials were derived from diverse crosses, including pre-breeding material derived from wild type G. soja and multiparent crosses. Two germplasm accessions, EC 572109 and IC 15089 (triple mutant for e1, e3, e3), carrying early maturity alleles were also evaluated for yield and associated traits, and both matured in 89-90 days 25. One of the advanced breeding lines, 13-2, derived [...] 26. The early maturing breeding lines evaluated in block 1, particularly those derived from crosses involving EC 538828 as one of the parents, were found to have higher 100 seed weight than the other tested entries in block 2 and block 3. EC 533828 is a bold seeded genotype with tolerance to drought and terminal heat stress 27. Breeding for early maturity is much needed to meet the demand of farmers in Central India. In this region, early maturing soybean is a primary requirement for the soybean-potato-wheat and soybean-wheat cropping systems. In the present study, a few early maturing breeding lines, 6A-34-11 (NRC 146) (2470 g/plot), 6A-34-6 (2416 g/plot) and 3A-44-1-1 (2246 g/plot), yielded on par with the best early maturing check JS 20-34 (2344.33 g/plot). The genotype NRC 146 was reported as heat tolerant 28.
The response to selection in any crop improvement program depends on the degree of genetic variability and heritability 29. High values of genetic parameters such as heritability, variance, genetic advance and genetic gain were noticed in the current study for important traits such as grain yield and biomass. Correlation analysis in the present study revealed that yield can be improved through selection for attributing traits such as biomass, harvest index, plant height, number of branches and number of pods, in accordance with previous reports 30,31. Soybean varieties with ideal inflorescence architecture could help to raise yield potential. As an important and complex trait, inflorescence length (IL) of soybean significantly affects seed yield 32,33. In the present study the longest inflorescence was found in genotype G2 (6A-47-1) (5.25 cm), and inflorescence length was positively and significantly correlated with yield per plot (0.47*). Cluster analysis helps to identify distinct and diverse genotypes for the hybridization program in order to develop breeding material with a broader genetic base. In the present study it grouped the accessions into five clusters, with cluster I comprising 57 accessions, indicating close relatedness among them; crossing among these accessions may yield less genetic gain. Similar groupings of genotypes have been reported by other workers 34,35. Clusters III and V had only one breeding line each, indicating a high degree of heterogeneity, and these lines may be directly utilized as parents in hybridization programs to combine desirable characters. Similarly, hybridization between lines belonging to different clusters, especially clusters II and V, is likely to be rewarding in generating diverse breeding material. An efficient multivariate selection index, MGIDI 2, was used to select genotypes nearer to the ideotype.
Based on this index, eleven genotypes were selected as superior to the other tested entries, and of these eleven, genotype G54 (NRC 128) ranked first in terms of ideotype. The genetic gain was positive for all traits under consideration except plant height. The negative gain for plant height may be due to its negative association with yield attributing traits, viz., branches per plant, yield per plot, harvest index and 100 seed weight (Fig. S2). NRC 128 is derived from JS 97-52 × (EC 389148 × PS 1042), and one of its parents, JS 97-52, is a climate smart genotype with resistance to major diseases of soybean such as charcoal rot and yellow mosaic disease, and tolerance to abiotic stresses such as drought, heat and waterlogging 36-40. JS 97-52 has a 100-seed weight of 8-9 g, whereas in NRC 128 the 100 seed weight has been improved to 13.3 g, and it was also least affected by waterlogging stress. The importance of GGE has been demonstrated in a number of other crops for yield and other agronomic traits 41-46, both to understand the G × E interaction pattern and to select stable and superior genotypes. Based on the GGE biplots, RSC 11-07 was found to be the near-ideal genotype in the Eastern Zone, while NRC 128 was found to be the near-ideal genotype in the Northern Plain Zone.
Substantial yield reductions in soybean have been observed when excessive soil water occurs during both vegetative and reproductive stages of the plant 47-53. The most effective and economical approach to decreasing yield loss is to develop waterlogging tolerant soybean cultivars 17. Screening of genotypes at reproductive stages [...] 55. The present study observed good grain filling in NRC 128 under waterlogging stress at the reproductive stage; it may therefore be a candidate donor parent for the development of varieties for ecologies where more rainfall occurs near the harvesting stage. Availability of quality seed is one of the primary requirements of farmers for achieving higher production. A total of 70.20 quintal of NRC 128 seed was produced to meet farmers' requirements. The variety produced a yield of 975 kg per ha during the off season, which is considerably high.

Conclusion
Evaluation of a large number of advanced breeding lines identified several promising lines for early maturity, bold seed, waterlogging tolerance and higher yield. The group balanced block design and MGIDI index used in the current study were found to be efficient and aided in the identification of NRC 128 as high yielding and near-ideotype. Further, NRC 128 was identified as the first waterlogging tolerant variety for the Northern Plain Zone and especially for the Eastern Zone of India, where waterlogging occurs due to prolonged monsoon rains. Based on its superiority in yield and waterlogging tolerance, it has been released and notified (S.O. 500(E) 29 [...]
[...] 100 seed weight (g), harvest index (%), grain yield per plot (g) and grain yield per plant (g) were recorded as per standard procedure (IBPGR 1984). Grain yield per plant was based on the average yield (g) of five randomly selected plants. The recommended crop production package of practices was followed throughout the experiment to reach the maximum yield potential of the crop 56. The methodology and protocol used in the present study are in accordance with relevant institutional, national and international guidelines and legislation.
[...] evaluated for waterlogging tolerance at early vegetative and reproductive growth stages at ICAR-Indian Institute of Soybean Research, Indore. As per our previous records, NRC 128 was found promising and is derived from the waterlogging tolerant variety JS 97-52; it was therefore evaluated along with two check varieties for waterlogging tolerance under controlled conditions. All three genotypes were sown in three rows of one meter each, and observations were recorded on every individual plant. For vegetative stage waterlogging tolerance, waterlogging stress was imposed during the V2-V3 growth stages for 10 days by maintaining water up to 10 cm above the soil surface in the stress field plot, while the control field plot was maintained under normal irrigated conditions using standard protocols 57,58.
Foliar damage score (FDS; 1-9 scale based on chlorosis, necrosis and plant mortality), plant survival rate (PSR) 59 and stem elongation rate (SER) in the stressed plot were recorded. Plant survival rate was calculated as: PSR (%) = 100 − [(number of plants before stress − number of plants after stress) / number of plants before stress] × 100. Stem elongation rate was calculated as: SER (%) = [(height after stress − height before stress) / height before stress] × 100. To determine the leaf chlorophyll content in both plots, five unrolled leaflets were randomly selected in each replicate and measured with a chlorophyll meter (Konica Minolta, SPAD-502). Similarly, root nodule dry weight per plant in both plots (control and stress) was estimated as per the suggested methodology 60. After recording the yield traits and other related morpho-physiological traits in the control and stress plots, the percent reduction in grain yield per plant, root nodule weight and SCMR (SPAD chlorophyll meter reading) under waterlogged conditions relative to normal field conditions was estimated. The waterlogging tolerance coefficient (WTC) was calculated as: WTC = (mean seed yield per plant of the genotype in the stressed plot × plant survival rate) / mean seed yield per plant of the genotype in the control plot.
Similarly, NRC 128, along with the susceptible and tolerant checks, was evaluated for waterlogging tolerance at the reproductive stage. Waterlogging stress was imposed at the R1 stage (12-15 cm of water above the soil surface) for 15 days as per the published methodology with slight modifications 61. These genotypes were evaluated for yield attributes 62 and total chlorophyll content (by the acetone-DMSO method) in the control and stress plots 63. Waterlogging tolerance was evaluated by dividing the seed yield of stressed plants by that of the control plants, giving a waterlogging tolerance index (WTI) 64.
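The indices described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis script: the function names and the example values are hypothetical, PSR is read as 100 minus the percent plant mortality, and PSR is assumed to enter the WTC as a fraction (0-1), which the text does not state explicitly.

```python
def plant_survival_rate(plants_before: int, plants_after: int) -> float:
    """PSR (%): 100 minus the percentage of plants lost during stress."""
    if plants_before <= 0:
        raise ValueError("plants_before must be positive")
    mortality_pct = (plants_before - plants_after) / plants_before * 100
    return 100 - mortality_pct


def stem_elongation_rate(height_before: float, height_after: float) -> float:
    """SER (%): relative increase in stem height over the stress period."""
    return (height_after - height_before) / height_before * 100


def waterlogging_tolerance_coefficient(
    yield_stress: float, yield_control: float, psr_pct: float
) -> float:
    """WTC: stressed yield x survival / control yield.

    Assumption: PSR is converted from percent to a fraction here.
    """
    return yield_stress * (psr_pct / 100) / yield_control


def waterlogging_tolerance_index(yield_stress: float, yield_control: float) -> float:
    """WTI: seed yield of stressed plants divided by that of control plants."""
    return yield_stress / yield_control


# Hypothetical values for one genotype (not data from the study).
psr = plant_survival_rate(plants_before=20, plants_after=17)
ser = stem_elongation_rate(height_before=20.0, height_after=25.0)
wtc = waterlogging_tolerance_coefficient(6.0, 10.0, psr)
wti = waterlogging_tolerance_index(6.0, 10.0)
print(f"PSR={psr:.1f}%  SER={ser:.1f}%  WTC={wtc:.2f}  WTI={wti:.2f}")
```

Under these assumptions both WTC and WTI lie between 0 and 1 for a genotype that loses yield under stress, and higher values indicate greater tolerance.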

Multi-location evaluation.
A total of eleven genotypes along with NRC 128 were evaluated for yield at different locations of the Eastern Zone (Dholi, Raipur, Ranchi, Bhawanipatna). Similarly, a total of six genotypes including NRC 128 were evaluated at three locations, viz., Delhi, Ludhiana and Pantnagar, in the Northern Plain Zone of India. Multi-location trials were conducted in a randomized block design (RBD) with four replicates each. Each replication was sown in a plot of 21.6 m², and the yield was converted into kg/ha. The recommended package of practices was followed throughout the experiments 56. Finally, nucleus seed of NRC 128 was multiplied in the offseason at Belagavi, Karnataka (February-May 2021) to meet farmers' requirements.
Statistical analysis. For the breeding trial, analysis of variance (ANOVA) was calculated as per Gomez and Gomez. Violin plots for different traits were generated using the R package "ggplot2" 65. Correlation analysis was carried out using the R package "PerformanceAnalytics" 66. Principal component analysis was done using the R packages "devtools" 67 and "factoextra" 68. Cluster analysis was carried out using the software "INDOSTAT". For multi-location trials, GGE biplot analysis was done using the R package "GGEBiplotGUI" 69. The MGIDI index was calculated using the R package "metan" 70.
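The conversion from plot yield to kg/ha follows directly from the plot size. The sketch below assumes that plot yields are recorded in kg and averaged over the four replicates before conversion; the function name and the replicate values are hypothetical.

```python
from statistics import fmean

PLOT_AREA_M2 = 21.6  # plot size stated in the text
M2_PER_HA = 10_000   # square metres in one hectare


def plot_yield_to_kg_per_ha(plot_yield_kg: float, plot_area_m2: float = PLOT_AREA_M2) -> float:
    """Scale a plot yield (kg) to a per-hectare yield (kg/ha)."""
    return plot_yield_kg * M2_PER_HA / plot_area_m2


# Hypothetical plot yields (kg) from the four replicates of one genotype.
replicate_yields_kg = [4.10, 4.32, 4.50, 4.36]
mean_plot_yield_kg = fmean(replicate_yields_kg)
yield_kg_per_ha = plot_yield_to_kg_per_ha(mean_plot_yield_kg)
print(f"Mean plot yield: {mean_plot_yield_kg:.2f} kg -> {yield_kg_per_ha:.0f} kg/ha")
```

With a 21.6 m² plot the scaling factor is about 463, so each additional kilogram harvested per plot corresponds to roughly 463 kg/ha.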
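The variability parameters referred to in the Results (PCV, GCV, heritability and genetic advance) are conventionally derived from the ANOVA mean squares. The sketch below uses the standard textbook formulas on a plot basis and is not taken from the study itself: the function name, the input values and the selection differential k = 2.06 (5% selection intensity) are assumptions for illustration.

```python
import math


def variability_parameters(ms_genotype, ms_error, replications, grand_mean, k=2.06):
    """Estimate variance components and derived parameters from ANOVA mean squares.

    Assumptions: genotypic variance = (MSg - MSe) / r, phenotypic variance =
    genotypic + error variance (plot basis), k = 2.06 for 5% selection intensity.
    """
    var_g = max((ms_genotype - ms_error) / replications, 0.0)  # clamp negative estimates
    var_p = var_g + ms_error
    gcv = math.sqrt(var_g) / grand_mean * 100   # genotypic coefficient of variation (%)
    pcv = math.sqrt(var_p) / grand_mean * 100   # phenotypic coefficient of variation (%)
    h2 = var_g / var_p                          # broad-sense heritability (0-1)
    ga = k * math.sqrt(var_p) * h2              # expected genetic advance (trait units)
    gam = ga / grand_mean * 100                 # genetic advance as % of mean
    return {"var_g": var_g, "var_p": var_p, "GCV": gcv, "PCV": pcv,
            "h2": h2, "GA": ga, "GAM": gam}


# Hypothetical mean squares for one trait (not data from the study).
params = variability_parameters(ms_genotype=130.0, ms_error=10.0,
                                replications=3, grand_mean=50.0)
for name, value in params.items():
    print(f"{name}: {value:.2f}")
```

A small gap between PCV and GCV, as in this example, means that most of the observed variation is genotypic, which is the interpretation given in the Results.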

Data availability
The raw data supporting the conclusions of this manuscript will be made available by the authors to any qualified researcher. All datasets used for analysis in the study are included in the manuscript and as supplementary files.