Genetic improvement of Egyptian cotton (Gossypium barbadense L.) for high yield and fiber quality properties under semi arid conditions

Between 2016 and 2018, the Agriculture Research Center's Sakha Agriculture Research Station conducted two rounds of pedigree selection on a segregating population of cotton (Gossypium barbadense L.) using the F2, F3, and F4 generations resulting from crossing Giza 94 and Suvin. In 2016, the top 5% of plants from the F2 population were selected based on specific criteria. The superior families from the F3 generation were then selected to produce the F4 families in 2017, which were grown in the 2018 summer season in single plant progeny rows and bulk experiments with a randomized complete block design of three replications. Over time, most traits showed increased mean values in the population, with the F2 generation having higher Genotypic Coefficient of Variance (GCV) and Phenotypic Coefficient of Variance (PCV) values compared to the succeeding generations for the studied traits. The magnitude of GCV and PCV in the F3 and F4 generations was similar, indicating that genotype had played a greater role than the environment. Moreover, the mean values of heritability in the broad sense increased from generation to generation. Selection criteria I2, I4, and I5 were effective in improving most of the yield and its component traits, while selection criterion I1 was efficient in improving earliness traits. Most of the yield and its component traits showed a positive and significant correlation with each other, highlighting their importance in cotton yield. This suggests that selecting to improve one or more of these traits would improve the others. Families number 9, 13, 19, 20, and 21 were the best genotypes for relevant yield characters, surpassing the better parent and the check variety and giving the best values for most characters. Therefore, breeders could continue to use these families in further generations as breeding genotypes to develop varieties with high yield and improved yield components.


Plant materials
The study was conducted at Sakha Agricultural Research Station for three consecutive growing seasons (2017, 2018, and 2019). The materials used for the study were the F2, F3, and F4 generations of an intraspecific cotton (Gossypium barbadense L.) cross (Giza 94 × Suvin). Giza 94 is known for its high yield, high lint percentage, and earliness, while Suvin is known for its high yield and earliness. The F1 generation was self-pollinated, and the resulting seeds were used for the F2 generation.
In the 2017 season, the F2 population and its parents were grown with a spacing of 70 cm between rows and between plants, and one plant per hill was maintained. The plants were self-pollinated by covering the flowers with kraft paper bags, and 375 selected plants were harvested separately. The plants were evaluated for first opening flower, boll weight, seed cotton yield, lint percentage, and lint index, with a selection intensity of 5%. The plants with the highest performance for each criterion were retained, resulting in 65 selected progenies for the F3 generation.
In the 2017 season, the 65 selfed plants selected from the F2 generation were used to raise the F3 generation. Each F3 family was planted in a two-row plot, with recommended cultural practices carried out throughout the growing season. The best 30 of the 65 F3 families were chosen based on a selection intensity of 5% for the five selection criteria.
In the 2018 season, the selfed seeds of the 30 F3 families, as well as the two parents and the commercial variety Giza 86, were planted to represent the F4 families. Giza 86 was used as a check variety because of its high yield and good fiber properties. A randomized complete block design with three replications was used; each plot consisted of one row, 4.5 m long and 0.7 m wide, with 70 cm hill spacing. Hills were thinned to one plant per hill. Normal cultural practices for cotton production were performed at the proper time. The studied characters were first opening flower, first fruiting node, boll weight (g), seed cotton yield, lint cotton yield, lint percentage, seed index, and lint index.
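The selection step described above can be illustrated with a short sketch. It assumes that the best 5% of plants are taken separately for each criterion and that the selected sets are then pooled, which is one plausible reading of the text rather than a documented procedure. The function name `select_top`, the plant IDs, and all trait values are hypothetical, and treating fewer days to first flower as better is an assumption about earliness.

```python
def select_top(scores, fraction=0.05, higher_is_better=True):
    """Return the IDs of the best `fraction` of plants for one criterion."""
    n_select = max(1, round(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=higher_is_better)
    return set(ranked[:n_select])


# Hypothetical per-plant records (plant ID -> trait value); not data from the study.
boll_weight = {f"P{i}": 3.0 + 0.05 * i for i in range(1, 21)}
days_to_first_flower = {f"P{i}": 75.0 - 0.2 * ((i * 7) % 20) for i in range(1, 21)}

# Pool the plants selected under each criterion (assumption: earlier flowering is better).
selected = select_top(boll_weight) | select_top(
    days_to_first_flower, higher_is_better=False
)
print(sorted(selected))
```

Because the same plant can rank in the top 5% for more than one criterion, the pooled set can be smaller than the sum of the per-criterion selections.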

Statistical and genetic analysis
The PCV and GCV were calculated using the method described by Kearsey and Pooni 39. The variance and covariance components obtained from the analysis of a regular randomized complete block design were used to estimate the phenotypic and genotypic variances and covariances.
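As a rough illustration of how such coefficients are usually derived, the sketch below uses the common textbook estimators for a randomized complete block design: genotypic variance as (MS genotypes − MS error)/r, and each coefficient of variation as the square root of the variance divided by the grand mean. These estimators are an assumption here, and the exact expressions in Kearsey and Pooni may differ (for example, in whether phenotypic variance is expressed on a plot or family-mean basis). All numbers are hypothetical.

```python
import math


def variance_components(ms_genotypes, ms_error, replications):
    """Estimate genotypic and phenotypic variance from RCBD mean squares.

    Assumed estimators (plot basis): sigma2_g = (MSg - MSe) / r and
    sigma2_ph = sigma2_g + MSe. Negative estimates are truncated to zero.
    """
    sigma2_g = max((ms_genotypes - ms_error) / replications, 0.0)
    sigma2_ph = sigma2_g + ms_error
    return sigma2_g, sigma2_ph


def coefficient_of_variation(variance, grand_mean):
    """Coefficient of variation in percent."""
    return math.sqrt(variance) / grand_mean * 100.0


# Hypothetical mean squares and grand mean for one trait (not values from the study).
sigma2_g, sigma2_ph = variance_components(ms_genotypes=0.30, ms_error=0.06, replications=3)
gcv = coefficient_of_variation(sigma2_g, grand_mean=3.8)
pcv = coefficient_of_variation(sigma2_ph, grand_mean=3.8)
print(f"GCV = {gcv:.2f}%, PCV = {pcv:.2f}%")
```

With these estimators PCV can never be smaller than GCV, so a narrow gap between the two indicates a small error (environmental) component.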
Heritability in the broad sense ($h^2_b$) was obtained as described by Warner 40. For the F2 generation, the estimate was based on $V_{F_2}$, the phenotypic variance of the F2 generation, and on $V_{P_1}$ and $V_{P_2}$, the variances of the first and second parents, respectively. For the F3 and F4 generations, it was based on $\sigma^2_g$ and $\sigma^2_{ph}$, the genotypic and phenotypic variances of those generations.
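A minimal sketch of both heritability estimates follows. It assumes that the environmental variance in the F2 is approximated by the mean of the two parental variances, and that heritability in F3 and F4 is the ratio of genotypic to phenotypic variance. Neither expression is spelled out in this form in the text, so both should be read as assumptions; the function names and input values are hypothetical.

```python
def h2b_f2(v_f2, v_p1, v_p2):
    """Broad-sense heritability (%) in the F2.

    Assumption: environmental variance is the mean of the parental variances.
    """
    v_env = (v_p1 + v_p2) / 2.0
    return (v_f2 - v_env) / v_f2 * 100.0


def h2b_family(sigma2_g, sigma2_ph):
    """Broad-sense heritability (%) from genotypic and phenotypic variances."""
    return sigma2_g / sigma2_ph * 100.0


# Hypothetical variances (not values from the study).
print(h2b_f2(v_f2=10.0, v_p1=3.0, v_p2=5.0))
print(h2b_family(sigma2_g=0.08, sigma2_ph=0.10))
```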
The phenotypic and genotypic correlation coefficients between the studied characters in the F2, F3, and F4 generations were estimated using the method outlined by Dewey and Lu 41 and Millar et al. 42.
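For orientation, a correlation of this kind is conventionally the covariance of two characters divided by the geometric mean of their variances, computed once from genotypic components and once from phenotypic components. The sketch below shows that arithmetic only. It is an assumed generic form, not a transcription of the cited methods, and the component values are hypothetical.

```python
import math


def correlation_from_components(cov_xy, var_x, var_y):
    """Correlation coefficient from a covariance and two variances."""
    if var_x <= 0 or var_y <= 0:
        raise ValueError("variances must be positive")
    return cov_xy / math.sqrt(var_x * var_y)


# Hypothetical genotypic and phenotypic components for two characters.
r_g = correlation_from_components(cov_xy=0.60, var_x=1.00, var_y=0.64)
r_ph = correlation_from_components(cov_xy=0.60, var_x=1.44, var_y=1.00)
print(f"r_g = {r_g:.2f}, r_ph = {r_ph:.2f}")
```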
Various selection indices were calculated as described previously 43,44. The predicted improvement in lint yield based on an index, the selection advance (SA), was estimated as 45:

$$\Delta Y = \mathrm{SA} = SD \times \left[\sum_i \left(b_i \times \sigma_{g_{iw}}\right)\right]^{1/2}$$

where $SD$ is the selection differential in standard units, $b_i$ represents the index weights for the characters considered in the index, and $\sigma_{g_{iw}}$ represents the genotypic covariances of the characters with yield.
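The selection-advance expression can be evaluated directly once the index weights and genotypic covariances are available. The sketch below is a plain transcription of that expression. The weights, covariances, and the standardized selection differential of 2.06 (the usual tabulated value for 5% selection) are illustrative assumptions, not estimates from this population.

```python
import math


def index_selection_advance(sd, weights, genotypic_cov_with_yield):
    """Predicted advance in yield from index selection: SD * sqrt(sum(b_i * cov_g_iw))."""
    if len(weights) != len(genotypic_cov_with_yield):
        raise ValueError("one covariance is needed per index weight")
    total = sum(b * c for b, c in zip(weights, genotypic_cov_with_yield))
    if total < 0:
        raise ValueError("weighted covariance sum must be non-negative")
    return sd * math.sqrt(total)


# Hypothetical two-character index (not values from the study).
advance = index_selection_advance(
    sd=2.06,
    weights=[0.5, 1.5],
    genotypic_cov_with_yield=[2.0, 2.0],
)
print(f"Predicted advance: {advance:.2f}")
```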
The predicted genetic advance in lint yield ($w$) from direct selection for character $X_i$ was estimated as 46:

$$\Delta G_w = K \cdot \sigma_{g_{wi}} / \sigma_{p_i}$$

The predicted response in any selected and unselected character was calculated based on the methodology proposed by Walker 45 and Robinson and Comstock 48. The realized gains were then determined by computing the deviation of the generation mean for each character from the procedure mean of that character. An ordinary analysis of variance was performed for the randomized complete block design according to Kearsey and Pooni 39, and estimates were obtained for the phenotypic (PCV%) and genotypic (GCV%) coefficients of variation and for the phenotypic ($\sigma^2_{ph}$) and genotypic ($\sigma^2_g$) variances. Phenotypic and genotypic correlation coefficients were calculated using the methodology outlined by Hussain et al. 47. Heritability in the broad sense was obtained as described by Borojević et al. 48 and by Warner 40.
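A small numerical sketch of the direct-selection expression is given below. It assumes that K is the standardized selection differential, that the numerator is the genotypic covariance between yield and the selected character, and that the denominator is the phenotypic standard deviation of the selected character. The text does not define K, so that reading is an assumption, and the input values are hypothetical.

```python
def predicted_response_in_yield(k, genotypic_cov_wi, phenotypic_sd_i):
    """Predicted change in yield (w) when selecting on character i: K * cov_g(w, i) / sd_p(i)."""
    if phenotypic_sd_i <= 0:
        raise ValueError("phenotypic standard deviation must be positive")
    return k * genotypic_cov_wi / phenotypic_sd_i


# Hypothetical inputs; K = 2.06 is the usual tabulated value for 5% selection.
response = predicted_response_in_yield(k=2.06, genotypic_cov_wi=1.5, phenotypic_sd_i=3.0)
print(f"Predicted response: {response:.2f}")
```

When the selected character is yield itself, the covariance becomes the genotypic variance of yield and the expression reduces to the familiar direct-response form.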

Ethics approval and consent to participate
This article does not contain any studies with human or animal subjects. The current experimental research and field study, including the collection of plant material, complies with relevant institutional, national, and international guidelines and legislation, and the material was used for research and development.

Results
Range, means, phenotypic and genotypic coefficients of variation (PCV and GCV), and broad-sense heritability (h²b%) for the studied traits across the F2, F3, and F4 generations are presented in Table 1. For the first opening flower (FOF), the mean values were 72.05, 71.98, and 70.54 days, the PCV values were 4.62%, 3.93%, and 3.19%, and the GCV values were 3.83%, 3.76%, and 3.02%, respectively. Similarly, for the first fruiting node (FFN), the mean values were 7.66, 7.17, and 6.80, the PCV values were 15.05%, 8.64%, and 6.30%, and the GCV values were 10.78%, 7.78%, and 5.85%, respectively, across the F2, F3, and F4 generations. For boll weight (BW), the mean values were 3.19 g, 3.25 g, and 3.81 g, the PCV values were 9.88%, 8.20%, and 8.10%, and the GCV values were 7.40%, 7.14%, and 6.63%, respectively, across the F2, F3, and F4 generations. Regarding seed cotton yield per plant (SCY/P), the mean values across F2, F […] The broad-sense heritability (h²b%) values in F4 were higher than in F3, which were in turn higher than in F2, for all the traits analyzed.
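Because both coefficients are scaled by the same mean, the squared ratio GCV/PCV equals the ratio of genotypic to phenotypic variance, which gives a quick consistency check on reported values. The sketch below applies this to the FFN coefficients quoted above. It is only an approximation (the inputs are rounded, and heritability in the F2 was estimated by a different route), so it is not meant to reproduce the heritability column of Table 1.

```python
def h2_from_cv(pcv, gcv):
    """Approximate broad-sense heritability (%) as (GCV / PCV)^2 * 100."""
    return (gcv / pcv) ** 2 * 100.0


# FFN coefficients of variation quoted in the text: generation -> (PCV %, GCV %).
ffn = {"F2": (15.05, 10.78), "F3": (8.64, 7.78), "F4": (6.30, 5.85)}

ratios = {gen: h2_from_cv(pcv, gcv) for gen, (pcv, gcv) in ffn.items()}
for gen, value in ratios.items():
    print(f"{gen}: {value:.1f}%")
```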

Selection criteria
Table 2 presents the mean values, as well as the phenotypic and genotypic coefficients of variation, for traits in the five different selection criteria from F4 families. The data revealed that selection criterion I1 had the highest value for the first opening flower at 67.14, as well as the highest PCV and GCV values for the first fruiting node at 7.33% and 6.74%, respectively. Selection criterion I2 yielded the highest mean value for both seed cotton yield (122.93) and seed index (12.58).

Mean performance of the thirty selected families of F 4 generation
Table 3 displays the average performance of thirty selected families from the F4 generation of the Giza 94 × Suvin population for all traits studied. Significant differences were noted among various progenies compared to the better parent and commercial variety. For the first opening flower trait, progenies No. 11, 14, and 21 showed significant differences from the superior parent, while progenies No. 12 and 24 exhibited highly significant differences. When compared to the commercial variety, progenies No. […] Regarding the lint percentage trait, progenies No. 25 and 27 exhibited significant and highly significant differences from the better parent, respectively. Meanwhile, twenty families significantly differed from the commercial variety. For the seed index trait, progenies No. 1, 2, 6, 13, 16, and 29 showed highly significant differences from the superior parent. Progenies No. 14, 26, and 28 exhibited significant and highly significant differences from the […]

Table 1. Estimates of broad-sense heritability (h²b), phenotypic (PCV), and genotypic (GCV) coefficients of variation, range and means, for the eight studied characters in F2, F3, and F4 generations. First Opening Flower (FOF), First Fruiting Node (FFN), Boll Weight (BW), Seed Cotton Yield (SCY), Lint Percentage (L%), Seed Index (SI), and Lint Index (LI).

Phenotypic and genotypic correlation
Phenotypic and genotypic correlation coefficients were calculated for all studied traits in the F 4 selected families (Table 4).

Density plots for the studied traits in the three generations
Figure 1 displays density plots of the studied traits in the three generations, using a smoothed kernel density function to represent the probability of the trait values. The area under the curve represents the distribution of values, and the Y-axis represents their probability. The X-axis value corresponding to the peak of each density plot is the average of the trait. The peaks of each density plot show a higher concentration of values and therefore a higher probability, while the tails demonstrate a lower concentration of values and a lower probability. The figure indicates that the peak value of LI for F2 is different from F3 and F4, with the means higher than the mean of LI for F3 and F4. A similar trend is observed for SI, while the opposite is true for other traits. The density plots demonstrate that selection from F2 improved the means of LI and SI but decreased the variation of all studied traits.
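To make the idea of a smoothed kernel density concrete, the sketch below builds a Gaussian kernel density estimate by hand and checks that the area under the curve is close to one. The bandwidth uses a normal-reference rule of thumb, which is an assumption, since the text does not say how the curves in Figure 1 were smoothed. The sample values are hypothetical.

```python
import math
import statistics


def gaussian_kde(samples, bandwidth=None):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    n = len(samples)
    if bandwidth is None:
        # Normal-reference rule of thumb (assumed; not stated in the text).
        bandwidth = 1.06 * statistics.stdev(samples) * n ** (-0.2)
    norm = n * bandwidth * math.sqrt(2.0 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm

    return density


# Hypothetical lint index values for one generation (not data from the study).
lint_index = [8.1, 8.4, 8.6, 8.8, 8.9, 9.0, 9.3, 9.7]
density = gaussian_kde(lint_index)

# Evaluate on a grid from 4.0 to 14.0 and integrate with a simple Riemann sum.
step = 0.01
grid = [4.0 + step * i for i in range(1001)]
area = sum(density(x) for x in grid) * step
peak_x = max(grid, key=density)
print(f"area = {area:.3f}, peak at x = {peak_x:.2f}")
```

The peak of such a curve marks the most common value (the mode), which coincides with the mean only when the distribution is roughly symmetric.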

Principal component analysis (PCA)
In the biplot analysis (Fig. 2), acute angles (below 90 degrees) and obtuse angles (above 90 degrees) between the variables indicated positive and negative correlations between variables, respectively. Positive correlations were observed among BW, LI, L, LY, SI, and SCY. These variables were highly positively correlated with F2, F3 and F4.
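The angle rule can be checked numerically: the sign of the cosine of the angle between two variable vectors gives the sign of the association they imply. The sketch below uses hypothetical two-dimensional loadings, not the loadings behind Fig. 2, and the function names are illustrative.

```python
import math


def angle_degrees(u, v):
    """Angle between two 2-D biplot vectors, in degrees."""
    dot = u[0] * v[0] + u[1] * v[1]
    cos_angle = dot / (math.hypot(u[0], u[1]) * math.hypot(v[0], v[1]))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
    return math.degrees(math.acos(cos_angle))


def implied_association(u, v):
    """Classify the association implied by the angle between two vectors."""
    angle = angle_degrees(u, v)
    if angle < 90.0:
        return "positive"
    if angle > 90.0:
        return "negative"
    return "none"


# Hypothetical loadings on the first two principal components.
boll_weight = (0.8, 0.3)
lint_index = (0.7, 0.5)
first_opening_flower = (-0.6, 0.4)

print(implied_association(boll_weight, lint_index))
print(implied_association(boll_weight, first_opening_flower))
```

The approximation is only as good as the share of variance captured by the two plotted components, so angles in a biplot should be read as indicative rather than exact.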

Path analysis
The study utilized path analysis, a statistical method for examining causal relationships between variables, using the "sem" function in the "lavaan" package of R software. The results were displayed in path diagrams (Figs. 3, 4 and 5) that illustrated the effects of five factors (lint index, seed index, lint percentage, first opening flower, and first fruiting node) on three outcomes (boll weight, lint yield, and seed cotton yield) across three generations. Three types of arrows were used in the path diagrams: single-headed arrows indicating causal relationships, double-headed arrows indicating covariance between two factors, and double-headed arrows pointing to a single factor representing the variance of that factor. The findings revealed that the lint index had a strong positive effect on both lint yield and boll weight in all three generations, with R² values of 0.93, 0.87, and 1.51 for lint yield and 1.07, 1.24, and 1.7 for boll weight in F2, F3, and F4, respectively. Moreover, lint yield had a direct and significant effect on seed cotton yield in all three generations, with R² values of 1.03, 1.05, and 1.04 for F2, F3, and F4, respectively. Boll weight had a stronger direct effect on lint yield than both lint percentage and boll weight themselves, as evidenced by a more substantial direct effect of boll weight on lint yield than the direct effect of lint percentage and boll weight on lint yield. The indirect effect of the lint index and seed index on lint yield through lint percentage had the same pattern across all three generations, where the lint index had a positive effect and the seed index had a negative effect. Meanwhile, the seed index had no indirect effect on lint yield through boll weight, whereas the lint index and seed index had a positive indirect effect on lint yield through boll weight in all three generations. Finally, the direct and indirect effects on lint yield were more pronounced in F4 than in F2 and F3, possibly due to selection that strengthened the relationship between lint yield and its components, resulting in a more stable progeny of that generation. In summary, the path analysis provided valuable insights into the causal relationships among different factors and their effects on lint yield, boll weight, and seed cotton yield across three generations.
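The distinction between direct and indirect effects comes down to simple arithmetic on path coefficients: an indirect effect is the product of the coefficients along a path through a mediator, and the total effect is the direct effect plus all indirect effects. The sketch below shows only that arithmetic in Python. It does not fit a model and is not a substitute for the lavaan analysis; the coefficients are hypothetical and were chosen so that lint index acts positively and seed index negatively through lint percentage, as described above.

```python
def indirect_effect(*path_coefficients):
    """Effect transmitted along one path: the product of its coefficients."""
    product = 1.0
    for coefficient in path_coefficients:
        product *= coefficient
    return product


def total_effect(direct, indirect_effects):
    """Total effect: direct effect plus the sum of all indirect effects."""
    return direct + sum(indirect_effects)


# Hypothetical standardized path coefficients (not estimates from the study).
li_to_lp = 0.6    # lint index -> lint percentage
si_to_lp = -0.4   # seed index -> lint percentage
lp_to_ly = 0.5    # lint percentage -> lint yield
li_to_ly = 0.7    # lint index -> lint yield (direct)

li_via_lp = indirect_effect(li_to_lp, lp_to_ly)
si_via_lp = indirect_effect(si_to_lp, lp_to_ly)
li_total = total_effect(li_to_ly, [li_via_lp])
print(f"LI via L%: {li_via_lp:.2f}, SI via L%: {si_via_lp:.2f}, LI total: {li_total:.2f}")
```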

Discussion
The results indicate that genetic factors play a significant role and that there is a possibility of achieving positive outcomes by selecting early segregating generations in the presence of water deficit stress. These findings are consistent with earlier research 32,49,50. The F2 and F3 generations showed greater phenotypic and genotypic coefficients of variation for all traits studied compared to the F4 generation. This decrease in genetic variability and heterozygosity may have resulted from various selection methods that reduced a significant portion of the variability 30,51,52. Although the mean performance of all traits studied was higher in the F4 generation than in the F3 generation, the favorable alleles that accumulated due to effective selection procedures resulted in a lower desirable MR in the F4 generation. The study suggests that selecting for these traits is possible, and other researchers have reported similar results 5,49,50,53. Plant breeders need to consider all the economic traits and not just focus on one. Correlation analysis is a useful tool for predicting how a change in one trait will affect another. The genotypic correlations were generally higher than the phenotypic correlations, which could be attributed to the relative stability of genotypes under certain selection pressures. This finding is consistent with other studies 16,37,43,51,54. The study showed that most traits exhibited an increase in mean values across generations in the population due to additive gene action. Additionally, high heritability values were observed, possibly due to the close relationship between PCV and GCV values. All traits under study showed a high to very high degree of heritability, indicating a low or negligible influence of the environment on their expression, making them suitable for selection and improvement. The findings are in agreement with the research of Ali et al. 53, which reported a persistent correlation between GCV and PCV values, leading to high estimates of broad-sense heritability.
The results also indicated a minor disparity between GCV and PCV values for most traits, suggesting that environmental factors have a minimal impact on their expression. The variation between the genotypic and phenotypic coefficients was narrow for all studied characters, which indicated less influence of the environment on the expression of these characters 35. High values of PCV and GCV together with high values of heritability are a good indicator of genetic advance in this population, through the amount of genetic variance to be expected from selection. It was suggested that combining the genetic coefficient of variation with heritability would provide the most accurate estimation of the expected genetic variance resulting from selection 8,14. The three most influential factors in genetic progress in the three populations were high selection intensity, genotypic coefficient of variation, and heritability 1,10,32,37,43,53,55. The differences between generations may be due to the various genotypes scored by each selection criterion. The study found that selection criteria I2, I4, and I5 were effective in improving most yield and its component traits, while criterion I1 was efficient in improving earliness traits. The narrow range between PCV and GCV values for different selection criteria suggests that environmental factors have a minimal impact on these traits. The analysis of variance revealed significant differences among families of F4 for all studied traits, indicating that selection criteria would be effective. The increase in the selected families was greater than in the better parent and commercial variety. The results also showed variations in the mean performance of most selected families, which could be attributed to gene expression. Therefore, selecting adaptable families with higher means than the better parent and commercial variety would help produce high-yielding and earlier crops, particularly in winter cropping, without a decrease in yield 23,27,31,39,56. Most of the yield and its component traits were positively and either significantly or highly significantly correlated with each other, indicating that these traits are important components of cotton yield. A positive association between major yield components is very significant to the breeder because component breeding would be very effective in such a situation. Such correlations indicate that selection for improving one or more of these traits would improve other traits 17,36,57. The increases in yield and its components for this population were higher in families 9, 13, 19, 20 and 21. These families could be continued to further generations as breeding genotypes for developing higher yield and its components. Nazmey et al. 58 studied cotton yield and yield components in relation to their relative contribution and found that seed cotton yield was significantly and positively correlated with boll weight and lint yield. The same relationship was found between lint yield and each of boll weight and seed cotton yield 59-61; these studies reported positive and significant correlations of seed cotton and lint yields with boll weight and lint percentage. It was indicated that seed cotton yield was highly significantly genetically correlated with boll weight (r = 0.99), lint yield (r = 0.88) and lint index (r = 0.96). With respect to the phenotypic and genotypic correlations between lint percentage and lint index, there were positive and highly significant correlation values (0.906** and 0.938**) 62.

Conclusion
Based on our findings, it can be concluded that the studied cotton traits, including first opening flower, first fruiting node, boll weight, seed cotton yield, lint yield, lint percentage, seed index, and lint index, are significantly influenced by genetic factors. The broad-sense heritability values indicate that the traits are heritable, with higher heritability values observed in the F4 generation compared to F3 and F2. The results of the different selection criteria indicate that the traits under genetic control are suitable for breeding programs. The mean performance of selected families from the F4 generation of the Giza 94 × Suvin population showed significant differences in various progenies compared to the better parent and commercial variety, indicating the potential for further selection and improvement of cotton traits.
There are several prospects for future research and breeding programs aimed at improving cotton traits. Firstly, the traits found to be under genetic control and suitable for breeding programs, including first fruiting node, seed cotton yield, boll weight, first flower opening, seed index, lint yield, and lint percentage, can be further studied to identify specific genetic markers and genes responsible for these traits. This information can then be used to develop more efficient breeding strategies, such as marker-assisted selection, to improve these traits in cotton varieties. Secondly, the selected families from the F4 generation with significant differences in progenies compared to the better parent and commercial variety can be further evaluated for their agronomic performance and adaptability to different growing conditions. This information can then be used to identify the best-performing families and varieties, which can be released for commercial cultivation. Lastly, the heritability values observed in the F4 generation suggest that this generation may be suitable for developing new cotton varieties through the application of different breeding methods, such as recurrent selection or hybridization with other elite varieties. The resulting new varieties can then be evaluated for their performance and suitability for cultivation in different regions.

Figure 1. Density plots of the studied traits: First Opening Flower (FOF), First Fruiting Node (FFN), Boll Weight (BW), Seed Cotton Yield (SCY), Lint Percentage (L%), Seed Index (SI), and Lint Index (LI). The plot uses a smoothed kernel density function to represent the probability of trait values, with the area under the curve indicating the distribution of values and the Y-axis representing the probability.

Figure 2. The PCA biplot for three generations and traits (with a sample size of 8) displays the contribution of individuals to each variable. Individuals on the same side as a variable are seen as having a high impact on it. The strength of contribution to each principal component (PC) is shown by the magnitude of the vectors (lines). The direction of the vectors indicates the correlation between variables, with vectors pointing in similar directions showing positive correlation, vectors pointing in opposite directions indicating negative correlation, and vectors at approximately right angles indicating low or no correlation. The colored concentration ellipses represent the observations grouped by mark class, with the size determined by a 0.95 probability level.

Figure 3. Path diagram illustrating the direct and indirect effects of the studied traits in the F2 generation. Bidirectional arrows show correlation between the variables, and unidirectional arrows indicate a direct effect in the direction of the arrow; blue and red arrows represent positive and negative effects, respectively. Solid arrows indicate P < 0.05 and dashed arrows indicate P > 0.05. First Opening Flower (FOF), First Fruiting Node (FFN), Boll Weight (BW), Seed Cotton Yield (SCY), Lint Percentage (L%), Seed Index (SI), and Lint Index (LI).

Figure 4. Path diagram illustrating the direct and indirect effects of the studied traits in the F3 generation. Bidirectional arrows show correlation between the variables, and unidirectional arrows indicate a direct effect in the direction of the arrow; blue and red arrows represent positive and negative effects, respectively. Solid arrows indicate P < 0.05 and dashed arrows indicate P > 0.05. First Opening Flower (FOF), First Fruiting Node (FFN), Boll Weight (BW), Seed Cotton Yield (SCY), Lint Percentage (L%), Seed Index (SI), and Lint Index (LI).

Figure 5. Path diagram illustrating the direct and indirect effects of the studied traits in the F4 generation. Bidirectional arrows show correlation between the variables, and unidirectional arrows indicate a direct effect in the direction of the arrow; blue and red arrows represent positive and negative effects, respectively. Solid arrows indicate P < 0.05 and dashed arrows indicate P > 0.05. First Opening Flower (FOF), First Fruiting Node (FFN), Boll Weight (BW), Seed Cotton Yield (SCY), Lint Percentage (L%), Seed Index (SI), and Lint Index (LI).
Additionally, selection criterion I3 had the highest PCV and GCV values for the first opening flower (2.48% and 2.18%), boll weight (11.11% and 10.39%), seed index (5.86% and 5.43%), and lint index (13.59% and 13.33%), respectively. Selection criterion I4 showed the highest mean values for boll weight, lint yield, lint percentage, and lint index, with values of 4.11, 51.94, 42.60, and 9.27, respectively. Finally, selection criterion I5 gave the best mean value for the first fruiting node at 6.66, and the highest values for PCV and GCV […]

$h^2_b$ in the F2 generation $= \left(V_{F_2} - (V_{P_1} + V_{P_2})/V_{F_2}\right) \times 100$

[…] seed index, lint yield, and lint percentage are mainly under genetic control and less affected by the environment.

Table 2. Mean, phenotypic (PCV) and genotypic (GCV) coefficients of variation for earliness, yield, and its components for the five different selection criteria of F4 selected families. First Opening Flower (FOF), First Fruiting Node (FFN), Boll Weight (BW), Seed Cotton Yield (SCY), Lint Percentage (L%), Seed Index (SI), and Lint Index (LI).

Table 3. Mean performance of thirty selected families from the F4 generation of the Giza 94 × Suvin population for all studied traits. First Opening Flower (FOF), First Fruiting Node (FFN), Boll Weight (BW), Seed Cotton Yield (SCY), Lint Percentage (L%), Seed Index (SI), and Lint Index (LI).

Scientific Reports | (2024) 14:7723 | https://doi.org/10.1038/s41598-024-57676-w | www.nature.com/scientificreports/

[…] into the causal relationships among different factors and their effects on lint yield, boll weight, and seed cotton yield across three generations.

Table 4. Coefficients of phenotypic and genotypic correlation among different character combinations of 8 quantitative traits for the F4 selected families. * and ** indicate significant and highly significant at the 0.05 and 0.01 probability levels, respectively. First Opening Flower (FOF), First Fruiting Node (FFN), Boll Weight (BW), Seed Cotton Yield (SCY), Lint Percentage (L%), Seed Index (SI), and Lint Index (LI).