Introduction

Strawberries (Fragaria × ananassa Duchesne) are grown worldwide and are popular fruits for the fresh and processing markets1,2,3. In 2020, world strawberry production was 9 million tons, with China and the United States together contributing more than 50% of the total4. The most common strawberry cultivars are often poorly adapted to tropical conditions because they are mainly developed by breeding programs in temperate regions, such as the United States, Spain, and Italy5. In South America, most strawberry seedlings are grown in nurseries in Argentina and Chile, which significantly raises the cost of production in countries like Brazil, where strawberries are grown on 4,300 hectares and more than 75% of seedlings are imported from these nurseries4.

Strawberry cultivars are usually selected based on yield, resistance to pests and diseases, adaptability to semi-hydroponic systems, and fruit quality traits such as firmness, sweetness, acidity, and aroma6,7,8. Assembling all the favorable characteristics into a single cultivar is complex, especially because strawberries have an octoploid genome that interacts strongly with the environment9. After biparental crosses are made, cultivar development takes several rounds of selection to identify the best offspring as potential cultivars. Selecting among hundreds or thousands of seedlings is arduous, time-consuming, and costly. Traditionally, selection is conducted for a single trait, either through direct selection, which is based on the trait of interest itself, or through indirect selection, which uses a correlated characteristic to select for the trait of interest8. However, single-trait selection may result in cultivars that do not adequately meet the demands of producers and consumers10. Combining direct and indirect effects can better reveal the importance of traits for a dependent variable, such as production of commercial fruits11,12. Using path analysis to better understand cause and effect, along with multivariate techniques such as selection indices to evaluate multiple traits simultaneously, may be the best strategy for obtaining genotypes that balance agronomic, biometric, and biochemical characteristics, especially in the early stages of a breeding program10,13,14.

The variability of a population can be explored through cluster analysis, among other statistical tools. The K-means algorithm is a non-hierarchical data exploration method that maximizes the variation between the K groups formed while minimizing the variation within each group15. Still, determining the number of groups, which must be defined a priori, is complex and can lead to imprecise analyses. To reduce this risk, the Elbow Method can be used to identify a suitable number of clusters16.

This study focused on the classic parametric index presented by Smith17 and Hazel18 and the base index established by Williams19. The classic Smith-Hazel index employs matrices of genetic and phenotypic variances and covariances estimated in the analysis of variance. It is constructed to maximize the correlation between the genotypic aggregate (H) and the index (I). H is a linear combination of the genetic values of the analyzed traits, weighted by the economic weights previously assigned to each trait. I is a linear combination of the “x” values of each trait, weighted by coefficients to be estimated. The base index19 is a linear combination of the average phenotypic values of the traits, weighted directly by their respective economic weights. It is estimated by \({\text{I}}\, = \,{\text{a}}_{{1}} {\text{y}}_{{1}} \, + \,{\text{a}}_{{2}} {\text{y}}_{{2}} \, + \cdots + \,{\text{a}}_{{\text{n}}} {\text{y}}_{{\text{n}}} \, = \,{\text{y}}^{\prime}{\text{a}}\), where yj is the mean of the jth trait and aj is its economic weight. As for the non-parametric indices, the sum of ranks from Mulamba and Mock20 and the genotype-ideotype21 indices were used. The index of Mulamba and Mock ranks genotypes for each trait individually, assigning rank values in the direction determined by the breeder (from highest to lowest, or vice versa), depending on which direction favors genetic improvement. The ranks assigned for each trait are then summed to give the selection index. The genotype-ideotype index21 estimates the distance of the evaluated genotypes from an ideotype previously defined by the breeder. The first step is to identify the favorable value for improvement based on the average, maximum, and minimum values reported by the statistical software. This favorable value, called the optimal value (OVj), must lie between a lower (LLj) and an upper (ULj) limit for each trait (LLj ≤ Xij ≤ ULj). The genotype means (Xij) are corrected, with reference to OVj, by a constant that depreciates the genotype average, Cj (Cj = ULj − LLj), resulting in the Yij values. This process guarantees that any value of Xij that falls outside the optimal range is not selected. Subsequently, the Yij values obtained with the transformation are standardized and weighted by the weights previously assigned by the breeder to each trait. The OVj for each trait is likewise standardized and weighted.

Overall, studies have shown that applying these selection indices, which combine information from several traits, increases the success rate of crop improvement programs, including those for alfalfa22, passion fruit23, soybean24,25, and sweet potato26. Nevertheless, the use of indices in strawberry genetic improvement has only recently been reported in the literature13 and still needs to be better understood. The objective of this study was to evaluate and select intraspecific strawberry genotypes, assess their phenotypic diversity, compare different selection indices, and determine the direct and indirect relationships among yield and biochemical traits using multivariate analysis methods.

Results

The optimal K value for the population was determined to be 2 according to the Elbow Method (Fig. 1). Two non-overlapping clusters were generated in the K-means clustering (Fig. 2). The control ‘Camino Real’ and 40 seedling genotypes were in group 1. The control ‘Camarosa’ and 154 seedling genotypes were in group 2. Group 1 was superior to group 2, based on the confidence intervals of the means (p < 0.05), for the yield-related traits mass of commercial fruits (MCF), number of commercial fruits (NCF), and average mass of commercial fruits (AMCF), as well as for the biochemical characteristics Ratio, ascorbic acid (AA), and anthocyanins (ANT). In contrast, no significant difference in total pectin (TP) was found between groups 1 and 2 (Table 1).

Figure 1

Identification of the elbow point for the evaluated dataset.

Figure 2

K-means cluster analysis for yield and biochemical traits evaluated in 10 populations of Fragaria × ananassa Dusch.

Table 1 Confidence interval of the mean values of the variables evaluated in the two K-means clusters with the respective number of genotypes (p < 0.05).

Twelve significant and positive correlations were found by the t-test (p < 0.05) among the 21 pairs of traits evaluated (Fig. 3). The strongest correlations were obtained for yield-related characteristics. A high phenotypic correlation (r > 0.66) was measured between MCF and NCF (r = 0.96). Medium correlations (0.33 < r < 0.67) were measured between MCF and AMCF, MCF and Ratio, and NCF and Ratio, with r values of 0.55, 0.53, and 0.53, respectively. Low correlations (r < 0.34) were measured between NCF and AMCF (r = 0.34); between yield- and biochemical-related traits, namely MCF and AA (r = 0.32), MCF and ANT (r = 0.29), NCF and AA (r = 0.34), NCF and ANT (r = 0.32), and AMCF and Ratio (r = 0.27); and between the biochemical traits Ratio and AA (r = 0.18) and Ratio and ANT (r = 0.32).

Figure 3

Pearson’s phenotypic correlations among yield and biochemical traits evaluated in 10 populations of Fragaria × ananassa Dusch. MCF mass of commercial fruits (g plant−1), NCF number of commercial fruits (fruits plant−1), AMCF average mass of commercial fruits (g fruit−1), Ratio soluble solids (°Brix)/titratable acidity (g citric acid 100 g−1 pulp), TP total pectin (g total pectin 100 g−1 pulp), AA ascorbic acid (mg ascorbic acid 100 g−1 pulp), and ANT anthocyanins (mg cyanidin-3-glucoside 100 g−1 pulp).

By unfolding the correlations through path analysis in a single causal diagram, the direct and indirect effects of the other characteristics on the dependent trait MCF were identified (Table 2). Apart from Ratio, biochemical-related traits had no direct or indirect effect on MCF. NCF had a direct effect on MCF (0.88) and an indirect effect via AMCF (0.086). In contrast, AMCF had an indirect effect via NCF (0.30) that was greater than its own direct effect (0.25). Ratio had a small direct effect on MCF but greater indirect effects via NCF (0.46) and AMCF (0.068).
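As an illustration of how a correlation is unfolded into direct and indirect effects, the following minimal Python sketch solves the path equations for a simplified model with only two predictors (NCF and AMCF) and MCF as the response, using the phenotypic correlations reported above. This is not the full model of the study, which was fitted in R with more traits; the function name and the two-predictor simplification are assumptions made only for illustration.

```python
def path_two_predictors(r1y, r2y, r12):
    """Direct and indirect effects of two predictors on one response.

    Solves the path (normal) equations
        r1y = p1 + r12 * p2
        r2y = r12 * p1 + p2
    where p1 and p2 are the direct effects (path coefficients).
    """
    det = 1.0 - r12 ** 2
    p1 = (r1y - r12 * r2y) / det
    p2 = (r2y - r12 * r1y) / det
    return {
        "direct_1": p1,                # predictor 1 -> response
        "direct_2": p2,                # predictor 2 -> response
        "indirect_1_via_2": r12 * p2,  # predictor 1 acting through predictor 2
        "indirect_2_via_1": r12 * p1,  # predictor 2 acting through predictor 1
    }


# Predictor 1 = NCF, predictor 2 = AMCF, response = MCF.
# Correlations taken from the text: r(MCF, NCF), r(MCF, AMCF), r(NCF, AMCF).
effects = path_two_predictors(r1y=0.96, r2y=0.55, r12=0.34)
for name, value in effects.items():
    print(f"{name}: {value:.3f}")
```

Under this simplification, the direct effects come out at about 0.87 for NCF and 0.25 for AMCF, and the indirect effects at about 0.30 (AMCF via NCF) and 0.086 (NCF via AMCF), close to the values reported in the text. Each correlation is recovered as the sum of its direct and indirect components.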

Table 2 Direct (on the main diagonal) and indirect (on the upper and lower diagonals) effects of the independent variables on the mass of commercial fruits in 10 populations of Fragaria × ananassa Dusch.

The total percentage gains obtained by the four indices under the four criteria (GV, h2, GCV, and EW) ranged from 366 to 386% (Table 3) for simultaneous selection of yield and biochemical traits. The Smith-Hazel index showed the highest total gains, with 386% for all criteria, followed by the Genotype-Ideotype distance index with 384% for GCV, the Mulamba and Mock index with 384% and 383% for EW and GCV, respectively, and the Williams index with 380% for h2 (Table 3). Considering yield and biochemical traits separately, the Mulamba and Mock index provided the greatest gains for the biochemical-related traits under h2 (86%) and GCV (81%), followed by the Genotype-Ideotype index under h2 (79%) and GCV (75%). The Smith-Hazel index, despite showing the greatest gains for yield traits, showed the lowest gains for biochemical traits. Together, the four indices under the four criteria selected 53 genotypes (Table 4). Of these, 38 are located in group 1 and 15 in group 2 of the K-means clustering. A total of 28 genotypes were selected by all indices for all criteria, of which only one belonged to group 2 of the K-means cluster analysis. Eleven genotypes were selected by some indices for all criteria and by other indices for only some criteria. Another eleven genotypes were selected by some indices for some criteria, and three genotypes were selected by some indices for all criteria.

Table 3 Estimates of percentage gains obtained by simultaneous selection with application of four indices based on four criteria of economic weights for seven traits evaluated in 10 populations of Fragaria × ananassa Dusch.
Table 4 Hybrids selected by Smith-Hazel, Mulamba and Mock, Williams, and Genotype-Ideotype indices and K-means clustering for yield and biochemical traits in 10 populations of Fragaria × ananassa Dusch.

The crosses ‘Camarosa’ × ‘Aromas’ and ‘Camarosa’ × ‘Sweet Charlie’ stood out, with 14 and nine selected hybrids, respectively. The other crosses yielded the following numbers of selected hybrids: ‘Dover’ × ‘Aromas’ (6), ‘Oso Grande’ × ‘Tudla’ (5), ‘Festival’ × ‘Aromas’ (5), ‘Aromas’ × ‘Sweet Charlie’ (3), ‘Tudla’ × ‘Aromas’ (3), ‘Dover’ × ‘Sweet Charlie’ (3), ‘Tudla’ × ‘Sweet Charlie’ (3), and ‘Festival’ × ‘Sweet Charlie’ (2).

According to the Dindex (Fig. 4), the genotypes can be divided into four groups, as indicated by a pronounced knee in the plot of index values against the number of clusters. The circular hierarchical dendrogram (Fig. 5), obtained from the analysis of the 53 genotypes selected by the indices and the two controls (‘Camarosa’ and ‘Camino Real’), produced groups of 32, 20, 2, and 1 genotypes, with a cophenetic correlation of 0.827 (p < 0.05).

Figure 4

Dindex plot for determining the best number of clusters in the 53 selected genotypes and two controls of Fragaria × ananassa Dusch.

Figure 5

Circular dendrogram based on yield and biochemical traits of the 53 selected genotypes and two controls of Fragaria × ananassa Dusch.

Discussion

Brazilian strawberry production depends almost entirely on cultivars developed in foreign breeding programs. Because of genotype × environment interactions, these cultivars may present lower yield, lower biochemical quality, and greater susceptibility to pests and diseases, which increases production costs5. Nonetheless, these imported cultivars can be exploited in intraspecific crosses aimed at expressing the variability existing in the species13,27.

Strawberry is an octoploid species that has gone through several rounds of polyploidization throughout its evolutionary history28. It also harbors millions of DNA variants from the subgenomes of the species that gave rise to the present-day cultivated strawberry29. In general, strawberry shows great variability among hybrids obtained from crosses, which favors the selection of new cultivars30. Significant variability, with identification of superior hybrids, has been found in phenotypic analyses of yield and physicochemical traits in populations obtained from crosses between commercial strawberry cultivars in Brazil9,12,13,31,32,33. In addition, genetic studies of hybrids and commercial cultivars based on molecular markers have shown that the germplasm of the Brazilian strawberry breeding program has genetic variability and divergence; therefore, it has high potential for releasing new cultivars9,34.

For the population analyzed in this study, the Elbow Method indicated two clusters, which showed no overlap in the K-means clustering, revealing variability in the analyzed population and a clear separation between the two groups formed. The highest phenotypic correlation with the dependent variable mass of commercial fruits (MCF) was obtained for the number of commercial fruits (NCF) (0.96), which had a high direct effect (0.88) in the path analysis. The average mass of commercial fruits (AMCF), which had a medium, positive correlation (0.55) with MCF, showed in the path analysis an indirect effect via NCF (0.29) greater than its direct effect (0.25). Diel et al.35 found a direct effect of the total number of fruits (0.81) and an indirect effect of the mass of commercial fruits via the total number of fruits (0.71), while the average fruit mass showed a direct effect of 0.22. Their results corroborate ours, and these findings suggest that direct selection via the number of commercial fruits has a greater effect on yield and indirectly benefits the average mass of commercial fruits.

The ratio between soluble solids and titratable acidity (Ratio) represents the balance between sweetness and acidity. This balance, combined with aroma and other biochemical traits, makes up flavor, which has great importance in sensory perception and consumer preference5,6. In the present study, Ratio showed a moderate, positive phenotypic correlation with the mass of commercial fruits (0.52) and the number of commercial fruits (0.53); however, when this correlation was unfolded, a negative direct effect was observed, while the indirect effect via NCF was positive. In agreement with the present study, Diel et al.35 found a negative direct effect (−0.10) and a positive indirect effect (0.15) of Ratio via the total number of fruits on the total fruit mass. Direct effects of the number of strawberry fruits on production per plant were also reported by Ara et al.36 and Garg11, while Singh et al.37 stated that the greatest direct positive effects came from flower number and fruit length. These results indicate that the selection of strawberry genotypes for mass of commercial fruits can be performed directly via the number of commercial fruits and that genotypes with numerous medium-sized fruits tend to have a better Ratio than genotypes with large fruits.

Selecting genotypes that balance yield and biochemical traits simultaneously is a complex task10. The use of selection indices, both parametric and non-parametric, has been useful to identify more balanced hybrids of diverse crops, such as sweet potato26, alfalfa38, soybean25,39,40, potato41, maize42, acai43, passion fruit23,44, and, more recently, strawberry13,27.

In the present study, the Mulamba and Mock and Genotype-Ideotype indices were more sensitive to the use of different criteria, showing greater differences between gains. Cruz et al.45 recommend using statistics obtained from the analysis of the experimental data as economic weights (EW), since they relate to the genotypic variance, are dimensionless, and maintain a certain proportionality among the evaluated traits. In the present study, the greatest gains for yield traits were obtained with the Smith and Hazel index (330.14%); however, it showed no difference between the statistical criteria or assigned weights. In contrast, the greatest gains for the biochemical-related traits were obtained with the Mulamba and Mock index (86.34% under h2 and 81.06% under GCV) and the Genotype-Ideotype index (79.41% under h2 and 74.87% under GCV). Vieira et al.27, evaluating strawberry genotypes, also reported the greatest increments for yield traits with the Smith and Hazel index and for biochemical characteristics with the Mulamba and Mock index. This occurs because parametric methods use the distribution parameters to calculate the statistics, while non-parametric methods use ranks assigned to ordered data and are not influenced by the probability distribution of the data evaluated46. Thus, the non-parametric Mulamba and Mock index is mathematically less sensitive to traits that present wide variance, such as number of fruits.

Of the 194 genotypes analyzed, 28 were selected by all indices under all criteria, of which 27 belong to group 1 of the K-means clustering. Different indices and criteria tend to give very similar results for the top-ranked genotypes. Bernardo et al.47 analyzed studies in several agronomic crops and concluded that, if the population is large enough, any selection index applied judiciously is useful for the simultaneous improvement of multiple traits, regardless of the method used. Nevertheless, the indices begin to select different hybrids under the different criteria further down the ranking.

The crosses with the highest number of selected hybrids were ‘Camarosa’ × ‘Aromas’ and ‘Camarosa’ × ‘Sweet Charlie’. Similarly, Galvão et al.28 identified the best hybrids for yield traits in the cross ‘Camarosa’ × ‘Aromas’. ‘Camarosa’ has been reported as a highly productive cultivar, with large, firm, and tasty fruits48, and is one of the most planted short-day cultivars in the world49. The presence of a large number of favorable alleles in ‘Camarosa’ and ‘Aromas’33 and their high productive potential50,51 make them promising parents for strawberry breeding programs5. Camargo et al.32 also selected the best hybrids, with respect to biochemical traits, from the crosses ‘Camarosa’ × ‘Aromas’ and ‘Camarosa’ × ‘Sweet Charlie’.

The dendrogram generated from the 53 selected genotypes and the two controls led to the formation of four groups, demonstrating that this population still has variability that can be further investigated.

Conclusion

K-means clustering, correlation analysis, and path analysis complement the use of selection indices, leading to the selection of hybrids with a better balance between yield- and biochemical-related traits in strawberry. This combined approach is more promising than direct selection based on only one or a few traits. Furthermore, the multivariate analysis methods were efficient in selecting strawberry genotypes for multiple traits.

The number of commercial fruits was more relevant to the mass of commercial fruits than the average mass of commercial fruits was. Therefore, NCF is a trait of greater importance for the selection of strawberry genotypes aiming at yield. The Smith and Hazel index showed the greatest gain for yield traits, possibly because it is mathematically more influenced by characteristics with greater variability, such as yield. The Mulamba and Mock and Genotype-Ideotype indices, both non-parametric, showed the highest estimated gains for biochemical traits under the criteria of h2 and GCV. The crosses with the highest number of selected hybrids were ‘Camarosa’ × ‘Aromas’ and ‘Camarosa’ × ‘Sweet Charlie’. The selected population of 53 hybrids still has variability with potential to be exploited.

Material and methods

This study was conducted in accordance with the relevant guidelines and regulations. Plant material and replications followed the regulations of the Ministry of Agriculture, Livestock and Food Supply of Brazil.

Plant material

Ten populations were obtained from biparental crosses among strawberry cultivars traditionally grown in South America (Table 5). All parents are public commercial cultivars available at the Brazilian Agricultural Research Corporation (EMBRAPA) of the Ministry of Agriculture, Livestock and Food Supply of Brazil, and they were grown with Multiplanta Tecnologia Vegetal (Andradas, MG, Brazil). The parents are short-day cultivars based on photoperiod response, except for ‘Aromas’, which is a day-neutral cultivar52. Hybridization was performed following Chandler et al.53. The choice of cultivars for the crosses used to obtain the segregating populations was based on the genetic dissimilarity study carried out by Morales et al.34. After crossing, the achenes present on the fruits were removed and germinated in vitro, as described by Galvão et al.31. At 60 days after germination, the seedlings were transplanted to 72-cell polypropylene trays containing biostabilized substrate. A total of 2000 plants (about 200 seedlings per population) were transplanted to low-tunnel-covered beds in an augmented block design. Seedlings were transplanted in April 2015 and the genotypes were evaluated until November of the same year. Based on agronomic traits (total and commercial fruit production, average fruit mass), phytosanitary traits [no symptoms of anthracnose (Colletotrichum acutatum and C. fragariae), Botrytis cinerea, or Mycosphaerella fragariae], soluble solids content (SS above 8 °Brix), firmness, and distribution of production over the cycle, 194 genotypes were selected, grown in the greenhouse, cloned, and transplanted to the experimental field. Strawberry runners were transplanted to trays with substrate to obtain enough seedlings to be used as replications.

Table 5 Intraspecific crosses used to obtain 10 segregating strawberry (Fragaria × ananassa Dusch.) populations.

Experimental area

The experimental area is located in the city of Guarapuava, Paraná, Brazil (25° 23′ 36″ S and 51° 27′ 19″ W). According to Köppen's classification54, the area has a humid mesothermal subtropical climate, type Cfb, with moderate winters and summers with average temperatures around 22 °C. The soil is classified as a typical dystroferric Bruno Latosol55.

Seedlings were obtained from the stolons emitted by the parent plants, kept in a greenhouse. Rooting took place in 46-cell polypropylene trays filled with commercial substrate. At 50 days after planting, seedlings were transplanted in the experimental area, with evaluations occurring between May and November 2016.

Strawberries were transplanted into a low-tunnel system 0.8 m high, with beds 1 m wide and 0.25 m high covered with a 30-µm-thick black polyethylene film. The tunnels were covered with a 120-µm-thick transparent polyethylene film. The plant spacing was 0.30 × 0.30 m between plants and 0.40 m between rows.

Beds were fertilized with 1,650 kg ha−1 of simple superphosphate, 250 kg ha−1 of potassium chloride, and 295 kg ha−1 of urea, based on the soil chemical analysis in accordance with the recommendations for the strawberry crop52. Nutritional replacement was performed via fertigation twice a week. Irrigation water was provided using a micro-drip system and followed the crop water demand. Additionally, for phytosanitary preventive control, applications of Thiamethoxam and Azoxystrobin + Difenoconazole were carried out. Strawberry fruits were harvested at maturity stage when 75% of fruit were red.

The experiment was conducted using the randomized block design with three replications and ten plants per plot. There was a total of 194 F1 experimental hybrids and two commercial controls ('Camarosa' and 'Camino Real').

Yield and biochemical traits evaluated

Traits that showed significant differences in the analysis of variance were used in the further analyses, namely: mass of commercial fruits (MCF) (g plant−1), number of commercial fruits (NCF) (fruits plant−1), average mass of commercial fruits (AMCF) (g fruit−1), ratio (Ratio) between soluble solids (SS) (°Brix) and titratable acidity (TA) (g citric acid 100 g−1 pulp), total pectin (TP) (g total pectin 100 g−1 pulp), ascorbic acid content (AA) (mg ascorbic acid 100 g−1 pulp), and anthocyanin content (ANT) (mg cyanidin-3-glucoside 100 g−1 pulp). The biochemical traits were assessed in samples of commercial ripe strawberries (above 10 g), stored at −2 °C right after harvest. Strawberries were thawed, crushed, and homogenized. Using the homogenized pulp, soluble solids content was measured with an Optech bench refractometer. Titratable acidity was determined by titration of 10-g aliquots of strawberry pulp, diluted in 100 mL of distilled water, with 0.1 mol L−1 NaOH standard solution up to pH 8.2, which corresponds to the turning point of phenolphthalein56. Total pectin was obtained by the method described by McCready and McComb57 and determined colorimetrically using the carbazole reaction, according to the methodology described by Bitter and Muir58. Ascorbic acid was determined by the standard titration method of the Association of Official Analytical Chemists (AOAC), modified by Benassi and Antunes59. Anthocyanin content was determined by the pH differential method described by Giusti and Wrolstad60, with adaptations for strawberry. All biochemical analyses were performed in triplicate.

Statistical analyses

The variability of the 194 genotypes/hybrids and the two controls was analyzed using the R software (http://cran-rc3sl.ufpr.br). First, the number of clusters was determined by the Elbow Method using the factoextra v.1.0.7 package61. A graph was used to indicate the ideal number of clusters to represent the data set, where the value of “K” to be used is the point at which the curve bends like an elbow (inflection). Subsequently, the K-means non-hierarchical cluster analysis was performed based on the Euclidean distance, with the stats (R Core Team)62, dplyr v.0.8.563, ggplot264, and ggfortify65 packages. The relationships between traits were assessed with a Pearson correlation map built with the corrplot v.0.84 package66, while path analysis was performed with agricolae v.1.3-267. The phenotypic correlations were classified as high (r > 0.66), medium (0.33 < r < 0.67), and low (r < 0.33)68.
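The logic of the Elbow Method and K-means can be sketched in a few lines of plain Python. The study itself used R packages, so this is only an illustrative sketch under stated assumptions: the toy data points are hypothetical, the helper names (`sq_dist`, `farthest_first`, `kmeans`) are invented here, and the deterministic farthest-first seeding is a convenience for reproducibility rather than the initialization used by the R routines.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first(points, k):
    """Deterministic seeding (an assumption of this sketch): start at the
    first point, then repeatedly add the point farthest from chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return [list(c) for c in centers]


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm; returns labels, centers, and the within-cluster
    sum of squares (WCSS), the quantity plotted in an elbow graph."""
    centers = farthest_first(points, k)
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    wcss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, wcss


# Hypothetical two-trait data with two visibly separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
wcss_by_k = {k: kmeans(points, k)[2] for k in range(1, 5)}
labels2 = kmeans(points, 2)[0]
for k, w in wcss_by_k.items():
    print(k, round(w, 3))
print(labels2)
```

In an elbow graph, the WCSS values are plotted against K, and the K at which the curve stops dropping sharply is taken as the number of clusters. For these toy data the large drop occurs between K = 1 and K = 2.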

Variance component analysis was performed with the Genes software69,70 to estimate genotypic variance (GV), heritability (h2), and genotypic coefficient of variation (GCV). Economic weights (EW) were also assigned (Table 6). Subsequently, two parametric indices, the classic index from Smith17 and Hazel18 and the base index19, and two non-parametric indices, the rank-sum-based index20 and the genotype-ideotype distance index21, were used to select genotypes.

Table 6 Economic weights criteria used in the application of selection indices for trait analysis in 10 populations of Fragaria × ananassa Dusch.

The genotypic aggregate (H) in the classic Smith-Hazel index is obtained by the expression \({\text{H}}\, = \,{\text{a}}_{{1}} {\text{g}}_{{1}} \, + \,{\text{a}}_{{2}} {\text{g}}_{{2}} \, + \cdots + \,{\text{a}}_{{\text{n}}} {\text{g}}_{{\text{n}}}\), where “a” is the n × 1 vector of economic weights and “g” is the p × n matrix of unknown genetic values of the “n” traits for the “p” families or progenies evaluated. The index (I) consists of a linear combination of the measured “x” values of each trait, weighted by a coefficient. It is obtained by the expression \({\text{I}}\, = \,{\text{b}}_{{1}} {\text{x}}_{{1}} \, + \,{\text{b}}_{{2}} {\text{x}}_{{2}} \, + \cdots + \,{\text{b}}_{{\text{n}}} {\text{x}}_{{\text{n}}}\), where the coefficient “b” is an (n × 1) vector estimated from the expression \({\text{b}}\, = \,{\text{P}}^{{ - 1}} {\text{Ga}}\), in which “P−1” is the inverse of the phenotypic covariance matrix, “G” is the genetic covariance matrix, and “a” is the (n × 1) vector of economic weights assigned to the traits17,18.
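The calculation of the index coefficients can be illustrated with a small Python sketch for two traits. The covariance matrices, economic weights, and phenotypic values below are hypothetical numbers chosen only to show the arithmetic of b = P−1 Ga; the helper names are invented for this example, and the study itself obtained these quantities with the Genes software.

```python
def inv2(m):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(m, v):
    """Matrix-vector product."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def smith_hazel_weights(P, G, a):
    """Index coefficients b = P^-1 G a."""
    return matvec(inv2(P), matvec(G, a))


# Hypothetical inputs for two traits.
P = [[4.0, 1.0], [1.0, 2.0]]   # phenotypic variance-covariance matrix
G = [[2.0, 0.4], [0.4, 1.0]]   # genetic variance-covariance matrix
a = [1.0, 2.0]                 # economic weights

b = smith_hazel_weights(P, G, a)

# Index value I = b1*x1 + b2*x2 for one genotype with hypothetical
# phenotypic values x.
x = [10.0, 3.0]
index_value = sum(bj * xj for bj, xj in zip(b, x))
print(b, index_value)
```

Because b solves P b = G a, traits with high genetic variance relative to their phenotypic variance receive larger coefficients. For more than two traits, a general linear solver would replace the 2 × 2 inverse used here.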

The index of Mulamba and Mock is obtained by the expression \({\text{I}}\, = \,{\text{r}}_{{1}} \, + \,{\text{r}}_{{2}} \, + \cdots + \,{\text{r}}_{{\text{n}}}\), where “I” is the index value for a given individual, rj is the rank of the individual for the j-th variable, and “n” is the number of traits considered in the index. The procedure also allows the ranks of the traits to receive different weights, as specified by the breeder. In that case, \({\text{I}}\, = \,{\text{p}}_{{1}} {\text{r}}_{{1}} \, + \,{\text{p}}_{{2}} {\text{r}}_{{2}} \, + \cdots + \,{\text{p}}_{{\text{n}}} {\text{r}}_{{\text{n}}}\), with pj being the economic weight attributed by the breeder to the j-th trait20.
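A minimal Python sketch of the rank-sum calculation follows. The four genotypes and their trait values are hypothetical, the function names are invented for this example, and ties are simply broken by input order, which is an assumption of the sketch rather than a rule stated in the text.

```python
def ranks(values, higher_is_better=True):
    """Rank 1 is the most favorable value; ties are broken by input order
    (an assumption of this sketch)."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=higher_is_better)
    result = [0] * len(values)
    for position, i in enumerate(order, start=1):
        result[i] = position
    return result


def rank_sum_index(traits, weights, higher_is_better):
    """Weighted sum of ranks per genotype: I = p1*r1 + ... + pn*rn.

    `traits` is a list of columns, one list of genotype values per trait.
    """
    per_trait = [ranks(col, hib) for col, hib in zip(traits, higher_is_better)]
    n_genotypes = len(traits[0])
    return [sum(w * r[g] for w, r in zip(weights, per_trait))
            for g in range(n_genotypes)]


# Hypothetical values for four genotypes (A, B, C, D).
mcf = [300, 450, 380, 410]   # mass of commercial fruits, higher is better
aa = [55, 48, 60, 52]        # ascorbic acid, higher is better

index = rank_sum_index([mcf, aa], weights=[1, 1], higher_is_better=[True, True])
best = index.index(min(index))   # the lowest rank sum is the best genotype
print(index, best)
```

Here genotype C (position 2) has the lowest rank sum because it ranks first for ascorbic acid and third for fruit mass, whereas the top-yielding genotype B ranks last for ascorbic acid. This shows how the index trades off traits without being affected by the scale or variance of each trait.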

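To obtain the genotype-ideotype index, the values that express the distance between the genotypes and the ideotype are calculated by the expression \({\text{I}}_{{{\text{DGI}}}} \, = \,\sqrt {\frac{1}{{\text{n}}}\sum\nolimits_{{{\text{j}} = 1}}^{{\text{n}}} {\left( {{\text{y}}_{{{\text{ij}}}} \, - \,{\text{vo}}_{{\text{j}}} } \right)^{2} } }\), where yij is the standardized and weighted value of the j-th trait for the i-th genotype, voj is the standardized and weighted optimal value of the j-th trait, and “n” is the number of traits. The best genotypes were identified, and selection gains were estimated based on IDGI. Based on the values of the ideotype (Yij), principal components analysis was performed to obtain the eigenvalues and eigenvectors associated with the correlation matrix between the analyzed variables. The distances of the genotypes from the ideotype were then estimated. This process allows the selection of genotypes closer to the optimal pattern defined by the breeder (ideotype)21.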
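The distance calculation can be sketched in Python as follows. This is a simplified illustration, not the procedure implemented in the Genes software: the trait values are hypothetical, the optimal values are simply set to the maximum of each trait, all traits receive equal weight, and the limit-based transformation and principal components step described in the text are omitted.

```python
import math


def standardize(column, optimum):
    """Z-score a trait column and express the optimum on the same scale."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))
    return [(x - mean) / sd for x in column], (optimum - mean) / sd


def ideotype_distance(y_row, vo):
    """Root mean squared difference between a genotype and the ideotype."""
    n = len(y_row)
    return math.sqrt(sum((y - v) ** 2 for y, v in zip(y_row, vo)) / n)


# Hypothetical values for four genotypes (A, B, C, D) and two traits.
mcf = [300, 450, 380, 410]
aa = [55, 48, 60, 52]

# Assumption: the breeder's optimum is the highest observed value.
z_mcf, vo_mcf = standardize(mcf, optimum=max(mcf))
z_aa, vo_aa = standardize(aa, optimum=max(aa))

distances = [ideotype_distance([zm, za], [vo_mcf, vo_aa])
             for zm, za in zip(z_mcf, z_aa)]
closest = distances.index(min(distances))
print([round(d, 3) for d in distances], closest)
```

Genotypes are then ranked from the smallest to the largest distance, so the genotype closest to the ideotype is selected first.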

Selection gains [SG (%)] in the base index19 were estimated with the following expression: SG (%) = 100 h2 (Xs − Xo)/Xo, where Xs is the average genotypic value of selected hybrids, Xo is the average genotypic value of all hybrids, and h2 is the heritability of the trait of interest. Heritability was obtained by the ratio between genotypic and phenotypic variance, as \(h^{2} = \hat{\sigma }_{g}^{2} /\hat{\sigma }_{p}^{2}\), where \(\hat{\sigma }_{g}^{2}\) is the genotypic variance and \(\hat{\sigma }_{p}^{2}\) is the phenotypic variance19.
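The base index and the selection gain expression can be combined in a short Python sketch. All numbers below (trait values, economic weights, heritability, and the number of genotypes selected) are hypothetical and serve only to show the arithmetic; the function names are invented for this example.

```python
def base_index(trait_values, weights):
    """Base index I = a1*y1 + ... + an*yn for one genotype."""
    return sum(a * y for a, y in zip(weights, trait_values))


def selection_gain_pct(all_values, selected_values, h2):
    """SG (%) = 100 * h2 * (Xs - Xo) / Xo."""
    xo = sum(all_values) / len(all_values)
    xs = sum(selected_values) / len(selected_values)
    return 100.0 * h2 * (xs - xo) / xo


# Hypothetical means for four genotypes: (MCF, AA).
genotypes = [(300, 55), (450, 48), (380, 60), (410, 52)]
weights = [1.0, 1.0]   # hypothetical economic weights

scores = [base_index(g, weights) for g in genotypes]

# Select the two genotypes with the highest index values.
order = sorted(range(len(genotypes)), key=lambda i: scores[i], reverse=True)
selected = order[:2]

mcf_all = [g[0] for g in genotypes]
mcf_selected = [genotypes[i][0] for i in selected]
gain_mcf = selection_gain_pct(mcf_all, mcf_selected, h2=0.8)  # hypothetical h2
print(scores, selected, round(gain_mcf, 2))
```

With unstandardized data and equal weights, the trait with the larger numeric scale (here MCF) dominates the index, which is one reason the choice of economic weights matters for parametric indices.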

Lastly, the optimal number of clusters was identified with the Dindex from the R package NbClust71, and a circular hierarchical dendrogram was generated for all hybrids selected across all criteria and indices, together with the controls, using the R packages vegan v.2.5–672 (for data standardization), ape v.5.073, and cluster v.2.1.074.