Study on characters associations and path coefficient analysis for quantitative traits of amaranth genotypes from Ethiopia

Selection based on yield alone may not be effective for yield improvement in plant breeding programs. Thus, to increase genetic gain during selection, yield should be considered together with potential yield-contributing traits. The objective of this study was to support the improvement of amaranth genotypes and increase the effectiveness of selection by estimating the correlation and path coefficients between yield and its related traits. The study was carried out on 120 amaranth genotypes planted during two growing seasons, 2020 and 2021, using an alpha lattice design with two replications. The results revealed significant positive phenotypic and genotypic associations with leaf yield, with leaf area, leaf breadth, branch number, leaf number, plant height at flowering, and grain yield all having positive direct effects. Similarly strong positive phenotypic and genotypic relationships were found between grain yield and grain sink filling rate. Path coefficient analysis was also used to determine the direct and indirect effects of yield-related traits on yield. In addition to having a strong direct effect on grain yield, the grain sink filling rate showed substantial positive phenotypic and genotypic associations with grain yield. The results further suggested that leaf yield in amaranth genotypes may be increased through indirect selection of plant height at maturity, leaf length, and terminal inflorescence lateral length, which showed strong indirect effects mostly through leaf area, and of days to maturity and days to emergence, which showed strong indirect effects primarily through plant height at flowering. This study consequently shows that traits with significant positive indirect effects via leaf area should be considered as indirect selection criteria for improving leaf yield in amaranth genotypes. The grain sink filling rate also significantly improved grain yield indirectly at both the phenotypic and genotypic levels, mainly via days to flowering and leaf yield. This demonstrated that selection mainly targeting days to flowering, leaf yield, and grain sink filling rate would ultimately boost grain yield in amaranth genotypes.


Plant material
Amaranth was collected legitimately and in accordance with the applicable national (Ethiopian Biodiversity Institute) and international guidelines. Field staff of the Ethiopian Biodiversity Institute assisted in collecting the samples that were used as vouchers. The specimens were dried, pressed, tagged, numbered, and deposited at the National Herbarium (ETH) at Addis Abeba University, where they were accepted. The specimens were identified by systematist Melaku Wondafrash Ayitaged at the Addis Abeba University National Herbarium.
A total of 120 amaranth genotypes were used in this study: 34 genotypes from the Ethiopian Biodiversity Institute (EBI), 2 from the Melkasa Research Center, 15 from the Werere Research Center (Afar region), 8 from Sidama, and the remaining 61 collected in the Southern Nations, Nationalities and Peoples Region, Oromia, and Tigray regions of Ethiopia in 2019 and characterized for various agro-morphological traits. Passport information is available for 118 of the genotypes; the other two lack it and are regarded as released varieties (Table 1).
www.nature.com/scientificreports/

Experimental layout
The experimental field was cleared, plowed, and harrowed by tractor. Ridges were prepared with a manual hoe. The experimental design was an alpha lattice with 2 replications, 16 blocks, and 15 plots per block. Plots were separated by 0.60 m, blocks by 1 m, and replications by 3 m. Each plot was 1.80 m long and 1.50 m wide, giving an area of 2.7 m². Each season required a total area of 1345.2 m² (38 m × 35.4 m).
Seed was sown on April 15 in both 2020 and 2021 at a single location, under favorable growing conditions during the main cropping season. Each growing season at the experimental site was treated as one environment. Seeds of the genotypes (Table 1) were sown uniformly in two rows spaced 0.75 m apart. The seeds are very small, with a 1000-seed weight ranging from 0.37 to 1.21 g [15]; they were mixed with sand in a 1:4 ratio, sown in seedbeds, and covered with finely powdered cattle farmyard manure. Thinning was done twice, at 14 and 22 days after sowing (DAS), to a spacing of 75 cm between rows and 30 cm between plants. Standard cultural practices were followed as described by Grubben and Van Sloten [16] and Shukla et al. [17]. Weeds were controlled by hand-hoeing at 2-week intervals following germination and whenever necessary. A total of 12 plants were maintained in each plot.

Data collection
To accomplish the objectives of the study, a two-season experiment was carried out at the Agricultural Research Site of Hawassa University, Ethiopia. Data on agro-morphological traits were collected either on a plot basis or from ten randomly selected plants per plot. The 24 agro-morphological traits studied, with their descriptions and sampling methods, are listed in Table 2. Observations on the morphological characteristics were made at various phenological stages. Mature plants were described using the amaranth descriptors recommended by the International Board for Plant Genetic Resources, which are based on taxonomic keys [18]. Characters that were not on the descriptor list but were deemed crucial for the characterization were also included.

Diagnosis of multicollinearity
One of the requirements for accurate and trustworthy path coefficient estimation is the analysis of multicollinearity among the explanatory variables [19]. Collinearity diagnosis is important in plant breeding because it improves the effectiveness of indirect selection for genetic improvement when the target traits have low heritability [20]. Under moderate to severe multicollinearity, the computed coefficients may take irrational values (greater than unity), and the variances associated with the estimators of the path coefficients may become inflated, leading to unreliable estimates [21, 22]. Multicollinearity can be identified by examining the correlation matrix or by using the variance inflation factor (VIF), the tolerance limit (TOL), and eigenvalues. Two explanatory variables may be multicollinear if the correlation coefficient between them in the correlation matrix is strong (near +1) [23]. This possibility can be verified using the VIF, TOL, and eigenvalues, which were computed with PROC REG in SAS using the VIF, TOL, and COLLIN options.
The VIF quantifies how much the variance of an estimated regression coefficient is increased by multicollinearity [25]. A VIF greater than 10 indicates that the associated regression coefficients are poorly estimated because of multicollinearity [24]. The TOL, which is the inverse of the VIF, indicates severe collinearity in the data when it is smaller than 0.1 [25]. According to Montgomery et al. [25], the condition number (CN) for each trait is obtained by dividing the largest eigenvalue (λmax) of the correlation matrix by its smallest eigenvalue (λmin). Montgomery et al. [25] categorized the degree of multicollinearity in the matrices as weak when CN was less than 100, moderate when it was between 100 and 1000, and severe when it was greater than 1000. Eigenvalues and eigenvectors can be used together to locate the multicollinearity problem: traits with a low eigenvalue and a high matching eigenvector indicate a striking collinearity problem [22]. Removing the traits responsible for inflating the variance of the regression coefficients is therefore essential unless the trait is important for improvement; it is a quick and efficient way to solve the problem and yields the most precise path coefficients [23]. Accordingly, the correlation matrices, VIF, TOL, CN, and eigensystem analysis of the correlation matrix were used to test for multicollinearity among the studied traits. In total, 24 traits were measured. Based on the multicollinearity diagnosis tests, TSW, GFP, BLBL, TLBL, SD, NN, PL, and LT were excluded from the correlation analysis because they were strongly involved in multicollinearity in the matrix. In addition, DM showed multicollinearity in the grain yield path analysis and GSFR in the leaf yield path analysis, so each was removed from the respective analysis. Because of their low multicollinearity, the remaining 16 traits (about 66.7%) were retained for the correlation analysis and 14 traits (58.33%) for each path analysis.
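As an illustration of the diagnostics described above, the following Python sketch computes the VIF, TOL, and CN from a correlation matrix of explanatory traits. It is not the SAS PROC REG procedure used in the study. The function names and the three-trait matrix are hypothetical, and the VIF is taken as the diagonal of the inverse correlation matrix, which holds for standardized predictors.

```python
import numpy as np

def collinearity_diagnostics(R):
    """Return (VIF, TOL, CN) for a correlation matrix R of explanatory traits."""
    R = np.asarray(R, dtype=float)
    vif = np.diag(np.linalg.inv(R))      # VIF_j is the j-th diagonal element of R^-1
    tol = 1.0 / vif                      # tolerance is the inverse of the VIF
    eigvals = np.linalg.eigvalsh(R)      # eigenvalues of the symmetric matrix
    cn = eigvals.max() / eigvals.min()   # condition number = lambda_max / lambda_min
    return vif, tol, cn

def classify_cn(cn):
    """Thresholds quoted in the text: <100 weak, 100-1000 moderate, >1000 severe."""
    if cn < 100:
        return "weak"
    if cn <= 1000:
        return "moderate"
    return "severe"

# Hypothetical correlation matrix for three traits (illustrative values only).
R_demo = np.array([[1.0, 0.9, 0.2],
                   [0.9, 1.0, 0.3],
                   [0.2, 0.3, 1.0]])

vif, tol, cn = collinearity_diagnostics(R_demo)
for j in range(len(vif)):
    flagged = vif[j] > 10 or tol[j] < 0.1   # equivalent criteria, since TOL = 1/VIF
    print(f"trait {j}: VIF={vif[j]:.2f} TOL={tol[j]:.3f} flagged={flagged}")
print(f"CN={cn:.1f} ({classify_cn(cn)})")
```

Traits flagged by these criteria would be candidates for removal before the path analysis, subject to their importance for improvement.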

Phenotypic and genotypic correlations
The data collected in the two years were combined and subjected to correlation and path coefficient analysis. Correlation analysis was performed for all possible combinations of traits after the multicollinearity diagnosis test.

Table 2. List of quantitative agro-morphological traits used along with their marker code, unit, basis of sampling, and description.
… The main countable leaves were counted along the main stem at the flowering stage.
Days to maturity (code: DM; unit: days; basis of sampling: plot): The number of days between 90% emergence and the physiological maturity of 80% of plants. (The time of plant maturity is when the seed taken from the central part of the inflorescence does not change shape when pressed between fingers and when the inflorescence changes its color from green to brown.)

Phenotypic correlation, the observable correlation between two explanatory traits, which includes both genotypic and environmental effects, and genotypic correlation, the inherent association between two explanatory traits, were computed from the components of variance and covariance as described by [26]. The correlation coefficients were tested for significance against tabulated r values at n − 2 degrees of freedom at the 5, 1, and 0.1% probability levels, where n is the number of genotypes.

$$r_{p_{xy}} = \frac{\mathrm{Pcov}(x,y)}{\sqrt{\sigma^2_{p_x}\,\sigma^2_{p_y}}}, \qquad r_{g_{xy}} = \frac{\mathrm{Gcov}(x,y)}{\sqrt{\sigma^2_{g_x}\,\sigma^2_{g_y}}}$$

where $r_{p_{xy}}$ and $r_{g_{xy}}$ are the phenotypic and genotypic correlation coefficients between traits x and y; Pcov(x,y) and Gcov(x,y) are the phenotypic and genotypic covariances between traits x and y; $\sigma^2_{p_x}$ and $\sigma^2_{p_y}$ are the phenotypic variances for traits x and y; and $\sigma^2_{g_x}$ and $\sigma^2_{g_y}$ are the genotypic variances for traits x and y, respectively.
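The definitions above can be sketched in a few lines of Python. This is a minimal illustration, assuming the variance and covariance components have already been estimated; the function names and the numerical components are hypothetical. The t statistic shown is an equivalent way of testing r at n − 2 degrees of freedom and is offered as an alternative to looking up a tabulated r value, not as the procedure used in the study.

```python
import math

def correlation_from_components(cov_xy, var_x, var_y):
    """r = cov(x, y) / sqrt(var(x) * var(y)); applies to phenotypic or genotypic components."""
    return cov_xy / math.sqrt(var_x * var_y)

def t_statistic(r, n):
    """t value for testing r = 0 with n - 2 degrees of freedom (n = number of genotypes)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# Hypothetical components for two traits x and y (illustrative values only).
r_g = correlation_from_components(cov_xy=4.0, var_x=4.0, var_y=16.0)   # genotypic
r_p = correlation_from_components(cov_xy=5.0, var_x=10.0, var_y=25.0)  # phenotypic

print(f"r_g = {r_g:.4f}, r_p = {r_p:.4f}")
print(f"t for r_g with n = 120 genotypes: {t_statistic(r_g, 120):.3f}")
```

With these made-up components the genotypic coefficient exceeds the phenotypic one, the pattern the Discussion reports for 61 of the 120 trait pairs.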

Path-coefficient analysis
Path coefficient analysis, which was developed by Wright [28] and applied to plant breeding by Dewey and Lu [27], was used to further partition the correlation coefficients into direct and indirect effects. Path analysis makes it possible to tell whether a cause-and-effect relationship between two explanatory traits is real and independent of other traits. LY and GY were chosen as the response (dependent) variables in the path-coefficient analysis and the other traits as the causal (predictor) traits. LY was also treated as an independent trait in the path coefficient analysis for GY, and vice versa. The direct and indirect effects of the independent traits on LY and GY were estimated by the simultaneous solution of a set of equations, which can be represented by the following general formula as applied by Dewey and Lu [27]:

$$r_{ij} = P_{ij} + \sum_{k \neq i} r_{ik} P_{kj}$$

where $r_{ij}$ is the mutual association between the independent trait (i) and the dependent trait (j) as measured by the genotypic and phenotypic correlation coefficients; $P_{ij}$ is the component of direct effect of the independent trait (i) on the dependent trait (j) as measured by the genotypic and phenotypic path coefficients; and $\sum r_{ik} P_{kj}$ is the summation of the components of indirect effects of a given independent trait (i) on a given dependent trait (j) via all other independent traits (k). To determine the $P_{ij}$ values, the square matrices of correlation coefficients between the independent traits in all possible pairs were inverted and then multiplied by the correlation coefficients between the independent and dependent traits.
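The partition described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name and the two-trait correlations are hypothetical, and NumPy's linear solver stands in for the explicit matrix inversion, which gives the same direct effects.

```python
import numpy as np

def path_analysis(R_xx, r_xy):
    """Partition correlations with the dependent trait into direct and indirect effects.

    R_xx: correlation matrix among the independent traits (unit diagonal).
    r_xy: correlations of each independent trait with the dependent trait.
    Returns (direct, effects), where effects[i, k] = r_ik * P_kj: the diagonal holds
    the direct effects and the off-diagonal entries the indirect effects of trait i via k.
    """
    R_xx = np.asarray(R_xx, dtype=float)
    r_xy = np.asarray(r_xy, dtype=float)
    direct = np.linalg.solve(R_xx, r_xy)        # solves R_xx @ P = r_xy for P
    effects = R_xx * direct[np.newaxis, :]      # scale column k by P_kj
    return direct, effects

# Hypothetical correlations for two independent traits (illustrative values only).
R_demo = [[1.0, 0.5],
          [0.5, 1.0]]
r_demo = [0.8, 0.6]

direct, effects = path_analysis(R_demo, r_demo)
print("direct effects:", np.round(direct, 4))
print("effects matrix:\n", np.round(effects, 4))
print("row sums (should reproduce r_xy):", np.round(effects.sum(axis=1), 4))
```

Each row of the effects matrix sums to the correlation of that trait with the dependent trait, which is a convenient check on the computation.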
Path coefficient analysis was conducted on data averaged over the two years, using the phenotypic and genotypic correlation matrices set up as $Z = Y^{-1} X$ for LY and for GY. Here, vector $X$ holds the phenotypic or genotypic correlation coefficients of the dependent trait (LY or GY) with the independent agro-morphological traits, $Y^{-1}$ is the inverse of the matrix $Y$ of phenotypic or genotypic correlations for all possible combinations among the independent traits, and vector $Z$ holds the direct effects (path coefficients). The matrix inverse (MINVERSE) function of Microsoft Excel 2010 was used to compute $Y^{-1}$ after the reciprocal entries and autocorrelations of the correlation coefficient matrix had been entered. Using the matrix multiplication (MMULT) function of Microsoft Excel 2010, the direct effects were estimated as the product of each row of $Y^{-1}$ and vector $X$. As recommended [27], each indirect effect was obtained by multiplying the direct-effect path coefficient by the corresponding correlation coefficient in the matrix. The coefficient of determination ($R^2$) was obtained by summing, over all predictor traits, the products of each direct effect and the correlation coefficient of that trait with the dependent trait. The contribution of the remaining unknown factors was measured as the residual factor (RF), which was calculated with the following formula:

$$RF = \sqrt{1 - R^2}$$

The value of RF reveals how effectively the causal factors explain the variability of the dependent trait [29]. That is, if the RF value is low (for example, close to zero), the variability in the predictor traits accounts almost fully for the variance in the dependent trait, whereas a greater RF value suggests that other factors not taken into account need to be included in the study.
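As a brief numerical check, the Python sketch below computes the coefficient of determination as the sum of products of direct effects and correlations, and the residual factor as the square root of one minus that value. The function names and the two-trait example are hypothetical; the R² and residual factor pairs are the ones given in the captions of Tables 5, 6, and 7.

```python
import math

def coefficient_of_determination(direct, r_xy):
    """R^2 as the sum over predictor traits of direct effect times correlation."""
    return sum(p * r for p, r in zip(direct, r_xy))

def residual_factor(r_squared):
    """Residual factor taken as sqrt(1 - R^2)."""
    return math.sqrt(1.0 - r_squared)

# Hypothetical two-trait example (illustrative values only).
r2_demo = coefficient_of_determination([2.0 / 3.0, 4.0 / 15.0], [0.8, 0.6])
print(f"demo R^2 = {r2_demo:.4f}, RF = {residual_factor(r2_demo):.4f}")

# R^2 and residual factor pairs as given in the table captions.
reported = {
    "Table 5": (0.9851, 0.1221),
    "Table 6": (0.9899, 0.1005),
    "Table 7": (0.987, 0.1140),
}
for table, (r2, rf) in reported.items():
    print(f"{table}: sqrt(1 - {r2}) = {residual_factor(r2):.4f} (caption: {rf})")
```

The residual factors in the three captions agree with this formula to four decimal places.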

Discussion
Understanding the relationships between characters is crucial for any crop improvement effort since it indirectly influences selection success. Correlation analysis is conducted to evaluate the underlying relationships between characters and so identify the component characters that can be used in selection to boost yield. The correlation coefficient measures the degree to which two variables are positively or negatively associated. In the current trial, for 61 (50.83%) of the 120 possible pairs of traits, the genotypic correlation coefficient was higher than the corresponding phenotypic correlation coefficient, which showed that the association among these traits was largely due to genetic factors and reflected a strong inherent genetic association [30-32]; this aids in determining the traits to be used in the breeding scheme. Moreover, genetic correlations between two characters arise because of linkage, pleiotropy, or developmentally induced functional relationships [33]. As a result, genotypic correlation is the more important of the two and may be used to build a successful selection system. However, the remaining pairs had phenotypic correlation coefficients higher than the corresponding genotypic correlation coefficients, suggesting that environmental factors influenced the inherent associations among the traits under study.

The phenotypic correlation analysis indicated that LW was significantly and positively associated with LL, which was also significantly and positively correlated with PHM, PHF, LA, and DF. Other studies also reported that LL had a significant positive association with LW, PHM, and LA at the phenotypic level [34, 35]. LL had a very high positive correlation with LW, PHM, PHF, LA, DF, and GSFR. Similarly, Jangde et al. [34] found a substantial positive association of leaf length with both LW and LA. In the current study, PHF had a very highly significant positive association with PHM, BN, LL, DM, LA, DF, and DE. Singh [36] reported comparable results in amaranth genotypes, in which plant height showed significant positive phenotypic correlations with days to 50% flowering and number of branches. The strong positive link between leaf width and leaf length also raises the prospect of improving both characteristics at the same time. A similar finding was reported by Olusanya [37]. BN was significantly and positively correlated with LN, PHM, and PHF, which indicates that selecting taller genotypes would result in a greater number of branches, which is necessary for leaf yield production. Akter et al. [38] came to the same conclusion. DF was significantly and positively correlated with BN, PHF, and PHM. Similar findings were reported by Singh [36]. Days to maturity showed a positive and highly significant correlation with plant height. Similar results were reported by Shrivastav et al. [39]. Time of flowering is an important adaptive trait in grain amaranth [15, 40]. Leaf number and leaf area were negatively correlated, suggesting that the more leaves amaranth produced, the smaller those leaves were.
A negative correlation between the number of leaves and leaf area might pose a challenge for the improvement of leaf yield in amaranths. Furthermore, this pattern suggests that leaf length and leaf area decrease as the number of leaves increases. Breeding strategies should focus on increasing the area of each leaf or on producing more leaves on each main branch. Similar results were reported by Gerrano et al. [41]. Genotypic correlation between traits is important for identifying the most and least important characteristics to be considered for successful selection in breeding [42]. At the genotypic level, genes governing two separate traits are correlated positively in the coupling phase of linkage and negatively in the repulsion phase of linkage [43]. The relationship between leaf length and leaf width is very strong. Similarly, most of the leaf traits showed high positive correlations with one another, indicating that these traits are crucial for improving and selecting amaranth yields. The trait correlations in this study imply that selection for these traits in amaranth genotypes will be efficient and successful. Similar findings in garden peas were reported by Sharma and Sharma [44]. PHM showed highly significant positive relationships with BN, LL, TILL, LN, DF, and GSFR. Similar results were reported by Shrivastav et al. [39], and, according to Yadav et al. [45], inflorescence length and days to 50% flowering were positively and highly significantly correlated with plant height. A positive and significant correlation with lateral inflorescence length, leaf length, leaf width, number of branches per plant, and days to maturity has also been reported [46]. Similar results were found by Varalakshmi and Pratap Reddy [47].

Table 6.
Estimated phenotypic direct (horizontal) and indirect (off-diagonal) effects of 14 traits on grain yield of 120 amaranth genotypes in Hawassa University agricultural research site in 2020 and 2021 cropping seasons. R² = 0.9899; residual factor = 0.1005. AIL axillary inflorescence length, LY leaf yield, GY grain yield, LL leaf length, LW leaf width, BN branch number, LN leaf number, PHF plant height at flowering, PHM plant height at maturity, TILL terminal inflorescence laterals length, TISL terminal inflorescence stalk length, GSFR grain sink filling rate, LA leaf area, DF days to flowering, DE days to emergence.
For leaf yield, the phenotypic correlation analysis revealed a strong positive association with economically important traits such as LA, LW, and LL. The results show that any improvement in these traits will increase the potential LY of amaranth. Similar trends were identified by Kumar [48]. Therefore, these traits should be given consideration in amaranth breeding programs. LY had significant genotypic and phenotypic relationships with LW, PHM, LL, PHF, LA, DF, and BN. The taller genotypes in the current study produced more leaves, which resulted in higher leaf yields; hence, because of their significant association with LY, PHM and PHF predicted the high leaf-yielding potential of the genotypes. However, the negative association of PHM and PHF with GY suggested that taller genotypes had poor grain yields. Therefore, selection for higher leaf yield should focus on the taller genotypes, whereas shorter genotypes would produce good grain yields. As a result, a breeding strategy that gives higher priority to characters that increase LW, PHM, LL, PHF, LA, DF, and BN increases the chances of obtaining a genotype with a high leaf yield. Similar results were reported in amaranth genotypes by Varalakshmi and Pratap Reddy [47], who found that plant height, leaf length, and leaf width were positively and significantly correlated with leaf yield. Moreover, the statistically significant positive association between plant height and number of branches indicates that choosing taller genotypes will result in more branches, which is required for the production of leafy vegetables. A similar finding was reported by Olaniyi [49]. Vegetable amaranths are said to benefit from late flowering and maturation since farmers would have more time to harvest the leaves [50]. A single gene controls early flowering in amaranth, and the dominant allele determines early flowering [51]. For grain production, early flowering and maturity would be preferable. Positive and significant correlations between yield and yield components in amaranth have also been reported [52, 53].
At the phenotypic level, GY had very strong positive correlations with GSFR, AIL, TISL, and TILL. It also showed a strong positive phenotypic relationship with both LW and LL. The results suggest that these traits might be improved simultaneously, individually or in combination, to raise grain yield. Showemimo et al. [54] reported comparable results. As a result, these traits should be considered in future breeding programs to ensure the maximum exploitation of grain amaranth, since the photosynthetic capacity of a plant with longer leaves is predicted to be higher than that of a plant with shorter leaves [55], and traits such as high leaf area, leaf length, and plant height enable optimal crop output when water, an essential agricultural input, is available in sufficient quantities [56]. Grain yield showed a highly significant positive phenotypic association with inflorescence length (TISL, TILL, and AIL). Nyasulu et al. [57], Kumar [58], and Yadav et al. [45] obtained consistent findings in amaranth species. These characteristics may therefore be prioritized during selection to produce genotypes with higher grain yield.
The genotypic associations of GSFR, PHM, and LY with GY were also strong and positive. GY also showed a highly significant positive genotypic association with LN, a substantial positive genotypic association with DF, and a significant positive genotypic association with TILL. These findings demonstrated that selection for any one of these traits that contribute to grain yield will result in increases in the other traits, thereby ultimately boosting grain yield. To select genotypes with higher grain production, primary selection for traits such as GSFR, PHM, and LY may be prioritized. Similar trends have been reported for the association of GY with other traits (PHM, TILL, and DF) that may be prioritized [45]. Grain yield and leaf area, length, and width are strongly correlated with one another. As with grain yield, all leaf traits showed high positive correlations with one another, indicating that these traits are important in selection for leaf and grain productivity in amaranth. The trait correlations in this study imply that selection for GSFR, PHM, and LY will be efficient and beneficial in improving grain production in grain amaranth genotypes, as also found in garden peas by Sharma and Sharma [44]. In addition, the positive correlation between grain yield and days to flowering indicated that early flowering would be a viable choice for greater grain yield in this scenario. These results appear to be in line with a general pattern in which early flowering is linked to higher grain yields across species [37, 59], and inconsistent with a general trend for late flowering to be associated with high grain yield across species [37].
Correlation coefficients merely describe the relationship between any two traits and do not reveal potential underlying causes. Path coefficient analysis determines the degree of the relationship between yield and yield components and also helps to clarify the cause-and-effect relationships between characters. Path coefficient analysis was therefore performed to partition the correlation coefficients into the direct and indirect effects of the various traits on yield. The phenotypic and genotypic correlation coefficients of the yield-contributing traits with LY and GY were each divided into direct and indirect effects. LW, BN, LN, PHM, LA, and GY had positive direct effects on LY at both the phenotypic and genotypic levels. This suggests that these traits can be used to develop a selection index that is both reliable and effective for improving the leaf yield of amaranth genotypes. These findings confirm the results of Aycicek and Yildirim [60] for PHF in wheat and, in amaranth genotypes, those of [48, 61] for the number of leaves, Islam [62] for leaf area, and Jangde et al. [34] for leaf width. LL, TILL, and DE, on the other hand, had the greatest negative direct effects on LY at both the phenotypic and genotypic levels, showing that direct selection for these traits would not improve LY in amaranth genotypes. In line with the present findings, LL has been reported to have a negative direct effect on LY in amaranth genotypes [35, 48, 62]. However, since PHM, LL, and TILL had strong indirect effects, mostly through LA, while DM and DE had strong indirect effects, primarily through PHF, LY in amaranth genotypes may also be increased through indirect selection of these traits. TISL had positive direct effects on LY at the genotypic level, whereas AIL had positive direct effects on LY only at the phenotypic level.

The residual effect, which measures the unaccounted variability of the dependent variables (LY and GY), was also estimated. According to the residual effects of 0.1288 and 0.1221, the traits included in the phenotypic and genotypic path coefficient analyses explained 87.12% and 87.79% of the total variance in LY, respectively. These results suggest that the independent variables considered in this study effectively represented the variability of LY in amaranth genotypes. Other factors that caused variation in LY but were not taken into account in the current investigation accounted for the remainder (12.88% at the phenotypic level and 12.21% at the genotypic level).
The path analysis revealed the importance of GSFR, PHM, and PHF for direct selection, since they had considerable positive direct effects on GY at both the phenotypic and genotypic levels. This suggests that GY improvement in amaranth genotypes might be achieved through selection based on these traits. This result is consistent with [35, 48] for plant height in amaranth genotypes. On the other hand, LW, TISL, LL, and DF had negative direct effects at both the phenotypic and genotypic levels, indicating that direct selection based on these traits would not improve GY in amaranth genotypes. The results also agree with the findings of Shrivastav et al. [39], who reported that DF had a negative direct effect on GY in amaranth genotypes.

Table 1.
List and passport data of plant materials included in the study.

Table 5.
Estimated genotypic direct (horizontal) and indirect (off-diagonal) effects of 14 traits on leaf yield of 120 amaranth genotypes in Hawassa University agricultural research site in 2020 and 2021 cropping seasons. R² = 0.9851; residual factor = 0.1221. AIL axillary inflorescence length, GY grain yield, LL leaf length, LW leaf width, BN branch number, LN leaf number, DM days to maturation, PHF plant height at flowering, PHM plant height at maturity, TILL terminal inflorescence laterals length, TISL terminal inflorescence stalk length, LA leaf area, DF days to flowering, DE days to emergence.

Table 7.
Estimated genotypic direct (horizontal) and indirect (off-diagonal) effects of 14 traits on grain yield of 120 amaranth genotypes in Hawassa University agricultural research site in 2020 and 2021 cropping seasons. R² = 0.987; residual factor = 0.1140. AIL axillary inflorescence length, LY leaf yield, GY grain yield, LL leaf length, LW leaf width, BN branch number, LN leaf number, PHF plant height at flowering, PHM plant height at maturity, TILL terminal inflorescence laterals length, TISL terminal inflorescence stalk length, GSFR grain sink filling rate, LA leaf area, DF days to flowering, DE days to emergence.