Genotype by Yield*Trait (GYT) Biplot: a Novel Approach for Genotype Selection based on Multiple Traits

Genotype selection based on multiple traits is a key issue in plant breeding. It has depended on setting a weight for each trait in index selection and a truncation point for each trait in independent culling, and both the weights and the truncation points can be highly subjective. In this paper we propose and demonstrate a novel approach for genotype selection based on multiple traits, the genotype by yield*trait (GYT) biplot, where “trait” can be any breeding objective other than yield; it may be an agronomic trait, a grain quality, processing quality, or nutritional quality trait, or a disease resistance. The GYT biplot ranks genotypes based on their levels in combining yield with other target traits and at the same time shows their trait profiles, i.e., their strengths and weaknesses. Compared to existing methods, this approach is graphical, objective, effective, and straightforward. Underlying the GYT biplot approach is the paradigm shift that genotypes should be evaluated by their levels in combining yield with other traits as opposed to by their levels in individual traits. An oat dataset from multi-year, multi-location trials was used to demonstrate the GYT biplot approach.

The importance of plant breeding to the welfare of mankind cannot be overemphasized, and genotype evaluation, i.e., identifying superior cultivars out of a population of genotypes, is a key part of this process. Genotype evaluation faces two key challenges. The first is genotype by environment interaction (GE) for a key trait, and the second is unfavorable associations among key traits [1][2][3] . GE has been investigated and reported in numerous publications, and a clear road map on how to handle GE in plant breeding has been outlined 4 . Briefly, data from multi-location trials in two or more years are needed to develop a strategy of dealing with GE for a given region and crop. Such multiyear multi-location data can be used to investigate whether there are any repeatable GE patterns. If yes, the patterns can be used as a guide to divide the target region into meaningful subregions or mega-environments (ME). If not, the target region should be treated as a single ME. Genotype evaluation and recommendation should be conducted for individual ME rather than across ME; thereby repeatable GE can be utilized by employing cultivars specifically adapted to each ME. By definition, GE within a ME is random noise. The noise can be canceled out and thereby genotypes be reliably evaluated if genotypes are tested in a sufficient number of trials in the ME. This number is determined by the relative size of genotypic variance versus GE variance within the ME 4 . When tested sufficiently, genotype evaluation can be based mainly on mean performance across trials and supplemented by a measure of stability. GGE (genotypic main effect plus genotype by environment interaction) biplots are an effective tool for dealing with GE for a trait 4,5 .
The current paper addresses the second challenge, i.e., genotype evaluation based on multiple traits. An ideal cultivar has to have superior levels for a number of target traits (breeding objectives). The challenge arises from the fact that target traits are usually unfavorably associated, such that improvement in one trait often leads to reduced levels in one or more other traits. Two strategies have been proposed and used, in tandem or jointly, in dealing with this problem: independent culling and index selection [6][7][8]. Independent culling discards a genotype if its value for a trait is below a minimum requirement, no matter how good the genotype is for other traits. Index selection ranks genotypes based on an index, which is a linear combination of the target traits. The difficulty with these strategies is that both are highly subjective. It is up to the breeder/researcher to set a weight for each trait in index selection and a truncation point for each trait in independent culling. The weights and truncation points can vary from researcher to researcher and from time to time for the same researcher, even for the same dataset. Different sets of weights and/or truncation points can lead to dramatically different selection decisions.
Ottawa Research and Development Center, Agriculture and Agri-Food Canada, 960 Carling Ave., Ottawa, Ontario, K1A 0C6, Canada. Correspondence and requests for materials should be addressed to W.Y. (email: weikai.yan@agr.gc.ca)
A genotype by yield*trait (GYT) biplot approach is proposed in this paper to tackle the problem of genotype evaluation on multiple traits. It is based on the following conceptualization. 1) Yield is the most important trait and all other target traits are important only when combined with high yield. 2) The superiority of a genotype should be judged by its levels in combining yield with other target traits, rather than by its levels in individual traits. In this approach, the genotype by trait (GT) two-way table from a variety trial(s) is first transformed to a genotype by yield*trait (GYT) two-way table, in which each column is the combination of yield and a trait. The GYT table is then displayed in a GYT biplot. The average tester coordination (ATC) view 9 of the GYT biplot is employed to rank genotypes based on their overall superiority across the yield-trait combinations and to show their trait profiles (i.e., strengths and weaknesses), which serves as the basis for genotype evaluation and recommendation.
A dataset of covered oat (Avena sativa L.) from Quebec, Canada will be used as an example in the case study. Covered oat is produced in Canada for human food as well as for animal feed. The hull of the covered oat grain has to be removed when used as food; the part of the oat grain after hull removal is called groat. Oat based food is regarded as healthy food as the oat groat is relatively rich in β-glucan and other soluble fibers, which have been shown to reduce the risk of heart disease, high blood pressure, and type-II diabetes when a certain amount of oat meal is served daily 10,11 . Thus, high groat percentage and β-glucan content are two important breeding objectives for milling oat, only secondary to high grain yield. In addition, good lodging resistance is a highly valued trait by oat growers; it is important for achieving high yield and good quality as well as for easy harvest. High test weight is also a valued trait by both growers and millers for easy storage and transportation. High β-glucan and low oil are desirable for use as milling oat but low β-glucan and high oil are desirable for use as feed oat. Everything being equal, high protein content, early maturity, and large kernels are also preferred. Therefore, these traits are routinely measured in oat variety trials (Table 1). It will be shown that complicated associations exist among these traits and the GYT biplot makes it easy to rank oat genotypes based on their levels of combining yield and other target traits and at the same time to show their strengths and/or weaknesses.

Results
Genotype by trait (GT) biplot. The genotype by trait (GT) data presented in Table 1 are trait means for each of 26 genotypes tested across 30 trials at nine Quebec locations plus one Ontario location in 2015 to 2017. The Pearson correlations among these traits are presented in Table 2. These GT data are approximately displayed in a GT biplot 12 (Fig. 1), which can be used to visualize the associations among traits and the trait profiles of the genotypes. The GT biplot was based on trait-standardized GT data (indicated by "Scaling = 1" and "Centering = 2" on the biplot) and trait-focused singular value partitioning (indicated by "SVP = 2"). A biplot with such settings has the following interpretations. 1) The cosine of the angle between the vectors of two traits approximates the Pearson correlation between them. Thus, an angle smaller than 90° indicates a positive correlation, an angle greater than 90° indicates a negative correlation, and an angle of 90° indicates zero correlation. 2) The angle between a genotype and a trait indicates the relative level of the genotype for the trait. Thus, an acute angle indicates that the genotype is above average for the trait; an obtuse angle indicates that the genotype is below average for the trait; and a right angle indicates that the genotype is average for the trait. 3) The vector length (i.e., the distance to the biplot origin) of a trait indicates how well the trait is represented in the biplot; a relatively short vector indicates that the variation of the trait across genotypes is either small or not well represented in the biplot, the latter being due to its weak correlation, or lack of correlation, with other traits. This can occur when the goodness of fit of the biplot is relatively poor (the goodness of fit of the GT biplot in Fig. 1 is 51.8%). 4) The vector length of a genotype indicates whether it is intermediate for all traits or has clear strengths and/or weaknesses in its trait profile.
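Interpretation 1) can be checked numerically. The following is a minimal sketch, not the GGEbiplot implementation: it assumes a small invented GT table, sample standard deviations (ddof = 1) for the standardization, and trait-focused partitioning in which the singular values are assigned entirely to the traits. Under these assumptions the cosines between trait vectors equal the Pearson correlations exactly when all principal components are kept, and only approximately when the first two are kept, as in a two-dimensional biplot.

```python
import numpy as np

# Hypothetical genotype-by-trait table: 6 genotypes (rows) x 3 traits (columns).
# The numbers are invented for illustration only.
gt = np.array([
    [4.1, 72.0, 4.5],
    [4.8, 70.5, 4.9],
    [3.9, 74.2, 4.2],
    [5.2, 69.8, 5.1],
    [4.4, 73.1, 4.4],
    [4.6, 71.0, 5.0],
])

# Trait-standardize: centre each column and scale to unit SD (ddof=1 is an assumption).
p = (gt - gt.mean(axis=0)) / gt.std(axis=0, ddof=1)

# Singular value decomposition of the standardized table.
u, s, vt = np.linalg.svd(p, full_matrices=False)

# Trait-focused partitioning: all singular values are assigned to the traits.
trait_vectors = vt.T * s  # one row per trait, one column per principal component


def cosine_matrix(vectors):
    """Cosines of the angles between all pairs of row vectors."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return unit @ unit.T


cos_full = cosine_matrix(trait_vectors)       # using all principal components
cos_2d = cosine_matrix(trait_vectors[:, :2])  # what a two-dimensional biplot displays
pearson = np.corrcoef(gt, rowvar=False)

print(np.round(pearson, 3))
print(np.round(cos_2d, 3))
```

The gap between `cos_2d` and `pearson` is one way to see why the accuracy of a biplot reading depends on its goodness of fit.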
Based on these principles, the following observations can be made from Fig. 1. (1) Grain yield (YLD) was negatively correlated with lodging score (LOD) (a larger lodging score indicates more lodging and less lodging resistance) and groat content (GROAT) but it was not strongly associated with other traits. So good lodging resistance was important for high yield, and grain yield and groat content were unfavorably associated. (2) Groat content was positively correlated with lodging score but negatively correlated with β-glucan content (BGL), protein content (PROTEIN), and grain yield, all being unfavorable associations. This indicates that high groat content was poorly combined with other breeding objectives in the tested cultivars. Groat content was also negatively correlated with days to maturity (DTM), meaning that earlier genotypes tended to have higher groat content. (3) β-glucan content was positively correlated with protein content but negatively correlated with test weight (TW), days to maturity, lodging score, and groat content. The negative correlations of β-glucan content with test weight and groat content are challenging unfavorable associations. (4) Kernel weight (KW) was not strongly correlated with any traits, as suggested by its short vector. These statements can be verified from the correlation table (Table 2), even though the goodness of fit of the biplot was only moderate (51.8%).
The GT biplot in Fig. 1 also shows the trait profiles of the genotypes, the accuracy of which depends on the goodness of fit of the biplot. For example, it shows that cultivar Avatar had high groat content and high test weight but low grain yield and low protein content, and it was highly susceptible to lodging; cultivar Hidalgo had high levels of β-glucan content, groat content, and lodging score and had low levels of test weight, days to maturity, and grain yield; Richmond had a trait profile quite opposite to that of Hidalgo.
Despite its usefulness in revealing associations among traits and trait profiles of genotypes, the GT biplot is not very helpful in making decisions on which cultivars to select or recommend and which cultivars to discard or discommend, which are decisions a breeder/researcher must make. The proposed GYT biplot described below was designed to accomplish this.
The GT table (Table 1) was transformed into a GYT table (Table 3), in which each column was a yield-trait combination. For example, YLD*BGL is the combined level of grain yield and β-glucan content, which is a measure of how grain yield and β-glucan content were combined in a genotype. Either low grain yield or low β-glucan content would lower this combined value, and the genotype would thereby be judged unfavorably. The same is true for other yield-trait combinations. The combinations yield*earliness (YLD/DTM) and yield*lodging resistance (YLD/LOD) had the division operator ("/"), as opposed to the multiplication operator ("*") in other trait combinations, to reflect the fact that more days to maturity and a larger lodging score are less desirable. The "/" operator means that the yield value was divided by the trait value, i.e., the trait values were inverted before being multiplied by the yield values. Thus, in the GYT table a larger value is always more desirable. The GYT biplot (Fig. 2) graphically displays the GYT data (Table 3), and the different views of the GYT biplot (Figs 2, 3 and 4) allow the data to be investigated from different angles. Note that yield per se was not included in the GYT data or the GYT biplot as it was incorporated into each of the yield-trait combinations.
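The conversion from a GT table to a GYT table can be sketched as follows. This is an illustration under stated assumptions: the genotype names and trait means are invented, and the helper `build_gyt` is hypothetical rather than part of any published software. Traits for which a larger value is better are multiplied by yield; traits for which a smaller value is better are used as divisors.

```python
# Hypothetical trait means for three genotypes (values invented for illustration).
gt_table = {
    "G1": {"YLD": 5.0, "BGL": 4.0, "GROAT": 72.0, "LOD": 2.0, "DTM": 100.0},
    "G2": {"YLD": 4.5, "BGL": 4.4, "GROAT": 74.0, "LOD": 3.0, "DTM": 96.0},
    "G3": {"YLD": 5.4, "BGL": 3.8, "GROAT": 70.0, "LOD": 1.5, "DTM": 102.0},
}

HIGHER_IS_BETTER = ("BGL", "GROAT")  # combined with yield by multiplication
LOWER_IS_BETTER = ("LOD", "DTM")     # combined with yield by division


def build_gyt(gt):
    """Return a genotype by yield*trait table; a larger value is always better."""
    gyt = {}
    for name, traits in gt.items():
        yld = traits["YLD"]
        row = {}
        for trait in HIGHER_IS_BETTER:
            row[f"YLD*{trait}"] = yld * traits[trait]
        for trait in LOWER_IS_BETTER:
            # Assumes the trait value is never 0; a score with 0 as the best
            # level would need to be reversed first instead of used as a divisor.
            row[f"YLD/{trait}"] = yld / traits[trait]
        gyt[name] = row  # yield itself is deliberately not kept as a column
    return gyt


gyt_table = build_gyt(gt_table)
for name, row in gyt_table.items():
    print(name, row)
```

The units of the resulting columns differ, which is harmless because the table is standardized before it is displayed in a biplot.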

Associations among various yield-trait combinations.
Since all yield-trait combinations have yield as a component, they tend to be positively correlated with each other, as indicated by the acute angles in the biplot (Fig. 2). This is an important feature of the GYT biplot, as opposed to the GT biplot (Fig. 1); it allowed genotypes to be graphically and meaningfully ranked based on their yield-trait combinations (below). Nevertheless, strong trait associations observed in the GT biplot (Fig. 1), e.g., the positive correlation between β-glucan content and protein content and the negative correlations of test weight with these two traits (Fig. 1 and Table 2), can still be seen in the GYT biplot, as shown by the magnitudes of angles among YLD*TW, YLD*PROT, and YLD*BGL.
Trait profiles of the genotypes. Figure 3 is the polygon view or "which-won-where" view 9 of the same biplot as in Fig. 2. This view is particularly useful for visualizing the trait profiles of the genotypes. The irregular polygon was formed by connecting the genotypes with the longest vectors in all directions. For each polygon side a line was drawn to start from the biplot origin and to be perpendicular to the polygon side. These lines divided the yield-trait combinations into two sectors; corresponding to each sector there was a polygon vertex. The geometry of the biplot determines that the genotype placed on a vertex has the largest values for the yield-trait combinations placed within the corresponding sector. Thus, Akina (and closely placed Kara) had the largest values for YLD*BGL, YLD*PROT, and YLD/LOD, meaning that these two cultivars were the best in combining grain yield with β-glucan content, protein content, and lodging resistance. Similarly, Unnamed1 (and closely placed Nicolas) had the highest levels of YLD/DTM, YLD*KW, YLD*GROAT, and YLD*TW, meaning that these two cultivars were the best in combining grain yield with early maturity, kernel weight, groat content, and test weight. From Fig. 3 it is also apparent that OA1436-1 had a contrasting trait profile to that of Akina and Kara although all three cultivars had good levels of yield.
Superiority rank of the genotypes based on their yield-trait combinations. Figure 4 is the ATC view of the same biplot as Figs 2 and 3 except that it was based on genotype-focused singular value partitioning (indicated by "SVP = 1" on the biplot), so as to focus on comparison among genotypes 13. The small circle in the biplot represents the placement of the "average yield-trait combination," which is determined by the coordinates of all yield-trait combinations included in the biplot. The line with a single arrow passes through the biplot origin and the average yield-trait combination and is called the average tester axis (ATA). The arrow points to higher mean values for the genotypes, across all yield-trait combinations. The ATA serves the purpose of ranking the genotypes based on their overall superiority or usefulness. The line with two arrows pointing outwards passes through the biplot origin and is perpendicular to the ATA. This double-arrowed line serves to separate genotypes better than average (placed on its right, on the same side as the ATA arrow) from those poorer than average (placed on the left side). This separation suggests that the researcher focus on the genotypes ranked better than average. The double-arrowed line also helps indicate whether a genotype had an all-rounded or balanced trait profile or had obvious strengths and/or weaknesses; the latter determines how a "useful" genotype should be used in terms of environmental adaptation and/or end use. Genotypes placed close to the ATA (i.e., with short projections onto the double-arrowed line) tend to have balanced trait profiles whereas those placed away from the ATA in either direction tend to have obvious strengths and/or weaknesses. From Fig. 4, the best ranked cultivars based on the yield-trait combinations included: Unnamed1 > Nicolas > Akina > OA1426-2 > Kara > OA1436-1.
Avatar and Hidalgo, placed on the far left side of the biplot, were ranked the poorest, even though they were among the best in groat content (Table 1). In addition to ranking genotypes based on their overall superiority, Fig. 4 also shows the trait profiles of the genotypes (although Fig. 3 is the best for this purpose). Specifically, Fig. 4 shows that Nicolas and Unnamed1 were balanced for various traits; Akina and Kara were strong in β-glucan content, protein content, and lodging resistance but poor in test weight; and OA1436-1 was strong in test weight but poor in β-glucan content, protein content, and lodging resistance. This information is important for deploying the superior but different cultivars to their most suitable environments and end uses. In addition, regardless of their overall superiority, all genotypes placed below the ATA tended to have relatively good levels of test weight, groat content, kernel weight, and/or early maturity, but relatively low levels of β-glucan, lodging resistance, and/or protein content. The opposite is true for genotypes placed above the ATA.
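The geometry of the ATC view can be sketched numerically. The following is a minimal illustration, not the GGEbiplot implementation: the GYT values are invented, sample standard deviations (ddof = 1) are assumed for the standardization, and the scaling factor that equalizes genotype and tester vector lengths is omitted because it does not change the ranking. Each genotype is projected onto the average tester axis (for ranking) and onto the perpendicular line (for judging how balanced its profile is).

```python
import numpy as np

# Hypothetical GYT table: 5 genotypes x 3 yield-trait combinations (invented values).
names = ["G1", "G2", "G3", "G4", "G5"]
gyt = np.array([
    [20.0, 360.0, 2.5],
    [18.0, 330.0, 1.5],
    [24.0, 350.0, 3.0],
    [15.0, 300.0, 1.2],
    [21.0, 380.0, 2.0],
])

# Standardize each yield-trait combination (ddof=1 is an assumption).
p = (gyt - gyt.mean(axis=0)) / gyt.std(axis=0, ddof=1)
u, s, vt = np.linalg.svd(p, full_matrices=False)

# Genotype-focused partitioning: singular values go to the genotypes.
geno_xy = u[:, :2] * s[:2]   # genotype coordinates on PC1 and PC2
tester_xy = vt[:2, :].T      # yield-trait combination coordinates

# Average tester axis (ATA): from the origin through the mean tester position.
avg_tester = tester_xy.mean(axis=0)
ata = avg_tester / np.linalg.norm(avg_tester)
perp = np.array([-ata[1], ata[0]])  # direction of the double-arrowed line

along = geno_xy @ ata    # larger value = higher overall superiority
across = geno_xy @ perp  # larger absolute value = less balanced profile

ranking = [names[i] for i in np.argsort(-along)]
above_average = [n for n, a in zip(names, along) if a > 0]
print(ranking, above_average)
```

The sign ambiguity of the singular vectors does not matter here, because genotype and tester coordinates flip together and the projections are unchanged.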

Cultivar evaluation based on the GGE biplot for yield vs. that on the GYT biplot for multiple traits.
Presented in Fig. 5 is the ATC view of the GGE biplot for grain yield for the 26 cultivars tested in the 30 trials. No repeatable GE patterns can be seen in the GGE biplot, meaning that the 30 trials should be regarded as random samples of a single ME. The ATC view of the GGE biplot is therefore suitable for evaluating the genotypes on their mean yield and stability across the environments. The ATA points to higher mean yield and the double-arrowed line points to greater instability in either direction. Seven cultivars showed a clear yield advantage over other cultivars. They were: Unnamed1 > Nicolas > OA1436-1 > Akina > Kyron > OA1426-2 > Kara. It can be noted that this rank is different from that based on the GYT biplot (Fig. 4). Among the seven high yielding cultivars, Kyron and OA1436-1 were ranked lower in the GYT biplot, due to their poor levels in combining yield with groat content, β-glucan content, and/or test weight. The rank change between the GGE biplot for yield and the GYT biplot for multiple traits highlighted and validated the usefulness of the GYT biplot in identifying superior cultivars; superior cultivars must be high yielding but not all high yielding cultivars are superior for a given end use.

Discussion
Although numerous papers have been published and continue to be published on GE analysis of single traits, publications on genotype evaluation based on multiple traits are few. This may be taken to mean that genotype evaluation based on multiple traits is no longer an issue. As senior plant breeders the authors can testify that this is not the case. The fact is that plant breeders and statisticians working with them have chosen to accept the reality that this issue is too complicated to tackle and that there is no better way than depending on the breeder/researcher's personal judgement to set a subjective weight and a subjective truncation point for each trait when making selection decisions. The GYT biplot proposed in this paper provides a novel approach to genotype evaluation based on multiple traits. This approach is comprehensive and effective, as it graphically ranks the genotypes based on their levels in combining yield with various target traits and at the same time shows the strengths and weaknesses of the genotypes. The rank indicates the usefulness of the genotypes and the strengths and weaknesses suggest how the genotypes should be used. This approach is objective because no subjective weights and truncation points are involved. The selection results depend only on the traits that are included in the analysis. It is advisable to include only those traits that are essential for the success of a cultivar in GYT biplot analysis. One novelty of this approach is the paradigm shift that the superiority of a genotype should not only be measured by its levels in individual traits but more importantly by its levels in combining yield with other target traits. This paradigm shift emphasizes the importance of yield relative to other breeding targets, which is in line with common sense and practice in plant breeding and cultivar evaluation.
Indeed, yield is the only trait that can determine the usefulness of a genotype by itself while other traits (agronomic traits, quality traits, or disease resistances) are valuable to producers only when they are combined with sufficiently good yield levels. For example, an oat genotype with a β-glucan level of 8% would be a highly valuable breeding parent. However, if its yield is only 50% of the best cultivars, then it will not be an acceptable cultivar. Similarly, a genotype with extremely good lodging resistance but very low yield would have no place in growers' fields. The same can be said of all other traits. Thus, levels of yield-trait combinations are more meaningful than levels in individual traits in selecting superior cultivars (though not necessarily so in selecting breeding parents). The relation between yield and other target traits for a crop cultivar may be compared to that between the skin and the hair for a fur; a trait gains its value only when associated with a yield level.
Another novelty of the proposed approach is its use of the ATC graph of the biplot in multi-trait analysis. The ATC view was initially developed for GGE biplots such that genotypes can be visually evaluated for their mean performance and stability across environments for a trait 9 . However, this view is valid only when the following conditions are met: 1) the data from all environments (or columns in the two-way table, in a generic term) have the same unit (or unit-free in case of standardized data), and 2) there are no strong negative correlations between individual environments and the average environment. For a GT biplot (Fig. 1), the first condition is met because it is based on trait-standardized data, but the second condition is rarely met due to strong negative correlations among traits. Also, in the GT data (Table 1) some traits are so presented that a large value means less desirable, which makes the ATC view meaningless. However, these conditions are all met in the GYT biplot (Fig. 2), making the ATC view of the GYT biplot a meaningful and effective tool to rank genotypes based on various yield-trait combinations and to show the strengths and weaknesses of the genotypes.
The GYT biplot analysis is straightforward because the yield-trait combinations can be readily calculated from the GT data and because biplot analysis is now routinely used by many researchers. For those who are not yet using biplot analysis, a superiority index integrating all yield-trait combinations can be easily calculated using a spreadsheet. This involves a few simple steps: 1) generating the GYT table (Table 3) from the GT table (Table 1), 2) standardizing the GYT table to form a standardized GYT table (Table 4), and finally, 3) taking the mean across the standardized yield-trait combination values for each genotype, which can be used to rank the genotypes (last column, Table 4). The strengths and weaknesses of each genotype can be appreciated by examining Table 4 as well. In fact, the GYT biplot (Fig. 2) is simply a graphical approximation of the standardized GYT data (Table 4). Nevertheless, the GYT biplot is highly recommended as it is much more effective than the GYT table.
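The spreadsheet calculation described above can equally be sketched in a few lines of code. This is an illustration under stated assumptions: the GYT values are invented, the helper `superiority_index` is hypothetical, and the sample standard deviation is assumed for the standardization.

```python
from statistics import mean, stdev

# Hypothetical GYT table (values invented for illustration); in every column
# a larger value is more desirable.
columns = ["YLD*BGL", "YLD*GROAT", "YLD/LOD"]
gyt = {
    "G1": [20.0, 360.0, 2.5],
    "G2": [18.0, 330.0, 1.5],
    "G3": [24.0, 350.0, 3.0],
    "G4": [15.0, 300.0, 1.2],
}


def superiority_index(table):
    """Standardize each column, then average across columns for each genotype."""
    names = list(table)
    n_cols = len(next(iter(table.values())))
    standardized = {name: [] for name in names}
    for j in range(n_cols):
        col = [table[name][j] for name in names]
        col_mean, col_sd = mean(col), stdev(col)  # sample SD is an assumption
        for name in names:
            standardized[name].append((table[name][j] - col_mean) / col_sd)
    index = {name: mean(values) for name, values in standardized.items()}
    return standardized, index


standardized, index = superiority_index(gyt)
ranked = sorted(index, key=index.get, reverse=True)
for name in ranked:
    profile = ", ".join(f"{c}={v:+.2f}" for c, v in zip(columns, standardized[name]))
    print(f"{name}: index={index[name]:+.2f} ({profile})")
```

The standardized values in each row play the role of the trait profile: positive entries mark strengths and negative entries mark weaknesses relative to the tested set.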
It may be argued that the GYT approach puts too much weight on yield relative to other traits. However, this approach reflects the consideration and reality of the oat value chain (and possibly the value chains of other crops). The first consideration of oat growers in choosing oat cultivars is their yield levels, as long as they meet the minimum quality requirements of the end users. Although millers benefit directly from high quality (high groat content and high β-glucan content, in particular), they also understand the importance of grain yield to oat growers such that high grain yield combined with best possible quality is also their criterion when recommending oat cultivars. Their purpose of doing so is to ensure a reliable supply of oat grain with sufficiently good quality at regular prices, as opposed to a supply of best quality grain at higher prices. Moreover, the GYT biplot does allow the choice of oat cultivars for specific adaptations and end uses. For example, Fig. 4 shows that Nicolas and Unnamed1 ranked the best and had all-rounded or balanced trait profiles, and therefore can be recommended as all-purpose cultivars in Quebec and similar regions. Akina and Kara were good in combining yield with β-glucan, protein, and lodging resistance, though poor in test weight. They are therefore more suitable for use as milling oat for environments where lodging is a key problem. In contrast, OA1436-1 was good in combining yield with test weight, but was poor in β-glucan, protein, and lodging resistance. It is therefore more suitable for use as feed and for growing in environments where lodging is less of a problem.

Methods
The data source. The sample dataset (Table 1) was derived from the 2015 to 2017 Quebec provincial oat registration and recommendation trials, organized by Réseaux Grandes Cultures du Québec (RGCQ) and Centre de recherche sur les grains inc. (CÉROM). These trials were conducted annually at nine locations representing the crop zones of Quebec, plus at Ottawa, Ontario, making up 10 locations each year. A randomized complete block design with three replications was used in each trial. Each year about 45 covered oat cultivars or breeding lines were tested, and 26 cultivars were tested in all three years. In addition to grain yield, data on agronomic traits (days to maturity, plant height, lodging score) and grain quality traits (kernel weight, test weight, and hull percentage, which is the complement of groat content) were collected for each genotype at all locations. Groat content, β-glucan content, oil content, and protein content were determined for composite samples across replications for each genotype from three locations each year. The data in Table 1 are mean values for each genotype-trait combination across the trials.
The genotype by yield*trait (GYT) table. The GYT table (Table 3) was obtained as follows. For groat content, β-glucan content, protein content, test weight, and kernel weight, the values for the yield-trait combinations were obtained by multiplying the yield value by the trait value for each genotype (e.g., YLD*BGL). For lodging score and days to maturity, which were so measured that a larger value is less desirable, the values for the yield-trait combinations were obtained by dividing the yield value by the trait value for each genotype (e.g., YLD/LOD). Some traits, e.g., lodging and disease scores, are usually measured with 0 as the best and a larger value as less desirable. In this case it is advisable to reverse the values such that 0 means worst and a larger value means more desirable before calculating the yield*trait values. This ensures that in the GYT table a larger value is always more desirable. The units for the yield-trait combinations are not important as it is the standardized data that are used in genotype evaluation.
Data standardization. The GT table or the GYT table was standardized so that the mean for each trait or yield-trait combination becomes 0 and the variance becomes 1 (e.g., see Table 4). The standardization was performed as:
$$P_{ij} = \frac{T_{ij} - \bar{T}_j}{s_j}$$
where $P_{ij}$ is the standardized value of genotype i for trait or yield-trait combination j in the standardized table, $T_{ij}$ is the original value of genotype i for trait or yield-trait combination j in the GT or GYT table (Tables 1 and 3), $\bar{T}_j$ is the mean across genotypes for trait or yield-trait combination j, and $s_j$ is the standard deviation for trait or yield-trait combination j.
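Construction of a GT biplot. The GT biplot (Fig. 1) was constructed from the first two principal components (PC1 and PC2) obtained by subjecting the standardized GT table to singular value decomposition (SVD):
$$P_{ij} = (d\lambda_1^{\alpha}\zeta_{i1})(\lambda_1^{1-\alpha}\tau_{1j}/d) + (d\lambda_2^{\alpha}\zeta_{i2})(\lambda_2^{1-\alpha}\tau_{2j}/d) + \varepsilon_{ij}$$
where $\zeta_{i1}$ and $\zeta_{i2}$ are the eigenvector values for PC1 and PC2, respectively, for genotype i; $\tau_{1j}$ and $\tau_{2j}$ are the eigenvector values for PC1 and PC2, respectively, for trait j; $\varepsilon_{ij}$ is the residual from fitting PC1 and PC2 for genotype i on trait j; and $\lambda_1$ and $\lambda_2$ are the singular values for PC1 and PC2, respectively. $\alpha$ is the singular value partitioning factor. When $\alpha = 1$ (i.e., SVP = 1 in terms of GGEbiplot), the biplot is said to be genotype-focused, and is suitable for comparing genotypes. When $\alpha = 0$ (i.e., SVP = 2), the biplot is said to be trait-focused, and is suitable for visualizing correlations among traits. Genotype by trait relations are not affected by the choice of $\alpha$. The scalar d is chosen such that the length of the longest vector among genotypes is equal to that among traits, which is important for generating a functional biplot 3. The biplot is generated by plotting $d\lambda_1^{\alpha}\zeta_{i1}$ against $d\lambda_2^{\alpha}\zeta_{i2}$ for genotypes and $\lambda_1^{1-\alpha}\tau_{1j}/d$ against $\lambda_2^{1-\alpha}\tau_{2j}/d$ for traits in the same plot.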
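The roles of the singular value partitioning factor α and the scalar d can be illustrated with a short sketch. It is not the GGEbiplot implementation: the data are invented, the function `biplot_coordinates` is hypothetical, and a sample standard deviation (ddof = 1) is assumed for the standardization. Singular values are split between genotype and trait coordinates according to α, and d is then chosen so that the longest genotype vector and the longest trait vector have equal length.

```python
import numpy as np


def biplot_coordinates(p, alpha):
    """Return 2-D genotype and trait coordinates for a standardized table p."""
    u, s, vt = np.linalg.svd(p, full_matrices=False)
    geno = u[:, :2] * s[:2] ** alpha             # alpha = 1: genotype-focused
    trait = vt[:2, :].T * s[:2] ** (1.0 - alpha)  # alpha = 0: trait-focused
    # Choose d so the longest genotype vector equals the longest trait vector.
    longest_geno = np.linalg.norm(geno, axis=1).max()
    longest_trait = np.linalg.norm(trait, axis=1).max()
    d = np.sqrt(longest_trait / longest_geno)
    return geno * d, trait / d


# Hypothetical table: 5 genotypes x 4 traits (invented values).
raw = np.array([
    [5.0, 4.0, 72.0, 2.0],
    [4.5, 4.4, 74.0, 3.0],
    [5.4, 3.8, 70.0, 1.5],
    [4.1, 4.6, 75.0, 3.5],
    [5.1, 4.1, 71.5, 2.5],
])
p = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)

geno_1, trait_1 = biplot_coordinates(p, alpha=1.0)  # genotype-focused
geno_0, trait_0 = biplot_coordinates(p, alpha=0.0)  # trait-focused

# Inner products (genotype by trait relations) do not depend on alpha or d.
print(np.allclose(geno_1 @ trait_1.T, geno_0 @ trait_0.T))
```

Because d multiplies one set of coordinates and divides the other, it changes only the appearance of the plot, not the rank-two approximation of the table.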
Construction of a GYT biplot. The procedures for constructing a GYT biplot (Fig. 2) are exactly the same as for constructing a GT biplot, except that the term "trait" should be replaced with "yield-trait combination."
Construction of a GGE biplot. The GGE biplot (Fig. 5) presented in this paper was generated the same way as the GT biplot (Fig. 1) except that the term "trait" is replaced with "environment." It is useful to note that there are different types of GGE biplots, depending on how the data are scaled before being subjected to SVD 3.
The GT biplot, GYT biplot, and GGE biplot were generated using the GGEbiplot software 3. A recent addition to this software is the ability to directly transform a GT biplot into a GYT biplot.
Data availability statement. All relevant data are included in the manuscript.
Ethical approval and informed consent. This work has no bearing on ethical issues.