A singular value decomposition Bayesian multiple-trait and multiple-environment genomic model

Today, breeders perform genomic-assisted breeding to improve more than one trait. However, frequently there are several traits under study at one time, and the implementation of current genomic multiple-trait and multiple-environment models is challenging. Consequently, we propose a four-stage analysis for multiple-trait data in this paper. In the first stage, we perform singular value decomposition (SVD) on the resulting matrix of trait responses; in the second stage, we perform multiple trait analysis on transformed responses. In stages three and four, we collect and transform the traits back to their original state and obtain the parameter estimates and the predictions on these scale variables prior to transformation. The results of the proposed method are compared, in terms of parameter estimation and prediction accuracy, with the results of the Bayesian multiple-trait and multiple-environment model (BMTME) previously described in the literature. We found that the proposed method based on SVD produced similar results, in terms of parameter estimation and prediction accuracy, to those obtained with the BMTME model. Moreover, the proposed multiple-trait method is atractive because it can be implemented using current single-trait genomic prediction software, which yields a more efficient algorithm in terms of computation.


Introduction
Breeders often want to improve more than one trait simultaneously in their breeding programs and thus conduct various experiments. For example, Teixeira et al. (2016) reported that a breeding program in Brazil measured 41 pig traits obtained by crossing 345 F2 pig populations of Brazilian Piau × commercial pigs. To analyze this type of experiment, breeders implement one of the following two approaches: (i) they perform a univariate analysis (one trait at a time), therefore ignoring the correlation between traits that does not allow to improve either parameter estimates or prediction accuracy, or (ii) they perform a multiple-trait analysis, which may not only take into account the correlation between traits but may also significantly increase the computing intensity.
Implementing first approach is valid when the correlation between traits is low or close to zero, but it is less desirable when the correlation between traits is moderate to strong. However, implementing the second approach is sometimes challenging-for example, when there are a large number of traits, and under these circumstances, breeders opt for the first approach. Furthermore, implementing the second approach is also challenging because early breeding programs start with at least 1000 lines that are evaluated in multiple environments, which complicates the analysis, as including genotype × environment (G × E) increases the dimensionality of the data considerably (Chiquet et al. 2013).
As mentioned above, multiple-trait analysis improves parameter estimates and prediction accuracy. With regard to parameter estimates, Schulthess et al. (2017) found that multiple-trait analysis improves parameter estimates, while Calus and Veerkamp (2011) found modest improvement using multiple-trait analysis compared to separate-trait analysis for prediction accuracy; these authors also showed that the performance of multiple-trait analysis depends considerably on whether only some traits are missing in only some individuals or in all individuals. Jia and Jannink (2012) also showed evidence favoring multiple-trait analysis compared to single-trait analysis, finding that the genetic correlation between traits is the basis for the benefit of multiple-trait analysis. Jiang et al. (2015) later arrived at the same conclusion in favor of multiple-trait analysis. Montesinos-López et al. (2016) also found modest improvement in the prediction accuracy of multiple-trait analysis when comparing correlated traits to the analysis that assumes null correlation between traits. Along these lines, He et al. (2016) concluded that modeling multiple traits could improve the prediction accuracy for correlated traits in comparison to univariate-trait analysis. Schulthess et al. (2017) also found that multiple-trait analysis is better in terms of prediction accuracy than separate-trait analysis, pointing out that the multiple-trait model is better when the degree of relatedness between genotypes is weaker.
There is evidence suggesting that even when traits are correlated, genomic-enabled prediction accuracy is not improved, as stated by Montesinos-López et al. (2017a) and as shown by Märtens et al. (2016), who compared multipletrait analysis to single-trait analysis and found no difference in terms of prediction accuracy. This issue was also documented by Oliveira and Teixeira-Pinto (2015) who proved this result by stating that, in the multivariate linear regression case (when the covariates in each equation are the same), even if the errors are strongly correlated, the multivariate model gives the same result (both point estimates and standard errors) as fitting individual regressions with ordinal least squares for each outcome, despite the level of correlation between the errors.
As a breeding tool, genomic selection (GS) uses all available molecular markers (Meuwissen et al. 2001) to design genomic-assisted breeding programs and develop new marker-based models for genetic evaluation. GS provides opportunities to obtain higher rates of genetic gain than traditional phenotypic selection in less time and at a reasonable cost. For example, in animal breeding, GS allows animal scientists to select young animals early in life that do not have records, greatly reducing evaluation costs and generation intervals when compared to the traditional progeny test schemes (Schaeffer 2006;Boichard et al. 2016). In general, for traits that have a long generation time or are difficult to evaluate (i.e., insect resistance, breadmaking quality, and others), GS is cheaper and/or easier than traditional phenotypic selection because more candidates can be characterized for a given cost, thus enabling increased selection intensity. Hence, GS has a number of merits over traditional selection because it reduces selection duration and increases selection accuracy, intensity, efficiency, and gains per unit of time. In addition, it saves time and financial investment, along with producing reliable results (Rutkoski et al. 2011;Desta and Ortiz 2014). This enables faster development of improved crop varieties to cope with the challenges of climate change and the decrease in arable land (Bhat et al. 2016).
Therefore, we propose an alternative method for analyzing multiple-trait and multi-environment data-one that takes into account the correlation between traits. This model will be useful for analyzing multiple-trait and multienvironment data because the linear predictor may include the following interaction terms: environment × trait, genotype × trait, and three-way interaction (environment × genotype × trait), assuming an unstructured variance-covariance matrix in the genetic and residual covariance matrices of traits and an identity matrix for the correlation matrix between environments.
The proposed method consists of four steps. In the first step, we transform the original matrix of response variables into a matrix of response variables of the same dimension but between uncorrelated transformed traits using singular value decomposition (SVD), which is equivalent to using principal component analyses (PCA). This first step is performed ignoring all the information of the design effect and other covariates. In the second step, given that the traits are not correlated, we apply a single-trait analysis where we can take into account the design effect, along with the effects of genotypes, environments, genotype × environment interaction, and other covariates, if they are available. In the third step, we collect and put together all the parameter estimates of the single analysis, which are transformed in the fourth step to obtain the parameter estimates and/or predictions for the traits in the original scale of the multiple-trait and multiple-environment data. Our approach has the same goal as the canonical transformation method proposed by Thompson (1977), which involves using special matrices to transform the observations on several correlated traits into new variables that are uncorrelated to each other, which means that these new variables can be analyzed as single-trait analysis, but the results (predictions) are transformed back to the original scale of the observations (Mrode 2014). However, our method is different to the Thompson (1977) method since our approach directly decorrelates the matrix of response variables with the SVD, while the Thompson (1977) method transforms the variance of the response var(Y) = Σ t + R, such that QRQ T = I and QΣ t Q T = W, where I is an identity matrix and W is a diagonal matrix, of course assuming that Σ t and R are positive definite matrices and that there is a matrix Q. More details of this method can be found in chapter 6 of the book by Mrode (2014). A detailed example of the Thompson (1977) method is available in Appendix E, section E.1, in the book by Mrode (2014). The Thompson (1977) method has the inconvenience that if the (co)variances R and Σ t are unknown, the transformations mentioned above need to be applied at each iteration of the Henderson mixed model equations. For this reason, it's implementation is not straightforward using current univariate software.
The advantage of the proposed alternative method is that the analysis can be performed directly using the current software for univariate genomic selection and prediction. However, it is important to point out that when the distribution of the traits has considerably departed from normality, the proposed method does not guarantee independence between the transformed traits, a key assumption for the successful implementation of the proposed model, since for non-normal traits lack of correlation does not imply independence. Also, the proposed method can be implemented to perform the analysis even when there are some missing traits per individual and some individuals are missing in some environments, as long as the level of unbalance is not strong.

Statistical models
Multiple-trait multiple-environment model Since genotype × environment interaction is of paramount importance in plant breeding, the following univariate linear mixed model is usually used for each trait: where y ij represents the normal response from the jth line in the ith environment (i = 1, 2, …, I, j = 1, 2, …, J). E i represents the effect of the ith environment and is assumed as a fixed effect, g j represents the random effect of the genomic effect of the jth line, with g = (g j , …, g J ) T~N 0; σ 2 1 G g À Á , σ 2 1 denotes the genomic variance and G g is of order J × J and represents the genomic relationship matrix (GRM) and is calculated using the Van Raden (2008) method as G g = ZZ T p , where p denotes the number of markers and Z the matrix of markers of order J × p. The G g covariance matrix is constructed using the observed similarity at the genomic level between lines, rather than the expected similarity based on pedigree. gE ij is the random interaction term between the genomic effect of the jth line and the ith environment where gE = (gE 11 , …, gE IJ ) T~N 0; σ 2 2 I I G À Á , σ 2 2 denotes the variance of the interaction term of genotype by environment, and e ij is a random error term associated with the jth line in the ith environment distributed as N(0, σ 2 ), with σ 2 denoting the residual variance. This model is usually used for each of the l = 1, …, L traits, where L denotes the number of traits under study. Next we will present the multivariate version of model (1); for this reason, first we provide the notation for the matrix variate normal distribution, which is a generalization of the multivariate normal distribution. In particular, let the (n × p) random matrix, M, be distributed as matrix variate normal distribution denoted as M~NM n×p (Η, Ω, Σ), if and only if, the (np × 1) random vector vec(M) is distributed as multivariate normal denoted as N np (vec(Η), Σ ⊗ Ω); therefore, NM n×p denotes the (n × p) dimensional matrix variate normal distribution, Η is a (n × p) location matrix, Σ is a (p × p) first covariance matrix, and Ω is a (n × n) second covariance matrix (Srivastava and Khatri 1979). vec(.) and ⊗ are the standard vector operator and Kronecker product, respectively.
To account for the correlation between traits, all of the L traits given in Eq. (1) are jointly modeled in a whole multiple-trait, multiple-environment mixed model as follows: where Y is of order n × L, X is of order n × I, β is of order I × L, Z 1 is of order n × J, b 1 is of order J × L and contains the first interaction term genotype × trait, Z 2 is of order n × IJ, b 2 is of order IJ × L and contains the second interaction term genotype × environment × trait, and e is of order n × L, with b 1 distributed under matrix variate normal distribution as NM J×L (0, G g , Σ t ), where Σ t is the unstructured genetic (co)variance matrix of traits of order L × L, b 2~N M JI×L (0, where Σ E is an unstructured (co)variance matrix of order I × I, and e~NM n×L (0, I n , R e ), where R e is the unstructured residual (co)variance matrix of traits of order L × L, and G g is the GRM described above. The Bayesian multiple-trait and multiple-environment (BMTME) model resulting from Eq. (2) was implemented by Montesinos-López et al. (2016).
First, we provided a modified version of the original BMTME model proposed by Montesinos-López et al. (2016), and in the next section, we will provide the modified Gibbs sampler for this modified BMTME model.

Gibbs sampler for the BMTME model
Outlined below is the Gibbs sampler for estimating the parameter of interest in the BMTME model. While the order is somewhat arbitrary, we suggest the following: Step 1. Simulate β according to the normal distribution given in Supplementary material (A.1).
Step 2. Simulate b 1 according to the normal distribution given in Supplementary material (A.2).
Step 3. Simulate b 2 according to the normal distribution given in Supplementary material (A.3).
Step 4. Simulate Σ t according to the inverse Wishart (IW) distribution given in Supplementary material (A.4).
Step 5. Simulate Σ E according to the IW distribution given in Supplementary material A (A.5).
Step 6. Simulate R e according to the IW distribution given in Supplementary material A (A.6).
Step 7. Return to step 1 or terminate when chain length is adequate to meet convergence diagnostics.
The main differences between this Gibbs sampler and that given by Montesinos-López et al. (2016) are: (i) that this modified Gibbs sampler assumes an unstructured variance-covariance matrix for environments, while the original BMTME model assumes a diagonal variance-covariance matrix for environments, and (ii) that the original BMTME model used non-informative priors based on the Half-t distribution of each standard deviation term and uniform priors on each correlation of the covariance matrices of traits (genetic and residual). The modified BMTME model presented here assumes weak informative priors not based on the Half-t distribution of each standard deviation (details of the hyperparameters of the BMTME model are given in Supplementary material). The hyperparameters for the BMTME model were set similar to those used in the BGLR software (Pérez-Rodríguez and de los Campos 2014). The modified full conditional of this modified BMTME model, which supports the Gibbs sampler given above, is provided in Supplementary material. Next, we will describe the parameterization of the model given in Eq.
(2) to develop the proposed alternative method that is based on SVD.

BMTME_Thompson version
Following Thompson (1977) and Ducrocq and Chapuis (1993), we define Q as a matrix of order L × L such that QΣ t Q T = D t , where D t is a diagonal matrix also of order L × L and QR e Q T = I t . The Q matrix always exists and can be calculated as Q = L T P, where P ¼ U e B À0:5 e U T e , where U e and B e are obtained by applying the SVD to R e ¼ U e B e U T e , while L T is obtained also by applying the SVD to PΣ t P T = LD t L T . Then, by applying a linear transformation to Eq. (2), we obtain: Then note that: Since D t and I t are diagonal matrices of order t × t, the full conditional distributions for the transformed random effects b & 1 and b & 2 of the BMTME model are: where e It is important to point out that the full conditionals of b & 1 and b & 2 are diagonals for traits; for this reason, these full conditionals can be sampled independently for each trait, as: Full conditional for b &ðlÞ 1 for l = 1, 2, …, L where e Σ b &ðlÞ Full conditional for b &ðlÞ 2 for l = 1, 2, …, L where e Σ b &ðlÞ Therefore, the Gibbs sampler for the BMTME Thompson version should be: Step 1. Simulate β according to the normal distribution given in Supplementary material (A.1).
Step 5. Simulate Σ t according to the IW distribution given in Supplementary material (A.4).
Step 6. Simulate Σ E according to the IW distribution given in Supplementary material (A.5).
Step 7. Simulate R e according to the IW distribution given in Supplementary material (A.6).
Step 8. Return to step 1 or terminate when chain length is adequate to meet convergence diagnostics.
The Gibbs sampler for the BMTME Thompson version is similar to the original modified Gibbs sampler except that steps 2 and 3 were replaced for univariate sampling for the transformed random effects (b 1 and b 2 ), which are back transformed in Step 4. The advantage of the BMTME Thompson version is that it allows sampling the random effects of b 1 and b 2 independently for each trait, which allows improving the speed of the Gibbs sampler because it can be parallelized. However, the remaining parameters (β, Σ t , Σ E , and R e ) are sampled exactly as the original Gibbs sampler.

BMTME_Approx model with SVD
An alternative but only approximate method is to transform the matrix of response variables with SVD as Y = UDV T , where U and V are orthogonal matrices called the left and right singular vectors, respectively, of dimensions n × n and L × L, while D is a rectangular diagonal matrix of the singular values of order n × L (where only the first L values of D are positive while the rest are zeros). Therefore, the reparametrized model given in Eq. (2) can be rewritten as: where and e * is of order n × L. Note that the parametrized model does not have the same conceptual definition as the original model (2), first, because β * is a random matrix and not an unknown constant matrix as the matrix of fixed effects, β, and second, because b Ã 1 , b Ã 2 , and e * are no longer independent. However, if we fix V as part of the subjacent data structure, and we suppose that Σ t = VD t1 V T and R e = VD te V T (they have a restricted parameter space, Lin and Smith (1990) À Á , and e *~N M n×L (0, I n , D te ), where D t1 and D te are diagonal variance-covariance matrices of dimension L × L. It is important to point out that, under this approximate model, the (co)variance matrix for environment is assumed an identity matrix, Σ E = I I .
To use the existing software, we replaced the distribution of transformed random effects With this, the resulting model (8) can be estimated with the R package BGLR, which is appropriate for univariate analysis. From Eq. (8), it is clear that the parameter estimates and predicted values of the original model (Eq. (2)) without transformation can be approximated as: Steps for implementing the proposed BMTME_Approx model Step 1: De-correlate the original traits with the SVD as Y = UDV T and use it as response variable Y * = UD = YV.
Step 2: Implement model (1) but using one column at a time of the uncorrelated matrix Y * as the response variable. This means that a total of L single analyses are done with the model in Eq. (1).
Step 3: With the output of Step 2, the predicted values can be calculated in terms of the transformed values with b Step 4: Finally, with Eq. (12), we obtain the predicted values in terms of the original traits. Furthermore, with Eqs. (9-11, 13-15), we obtain the parameter estimates of the beta coefficients, β, random effects b 1 and b 2 , as well as the variance-covariance matrices of traits corresponding to traits in the first interaction term, Σ t1 , for traits in the second interaction term, Σ t2 , and the residual variance-covariance matrix of traits, R e . The R code for implementing this proposed BMTME_Approx model in BGLR is given in Supplementary material.
For implementing the proposed BMTME_Approx model with random cross-validation, we will follow the four-step procedure exactly as described above, except for the first step, which is modified. Now, in Step 1, we will decorrelate the traits in the training data set (Y t ) with the SVD where the subscript t denotes that these matrices were estimated with the training data set. Then we transform the response variable into Y * = U t D t = Y t U t , and expand Y * with the number of rows that are the same size as the testing data set; the expanded rows should all be replaced with NA, to represent the missing values. The positions of the expanded rows with missing values will correspond to the testing data set. Once this is done, the remaining steps must be followed exactly as in the above procedure that is described for the full data set. Note that the dimension of the response variable matrix with the training data set has fewer rows than the full data set of response variables.
It is important to point out that the BMTME model was built to estimate only one variance-covariance matrix of traits, Σ t , involved in the two interaction terms, genotype × trait and genotype × environment × trait. However, the BMTME_Approx model allows estimating a variance-covariance matrix of traits for each interaction term in which the traits are involved. The BMTME model is also able to estimate an unstructured variance-covariance matrix for the environments, Σ E ; this is not reported for the BMTME_Approx model because an identity matrix is assumed.

Hyperparameters
First we provide the hyperparameters for the BMTME model: β~MN I×L (β 0 , , and R e~I W(ν e = 5, S e = S e ); β 0 was obtained as the least square of each trait. The remaining hyperparameters S βt , S t , S E , and S e are given in Supplementary material. The hyperparameters for the BMTME_Approx were exactly the same to those of the BMTME, but for the univariate analysis, it were as those used in the BGLR software (Pérez-Rodríguez and de los Campos 2014). The proposed Gibbs sampler was implemented in the R-software (R Core Team 2018). A total of 60,000 iterations were performed with a burn-in of 20,000, so that 40,000 samples were used for inference. To eliminate potential problems due to the autocorrelation function (ACF), we considered a thinning of 5. The convergence of the MCMC chains was monitored using trace plots, ACF and Gelman-Rubin diagnostics. We provide weakly informative priors to implement the proposed models. It is important to point out that the proposed BMTME_Approx model only works for normally distributed traits. However, when there is considerable departure from normality, we suggest using independent component analysis (ICA) instead of SVD for transforming the matrix of response variables (Y) (see Supplementary material for its implementation).

Simulated data set 1 and data set 2
To test the proposed models and methods, we simulated multiple-trait and multiple-environment data using the model in Eq. (2). For this first data set, we used the following parameters: 3 environments, 3 traits, 200 genotypes, and 1 replication of the environment-trait-genotype combination. We assumed that β T = [13,10,5,12,8,7,11,9,6], where the first three beta coefficients belong to traits 1, 2, and 3 in environment 1, the second three values belong to the three traits in environment 2, and the last three belong to environment 3. We assumed that the GRM is known and equal to G g = 0. Therefore, the total number of observations is 3 × 200 × 3 × 1 = 1800, that is, 600 for each trait. Since a covariance matrix can be expressed in terms of a correlation matrix (R r ) and a standard deviation matrix D 1=2 r , with r = t, E, e, where r = t represents the genetic covariance between traits, r = E represents the genetic covariance matrix between environments, and r = e represents the residual covariance matrix between traits. For the three covariance matrices (r = t, E, e), we used R r = 0.75I 3 + 0.25J 3 , where J 3 is a matrix of order 3 × 3 of ones, and D 1=2 t = diag(0.9, 0.8, 0.9), D 1=2 E = diag(0.5, 0.65, 0.75) and D 1=2 e = diag(0.6, 0.42, 0.33). For the second data set, the parameters used in the simulation were: β T = [13, 12.5, 12, 11.5, 11, 10.5, 10, 12, 11.5, 11, 10.5, 10.5, 10,10,11,11.5,12, 12, 11, 10,10.5], where the first seven beta coefficients belong to traits 1-7 in environment 1, the second seven values to the 7 traits in environment 2, and the last seven belong to environment 3.
The matrix of the relationship between lines was generated as G g = 0.3I 200 + 0.7J 200 and was equal to the first simulation data set. Here the total number of observations is 3 × 200 × 7 × 1 = 4200, that is, 600 for each trait. For two of the three covariance matrices (f = t, e), we used = diag(1, 1.0003, 1.0003, 1, 0.9996, 1, 1.0003), (1, 1, 1), that is, we assumed independence between the environments. It is important to point out that this second data set has a high genetic and environmental correlation between traits.

Experimental data sets
Maize data set The first real data set used for implementing the proposed model is composed of 309 double-haploid maize lines. Traits available in this data set include grain yield (GY), anthesis-silking interval (ASI), and plant height (PH); each of these traits was evaluated in three optimum rain-fed environments (EBU, KAT, and KTI). After editing, information from 158,281 markers was used. This data set was also used by Montesinos-López et al. (2016) and includes best linear unbiased estimates (BLUEs) obtained based on a mixed model analysis of individual trials of a first analysis.

Wheat data set
Here we present information on the second real data set used for implementing the proposed model. This real data set is composed of 250 wheat lines that were extracted from a large set of 39 yield trials grown during the 2013-2014 crop season in Ciudad Obregon, Sonora, Mexico ). The traits under study were days to heading (DH), GY, PH, and the green normalized difference vegetation index (NDVI). Each of these traits was evaluated in three environments (Bed2IR, Bed5IR, and Drip). The marker information used after editing was from 12,083 markers, this data set also used by Montesinos-López et al. (2016); those interested in obtaining more details about this data set can consult this publication. The phenotypes of each trait are BLUEs obtained after a first analysis where they were adjusted by the experimental field design.

High-throughput (HTP) data set
This data set belongs to an experiment that used HTP wheat plant phenotyping conducted at Ciudad Obregon, Sonora, México. The data set is comprised of 976 wheat lines that were extracted from a large set of 1170 lines from the CIMMYT Global Wheat Program. The following traits were under study: GY, DH, red normalized difference vegetation index (RNDVI), green normalized difference vegetation index (GNDVI), simple ratio (SRa), ratio analysis of reflectance spectra chlorophyll a (RARSa), ratio analysis of reflectance spectra chlorophyll b (RARSb), ratio analysis of reflectance spectra chlorophyll c (RARSc), normalized pheophytinization index (NPQI), and photochemical reflectance index (PR). Each of these traits was evaluated in three environments (drought, irrigated, and reduced irrigation). After marker editing, information from 1448 markers was used. This data set was also used by Montesinos-López et al. (2017b, c). The phenotypes of each trait are BLUEs obtained after a first analysis where they were adjusted by the experimental field design.

Large EYT set
This data set belongs to CIMMYT's three elite yield trial (EYT) nurseries, consisting of 2505 lines of wheat, genotyped by genotyping-by-sequencing (GBS). These were evaluated for GY, DH, and PH in five environments (BED_5IR, FLAT_5IR, BED_2IR, FLAT_DRIP, and LHT) evaluated in Ciudad Obregon, Mexico, under bed and flat planting systems. The EYT nurseries were sown in 39 trials, each containing 28 lines and two checks that were arranged in an alpha lattice design with three replications and six blocks. The nurseries were evaluated for the three traits under study on a plot basis during 2014 (EYT 13-14), 2015 (EYT 14-15), and 2016 (EYT 15-16). We used BLUEs as observed values of the breeding lines resulting from adjusting for the corresponding experimental design. All the 2505 lines were genotyped using GBS (Elshire et al. 2011;Poland et al. 2012) at Kansas State University, with an Illumina HiSeq2500 for obtaining genome-wide markers. Markers with missing data >60% (minor allele frequency <5% and percentage of heterozygosity >10%) were removed, and we obtained 2038 markers, which were used for the analysis. Also, the traits used are BLUEs obtained after a first analysis where they were adjusted by the experimental field design in each trial.

Random cross-validation scheme
For testing the prediction ability of the proposed models, the BMTME_Approx model and the BMTME model, we implemented a type of cross-validation where all the traits are missing in some individuals and the information of some individuals is missing in some environments (that is, their lines and traits are missing), but in at least one environment there is information available on those individuals. We implemented a 20 random cross-validation scheme for all the data sets under study, with the exception of the HTP and the large EYT data sets, in which we implemented 10 random cross-validations. For the 20 random cross-validation scheme, in each partition we assigned 20% of the data to the testing set and the remaining 80% of the data to the training set, while for the 10 random crossvalidation scheme, we assigned 30% of the data to the testing set and 70% to the training set. The models were fitted with the information in the training data sets, and the prediction accuracy was evaluated with the testing data sets.
The metrics used for reporting the prediction accuracy were the Pearson's correlation (Cor) and the mean square error of prediction (MSEP) obtained by averaging the information of the 20 (or 10) random partitions resulting from the testing data sets. The models were implemented in the R package (R Core Team 2018) and the proposed BMTME_Approx model can be implemented in the BGLR package of de los Campos and Pérez-Rodríguez (2014). On the other hand, the BMTME model was implemented in R with the model proposed by Montesinos-López et al. (2016).

Data repository
The phenotypic and genotypic information of the experimental wheat and maize data sets included in this study can be downloaded from the link http://hdl.handle.net/11529/ 10646 (Montesinos-López et al. 2016). This link includes phenotypic data on maize (Data.maize) and wheat (Data. trigo), as well as genomic data on maize (G.maize) and wheat (G.trigo).
The HTP data and materials used in this study can be downloaded from the link given in Montesinos-Lopez et al.

Results
The results are described in two main sections: The first section presents the results of the simulated data sets, while the second presents the results of the real data sets. It is important to point out that we do not present results for the BMTME Thompson version because it is only a different reparametrization of the original BMTME model.

Simulated data sets
Data set 1 First, we present the parameter estimates of the BMTME and the BMTME_Approx models. Table 1 shows that the beta coefficients of both models are similar. In general, the beta coefficients of the BMTME model are larger than those of the BMTME_Approx model, the smallest difference is 3% observed in environment 1 and trait 1, and the largest difference is 15.7% observed in environment 3 and trait 3. The variance-covariance matrix of traits (Σ t ) of the BMTME model is quite similar to the variance-covariance matrices of the BMTME_Approx model (Σ t1 , Σ t2 ). The variance-covariance components of the residual of both models are quite similar, with the smallest difference (1.6%) observed in trait 2 and trait 1 and the largest difference (33.7%) observed in trait 3 and trait 2. It is worth pointing out again that the BMTME and BMTME_Approx models are different; consequently, we cannot expect the exact same parameter estimates. Finally, when comparing the observed versus the predicted values for each trait using Pearson's correlation and MSEP, we observed that the BMTME_Approx model produced predicted values that are very similar to those of the BMTME model (see Table 1).
With regard to prediction accuracy, Table 2 shows that the BMTME model was the best: In six out of the nine trait-environment combinations, it was superior to the BMTME_Approx model in terms of Pearson's correlation and MSEP. On average, the BMTME model was superior to the BMTME_Approx model by 0.94% and 0.88% in terms of Pearson's correlation and MSEP, respectively. From the results of these simulated data, it is evident that the BMTME_Approx model is very similar to the BMTME model in terms of prediction accuracy (Table 2).

Data set 2
First, we present the parameter estimates of the BMTME and BMTME_Approx models. Table 3 shows that the beta coefficients of both models are similar, and in general, the beta coefficients of the BMTME model are larger than those of the BMTME_Approx model. The smallest difference is 5.15%, which is observed in environment 3 and trait 6, while the largest difference is 12.37%, observed in environment 1 and trait 5. The variance-covariance matrix of traits (Σ t ) in the BMTME model is very similar to the variance-covariance matrices of the BMTME_Approx model (Σ t1 , Σ t2 ). The variance-covariance components of the residual of both models are quite similar, with the smallest difference (0.48%) observed in trait 7 and trait 1 and the largest difference (27.38%) observed in the variance of trait 5. It is worth reiterating that, with the BMTME and BMTME_Approx, we should not expect exactly the same parameter estimates, as the models are different. Finally, when comparing the observed versus the predicted values for each trait using Pearson's correlation and MSEP, the BMTME_Approx model produced predicted values that were slightly better than those of the BMTME model (see Table 3). In terms of Pearson's correlation, the smallest The best predictions for each trait-environment combination are in bold Cor and MSEP denote Pearson's correlation and mean square error of prediction, respectively, between the observed and predicted values. True values denotes the true parameter values used for simulating the data β denotes the beta coefficients, Σ t denotes the genetic (co)variance matrix of traits, Σ E denotes the genetic (co)variance matrix of environments, R e denotes the residual (co)variance matrix of traits, and the symbol hat (^) denotes estimates of the corresponding parameters  Trait1 Trait2 Trait3 Trait4 Trait5 Trait6 Trait7 Trait1 Trait2 Trait3 Trait4 Trait5 Trait6 Trait7 Trait1 Trait2 Trait3 Trait4 Trait5 Trait6 Trait7 Trait1 1 Trait2 Trait3 Trait4 Trait5 Trait6 Trait7 Trait1 Trait2 Trait3 Trait4 Trait5 Trait6 Trait7 Trait1 Trait2 Trait3 Trait4 Trait5 Trait6 Trait7 Trait1 difference in favor of the BMTME_Approx model was 6.73% observed in trait 6, while the largest difference, also in favor of the BMTME_Approx model, was 14.16% observed in trait 3. In terms of MSEP, the smallest difference was 13.81% in trait 1 and the largest difference was 19.34% in trait 5, both in favor of the BMTME_Approx model. Table 4 shows that the proposed approximate model, BMTME_Approx, was better than the BMTME model in terms of Pearson's correlation, as shown in 14 out of the 21 trait-environment combinations. However, in terms of MSEP, the BMTME model was superior to the the BMTME_Approx model, as exemplified by 12 out of the 21 trait-environment combinations. On average, in terms of Pearson's correlation, the BMTME_Approx model was better than the BMTME model by 7.24%, while in terms of MSEP, both models were, on average, almost identical.

Maize data set
First, we compared the parameter estimates of the two models (BMTME and BMTME_Approx). Table 5 shows that the beta coefficients of the proposed BMTME_Approx model are all similar to those of the BMTME model. The variance-covariance matrix of traits (Σ t ) in the BMTME model is quite similar to the variance-covariance matrices of the BMTME_Approx model (Σ t1 ,Σ t2 ). When comparing the variance-covariance components of the residual of both models, we observe that 6 out of the 9 terms are not significantly different; however, the remaining 3 terms are quite different, as they belong to the covariance of trait PH with the other traits. Finally, when comparing the observed versus the predicted values for each trait using Pearson's correlation and MSEP, we see that the BMTME_Approx model is slightly better, since in Pearson's correlation, it outperformed the BMTME model in 2 out of the 3 traits and, on average, the BMTME_Approx model was 2.1% (for trait GY) and 1.4% (for trait ASI) better than the BMTME model. In terms of MSEP, the BMTME_Approx model was better than the BMTME model in 2 out of the 3 traits, and, on average, the BMTME_Approx model was 7% better (for trait GY) and 4.11% better (for trait ASI) than the BMTME model (see Table 5). In general, the parameter estimates and predictions are relatively similar in this data set.
Next, in Table 6, we see that the proposed approximate model, BMTME_Approx, was better than the BMTME model in terms of Pearson's correlation, as shown in 5 out of the 9 trait-environment combinations. However, in terms of MSEP, the BMTME model was superior to the BMTME_Approx model in 7 out of the 9 trait-environment combinations. On average, the BMTME_Approx model  Trait1 Trait2 Trait3 Trait4 Trait5 Trait6 Trait7 Trait1 Trait2 Trait3 Trait4 Trait5 Trait6 Trait7 Trait1 Trait2 Trait3 Trait4 Trait5 Trait6 Trait7   Trait6   variance matrix of traits, and the symbol hat (^) denotes estimates of the corresponding parameters was better than the BMTME model by 11.72% in terms of Pearson's correlation, while conversely, the BMTME model was better than the BMTME_Approx by 37.94% (Table 6) in terms of MSEP.

Wheat data set
First, we compared the parameter estimates of the two models (BMTME and BMTME_Approx) and then their prediction accuracy. Table 7 shows that the beta coefficients of the proposed BMTME_Approx model are similar to those of the BMTME in 9 out of the 12 parameters; however, there are substantial differences in three of the beta coefficients corresponding to trait NDVI. The variance-covariance matrix of traits (Σ t ) in the BMTME model is significantly different from the variance-covariance matrices of the BMTME_Approx model (Σ t1 , Σ t2 ).
When comparing the variance-covariance components of the residual of both models, it is evident that 9 out of the 16 terms are not notably different, while the remaining 7 terms are. These very different terms belong to the covariance of trait NDVI with the other traits. Finally, when we compared the models in terms of the observed versus the predicted values for each trait using Pearson's correlation and MSEP, we saw that the BMTME_Approx model was marginally better, and it was superior to the BMTME model for Pearson's correlation in 3 out of the 4 traits and, on average, 0.99% better for trait DH, 22.04% better for trait GY, and 17.16% better for trait PH. Furthermore, for MSEP, the BMTME_Approx model was better than the BMTME model in 3 out of the 4 traits and, on average, 18.88% better for trait DH, 48.85% better for trait GY, and 57.5% better for trait PH (see Table 7).
When comparing both models in terms of prediction accuracy with only the testing set of the 20 random partitions implemented, Table 8 shows that the proposed alternative model, BMTME_Approx, was better than the BMTME model in 8 out of the 12 trait-environment combinations in terms of Pearson's correlation and in 7 out of the 12 trait-environment combinations in terms of MSEP. On average, the BMTME_Approx model was also better than the BMTME model by 2.1% in terms of Pearson's correlation and by 6.87% in terms of MSEP (Table 8). The best predictions for each trait-environment combination are in bold Cor and MSEP denote Pearson's correlation and mean square error of prediction, respectively, between the observed and predicted values. β denotes the beta coefficients, Σ t denotes the genetic (co)variance matrix of traits, Σ E denotes the genetic (co)variance matrix of environments, R e denotes the residual (co)variance matrix of traits, and the symbol hat (^) denotes estimates of the corresponding parameters The best predictions for each trait-environment combination are in bold a Traits: grain yield (GY), anthesis silking interval (ASI), and plant height (PH)

HTP data set
In Table 9 we provide the parameter estimates of the HTP data set, which has 10 traits evaluated in 3 environments, with 976 lines evaluated in each environment. The estimates of the beta coefficients are reasonable since they are consistent with the sample average for each trait in each environment, which is equivalent to the least square estimates.
Rather than reporting the variance-covariance of traits, we report the correlation between traits, which can be more useful, as it gives a better idea of the level of correlation between traits in the whole data set. Our proposed BMTME_Approx model estimates two variance-covariance matrices for the genetic part of the traits and one for the residual part of the traits; in this vein, we report two correlation matrices for the genetic part of the traits and one for the residual correlation of the traits. In the section "genetic Cor and MSEP denote Pearson's correlation and mean square error of prediction, respectively, between the observed and predicted values. β denotes the beta coefficients, Σ t denotes the genetic (co)variance matrix of traits, Σ E denotes the genetic (co)variance matrix of environments, R e denotes the residual (co)variance matrix of traits, and the symbol hat (^) denotes estimates of the corresponding parameters a Traits: days to heading (DH), grain yield (GY), plant height (PH), and the green normalized difference vegetation index (NDVI). Each of these traits was evaluated in three environments (Bed2IR,Bed5IR,and Drip) correlation between traits" in Table 9, the upper diagonal part gives the correlations between traits corresponding to the term genotype × trait, where we can observe that 33 (73.33%) of the 45 possible correlations were, in absolute values, >0.5. On the other hand, in the lower diagonal part of the genetic correlation section are given the genetic correlations that correspond to the three-way interaction term environment × genotype × trait. Here we can observe that only 16 (35.56%) of the 45 possible correlations are >0.5. In general, there is evidence of a reasonably high genetic correlation between traits in this HTP data set. In Table 9, the section on the residual correlation between traits shows that only 15 (33.33%) out of the 45 possible residual correlations are >0.5. This information indicates that the phenotypic correlation between the ten traits is more influenced by the genetic part than by the residual part. Finally, the "prediction accuracy" section of Table 9 shows the Pearson's correlation and MSEP obtained for the whole data set between the observed and predicted values for each trait. In general, all the observed Pearson's correlations are >0.86, with the exception of the RNDVI, GNDVI, and SRa traits, which had a Pearson's correlation <0.8.
The prediction accuracies for the HTP data set are given in Table 10. The predictions reported were the average of the ten random partitions of the testing data set. Here we only report the prediction accuracy for the BMTME_Approx model since it is extremely difficult in terms of implementation time to run the BMTME_model due to the large data set. Table 10 shows that the best predictions in terms of Pearson's correlation were observed for trait DH, while the predictions for the rest of the traits in general were low. Also, in terms of Pearson's correlation, the best predictions were observed in the irrigated environment. On the other hand, in terms of MSEP for traits GY and DH, the best predictions were observed in the drought environment; however, there were no significant differences in terms of prediction accuracy between the remaining traits, since in all traits, the MSE was almost zero.

Large EYT data set
In Table 11, we provide the parameter estimates of the EYT data set that has 3 traits evaluated in 5 environments, with 2505 lines evaluated in each environment. We found that the estimates of the beta coefficients are reasonable since they are consistent with the least square estimates.
Here we also report the correlation between traits to have a better idea of the level of correlation between traits in the whole data set. Furthermore, we report two matrices of correlation for the genetic part of traits and one for the residual correlation of traits. In the section "genetic correlation between traits" in Table 11, the upper diagonal part gives the correlations between traits corresponding to the term genotype × trait, where we can observe that the correlation between PH and DH was the only one >0.5. On the other hand, in the lower diagonal part of the genetic correlation section, between traits are given the genetic correlations that correspond to the three-way interaction term The best predictions for each trait-environment combination are in bold  (SRa), ratio analysis of reflectance spectra chlorophyll a (RARSa), ratio analysis of reflectance spectra chlorophyll b (RARSb), ratio analysis of reflectance spectra chlorophyll c (RARSc), normalized pheophytinization index (NPQI), and photochemical reflectance index (PR) Cor and MSEP denote Pearson's correlation and mean square error of prediction, respectively, between the observed and predicted values. Environments are drought, irrigated, and reduced irrigation (Red_Irrig). In bold are the correlations >0.5. β denotes the beta coefficients, Σ t denotes the genetic (co)variance matrix of traits, Σ E denotes the genetic (co)variance matrix of environments, R e denotes the residual (co)variance matrix of traits, and the symbol hat (^) denotes estimates of the corresponding parameters environment × genotype × trait. Here we can observe that only the correlation between GY and PH was >0.5 (Table  11).
In Table 11, in the section on the residual correlation between traits, we can observe that only the correlation between GY and PH was >0.5. Finally, in the "prediction accuracy" section of Table 11, we can observe the Pearson's correlation and MSEP obtained for the whole data set between the observed and predicted values for each trait. In general, all the observed Pearson's correlations are >0.89.
The prediction accuracies for this large wheat data set are given in Table 12. The predictions reported are the average of the ten random partitions of the testing data set. Here we only report the prediction accuracy for the BMTME_Approx model; owing to the size of the data set, it almost impossible to run the BMTME_model. Table 12 shows that the best predictions in terms of Pearson's correlation were for trait GY, while the worst were for trait DH. At the same time, the best predictions for GY, DH, and PH in terms of Pearson's correlation were observed in environments BED_5IR, FLAT_5IR, and BED_2IR, respectively. On the other hand, in terms of MSEP for traits GY, DH, and PH, the best predictions were observed in BED_2IR, LHT, and FLAT_5IR, respectively (Table 12).

Discussion
The amounts of data that the breeding programs around the world are generating continue to increase; consequently, there is a growing need to extract more knowledge from the data being produced. To this end, multiple-trait models are commonly used to take advantage of correlated traits to improve parameter estimation and prediction accuracy. However, when there is a large number of traits, implementing these types of models is challenging. Therefore, it is necessary to develop efficient multiple-trait and multipleenvironment models for whole-genome selection in order to take advantage of multiple correlated traits. In this paper, we propose an alternative method for analyzing multi-trait data that could be useful for whole-genome selection in the context of an abundance of traits. Some advantages of the proposed method are: (i) it can be implemented in current genomic selection software that was built for univariate Cor and MSEP denote Pearson's correlation and mean square error of prediction, respectively, between the observed and predicted values a Genetic correlation between traits (the upper diagonal corresponds to b Σ t1 , while the lower diagonal is for b Σ t2 ). Traits are grain yield (GY), days to heading (DH), and plant height (PH) implemented using conventional software for whole-genome prediction.

Conclusions
The results of the simulated and real data sets show that the proposed alternative method produced results that are similar to those of the conventional multiple-trait analysis. For this reason, the proposed method is an attractive alternative for analyzing multiple-trait data in the context of a large number of traits. However, it is important to point out that the significant differences found in parameter estimates between the proposed BMTME_Approx model and the BMTME model can be attributed mainly to the fact that the BMTME_Approx model allows the estimation of two genetic variance-covariance matrices for traits, one for the interaction term genotype × trait, and the other for the term environment × genotype × trait, while the BMTME model only estimates one genetic matrix of variance-covariance for traits.