Being the first tissues that intercept various nutrients and water uptake, roots play an essential role in plant growth and development. Root architecture highly varies in response to different nutrient deficiencies1 and adapts to continually changing growth conditions through structural plasticity2. Most plants can utilize only half of the applied N, losing it in the form of nitrates (NO3), which can cause environmental hazards3. Spinach requires high rates of N fertilizer to produce high biomass and quality. It is estimated that about 60% of N applied to spinach in commercial production is lost through leaching4 due to its shallow root system5 and short production cycle. Spinach is also relatively poor in its NO3 reducing capacity6,7. About 80% of the total root length of spinach settles in the upper 0–15 cm soil layer, and the root distribution is not affected even by the addition of usable nitrates below 15–30 cm5. Thus, the rooting structure and the growth patterns that define roots are essential considerations to delineate the differences in nutrient absorption and efficiency. This is even more important in spinach due to its high affinity for N8. The morphological changes in the root system are regulated by the plant's nutritional status and interaction with the surrounding environment as detected through the localized signals by roots. Several studies have discussed the N-dependent (NO3, ammonium and glutamate) changes in the root architecture across species9,10,11,12,13. Therefore, investigation of root development is of importance for understanding plant responses to low-N stresses.

Unlike soil, where the complex root-microbial association may naturally facilitate the plant absorptive capacity of hitherto unavailable N14,15, root development in the soilless media is entirely reliant on an exogenous supply of nutrients. Due to their inert nature, the soilless systems minimize the changes that could occur due to gradients in temperature, oxygen status, water availability, pH, bulk density, or seasonal variations. In these cases, the root structure and appearance are a function of genetics as modified by the soilless matrix, and the concentration of the nutrients applied16. Different studies have profiled root systems and their association with the environment in field conditions, non-soil media and microbes17,18,19. Modeling techniques for root feature diversity, structure, and activity have been attempted, including multivariate and machine-learning techniques20,21,22. However, much remains to decipher the genetic and potential importance prioritization of the assigned root traits in influencing the above-ground biomass.

Root structure and its functioning are associated with N uptake, which influences plant performance and yield. Although improving root performance is relevant to all crops, it is particularly relevant to short-cycle vegetable crops like spinach that would benefit from early below-ground vigor23. However, the shorter crop duration of spinach allows only a short time for root development24,25. To date, much research on developing breeding strategies to improve N uptake or utilization is focused on modifying the root architecture of main crops like maize, rice, and wheat. Although several root architectural, topological, and developmental traits such as deeper and longer roots, rapid growth, and higher root density associated with higher N use efficiency have been identified26,27 in cereal crops, such efforts to identify root traits that respond to N stress to capture the N available at depths in vegetables are limited.

Soilless indoor farming is becoming increasingly popular in the recent years28,29. Since a soilless indoor system relies on the artificial supply of essential nutrients28,29,30, nutrient management is critical for the harvestable quality and quantity of a crop31,32. The differences in the value for money' between varieties of the same species may sometimes narrow down how efficiently the crop can uptake and convert the nutrients into harvestable products. Rooting system and architecture are important determinants of efficiency that maintain a favorable balance between resource investment (photosynthates) and resource acquisition (raw materials)33,34. Varieties with favorable root architecture that enhance nutrient uptake and photosynthate use will reduce operating costs while balancing nutritional content and yield. This study investigates root traits and their relationship to the harvestable 'above-ground' biomass of spinach grown in two contrasting N supplies in a uniformly controlled indoor environment. Our assumption here is that in a uniform soilless matrix, genetics and the N management are the two primary sources of variation defining root architecture. We have used supervised random forest optimized machine learning algorithm by 'ranger'35, to predict the importance of eight root traits on the spinach's harvestable biomass at the 'baby-spinach' stage. We have also applied META-R36 to determine the genetic and phenotypic correlations and heritability to prioritize the root traits in influencing the harvestable shoot biomass. Finally, we compare the machine learning classification and the prioritization based on genetic correlations and present the top trait(s) with the highest selection potential.


Plants, plant material, experimental setup, and evaluation environment

A collection of 202 spinach (S. oleracea) accessions maintained and provided by the USDA-National Plant Germplasm System (NPGS) ( at Ames, Iowa, U.S.A was used in the present study. The plants were grown in a growth chamber under controlled conditions of 12/12 h (light/dark), 22 °C, and 75% relative humidity. The seeds of each accession were sown in triplicate in turface (Turface Athletics MVP, PROFILE Products LLC, Buffalo Grove, Illinois, USA) in small pots (10.2 cm × 10.2 cm and 8.9 cm deep). Each set of replicates was completely randomized across separate shelves. After the seedling emergence, plants were fertilized with Peters professional ready mix (5-11-26, hydroponics special water-soluble fertilizer, Everris NA Inc., Ohio, USA) every after four days. Two concentrations of nitrogen (N), low N (50 ppm), and high N (200 ppm) were used for low and high N management, respectively. An additional N for the high N-management was provided using calcium nitrate, and equivalent calcium (3.85 mM) was replaced by calcium chloride in the low N-management. The concentrations of the macro/micro-nutrients present in the fertilizer solution are provided in Supplementary Dataset Table S1.

The experimental research in the lab facilities for this study was performed as required by Texas A&M System Regulation (15.99.06 Use of Biohazards in Research, Teaching, and Testing) and the University’s Rule for the use of Biohazards and Dual Use Research of Concern (15.99.06.M1 Use of Biohazards, toxins and rDNA and DURC), approved by the Texas A&M Institutional Biosafety Committee (IBC).

Plant material processing, root imaging, and data processing

The plants were harvested at the physiological maturity of baby spinach (5–6 leaves) after 41 days of sowing. Each plant was carefully pulled from the turface and washed with running water to clear any debris off the roots. The roots were separated from the shoot at the cotyledonary nodes and floated in water. The lateral roots were separated gently using a fine tip paintbrush to minimize the overlapping of roots. The images were taken by digitally scanning roots of individual plants (Supplementary Dataset Figure S1) using a high-resolution scanner (Calibrated Color Optical scanner STD4800 with Special Lighting System) and scanned images analyzed using WinRHIZO Pro software (Regent Instruments Inc. Canada). Categorization of the traits was adapted from the Fine-Root Ecology Database37 classification and included: (1) morphology for root length (RLength) and average root diameter (RAvdiam); (2) an indication of the complexity of the root system architecture measured using number of tips, forks (number of root bifurcations), and crossings (Xsings; overlapping parts); (3) root system of the standing crop (RSSC) for root volume (RVol), root surface area (RSarea) and root weight (RWt). The harvestable above-ground biomass was determined as fresh shoot weight (SWt) (WR P-series balance, Model 500P, VWR International, U.S.A.) after removing the excess surface moisture by gently paper-bloating the wet roots followed by a two-minute air drying.

Data analysis

The analysis pipeline was designed to define the phenotypic, genotypic, and predictive relationship between the root traits and between the root traits and the SWt of spinach plants grown in a soilless system. We determined the rg and rp (defined below in the section -Determining the genotypic and phenotypic correlation between traits) between root traits and rg and rp between the root traits and the SWt within and between the two N managements. Parallel to the correlation analyses, we used a supervised random forest machine learning38,39 technique to estimate the variable importance of each root phenotype in predicting the above-ground shoot biomass. The details of the parallel procedures are described below.

Individual trait and combined management variance analysis and mean separation

Linear mixed models were implemented in lmer from package lme4 of R using REML via Multi Environment Trial Analysis with R, META-R36, to calculate the adjusted means (best linear unbiased estimates, BLUEs, and predictors, BLUPs) for each root and shoot variable, under the two N managements. For individual analyses, we used the model is

$$Y_{ik} = \mu + Rep_{i} + Gen_{k} + \varepsilon ik$$

where Yijk is the trait of interest, µ is the mean effect, Repi is the effect of the ith replicate represented by the complete blocks, Genk is the effect of the kth genotype, εijk is the error associated with the ith replication, jth incomplete block and the kth genotype, which is assumed to be normally and independently distributed, with mean zero and homoscedastic variance σ2. For the combined analyses across the two N-managements, the model was adjusted to

$$Y_{ijkl} = \mu + Mgt_{i} + Rep_{j} \left( {Mgt_{i} } \right) + Gen_{l} + Mgt_{i} \times Gen_{l} + \varepsilon ijkl,$$

where the new terms Mgti and Mgti × Genl are the effects of the ith N-management and the N-management by genotype interaction, respectively. Genotype and N-management were both treated as random effects, and BLUPs were used to estimate random effects and BLUEs to estimate the fixed effect. Grand means were separated based on Fisher’s Least Significance Difference (LSD) at α = 5%. We also determined the coefficients of variation (CV) for all traits.


We estimated the average broad-sense heritability (repeatability, H) of three replicates in each N-management, which is also an estimate of correlation expected between line means from the three replicate trials conducted in the two N-managements. H was determined40 on a line mean basis as

$$H = \sigma^{2}_{g} \times \sigma^{2}_{g} + \sigma^{2}_{e} /nReps,$$

and combined for the two N-managements as

$$H = \sigma^{2}_{g} \times \sigma^{2}_{g} + \sigma^{2} ge/nMgt + \sigma^{2}_{e} /\left( {nMgt \times nReps} \right),$$

where σ2g and σ2e are the genotype and the residual error variance components, respectively; nReps is the number of replicates, σ2ge is the variance component of genotype by N-management interaction, and nMgt is the number of N-managements in the analysis.

Determining the genotypic and phenotypic correlation between traits

Genetic and phenotypic correlations were calculated for each trait pair, within and across the N-managements. The rg were also determined in META-R, which applies the equations from Cooper et al.41. Between the N-managements, rg was estimated as

$$r_{g} \left( {jj_{0} } \right) = r_{p} \left( {jj_{0} } \right)/H_{j} H_{j0} ,$$

and between traits within a single N-management,

$$r_{g} = \sigma_{g} \left( {jj_{0} } \right)/\sigma_{g} \left( j \right)\sigma_{g} \left( {j_{0} } \right),$$

where rp(jj0) is the phenotypic correlation between N-managements j and j0; and Hj and Hj0 are the heritability of N-managements j and j0 respectively, σg(jj0) is the arithmetic mean of all pairwise genotypic covariances between trait j and j0, and σg(j)σg(j0) is the arithmetic average of all pairwise geometric means among the genotypic variance components of the traits.

For graphical illustrations, cluster analysis based on the environment distance matrix (1—Genetic Correlation matrix) was also performed using the ‘Ward’ method42, creating a dendrogram. In each case, a minimum heritability threshold was set at 0.1; any trait whose heritability within or between the two N-managements was lower than 0.1 was excluded from the analysis and was not plotted. For phenotypic correlations, simple Pearson correlations between different pairs of N-managements or traits were used.

Predicting correlated response

Correlated response (CR) was predicted for SWt to determine if direct or indirect selection resulting from selecting a root trait would be superior under similar N-managements. We used the formula:

$$CR_{swt} = ir_{g} \surd H_{RT} V_{g(swt)} ,$$

where CRswt is the correlated response of SWt, rg is the genetic correlation, HRT is the repeatability of root traits, Vg(swt) is the genetic variance of SWt, and i is the selection intensity whose estimate we assumed would be similar between traits. Thus, CRswt was compared to direct response (R),

$$R_{swt} = i\surd H_{swt} V_{g(swt)} ,$$

by CRswt/Rswt = rg√HRT/√ Hswt. That is, if rg × HRT > Hswt, then indirect selection would be superior43,44,45,46.

Summary of data preparation and evaluation by machine learning

To rank the root traits by importance in the prediction models for SWt, we used the random forest (RF) modeling in R. RF is a powerful ensemble machine learning tool that combines the outputs of numerous decision tree classification models. We applied the regression type to the randomForest47 package and first ran the regression on default tuning parameters. We also invoked a user-defined hyperparameter tuning in the ranger35 package to optimize our models; ranger is a C++ implementation of Breiman's FORTRAN-based random forest algorithm39. Finally, we compared the accuracy of the model from the randomForest default tuning and that from ranger hyperparametric search tuning. The function missForest48 was used to impute missing data. Outliers were normalized by an internally derived proximity matrix procedure built into the RF. In the normalization, if an outlier case i and case j both end up in the same tree node, increase proximity prox(ij) between i and j by 1 and accumulate over all trees in RF, the outliers are normalized by twice the number of trees in RF. This creates a proximity square matrix where observations that are ‘similar or alike’ in value have proximities close to 1 and the dissimilar proximity closer to 0.

Default tuning and model evaluation

The default data split (into 63.25% as training dataset and the remainder as the validation set) were applied to train each N-management. The 63.25% is the proportion expected of unique observations in a bootstrap sample39,49. The typical range is ~ 60 to 85%, where smaller sample sizes can reduce the training time but may introduce more bias than necessary, while too large a sample size can increase performance but at the risk of overfitting because it introduces more variance39. An F-fold cross-validation feature in RF invokes the evaluation of model performance by training it on a number of different smaller datasets and evaluating them over the other smaller testing sets. By default, randomForest randomly splits the number of datasets of almost the same sized k-folds, and each of the folder models is evaluated over the number of folders and tested on the remaining test set39,47. This process is repeated until all the subsets have been evaluated. The regression tree parameters are tuned further by choosing the number of independent variables (m) using the default as m = p/3, where p is the total number of root traits in our analysis. This helps generalize the data best to return the least out-of-bag (OOB) error rate and provides a built-in validation set. Further, it identifies the number of trees (ntree), required to stabilize the error rate during tuning more efficiently39,49. OOB error is an internal error estimate of a random forest as it is being constructed39. It is estimated by testing each tree built from the bootstrap aggregation (bagging samples) from the training set on the remaining (validation set as defined by the default data split) of the samples not used in building that tree; randomForest chooses a random subset of features and builds many regression trees, and the model averages out all the predictions of the decision trees.

Setting hyperparametric tuning and evaluation parameters

We first determined the optimal number of trees (ntree), producing the least OOB error rate. The term ‘Optimal’ refers to the number of trees that were just enough to stabilize the OOB error and improve efficiency by avoiding unnecessary runs, as determined from the ntree function and which.min argument. The optimal number of trees was delineated first by running 500 trees with the default 63.25:36.75 split for each N-management. A hypergrid search was then constructed across several hyperparameter combinations and looped through each combination (details are in Supplementary Dataset Table S2). The model was evaluated over all the combinations we passed in the search space function using the grid search. The hyperparameter searches applied (values in parenthesis) were: mtry (4 variables from 2 to 48), for the number of random root trait variables to include in each tree. The primary concern was to tune the number of candidate variables (features) to sample at each tree node split randomly; 2) sampsize (sample fractions 0.55, 0.60, 0.65, 0.70, 0.75, 0.80) denoting the number of samples to train, 3) model nodesize (8 variables from 1 to 48), which determines the minimum number of samples within the terminal nodes and thus controls the complexity of the trees. This was necessary to set a bias-variance tradeoff where smaller node size allows for deeper, more complex trees with the risk of introducing more variance (risk of overfitting) and larger node results in shallower trees which may introduce more bias (risk of not fully capturing unique patterns and relationships in the data)39. The minimum OOB root mean square error (OOB_RMSE) was set at zero (0). For ntree, we used 500 because the OOB_RMSE from hypergrid searches stabilized with less than 500 trees (Fig. 1). The resulting hyperparameter combination producing a model with the least prediction variance and OOB_RMSE was selected and tested with the training set and an independent, smaller test sample data (not used in each N-management training). The independent test sample was obtained from the optimal sampsize split, without bootstrap replacement.

Figure 1
figure 1

Model training and grid search across 192 hyperparametric combinations. Random forest out-of-bag error rate (a and b) compared to the corresponding validation error rate (c and d) averaged along tree optimization splits. The optimal number of trees and the corresponding mean squared error (out-of-bag error) are shown by the orange arrows on the graphs (ad). The root mean squared error (RMSE) at which all hyperparameters converged is shown in the histograms (e and f). RMSE at optimal hyperparameters (obtained for training model and used for independent validation sets) are shown by arrows in the histograms Also see Supplementary Table S2-tuning hyperparameters).

Constructing accuracy function and evaluating the models

We applied all the above models on the same independent test (validation set) dataset to evaluate the accuracy of the grid-tuned model compared to the default model. The best of the two (lower mean error rate and greater mean model R2 of regression trees) was used as our prediction model in a new regression run in randomForest to predict the new test set. The validation set was used as the independent test set since the sampsize split was done before bootstrapping and before sampsize split-variable randomization of the predictor root traits. Furthermore, we set importance as equals impurity' in the above modeling, which allows us to assess the variable importance of the root traits. Variable importance is measured by recording the decrease in mean square error (MSE) each time a variable is used as a node split in a tree39. The remaining error left in predictive accuracy after a node split is the ‘node impurity,’ and a variable that reduces this impurity is considered more important than those that do not. Consequently, the root variable with the greatest accumulated reduction in MSE was considered the more impactful39.


Model tuning and accuracy

Optimized hyperparameters used in cross-validation with both the training and test samples in the two N-management datasets were variable size, node size, sample size, and the number of trees. All the optimized settings resulting from hyperparameter grid search are in Table 1, and the stabilizing ntree and OOB-RMSE are shown by arrows in Fig. 1a–f, respectively. For each N-management, hyperparameters were constructed (tested) across a total of 196 models (combinations: 8 predictor parameters [all the eight root traits], 4 node sizes [2, 4, 6, 8], across 6 sample sizes [0.56, 0.60, 0.632, 0.70, 0.74, 0.80], and 1 predetermined optimal number of trees [within 500], Supplementary Table S2). To assess the performance of our tuned hyperparameters, we compared the mean OOB prediction error, and the mean OOB variance explained (R squared_OOB, Table 2) between the tuned OOB regression model (training) and the test model, and between the training model and the RF default models. The OOB prediction error of 0.210 g of SWt under LN and 2.712 g of SWt under HN for the trained model (Table 1) were marginally smaller (the smaller, the better) than those for the RF default cross-validation (LN, 0.227 g; HN, 2.794 g) and the test model (LN,0.239 g, and HN, 2.799 g). Here, we define a marginal' difference as separation by at least 1%, but not large enough to be statistically significant by the conventional (non-machine learning) mean separation methods. The prediction variance explained (R squared_OOB) by our tuned model (57.2% of SWt) was similar to that of the internally cross-validated (default RF) model (56.8% of SWt) under the LN management but marginally larger (61.3% of SWt) than the default (60.2% of SWt) under the HN management. The hyperparametric tuning performed marginally better in the test model, with 61.2% and 64.6% variance predicted for SWt under the LN and HN managements, respectively. Overall, the tuned model performed as expected (with no large penalty even with varying sample size) on the independent test data.

Table 1 Summary of model evaluation and test (validation) by random forest machine learning.
Table 2 The genetic correlations and phenotypic correlations of root traits.

Prediction by machine learning is a close approximation of both the genetic and phenotypic correlations

By machine learning (ML), we ranked root traits based on the predicted importance of each in the models in describing its relationship with SWt. The traits with the greatest variable importance (Tips under HN and Xsings under LN) identified by ML also had the largest rg and rp to SWt in the corresponding N-managements (Fig. 2). The traits with the least variable importance (RAvdiam under HN and RVol under LN) were correctly identified in three out of four cases by the rg and rp ranking methods; the exception was rg in the LN, where RVol followed RSarea as the least correlated to SWt. Overall, 6 out of 8 traits were correctly matched between ML and rg_HN, with a two-trait rg position switch, e.g., RVol then Xsings, instead of Xsings then RVol. Only 2 out of 8 traits were correctly matched between ML and rg_LN with a two-trait position switch between six traits, e.g., RAvdiam then Tips, RWt then Forks and RVol then RSrea, instead of vice versa (Fig. 2). These two-trait switches, in our opinion, are minor alterations if we consider the fact that the four root traits predicted by ML as the most important and the four predicted as the least important under LN were also the same root traits with the largest and smallest rg and rp under LN. The four traits predicted by ML as the most important and the least important under HN were the same for rp_LN except for RWt instead of RAvdiam for rp_LN. It seems that as the rg and rp decrease so does the ability of our ML variable importance predictions to correctly identify the ranking of root trait-SWt genotypic and phenotypic correlations, and vice versa.

Figure 2
figure 2

Ranking of root traits based on machine learning and multi-environment trait analysis methods. (a) and (b) are rankings of the variable importance of eight root traits (vertical axis) in a model for predicting shoot weight under high N-management, HN (a) and low N-management, LN (b) using random forest machine learning algorithm. Each point in (a) and (b) is a cumulative reduction in mean square error (MSE) of regression models each time the corresponding predictor root trait variable is used as a node split in a decision tree. The larger the cumulative reduction in the MSE, the greater the important ace and the higher a root trait ranks across all models tested. The traits have been ranked in ascending order with the least important at the bottom and the most important at the top on the vertical axis. (c) and (d) show the genetic correlation coefficients (rg), and (e) and (f) the phenotypic correlation coefficient (rp) between each variable and shoot weight. Correlation size and significance are shown on each bar. A negative ( −) before a number indicates it is a negative correlation, *significant at P = 0.05 and **significant at p < 0.01.

Pairwise genetic and phenotypic correlations are affected by N management

The rg between all traits within and between N-managements are summarized in Tables 2, 3 and 4, while the structure of these correlations is shown in Fig. 3. The main reason for estimating rg is to determine if a greater response on SWt would result by selecting a root trait as a secondary trait. The pairwise rg and rp between root traits and SWt were nearly identical under HN except for correlations between RAvdiam andSWt where rg = 0.045 and rp = 0.168. Under LN, the similarity was also high except for correlations between RAvdiam and SWt where rg =  −0.83 and rp =  −0.48, and between RSarea and SWt where rg = 0.05 and rp = 0.28. With the exception of RAvdiam and Xsings, the rg and rp between the other root traits and SWt were generally larger under HN compared to the corresponding rg and rp between root traits and SWt under LN (Fig. 2; Tables 2 and 3). The close similarities between rg and rp between within an N-management show that in our experimental growth environment, the rg and rp between the root traits and SWt were close approximations of each other within an N-management, likely due to low effects of the environment external to the growth facility on genotypes. Across the N-managements, the rg between the root traits and SWt were greater than the corresponding rp, most likely due to the between N-management treatment noise confounding the phenotypic variance on rp. As mentioned earlier, here, H was less than a threshold we had set at 0.1; therefore, H values for Xsings between the N-managements and rg associated with it were not included in the across-management output (Table 4).

Table 3 The genetic correlations (lower triangle) and phenotypic correlations (upper triangle) of eight root trait and one shoot trait under low N-management.
Table 4 The genetic correlations (lower triangle) and phenotypic correlations (upper triangle) between traits across the two N managements.
Figure 3
figure 3

The structure of genetic correlations between traits within and between the two N-managements. Ward method was used to construct the dendrograms. In the correlations between N-Managements, Xsings is missing because its heritability was < 0.1 threshold we had set when calculating the genetic correlation coefficients.

Variation among traits and between N managements

Variation among the genotypes and in the genotype × management had a significant effect for all the traits in the two N managements, but RAvdiam had the least variation among the genotypes with CV ~ 10.3% in the HN and ~ 7.1% in the LN, compared to CV ranging between ~ 47% to ~ 80% among the rest of the traits in the two N managements (Table 6). Because CV is highly dependent on the grand mean of a trial50, we exercised caution in using a CV to interpret the comparative variability between traits under the two N managements. The trait CV between the two N management did not show any specific pattern to suggest a trait variance inflation due to the low nitrogen treatment or the differences in the grand means.

Comparing heritability and correlated response in LN and HN among root traits

The mean H of only two traits, SWt and Xsings, were substantially greater under HN than under LN, while the mean H of RAvdiam was substantially greater under LN than under HN. Trait heritability showed varying degrees of H ‘instability’ between the two N-managements (Table 5). Some showed higher H under one N management than the other, with RAvdiam being the most heritable (under LN) and having the largest H difference between the N-managements (53.6% in LN and 95.7% in LN). RWt had the least difference (50.3% in HN and 50.6% in the LN). H between the N-managements was very low, yet the genetic correlation was very high, suggesting that H is affected by the environmental noise between the two N-managements.

Table 5 Variance, heritability and means separation for shoot weight and root traits.

RAvdiam had significant negative and large correlations (− 0.83) to SWt under LN, while the correlations were positive but not significant under HN. It was ranked among the highest in the RF regression under LN but ranked bottom in the HN. The heritability was very high (0.957) under LN compared to 0.536 under HN. On the other hand, Xsing, was ranked highest ranked by the RF regressions under LN, but lower under HN (rg and rp were positive and highly significant in both managements, while H was medium' in the LN, 0.588; in the HN, 0.658). These observations suggest that selecting for small root diameter may be desirable for improving shoot weight of baby spinach in low N. In fact, of all the root traits in this report, only RAvdiam had a predicted high indirect selection efficiency (113%) for SWt (rg_RTswt × HRT > Hswt, Table 6). Stronger correlated response efficiency of SWt was predicted for morphological (RLength and RAvdiam) and architectural traits (Tips and Xsings) compared to the standing crop root traits (RSarea, RVol, and RWt) (Table 6).

Table 6 Predicting the correlated response and indirect selection efficiency for shoot weight using root traits as the secondary traits under high N-management and low N-management.


The analysis pipeline was designed to define the phenotypic, genotypic, and predictive relationship between root architecture traits (Forks, Tips, and Xsings), root morphological traits (RLength, and RAvdiam), the root system of the standing crop (RSarea, RVol, and RWt) and between the root traits and the SWt of spinach grown in a soilless system. The objective was to determine root traits that have the greatest effect on the harvestable shoot under low N and thus can be used as a secondary trait to select for high NUE germplasm. We determined the phenotypic and genetic correlations (rp and rg, respectively) between root traits and between the root traits and the SWt within and between the N-managements. Parallel to the correlation analyses, we used the predictive random forest machine learning technique to rank the root phenotypes according to their strength to predict SWt.

Selecting the root traits with predicted potential as secondary traits for shoot biomass

We predicted the correlated response (CR) in SWt resulting from selecting any of the eight root traits to speculate its suitability as a secondary trait. An important component in defining CR is the H, which integrates information on genetic variation and environmental noise into one statistic and thus is useful in planning breeding programs51. One condition that must be met for indirect selection to be effective is H and rg must be high in both the selection and target environments43,44 even though H and rg are environment- and population-specific43,46. Fortunately, H has been strikingly similar in many environments34,52, and H variations in indoor growth environments are expected to be low52,53. In this context, since our H estimates were the average repeatability of 3-replicate trials in each of the N-managements, it may also be used to estimate the correlation expected between line means obtained from trials conducted at different indoor systems. Selecting a root trait in one N-management where H is high may predict the performance in the other, but we think the actual phenotypic quantity may vary substantially. However, if heritability values are high for both traits, then the correlation in breeding values dominates the phenotypic correlation43,45,46. Since the H values in SWt (H ~ 70%) was not as high compared to H of RAvdiam (~ 96%), and with a genetic correlation of − 0.827 between them, the correlation in environmental values within N-management which dominated the phenotypic correlation (− 0.481) between RAdiam and SWt may have been mainly due to LN-management effect on SWt. Thus, an LN environment that minimizes the devaluation of the breeding values between the RAvdiam and SWt must be maintained, and we believe indoor environments may provide this condition.

In this context, selecting for a root trait as a secondary trait should produce a correlated response in spinach shoot biomass, and the ratio CRswt/Rswt (Table 5) provides such an indirect selection criterion. It is clear (from these ratios) that direct selection of shoot biomass (SWt) is predicted to be superior to selecting for most of the eight root traits as a proxy in both N-managements. The exception is RAvdiam, which was predicted to result in superior indirect selection efficiency (112.8%) for SWt, i.e., a gain of ~ 12.8% in SWt by selecting against large RAvdiam. Other traits resulted in lower than 100% predicted efficiency; for instance, under LN was Xsings (74.6%), while the rest were less than 45% efficient. In the HN management, RLength (77.2%) and Tips (82.6%) were the most efficient but not enough for an indirect selection advantage.

The case for selecting against large average root diameter in baby spinach

We have noticed that under HN, RAvdiam did not have significant rg to any of the other root traits and with SWt, and only had significant positive rp with RWt and RVol (Table 2). Meanwhile, under LN, RAvdiam had significant positive rp only to RSarea and RVol, non-significant rp with RWt (0.104) and RLength, but significant negative rp with Tips (− 0.238) and Xsings (− 0.310) (Table 3). Spinach requires high N supply8, and under such conditions, it seems RAdiam is not likely to substantially influence the yield differences observed in shoot biomass among the genotypes. However, the significant negative rp (− 0.481) and the highly negative rg (− 0.827) with SWt under LN management (Table 3) suggest that larger mean root diameter is associated with smaller mean shoot weight and vice versa. The reduction in root diameter-related phenes in the youngest maize nodes under N stress suggested that root diameter might play a role in adaptive stress responses54,55. Whether or not the greater mean RAvdiam was due to root girth expansion in a negative feedback response to low N or competing resource allocation8,34 was not investigated in this study. Based on the pattern of diameter change in response to nutrient concentration in different species, it is suggested that altering root diameter may be another way to save C costs in root growth during nutrient stresses56. Although it is unclear how the root anatomical changes influence spinach root diameter, maize roots showed reduced cell diameter and area of vessels but an increased amount of aerenchyma during LN stress54. It is plausible to assume that N is preferentially allocated to the roots to sustain their growth under LN than shoots, and the reduced N concentration act as the internal signal in regulating the response of axile root growth. Given the robust correlated response efficiency of SWt predicted for RAvdiam and lateral root traits (Tips and Xsings) in our study, the RAvdiam measure can be used to indicate the ratio of axial: lateral roots.

The genotypic correlations between RAvdiam and RSarea (0.533) and between RAvdiam and RVol (0.794) were also highly significant and positive. This implies that selecting against large root diameter may also select against RSarea and RVol under low N-management. Since there was non-significant rg but significant rp between RSarea and SWt, our data suggest that only a limited genetic linkage drag or pleiotropy45 on shoot yield might result from selecting against large RAvdiam. Moreover, RSarea and RVol are standing crop traits37 for ‘root bulk’, which are a product of morphological traits (RAvdiam and RLength) and root architectural traits (e.g., Tips, Forks, and Xsings). The significant small positive rp between RSarea and SWt may be a combined artifact of these other morphological and architectural components when we also consider a significant negative rp existed between RAvdiam and SWt in the LN management. This position is also supported by the fact that RVol had a negative rg (− 0.296) and an insignificant rp to SWt, and yet RVol had positive rg (0.439) and rp (0.931) to RLength; RLength, on the other hand, had significant rg (0.539) and rp (0.529) to SWt. RAvdiam was also highly heritable with H ~ 96% under LN, Table 6). We propose that the root average diameter is the only trait in this study that can successfully be selected against to improve the yield of shoot biomass low N. Further studies to validate these findings might benefit from testing in multiple growth conditions (temperature, humidity, and graduated N concentrations) to define a broader range of root trait-shoot yield relationships and N-responsiveness.

Resolving the conundrum around the antagonistic relationship between RAvdiam and SWt

Compared to SWt, RAvdiam has shown greater H (Table 6). The rg between SWt and RAvdiam can be high (Table 3). In other words, indirect selection for a secondary trait will be superior if the heritability of that trait is high, and the correlation between the traits is close to 145,46. In this study, the RAvdiam met these two critical criteria. However, for practical use in a breeding program, the secondary trait must also be inexpensive and easy to measure in large trials43,45. In that case, shoot biomass estimate could be used to select for roots bulk traits in production systems that target spinach roots as the end product. Because precision in imaging techniques for roots and shoot is rapidly evolving57,58, we believe that soilless systems can be designed to facilitate robust root metrics characterization to match the ease with which above-ground biomass can be phenotyped. It would also be worthwhile to determine if further selection for/ against other root traits would eventually result in a superior secondary selection for shoot biomass. How these relationships would play along as the plants mature under different indoor growth conditions or in the field conditions require further studies.

Although a soilless system reduces the complexities associated with soils, we believe that the selection of a root trait needs to be understood in the context of the possible complex interplay among root traits45,51. The possible complex interactions between the root traits that may have influenced the SWt were not explicitly considered in our interpretations. However, we have alluded to such complexity by describing the trait to trait correlations, which we hope should serve as an impetus for further inquiry. Although the expected genetic correlation between estimates of cultivar means are best obtained from independent sets of trials43,59, we hypothesize that under similar N treatments, manipulation of other growth conditions in independent indoor growth environments may lead to some deviations from the response to selection predicted here. With the advent of techniques in processing images and the deep learning60 frameworks that use advanced optimization and features from data, such prediction accuracy is likely to improve continually61,62,63. Nonetheless, machine learning recognized root traits would continue to rely on vigorous calibrations and field-based validation in systems of interest.

In conclusion, we report on the investigation of eight root traits genetic and phenotypic correlations with fresh shoot biomass of spinach grown in a soilless system in a controlled indoor environment. The plants were harvested at 41 d after sowing, a stage corresponding to the marketable baby spinach. We have used both genotypes by management and other conventional breeder statistics and a machine learning predictive technique to define candidate root traits with the potential for indirectly selecting for spinach shoot yield. The experiments were set up under two separate and contrasting N-managements. Of the eight root traits, the root average diameter emerged as the only candidate with a predicted indirect selection efficiency good enough to improve shoot biomass. However, it had a robust negative genetic correlation with shoot yield, making us believe that selecting against large root diameter may improve the fresh shoot yield of baby spinach. We have exercised caution in this interpretation by recommending further studies into the possible complex interactions among the root traits considered in improving shoot biomass yield in baby spinach.