Abstract
The efficient acquisition and transport of nutrients by plants largely depend on the root architecture. Due to the absence of complex microbial network interactions and soil heterogeneity in a restricted soilless medium, the architecture of roots is a function of genetics as modified by the soilless matrix and exogenously supplied nutrients such as nitrogen (N). The knowledge of root trait combinations that offer the optimal nitrogen use efficiency (NUE) is far from conclusive. The objective of this study was to define the root trait(s) that best predict and correlate with vegetative biomass under different N treatments. We used eight image-derived root architectural traits of 202 diverse spinach lines grown in two N concentrations (high N, HN, and low N, LN) in a randomized complete block design. Supervised random forest (RF) machine learning augmented by ranger hyperparameter grid search was used to predict the variable importance of the root traits. We also determined the broad-sense heritability (H) and genetic (rg) and phenotypic (rp) correlations between root traits and the vegetative biomass (shoot weight, SWt). Each root trait was assigned a predicted importance rank based on the trait’s contribution to the cumulative reduction in the mean square error (MSE) in the RF tree regression models for SWt. The root traits were further prioritized for potential selection based on the rg and SWt correlated response (CR). The predicted importance of the eight root traits showed that the number of root tips (Tips) and root length (RLength) under HN and crossings (Xsings) and root average diameter (RAvdiam) under LN were the most relevant. SWt had a highly antagonistic rg (− 0.83) to RAvdiam, but a high predicted indirect selection efficiency (− 112.8%) with RAvdiam under LN; RAvdiam showed no significant rg or rp to SWt under HN. 
In limited N availability, we suggest that selecting against larger RAvdiam as a secondary trait might improve biomass and, hence, NUE with no apparent yield penalty under HN.
Introduction
As the first tissues to intercept nutrients and water, roots play an essential role in plant growth and development. Root architecture varies widely in response to different nutrient deficiencies1 and adapts to continually changing growth conditions through structural plasticity2. Most plants can utilize only half of the applied N, losing the rest in the form of nitrates (NO3−), which can cause environmental hazards3. Spinach requires high rates of N fertilizer to produce high biomass and quality. It is estimated that about 60% of N applied to spinach in commercial production is lost through leaching4 due to its shallow root system5 and short production cycle. Spinach is also relatively poor in its NO3− reducing capacity6,7. About 80% of the total root length of spinach settles in the upper 0–15 cm soil layer, and the root distribution is not affected even by the addition of usable nitrates below 15–30 cm5. Thus, the rooting structure and the growth patterns that define roots are essential considerations to delineate the differences in nutrient absorption and efficiency. This is even more important in spinach due to its high affinity for N8. The morphological changes in the root system are regulated by the plant's nutritional status and interaction with the surrounding environment as detected through the localized signals by roots. Several studies have discussed the N-dependent (NO3−, ammonium and glutamate) changes in the root architecture across species9,10,11,12,13. Therefore, investigating root development is important for understanding plant responses to low-N stresses.
Unlike soil, where the complex root-microbial association may naturally facilitate the plant absorptive capacity of hitherto unavailable N14,15, root development in the soilless media is entirely reliant on an exogenous supply of nutrients. Due to their inert nature, the soilless systems minimize the changes that could occur due to gradients in temperature, oxygen status, water availability, pH, bulk density, or seasonal variations. In these cases, the root structure and appearance are a function of genetics as modified by the soilless matrix, and the concentration of the nutrients applied16. Different studies have profiled root systems and their association with the environment in field conditions, non-soil media and microbes17,18,19. Modeling techniques for root feature diversity, structure, and activity have been attempted, including multivariate and machine-learning techniques20,21,22. However, much remains to decipher the genetic and potential importance prioritization of the assigned root traits in influencing the above-ground biomass.
Root structure and its functioning are associated with N uptake, which influences plant performance and yield. Although improving root performance is relevant to all crops, it is particularly relevant to short-cycle vegetable crops like spinach that would benefit from early below-ground vigor23. However, the shorter crop duration of spinach allows only a short time for root development24,25. To date, much research on developing breeding strategies to improve N uptake or utilization is focused on modifying the root architecture of main crops like maize, rice, and wheat. Although several root architectural, topological, and developmental traits such as deeper and longer roots, rapid growth, and higher root density associated with higher N use efficiency have been identified26,27 in cereal crops, such efforts to identify root traits that respond to N stress to capture the N available at depths in vegetables are limited.
Soilless indoor farming has become increasingly popular in recent years28,29. Since a soilless indoor system relies on the artificial supply of essential nutrients28,29,30, nutrient management is critical for the harvestable quality and quantity of a crop31,32. The differences in 'value for money' between varieties of the same species may sometimes come down to how efficiently the crop can take up the nutrients and convert them into harvestable products. Rooting system and architecture are important determinants of efficiency that maintain a favorable balance between resource investment (photosynthates) and resource acquisition (raw materials)33,34. Varieties with favorable root architecture that enhances nutrient uptake and photosynthate use will reduce operating costs while balancing nutritional content and yield. This study investigates root traits and their relationship to the harvestable 'above-ground' biomass of spinach grown in two contrasting N supplies in a uniformly controlled indoor environment. Our assumption here is that in a uniform soilless matrix, genetics and the N management are the two primary sources of variation defining root architecture. We used a supervised random forest machine learning algorithm, optimized with 'ranger'35, to predict the importance of eight root traits for the spinach's harvestable biomass at the 'baby-spinach' stage. We also applied META-R36 to determine the genetic and phenotypic correlations and heritability to prioritize the root traits in influencing the harvestable shoot biomass. Finally, we compare the machine learning classification and the prioritization based on genetic correlations and present the top trait(s) with the highest selection potential.
Methods
Plants, plant material, experimental setup, and evaluation environment
A collection of 202 spinach (S. oleracea) accessions maintained and provided by the USDA-National Plant Germplasm System (NPGS) (https://npgsweb.ars-grin.gov/) at Ames, Iowa, U.S.A. was used in the present study. The plants were grown in a growth chamber under controlled conditions of 12/12 h (light/dark), 22 °C, and 75% relative humidity. The seeds of each accession were sown in triplicate in turface (Turface Athletics MVP, PROFILE Products LLC, Buffalo Grove, Illinois, USA) in small pots (10.2 cm × 10.2 cm and 8.9 cm deep). Each set of replicates was completely randomized across separate shelves. After seedling emergence, plants were fertilized with Peters professional ready mix (5-11-26, hydroponics special water-soluble fertilizer, Everris NA Inc., Ohio, USA) every four days. Two concentrations of nitrogen (N), low N (50 ppm) and high N (200 ppm), were used for low and high N management, respectively. Additional N for the high N-management was provided using calcium nitrate, and equivalent calcium (3.85 mM) was replaced by calcium chloride in the low N-management. The concentrations of the macro/micro-nutrients present in the fertilizer solution are provided in Supplementary Dataset Table S1.
The experimental research in the lab facilities for this study was performed as required by Texas A&M System Regulation (15.99.06 Use of Biohazards in Research, Teaching, and Testing) and the University’s Rule for the use of Biohazards and Dual Use Research of Concern (15.99.06.M1 Use of Biohazards, toxins and rDNA and DURC), approved by the Texas A&M Institutional Biosafety Committee (IBC).
Plant material processing, root imaging, and data processing
The plants were harvested at the physiological maturity of baby spinach (5–6 leaves), 41 days after sowing. Each plant was carefully pulled from the turface and washed with running water to clear any debris off the roots. The roots were separated from the shoot at the cotyledonary nodes and floated in water. The lateral roots were separated gently using a fine tip paintbrush to minimize the overlapping of roots. The images were taken by digitally scanning roots of individual plants (Supplementary Dataset Figure S1) using a high-resolution scanner (Calibrated Color Optical scanner STD4800 with Special Lighting System), and the scanned images were analyzed using WinRHIZO Pro software (Regent Instruments Inc. Canada). Categorization of the traits was adapted from the Fine-Root Ecology Database37 classification and included: (1) morphology for root length (RLength) and average root diameter (RAvdiam); (2) an indication of the complexity of the root system architecture measured using number of tips, forks (number of root bifurcations), and crossings (Xsings; overlapping parts); (3) root system of the standing crop (RSSC) for root volume (RVol), root surface area (RSarea) and root weight (RWt). The harvestable above-ground biomass was determined as fresh shoot weight (SWt) (WR P-series balance, Model 500P, VWR International, U.S.A.) after removing the excess surface moisture by gently paper-blotting the wet roots followed by a two-minute air drying.
Data analysis
The analysis pipeline was designed to define the phenotypic, genotypic, and predictive relationship between the root traits and between the root traits and the SWt of spinach plants grown in a soilless system. We determined the rg and rp (defined below in the section -Determining the genotypic and phenotypic correlation between traits) between root traits and rg and rp between the root traits and the SWt within and between the two N managements. Parallel to the correlation analyses, we used a supervised random forest machine learning38,39 technique to estimate the variable importance of each root phenotype in predicting the above-ground shoot biomass. The details of the parallel procedures are described below.
Individual trait and combined management variance analysis and mean separation
Linear mixed models were implemented with lmer from the R package lme4 using REML, via Multi Environment Trial Analysis with R (META-R)36, to calculate the adjusted means (best linear unbiased estimates, BLUEs, and predictors, BLUPs) for each root and shoot variable under the two N managements. For the individual analyses, the model was
$$Y_{ijk} = \mu + Rep_i + Gen_k + \varepsilon_{ijk}$$
where Yijk is the trait of interest, µ is the mean effect, Repi is the effect of the ith replicate represented by the complete blocks, Genk is the effect of the kth genotype, and εijk is the error associated with the ith replication, jth incomplete block and the kth genotype, which is assumed to be normally and independently distributed, with mean zero and homoscedastic variance σ2. For the combined analyses across the two N-managements, the model was adjusted to
$$Y_{ijkl} = \mu + Mgt_i + Rep_j(Mgt_i) + Gen_l + Mgt_i \times Gen_l + \varepsilon_{ijkl}$$
where the new terms Mgti and Mgti × Genl are the effects of the ith N-management and the N-management by genotype interaction, respectively, and Repj(Mgti) is the jth replicate nested within the ith N-management. Genotype and N-management were both treated as random effects, and BLUPs were used to estimate random effects and BLUEs to estimate the fixed effect. Grand means were separated based on Fisher’s Least Significant Difference (LSD) at α = 5%. We also determined the coefficients of variation (CV) for all traits.
Heritability
We estimated the average broad-sense heritability (repeatability, H) of three replicates in each N-management, which is also an estimate of the correlation expected between line means from the three replicate trials conducted in the two N-managements. H was determined40 on a line mean basis as
$$H = \frac{\sigma^2_g}{\sigma^2_g + \sigma^2_e / nReps}$$
and combined for the two N-managements as
$$H = \frac{\sigma^2_g}{\sigma^2_g + \sigma^2_{ge} / nMgt + \sigma^2_e / (nMgt \times nReps)}$$
where σ2g and σ2e are the genotype and the residual error variance components, respectively; nReps is the number of replicates, σ2ge is the variance component of genotype by N-management interaction, and nMgt is the number of N-managements in the analysis.
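As a small illustration of the line-mean heritability described in this section, the sketch below assumes the standard forms H = σ2g / (σ2g + σ2e/nReps) within one N-management and H = σ2g / (σ2g + σ2ge/nMgt + σ2e/(nMgt × nReps)) across managements. The function names and the variance components are hypothetical and are not values from this study.

```python
def heritability_single(var_g, var_e, n_reps):
    """Line-mean broad-sense heritability within one N-management.

    Assumed form: var_g / (var_g + var_e / n_reps).
    """
    return var_g / (var_g + var_e / n_reps)


def heritability_combined(var_g, var_ge, var_e, n_mgt, n_reps):
    """Line-mean heritability across N-managements.

    Assumed form: var_g / (var_g + var_ge / n_mgt + var_e / (n_mgt * n_reps)).
    """
    return var_g / (var_g + var_ge / n_mgt + var_e / (n_mgt * n_reps))


# Hypothetical variance components (illustration only, not study data).
h_single = heritability_single(var_g=2.0, var_e=3.0, n_reps=3)
h_combined = heritability_combined(var_g=2.0, var_ge=2.0, var_e=3.0,
                                   n_mgt=2, n_reps=3)
print(f"H within one management:  {h_single:.3f}")
print(f"H across two managements: {h_combined:.3f}")
```

With the same genotypic variance, a non-zero genotype by N-management variance lowers the combined H relative to the single-management H, which is the pattern one would expect when line rankings shift between managements.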
Determining the genotypic and phenotypic correlation between traits
Genetic and phenotypic correlations were calculated for each trait pair, within and across the N-managements. The rg were also determined in META-R, which applies the equations from Cooper et al.41. Between the N-managements, rg was estimated as
$$r_g = \frac{r_{p(jj')}}{\sqrt{H_j \times H_{j'}}}$$
and between traits within a single N-management,
$$r_g = \frac{\overline{\sigma}_{g(jj')}}{\overline{\sigma_{g(j)}\sigma_{g(j')}}}$$
where rp(jj′) is the phenotypic correlation between N-managements j and j′; Hj and Hj′ are the heritability of N-managements j and j′, respectively; σg(jj′) is the arithmetic mean of all pairwise genotypic covariances between traits j and j′; and σg(j)σg(j′) is the arithmetic average of all pairwise geometric means among the genotypic variance components of the traits.
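The two estimators described here can be sketched in a few lines, assuming that the between-management rg is the phenotypic correlation divided by the square root of the product of the two heritabilities, and that the between-trait rg is the genotypic covariance divided by the geometric mean of the two genotypic variances. The function names and inputs below are hypothetical; the sketch handles a single pair rather than the averaging over all pairs that META-R performs.

```python
import math


def rg_between_managements(r_p, h_j, h_j0):
    """Genetic correlation of one trait between two N-managements.

    Assumed form: r_p / sqrt(H_j * H_j0).
    """
    return r_p / math.sqrt(h_j * h_j0)


def rg_between_traits(cov_g, var_g_j, var_g_j0):
    """Genetic correlation between two traits within one N-management.

    Assumed form: genotypic covariance over the geometric mean of the
    two genotypic variances.
    """
    return cov_g / math.sqrt(var_g_j * var_g_j0)


# Hypothetical inputs (illustration only, not study data).
rg_mgt = rg_between_managements(r_p=0.30, h_j=0.64, h_j0=0.36)
rg_trt = rg_between_traits(cov_g=-1.2, var_g_j=4.0, var_g_j0=1.0)
print(f"rg between managements: {rg_mgt:.3f}")
print(f"rg between traits:      {rg_trt:.3f}")
```

Because the between-management estimate divides by heritabilities below 1, it is always at least as large in magnitude as the phenotypic correlation it is derived from, and it can exceed 1 when heritabilities are low.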
For graphical illustrations, cluster analysis based on the environment distance matrix (1 − genetic correlation matrix) was also performed using the ‘Ward’ method42, creating a dendrogram. In each case, a minimum heritability threshold was set at 0.1; any trait whose heritability within or between the two N-managements was lower than 0.1 was excluded from the analysis and was not plotted. For phenotypic correlations, simple Pearson correlations between different pairs of N-managements or traits were used.
Predicting correlated response
Correlated response (CR) was predicted for SWt to determine whether direct selection or indirect selection through a root trait would be superior under similar N-managements. We used the formula:
$$CR_{swt} = i \, r_g \sqrt{H_{RT}} \sqrt{V_{g(swt)}}$$
where CRswt is the correlated response of SWt, rg is the genetic correlation, HRT is the repeatability of root traits, Vg(swt) is the genetic variance of SWt, and i is the selection intensity whose estimate we assumed would be similar between traits. Thus, CRswt was compared to direct response (R),
$$R_{swt} = i \sqrt{H_{swt}} \sqrt{V_{g(swt)}}$$
by CRswt/Rswt = rg√HRT/√Hswt. That is, if rg × HRT > Hswt, then indirect selection would be superior43,44,45,46.
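The ratio CRswt/Rswt = rg√HRT/√Hswt quoted above can be evaluated directly. The sketch below is a minimal illustration of that ratio only; the function name and the input values are hypothetical and are not taken from the tables of this study. A negative rg gives a negative ratio, which indicates selection against the root trait; its absolute value is what is compared with 1 (100%).

```python
import math


def indirect_selection_ratio(r_g, h_root, h_swt):
    """Ratio of correlated response to direct response for SWt.

    Implements CR/R = r_g * sqrt(H_RT) / sqrt(H_swt), with equal
    selection intensity assumed for both traits.
    """
    return r_g * math.sqrt(h_root) / math.sqrt(h_swt)


# Hypothetical cases (illustration only, not study data).
strong = indirect_selection_ratio(r_g=-0.90, h_root=0.81, h_swt=0.49)
weak = indirect_selection_ratio(r_g=0.50, h_root=0.49, h_swt=0.64)

for label, ratio in (("strong proxy", strong), ("weak proxy", weak)):
    verdict = "indirect" if abs(ratio) > 1.0 else "direct"
    print(f"{label}: CR/R = {ratio:+.3f} -> {verdict} selection favored")
```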
Summary of data preparation and evaluation by machine learning
To rank the root traits by importance in the prediction models for SWt, we used random forest (RF) modeling in R. RF is a powerful ensemble machine learning tool that combines the outputs of numerous decision tree models. We used the regression mode of the randomForest47 package and first ran the regression on default tuning parameters. We also invoked a user-defined hyperparameter tuning in the ranger35 package to optimize our models; ranger is a C++ implementation of Breiman's FORTRAN-based random forest algorithm39. Finally, we compared the accuracy of the model from the randomForest default tuning with that from the ranger hyperparametric search tuning. The function missForest48 was used to impute missing data. Outliers were normalized by an internally derived proximity matrix procedure built into the RF. In this procedure, whenever case i and case j end up in the same tree node, the proximity prox(ij) between i and j is increased by 1; the counts are accumulated over all trees in the RF and then normalized by twice the number of trees in the RF. This creates a square proximity matrix in which observations that are ‘similar or alike’ in value have proximities close to 1 and dissimilar observations have proximities closer to 0.
Default tuning and model evaluation
The default data split (63.25% as the training dataset and the remainder as the validation set) was applied to train the model for each N-management. The 63.25% is the expected proportion of unique observations in a bootstrap sample39,49. The typical range is ~ 60 to 85%: smaller sample sizes can reduce the training time but may introduce more bias than necessary, while too large a sample size can increase performance but at the risk of overfitting because it introduces more variance39. A k-fold cross-validation feature in RF evaluates model performance by training the model on a number of smaller datasets and evaluating it on the other, held-out testing sets. By default, randomForest randomly splits the data into k folds of almost the same size, and each fold model is trained on the other folds and tested on the remaining fold39,47. This process is repeated until all the subsets have been evaluated. The regression tree parameters are tuned further by choosing the number of independent variables (m), using the default m = p/3, where p is the total number of root traits in our analysis. This helps the model generalize, returns the least out-of-bag (OOB) error rate, and provides a built-in validation set. Further, it identifies more efficiently the number of trees (ntree) required to stabilize the error rate during tuning39,49. OOB error is an internal error estimate of a random forest as it is being constructed39. It is estimated by testing each tree, built from a bootstrap aggregation (bagging) sample of the training set, on the remaining samples not used in building that tree (the validation set as defined by the default data split); randomForest chooses a random subset of features and builds many regression trees, and the model averages the predictions of all the decision trees.
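The roughly 63% figure for unique observations in a bootstrap sample, and the default m = p/3 for regression, can both be checked with a short calculation. The sketch below is illustrative only: it uses the closed form 1 − (1 − 1/n)^n, a small simulation with n = 202 (the number of accessions), and the integer rule max(floor(p/3), 1) that randomForest documents as its regression default. The function names are hypothetical.

```python
import random


def expected_unique_fraction(n):
    """Expected share of distinct cases in a bootstrap sample of size n."""
    return 1.0 - (1.0 - 1.0 / n) ** n


def simulated_unique_fraction(n, n_draws=2000, seed=1):
    """Monte Carlo estimate of the same quantity (sampling with replacement)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        sample = [rng.randrange(n) for _ in range(n)]
        total += len(set(sample)) / n
    return total / n_draws


def default_mtry_regression(p):
    """Default number of candidate variables per split for regression."""
    return max(p // 3, 1)


n_lines = 202   # accessions in the study
n_traits = 8    # root traits used as predictors
exact = expected_unique_fraction(n_lines)
approx = simulated_unique_fraction(n_lines)
print(f"closed form: {exact:.4f}, simulated: {approx:.4f}")
print(f"default mtry for p = {n_traits}: {default_mtry_regression(n_traits)}")
```

The cases left out of each bootstrap sample (about 37%) are the out-of-bag cases on which that tree is evaluated.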
Setting hyperparametric tuning and evaluation parameters
We first determined the optimal number of trees (ntree), producing the least OOB error rate. The term ‘optimal’ refers to the number of trees that was just enough to stabilize the OOB error and improve efficiency by avoiding unnecessary runs, as determined from the ntree function and which.min argument. The optimal number of trees was delineated first by running 500 trees with the default 63.25:36.75 split for each N-management. A hypergrid search was then constructed across several hyperparameter combinations and looped through each combination (details are in Supplementary Dataset Table S2). The model was evaluated over all the combinations we passed in the search space function using the grid search. The hyperparameter searches applied (values in parentheses) were: 1) mtry (4 variables from 2 to 48), for the number of random root trait variables to include in each tree; the primary concern was to tune the number of candidate variables (features) to sample randomly at each tree node split; 2) sampsize (sample fractions 0.55, 0.60, 0.65, 0.70, 0.75, 0.80), denoting the fraction of samples used for training; and 3) nodesize (8 variables from 1 to 48), which determines the minimum number of samples within the terminal nodes and thus controls the complexity of the trees. This was necessary to set a bias-variance tradeoff, where a smaller node size allows for deeper, more complex trees with the risk of introducing more variance (risk of overfitting) and a larger node size results in shallower trees, which may introduce more bias (risk of not fully capturing unique patterns and relationships in the data)39. The minimum OOB root mean square error (OOB_RMSE) was set at zero (0). For ntree, we used 500 because the OOB_RMSE from hypergrid searches stabilized with fewer than 500 trees (Fig. 1). 
The resulting hyperparameter combination producing a model with the least prediction variance and OOB_RMSE was selected and tested with the training set and an independent, smaller test sample data (not used in each N-management training). The independent test sample was obtained from the optimal sampsize split, without bootstrap replacement.
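The grid-search loop described above can be sketched as a Cartesian product of candidate settings, each scored and compared on an error metric. In the sketch below, the sample fractions are those listed in the text, while the mtry and nodesize candidates and the scoring function are hypothetical placeholders (the actual grid is in Supplementary Dataset Table S2, and the actual score would be the OOB_RMSE of a fitted forest).

```python
from itertools import product


def build_grid(mtry_values, nodesize_values, sample_fractions):
    """Return every hyperparameter combination as a dictionary."""
    return [
        {"mtry": m, "nodesize": s, "sampsize": f}
        for m, s, f in product(mtry_values, nodesize_values, sample_fractions)
    ]


def grid_search(grid, score_fn):
    """Return (best_params, best_score), minimizing score_fn over the grid."""
    best_params, best_score = None, float("inf")
    for params in grid:
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_oob_rmse(params):
    """Placeholder for fitting a forest and reading its OOB RMSE."""
    return (abs(params["mtry"] - 4)
            + abs(params["nodesize"] - 5) / 10
            + abs(params["sampsize"] - 0.70))


grid = build_grid(
    mtry_values=[2, 4, 6, 8],                       # hypothetical candidates
    nodesize_values=[1, 3, 5, 7, 9, 11, 13, 15],    # hypothetical candidates
    sample_fractions=[0.55, 0.60, 0.65, 0.70, 0.75, 0.80],
)
best_params, best_score = grid_search(grid, toy_oob_rmse)
print(len(grid), "combinations; best:", best_params)
```

In practice, `toy_oob_rmse` would be replaced by a function that trains one forest per combination with a fixed ntree and returns its out-of-bag error.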
Constructing accuracy function and evaluating the models
We applied all the above models to the same independent test (validation set) dataset to evaluate the accuracy of the grid-tuned model compared to the default model. The best of the two (lower mean error rate and greater mean model R2 of regression trees) was used as our prediction model in a new regression run in randomForest to predict the new test set. The validation set was used as the independent test set since the sampsize split was done before bootstrapping and before sampsize split-variable randomization of the predictor root traits. Furthermore, we set importance = 'impurity' in the above modeling, which allows us to assess the variable importance of the root traits. Variable importance is measured by recording the decrease in mean square error (MSE) each time a variable is used as a node split in a tree39. The remaining error left in predictive accuracy after a node split is the ‘node impurity,’ and a variable that reduces this impurity is considered more important than those that do not. Consequently, the root variable with the greatest accumulated reduction in MSE was considered the most impactful39.
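To make the impurity-based importance concrete, the sketch below computes the reduction in squared error produced by a single node split on each of two predictors. It is a simplified illustration under stated assumptions: it uses the sum of squared errors (the node-size-weighted form of MSE), a single fixed threshold per predictor, and made-up data; a forest would accumulate such reductions over every split in every tree. All names are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty node)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def split_impurity_decrease(x, y, threshold):
    """Reduction in squared error when a node is split at x <= threshold."""
    left = [yi for xi, yi in zip(x, y) if xi <= threshold]
    right = [yi for xi, yi in zip(x, y) if xi > threshold]
    return sse(y) - (sse(left) + sse(right))


# Made-up response and two made-up predictors (illustration only).
response = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
trait_a = [1, 2, 3, 4, 5, 6]   # orders the response well
trait_b = [1, 6, 2, 5, 3, 4]   # mixes low and high responses

importance = {
    "trait_a": split_impurity_decrease(trait_a, response, threshold=3.5),
    "trait_b": split_impurity_decrease(trait_b, response, threshold=3.5),
}
ranking = sorted(importance, key=importance.get, reverse=True)
print(importance, ranking)
```

The predictor whose split separates low from high responses removes most of the squared error and therefore ranks first, which mirrors how the accumulated MSE reduction ranks the root traits.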
Results
Model tuning and accuracy
Optimized hyperparameters used in cross-validation with both the training and test samples in the two N-management datasets were variable size, node size, sample size, and the number of trees. All the optimized settings resulting from hyperparameter grid search are in Table 1, and the stabilizing ntree and OOB-RMSE are shown by arrows in Fig. 1a–f, respectively. For each N-management, hyperparameters were constructed (tested) across a total of 196 models (combinations: 8 predictor parameters [all the eight root traits], 4 node sizes [2, 4, 6, 8], across 6 sample sizes [0.56, 0.60, 0.632, 0.70, 0.74, 0.80], and 1 predetermined optimal number of trees [within 500], Supplementary Table S2). To assess the performance of our tuned hyperparameters, we compared the mean OOB prediction error and the mean OOB variance explained (R squared_OOB, Table 2) between the tuned OOB regression model (training) and the test model, and between the training model and the RF default models. The OOB prediction errors of 0.210 g of SWt under LN and 2.712 g of SWt under HN for the trained model (Table 1) were marginally smaller (the smaller, the better) than those for the RF default cross-validation (LN, 0.227 g; HN, 2.794 g) and the test model (LN, 0.239 g; HN, 2.799 g). Here, we define a 'marginal' difference as separation by at least 1%, but not large enough to be statistically significant by the conventional (non-machine learning) mean separation methods. The prediction variance explained (R squared_OOB) by our tuned model (57.2% of SWt) was similar to that of the internally cross-validated (default RF) model (56.8% of SWt) under the LN management but marginally larger (61.3% of SWt) than the default (60.2% of SWt) under the HN management. The hyperparametric tuning performed marginally better in the test model, with 61.2% and 64.6% variance predicted for SWt under the LN and HN managements, respectively. 
Overall, the tuned model performed as expected (with no large penalty even with varying sample size) on the independent test data.
Prediction by machine learning is a close approximation of both the genetic and phenotypic correlations
By machine learning (ML), we ranked root traits based on the predicted importance of each in the models in describing its relationship with SWt. The traits with the greatest variable importance (Tips under HN and Xsings under LN) identified by ML also had the largest rg and rp to SWt in the corresponding N-managements (Fig. 2). The traits with the least variable importance (RAvdiam under HN and RVol under LN) were correctly identified in three out of four cases by the rg and rp ranking methods; the exception was rg under LN, where RVol followed RSarea as the least correlated to SWt. Overall, 6 out of 8 traits were correctly matched between ML and rg_HN, with a two-trait rg position switch, e.g., RVol then Xsings, instead of Xsings then RVol. Only 2 out of 8 traits were correctly matched between ML and rg_LN, with two-trait position switches among six traits, e.g., RAvdiam then Tips, RWt then Forks, and RVol then RSarea, instead of vice versa (Fig. 2). These two-trait switches, in our opinion, are minor alterations if we consider the fact that the four root traits predicted by ML as the most important and the four predicted as the least important under LN were also the same root traits with the largest and smallest rg and rp under LN. The four traits predicted by ML as the most important and the least important under HN were the same for rp_LN except for RWt instead of RAvdiam for rp_LN. It seems that as the rg and rp decrease, so does the ability of our ML variable importance predictions to correctly identify the ranking of root trait-SWt genotypic and phenotypic correlations, and vice versa.
Pairwise genetic and phenotypic correlations are affected by N management
The rg between all traits within and between N-managements are summarized in Tables 2, 3 and 4, while the structure of these correlations is shown in Fig. 3. The main reason for estimating rg is to determine if a greater response in SWt would result from selecting a root trait as a secondary trait. The pairwise rg and rp between root traits and SWt were nearly identical under HN except for correlations between RAvdiam and SWt, where rg = 0.045 and rp = 0.168. Under LN, the similarity was also high except for correlations between RAvdiam and SWt, where rg = −0.83 and rp = −0.48, and between RSarea and SWt, where rg = 0.05 and rp = 0.28. With the exception of RAvdiam and Xsings, the rg and rp between the other root traits and SWt were generally larger under HN compared to the corresponding rg and rp between root traits and SWt under LN (Fig. 2; Tables 2 and 3). The close similarities between rg and rp within an N-management show that in our experimental growth environment, the rg and rp between the root traits and SWt were close approximations of each other within an N-management, likely due to low effects of the environment external to the growth facility on genotypes. Across the N-managements, the rg between the root traits and SWt were greater than the corresponding rp, most likely due to the between N-management treatment noise confounding the phenotypic variance on rp. As mentioned earlier, here, H was less than a threshold we had set at 0.1; therefore, H values for Xsings between the N-managements and rg associated with it were not included in the across-management output (Table 4).
Variation among traits and between N managements
Variation among the genotypes and in the genotype × management interaction was significant for all the traits in the two N managements, but RAvdiam had the least variation among the genotypes, with CV ~ 10.3% in the HN and ~ 7.1% in the LN, compared to CV ranging from ~ 47% to ~ 80% among the rest of the traits in the two N managements (Table 6). Because CV is highly dependent on the grand mean of a trial50, we exercised caution in using CV to interpret the comparative variability between traits under the two N managements. The trait CV between the two N managements did not show any specific pattern to suggest a trait variance inflation due to the low nitrogen treatment or the differences in the grand means.
Comparing heritability and correlated response in LN and HN among root traits
The mean H of only two traits, SWt and Xsings, was substantially greater under HN than under LN, while the mean H of RAvdiam was substantially greater under LN than under HN. Trait heritability showed varying degrees of H ‘instability’ between the two N-managements (Table 5). Some showed higher H under one N management than the other, with RAvdiam being the most heritable (under LN) and having the largest H difference between the N-managements (53.6% in HN and 95.7% in LN). RWt had the least difference (50.3% in HN and 50.6% in LN). H between the N-managements was very low, yet the genetic correlation was very high, suggesting that H is affected by the environmental noise between the two N-managements.
RAvdiam had a significant, large negative correlation (− 0.83) with SWt under LN, while the correlations were positive but not significant under HN. It was ranked among the highest in the RF regression under LN but ranked at the bottom under HN. The heritability was very high (0.957) under LN compared to 0.536 under HN. On the other hand, Xsings was ranked highest by the RF regressions under LN, but lower under HN (rg and rp were positive and highly significant in both managements, while H was 'medium': in the LN, 0.588; in the HN, 0.658). These observations suggest that selecting for small root diameter may be desirable for improving shoot weight of baby spinach in low N. In fact, of all the root traits in this report, only RAvdiam had a predicted high indirect selection efficiency (113%) for SWt (rg_RTswt × HRT > Hswt, Table 6). Stronger correlated response efficiency of SWt was predicted for morphological (RLength and RAvdiam) and architectural traits (Tips and Xsings) compared to the standing crop root traits (RSarea, RVol, and RWt) (Table 6).
Discussion
The analysis pipeline was designed to define the phenotypic, genotypic, and predictive relationship between root architecture traits (Forks, Tips, and Xsings), root morphological traits (RLength, and RAvdiam), the root system of the standing crop (RSarea, RVol, and RWt) and between the root traits and the SWt of spinach grown in a soilless system. The objective was to determine root traits that have the greatest effect on the harvestable shoot under low N and thus can be used as a secondary trait to select for high NUE germplasm. We determined the phenotypic and genetic correlations (rp and rg, respectively) between root traits and between the root traits and the SWt within and between the N-managements. Parallel to the correlation analyses, we used the predictive random forest machine learning technique to rank the root phenotypes according to their strength to predict SWt.
Selecting the root traits with predicted potential as secondary traits for shoot biomass
We predicted the correlated response (CR) in SWt resulting from selecting any of the eight root traits to assess each trait's suitability as a secondary trait. An important component in defining CR is the H, which integrates information on genetic variation and environmental noise into one statistic and thus is useful in planning breeding programs51. One condition that must be met for indirect selection to be effective is that H and rg must be high in both the selection and target environments43,44, even though H and rg are environment- and population-specific43,46. Fortunately, H has been strikingly similar in many environments34,52, and H variations in indoor growth environments are expected to be low52,53. In this context, since our H estimates were the average repeatability of 3-replicate trials in each of the N-managements, they may also be used to estimate the correlation expected between line means obtained from trials conducted at different indoor systems. Selecting a root trait in one N-management where H is high may predict the performance in the other, but we think the actual phenotypic quantity may vary substantially. However, if heritability values are high for both traits, then the correlation in breeding values dominates the phenotypic correlation43,45,46. Since the H value of SWt (H ~ 70%) was not as high as the H of RAvdiam (~ 96%), and with a genetic correlation of − 0.827 between them, the correlation in environmental values within N-management, which dominated the phenotypic correlation (− 0.481) between RAvdiam and SWt, may have been mainly due to the LN-management effect on SWt. Thus, an LN environment that minimizes the devaluation of the breeding values between the RAvdiam and SWt must be maintained, and we believe indoor environments may provide this condition.
In this context, selecting on a root trait as a secondary trait should produce a correlated response in spinach shoot biomass, and the ratio CRswt/Rswt (Table 5) provides the corresponding indirect selection criterion. These ratios show that direct selection on shoot biomass (SWt) is predicted to be superior to selecting on most of the eight root traits as proxies in both N-managements. The exception is RAvdiam under LN, which was predicted to give a superior indirect selection efficiency (112.8%), i.e., a response in SWt ~ 12.8% greater than that of direct selection when selecting against large RAvdiam. All other traits had predicted efficiencies below 100%: under LN, Xsings was the next most efficient (74.6%), and the rest were less than 45% efficient. Under HN, Tips (82.6%) and RLength (77.2%) were the most efficient, but not enough to give an indirect selection advantage.
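For readers unfamiliar with the CR/R ratio, the sketch below implements the common textbook form of relative indirect selection efficiency, CR_y/R_y = r_g · (i_x/i_y) · sqrt(h²_x/h²_y). This section does not show how Table 5 was computed, so the function and parameter names are hypothetical, the input values are purely illustrative, and the output should not be expected to reproduce the ratios reported in the study.

```python
import math


def indirect_selection_efficiency(r_g, h2_secondary, h2_target,
                                  i_secondary=1.0, i_target=1.0):
    """Ratio of correlated response to direct response (CR_y / R_y).

    Textbook form: r_g * (i_x / i_y) * sqrt(h2_x / h2_y), where x is the
    secondary trait and y the target trait. Heritabilities are proportions
    (0 to 1), not percentages. A negative result means the secondary trait
    must be selected in the opposite direction to the desired change in y.
    """
    if not -1.0 <= r_g <= 1.0:
        raise ValueError("r_g must lie in [-1, 1]")
    for value in (h2_secondary, h2_target):
        if not 0.0 < value <= 1.0:
            raise ValueError("heritabilities must lie in (0, 1]")
    return r_g * (i_secondary / i_target) * math.sqrt(h2_secondary / h2_target)


def breakeven_abs_rg(h2_secondary, h2_target):
    """|r_g| above which indirect selection beats direct selection,
    assuming equal selection intensity on both traits."""
    return math.sqrt(h2_target / h2_secondary)


# Illustrative values only (assumed, not estimates from this study).
ratio = indirect_selection_efficiency(r_g=-0.8, h2_secondary=0.9, h2_target=0.4)
threshold = breakeven_abs_rg(h2_secondary=0.9, h2_target=0.4)
print(f"CR/R = {ratio:.3f}; indirect selection wins if |r_g| > {threshold:.3f}")
```

With these assumed inputs the magnitude of the ratio exceeds 1, so indirect selection against the secondary trait would be predicted to outperform direct selection. If the break-even value is greater than 1, no genetic correlation is strong enough to make indirect selection superior at equal selection intensity.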
The case for selecting against large average root diameter in baby spinach
Under HN, RAvdiam had no significant rg with any of the other root traits or with SWt, and had significant positive rp only with RWt and RVol (Table 2). Under LN, in contrast, RAvdiam had significant positive rp only with RSarea and RVol, non-significant rp with RWt (0.104) and RLength, and significant negative rp with Tips (− 0.238) and Xsings (− 0.310) (Table 3). Spinach requires a high N supply [8], and under such conditions RAvdiam seems unlikely to substantially influence the differences in shoot biomass observed among the genotypes. However, the significant negative rp (− 0.481) and the highly negative rg (− 0.827) with SWt under LN management (Table 3) suggest that a larger mean root diameter is associated with a smaller mean shoot weight, and vice versa. The reduction in root diameter-related phenes in the youngest maize nodes under N stress suggested that root diameter might play a role in adaptive stress responses [54,55]. Whether the greater mean RAvdiam was due to root girth expansion in a negative feedback response to low N or to competing resource allocation [8,34] was not investigated in this study. Based on the pattern of diameter change in response to nutrient concentration in different species, it has been suggested that altering root diameter may be another way to save C costs in root growth during nutrient stress [56]. Although it is unclear how root anatomical changes influence spinach root diameter, maize roots showed reduced cell diameter and vessel area but an increased amount of aerenchyma during LN stress [54]. It is plausible that N is preferentially allocated to the roots rather than the shoots to sustain root growth under LN, and that the reduced N concentration acts as the internal signal regulating the response of axile root growth. Given the robust correlated response efficiency of SWt predicted for RAvdiam and the lateral root traits (Tips and Xsings) in our study, the RAvdiam measure may be used to indicate the ratio of axial to lateral roots.
The genotypic correlations between RAvdiam and RSarea (0.533) and between RAvdiam and RVol (0.794) were also highly significant and positive. This implies that selecting against large root diameter may also select against RSarea and RVol under low N-management. Since rg between RSarea and SWt was non-significant while rp was significant, our data suggest that selecting against large RAvdiam would incur only limited genetic linkage drag or pleiotropy [45] on shoot yield. Moreover, RSarea and RVol are standing-crop traits [37] describing 'root bulk', which is a product of morphological traits (RAvdiam and RLength) and root architectural traits (e.g., Tips, Forks, and Xsings). The small but significant positive rp between RSarea and SWt may be a combined artifact of these other morphological and architectural components, given that a significant negative rp also existed between RAvdiam and SWt in the LN management. This position is further supported by the fact that RVol had a negative rg (− 0.296) and a non-significant rp with SWt, yet had positive rg (0.439) and rp (0.931) with RLength; RLength, in turn, had significant rg (0.539) and rp (0.529) with SWt. RAvdiam was also highly heritable (H ~ 96% under LN; Table 6). We propose that root average diameter is the only trait in this study that can successfully be selected against to improve shoot biomass yield under low N. Further studies to validate these findings might benefit from testing in multiple growth conditions (temperature, humidity, and graduated N concentrations) to define a broader range of root trait-shoot yield relationships and N-responsiveness.
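The dependence of the 'root bulk' traits on diameter and length can be made concrete with a simple geometric sketch. It assumes, purely for illustration, that the whole root system is approximated by a single cylinder of total length RLength and diameter RAvdiam. The function name and the units (cm for length, mm for diameter) are hypothetical, and image-analysis software may instead sum over diameter classes.

```python
import math


def cylinder_root_bulk(total_length_cm, avg_diameter_mm):
    """Approximate root surface area (cm^2) and volume (cm^3).

    Assumption: the root system is one cylinder with the given total
    length and average diameter. This is a simplification for
    illustration, not the measurement procedure used in the study.
    """
    if total_length_cm < 0 or avg_diameter_mm < 0:
        raise ValueError("length and diameter must be non-negative")
    diameter_cm = avg_diameter_mm / 10.0
    surface_area_cm2 = math.pi * diameter_cm * total_length_cm
    volume_cm3 = math.pi * (diameter_cm / 2.0) ** 2 * total_length_cm
    return surface_area_cm2, volume_cm3


# Same total length, two hypothetical average diameters.
area_thin, vol_thin = cylinder_root_bulk(total_length_cm=100.0, avg_diameter_mm=0.5)
area_thick, vol_thick = cylinder_root_bulk(total_length_cm=100.0, avg_diameter_mm=1.0)
print(f"area ratio = {area_thick / area_thin:.1f}, volume ratio = {vol_thick / vol_thin:.1f}")
```

Under this approximation, doubling the average diameter at fixed length doubles the surface area and quadruples the volume, which illustrates why selection on RAvdiam would be expected to shift RSarea and RVol in the same direction.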
Resolving the conundrum around the antagonistic relationship between RAvdiam and SWt
Compared to SWt, RAvdiam showed greater H (Table 6), and the rg between SWt and RAvdiam can be high in magnitude (Table 3). These are the conditions under which indirect selection is expected to work: selection on a secondary trait will be superior if the heritability of that trait is high and the correlation between the traits is close to 1 [45,46]. In this study, RAvdiam met these two critical criteria. However, for practical use in a breeding program, the secondary trait must also be inexpensive and easy to measure in large trials [43,45]. In that case, shoot biomass estimates could be used to select for root bulk traits in production systems that target spinach roots as the end product. Because precision in imaging techniques for roots and shoots is rapidly evolving [57,58], we believe that soilless systems can be designed to facilitate robust characterization of root metrics, to match the ease with which above-ground biomass can be phenotyped. It would also be worthwhile to determine whether further selection for or against other root traits would eventually result in a superior secondary selection for shoot biomass. How these relationships play out as the plants mature under different indoor growth conditions, or under field conditions, requires further study.
Although a soilless system reduces the complexities associated with soils, we believe that the selection of a root trait needs to be understood in the context of the possible complex interplay among root traits [45,51]. The interactions among root traits that may have influenced SWt were not explicitly considered in our interpretations. However, we have alluded to such complexity by describing the trait-to-trait correlations, which we hope will serve as an impetus for further inquiry. Although the expected genetic correlation between estimates of cultivar means is best obtained from independent sets of trials [43,59], we hypothesize that, under similar N treatments, manipulation of other growth conditions in independent indoor growth environments may lead to some deviations from the response to selection predicted here. With advances in image processing and in deep learning [60] frameworks that use advanced optimization and features derived from data, such prediction accuracy is likely to improve continually [61,62,63]. Nonetheless, root traits recognized by machine learning will continue to rely on rigorous calibration and field-based validation in the systems of interest.
In conclusion, we report the genetic and phenotypic correlations of eight root traits with the fresh shoot biomass of spinach grown in a soilless system in a controlled indoor environment. The plants were harvested at 41 d after sowing, a stage corresponding to marketable baby spinach. We used genotype-by-management analysis and other conventional breeding statistics, together with a predictive machine learning technique, to define candidate root traits with the potential for indirect selection for spinach shoot yield. The experiments were set up under two separate and contrasting N-managements. Of the eight root traits, root average diameter emerged as the only candidate with a predicted indirect selection efficiency high enough to improve shoot biomass. Because it had a robust negative genetic correlation with shoot yield, we believe that selecting against large root diameter may improve the fresh shoot yield of baby spinach. We have exercised caution in this interpretation by recommending further studies into the possible complex interactions among the root traits considered for improving shoot biomass yield in baby spinach.
Data availability
All data generated or analyzed during this study are included in this published article (and Supplementary Information files).
References
Gruber, B. D., Giehl, R. F., Friedel, S. & von Wirén, N. Plasticity of the Arabidopsis root system under nutrient deficiencies. Plant Physiol. 163, 161–179 (2013).
Sun, C.-H., Yu, J.-Q. & Hu, D.-G. Nitrate: a crucial signal during lateral roots development. Front. Plant. Sci. 8, 485 (2017).
Socolow, R. H. Nitrogen management and the future of food: lessons from the management of energy and carbon. Proc. Natl. Acad. Sci. 96, 6001–6008 (1999).
Marvi, M. S. P. Effect of nitrogen and phosphorous rates on fertilizer use efficiency in lettuce and spinach. J. Hortic. For. 1, 140–147 (2009).
Schenk, M., Heins, B. & Steingrobe, B. The significance of root development of spinach and kohlrabi for N fertilization. Plant Soil 135, 197–203 (1991).
Stagnari, F., Di Bitetto, V. & Pisante, M. Effects of N fertilizers and rates on yield, safety and nutrients in processing spinach genotypes. Sci. Hortic. 114, 225–233 (2007).
Biemond, H., Vos, J. & Struik, P. Effects of nitrogen on accumulation and partitioning of dry matter and nitrogen of vegetables. 3. Spinach. NJAS Wageningen J. Life Sci. 44, 227–239 (1996).
Smolders, E. & Merckx, R. Growth and shoot:root partitioning of spinach plants as affected by nitrogen supply. Plant Cell Environ. 15, 795–807. https://doi.org/10.1111/j.1365-3040.1992.tb02147.x (1992).
Walch-Liu, P. & Forde, B. G. Nitrate signalling mediated by the NRT1.1 nitrate transporter antagonises l-glutamate-induced changes in root architecture. Plant J. 54, 820–828 (2008).
Lima, J. E., Kojima, S., Takahashi, H. & von Wirén, N. Ammonium triggers lateral root branching in Arabidopsis in an AMMONIUM TRANSPORTER1;3-dependent manner. Plant Cell 22, 3621–3633 (2010).
Forde, B. G. Nitrogen signalling pathways shaping root system architecture: An update. Curr. Opin. Plant Biol. 21, 30–36 (2014).
Giehl, R. F., Gruber, B. D. & von Wirén, N. It’s time to make changes: Modulation of root system architecture by nutrient signals. J. Exp. Bot. 65, 769–778 (2014).
Razaq, M., Zhang, P., Shen, H.-L. & Salahuddin, A. Influence of nitrogen and phosphorous on the growth and root morphology of Acer mono. PLOS ONE 12, e0171321. https://doi.org/10.1371/journal.pone.0171321 (2017).
Lee, S. & Lee, J. Beneficial bacteria and fungi in hydroponic systems: Types and characteristics of hydroponic food production methods. Sci. Hortic. 195, 206–215. https://doi.org/10.1016/j.scienta.2015.09.011 (2015).
Parniske, M. Arbuscular mycorrhiza: the mother of plant root endosymbioses. Nat. Rev. Microbiol. 6, 763–775. https://doi.org/10.1038/nrmicro1987 (2008).
Eldridge, B. M. et al. Getting to the roots of aeroponic indoor farming. New Phytol. 228, 1183–1192. https://doi.org/10.1111/nph.16780 (2020).
Liese, R., Alings, K. & Meier, I. C. Root branching is a leading root trait of the plant economics spectrum in temperate trees. Front. Plant. Sci. 8, 315. https://doi.org/10.3389/fpls.2017.00315 (2017).
Gopinath, P., Vethamoni, I. & Gomathi, M. Aeroponics soilless cultivation system for vegetable crops. Chem. Sci. Rev. Lett. 6, 838–849 (2017).
Koohakan, P. et al. Evaluation of the indigenous microorganisms in soilless culture: Occurrence and quantitative characteristics in the different growing systems. Sci. Hortic. 101, 179–188. https://doi.org/10.1016/j.scienta.2003.09.012 (2004).
Zhao, J., Bodner, G. & Rewald, B. Phenotyping: Using machine learning for improved pairwise genotype classification based on root traits. Front. Plant Sci. https://doi.org/10.3389/fpls.2016.01864 (2016).
Bodner, G. et al. A statistical approach to root system classification. Front. Plant. Sci. https://doi.org/10.3389/fpls.2013.00292 (2013).
Moon, T., Ahn, T. I. & Son, J. E. Forecasting root-zone electrical conductivity of nutrient solutions in closed-loop soilless cultures via a recurrent neural network using environmental and cultivation information. Front. Plant. Sci. 9, 66. https://doi.org/10.3389/fpls.2018.00859 (2018).
Lammerts van Bueren, E. T. & Struik, P. C. Diverse concepts of breeding for nitrogen use efficiency. A review. Agron. Sustain. Dev. 37, 50. https://doi.org/10.1007/s13593-017-0457-3 (2017).
Chan-Navarrete, R., Dolstra, O., van Kaauwen, M., van Bueren, E. T. L. & van der Linden, C. G. Genetic map construction and QTL analysis of nitrogen use efficiency in spinach (Spinacia oleracea L.). Euphytica 208, 621–636 (2016).
Chan-Navarrete, R., Kawai, A., Dolstra, O., van Bueren, E. T. L. & van der Linden, C. G. Genetic diversity for nitrogen use efficiency in spinach (Spinacia oleracea L.) cultivars using the Ingestad model on hydroponics. Euphytica 199, 155–166 (2014).
Ju, C. et al. Root and shoot traits for rice varieties with higher grain yield and higher nitrogen use efficiency at lower nitrogen rates application. Field Crop Res. 175, 47–55 (2015).
Mu, X. et al. Genetic improvement of root growth increases maize yield via enhanced post-silking nitrogen uptake. Eur. J. Agron. 63, 55–61 (2015).
SharathKumar, M., Heuvelink, E. & Marcelis, L. F. M. Vertical farming: Moving from genetic to environmental modification. Trends Plant. Sci. 25, 724–727. https://doi.org/10.1016/j.tplants.2020.05.012 (2020).
Despommier, D. The vertical farm: Controlled environment agriculture carried out in tall buildings would create greater food safety and security for large urban populations. J. Verbr. Lebensm. 6, 233–236. https://doi.org/10.1007/s00003-010-0654-3 (2011).
Meinen, E., Dueck, T., Kempkes, F. & Stanghellini, C. Growing fresh food on future space missions: Environmental conditions and crop management. Sci. Hortic. 235, 270–278. https://doi.org/10.1016/j.scienta.2018.03.002 (2018).
Eppendorfer, W. H. & Bille, S. W. Free and Total Amino Acid Composition of Edible Parts of Beans, Kale, Spinach, Cauliflower and Potatoes as Influenced by Nitrogen Fertilisation and Phosphorus and Potassium Deficiency. J. Sci. Food Agric. 71, 449–458. https://doi.org/10.1002/(SICI)1097-0010(199608)71:4%3c449::AID-JSFA601%3e3.0.CO;2-N (1996).
Maneejantra, N. et al. A quantitative analysis of nutrient requirements for hydroponics Spinach (Spinacia oleracea L.) production under artificial light in a plant factory. J. Fertil. Pest. 7, 170–174 (2016).
Lynch, J. Root architecture and plant productivity. Plant. Physiol. 109, 7–13. https://doi.org/10.1104/pp.109.1.7 (1995).
Lynch, J. P. in Nutrient Acquisition by Plants Vol. 181 Ecological Studies (ed BassiriRad H.) Ch. 7, 147–183 (Springer, 2005).
Wright, M. N. & Ziegler, A. ranger: A fast implementation of random forests for high dimensional data in C++ and R. J. Stat. Softw. 77, 1–17. https://doi.org/10.18637/jss.v077.i01 (2017).
Alvarado, G. et al. (eds Maize International & Center Wheat Improvement) (CIMMYT Research Data & Software Repository Network, 2015).
Iversen, C. M., McCormack, M. L., Blackwood, C. B., Freschet, G. T., Kattge, J., Roumet, C., Stover, D. B., Soudzilovskaia, N.A., Valverde-Barrantes, O. J., van Bodegom, P. M., Violle, C. Version 2 (Department of Energy, Oak Ridge National Laboratory TES SFA, U.S., Oak Ridge, Tennessee, USA, 2018).
Breiman, L. in Manual On Setting Up, Using, And Understanding Random Forests V3.1 (University of California at Berkeley, Berkeley, CA) (2002).
Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
Falconer, D. S., Mackay, T. F. & Frankham, R. Introduction to quantitative genetics (4th edn). Trends in Genetics, Vol. 12, p. 280 (1996).
Cooper, M. & DeLacy, I. Relationships among analytical methods used to study genotypic variation and genotype-by-environment interaction in plant breeding multi-environment experiments. Theor. Appl. Genet. 88, 561–572 (1994).
Ward, J. H. Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc. 58, 236–244. https://doi.org/10.1080/01621459.1963.10500845 (1963).
Falconer, D. S. Introduction to Quantitative Genetics. 365 (Ronald Press, 1961).
Searle, S. R. The value of indirect selection: I. Mass selection. Biometrics 21, 682–707. https://doi.org/10.2307/2528550 (1965).
Gallais, A. in Efficiency in Plant Breeding. (ed W. Lange, Zeven, A.C., Hogenboom, N.G. ) 45–60 (Pudoc, 1984).
Hansel, H. in Efficiency in Plant Breeding. (ed A.C. Zeven and N.G. Hogenboom W. Lange) 61–64 (Pudoc, 1984).
Liaw, A. & Wiener, M. Classification and regression by randomForest. R News 2, 18–22 (2002).
Stekhoven, D. J. & Bühlmann, P. MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics 28, 112–118 (2012).
Ljumović, M. & Klar, M. in 2015 4th Mediterranean Conference on Embedded Computing (MECO). 212–215 (IEEE).
Brown, C. E. in Applied multivariate statistics in geohydrology and related sciences 155–157 (Springer, 1998).
Wray, N. & Visscher, P. Estimating trait heritability. Nat. Educ. 1, 29 (2008).
Gitonga, V. W. et al. Genetic variation, heritability and genotype by environment interaction of morphological traits in a tetraploid rose population. BMC Genet 15, 146. https://doi.org/10.1186/s12863-014-0146-z (2014).
Folta, K. M. Breeding new varieties for controlled environments. Plant. Biol. 21(Suppl 1), 6–12. https://doi.org/10.1111/plb.12914 (2019).
Gao, K., Chen, F., Yuan, L., Zhang, F. & Mi, G. A comprehensive analysis of root morphological changes and nitrogen allocation in maize in response to low nitrogen stress. Plant. Cell Environ. 38, 740–750. https://doi.org/10.1111/pce.12439 (2015).
Yang, J. T., Schneider, H. M., Brown, K. M. & Lynch, J. P. Genotypic variation and nitrogen stress effects on root anatomy in maize are node specific. J. Exp. Bot. 70, 5311–5325. https://doi.org/10.1093/jxb/erz293 (2019).
Zobel, R. W., Kinraide, T. B. & Baligar, V. C. Fine root diameters can change in response to changes in nutrient concentrations. Plant. Soil 297, 243–254. https://doi.org/10.1007/s11104-007-9341-2 (2007).
Bodner, G., Nakhforoosh, A., Arnold, T. & Leitner, D. Hyperspectral imaging: A novel approach for plant root phenotyping. Plant. Methods 14, 84. https://doi.org/10.1186/s13007-018-0352-1 (2018).
Atkinson, J. A., Pound, M. P., Bennett, M. J. & Wells, D. M. Uncovering the hidden half of plants using new advances in root phenotyping. Curr. Opin. Biotechnol. 55, 1–8. https://doi.org/10.1016/j.copbio.2018.06.002 (2019).
Holland, J. B., Nyquist, W. E. & Cervantes-Martínez, C. T. in Plant Breeding Reviews Vol. 22 (ed J. Janick) Ch. 2, 29–39 (Wiley, 2003).
LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444. https://doi.org/10.1038/nature14539 (2015).
Khaki, S., Wang, L. & Archontoulis, S. A CNN-RNN Framework for Crop Yield Prediction. (2019).
van Dijk, A. D. J., Kootstra, G., Kruijer, W. & de Ridder, D. Machine learning in plant science and plant breeding. iScience 24, 101890. https://doi.org/10.1016/j.isci.2020.101890 (2021).
Shahhosseini, M., Hu, G., Huber, I. & Archontoulis, S. V. Coupling machine learning and crop modeling improves crop yield prediction in the US Corn Belt. Sci. Rep. 11, 1606. https://doi.org/10.1038/s41598-020-80820-1 (2021).
Acknowledgements
This study was supported in part by funds from USDA-SCMP Grant # TX-SCM-17-04 to V.J. and C.A.A.; Texas A&M AgriLife Vegetable seed grant FY16-FY17 to C.A.A. and V.J.; and USDA-National Institute of Food and Agriculture Specialty Crops Research Initiative 2017-51181-26830 to C.A.A.
Contributions
V.J. designed and supervised experiments; H.O.A., methodology; A.K.M., H.G., and J.D., data collection and extraction; H.O.A., formal analysis; H.O.A., writing (original draft preparation); H.O.A., C.A.A., and V.J., writing (review and editing); V.J. and C.A.A., funding acquisition. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Awika, H.O., Mishra, A.K., Gill, H. et al. Selection of nitrogen responsive root architectural traits in spinach using machine learning and genetic correlations. Sci Rep 11, 9536 (2021). https://doi.org/10.1038/s41598-021-87870-z