A two-stage approach for the spatio-temporal analysis of high-throughput phenotyping data

High throughput phenotyping (HTP) platforms and devices are increasingly used for the characterization of growth and developmental processes for large sets of plant genotypes. Such HTP data require challenging statistical analyses in which longitudinal genetic signals need to be estimated against a background of spatio-temporal noise processes. We propose a two-stage approach for the analysis of such longitudinal HTP data. In a first stage, we correct for design features and spatial trends per time point. In a second stage, we focus on the longitudinal modelling of the spatially corrected data, thereby taking advantage of shared longitudinal features between genotypes and plants within genotypes. We propose a flexible hierarchical three-level P-spline growth curve model, with plants/plots nested in genotypes, and genotypes nested in populations. For selection of genotypes in a plant breeding context, we show how to extract new phenotypes, like growth rates, from the estimated genotypic growth curves and their first-order derivatives. We illustrate our approach on HTP data from the PhenoArch greenhouse platform at INRAE Montpellier and the outdoor Field Phenotyping platform at ETH Zürich.

In recent years, the frequency of use of high-throughput phenotyping (HTP) techniques and platforms has strongly increased in plant genetics and physiology. HTP data provide quick, precise, non-destructive and costeffective information on phenotypic traits with high spatial and temporal resolution 1 . Designed HTP experiments, either indoors or in a field, usually consist of experimental units (e.g., single plants in pots or plots) that are combined with a wide range of sensing equipment for the (almost) continuous monitoring of plant/plot phenotypic traits for large sets of genotypes. Treatments applied in HTP research designs may comprise not only different genotypes but also different management practices. High dimensional HTP data as derived from multiple sensors (e.g., images, point clouds, hyperspectral data) are typically filtered, condensed, integrated and summarised into features. Combinations of one or more features are used to approximate biological traits that are on the one hand still close to the data, and on the other hand are relatively far from the target traits of commercial interest (most often yield and quality parameters). Following van Eeuwijk et al. 2 , here we will refer to these traits as low-level, in the sense of being close to the original phenotypic measurement at a single point in time and with little statistical modelling applied. Thus, "low-level" does not necessarily refer to the biological complexity or agronomic importance of the trait. Examples of such low-level traits are plant height, canopy cover, leaf area index, ear and tiller counts, canopy temperature or indices related to water or chlorophyll content. Thus, researchers and plant breeders have now access to large and detailed datasets, in the form of (long) time-series, enabling to track multiple low-level traits from, e.g., seed emergence to physiological maturity. The challenge in this setting is how to efficiently and adequately exploit the diversity and complexity of HTP data to (a) extract www.nature.com/scientificreports/ They can serve as inputs to subsequent statistical analyses that aim at modelling genotype-by-environment interactions in biologically complex traits, like yield, in terms of underlying component traits.

Methods
We consider that, in the HTP experiment, genotypes have been allocated to the experimental units following an experimental design. For simplicity, in what follows we assume that only genotypes have been tested (i.e., we have a single factor treatment structure). However, we allow for population structure, modelled as different families, panels or populations of genotypes. Thus, the data present a three-level nested hierarchical structure, with plants/plots nested in genotypes, and genotypes nested in populations (see Supplementary Fig. S1 online). For clarity and simplicity, hereafter we refer to the experimental units of the experimental design as plants. The experimental unit is most commonly a plant for an indoor experiment and a plot containing several plants for a field experiment.
Data structure. Let y pgi (t) denote the (observed) low-level phenotypic trait of interest for the i-th plant ( i = 1, . . . , m pg ) of the g-th genotype ( g = 1, ..., ℓ p ) in the p-th population ( p = 1, . . . , k ) at time t ∈ {t 1 , . . . , t n } . Let L = k p=1 ℓ p denote the total number of genotypes and M = k p=1 ℓ p g=1 m pg the total number of plants. We consider that plants can be mapped to a coordinate system defined in terms of r rows and c columns, and denote as u pgi and v pgi the row and column position respectively ( u pgi ∈ {1, . . . , r} and v pgi ∈ {1, . . . , c} ). In the first stage, the correction is made per time point for the whole experiment. As such, note that we assume that all plants in the experiment are measured at the same times, ( t 1 , . . . , t n ). That is a simplification, as platform data is typically acquired within the order of minutes to hours. However, we presume that the factors that may affect the platform measurements within that period can be accounted for (and captured) by the experimental design (e.g., blocking structure). The assumption of the same measuring times, however, does not preclude the presence of incomplete data. Our approach can handle missing values at both the plant and genotypic level (i.e., with plants or even genotypes not measured for some times) thanks to the use of splines and mixed models (for an in deep discussion on missing data for longitudinal data, we refer the interested reader to Brumback and Rice 21 and Fitzmaurice et al. 35 ) First stage: environmental correction using SpATS. In the first stage, a SpATS model is fitted to the phenotypic data separately for each measurement time t ∈ {t 1 , . . . , t n } . A SpATS model is a linear mixed model where large-scale and small-scale spatial dependence (or spatial trend) is explicitly modelled by a two-dimensional smooth surface defined over the row and column positions of the plants in the experiments, f t (u, v) . This smooth function is constructed with tensor-product P-splines 23,36 , i.e., through the combination of the tensorproduct of marginal B-spline bases and an anisotropic discrete penalty on the coefficients. In its more general specification, and considering genotypes as random, a SpATS model has the following form (for more details, see Rodríguez-Álvarez et al. 11 ) where y(t) = (y 111 (t), . . . , y kℓ k m kℓ k (t)) T is the low-level phenotypic trait at time t, X and Z represent columnpartitioned matrices, associated respectively with fixed and random components, as for instance, row, column, replicate and/or (incomplete) block effects, and Z g is the design matrix assigning plants to genotypes. Further, we decompose the fixed effects component in two terms The first term, X h β ht , corresponds to the factors/covariates whose effects we are interested in modelling in the second stage, whereas X q β qt corresponds to the factors/covariates whose effects we are interested in removing (e.g., those associated with experimental design factors). As said before, for simplicity, we consider that X h β ht only contains information regarding the family/population to which the genotypes belong to (if any). Concerning X q β qt , we assume that it is associated with R experimental design factors (categorical covariates), and thus, The length of β qtr (and therefore the number of columns in the associated design matrix X qr ) ( r = 1, . . . , R ) corresponds to the number of different categories, say c r , of the r-th experimental design factor minus one (as the intercept is included in the model).
Once the SpATS model in Eq. (1) is fitted, the phenotype of interest at time t, y(t) , is corrected by only considering the (estimated) sources of variation that are of interest, plus the residual component, i.e., the corrected phenotype, denoted as ỹ(t) = (ỹ 111 (t), . . . ,ỹ kℓ k m kℓ k (t)) T , is obtained as follows where J qr are matrices of ones of appropriate dimensions (i.e., M × (c r − 1) ). The correction is performed following the procedure for obtaining predictions (e.g., adjusted means) in linear mixed models described in Welham et al. 37 . In that paper, the authors propose a partition of the explanatory variables (see Eq. (1)) in three groups: (1) those for which predictions are required (i.e., population, X hβht , and genotypic, Z gĉgt , effects), (2) (1) (2) www.nature.com/scientificreports/ those to be averaged over (i.e., experimental design factors effects, R r=1 1 c r J qrβqtr ); and, (3) those to be ignored (i.e., spatial trends, f t (u, v) , and other random effects, Zĉ t ).
As will be seen in the next section, in the second stage of our proposal we model ỹ(t) ( t ∈ {t 1 , . . . , t n } ). Thus, it is worth emphasising that, in the way it is constructed, ỹ(t) (see Eq. (2)) only contains information about genetic populations and genotypes, as well as unexplained plant-to-plant variation (measurement error). In other words, for the second stage the predicted values for the genetic populations and the genotypes, p(t) , as well as the unexplained plant-to-plant variation, ε t , are maintained as the "new" (spatially corrected) experimental unit values, while the spatial trends and other blocking factors to control for spatial variability are omitted. Also, note that the "observations" that enter the second stage, ỹ(t) , are not observed but estimated/predicted. Thus, we propose to propagate the uncertainty from the first stage to the second stage through the inclusion of weights, in a similar way to the weighted stage-wise analysis of multi-environment trials 38 .In particular, weights are obtained from the inverse of the variance-covariance (vcov) matrix for the predictions (plus the residual variance), i.e., w(t) = diag((vcov(p(t)) +σ 2 t I M ) −1 ).
Second stage: P-spline hierarchical curve data model. The aim of the second stage is to model the spatially corrected phenotype obtained in the first stage. We can re-organise the data for this stage in such a way that they can be seen as a sample of plant growth curves, where f p is the growth/change over time of the (spatially corrected) phenotype for the p-th population (i.e., the p-th population mean function), f pg is the genotype-specific growth deviation from f p for the g-th genotype, and f pgi is the plant-specific growth deviation from f pg for the i-th plant. The additive modelling approach implies that f p + f pg can be interpreted as the growth over time of the (spatially corrected) phenotype for the g-the genotype in the p-th population. Thus, on top of genotype-specific deviations from their overall population mean, we also obtain genotype-specific growth curves. Finally, w pgi are the weights obtained in the first stage. We use P-splines for hierarchical curve data 24,31 to estimate the model in Eq. (3). In this framework, each function in Eq. (3) is approximated by a linear combination of cubic B-spline basis functions, and its smoothness is controlled by a penalty on the differences of the B-spline coefficients (we use differences of order 2). The influence of the penalty is determined by a smoothing parameter. One of the attractive features of using B-splines to represent the functions in Eq. (3) is that their derivatives can be easily obtained 39 . Furthermore, we adopt the connection between P-splines and linear mixed models 32,33 , where each function is treated as a sum of fixed (linear) and random (non-linear) components, and the smoothing parameter is replaced by a ratio of variances components. For clarity we omit here the more technical details, but it can be shown that, under this setting, each function in Eq. (3) is expressed as follows are sets of basis functions obtained from the connection between P-splines and linear mixed models 32 . In contrast to standard P-spline mixed models, but in line with the traditional random intercept and slope model for longitudinal data, here the linear component (intercept and slope) associated with f pg (genotypic deviations) and f pgi (plant deviations) is modelled with random rather than fixed effects (i.e., genotypes and plants are treated as random samples from the corresponding population). Also, for notational simplicity, we assume the same genetic variation across populations. However, this assumption can be easily relaxed by considering different values of σ 2 gen,0 , σ 2 gen,1 and σ 2 gen per population. A similar approach can be followed to allow for the plant-to-plant variation ( σ 2 plant,0 , σ 2 plant,1 and σ 2 plant ) to vary across genotypes. These generalisations might be worth exploring if there are sufficient number of genotypes per population and plants per genotype, respectively.
Before proceeding, we comment on the selection of the number of B-spline basis functions used to approximate f p , f pg and f pgi (i.e., b pop , b gen and b plant , respectively). In P-splines, it is recommended to choose a large number of bases to provide enough flexibility; the role of the penalty is to avoid over fitting 23 . In our setting the number of functions in the complete model equals k + L + M (populations + genotypes + plants), and, hence, the number of regression coefficients (either fixed or random) to be estimated is This value can be very large, with the number of plants, M, and associated basis dimension, b plant , playing the major role: the dataset may contain thousands of plants. Thus, to reduce the computational burden, one could www.nature.com/scientificreports/ be tempted to use different basis dimensions for f p , f pg and f pgi , and to be less generous with f pgi . However, this is not a good strategy. Simulation studies, as well as preparatory data analyses, have shown that results may be sensitive (and in some cases unreliable) to using different bases dimensions. We therefore recommend choosing the same value for b pop , b gen and b plant , while keeping the number of coefficients at a reasonable level (i.e., a trade-off between flexibility and dimensionality).
We now present the model in Eq. (3) based on the specification in Eq. (4) in matrix notation. Let's first order the data by population, genotype, plant, and time (in that order), i.e., ỹ = (ỹ 111 (t 1 ), . . . ,ỹ 111 (t n ), . . . ,ỹ kℓ k m kℓ k (t 1 ), . . . ,ỹ kℓ k m kℓ k (t n )) T . Thus, in a compact way, the three-level nested hierarchical growth model can be expressed as where and The specific form of the matrices and vectors involved in Eq. (5) is given in Section "Three-level nested hierarchical growth model: mixed model formulation". Note that model in Eq. (5) is a standard linear mixed model, and, thus, estimation can be carried out with any mixed-model software, such as the R-packages ASReml-R 40 , nlme 41 and lme4 42 , or the PROC MIXED 43 procedure in SAS . Also, we can use the R-package mgcv 44 for that purpose. However, high-throughput phenotypic data are usually characterised by a large number of observations, which, together with the number of regression coefficients in Eq. (5), might make estimation with the above-mentioned software computationally expensive. Thus, we have implemented in the R language 45 our own code (provided along with the paper), which resorts to the recently proposed SOP (Separation of Overlapping Penalties) method 46 . Empirical best linear unbiased estimates (BLUE) and predictors (BLUP) are obtained by the solution of Henderson's mixed model equations 47 , and variance components by means of restricted maximum likelihood (REML) 48 . Construction of (approximate) confidence intervals for the estimated curves and their derivatives is based on the prediction error variance 37 . To speed up computation, we take advantage of the array structure of the data through Generalised Linear Array Models (GLAM) 49 and of the sparse structure of the matrices involved in the model. It is to note that the model in Eq. (5) presents a standard variance-covariance matrix for the random effects (i.e., it is linear in the variance parameters) and thus, the SOP method reduces to the estimating algorithm described in Harville 50 .

Applications
We illustrate the potential of our approach with data from two experiments from two different HTP platforms: (1) the PhenoArch platform (INRAE Montpellier) 51 (greenhouse, Fig. 1a) and, (2) the FIP (FIeld Phenotyping) platform (ETH Zürich) 52 (field, Fig. 1b). The PhenoArch platform data are used to describe our approach and illustrate the outcomes at each step. The FIP platform data are further analysed to discuss the extraction of important (genotype-specific) attributes from the estimated genotype-specific growth curves and their first-order derivatives, as well as their possible use in the decision-making process 2,53,54 .  (1)), the model included the population (panel by water regime combination) as fixed effect, and the row and column positions as random effects. A different genetic variance for each of the four populations (panel by water regime combination) was considered. Regarding the spatial trend (i.e., the tensor-product P-spline), B-spline bases of dimension 60 and 28 were chosen for the row and column positions, respectively, and nested bases, with half the dimension, were used 56 . The spatially corrected leaf area included the estimated population and genotypic effects, as well as the residual (see Eq. (2)). The comparison of the spatially corrected leaf area when modelling genotypes as fixed (as usually in stage-wise analyses) or random (as done here) effects shows essentially identical results (Supplementary Fig. S3 online): shrinkage of the genotypic BLUPs is counteracted by the inclusion of the residual component into the correction in Eq. (2). Results show that, as expected, the spatial pattern observed in the raw data disappears in the spatially corrected phenotype, as illustrated with the leaf area at four different measurement times (Fig. 2). Moreover, the correction reduced the variability among plants (i.e., replicates) of the same genotype and water treatment combination (Fig. 3). Estimates at the three levels of the hierarchy were obtained: (1) population growth curves ( f p ), (2) genotypespecific deviations ( f pg ) and respective growth curves ( f p +f p g ), and (3) plant-specific deviations ( f pgi ) and respective growth curves ( f p +f p g +f pgi ). Note that the four populations (panel by water regime combination) show a different growth pattern. Plants in water deficit show a lower growth compared to well watered plants for both panels (Fig. 4a). Genotype-specific deviations from their estimated population mean are shown in Fig. 4b where positive and negative deviations refer, respectively, to better and worse genotypic performance compared to the mean population. As expected, the magnitude of the deviations (and, thus, the differences among genotypes performance) increases with time. Also, genotypes from Panel 1 show the largest genetic variation under both water regimes. This is in concordance with the spatially corrected data (grey lines in Fig. 4a). The genotype-by-water regime interaction is illustrated for two genotypes per Panel in Fig. 4c. When compared to the average performance of the genotypes in the Panel, genotype 44 in Panel 1 and genotype 20 in Panel 2 behave similarly under both water regimes (they depict similar deviations under both treatments). In contrast, genotype 43 in Panel 1 presents a better performance under WD regime, while genotype 03 in Panel 2 displays a better performance under WW regime. Finally, the model is able to successfully recover the evolution over time of the spatially corrected leaf area (Fig. 4d, grey lines), while appropriately handling the missing data, as illustrated with the plant-specific growth curves (Fig. 4d, dotted blue lines) of two genotypes (see also results for the populations means in Fig. 4a). Additionally, genotype-specific growth curves (Fig. 4d, orange lines) seem to properly summarise/describe the behaviour of the plant curves. To asses the relative accuracy of our model in predicting the plant-specific growth curves, i.e. f p +f p g +f pgi , we computed the mean absolute percentage error (MAPE). For this data, the MAPE value was 2.9% , indicating a good performance.

FIP platform (ETH Zürich). The FIP platform, located at the ETH research station in Lindau-Eschikon
(Switzerland), is a cable-suspended multi-sensor platform designed for automated, accurate and supervised high throughput data acquisition on an area of 1 hectare 57 . From 2015 to 2017, the FIP platform was used to measure the development of canopy height on a diverse panel of European wheat genotypes (GABI wheat 58 ), including a panel of Swiss varieties 15,16,52 . Here we focus on the 2017 data where height measurements started well before jointing. Figure 1b shows the FIP platform with its crop rotation allocated to six different lots; in 2017 the wheat experiment was planted in lots 2 and 6. Details on the experiment and experimental design can be found in Kronenberg et al. 16,52 . In short, the experimental unit was a plot to which the genotypes were allocated as only treatment factor in an augmented 2D design. Three checks were placed in nine complete replications per lot, and test genotypes were allocated in a row-column design assuming that each row within a replicate (lot) received different environmental conditions (due to variability of crop husbandry measures, such as crop protection and fertilisation) while the upper, central and lower range of each lot received similar conditions (due to the similar slope direction within both lots). Canopy height measurements were carried out in irregular intervals of 2 to 13 days between February 27th and June 30th 2017 (which correspond to 58 and 181 DOY, respectively), using a terrestrial laser scanner mounted on the FIP sensor head 52 . To analyse the experiment, we arranged the two replicates (lots) diagonally in a virtual grid of r = 42 rows by c = 36 columns. Additionally, the country in which a genotype was first inscribed into the European variety catalogue was also considered in the analyses. We used this information to allocate the genotypes to different wheat populations targeted for specific regions within Europe. We will refer to these wheat populations (groups of genotypes) as regions. Accordingly  103  108  113  118  123  128  133  103  108  113  118  123  128  133  103  108  113  118  123  128  133  103  108  113  118  123  128  www.nature.com/scientificreports/ First stage results: FIP platform. In this case, the SpATS model included, besides the spatial trend and the random genotypic effects, fixed effects for the two lots (experimental design factor) and the seven wheat populations (region of origin), as well as random effects for the row and column positions. We also considered different genetic variances for each of the seven populations (regions). For the spatial trend, basis dimensions of 42 and 36 were assumed for the row and column positions of the virtual grid, respectively. The correction included the estimated population and genotypic effects and the residuals, and we averaged over the lot fixed effect to eliminate its impact. The spatially corrected canopy heights obtained when modelling genotypes either as fixed or random were, also here, very similar (see Supplementary Fig. S5 online). As for the PhenoArch experiment, the correction performed in the first stage reduced the variability among replicates of the same genotype (see Supplementary Figs. S4 and S6 online). This reduction is due to the lot effect and the spatial variation. The spatial distribution of the raw and spatially corrected canopy height, at four different measurement days (DOY), is shown in Fig. 5. In addition to the estimated growth curves and deviations at the three levels of the hierarchy, for this dataset we also obtained the first-order derivatives of the region and genotype-specific growth curves (Fig. 6); they have shown to be good descriptors of genotype specific growth habit 53 (recall that the first-order derivative is an indicator of the growth rate, i.e. the speed of canopy height development). Results show differences between regions (and between genotypes within a region) in, e.g., growth patterns (Fig. 6a), growth rates (Fig. 6b), and genotypes performance (Fig. 6c). To better illustrate differences among regions, Fig. 7a shows the estimated region growth curves together with the daily average temperature along the experiment. In Fig. 7b important maxima (i.e. largest growth rates) are indicated in the first-order derivatives of the respective region growth curves. Results for the replicate-specific growth curves are included in Supplementary Fig. S7 online (for one genotype per region, as illustration).
Extracting time-independent attributes to characterise genotypes. To characterise the genotypes, we extracted the maximum corrected canopy height (maxTrait) from the estimated genotype-specific growth curves (Fig. 6a) as well as three maximum speed rates (maxSpeed1, maxSpeed2 and maxSpeed3) from their first-order deriva- www.nature.com/scientificreports/ tives (Figs. 6b and 7b). These speed rates correspond to maxima around DOYs 97, 133 and 147, respectively. Maximum speed rates around DOY 133 (maxSpeed2) and 189 (maxSpeed3) can be interpreted, respectively, as recoveries after a severe cold period in April (DOY 110-120) and a milder one in May (DOY 140) (Fig. 7a, grey line). However, it appears that already for maxSpeed3 the growth rates declined as plants approached their final height. We also estimated the area under the genotype-specific deviations (Fig. 6c) as a global measure of a genotype performance over time when compared to the genotypes of the same region. A positive (negative) area indicates a genotype performance better (worse) than the regional average. Here, the area (AUC) was estimated www.nature.com/scientificreports/ for the complete time interval where the genotypes were measured. Nothing, however, precludes to focus attention on a restricted time interval of interest. The bivariate scatterplots of the extracted genotype-specific attributes show that the genotypes cluster according to their region of origin (Fig. 8, lower off diagonal, one color per region). Some atypical genotypes can also be identified, for example an old, tall (> 1.2 m) Swiss variety. In the upper off diagonal, we report the Pearson's correlation coefficient (marginal and by region). The strongest marginal correlations are between maxSpeed3 and maxTrait, and maxSpeed2 and maxTrait; maxSpeed3 and maxSpeed2 also present a high correlation. It is noteworthy that the marginal correlation between maxTrait and AUC is lower than the conditional correlation by region. Univariate analysis of each attribute using boxplots also shows regional clustering, and that clusters might change along time (e.g., compare clusters in maxSpeed1, maxSpeed2 and maxSpeed3). www.nature.com/scientificreports/ Use of time-independent attributes to characterise regional adaptation. While a deeper physiological analysis is beyond the scope of this publication, we will use the extracted attributes to highlight the potential benefit of an in-depth analysis of spline-based growth patterns. Here, we use the regional groups but a similar analysis could be done using individual genotypes. The observed height development follows a principally logistic growth curve: stem elongation started after the plants were vernalised over winter (by means of cold exposure) and ended around flowering. However, the height development plateaued between day 103 and 118 in all genotypes, most likely due to a cold period in April. When looking at the region specific growth curves, this short and extreme phase caused even rank changes in growth (first-order derivative in Fig. 7): the regional groups showing most vigorous growth before the stress (maxSpeed1) stopped growth completely while those which grew slowest could maintain some growth during the cold (first local minima following maxSpeed1). Such pattern may point to physiological adaptations to the different climatic regions of Europe as the slow-growing northern types from Great Britain (GB) and Denmark and Sweden (SE/DK) showed least response to cold while the fast-growing continual types from Poland (PL) and Austria and Czechia (AT/CZ) stopped growing. Moreover, the genotypes from the south-west-France (FR) and Switzerland (CH)-did not recover growth up to the same level as the more northern and eastern varieties did (compare maxSpeed1 with maxSpeed2). We acknowledge that a solid interpretation of this pattern and its significance will require a multi-year analysis to sample the typical genotype or region-specific average development.

Discussion
In agricultural and breeding research, non-destructive data acquisition of phenotypic traits by HTP platforms has emerged in recent years as a rich source of new information on plant growth and development as well as on genotypic performance. This has been accompanied by the need for novel and appropriate statistical methods of analysis. In this paper, we focus on the analysis of a common type of HTP data in the form of time-series observed on phenotypic traits that are still close to the platform measurements (low-level traits), with these time-series occurring within a hierarchical structure, of plants or plots nested in genotypes, and genotypes nested in populations (of genotypes). We propose a statistically flexible and computationally feasible method that decomposes the required spatio-temporal analysis into two stages. In the first stage, we correct the "raw" HTP data for (nuisance) spatial variation and obtain spatially corrected time-series at the resolution of plants or plots with reduced between replicates/plots variability. The second stage consists of a temporal analysis with a hierarchical curve data model to jointly estimate growth curves at each level of the hierarchy (plant or plot, genotype, and population) as well as their first-order derivatives.  59 discuss the use of information obtained from HTP time-series traits for genomic selection and the detection of QTL and causal variants. Finally, while we have focused in the paper on data with a nested structure, the proposed modelling framework can easily be extended to accommodate more complex structures, such as data with crossed levels of grouping 21 (e.g., when modelling genotype-by-treatment or genotype-by-environment interactions are of interest).
Regarding the statistical methods used at each stage of our modelling approach, the first stage relies on the SpATS model, and (hierarchical) P-splines are used for the second stage. However, our approach is flexible with respect to the choice for both the first and second stages. For instance, the separable autoregressive model 6 represents a clear alternative for the first stage, while, for the second stage, hierarchical functional principal component analysis can be used 28 . We believe that our double P-spline approach is attractive both computationally and for interpretation, and the HTP data that we analysed in this paper and in other projects show that it works well (see Millet et al. 20 and https:// eppn2 020. plant-pheno typing. eu/ for more examples). Nevertheless, fairness requires us to mention some limitations as well. The P-spline hierarchical data model used in the second stage relies on specifying each function in the model using B-spline basis expansions. The model contains, consequently, a large number of parameters. Typically, computational times are within an acceptable range (for the experiments analysed in this paper, around 10 min). Yet, the approach may not scale well to experiments where the number of plants (and associated basis dimension) is very large (due to the size of the system of equations to be solved). Regarding the number of B-spline basis functions, we recommend using the same value for the three levels of the hierarchy, even if this increases computation. The final number does not seem to impact results (estimated curves) significantly, provided it is large enough to capture the underlying patterns. However, the estimated first-order derivatives have shown to be more sensitive to the number of basis functions.
In the first stage of our two-stage approach, analyses are done separately per time point. As such, information on spatial heterogeneity is not shared across different measurement times, and there is the need that all plants or plots are measured at (approximately) the same times. From a modelling perspective, the simultaneous modelling of spatial and temporal genetic and non-genetic variation in a one-stage model will serve to share information on common spatial variability across measurement times and it may solve the problem of variable www.nature.com/scientificreports/ measurement times. Yet, one-stage approaches have the limitation of being very computationally demanding, especially when the number of observations and/or the parameters to be estimated is very large. Preliminary and proof-of-concept results can be found in Pérez et al. 14 and in Verbyla et al. 13 . In a similar vein to the work presented here, Pérez et al. 's 14 approach is based on using three-dimensional P-splines for modelling the nongenetic spatio-temporal (noise) process and hierarchical P-splines for the longitudinal genetic signal. In Verbyla et al. 13 , multi-dimensional splines are also used for the spatio-temporal noise, whereas a factor analytic structure is considered for the genetic effects. Although promising, both approaches suffer from severe computational limitations (e.g., Verbyla et al. 13 , report convergence requiring several days). More work is thus required to render them feasible for routine use, and to date, stage-wise approaches represent not only computationally feasible but also statistically valid alternatives. Apart from this work, van Eeuwijk et al. 2 also proposed a two-stage approach for the analysis of HTP data, where they first correct for spatial variation, and then focus on estimating, and further processing, the temporal dynamic of the genetic effects. In that paper, spatially adjusted genotypic means are carried to the temporal analysis. In contrast, our proposal allows keeping the data resolution for the second stage at the experimental unit. Also, in the second stage, we jointly model the whole sample of spatially corrected growth curves, while in van Eeuwijk et al. 2 analyses are done separately per genotype. Our hierarchical approach thus allows borrowing strength across plant/plot curves for more efficient estimation of genotype and population growth trajectories. This is particularly important in the presence of incomplete data. We are aware that when choosing a two-stage approach, one might also choose, as done in Roth et al. 34 , to first model the longitudinal variation at a plant or plot level and subsequently apply a spatial correction to extracted features. In that paper, a P-spline model is first fitted separately for each plot time-series, from which the timing of key stages (among other features) are extracted. These intermediate traits are then processed to obtain spatially adjusted genotypic means for further analyses. We feel that both options-with the spatial or the temporal analysis first-represent valid alternatives. The choice for one or another method will depend on the relative magnitudes of the various spatial, temporal and spatio-temporal genetic and non-genetic processes and may be difficult to assess beforehand. A study of the proposal that best suits particular situations represents an interesting area of study. It should be realised that stage-wise analyses make it necessary to propagate uncertainty from stage to stage. Here, we did it by weighting the second stage with the (inverse) of the estimated variance associated with the spatially corrected trait. In the HTP context, weighting has shown to improve results 34 .
All in all, the two-stage approach described in this paper represents a good compromise between accuracy, adequacy, computational efficiency and interpretability. The results show that our proposal is feasible on standard computers and delivers good descriptions of the genetic and non-genetic variation in the temporal dimension together with useful summary statistics for breeding purposes. We believe that it represents a powerful tool for routine application in phenotyping experiments with dense time series. To allow practitioners to use our proposal, the R-functions implementing it are publicly available at https:// gitlab. bcama th. org/ dperez/ htp_ two_ stage_ appro ach, where we also provide the code to reproduce the analyses and results shown in the paper. www.nature.com/scientificreports/