A new paradigm in modelling the evolution of a stand via the distribution of tree sizes

Our study focusses on investigating a modern modelling paradigm, a bivariate stochastic process, that allows us to link individual tree variables with growth and yield stand attributes. In this paper, our aim is to introduce the mathematics of mixed effect parameters in a bivariate stochastic differential equation and to describe how such a model can be used to aid our understanding of the bivariate height and diameter distribution in a stand using a large dataset provided by the Lithuanian National Forest Inventory (LNFI). We examine tree height and diameter evolution with a Vasicek-type bivariate stochastic differential equation and mixed effect parameters. It is focused on demonstrating how new developed bivariate conditional probability density functions allowed us to calculate the evolution, in the forward and backward directions, of the mean diameter, height, dominant height, assortments, stem volume of a stand and uncertainties in these attributes for a given stand age. We estimate the parameters by considering discrete samples of the diameter and height at a given age and by using an approximated maximum likelihood procedure. The model performance criteria for the height and diameter growth models include statistical indexes and an analysis of residuals.

Foresters seek to predict patterns in the distributions of individual trees within stands, across tree sizes and over space. Determining a distribution of tree diameters and heights has been broadly studied since the twentieth century 1 . Originally, the tree diameter and height distribution concept crystallised in assessing standing timber value based on merchantable piece sizes and forest structure which are the basis of aboveground forest carbon estimates 2 . The distribution of tree diameters at the breast height (in the sequel -diameter) is one of the most investigated subjects of statistical research in forestry. The forestry literature reports that tree diameter distributions vary across different stands 3 . In forest research, the univariate tree diameter distribution was introduced by Meyer 4 and Schnur 5 using a discrete probability space methodology. Natural extensions of a discrete diameter distribution are the continuous univariate gamma distribution 6 , log-normal distribution 7 , Weibull distribution 8 , logit-logistic 9 among others. Classical mathematical modelling of tree size distribution is largely concerned with the use of a well-known probability density function for fitting to the empirical tree size frequencies and does not quantify the evolution of a tree size distribution via a forest stand age. Several authors have analysed the tree height distribution in a forest stand. The tree height distribution can be depicted using two mathematical models. First, a height distribution can be derived indirectly from a diameter distribution together with the relationship between tree height and diameter. The relationship between height and diameter distributions can be accounted, provided that the function relating to height and diameter can be inverted 10 . Second, theoretical height distributions were fitted directly to tree height observations and have included such shapes as the beta 11 , the Weibull 8 , the Johnson 12 , the power-normal 13 and much more. The construction and applications of bivariate distributions of diameter and height are the most used classical fields of research in forest statistics, and it continues to be an active field of research in forest mensuration and inventory, where both the diameter and height are measured on every tree.
The philosophy of the mean stem volume estimation is based on the modelling of the distributions of the volume components of diameter and height instead of the volume itself. The bivariate distribution, as a possible relevant model for mean stem volume statistics, provides more physical insights on the validity domain and the quantification of the error made by using such a distribution. A bivariate distribution of diameter and height can Scientific REPORTS | 7: 15875 | DOI: 10.1038/s41598-017-16100-2 Bivariate stochastic differential equation framework. In this paper, we take a bivariate Ornstein-Uhlenbeck family 29 SDE of the Vasicek type 22 to study the tree diameter and height bivariate distribution problem. This results in an exact bivariate conditional probability density function which parameters can be estimated by maximum likelihood procedure based on discrete time observations. A bivariate mixed effect parameters SDE is set up for describing the tree diameter and height dynamics that extends to a theory of a bivariate conditional probability density function of the tree diameter and height. Proceeding as we have in the fixed effect parameters Gompertz-type bivariate case 23 , the Vasicek type bivariate SDE that describes the development of the diameter and height is defined by: here: M is the total number of stands used for model fitting, t is the time (stand age), In the sequel, initial condition takes the form: if t 0 = 5, then d 0 = 0.0 and h 0 = 1.3.
The Vasicek-type bivariate SDE can be converted into a well-studied bivariate Ornstein-Uhlenbeck 29 process by the transformation has a bivariate normal distribution, we can deduce that the conditional random vector ( and conditional probability density function:

Results and Discussion
The conceptually simplest modelling methodology for complex stands evolves from individual-tree level models that are then aggregated to predict consequences at the stand level 31 . Our developed conditional bivariate probability density function, defined by Eq. 10, outlines the evolution of the tree diameter and height structure via a stand age and can be used to explain some aspects of stand evolution such as the mean values of diameter, height and stem volume and their coefficient of variation, diameter distribution, height distribution, dominant height and the mean stand basal area or stand volume (m 3 /ha).
Estimating results. The maximum likelihood estimator seeks (see Supplement Method) to make the conditional probability density function the most likely fit to the observed diameter and height estimation dataset i starting at the initial age t 0 = 5, diameter and height X(t 0 ) = x 0 = (0.0, 1.3)) T from which a given set of SDEs (1) , were calculated by maximum likelihood and approximated maximum likelihood technique, respectively, realised by Eqs SM2, SM.8 and SM.9 (see Supplement Method), using the NLPSolve procedure in MAPLE 11 32 . The results of the parameter estimates, their standard deviations and the Akaike's Information Index (AIC) 33 are summarised in Table 2. The Akaike's Information Index is defined by: where K is the number of plots in the validation dataset and m i is the number of observed trees of the ith plot and for the height by: Bivariate diameter and height distribution at a given age. The newly developed bivariate conditional probability density function for tree diameter and height is attractive for its simplicity and may be justified from an application perspective. To demonstrate that the data provided by the LNFI (2007-2010) of Scots pine trees do indeed follow the bivariate estimated probability density function defined by Eq. 10 we will use simple graphical techniques to illustrate the goodness of fit. Theoretical validation that a height and diameter dataset observed at discrete ages follows a bivariate probability density function is not easy and there is no simple statistical test. We graphically illustrate goodness of fit of the estimated bivariate density function (Eq. 10) with fixed effect and mixed effect parameters on three randomly selected plots from a validation dataset by plotting the estimated bivariate density functions, their contour plots and the observed datasets. Figure 1 shows the estimated bivariate mixed effect parameters estimated probability density functions for three randomly selected stands from the validation dataset as well as the estimated bivariate fixed effect parameters stationary (time t equates to Univariate fixed effects models 35 calculated by maximum likelihood procedure presented in the Supplementary Method are summarised in Table 2). The diameter and height random effects , 3) for these three randomly selected stands were calibrated by Eqs 13 and 14, respectively.
The contour plots of the bivariate estimated mixed effects probability density function for three randomly selected stands from the validation dataset and the observed values, together with the contour plot of the bivariate estimated fixed effects stationary (t = +infinity) probability density function and the observed values from the same three randomly selected stands from the validation dataset are given in Fig. 2. Figure 2 shows that the mixed effect and fixed effect parameters bivariate estimated probability density functions well capture the main features of the data for three randomly selected stands from the validation dataset. The diameter and height random effects , φ φ for these three randomly selected stands were calibrated by Eqs 13 and 14, respectively. The conditional distribution of the diameter at a given height and age and the conditional distribution of the height at a given diameter and age have univariate , respectively. The mean, variance and coefficient of correlation can be calculated (using Eqs 4-10) in the following forms: In summary, Eqs 16 and 19 show us that the univariate conditional distribution of diameter (height) given height (diameter) has an age dependent variance, which is the same for each height (diameter). It is worth mentioning that the result on the height (diameter) independent variance of this study is influenced by properties of bivariate normal distribution. Figures 3 and 4 show the univariate conditional mixed effect parameters probability density functions at a given height and age (diameter and age) for three randomly selected stands from the validation dataset with the observed values, and the univariate conditional fixed effects parameters stationary (t = +infinity) probability density function at a given height and age (diameter and age).

Models of diameter and height.
Diameter and height evolution can be formulated using a wide range of mathematical relationships from linearised fixed effect parameters regression equations to nonlinear mixed effect parameters generalised relationships. The mathematical technique for a system of uniform diameter and height regional functions is the approach known as the generalised model. The mixed effects regression models are able to achieve the same results as the generalised model 34 .
Next, we summarise the results concerning the evolution of the diameter or height using univariate and bivariate fixed effects and mixed effects for the Vasicek-type growth models. Recall the models for the diameter and height growth: univariate fixed effect parameters bivariate fixed effect parameters     presented in Table 2, and the random effects for diameter and height calibrated by Eqs 13 and 14, respectively). The estimated diameter evolution defined by Eq. 26 and the estimated height evolution defined by Eq. 27 were the best functions, with an increase in the coefficient of determination (R 2 ) and decreases in the absolute prediction bias (AB) and the error (AE) for both estimation and validation datasets.
The estimated evolution of the diameter, In Supplementary Fig. S1, the residuals of the diameter models defined by Eqs 20, 22, 24 and 26, and the LOWESS line (Locally Weighted Scatterplot Smoothing line) are plotted against the predicted diameter values. In Supplementary Fig. S2, the residuals of the height models defined by Eqs 21, 23, 25 and 27, and the LOWESS line are plotted against the predicted height values. Supplementary Figs S1 and S2 show that the residuals that were calculated using the bivariate mixed effects scenario are distributed more symmetrically around zero, with approximately constant variance, compared with the other scenarios. Therefore, the bivariate models incorporating the random effects of the stands were the best models for predicting diameter and height growth of individual Scots pine trees in the study area (see Supplementary Figs S1    , respectively, and describe the knowledge of the predicted mean and uncertainty about the mean completely. These distributions can be used to describe the accuracy of predicted values by calculating the confidence intervals.
Stem structure. The bivariate conditional probability density function defined by Eq. 10 allows us to find the evolution of the percentage of trees in a particular stand that achieve fixed sizes for the diameter and height. Figure 6 shows the evolution of the percentage of two assortments for three randomly selected stands from the validation dataset. Slenderness ratio. The slenderness coefficient is an important characteristic for indexing tree resistance to wind throw and snow damage. Figure 7 shows the evolution of the mean slenderness coefficient for three randomly selected plots. The slenderness of trees decreases with increasing stand age. For the bivariate fixed effect and mixed effect SDEs height and diameter distribution models the evolution of the slenderness is defined as follows:

Mean and coefficient of variation of stem volume.
For forestry applications, the most commonly used approach is to derive measures to estimate forest variables such as mean tree diameter, height and stem volume.  First, the bivariate fixed effects and mixed effects conditional probability density function of the diameter and height allows us to calculate the evolution of the mean stem volume in the following form: h) is the individual stem volume regression function of power form 35 : Estimates of the parameters β 1 β 2 , and β 3 were calculated by a weighted least squares technique 36  . The estimated evolution of the mean stem volume in a stand for the fixed effect and mixed effect models and ± . ⋅ SD t 1 65 () V confidence bands (if we assume normality then they covers 90%) are shown in Fig. 8 for all stands (fixed effect scenario) and three randomly selected stands (mixed effect scenario) from the validation dataset, respectively. Observed values of mean stem volumes in all stands were defined by the average of stem volumes calculated by the regression (Eq. 30). For the mixed effect scenario the mean stem volume estimate proves satisfactory, with the mean prediction bias (the percentage mean prediction bias) −0.0158 m 3 (−3.28%) and with the mean prediction absolute bias (the percentage mean prediction absolute bias) 0.0322 m 3 (5.07%). For the validation dataset the percent of the mean stem volume variation explained attains high levels too, 99.51%, for the mixed effect methodology.  Left -fixed effect scenario: mean stem volume trend -solid; confidence bands -dash; observed mean stem volumes in cross. Right -mixed effect scenario for three randomly selected stands: first stand -red (mean of age, diameter and height: 44.0, 20.5, 21.68) and observed mean stem volume -cross; second stand -blue (mean of age, diameter and height: 100.0, 35.4, 24.9) and observed mean stem volume -circle; third stand -black (mean of age, diameter and height: 145.0, 48.6, 25.6) and observed mean stem volume -box. Mean stem volume trend from initial age to observed stand age -solid and from this age forward (forecast) -dot. Confidence bands -dash.
Second, the bivariate fixed effects and mixed effects conditional probability density functions of the diameter and height allow us to estimate stem volume variability by the coefficient of variation. We estimate the evolution of the standard deviation of the stem volume in the following form: Hence, the coefficient of variation takes the following form: The coefficient of variation of stem volume defined by Eq. 32 is a statistical measure of the dispersion of the volumes calculated by Eq. 30 data points around the mean stem volume calculated by Eq. 29. The coefficient of variation of stem volume is a useful statistic for comparing the degree of variation of stem volume from one stand to another, even if the means are drastically different from one another. Figure 9 shows a plot of the coefficient of variation as a function of stand age using the mean trend and standard deviation functions of stem volume. The coefficient of variation of the stem volume evolves into a stationary coefficient of variation. The coefficient of variation based on stem volume decreases with an increase in stand age.
Stand volume per ha. Estimation of stand volume per ha is important for sustainable forest management.
Methods to forecast stand volume per ha (or stem volume structure) for a short or long time period are necessary to ensure forest management schemes. Traditionally, estimating stand volume per ha includes several steps, such as the choice of stem volume regression model, the choice of a height-diameter model when tree height data are incomplete, the expansion of a sample volume to the number of trees it represents per unit area or the choice of stand volume per ha model developed by replacing the original tree variables into equivalent stand variables 37 (tree height are replaced by dominant height and diameter at breast height by quadratic mean diameter). The SDEs framework enables us to characterise a stand volume per ha as a function of any specified stand age, t, in the following form: here m N (t) is the number of trees per ha at a given age, t. The Vasicek-type univariate SDE that describes the development of the number of trees per ha,   Figure 10 shows the evolution of the stand volume per ha as a function of a stand age using the fixed effects scenario and the mixed effects scenario (for three randomly selected stands from the validation dataset).

Conclusions
We focus on discussion of the bivariate Vasicek-type stochastic process that drives the density evolution of the tree diameter and height structure via stand age. In this study, principal concepts of the bivariate mixed effect parameters Vasicek-type SDE in simple terms are presented and results of the diameter and height distribution, mean diameter, mean height, mean stem volume and stand volume per ha evolution via stand age are illustrated using the Lithuanian National Forest dataset from 2007 to 2010 for Scots pine trees. A new SDE framework for bivariate diameter and height distribution modelling is demonstrated using stand-level random effects. The newly developed conditional probability density functions are easy applied for the modelling of the evolution of the stand attributes via stand age.
The newly developed bivariate conditional probability density functions allowed us to calculate the evolution, in the forward and backward directions, of the mean diameter, height, dominant height, stem volume, stem volume of a stand and the uncertainties in these characteristics completely for a given stand age. The derived conditional probability density can be used to describe the accuracy of the predicted values (diameter, height and stem volume) in the sense of confidence bands.
It is easy to estimate fixed and random effects of the bivariate SDE using large datasets, which in this case was provided by the National Forest Inventories (one cycle). Moreover, the newly developed bivariate mixed effects conditional probability density function is an appropriate model for examination of diameter and height distributions of uneven-aged forest stands.
The present paper attempts to expand the scientific knowledge related to the evolution of tree diameter and height distributions in a forest stand by using SDEs, random effect technique and their possible applications to the modelling of the mean diameter, height, stem volume and stand volume per ha of a stand. Although this study has some limitations (e.g. estimation and validation datasets are not provided with remeasurement cycles of plots), these findings contribute to a better understanding of how stand growth and yield attributes be related to the stand age. Therefore, in our opinion, this study represents one of the first attempts to examine the evolution of growing stock resources, their structure and changes using the bivariate random process methodology and utilizing the data from one cycle of the LNFI. For a better understanding of the tree number and stand volume per ha dynamics remeasurement data from permanent plots (LNFI) could be utilized.
The trivariate (diameter, height and number of trees per ha) distributional model would be superior to the bivariate (diameter and height) distributional model in accordance to the underlying covariance structure driving changes in the tree diameter, height and number of trees per ha.
The number of trees per ha plays a central role for stand growth, but it's representation in the LNFI dataset is limited by data availability. Missing data on initial density of stand (number of trees per ha at the age of t 0 = 5), restricted ability to develop a trivariate model.
Further research on the interplay between tree diameter, height and number of trees per ha using trivariate diffusion process and remeasurement data (2 or 3 cycles) from the LNFI would improve the consequences for the tree number and stand volume per ha evolution.
The analysis of the mixed effect SDE bivariate model by the specification of one random effect structure that used estimation based on a maximum likelihood procedure is not properly supported by the LNFI dataset. On the other hand, the model with the structure of all possible random effect components included may not converge. Figure 10. Estimated evolution of stand volume per ha (m 3 ) using fixed and mixed effect scenarios. Left -fixed effect scenario: stand volume trend -solid and observed stand volumes -crosses. Right -mixed effect scenario for three randomly selected stands: first stand -red (mean of age, diameter and height: 44.0, 20.5, 21.68) and observed stand volume -cross; second stand -blue (mean of age, diameter and height: 100.0, 35.4, 24.9) and observed volume -circle; third stand -black (mean of age, diameter and height: 145.0, 48.6, 25.6) and observed stand volume -box. Stand volume trend from initial age to observed stand age -solid and from observed age forward (forecast) -dot.