Wind energy potential assessment based on wind speed, its direction and power data

Based on wind speed, direction and power data, an assessment method of wind energy potential using finite mixture statistical distributions is proposed. Considering the correlation existing and the effect between wind speed and direction, the angular-linear modeling approach is adopted to construct the joint probability density function of wind speed and direction. For modeling the distribution of wind power density and estimating model parameters of null or low wind speed and multimodal wind speed data, based on expectation–maximization algorithm, a two-component three-parameter Weibull mixture distribution is chosen as wind speed model, and a von Mises mixture distribution with nine components and six components are selected as the models of wind direction and the correlation circular variable between wind speed and direction, respectively. A comprehensive technique of model selection, which includes Akaike information criterion, Bayesian information criterion, the coefficient of determination R2 and root mean squared error, is used to select the optimal model in all candidate models. The proposed method is applied to averaged 10-min field monitoring wind data and compared with the other estimation methods and judged by the values of R2 and root mean squared error, histogram plot and wind rose diagram. The results show that the proposed method is effective and the area under study is not suitable for wide wind turbine applications, and the estimated wind energy potential would be inaccuracy without considering the influence of wind direction.


Methodology
Wind speed model with Weibull distribution. When the frequency of low wind speed, especially of null wind is significant, a three-parameter Weibull distribution can be used to model this wind speed data well and a more appropriate results can be obtained. The pdf of wind speed using the three-parameter Weibull distribution is given by 4,13 : where v is wind speed, η is the scale parameter (m/s), η > 0, β represents the shape parameter, β > 0, and γ is the position parameter, γ ≤ 0. When γ = 0, three-parameter Weibull distribution reduces to two-parameter Weibull distribution. (1) where Λ = [w, η, β, γ] are unknown model parameters of wind speed. Then the log-likelihood function can be given as follows: Due to the complexity of the log-likelihood function, the model parameters cannot be got by taking the partial derivatives of log-likelihood function with respect to each parameter and setting them equal to zero. Therefore, the log-likelihood function is maximized directly to estimate the model parameters. Unfortunately, there is no closed-form expression for computing them, it only can be numerically estimated. Therefore, a numerical method, such as EM algorithm is needed to find the maximum likelihood estimates of the parameters.
EM algorithm is an iterative method for finding maximum likelihood or maximum a posteriori estimates of model parameters for statistical distribution from a given data set. It proceeds iteratively in two steps, the expectation (E) step and maximization (M) step. In the E-step, a function for the expectation of the log-likelihood is created, and the hidden variables or missing data are estimated given the observed data and current estimator of model parameters. In the M-step, the likelihood function defined by the previous E-step is maximized to obtain new parameter estimations under the assumption that the hidden variables or missing data are known. It should be noted that the E-step and M-step in EM algorithm are performed iteratively until the algorithm converges. Initial values are required for the iterative procedure. In this study, the population is divided into m components, and the estimated parameters are assumed to have the same values with a single Weibull distribution for each component. These estimates are considered as initial values for the iterative procedure.
For an m-component mixture model, Eq. (6) is used to find the mean c 1 , variance c 2 , the coefficients of skewness c 3 and kurtosis c 4 of wind speed, respectively.
When the model parameters of three-parameter Weibull mixture distributions are known, based on Eq. (6), the values of c 1 , c 2 , c 3 and c 4 can be obtained as follows, respectively: www.nature.com/scientificreports/ Wind direction model with von Mises distribution. For the assessment of wind direction, the voM distribution is used. Consider a random variable θ following the voM distribution, the corresponding pdf is 27,28 where θ is wind direction in radians units, μ denotes location parameter or mean direction on the circle, 0 ≤ μ ≤ 2π, α represents concentration parameter, α ≥ 0, and I 0 (α) is the modified Bessel function of the first kind of order zero, given by 27,28 When wind direction has several modes or prevailing wind directions, the distribution of wind direction comprises a finite mixture of voM distributions. Thus, based on Eq. (11), the corresponding pdf of mixture distribution model can be given by where k is the number of components and p i is the weight coefficient of the ith component that sum to one, given by The mixture of voM distributions corresponds to the weighted sum of several voM distributions. It is thus suitable for the statistical description of multimodal datasets. Given the n observed wind direction data Θ = [θ 1 , θ 2 , …, θ n ], the likelihood function on Θ is given by where Δ = [p, μ, α] are unknown model parameters of wind direction. Then the log-likelihood function is computed by the following expression: The model parameters can also be estimated by the EM algorithm for maximum likelihood estimation 29,47 . If the model parameters of the voM mixture distributions are known, the mean b 1 and variance b 2 can also be got as follows, respectively: Joint distribution model of wind speed and direction. Based on Eq. (2), the corresponding cumulative distribution function (CDF) for wind speed is given as follows: Equation (13) can also be expressed as a series of Bessel functions and given by.
where I q (α) is the modified Bessel function of the first kind of order q, whose expression is given by www.nature.com/scientificreports/ Therefore, using Eq. (20), the CDF for wind direction can be obtained as follows: The joint angular-linear pdf of wind speed and direction is then given as 33 where g(ζ) is the correlation pdf of circular variable ζ between wind speed v and direction θ, and ζ given by 33 Using the above definitions, the values of ζ j can be obtained for each pair of values of wind speed v j and direction θ j (j = 1, 2, …, n) from a sample of size n given as 33 Therefore, based on Eqs. (2), (13), (19), (22) and (24), the pdf g(ζ) of the variable ζ can also be described by a mixture of voM distributions. Selection of optimal model. For mixture distribution model, the selection of the number of components is important. The log-likelihood cannot be directly used for selecting the number of components, since the log-likelihood value increases with the number of components. The best models of wind speed and direction are selected by two information criteria, which are named the Akaike information criterion (AIC) 12,25,39,47 and Bayesian information criterion (BIC) 37 . The values of AIC and BIC are defined as where l is the number of estimated parameters, n is the number of all observations, and maxlnL is the maximized log-likelihood. In this study, the number of unknown parameters l = 4 m-1 for m components three-parameter Weibull mixture wind speed model, and l = 3 m-1 for m components voM mixture wind direction model. The maxlnL can be obtained by Eqs. (5) and (16) with EM algorithm. AIC and BIC contain the penalization terms that take into account the number of model parameters and all observed values to counterbalance the maximized log-likelihood. To avoid overfitting, AIC penalizes model for its complexity only with model parameters, while BIC imposes a greater penalty for additional parameter than AIC. So, AIC and BIC give a comprehensive balance in order to find a good tradeoff between the goodness-of-fit and the complexity of the model, and avoid the risk of choosing a complex model with a poor generalization. The smaller the values of AIC and BIC are, the higher the fit accuracy of model is. Therefore, the number of components does not need to be known in advance.
Validation of model. The coefficient of determination (R 2 ) and the root mean squared error (RMSE) are used to judge the goodness-of-fit of different mixture models to wind speed and direction data, because it quantifies the correlation between the observed and the estimated probability density according to a particular distribution.
The coefficient of determination is the square of the correlation between the estimated values and observed values, and is defined by 13,16 where y j is the jth observed value, y m is the mean value of all observations, and ŷ j is the jth estimated value, respectively. A large value of R 2 indicates the proposed distribution fits the wind speed data set well in all candidate models.
Unlike the value of R 2 , a high RMSE value indicates a poor fit. The smaller the values of RMSE are, the better the proposed distribution function approximates the observed data. It can be given by 13,16 Wind energy estimation The density of air changes slightly with air temperature and with altitude at a potential site, supposed that air density is constant, based on a pdf f(v) of wind speed v at a height for a wind turbine, in theory, the wind power P(v) in W can be computed by 10,12  www.nature.com/scientificreports/ where ρ is air density (kg/m 3 ), A is the sweep area of a wind turbine rotor (m 2 ). Therefore, the wind power density p(v) in W/m 2 can be given as Using the real wind speed values of time series wind data, the effective wind power density, denoted as p e (v) in W/m 2 is estimated as follows 12,22,42 : where n e is the number of effective wind speed, which lies between the cut-in wind speed v in and cut-out wind speed v out of wind turbine.
If the wind direction influence is ignored, the wind power density p(v) for a specified wind turbine can be obtained numerically by When considering the effect of wind direction, similar to Eq. (32), the wind power density p(v,θ) at different wind speed and direction can be computed numerically by 37 The power curve gives a relation between the output power P w (v) and wind speed v, and this relation can be expressed by a polynomial function of degree u as follows 48-50 : where P w (v) is the output power, P r denotes the rated power of wind turbine, v r represent the rated speed, a 0 and a i are the regression constants which can be obtained using a polynomial regression method.
Based on the power curve of a specified wind tribune, the wind energy output E(v,θ) with a joint pdf of wind speed and direction within a period of time t can be calculated numerically by 33 Case study. Wind speeds are continuously acquired for a significant time, usually not less than one year. Therefore, the data used in this study were collected in a period of one year (January 1, 2019 to December 31, 2019) and measured at a height of 30 m above the ground level from the Maling Mountain wind farm (34°31.4′ N and 118°45.7′ E) located in Jiangsu Province, China (see Supplementary Table S1 on line). The Maling Mountain wind farm is selected due to the fact that the histogram of wind speed indicates that the frequency of 0-2 m/s wind speed range is 8.25% for this station (see Supplementary Table S2 on line). The percentage of null wind speed or calms (0-0.2 m/s) at this wind farm is 0.50%. Wind direction data are circular because they are recorded in terms of degrees, from 0° clockwise to 360°. However, for modeling convenient, the data are converted into radian units. After removing some abnormal and unreasonable data such as the missing data by sensor fault, measurement error data and low temporal resolution data, a total of 47,084 wind data are collected. The statistical description of wind speed, its direction and wind power data for 1.8 MW wind turbine are shown in Table 1.

Results and discussion
The estimated parameters with different methods for the Weibull distribution wind speed model are shown in Table 2.
A comparison results with different methods for the Weibull distribution wind speed model are also given in Table 3. Based on the information criteria of AIC and BIC, we can see that the fit accuracy of mixture model is higher than that of single model, and the accuracy of LSE is the lowest. For single model, three-parameter model has a higher fit accuracy than that of two-parameter model. The reason is that the former considers the null wind speed. However, for mixture model, as the number of components increases, the accuracy of the model decreases. Therefore, we select two-component three-parameter Weibull mixture model as the optimal model for wind speed. This result is also confirmed by the values of RMSE and R 2 . Because a lower value of RMSE and a higher value of R 2 indicate a better fit of the model to the data, and two-component three-parameter Weibull mixture model has the lowest value of RMSE and highest value of R 2 in all candidate models. It is worth noting that in the case of multi-modal data, the fitting, modeling and analysis for a statistical distribution are more www.nature.com/scientificreports/ accurate than an ordinary regression analysis, a value of R 2 only more than 0.7 is not sufficient 28 . In this study, this value is high, it is 0.9944. The fit results of different models are given in Fig. 1. It also shows that two-component three-parameter Weibull mixture model adequately fits the frequency histogram of wind speed well than other models. The fit accuracy of three-component three-parameter Weibull mixture model and two-parameter Weibull single model with LSE method is the lowest in all models. To fit the sample histogram, the wind speed and direction intervals must be given. In this case, the bin size of wind speed and direction interval is selected as 0.5 m/s and 10° (0.1745 rad) 7,21,27,29,35 , which is often considered to be reasonable in wind energy analyses. The speed interval of 0.5 m/s is also close to the value of 2 km/h (approximates 0.56 m/s) used by Deep et al. 14 and Gugliani et al 25 .
They concluded that the class width of 2 km/h gives a minimum error for modelling wind speed data.
A statistical description of wind speed data can give some useful information on wind speed, such as mean, variance, symmetry and flatness. We can also calculate them using Eqs. (7)-(10) and compare them with the statistical description in Table 1 to verify the rightness and effectiveness of our proposed method. Therefore, Table 1. Statistical description of wind speed, its direction and wind power data. The data records corresponding to the periods of averaged 10 min each and containing wind speed, wind direction and wind power data for 1.8 MW wind turbine collected from supervisory control and data acquisition (SCADA) systems.   www.nature.com/scientificreports/ using Eqs.  Table 1, we can find that the relative errors of mean and variance are small, they are only 6.30% and 2.51%, respectively, and the estimated values of c 3 > 0 and c 4 < 3. All these indicate that the probability distribution of wind speed has a long and light right tail than a normal distribution. This result also agrees well with two-component three-parameter Weibull mixture distribution in Fig. 1. Table 4 is the results of estimated parameters with different components for voM wind direction model. Table 5 is a comparison results with the different mixture models for wind direction. It can be observed that increasing the number of voM components from one to ten in the mixture distributions will increase the value of the R 2 coefficient, which indicates a better fit to the data. However, when the number of components increases to ten, the value of the R 2 coefficient does not increase any more, it is the same as the nine-component mixture model has. At the same time, a nine-component mixture model has the same value of RMSE as a ten-component mixture model. In this situation, how to select the best wind direction model? However, it is noticed that ninecomponent mixture model has lower values of AIC and BIC, so based on the information criteria of AIC and BIC, and the values of RMSE and R 2 , the best wind direction model would be selected using the comprehensive criteria of information and goodness-of-fit. It can be concluded that the most suitable model for wind direction  www.nature.com/scientificreports/ at this station is a voM mixture model with nine-component distributions. Generally speaking, a voM mixture distribution with six components for the modelling of wind direction is enough, increasing the number of components of mixture distributions, the variations in value of R 2 are not significant 27 . On the contrary, it would yield a complex model. Therefore, combining with the comprehensive criteria of model selection, an appropriate number of mixture distributions can be selected, which not only reduces the computational burden but also improves the model accuracy, and the model has a higher predictive ability. The fit results of different models are given in Figs. 2(a) and 2(b), it also shows that nine-component mixture model fits the frequency histogram of wind direction well than other models.
In Table 4, from the estimated model parameters of nine-component mixture model of wind direction, we can see that the parameters of μ, which correspond to the nine main wind directions, are 0.0010, 0.5544, 1 Fig. 2(b), the prevailing wind directions covering from 0° to 180°.
This result is also confirmed by the wind rose diagram shown in Fig. 3. According to the frequency of occurrence for wind direction, the nine main wind directions are classified into three kinds in descending order in this study: the first four, the middle three and the last two main wind directions. We can see that the first four prevailing wind directions are the ESE, ENE, E and NNE directions with 9.39%, 8.92%, 8.72% and 7.95% of frequency of occurrence, respectively. In Table 4, the four estimated parameters μ of nine-component mixture model are 2.1458, 1.3416, 1.5989 and 0.5544 rad, which correspond to the wind directions are 122.94°, 76.87°, 91.61° and 31.77°, respectively. This result is close to the result given by Fig. 2(b) with the ESE, ENE, E and NNE directions (112.5°, 67.5°, 90° and 22.5°). The middle three predominant directions are the SE, SSE and N directions (135°, 157.5° and 0°) with 7.74%, 7.42% and 7.40% of occurrence. This result also agrees well with the result given by the estimated parameters μ = 2.2854, 3.0405 and 0.0010 rad (130.94°, 174.21° and 0.06°) in nine-component mixture model of wind direction. The last two main wind directions are the WSW and NW directions (247.5° and 315°) with 5.11% and 4.20% frequency. Therefore, the yaw system of a wind turbine can be arranged to the ESE direction, since most of the wind blows from this direction, which will enable the wind turbine to be positioned in such a way as to maximize the captured energy.  www.nature.com/scientificreports/ Using Eqs. (17) and (18), the estimated mean and variance of wind direction can be given as follows: b 1 = 2.8051 rad (160.72°) and b 2 = 0.1052, respectively. The relative error of b 1 is very small, it is only 2.99%.
The statistical association between the wind speed and wind direction is measured by the linear-circular correlation coefficient, r 2 , given by 33,37,40 where r vc = corr(v, cosθ), r vs = corr(v, sinθ) and r cs = corr(cosθ, sinθ). The correlation coefficient between wind speed and direction should satisfy the requirement of ‫|‪r‬|‬ < 1/3. In this study, the value of r is small and equals to 0.1101, therefore, it can be seen that there exists a weak correlation between wind speed and direction. The absolute value of correlation coefficient is within 1/3 satisfying the condition of use for voM wind direction model. At last, the parameter estimation results for the pdf g(ξ) of circular variable ζ between the wind speed and direction using a voM mixture distribution with the different components are given in Table 6. Table 7 is a comparison results with different components for circular variable. From Table 7, it can be found that increasing the number of voM components from one to six in the mixture distributions will increase the value of the R 2 coefficient. However, when the number of voM components increase to seven, the value of the R 2 coefficient does not increase, it is the same as the six-component mixture model has. At the same time, a seven-component mixture model has the same value of RMSE as a six-component mixture model. It is also noticed that six-component mixture model has a lower value of AIC and BIC, so based on the comprehensive criteria of model selection, the most suitable model for the wind direction at this station is sixcomponent voM mixture model.
When considering the correlation between wind speed and direction, the number of parameters of μ, which corresponds to the main wind directions, is decreased from 9 to 6. They are 0. 4968 Figure 3. Wind rose diagram of wind direction. The wind direction is divided into 16 different sections. In the range from the direction N clockwise to S, three are seven main wind directions with 57.54% frequency totally. The other directions show an approximately uniform dispersion.  Fig. 4. The maximum error does not exceed the value of one Sect. (22.5°). Therefore, based on Eqs. (23) and (33), the joint pdf of wind speed and direction and wind power probability density can be given in Figs. 5 and 6.
The scatter plot of wind power output versus wind speed for 1.8 MW wind turbine is shown in Fig. 7, after pre-processing with a data-cleaning method, some abnormal and unreasonable data are discarded. The number and value of model parameters of power curve affect the fitting accuracy for wind power to speed data; more parameters can imply better performance, but can also mean time-consuming. Therefore, 4-parameter and 5-parameter logistic functions were compared to find the best performance 49 . In our study, we use a polynomial regression model to fit wind power curve because of its convenient calculating with Matlab and a high coefficient    Table 8, based on the correlation coefficient R 2 , we select the eightdegree polynomial as the best model for wind power output.   www.nature.com/scientificreports/ The power curve of the studied wind turbine as shown in Fig. 8. The expression of wind power in W with wind speed in m/s is given as follows: In this study, the cut-out wind speed (maximum allowable speed) and rated output power are known, they are 18.99 m/s and 1800 kW, respectively. The rated wind speed can be found by examining the power curve, i.e., the lowest wind speed, where the wind turbine first reaches its rated power, is the rated wind speed. Therefore, set Eq. (37) equals to 1.8 × 10 6 and 0, we can get the rated output speed, v r = 9.74 m/s, and the cut-in speed, v in = 2 m/s, respectively, as shown in Fig. 8.
To estimate the wind power output of a wind turbine, it is necessary to know wind speed and the number of hours of the year, in which the wind blows at velocity v. The period t of one year, i.e., approximates 8760 h and the air density of studied area is 1.216 kg/m 3 are used. Subsequently, the estimated values of wind power density at this wind farm were calculated using Eq. (33) is 93.27 W/m 2 . This value is close to the reference value of wind power density of 88.14 W/m 2 which is obtained using Eq. (31). It indicates that the region under study stands in class 1 10 , which belongs to the low-wind speed wind power development area, and is generally not suitable for wide wind turbine establishment and wind farm investment. However, it would be possible to exploit the wind power applications for small scale wind turbines at this area. Based on Eq. (32), a comparison result without taking into account of wind direction is 125.78 W/m 2 , is also presented. It is clearly shown that the wind energy potential is overestimated without considering the effect of wind direction. Using Eq. (35), the annual energy output can also be obtained, it is 2.21 GWh.

Conclusions
In this paper, the wind energy potential of the Maling Mountain in China is studied by using the measured wind data for a period of one year at a height of 30 m. Based on EM algorithm, an assessment method of wind energy potential using finite mixture statistical distribution model is proposed, the probability density function of wind power density and the annual energy output are given for use in wind energy analyses. The suitability of the model is judged using a comprehensive technique of model selection including AIC, BIC information criteria, coefficient of determination R 2 and RMSE. Field monitoring wind data are used to verify the effectiveness and validity of the proposed method by comparing it with other estimation methods in accordance with the value of R 2 and RMSE, the histogram plot and wind rose diagram. Wind direction is also important for a given wind farm when the orography and wake effects are going to be studied, so the orography and wake effects will be carried on in our future research, in addition we will also use Kato Jones distribution as candidate wind direction model and compare it with von Mises distribution. The further study to extend the modelling of direction-dependent power curve for wind turbine is also needed. The main conclusions are drawn from the study as follows: 1. The proposed method takes into account the effect of wind speed and direction simultaneously, the correlation existing between both variables, as well as the bimodal or multimodal distributions of them. The mixture distribution model provides a better fitting result for wind data than single distribution model, and it can therefore be used in the assessment of wind energy at a potential site and assessment result is more accuracy and close to the reality. On the other hand, three-parameter Weibull distribution considers the frequency of calm winds, it shows a good fit to wind speed data with a significant null wind speed or high percentage of low wind speed and is particularly suitable for a skewed data with a long tail in histogram plot. www.nature.com/scientificreports/ 2. The best mixture model with the lowest AIC and BIC values is selected as the optimal model from all candidate models with a finite component number, therefore it is not necessary to know the number of components in mixture model in advance. Increasing the number of components of the mixture distributions, the variations in value of R 2 are not significant. On the contrary, it might yield complex. Therefore, combining with the comprehensive technique of model selection, an appropriate number of mixture distributions can be selected, which not only reduces the computational burden but also improves the model accuracy, and the model has a higher predictive ability. 3. Compared with the real wind power density of time series wind speed data, it also shown that when there exists a correlation between wind speed and its direction, the estimated results of wind energy potential is more close to the real situation when considering the influence of wind direction.