Introduction

The future of the energy industry lies in clean power that minimizes or entirely removes pollutants from the process of power generation. The perfect clean energy mix occurs where green energy, derived from natural sources, meets renewable energy from sources that are constantly being replenished. Wind energy is one of the most important sustainable forms of this ideal clean energy and one of the fastest-growing energy sources. A sophisticated knowledge, based on statistical analysis, of wind characteristics is crucial for the future harnessing of this important renewable energy resource. Wind power is developing as a renewable energy source in a number of countries and it will be increasingly important to find an effective and predictable way of integrating this intermittent but environmentally friendly power source into the existing electrical grid system.

In South Africa, there is an increasing transition towards an environmentally sustainable, climate-change resilient, low-carbon economy. In October 2020, the South African Wind Energy Association (SAWEA) reported that wind technology had already attracted R209.7 billion in investment for the development of projects in South Africa. In fact, wind power comprises a larger share of the planned renewable energy investments to date. It is estimated that by 2030, \(22.7\%\) of the required electricity in South Africa, namely 17742 MW, will be generated from wind energy. In terms of job creation, the 22 wind Independent Power Producers (IPPs) that have successfully reached commercial operations to date have created 2723 jobs for South African citizens.

Wind as an energy source is only practical in areas that have strong and steady winds. South Africa’s climatology allows for significant wind energy production, especially along the coastal areas of the Eastern and Western Capes. The first large-scale wind farm in South Africa became operational in 2014 and, based on the SAWEA report, there are 33 wind farms: 22 fully operational and 11 under construction. In this paper, we study the wind direction at two operational wind farms in South Africa: (1) Jeffreys Bay (Humansdorp), located in the Eastern Cape; (2) Noupoort, located in the Northern Cape. In addition, we investigate the wind direction data from Marion Island, part of the Western Cape Province, which possesses excellent potential for wind studies.

Unlike conventional energy resources that are available at any time, wind speed and wind direction need to be forecasted in advance in order to estimate production and plan its contribution to a nation’s grid system. As the use of wind power increases, accurate forecasts are essential to maximize output from the wind farms. This includes the most important decision of all, the location of a wind farm and the placement of its turbines1.

The location of an industrial-scale wind farm, defined as a cluster of wind turbines used to produce electricity, is of paramount importance. Measuring the farm-specific wind characteristics, including mean wind speed, wind speed distribution (diurnal, seasonal, annual patterns), distribution of wind direction, short-term fluctuations, long-term fluctuations and wind shear profile, is essential for determining the location of the farm and its turbines. These characteristics can strongly influence the performance of the wind turbines and thus the power generated by the wind farms2. Moreover, interactions among multiple turbines change the power generation efficiency of turbines. Specifically, the wakes from upwind turbines can greatly affect the power production of downstream turbines, and this effect depends strongly on the wind direction3. Generally, downstream turbines produce less power compared to upwind turbines, but changes in wind direction can cause heterogeneity in the power curve of each turbine such that some upstream turbines can become downstream turbines4. Porté-Agel et al.5 presented a study about the effects of wind direction on turbine wakes and power losses at a large wind farm. Castellani et al.6 showed how the alignment of wind turbines to wind direction affects efficiency (see also Kazacoks et al.7 and Gomez and Lundquist8).

Predicting wind speed and wind direction is crucial for choosing the location of a wind farm and the placement of its turbines, and also for estimating wind power production. To the best of the authors’ knowledge, none of the existing literature follows a directional statistics approach for prediction of the wind direction. The interested reader is referred to some contributions in which several approaches have been proposed for forecasting wind direction. El-Fouly et al.20 suggested a linear time-series-based model for prediction of wind speed and direction. Garcia-Planas and Gongadze21 constructed a predictive model for wind speed and direction based on linear Markov chains from a linear algebra point of view (see also Zeng et al.22, Fan et al.23, Zheng et al.24, Chen et al.25, Liu et al.26, Giangregorio et al.27, Wang et al.28). Note, however, that this paper approaches skew directional models from the Bayesian statistical angle.

Circular statistics can be applied to obtain the distribution of wind direction, while the Weibull, gamma, normal, Rayleigh, log-normal, inverse Gaussian and logistic distributions are some common models for the wind speed (see Deep et al.9 and Gugliani et al.10). For example, mixtures of von Mises (VM) distributions have been widely applied to model wind direction for different locations10,11,12,13,14,15,16,17. Gugliani et al.18 have applied the Kato and Jones circular distribution19 to model wind direction.

However, wind datasets usually exhibit skew and multimodal patterns, while most of the well-known circular distributions, such as the von Mises, are symmetric. Therefore, in this paper, the application of skewed multimodal distributions is investigated for modeling the wind direction of South African hotspots from a Bayesian viewpoint. The k sine-skewed von Mises (SSVM) distribution29 and mixtures of SSVM are ideal candidates to model wind direction data exhibiting both skewness and multimodality. Because likelihood-based inference and the expectation maximization (EM) algorithm can be computationally complicated for mixture models, a Bayesian approach can overcome such computational difficulties. It also provides more accurate results for small datasets. Bayesian inference is conditional on the data and is exact, without reliance on asymptotic approximations. The Bayesian posterior predictive distribution can be used to forecast the wind direction.

Two important contributions of a Bayesian stochastic model are as follows:

(1) Inclusion of uncertainty about the parameters of the wind direction distribution results in a more practical predictive distribution for the wind direction. This implies that the predictive distribution is more dispersed than the probability distribution obtained when the uncertainty about the parameters is neglected.

(2) The prior distributions of the parameters can represent the heterogeneity of the distributions of the wind direction over a wind farm. The wind direction distributions for various turbines on a farm may belong to the same family, such as the skew-von Mises, but the model parameters of each turbine may vary randomly according to some probability distributions. The Bayesian predictive distribution aggregates the non-homogeneous distributions into a single distribution that captures the variation among the probability distributions of the wind directions at the turbines’ locations on a wind farm.

There is a vast literature on the Bayesian approach for symmetric directional data, specifically Bayesian analysis using the symmetric von Mises and von Mises-Fisher distributions30,31,32,33,34,35,36,37,38,39. The von Mises-Fisher mixture model is implemented by Taghia et al.40 and Roge et al.41. Mulder et al.42 provided Bayesian inference for mixtures of von Mises distributions using the reversible jump Markov chain Monte Carlo (MCMC) sampler and focused on noninformative priors. It follows from the preceding that there is a gap in the literature, which inspired us to propose a novel Bayesian analysis of skew directional wind data. Recently, Nakhaei Rad et al.43,44 provided Bayesian analysis for the skew von Mises-Fisher distribution and the skew wrapped Cauchy mixture model.

In “Site location and wind data”, we provide details of the datasets that are analyzed in this paper. “Materials and methods” revisits the k sine-skewed von Mises distribution and the maximum likelihood estimates (MLEs) of the parameters of the mixture of SSVM. The Bayesian inference of the mixture of SSVM is also presented, followed by the posterior predictive distribution to forecast the wind direction. In “Evaluation and results”, a simulation study is conducted to show the performance of the proposed Bayesian approach. Finally, SSVM and mixtures of SSVM are fitted to these datasets for different values of k, together with their competitor, namely the mixture of von Mises distributions.

Site location and wind data

The first dataset (A) contains the wind direction at Marion Island, recorded daily at 08:00, 14:00 and 20:00 South Africa standard time (SAST), which relate to the main synoptic hours. Marion Island is part of South Africa, with a climate that is highly oceanic in nature, coupled with the influence of passing frontal weather systems. In fact, the geographic location of Marion Island, lying directly in the path of eastward-moving depressions all year round, makes it an excellent location for meteorological studies. Powerful regional winds, colloquially known as the ‘Roaring Forties’ because they are found between the latitudes of 40\(^{\circ }\) and 50\(^{\circ }\) in the Southern Hemisphere, blow almost every day in a north-westerly direction. The exceptional research potential of Marion Island for wind studies, as well as for the rate and impacts of climate change, is demonstrated by the presence of a permanent meteorological research station on the island. This station was established as early as 1948 and is run by the South African National Antarctic Programme (see Fig. 1).

Figure 1

Marion Island (created by the University of Pretoria) and meteorological research station on the island (provided by Antarctic Legacy of South Africa http://www.antarcticlegacy.org and https://blogs.sun.ac.za).

The second dataset (B) reflects the wind direction of Jeffreys Bay wind farm, recorded every 10 min at 60 m height. Jeffreys Bay is one of the biggest wind farms in South Africa spanning 3700 hectares with a 138 MW capacity. This site’s optimal wind conditions, relatively flat topography, minimal environmental constraints and its close proximity to the Eskom (electricity supply commission of South Africa) grid line, make it an ideal wind energy resource (see Fig. 2, left).

Figure 2

Jeffreys Bay (Humansdorp) wind farm https://jeffreysbaywindfarm.co.za (left) and Noupoort wind farm https://noupoortwind.co.za (right).

The last dataset (C) contains the wind direction at Noupoort wind farm, which covers 7500 hectares and provides an 80 MW capacity, recorded every 10 min at 20 m height. This site is significant because of the excellent wind conditions, its proximity to national roads for wind turbine transportation, the favourable construction conditions, municipality and local stakeholder support and the straightforward electrical connection into the Eskom grid (see Fig. 2, right). Figure 3 shows the map of South Africa with the locations of Marion Island, Jeffreys Bay and Noupoort wind farms and rose plots of the wind direction in these regions.

Figure 3

Map of South Africa with the locations of Marion Island, Jeffreys Bay and Noupoort wind farms and rose plots of the wind direction (created with the R programming language version 4.1.3 https://www.r-project.org).

Table 1 shows the descriptive information about the datasets. The results in Table 1 confirm the presence of skewness in these datasets. The boxplots and kernel density plots of these datasets are shown in Fig. 4. The boxplots emphasize that these wind direction datasets reveal skew patterns, and the kernel density plots confirm multimodal patterns. A kernel density estimate is a smoothed version of the histogram and a useful alternative to it for continuous data. Unlike the histogram, the kernel technique produces a smooth estimate of the density function, uses all sample points’ locations and more convincingly suggests multimodality.

Table 1 Descriptive statistics for the wind direction data.
Figure 4

Boxplots and kernel density plots of the wind direction datasets A-C from Marion Island, Jeffreys Bay and Noupoort wind farms.

Materials and methods

Sine-skewed von Mises distribution

Most of the distributions on the unit circle share the common feature of being symmetric about their location \( \mu \in [-\pi ,\pi )\). However, since the assumption that data is symmetric is often rejected, Ref.29 introduced the k sine-skewed von Mises distribution with density function

$$\begin{aligned} f_{{SSVM}}(\theta ;\mu ,\tau ,\lambda )=\frac{1}{2\pi I_0(\tau )}\exp (\tau \cos (\theta -\mu ))(1+\lambda \sin (k(\theta -\mu ))), \end{aligned}$$
(1)

where \(I_0(.)\) is the modified Bessel function of the first kind of order 0, \(\mu \in [-\pi ,\pi )\) is the location parameter, \(\tau >0\) is the concentration parameter, \(-1\le \lambda \le 1\) is the skewness parameter and k is a positive integer. \(\lambda >0\) leads to left skewed distributions and \(\lambda <0\) provides right skewed distributions. The symmetric von Mises distribution is retrieved if \(\lambda =0\). For \(k\ge 2\), (1) has a multimodal form, but for \(k=1\) it can be either unimodal or bimodal. Figure 5 shows plots of SSVM density functions (see (1)) for \(\mu =0\), \(\tau =0.5\), \(\lambda =-0.8,-0.2,0.5,1\) and \(k=1,2\). As can be seen, \(k=2\) yields bimodal distributions. A mixture of SSVM distributions with \(M\in \mathbb {Z}^{+}\) components is expressed as

$$\begin{aligned} f_M(\theta ;{\varvec{w}},{\varvec{\mu }},{\varvec{\tau }},{\varvec{\lambda }})=\sum _{j=1}^{M} w_j f_{SSVM}(\theta ;\mu _j,\tau _j,\lambda _j), \end{aligned}$$
(2)

where \({\varvec{\mu }} = (\mu _1, \ldots ,\mu _M )\), \({\varvec{\tau }} = (\tau _1, \ldots ,\tau _M )\) and \({\varvec{\lambda }} = (\lambda _1, \ldots ,\lambda _M )\) are vectors of parameters, \(\tau _j>0\), \(\mu _j\in [-\pi ,\pi )\) and \(\lambda _j\in [-1,1]\). \({\varvec{w}} = (w_1, \ldots ,w_M )\) is the vector of weights, containing the relative size of each component in the total sample, which satisfy the constraints \(0 \le w_j \le 1\) and \(\sum _{j=1}^{M} w_j=1\).
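As an illustration of (1) and (2), the following minimal Python sketch evaluates the SSVM density and its finite mixture and checks numerically that both integrate to one over the circle. The function names (`bessel_i0`, `ssvm_pdf`, `mixture_pdf`, `integrate_circle`) are hypothetical helpers introduced here, and the Bessel function is computed from its power series so that only the standard library is needed; this is a sketch, not the implementation used for the results in the paper.

```python
import math


def bessel_i0(x):
    """I_0(x) from its power series sum_m (x/2)^(2m) / (m!)^2 (moderate x only)."""
    total, term, m = 0.0, 1.0, 0
    while term > 1e-17 * total:
        total += term
        m += 1
        term *= (x / 2.0) ** 2 / (m * m)
    return total


def ssvm_pdf(theta, mu, tau, lam, k=1):
    """k sine-skewed von Mises density of Eq. (1)."""
    base = math.exp(tau * math.cos(theta - mu)) / (2.0 * math.pi * bessel_i0(tau))
    return base * (1.0 + lam * math.sin(k * (theta - mu)))


def mixture_pdf(theta, w, mu, tau, lam, k=1):
    """Mixture of SSVM components of Eq. (2); w, mu, tau, lam are equal-length lists."""
    return sum(wj * ssvm_pdf(theta, mj, tj, lj, k)
               for wj, mj, tj, lj in zip(w, mu, tau, lam))


def integrate_circle(f, n=2000):
    """Midpoint rule on [-pi, pi); very accurate for smooth periodic integrands."""
    h = 2.0 * math.pi / n
    return h * sum(f(-math.pi + (i + 0.5) * h) for i in range(n))


# One of the parameter settings of Fig. 5 (mu = 0, tau = 0.5, lambda = -0.8, k = 2).
area_single = integrate_circle(lambda t: ssvm_pdf(t, 0.0, 0.5, -0.8, k=2))
# A two-component mixture with weights 0.8 and 0.2.
area_mixture = integrate_circle(
    lambda t: mixture_pdf(t, [0.8, 0.2], [3.0, 3.14], [0.2, 0.6], [0.75, -0.3]))
```

The skewing factor \(1+\lambda \sin (k(\theta -\mu ))\) integrates against the symmetric von Mises kernel without changing the total mass, which is why the normalising constant in (1) is the same as for the von Mises distribution.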

Figure 5

Density functions of the SSVM for \(\tau =0.5\), \(\mu =0\), \(\lambda =-0.8,-0.2,0.5,1\) and \(k=1\) (left) and \(k=2\) (right).

Algorithm 1 (Ref.45) can be used to generate a sample from the SSVM distribution in (1).

[Algorithm 1: generating a sample from the SSVM distribution]
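A common way to simulate from a sine-skewed density is to draw from the symmetric base density and then either keep the draw or reflect it about \(\mu \), with a probability driven by the skewing factor. The sketch below illustrates this idea in Python using the standard library's von Mises generator; the function name `sample_ssvm` is hypothetical, and whether these steps coincide exactly with Algorithm 1 is an assumption.

```python
import math
import random


def sample_ssvm(mu, tau, lam, k=1, rng=random):
    """Draw one angle in [-pi, pi) from the k sine-skewed von Mises density (1).

    Assumed scheme (keep-or-reflect): draw Y from the von Mises base centred at
    mu, keep it with probability (1 + lam*sin(k(Y - mu)))/2, otherwise reflect
    it about mu.
    """
    # random.vonmisesvariate expects the mean angle in [0, 2*pi).
    y = rng.vonmisesvariate(mu % (2.0 * math.pi), tau)
    u = rng.random()
    # k is an integer, so sin(k(y - mu)) is unaffected by the 2*pi reduction of mu.
    if u >= (1.0 + lam * math.sin(k * (y - mu))) / 2.0:
        y = 2.0 * mu - y
    # Wrap the result back to [-pi, pi).
    return (y + math.pi) % (2.0 * math.pi) - math.pi


rng = random.Random(0)
draws = [sample_ssvm(0.0, 2.0, 0.8, 1, rng) for _ in range(20000)]
mean_sin = sum(math.sin(x) for x in draws) / len(draws)
mean_cos = sum(math.cos(x) for x in draws) / len(draws)
```

The scheme is valid because a kept draw contributes \(f_0(\theta )(1+\lambda \sin (k(\theta -\mu )))/2\) and a reflected draw contributes the same amount, where \(f_0\) denotes the symmetric base density; the two add up to (1). For \(\mu =0\) and \(k=1\), the sample means of \(\sin \theta \) and \(\cos \theta \) should be close to \(\lambda I_1(\tau )/(\tau I_0(\tau ))\) and \(I_1(\tau )/I_0(\tau )\), respectively.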

Parameter estimation

In this section, the MLEs of the parameters of a mixture of SSVM are presented first, followed by Bayesian inference when all the weight, location, concentration and skewness parameters \(({\varvec{w}}\), \({\varvec{\mu }}\), \({\varvec{\tau }}\), \({\varvec{\lambda }})\) are unknown.

Maximum likelihood estimation

The log-likelihood function of a mixture of SSVM in (2), can be represented as follows:

$$\begin{aligned} l({\varvec{w}},{\varvec{\mu }},{\varvec{\tau }},{\varvec{\lambda }}|{\varvec{\theta }})= \sum _{i=1}^{n}\log \left( \sum _{j=1}^{M} w_j f_{SSVM}(\theta _i;\mu _j,\tau _j,\lambda _j)\right) . \end{aligned}$$
(3)

By setting the partial derivatives of (3) with respect to (\({\varvec{w}},{\varvec{\mu }},{\varvec{\tau }},{\varvec{\lambda }}\)) to zero, the MLEs of \(({\varvec{w}},{\varvec{\mu }},{\varvec{\tau }},{\varvec{\lambda }})\) can be obtained. Since no closed-form expressions exist, numerical methods should be used to obtain the estimates. The DEoptim package46 in R, which is based on the Differential Evolution (DE) algorithm47, is used to obtain the MLEs. Differential evolution is a heuristic evolutionary method for global optimization that is effective in many problems of interest in science and technology, and its performance as a global optimization algorithm on continuous numerical minimization problems has been extensively studied48. DEoptim makes this algorithm easy to apply in the R language and environment. It relies on repeated evaluation of the objective function in order to move the population toward a global minimum46; since DEoptim minimizes, the negative of the log-likelihood (3) serves as the objective function.
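To make the objective in (3) concrete, the sketch below evaluates the mixture log-likelihood in Python. The helper names are hypothetical, the data values are made up for illustration only, and the code is not the R/DEoptim implementation used in the paper. A global optimizer would then be applied to the negative of `mixture_loglik` over the parameter bounds.

```python
import math


def bessel_i0(x):
    """I_0(x) from its power series (adequate for moderate x)."""
    total, term, m = 0.0, 1.0, 0
    while term > 1e-17 * total:
        total += term
        m += 1
        term *= (x / 2.0) ** 2 / (m * m)
    return total


def ssvm_pdf(theta, mu, tau, lam, k=1):
    """k sine-skewed von Mises density, Eq. (1)."""
    base = math.exp(tau * math.cos(theta - mu)) / (2.0 * math.pi * bessel_i0(tau))
    return base * (1.0 + lam * math.sin(k * (theta - mu)))


def mixture_loglik(data, w, mu, tau, lam, k=1):
    """Log-likelihood (3): sum over observations of the log mixture density."""
    total = 0.0
    for theta in data:
        dens = sum(wj * ssvm_pdf(theta, mj, tj, lj, k)
                   for wj, mj, tj, lj in zip(w, mu, tau, lam))
        # dens is strictly positive whenever |lambda_j| < 1 for some component
        # with positive weight; math.log raises ValueError if dens is 0.
        total += math.log(dens)
    return total


# Illustrative angles in radians (not taken from datasets A-C).
angles = [-2.0, -0.5, 0.1, 1.2, 3.0]
ll = mixture_loglik(angles, [0.8, 0.2], [3.0, 3.14], [0.2, 0.6], [0.75, -0.3], k=1)
negative_ll = -ll  # the quantity a minimizer such as DE would work on
```

With \(\lambda _j=0\) and \(\tau _j=0\), every component reduces to the circular uniform density, so the log-likelihood equals \(-n\log (2\pi )\), which gives a quick sanity check of an implementation.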

Bayes estimation

Let \({\varvec{\theta }}=(\theta _1,\theta _2,\ldots ,\theta _n)\) be a random sample of size n from a mixture of SSVM (see (2)). The number of components M is assumed to be known. Suppose the latent variable \({\varvec{d}}=(d_1,\ldots ,d_n)\) indicates the component from which each observation in \({\varvec{\theta }}\) is sampled. The probability of being attributed to component j is given by

$$\begin{aligned} P(d_i = j|{\varvec{w}}) = w_j. \end{aligned}$$

Therefore, for \(i = 1, \ldots , n\) and \(j = 1, \ldots , M\)

$$\begin{aligned} f(\theta _i|d_i = j) = f_{SSVM}(\theta _i;\mu _j, \tau _j,\lambda _j). \end{aligned}$$

This implies that, conditional on \(d_i\), \(\theta _i\) is an independent observation from its respective component j, which makes the inference easier because the problem reduces to inference for a single SSVM component. Therefore, conditional on \({\varvec{d}}\), the likelihood function can be expressed as

$$\begin{aligned} L({\varvec{\mu }},{\varvec{\tau }},{\varvec{\lambda }}|{\varvec{\theta }},{\varvec{d}})= \prod _{i=1}^{n} f_{SSVM}(\theta _i;\mu _{d_i},\tau _{d_i},\lambda _{d_i}). \end{aligned}$$
(4)
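In data-augmentation schemes for finite mixtures, the allocation \(d_i\) is typically updated from \(P(d_i=j|\theta _i,{\varvec{w}},{\varvec{\mu }},{\varvec{\tau }},{\varvec{\lambda }})\propto w_j f_{SSVM}(\theta _i;\mu _j,\tau _j,\lambda _j)\), which follows from the two displays above by Bayes' theorem. The Python sketch below illustrates such an update; the function names are hypothetical, components are indexed from 0 rather than 1, and the assumption that the Gibbs sampler of this paper uses exactly this step is ours.

```python
import math
import random


def bessel_i0(x):
    """I_0(x) from its power series (adequate for moderate x)."""
    total, term, m = 0.0, 1.0, 0
    while term > 1e-17 * total:
        total += term
        m += 1
        term *= (x / 2.0) ** 2 / (m * m)
    return total


def ssvm_pdf(theta, mu, tau, lam, k=1):
    """k sine-skewed von Mises density, Eq. (1)."""
    base = math.exp(tau * math.cos(theta - mu)) / (2.0 * math.pi * bessel_i0(tau))
    return base * (1.0 + lam * math.sin(k * (theta - mu)))


def allocation_probs(theta_i, w, mu, tau, lam, k=1):
    """P(d_i = j | theta_i, parameters), proportional to w_j * f_SSVM(theta_i; ...)."""
    unnorm = [wj * ssvm_pdf(theta_i, mj, tj, lj, k)
              for wj, mj, tj, lj in zip(w, mu, tau, lam)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def sample_allocations(data, w, mu, tau, lam, k=1, rng=random):
    """Draw one allocation (0-based component index) per observation."""
    labels = range(len(w))
    return [rng.choices(labels, weights=allocation_probs(t, w, mu, tau, lam, k))[0]
            for t in data]


# Two well-separated symmetric components: an observation at 0 belongs almost
# surely to the component centred at 0.
probs = allocation_probs(0.0, [0.5, 0.5], [0.0, 3.0], [5.0, 5.0], [0.0, 0.0])
d = sample_allocations([0.0, 0.1, 3.0, 2.9], [0.5, 0.5], [0.0, 3.0],
                       [5.0, 5.0], [0.0, 0.0], rng=random.Random(0))
```

Given the sampled allocations, the likelihood factorises as in (4), so each component's parameters can be updated using only the observations assigned to it.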

Subsequently, we measure the uncertainty in the parameters with the following prior distributions for \(({\varvec{w}},{\varvec{\mu }},{\varvec{\tau }},{\varvec{\lambda }})\). If the sample size is small, or the available data provide only indirect information about the parameters of interest, the prior distribution becomes more important49. Ghaderinezhad et al.50 implemented the Wasserstein impact measure (WIM) as a measure for quantifying prior impact, which helps to choose between two or more given priors. Using the WIM, Nakhaei Rad et al.44 demonstrated that the combination of the von Mises, gamma and truncated normal distributions decreases the execution time of the Gibbs sampling algorithm, while also providing accurate parameter estimates for the skew Fisher-von Mises distribution51.

Therefore, consider independent von Mises and gamma distributions with parameters \(({\varvec{\mu }_0},{\varvec{\tau }_0})\) and \(({\varvec{\alpha }},{\varvec{\beta }})\) as priors for \({\varvec{\mu }}\) and \({\varvec{\tau }}\), respectively:

$$\begin{aligned} \pi (\mu _j,\tau _j;{\mu _0}_{j},{\tau _0}_{j},\alpha _j,\beta _j)\propto \exp ({\tau _0}_{j}\cos (\mu _j-{\mu _0}_j))\tau _{j}^{\alpha _j-1}\exp (-\beta _j\tau _j), \end{aligned}$$
(5)

where \({\tau _0}_j,\alpha _j,\beta _j>0\), \({\mu _0}_j\in [-\pi ,\pi )\) and \(j=1,2,\ldots ,M\).
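For reference, (5) can be written with its normalising constant. The display below assumes, as the kernel \(\tau _{j}^{\alpha _j-1}\exp (-\beta _j\tau _j)\) suggests, that the gamma prior is parameterised by shape \(\alpha _j\) and rate \(\beta _j\):

$$\begin{aligned} \pi (\mu _j,\tau _j;{\mu _0}_{j},{\tau _0}_{j},\alpha _j,\beta _j)=\frac{1}{2\pi I_0({\tau _0}_{j})}\exp ({\tau _0}_{j}\cos (\mu _j-{\mu _0}_j))\times \frac{\beta _j^{\alpha _j}}{\Gamma (\alpha _j)}\tau _{j}^{\alpha _j-1}\exp (-\beta _j\tau _j). \end{aligned}$$

Under this parameterisation the prior mean of \(\tau _j\) is \(\alpha _j/\beta _j\); for instance, \(\alpha _j=4\) and \(\beta _j=2\) give a prior mean of 2. A small value of \({\tau _0}_{j}\) makes the von Mises prior on \(\mu _j\) nearly uniform on the circle.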

For the skewness parameter \({\varvec{\lambda }}\), the truncated normal distribution on \([-1,1]\) is proposed with parameters \({\varvec{\xi }}\) and \({\varvec{\sigma }}^2\):

$$\begin{aligned} \pi (\lambda _j; \xi _j,\sigma _j)=\frac{1}{\sigma _j}\frac{\phi \left( \frac{\lambda _j-\xi _j}{\sigma _j}\right) }{\Phi \left( \frac{1-\xi _j}{\sigma _j}\right) -\Phi \left( \frac{-1-\xi _j}{\sigma _j}\right) },~~~~~~~~\lambda _j\in [-1,1]. \end{aligned}$$
(6)

where \(\xi _j\in \mathbb {R}\), \(\sigma _j>0\), \(j=1,2,\ldots ,M\), \(\phi (.)\) is the density function of standard normal distribution and \(\Phi (.)\) is its cumulative distribution function.
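The truncated normal prior in (6) is straightforward to evaluate with the error function. The following Python sketch, with hypothetical function names, implements (6) and checks numerically that it integrates to one on \([-1,1]\); it is meant only as an illustration of the formula.

```python
import math


def std_normal_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    """Standard normal distribution function Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncnorm_pdf(lam, xi, sigma, lo=-1.0, hi=1.0):
    """Density (6): normal(xi, sigma^2) truncated to [lo, hi]."""
    if lam < lo or lam > hi:
        return 0.0
    mass = std_normal_cdf((hi - xi) / sigma) - std_normal_cdf((lo - xi) / sigma)
    return std_normal_pdf((lam - xi) / sigma) / (sigma * mass)


def integrate(f, a, b, n=20000):
    """Midpoint rule on [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# Check two illustrative hyperparameter settings: (xi, sigma) = (0.9, 0.15) and (-1, 1).
area_1 = integrate(lambda x: truncnorm_pdf(x, 0.9, 0.15), -1.0, 1.0)
area_2 = integrate(lambda x: truncnorm_pdf(x, -1.0, 1.0), -1.0, 1.0)
```

When \(\xi _j\) lies at or near a boundary of \([-1,1]\), as in the second setting, the denominator of (6) is well below one and the renormalisation matters; when \(\sigma _j\) is very small and \(\xi _j\) is interior, the prior is effectively an untruncated normal.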

For the weight parameter \({\varvec{w}}\), the Dirichlet distribution with parameter \({\varvec{c}}\) is considered as prior:

$$\begin{aligned} \pi ({\varvec{w}};{\varvec{c}})= \frac{1}{B({\varvec{c}})}\prod _{j=1}^{M}w_j^{c_j-1}, \end{aligned}$$
(7)

where \(c_j>0\) for \(j=1,\ldots ,M\) and \(B({\varvec{c}})=\frac{\prod _{j=1}^{M}\Gamma (c_j)}{\Gamma \left( \sum _{j=1}^{M}c_j\right) }\). Thus the marginal distribution of \(w_j\) is \(Beta(c_j,\sum _{i=1}^{M}c_i-c_j)\)52.
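Because \(P(d_i=j|{\varvec{w}})=w_j\), the Dirichlet prior (7) is conjugate for the allocations: given \({\varvec{d}}\), the weights follow a Dirichlet distribution with parameters \(c_j+n_j\), where \(n_j\) is the number of observations allocated to component j. The Python sketch below draws from this conditional by normalising independent gamma variates. The function names are hypothetical, and the assumption that the Gibbs sampler of this paper updates \({\varvec{w}}\) in exactly this way is ours.

```python
import random


def sample_dirichlet(c, rng=random):
    """Draw from Dirichlet(c) by normalising independent Gamma(c_j, 1) variates."""
    g = [rng.gammavariate(cj, 1.0) for cj in c]
    s = sum(g)
    return [x / s for x in g]


def sample_weights_given_allocations(d, c, rng=random):
    """Draw w | d ~ Dirichlet(c_1 + n_1, ..., c_M + n_M); d holds 0-based labels."""
    counts = [0] * len(c)
    for label in d:
        counts[label] += 1
    return sample_dirichlet([cj + nj for cj, nj in zip(c, counts)], rng)


# Illustration: 80 observations in component 0 and 20 in component 1, with c = (1, 1).
rng = random.Random(1)
allocations = [0] * 80 + [1] * 20
w_draws = [sample_weights_given_allocations(allocations, [1.0, 1.0], rng)
           for _ in range(5000)]
mean_w0 = sum(w[0] for w in w_draws) / len(w_draws)
```

In this illustration the conditional mean of \(w_1\) is \((1+80)/(2+100)\approx 0.794\), consistent with the Beta marginal stated above after replacing \(c_j\) by \(c_j+n_j\).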

Subsequently, the posterior distribution is:

$$\begin{aligned} \pi ({\varvec{w}},{\varvec{\mu }},{\varvec{\tau }},{\varvec{\lambda }}|{\varvec{\theta }})\propto \pi ({\varvec{w}},{\varvec{\mu }},{\varvec{\tau }},{\varvec{\lambda }})L({\varvec{w}},{\varvec{\mu }},{\varvec{\tau }},{\varvec{\lambda }}|{\varvec{\theta }}), \end{aligned}$$
(8)

with \(\pi ({\varvec{w}},{\varvec{\mu }},{\varvec{\tau }},{\varvec{\lambda }})\) from (5), (6) and (7). The full conditionals of the parameters \(({\varvec{w}},{\varvec{\mu }},{\varvec{\tau }},{\varvec{\lambda }},{\varvec{d}})\) for use in the Gibbs algorithm follow from (8). The Gibbs sampler is then as follows (see Algorithm 2):

[Algorithm 2: Gibbs sampler for the mixture of SSVM]

For \({\varvec{\theta }}=(\theta _1,\theta _2,\ldots ,\theta _n)\), a set of observations and \({\varvec{\varpi }}=({\varvec{w}},{\varvec{\mu }},{\varvec{\tau }},{\varvec{\lambda }})\), the posterior predictive distribution for a new data point \(\theta _{new}\) and \(d_{new}\) (the corresponding latent switch variable associated with \(\theta _{new}\)) is:

$$\begin{aligned} \pi (\theta _{new}|{\varvec{\theta }})=\sum _{d_{new}}\int _{{\varvec{\varpi }}}f(\theta _{new}| d_{new},{\varvec{\mu }},{\varvec{\tau }},{\varvec{\lambda }}) p(d_{new}| {\varvec{w}})\pi ({\varvec{\varpi }}|{\varvec{\theta }})d{\varvec{\varpi }}, \end{aligned}$$

where \(\theta _{new}\) is assumed to be conditionally independent of the sample data \({\varvec{\theta }}\) given the parameters \({\varvec{\varpi }}\). Sometimes the form of \(\pi (\theta _{new}|{\varvec{\theta }})\) can be derived directly, but it is often easier to sample from \(\pi (\theta _{new}|{\varvec{\theta }})\) using Monte Carlo methods. For generating an iid sample \((\theta ^{(1)}_{new},\theta ^{(2)}_{new},\ldots ,\theta ^{(n)}_{new})\) from \(\pi (\theta _{new}|{\varvec{\theta }})\), Algorithm 3 is followed:

[Algorithm 3: sampling from the posterior predictive distribution]
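The integral defining \(\pi (\theta _{new}|{\varvec{\theta }})\) suggests a simple composition scheme: take a retained posterior draw of \({\varvec{\varpi }}\), draw \(d_{new}\) with probabilities \({\varvec{w}}\), and then draw \(\theta _{new}\) from the selected SSVM component. The Python sketch below illustrates this under stated assumptions: the function names are hypothetical, the posterior draws are supplied as tuples `(w, mu, tau, lam)`, the SSVM draw uses a keep-or-reflect step, and the correspondence with Algorithm 3 is assumed rather than confirmed.

```python
import math
import random


def sample_ssvm(mu, tau, lam, k=1, rng=random):
    """One draw in [-pi, pi) from the SSVM density (1) by keep-or-reflect."""
    y = rng.vonmisesvariate(mu % (2.0 * math.pi), tau)
    if rng.random() >= (1.0 + lam * math.sin(k * (y - mu))) / 2.0:
        y = 2.0 * mu - y
    return (y + math.pi) % (2.0 * math.pi) - math.pi


def sample_posterior_predictive(posterior_draws, n_new, k=1, rng=random):
    """Draw n_new angles from the posterior predictive by composition sampling."""
    out = []
    for _ in range(n_new):
        w, mu, tau, lam = rng.choice(posterior_draws)  # one retained Gibbs draw
        j = rng.choices(range(len(w)), weights=w)[0]   # d_new given w
        out.append(sample_ssvm(mu[j], tau[j], lam[j], k, rng))
    return out


def circular_mean(angles):
    """Mean direction: the angle of the resultant vector."""
    return math.atan2(sum(math.sin(a) for a in angles),
                      sum(math.cos(a) for a in angles))


# Toy posterior with a single draw that puts all weight on a concentrated
# symmetric component centred at 1.0 (illustrative values only).
toy_posterior = [([1.0, 0.0], [1.0, -2.0], [50.0, 50.0], [0.0, 0.0])]
predictive = sample_posterior_predictive(toy_posterior, 2000, rng=random.Random(2))
direction = circular_mean(predictive)
```

For angular data the mean direction computed from the resultant vector is generally preferable to the arithmetic mean of the angles, which can be distorted when the draws straddle \(\pm \pi \).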

Model selection criteria

Model selection is an important part of any statistical analysis and many tools for selecting the “best model” have been suggested in the literature. Here, three different criteria are applied to evaluate the models. Suppose \({\varvec{\varpi }}\) is the vector of parameters with k elements, \(l({\varvec{\varpi }}| {\varvec{\theta }})\) is the log-likelihood function and n is the sample size. The Akaike information criterion (AIC)53 and the Bayesian information criterion (BIC)54 as penalized-likelihood criteria are given by

$$\begin{aligned} AIC&=-2l({\varvec{\varpi }}|{\varvec{\theta }})+2k,\\ BIC&=-2l({\varvec{\varpi }}|{\varvec{\theta }})+k\log n. \end{aligned}$$

As can be seen, BIC penalizes parameters more heavily than AIC whenever \(\log n>2\), that is, for \(n\ge 8\). Spiegelhalter et al.55 proposed the deviance information criterion (DIC), as

$$\begin{aligned} DIC=2\overline{D}({\varvec{\varpi }})-D(\overline{{\varvec{\varpi }}}), \end{aligned}$$

where \(D({\varvec{\varpi }})=-2 l({\varvec{\varpi }}|{\varvec{\theta }})\), \(\overline{{\varvec{\varpi }}}\) is the posterior mean of \({\varvec{\varpi }}\) and \(\overline{D}(.)\) is the average of D(.) over the samples of \({\varvec{\varpi }}\). DIC is usually applied in Bayesian model selection problems where the posterior distribution has been obtained by MCMC simulation.
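The three criteria can be computed directly from their definitions. In the Python sketch below, the function names are hypothetical, `n_params` plays the role of k in the AIC and BIC formulas above (for a mixture with \(M=2\) components there are 7 free parameters: one weight and three parameters per component), and `loglik` is any user-supplied function that returns the log-likelihood for a parameter vector.

```python
import math


def aic(loglik_value, n_params):
    """AIC = -2 l + 2 k."""
    return -2.0 * loglik_value + 2.0 * n_params


def bic(loglik_value, n_params, n_obs):
    """BIC = -2 l + k log n."""
    return -2.0 * loglik_value + n_params * math.log(n_obs)


def dic(loglik, draws):
    """DIC = 2 * mean deviance - deviance at the posterior mean.

    `draws` is a list of parameter vectors (lists of floats) sampled from the
    posterior. The posterior mean is taken elementwise, which is a
    simplification for angular parameters close to the wrap-around at +/- pi.
    """
    n = len(draws)
    mean_deviance = sum(-2.0 * loglik(p) for p in draws) / n
    post_mean = [sum(p[i] for p in draws) / n for i in range(len(draws[0]))]
    return 2.0 * mean_deviance - (-2.0 * loglik(post_mean))


# Toy check with a one-parameter log-likelihood l(p) = -(p - 1)^2 and two draws.
def toy_loglik(p):
    return -(p[0] - 1.0) ** 2


toy_dic = dic(toy_loglik, [[0.0], [2.0]])  # mean deviance 2, deviance at the mean 0
```

In the toy check the mean deviance is 2 and the deviance at the posterior mean is 0, so the DIC equals 4; the difference between the two deviance terms is commonly interpreted as the effective number of parameters.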

Evaluation and results

Simulation

In this section, to assess the performance of the proposed Bayesian approach, a simulation study was conducted to estimate the parameters of the SSVM in (1) and the mixture of SSVM in (2). Two models were considered: an SSVM with parameters \(\mu =3,\tau =2,\lambda =0.5\) and prior parameters \(\mu _0=0,\tau _0=0.01,\alpha =4,\beta =2,\xi =0.5\), \(\sigma =0.01\); and a mixture of SSVM with two components (\(M=2\)) with parameters \(w=0.8, \mu _1=3, \tau _1=0.2, \lambda _1=0.75, \mu _2=3.14, \tau _2=0.6, \lambda _2=-0.3\) and prior parameters \(\mu _{0_1}=3\), \(\tau _{0_1}=0.1\), \(\alpha _1=4\), \(\beta _1=2\), \(\xi _1=0.9\), \(\sigma _1=0.15\), \(c_1=1\) and \(\mu _{0_2}=0\), \(\tau _{0_2}=0.1\), \(\alpha _2=6\), \(\beta _2=2\), \(\xi _2=-1\), \(\sigma _2=1.0\), \(c_2=1\). Samples of sizes \(n=50,100,500\) were generated from the posterior distribution in (8) for each model, using Gibbs sampling in Algorithm 2. The Bayes estimates of parameters were obtained based on the squared error and absolute error loss functions. The posterior mean and the posterior median are the Bayes estimators under the squared error and absolute error loss functions, respectively. In order to obtain the Bayes estimates of the parameters, the mean and median of the generated samples from the posterior distribution (8) were calculated along with some other descriptive statistics. The results, including the sample mean, standard deviation (sd) and quartiles (Q1, median, and Q3) of the posterior distribution, are summarized in Tables 2 and 3. As can be seen, the differences between the true values of the parameters and the posterior sample mean and the posterior sample median are minimal. Therefore, the proposed Bayesian approach provides accurate estimates for the parameters. The traceplots of the generated samples from the posteriors and the compare-partial plots56 are shown in Fig. 6 for the mixture of SSVM.
A traceplot shows the time series of the sampling process from the posterior distribution and is used for evaluating convergence. A converged chain is expected to produce a traceplot that looks completely random. A compare-partial plot provides overlapped kernel density plots related to the last part of the chain (the last 10 values, in green) and the whole chain (in black). The overlapped kernel densities are expected to be similar, which means that the initial and final parts of the chain are sampling from the same target posterior distribution. These plots in Fig. 6 confirm the convergence of the chains and show that the Gibbs sampler recovers the values that actually generated the dataset.
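The summaries reported for each parameter (mean, sd, Q1, median and Q3 of the posterior sample) can be computed from a chain with the Python standard library. The sketch below uses a hypothetical function name and a made-up chain; it only illustrates how the two Bayes estimates relate to the loss functions mentioned above.

```python
import statistics


def posterior_summary(chain):
    """Summaries of one parameter's posterior sample.

    The mean is the Bayes estimate under squared error loss and the median is
    the Bayes estimate under absolute error loss.
    """
    q1, _, q3 = statistics.quantiles(chain, n=4)
    return {
        "mean": statistics.fmean(chain),
        "sd": statistics.stdev(chain),
        "Q1": q1,
        "median": statistics.median(chain),
        "Q3": q3,
    }


# Made-up chain for a concentration parameter (illustrative values only).
chain_tau = [1.8, 2.1, 1.9, 2.4, 2.0, 2.2, 1.7, 2.0, 2.3, 1.9]
summary = posterior_summary(chain_tau)
```

These linear summaries are appropriate for \(\tau \), \(\lambda \) and w. For a location parameter \(\mu \) whose chain lies close to the wrap-around at \(\pm \pi \), a circular summary such as the mean direction may be a safer choice.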

Table 2 Bayes estimates of parameters of SSVM with prior parameters, \(\mu _0=0,\tau _0=0.01,\alpha =4,\beta =2,\xi =0.5\) and \(\sigma =0.01\).
Table 3 Bayes estimates of parameters of a mixture of SSVM with prior parameters, \(\mu _{0_1}=3\), \(\tau _{0_1}=0.1\), \(\alpha _1=4\), \(\beta _1=2\), \(\xi _1=0.9\), \(\sigma _1=0.15\), \(c_1=1\) and \(\mu _{0_2}=0\), \(\tau _{0_2}=0.1\), \(\alpha _2=6\), \(\beta _2=2\), \(\xi _2=-1\), \(\sigma _2=1.0\), \(c_2=1\).
Figure 6

Traceplots and estimated posterior density plots of generated samples for \((w,\mu _1,\tau _1,\lambda _1,\mu _2,\tau _2,\lambda _2)\) in Table 3 for \(n=500\).

To evaluate the accuracy of the obtained Bayes estimates, the mean squared errors (MSE) of the estimates under the squared error and absolute error loss functions were obtained for the two-component (\(M=2\)) mixture of SSVM with the parameters given above, for sample sizes \(n = 10,25,50,100,200,300,500\) with 100 repetitions. The results in Fig. 7 show that the MSE decreases as n increases. Moreover, the MSEs of the estimates under the absolute error loss function are smaller than those under the squared error loss function, because outliers have a smaller effect on the median.

Figure 7

MSE of Bayes estimates under the squared error (left) and absolute error (right) loss functions, for \(n = 10,25,50,100,200,300,500\).

Real data

To demonstrate the performance of the SSVM for the wind direction data for South African hotspots, three real skewed datasets as discussed in “Site location and wind data” (see Table 1) were analyzed. Due to the multimodal pattern of the datasets observed in Fig. 4, the following distributions were assumed:

  • mixtures of von Mises distributions with \(M=2,3,4\) components,

  • SSVM with \(k=2\),

  • mixtures of SSVM with \(k=1\) and \(M=2\) components,

  • mixtures of SSVM with \(k=2\) and \(M=2\) components.

The MLEs of parameters \((\mu ,\tau ,\lambda ,p)\) were obtained by using the DEoptim package in R. The results, including the MLEs and the corresponding log-likelihood, AIC and BIC, are reported in Table 4. A model with the maximum log-likelihood and minimum values of AIC and BIC provides a better fit for the data. Therefore, for dataset A, the mixture of SSVM with \(k=1\) provides the best fit. The mixture of SSVM with \(k=2\) and the mixture of von Mises with \(M=2\) are the second and third best models, respectively. For datasets B and C, the mixture of SSVM with \(k=2\) provides the best fit and the mixture of von Mises with \(M=4\) is the second best model. In all of these datasets, the difference between the AIC and BIC values of the mixture of SSVM and those of the mixture of von Mises is remarkable. Furthermore, the mixture of SSVM, with a smaller value of M, outperformed the mixture of von Mises. The kernel density plots of the datasets and the fitted curves consisting of the best mixture of von Mises and mixture of SSVM for \(k=1,2\) are shown in Fig. 8.

Table 4 Maximum likelihood estimates and corresponding log-likelihood, AIC and BIC for datasets.
Figure 8

Kernel density plots of datasets and fitted curves based on MLEs.

To demonstrate the performance of the proposed Bayesian approach, a mixture of two SSVM distributions is fitted to dataset A with \(k=1\), and to datasets B and C with \(k=2\). A sample of size \(n=500\) was generated from the posterior distribution in (8) for each model, using the Gibbs sampling outlined in Algorithm 2. The Bayes estimates of the parameters were obtained based on the squared error, absolute error and zero-one loss functions. For our purpose, the posterior mean, posterior median and posterior mode were calculated from the generated samples as the Bayes estimates of the parameters under these respective loss functions. The results, including the Bayes estimates of the parameters and the corresponding DIC, are reported in Table 5. A model with the minimum value of DIC has a better fit for the data. The models mentioned above, with parameters estimated under the absolute error loss function, provide a more accurate fit for the datasets. The kernel density plots of the datasets and the fitted curves are shown in Fig. 9.

Table 5 Bayes estimates of parameters under different loss functions and corresponding DIC for datasets.
Figure 9

Kernel density plots of datasets and fitted curves based on Bayes estimates.

In Table 6, using Algorithm 3, the predicted means of wind direction were obtained, based on the absolute error loss function, for \(n=20,50,100\), together with \(95\%\) credible intervals. We focused on the absolute error loss function as a result of the performance observed in Table 5. As can be seen, as n increases, the mean of the predictive wind direction distribution gets closer to the mean of the dataset. In addition, the credible intervals are short. Therefore, our approach provides accurate prediction of wind direction.
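One standard way to obtain a \(95\%\) credible interval from Monte Carlo draws is the equal-tailed interval between the \(2.5\%\) and \(97.5\%\) sample quantiles. The Python sketch below illustrates this with a hypothetical function name and made-up draws; whether the intervals in Table 6 are equal-tailed or of another type (for example, highest posterior density) is not stated here, so this choice is an assumption.

```python
import math


def equal_tailed_interval(draws, level=0.95):
    """Equal-tailed interval from sample quantiles with linear interpolation."""
    xs = sorted(draws)
    tail = (1.0 - level) / 2.0

    def quantile(p):
        pos = p * (len(xs) - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(xs) - 1)
        frac = pos - lo
        return xs[lo] * (1.0 - frac) + xs[hi] * frac

    return quantile(tail), quantile(1.0 - tail)


# Made-up predictive draws on a regular grid, for illustration only.
grid_draws = [float(i) for i in range(101)]
lower, upper = equal_tailed_interval(grid_draws)
```

For angular quantities this linear interval is meaningful only when the draws do not straddle the wrap-around at \(\pm \pi \); otherwise the draws should first be rotated so that their mean direction sits at zero.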

Table 6 Predicted wind direction based on absolute error loss function for different values of n.

Conclusion

In this paper, due to the skew and multimodal patterns of wind direction datasets from South Africa, a skew and multimodal mixture model, namely a mixture of sine-skewed von Mises distributions, is proposed for modeling wind direction. Our proposed model outperforms mixtures of von Mises distributions (with a larger number of components), which are extensively used in the literature to model wind direction. Due to the difficulties in estimating the parameters of mixture models using the maximum likelihood method, a Bayesian approach is implemented for estimating the parameters of a mixture of sine-skewed von Mises distributions using a Gibbs sampler. The results show that this approach provides accurate estimates for the parameters. In addition, the posterior predictive distribution can be applied for wind direction prediction (see Table 6), which provides accurate forecasts. Future work may consist of implementing the models of Bekker et al.57 and Kato and Jones19 and investigating the impact of other prior choices50. One can use our proposal to improve the wind energy potential as described and detailed in Arashi et al.58.