Optimal sampling and statistical inferences for Kumaraswamy distribution under progressive Type-II censoring schemes

In this paper, we study non-Bayesian and Bayesian estimation of the parameters of the Kumaraswamy distribution based on progressive Type-II censoring. First, the maximum likelihood and maximum product spacings estimates are derived. In addition, we derive the asymptotic distribution of the parameter estimators and the corresponding asymptotic confidence intervals. Second, Bayesian estimators are obtained under symmetric and asymmetric loss functions (the squared error, linear exponential, and general entropy loss functions). The Lindley approximation and the Markov chain Monte Carlo method are used to derive the Bayesian estimates. Furthermore, we derive the highest posterior density credible intervals of the parameters. We also identify an optimal progressive censoring scheme among competing censoring schemes using three optimality criteria. Simulation studies are conducted to evaluate the performance of the point and interval estimators. Finally, a real data application is provided to illustrate the proposed procedures.

Figure 1 shows the behavior of the pdf and cdf of the Kum. distribution at different values of the parameters α and β. In Fig. 1, we notice different patterns of the distribution according to the parameter values: a U-shape at (α = 0.5, β = 0.5), a gamma-like shape at (α = 2, β = 5), and a shape approximating the exponential distribution at the remaining values.
www.nature.com/scientificreports/
In industrial life testing and medical survival analysis, it is common for the object of interest to be lost or withdrawn before failure, or for the object's lifetime to be known only within an interval. This results in a sample that is incomplete, often referred to as a censored sample. There are various reasons for removal of experimental units, such as saving them for future use, reducing total test time, or lowering associated costs. Right censoring is a technique used in life-testing experiments to handle censored samples. The conventional Type-I and Type-II censoring schemes are the most common methods of right censoring, but they do not allow for removal of units at points other than the terminal point of the experiment, limiting their flexibility. To address this limitation, a more general censoring scheme called the progressive Type-II censoring scheme (PCS-II) has been proposed. It proceeds as follows:
• Suppose that n units are placed on a test at time zero, with m failures to be observed.
• At the first failure, say x (1) , R 1 of the remaining units are randomly selected and removed.
• At the time of the second failure, x (2) , R 2 of the remaining units are selected and removed.
• Finally, at the time of the m-th failure, all remaining units are removed, so that Rm = n − R1 − R2 − · · · − Rm−1 − m.
• Thus, under a progressive censoring plan one observes the data (x(1), R1), . . . , (x(m), Rm). Although R1, R2, . . . , Rm are recorded as part of the data, their values are fixed in advance.
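The scheme above can be simulated directly. The following Python sketch uses the uniform-transformation algorithm of Balakrishnan and Sandhu (cited later in the simulation section) to generate a PCS-II sample from the Kum. distribution; the function name and argument layout are our own.

```python
import numpy as np

def pcs2_sample(n, m, R, alpha, beta, rng=None):
    """Generate a progressive Type-II censored sample from the
    Kumaraswamy(alpha, beta) distribution (Balakrishnan-Sandhu algorithm)."""
    rng = np.random.default_rng(rng)
    assert len(R) == m and n == m + sum(R)
    W = rng.uniform(size=m)
    # gamma_i = i + R_m + R_{m-1} + ... + R_{m-i+1}
    gam = np.array([i + sum(R[m - i:]) for i in range(1, m + 1)])
    V = W ** (1.0 / gam)
    # U_1 <= ... <= U_m: progressively censored order statistics from U(0,1)
    U = 1.0 - np.cumprod(V[::-1])
    # Kumaraswamy quantile function: F^{-1}(u) = (1 - (1 - u)^{1/beta})^{1/alpha}
    return (1.0 - (1.0 - U) ** (1.0 / beta)) ** (1.0 / alpha)
```

Setting all Ri = 0 (so n = m) recovers an ordinary complete sample, matching the special cases noted below.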
The joint probability density function of all m PCS-II order statistics is given in Balakrishnan and Aggarwala 5 . If R1 = R2 = · · · = Rm−1 = 0 and Rm = n − m, the scheme reduces to conventional Type-II censoring, and if R1 = R2 = · · · = Rm = 0, then n = m, which corresponds to the complete sample (Wu 6 ).
The theory of estimation comprises the methods by which inferences or generalizations about a population parameter are made. The current trend is to distinguish between non-Bayesian and Bayesian estimation methods. Any statistical or computational strategy that does not rely on Bayesian inference is referred to as a non-Bayesian method. The Bayesian method utilizes prior subjective knowledge about the probability distribution of the unknown parameters in conjunction with the information provided by the sample data. The non-Bayesian and Bayesian methods of estimation are introduced in the next sections.
Gholizadeh et al. 7 examined the performance of Bayesian and non-Bayesian estimators of the shape parameter, reliability, and failure rate functions of the Kumaraswamy distribution under progressively Type-II censored samples. They obtained the maximum likelihood and Bayes estimates of the reliability and failure rate functions under various symmetric and asymmetric loss functions, such as the squared error, precautionary, and linear exponential (LINEX) loss functions. Feroze and El-Batal 8 focused on estimating the two parameters of the Kumaraswamy distribution using progressive Type-II censoring with random removals. They derived the maximum likelihood estimates of the unknown parameters and also determined the asymptotic variance-covariance matrix. Eldin et al. 9 studied parameter estimation for the Kumaraswamy distribution using progressive Type-II censoring. They obtained estimates through both maximum likelihood and Bayesian methods. In the Bayesian approach, the two parameters were considered as random variables and their estimators were obtained by employing the squared error loss function. Erick et al. 10 focused on estimating the parameters of test units from the Kumaraswamy distribution under a progressive Type-II censoring scheme, employing the EM algorithm to derive the maximum likelihood estimates of the parameters.

Non-Bayesian estimation methods
In this section, we examine the task of estimating Kum. parameters under PCS-II samples, employing two estimation techniques known as maximum likelihood estimators (MLEs) and maximum product spacing estimators (MPSEs).

Maximum likelihood estimation.
Suppose that X = (x(1), x(2), . . . , x(m)) is a PCS-II sample drawn from a Kum. population whose pdf and cdf are given by (1) and (2), with censoring scheme (R1, R2, . . . , Rm). From (1), (2) and (3), the likelihood function is given by: Taking the log-likelihood function of (4), l(α, β) = log L(α, β), one obtains: The parameters of the Kum. distribution can then be estimated by taking the first partial derivatives of (5) with respect to α and β, as follows: Setting the partial derivatives with respect to α and β equal to zero, we have The maximum likelihood estimators (MLEs) αMLE and βMLE are the solutions of this system of two nonlinear equations, which must be solved numerically to obtain the parameter estimates.
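As a numerical illustration, instead of solving the score equations one can maximize the PCS-II log-likelihood directly; up to a constant it equals m log α + m log β + (α − 1) Σ log xi + Σ [β(Ri + 1) − 1] log(1 − xi^α). The sketch below (function names are our own) does this with a derivative-free optimizer.

```python
import numpy as np
from scipy.optimize import minimize

def kum_pcs2_loglik(theta, x, R):
    """PCS-II log-likelihood of Kumaraswamy(alpha, beta), up to the
    constant factor in the likelihood (3)."""
    a, b = theta
    if a <= 0 or b <= 0:
        return -np.inf
    x = np.asarray(x, float)
    t = np.log1p(-x ** a)                     # log(1 - x_i^alpha)
    m = len(x)
    return (m * np.log(a) + m * np.log(b)
            + (a - 1) * np.log(x).sum()
            + ((b * (np.asarray(R) + 1) - 1) * t).sum())

def kum_mle(x, R, start=(1.0, 1.0)):
    """Maximize the log-likelihood numerically (Nelder-Mead)."""
    res = minimize(lambda th: -kum_pcs2_loglik(th, x, R), start,
                   method="Nelder-Mead")
    return res.x  # (alpha_hat, beta_hat)
```

With all Ri = 0 this reduces to the complete-sample MLE, which is a convenient sanity check.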
Maximum product spacings. Ng et al. 15 introduced the maximum product spacing (MPS) method based on PCS-II samples. The MPS technique selects the parameter values that minimize the deviation of the observed data from a predetermined quantitative measure of uniformity.
From (2), one can get: The natural logarithm of the product spacings function is S(α, β) = log G(α, β). The MPS estimators of α and β, denoted by αMPS and βMPS respectively, are obtained by solving the following normal equations simultaneously: This system of two nonlinear equations must be solved numerically, yielding the MPS estimates αMPS and βMPS.
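A minimal numerical sketch of the MPS approach, assuming the usual PCS-II product-spacings objective of Ng et al. (the ordinary spacings of the fitted cdf multiplied by the survival factors [1 − F(x(i))]^Ri) and maximizing it with a derivative-free optimizer; all names are our own.

```python
import numpy as np
from scipy.optimize import minimize

def kum_cdf(x, a, b):
    """Kumaraswamy cdf: F(x) = 1 - (1 - x^a)^b."""
    return 1.0 - (1.0 - x ** a) ** b

def mps_objective(theta, x, R):
    """log product-spacings for a sorted PCS-II sample x."""
    a, b = theta
    if a <= 0 or b <= 0:
        return -np.inf
    F = kum_cdf(np.asarray(x, float), a, b)
    D = np.diff(np.concatenate(([0.0], F, [1.0])))   # m + 1 spacings
    if np.any(D <= 0):
        return -np.inf
    return np.log(D).sum() + (np.asarray(R) * np.log1p(-F)).sum()

def kum_mps(x, R, start=(1.0, 1.0)):
    res = minimize(lambda th: -mps_objective(th, x, R), start,
                   method="Nelder-Mead")
    return res.x  # (alpha_MPS, beta_MPS)
```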
Asymptotic variance-covariance. The asymptotic variance-covariance matrix of the MLEs of the two parameters is the inverse of the observed Fisher information matrix, as follows and the variance-covariance matrix of the parameters α and β is given by Using (6), an asymptotic 100(1 − γ)% confidence interval for each of the parameters α and β can be easily obtained as respectively. Here var(α) and var(β) are the elements on the main diagonal of the variance-covariance matrix, and Zγ/2 is the upper γ/2 percentile of the standard normal distribution, since the MLEs are asymptotically normally distributed with variance-covariance matrix I −1.
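When closed-form second derivatives are inconvenient, the observed information matrix can be approximated numerically. The generic helper below (our own construction; the finite-difference step h is a tuning choice) builds Wald-type intervals from any log-likelihood evaluated at its maximizer.

```python
import numpy as np
from scipy.stats import norm

def wald_ci(loglik, theta_hat, level=0.95, h=1e-5):
    """Wald confidence intervals from the observed information matrix,
    with second derivatives approximated by central finite differences."""
    th = np.asarray(theta_hat, dtype=float)
    k = th.size
    H = np.empty((k, k))
    I = np.eye(k)
    for i in range(k):
        for j in range(k):
            H[i, j] = (loglik(th + h * I[i] + h * I[j])
                       - loglik(th + h * I[i] - h * I[j])
                       - loglik(th - h * I[i] + h * I[j])
                       + loglik(th - h * I[i] - h * I[j])) / (4.0 * h * h)
    cov = np.linalg.inv(-H)                # inverse observed information
    z = norm.ppf((1.0 + level) / 2.0)      # e.g. 1.96 for level = 0.95
    se = np.sqrt(np.diag(cov))
    return np.column_stack((th - z * se, th + z * se))
```

Passing the PCS-II log-likelihood and the MLEs gives the asymptotic intervals of this section without hand-coding the Hessian.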

Bayesian estimation methods
In this section, the Bayesian estimates (BE) of the shape parameters α and β are obtained under the assumption that α and β are independent random variables with prior distributions Gamma(a1, b1) and Gamma(a2, b2) respectively, with pdfs: where the hyper-parameters a1, b1, a2, and b2 are chosen to reflect the prior knowledge about α and β. The joint prior for α and β is given by By applying Bayes' theorem and combining the likelihood function from (4) with the joint prior from (7), we obtain the posterior distribution of the parameters α and β, denoted π(α, β|x), which is proportional to the likelihood times the prior. This can be expressed as: where the likelihood function of α and β is as follows: Thus, the likelihood function in (4) can be rewritten as follows: Hence, taking the exponential of the logarithm, it becomes The joint posterior density function of α and β can be written as: where: Thus, the posterior density function can be rewritten as The conditional posterior densities of α and β are as follows: where Hence, the conditional posterior density of α is and where Hence, the conditional posterior density of β is It is clear that π2(β|α, x) is the density function of a gamma(m + a2, b2 − Σ_{i=1}^{m} (Ri + 1) log(1 − xi^α)) random variable. The BE are obtained under three different types of loss functions, namely the squared error (SE) loss function (as a symmetric loss function) and the linear exponential (LINEX) and general entropy (GE) loss functions (as asymmetric loss functions).
Markov chain Monte-Carlo. In this case, we apply the MCMC technique to produce samples from the posterior distributions. From these samples, we calculate the BE of the unknown parameters and construct the corresponding credible intervals. The conditional posterior density of β in (10) is a gamma density with shape parameter (m + a2) and the scale parameter given there, so samples of β can be easily generated using any gamma generating routine. The posterior of α given in (9) does not have a standard form, but its plot shows that it is similar to a normal distribution with mean α and standard deviation Sα, where Sα is obtained from the variance-covariance matrix. Therefore, to generate random numbers from this distribution, we use the Metropolis-Hastings algorithm with a normal proposal distribution; see Tierney 20 and El-Sagheer 11 . The MCMC algorithm proceeds as follows: 4. Using the following Metropolis-Hastings step, generate α(i) from π1(α|β, x) with the normal proposal distribution N(α, Sα).
4.1 Generate a proposal α* from N(α, Sα), where Sα is the standard deviation of α.
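Putting the sampler together, the hybrid scheme described above, an exact gamma draw for β given α followed by an MH step with a normal random-walk proposal for α, might be sketched as follows. The step size s_a, the starting value a0, and all function names are choices of ours for illustration, not from the paper.

```python
import numpy as np

def log_post_alpha(a, b, x, R, a1, b1):
    """Log conditional posterior of alpha given beta, up to a constant."""
    if a <= 0:
        return -np.inf
    t = np.log1p(-x ** a)
    return ((len(x) + a1 - 1) * np.log(a) - b1 * a
            + a * np.log(x).sum()
            + ((b * (R + 1) - 1) * t).sum())

def gibbs_mh(x, R, a1, b1, a2, b2, n_iter=10000, burn=2000,
             a0=1.0, s_a=0.2, rng=None):
    """Hybrid Gibbs sampler: beta drawn exactly from its gamma
    conditional; alpha updated by a Metropolis-Hastings random walk."""
    rng = np.random.default_rng(rng)
    x, R = np.asarray(x, float), np.asarray(R, float)
    m = len(x)
    a = a0
    draws = []
    for _ in range(n_iter):
        # beta | alpha ~ Gamma(m + a2, rate = b2 - sum (R_i+1) log(1 - x_i^a))
        rate = b2 - ((R + 1) * np.log1p(-x ** a)).sum()
        b = rng.gamma(m + a2, 1.0 / rate)
        # alpha | beta via MH with normal proposal
        prop = rng.normal(a, s_a)
        if np.log(rng.uniform()) < (log_post_alpha(prop, b, x, R, a1, b1)
                                    - log_post_alpha(a, b, x, R, a1, b1)):
            a = prop
        draws.append((a, b))
    return np.array(draws[burn:])
```

Posterior means of the retained draws give the SE-loss Bayes estimates; HPD intervals can then be formed from the same draws.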

Elicitation of hyper-parameters. This section discusses the elicitation of hyper-parameter values when
informative priors are considered. Suppose that we have k samples available from the K(α, β) distribution, and let the associated maximum likelihood estimates of (α, β) be (αj, βj), j = 1, 2, . . . , k. The hyper-parameter values can then be obtained by equating the mean and variance of the αj and βj, j = 1, 2, . . . , k, with the mean and variance of the assumed priors. In the present work, we have considered gamma priors for α and β. Therefore, on equating the mean and variance of the αj and βj with the mean and variance of the gamma priors, we get We can find a1 and b1, the estimators of the hyper-parameters of the prior for α, by solving Eqs. (11) as follows: and Similarly, estimators of the hyper-parameters of the prior distribution for β can be found as One may also refer to the work of Dey et al. 19,21 and Singh and Tripathi 22 in this regard.
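The moment-matching step can be written compactly: a Gamma(a, b) prior has mean a/b and variance a/b², so equating these to the sample mean and variance of the k past MLEs gives a = mean²/var and b = mean/var. A sketch (function name is ours):

```python
import numpy as np

def elicit_gamma_hyper(mle_draws):
    """Moment-matching elicitation of Gamma(a, b) hyper-parameters
    from k past maximum likelihood estimates of the same parameter."""
    m = np.mean(mle_draws)
    v = np.var(mle_draws, ddof=1)     # sample variance
    return m * m / v, m / v           # (a, b) with mean a/b, variance a/b^2
```

Applying this separately to the k estimates of α and of β yields (a1, b1) and (a2, b2).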

Simulation study and real data analysis
The objective of this section is to evaluate the effectiveness of the different estimation methods discussed in the preceding sections. A real dataset is used for illustrative purposes, and a simulation study is conducted to observe how the proposed methods perform under PCS-II.
Simulation study. In this subsection, we conduct a simulation study to compare the performance of the different point estimates and confidence intervals obtained by the non-BE and BE methods for the unknown parameters of the Kum. distribution under PCS-II. We compute the average estimate (Avg.), mean square error (MSE), confidence intervals (CI), average interval length (AIL), and coverage probability (CP) over 1000 replications to compare the performance of the different methods. The performance of non-BE and BE is compared under the following settings: 1. Values of (α, β) = (0.5, 0.5), (0.5, 1), (1, 1), and (1, 2). 2. Sample sizes of n = 40 and n = 80. 3. The algorithm proposed by Balakrishnan and Sandhu 23 is used to generate the progressively Type-II censored samples; the removed items Ri are assigned for the different sample sizes n and numbers of stages m as shown in Table 1.
Based on the generated data, we compute the MLEs, MPSs, and the corresponding 95% asymptotic confidence intervals (Asy-CI). When deriving the MLEs, the true parameter values are used as initial values.
The gamma prior distribution is used to compute the BE of the parameters under both symmetric and asymmetric loss functions. These estimates are obtained through Lindley's approximation and the MCMC method. To determine the values of the hyper-parameters, 500 complete samples of size 60 are generated from the Kum. distribution with various values of α and β, playing the role of historical data. The resulting informative prior values are then used to evaluate the desired estimates. The MLEs are employed as initial values in the MH algorithm, along with the corresponding variance-covariance matrix Sα of α. Finally, the first 2000 burn-in samples of the 10,000 generated samples are discarded, and BE are produced for three different loss functions: SE, LINEX at c = −0.5, 1.5, and GE at q = 0.1, 2. Additionally, HPD interval estimates are calculated using the approach developed by Chen and Shao 24 . In Tables 2, 3, 4, 5, we display the non-BE obtained by using MLE and MPS at different values of n and m. The first column reports the average estimates (Avg.) and the second column the mean square errors (MSEs). For interval estimation, we report the asymptotic confidence interval (Asy CI), average interval length (AIL), and coverage probability (CP) based on the MLE. Tables 6, 7, 8, 9, 10, 11, 12, 13 show the BE obtained using Lindley's approximation and the MCMC method with different loss functions, for various values of n and m; again the first column gives the average estimates (Avg.) and the second the mean square errors (MSEs). In Tables 14, 15, we present interval estimates, namely the highest posterior density (HPD) intervals, with their average interval length (AIL) and coverage probability (CP), using the MCMC method.
Based on Tables 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, an increase in the values of (n, m), and specifically an increase in the value of m, leads to a decrease in the MSEs; in addition, the Avg. approaches the true values of the two parameters α and β for all estimation methods. Comparing the performance of the non-BE methods, we note that the MLE estimates are more efficient than the MPS estimates. Comparing the performance of the BE methods under Lindley's approximation, we find that the MSEs are smallest under the LINEX loss function with c = −0.5, followed by the SE loss function. For the MCMC algorithm, the MSEs are smallest under the SE loss function. In general, we note that the MCMC estimates are more efficient than those of Lindley's approximation. From Tables 14, 15, it is observed that the HPD and Asy CI have the smallest and largest average lengths, respectively. As a general result, we see that as n increases, in all cases, the AIL decreases and the corresponding CP percentages increase.
The MCMC estimates of α and β obtained by the MH algorithm are illustrated in Fig. 2 through trace plots, histograms, and convergence plots of the estimates. In Fig. 2, the plots display the random variation of the values of α and β, which are observed to be scattered around the mean. Also, from the histograms of the MH sequences for α, we observe that choosing the normal distribution as the proposal distribution is quite appropriate.
Real data application. In this subsection, we examine real data on the Shasta reservoir's monthly water capacity in California, USA. The data cover the month of February for the years 1991 to 2010; see Sultana et al. 25 and Sultana et al. 12 . The data points are listed below. To determine whether this dataset can be appropriately analyzed using the Kum. distribution, a goodness-of-fit test is conducted. In addition to the Kum. distribution, we also fit the generalized exponential [Gen.Exp], Burr XII [Burr], and beta distributions to the data set. We judge the goodness of fit using various criteria, namely the negative log-likelihood criterion (NLC), the Akaike information criterion (AIC) introduced by Akaike 26 , the corrected AIC (AICc) introduced by Hurvich and Tsai 27 , and the Bayesian information criterion (BIC) introduced by Schwarz 28 . The smaller the value of these criteria, the better the model fits the data. The results are shown in Table 16. To assess the fit graphically, the empirical cdf is plotted with the corresponding fitted cdfs of the Kum., Gen.Exp, Burr, and beta distributions, and the histogram is plotted with the corresponding fitted pdf lines for the same distributions. Figure 3 shows the fitted cdf and pdf lines for the given data set and the corresponding distributions. The figures also indicate that the Kum. distribution provides a better fit than the other distributions, at least for this data set.
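The four criteria used above follow standard formulas; a small helper (our own) evaluates them from a model's maximized log-likelihood, its number of parameters k, and the sample size n.

```python
import numpy as np

def info_criteria(loglik_hat, k, n):
    """Standard model-comparison criteria; smaller is better for each."""
    nlc = -loglik_hat                              # negative log-likelihood
    aic = 2 * k - 2 * loglik_hat                   # Akaike
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)     # corrected AIC
    bic = k * np.log(n) - 2 * loglik_hat           # Schwarz
    return {"NLC": nlc, "AIC": aic, "AICc": aicc, "BIC": bic}
```

Evaluating this for each fitted distribution reproduces the kind of comparison reported in Table 16.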
Referring to the values reported in Table 16, we conclude that the Kum. distribution fits the data set well compared to the other models. The various point and interval estimates of α and β for the real data under PCS-II are given in Tables 17, 18, 19. In Table 17, we display the non-BE obtained by using MLE and MPS at m = 10. We computed the average estimates (Avg.), standard deviation, and the asymptotic confidence interval (Asy CI) using the MLE.

Optimal progressive Type-II censoring scheme
In the preceding sections, we have discussed both non-BE and BE methods for estimating the unknown parameters of the Kum. distribution when PCS-II is used to obtain samples. To conduct a life-testing experiment using PCS-II, it is necessary to have advance knowledge of n, m, and (R1, R2, . . . , Rm). However, in many reliability and life testing studies, practical considerations require selecting the optimal PCS-II from a set of possible schemes. Balakrishnan and Aggarwala 5 extensively discussed the problem of determining the best censoring plan under different setups. Comparing different censoring schemes has been of great interest to several researchers; see, for example, Ng et al. 29 , Kundu 30 , Lee et al. 31 , Lee et al. 32 , and Ashour et al. 33 . To determine the optimum PCS-II, we consider information measures as the following criteria. The logarithm of T u for the Kum. distribution is given by From Eq. (13), using the delta method, the variance of log T u can be approximated by where is the gradient of log(T u ) with respect to the unknown parameters α and β. Thus, the variance of log T u can be obtained as The calculated values of the determinant and trace of the variance-covariance matrix of the MLEs when α = 0.5, β = 1, n = (40, 80) and m = (20, 30, 40, 60) are presented in Table 20. Criterion 3 is computed for three different quantiles, namely u = 0.25, 0.5, and 0.75. The calculated values of the three criteria are reported in Table 20.
Using the application of the "Real data application" section, we considered a PCS-II sample of size m from the Kum. distribution. Using the variance-covariance matrix of the MLEs, we can easily compute the trace and determinant of the variance-covariance matrix; the values for all choices of n, m, and schemes (S1, S2, . . . , Sm) are presented in Table 21. Criterion 3 is computed for three different quantiles, namely u = 0.25, 0.5, and 0.75. The optimal censoring scheme is the one that yields the smallest determinant or trace of the variance-covariance matrix of the MLEs. From Table 21, the optimum scheme under Criteria 1 and 2 is therefore (n, m, (S1, S2, . . . , Sm)) = (20, 10, (0*9, 10)); under Criterion 3 with u = 0.25, 0.5 the optimum scheme is (n, m, (S1, S2, . . . , Sm)) = (20, 10, (0*4, 5, 5, 0*4)), and when u is increased to 0.75 the optimum scheme becomes (20, 10, (0*9, 10)).
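The three criteria can be evaluated programmatically. The sketch below (our own function) takes a candidate scheme's estimated variance-covariance matrix and returns the determinant (Criterion 1), the trace (Criterion 2), and the delta-method variance of log T u (Criterion 3), using the Kum. quantile T u = (1 − (1 − u)^{1/β})^{1/α}.

```python
import numpy as np

def crit_values(cov, alpha, beta, u=0.5):
    """Optimality criteria for a candidate censoring scheme:
    determinant and trace of the MLE variance-covariance matrix, and
    the delta-method variance of log T_u for the Kum. u-quantile."""
    w = (1.0 - u) ** (1.0 / beta)              # (1 - u)^{1/beta}
    # gradient of log T_u = (1/alpha) log(1 - w) w.r.t. (alpha, beta)
    g = np.array([
        -np.log(1.0 - w) / alpha ** 2,
        w * np.log(1.0 - u) / (alpha * beta ** 2 * (1.0 - w)),
    ])
    c1 = np.linalg.det(cov)                    # Criterion 1
    c2 = np.trace(cov)                         # Criterion 2
    c3 = g @ cov @ g                           # Criterion 3: Var(log T_u)
    return c1, c2, c3
```

The scheme minimizing the chosen criterion is then selected, exactly as in the comparisons of Tables 20 and 21.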

Conclusion
This paper deals with the problem of estimating the unknown parameters of a Kum. distribution under PCS-II from both BE and non-BE perspectives. We obtained MLE, MPS, and asymptotic confidence interval estimates for the unknown parameters. We also computed BE via Lindley's approximation and MCMC under both symmetric and asymmetric loss functions, along with the corresponding HPD interval estimates. We discussed how to choose hyper-parameter values based on past samples and compared the methods using MSE, AIL, and CP. Our results indicate that the Bayesian estimates are superior to the non-Bayesian estimates. We identified the optimal censoring scheme for life-testing experiments based on three criteria, which is important information for reliability practitioners. Future work can extend this study to neutrosophic statistics for the Kum. distribution, or to modeling COVID-19 data under different progressive censoring schemes.

Data availability
The data is available in this article.