The development of an extended Weibull model with applications to medicine, industry and actuarial sciences

This paper delves into the theoretical and practical exploration of the complementary Bell Weibull (CBellW) model, which serves as an analogous counterpart to the complementary Poisson Weibull model. The study encompasses a comprehensive examination of various statistical properties of the CBellW model. Real data applications are carried out in three different fields, namely the medical, industrial and actuarial fields, to show the practical versatility of the CBellW model. For the medical data segment, the study utilizes four data sets, including information on daily confirmed COVID-19 cases and cancer data. Additionally, a Group Acceptance Sampling Plan (GASP) is designed by using the median as quality parameter. Furthermore, some actuarial risk measures for the CBellW model are obtained along with a numerical illustration of the Value at Risk and the Expected Shortfall. The research is substantiated by a comprehensive numerical analysis, model comparisons, and graphical illustrations that complement the theoretical foundation.

• It is more flexible than the well-known complementary Poisson Weibull model.
• It is tractable, has three parameters, and has a comparatively simple probability density function (pdf) and cumulative distribution function (cdf).
• It has a very good fit for heavy-tailed and skewed data.
• It works well when the Weibull, exponential or Burr distribution is used as baseline model, and the failure rate function can have various shapes, including unimodal, upside-down bathtub, increasing or decreasing shapes.
Note that the Weibull distribution is also an important lifetime distribution and there are several recent modifications [9][10][11]. The paper is organized as follows. "The CBellW distribution and its properties" section presents the CBellW model and its key distributional properties. "Simulation study" section discusses the results of the simulation study, while "Real-life applications" section focuses on various applications of the CBellW model using six real data sets. Finally, the paper concludes in "Concluding remarks" section.

General distributional properties
Practitioners can employ the CBellW distribution to analyze various types of data because of the versatility of its failure rate function. Recall the complementary Poisson-G cdf in Eq. (1) and the Bell probability mass function in Eq. (2),

$$F(x) = \frac{e^{\lambda G(x)} - 1}{e^{\lambda} - 1}, \tag{1}$$

$$P(X = x) = \frac{\lambda^{x} e^{-e^{\lambda} + 1} B_x}{x!}, \quad x = 0, 1, 2, \ldots, \tag{2}$$

where $B_x$ denotes the Bell numbers. Consider the baseline cdf and pdf of the Weibull distribution, $G(x) = 1 - \exp[-(x/\alpha)^{\beta}]$ and $g(x) = \frac{\beta}{\alpha^{\beta}} x^{\beta - 1} \exp[-(x/\alpha)^{\beta}]$, for $x > 0$, $\alpha > 0$ and $\beta > 0$, respectively. Then, the cdf of the CBellW distribution is as follows:

$$F(x; \lambda, \alpha, \beta) = \frac{\exp\{e^{\lambda[1 - \exp(-(x/\alpha)^{\beta})]} - 1\} - 1}{\exp(e^{\lambda} - 1) - 1}, \tag{4}$$

where $x > 0$, $\lambda > 0$, $\alpha > 0$ and $\beta > 0$. The pdf corresponding to Eq. (4) is as follows:

$$f(x; \lambda, \alpha, \beta) = \frac{\lambda\, g(x) \exp\{\lambda G(x) + e^{\lambda G(x)} - 1\}}{\exp(e^{\lambda} - 1) - 1}, \tag{5}$$

with $G(x)$ and $g(x)$ the Weibull baseline cdf and pdf given above. The survival function related to the CBellW distribution is as follows:

$$S(x; \lambda, \alpha, \beta) = \frac{\exp(e^{\lambda} - 1) - \exp\{e^{\lambda G(x)} - 1\}}{\exp(e^{\lambda} - 1) - 1}. \tag{6}$$

The hazard rate function (hrf) is the ratio $f(x)/[1 - F(x)]$ and can be obtained using Eqs. (5) and (6). The quantile function (qf) of the CBellW distribution is as follows:

$$Q(u) = \alpha \left\{ -\log\left[ 1 - \frac{1}{\lambda} \log\bigl( 1 + \log\{ 1 + u[\exp(e^{\lambda} - 1) - 1] \} \bigr) \right] \right\}^{1/\beta}, \tag{7}$$

where $u \in [0, 1]$. As the qf has a closed form, it can be used to obtain L-moments, and it is suitable for designing a GASP as well as for computing various actuarial risk measures.

Figure 1 demonstrates that the pdf of the CBellW distribution can be symmetric, reversed-J shaped, or right-skewed. In general, for many engineering systems the hrf initially drops, then stays reasonably static, and finally grows. The terms "burn-in," "random," and "wear-out failure zones" refer to these three phases in reliability theory. The hrf plots show adaptable shapes, such as increasing, decreasing, and increasing-decreasing shapes, which characterize the lifetime distribution. The model can represent the second phase of the bathtub-shaped failure rate because it has a long constant failure rate period, as shown in Fig. 1, whereas Fig. 2 shows the mean, variance, skewness and kurtosis of the CBellW model. As $\lambda$ increases, the mean and variance tend to increase. On the other hand, skewness and kurtosis decrease when $\lambda$ increases. The scale parameter $\alpha$ is set to 1.
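As a minimal sketch (not the authors' code), the cdf in Eq. (4) and the functions that follow from it can be evaluated numerically as shown below. The function names and the argument name `lam` for $\lambda$ are illustrative assumptions; the pdf and quantile function are obtained here by differentiating and inverting Eq. (4).

```python
import numpy as np


def weibull_cdf(x, alpha, beta):
    """Baseline Weibull cdf G(x) = 1 - exp(-(x/alpha)^beta)."""
    return -np.expm1(-(x / alpha) ** beta)


def weibull_pdf(x, alpha, beta):
    """Baseline Weibull pdf g(x)."""
    return beta / alpha**beta * x ** (beta - 1) * np.exp(-(x / alpha) ** beta)


def cbellw_cdf(x, lam, alpha, beta):
    """CBellW cdf: [exp(e^(lam*G) - 1) - 1] / [exp(e^lam - 1) - 1]."""
    G = weibull_cdf(x, alpha, beta)
    return np.expm1(np.expm1(lam * G)) / np.expm1(np.expm1(lam))


def cbellw_pdf(x, lam, alpha, beta):
    """Derivative of the cdf with respect to x."""
    G = weibull_cdf(x, alpha, beta)
    g = weibull_pdf(x, alpha, beta)
    return lam * g * np.exp(lam * G + np.expm1(lam * G)) / np.expm1(np.expm1(lam))


def cbellw_sf(x, lam, alpha, beta):
    """Survival function 1 - F(x)."""
    return 1.0 - cbellw_cdf(x, lam, alpha, beta)


def cbellw_hrf(x, lam, alpha, beta):
    """Hazard rate f(x) / [1 - F(x)]."""
    return cbellw_pdf(x, lam, alpha, beta) / cbellw_sf(x, lam, alpha, beta)


def cbellw_quantile(u, lam, alpha, beta):
    """Closed-form inverse of the cdf for u in [0, 1)."""
    C = np.expm1(np.expm1(lam))                # exp(e^lam - 1) - 1
    G = np.log1p(np.log1p(u * C)) / lam        # baseline cdf value at the quantile
    return alpha * (-np.log1p(-G)) ** (1.0 / beta)


if __name__ == "__main__":
    u = np.array([0.1, 0.5, 0.9])
    x = cbellw_quantile(u, lam=0.7, alpha=2.0, beta=6.0)
    print("quantiles:", x)
    print("cdf at quantiles:", cbellw_cdf(x, 0.7, 2.0, 6.0))
```

Using `expm1` and `log1p` keeps the nested exponentials accurate for small $\lambda$; for very large $\lambda$ the double exponential overflows and a log-scale formulation would be needed.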
Proposition 1 The pdf of the CBellW distribution can be expressed as the series given in Eq. (8), with coefficients $t_n = \sum_{v=0}^{\infty} \zeta_v (v + 1) (-1)^{n} \binom{v}{n}$, where $\zeta_v$ is defined in Eq. (43) (see Appendix).
Proof Starting from Eq. (41) and applying the binomial expansion to its last term, the resulting expression reduces to the stated form.
The general result of Proposition 1 shows that the CBellW pdf is a linear combination of Weibull densities. Therefore, several mathematical properties of the CBellW distribution can be derived from those of the Weibull distribution. Some of them are presented below.
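The step from a power series in $G(x)$ to a linear combination of Weibull densities can be sketched as follows. This is a hedged reconstruction, not the paper's own derivation: it assumes that Eq. (41) writes the pdf as $\sum_v \zeta_v (v+1) g(x) G(x)^v$, which is consistent with the stated form of $t_n$ but is not shown in this section. The notation $w(x; s, \beta)$ for a Weibull pdf with scale $s$ and shape $\beta$ is introduced here only for illustration.

```latex
\begin{align*}
f(x) &= \sum_{v=0}^{\infty} \zeta_v (v+1)\, g(x)\, G(x)^{v}
      && \text{(assumed form of Eq. (41))} \\
G(x)^{v} &= \bigl[1 - e^{-(x/\alpha)^{\beta}}\bigr]^{v}
      = \sum_{n=0}^{v} (-1)^{n} \binom{v}{n} e^{-n (x/\alpha)^{\beta}}
      && \text{(binomial expansion)} \\
f(x) &= \sum_{n=0}^{\infty} t_n\, g(x)\, e^{-n (x/\alpha)^{\beta}},
      \qquad t_n = \sum_{v=0}^{\infty} \zeta_v (v+1) (-1)^{n} \binom{v}{n},
      && \tbinom{v}{n} = 0 \text{ for } n > v \\
g(x)\, e^{-n (x/\alpha)^{\beta}}
     &= \frac{\beta}{\alpha^{\beta}}\, x^{\beta-1} e^{-(n+1)(x/\alpha)^{\beta}}
      = \frac{1}{n+1}\, w\bigl(x;\ \alpha (n+1)^{-1/\beta},\ \beta\bigr)
\end{align*}
```

Under this assumption each term is a Weibull density with unchanged shape $\beta$ and rescaled scale $\alpha (n+1)^{-1/\beta}$, which is why Weibull results carry over term by term.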

Ordinary and incomplete moments
The mean and variance of the CBellW distribution can be obtained by using Eq. (10), where mean $= \mu'_1$ and variance $= \mu'_2 - (\mu'_1)^2$. Moreover, the first four central moments can be obtained using the well-established relationship between ordinary and central moments. The moment-based measures of skewness and kurtosis are obtained from $\beta_1 = \mu_3^2/\mu_2^3$ and $\beta_2 = \mu_4/\mu_2^2$ (not to be confused with the shape parameter $\beta$), where Pearson's coefficients of skewness and kurtosis are given by $\sqrt{\beta_1}$ and $\beta_2 - 3$, respectively. The rth raw or ordinary moment of the CBellW distribution is expressed through the coefficients $t_n$ of Proposition 1. On the other hand, there are many important and useful applications for incomplete moments. For instance, they are essential when calculating the average waiting time, deviation, conditional moments, measures of income disparity, etc. The rth incomplete moment is obtained by using Eq. (41); its representation again involves $t_n$ as given in Proposition 1 and the incomplete Gamma function $\Gamma(a, b)$.
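The moment-based summaries can also be checked numerically. The sketch below is an illustration under stated assumptions rather than the series representation used in the paper: it integrates $x^r f(x)$ by quadrature after the substitution $z = (x/\alpha)^{\beta}$, and the function names and the argument name `lam` for $\lambda$ are hypothetical.

```python
from math import gamma

import numpy as np
from scipy.integrate import quad


def raw_moment(r, lam, alpha, beta):
    """E[X^r] by quadrature, using z = (x/alpha)^beta so that x = alpha * z^(1/beta)."""
    C = np.expm1(np.expm1(lam))  # exp(e^lam - 1) - 1

    def integrand(z):
        G = -np.expm1(-z)  # baseline cdf expressed in z
        density_z = lam * np.exp(-z + lam * G + np.expm1(lam * G)) / C
        return z ** (r / beta) * density_z

    value, _ = quad(integrand, 0.0, np.inf)
    return alpha**r * value


def summary(lam, alpha, beta):
    """Mean, variance, skewness sqrt(beta_1) and kurtosis beta_2 from raw moments."""
    m1, m2, m3, m4 = (raw_moment(r, lam, alpha, beta) for r in (1, 2, 3, 4))
    mu2 = m2 - m1**2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1**3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1**2 * m2 - 3 * m1**4
    return {
        "mean": m1,
        "variance": mu2,
        "skewness": mu3 / mu2**1.5,
        "kurtosis": mu4 / mu2**2,  # subtract 3 for the excess kurtosis
    }


if __name__ == "__main__":
    print(summary(lam=0.7, alpha=1.0, beta=2.0))
    # Reference value: the Weibull mean, which the CBellW mean approaches as lam -> 0.
    print("Weibull mean:", 1.0 * gamma(1 + 1 / 2.0))
```

The substitution removes the dependence of the integrand's scale on $\alpha$ and $\beta$, which keeps the quadrature well behaved for both small and large shape values.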

Moment generating function
In probability theory and statistics, several measures are used to characterize a distribution of interest, namely the moment generating function (mgf), the characteristic function, the rth moments, the qf, etc. Let X be a random variable with pdf f(x) given in Eq. (8). The mgf is defined by $E(e^{tX}) = \int_0^{\infty} e^{tx} f(x)\,dx$. Here, we use the Wright generalized hypergeometric function to derive the mgf. For computational ease, we abbreviate the constant $(n+1)^{1/\beta}/\alpha$ and expand the series $e^{tx} = \sum_{m=0}^{\infty} \frac{t^{m}}{m!} x^{m}$, which leads to the expression for the mgf.

Reliability
Reliability analysis has been applied in numerous fields; in particular, it allows the failure probability at a specific time point to be calculated. Let $X_1$ and $X_2$ be two random variables that follow the CBellW distribution, where $X_1$ denotes the strength of a component and $X_2$ the stress applied to it. If the applied stress is more than the component's strength, it will fail; but, if $X_1 > X_2$, it will operate satisfactorily. Here, we derive the reliability $R = P(X_1 > X_2)$ of the CBellW model when $X_1$ and $X_2$ are independent with pdf $f(x; \lambda_1, \alpha, \beta)$ and cdf $F(x; \lambda_2, \alpha, \beta)$, respectively, that is, with identical scale ($\alpha$) and shape ($\beta$) parameters, so that $R = \int_0^{\infty} f(x; \lambda_1, \alpha, \beta)\, F(x; \lambda_2, \alpha, \beta)\,dx$.
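The stress-strength reliability $P(X_1 > X_2)$ can be evaluated numerically as a cross-check. The sketch below is an illustration, not the paper's closed-form result: it uses the substitution $y = G(x)$, under which the common Weibull baseline drops out and only the two $\lambda$ values (named `lam1` and `lam2` here, a hypothetical naming) remain.

```python
import numpy as np
from scipy.integrate import quad


def stress_strength(lam1, lam2):
    """R = P(X1 > X2) for independent CBellW variables sharing alpha and beta."""
    C1 = np.expm1(np.expm1(lam1))  # normalizing constant for the strength X1
    C2 = np.expm1(np.expm1(lam2))  # normalizing constant for the stress X2

    def integrand(y):
        # y = G(x) is the common Weibull baseline cdf, so g(x) dx = dy.
        density_strength = lam1 * np.exp(lam1 * y + np.expm1(lam1 * y)) / C1
        cdf_stress = np.expm1(np.expm1(lam2 * y)) / C2
        return density_strength * cdf_stress

    value, _ = quad(integrand, 0.0, 1.0)
    return value


if __name__ == "__main__":
    print("R(lam1=2.0, lam2=0.5):", stress_strength(2.0, 0.5))
    print("R(lam1=1.0, lam2=1.0):", stress_strength(1.0, 1.0))
```

Because the integrand depends on $x$ only through $G(x)$, the value of $R$ does not depend on the shared $\alpha$ and $\beta$, and equal $\lambda$ values give $R = 1/2$.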

Residual and reversed residual life
The nth moment of the residual life of X is obtained by using Eq. (8) and is given in Eq. (15), where $t_p = t_n$ and $t_p^{*} = t_p \sum_{r=0}^{n} \binom{n}{r} (-t)^{n-r}$. Here, the mean residual life of X can be obtained by setting n = 1 in Eq. (15), where the function $\gamma(a, b)$ represents the upper incomplete Gamma function.
The nth moment of the reversed residual life is given in Eq. (17), where $t_p^{**} = t_p \sum_{r=0}^{n} \binom{n}{r} (-1)^{r} t^{n-r}$. Then, the mean reversed residual life or mean inactivity time of X can be obtained by setting n = 1 in Eq. (17).
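For n = 1 the two quantities above can be computed directly from the cdf. The sketch below is an illustration by numerical integration, not the series form of the paper; it uses the standard identities $E(X - t \mid X > t) = \int_t^{\infty} S(x)\,dx / S(t)$ and $E(t - X \mid X \le t) = \int_0^{t} F(x)\,dx / F(t)$, and the function names are hypothetical.

```python
import numpy as np
from scipy.integrate import quad


def cbellw_cdf(x, lam, alpha, beta):
    """CBellW cdf with Weibull baseline (lam plays the role of lambda)."""
    G = -np.expm1(-(np.asarray(x, dtype=float) / alpha) ** beta)
    return np.expm1(np.expm1(lam * G)) / np.expm1(np.expm1(lam))


def mean_residual_life(t, lam, alpha, beta):
    """E[X - t | X > t], the integral of the survival function beyond t over S(t)."""
    def sf(x):
        return float(1.0 - cbellw_cdf(x, lam, alpha, beta))

    area, _ = quad(sf, t, np.inf)
    return area / sf(t)


def mean_inactivity_time(t, lam, alpha, beta):
    """E[t - X | X <= t], the integral of the cdf up to t over F(t)."""
    def cdf(x):
        return float(cbellw_cdf(x, lam, alpha, beta))

    area, _ = quad(cdf, 0.0, t)
    return area / cdf(t)


if __name__ == "__main__":
    for t in (0.5, 1.0, 1.5):
        print(t, mean_residual_life(t, 1.0, 1.0, 1.5), mean_inactivity_time(t, 1.0, 1.0, 1.5))
```

At t = 0 the mean residual life equals the mean of X, which gives a simple consistency check against the moment calculations.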

Entropy measures
Entropy measures are important for quantifying the variation of uncertainty of a random variable. Here, we present important entropy measures including the Rényi entropy, the Havrda and Charvat (HC) entropy, the Arimoto entropy and the Tsallis entropy based on the CBellW model. Moreover, we evaluate their numerical values, which show flexibility under the CBellW model. For more details, the readers are referred to [12]. In the following, let $X \sim \text{CBellW}(\alpha, \beta, \lambda)$.
The Rényi entropy, the HC entropy, the Arimoto entropy and the Tsallis entropy are considered in turn. See Table 1 for exemplary numerical computations of these entropy measures.
Table 1. Numerical computation of entropy measures.
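All four entropy measures depend on the pdf only through the integral $\int_0^{\infty} f(x)^{\delta}\,dx$. The sketch below is a numerical illustration under assumptions: the order parameter is called `delta` here, the standard textbook definitions of the four entropies are used (the paper's own notation is not shown in this section), and the values are not claimed to reproduce Table 1.

```python
import numpy as np
from scipy.integrate import quad


def cbellw_pdf(x, lam, alpha, beta):
    """CBellW pdf with Weibull baseline (lam plays the role of lambda)."""
    z = (x / alpha) ** beta
    G = -np.expm1(-z)
    g = beta / alpha * (x / alpha) ** (beta - 1) * np.exp(-z)
    return lam * g * np.exp(lam * G + np.expm1(lam * G)) / np.expm1(np.expm1(lam))


def power_integral(delta, lam, alpha, beta):
    """Integral of f(x)^delta over (0, inf); delta > 0 and delta != 1."""
    value, _ = quad(lambda x: float(cbellw_pdf(x, lam, alpha, beta)) ** delta, 0.0, np.inf)
    return value


def entropies(delta, lam, alpha, beta):
    """Renyi, Havrda-Charvat, Arimoto and Tsallis entropies of order delta."""
    I = power_integral(delta, lam, alpha, beta)
    return {
        "renyi": np.log(I) / (1 - delta),
        "havrda_charvat": (I - 1) / (2 ** (1 - delta) - 1),
        "arimoto": delta / (1 - delta) * (I ** (1 / delta) - 1),
        "tsallis": (1 - I) / (delta - 1),
    }


if __name__ == "__main__":
    for delta in (0.5, 2.0):
        print(delta, entropies(delta, lam=1.0, alpha=1.0, beta=2.0))
```

The quadrature assumes that $f^{\delta}$ is integrable; for $\beta < 1$ and large `delta` the pdf's singularity at zero can make the integral diverge.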

Parameter estimation
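The log-likelihood function $L$ related to the parameter vector $\theta = (\lambda, \alpha, \beta)^{\top}$ in Eq. (5) is, for a sample $x_1, \ldots, x_n$ and with $G(x_i) = 1 - \exp[-(x_i/\alpha)^{\beta}]$, given by

$$L(\theta) = n \log\lambda + n \log\beta - n\beta \log\alpha + (\beta - 1) \sum_{i=1}^{n} \log x_i - \sum_{i=1}^{n} (x_i/\alpha)^{\beta} + \lambda \sum_{i=1}^{n} G(x_i) + \sum_{i=1}^{n} \bigl[e^{\lambda G(x_i)} - 1\bigr] - n \log\bigl[\exp(e^{\lambda} - 1) - 1\bigr].$$

The components of the score vector $U(\theta) = (\partial L/\partial\lambda, \partial L/\partial\alpha, \partial L/\partial\beta)^{\top}$ are set equal to zero. By solving this system of non-linear equations, one can obtain the maximum likelihood estimates of the respective parameters. These equations have no closed-form solution and are solved numerically with iterative optimization algorithms.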
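In practice the score equations are rarely solved directly; the log-likelihood is maximized numerically instead. The sketch below is one possible way to do this, not the authors' procedure: it assumes the parameter order (λ, α, β), optimizes on the log scale to keep the parameters positive, and uses the Nelder-Mead method from SciPy. The function names and the starting values are hypothetical choices.

```python
import numpy as np
from scipy.optimize import minimize


def cbellw_loglik(theta, x):
    """Log-likelihood for theta = (lam, alpha, beta) and positive data x."""
    lam, alpha, beta = theta
    z = (x / alpha) ** beta
    G = -np.expm1(-z)                      # Weibull cdf at each observation
    a = np.expm1(lam)                      # e^lam - 1
    log_norm = a + np.log(-np.expm1(-a))   # log(exp(e^lam - 1) - 1), overflow-safe
    n = x.size
    return (
        n * (np.log(lam) + np.log(beta) - beta * np.log(alpha))
        + (beta - 1) * np.sum(np.log(x))
        - np.sum(z)
        + lam * np.sum(G)
        + np.sum(np.expm1(lam * G))
        - n * log_norm
    )


def neg_loglik(log_theta, x):
    """Objective on the log scale; non-finite values are mapped to a large penalty."""
    with np.errstate(all="ignore"):
        value = -cbellw_loglik(np.exp(log_theta), x)
    return value if np.isfinite(value) else 1e300


def fit_cbellw(x, start):
    """Return the estimated (lam, alpha, beta) and the maximized log-likelihood."""
    res = minimize(neg_loglik, np.log(start), args=(x,), method="Nelder-Mead",
                   options={"maxiter": 5000, "xatol": 1e-8, "fatol": 1e-8})
    return np.exp(res.x), -res.fun


def cbellw_rvs(n, lam, alpha, beta, rng):
    """Inverse-transform sampling with the closed-form quantile function."""
    u = rng.uniform(size=n)
    C = np.expm1(np.expm1(lam))
    G = np.log1p(np.log1p(u * C)) / lam
    return alpha * (-np.log1p(-G)) ** (1.0 / beta)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = cbellw_rvs(500, 1.2, 2.0, 5.0, rng)
    theta_hat, loglik = fit_cbellw(x, start=(1.0, float(np.mean(x)), 1.0))
    print("estimates (lam, alpha, beta):", theta_hat, "log-likelihood:", loglik)
```

Standard errors would normally follow from the inverse of the observed information matrix at the optimum; that step is omitted here to keep the sketch short.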

Simulation study
In this section, we conduct a simulation study to analyze the performance of the parameter estimates of the proposed CBellW model for various sample sizes n = 20, 25, 30, ..., 250. We simulated N = 1000 samples that are replicated 5000 times. We consider the scale parameter $\alpha = 2$ for two different sets and vary the shape parameters $\beta$ and $\lambda$ in various combinations. In particular, we consider set I = [$\beta$ = 6.0, $\lambda$ = 0.70] and set II = [$\beta$ = 5.0, $\lambda$ = 1.20].
According to the results of the simulation study in Tables 2 and 3, the bias and the mean squared error (MSE) of the parameter estimates decrease as the sample size increases. Therefore, the CBellW model parameters may be estimated and their confidence intervals can be constructed using the maximum likelihood estimators (MLEs) and their asymptotic results. Graphical illustrations of the MSEs and biases for set I and set II are presented in Figs. 3 and 4, respectively. Eqs. (23) and (24) are used to evaluate the MSE and bias of the estimates, respectively.
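A small-scale version of such a Monte Carlo experiment can be sketched as follows. This is an illustration under assumptions, not the authors' simulation code: it uses far fewer replications than the study, starts the optimizer at the true parameter values, and takes bias and MSE to be the average error and average squared error over the replications. All function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def cbellw_rvs(n, lam, alpha, beta, rng):
    """Inverse-transform sampling with the closed-form quantile function."""
    u = rng.uniform(size=n)
    C = np.expm1(np.expm1(lam))
    G = np.log1p(np.log1p(u * C)) / lam
    return alpha * (-np.log1p(-G)) ** (1.0 / beta)


def neg_loglik(log_theta, x):
    """Negative CBellW log-likelihood with parameters (lam, alpha, beta) on the log scale."""
    with np.errstate(all="ignore"):
        lam, alpha, beta = np.exp(log_theta)
        z = (x / alpha) ** beta
        G = -np.expm1(-z)
        a = np.expm1(lam)
        ll = (
            x.size * (np.log(lam) + np.log(beta) - beta * np.log(alpha)
                      - a - np.log(-np.expm1(-a)))
            + (beta - 1) * np.sum(np.log(x)) - np.sum(z)
            + lam * np.sum(G) + np.sum(np.expm1(lam * G))
        )
    return -ll if np.isfinite(ll) else 1e300


def mle(x, start):
    """Maximum likelihood estimate of (lam, alpha, beta) by Nelder-Mead."""
    res = minimize(neg_loglik, np.log(start), args=(x,), method="Nelder-Mead",
                   options={"maxiter": 2000})
    return np.exp(res.x)


def bias_and_mse(true_theta, n, reps, seed=0):
    """Average error and average squared error of the MLE over `reps` samples of size n."""
    rng = np.random.default_rng(seed)
    true_theta = np.asarray(true_theta, dtype=float)
    estimates = np.array(
        [mle(cbellw_rvs(n, *true_theta, rng), true_theta) for _ in range(reps)]
    )
    err = estimates - true_theta
    return err.mean(axis=0), (err**2).mean(axis=0)


if __name__ == "__main__":
    # Set I of the study, ordered as (lam, alpha, beta).
    for n in (50, 200):
        bias, mse = bias_and_mse((0.70, 2.0, 6.0), n=n, reps=30)
        print(n, "bias:", bias, "MSE:", mse)
```

With only a few dozen replications the output is noisy, so it should be read as a demonstration of the procedure rather than a reproduction of Tables 2 and 3.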

Real-life applications
This section aims to practically implement the CBellW model on real data sets to demonstrate the benefits of the proposed model. In "Modeling of COVID-19 and cancer data" section, we apply the CBellW model to four medical data sets, and in the following "Designing a GASP with application to Guinea pigs data" and "Actuarial measures with applications to auto-mobile collision claims data" sections, we design a GASP (with application to Guinea pigs data) and compute risk measures by using actuarial data, respectively. We also compare several Weibull-based models such as the complementary Poisson Weibull (CPW) [3], alpha power Weibull (APW) [13], transmuted Weibull (TW) [14], beta Weibull (BW) [15], Marshall Olkin Weibull (MOW) [16], Weibull claim (W-claim) [17], gamma Weibull (GW) [18], Gull alpha power Weibull (GAPW) [19], and exponentiated exponential (EE) with the proposed CBellW model.
The first data set was recently used by [20] and comprises daily confirmed COVID-19 death cases. The data set consists of 89 observations with an average of 18.72 daily reported deaths. The second data set represents the survival time of head and neck cancer disease patients treated by using radiotherapy (RT). The data set consists of 58 observations with a mean survival time of 226.17. This data set is also used by [21]. The third data set contains the remission times (in months) of 128 blood cancer patients, with a mean remission time of 9.37 months, and was recently examined by many authors including Hamdeni et al. [22] and [23]. The fourth data set was recently used by [23] and represents the survival times in days of 73 patients diagnosed with acute bone cancer with mean survival time of 3.76. The fifth data set [24] represents the survival data of Guinea pigs infected with virulent tubercle bacilli. Guinea pigs are regarded to have a high susceptibility to human tuberculosis, which is one of the motives to select guinea pigs for this study. The sixth data set is extracted from the Insurance Data R package [25] and represents UK automobile collision claims. The data set consists of 32 observations (in pounds) related to the severity of claims. The observations are divided by 100 for computational purposes (but this does not affect statistical inference). The descriptive statistics for all the data sets are shown in Table 4, whereas the data sets 1-5 are given in Table 5 (for data set 6 see [25]).

Modeling of COVID-19 and cancer data
From a medical perspective, policy makers are always interested in accurate estimates to enable better planning for disease management and control. There are several flexible models that are commonly used for this purpose, e.g., Klakattawi et al. [23] used an extended Weibull model for the survival analysis of cancer patients. Badr et al. [26] employed an extended Weibull distribution on survival data. Zichuan et al. [27] analysed bladder cancer data also by using an extended Weibull distribution, and Wang et al. [28] introduced an exponent power Weibull model to analyze medical data.
In the following, we focus on data sets 1-4 (COVID-19 and cancer data). Table 6 displays the MLEs and standard errors (SEs) of the estimates for the fitted models. AIC, CAIC, BIC, and HQIC are shown in Table 7 along with other important metrics like p-values and the results of the Anderson-Darling (A), Cramér-von Mises (W), and Kolmogorov-Smirnov (K-S) tests. See also some related visualizations in Figs. 5, 6, 7, 8, 9 and 10. Among the compared models, the one with the highest p-values and the lowest values of the information criteria is deemed to be the best. By this standard, the proposed three-parameter CBellW model outperforms the other well-known models.

Designing a GASP with application to Guinea pigs data
Product quality is one of the most important characteristics that distinguish different goods in a global market. Before approving or rejecting a lot, particular quality control procedures are carried out in accordance with different sampling schemes. In acceptance sampling, a lot of items is accepted or rejected depending on the quality of the items assessed in a sample taken from the lot [29]. The GASP inspects multiple items at once depending on the number of testers available to the experimenter for testing, whereas the ordinary acceptance sampling plan (OASP) only inspects one item at a time. This section provides an example of a GASP having cdf as in Eq. (26) with known parameters $\beta$ and $\lambda$, under the assumption that an item's lifetime distribution follows the CBellW model. A sample of size n should be collected for a GASP, distributed, and retained for life testing for a predetermined period of time, where n = rg with r items for each group. If any group experiences more failures than the acceptance number c, the experiment is declared a failure. GASPs have been described by many authors, see, e.g., [30][31][32][33][34]. When designing the GASP, the quality parameter is taken into consideration as either the mean or the median; however, for skewed distributions, the median is typically preferred [30]. The GASP is based on the following steps:
• Identify the number of groups g.
• Assign r items to each group for the life test after selecting gr items at random from a lot; in the life test, n = gr is the necessary sample size.
• Set the life test's termination time $t_0$ and the acceptance number c for each group.
• A decision is finally made to either accept or reject the lot. A lot can be accepted when there is a maximum of c nonconforming units in each group, and it is to be rejected when any group has more than c nonconforming units.
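The steps above, together with a median-based failure probability, can be sketched in code. This is a hedged illustration rather than the paper's design procedure: it assumes the usual binomial form of the lot acceptance probability, $[\sum_{i=0}^{c} \binom{r}{i} p^{i} (1-p)^{r-i}]^{g}$, evaluates the consumer's risk at a median ratio of 1 and the producer's risk at a ratio `ratio2` greater than 1, and searches for the smallest acceptance number and number of groups. All function and argument names are hypothetical, and the output is not claimed to reproduce Table 8.

```python
from math import comb, expm1, log1p


def median_factor(lam, beta):
    """zeta such that median = alpha * zeta (quantile function at u = 0.5)."""
    C = expm1(expm1(lam))
    G = log1p(log1p(0.5 * C)) / lam
    return (-log1p(-G)) ** (1.0 / beta)


def failure_prob(a1, ratio, lam, beta):
    """P(item fails before t0 = a1 * m0) when the true median is m = ratio * m0."""
    zeta = median_factor(lam, beta)
    G = -expm1(-((a1 * zeta / ratio) ** beta))
    return expm1(expm1(lam * G)) / expm1(expm1(lam))


def accept_prob(p, g, r, c):
    """Probability that every one of g groups of r items has at most c failures."""
    per_group = sum(comb(r, i) * p**i * (1 - p) ** (r - i) for i in range(c + 1))
    return per_group**g


def design(a1, r, ratio2, lam, beta, cons_risk, prod_risk, g_max=1000):
    """Smallest (g, c) meeting both risk constraints, or None if none is found."""
    p1 = failure_prob(a1, 1.0, lam, beta)     # median equal to the specified one
    p2 = failure_prob(a1, ratio2, lam, beta)  # median ratio2 times the specified one
    for c in range(r):                        # c = r would accept every lot
        for g in range(1, g_max + 1):
            if accept_prob(p1, g, r, c) <= cons_risk:
                if accept_prob(p2, g, r, c) >= 1 - prod_risk:
                    return g, c
                break  # a larger g only lowers the acceptance probability at p2
    return None


if __name__ == "__main__":
    plan = design(a1=1.0, r=5, ratio2=4.0, lam=2.0201, beta=0.7330,
                  cons_risk=0.25, prod_risk=0.05)
    print("(g, c) =", plan)
```

For a fixed c the acceptance probability decreases in g, so the inner loop can stop at the first g that satisfies the consumer's constraint.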
The probability of accepting a lot depends on p, the probability that an item in a group fails before $t_0$, which is obtained by inserting Eq. (7) in Eq. (4). By replacing $\alpha = m/\zeta$ and $t = a_1 m_0$ in Eq. (4), we obtain the probability of a failure in Eq. (27). Given $a_1$ and $r_2$, where $r_2 = m/m_0$, p may be calculated for a chosen $\beta$ and $\lambda$ from Eq. (27). Both failure probabilities, which correspond to the consumer's and producer's risk, are denoted by $p_1$ and $p_2$, respectively. We have to determine the values of the design parameters (c, g) that concurrently satisfy Eqs. (28) and (29) for given values of $\theta$, $\lambda$, $r_2$, $a_1$, $\beta$, and $\gamma$, where $r_1$ and $r_2$ represent the mean ratio at producer's risk and consumer's risk, respectively, and the failure probabilities to be used in Eqs. (28) and (29) are given in Eqs. (30) and (31) for the CBellW model.
Table 4. Descriptive information on the data sets.
Table 8 shows the design parameters, which are obtained by taking $\beta = 0.7330$ and $\lambda = 2.0201$ and two levels of r (5, 10). The analysis revealed that by reducing $\beta$ (here denoting the consumer's risk, not the shape parameter) the number of groups tends to increase. Moreover, the number of groups rapidly declines when $r_2$ increases. However, after a certain point, the probability of accepting a lot increases with constant values of g and c. Table 8 shows the proposed GASP; for $\beta = 0.25$, $a_1 = 1$, $\lambda = 2.0201$ and r = 10, g decreases and the OC value increases (see also Table 9).
Recently, Sivakumar et al. [24] designed a GASP under the odd generalized exponential log-logistic model by analyzing survival data from guinea pigs that had been exposed to virulent tubercle bacilli. One of the factors that led researchers to choose guinea pigs for this investigation was their reputation for having a high vulnerability to human tuberculosis. Here, we consider only the observations in which all animals in a single cage are under the same regimen. The data was also studied by Bjerkedal [35]. The data set consists of 72 observations of survival time with mean and median values of 1.77 and 1.51 days, respectively. See Fig. 11 for visualizations of the related data set. The K-S test led to a p-value of 0.617 and a maximum difference between real and fitted data of 0.089. In comparison to the odd generalized exponential log logistic model [24], the three-parameter CBellW model fits the data better (K-S test 0.0774 and p-value 0.7809). The estimated parameters (SEs) are $\hat{\alpha} = 0.3418$ (0.1661), $\hat{\beta} = 0.7330$ (0.1351) and $\hat{\lambda} = 2.0201$ (0.2990). Table 8 shows the GASP under the CBellW model with MLE values suggesting minimal g and c for r = 5 and r = 10 and $a_1$ = 0.5 and 1, for lifetime testing. There are 90 groups, or 450 (= 90 · 5) total units, required for testing. The number of groups or units that must be tested ...

Actuarial measures with applications to auto-mobile collision claims data
In the following, we apply VaR and ES, by way of example, to the UK automobile collision claims data set. Various visualizations of the data set can be seen in Fig. 12. Table 10 gives the MLEs and SEs of the estimates for the fitted models. AIC, CAIC, BIC, and HQIC are shown in Table 11 along with other important metrics like p-values and the results of some tests (A, W, K-S).
Table 12 and Fig. 13 provide numerical and graphical representations, respectively, of both VaR and ES. By using the MLEs for the data set, the proposed CBellW model and the Weibull model are compared in terms of their VaR and ES. Note that a distribution is considered to have a heavier tail compared to another distribution when the associated risk measures yield larger values. Table 12 shows that the CBellW model has larger values of both risk measures than its counterpart, the Weibull model. Figure 13 also reveals that the proposed model has a heavier tail than the Weibull model. The readers are referred to Chan et al. [43] for numerical computations of ES and VaR using the R package VaRES.
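Because the quantile function is available in closed form, both risk measures are easy to evaluate. The sketch below is an illustration under assumptions and does not use the VaRES package: VaR at level q is taken to be the q-quantile, and ES is taken to be the average of the quantile function over (0, q), which is one common convention. The parameter values in the example are arbitrary, and the function names are hypothetical.

```python
import numpy as np
from scipy.integrate import quad


def cbellw_quantile(u, lam, alpha, beta):
    """Closed-form CBellW quantile function (lam plays the role of lambda)."""
    C = np.expm1(np.expm1(lam))
    G = np.log1p(np.log1p(u * C)) / lam
    return alpha * (-np.log1p(-G)) ** (1.0 / beta)


def value_at_risk(q, lam, alpha, beta):
    """VaR_q as the q-quantile of the fitted distribution."""
    return float(cbellw_quantile(q, lam, alpha, beta))


def expected_shortfall(q, lam, alpha, beta):
    """ES_q as (1/q) times the integral of VaR_u over u in (0, q) (assumed convention)."""
    area, _ = quad(lambda u: float(cbellw_quantile(u, lam, alpha, beta)), 0.0, q)
    return area / q


if __name__ == "__main__":
    for q in (0.70, 0.80, 0.90, 0.95, 0.99):
        print(q, value_at_risk(q, 1.0, 1.0, 1.5), expected_shortfall(q, 1.0, 1.0, 1.5))
```

Replacing the arbitrary parameter values with the MLEs of a fitted model gives model-based VaR and ES curves of the kind compared in the text.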

Concluding remarks
In this paper, we have studied the CBellW model based on the CBell-G family of distributions. The failure rate function of the CBellW model can take different forms, which makes it a very flexible and relevant model for real-world applications in numerous areas. We derived and discussed the key properties of the CBellW model in detail. The effectiveness of the CBellW model has been evaluated using real data applications (COVID-19, cancer, quality control, and actuarial data), and it has been compared with several established models. The analysis revealed that the proposed CBellW model is superior to its competitors on the data sets considered. The introduced distribution family represents a considerable contribution to the existing body of literature, given that it builds upon the DBellD as its foundation, inheriting the advantageous properties associated with Bell distributions. Hence, the proposed CBellW distribution family presents a promising alternative with the potential to outperform the well-established CP-G family. Since the quantile function of the CBellW model has a closed form, one promising direction for future work is to use it for quantile regression analysis.

Figure 3. Graphical illustration of biases and MSEs for varying sample sizes for set I.

Figure 4. Graphical illustration of biases and MSEs for varying sample sizes for set II.

Figure 13. Graphical illustration of VaR and ES based on the MLEs.

Table 2. Output summary of simulation study regarding set I.
Scientific Reports | (2024) 14:12338 | https://doi.org/10.1038/s41598-024-61308-8
of 58 observations with a mean survival time of 226.17. This data set is also used by [21]. The third data set contains the remission times (in months) of 128 blood cancer patients, with a mean remission time of 9.37 months, and was recently examined by many authors including Hamdeni et al. [22] and [23]. The fourth data set was

Table 3. Output summary of simulation study regarding set II.

Table 5. Real data sets.

Table 6. Fitted models with parameter estimates and standard errors.

Table 7. Detailed summary of model selection criteria.

Table 10. Fitted models with parameter estimates and standard errors.

Table 11. Detailed summary of model selection measures.

Table 12. VaR and ES for the CBellW and W model based on MLEs.