Robust process capability indices Cpm and Cpmk using Weibull process

Process capability indices (PCIs) are widely used to measure manufacturing capability and product quality in many manufacturing processes. They are calculated from a relationship between the process mean and standard deviation, provided that the process follows a normal distribution. For non-normal processes, many researchers have recommended robust PCIs obtained by modifying the classical PCIs. Most of the work on robust PCIs concerns first- and second-generation PCIs, and less has been reported on third-generation PCIs. The objective of this work was to evaluate the efficiency of three dispersion measures, namely median absolute deviation (MAD), interquartile range (IQR), and Gini's mean difference (GMD), as measures of dispersion in third-generation PCIs and to construct their bootstrap confidence intervals (CIs). The efficacy of these measures is compared with quantile-based PCIs under different asymmetric behaviour of the Weibull process. The results showed that quantile-based PCIs are strongly influenced by high asymmetry and that the IQR method provides a poor estimator across all sample sizes. The GMD method, on the other hand, performed well under low, moderate, and high asymmetry of the Weibull process, but its irregular behaviour needs to be addressed carefully. Among the four selected methods, the MAD method performed better under low and moderate asymmetric conditions. For interval estimation, bias-corrected percentile bootstrap (BCPB) CIs were recommended for quantile-based PCIs, while percentile (PB) and percentile-t (PTB) CIs were recommended for MAD-based PCIs under all asymmetric conditions. To validate the simulated findings, two real-world datasets were analyzed, and they supported the simulation results.


Quantile based PCI
Suppose that y is a random variable with probability density $f(x, \theta)$, where $\theta$ is the unknown parameter. Let $[y_1, y_2, \ldots, y_n]$ be an i.i.d. random sample from the process with density $f(x, \theta)$, where $\theta = (\theta_1, \ldots, \theta_k)^{\tau}$ is the column vector of process parameters and $\tau$ denotes the transpose. The likelihood and log-likelihood functions of $\theta$ are given by

$$L(\theta) = \prod_{i=1}^{n} f(y_i, \theta), \qquad l(\theta) = \sum_{i=1}^{n} \ln f(y_i, \theta),$$

respectively. The α-quantile $Q_\alpha$ of the process distribution is defined implicitly by

$$\int_{-\infty}^{Q_\alpha} f(x, \theta)\,dx = \alpha .$$

The quantile-based PCI superstructure $C_{Np}(\eta, \kappa, \theta)$ is then a function of the population parameter $\theta$. Let $\hat\theta = (\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_k)^{\tau}$, which maximizes $L(\theta)$ or $l(\theta)$, be the MLE of $\theta$. The maximum likelihood estimator of the quantile $Q_\alpha$ is $\hat Q_\alpha = Q_\alpha(\hat\theta)$. Therefore, the parametric maximum likelihood estimator of the supersaturated PCI is $C_{Np}(\eta, \kappa, \hat\theta)$. Note that $C_{Np}(\eta, \kappa, \theta)$ is a real-valued function of the quantiles $Q_{p_1}$, $Q_{p_2}$ and $Q_{p_3}$, which are continuous real-valued functions of the parameter $\theta$. Since $\hat\theta$ is a consistent MLE of $\theta$, $C_{Np}(\eta, \kappa, \hat\theta)$ is a consistent MLE of $C_{Np}(\eta, \kappa, \theta)$ under some regularity conditions 40.

Median absolute deviation (MAD) based PCI
Suppose that the sample median (MD) is computed from a random sample $(x_1, x_2, \ldots, x_n)$. Then the MAD about the sample median is defined as 25,26,41

$$\mathrm{MAD} = b \cdot \operatorname{median}_{i}\,\lvert x_i - \mathrm{MD} \rvert . \tag{7}$$

The constant b in (7) makes MAD a consistent estimator. For the normal distribution, MAD is an unbiased estimator of σ if b = 1.4826. For any non-normal distribution, this value changes to $b = 1/Q_{0.75}$, where $Q_{0.75}$ is the 0.75 quantile of the underlying distribution. Under normality, $1/Q_{0.75} = 1.4826$. This yields the unbiased estimator of σ in (8), and using (8) the MAD-based estimators for the supersaturated and third-generation PCIs can be defined.
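The scaled MAD described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `scaled_mad` and the sample values are hypothetical, and the default constant is the normal-theory value b = 1.4826 quoted in the text.

```python
import statistics
from statistics import NormalDist

def scaled_mad(x, b=1.4826):
    """Return b * median(|x_i - MD|), where MD is the sample median."""
    md = statistics.median(x)
    return b * statistics.median([abs(v - md) for v in x])

# Under normality, b = 1 / Q_0.75 with Q_0.75 the standard normal 0.75 quantile.
b_normal = 1.0 / NormalDist().inv_cdf(0.75)

sample = [1.0, 2.0, 3.0, 4.0, 100.0]  # made-up data with one gross outlier
print(round(b_normal, 4))             # normal-consistency constant
print(scaled_mad(sample, b=1.0))      # raw MAD, barely moved by the outlier
print(scaled_mad(sample))             # MAD rescaled to estimate sigma
```

The outlier at 100 shifts neither the median nor the median of the absolute deviations, which is the robustness property that motivates using MAD in place of the sample standard deviation.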

Interquartile range (IQR) based PCI
The population IQR for any continuous distribution is defined as

$$\mathrm{IQR} = Q_3 - Q_1 ,$$

where the lower and upper quartiles $Q_1$ and $Q_3$ are found by solving the following integrals:

$$\int_{-\infty}^{Q_1} f(x)\,dx = 0.25, \qquad \int_{-\infty}^{Q_3} f(x)\,dx = 0.75 .$$

Using (12), the IQR-based estimators for the supersaturated and third-generation PCIs can be defined.

Gini's mean difference (GMD) based PCI
Gini's mean difference for a set of n observations $\{x_1, x_2, \ldots, x_n\}$ of a random variable X, arranged in ascending order of magnitude, is computed from the ordered observations. If the random variable X follows a normal distribution with mean µ and variance σ², then 42 suggests a possible unbiased estimator of the standard deviation σ that rescales GMD using the constant $c = \sqrt{\pi} = 1.77245$, and later 43 proved that it is an unbiased measure of variability. Using (21), the GMD-based estimators for the supersaturated and third-generation PCIs can be defined.
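As a hedged illustration, the sketch below uses one common textbook form of GMD, the mean absolute difference over all distinct pairs, together with its equivalent ordered-sample form. This is assumed here and is not necessarily the exact expression used in the paper. The normal-theory rescaling relies on the standard result that E|X1 - X2| = 2σ/√π for normal data; all function names are hypothetical.

```python
import math
from itertools import combinations

def gmd_pairwise(x):
    """Mean absolute difference over all distinct pairs (assumed GMD form)."""
    n = len(x)
    total = sum(abs(a - b) for a, b in combinations(x, 2))
    return 2.0 * total / (n * (n - 1))

def gmd_ordered(x):
    """Same quantity computed from the ordered sample in O(n log n)."""
    xs = sorted(x)
    n = len(xs)
    weighted = sum((2 * i - n - 1) * v for i, v in enumerate(xs, start=1))
    return 2.0 * weighted / (n * (n - 1))

def sigma_from_gmd(x):
    """Normal-theory estimate of sigma: (sqrt(pi) / 2) * GMD."""
    return math.sqrt(math.pi) / 2.0 * gmd_ordered(x)

sample = [1.0, 2.0, 4.0]  # made-up data
print(gmd_pairwise(sample), gmd_ordered(sample), sigma_from_gmd(sample))
```

The ordered form avoids the quadratic number of pairs, which matters when the estimator is recomputed inside many bootstrap resamples.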

Case studies for non-normal distribution
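One of the most suitable distributions for fitting quality parameters is the Weibull distribution. The two-parameter Weibull distribution, with γ and β as shape and scale parameters, is given as

$$f(x) = \frac{\gamma}{\beta}\left(\frac{x}{\beta}\right)^{\gamma-1} \exp\left[-\left(\frac{x}{\beta}\right)^{\gamma}\right], \qquad x > 0,\ \gamma > 0,\ \beta > 0. \tag{24}$$

The cumulative distribution and quantile functions for (24) are, respectively,

$$F(x) = 1 - \exp\left[-\left(\frac{x}{\beta}\right)^{\gamma}\right], \qquad Q_p = \beta\left[-\ln(1-p)\right]^{1/\gamma}, \quad 0 < p < 1 .$$

The maximum likelihood estimators of γ and β are defined as the solutions of

$$\frac{1}{\hat\gamma} = \frac{\sum_{i=1}^{n} x_i^{\hat\gamma}\ln x_i}{\sum_{i=1}^{n} x_i^{\hat\gamma}} - \frac{1}{n}\sum_{i=1}^{n}\ln x_i, \qquad \hat\beta = \left(\frac{1}{n}\sum_{i=1}^{n} x_i^{\hat\gamma}\right)^{1/\hat\gamma}.$$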
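A minimal sketch of the Weibull distribution function, quantile function and maximum likelihood fit follows, assuming the standard shape/scale parameterization. The shape equation is solved by plain bisection, which is a choice made here for simplicity and not a method stated in the paper. Function names, the bracketing interval and the simulated data are hypothetical.

```python
import math
import random

def weibull_cdf(x, shape, scale):
    return 1.0 - math.exp(-((x / scale) ** shape))

def weibull_quantile(p, shape, scale):
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)

def weibull_mle(x, lo=0.01, hi=100.0, iters=200):
    """Bisection on the likelihood equation for the shape; x must be positive."""
    n = len(x)
    m = max(x)
    y = [v / m for v in x]  # rescale so y <= 1 and y**k cannot overflow
    mean_log = sum(math.log(v) for v in y) / n

    def score(k):
        s0 = sum(v ** k for v in y)
        s1 = sum((v ** k) * math.log(v) for v in y)
        return s1 / s0 - 1.0 / k - mean_log  # increasing in k

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    shape = 0.5 * (lo + hi)
    scale = m * (sum(v ** shape for v in y) / n) ** (1.0 / shape)
    return shape, scale

rng = random.Random(1)
# random.weibullvariate takes (scale, shape) in that order.
data = [rng.weibullvariate(3.5, 2.8) for _ in range(2000)]
shape_hat, scale_hat = weibull_mle(data)
print(round(shape_hat, 3), round(scale_hat, 3))
```

Rescaling by the sample maximum leaves the shape estimate unchanged, because the shape equation is scale invariant, and keeps the powers numerically safe.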

The IQR of Weibull process
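The IQR for the Weibull process defined in (24) follows from the Weibull quantile function evaluated at p = 0.75 and p = 0.25:

$$\mathrm{IQR} = \beta\left[(\ln 4)^{1/\gamma} - \left(\ln\tfrac{4}{3}\right)^{1/\gamma}\right].$$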

The Gini's mean difference of Weibull process
By following the procedure of 44, an unbiased estimator of GMD for the Weibull distribution is obtained. To evaluate the performance of robust third-generation PCIs at different skewness levels of the Weibull distribution, shape and scale parameters are selected so that the skewness level may be categorized as low, moderate and high, as shown in Fig. 1.
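To see how the shape parameter drives asymmetry, the sketch below evaluates the standard moment-based skewness of a Weibull distribution, which depends on the shape only. The three (shape, scale) pairs are the ones listed in the simulation steps of this article. Reading them in the order low, moderate, high is an assumption made here, and the function name is hypothetical.

```python
import math

def weibull_skewness(shape):
    """Moment skewness of a Weibull distribution (independent of scale)."""
    g1 = math.gamma(1.0 + 1.0 / shape)
    g2 = math.gamma(1.0 + 2.0 / shape)
    g3 = math.gamma(1.0 + 3.0 / shape)
    var = g2 - g1 ** 2
    return (g3 - 3.0 * g1 * var - g1 ** 3) / var ** 1.5

for shape, scale in [(2.8, 3.5), (1.80, 2.00), (1.00, 1.30)]:
    print(shape, scale, round(weibull_skewness(shape), 3))
```

With shape 1 the Weibull reduces to the exponential distribution, whose skewness is 2, so the third setting is the most asymmetric of the three.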

Methods of bootstrap confidence interval
The bootstrap technique originated from 45. Moreover, Efron 45 and Hall et al. 46 provide theoretical details about the bootstrap technique. This technique can be used to construct confidence intervals for parameters when the usual interval estimation approach is not feasible. Bootstrap confidence intervals (BCIs) are commonly applied in constructing confidence intervals for various PCIs. Suppose that $\varsigma_1, \varsigma_2, \ldots, \varsigma_n$ constitute a random sample of n observations taken from a distribution of interest, say φ, i.e. $\varsigma_1, \varsigma_2, \ldots, \varsigma_n \sim \varphi$. Let $\hat\theta$ represent an estimator of an arbitrary PCI, say Cpm or Cpmk.
Then the bootstrap technique is implemented as follows:
i. A bootstrap sample of n observations is drawn with replacement from the original sample, using 1/n as the mass at each point; this bootstrap sample is denoted as $\varsigma^*_1, \varsigma^*_2, \ldots, \varsigma^*_n$.
ii. From the kth bootstrap sample, for 1 ≤ k ≤ B, the kth bootstrap estimator of θ (an arbitrary PCI) is denoted as $\hat\theta^*_k$.
iii. If the number of resamples in the bootstrap technique is B, then a total of B estimates $\hat\theta^*$ are obtained. Arranging the whole collection from the smallest to the largest value constitutes an empirical bootstrap distribution of $\hat\theta$ 13.
B = 1000 bootstrap resamples are considered in this article. The confidence intervals of θ can be constructed using any of the following three bootstrap techniques.
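The resampling steps above can be sketched as follows. As a stand-in estimator the sketch uses the classical Cpm with the sample mean and standard deviation, not the robust variants studied in this article. The specification limits, target, sample and function names are hypothetical values chosen only for illustration.

```python
import math
import random
import statistics

def cpm(x, lsl, usl, target):
    """Classical Cpm = (USL - LSL) / (6 * sqrt(s^2 + (mean - T)^2))."""
    mu = statistics.fmean(x)
    s = statistics.stdev(x)
    return (usl - lsl) / (6.0 * math.sqrt(s ** 2 + (mu - target) ** 2))

def bootstrap_estimates(x, stat, B=1000, seed=0):
    """Draw B resamples of size n with replacement; return sorted estimates."""
    rng = random.Random(seed)
    n = len(x)
    return sorted(stat(rng.choices(x, k=n)) for _ in range(B))

rng = random.Random(42)
sample = [rng.weibullvariate(3.5, 2.8) for _ in range(50)]  # (scale, shape)
boot = bootstrap_estimates(
    sample, lambda s: cpm(s, lsl=0.5, usl=6.5, target=3.1)  # hypothetical limits
)
print(len(boot), round(boot[0], 3), round(boot[-1], 3))
```

Passing the estimator as a function keeps the resampling loop unchanged when a MAD-, IQR- or GMD-based index is substituted.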

Method 1: Standard bootstrap (SB) confidence interval
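The sample average and sample standard deviation are computed as follows using the 1000 bootstrap estimates $\hat\theta^*$:

$$\bar\theta^* = \frac{1}{B}\sum_{k=1}^{B}\hat\theta^*_k, \qquad S^* = \sqrt{\frac{1}{B-1}\sum_{k=1}^{B}\left(\hat\theta^*_k - \bar\theta^*\right)^2}.$$

Consequently, the 100(1 − α)% SB confidence interval is obtained as

$$\bar\theta^* \pm z_{(1-\alpha/2)}\, S^*,$$

where $z_{(1-\alpha/2)}$ is the (1 − α/2)th quantile of the standard normal variable.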
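A minimal sketch of the SB interval, assuming the interval is centred on the mean of the bootstrap estimates as described above; the function name and the toy input are hypothetical.

```python
import statistics
from statistics import NormalDist

def standard_bootstrap_ci(boot, alpha=0.05):
    """Mean of the bootstrap estimates +/- z * their standard deviation."""
    centre = statistics.fmean(boot)
    spread = statistics.stdev(boot)  # uses the B - 1 divisor
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return centre - z * spread, centre + z * spread

lower, upper = standard_bootstrap_ci([1.0, 2.0, 3.0])  # toy bootstrap estimates
print(round(lower, 4), round(upper, 4))
```

Because the interval is symmetric about the bootstrap mean, it can be a poor fit when the bootstrap distribution of the index is itself skewed.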

Method 2: Percentile bootstrap (PB) confidence interval
Since there are B resamples in total, they produce B estimates $\hat\theta^*$. Arranging these estimates from the smallest value to the largest value forms an empirical distribution of $\hat\theta^*$. From the ordered empirical distribution, choose the 100(α/2) and 100(1 − α/2) percentiles as the end points of the interval, which results in the 100(1 − α)% PB confidence interval

$$\left(\hat\theta^*_{(B\,\alpha/2)},\ \hat\theta^*_{(B\,(1-\alpha/2))}\right).$$

For example, the 95% confidence interval with 1000 bootstrap estimates is

$$\left(\hat\theta^*_{(25)},\ \hat\theta^*_{(975)}\right),$$

where $\hat\theta^*_{(25)}$ and $\hat\theta^*_{(975)}$ represent the 25th and 975th of the ordered bootstrap estimates $\hat\theta^*$.
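The percentile rule can be sketched directly from the ordered estimates. The rank rounding below is an implementation choice assumed here for values of B where Bα/2 is not an integer; the function name is hypothetical.

```python
def percentile_bootstrap_ci(boot, alpha=0.05):
    """Return the (B*alpha/2)th and (B*(1-alpha/2))th ordered estimates."""
    ordered = sorted(boot)
    B = len(ordered)
    lo_rank = max(1, round(B * alpha / 2.0))         # 25 when B = 1000
    hi_rank = min(B, round(B * (1.0 - alpha / 2.0))) # 975 when B = 1000
    return ordered[lo_rank - 1], ordered[hi_rank - 1]

# With the estimates 1, 2, ..., 1000 the ranks and the values coincide.
print(percentile_bootstrap_ci(list(range(1, 1001))))
```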

Method 3: Bias-corrected percentile bootstrap (BCPB) confidence interval
In this method, $\Phi(\cdot)$, $z_{\alpha/2}$ and $z_{(1-\alpha/2)}$ are the distribution function, the (α/2)th quantile and the (1 − α/2)th quantile, respectively, of the standard normal distribution. Consequently, the 100(1 − α)% BCPB confidence interval is constructed from the bias-adjusted percentiles of the ordered bootstrap estimates.
The average width (AW) is considered to compare the three different types of BCIs. The AW of a BCI is computed using a total of M trials, and the estimated AW is

$$AW = \frac{1}{M}\sum_{i=1}^{M}\left(U_i^{w} - L_i^{w}\right),$$

where $L_i^{w}$ and $U_i^{w}$ are the estimated lower and upper confidence limits of the 100(1 − α)% confidence interval for any of the three types of BCIs based on the ith replicate.
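The sketch below follows the usual textbook construction of a bias-corrected percentile interval, which is assumed here because the defining equations are not legible in this section: take p0 as the proportion of bootstrap estimates not exceeding the original estimate, set z0 = Φ⁻¹(p0), and read off the percentiles Φ(2·z0 + z_{α/2}) and Φ(2·z0 + z_{1−α/2}). The clamping of p0 and all function names are choices made for this illustration. The average width helper implements the AW formula above.

```python
from statistics import NormalDist

def bcpb_ci(boot, theta_hat, alpha=0.05):
    """Bias-corrected percentile bootstrap interval (textbook form, assumed)."""
    nd = NormalDist()
    ordered = sorted(boot)
    B = len(ordered)
    p0 = sum(v <= theta_hat for v in ordered) / B
    p0 = min(max(p0, 1.0 / (B + 1)), B / (B + 1.0))  # keep inv_cdf finite
    z0 = nd.inv_cdf(p0)
    p_lo = nd.cdf(2.0 * z0 + nd.inv_cdf(alpha / 2.0))
    p_hi = nd.cdf(2.0 * z0 + nd.inv_cdf(1.0 - alpha / 2.0))
    lo_rank = min(B, max(1, round(p_lo * B)))
    hi_rank = min(B, max(1, round(p_hi * B)))
    return ordered[lo_rank - 1], ordered[hi_rank - 1]

def average_width(intervals):
    """AW = (1/M) * sum of (upper - lower) over M replicate intervals."""
    return sum(upper - lower for lower, upper in intervals) / len(intervals)

# With no bias (p0 = 0.5, z0 = 0) the interval matches the plain percentile one.
print(bcpb_ci(list(range(1, 1001)), theta_hat=500))
print(average_width([(1.0, 3.0), (2.0, 6.0)]))
```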

Results and discussion
The point and interval estimation of modified PCIs based on Quantile (PC), MAD, IQR and GMD for different asymmetric behavior of Weibull distribution is given in Tables 1, 2, 3, 4.
Following 47, target values equal to 1.33, corresponding to existing processes, were considered for the point estimation of indices Cpm and Cpmk. The performance of each modified PCI under different asymmetric behavior is evaluated using 10,000 simulated samples of size 25, 50, 75 and 100. The R statistical language was used to carry out the simulation study. Bias and mean square error (MSE) criteria have been used for comparison. The simulations were performed in the following steps:
1. Collect 10,000 samples of size 25 from the Weibull process with parameters (shape, scale) = (2.8, 3.5), (1.80, 2.00), (1.00, 1.30).
2. Compute Cpm and Cpmk based on the measures MAD (median absolute deviation), IQR (interquartile range) and GMD (Gini's mean difference).
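The bias and MSE comparison in these steps can be sketched generically. The article's simulations were run in R; the Python below is only an illustration. Because the robust PCI formulas are not reproduced in this section, the sample median is used as a stand-in estimator with the Weibull median as its true value. The function name, the reduced replication count and the seed are hypothetical.

```python
import math
import random
import statistics

def bias_and_mse(estimator, shape, scale, true_value, n, reps=1000, seed=0):
    """Simulate `reps` Weibull samples of size n; return (bias, MSE)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        # random.weibullvariate takes (scale, shape) in that order.
        sample = [rng.weibullvariate(scale, shape) for _ in range(n)]
        estimates.append(estimator(sample))
    bias = statistics.fmean(estimates) - true_value
    mse = statistics.fmean([(e - true_value) ** 2 for e in estimates])
    return bias, mse

shape, scale = 2.8, 3.5
true_median = scale * math.log(2.0) ** (1.0 / shape)
for n in (25, 50, 75, 100):
    bias, mse = bias_and_mse(statistics.median, shape, scale, true_median, n)
    print(n, round(bias, 4), round(mse, 4))
```

Replacing `statistics.median` with a PCI estimator and `true_median` with the target index value gives the same comparison for Cpm or Cpmk.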

Results for PC based PCIs
Simulation results for the quantile approach suggested by 23 are presented in Table 1 for the Weibull distribution. The table reports the simulated mean, standard deviation, bias and mean square error (MSE) corresponding to the target value of 1.33 for both indices under low, moderate and high asymmetric behavior of the Weibull distribution.
In the case of index Cpm, the PC-method gives good results under low and moderate asymmetric behavior; however, it underestimates the target value under high asymmetry. As the sample size increases, the estimated values come closer to the target values and ultimately produce less bias and MSE. For the index Cpmk, the PC-method is more accurate than the other three methods and gives the lowest bias and MSE under low and moderate asymmetric conditions for sample size n = 100. For the three asymmetric levels of the Weibull distribution, the following conclusion can be drawn: the PC-method gives a lower bias and MSE for index Cpmk under low and moderate asymmetric behavior when the target value is 1.33. For index Cpm, except under high asymmetry, the MAD-based estimator is closer to the target value and less biased for large samples. The MAD-based estimator of index Cpmk showed good performance for small sample sizes only. It showed accurate results under low and moderate asymmetric conditions, whereas for the new process it deals better with high asymmetry. In both cases, it slightly underestimates the target values for large samples.

Results for IQR based PCIs
The simulation results of the IQR-based PCIs Cpm and Cpmk for the Weibull distribution under low, moderate and high asymmetric levels are reported in Table 3. For both indices, the IQR-method overestimates at all asymmetric levels and for all sample sizes, so these estimators are not considered good estimators. In all cases, large bias and MSE are observed for all sample sizes, and the estimation worsens for all indices as the asymmetry level moves from low to high. The simulation findings therefore indicate that the IQR-method is not a useful or attractive method from a practical point of view because of its large bias and MSE.

Results for GMD based PCIs
In this section, the performance of both PCIs, Cpm and Cpmk, based on the GMD-method is assessed and compared under low, moderate and high asymmetric conditions of the Weibull distribution. The results are presented in Table 4. They indicate that GMD-based PCIs perform better under the moderate asymmetric condition for the index Cpm for large samples, and that the bias and MSE reduce as sample size increases. In the case of index Cpmk, the GMD-based estimator slightly overestimates the target value of 1.67 for small samples under low asymmetry, but the bias increases as sample size increases. For moderate asymmetry this method underestimates, and for high asymmetry it again overestimates the target value of new processes. Based on the above observations, GMD-based estimators of indices Cpm and Cpmk have the following results.

Bootstrap confidence intervals for Cpm and Cpmk
In this section, four bootstrap confidence intervals, namely standard, percentile, bias-corrected percentile and percentile-t bootstrap confidence intervals, are discussed for indices Cpm and Cpmk using the PC, MAD and GMD methods. For the simulation, the Weibull process is used under low, moderate and high asymmetric conditions for sample sizes n = 25, 50, 75 and 100. The results are presented in Tables 5 to 10, which give the true index value, 95% confidence limits, and coverage probability of each index under low, moderate and high asymmetric conditions for all sample sizes. These results are based on 1000 replications and different values of USL and LSL for the three types of processes, which are given in Table 2 above.
Tables 5 and 6 present the 95% BCIs for the Weibull process using the PC-method, with the coverage probability of each method reported below each interval. Similarly, Tables 7 and 8 present the 95% BCIs for the Weibull process along with coverage probabilities using the MAD method. The results in all these tables indicate that the average width of all confidence intervals, which is the difference between the upper and lower confidence limits, reduces when the sample size increases in all cases under study. Moreover, the asymmetry level affects the average width: the average width increases as asymmetry increases.

BCIs for Weibull distribution
From the results for the Weibull distribution, the following conclusions have been drawn.
i. Among the PC-based estimators of both indices Cpm and Cpmk, the BCPB method exhibited the least average width under low, moderate and high asymmetric behavior of the Weibull process.
ii. Based on the average width, the four bootstrap methods are ranked as BCPB < PB < PTB < SB.
iii. The coverage probability increases with sample size and reached the nominal level 0.95 for large sample sizes in the case of the SB and BCPB methods. However, the other two methods did not reach the nominal level, particularly for small samples.
iv. In the case of the MAD method, both BCPB and PB CIs showed less average width than SB and PTB. Based on the average width, the four bootstrap methods are ranked as BCPB < PB < PTB < SB.
v. Between the BCPB and PB CIs, the former showed lower coverage probability than the latter. Consequently, the PB CI performed better for the MAD-method.
vi. In both methods, when the transition is made from low to high asymmetric conditions, the average width approximately doubled. That is, under high asymmetry the CI is wider than under low and moderate asymmetry.
In general, the BCPB CI is recommended for all asymmetric conditions when the PC-method is used. On the other hand, the PB CI is recommended for the MAD-method under low, moderate and high asymmetric behavior of the Weibull process. The recommendation is made on the basis of low average width and high coverage probability among the four BCIs.

Application of proposed methodology using practical data
A data set was analysed using GMD, MAD and PC based PCIs. The results are given in the following section.

Data: strength measures in GPa for single fibres data
In this section, a real-life example is presented to demonstrate the application of the MAD, PC and GMD methods for the indices Cpm and Cpmk. The data represent the strength measures in GPa for single fibres and impregnated 1000-carbon fibre tows. Single fibres were tested under tension at a gauge length of 20 mm with sample size n = 69; the data are given in Table 11 [48][49][50].
To select the appropriate distribution, different goodness-of-fit statistics 51 were used and are reported in Table 12 along with summary statistics of the data. Based on AIC and BIC values, the two-parameter Weibull distribution is more suitable for this data than the other distributions. Fitting the two-parameter Weibull distribution gives maximum likelihood estimates of the shape and scale parameters of γ = 5.504809 and β = 2.650830, respectively.
To evaluate the adequacy of the fit, the K-S goodness-of-fit test is used. The K-S distance for this data is 0.056 with p-value 0.9816, which also favors the Weibull distribution. The lower and upper specification limits used for the calculation of PCIs were (0.3989, 4.4960). The estimates of both indices using the three methods and their corresponding bootstrap CIs are reported in Table 13. As in the simulation study, the MAD and GMD methods are more accurate than the PC-method. Both indices Cpm and Cpmk showed better performance, and the estimated values are close to the target values for existing processes. Based on the average width of CIs, the four bootstrap methods are ranked as BCPB < PB < PTB < SB. Overall, the MAD- and GMD-based estimators showed the wider spread of CIs.

Summary and conclusion
Statistical process control (SPC) is an attractive statistical tool that is commonly used to monitor processes in many industries nowadays. Within SPC, PCIs have become an attractive and important tool to measure the quality of any product within specified limits. It is difficult to choose a PCI that performs accurately under a non-normal distribution, since both process variability and mean are affected by non-normality. Moreover, the importance of a PCI cannot be neglected even if it does not provide a high target value (> 1.33). The conditions under which a PCI performs poorly therefore open a new research horizon for researchers.
This study attempted to address the non-normality issues in PCIs using quantile (PC), MAD, IQR and GMD methods under asymmetric conditions of the Weibull distribution. Moreover, the point and interval estimation of modified PCIs were assessed using simulation studies. Point estimation of quantile-based PCIs using the PC-method was found to be an effective approach under low and moderate asymmetric conditions of the Weibull process. The PC-based estimator tends to underestimate; however, this trend increases as sample size increases. The results indicate not only that the PC-based estimator produces large bias but also that it both under- and overestimates target values. Moreover, the PC-based estimator is influenced by high asymmetry and gives the worst estimation for all three distributions.
The simulation studies reveal that the MAD-method can be used successfully and has great potential to deal with non-normality for the Weibull process under low and moderate asymmetry. Overall, MAD-based estimators tend to produce very accurate results under low and moderate asymmetric conditions. In the case of high asymmetry, the MAD-estimator of index Cpm has shown good performance only for samples of size less than 50.
The simulation studies for PCIs show that the IQR-method overestimates for the selected asymmetric levels of the Weibull distribution. Moreover, large bias and MSE have been observed for all sample sizes, and the situation became worse when the asymmetry level turned from low to high. Therefore, the IQR-based estimators were not considered good estimators for dealing with non-normality. Finally, we demonstrated the application of GMD as a measure of variability in PCIs Cpm and Cpmk for the Weibull distribution under low, moderate and high asymmetric conditions. The results indicate that the GMD-method works well to some extent under high asymmetry, but more research is required to get a better estimation of PCIs.
Besides point estimation, interval estimates of all PCIs were constructed. Four types of bootstrap confidence intervals, i.e., SB, PB, BCPB and PTB, and their coverage probabilities were calculated using simulation studies. The appropriate confidence interval for each method was selected on the basis of low average width and high coverage probability.
The simulations illustrated that BCPB CIs produce the smallest average widths and highest coverage probabilities under all asymmetric levels of the Weibull distribution for quantile-based (PC) indices Cpm and Cpmk. On the other hand, the PB and PTB CIs are recommended for MAD-based indices. Both asymmetric behavior and sample size affect the width and coverage probabilities of confidence intervals. Moreover, coverage probabilities approach nominal levels as the sample size increases. The BCPB and PB CIs provide higher coverage probability with a smaller width in the case of GMD-based estimators.

Recommendations
By conducting a comprehensive study, we arrived at the following two recommendations.
1. The performance of both modified PCIs is highly affected by the asymmetric behavior of the distributions. However, the accurate performance of a particular method for one distribution does not guarantee accurate results for another distribution with different tail behavior.
2. To deal with high asymmetry, more care is needed for both point and interval estimation. In general, for point estimation, the quantile-based PC-method leads towards underestimation, while robust methods like MAD, IQR, and GMD lead towards overestimation. For interval estimation, a wider spread of CIs was observed under high asymmetry than under low and moderate asymmetry.

Figure 1. PDF plots of Weibull distributions with different asymmetry levels along with shape and scale parameters.

Table 1. The statistical indicators of indices Cpm and Cpmk for different asymmetric levels of the Weibull process based on the PC-method.

Table 2. The statistical indicators of indices Cpm and Cpmk using selected asymmetric levels of the Weibull process based on the MAD-method.

Table 3. The statistical indicators of indices Cpm and Cpmk using selected asymmetric levels of the Weibull process based on the IQR-method.

Table 4. The statistical indicators of indices Cpm and Cpmk using selected asymmetric levels of the Weibull process based on the GMD-method.

Table 2 summarizes the results of the MAD-based estimators of both PCIs, i.e., Cpm and Cpmk. Unlike the PC-method, the MAD-based estimators of the two indices showed a different pattern for the Weibull process. Summing up the overall results, it can be concluded that the performance of MAD-based estimators is consistently better than that of the PC-based estimator from low to high asymmetry.

Table 5. The 95% bootstrap confidence intervals with coverage probabilities for the Weibull distribution using the PC method for index Cpm.

Table 6. The 95% bootstrap confidence intervals with coverage probabilities for the Weibull distribution using the PC method for index Cpmk.

Table 7. The 95% bootstrap confidence intervals with coverage probabilities for the Weibull distribution using the MAD method for index Cpm.

Table 8. The 95% bootstrap confidence intervals with coverage probabilities for the Weibull distribution using the MAD method for index Cpmk.

Table 9. The 95% bootstrap confidence intervals with coverage probabilities for the Weibull distribution using the GMD method for index Cpm.

Table 10. The 95% bootstrap confidence intervals with coverage probabilities for the Weibull distribution using the GMD method for index Cpmk.

Table 12. The summary statistics and goodness of fit statistics for fibre strength data.

Table 13. The point estimates and width of four BCIs for fibre strength data.