A new unit distribution: properties, estimation, and regression analysis

This research introduces a unit statistical model, named the power new power function distribution, and presents a thorough analysis of its properties. We investigate the advantages of the new model and derive some of its fundamental distributional properties. The study aims to improve insight and application by presenting both quantitative and qualitative results. To estimate the three unknown parameters of the model, we examine several methods: maximum likelihood, least squares, weighted least squares, Anderson–Darling, and Cramér–von Mises. Through a Monte Carlo simulation experiment, we quantitatively evaluate the effectiveness of these estimation methods. A distinctive part of this research is the development of a novel regression analysis based on the proposed distribution. The application of this analysis reveals new viewpoints and increases the usefulness of the model in practical situations. As the emphasis of the study is primarily on practical applications, the viability of the proposed model is assessed through the analysis of real datasets from diverse fields.


The power transformation X = T^{1/σ} is applied to the cumulative distribution function (CDF) of the new power function distribution (NPFD). The proposed distribution is called the power new power function distribution (PNPFD). The PNPFD provides increasing, bathtub, J-shaped, reverse J-shaped, and decreasing shapes. Its density can be left-skewed, unimodal, right-skewed, concave down, or constant. Furthermore, this paper examines the main statistical properties of the PNPFD. The analysis covers the shapes of the density function and hazard rate function, moments, incomplete moments, moment generating function (MGF), order statistics, stochastic ordering, and parameter estimation through the maximum likelihood method. To underscore the practical utility of the model, applications to real datasets are provided, demonstrating the distribution's applicability and usefulness.
A classical regression model investigates the relationship between one or more independent variables and the dependent variable. Classical regression models relate the mean response to given values of the independent variables. When the dependent variable contains outliers, classical regression models can be insufficient. The median handles these scenarios better than the mean since it is a more robust estimate. For these cases, many regression models for unit-interval responses were introduced, such as the beta regression model by 29, the Kumaraswamy regression model by 30, the unit Weibull regression model by 25, the unit Burr-XII regression model by 26, the unit Burr-Hatke regression model by 31, the unit log-log regression model by 32, etc. This paper also introduces a new quantile regression model, based on the proposed distribution, as an alternative to current ones.
In this paper, we propose a novel probability distribution model tailored for data defined on the interval (0, 1). This study contributes to the field of statistics by thoroughly examining the statistical and reliability features of the model. By discussing moments, stochastic ordering, the reliability function, the hazard rate function, order statistics, and the quantile function, we give a comprehensive account of the PNPFD's properties. Furthermore, we establish a framework for comparing the efficacy of the PNPFD against selected distributions such as the Kumaraswamy and beta distributions. This comparative analysis sets the stage for evaluating the PNPFD's performance in various statistical applications. Through parameter estimation techniques and Monte Carlo simulations, we demonstrate the precision and reliability of the PNPFD in handling real-world data. Additionally, a novel regression analysis technique based on the PNPFD expands the scope of statistical modeling, particularly in scenarios where the dependent variable is a proportion. Overall, this study presents a new distribution model and highlights its potential to enhance statistical analyses across diverse domains.
The rest of the paper is organized as follows. Section 2 (Model formulation) introduces the probability density function (PDF) and hazard rate function (HRF) of the PNPFD. Its associated statistical properties, such as the moment generating function (MGF), moments, mean residual life (MRL), order statistics, stochastic ordering, and quantile function, are investigated in Sect. 3 (Statistical properties). The estimation of the parameters is discussed in Sect. 4 (Estimation methods). The sample behavior of the PNPFD, examined with the help of simulated data sets, is detailed in Sect. 5 (Numerical simulation). In Sect. 6 (Regression analysis), a novel quantile regression is presented based on the PNPFD. In Sect. 7 (Real data analysis), real data sets are analyzed using the proposed distribution. Finally, the study is concluded in Sect. 8 (Conclusion).

Model formulation
In 2021, Iqbal et al. 28 derived a new statistical model called the new power function distribution (NPFD), with the CDF given in Eq. (1) and the PDF given in Eq. (2). The power transformation X = T^{1/σ} is applied to the CDF in Eq. (1) to obtain the power new power function distribution (PNPFD), with the CDF given in Eq. (3) and the PDF given in Eq. (4). Figure 1 shows the PDF of the PNPFD for different combinations of the parameter values δ, η, and σ. Figure 1a–d show that the PDF can be unimodal, increasing monotonically and then decreasing, for some parameter combinations. Figure 1b shows a trend that is constant initially and then increases rapidly as x increases (J-shaped), and Fig. 1a shows that the PDF can be skewed to the left. Figure 1c,d show that the PDF of the PNPFD can be symmetric.
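The step from the NPFD to the PNPFD can be written out as a short derivation. This is a sketch under two assumptions that are not stated in the text above: that σ > 0, and that T is supported on (0, 1). The symbols G and g, standing for the NPFD CDF and PDF, are notation introduced here for illustration only.

```latex
\begin{align*}
F_X(x) &= P(X \le x) = P\!\left(T^{1/\sigma} \le x\right)
        = P\!\left(T \le x^{\sigma}\right) = G\!\left(x^{\sigma}\right), \\
f_X(x) &= \frac{d}{dx}\, G\!\left(x^{\sigma}\right)
        = \sigma\, x^{\sigma-1}\, g\!\left(x^{\sigma}\right), \qquad 0 < x < 1 .
\end{align*}
```

Under these assumptions, substituting the NPFD expressions for G and g yields the PNPFD CDF and PDF.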

Mixture representation
The expansion of the PDF of the PNPFD proves valuable in deriving its properties. To facilitate this, we employ two lemmas, Lemma 1 and Lemma 2. Using Lemmas 1 and 2, the expansion of the PDF of the PNPFD can be derived in two cases. Case I: 0 < δx^σ < 1. Case II: δx^σ > 1.

Reliability characteristics of the PNPFD
The reliability function (rf) and the HRF of the PNPFD are obtained from the CDF in Eq. (3) and the PDF in Eq. (4). Figure 2 gives examples of the shapes of the hazard function of the proposed model for different values of δ, η, and σ. Figure 2a,c show that the hazard rate function of the PNPFD can be increasing. Figure 2b shows that the hazard rate function can be decreasing, and Fig. 2d shows that the hazard rate function of the PNPFD can be bathtub-shaped, depending on the values of its parameters. The reverse hazard rate function (rhrf) of the PNPFD is obtained in the same way.
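The three reliability quantities named in this subsection follow from any CDF F and PDF f through the standard definitions S(x) = 1 − F(x), h(x) = f(x) / (1 − F(x)), and r(x) = f(x) / F(x). The sketch below illustrates these definitions only. Because the PNPFD expressions are not reproduced here, it uses the plain power function distribution F(x) = x^a on (0, 1) as a stand-in, and all function names are hypothetical.

```python
from typing import Callable

Fn = Callable[[float], float]


def reliability(cdf: Fn, x: float) -> float:
    """Reliability (survival) function S(x) = 1 - F(x)."""
    return 1.0 - cdf(x)


def hazard(pdf: Fn, cdf: Fn, x: float) -> float:
    """Hazard rate h(x) = f(x) / (1 - F(x))."""
    return pdf(x) / (1.0 - cdf(x))


def reverse_hazard(pdf: Fn, cdf: Fn, x: float) -> float:
    """Reverse hazard rate r(x) = f(x) / F(x)."""
    return pdf(x) / cdf(x)


# Stand-in distribution (NOT the PNPFD): power function, F(x) = x**A on (0, 1).
A = 2.0


def standin_cdf(x: float) -> float:
    return x ** A


def standin_pdf(x: float) -> float:
    return A * x ** (A - 1.0)


if __name__ == "__main__":
    for x in (0.1, 0.5, 0.9):
        print(x, reliability(standin_cdf, x),
              hazard(standin_pdf, standin_cdf, x),
              reverse_hazard(standin_pdf, standin_cdf, x))
```

Replacing the two stand-in functions with the PNPFD CDF and PDF would give the curves of the kind shown in Figure 2.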

Moments
The rth moment E(X^r) of the PNPFD is given by Eqs. (10) and (11), where Case I is 0 < δx^σ < 1. The first four moments of the PNPFD are obtained by substituting r = 1, 2, 3, 4 in Eqs. (10) and (11).

Moment generating function
The MGF of a PNPFD random variable is given in two cases. Case I: 0 < δx^σ < 1. Case II: δx^σ > 1.

PDF and CDF of order statistics
The order statistics of a distribution are derived by arranging the sample values in ascending order. The PDF of the rth order statistic is expressed in terms of the constant C_{r:n} = n! / [(r−1)!(n−r)!]. Using Eqs. (3) and (4), the PDF of the rth order statistic of the PNPFD is obtained. Moreover, using Eq. (3), the CDF of the rth order statistic of the PNPFD is obtained.
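For reference, the standard textbook forms for a sample of size n with parent CDF F and PDF f are f_{r:n}(x) = C_{r:n} F(x)^{r−1} [1 − F(x)]^{n−r} f(x) and F_{r:n}(x) = Σ_{j=r}^{n} C(n, j) F(x)^j [1 − F(x)]^{n−j}. The sketch below implements these generic forms. It takes the parent CDF and PDF as arguments, so it does not assume any PNPFD expression, and the uniform distribution is used as a stand-in parent. The function names are hypothetical.

```python
from math import comb, factorial
from typing import Callable

Fn = Callable[[float], float]


def order_stat_pdf(x: float, r: int, n: int, pdf: Fn, cdf: Fn) -> float:
    """PDF of the r-th order statistic in a sample of size n."""
    c_rn = factorial(n) / (factorial(r - 1) * factorial(n - r))
    big_f = cdf(x)
    return c_rn * big_f ** (r - 1) * (1.0 - big_f) ** (n - r) * pdf(x)


def order_stat_cdf(x: float, r: int, n: int, cdf: Fn) -> float:
    """CDF of the r-th order statistic: P(at least r of n values are <= x)."""
    big_f = cdf(x)
    return sum(comb(n, j) * big_f ** j * (1.0 - big_f) ** (n - j)
               for j in range(r, n + 1))


# Stand-in parent distribution (NOT the PNPFD): uniform on (0, 1).
def uniform_cdf(x: float) -> float:
    return x


def uniform_pdf(x: float) -> float:
    return 1.0


if __name__ == "__main__":
    # Density and CDF of the sample maximum (r = n = 2) at x = 0.5.
    print(order_stat_pdf(0.5, 2, 2, uniform_pdf, uniform_cdf))
    print(order_stat_cdf(0.5, 2, 2, uniform_cdf))
```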

Stochastic ordering
For a random variable X to be smaller than a random variable Y, certain conditions must be satisfied. To show that the ratio f_X(x) / f_Y(x) is decreasing in x, we have to show that its derivative is less than 0. Equivalently, we can show that the derivative of the logarithm of the ratio is less than 0. Hence, we proved Y ≥_lr X, so we can say that Y ≥_hr X, Y ≥_mrl X, and Y ≥_st X when Y and X follow the PNPFD.

Quantile function
The quantile function (QF) of the PNPFD is obtained by inverting the CDF in Eq. (3), which gives Eq. (17).
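A quantile function is what makes inverse-transform sampling possible: if U is uniform on (0, 1), then Q(U) follows the target distribution. The sketch below shows the idea in generic form. Since the PNPFD quantile function is not reproduced here, the Kumaraswamy distribution, whose CDF and quantile function are known in closed form, is used as a stand-in. A bisection routine is also included to illustrate numerical inversion of a CDF. All names and default parameter values are illustrative assumptions.

```python
import random
from typing import Callable, List

Fn = Callable[[float], float]


def quantile_by_bisection(cdf: Fn, p: float, lo: float = 0.0,
                          hi: float = 1.0, tol: float = 1e-12) -> float:
    """Numerically invert a continuous, increasing CDF on (lo, hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Stand-in distribution (NOT the PNPFD): Kumaraswamy(a, b) on (0, 1).
def kw_cdf(x: float, a: float = 2.0, b: float = 3.0) -> float:
    return 1.0 - (1.0 - x ** a) ** b


def kw_quantile(p: float, a: float = 2.0, b: float = 3.0) -> float:
    return (1.0 - (1.0 - p) ** (1.0 / b)) ** (1.0 / a)


def sample(quantile: Fn, n: int, seed: int = 0) -> List[float]:
    """Inverse-transform sampling: apply the quantile function to uniforms."""
    rng = random.Random(seed)
    return [quantile(rng.random()) for _ in range(n)]


if __name__ == "__main__":
    print(kw_quantile(0.5), quantile_by_bisection(kw_cdf, 0.5))
    print(sample(kw_quantile, 5))
```

When a closed-form quantile function is available, as for the PNPFD, the closed form is preferable to bisection because it is exact and faster.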

Numerical simulation
In this section, the bias and mean squared errors (MSEs) of the MLEs, LSEs, WLSEs, ADEs, and CvMEs for the parameters of the PNPFD are obtained via 5000 runs. To generate samples from the PNPFD in the simulation experiment, the quantile function provided in Eq. (17) is used. Furthermore, the optimization procedures for obtaining estimates from the generated samples are performed using the BFGS method in the optim function in R. Six different scenarios are evaluated for the parameter settings. These are scenario 1: (0.5, 1.5, −0.5), scenario 2: (2, 1.5, −0.5), scenario 3: (1.5, 0.5, 2), scenario 4: (3, 1.5, 2), scenario 5: (0.5, 2.5, −0.7), and scenario 6: (2.5, 0.7, 1.5). The simulation results are given in Tables 1 and 2. Tables 1 and 2 show that the bias and MSEs decrease as the sample size increases for all estimators. According to the bias criterion, the ADEs are usually the best estimators for the parameters σ and η, while the MLE is the best estimator for the parameter δ. When the scenarios are analyzed in detail, the following interpretations can be made for the MSE criterion:
• In scenario 1, the MLEs for σ and the ADEs for both η and δ are the best estimators.
• In scenario 2, the LSEs for σ and the CvMEs for both η and δ are the best estimators.
• In scenarios 3 and 6, the WLSEs are the best estimators for all three parameters.
• In scenarios 4 and 5, the MLEs for σ and the ADEs for both η and δ are the best estimators.
As expected, the bias and MSEs of all estimators decrease as the sample size increases.
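The structure of such a simulation study (generate samples through the quantile function, estimate, then average the errors over many runs) can be sketched as follows. This is a deliberately simplified illustration, not the authors' R code. It uses the one-parameter power function distribution F(x) = x^a as a stand-in for the PNPFD, because its MLE has the closed form â = −n / Σ log x_i and no numerical optimizer is needed. The function names, the seed, and the chosen sample sizes are assumptions.

```python
import math
import random
from typing import List, Tuple


def simulate_power_function(a: float, n: int, rng: random.Random) -> List[float]:
    """Inverse-transform sampling from F(x) = x**a, i.e. Q(u) = u**(1/a)."""
    # 1 - rng.random() lies in (0, 1], which avoids log(0) in the estimator.
    return [(1.0 - rng.random()) ** (1.0 / a) for _ in range(n)]


def mle_power_function(sample: List[float]) -> float:
    """Closed-form MLE of a for the stand-in model."""
    return -len(sample) / sum(math.log(x) for x in sample)


def bias_and_mse(a: float, n: int, runs: int = 5000,
                 seed: int = 1) -> Tuple[float, float]:
    """Monte Carlo estimates of the bias and MSE of the MLE at sample size n."""
    rng = random.Random(seed)
    estimates = [mle_power_function(simulate_power_function(a, n, rng))
                 for _ in range(runs)]
    bias = sum(e - a for e in estimates) / runs
    mse = sum((e - a) ** 2 for e in estimates) / runs
    return bias, mse


if __name__ == "__main__":
    for n in (20, 50, 200):
        bias, mse = bias_and_mse(a=2.0, n=n)
        print(f"n={n:4d}  bias={bias:.4f}  mse={mse:.4f}")
```

For the PNPFD itself, the closed-form estimator would be replaced by a numerical optimization of the chosen criterion (likelihood, least squares, weighted least squares, Anderson–Darling, or Cramér–von Mises), one run per estimator.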

Regression analysis
In this section, a novel regression model is presented that serves as an alternative to the Kumaraswamy and beta regression models. The quantile function in Eq. (17) is used to obtain this new regression model. Re-parameterizing the PDF and CDF of the PNPFD can be achieved by utilizing the quantile function. Let Q(p; σ, η, δ) = µ, from which the re-parameterization in terms of µ is acquired. The CDF and PDF of the re-parametrized distribution are then obtained, where the parameters η > 0 and δ > −1 characterize the PNPFD, while µ ∈ (0, 1) denotes the quantile regression parameter. The value of p is selected from the range (0, 1) and can be, for example, 0.25, 0.5, or 0.75. The random variable Y is denoted by Y ∼ QPNPF(η, δ, µ, p).
Once the QPNPF has been defined, the new regression model using the PDF of the QPNPF in Eq. (24) can be presented. Let y_1, y_2, ..., y_n be such that y_i is a realization of Y ∼ QPNPF(η, δ, µ_i, p) for i = 1, 2, ..., n, where η, δ, and µ_i are unknown parameters and p is known. In the proposed quantile regression model, β = (β_0, β_1, ..., β_p) is the unknown regression parameter vector, x_i = (1, x_{i1}, x_{i2}, ..., x_{ip}) is the known ith vector of covariates, and g is a link function. Because the QPNPF is defined on the interval (0, 1), we use the logit link function in Eq. (26), from which the expression for µ_i in Eq. (27) is obtained.
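The logit link maps a quantile µ in (0, 1) to the real line through g(µ) = log(µ / (1 − µ)), and its inverse maps a linear predictor back to (0, 1) through µ = exp(η) / (1 + exp(η)). A minimal sketch of this mapping is given below. The function names and the example coefficient values are hypothetical and do not come from the paper.

```python
import math
from typing import Sequence


def logit(mu: float) -> float:
    """Logit link g(mu) = log(mu / (1 - mu)) for mu in (0, 1)."""
    return math.log(mu / (1.0 - mu))


def inv_logit(eta: float) -> float:
    """Inverse logit, written to avoid overflow for large |eta|."""
    if eta >= 0.0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def quantile_from_covariates(beta: Sequence[float], x: Sequence[float]) -> float:
    """mu_i implied by the linear predictor sum_j beta_j * x_ij.

    x must start with 1.0 so that beta[0] acts as the intercept.
    """
    eta = sum(b * v for b, v in zip(beta, x))
    return inv_logit(eta)


if __name__ == "__main__":
    beta = [0.2, -0.5, 1.0]   # illustrative values only
    x_i = [1.0, 0.3, 0.8]     # intercept term plus two covariates
    print(quantile_from_covariates(beta, x_i))
```

Any other link with range (0, 1) for its inverse, such as the probit or complementary log-log link, could be used in the same way.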

Parameter estimation for regression parameters
In this section, the maximum likelihood estimation method is introduced for estimating the unknown regression parameters and model parameters. Let Y_1, Y_2, ..., Y_n be a random sample of size n from the QPNPF(η, δ, µ_i, p) distribution with realizations y_1, y_2, ..., y_n, where µ_i is given in Eq. (27) for i = 1, 2, ..., n.
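The general shape of such a likelihood can be sketched with a stand-in model. Since the QPNPF density is not reproduced here, the example below uses the quantile-parameterized Kumaraswamy distribution instead: for F(y) = 1 − (1 − y^a)^b, setting F(µ) = p gives b = log(1 − p) / log(1 − µ^a). This is an illustration of the technique under that substitution, not the authors' likelihood, and all names and numeric values are assumptions.

```python
import math
from typing import Sequence


def inv_logit(eta: float) -> float:
    """Inverse of the logit link, stable for large |eta|."""
    if eta >= 0.0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def kw_shape_from_quantile(mu: float, a: float, p: float) -> float:
    """Solve 1 - (1 - mu**a)**b = p for b (stand-in re-parameterization)."""
    return math.log(1.0 - p) / math.log(1.0 - mu ** a)


def kw_logpdf(y: float, a: float, b: float) -> float:
    """Log-density of the Kumaraswamy(a, b) distribution on (0, 1)."""
    return (math.log(a) + math.log(b) + (a - 1.0) * math.log(y)
            + (b - 1.0) * math.log(1.0 - y ** a))


def neg_log_likelihood(params: Sequence[float], ys: Sequence[float],
                       xs: Sequence[Sequence[float]], p: float = 0.5) -> float:
    """Negative log-likelihood; params = (log a, beta_0, beta_1, ...)."""
    a = math.exp(params[0])          # log scale keeps a positive
    beta = params[1:]
    total = 0.0
    for y, x in zip(ys, xs):
        mu = inv_logit(sum(b_j * x_j for b_j, x_j in zip(beta, x)))
        total += kw_logpdf(y, a, kw_shape_from_quantile(mu, a, p))
    return -total


if __name__ == "__main__":
    ys = [0.2, 0.6, 0.45]
    xs = [[1.0, 0.5], [1.0, -1.0], [1.0, 0.1]]
    print(neg_log_likelihood([0.0, 0.1, 0.2], ys, xs))
```

A function of this form would be passed to a general-purpose numerical optimizer to obtain the estimates. With the QPNPF density in place of the stand-in, the same structure applies.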

Real data analysis
In this section, three real data applications are examined for both the proposed distribution and novel regression model.

Practical examples for PNPFD
In this subsection, two practical data sets are analyzed to demonstrate the usability of the PNPFD. The Kumaraswamy (K) 24, unit-Weibull (UW) 25, unit-Burr XII (UBXII) 26, unit-Muth (UM) 27, and NPFD models are compared with the PNPFD. The maximum likelihood methodology is used to estimate the model parameters. The estimated log-likelihood (ℓ), Akaike information criterion (AIC), and Bayesian information criterion (BIC) are used to assess the goodness-of-fit of the distributions. Furthermore, the Kolmogorov–Smirnov (KS) statistic and the p-value of the KS statistic are calculated. The first data set concerns firm risk management cost-effectiveness and is available on the web page of Professor E. Frees (Wisconsin School of Business). The data are defined on (0, 1) and calculated as the total property and casualty premiums and uninsured losses as a percentage of the total assets. The first data set is also reported and analyzed by 34. Table 3 reports the modeling results for the first real data set.
The second data set gives the recovery rates of viable CD34+ cells in the 239 patients who agreed to an autologous peripheral blood stem cell transplant after myeloablative doses of chemotherapy. The CD34+ data are also investigated by 26. Results for the CD34+ data are given in Table 4.
When the modeling results for both real data sets are analyzed, Tables 3 and 4 clearly show that PNPFD is the best model among all models based on all criteria and statistics.Figures 3 and 4 present some goodness-of-fit graphs for real data modeling.In Figures 3 and 4, the fitted PDF, CDF, SF, and P-P plots of the PNPFD based on the first and second real datasets are illustrated in detail.Considering the fit in Figures 3 and 4, it is observed that the PNPFD is a suitable choice for modeling these two real datasets.
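The comparison criteria used in this subsection follow standard definitions: AIC = 2k − 2ℓ and BIC = k log(n) − 2ℓ, where k is the number of estimated parameters and n the sample size, and the KS statistic is the largest distance between the empirical CDF and the fitted CDF. The sketch below computes them with the standard library only. The function names are hypothetical, and the uniform CDF and the numbers in the usage lines are placeholders, not values from Tables 3 or 4.

```python
import math
from typing import Callable, Sequence


def aic(loglik: float, k: int) -> float:
    """Akaike information criterion: 2k - 2*loglik (smaller is better)."""
    return 2.0 * k - 2.0 * loglik


def bic(loglik: float, k: int, n: int) -> float:
    """Bayesian information criterion: k*log(n) - 2*loglik."""
    return k * math.log(n) - 2.0 * loglik


def ks_statistic(sample: Sequence[float],
                 cdf: Callable[[float], float]) -> float:
    """Two-sided KS distance between the empirical CDF and a fitted CDF."""
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n)
               for i, x in enumerate(xs))


if __name__ == "__main__":
    # Placeholder inputs for illustration only.
    print(aic(loglik=-10.0, k=3), bic(loglik=-10.0, k=3, n=100))
    print(ks_statistic([0.25, 0.75], lambda x: x))
```

Note that a KS p-value computed with parameters estimated from the same data is only approximate, since the usual reference distribution assumes a fully specified CDF.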

Practical example for QPNPFD
In this subsection, the usability of the new regression model is demonstrated through a real data application. For comparison purposes, the Kumaraswamy (Kw) 30, beta 29, log-extended exponential geometric (LEEG) 35, and transmuted unit Rayleigh (TUR) 36 regression models are utilized. The quantile parameter p is set to 0.5 for the QPNPFD, Kw, and LEEG regression models. The data are taken from 36 and can be found at https://stats.oecd.org/index.aspx?DataSetCode=BLI. Here, the percentage of educational attainment in the OECD countries (y) is considered as the dependent variable, and the percentage of voter turnout (x_1), homicide rate (x_2), and life satisfaction (x_3) as the independent variables. Detailed information about these data and some descriptive statistics can be found in 36. This application aims to reveal the relationship between y and x_1, x_2, and x_3.
The regression model is presented as logit(µ_i) = β_0 + β_1 x_{i1} + β_2 x_{i2} + β_3 x_{i3}, where µ_i represents the median for the QPNPFD, Kw, and LEEG models and the mean for the beta regression. Parameter estimates for the regression models, p-values for the significance of the model parameters, and log-likelihood results are presented in Table 5. From Table 5, the best regression model for the OECD data is the PNPFD model. For the PNPFD model, the parameters η, δ, and β_0 are statistically insignificant at the 5% level, and the other parameters β_1, β_2, and β_3 are statistically significant at the 5% level. The median response is positively affected by parameter β_3, whereas it is negatively affected by parameters β_1 and β_2. That is, an increase in life satisfaction is associated with an increase in the percentage of educational attainment, while an increase in voter turnout or homicide rate is associated with a decrease in the percentage of educational attainment.

Conclusion
This study aimed to introduce a new model capable of modeling and fitting data defined on (0, 1). This paper introduced a new unit model as an alternative to the Kumaraswamy and beta distributions. The new model's statistical and reliability features were discussed, including moments, stochastic ordering, the reliability function, the hazard rate function, order statistics, and the quantile function. Furthermore, the PNPFD has flexible shapes for its density and hazard functions. The probability density function plots reveal that the new distribution can be unimodal or J-shaped, while the hazard rate function can be decreasing, increasing, or bathtub-shaped. These objectives set the groundwork for a comprehensive investigation into the efficacy of the PNPFD compared to existing, well-known distributions.
The parameters of the model are estimated using various methods, and the performance of these methods is compared with a Monte Carlo simulation. According to the simulation study, the results of the estimators approach each other for large sample sizes. The simulation results indicate that, according to the bias criterion, the ADEs are typically the best estimators for the parameters σ and η, while the MLE is the most suitable estimator for the parameter δ. A novel regression analysis is introduced via the proposed distribution. Three real data analyses demonstrate the applicability and reliability of the new distribution and the new regression model, as evidenced by measures such as standard errors (SE) and p-values.
The modeling results and the accompanying figures also demonstrate that the new distribution fits the real data well. In conclusion, this study met its aim and showed the capability of the PNPFD to contribute to the field of statistics. The flexibility of the proposed regression model compared to existing regression models indicates that it is an effective model for situations where the dependent variable is a proportion. The outcomes presented here open paths for future research incorporating novel heuristic techniques for investigating disease dynamics, and they point to the PNPFD as a beneficial tool for researchers in diverse areas, including neuro-computational intelligence, nonlinear tumor-immune delayed models, nonlinear multi-delayed tumor oncolytic virotherapy systems, nonlinear influenza-A epidemic models, nonlinear multi-delay SVEIR epidemic systems, etc. We hope that this model will be used for data analysis in many different fields such as economics, engineering, medicine, etc. In addition to the methods we have discussed, several others, such as Bayesian estimation and the method of moments, can be employed to estimate the parameters and to assess the efficiency of the model. By applying these methods, future predictions can be made based on a data set, allowing for further analysis and application of the proposed model.

Lemma 1
If a is a positive real non-integer and |y| ≤ 1, then from Gradshteyn et al. 33, Equation (1.110), we get the binomial series expansion.

Lemma 2
The corresponding expansions are given for two cases: a is a positive real non-integer and |y/b| > 1, and a is a positive real non-integer and |y/b| < 1.

Figure 1. Plot of the PDF of the PNPFD for different parameter values.

Figure 2. Plot of the HRF of the PNPFD for different parameter values.

Figure 3. The fitted PDF, CDF, SF, and P-P plots for the PNPFD for the first data set.

Figure 4. The fitted PDF, CDF, SF, and P-P plots for the PNPFD for the second data set.

Table 1. The bias of all estimators for the PNPFD.

Table 2. The MSEs of all estimators for the PNPFD.

Table 3. The goodness-of-fit results for the first data set.

Table 4. The goodness-of-fit results for the second data set.

Table 5. Parameter estimates of the regression models for the OECD data with standard errors (SE) and log-likelihoods.