Introduction

Statistical distributions are fundamental tools in data modeling, inference, and estimation, and they are widely used in fields such as public health, actuarial science, biomedical studies, demography, and industrial reliability. Because existing distribution theory does not always provide a model that suits the data at hand, researchers have often had to settle for the most appropriate candidate among the available distributions. This limitation continues to motivate researchers in many fields to develop new distributions that support their conclusions. Applied researchers and practitioners often find modeling complex problems challenging, especially when dealing with the diverse lifetime datasets prevalent in the physical and natural sciences. Exhaustive reviews on this subject can be found in1,2, which offer comprehensive summaries of statistical distributions derived through various methodologies.

New statistical models built on attractive distributions have long been a favorite in the statistical literature due to the complexity and diversity of modern data. The extended distributions suggested by adding extra parameters provide greater flexibility.

Numerous studies have sought to build probability distributions with more flexible properties that can model real-life data sets of diverse kinds. The need for new distributions arises from theoretical considerations, practical applications, or both. There has been marked growth in generalizations of well-known distributions and in their use as competitors to classical models. The exponential distribution is well suited to describing lifetime data for many types of industrial items, and its main feature is that it models objects with a constant failure rate. The primary objective of this paper is to present a new model capable of fitting distinct forms of data and to show that it outperforms competing models on real data sets. When a known model cannot adequately describe the data, a generalization can be used to account for the additional variation. As data become more complex, further generalizations of probability distributions are needed to capture their behavior. Such generalized models can fit many real-life data sets well and can be applied to problems in areas such as medicine, engineering, and industrial reliability analysis.

Moreover, numerous families of probability distributions have been proposed by compounding techniques following the pioneering work of Adamidis and Loukas3. Compound distributions arise in reliability studies when the lifetime can be expressed as the minimum or maximum of a system of independent and identically distributed (i.i.d.) random variables representing the failure times of the system components. Compounding can extend well-known classical distributions and provide flexibility in modeling data. Several authors have proposed compounding lifetime distributions with power series (PS) distributions. Examples include the exponential-PS, Weibull-PS, generalized exponential PS, extended Weibull PS, Burr XII PS, Lindley PS, generalized inverse Weibull PS, and complementary exponentiated inverted Weibull PS distributions4,5,6,7,8,9,10,11.

Also, the power function (PF) distribution is a flexible lifetime distribution that may offer a suitable fit to some sets of failure data. Generalizations of the PF distribution include the beta PF12, Weibull PF13, Kumaraswamy PF14, transmuted PF (TPF)15, exponentiated Kumaraswamy PF16, exponentiated Weibull PF17, and odd generalized exponential PF18 distributions. In addition to the above-mentioned distributions, some nonlinear predictive network epidemic models have been introduced in the literature19,20,21,22,23.

The primary objective of this article is to introduce an advanced model designed for the modeling and fitting of data defined on (0,1). We aim to demonstrate the superiority of this new model by surpassing all existing competitors. We advocate for the proposed distribution as a robust and innovative choice for modeling real datasets. In situations where modeling with a known distribution proves challenging, the utilization of generalization becomes crucial to accommodate additional variations in the data.

This paper aims to develop a three-parameter alternative to several lifetime distributions, including the Kumaraswamy24, unit-Weibull25, unit-Burr XII26, unit-Muth27, and new power function28 distributions. In this context, we derive the statistical properties of the proposed distribution and show that it is a better model for the reliability analysis of data defined on (0,1).

In this paper, a new extended form of the new power function distribution (NPFD) is proposed by applying the power transformation \(X=T^{\frac{1}{\sigma }}\) to the cumulative distribution function (CDF) of the NPFD. The proposed distribution is called the power new power function distribution (PNPFD). The PNPFD accommodates increasing, bathtub, J-shaped, reverse J-shaped, and decreasing shapes. Its density can be left-skewed, unimodal, right-skewed, concave down, or constant. Furthermore, this paper examines the main statistical properties of the PNPFD. The analysis covers the shapes of the density function and hazard rate function, moments, incomplete moments, the moment generating function (MGF), order statistics, stochastic ordering, and parameter estimation by the maximum likelihood method. To illustrate the practical utility of the model, applications to real datasets are provided.

A classical regression model investigates the relationship between one or more independent variables and a dependent variable. Classical regression models relate the mean response to given values of the independent variables. When the dependent variable contains outliers, classical regression models can be inadequate. The median handles these scenarios better than the mean because it is a more robust estimate. For such cases, many quantile regression models have been introduced, such as the beta regression model by29, the Kumaraswamy regression model by30, the unit Weibull regression model by25, the unit Burr-XII regression model by26, the unit Burr-Hatke regression model by31, and the unit log-log regression model by32. This paper also introduces a new quantile regression model, based on the proposed distribution, as an alternative to the current ones.

In this paper, we propose a novel probability distribution tailored for data defined on the interval (0,1). This study contributes to the field of statistics by thoroughly examining the statistical and reliability features of the proposed distribution. By discussing moments, stochastic ordering, the reliability function, the hazard rate function, order statistics, and the quantile function, we give a comprehensive account of the PNPFD’s properties. Furthermore, we establish a framework for comparing the efficacy of the PNPFD against selected distributions such as the Kumaraswamy and beta distributions. This comparative analysis sets the stage for evaluating the PNPFD’s performance in various statistical applications. Through parameter estimation techniques and Monte Carlo simulations, we assess the precision and reliability of the PNPFD in handling real-world data. Additionally, a novel regression analysis technique based on the PNPFD expands the scope of statistical modeling, particularly in scenarios where the dependent variable is a proportion. Overall, this study presents a new distribution model and highlights its potential to enhance statistical analyses across diverse domains.

The rest of the paper is organized as follows. Section 2 (Model formulation) introduces the probability density function (PDF) and hazard rate function (HRF) of the PNPFD. Its statistical properties, such as the moment generating function (MGF), moments, mean residual life (MRL), order statistics, stochastic ordering, and quantile function, are investigated in Sect. 3 (Statistical properties). The estimation of the parameters is discussed in Sect. 4 (Estimation methods). The behavior of the estimators for different sample sizes is examined with simulated data sets in Sect. 5 (Numerical simulation). In Sect. 6 (Regression analysis), a novel quantile regression model based on the PNPFD is presented. In Sect. 7 (Real data analysis), real data sets are analyzed using the proposed distribution. Finally, the study is concluded in Sect. 8.

Model formulation

In 2021, Iqbal et al.28 derived a new statistical model called the new power function distribution (NPFD), with CDF defined as follows:

$$\begin{aligned} G(t)=1-\left( \frac{1-t}{\delta t+1}\right) ^{\eta },\quad ~0<t<1,~\eta >0,~-1<\delta <\infty . \end{aligned}$$
(1)

Its PDF is defined as follows:

$$\begin{aligned} g(t)=(\delta +1) \eta (1-t)^{\eta -1} (\delta t+1)^{-\eta -1}. \end{aligned}$$
(2)

The power transformation \(X=T^{\frac{1}{\sigma }}\) is applied to the CDF (1) to obtain the power new power function distribution (PNPFD), with CDF defined as follows:

$$\begin{aligned} F(x)=1-\left( \frac{1-x^{\sigma }}{\delta x^{\sigma }+1}\right) ^{\eta },~0<x<1,~\eta ,~\sigma >0,~-1<\delta <\infty . \end{aligned}$$
(3)

Differentiating the CDF (3) with respect to x gives the PDF of the PNPFD:

$$\begin{aligned} f(x)=\frac{(\delta +1) \eta \sigma x^{\sigma -1} \left( \frac{1-x^{\sigma }}{\delta x^{\sigma }+1}\right) ^{\eta }}{\left( 1-x^{\sigma }\right) \left( \delta x^{\sigma }+1\right) }. \end{aligned}$$
(4)
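The relationship between Eqs. (3) and (4) can be checked numerically. The following is a minimal Python sketch, not part of the original study: the function names, the midpoint integration rule, and the parameter values are illustrative assumptions. It verifies that the area under the PDF between two points matches the corresponding difference of the CDF.

```python
def pnpfd_cdf(x, sigma, eta, delta):
    """CDF in Eq. (3); valid for 0 < x < 1, sigma > 0, eta > 0, delta > -1."""
    u = x ** sigma
    return 1.0 - ((1.0 - u) / (delta * u + 1.0)) ** eta


def pnpfd_pdf(x, sigma, eta, delta):
    """PDF in Eq. (4), with the ratio term expanded into separate powers."""
    u = x ** sigma
    return ((delta + 1.0) * eta * sigma * x ** (sigma - 1.0)
            * (1.0 - u) ** (eta - 1.0) * (delta * u + 1.0) ** (-eta - 1.0))


def midpoint_integral(func, lower, upper, steps=20000):
    """Plain midpoint rule; adequate for a smooth integrand on a short interval."""
    width = (upper - lower) / steps
    return width * sum(func(lower + (i + 0.5) * width) for i in range(steps))


# Illustrative values of (sigma, eta, delta); these are assumptions, not fitted values.
params = (2.0, 1.5, 0.5)

# The area under the PDF between a and b should match F(b) - F(a).
a, b = 0.2, 0.7
area = midpoint_integral(lambda x: pnpfd_pdf(x, *params), a, b)
cdf_difference = pnpfd_cdf(b, *params) - pnpfd_cdf(a, *params)
print(area, cdf_difference)
```

The two printed numbers should agree up to the integration error of the midpoint rule.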

Figure 1 shows the PDF of the PNPFD for different combinations of the parameters \(\delta\), \(\eta\), and \(\sigma\). Figure 1a–d show that the PDF can be unimodal, increasing monotonically and then decreasing, for some parameter combinations. Figure 1b shows a PDF that is nearly constant initially and then increases rapidly as x increases (J-shaped), and Fig. 1a shows that the PDF can be left-skewed. Figure 1c,d show that the PDF of the PNPFD can be symmetric.

Figure 1 Plot of the PDF of the PNPFD for different parameter values.

Statistical properties

Mixture representation

The expansion of the PDF of the PNPFD proves valuable in deriving its properties. To facilitate this, we employ the following two lemmas:

Lemma 1

If \(\lambda\) is a positive real non-integer and \(\mid y \mid \le 1\), then from Gradshteyn et al.33, Equation (1.110), the binomial series expansion is

$$\begin{aligned} (1-y)^{\lambda -1} = \sum _{i=0}^{\infty }(-1)^i {\left( {\begin{array}{c}\lambda -1\\ i\end{array}}\right) }y^i. \end{aligned}$$

Lemma 2

If a is a positive real non-integer and \(\mid y^b \mid > 1\), then

$$\begin{aligned} (1+y^b)^{-a} = \sum _{k=0}^{\infty } (-1)^k {\left( {\begin{array}{c}a+k-1\\ k\end{array}}\right) }y^{-b(k+a)}, \end{aligned}$$

and if a is a positive real non-integer and \(\mid y^b \mid < 1\), then

$$\begin{aligned} (1+y^b)^{-a} = \sum _{k=0}^{\infty } (-1)^k {\left( {\begin{array}{c}a+k-1\\ k\end{array}}\right) }y^{bk}. \end{aligned}$$

Using Lemmas 1 and  2, the expansion of PDF of the PNPFD can be derived as follows.

Case I: \(0<\delta x^{\sigma }<1\), we have

$$\begin{aligned} f(x)=(\delta +1) \eta \sigma \sum _{j=0}^{\infty } \sum _{k=0}^{\infty } (-1)^{j+k}\delta ^{k}{\left( {\begin{array}{c}\eta -1\\ j\end{array}}\right) } {\left( {\begin{array}{c}\eta +k\\ k\end{array}}\right) }x^{\sigma (j+k+1) -1}. \end{aligned}$$
(5)

Case II: \(\delta x^{\sigma }>1\), we have

$$\begin{aligned} f(x)=(\delta +1)\eta \sigma \sum _{j=0}^{\infty } \sum _{k=0}^{\infty } (-1)^{j+k}\delta ^{-(k+\eta +1)}{\left( {\begin{array}{c}\eta -1\\ j\end{array}}\right) } {\left( {\begin{array}{c}\eta +k\\ k\end{array}}\right) }x^{\sigma (j-k-\eta )-1}. \end{aligned}$$
(6)

Reliability characteristics of the PNPFD

The reliability function (rf) of the PNPFD is given by

$$\begin{aligned} R(x)=\left[ \frac{(1-x^{\sigma })}{(\delta x^{\sigma }+1)}\right] ^\eta . \end{aligned}$$
(7)

The HRF of the PNPFD is given by

$$\begin{aligned} H(x)=\frac{(\delta +1) \eta \sigma x^{\sigma -1} }{(1-x^{\sigma })(\delta x^{\sigma }+1)}. \end{aligned}$$
(8)

Figure 2 gives examples of the shapes of the hazard rate function of the proposed model for different values of \(\delta\), \(\eta\), and \(\sigma\). Figure 2a,c show that the HRF of the PNPFD can be increasing. Figure 2b shows that it can be decreasing, and Fig. 2d shows that it can be bathtub-shaped, depending on the values of the parameters.

Figure 2 Plot of the HRF of the PNPFD for different parameter values.

The reverse hazard rate function (rhrf) of the PNPFD is given by

$$\begin{aligned} W(x)=\frac{\eta \sigma (\delta +1) x^{\sigma -1}(1-x^{\sigma })^{\eta -1} }{(\delta x^{\sigma }+1)\left[ (\delta x^{\sigma }+1)^\eta -(1-x^{\sigma })^\eta \right] }. \end{aligned}$$
(9)
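The reliability characteristics in Eqs. (7)–(9) can be evaluated directly. The sketch below is an illustration in Python rather than the authors' code: the function names, the grid-based `hazard_trend` helper, and all parameter values are assumptions. It also uses the fact that both \(H(x)R(x)\) and \(W(x)F(x)\) equal the density \(f(x)\).

```python
def pnpfd_reliability(x, sigma, eta, delta):
    """Reliability function R(x) in Eq. (7)."""
    u = x ** sigma
    return ((1.0 - u) / (delta * u + 1.0)) ** eta


def pnpfd_hazard(x, sigma, eta, delta):
    """Hazard rate function H(x) in Eq. (8)."""
    u = x ** sigma
    return ((delta + 1.0) * eta * sigma * x ** (sigma - 1.0)
            / ((1.0 - u) * (delta * u + 1.0)))


def pnpfd_reverse_hazard(x, sigma, eta, delta):
    """Reverse hazard rate function W(x) in Eq. (9)."""
    u = x ** sigma
    numerator = (eta * sigma * (delta + 1.0) * x ** (sigma - 1.0)
                 * (1.0 - u) ** (eta - 1.0))
    denominator = (delta * u + 1.0) * ((delta * u + 1.0) ** eta - (1.0 - u) ** eta)
    return numerator / denominator


def hazard_trend(sigma, eta, delta, points=199):
    """Classify the hazard on an evenly spaced grid inside (0, 1).

    This is a rough grid-based check, not a proof of monotonicity.
    """
    grid = [(i + 1) / (points + 1) for i in range(points)]
    values = [pnpfd_hazard(x, sigma, eta, delta) for x in grid]
    steps = [later - earlier for earlier, later in zip(values, values[1:])]
    if all(step > 0 for step in steps):
        return "increasing"
    if all(step < 0 for step in steps):
        return "decreasing"
    return "non-monotone"


# Illustrative values of (sigma, eta, delta); these are assumptions.
params = (2.0, 1.5, 0.5)
x0 = 0.4
survival = pnpfd_reliability(x0, *params)
density_from_hazard = pnpfd_hazard(x0, *params) * survival
density_from_reverse = pnpfd_reverse_hazard(x0, *params) * (1.0 - survival)
print(density_from_hazard, density_from_reverse)
print(hazard_trend(1.0, 1.0, 0.0), hazard_trend(0.5, 1.0, 0.0))
```

With \(\sigma =\eta =1\) and \(\delta =0\) the hazard reduces to \(1/(1-x)\), which is increasing; with \(\sigma =0.5\) it is large near both ends of the interval, so the grid check reports a non-monotone shape.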

Moments

The \(r^{th}\) moment \(E(X^r)\) of PNPFD is given by

Case I: \(0<\delta x^{\sigma }<1\)

$$\begin{aligned} E(X^r) =(\delta +1) \eta \sigma \sum _{j=0}^{\infty } \sum _{k=0}^{\infty } \frac{(-1)^{j+k}\delta ^{k}}{\sigma (j+k+1)+r}{\left( {\begin{array}{c}\eta -1\\ j\end{array}}\right) } {\left( {\begin{array}{c}\eta +k\\ k\end{array}}\right) }. \end{aligned}$$
(10)

Case II: \(\delta x^{\sigma }>1\),

$$\begin{aligned} E(X^r) =(\delta +1)\eta \sigma \sum _{j=0}^{\infty } \sum _{k=0}^{\infty } \frac{(-1)^{j+k}\delta ^{-(k+\eta +1)}}{\sigma (j-k-\eta )+r}{\left( {\begin{array}{c}\eta -1\\ j\end{array}}\right) } {\left( {\begin{array}{c}\eta +k\\ k\end{array}}\right) }. \end{aligned}$$
(11)

The first four moments of the PNPFD are obtained by substituting \(r=1,2,3,4\) in Eqs. (10) and (11).
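The Case I series for \(E(X^r)\) can be compared with direct numerical integration of \(x^r f(x)\). The following Python sketch is an illustration under stated assumptions: it takes \(\mid \delta \mid <1\) so that the Case I condition holds on the whole interval (0,1), it truncates both sums, and the helper names and parameter values are hypothetical. An integer \(\eta\) is used in the example only because the sum over j then terminates, which makes the comparison easy to check.

```python
def generalized_binomial(a, k):
    """Generalized binomial coefficient C(a, k) for real a and integer k >= 0."""
    coefficient = 1.0
    for i in range(k):
        coefficient *= (a - i) / (i + 1)
    return coefficient


def moment_series(r, sigma, eta, delta, terms=150):
    """Truncated Case I double series for E(X^r); assumes |delta| < 1.

    For non-integer eta the sum over j converges slowly, so `terms`
    may need to be increased.
    """
    j_weights = [(-1) ** j * generalized_binomial(eta - 1.0, j) for j in range(terms)]
    k_weights = [(-delta) ** k * generalized_binomial(eta + k, k) for k in range(terms)]
    total = 0.0
    for j, weight_j in enumerate(j_weights):
        if weight_j == 0.0:
            continue  # the j-sum terminates when eta is a positive integer
        for k, weight_k in enumerate(k_weights):
            total += weight_j * weight_k / (sigma * (j + k + 1) + r)
    return (delta + 1.0) * eta * sigma * total


def moment_numeric(r, sigma, eta, delta, steps=20000):
    """Midpoint-rule approximation of the integral of x^r f(x) over (0, 1)."""
    width = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * width
        u = x ** sigma
        density = ((delta + 1.0) * eta * sigma * x ** (sigma - 1.0)
                   * (1.0 - u) ** (eta - 1.0) * (delta * u + 1.0) ** (-eta - 1.0))
        total += x ** r * density
    return total * width


# Illustrative values; eta = 2 makes the j-sum finite.
sigma, eta, delta = 2.0, 2.0, 0.5
series_value = moment_series(1, sigma, eta, delta)
numeric_value = moment_numeric(1, sigma, eta, delta)
print(series_value, numeric_value)
```

Setting \(r=0\) in the same functions returns a value close to 1, which is a quick sanity check that the density integrates to one.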

Moment generating function

The MGF of a PNPFD random variable is given as follows.

Case I: \(0<\delta x^{\sigma }<1\)

$$\begin{aligned} M_X(t) = \sum _{r=0}^{\infty }\frac{t^r}{r!}\mu _r' = (\delta +1) \eta \sigma \sum _{r=0}^{\infty } \sum _{j=0}^{\infty } \sum _{k=0}^{\infty } \frac{t^r}{r!} \frac{(-1)^{j+k}\delta ^{k}}{\sigma (j+k+1)+r}{\left( {\begin{array}{c}\eta -1\\ j\end{array}}\right) } {\left( {\begin{array}{c}\eta +k\\ k\end{array}}\right) }. \end{aligned}$$

Case II: \(\delta x^{\sigma }>1\)

$$\begin{aligned} M_X(t) = \sum _{r=0}^{\infty }\frac{t^r}{r!}\mu _r' = (\delta +1)\eta \sigma \sum _{r=0}^{\infty } \sum _{j=0}^{\infty } \sum _{k=0}^{\infty } \frac{t^r}{r!} \frac{(-1)^{j+k}\delta ^{-(k+\eta +1)}}{\sigma (j-k-\eta )+r}{\left( {\begin{array}{c}\eta -1\\ j\end{array}}\right) } {\left( {\begin{array}{c}\eta +k\\ k\end{array}}\right) }. \end{aligned}$$

Incomplete moment

The incomplete \(r^{th}\) moment is defined by

$$\begin{aligned} m_r(x) = \int _{0}^{x} u^r f(u)\, du. \end{aligned}$$

For the PNPFD, the incomplete \(r^{th}\) moment is obtained as follows.

Case I: \(0<\delta x^{\sigma }<1\)

$$\begin{aligned} m_r(x) = (\delta +1) \eta \sigma \sum _{j=0}^{\infty } \sum _{k=0}^{\infty } \frac{(-1)^{j+k}\delta ^{k}}{\sigma (j+k+1)+r}{\left( {\begin{array}{c}\eta -1\\ j\end{array}}\right) } {\left( {\begin{array}{c}\eta +k\\ k\end{array}}\right) }x^{\sigma (j+k+1)+r}. \end{aligned}$$
(12)

Case II: \(\delta x^{\sigma }>1\)

$$\begin{aligned} m_r(x) = (\delta +1)\eta \sigma \sum _{j=0}^{\infty } \sum _{k=0}^{\infty } \frac{(-1)^{j+k}\delta ^{-(k+\eta +1)}}{\sigma (j-k-\eta )+r}{\left( {\begin{array}{c}\eta -1\\ j\end{array}}\right) } {\left( {\begin{array}{c}\eta +k\\ k\end{array}}\right) }x^{\sigma (j-k-\eta )+r}. \end{aligned}$$
(13)

Mean residual life function

The mean residual life (MRL) function is important in reliability and survival analysis. It gives the expected remaining lifetime of a system that has already survived up to time x, and it is obtained by integrating the conditional reliability \(R(x+t)/R(x)\) over \(0<t<1-x\). For the PNPFD, this conditional reliability is

$$\begin{aligned} \phi (t\mid x) =\frac{R(x+t)}{R(x)} =\Bigg [\frac{(\delta x^{\sigma }+1)(1-(x+t)^{\sigma })}{(\delta (x+t)^{\sigma } +1)(1-x^{\sigma })}\Bigg ]^\eta ,\quad 0<t<1-x. \end{aligned}$$
(14)
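Since the mean residual life is the integral of the conditional reliability \(R(x+t)/R(x)\) over the remaining support, it can be approximated numerically. The Python sketch below is illustrative only: the function names and the midpoint rule are assumptions, and the special case \(\sigma =\eta =1\), \(\delta =0\) is used because the PNPFD then reduces to the uniform distribution on (0,1), whose mean residual life is \((1-x)/2\).

```python
def pnpfd_reliability(x, sigma, eta, delta):
    """Reliability function R(x) of the PNPFD, as in Eq. (7)."""
    u = x ** sigma
    return ((1.0 - u) / (delta * u + 1.0)) ** eta


def mean_residual_life(x, sigma, eta, delta, steps=20000):
    """Approximate the integral of R(s) over (x, 1), divided by R(x)."""
    width = (1.0 - x) / steps
    area = width * sum(
        pnpfd_reliability(x + (i + 0.5) * width, sigma, eta, delta)
        for i in range(steps)
    )
    return area / pnpfd_reliability(x, sigma, eta, delta)


# Uniform special case: sigma = eta = 1 and delta = 0 give R(x) = 1 - x.
uniform_mrl = mean_residual_life(0.3, 1.0, 1.0, 0.0)

# Illustrative (assumed) parameter values for a non-trivial case.
example_mrl = mean_residual_life(0.3, 2.0, 1.5, 0.5)
print(uniform_mrl, example_mrl)
```

The first value should be close to 0.35; the second has no closed form here and is shown only as an example of the computation.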

PDF and CDF of order statistics

The order statistics of a distribution are derived by arranging the sample values in ascending order. The PDF of the \(r^{th}\) order statistic is expressed as:

$$\begin{aligned} f_{r:n}(x) = C_{r:n} [F(x)]^{r-1}[1-F(x)]^{n-r}f(x). \end{aligned}$$

where, \(C_{r:n}= \frac{n!}{(r-1)! (n-r)!}\)

Using Eqs. (3) and (4), the PDF of the \(r^{th}\) order statistic of the PNPFD is given as

$$\begin{aligned} f_{r:n}(x) =C_{r:n} (\delta +1) \eta \sigma x^{\sigma -1}\Bigg [1-\left( \frac{1-x^{\sigma }}{\delta x^{\sigma }+1}\right) ^{\eta }\Bigg ]^{r-1 }(1-x^{\sigma })^{\eta (n-r+1)-1}(\delta x^{\sigma }+1)^{-\eta (n-r+1)-1}. \end{aligned}$$
(15)

Moreover, the CDF of the \(r^{th}\) order statistic is given as

$$\begin{aligned} F_{r:n}(x) = \sum _{m=r}^{n}\left( {\begin{array}{c}n\\ m\end{array}}\right) [F(x)]^{m}[1-F(x)]^{n-m}. \end{aligned}$$

Using Eq. (3), the CDF of the \(r^{th}\) order statistic of the PNPFD is given as

$$\begin{aligned} F_{r:n}(x) = \sum _{m=r}^{n}\left( {\begin{array}{c}n\\ m\end{array}}\right) \Bigg [1-\left( \frac{1-x^{\sigma }}{\delta x^{\sigma }+1}\right) ^{\eta }\Bigg ]^{m}\Bigg [\frac{1-x^{\sigma }}{\delta x^{\sigma }+1}\Bigg ]^{\eta (n-m)}. \end{aligned}$$
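The order-statistic formulas can be sketched in a few lines of Python. This is an illustration, not the authors' implementation: the function names and parameter values are assumptions. The check at the end uses the fact that the density of the \(r^{th}\) order statistic is the derivative of its CDF.

```python
import math


def pnpfd_cdf(x, sigma, eta, delta):
    """CDF of the PNPFD, Eq. (3)."""
    u = x ** sigma
    return 1.0 - ((1.0 - u) / (delta * u + 1.0)) ** eta


def pnpfd_pdf(x, sigma, eta, delta):
    """PDF of the PNPFD, Eq. (4)."""
    u = x ** sigma
    return ((delta + 1.0) * eta * sigma * x ** (sigma - 1.0)
            * (1.0 - u) ** (eta - 1.0) * (delta * u + 1.0) ** (-eta - 1.0))


def order_stat_pdf(x, r, n, sigma, eta, delta):
    """Density of the r-th order statistic in a sample of size n."""
    c_rn = math.factorial(n) / (math.factorial(r - 1) * math.factorial(n - r))
    F = pnpfd_cdf(x, sigma, eta, delta)
    return c_rn * F ** (r - 1) * (1.0 - F) ** (n - r) * pnpfd_pdf(x, sigma, eta, delta)


def order_stat_cdf(x, r, n, sigma, eta, delta):
    """CDF of the r-th order statistic: at least r of n observations are <= x."""
    F = pnpfd_cdf(x, sigma, eta, delta)
    return sum(math.comb(n, m) * F ** m * (1.0 - F) ** (n - m)
               for m in range(r, n + 1))


# Illustrative (assumed) settings.
params = (2.0, 1.5, 0.5)
r, n, x0, h = 2, 5, 0.6, 1e-6

# Central difference of the CDF should reproduce the density.
slope = (order_stat_cdf(x0 + h, r, n, *params)
         - order_stat_cdf(x0 - h, r, n, *params)) / (2.0 * h)
density = order_stat_pdf(x0, r, n, *params)
print(slope, density)
```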

Stochastic ordering

A random variable X is said to be smaller than a random variable Y in one of the following orders if the corresponding condition holds for all x:

  (i) Hazard rate order: \(X \le _{hr} Y\) if \(h_X(x) \ge h_Y(x)\).

  (ii) Stochastic order: \(X \le _{st} Y\) if \(F_X(x) \ge F_Y(x)\).

  (iii) Mean residual life order: \(X \le _{mrl} Y\) if \(M_X(x) \le M_Y(x)\).

  (iv) Likelihood ratio order: \(X \le _{lr} Y\) if \(\frac{f_X(x)}{f_Y(x)}\) is decreasing in x.

Theorem 1

Let \(X \sim PNPFD(\sigma _1,\eta _1,{\delta }_{1})\) and \(Y \sim PNPFD(\sigma _2,\eta _2,{\delta }_{2})\). If \(\sigma _1 \le \sigma _2,\eta _1 \le \eta _2, \delta _1 \le {\delta }_2,\) then \(X \le _{lr} Y\), and hence \(X \le _{hr} Y\), \(X \le _{mrl} Y\), and \(X \le _{st} Y\).

Proof

To prove that \(\frac{f_X(x)}{f_Y(x)}\) is decreasing in x, we have to show that its derivative is less than 0.

$$\begin{aligned} \begin{aligned} \frac{f_X(x)}{f_Y(x)} =\frac{\frac{(\delta _1 +1) \eta _1 \sigma _1 x^{\sigma _1 -1} \left( \frac{1-x^{\sigma _1 }}{\delta _1 x^{\sigma _1 }+1}\right) ^{\eta _1 }}{\left( 1-x^{\sigma _1 }\right) \left( \delta _1 x^{\sigma _1 }+1\right) }}{\frac{(\delta _2 +1) \eta _2 \sigma _2 x^{\sigma _2 -1} \left( \frac{1-x^{\sigma _2}}{\delta _2 x^{\sigma _2 }+1}\right) ^{\eta _2}}{\left( 1-x^{\sigma _2}\right) \left( \delta _2 x^{\sigma _2 }+1\right) }}. \end{aligned} \end{aligned}$$

Because the logarithm is an increasing function, it is equivalent to show that the derivative of the logarithm of \(\frac{f_X(x)}{f_Y(x)}\) is less than 0:

$$\begin{aligned} \frac{d}{dx} \ln \left( \frac{f_X(x)}{f_Y(x)}\right) =\frac{\sigma _1-\sigma _2}{x}-\sigma _1x^{\sigma _1-1} \Bigg [\frac{\eta _1-1}{1-x^{\sigma _1}}+\delta _1\frac{\eta _1+1}{1+\delta _1x^{\sigma _1}}\Bigg ]+\sigma _2x^{\sigma _2-1} \Bigg [\frac{\eta _2-1}{1-x^{\sigma _2}}+\delta _2 \frac{\eta _2+1}{1+\delta _2x^{\sigma _2}}\Bigg ]. \end{aligned}$$
(16)

which is less than 0 when \(\sigma _1 \le \sigma _2,\eta _1 \le \eta _2, \delta _1 \le {\delta }_2\). Hence \(Y \ge _{lr} X\), and therefore \(Y \ge _{hr} X\), \(Y \ge _{mrl} X\), and \(Y \ge _{st} X\) when X and Y follow the PNPFD. \(\square\)

Quantile function

The quantile function (QF) of the PNPFD is obtained by inverting the CDF (3) as follows

$$\begin{aligned} Q(p)=\left( -\frac{(1-p)^{1/\eta }-1}{\delta (1-p)^{1/\eta }+1}\right) ^{1/\sigma },\quad 0<p<1. \end{aligned}$$
(17)
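Because the quantile function has a closed form, samples from the PNPFD can be drawn by the inverse transform method, which is also how the simulation study generates its data. The following Python sketch is illustrative: the function names, the seed, and the parameter values are assumptions.

```python
import random


def pnpfd_cdf(x, sigma, eta, delta):
    """CDF of the PNPFD, Eq. (3)."""
    u = x ** sigma
    return 1.0 - ((1.0 - u) / (delta * u + 1.0)) ** eta


def pnpfd_quantile(p, sigma, eta, delta):
    """Quantile function Q(p) of Eq. (17) for 0 < p < 1."""
    a = (1.0 - p) ** (1.0 / eta)
    return ((1.0 - a) / (delta * a + 1.0)) ** (1.0 / sigma)


def pnpfd_sample(size, sigma, eta, delta, seed=0):
    """Inverse transform sampling: apply Q to independent uniform draws."""
    generator = random.Random(seed)
    return [pnpfd_quantile(generator.random(), sigma, eta, delta)
            for _ in range(size)]


# Illustrative (assumed) parameter values.
params = (2.0, 1.5, 0.5)

# Round trip: F(Q(p)) should return p.
round_trip = pnpfd_cdf(pnpfd_quantile(0.3, *params), *params)
draws = pnpfd_sample(1000, *params)
print(round_trip, min(draws), max(draws))
```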

Estimation methods

In this section, five estimation methods, namely maximum likelihood, least squares, weighted least squares, Anderson–Darling, and Cramér–von Mises, are considered for estimating the parameters \(\sigma , \eta\) and \(\delta\) of the PNPFD. Let \(X_{1},X_{2},\ldots ,X_{n}\) be a random sample from the \(PNPFD\left( \sigma ,\eta ,\delta \right)\) distribution with observed values \(x_{1},x_{2},\ldots ,x_{n}\). Let \(X_{\left( 1\right) },X_{\left( 2\right) },\ldots ,X_{\left( n\right) }\) denote the order statistics of the sample \(X_{1},X_{2},\ldots ,X_{n}\), with realizations \(x_{\left( 1\right) },x_{\left( 2\right) },\ldots ,x_{\left( n\right) }\). The likelihood and log-likelihood functions are given by

$$\begin{aligned} L\left( \Xi \right) =\left( 1+\delta \right) ^{n}\left( \eta \sigma \right) ^{n}\prod \limits _{i=1}^{n}\frac{x_{i}^{-1+\sigma }\left( \frac{ 1-x_{i}^{\sigma }}{1+x_{i}^{\sigma }\delta }\right) ^{\eta }}{\left( 1-x_{i}^{\sigma }\right) \left( 1+x_{i}^{\sigma }\delta \right) }. \end{aligned}$$

and

$$\begin{aligned} \ell \text { }\left( \Xi \right)= & {} n\log \left( 1+\delta \right) +n\log \left( \eta \right) +n\log \left( \sigma \right) +\left( \sigma -1\right) \sum \limits _{i=1}^{n}\log \left( x_{i}\right) \\{} & {} +\eta \text { }\sum \limits _{i=1}^{n}\log \left( \frac{1-x_{i}^{\sigma }}{ 1+x_{i}^{\sigma }\delta }\right) -\sum \limits _{i=1}^{n}\log \left( 1-x_{i}^{\sigma }\right) -\sum \limits _{i=1}^{n}\log \left( 1+x_{i}^{\sigma }\delta \right) . \end{aligned}$$

where \(\Xi =\left( \sigma ,\eta ,\delta \right)\). The maximum likelihood estimate (MLE) of \(\Xi\), say \({\widehat{\Xi }}=\left( \widehat{ \sigma },{\widehat{\eta }},{\widehat{\delta }}\right)\), is obtained as follows:

$$\begin{aligned} {\widehat{\Xi }}=\underset{\left( \sigma ,\eta ,\delta \right) \in \left( 0,\infty \right) \times \left( 0,\infty \right) \times \left( -1,\infty \right) }{\arg \max }\ \ell \left( \Xi \right) . \end{aligned}$$
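The log-likelihood above translates directly into code. The following Python sketch is an illustration rather than the authors' implementation: the function names, the small data vector, and the parameter values are hypothetical. It checks the closed-form expression against the sum of the logarithms of the PDF (4).

```python
import math


def pnpfd_pdf(x, sigma, eta, delta):
    """PDF of the PNPFD, Eq. (4)."""
    u = x ** sigma
    return ((delta + 1.0) * eta * sigma * x ** (sigma - 1.0)
            * (1.0 - u) ** (eta - 1.0) * (delta * u + 1.0) ** (-eta - 1.0))


def pnpfd_loglik(params, sample):
    """Log-likelihood of (sigma, eta, delta) for observations in (0, 1)."""
    sigma, eta, delta = params
    n = len(sample)
    powers = [x ** sigma for x in sample]
    return (n * math.log(1.0 + delta) + n * math.log(eta) + n * math.log(sigma)
            + (sigma - 1.0) * sum(math.log(x) for x in sample)
            + eta * sum(math.log((1.0 - u) / (1.0 + delta * u)) for u in powers)
            - sum(math.log(1.0 - u) for u in powers)
            - sum(math.log(1.0 + delta * u) for u in powers))


# Hypothetical observations and parameter values, for illustration only.
sample = [0.12, 0.35, 0.41, 0.58, 0.77]
params = (2.0, 1.5, 0.5)

closed_form = pnpfd_loglik(params, sample)
direct_sum = sum(math.log(pnpfd_pdf(x, *params)) for x in sample)
print(closed_form, direct_sum)
```

To obtain estimates, the negative of `pnpfd_loglik` would be passed to a general-purpose numerical optimizer subject to \(\sigma ,\eta >0\) and \(\delta >-1\); the simulation study in this paper uses the BFGS method of the optim function in R for that step.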

The other estimators are obtained from the following four objective functions:

$$\begin{aligned} LS\left( \Xi \right)= & {} \sum \limits _{i=1}^{n}\left( \left( 1-\left( \frac{ 1-x_{\left( i\right) }^{\sigma }}{1+x_{\left( i\right) }^{\sigma }\delta } \right) ^{\eta }\right) -\frac{i}{n+1}\right) ^{2} . \end{aligned}$$
(18)
$$\begin{aligned} WLS\left( \Xi \right)= & {} \sum \limits _{i=1}^{n}\frac{\left( n+2\right) \left( n+1\right) ^{2}}{i\left( n-i+1\right) }\left( \left( 1-\left( \frac{ 1-x_{\left( i\right) }^{\sigma }}{1+x_{\left( i\right) }^{\sigma }\delta } \right) ^{\eta }\right) -\frac{i}{n+1}\right) ^{2} . \end{aligned}$$
(19)
$$\begin{aligned} AD\left( \Xi \right)= & {} -n-\frac{1}{n}\sum \limits _{i=1}^{n}\left( 2i-1\right) \left[ \log \left\{ 1-\left( \frac{1-x_{\left( i\right) }^{\sigma }}{1+x_{\left( i\right) }^{\sigma }\delta }\right) ^{\eta }\right\} \right. \nonumber \\{} & {} \left. +\log \left\{ \left( \frac{1-x_{\left( n-i+1\right) }^{\sigma }}{ 1+x_{\left( n-i+1\right) }^{\sigma }\delta }\right) ^{\eta }\right\} \right] . \end{aligned}$$
(20)

and

$$\begin{aligned} CvM\left( \Xi \right) =\frac{1}{12n}+\sum \limits _{i=1}^{n}\left[ \left( 1-\left( \frac{1-x_{\left( i\right) }^{\sigma }}{1+x_{\left( i\right) }^{\sigma }\delta }\right) ^{\eta }\right) -\frac{2i-1}{2n}\right] ^{2}. \end{aligned}$$
(21)

The least squares estimates (LSEs), weighted least squares estimates (WLSEs), Anderson–Darling estimates (ADEs), and Cramér–von Mises estimates (CvMEs) are obtained by minimizing Eqs. (18)–(21), respectively.
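The four objective functions can be written compactly once the fitted CDF has been evaluated at the ordered sample. The Python sketch below is illustrative: the function names are assumptions, and the Anderson–Darling function uses the standard form of that statistic, which pairs \(x_{(i)}\) with \(x_{(n-i+1)}\). The example sample is built from the quantile function at the plotting positions \(i/(n+1)\), so the least squares objective vanishes at the generating parameters.

```python
import math


def pnpfd_cdf(x, sigma, eta, delta):
    """CDF of the PNPFD, Eq. (3)."""
    u = x ** sigma
    return 1.0 - ((1.0 - u) / (delta * u + 1.0)) ** eta


def pnpfd_quantile(p, sigma, eta, delta):
    """Quantile function of the PNPFD, Eq. (17)."""
    a = (1.0 - p) ** (1.0 / eta)
    return ((1.0 - a) / (delta * a + 1.0)) ** (1.0 / sigma)


def fitted_cdf(params, sample):
    """Fitted CDF evaluated at the ordered observations."""
    return [pnpfd_cdf(x, *params) for x in sorted(sample)]


def ls_objective(params, sample):
    n = len(sample)
    F = fitted_cdf(params, sample)
    return sum((F[i - 1] - i / (n + 1)) ** 2 for i in range(1, n + 1))


def wls_objective(params, sample):
    n = len(sample)
    F = fitted_cdf(params, sample)
    return sum((n + 2) * (n + 1) ** 2 / (i * (n - i + 1))
               * (F[i - 1] - i / (n + 1)) ** 2 for i in range(1, n + 1))


def ad_objective(params, sample):
    n = len(sample)
    F = fitted_cdf(params, sample)
    # F[n - i] is the fitted CDF at x_(n-i+1) in one-based notation.
    total = sum((2 * i - 1) * (math.log(F[i - 1]) + math.log(1.0 - F[n - i]))
                for i in range(1, n + 1))
    return -n - total / n


def cvm_objective(params, sample):
    n = len(sample)
    F = fitted_cdf(params, sample)
    return 1.0 / (12 * n) + sum((F[i - 1] - (2 * i - 1) / (2 * n)) ** 2
                                for i in range(1, n + 1))


# Idealized sample placed at the plotting positions i / (n + 1); assumed values.
true_params = (2.0, 1.5, 0.5)
n = 9
sample = [pnpfd_quantile(i / (n + 1), *true_params) for i in range(1, n + 1)]
print(ls_objective(true_params, sample), wls_objective(true_params, sample))
print(ad_objective(true_params, sample), cvm_objective(true_params, sample))
```

Each function takes the parameter vector first so that it can be handed unchanged to a numerical minimizer.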

Numerical simulation

In this section, the bias and mean squared errors (MSEs) of MLEs, LSEs, WLSEs, ADEs, and CvMEs for parameters of the PNPFD are obtained via 5000 runs. For generating samples for the PNPFD in the simulation experiment, the quantile function provided in Eq. (17) is used. Furthermore, optimization procedures for obtaining estimations from the generated samples are performed using the BFGS method in the optim function in R. Six different scenarios are evaluated for parameter settings. These are \(\Xi _{1}=\left( 0.5,1.5,-0.5\right)\), \(\Xi _{2}=\left( 2,1.5,-0.5\right) ,\) \(\Xi _{3}=\left( 1.5,0.5,2\right)\), \(\Xi _{4}=\left( 3,1.5,2\right) ,\) \(\Xi _{5}=\left( 0.5,2.5,-0.7\right)\) and \(\Xi _{6}=\left( 2.5,0.7,1.5\right)\). The simulation results are given in Tables 1 and 2. Tables 1 and 2 show that the bias and MSEs decrease as the sample size increases for all estimators. According to the bias criterion, the best estimator for the parameters of \(\sigma\) and \(\eta\) is usually ADEs, while the best estimator for the \(\delta\) parameter is MLEs. When scenarios are analyzed in detail, the following interpretations can be made for the MSEs criterion:

  • In scenario \(\Xi _{1}\), the MLEs for \(\sigma\) and ADEs for both \(\eta\) and \(\delta\) are the best estimators.

  • In scenario \(\Xi _{2}\), the LSEs for \(\sigma\) and CvMEs for both \(\eta\) and \(\delta\) are the best estimators.

  • In scenarios \(\Xi _{3}\) and \(\Xi _{6}\), the WLSEs are the best estimators for all three parameters.

  • In scenarios \(\Xi _{4}\) and \(\Xi _{5}\), the MLEs for \(\sigma\) and ADEs for both \(\eta\) and \(\delta\) are the best estimators.

It is observed that the decreasing trend in bias and MSEs for all estimators is achieved as expected with the increase in sample size.

Table 1 The bias of all estimators for PNPFD.
Table 2 The MSEs of all estimators for PNPFD.

Regression analysis

In this section, a novel regression model is presented and serves as an alternative to the Kumaraswamy and beta regression models. The quantile function in Eq. (17) is used to obtain this new regression model. Re-parameterizing the PDF and CDF of the PNPFD can be achieved by utilizing the quantile function. Let \(Q\left( p;\sigma ,\eta ,\delta \right) =\mu\) and then

$$\begin{aligned} \sigma =\frac{\log \left( \frac{1-\left( 1-p\right) ^{1/\eta }}{1+\delta \left( 1-p\right) ^{1/\eta }}\right) }{\log \left( \mu \right) } \end{aligned}$$
(22)

is acquired. The CDF and PDF of the re-parametrized distribution are obtained, respectively, by

$$\begin{aligned} F\left( y,\eta ,\delta ,\mu \right) =1-\left( \frac{1-y^{\sigma ^{*}}}{ \delta y^{\sigma ^{*}}+1}\right) ^{\eta }. \end{aligned}$$
(23)

and

$$\begin{aligned} f\left( y,\eta ,\delta ,\mu \right) =\frac{(\delta +1)\eta \sigma ^{*}y^{\sigma ^{*}-1}\left( \frac{1-y^{\sigma ^{*}}}{\delta y^{\sigma ^{*}}+1}\right) ^{\eta }}{\left( 1-y^{\sigma ^{*}}\right) \left( \delta y^{\sigma ^{*}}+1\right) }. \end{aligned}$$
(24)

where

$$\begin{aligned} \sigma ^{*}=\frac{\log \left( \frac{1-\left( 1-p\right) ^{1/\eta }}{ 1+\delta \left( 1-p\right) ^{1/\eta }}\right) }{\log \left( \mu \right) }, \end{aligned}$$

and the parameters \(\eta > 0\) and \(\delta > -1\) are those of the PNPFD, while \(\mu \in (0, 1)\) is the pth quantile of the distribution and serves as the quantile regression parameter. The value of p is chosen from the range (0, 1); common choices are 0.25, 0.5, and 0.75. A random variable Y with this distribution is denoted by \(Y \sim QPNPF\left( \eta ,\delta ,\mu ,p\right)\).
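The re-parameterization can be verified numerically: computing \(\sigma ^{*}\) from a chosen quantile value \(\mu\) and then evaluating the quantile function at p should return \(\mu\). The Python sketch below is illustrative, and its function names and parameter values are assumptions.

```python
import math


def pnpfd_quantile(p, sigma, eta, delta):
    """Quantile function of the PNPFD, Eq. (17)."""
    a = (1.0 - p) ** (1.0 / eta)
    return ((1.0 - a) / (delta * a + 1.0)) ** (1.0 / sigma)


def sigma_star(mu, eta, delta, p):
    """Value of sigma for which the p-th quantile equals mu, as in Eq. (22)."""
    a = (1.0 - p) ** (1.0 / eta)
    return math.log((1.0 - a) / (1.0 + delta * a)) / math.log(mu)


# Illustrative (assumed) values: target the median (p = 0.5) at mu = 0.4.
eta, delta, p, mu = 1.5, 0.5, 0.5, 0.4
shape = sigma_star(mu, eta, delta, p)
recovered_mu = pnpfd_quantile(p, shape, eta, delta)
print(shape, recovered_mu)
```

Both logarithms in `sigma_star` are negative for \(\mu \in (0,1)\) and \(\delta >-1\), so the resulting \(\sigma ^{*}\) is positive as required.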

Once the QPNPF distribution has been defined, the new regression model based on the PDF in Eq. (24) can be presented. Let \(y_{1},y_{2},\ldots ,y_{n}\) be such that \(y_{i}\) is a realization of \(Y_{i} \sim QPNPF\left( \eta ,\delta ,\mu _{i},p\right)\) for \(i=1,2,\ldots ,n\), where \(\eta ,\delta\) and \(\mu _{i}\) are unknown parameters and p is known. The proposed quantile regression model is as follows:

$$\begin{aligned} g\left( \mu _{i}\right) ={\textbf{x}}_{i}\mathbf {\beta }^{\top }, \end{aligned}$$
(25)

where \(\mathbf {\beta }=\left( \beta _{0},\beta _{1},\ldots ,\beta _{p}\right)\) is the unknown vector of regression parameters, \({\textbf{x}}_{i}=\left( 1,x_{i1},x_{i2},\ldots ,x_{ip}\right)\) is the known vector of covariates for the ith observation, and g is a link function. We use the following logit link function because the QPNPF is defined on the interval (0, 1):

$$\begin{aligned} g\left( \mu _{i}\right) =\log \left( \frac{\mu _{i}}{1-\mu _{i}}\right) ,i=1,2,\ldots ,n. \end{aligned}$$
(26)

Inverting Eq. (26) gives

$$\begin{aligned} \mu _{i}=\frac{\exp \left( {\textbf{x}}_{i}\mathbf {\beta }^{\top }\right) }{1+\exp \left( {\textbf{x}}_{i}\mathbf {\beta }^{\top }\right) }. \end{aligned}$$
(27)

Parameter estimation for regression parameters

In this section, the maximum likelihood method is used to estimate the unknown regression parameters and model parameters. Let \(Y_{1},Y_{2},\ldots ,Y_{n}\) be a random sample of size n with \(Y_{i} \sim QPNPF\left( \eta ,\delta ,\mu _{i},p\right)\) and realizations \(y_{1},y_{2},\ldots ,y_{n}\), where \(\mu _{i}\) is given in (27) for \(i=1,2,\ldots ,n\). Writing \(\sigma _{i}^{*}\) for the value of \(\sigma ^{*}\) at \(\mu =\mu _{i}\), the log-likelihood function is given by

$$\begin{aligned} \ell \left( \Xi \right)= & {} n\log \left( \delta +1\right) +n\log \left( \eta \right) +\sum \limits _{i=1}^{n}\log \left( \sigma _{i}^{*}\right) +\sum \limits _{i=1}^{n}\left( \sigma _{i}^{*}-1\right) \log \left( y_{i}\right) \nonumber \\{} & {} +\eta \sum \limits _{i=1}^{n}\log \left( \frac{1-y_{i}^{\sigma _{i}^{*}}}{\delta y_{i}^{\sigma _{i}^{*}}+1}\right) -\sum \limits _{i=1}^{n}\log \left( 1-y_{i}^{\sigma _{i}^{*}}\right) -\sum \limits _{i=1}^{n}\log \left( \delta y_{i}^{\sigma _{i}^{*}}+1\right) \end{aligned}$$
(28)

where \(\Xi =\left( \eta ,\delta ,\mathbf {\beta }\right)\) is the parameter vector. The MLE of \(\Xi ,\) say \({\widehat{\Xi }}=\left( {\widehat{\eta }}, {\widehat{\delta }},{\widehat{\beta }}_{0},{\widehat{\beta }}_{1},\ldots ,{\widehat{\beta }}_{p}\right)\), is obtained by maximizing \(\ell \left( \Xi \right)\) in (28) with respect to \(\eta ,\delta\) and \(\mathbf {\beta }\). Because the log-likelihood function in (28) is nonlinear in the parameters, it is maximized numerically, for example with the optim function in R.
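The regression log-likelihood combines the logit link, the re-parameterized shape \(\sigma ^{*}\), and the PNPFD density. The Python sketch below is a minimal illustration under stated assumptions: the function names, the toy design matrix, and the response values are hypothetical, and the shape parameter is recomputed for each observation from its own \(\mu _{i}\).

```python
import math


def expit(z):
    """Inverse of the logit link, Eq. (27)."""
    return 1.0 / (1.0 + math.exp(-z))


def sigma_star(mu, eta, delta, p):
    """Shape parameter implied by the p-th quantile mu, Eq. (22)."""
    a = (1.0 - p) ** (1.0 / eta)
    return math.log((1.0 - a) / (1.0 + delta * a)) / math.log(mu)


def pnpfd_log_pdf(y, sigma, eta, delta):
    """Logarithm of the PNPFD density, Eq. (4)."""
    u = y ** sigma
    return (math.log(delta + 1.0) + math.log(eta) + math.log(sigma)
            + (sigma - 1.0) * math.log(y)
            + (eta - 1.0) * math.log(1.0 - u)
            - (eta + 1.0) * math.log(delta * u + 1.0))


def qreg_loglik(beta, eta, delta, rows, responses, p=0.5):
    """Log-likelihood of the quantile regression model with a logit link."""
    total = 0.0
    for x_row, y in zip(rows, responses):
        linear_predictor = sum(b * x for b, x in zip(beta, x_row))
        mu = expit(linear_predictor)          # p-th quantile for this observation
        shape = sigma_star(mu, eta, delta, p)
        total += pnpfd_log_pdf(y, shape, eta, delta)
    return total


# Hypothetical toy data: an intercept column plus one covariate.
rows = [[1.0, 0.2], [1.0, 0.5], [1.0, 0.9]]
responses = [0.35, 0.48, 0.66]
eta, delta = 1.5, 0.5

# With beta = (0, 0), every mu_i equals 0.5, so one common shape applies.
null_loglik = qreg_loglik([0.0, 0.0], eta, delta, rows, responses)
common_shape = sigma_star(0.5, eta, delta, 0.5)
direct = sum(pnpfd_log_pdf(y, common_shape, eta, delta) for y in responses)
print(null_loglik, direct)
```

As with the distribution itself, the negative of `qreg_loglik` would be minimized numerically over \(\eta\), \(\delta\), and the regression coefficients.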

Real data analysis

In this section, three real data applications are examined for both the proposed distribution and novel regression model.

Practical examples for PNPFD

In this subsection, two practical data sets are analyzed to demonstrate the usability of the PNPFD. The Kumaraswamy (K)24, unit-Weibull (UW)25, unit-Burr XII (UBXII)26, unit-Muth(UM)27, and NPFD models are used to compare the PNPFD. The PDFs for these models are given, respectively, by

$$\begin{aligned} f_{PNPFD}\left( y\right)= & {} \frac{\left( p_{2}+1\right) p_{3}p_{1}y^{p_{1}-1}\left( \frac{1-y^{p_{1}}}{1+p_{2}y^{p_{1}}}\right) ^{p_{3}}}{(1-y^{p_{1}})(p_{2}y^{p_{1}}+1)},\quad p_{1},p_{3}>0,p_{2}>-1.\\ f_{K}\left( y\right)= & {} p_{1}p_{2}y^{p_{1}-1}\left( 1-y^{p_{1}}\right) ^{p_{2}-1},\quad p_{1},p_{2}>0.\\ f_{UW}\left( y\right)= & {} p_{1}p_{2}\left( -\log \left( y\right) \right) ^{p_{2}-1}\exp \left( -p_{1}\left( -\log \left( y\right) \right) ^{p_{2}}\right) y^{-1},\quad p_{1},p_{2}>0.\\ f_{UBXII}\left( y\right)= & {} p_{1}p_{2}y^{-1}\left( -\log y\right) ^{p_{2}-1}\left( 1+\left( -\log y\right) ^{p_{2}}\right) ^{-p_{1}-1},\quad p_{1},p_{2}>0.\\ f_{UM}\left( y\right)= & {} p_{2}^{-1}\exp \left( 1/p_{1}\right) \left( y^{-\frac{ p_{1}}{p_{2}}}-p_{1}\right) y^{-1-\frac{p_{1}}{p_{2}}}\exp \left( -\frac{1}{ p_{1}}y^{-\frac{p_{1}}{p_{2}}}\right) ,\quad p_{1},p_{2}>0.\\ f_{NPFD}\left( y\right)= & {} \left( p_{1}+1\right) p_{2}\left( 1-y\right) ^{p_{2}-1}\left( 1+p_{1}y\right) ^{-p_{2}-1},\quad p_{1}>-1,p_{2}>0. \end{aligned}$$

The maximum likelihood method is used to estimate the model parameters. The estimated log-likelihood (\(\ell\)), the Akaike information criterion (AIC), and the Bayesian information criterion (BIC) are used to assess the goodness-of-fit of the distributions. Furthermore, the Kolmogorov–Smirnov (KS) statistic and its p-value are calculated.
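The comparison criteria can be computed from the maximized log-likelihood and the fitted CDF. The Python sketch below uses the standard definitions \(AIC=2k-2\ell\) and \(BIC=k\log n-2\ell\), where k is the number of estimated parameters, together with the usual one-sample KS distance. The function names and the numbers in the example are hypothetical and are not results from this paper; the KS p-value is not computed here.

```python
import math


def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 * log-likelihood."""
    return 2.0 * n_params - 2.0 * loglik


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: k * log(n) - 2 * log-likelihood."""
    return n_params * math.log(n_obs) - 2.0 * loglik


def ks_statistic(sample, cdf):
    """Largest distance between the empirical CDF and a fitted CDF."""
    ordered = sorted(sample)
    n = len(ordered)
    distance = 0.0
    for i, x in enumerate(ordered, start=1):
        fitted = cdf(x)
        # Compare the fitted CDF with the empirical CDF just after and just before x.
        distance = max(distance, i / n - fitted, fitted - (i - 1) / n)
    return distance


# Hypothetical inputs, for illustration only.
example_aic = aic(-10.0, 3)
example_bic = bic(-10.0, 3, 100)
example_ks = ks_statistic([0.75, 0.25], lambda x: x)  # uniform CDF on (0, 1)
print(example_aic, example_bic, example_ks)
```

Smaller AIC, BIC, and KS values indicate a better fit, which is the sense in which the models are ranked in the tables that follow.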

The first data set concerns the cost-effectiveness of firm risk management and is available on the web page of Professor E. Frees (Wisconsin School of Business). The data are defined on (0, 1) and are calculated as the total property and casualty premiums and uninsured losses as a percentage of total assets. The first data set was also reported and analyzed by34. Table 3 reports the modeling results for the first real data set.

Table 3 The goodness-of-fit results for the first data set.

The second data set consists of the recovery rates of viable CD34+ cells in 239 patients who consented to autologous peripheral blood stem cell transplant after myeloablative doses of chemotherapy. The CD34+ data were also investigated in26. Results for the CD34+ data are given in Table 4.

Table 4 The goodness-of-fit results for the second data set.

According to Tables 3 and 4, the PNPFD is the best of the compared models for both real data sets on all criteria and statistics considered. Figures 3 and 4 display the fitted PDF, CDF, SF, and P-P plots of the PNPFD for the first and second data sets, respectively. These plots indicate that the PNPFD is a suitable choice for modeling the two data sets.

Figure 3 The fitted PDF, CDF, SF, and P-P plots of the PNPFD for the first data set.

Figure 4 The fitted PDF, CDF, SF, and P-P plots of the PNPFD for the second data set.

Practical example for QPNPFD

In this subsection, the usability of the new regression model is demonstrated through a real data application. For comparison, the Kumaraswamy (Kw)30, beta29, log-extended exponential geometric (LEEG)35, and transmuted unit Rayleigh (TUR)36 regression models are used. The quantile parameter p is set to 0.5 for the QPNPFD, Kw, and LEEG regression models. The data are taken from36 and can be found at https://stats.oecd.org/index.aspx?DataSetCode=BLI. Here, the percentage of educational attainment in the OECD countries (y) is the dependent variable, and the percentage of voter turnout (\(x_{1}\)), the homicide rate (\(x_{2}\)), and life satisfaction (\(x_{3}\)) are the independent variables. Detailed information about the data and some descriptive statistics can be found in36. This application aims to reveal the relationship between y and \(x_{1}\), \(x_{2}\), and \(x_{3}\).

The regression model is presented as

$$\begin{aligned} \text {logit} \left( \mu _{i}\right) =\beta _{0}+\beta _{1}x_{i1}+\beta _{2}x_{i2}+\beta _{3}x_{i3},\text { }i=1,2,\ldots ,38. \end{aligned}$$

where \(\mu _{i}\) represents the median for the QPNPFD, Kw, and LEEG models and the mean for the beta regression model. Parameter estimates, p-values for the significance of the model parameters, and log-likelihood values of the regression models are presented in Table 5.
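To make the median-regression setup concrete, the following Python sketch fits a logit-link median regression on synthetic data. It is an illustration under stated assumptions and not the QPNPFD model itself, whose quantile reparameterization is not reproduced here. Instead it uses the Kumaraswamy distribution, for which the p-th quantile equals \(\mu\) when the second shape parameter is set to \(\log (1-p)/\log (1-\mu ^{a})\). The function names, the simulated covariate, and the true coefficient values are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit


def kw_b_from_quantile(mu, a, p=0.5):
    """Second Kumaraswamy shape chosen so that the p-th quantile equals mu."""
    return np.log(1.0 - p) / np.log1p(-mu ** a)


def kw_quantile(p, a, b):
    return (1.0 - (1.0 - p) ** (1.0 / b)) ** (1.0 / a)


def neg_loglik(params, X, y, p=0.5):
    """Negative log-likelihood; params = (beta_0, ..., beta_k, log a)."""
    beta, a = params[:-1], np.exp(params[-1])
    with np.errstate(all="ignore"):
        mu = expit(X @ beta)  # inverse logit link keeps mu in (0, 1)
        b = kw_b_from_quantile(mu, a, p)
        logpdf = (np.log(a) + np.log(b) + (a - 1.0) * np.log(y)
                  + (b - 1.0) * np.log1p(-y ** a))
        value = -np.sum(logpdf)
    return value if np.isfinite(value) else np.inf


# Synthetic data: intercept plus one covariate, true median on the logit scale.
rng = np.random.default_rng(1)
n = 400
X = np.column_stack([np.ones(n), rng.uniform(-1.0, 1.0, size=n)])
beta_true, a_true = np.array([0.5, -1.0]), 2.0
mu_true = expit(X @ beta_true)
y = kw_quantile(rng.uniform(size=n), a_true, kw_b_from_quantile(mu_true, a_true))

res = minimize(neg_loglik, x0=np.zeros(3), args=(X, y), method="Nelder-Mead",
               options={"maxiter": 5000, "maxfev": 5000,
                        "xatol": 1e-6, "fatol": 1e-6})
beta_hat, a_hat = res.x[:-1], np.exp(res.x[-1])
print("beta_hat:", beta_hat, "a_hat:", a_hat, "loglik:", -res.fun)
```

With p = 0.5 the fitted \(\mu _{i}\) is the conditional median of the response, which is the quantity modeled by the quantile-based regressions compared in this subsection.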

Table 5 Parameter estimates of regression models for OECD data with standard error (SE) and log-likelihoods.

From Table 5, the best regression model for the OECD data is the QPNPFD regression model. For this model, the parameters \(\eta\), \(\delta\), and \(\beta _{0}\) are statistically insignificant at the 5% level, whereas \(\beta _{1}\), \(\beta _{2}\), and \(\beta _{3}\) are statistically significant at the 5% level. The estimate of \(\beta _{3}\) is positive, while the estimates of \(\beta _{1}\) and \(\beta _{2}\) are negative. Accordingly, higher life satisfaction is associated with a higher median percentage of educational attainment, while higher voter turnout and a higher homicide rate are associated with a lower median percentage of educational attainment.
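The sign-based reading of the coefficients follows from the logit link. As a short worked step (a generic property of the logit link, not a result specific to the fitted model), inverting the link and differentiating with respect to a covariate gives

$$\begin{aligned} \mu _{i}=\frac{\exp \left( \beta _{0}+\beta _{1}x_{i1}+\beta _{2}x_{i2}+\beta _{3}x_{i3}\right) }{1+\exp \left( \beta _{0}+\beta _{1}x_{i1}+\beta _{2}x_{i2}+\beta _{3}x_{i3}\right) },\qquad \frac{\partial \mu _{i}}{\partial x_{ij}}=\beta _{j}\mu _{i}\left( 1-\mu _{i}\right) ,\text { }j=1,2,3. \end{aligned}$$

Because \(\mu _{i}\left( 1-\mu _{i}\right) >0\), the sign of \(\beta _{j}\) gives the direction in which the median response moves as \(x_{j}\) increases with the other covariates held fixed, and \(\exp (\beta _{j})\) is the multiplicative change in the odds \(\mu _{i}/(1-\mu _{i})\) per unit increase in \(x_{j}\).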

Conclusion

This study introduced a new unit distribution, the PNPFD, for modeling data defined on (0,1) as an alternative to the Kumaraswamy and beta distributions. Its statistical and reliability properties were discussed, including moments, stochastic ordering, the reliability function, the hazard rate function, order statistics, and the quantile function. The PNPFD has flexible density and hazard shapes: the probability density function plots show that the distribution can be unimodal or J-shaped, while the hazard rate function can be decreasing, increasing, or bathtub-shaped. The parameters were estimated using several methods, and the performance of these methods was compared in a Monte Carlo simulation study. The simulation results show that the estimators approach each other for large sample sizes. According to the bias criterion, ADEs are typically the best estimators for the parameters \(\sigma\) and \(\eta\), while MLEs are the most suitable estimators for the parameter \(\delta\). A novel regression model was also introduced based on the proposed distribution. Three real data analyses demonstrate the applicability of the new distribution and the new regression model, as supported by the reported SEs and p-values, and the fitted plots show that the new distribution fits the real data well. These results address the main objective of the study, namely to assess the PNPFD against existing, well-known distributions.
In conclusion, the study met its aim and showed that the PNPFD can contribute to the statistical modeling of unit data. The flexibility of the proposed regression model relative to existing regression models indicates that it is an effective model when the dependent variable is a proportion. The results open paths for future research that incorporates novel heuristic techniques for investigating disease dynamics, and they suggest that the PNPFD may be a useful tool for researchers in diverse areas, including neuro-computational intelligence, nonlinear tumor-immune delayed models, nonlinear multi-delayed tumor oncolytic virotherapy systems, nonlinear influenza-A epidemic models, and nonlinear multi-delay SVEIR epidemic systems. We hope that this model will be used for data analysis in many different fields, such as economics, engineering, and medicine. In addition to the methods discussed here, other approaches, such as Bayesian regression and the method of moments, can be employed to estimate the parameters and to assess the efficiency of the model. Applying these methods would also allow predictions based on the data, enabling further analysis and application of the proposed model.