Introduction

Regression analysis is widely used to predict a response variable or to identify the factors associated with it. Several types of regression models have been developed, depending on the distribution of the response variable and the type of relationship (linear vs non-linear), to measure the average relationship between the response variable and one or more explanatory variables, such as the generalized linear model (GLM)1. The GLM has various applications in science, engineering, business and other fields2,3.

Count models are used to examine the factors that influence a response variable that takes non-negative integer values 0, 1, 2, 3, 4, …4,5. Several count regression models have been developed, such as the Poisson regression model (PRM), the quasi-Poisson regression model (QPRM), the negative binomial regression model (NBRM), the Bell regression model (BRM) and the Conway–Maxwell Poisson regression model (CMPRM). These models are used in various situations. The PRM6 is used when the response variable follows a Poisson distribution with equal mean and variance7.

In real-life datasets, the assumption of equal mean and variance postulated by the PRM is often violated. Over-dispersion occurs when the variance of the response variable exceeds its mean. The NBRM is used to handle over-dispersion8; however, it requires a large sample and a count-valued response. This model is commonly applied to over-dispersed count data9. Conway and Maxwell introduced the Conway–Maxwell Poisson model in 1962, and the corresponding CMPRM is adept at addressing both over- and under-dispersion in count data10,11. Additionally, the QPRM is a generalization of the PRM and is commonly used for modeling an over-dispersed count variable12. The QPRM is an alternative to the NBRM and is recommended when the variance is a linear function of the mean13.

The quasi-likelihood estimator (QLE) is used to estimate the regression coefficients of the QPRM14. However, it produces inefficient estimates when the regressors are highly correlated, a condition known as multicollinearity. This term was first used by Frisch15 in the context of the linear regression model. When multicollinearity exists in a QPRM, the QLE produces large variances, leading to incorrectly signed regression coefficients and wider confidence intervals. It also inflates the standard errors of the regression coefficients, potentially leading to inaccurate inferences16.

Biased estimation methods are commonly used to mitigate the impact of multicollinearity, and several have been developed to obtain more reliable estimates17,18,19. The ridge estimator (RE) is a popular method for dealing with multicollinearity in the linear regression model20. It shrinks the coefficients of the model and thereby diminishes the impact of multicollinearity. A key component of the RE is the biasing parameter k, which governs the amount of shrinkage. Several studies sought the optimal k value by minimizing the mean squared error (MSE) to enhance the estimator's performance20,21,22,23,24,25,26,27,28, and others determined optimal biasing parameters for REs under different probability-distributed models7,18,25,29,30,31,32,33.

Various studies have proposed REs to diminish the impact of multicollinearity in count regression models, such as the Poisson ridge regression estimator7,18,34,35, the RE for the NBRM30,36, the RE for the Bell regression model32 and, more recently, the RE for the CMPRM37.

It is evident from the literature that several studies have proposed REs and new biasing parameters for the linear regression model, count regression models and GLMs, and have attempted to find the best ridge parameter estimators (RPEs) for these models. However, no study has investigated the RE for the QPRM. In this study, we investigate the RE for the QPRM to deal with multicollinearity and over-dispersion and compare it with the QLE. Furthermore, we introduce diverse RPEs tailored to the QPRM's ridge estimator and assess their performance to identify the most effective one.

The rest of the paper is organized as follows: section “Methodology” covers the structure, estimation, and properties of the QPRM and introduces the RPEs. The simulation settings and results are discussed in section “Numerical evaluation”. Section “Application: apprentice migration dataset” discusses the results of a real-life application. Concluding remarks are given at the end of the paper.

Methodology

The QPRM

The QPRM is used when the dependent variable \({y}_{i}\) is a count, follows the Poisson distribution and has variance greater than its mean. The quasi-Poisson (QP) distribution is used for analyzing over-dispersion in datasets; for a detailed discussion and properties, see Efron38 and Istiana et al.39. The QP distribution reduces to the Poisson distribution when the dispersion parameter equals 1. Let Y denote the response variable drawn from the QP distribution with parameters \(\mu {\text{ and }} \gamma\). The mean and variance of the QP are respectively given by

$${\text{E}}({\text{Y}}) =\upmu$$
(1)
$${\text{Var}}\left({\text{Y}}\right)= \gamma\upmu ,$$
(2)

where µ > 0 and \(\gamma\) is the over-dispersion parameter. The close relationship between the expectation and variance shows that the variance is a function of the mean. The QPRM is characterized by its first two moments (mean and variance), as discussed by Wedderburn12; Efron38 and Gelfand and Dalal40 showed how to construct a full distribution for this model, though it requires re-parameterization. Estimation typically proceeds from the first two moments via estimating equations14. The quasi-likelihood (QL) function for the QPRM does not require a specific probability density function to estimate the regression parameters, only the moment assumptions on the response variable41. The QL function is formed in the same way as the usual likelihood function. The QLE is obtained by maximizing the quasi-log-likelihood function given by Wakefield42

$$l\left({y}_{i}, {\mu }_{i}\right)=\frac{1}{\gamma }\left({y}_{i}log {\mu }_{i}-{\mu }_{i}\right)$$
(3)

Differentiating (3) with respect to \({\mu }_{i}\) and equating to zero gives the quasi-score function

$${U}_{i}=\frac{{y}_{i}- {\mu }_{i}}{\gamma {\mu }_{i}},$$
(4)

where \(\mu =exp\left(X\beta \right)\). Here \(X\) is a covariates matrix of order \(n \times (p + 1)\) and \(\beta\) is the column vector of regression coefficients of order \((p + 1) \times 1\).

As Eq. (4) is non-linear in \(\beta\), the regression parameters of the QP model are computed using iteratively reweighted least squares (IWLS). The update at iteration \(t+1\) can be written as13

$${\beta }^{\left[t+1\right]}={\left({X}^{\prime}{\widehat{{\text{W}}}}^{\left[t\right]}X\right)}^{-1}{X}^{\prime}{\widehat{{\text{W}}}}^{\left[t\right]}{m}^{\left[t\right]},$$
(5)

where \({\widehat{{\text{W}}}}_{i}^{\left[t\right]}=\frac{1}{V({\mu }_{i}^{\left[t\right]})}{\left(\frac{\partial {\mu }_{i}^{\left[t\right]}}{\partial {\eta }_{i}^{\left[t\right]}}\right)}^{2},\) \({m}_{i}^{\left[t\right]}={\eta }_{i}^{\left[t\right]}+\frac{\left({y}_{i}-{\mu }_{i}^{\left[t\right]}\right)}{\frac{\partial {g}^{-1}\left({\eta }_{i}^{\left[t\right]}\right)}{\partial {\eta }_{i}^{\left[t\right]}}}\), \({\eta }_{i}^{\left[t\right]}={x}_{i}^{\prime}{\beta }^{\left[t\right]}\),

$${\mu }_{i}^{\left[t\right]}={g}^{-1}\left({\eta }_{i}^{\left[t\right]}\right)={g}^{-1}\left({x}_{i}^{\prime}{\beta }^{\left[t\right]}\right),$$

where \({\mu }_{i}={\text{exp}}\left({\eta }_{i}\right)\). So, \({m}_{i}={\text{log}}\left({\mu }_{i}\right)+\frac{{y}_{i}-{\mu }_{i}}{{\mu }_{i}}\). The QLE of \(\beta\) at the final iteration is defined as

$${\widehat{\beta }}_{QLE}={\left(F\right)}^{-1}{X}^{\prime}\widehat{W}m,$$
(6)

where \(F={{\text{X}}}^{{{\prime}}}\widehat{{\text{W}}}{\text{X}}\) and \(\widehat{{\text{W}}}={\text{diag}}\left(\frac{1}{V(\widehat{\mu })}{\left(\frac{\partial \widehat{\mu }}{\partial \eta }\right)}^{2} \right)\). The QLE is asymptotically normally distributed with a covariance matrix equal to the inverse of the expected matrix of second derivatives:

$$Cov\left({\widehat{\beta }}_{QLE}\right)={\left[E\left(-\frac{{\partial }^{2}l({\text{X}};\upbeta )}{\partial \beta \partial {\beta }^{\prime}}\right)\right]}^{-1}=\gamma {\left({\text{F}}\right)}^{-1}.$$
(7)

Furthermore, the matrix MSE of \({\widehat{\beta }}_{QLE}\) is given as

$$MMSE\left({\widehat{\beta }}_{QLE}\right)=E\left[\left({\widehat{\beta }}_{QLE}-\beta \right){\left({\widehat{\beta }}_{QLE}-\beta \right)}^{\prime}\right]=\gamma {\left({\text{F}}\right)}^{-1}$$
(8)

By applying the trace on both sides of Eq. (8), we have

$$MSE\left({\widehat{\beta }}_{QLE}\right)=\gamma {\text{tr}}{\left({\text{F}}\right)}^{-1}=\gamma \sum_{j=1}^{p+1}\frac{1}{{\lambda }_{j}},$$
(9)

where \({\lambda }_{j}\) is the jth eigenvalue of the matrix \(F\). When the explanatory variables in the QPRM are highly correlated, the weighted cross-product matrix \(F\) is ill-conditioned, and the QLE gives inefficient results with large variances. In this situation it is difficult to interpret the estimated coefficients, since the estimated parameter vector is on average too long.
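To make the estimation steps concrete, the IWLS fit of Eqs. (5)–(6) with the log link can be sketched as below. This is a minimal illustrative Python sketch, not the authors' code; the dispersion is estimated by the mean squared Pearson residual, a standard moment-based choice that we assume here.

```python
import numpy as np

def fit_qpr_iwls(X, y, n_iter=50, tol=1e-8):
    """IWLS fit of the quasi-Poisson model with log link (Eqs. 5-6).

    With the log link, mu = exp(eta), d mu / d eta = mu and V(mu) = mu,
    so the weights reduce to W = diag(mu) and the working response is
    m = eta + (y - mu) / mu.
    """
    n, q = X.shape
    beta = np.zeros(q)
    beta[0] = np.log(y.mean() + 1e-8)      # start from the intercept-only fit
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        w = mu                              # diagonal of the weight matrix W
        m = eta + (y - mu) / mu             # working response
        F = X.T @ (w[:, None] * X)          # F = X' W X
        beta_new = np.linalg.solve(F, X.T @ (w * m))
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    mu = np.exp(X @ beta)
    # moment estimate of the dispersion: mean squared Pearson residual
    gamma_hat = np.sum((y - mu) ** 2 / mu) / (n - q)
    return beta, gamma_hat
```

Because the QPRM uses only the first two moments, the point estimates coincide with the ordinary Poisson fit; only the dispersion \(\widehat{\gamma }\), and hence the covariance \(\gamma {F}^{-1}\) of Eq. (7), differs.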

The quasi Poisson ridge regression estimator

When multicollinearity is present among the explanatory variables in the QPRM, the QLE does not perform efficiently: it gives large variances for the estimated coefficients. To mitigate the effects of multicollinearity, the ridge regression (RR) estimation method was introduced by Hoerl and Kennard20. In this study, we propose the quasi-Poisson ridge regression estimator (QPRRE), applied to count and over-dispersed data, to minimize the effects of multicollinearity. The QPRRE is defined by

$${\widehat{\beta }}_{k}={\left(F+k{I}_{p+1}\right)}^{-1}F{\widehat{\beta }}_{QLE},$$
(10)

where \(F={{\text{X}}}^{{{\prime}}}\widehat{{\text{W}}}{\text{X}}\), k is the biasing parameter and \({I}_{p+1}\) is the identity matrix of order \((p+1)\times (p+1)\). The bias, covariance and matrix MSE (MMSE) of \({\widehat{\beta }}_{k}\) are derived as follows

$$Bias \left({\widehat{\beta }}_{k}\right)=-kT{\Lambda }_{k}^{-1}\alpha .$$
(11)
$$Cov\left({\widehat{\beta }}_{k}\right)=\widehat{\gamma }T{\Lambda }_{k}^{-1}\Lambda {\Lambda }_{k}^{-1} {T}^{\prime}.$$
(12)
$$MMSE\left({\widehat{\beta }}_{k}\right)=Cov\left({\widehat{\beta }}_{k}\right)+Bias \left({\widehat{\beta }}_{k}\right)Bias {\left({\widehat{\beta }}_{k}\right)}^{\prime}$$
$$MMSE\left({\widehat{\beta }}_{k}\right)=\widehat{\gamma }T{\Lambda }_{k}^{-1}\Lambda {\Lambda }_{k}^{-1} {T}^{\prime}+{k}^{2}T{\Lambda }_{k}^{-1}\alpha {\alpha }^{\prime}{\Lambda }_{k}^{-1}{T}^{\prime},$$
(13)

where \({\Lambda }_{k}=diag({\lambda }_{1}+k,{\lambda }_{2}+k,\ldots,{\lambda }_{p+1}+k)\), \(\Lambda =diag({\lambda }_{1},{\lambda }_{2},\ldots,{\lambda }_{p+1})={T}^{\prime}FT\), the orthogonal matrix \(T\) contains the eigenvectors of \(F\), and \(\alpha ={T}^{\prime}\beta\). Finally, the scalar MSE of the QPRRE is obtained by applying the trace to Eq. (13):

$$\begin{aligned} MSE\left({\widehat{\beta }}_{k}\right)&=tr\left\{MMSE\left({\widehat{\beta }}_{k}\right)\right\} \\ MSE\left({\widehat{\beta }}_{k}\right)&=\widehat{\gamma }{\sum }_{ j=1}^{p+1}\frac{{\lambda }_{j}}{{\left({\lambda }_{j}+k\right)}^{2}}+{k}^{2}{\sum }_{j=1}^{p+1}\frac{{{\alpha }_{j}}^{2}}{{\left({\lambda }_{j}+k\right)}^{2}}={M}_{1}\left(k\right)+{M}_{2}\left(k\right), \end{aligned}$$
(14)

where \({\alpha }_{j}\) is the jth element of \(\alpha ={{T}^{\prime}\widehat{\beta }}_{QLE}\) and \({\lambda }_{j}\) is the jth eigenvalue of the matrix \(F\).
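Equations (10) and (14) translate directly into a few lines of code. The sketch below is our own illustration (not from the paper) and assumes \(F\), \({\widehat{\beta }}_{QLE}\) and \(\widehat{\gamma }\) have already been computed:

```python
import numpy as np

def qprre(F, beta_qle, k):
    """Quasi-Poisson ridge estimator of Eq. (10): (F + k I)^{-1} F beta_QLE."""
    q = F.shape[0]
    return np.linalg.solve(F + k * np.eye(q), F @ beta_qle)

def scalar_mse(F, beta_qle, gamma_hat, k):
    """Estimated scalar MSE of Eq. (14), M1(k) + M2(k)."""
    lam, T = np.linalg.eigh(F)        # eigenvalues lambda_j, eigenvectors in T
    alpha = T.T @ beta_qle            # alpha = T' beta_QLE
    m1 = gamma_hat * np.sum(lam / (lam + k) ** 2)       # variance term M1(k)
    m2 = k ** 2 * np.sum(alpha ** 2 / (lam + k) ** 2)   # squared-bias term M2(k)
    return m1 + m2
```

At k = 0 the ridge estimator reduces to the QLE and the scalar MSE reduces to Eq. (9).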

The superiority of the QPRRE to the QLE

To establish the superiority of the RR estimator, Hoerl and Kennard20 proved properties of the MSE of the RR estimator in the linear regression model (LRM). Here, we show that these theorems also hold for the QPRM and use them to establish the supremacy of the QPRRE over the QLE.

Theorem 3.1

The variance term \({M}_{1}\left(k\right)\) and the squared-bias term \({M}_{2}\left(k\right)\) are continuous, monotonically decreasing and monotonically increasing functions of k, respectively, for \(k>0\) and \({\lambda }_{j}>0\).

Proof

The first derivatives of \({M}_{1}\left(k\right)\) and \({M}_{2}\left(k\right)\) from Eq. (14) with respect to k are

$${M}_{1}^{\prime}=-2\widehat{\gamma }{\sum }_{ j=1}^{p+1}\frac{{\lambda }_{j}}{{\left({\lambda }_{j}+k\right)}^{3}}$$
(15)

and

$${M}_{2}^{\prime}=2k{\sum }_{j=1}^{p+1}\frac{{{\alpha }_{j}}^{2}{\lambda }_{j}}{{\left({\lambda }_{j}+k\right)}^{3}}$$
(16)

Since \(\widehat{\gamma }\), \(k\), \({\lambda }_{j}\) and \({{\alpha }_{j}}^{2}\) are positive, Eq. (15) shows that \({M}_{1}\left(k\right)\) is a continuous and monotonically decreasing function of k for \(k>0\) and \({\lambda }_{j}>0\), while Eq. (16) shows that \({M}_{2}\left(k\right)\) is a continuous and monotonically increasing function of k.

Theorem 3.2

There always exists a \(k>0\) such that \(MSE\left({\widehat{\beta }}_{k}\right)<MSE\left({\widehat{\beta }}_{QLE}\right).\)

Proof

The first derivative of Eq. (14) with respect to k is given by

$$\begin{aligned} \frac{dMSE\left({\widehat{\beta }}_{k}\right)}{dk}&=-2\widehat{\gamma }{\sum }_{ j=1}^{p+1}\frac{{\lambda }_{j}}{{\left({\lambda }_{j}+k\right)}^{3}}+2k{\sum }_{j=1}^{p+1}\frac{{{\alpha }_{j}}^{2}{\lambda }_{j}}{{\left({\lambda }_{j}+k\right)}^{3}}\\ &=-2\sum_{j=1}^{p+1}\frac{{\lambda }_{j}\left(\widehat{\gamma }-k{{\alpha }_{j}}^{2}\right)}{{\left({\lambda }_{j}+k\right)}^{3}} \end{aligned}$$
(17)

Equation (17) clearly shows that a sufficient condition for \(\frac{dMSE\left({\widehat{\beta }}_{k}\right)}{dk}\) to be less than zero is \(0<k<\frac{\widehat{\gamma }}{{{\alpha }_{max}^{2}}}\), where \({\alpha }_{max}^{2}\) is the largest of the \({{\alpha }_{j}}^{2}\).

Selection of the biasing parameters

The RR estimator depends on the RPE, which plays the main role in its estimation. The optimal value of the shrinkage parameter k is the main concern of RR, since it controls how strongly the effects of high correlation among the explanatory variables are damped. RPEs for different regression models have been suggested by many investigators20,24,31,37,43,44. Hoerl and Kennard20 first presented the ridge estimation method to mitigate the effect of a high degree of correlation in the LRM. Their estimator was also used for the gamma regression model (GRM)44 and for the CMPRM37. For the QPRRE, it is defined as

$${k}_{1}=\frac{\widehat{\gamma }}{\sum_{j=1}^{p+1}{\widehat{\alpha }}_{j}^{2}}$$
(18)

where \(\widehat{\gamma }\) is the estimated dispersion parameter.

Hoerl et al.24 proposed a shrinkage parameter estimator for RR in the LRM, which we adapt for the QPRRE as

$${k}_{2}=\frac{\left(p+1\right) \widehat{\gamma }}{\sum_{j=1}^{p+1}{\widehat{\alpha }}_{j}^{2}}$$
(19)

Amin et al.16,44 developed an RPE for inverse Gaussian ridge regression (IGRR), adapted here as

$${k}_{3}={\text{max}}\left({k}_{j}\right),\quad \text{where }{ k}_{j}=\frac{\widehat{\gamma }}{{\widehat{\alpha }}_{j}^{2}}.$$
(20)

Akram et al.45 proposed the following RPEs for the GRM's RR estimator:

$${k}_{4}=\frac{\widehat{\gamma }}{p+1}\sum_{j=1}^{p+1}\left(\frac{1 }{2{\widehat{\alpha }}_{j}^{2}+max\left(\frac{\widehat{\gamma }}{{\widehat{\lambda }}_{j}}\right)}\right)$$
(21)
$${k}_{5}=median\left(\frac{\widehat{\gamma }}{2{\widehat{\alpha }}_{min}^{2}+\left(\frac{\widehat{\gamma }}{{\widehat{\lambda }}_{j}}\right)}\right)$$
(22)
$${k}_{6}=\frac{\widehat{\gamma }}{p+1}\sum_{j=1}^{p+1}\left[\frac{1}{2{\widehat{\alpha }}_{j}^{2}+\left(\frac{\widehat{\gamma }}{{\widehat{\lambda }}_{j}}\right)}\right]$$
(23)
$${k}_{7}=median\left(\frac{\widehat{\gamma }}{2{\widehat{\alpha }}_{j}^{2}+\left(\frac{\widehat{\gamma }}{{\widehat{\lambda }}_{j}}\right)}\right)$$
(24)
$${k}_{8}=\left[\frac{1}{\widehat{\gamma }}\left\{max\left[\frac{\widehat{\gamma }}{2{\widehat{\alpha }}_{j}^{2}+max\left(\frac{\widehat{\gamma }}{{\widehat{\lambda }}_{j}}\right)}\right]\right\}\right]$$
(25)
$${k}_{9}=\prod_{j=1}^{p+1}{\left(\frac{\widehat{\gamma }}{2{\widehat{\alpha }}_{j}^{2}+\left(\frac{\widehat{\gamma }}{{\widehat{\lambda }}_{j}}\right)}\right)}^{\frac{1}{p+1}}$$
(26)

Building on the studies above, we propose the following RPEs for the QPRRE:

$${k}_{10}={\text{min}}\left[\frac{\left(\widehat{\gamma }{\widehat{\lambda }}_{j}\right)}{{\widehat{\alpha }}_{j}^{2}}\right],$$
(27)
$${k}_{11}=\sum_{j=1}^{p+1}\left[\left(\frac{\widehat{\gamma }}{{\widehat{\alpha }}_{j}^{2}}\right)\left(1+\left(1+{\left(\frac{\widehat{\gamma }}{{\widehat{\alpha }}_{j}^{2}}\right)}^{2}\right)\right)\right],$$
(28)
$${k}_{12}=\sum_{j=1}^{p+1}\left(\frac{{\widehat{\lambda }}_{j}\widehat{\gamma }}{\widehat{\gamma }+{\widehat{\lambda }}_{j}{\widehat{\alpha }}_{j}^{2}}\right),$$
(29)
$${k}_{13}=\frac{1}{max\left({\widehat{\alpha }}_{j}^{2}\right)},$$
(30)
$${k}_{14}=\sum_{j=1}^{p+1}\left(\frac{1}{{\widehat{\alpha }}_{j}^{2}}\right)$$
(31)
$${k}_{15}=\sum_{j=1}^{p+1}\left[\frac{{\widehat{\lambda }}_{j}}{(1+2{\widehat{\lambda }}_{j}{\widehat{\alpha }}_{j}^{2})}\right]$$
(32)
$${k}_{16}=(p+1)\sum_{j=1}^{p+1}\left[\frac{\widehat{\gamma }}{2{\widehat{\alpha }}_{j}^{2}+\left(\frac{\widehat{\gamma }}{{\widehat{\lambda }}_{j}}\right)}\right]$$
(33)
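For illustration, several of the RPEs above can be computed directly from the eigenvalues \({\widehat{\lambda }}_{j}\) of \(F\) and the transformed coefficients \({\widehat{\alpha }}_{j}\). The sketch below is our own (covering only \({k}_{1}\)–\({k}_{3}\), \({k}_{13}\) and \({k}_{14}\); the remaining estimators follow the same pattern) and assumes \(F\), \({\widehat{\beta }}_{QLE}\) and \(\widehat{\gamma }\) are available:

```python
import numpy as np

def ridge_parameters(F, beta_qle, gamma_hat):
    """A few of the RPEs above, computed from the eigenvalues lambda_j of F
    and alpha = T' beta_QLE (Eqs. 18-20, 30-31)."""
    lam, T = np.linalg.eigh(F)
    alpha2 = (T.T @ beta_qle) ** 2
    q = len(lam)                                  # q = p + 1
    return {
        "k1": gamma_hat / alpha2.sum(),           # Hoerl-Kennard, Eq. (18)
        "k2": q * gamma_hat / alpha2.sum(),       # Hoerl et al., Eq. (19)
        "k3": np.max(gamma_hat / alpha2),         # Eq. (20)
        "k13": 1.0 / alpha2.max(),                # Eq. (30)
        "k14": np.sum(1.0 / alpha2),              # Eq. (31)
    }
```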

Numerical evaluation

In this section, a Monte Carlo simulation will be conducted to examine the performance of the RPEs in the QPRRE along with the QLE.

Simulation layout

In the QPRM, the response variable is generated from the quasi-Poisson distribution with parameters \(({\mu }_{i}, \gamma )\), where

$${\mu }_{i}={\text{exp}}\left({\beta }_{0}+{\beta }_{1}{x}_{i1}+{\beta }_{2}{x}_{i2}+\cdots +{\beta }_{p}{x}_{ip}\right),\quad i={1,\,2},\ldots,n,$$
(34)

where the \({\beta }_{j}\) are the regression parameters of the QPRM, selected under the condition \(\sum_{j=1}^{p+1}{\beta }_{j}^{2}=1\). The following formula is used to generate the correlated regressors:

$${x}_{ij}={\left(1-{\rho }^{2}\right)}^{1/2}{z}_{ij}+\rho {z}_{i\left(j+1\right)},\quad i={1,\,2},\ldots ,n;\;\; j={1,2},\ldots ,p,$$
(35)

where \({\rho }^{2}\) is the correlation between the regressors and the \({z}_{ij}\) are independent standard normal pseudo-random numbers. We consider values of \(\rho\) equal to 0.80, 0.90, 0.95 and 0.99, together with different values of \(n\), \(p\) and \(\gamma\). Here, \(n\) is the sample size, taken to be 25, 50, 100, 150, 200 for p = 3, 6 and 50, 100, 150, 200 for p = 12; \(p\) is the number of regressors, taken to be 3, 6, 12; and \(\gamma\) is the dispersion parameter, taken to be 2, 4, 6. The simulation is replicated 2000 times for each combination of \(n, p, \gamma , \rho\). To check the dominance of our proposed ridge estimator with different RPEs, we use the MSE as the performance criterion, defined by

$$MSE\left(\widehat{\beta }\right)=\frac{\sum_{i=1}^{V}{\left({\widehat{\beta }}_{i}-\beta \right)}^{\prime}\left({\widehat{\beta }}_{i}-\beta \right)}{V},$$
(36)

where \(V\) is the number of replications and \(\left({\widehat{\beta }}_{i}-\beta \right)\) is the difference between the estimated vector (of the proposed estimator or the QLE) and the true parameter vector at the ith replication. The R programming language is used for all calculations in this study.
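One replication of this layout can be sketched as below. The paper does not state the random generator for the response, so we assume a negative binomial draw, which matches the quasi-Poisson moments \(E(y)=\mu\) and \(Var\left(y\right)=\gamma \mu\) when \(\gamma >1\); for the regressors we use the variant of Eq. (35) with a shared extra standard-normal column, the classic scheme that gives every pair of regressors correlation \({\rho }^{2}\):

```python
import numpy as np

def simulate_qpr_data(n, p, rho, gamma, beta, rng):
    """One simulated data set: correlated regressors plus over-dispersed counts.

    x_ij = sqrt(1 - rho^2) z_ij + rho z_{i(p+1)} with a shared extra column,
    so every pair of regressors has correlation rho^2. Counts with mean mu
    and variance gamma * mu are drawn from a negative binomial (gamma > 1).
    """
    z = rng.standard_normal((n, p + 1))
    x = np.sqrt(1.0 - rho ** 2) * z[:, :p] + rho * z[:, [p]]
    X = np.column_stack([np.ones(n), x])      # intercept plus p regressors
    mu = np.exp(X @ beta)                     # Eq. (34)
    # NB(r, q): mean r(1 - q)/q, variance mean/q; q = 1/gamma gives Var = gamma*mu
    r = mu / (gamma - 1.0)
    y = rng.negative_binomial(r, 1.0 / gamma)
    return X, y
```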

Simulation results and discussions

The estimated MSEs of the QPRREs are given in Tables 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36. We evaluate the performance of the different RPEs for the QPRRE, compared with the QLE, across levels of multicollinearity, sample size, dispersion and number of regressors.

Table 1 Estimated MSEs for \(\gamma =2\), p = 3 and \(n=25\). Bold value indicates minimum MSE.
Table 2 Estimated MSEs for \(\gamma =2\), p = 3 and \(n=50\). Bold value indicates minimum MSE.
Table 3 Estimated MSEs for \(\gamma =2\), p = 3 and \(n=100\). Bold value indicates minimum MSE.
Table 4 Estimated MSEs for \(\gamma =2\), p = 3 and \(n=150\). Bold value indicates minimum MSE.
Table 5 Estimated MSEs for \(\gamma =2\), p = 3 and \(n=200\). Bold value indicates minimum MSE.
Table 6 Estimated MSEs for \(\gamma =4\), p = 3 and \(n=25\). Bold value indicates minimum MSE.
Table 7 Estimated MSEs for \(\gamma =4\), p = 3 and \(n=50\). Bold value indicates minimum MSE.
Table 8 Estimated MSEs for \(\gamma =4\), p = 3 and \(n=100\). Bold value indicates minimum MSE.
Table 9 Estimated MSEs for \(\gamma =4\), p = 3 and \(n=150\). Bold value indicates minimum MSE.
Table 10 Estimated MSEs for \(\gamma =4\), p = 3 and \(n=200\). Bold value indicates minimum MSE.
Table 11 Estimated MSEs for \(\gamma =6\), p = 3 and \(n=25\). Bold value indicates minimum MSE.
Table 12 Estimated MSEs for \(\gamma =6\), p = 3 and \(n=50\). Bold value indicates minimum MSE.
Table 13 Estimated MSEs for \(\gamma =6\), p = 3 and \(n=100\). Bold value indicates minimum MSE.
Table 14 Estimated MSEs for \(\gamma =6\), p = 3 and \(n=150\). Bold value indicates minimum MSE.
Table 15 Estimated MSEs for \(\gamma =6\), p = 3 and \(n=200\). Bold value indicates minimum MSE.
Table 16 Estimated MSEs for \(\gamma =2\), p = 6 and \(n=25\). Bold value indicates minimum MSE.
Table 17 Estimated MSEs for \(\gamma =2\), p = 6 and \(n=50\). Bold value indicates minimum MSE.
Table 18 Estimated MSEs for \(\gamma =2\), p = 6 and \(n=100\). Bold value indicates minimum MSE.

The simulation findings extracted from Tables 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36 are summarized as follows:

(i) The basic purpose of this simulation study is to examine the performance of our proposed RPEs for the QPRRE in the presence of multicollinearity. As the multicollinearity level increases with the number of predictors, the sample size and the dispersion held fixed, the estimated MSEs increase for all the estimation methods under study. However, at a high level of multicollinearity and with a larger sample size, the QPRRE with RPEs \({\widehat{k}}_{3}\), \({\widehat{k}}_{5}\), \({\widehat{k}}_{9}\), \({\widehat{k}}_{11}\), \({\widehat{k}}_{12}\) and \({\widehat{k}}_{16}\) mostly gives smaller MSEs than the QPRRE with the other RPEs and the QLE.

(ii) The number of regressors also affects the estimated MSEs of the QPRRE and the QLE. According to the simulation results, the estimated MSEs and the number of regressors are directly proportional: when the number of regressors increases with the other factors fixed, the estimated MSEs of the QPRRE and the QLE also increase.

(iii) Regarding the effect of sample size, the relation between the estimated MSEs and the sample size is inverse: as the sample size increases with the other factors fixed, the estimated MSEs decrease. The QPRRE with RPEs \({\widehat{k}}_{3}\), \({\widehat{k}}_{5}\), \({\widehat{k}}_{9}\), \({\widehat{k}}_{11}\), \({\widehat{k}}_{12}\) and \({\widehat{k}}_{16}\) mostly performs better than the other RPEs and the QLE in terms of minimum estimated MSE.

(iv) In every situation, i.e. for all combinations of multicollinearity level, sample size, number of explanatory variables and dispersion level, the proposed QPRRE with the different RPEs outperforms the QLE in terms of minimum MSE.

(v) For all the conditions under study, the estimated MSE of the QLE is always greater than that of every suggested RPE for the QPRRE, and the QPRRE decreases the estimated MSE substantially. We conclude that the proposed RPEs for the QPRRE perform well and give better results than the QLE. Mostly, the QPRRE with RPEs \({\widehat{k}}_{3}\), \({\widehat{k}}_{5}\), \({\widehat{k}}_{9}\), \({\widehat{k}}_{11}\), \({\widehat{k}}_{12}\) and \({\widehat{k}}_{16}\) gives better results than the QPRRE with the other RPEs, although in some situations the RPEs \({\widehat{k}}_{4}{-}{\widehat{k}}_{8}\) and \({\widehat{k}}_{15}\) perform best. The evidence from the simulation is that the QPRRE outperforms the QLE in the presence of multicollinearity, so we suggest that researchers use the QPRRE with the biasing parameters \({\widehat{k}}_{3}\), \({\widehat{k}}_{5}\), \({\widehat{k}}_{9}\), \({\widehat{k}}_{11}\), \({\widehat{k}}_{12}\) and \({\widehat{k}}_{16}\), which minimize the effects of multicollinearity in the QPRM most robustly.

Table 19 Estimated MSEs for \(\gamma =2\), p = 6 and \(n=150\). Bold value indicates minimum MSE.
Table 20 Estimated MSEs for \(\gamma =2\), p = 6 and \(n=200\). Bold value indicates minimum MSE.
Table 21 Estimated MSEs for \(\gamma =4\), p = 6 and \(n=25\). Bold value indicates minimum MSE.
Table 22 Estimated MSEs for \(\gamma =4\), p = 6 and \(n=50\). Bold value indicates minimum MSE.
Table 23 Estimated MSEs for \(\gamma =4\), p = 6 and \(n=100\). Bold value indicates minimum MSE.
Table 24 Estimated MSEs for \(\gamma =4\), p = 6 and \(n=150\). Bold value indicates minimum MSE.
Table 25 Estimated MSEs for \(\gamma =4\), p = 6 and \(n=200\). Bold value indicates minimum MSE.
Table 26 Estimated MSEs for \(\gamma =6\), p = 6 and \(n=25\). Bold value indicates minimum MSE.
Table 27 Estimated MSEs for \(\gamma =6\), p = 6 and \(n=50\). Bold value indicates minimum MSE.
Table 28 Estimated MSEs for \(\gamma =6\), p = 6 and \(n=100\). Bold value indicates minimum MSE.
Table 29 Estimated MSEs for \(\gamma =6\), p = 6 and \(n=150\). Bold value indicates minimum MSE.
Table 30 Estimated MSEs for \(\gamma =6\), p = 6 and \(n=200\). Bold value indicates minimum MSE.
Table 31 Estimated MSEs for \(\gamma =2\), \(p=12\), and \(n={50,\,100}\). Bold value indicates minimum MSE.
Table 32 Estimated MSEs for \(\gamma =2\), \(p=12\) and \(n={150,\, 200}\). Bold value indicates minimum MSE.
Table 33 Estimated MSEs for \(\gamma =4\), \(p=12\), and \(n={50,\,100}\). Bold value indicates minimum MSE.
Table 34 Estimated MSEs for \(\gamma =4\), \(p=12\), and \(n={150,\,200}\). Bold value indicates minimum MSE.
Table 35 Estimated MSEs for \(\gamma =6\), \(p=12\), and \(n={50,\,100}\). Bold value indicates minimum MSE.
Table 36 Estimated MSEs for \(\gamma =6\), \(p=12\) and \(n={150,\,200}\). Bold value indicates minimum MSE.

Application: apprentice migration dataset

In this section, we explore the superiority of the proposed estimators through a real-life example46 on apprentice migration to Edinburgh from the rest of Scotland between 1775 and 1799. The dataset consists of \(n=33\) observations with one explained variable and \(p=4\) predictors. The explained variable \(y\) is the number of apprentices, and the predictors \({x}_{1}\), \({x}_{2}\), \({x}_{3}\) and \({x}_{4}\) represent the distance, the population, the degree of urbanization and the direction from Edinburgh, respectively. The fit of the QP distribution is judged by the estimated dispersion, which for this dataset is 9651.93. This value shows that the data are heavily over-dispersed, so the QPRM is more appropriate than the PRM.

As there are four predictors, multicollinearity is possible. To test for multicollinearity among the predictors, we use the most popular criterion, the condition index (CI), which is the square root of the ratio of the maximum to the minimum eigenvalue of the cross-product matrix of the independent variables. The CI value is 63.81; since this exceeds 30, severe multicollinearity exists among the independent variables.
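The CI computation is a one-liner (an illustrative sketch of the criterion described above):

```python
import numpy as np

def condition_index(X):
    """CI = sqrt(lambda_max / lambda_min) of X'X; values above 30
    indicate severe multicollinearity."""
    lam = np.linalg.eigvalsh(X.T @ X)
    return np.sqrt(lam.max() / lam.min())
```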

The QPRM estimates are obtained using the QLE. The QLE gives good results when the predictors are uncorrelated; here the predictors are highly correlated, so the QLE does not provide good estimates, and the QPRRE is used to overcome the effect of multicollinearity. Table 37 shows the estimated coefficients and scalar estimated MSEs of the QLE and the QPRRE under the proposed RPEs. The QPRM estimates using the QLE and the QPRRE are obtained from Eqs. (6) and (10), and their estimated scalar MSEs from Eqs. (9) and (14), respectively. According to Table 37, the MSE of the QPRRE with each RPE is less than the MSE of the QLE, so the QPRRE performs better than the QLE. More specifically, the QPRRE with RPE \({k}_{7}\) performs best, compared with the QPRRE with the other RPEs and the QLE, in terms of minimum MSE.

Table 37 Estimated regression coefficients and MSEs of the selected estimators in the Apprentice Migration Dataset. Bold value indicates minimum MSE.

In real datasets, the MSE criterion sometimes does not reflect the predictive performance of the estimators47,48,49, so another model assessment criterion, cross-validation (CV), is recommended. The CV criterion is also known as the prediction sum of squares (PRESS/CV(1)), a jackknife fit at the given explanatory variables50; it has some limitations and several variants51. CV has been used by various authors to assess the performance of proposed estimators for different models47,48,49,52,53. Here we consider the k-fold CV and the PRESS criterion for further evaluation of the proposed RPEs in the QPRRE. For the procedure to compute the CV, see Akram et al.52 and Amin et al.53. The PRESS is computed from Pearson residuals for the QLE and the QPRRE respectively as

$$PRESS\left(k=0\right)=\frac{1}{n}\sum_{i=1}^{n}\frac{{\chi }_{i}^{2}}{{\left(1-{h}_{ii}\right)}^{2}},$$

where \({\chi }_{i}=\frac{{y}_{i}-{\widehat{\mu }}_{i}}{\sqrt{\widehat{\gamma }{\widehat{\mu }}_{i}}}\), and \({h}_{ii}\) are the ith diagonal elements of the hat matrix computed for the QLE and

$$PRESS\left(k\right)=\frac{1}{n}\sum_{i=1}^{n}\frac{{\chi }_{ki}^{2}}{{\left(1-{h}_{kii}\right)}^{2}},$$

where \({\chi }_{ki}=\frac{{y}_{i}-{\widehat{\mu }}_{ki}}{\sqrt{\widehat{\gamma }{\widehat{\mu }}_{ki}}}\), \({\widehat{\mu }}_{ki}\) is the predicted response of the QPRRE under the different RPEs and \({h}_{kii}\) are the diagonal elements of the hat matrix obtained for the QPRRE. The estimated CV and PRESS values for the QLE and the QPRRE with all RPEs are given in Table 37. Based on the CV results, the QPRRE with RPEs \({k}_{3}, {k}_{9}{-}{k}_{12}\) and \({k}_{16}\) performs better than the QPRRE with the other RPEs and the QLE, whereas under the PRESS criterion the QPRRE with RPEs \({k}_{1}{-}{k}_{8}\) performs best. In view of the simulation and application findings, to mitigate the effects of multicollinearity in the QPRM we suggest the QPRRE with RPEs \({k}_{3}{-}{k}_{9}\), \({k}_{11}, {k}_{12}\) and \({k}_{16}\).
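The PRESS criterion above can be sketched as follows. This is our own illustration; we assume the weighted hat matrix \(H={W}^{1/2}X{\left(F+k{I}\right)}^{-1}{X}^{\prime}{W}^{1/2}\), with k = 0 recovering the QLE case:

```python
import numpy as np

def press(y, mu_hat, gamma_hat, X, w, k=0.0):
    """PRESS based on Pearson residuals, as in the formulas above.

    w is the diagonal of the IWLS weight matrix at convergence; the hat
    matrix is taken as H = W^{1/2} X (X'WX + kI)^{-1} X' W^{1/2}, so
    k = 0 gives PRESS(k = 0) for the QLE and k > 0 gives PRESS(k)
    for the QPRRE.
    """
    chi = (y - mu_hat) / np.sqrt(gamma_hat * mu_hat)   # Pearson residuals
    Wx = np.sqrt(w)[:, None] * X                       # W^{1/2} X
    F = X.T @ (w[:, None] * X)
    h = np.diag(Wx @ np.linalg.solve(F + k * np.eye(X.shape[1]), Wx.T))
    return np.mean(chi ** 2 / (1.0 - h) ** 2)
```

Because the ridge term shrinks every leverage \(h_{kii}\) below its k = 0 value, the \(1/(1-h)^{2}\) factors in PRESS(k) are never larger than in PRESS(0) for the same residuals.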

Conclusion

When the dependent variable is an over-dispersed count, the QPRM can be used to model it. In this study, we proposed different RPEs for the QPRRE to minimize the problems caused by multicollinearity among the explanatory variables. To determine the superiority of the proposed ridge estimators, we conducted a simulation study under different parametric conditions: different sample sizes, numbers of predictor variables, dispersion levels and degrees of multicollinearity. Furthermore, we evaluated the performance of the proposed ridge estimators on a real-life dataset on apprentice migration. According to the results of the simulation study and the real-life dataset, the QPRRE with some available and proposed RPEs outperforms the QLE in the presence of severe multicollinearity. This evidence shows that the QPRRE is a better estimation method than the QLE for combating multicollinearity among the explanatory variables in over-dispersed count data.