A new estimator for the multicollinear Poisson regression model: simulation and application

The maximum likelihood estimator (MLE) is unstable in the presence of multicollinearity in a Poisson regression model (PRM). In this study, we propose a new estimator with some biasing parameters to estimate the regression coefficients of the PRM when multicollinearity is present. Simulation experiments are conducted to compare the estimators' performance using the mean squared error (MSE) criterion. For illustration, aircraft damage data are analyzed. The simulation results and the real-life application show that the proposed estimator performs better than the other estimators considered.


Scientific Reports | (2021) 11:3732 | https://doi.org/10.1038/s41598-021-82582-w

... experiment. Finally, the new method's benefit is evaluated in an example using aircraft damage data that was initially analyzed by Myers et al. 21. The paper is structured as follows: the Poisson regression model, some estimators, and the matrix mean square error (MSEM) and MSE properties of the estimators are discussed in Sect. 2. A Monte Carlo simulation experiment is conducted in Sect. 3. To illustrate the findings of the paper, aircraft damage data are analyzed in Sect. 4. Some concluding remarks are presented in Sect. 5.

Statistical methodology
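Poisson regression model and maximum likelihood estimator. Suppose that the response variable $y_i$ takes non-negative integer values (count data). Its probability function is then

$$f(y_i) = \frac{e^{-\mu_i}\mu_i^{y_i}}{y_i!}, \quad y_i = 0, 1, 2, \ldots, \tag{2.1}$$

where $\mu_i > 0$. The mean and variance of the Poisson distribution in Eq. (2.1) are equal, i.e. $E(y_i) = \mathrm{Var}(y_i) = \mu_i$. The model is written in terms of the mean of the response. According to Myers et al. 21, we assume that there exists a function $g$ that relates the mean of the response to a linear predictor such that

$$g(\mu_i) = \eta_i = x_i'\beta, \tag{2.2}$$

where $g(\cdot)$ is a monotone differentiable link function. A popular choice is the log link, $g(\mu_i) = \ln(\mu_i) = x_i'\beta$, so that $\mu_i = \exp(x_i'\beta)$. The log link is generally adopted for the PRM because it ensures that all fitted values of the response variable are positive. The maximum likelihood estimator is commonly used to estimate the coefficients of the PRM, where the likelihood function is defined as

$$L(\beta; y) = \prod_{i=1}^{n} \frac{e^{-\mu_i}\mu_i^{y_i}}{y_i!}, \quad \mu_i = \exp(x_i'\beta). \tag{2.3}$$

The log-likelihood function is used to estimate the parameter vector $\beta$:

$$\ell(\beta) = \sum_{i=1}^{n} y_i x_i'\beta - \sum_{i=1}^{n} \exp(x_i'\beta) - \sum_{i=1}^{n} \ln(y_i!). \tag{2.4}$$

Since Eq. (2.4) is nonlinear in $\beta$, the solution is obtained using iterative methods. A common such procedure is the Fisher scoring method, defined as

$$\beta^{(t+1)} = \beta^{(t)} + I^{-1}\big(\beta^{(t)}\big)\, S\big(\beta^{(t)}\big), \tag{2.5}$$

where $S(\beta) = \partial \ell(\beta)/\partial \beta$ is the score vector and $I(\beta) = E\big(-\partial^2 \ell(\beta)/\partial\beta\,\partial\beta'\big)$ is the Fisher information matrix. At convergence, the estimated coefficients are

$$\hat{\beta}_{PMLE} = (X'\hat{W}X)^{-1} X'\hat{W}\hat{z}, \tag{2.6}$$

where $\hat{W} = \mathrm{diag}(\hat{\mu}_i)$ and $\hat{z}$ is the vector with ith element $\hat{z}_i = \ln(\hat{\mu}_i) + (y_i - \hat{\mu}_i)/\hat{\mu}_i$; both are obtained from the Fisher scoring iterations (see Hardin and Hilbe 22). The covariance matrix and mean square error are given respectively as

$$\mathrm{Cov}(\hat{\beta}_{PMLE}) = (X'\hat{W}X)^{-1} \tag{2.7}$$

and

$$\mathrm{MSE}(\hat{\beta}_{PMLE}) = \sum_{j=1}^{p} \frac{1}{\lambda_j}, \tag{2.8}$$

where $\lambda_j$ is the jth eigenvalue of the matrix $X'\hat{W}X$.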
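The Fisher scoring iteration for the log-link PRM can be sketched as iteratively reweighted least squares, in which each step solves a weighted least-squares problem with weights $\hat{\mu}_i$ and a working response $\hat{z}_i$. The sketch below is illustrative only: it is written in Python with NumPy rather than the R code used in the study, and the function name `poisson_irls`, the zero starting value and the convergence tolerance are assumptions introduced here.

```python
import numpy as np

def poisson_irls(X, y, max_iter=100, tol=1e-10):
    """Fit a log-link Poisson regression by Fisher scoring (IRLS).

    Returns the coefficient estimate, the diagonal of W-hat (the fitted
    means) and the working response z-hat at the final estimate.
    """
    n, p = X.shape
    beta = np.zeros(p)                    # assumed starting value
    for _ in range(max_iter):
        eta = X @ beta                    # linear predictor
        mu = np.exp(eta)                  # fitted means under the log link
        z = eta + (y - mu) / mu           # working response
        XtW = X.T * mu                    # X' W with W = diag(mu)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    mu = np.exp(X @ beta)
    z = X @ beta + (y - mu) / mu
    return beta, mu, z
```

At convergence the returned coefficients satisfy the score equations $X'(y - \hat{\mu}) = 0$, and `np.linalg.inv((X.T * mu) @ X)` gives the estimated covariance matrix. With strongly collinear regressors the matrix $X'\hat{W}X$ becomes ill-conditioned, which is the instability the biased estimators are meant to address.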
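Poisson K-L estimator. Månsson and Shukur proposed the Poisson ridge regression estimator (PRRE) to mitigate the problem of multicollinearity, which is defined as follows:

$$\hat{\beta}_{PRRE} = (X'\hat{W}X + kI)^{-1} X'\hat{W}X\, \hat{\beta}_{PMLE}, \tag{2.9}$$

where $k > 0$ is the biasing parameter and $I$ is a $p \times p$ identity matrix. The optimal value of $k$ depends on $\alpha_i$, the ith component of $\alpha = Q'\beta$, where $Q$ is the matrix whose columns are the eigenvectors of $X'\hat{W}X$. Månsson et al. 16 introduced the Poisson Liu estimator (PLE) as follows:

$$\hat{\beta}_{PLE} = (X'\hat{W}X + I)^{-1} (X'\hat{W}X + dI)\, \hat{\beta}_{PMLE}, \tag{2.11}$$

where the biasing parameter $d$ may be estimated as proposed by Månsson et al. 16. Kibria and Lukman 15 proposed a new single-parameter ridge-type estimator for the linear regression model, which is defined as follows:

$$\hat{\beta}_{KL} = (X'X + kI)^{-1} (X'X - kI)\, \hat{\beta}_{OLS}, \tag{2.13}$$

where $\hat{\beta}_{OLS} = (X'X)^{-1}X'y$ is the ordinary least squares estimator. Following Kibria and Lukman 15, we propose the following new estimator for the Poisson regression model, the Poisson K-L estimator (PKLE):

$$\hat{\beta}_{PKLE} = (X'\hat{W}X + kI)^{-1} (X'\hat{W}X - kI)\, \hat{\beta}_{PMLE}. \tag{2.14}$$

Suppose $\alpha = Q'\beta$ and $Q'X'\hat{W}XQ = \Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$, where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$, $\Lambda$ is the matrix of eigenvalues of $X'\hat{W}X$ and $Q$ is the matrix whose columns are the eigenvectors of $X'\hat{W}X$. The matrix mean square error (MSEM) and the mean square error (MSE) of the estimators PMLE, PRRE, PLE and PKLE are provided in Eqs. (2.15) to (2.21) respectively as follows:

$$\mathrm{MSEM}(\hat{\alpha}_{PMLE}) = \Lambda^{-1}, \tag{2.15}$$

$$\mathrm{MSEM}(\hat{\alpha}_{PRRE}) = (\Lambda + kI)^{-1} \Lambda (\Lambda + kI)^{-1} + k^2 (\Lambda + kI)^{-1} \alpha\alpha' (\Lambda + kI)^{-1}, \tag{2.16}$$

$$\mathrm{MSE}(\hat{\alpha}_{PRRE}) = \sum_{j=1}^{p} \frac{\lambda_j}{(\lambda_j + k)^2} + k^2 \sum_{j=1}^{p} \frac{\alpha_j^2}{(\lambda_j + k)^2}, \tag{2.17}$$

$$\mathrm{MSEM}(\hat{\alpha}_{PLE}) = \Lambda_d \Lambda^{-1} \Lambda_d' + (1 - d)^2 (\Lambda + I)^{-1} \alpha\alpha' (\Lambda + I)^{-1}, \tag{2.18}$$

$$\mathrm{MSE}(\hat{\alpha}_{PLE}) = \sum_{j=1}^{p} \frac{(\lambda_j + d)^2}{\lambda_j (\lambda_j + 1)^2} + (1 - d)^2 \sum_{j=1}^{p} \frac{\alpha_j^2}{(\lambda_j + 1)^2}, \tag{2.19}$$

$$\mathrm{MSEM}(\hat{\alpha}_{PKLE}) = (\Lambda + kI)^{-1} (\Lambda - kI) \Lambda^{-1} (\Lambda - kI) (\Lambda + kI)^{-1} + 4k^2 (\Lambda + kI)^{-1} \alpha\alpha' (\Lambda + kI)^{-1}, \tag{2.20}$$

$$\mathrm{MSE}(\hat{\alpha}_{PKLE}) = \sum_{j=1}^{p} \frac{(\lambda_j - k)^2}{\lambda_j (\lambda_j + k)^2} + 4k^2 \sum_{j=1}^{p} \frac{\alpha_j^2}{(\lambda_j + k)^2}, \tag{2.21}$$

where $\Lambda_d = (\Lambda + I)^{-1}(\Lambda + dI)$.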
Here, $\lambda_j$ is the jth eigenvalue of $X'\hat{W}X$ and $\alpha_j$ is the jth element of $\alpha$. For the purpose of theoretical comparisons, we adopt the following lemmas. Proof.

Proof.
The matrix found in the above equation

Selection of Biasing Parameter. The biasing parameter $k$ is estimated by taking the first derivative of the MSE function of $\hat{\alpha}_{PKLE}$ with respect to $k$ and equating the result to zero. We obtain the following estimates of $k$:

$$k_j = \frac{\lambda_j}{1 + 2\lambda_j \alpha_j^2}, \quad j = 1, \ldots, p.$$

Following Månsson et al. 16 and Lukman and Ayinde 25, we propose the forms of the shrinkage parameter given in Eq. (2.26).
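The estimators of this section and the role of the biasing parameter can be illustrated numerically. The following sketch is written in Python with NumPy and assumes the usual forms of the estimators as shrunken versions of the PMLE: $(X'\hat{W}X + kI)^{-1}X'\hat{W}X\hat{\beta}_{PMLE}$ for the PRRE, $(X'\hat{W}X + I)^{-1}(X'\hat{W}X + dI)\hat{\beta}_{PMLE}$ for the PLE and $(X'\hat{W}X + kI)^{-1}(X'\hat{W}X - kI)\hat{\beta}_{PMLE}$ for the PKLE, together with the scalar MSE expressions these forms imply in canonical coordinates. The function names are hypothetical, and the per-component value $k_j = \lambda_j/(1 + 2\lambda_j\alpha_j^2)$ is what setting the derivative of the jth PKLE MSE term to zero yields under these assumptions.

```python
import numpy as np

def shrinkage_estimators(XtWX, beta_mle, k, d):
    """Return (PRRE, PLE, PKLE) computed from the PMLE (assumed forms)."""
    I = np.eye(XtWX.shape[0])
    prre = np.linalg.solve(XtWX + k * I, XtWX @ beta_mle)
    ple = np.linalg.solve(XtWX + I, (XtWX + d * I) @ beta_mle)
    pkle = np.linalg.solve(XtWX + k * I, (XtWX - k * I) @ beta_mle)
    return prre, ple, pkle

def scalar_mse(lam, alpha, k, d):
    """Scalar MSE of each estimator from eigenvalues lam and alpha = Q'beta."""
    a2 = alpha ** 2
    return {
        "PMLE": np.sum(1.0 / lam),
        "PRRE": np.sum(lam / (lam + k) ** 2)
                + k ** 2 * np.sum(a2 / (lam + k) ** 2),
        "PLE": np.sum((lam + d) ** 2 / (lam * (lam + 1.0) ** 2))
               + (d - 1.0) ** 2 * np.sum(a2 / (lam + 1.0) ** 2),
        "PKLE": np.sum((lam - k) ** 2 / (lam * (lam + k) ** 2))
                + 4.0 * k ** 2 * np.sum(a2 / (lam + k) ** 2),
    }

def component_optimal_k(lam, alpha):
    """Value of k that minimises the j-th term of the PKLE scalar MSE."""
    return lam / (1.0 + 2.0 * lam * alpha ** 2)

# Illustrative values only: one large and one near-zero eigenvalue.
lam = np.array([5.0, 0.05])
alpha = np.array([0.7, 0.7])
k_j = component_optimal_k(lam, alpha)
mse_at_small_k = scalar_mse(lam, alpha, k=0.05, d=0.5)
```

Setting `k=0` and `d=1` recovers the PMLE from all three estimators, which is a quick sanity check on the assumed forms. How the componentwise values `k_j` are collapsed into a single biasing parameter is a separate choice that this sketch does not make.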

Simulation Experiment
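Simulation Design. Since a theoretical comparison among the estimators is not sufficient, a simulation experiment is carried out in this section. We generate the response variable of the PRM from the Poisson distribution with mean $\mu_i = \exp(x_i'\beta)$, where $x_i'$ is the ith row of the design matrix $X$. Following Kibria 1, we generate the $X$ matrix as follows:

$$x_{ij} = (1 - \rho^2)^{1/2} z_{ij} + \rho z_{i,p+1}, \quad i = 1, \ldots, n, \; j = 1, \ldots, p,$$

where the $z_{ij}$ are independent standard normal pseudo-random numbers and $\rho^2$ is the correlation between the explanatory variables. The values of $\rho$ are chosen to be 0.85, 0.9, 0.95 and 0.99. The mean function is obtained for p = 4 and 7 regressors, respectively. According to Kibria et al. 26, the intercept values are chosen to be −1, 0 and 1 to change the average intensity of the Poisson process. The slope coefficients are chosen so that $\sum_{j=1}^{p} \beta_j^2 = 1$ and $\beta_1 = \beta_2 = \cdots = \beta_p$, for sample sizes 50, 75, 100 and 200. The simulation experiment is conducted in the R programming language 27. The estimated MSE is calculated as the average, over the simulation replications, of the squared distance $(\hat{\beta} - \beta)'(\hat{\beta} - \beta)$ between the estimate and the true parameter vector. The simulation results are reported in Tables 1 to 6.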
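The data-generating scheme of such a design can be sketched as follows. This is an illustration in Python with NumPy, not the R code used in the study. It assumes the commonly used construction $x_{ij} = (1 - \rho^2)^{1/2} z_{ij} + \rho z_{i,p+1}$ with independent standard normal $z$, equal slopes normalised to unit length, and an estimated MSE defined as the average squared distance between the estimates and the true slope vector. All function names are hypothetical.

```python
import numpy as np

def generate_design(n, p, rho, rng):
    """Regressors whose pairwise correlation is rho**2 (assumed scheme)."""
    z = rng.standard_normal((n, p + 1))
    # The shared last column z[:, p] induces the correlation between regressors.
    return np.sqrt(1.0 - rho ** 2) * z[:, :p] + rho * z[:, [p]]

def simulate_response(X, intercept, rng):
    """Draw Poisson counts with mean exp(intercept + X beta), equal slopes."""
    p = X.shape[1]
    beta = np.full(p, 1.0 / np.sqrt(p))   # beta_1 = ... = beta_p, sum of squares 1
    mu = np.exp(intercept + X @ beta)
    return rng.poisson(mu), beta

def estimated_mse(estimates, beta_true):
    """Average of (b - beta)'(b - beta) over the rows of `estimates`."""
    diff = np.asarray(estimates) - beta_true
    return np.mean(np.sum(diff ** 2, axis=1))
```

A simulation run would call `generate_design` and `simulate_response` for each combination of n, p, ρ and intercept, fit every estimator on each replicate, and pass the stacked slope estimates to `estimated_mse`. The number of replications is left to the caller.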

Real life application
In this section, we examine the effectiveness of the new estimator using real-life data. We adopt the aircraft damage data to evaluate the performance of the proposed estimator and of the other estimators in this study. The dataset was initially used by Myers et al. 21. The regression coefficients and MSE values of the estimators are given in Table 7. From Table 7, we observe that the coefficient estimates have the same signs across the estimators. PMLE has the highest mean square error, while the proposed estimator (PKLE2) has the lowest MSE, which establishes its superiority. The maximum likelihood estimator has the highest MSE because of the multicollinearity. The ridge and Liu estimators also perform well when there is multicollinearity. We observe that the performance of the proposed estimator is a function of the biasing parameter, k.

Some concluding remarks
The K-L estimator has a single biasing parameter, k, which avoids the computational burden of estimating the biasing parameters required by some two-parameter estimators. It belongs to the class of ridge and Liu estimators used to mitigate multicollinearity in the linear regression model. According to Kibria and Lukman 15, the K-L estimator outperforms the ordinary least squares estimator, the ridge estimator and the Liu estimator in the linear regression model. As stated earlier, multicollinearity affects the performance of the maximum likelihood estimator (MLE) in both linear regression models and Poisson regression models (PRM). The ridge regression and Liu estimators were adapted to the PRM at different times to address multicollinearity. In this study, we developed a new estimator, established its statistical properties and carried out theoretical comparisons with the estimators mentioned above. Furthermore, we conducted a simulation experiment and analyzed a real-life application to show the effectiveness of the proposed estimator. The simulation and application results show that the proposed estimators outperform the existing estimators, while PMLE has the worst performance.