Introduction

Linear regression models are widely used to predict a response variable from a combination of regressors (predictors). The model is generally written as:

$$y=X\beta +\varepsilon ,$$
(1.1)

where \(y\) is an \(n\times 1\) vector of responses, \(X\) is an \(n\times p\) full-rank matrix of regressors, \(\beta\) is a \(p\times 1\) vector of unknown regression coefficients, and \(\varepsilon\) is an \(n\times 1\) vector of errors. The errors are assumed to be normally distributed with mean zero and covariance matrix \({\sigma }^{2}{I}_{n}\), where \({I}_{n}\) is the \(n\times n\) identity matrix. The parameter \(\beta\) is often estimated by the ordinary least squares (OLS) estimator, defined as follows:

$$\widehat{\beta }={\left({X}^{^{\prime}}X\right)}^{-1}{X}^{^{\prime}}y$$
(1.2)
$$Cov(\widehat{\beta })={\sigma }^{2}({X}^{^{\prime}}X{)}^{-1}$$
(1.3)

where \({\widehat{\sigma }}^{2}\) is the residual mean square, \({\widehat{\sigma }}^{2}=\frac{{\left(y-X\widehat{\beta }\right)}^{^{\prime}} \left(y-X\widehat{\beta }\right)}{n-{p}^{*}}\), and \({p}^{*}=p+1\) is the number of estimated parameters. The scalar mean squared error \((\mathrm{S}MSE)\) of \(\widehat{\beta }\) and the matrix mean squared error \((\mathrm{M}MSE)\) of \(\widehat{\beta }\) are given by:

$$MMSE\left(\widehat{\beta }\right)={\sigma }^{2}({X}^{^{\prime}}X{)}^{-1}$$
(1.4)
$$SMSE\left(\widehat{\beta }\right)={\sigma }^{2}tr({X}^{^{\prime}}X{)}^{-1}={\sigma }^{2}\sum_{j=1}^{{p}^{*}}\frac{1}{{\lambda }_{j}}$$
(1.5)
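Equation (1.5) follows directly from the spectral decomposition of \(X^{\prime}X\) (formalized in Section “Theoretical comparisons among estimators”); writing the step out makes explicit why near-collinear regressors, i.e. small eigenvalues \({\lambda }_{j}\), inflate the SMSE:

$$X^{\prime}X=T\Lambda {T}^{\prime},\qquad {\left(X^{\prime}X\right)}^{-1}=T{\Lambda }^{-1}{T}^{\prime},\qquad tr\left[{\left(X^{\prime}X\right)}^{-1}\right]=tr\left({\Lambda }^{-1}{T}^{\prime}T\right)=\sum_{j=1}^{{p}^{*}}\frac{1}{{\lambda }_{j}},$$

so that \(SMSE(\widehat{\beta })={\sigma }^{2}\sum_{j=1}^{{p}^{*}}1/{\lambda }_{j}\) grows without bound as any \({\lambda }_{j}\to 0\).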

The OLS estimator is known to be sensitive to the presence of correlated regressors (multicollinearity) and outliers, which can negatively impact its performance. Several alternative methods have been proposed to address the issue of correlated regressors, including the Stein estimator, ridge regression, the Liu estimator, the modified Liu estimator, the modified ridge-type estimator, the Kibria–Lukman estimator, the Dawoud–Kibria estimator, and others1,2,3,4,5,6,7. These methods aim to account effectively for the correlation among the regressors.

Outliers are data points that differ markedly from the other observations and can have a substantial impact on model estimates8,9. They threaten the efficiency of the OLS estimator8,9,10,11, and it is well known that robust estimators are preferred when dealing with outliers12,13,14,15,16,17,18,19. However, multicollinearity and outliers can occur simultaneously in a model. To address both issues, some of the methods mentioned above have been combined; for example, ridge regression has been combined with the M-estimator to handle both correlated regressors and outliers in the y-direction20.

Recently, the Stein estimator has gained popularity as an alternative to OLS and performs well in handling correlated regressors. A few researchers have extended the method to generalized linear models such as the Poisson, zero-inflated negative binomial, and inverse Gaussian regression models21,22,23. However, the Stein estimator is sensitive to outliers in the y-direction. In this study, we propose a robust version of the Stein estimator that can handle both multicollinearity and outliers.

In Section “Theoretical comparisons among estimators”, we provide a theoretical comparison of the proposed and existing estimators. We then conduct a simulation study in Section “Simulation study” to evaluate their performance, and in Section “Real-life application”, we analyze real-life data for illustration purposes. Finally, we conclude our findings in Section “Some concluding remarks”.

Theoretical comparisons among estimators

For the suggested biased estimators, we employ the spectral decomposition of the information matrix (\(X^{\prime } X\)) to obtain explicit forms of the matrix mean squared error (MMSE) and the scalar mean squared error (SMSE). Assume that there exists a matrix \(T\) such that:

$${T}^{\mathrm{^{\prime}}}\left({X}^{^{\prime}}X\right)T=\Lambda =diag\left\{{\lambda }_{j}\right\} , j=\mathrm{1,2},\dots ,{p}^{*},\left({p}^{*}=p+1\right),$$

where \({\lambda }_{1}\ge {\lambda }_{2} \ge \dots \ge {\lambda }_{{p}^{*}}>0\) are the ordered eigenvalues of \(({X}^{^{\prime}}X)\) and \(T\) is the \(({p}^{*}\times {p}^{*})\) orthogonal matrix whose columns are the corresponding eigenvectors. Rewrite the linear regression model in Eq. (1.1) in canonical form:

$${y}_{i}={\sum }_{j=1}^{{p}^{*}}{\alpha }_{j}{h}_{ij}+{\varepsilon }_{i}, i=\mathrm{1,2},\dots ,n,$$
(2.1)

where \(H=XT\), \(\alpha ={T}^{^{\prime}}\beta\), and \({H}^{^{\prime}}H={T}^{^{\prime}}\left(X{^{\prime}}X\right)T=\Lambda\). In the presence of correlated regressors (multicollinearity), the ordinary least squares estimator \({\widehat{\alpha }}_{LS}\) is inadequate and inefficient. Outliers likewise distort the parameter estimates of \({\widehat{\alpha }}_{LS}\). The M-estimator is efficient for handling outliers in the y-direction15. Let \({\widehat{\alpha }}_{M}\) denote the M-estimator of α, obtained by solving the M-estimating equations. The effect of outliers in the y-direction is damped by the residual weights in the iteratively reweighted least-squares algorithm used to solve these equations10,15.

$${\widehat{\alpha }}_{LS}={\Lambda }^{-1}{H}^{^{\prime}}y$$
(2.2)
$${\widehat{\alpha }}_{M }=\underset{\alpha }{\mathrm{arg\,min}}\sum_{i=1}^{n}\pi \left(\frac{{\varepsilon }_{i}}{\eta }\right)=\underset{\alpha }{\mathrm{arg\,min}}\sum_{i=1}^{n}\pi \left(\frac{{y}_{i}-{\sum }_{j=1}^{{p}^{*}}{\alpha }_{j}{h}_{ij}}{\eta }\right),$$
(2.3)

where \(\pi (\cdot)\) is a robust criterion function and \(\eta\) is a scale estimate. \({\widehat{\alpha }}_{M}\) is obtained by solving the M-estimating equations \(\sum_{i=1}^{n}\phi \left(\frac{{e}_{i}}{\eta }\right)=0\) and \(\sum_{i=1}^{n}\phi \left(\frac{{e}_{i}}{\eta }\right){h}_{ij}=0\), \(j=1,2,\dots ,{p}^{*}\), where \({e}_{i}={y}_{i}-{\sum }_{j=1}^{{p}^{*}}{\widehat{\alpha }}_{j,M}{h}_{ij}\) and \(\phi ={\pi }^{^{\prime}}\) is a suitably chosen function10.

$$SMSE\left({\widehat{\alpha }}_{M}\right)={\sum }_{j=1}^{{p}^{*}}{\Psi }_{jj},$$
(2.4)

where \({\Psi }_{jj}\) is the \({j}\)th element of the main diagonal of the matrix \(Var\left({\widehat{\alpha }}_{M}\right)=\Psi\), which is finite.
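To make Eqs. (2.2)–(2.4) concrete, the following R sketch computes \({\widehat{\alpha }}_{LS}\) and \({\widehat{\alpha }}_{M}\) in the canonical coordinates. The simulated data, the use of rlm from the MASS package with Huber's \(\psi\), and the extraction of \({\Psi }_{jj}\) from the reported standard errors are illustrative assumptions, not prescriptions of the paper.

```r
library(MASS)  # rlm() performs M-estimation via iteratively reweighted least squares

set.seed(1)
n <- 50; p <- 3
x <- matrix(rnorm(n * p), n, p)               # illustrative regressors
X <- cbind(1, x)                              # design matrix with intercept (p* = p + 1 columns)
y <- drop(X %*% c(1, 2, -1, 0.5)) + rnorm(n)  # illustrative response

eig  <- eigen(crossprod(X))                   # spectral decomposition of X'X
Tmat <- eig$vectors                           # columns = eigenvectors of X'X
lam  <- eig$values                            # ordered eigenvalues lambda_1 >= ... >= lambda_p*
H    <- X %*% Tmat                            # canonical regressors, H'H = Lambda

alpha_ls <- drop(crossprod(H, y)) / lam       # Eq. (2.2): Lambda^{-1} H'y (Lambda is diagonal)

fit_m   <- rlm(y ~ H - 1, psi = psi.huber)    # Eq. (2.3): M-estimation with Huber's psi
alpha_m <- coef(fit_m)                        # alpha_M

Psi_jj <- summary(fit_m)$coefficients[, "Std. Error"]^2  # estimated diagonal of Var(alpha_M)
smse_m <- sum(Psi_jj)                         # Eq. (2.4)
```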

The ridge regression estimator of \(\alpha\) is defined as:

$${{\widehat{\alpha }}_{Ridge }=\left(\Lambda +kI\right)}^{-1}\Lambda {\widehat{\alpha }}_{LS}$$
(2.5)
$$cov\left({\widehat{\alpha }}_{Ridge }\right)={\sigma }^{2}{\left(\Lambda +kI\right)}^{-1}\Lambda {\left(\Lambda +kI\right)}^{-1}, k\ge 0$$
(2.6)
$$Bias\left({\widehat{\alpha }}_{Ridge }\right)=E\left({\left(\Lambda +kI\right)}^{-1}\Lambda {\widehat{\alpha }}_{LS }\right)-\alpha$$
$$=\left[{\left(\Lambda +kI\right)}^{-1}\Lambda -I\right]\alpha$$
(2.7)

The scalar mean squared error \((\mathrm{S}MSE)\) of \({\widehat{\alpha }}_{Ridge}\) and the matrix mean squared error \((\mathrm{M}MSE)\) of \({\widehat{\alpha }}_{Ridge}\) are calculated as:

$$MMSE\left({\widehat{\alpha }}_{Ridge }\right)={\sigma }^{2}{\left(\Lambda +kI\right)}^{-1}\Lambda {\left(\Lambda +kI\right)}^{-1}+Bias\left({\widehat{\alpha }}_{Ridge }\right)Bias\left({\widehat{\alpha }}_{Ridge }\right){^{\prime}}$$
(2.8)
$$SMSE\left({\widehat{\alpha }}_{Ridge }\right)={\sigma }^{2}\sum_{j=1}^{{p}^{*}}\frac{{\lambda }_{j}}{{\left({\lambda }_{j}+{k}\right)}^{2}}+{k}^{2}\sum_{j=1}^{{p}^{*}}\frac{{\alpha }_{j}^{2}}{{\left({\lambda }_{j}+{k}\right)}^{2}}$$
(2.9)

The M-Ridge estimator, which applies the ridge shrinkage to the M-estimate, is given by:

$${\widehat{\alpha }}_{M-Ridge}={\left(\Lambda +{k}_{m}I\right)}^{-1}\Lambda {\widehat{\alpha }}_{M}$$
(2.10)
$$cov\left({\widehat{\alpha }}_{M-Ridge }\right)={\left(\Lambda +{k}_{m}I\right)}^{-1}\mathrm{\Lambda \Psi \Lambda }{\left(\Lambda +{k}_{m}I\right)}^{-1}, {k}_{m}\ge 0$$
(2.11)
$$Bias\left({\widehat{\alpha }}_{M-Ridge }\right)=E\left({\left(\Lambda +{k}_{m}I\right)}^{-1}\Lambda {\widehat{\alpha }}_{M }\right)-\alpha$$
(2.12)

The scalar mean squared error \((\mathrm{S}MSE)\) of \({\widehat{\alpha }}_{M-Ridge}\) and the matrix mean squared error \((\mathrm{M}MSE)\) of \({\widehat{\alpha }}_{M-Ridge}\) are calculated as:

$$MMSE\left({\widehat{\alpha }}_{M-Ridge }\right)={\left(\Lambda +{k}_{m}I\right)}^{-1}\mathrm{\Lambda \Psi \Lambda }{\left(\Lambda +{k}_{m}I\right)}^{-1}+Bias\left({\widehat{\alpha }}_{M-Ridge }\right)Bias\left({\widehat{\alpha }}_{M-Ridge }\right){^{\prime}}$$
(2.13)
$$SMSE\left({\widehat{\alpha }}_{M-Ridge }\right)=\sum_{j=1}^{{p}^{*}}\frac{{\lambda }_{j}^{2}}{{\left({\lambda }_{j}+k\right)}^{2}}{\Psi }_{jj}+\sum_{j=1}^{{p}^{*}}\frac{{\alpha }_{j}^{2}{k}^{2}}{{\left({\lambda }_{j}+k\right)}^{2}}$$
(2.14)
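In the canonical coordinates both ridge-type estimators are componentwise shrinkages, so Eqs. (2.5) and (2.10) each reduce to a single line of R. The eigenvalues, canonical estimates, and shrinkage constants below are purely illustrative; in practice they would come from a fit such as the sketch after Eq. (2.4).

```r
lam      <- c(12.0, 5.3, 1.1, 0.04)  # illustrative ordered eigenvalues of X'X
alpha_ls <- c(1.8, -0.6, 0.9, 2.5)   # illustrative canonical OLS estimate
alpha_m  <- c(1.7, -0.5, 0.8, 1.2)   # illustrative canonical M-estimate
k  <- 0.5                            # ridge parameter (a data-driven choice is Eq. (3.4))
km <- 0.5                            # M-ridge parameter (see Eq. (3.5))

alpha_ridge   <- lam / (lam + k)  * alpha_ls  # Eq. (2.5): (Lambda + kI)^{-1} Lambda alpha_LS
alpha_m_ridge <- lam / (lam + km) * alpha_m   # Eq. (2.10): the same shrinkage applied to alpha_M
```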

The James–Stein estimator (Stein, 1960) is given by:

$${\widehat{\alpha }}_{JSE }=\mathrm{c }{\widehat{\alpha }}_{LS}$$
(2.15)

where

$$c=\frac{{\widehat{\alpha }}_{LS }{^{\prime}}{\widehat{\alpha }}_{LS }}{{\widehat{\alpha }}_{LS }{^{\prime}}{\widehat{\alpha }}_{LS }+{\sigma }^{2}tr{({X}^{^{\prime}}X)}^{-1}}={\sum }_{j=1}^{{p}^{*}}\frac{{\lambda }_{j}{\alpha }_{j}^{2}}{{\sigma }^{2}+{\lambda }_{j}{\alpha }_{j}^{2}}$$
(2.16)
$$cov\left({\widehat{\alpha }}_{JSE }\right)=c\, Cov\left({\widehat{\alpha }}_{LS }\right){c}^{^{\prime}}=c {\widehat{\sigma }}^{2}(X{^{\prime}}X{)}^{-1}c{^{\prime}}$$
(2.17)
$$Bias\left({\widehat{\alpha }}_{JSE }\right)=E\left(c{\widehat{\alpha }}_{LS }\right)-\alpha =(c-1)\alpha$$

The scalar mean squared error \((\mathrm{S}MSE)\) of \({\widehat{\alpha }}_{JSE}\) and the matrix mean squared error \((\mathrm{M}MSE)\) of \({\widehat{\alpha }}_{JSE}\) are calculated as:

$$MMSE\left({\widehat{\alpha }}_{JSE }\right)=c {\sigma }^{2}({X}^{^{\prime}}X{)}^{-1}{c}^{^{\prime}}+Bias\left({\widehat{\alpha }}_{JSE }\right)Bias\left({\widehat{\alpha }}_{JSE }\right){^{\prime}}$$
(2.18)
$$SMSE\left({\widehat{\alpha }}_{JSE }\right)={c}^{2}{\sigma }^{2}\sum_{j=1}^{{p}^{*}}\frac{1}{{\lambda }_{j}}+{\left(c-1\right)}^{2}\sum_{j=1}^{{p}^{*}}{\alpha }_{j}^{2}$$
(2.19)
$$SMSE\left({\widehat{\alpha }}_{JSE}\right)={\sum }_{j=1}^{{p}^{*}}\frac{{\sigma }^{2}{\lambda }_{j}{\alpha }_{j}^{4}}{{\left({\sigma }^{2}+{\lambda }_{j}{\alpha }_{j}^{2}\right)}^{2}}+{\sum }_{j=1}^{{p}^{*}}\frac{{\sigma }^{4}{\alpha }_{j}^{2}}{{\left({\sigma }^{2}+{\lambda }_{j}{\alpha }_{j}^{2}\right)}^{2}}$$
(2.20)
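The James–Stein estimator shrinks the whole OLS vector by the single factor c; a minimal R sketch of Eqs. (2.15) and (2.16), with illustrative inputs and sigma2 standing for the residual mean square \({\widehat{\sigma }}^{2}\):

```r
lam      <- c(12.0, 5.3, 1.1, 0.04)  # illustrative eigenvalues of X'X
alpha_ls <- c(1.8, -0.6, 0.9, 2.5)   # illustrative canonical OLS estimate
sigma2   <- 4                        # illustrative residual mean square

# Eq. (2.16): c = alpha'alpha / (alpha'alpha + sigma^2 * tr((X'X)^{-1}))
c_jse     <- sum(alpha_ls^2) / (sum(alpha_ls^2) + sigma2 * sum(1 / lam))
alpha_jse <- c_jse * alpha_ls        # Eq. (2.15)
```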

M-Stein estimator

The Stein estimator is sensitive to outliers in the y-direction. We therefore propose a robust Stein estimator, defined as follows:

$${\widehat{\alpha }}_{M-JSE}={c}^{*} {\widehat{\alpha }}_{M },$$
(2.21)

where \({\widehat{\alpha }}_{M}\) is the M-estimate of α,

$$ {c}^{*}={\sum }_{j=1}^{{p}^{*}}\left(\frac{{\lambda }_{j}{\alpha }_{j}^{2}}{{\Psi }_{jj}+{\lambda }_{j}{\alpha }_{j}^{2}}\right)$$
(2.22)
$$cov\left({\widehat{\alpha }}_{M-JSE }\right)={c}^{*}Cov\left({\widehat{\alpha }}_{M }\right){{c}^{*}}^{^{\prime}}={c}^{*}\Psi {{c}^{*}}^{^{\prime}}$$
(2.23)
$$Bias\left({\widehat{\alpha }}_{M-JSE }\right)=E\left({c}^{*}{\widehat{\alpha }}_{M }\right)-\alpha =({c}^{*}-1)\alpha$$

The scalar mean squared error \((\mathrm{S}MSE)\) of \({\widehat{\alpha }}_{M-JSE}\) and the matrix mean squared error \((\mathrm{M}MSE)\) of \({\widehat{\alpha }}_{M-JSE}\) are calculated as:

$$MMSE\left({\widehat{\alpha }}_{M-JSE }\right)={c}^{*}\Psi {{c}^{*}}^{^{\prime}}+Bias\left({\widehat{\alpha }}_{M-JSE }\right)Bias\left({\widehat{\alpha }}_{M-JSE }\right){^{\prime}}$$
(2.24)
$$SMSE\left({\widehat{\alpha }}_{M-JSE }\right)={{c}^{*}}^{2}\sum_{j=1}^{{p}^{*}}{\Psi }_{jj}+{\left({c}^{*}-1\right)}^{2}\sum_{j=1}^{{p}^{*}}{\alpha }_{j}^{2}$$
(2.25)
$$SMSE\left({\widehat{\alpha }}_{M-JSE}\right)={\sum }_{j=1}^{{p}^{*}}\frac{{\Psi }_{jj}{\lambda }_{j}{\alpha }_{j}^{4}}{{\left({\Psi }_{jj}+{\lambda }_{j}{\alpha }_{j}^{2}\right)}^{2}}+{\sum }_{j=1}^{{p}^{*}}\frac{{\Psi }_{jj}^{2}{\alpha }_{j}^{2}}{{\left({\Psi }_{jj}+{\lambda }_{j}{\alpha }_{j}^{2}\right)}^{2}}$$
(2.26)
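The proposed M-Stein (M-JSE) estimator is equally easy to compute once \({\widehat{\alpha }}_{M}\) and the diagonal of \(\Psi\) are available. The R lines below are a literal transcription of Eqs. (2.21), (2.22), and (2.26), evaluated at illustrative values and with \({\alpha }_{j}\) in Eq. (2.26) replaced by the M-estimates:

```r
lam     <- c(12.0, 5.3, 1.1, 0.04)   # illustrative eigenvalues of X'X
alpha_m <- c(1.7, -0.5, 0.8, 1.2)    # illustrative canonical M-estimate
Psi_jj  <- c(0.40, 0.55, 1.90, 28.0) # illustrative diagonal of Var(alpha_M) = Psi

c_star     <- sum(lam * alpha_m^2 / (Psi_jj + lam * alpha_m^2))  # Eq. (2.22) as printed
alpha_mjse <- c_star * alpha_m                                   # Eq. (2.21)

# Eq. (2.26): SMSE of the proposed estimator, written componentwise
smse_mjse <- sum(Psi_jj   * lam * alpha_m^4 / (Psi_jj + lam * alpha_m^2)^2) +
             sum(Psi_jj^2 * alpha_m^2       / (Psi_jj + lam * alpha_m^2)^2)
```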

We assume that the following conditions hold for the main theorems:

  1. \(\phi\) is skew-symmetric and non-decreasing.

  2. The errors are symmetrically distributed.

  3. \(\Psi\) is finite.

We now give theoretical comparisons among the estimators based on the scalar mean squared errors presented in Eqs. (1.5), (2.9), (2.14), (2.20), and (2.26).

Theorem 2.1

\(SMSE\left({\widehat{\alpha }}_{M-JSE}\right)<SMSE\left({\widehat{\alpha }}_{LS}\right),\) if \({\sigma }^{2}\left({\Psi }_{jj}+{\lambda }_{j}{\alpha }_{j}^{2}\right)>{\Psi }_{jj}{\alpha }_{j}^{2}{\lambda }_{j}\), where \({\Psi }_{jj}\) is the \({j}\)th element of the main diagonal of the matrix \(Var\left({\widehat{\alpha }}_{M}\right)=\Psi\).

Proof:

The difference between \(SMSE\left({\widehat{\alpha }}_{M-JSE}\right)\) and \(SMSE\left({\widehat{\alpha }}_{LS}\right)\) is given by:

$${\sum }_{j=1}^{{p}^{*}}\frac{{\Psi }_{jj}{\lambda }_{j}{\alpha }_{j}^{4}}{{\left({\Psi }_{jj}+{\lambda }_{j}{\alpha }_{j}^{2}\right)}^{2}}+{\sum }_{j=1}^{{p}^{*}}\frac{{\Psi }_{jj}^{2}{\alpha }_{j}^{2}}{{\left({\Psi }_{jj}+{\lambda }_{j}{\alpha }_{j}^{2}\right)}^{2}}-{\sum }_{j=1}^{{p}^{*}}\frac{{\sigma }^{2}}{{\lambda }_{j}}$$
(2.27)
Combining the terms in Eq. (2.27) over a common denominator, the difference is negative if

$${\sum }_{j=1}^{{p}^{*}}\frac{{\Psi }_{jj}{\lambda }_{j}^{2}{\alpha }_{j}^{4}+{\Psi }_{jj}^{2}{\lambda }_{j}{\alpha }_{j}^{2}-{\sigma }^{2}{\left({\Psi }_{jj}+{\lambda }_{j}{\alpha }_{j}^{2}\right)}^{2}}{{\lambda }_{j}{\left({\Psi }_{jj}+{\lambda }_{j}{\alpha }_{j}^{2}\right)}^{2}}<0,$$

and a sufficient condition is that each summand is negative, i.e., for each \(j\),

$$\left({\Psi }_{jj}+{\lambda }_{j}{\alpha }_{j}^{2}\right){\Psi }_{jj}{\alpha }_{j}^{2}{\lambda }_{j}<{\sigma }^{2}{\left({\Psi }_{jj}+{\lambda }_{j}{\alpha }_{j}^{2}\right)}^{2}$$
$${\Psi }_{jj}{\alpha }_{j}^{2}{\lambda }_{j}<{\sigma }^{2}\left({\Psi }_{jj}+{\lambda }_{j}{\alpha }_{j}^{2}\right)$$
$${\Psi }_{jj}{\alpha }_{j}^{2}{\lambda }_{j}-{\sigma }^{2}\left({\Psi }_{jj}+{\lambda }_{j}{\alpha }_{j}^{2}\right)<0$$
(2.28)

The difference is therefore negative whenever \({\sigma }^{2}\left({\Psi }_{jj}+{\lambda }_{j}{\alpha }_{j}^{2}\right)>{\Psi }_{jj}{\alpha }_{j}^{2}{\lambda }_{j}\), which is the condition of the theorem. This completes the proof.
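As a quick numerical illustration of Theorem 2.1, the R lines below evaluate both sides at arbitrary values chosen to satisfy the stated condition (inflated \({\Psi }_{jj}\) mimicking an M-estimator variance under contamination, one near-zero eigenvalue mimicking multicollinearity):

```r
sigma2 <- 25                      # error variance sigma^2
lam    <- c(10, 4, 0.8, 0.05)     # eigenvalues of X'X (one near zero: multicollinearity)
alpha  <- c(1.5, -0.8, 0.6, 0.3)  # canonical coefficients alpha_j
Psi_jj <- c(30, 32, 45, 600)      # diagonal of Var(alpha_M)

# Condition of Theorem 2.1, checked componentwise
all(sigma2 * (Psi_jj + lam * alpha^2) > Psi_jj * alpha^2 * lam)  # TRUE for these values

smse_mjse <- sum(Psi_jj   * lam * alpha^4 / (Psi_jj + lam * alpha^2)^2) +  # Eq. (2.26)
             sum(Psi_jj^2 * alpha^2       / (Psi_jj + lam * alpha^2)^2)
smse_ls   <- sigma2 * sum(1 / lam)                                         # Eq. (1.5)
smse_mjse < smse_ls               # TRUE, as the theorem asserts
```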

Theorem 2.2

\(SMSE\left({\widehat{\alpha }}_{M-JSE}\right)<SMSE\left({\widehat{\alpha }}_{Ridge}\right),\) if \({\sigma }^{2}{\lambda }_{j}\left({\alpha }_{j}^{2}{\lambda }_{j}+{\Psi }_{jj}\right)+{k}^{2}{\alpha }_{j}^{4}{\lambda }_{j}>{\Psi }_{jj}{\alpha }_{j}^{2}{\lambda }_{j}\left({\lambda }_{j}+2k\right)\), where \({\Psi }_{jj}\) is the \({j}^{th}\) element of the main diagonal of the matrix \(Var\left({\widehat{\alpha }}_{M}\right)=\Psi\).

Proof:

The difference between \(SMSE\left({\widehat{\alpha }}_{M-JSE}\right)\) and \(SMSE\left({\widehat{\alpha }}_{Ridge}\right)\) is given by:

$${\sum }_{j=1}^{{p}^{*}}\frac{{\Psi }_{jj}{\lambda }_{j}{\alpha }_{j}^{4}}{{\left({\Psi }_{jj}+{\lambda }_{j}{\alpha }_{j}^{2}\right)}^{2}}+{\sum }_{j=1}^{{p}^{*}}\frac{{\Psi }_{jj}^{2}{\alpha }_{j}^{2}}{{\left({\Psi }_{jj}+{\lambda }_{j}{\alpha }_{j}^{2}\right)}^{2}}-{\sum }_{j=1}^{{p}^{*}}\frac{\left({\sigma }^{2}{\lambda }_{j}+{k}^{2}{\alpha }_{j}^{2}\right)}{{\left({\lambda }_{j}+k\right)}^{2}}$$
(2.29)
$${\sum }_{j=1}^{{p}^{*}}\frac{{\Psi }_{jj}{\alpha }_{j}^{2}\left({\lambda }_{j}{\alpha }_{j}^{2}+{\Psi }_{jj}\right){\left({\lambda }_{j}+k\right)}^{2}-\left({\sigma }^{2}{\lambda }_{j}+{k}^{2}{\alpha }_{j}^{2}\right){\left({\Psi }_{jj}+{\lambda }_{j}{\alpha }_{j}^{2}\right)}^{2}}{{\left({\Psi }_{jj}+{\lambda }_{j}{\alpha }_{j}^{2}\right)}^{2}{\left({\lambda }_{j}+k\right)}^{2}}$$

\(SMSE\left({\widehat{\alpha }}_{M-JSE}\right)\) is better than \(SMSE\left({\widehat{\alpha }}_{Ridge}\right)\) if the difference is less than zero, i.e. if

$${\Psi }_{jj}{\alpha }_{j}^{2}\left({\lambda }_{j}{\alpha }_{j}^{2}+{\Psi }_{jj}\right){\left({\lambda }_{j}+k\right)}^{2}<\left({\sigma }^{2}{\lambda }_{j}+{k}^{2}{\alpha }_{j}^{2}\right){\left({\Psi }_{jj}+{\lambda }_{j}{\alpha }_{j}^{2}\right)}^{2}$$
$${\Psi }_{jj}{\alpha }_{j}^{2}{\left({\lambda }_{j}+k\right)}^{2}<\left({\sigma }^{2}{\lambda }_{j}+{k}^{2}{\alpha }_{j}^{2}\right)\left({\lambda }_{j}{\alpha }_{j}^{2}+{\Psi }_{jj}\right)$$
$${\Psi }_{jj}{\alpha }_{j}^{2}{\lambda }_{j}^{2}+{\Psi }_{jj}{\alpha }_{j}^{2}{k}^{2}+2k{\lambda }_{j}{\Psi }_{jj}{\alpha }_{j}^{2}<{\sigma }^{2}{\alpha }_{j}^{2}{\lambda }_{j}^{2}+{\sigma }^{2}{\lambda }_{j}{\Psi }_{jj}+{k}^{2}{\alpha }_{j}^{4}{\lambda }_{j}+{k}^{2}{\alpha }_{j}^{2}{\Psi }_{jj}$$
$${\Psi }_{jj}{\alpha }_{j}^{2}{\lambda }_{j}^{2}+2k{\lambda }_{j}{\Psi }_{jj}{\alpha }_{j}^{2}-{\sigma }^{2}{\alpha }_{j}^{2}{\lambda }_{j}^{2}-{\sigma }^{2}{\lambda }_{j}{\Psi }_{jj}-{k}^{2}{\alpha }_{j}^{4}{\lambda }_{j}< 0$$
$${\Psi }_{jj}{\alpha }_{j}^{2}{\lambda }_{j}\left({\lambda }_{j}+2k\right)-{\sigma }^{2}{\lambda }_{j}\left({\alpha }_{j}^{2}{\lambda }_{j}+{\Psi }_{jj}\right)-{k}^{2}{\alpha }_{j}^{4}{\lambda }_{j}<0$$
(2.30)

The difference is therefore negative whenever \({\sigma }^{2}{\lambda }_{j}\left({\alpha }_{j}^{2}{\lambda }_{j}+{\Psi }_{jj}\right)+{k}^{2}{\alpha }_{j}^{4}{\lambda }_{j}>{\Psi }_{jj}{\alpha }_{j}^{2}{\lambda }_{j}\left({\lambda }_{j}+2k\right)\), which is the condition of the theorem. This completes the proof.

Theorem 2.3

\(SMSE\left({\widehat{\alpha }}_{M-JSE}\right)<SMSE\left({\widehat{\alpha }}_{M-Ridge}\right),\) if \(\left({\Psi }_{jj}^{2}{\lambda }_{j}+{k}^{2}{\alpha }_{j}^{2}{\Psi }_{jj}+{k}^{2}{\alpha }_{j}^{4}{\lambda }_{j}\right)>{\Psi }_{jj}{\alpha }_{j}^{2}k\left(k+2{\lambda }_{j}\right)\), where \({\Psi }_{jj}\) is the \({j}^{th}\) element of the main diagonal of the matrix \(Var\left({\widehat{\alpha }}_{M}\right)=\Psi\).

Proof:

The difference between \(SMSE\left({\widehat{\alpha }}_{M-JSE}\right)\) and \(SMSE\left({\widehat{\alpha }}_{M-Ridge}\right)\) is given by:

$${\sum }_{j=1}^{{p}^{*}}\frac{{\Psi }_{jj}{\lambda }_{j}{\alpha }_{j}^{4}}{{\left({\Psi }_{jj}+{\lambda }_{j}{\alpha }_{j}^{2}\right)}^{2}}+{\sum }_{j=1}^{{p}^{*}}\frac{{\Psi }_{jj}^{2}{\alpha }_{j}^{2}}{{\left({\Psi }_{jj}+{\lambda }_{j}{\alpha }_{j}^{2}\right)}^{2}}-{\sum }_{j=1}^{{p}^{*}}\frac{{\Psi }_{jj}{\lambda }_{j}}{{\left({\lambda }_{j}+k\right)}^{2}}-{\sum }_{j=1}^{{p}^{*}}\frac{{k}^{2}{\alpha }_{j}^{2}}{{\left({\lambda }_{j}+k\right)}^{2}}$$
(2.31)
$${\sum }_{j=1}^{{p}^{*}}\frac{{\Psi }_{jj}{\alpha }_{j}^{2}\left({\lambda }_{j}{\alpha }_{j}^{2}+{\Psi }_{jj}\right)}{{\left({\Psi }_{jj}+{\lambda }_{j}{\alpha }_{j}^{2}\right)}^{2}}-{\sum }_{j=1}^{{p}^{*}}\frac{{\Psi }_{jj}{\lambda }_{j}+{k}^{2}{\alpha }_{j}^{2}}{{\left({\lambda }_{j}+k\right)}^{2}}$$
$${\sum }_{j=1}^{{p}^{*}}\frac{{\Psi }_{jj}{\alpha }_{j}^{2}\left({\lambda }_{j}{\alpha }_{j}^{2}+{\Psi }_{jj}\right){\left({\lambda }_{j}+k\right)}^{2}-\left({\Psi }_{jj}{\lambda }_{j}+{k}^{2}{\alpha }_{j}^{2}\right){\left({\Psi }_{jj}+{\lambda }_{j}{\alpha }_{j}^{2}\right)}^{2}}{{\left({\Psi }_{jj}+{\lambda }_{j}{\alpha }_{j}^{2}\right)}^{2}{\left({\lambda }_{j}+k\right)}^{2}}$$

\(SMSE\left({\widehat{\alpha }}_{M-JSE}\right)\) is better than \(SMSE\left({\widehat{\alpha }}_{M-Ridge}\right)\) if the difference is less than zero, i.e. if

$${\Psi }_{jj}{\alpha }_{j}^{2}\left({\lambda }_{j}{\alpha }_{j}^{2}+{\Psi }_{jj}\right){\left({\lambda }_{j}+k\right)}^{2}<\left({\Psi }_{jj}{\lambda }_{j}+{k}^{2}{\alpha }_{j}^{2}\right){\left({\Psi }_{jj}+{\lambda }_{j}{\alpha }_{j}^{2}\right)}^{2}$$
$${\Psi }_{jj}{\alpha }_{j}^{2}{\left({\lambda }_{j}+k\right)}^{2}<\left({\Psi }_{jj}{\lambda }_{j}+{k}^{2}{\alpha }_{j}^{2}\right)\left({\lambda }_{j}{\alpha }_{j}^{2}+{\Psi }_{jj}\right)$$
$${\Psi }_{jj}{\alpha }_{j}^{2}{k}^{2}+{\Psi }_{jj}{\alpha }_{j}^{2}{\lambda }_{j}^{2}+2{\Psi }_{jj}{\alpha }_{j}^{2}{\lambda }_{j}k<{\Psi }_{jj}{\alpha }_{j}^{2}{\lambda }_{j}^{2}+{\Psi }_{jj}^{2}{\lambda }_{j}+{k}^{2}{\alpha }_{j}^{2}{\Psi }_{jj}+{k}^{2}{\alpha }_{j}^{4}{\lambda }_{j}$$
$${\Psi }_{jj}{\alpha }_{j}^{2}k\left(k+2{\lambda }_{j}\right)-\left({\Psi }_{jj}^{2}{\lambda }_{j}+{k}^{2}{\alpha }_{j}^{2}{\Psi }_{jj}+{k}^{2}{\alpha }_{j}^{4}{\lambda }_{j}\right)<0$$
(2.32)

The difference is therefore negative whenever \({\Psi }_{jj}^{2}{\lambda }_{j}+{k}^{2}{\alpha }_{j}^{2}{\Psi }_{jj}+{k}^{2}{\alpha }_{j}^{4}{\lambda }_{j}>{\Psi }_{jj}{\alpha }_{j}^{2}k\left(k+2{\lambda }_{j}\right)\), which is the condition of the theorem. This completes the proof.

Theorem 2.4

\(SMSE\left({\widehat{\alpha }}_{M-JSE}\right)<SMSE\left({\widehat{\alpha }}_{JSE}\right),\) if \({\sigma }^{2}{\alpha }_{j}^{2}\left({\lambda }_{j}{\alpha }_{j}^{2}+{\Psi }_{jj}\right)>{\Psi }_{jj}{\alpha }_{j}^{2}\left({\lambda }_{j}{\alpha }_{j}^{2}+{\sigma }^{2}\right)\), where \({\Psi }_{jj}\) is the \({j}^{th}\) element of the main diagonal of the matrix \(Var\left({\widehat{\alpha }}_{M}\right)=\Psi\).

Proof:

The difference between \(SMSE\left({\widehat{\alpha }}_{M-JSE}\right)\) and \(SMSE\left({\widehat{\alpha }}_{JSE}\right)\) is given by:

$${\sum }_{j=1}^{{p}^{*}}\frac{{\Psi }_{jj}{\lambda }_{j}{\alpha }_{j}^{4}}{{\left({\Psi }_{jj}+{\lambda }_{j}{\alpha }_{j}^{2}\right)}^{2}}+{\sum }_{j=1}^{{p}^{*}}\frac{{\Psi }_{jj}^{2}{\alpha }_{j}^{2}}{{\left({\Psi }_{jj}+{\lambda }_{j}{\alpha }_{j}^{2}\right)}^{2}}-{\sum }_{j=1}^{{p}^{*}}\frac{{\sigma }^{2}{\lambda }_{j}{\alpha }_{j}^{4}}{{\left({\sigma }^{2}+{\lambda }_{j}{\alpha }_{j}^{2}\right)}^{2}}-{\sum }_{j=1}^{{p}^{*}}\frac{{\sigma }^{4}{\alpha }_{j}^{2}}{{\left({\sigma }^{2}+{\lambda }_{j}{\alpha }_{j}^{2}\right)}^{2}}$$
(2.33)
$${\sum }_{j=1}^{{p}^{*}}\frac{{\Psi }_{jj}{\alpha }_{j}^{2}\left({\lambda }_{j}{\alpha }_{j}^{2}+{\Psi }_{jj}\right){\left({\sigma }^{2}+{\lambda }_{j}{\alpha }_{j}^{2}\right)}^{2}-{\sigma }^{2}{\alpha }_{j}^{2}\left({\lambda }_{j}{\alpha }_{j}^{2}+{\sigma }^{2}\right){\left({\Psi }_{jj}+{\lambda }_{j}{\alpha }_{j}^{2}\right)}^{2}}{{\left({\Psi }_{jj}+{\lambda }_{j}{\alpha }_{j}^{2}\right)}^{2}{\left({\sigma }^{2}+{\lambda }_{j}{\alpha }_{j}^{2}\right)}^{2}}$$

\(SMSE\left({\widehat{\alpha }}_{M-JSE}\right)\) is better than \(SMSE\left({\widehat{\alpha }}_{JSE}\right)\) if the difference is less than zero, i.e. if

$${\Psi }_{jj}{\alpha }_{j}^{2}\left({\lambda }_{j}{\alpha }_{j}^{2}+{\sigma }^{2}\right)<{\sigma }^{2}{\alpha }_{j}^{2}\left({\lambda }_{j}{\alpha }_{j}^{2}+{\Psi }_{jj}\right)$$
$${\Psi }_{jj}{\alpha }_{j}^{2}\left({\lambda }_{j}{\alpha }_{j}^{2}+{\sigma }^{2}\right)-{\sigma }^{2}{\alpha }_{j}^{2}\left({\lambda }_{j}{\alpha }_{j}^{2}+{\Psi }_{jj}\right)<0$$
(2.34)

The difference is therefore negative whenever \({\sigma }^{2}{\alpha }_{j}^{2}\left({\lambda }_{j}{\alpha }_{j}^{2}+{\Psi }_{jj}\right)>{\Psi }_{jj}{\alpha }_{j}^{2}\left({\lambda }_{j}{\alpha }_{j}^{2}+{\sigma }^{2}\right)\), which is the condition of the theorem. This completes the proof.

Simulation study

This section provides a simulation study using the R programming language to compare the performance of the non-robust and robust estimators.

Simulation design

The design of the simulation study is based on specifying the factors expected to affect the properties of the suggested estimator and on selecting a metric to assess the outcomes. Following the cited references24,25,26,27,28, we generated the regressors as follows:

$${x}_{ij}={(1-{\rho }^{2})}^{1/2}{m}_{ij}+\rho {m}_{i,{p}^{*}+1}, i=\mathrm{1,2},\dots ,n, j=\mathrm{1,2},3, \dots , {p}^{*}$$
(3.1)

where the \({m}_{ij}\) are independent standard normal pseudo-random numbers, \({p}^{*}\) denotes the number of regressors (\({p}^{*}\)=4, 8, 12), and \(\rho\) denotes the level of multicollinearity (\(\rho =0.7, 0.8, 0.9, 0.99\)). The response variable is then generated as:

$${y}_{i}={\beta }_{0}+{\beta }_{1}{x}_{i1}+ \dots + {\beta }_{{p}^{*}}{x}_{i{p}^{*}}+{\varepsilon }_{i}, i=\mathrm{1,2}, \dots , n$$
(3.2)

where \({\varepsilon }_{i}\sim N(0,{\sigma }^{2})\) with \(\sigma =5, 10\), \(n\) = 30, 50, 100, 200, and the regression parameters are chosen such that \({\beta }^{^{\prime}}\beta =1\)29,30,31,32,33,34,35. The experiment is repeated 2000 times. Outliers are introduced by inflating the magnitude of the response variable: using Eq. (3.3), 10% and 20% of the observations are contaminated.

$${y}_{i}=h*\mathrm{max}\left({y}_{i}\right)+{y}_{i},$$
(3.3)

where h = 10 is the factor used to inflate the contaminated responses36,37. The ridge parameter k is obtained using the following equation:

$$k=\frac{{p}^{*}{\widehat{\sigma }}^{2}}{\sum_{j=1}^{{p}^{*}}{\widehat{\alpha }}_{j,LS}^{2}},$$
(3.4)

where \({\widehat{\sigma }}^{2}=\frac{\sum_{i=1}^{n}{e}_{i}^{2}}{n-r}\), \({e}_{i}={y}_{i}-{\widehat{y}}_{i}\), and r denotes the number of estimated parameters.
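A minimal R sketch of one replicate of this design, covering Eqs. (3.1)–(3.4). The particular values of n, \({p}^{*}\), ρ, σ, and the coefficient vector satisfying \({\beta }^{^{\prime}}\beta =1\) are one illustrative setting; which observations are contaminated is chosen at random.

```r
set.seed(2024)
n <- 50; p_star <- 4; rho <- 0.9; sigma <- 5

# Eq. (3.1): correlated regressors built from independent standard normals m_ij
m <- matrix(rnorm(n * (p_star + 1)), n, p_star + 1)
x <- sqrt(1 - rho^2) * m[, 1:p_star] + rho * m[, p_star + 1]

beta <- rep(1 / sqrt(p_star + 1), p_star + 1)   # chosen so that beta'beta = 1
X <- cbind(1, x)
y <- drop(X %*% beta) + rnorm(n, 0, sigma)      # Eq. (3.2)

# Eq. (3.3): contaminate 10% of the responses with h = 10
h   <- 10
out <- sample(n, size = round(0.10 * n))
y[out] <- h * max(y) + y[out]

# Eq. (3.4): data-driven ridge parameter from the canonical OLS fit
eig <- eigen(crossprod(X))
lam <- eig$values
H   <- X %*% eig$vectors
alpha_ls   <- drop(crossprod(H, y)) / lam
e          <- y - drop(H %*% alpha_ls)
sigma2_hat <- sum(e^2) / (n - ncol(X))
k <- ncol(X) * sigma2_hat / sum(alpha_ls^2)     # p* taken as the number of estimated coefficients
```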

Asymptotically, \({\Psi }_{jj}\) is estimated by \({\widehat{A}}^{2}{\lambda }_{j}^{-1}\), where \({\widehat{A}}^{2}=\frac{{s}^{2}{(n-{p}^{*})}^{-1}\sum_{i=1}^{n}{\left[\varphi \left({e}_{i}/s\right)\right]}^{2}}{{\left[\frac{1}{n}\sum_{i=1}^{n}{\varphi }^{^{\prime}}\left({e}_{i}/s\right)\right]}^{2}}\) and \(s\) is the robust scale estimate. Thus, the shrinkage parameter for the M-ridge estimator is determined using the following equation:

$${k}_{m}=\frac{{p}^{*}\widehat{A}}{\sum_{j=1}^{{p}^{*}}{\widehat{\alpha }}_{j,M}^{2}}$$
(3.5)
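The robust counterparts can be sketched as follows, continuing directly from the data generated above (it reuses y, H, lam, and n). The Huber \(\psi\) with tuning constant 1.345 and the internal scale estimate of rlm are illustrative choices.

```r
library(MASS)

fit_m   <- rlm(y ~ H - 1, psi = psi.huber)  # M-fit in the canonical coordinates (IRLS)
alpha_m <- coef(fit_m)
s <- fit_m$s                                # robust scale estimate
u <- residuals(fit_m) / s

kH   <- 1.345                               # Huber tuning constant
psi  <- pmin(pmax(u, -kH), kH)              # Huber psi(u)
dpsi <- as.numeric(abs(u) <= kH)            # its derivative psi'(u)

A2 <- s^2 * (sum(psi^2) / (n - ncol(H))) / mean(dpsi)^2  # A^2, the asymptotic variance factor
Psi_jj <- A2 / lam                          # asymptotic estimate of the diagonal of Psi

km <- ncol(H) * sqrt(A2) / sum(alpha_m^2)   # Eq. (3.5) as printed (uses A, not A^2);
                                            # replace sqrt(A2) by A2 for the exact analogue of Eq. (3.4)
```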

The estimated mean squared error (MSE) is computed as follows:

$$MSE= \frac{1}{2000} \sum_{i=1}^{2000}\sum_{j=1}^{p}{({\widehat{\beta }}_{ij}-{\beta }_{j})}^{2}$$
(3.6)

where \({\widehat{\beta }}_{ij}\) is the estimated \({j}^{th}\) parameter in the \({i}^{th}\) replication and \({\beta }_{j}\) is the \({j}^{th}\) true parameter value. The estimated mean squared error (MSE) values of the proposed and competing estimators are displayed in Tables 1, 2, 3, 4, 5 and 6 for \({p}^{*}\)=4 with 10% outliers, \({p}^{*}\)=8 with 10% outliers, \({p}^{*}\)=12 with 10% outliers, \({p}^{*}\)=4 with 20% outliers, \({p}^{*}\)=8 with 20% outliers and \({p}^{*}\)=12 with 20% outliers, respectively.
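The replication scheme behind Eq. (3.6) can be organized as a small helper function; simulate_data and the entries of estimators below are hypothetical user-supplied pieces (they could be assembled from the sketches above) rather than code from the paper.

```r
# Estimated MSE over R replications, Eq. (3.6).
#   simulate_data(): returns a list with design matrix X, response y, and the
#                    true coefficient vector beta (a hypothetical helper).
#   estimators:      named list; each element maps (X, y) to a coefficient
#                    estimate of the same length as beta.
estimated_mse <- function(simulate_data, estimators, R = 2000) {
  sse <- setNames(numeric(length(estimators)), names(estimators))
  for (i in seq_len(R)) {
    d <- simulate_data()
    for (nm in names(estimators)) {
      b_hat <- estimators[[nm]](d$X, d$y)
      sse[nm] <- sse[nm] + sum((b_hat - d$beta)^2)  # inner sum over j in Eq. (3.6)
    }
  }
  sse / R                                           # average over the R replications
}

# Example usage with OLS only (canonical estimates would first be mapped back via beta = T alpha):
# estimated_mse(my_simulate_data, list(OLS = function(X, y) qr.solve(X, y)))
```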

Table 1 Estimated MSE values for p = 4 with 10% outlier.
Table 2 Estimated MSE values for p = 8 with 10% outlier.
Table 3 Estimated MSE values for p = 12 with 10% outlier.
Table 4 Estimated MSE values for p = 4 with 20% outlier.
Table 5 Estimated MSE values for p = 8 with 20% outlier.
Table 6 Estimated MSE values for p = 12 with 20% outlier.

For a clearer visualization of the simulated MSE values, we plotted the MSE values versus sample size in Fig. 1 for \({p}^{*}\)=4, σ = 5, 10% outliers and different ρ, and in Fig. 2 for \({p}^{*}\)=4, σ = 5, 20% outliers and different ρ. The MSE values versus the outlier percentage are plotted in Fig. 3 for n = 30, \({p}^{*}\)=4, σ = 5 and different ρ.

Figure 1

MSE vs sample size, for p = 4, σ = 5, 10% outliers and different ρ.

Figure 2

MSE vs sample size, for p = 4, σ = 5, 20% outliers and different ρ.

Figure 3

MSE vs outliers, for n = 30, p = 4, σ = 5, different values of ρ.

Discussion of simulation results

Our conclusions are derived from the comprehensive review of the simulation results presented in Tables 1, 2, 3, 4, 5 and 6 and Figs. 1, 2 and 3. The key findings are outlined below:

Overall, the proposed estimator consistently outperforms OLS in all scenarios, yielding substantially lower mean squared error (MSE) values. In addition, the estimated MSE of every estimator decreases monotonically as the sample size grows, showing that a larger sample size benefits all of the estimators, including OLS.

The proposed estimator \({\widehat{\alpha }}_{M-JSE}\) consistently exhibits the lowest MSE values across all simulation settings, surpassing both the OLS estimator and the other biased estimators. To investigate the impact of outliers on the estimated regression parameters, we considered two percentages of outliers in the y-direction; as the percentage increases from 10 to 20%, the MSE of all estimators increases correspondingly. To assess the influence of multicollinearity, we varied the correlation between the explanatory variables (ρ = 0.7, 0.8, 0.9, 0.99); increasing this correlation resulted in higher MSE values for all estimators. When evaluating the estimators against the sample size (n = 30, 50, 100, 200) with p, the percentage of outliers, and σ fixed, a clear trend emerged: the MSE consistently decreased as the sample size grew. The parameter σ also had a marked impact, as its increase led to a corresponding rise in the MSE of all estimators. The number of explanatory variables influenced the MSE values as well: a higher number of explanatory variables resulted in higher MSE values. Under all simulation conditions, the proposed estimator is the most effective choice for mitigating multicollinearity in the presence of outliers.

Real-life application

In this section, we consider three real-data examples to evaluate the performance of the estimators.

Example I

We utilized a pollution dataset that has been analyzed previously by various researchers38,39. The response variable is the total age-adjusted mortality rate per 100,000, modeled as a linear function of 15 covariates. For a more detailed description of the data, refer to38,39.

First, we employed the least squares method to fit model (1.1) and obtained the residuals. The diagnostic plots in Fig. 4, obtained from these residuals, indicate that certain observations are outliers. Specifically, the residual versus fitted plot identified data points 26, 31, and 37 as outliers, and the normal Q-Q plot indicated that data points 26, 32, and 37 were outliers. The residual versus leverage plot identified observations 18, 32, and 37 as outliers, while the scale-location plot picked out observations 32 and 37. These observations reveal that there are outliers in the model. Additionally, the variance inflation factors for \({x}_{i12}\) and \({x}_{i13}\) were 98.64 and 104.98, respectively, indicating a high degree of correlation among the regressors.
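The diagnostic checks described above can be reproduced along the following lines in R. Here pollution is assumed to be a data frame holding the mortality response and the 15 covariates (obtainable from the cited sources), and vif() comes from the car package.

```r
library(car)   # vif() computes variance inflation factors

# 'pollution' is an assumed data frame with the response 'mortality'
# and the 15 covariates described in the cited sources.
fit_ols <- lm(mortality ~ ., data = pollution)

# Standard residual diagnostics: residuals vs fitted, normal Q-Q,
# scale-location, and residuals vs leverage (cf. Figure 4)
par(mfrow = c(2, 2))
plot(fit_ols)

# Variance inflation factors; values near 100 flag severe multicollinearity
vif(fit_ols)
```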

Figure 4

Graphical detection of outliers using pollution data.

To address the issues of correlated regressors and outliers, we estimated the model using the ridge regression, the Stein estimator, the M-ridge, and the proposed robust Stein estimator. We compared the performance of these estimators using the scalar mean squared error (SMSE), and the regression estimates and SMSE values are provided in Table 7.

Table 7 Regression coefficients and SMSEs for the pollution data.

From Table 7, we observe that, owing to its sensitivity to correlated regressors (multicollinearity) and outliers, the OLS estimator exhibited the worst performance in terms of SMSE. The coefficients of all the estimators were similar, except for \({x}_{6}\), where only the M-ridge and M-Stein estimators had a positive coefficient. As expected, the robust ridge dominated the ridge estimator, since the ridge estimator is sensitive to outliers. However, the Stein estimator performed better than the ridge estimator, as reported in the literature. Most notably, the proposed robust version of the Stein estimator (M-JSE) outperformed every estimator in the study.

Example II

The dataset was used to predict the value of a product in the manufacturing sector from three predictors: the value of imported intermediate goods (\({x}_{1}\)), imported capital commodities (\({x}_{2}\)), and the value of imported raw materials \(\left({x}_{3}\right)\)14,40,41. A linear regression model was fitted, and the variance inflation factors computed for the predictors were 128.26, 103.43, and 70.87, respectively, indicating high correlation between the predictor variables. The residual plots in Fig. 5 revealed the presence of outliers in the dataset: both the residual versus fitted plot and the scale-location plot detected observations 16, 30, and 31 as outliers, while the normal Q-Q plot and the residual versus leverage plot identified observations 30 and 31. These findings indicate that the model contains both correlated regressors and outliers. The model was analyzed using several estimators, and the results are summarized in Table 8. The regression estimate of the Stein estimator was essentially the same as that of OLS, with a computed value of c approximately equal to 1 (c = 0.9996761); nevertheless, the Stein estimator exhibited a lower mean squared error than the OLS estimator. The ridge estimator dominated the Stein estimator in this instance, but the M-ridge outperformed the ridge estimator by accounting for both multicollinearity and outliers. The proposed M-JSE performed best, with the smallest SMSE.

Figure 5

Graphical detection of outliers using import data.

Table 8 Regression coefficients and SMSE for the import data.

Example III

We analyzed the Longley data to predict the total derived employment, which is a linear function of the following predictors: gross national product implicit price deflator, gross national product, unemployment, size of armed forces, and non-institutional population 14 years of age and over33,38,39,40,42,43. The literature indicates that the model suffers from multicollinearity. Additionally, Fig. 6 shows that certain observations are anomalous, namely data points 9, 10, and 16.

Figure 6

Graphical detection of outliers using the Longley data.

We used both robust and non-robust estimators to analyze the data, and the results are presented in Table 9. The table indicates that the regression estimates of OLS and Stein are the same, with a value of c = 1. However, the Stein estimator has a lower SMSE than OLS. The Stein estimator dominates the ridge and robust ridge estimators in this instance. Furthermore, the proposed robust Stein estimator provides optimal performance based on the results.

Table 9 Regression coefficients and SMSEs for the Longley data.

In summary, the Longley data analysis indicates that the model suffers from multicollinearity and contains anomalous observations. However, using the robust Stein estimator provides the best performance among the estimators considered in this study.

Some concluding remarks

Linear regression models (LRMs) are widely used for predicting the response variable from a combination of regressors. However, correlated regressors can decrease the efficiency of the ordinary least squares method. Alternative methods such as the Stein and ridge estimators can provide better estimates in such situations, but these methods can be sensitive to outlying observations, leading to unstable predictions.

To address this issue, researchers have previously combined the ridge estimator with robust estimators (such as M-estimators) to account for both correlated regressors and outliers.

In this study, we developed a new biased estimator that offers an alternative approach to handling multicollinearity and outliers in linear regression: a robust Stein estimator obtained by combining the M-estimator with the Stein estimator. In a Monte Carlo experiment, pseudo-random numbers were generated for both the independent and dependent variables, and different sample sizes, correlation strengths, and numbers of independent variables were taken into account. Our simulation and application results demonstrate that the robust Stein estimator outperforms the other estimators considered.

Notably, in the case of high multicollinearity the suggested estimator showed its best performance through the reduction of the estimated MSE values, and it is not affected by multicollinearity as much as the other estimators. The tables also show some differences in the performance of the suggested estimators depending on the shrinkage parameter used, and it may be concluded that \({k}_{m}\) is the best shrinkage parameter in most cases.

The findings of this paper will be beneficial for practitioners who encounter the challenge of dealing with multicollinearity and outliers in their data. By using the Robust Stein estimator, they can obtain more stable and accurate predictions.

While this study has made substantial progress in addressing the challenges of LRMs, there are still avenues for further exploration. Future research should consider incorporating other robust estimators, including the robust Liu estimator, the robust Liu-type estimator, the robust linearized ridge estimator, the jackknife Kibria–Lukman M-estimator, and the modified ridge-type M-estimator, to conduct a more comprehensive comparative analysis13,14,45,46,47. This will contribute to a deeper understanding of the strengths and limitations of different approaches in handling complex data scenarios.

Another potential direction for future research is the extension of the current study using neutrosophic statistics. Neutrosophic statistics is an extension of classical statistics that is particularly useful when dealing with data from complex processes or uncertain environments48,49,50,51,52,53. By incorporating neutrosophic statistics, we can account for additional sources of uncertainty and variability, which may further enhance the robustness and applicability of our proposed estimator.