Introduction

Most surveys are designed to provide statistically valid, design-based estimates only at the national and/or state/territory levels. Designing and conducting sample surveys that would yield accurate estimates for areas smaller than a state or territory would be extremely difficult and expensive. It would require larger sample sizes and would increase the burden on respondents. Small area estimation (SAE) techniques address the problem of small sample sizes and aim to improve on the accuracy of the direct survey estimates computed from the sample in each small area. SAE techniques include direct, synthetic, and other indirect estimators. Direct estimators use only the data from the small area under study. They are generally unbiased but unstable, with large variances. Indirect and composite estimators can be more accurate because they also draw on information from related variables or neighbouring areas.

Direct estimators have been shown to produce unacceptably large standard errors because the samples available in the relevant small areas are small. In practice, some small domains may contain no sample units at all. Adequate statistical precision therefore requires indirect (synthetic) estimators, which effectively increase the sample size and thereby reduce the standard error of the estimator. According to Gonzalez1, “an estimator is called a synthetic estimator if a reliable direct estimator for a large area, covering several small domains, is used to derive an indirect estimate for a small domain, under the assumption that the small areas have the same characteristics as the large area”. Because sample data in small geographic areas are scarce, indirect estimators are needed for such areas. Many researchers have developed synthetic estimators, particularly in health, agriculture, and poverty studies. Recent work by Tikkiwal and Ghiya2, Pandey and Tikkiwal3, Tikkiwal et al.4, Ashutosh et al.5,6, and Bhushan et al.7 shows that small area estimators that use auxiliary information outperform those that do not.

Missing data are a persistent problem in sample surveys and must be handled carefully to protect the validity of any conclusions drawn from the data. Missing data can compromise desirable properties of estimators, such as unbiasedness and efficiency. Imputation is the most commonly used method for dealing with missing data. In his landmark work, Rubin8 introduced three fundamental concepts: missing at random (MAR), observed at random (OAR), and parameter distinctness (PD). Heitjan and Basu9 clarified the distinction between missing at random (MAR) and missing completely at random (MCAR). Many authors have addressed the problem of missing data and have used a variety of imputation approaches to fill the gaps. The availability of adequate auxiliary information is critical for constructing effective imputation schemes. Rueda et al.10, Toutenburg and Srivastava11, Toutenburg et al.12, Singh and Horn13, Prasad14, Singh and Deo15, Singh16, Ahmed et al.17, Bhushan and Pandey18,19, Bhushan et al.20,21, Prasad22, Prasad and Yadav23, and Bhushan and Kumar24, among others, have developed imputation methods for missing data that use auxiliary information, together with the corresponding estimators. In this study, we assume that the data are missing completely at random (MCAR).

To the best of our knowledge, no imputation method has been proposed in the literature for handling missing data under SAE. The objectives of this article are therefore:

  (i) to adapt some fundamental imputation methods, namely mean, ratio, and logarithmic type, for estimating the domain mean;

  (ii) to propose Searls type logarithmic imputation methods for estimating the domain mean;

  (iii) to compare the fundamental imputation methods with the proposed Searls type logarithmic imputation methods.

Note that imputing the missing observations does not modify the observed responses. The methodology and notation used in this study are described below.

Methodology and notations

Consider a finite population \(\Phi =\{1, 2,\ldots , N\}\) of size N, from which a simple random sample s of size n is drawn without replacement. The information collected in the sample is used to estimate the mean of domain d. Let \(r_d\) and r denote the numbers of responding units among the \(n_d\) and n selected units. Let \(R_d\) and R denote the corresponding sets of responding units in domain d and in the whole sample, respectively. Similarly, \({\bar{R}}_d\) and \({\bar{R}}\) denote the sets of non-responding units in domain d and in the whole sample, respectively. For every unit \(i\in R\) the value \(y_i\) is observed. For units \(i \in {\bar{R}}\) the values are missing and must be imputed to complete the sample data set. Suppose the imputation uses an auxiliary variable X, so that \(X_i\), the value of X for unit i, is known and positive for all \(i \in s\). That is, the data \(\mathbf {X_s}=\left\{ X_i;~i \in s\right\}\) are available.

To derive the mean square errors (MSEs) of the synthetic estimators resulting from the synthetic imputation methods, we use the following notation: \({\bar{y}}_{r}={\bar{Y}}(1+\varepsilon _0)\), \({\bar{x}}_{r}={\bar{X}}(1+\varepsilon _1)\), and \({\bar{x}}_{n}={\bar{X}}(1+\varepsilon _2)\). The \(\varepsilon\)'s are error terms such that \(E(\varepsilon _k)=0,~k=0,1,2\), and \(E(\varepsilon _0^2)=f_{r}C_{y}^2\), \(E(\varepsilon _1^2)=f_{r}C_{x}^2\), \(E(\varepsilon _2^2)=f_{n}C_{x}^2\), \(E(\varepsilon _0\varepsilon _1)=f_{r}\rho _{yx}C_{y}C_{x}\), \(E(\varepsilon _0\varepsilon _2)=f_{n}\rho _{yx}C_{y}C_{x}\), \(E(\varepsilon _1\varepsilon _2)=f_{n}C_{x}^2\). Here \(f_{r}=\left( \frac{1}{r}-\frac{1}{N}\right)\) and \(f_{n}=\left( \frac{1}{n}-\frac{1}{N}\right)\). \(C_{y}\) and \(C_{x}\) are the coefficients of variation of the study and auxiliary variables, respectively, and \(\rho _{yx}\) is the correlation coefficient between the study and auxiliary variables.
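The quantities appearing in these moments can be computed directly from a finite population, as the minimal Python sketch below shows. It assumes the usual SRSWOR convention that population variances use the divisor N − 1, so that \(Var({\bar{x}}_{n})=f_nS_x^2\). The function names `sampling_fractions` and `population_moments` are hypothetical.

```python
import math


def sampling_fractions(r, n, N):
    """Return (f_r, f_n) with f_r = 1/r - 1/N and f_n = 1/n - 1/N."""
    if not 0 < r <= n <= N:
        raise ValueError("need 0 < r <= n <= N")
    return 1 / r - 1 / N, 1 / n - 1 / N


def population_moments(y, x):
    """Population means, coefficients of variation and correlation.

    Variances use the divisor N - 1 (assumed SRSWOR convention).
    """
    N = len(y)
    ybar, xbar = sum(y) / N, sum(x) / N
    s_yy = sum((v - ybar) ** 2 for v in y) / (N - 1)
    s_xx = sum((v - xbar) ** 2 for v in x) / (N - 1)
    s_xy = sum((a - ybar) * (b - xbar) for a, b in zip(y, x)) / (N - 1)
    return {
        "Ybar": ybar,
        "Xbar": xbar,
        "Cy": math.sqrt(s_yy) / ybar,
        "Cx": math.sqrt(s_xx) / xbar,
        "rho": s_xy / math.sqrt(s_yy * s_xx),
    }


# Illustrative counts borrowed from domain 1 of the simulation design.
f_r, f_n = sampling_fractions(170, 200, 1000)   # f_n is about 0.004
```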

The rest of the article is organized as follows. “Adapted imputation methods” and “Proposed synthetic Searls type logarithmic imputation methods” present the adapted and proposed imputation methods, respectively, together with their mean square error (MSE) expressions. “Efficiency conditions” compares the imputation strategies. “Simulation study” reports a comprehensive simulation study based on artificially generated populations and discusses its main results. “Real data application” presents an application to real data. Finally, “Conclusions” closes the article with concluding remarks.

Adapted imputation methods

The literature contains no imputation methods for estimating the mean of domain d in the presence of missing data. We therefore adapt some conventional imputation methods to the estimation of the domain mean.

Conventional mean imputation method

When no auxiliary information is available, the conventional mean imputation method is the natural choice. Suppose the ith sample unit in domain d is missing and must be imputed. We then obtain a mean imputation for the domain mean by extending the notation of Lee et al.25 for unit-value imputation. The synthetic mean imputation method for the domain mean is given by

$$\begin{aligned} y_{{.i_m}}= {\left\{ \begin{array}{ll} y_i&{}\text {if }i \in R\\ {\bar{y}}_{r} &{}\text {if }i \in {\bar{R}} \end{array}\right. } \end{aligned}$$

The resulting synthetic estimator is

$$\begin{aligned} t_{m}&={\bar{y}}_{r} \end{aligned}$$

The MSE of the resulting synthetic mean estimator is

$$\begin{aligned} MSE(t_m)&=({\bar{Y}}-{\bar{Y}}_d)^2+{\bar{Y}}^2f_{r}C_{y}^2 \end{aligned}$$
(1)
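The minimal sketch below implements the synthetic mean imputation and expression (1). It assumes that missing responses are marked with `None`. The helper names `mean_impute` and `mse_mean` are illustrative only. Every missing value is replaced by \({\bar{y}}_{r}\), so the mean of the completed sample equals \(t_m={\bar{y}}_{r}\).

```python
from typing import List, Optional


def mean_impute(sample: List[Optional[float]]) -> List[float]:
    """Synthetic mean imputation: replace every missing value (None) by y_bar_r."""
    respondents = [v for v in sample if v is not None]
    if not respondents:
        raise ValueError("at least one respondent is required")
    ybar_r = sum(respondents) / len(respondents)
    return [ybar_r if v is None else v for v in sample]


def mse_mean(Ybar: float, Ybar_d: float, f_r: float, Cy: float) -> float:
    """Expression (1): squared synthetic bias plus first-order variance of y_bar_r."""
    return (Ybar - Ybar_d) ** 2 + Ybar ** 2 * f_r * Cy ** 2


completed = mean_impute([4.0, None, 6.0, None, 5.0])   # [4.0, 5.0, 6.0, 5.0, 5.0]
t_m = sum(completed) / len(completed)                  # 5.0, the respondent mean
```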

When auxiliary information is available, we distinguish two imputation schemes.

Scheme I: \({\bar{X}}_d\) is known and \({\bar{x}}_{n}\) is used.

Scheme II: \({\bar{X}}_d\) is known and \({\bar{x}}_{r}\) is used.

Synthetic ratio imputation methods

The ratio imputation method provides efficient results when the study and auxiliary variables are positively correlated. The classical synthetic ratio imputation methods under schemes I and II are defined as

Scheme I

$$\begin{aligned} y_{{.i}_{r_1}}&= {\left\{ \begin{array}{ll} y_i&{}\text {if }i \in R\\ \frac{1}{n-r}\left[ n{\bar{y}}_{r}\left( \frac{{\bar{X}}_d}{{\bar{x}}_{n}}\right) -r{\bar{y}}_{r}\right] &{}\text {if }i \in {\bar{R}} \end{array}\right. } \end{aligned}$$

Scheme II

$$\begin{aligned} y_{{.i}_{r_2}}&= {\left\{ \begin{array}{ll} y_i&{}\text {if }i \in R\\ \frac{1}{n-r}\left[ n{\bar{y}}_{r}\left( \frac{{\bar{X}}_d}{{\bar{x}}_{r}}\right) -r{\bar{y}}_{r}\right] &{}\text {if }i \in {\bar{R}} \end{array}\right. } \end{aligned}$$

The resulting synthetic ratio estimators under the above schemes are

$$\begin{aligned} t_{r_1}&={\bar{y}}_{r}\left( \frac{{\bar{X}}_d}{{\bar{x}}_{n}}\right) \\ t_{r_2}&={\bar{y}}_{r}\left( \frac{{\bar{X}}_d}{{\bar{x}}_{r}}\right) \end{aligned}$$
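Each ratio-imputed value equals \([n t_{r_j}-r{\bar{y}}_{r}]/(n-r)\). The n − r imputed values therefore add up to \(n t_{r_j}-r{\bar{y}}_{r}\), and the completed-sample mean is exactly \(t_{r_j}\). The sketch below checks this identity. `ratio_impute` is a hypothetical helper, and `xbar_aux` stands for \({\bar{x}}_{n}\) under scheme I or \({\bar{x}}_{r}\) under scheme II.

```python
def ratio_impute(y_resp, n, Xbar_d, xbar_aux):
    """Synthetic ratio imputation for one sample.

    y_resp   : observed values of the r respondents
    n        : sample size (r < n)
    Xbar_d   : known domain mean of the auxiliary variable
    xbar_aux : x_bar_n under scheme I, x_bar_r under scheme II
    Returns the completed sample and t_r = y_bar_r * Xbar_d / xbar_aux.
    """
    r = len(y_resp)
    if not 0 < r < n:
        raise ValueError("need 0 < r < n")
    ybar_r = sum(y_resp) / r
    t_r = ybar_r * Xbar_d / xbar_aux
    fill = (n * t_r - r * ybar_r) / (n - r)   # same value for every non-respondent
    return list(y_resp) + [fill] * (n - r), t_r


completed, t_r = ratio_impute([10.0, 12.0, 14.0], n=5, Xbar_d=6.0, xbar_aux=5.0)
print(completed, t_r)   # [10.0, 12.0, 14.0, 18.0, 18.0] 14.4
```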

Theorem 2.1

The MSEs of the resulting synthetic ratio estimators \(t_{r_j},~j=1,2\), of the synthetic ratio imputation methods \(y_{{.i}_{r_j}}\) under schemes I and II are given by

$$\begin{aligned} MSE(t_{r_1})&={\bar{Y}}_d^2\left( f_{r}C_{y}^2+f_{n}C_{x}^2-2f_{n}\rho _{yx}C_{y}C_{x}\right) \end{aligned}$$
(2)
$$\begin{aligned} MSE(t_{r_2})&={\bar{Y}}_d^2f_{r}\left( C_{y}^2+C_{x}^2-2\rho _{yx}C_{y}C_{x}\right) \end{aligned}$$
(3)
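Subtracting (2) from (3) gives \(MSE(t_{r_2})-MSE(t_{r_1})={\bar{Y}}_d^2(f_r-f_n)C_x(C_x-2\rho _{yx}C_y)\). Since \(f_r>f_n\) whenever r < n, these first-order formulas favour scheme II exactly when \(\rho _{yx}>C_x/(2C_y)\). The Python sketch below evaluates (2) and (3) for illustrative parameter values, which are not taken from the article's populations. The function names are hypothetical.

```python
def mse_ratio(Ybar_d, Cy, Cx, rho, f_r, f_n):
    """First-order MSEs (2) and (3) of t_r1 (scheme I) and t_r2 (scheme II)."""
    mse_r1 = Ybar_d ** 2 * (f_r * Cy ** 2 + f_n * Cx ** 2 - 2 * f_n * rho * Cy * Cx)
    mse_r2 = Ybar_d ** 2 * f_r * (Cy ** 2 + Cx ** 2 - 2 * rho * Cy * Cx)
    return mse_r1, mse_r2


def scheme_ii_better(Cy, Cx, rho):
    """True when (3) < (2), assuming f_r > f_n (i.e. r < n) and Cx > 0."""
    return rho > Cx / (2 * Cy)


# Illustrative values only.
mse_r1, mse_r2 = mse_ratio(Ybar_d=10.0, Cy=0.5, Cx=0.4, rho=0.8, f_r=0.01, f_n=0.005)
# mse_r1 is about 0.17 and mse_r2 about 0.09; here rho = 0.8 > Cx / (2 Cy) = 0.4
```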

Synthetic logarithmic imputation methods

The adapted synthetic logarithmic imputation methods under schemes I and II are given below.

Scheme I

$$\begin{aligned} y_{{.i}_{l_1}}&= {\left\{ \begin{array}{ll} y_i&{}\text {if }i \in R\\ \frac{1}{n-r}\left[ n{\bar{y}}_{r}\left\{ 1+\theta _{1}\log \left( \frac{{\bar{x}}_{n}}{{\bar{X}}_d}\right) \right\} -r{\bar{y}}_{r}\right] &{}\text {if }i \in {\bar{R}} \end{array}\right. } \end{aligned}$$

Scheme II

$$\begin{aligned} y_{{.i}_{l_2}}&= {\left\{ \begin{array}{ll} y_i&{}\text {if }i \in R\\ \frac{1}{n-r}\left[ n{\bar{y}}_{r}\left\{ 1+\theta _{2}\log \left( \frac{{\bar{x}}_{r}}{{\bar{X}}_d}\right) \right\} -r{\bar{y}}_{r}\right] &{}\text {if }i \in {\bar{R}} \end{array}\right. } \end{aligned}$$

The resulting estimators are calculated under the schemes described above as

$$\begin{aligned} t_{l_1}&={\bar{y}}_{r}\left\{ 1+\theta _1\log \bigg (\frac{{\bar{x}}_{n}}{{\bar{X}}_d}\bigg )\right\} \\ t_{l_2}&={\bar{y}}_{r}\left\{ 1+\theta _2\log \bigg (\frac{{\bar{x}}_{r}}{{\bar{X}}_d}\bigg )\right\} \end{aligned}$$

where \(\theta _j\); \(j=1,2\) are the suitably chosen scalars.
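The logarithmic estimators, and the common value imputed for each non-respondent, can be computed as in the sketch below. The function names are hypothetical and \(\theta\) is supplied by the user.

```python
import math


def log_estimator(ybar_r, xbar_aux, Xbar_d, theta):
    """t_l = y_bar_r * {1 + theta * log(xbar_aux / Xbar_d)}.

    xbar_aux is x_bar_n under scheme I and x_bar_r under scheme II.
    """
    return ybar_r * (1 + theta * math.log(xbar_aux / Xbar_d))


def imputed_value(ybar_r, r, n, t):
    """Value given to each of the n - r non-respondents so that the completed mean equals t."""
    return (n * t - r * ybar_r) / (n - r)


t_l1 = log_estimator(ybar_r=12.0, xbar_aux=5.0, Xbar_d=6.0, theta=0.5)
fill = imputed_value(12.0, r=3, n=5, t=t_l1)
```

When `xbar_aux == Xbar_d`, the logarithm is zero and `t` equals \({\bar{y}}_{r}\). The imputed value then also equals \({\bar{y}}_{r}\), so the method reduces to mean imputation.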

Theorem 2.2

The MSEs and minimum MSEs of the resulting synthetic estimators \(t_{l_j},~j=1,2\), of the adapted synthetic imputation methods \(y_{{.i}_{l_j}}\) under schemes I and II are given by

$$\begin{aligned} MSE(t_{l_1})&={\bar{Y}}_d^2(f_{r}C_{y}^2+\theta _{1}^2f_{n}C_{x}^2+2\theta _1f_{n}\rho _{yx}C_{y}C_{x})\\ MSE(t_{l_2})&={\bar{Y}}_d^2f_{r}(C_{y}^2+\theta _{2}^2C_{x}^2+2\theta _2\rho _{yx}C_{y}C_{x})\\ minMSE(t_{l_1})&={\bar{Y}}^2_dC_{y}^2\left( f_{r}-f_{n}\rho _{yx}^2\right) \\ minMSE(t_{l_2})&={\bar{Y}}^2_dC_{y}^2f_{r}\left( 1-\rho _{yx}^2\right) \end{aligned}$$

where the minima are attained at \(\theta _{j(opt)}=-\rho _{yx}C_{y}/C_{x}\), \(j=1,2\).

Proposed synthetic Searls type logarithmic imputation methods

To improve the efficiency of estimators, Searls26 proposed multiplying an estimator by a suitably chosen constant (a tuning parameter). Following this idea, we introduce tuning parameters \(\delta _j,~j=1,2\), into the synthetic logarithmic imputation methods \(y_{{.i}_{l_j}}\). This yields synthetic Searls type logarithmic imputation methods for the mean of domain d that use auxiliary information under simple random sampling (SRS).

The proposed synthetic Searls type logarithmic imputation methods under schemes I and II are given below.

Scheme I

$$\begin{aligned} y_{{.i}_{s_1}}&= {\left\{ \begin{array}{ll} y_i&{}\text {if }i \in R\\ \frac{1}{n-r}\left[ n\delta _1{\bar{y}}_{r}\left\{ 1+\theta _{1}\log \left( \frac{{\bar{x}}_{n}}{{\bar{X}}_d}\right) \right\} -r{\bar{y}}_{r}\right] &{}\text {if }i \in {\bar{R}} \end{array}\right. } \end{aligned}$$

Scheme II

$$\begin{aligned} y_{{.i}_{s_2}}&= {\left\{ \begin{array}{ll} y_i&{}\text {if }i \in R\\ \frac{1}{n-r}\left[ n\delta _2{\bar{y}}_{r}\left\{ 1+\theta _{2}\log \left( \frac{{\bar{x}}_{r}}{{\bar{X}}_d}\right) \right\} -r{\bar{y}}_{r}\right] &{}\text {if }i \in {\bar{R}} \end{array}\right. } \end{aligned}$$

where \(\delta _j\), \(j=1,2\), are suitably chosen scalars. The resulting synthetic estimators under the above schemes are

$$\begin{aligned} t_{s_1}&=\delta _1{\bar{y}}_{r}\left\{ 1+\theta _1\log \bigg (\frac{{\bar{x}}_{n}}{{\bar{X}}_d}\bigg )\right\} \\ t_{s_2}&=\delta _2{\bar{y}}_{r}\left\{ 1+\theta _2\log \bigg (\frac{{\bar{x}}_{r}}{{\bar{X}}_d}\bigg )\right\} \end{aligned}$$

Special case

When \(\delta _j=1,~j=1,2\), the proposed synthetic Searls type logarithmic imputation methods \(y_{{.i}_{s_j}}\) under schemes I and II reduce to the synthetic logarithmic imputation methods \(y_{{.i}_{l_j}}\). Likewise, the resulting estimators \(t_{s_j}\) reduce to the corresponding synthetic logarithmic estimators \(t_{l_j}\).
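Since \(t_{s_j}=\delta _j t_{l_j}\), the Searls type estimator simply rescales the logarithmic estimator. The sketch below computes \(t_{s_j}\) and the corresponding imputed value, using hypothetical names and illustrative inputs. Setting `delta=1.0` recovers the special case just described.

```python
import math


def searls_log_estimator(ybar_r, xbar_aux, Xbar_d, theta, delta):
    """t_s = delta * y_bar_r * {1 + theta * log(xbar_aux / Xbar_d)}.

    xbar_aux is x_bar_n under scheme I and x_bar_r under scheme II.
    """
    return delta * ybar_r * (1 + theta * math.log(xbar_aux / Xbar_d))


def searls_imputed_value(ybar_r, r, n, xbar_aux, Xbar_d, theta, delta):
    """Common imputed value [n t_s - r y_bar_r] / (n - r) for each non-respondent."""
    t_s = searls_log_estimator(ybar_r, xbar_aux, Xbar_d, theta, delta)
    return (n * t_s - r * ybar_r) / (n - r)


t_l = searls_log_estimator(12.0, 5.5, 6.0, theta=0.7, delta=1.0)   # delta = 1: logarithmic case
t_s = searls_log_estimator(12.0, 5.5, 6.0, theta=0.7, delta=0.9)   # equals 0.9 * t_l
```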

Theorem 3.1

The MSEs and minimum MSEs of the resulting synthetic estimators \(t_{s_j},~j=1,2\), of the proposed synthetic imputation methods \(y_{{.i}_{s_j}}\) under schemes I and II are given by

$$\begin{aligned} MSE(t_{s_1})&=\left[ \begin{array}{l}{\bar{Y}}_{d}^2+\delta _1^2\left\{ {\bar{Y}}^2_d+f_r{\bar{Y}}_d^2C_y^2+f_n\theta _1^2{\bar{Y}}^2C_x^2+4\theta _1{\bar{Y}}{\bar{Y}}_df_n\rho _{xy}C_xC_y-\theta _1{\bar{Y}}{\bar{Y}}_df_nC_x^2\right\} \\ -2\delta _1\left\{ {\bar{Y}}_d^2+\theta _1{\bar{Y}}{\bar{Y}}_df_n\left( \rho _{xy}C_xC_y-\frac{C_x^2}{2}\right) \right\} \end{array}\right] \\ MSE(t_{s_2})&=\left[ \begin{array}{l}{\bar{Y}}_{d}^2+\delta _2^2\left\{ {\bar{Y}}^2_d+f_r{\bar{Y}}_d^2C_y^2+f_r\theta _2^2{\bar{Y}}^2C_x^2+4\theta _2{\bar{Y}}{\bar{Y}}_df_r\rho _{xy}C_xC_y-\theta _2{\bar{Y}}{\bar{Y}}_df_rC_x^2\right\} \\ -2\delta _2\left\{ {\bar{Y}}_d^2+\theta _2{\bar{Y}}{\bar{Y}}_df_r\left( \rho _{xy}C_xC_y-\frac{C_x^2}{2}\right) \right\} \end{array}\right] \\ minMSE(t_{s_1})&={\bar{Y}}^2_d-\frac{Q_1^2}{P_1}\\ minMSE(t_{s_2})&={\bar{Y}}^2_d-\frac{Q_2^2}{P_2} \end{aligned}$$

where

$$\begin{aligned} P_1&={\bar{Y}}^2_d+f_r{\bar{Y}}_d^2C_y^2+f_n\theta _1^2{\bar{Y}}^2C_x^2+4\theta _1{\bar{Y}}{\bar{Y}}_df_n\rho _{xy}C_xC_y-\theta _1{\bar{Y}}{\bar{Y}}_df_nC_x^2,\\ Q_1&={\bar{Y}}_d^2+\theta _1{\bar{Y}}{\bar{Y}}_df_n\left( \rho _{xy}C_xC_y-\frac{C_x^2}{2}\right) ,\\ P_2&={\bar{Y}}^2_d+f_r{\bar{Y}}_d^2C_y^2+f_r\theta _2^2{\bar{Y}}^2C_x^2+4\theta _2{\bar{Y}}{\bar{Y}}_df_r\rho _{xy}C_xC_y-\theta _2{\bar{Y}}{\bar{Y}}_df_rC_x^2,\\ ~~\text {and}~~ Q_2&={\bar{Y}}_d^2+\theta _2{\bar{Y}}{\bar{Y}}_df_r\left( \rho _{xy}C_xC_y-\frac{C_x^2}{2}\right) . \end{aligned}$$

Proof

Consider the proposed synthetic estimator \(t_{s_1}\):

$$\begin{aligned} t_{s_1}&=\delta _1{\bar{y}}_{r}\left\{ 1+\theta _1\log \bigg (\frac{{\bar{x}}_{n}}{{\bar{X}}_d}\bigg )\right\} \end{aligned}$$

Using the notation introduced in “Methodology and notations”, the estimator can be written as

$$\begin{aligned} t_{s_1}&=\delta _1{\bar{Y}}(1+\varepsilon _0)\left[ 1+{\theta _1}\log \bigg \{\frac{{\bar{X}} (1+\varepsilon _2)}{{\bar{X}}_d}\bigg \}\right] \\&=\delta _1{\bar{Y}}(1+\varepsilon _0)\left[ 1+{\theta _1}\bigg \{\log \left( \frac{{\bar{X}}}{{\bar{X}}_d} \right) +\log (1+\varepsilon _2)\bigg \}\right] \\&=\delta _1{\bar{Y}}(1+\varepsilon _0)\left[ 1+{\theta _1}\bigg \{A+ \left( \varepsilon _2-\frac{\varepsilon _2^2}{2}+\cdots \right) \bigg \}\right] \end{aligned}$$

where \(A=\log ({\bar{X}}/{\bar{X}}_d)\). Expanding and neglecting terms of degree higher than two in the \(\varepsilon\)'s, we get

$$\begin{aligned} t_{s_1}&=\delta _1{\bar{Y}}\left\{ 1+\varepsilon _0+\theta _1A+\theta _1\left( \varepsilon _2-\frac{\varepsilon _2^2}{2} \right) +\theta _1(A\varepsilon _0+\varepsilon _0\varepsilon _2)\right\} \end{aligned}$$

Subtracting \({\bar{Y}}_d\) from both sides gives

$$\begin{aligned} t_{s_1}-{\bar{Y}}_{d}&=\delta _1{\bar{Y}}(1+\theta _1A)-{\bar{Y}}_{d}+\delta _1{\bar{Y}} \left\{ \varepsilon _0+\theta _1\left( \varepsilon _2-\frac{\varepsilon _2^2}{2}\right) +\theta _1(A\varepsilon _0+\varepsilon _0\varepsilon _2)\right\} \end{aligned}$$
(4)

Squaring both sides of (4), taking expectations, and retaining terms to the first order of approximation, we obtain the MSE of \(t_{s_1}\) as

$$\begin{aligned} MSE(t_{s_1})&=\left[ \begin{array}{l}\{\delta _1{\bar{Y}}(1+\theta _1A)-{\bar{Y}}_{d}\}^2 +2\delta _1\theta _1{\bar{Y}}\{\delta _1{\bar{Y}}(1+\theta _1A)-{\bar{Y}}_{d}\}f_n \left( \rho _{xy}C_xC_y-\frac{C_x^2}{2}\right) \\ +\delta _1^2{\bar{Y}}^2\left\{ (1+\theta _1A)^2f_rC_y^2+\theta _1^2f_nC_x^2+2\theta _1(1+\theta _1A)f_n\rho _{xy}C_xC_y\right\} \end{array}\right] \end{aligned}$$
(5)

Under the synthetic assumption \({\bar{Y}}(1+\theta _1A)={\bar{Y}}_d\), we have \(\delta _1{\bar{Y}}(1+\theta _1A)-{\bar{Y}}_{d}=(\delta _1-1){\bar{Y}}_d\). Substituting into (5) and collecting powers of \(\delta _1\), \(MSE(t_{s_1})\) can be expressed as

$$\begin{aligned} MSE(t_{s_1})&=\left[ \begin{array}{l}{\bar{Y}}_{d}^2+\delta _1^2\left\{ {\bar{Y}}^2_d+f_r{\bar{Y}}_d^2C_y^2 +f_n\theta _1^2{\bar{Y}}^2C_x^2+4\theta _1{\bar{Y}}{\bar{Y}}_df_n\rho _{xy}C_xC_y -\theta _1{\bar{Y}}{\bar{Y}}_df_nC_x^2\right\} \\ -2\delta _1\left\{ {\bar{Y}}_d^2 +\theta _1{\bar{Y}}{\bar{Y}}_df_n\left( \rho _{xy}C_xC_y-\frac{C_x^2}{2}\right) \right\} \end{array}\right] \nonumber \\&={\bar{Y}}_d^2+\delta _1^2P_1-2\delta _1Q_1 \end{aligned}$$
(6)

where

$$\begin{aligned} P_1&={\bar{Y}}^2_d+f_r{\bar{Y}}_d^2C_y^2+f_n\theta _1^2{\bar{Y}}^2C_x^2 +4\theta _1{\bar{Y}}{\bar{Y}}_df_n\rho _{xy}C_xC_y-\theta _1{\bar{Y}}{\bar{Y}}_df_nC_x^2\\ \text {and}~~ Q_1&={\bar{Y}}_d^2+\theta _1{\bar{Y}}{\bar{Y}}_df_n\left( \rho _{xy}C_xC_y-\frac{C_x^2}{2}\right) . \end{aligned}$$

Differentiating (6) with respect to \(\delta _1\) and equating the derivative to zero gives the optimum value of \(\delta _1\) as

$$\begin{aligned} \delta _{1(opt)}=\frac{Q_1}{P_1} \end{aligned}$$

Substituting \(\delta _{1(opt)}\) into (6) gives the minimum MSE of \(t_{s_1}\) as

$$\begin{aligned} minMSE(t_{s_1})&={\bar{Y}}_{d}^2-\frac{Q_1^2}{P_1} \end{aligned}$$
(7)

The first-order expressions for the MSE and minimum MSE of \(t_{s_2}\) follow in the same way. Here \({\bar{x}}_{r}={\bar{X}}(1+\varepsilon _1)\) takes the place of \({\bar{x}}_{n}={\bar{X}}(1+\varepsilon _2)\), so \(f_r\) replaces \(f_n\) in the terms involving \(C_x\). \(\square\)
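Expression (5) is exactly quadratic in \(\delta _1\), so its coefficients can be recovered numerically from three evaluations. This gives an independent check of \(P_1\), \(Q_1\), \(\delta _{1(opt)}\) and (7). The sketch below uses illustrative parameter values and a hypothetical \(\theta _1=-0.8\). It fixes A through the synthetic assumption \({\bar{Y}}(1+\theta _1A)={\bar{Y}}_d\). The function names are not from the article.

```python
def mse_eq5(delta, theta, A, Ybar, Ybar_d, Cy, Cx, rho, f_r, f_n):
    """First-order MSE of t_s1 as written in equation (5)."""
    g = 1 + theta * A
    bias = delta * Ybar * g - Ybar_d
    cross = 2 * delta * theta * Ybar * bias * f_n * (rho * Cx * Cy - Cx ** 2 / 2)
    var = (delta * Ybar) ** 2 * (g ** 2 * f_r * Cy ** 2 + theta ** 2 * f_n * Cx ** 2
                                 + 2 * theta * g * f_n * rho * Cx * Cy)
    return bias ** 2 + cross + var


def optimal_delta(mse):
    """Write mse(delta) = m0 - 2 Q delta + P delta^2 and recover P, Q from three evaluations."""
    m0, m_plus, m_minus = mse(0.0), mse(1.0), mse(-1.0)
    P = (m_plus + m_minus) / 2 - m0
    Q = -(m_plus - m_minus) / 4
    return Q / P, m0 - Q ** 2 / P, P, Q


# Illustrative values; theta = -0.8 is a hypothetical choice.
params = dict(theta=-0.8, Ybar=10.0, Ybar_d=9.5, Cy=0.5, Cx=0.4, rho=0.8, f_r=0.01, f_n=0.005)
params["A"] = (params["Ybar_d"] / params["Ybar"] - 1) / params["theta"]   # Ybar (1 + theta A) = Ybar_d
d_opt, mse_min, P1, Q1 = optimal_delta(lambda d: mse_eq5(d, **params))
```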

Efficiency conditions

In the present section, we compare the minimum MSE of the proposed synthetic imputation methods with the corresponding minimum MSE of the existing synthetic imputation methods under schemes I and II.

Lemma 4.1

The proposed synthetic Searls type logarithmic imputation methods \(y_{.i_{s_j}},~j=1,2\) dominate the synthetic mean imputation method \(y_{.i_{m}}\), if

$$\begin{aligned} MSE(t_{s_j})&<MSE(t_m)\iff \frac{Q_j^2}{P_j{\bar{Y}}_d^2}>1-\frac{({\bar{Y}}-{\bar{Y}}_d)^2}{{\bar{Y}}_d^2}-\frac{{\bar{Y}}^2}{{\bar{Y}}_d^2}f_{r}C_{y}^2 \end{aligned}$$

Lemma 4.2

The proposed synthetic Searls type logarithmic imputation methods \(y_{.i_{s_j}},~j=1,2\) dominate the synthetic ratio imputation methods \(y_{.i_{r_j}}\) under schemes I and II, if

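$$\begin{aligned} MSE(t_{s_1})&<MSE(t_{r_1})\iff \frac{Q_1^2}{P_1{\bar{Y}}_d^2}>1-f_{r}C_{y}^2-f_{n}C_{x}^2+2f_{n}\rho _{yx}C_{y}C_{x}\\ MSE(t_{s_2})&<MSE(t_{r_2})\iff \frac{Q_2^2}{P_2{\bar{Y}}_d^2}>1-f_{r}\left( C_{y}^2+C_{x}^2-2\rho _{yx}C_{y}C_{x}\right) \end{aligned}$$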

Lemma 4.3

The proposed synthetic Searls type logarithmic imputation methods \(y_{.i_{s_j}},~j=1,2\) dominate the synthetic logarithmic imputation methods \(y_{.i_{l_j}}\) under schemes I and II, if

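$$\begin{aligned} MSE(t_{s_1})&<MSE(t_{l_1})\iff \frac{Q_1^2}{P_1{\bar{Y}}_d^2}>1-C_{y}^2\left( f_{r}-f_{n}\rho _{yx}^2\right) \\ MSE(t_{s_2})&<MSE(t_{l_2})\iff \frac{Q_2^2}{P_2{\bar{Y}}_d^2}>1-C_{y}^2f_{r}\left( 1-\rho _{yx}^2\right) \end{aligned}$$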

If the conditions in the above lemmas hold, the proposed synthetic Searls type logarithmic imputation methods outperform the synthetic mean per unit, synthetic ratio, and synthetic logarithmic imputation methods. The next section examines these conditions through a comprehensive simulation study.

Simulation study

A simulation study is carried out to assess the performance of the proposed synthetic imputation methods relative to the adapted synthetic imputation methods. Symmetric and asymmetric populations are generated using the models employed by Singh and Horn27, which are as follows:

$$\begin{aligned} y&=5.5+\sqrt{(1-\rho _{xy}^2)}~y^*+\rho _{xy}\left( \frac{S_y}{S_x}\right) x^*\\ x&=5.3+x^* \end{aligned}$$

where \(x^*\) and \(y^*\) are independent variables drawn from the specified distributions. Using these models, we generated the following populations:

  1. A normal population of size N = 6000 using \(x^*\sim N(12,35)\) and \(y^*\sim N(13,45)\), with correlation coefficients \(\rho _{xy}\) = 0.1, 0.5, 0.9.

  2. A gamma population of size N = 6000 using \(x^*\sim G(0.02,0.006)\) and \(y^*\sim G(0.2,0.011)\), with correlation coefficients \(\rho _{xy}\) = 0.1, 0.5, 0.9.
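The pure-Python sketch below generates the normal population. The article does not say whether the second argument of N(·,·) is a variance or a standard deviation, nor what \(S_y\) and \(S_x\) refer to. Here we assume variances and take \(S_y\), \(S_x\) to be the standard deviations of \(y^*\) and \(x^*\). Under these assumptions the population correlation of x and y equals \(\rho _{xy}\). All function names are hypothetical.

```python
import math
import random


def generate_population(N, rho, seed=None):
    """Normal population from the model above (assumed parameterization).

    x* ~ N(12, 35) and y* ~ N(13, 45), reading the second argument as a variance;
    S_x and S_y are taken to be the standard deviations of x* and y*.
    """
    rng = random.Random(seed)
    s_x, s_y = math.sqrt(35.0), math.sqrt(45.0)
    x_star = [rng.gauss(12.0, s_x) for _ in range(N)]
    y_star = [rng.gauss(13.0, s_y) for _ in range(N)]
    y = [5.5 + math.sqrt(1 - rho ** 2) * ys + rho * (s_y / s_x) * xs
         for xs, ys in zip(x_star, y_star)]
    x = [5.3 + xs for xs in x_star]
    return y, x


def split_domains(N, n_domains):
    """Consecutive, equal-sized domains of unit indices (N divisible by n_domains)."""
    size = N // n_domains
    return [list(range(k * size, (k + 1) * size)) for k in range(n_domains)]


def pearson(a, b):
    """Sample Pearson correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


y, x = generate_population(6000, rho=0.9, seed=1)
domains = split_domains(6000, 6)   # six domains of 1000 units
print(round(pearson(y, x), 3))     # close to 0.9
```

For the gamma population, the two `rng.gauss` calls could be replaced by `rng.gammavariate(shape, scale)`, with \(S_x\) and \(S_y\) recomputed accordingly. How G(0.02, 0.006) is parameterized is not stated, so any choice here is an assumption.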

Each population is divided into 6 equal domains of size 1000. Random samples of sizes \((n_1,~n_2,~n_3,~n_4,~n_5,~n_6)=(200,~250,~300,~350,~100,~150)\) are drawn from the respective domains. Within these samples, the numbers of responding units are varied as \(r_1=(170,~180)\), \(r_2=(230,~240)\), \(r_3=(270,~280)\), \(r_4=(330,~340)\), \(r_5=(80,~90)\), and \(r_6=(130,~140)\). The MSE of each resulting estimator is computed from 15,000 iterations of the following steps.

  (i) Select a random sample s of size n from the population of size N.

  (ii) In each iteration, randomly remove \((n_d-r_d)\) units from sample s to act as non-respondents.

  (iii) Impute the removed units using the imputation methods under study.

  (iv) Compute the required statistics.

  (v) Repeat steps (i) to (iv) 15,000 times.

Two measures are reported: the empirical (simulated) mean square error (EMSE) and the theoretical mean square error (TMSE). The TMSE is computed from the MSE expressions derived in “Adapted imputation methods” and “Proposed synthetic Searls type logarithmic imputation methods”. The EMSE is computed as

$$\begin{aligned} EMSE(t_{*})&=\frac{1}{15{,}000}\sum _{i=1}^{15{,}000}(t_{*i}-{\bar{Y}}_d)^2 \end{aligned}$$
(8)

where \(t_{*}\) denotes any of \(t_{m}\), \(t_{r_j}\), \(t_{l_j}\), and \(t_{s_j}\), \(j=1,2\), and \(t_{*i}\) is its value in the ith iteration.
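The sketch below is a scaled-down version of steps (i) to (v) and the EMSE in (8), for \(t_m\), \(t_{r_1}\) and \(t_{r_2}\). Several assumptions are ours. First, a single SRSWOR sample is drawn from the whole population; the article samples each domain, and how those samples are pooled is not stated. Second, non-respondents are removed completely at random. Third, \({\bar{X}}_d\) is known. Because each imputation method yields a completed-sample mean equal to its estimator, the code evaluates the estimators directly instead of building completed samples. The function name, the toy population and the iteration counts are illustrative.

```python
import random


def simulate_emse(y, x, domain, n, r, iterations=2000, seed=0):
    """Monte Carlo EMSE (8) of t_m, t_r1 and t_r2 for the mean of `domain` (list of unit indices).

    Assumptions (ours): one SRSWOR sample of n units from the whole population,
    n - r of them dropped completely at random, and X_bar_d known.
    """
    rng = random.Random(seed)
    Ybar_d = sum(y[i] for i in domain) / len(domain)
    Xbar_d = sum(x[i] for i in domain) / len(domain)
    draws = {"t_m": [], "t_r1": [], "t_r2": []}
    for _ in range(iterations):
        s = rng.sample(range(len(y)), n)   # step (i): SRSWOR sample
        resp = rng.sample(s, r)            # step (ii): the other n - r units become non-respondents
        ybar_r = sum(y[i] for i in resp) / r
        xbar_r = sum(x[i] for i in resp) / r
        xbar_n = sum(x[i] for i in s) / n
        # Steps (iii)-(iv): each imputation's completed-sample mean equals its estimator.
        draws["t_m"].append(ybar_r)
        draws["t_r1"].append(ybar_r * Xbar_d / xbar_n)
        draws["t_r2"].append(ybar_r * Xbar_d / xbar_r)
    emse = {k: sum((t - Ybar_d) ** 2 for t in v) / iterations for k, v in draws.items()}
    return emse, draws, Ybar_d


# Toy usage with a deterministic, made-up population (not the article's data).
toy_x = [10.0 + (i % 17) for i in range(600)]
toy_y = [2.0 * xv + (i % 5) for i, xv in enumerate(toy_x)]
emse, _, _ = simulate_emse(toy_y, toy_x, domain=list(range(100)), n=60, r=50, iterations=500)
```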

The results of the consequent synthetic estimators for normal and gamma populations are reported in Tables 1 and 2, respectively.

Key results of simulation study

The key results of the simulation study, summarized in Tables 1 and 2, are as follows.

  1. Table 1 reports the results for the normal population. They show that:

    (a) the EMSE and TMSE of the synthetic ratio estimator \(t_{r_1}\) under scheme I decrease as the correlation coefficient \(\rho _{xy}\) increases from 0.1 to 0.9, and the same pattern holds for \(t_{r_2}\) under scheme II;

    (b) the EMSE and TMSE of the synthetic logarithmic estimator \(t_{l_1}\) under scheme I decrease as \(\rho _{xy}\) increases from 0.1 to 0.9, and the same pattern holds for \(t_{l_2}\) under scheme II;

    (c) the EMSE and TMSE of the synthetic Searls type logarithmic estimator \(t_{s_1}\) under scheme I decrease as \(\rho _{xy}\) increases from 0.1 to 0.9, and the same pattern holds for \(t_{s_2}\) under scheme II;

    (d) under schemes I and II and in every domain, the EMSE and TMSE of the synthetic ratio, synthetic logarithmic, and synthetic Searls type logarithmic estimators decrease as the number of responding units \(r_d\) increases;

    (e) under both schemes and in every domain, the EMSE and TMSE of the synthetic ratio, synthetic logarithmic, and synthetic Searls type logarithmic estimators are very close to each other;

    (f) the synthetic Searls type logarithmic estimators \(t_{s_j},~j=1,2\), perform better than the adapted synthetic mean estimator \(t_{m}\), the synthetic ratio estimators \(t_{r_j}\), and the synthetic logarithmic estimators \(t_{l_j}\) under schemes I and II.

  2. The results for the gamma population in Table 2 show the same patterns as the results for the normal population in Table 1.

  3. Tables 1 and 2 also show that the synthetic ratio, synthetic logarithmic, and synthetic Searls type logarithmic estimators perform better under scheme II than under scheme I.

Table 1 EMSE and TMSE of synthetic estimators under normal population.
Table 2 EMSE and TMSE of synthetic estimators under gamma population.

Real data application

Like most other Indian states, Uttar Pradesh is divided into several districts for tax collection and other administrative and agricultural purposes. Each district is further divided into tehsils, and each tehsil into blocks. In this study, blocks are treated as small domains.

The area under cultivation largely determines the production of a crop. For the real data application, we therefore consider the problem of estimating agricultural output for blocks of the Agra district of Uttar Pradesh. Six blocks of the Agra district are treated as small domains. The production of the Bajra crop (in tonnes) in the agricultural season 2021–2022 is taken as the study variable y. The area under Bajra cultivation (in hectares) in the same season is taken as the auxiliary variable x. Block-level information for the Agra district is reported in Table 3, and the population parameters of each domain are shown in Table 4 for easy reference.

Table 3 Total production and area under Bajra crop in Blocks of Agra district for agricultural season 2021–2022.
Table 4 Population parameters for different domains.

The domain sizes given in Table 4 are \((N_1,~N_2,~N_3,~N_4,~N_5,~N_6)=(38,~53,~66,~45,~44,~53)\). From these domains, we selected samples of sizes \((n_1,~n_2,~n_3,~n_4,~n_5,~n_6)=(8,~11,~13,~9,~9,~11)\), respectively. Within these samples, the numbers of responding units are taken as \(r_1=(5,~7)\), \(r_2=(7,~9)\), \(r_3=(9,~11)\), \(r_4=(5,~7)\), \(r_5=(5,~7)\), and \(r_6=(7,~9)\), respectively. Using the domain parameters given in Tables 3 and 4, we computed the MSEs of the adapted and proposed synthetic estimators.
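As a quick arithmetic check on the real-data design, the sketch below computes the sampling rates \(n_d/N_d\). It also computes the corresponding finite-population factors under the hypothetical reading that the domain-level counts enter \(f_r\) and \(f_n\). Roughly one fifth of each domain is sampled, and \(f_r\) falls as \(r_d\) rises. This is consistent with the smaller MSEs reported below for larger numbers of responding units. `domain_fractions` is a hypothetical helper.

```python
def domain_fractions(N_d, n_d, r_d):
    """Sampling rate n/N, f_n = 1/n - 1/N and f_r = 1/r - 1/N for each domain."""
    rows = []
    for N, n, r_pair in zip(N_d, n_d, r_d):
        rows.append({
            "rate": n / N,
            "f_n": 1 / n - 1 / N,
            "f_r": tuple(1 / r - 1 / N for r in r_pair),
        })
    return rows


N_d = [38, 53, 66, 45, 44, 53]
n_d = [8, 11, 13, 9, 9, 11]
r_d = [(5, 7), (7, 9), (9, 11), (5, 7), (5, 7), (7, 9)]
for d, row in enumerate(domain_fractions(N_d, n_d, r_d), start=1):
    print(d, round(row["rate"], 3), round(row["f_n"], 4), [round(v, 4) for v in row["f_r"]])
```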

The results for the real data are reported in Table 5. Under both schemes, the proposed synthetic Searls type logarithmic imputation methods outperform the corresponding synthetic mean, ratio, and logarithmic type imputation methods. Under both schemes and in every domain, the MSEs of the adapted and proposed synthetic estimators decrease as the number of responding units increases. Moreover, the adapted synthetic ratio and synthetic logarithmic imputations, as well as the proposed synthetic Searls type logarithmic imputations, perform better under scheme II than under scheme I.

Table 5 MSE of synthetic estimators for real population.

Conclusions

In this article, we have adapted synthetic mean, ratio, and logarithmic imputation methods and proposed synthetic Searls type logarithmic imputation methods for estimating the domain mean in the presence of missing data under simple random sampling. The MSE expressions of the imputation methods are derived to the first order of approximation. Efficiency conditions are then obtained by comparing the MSE expressions of the proposed and adapted imputations. To assess the performance of the proposed imputation approaches, we carried out a comprehensive simulation study using artificially generated normal (symmetric) and gamma (asymmetric) populations. The EMSE and TMSE obtained in the simulation study show that the proposed synthetic Searls type logarithmic imputation methods outperform the adapted synthetic mean, ratio, and logarithmic imputation methods. This holds across the correlation coefficients and numbers of responding units considered in each domain. Tables 1 and 2 also show that the EMSE and TMSE of the adapted and proposed estimators are very close to each other under both schemes in each domain. In addition, a real data set on the production of Bajra crops in the Agra district of Uttar Pradesh, India, is used to demonstrate the applicability of the proposed imputation approaches. The real data results also favour the proposed imputations over the adapted ones. Survey practitioners who encounter missing data in SAE may therefore consider using the proposed imputation procedures.