Efficient estimation of population variance of a sensitive variable using a new scrambling response model

This study introduces a pioneering scrambling response model tailored for handling sensitive variables. Subsequently, a generalized estimator for variance estimation, relying on two auxiliary information sources, is developed following this novel model. Analytical expressions for bias, mean square error, and minimum mean square error are meticulously derived up to the first order of approximation, shedding light on the estimator’s statistical performance. Comprehensive simulation experiments and empirical analysis unveil compelling results. The proposed generalized estimator, operating under both scrambling response models, consistently exhibits minimal mean square error, surpassing existing estimation techniques. Furthermore, this study evaluates the level of privacy protection afforded to respondents using this model, employing a robust framework of simulations and empirical studies.

The present study follows the methodology of Gupta et al. 28 and suggests a new generalized exponential estimator to estimate the variance of a finite population that is complex in nature. Expressions for the bias and the mean square error of the proposed estimator are derived up to the first order of approximation. The article is organized as follows. In "Sampling strategy for scrambled response model" section, the sampling strategy for the scrambled response model presented by Diana and Perri 6 is discussed. "The proposed estimator and its class of estimators" section presents the proposed generalized estimator for two auxiliary variables under the existing model, along with the expressions for its bias and MSE. In "The proposed RRT model and estimator-II" section, we propose a generalized randomized response model; the unbiased variance estimator, the ratio estimator, and the proposed generalized estimator are modified under the proposed model in the same section. The privacy protection measures for the models are discussed in "Privacy levels" section. To support the proposed methodology, a real-data application is presented in "An application of the proposed model" section and a simulation study in "Simulation study" section; concluding remarks are given in "Conclusion" section.

Sampling strategy for scrambled response model
Let a simple random sample without replacement (SRSWOR) of size n be drawn from a finite population $U = \{U_1, U_2, \ldots, U_N\}$. Let Y be the true response on a sensitive quantitative variable and X be a non-sensitive auxiliary variable, positively correlated with Y. Let $s_y^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2 / (n-1)$ be the sample variance of Y, where $\bar{y}$ is the sample mean.
To obtain the bias and mean square error, we define error terms such as $\bar{z} = \bar{Z}(1 + e_z)$, where $e_z = (\bar{z} - \bar{Z})/\bar{Z}$, together with $\delta_z$ and the corresponding expectations.
Based on the Diana and Perri 6 RRT model Z = TY + S, Gupta et al. 28 introduced a basic variance estimator and some ratio-type estimators. These are the basic variance estimator $t_0$ (1), the ratio estimator $t_{ratio}$ (3), and the generalized ratio estimator (5), each accompanied by its MSE expression.

The proposed estimator and its class of estimators

The proposed estimator
In this section, a generalized exponential estimator is presented following Koyuncu et al. 11. The form of the proposed estimator is given by Eq. (7), where $k_1$, $k_2$, and $k_3$ are three unrestricted optimizing constants to be chosen so that the MSE of the estimator is minimized, and the two generalization constants (subscripted 1 and 2) are to be replaced by suitable values, known parameters, or functions of known parameters to obtain different efficient or existing estimators. A few examples, obtained by setting the constants to different values, are shown in Table 1.
To obtain the bias and the MSE, we define the error terms and rewrite Eq. (7) in terms of them. The mean square error of the generalized estimator $t_{1D}$ is given by Eq. (10). Differentiating Eq. (10) with respect to $k_1$, $k_2$, and $k_3$ and simplifying gives the optimum values of the constants. Substituting these optimum values into Eq. (10) gives the simplified form of the minimum MSE of the estimator.

Table 1. The class of estimators for different choices of the constants' values.

Class of estimators

Mathematical comparison of the proposed class of estimators with
Expressions (i)-(vii) above give the conditions under which the proposed class of estimators performs better than the estimator $t_0$.

The proposed RRT model and estimator-II

The proposed RRT model
Our scrambled randomized response model combines multiplicative, additive, and subtractive models. Because Y is the sensitive variable of interest, it is subject to social desirability bias. S and R are two independent scrambling variables, each uncorrelated with Y. We assume (12). Taking expectations of (24), we obtain the bias and mean square error of the estimator. Substituting the optimum value of the constant, the MSE is expressed in terms of $D_1$.

Privacy levels
In the literature, many privacy protection measures have been presented by different authors. For our study, the privacy measure due to Yan et al. 30 is used to compute the privacy level for Diana and Perri's 6 model and for the proposed randomized response model.
The privacy protection measure presented by Yan et al. is given by (27).
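As a hedged illustration, the measure of Yan et al. is commonly stated as the expected squared distance between the reported and true values, E[(Z - Y)^2], with larger values indicating more protection. The sketch below assumes that form, the scrambling Z = TY + S with E(T) = 1 and E(S) = 0, mutual independence of T, S, and Y, and a hypothetical population. Under those assumptions the measure equals Var(T) E(Y^2) + Var(S), which the Monte Carlo estimate approximates. The function names are illustrative only.

```python
import random

def privacy_level(y_values, sd_t, sd_s, rng):
    """Monte Carlo estimate of E[(Z - Y)^2] for Z = T*Y + S (assumed form)."""
    total = 0.0
    for y in y_values:
        t = rng.gauss(1.0, sd_t)   # multiplicative scrambler with mean 1 (assumption)
        s = rng.gauss(0.0, sd_s)   # additive scrambler with mean 0 (assumption)
        z = t * y + s
        total += (z - y) ** 2
    return total / len(y_values)

def privacy_level_closed_form(y_values, sd_t, sd_s):
    """Var(T) * E[Y^2] + Var(S), valid under the same assumptions."""
    mean_y_sq = sum(y * y for y in y_values) / len(y_values)
    return sd_t ** 2 * mean_y_sq + sd_s ** 2

rng = random.Random(2024)
population = [rng.gauss(50.0, 5.0) for _ in range(20000)]  # hypothetical Y values
simulated = privacy_level(population, sd_t=0.1, sd_s=2.0, rng=rng)
exact = privacy_level_closed_form(population, sd_t=0.1, sd_s=2.0)
```

Increasing either scrambling variance raises the privacy level, but it also inflates the variance of the reported values, which is the trade-off the efficiency comparisons address.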

An application of the proposed model
In this section, motivated by Saleem and Sanaullah 17, a real-life application is presented to analyze the efficiency of the proposed RRT model compared to the existing models.
A survey was organized to collect real data for estimating the true variance of the Grade Point Average (GPA) of the students of the Department of Statistics at Forman Christian College University, Lahore, who studied the course Statistical Methods in Spring 2023. The ninety students registered in three sections of this statistics course are taken as our population. In this application, the variable of interest Y is the CGPA of the students, and the two auxiliary variables are $X_1$, the weekly study hours, and $X_2$, the number of courses studied in recent semesters. For the scrambling variables, S is a normal random variable with mean 0 and standard deviation 2, and R is a normal random variable with mean 1 and standard deviation 0.02. The following are some characteristics of the population: (33) $Z = RY + S$. (34)

Table 2. The MSEs of the estimators for the real population.

Table 2 shows the MSE estimates under the model of Diana and Perri 6 and under the proposed model. The results are obtained using two sample sizes, n = 20 and n = 38. The proposed estimator attains the smallest MSE among the estimators considered under both models.
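The scrambling step described in this section can be sketched as follows. This is a minimal illustration, assuming the linear scrambling Z = RY + S with the stated normal scramblers, independence of R, S, and Y, and a hypothetical CGPA-like population in place of the survey data. The helper `expected_var_z` is not from the paper; it states the variance of Z implied by those assumptions.

```python
import random
import statistics

SD_R = 0.02  # sd of the multiplicative scrambler R (mean 1), as stated in this section
SD_S = 2.0   # sd of the additive scrambler S (mean 0), as stated in this section

def scramble(y, rng):
    """Reported value Z = R*Y + S for one respondent with true value y."""
    r = rng.gauss(1.0, SD_R)
    s = rng.gauss(0.0, SD_S)
    return r * y + s

def expected_var_z(mean_y, var_y, sd_r, sd_s):
    """Var(Z) when R, S, Y are independent, E(R) = 1 and E(S) = 0."""
    return var_y * (1.0 + sd_r ** 2) + sd_r ** 2 * mean_y ** 2 + sd_s ** 2

rng = random.Random(7)
# Hypothetical CGPA-like values for N = 90 students (not the survey data).
population = [min(4.0, max(1.0, rng.gauss(3.0, 0.5))) for _ in range(90)]
sample = rng.sample(population, 20)              # SRSWOR of size n = 20
reported = [scramble(y, rng) for y in sample]    # what the interviewer observes

var_y = statistics.pvariance(population)
var_z = expected_var_z(statistics.fmean(population), var_y, SD_R, SD_S)
```

Under these hypothetical values the additive noise (variance 4) dominates Var(Z), so the raw sample variance of the reported values greatly overstates Var(Y) and has to be corrected using the known moments of the scrambling variables.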

Simulation study
In this section, we conduct a simulation study to evaluate the performance of the proposed generalized exponential-type estimators by comparing them with some existing variance estimators.
Two populations, Population I and Population II, are considered. For both populations, we consider three sample sizes: 200, 300, and 500. Different values of the variance of S, Var(S), and of the variance of T, Var(T), are chosen for the simulation.
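A simulation of this kind can be sketched as a Monte Carlo loop over repeated SRSWOR samples. The example below is illustrative only: the population is hypothetical, Var(S) = 1 is an arbitrary choice, and `corrected_variance` is a simple plug-in estimator derived from Z = TY + S under the assumptions E(T) = 1, E(S) = 0, and independence, using Var(Z) = Var(Y)(1 + Var(T)) + Var(T) E(Y)^2 + Var(S). It is not one of the estimators compared in the paper.

```python
import random
import statistics

def corrected_variance(z, var_t, var_s):
    """Plug-in estimate of Var(Y) from scrambled responses Z = T*Y + S (illustrative)."""
    s2_z = statistics.variance(z)        # sample variance with divisor n - 1
    z_bar = statistics.fmean(z)          # estimates E(Y) because E(T) = 1 and E(S) = 0
    return (s2_z - var_s - var_t * z_bar ** 2) / (1.0 + var_t)

def simulate_mse(population, n, var_t, var_s, reps, rng):
    """Empirical MSE of corrected_variance over repeated SRSWOR samples of size n."""
    true_var = statistics.variance(population)
    sd_t, sd_s = var_t ** 0.5, var_s ** 0.5
    sq_err = 0.0
    for _ in range(reps):
        sample = rng.sample(population, n)   # simple random sample without replacement
        z = [rng.gauss(1.0, sd_t) * y + rng.gauss(0.0, sd_s) for y in sample]
        sq_err += (corrected_variance(z, var_t, var_s) - true_var) ** 2
    return sq_err / reps

rng = random.Random(11)
population = [rng.gauss(10.0, 3.0) for _ in range(5000)]   # hypothetical population
results = {n: simulate_mse(population, n, var_t=0.5, var_s=1.0, reps=200, rng=rng)
           for n in (200, 300, 500)}
```

Passing a different estimator function into the same loop is one way to compare several estimators on identical samples.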
Table 3 provides the privacy protection levels of the RRT models discussed in this study. We follow the unified measure ϑ of Gupta et al. 31, which is defined for i = 0, ratio, 1D, NP1, NP2, 1 and j = D, PN, where MSE(t_i) is the theoretical MSE of the respective estimator and the privacy level for model j (Diana and Perri's 6 model or the proposed model) is as discussed in "Privacy levels" section.
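As a hedged reading of this description, the unified measure can be written as the ratio of an estimator's theoretical MSE to the privacy level of the model. The symbol $\Delta_j$ for the privacy level is an assumption introduced here for illustration.

```latex
\vartheta_{ij} = \frac{\mathrm{MSE}(t_i)}{\Delta_j}, \qquad j \in \{D, PN\}
```

For example, a hypothetical MSE of 0.8 with a hypothetical privacy level of 4 gives $\vartheta = 0.2$. Halving the MSE or doubling the privacy level would each halve $\vartheta$, so lower values correspond to a better combination of efficiency and protection.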
Tables 4, 5 and 6 give the MSE and percent relative efficiency (PRE) results for the proposed estimator and the existing estimators discussed in this article. The PRE is computed for i = ratio, gratio, and gep.
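A common form of the PRE, assumed here, takes the basic variance estimator t_0 as the baseline: PRE(t_i) = 100 × MSE(t_0) / MSE(t_i). The sketch below applies that form to hypothetical MSE values; the numbers are for illustration only and are not taken from Tables 4, 5 and 6.

```python
def percent_relative_efficiency(mse_baseline, mse_estimator):
    """PRE of an estimator relative to a baseline estimator, in percent."""
    if mse_estimator <= 0:
        raise ValueError("MSE must be positive")
    return 100.0 * mse_baseline / mse_estimator

# Hypothetical MSE values (illustrative, not from the paper's tables).
mse = {"t0": 3.0, "ratio": 2.0, "gratio": 1.5, "gep": 0.75}
pre = {name: percent_relative_efficiency(mse["t0"], value)
       for name, value in mse.items() if name != "t0"}
```

A PRE above 100 means the estimator has a smaller MSE than the baseline.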
The results are presented in Tables 4, 5 and 6. Tables 4 and 5 provide the numerical results for the estimators discussed in "Sampling strategy for scrambled response model" and "The proposed estimator and its class of estimators" sections, whereas Table 6 presents the results for the estimators discussed in "The proposed RRT model and estimator-II" section based on the proposed model. The values in Tables 4, 5 and 6 confirm that the existing estimators presented by Gupta et al. 28 are less efficient than the generalized estimator. Also, comparing the results under the proposed model and the existing model in these tables, one may observe that the proposed model yields smaller MSE values than the model presented by Diana and Perri 6. The MSE decreases as the variance of S increases.
$\mu_{Z_{NP}} = 52.31$; $\sigma_{Z_{NP}} = 6.78$; $\rho_{Z_{NP} X_1} = -0.044$; $\rho_{Z_{NP} X_2} = 0.140$.
$\Sigma = \begin{pmatrix} 10 & 3 & 2.9 \\ 3 & 2 & 1.1 \\ 2.9 & 1.1 & 2 \end{pmatrix}$, $\rho_{x_1 y} = 0.6817$, $\rho_{x_2 y} = 0.6705$.
A smaller value of ϑ is to be preferred. Tables 3, 4 and 5 present the unified measure along with the PRE of the estimators. The proposed generalized estimator using two auxiliary variables performs efficiently under both Diana and Perri's model and the proposed RRT model. One can notice that the values of ϑ are smaller for the proposed generalized model.

Conclusion
This study addressed the estimation of population variance for sensitive study variables using a non-sensitive auxiliary variable. A generalized exponential-type estimator, based on Diana and Perri's 6 randomized response model, was introduced and evaluated against estimators proposed by Gupta et al. 28, as detailed in Tables 4, 5 and 6. The comparative analysis indicated that the proposed estimator was consistently more efficient in variance estimation. Additionally, we introduced a novel generalized scrambled response model and applied it to the conventional variance and ratio estimators, along with the proposed estimator. In "An application of the proposed model" section, a real survey-based study was presented, applying the proposed RRT model. The results, obtained under both our novel model and the model presented by Diana and Perri 6, showed that the proposed estimator consistently attained a smaller MSE than the conventional variance and ratio estimators. Notably, as the sample size increased, the efficiency of the estimator improved further. Moreover, a simulation study was conducted, and the findings are summarized in Tables 3, 4, 5 and 6, comparing expected variances, MSE, and percent relative efficiency (PRE). The results indicated that the proposed generalized estimator under the proposed randomized response model consistently provided the minimum MSE for both populations, improving on the MSE obtained under Diana and Perri's 6 model. This research contributes to variance estimation for sensitive variables. The proposed generalized estimator, based on the new scrambled response model, performed well in both the real and the simulated scenarios. These findings indicate the potential of this approach for improving the precision and reliability of population variance estimation in sensitive contexts.

Table 3. Privacy level for two populations.

Table 4. The MSEs and PREs of the estimators for Population I with $\sigma_T^2 = 0.5$ using Z = YT + S.

Table 5. The MSEs and PREs of the estimators for Population II with $\sigma_T^2 = 0.5$ using Z = YT + S.

Table 6. The MSEs of the estimators for Populations I and II with $\sigma_T^2$ using the proposed model.