A novel approach for estimating fertility rates in finite populations using count regression models

Demographic health surveys (DHS) contain in-depth information about the demographic characteristics and the factors affecting them. However, fertility rates which are the important indicators of population growth have been estimated by utilizing the design-based approaches. Model-based approach, on the other hand, facilitates efficient predictive estimates for these rates by utilizing the demographic and other family planning related characters. In this article, we first attempt to observe the effect of various socio-demographic and family planing related factors on births counts by fitting different regression models to Pakistan Demographic Health Survey 2017–2018 data under classical as well as Bayesian frameworks. The births occurred during the time periods of 1-year, 3-years and 5-years are taken as the responses and modeled using different non-linear models. The model-based approach is then used for estimation of the fertility measures including age-specific fertility rates, total fertility rate, general fertility rate, and gross reproduction rate for ever-married women in Pakistan. The performance of the model-based estimators is examined using a bootstrapped sampling algorithm. While the age-specific fertility rates are over-estimated for some age groups and under-estimated for others. The model-based fertility estimates are recommended for estimating the demographic indicators at national and sub-national levels when survey data contains incomplete or missing responses.

or B and, therefore, do not have full range i.e. −∞ to +∞ .Hence, the relationship between the response vari- able and the predictors may not be linear.Some examples of GLMs used for birth history data include logistic regression model 18 , Poisson regression model 19 and event history methods 20 .Although the principle has rarely been described in demographic analysis manuals, regression methods, especially Poisson regression, can still be practiced to calculate the classical demographic measures such as total fertility rate (TFR) and other fertility rates 21,22 .Some new developments and applications on fertility estimation and casual inference are available in [23][24][25][26][27][28] .Reference 29 used 1998-1999 Burkina Faso DHS data to explain the estimation of TFR and ASFRs using individual data with Poisson regression.The number of births over a 5-years period preceding the interview (variable predefined in the DHS women recode file PKIR71) is taken as the response variable and the age groups as dummy regressors in the model controlling for the length of exposure (5-years corresponding to each woman) using a term (offset) 30,31 .The average number of births is obtained by exponentiating the coefficients for each of the seven age groups without introducing an intercept and computing the TFR as the sum of the rates multiplied by five.Recently 32 , obtained pooled estimate of the TFR in sub-Saharan Africa using (2010-2018) DHS data.
The classical linear regression provides estimate of the model parameters based on the sampled data alone.However, if the sample size is small, one might express the estimate as a distribution of possible values of the parameter given the sample information.This is the situation where Bayesian regression is needed.Bayesian regression is widely used in estimation, inference and prediction purpose in wide variety of non-linear setups 33,34 .In Bayesian regression parameter estimation is done using Markov Chain Monte Carlo (MCMC) method 35 .
Reference 35 warned about the programming errors and the problems that occur in estimation routines during MCMC.However, Bayesian framework is still preferred due to the flexibility in model construction, statistical inference, and assessment of the fitted model than the classical approaches 36 .Reference 37 also pointed out two main concerns with the employment of the MCMC algorithms: mixing and convergence and suggested to confirm that the algorithm results in a Markov chain that "converges" to the appropriate posterior density and "mixes" well throughout the values of the density.When these conditions met, Bayesian regression nicely models the linear as well as non linear relationship between the response variable and covariate with an effective prediction 38,39 .
The literature on fertility estimation emphasizes on observing the relationship between the birth counts and their determinants as well as utilizing the estimated models for obtaining more accurate estimates at national and sub-national levels.However, the model-based predictive estimation method aids in efficiency by utilizing the model relationship between the study variable and the available socio-demographic variables 40,41 .However, the response variables are the birth counts which, we cannot proceed with linear regression setup.For obtaining a precise estimates of fertility measures, we establish the model-based estimators of fertility measures using the classical as well as the Bayesian count regression models.The model-based approach increase efficiency of the estimators by including information available on related covariates.We are able to produce estimates on different fertility measures with age sex distribution at national as well as sub-national levels for fixed values of covariates.Based on the suggested estimation algorithm we can produce fertility estimates for fixed values of other significant socio-demographic and family planning variables.For example, we can produce estimates on fertility indicators based on Wealth Index (WI), education level of women, contraceptive usage etc.An etiological analysis on number of birth is conducted in Section "Model estimation and inference for birth counts".Section "Model-Based Estimation of Fertility Indicators" covers the model-based estimation of different fertility rates.Section "Conclusion" summarizes the article with some concluding remarks (Fig. 1).

Model estimation and inference for birth counts
We first briefly introduce fertility indicators and methods of their calculation following 5 .The Age Specific Fertility Rate (ASFR) is the average count of births that occurred during a given reference period per 1,000 women exposed to the risk of fertility, in 1-year, 3-years, or 5-years age groups.For any age group g = 1, 2, ..., 7 for 5-years grouping), B g , and E g denote the total births given by the women and women-years of exposure in age group g for the referenced period respectively.The ASFR in age group g is expressed as: where 1000 is multiplied to show the rate per 1000 women-years of exposure.The data about the exact Date of Birth (DoB) of the child from DHS data was utilized for directly calculating the numerator B g .For calculating the denominator E g , the exact DoB of each woman was used for summing up the number of women-years of exposure in g, because a woman can participate in atleast two age groups for a given referenced period.For details about computation of the women-years of exposure, readers are referred to 11 .The Total Fertility Rate (TFR) is used to measure women's fertility hypothetically 42 .It can be described as the total children who would be born per 1,000 women if they went through their reproductive age according to a given schedule of ASFR subject to no mortality.The TFR is computed on the basis of ASFR g for g = 0, 1, ..., 6 as follows The General Fertility Rate (GFR), which is the mean count of births that a woman gives during her whole reproductive period, is obtained by dividing the total number of births during a specified period by the total number of women exposed to the risk of fertility, during the same specific period.The standard formula for GFR is given by ( 1) Similarly, Gross Reproduction Rate (GRR) is computed based on ASFR and sex ratio at birth during that period.Like TFR, GRR assumes that the hypothetical group (cohort) of women pass from birth over their reproductive age with no mortality.This assumption is valid when one is interested in comparing levels of fertility over time.The GRR is expressed as where P g is the proportion of female births to the women in age group g.The birth data file (PKBR71FL.DTA) from PDHS 2017-18 was taken for analysis.The detail about the data collection mechanism, fieldwork, training of staff and pretest is available in 43 .The key variables including demographic characteristics, socioeconomic variables, and variables related to family planning were taken as the predictors and described in Table 1 along with the response variables.Figure 2 shows bar-charts corresponding to three responses and one can observe that all three responses depict highly departure from symmetry.Also one can observe that the number of births for 1 year includes the highest number of zeros than the other two counts.The number of births during the 5-year period tends to follow a Poisson distribution without access to zeros.

Birth count regression models
Let y be the observed response corresponding to a random variable Y whose values are unknown for a finite population of size N indexed as U = {1, 2, 3, ...., N} .In matrix notation, let y = (y i , i ∈ U) be the realizations of the stochastic vector Y = (Y i , i ∈ U) under model-based approach.Suppose a sample S = {1, 2, 3, ..., n} of size n is drawn from finite population U using a Sampling Design (SD) and r = (1, 2, 3, ..., N − n) be the set of index attached to the values of units that are not indexed in s.For a given sample s, we can rearrange the population vector as y = (y T s , y T r ) T , where y s and y r be the vectors of n sampled and (N − n) non-sampled values of the study variable respectively.Let M be the true underlying model expressed as where X is the data matrix containing p regressors including intercept and ǫ be the vector of random errors assumed to be distributed normally with mean vector 0 and variance-covariance matrix .When the response variable is the number of occurrences of an event, the distribution of counts is discrete and is bound to nonnegative integered values.While applying an ordinary linear regression model to such data, researchers may face one of the following two issues.(i) Often such count data has positively skewed distribution with many observations having value 0 as in Fig. 2. With a large number of zeros in the data set, one cannot transform such skewed  distributions into normal.(ii) It is quite possible that the regression model produces negative predicted values which contradicts with theory 44 .The following sub-sections cover some generalized linear regression models which we use as a working model for births per woman during 1, 3, and 5-years periods before the interview.

Poisson regression model
A Poisson Regression Model (PRM) assumes that the error term has Poisson distribution instead of a normal distribution.Further, it uses the natural logarithm of the response variable as a linear function of the coefficients rather than simply modeling the response variable as a linear function of the regression coefficients.To proceed it is assumed that the logarithm of the mean values (rates) can be modeled into a linear form with some unknown parameters.The mathematical form of PRM is given by where vector β is obtained using maximum likelihood estimation (MLE).Let µ be the rate parameter which is also the dispersion parameter of Poisson distribution and Equation ( 5) can be expressed as: (6)  log y = Xβ, Table 1.Variable Description. 1 The values of Preg_term are reported after imputing 0 in missing cases within first 13,123 and 1 in remaining 37,373 cases.Note that this division is made by dividing the data in ratio 4676 × (50495)/(13317 + 4676) and 13317 × (50495)/(13317 + 4676). 2 Missing entries in Age_husbnd was imputed by median age of the observed responses.The exponent of the coefficient β j (jth component of β for j = 0, 1, 2, ..., p ) for an explanatory variable (X j ) thus shows the relationship between the number of births per woman for which the explanatory variable has a specified value and the number of births per woman for which the variable has the specified value minus one, all other things remain constant.Readers can find more details about Poisson regression model from 45 .

Negative binomial regression model
The PRM assumes that the error term, consequently responses for fixed covariate values, has the same mean and variance which is not usually true in practice.In many cases we face problem of over dispersion i.e. the variance of the error is larger than its mean.An alternative method for modeling the data with an over-dispersed error term is to fit a Negative Binomial Regression Model (NBRM) 46 .In negative binomial distribution the parameter of the distribution is considered as a random variable.The variation in the parameter can be considered as the variance of the data that is larger than the mean.We need more parameterization in NB distribution to get a form that is appropriate to our model.Following the notations given in 47 , we parameterize the NB density for the ith observation with parameters P i and r.The former is known as the success parameter, and for the ith observation it is defined as P i = r r+µ i , where µ i satisfies the relation given in Equation ( 7).The latter is the over-dispersion parameter (≥ 0) , which is equal to 1 in the Poisson distribution (i.e.there is no over-dispersion).The maximum likelihood estimates of the coefficients are obtained using the MASS package in R. The detail about parameter estimation, model assumptions, and validity of estimates can be found in 48 , Page 326 and 49 .

Zero-inflated poisson model
In the cases with responses having a large variance, many zeros as well as a few very large values, the negative binomial model as an extension of Poisson handles the extra variance.However, sometimes there may exist too many zeros than a Poisson would expect to predict.In such cases, a better option is to use Zero-Inflated Poisson (ZIP) model 50 .In a ZIP model, we assume that some zeros occur by a Poisson process and some were not even able to have the event occur.Hence two processes work in ZIP -where one determines whether the individual is eligible for a non-zero response, and the other finds the count of that response for eligible individuals.The ZIP model consists of two regression models both working simultaneously.A logistic (or probit) model is used to determine the probability of being eligible for a positive count and a Poisson model is used to model the size of the counts for eligible individuals with positive value.Both models utilize the same predictors, but with separate estimates for their coefficients.In this way, the predictor variables can have quite different effects on the processes.While a ZIP model needs it to be theoretically reasonable that some individuals are not eligible for a count.Zero-Inflation Poisson (ZIP) for response y is defined as where θ is the probability of occurring false values (zeros).Hence there are two models coupled together (a mixture model) to give an overall probability: (1)-when a response is zero (i.e.y i = 0 ), it is the probability of getting a zero plus the probability of a true value times probability of choosing a value of zero from a Poisson distribution with parameter µ and (2)-when a response is greater than 0, it is the probability of true value times the probability of drawing that value from a Poisson distribution with parameter µ.This definition indicates that the Poisson parameter µ is the same for both zero and non-zero components.The model of zero values (i.e.y i = 0 ) is used for essentially investigating whether the likelihood of false zeros is related to the linear predictors.The greater than zero (i.e.y i ≥ 0 ) model, then, investigate whether the counts (non-zero responses) are related to the linear predictors.The expected value and the variance of the response y for a ZIP model are E(y i ) = µ(1 − θ) and Var(y i ) = µ(1 − θ) × (1 + θµ 2 ) respectively.The model building involves an iterative process which is performed using the MASS package in R. The detail about derivation and application of ZIP model to count responses is available in 48 .To model birth counts, we use four different regression models namely Poisson, NB, ZIP, and ZIP inflation on full data obtained from PDHS-2017-18.We obtain 3 different models corresponding to 3 responses as follows; For data with count responses, the regression model utilizes the maximum likelihood (ML) method for estimation of the parameters.Reference 51 provided a practically understandable introduction of ML estimation.The estimates of all coefficients are obtained by using an iterative set of procedures for estimating parameters.All the ML estimation results are converged and found a unique set of values for each coefficient.

Birth count regression models under classical approach
After screening different possible determinants of birth from DHS dataset, we select significant variables for fitting final Poisson, NB, ZIP and ZINB models.The estimated coefficients for Models-1, 2, and 3 are presented in Tables 2, 3, and 4 respectively to quantify the impact of different determinants on the number births.The ).The results provide sufficient evidence that the estimated coefficients for all variables except Sindh, Prof _ tech, and Prof _ Agr have a significant effect on the number of 1-year births.The variables Res _ Age, Residence, and Age _ husbund show negative estimated coefficients for the Poisson model which supports the argument that the number of births during 1-year period to a woman decreases with the age of mother and is also higher for rural areas.The number of births for urban areas is exp (−0.1021) = 0.903 times the number of births in rural areas assuming no changes in other factors.The likelihood value and its transformations (like AIC and BIC) are used for comparison of the fitting power of competing models 44 .The variable on the region of the respondent is reconstructed into 5 dummies leaving Punjab as the base category.For interpreting the dummies, we follow recommendations given in 52 .The dummy variable corresponding to the province Khyber Pakhtunkhwa (KPK) has negative estimated coefficient (−0.0926) with a standard error of 0.035 indicating lower birth exposure in KPK (KPK=1) as com- pared to Punjab (KPK=0) assuming all other factors as fixed.While the coefficients are −0.088 and −0.05 with standard errors 0.021 and 0.015 for Poisson Model 2 and 3 respectively.Similarly, one can distill from Tables 2, 3 and 4 that, after adjusting other variables, the average births during 1, 3 and 5-year period are respectively exp(−0.0926)= 0.9115 , exp(−0.088)= 0.92 and exp(−0.05)= 0.96 times of the average births in Punjab.The 95 % confidence intervals for the true effects are (−0.1612,−0.024) , (−0.12916, −0.04684) and ( −0.079, −0.021 ) for Poisson Models 1, 2 and 3 respectively.The 95% confidence intervals for the relative rates (exponentiated estimates) are (0.851, 0.976), (0.879, 0.954) and (0.924, 0.98).
The regression coefficient associated with the wealth index (WI) is approximately −0.13 for all models.As WI is dummy coded, the negative sign shows that the average number of births for those who have positive WI is smaller than for those who have negative WI.Similarly, for all three responses, the coefficients for all dummies corresponding to different groups of the contraceptive methods turn negative denoting the number of births for those who use any one of the contraceptive methods is smaller than those who use no contraceptive method.For example, the average births to the women who use contraceptive pills have exp(−0.358)= 0.70 times of the average number of births to the women who don't use any method at all.Further, the ratio of births during 1-year period between those who use male contraceptives and those who use female contraceptives is 0.2383 with exp(−0.102+ 0.77885) = 1.97 showing that the number of births during 1-year period for those who use male contraceptive method have more birth than those who use a female contraceptive method.The same www.nature.com/scientificreports/interpretation for the coefficients corresponding to contraceptive methods can be done with a slight change in the estimated values and standard errors.Further, the average number of 1, 3, and 5-year births to women belonging to an agriculture background are respectively 1.07, 1.06, and 1.123 times higher than those who do not work.However, the regression coefficient for dummy corresponding to pregnancy termination (Preg_term_new) is insignificant.The (1-α )% confidence interval corresponding to each exponentiated coefficient can be constructed after obtaining confidence interval for the coefficients given in Tables 2. The method introduced in 52 for interpreting dummy variables cannot be extended for the interpretation of the coefficients corresponding to the continuous predictors.For the education level variable, the regression coefficient is 0.095.To see one level change in education level, we put △ = 1 into the formula 100 × [0.095 × 1 − 1] = 10 indicating that there is a 10% ( 7.6% for 3-years births and 5.6% for 5-years) increase in the expected number of 1-year births for a unit increase in education level.For the number of children already living in the same household, the regression coefficient is 0.025 with 100 × [exp(0.025× 1) − 1] = 25.3 showing that there is a 25.3% increase in the average number of 1-year births with increase of one child in the family.The percentage changes in births during 3-years and 5-years periods before the interview have not been reported to reduce the length of the article.Similarly, the percentage change in the expected births during 1-year period with one unit change in the variable Res_age, Age_husbnd, Num_daughter, MTFBI are 9.7%, 1.64%, 2.5%, 26.905%, and 0.5% respectively.The percentage change in the expected births during 3-years period with one unit change in the variable Res_age, Age_husbnd, Num_daughter, MTFBI are 8.9%, 1.41% 2.14%, 23%, and 0.5% and the percentage change in the expected births during 5-years period with one unit change in the variable Res_age, Age_husbnd, Num_daughter, MTFBI are 7.9%, 1.14%, 1.78% 22.4%, and 0.435084% respectively.
Figure 2 displays the effect of factors on birth counts for the 3 models.The length of bars shows the dispersion in the estimates while the center shows the exponentiated average change in births.The vertical line on the center divides the variables with negative and positive coefficients.The variables with insignificant effects are very close to the vertical line.The bars corresponding to 1-year births for Con_IUD and Con_Female show a highly significant decrement in birth while using these contraceptive methods.
A comparison of Poisson, NB, and ZIP are provided in Table 5 for three different responses.The minimum, maximum and mean residuals for each model are reported.The minimum residual is observed at extreme (i.e.-1.0731, -1.7707, and -2.6435) for the ZIP model.While maximum residual is observed at extreme (i.e.51.793, 15.0487, and 9.5761) for NB.Further to see the model performance -2logliklihood and AIC values are also reported.The AIC values are observed smaller for ZIP as compared to other models considered in this study (Table 5).for the three models are obtained from MCMC sampling.The plots for the model with 1-year period births as response variable are reported in Appendix.The autocorrelation plots are displayed in Figs. 4, 5, 6, 7, 8 (see Appendix).It can be noticed that the autocorrelation goes down for all coefficients with increase in l.However, the auto-correlation plots corresponding to the intercept and the coefficient of age indicate the presence of auto-correlation.This auto-correlation can be reduced by thinning the MCMC chains, i.e. by discarding n samples for every sample that we keep.The thinning of the MCMC chain is not of much use unless we want to reduce the memory and storage space in long chains.With this argument, one should keep only one out of ten samples instead of thinning the chain because this is more efficient, concerning the Effective Sample Size (ESS), to run only one chain 10 times as long, it will take 10 times more storage space.A more reliable estimate for burn-in cut-off is through the ESS.An ESS is the number of independent samples with the same estimation power as the number of autocorrelated samples.The burn-in samples are the samples that have not much information, and if the period of burn-in is estimated to be short enough then this will lead to a reduction in the ESS.On contrary, if the period of burn-in is estimated to be much longer, again causes in reduction of the ESS as informative samples are being isolated.An increase in ESS should be with the optimal estimate of the burn-in are highly recommended in practical estimation procedures.The burn-in samples can be assessed from the trace plots and ESS.Table 6 shows that the ESS all coefficients, except β 0 , β 1 and β 3 which are 38.7, 36.3, and 32.75 respectively, are large enough to continue.The ESS for the coefficient of Cont_Inj is maximum with 1,855.Finally, we see the predictive power of our models by checking at deviances.The most widely used tool for checking the predictive power of a model is the Deviance Information Criterion (DIC).It is an estimate of the expected predictive error of the model.Table 7 gives the mean deviances for the three models.The DIC value is least (i.e.43,015) for the model with the number of birth during 1-year period as the response.The penalized deviance for the same model with the penalty of 24.08 is 43,039 which is slightly larger than the deviance without penalty.

Model-based estimation of fertility indicators
In PDHS final report 43 , ASFRs are obtained according to the formula given in Equations ( 1) and ( 2) for each age group.The mother's age is computed, in century month code (CMC) format, by taking the difference of the date of the interview and the mother's date of birth.Births are then tabulated by age group after converting the ages into years.Similarly, the denominator in ASFR is women-years of exposure in the five-year age group during the 3-years time-period.A woman can expose to several age groups in the given period, with varying lengths of the period.For the 3-years period, a woman will contribute to at most two five-year age groups during the 36-months period.For further details related to the allocation of women to the extreme age groups (higher and lower), readers are referred to 43 .
We developed R codes for estimating ASFR under a model-based approach after converting the individual (IR) data to person-year data.For tabulation of person-years, each woman is tallied twice, once in the lower age group aggregating lower age group exposure and once according to the higher age group summing the exposure she contributes to the higher age group.In computing fertility rates, we use only ever-married samples without taking the "all-women factor" under a model-based approach.Hence the interpretation of the rates is done based on birth per ever-married woman only.The total exposure in each age group is the sum of the exposure in each age group tallying from the first and second.After obtaining ASFR, it is straightforward to obtain TFR, GFR, and GRR using formulae given in Equations ( 2), ( 3) and (4).We obtained fertility measures i.e.ASFR, TFR, GFR, and GRR using predicted responses obtained from the regression models after partitioning data into sampled and non-sampled parts.A bootstrap sampling procedure was used to study the design-based properties of the estimated fertility rates PDHS 2017-18 women re-coded data.The calculations are documented in the following algorithm.Select a random sample of size 10,000 from PDHS 2017-18 women re-coded data and partition the data into sampled and non-sampled parts.2: Fit the Poisson, NB, ZIP, and ZINB models to the sampled data and obtain the coefficients and residuals.3: Predict the non-sampled part of the response variable by using the fitted models from Step 2. 4: Convert the individual re-code (PKIR71) data into person-year data and obtain fertility rates under the working models.5: Repeat Steps 1-4, 10,000 times to obtain expected rates and the corresponding root mean squared errors (RMSEs).Root mean squared error for estimated rate say R against the true rate R is given by where Rk is the predictive estimate of fertility rate obtained from kth sample and R is the corresponding rate obtained from full data without use of weights.
Algorithm 1. Boot-strapped algorithm for birth rates estimation Model-based ASFRs for Pakistan at national and sub-national levels for an ever-married sample along with their RMSE are presented in Table 8, and Tables 9-10 respectively under four different working models.The expected ASFR with the smallest RMSE among four (obtained under four alternative models) is bolded corresponding to each age group for the sub-national and national levels.In majority of the cases, the ZIP model gives relatively smaller RMSE at both national and sub-national levels.Comparing the estimated ASFR with the ASFR obtained from full data in the last column of Tables 8-10, we can notice that the ASFR for age groups 1, 2, 6, and 7 are upward biased for all models i.e. they are estimated larger than the ASFER obtained from full data.The discrepancies in results occurred due to the use of person time data for estimating ASFRs.As for smaller and elder ages we are not able to exactly find the exposure to births and denominator in ASFR might be counted below the actual exposure time.Which can be considered as major limitation of this approach.The expected ASFR with the smallest difference (bias) with the ASFR obtained from full data among four is underlined corresponding to each age group at both national and sub-national levels.For example, the closest estimate for age group 1 is obtained under NB model in Punjab province.From ASFR based on full PDHS data, one can observe that for age group 15-19 the highest average number of births is observed for FATA and the lowest for KPK.Similarly, ASFR can be compared across regions based on full data.
The TFR, GFR, and GRR for ever-married women are provided at sub-national and national levels in Tables 11, 12 and 13 respectively.The TFR for the ever-married sample is observed highest in ICT with 6.26 and lowest in Punjab with 5.44 per 1000 ever-married women.While the TFR for all women given in PDHS 2017-18 report is observed highest in FATA.The TFR obtained for all women using the DHS.rates package are given in Appendix.The estimates obtained under NB, ZIP, and ZINB are more accurate than the ones observed for Poisson which can be noticed from RMSE in Table 11 corresponding to each region.The smallest RMSE is observed at the national level when ZIP is used to model births i.e. 0.0617.Similarly, the predictive estimate for GFR at the sub-national and national level is observed highest for ever-married women in KPK which is 174.95.6 (NB as it is more precise than the other three estimates) births per 1000 married women.The lowest GFR is observed for Punjab with 162 children per 1000 ever-married women.The RMSE is observed highest when the Poisson regression model is used for modeling births in KPK.While for other regions continuing with Poisson, ZIP and NB give almost similar RMSE for estimating GFR at sub-national and national levels.The GFR for full data without model fitting is obtained in the last column of Table 12.The GRR is computed using the proportion of female births (PF) from all age groups using the sex ratio of male to female from full data (PF from census or administrative records can be used for obtaining sex ratio).The GRR is observed higher in Balochistan, Punjab, and KPK as compared to other regions replacement of 2 or more daughters per woman before the death of their   The model-based estimates on ASFR, TFR, GFR and GRR are obtained in this study using three responses (birth history during 1, 3 and 5 years preceding the interview).Interpretations are made on the basis of most suitable one (with smallest RMSE) from the estimates obtained under four different methods.The highest ASFR is estimated in Age group 25-29 and lowest is observed in 45-49 at national level.Same results for ASFR is obtained for Punjab, KPK and FATA.However, in Sindh, Balochistan and ICT highest ASFR is obtained for age group 20-24.The total TFR is estimated highest in ICT with 6.23 (about 6 births per woman) and lowest in Pubjab region with 5.45.TFR is estimated 5.8 (aorund 6 per woman during their whole reproductive age) at National level.The GFR is estimated highest in KPK with 175 per 1000 women and lowest in FATA with 153 per 1000 women.The GFR is estimated 168 per 1000 women at national level.Similarly, the GRR is obtained highest in ICT with 2.8 per married woman and lowest in Balochistan with 2.4.

Conclusion
Estimation and inference of fertility rates posses severe challenges when surveys consist of missing responses on birth counts.Model-based approach by utilizing available data on related covariates aids in efficiency of estimated fertility rates.This article covers an etiological analysis of PDHS 2017-18 data on birth history to observe its significant determinants followed by construction of the model-based estimators on fertility rates including ASFR, TFR, GFR, and GRR.Estimation and inference of birth counts are considered under classical as well as Bayesian frameworks.The bootstrapped study reveals that the model-based estimators of fertility rates provide efficient results when relevant auxiliary data are available from the census at unit or cluster level.It is important to mention that the data about the majority of covariates considered for prediction in this study is not easy in practical situations at the individual level.However, data might be available for clusters through Civil Registration of Vital Y = Xβ + ǫ,

Figure 1 .
Figure 1.(a) Bar chart for birth counts during 1-year period.(b) Bar chart for birth counts during 3-year period.(c) Bar chart for birth counts during 5-years period; X-axis consists of birth counts and Y-axis corresponding frequencies.

Model 1 :
Taking 1-year births as responses Model 2: Taking 3-years births as responses Model 3: Taking 5-years births as responses.

Figure 3 .
Figure 3. Illustration of birth history data on a Lexis diagram (each birth can fall in one of the given categories then person-years are calculated according to these categories) is given in Fig. 3. Years spent by respondents before interview are shown on X-axis in 3-year periods.While age of respondents are shown on Y-axis the first arrow begins from 15 years age and end at 20 years inferring that birth falling in this category, i.e. area between first and second arrows before the first vertical line corresponding to 3, are counted as exposure for group 15-19.

Table 5 .
Residual Comparison of Models.

Table 6 .
Posterior means, standard deviation and effective size for regression coefficients.

Table 8 .
Model-Based ASFR for Ever-married Women in Pakistan.Significant values are bold, underline, boldunderline.

Table 9 .
Model-Based ASFR for Ever-married Women by Region.Significant values are bold, underline, boldunderline.

Table 10 .
Model-Based ASFR for Ever-married Women by Region (Continued).Significant values are bold, underline, boldunderline.RMSE is smaller for Punjab, Sindh, and KPK when the NB model is used for prediction.While it is smaller for the remaining regions when the ZIP model is employed.

Table 11 .
Model-Based TFR at Regional Level.Significant values are bold, underline, boldunderline.

Table 12 .
Model-Based GFR at Regional Level.Significant values are bold, underline, boldunderline.

Table 13 .
Model-Based GRR at Regional Level.Significant values are bold, underline, boldunderline.