A quasi-experimental study of impacts of Tanzania’s wildlife management areas on rural livelihoods and wealth

Since the 2000s, Tanzania’s natural resource management policy has emphasised Wildlife Management Areas (WMAs), designed to promote wildlife and biodiversity conservation, poverty alleviation and rural development. We carried out a quasi-experimental impact evaluation of social impacts of WMAs, collecting data from 24 villages participating in 6 different WMAs across two geographical regions, and 18 statistically matched control villages. Across these 42 villages, we collected participatory wealth ranking data for 13,578 households. Using this as our sampling frame, we conducted questionnaire surveys with a stratified sample of 1,924 household heads and 945 household heads’ wives. All data were collected in 2014/15, with a subset of questions devoted to respondents’ recall on conditions that existed in 2007, when first WMAs became operational. Questions addressed household demographics, land and livestock assets, resource use, income-generating activities and portfolios, participation in natural resource management decision-making, benefits and costs of conservation. Datasets permit research on livelihood and wealth trajectories, and social impacts, costs and benefits of conservation interventions in the context of community-based natural resource management.

We used country-wide information from the 2002 national census on boundaries of enumeration areas (EAs) as our sampling frame to match WMA and non-WMA villages. Enumeration areas are administrative units defined for the purposes of conducting censuses which each contain a certain number of households. In rural areas they usually correspond to a single village, collection of smaller villages or part of a larger village. EAs were used in place of villages as the unit of sampling and matching because equivalent information was not available at the village level. In Southern WMAs, EAs poorly represented some of the village boundaries on the ground. In the case of Mbarang'andu WMA we were able to obtain locally more accurate maps covering the whole of Namtumbo district which were georeferenced and used to correct village boundaries prior to matching. For Tunduru Nalika WMA, equivalent maps covering all of Tunduru district were not available so the census EA boundaries were used despite discrepancies between 2002 shapefiles and information on the ground.
To select non-WMA villages that were as similar as possible to our sampled WMA villages prior to WMA establishment, we pursued a matching approach drawing upon existing available georeferenced data covering Tanzania relating to the period before WMA establishment ( Table 2). Non-WMA villages were selected using nearest neighbour matching with replacement based on Mahalanobis distance metrics calculated from a set of economic, environmental and demographic variables. These variables were chosen because they were theorised to affect both the probability that a village would become part of a WMA and the effect of a village joining a WMA. The pool of potential matches was restricted to villages within the same region of Tanzania. Highly skewed variables were transformed prior to use in matching. The matching procedure was carried out in R using the Match function from the Matching package 16 .

Constructing a village register
Within each village, we sampled households from different wealth-based strata. To identify eligible households and to rank each household according to wealth we first needed a village register for each village. The following eligibility conditions were applied to construct the village register: • We assumed patriarchal intra-household dynamics. A household constitutes multiple sub-households in case of polygamous marriages where the husband is in charge of economic decision-making on behalf of other family members. Hence the male head of the polygamous family represents the entire household and potential sub-households. Sub-households are not recorded in the village register to avoid double counting. This differs from the approach used by The Nature Conservancy in a subsequent survey 17 . • A son becomes a head of his own household as soon as he is making his own decisions over his family's economic activities. In this case he is treated as a head of the household and included in the sample frame. • Spatially, several economically independent households and by extension several household heads can share a multi-household homestead and each of these were eligible. • Households had to be formed at least in 2007 or earlier to be included. • Households that had immigrated to this village, had to be present in this village at least from 2007 onwards to be included. • The head of the household can be female in cases where the head is a widow or a never married woman, where the husband left or was permanently not available, the husband is disabled, or the husband is alive but does not provide for the household.
The rationale for exclusion of households that were not formed prior to 2007 or immigrated into the village after 2007 is that the survey seeks to understand what has happened to the household from 2007 until the time of the survey (2014/15). See below for the method of pinpointing recall.
Following these eligibility criteria, a village register was obtained from the village office, screened and updated. If no village register was available, it was constructed with the help of knowledgeable people,   and then throughout the year that followed. In the South, where the eruption was less salient, people were asked to think back to President Kikwete's 2005 election as a help to remember 2007 events. In both cases people's recall was further prompted by reference to local events (local elections; village subdivisions) and memorable personal or family matters (birth of a child, construction of a house…). By focusing on important and easily quantifiable household assets we reduced potential implications of recall bias.

Participatory wealth ranking
After the village register was constructed, wealth ranking was conducted with the purpose of assessing the wealth trajectory of all eligible households in the village register in terms of development in relative wealth over time, based on four community-distinguished and community-defined, levels of wealth. Focus groups were predominantly male. Prior to asking a focus group to rank households from the updated PIMA village register, we conducted a focus group discussion on what constitutes material wealth in each village individually. We asked the group to discuss together what constitutes a state of poverty in the eyes of the community and when a household might be considered very poor, poor, normal, and rich. Typically, we asked the focus group participants questions like what would a typical very poor/poor/normal/rich household in your village..
• spend money on, spend additional money on if it becomes available, • be able to afford in terms of access to food • have as ability to send children to school, to pay back debt, to be able to repair the house • be able to do to improve agricultural land or to increase livestock herd size, or to acquire more land • have as measures of housing quality including iron sheet roof, cement floor, electricity etc.
• have as assets such as clothes, radio, mobile phones, furniture, bicycles, domestic animals, farm land, agricultural implements, bee hives, small scale businesses, shop etc. • have in terms of access to services such as water pump, health care, veterinary assistance, primary and secondary school for children, number of meals per day etc. • have in terms of access to opportunities such as non-farm employment, pension, micro loans, bursaries etc.
The goal was to identify four distinct wealth groups within the community. The discussion was led by PIMA research assistants (enumerators, co-authors to this article) who would prompt the focus group participants if the discussion did not yield sufficient differentiation between the different wealth categories. After the four wealth groups had been identified by as much consensus as possible, the research assistants wrote down the defined assets and associated quantities for each wealth rank category. Based on this shared understanding of what constitutes poverty and wealth in the village, the focus group would together proceed to determine for each of the households within the PIMA village register its present-day (2014/2015) and 2007 wealth rank as Very poor, Poor, Normal or Rich. The focus group assessed each household as compared to the village in general.
The results were written down by the research assistants into the village register template and transcribed into Microsoft Excel. This yielded the first dataset, including 13,578 households across 42 villages (villages, as defined by the 2002 census, PIMA_WEALTH_RANKING.csv, Data Citation 1).  Thus, leaders in categories A and B were only present in WMA villages. In addition to wealth ranking, the identification of households with members in village leadership positions enables the analysis of social impacts on rural livelihoods to be differentiated between households whose members occupy leadership positions and those which do not. Both wealth ranking data and village leadership data were used for stratified random sampling.

Stratified random sampling
In each village 40 households were randomly sampled for inclusion in the main household questionnaire survey, which was conducted with the household head (PIMA_HHHEAD_SURVEY.csv, Data Citation 1). To ensure that different wealth categories and social strata were sufficiently well represented to allow differentiated statistical analysis, the household stratification and sampling procedure was designed to oversample poor households and leaders, selecting randomly: In WMA villages the 10 leadership households were identified based on the 'leadership positions' in the village register. Given that many control villages did not have 10 'leadership' households as defined above, 'rich' households based on wealth rank in 2007 were added to attain 10 households in this stratum.
Alongside the main household survey, we also randomly sampled 20 wives of male heads of sampled households to conduct a household survey specifically targeting wives of male household heads (PIMA_WIFE_SURVEY.csv, Data Citation 1). The relative proportions of wives sampled within the three strata were the same as those of household heads in the main survey. If the head of a selected household or the wife were not available, could not recall their situation in 2007 with clarity, or refused to participate, a backup respondent was selected (see Table 4). This was done by selecting the first respondent on the backup list (not the one geographically nearest to the enumerator's position in the village).
Since the 2002 census some of the villages selected for inclusion in our study have split into new villages. To deal with this we drew our household sample from within the original village boundaries. Thus, we created sample frames (across several new villages) for the 42 original villages selected (although 5 additional villages had been created since 2002), and selected 40 households from within each of these original villages. That way the household numbers remain constant in cases where original villages split into several new ones.

Survey implementation
Both household survey questionnaires were initially co-designed by University of Copenhagen and University College London researchers, and subsequently trialled and refined in discussion with local research assistants. The survey questionnaires were produced in Microsoft Word and subsequently digitised using OpenDataKit (ODK), an open source survey package that runs on Google Android  smartphones and tablets through the app ODK Collect. ODK stores both the questionnaire and survey results on a secure, password protected webserver (ODK Aggregate). Four experienced research assistants (RAs) were recruited by the first author with the assistance of the local project partner Tanzania Natural Resource Forum (TNRF), and trained as enumerators by the first author in the field. Each enumerator received a tablet for data collection purposes, and was only required to use paper for additional opportunistic field notes, or in case of technical failure. Household survey questionnaires were pre-tested in the field under the supervision of the first author as field team leader and instructor. Training included a week of theory on socio-economic development and change, and role-play in conducting household surveys, and two weeks of hands-on training by conducting surveys using paper and ODK tablets in the villages. Enumerators individually recorded questionnaire results from the same, jointly conducted, household survey in the presence of the first author. These individual results were compared against each other to establish a common understanding of the meaning and purpose of individual survey questions. All enumerators were male and fluent in both Kiswahili and English. Two of the four enumerators were also fluent in Maa, which was important in some of the villages in Northern Tanzania. Given the range of local languages used in the study areas, questionnaires were not translated into Swahili or Maa. Instead, the first author worked with the field teams to ensure that RAs had a robust and consistent individual and collective understanding of the meaning and purpose of each question, and that each could be presented unambiguously to interviewees irrespective of the operational language.
The first author supervised the field team for six out of 12 months of data collection, and conducted regular quality control remotely through ODK's data server for the entire period of data collection. During the absence of the first author, other authors individually spent additional periods of a week or so at a time with either the northern or the southern teams to observe data collection procedures and techniques, and help resolve emergent issues.
The survey was approved by the UCL Anthropology Departmental Ethics Committee under the reference No. Z6364106 of UCL Data Protection Registration. The survey obtained a research permit (2014-49-NA-2013-154) from Tanzania Commission for Science and Technology (COSTECH). All survey respondents were introduced to the project using a plain language statement, which explained the collaboration and project aims and methods, and included information about the interviewees' right to drop out, prior to or at any time during the survey. Respondents were then asked for their informed consent to participate in the survey. Participation was voluntary and no cash payment was offered. Small inexpensive but locally appreciated courtesy gifts such as tea, sugar or phone credit vouchers were offered as a thank you after the survey.
Both questionnaire surveys and wealth ranking took place between May 2014 and May 2015.

Questionnaires
Household head and wives' questionnaires (PIMA_HHHEAD_SURVEY_INSTRUMENT.pdf, PIMA_-WIFE_SURVEY_INSTRUMENT.pdf) are provided in pdf form and correspond to datasets 2 and 3 (PIMA_HHHEAD_SURVEY.csv, PIMA_WIFE_SURVEY.csv). The questionnaires are structured as follows in Table 5. Each question is numbered accordingly and corresponds to the header in the datasets. E.g., question 6.2 in pdf file of questionnaire with household head corresponds to the column with the header 'q6.2' in the csv file of dataset 2. Some questions could be answered with the ‚other' option, and were pre-coded as such. In this case the enumerator recorded the answer as open text. E.g., question 6.2 in pdf file of questionnaire with household head can be answered with 'other' under code number 7. The corresponding column in the csv file containing this response is named as 'q6.2_other'.
Most questions were pre-coded. Questions 3.1 and 11.7 in questionnaire with household head, and question 8.8 in questionnaire with wife of household head were open text. Sections that were only covered in WMA villages are indicated with * (see Table 5). Many questions aim to elicit if there has been a change in a particular condition as perceived by the respondent (e.g., change in access to water or firewood), and what was the main reason for change according to the respondent (e.g., access became restricted due to..). For that purpose, a question was asked both pertaining to the situation in 2007 (recall baseline) and to the present (2014/2015). In case there was change, and in case the survey was conducted in a WMA village, the enumerator would record if the perceived change was perceived to be due to the implementation of the WMA. However, the enumerator would not prompt about the WMA specifically, when asking about reasons for change, to avoid strategic responses.

Scoring exercise
Section 10 of questionnaire with household head was designed as a scoring exercise to establish the relative importance of different components of household livelihoods and income portfolios, in the six categories agricultural production, livestock production, environmental product harvesting, wage employment, business, and remittance and pensions. Similarly, section 9 of questionnaire with household head focused specifically on the relative importance of nine different categories of environmental goods and services including firewood, charcoal, construction material, timber, fish, honey, bushmeat, wild vegetables and fruits, and the category 'other' containing all other environmental products. Participants were asked by the enumerator to distribute a pre-defined amount of maize grains (50 and 25 grains in section 9 and 10, respectively) across the different categories to reflect the www.nature.com/sdata/ SCIENTIFIC DATA | 5:180087 | DOI: 10.1038/sdata.2018.87 importance in the household's livelihood during the past 12 months at the time of the enumeration in 2014/2015. Importance was explained as encompassing both cash income and value for own subsistence use defined as replacement value. Printed templates in Kiswahili were used with appropriate pictures to facilitate the exercise. After grains were placed and the enumerator had recorded their distribution, the respondent was asked to think back to 2007 and move the grains around accordingly to reflect relative changes across categories. This also included removing or adding as much extra maize as necessary to reflect changes in overall household income summed across the items listed, in cash and in kind, between 2007 and 2014/2015.

Data Records
Three different datasets are available (Table 6): • Village-based wealth ranking (n = 13,578, Villages and households are anonymised, PIMA_WEAL-TH_RANKING.csv, Data Citation 1). • Socio-economic household survey with the head of the household (n = 1,924, Villages and households are anonymised, PIMA_HHHEAD_SURVEY.csv, Data Citation 1). • Socio-economic household survey with the wife of the household head (in case the head of the household is male, n = 945, Villages and households are anonymised, PIMA_WIFE_SURVEY.csv, Data Citation 1).
The survey data is provided as comma-separated value (csv) files. All variables in the csv format correspond to the number in the questionnaire. Both household survey questionnaires can be downloaded in English as pdf files, including full code lists.

Technical Validation
The quality of the match between selected WMA and non-WMA villages was assessed by examining the improvement in covariate balance achieved after matching via standardised mean differences (Table 7). Balance improved for all covariates, but a non-negligible level of imbalance remained in the distance from protected areas, the distance from wildlife corridors and population density due to the limited availability of non-WMA villages with similar characteristics to WMA villages. We also visually examined QQ-plots of the matched pairs of WMA and non-WMA villages to determine specific areas where differences in observed covariates remained after matching (Fig. 2).
The quality of wealth ranking is ensured by pre-testing the approach in four villages where each village was ranked by two different focus groups to compare the results. We conducted wealth ranking with a female and a male focus group separately, each comprising 2-4 people knowledgeable of village affairs and households. Often the 2-3 people in the focus group would be sub-village chairpersons. Generating two separate wealth rank datasets (with a male and a female focus group) proved time consuming. After an analysis of difference indicated that the two sets were not very different, it was decided to proceed with only one focus group in each village.
After the main household survey data had been collected, we also used this to verify that the wealth ranks had correctly differentiated between households possessing differing amounts of land and livestock and households which were more likely to have members who occupy positions of leadership (Fig. 3). Our a priori expectation was that higher wealth ranks would be associated with ownership of more assets and a higher probability of occupying a leadership position.
Household survey questionnaires reflected project research questions and objectives and were designed collaboratively, drawing on the extensive expertise of different research project partners pertaining to socio-economic livelihood and anthropological studies. An important criterion was to make sure that a questionnaire could be conducted within 1-1.5 h to avoid respondent fatigue and thereby ensure high quality data, that respondent's needs were respected and that any collection of data not of immediate relevance to the research project objectives was avoided.
ODK enabled a digital data entry that proved to save time, reduce errors and offered almost real-time data availability for remote quality control. Paper was only used during training in combination with ODK tablets, and as a backup in case of technical failure of tablets (which did not happen). Typographic errors and other forms of human error were minimized substantially by the inclusion of data validation rules in the ODK code for the questionnaire surveys, implemented by the first author. For example, string answers were not accepted where a number (integer) is expected, logical upper and lower numerical limits were pre-coded to reduce typographic errors; skipping and branching logic ensured that only appropriate questions were asked; questions that were required to be asked were pre-coded appropriately. Due to the inability to make online corrections to the ODK dataset, the first author documented all inaccuracies and errors in a log spreadsheet throughout the 12 months of data collection. This was used to clean the complete dataset after data collection finished.

Usage Notes
To account for the residual imbalance in observable covariate values that remained between WMA and non-WMA villages after matching, we suggest that future analyses of the data should include adjustment on the matching covariates as part of their modelling strategy. Where appropriate, analyses of data from the household head survey and wife survey should also be designed to account for the stratified sampling approach adopted during data collection (e.g., using sampling weights or post-stratification based on the village wealth ranking data which served as our sampling frame).
A range of manuscripts is currently being developed based on analysis of various aspects of the PIMA data which we aim to publish in international peer reviewed scientific journals. Hence, anyone wishing to use the same specific data should consult the corresponding author of these publications in preparation. This includes the following data/questions: Martin R. Nielsen (mrni@ifro.ku.dk): Sections 9 and 10 of the questionnaire conducted with the household heads (Dataset 2, PIMA_HHHEAD_SURVEY.csv).  Table 7. Summary of matching outcome for each of the variables used in the matching procedure.