Introduction

The home environment can act as a reservoir for microbial colonization and can contribute to the spread of infectious diseases. Poor water, sanitation and hygiene (WASH) leads to fecal contamination of the home environment, which increases the risk of disease transmission1,2. Children in low-resource settings experience to a variety of enteropathogen risk factors from various sources and exposure pathways (e.g., water, soil, food, hands, flies, and containers)1. Similarly, inadequate access to basic sanitation, poor animal husbandry or keeping practice, and mouthing of soil contaminated materials are the commonest risk factors for transmission of enteric infections among children in the rural Ethiopia. A study done to measure child exposure to enteric infection in the rural northwest Ethiopia showed that contamination of water, food, and soil with fecal matter due to open defecation practice and poor animal keeping practice was common. Moreover, mouthing of soil or soil contaminated materials is commonly practiced among children in the rural Ethiopia3.

Various approaches for measuring human exposure exist along the environmental exposure pathway continuum, ranging from those that measure environmental contaminants to predict exposures before the contaminant reaches the human boundary (external exposure assessment) to those that estimate a dose after the contaminant has been taken up into the body (internal exposure assessment) (internal exposure assessment). The detection of indicators of fecal contamination or specific pathogens in a known size of environmental sample is a common approach for external exposure assessments4. Internal enteric pathogen exposure assessments using human biological specimens, on the other hand, can estimate actual enteric pathogen exposure after crossing the human body, typically through oral ingestion4,5.

Moreover, survey questionnaire can complement exposure assessments and the analysis of exposure data. Survey data on self-reported behaviors has been used as a rapid and cost-effective tool to collect information on a range of self-reported behaviors that result high risk of exposure6. However, questionnaire may result biased outputs unless it is valid and reliable. The quality of survey questionnaire is mostly related to the validity and reliability of the data obtained from it. An instrument would be considered a good measure when it passes the tests of validity and reliability7. The purpose of this work is, therefore, to design valid and reliable questionnaire to assess behaviors and environmental conditions that result exposure of children to enteric infections in the rural northwest Ethiopia.

Validity is the degree to which the questionnaire measures what is intended to be measured. In the literature, several types of validity have been described8,9,10. In this work, we included only face validity and content validity. Face validity is established when experts on the research subject reviewing the questionnaire concludes that it measures the research question/s11,12. When an expert examines the items in a questionnaire and agrees that the test is a valid measure of the concept being measured, this is known as face validity13. Content validity is the degree to which the questionnaire fully assesses the research question/s and it is achieved by a rational analysis of the instrument by experts on the research subject13,14,15.

Reliability is the degree to which a questionnaire produces consistent results over time. It refers to the consistency of scores over time or between raters. A pilot test is usually used to determine the questionnaire's reliability. Test–retest reliability, alternate form reliability, and internal consistency reliability are the three major types of reliability that can be assessed16. In this work, we used internal consistency reliability to assess the reliability of our questionnaire.

Methods

We used a method described by Zamanzadeh V, et al.17 to design the questionnaire and to test its validity and internal consistency. Our work first describes the steps involved in the design of questionnaire and the procedures of testing validity and reliability of the instrument (Fig. 1).

Figure 1
figure 1

Steps of tool development and validation study.

Step I: Understanding the theoretical framework on the research problems and determine content domains

In this step, we began with a thorough exploration of relevant literature to understand the theoretical framework on the research problems and objectives to determine content domains and to identify major variables to highlight what the questionnaire is measuring. Figure 2 summarizes the content domains and variables that the questionnaire will measure.

Figure 2
figure 2

Content domains and variables that the questionnaire will measure.

Step II: Item generation

We generated items/questions in each domain that can effectively address the study objectives or research questions after developing a good understanding of the theoretical framework on the research problems and objectives through review of literature. Each item in the questionnaire was generated based on the content domains and variables summarized in Fig. 2.

Step III: Instrument construction

In this step, we refined and organized the items in a suitable format and sequence so that the finalized items are collected in a usable form. Research team members reviewed and approved the final preliminary version of the instrument.

Step IV: Forward and backward translation and synthesis

The questionnaire, which was written in English, was translated into Amharic, the local language. Two native Amharic speakers fluent in English independently completed the translation, which was then back-translated into English by two independent English language experts fluent in Amharic who were blinded to the English version. The back-translated versions were checked for discrepancies against the original version. The preliminary version was ready for face validity after discrepancies were corrected.

Step V: Face validity

After translation, we conducted a face validity study by involving 12 experts on the field who are working at the University of Gondar, Ethiopia (3 environmental health experts, 4 microbiologists, 3 pediatric nurses, and 2 nutritionists). All the experts had 10 or above years of experience. All the environmental health experts and microbiologists were PhD holders and the rest experts were second degree holder. We dispatched the Amharic version questionnaire attached with a conceptual framework and study objectives to these experts to review it critically with clear instruction and we arranged a panel discussion after a week to discuss on each item in the questionnaire line-by-line and to collect and analyze their quantitative and qualitative viewpoints on the relevancy or representativeness, clarity and comprehensiveness of the items. In the discussion, experts evaluated the questions whether they appropriately measure the research objectives or not. Experts also added some relevant questions that can answer the study objectives and removed some questions that have little contributions to the study objectives. Experts also judged the way the questions were organized and acceptability of the questionnaire by study participants18.

Step VI: Content validity

After the face validity, we dispatched the questionnaire developed in the face validity attached with the conceptual framework and study objectives to 35 experts on the research subject. The experts were selected based on the following criteria: (i) area of expertise (environmental health, microbiology, parasitology, epidemiology, nutrition, and pediatric nurse); (ii) year of experience (10 or above years of experience); (iii) level of education (second degree or above); and (iv) research experience (assistant professor or above). Experts had been told to critically review the questionnaire line-by-line by referring the study objectives and conceptual framework and to rate the degree to which the questionnaire fully assesses or measures the study objectives. We told experts to rate the relevance of items in the questionnaire as ‘not relevant’ (which is assign a score of 1), ‘somehow relevant’ (which is assign a score of 2), ‘quite relevant’ (which is assign a score of 3), and ‘highly relevant’ (which is assign a score of 4)17.

To select the most important and correct item in the instrument, we calculated a content validity ratio (CVR). The experts' scores were used to calculate the CVR for each questionnaire item using the Lawshe method19.

\(\mathrm{CVR}= \frac{{(\mathrm{N}}_{\mathrm{e}} -\mathrm{ N}/2)}{\mathrm{N}/2}\), where Ne is the number of experts indicating the item is essential and N is the total number of experts. If CVR is bigger than 0.49, the item in the instrument with an acceptable level of significance will be accepted19.

Item-level content validity index (I-CVIs) was used to determine the proportion of agreement on the relevancy or appropriateness of each item. I-CVI is computed as the number of experts giving a rate of 3 or 4 to the relevancy of each item, divided by the total number of experts17. After calculating I-CVI, judgment on each item is made as follows: If the I-CVI is higher than 79%, the item is appropriate. If it is between 70 and 79%, it needs revision. If it is less than 70%, it is eliminated20. To determine the proportion of total items judged content valid, we used Scale-level content validity index (S-CVIs) which we calculated using universal agreement approach (S-CVI/UA) among experts. We first dichotomized the scale in to ‘relevant’ by combining values 3 and 4 together and ‘not relevant’ by combining values 2 and 1 together and then, the number of items considered ‘relevant’ is divided by the total number of items21,22. For the S-CVI/UA, 80% agreement or higher among judges was considered22.

Moreover, CVI does not consider the possibility of inflated values because of the chance agreement; we used both CVI and multi-rater kappa statistic to adjust for chance agreement23,24. To calculate modified kappa statistics, the probability of chance agreement, was first calculated for each item by following formula17:

$${\mathrm{P}}_{\mathrm{c}}=[\mathrm{N}!/\mathrm{A}!(\mathrm{N}-\mathrm{ A})!]*0.{5}^{\mathrm{N}}$$

where N = number of experts in a panel and A = number of panelists who agree that the item is relevant. After calculating I-CVI for all instrument items, finally, kappa was computed by entering the numerical values of probability of chance agreement (PC) and content validity index of each item (I-CVI) in following formula24:

$$\mathrm{K}= \frac{\mathrm{I}\_\mathrm{CVI }- {\mathrm{P}}_{\mathrm{c}}}{1-{\mathrm{P}}_{\mathrm{c}}}$$

Kappa values above 0.74 are considered as excellent, between 0.60 and 0.74 as good, and between 0.40 and 0.59 are considered as fair25.

Step VII: Pre-testing

The instrument generated in the content validity study was pre-tested among 10 selected rural households having similar characteristics to the target population in which the instrument will be used to evaluate the instructions, response format and the items of the instrument for clarity and a pre-final version of the instrument was generated.

Step VIII: Internal consistency reliability test

A pilot study in 150 randomly selected rural households was undertaken using the pre-final version of the instrument to test the internal consistency reliability. The minimum sample size (i.e., 150) for the internal consistency reliability study was determined based on the recommendations in the literature26,27,28. The pilot study was conducted in the rural setting of the east Dembiya district of Ethiopia in December 2020. The east Dembiya district is one of the districts in central Gondar zone, the Amhara national regional state, Ethiopia. As of July 2020, the district had a total of 192,020 rural and 18,741 urban residents29, of these, 39,927 (12.22%) were children under age five-years30. In the district, coverage of clean water and latrine in 2017 were 26.6% and 55%, respectively and the households are not linked to municipal water and sewage system in the area. Moreover, during June 2017, intestinal parasitic infections and diarrheal disease were the top four and five prevalent, which accounted 5161 (9.97%) and 4981 (9.62%), respectively. In the area domestic animals and their feces are not properly contained or separated from the living environments31.

All households in the rural kebeles (the lowest administrative unit in Ethiopia) in the district were considered for sampling. First, we chose three rural kebeles at random out of 28 kebeles using a simple random sampling technique. We allocated equal number of households with children under the age of five-years to each kebele. Finally, 150 households were included in the study using a systematic random sampling technique.

Field data collectors interviewed the female head of the household to collect data using the pre-final version of the instrument. The collected data were entered to Epi Info version 7 and exported to Stata version 14 for analysis. We assessed reliability using agreement and consistency indices. Cronbach’s alpha was computed to assess the internal consistency of items32,33 and values of ≥ 0.70 were considered adequate16.

Ethics approval and consent to participate

Ethical clearance was obtained from the Institutional Review Board of the University of Gondar (reference number: V/P/RCS/05/1933/2020). There were no risks due to participation and the collected data were used only for this research purpose with complete confidentiality. Written informed consent was obtained from study participants. All the methods were carried out in accordance with relevant guidelines and regulations.

Consent for publication

This manuscript does not contain any individual person’s data.

Results

Identification of content domains and item generation

In the first step of instrument design, ten content domains including socio-demographic, health and sanitation messages, healthcare seeking behavior for childhood illnesses, personal hygiene, excreta management, water quality and safety measures, food hygiene and safety measures, housing conditions, infestation of vectors, and enteric infections were identified.

In the item generation step, 123 items were generated from these domains [10 from socio-demographic domain, 10 from health and sanitation messages domain, 11 from healthcare seeking behavior for childhood illnesses domain, 11 from personal hygiene domain, 17 from excreta management domain, 20 from water quality and safety measures domain, 18 from food hygiene and safety measures domain, 16 from housing conditions domain, 8 from enteric infections domain, and 2 from infestation of vectors domain]. We refined and organized all these items in a suitable format.

Face validity

In the face validity study, experts re-categorized the content domains in to eight and added some relevant questions in each domain and removed some questions from each domain. Accordingly, 80 items [8 from socio-demographic domain, 8 from health and sanitation message domain, 12 from personal hygiene domain, 12 from waste management domain, 15 from drinking water supply domain, 11 from food safety domain, 8 from housing condition domain, and 6 from enteric infection domain] were generated.

Content validity

We calculated CVR, I-CVI, S-CVI/UA, and modified kappa based on the formulas described in the method section. The CVR, I-CVI, and modified kappa values for the total items were 0.95, 0.97, and 0.97, respectively. Moreover the CVR, I-CVI, and modified kappa values for each item were greater than the cut values (0.49, 0.79, and 0.74, respectively), indicating that all the items generated in the face validity test are appropriate to measure the research objectives (Table 1).

Table 1 Instrument domains, total number of items, Item-level Content Validity Index, Modified Kappa, and interpretations in the content validity study [number of domains = 8, total number of items = 80, number of content experts = 35, cut point for CVR ≥ 0.49, cut point for I-CVI ≥ 0.79, and cut point for Modified Kappa ≥ 0.74].

The S-CVI/UA value for the total items was 0.95 and the values to each item were greater than the cut value, i.e., 0.80 (Table 2), which showed that the proportion of total items judged content valid is within the acceptable range or the agreement among judges is higher.

Table 2 Number of items considered relevant by content experts, Scale-level Content Validity Index, and interpretation [total number of items = 80, number of content experts = 35, and cut point ≥ 0.80].

In all cases, no item was eliminated in the content validity process. So, our instrument was prepared with 8 dimensions and 80 items for internal consistency reliability.

Internal consistency

A pilot survey was carried out among 150 rural households to measure the internal consistency reliability of the questionnaire. Table 3 shows information about the socio-demographic and WASH profile of the 150 households.

Table 3 Socio-demographic and WASH profile of the rural households participated in the pilot survey (n = 150) in the east Dembiya district, December 2020.

An internal consistency reliability analysis was carried out on a survey questionnaire on environmental exposures of children to enteric infections comprising 80 items. The Cronbach’s α was used to measure the internal consistency of the scale items. For the whole scale, Cronbach’s α was 0.85 and ranged between 0.79 and 0.85 (Table 4) for the eight dimensions, indicating the strong reliability of the tool. Therefore, the final version of the questionnaire was prepared with 8 dimensions and 80 items. The final English (Supplementary File 1) and Amharic (Supplementary File 2) versions are included as supplementary materials.

Table 4 Dimension descriptions and scale reliability.

Discussion

This study was conducted to design valid and reliable questionnaire to complement exposure assessment of children to enteric infections in the rural northwest Ethiopia. As presented in this paper, a questionnaire assessing behaviors that result exposure of children to enteric infections was developed with satisfactory validity and reliability. The 8-domains and 80 items adopted in this study are appropriate or relevant to capture behaviors that result exposure of children to enteric infections. The domains included in the final version of the tool were socio-demographic domain, health and sanitation messages domain, personal hygiene domain, waste management domain, drinking water supply domain, food safety domain, housing condition domain, and enteric infection domain. These domains, as represented by the respective items per domain, appeared to be important. The content domains included in the final version of the questionnaire are partly or fully used in other studies to collect data on self-reported behaviors or observational data on practices to enable the targeting of environmental media and locations where the study population is predominantly exposed to enteric infections. The SaniPath tool is the standard tool researchers commonly used to complement external assessment34,35,36 and some studies combined external assessment with behavioral observations to estimate actual ingestion (e.g., measuring pathogens in soil and frequency of geophagia or measuring fecal indicators deposited by flies when alighting on food and the number of fly landings). However, these methods rely heavily on assumptions about conditions and behaviors that vary significantly within and between individuals36. Designing valid and reliable data collection tool that consider the local contexts in which the study will be conducted is very useful. This tool will be, therefore, used in the rural settings of developing countries to measure behaviors that result high exposure to enteric infections.

The CVR, I-CVI, and modified kappa for the total items and for each item were high, indicating that the items are appropriate to measure the research objectives. The S-CVI was also high for the total items and for each item, indicating the agreement among judges to each item is higher. CVR is an empirical analysis, which measures the essentiality of an item. CVR varies between 1 and -1, and a higher score indicates greater agreement among panel members17. I-CVI and S-CVI are the most widely reported approach for content validity. Values of I-CVI range from 0 to 1 where I-CVI > 0.79, the item is relevant, between 0.70 and 0.79, the item needs revisions, and if the value is below 0.70 the item is eliminated17. Eighty percent or higher values for S-CVI/UA is considered for acceptable agreement among judges22. The multi-rater kappa statistic adjusts chance agreement, whereas I-CVI and S-CVI do not consider the possibility of inflated values because of the chance agreement. Thus, checking the Kappa values to each item is important in addition to CVR, I-CVI, and S-CVI. Kappa values above 0.74 are considered as excellent, between 0.60 and 0.74 as good, and between 0.40 and 0.59 are considered as fair25.

The Cronbach’s α for the total scale was high (0.85) and all items appeared to be worthy of retention, resulting in a decrease in the alpha if deleted. The reliability coefficient (alpha) can range from 0 to 1, with 0 representing a questionnaire that is not reliable and 1 representing absolutely reliable questionnaire. Cronbach’s α coefficients \(\ge\) 0.9 indicate excellent internal consistency, 0.8 > α ≥ 0.9 are good, 0.8 > α ≥ 0.7 are acceptable, 0.7 > α ≥ 0.6 are questionable, 0.6 > α ≥ 0.5 are poor, and lesser than 0.5 are unacceptable37.

Overall, the tool can be applicable to other areas or situations outside the northwest Ethiopian context which have similar characteristics with the study populations of the current study, such as rural settings in developing countries where the population has no access to improved WASH services or areas where the households are not linked to municipal water and sewage system. However, the generalizability of the tool to urban settings may be affected since access to WASH services in urban and rural settings are different.

Limitation of the study

We initially planned to conduct construct validity and test–retest reliability. However, we didn’t do these since the variables were not factorable for factor analysis to test construct validity and the score of some variables are not stable over time, for instance WASH behavior or practice questions in the second survey were affected by the scores in the first survey.

Conclusion

In this study, we designed valid and reliable questionnaire to assess behaviors and environmental conditions that result risk of exposure to enteric infections in rural settings. The items included in the questionnaire were found to be appropriate to assess individual behaviors and environmental conditions that result a high risk of exposure to enteric infections. The questionnaire can be used as a tool in the rural settings of developing countries with some amendments to account local contexts. However, this questionnaire alone does not measure exposure of children to enteric infections. It only complements external and internal exposure assessments. External exposure assessment (by identifying indicator organisms or specific pathogens in environmental samples using culture-dependent or culture-independent methods, molecular methods such as polymerase chain reaction (PCR) based assays, metagenomics to sequence and analyze all DNA in environmental samples, and biosensors) and internal exposure assessment, i.e., measuring enteropathogens in humans (using microscopy, enzyme-linked immunosorbent assays, PCR based assays, metagenomics, and pathogen-specific immunoassays) should be done to completely measure exposures to enteric infections as discussed by Goddard et al.36.