Introduction

The aetiology of nasopharyngeal carcinoma (NPC) remains largely unclear. Multiple factors are implicated, including Epstein-Barr virus (EBV) infection, genetic susceptibility and environmental factors1. EBV infection is ubiquitous worldwide while NPC remains a rare cancer in most parts of the world2. Genetic predisposition is clearly associated with NPC, but the decline in risks for NPC observed in subsequent generations of Chinese migrants3,4,5,6 strongly suggests that the role of environmental exposure is crucial. The most commonly used method to measure environmental exposure is through self-reports in questionnaires and interviews.

NPC incidence in Southern China increases with age, peaks at age 40–59 and decreases thereafter7, highlighting the importance of early life exposure. Indeed, previous case-control studies observed that salted fish consumption during childhood seems more strongly related to NPC risk than adulthood exposure8,9,10,11,12,13,14. To understand the role of early life exposure, the ideal would be a population-based birth cohort study with repeated measures of the exposure and follow-up of the subjects over a few decades for NPC outcomes. Given that such cohort studies are not feasible, retrospective case-control studies with high-quality data on early life exposure remain essential for research on NPC and other rare cancers. Nevertheless, information on early childhood exposure collected by interview and questionnaire relies on adult recall and is subject to recall errors (random and systematic). Studies of reliability can inform researchers about such errors. The reliability of a survey instrument is most commonly assessed using the test-retest method, comparing responses given in the second survey with those in the first15. However, the reliability of questionnaires used in case-control studies has rarely been reported, and we found no evidence of test-retest reliability of NPC questionnaires. Such evidence is especially important for early life exposure, which would have greater error than adulthood exposure.

We evaluated the reliability of early life NPC aetiology factors in the questionnaire of an NPC case-control study in Hong Kong.

Results

Seventy subjects in Queen Mary Hospital (QMH, 69% of NPC cases and 80% of non-NPC hospital controls of all eligible subjects), and 70 in Queen Elizabeth Hospital (QEH, 51% and 58%) were included (Supplementary Fig. s1), with an average of 31.4 (standard deviation: 18.7) weeks and 29.5 (standard deviation: 19.0) weeks between the two interviews, respectively.

Figure 1 shows that for body figures and hand skin tone, similar weighted Kappa coefficients across life periods were observed. In all life periods (age 6–12, 13–18 and 19–30, and 10 years ago), moderate-to-almost perfect reliability coefficients for body figures were found (0.70, 0.69, 0.56 and 0.63 for males, and 0.86, 0.66, 0.76 and 0.65 for females, respectively). For hand skin tone, moderate reliability coefficients were observed in all life periods (0.56, 0.51, 0.50 and 0.53, respectively).

Figure 1 Coefficients (95% confidence interval) for body figure, skin tone and sun exposure at different life periods.

However, for sun exposure, reliability coefficients differed by life periods (Fig. 1). The reliability coefficient for duration of sun exposure 10 years ago was 0.89, which was higher than those in earlier periods (age 19–30: 0.46, age 13–18: 0.55 and age 6–12: 0.32) (Fisher Z transformation p < 0.01 with Bonferroni correction). For “actively avoid sun exposure” and “protection from sunshine” (never/ever), moderate reliability coefficients were found for those 10 years ago (0.48 and 0.52). However, the coefficients were lower for age 19–30 (0.27 and 0.42), age 13–18 (0.27 and 0.26) and age 6–12 (0.31 and 0.30).

Table 1 shows moderate-to-substantial reliability coefficients (0.4–0.8) for most food frequency questionnaire (FFQ) items at age 6–12 and 13–18, and the coefficients were generally higher for frequency than portion size items. The reliability coefficients of FFQ at age 6–12 and 13–18 were similar (Table 2). For salted fish consumption, most reliability coefficients were substantial-to-almost perfect (0.6–1.0) in both periods.

Table 1 Reliability coefficients (95% confidence intervals: 95% CI) for food frequency questionnaire in early-life periods, including late childhood (6–12 years) and adolescence (13–18 years).
Table 2 Reliability coefficients (95% confidence intervals: 95% CI) for food items that were measured in both periods (late childhood, 6–12 years and adolescence, 13–18 years).

For other items in all subjects, almost perfect (0.8–1.0) reliability coefficients were observed for the number and sex of siblings and offspring, cancer history, family history of cancer and NPC, smoking and quitting status. Although some FFQ items at age 19–30 and 10 years ago showed poor (0–0.2) coefficients, most factors in the questionnaire had fair-to-substantial reliability coefficients (Supplementary Table s2). Subgroup analyses showed similar reliability coefficients in different strata except for a few items (7 by cases/controls, 14 by sex, 11 by interval between 1st and 2nd questionnaire, 2 by education, and 0 by age at the first questionnaire) (Supplementary Tables s2, s3 and s4).

Discussion

This test-retest study is the first to report the reliability of a computer-assisted, self-administered questionnaire in an NPC case-control study. We found moderate-to-almost perfect (0.4–1.0) reliability of the data on most NPC aetiology factors in early life periods (Tables 1 and 2, and Fig. 1), and most exposures and factors in the questionnaire had fair-to-substantial (0.2–0.8) reliability (Supplementary Table s2). While the reliability coefficients of most questionnaire items were similar across life periods, the reliability coefficients for indicators of sun exposure were higher 10 years ago than in earlier periods (age 19–30, 13–18 and 6–12). Indeed, previous studies on skin cancer found that sun exposure during childhood is generally remembered less precisely20. Nevertheless, at least fair (0.2–0.4) reliability of sun exposure in earlier periods was found in our analysis, suggesting that the data are still useful but with limitations.

As some of our questionnaire items were adopted from those in previous NPC case-control studies, our results also suggest that previous case-control study results on the same aetiology factors in similar settings would have similar reliability.

Despite decades of research to find the causal factors of NPC, few studies have reported the reliability of their exposure data21. Our results are consistent with previous studies on other diseases in the reliability of different exposures, including food/dietary supplements in the JPHC Study cohort II22, body figure in colorectal cancer patients23 and sun exposure in melanoma patients24. Moderate-to-substantial (0.4–0.8) reliability coefficients were found in most FFQ items at age 6–12 and 13–18, and higher reliability coefficients were observed for frequency than portion size, both of which are consistent with previous test-retest reliability studies on FFQ15.

The test-retest reliability of a questionnaire is essential for assessing the validity of epidemiological studies but is seldom reported. Such results are particularly important when exposure data are largely based on recall, as in almost all case-control and many cohort studies, especially for rare cancers such as NPC. Evidence of test-retest reliability of NPC questionnaires has been very limited. We took NPC as an example to reaffirm that assessing recall errors is fundamental when using epidemiological data, and that reporting the reliability of a questionnaire is important, especially for early life exposure. We also share our questionnaires online for future reference (Supplementary questionnaires).

Our present study included 140 subjects within an NPC case-control study in Hong Kong and reports the coefficients of every questionnaire item for all subjects and for five subgroups, making it the largest and most comprehensive test-retest reliability analysis on NPC case-control studies in the literature. A computer-assisted questionnaire was used to minimise errors from the interviewer, especially subjective bias. We allowed sufficient time (at least 2 weeks after the first questionnaire: average of 30.5 [standard deviation = 18.8] weeks) between interviews to prevent subjects from simply recalling their responses in the first interview. Moreover, differential recall by cases/controls, sex, time between 1st and 2nd questionnaire (2–29/≥30 weeks), education (secondary or less/postsecondary) or age (25–44/45–59/60+ years) at the first questionnaire was unlikely to be large because most of the reliability coefficients were similar.

A limitation of our reliability study was the relatively low response rate (62.8%), which might affect the representativeness of our findings. Such a problem is common in test-retest reliability studies25. However, we found that the respondents had basic characteristics similar to those of all subjects, suggesting that any non-response bias should be small. Secondly, patients were recruited from only two of the five hospitals, but substantial differences in reliability between subjects from different hospitals were unlikely. However, caution is needed in applying the results to other settings. Because only those who agreed to be re-interviewed were included for assessing test-retest reliability, our results might not be applicable to non-respondents. Thirdly, although our present study is the largest test-retest reliability analysis on NPC case-control studies, numbers were small for some rare exposures, including vitamin A supplements (no subject reported ever use), vitamin D supplements (only 1 reported ever use) and alcohol intake (only 1 reported drinking beer using the one-pint glass). Increasing the sample size and revising the questions on rare exposures are recommended. Fourthly, because the short version (any changes) of the FFQ was used at age 19–30 and 10 years ago, the present analysis included only the FFQ at age 6–12 and 13–18. This made it difficult to compare results on FFQ among different life periods. We also found poor reliability coefficients for FFQ items at age 19–30 and 10 years ago. We recommend that future NPC studies use the standard version of the FFQ15 for all periods of exposure.

Conclusions

This study has shown that the questionnaire data on most NPC aetiology factors in an NPC case-control study in Hong Kong have acceptable reliability. Test-retest reliability studies for case-control studies of early life exposure in NPC and other rare cancers are warranted, and the results should be taken into account during data analysis. We recommend that reliability test results be included when the main results of case-control studies are published, particularly for rare cancers, and that the questionnaires be made available for future reference.

Methods

This test-retest reliability study was a supplementary part of a case-control study in Hong Kong, the main study, on the life-course determinants of NPC16. Briefly, eligible subjects for the main study were Chinese aged 18+ in 5 major regional hospitals (QMH, QEH, Pamela Youde Nethersole Eastern Hospital, Princess Margaret Hospital and Tuen Mun Hospital, which treat up to 70% of all NPC cases in Hong Kong) during 2014–2017 without the following conditions: chronic kidney failure, liver cirrhosis, serious heart disease, autoimmune diseases, pregnancy, thyroid disorder, previous thyroid/parathyroid removal surgery, dementia, frailty and cognitive impairment. Cases were incident NPC patients diagnosed with histological and/or radiological evidence in the past 2 months in the Department of Clinical Oncology in the hospitals. Controls were new patients or referrals of a new health complaint in the past 12 months in specialist outpatient clinics, or new inpatients admitted in the past 3 months in the same hospitals who were frequency-matched by age (5-year age groups) and sex to the cases. The controls were selected from patients who attended the clinics or were admitted to the hospitals with a wide range of medical diseases unrelated to NPC. Those who had been screened positive for NPC-related symptoms such as recent facial nerve palsy, tinnitus, unilateral hearing loss, and epistaxis were excluded. With reference to some questionnaires in previous NPC case-control studies and permission from corresponding authors, we designed a computer-assisted, self-administered questionnaire with 285 questions. A life-course milestone approach was used to collect information on socio-demographics, family cancer history, diet17, sun exposure, smoking18 and drinking history, and occupational, household and other factors.

For the test-retest reliability study, eligible subjects were those who verbally agreed to join; they were invited to be interviewed again at least 2 weeks after completing the first questionnaire, during January to December 2016, in the two hospitals (QMH and QEH) contributing the largest number of subjects. The present analysis included 140 subjects with a response rate of 62.8% (Supplementary Fig. s1). Respondents and all subjects had similar basic characteristics (Supplementary Table s1). Test-retest reliability was assessed by calculating Kappa (κ), weighted Kappa (κw) and intra-class correlation coefficients (ICCs) for categorical, ordinal and continuous variables, respectively. Reliability coefficients were interpreted according to guidelines from Landis and Koch (0 to <0.2, poor; 0.2 to <0.4, fair; 0.4 to <0.6, moderate; 0.6 to <0.8, substantial; and 0.8 to 1.0, almost perfect)19. All analyses were conducted using R 3.3.1.
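To illustrate the agreement statistics named above, the following is a minimal Python sketch rather than the analysis code of the study, which was run in R 3.3.1. It computes Cohen's Kappa and a weighted Kappa from paired first and second responses and maps a coefficient to the interpretation bands quoted above. The function names are hypothetical. The choice of linear disagreement weights, the treatment of exactly 0.2 as "fair" and the labelling of negative values as "poor" are assumptions, since the text does not specify them. ICCs are not covered.

```python
def cohen_kappa(first, second, categories=None, weights=None):
    """Kappa for paired responses from two interviews.

    weights=None gives unweighted Kappa; weights="linear" gives a weighted
    Kappa with linear disagreement weights (an assumption of this sketch).
    `categories` fixes the ordinal order; by default the sorted observed values.
    """
    if len(first) != len(second) or len(first) == 0:
        raise ValueError("need two non-empty response lists of equal length")
    cats = list(categories) if categories is not None else sorted(set(first) | set(second))
    k = len(cats)
    if k < 2:
        raise ValueError("Kappa is undefined with fewer than two categories")
    index = {c: i for i, c in enumerate(cats)}
    n = len(first)

    # Observed joint proportions: rows = first interview, columns = second.
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(first, second):
        obs[index[a]][index[b]] += 1.0 / n
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        if weights is None:
            return 0.0 if i == j else 1.0
        if weights == "linear":
            return abs(i - j) / (k - 1)
        raise ValueError("unsupported weights: %r" % (weights,))

    observed = sum(disagreement(i, j) * obs[i][j] for i in range(k) for j in range(k))
    expected = sum(disagreement(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if expected == 0:
        raise ValueError("Kappa is undefined when no disagreement is expected by chance")
    return 1.0 - observed / expected


def landis_koch(coefficient):
    """Label a coefficient using the bands quoted in the text."""
    if coefficient < 0.2:
        return "poor"  # negative values are also labelled poor (assumption)
    if coefficient < 0.4:
        return "fair"
    if coefficient < 0.6:
        return "moderate"
    if coefficient < 0.8:
        return "substantial"
    return "almost perfect"


if __name__ == "__main__":
    # Made-up never/ever answers from two interviews (not study data).
    t1 = ["ever"] * 25 + ["never"] * 25
    t2 = ["ever"] * 20 + ["never"] * 5 + ["ever"] * 10 + ["never"] * 15
    kappa = cohen_kappa(t1, t2)
    print(round(kappa, 2), landis_koch(kappa))
```

For an ordinal item such as body figure, passing `categories` in their natural order and `weights="linear"` penalises a one-step disagreement less than a disagreement across the whole scale.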

The present analysis focused on dietary patterns in early life periods, including late childhood (age 6–12) and adolescence (13–18), and on body figure and sunlight exposure in two adulthood periods (age 19–30, and 10 years ago). All items in the questionnaire are shown in Supplementary Table s2. Subgroup analyses were conducted stratified by (1) cases/controls, (2) sex, (3) time between 1st and 2nd questionnaire (2–29/≥30 weeks), (4) education (secondary or less/postsecondary) and (5) age (25–44/45–59/60+ years) at the first questionnaire. The differences between/among coefficients were (1) defined by a difference between/among coefficients of >0.30, and (2) tested by Fisher Z transformation (p < 0.01) with Bonferroni correction to adjust for multiple testing.
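The comparison rule described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the study's R code: it uses the common two-sample form of the Fisher Z test, which treats the two coefficients as coming from independent samples, and the text does not say which form was used. The function names and the sample sizes in the usage example are hypothetical.

```python
import math


def fisher_z_test(r1, n1, r2, n2):
    """Two-sided test of the difference between two coefficients.

    Assumes independent samples (an assumption of this sketch).
    Returns the z statistic and the unadjusted p-value.
    """
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each sample size must exceed 3")
    if not (-1.0 < r1 < 1.0 and -1.0 < r2 < 1.0):
        raise ValueError("coefficients must lie strictly between -1 and 1")
    # Fisher transformation: z = atanh(r), with variance about 1 / (n - 3).
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def coefficients_differ(r1, n1, r2, n2, n_tests, min_gap=0.30, alpha=0.01):
    """Apply both criteria from the text: gap > 0.30 and adjusted p < 0.01."""
    _, p = fisher_z_test(r1, n1, r2, n2)
    adjusted = min(1.0, p * n_tests)
    return abs(r1 - r2) > min_gap and adjusted < alpha


if __name__ == "__main__":
    # Coefficients are taken from the sun exposure results; n = 140 per period
    # and 3 comparisons are hypothetical values chosen for illustration.
    print(coefficients_differ(0.89, 140, 0.32, 140, n_tests=3))
```

Because the same subjects answered for every life period, a test for dependent coefficients may be more appropriate for comparisons across periods; the independent-sample form is shown only because it is the simplest instance of the technique.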

All participants gave written, informed consent before participation. The Institutional Review Board of the University of Hong Kong/Hospital Authority Hong Kong West Cluster (UW 11–192) and the Research Ethics Committee of the Hospital Authority Kowloon Central/Kowloon East (KC/KE-13-0115/ER-2) approved the study protocol, and all methods were carried out in accordance with relevant guidelines and regulations.