Translation and validation of the EORTC QLQ-BR45 among Ethiopian breast cancer patients

This study aimed to examine the validity and reliability of the EORTC QLQ-BR45 questionnaire among breast cancer patients in Ethiopia. This study included 248 breast cancer patients who completed the QLQ-BR45 and QLQ-C30 questionnaires. The internal reliability, test–retest reliability, and the content, concurrent, convergent, divergent, and clinical validity of the tool were examined. The statistical analyses included Cronbach’s α coefficient, Pearson’s correlation coefficient, standardised root mean square residual (SRMR), comparative fit index (CFI), t-test, and root mean square error of approximation (RMSEA). All items were marked as relevant, and item-level content validity index (I-CVI) scores ranged from 0.83 to 1. The S-CVI/Ave was calculated by dividing the sum of I-CVI values by the total number of items, which was found to be 0.94. The average CVR value was 0.76. The Cronbach’s α coefficient was 0.80 for all domains. All subscales met the minimal standards of reliability except the arm symptom scale (0.66). The test–retest reliability coefficient was 0.77 for all domains. Seven out of the 12 hypothesised scales showed positive correlations (r > 0.40) between the QLQ-BR45 and QLQ-C30 scales. Multitrait scaling analysis showed that the item-scale correlations exceeded the 0.40 criterion for item-convergent validity for 11 of the 12 hypothesised scales. The correlation coefficients between an item and its own subscale were significantly higher than with other subscales. The EORTC QLQ-BR45 had good reliability and validity, and it can be used to measure the quality of life of breast cancer patients in Ethiopia.

and above who were receiving or had previously received curative or palliative treatment, who had no previous primary or recurrent tumour, and who could understand and speak the Amharic language were invited to participate in the study. Patients were excluded from the study if they had a history of mental illness or cognitive impairment, if they were not willing to participate, or if they had any other severe medical illnesses, coexisting malignancies, or other metastatic disease.
Sample. The minimum sample size recommendations for validation studies range from 100 to 400 participants or more 11,12 . According to the EORTC, the sample size is determined by the number of items in the questionnaire. The sample was calculated according to the EORTC guidelines and the recommendations for multivariate psychometric analysis, which concluded that the sample size needed to be five to ten times the number of items 13 . Therefore, the calculated sample size was 248.
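As a quick check of the five-to-ten-times rule described above, the following minimal sketch computes the implied sample size range. It assumes a 45-item questionnaire (the item count of the QLQ-BR45); the function name is illustrative and not part of any EORTC tool.

```python
def sample_size_range(n_items, low_ratio=5, high_ratio=10):
    """Return the (minimum, maximum) sample size implied by a
    respondents-per-item rule of thumb."""
    return n_items * low_ratio, n_items * high_ratio


# Assumption: 45 items, as in the QLQ-BR45 module.
minimum, maximum = sample_size_range(45)
print(minimum, maximum)            # 225 450
print(minimum <= 248 <= maximum)   # True: the reported sample lies in the range
```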
Instruments. EORTC QLQ-C30. The EORTC QLQ-C30 is a core questionnaire that assesses the quality of life of cancer patients. It consists of a 30-item questionnaire composed of five functional scales, three symptom scales, and a global health and quality-of-life scale. The other single item symptoms include dyspnea, loss of appetite, sleep disturbance, constipation, diarrhoea, and financial difficulties. The questionnaire was translated into Amharic, the official language of Ethiopia, and validated to assess the quality of life of Ethiopian cancer patients 14 .
EORTC QLQ-BR45. EORTC QLQ-BR45 is a specific breast cancer module that is used in combination with the EORTC QLQ-C30 core questionnaire. The EORTC QLG updated the previous breast cancer-specific module to EORTC QLQ-BR45. The updated version incorporates an additional 22 items, including a target symptom scale and a satisfaction scale. These new items include two multi-item scales: target symptom scale (20 items) and satisfaction scale (two items). The target symptom scale can be further divided into three subscales: endocrine therapy scale, endocrine sexual scale, and skin/mucosa scale 9 . A formal permission letter was obtained from the authors 9,15 .
The item scoring procedure for the EORTC QLQ-C30 and the EORTC QLQ-BR45 followed the EORTC QLQ-C30 scoring manual. After scoring, each score was linearly transformed to a 0-100 scale. A high score on a functional scale indicates a high level of functioning, while a high score on a symptom scale indicates a high level of symptoms 16 .
www.nature.com/scientificreports/
Translation procedure. The translation of the EORTC QLQ-BR45 followed the EORTC QLG translation procedure 17,18 . Two oncologists (the forward translators) were given the original English version and translated it into Amharic independently of one another. The two translations were reconciled by the principal investigator and then translated back into English by another oncology physician and a nurse, working independently. These two back-translators were given the reconciled translation and were blinded to the original English version. The preliminary translation was reviewed by a professional proofreader, who checked the equivalence between the original English questionnaire and the preliminary translation. The interim analysis was prepared after the translation unit members, the proofreader, and the principal investigator reached an agreement on the preliminary translation.
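To make the scoring step described earlier concrete, the sketch below applies the standard EORTC linear transformation: the raw score is the mean of the items in a scale, and it is rescaled to 0-100 using the item range (3 for items answered on a 1-4 scale). Function names are illustrative, the responses are invented, and the handling of missing items prescribed by the scoring manual is omitted.

```python
from statistics import mean


def raw_score(item_responses):
    """Raw score: the mean of the item responses in one scale."""
    return mean(item_responses)


def symptom_score(item_responses, item_range=3):
    """Symptom scale: a higher score means more symptoms."""
    return (raw_score(item_responses) - 1) / item_range * 100


def functional_score(item_responses, item_range=3):
    """Functional scale: a higher score means better functioning."""
    return (1 - (raw_score(item_responses) - 1) / item_range) * 100


# Hypothetical responses of one patient to a four-item scale (1-4 coding).
responses = [1, 2, 2, 3]
print(round(symptom_score(responses), 1))     # 33.3
print(round(functional_score(responses), 1))  # 66.7
```

The two transformations mirror each other, so the same set of responses yields complementary scores depending on whether the scale is treated as a symptom or a functional scale.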
Pilot testing. The translated questionnaire was pilot tested on 10 female breast cancer patients. The principal investigator discussed with the participants whether the translation was difficult to understand, difficult to answer, upsetting/offensive, or confusing. The comments suggested by the participants were back-translated into English. After the pilot testing was successfully completed, the translation unit sent the final translation to the principal investigator for approval and use.
Ethical approval. Ethical approval was obtained before conducting the study. Participants' information was kept confidential. There was no risk associated with participating in the study.
Statistical analysis. The Amharic version of QLQ-BR45 questionnaire was evaluated for its internal reliability, test-retest reliability, content validity, concurrent validity, convergent validity, divergent validity, and known-group validity. Content validity was evaluated by a panel of six experts from September to November 2020 19 , including a professor of public health, an oncology nurse, an assistant professor in clinical pharmacology, and a PhD candidate in pharmacology with experience in the validation of QoL instruments. These experts were chosen based on their clinical and research experience.
The content validity index was evaluated at two levels, namely item-level content validity index (I-CVI) and scale-level content validity index (S-CVI), based on expert review. The S-CVI has two extensions: universal agreement (S-CVI/UA) and average (S-CVI/Ave). Use of the S-CVI/Ave is recommended, and acceptable values of S-CVI/Ave are 0.90 or higher 20 . An I-CVI value of 0.78 or higher is considered excellent 21,22 .
The content validity ratio (CVR) ranges from −1 to 1. Higher scores indicate greater agreement among panellists on the necessity of an item in an instrument. The closer the CVR is to 1, the more essential the item is considered to be. The formula for the CVR is CVR = (Ne − N/2)/(N/2), where Ne is the number of panel members who rated the item "essential" and N is the total number of panellists. The minimum acceptable CVR value was determined using the Lawshe table 23 .
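The CVR formula can be illustrated with a short sketch. It assumes a panel of six experts, as in this study; the function name is illustrative.

```python
def content_validity_ratio(n_essential, n_panellists):
    """Lawshe's CVR = (Ne - N/2) / (N/2)."""
    half = n_panellists / 2
    return (n_essential - half) / half


# With six panellists, the CVR depends only on how many rate the item "essential".
for n_essential in (6, 5, 4, 3):
    print(n_essential, round(content_validity_ratio(n_essential, 6), 2))
# 6 1.0
# 5 0.67
# 4 0.33
# 3 0.0
```

With six panellists the ratio can therefore only take a small set of values, which is why item-level CVRs cluster at 1.00, 0.67, and 0.33.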
Concurrent validity means the agreement with the true value. The new questionnaire was compared to well-established instruments that already have an estimated validity. The concurrent validity is considered to be high if the agreement or correlation between the EORTC QLQ-BR45 and EORTC QLQ-C30 is high 12 .
Convergent validity is defined as a Pearson correlation coefficient between an item and its own scale (item-scale correlation) of r ≥ 0.40, while divergent validity is indicated when the correlation of an item with its own scale is significantly higher than its correlation with any other scale 15 .
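A minimal sketch of the two multitrait scaling checks follows. It assumes that the item-scale correlation is corrected for overlap by removing the item from its own scale total, and it compares correlation magnitudes only, without the significance test mentioned above. All data and function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def corrected_item_scale_r(scale_items, index):
    """Correlate one item with the sum of the other items of its own scale."""
    others = [col for j, col in enumerate(scale_items) if j != index]
    rest_total = [sum(values) for values in zip(*others)]
    return pearson(scale_items[index], rest_total)


def item_other_scale_r(item, other_scale_items):
    """Correlate one item with the total of a different scale."""
    total = [sum(values) for values in zip(*other_scale_items)]
    return pearson(item, total)


# Hypothetical responses of five patients; each inner list is one item.
scale_a = [[1, 2, 3, 4, 2], [1, 2, 4, 4, 2], [2, 2, 3, 4, 1]]
scale_b = [[4, 1, 2, 1, 3], [3, 1, 2, 2, 4]]

own = corrected_item_scale_r(scale_a, 0)
other = item_other_scale_r(scale_a[0], scale_b)
print(own >= 0.40)   # True: convergent criterion met for this item
print(own > other)   # True: the item correlates more with its own scale
```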
Confirmatory factor analysis was used to test whether the correlation corresponds to the hypothesised scale structure. This method tests whether the hypothesised relationship between observed variables and their underlying latent dimensions is confirmed. The comparative fit index (CFI) is equal to the discrepancy function adjusted for sample size. The CFI ranges from 0 to 1, with larger values indicating a better model fit. An acceptable model fit is indicated by CFI values of 0.90 or greater. The root mean square error of approximation (RMSEA) is related to the residual in the model. RMSEA values range from 0 to 1, with an RMSEA value of 0.06 or less considered an acceptable model fit 24 .
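The two fit indices can be written in terms of the model and baseline chi-square statistics. The sketch below uses the commonly cited definitions (some software divides by N rather than N − 1 in the RMSEA). The chi-square values are invented for illustration and are not results of this study.

```python
from math import sqrt


def rmsea(chi2, df, n):
    """Root mean square error of approximation for a sample of size n."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative fit index: misfit of the hypothesised model relative to
    the baseline (independence) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model, 0.0)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


# Hypothetical values: model chi2 = 150 (df = 100), baseline chi2 = 1100 (df = 120).
print(round(rmsea(150, 100, 248), 3))       # 0.045
print(round(cfi(150, 100, 1100, 120), 3))   # 0.949
```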
Known-group comparisons were performed to evaluate how well scales can discriminate between participants enrolled in different groups, according to their age, residence, disease stage, and treatment modalities 25 . This psychometric property is also called sensitivity.
The internal consistency of the multi-item scale was assessed by Cronbach's α coefficient 26 . As recommended, a Cronbach's α coefficient of 0.70 or greater is acceptable 12,26 , while values exceeding 0.80 are considered good 27 . A subgroup of follow-up patients with no change in health status (stable health status) was invited to complete the QLQ-C30 and QLQ-BR45 a second time one to two weeks later for the test-retest analysis. This analysis was used to test the consistency of the module based on a repeatable score at a different time 27 . Thirty patients participated in the test-retest analysis one to two weeks after the first assessment.
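For reference, Cronbach's α for a multi-item scale can be computed from the item variances and the variance of the scale total. The sketch below uses the standard formula with sample variances; the responses are hypothetical and the function name is illustrative.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of responses per item,
    with respondents in the same order in every list."""
    k = len(items)
    totals = [sum(values) for values in zip(*items)]
    sum_item_variances = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))


# Hypothetical responses of five patients to a three-item scale.
scale = [[1, 2, 3, 4, 2], [1, 2, 4, 4, 2], [2, 2, 3, 4, 1]]
alpha = cronbach_alpha(scale)
print(round(alpha, 2))   # 0.94
print(alpha >= 0.70)     # True: above the acceptability threshold
```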
Reliability. The mean and standard deviation of each subscale/item, Cronbach's α coefficient, and test-retest reliability coefficients (intraclass correlation (ICC) and correlation coefficient r) of all domains are presented in Table 2. The Cronbach's α coefficient of the Amharic version of the EORTC QLQ-BR45 was 0.80. All of the domains had an acceptable internal consistency value greater than 0.7, except for arm symptoms (0.66). The test-retest reliability coefficient was 0.768 for all domains. The test-retest reliability coefficients of most domains were less than 0.70, except for BRST (0.73), BRBS (0.75), and BRET (0.72). The ICC was similar to the correlation coefficient r, indicating no significant drift in the mean response for all domains.

Content validity.
Item-level content validity index. The I-CVI is computed as the number of experts giving a rating of 3 or 4 (quite or highly relevant) to the relevance of each item, divided by the total number of experts. The I-CVI for the relevance of each item was greater than 0.78 21,22 . All items were marked as relevant, and the I-CVI score ranged from 0.83 to 1. Thirty items had an I-CVI score of 1.00 and 15 had a score of 0.83.
Scale-level content validity index/average. The S-CVI/Ave was calculated by dividing the sum of I-CVI values by the total number of items, which was found to be 0.94.
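The I-CVI and S-CVI/Ave calculations can be reproduced with a short sketch. The item counts (30 items with an I-CVI of 1.00 and 15 items with 0.83) are taken from the results above; the individual expert ratings are hypothetical and were chosen only to produce those I-CVI values with six experts.

```python
def item_cvi(ratings):
    """I-CVI: share of experts rating the item 3 or 4 on the relevance scale."""
    return sum(1 for r in ratings if r >= 3) / len(ratings)


def scale_cvi_ave(item_cvis):
    """S-CVI/Ave: mean of the I-CVI values over all items."""
    return sum(item_cvis) / len(item_cvis)


# Hypothetical ratings from six experts.
all_relevant = item_cvi([4, 4, 3, 4, 3, 4])   # 6 of 6 experts -> 1.00
one_dissent = item_cvi([4, 4, 3, 4, 3, 2])    # 5 of 6 experts -> 0.83

i_cvis = [all_relevant] * 30 + [one_dissent] * 15
print(round(one_dissent, 2))             # 0.83
print(round(scale_cvi_ave(i_cvis), 2))   # 0.94
```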
Content validity ratio. The CVR was generated for each item. According to the Lawshe table 23 , the minimum CVR value for a total number of six panellists was 0.99. Sixteen items had a CVR value of 1.00, 27 items had a score of 0.67, and two items had a score of 0.33. The average CVR value was 0.76.
Clarity. Clarity was assessed by the six panellists on a 3-point Likert scale (1, not clear; 2, somewhat clear; 3, very clear). The average clarity scores for individual items ranged from 2.5 to 3, with 32 (71.1%) items considered very clear. Overall, 32 items had an average clarity score of 3.00, six items had a score of 2.83, four items had a score of 2.67, and two had a score of 2.5.

Construct validity.
Multitrait scaling analysis. Item scale correlation. Table 3 shows the item-scale correlation of the EORTC QLQ-BR45. Item-scale correlations (corrected for overlap) exceeded the 0.40 criterion for item-convergent validity for 11 of the 12 hypothesised scales, with the exception of item 38.
The correlation coefficients between an item and its own subscale were significantly higher than those with other subscales. Item convergence and discrimination were noted in 97.8% and 88.7% of QoL scales, respectively. The most obvious scaling failure corresponded to systemic therapy side effects. The small number of scaling errors provided strong support for the hypothesised scale structure of the EORTC QLQ-BR45.
Interscale correlations. Table 4 presents the correlations among the 12 scales of the QLQ-BR45. All correlation coefficients ranged from 0.001 to 0.735. A strong correlation coefficient (r = 0.735) was found between the sexual enjoyment and sexual functioning scales.
The endocrine therapy scale and systemic therapy side effect scale were strongly correlated with most of the other subscales. The endocrine therapy scale had a strong correlation with systemic therapy side effects (r = 0.61), breast symptoms (r = 0.45), arm symptoms (r = 0.48), and body image (r = 0.40). The systemic therapy side effect scale was correlated with the breast symptom (r = 0.43), hair loss (r = 0.46), body image (r = 0.47), endocrine therapy (r = 0.61), and skin/mucosa (r = 0.48) scales.
Criterion validity. The Pearson's correlation coefficients between the domain scores of the two instruments (QLQ-BR45 and QLQ-C30) are presented in Table 5. Seven out of the 12 hypothesised scales showed correlations (r > 0.40) between the QLQ-BR45 and QLQ-C30 scales. The correlations were higher between the same and similar domains than between different and non-similar domains. For example, the systemic therapy side effect scale was strongly correlated with the conceptually related QLQ-C30 fatigue, nausea and vomiting, and pain scales, with correlation coefficients greater than 0.5. The breast symptom and arm symptom scales also had strong correlations with pain (r > 0.5). All hypothesised correlations were statistically significant (p < 0.01). The QLQ-BR45 scales showed comparatively low correlations (r < 0.40) with QLQ-C30 scales in 144 out of 180 comparisons (80%). The QLQ-BR45 endocrine therapy scale showed strong correlations with the QLQ-C30 role functioning (r = 0.45), emotional functioning (r = 0.40), cognitive functioning (r = 0.44), fatigue (r = 0.53), and pain (r = 0.51) scales.
sexual enjoyment (p < 0.001), body image (p = 0.003), and future perspectives (p = 0.009) and lower symptom scores (upset by hair loss, p = 0.031) than participants aged under 45 years. The endocrine sexual-related function (p = 0.031) was worse in patients residing in rural areas than those in urban areas.

Confirmatory factor analysis. Model assumptions.
To verify the stability and rationality of the QLQ-BR45, the hypothesised scale structure was assumed to be a good model. However, the fit of the hypothesised scale structure was poor according to the CFA model fit indicators (CFI = 0.75, RMSEA = 0.08, SRMR = 0.09). These results indicate that the hypothesised scale structure did not fit the data well.
Modification and model fit. According to the modification indices, seven covariance correlations were added to the model, and each covariance correlation was between the residuals of different items in the same dimension, which supports the hypothesised scale.

Discussion
This study showed that the QLQ-BR45 is a reliable and valid tool to assess the QoL of breast cancer patients. Previous studies have been conducted on the validity and reliability of the Amharic version of the QLQ-BR23. However, since the development of the QLQ-BR23 questionnaire, there have been major advances in the diagnosis and treatment of breast cancer, requiring the update of QLQ-BR23 to QLQ-BR45. The latter includes an additional 22 items, which were added to the original version.
The translation of the QLQ-BR45 was performed in collaboration with the EORTC translation team, and it followed the translation procedure developed by the EORTC QLG 17 . As a result of the translation process, the final version was linguistically and conceptually comprehensible to people of all education levels, culturally acceptable, and reflected the wording and structure of the original English version, as well as the standard layout and formatting of EORTC questionnaires. The reviewers of the EORTC quality of life translation team approved the Amharic translation and the procedure employed.
The average time required to complete the translated version of the questionnaire was 8 min (SD 2.8 min), which is comparable to the result of a previous study (9.2 min, SD 4.7 min) 7 . Both of these tools were developed by the EORTC and follow the same questionnaire development guidelines, and the number of items is almost the same.
In this study, the overall reliability of the questionnaire was 0.80, and all of the domains except arm symptoms had an acceptable internal consistency value greater than 0.7, which is consistent with the phase III international update of QLQ-BR23 9 . This indicates that the internal consistency of the items is acceptable 23,28 . However, the arm symptom scale had a low internal consistency coefficient, similar to that observed for Moroccan breast cancer patients 29 . This can be explained by the inclusion of different areas of the body in the scale, such as the arm, shoulder, and hand. The overall test-retest reliability of all domains was 0.768, which was not satisfactory. One possible explanation is that the newly diagnosed breast cancer patients did not have much knowledge of the disease on their initial admission, but they gained more knowledge about the severity of the disease and its poor prognosis in the following days. Moreover, patients may have started taking symptom relief medication, such as pain relief medication, by the second assessment. In addition, an inconsistent environment for participants, such as being in a hurry during the test or mood instability, might have affected the subjective assessment of QoL.

Multitrait scaling analysis showed that almost all of the items had stronger correlations with their own subscales than with other subscales. This indicates strong convergent validity of the instrument. The magnitude of the correlation coefficients among all subscales was high (r = 0.40-1.00). However, item number 38 on the endocrine therapy scale had a low correlation (r < 0.4). The magnitude of discriminant validity was 88.7%. The most obvious scaling failure was observed for the systemic therapy side effect scale. This might have occurred because systemic therapy side effects are often nonspecific. This is in contrast to a previous study performed on breast cancer patients 10 .
In this study, the I-CVI score ranged from 0.83 to 1, and the S-CVI was found to be 0.94. An I-CVI score of 0.78 or higher is considered excellent 30 and a S-CVI/Ave value of 0.90 or higher is acceptable 20 . Therefore, all items were considered relevant and had an excellent content validity score.
The interscale correlation values showed that the newly added QLQ-BR45 scales were not strongly correlated with the existing scales of the QLQ-BR23, similar to a previous update study on QLQ-BR23 9 . In this study, the endocrine therapy and systemic therapy side effect scales were strongly correlated with most of the other subscales, and a strong correlation (r = 0.735) was found between the sexual enjoyment and sexual functioning scales. This strong correlation might result from both scales dealing with issues related to sexuality. Most other scales were correlated moderately or weakly with each other. The moderate or weak correlation of scales indicates that there are distinct components of the BR45 construct.
The external convergent validity correlation coefficient between EORTC QLQ-BR45 and EORTC QLQ-C30 was under 0.70. The systemic therapy side effect scale was correlated with the QLQ-C30 fatigue, nausea and vomiting, and pain scales. This correlation might be because both are symptom scales. The breast symptom and the arm symptom scales also had strong correlations with the pain scale of the QLQ-C30. A similar study performed in Turkey showed strong correlations between the symptom scales of QLQ-BR23 and QLQ-C30 10,31 . This might be explained by the two dimensions being conceptually related. Conversely, the symptom scales of EORTC QLQ-BR45 were more strongly correlated with the corresponding scales of EORTC QLQ-C30 than the functional scales 10 . Furthermore, correlations involving the sexual functioning, sexual enjoyment and endocrine sexual scales were low (r < 0.40). In Ethiopia, sexual-related issues are a sensitive topic, and patients often do not want to respond to related items. In the current study, 80% of the comparisons between QLQ-BR45 and QLQ-C30 scales showed low correlations (r < 0.40). The moderate and low correlation coefficients in the other domains of the QLQ-BR45 and QLQ-C30 suggest that the subscales were assessing distinct components of the QoL construct.
Known-group validity was examined for different groups, such as age, residential status, stage of disease, and treatment modalities. As expected, the mean functioning scales (sexual functioning and sexual enjoyment) had significantly lower scores in patients with metastatic disease compared to those in an early stage of disease, indicating that the QLQ-BR45 Amharic version is able to differentiate between patients with various disease severities. This is in line with a study conducted in Korean breast cancer patients 32 . Study participants aged over 45 years reported better sexual functioning, sexual enjoyment, body image, and future perspectives and a lower symptom scale (upset by hair loss) than participants younger than 45 years. Conversely, a study conducted in China showed that participants aged older than 50 years reported worse physical, role, and cognitive functioning and more sleep-related symptoms than those younger than 50 years 33 . This might be explained by differences in the tools used by the studies. Endocrine sexual-related function was lower in participants from rural places than urban places. This might be due to a lack of access to healthcare services and late detection of cancer.
Confirmatory factor analysis is a statistical technique used to test the hypothesised scale structure by assessing whether a relationship exists between the observed variables and their underlying latent constructs 34 . The hypothesised scale structure verified by CFA includes the 22 new items proposed by the EORTC QLQ study group, which showed no strong correlations with the existing scales of the QLQ-BR23 9 . It is recommended that a combination of measures should be used to test the hypothesised scale structure, such as the RMSEA, CFI, SRMR, and CMIN 35 . An RMSEA value less than 0.06 indicates an "excellent fit", a value between 0.06 and 0.08 indicates an "acceptable fit", and a value greater than 0.08 indicates an "unacceptable fit". CFI and TLI values close to 0.95 and SRMR lower than 0.08 reflect a good fit of the model to the data 36 . Thus, based on these criteria, the constructs of the instrument demonstrated an excellent fit to the model. Therefore, the results of multitrait scaling analysis and confirmatory factor analysis confirmed the hypothesised scale structure, indicating that the Amharic translation of the items and their response choices are appropriate and that scale scores could contribute to cross-cultural comparisons.
Strengths and limitations of the study. Globally, this is the first validation study done on the newly updated QLQ-BR45. Forward and back translation of the new tool was carried out according to the EORTC translation guidelines. The study used different reliability and validity tests, including internal consistency reli-