Introduction

Hematological malignancies (HM) are a complex group of highly malignant diseases that are challenging to treat. According to 2020 WHO statistics, the incidence rate of leukemia in China was 5.9 per 100,000, and the rates of non-Hodgkin lymphoma, multiple myeloma, and Hodgkin lymphoma were 6.4, 0.47, and 1.5 per 100,000, respectively1. Patients with HM often face a range of physical, psychological, and social challenges, exacerbated by both the disease and its treatment. In cancer care, frequent and precise assessment of symptoms is essential. Patient-reported outcome (PRO) measures have emerged as a gold standard, offering insight into patients’ subjective experiences and overall quality of life2. These tools foster patient-nurse communication, enable systematic monitoring, and facilitate tailored management of patients’ symptoms, thereby promoting patient-centered care3,4,5,6.

The Patient-Reported Outcomes Measurement Information System (PROMIS), an initiative by the National Institutes of Health, is renowned for its innovative self-report measures designed to evaluate the physical, mental, and social facets of health and well-being7. The versatility and comprehensiveness of PROMIS have garnered significant attention, marking it as a pivotal tool in the holistic assessment of individual health2,8,9. PROMIS includes item banks that can be administered using computer-adaptive testing, short forms for individual domains, and profiles that yield information about multiple domains for use in clinical trials, observational studies, and clinical practice7.

The PROMIS-29 V2.1, in particular, stands out for its robust design, aimed at addressing the gap in universal and generalizable measures for assessing core patient-reported symptoms and functional domains in individuals with chronic diseases10. Developed through meticulous processes including literature review, Item Response Theory (IRT) analysis, and expert reviews, the PROMIS-29 V2.1 ensures a comprehensive and standardized evaluation of patients’ health statuses10.

Although the PROMIS-29 V2.1 has been translated into Chinese by the PROMIS National Center-China (PNC-China), its application in the context of HM remains limited. There is a conspicuous absence of validation studies exploring the efficacy and reliability of PROMIS-29 V2.1 among HM patients. Given the critical need for nuanced assessments of physical, social, and mental health in this demographic, validating the PROMIS-29 V2.1 could not only enhance clinical practices but also pave the way for international comparative studies.

In light of this, we conducted a comprehensive psychometric evaluation of the Chinese version of the PROMIS-29 V2.1 in a cohort of HM patients in mainland China. We aimed to establish its reliability, validity, and potential applications in this specific medical and cultural context.

Methods

Study design

This multicenter cross-sectional study received approval from the Medical Ethical Committee of Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College (registration number QTJC2022002-EC-1). We adhered to the Consensus-based standards for the selection of health measurement instruments (COSMIN) guidelines to evaluate the psychometric properties of the Chinese version of the PROMIS-29 V2.1 among hematological malignancy patients.

Setting and sample

Patients were recruited by convenience sampling from the hematology departments of four tertiary hospitals in Tianjin, Shandong, Jiangsu, and Anhui, China, between June and August 2023. Based on the 5–10:1 case-to-variable ratio for psychometric evaluation and allowing for a potential 20% invalid response rate, we aimed for a sample size between 174 and 348 and ultimately included 354 cases9,11. The sample size was also sufficient for stable and precise model estimation by confirmatory factor analysis (CFA)11.
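The target range can be reproduced with simple arithmetic. The sketch below assumes that the "variables" are the 29 items of the instrument and that the 20% allowance was applied as a multiplier of 1.2; the function name is illustrative and does not come from the study.

```python
import math

def target_sample_size(n_items, cases_per_item, invalid_pct):
    """Cases needed for a case-to-variable ratio, inflated for invalid responses.

    invalid_pct is a whole-number percentage (20 means 20%), which keeps the
    arithmetic in integers until the final division.
    """
    return math.ceil(n_items * cases_per_item * (100 + invalid_pct) / 100)

# Assumption: 29 items, 5 to 10 cases per item, 20% allowance.
low = target_sample_size(29, 5, 20)
high = target_sample_size(29, 10, 20)
print(low, high)
```

Under these assumptions the lower and upper bounds come out at 174 and 348, matching the range stated above.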

Patients were eligible for the study if they met the following criteria: (a) were aged 18 years or older, (b) had a diagnosis of a hematological malignancy, including leukemia, lymphoma, myeloma, myelodysplastic neoplasms, or myeloproliferative neoplasms, (c) were able to speak Mandarin and read Chinese, and (d) signed an informed consent form. Patients with a psychiatric illness, cognitive impairment, or a diagnosis of another cancer type were excluded.

Measures

Socio-demographic information questionnaire

A sociodemographic information questionnaire was developed to collect sociodemographic and clinical data, including gender, age, residential location, education level, marital status, job, employment status, health insurance, average monthly family income, primary caregiver, diagnosis, time since diagnosis, medical costs, treatment phase, and medical treatment. Patients self-reported sociodemographic data, while trained nurse researchers extracted clinical data from medical records.

PROMIS-29 V2.1

The Chinese version of the PROMIS-29 V2.1, authorized by PNC-China, was used in this study. The PROMIS-29 V2.1 consists of 29 items measuring seven health and function domains (physical function, anxiety, depression, fatigue, sleep disturbance, ability to participate in social roles and activities, and pain interference) plus a single pain intensity item. Each of the seven domains includes 4 items answered on a five-point Likert scale from 1 to 5. The pain intensity item is answered on a numeric rating scale ranging from 0 (no pain) to 10 (worst pain imaginable). Item scores in each domain were summed and transformed to the T-score metric, on which a value of 50 (SD = 10) corresponds to the mean of the U.S. general population (http://www.healthmeasures.net)7. For physical function and social role, higher scores indicate better functioning and quality of life (QOL). For depression, anxiety, fatigue, pain interference, pain intensity, and sleep disturbance, a higher score indicates greater symptom burden7.

FACT-G

The FACT-G is one of the most frequently used questionnaires for measuring health-related quality of life (HRQOL) in patients with cancer. The FACT-G comprises four subscales: physical wellbeing (PWB, 7 items, 0–28), social/family wellbeing (SWB, 7 items, 0–28), emotional wellbeing (EWB, 6 items, 0–24), and functional wellbeing (FWB, 7 items, 0–28)12. All items in the FACT-G use a five-point rating scale (0 = not at all, 1 = a little bit, 2 = somewhat, 3 = quite a bit, and 4 = very much). Twelve items (PWB 1 to PWB 7, EWB 1, and EWB 3 to EWB 6) are negatively worded and must be reverse scored. The total score ranges from 0 to 108, with higher scores indicating better quality of life12,13.
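The reverse-scoring rule described above can be sketched as follows. This is a minimal illustration, assuming complete responses and item labels of the form PWB1 ... FWB7 (hypothetical labels that mirror the subscale abbreviations used here); it does not reproduce the official scoring manual, which also covers missing items.

```python
# Items that must be reverse scored, as listed in the text (12 in total).
REVERSED = {f"PWB{i}" for i in range(1, 8)} | {"EWB1", "EWB3", "EWB4", "EWB5", "EWB6"}
SUBSCALE_SIZES = {"PWB": 7, "SWB": 7, "EWB": 6, "FWB": 7}

def score_fact_g(responses):
    """Return subscale scores and the total for one respondent.

    responses maps a hypothetical item label (e.g. "PWB1") to a raw answer 0-4.
    """
    scores = {}
    for name, n_items in SUBSCALE_SIZES.items():
        subtotal = 0
        for i in range(1, n_items + 1):
            label = f"{name}{i}"
            raw = responses[label]
            if not 0 <= raw <= 4:
                raise ValueError(f"{label}: response {raw} is outside 0-4")
            # Reverse-scored items contribute 4 - raw, all others contribute raw.
            subtotal += (4 - raw) if label in REVERSED else raw
        scores[name] = subtotal
    scores["total"] = sum(scores[name] for name in SUBSCALE_SIZES)
    return scores

# Toy respondent (not study data): best possible answer on every item.
best = {f"{name}{i}": (0 if f"{name}{i}" in REVERSED else 4)
        for name, n in SUBSCALE_SIZES.items() for i in range(1, n + 1)}
print(score_fact_g(best))
```

For this toy respondent every subscale reaches its maximum, so the total equals the scale maximum of 108.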

Data collection

Eligible patients were enrolled during hospitalization by trained nurse researchers at each study site, who had received training on the study process to standardize data collection. All participants were informed about the purpose and procedures of the study, and verbal consent was obtained before data collection. In addition, participants were informed of the voluntary nature of participation, their rights, and the confidentiality of the data. Participants could choose to complete the survey either on paper or through a web-based questionnaire. Data were collected only once from each respondent. Participants were required to return the questionnaire immediately after completion. As a token of gratitude, each participant received a bottle of no-rinse hand sanitizer after completing the questionnaire.

Data analysis

Analyses were conducted using IBM SPSS version 21.0 and IBM SPSS Amos Graphics (version 26.0). All significance tests were 2-tailed, with p < 0.05 considered significant.

Descriptive statistics were calculated for sample characteristics and study variables: continuous variables were summarized as means and standard deviations, and categorical variables as counts and percentages. The PROMIS-29 V2.1 raw scores were transformed into T-scores based on the PROMIS guidelines (http://www.healthmeasures.net). A ceiling or floor effect was considered present if more than 15% of respondents achieved the best or the worst possible score, respectively. Reliability was assessed via Cronbach’s α coefficient, Composite Reliability (CR), and split-half reliability.
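For readers less familiar with Cronbach’s α, the sketch below shows the standard textbook formula, α = k/(k − 1) × (1 − Σ item variances / variance of the total score). It is an illustration in plain Python with toy data; the study itself computed α in SPSS, and the function name is an assumption of this example.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    # Sample variance of each respondent's summed score.
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Toy data (not study data): three respondents answering four items identically
# within each respondent, i.e. perfectly consistent items.
toy = [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]]
alpha = cronbach_alpha(toy)
print(round(alpha, 3))
```

With perfectly consistent toy items, α equals 1 up to floating-point rounding; real item data will give lower values.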

Criterion validity was determined by correlating PROMIS-29 V2.1 domains with similar constructs in the FACT-G, using Spearman correlation coefficients. Confirmatory factor analysis (CFA) was carried out using maximum likelihood estimation to examine the construct validity of the PROMIS-29 V2.1 domains. Model fit was examined with the χ²/degrees of freedom (χ²/df), root mean square error of approximation (RMSEA), goodness-of-fit index (GFI), comparative fit index (CFI), incremental fit index (IFI), normed fit index (NFI), and Tucker–Lewis index (TLI). According to commonly used criteria14, an acceptable CFA model should have χ²/df < 3, RMSEA < 0.08, and GFI, CFI, IFI, NFI, and TLI > 0.9. The average variance extracted (AVE) and its square root (√AVE) were calculated to assess convergent validity and discriminant validity.

Ethics approval and consent to participate

All participants signed written informed-consent forms and completed questionnaires online at their earliest convenience. Ethical approval was granted by the Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College (No. QTJC2022002-EC-1).

Results

Sample characteristics

A total of 400 questionnaires were distributed. Twenty-nine eligible patients declined to participate, while 371 agreed to take part. In addition, 17 questionnaires were excluded because the respondents selected the same response option for every question, leaving a sample of 354 for the final analysis. The average age of the patients was 46.93 years. A majority of the participants were male (57.3%), married (78.8%), and unemployed (78.2%). In terms of education, the largest group had completed high school or an equivalent level of education (39.5%). Most participants were covered by employee health insurance (60.5%), and the most common income bracket was ¥3001–¥5000 per month (25.7%). Clinically, leukemia was the most common diagnosis, accounting for 43.8% of the patients. About one third (32.8%) had been diagnosed less than 6 months earlier. The majority (83.6%) were undergoing treatment at the time of the survey. Further details are given in Table 1.

Table 1 Sample characteristics of the study sample (N = 354).

Reliability analysis

Reliability was assessed with internal consistency coefficients, CR, and the split-half coefficient. Reliability was excellent for the overall PROMIS-29 V2.1 scale, with a Cronbach’s α of 0.965 and a split-half coefficient of 0.927. Across the seven PROMIS-29 V2.1 domains, Cronbach’s α ranged from 0.787 (sleep disturbance) to 0.968 (pain interference and intensity) and CR ranged from 0.778 (sleep disturbance) to 0.976 (pain interference and intensity); all values were above the threshold of 0.70, indicating sufficient reliability. See Tables 2 and 6.

Table 2 Reliability of the PROMIS-29 V2.1 (N = 354).

Descriptive statistics, ceiling, and floor statistics

Regarding the mean T-scores of the PROMIS-29 V2.1, all domains except physical function (41.31 ± 11.85) and the ability to participate in social roles and activities (47.64 ± 11.38) had scores significantly above the reference level according to the PROMIS guidelines (http://www.healthmeasures.net). See Table 3.

Table 3 Mean Scores, T-scores, floor and ceiling effects of the PROMIS-29 V2.1 (N = 354).

Floor effects reflect the percentage of respondents who report the worst possible score; ceiling effects reflect the percentage who report the best possible score. An effect was considered present if the corresponding percentage exceeded 15%. As noted in the Methods, for physical function and social role, higher scores indicate better functioning and QOL, whereas for depression, anxiety, fatigue, pain interference, pain intensity, and sleep disturbance, higher scores indicate greater symptom burden. As shown in Table 3, every domain except sleep disturbance showed a ceiling effect: physical function (26.0%), anxiety (37.0%), depression (40.4%), fatigue (18.4%), social roles (18.9%), and pain interference (43.2%).
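The 15% rule can be expressed as a short check. The sketch below is illustrative only: the function name and the toy scores are assumptions, and the caller must say which raw score counts as "best" and "worst" for a given domain, because the direction differs between function domains and symptom domains.

```python
def floor_ceiling(scores, worst, best, threshold=0.15):
    """Share of respondents at the worst and best possible score, with 15% flags."""
    n = len(scores)
    floor_share = sum(s == worst for s in scores) / n
    ceiling_share = sum(s == best for s in scores) / n
    return {
        "floor": floor_share,
        "ceiling": ceiling_share,
        "floor_effect": floor_share > threshold,
        "ceiling_effect": ceiling_share > threshold,
    }

# Toy raw sum scores for a 4-item symptom domain (range 4-20, not study data).
# For a symptom domain the lowest score (4) is the best outcome.
toy_scores = [4, 4, 4, 5, 8, 10, 12, 16, 20, 7]
summary = floor_ceiling(toy_scores, worst=20, best=4)
print(summary)
```

In this toy sample 30% of respondents sit at the best possible score, so a ceiling effect is flagged, while the 10% at the worst score stays below the threshold.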

Criterion validity

Normality testing showed that the PROMIS-29 and FACT-G scores were not normally distributed, so Spearman correlation analysis was used. The absolute values of the correlation coefficients between PROMIS-29 V2.1 scores and the corresponding FACT-G domain scores ranged from 0.156 to 0.752 (p < 0.001), showing satisfactory criterion validity. See Table 4.

Table 4 The criterion validity of the PROMIS-29 V2.1

Construct validity

In our analysis, the PROMIS-29 V2.1 demonstrated good construct validity among patients with HM, as evidenced by a χ²/df of 2.602, an IFI of 0.960, and an RMSEA of 0.067. Although the GFI (0.850) fell below the 0.9 threshold, the other indices, including AGFI, NFI, CFI, and TLI, ranged from 0.937 to 0.960, supporting an acceptable model fit (Table 5). The revised model is illustrated in Fig. 1.
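As a quick illustration, the reported indices can be compared with the cut-offs given in the Methods (χ²/df < 3, RMSEA < 0.08, and GFI and IFI > 0.9). The sketch below uses only the four values stated individually in this section; NFI, CFI, TLI, and AGFI are reported only as a range and are therefore left out. The dictionary keys and function name are assumptions of this example.

```python
# Values reported in this section; other indices are given only as a range.
REPORTED = {"chi2_df": 2.602, "RMSEA": 0.067, "GFI": 0.850, "IFI": 0.960}

# Cut-offs from the Methods section.
CRITERIA = {
    "chi2_df": lambda v: v < 3,
    "RMSEA": lambda v: v < 0.08,
    "GFI": lambda v: v > 0.9,
    "IFI": lambda v: v > 0.9,
}

def check_fit(indices):
    """Map each fit index to True if it meets its cut-off, else False."""
    return {name: CRITERIA[name](value) for name, value in indices.items()}

fit_result = check_fit(REPORTED)
for name, ok in fit_result.items():
    print(f"{name}: {'meets' if ok else 'does not meet'} the cut-off")
```

The check makes the mixed picture explicit: three of the four listed indices meet their cut-offs and the GFI does not.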

Table 5 Model fit indices of confirmatory factor analysis for PROMIS-29 V2.1 (N = 354).

Figure 1 Confirmatory factor analysis model for PROMIS-29 V2.1 (F1–F7: anxiety, depression, physical function, fatigue, sleep disturbance, ability to participate in social roles and activities, and pain interference, respectively).

Convergent validity

The Average Variance Extracted (AVE) is the mean of the squared standardized factor loadings of a latent variable and represents how much of the variance in its measured items the latent variable explains. The larger the AVE, the better the latent variable explains its items and, conversely, the better the items express the latent variable. Convergent validity is considered good when AVE > 0.5 and acceptable when AVE is between 0.36 and 0.515. In this study, the AVE values for the seven domains of the PROMIS-29 V2.1 ranged from 0.500 to 0.910. The factor loadings, which indicate the relationships between the items and their respective constructs, were high across most domains, further supporting satisfactory convergent validity. See Table 6.
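A minimal sketch of how AVE and CR are conventionally obtained from standardized factor loadings is given below. It assumes the usual Fornell–Larcker formulas, AVE = Σλ²/n and CR = (Σλ)² / ((Σλ)² + Σ(1 − λ²)), and uses hypothetical loadings rather than the values in Table 6; the study's own figures came from the CFA output.

```python
def ave(loadings):
    """Average variance extracted: mean of squared standardized loadings."""
    return sum(l * l for l in loadings) / len(loadings)

def composite_reliability(loadings):
    """Composite reliability from standardized loadings.

    Each item's error variance is taken as 1 - loading**2, which holds only
    for standardized loadings.
    """
    sum_loadings = sum(loadings)
    error_var = sum(1 - l * l for l in loadings)
    return sum_loadings ** 2 / (sum_loadings ** 2 + error_var)

# Hypothetical loadings for a 4-item domain (illustration only).
toy_loadings = [0.7, 0.8, 0.9, 0.6]
toy_ave = ave(toy_loadings)
toy_cr = composite_reliability(toy_loadings)
print(round(toy_ave, 3), round(toy_cr, 3))
```

For these toy loadings the AVE is about 0.575 and the CR about 0.841, so this hypothetical domain would pass both the AVE > 0.5 and the CR > 0.70 criteria.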

Table 6 The convergent validity of the PROMIS-29 V2.1

Discriminant validity

In this study, the seven domains of the PROMIS-29 V2.1 were significantly correlated (p < 0.01), and the absolute correlation coefficients were all smaller than the corresponding √AVE. This indicates that the latent variables are related to one another yet remain distinct, showing good discriminant validity. See Table 7.

Table 7 The discriminant validity of the PROMIS-29 V2.1

Discussion

To our knowledge, this study is among the first to evaluate the psychometric properties of the Chinese version of the PROMIS-29 V2.1 profile among patients with HM. Our findings support the reliability and validity of this instrument in capturing the physical, mental, and social dimensions of health status in this patient group.

Regarding reliability, Cronbach’s alpha is considered an adequate measure of internal consistency16. Composite Reliability (CR) reflects whether all items of a latent variable consistently explain that variable, and a value higher than 0.70 indicates good CR17. Unlike Cronbach’s α, CR incorporates the different factor loadings of each item on its latent variable into the calculation, so its estimate is closer to the internal consistency reliability of the scale17. In this study, both the Cronbach’s α and CR of all domains were close to or met the more stringent criterion of 0.9, providing evidence of high internal consistency reliability.

The T-scores derived from the PROMIS-29 V2.1 highlighted an apparent diminution in physical function and social participation compared to the reference group. This underscores a pronounced impairment in physical activities and social engagement. The results were similar to those of patients with breast cancer2 and systemic sclerosis18.

Floor and ceiling effects were observed in some PROMIS-29 V2.1 domains, as has also been noted in other PROMIS validation projects8,19. Floor and ceiling effects refer to the proportion of respondents who achieved the worst or the best possible score, and they reflect the distribution of scores on the scale16. Floor or ceiling effects are considered to be present if more than 15% of respondents achieved the worst or the best possible score, respectively16,20.

In our study, a substantial proportion of participants reported minimal symptoms in the anxiety, depression, fatigue, and pain domains, aligning with general population trends. However, the pronounced ceiling effects in every domain except sleep disturbance could be attributed to the fact that a majority of our sample were undergoing treatment, potentially amplifying these effects. Nevertheless, this would not be problematic when identifying those with poor physical performance. Such limitations may not arise in a future sample that includes more patients at different stages of the disease.

The criterion validity was demonstrated by its varying degree of correlations with FACT-G. Criterion validity refers to the extent to which scores on a particular instrument relate to a gold standard16. Current studies on PROs or QOL in people with HM usually use the FACT-G as the assessment tool21,22. Spearman correlation coefficients > 0.50 were considered strong correlation, 0.30–0.50 indicated moderate correlation, and < 0.30 indicated weak correlation15. In this study, the PROMIS-29 V2.1 domains showed adequate correlations with all corresponding dimensions of the FACT-G (P < 0.01).
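To make the interpretation bands concrete, the sketch below computes a Spearman coefficient in plain Python (Pearson correlation of average ranks, which handles ties) and labels its strength using the thresholds quoted above. The function names and the toy inputs are assumptions of this example; the study's coefficients were computed in SPSS.

```python
from statistics import mean

def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

def strength(rho):
    """Label |rho| with the bands used in the text."""
    r = abs(rho)
    if r > 0.50:
        return "strong"
    if r >= 0.30:
        return "moderate"
    return "weak"

# Toy data (not study data): a perfectly inverse relationship.
rho = spearman_rho([1, 2, 3, 4, 5], [50, 40, 30, 20, 10])
print(round(rho, 3), strength(rho))
# The extremes of the reported range, 0.156 and 0.752, fall in different bands.
print(strength(0.156), strength(0.752))
```

Applied to the reported range, the lowest coefficient (0.156) falls in the weak band and the highest (0.752) in the strong band.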

CFA provided good evidence for the construct validity of the Chinese version of the PROMIS-29 V2.1 in patients with HM, including support for the seven domains. A model is considered to fit well when χ²/df < 3, IFI > 0.9, and RMSEA < 0.08 after correction. In addition, the five fit indices GFI, AGFI, NFI, CFI, and TLI all lie between 0 and 1, with values closer to 1 indicating better fit and values closer to 0 indicating worse fit15. The goodness-of-fit indices for the original domain structure of the PROMIS-29 V2.1 were high. The PROMIS-29 V2.1 also showed satisfactory convergent validity and discriminant validity. These results support the structural integrity of the PROMIS-29 V2.1 in capturing the health outcomes of patients with HM.

Convergent validity, which is evaluated by the AVE index, means that items measuring the same underlying domain should belong to the same dimension and should be highly correlated with one another15. In this study, the AVE values for all seven domains of the PROMIS-29 V2.1 were examined, offering insight into the measure’s convergent validity among patients with HM. These findings indicate that the instrument captures the intended constructs with little measurement error, supporting its utility in this patient population. The consistency in factor loadings adds confidence in the ability of the PROMIS-29 V2.1 to provide reliable information on the health outcomes of patients with HM.

Discriminant validity evaluates the extent to which a construct is distinct from other constructs, ensuring that it is not highly correlated with variables from which it should theoretically differ15. Here, it was assessed by comparing the √AVE for each construct with the correlations between that construct and the others. Ideal discriminant validity is achieved when the √AVE for each construct is greater than its highest correlation with any other construct15. In our study, the PROMIS-29 V2.1 demonstrated good discriminant validity among patients with hematologic malignancies. For instance, while anxiety and depression were highly correlated (r = 0.900, p < 0.01), the √AVE values for these constructs were 0.934 and 0.937, respectively, both exceeding the correlation coefficient. This pattern was consistent across all construct pairs, indicating that the instrument distinguishes effectively between different aspects of patients’ health and well-being. These findings support the multidimensionality of the PROMIS-29 V2.1 and its applicability in capturing a broad spectrum of health outcomes among patients with hematologic malignancies without conflating distinct constructs.
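The comparison described above (often called the Fornell–Larcker criterion) can be sketched in a few lines. The example uses only the anxiety and depression figures quoted in this paragraph; the remaining pairs are in Table 7 and are not reproduced here. The function name and data layout are assumptions of this illustration.

```python
def fornell_larcker(sqrt_ave, correlations):
    """For each construct pair, check that |r| is below both constructs' sqrt(AVE)."""
    return {
        pair: abs(r) < min(sqrt_ave[pair[0]], sqrt_ave[pair[1]])
        for pair, r in correlations.items()
    }

# Values quoted in the text for the most highly correlated pair.
sqrt_ave = {"anxiety": 0.934, "depression": 0.937}
correlations = {("anxiety", "depression"): 0.900}

discriminant = fornell_larcker(sqrt_ave, correlations)
print(discriminant)
```

Because 0.900 is smaller than both 0.934 and 0.937, the pair passes the check even though the two domains are strongly related.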

To sum up, these findings reinforce the utility of the Chinese version of the PROMIS-29 V2.1 as a reliable tool, mirroring the intricate nuances of patients’ experiences and outcomes. This congruence in outcomes underscores the PROMIS-29 V2.1’s potential as a pivotal tool in both clinical and research settings for this patient population.

Limitations

This study has several limitations. First, the participant pool, though multicenter, was confined to tertiary hospitals in China, warranting caution in extrapolating these findings to broader settings and populations. Second, the cross-sectional design precludes insight into the instrument’s responsiveness and interpretability across varying clinical states, which should be addressed in future longitudinal studies. Third, this study did not examine how the questionnaire performs in patients before and after treatment, which we plan to explore in future work.

Conclusion

This study evaluated the psychometric properties of the Chinese version of the PROMIS-29 V2.1 in patients with HM, using a multicenter sample. Our findings indicate that this version of the PROMIS-29 V2.1 is a valid and reliable instrument for measuring a spectrum of symptoms and functional attributes in HM patients. Further work on the instrument’s applicability is still needed. Future studies should consider incorporating Item Response Theory (IRT) methodologies, which would allow item-level analysis of performance and could improve the precision and applicability of the instrument. In conclusion, our study supports the psychometric properties of the Chinese version of the PROMIS-29 V2.1 and its wider adoption in assessing and monitoring symptoms and functions among Chinese patients with HM.