Evaluating the psychometric properties of the simplified Chinese version of PROMIS-29 version 2.1 in patients with hematologic malignancies

The Patient-Reported Outcomes Measurement Information System 29-item Profile version 2.1 (PROMIS-29 V2.1) is a widely used self-report instrument for assessing health outcomes from the patient’s perspective. This study aimed to evaluate the psychometric properties of the Chinese version of the PROMIS-29 V2.1 among patients with hematological malignancies. This cross-sectional study was approved by the Medical Ethical Committee of the Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College (registration number QTJC2022002-EC-1). We used convenience sampling to enroll eligible patients with hematological malignancies from four tertiary hospitals in Tianjin, Shandong, Jiangsu, and Anhui, China, between June and August 2023. Participants completed a socio-demographic information questionnaire, the PROMIS-29 V2.1, and the Functional Assessment of Cancer Therapy-General (FACT-G). We assessed the reliability, ceiling and floor effects, and structural, convergent, discriminant, and criterion validity of the PROMIS-29 V2.1. A total of 354 patients with a mean age of 46.93 years were included in the final analysis. Reliability was supported, with Cronbach’s α for the domains ranging from 0.787 to 0.968. All domains except sleep disturbance showed ceiling effects: physical function (26.0%), anxiety (37.0%), depression (40.4%), fatigue (18.4%), social roles (18.9%), and pain interference (43.2%). Criterion validity was supported by significant Spearman correlations between the PROMIS-29 V2.1 and FACT-G scores (P < 0.001). Confirmatory factor analysis (CFA) indicated a good model fit (χ2/df = 2.602, IFI = 0.960, RMSEA = 0.067). The average variance extracted (AVE) values for the seven domains of the PROMIS-29 V2.1 ranged from 0.500 to 0.910, demonstrating satisfactory convergent validity. Discriminant validity was confirmed by √AVE values that exceeded the inter-domain correlations. The Chinese version of the PROMIS-29 V2.1 profile is a reliable and valid instrument for assessing symptoms and functions in patients with hematological malignancies.

non-Hodgkin lymphoma, multiple myeloma and Hodgkin lymphoma were 6.4 per 100,000, 0.47 per 100,000, 1.5 per 100,000 respectively 1. Patients afflicted with HM often grapple with a myriad of physical, psychological, and social challenges, exacerbated by both the disease and its associated treatments. In the realm of cancer care, the frequent and precise assessment of symptoms is paramount. Patient-reported outcome (PRO) measures have emerged as a gold standard, offering invaluable insights into patients' subjective experiences and overall quality of life 2. These tools are instrumental in fostering enhanced patient-nurse communication, enabling systematic monitoring, and facilitating tailored management of patients' symptoms, thereby promoting patient-centered care [3][4][5][6].
The Patient-Reported Outcomes Measurement Information System (PROMIS), an initiative by the National Institutes of Health, is renowned for its innovative self-report measures designed to evaluate the physical, mental, and social facets of health and well-being 7. The versatility and comprehensiveness of PROMIS have garnered significant attention, marking it as a pivotal tool in the holistic assessment of individual health 2,8,9. PROMIS includes item banks that can be administered using computer-adaptive testing, short forms for individual domains, and profiles that yield information about multiple domains for use in clinical trials, observational studies, and clinical practice 7.
The PROMIS-29 V2.1, in particular, stands out for its robust design, aimed at addressing the gap in universal and generalizable measures for assessing core patient-reported symptoms and functional domains in individuals with chronic diseases 10. Developed through meticulous processes including literature review, Item Response Theory (IRT) analysis, and expert reviews, the PROMIS-29 V2.1 ensures a comprehensive and standardized evaluation of patients' health statuses 10.
Although the PROMIS-29 V2.1 has been translated into Chinese by the PROMIS National Center-China (PNC-China), its application in the context of HM remains limited. There is a conspicuous absence of validation studies exploring the efficacy and reliability of PROMIS-29 V2.1 among HM patients. Given the critical need for nuanced assessments of physical, social, and mental health in this demographic, validating the PROMIS-29 V2.1 could not only enhance clinical practices but also pave the way for international comparative studies.
In light of this, our study is poised to conduct an exhaustive psychometric evaluation of the Chinese version of PROMIS-29 V2.1 among a selected cohort of HM patients in mainland China. We aim to delineate its reliability, validity, and potential applications in this specific medical and cultural context.

Study design
This multicenter cross-sectional study received approval from the Medical Ethical Committee of the Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College (registration number QTJC2022002-EC-1). We followed the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) guidelines to evaluate the psychometric properties of the Chinese version of the PROMIS-29 V2.1 among patients with hematological malignancies.

Setting and sample
Patients were recruited by convenience sampling from the hematology departments of four tertiary hospitals in Tianjin, Shandong, Jiangsu, and Anhui, China, between June and August 2023. Based on the 5-10:1 case-to-variable ratio for psychometric evaluation and allowing for a potential 20% invalid sample rate, we aimed for a sample size between 174 and 348 and ultimately included 354 cases 9,11. The sample size was also sufficient for stable and precise model estimation by confirmatory factor analysis (CFA) 11.
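The sample-size range can be reproduced with a short sketch. It assumes that each of the 29 PROMIS-29 V2.1 items counts as one variable and that the 20% allowance is applied as a multiplicative inflation; this reading is inferred from the reported range of 174 to 348, and the function name is illustrative only.

```python
import math

def target_sample_size(n_variables, cases_per_variable, invalid_pct=20):
    """Cases needed for a case-to-variable ratio, inflated for invalid responses.

    Integer percentages keep the arithmetic exact.
    """
    base = n_variables * cases_per_variable
    return math.ceil(base * (100 + invalid_pct) / 100)

N_ITEMS = 29  # assumption: each PROMIS-29 V2.1 item is one variable

lower = target_sample_size(N_ITEMS, 5)   # 5:1 ratio
upper = target_sample_size(N_ITEMS, 10)  # 10:1 ratio
print(lower, upper)  # 174 348
```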
Patients were eligible for the study if they (a) were aged 18 years or older, (b) had a diagnosis of hematological malignancy, including leukemia, lymphoma, myeloma, myelodysplastic neoplasms, or myeloproliferative neoplasms, (c) were able to speak Mandarin and read Chinese, and (d) signed an informed consent form. Patients with psychiatric illness, cognitive impairment, or a diagnosis of another cancer type were excluded.

Measures
Socio-demographic information questionnaire
A socio-demographic information questionnaire was developed to collect socio-demographic and clinical data, including gender, age, residential location, education level, marital status, job, employment status, health insurance, average monthly family income, primary caregiver, diagnosis, time since diagnosis, medical costs, treatment phase, and medical treatment. Patients self-reported socio-demographic data, while trained nurse researchers extracted clinical data from medical records.

PROMIS-29 V2.1
The Chinese version of the PROMIS-29 V2.1, authorized by PNC-China, was used in this study. The PROMIS-29 V2.1 consists of 29 items measuring seven health and function domains: physical function, anxiety, depression, fatigue, sleep disturbance, ability to participate in social roles and activities, and pain interference and intensity. Except for the single pain intensity item, each domain includes 4 items answered on a five-point Likert scale from 1 to 5. The pain intensity item is answered on a numeric rating scale from 0 (no pain) to 10 (worst pain imaginable). Item scores in each domain were summed and transformed to the T-score metric, on which a score of 50 (SD = 10) corresponds to the mean of the U.S. general population (http://www.healthmeasures.net) 7. For physical function and social roles, higher scores indicate better functioning and quality of life (QOL). For depression, anxiety, fatigue, pain interference, pain intensity, and sleep disturbance, higher scores indicate more severe symptoms 7.

FACT-G
The FACT-G is one of the most frequently used questionnaires for measuring health-related quality of life (HRQOL) in patients with cancer. It comprises four subscales: physical wellbeing (PWB, 7 items, 0-28), social/family wellbeing (SWB, 7 items, 0-28), emotional wellbeing (EWB, 6 items, 0-24), and functional wellbeing (FWB, 7 items, 0-28) 12. All items use a five-point rating scale (0 = not at all, 1 = a little bit, 2 = somewhat, 3 = quite a bit, and 4 = very much). The 12 items PWB 1 to PWB 7, EWB 1, and EWB 3 to EWB 6 are negatively worded and are reverse-scored. The maximum total score is 108, and higher scores indicate better quality of life 12,13.
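The scoring rule described above can be sketched as follows. This is a minimal illustration, not the official scoring program: the item keys ("PWB1" to "FWB7") are hypothetical labels, the reverse-scored set follows the list given in the text, and missing responses are not handled.

```python
# Hypothetical item keys; the published form uses its own item codes.
SUBSCALE_SIZES = {"PWB": 7, "SWB": 7, "EWB": 6, "FWB": 7}

# Reverse-scored items as listed in the text: PWB 1-7, EWB 1, EWB 3-6 (12 items).
REVERSED = (
    {f"PWB{i}" for i in range(1, 8)}
    | {"EWB1"}
    | {f"EWB{i}" for i in range(3, 7)}
)

def score_fact_g(responses):
    """Return subscale scores and the total from raw 0-4 responses."""
    scores = {}
    for subscale, n_items in SUBSCALE_SIZES.items():
        subtotal = 0
        for i in range(1, n_items + 1):
            key = f"{subscale}{i}"
            raw = responses[key]
            if not 0 <= raw <= 4:
                raise ValueError(f"{key} must be between 0 and 4, got {raw}")
            # Reverse-scored items are flipped so that higher always means better.
            subtotal += 4 - raw if key in REVERSED else raw
        scores[subscale] = subtotal
    scores["total"] = sum(scores.values())
    return scores

# A respondent giving the most favourable answer to every item.
best = {
    f"{s}{i}": (0 if f"{s}{i}" in REVERSED else 4)
    for s, n in SUBSCALE_SIZES.items()
    for i in range(1, n + 1)
}
print(score_fact_g(best)["total"])  # 108
```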

Data collection
Eligible patients were enrolled during hospitalization by nurse researchers at each study site, who had been trained in the study process to standardize data collection. All participants were informed about the purpose and procedures of the study, and verbal consent was obtained before data collection. In addition, participants were informed of the voluntary nature of participation, their rights, and the confidentiality of the data. Participants could choose to complete the survey either on paper or through a web-based questionnaire, according to their preference. Data from each respondent were collected only once. Participants were asked to return the questionnaire immediately after completion. As a token of gratitude, each participant received a bottle of hand sanitizer after completing the questionnaire.

Data analysis
Analyses were conducted using IBM SPSS version 21.0 and IBM SPSS Amos Graphics (version 26.0). All significance tests were 2-tailed, with p < 0.05 considered significant. Descriptive statistics were calculated for sample characteristics and study variables: continuous variables were summarized as means and standard deviations, and categorical variables as counts and percentages. The PROMIS-29 V2.1 raw scores were transformed into T-scores based on the PROMIS guidelines (http://www.healthmeasures.net). Ceiling or floor effects were considered present if more than 15% of respondents achieved the best or the worst possible score, respectively. Reliability was assessed via Cronbach's α coefficient, Composite Reliability (CR), and split-half reliability.
Criterion validity was determined by correlating PROMIS-29 V2.1 domains with similar constructs in the FACT-G, using Spearman correlation coefficients. Confirmatory factor analysis (CFA) was carried out using maximum likelihood estimation to examine the construct validity of the PROMIS-29 V2.1 domains. Model fit was examined with the χ2/degrees of freedom ratio (χ2/df), root mean square error of approximation (RMSEA), goodness-of-fit index (GFI), comparative fit index (CFI), incremental fit index (IFI), normed fit index (NFI), and Tucker-Lewis index (TLI). An acceptable CFA model should have χ2/df < 3, RMSEA < 0.08, and GFI, CFI, IFI, NFI, and TLI > 0.9 14. AVE and √AVE were calculated to assess convergent validity and discriminant validity, respectively.
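The 15% rule and Cronbach's α can be illustrated with a small sketch. The study itself used SPSS, so this is not the authors' code; the function names and the toy responses are hypothetical. Which raw score counts as "best" depends on the domain direction (for a 4-item function domain the best raw sum is 20, for a 4-item symptom domain it is 4).

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha; `rows` holds one list of item scores per respondent."""
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

def ceiling_floor(scores, best, worst, threshold=0.15):
    """Share of respondents at the best/worst score and whether it exceeds 15%."""
    n = len(scores)
    ceiling = sum(1 for s in scores if s == best) / n
    floor = sum(1 for s in scores if s == worst) / n
    return {
        "ceiling": ceiling,
        "floor": floor,
        "ceiling_effect": ceiling > threshold,
        "floor_effect": floor > threshold,
    }

# Hypothetical responses to one 4-item domain (items scored 1-5).
toy = [
    [5, 5, 5, 5],
    [4, 4, 5, 4],
    [2, 3, 2, 2],
    [1, 1, 1, 2],
    [3, 3, 4, 3],
    [5, 5, 5, 5],
]
print(round(cronbach_alpha(toy), 3))
print(ceiling_floor([sum(row) for row in toy], best=20, worst=4))
```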

Ethics approval and consent to participate
All participants signed written informed consent forms and completed the questionnaires online at their earliest convenience. Ethical approval was granted by the Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College (No. QTJC2022002-EC-1).

Sample characteristics
A total of 400 questionnaires were distributed. Twenty-nine eligible patients declined to participate, while 371 agreed to take part. A further 17 questionnaires were excluded because the participants selected the same response option for every question. The final analysis therefore included 354 participants. The average age of the patients was 46.93 years. A majority of the participants were male (57.3%), married (78.8%), and unemployed (78.2%). In terms of education, the largest group had completed high school or an equivalent level of education (39.5%). Most participants were covered by employee health insurance (60.5%), and the most common income bracket was ¥3001-¥5000 per month (25.7%). Clinically, leukemia was the most common diagnosis, accounting for 43.8% of the patients. About one third (32.8%) had been diagnosed less than 6 months earlier. The majority (83.6%) were undergoing treatment at the time of the survey. Details are shown in Table 1.

Reliability analysis
For the reliability analysis, internal consistency coefficients, CR, and the split-half coefficient were calculated. Reliability was excellent for the overall PROMIS-29 V2.1 scale, with a Cronbach's α of 0.965 and a split-half coefficient of 0.927. Across the seven domains, Cronbach's α ranged from 0.787 (sleep disturbance) to 0.968 (pain interference and intensity), and CR ranged from 0.778 (sleep disturbance) to 0.976 (pain interference and intensity); all values were above the threshold of 0.70, indicating sufficient reliability. See Tables 2 and 6.

Descriptive statistics, ceiling, and floor statistics
Regarding the mean T-scores of the PROMIS-29 V2.1, except for physical function (41.31 ± 11.85) and the ability to participate in social roles and activities (47.64 ± 11.38), the scores of the other five domains were significantly above the reference level according to the PROMIS guidelines (http://www.healthmeasures.net). See Table 3.
Floor effects reflect the percentage of people who report the worst possible score; ceiling effects reflect the percentage of people who report the best possible score. Ceiling or floor effects were identified if responses exceeded 15% at the best or the worst possible score.

Criterion validity
Normality testing showed that the PROMIS-29 V2.1 and FACT-G scores were not normally distributed, so Spearman correlation analysis was used. The absolute values of the correlation coefficients between PROMIS-29 V2.1 scores and the corresponding FACT-G domains ranged from 0.156 to 0.752 (p < 0.001), showing satisfactory criterion validity. See Table 4.

Construct validity
In our analysis, the PROMIS-29 V2.1 demonstrated good construct validity among patients with HM, as evidenced by a χ2/df of 2.602, an IFI of 0.960, and an RMSEA of 0.067. While the GFI, at 0.850, was slightly below the ideal threshold, the other indices, including AGFI, NFI, CFI, and TLI, ranged from 0.937 to 0.960, supporting a good model fit (Table 5). The revised model is illustrated in Fig. 1.
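As a quick check, the reported indices can be compared against the cut-offs stated in the analysis plan (χ2/df < 3, RMSEA < 0.08, and GFI, CFI, IFI, NFI, TLI > 0.9). The sketch below is illustrative only: the dictionary keys are hypothetical names, and only indices whose values are given individually in the text are included.

```python
# Cut-offs as stated in the analysis plan of this paper.
UPPER_BOUNDS = {"chi2_df": 3.0, "RMSEA": 0.08}  # index must be below the bound
LOWER_BOUNDS = {"GFI": 0.9, "CFI": 0.9, "IFI": 0.9, "NFI": 0.9, "TLI": 0.9}  # must exceed

def check_fit(indices):
    """Map each fit index to True if it meets its cut-off, else False."""
    results = {}
    for name, value in indices.items():
        if name in UPPER_BOUNDS:
            results[name] = value < UPPER_BOUNDS[name]
        elif name in LOWER_BOUNDS:
            results[name] = value > LOWER_BOUNDS[name]
        else:
            raise KeyError(f"no cut-off defined for {name}")
    return results

# Values reported individually in the text.
reported = {"chi2_df": 2.602, "IFI": 0.960, "RMSEA": 0.067, "GFI": 0.850}
print(check_fit(reported))
# {'chi2_df': True, 'IFI': True, 'RMSEA': True, 'GFI': False}
```

Only the GFI misses its cut-off, which matches the description above.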

Convergent validity
The Average Variance Extracted (AVE) is the mean of the squared standardized factor loadings of a latent variable's items, and it represents the overall ability of the latent variable to explain its measured variables. In general, the larger the AVE, the better the latent variable explains its items and, conversely, the better the items express the latent variable. Convergent validity is considered good when AVE > 0.5 and acceptable when AVE is between 0.36 and 0.5 15. In this study, the AVE values for the seven domains of the PROMIS-29 V2.1 ranged from 0.500 to 0.910. Factor loadings, which indicate the relationships between the items and their respective constructs, were also high across most domains, further supporting satisfactory convergent validity. See Table 6.
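For readers who want to reproduce these quantities, AVE and CR can be computed from standardized loadings as sketched below. The formulas assume standardized loadings and uncorrelated measurement errors, so each item's error variance is taken as 1 - λ². The loadings in the example are hypothetical and are not values from Table 6.

```python
def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized loadings."""
    return sum(l * l for l in loadings) / len(loadings)

def composite_reliability(loadings):
    """CR: (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    loading_sum_sq = sum(loadings) ** 2
    error_variance = sum(1 - l * l for l in loadings)
    return loading_sum_sq / (loading_sum_sq + error_variance)

# Hypothetical standardized loadings for one 4-item domain.
example = [0.8, 0.8, 0.8, 0.8]
print(round(average_variance_extracted(example), 3))  # 0.64
print(round(composite_reliability(example), 3))       # 0.877
```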

Discriminant validity
In this study, the seven domains of the PROMIS-29 V2.1 were significantly correlated with one another (p < 0.01), and the absolute correlation coefficients were all smaller than the corresponding √AVE values. This indicates that the latent variables are related yet distinct from one another, showing ideal discriminant validity. See Table 7.

Discussion
This study is pioneering in evaluating the psychometric properties of the Chinese version of the PROMIS-29 V2.1 profile among patients with HM. Our findings affirm the reliability and validity of this instrument in capturing the multifaceted health status, encompassing physical, mental, and social dimensions, of this specific patient group. Regarding reliability, Cronbach's alpha is considered an adequate measure of internal consistency 16. Composite Reliability (CR) reflects whether all items of a latent variable consistently explain that variable; a value higher than 0.70 indicates good CR 17. Unlike Cronbach's α, CR incorporates the different factor loadings of each item on the latent variable, so its estimate is closer to the internal consistency reliability of the scale 17. In this study, both the Cronbach's α and CR of all domains were close to or met the more stringent criterion of 0.9, providing evidence of high internal consistency reliability.
The T-scores derived from the PROMIS-29 V2.1 highlighted an apparent diminution in physical function and social participation compared to the reference group. This underscores a pronounced impairment in physical activities and social engagement. The results were similar to those of patients with breast cancer 2 and systemic sclerosis 18.
Evidence of floor and ceiling effects has been observed in some PROMIS-29 V2.1 domains, which has also been noted in other PROMIS validation projects 8,19. Floor and ceiling effects refer to the proportion of respondents who achieved the worst or the best possible score, and they reflect the features of the score distribution 16. Floor or ceiling effects are considered to be present if more than 15% of respondents achieved the worst or the best possible score, respectively 16,20.
In our study, a significant proportion of participants reported minimal symptoms in the anxiety, depression, fatigue, and pain domains, aligning with general population trends. However, the pronounced ceiling effects in every domain except sleep disturbance could be attributed to the fact that a majority of our sample were undergoing treatment, potentially amplifying these effects. Nevertheless, ceiling effects would not be problematic when identifying patients with poor physical performance. Such limitations may not exist in a future sample that includes more patients at different stages of the disease.
Criterion validity was demonstrated by varying degrees of correlation with the FACT-G. Criterion validity refers to the extent to which scores on a particular instrument relate to a gold standard 16. Current studies on PROs or QOL in people with HM usually use the FACT-G as the assessment tool 21,22. Spearman correlation coefficients > 0.50 were considered strong, 0.30-0.50 moderate, and < 0.30 weak 15. In this study, the PROMIS-29 V2.1 domains showed adequate correlations with all corresponding dimensions of the FACT-G (P < 0.01).
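The correlation analysis and its interpretation bands can be sketched in a few lines. This is an illustrative pure-Python version, not the SPSS procedure used in the study: Spearman's coefficient is computed as the Pearson correlation of average ranks (ties share the mean of their positions), and the labels follow the > 0.50, 0.30-0.50, and < 0.30 bands quoted above. Function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

def strength(rho):
    """Label |rho| using the bands quoted in the text."""
    magnitude = abs(rho)
    if magnitude > 0.50:
        return "strong"
    if magnitude >= 0.30:
        return "moderate"
    return "weak"

# The two ends of the reported range of absolute coefficients.
print(strength(0.156), strength(0.752))  # weak strong
```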
CFA provided good evidence for the construct validity of the Chinese version of the PROMIS-29 V2.1 in patients with HM, including the presence of the seven domains. A model is considered to fit well when χ2/df < 3, IFI > 0.9, and RMSEA < 0.08 after correction; in addition, the five fit indices (GFI, AGFI, NFI, CFI, and TLI) should lie between 0 and 1, with values closer to 1 indicating better fit and values closer to 0 indicating worse fit 15. The goodness-of-fit indices for the original domain structure of the PROMIS-29 V2.1 were high. The PROMIS-29 V2.1 also showed satisfactory convergent and discriminant validity. The results underscore the robust structural integrity of the PROMIS-29 V2.1 in capturing the multifaceted health outcomes of patients with HM.
Convergent validity, which is evaluated by the AVE index, means that items measuring the same underlying domain should belong to the same dimension and should be highly correlated with one another 15. In this study, the AVE values for all seven domains of the PROMIS-29 V2.1 were examined, offering insights into the measure's convergent validity among patients with HM. These findings underscore the instrument's robustness in capturing the intended constructs with minimal measurement error, attesting to its utility in this specific patient population. The consistency in factor loadings increases confidence in the PROMIS-29 V2.1's ability to offer reliable, nuanced insights into the multifaceted health outcomes of patients with HM.
Discriminant validity evaluates the extent to which a construct is distinct from other constructs, ensuring that it is not highly correlated with variables from which it should theoretically differ 15. Here, it is assessed by comparing the √AVE for each construct with the correlations between that construct and the others. Ideal discriminant validity is achieved when the √AVE for each construct is greater than its highest correlation with any other construct 15. In our study, the PROMIS-29 V2.1 demonstrated excellent discriminant validity among patients with hematologic malignancies. For instance, while there was a notable correlation between anxiety and depression (r = 0.900, p < 0.01), the √AVE values for these constructs were 0.934 and 0.937, respectively, exceeding the correlation coefficient. This pattern was consistent across all construct pairs, underscoring the instrument's ability to distinguish effectively between different aspects of patients' health and well-being. These findings affirm the multidimensionality of the PROMIS-29 V2.1 and its applicability in capturing a broad spectrum of health outcomes among patients with hematologic malignancies, without conflating distinct constructs.
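The comparison described above can be expressed as a small check. The sketch below is illustrative and uses only the anxiety and depression values quoted in the text; the function name and data layout are hypothetical, and the full correlation matrix from Table 7 would be needed to repeat the check for every pair.

```python
def discriminant_violations(sqrt_ave, correlations):
    """Return construct pairs whose |r| is not below both constructs' sqrt(AVE)."""
    violations = []
    for (a, b), r in correlations.items():
        if abs(r) >= min(sqrt_ave[a], sqrt_ave[b]):
            violations.append((a, b))
    return violations

# Values quoted in the text for the most highly correlated pair.
sqrt_ave = {"anxiety": 0.934, "depression": 0.937}
correlations = {("anxiety", "depression"): 0.900}

print(discriminant_violations(sqrt_ave, correlations))  # []
```

An empty list means every listed pair satisfies the criterion.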
To sum up, these findings reinforce the utility of the Chinese version of the PROMIS-29 V2.1 as a reliable tool, mirroring the intricate nuances of patients' experiences and outcomes.This congruence in outcomes underscores the PROMIS-29 V2.1's potential as a pivotal tool in both clinical and research settings for this patient population.

Limitations
However, this study has several limitations. First, the participant pool, though multicenter, was confined to tertiary hospitals in China, so caution is warranted in extrapolating these findings to broader settings and populations. Second, the cross-sectional design precludes insights into the instrument's responsiveness and interpretability across clinical states, which should be addressed in future longitudinal studies. Third, this study did not examine how the questionnaire performs in pre- and post-treatment patient populations, which we plan to explore next.

Conclusion
This study evaluated the psychometric properties of the Chinese version of the PROMIS-29 V2.1 in patients with HM, using a multicenter sample. Our findings indicate that this version of the PROMIS-29 V2.1 is a valid and reliable instrument for measuring a spectrum of symptoms and functional attributes in HM patients. Future studies should consider incorporating Item Response Theory (IRT) methods, which would allow item-level analysis of performance and could improve the precision and applicability of the instrument. In conclusion, our study supports the psychometric properties of the Chinese version of the PROMIS-29 V2.1 and paves the way for its wider adoption in assessing and monitoring symptoms and functions among Chinese patients with HM.

Table 1. Sample characteristics (N = 354).

Table 6. Convergent validity of the PROMIS-29 V2.1. S.E., standard error; CR, Composite Reliability; AVE, Average Variance Extracted.