INTRODUCTION

Patient-reported outcome measures (PROMs) assess health status through direct input from patients or their proxy and can be administered without the involvement of a healthcare provider.1,2 They comprise validated questionnaires inquiring about aspects of patients’ lives, including functional status and psychological well-being. Some PROMs are general enough to administer to healthy individuals as well as individuals with many disorders. Others are disease-specific, applicable only to individuals with specific disorders.3 PROMs can provide not only information on the health status of individuals with chronic diseases but also real-time, patient-centric data, which can be used to assess the utility and efficacy of therapeutic interventions.4

In contrast to open-ended interviews, validated PROMs allow for objective assessment of outcomes, thus reducing bias.5 In clinics, patients can use PROMs to communicate their experiences and expectations as well as unmet needs.6 The use of PROMs in clinics has been shown to enhance communication between patients and providers; improve shared decision-making among providers, patients, and families; and increase patient satisfaction.7,8 In the setting of clinical trials, PROMs can be invaluable in understanding patient experiences and outcomes that are not typically assessed by observer-rated measures or investigational endpoints. Hence, PROMs can not only serve as outcome measures but also inform and improve the design and conduct of clinical trials.9 Increasingly, PROMs have been recognized as useful endpoints in evaluating the efficacy of interventions. In fact, PROMs have been utilized in nearly one-third of studies conducted for the approval of new drugs and molecular entities by the US Food and Drug Administration.10,11 Selection of the appropriate PROM is critical for obtaining approval of product labeling claims; if the chosen PROM has insufficient validation in the population of interest, such label claims may be denied.10,12 Thus, understanding the variation in PROM scores, the clinically meaningful differences, and the expected effect sizes with interventions in specific populations is crucial for the design of clinical trials.

PROMs have been utilized as outcomes in clinical trials for numerous rare diseases.11,13 Rare bone diseases account for 5% of all birth defects and encompass over 400 conditions.14,15 Osteogenesis imperfecta (OI), a prototypic rare bone disorder, is characterized by bone fragility, hearing loss, dental abnormalities, scoliosis, cardiopulmonary problems, joint laxity, and muscle weakness.16,17 Disease severity ranges from individuals with a few fractures, to those with severe skeletal deformities, to perinatal lethality.16 Most studies evaluating therapies for OI have focused on areal bone mineral density, serum markers of bone remodeling, and fracture rates as primary endpoints.18,19 However, these outcomes do not directly assess functionality, health status, and quality of life. Presently there is a lack of validated and disease-specific PROMs for OI.20 Developing and validating OI-specific PROMs can be onerous given the rarity and heterogeneity of the disorder and challenges in normalizing the outcome measures compared with an unaffected population. In this study, we have examined the validity and reliability of an existing PROM, the Pediatric Outcomes Data Collection Instrument (PODCI), in individuals with OI. PODCI was developed by the Pediatric Orthopedic Society of North America to assess functional health status in children with musculoskeletal disorders. The initial validation of this instrument assessed responses from 470 patients, of whom only a small number (n = 12) had OI.21 Using data from a large, multicenter, observational, natural history study of OI conducted by the National Institutes of Health Rare Disease Clinical Research Network’s (RDCRN) Brittle Bone Disorders Consortium (BBDC), we assessed whether the PODCI could track functional health status in children with OI. Additionally, we correlated scores obtained from PODCI with scores from a validated observer-rated scale of functional health, the Brief Assessment of Motor Function (BAMF).

MATERIALS AND METHODS

Study details

The data were collected from participants enrolled in the Longitudinal Study of OI (NCT02432625) conducted by the BBDC. The clinical sites of the BBDC include Baylor College of Medicine (Houston, TX), Kennedy Krieger Institute (Baltimore, MD), Nemours/Alfred I. DuPont Hospital for Children (Wilmington, DE), Oregon Health & Science University and Shriners Hospital for Children (Portland, OR), Shriners Hospital for Children (Chicago, IL), Shriners Hospital for Children (Montreal, QC), University of California–Los Angeles (Los Angeles, CA), University of Nebraska Medical Center (Omaha, NE), Hospital For Special Surgery (New York, NY), Shriners Hospital for Children (Tampa, FL), and Children’s National Medical Center (Washington, DC). Data were collected in a systematic manner across all sites in accordance with the manual of operations. All data were captured using online case report forms developed, housed, and managed by the RDCRN Data Management and Coordinating Center at the College of Medicine, University of South Florida (Tampa, FL). Individuals with an OI diagnosis by clinical and radiographic features or by molecular analysis were enrolled. The classification was made using the Sillence clinical criteria: mild, nondeforming (type I); perinatally lethal (type II); progressively deforming (type III); and moderate OI with normal sclera (type IV).22 The study procedures were approved by the institutional review boards of all participating clinical sites (Baylor College of Medicine, Kennedy Krieger Institute, Nemours/Alfred I. DuPont Hospital for Children, Oregon Health & Science University and Shriners Hospital for Children [OR], Shriners Hospital for Children [IL], Shriners Hospital for Children [QC], University of California–Los Angeles, University of Nebraska Medical Center, Hospital For Special Surgery, Shriners Hospital for Children [FL], and Children’s National Medical Center) and informed consent was obtained from subjects or their legal guardians.

Data collection

The following data were collected: age, gender, OI type (I, III, IV, or other), PODCI core scale scores, and BAMF scale scores from the enrollment visit. Bisphosphonate treatment status was not collected.

PODCI evaluates functional health status through an 83–86 item questionnaire. The pediatric form (2 years to 10 years 11 months) is completed by parents. The adolescent form (11 years to 18 years 11 months) is answered by parents (parent report) and adolescents (self-report). PODCI consists of scores on seven core scales, four encompassing physical function and three assessing psychological well-being. Physical function core scales include (1) Upper Extremity and Physical Function, which measures ability to perform activities of daily living with upper extremities, such as lifting books and turning doorknobs; (2) Transfer and Basic Mobility, which measures ability to perform activities of daily living with lower extremities and trunk, such as standing, sitting, and transfers; (3) Sports and Physical Functioning, which measures capacity for physical activity like running and walking more than one mile; and (4) Pain/Comfort, which quantifies pain and its interference with activities. Psychological well-being core scales include (1) Happiness, which assesses a child’s happiness with physical appearance, capabilities, and health; (2) Satisfaction, which measures the satisfaction with current functional status; and (3) Expectations, which quantifies the expectations of treatment.

The BAMF is an observer rating of a patient’s functional capability, with three scales: Fine Motor Scale, Upper Extremity Gross Motor Scale, and Lower Extremity Gross Motor Scale. Each scale has an integer score of 0 to 10, reflecting the patient’s highest functional capability, with 10 representing the highest possible function. The BAMF was completed by trained research personnel according to the study manual of operations.

Statistical analyses

Data extraction and analyses were performed primarily by one author (D.C., the statistician for the BBDC). The PODCI responses were normalized for age group (pediatric versus adolescent) using the algorithm from the Normative Data Study from the American Academy of Orthopedic Surgeons.23 Six of the core scales were converted to standardized scores, ranging from 0 to 100, with 100 representing the best function, highest well-being, least pain, or greatest expectations from treatment. In the general population, these standardized scores (excepting Expectations) range from the mid-80s to the high 90s, and a score of 85 is generally considered to be the lower limit of normal.24 The Expectations and Satisfaction Core Scale scores have not been calculated in healthy children; thus, there are no normative scores. A study of children with various musculoskeletal diagnoses found that Expectations standardized scores range from the mid-60s to the mid-70s.21 The Satisfaction Core Scale yields a raw score, ranging from 1 to 5, with 5 representing the greatest satisfaction. Scores on the four physical core scales were averaged to form the Global Functioning Scale, a composite of overall physical function and symptoms.
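As an illustration of the composite described above, the following minimal sketch averages the four physical core scale standardized scores into a Global Functioning score and flags values below the lower limit of normal of 85. The dictionary keys, function names, and example values are hypothetical, and the handling of missing scales (here, all four are required) is an assumption; the scoring algorithm cited above may treat missing data differently.

```python
from statistics import mean

# Hypothetical keys for the four PODCI physical core scales.
PHYSICAL_SCALES = (
    "upper_extremity",
    "transfer_mobility",
    "sports_physical",
    "pain_comfort",
)

# Score generally considered the lower limit of normal (from the text).
LOWER_LIMIT_OF_NORMAL = 85.0


def global_functioning(scores):
    """Average the four physical core scale standardized scores (0-100).

    Assumption: all four scales must be present; this sketch does not
    impute or skip missing scales.
    """
    missing = [name for name in PHYSICAL_SCALES if name not in scores]
    if missing:
        raise ValueError(f"missing core scales: {missing}")
    for name in PHYSICAL_SCALES:
        if not 0 <= scores[name] <= 100:
            raise ValueError(f"{name} must be between 0 and 100")
    return mean(scores[name] for name in PHYSICAL_SCALES)


def below_normal(score):
    """Return True if a standardized score is below the lower limit of normal."""
    return score < LOWER_LIMIT_OF_NORMAL


# Made-up example values, not study data.
example = {
    "upper_extremity": 90,
    "transfer_mobility": 80,
    "sports_physical": 70,
    "pain_comfort": 100,
}
composite = global_functioning(example)
```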

The mean standardized scores of PODCI core scales were compared using analysis of variance (ANOVA) and Tukey’s studentized range test. Scores on the BAMF Fine Motor Scale and Lower Extremity Gross Motor Scale were correlated with the PODCI Upper Extremity and Physical Function and Transfer and Basic Mobility Core Scales, respectively. Correlations between selected BAMF scales and selected PODCI core scales, between PODCI adolescent self-report and parent report scores, and between PODCI psychological core scale scores and Global Functioning Scale scores were assessed using the Pearson correlation coefficient. For correlation coefficients, 0.30 was interpreted as a weak positive relationship, 0.50 as a moderate positive relationship, and 0.70 as a strong positive relationship. For clinical trial readiness analysis, sample sizes needed to detect clinically meaningful differences were calculated using an unpaired t test at an alpha error of 0.05 and a power of 0.8.
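The correlation step can be sketched as follows. This is a minimal illustration, not the software used in the study: the function names are hypothetical, the anchors of 0.30, 0.50, and 0.70 are treated as lower bounds of each category (an assumption, since the text gives anchors rather than intervals), and the label for coefficients below 0.30 is likewise an assumption.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def interpret_positive_r(r):
    """Map a coefficient to the descriptive anchors given in the text.

    Assumption: each anchor is the lower bound of its category.
    """
    if r >= 0.70:
        return "strong"
    if r >= 0.50:
        return "moderate"
    if r >= 0.30:
        return "weak"
    return "none or very weak"  # label not defined in the text


# Made-up paired scores, not study data.
r = pearson_r([1, 2, 3], [2, 4, 6])
label = interpret_positive_r(r)
```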

RESULTS

Study population

By the date the data extraction was completed (7 November 2018), 460 children aged 18 years or younger were enrolled in the BBDC longitudinal study. One child was under the age of two years, and PODCI data were not available. Overall, 203 had OI type I, 95 had OI type III, 119 had OI type IV, and 42 had other OI types (types V, VI, VII, and unclassified). The demographic characteristics of participants are detailed in Table 1. Females represented 55% of the study population. The mean and median ages of participants in the three type I collagen-related OI types were between 9 and 10 years. Sixty percent of the participants were within the pediatric age group (2 years to 10 years 11 months), whereas 40% were in the adolescent age group (11 years to 18 years 11 months) as defined by PODCI. As the number of participants with rare forms of OI was limited, the primary analyses were limited to OI types I, III, and IV.

Table 1 Demographic characteristics of participants enrolled in the Brittle Bone Disorders Consortium

PODCI physical core scores are a measure of severity of OI

The mean standardized scores for the physical core scales of PODCI categorized by OI type and age for children between the ages of 2 years and 10 years 11 months are depicted in Fig. 1. The age-wise categorization represents preschool (age 2–5 years), early school (6–8 years), and later school (9–10 years) ages. Overall, individuals with OI type I (mild form) had the highest (best) scores, individuals with OI type III (severe form) had the lowest (worst) scores, and individuals with OI type IV (moderately severe) had scores between those observed in OI types I and III. The mean standardized scores for the Upper Extremity and Physical Function, Transfer and Basic Mobility, and Sports and Physical Functioning scales were significantly lower in individuals with OI type III compared with OI types I and IV. Whereas the standardized scores were largely similar between OI types I and IV, the mean scores were lower in OI type IV compared with OI type I in the Upper Extremity and Physical Function Core Scale for the early school age group and in the Sports and Physical Functioning Core Scale in the preschool age group. For the Pain/Comfort Core Scale, individuals with OI type III older than 6 years had scores that were statistically lower than individuals with OI type I.

Fig. 1

Standardized scores for Upper Extremity and Physical Function, Transfer and Basic Mobility, Sports and Physical Functioning, and Pain/Comfort Core Scales on pediatric parent report. The mean standardized scores on pediatric parent report Pediatric Outcomes Data Collection Instrument (PODCI), by osteogenesis imperfecta (OI) type and age group for the Upper Extremity and Physical Function Core Scale (a), Transfer and Basic Mobility Core Scale (b), Sports and Physical Functioning Core Scale (c), and Pain/Comfort Core Scale (d). The maximum score is 100. The score typically considered the lower limit of normal for children in the general population is 85. (*p < 0.05 analysis of variance [ANOVA]).

In adolescents (age 11 to 18 years), self-reported and parent-reported scores were analyzed (Fig. 2). By both self-reported and parent-reported scores, the mean standardized scores for the Upper Extremity and Physical Function and Transfer and Basic Mobility Core Scales were significantly lower in OI type III compared with OI types I and IV. For the Sports and Physical Functioning Core Scale, the mean scores differed among all three OI subtypes, whereas no differences were observed for the Pain/Comfort Core Scale.

Fig. 2

Standardized scores for Upper Extremity and Physical Function, Transfer and Basic Mobility, Sports and Physical Functioning, and Pain/Comfort Core Scales in adolescents. The mean standardized scores by self- and parent-report in adolescents by osteogenesis imperfecta (OI) type for the Upper Extremity and Physical Function Core Scale (a), Transfer and Basic Mobility Core Scale (b), Sports and Physical Functioning Core Scale (c), and Pain/Comfort Core Scale (d). The maximum score is 100. The score typically considered the lower limit of normal for children in the general population is 85. (*p < 0.05 analysis of variance [ANOVA]).

PODCI psychological well-being core scale scores do not differ among the OI types

The Happiness, Satisfaction, and Expectations standardized scores were not different among the OI types in children between 2 and 10 years (Supplementary Figure 1). Similarly, there were no differences among the OI types in these domains in the adolescent age group (Supplementary Fig. 2).

These results demonstrate that PODCI psychological well-being scores do not correlate with known physical function in OI. Happiness mean scores tend to decrease with age, with children age 2–5 having scores in the normative range of mid-80s to mid-90s, and adolescents with all types of OI having scores below this range. Expectation mean scores generally fall within the range of children affected by various musculoskeletal diseases.21 Ceiling effect has been demonstrated in normative populations for the Happiness Core Scale, but not for children affected by musculoskeletal diseases.21,24 Normative scores for Satisfaction were not available for comparison.

These results demonstrate that PODCI is more likely to reliably assess physical function compared with psychological well-being in OI.

Correlation between self- and parent-reported scores in OI

There has been debate on whether proxy-reported (i.e., parent or caregiver) outcomes are equivalent to patient-reported outcomes.3 To investigate the differences between proxy- and patient-reported outcomes in children with OI, we evaluated correlations between adolescent self-report and adolescent parent report scores on the seven PODCI core scales (Supplementary Table 1). The correlation coefficients were moderate to strong for all physical core scales: Upper Extremity and Physical Function (R~0.56–0.74), Transfer and Basic Mobility (R~0.72–0.92), Sports and Physical Functioning (R~0.74–0.89), and Pain/Comfort (R~0.41–0.67). For the measures assessing psychosocial well-being, the correlations were moderate to strong for Satisfaction (R~0.63–0.89), but not Happiness (R~0.26–0.64) or Expectations (R~0.35–0.42).

Correlation between PODCI and BAMF

To assess the validity of PODCI physical core scales as a measure of physical functioning in children with OI, we correlated these measures with appropriate corresponding scales of the BAMF, an observer-rated and validated measure of physical function (Table 2).25 The gross motor scales of BAMF have been validated in children with OI, and the fine motor scale of BAMF has been validated in children with other rare diseases involving the musculoskeletal system.25,26 The PODCI Upper Extremity and Physical Function scale was compared with the BAMF Fine Motor Scale. The PODCI Transfer and Basic Mobility scale was compared with the BAMF Lower Extremity Gross Motor Scale. In the age group 2–10 years 11 months, correlation coefficients between PODCI Upper Extremity and Physical Function and BAMF Fine Motor range from 0.45 to 0.57 (p < 0.001). Correlation coefficients between PODCI Transfer and Basic Mobility and BAMF Lower Extremity range from 0.63 to 0.83 (p < 0.001). In the adolescent age group, we did not find significant correlations between PODCI Upper Extremity and Physical Function and BAMF Fine Motor scores. However, correlations between PODCI Transfer and Basic Mobility and BAMF Lower Extremity Gross Motor were strong (R~0.68–0.74).

Table 2 Correlation between PODCI and BAMF

Correlation between physical functioning and psychological well-being in OI

Though the psychological well-being of children with OI has been previously studied, there is a paucity of knowledge about the factors that modify quality of life in this disorder.27,28,29 To assess the impact of physical function on psychological well-being, we investigated the relationship between PODCI Global Functioning Scale score (the mean of the standardized scores on the four physical core scales) and selected psychological core scale scores (Supplementary Table 2). Happiness and Global Functioning standardized scores showed weak correlations in OI types I (age 2–10: R = 0.41, p < 0.001; age 11–18: R = 0.44, p < 0.001) and IV (age 2–10: R = 0.29, p < 0.05; age 11–18: R = 0.44, p < 0.05), but not OI type III. The Satisfaction standardized scores showed no or weak correlations with Global Functioning standardized scores, though the correlation coefficients were inconsistent. In general, we observed more correlation between physical and psychological core scales in individuals with OI types I and IV than in individuals with OI type III. These results indicate that while physical function may have an effect on overall psychological well-being, the strength of correlation is weak, especially in those with OI type III.

Applicability of PODCI in clinical trials

An important aspect of natural history studies is to facilitate clinical trial readiness, wherein insights obtained from the data inform clinical trial design. Thus, based on the mean and interindividual variation, we estimated the necessary sample sizes to detect minimum clinically important differences (MCIDs) in standardized scores on the Upper Extremity and Physical Function and Transfer and Basic Mobility Core Scales in OI. MCIDs in the PODCI scores have not been defined for individuals with OI but have been estimated in studies involving children with cerebral palsy and Duchenne muscular dystrophy, which also cause significant limitations in physical function. In cerebral palsy, MCIDs for the Upper Extremity and Physical Function Core Scale have been reported to be between 2.0 and 9.5, whereas the MCIDs for the Transfer and Basic Mobility Core Scale have been between 2.0 and 10.5.30 In Duchenne muscular dystrophy, one study reported the MCID for the Transfer and Basic Mobility scale to be 4.5.31 The numbers of participants with OI required to detect different effect sizes in PODCI scores are listed in Table 3.
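To make the sample size reasoning concrete, the sketch below estimates the number of participants per group needed to detect a given difference in mean standardized score in a two-group, unpaired comparison at an alpha of 0.05 and a power of 0.8. It uses the common normal approximation, n = 2((z_{1-α/2} + z_{1-β})σ/Δ)², and assumes a two-sided test and equal standard deviations in both groups; the exact procedure used for Table 3 is not described here, so results may differ slightly from those in the table. The function name and the example standard deviation are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.8):
    """Approximate per-group sample size for an unpaired two-group comparison.

    delta: difference in means to detect (e.g., a candidate MCID).
    sd:    assumed common standard deviation of the score.
    Uses the normal approximation with a two-sided alpha (an assumption);
    an exact t-based calculation gives slightly larger numbers.
    """
    if delta <= 0 or sd <= 0:
        raise ValueError("delta and sd must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n = 2 * ((z_alpha + z_power) * sd / delta) ** 2
    return math.ceil(n)


# Hypothetical inputs: a 4.5-point difference (the MCID reported for
# Duchenne muscular dystrophy) with a made-up standard deviation of 15.
n = n_per_group(delta=4.5, sd=15)
```

Because the required sample size scales with (σ/Δ)², halving the detectable difference or doubling the standard deviation roughly quadruples the number of participants needed per group.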

Table 3 Sample sizes required to detect a range of differences in various measures of physical function using PODCI in OI

DISCUSSION

To date, there exist no disease-specific PROMs to assess disease burden and functional outcomes in OI, and thus various PROMs including Pediatric Evaluation of Disability Inventory, WeeFIM™, Pediatric Quality of Life Inventory™, Patient-Reported Outcome Measurement Information Systems (PROMIS)®, and PODCI have been used in the clinical and clinical investigational settings.20,32,33,34,35,36,37,38 These PROMs assess different but overlapping domains of functionality and quality of life, and our group has previously reported that there is some degree of correlation between measures assessed by different PROMs like PROMIS® and PODCI in OI.20 However, most studies have been conducted in small cohorts with limited representation of age and disease severity. The precision and accuracy of the approach of repurposing existing PROMs for OI have not been systematically analyzed. In this study, using data from a large cohort, we tried to answer the following important questions about the use of PODCI, a prototype PROM for musculoskeletal diseases: (1) How reliable are the physical function measures in assessing severity of OI in children and how do they compare with the normative values from the general population?; (2) Do the measures of physical function change with age?; (3) Do physical function measures reported by patients correlate with observer-rated assessments?; (4) How “happy” and “satisfied” are children with OI?; (5) Do physical function measures correlate with psychological well-being?; and (6) What is the interindividual variation in the measures of physical function and how does this impact the use of PODCI in clinical trials?

We show that PODCI physical core scores accurately reflect the known clinical severity of OI, with children with OI type I having the highest scores, children with OI type III having the lowest scores, and those with OI type IV intermediate between these. These results are consistent with previous studies using other PROMs like PedsQL™ in OI and attest to the fact that validated PROMs indeed can be used as a surrogate marker of disease severity.27 Generally, we found that the mean scores for the physical function measures were either in the normal range or just below the normal range (mid-80s) in individuals with OI type I across all age groups.24 We observed that the standardized scores for the Upper Extremity and Physical Function Core Scale were higher in older children, whereas such age-related increases were not apparent in the Sports and Physical Functioning Core Scale. This implies that children with OI are likely to learn, adapt to, and overcome limitations that require fine motor functioning but not activities that require strenuous exertion. Additionally, parents of children with OI may have prevented their participation in such activities and thus precluded them from the opportunity to develop adaptive mechanisms for sports. The Pain/Comfort scores were lower in all OI types compared with general population scores, attesting to the fact that pain interferes with activities in this population. Importantly, we found that the Transfer and Basic Mobility Core Scale of PODCI shows a moderate-to-strong correlation with BAMF. These results show that PODCI is a reliable and valid measure of physical functioning in children with OI.

Clinical experience has taught us that in spite of the significant health challenges, individuals with OI have a remarkably positive attitude toward life. Terms such as “emotional endurance” and “resilience” have been used to reflect this positive and “can-do” attitude.29 Thus, we were particularly interested in measures that assess psychological health. The Happiness, Satisfaction, and Expectations scores were not different between the OI subtypes but Happiness scores were generally lower than scores observed in the general population. Normative values for Satisfaction and Expectations scores could not be found in the literature, but Expectations scores fall within or close to the range of scores in children affected by various musculoskeletal diagnoses.21 This attests to the fact that children with OI face quality of life challenges and expect better outcomes than observed with the current standard therapies. Interestingly, the strength of correlation between Global Functioning Scale score, a composite of overall physical function, and psychological well-being scores was weak, suggesting that physical functioning is not the only determinant of psychological health in OI.

One important decision to make when using PODCI is whether to use self-reported scores or parent-reported scores. In many studies of quality of life in children with genetic conditions, including studies in OI, parents rate their children’s quality of life lower than the children themselves do.3,39 We found that parents and adolescents give concordant responses with regard to physical functioning, but not when reporting psychological well-being. These findings reflect the commonly accepted notion that the child is the best judge of his or her own quality of life and functioning.3 Therefore, while using a self-report PROM remains the recommendation, our findings indicate that a proxy-reported PROM may closely approximate self-report with regard to physical functioning.

Our results have direct implications for the design of clinical trials. When choosing appropriate outcome measures for use in a clinical trial, it is important to consider the sample sizes needed to detect clinically significant differences in the outcome measures. Using the data from this large cohort, we show that interindividual variation is larger in younger children compared with adolescents and thus any clinical trial focused on physical function in younger children will need to have larger sample sizes. With the typical number of individuals who are enrolled in clinical trials in OI, it is possible to detect differences in physical functioning with a crossover design or a “before and after” paired analysis; however, this is not the case with a randomized, parallel group, placebo-controlled design. For example, to detect a small difference in Transfer and Basic Mobility, a sample size of over 300 individuals would be required when enrolling younger children. Our results also show that any improvements in physical functioning in children with OI type I are likely to be small as their baseline scores are close to the lower limits of the normative data from the general population and thus any trial that enrolls children with type I collagen-related OI may have to estimate sample size based on a smaller effect size. One significant challenge with the use of any PROM in OI is that physical and psychological scores can change quickly based on how an acute event like fracture affects the outcome measure. This is likely to reduce the correlation between repeat measures and such factors have to be considered in analysis.

In summary, we show that PODCI can reliably assess physical function in children with OI. Our results illustrate the opportunities and challenges with use of a PROM like PODCI in clinical trials. The lessons learned from this study are critical for the development and validation of OI-specific PROMs, which are a focus of ongoing research in the BBDC.