To the Editor:

Hayeems and colleagues recently published an article on the development of a novel measure of clinical utility titled the Clinician-reported Genetic testing Utility InDEx (C-GUIDE™) [1]. C-GUIDE is an 18-item tool with 9 supplemental questions for clinical context. The authors note that the field currently lacks a “single validated measure that quantifies clinical utility” [1] and that the C-GUIDE was developed to address this gap. We commend the authors for their efforts to develop a tool to measure clinicians’ perceptions of clinical utility in genomics, given the current lack of validated measures in the field and the importance of assessing utility to evaluate the use of genomics in clinical practice. As researchers conducting instrument development in our own work, we read the article with great interest. Unfortunately, the C-GUIDE seems to be an underdeveloped measure, the design of which neglected many psychometric best practices. We are concerned that C-GUIDE’s dissemination and wide use in its current form could adversely impact the field, particularly given that little evidence of its reliability or validity was presented.

A major issue with the C-GUIDE is that the authors appear to have conflated defining a conceptual model with developing a measure. Scale development best practices call for first defining the construct one is trying to measure, including the development of a nomological net [2], and then conducting item development [3,4,5,6]. Establishing a nomological net, which includes the internal structure of the construct (including subconstructs) and a description of the construct’s relationship to external constructs, helps ensure the development of a reliable and valid measure of the construct of interest with low measurement error [3,4]. In contrast, Hayeems et al. changed the model of their construct in response to feedback on specific items. For example, the authors describe consolidating two subconstructs (i.e., family implications and psychosocial implications) based on respondent answers to individual items. It would be more appropriate to rewrite items and add new ones to ensure adequate representation of the subconstructs that comprise the model, which helps ensure structural and content validity, rather than reconceptualizing the model of the construct based on poorly written items [3,4]. This is particularly true given that their literature review and stakeholder interviews indicated five separate domains or subconstructs of clinical utility.

There are other methodological issues of concern in the C-GUIDE development, related to the number of items that were developed and to the scoring procedures for the instrument. Specifically, instrument development best practices call for initially writing many more items than one plans to include in the final measure (i.e., 2–5 times as many), which in this case would have allowed for a more robust item elimination process [3,4]. Moreover, while an instrument that produces a single score to reflect overall impact or utility is theoretically appealing and would be a substantial advancement over the multicategory checklists used in previous studies [7], the scoring procedure used to arrive at the C-GUIDE’s single total score is problematic because the item count per subconstruct varies. Therefore, the C-GUIDE’s single total score weights subconstructs with more items more heavily and does not represent the subconstructs equally. In addition, it appears that the authors asked respondents to choose a scoring approach without themselves first taking quantitative considerations into account. For instance, over 70% of the tool’s items have two- or three-point response options, which would lead to lower reliability than four-, five-, six- or seven-point scales [8] and could limit the tool’s utility in statistical analyses.
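To make the weighting concern concrete, the following is a minimal sketch of how a summed total score implicitly weights subconstructs when item counts differ. The subconstruct names, item counts, and points per item are hypothetical values chosen for illustration; they are not the C-GUIDE's actual layout or scoring rules, and the equal-weight alternative shown is only one possible remedy, not a method proposed by the authors.

```python
from fractions import Fraction

# Hypothetical layout: (number of items, maximum points per item) for each
# subconstruct. Illustrative assumptions only, not the C-GUIDE's structure.
SUBCONSTRUCTS = {
    "domain_a": (8, 2),
    "domain_b": (4, 2),
    "domain_c": (2, 2),
}


def implicit_weights(layout):
    """Share of the maximum summed total score contributed by each subconstruct."""
    max_points = {name: n * pts for name, (n, pts) in layout.items()}
    total = sum(max_points.values())
    return {name: Fraction(p, total) for name, p in max_points.items()}


def equal_weight_score(raw_scores, layout):
    """Mean of per-subconstruct proportions, so each subconstruct counts equally."""
    props = [
        Fraction(raw_scores[name], n * pts) for name, (n, pts) in layout.items()
    ]
    return sum(props) / len(props)


if __name__ == "__main__":
    for name, weight in implicit_weights(SUBCONSTRUCTS).items():
        print(f"{name}: {weight} of the summed total")
    # A respondent scoring the maximum on domain_a only.
    example = {"domain_a": 16, "domain_b": 0, "domain_c": 0}
    print("equal-weight score:", equal_weight_score(example, SUBCONSTRUCTS))
```

Under these assumed counts, the largest subconstruct accounts for more than half of the summed total, whereas averaging per-subconstruct proportions gives each subconstruct the same influence on the overall score.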

We agree with the authors that the potential impact on family members and nonclinical domains is important to include in any score that attempts to summarize the impact of genomic sequencing, but it is noteworthy that, per the C-GUIDE’s current administration, familial and psychosocial implications are assessed by the clinician [9]. Because clinicians’ perceptions of family impact may not accurately reflect families’ perceptions, the instrument should not be used in lieu of familial or psychosocial measures of perceived utility without first showing that clinician perceptions as assessed via the C-GUIDE are correlated with familial perceptions (i.e., criterion validity).

Finally, it is not advisable to widely disseminate a measure without first showing compelling evidence of the measure’s reliability and validity (e.g., test–retest reliability, interrater reliability, concurrent and predictive criterion validity, convergent and divergent validity, structural validity, sensitivity to change) [4]. While it is theoretically possible for the measure, in its current form, to show compelling evidence of reliability and validity in future studies, we are concerned that the problems we have outlined in the measure development process may contribute to limited reliability and validity overall. Specifically, while the authors cited Fryback and Thornbury’s hierarchical model of efficacy [10] as the basis for developing their construct of clinical utility, they did not describe or refer to any theoretically sound methodology for scale development. The authors indicate their intention to conduct future work on reliability and validity; however, little evidence is presented in the current publication, even though the tool’s publication and dissemination give the impression that C-GUIDE is ready for use by others in the field.

There is a dearth of tools available to assess clinical utility, and thus, developing psychometrically sound measures of this construct is critical for evaluating the use of genomics in clinical practice. We commend the authors on their review of the literature and on gathering feedback from clinicians in their process. Indeed, incorporating findings from the literature and engaging experts and a measure’s target population are among the recommendations from scale development best practices [3,4]. Nevertheless, given the methodological deficiencies we have outlined here, C-GUIDE, in its current form, appears underdeveloped, and its use by others in the field is likely premature.