Introduction

A decade after the release of the seminal Institute of Medicine (IOM) report, Crossing the Quality Chasm, the US health-care system continues to experience wide variations in practice and suboptimal quality resulting from major gaps between evidence and practice.1 Among recommendations proposed by the IOM to address individual and organizational barriers that impede the implementation of best practices, the IOM advocated for the cultivation of a strong organizational culture oriented around quality, clinician leadership, simplification and standardization of workflows, and use of interdisciplinary teams in complex care delivery situations.1,2 In particular, a system-based approach to bridge these gaps and improve quality of care was endorsed. Quality measurement, improvement, and accountability can create an infrastructure to support evidence-based practice.3 Furthermore, with the uptake of value-based care and the rise of accountable care organizations, quality measurement is an important component in these efforts, where providers and health-care organizations are evaluated, held accountable, and reimbursed accordingly.

To facilitate quality reporting and monitoring, quality metrics and report cards have been integrated into health-care services and public health programs since the 1990s. Various health-care systems report comparative ratings by hospital, procedure, and physician. Over time, greater insights into institutional and geographic differences in services and outcomes have been achieved. Among many examples, Dartmouth Medical School’s Center for the Evaluative Clinical Sciences measures outcomes in clinical performance related to coronary artery bypass graft (CABG) surgery, cystic fibrosis, and spine care. Hospitals are assessed by the Joint Commission, and their performance can be tracked via Hospital Compare (AHRQ). The National Committee for Quality Assurance (NCQA) evaluates managed care plans in the areas of clinical performance, member satisfaction, access to care, and overall quality, and makes the reports available to employers and purchasing groups. Moreover, public grading systems have been taken up by commercial enterprises, such as Health-Choice, Inc., in response to consumer advocacy, reflecting the patient’s “right to know” and to choose the health-care system that is best for the individual patient.

Quality measurement in genetics has lagged behind other areas of health care. As the field continues to expand and evolve, there has been and will continue to be a rapid expansion in genetic knowledge about disease etiology and development of new genetic tests and services.4,5 A number of articles and reviews have called for the development of new outcome measures and the associated importance of process measures.6,7,8,9,10,11 While there has been gray literature based on clinician recommendations, evidence to date has not systematically identified the breadth of outcome measures that should be applied to the evaluation of genetic services.5,12

In genetics, public reporting occurs but is limited and not comprehensive. Nevertheless, there are several examples at the national level. First, the March of Dimes publishes a report card based on the number of disorders screened in newborns by the 50 states and the District of Columbia.13 Second, the Centers for Disease Control and Prevention (CDC) has graded state Birth Defects Registries since the early 2000s. Third, the National Newborn Screening and Genetics Resource Center (NNSGRC) posted reports compiled by state and territorial health departments up until 2000, and has since been replaced by the NewSTEPS Data Repository.14 Finally, the CDC offers, on request, proficiency testing for newborn screening laboratories. However, comparable information is not available for public health, clinical, and other laboratory genetic services in the United States. In addition, a number of instruments have been published for self-assessment, but they focus mostly on newborn screening. The Program Evaluation and Assessment Scheme (PEAS) for newborn screening developed by the Maternal and Child Health Bureau, Genetic Services Branch of the Health Resources and Services Administration (HRSA), and the NNSGRC, and the Menu of Positive Outcomes of Genetic Services from the Western States Genetic Services Collaborative (WSGSC) exemplify these efforts. In general, no existing instrument comprehensively assesses genetic services for conditions throughout the life cycle.

Therefore, the objective of this study is to identify and develop a set of metrics to be included in a genetic service assessment (GSA) tool, to inform individual states' genetic services programs.

Materials and methods

The following process was employed to specify the metrics set. We (1) conducted a systematic review of available literature on measurement in genetic medicine, (2) convened an expert panel for discussion and selection of metrics via the modified Delphi process, (3) pilot tested the selected metrics for validity, and (4) implemented the metrics in one region of the United States to assess the feasibility of implementation and dissemination.

Review of existing literature and metrics

To identify performance metrics, we conducted searches in MEDLINE, using search terms related to medical genetics (e.g., state medical/genetic screening/genetic/counseling) and quality (e.g., quality indicators/control).8 Internet searches of gray literature, and key informant interviews with several experts in genetics, were also conducted to capture measurements and standards that have been developed. An EndNote library was created to catalog the literature, available guidelines/recommendations, and published measurements. These efforts yielded an exhaustive list of measures and standards for consideration by the expert panel.

Expert panel and Delphi process

We convened an expert panel, drawing from a pool of national experts with content knowledge and expertise in genetics, performance measurement methods, or both. The expert panel was tasked to select, prioritize, and specify performance metrics through participation in a modified Delphi process (Fig. 1).15,16 The final expert panel included 24 individuals from various disciplines and represented the following organizations: Genetic Alliance; HRSA and its Genetic Services Branch; the American College of Medical Genetics and Genomics; Heartland Genetics Collaborative Advisory Board; March of Dimes; CDC Birth Defects Monitoring Program; Regional Genetics Service Collaboratives in Western, New England, and Mountain States; and the NNSGRC. Included on the 24-member panel were medical geneticists, state genetics coordinators/genetic counselors, patient and family advocates, and experts in quality improvement and metrics development.

Fig. 1 The modified Delphi process

The expert panel held a series of in-person meetings and teleconferences to rank the list of measures and standards. Expert panel members ranked potential metrics based on the criteria established by the Strategic Framework Board of the National Quality Forum (NQF): (1) importance/relevance in the quality improvement of genetic services, (2) sufficiency of scientific evidence in support of the measure elements, and (3) feasibility to collect data needed for measurement.17,18 Each measure/standard under consideration was ranked on a five-point Likert scale ranging from 1, not very important, to 5, very important, for each criterion. Thus each measure or standard had three scores, one each for importance, evidence, and feasibility. After the first round of the modified Delphi process, metrics with scores that ranked in the top half across the three criteria were then included for further discussion and the ranking process was repeated. Panel members were also asked to suggest additional or alternative metrics. The iterative modified Delphi process led to the selection of a limited list of metrics for pilot testing from the set identified via the literature review.
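The ranking step described above can be illustrated with a minimal sketch. It assumes that each panelist's three criterion ratings are averaged across the panel and summed into one composite score per metric, and that the top half by composite score is retained; the text does not specify the exact aggregation rule or how ties were handled, so both are assumptions. The metric names and ratings are hypothetical.

```python
from statistics import mean

# The three NQF-based criteria named in the text.
CRITERIA = ("importance", "evidence", "feasibility")


def composite_scores(ratings):
    """Map each metric to the sum of its panel-mean criterion scores.

    ratings: {metric: [ {criterion: 1..5}, ... ]} with one dict per panelist.
    Summing the three means is an assumed aggregation rule.
    """
    return {
        metric: sum(mean(r[c] for r in panel) for c in CRITERIA)
        for metric, panel in ratings.items()
    }


def retain_top_half(ratings):
    """Return the metrics in the top half by composite score (rounded up)."""
    scores = composite_scores(ratings)
    # Sort by descending score; break ties alphabetically (an assumption).
    ranked = sorted(scores, key=lambda m: (-scores[m], m))
    keep = (len(ranked) + 1) // 2
    return ranked[:keep]


# Hypothetical ratings from a two-person panel for four candidate metrics.
example = {
    "metric_a": [
        {"importance": 5, "evidence": 4, "feasibility": 4},
        {"importance": 4, "evidence": 4, "feasibility": 5},
    ],
    "metric_b": [
        {"importance": 3, "evidence": 3, "feasibility": 2},
        {"importance": 2, "evidence": 3, "feasibility": 3},
    ],
    "metric_c": [
        {"importance": 4, "evidence": 5, "feasibility": 3},
        {"importance": 5, "evidence": 4, "feasibility": 4},
    ],
    "metric_d": [
        {"importance": 2, "evidence": 2, "feasibility": 2},
        {"importance": 3, "evidence": 2, "feasibility": 1},
    ],
}

retained = retain_top_half(example)
```

In this hypothetical example, metric_a and metric_c carry forward to the next round. A panel could equally require a metric to rank in the top half on each criterion separately, which would be a stricter rule than the composite used here.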

Pilot testing for metric validation and refinement

Two state genetics programs were selected from the Western US region for pilot testing. The two programs were selected because they have been leaders or high performers in implementing genetics-related quality activities. Each state’s genetic coordinator was approached for participation. Upon agreement, each site was asked to complete the GSA tool, identify individuals who provided the response, and provide documentation supporting metric achievement. At site 1, the genetic coordinator completed the instrument with the help of four genetic counselors on staff. At site 2, the genetic coordinator completed the tool with assistance from genetic counselors, the laboratory director, and other staff members. Site visits followed to collect qualitative information about the ease of administering the instrument and the feasibility, utility, and application of the tool.

Feasibility testing

To assess the feasibility of implementing the metric set, we recruited all eight states in the Heartland region (Arkansas, Iowa, Kansas, Missouri, Nebraska, North Dakota, Oklahoma, and South Dakota) to participate in feasibility testing. Each state representative and their designees completed the GSA tool over the course of 3 months. Three states were selected for follow-up key informant interviews, after data illustrating individual and aggregated performance on each measure were computed and disseminated. The interviews assessed item appropriateness and clarity, barriers and facilitators of implementation, and suggestions for process improvement.

Results

The expert panel first held a planning meeting to identify concepts and dimensions of assessing quality that would apply to medical genetics and classify them using the Donabedian framework of relating structure and process to outcomes.19,20 Structure is defined as how the delivery of genetic services is organized and supported by the organization and infrastructure that comprise the delivery system. Process represents the actions taken and how care is delivered, including patient–provider interactions. Outcomes are the results of services received.19,20 Under the structure domain, panel members identified workforce, training/education, information systems, and type of programs provided as major components of quality genetic services. Patient–provider interactions, care coordination and management, quality assurance and improvement mechanisms, and care provision and service delivery were identified as important processes in providing quality genetic services. Given the complexity in identifying causes of health outcomes, the panel decided to focus on process-of-care outcomes (e.g., screenings and referrals) in measuring quality for genetic services.

Concepts and dimensions identified at the inaugural expert panel meeting guided the literature review. The systematic review of literature, guidelines, professional statements, and key informant interviews yielded a list of 61 measures and standards, which were discussed at subsequent expert panel meetings. Panel members ranked the 61 measures and standards according to the three NQF criteria, assigning a score for each criterion. Ratings were computed to generate a score for each standard. We retained 32 for a second round of the modified Delphi process via electronic communication. After two more iterations of the Delphi exercise and an in-person meeting to discuss Delphi results, consensus was reached to retain a metric set of 21 elements for pilot testing.

Pilot testing took place in two states that have consistently been high performers in providing a large scope of genetic services and therefore had the capability to provide information on every measure. The two sites were able to provide data on all 21 measures and standards. These results helped to identify items or measures for elimination to further shorten the metric set and reduce burden. The final metric set contains 16 measures and standards, classified into five domains: five describing a state’s service capacity; three in access to genetic services; four in data systems; four in performance reporting; and two in workforce. Pilot testing results informed changes in language to enhance clarity, refined definitions for each metric, identified processes to improve feasibility for data collection, and supported documentation to demonstrate how a measure is met. Findings from the site visits informed the implementation process.

The refined metric set was implemented in the Heartland region for feasibility testing. Aggregate data were computed to understand the delivery of genetic services within the region, and individual reports were generated for each state to provide information on the state’s genetic service delivery in comparison with the aggregate. Tables S1–S4 present the aggregated results from feasibility testing. We found that states were able to meet most of the 16 metrics in the set. However, gaps were found in measures related to states' support of prenatal services and adult screening, assessment and collection of patient feedback, staffing adequacy, and use of data and registry.
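As a rough illustration of how aggregate and individual state reports of this kind could be derived, the sketch below computes the share of states meeting each metric and then pairs one state's answers with the regional share. The state labels, metric names, and yes/no response format are hypothetical and are not taken from the GSA tool or Tables S1–S4.

```python
def regional_summary(responses):
    """Return {metric: share of states meeting it}.

    responses: {state: {metric: bool}}; a missing answer counts as not met
    (an assumption made for this sketch).
    """
    metrics = sorted({m for answers in responses.values() for m in answers})
    n_states = len(responses)
    return {
        m: sum(bool(answers.get(m, False)) for answers in responses.values()) / n_states
        for m in metrics
    }


def state_report(state, responses):
    """List (metric, state_met, regional_share) rows for one state."""
    summary = regional_summary(responses)
    own = responses[state]
    return [(m, bool(own.get(m, False)), share) for m, share in summary.items()]


# Hypothetical responses from four states on three illustrative metrics.
example = {
    "State A": {"newborn_screening": True, "adult_screening": False, "patient_feedback": True},
    "State B": {"newborn_screening": True, "adult_screening": True, "patient_feedback": False},
    "State C": {"newborn_screening": True, "adult_screening": True, "patient_feedback": False},
    "State D": {"newborn_screening": True, "adult_screening": True, "patient_feedback": False},
}

summary = regional_summary(example)
report_a = state_report("State A", example)
```

Each row of a state's report shows whether that state met the metric alongside the proportion of states in the region that did, which is one simple way to present a state's delivery in comparison with the aggregate.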

The qualitative data generated from three key informant interviews were synthesized, describing the feasibility of data collection, perceived utility, and limitations of the GSA tool. Most acknowledged that the data collection proceeded smoothly because of the institutional knowledge of the genetic coordinators and staff. However, it was noted that the sources of information could be fragmented as “different people have pieces of the pie.” The respondents also regarded the GSA tool as valuable for internal assessment to identify gaps to be addressed by the state in strategic planning. One respondent noted, “This tool definitely encourages thought and reflection about a program’s services, both those that are currently provided and those that are being developed.” Another said, “This tool serves as a stepping stone to mobilize QI efforts within the state.” Concerning the limitations of the GSA tool, the metrics at the time of the implementation trial assessed only the presence and availability of program/services, not their use or actual “quality.” A respondent summarized, “The tool does not get at potential problems relating to the quality of services provided; those ‘ground level’ characteristics are not addressed with this tool.”

These implementation results informed the development and refinement of the GSA metric instrument to arrive at version 2 for broader dissemination (Tables 1, 2, 3, and 4). These results were the basis for developing definitions and determining weights and scoring for each element (Supplementary Materials). They helped distinguish measures that constitute the basic building blocks of a quality program in genetics from aspirational measures. The basic building blocks were weighted more heavily than aspirational measures, which were those that only a limited number of states could meet at the time of implementation. Nevertheless, the aspirational measures served as goals that all states were striving to meet in the near future.
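The weighting idea can be sketched as follows. The actual weights and scoring rules are given in the Supplementary Materials and are not reproduced here; the weights below (2 for a basic building block, 1 for an aspirational measure), the metric names, and the 0 to 100 scale are assumptions chosen only to show the mechanics.

```python
def weighted_score(met, weights):
    """Return the percentage of total weight earned by the metrics that are met.

    met: {metric: bool}; weights: {metric: positive number}.
    A metric missing from `met` is treated as not met (an assumption).
    """
    total = sum(weights.values())
    earned = sum(w for metric, w in weights.items() if met.get(metric, False))
    return 100 * earned / total


# Hypothetical weights: basic building blocks count double.
weights = {
    "newborn_screening_program": 2,   # basic building block (assumed)
    "genetic_counseling_access": 2,   # basic building block (assumed)
    "adult_screening_support": 1,     # aspirational measure (assumed)
}

# Hypothetical state that meets both basic measures but not the aspirational one.
met = {
    "newborn_screening_program": True,
    "genetic_counseling_access": True,
    "adult_screening_support": False,
}

score = weighted_score(met, weights)
```

Under these assumed weights, a state that misses only the aspirational measure still earns 80 of 100 points, so gaps in the basic building blocks lower the score more than gaps in aspirational measures.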

Table 1 Genetic service assessment (GSA) domains: state capacity of services and access
Table 2 Genetic service assessment (GSA) domain: clinical processes and quality improvement
Table 3 Genetic service assessment (GSA) domain: performance reporting/improvement
Table 4 Genetic service assessment (GSA) domain: workforce

Discussion

Genetic services compete with other health-care services for scarce resources. In this climate of expansion in genetic services and knowledge, with the potential to impact resource allocation and new services for patients and families, this project systematically developed a set of 16 comprehensive quality metrics to evaluate public health genetics and provide agencies, such as state health departments, with a tool to assess their efforts in providing genetic services.

The list of 16 quality indicators includes (1) structural metrics elucidating access to care, workforce, and program and service capacity; (2) process of care metrics describing provider–patient interactions, continuity of care, quality programs, and performance tracking; and (3) outcome measures focusing on reports/records of tracking eligible patients receiving indicated services. Experts and pilot study participants agreed that enhancing access to service and care coordination would be key in improving genetic services, especially with limited resources. Workforce measures address concerns about the paucity of properly trained genetic professionals and the lack of a “pipeline” for training and career development for genetics service providers. Metrics describing data systems, service capacity, and delivery focused on the process of providing care and potential for improvement. However, metrics related to health outcomes are more limited. The expert panel acknowledged that health outcomes for genetic services had not been fully developed and agreed upon, and while we will continue to follow their development, health outcomes are currently outside the scope of this project. It is worth noting that Doyle et al.9 presented a list of outcome measures related to hereditary breast and ovarian cancer and Lynch syndrome, which was identified through the work of the Genomics and Public Health Action Collaborative. Nevertheless, developing health outcomes for genetic services remains difficult compared with other medical specialties.9 Genetic services often bridge many generations and generally lack procedures/medications that “heal” or “cure” the disorder, and even if a disorder is prevented from further manifestation, the magnitude of change is difficult to capture and quantify.10

Assessing the feasibility of implementing the GSA tool was an important goal as well. The utility of these metrics would be severely limited if their implementation proved infeasible. The pilot testing allowed administration of the GSA tool in states with adequate resources, and established infrastructure and extensive service capacity, to determine feasibility of data collection and make subsequent changes for broader implementation in states with potentially less capacity. Synthesis of qualitative feedback from the pilot states led to the reduction of the metric set to facilitate implementation; language and definitions for each measure were also refined. Feasibility testing further informed the implementation effort, which covered a wide region of the United States with diverse states with varying resources and infrastructure. These results provided information about the ease of implementing the GSA tool, as well as utility and application of the tool.

Wide variations among states and across regions in genetic service delivery have drawn interest from clinicians and public health experts. This work is the first opportunity to develop a comprehensive set of genetics-specific quality tools for both clinical and public health genetics that stretch beyond newborn screening to cover the entire life cycle. This set of metrics meets the criteria of being clinically important, evidence-based, and feasible for measurement and reporting. It was also developed with the input of diverse stakeholders and the buy-in of users. It should be noted that service delivery within states may be fragmented, with services provided either directly by individual state health departments or through contracts. Consequently, many states have not been equipped to collect information in a comprehensive manner to fully assess their own capacity. This tool allows for collection of this information. A systematic survey of the genetic service landscape may be useful for state genetics coordinators and staff to conduct internal assessments to address gaps. The information may help states to assess service capacity and engage their leadership and legislators to advocate for increased resources or consider resource reallocation as well as to address the needs of state genetics programs.

Nevertheless, there are a number of limitations worth noting. First, variations exist across states in terms of population, resources, policies, and infrastructure. Some level of risk adjustment (e.g., population mix) may be appropriate when interpreting the data collected. Second, this is a “first pass” at the development of quality indicators in genetics, and more needs to be done to test the construct validity. We have recently completed data collection from additional regions to assess validity. Third, as a number of participants have indicated in their qualitative feedback, these metrics focus more on availability of services and mechanisms, rather than the utility of these in improving quality, as reflected in the scoring weights assigned to the items. While more emphasis has been placed on items that describe the services and mechanisms in place, a number of items have follow-up questions inquiring about the use of these services and mechanisms. In addition, quality measurement is a dynamic process, and improvement can only be measured longitudinally to observe any changes. Hence, it is important for the states to establish a baseline, identifying services and mechanisms that were in place at the time of data collection, and then capture changes thereafter to detect any improvement. At baseline, states may also use this tool to assess internal capacity in genetic service provision, availability and utility of tools, and processes to facilitate/enhance service delivery, which may inform strategic planning and resource allocation. Future iterations of the tool will shift the emphasis to place more value on how the system leverages its resources and mechanisms to achieve quality. Furthermore, with the advancement in health information technology and knowledge about patient-centered care, topics that will be explored for future inclusion include medical informatics, medical foods, and emergency preparedness.

The application of these metrics may quantify progress made in the field and identify areas for quality improvement. Given the specificity of genetic disorders, rapid advances in this field, and increased population demand in genetic disease prevention, treatment, and monitoring, there is greater reliance on evidence-based practices and a shift in goals of medical care to be more patient-centered. The GSA tool developed through our study may help focus policymakers’ attention on the evaluation of quality and cost-effectiveness of genetic services and on additional outcome-oriented health services research in genomic medicine.