The hemoglobinopathies are a group of inherited blood disorders caused by mutations in the globin genes and include sickle cell diseases (SCDs) and the α- and β-thalassemias. Despite the public health burden these disorders pose, the only existing universal hemoglobinopathy screening and reporting activities in the United States are the state-based newborn screening (NBS) programs; evaluation data from various state NBS programs have been voluntarily submitted to a national database since 1989.1 Screening for SCDs has been included on the recommended uniform NBS panel in all 50 states since 2006, but α- and β-thalassemia screening and reporting of results for newborns are currently performed in only a few states.2,3 Moreover, many people at risk for a hemoglobinopathy who reside in the United States were born either before the implementation of NBS in their state or in a country without NBS. For these reasons, the true prevalence and burden of hemoglobinopathies in the United States are unknown.

A comprehensive understanding of the impact of hemoglobinopathies in the United States is important to public health practitioners, researchers, health insurers, and policy makers. Over the past several years, multiple stakeholders have identified the need for improved data collection as a priority. In 2007, the American Society of Pediatric Hematology/Oncology Sickle Cell Summit identified population-based surveillance to measure outcomes as one of five major areas of opportunity for improvement in understanding and treating SCDs.4 In 2008, the National Institutes of Health convened the Consensus Conference on Hydroxyurea Treatment for Sickle Cell Disease, which said that “a surveillance system is needed for patients with sickle cell disease . . . it should contain demographic, laboratory, clinical, treatment, and outcome information.”5

As a result of these meetings, the National Heart, Lung, and Blood Institute/National Institutes of Health and the Division of Blood Disorders at the Centers for Disease Control and Prevention collaborated to develop a state-based surveillance system for SCD and thalassemia. The purpose of this article is to describe the efforts of the participating states and federal agencies to establish the infrastructure and data collection methods for the system and to provide data on the number and characteristics of individuals with a hemoglobinopathy diagnosis identified in each of the states.

Materials and Methods


The Registry and Surveillance System for Hemoglobinopathies (RuSH) began with implementation in six states (California, Florida, Georgia, Michigan, North Carolina, and Pennsylvania) in February 2010; a seventh state (New York) was added in September of that year. In 2008, these seven states represented ~38% of the total population, 42% of the black population, 54% of the Asian population, and 49% of the Hispanic population in the United States. Hemoglobinopathies are most common among members of these races (black and Asian) and this ethnicity (Hispanic). The goal of RuSH was to identify and collect data on all people with a hemoglobinopathy diagnosis living in these states during 2004–2008.

Work groups

Three work groups were convened to establish the parameters and functional components of RuSH. The Data Collection and Harmonization Work Group assisted in identifying data sources for case ascertainment, provided guidance on the design and development of data collection tools and the data system, and made recommendations on data linkage, harmonization, security, quality assurance, and statistical analysis. The Surveillance Design Work Group provided guidance on the interpretation and use of clinical and laboratory information, developed case definitions, and identified and refined the clinical variables to be collected and analyzed in the surveillance system. The Community Partnerships and Health Education Work Group provided guidance on methods for community outreach, education, and communication about RuSH and determined implications for sharing of data collected by the program. During the first year of the project, each work group met by phone at least once a month and held a 1-day, face-to-face meeting. The work groups were convened as necessary for the remainder of the project period. They consisted of representatives from the seven RuSH states, additional subject matter experts, and federal agency employees.

Case definitions

The Surveillance Design Work Group established three-level case definitions for SCD and for the thalassemias, based on laboratory results and International Classification of Diseases, Ninth Revision and Tenth Revision, Clinical Modification (ICD-9-CM and ICD-10-CM, respectively) codes (Table 1). The levels were constructed to indicate the predicted reliability of the hemoglobinopathy diagnosis, with level 1 being the most reliable and level 3 the least, and they allowed for future analysis of subgroups of the study population based on those levels of diagnostic certainty. The hemoglobinopathy-associated procedures, complications, and treatments that were included in the case definitions were based on a review of the literature, as well as on the professional opinions of the Surveillance Design Work Group members. The purpose of these items was to provide more certainty about the diagnoses than the diagnostic ICD codes alone would have provided.

Table 1 Case definitions for SCD and thalassemia

Data sources

All seven states had mandated universal NBS programs for SCD, with start dates ranging from 4 January 1975 in New York to 10 January 1998 in Georgia.3 However, only California mandated NBS and standardized reporting for α- and β-thalassemia disorders. Because individuals may have been born before NBS started in their state or in a country without hemoglobinopathy NBS, it was necessary to use data from additional sources to identify all people with a hemoglobinopathy diagnosis. NBS records, hospital discharge data, emergency room records, death records, clinical records, and state Medicaid claims were used both for case identification and as sources of demographic, medical, and health-care utilization data. Birth and immunization records were used only as sources of demographic, maternal, and vaccination data for patients identified with a hemoglobinopathy diagnosis; they did not provide any hemoglobinopathy-specific information. Additional data sources, such as the Pregnancy Risk Assessment Monitoring System, cancer registries, birth defect registries, blood bank data sets, and Medicare, were initially considered for inclusion in the project. However, it was determined that either (i) the additional variables that could be collected from these data sources did not fit the objectives of RuSH, or (ii) it was not possible for the states to access individual-level data with identifying variables from the data source. Each state used a unique combination of data sources for the project, depending on the data sets to which it was able to obtain access.

Data collection/linkage/deduplication

The variety of available data sources and the potential linking variables (name, social security number, date of birth, sex, mother’s name, phone number, county, partial zip code, address, and/or diagnosis) available in each data source necessitated a unique approach to data collection, linkage, and deduplication for each state. In general, individuals with a hemoglobinopathy diagnosis were first identified in each data set by a positive laboratory result for SCD or thalassemia or by an ICD-9-CM or ICD-10-CM code. Next, data sets were matched and merged, one pair at a time, using a probabilistic algorithm that assigned different weights to the available linking variables. Elements of the case definitions were identified both within and across data sources. For example, the requirement that a case have two or more health-care encounters with relevant ICD-9-CM or ICD-10-CM codes could be met using hospital discharge records alone or by combining a hospital discharge record with a Medicaid outpatient record. Deduplication of cases also took place both within and across the linked data sources; the goal was for each state to create one final data set that included a single record with information on all variables for each identified individual. Using these methods, individuals with a hemoglobinopathy diagnosis were identified, their population profiles were established, and additional data sources (such as birth records) were used to augment their clinical information.
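The pairwise probabilistic matching described above can be illustrated with a minimal Python sketch. It is not the algorithm used by any RuSH state, which the text does not specify; it assumes a Fellegi-Sunter-style score in which each linking variable adds log2(m/u) when it agrees and log2((1 − m)/(1 − u)) when it disagrees. The field names, the m and u probabilities, the threshold, and the sample records are all hypothetical.

```python
from math import log2

# Hypothetical (m, u) pairs per linking variable:
#   m = P(field agrees | records belong to the same person)
#   u = P(field agrees | records belong to different people)
FIELD_PROBS = {
    "last_name": (0.95, 0.01),
    "dob": (0.97, 0.003),
    "sex": (0.99, 0.5),
    "county": (0.90, 0.05),
}


def match_weight(rec_a, rec_b, probs=FIELD_PROBS):
    """Sum the agreement/disagreement weights over the linking variables."""
    total = 0.0
    for field, (m, u) in probs.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # a missing value contributes no evidence either way
        if a == b:
            total += log2(m / u)  # agreement: positive weight
        else:
            total += log2((1 - m) / (1 - u))  # disagreement: negative weight
    return total


def link(dataset_a, dataset_b, threshold=10.0):
    """Link each record in dataset_a to its best-scoring record in dataset_b.

    Returns (index_a, index_b) pairs whose weight meets the (assumed) threshold.
    """
    links = []
    for i, a in enumerate(dataset_a):
        best_j, best_w = None, threshold
        for j, b in enumerate(dataset_b):
            w = match_weight(a, b)
            if w >= best_w:
                best_j, best_w = j, w
        if best_j is not None:
            links.append((i, best_j))
    return links


# Hypothetical records from two sources (one encounter each).
hospital = [
    {"last_name": "DOE", "dob": "1990-05-01", "sex": "F", "county": "Fulton"},
    {"last_name": "ROE", "dob": "1985-02-11", "sex": "M", "county": "Wayne"},
]
medicaid = [
    {"last_name": "SMITH", "dob": "1970-01-01", "sex": "M", "county": "Wayne"},
    {"last_name": "DOE", "dob": "1990-05-01", "sex": "F", "county": "DeKalb"},
]

linked_pairs = link(hospital, medicaid)
```

In this toy example, the first hospital record links to the second Medicaid record even though the county differs, because agreement on name, date of birth, and sex outweighs the single disagreement. The linked individual then has two encounters across sources, which is how a "two or more encounters" element of a case definition could be satisfied by combining data sets.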


Results

The numbers of unique individuals who fit the RuSH case definitions, by definition level and state, are provided for SCD (Table 2) and thalassemia (Table 3) for six of the RuSH states (data from the seventh state were not available at the time of publication). They are further subdivided by data source in Supplementary Tables S1 and S2 online. The percentage of cases contributed by each data source varied within each state by case definition level and across states. For example, in California, the state Medicaid database (MediCal) contained 56.1% of the individuals identified as SCD level 1, 80.2% of those identified as SCD level 2, and 78.5% of those identified as SCD level 3. In North Carolina, by contrast, the state Medicaid database contained 44.7% of the individuals identified as SCD level 1, 31.9% of those identified as SCD level 2, and 26.3% of those identified as SCD level 3.

Table 2 Number of individuals with a sickle cell disease diagnosis by state and case definition level, 2004–2008
Table 3 Number of individuals with a thalassemia diagnosis by state and case definition level, 2004–2008

The sex, ethnicity, race, and mean age of the individuals identified with SCD, by state and case definition level, are provided in Supplementary Table S3 online. In each state, the percentage of females was higher in level 2 than in level 1 and higher again in level 3. Overall, the majority of individuals identified with SCDs were black or African American; however, this percentage was higher in level 1 than in level 3 in the data set from each state. The mean age was lowest for the individuals identified as level 1. The absolute number of individuals identified as level 3 was greater than the number identified as level 1 and level 2 combined in all states except North Carolina.

The majority of individuals identified with thalassemia in California, Georgia, and Michigan were Asian, whereas the majority in Florida, New York, and North Carolina were black or African American (see Supplementary Table S4 online). As with SCD, the mean age was lowest for level 1, and the number of individuals identified as level 3 was higher than the number identified as levels 1 and 2 combined in all states.

As a result of the RuSH case definition specifications, genotype information was available only for the individuals identified as level 1 (Table 4). The percentage of individuals with hemoglobin S/S or hemoglobin S/β⁰ thalassemia ranged from 55.3% in Michigan to 66.2% in Georgia. New York had the highest percentage of hemoglobin S/C (34.7%), and Michigan had the highest percentage of hemoglobin S/β⁺ thalassemia (11.2%). Data are also presented for thalassemia.

Table 4 Genotypes of level 1 individuals by state, 2004–2008


Discussion

The Registry and Surveillance System for Hemoglobinopathies used novel case definitions to identify individuals with hemoglobinopathies across multiple data sources and collected information on their demographics, clinical characteristics, and health-care utilization. Each of the participating states developed an estimate of the number of people living with these conditions and a profile of these populations. The project also provided each state with a better understanding of the strengths and limitations of how existing data sources could be leveraged to conduct surveillance of hemoglobinopathies.

One of the strengths of the RuSH project was the inclusion of a three-level case definition. The individuals identified as level 1 had the most evidence of truly having a hemoglobinopathy, whereas the individuals identified as level 3 had the least. This multitiered case definition enabled subsets of the data to be analyzed according to our degree of confidence that each level indicated an accurate hemoglobinopathy diagnosis. For example, we were able to compare the results of our data collection with previously published estimates from Brousseau et al.6 and Hassell7 (Supplementary Table S5 online). When making this comparison, we chose to include only the individuals identified as SCD level 1 and level 2 because of our uncertainty about the accuracy of SCD level 3 diagnoses. Brousseau et al. used census data adjusted for mortality rates by sickle cell type to report the number of blacks and Hispanics with SCD in each state. Hassell also compiled population estimates for each state by using information from a variety of data sources, again adjusted for early SCD mortality. Compared with the data from Brousseau et al., the RuSH numbers are lower for California, Florida, Michigan, and New York and higher for Georgia and North Carolina. The RuSH numbers for Florida and New York fall within the range reported by Hassell, the numbers for California, Georgia, and North Carolina are higher than that range, and the numbers for Michigan are lower. Because Hassell’s and Brousseau et al.’s estimates of the number of individuals living with SCD in the seven RuSH states are the only ones currently available, it is worthwhile to note the similarities and differences among the results from all three studies.

The development of the RuSH system built many new partnerships and coalitions. State health department employees, health-care providers, academic institutions, community organizations, patients, and families were all important contributors to the program, and they worked closely with each other throughout the entire process. The combined efforts of these partners produced information that will allow for a better understanding of the impact of hemoglobinopathies on affected individuals identified in the participating states. This approach differs from many of the analyses that are currently available, which often use only a single data source.8,9,10 Our system showed that not all individuals were found in every data source and that the various sources contained different types of information. Therefore, we expect that the knowledge gained from combining all of these data will be more comprehensive than using a single data source on its own.

Furthermore, although the original intent of RuSH was to devise a standardized data collection protocol for all states to follow, it was quickly discovered that the same methods could not be used by all states because of the varying availability of both data sets and the identifying information contained within those data sets. Consequently, each state created a unique system for obtaining RuSH data, and the benefits and problems with each of these approaches can be compared and evaluated.

There were also limitations of the RuSH study design. The legal hurdles that were encountered when trying to obtain data-sharing agreements or memorandums of understanding to access data sets were time consuming and required the help of many people external to the program, including attorneys in some states. Most states found that the RuSH methods were suited to collecting data on people with SCD but did not work for people with thalassemia, presumably because of a lack of NBS data and nonspecific thalassemia-related ICD-9-CM codes during the years covered by this program. The individuals with a hemoglobinopathy diagnosis included in the study were limited to those who were identified with the RuSH case definitions in at least one of the available data sources. Individuals who were not born in the state or who were born before universal screening was initiated, those not insured by Medicaid, and those who were neither hospitalized nor treated in an emergency department during the study period may not have been accounted for by the case-finding methods. Furthermore, because California was the only participating state that comprehensively screened all newborns for α- and β-thalassemia, its genotype data for that condition were the most robust.

The demographics of the individuals identified in RuSH probably reflect, in part, the original data source(s) in which each person was found. For example, the lower mean age of the individuals identified as level 1 relative to that of individuals in the other levels likely reflects the large proportion of level 1 patients who were identified in NBS records rather than in clinical records. In addition, the use of Medicaid as a data source for some states may have resulted in a bias toward identification of individuals who were more likely to be younger (49% of Medicaid enrollees are younger than age 18) and female (58% of Medicaid enrollees). One possible reason is that, although the optional criteria for extended Medicaid eligibility differ from state to state and year to year, the minimum core eligibility requirements include all children up to age 18 with family incomes less than 100% of the Federal Poverty Level and pregnant women with family incomes up to 185% of the Federal Poverty Level.

Unfortunately, there was not an opportunity to evaluate or validate any of these components during the short project period. Consequently, a new project, the Public Health Research Epidemiology and Surveillance for Hemoglobinopathies (PHRESH), was implemented in two of the RuSH states (California and Georgia) to validate the data collection methods used in RuSH. PHRESH will result in a refined case definition and better understanding of the RuSH results, which will help to validate the information gathered during the pilot period. The states that are participating in PHRESH will use multiple methods to accomplish these activities, including review of medical records and establishment of new partnerships with additional clinical facilities. The goal is for these new clinical partnerships to help the surveillance system to both (i) identify individuals missed during the original RuSH data collection and (ii) provide additional information that can help strengthen the validity of the current case definitions.

The health-care utilization and clinical data collected in the RuSH system could serve as the foundation for the development of a patient registry that can be used to collect ongoing longitudinal information. A hemoglobinopathy registry that includes individuals who are identified through a population-based surveillance system will enable researchers to better understand the entire spectrum of the patient population—those who receive comprehensive care, those who use the emergency department as their main source of health care, those who have public/private/no insurance, and those who have a mild presentation of the disease, among others. A system of this sort will allow researchers to answer specific questions that are important for the hemoglobinopathy community, such as the utilization of evidence-based care by the physicians who treat those with hemoglobinopathies. The knowledge gained from a registry will result in a better understanding of the conditions and, ultimately, improvement of the lives of the individuals with a hemoglobinopathy.


The authors declare no conflict of interest.