Racial Differences in Four Leukemia Subtypes: Comprehensive Descriptive Epidemiology

Leukemia is a malignant progressive disease and has four major subtypes. Racial groups differ significantly in multiple aspects of the disease. Our goal is to systematically and comprehensively quantify racial differences in leukemia. The SEER database is analyzed, and comprehensive descriptive analysis is provided for the four major subtypes, namely ALL (acute lymphoblastic leukemia), CLL (chronic lymphocytic leukemia), AML (acute myeloid leukemia), and CML (chronic myeloid leukemia), and for two age groups (≤14 and >14) separately. The racial groups studied include NHW (non-Hispanic White), HW (Hispanic White), BL (Black), and API (Asian and Pacific Islander). Univariate and multivariate analyses are conducted to quantify racial differences in patients’ characteristics, incidence, and survival. For patients’ characteristics, significant racial differences are observed in gender, age at diagnosis, diagnosis era, use of radiation for treatment, registry, cancer history, and histology type. For incidence, significant racial differences are observed, and the patterns vary across subtypes, gender, and age groups. For most of the subtypes and gender and age groups, Blacks have the worst five-year survival, and significant racial differences exist. This study provides a comprehensive epidemiologic description of racial differences for the four major leukemia subtypes in the U.S. population.

leukemia as well as other cancer types, it has been suggested that differences in treatment exist and may explain some of the observed racial disparities in survival. In a hospital-based study on CLL 4 , it is found that compared to non-Blacks, Blacks had significantly shorter event-free and overall survival. Another study tested the hypothesis that after adjusting for biological factors, Black and Spanish children with newly diagnosed ALL had a worse survival compared to Whites 7 .
A better understanding of racial differences can assist diagnosis, implementation of tailored treatment strategies, and elimination of racial disparity 8,9 . Although it shares the scheme of analyzing racial differences in leukemia with the aforementioned publications, this study differs from them in multiple aspects. First, it analyzes the four major subtypes, four racial groups, and two age groups using the same techniques, which facilitates direct comparisons and makes it more comprehensive than studies that focus on a single subtype/age group and a smaller number of racial groups. Second, the SEER (Surveillance, Epidemiology, and End Results) database is analyzed. The wide coverage and large sample size support generalizability and validity and make this study more powerful than those based on a single hospital or community. Third, this study addresses patients' characteristics, incidence, and survival together, and can be more comprehensive than studies that focus on a single aspect. As such, this study can complement the literature and is warranted.

Methods
Study population. Data are obtained from SEER 10 , which is the largest population-based cancer registry in the U.S. and contains input from eighteen regional and state registries. SEER has multiple registry groupings for analysis 11 , which cover different numbers of regions and different time periods. In this study, data are obtained from SEER 13 and 18, which cover approximately 14% and 28% of the U.S. population, respectively. For each case, the first matching record is identified for analysis. In SEER, the four major subtypes are identified by the International Classification of Diseases for Oncology ( [9945][9946]. It is noted that some published studies suggest slightly different definitions. Specifically, they exclude CMML (Chronic Myelomonocytic Leukemia, ICD-O-3 code 9945) from the analysis of CML. We adopt the SEER definition to be coherent with the database. The four major racial groups are HW (Hispanic White), NHW (non-Hispanic White), BL (Black), and API (Asian and Pacific Islander). For the analysis of patients' characteristics and incidence, SEER 13 contains data on patients diagnosed in the period of 1992-2011 and from thirteen registries. For the analysis of survival, SEER 18 contains data on patients diagnosed in the period of 1992-2006 and followed up to 12/31/2011 and from eighteen registries. Using different registry groupings maximizes sample sizes for analysis.
Statistical analysis. Data on the four major subtypes and two age groups (≤14 and >14) are analyzed separately. This age division has been suggested in the literature 12,13 . Univariate analyses are conducted to compare patients' characteristics across racial groups using Chi-squared tests and ANOVA for categorical and continuous variables, respectively.
Variables analyzed include gender, age at diagnosis (the younger age group: ≤4, 5-9, 10-14; the older age group: 15-34, 35-54, 55+), diagnosis era (1992-2001, 2002-2011), treatment (no radiation, radiation, unknown), registry (West, Northeast, Midwest, South, which has been suggested in the literature), survival, and histology type. Age-adjusted incidence rates and five-year relative survival rates are computed using the SEER*Stat 8.2.1 software and age-adjusted using the U.S. Census 2000 data as reference. Multivariate Cox regression is then conducted to further investigate racial differences in survival, adjusting for the potential confounding effects of age at diagnosis, gender, diagnosis era, treatment, registry, and histologic type. Analysis that cannot be carried out using SEER*Stat is performed using SAS 9.3 (SAS Institute Inc., Cary, NC, U.S.).
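The age adjustment described above amounts to direct standardization: each age-specific rate is weighted by that age band's share of a standard population. A minimal sketch follows, assuming hypothetical age bands, case counts, person-years, and standard-population weights. The function name `age_adjusted_rate` is illustrative and is not part of SEER*Stat, which performs this calculation internally.

```python
def age_adjusted_rate(cases, person_years, std_pop, per=100_000):
    """Directly standardized incidence rate.

    cases, person_years, and std_pop are dicts keyed by the same age bands.
    std_pop holds standard-population counts (or weights) per age band.
    """
    if not (cases.keys() == person_years.keys() == std_pop.keys()):
        raise ValueError("age bands must match across all inputs")
    total_std = sum(std_pop.values())
    rate = 0.0
    for band in std_pop:
        weight = std_pop[band] / total_std          # share of standard population
        rate += weight * cases[band] / person_years[band]  # weighted age-specific rate
    return rate * per


# Hypothetical inputs for illustration only (not SEER data).
cases = {"15-34": 10, "35-54": 30, "55+": 120}
person_years = {"15-34": 1_000_000, "35-54": 1_000_000, "55+": 1_000_000}
std_pop = {"15-34": 40, "35-54": 35, "55+": 25}

print(round(age_adjusted_rate(cases, person_years, std_pop), 2))  # 4.45 per 100,000
```

Because the same standard weights are applied to every racial group, differences in the resulting rates are not driven by differences in age structure between groups.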

Results
Patients' clinicopathologic characteristics. Results for the age ≤14 group are shown in Table 1. With an insufficient sample size, CLL is not analyzed. For the other three subtypes, there is no difference in the distribution of gender across races. Significant differences in age at diagnosis are observed for ALL (p-value < 0.01). Specifically, APIs have the lowest age at diagnosis, and Blacks have the highest. For ALL, significant differences are also observed for diagnosis era (p-value < 0.001), with more diagnoses in the period of 1992-2001. In the analysis of treatment, as the counts for the "unknown" category are small, only "no radiation" and "radiation" are compared. Significant differences are observed for ALL (p-value 0.035) and CML (p-value < 0.01). Specifically, for ALL, APIs have the highest rate of "no radiation" (92%), while Blacks have the lowest (86.1%). For CML, the trend is reversed, with Blacks having the highest rate of "no radiation" (85.7%) and APIs having the lowest (65%). Significant differences are also observed for registry (p-values < 0.01). For all three subtypes, the dominating majority of HW and API patients are from West, while the percentages of NHW and BL patients from West are much lower. For all subtypes and all racial groups, the dominating majority of patients have no history of cancer. Significant differences in survival are observed for ALL and AML across racial groups (p-values < 0.01). NHWs have the longest mean survival, and BLs have the shortest. For example, for ALL, the mean survival for NHWs and BLs are 94.3 and 82.8 months, respectively. In the analysis of histology type, significant racial differences are observed for ALL and CML (p-value < 0.001 and 0.015, respectively). For ALL, APIs have a lower percentage of "precursor cell LL, NOS" than the other racial groups. For CML, NHWs and APIs have lower percentages of "chronic MML, NOS".
Results for the age > 14 group are shown in Table 2. For AML, the distribution of gender differs significantly across races (p-value < 0.01). NHWs have the highest percentage of males (54.7%), while BLs have the lowest (49.1%). For all subtypes, the distribution of age at diagnosis differs significantly across races (p-values < 0.01). NHWs have the highest age at diagnosis, whereas HWs, BLs, HWs, and HWs have the lowest for the four subtypes, respectively. Diagnosis era has significant racial differences for all four subtypes. Specifically, the racial groups that have more diagnoses in the 1992-2001 period are NHW (ALL), NHW and BL (CLL), NHW and BL (AML), and NHW (CML), respectively. For treatment, significant racial differences are observed for AML and CML, although it is noted that the vast majority had "no radiation". Significant racial differences are observed for registry, and the patterns are similar to those in Table 1. Unlike for the younger age group, significantly more patients have a history of cancer, and racial differences are significant for all subtypes, with NHWs having the highest percentages of cancer history. Significant racial differences are observed for survival time, and patterns vary across subtypes. For example, for ALL, APIs have the longest mean survival (36.7 months), while BLs have the shortest (24.3 months). In comparison, for CLL, NHWs have the longest mean survival (59.3 months), while BLs have the shortest (50.6 months). For ALL, AML, and CML, racial differences are observed for histology type (p-values < 0.01). For example, for ALL, the percentage of NHWs with "precursor cell LL, NOS" (64.9%) is much higher than that for HWs (53.5%).
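The chi-squared comparisons behind the p-values for categorical characteristics can be illustrated with a small sketch. The counts below are hypothetical and are not taken from Table 1 or Table 2, and `chi_squared_statistic` is an illustrative name. The function computes the Pearson statistic and degrees of freedom for a contingency table, for example treatment (no radiation vs. radiation) by racial group.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table.

    table is a list of rows; each row is a list of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column variables.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: rows = (no radiation, radiation),
# columns = (NHW, HW, BL, API). Not SEER data.
observed = [
    [90, 85, 80, 92],
    [10, 15, 20, 8],
]
stat, dof = chi_squared_statistic(observed)
print(f"chi-squared = {stat:.2f}, degrees of freedom = {dof}")
```

The p-value is the upper-tail probability of a chi-squared distribution with the returned degrees of freedom. With 3 degrees of freedom, a statistic above roughly 7.81 corresponds to p < 0.05.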
The age-adjusted incidence rates are presented in Table 3. Some analyses are not conducted because of small counts. For the ≤14 age group, for ALL, HWs have the highest incidence rate (5.3 per 100,000 person-years), followed by NHWs (3.9) and APIs (3.3), while BLs have the lowest incidence (1.9). When stratified by gender and age, the patterns persist, although the relative magnitudes are different. For AML, the overall incidence rate is much lower, and different racial groups have similar low rates. When stratified by gender and age, the patterns are similar. In the analysis of CML, the overall rate is low, and most counts are too small to generate reliable estimates. For the >14 age group, for ALL, HWs have the highest incidence rate (1.6 per 100,000 person-years), NHWs and APIs have comparable rates (0.8), and BLs have the lowest (0.6). The patterns are mostly retained in the stratified analysis. For CLL, overall, NHWs have the highest incidence rate (6.9), followed by BLs (4.4) and then HWs (2.9). Males have much higher incidence rates, and the racial patterns for both males and females are similar to those overall. A significant dependence on age is observed, with the 55+ age group having a much higher incidence. In the analysis of AML, overall, NHWs have a higher incidence rate, while the other three races have similar rates.
Males have higher incidence rates, and the across-race patterns for both males and females are similar to those overall. The dependence on age is observed, with the 55+ group having a significantly higher incidence. For CML, overall, NHWs have the highest rate (2.2), followed by BLs (2.0) and then NHs (1.8). Similar patterns are observed for both males and females, with males having a higher incidence. An increasing trend with age is observed. For a better visualization, the incidence analysis results are also shown in Fig. 1.
The five-year relative survival rates are provided in Table 4, along with the p-values of race from the multivariate Cox regression analysis. Analysis is not conducted for the CML age ≤14 group because of a small sample size. The detailed Cox regression results are available from the authors. The relative survival rates are also calculated for up to five years, and the results for all races and subtypes are presented in Fig. 2. For the ≤14 age group, in the analysis of ALL, NHWs and APIs have the highest five-year survival rates (0.83), while BLs have the lowest (0.75). In the multivariate analysis, after adjusting for confounding effects, the racial difference is significant (p-value < 0.01). In the stratified analyses, the patterns are mostly consistent, with the exception that for the 5-9 years age group, APIs have the lowest survival rate (0.78). In all of the stratified analyses, the racial differences are statistically significant. For AML overall, HWs have the highest survival rate (0.52), BLs have the lowest (0.46), and the racial difference is significant. Different patterns are observed for males and females. The racial difference is significant in multivariate analysis for females but not males. When stratified by age, only the 10-14 years age group has a significant difference (p-value 0.049). In the analysis of CML, overall, APIs have the highest survival rate (0.6), followed by NHWs (0.56), while BLs have the lowest (0.47). Different patterns are observed for males and females. Specifically, the female HWs have a much higher survival rate, although it is noted that this result should be interpreted cautiously because of the small sample size. When stratified by age, different patterns are observed for different age groups. For the >14 age group, for ALL overall, APIs have the highest survival rate (0.35), while BLs have the lowest, and the racial difference is significant (p-value < 0.01). 
When stratified by gender, similar patterns are observed, and both gender groups demonstrate significant racial differences. When   14), and the racial difference is significant (p-value < 0.01). For both males and females and for the 35-54 and 55+ age groups, the racial differences are significant. However, the patterns differ across subgroups. For CML, overall, HWs have the highest survival rate (0.54), NHWs have the lowest (0.39), and the racial difference is significant. Significant differences are also observed in the stratified analysis for females, 35-54, and 55+ age groups. For females and the 55+ age group, NHWs have the lowest survival rates, while for the 35-54 age group, BLs have the lowest rate. Figure 2 shows that the patterns are mostly persistent across time. However, a few "crossings" are observed, for example, for the CML ≤ 14 age group.
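Relative survival, as reported in Table 4 and Fig. 2, is conventionally defined as the ratio of the observed survival proportion in the patient group to the survival expected in a comparable general-population group. A minimal sketch follows, with hypothetical inputs and an illustrative function name; SEER*Stat derives the expected survival from life tables, a step that is not reproduced here.

```python
def relative_survival(observed, expected):
    """Relative survival = observed survival / expected survival.

    observed: proportion of patients alive at the time point (0 to 1).
    expected: proportion expected alive in a matched general population
              (greater than 0, at most 1).
    """
    if not 0.0 <= observed <= 1.0:
        raise ValueError("observed survival must be between 0 and 1")
    if not 0.0 < expected <= 1.0:
        raise ValueError("expected survival must be in (0, 1]")
    return observed / expected


# Hypothetical five-year proportions (not SEER data).
print(round(relative_survival(0.72, 0.96), 2))  # 0.75
```

Dividing by the expected survival removes mortality from causes that would affect the general population as well, so groups with different background mortality can be compared on cancer-related survival.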

Discussion
Findings. As a major cancer type, leukemia has been extensively studied epidemiologically, and it is well acknowledged that racial differences exist in multiple aspects of the disease. However, most of the existing studies simply include race as a confounder in analysis. A few studies have focused on racial differences but are usually limited to the disease overall or a single subtype, fewer racial groups, and a single aspect of the disease. This article addresses this knowledge gap by comprehensively examining racial differences for the four major subtypes, multiple aspects of the disease (including patients' characteristics, incidence, and survival), and four major racial groups. Analyzing data on the same ground using the same techniques allows for direct across-analysis comparisons. It should be noted that this study and some published ones may have analyzed different datasets/time periods, so the results may not be directly comparable. With the broad coverage of the analyzed SEER data and wide timespan, findings made in this study may complement the existing literature. The U.S. is an immigrant country with a significant racial mixture. Observations made in this study can provide more detailed information than the existing literature and assist public health and clinical investigators to better understand this disease, develop and implement tailored treatment and health care programs, and more appropriately allocate medical resources. Unlike most other cancer types, leukemia is a major cancer for children (although the overall rate is still low). Childhood and adult leukemias differ in multiple aspects, and so analysis has been conducted for different age groups separately. Similar to other cancers, racial differences have been observed in multiple patients' characteristics.
For the ≤14 age group, data on ALL, AML, and CML have been analyzed, and racial differences have been observed in the distributions of age at diagnosis, diagnosis era, treatment, registry, survival, and histology type. It is noted that, as there is only a small number of CML cases, the corresponding results should be interpreted with caution. For ALL, the higher age at diagnosis for BLs may be caused by later onset, later diagnosis, and other factors, as has been suggested in the literature for other cancers. More data collection is needed to "tease out" the effects of, for example, later onset and to quantify the extent of later diagnosis, and prevention and control programs should be developed accordingly to eliminate such disparity. The differences in diagnosis era can be confounded by differences in population structure and should be interpreted with caution. Comparatively, there are fewer NHW ALL cases in the 2002-2011 period, which can be caused by changes in both incidence and diagnosis. There are multiple treatment strategies for leukemia 14,15 , including induction chemotherapy, consolidation therapy (or intensification therapy), preventive therapy, maintenance treatments with chemotherapeutic drugs, and others. For most cases, radiation is not the mainline treatment, as has been observed in this analysis. Unfortunately, SEER only provides limited information on treatment. Similar problems have been observed in other racial difference studies using SEER. Some smaller studies have examined treatment in more detail. For example, a California Cancer Registry-based study 6 found that the Black race was associated with a lower probability of chemotherapy, and Blacks and Hispanics had a lower probability of transplant. The significant racial differences in registry are attributable to the uneven racial distribution of the U.S. population.
In the analysis of histology type, the difference observed for CML again needs to be interpreted cautiously because of the small counts. For ALL, APIs have a lower percentage of precursor cell LL, NOS, which is likely caused by genetic factors. In the analysis of the >14 age group, more racial differences are observed. In particular, the distribution of gender is found to differ across races for AML, and the distribution of cancer history differs across races for all four subtypes. The cause of leukemia is still not completely known. The observed difference in gender distribution can be caused by genetic factors (that are related to gender) 16 as well as gender-related confounders that contribute to leukemia risk, such as smoking and occupational exposure to radiation and chemicals. Unlike for the younger age group, a higher percentage of cancer history is observed. The significantly higher percentage for NHWs can be caused by genetic factors (that lead to cancer co-occurrence 17 ) as well as a higher rate of diagnosis.
The incidence of leukemia is extremely complex. The observed incidence rates depend on the actual incidence as well as diagnosis and reporting. Multiple risk factors for leukemia overall and subtypes have been suggested in the literature, although the exact cause of leukemia is still not fully understood. In addition, it has been suggested that different subtypes, with their different pathological behaviors, have significantly different sets of risk factors 15 . Risk factors that have been suggested for leukemia overall and/or specific subtypes include smoking, exposure to chemicals, history of cancer and treatment, exposure to radiation, certain blood problems, congenital syndromes, family history, viral infections, as well as genetic abnormalities. Many of these factors, for example smoking, exposure to chemicals and radiation, and cancer history and treatment, have been suggested as race-dependent.
A few recent small-scale studies have also reported variations of molecular risk factors across races. For example, a recent study was focused on racial differences in CLL and examined genes Notch 1, SF3B1, p53, MyD88, BIRC3, ZAP70, and SCF 18 . Another study examined Black patients with CLL and suggested that they were more likely to be presented with unmutated IGHV gene, ZAP70 expression, and chromosome 17p or 11q deletion 4 . In the literature, although multiple recent studies have investigated genetic, epigenetic, and genomic markers for leukemia etiology 19,20 , attention to their racial differences or subtype-and race-specific interactions between genetic and other risk factors is still insufficient. Another limitation of the existing molecular studies on etiology is their insufficient power (small sample sizes).
The prognosis of leukemia overall and subtypes has been studied extensively in the literature 21,22 . Quite a few studies have suggested a survival disadvantage of the Blacks 4,6,23 . In our analysis, the survival disadvantage of the Blacks is observed for most but not all of the subtypes and age/gender groups. Multiple factors have been suggested to contribute to prognosis. One study suggested that the poor prognosis of Black children with AML was attributable to excessive treatment related mortality but not baseline differences in disease characteristics, response to therapy, or complications from stem cell transplant 23 . Another study suggested that the survival disadvantage of Black women with CML could be caused by selective imatinib resistance, which is likely to be caused by genetic factors 24 . In our analysis, it is observed that AML and CML have the worst prognosis in adult NHWs. This observation is consistent with the literature, where APL (acute promyelocytic leukemia), which has better prognosis than other subtypes, was excluded 25 . The poor prognosis of NHWs and Blacks with AML may be attributable to their higher rates of previous cancers (including previous AML, which may lead to a higher risk of secondary AML). For AML, prognostic factors suggested in the literature include treatment-related factors, for example, Zubrod scale 26,27 , age, serum albumin, and bilirubin, and resistance-to-treatment-related factors, of which the most important ones are the pretreatment cytogenetic and molecular genetic markers in AML blast. For CML, prognosis in the early chronic stage is analytically determined by scores derived from clinical and laboratory features. Other factors that have been suggested as associated with prognosis include cytogenetic changes, for example, deletions of the derivative chromosome 9, and degree and timing of hematological, cytogenetic and molecular responses. 
The poor prognosis of NHWs can be caused by one or more of these factors, for example, diagnosis at older ages. In the literature, there is still a lack of studies linking, for example, the aforementioned genetic risk factors with the NHW race. A limitation of SEER is that it does not have detailed information on treatment, which can be strongly associated with survival and also vary across races. Another factor that is also associated with survival and may vary across races is socioeconomic status 28 , which may directly affect early diagnosis, ready access to quality health care, and sufficient time and energy to maintain compliance with treatment. SEER started to have insurance information in 2007. For the analyzed time period, linking to other databases, for example Medicare, is needed to obtain more useful information on treatment and socioeconomic status. In recent omics studies, molecular changes have also been associated with survival 22 . Similar to etiology, variations of molecular risk factors across races have not been carefully investigated for prognosis.
Limitations. SEER is chosen as the source of data because of its comprehensiveness, wide coverage, and large sample size. On the other hand, its limitations have been well noted. Specifically, important information, for example on treatment, socioeconomic status, environmental exposures, and genetic risk factors, is missing. Smaller hospital-based studies and linking with other databases may solve some of the problems, but they also have limitations such as small sample sizes and biased sample selection. Another complication may be brought by the multiple coexisting classification schemes. The old SEER database used the ICD-O coding, which was later converted to ICD-O-3, causing unclassified cases. The SEER population also has a higher proportion of foreign-born patients than the general U.S. population. Patients' characteristics, incidence, and survival all depend on environmental and socioeconomic factors, which vary significantly across countries. The analysis results drawn on the U.S. population may not generalize to other countries.
Summary. This study has conducted an epidemiologic analysis and quantified racial differences for four major leukemia subtypes, four racial groups, and two age groups in multiple aspects. It advances from the existing literature by being more comprehensive. Some plausible causes of the observed differences have been suggested. It is also noted that the SEER database is limited by lacking certain important information. More comprehensive data collection and analysis are needed to fully decipher the observed racial differences. Despite certain limitations, as shown in the published SEER-based studies, it is expected that this study can be useful to public health and medical investigators by assisting in early detection, risk stratification, proper treatment selection, and ultimately elimination of racial disparity in leukemia.