Hispanic children have a 10–30% greater incidence rate of ALL than non-Hispanic whites, and nearly double the rate observed in African-Americans.1 Ethnic differences in ALL incidence may be explained by population-level differences in the frequency of genetic risk factors, including those first discovered in genome-wide association studies of European-ancestry populations.2, 3, 4, 5 As Hispanics are an admixed population with European, African and Native American ancestry, differences in ALL incidence observed in Hispanics may be attributable to genetic risk factors associated with Native American ancestry.
Increased Native American ancestry has been linked to increased risk of relapse among Hispanic children with ALL,6 but no study has yet investigated the contribution of genome-wide Native American ancestry to ALL incidence. Using genome-wide SNP data from 298 Hispanic children with B-cell ALL and 456 matched controls from the California Childhood Leukemia Study (CCLS), we investigated whether genome-wide Native American ancestry was associated with increased risk of B-cell ALL. Additionally, we assessed whether the risk alleles at loci identified in genome-wide association studies of European-ancestry populations (IKZF1, CDKN2A, PIP4K2A, ARID5B, CEBPE) were more common in individuals with greater levels of Native American ancestry. Finally, we quantified the contribution of these validated risk loci to the increased ALL incidence observed in Hispanics relative to populations of European or African ancestry.
Study participants were Hispanic children from the CCLS, whose recruitment and enrollment procedures have been described in detail previously (Supplementary Table S1).7, 8 Cytogenetic characteristics of included cases are shown in Supplementary Table S1. DNA was isolated from dried bloodspots collected at birth and archived by the California Department of Public Health. Samples were genotyped using the Illumina OmniExpress platform, assaying 730 525 single-nucleotide polymorphisms (SNPs). Samples with genotyping call rates <98%, with discordant sex information (reported versus genotyped sex), or showing evidence of cryptic relatedness were excluded from analyses. To exclude poorly genotyped SNPs, SNPs with genotyping call rates <98% or Hardy–Weinberg Equilibrium P-value <1 × 10−5 in controls were removed from analyses.
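The SNP-level filters described above can be illustrated with a minimal Python sketch. The source does not state which Hardy–Weinberg test was used, so a 1-degree-of-freedom chi-square test is assumed here; the function names, the genotype coding (0, 1 or 2 copies of an allele, with None for a missing call) and the default thresholds as arguments are illustrative choices, not the authors' pipeline.

```python
import math

def call_rate(genotypes):
    """Fraction of non-missing genotype calls; None marks a missing call."""
    return sum(g is not None for g in genotypes) / len(genotypes)

def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test of Hardy-Weinberg proportions.

    Assumption: the paper does not name its HWE test; an exact test is a
    common alternative and would give somewhat different P-values.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip genotype classes with zero expected count (monomorphic SNPs).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 df.
    return math.erfc(math.sqrt(chi2 / 2))

def snp_passes_qc(genotypes_all, control_counts,
                  min_call_rate=0.98, hwe_threshold=1e-5):
    """Apply both filters: call rate in all samples, HWE in controls only."""
    return (call_rate(genotypes_all) >= min_call_rate
            and hwe_chi2_pvalue(*control_counts) >= hwe_threshold)
```

Here `control_counts` is the triple of genotype counts among controls, since the Hardy–Weinberg filter in the text is applied to controls only.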
A linkage-reduced set of 63 303 autosomal SNPs, evenly distributed across the genome, was extracted from the case-control data and the Human Genome Diversity Project (HGDP) data. The genetic structure of study subjects was evaluated using Structure v2.3.1 to estimate percent membership in three distinct founder populations: sub-Saharan African, European and Native American.9 Founder population allele frequencies were defined using SNP data from 372 unrelated HGDP individuals, including 111 Africans, 107 Native Americans and 154 Europeans.10
Logistic regression was used to determine if Native American ancestry was associated with case-status, with adjustment for sex, age and risk SNPs (where indicated). Logistic regression was also used to determine if the risk SNPs were associated with case-status, after adjustment for sex and age. We report results for the five SNPs (one in each risk locus) that achieved genome-wide significance in a previously published genome-wide association study and which were successfully genotyped on our Illumina platform. Although the array data provide genotypes for additional SNPs in these regions, we believed it important to analyse Native American ancestry in relation to risk loci first identified in populations of European ancestry.
Correlations between Native American ancestry and number of risk alleles in IKZF1, CDKN2A, PIP4K2A, ARID5B and CEBPE were assessed using Pearson’s correlation coefficient. The contribution of known susceptibility loci to ethnic incidence rate ratios was calculated according to varying genotypic relative risks and ethnic group allele frequencies using previously described methods.11 Additional information on samples, genotyping and statistical procedures is available in the Supplementary Methods.
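The incidence rate ratio calculation can be illustrated with a minimal Python sketch. It assumes Hardy–Weinberg genotype proportions within each population and that the SNP acts independently of other risk factors, so that the rate ratio attributable to the SNP is the ratio of genotype-weighted mean relative risks. The cited method11 may differ in detail, and the function names and the allele frequencies in the usage line are hypothetical.

```python
def mean_relative_risk(p, rr_het, rr_hom):
    """Population mean relative risk at one SNP, relative to non-carriers.

    p       risk-allele frequency in the population
    rr_het  relative risk for carriers of one risk allele
    rr_hom  relative risk for carriers of two risk alleles
    Assumes Hardy-Weinberg genotype proportions.
    """
    q = 1.0 - p
    return q * q + 2 * p * q * rr_het + p * p * rr_hom

def incidence_rate_ratio(p_a, p_b, rr_het, rr_hom):
    """Rate ratio (population A versus B) explained by allele-frequency
    differences at one SNP, assuming equal genotypic risks in both."""
    return (mean_relative_risk(p_a, rr_het, rr_hom)
            / mean_relative_risk(p_b, rr_het, rr_hom))

# Hypothetical inputs: risk-allele frequency 0.50 versus 0.25 and a
# log-additive per-allele relative risk of 2 (rr_het=2, rr_hom=4).
example_irr = round(incidence_rate_ratio(0.50, 0.25, 2.0, 4.0), 2)  # 1.44
```

Under a log-additive model the mean relative risk reduces to (1 − p + p·r)², where r is the per-allele relative risk, which is why a fixed allele (p = 1) can still explain only a modest rate ratio when r is close to 1.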
A total of 297 cases and 454 controls passed all quality control filters. Four SNPs identified as ALL risk factors in previous genome-wide association studies were significantly associated with ALL risk in our Hispanic sample (Supplementary Table S2). The strongest association was at rs7089424 in ARID5B (odds ratio (OR)=2.33, 95% confidence interval (CI): 1.85–2.92, P=2.6 × 10−14). As previously reported,2, 3, 4 this effect was stronger in hyperdiploid cases (OR=2.91, 95% CI: 2.05–4.12, P=2.1 × 10−10). SNP rs2239633 in CEBPE was also more strongly associated with hyperdiploid B-cell ALL (OR=2.07, 95% CI: 1.44–2.98, P=8.9 × 10−5) than with B-cell ALL not stratified by subtype (OR=1.35, 95% CI: 1.09–1.68, P=6.6 × 10−3). Although rs7088318 in PIP4K2A was not statistically significantly associated with B-cell ALL risk in our sample (OR=1.16, 95% CI: 0.92–1.49, P=0.21), the association approached significance among hyperdiploid cases (OR=1.37, 95% CI: 0.96–1.96, P=0.084). Risk alleles at rs4132601 (IKZF1) and rs3731217 (CDKN2A) were also strongly associated with B-cell ALL risk in our case-control sample (OR=1.46, 95% CI: 1.16–1.83, P=1.3 × 10−3 and OR=1.76, 95% CI: 1.17–2.65, P=4.6 × 10−3, respectively).
Compared with controls, cases had higher levels of Native American ancestry and lower levels of European ancestry (Supplementary Table S1 and Supplementary Figure S1). After adjustment for age, sex and percent African ancestry, each 20% increase in Native American ancestry was associated with a 1.20-fold increase in risk of B-cell ALL (OR=1.20, 95% CI: 1.00–1.45, P=0.048) (Supplementary Table S2). The association between genome-wide Native American ancestry and ALL risk was modestly attenuated when controlling for genotype at rs3731217 (CDKN2A), rs7088318 (PIP4K2A) and rs2239633 (CEBPE) (1, 2.5 and 4.2% decreases, respectively), and was further attenuated when conditioned on genotype at rs7089424 (ARID5B, 6.6% decrease) (Supplementary Table S2). These SNPs, in particular rs7089424, may contribute to the observed association between Native American ancestry and ALL risk.
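An odds ratio expressed per 20% increase in ancestry follows from rescaling the logistic regression coefficient. The minimal Python sketch below assumes ancestry is coded as a proportion between 0 and 1 and that the confidence interval is a Wald interval; the function names are hypothetical, and the coefficient and standard error in the usage note are placeholders rather than values from this study.

```python
import math

def odds_ratio_per_increment(beta, se, increment=0.20, z=1.96):
    """Odds ratio and Wald 95% CI for an `increment` change in a predictor.

    beta and se are the per-unit log-odds coefficient and its standard
    error from a logistic regression (assumed inputs).
    """
    point = math.exp(beta * increment)
    lower = math.exp((beta - z * se) * increment)
    upper = math.exp((beta + z * se) * increment)
    return point, lower, upper

def wald_p_value(beta, se):
    """Two-sided P-value for the Wald test of beta = 0 (normal approximation)."""
    return math.erfc(abs(beta / se) / math.sqrt(2))
```

For example, a per-unit coefficient of ln(1.20)/0.20 corresponds to an odds ratio of 1.20 per 20% increase. Rescaling changes the odds ratio and its interval but not the P-value, because the coefficient and its standard error are multiplied by the same factor.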
Further support for this was shown when correlations were calculated between Native American ancestry and number of risk alleles at the five ALL risk SNPs. The number of risk alleles at four of these SNPs was positively and significantly correlated with increased Native American ancestry (Table 1). The strongest of these associations were with ARID5B and PIP4K2A SNPs (r=0.13, P=6.0 × 10−4 and r=0.18, P=2.1 × 10−5, respectively). The number of risk alleles at rs3731217 (CDKN2A) and rs2239633 (CEBPE) was also positively correlated with increased Native American ancestry (r=0.11, P=3.6 × 10−3 and r=0.081, P=0.027, respectively). These associations were consistent when analyses were restricted to control subjects, indicating that these associations reflect population structure, independent of case-status (Table 1).
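The correlation analysis, including the restriction to controls, can be sketched in a few lines of Python. The record layout (ancestry proportion, risk-allele count, case flag) and the function names are assumptions made for illustration, and the data in the usage line are invented.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ancestry_allele_correlation(records, controls_only=False):
    """Correlate ancestry with risk-allele count (0, 1 or 2).

    records: iterable of (ancestry_proportion, risk_allele_count, is_case).
    With controls_only=True, cases are dropped so the estimate reflects
    population structure rather than case-status.
    """
    kept = [r for r in records if not (controls_only and r[2])]
    return pearson_r([r[0] for r in kept], [r[1] for r in kept])

# Invented records: (ancestry proportion, risk alleles, is_case)
toy = [(0.2, 0, False), (0.4, 1, False), (0.6, 1, True), (0.8, 2, False)]
r_all = ancestry_allele_correlation(toy)
r_controls = ancestry_allele_correlation(toy, controls_only=True)
```

A similar estimate in all subjects and in controls alone is what the text interprets as evidence that the correlation is a feature of population structure.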
We next assessed whether these risk loci contribute to the increased ALL incidence observed in Hispanics relative to populations of European or African ancestry (Table 2). Interestingly, the risk allele of rs3731217 in CDKN2A has an allele frequency of 100% in Native Americans. Despite the absence of the minor (protective) allele in this population, this SNP explains only a small proportion of the increased B-cell ALL risk observed in Hispanics compared with European or African-ancestry populations.
Previously identified risk alleles in CEBPE, PIP4K2A and ARID5B are also more common in Native American and Hispanic populations than in Europeans. SNP rs2239633 in CEBPE accounted for a 1.03-fold increased risk of B-cell ALL in Hispanics versus Caucasians (95% CI: 1.002–1.067). In addition, rs7089424 in ARID5B accounted for a 1.11-fold increased risk of B-cell ALL in Hispanics versus Caucasians (95% CI: 1.005–1.212) (Table 2). As this SNP is more strongly associated with hyperdiploid B-cell ALL than with other subtypes, it can explain an even larger proportion of the differences observed across populations in the incidence of this ALL subtype (Supplementary Table S3).
Our findings suggest that the increased risk of B-cell ALL observed in Hispanic populations is due, at least in part, to an effect of Native American ancestry. In our sample, each 20% increase in the proportion of an individual’s genome that is of Native American origin conferred a 1.20-fold increased risk of B-cell ALL. Because increased Native American ancestry was also associated with known ALL risk alleles, even among controls, we believe the increased risk of ALL associated with increased Native American ancestry is not easily attributed to potential confounding factors.
Taken together, the risk alleles in CDKN2A, PIP4K2A, CEBPE and ARID5B may account for an important proportion of the ALL incidence differences observed across ethnicities. Although these variants are associated with ALL risk in numerous populations,5, 12, 13, 14 their increased frequency in populations with Native American ancestry may result from a founder effect occurring during migration to the New World and genetic drift during subsequent population expansion.
As a corollary to the positive association between Native American ancestry and ALL risk, increased European ancestry is associated with decreased B-cell ALL risk in this Hispanic sample. However, were European ancestry protective, both Hispanic and African-American populations would be expected to have higher ALL incidence than European populations. As African-Americans have lower ALL incidence than Europeans, it appears the Native American component of Hispanic ancestry may be a risk factor, and not that the European component is a protective factor. This is further corroborated by our observations that known risk alleles in CDKN2A, PIP4K2A, CEBPE and ARID5B were all significantly associated with increased Native American ancestry.
In conclusion, we demonstrate that increased genome-wide Native American ancestry is associated with an increased risk of B-cell ALL in Hispanic children, and trace this to the effects of at least three genes. Additional questions remain as to whether the known risk loci can account for all of the increased B-cell ALL risk observed in Hispanics, or if additional risk loci can be identified through further study of this high-risk population.
This work was supported by National Institutes of Health grants: R25CA112355 (KMW), R01CA155461 (JLW, XM), R01CA126831 (JKW) and R01ES009137 (APC, LH, CM, GVD, MLL, KB, LFB, JLW, and PAB). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
Supplementary Information accompanies this paper on the Leukemia website (http://www.nature.com/leu)