SNP variants associated with non-Hodgkin lymphoma (NHL) correlate with human leukocyte antigen (HLA) class II expression

Large consortia efforts and genome-wide association studies (GWASs) have linked a number of genetic variants within the 6p21 chromosomal region to non-Hodgkin lymphoma (NHL). Complementing these efforts, we genotyped previously reported SNPs in the human leukocyte antigen (HLA) class I (rs6457327) and class II (rs9271100, rs2647012 and rs10484561) regions in a total of 1,145 subjects (567 NHL cases and 578 healthy controls) from two major ethnic groups in Malaysia, the Malays and the Chinese. We identified a NHL-associated (PNHL_add = 0.0008; ORNHL_add = 0.54; 95% CI = 0.37–0.77) and B-cell associated (PBcell_add = 0.0007; ORBcell_add = 0.51; 95% CI = 0.35–0.76) SNP rs2647012 in the Malaysian Malays. In silico cis-eQTL analysis of rs2647012 suggests potential regulatory function of nearby HLA class II molecules. Minor allele rs2647012-T is linked to higher expression of HLA-DQB1, rendering a protective effect to NHL risk. Our findings suggest that the HLA class II region plays an important role in NHL etiology.

The HLA region maps to chromosome 6p21.3 in human. The primary function of HLA molecules is immune response. HLA class I molecules present endogenous antigens, and class II molecules present exogenous antigens to T cells, creating the "tri-molecular complex" (HLA-peptide-TCR) that initiates the immune response. The region contains more than 200 identified genes, over half of which are predicted to be expressed. Only some of the HLA region genes are involved in the immune response; in particular, the genes that encode the classical class I (A, B, and C) and class II (DR, DQ, and DP) antigens. Common genetic variants could alter the expression or function of HLA genes, disrupting the HLA molecules and subsequently its antigen presenting ability.
In this study, we genotyped four key SNPs in the HLA region to investigate their association with NHL in the Malaysian Malays and Chinese. rs6457327 is located near the HLA-C in the HLA class I region 14 ; rs10484561 and rs2647012 are located in HLA class II region 13 , which are in high linkage disequilibrium with the extended haplotypes DRB1*01:01-DQA1*01:01-DQB1*05:01 and DRB1*15-DQA1*01-DQB1*06:02 16 ; rs9271100 is located in HLA class II region and was associated with numerous autoimmune disease since its first discovery through the GWAS of systemic lupus erythematosus (SLE) 17 . Through this approach, we hope to identify a global NHL SNP, one that is linked to NHL in both Caucasian populations and the Malaysian population.

Results
Association between the SNPs and NHL susceptibility. Our Table 1.
We evaluated the association of rs6457327, rs2647012 and rs10484561 with NHL in the Malaysian Malay and Chinese subjects. These SNPs were previously reported in NHL large consortia studies in Caucasian populations. We also included rs9271100, a prominent SNP located near HLA-DRB1 constantly associated with autoimmune or autoimmune-like disease, a risk factor of NHL. In our statistical analysis, 3 genetic models, namely additive, dominant and recessive models, were evaluated. The different genetic models would enable us to identify the model that best represents correlation between SNP alleles and gene expression profiles.
We observed association for rs2647012 with all NHL types (P NHL_add = 0.0008; OR NHL_add = 0.54; 95% CI = 0.37-0.77) and B-cell NHL (P Bcell_add = 0.0007; OR Bcell_add = 0.51; 95% CI = 0.35-0.76) in the Malay samples (Table 2, Supplementary Table S1). SNP rs2647012 showed the strongest association in the additive genetic model. Association of all other SNPs with other ethnicities was not significant (P > P Bonf 0.00083) (Supplementary Table S1). This association has marginal power > 60% and shows no deviation from HWE ( Table 2). All 4 SNPs in our study are independent markers due to very weak pairwise LD observed in both Malay and Chinese subjects (Supplementary Table S2).
There is a substantial age difference between NHL cases and controls in the Malay samples. We performed additional analysis to evaluate the confounding effects of age on the association of rs2647012. From our samples, we selected Malay NHL cases and controls in a younger age group (18-55 years) as well as in older subjects (removing controls age < 40 years). For analysis of Malay subjects aged 18-55 years, average age between subjects were reduced for all NHL types [Case = 41.6 (± 10.5); Ctrls = 39.4 (± 8.  Table 3.
For all NHL types, Malay subjects from 18-55 years of age, the association of rs2647012 (P NHL_add = 0.005408; OR NHL_add = 0.55; 95% CI = 0.36-0.84) surpasses the nominal threshold of 0.05 but not the multiple testing threshold (P Bonf = 0.00083). Likewise for B-cell NHL, rs2647012 (P Bcell_add = 0.006009; OR Bcell_add = 0.53; 95% is not significantly associated with NHL after multiple testing correction. The effect sizes remain consistent in direction with our initial unmatched cases and controls analysis ( Table 2, Supplementary Table S3). Similar results were observed for analysis excluding controls aged < 40 years. In all NHL types for Malay subjects, the association of rs2647012 (P NHL_add = 0.0045; OR NHL_add = 0.56; 95% CI = 0.38-0.84) surpasses the nominal threshold of 0.05 but not after multiple testing correction. Likewise for B-cell NHL, rs2647012 (P Bcell_add = 0.003657; OR Bcell_add = 0.54; 95% CI = 0.36-0.82) is not significantly associated after multiple testing correction. The effect sizes remain consistent with our initial unmatched cases and controls. (Table 2,  Supplementary Table S3).

Discussion
We report the association of rs2647012, a HLA class II SNP with NHL and B-cell NHL in the Malaysian Malay population. The disparity in age between Malay NHL cases and controls poses a challenge to the reliability of our analysis. We have done additional analysis by looking at Malay NHL cases and controls in a younger age group (18-55 years) as well as in older subjects (removing controls age < 40 years). From our results, we observed that the age difference between NHL cases and controls did not affect the association in our study. The effect size remained consistent across the sub-analysis and initial analysis. Matching subjects based on age would have decreased the power of our association. Hence, the need to use a larger sample size at the expense of disparate age groups.
Our study lacks genome-wide data to enable correction of population structure. Previous reports have suggested a binary clustering of the Malay population in Peninsular Malaysia, grouping Malaysian Malays based on northern and southern origins 18 . Thus, the association of rs2647012 could be confounded by presence of population structure in the Malaysian Malay population.  The effect size and direction of rs2647012 in the Malaysian Malays are consistent with previous reports in Caucasian populations 15 , rendering a protective effect (Supplementary Table S5). The minor allele frequency differs, as rs2647012-T is less prevalent in Malays (MAF = 0.18) compared to Caucasian populations (MAF = 0.48). Other SNPs also displayed a similar trend. The association of rs6773854 was not replicated in our preliminary study despite its strong association to B-cell NHL in the Chinese population 19 .
In silico cis-eQTL analysis of rs2647012 suggests potential regulatory function of nearby HLA class II molecules. rs2647012-T is linked to higher expression of HLA-DQB1, an observation also reported by Sille and colleagues 20 . rs2647012-T is linked with lower expression of HLA-DQA2, contradicting the protective effect rendered by rs2647012-T. Our results could be important as increased HLA-DQB1 expression may enhance antigen presentation, immune surveillance and the elimination of tumor cells. Low levels of HLA-DR/DQ expression have been linked to decreased immune response in lymphoma cells 21 . HLA class II molecules are expressed in antigen presenting cells such as B-lymphocytes, and act to present exogenous antigens to CD4 + helper T-cells. Change in expression of these HLA class II molecules, modulated by SNPs, may compromise efficiency of antigen presentation.

Conclusion
We report the association of rs2647012, a HLA class II SNP with NHL and B-cell NHL in the Malaysian Malay population. Through in silico eQTL analysis, rs2647012 genotypes were shown to correlate with expression levels of nearby HLA class II molecules. This report represents the largest sample size to date for NHL associations in a Malaysian population after our previous study on IL-10 promoter SNPs 22 . This report is also one of few blood cancer association reports in south Asian populations 19,23 . The HLA class II region is highly polymorphic with extended LD blocks. Therefore, future work would focus on fine mapping the HLA class II region as well as genotyping the HLA class II alleles to identify potential causal variants. We could also expand our analysis to include immune-related pathways of HLA class II molecules as well as the regulatory function of SNPs in modulating binding of transcription factors in promoter regions.

Materials and Methods
Study subjects and clinical characteristics. A total of 567 NHL patients (304 Malays and 263 Chinese) were recruited from University Malaya Medical Centre (UMMC) and Ampang Hospital from April 2007 to July 2014. NHL types and subtypes were classified according to the World Health Organization (WHO) 2008 classification system. The criteria for NHL case recruitment were adult patients above the age of 18 years old, HIV-negative NHL patients and patients without undergoing haematopoietic stem cell transplantation. All 578 unrelated healthy controls (308 Malays and 270 Chinese) were aged between 18-85 years old. Younger control subjects were recruited from various blood donation campaigns held in UMMC during the same period of time while older control subjects were collected from outpatient clinics in UMMC. All controls were free from any underlying malignancies. Both NHL patients and healthy controls were unrelated self-reported ethnicity of Malay or Chinese descent. Written informed consent was obtained from all patients and healthy controls prior to blood/ buccal samples collection. This study was approved by UMMC Ethics committee (Reference no. TaqMan SNP genotyping method. Genotyping was done using TaqMan ® SNP genotyping assay (Applied Biosystems, Foster City, CA) and performed on the QuantStudio ™ 12 K Flex Real Time PCR System (Applied Biosystems, Foster City, CA). SNPs showing genotype call rates > 95% were retained for statistical analysis. To evaluate concordance of the genotyping calls, 5% of the samples for each SNP were randomly selected for re-genotyping.

Statistical analysis of genotyping data. Statistical analysis was performed separately for Malay and
Chinese ethnic groups. Within each ethnic group, SNP association was calculated for all NHL types as well as NHL subtypes of B-cell NHL, T-cell NHL, DLBCL and FL. Logistic regression was performed using an additive model, adjusting for age and gender. Odds ratio (OR) and 95% confidence intervals (CI) were calculated referencing the minor allele. All SNPs were evaluated for deviation from the Hardy-Weinberg equilibrium (HWE) in control subjects. SNPs deviating from HWE (P HWE < 0.05) were excluded from our analysis. We applied the Bonferroni method for multiple testing correction of the type 1 error. We corrected for 4 independent SNPs across 3 sample groups and 5 NHL types (α = 0.05/60 = 0.00083). SNPs showing P < 0.00083 were considered significantly associated. To address the disparity in age between NHL cases and controls, we performed additional analysis by selecting Malay NHL cases and controls in a younger age group (18-55 years) as well as in older subjects (removing controls age < 40 years). Pairwise linkage disequilibrium (LD) between all 4 SNPs was calculated to assess linkage or independence of the 4 variants. All statistical analysis were performed in PLINK v1.07 unless stated otherwise (see URL) 24 . Meta-analysis was performed in META (see URL) assuming a random-effects model to estimate the combined OR and P-value of SNPs in the combined analysis of Malay and Chinese subjects. A random-effects model was chosen considering the different ancestry of the Malay and Chinese subjects. Power analysis of all SNPs was calculated using QUANTO v1.2.4 (see URL) 25 . SNPs with power associations < 70% was considered underpowered. Linkage disequilibrium was calculated using both Pearson's correlation (r 2 ) and Lewontin's D-prime (D′ ).
Cis-eQTL analysis of NHL associated SNPs. Potential regulatory function of NHL SNPs was evaluated through in silico eQTL from GTEx version 6 (see URL) 26 . We focused on EBV-transformed lymphocytes, the most relevant cell type for our study. eQTL from other cell types were evaluated as well to ascertain the global regulatory potential of our SNPs. Details of the cis-eQTL analysis have been mentioned elsewhere 26 . Briefly, cis-eQTL was performed in Matrix eQTL 27 using linear regression, assuming an additive model, adjusting for 3 principal components and PEER factors. The cis-eQTL was defined as a window of + /− 1MB from the transcription start site (TSS). Storey FDR is used to correct for multiple hypothesis, with cis-eQTLs chosen as significant for q-values ≤ 0.05.