Evidence of new intragenic HBB haplotypes model for the prediction of beta-thalassemia in the Malaysian population

This study sought to determine the potential role of HBB haplotypes in predicting beta-thalassemia in the Malaysian population. A total of 543 archived samples were selected for this study. Five tagging SNPs in the beta-globin gene (HBB; NG_000007.3) were analyzed for SNP-based and haplotype association using the SHEsis online software. Single-SNP-based association analysis showed that three SNPs had a statistically significant association with beta-thalassemia. When Bonferroni correction was applied, four SNPs were found to be significantly associated with beta-thalassemia: IVS2-74T>G (padj = 0.047), IVS2-16G>C (padj = 0.017), IVS2-666C>T (padj = 0.017) and 3'UTR + 314G>A (padj = 0.002). However, 3'UTR + 233G>C did not yield a significant association (padj = 0.076). Further investigation using the five SNPs combined for haplotype association analysis revealed three susceptible haplotypes with significant p values: 1-2-2-1-1 (p = 6.49 × 10⁻⁷, OR = 10.371 [3.345–32.148]), 1-2-1-1-1 (p = 0.009, OR = 1.423 [1.095–1.850]) and 1-1-1-1-1 (p = 1.39 × 10⁻⁴, OR = 10.221 [2.345–44.555]). Three haplotypes showed a protective effect with significant p values: 2-2-1-1-1 (p = 0.006, OR = 0.668 [0.500–0.893]), 1-1-2-2-1 (p = 0.013, OR = 0.357 [0.153–0.830]) and 1-1-2-1-1 (p = 0.033, OR = 0.745 [0.567–0.977]). This study has identified the potential use of intragenic polymorphic markers in the HBB gene, which were significantly associated with beta-thalassemia. Combining these five SNPs defined a new haplotype model for beta-thalassemia, which warrants further evaluation for predicting severity in beta-thalassemia.

www.nature.com/scientificreports/

have been done previously in East and West Malaysia, genetic heterogeneity is more evident in the multiracial population of Malaysia, with a diverse spectrum of alpha (α-), beta (β-) and delta (δ-) globin gene mutations among patients with thalassemia syndromes [7][8][9][10][11]. Beta-thalassemia is due to decreased beta-globin chain synthesis, which is caused by a mutation in the HBB gene. The HBB gene is mapped to chromosome 11p15.4, in a region spanning from 5,225,464 to 5,229,395 bp on the reverse strand [12]. Therefore, the identification of nucleic acid variations in the HBB gene has improved our understanding of the underlying causal mutations of beta-thalassemia in Malaysia.
However, the interaction of genetic variants in conferring an effect, as inferred from haplotypes, has yet to be explored and refined in beta-thalassemia among the Malaysian population. Deciphering the predisposing effect of potential haplotype markers can help to explain the underlying mechanisms of thalassemia development. In this study, five single nucleotide polymorphisms (SNPs) within the HBB gene were evaluated to determine their significance and haplotype structure in relation to beta-thalassemia in Malaysia; to the best of our knowledge, this is the first such study conducted in Malaysia.
Table 1. Single association analysis of five tagging SNPs of the HBB gene with beta-thalassemia. Column groups: genotype data (frequency) and case-control analysis. Case data are on the top line and control data on the bottom line. The major allele is depicted as 1 and the minor allele as 2. MAF, minor allele frequency. A p value < 0.05 is considered significant in the Pearson chi-square test. The padj value is the Bonferroni-corrected p value based on the five markers tested in this study.

Haplotype analysis. Encouraged by these favorable data from the single association analysis, further investigation was conducted using combined alleles from IVS2-74T>G, IVS2-16G>C, IVS2-666C>T, 3'UTR + 233G>C and [...] The summary of these findings is tabulated in Table 2. Haplotypes with a frequency < 0.03 in both controls and cases were automatically excluded from the analysis by the SHEsis online software.
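The exclusion rule for rare haplotypes can be sketched as follows. This is a minimal illustration and not the SHEsis implementation: the function name `filter_rare_haplotypes` and the example frequencies are hypothetical, and only the 0.03 threshold and the "rare in both groups" condition come from the text.

```python
def filter_rare_haplotypes(case_freq, control_freq, threshold=0.03):
    """Return haplotypes whose frequency reaches the threshold in at least one group."""
    haplotypes = set(case_freq) | set(control_freq)
    kept = []
    for hap in sorted(haplotypes):
        in_cases = case_freq.get(hap, 0.0)
        in_controls = control_freq.get(hap, 0.0)
        # A haplotype is dropped only when it is rare in BOTH groups.
        if in_cases < threshold and in_controls < threshold:
            continue
        kept.append(hap)
    return kept


# Hypothetical frequencies, for illustration only (not the study's data).
case_freq = {"1-1-1-1-1": 0.040, "1-1-2-1-1": 0.300, "2-1-1-1-1": 0.010}
control_freq = {"1-1-1-1-1": 0.005, "1-1-2-1-1": 0.370, "2-1-1-1-1": 0.020}

kept = filter_rare_haplotypes(case_freq, control_freq)
print(kept)
```

In this made-up input, a haplotype that is rare in controls but not in cases is retained, because the rule requires rarity in both groups before exclusion.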

Discussion
In this study, we explored single-SNP-based and haplotype associations of five intragenic HBB polymorphisms in beta-thalassemia cases from Malaysia. It has been suggested that the association of intragenic SNPs might be useful for the diagnosis and delineation of the clinical heterogeneity of beta-thalassemia [16]. Furthermore, intragenic SNPs could be useful markers for linkage analysis, and in prenatal diagnosis they can reduce diagnostic errors caused by recombination [17].
From the single-SNP association analysis, two intronic polymorphisms, IVS2-16G>C and IVS2-666C>T, and one variant in the 3' untranslated region of the HBB gene, designated 3'UTR + 314G>A, were found to be significantly associated with beta-thalassemia. The C-to-T substitution at position 666 of intron 2, with a minor allele frequency (MAF) of 0.359 in the case group and 0.423 in the control group, conferred protection against beta-thalassemia with an odds ratio of 0.765 (p = 0.032). However, we noted that the MAF for IVS2-666C>T in this study was higher than the global MAF in the ClinVar database (0.286) but lower than that in the 1000 Genomes Project (0.713) [18]. The genotype distribution of this intronic polymorphism showed that the heterozygote had the highest frequency (48.5%) in the control group. An association study by Akhavan-Niaki et al. (2011) reported that IVS2-666C>T was linked to a mutation at codon 8 (-AA) [HBB:c.25_26delAA]; this β⁰-mutation has mainly been described in populations from the Middle East and the Mediterranean. Hence, the authors suggested that IVS2-666C>T would be useful as a marker for codon 8 genotyping in prenatal diagnosis [17].
Meanwhile, two other variants showed a significant susceptibility effect towards beta-thalassemia: IVS2-16G>C and 3'UTR + 314G>A. The MAF for IVS2-16G>C was 0.357 in the case group and 0.420 in the control group, conferring susceptibility to beta-thalassemia with an odds ratio of 1.300 (p = 0.036). The MAF for IVS2-16G>C in this study was higher than the global MAF in the ClinVar database (0.280) but lower than that in the 1000 Genomes Project (0.720) [19]. The untranslated region (UTR) is the sequence in the 3' region of a gene that is not translated during protein synthesis and contains regulatory elements for gene expression [20]. A variant in the 3'UTR of the HBB gene, designated 3'UTR + 314G>A, was found to have a significant susceptibility effect towards beta-thalassemia with an odds ratio of 2.013 (p = 0.004). The MAFs were 0.092 in the case group and 0.048 in the control group. However, we noted that there were very few reports of this variant in the literature for further comparison. Overall, we noticed that the MAFs for the three significant variants in this study were within the range of global MAFs from other studies reported in the ClinVar database [18,19]. MAF values can vary across ethnic groups or populations, as well as with study sample size [21].
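As a rough consistency check on the figures quoted above, an allelic odds ratio can be approximated from the minor allele frequencies in cases and controls. The sketch below is an illustration under stated assumptions, not the procedure used by SHEsis: the function name `allelic_odds_ratio` is hypothetical, and the calculation assumes the reported odds ratio compares the odds of carrying the minor allele in cases with the odds in controls.

```python
def allelic_odds_ratio(maf_case, maf_control):
    """Odds of the minor allele in cases divided by the odds in controls."""
    odds_case = maf_case / (1.0 - maf_case)
    odds_control = maf_control / (1.0 - maf_control)
    return odds_case / odds_control


# MAFs as reported in the text for two of the significant variants.
or_ivs2_666 = allelic_odds_ratio(0.359, 0.423)   # IVS2-666C>T
or_utr_314 = allelic_odds_ratio(0.092, 0.048)    # 3'UTR + 314G>A

print(f"IVS2-666C>T: {or_ivs2_666:.2f}")
print(f"3'UTR + 314G>A: {or_utr_314:.2f}")
```

The results come out at about 0.76 and 2.01, close to the reported 0.765 and 2.013. Small differences are expected because the MAFs in the text are rounded to three decimals.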
To further evaluate the role of HBB haplotypes in beta-thalassemia in Malaysia, we performed haplotype analysis, which revealed several susceptible and protective haplotypes [22]. The potential applications of haplotype-tagged SNPs have been widely described in the literature. Fields of application include, for example, disease association and pharmacogenetic studies [23]. A protective haplotype has been observed in association with breast density in a fine mapping analysis [24]. In this study, we identified seven different haplotypes using the five intragenic HBB SNPs. A comparable finding was reported by Bilgen et al. (2011) for haplotype analysis in the Turkish population. Likewise, the authors also reported that SNP-based haplotyping using five intragenic SNPs successfully established the haplotypes related to beta-globin gene mutations [16]. Earlier studies by Fuchareon et al. (2001) and Sanguansermri et al. (2004) also reported associations of certain haplotype patterns with HbE and common beta-thalassemia mutations, respectively, using the PCR-RFLP method. To the best of our knowledge, no study has so far evaluated the role of intragenic HBB SNPs in thalassemia syndromes in the Southeast Asian region. In this study, we identified six significant haplotypes that have an important role in beta-thalassemia.

Table 2. Haplotype analysis of IVS2-74T>G, IVS2-16G>C, IVS2-666C>T, 3'UTR + 233G>C and 3'UTR + 314G>A in the all-races dataset with 249 cases and 294 controls. *Haplotypes with a frequency < 0.03 in both controls and cases were dropped from the analysis. **The major allele is depicted as 1 and the minor allele as 2. The alleles in each combination are listed in the order IVS2-74T>G, IVS2-16G>C, IVS2-666C>T, 3'UTR + 233G>C and 3'UTR + 314G>A.
Notably, individuals with the haplotype consisting of all major alleles of the assigned HBB polymorphisms (1-1-1-1-1) might have a higher risk of developing beta-thalassemia. However, if the major allele of IVS2-666C>T is replaced by its minor allele, the effect becomes protective. This allele change might reveal the protective role of the minor allele of IVS2-666C>T. A similar effect is seen for IVS2-16G>C. However, the protective effect of the minor allele of IVS2-16G>C was not strong enough to offset the susceptibility of this haplotype.
It is interesting to note that the combination of the minor alleles of both IVS2-666C>T and 3'UTR + 233G>C with the major alleles of the other SNPs gave higher protection, which points to a similar protective role for 3'UTR + 233G>C. These synergistic effects provide a better outcome for individuals with this haplotype, 1-1-2-2-1. The same synergistic effect was also observed for haplotype 2-2-1-1-1, which revealed the protective role of IVS2-74T>G and IVS2-16G>C. Likewise, the allele substitution at 3'UTR + 314G>A in haplotype 1-1-2-1-2 removed the protective effect seen in haplotype 1-1-2-1-1. This might be explained by the susceptible effect of the minor allele of 3'UTR + 314G>A. This haplotype-based association analysis was carried out to predict the predisposing effect and to reveal the severity and possible prognosis of beta-thalassemia using haplotype-tagged SNPs of the HBB gene. Thus, this model could be further developed to improve the clinical management of beta-thalassemia in Malaysia, mainly on the basis of a personalized haplotype profile.
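The haplotype-to-haplotype comparisons discussed above can be listed mechanically. The sketch below takes the six significant haplotypes and their odds ratios as reported in the abstract and finds every pair that differs at exactly one SNP, oriented from major allele (1) to minor allele (2). The function name `single_allele_steps` is hypothetical, and the listing is purely descriptive: it does not test whether any change in odds ratio is itself significant. Haplotype 1-1-2-1-2 is left out because no odds ratio is quoted for it in the text.

```python
# Odds ratios of the six significant haplotypes, as reported in the abstract.
HAPLOTYPE_OR = {
    "1-2-2-1-1": 10.371,
    "1-2-1-1-1": 1.423,
    "1-1-1-1-1": 10.221,
    "2-2-1-1-1": 0.668,
    "1-1-2-2-1": 0.357,
    "1-1-2-1-1": 0.745,
}

# SNP order used in the haplotype codes (see the Table 2 legend).
SNPS = ["IVS2-74T>G", "IVS2-16G>C", "IVS2-666C>T",
        "3'UTR + 233G>C", "3'UTR + 314G>A"]


def single_allele_steps(haplotype_or):
    """List (from, to, snp, or_from, or_to) for pairs differing at one SNP."""
    steps = []
    for hap_a, or_a in haplotype_or.items():
        for hap_b, or_b in haplotype_or.items():
            alleles_a = hap_a.split("-")
            alleles_b = hap_b.split("-")
            diffs = [i for i, (x, y) in enumerate(zip(alleles_a, alleles_b))
                     if x != y]
            # Keep only major -> minor changes so each pair appears once.
            if len(diffs) == 1 and alleles_a[diffs[0]] == "1":
                steps.append((hap_a, hap_b, SNPS[diffs[0]], or_a, or_b))
    return steps


steps = single_allele_steps(HAPLOTYPE_OR)
for hap_from, hap_to, snp, or_from, or_to in steps:
    print(f"{hap_from} -> {hap_to} ({snp}): OR {or_from} -> {or_to}")
```

One of the listed steps is 1-1-1-1-1 to 1-1-2-1-1 at IVS2-666C>T, which is the change from a susceptible to a protective haplotype described in the text.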
In conclusion, the present study is the first study on intragenic polymorphic markers of the beta-globin gene involving the Malaysian population. The identification of susceptible and protective haplotype markers significantly associated with beta-thalassemia in Malaysia can be further refined according to the multi-ethnic background of the Malaysian population. The association data on single genotypes and haplotypes might disclose the effect of HBB polymorphisms in beta-thalassemia, which might contribute to the understanding of beta-thalassemia propensity. These findings should be confirmed in a larger sample, and stratification by ethnicity should be considered since Malaysia is inhabited by various ethnic groups.

Only cases with valid Malaysian identity card numbers were included in this study. Cases with no sequencing results and no valid Malaysian identity card numbers were excluded from this study. These cases were molecularly ascertained via Sanger sequencing using a 3730XL DNA Analyser (Applied Biosystems, Foster City, CA, USA) for the presence of HBB gene variation. Samples with HBB gene mutations in the heterozygous, compound heterozygous or homozygous state were grouped as cases, whilst controls were the samples without a known beta-globin gene mutation.

SNP genotyping. Genomic DNA was extracted from peripheral blood using a commercial DNA extraction kit (QIAGEN, Germany). Genotypes at the IVS2-74T>G (HBB:c.315 + 74T>G), IVS2-16G>C (HBB:c.315 + 16G>C), IVS2-666C>T (HBB:c.316-185C>T), 3'UTR + 233G>C (HBB:c.*233G>C) and 3'UTR + 314G>A (HBB:c.*314G>A) polymorphic sites in the HBB gene were detected using a direct DNA sequencing technique in which the cycle sequencing used the BigDye® Terminator v3.1 cycle sequencing kit. Sequence analysis was performed with CLC Main Workbench 6 version 6.6.1 software (CLC Bio, Denmark).

Bioinformatics analysis. The SHEsis Online software (http://analysis.bio-x.cn/myAnalysis.php) was employed to assess Hardy-Weinberg equilibrium, allele frequency, and SNP and haplotype association, in which allelic and genotypic distributions were compared between the case and control groups [25]. Bonferroni correction was used for multiple comparison correction. Odds ratios (ORs) were reported with 95% confidence intervals (95% CI), and a p value < 0.05 was considered significant.
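For readers unfamiliar with the Hardy-Weinberg check mentioned above, a minimal sketch of a goodness-of-fit test for one biallelic SNP is given below. It is a textbook Pearson chi-square test with one degree of freedom and is not necessarily the exact test that SHEsis runs internally. The function name `hwe_chi_square` and the genotype counts are hypothetical.

```python
import math


def hwe_chi_square(n_major_hom, n_het, n_minor_hom):
    """Pearson chi-square test (1 df) of Hardy-Weinberg proportions."""
    n = n_major_hom + n_het + n_minor_hom
    p = (2 * n_major_hom + n_het) / (2 * n)   # major allele frequency
    q = 1.0 - p                               # minor allele frequency
    if p in (0.0, 1.0):
        raise ValueError("SNP is monomorphic; the test is undefined")
    observed = (n_major_hom, n_het, n_minor_hom)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts (not the study's data).
chi2, p_value = hwe_chi_square(100, 140, 54)
print(f"chi-square = {chi2:.3f}, p = {p_value:.3f}")
```

A p value above the chosen threshold would indicate no detectable departure from Hardy-Weinberg proportions for that SNP in that group.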