β-Thalassemia Patients Revealed a Significant Change of Untargeted Metabolites in Comparison to Healthy Individuals

β-Thalassemia is one of the most prevalent congenital blood disorders. It is characterized by reduced hemoglobin levels and severe complications that affect all aspects of life. The mechanisms underlying the phenotypic heterogeneity of β-thalassemia remain poorly understood. We aimed to identify metabolite biomarkers to improve mechanistic understanding of this phenotypic heterogeneity and hence the management of the disorder at different levels. Untargeted serum metabolites from 100 β-thalassemia patients and 61 healthy controls were analyzed by GC-MS after protein precipitation and SPE (solid phase extraction). Forty metabolites differed significantly between the two groups (p < 0.05, fold change > 1.5); of these, 17 were up-regulated and 23 were down-regulated. PCA and PLS-DA models were also built, and the PLS-DA model showed good separation, with a sensitivity of 70% and a specificity of 100% on external validation of samples. Metabolic pathway analysis revealed alterations in multiple pathways, including glycolysis, pyruvate, propanoate, glycerophospholipid, galactose, fatty acid, and starch and sucrose metabolism, along with fatty acid elongation in mitochondria and glycerolipid, glyoxylate and dicarboxylate metabolism, pointing to a shift of metabolism in β-thalassemia patients compared with healthy individuals.


Materials and Methods
Patient Selection. Patients for the present study were selected at the National Institute of Blood Disease and Bone Marrow Transplantation (NIBD), Karachi, Pakistan, after approval by the hospital's Institutional Review Board (IRB) and ethics committee; the experimental protocols were approved by the primary research institute (International Center for Chemical and Biological Sciences, ICCBS). The study included 100 cases of β-thalassemia that met the following inclusion criteria: registration at NIBD; diagnosis of β-thalassemia major (defined as Hb < 7 g/dL, high HbF, absent or very low HbA, and more than 8 transfusions a year); sampling prior to blood transfusion for patients requiring transfusion; and a fasting period of 4-8 hours. Exclusion criteria were any evidence of other chronic illness unrelated to thalassemia and unwillingness to enrol in the study. Healthy volunteers were recruited at the Dr. Panjwani Center for Molecular Medicine and Drug Research (PCMD); basic details of patients and healthy controls are given in the supplementary information (Table S1). Written informed consent was obtained from all participants, both patients and controls. All patients also completed a detailed questionnaire covering the information required for the study.
Sample Collection. Sample collection was carried out in accordance with relevant guidelines and regulations. Blood was collected after an 8-hour fast, except for infants and toddlers, whose samples were collected when they next became hungry. About 5 cc of blood was drawn from each participant by venepuncture into gel-based BD Vacutainer tubes (BD, Franklin Lakes, NJ, USA; REF 367381) with a silicone-coated interior for clot activation. Serum was separated by centrifugation at 2000 rpm for 10 minutes at 4 °C.
The serum was then aliquoted and stored immediately at −80 °C until further processing.
Sample Preparation and Derivatization. The protocol used to prepare samples for metabolite profiling in this study has been reported previously in detail, and all methods were performed in accordance with the relevant guidelines and regulations 29 . In brief, proteins were precipitated by adding 800 μL of chilled methanol to 100 μL of serum containing 20 μL of myristic acid (2 mg/mL) as internal standard. The supernatant was subjected to solid phase extraction in a 96-well plate (Strata C18-E, 55 μm particle size, 70 Å pore size, 100 mg sorbent/1 mL; Phenomenex, USA) under vacuum (AHC-7502, Phenomenex, USA). After sample loading, the solid phase was washed with 300 μL of water, and metabolites were eluted with 600 μL of methanol into a 96-well collection plate. The eluate was dried under vacuum at room temperature and stored at 4 °C until analysis. Dried samples were derivatized by adding 50 μL of methoxylamine hydrochloride (15 μg/μL in pyridine) followed by 50 μL of BSTFA with 1% trimethylchlorosilane to form trimethylsilyl derivatives. The samples were then centrifuged and analyzed by GC-MS.
GC-MS Analysis. Derivatized samples were analyzed as described previously, with minor modifications to the GC method 30 . Analysis was carried out on a 7890A GC (Agilent Technologies, USA) fitted with a PAL LHX-AG12 autosampler (GC Autosampler 120, Agilent Technologies) and coupled to an Agilent 7000 Triple Quad system (Agilent Technologies, USA). An HP-5MS fused-silica capillary column (30 m × 0.25 mm ID, 0.25 μm film thickness; Agilent J&W Scientific, Folsom, CA, USA) with a chemically bonded, cross-linked 5% diphenyl/95% dimethylpolysiloxane stationary phase was used. Samples were injected in splitless mode with helium as the carrier gas.
The oven temperature was initially held at 50 °C for 1 min and then raised in three steps: at 10 °C/min to 80 °C (held for 3 min), at 10 °C/min to 180 °C (held for 3 min), and at 15 °C/min to 300 °C (held for 5 min). After the 5-min hold at 300 °C, the temperature was raised to 305 °C for 1 min as a post-run. Retention time was locked to the internal standard at 20.070 min. Electron impact (EI) ionization at 70 eV was used as the ionization source. Data were acquired in full scan mode over m/z 50-650 with a scan time of 0.5 s. A blank was run between samples to prevent carry-over contamination. Mass calibration was performed with perfluorotributylamine (PFTBA).
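To make the oven program concrete, the following Python sketch computes the total run time and the oven temperature at any point in the run. It assumes that each phrase "to X °C for N min" means a linear ramp to X followed by an N-minute isothermal hold, and it omits the 1-min post-run at 305 °C; the class and function names are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class OvenStep:
    rate: float    # ramp rate in °C/min (0 for the initial isothermal hold)
    target: float  # temperature reached at the end of the ramp, °C
    hold: float    # isothermal hold after reaching the target, min

# Assumed reading of the program described above (post-run omitted).
PROGRAM = [
    OvenStep(0, 50, 1),    # hold at 50 °C for 1 min
    OvenStep(10, 80, 3),   # 10 °C/min to 80 °C, hold 3 min
    OvenStep(10, 180, 3),  # 10 °C/min to 180 °C, hold 3 min
    OvenStep(15, 300, 5),  # 15 °C/min to 300 °C, hold 5 min
]

def run_time(start, steps):
    """Total program time in minutes: ramp durations plus holds."""
    total, current = 0.0, start
    for step in steps:
        if step.rate > 0:
            total += (step.target - current) / step.rate
        current = step.target
        total += step.hold
    return total

def temperature_at(t, start, steps):
    """Oven temperature (°C) at time t (min) under the same assumptions."""
    elapsed, current = 0.0, start
    for step in steps:
        if step.rate > 0:
            ramp = (step.target - current) / step.rate
            if t <= elapsed + ramp:
                return current + (t - elapsed) * step.rate
            elapsed += ramp
        current = step.target
        if t <= elapsed + step.hold:
            return current
        elapsed += step.hold
    return current  # program finished; post-run not modelled

print(run_time(50, PROGRAM))                # 33.0
print(temperature_at(20.070, 50, PROGRAM))  # about 181 °C
```

Under this reading the program lasts 33 min, and the locked internal-standard retention time of 20.070 min falls at the very start of the final 15 °C/min ramp, at roughly 181 °C.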

GC-MS Data Preprocessing and Statistical Analysis. Agilent Mass Hunter Qualitative Analysis software (version B.04.00) was used for data processing. Peak integration and deconvolution parameters have been reported previously 29,30 . Mass spectra of the peaks were compared with the NIST mass spectral library (Wiley registry NIST 11), giving presumptive identification of metabolites at a similarity index of ≥ 70%. The GC-MS data were then imported into Mass Profiler Professional (MPP) software 12.5. Data were filtered using all available data, with a minimum absolute abundance of 5,000 counts and a minimum of three ions. Alignment parameters were a match factor of 0.3, a retention time tolerance of 0.05 and a delta m/z (low resolution) of 0.2. Data were normalized with an external scalar, and Z-transform was selected as the baselining option so that all compounds were treated equally irrespective of their intensity. A total of 711 compounds were detected across all samples after alignment. Differences between healthy controls and β-thalassemia patients were tested with an unpaired Student's t-test combined with a fold-change (FC) cut-off of 1.5. A PLS-DA model of healthy controls versus β-thalassemia patients was built using autoscaling and N-fold cross-validation with three folds and ten repeats. Blind samples (n = 20) were also run for external validation, and the sensitivity and specificity of the model were determined.
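The univariate screen can be sketched in Python as follows. This is not MPP's implementation: it computes a pooled-variance (Student's) unpaired t statistic, approximates the two-sided p-value with the normal distribution to stay dependency-free (close to the t distribution at the roughly 159 degrees of freedom of this design; `scipy.stats.ttest_ind` would give the exact t-based value), applies the Benjamini-Hochberg FDR correction named in the Results, and keeps compounds whose fold change, assumed here to be the ratio of raw group means, exceeds 1.5 in either direction. The input format and function names are hypothetical.

```python
import math
from statistics import mean, variance

def student_t_test(a, b):
    """Unpaired Student's t-test with pooled variance.

    Returns (t, p); p uses a normal approximation to the t distribution.
    """
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    p = math.erfc(abs(t) / math.sqrt(2))  # two-sided normal-tail probability
    return t, p

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def screen(patients, controls, alpha=0.05, fc_cutoff=1.5):
    """Select compounds with adjusted p < alpha and fold change beyond fc_cutoff.

    patients, controls: dict mapping compound name -> list of positive abundances.
    """
    names = sorted(patients)
    pvals = [student_t_test(patients[n], controls[n])[1] for n in names]
    fcs = [mean(patients[n]) / mean(controls[n]) for n in names]
    hits = {}
    for name, q, fc in zip(names, benjamini_hochberg(pvals), fcs):
        if q < alpha and (fc > fc_cutoff or fc < 1 / fc_cutoff):
            hits[name] = ("up" if fc > 1 else "down", fc, q)
    return hits
```

Treating down-regulation as a ratio below 1/1.5 keeps the cut-off symmetric on a multiplicative scale.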

Results
Metabolite profiles of 161 serum samples in total, from healthy volunteers (n = 61) and β-thalassemia patients (n = 100), were analysed by GC-EI-MS. After GC-MS analysis as described above, metabolites were identified using Agilent Mass Hunter Qualitative Analysis software and the NIST library. Statistical and multivariate analyses (heat map, PCA plot and PLS-DA plot) were carried out in MPP software to identify metabolites that differed significantly between healthy and β-thalassemia samples. Significance testing and fold-change analysis were applied to all 711 entities found in this experiment, using an unpaired Student's t-test with asymptotic p-value computation and Benjamini-Hochberg FDR correction for multiple testing. This yielded a list of 40 compounds at p < 0.05 and fold change > 1.5. Of these 40 metabolites, 17 were up-regulated and 23 were down-regulated in β-thalassemia patients compared with healthy controls; they are listed with their CAS registry numbers in Tables 1 and 2, respectively. Eight of the 40 metabolites showed a fold change of 2 between the disease and healthy groups (Fig. 1). Twenty-one of the forty low-molecular-weight metabolites were putatively identified by comparing the mass spectra of their peaks with the NIST mass spectral (Wiley registry NIST 11) library at a similarity index of ≥ 70%: geraniol, palmitic acid, α-glyceryl palmitate, lactic acid, α-glyceryl stearate, M-pyrol, citronellyl formate, sucrose, triethanolamine, 5-ethyl-5-methyldecane, 2,3-dimethyl-2,3-butanediol, boric acid, phosphoric acid, hexadecane, methylbis(phenylmethyl)benzene, 4,6-dimethyldodecane, phthalic acid, glycerol, stearic acid, n-pentatriacontane and ethylene glycol. The remaining nineteen could not be identified at this similarity index.
Table 2. List of down-regulated metabolites in β-thalassemia patients in comparison to healthy controls.
The identified compounds are listed by name in Tables 1 and 2, and the unidentified ones by base peak and retention time. The EI mass spectra of the nineteen unidentified compounds are shown in the supplementary information (Figure S1). Principal component analysis (PCA) was also performed to confirm that the difference in metabolic pattern reflects health status and is not attributable to age or weight. Samples did not separate on the basis of age or body mass (supplementary Figures S2 and S3), which argues against these as confounders. The PCA model revealed a clear difference between the non-averaged healthy and β-thalassemia samples. The PCA scores are shown in Fig. 2, in which each sample is represented by a single point. Cumulative explained variance reached 50% at the sixth component, and the first three components (X, Y and Z axes) explained 23%, 7.18% and 6.05% of the variance, respectively. This indicates that many factors contribute to the discrimination between the healthy and disease groups. A prediction model of healthy versus disease groups was then built by multivariate analysis of all analysed samples (100 β-thalassemia and 61 healthy), based on the forty metabolites that differed significantly between the two groups. Samples were classified into discrete classes by supervised partial least squares discriminant analysis (PLS-DA). The input data were randomly divided into three parts, two of which were assigned to the training set and the remaining one to the test set. Autoscaling was applied, which involves subtracting the mean of each variable (data column) and dividing by its standard deviation.
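A minimal NumPy sketch of the autoscaling step, followed by PCA via singular value decomposition, shows how per-component and cumulative explained variance (such as the 50% reached at component 6 in this study) are obtained. It is a generic illustration on a samples × metabolites matrix, not MPP's algorithm, and the function names are hypothetical.

```python
import numpy as np

def autoscale(X):
    """Mean-centre each column and divide by its sample standard deviation."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # constant columns stay at zero instead of dividing by 0
    return (X - X.mean(axis=0)) / sd

def pca_scores(X, n_components=3):
    """Return (scores for the first components, explained-variance ratios)."""
    Z = autoscale(X)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)  # singular values come sorted, largest first
    scores = U[:, :n_components] * s[:n_components]
    return scores, explained

def components_to_reach(explained, fraction=0.5):
    """1-based number of components whose cumulative variance reaches fraction."""
    cumulative = np.cumsum(explained)
    return int(np.searchsorted(cumulative, fraction) + 1)
```

For example, with explained-variance ratios of 0.3, 0.3 and 0.4, `components_to_reach` returns 2, because the cumulative variance first reaches 50% at the second component.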
This process was repeated ten times, each time using a different part for testing, so that every row was used in both training and testing; the predictions generate a confusion matrix that gives the prediction accuracy for each class. The PLS-DA score plots (Fig. 3) show a clear separation between the two groups. Sensitivity of the model was calculated as the proportion of β-thalassemia samples predicted correctly (true positives), and specificity as the proportion of control samples predicted correctly (true negatives). The model had a sensitivity of 92.0%, a specificity of 95.0% and an overall accuracy of 93.1% (Table 3). The predictive capacity (i.e. sensitivity and specificity) of the model was also assessed by external validation with 20 serum samples, 10 each from healthy controls and β-thalassemia patients. These samples were coded before preparation and GC-MS analysis and therefore formed an independent, blinded test set. External validation correctly predicted β-thalassemia in 7 of 10 patients and healthy status in 10 of 10 controls, giving a sensitivity of 70% and a specificity of 100%. Sample prediction reports are shown in Figure S4 of the supplementary information.
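The validation scheme and the metrics defined above can be sketched in plain Python. `repeated_kfold` is a generic repeated k-fold splitter (three folds, ten repeats, each sample tested once per repeat), which is our reading of MPP's N-fold option rather than its documented implementation; `classification_metrics` computes sensitivity, specificity and accuracy from true and predicted labels. All names and label strings are hypothetical.

```python
import random

def repeated_kfold(n_samples, n_folds=3, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx); each sample is tested once per repeat."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    for repeat in range(n_repeats):
        rng.shuffle(indices)
        folds = [indices[f::n_folds] for f in range(n_folds)]
        for f, test in enumerate(folds):
            held_out = set(test)
            train = [i for i in indices if i not in held_out]
            yield repeat, f, sorted(train), sorted(test)

def classification_metrics(y_true, y_pred, positive="thalassemia"):
    """Sensitivity, specificity and accuracy for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }

# External validation as reported: 7/10 patients and 10/10 controls correct.
y_true = ["thalassemia"] * 10 + ["control"] * 10
y_pred = ["thalassemia"] * 7 + ["control"] * 3 + ["control"] * 10
print(classification_metrics(y_true, y_pred))
```

Fed the reported external-validation outcome, the metric function returns a sensitivity of 0.70 and a specificity of 1.00, matching the text, plus an overall accuracy of 0.85 (17 of 20 samples).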
To identify metabolic pathways that are disturbed in β-thalassemia, the list of identified metabolites was entered into the web-based software MetaboAnalyst 3.0 (www.metaboanalyst.ca/). Drawing on databases such as KEGG (Kyoto Encyclopedia of Genes and Genomes) (http://www.genome.jp/kegg/) 31,32 and HMDB (Human Metabolome Database) (http://www.hmdb.ca/), this online tool identifies pathways with significant alterations. Summaries of the pathway analysis, based on the hypergeometric test and on relative-betweenness centrality in pathway topology analysis, are shown for the up-regulated and down-regulated metabolites in Figures S5 and S6, respectively. Images of the identified pathways are provided in the supplementary data (Figure S7).
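The over-representation part of this analysis rests on the hypergeometric test. The sketch below computes the upper-tail probability of seeing at least k of the submitted metabolites in a pathway by chance. The function name is hypothetical, and the counts in the example are made up for illustration; they are not MetaboAnalyst's reference sets.

```python
from math import comb

def hypergeometric_enrichment_p(k, pathway_size, n_query, n_background):
    """P(X >= k): probability that at least k of n_query compounds, drawn from a
    background of n_background compounds, fall in a pathway of pathway_size compounds."""
    upper = min(pathway_size, n_query)
    ways = sum(
        comb(pathway_size, i) * comb(n_background - pathway_size, n_query - i)
        for i in range(k, upper + 1)
    )
    return ways / comb(n_background, n_query)

# Hypothetical counts: 2 of 2 query compounds hit a 3-compound pathway
# in a 10-compound background.
print(hypergeometric_enrichment_p(2, 3, 2, 10))  # 3/45 ≈ 0.0667
```

A small p-value means the pathway contains more of the altered metabolites than random sampling from the background would be expected to give.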

Discussion
Innovative omics technologies have opened up exciting opportunities for screening and identifying novel biomarkers that can act as indicators of physiological alterations in the body. Evolving metabolomic profiling technologies hold potential for illuminating biology and human disease. Metabolites span a wide range of functional groups, from volatile alcohols, ketones, amines and organic acids to complex lipids, carbohydrates and other secondary metabolites. β-Thalassemia is one of the most frequent and severely disabling genetic diseases. Pathological and environmental stresses change the expression of certain genes and hence the concentrations of metabolites in the corresponding pathways. We therefore aimed to determine these changes in the metabolome of β-thalassemia patients, both for disease prognosis and to clarify poorly understood pathophysiological mechanisms of thalassemia, since metabolomics has proven to be a promising technique for understanding the pathophysiology of many other diseases, including genetic diseases such as sickle cell anaemia [33][34][35][36] . However, our results are limited by the fact that the patients' mutation types were not known.
Comparison of serum metabolites between β-thalassemia patients and normal subjects revealed evident alterations in the disease group. A heat map of the normalized intensities of the forty significant metabolites in non-averaged samples is shown in Fig. 4, with identified metabolites labelled by name and unidentified ones by base peak and retention time. The heat map shows that the β-thalassemia metabolite profile differs markedly from that of the control group, with the concentrations of some metabolites increased and of others decreased. This altered profile indicates that metabolism in β-thalassemia patients is shifted from the normal state, consistent with reports in the literature that metabolism is disturbed in this genetic disease 37 . Knowledge of these altered metabolites can therefore help in understanding disease progression at the molecular level. The significance of the altered metabolite profile in β-thalassemia can be interpreted by reference to the Human Metabolome Database (HMDB) 38-40 .
Table 3. Confusion Matrix of Model generated from healthy controls (n = 61) and β-thalassemia patients (n = 100).
Up-regulated metabolites.
Geraniol, also known as rhodinol, is a monoterpenoid alcohol that occurs in the essential oils of several aromatic plants. Its biological functions include cell signalling, energy storage and supply, and maintenance of membrane integrity. It has anti-cancer, antimicrobial, antioxidant, anti-inflammatory and some vascular effects 41 ; it is therefore possible that geraniol is increased as a result of oxidative stress, inflammation and decreased RBC membrane integrity in β-thalassemia. Palmitic acid (hexadecanoic acid) is one of the most common saturated fatty acids in animals and is found in fats, waxes and body lipids.
It is involved in various metabolic pathways, and altered levels have been reported in colorectal cancer, breast cancer, eosinophilic esophagitis and gastroesophageal reflux disease. Palmitic acid has important functions beyond providing energy 42 , one of which is the induction of apoptosis, so its high levels may contribute to the early destruction of RBCs in β-thalassemia patients. α-Glyceryl palmitate and α-glyceryl stearate are monoacylglycerols, each consisting of a single fatty acid chain covalently bonded to a glycerol molecule through an ester linkage. Both serve as energy sources and are also required to maintain membrane stability, so their increased levels may reflect greater destruction of red cells. Lactic acid is a key metabolite involved in various biochemical processes and is produced in large amounts during intense muscular activity. It participates in several metabolic pathways, such as cysteine, propanoate and pyruvate metabolism. Because it is a product of anaerobic glycolysis, its elevation in β-thalassemia can be expected: the low Hb levels of these patients reduce oxygen supply to tissues, leading to more anaerobic glycolysis. A second possible reason for elevated lactic acid is poor hepatic function resulting from iron overload, as abnormal lactic acid concentrations are found in hepatobiliary malignancies in addition to other cancers. M-pyrol is a product of GABA (γ-aminobutyric acid), a neurotransmitter; altered M-pyrol levels have been reported in bladder infection, and urinary tract infections are common in β-thalassemia because of predisposing factors such as splenectomy, iron overload, anaemia and granulocyte dysfunction 43 . Sucrose is a non-reducing disaccharide of glucose and fructose and is linked to various metabolic pathways of glucose and other sugars.
Sucrose is broken down into its constituents, fructose and glucose, and increased blood levels indicate metabolic syndrome; because glucose homeostasis is frequently abnormal in β-thalassemia patients, this may explain its elevation 28 . Citronellyl formate and triethanolamine are metabolites found in the cytoplasm as well as extracellularly, and both are produced endogenously in addition to being obtained from the diet. Phosphoric acid is another important metabolite that we found to be up-regulated in the disease group. It is present in the cytoplasm, where it acts as an osmolyte and enzyme cofactor. It is also involved in signalling and in many metabolic pathways, such as ammonia recycling, arginine, proline, cysteine, purine, pyruvate and inositol metabolism, and various glucose metabolic pathways. Increased phosphoric acid levels may therefore indicate that various metabolic pathways are up-regulated in this disease.

Down-regulated metabolites.
Hexadecane is a straight-chain alkane with 16 carbon atoms that has been shown to exhibit anti-inflammatory, antibacterial, antioxidant and thermogenic functions. Because the demand for all these activities is increased in β-thalassemia, low hexadecane levels may reflect its greater utilization and reduced free availability in serum. Phthalic acid is a toxin or pollutant found in blood; when present in tissues or biofluids, it arises from exposure to phthalate products. Phthalates are environmental chemicals of high public concern because of reports of their potential risk to male reproductive health. Because phthalic acid was lower rather than higher in our patients, environmental phthalate exposure is unlikely to explain infertility in these patients, which suggests that iron overload is the major cause 44 . Glycerol is a major component of phospholipids and triglycerides and can be converted into glucose by the liver to meet energy requirements. It is a component of glycerolipid and glycerophospholipid metabolism, and abnormal levels have been identified and quantified in various disorders 45,46 ; it is therefore plausible that in β-thalassemia more glycerol is consumed for energy production because of metabolic stress. Stearic acid (octadecanoic acid) is a beneficial saturated fatty acid involved in mitochondrial beta-oxidation of long-chain saturated fatty acids and in plasmalogen synthesis. Both pathways contribute mainly to membrane dynamics and cell signalling, so low stearic acid levels may further weaken RBC membranes and alter cell signalling. Ethylene glycol is the monomeric unit of polyethylene glycol, an oligomer or polymer of ethylene oxide that is also known as polyethylene oxide (PEO) or polyoxyethylene (POE) depending on its molecular weight. It is reported to function in biological systems as a nutrient, anesthetic, antimicrobial, laxative and radical scavenger, and its low levels may further aggravate oxidative stress and increase susceptibility to infections.
SCIENTIFIC REPORTS | 7:42249 | DOI: 10.1038/srep42249
Pathway Analysis. Pathways dysregulated in β-thalassemia patients were identified with MetPA (Metabolomic Pathway Analysis) software (Table 4). MetPA analyses the identified metabolites against pathways from the KEGG metabolic pathway database and HMDB, combining pathway enrichment analysis with topology analysis and an interactive visualization system to find the pathways most substantially altered under the conditions of a particular experiment. In metabolic networks, changes at more "vital" (central) positions of a pathway produce more severe effects than changes at peripheral or relatively isolated positions. Several pathways were identified from the list of metabolites up-regulated in β-thalassemia, including fatty acid elongation in mitochondria, glycolysis or gluconeogenesis, pyruvate, propanoate, glycerophospholipid, galactose, fatty acid biosynthesis and metabolism, and starch and sucrose metabolism, which may be amplified in these patients. The metabolites down-regulated in β-thalassemia patients indicated disruption of glycerolipid, galactose, glyoxylate and dicarboxylate metabolism and fatty acid biosynthesis. The metabolites involved in dysregulation of these pathways are palmitic acid, lactic acid, sucrose, triethanolamine, glycerol and ethylene glycol. Detailed pathway analysis results are shown in Table 4, which lists all matched pathways with their p-values from pathway enrichment analysis and impact values from pathway topology analysis. Pathways with considerable impact include glycerolipid, pyruvate, galactose, starch and sucrose, and fatty acid metabolism; the metabolites responsible for these deviations are glycerol, lactic acid, sucrose (for both galactose and starch and sucrose metabolism) and palmitic acid, respectively.
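The "vital position" idea can be made concrete. As we understand the MetaboAnalyst documentation, MetPA's pathway impact sums the relative betweenness centrality of the matched metabolites and divides by the sum over all metabolites in the pathway. The sketch below implements Brandes' algorithm for an unweighted directed graph and that ratio; the normalization constant cancels in the ratio, the function names are hypothetical, and the four-node linear "pathway" is a toy example, not a KEGG map.

```python
from collections import deque

def betweenness_centrality(graph):
    """Brandes' algorithm for an unweighted directed graph.

    graph: dict mapping each node to a list of its successor nodes.
    Returns centrality normalized by (n - 1)(n - 2), i.e. relative betweenness.
    """
    nodes = set(graph)
    for successors in graph.values():
        nodes.update(successors)
    centrality = {v: 0.0 for v in nodes}
    for source in nodes:
        order = []                      # nodes in BFS discovery order
        preds = {v: [] for v in nodes}  # predecessors on shortest paths
        sigma = {v: 0 for v in nodes}   # number of shortest paths from source
        dist = {v: -1 for v in nodes}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph.get(v, []):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):       # accumulate dependencies back to source
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                centrality[w] += delta[w]
    n = len(nodes)
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 1.0
    return {v: c * scale for v, c in centrality.items()}

def pathway_impact(graph, matched):
    """Sum of matched nodes' centrality divided by the pathway's total centrality."""
    centrality = betweenness_centrality(graph)
    total = sum(centrality.values())
    if total == 0:
        return 0.0
    return sum(centrality[m] for m in matched if m in centrality) / total

# Toy linear pathway A -> B -> C -> D (hypothetical, not a KEGG map).
chain = {"A": ["B"], "B": ["C"], "C": ["D"], "D": []}
print(pathway_impact(chain, {"B"}))  # 0.5: B carries half the pathway's centrality
print(pathway_impact(chain, {"A"}))  # 0.0: an end point lies on no shortest path
```

In this toy chain, a change at the internal node B scores an impact of 0.5, whereas the same change at the terminal node A scores 0, mirroring the distinction between central and peripheral positions.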
We also noted that two pathways appear in both lists, i.e. they are altered through both increased and decreased metabolites. One is fatty acid biosynthesis, which suggests that the body may compensate for increased palmitic acid with decreased stearic acid to sustain regulation of the fatty acid biosynthesis pathways. The other is galactose metabolism, in which increased sucrose drives the alteration and is compensated by decreased glycerol. This compensation appears insufficient, however, because decreased glycerol has a much smaller impact on this pathway than increased sucrose. We therefore speculate that diabetes, an important complication of β-thalassemia, is linked to imbalanced and aggravated galactose metabolism.

Conclusion
This study showed that the genetic abnormalities of β-thalassemia also disturb the body's metabolism, as reflected in an altered serum metabolomic profile in β-thalassemia patients compared with the healthy group. Our results demonstrate that metabolite profiling by GC-EI-MS is a reproducible, sensitive and less invasive method for establishing a profile that distinguishes β-thalassemia patients from healthy controls with good sensitivity and specificity. A model built on forty significantly altered metabolites classified β-thalassemia patients and healthy controls on external validation with a sensitivity of 70% and a specificity of 100%. In addition, several important pathways were found to be impaired in β-thalassemia and may play a role in disease progression. Moreover, to our knowledge this is the first study to report differences in the serum metabolome between healthy individuals and β-thalassemia patients, a molecular-level understanding that could help improve treatment options for patients and support diagnosis of their phenotype.