Effect of APOB polymorphism rs562338 (G/A) on serum proteome of coronary artery disease patients: a “proteogenomic” approach

In the current study, APOB (rs562338) genotype-guided proteomic analysis was performed in a cohort from the Pakistani population. A total of 700 study subjects, including coronary artery disease (CAD) patients (n = 480) and healthy individuals (n = 220) as a control group, were included. Genotyping was carried out using tetra-primer amplification refractory mutation system-based polymerase chain reaction (T-ARMS-PCR), whereas mass spectrometry (Orbitrap MS) was used for label-free quantification of serum samples. In CAD patients, the genotypic frequencies were 90.1% for GG, 6.4% for GA, and 3.5% for AA. In the control group, 87.2% of healthy subjects had the GG genotype, 11.8% had GA, and 0.9% had AA. A significant difference (p = 0.007) was observed between the genotypic frequencies of the patients and the control group. The rare AA genotype was found to be strongly associated with CAD [OR: 4 (1.9–16.7)] as compared to the control group in the recessive genetic model (p = 0.04). Using label-free proteomics, altered expression of 60 significant proteins was observed. Enrichment analysis of these proteins showed a higher number of up-regulated pathways, including phosphatidylcholine-sterol O-acyltransferase activator activity, cholesterol transfer activity, and sterol transfer activity, in the AA genotype of rs562338 (G>A) as compared to the wild-type GG genotype. This study provides deeper insight into CAD pathobiology with reference to proteogenomics, and shows that this approach is a good platform for identifying novel proteins and signaling pathways in relation to cardiovascular diseases.

Coronary artery disease (CAD) refers to chronic inflammation caused by the gradual accumulation of lipids and fibrous elements over a lifespan, which leads to atherosclerotic plaque formation in the arteries of the heart. Multiple genetic risk variants, polygenic traits, and exposure to an atherogenic environment are suggested to be involved in CAD manifestation 1,2. Moreover, individuals with normal low-density lipoprotein (LDL)-cholesterol may also develop atherosclerosis without any conventional risk factors. Therefore, a better understanding of the disease etiology and more efficient therapeutic interventions are needed. It is now evident that genetics or heritability may explain ≈ 25% of the phenotypic variance in CAD 2,3. Several genome-wide association studies (GWAS) have led to the identification of many genetic risk factors, including > 300 chromosomal loci that are significantly associated with CAD risk 4.
The apolipoprotein B gene (APOB) is one of the lipid-associated genetic factors and is located on human chromosome 2. It has become a research hot spot due to its vital importance in lipoprotein metabolism [5][6][7][8]. In humans, apolipoprotein B acts as a lipid transporter and is found as a structural component of all non-HDL lipoproteins. It has two isoforms, ApoB100 and ApoB48. ApoB100 is the major isoform of ApoB synthesized in

Protein quantification and data analysis. The raw files generated on the Orbitrap Q-Exactive HF-X were used to generate the protein profile in MaxQuant (v2.3.2, Matrix Science, UK) using the Andromeda search engine with default search settings 31. The false discovery rate (FDR) was set to 1%. The spectra were searched against Homo sapiens proteins in the UniProt/Swiss-Prot database (http://www.uniprot.org/). During the main search, the mass tolerances for precursor and fragment ions were set to 4.5 and 20 ppm, respectively. Enzyme specificity was set as carboxy-terminal to arginine and lysine (trypsin), and a maximum of two missed cleavages was allowed at arginine/lysine-proline bonds. Carbamidomethylation of cysteine residues was set as a fixed modification, and oxidation of methionine (to sulfoxides) and acetylation of protein amino-termini were set as variable modifications. Proteins were quantified by the MaxLFQ algorithm integrated in the MaxQuant software. Only proteins with at least one unique or razor peptide were retained for identification, while a minimum ratio count of two unique or razor peptides was required for quantification 32,33.

Overrepresentation analysis. Significance of differential protein levels between the healthy control and CAD patient groups was established using a t-test. P-values were corrected for multiple testing according to Benjamini and Hochberg 34. Significance of differential protein levels was assessed while adhering to a 10% FDR cutoff.
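The multiple-testing correction named above can be sketched as follows. This is a minimal illustration of the Benjamini-Hochberg step-up adjustment in plain Python, not the authors' Perseus workflow; the function names and the example p-values are hypothetical, and only the 10% cutoff is taken from the text.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


def significant_at_fdr(p_values, fdr=0.10):
    """Flag tests whose adjusted p-value is at or below the FDR cutoff."""
    return [q <= fdr for q in benjamini_hochberg(p_values)]


# Hypothetical per-protein t-test p-values, for illustration only.
example_p = [0.01, 0.04, 0.03, 0.005]
example_q = benjamini_hochberg(example_p)
example_flags = significant_at_fdr(example_p, fdr=0.10)
```

Adjusting the p-values, rather than only returning a yes/no flag, makes it possible to re-threshold the same results at a different FDR without repeating the tests.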
To visualize the expression trend in the different genotypes, a heat map was generated using Perseus software (v.1.6.10.50). For overrepresentation analysis of molecular functions, a one-sided Fisher's exact test was used with significance at an alpha level of 0.05. Pathway analysis was performed using Reactome Pathway Analysis (https://reactome.org/).

String analysis. STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) was used for critical assessment and integration of protein-protein interactions (http://string-db.org/). The interactions were drawn from experimental evidence as well as from predictions based on knowledge gained from other organisms 35. Using STRING, the prioritized significant proteins were mapped and a network image was created.
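The one-sided Fisher's exact test used for overrepresentation analysis can be sketched as an upper-tail hypergeometric probability. This is an illustrative stand-alone version, not the routine run inside Perseus; the function name and argument names are hypothetical, and the term counts in the example are invented. Only the totals of 151 identified and 60 significant proteins come from the Results.

```python
from math import comb


def fisher_exact_greater(k, n, K, N):
    """One-sided Fisher's exact test p-value for overrepresentation.

    N: proteins in the background list
    K: background proteins annotated with the term
    n: proteins in the test list (a subset of the background)
    k: test-list proteins annotated with the term
    Returns P(X >= k) under the hypergeometric null distribution.
    """
    upper = min(n, K)
    total = comb(N, n)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible table configurations contribute nothing to the sum.
    tail = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return tail / total


# Illustration: 151 identified and 60 significant proteins (from the text);
# the term counts (12 annotated overall, 9 among the significant ones)
# are hypothetical.
p_term = fisher_exact_greater(k=9, n=60, K=12, N=151)
is_overrepresented = p_term < 0.05
```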
Statistical analysis. Data are presented as mean ± standard deviation for continuous variables, and as frequency and percentage for categorical variables. To analyze the effect of biochemical parameters in the disease and control groups, a univariate general linear model (ANCOVA) adjusted for age and gender was applied. The chi-square test and a multinomial regression model adjusted for age and gender were used to estimate odds ratios (95% CI). Allelic and genotypic frequencies were calculated by the direct gene counting method, and Hardy-Weinberg equilibrium was tested. All statistical analyses were performed in SPSS (IBM SPSS 20).
Compliance with ethical standards. All procedures performed in this study involving human participants were approved by the ethics review committee of the National Institute for Biotechnology and Genetic Engineering (Faisalabad, Pakistan). Informed consent was obtained from all study participants.
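As a rough illustration of how an odds ratio under a recessive genetic model is formed, the sketch below collapses genotype counts into AA versus (GG + GA) and computes an unadjusted odds ratio with a Wald 95% confidence interval. This is a simplification: the study reports estimates from a regression model adjusted for age and gender in SPSS, which this example does not reproduce. The function name and all counts are hypothetical.

```python
from math import exp, log, sqrt


def recessive_odds_ratio(case_counts, control_counts, z=1.96):
    """Unadjusted OR and Wald 95% CI for AA versus (GG + GA).

    Each argument is a (GG, GA, AA) tuple of genotype counts.
    All four collapsed cells must be non-zero.
    """
    case_gg, case_ga, case_aa = case_counts
    ctrl_gg, ctrl_ga, ctrl_aa = control_counts
    a = case_aa              # cases carrying AA
    b = case_gg + case_ga    # cases carrying GG or GA
    c = ctrl_aa              # controls carrying AA
    d = ctrl_gg + ctrl_ga    # controls carrying GG or GA
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the 2x2 table (Woolf's method).
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se)
    upper = exp(log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical genotype counts, for illustration only.
hyp_cases = (60, 20, 20)
hyp_controls = (70, 20, 10)
hyp_or, hyp_lo, hyp_hi = recessive_odds_ratio(hyp_cases, hyp_controls)
```

A covariate-adjusted estimate, as used in the study, would instead come from a logistic-type regression with the AA indicator, age, and gender as predictors.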

Results
The overall workflow of the research design is presented in Fig. 1. Initially, baseline clinical and anthropometric parameters were measured in both the disease and control groups (Table 1). In the control group, 80% of subjects were male and 20% were female, while in the disease group 45% were male and 55% were female. Approximately 9% of the selected subjects with CAD were taking antihypertensive drugs and < 20% were prescribed statins (cholesterol-lowering drugs). The samples were collected at the time of admission or 1 day after admission. Anthropometric and clinical parameters such as BMI, WHR, blood glucose, and blood pressure were significantly higher (p < 0.05) in the disease group than in the control group. Furthermore, significant differences between the disease and control groups were found in total cholesterol, HDL-C, LDL-C, triglycerides, serum uric acid, and serum creatinine. Genotyping was performed by T-ARMS-PCR, as shown in Fig. 2 (the full image of the gel is provided in Supplementary Fig. S2). Using the gene counting method, both genotypic and allelic frequencies of rs562338 (G/A) were calculated (Table 2). In the CAD group, 90.1% had the GG, 6.4% the GA, and 3.5% the AA genotype. In the control group, 87.2% had the GG, 11.8% the GA, and 0.9% the AA genotype. In the present study, genotypic frequencies were within Hardy-Weinberg equilibrium (HWE) (χ² = 1.07, p = 0.29). Significant differences were observed between the genotypic frequencies (p = 0.007) of the disease and control groups, while their allelic frequencies did not differ significantly. Furthermore, in genetic modeling, the rare genotype (AA) was found to be strongly associated with CAD [OR = 4 (1.9-16.7)] when compared to the control subjects in the recessive genetic model (p = 0.04).
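The gene counting and Hardy-Weinberg steps described above can be sketched as follows. The genotype counts are an assumption: they are back-calculated from the rounded control-group percentages (87.2%, 11.8%, and 0.9% of n = 220) and are not figures quoted from the paper's tables. The function names are hypothetical, and the p-value uses the closed form for a chi-square statistic with one degree of freedom.

```python
from math import erfc, sqrt


def allele_counts(n_gg, n_ga, n_aa):
    """Count G and A alleles directly from genotype counts."""
    n_g = 2 * n_gg + n_ga
    n_a = 2 * n_aa + n_ga
    return n_g, n_a


def hwe_chi_square(n_gg, n_ga, n_aa):
    """Chi-square goodness-of-fit test (1 df) against HWE proportions."""
    n = n_gg + n_ga + n_aa
    n_g, n_a = allele_counts(n_gg, n_ga, n_aa)
    p = n_g / (2 * n)  # frequency of the G allele
    q = n_a / (2 * n)  # frequency of the A allele
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_gg, n_ga, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Assumed control-group counts (GG, GA, AA), reconstructed from the
# reported percentages; not taken from the paper's tables.
ctrl = (192, 26, 2)
ctrl_g, ctrl_a = allele_counts(*ctrl)
ctrl_maf = ctrl_a / (ctrl_g + ctrl_a)
ctrl_chi2, ctrl_p = hwe_chi_square(*ctrl)
```

With these assumed counts the statistic evaluates to roughly 1.08 with a p-value near 0.30, close to the reported χ² = 1.07 and p = 0.29.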
For the proteomic analysis, pooled serum samples of each genotype were analyzed on the Q-Exactive HF-X. Each sample was run in triplicate. A total of 151 proteins were identified in all samples using MaxQuant (a list of these proteins is provided in Supplementary Table S1). After applying several filtering steps in Perseus (v.1.6.10.50), including removal of contaminants, reversed hits, and proteins only identified by site, a total of 60 significant proteins were obtained, as depicted in Fig. 3a. Reproducibility of these significant proteins between the genotypes was assessed with a multi-scatter plot of correlation coefficients (Fig. 3b). The data were further analyzed within the disease group to check for differences between subjects having the GA and AA genotypes and those with the normal GG genotype. Out of the 60 significant proteins, 25 were exclusively identified in the GA genotype and 26 were exclusively identified in the AA genotype (Fig. 3c), while 9 proteins were common to both genotypes. A heat map of all proteins is presented in Fig. 3d, depicting the fold differences of XIC intensities in both the GA and AA genotypes. A detailed description of each protein, with log2 fold change (FC) and p-values, is given in Table 3. Out of these 9 common proteins, three (ITIH4, HPX, and C3) showed higher differential expression in the AA genotype than in the GA genotype, relative to the wild-type GG genotype (Table 3).

Figure 1. Schematic representation of rs562338 genotyping and its genotype-based differential expression of proteins using label-free quantitative (LFQ) proteomics. The shotgun proteomics study with a disease-control design (n = 700) was conducted to dissect the comparative differential serum proteome with respect to the genetic variant. A genotypic-phenotypic relation was studied. Further details can be found under Experimental Procedures.
Gene Ontology (GO) analysis of protein molecular functions with relevance to CAD showed that these proteins were involved in the modulation of cell migration and proliferation during development of the acute-phase response, cell protection from oxidative stress, and complement system activation (Fig. 4a,b). Six proteins (PPBP, PLG, GSN, APOC3, AHSG, and IGKV3-11) showed decreased differential expression in the AA genotype and increased expression in the GA genotype in comparison to GG. Molecular function analysis showed that these proteins were involved in neutrophil activation, blood coagulation, actin modulation, hepatic inhibition of triglyceride-rich particles, suppression of arterial calcification, and antigen binding (Fig. 4a,b). The relevant protein-protein interactions among them are presented in Fig. 5e. Furthermore, the proteins exclusive to the GA and AA genotype carriers, as compared to the wild-type GG genotype carriers, were also analyzed, and most of them fell into the categories of protein-modifying enzymes, transfer/carrier proteins, metabolite interconversion enzymes, protein-binding activity modulators, and defense/immunity proteins. The percentage distribution of each category is given in Fig. 4c and d. Statistical overrepresentation analysis of molecular functions with reference to CAD in both the GA and AA genotypes showed that phosphatidylcholine-sterol O-acyltransferase activator activity, cholesterol transfer activity, sterol transfer activity, and phosphatidylcholine binding were exclusively up-regulated in GA, whereas platelet degranulation and response to elevated platelet cytosolic Ca²⁺ were exclusively up-regulated in the AA genotype. Similarly, serine-type endopeptidase inhibitor activity and endopeptidase regulation/inhibition were down-regulated in both GA and AA (Fig. 4e and f). STRING analysis of the up- and down-regulated exclusive proteins in both genotypes showed strong interactions among them, as depicted in Fig. 5a-d.

Table 1. Baseline anthropometric, clinical, and biochemical parameters of study subjects. BMI body mass index, WHR waist-to-hip ratio, LDL-C low-density lipoprotein cholesterol, HDL-C high-density lipoprotein cholesterol, TG triglycerides, ARBs angiotensin II receptor blockers, ACE angiotensin-converting enzyme. Diabetes was defined as fasting blood glucose of ≥ 126 mg/dl. Values are expressed as mean ± SD, and percent (frequency). A univariate general linear model (ANCOVA) adjusted for age and gender was used to analyze the effect of different variables in the study groups. **Significant (p-value < 0.01).

Discussion
Several biological processes involve different types of biomolecules, and hence a single type of biomolecule may not give a clear picture across multiple platforms such as the genome, transcriptome, proteome, metabolome, or ionome [36][37][38]. The discovery of new biological insight has become challenging, hindered by difficulties in combining single-omic datasets in a meaningful manner. Therefore, it is important to consider these biological layers both as separate elements and in their interaction with one another for a more comprehensive understanding of fundamental biological processes.

The apolipoprotein B gene (APOB) has been associated with dyslipidemia and is a risk factor for cardiovascular diseases (CVDs), especially coronary artery disease (CAD). Various genetic determinants of APOB are known to be associated with increased LDL-cholesterol levels in CAD 39. However, genetic predisposition alone does not provide a clear picture of the complex disease etiology or point to more efficient therapeutic targets.
To understand the genotype-phenotype relationship, several layers of information from multiple "omics" platforms are required. In the present study, we used a proteogenomic approach to better understand the disease pathology. This is, to the best of our knowledge, the first proteogenomic approach used to study the rs562338 (G/A) polymorphism of the APOB gene in CAD in a Pakistani cohort.
In the current study, the majority of the patients were enrolled at the time of admission, or 1 day after admission, to the Coronary Care Unit. Data on their drug regimen showed that ~ 9% of patients were taking antihypertensive drugs while < 20% were prescribed statins. With a half-life of 1-3 h, a statin must be administered in multiple doses in order to produce an effect in patients 40. Therefore, it was assumed that the initial statin level in the CAD patients' serum would not significantly influence the cholesterol metabolism-related pathways. In the first part of the study, genotyping of rs562338 (G/A) of APOB, we found the frequency of the minor allele rs562338-A to be 7% in our patient cohort, which was high in comparison to 1.1% in the HapMap-HCB (Han Chinese in Beijing) and < 1% in the Han Chinese population 13,41. These differences are likely due to the significant ethnic diversity between the two populations. We found a strong association of the rs562338-AA genotype with increased risk of CAD (CAD cases versus healthy controls: odds ratio (OR) = 4; 95% CI 1.9-16.7; p = 0.04). These findings are in agreement with multiple studies in American and European populations, which showed a strong association of the rs562338 polymorphism with higher levels of LDL-cholesterol, and in turn with an increased risk of CAD 42,43.
In the second step, using label-free proteomics, we analyzed the differential expression of significant serum proteins from all three genotypes (GG, GA, and AA) in the CAD patients. The purpose of this strategy was to analyze the influence of these genotypes of the APOB gene on the patient serum proteome. In the comparison of common proteins, three acute-phase proteins 44,45 (ITIH4, HPX, and C3) were found to be differentially down-regulated in GA as compared to AA, with reference to the wild-type GG genotype. ITIH4 has been reported to be a putative anti-inflammatory marker in ischemic stroke 46 and in several types of cancer [47][48][49]. HPX is a 60-kDa plasma glycoprotein that represents the primary line of defense against heme-related oxidative stress and toxicity 50. The HPX molecule acts as a heme-specific carrier from the bloodstream to the liver, and excess heme may be detrimental to tissues by mediating oxidative and inflammatory injuries 51,52. HPX is also known to inhibit LDL oxidation, and hence reduce atherogenesis 53. In our study, the levels of ITIH4 and HPX were up-regulated in the AA genotype and down-regulated in the GA genotype as compared to the GG genotype. These results showed less anti-inflammatory activity in AA in contrast to the GG genotype. C3, the complement protein secreted by liver and adipose tissues, is the central component of the complement system. Our findings are in agreement with several studies which reported complement C3 as a possible biomarker of cardio-metabolic diseases and insulin resistance [54][55][56].
Out of the six common proteins up-regulated in the GA genotype, the highest fold changes were observed for PLG, PPBP, and APOC3. PLG (plasminogen) plays a pivotal role in fibrinolysis and wound healing. This protein generates the active enzyme plasmin, which is essential for the dissolution of blood clots and is important in wound healing 57. Its deficiency may result in an increased risk of thrombosis 58. In agreement with our findings, Folsom et al. also found a positive association between PLG and the risk of cardiovascular diseases 59. Pro-platelet basic protein (PPBP), or chemokine (C-X-C motif) ligand 7 (CXCL7), is a small cytokine of the CXC chemokine family. It is released in large amounts from activated platelets in response to vascular injury 60. It stimulates various processes, including mitogenesis, glucose metabolism, and the synthesis of extracellular matrix, and is a plasminogen activator 61,62. In Thai hyperlipidemia patients, Maneerat et al. found a strong correlation of PPBP with the risk of CHD development 63. In the present study, we also found up-regulation of ApoC3 in GA. ApoC3, also known as apolipoprotein C3, is a carrier/transporter protein found on the surface of triglyceride-rich lipoproteins (TRLs), such as chylomicrons, VLDL, and remnant cholesterol 64. Recent evidence suggests that it promotes vascular inflammation and TRL-mediated atherogenicity. Furthermore, dysfunctional ApoC3 is associated with lower levels of plasma triglycerides and a reduced risk of CHD 65,66. Our data suggest that the GA genotype is more prone to TRL-mediated atherogenicity than the AA genotype.

Table 3. Genotype (rs562338) based differential expression of serum proteins in CAD patients.
In the current study, we also found some proteins that were exclusively identified in CAD patients with the GA and AA genotypes with reference to patients with the GG genotype (Fig. 4a,c). In the exclusive protein comparison of the GA/GG genotypes, the GO annotation shows an up-regulation of activities including phosphatidylcholine-sterol O-acyltransferase activator activity, cholesterol transfer activity, sterol transfer activity, and phosphatidylcholine binding (Fig. 4b). All of these activities are involved in the regulation and uptake of cholesterol and in reverse cholesterol transport. A high level of phosphatidylcholine-sterol O-acyltransferase (LCAT) might be associated with a decrease in LDL particle size and an increase in TRL markers in CVD patients 67,68. Our data suggest that the GA genotype has high LCAT activity as compared to the GG genotype and may have less atherogenic risk in terms of LCAT-related genes. Similarly, the up-regulated GO-annotated functions in the AA genotype include platelet degranulation and the response to elevated levels of cytosolic Ca²⁺. Both of these responses are involved in platelet activation, which plays an important role in the pathophysiology of CVDs because these proteins are implicated in thrombus formation after atheroma plaque rupture 69,70. Furthermore, serine-type endopeptidase inhibitor activity was found to be down-regulated in both the GA and AA genotypes. Serine proteases are key components of the inflammatory response and play a major role in the body's defense mechanisms, as well as in vascular homeostasis and tissue remodeling 71. These proteins are either produced through the coagulation cascade or released from activated leukocytes and mast cells. Multiple studies have found that leukocyte activation in several conditions, including infection, hypertension, hyperlipidemia, hyperglycemia, obesity, and atherosclerosis, is associated with increased CVD risk [72][73][74][75]. Their down-regulation in our study suggests protection from CAD.

Overall, the proteomic analysis showed significant up-regulation of proteins involved in pathways related to the pathogenesis of CAD, such as cholesterol metabolism, in the AA genotype as compared to the GG genotype. This finding parallels the genomic association of the AA genotype with the risk of CAD. Furthermore, these results are compatible with the findings of the biochemical analysis of our studied metabolites: we found significantly higher levels of triglycerides, cholesterol, and LDL, and significantly lower levels of HDL, in patients as compared to the control group. This indicates disturbances in cholesterol-related pathways.
In the present study, a strong association of the rs562338-AA genotype of the APOB gene with CAD risk in the Pakistani population was found. However, the study has certain limitations. APOB is a large gene (43 kb), and observing the effect of multiple polymorphisms of this gene on the CAD proteome was beyond the scope of the current work. The present study observed the effect of a single SNP; however, an effect of other SNPs on the proteome of the patients' serum samples cannot be excluded. Further, CAD is a complex metabolic condition in which multiple factors are responsible for the pathogenesis of the disease. The current study presents only the effect of the SNP on the proteome of the CAD patients' serum samples; metabolomic profiling of the same samples may reveal a more detailed picture of the perturbations observed in the molecular pathways. There are also some specific limitations related to the study subjects. Recruitment of controls was based only on baseline biochemical parameters and previous history. Moreover, data on other clinical parameters, such as ejection fraction and cardiac biomarkers, were not collected, and their absence may have affected the results. Due to the limited sample size, the study has low statistical power and a low frequency of the APOB rs562338-AA genotype. A large population-based study is recommended to increase the statistical power and to confirm any ethnic differences in this polymorphism.

Conclusion
In summary, we found a strong association of the rs562338-AA genotype (recessive model) of the APOB gene with CAD risk in the Pakistani population. Similarly, in the serum proteomic analysis, the AA genotype of the rs562338 (G/A) polymorphism was more actively involved in CAD-relevant pathways than the GG genotype. This genotypic-phenotypic study provides a better understanding of CAD prevalence in local populations. In future, such studies need to be conducted on a large scale in different sub-population groups to validate the effect of multiple genetic determinants on the occurrence of complex and multifactorial diseases, such as CVDs. Furthermore, the "proteogenomics" approach is recommended to better understand the disease pathology and to pave the way for more efficient and personalized therapeutic interventions.