Large-scale Metabolomic Analysis Reveals Potential Biomarkers for Early Stage Coronary Atherosclerosis

Coronary atherosclerosis (CAS) underlies the pathogenesis of coronary heart disease, a prevalent, chronic, and life-threatening disease. The disease often goes undetected until a patient presents with serious vascular occlusion. Therefore, new biomarkers for appropriate and timely diagnosis of early CAS are needed for screening so that therapy can be initiated on time. In this study, we used an untargeted metabolomics approach to identify potential biomarkers that could enable highly sensitive and specific CAS detection. Score plots from partial least-squares discriminant analysis clearly separated early-stage CAS patients from controls. Meanwhile, the levels of 24 metabolites increased greatly and those of 18 metabolites decreased markedly in early CAS patients compared with the controls, which suggested significant metabolic dysfunction in phospholipid, sphingolipid, and fatty acid metabolism in the patients. Furthermore, binary logistic regression showed that nine metabolites could be used as a combinatorial biomarker to distinguish early-stage CAS patients from controls. The panel of nine metabolites was then tested with an independent cohort of samples, which also yielded satisfactory diagnostic accuracy (AUC = 0.890). In conclusion, our findings provide insight into the pathological mechanism of early-stage CAS and also supply a combinatorial biomarker to aid clinical diagnosis of early-stage CAS.

Table (column headers only; body not recovered). Training set: early CAS (n = 60), control (n = 60), P-value. Test set: early CAS (n = 40), control (n = 40), P-value.

Metabolic profiling of the plasma samples. We obtained 3892 and 2936 aligned individual peaks (variables) in ESI+ and ESI− mode, respectively. These peaks were for quasi-molecular ions, isotope ions, adduct ions, and fragment ions of the metabolites. Examples of the LC-QTOF/MS total ion chromatograms of plasma samples from an early stage CAS patient and a control subject are shown in Supplementary Fig. S1. Firstly, the unbiased PCA revealed that all the QC samples were tightly clustered in PCA score plots (Supplementary Fig. S2), which confirmed that our method was robust. The score plots from the PCA model performed on all plasma samples also showed that there were no extreme outliers that needed to be excluded from subsequent analysis. Nonetheless, no obvious separation trends between the two groups were observed when variables were not selected.
Then, to further explore the metabolic differences between the early stage CAS group and the controls, PLS-DA models were established in the training set. As shown in Fig. 1A,C, the early stage CAS subjects were clearly separated from the controls with little overlap. The parameters quantifying the PLS-DA models were positive (R²X = 0.178, R²Y = 0.933, Q² = 0.540 in ESI+ mode and R²X = 0.168, R²Y = 0.943, Q² = 0.356 in ESI− mode), indicating the goodness of fit and predictive ability of the models 22. In the training set, three samples from control subjects (5%) were wrongly classified in ESI+ mode, while no misclassifications were found in ESI− mode. Furthermore, the supervised PLS-DA models were validated with permutation tests to ensure that they were not overfitted. The validation plots of the permutation tests (Fig. 1B,D) supported the validity of the constructed PLS-DA models, as all the goodness-of-fit values (R² and Q²) calculated from the permuted data (in green on the left) were lower than the original point on the right, and the Q² regression line (in blue) had a negative intercept 23.
Selection and identification of potential metabolic biomarkers. To identify potential biomarkers of early stage CAS, variables that dominated the discrimination were first selected according to their VIP values (VIP > 1), which were calculated from the PLS-DA model. A nonparametric Kruskal-Wallis test was then performed, and variables without significant differences between the two groups (p ≥ 0.05) were eliminated. The remaining biomarker candidates were selected for subsequent identification. The procedure for metabolite identification is detailed in our previous work 23. Following the procedure, a total of 20 differential endogenous metabolites in ESI+ mode and 22 metabolites in ESI− mode were identified (Table 2). Among them, the identities of seven were confirmed using reference standards, and 29 were identified by online database searches (HMDB and METLIN) and LC-QTOF/MS. The MS/MS spectra of the metabolites are shown in Supplementary Figs S3-S7.
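The two-step selection rule described above (VIP > 1, then Kruskal-Wallis p < 0.05) can be illustrated with a minimal sketch. This is not the authors' pipeline: the VIP values are assumed to come from an external PLS-DA fit, the function names and the example feature table are hypothetical, and the p-value uses the chi-square approximation with one degree of freedom, which applies to the two-group case only.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_two_groups(a, b):
    """Kruskal-Wallis H test for two groups with tie correction; returns (H, p)."""
    pooled = list(a) + list(b)
    n = len(pooled)
    ranks = average_ranks(pooled)
    r_a = sum(ranks[:len(a)])
    r_b = sum(ranks[len(a):])
    h = 12.0 / (n * (n + 1)) * (r_a ** 2 / len(a) + r_b ** 2 / len(b)) - 3 * (n + 1)
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:  # every value identical: no evidence of a difference
        return 0.0, 1.0
    h = max(h / correction, 0.0)  # guard against tiny negative rounding errors
    # Survival function of the chi-square distribution with 1 degree of freedom
    p = math.erfc(math.sqrt(h / 2))
    return h, p

def select_candidates(features, vip_cutoff=1.0, alpha=0.05):
    """Keep features with VIP above the cutoff and a significant group difference."""
    selected = []
    for name, (vip, cases, controls) in features.items():
        _, p = kruskal_wallis_two_groups(cases, controls)
        if vip > vip_cutoff and p < alpha:
            selected.append(name)
    return selected

# Hypothetical feature table: name -> (VIP, case intensities, control intensities)
example_features = {
    "feature_1": (1.8, [1, 2, 3, 4, 5], [6, 7, 8, 9, 10]),   # high VIP, separated
    "feature_2": (0.6, [1, 2, 3, 4, 5], [6, 7, 8, 9, 10]),   # separated, low VIP
    "feature_3": (1.5, [1, 3, 5, 7, 9], [2, 4, 6, 8, 10]),   # high VIP, overlapping
}
candidates = select_candidates(example_features)
```

In this toy table only `feature_1` passes both filters, which mirrors the requirement that a variable be significant in both the multivariate and the univariate sense.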
The concentrations of 24 metabolites were significantly higher in the early stage CAS patients than in the controls (Table 2). By contrast, the concentrations of 18 metabolites were lower in early stage CAS patients than in the controls. These differences between the two groups are expressed as fold change. In addition, the relative standard deviations of the intensities of these 42 biomarkers were calculated in the QC samples, and varied from 4.63% to 29.40% with a median of 16.14%. These data indicated that our metabolic profiling platform was robust, and that changes in the biomarkers arose from the disease state rather than analytical errors. Furthermore, to investigate whether these differential metabolites are closely associated with clinical measures, Pearson correlation analysis was performed. However, the early stage CAS patients and their controls enrolled in this study exhibited no significant differences in clinical characteristics, so it is to be expected that no correlation was found between the majority of metabolic biomarkers and clinical parameters. Although certain biomarkers showed correlations with clinical parameters at a cutoff of p = 0.05 (e.g. LysoPC(18:4(6Z,9Z,12Z,15Z)) and LysoPE(18:2)), the correlation coefficients were very low (<0.3075), as presented in Supplementary Table S1. Thus, these correlations need to be further investigated.
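The QC relative standard deviation (RSD) summary reported above can be reproduced in principle with a few lines. The sketch below assumes the sample standard deviation (n − 1 denominator); the text does not state which estimator was used, and the intensity values and names are hypothetical.

```python
import statistics

def rsd_percent(intensities):
    """Relative standard deviation (%) of one metabolite across QC injections.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    mean = statistics.mean(intensities)
    if mean == 0:
        raise ValueError("mean intensity is zero; RSD is undefined")
    return statistics.stdev(intensities) / mean * 100.0

def summarize_qc(qc_table):
    """Return (min, median, max) RSD over all metabolites in a QC table."""
    rsds = [rsd_percent(values) for values in qc_table.values()]
    return min(rsds), statistics.median(rsds), max(rsds)

# Hypothetical QC intensities: metabolite -> intensity in each QC injection
example_qc = {
    "metabolite_a": [100.0, 110.0, 90.0],   # RSD = 10%
    "metabolite_b": [50.0, 55.0, 45.0],     # RSD = 10%
    "metabolite_c": [200.0, 240.0, 160.0],  # RSD = 20%
}
lowest, median, highest = summarize_qc(example_qc)
```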

Evaluation of the diagnostic potential of the metabolic biomarkers. To assess the diagnostic utility of the metabolites for discrimination between early stage CAS patients and controls, ROC curves were constructed for the 42 metabolites. For most biomarkers, the area under the curve (AUC) was < 0.7 (Table 2), indicating poor predictive ability when used individually. Therefore, multiple metabolites need to be combined to diagnose early CAS.
To validate the diagnostic capability of the combinatorial model, an independent cohort of 40 early stage CAS patients and 40 control subjects was used. None of the samples had been previously included in the training set, and this allowed for estimation of true predictive accuracy. In this case, the plasma biomarkers model still exhibited good classification ability (Fig. 2B). The AUC reached 0.890 (95% confidence interval 0.822-0.961). The percentage of correct diagnoses at the same cutoff value of 0.4789 was 85.0% for early stage CAS patients and 80.0% for the control subjects (Fig. 2C). The external validation study confirms the outstanding performance of the LC-MS plasma metabolomics platform for diagnosis of early stage CAS patients.
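To make the two reported quantities concrete (AUC, and the percentage of correct diagnoses at a fixed cutoff), here is a minimal sketch. It computes the AUC as the probability that a randomly chosen patient scores higher than a randomly chosen control, which is equivalent to the area under the empirical ROC curve. The scores are hypothetical, and treating a score equal to the cutoff as positive is an assumption. For reference, 85.0% of 40 patients and 80.0% of 40 controls correspond to 34 and 32 correctly classified subjects.

```python
def roc_auc(case_scores, control_scores):
    """AUC via pairwise comparison; ties between a case and a control count 0.5."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

def correct_rates(case_scores, control_scores, cutoff):
    """Fraction of cases at or above the cutoff and of controls below it."""
    sensitivity = sum(s >= cutoff for s in case_scores) / len(case_scores)
    specificity = sum(s < cutoff for s in control_scores) / len(control_scores)
    return sensitivity, specificity

# Hypothetical predicted probabilities from a combined-biomarker model
example_cases = [0.9, 0.8, 0.6, 0.3]
example_controls = [0.7, 0.4, 0.2, 0.1]
auc = roc_auc(example_cases, example_controls)                        # 13/16 = 0.8125
sens, spec = correct_rates(example_cases, example_controls, 0.4789)   # 0.75, 0.75
```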
Staging analysis. In addition to accurate diagnosis, classification of the severity of CAS is critical for patient management and determining prognosis. Thus, we investigated the potential of plasma metabolomics for stratifying the severity of CAS in the combined training and test data sets. For simplicity, we divided the patients into three groups based on the number of artery stenoses. When using the entire data set, the established PLS-DA models discriminated the groups from each other well (see Supplementary Fig. S8). Furthermore, all AUC values were greater than 0.90 (Table 3). These results highlight the potential of metabolomics in the staging of CAS. In addition, the concentrations of five metabolites were found to be significantly different in patients who were at different stages of the disease (Supplementary Fig. S9). However, further validation and selection of more differential metabolites should be carried out using a larger patient cohort to confirm these results.

Discussion
Complex diseases such as CAS with multiple etiological factors necessitate a systemic approach for mechanistic understanding and optimization of early diagnosis. In this study, we applied metabolomics, covering thousands of small molecular endogenous metabolites, to characterize metabolic alterations of early stage CAS.
Our results demonstrated that multivariate models can accurately distinguish early stage CAS patients from control subjects, and 42 plasma metabolites were identified as biomarkers of early stage CAS. However, diagnosis based on so many metabolites would not be convenient or economical in clinical practice. Thus, simplification of the plasma metabolite signature is required for practical diagnosis of early stage CAS. To accomplish this, we performed a binary logistic regression, in which nine metabolites were selected as the best predictors for early stage CAS discrimination. Furthermore, the AUC was calculated to quantitatively assess the diagnostic performance of this simplified metabolite signature. The findings indicated that the simplified metabolite signature of the nine biomarkers was a good classifier for discrimination of early stage CAS patients from controls, and this was supported by the satisfactory AUC values of 0.898 in the training set and 0.890 in the test samples. However, further studies involving a larger sample set or heterogeneous population are needed to verify these novel biomarkers.
In the current study, in addition to assessing the potential of these biomarkers as diagnostic indicators for early stage CAS, we investigated the biology and metabolic functions of the biomarkers to enhance our understanding of the disease's metabolic mechanisms. The pathways for the biomarkers (Table 2) were determined by searching the KEGG PATHWAY Database, Human Metabolome Database and ChEBI Database.
Among the metabolites, a series of lysophosphatidylcholines (LysoPCs) and lysophosphatidylethanolamines (LysoPEs), in addition to phosphatidylcholines (PC), phosphatidylethanolamines (PE) and phosphatidylglycerols (PG), were greatly altered in early stage CAS patients compared to the controls. A well-known mechanism of lysophospholipid production is hydrolysis of phosphoglycerides by phospholipase A2 (Supplementary Fig. S10A). Although elevated levels of lysophospholipids have been reported to induce oxidative stress on endothelial cells, which leads to AS and cardiovascular disease 24-26, it has also been observed that lysophospholipids produced by a PLA2-like activity of paraoxonase 1 contribute to inhibition of macrophage biosynthesis and consequently reduce cellular cholesterol accumulation and atherogenesis 27. Lysophospholipids have been widely recognized as pro-inflammatory and pro-atherogenic metabolites 28, but some recent population-based studies have suggested that lysophospholipids have protective effects on CHD and its risk factors. Fernandez et al. and Stegemann et al. found an inverse association between several LysoPCs and incident CHD 29,30. In a study of type 2 diabetes, LysoPC 18:2 was found to be inversely associated with incident diabetes and impaired glucose tolerance 31. Our study confirms and extends these previous findings. With this knowledge, we inferred that disturbed phospholipid catabolism is closely interrelated with early stage CAS.
We […] Fig. S10A). Within the intestinal wall, MGs are precursors to triglycerides via the MG pathway before being transported in lymph to the liver 32. Thus, it has been shown that DGs and MGs are central in the synthesis and breakdown of triglycerides, and a large randomized analysis has recently shown that this has a positive causal effect on CHD risk 33. On the other hand, the disturbed DG and MG metabolism observed in this study may lead to an increase in the number of free fatty acids, and further migration and invasion of macrophages via the p38 MAP-kinase signaling pathway, Toll-like receptors 2 and 4, and JNK-dependent pathways 34,35. Several studies have suggested that accumulated macrophages play a key role in the formation and development of AS 36. Therefore, the perturbed DG and MG metabolites, particularly MG(18:2) and MG(18:3), are likely to be associated with the onset of early stage AS, which is in line with previous reports demonstrating that they are involved in the pathogenesis of CHD 23,32.
The levels of phytosphingosine and sphinganine were significantly elevated in the early stage CAS patients compared to the controls in our study. It is known that accumulation of phytosphingosine and sphinganine occurs because of sphingomyelinase-mediated hydrolysis of sphingolipids 37. Sphingolipids, which are a large class of lipids, play important roles as both membrane components and signaling molecules involved in diverse cell processes, including cell-cell interactions, cell proliferation, cell differentiation, and apoptosis 38. Emerging evidence has shown that sphingolipid-mediated cellular signaling pathways play a critical role in cardiovascular pathophysiology 39. It has been reported that sphingolipids have the capacity to reduce triglyceride and cholesterol levels 40. However, higher concentrations of phytosphingosine and sphinganine in plasma samples from patients suggested that sphingolipids were depleted, which increased the risk of AS and metabolic syndrome 41. In addition, using metabolomics, Liu et al. and Qi et al. 42,43 found that phytosphingosine and sphinganine levels were significantly increased in a myocardial ischemia rat model, which is in accordance with our study. Therefore, it is reasonable to suggest that sphingolipid metabolism is activated in the early stages of CAS.
Table 3. The diagnostic potential of PLS-DA models for different CAS stages.
We found that several metabolites of interest were involved in metabolic processes related to long-chain fatty acids. Generally, fatty acids are an important source of energy for the heart. Under AS conditions, the oxygen requirements of the heart exceed the oxygen supply to the heart. In our study, we observed significantly enhanced levels of plasma long-chain fatty acids in early stage CAS patients, indicating that the increased abundance of plasma long-chain fatty acids was probably the result of strong de novo fatty acid synthesis during the initiation and progression of AS to supply the required energy. Zha et al. 44 showed that syntheses of polyunsaturated fatty acids and unsaturated fatty acids were significantly upregulated in an early AS animal model. Interestingly, we also found that the plasma concentrations of long-chain fatty acids, such as palmitic acid, linolenic acid, and elaidic acid, were significantly higher in the early stage CAS group than in the control group. Therefore, the metabolism of long-chain fatty acids might have a pivotal pathogenetic role in triggering CAS.
In summary, our study demonstrates that LC-MS-based plasma metabolomics is a powerful approach that can accurately distinguish early stage CAS patients from control subjects. The results provide a panel of metabolite markers that have clinical potential for disease diagnosis and patient stratification for early CAS. These metabolite markers are involved in several key metabolic pathways such as phospholipid metabolism, sphingolipid metabolism, and fatty acid metabolism. The present study is the first clinical metabolomics study focusing on early stage CAS to suggest that plasma metabolomics could be used for non-invasive early diagnosis and surveillance of CAS with high sensitivity and specificity. The elucidation of the associations between biomarkers and early stage CAS events increases mechanistic understanding of early stage CAS.

Methods
Patients. The study protocol was approved by the Ethics Committee of the Second Affiliated Hospital of Harbin Medical University. All experiments were performed in accordance with relevant guidelines and regulations. Subjects were enrolled between August, 2012 and July, 2014 from the Department of Cardiology, 2nd Affiliated Hospital of Harbin Medical University, Harbin, China. Patients were included in this study if they underwent diagnostic CAG for the evaluation of coronary artery disease and did not have significant coronary artery stenoses (i.e., stenosis < 50%). According to Tousoulis's study 45 , we defined early stage CAS patients as individuals with newly diagnosed, angiographically documented coronary stenosis < 50% in at least one major coronary artery, while the controls showed no apparent lesions in angiography. Exclusion criteria for this study included the following: previous myocardial infarction or myocardial revascularization or percutaneous coronary intervention; heart failure (left ventricular ejection fraction less than 30%); valvular heart disease; any metabolic disease (e.g., diabetes mellitus); malignancy; liver/renal disease; inflammatory disease (e.g., infections); pregnancy or lactation; multiple organ function failure; and previous coronary artery bypass surgery. All participants provided written informed consent, were screened for age, sex, weight, cardiac risk factors, prior cardiac disease, cardiac medications, and were given hematological and biochemical examinations.
Peripheral venous blood samples (5 mL) were collected in the morning before breakfast from 100 early stage CAS patients and 100 controls using vacutainer tubes containing fresh sodium dihydrogen phosphate anticoagulant. The plasma samples were separated by centrifugation at 1000 × g for 10 min and stored at -80 °C until required for further analysis.
Blank and quality control samples. Blank and quality control (QC) samples were analyzed throughout the whole experimental procedure. A blank (75% acetonitrile) was run after every five samples to identify and minimize sample carryover. The QC samples were created by combining equal volumes of plasma samples from 20 patients with early stage CAS and 20 controls. The QC samples were injected four times in randomized order within every analytical batch, and used to monitor the stability and performance of the system and evaluate the quality of the acquired data.
Sample preparation. Before analysis, all of the plasma samples, including the QC samples, were processed according to our previous method with minor modifications 23. Briefly, methanol (1000 μL) was added to 200 μL of plasma and vortex-mixed vigorously for 2 min. The mixture was centrifuged at 14,000 × g for 15 min at 4 °C. The supernatant was transferred to a clear vial and evaporated under a stream of nitrogen at 37 °C. The residue was dissolved in 200 μL of acetonitrile/water (3:1, v/v), vortex-mixed for 60 s and then centrifuged at 14,000 × g for 15 min at 4 °C. The supernatant was then placed into a sample vial for LC-QTOF/MS analysis.
Chromatography. Chromatographic separation was performed on an Agilent Technologies 1260 liquid chromatography system using a ZORBAX SB-C18 column (100 mm × 3.0 mm i.d., 1.8 μm, Agilent, Santa Clara, CA) at 40 °C. The mobile phase was a mixture of water containing 0.1% formic acid (A) and acetonitrile with 0.1% formic acid (B). The mobile phase flow rate was 0.5 mL/min. A linear gradient elution was performed, starting with 5% B, increasing to 98% B over 18 min, and holding at 98% B for 3 min. Subsequently, the mobile phase was returned to the initial condition (5% B) within 0.1 min, and maintained at this level for 7 min for equilibration. The injection volume of the sample was 10 μL. All samples were maintained at 4 °C during the analysis 46.
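The gradient program above can be written as a piecewise function of time, which makes the total run time explicit (18 + 3 + 0.1 + 7 = 28.1 min). This is only a sketch: the function name is hypothetical, and the 0.1 min return to initial conditions is assumed to be linear.

```python
def percent_b(t_min):
    """Percentage of mobile phase B at time t_min (minutes) into the run."""
    if t_min < 0 or t_min > 28.1:
        raise ValueError("time is outside the 28.1 min run")
    if t_min <= 18.0:
        # Linear ramp from 5% B to 98% B over the first 18 min
        return 5.0 + (98.0 - 5.0) * t_min / 18.0
    if t_min <= 21.0:
        # Hold at 98% B for 3 min
        return 98.0
    if t_min <= 21.1:
        # Return to 5% B within 0.1 min (assumed linear), clamped at 5%
        return max(5.0, 98.0 - (98.0 - 5.0) * (t_min - 21.0) / 0.1)
    # Re-equilibration at the initial condition for 7 min
    return 5.0

# Composition at a few time points (min)
profile = {t: percent_b(t) for t in (0.0, 9.0, 18.0, 20.0, 25.0)}
```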
Scientific Reports | 7: 11817 | DOI:10.1038/s41598-017-12254-1
Mass spectrometry. Metabolic profiling was conducted using an Agilent 6530 series quadrupole time-of-flight mass spectrometer equipped with a dual electrospray ionization source (ESI). The ionization was operated in positive (ESI+) or negative (ESI−) mode. The mass spectrometry parameters were set as previously described 23. To ensure mass accuracy and reproducibility, the mass spectrometer was internally mass calibrated in real time with purine (m/z 121.0509 and m/z 119.0363 in ESI+ and ESI− mode, respectively).
Tandem mass spectrometry (MS/MS) experiments were carried out in targeted MS/MS mode to identify potential biomarkers. Argon was employed as the collision gas, and collision energy was set at 10, 20, or 40 eV.
Data preprocessing and annotation. The raw data acquired from LC-QTOF/MS were initially converted into mzData format via Mass Hunter Qualitative Analysis Software (Agilent) and then imported into the xcms package in the R platform for preprocessing 47. The default xcms parameters were used, with the following exceptions: xcmsSet(method = "centWave", peakwidth = c(10, 50)) 47. Preprocessing yielded a three-dimensional data set of retention time, mass-to-charge ratio (m/z), and peak intensity. Then the R package CAMERA was used for annotation of isotope peaks, adducts, and fragments 47. Finally, the data for each sample were normalized to the total sum of peak intensities before statistical analysis 48.
Statistical analysis. Multivariate data analysis was performed using SIMCA-P 11.5 software (Umetrics AB, Umeå, Sweden). Unsupervised principal component analysis (PCA) was first carried out with all samples to provide an overview of the grouping trends and outliers 49. Then, supervised partial least-squares discriminant analysis (PLS-DA) was used to find differences between the early stage CAS patients and controls. Variable importance in the projection (VIP) was calculated as a coefficient for selection of variables 50. To validate the robustness of the supervised model and evaluate the degree of overfitting, permutation tests with 100 iterations were performed 51. In addition to the multivariate statistical method, the nonparametric Kruskal-Wallis test was also applied to measure the significance of each variable. Only mass features with multivariate and univariate statistical significance (VIP > 1.0 and p < 0.05) were included in the list of candidate markers contributing most to the discrimination, which was then submitted to the metabolite identification procedure.
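The total-sum normalization mentioned in the preprocessing step can be sketched as follows. Each sample's peak intensities are divided by that sample's summed intensity, so the normalized values of every sample add up to 1. The function name and intensity values are hypothetical; whether the original analysis rescaled the result by a constant is not stated in the text.

```python
def normalize_to_total(intensities):
    """Divide each peak intensity of one sample by the sample's total intensity."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return [x / total for x in intensities]

# Hypothetical peak intensities for two samples with different overall signal
sample_1 = normalize_to_total([200.0, 300.0, 500.0])    # [0.2, 0.3, 0.5]
sample_2 = normalize_to_total([400.0, 600.0, 1000.0])   # [0.2, 0.3, 0.5]
```

The two samples differ twofold in raw signal but have identical normalized profiles, which is the effect this step is meant to achieve.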
Receiver operating characteristic (ROC) curve analysis and binary logistic regression were performed using SPSS software (IBM SPSS Statistics 22, USA) following the previously published data analysis method 52 . The training set was used to generate the classification model, and an independent test set was then subjected to the constructed model to evaluate its diagnostic ability.
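As an illustration of how a binary logistic regression model turns several metabolite levels into a single predicted probability that can be compared with a cutoff, consider the sketch below. The intercept, coefficients, and metabolite names are hypothetical placeholders and are not the fitted values from this study; only the cutoff of 0.4789 is taken from the text, and treating a probability equal to the cutoff as positive is an assumption.

```python
import math

def predicted_probability(intercept, coefficients, levels):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(coefficients[name] * levels[name] for name in coefficients)
    return 1.0 / (1.0 + math.exp(-z))

def classify(probability, cutoff=0.4789):
    """Label a subject as early stage CAS when the probability reaches the cutoff."""
    return "early CAS" if probability >= cutoff else "control"

# Hypothetical model with two predictors (placeholder values, not study results)
example_intercept = 0.0
example_coefficients = {"metabolite_x": 1.0, "metabolite_y": -1.0}

p_first = predicted_probability(example_intercept, example_coefficients,
                                {"metabolite_x": 2.0, "metabolite_y": 2.0})
p_second = predicted_probability(example_intercept, example_coefficients,
                                 {"metabolite_x": 0.0, "metabolite_y": 3.0})
labels = [classify(p_first), classify(p_second)]
```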
Metabolite identification. Markers were identified through a multiple-step procedure. The first step was to find quasi-molecular ions via analysis of the peak list and annotation results and determine the corresponding molecular weights. The second step involved performing the MS/MS experiments on a quadrupole time-of-flight mass analyzer (6530 Agilent) to produce the fragment patterns and obtain structural information for selected biomarkers. Then, the fragmentation patterns of the biomarkers were compared to the spectral data of metabolites that had the same m/z in freely available databases, namely HMDB 53 , METLIN 54 , MassBank 55 and LIPID MAPS Structure 56 . The mass tolerance between the measured m/z value and the exact mass of the component of interest was set to within 15 ppm. Finally, if available, confirmation with standards was carried out by comparison of retention time, isotopic distribution, and fragments of commercially available reagents (Sigma-Aldrich, St. Louis, MO) with those obtained in real samples.
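The 15 ppm mass tolerance used for database matching corresponds to a simple relative error check. The sketch below shows the arithmetic; the function names and the measured values are hypothetical, and the reference value 121.0509 is the purine calibrant m/z quoted in the mass spectrometry section, used here only as a convenient number.

```python
def ppm_error(measured_mz, exact_mz):
    """Signed mass error in parts per million."""
    return (measured_mz - exact_mz) / exact_mz * 1e6

def within_tolerance(measured_mz, exact_mz, tol_ppm=15.0):
    """True when the absolute mass error does not exceed the tolerance."""
    return abs(ppm_error(measured_mz, exact_mz)) <= tol_ppm

# Hypothetical measurements against an exact m/z of 121.0509
close_match = within_tolerance(121.0521, 121.0509)   # about 9.9 ppm
poor_match = within_tolerance(121.0530, 121.0509)    # about 17.3 ppm
```

At m/z 121.0509 a 15 ppm window is roughly ±0.0018 in absolute terms; the absolute window widens in proportion to m/z.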