Introduction

Despite the significant reduction in the overall burden of cardiovascular disease (CVD) over the past decade, CVD still accounts for a third of all deaths in the United States and worldwide each year1,2. While efforts to identify and reduce risk factors for atherosclerotic heart disease (e.g., hypertension, dyslipidemia, diabetes mellitus, cigarette smoking, inactivity) remain the focus of primary prevention, the inability to predict accurately when an acute myocardial infarction (AMI) will occur impairs our ability to further improve patient outcomes3. The current diagnostic evaluation for the presence of coronary artery disease relies on functional testing, which detects flow-limiting coronary stenosis, but it has been known for decades that most lesions underlying AMI show only mild to moderate luminal narrowing prior to acute plaque rupture and do not obstruct coronary blood flow4,5,6. Accordingly, there is an urgent need for improved diagnostics of the underlying arterial plaque dynamics, fissure and rupture7,8. Increased numbers of circulating endothelial cells (CECs) are known to be present not only in patients with AMI but also in those with unstable angina, a condition marked by the absence of traditional biomarkers of myonecrosis (troponin, CK-MB); CECs may therefore provide a window into the pathophysiologic state preceding an acute atherothrombotic event and the development of myonecrosis9,10.

The transition from stable atherosclerotic disease to a ruptured plaque with acute thrombo-occlusive disease is multifactorial and has been the subject of extensive study. It is thought to involve a combination of physical (shear stress, thin fibrous cap vulnerability) and biochemical (proinflammatory, vasoactive) factors4. Prior to plaque rupture, most atherosclerotic plaques responsible for acute coronary syndromes are not physiologically significant, and there is no current diagnostic modality for accurate identification of unstable plaques11. Differential gene expression patterns of leukocytes have previously been used successfully in the assessment of stable coronary disease12,13,14. Additionally, microarray-derived gene expression patterns in whole blood and PBMCs of patients presenting with AMI have been studied, and CEC-specific gene expression has been examined in patients with metastatic carcinoma and systemic sclerosis15,16,17,18,19,20,21. However, the prior studies in AMI were limited by their size and predictive ability. Here, we focus on CECs as a potential source of gene markers for AMI given their temporal elevation in the peri-plaque rupture process. Elevated numbers of CECs have been implicated by our group and others in the pathophysiology leading to acute myocardial infarction9,10,22,23,24,25. In fact, while absent in stable angina, increased CECs have been noted not only in AMI, but also in unstable angina, a condition of plaque instability without elevated biomarkers of myonecrosis (troponins, CK-MB)9. Additionally, CEC elevations during AMI are known to be completely independent of the traditional measurements of troponin and CK-MB10. Thus, our primary motivation for studying gene expression in CECs is that they may be regarded as a biomarker temporally preceding myonecrosis, and that a transcriptomic signature derived from these cells, and detectable in whole blood, may provide the key to earlier identification of AMI.

Results

Enumeration of CECs in patients with myocardial infarction

In this study we first assessed CEC counts in AMI patients (n = 28) and healthy control volunteers (n = 28). CECs were enriched from whole blood using CD146+ immunomagnetic separation and enumerated using the CellSearch system as previously described10. The median CEC count was elevated in AMI patients with 82.5 cells/mL (range, 4 to 650 CEC/mL) whereas the median for healthy volunteers was 9.5 cells/mL (range, 1 to 80 CEC/mL) (p < 0.0001 by Mann-Whitney) (Fig. 1). Receiver operating characteristic (ROC) curve analysis demonstrated an AUC of 0.895 (95% CI 0.810–0.980, p < 0.0001) for CEC enumeration alone for the discrimination of AMI versus healthy volunteers.

Figure 1

Enumeration of CECs in Patients with AMI. Circulating endothelial cells (CEC) are elevated in the setting of acute myocardial infarction (AMI). CD146+ CECs immunomagnetically separated from whole blood are increased in patients during AMI (n = 28) as compared to healthy controls (n = 28). *p < 0.0002, non-parametric two-tailed Mann-Whitney test.

In support of cellular stress leading to endothelial cell dysfunction and detachment during the acute phase process, we identified circulating microparticles (CMPs), using novel AC electrokinetic methodology previously utilized in oncology, as an additional and independent marker for AMI in a separate subset of patients26. CMPs have previously been shown to be associated with an increased risk of CVD and adverse cardiovascular clinical outcomes in patients with known CAD, possibly by promoting procoagulant and inflammatory pathways27,28,29. In this group of AMI patients (n = 14) and healthy volunteers (n = 14), CMPs were elevated in AMI (median 168.5 versus 21.5 particles/mL, p < 0.0001 by Mann-Whitney) (Supplementary Fig. 1A). CEC counts were likewise elevated in AMI in the same subset of subjects (Supplementary Fig. 1B). In the subjects for whom both CMP and CEC enumeration were performed, the CMP and CEC counts were highly correlated by Pearson correlation analysis (R-squared 0.692, p < 0.0001) (Supplementary Fig. 1C), with no significant difference in their ability to differentiate AMI from control in ROC-curve analysis (AUC for CMP 0.898, 95% CI 0.781–1.0; AUC for CEC 0.888, 95% CI 0.767–1.0) (Supplementary Fig. 1D).

Microarray gene expression dynamics of enriched CECs

CEC and CMP enumeration is not a practical marker for rapid turnaround in the acute care setting, and thus we turned our attention to gene expression assessment. We used an extreme-phenotype study design to discover markers in CECs indicative of AMI and detectable in whole blood, and validated their discriminative potential in well-matched subjects. Samples were enriched for CD146+ CECs by the Veridex CellSearch system10 and gene expression was determined via microarray. Markers were initially filtered based on biological function (see Methods) in order to account for expression differences that correlate with co-morbidity differences between our cases and controls and are not necessarily indicative of the presence of AMI. Initial marker discovery was performed with elastic net regression in a discovery set of enriched CECs from healthy control volunteers (n = 22) and AMI patients (n = 21) (Table 1A). The discriminative model trained on this discovery set identified 11 candidate genes (Fig. 2A and Table 2). The top performing marker in the discovery set, heparin-binding EGF-like growth factor (HBEGF), with a coefficient of 0.1132 in our model, was 5.40-fold different in AMI versus controls. Sulfatase-1 (SULF1) showed the highest fold change, 8.89 (p = 1.97 × 10^−6), but was less influential on the overall discriminative model (coefficient 0.0283). A model built around the expression levels of these 11 genes effectively discriminated myocardial infarction from healthy control as illustrated in ROC-curve analysis (Fig. 2C).

Table 1 Patient Demographics.
Table 2 Candidate Genes from Microarray.
Figure 2

Microarray Analysis of Enriched CECs. An 11-gene signature for AMI was determined from microarray gene expression analysis of enriched CECs from healthy control and AMI patients. (A,B) Heat maps for the 11 genes found by the elastic net to discriminate AMI from control, shown for the (A) discovery cohort of healthy control (n = 22) and AMI patients (n = 21) and the (B) replication cohort of healthy control (n = 25) and AMI patients (n = 23). Samples are ordered according to their predicted probability of being an AMI. Expression levels are represented from high (blue) to low (red). (C,D) ROC-curves for the 11-gene signature in the (C) discovery cohort with AUC of 1.0 (p = 1.90 × 10^−12) and (D) validation cohort with AUC of 0.99 (p = 7.78 × 10^−13).

We next replicated this 11-gene model in a separate cohort of control volunteer (n = 25) and AMI patient (n = 23) samples acquired, processed and sent for microarray analysis independently of our discovery cohorts (Table 1A, Table 2 and Fig. 2B). Mirroring the excellent performance in the initial discovery cohort, the ROC-curve analysis of this independent replication cohort gave an AUC of 0.99 (p = 7.78 × 10^−13) (Fig. 2D). It should be noted that while the samples used for microarray analysis were enriched in CECs, CD146 is expressed on a subset of cells other than CECs. Additionally, barcode analysis of the gene expression patterns from the enriched CEC microarray reveals evidence for a mixed-cell population based on an elevated number of total genes expressed30 (Supplementary Fig. 3). As a broad assessment of the general gene pathways altered during AMI, we also conducted a gene set enrichment analysis (GSEA) on the microarray data (Supplementary Fig. 4). As expected, we found that several Reactome pathways, such as hemostasis (NES = 3.88, p < 1 × 10^−5, q < 1 × 10^−5), platelet aggregation (NES = 3.67, p < 1 × 10^−5, q < 1 × 10^−5) and GPCR1 ligand signaling (NES = 4.60, p < 1 × 10^−5, q < 1 × 10^−5), are highly upregulated in AMI.

A molecular signature for myocardial infarction in whole blood

Following the designation of 11 candidate genes on microarray gene expression analysis of enriched CECs as markers for AMI, we asked if the top performing genes in this molecular signature could be assessed directly from whole blood. Examining whole blood gene expression patterns would obviate the specialized cell sorting done prior to microarray. To this end, RNA was isolated, following RBC lysis, from whole blood of the same patients (control and AMI) used in the microarray replication study (above), with the addition of 14 new AMI patients, and cDNA was prepared from this RNA for qPCR analysis (n = 44 AMI and 29 control) (Table 1B). An important distinction is that while CECs had been specifically enriched from patient blood using CellSearch technology for our microarray analysis, here we used only whole blood. The purpose of this experiment was simply to determine whether the gene signature remains detectable and indicative of AMI in this more convenient sample source.

The expression levels for many of the original genes identified in the enriched CEC microarray remained significantly elevated in whole blood samples of patients with AMI compared to healthy control volunteers (Fig. 3A). Heparin-binding EGF-like growth factor (HBEGF) showed the highest discriminatory performance between AMI and healthy control patients (AUC 0.97, 95% CI 0.93–1.00, p < 0.0001) in whole blood analysis. HBEGF was followed by SULF1 (AUC 0.93, 95% CI 0.86–0.99, p < 0.0001), NR4A3 (AUC 0.92, 95% CI 0.87–0.98, p < 0.0001), NFKBIA (AUC 0.91, 95% CI 0.84–0.97, p < 0.0001), and NR4A2 (AUC 0.90, 95% CI 0.83–0.97, p < 0.0001). We re-trained the elastic net model using the whole blood qPCR values to account for well-established differences between microarray-based and qPCR-based transcriptomic measurements and to eliminate those genes that lose discriminative power in whole blood relative to enriched CECs. The elastic net regression retained seven discriminative genes (combined AUC 0.997, 95% CI 0.991–1.00): HBEGF, NR4A3, RNASE1, SYTL3, SULF1, NFKBIA, and NR4A2 (Fig. 3C, solid black line).

Figure 3

qPCR Analysis of Whole Blood. Candidate genes from enriched CEC microarray were assessed by qPCR in the whole blood of healthy control, stable diseased control, and two separate AMI patient groups. (A,B) Individual plots for each gene assessed by qPCR in (A) healthy controls (n = 29) vs AMI (n = 44) (cohort 1) and (B) diseased controls (n = 36) vs AMI (n = 45) (cohort 2). Gene-specific counts were normalized to GAPDH for each sample. (C) ROC-curve analysis for each model: solid black line, trained in cohort 1 and tested in cohort 1; dashed red line, trained in cohort 1 and tested in cohort 2. *p < 0.005, **p < 0.05, unpaired, two-tailed t-test. Models were evaluated using leave-one-out cross validation when the same cohort was used for training and testing.

Finally, given the differences in age, sex and co-morbid diseases apparent in this first cohort of healthy controls compared to AMI patients, we validated this gene expression model in a completely independent cohort of patients presenting with AMI (n = 45) as compared to a new cohort of age- and sex-matched control patients (n = 36) (Table 1B). The majority of this second control cohort had co-morbid cardiovascular disease with hypertension (n = 24, 67%), dyslipidemia (n = 27, 75%) and stable coronary artery disease (n = 22, 61%), with many having undergone prior percutaneous coronary intervention (stenting) and/or coronary artery bypass grafting; this cohort was thus more clinically representative of patients being evaluated for AMI symptoms in an acute care setting (Supplementary Table 1). None of the control or AMI patients in this cohort were part of the cohorts included in the microarray studies or the prior qPCR analysis. While the majority of the marker genes performed similarly in this cohort, there were differences, most notably for HBEGF and RNASE1 (Fig. 3B). Moreover, when we evaluated the gene expression profiles from a subset of cases with reported non-elevated troponins, the discriminatory performance of all but these same two genes was modestly improved (Supplementary Table 2). The seven-gene discriminative model, trained on the original set of AMI patients and healthy control volunteers (cohort one) and validated in this new cohort (cohort two), performed with an AUC of 0.857 (95% CI 0.774–0.941) in ROC-curve analysis (Fig. 3C, dashed red line).

Discussion

In the acute setting, the diagnosis of AMI relies upon detecting necrotic cardiomyocytes, as reflected by troponin or creatine kinase MB-fraction assays, in addition to pathognomonic electrocardiographic changes. Yet each year a number of patients who present to an emergency room with chest pain do not manifest these signs and are discharged, only for some of them to manifest an MI or sudden cardiac death in subsequent days31. Our ultimate goal is to identify a simple, whole blood molecular signature that would not rely upon the endpoint of AMI and myocardial cell death but rather reflect the underlying acute biologic process leading to atherosclerotic plaque rupture and AMI. Here we present the initial steps towards that goal in the designation of a robust gene-based molecular signature for the identification of AMI. We began our search in a specific population of cells, circulating endothelial cells (CEC), that have been identified in increased numbers not only in patients with AMI but also in patients with unstable angina who have not yet manifested biomarker evidence of myonecrosis9,32. As such, CECs can be considered a potential signal of the active peri-plaque rupture process that eventually leads to acute atherothrombotic occlusion of the entire vessel and AMI. While our prior work had validated the findings from Mutin et al. and introduced a novel method for identifying and enumerating CECs, we sought to move beyond enumeration and fully characterize the transcriptome of CECs from patients with AMI so as to generate a specific molecular gene signature that would effectively differentiate AMI from control9. These findings may prove useful for future advances in the discovery of diagnostics for an impending acute coronary syndrome, which will require prospective assessment in at-risk patients who present to an acute care setting with chest pain and suspected AMI but do not exhibit biomarker signs of myonecrosis.

The initial phase of this study identified 11 genes upregulated in AMI in samples enriched for CECs as determined by gene expression microarray with excellent discrimination. This 11-gene signature was subsequently replicated in an independent cohort of patients with AMI and control volunteers without a loss of power. However, the performance in this initial phase must be tempered by the fact that these comparisons were carried out in patients on separate extremes of the health spectrum: young volunteers without chronic disease and patients presenting with heart attack – a design that may increase statistical power if co-morbidity stratification across the cohorts is appropriately addressed. Additionally, the requirement for specialized cell sorting is a barrier to translating this finding to a point-of-care diagnostic setting.

Accordingly, we then asked if the expression profiles of these genes could be detected from whole blood using qPCR. In whole blood, seven of these genes showed continued expression differences and, when analyzed using the elastic net, remained significant contributors to the combined molecular signature for discriminating AMI. We observed model coefficient variability depending upon the comparison being made: AMI vs healthy controls or AMI vs age-matched disease controls. However, while the coefficients vary in effect size, their predictive power is conserved and was validated across the different comparisons, as demonstrated by the ROC curves where training and testing were performed in disparate cohorts. Further supporting the non-reliance of this signature on myonecrosis, the performance of the seven genes of the signature remained unchanged, if not marginally superior, in a subset of patients presenting to a single center who had no elevation of their cardiac-specific biomarkers at the time of presentation.

The determination of candidate genes from microarray analysis was completed by comparing the gene expression dynamics of two very separate populations, healthy controls and patients having AMI. The age and sex differences, in addition to the dissimilarities in underlying co-morbid disease or medications of these populations, could partly have magnified the discriminative ability of the original 11-gene model in initial testing. The initial AUC values we report in the discovery and validation cohorts in microarray analysis may reflect this magnified discriminative power. However, we would argue that any biases that are not reflective of AMI status would dampen the predictive power observed in our final age- and sex-matched validation cohort. We addressed this possibility in our final qPCR analysis of the 7-gene model in whole blood using an age- and sex-matched control cohort of patients with cardiovascular disease, in which model performance was attenuated but remained robust. Also, given the limited sample size for this study, ethnic differences were not explored.

Currently, there exists no biomarker, diagnostic study or advanced clinical decision-making algorithm that foretells a plaque rupture event leading to AMI. Physicians have imperfect tools to calculate ten-year and lifetime risk of potential cardiovascular events based on various epidemiologically derived, population-based risk factors including hypertension, dyslipidemia, diabetes mellitus, age and baseline inflammatory markers, but nothing that places this probability on a more temporal scale33. Additionally, even using advanced non-invasive imaging tools to identify, and then potentially intervene on, high-risk plaques (those with the greatest potential of rupture or fissure leading to AMI) would not eliminate the majority of future cardiac events34. While gene expression analysis has previously been combined with traditional clinical risk factors to improve estimation of the likelihood of stable obstructive coronary disease in non-diabetic patients, that classifier does not indicate or predict impending clinical events12. Likewise, several other groups have completed gene expression analysis of whole blood and PBMCs from patients in the setting of AMI to identify the genes with the greatest expression differences, but none have reported discriminatory performance similar to that of the molecular signature reported here, whether from enriched CEC microarray or whole blood qPCR15,16,17,18.

While the inability to accurately identify patients in an acute care setting destined for heart attacks before they fully manifest is a limitation to our study, it is also the driving force behind this study. The logical next step will be the prospective clinical validation of this CEC-derived, whole blood molecular signature for AMI in a large cohort of patients presenting to acute care settings with symptoms and high clinical suspicion for AMI, but without accompanying ECG or biomarker signs of myonecrosis. However, the seven-gene molecular signature presented herein may indeed provide a window into the biologic underpinnings of AMI that may precede current biomarkers and potentially lead to changes in the way we approach patients with chest pain symptoms in the future.

Materials and Methods

Patients and control subjects

The study population consisted of patients aged 18–80 years of both sexes who presented to one of five San Diego County medical centers with the diagnosis of acute myocardial infarction (AMI). Healthy control patients between the ages of 18 and 35 without a history of chronic disease and diseased control patients (with known but stable cardiovascular disease) between the ages of 18 and 80 were recruited to outpatient clinical centers affiliated with The Scripps Translational Science Institute (STSI), through which Institutional Review Board (IRB) approval for all aspects of this study was obtained. All experiments were performed in accordance with relevant guidelines and regulations. Recruitment of all patients occurred from February 2008 through July 2014, and experiments were conducted with patient samples in phases as separate cohorts. Informed consent was obtained from all subjects in this study. All AMI cases met strict diagnostic criteria including chest pain symptoms with electrocardiographic (ECG) evidence of ST-segment elevation of at least 0.2 mV in two contiguous precordial leads or 0.1 mV in limb leads, in addition to angiographic evidence of obstructive CAD in the setting of positive cardiac biomarkers. Our sample sizes were above the calculated threshold of 12 samples at an alpha of 0.01, estimated using an established microarray calculator to detect at least a two-fold difference with a power of 0.8 and a standard deviation of 0.7. This study is registered with ClinicalTrials.gov (NCT01005485).
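
The sample-size threshold quoted above can be approximately reproduced with the standard normal-approximation formula for a two-sided, two-sample comparison. The sketch below is illustrative only and is not the calculator used in the study. It assumes that the standard deviation of 0.7 is expressed on the log2 scale, that a two-fold difference corresponds to a log2 difference of 1, and that alpha is applied per gene. The function name samples_per_group is hypothetical.

```python
from math import ceil, log2
from statistics import NormalDist


def samples_per_group(alpha: float, power: float, sd: float, fold_change: float) -> int:
    """Per-group sample size for a two-sided, two-sample comparison.

    Uses the normal approximation n = 2 * (z_{1-alpha/2} + z_{power})^2 * sd^2 / delta^2,
    where delta is the log2 fold change to detect (assumption: sd is on the log2 scale).
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = standard_normal.inv_cdf(power)          # quantile for the target power
    delta = log2(fold_change)                         # two-fold -> 1.0 on the log2 scale
    n = 2 * (z_alpha + z_power) ** 2 * sd ** 2 / delta ** 2
    return ceil(n)


# Parameters as stated in the text: alpha 0.01, power 0.8, SD 0.7, two-fold difference.
print(samples_per_group(alpha=0.01, power=0.8, sd=0.7, fold_change=2.0))
```

Under these assumptions the formula gives roughly 11.4, which rounds up to 12 samples per group.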

Circulating microparticle (CMP) isolation and enumeration

CMPs were isolated from patient plasma using electric current and quantified using a fluorescent microscope with a charge-coupled device camera. Additional details are provided in the Online Appendix.

Blood collection and CEC sample preparation and enumeration

Early after arrival at an acute care setting, arterial blood was collected from AMI patients into both EDTA-containing (Becton Dickinson, Franklin Lakes, NJ, USA) and CellSave (Veridex, Raritan, NJ, USA) tubes in the cardiac catheterization laboratory following the placement of an arterial sheath and prior to the introduction of any guide wires or coronary catheters. Prior work has shown no effect of access site (venous versus arterial) on CEC acquisition32. The samples were maintained at room temperature and processed within 36 hours of collection. The CellTracks® AutoPrep® system was used in conjunction with the CellSearch® CEC kit and the CellSearch® profile kit (Veridex) to immunomagnetically enrich and enumerate CD146+ CECs as previously described10,35. The enriched CEC samples were analyzed with the CellTracks® Analyzer II and the number of CECs in the sample was determined. For CEC microarray profiling, the AutoPrep tube with the sample from the CellTracks® AutoPrep® system was removed and placed into the MagCellect Magnet for a ten-minute incubation. With the tube still in the MagCellect Magnet, the supernatant liquid was aspirated without disrupting the ferrofluid-bound cells, from which RNA was subsequently isolated. For whole blood samples in EDTA tubes, leukocytes and cellular debris were obtained for RNA isolation following RBC lysis with Erythrocyte Lysis Buffer (Qiagen, Valencia, CA).

Microarray sample preparation

Microarray analysis was performed in three separate experiments, each with even numbers of cases and controls to minimize potential batch effects. Enriched CEC‑derived RNA was isolated using Trizol Reagent (Life Technologies, Carlsbad, CA). Labeled target antisense RNA (cRNA) and double-stranded cDNA were prepared from enriched CEC RNA samples using the Ovation™ RNA Amplification System V2 (NuGEN, San Carlos, CA). Purified cDNA underwent a two-step fragmentation and labeling process using the Encore Biotin Module (NuGEN). The amplified cDNA targets were hybridized to the Affymetrix human U133 Plus 2.0 array to assess expression levels of over 47,000 independent transcripts (Affymetrix, Santa Clara, CA). Following hybridization, arrays were washed and stained before scanning on the Affymetrix GeneChip Scanner, from which data were extracted using the Affymetrix Expression Console. Signal intensities from each array were normalized using the robust multichip average expression measure technique.

Microarray data analysis

Normalized expression values for the microarrays were calculated using RMA normalization36. Quality controls were conducted with the affy and affyQCReport R packages. A Gaussian mixture clustering of the principal components of the expression data detected eight outliers (five AMI and three control), which were discarded (Supplementary Fig. 2). To select genes for our predictive model, we first removed probe sets mapping to genes that are up-regulated in inflammatory diseases in order to account for the basic health status differences between our cases and controls in the discovery cohort. Next, differential expression analysis in the discovery set was performed via linear regression using the limma package in R. P-values were calculated using an empirical Bayesian method and adjusted using the Bonferroni correction37. Probe sets with a fold change of less than two-fold were removed from further consideration. With the remaining probes, we used elastic net regression with an alpha parameter of 0.5, via the glmnet package in R, to build a predictive model for acute myocardial infarction38. Genes selected by the elastic net regression were advanced to the whole blood qPCR analysis described below. Initial model performance was ascertained by training on the microarray discovery set and then applying this model to form predictions on the independent replication set. The performance of the model was evaluated using receiver operating characteristic curves via the pROC package in R39. A gene set enrichment analysis was run on the combined set of discovery and replication samples40. For the GSEA, each probe’s log fold change was used as the ranking statistic, and the GSEA was set to the “classic” mode.
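
To make the ROC evaluation concrete, the area under the ROC curve can be computed directly from model scores as the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. This is the same quantity as the Mann-Whitney U statistic divided by the product of the group sizes. The sketch below is a minimal illustration in Python rather than the pROC workflow used in the study; the function name roc_auc and the scores are hypothetical and are not study data.

```python
def roc_auc(case_scores, control_scores):
    """Area under the ROC curve from raw scores.

    Counts, over all case/control pairs, how often the case scores higher
    (a tie counts as half a win) and divides by the number of pairs.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical predicted probabilities of AMI, for illustration only.
ami = [0.9, 0.8, 0.75, 0.4]
control = [0.1, 0.3, 0.5, 0.2]

# 15 of the 16 case/control pairs are ordered correctly, so the AUC is 15/16.
print(roc_auc(ami, control))
```

The pairwise form is quadratic in the number of samples, which is acceptable for cohorts of the size reported here; a rank-based implementation would be preferable for larger data sets.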

Data Availability

All microarray data are available from the Gene Expression Omnibus database (http://www.ncbi.nlm.nih.gov/geo) under accession code GSE66360.

cDNA synthesis, pre‑amplification and qRT‑PCR analysis

Genes selected by the elastic net regression of CEC-enriched microarray data were assessed for their level of expression in whole blood. First‑strand cDNA was synthesized from total RNA using the High‑Capacity cDNA Archive kit (Applied Biosystems, Foster City, CA). The cDNA was pre-amplified using ABI TaqMan PreAmp (Applied Biosystems) and the selected candidate genes were assessed using qRT‑PCR. Ct values were exported for further analysis. ΔCt values normalized to GAPDH were entered into an elastic net model to predict acute myocardial infarction38. Because gene expression levels for SULF1 in non-AMI samples were at the lower limit of detection, this comparison was completed in 18 controls and 41 AMI for cohort 1, and 22 controls and 42 AMI for cohort 2. When training and testing of the elastic net regression were applied to the same cohort, performance was ascertained via leave-one-out cross validation. Otherwise, training and testing were performed on independent cohorts to ascertain model performance. Additional methodological details are provided in the Online Appendix.
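
The GAPDH normalization described above amounts to subtracting the reference-gene Ct from each target-gene Ct for the same sample. The sketch below shows that step under stated assumptions: the Ct values are hypothetical, the function names are illustrative, and the conversion to relative expression (2 raised to the power of minus ΔCt) is a common convention that assumes ideal doubling per cycle and is not stated in the text, where ΔCt values themselves were the model inputs.

```python
def delta_ct(ct_target: float, ct_reference: float) -> float:
    """Return the reference-normalized cycle threshold (delta Ct)."""
    return ct_target - ct_reference


def relative_expression(dct: float) -> float:
    """Convert delta Ct to relative expression, assuming ideal doubling per cycle."""
    return 2.0 ** (-dct)


# Hypothetical Ct values for a single sample (not study data).
ct_values = {"HBEGF": 27.0, "NR4A3": 29.5, "GAPDH": 20.0}

# One delta Ct per target gene; these would be the per-sample model features.
features = {
    gene: delta_ct(ct, ct_values["GAPDH"])
    for gene, ct in ct_values.items()
    if gene != "GAPDH"
}

print(features)
print(relative_expression(features["HBEGF"]))
```

A lower ΔCt indicates higher expression relative to GAPDH, so the sign of a fitted coefficient should be read with that direction in mind.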