Atherosclerotic cardiovascular diseases are amongst the leading causes of death globally1. Atherosclerosis is characterized by the accumulation of pro-atherogenic lipoprotein particles in the sub-endothelial space of large- and medium-sized arteries2. The process is initiated by the trapping of apolipoprotein B particles in arterial intima by proteoglycans, followed by subsequent modifications, such as aggregation and oxidation, leading to the development of atherosclerotic plaques3. The rupture of a critically located atherosclerotic plaques can result in myocardial infarction and stroke4.

Ultrasound examination can be used to identify thickening of the intima-media in the carotid arteries and is used to detect and monitor both subclinical and clinical atherosclerosis5. Although it is unclear how the diffuse thickening of carotid artery wall represents the subclinical phase, there is a consensus that local carotid plaques, defined as distinct protrusions from the carotid vessel wall into the lumen, are indicative of specific phenotypic changes associated with an active atherosclerotic process6. As there is compelling evidence that the atherosclerotic process often starts early in life and may remain asymptomatic for several decades7, the study of young and middle-aged adults with non-obstructive carotid plaques provides an opportunity to explore biomarkers for early-stage preclinical atherosclerosis. Such markers may have potential in diagnostics and in understanding the disease etiology. Since atherosclerosis is a circulatory manifestation, the extracellular proteome, which includes proteins like collagens, elastin, proteoglycans, lipoproteins and glycoproteins, is an important analytical target8,9. Accordingly, a number of studies have used serum and plasma to establish or identify protein markers for atherosclerosis10,11. These have ranged from targeted comparisons, i.e. based on prior hypothesis, through to untargeted profiling discovery measurements12,13. As an example of the former, Malaud et al. determined proteomic profiles of atherosclerotic lesions, followed by Luminex immunoassays of likely targets in blood from the same subjects14. Using a discovery approach, DeGraba et al. employed surface-enhanced laser desorption/ionization (SELDI)-time of flight (TOF) mass spectrometry to identify distinguishing proteomics patterns from serum samples of atherosclerotic and non-atherosclerotic groups12. Employing a multi-faceted discovery strategy, Kristensen et al. compared subjects with different circulatory diseases using a combination of immuno-affinity depletion, isobaric labeling, and an additional consideration of phosphorylated and sialyated peptides. Vinculin was identified as a novel marker of acute coronary syndrome13.

In contrast to the published studies, in which the comparisons have mostly addressed advanced atherosclerotic phenotypes, we have analyzed serum samples taken early on from the subjects with a non-obstructive plaque in their carotid artery along with the matched controls. The samples were collected as part of The Cardiovascular Risk in Young Finns Study (YFS), which was established to investigate how childhood lifestyle, biological and psychological measures contribute to cardiovascular risk. Follow-up data on cardiovascular risk factors (e.g. body weight, blood pressure and other biochemical parameters) have been periodically determined at three to six years intervals from over two thousand YFS participants during the past thirty years. Ultrasound assessment of carotid plaque formation has also been performed during the last fifteen years15. On the basis of these ultrasound measurements, serum samples were selected from subjects (N = 43), in whom early signs of plaque development was discerned, together with the equivalent material from carefully matched controls (N = 43). With the view to identify markers of disease risk and onset, we have applied a label-free quantitative mass spectrometry approach to analyze this unique sample set16. Selected reaction monitoring mass spectrometry (SRM-MS) was subsequently used for the verification of the observed differences. The label-free strategy employed was advantageous in terms of ease of implementation and scalability. Additionally, targeted mass spectrometry-based validation assays could be quickly developed from the discovery data and applied to validate the results.


Discovery phase of premature carotid atherosclerosis biomarkers

Label-free quantitative proteomics was performed on serum samples obtained from 43 subjects who developed premature carotid artery plaques and 43 matched controls (Figure 1 and Table 1). Overall 296 proteins were detected with more than 1 peptide, according to the defined filtering criteria (see Methods section). For statistical analysis, 249 proteins with valid values in at least 50% of the samples were further considered (Supplementary Table 1). Notably, whilst atherosclerosis is characterized as an inflammatory disease, our clinical data from this study of the early phases of plaque formation revealed that there were no differences in the level of inflammatory C-reactive protein between cases and controls (Table 1).

Figure 1
figure 1

A Schematic representation of the study. Serum samples were selected from subjects of the YFS prospective study cohort on the basis of ultrasonic assessment of carotid artery intima-media thickness, with control samples selected on the basis of age, gender, body weight and systolic blood pressure (N = 43 vs. 43). Mass spectrometry-based serum proteomics analysis of depleted serum was used. The representation of human outline was adapted from Motifolio (

Table 1 Clinical parameters at the time of plaque detection.

The comparison of the samples from the plaque bearing subjects and their controls revealed the differential abundance of seven proteins (p < 0.05) as shown in Table 2 and depicted as a volcano plot in Fig. 2. Fibulin 1 proteoform C (FBLN1C), Beta-ala-his-dipeptidase (CNDP1), Cadherin-13 (CDH13), Gelsolin (GSN) and 72 kDa type IV collagenase (MMP2) were lower in abundance in cases, whilst apolipoprotein C-III (APOC3) and apolipoprotein E (APOE) were more abundant. After correction for multiple hypothesis testing only the difference in the FBLN1C levels was statistically significant (FDR < 0.05). On the basis of the known genetic associations, we evaluated the APOE genotype data but found no difference in the frequency of the carotid atherosclerosis risk related alleles between the case and control groups (Supplementary Fig. S2 and Supplementary Table 2).

Table 2 List of proteins found to be significantly differentially abundant between cases and their matched controls.
Figure 2
figure 2

A volcano plot showing the differential level of proteins between subjects who developed plaques and controls (N = 43 vs. 43). Each point represents a protein. The FDR corrected P-values were calculated by permutation (1000 times). Only the difference in FBLN1C was significant after multiple hypothesis testing, whilst the other labeled proteins had p < 0.05.

Machine learning classification

To gain an overview of whether there was a panel of proteins that could be used to distinguish the subjects we applied Lasso penalized logistic regression to the serum proteomics data. On the basis of this, a panel of three proteins, FBLN1C, APOE and CDH13, was observed to provide the best discrimination between the cases and controls. With the inclusion of APOE and CDH13, there was a statistically significant improvement in the AUROC (0.79, 95% CI: 0.69–0.88, p = 0.03) (Fig. 3). FBLN1C alone classified cases from controls with an AUROC value of 0.67 (95% CI: 0.56–0.79).

Figure 3
figure 3

Receiver operating characteristics (ROC) curve for the determined panel of three proteins. P23142-4 (FBLN1C) alone classified cases from controls with AUROC = 0.67 (95% CI: 0.56–0.79). The addition of P02649 (APOE) and P55290 (CDH13) to FBLN1C significantly improved AUROC to 0.79 (95% CI: 0.69–0.88) (p = 0.03). (N = 43 vs. 43).

SRM Verification

SRM measurements were performed for the seven proteins that were indicated to be differentially abundant in the initial profiling data. Additional targeted measurements were made for two housekeeping proteins (selected based on their consistency in our data), APOB (an established CVD risk factor) and standard retention time peptides (iRT)17 (Table 3). The analysis supported the downregulation of FBLN1C by a ratio of 0.85 (99% CI: 0.73–0.98; FDR < 0.05) in cases compared to their matched controls (Fig. 4). There were, however, no significant differences observed in any of the other targets. The SRM measurements for the panel did not improve the classification of the cases from controls as in the discovery phase (Supplementary Figure S3).

Table 3 List of proteins along with their proteotypic peptides used for the SRM-MS assay.
Figure 4
figure 4

Box-Whisker plot showing relative abundance of FBLN1C measured using SRM-MS assay in the YFS cohort (control, N = 43; cases, N = 43). The relative abundance of FBLN1C was found to be lower in cases than controls (Adjusted p-value = 0.039). Mass spectrometry-based serum proteomics analysis of undepleted serum was used.


In this comparison of serum from matched controls and subjects in whom the early stages in the development of carotid plaques were detected, the proteomic analysis indicated the differential abundance of several proteins. Amongst these were a number of proteins that have been previously linked with atherosclerosis, i.e. APOC3, APOE, CNDP1, CDH13, GSN, MMP2 and fibulin18. Using a machine learning approach the combination of FBLN1C, APOE and CDH13 was found to provide the best classification of the cases from controls. A targeted SRM assay was subsequently developed for the measurement of these differentially abundant proteins. Due to the unavailability of a similar sample set, it was used for verification only in the same samples as analyzed in the discovery phase. The verification measurements were performed for the samples prepared without depletion. The use of undepleted samples removed potential biases created by the depletion step and succeeded against the background of abundant serum proteins due to the intrinsic sensitivity of the targeted method. Based on this verification data, the quantitative difference of FBLN1C remained significant. The failure to confirm the dissimilarities detected for APOE and CDH13 in the discovery data could reflect the small magnitude and variability of these intra-individual differences.

The Fibulins are a family of six moderately abundant serum proteins (FBLN1 – FBLN6) that are linked with the extracellular matrix (ECM) proteins18. Differences in FBLN1 abundance have previously been observed in several studies, including its relationship to atherosclerosis, cardiovascular risk, arterial stiffness and type 2 diabetes (T2D). For example, Kawata et al. first reported reduced plasma levels of FLBN1 in patients with acute myocardial infarction and stable angina19. Both lower and higher levels have been reported in the plasma of T2D patients20,21, but differences in the duration of the disease (recently diagnosed vs. established disease) could reflect upon the latter division. In relationship to cardiovascular disease, FLBN1 was detected as a component of atherosclerotic lesions, and Argraves et al. suggested that decreased plasma FBLN1 could reflect its accumulation in the plaque22. Similarly, the accumulation of FBLN1 in the arterial wall has been detected in patients with T2D21, although in a situation in which the FBLN1 plasma levels were higher than in the matched controls. In contrast, in newly diagnosed T2D patients, lower plasma FBLN1 was found and correlated with carotid-femoral arterial stiffness20. On the basis of the latter observation, Paapstel et al. studied the relationship between arterial stiffness and serum FBLN1 levels in patients with atherosclerosis23. Here they found higher levels of FBLN1 in the patients.

In addition to the complex interplay between arterial stiffness and atherosclerotic risk factors24, the relationship between plasma levels of FBLN1 and the early stages of plaque formation is contradictory and yet to be clearly established.

In the above examples, there are differences in the study sizes, diseases and duration, as well as specificity of the controls. Further, whilst alternative splicing produces four FBLN1 proteoforms (A, B, C and D)25, the former the studies have not made any distinction between these. These variants may differ from both a structural and functional perspective. In this respect their distinction as proteoforms and this terminology implicates protein variants that are not coded explicitly in the genome, i.e. including alternative RNA splicing and post-translational modifications26. Our proteomic data has specifically highlighted significant differences for the C proteoform. Fibulin1C has been reported to be the predominant form in plasma25 and has been identified in the tissue secretome analysis of coronary arteries27. Within the limitations of this knowledge, we can only speculate whether this difference in abundance represents a phenotype that is more susceptible to plaque formation or an early indication of onset. Potentially, it may be that the structural differences for this specific proteoform of FBLN1 could, following some trigger, contribute to its interaction with extracellular matrix protein (ECM) molecules and accumulation on the arterial intimal walls, and thus be reflected by its lower serum abundance.

The progression of the atherosclerotic process is influenced by a range of factors including age, diet, stress and other risk factors such as smoking28,29. In the present study the subjects were carefully matched by age, gender, BMI and blood pressure. However, seven cases and five controls were smokers, and the influence of smoking was not separately evaluated. Additional limitations of the current study is the sample size and the need for further validation studies in a larger independent cohort. Furthermore, structural studies explaining the interaction of the FBLN1C proteoform with ECM proteins could provide insights into its potential role in plaque formation.

In summary, from these measurements from a cohort selected to study the risk and development of atherosclerotic lesions, distinguishing proteomics profiles were identified in subjects showing early signs of plaque development. In particular, FBLN1C is implicated as a target for further investigation


Study Population and design

The samples were selected from participants in the Cardiovascular Risk in Young Finns cohort15. On the basis of carotid ultrasound measurements, samples from 43 individuals with a distinct carotid artery plaque were selected together with samples from 43 controls. The controls were matched by age, sex, body weight and systolic blood pressure. The age range of the subjects was 30–45 years. The study design is depicted in Fig. 1, and the clinical characteristics of cases and controls are shown in Table 1. The measurements and data included in this manuscript have been acquired following the guidelines of the Declaration of Helsinki for research on human participants and were conducted with the permission of the Ethical Committees of the University Hospitals of Turku with written informed consent.

Carotid intima-media thickness measurement

Ultrasound examination of the left carotid artery, including common carotid artery and carotid bifurcation, were performed using B-mode ultrasound (Acuson Sequoia 512, Siemens) with the 13.0-MHz linear-array transducer, according to a standardized protocol30. Intima-media thickness was measured from digitally stored scans by one reader blinded to participant details. The best-quality end-diastolic frame was selected, and ultrasonic calipers were used to measure carotid intima-media thickness from the far wall of the common carotid artery 10 mm proximal to the bifurcation. To detect the carotid plaques, the images were scanned and the presence of atherosclerotic plaque defined as a distinct area of the carotid vessel wall protruding into the lumen >50% of the adjacent intima-media layer31. All the observed plaques were detected in the carotid bifurcation.

ApoE genotype determination

APOE genotyping was performed by using TaqMan SNP Genotyping Assays (rs429358 assay C 3084793_20; rs7412 assay C_904973_10) and the ABI Prism 7900HT Sequence Detection System (Applied Biosystems, Foster City, CA, USA).

Sample preparation

Immunodepletion of high abundant proteins

An Agilent MARS-14 immunoaffinity column was used for the targeted removal of the most abundant serum proteins. The isolated, lower abundance proteins were reduced, alkylated, digested and desalted prior to mass spectrometry (MS) analysis as described previously32.

Preparation of undepleted serum

For the verification measurements, the serum samples were diluted in the denaturant, reduced, alkylated and digested using sequencing grade modified trypsin (Promega)32.

Heavy labeled synthetic analogs of proteotypic peptides33 of the differentially abundant proteins, housekeeping proteins (Alpha-1B-glycoprotein and complement C1s subcomponent) and the known CVD risk factor, Apolipoprotein B-100, were spiked into the digests together with indexed retention time standard (iRT) peptides (Biognosys). These were selected from discovery phase peptide data with the consideration of the consistent detection, absence of potentially modified residues and missed cleavages (Table 3). The heavy-labeled synthetic analogs (lysine 13C6 15N2 and arginine 13C6 15N4) equivalent of proteotypic peptides were obtained (Thermo Fischer Scientific).

Mass spectrometry analysis

Discovery phase

Aliquots of the depleted serum digests (500 ng) were analyzed with an Easy-nLC-II coupled to a LTQ Orbitrap Velos Pro mass spectrometer (Thermo Fisher Scientific). The peptides were separated on 150 mm × 75 µm ID column packed with 5 µm magic C18-bonded silica (200 Å). The peptides were eluted with an increasing gradient of 5–35% acetonitrile at a flow rate of 300 nl/min using a binary mixture of water and acetonitrile with 0.2% formic acid. The mass spectrometer was operated in data-dependent acquisition mode with a selection of top 15 precursors followed by fragmentation using collisional induced dissociation (CID) method. All the samples were analyzed in quadruplicate as randomized batches32. To ensure comparable instrument performance throughout the time span of the discovery phase study, an in-house standard was periodically analyzed to establish the consistency of the signal intensity and chromatographic separation (Supplementary Figure S1).


Aliquots of the digested peptides (250 ng), spiked with heavy labeled peptides and index retention time (iRT) peptides, were analyzed with Easy-nLC-II coupled to a TSQ Vantage mass spectrometer (Thermo Fisher Scientific)32. The peptide mixture was separated with a 150 mm × 75 µm ID column packed with ReproSil-Pur C18-AQ 5 µm resin (Dr. Maisch GmbH). An unscheduled analysis of the sample was carried out to generate iRT values of target peptides and their heavy counterparts. Skyline software was used to build up the scheduled method for the selected targets using the unscheduled run. The scheduled method was then edited by removing interfering signals32 to monitor 99 transitions from 33 peptides, representing 10 proteins and the iRT peptides.

The 86 serum samples (N = 43 vs. 43) were prepared without depletion and analyzed as randomized batches. To monitor the variation of peak areas and retention time across and within the batches, a pooled digest of the undepleted serum was included in each batch.

Data processing

Protein informatics analysis (Discovery phase)

The tandem mass spectra data were searched against a UniProt human isoform protein sequence database (UniProt release, August 2017, entries = 42,210) using the Andromeda34 search algorithm and MaxQuant The search parameters were set to allow two missed tryptic cleavages, methyl methanethiosulfonate (MMTS) modification of cysteine and variable modification of methionine and acetylation of the protein N terminus. A false discovery rate (FDR) of 1% was applied at peptide and protein level. The “match between run” option (matching and alignment time window = 0.7 and 20 minutes respectively) was selected in order to enable the transfer of identifications across the mass spectrometric measurements36. The label-free normalized intensity values (MaxQuant output) were further analyzed using Perseus software36,37. Briefly, the output was filtered to exclude reverse hits and proteins only inferred by the detection of single variable modifications. Furthermore, only proteins identified with >1 unique plus razor peptides were considered. The razor peptides are defined as those that are shared between different protein groups and are assigned to the protein that has the most peptides35,38. The data was then log2 transformed followed by filtering to at least 50% valid values. Missing values were imputed by “imputation from normal distribution” (width = 0.3, downshift = 1.8)39 followed by taking the average of quadruplicate analyses. The subsequent statistical analysis of data was then performed using R40.

SRM data analysis and transition selection

Skyline version 4.1 was used to develop and analyze SRM assay transitions41. The quality of the transitions and confirmation of light/heavy pairs were manually inspected. On account of interferences in the transitions for one of the FBLN1C peptides, DLLLTVK alone was considered for its statistical analysis in the SRM data. Out of the two housekeeping proteins included for normalization of the data, the TNFDNDIALVR peptide from complement C1s subcomponent was used.

Statistical analysis

Reproducibility-optimized test statistic (ROTS)

The label-free normalized protein intensity abundance values obtained from MaxQuant analysis were used as input for ROTS analysis36,42,43. Briefly, the log2 transformed data were analyzed using non-parametric method relying on the family of t-type statistics which ranks the proteins based on their differential expression in two group conditions and the calculation was made with 1000 permutations (FDR < 0.05).

Machine learning classification

To identify the protein panel with the highest discriminative performance, Lasso penalized logistic regression44, implemented in the R package glmnet45, was applied to the serum proteomics data. First, all candidate predictors were identified by shrinking the coefficients of non-informative predictors to zero using Lasso with 3-fold cross-validation, repeating the randomization procedure 200 times. In each fold, only significantly differentially abundant proteins (ROTS; P < 0.05) were considered. Finally, among the top 20 most frequent candidate proteins, the Lasso model with the protein panel having the largest improvement in discriminative performance in terms of area under the receiver-operating characteristic curve (AUROC) and with the least number of predictors was identified. Statistical significance of the differences in the AUROC values between the models was determined using the DeLong method46 implemented in the R package pROC47.


The MSStats (3.8.4) plugin included in the Skyline software was used for the group comparison between cases and controls48. Briefly, after normalizing the data to the housekeeping protein the statistics were calculated on the basis of the Turkey’s median polish method. The latter uses a linear mixed model to give a robust estimation of differentially abundant proteins between conditions.

Data availability

The LC-MS/MS proteomics discovery data are available from the ProteomeXchange Consortium via the PRIDE49 partner repository with the dataset identifier PXD008278. The SRM verification data are available from the ProteomeXchange Consortium via the PASSEL50 partner repository with dataset identifier PASS01146.