Plasma protein expression profiles, cardiovascular disease, and religious struggles among South Asians in the MASALA study

Blood protein concentrations are clinically useful, predictive biomarkers of cardiovascular disease (CVD). Despite a higher burden of CVD among U.S. South Asians, no CVD-related proteomics study has been conducted in this sub-population. The aim of this study is to investigate the associations between plasma protein levels and CVD incidence, and to assess the potential influence of religiosity/spirituality (R/S) on significant protein-CVD associations, in South Asians from the MASALA Study. We used a nested case–control design of 50 participants with incident CVD and 50 sex- and age-matched controls. Plasma samples were analyzed by SOMAscan for expression of 1305 proteins. Multivariable logistic regression models and model selection using Akaike Information Criteria were performed on the proteins and clinical covariates, with further effect modification analyses conducted to assess the influence of R/S measures on significant associations between proteins and incident CVD events. We identified 36 proteins that were significantly expressed differentially among CVD cases compared to matched controls. These proteins are involved in immune cell recruitment, atherosclerosis, endothelial cell differentiation, and vascularization. A final multivariable model found three proteins (Contactin-5 [CNTN5], Low affinity immunoglobulin gamma Fc region receptor II-a [FCGR2A], and Complement factor B [CFB]) associated with incident CVD after adjustment for diabetes (AUC = 0.82). Religious struggles that exacerbate the adverse impact of stressful life events, significantly modified the effect of Contactin-5 and Complement factor B on risk of CVD. Our research is this first assessment of the relationship between protein concentrations and risk of CVD in a South Asian sample. Further research is needed to understand patterns of proteomic profiles across diverse ethnic communities, and the influence of resources for resiliency on proteomic signatures and ultimately, risk of CVD.

South Asians experience a higher burden of cardiovascular disease (CVD) compared with other U.S. populations; the majority of this increased CVD is likely due to modifiable risk factors 1 . While psychosocial stress has been previously demonstrated to be an important factor associated with risk of CVD [2][3][4] , there is a paucity of research on whether these factors associate with CVD risk in South Asian populations 1 . Stress has nonetheless been identified as one of the top ten drivers of health disparities worldwide by the World Health Organization, with significant disparities in exposures to stress between majority versus minority racial/ethnic and low Protein characteristics. Figure 2 shows the distribution of the medians (from all patients) for all 1305 SOMAscan proteins on a logarithm base 10 scale. The protein relative fluorescence units (RFU) magnitude ranged from 1.6 logs to nearly 5.4 logs. Proteins with low RFU values (2 logs or below; RFU < 100) comprised about 6.5% of all proteins (85 out of 1,305 proteins). The protein with the largest RFU value was Apolipoprotein E (isoform E4), with a median value of 237,512 RFU, and the lowest RFU value protein was Baculoviral IAP repeat-containing protein 7 Isoform beta, which had a median RFU level of 53. The distribution of the median estimates for all proteins was similar both in those with and without a CVD event. The variability of these protein RFUs is shown in Fig. 3, where the logarithm base 10 of the standard deviation is displayed. These values ranged from 1.1 logs to 4.8 logs.
Of these 1305 proteins, we expected many to be correlated. In selecting our final set of protein biomarkers to use in final multivariable models, we therefore preferentially chose proteins with little or no correlation to other proteins. We performed a Spearman rank correlation analysis on all 1,305 proteins, which yielded 1,703,025 correlation coefficients. Figure 4 shows the distribution of protein-protein correlations at the Spearman correlation coefficient of 0.5 or above. There were 459 proteins that correlated at 0.5 or above with 0-30 other proteins. There were also 23 proteins that correlated at 0.5 or above with 660-700 other proteins, representing redundant information. In cases where our analyses yielded two proteins with an equal effect on CVD, we selected the    (15) 47 (40)(41)(42)(43)(44)(45)(46)(47)(48)(49)(50)(51)(52)(53)(54)(55)(56)(57) 48 (13) 45 (39)(40)(41)(42)(43)(44)(45)(46)(47)(48)(49)(50)(51)(52)(53)(54)(55) 51 (16) 48  0.351  Systems biology analysis. Ingenuity Pathway Analysis was performed using the 36 proteins significantly associated with CVD to obtain insights into enriched signaling pathways and biological mechanisms. Interac-  . Distribution of the number of proteins that correlate with other proteins at the Spearman correlation coefficient of 0.5 or above. The purpose of this correlation analysis is to identify proteins that are least correlated to other proteins. For example, in the first bar of this graph, there are 459 proteins that are correlated at 0.5 or above with 0 to 20 other proteins. There are 23 proteins that are highly correlated (0.5 or above) with 660 to 700 other proteins. www.nature.com/scientificreports/ tive network analysis generated a network of primarily inflammation and immune function associated proteins/ genes that incorporated 24 of the 36 proteins into a single network, indicating a significant relationship among the majority of proteins associated with CVD incidence in our sample (Fig. 5A). Proteins up-regulated in CVD compared to controls are highlighted in red and down-regulated proteins in green. We then modeled the links between differential expression of these 36 proteins based on their established associations with shared upstream regulatory proteins and biological effects. The upstream regulators and biological functions that were most significantly enriched by the 36 proteins are shown in Fig. 5B,C. Pro-inflammatory cytokines such as IL-1, IL-4, and IFNG are predicted upstream regulators of 5, 8, and 9 of the 36 CVD-associated proteins, respectively (Fig. 5D). Similarly, TGFB1 and VEGFA are predicted upstream growth factors regulating 10 and 4 of the 36 CVD-associated proteins, respectively (Fig. 5D). We also saw enrichment for enhanced recruitment of immune cells, such as granulocytes, neutrophils, and myeloid cells, as the most prominently enriched biological function, followed by reduced activity of vasculature functions (Fig. 5E). Figure 5E highlights in detail the proteins linked to immune cell recruitment and vascular functions among the 36 proteins. This figure demonstrates, for example, that recruitment of granulocytes or neutrophils is linked to a similar and overlapping set of proteins as can be seen by the connecting nodes leading from recruitment of granulocytes to mostly the same proteins as recruitment of neutrophils. Table 2. Univariable logistic regression for the selection of top 36 proteins (p-value < = 0.05) associated with incident CVD. Correlation with Incident CVD is calculated using the Spearman correlation coefficient between each protein and Incident CVD. The # proteins corr > 0.5 indicates the number of other proteins that correlate with this protein at 0.5 or higher. Logistic Regression p-value is Wald chi-square p-value of univariable logistic regression modeling incident CVD. www.nature.com/scientificreports/ Multivariable modeling of protein and clinical variables. From the 36 proteins that were statistically significant at the type-I error threshold of 0.05 in the univariable logistic regression, we performed stepwise model selection using Akaike Information Criterion (AIC) to arrive at a final multivariable model of CVD risk that included three proteins: Contactin-5 (CNTN5), Low affinity immunoglobulin gamma Fc region receptor II-a (FCGR2A), and Complement factor B (CFB). Figure 5 highlights these proteins (red arrows) in the context of the interaction and upstream regulator networks. Separately, we also modeled the associations between MASALA clinical variables and CVD. We considered the following variables for this model: age, sex, smoking, type 2 diabetes (T2D), hypertension, LDL, and HDL. We did not include CAC score because it was highly correlated with hypertension. In the multivariable logistic regression with these seven clinical variables, we obtained an AUC (area under the receiver-operating-characteristic curve) of 0.69, but only T2D showed a significant adjusted effect on CVD (OR 3.0; 95% CI 1.2-7.5). Thus in our final main effects model, we included both clinical information and proteomic data, using the following 4 predictors: CNTN5, FCGR2A, CFB, and T2D. Table 3 shows that this model had an AUC of 0.82. CNTN5 and CFB exhibited down-regulated effects, such that an increase of 1 SD in CNTN5 and CFB RFU levels would decrease the odds of CVD by 70% and 55%, respectively. FCGR2A exhibited an up-regulated effect, such that 1 SD of increase in this protein would increase the odds of CVD by more than threefold.

R/S associations with CVD.
We assessed the influence of 10 different R/S measures or scales, as described in the Materials and Methods section, first assessing the direct association of these variables with CVD, and then assessing the modifying influence of these variables on the protein-CVD associations. With respect to associations between the R/S measures and CVD (Table 4), the largest absolute differences in CVD were seen according to positive religious coping (12%), religious struggles (12%), and Non-theistic Daily Spiritual Experiences (11%). There were moderate to high correlations among the R/S continuous variables. For example, positive religious coping and religious struggles were correlated with one another (Spearman correlation = 0.50), and both were correlated to Closeness to God at 0.80 and 0.35, respectively. Group prayer, individual prayer, and attendance at religious services or temple were correlated at above 0.40. Univariable logistic regression was used to assess the association between binary R/S variables (median-and conceptually-based) and CVD (the last 2 columns of Table 4). None showed statistically significant p-values; however, positive religious coping, religious struggles, and non-theistic daily spiritual experiences yielded the largest point estimates of odds ratios.

Effect modification by measures of R/S. Because we hypothesize that R/S variables may be important
resources for resiliency that modify the impact of stress on CVD, we then carried out effect modification analyses using the R/S variables. We hypothesized that R/S variables would decrease associations observed among participants between protein concentrations and increased risk of CVD, with the exception of religious struggles, which we anticipated would have the opposite effect (increasing the strength of protein-CVD risk associations). We used the above main effects model that included the three proteins and T2D for effect modification analyses. First, we added an interaction term between each protein and each median-based R/S variable (dichotomized above and below the median of the continuous R/S variable) to this model. Religious struggle had a significant interaction with CNTN5 (p-for-interaction: 0.009) and CFB (p-for-interaction: 0.004). We used linear contrasts to estimate the effect of protein levels (increase of 1 SD) on CVD for those above and below the median score for religious struggle ( Table 5). The adjusted odds ratio for the effect of CNTN5 levels on CVD varied from 0.04 (religious struggle above median) to 0.24 (religious struggle below median). Contrary to our hypothesis, we found that while increasing protein levels of CNTN5 decreases the odds of CVD, this decrease in odds of CVD is even greater in those with high religious struggle (above median). For CFB and CVD, however, the adjusted odds ratio varied in the opposite direction, in concert with our hypothesis, ranging from 0.86 (religious struggle above median) to 0.18 (religious struggle below median). This demonstrates that while increasing protein levels of CFB would decrease the odds of CVD, this decrease in odds of CVD is less in those reporting high levels of religious struggle (above median). For FCGR2A and CVD, where no significant interaction between the protein and reported levels of religious struggle was found, the adjusted odds ratio varied from 5.68 (below-median score for religious struggle) to 9.3 (above-median score for religious struggle).
It is significant to note that the AUC for the model with the interaction term for religious struggle (Table 5) is 0.91, a substantial improvement from the AUC of 0.82 (Table 3) in the model without the interaction term.

Discussion
In this paper, we present a case-control proteomics analysis among 100 South Asian adults from the MASALA study who are also participating in the SSSH, in which we found significant associations of three proteins (CNTN5, CFB, and FCGR2A) with risk of a future CVD event. Specifically, an increase of 1 SD in RFU of CNTN5 and CFB would decrease odds of having a CVD event by 70% and 55%, respectively; an increase of 1 SD in RFU of FCGR2A would increase the odds of having a CVD event by 3.2-fold. We then tested our hypothesis that certain religious or spiritual (R/S) practices or beliefs would modify the effect of significant proteins on risk of having a future CVD event. Two of these protein-CVD associations (CNTN5 and CFB) were significantly modified by reported levels of religious struggle/spiritual struggles, a strategy for coping with stressful life situations in which an individual understands such events as proof that their God has abandoned them, that they are being punished for their transgressions, or some other negative religious belief.
Our finding that CFB protein is associated with risk of CVD echoes four previous studies, only one of which was conducted in a South Asian sample [33][34][35][36] . CFB has been shown to be elevated in adipose tissue and serum www.nature.com/scientificreports/ from South Asians with type 2 diabetes and CVD 37 , to correlate with fasting glucose and circulating lipids 38 , and to increase the risk of heart disease 39 , but the relationship of CFB to disease is still not fully understood 36 . It is unclear why our analysis showed an effect in the opposite direction, where increased CFB protein was associated with lower risk of CVD. There is less existing research assessing the associations of CNTN5 and FCGR2A proteins with risk of CVD, although there is some evidence suggesting that polymorphisms in their corresponding genes may play a role in CVD. In particular, a polymorphism in the CNTN5 gene has been associated with atrial fibrillation and heart failure in a genome-wide analysis of the (mostly white) Framingham Heart Study participants 33 . Knockout of the CNTN5 gene in mice has also been shown to elevate several cardiovascular parameters, including heart rate, blood pressure, and blood flow speed 40 . A polymorphism in the FCGR2A gene has also been shown to predict coronary artery disease, and was also associated with altered levels of C-reactive protein 41 . While decreased FCGR2A expression has been reported to play a role in the pathogenesis of atherosclerosis, increased FCGR2A expression in platelets may also play a role in atherothrombosis 41,42 .
Based on public databases such as Human Protein Atlas (https ://www.prote inatl as.org) and GeneCards (https ://www.genec ards.org), all 3 of our significant proteins (CNTN5, CFB, FCGR2A) are expressed under various conditions in multiple bodily fluids, organs, and cell types. This is not unexpected, as this is case for the majority of human proteins. Furthermore, several previous SOMAscan studies have identified these 3 proteins as differentially expressed in various non-CVD disease phenotypes. This includes a published study on ovarian cancer, in which SOMAscan data were compared to antibody-based Olink, demonstrating that CNTN5 is significantly differentially expressed and highly correlated between the 2 platforms 43 . A second Alzheimer's disease exosome study incorporated CNTN5 into a panel that predicts Alzheimer's, with a high AUC 44 . And lastly, another Alzheimer's disease study has also demonstrated differential expression of CFB in relation to Alzheimer's 45 .
Systems biology analysis of the 36 CVD-associated proteins, including the 3 proteins selected for our multivariable analysis, suggested that the majority of these proteins are interconnected in an interactive network downstream of several cytokines and growth factors and relate to immune cell recruitment and vascularization. This list of 36 proteins includes several proteins previously linked to CVD, such as ADAMTS13 and LRIG3. Lower levels of plasma ADAMTS13 have been demonstrated to be associated with CVD in young patients 46 and to predict cardiovascular events in type 2 diabetics 47 . LRIG3 has been shown to increase blood pressure and induce cardiac hypertrophy 48 . There are a number of published studies that link circulating blood or urine proteins to CVD, including several studies utilizing the SOMAscan platform [49][50][51][52] . Most notably, a recent, largescale study using SOMAscan among participants with stable coronary heart disease (CHD) in the Heart and Soul Study (N = 938), and a validation analysis in the HUNT3 study (N = 971), identified a 9-protein risk score that was derived and validated for 4-year probability of myocardial infarction, stroke, heart failure, and allcause mortality 27 . These nine proteins included ANGPT2, MMP12, CCL18, C7, SERPINA3, ANGPTL4, TNNI3, GDF8/11, SERPINF2. This 9-protein risk score did not include any of the 3 proteins discovered in our study. However, these previous studies included, for the most part, white populations of relatively high-risk individuals with established CVD. Our analysis, on the other hand, was conducted in persons with and without a CVD event among U.S South Asians, who have not been included in extant studies. It is essential that biomarker research Ingenuity Pathways Analysis was applied to generate the interactive networks from the 36 proteins associated with CVD events. The interactive network with the highest statistical significance is shown here. Red indicates protein up-regulation and green denotes protein down-regulation in individuals with CVD event. The intensity of the node color indicates the degree of up-regulation (red) and down-regulation (green) in individuals with a CVD event as compared with the matched controls. Empty shapes reflect proteins/genes not differentially expressed or absent from the SOMAscan platform that were brought in as interactors. Proteins are coded by shape. Square: cytokine; vertical rhombus: enzyme; horizontal rhombus: peptidase; trapezoid: transporter; ellipse: transmembrane receptor; circle: other. Links reflect various potential interactions such as protein expression regulation, protein activity, or modification of the other protein-protein interactions. Red arrows indicate the 3 proteins significant in final multivariate CVD-protein signature models. (B) Upstream Regulator Analysis of CVD Incidence Proteins. Upstream regulators (i.e., a protein/gene that can affect the expression of another protein/gene) with highest statistical significance that best explain the observed expression changes in the input 36  www.nature.com/scientificreports/ aimed at improving CVD prevention includes participants from minority racial and ethnic communities that bear a disproportionate burden of CVD. Although U.S. South Asians bear a higher burden of CVD compared with other U.S. populations 1 , no proteomics research has been conducted in this sub-population prior to our study. If proteomics research is to inform efforts to reduce racial/ethnic disparities in the burden of CVD, research www.nature.com/scientificreports/ to identify protein biomarkers must include adequate numbers of individuals from communities that bear the highest rates of CVD-related morbidity and mortality. Our analysis is also the first exploration of whether R/S influence proteomic profiles associated with risk of CVD. Although religiosity and religious service attendance have been associated with incident CVD in a large prospective cohort study and several meta-analyses 8,19,20 , to date no R/S measures have ever been investigated in a proteomic analysis. Our results are the first to demonstrate that religious struggles significantly influence the impact of protein concentrations on risk of CVD. In our sample of U.S. South Asians, among those who reported religious struggles (e.g., feeling they were being punished by God, abandoned by God, doubting their faith) in response to stressful life events, 1 SD increase in CFB concentration was associated with a smaller decrease in CVD risk (14%) relative to those that did not employ religious struggle (82%). This may be in part because religious struggles may be stressful events in and of themselves. In this case, it is plausible that stress from religious struggles has a stronger effect on risk of CVD than CFB concentrations, thereby cancelling out the positive impact of greater concentrations of CFB. Religious struggles have not been investigated in relationship to CVD risk, but have been shown in previous studies to associate with stress, inflammation, and immune markers such as cortisol 53 , CD4 levels 54 , and interleukin-6 55 .
It is curious, however, that our results show religious struggles having the opposite effect on the association between CNTN5 and CVD risk, even though both CFB and CNTN5 were both shown to decrease risk of CVD in our analysis. Given that the biological pathways through which religious struggles work to affect health are still largely unknown, and that CNTN5 is also a protein about which we have relatively little mechanistic knowledge in the context of CVD, it is difficult to disentangle these relationships. Notably, the researcher who developed our religious coping measure has stated that religious struggles should not be considered universally maladaptive, and that "the efficacy of particular coping methods is determined by the interplay between personal, situational, and social-cultural factors, as well as by the way in which health and well-being are conceptualized and measured" 16 . The self-reported and somewhat subjective nature of these questionnaire items makes interpretation of such factors less straightforward than for biological variables. These pilot study results should be taken as preliminary associations that warrant further investigation in larger samples that can support more robust modeling in order to confirm the magnitude and the direction of the impact of religious struggle and other R/S measures on these significant protein-CVD associations across South Asian and other understudied racial/ethnic communities. These future studies will need to pay careful attention to modeling the interplay of R/S with the personal, situational, and socio-cultural factors mentioned above.
Notably, our analyses only include one protein measurement from a single time point for each subject; therefore, there is clearly a concern regarding the reproducibility and stability of these protein measurements. We were constrained by limited resources to run the SomaScan assay more than once for each MASALA participant, however we attempted to evaluate this issue by checking the stability of protein levels in seven participants that we are studying in another cohort, the Nurses Health Study II (NHSII), who have SomaScan data at two time points (baseline and 1-year). We looked at the same three proteins (CNTN5, FCGR2A, CFB) and assessed the stability of these proteins across the two time points in our NHSII participants, one year apart. Our results indicate that the levels of these three proteins remain quite similar between the two time points (Supplemental Table 1). Compared to these three proteins in our case-control study, the range of CFB is of similar magnitude between the NHSII data and our data. FCGR2A in these NHSII patients falls in the range of FCGR2A for our data. CNTN5 in the NHSII study for these patients is similar to CNTN5 distribution in our data. Overall, it does appear that these proteins are stable over time, at least in the observed 1-year span seen here in the NHS data.
Although our study data show robust discovery of three candidate biomarkers associated with CVD and religious struggle, discovery studies need to be paired with robust validation in order to establish quantitative accuracy for deleterious levels of protein concentrations affecting risk of CVD 23 . We plan to follow up this pilot study with a larger validation study in the context of the diverse participants from the SSSH, which includes approximately 1000 Black, Hispanic/Latino, American Indian, South Asian, and white participants, for a total of ~ 5000 participants. Our pilot study also comprised a relatively small sample size, although we demonstrate that we had adequate statistical power for all analyses (see "Methods"). Despite these limitations, this pilot study appears to be the first untargeted proteomics investigation of proteins that predict incident CVD in a U.S. South Asian population, and the first to investigate the role of R/S in modifying these relationships in any population. Our results provide a strong rationale for further investigation of these proteins as predictive protein markers of CVD in this high-risk ethnic population, and potentially other minority communities. Our results also provide a strong rationale for further investigating the biological impact R/S beliefs and practices may have on protein expression and risk of CVD, whether exerting a protective or deleterious effect, in order to creatively explore novel avenues for reducing the burden of CVD and disparities in the burden of illness.

Materials and methods
Study population. This analysis used data from a subsample of participants from the Mediators of Atherosclerosis in South Asians Living in America (MASALA) Study who are also participating in the Study on Stress, Spirituality, and Health (SSSH). The SSSH is a national multi-cohort study that brings together data from racially and ethnically-diverse cohorts and collects new data on all SSSH participants through periodic surveys to understand the relationships between psychosocial stress, religious and spiritual (R/S) practices and beliefs, and health. MASALA is an ongoing study designed to investigate causes of cardiovascular disease among South Asians not explained by traditional risk factors. In addition to CVD, MASALA measured a number of other facets of psychosocial and behavioral health. Participants were initially recruited from 2010 to 2013 56 Table 4. Prevalence of R/S Variables and their Association with Incident CVD in the MASALA Study. *Mean (Standard Deviation). † Of the 100 subjects in the sample, depending on the R/S variable, 10 to 14 subjects do not have R/S data. ‡ R/S variables dichotomized at the median of the distribution of the continuous R/S variables (yes: above median). § R/S variables dichotomized into conceptually based R/S variables (yes: above threshold). || Univariable logistic regression where Incident CVD is the dependent variable, and the Medianbased R/S variable is the independent variable. This is the odds ratio of Incidence of CVD above versus below median of the R/S variable distribution. # Univariable logistic regression where Incident CVD is the dependent variable, and the conceptually-based R/S variable is the independent variable. This is the odds ratio of Incidence CVD of above versus below conceptual threshold of the R/S variable distribution.   Table 5. Influence of Religious struggle on Associations between Proteins and Incident CVD Using Linear Contrasts * . *The linear contrasts came from the interaction terms of proteins and the median-based negative coping variable. Three interaction terms were included in the main effects model shown in Table 3. Shown are the odds ratios and 95% confidence interval for each stratum of negative coping. † AUC from main effects model + interaction term of religious struggle x protein.

Effect of 1 SD increase of protein for negative coping below median
Effect of 1 SD increase of protein for negative coping above median www.nature.com/scientificreports/ Francisco Bay and greater Chicago areas using telephone-based recruitment methods in areas where census data revealed high proportions of South Asians; 906 original cohort members (Exam 1) underwent a 6-h baseline examination. During a follow-up visit of the MASALA study between 2016 and 2018 (Exam 2), 733 returning cohort members completed the baseline Spirituality Survey (SS-1) of the SSSH. A new recruitment wave from 2017-2018 (Exam 1A) resulted in an additional 257 participants who completed both a MASALA baseline examination and the SSSH SS-1. In all, 990 MASALA participants completed the SS-1 at baseline and follow-up visit.
We used a nested case-control design with a total of 100 MASALA participants who completed the SS-1. All subjects ranged in age from 40-84 at the time of blood collection (either at Exam 1, 2010Exam 1, -2013 or Exam 1A, 2017-2018). Fifty cases were selected who had reported a CVD event by June 2018 and were matched by age and sex to a random sample of 50 participants who did not report any CVD events by the censoring time. CVD events were confirmed by medical records review (adjudicated by 2 physicians) and completed all R/S measures to be used in this analysis. R/S variables. We included 10 R/S measures in our analyses, including 5 R/S scales and 5 individual R/S items. All items were taken from the SS-1 and prefaced with the following statement: "These questions are being asked of people from different religious backgrounds, and although we use the term 'God' in some of the questions below, please substitute your own word for 'God' (for example, Bhagwan, Allah, The Divine, etc.)".
Positive coping and religious struggles. Two subscales included eight positive religious coping items (α = 0.93) and six religious struggles items (α = 0.77) from the RCOPE 57 , with input from the scale creators in reducing the number of sub-items assessed. All religious coping items were completed in response to the prompt, "In facing recent stressful life events…" Positive religious coping items included: "I saw my situation as part of God's plan, " "I tried to see how God might be trying to strengthen me in these situations, " "I tried to make sense of the situation with God, " "I worked together with God to relieve my worries, " "I did what I could and put the rest in God's hands, " "I took control over what I could, and gave the rest up to God, " "I sought God's love and care, " and "I trusted that God would be by my side. " Religious struggle items included: "I wondered what I did for God to punish me. " "I wondered if God allowed this event to happen to me because of my sins, " "I believed the devil or evil spirits were responsible for my situation, " "I felt as though the devil or an evil spirit was trying to turn me away from God, " "I wondered whether God had abandoned me, " "I questioned God's love for me. " Response categories include 1: not at all, 2: somewhat, 3: quite a bit, 4: a great deal. Items were averaged to create measures of positive and religious struggle.
Closeness to god (4 items). Among those who indicated belief in God, four items assessed participants' perception of their relationship with their God: "God gives me the strength to do things I otherwise could not do myself, " "God loves or cares for me unconditionally, in a way I could never earn, " "Throughout my life, God has come through for me, " "God is the center of my life, " and "When I pray, I feel a deep sense of closeness with God" (all de novo). Each ranged from 1 to 5 (1: definitely not true of me, 2: tends not to be true of me, 3: unsure, 4: tends to be true of me, 5: definitely true of me). Responses were averaged to create a Closeness to God scale.

Religious and spiritual practices (4 items).
Four separate items assessed frequency of: religious service attendance, group prayer outside of service attendance, praying alone, and meditation, with responses ranging from 1 (never) to 7 (several times a day). Each of these measures was evaluated individually.
Gratitude (2 items). Two questions assessed feelings of gratitude (α = 0.80), drawn from the Gratitude Questionnaire-6 58 : Responses to these two items were averaged to create a measure of gratitude.
Non-theistic daily spiritual experiences (4 items). Non-theistic subscales of the Daily Spiritual Experiences Scale (DSES) 59 were used to assess spiritual experiences that do not depend on belief in God. Four items formed the DSES non-theistic subscale: "I experience a connection to all of life, " "I feel deep inner peace or harmony, " "I am touched by the beauty of nature, " and "I feel a selfless caring for others. " Responses ranged from "never" to "many times a day. " Responses were averaged to create a non-theistic DSES scale. All R/S scales and measures were converted to binary variables for analysis. For all variables except for closeness to God, gratitude, and meditation, binary variables were created in two separate ways: each R/S continuous (or ordinal) variable was dichotomized as above or below a conceptually-based threshold ("conceptually-based"), based on previous research, and then a separate set of binary variables were created by dichotomizing according to the median of the variable ("median-based").The conceptual binary variables were created as follows: respondents were coded as having positive religious coping if they scored ≥ 3 on the scale (i.e., "quite a bit" or more). Religious struggle was less common, so respondents were coded as having religious struggle if the scale score was ≥ 2 (i.e., "somewhat" or more). Group prayer was relatively uncommon, so if respondents reported this activity once a week or more they were coded as someone who participates in group prayer. Individual prayer was more common, so participants were coded as those who regularly pray if they prayed privately once per day or more. Participants were considered to attend religious services/temple if they attended 2-3 times a month or more. Participants were considered to have non-theistic daily spiritual experiences if they scored ≥ 4 ("every day or more") on the scale. Participants were considered to have God involved with their health locus of control if they answered either "my health is determined by my own actions and behaviors, " or "when it comes to my health, God and I both have a role to play. " Otherwise, they were coded as God not being involved in health locus of control.
Closeness to God, gratitude, and meditation variables were only dichotomized according to the median response value due to a lack of theoretical or empirical information to guide a conceptual threshold for dichotomization.
SOMAscan assay. EDTA plasma samples (50 μl) from 100 individuals matched on age and gender were analyzed using the highly multiplexed SOMAscan Assay Kit for human plasma 1.3 k, which measures expression of 1,305 human proteins using highly selective single-stranded modified SOMAmers according to the manufacturer's standard protocol (SomaLogic; Boulder, CO). The majority of the proteins measured by SOMAscan are known to be secreted, leaked, or shed from cells into the circulation 60 , making this proteomics platform an ideal technology for biomarker discovery in plasma. Five pooled plasma controls and one no-protein buffer control were run in parallel with the plasma test samples for each run. Sample to sample variability is further controlled by several hybridization controls. Each individual protein concentration was transformed into a corresponding SOMAmer concentration, then quantified using a custom DNA microarray (Agilent) read-out, which reports the data as relative fluorescence units (RFU) 29,61,62 . Data quality control, calibration, and normalization was done according to the manufacturer's protocol as previously described 30 . Statistical analysis. Power analysis. As this is a pilot study, our power analysis was based on being able to identify several proteins that exhibit a significant signal in their association with CVD, and to assess effect modification from R/S variables of interest. We assumed that two models will be compared: the basic baseline clinical variables without a proteomic signature (MODEL 1), and one with protein predictors (MODEL 2). We examined the AUC for the baseline model (MODEL 1) of incident CVD that adjusts for age, sex, smoking, diabetes, hypertension, LDL, and HDL. We hypothesized that adding in the proteomic data would increase the predictive power of the model. Thus, "MODEL2 = MODEL1 + Proteins" is expected to have improved the aROC to a level of 0.80. Given a type-I error of 0.05, we would have power of 0.88 to detect the improvement of aROC from 0.70 (MODEL 1) TO 0.80 (MODEL 2) using a sample size of 100 participants 63,64 . The estimate aROC of 0.70 of the baseline model was based on the reported range from Alaa et al. 65 .
Statistical methods. For bivariate analyses, we used chi-square or Fisher's exact test, and t-test or Wilcoxon Rank Sum test. We used nonparametric Spearman correlations to obtain pairwise correlations for all 1305 SOMAscan proteins, and between outcome and baseline clinical variables.
We used univariable logistic regression models to screen for the selection of the individual proteins. We used multivariable logistic regression for model selection using Akaike Information Criterion (AIC), to select the optimal model with both clinical information and selected proteins.
Effect modification. In the final main effect multivariable model (that included both clinical information and select proteins), we evaluated each interaction term between the selected protein and the selected R/S variable (each model was run separately for each R/S variable and its interaction with the selected proteins). Both the median-based and conceptually-based R/S variables were tested. We then used linear contrasts from the main effect (proteins) and the interaction term (protein*R/S variable) to obtain the estimated effect of an increase of 1 standard deviation of protein on incident CVD for each level (above or below median) of the R/S variable. We report the c-statistic or AUC of the final main effect model, as well as the model with effect modification. We conducted all statistical analyses using SAS/STAT software version 9.
Systems biology analysis. Systems biology analyses of CVD-associated proteins were performed using the Ingenuity Pathways Knowledge Base (Qiagen, Redwood City, CA), a repository of biological interactions and functions created from millions of individually modeled relationships ranging from the molecular (proteins, genes) to organism (diseases) level. Ingenuity Pathway Analysis (IPA) uses enrichment analysis-based www.nature.com/scientificreports/ approaches 66,67 to calculate the significance of observing a candidate protein set within the context of biological systems. Ingenuity Pathways Analysis (IPA) has generated a knowledge database that defines and incorporates many different categories and subcategories of biological processes and disease functions based on gene/protein expression or interactions. Each function defined by IPA includes a specific set of genes/proteins that have been linked to the particular functional category based on published data. IPA then calculates the enrichment of genes/ proteins from the list of genes/proteins in the particular test set to the genes/proteins included in each functional category and calculates the p-value for enrichment or overlap between the test set and the IPA knowledge base using Fisher's Exact test.
The different functional categories are defined by IPA based on genes/proteins associated with such functions and can be displayed in a tree map which clusters related functions together, thus providing a high-level view of the function families. Consequently, there may be various related functions such as "recruitment of neutrophils" and "recruitment of granulocytes" that have many overlapping genes/proteins, but have also some that are unique to one but not the other. Since granulocytes includes basophiles, eosinophils, and neutrophils, granulocytes would be a higher-level hierarchical category than neutrophils and neutrophils would be a subcategory of granulocytes.
Ethics statement. Institutional Review Board (IRB) approval for this study was obtained from the Partners Human Research Committee (PHRC) and the University of California, San Francisco Review Board.

Data availability
MASALA study datasets may be obtained from the cohort's Coordinating Center upon approval of an ancillary study proposal, signing a data use agreement, and obtaining IRB or other relevant ethics approval. The SS-1 data may be obtained from the SSSH Steering Committee after approval of an ancillary study proposal, signing of a data use agreement, and obtaining IRB or other ethics approval.