We recently reported that differential gene expression and DNA methylation profiles in blood leukocytes of apparently healthy smokers predict with remarkable efficiency diseases and conditions known to be causally associated with smoking, suggesting that blood-based omic profiling of human populations may be useful for linking environmental exposures to potential health effects. Here we report on the sex-specific effects of tobacco smoking on transcriptomic and epigenetic features derived from genome-wide profiling in white blood cells, identifying 26 expression probes and 92 CpG sites, almost all of which are affected only in female smokers. Strikingly, these features relate to numerous genes with a key role in the pathogenesis of cardiovascular disease, especially thrombin signaling, including the thrombin receptors on platelets F2R (coagulation factor II (thrombin) receptor; PAR1) and GP5 (glycoprotein 5), as well as HMOX1 (haem oxygenase 1) and BCL2L1 (BCL2-like 1) which are involved in protection against oxidative stress and apoptosis, respectively. These results are in concordance with epidemiological evidence of higher female susceptibility to tobacco-induced cardiovascular disease and underline the potential of blood-based omic profiling in hazard and risk assessment.
Exposure to tobacco smoke is one of the best studied examples of an exposure with proven causal association with a large number of human diseases1. Although the relevant epidemiological evidence is not completely consistent, many studies have provided evidence of differential sex susceptibility to the health effects of tobacco, especially in relation to smoking-induced cardiovascular disease (CVD; including acute myocardial infarction and coronary heart disease)2,3,4, chronic obstructive pulmonary disease (COPD)1,5,6 and lung cancer7,8,9. Focusing in particular on CVD, a systematic review and meta-analysis of data from 86 prospective studies and nearly 4 million subjects came to the conclusion that female smokers have a 25% higher risk of developing coronary heart disease than males with the same exposure to tobacco smoke and after allowing for other risk factors2, a conclusion supported by the results of a recent meta-analysis of the available data3. Another recent systematic review and meta-analysis covering data from 81 cohorts and nearly 4 million subjects concluded that the risk of stroke in Western populations is 10% higher in female smokers4.
Biomarker-based investigations have contributed significantly to our understanding of the disease risks associated with exposure to environmental hazards. Currently such biomarker studies are benefiting from the expanding use of genome-wide profiling (omics). We have recently reported the results of a study of the impact of tobacco smoke exposure on genome-wide gene expression and DNA methylation in white blood cells (WBCs) of apparently healthy smokers10 and identified large numbers of transcripts and DNA CpG sites whose expression and methylation, respectively, differ significantly between current and never smokers. Furthermore we used disease connectivity analysis to show that the corresponding gene profiles can identify with remarkable efficiency (specificity 94%, positive predictive value 86%) most diseases and conditions independently known to be causally associated with smoking10. In view of this finding, we decided to look for sex-related differences in these profiles which might possibly reflect differential disease susceptibility.
In order to minimise statistical power problems, we focused our search for sex-related differences in the effects of tobacco smoking on the transcriptomic and epigenetic features which we previously found to differ significantly between mixed-sex groups of current and never smokers10. These features consist of 1,273 CpG sites (FDR < 0.05; associated with 725 differentially methylated genes - DMGs) and 350 transcripts (FDR < 0.10; associated with 271 differentially expressed genes - DEGs) which were derived from the comparison of genome-wide transcriptomic and epigenetic profiles of 143 current and 311 never smokers (including 134 males and 320 females) derived from 2 cohorts, the Northern Sweden Health and Disease Study and EPIC Italy (Suppl. Tables S1 and S2).
The tobacco exposure data available to us (Table 1) included the number of cigarettes smoked per day and smoking duration in both cohorts, the smoking intensity measured in pack-years (only in the Italian cohort), as well as plasma cotinine levels for a fraction of the study subjects from both cohorts. Inspection of this data did not reveal statistically significant differences between the two sexes, although it did suggest possibly higher exposure intensity (pack-years) in males (Table 1). While the above parameters provide an approximate picture of smoking exposure, they do not allow an accurate quantitative estimation of the long-term exposure to tobacco smoke of the different subjects suitable for use in statistical adjustment for the purpose of sub-group comparison. For this reason, and also having in mind our previous observation of the highly skewed distribution of the expression or methylation differences between current and never smokers (effect size), we opted to base our sex comparisons not on the effect size, which is expected to be strongly dependent on exposure level and duration, but on the ranking of the various features by statistical significance (i.e. p-values in current vs never smoker comparisons) in sex-stratified analyses. Thus we compared the full set of signals separately for males and females, ranking them according to the statistical significance of their current-versus-never smoker differences in each sex and, finally, compared the rankings in the two sexes of the limited number of signals of interest. The methodology is described in detail in Methods while the workflow of the procedure is shown diagrammatically in Fig. 1. We adopted this approach on the reasonable assumption that any differences between the sexes with regard to the level or duration of exposure would in general affect the effect size for the different features in a similar manner but would be unlikely to alter their ranking within each sex group. 
This non-parametric approach has the added advantage that it minimizes the impact of non-normal data distribution, outliers and differences in statistical power arising from the different sizes of the male and female populations.
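As an illustration of the ranking step described above, the following minimal Python sketch ranks all features by p-value separately in each sex and then extracts the rank differences for a subset of features of interest. The feature names and p-values are hypothetical, and the sketch is not the code used in the study.

```python
def significance_ranks(p_values):
    """Map feature id -> rank (1 = most significant) from a dict of p-values."""
    ordered = sorted(p_values, key=lambda feature: p_values[feature])
    return {feature: rank for rank, feature in enumerate(ordered, start=1)}


def rank_differences(p_male, p_female, features_of_interest):
    """Rank difference (male rank minus female rank) for selected features.

    Ranks are computed over ALL features within each sex; only the
    differences for the features of interest are returned.
    """
    ranks_m = significance_ranks(p_male)
    ranks_f = significance_ranks(p_female)
    return {f: ranks_m[f] - ranks_f[f] for f in features_of_interest}


# Hypothetical p-values from current-versus-never smoker tests in each sex.
p_male = {"probe_a": 0.60, "probe_b": 0.02, "probe_c": 0.30, "probe_d": 0.90}
p_female = {"probe_a": 0.001, "probe_b": 0.03, "probe_c": 0.50, "probe_d": 0.70}

diffs = rank_differences(p_male, p_female, ["probe_a", "probe_b"])
print(diffs)
```

A positive difference marks a feature that ranks higher (is more significant) in females than in males. Because only ranks enter the comparison, a uniform shift in effect size within one sex leaves the result unchanged.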
Suppl. Fig. S1 shows the distribution of the p-values of the rank differences in the two sexes. The distribution is bimodal: alongside the anticipated pattern of decreasing numbers of signals as p-values decrease, there is an increased number of signals with very low p-values (<0.05). The latter indicates the existence, among both the transcriptomic and the epigenetic signals, of sub-groups of signals whose ranking differences between the sexes are more significant than expected.
Tables 2 and 3 show the lists of expression probes and CpG sites, respectively, whose significance rankings differ significantly (p < 0.05) between the two sexes. There was no significant difference in the corresponding rankings of these signals between male and female never smokers (results not shown), indicating that the observed sex-related differences reflect the differential impact of tobacco smoke exposure.
All 26 expression probes in Table 2 show large ranking differences between the sexes, with 23 exhibiting higher significance in females (median ranks: 111 in females and 19,775 in males), while the remaining 3 probes show higher significance in males. All probes are underexpressed in smokers with the exception of 2 which are overexpressed particularly in male smokers. Owing to population size limitations, many of the identified signals do not reach statistical significance in the sex-stratified analyses. However, for the 23 female-specific probes, the median FDR value with regard to the current-versus-never smoker comparisons is 0.40 (range 0.01–0.91) in females while all 23 have FDR > 0.80 in males (not shown), suggesting that the effects of smoking are largely limited to females. Importantly, as shown in Table 2 these significance ranking differences are accompanied by a corresponding difference in effect sizes for all the signals identified, with the median effect size (ratio of expression in current divided by never smokers) observed being 0.80 (range 0.72–0.93) and 0.96 (range 0.93–1.00) in females and males, respectively. No expression probe shows opposite effects of smoking in the two sexes.
Turning to the epigenetic profile, the 92 CpG sites (associated with 72 genes) thus identified (Table 3) show both under- and overmethylation in smokers while their ranking differences between the sexes are also substantial, with the median rank values being 432 and 215,917 in females and males, respectively. The median FDR value of these CpG sites is 0.12 (range 0.0002–0.68) in females, while all 92 sites have FDR > 0.9 in males. In complete analogy to what is observed for the transcriptomic profile, for all CpG sites the impact of smoking (Δβ = βsmokers − βnon-smokers) is greater in females than in males, with the median absolute Δβ values being 1.55% (range 0.15–8.45%) and 0.33% (range 0.00–8.45%), in females and males, respectively. No CpG sites exhibit opposite effects of smoking in the two sexes. There was no statistically significant overabundance in the distribution of the CpG sites in relation to their locations (TSS200, TSS1500, body, 3′UTR, 5′UTR, 1st exon, intragenic) or to their occurrence in CpG islands and their regions (island, non-island S-shore, S-shelf, N-shore, N-shelf). Finally, we note that one gene, PLIN5, appears to be more overexpressed in females but more undermethylated in males, implying a possibility of differential epigenetic regulation in the two sexes.
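For orientation, the two effect-size measures used above can be computed as in the following Python sketch. It assumes simple group means of β values and of log2 expression intensities, whereas the values reported here derive from the adjusted models described in Methods; the input numbers are hypothetical.

```python
from statistics import mean


def delta_beta_percent(beta_smokers, beta_never):
    """Delta-beta: mean beta in smokers minus mean beta in never smokers, in %."""
    return 100.0 * (mean(beta_smokers) - mean(beta_never))


def expression_ratio(log2_smokers, log2_never):
    """Expression ratio (current / never smokers) from log2 intensities."""
    return 2.0 ** (mean(log2_smokers) - mean(log2_never))


# Hypothetical measurements for one CpG site and one expression probe.
d_beta = delta_beta_percent([0.50, 0.52], [0.54, 0.56])
ratio = expression_ratio([9.0, 9.0], [10.0, 10.0])
print(d_beta, ratio)
```

A negative Δβ corresponds to undermethylation in smokers, and a ratio below 1 to underexpression in smokers.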
Consistency and stability of observed sex effects
Owing to the limited population size no direct replication between the two cohorts was conducted. However, cohort-stratified analyses show that the features listed in Tables 2 and 3 tend to be among the most significant features, in terms of sex differences, also in the individual cohorts. Thus, the median rank value of the 92 CpG sites of Table 3 was 3,048 (out of a total of 410,987 sites examined; top 0.7%) in females and 195,213 in males in NSHDS, and 9,098 (top 2%) and 218,704, respectively, in EPIC Italy, thus demonstrating a clear trend of higher female sensitivity in both cohorts. A similar trend was seen with the corresponding transcriptomic signals of Table 2, with the 23 female-specific signals having median ranks of 585 (out of a total of 29,667 probes examined; top 2%) in females and 25,742 in males in NSHDS, and 1,708 (top 6%) and 12,338 in females and males, respectively, in EPIC Italy. As regards the 3 male-specific transcriptomic signals, all 3 in NSHDS and 2 out of 3 in EPIC ranked higher in males.
The observation that almost all sex-specific features identified show higher sensitivity in females is striking. For comparison it is noted that only 116 of the 350 transcripts and 1,009 of the 1,273 CpGs differentially modified by smoking in the mixed population have lower FDR values in females. Although, as stated above, the rank-based comparison we employed is not expected to be significantly affected by group size, in view of the larger number of females in our study (320 female versus 134 male current and never smokers), we ran the same analysis as described above 10 times, in each case using all 134 male subjects and an equal number of females sampled randomly from our population while maintaining constant the proportions of subjects coming from each of the two cohorts and with the different types of smoking status (see Fig. 1, right). In each such resampling analysis, the resulting sex-specific epigenetic or transcriptomic signals included on average 50.3% (S.D. 7.3%) and 50.8% (S.D. 14.2%) of the signals shown in Tables 2 and 3, respectively, while the cumulative % overlap (average of each successive resampling round plus all preceding ones) for both lists tended towards approx. 50%, reflecting the similar loss of statistical power of the smaller subpopulations employed (Suppl. Table S3 and Suppl. Fig. S2). These observations provide confirmation that the identification of the sex-specific signals was not subject to bias by the group size of each sex class.
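The overlap statistics quoted above can be reproduced in outline as follows. This Python sketch computes the percentage of reference signals recovered in each resampling round and the cumulative average over successive rounds; the signal identifiers are hypothetical.

```python
def percent_overlap(resampled_signals, reference_signals):
    """Percentage of the reference signals recovered in one resampling round."""
    reference = set(reference_signals)
    return 100.0 * len(reference & set(resampled_signals)) / len(reference)


def cumulative_average(values):
    """Running mean: entry k averages rounds 1..k."""
    averages, total = [], 0.0
    for k, value in enumerate(values, start=1):
        total += value
        averages.append(total / k)
    return averages


# Hypothetical reference list and three resampling rounds.
reference = ["s1", "s2", "s3", "s4"]
rounds = [["s1", "s2", "x"], ["s1"], ["s1", "s2", "s3"]]

overlaps = [percent_overlap(r, reference) for r in rounds]
running = cumulative_average(overlaps)
print(overlaps, running)
```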
A number of recent studies on smoking-induced changes in methylation profiles, which employed effect size as the response classification parameter, failed to detect any sex-specific responses11,12, possibly because of residual confounding arising from insufficiently accurate adjustment for tobacco smoke exposure. By comparing in sex-stratified analyses the significance ranking of transcriptomic and epigenetic signals previously shown to differ between current and never smokers in a mixed-sex population, we identified a number of features which exhibit significantly different responses in the two sexes. Because the highly stringent nature of our non-parametric, rank-based statistical methodology inevitably attenuates sensitivity, it is possible that additional features with sex-specific behaviour may exist. On the other hand this approach has the advantage that it minimizes false positive findings and maximizes specificity.
Our observation that almost all features identified show stronger responses in females implies the possibility of higher female susceptibility to diseases related to the corresponding genes. The most notable observation regarding the list of sex-specific DEGs and DMGs relates to the presence of multiple genes related to CVD, especially genes involved in thrombin signaling and vascular and endothelial cell function. Table 4 summarises the relevant evidence (discussed further below) and compares the changes observed in the present study with those reported in clinical studies, where such information is available. It can be seen that the direction of change reported in these studies is in concordance with that which we have observed in apparently healthy female smokers, supporting the relevance of these changes to disease pathogenesis.
Genes involved in thrombin signaling
Thrombin, a serine protease, has an essential role in coagulation and haemostasis mediated by platelets, while in addition it elicits important effects in endothelial and vascular smooth muscle cells (VSMC). For these reasons thrombin-mediated effects are of great importance in the pathogenesis of CVD. Most of the cellular effects of thrombin are initiated via the activation of a family of G-protein-coupled receptors called protease-activated receptors (PARs), which are transmembrane proteins expressed on different types of cells including platelets, endothelial cells and VSMC. The main thrombin receptor on platelets and blood vessel cells is PAR1, also known as coagulation factor II (thrombin) receptor (F2R). This gene plays a key role in vascular function and CVD13 and its genetic variants are known to influence platelet function14,15. In our previously published analysis of the effects of smoking on transcriptomic and epigenetic profiles10 we found F2R to be differentially underexpressed in current smokers with a statistical significance of FDR = 0.15 which just fails to reach the threshold adopted in the present analysis (FDR < 0.10). Inclusion of the F2R-related expression probe together with the 350 probes with FDR < 0.10 in the sex-stratified analysis described above reveals a highly significant female specificity of the effect of smoking on this gene. Thus, in male- and female-stratified analyses, respectively, the significance rankings for current-versus-never-smoker comparisons were 29,197 and 1,283 (p = 1.08 × 10−4), the FDR values 0.98 and 0.040 and the effect sizes 1.00 and 0.88.
The interaction of thrombin with PAR1 in platelets is facilitated by its initial binding to the GPIb-IX-V complex which plays a critical role in thrombosis, atherogenesis and inflammation16. This complex includes the glycoprotein GP5, which is associated with a CpG site we found to be differentially overmethylated in female smokers (Table 3). Following its activation via the cleavage of its N-terminal domain by thrombin, PAR1 initiates multiple kinase signaling pathways which lead to different effects depending on the cells concerned (Fig. 2). Such effects include hemostasis and thrombosis in the case of platelets, induction of pro-inflammatory phenotype in the case of endothelial cells, increase of vascular permeability, proliferation, migration and hypertrophy in the case of VSMCs, thus contributing to the pathogenesis of different types of CVD.
Signaling by the activated PAR1 receptor is controlled by, among other factors, Src kinases17, including FYN (FYN proto-oncogene, Src family tyrosine kinase) which is associated with a CpG site differentially undermethylated in female smokers. Following initiation of GPIb-IX-V/PAR1 signalling, FYN phosphorylates PKCδ (protein kinase Cδ) which subsequently negatively modulates platelet activation18. The importance of FYN for haemostasis-related disease is underlined by the report that FYN-deficient mice show an altered haemostatic response19.
Another gene which influences platelet function is IGF1R (insulin-like growth factor 1 receptor), which is associated with a CpG site differentially overmethylated in female smokers. The IGF1R protein is expressed at high levels on the plasma membrane of platelets while its ligand, IGF1, is a growth factor found in the α granules in platelets. Stimulation of platelets with IGF1 results in rapid phosphorylation of IGF1R and potentiation of PAR1-induced platelet aggregation20.
As already mentioned, thrombin-mediated PAR (including PAR1) signaling also operates in VSMC and endothelial cells, thereby playing an important role in diverse cellular activities related to inflammation, CVD, tumor growth and other conditions. In this context the genes discussed above (with the exception of GP5 which is expressed only in platelets) can be anticipated to affect by analogous mechanisms the pathogenesis of such diseases. Moreover, a number of additional genes of relevance to thrombin signaling in VSMC and endothelial cells is included in the list of genes found to be differentially modified in female smokers. One of these genes is EGF (epidermal growth factor), differentially overmethylated in female smokers, a potent mitogenic factor in many cell types acting through its receptor EGFR. Activation of EGFR promotes thrombin-induced proliferation of VSMC21, while its inhibition attenuates thrombin-stimulated signalling along the PI3K-Akt-mTOR-S6K1 axis, leading to effects on cell proliferation and motility22. Importantly, EGFR signalling is coordinated by EPS8 (epidermal growth factor receptor pathway substrate 8)23, which is underexpressed in female smokers. In support of a probable role of EPS8 in vascular disease is the report that EPS8-null mice show increased vascular permeability24. Finally, RPTOR (regulatory associated protein of MTOR, complex 1), overmethylated at 2 CpG sites in female smokers, negatively regulates mTOR kinase25 which, as mentioned above, is involved in thrombin signaling in VSMC. It is noted that mice with an RPTOR deletion targeted on the myocardium have been reported to develop dilated cardiomyopathy26.
Other genes related to CVD pathogenesis
The list of genes exhibiting sex-specific response to tobacco smoking includes a number of additional members for which there is significant clinical or mechanistic evidence, including evidence from transgenic animal studies, that they are linked with CVD pathogenesis. The gene with the largest expression change in female smokers is HMOX1 (haem oxygenase 1), well known for its antioxidant and anti-inflammatory properties as well as for its protective role against CVD27,28,29. HMOX1 is also known to protect against smoking-induced COPD30, a disease for which there is strong evidence of differentially higher susceptibility in female smokers1,5,6. Another gene of interest is HDAC4 (histone deacetylase 4), differentially overmethylated in female smokers, which plays a global role in the epigenetic control of gene expression by modifying histones as well as non-histone proteins31 and plays an important role in regulating hypertrophic responses32. Finally of particular note is the female-specific demethylation of the well known anti-apoptotic gene BCL2L1 (BCL2-like 1), which plays a key role in the regulation of platelet activation and apoptosis33. Among the consequences of platelet apoptosis is the production of microparticles, which are recognised to play an important role in inflammation, CVD, coagulation and angiogenesis34.
The largest sex-related difference in the impact of smoking (absΔΔβ, last column in Table 4) is observed at 3 CpG sites associated with CACNA1D (calcium channel, voltage-dependent, L type, alpha 1D subunit). This gene encodes the Cav1.3 subunit of a voltage-gated, L-type calcium channel, and human and animal studies strongly support its association with various pathological conditions, including cardiovascular and neurological disorders35,36,37,38. On a side note, it is of interest that Cav1.3 physically interacts with the GABAB receptor, with activation of the latter leading to an increase in the L-type calcium channel currents39. Given the key role of this receptor in the mechanism of addiction, it is possible that any sex-related variation in CACNA1D expression may be reflected in corresponding differences in susceptibility to nicotine addiction. In support of this idea, several lines of evidence indicate that females have a higher susceptibility to nicotine dependence, including faster progression to dependence, shorter and less frequent abstinence periods, greater difficulty in quitting, and poorer response to smoking cessation treatments38,40,41.
Other genes listed in Table 4, for which there is evidence of varying strength of links with different types of CVD, include TAGLN (transgelin)42, SYNE1 (spectrin repeat containing, nuclear envelope 1)43,44, IL32 (interleukin 32)45, PLIN5 (perilipin 5)46, HNRNPUL1 (heterogeneous nuclear ribonucleoprotein U-like 1)47 and miR-30d48. Finally, Table 4 shows a number of genes (C14orf43, C1orf21, ST3GAL1 and ZNF19) which have no known function related to CVD pathogenesis but have been reported to be differentially expressed in patients with different types of CVD.
Molecular basis of sex-specific effects of tobacco smoke exposure
The previous discussion shows that numerous genes among those found to be differentially altered in female smokers interact closely in the context of thrombin signaling in platelets and vascular/endothelial cells. While the molecular basis for such differential female susceptibility to tobacco smoke is not currently understood, there is strong evidence that thrombin signaling and hemostasis are subject to hormonal influences and it is possible that such influences may also modify the responses to tobacco smoking. Megakaryocytes and platelets express the estrogen and androgen receptors, and the coagulation cascade is known to be influenced by variations in the levels of female sex steroids49,50,51, while it has been suggested that females have greater baseline platelet reactivity which may be attenuated by estrogens51,52. Furthermore, it has been reported that platelets from women are more responsive than those of men to thrombin agonists51, and that females with atherosclerosis show higher PAR1-mediated platelet activation53. Sex hormones contribute to the modulation of additional genes related to CVD and may therefore also modify the impact of smoking. For example, the expression of HMOX1 under conditions of oxidative stress is modulated by estrogen receptor alpha54, while estrogen receptor beta modulates the expression of HDAC4 under the influence of hypertrophic factors in rat cardiomyocytes55.
Summarising the preceding discussion, a large number of genes which are known or suspected to play a role in the mechanism of CVD, and to modulate corresponding disease risks, have their expression or methylation in WBCs modified by smoking significantly more in females than in males. It is not known whether similar changes occur in other tissues of smokers. However we have recently reported that tobacco smoking causes similar changes in expression and CpG methylation in the Ah receptor repressor gene in WBCs and lung cells10. We have also shown that the genes which are differentially expressed or methylated in WBCs of smokers are closely related to many smoking-induced diseases regardless of their target tissue, implying that changes observed in blood cells may reflect more global effects. This conclusion is further supported by the concordance of the results presented in the present study with the conclusions of epidemiological studies which consistently point to a higher female susceptibility to tobacco-induced CVD. Furthermore, we also report analogous, although more limited, findings supporting higher female susceptibility to tobacco-induced COPD and nicotine addiction. Although our evaluation of the disease-relevance of the sex-specific DEGs and DMGs presented above did not include cancer, many of the signaling pathways discussed are also highly relevant to carcinogenesis56. It is noted that the current evidence regarding sex susceptibility to tobacco-related cancer is mixed7,8,9,57.
In conclusion, the results presented here underline the utility of blood-based omics profiling for identifying health hazards associated with environmental exposures and suggest a potential for use of such data in identification of susceptible sub-groups.
Materials and Methods
The present report is based on data from the Envirogenomarkers project (www.envirogenomarkers.net). Envirogenomarkers is a prospective case-control study nested within the European Prospective Investigation into Cancer and Nutrition study (EPIC-ITALY) and the Northern Sweden Health and Disease Study (NSHDS)58,59, in which subjects asymptomatic at the time of enrolment provided a blood sample and information on dietary habits, lifestyle, health history etc. The EnviroGenomarkers project and its associated studies and experimental protocols were approved by the Regional Ethical Review Board of the Umea Division of Medical Research, for the Swedish cohort, and the Florence Health Unit Local Ethical Committee, for the Italian cohort, and all participants gave written informed consent. All methods were carried out in accordance with the approved guidelines.
Owing to the Envirogenomarkers project’s design, some of the participating subjects had been selected because they went on to develop breast cancer or B-cell lymphoma 2–16 years after recruitment; however, for the purposes of the present study they were all treated as apparently healthy. We have previously shown that inclusion of such subjects did not significantly affect the list of smoking-modified transcriptomic and epigenetic features10. Anthropometric measurements and lifestyle parameters had been collected through questionnaires at recruitment (1993–1998 for EPIC-ITALY; 1990–2006 for NSHDS). Information on smoking was obtained through questionnaires and included data on duration, number of cigarettes smoked per day and (only in Italy) pack-years. In addition, for a fraction of the subjects data on plasma cotinine concentration were also available. Details of the subjects involved in the present study are shown in Table 1. Sample collection, storage and processing procedures have been described elsewhere58,59. Based on the conclusions of a previously published pilot study60, subjects were included in the study only if the processing of their blood samples and placement of their buffy coats in cold storage had been completed within 2 hours of collection so as to minimize effects on the transcriptomic profile.
RNA and DNA extraction from buffy coats, genome-wide analysis of gene expression (Agilent 4 × 44 K human whole genome microarray platform) and CpG methylation (Illumina Infinium HumanMethylation450 platform), and the corresponding data quality assessment and preprocessing, were conducted as described previously60. Cotinine levels (AUC) in plasma were measured by reverse-phase chromatography on an Acquity UPLC system (Waters Corporation, Milford, MA, USA) with an Acquity HSS T3 C18 10 mm × 2.1 mm, 1.8 μm, column (Waters) and a binary gradient elution comprising water 0.1% formic acid and acetonitrile 0.1% formic acid for 19 min. Online analysis of the eluent was performed using a quadrupole time-of-flight mass spectrometer (QTOF-Ultima-MS; Waters) in the positive ion mode. Data were processed using Databridge and XCMS software (Waters). We confirmed the identity of cotinine with authentic standard and accurate mass.
Data analysis and the derivation of lists of expression probes and CpG sites which differed significantly between current and never smokers have been described in detail previously10. Briefly, linear mixed models were run, using M values for DNA methylation or log2 intensities of mRNA expression as dependent variables, plus date of isolation, labeling, and hybridization for RNA expression, or date of analysis for methylation, as random variables. All analyses were additionally adjusted for sex, age, BMI and cohort. Owing to the design of the EnviroGenomarkers project, future disease (breast cancer, B-cell lymphoma) and case-control status were also included as fixed variables. In the case of DNA methylation data, the models were also adjusted for blood cell composition estimated with a published algorithm61. Multiple testing was accounted for with high stringency by using Bonferroni or FDR Benjamini-Hochberg correction. This procedure led to the identification of lists, recently published10, of 1,273 CpG sites (FDR < 0.05) and 350 transcripts (FDR < 0.10) which differ significantly in current relative to never smokers (Suppl. Tables S1 and S2).
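Two of the building blocks mentioned above lend themselves to a short illustration. The Python sketch below shows the standard logit-type conversion of methylation β values to M values and a plain Benjamini-Hochberg adjustment of a list of p-values. It does not reproduce the linear mixed models themselves, the function names are illustrative, and the p-values are hypothetical.

```python
import math


def beta_to_m(beta):
    """Convert a methylation beta value (0 < beta < 1) to an M value."""
    return math.log2(beta / (1.0 - beta))


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for position in range(m - 1, -1, -1):
        i = order[position]
        candidate = p_values[i] * m / (position + 1)
        running_min = min(running_min, candidate)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four features.
raw = [0.01, 0.04, 0.03, 0.20]
fdr = benjamini_hochberg(raw)
print(beta_to_m(0.5), fdr)
```

Features whose adjusted value falls below the chosen threshold (0.05 for CpG sites and 0.10 for transcripts in the present study) would be retained.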
We looked for sex-related differences among the above mentioned expression and DNA methylation features by employing rank-based, non-parametric, statistical testing methodology based on the evaluation of the differences in the significance ranks of the probes in the two classes (Fig. 1). Towards this end we first conducted current-versus-never smoker comparisons, using the same statistical models as previously, for all transcriptomic and epigenetic features (29,667 expression probes and 410,987 methylation probes, respectively) separately in males and females, ranking all features by the corresponding significance (p-value). Subsequently we extracted from these lists the rank values of the features which we had previously found to be significant in the mixed population (350 expression probes significant at FDR < 0.10 and 1,273 CpGs significant at Bonferroni-corrected p < 0.05) and calculated their differences between the two sexes. We thus derived a distribution of differences which conforms to the normality assumption (steep unimodal) and was used as the basis to search for differences that violate the null hypothesis that only non-sex-related differences are observed, i.e. that signal rankings in sex-stratified analyses are equal. The statistic was calculated from the corresponding complementary cumulative distribution function (Survival Function)62 which describes the probability that a variate takes a value greater than a particular number, taking as a significance threshold the value of p < 0.05. The workflow of this analysis is described diagrammatically in Fig. 1. The tool for the implementation of the statistical evaluation of sex-specific rank differences described above is publicly available, under the name RIPOSTE (“Rank DrIven POpulation STatistical Evaluation”), on the Galaxy platform at http://mebioinfo.ekt.gr/galaxy, where instructions for its use are also given.
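The survival-function step can be sketched in Python as follows. This is not the RIPOSTE implementation, whose internals are not described here. The sketch assumes that a normal distribution is fitted to the observed rank differences by their sample mean and standard deviation, and that the upper-tail probability is evaluated at the absolute deviation of each difference from that mean. The rank differences are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def rank_difference_p_values(differences):
    """Upper-tail probability of each rank difference under a fitted normal.

    `differences` maps feature id -> rank difference between the sexes.
    Assumption: the normal is fitted to the observed differences, and the
    survival function (1 - CDF) is evaluated at the absolute deviation
    from the fitted mean.
    """
    values = list(differences.values())
    fitted = NormalDist(mu=mean(values), sigma=stdev(values))
    return {
        feature: 1.0 - fitted.cdf(fitted.mean + abs(d - fitted.mean))
        for feature, d in differences.items()
    }


# Hypothetical rank differences; feature "e" is a clear outlier.
differences = {"a": -10, "b": 0, "c": 5, "d": 5, "e": 300}
p_values = rank_difference_p_values(differences)
flagged = [f for f, p in p_values.items() if p < 0.05]
print(flagged)
```

With real data the distribution would be fitted to many more differences (350 or 1,273 here), so a single extreme value would inflate the fitted standard deviation far less than in this toy example.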
In order to check for any bias introduced by the difference in male and female population sizes on the selection of sex-specific signals, we implemented a permutation probabilistic approach (also illustrated in Fig. 1, right) by randomly resampling 10 times the full population so as to extract subpopulations even with respect to sex and smoking status, subsequently applying to all subpopulations thus selected the same analytical workflow as described above. In each subpopulation we used all available male subjects (134) and an equal number of females selected randomly while maintaining unaltered the proportions of females from the different cohorts and with different smoking status. For each resampling we counted the number of significant (p < 0.05 for rank difference) signals that came from among those obtained with the full male and female populations (shown in Tables 2 and 3).
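The subsampling of females can be illustrated with the following Python sketch, which draws a sample of a target size while preserving the proportions of cohort and smoking-status strata. The record keys ("id", "cohort", "smoking"), the proportional-allocation rule and the toy population are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def balanced_female_sample(females, n_target, rng):
    """Draw about n_target females, preserving cohort x smoking proportions.

    `females` is a list of dicts with hypothetical keys
    "id", "cohort" and "smoking".
    """
    strata = defaultdict(list)
    for subject in females:
        strata[(subject["cohort"], subject["smoking"])].append(subject)

    sample = []
    for members in strata.values():
        # Proportional allocation; rounding may shift the total slightly.
        k = round(n_target * len(members) / len(females))
        sample.extend(rng.sample(members, min(k, len(members))))
    return sample


# Toy population of 40 females in four strata (hypothetical sizes).
sizes = {("A", "current"): 8, ("A", "never"): 12,
         ("B", "current"): 4, ("B", "never"): 16}
females = [
    {"id": f"{cohort}-{smoking}-{i}", "cohort": cohort, "smoking": smoking}
    for (cohort, smoking), n in sizes.items()
    for i in range(n)
]

subsample = balanced_female_sample(females, 20, random.Random(0))
print(len(subsample))
```

Repeating the draw with different seeds and rerunning the rank-difference analysis on each subsample gives the resampling rounds described above.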
How to cite this article: Chatziioannou, A. et al. Blood-based omic profiling supports female susceptibility to tobacco smoke-induced cardiovascular diseases. Sci. Rep. 7, 42870; doi: 10.1038/srep42870 (2017).
Research support by the European Union (Grants number 226756 and 308610). Epigenomics sample analyses were conducted under contract by CBM (Cluster in Biomedicine) S.c.r.l., Trieste, Italy, an Illumina Certified Service Provider. The authors wish to thank Margarita Bekyrou and Stella Kaila for their technical contributions.