Introduction

Smoking is a social epidemic1 that, despite its adverse effects on health and the economy, remains one of the leading causes of preventable disease and death globally2. In 2019, tobacco use accounted for 15.4% of all deaths worldwide2. Smoking is one of the main factors causing and aggravating diseases such as chronic obstructive pulmonary disease, neurological diseases, cardiovascular diseases3, and various cancers4. An estimated 1.2 million deaths per year worldwide are due to secondhand smoke (SHS), most of which occur in children under 10 years of age5. According to the latest reports, from 2016, the prevalence of daily smoking in Iran is 9.7%, and it is significantly higher among men (19.6%) than among women (0.9%)6. One possible explanation is that men may be more prone to smoking because of a stronger predisposition to risky behaviours and because of job-related difficulties and family and social duties.

Smokers can be classified into three categories based on smoking intensity: light, moderate, and heavy7. Previous studies report important differences between these groups. For example, heavy smokers are more exposed to the negative effects of smoking, such as low quality of life, and find it very difficult to quit8. In addition, data from large population studies show that light smokers are 2 to 5 times more likely to experience respiratory symptoms and heart disease than nonsmokers9. Identifying the factors that influence smoking intensity, such as socio-demographic differences, type of smoking habit, age at smoking onset, and ability to quit, can provide the information needed to adopt and implement tobacco control policies1. Age at smoking onset can significantly predict future smoking patterns and related health consequences10. In a study by Nash et al., age at smoking onset was strongly associated with death after the age of 70: current smokers who started at a younger age were at higher risk of death than those who started later11. Based on a considerable body of studies12,13, it is commonly believed that an early age at smoking onset predicts heavy smoking later in life. The relevant research, however, has either not tested this association explicitly or has been hampered by methodological flaws14. Many studies, for example, have used binary or grouped variables to represent age at onset12,13,15, which loses potentially valuable information about the onset path over time and makes it impossible to assess a specific period of life, such as adolescence, when people are particularly vulnerable14. A further limitation is the lack of an appropriate model to detect this relationship.

Classical statistical models such as linear regression are usually used to describe the relationship between risk factors and an outcome. In reality, however, we frequently need to model phenomena more complex than linear relationships can depict. Generalized additive models (GAMs) can be considered intermediate between classical models and machine learning models: they can fit complex, nonlinear relationships while remaining easy to interpret. In fact, GAMs allow nonlinear relationships to be modelled alongside linear ones with high flexibility16.

In the present study, the relationship between one or more features, such as age at smoking onset, and the outcome (smoking intensity) is nonlinear and of unknown form, and conventional statistical models cannot identify this type of relationship. Therefore, given the relatively high prevalence of smoking among Iranian men6 and its harmful role in causing various diseases, we used GAMs to evaluate the factors affecting smoking intensity, especially age at smoking onset, among Iranian adult male smokers over 18 years of age. The findings can be shared with health policymakers so that they can plan and implement initiatives to reduce smoking intensity by focusing on these factors.

Materials and methods

Study setting, population, and sampling method

In this cross-sectional study (approved by the Ethics Committee of the Hamadan University of Medical Sciences; No. IR.UMSHA.REC.1399.105), we analysed tobacco use data from a national cross-sectional study entitled “survey of risk factors for non-communicable diseases (NCDs) in 2016”, conducted by the NCDs research center of Iran, in order to assess the relationship between the age at which people start smoking and the intensity of smoking. The survey of NCD risk factors is conducted under the direction of the World Health Organization (WHO) as part of a surveillance system for NCD risk factors. Its overarching purpose is to build the infrastructure needed for global NCD risk factor management, with a focus on developing countries, and to provide global sources of information on the trends and distribution of risk factors.

The study target group was adults over 18 years old, and sampling covered all provinces of Iran except Qom province. Samples were selected using a multi-stage cluster sampling method.

Data collection

The WHO STEPwise approach to risk factor surveillance is known as STEPs17. To collect data on tobacco use, researchers used the WHO's standard STEPs questionnaire, which was self-reported. The questionnaire was translated from English to Persian by two experts and then back-translated from Persian to English by two other experts to confirm that the translation conveyed the intended meaning. The validity of the questionnaire and its items was assessed using the opinions of experts in the field. Reliability was evaluated using Cronbach's alpha coefficient, which was 0.80. The study protocol contains further information about this survey17. All methods were performed in accordance with the relevant guidelines. After applying the exclusion criterion, described in the next section, 913 people were included in the analysis. In this cross-sectional study, 98.9% of participants gave complete answers to the smoking status questionnaire.

Predictor and outcome variables

Predictor variables covered demographic characteristics, economic status, and smoking behaviour. Demographic variables included age, residence (urban/rural), marital status (married/other: single, divorced, widowed), and level of education (illiterate/lower than diploma/diploma and higher). Economic predictors included basic health insurance status (yes/no), monthly household income level (more than $175 vs $175 or less, based on the basic salary set by the Ministry of Labor of Iran), and employment status (employee/worker/self-employed/retired/unemployed/other, including student, soldier, and unpaid work). Following the Merriam-Webster Dictionary, a worker is “a person who does a particular job to earn money”, whereas an employee is “a person who works for another person or for a company for wages or a salary”18.

Participants were asked several questions to assess smoking behaviour. They were divided into three categories (never smoker, former smoker, and current smoker) based on their answers to the questions "Have you ever smoked?" and "Do you smoke now?". Never smokers were participants who answered "no" to both questions. Participants who answered "yes" to the first question and "no" to the second were classified as former smokers. Participants who answered "yes" to both questions were considered current smokers. In the present study, never smokers and former smokers who had quit (that is, people who answered "yes" to the question "Have you quit daily smoking?") were excluded, and only current smokers were studied. Applying this exclusion criterion reduced the sample size to 913 people. As mentioned in the introduction, one feature whose relationship to the outcome (smoking intensity) is difficult to establish is the age at smoking onset. This feature was measured using the question “At what age did you start smoking?” Another essential indicator of nicotine dependency, in addition to the age of smoking initiation, is trying to quit smoking19, which we measured using the question "Have you tried to quit smoking in the previous 12 months?" Because smokers are the group most vulnerable to smoking-related health risks20, the question "Has your doctor or healthcare professional advised you to quit smoking in the last 12 months?" was also considered as a possible predictor of the outcome. In addition to these smoking behaviour characteristics, we assessed exposure to secondhand cigarette smoke using the question "Has anyone in your house or workplace smoked in your presence in the last 30 days?"
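The classification rule for smoking status can be written out explicitly. The following is a minimal sketch, not the survey's actual processing code; the function names and the boolean encoding of the two answers are assumptions made for illustration, and the treatment of inconsistent answers is not described in the text.

```python
def classify_smoking_status(ever_smoked: bool, smokes_now: bool) -> str:
    """Map the answers to "Have you ever smoked" and "Do you smoke now?"
    to one of the three smoking status categories."""
    if ever_smoked and smokes_now:
        return "current smoker"
    if ever_smoked and not smokes_now:
        return "former smoker"
    if not ever_smoked and not smokes_now:
        return "never smoker"
    # "no" to ever smoking but "yes" to smoking now is inconsistent.
    # The text does not say how such answers were handled, so this
    # sketch simply rejects them (an assumption).
    raise ValueError("inconsistent answers: never smoked but smokes now")


def is_included(ever_smoked: bool, smokes_now: bool) -> bool:
    """Only current smokers are retained for the analysis."""
    return classify_smoking_status(ever_smoked, smokes_now) == "current smoker"
```

Under this encoding, a participant answering "yes" to both questions is retained, and every other consistent combination is excluded.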

The study outcome, smoking intensity, was measured using the question "How many cigarettes do you currently smoke each day?" Answers were classified into three categories: fewer than 10 cigarettes/day as a light smoker, 10–19 cigarettes/day as a moderate smoker, and 20 or more cigarettes/day as a heavy smoker7.
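The three-level outcome can be derived from the reported number of cigarettes per day as in the sketch below. The function name is hypothetical, and treating any value below 20 that is not below 10 as "moderate" (so that a non-integer answer such as 19.5 is also covered) is an assumption.

```python
def classify_intensity(cigarettes_per_day: float) -> str:
    """Return the smoking intensity category for a daily cigarette count."""
    if cigarettes_per_day < 0:
        raise ValueError("cigarettes_per_day must be non-negative")
    if cigarettes_per_day < 10:
        return "light"      # fewer than 10 cigarettes/day
    if cigarettes_per_day < 20:
        return "moderate"   # 10-19 cigarettes/day
    return "heavy"          # 20 or more cigarettes/day
```

Because the categories are ordered (light < moderate < heavy), the outcome is an ordinal variable, which is why an ordered categorical model is used later.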

Statistical methods and software

After deleting records with missing data, we described the sample using appropriate descriptive statistics. We then used one-way analysis of variance to compare the mean age and the mean age at smoking onset across the three groups of smokers, and the chi-square test to assess the association between categorical/discrete variables and the response variable. After the univariate analysis, we used GAM regression to adjust for the potential confounding effect of each explanatory variable. As potential interactions, we entered into the GAM the interaction between level of education and employment status and the interaction between employment status and exposure to secondhand smoke at work. Variables were entered into the model as follows: based on the literature review, the factors with the largest impact on the outcome (marital status21, level of education, residence22, employment status20,23, and monthly household income level22) were kept fixed in the model, and additional variables were selected using the backward method. After variable selection, the effect of each feature on the outcome was expressed as an odds ratio (OR). The nonlinear relationships of age at smoking onset and of age with smoking intensity were presented graphically.
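For a model on the logit scale, an odds ratio is obtained by exponentiating the estimated coefficient. The sketch below illustrates this conversion; the coefficient and standard error are hypothetical values, not results from this study, and the Wald-type 95% interval is an assumption added for illustration (the text does not state how intervals were computed).

```python
import math


def odds_ratio(coefficient: float) -> float:
    """Convert a logit-scale coefficient to an odds ratio."""
    return math.exp(coefficient)


def wald_interval(coefficient: float, std_error: float, z: float = 1.96):
    """Approximate 95% interval for the odds ratio (Wald-type, an assumption)."""
    lower = math.exp(coefficient - z * std_error)
    upper = math.exp(coefficient + z * std_error)
    return lower, upper


# Hypothetical coefficient and standard error, for illustration only.
beta, se = -0.30, 0.12
print(odds_ratio(beta), wald_interval(beta, se))
```

A negative coefficient gives an OR below one (lower odds of more intense smoking), a positive coefficient gives an OR above one, and a coefficient of zero gives an OR of exactly one.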

Generalized additive models (GAMs)

GAM24 is an extension of the generalized linear model that does not require the relationship between the covariates and the expected value of the response variable to be linear. The general structure of a GAM can be written as follows:

$$g\left(E\left(y\right)\right)={X}_{i}\theta +\sum_{j=1}^{p}{f}_{j}({x}_{ji}) \tag{1}$$

where g is the link function, y is the response variable, θ is the vector of fixed parameters, \({X}_{i}\) is a row of the design matrix, and the \({f}_{j}\) are smooth functions of the covariates \({x}_{j}\). The most commonly used response distributions available in GAM are the normal, gamma, binomial, and Poisson distributions, which use the identity, inverse, logit, and log link functions, respectively. The main issue in GAM is estimating the unknown functions \({f}_{j}\). These functions, which describe the relationship between the explanatory variables and the response variable, are estimated from the data using a variety of smoothing methods25. Smoothing is the process of fitting a smooth curve to data. It can be accomplished in a variety of ways, with splines being one of the most common and powerful. A spline is a curve made up of polynomial sections that are joined smoothly at points called knots16. Thin plate regression splines, cubic regression splines, and P-splines are the most prevalent splines. The effective degrees of freedom (edf) measure the degree of curvature of a smooth curve. If edf = 1, the estimated relationship is linear; a larger edf indicates a more complex relationship between the explanatory variable and the response variable26. In GAMs, nonparametric terms are represented using penalized regression splines with smoothing parameters selected by one of the criteria GCV/UBRE/AIC/REML, or by regression splines with fixed degrees of freedom27. Refer to reference24 for more information on this topic and other estimation methods.
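The four link functions named above can be illustrated with a short sketch. It pairs each distribution with its link g and the inverse of that link, which maps the linear predictor back to the mean. The dictionary layout and the helper name `round_trip` are assumptions made for illustration and are not part of any package.

```python
import math

# family -> (link name, link g(mu), inverse link g^{-1}(eta))
LINKS = {
    "normal": ("identity", lambda mu: mu, lambda eta: eta),
    "gamma": ("inverse", lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
    "binomial": (
        "logit",
        lambda mu: math.log(mu / (1.0 - mu)),
        lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    ),
    "poisson": ("log", math.log, math.exp),
}


def round_trip(family: str, mu: float) -> float:
    """Apply the link and then its inverse; the result should equal mu."""
    _, link, inverse = LINKS[family]
    return inverse(link(mu))
```

For a mean of 0.3, for example, each family's link followed by its inverse recovers 0.3 up to floating-point error, which is the defining property of a link and its inverse.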

Software

SPSS software version 24 was used to describe the data. To fit the GAM, we used the ocat family from the mgcv package in R version 4.0.3. For ordered categorical data such as those in the current study, this family models the expected value of a latent variable that follows a logistic distribution using a linear predictor with the identity link function. The probability of belonging to each category of the ordered categorical variable is the probability that this latent variable lies between specific cut-points27.
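The link between the latent variable and the category probabilities can be sketched as follows. This is an illustration of the general ordered logistic construction, not the mgcv implementation. It assumes the latent variable equals the linear predictor plus standard logistic noise, so that the probability of category k is F(c_k − η) − F(c_{k−1} − η), where F is the logistic distribution function. The cut-point values in the usage note are hypothetical.

```python
import math


def logistic_cdf(x: float) -> float:
    """Numerically stable logistic distribution function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def category_probabilities(eta: float, cut_points: list) -> list:
    """Probabilities of each ordered category for linear predictor eta.

    cut_points must be sorted in increasing order; K - 1 cut-points
    define K categories (here: light, moderate, heavy).
    """
    if list(cut_points) != sorted(cut_points):
        raise ValueError("cut_points must be in increasing order")
    bounds = [-math.inf, *cut_points, math.inf]
    cdf = [logistic_cdf(b - eta) for b in bounds]
    # Probability that the latent variable falls between adjacent bounds.
    return [upper - lower for lower, upper in zip(cdf, cdf[1:])]
```

With hypothetical cut-points of -1 and 1, `category_probabilities(0.0, [-1.0, 1.0])` returns three probabilities that sum to one, and increasing the linear predictor shifts probability mass towards the highest category.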

Ethical approval and consent to participate

The study was approved by the Research Ethics Committee of Hamadan University of Medical Sciences. Written informed consent was obtained from all participants.

Results

Among the 913 included male subjects, 246 (26.9%) were light smokers, 190 (20.8%) were moderate smokers, and 477 (52.2%) were heavy smokers. The mean (standard deviation) age of all participants was 47.38 (13.48) years. Table 1 shows demographic features of the study participants by smoking intensity (light, moderate, and heavy). The mean age of heavy smokers was higher than that of the light and moderate groups (P-value = 0.008), whereas the mean age at smoking onset was lower in the heavy smokers group than in the other two groups (P-value < 0.001). Although a higher proportion of all participants (53.1%) had income levels over $175, most heavy smokers (50.5%) were in the lower income group (P-value = 0.048). Among all participating smokers, 75.2% had made no attempt to quit smoking in the past 12 months, and this percentage was higher for heavy smokers (79.9%) than for the other two groups (P-value = 0.001).

Table 1 Demographic features and status.

To assess the effect of age at smoking onset on smoking intensity while adjusting for the other features under consideration, we fitted the GAM. The results are reported in Table 2 in two parts, parametric and non-parametric. The parametric part reports the estimated coefficients of the variables, their significance, and the corresponding ORs (heavy smokers vs light and moderate smokers). Income level and trying to quit smoking significantly predicted smoking intensity. Neither the interaction between level of education and employment status nor the interaction between employment status and exposure to secondhand cigarette smoke at the workplace was significant. The parametric part of the GAM showed that the odds of heavier smoking (heavy vs moderate and light) among smokers with lower than diploma and with diploma or higher were 0.932 and 0.809 times those of illiterate smokers, respectively. In other words, a higher education level was a protective factor against higher consumption. Single subjects had higher odds of smoking more intensely than married subjects (OR = 1.409). Furthermore, smokers exposed to secondhand smoke at home had 1.364 times the odds of smoking more intensely compared with non-exposed smokers. The odds of more intense smoking were approximately the same in urban and rural areas (OR = 0.916). Compared with employees, the odds of more intense smoking were higher among the unemployed (OR = 1.364), retirees (OR = 1.217), the self-employed (OR = 1.192), and workers (OR = 1.182). In addition, high-income smokers were less likely to smoke intensely than low-income smokers (OR = 0.742). Smokers who had tried to quit in the past 12 months also had lower odds of heavy smoking (OR = 0.629).

Table 2 Association of intensity of cigarette smoking (heavy vs light and moderate smokers) and independent variables measured by the generalized additive model.

In the non-parametric part, presented in the second part of Table 2, the reported edf values for age at smoking onset and age are 1.913 and 2.974, respectively. Because both values are greater than one (approximately degree two for age at smoking onset and degree three for age), they indicate nonlinear relationships with the outcome variable (intensity of smoking), and these nonlinear relationships are statistically significant (P-value < 0.001).

The plots of the predicted smooth functions of these two factors, with 95% Bayesian confidence intervals, in Fig. 1 also show their nonlinear relationship with smoking intensity as the outcome.

Figure 1

Estimated smooth functions of the relationship between (left) age and smoking intensity and (right) age at smoking onset and smoking intensity. The numbers in brackets in the y-axis titles are the edf of the smooth curves. Results are presented on the linear predictor scale. The ‘rug plot’ at the bottom of each graph indicates the covariate values. The points on the graph are residuals. The grey region represents the 95% Bayesian confidence interval.

Figure 1 shows only the functional form of the relationship between the features and the outcome (smoking intensity). Fig. 2 presents the effect of age at smoking onset on the estimated probabilities of the three outcome groups. According to the left panel, subjects who start smoking at a younger age are more likely to become heavy smokers. Conversely, subjects who start smoking at an older age are more likely to become light smokers than moderate or heavy smokers. According to the right panel, subjects between the ages of 40 and 70 are more likely to smoke more cigarettes per day.

Figure 2

Plot of response probabilities in the three groups with light, moderate, and heavy consumption vs. age at smoking onset (left) and age (right). The three probabilities sum to one.

Discussion

In this study, we assessed the factors affecting the intensity of smoking. One factor whose association with smoking intensity is important is the age at smoking onset. However, owing to methodological limitations of earlier research, correctly determining the association of this feature with smoking intensity remains a challenge14,28. Moreover, based on a review of the literature, no study has assessed the effect of age at smoking onset on the intensity of smoking among Iranian adults. We therefore used GAM as a flexible modelling tool to detect nonlinear and complex associations of this variable and other features with smoking intensity.

According to the results of this study, more than half of the smokers (52%) smoked 20 or more cigarettes a day. This result is inconsistent with the study of Okuyemi et al., conducted among African American smokers, in which a significant proportion of smokers (about 40%) were light smokers7. Given that starting to smoke earlier makes people more addicted and leads them to smoke more cigarettes, prevention should begin at an early age to prevent smoking-related diseases and mortality. According to the multivariate analysis, there was an inverse relationship between education level and smoking intensity, implying that education had a protective effect; that is, participants with lower levels of education were more likely to become heavy smokers. Rogers' theory, published in 1970, can explain this finding29. According to this theory, individuals and groups with more health benefits adopt new health ideas and practices sooner, while disadvantaged persons accept them later30. Compared with employees, the odds of more intense smoking were higher among the unemployed, retirees, the self-employed, and workers. The positive association of retirement and unemployment with smoking is consistent with existing studies on occupational status and health, which find that job loss increases unhealthy behaviours23. For example, the Ayyagari study reported that retirement increases the probability of more intense smoking23. As mentioned, self-employed individuals had higher consumption intensity than employees. According to a study by Wang et al., managers smoke more than people in other occupations20. The high rate of smoking among managers and self-employed individuals, unlike employees, may be attributable to the absence of supervision by superiors.
On the other hand, the lower smoking rate among employees may be explained by the fact that most organizations avoid employing smokers for reasons such as higher health care costs, more absenteeism, and loss of productivity31. In the present study, most heavy smokers (50.5%) were in the lower income group. According to the study by Nketiah-Amponsah et al., rich Ghanaian males were less likely to smoke, while older men living in poorer areas were more likely to smoke22. By contrast, household income was not a substantial predictor of smoking in a study by Villanti et al.32. This discrepancy could be attributable to differences in the target populations under investigation. For example, the target population of that study included, in addition to men, a subgroup of women who, on average, have lower income levels than men. According to the GAM, exposure to secondhand smoke at home had a strong relationship with smoking intensity, which is consistent with the results of the study by Itanyi et al.33.

In the current study, the mean age at smoking onset in the heavy smokers group (20.07 ± 6.63) was lower than in the other two groups, because most heavy smokers began smoking near the end of adolescence or in early youth. The transition from adolescence to young adulthood is a vital period of life during which young people graduate from high school and leave home to attend college or find a suitable job. These changes typically lead to reduced parental control and changes in social networks, and they increase the vulnerability of this age group to substance use, including smoking34. Although smoking is believed to begin in adolescence, studies have shown that the onset of smoking also occurs later in life. In the present study, only 9 subjects, less than 1% of the participants, started smoking after the age of 50. This suggests that a person who has not started smoking in adolescence or early adulthood is very unlikely to start later in life. People who begin smoking at a younger age are more dependent on nicotine and their consumption increases, whereas those who begin at an older age smoke less intensely (Fig. 2, left panel). This result is consistent with the study of Hamzeh et al.1. Using the GAM, in addition to age at smoking onset, we also assessed the nonlinear effect of age on smoking intensity (Fig. 2, right panel). People between the ages of 40 and 70 were more likely to smoke more cigarettes on a regular basis. According to a Ghanaian study, older males are more likely to smoke than their younger counterparts, and they also consume more22. The higher probability of heavier smoking in this age group may be traced to their younger years.
Most likely, these individuals began smoking during adolescence and youth, and their consumption gradually escalated to the point where smoking became a habit and quitting is quite difficult in this age range22. In addition, adult men often have so many financial responsibilities that some of them are unable to meet their basic needs, which causes considerable stress and anxiety; these men may turn to heavy smoking to escape that stress and anxiety35. Since 12% of deaths worldwide among adults over 30 years are attributed to smoking, comprehensive measures are needed to prevent nonsmokers from starting to smoke in order to reduce the burden of death caused by smoking11.

Limitations

Because most epidemiological studies use self-reported data without biomedical markers of smoking, they are exposed to potential biases such as recall bias36, and our study was no exception. Another limitation is that, although the sample is representative of the entire country, it does not include the female population because of the low response rate. We suggest that future studies also investigate age at smoking onset and other factors in women and compare them with the male population.

Conclusions

According to the results of this study, poorer socioeconomic status and starting smoking at a younger age are associated with heavy smoking. In addition, men aged 40 to 70 years had higher consumption intensity than other age groups. Therefore, these factors can predict more intensive consumption and its adverse consequences in adulthood.