Determining correlates of the average number of cigarette smoking among college students using count regression models

College students, who make up a large share of young adults, are vulnerable to several risky behaviors, including smoking and drug abuse. This study aimed to use and compare count regression models to identify correlates of cigarette smoking among college students. This was a cross-sectional study conducted on students of Hamadan University of Medical Sciences. The Poisson, negative binomial, generalized Poisson, and exponentiated-exponential geometric regression models and their zero-inflated counterparts were fitted and compared using the Vuong test (α = 0.05). A total of 1258 students participated in this study. The majority of students were female (60.8%) and their average age was 23 years. Most of the students were non-smokers (84.6%). Negative binomial regression was selected as the most appropriate model for analyzing the data (comparable fit and simpler interpretation). The significant correlates of the number of cigarettes smoked per day included gender (male: incidence rate ratio (IRR) = 9.21), birth order (fourth: IRR = 1.99), experiencing a break-up (IRR = 2.11), extramarital sex (heterosexual: IRR = 2.59; homosexual: IRR = 3.13; vs. none), and drug abuse (IRR = 5.99). Our findings revealed that several high-risk behaviors were associated with the intensity of smoking, suggesting that these behaviors should be considered in smoking cessation intervention programs for college students.


Statistical models. Poisson regression.
The Poisson probability distribution is as follows:
$$P(Y = y) = \frac{e^{-\lambda}\lambda^{y}}{y!}, \quad y = 0, 1, 2, \ldots,$$
with $E(y) = \operatorname{Var}(y) = \lambda$, where λ stands for the mean (and variance) of the response variable. To investigate the effect of explanatory variables, the canonical link (here the logarithm of λ) is used to relate the mean parameter λ to the covariates: $\log(\lambda) = x'\beta$.
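As a minimal sketch of the Poisson model described above, the following Python snippet evaluates the probability function and the log link and checks numerically that the mean and variance coincide. The covariate vector and coefficients are hypothetical values chosen only for illustration; they are not estimates from this study.

```python
import math

def poisson_pmf(y, lam):
    """P(Y = y) for a Poisson distribution with mean (and variance) lam."""
    return math.exp(-lam) * lam ** y / math.factorial(y)

def poisson_mean(x, beta):
    """Log link: log(lambda) = x'beta, hence lambda = exp(x'beta)."""
    return math.exp(sum(xi * bi for xi, bi in zip(x, beta)))

# Hypothetical design row (intercept, one binary covariate) and coefficients.
x = [1.0, 1.0]
beta = [0.5, 0.7]
lam = poisson_mean(x, beta)

# Mean and variance computed from the pmf over a long truncated support.
support = range(100)
mean = sum(y * poisson_pmf(y, lam) for y in support)
var = sum((y - mean) ** 2 * poisson_pmf(y, lam) for y in support)
print(lam, mean, var)  # all three agree up to truncation and rounding error
```

Because the variance is tied to the mean, a Poisson model cannot accommodate counts whose variance clearly exceeds their mean.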
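Negative binomial regression. In its usual mean-dispersion (NB2) form, with mean μ, the probability mass function of the negative binomial distribution is as follows: $$P(Y = y) = \frac{\Gamma(y + 1/\alpha)}{\Gamma(y + 1)\,\Gamma(1/\alpha)} \left(\frac{1}{1 + \alpha\mu}\right)^{1/\alpha} \left(\frac{\alpha\mu}{1 + \alpha\mu}\right)^{y}, \quad y = 0, 1, 2, \ldots$$ The parameter α is called the dispersion (over-dispersion) parameter 34 .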
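A minimal numerical sketch of over-dispersion, assuming the common NB2 parameterization with mean `mu` and dispersion `alpha`, under which the variance is `mu + alpha * mu**2`. The parameter values are hypothetical and are not taken from the fitted model.

```python
import math

def nb_pmf(y, mu, alpha):
    """NB2 pmf with mean mu and dispersion alpha, computed on the log scale."""
    r = 1.0 / alpha
    log_p = (math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
             + r * math.log(r / (r + mu))      # (1 / (1 + alpha*mu)) ** (1/alpha)
             + y * math.log(mu / (r + mu)))    # (alpha*mu / (1 + alpha*mu)) ** y
    return math.exp(log_p)

mu, alpha = 4.0, 1.5  # hypothetical values for illustration only
support = range(2000)  # long enough that the truncated tail is negligible
total = sum(nb_pmf(y, mu, alpha) for y in support)
mean = sum(y * nb_pmf(y, mu, alpha) for y in support)
var = sum((y - mean) ** 2 * nb_pmf(y, mu, alpha) for y in support)
print(total, mean, var)  # var is close to mu + alpha * mu**2, well above the mean
```

As `alpha` approaches zero, the variance approaches the mean and the distribution approaches the Poisson.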
Generalized Poisson regression. The probability function of y with the generalized Poisson distribution is given as follows:
Exponentiated-exponential geometric regression. The exponentiated-exponential distribution is a unimodal and right-skewed distribution. The probability function of Y i with the EEG distribution is given as follows:
Zero-inflated models. Sometimes, the data contain more zeros than the above distributions can accommodate. The Poisson, NB, GP, and EEG distributions can each be extended to a mixture model, called a zero-inflated (ZI) model, to account for the excess zero counts. A ZI model uses a binary regression component (typically with a logit link) to predict whether a zero is an excess zero or arises from the count distribution. The general form of a ZI distribution is as follows: $$P(Y = y) = \begin{cases} \varphi + (1 - \varphi)\, f(0), & y = 0, \\ (1 - \varphi)\, f(y), & y = 1, 2, \ldots, \end{cases}$$ where f(y) stands for the count distribution and the parameter φ is the uncertainty parameter (mixing proportion).
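The mixture structure of a ZI model can be sketched as follows, using a Poisson count component for simplicity. The linear predictor passed to the logit link and the Poisson mean are hypothetical values, not estimates from this study.

```python
import math

def poisson_pmf(y, lam):
    """Count component f(y); any count distribution could be substituted."""
    return math.exp(-lam) * lam ** y / math.factorial(y)

def logistic(eta):
    """Inverse logit link mapping a linear predictor to the mixing proportion."""
    return 1.0 / (1.0 + math.exp(-eta))

def zi_pmf(y, phi, count_pmf):
    """Zero-inflated pmf: an excess zero with probability phi, else count_pmf."""
    base = (1.0 - phi) * count_pmf(y)
    return phi + base if y == 0 else base

phi = logistic(1.7)  # hypothetical linear predictor for the zero component
lam = 4.0            # hypothetical mean of the count component

def f(y):
    return poisson_pmf(y, lam)

p_zero = zi_pmf(0, phi, f)
total = sum(zi_pmf(y, phi, f) for y in range(100))
mean = sum(y * zi_pmf(y, phi, f) for y in range(100))
print(p_zero, total, mean)  # the mean equals (1 - phi) * lam
```

The probability of a zero exceeds φ because zeros can arise from either component of the mixture.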
Model fitting and selection. The average daily number of cigarettes (count) smoked by the students was modeled as a function of gender, age and other explanatory variables using Poisson regression, NB regression, generalized Poisson regression, EEG regression and their zero-inflated counterparts. The same explanatory variables were included in both parts (the logit and count components) of the zero-inflated models. In the EEGR model, we assumed that the shape parameter c is a nuisance parameter. We utilized a multivariate approach for model fitting; therefore, all the variables were considered in all the models. The Vuong test 40 (based on BIC and AIC) was used to conduct all the pairwise comparisons between different models to see which one provides a better fit to the data. This test produces a z-statistic, where a value > 1.96 supports the alternative hypothesis that the first model fits the data better and a value < −1.96 indicates that the second model provides a better fit to the data. Data were analyzed using PROC NLMIXED in SAS, version 9.4 (SAS Institute, Inc., Cary, NC).
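The decision rule for the Vuong test can be sketched in Python as follows. This is an illustration under stated assumptions, not the SAS implementation used in the study: the statistic is computed from pointwise log-likelihood contributions, the AIC and BIC corrections take the commonly used forms `k1 - k2` and `0.5 * (k1 - k2) * log(n)`, the standard deviation uses the 1/n convention, and the log-likelihood values are hypothetical.

```python
import math
import statistics

def vuong_z(ll1, ll2, k1=0, k2=0, correction="none"):
    """Vuong z-statistic from pointwise log-likelihoods of two non-nested models.

    k1 and k2 are the numbers of estimated parameters in models 1 and 2.
    """
    n = len(ll1)
    m = [a - b for a, b in zip(ll1, ll2)]  # pointwise log-likelihood ratios
    if correction == "aic":
        adj = k1 - k2
    elif correction == "bic":
        adj = 0.5 * (k1 - k2) * math.log(n)
    else:
        adj = 0.0
    omega = statistics.pstdev(m)  # 1/n convention; some software uses 1/(n - 1)
    return (sum(m) - adj) / (omega * math.sqrt(n))

def decide(z, crit=1.96):
    """Two-sided decision rule at the 5% level."""
    if z > crit:
        return "model 1 fits better"
    if z < -crit:
        return "model 2 fits better"
    return "no significant difference"

# Hypothetical pointwise log-likelihood contributions for four observations.
ll_model1 = [-1.0, -2.0, -1.5, -2.5]
ll_model2 = [-1.5, -2.0, -2.5, -3.0]

z_raw = vuong_z(ll_model1, ll_model2)
z_aic = vuong_z(ll_model1, ll_model2, k1=3, k2=2, correction="aic")
z_bic = vuong_z(ll_model1, ll_model2, k1=3, k2=2, correction="bic")
print(z_raw, decide(z_raw))
print(z_aic, decide(z_aic))
print(z_bic, decide(z_bic))
```

In this toy example the uncorrected statistic favors model 1, but once model 1 is penalized for its extra parameter the difference is no longer significant, which is the kind of outcome that motivates choosing the simpler model.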

Results
A total of 1258 students participated in this study. About 84% (1064 out of 1258 participants) of the students were nonsmokers and the average number of cigarettes smoked daily was 4.36 (standard deviation = 5.04). Table 1 shows the characteristics of the students who participated in this study. According to the results shown in Table 1, 60.8% of the students were female. The average age of the students was 22.54 years (SD = 3.35), with 43.9% aged 18-21 years. Most of the students who participated in the study were single/divorced (87%). About 35% of the students were first-born children, the majority of students lived in the dormitory (70.6%), and 29.4% of them were indigenous. Most of the students (88.9%) were BSc/MD students and were interested in their discipline (81.9%). The education level of most of the parents was a high school diploma (63.9% of mothers and 47.1% of fathers); 51.7% of the students had a boy/girlfriend and 33.3% of them had experienced a break-up; 7.9% (7.5%) of the students had had homosexual (heterosexual) intercourse; 79.3% of them were optimistic about the future; and 13.2% (6.1%) of them had had suicidal thoughts (attempts) during their lifetime. In addition, 9.8% of the students had a history of drug abuse (ever), 87.9% used social media, and 41.1% had psychiatric distress in terms of the GHQ-28. Summary statistics of the GHQ-28 subscales as well as the total score for the college students are provided in Table 2. As seen, the mean and standard deviation of the general health score of the students who participated in this study were 22.72 and 14.80, respectively. Table 3 shows the results of the Vuong test related to the fitting of different models, including Poisson regression, ZIP regression, NB regression, ZINB regression, GP regression, ZIGP regression, EEG regression and ZIEEG regression, to the daily number of cigarettes smoked by the college students. The results of the Vuong test were based on both BIC and AIC.
According to the Vuong test statistics (both BIC and AIC), the Poisson regression (ZIP regression) provided the worst fit to the data among all regression models. Moreover, neither Vuong test statistic showed statistically significant differences among the other models. So, overall, the NB model was selected as the final model for its simpler interpretation. Table 4 shows the regression coefficients of the NB regression model fitted to the daily number of cigarettes smoked by the college students. Exponentiated coefficients (incidence rate ratios (IRR)) and 95% confidence intervals were estimated for each model. According to the results shown in Table 4, the variables that were significantly associated with the daily number of cigarettes smoked by the students included gender (male) (IRR = 9.45; 95% CI: 6.25, 14.28; P < 0.0001), birth order (fourth) (IRR = 2.05; 95% CI: 1.09, 3.90; P = 0.027), experiencing a break-up (IRR = 1.58; 95% CI: 1.05, 2.40; P = 0.027), having sexual intercourse (heterosexual vs. none: IRR = 2.59, 95% CI: 1.42 to 4.68, P = 0.002; homosexual vs. none: IRR = 3.13, 95% CI: 1.71 to 5.73, P < 0.001; homosexual vs. heterosexual: IRR = 1.21, 95% CI: 0.56 to 2.63, P = 0.628), and having a history of drug abuse (opium/psychedelic) (IRR = 5.99; 95% CI: 3.13, 11.51; P < 0.001).
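For readers less familiar with incidence rate ratios, the following sketch shows how an IRR and its 95% confidence interval follow from a regression coefficient and its standard error on the log scale. The coefficient and standard error are not reported in the text, so they are back-calculated here from the published drug-abuse estimate, assuming a Wald-type interval; small discrepancies reflect rounding in the reported values.

```python
import math

def irr_with_ci(beta, se, z=1.96):
    """Exponentiate a log-scale coefficient and its Wald interval limits."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Back-calculated (assumed) log-scale quantities for drug abuse:
# reported IRR = 5.99 with 95% CI 3.13 to 11.51.
beta = math.log(5.99)
se = (math.log(11.51) - math.log(3.13)) / (2 * 1.96)

irr, lower, upper = irr_with_ci(beta, se)
print(round(irr, 2), round(lower, 2), round(upper, 2))

# The homosexual vs. heterosexual contrast is the ratio of the two IRRs
# that share the same reference category (no intercourse).
ratio = 3.13 / 2.59
print(round(ratio, 2))  # 1.21, matching the reported contrast
```

An IRR of 5.99 means that, holding the other covariates fixed, the expected daily number of cigarettes is about six times that of the reference group.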

Discussion
Smoking intensity, defined usually as the number of cigarettes smoked by a person per day, can be considered as an important factor in establishing many serious smoking-related diseases, especially cancers. Smoking by college students, comprising a vast population of youth in Iran, makes them vulnerable to other risky behaviors. Therefore, investigating its underlying factors is of great importance. In this regard, count regression models are the first-line models that can be used to determine factors associated with smoking intensity as a count response, defined as the daily number of cigarettes smoked by an individual. There is no model that fits well for all data. So, selecting a model with the best fit to the data is of crucial importance. Here, the goodness-of-fit of several classical count regression models (Poisson, NB, GP, and EEG), as well as their zero-inflated counterparts (ZIP, ZINB, ZIGP, and ZIEEG), were investigated using a dataset related to the daily number of cigarettes smoked by college students. The findings of the present study revealed that the NB regression and Poisson regression had the best and worst fit to the data, respectively. Nevertheless, the goodness-of-fit of other models was comparable with that of the NB regression. So, the simplest model in terms of interpretation among them (NB regression) was selected as the most appropriate one. It is also possible to interpret the results of the ZINB regression model when one is interested in investigating factors associated with smoking/not-smoking and the number of cigarettes smoked per day as there may be different factors associated with each of them.
The findings of the present study indicated that having sexual intercourse was associated with greater smoking severity, that is, a greater number of cigarettes smoked per day. Students reporting heterosexual and homosexual intercourse smoked 2.59 and 3.13 times as many cigarettes per day, respectively, as students reporting no intercourse. While there might be few studies that investigate the association of these factors with smoking severity as in this study, our findings were in concordance with the results of several studies that have investigated factors associated with smoker/nonsmoker response [41][42][43][44][45][46]. Moreover, according to our findings, homosexual students consumed a higher number of cigarettes per day compared to the heterosexual students (IRR = 1.21); however, the difference was not statistically significant, which may be due to the small number of homosexual and heterosexual students in the present study. Furthermore, due to the sensitive nature of questions about sexual activity in Iran, students may try to hide their sexual activities. Therefore, some students in the "not having sexual intercourse" group may not have disclosed their sexual orientation (heterosexual, gay/lesbian, and bisexual) because it is a social taboo, which may attenuate the relationship of sexual orientation/activity with smoking among college students. There is a lot of evidence that tobacco use is higher among individuals identifying as lesbian, gay or bisexual (especially among women). Li et al. studied sex and sexual orientation in relation to tobacco use among young adult college students in the US 47. They found that the pattern of tobacco use was different between heterosexual, gay/lesbian, and bisexual students; in particular, bisexual women used a higher mean number of tobacco products compared to heterosexuals or other sexual minority groups. Hequembourg et al. also found that sexual minority women smoked more cigarettes on smoking days than heterosexual women 48. Disparities in tobacco use across sexual orientation groups have been reported in several studies: a higher rate of tobacco use has been reported for bisexuals compared with gay/lesbian and heterosexual individuals, and a higher rate of cigarette use has been reported for sexual minority versus heterosexual women [49][50][51][52][53][54][55]. In a study conducted by Zhang et al., it has been reported that homosexual people are more likely to engage in smoking 56. Moreover, in another study conducted by Lindström et al., a higher smoking amount was observed for homosexual men compared with heterosexual men and women, while this quantity was not significant for homosexual women 57. King and Nazareth found higher smoking rates for homosexual men and women compared with heterosexual groups 58. This evidence highlights the importance of targeting sexual minorities and considering the nuances across the sexual orientation spectrum in smoking cessation programs.
Our findings showed that illicit drug abuse, i.e. opium/psychedelic abuse as a high-risk behavior, was associated with consuming a higher number of cigarettes per day among the students. Drug abuse has also been reported to be associated with a greater number of cigarettes smoked per day 59 . Engaging in high-risk behaviors has been shown to be associated with psychiatric distress and suicidal ideation/attempt 60 . On the other hand, psychiatric distress and smoking have been shown to be positively associated 61,62 . It has also been reported that suicidal thoughts/attempts are strongly associated with smoking (OR = 4.03; 95% CI: 2.65-6.11) 60 .
Our findings also revealed that there was an association between experiencing a break-up and an increased daily number of cigarettes smoked by the students. To our knowledge, the association between experiencing a break-up and smoking has not been investigated in previous studies, so this finding is novel. Experiencing a break-up might be related to increased levels of psychosocial stress, which is associated with greater odds of persistent smoking 63 .
Our findings also showed that the number of cigarettes smoked per day by male students was higher than that of the female students by a factor of about 10. This finding is consistent with the findings of other studies. Moghimbeigi et al., in a study conducted in high schools in Iran, showed that the daily number of cigarettes in male students was about 4 times greater than that of the female students 64 . Kilic and Ozturk investigated the gender differences in cigarette consumption among adults in Turkey 65 . They found that the daily number of cigarettes in males was 1.6 times greater compared with the females. They also found that factors including education programs, cigarette taxation and tobacco advertising bans have different effects on each gender whereas social interaction is important for cigarette smoking behaviors of both genders. This might be attributed to the income elasticity among male students as they are more independent in terms of income than female students. Furthermore, it can be related to the differences in personality characteristics by gender. Traditional views can also cause differences in the social contacts of the students. While smoking by females is regarded as a taboo in the traditional culture of Iran, it is viewed as a common way of socializing with peers for males, which might influence the smoking behavior of the male and female students 66 .
The findings of the present study indicated that the birth order was associated with the intensity of smoking, such that the greater number of cigarettes smoked per day was observed for higher orders of birth; and the daily number of cigarettes smoked by a student with the birth order of 4 was about 2 times greater compared with a student with the first order of birth. This finding was also consistent with the results of other studies 67 . Argys et al. found that "the number of cigarettes smoked daily increases monotonically with birth order, suggesting that the higher prevalence of smoking by later-borns found among U.S. adolescents" 68 . According to the theories, this may be attributed to the biological factors (changes in maternal immune system occurring over successive births), parents' skills and experiences and having higher incomes during raising later-born children, such that parents treat the first child differently than the later-born children [69][70][71][72][73][74] .
Because our questionnaire was self-reported and included several sensitive questions, including those related to sexual activities and drug consumption, our results were limited by the possibility of underestimation of the high-risk behaviors (the rejection rate was 6% among college students). Another limitation of this study was that questions about alcohol use were not included, although alcohol use is likely correlated with the outcome of interest; it is suggested that alcohol use be considered in future studies. The cross-sectional nature of this study was a further limitation, as the obtained results do not imply cause-effect relationships. Despite these limitations, we used multivariate methods to provide beneficial information about potential correlates of smoking intensity and tried to select the model that best fits the data among the most widely used count regression models.
The multivariate model utilized in the present research helped to identify correlates of smoking severity which should be taken into consideration while identifying smoking behavior among the students and establishing prevention and intervention programs for this population. In fact, these findings suggested that focusing on high-risk behaviors can be helpful in interventional programs for smoking cessation among college students.

Data availability
The dataset used and/or analyzed during the current study is available from the corresponding author on reasonable request.