Classification of psychiatric symptoms using deep interaction networks: the CASPIAN-IV study

Identifying the possible factors of psychiatric symptoms among children can reduce the risk of adverse psychosocial outcomes in adulthood. We designed a classification tool to examine the association between modifiable risk factors and psychiatric symptoms, defined based on the Persian version of the WHO-GSHS questionnaire in a developing country. Ten thousand three hundred fifty students, aged 6–18 years from all Iran provinces, participated in this study. We used feature discretization and encoding, stability selection, and regularized group method of data handling (GMDH) to classify the a priori specific factors (e.g., demographic, sleeping-time, life satisfaction, and birth-weight) to psychiatric symptoms. Self-rated health was the most critical feature. The selected modifiable factors were eating breakfast, screentime, salty snack for depression symptom, physical activity, salty snack for worriedness symptom, (abdominal) obesity, sweetened beverage, and sleep-hour for mild-to-moderate emotional symptoms. The area under the ROC curve of the GMDH was 0.75 (CI 95% 0.73–0.76) for the analyzed psychiatric symptoms using threefold cross-validation. It significantly outperformed the state-of-the-art (adjusted p < 0.05; McNemar's test). In this study, the association of psychiatric risk factors and the importance of modifiable nutrition and lifestyle factors were emphasized. However, as a cross-sectional study, no causality can be inferred.

Table caption: Demographic and socioeconomic characteristics of participants: the CASPIAN-IV study. N, number of people who are in each category.

www.nature.com/scientificreports/

The performance of the proposed algorithm for classifying extended three-class psychiatric symptoms was provided in Table 6. The algorithm was run for each age category (6-10, 11-14, and 15-19 years) to improve its performance. The selected factors by the stability feature selection were breakfast, life satisfaction, self-rated health, screen time, residence area, sleeping-time, and weight-reduction plan for the first age category. They were self-rated health, life satisfaction, gender, breakfast, sleeping-time, residence area, weight-reduction plan, physical activity, and body image for the second age category. The algorithm selected self-rated health, life satisfaction, gender, breakfast, physical activity, body image, and screen time for the third age category. Among the selected factors, self-rated health, life satisfaction, and breakfast were common in all age categories.

Discussion
In our study, we considered depression symptoms, worriedness symptoms, and mild-to-moderate emotional symptoms for classification. The SRH was recognized as the primary variable for classifying depressive symptoms and mild-to-moderate emotional symptoms (Figs. 2, 3). Overall, few studies were performed on SRH in the literature. It has been shown that depression was strongly associated with reporting poor SRH 42 . There was also a relationship between poor SRH, depressive and anxiety symptoms among university students with high academic stress 43 .
Moreover, a longitudinal study of adolescent health in the United States demonstrated that one of the main factors associated with persistent depressive symptoms was poor self-rated general health 44. Although there was no significant correlation between SRH and depression symptoms in our dataset (rank-biserial $r_{rb}$ = 0.002; p = 0.819) (Fig. 1), its interactions with screentime, diet, and breakfast were selected by the GMDH network (Fig. 2). This is how the interaction network identifies indirect factors that the univariate analysis could not identify. The same held for mild-to-moderate emotional symptoms: there was no significant correlation between SRH and mild-to-moderate emotional symptoms in our dataset (rank-biserial $r_{rb}$ = 0.008; p = 0.390) (Fig. 1), yet its interactions with milk type, abdominal obesity, and beverage consumption were selected by the GMDH network (Fig. 3).
One of the most critical risk factors for children, the physical activity level, was selected for those having worriedness symptoms in our study (Fig. 4). The beneficial effects of regular physical activity on health are indisputable in modern medicine 45. Furthermore, a large amount of exercise plays an essential role in minimizing worry in clinical settings 46. Although the correlation between physical activity and worriedness symptoms was very low in our study (rank-biserial $r_{rb}$ = −0.087; p < 0.001) (Fig. 1), its interactions with the other factors were selected by the GMDH network (Fig. 4).
In our study, sleep hour was selected as a factor for mild-to-moderate emotional symptoms (Fig. 3). This agrees with previous studies 47. Adverse general health outcomes are associated with indicators of sleep problems, such as short sleep duration. Another study on 11,788 pupils from 11 different European countries showed a negative association between sleep hours per night and emotional symptoms 48. Although the correlation between sleep hours and mild-to-moderate emotional symptoms was very low in our study (rank-biserial $r_{rb}$ = −0.080; p < 0.001) (Fig. 1), its interaction with the milk type during infancy was selected by the GMDH network (Fig. 4). It was shown in the literature that there is a relationship between breastfeeding and sleep quality in infants 49. Also, breastfeeding is related to behavior problems in children and adolescents 50. However, there was no significant correlation between sleep hour and milk type in our study (rank-biserial $r_{rb}$ = 0.005; p = 0.634) (Fig. 1).
One of the notable factors in our research was screen time (Fig. 2). Some studies showed that children who watch TV for more than two hours a day usually have lower self-esteem, lower school performance, and unhealthy eating habits 51. Such consequences would lead to psychological distress in young children 52. The consequences reported by these articles agree with our finding of a positive association between screen time and having depression symptoms. Although the correlation between screen time and depression symptoms was very low in our study (rank-biserial $r_{rb}$ = 0.056; p < 0.001) (Fig. 1), its interactions with SRH and breakfast were selected by the GMDH network (Fig. 4).
A predictor of depression and worriedness symptoms was salty snack consumption (Figs. 2, 4). In general, there were few studies in this field 26. It has also been demonstrated that 12- to 13-year-old Norwegian adolescents with healthy dietary patterns have better mental health conditions 53. The correlation between salty snack consumption and depression or worriedness symptoms was very low in our study (depression: rank-biserial $r_{rb}$ = 0.030; p = 0.002; worriedness: rank-biserial $r_{rb}$ = 0.044; p < 0.001) (Fig. 1). Nevertheless, its interactions with SRH and breakfast were selected for depression symptoms, and its interactions with age category, SES category, and physical activity were selected for worriedness symptoms (Figs. 2, 4).
Breakfast is one of the most important meals. The prevalence of breakfast skipping is increasing among adolescents. Previous studies showed that breakfast intake is related to mental problems 54. Another study showed that skipping breakfast at least four times a week was significantly associated with a higher depressed mood score 55. Our finding is consistent with such results on the association of breakfast consumption with depression symptoms (Fig. 2). Although the correlation between breakfast consumption and depression symptoms was relatively low in our study (rank-biserial $r_{rb}$ = 0.107; p < 0.001) (Fig. 1), its interactions with the other factors were selected (Fig. 2).
Discretization was used in our study to generate categorical input variables instead of interval variables. Although this procedure reduces the flexibility of the variables, it could increase the classifiers' performance and their generalization. When we instead used the original interval variables, the average accuracy of the GMDH classification system decreased by 3%, 4%, 2%, and 7% for depression symptoms, mild-to-moderate emotional symptoms, worriedness symptoms, and psychiatric symptoms, respectively. Moreover, the correlation between the original input variables and their categorical versions ranged from 0.68 for age (Spearman's rho; p < 0.001) to 0.95 for screentime (Spearman's rho; p < 0.001). Thus, such a discretization did not significantly reduce the amount of information.
In our study, the GMDH network was used as a classifier. This network is incremental and expands with regularized least squares (RLS), a convex algorithm, and it also generates the interaction network (Figs. 2, 3, 4), leading to better clinical interpretations. The proposed GMDH network had very good diagnostic accuracy for depression symptom classification, while it showed good diagnostic accuracy for the other outcomes. It showed an excellent agreement rate with the gold standard for depression symptoms classification and a fair-to-good agreement rate for worriedness symptoms or psychiatric symptoms classification. However, it showed a poor agreement rate for the mild-to-moderate emotional symptoms classification. The proposed system's discriminant power was fair for depression symptoms and limited for worriedness symptoms classification. However, it was poor for mild-to-moderate emotional symptoms classification. The proposed system's false alarm (FA) rate ranged from 3% when classifying depression symptoms to 31% when classifying mild-to-moderate emotional symptoms. However, the proposed system's statistical power was always higher than 70% for all outcomes. Moreover, the false discovery rate (a.k.a. 1-precision) ranged from 13% for classifying depression symptoms to 78% for classifying mild-to-moderate emotional symptoms.
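The indices discussed above (statistical power, false alarm rate, false discovery rate, and MCC) all derive from a 2×2 confusion matrix. As a minimal sketch, with a hypothetical function name and hypothetical counts that are not taken from the study's tables, they could be computed as follows:

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Compute screening indices from confusion-matrix counts."""
    power = tp / (tp + fn)            # statistical power (sensitivity)
    false_alarm = fp / (fp + tn)      # false alarm rate = 1 - specificity
    fdr = fp / (tp + fp)              # false discovery rate = 1 - precision
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    return {"power": power, "false_alarm": false_alarm, "fdr": fdr, "mcc": mcc}

# Hypothetical counts for an imbalanced outcome (illustration only).
metrics = binary_metrics(tp=80, fp=30, tn=870, fn=20)
print(metrics)
```

With a rare outcome, the false discovery rate can be high even when the false alarm rate is low, because the false positives are drawn from a much larger negative group.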
The proposed GMDH network had the best and worst classification performance (MCC) for depression symptoms and mild-to-moderate emotional symptoms, respectively (Table 4). The dataset was highly imbalanced for the mild-to-moderate emotional symptoms outcome (prevalence of 11.1%). However, all performance indices were consistent across the test folds (Table 3). It must be mentioned that the diagnostic accuracy of classifying psychological distress is not usually high in the literature 56. Meanwhile, there could be two reasons why the GMDH significantly outperformed the MLP classifier. First, the MLP is a fully connected network, while the GMDH is not (Figs. 2, 3, 4), resulting in more parameters in the MLP. Second, the cost function of the GMDH was customized for the imbalanced data, whereas the MLP used the cross-entropy, which could be improved by using weighted cross-entropy 57.
The GMDH algorithm was further used to classify the extended three-class psychiatric symptoms. Overall, the agreement rate of the extended system was comparable to that of the four two-class problems (Tables 4, 6). However, the extended system uses the original questions and identifies the overall severity of psychiatric symptoms rather than defining a psychiatric symptom based on one question. Accordingly, it might be preferred in practice. For the extended system, analyzing the interaction network identified self-rated health, life satisfaction, breakfast consumption, and sleeping/screen time as the most critical factors.
In our study, we designed classification systems for psychiatric symptoms. Proper diagnosis of mental disorders, such as depressive and anxiety disorders, requires detailed analysis 58. As an important limitation of our study, each outcome variable of depression and worriedness symptoms was derived from a single question. The mild-to-moderate emotional symptoms outcome was created based on five questions (confusion, insomnia, anxiety, angriness, and worthlessness). Generally, depression might be embedded in emotional symptoms, which is one of the limitations of our methodology. However, because depression was defined in our study as preventing students from routine activities, it was more alarming than mild-to-moderate emotional symptoms defined based on Q1-Q5. Nevertheless, Q1-Q5 could be considered as symptoms of depression. In our analysis, no more than a trivial association between depression and any of the five mild-to-moderate emotional symptoms items was observed. However, the combination of Q1-Q5 had acceptable discrimination for depression diagnosis. Moreover, the overall outcome (i.e., psychiatric symptoms) was generated based on all seven questions in our analysis.
The advantage of our study is the large sample size. Moreover, it analyzed a comprehensive set of factors related to psychiatric symptoms to monitor direct and indirect modifiable factors. However, it is a repeated cross-sectional study 59, and no causality can be inferred. We only analyzed the CASPIAN-IV data, and examining the trend and association between variables over time is the focus of our future activity. Another limitation was the possible bias in the self-reported answers of participants. Moreover, there is evidence of substance use among adolescents in Iran and of their considerable psychological dysfunction 60. However, it was not recorded in CASPIAN-IV, which is another limitation of our study.
In conclusion, our study emphasized the modifiable factors of psychiatric symptoms, including breakfast, salty snack, and sweetened beverage consumption, screentime, (abdominal) obesity, sleep hour, and physical activity. Iran ranked fourth among the countries with the highest age-standardized mental disorder DALYs rates (2436.44 DALYs per 100,000, based on the GBD 2019) 61. Such disorders could stem from childhood psychiatric symptoms, and empowering protective factors and changing modifiable risk factors might reduce such rates in the future. It is possible to design a web-based or Android application of the developed algorithm to identify whether a student could have a high risk of "psychiatric symptoms" based on the selected input variables (Figs. 2, 3, 4). Such indirect health screenings have great potential to be integrated into schools, which is the focus of our future activities.

Methods
A large population of the fourth study of a national surveillance program, entitled "Childhood and Adolescence Surveillance and Prevention of Adult Non-communicable disease" (CASPIAN-IV), supported by the WHO/ Eastern Mediterranean region and the Iranian Ministry of Health and the Ministry of Education, were analyzed in our project. Detailed methodology is published elsewhere 62 . We briefly describe the study population.
The population and sampling method. A sample of 14,880 students aged 6-18 years was selected by multi-stage sampling from schools of urban and rural areas of 30 Iranian provinces. Having explained the objectives and protocols, participants were enrolled in the study. Parents gave the written informed consent and oral permission, while oral assent was obtained from students to express willingness to participate in research. Trained healthcare professionals performed all the data collection procedures. Study protocols were reviewed and approved by the Research and Ethics Council of Isfahan University of Medical Sciences (#5429-90). All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.

Outcome variables. The World Health Organization-Global School-based Student Health Survey (WHO-GSHS) was used in our study. It covers alcohol, tobacco use, hygiene, physical activity, mental health, dietary behaviors, violence, protective factors, and unintentional injuries among children and youths. After translating the questions into Persian and simplifying any questions that were difficult to understand, the questionnaire's reliability and validity were assessed [63][64][65]. We considered psychiatric symptoms as depression symptoms, worriedness symptoms, and mild-to-moderate emotional symptoms, where the latter included confusion, insomnia, anxiety, angriness, and worthlessness 66. These three psychiatric symptoms were taken into consideration in this study. Also, if a subject had any of the depression symptoms, worriedness symptoms, or mild-to-moderate emotional symptoms, he/she was considered as having "psychiatric symptoms" as the overall outcome of the study. The psychiatric symptoms were assessed by the questions presented in Supplementary Table S1. The first five questions were used to identify mild-to-moderate emotional symptoms in our research. Those who experienced at least 3 out of 5 problems every day, more than once a week, or once a week were defined as "adolescents with mild-to-moderate emotional symptoms." An indicator of depression symptoms was a positive answer to the 6th question. In the last question, students who were worried most of the time or always, so that they could not sleep at night, were considered as having worriedness symptoms 35. Thus, our goal is to design and implement four binary classifiers to classify depression symptoms, mild-to-moderate emotional symptoms, worriedness symptoms, and "psychiatric symptoms" using the input variables. We further created an extended outcome using all seven questions of the questionnaire. The original responses to the seven questions were analyzed, and no dichotomization was used.
The first principal component (PC) was extracted using Principal Component Analysis (PCA). Then, tertiles of the PC were extracted, splitting the subjects into "low," "medium," and "high" risk psychiatric symptoms groups. Thus, an extended three-class classification problem was also considered.
Input variables. The input variables considered in our psychiatric symptom classification system are as follows 67: age category, sex, socioeconomic status (SES), physical activity level, body mass index (BMI) category, abdominal obesity, family size, residence area, sleep-time category, screen time category, (passive and active) smoking habit, life satisfaction, health status, as well as the consumption of breakfast, fast food, salty snack and beverage, having nutrition plan, the number of close friends, mothers' education level, body image, birth weight category, milk type used in infancy, and the family history of cancer and sudden death.
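The construction of the extended three-class outcome (first principal component of the seven responses, then tertiles) can be sketched as follows. This is an illustration under stated assumptions: the function name, the randomly generated responses, and the choice to center without standardizing the items are hypothetical, since the text does not specify these details.

```python
import numpy as np

def first_pc_tertiles(responses):
    """Score subjects on the first principal component and split into tertiles."""
    X = np.asarray(responses, dtype=float)
    Xc = X - X.mean(axis=0)                          # center each item
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ vt[0]                              # projection on the first principal axis
    # The sign of a principal component is arbitrary; orient it so that
    # higher item totals correspond to higher scores.
    if np.corrcoef(scores, X.sum(axis=1))[0, 1] < 0:
        scores = -scores
    cut_low, cut_high = np.quantile(scores, [1 / 3, 2 / 3])
    groups = np.digitize(scores, [cut_low, cut_high])  # 0 = low, 1 = medium, 2 = high
    return scores, groups

rng = np.random.default_rng(0)
responses = rng.integers(1, 6, size=(300, 7))        # hypothetical answers to seven items
scores, groups = first_pc_tertiles(responses)
print(np.bincount(groups, minlength=3))              # group sizes
```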
Scientific Reports | (2021) 11:15706 | https://doi.org/10.1038/s41598-021-95208-y
Measurements. In this study, age was categorized as 6-10, 11-14, and 15-19 years 68. Screen time was treated as a categorical variable covering the time spent watching television (TV)/video and playing computer games during leisure time: 4 h or less (≤ 4 h) was defined as low, and more than 4 h (> 4 h) as high 35. Sleep time (hours per week) was divided into three categories: 5 h or less (≤ 5 h; low), 5-8 h (moderate), and ≥ 8 h (high) 69. The residence area was considered either urban or rural. The physical activity at school and out of school was quantified using principal component analysis (PCA). The obtained scores were then categorized into tertiles 66. Variables including family assets (such as ownership of a house, car, and computer), occupation and education level of parents, and school type (private/public) were summarized in one main PCA component. Students were then classified as having low, moderate, or high SES, based on the component tertiles. Active smoking was defined as using tobacco products (cigarettes, pipe, hookah, etc.) every day, while passive smoking was defined as exposure to tobacco smoke from others (second-hand smoke) 66. Subjects with either passive or active smoking were considered smokers, and non-smokers otherwise. The general state of a participant's health was determined by the self-rated health (SRH) variable, asking "How would you describe your general state of health?" on the GSHS questionnaire, with the categories "good," "moderate," and "bad" 35. Life satisfaction was evaluated by asking about the degree of satisfaction with their life, using a ten-point scale from 10 = very satisfied to 1 = very dissatisfied. Scores below 6 signified low satisfaction, and high satisfaction otherwise 70.
Body image was assessed using the question, "What do you think regarding your body size?"; the answer was obtained with the following options: "much too fat," "a bit too fat," "about the right size," "a bit too thin," and "much too thin." For the analysis, the variable was divided into overweight (much too fat and a bit too fat) and underweight (much too thin and a bit too thin) versus normal weight cognition 71. Breakfast consumption was categorized into three groups: non-skipper (those eating breakfast 5-7 days a week), semi-skipper (those eating breakfast 3-4 days a week), and skipper (those eating breakfast 0-2 days a week) 22. The students were asked about the frequency of salty snack consumption, categorized as "seldom or never," "weekly," and "daily" consumption 72. The family size was categorized as "less than or equal to 4" or "greater than 4". The number of close friends was categorized as none, one, two, or three or more. The nutrition plan was assessed as "adherence to a weight-modifying plan based on a special diet" or not 71. Sugar-sweetened beverage consumption (i.e., soda, soft drinks) was categorized as "daily," "weekly," and "seldom or never" 72. The consumption of fast foods (pizza, fried chicken, cheeseburgers, hamburgers, and hot dogs) was categorized into three groups: daily, weekly, and seldom or never. The education level of mothers was categorized into three groups: illiterate, diploma, and university degree. Participants' birth weight (BW; g) was asked from their parents and then categorized into three groups: low (BW < 2500 g), normal (BW: 2500-4000 g), and high (BW > 4000 g) 73. We also assessed whether the children and adolescents were breastfed during their infancy 74, and the variable milk type was coded as breast milk (1) or other (0).
Moreover, we considered the family history of sudden death (yes or no) and also the family history of cancer (yes or no) of the first-degree relatives of the subjects enrolled in the study.
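Several of the cut-offs above translate directly into simple mapping rules. The following sketch encodes four of them (breakfast, birth weight, life satisfaction, and screen time); the function names and return labels are hypothetical and chosen only for illustration.

```python
def breakfast_group(days_per_week):
    """Breakfast frequency: 5-7 days non-skipper, 3-4 semi-skipper, 0-2 skipper."""
    if days_per_week >= 5:
        return "non-skipper"
    if days_per_week >= 3:
        return "semi-skipper"
    return "skipper"

def birth_weight_group(grams):
    """Birth weight: < 2500 g low, 2500-4000 g normal, > 4000 g high."""
    if grams < 2500:
        return "low"
    if grams <= 4000:
        return "normal"
    return "high"

def life_satisfaction_group(score):
    """Ten-point scale: scores below 6 are low satisfaction, otherwise high."""
    return "low" if score < 6 else "high"

def screen_time_group(hours):
    """Leisure screen time: up to 4 h is low, more than 4 h is high."""
    return "low" if hours <= 4 else "high"

print(breakfast_group(4), birth_weight_group(3200),
      life_satisfaction_group(5), screen_time_group(4.5))
```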
Anthropometric measurement. In our study, trained healthcare providers performed anthropometric measurements at school. All measurements were conducted with calibrated instruments, according to standard protocols 66. Height was measured in the standing position, barefoot, with the shoulders touching the wall. It was recorded to the nearest 0.2 cm. We measured weight without shoes and in light clothing to the nearest 200 g. Waist circumference (WC) was measured with a non-elastic tape to the nearest 0.2 cm. We calculated the BMI as weight in kilograms divided by height in meters squared (kg/m$^2$). The subjects were classified as underweight, healthy weight, or overweight or obese if BMI was < 5th percentile, between the 5th and 85th percentiles, or higher than the 85th percentile (i.e., BMI categories), respectively 75. Abdominal obesity was defined as a WC to height ratio (WHtR) of more than 0.5 76.
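The two anthropometric indices defined above are simple ratios. A minimal sketch, with hypothetical function names and example values, is given below; the percentile-based BMI categories are not reproduced because they require age- and sex-specific reference tables.

```python
def bmi(weight_kg, height_cm):
    """Body mass index: weight (kg) divided by height (m) squared."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2

def has_abdominal_obesity(waist_cm, height_cm):
    """Abdominal obesity: waist-to-height ratio (WHtR) above 0.5."""
    return waist_cm / height_cm > 0.5

# Hypothetical student: 45 kg, 150 cm tall, 80 cm waist circumference.
print(bmi(45, 150), has_abdominal_obesity(80, 150))
```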
Feature extraction. The interval variables (e.g., age, BMI, birth weight, family size, number of close friends, sleep, and screen time) were first categorized using unsupervised discretization. Although discretization reduces the flexibility of interval variables, it could improve the performance and generalization of classification 77. The input features in our study were thus entirely categorical. Their measurement scale was nominal (e.g., sex, smoking status, milk type, family history of sudden death, family history of cancer) or ordinal (e.g., age category, SES, physical activity level, BMI category). For each categorical variable, weight-of-evidence encoding 78 was used to obtain continuous covariates. For each outcome variable, stability feature selection 79 was then performed. The selected features were used in the following classification procedure.
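To illustrate how weight-of-evidence (WoE) encoding turns a categorical variable into a continuous covariate, the sketch below uses one common definition: the log ratio of a level's share among positives to its share among negatives. The sign convention, the additive smoothing term, the function name, and the toy data are assumptions made for illustration and are not taken from the study.

```python
import math
from collections import Counter

def woe_encode(categories, labels, smoothing=0.5):
    """Replace each category by ln(P(level | y=1) / P(level | y=0))."""
    pos = Counter(c for c, y in zip(categories, labels) if y == 1)
    neg = Counter(c for c, y in zip(categories, labels) if y == 0)
    levels = set(categories)
    # Additive smoothing (an assumption) avoids log(0) for levels seen in one class only.
    n_pos = sum(pos.values()) + smoothing * len(levels)
    n_neg = sum(neg.values()) + smoothing * len(levels)
    table = {
        level: math.log(((pos[level] + smoothing) / n_pos)
                        / ((neg[level] + smoothing) / n_neg))
        for level in levels
    }
    return [table[c] for c in categories], table

# Toy data: breakfast category versus a binary symptom indicator.
categories = ["skipper"] * 4 + ["non-skipper"] * 6
labels = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
encoded, table = woe_encode(categories, labels)
print(table)
```

Under this convention, a positive value marks a level that is over-represented among subjects with the symptom.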
Classification. Twenty-five input variables were used in the GMDH (i.e., layer zero), while the outcomes depression, worriedness, and mild-to-moderate emotional symptoms were separately used as outputs. At the first layer, each pairwise interaction of the inputs was considered as a neuron. Suppose that the pair $x_{i,j}$ and $x_{i,k}$ (features no. j and k from subject no. i) are combined to generate the estimated outcome $\tilde{y}_i$ at the first layer using the second-order polynomial model shown in Eq. (1), where the coefficients $A = [a_0, \ldots, a_5]^T$ could be estimated using Regularized Least Squares (RLS) 80 on the entire estimation set in Eqs. (2, 3), $\lambda$ is the regularization parameter, $y_i$ is the output of sample no. i of the estimation set, and $I_6$ is the identity matrix of size six. In addition to the regularization, the training set was divided into the estimation set with $N_e$ samples and the validation set with $N_v$ samples to avoid over-fitting during learning. The regularization parameter was tuned using a brute-force search to maximize the Matthews correlation coefficient (MCC) 81 on the validation set of each output. At the first layer, each neuron's RLS coefficients were estimated on the estimation set using the above procedure. Each neuron's performance was then assessed on the validation set, and the corresponding MCC values were calculated. At most, the top 10 neurons that performed better than the previous layer's neurons were selected, and their pairwise interactions were analyzed at the next layer. The network is built up layer by layer during training until the stopping criterion based on the "early-stopping" strategy is met. Whenever the validation set's performance was reduced at the next layer, the output of the current layer's best neuron was selected as the output of the entire GMDH network. The presented algorithm was used for the binary classification problems. For the extended multi-class problem, the macro-averaged $F_1$-score 82 was used instead of the MCC as the fitness function.
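A single GMDH neuron as described above can be sketched as follows. Because Eqs. (1)-(3) are not reproduced in this text, the sketch assumes the commonly used second-order form $\tilde{y}_i = a_0 + a_1 x_{i,j} + a_2 x_{i,k} + a_3 x_{i,j} x_{i,k} + a_4 x_{i,j}^2 + a_5 x_{i,k}^2$ and the closed-form ridge solution $A = (X^T X + \lambda I_6)^{-1} X^T y$. The term order, the 0.5 decision threshold, the candidate values of $\lambda$, the function names, and the synthetic data are all assumptions, and the layer-wise selection of neurons is not implemented.

```python
import numpy as np

def design_matrix(xj, xk):
    """Six second-order polynomial terms of a feature pair (term order assumed)."""
    xj, xk = np.asarray(xj, float), np.asarray(xk, float)
    return np.column_stack([np.ones_like(xj), xj, xk, xj * xk, xj**2, xk**2])

def fit_neuron(xj, xk, y, lam):
    """Regularized least squares: A = (X^T X + lam * I_6)^(-1) X^T y."""
    X = design_matrix(xj, xk)
    return np.linalg.solve(X.T @ X + lam * np.eye(6), X.T @ np.asarray(y, float))

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels in {0, 1}."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = float(np.sum((y_true == 1) & (y_pred == 1)))
    tn = float(np.sum((y_true == 0) & (y_pred == 0)))
    fp = float(np.sum((y_true == 0) & (y_pred == 1)))
    fn = float(np.sum((y_true == 1) & (y_pred == 0)))
    denom = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0

def tune_neuron(est, val, lambdas, threshold=0.5):
    """Brute-force search over lambdas, scored by MCC on the validation set.

    est and val are (xj, xk, y) tuples for the estimation and validation sets.
    """
    best_score, best_lam, best_A = -np.inf, None, None
    for lam in lambdas:
        A = fit_neuron(*est, lam)
        pred = (design_matrix(val[0], val[1]) @ A >= threshold).astype(int)
        score = mcc(val[2], pred)
        if score > best_score:
            best_score, best_lam, best_A = score, lam, A
    return best_score, best_lam, best_A

# Synthetic example: the outcome depends on an interaction of two features.
rng = np.random.default_rng(0)
xj, xk = rng.normal(size=400), rng.normal(size=400)
y = (xj * xk + 0.5 * xj > 0).astype(int)
est = (xj[:300], xk[:300], y[:300])        # 75% estimation set
val = (xj[300:], xk[300:], y[300:])        # 25% validation set
best_mcc, best_lam, best_A = tune_neuron(est, val, lambdas=[0.01, 0.1, 1.0, 10.0])
print(best_lam, round(best_mcc, 3))
```

In a full network, this fit would be repeated for every pair of inputs, and the outputs of the best-scoring neurons would serve as inputs to the next layer.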
Comparison with the state-of-the-art. Other classifiers, namely linear discriminant analysis (LDA), multilayer perceptron (MLP), and support vector machines (SVM), were used for comparison. LDA is a base classifier used to identify whether the classes could be accurately identified using linear boundaries. SVM, on the other hand, constructs a hyperplane in a high-dimensional space. The nonlinear SVM with the radial basis function (RBF) kernel was used in our study. We tuned the RBF kernel radius and the soft-margin parameter using the method proposed by Wu and Wang 83. MLP, a feed-forward fully-connected artificial neural network (ANN) model, maps a set of inputs onto an output. In our study, one hidden layer with ten neurons and the sigmoid activation function was used. The parameters of the network were tuned on the validation set.
The validation framework. In our study, threefold cross-validation with stratified sampling 84 was used. The same test folds were used for different classifiers. In the GMDH and MLP classifiers, 75% of the training set was used for estimation, while 25% was used for validation. The performance indices in Supplementary Table S2 were reported for different classifiers. True Positive, False Positive, True Negative, and False Negative were calculated by comparing the classifiers' results and the gold standard in four classification problems (i.e., depression symptom, mild-to-moderate emotional symptoms, worriedness symptom, and finally psychiatric symptoms). The interpretation of the reference intervals of the indices AUC, K(C), MCC, and DP 85 was listed in Supplementary Table S3. Moreover, following the STARD guideline 86, the CI 95% of the performance indices were reported for the cross-validated confusion matrices. The performance of the proposed GMDH algorithm on the three-class extended problem was assessed based on the macro- and micro-averaged indices presented by Sokolova and Lapalme 82.
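The validation framework above (stratified threefold cross-validation, with each training set split 75%/25% into estimation and validation parts) can be sketched with the standard library alone. The function names, the seed, and the toy labels (30 positives out of 270, i.e., a prevalence of about 11.1%) are hypothetical; whether the 75/25 split was itself stratified is not stated in the text, so a plain random split is assumed here.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=3, seed=0):
    """Assign sample indices to k folds, keeping class proportions similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)        # deal each class round-robin
    return folds

def estimation_validation_split(train_idx, frac=0.75, seed=0):
    """Randomly split training indices into estimation and validation parts."""
    shuffled = list(train_idx)
    random.Random(seed).shuffle(shuffled)
    cut = int(frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

labels = [1] * 30 + [0] * 240                 # toy imbalanced outcome
folds = stratified_folds(labels)
for test_fold in folds:
    train_idx = [i for fold in folds if fold is not test_fold for i in fold]
    est_idx, val_idx = estimation_validation_split(train_idx)
    print(len(test_fold), len(est_idx), len(val_idx))
```

Reusing the same `folds` for every classifier mirrors the requirement that the same test folds be used across methods.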
Statistical analysis. In this paper, subjects with complete information were used in the analysis. Only for comparison 87 (Table 5), the Multivariate Imputation by Chained Equations (MICE) R package 88 was used to impute the ordinal data of the enrolled subjects. All variables were reported as frequency and percentage, since they were categorical. The χ$^2$ analysis was used to compare the categorical variables across their categories. Spearman's rho was used as the correlation coefficient between interval-ordinal pairs. Kendall's $\tau_b$ was used as the correlation coefficient between ordinal-ordinal pairs. The phi coefficient and rank-biserial correlation coefficient were used for the binary-binary and binary-ordinal association, respectively 40. Cochran's Q test with McNemar's post hoc test for pairwise comparison with Bonferroni correction was used to compare the performance of different classifiers. A significance level of 0.05 was used in our analysis. MATLAB version 8.6 (The MathWorks Inc., Natick, MA, USA) was used for classification, while R version 4.0.0 (R Core Team (2020), https://www.R-project.org/) was used for data imputation. The statistical analysis was performed using the SPSS statistical package, version 18.0 (SPSS Inc., Chicago, IL, USA).
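The pairwise comparison of two classifiers with McNemar's test and a Bonferroni correction can be illustrated as below. The text does not state whether the exact or the chi-square form of the test was used, so the exact binomial form is assumed; the function names, the discordant counts, and the number of comparisons are hypothetical.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant counts.

    b: cases classified correctly by classifier 1 only
    c: cases classified correctly by classifier 2 only
    """
    n = b + c
    if n == 0:
        return 1.0
    # Under the null hypothesis, b follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

def bonferroni(p_value, n_comparisons):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_comparisons)

p_raw = mcnemar_exact(b=25, c=10)
p_adj = bonferroni(p_raw, n_comparisons=3)
print(p_raw, p_adj)
```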

Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.