Introduction

Mental health and illness are a public health concern1. The prevalence of non-communicable diseases (NCDs), such as mental disorders, is rapidly increasing, and the prevention of their associated risk factors has become one of the world's health priorities2. The World Health Organization (WHO) reported that about 450 million people in different age groups suffer from severe and mild mental disorders worldwide3. Moreover, at least 52 million people have severe mental disorders, such as schizophrenia, and nearly 150 million people suffer from unspecified mental diseases, including psychological distress4. The literature has also discussed whether such distress is a symptom of a mental disorder or a marker of functional impairment5.

Psychological distress is the most common mental health issue affecting children and is considered one of the leading causes of the global burden of disease6. The National Comprehensive Cancer Network (NCCN) defines distress as an unpleasant emotional experience of a psychological problem, such as depression, worriedness, and panic7. If children's psychological distress remains untreated, their development is significantly affected.

Children worldwide are affected by psychological distress similar to that of adults8. Such distress is related to an increased risk of harmful events, including drug addiction and poor educational performance9. Such problems are common in the Eastern Mediterranean Region (EMR), including Iran and neighboring countries, and are the leading cause of years of life lived with disability (YLDs). In 2013, depression accounted for the most Disability-Adjusted Life Years (DALYs) in the EMR, and worriedness ranked second10. In summary, depression and worriedness, two essential components of psychological distress, are among the leading causes of illness and disability in adolescents11.

Many studies have attempted to investigate the association between several factors and psychological distress among children. Risk factors associated with such psychological distress appear to be modifiable, partly through the link between these characteristics and lifestyle factors. In general, the literature review shows that the prevalence of psychological distress varies with factors such as gender12, age13, hours of sleep14, physical activity and screen time15, family size16, life satisfaction17, residence area18, socioeconomic status19, self-rated health20, body mass index21, eating breakfast22, body image23, number of close friends24, having a weight-reduction plan25, junk-food consumption26, sweetened beverage consumption27, smoking28, abdominal obesity29, parents’ education30, birth weight31, breastfeeding32, family history of sudden death33 and family history of cancer34. However, many studies were limited to univariate or descriptive analyses of psychological distress in children and adolescents, and fewer studies have addressed the classification of comprehensive modifiable risk factors and their interactions and associations with psychological distress.

Few studies have used data mining for psychological distress classification, and to the best of our knowledge, none of them comprehensively considered various determinants and their interactions3, 35, 36. Thus, this study aims to classify the risk factors associated with psychiatric symptoms based on demographic, lifestyle, and socioeconomic characteristics and the family history of diseases in a large sample of children and adolescents. The Group Method of Data Handling (GMDH), proposed by Ivakhnenko37, was used for classification in our study. In this network, optimal hyperparameters, the number of hidden layers, and the neurons in such layers are automatically identified. Moreover, the GMDH provides an interpretable interaction network, which is very important in medical data mining38.

Results

Prevalence of mild-to-moderate emotional symptoms, worriedness, and depression

This national survey's participation rate was 90.6%: 13,486 of the 14,880 invited children and adolescents were enrolled. The number of missing values in the enrolled subjects ranged from zero (living place and gender) to 1158 (birth weight category). The occurrence of missing data on the dependent and independent variables of the enrolled subjects was random (Little's MCAR test39; p = 0.421). In our study, the subjects with complete information were analyzed first; accordingly, 10,350 subjects (i.e., 76.7% of the enrolled subjects) were analyzed. Overall, the 6–10-, 11–14-, and 15–19-year-old age groups comprised 33.7%, 35.0%, and 31.3% of the sample, respectively, and 50.1% of the population was boys. The prevalence of worriedness symptoms, mild-to-moderate emotional symptoms, and depression symptoms was 23.7%, 11.1%, and 20.1%, respectively. Specifically, the prevalence of worriedness symptoms was 20.6% in boys and 26.8% in girls. Mild-to-moderate emotional symptoms affected 8.9% of boys and 13.2% of girls, while 18.5% of boys and 21.7% of girls experienced depression symptoms. The distribution of demographic variables, family history of diseases, and lifestyle factors in the different psychiatric groups is presented in Tables 1 and 2. The pairwise association between the input features and the outcome variables is shown in Fig. 1.

Table 1 Demographic and socioeconomic characteristics of participants: the CASPIAN-IV study.
Table 2 Lifestyle and health-related characteristics of participants: the CASPIAN-IV study.
Figure 1 The bivariate correlation between the inputs and outputs. x1: sleeping-time cat. (category); x2: screen time cat.; x3: family-size cat.; x4: life satisfaction cat.; x5: residence area; x6: gender; x7: physical activity cat.; x8: SES cat.; x9: self-rated health cat. (SRH); x10: BMI cat.; x11: breakfast cat.; x12: body image cat.; x13: age cat.; x14: the number of close friends cat.; x15: weight-reduction plan; x16: salty-snack cat.; x17: sweetened beverage consumption cat.; x18: fast food consumption cat.; x19: smoker; x20: abdominal obesity; x21: mother education cat.; x22: birth-weight cat.; x23: milk type during infancy; x24: the family history of sudden death; x25: the family history of cancer; y1: worriedness symptom; y2: depression symptom; y3: mild-to-moderate emotional symptoms; y4: psychiatric symptoms. All ordinal variables were encoded in ascending order (e.g., mild to severe, or seldom to daily), except for x9 (1: good, 2: moderate, 3: bad), x11 (1: non-skipper, 2: semi-skipper, 3: skipper), and x17 (1: daily, 2: weekly, 3: seldom/never).

The association between outcome variables was measured using the Phi coefficient40. It was 0.208 (CI 95% 0.191–0.224; p < 0.001) between worriedness and depression symptoms, 0.385 (CI 95% 0.370–0.399; p < 0.001) between worriedness symptoms and mild-to-moderate emotional symptoms, and 0.285 (CI 95% 0.269–0.300; p < 0.001) between depression symptoms and mild-to-moderate emotional symptoms. Thus, no more than a trivial association was observed in the first and last outcome pairs40, 41, while worriedness symptoms and mild-to-moderate emotional symptoms were weakly correlated in our study.
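As an illustration of the Phi coefficient reported above, the following minimal sketch computes it from the four cells of a 2 × 2 table of two binary outcomes. The function name and the counts are hypothetical and are not taken from the CASPIAN-IV data.

```python
import math

def phi_coefficient(n11, n10, n01, n00):
    """Phi coefficient for a 2x2 table of two binary variables.

    n11: both symptoms present; n10: only the first present;
    n01: only the second present; n00: neither present.
    """
    row1, row0 = n11 + n10, n01 + n00
    col1, col0 = n11 + n01, n10 + n00
    denom = math.sqrt(row1 * row0 * col1 * col0)
    if denom == 0:
        raise ValueError("phi is undefined when a marginal total is zero")
    return (n11 * n00 - n10 * n01) / denom

# Hypothetical counts, for illustration only.
phi = phi_coefficient(40, 10, 20, 30)
print(round(phi, 3))
```

For a 2 × 2 table, the Phi coefficient equals the Pearson correlation between the two binary indicators, which is why it can be read on the same scale as the other correlations in this section.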

Moreover, the association between depression symptoms and each of Questions 1–5 (i.e., questions used to define mild-to-moderate emotional symptoms) was assessed. The lowest association was with confusion (Q5) [Phi coefficient = 0.179 (CI 95% 0.162–0.195); p < 0.001], while the highest association was with worthlessness (Q1) [Phi coefficient = 0.220 (CI 95% 0.203–0.236); p < 0.001]. Thus, no more than a trivial association between depression and any of the five mild-to-moderate emotional symptoms items was observed. Factor analysis based on principal components analysis (PCA) was also applied to the original Q1–Q5. Only one PC had an eigenvalue greater than one and was selected. It showed acceptable discrimination for depression diagnosis [Area under the ROC Curve = 0.736 (CI 95% 0.726–0.746); p < 0.001]. Thus, the combination of Q1–Q5 could be indirectly used for depression symptom diagnosis.

Classification results

Among the 25 features used in our study, five, ten, and four features were selected by the stability feature selection and the GMDH network for depression symptoms, mild-to-moderate emotional symptoms, and worriedness symptoms, respectively, and the selection was consistent during cross-validation. The proposed GMDH networks for depression symptoms, mild-to-moderate emotional symptoms, and worriedness symptoms are shown in Figs. 2, 3 and 4. The most important features were selected based on the number of interactions used in the network. The top three features for depression symptoms were the eating breakfast category (cat.), self-rated health (SRH) cat., and diet program. For mild-to-moderate emotional symptoms, they were age cat., SRH cat., and milk type, while age cat., physical activity cat., and socioeconomic status (SES) cat. were the most critical features for worriedness symptoms.

Figure 2 A representative GMDH network for classifying depression symptoms.

Figure 3 A representative GMDH network for classifying mild-to-moderate emotional symptoms.

Figure 4 A representative GMDH network for classifying worriedness symptoms.

The average performance of the GMDH network and the state-of-the-art classifiers on the test folds during threefold cross-validation of psychological distress is shown in Table 3. All performance indices of the analyzed methods, with their CI 95%, based on the cross-validated confusion matrix are reported in Table 4. The GMDH network significantly outperformed the state-of-the-art classifiers for all outcomes (adjusted p < 0.05; McNemar's test), except for LDA in mild-to-moderate emotional symptoms classification, where the two were not significantly different.
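McNemar's test compares two classifiers on the same subjects using only the discordant pairs, i.e., the subjects that one classifier labels correctly and the other does not. The sketch below uses the exact binomial form of the test; whether the exact or the chi-square form was used in this study is not stated, so this choice, the function name, and the counts are assumptions for illustration. The multiple-comparison adjustment of the p-values is not shown.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: subjects classified correctly by classifier A but not by classifier B.
    c: subjects classified correctly by classifier B but not by classifier A.
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical discordant counts, for illustration only.
p_value = mcnemar_exact(0, 10)
print(p_value)
```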

Table 3 The performance of the different classifiers in MEAN ± SD over the test folds using threefold cross-validation.
Table 4 The performance of the different classifiers and their CI 95% based on the cross-validated confusion matrix.

We further classified the imputed dataset in which the number of subjects was increased from 10,350 to 11,820 using the GMDH algorithm (Table 5). No significant improvement was observed compared with the dataset with complete information (adjusted p > 0.05; McNemar's test).

Table 5 The performance of the GMDH classifier in MEAN ± SD over the test folds using threefold cross-validation on the imputed dataset.

The performance of the proposed algorithm for classifying the extended three-class psychiatric symptoms is provided in Table 6. The algorithm was run separately for each age category (6–10, 11–14, and 15–19 years) to improve its performance. For the first age category, the factors selected by the stability feature selection were breakfast, life satisfaction, self-rated health, screen time, residence area, sleeping-time, and weight-reduction plan. For the second age category, they were self-rated health, life satisfaction, gender, breakfast, sleeping-time, residence area, weight-reduction plan, physical activity, and body image. For the third age category, the algorithm selected self-rated health, life satisfaction, gender, breakfast, physical activity, body image, and screen time. Among the selected factors, self-rated health, life satisfaction, and breakfast were common to all age categories.

Table 6 The performance of the GMDH algorithm (in percent) based on the cross-validated confusion matrix for classifying extended three-class psychiatric symptoms.

Discussion

In our study, we considered depression symptoms, worriedness symptoms, and mild-to-moderate emotional symptoms for classification. SRH was recognized as the primary variable for classifying depressive symptoms and mild-to-moderate emotional symptoms (Figs. 2, 3). Overall, few studies on SRH were found in the literature. It has been shown that depression was strongly associated with reporting poor SRH42. There was also a relationship between poor SRH and depressive and anxiety symptoms among university students with high academic stress43.

Moreover, a longitudinal study of adolescent health in the United States demonstrated that one of the main factors associated with persistent depressive symptoms was poor self-rated general health44. Although there was no significant correlation between SRH and depression symptoms in our dataset (rank-biserial \(r_{rb}\) = 0.002; p = 0.819) (Fig. 1), its interactions with screen time, diet, and breakfast were selected by the GMDH network (Fig. 2). This is how the interaction network identifies indirect factors that the univariate analysis could not identify. The same held for mild-to-moderate emotional symptoms: there was no significant correlation between SRH and mild-to-moderate emotional symptoms in our dataset (rank-biserial \(r_{rb}\) = 0.008; p = 0.390) (Fig. 1), yet its interactions with milk type, abdominal obesity, and beverage consumption were selected by the GMDH network (Fig. 3).

One of the most critical risk factors for children, the physical activity level, was selected for those having worriedness symptoms in our study (Fig. 4). The beneficial effects of regular physical activity on health are indisputable in modern medicine45. Furthermore, a large amount of exercise plays an essential role in minimizing worry in clinical settings46. Although the correlation between physical activity and worriedness symptoms was very low in our study (rank-biserial \(r_{rb}\) = −0.087; p < 0.001) (Fig. 1), its interactions with the other factors were selected by the GMDH network (Fig. 4).

In our study, sleep hours were selected as a factor for mild-to-moderate emotional symptoms (Fig. 3), in agreement with previous studies47. Adverse general health outcomes are associated with indicators of sleep problems, such as short sleep duration. Another study on 11,788 pupils from 11 different European countries showed a negative association between hours of sleep per night and emotional symptoms48. Although the correlation between sleep hours and mild-to-moderate emotional symptoms was very low in our study (rank-biserial \(r_{rb}\) = −0.080; p < 0.001) (Fig. 1), its interaction with the milk type during infancy was selected by the GMDH network (Fig. 3). It was shown in the literature that there is a relationship between breastfeeding and sleep quality in infants49. Also, breastfeeding is related to behavior problems in children and adolescents50. However, there was no significant correlation between sleep hours and milk type in our study (rank-biserial \(r_{rb}\) = 0.005; p = 0.634) (Fig. 1).

One of the notable factors in our research was screen time (Fig. 2). Some studies showed that children who watch TV for more than two hours a day usually have lower self-esteem, lower school performance, and unhealthy eating habits51. Such consequences would lead to psychological distress in young children52. The consequences reported by these articles are in agreement with our finding of a positive association between screen time and depression symptoms. Although the correlation between screen time and depression symptoms was very low in our study (rank-biserial \(r_{rb}\) = 0.056; p < 0.001) (Fig. 1), its interactions with SRH and breakfast were selected by the GMDH network (Fig. 2).

Salty snack consumption was a predictor of depression and worriedness symptoms (Figs. 2, 4). In general, there were few studies in this field26. It has also been demonstrated that 12- to 13-year-old Norwegian adolescents with healthy dietary patterns have better mental health conditions53. Although the correlation between salty snack consumption and depression or worriedness symptoms was very low in our study (depression: rank-biserial \(r_{rb}\) = 0.030; p = 0.002, worriedness: rank-biserial \(r_{rb}\) = 0.044; p < 0.001) (Fig. 1), its interactions with SRH and breakfast were selected for depression symptoms, and its interactions with age category, SES category, and physical activity were selected for worriedness symptoms (Figs. 2, 4).

Breakfast is one of the most important meals. The prevalence of breakfast skipping is increasing among adolescents. Previous studies showed that breakfast intake is related to mental problems54. Another study showed that skipping breakfast at least four times a week was significantly associated with a higher depressed mood score55. Our finding is consistent with such results on the association of breakfast consumption with depression symptoms (Fig. 2). Although the correlation between breakfast consumption and depression symptoms was relatively low in our study (rank-biserial \(r_{rb}\) = 0.107; p < 0.001) (Fig. 1), its interactions with the other factors were selected (Fig. 2).

Discretization was used in our study to generate categorical input variables instead of interval variables. Although this procedure reduces the flexibility of the variables, it could increase the classifiers' performance and their generalization. When we further used the original interval variables, the average accuracy of the GMDH classification system was reduced by 3%, 4%, 2%, and 7% for depression symptoms, mild-to-moderate emotional symptoms, worriedness symptoms, and psychiatric symptoms, respectively. Moreover, the correlation between the original input variables and their categorical versions ranged from 0.68 for age (Spearman’s rho; p < 0.001) to 0.95 for screen time (Spearman’s rho; p < 0.001). Thus, such a discretization did not significantly reduce the amount of information.

In our study, the GMDH network was used as a classifier. This network is incremental and expands with regularized least squares (RLS), a convex algorithm; it also generates the interaction network (Figs. 2, 3, 4), leading to better clinical interpretations. The proposed GMDH network had very good diagnostic accuracy for depression symptoms classification, while it showed good diagnostic accuracy for the other outcomes. Its agreement rate with the gold standard was excellent for depression symptoms classification and fair to good for worriedness symptoms or psychiatric symptoms classification, but poor for mild-to-moderate emotional symptoms classification. The proposed system's discriminant power was fair for depression symptoms classification and limited for worriedness symptoms classification, but poor for mild-to-moderate emotional symptoms classification. The proposed system's false alarm (FA) ranged from 3 to 31% when classifying depression symptoms and mild-to-moderate emotional symptoms. However, the proposed system's statistical power was always higher than 70% for all outcomes. Moreover, the false discovery rate (a.k.a., 1-precision) ranged from 13 to 78% for classifying depression symptoms and mild-to-moderate emotional symptoms.
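The indices discussed above can all be derived from the cells of a binary confusion matrix. The following sketch shows one common set of definitions, assuming that the false alarm is the false-positive rate and that statistical power is the sensitivity; the function name and the counts are hypothetical, and the Matthews correlation coefficient (MCC) is included as a single summary index for imbalanced data.

```python
import math

def binary_metrics(tp, fp, fn, tn):
    """Performance indices from a 2x2 confusion matrix (counts)."""
    sensitivity = tp / (tp + fn)        # statistical power (true-positive rate)
    false_alarm = fp / (fp + tn)        # false-positive rate, 1 - specificity
    fdr = fp / (tp + fp)                # false discovery rate, 1 - precision
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denom
    return {
        "sensitivity": sensitivity,
        "false_alarm": false_alarm,
        "fdr": fdr,
        "mcc": mcc,
    }

# Hypothetical counts with a 10% prevalence, for illustration only.
metrics = binary_metrics(tp=80, fp=30, fn=20, tn=870)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With a low prevalence, even a small false alarm rate can produce a large false discovery rate, because the false positives are drawn from a much larger pool of negatives than the true positives.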

The proposed GMDH network had the best and worst classification performance (MCC) for depression symptoms and mild-to-moderate emotional symptoms, respectively (Table 4). The dataset was highly imbalanced for the mild-to-moderate emotional symptoms outcome (prevalence of 11.1%). However, all performance indices were consistent across the test folds (Table 3). It must be mentioned that the diagnostic accuracy of classifying psychological distress is not usually high in the literature56. Meanwhile, there could be two reasons why the GMDH significantly outperformed the MLP classifier. First, the MLP is a fully connected network, while the GMDH is not (Figs. 2, 3, 4), resulting in more parameters in the MLP. Second, the cost function of the GMDH was customized for imbalanced data, while cross-entropy was used for the MLP, which could be improved by using weighted cross-entropy57.

The GMDH algorithm was further used to classify the extended three-class psychiatric symptoms. Overall, the agreement rate of the extended system was comparable to that of the four two-class problems (Tables 4, 6). However, the original questions are used in the extended system, and the severity of all psychiatric symptoms is identified rather than defining a psychiatric symptom based on one question. Accordingly, it might be preferred in practice. For the extended system, analysis of the interaction network identified self-rated health, life satisfaction, breakfast consumption, and sleeping/screen time as the most critical factors.

In our study, we designed classification systems for psychiatric symptoms. Proper diagnosis of mental disorders, such as depressive and anxiety disorders, requires detailed analysis58. As an important limitation of our study, each of the depression and worriedness outcome variables was derived from a single question, whereas the mild-to-moderate emotional symptoms outcome was created based on five questions (confusion, insomnia, anxiety, angriness, and worthlessness). As another limitation of our methodology, depression might be embedded in emotional symptoms, since Q1–Q5 could be considered symptoms of depression. However, because depression was defined in our study as a condition that prevented students from routine activities, it was more alarming than the mild-to-moderate emotional symptoms defined based on Q1–Q5. In our analysis, no more than a trivial association between depression and any of the five mild-to-moderate emotional symptoms items was observed, although the combination of Q1–Q5 had acceptable discrimination for depression diagnosis. Moreover, the overall outcome (i.e., psychiatric symptoms) was generated based on all seven questions in our analysis.

The advantage of our study is the large sample size. Moreover, it analyzed comprehensive factors related to psychiatric symptoms to monitor direct and indirect modifiable factors. However, it is a repeated cross-sectional study59, and no causality can be inferred. We only analyzed the CASPIAN-IV data, and examining the trend and association between variables over time is the focus of our future activity. Another limitation was the possible bias in the self-reported answers of participants. Moreover, there is evidence of substance use among adolescents in Iran and their considerable psychological dysfunction60. However, substance use was not recorded in CASPIAN-IV, which is another limitation of our study.

In conclusion, our study emphasized the modifiable factors of psychiatric symptoms, including breakfast, salty snack, and sweet beverage consumption, screen time, (abdominal) obesity, sleep hours, and physical activity. Iran ranked fourth among the countries with the highest age-standardized mental disorder DALYs rates (2436.44 DALYs per 100,000, based on the GBD 2019)61. Such disorders could be rooted in childhood psychiatric symptoms, and empowering protective factors and changing modifiable risk factors might reduce such rates in the future. It is possible to design a web-based or Android application of the developed algorithm to identify whether a student could have a high risk of “psychiatric symptoms” based on the selected input variables (Figs. 2, 3, 4). Such indirect health screenings have great potential to be integrated into schools, which is the focus of our future activities.

Methods

A large population from the fourth survey of a national surveillance program, entitled “Childhood and Adolescence Surveillance and Prevention of Adult Non-communicable disease” (CASPIAN-IV), supported by the WHO/Eastern Mediterranean region and the Iranian Ministry of Health and the Ministry of Education, was analyzed in our project. The detailed methodology is published elsewhere62. We briefly describe the study population.

The population and sampling method

A sample of 14,880 students aged 6–18 years was selected by multi-stage sampling from schools of urban and rural areas of 30 Iranian provinces. After the objectives and protocols were explained, participants were enrolled in the study. Parents gave written informed consent and oral permission, while oral assent was obtained from students to express their willingness to participate in the research. Trained healthcare professionals performed all the data collection procedures. Study protocols were reviewed and approved by the Research and Ethics Council of Isfahan University of Medical Sciences (#5429-90). All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.

Outcome variables

The World Health Organization-Global School-based Student Health Survey (WHO-GSHS) was used in our study. It covers alcohol, tobacco use, hygiene, physical activity, mental health, dietary behaviors, violence, protective factors, and unintentional injuries among children and youths. After the questions were translated into Persian and those that were difficult to understand were simplified, the questionnaire's reliability and validity were assessed63,64,65. We considered psychiatric symptoms as depression symptoms, worriedness symptoms, and mild-to-moderate emotional symptoms, where the latter included confusion, insomnia, anxiety, angriness, and worthlessness66. Also, a subject with any of the depression symptoms, worriedness symptoms, or mild-to-moderate emotional symptoms was considered to have “psychiatric symptoms,” the overall outcome of the study. The psychiatric symptoms were assessed by the questions presented in Supplementary Table S1. The first five questions were used to identify mild-to-moderate emotional symptoms in our research. Those who experienced at least 3 out of 5 problems every day, more than once a week, or once a week were defined as “adolescents with mild-to-moderate emotional symptoms.” A positive answer to the 6th question was the indicator of depression symptoms. Based on the last question, students who were worried most of the time or always, such that they could not sleep at night, were considered to have worriedness symptoms35. Thus, our goal was to design and implement four binary classifiers to classify depression symptoms, mild-to-moderate emotional symptoms, worriedness symptoms, and “psychiatric symptoms” using the input variables. We further created an extended outcome using all seven questions of the questionnaire. The original responses to the seven questions were analyzed, and no dichotomization was used. The first principal component was extracted using Principal Component Analysis (PCA). Then, tertiles of the PC were extracted, splitting the subjects into “low,” “medium,” and “high” risk psychiatric symptoms groups. Thus, an extended three-class classification problem was also considered.
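The construction of the extended three-class outcome can be sketched as follows: project the item responses onto the first principal component, then cut the resulting scores at their tertiles. This is a minimal sketch under stated assumptions: the data are simulated, the 1 to 5 response scale and the function names are hypothetical, and whether the items were standardized before PCA is not stated in the text (here they are only centered).

```python
import numpy as np

def first_pc_scores(responses):
    """Project subjects onto the first principal component of the item responses."""
    centered = responses - responses.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axis = vt[0]
    # The sign of a principal axis is arbitrary; orient it so that higher item
    # responses give higher scores (assumes items are coded in ascending severity).
    if axis.sum() < 0:
        axis = -axis
    return centered @ axis

def tertile_groups(scores):
    """Split scores into three equally sized groups at the 1/3 and 2/3 quantiles."""
    lower, upper = np.quantile(scores, [1 / 3, 2 / 3])
    return np.where(scores <= lower, "low", np.where(scores <= upper, "medium", "high"))

# Hypothetical data: 300 subjects, seven items answered on a 1-5 scale.
rng = np.random.default_rng(0)
responses = rng.integers(1, 6, size=(300, 7)).astype(float)
scores = first_pc_scores(responses)
groups = tertile_groups(scores)
print({g: int((groups == g).sum()) for g in ("low", "medium", "high")})
```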

Input variables

The input variables considered in our psychiatric symptom classification system are as follows67: age category, sex, socioeconomic status (SES), physical activity level, body mass index (BMI) category, abdominal obesity, family size, residence area, sleep-time category, screen time category, (passive and active) smoking habit, life satisfaction, health status, the consumption of breakfast, fast food, salty snacks, and beverages, having a nutrition plan, the number of close friends, mothers' education level, body image, birth weight category, milk type used in infancy, and the family history of cancer and sudden death.

Measurements

In this study, age was categorized as 6–10, 11–14, and 15–19 years68. Screen time was treated as a categorical variable and consisted of the time spent watching television (TV)/video and playing computer games during leisure time; less than or equal to 4 h (≤ 4 h) was defined as low, and greater than 4 h (> 4 h) as high35. The three categories of sleep time hours per week were defined as less than or equal to 5 h (≤ 5 h; low), 5–8 h (moderate), and ≥ 8 h (high)69. The residence area was considered either urban or rural. Physical activity at school and out of school was quantified using principal component analysis (PCA), and the obtained scores were then categorized into tertiles66. Variables including family assets (such as ownership of a house, car, and computer), occupation and education level of parents, and school type (private/public) were summarized in one main PCA component. Students were then classified as having low, moderate, or high SES based on the component tertiles. Active smoking was defined as using tobacco products (cigarettes, pipe, hookah, etc.) every day, while passive smoking was defined as exposure to tobacco smoke used by others (i.e., second-hand smoking)66. Subjects with either passive or active smoking were considered smokers; otherwise, they were considered non-smokers. The general state of the participant's health was determined by the self-rated health (SRH) variable, asking “How would you describe your general state of health?” on the GSHS questionnaire, with the categories of “good,” “moderate,” and “bad”35. Life satisfaction was evaluated by asking about the degree of satisfaction with life, using a ten-point scale from 10 = very satisfied to 1 = very dissatisfied. Scores below 6 signified low satisfaction; otherwise, satisfaction was considered high70.
Body image was assessed using the question, “What do you think regarding your body size?”; the answer to this question was obtained with the following options: “much too fat,” “a bit too fat,” “about the right size,” “a bit too thin,” and “much too thin.” For the analysis, the variable was divided into overweight (much too fat and a bit too fat), underweight (much too thin and a bit too thin), and normal weight cognition71. Breakfast consumption was categorized into three groups: non-skipper (those eating breakfast 5–7 days a week), semi-skipper (those eating breakfast 3–4 days a week), and skipper (those eating breakfast 0–2 days a week)22. The students were asked about the frequency of salty snack consumption, categorized as “seldom or never,” “weekly,” and “daily” consumption72. The family size was categorized as “less than or equal to 4” or “greater than 4”. The number of close friends was categorized as none, one, two, or three or more. The nutrition plan was assessed as “adherence to a weight-modifying plan based on a special diet” or not71. Sugar-sweetened beverage consumption (i.e., soda, soft drinks) was categorized as “daily,” “weekly,” and “seldom or never”72. The consumption of fast foods (pizza, fried chicken, cheeseburgers, hamburgers, and hot dogs) was categorized into three groups: daily, weekly, and seldom or never. The education level of mothers was categorized into three groups: illiterate, diploma, and university degree. Participants' birth weight (BW; g) was asked from their parents and then categorized into three groups: low (BW < 2500 g), normal (BW: 2500–4000 g), and high (BW > 4000 g)73. We also assessed whether the children and adolescents were breastfed during their infancy74, and the variable milk type was categorized as breast milk (1) or others (0).
Moreover, we considered the family history of sudden death (yes or no) and the family history of cancer (yes or no) among the first-degree relatives of the subjects enrolled in the study.

Anthropometric measurement

In our study, trained healthcare providers performed anthropometric measurements at school. All measurements were conducted with calibrated instruments, according to standard protocols66. Height was measured in the standing position, barefoot, with the shoulders touching the wall, and was recorded to the nearest 0.2 cm. Weight was measured without shoes and in light clothing to the nearest 200 g. Waist circumference (WC) was measured with a non-elastic tape to the nearest 0.2 cm.

We calculated the BMI as weight in kilograms divided by height in meters squared (m²). The subjects were classified as underweight, healthy weight, or overweight or obese if their BMI was below the 5th percentile, between the 5th and 85th percentiles, or higher than the 85th percentile, respectively (i.e., BMI categories)75. Abdominal obesity76 was defined as a WC-to-height ratio (WHtR) of more than 0.5.
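The anthropometric definitions above can be written as a few short functions. This is a minimal sketch: the function names are hypothetical, and the age- and sex-specific BMI percentile is assumed to be supplied by the caller, because it requires reference growth charts that are not reproduced here.

```python
def bmi(weight_kg, height_cm):
    """Body mass index: weight (kg) divided by height (m) squared."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2

def bmi_category(bmi_percentile):
    """BMI category from an age- and sex-specific BMI percentile (0-100).

    The percentile itself must come from reference growth charts (assumed input).
    """
    if bmi_percentile < 5:
        return "underweight"
    if bmi_percentile <= 85:
        return "healthy weight"
    return "overweight or obese"

def abdominal_obesity(waist_cm, height_cm):
    """Abdominal obesity: waist-to-height ratio (WHtR) of more than 0.5."""
    return waist_cm / height_cm > 0.5

# Hypothetical subject, for illustration only.
print(round(bmi(50, 160), 2), bmi_category(50), abdominal_obesity(85, 160))
```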

Feature extraction

The interval variables (e.g., age, BMI, birth weight, family size, number of close friends, sleep, and screen time) were first categorized using unsupervised discretization. Although discretization reduces the flexibility of interval variables, it can improve the performance and generalization of classifiers77. The input features in our study were thus entirely categorical. Their measurement scale was nominal (e.g., sex, smoking status, milk type, family history of sudden death, family history of cancer) or ordinal (e.g., age category, SES, physical activity level, BMI category). For each categorical variable, weight-of-evidence encoding78 was used to obtain continuous covariates. For each outcome variable, stability feature selection79 was then performed. The selected features were used in the following classification procedure.
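As an illustration of weight-of-evidence encoding for a binary outcome, the following Python sketch maps each category to the log ratio of its relative frequency among positive and negative cases. The function name, the sign convention, and the additive `smoothing` term (used to avoid taking the logarithm of zero for empty cells) are assumptions of this sketch and are not taken from the study.

```python
import math
from collections import Counter


def woe_encode(categories, labels, smoothing=0.5):
    """Return {category: weight of evidence} for a binary (0/1) outcome.

    WoE(c) = ln( P(category = c | y = 1) / P(category = c | y = 0) ).
    `smoothing` is added to every count so that empty cells do not
    produce log(0); it is an assumption of this sketch.
    """
    pos = Counter(c for c, y in zip(categories, labels) if y == 1)
    neg = Counter(c for c, y in zip(categories, labels) if y == 0)
    levels = sorted(set(categories))
    n_pos = sum(pos.values()) + smoothing * len(levels)
    n_neg = sum(neg.values()) + smoothing * len(levels)
    return {
        c: math.log(((pos[c] + smoothing) / n_pos) / ((neg[c] + smoothing) / n_neg))
        for c in levels
    }


# Toy data (not from the study): sleep category versus a binary symptom.
sleep = ["short", "short", "normal", "normal", "normal", "long"]
symptom = [1, 1, 0, 0, 1, 0]
mapping = woe_encode(sleep, symptom)
encoded = [mapping[c] for c in sleep]  # continuous covariate
```

Categories that are over-represented among positive cases receive positive values, and those over-represented among negative cases receive negative values.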

Classification

Twenty-five input variables were used in GMDH (i.e., layer zero), while the outcomes depression, worriedness, and mild-to-moderate emotional symptoms were separately used as outputs. At the first layer, each pairwise interaction of the inputs was considered as a neuron. Suppose that the pair \({x}_{i,j}\) and \({x}_{i,k}\) (features no. j and k of subject no. i) is combined to generate the estimated outcome \({\tilde{y }}_{i}\) at the first layer using the second-order polynomial model shown in Eq. (1).

$${\tilde{y }}_{i}={a}_{0}+{a}_{1}\times {x}_{i,j}+{a}_{2}\times {x}_{i,k}+{a}_{3}\times {x}_{i,j}^{2}+{a}_{4}\times {x}_{i,k}^{2}+{a}_{5}\times {x}_{i,j}\times {x}_{i,k}$$
(1)

where the coefficients \(A={\left[{a}_{0},\dots ,{a}_{5}\right]}^{T}\) could be estimated using Regularized Least Squares (RLS)80 on the entire estimation set, as given in Eqs. (2) and (3).

$$A={\left({X}^{T}\times X+\lambda {I}_{6}\right)}^{-1}\times {X}^{T}\times Y$$
(2)

where

$${X}_{{N}_{e}\times 6}=\left[\begin{array}{cccccc}1& {x}_{1,j}& {x}_{1,k}& {x}_{1,j}^{2}& {x}_{1,k}^{2}& {x}_{1,j}\times {x}_{1,k}\\ 1& {x}_{2,j}& {x}_{2,k}& {x}_{2,j}^{2}& {x}_{2,k}^{2}& {x}_{2,j}\times {x}_{2,k}\\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ 1& {x}_{{N}_{e},j}& {x}_{{N}_{e},k}& {x}_{{N}_{e},j}^{2}& {x}_{{N}_{e},k}^{2}& {x}_{{N}_{e},j}\times {x}_{{N}_{e},k}\end{array}\right],\quad {Y}_{{N}_{e}\times 1}={\left[{y}_{1},{y}_{2},\dots ,{y}_{{N}_{e}}\right]}^{T}$$
(3)

and λ is the regularization parameter, \({y}_{i}\) is the output of sample no. i of the estimation set, and \({I}_{6}\) is the identity matrix of size six. In addition to the regularization, the training set was divided into an estimation set with \({N}_{e}\) samples and a validation set with \({N}_{v}\) samples to avoid over-fitting during learning. The regularization parameter was tuned using a brute-force search to maximize the Matthews correlation coefficient (MCC)81 on the validation set for each output.
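The estimation of a single neuron in Eqs. (1)–(3) can be sketched in Python with NumPy as follows. The function names are hypothetical, and the value of λ passed to `fit_neuron` is left to the caller; this is an illustrative sketch rather than the MATLAB implementation used in the study.

```python
import numpy as np


def design_matrix(xj, xk):
    """Rows [1, x_j, x_k, x_j^2, x_k^2, x_j * x_k], as in Eq. (3)."""
    xj = np.asarray(xj, dtype=float)
    xk = np.asarray(xk, dtype=float)
    return np.column_stack([np.ones_like(xj), xj, xk, xj**2, xk**2, xj * xk])


def fit_neuron(xj, xk, y, lam):
    """Solve (X^T X + lam * I_6) A = X^T Y for A, as in Eq. (2)."""
    X = design_matrix(xj, xk)
    y = np.asarray(y, dtype=float)
    return np.linalg.solve(X.T @ X + lam * np.eye(6), X.T @ y)


def predict_neuron(coeffs, xj, xk):
    """Evaluate the second-order polynomial of Eq. (1)."""
    return design_matrix(xj, xk) @ coeffs
```

Solving the linear system with `np.linalg.solve` avoids forming the explicit matrix inverse in Eq. (2) while yielding the same coefficients.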

At the first layer, each neuron's RLS coefficients were estimated on the estimation set using the above procedure. Each neuron's performance was then assessed on the validation set, and the corresponding MCC values were calculated. At most ten neurons that performed better than the previous layer's neurons were selected, and their pairwise interactions were analyzed at the next layer. The network was built up layer by layer during training until the stopping criterion, based on the “early-stopping” strategy, was met: whenever the performance on the validation set decreased at the next layer, the output of the current layer's best neuron was selected as the output of the entire GMDH network. The presented algorithm was used for the binary classification problems. For the extended multi-class problem, the macro-averaged F1-score82 was used instead of the MCC as the fitness function.
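The layer-wise construction with MCC-based selection and early stopping can be sketched as follows for the binary case. This is a simplified illustration under stated assumptions: the function names, the default value of `lam`, the decision `threshold` of 0.5, and the interpretation of "better than the previous layer's neurons" as "better than the previous layer's best validation MCC" are choices of the sketch, not details confirmed by the study.

```python
from itertools import combinations

import numpy as np


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for 0/1 labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = float(np.sum((y_true == 1) & (y_pred == 1)))
    tn = float(np.sum((y_true == 0) & (y_pred == 0)))
    fp = float(np.sum((y_true == 0) & (y_pred == 1)))
    fn = float(np.sum((y_true == 1) & (y_pred == 0)))
    denom = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0


def _design(a, b):
    """Second-order polynomial terms of a feature pair, as in Eq. (3)."""
    return np.column_stack([np.ones_like(a), a, b, a**2, b**2, a * b])


def grow_gmdh(X_est, y_est, X_val, y_val, lam=1e-3, max_keep=10, threshold=0.5):
    """Grow layers until the best validation MCC stops improving.

    Returns (best validation MCC, number of layers built).
    """
    est, val = np.asarray(X_est, float), np.asarray(X_val, float)
    y_est = np.asarray(y_est, float)
    best_prev, n_layers = -1.0, 0
    while est.shape[1] >= 2:
        candidates = []
        for j, k in combinations(range(est.shape[1]), 2):
            D = _design(est[:, j], est[:, k])
            A = np.linalg.solve(D.T @ D + lam * np.eye(6), D.T @ y_est)
            out_val = _design(val[:, j], val[:, k]) @ A
            score = mcc(y_val, (out_val >= threshold).astype(int))
            candidates.append((score, D @ A, out_val))
        # Keep at most `max_keep` neurons that beat the previous layer's best.
        kept = sorted((c for c in candidates if c[0] > best_prev),
                      key=lambda c: c[0], reverse=True)[:max_keep]
        if not kept:  # early stopping: no improvement on the validation set
            break
        best_prev = kept[0][0]
        n_layers += 1
        # Outputs of the kept neurons become the inputs of the next layer.
        est = np.column_stack([c[1] for c in kept])
        val = np.column_stack([c[2] for c in kept])
    return best_prev, n_layers
```

Because each new layer must strictly improve the best validation MCC, the loop terminates after a finite number of layers.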

Comparison with the state-of-the-art

Other classifiers, namely linear discriminant analysis (LDA), multilayer perceptron (MLP), and support vector machines (SVM), were used for comparison. LDA is a base classifier used to assess whether the classes could be accurately separated using linear boundaries. SVM, on the other hand, constructs a hyperplane in a high-dimensional space. The nonlinear SVM with the radial basis function (RBF) kernel was used in our study. We tuned the RBF kernel radius and the soft-margin parameter using the method proposed by Wu and Wang83. MLP, a feed-forward fully-connected artificial neural network (ANN) model, maps a set of inputs onto an output. In our study, one hidden layer with ten neurons and the sigmoid activation function was used. The parameters of the network were tuned on the validation set.

The validation framework

In our study, threefold cross-validation with stratified sampling84 was used. The same test folds were used for the different classifiers. In the GMDH and MLP classifiers, 75% of the training set was used for estimation, while 25% was used for validation. The performance indices listed in Supplementary Table S2 were reported for the different classifiers. True Positive, False Positive, True Negative, and False Negative counts were calculated by comparing the classifiers' results with the gold standard in four classification problems (i.e., depression symptom, mild-to-moderate emotional symptoms, worriedness symptom, and finally psychiatric symptoms). The interpretation of the reference intervals of the indices AUC, K(C), MCC, and DP85 is listed in Supplementary Table S3. Moreover, following the STARD guideline86, the 95% CIs of the performance indices were reported for the cross-validated confusion matrices. The performance of the proposed GMDH algorithm on the three-class extended problem was assessed based on the macro- and micro-averaged indices presented by Sokolova and Lapalme82.
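The data partitioning described above can be sketched in Python as follows. The function names and the fixed random seed are hypothetical, and the inner 75%/25% split is done here by simple random shuffling, which is an assumption; the study does not state how that split was drawn.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=3, seed=0):
    """Assign sample indices to test folds, preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin into the folds.
        for pos, idx in enumerate(indices):
            folds[pos % n_folds].append(idx)
    return folds


def split_estimation_validation(train_idx, val_fraction=0.25, seed=0):
    """Split training indices into estimation and validation subsets."""
    rng = random.Random(seed)
    shuffled = list(train_idx)
    rng.shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]
```

In each cross-validation round, one fold serves as the test set, and the remaining two folds form the training set that is passed to `split_estimation_validation`.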

Statistical analysis

In this paper, subjects with complete information were used in the analysis. Only for comparison87 (Table 5), the Multivariate Imputation by Chained Equations (MICE) R-package88 was used to impute the ordinal data of the enrolled subjects. All variables were reported as frequencies and percentages, since they were categorical. The χ2 test was used to compare the categorical variables across their categories. Spearman's rho was used as the correlation coefficient between interval-ordinal pairs. Kendall's \(\tau_{b}\) was used as the correlation coefficient between ordinal-ordinal pairs. The phi coefficient and the rank-biserial correlation coefficient were used for the binary-binary and binary-ordinal associations, respectively40. Cochran's Q test, with McNemar's post hoc test for pairwise comparison with Bonferroni correction, was used to compare the performance of the different classifiers. A significance level of 0.05 was used in our analysis. MATLAB version 8.6 (The MathWorks Inc., Natick, MA, USA) was used for classification, while R version 4.0.0 (R Core Team (2020), https://www.R-project.org/) was used for data imputation. The statistical analysis was performed using the SPSS statistical package, version 18.0 (SPSS Inc., Chicago, IL, USA).
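The comparison of classifiers can be illustrated with the following Python sketch, which computes the Cochran's Q statistic from a subjects-by-classifiers matrix of correct (1) and incorrect (0) decisions and then applies pairwise McNemar tests with Bonferroni correction. The function names are hypothetical, and the exact (binomial) form of McNemar's test is an assumption of the sketch; the study used SPSS and does not state which variant was applied. Under the null hypothesis, Q is referred to a χ2 distribution with the returned degrees of freedom.

```python
from itertools import combinations
from math import comb


def cochran_q(correct):
    """Cochran's Q statistic and its degrees of freedom.

    `correct[i][j]` is 1 if classifier j labeled subject i correctly, else 0.
    """
    k = len(correct[0])
    col_totals = [sum(row[j] for row in correct) for j in range(k)]
    row_totals = [sum(row) for row in correct]
    total = sum(row_totals)
    numerator = (k - 1) * (k * sum(g * g for g in col_totals) - total ** 2)
    denominator = k * total - sum(r * r for r in row_totals)
    if denominator == 0:  # all classifiers agree on every subject
        return 0.0, k - 1
    return numerator / denominator, k - 1


def mcnemar_exact(a, b):
    """Two-sided exact McNemar p-value for two 0/1 correctness vectors."""
    only_a = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    only_b = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    n = only_a + only_b
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(only_a, only_b) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def pairwise_bonferroni(correct):
    """Bonferroni-adjusted McNemar p-values for every classifier pair."""
    k = len(correct[0])
    pairs = list(combinations(range(k), 2))
    adjusted = {}
    for i, j in pairs:
        p = mcnemar_exact([r[i] for r in correct], [r[j] for r in correct])
        adjusted[(i, j)] = min(1.0, p * len(pairs))
    return adjusted
```

An adjusted p-value below the significance level of 0.05 would indicate a difference between the corresponding pair of classifiers.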