Introduction

Human relationships are the foundation of a civilized society. A family is a recognized group of people bound together by the bonds of marriage1. In married life, separation or divorce can be the most unpleasant event; it hurts the members of the family and has a negative effect on their lives2. Divorce is one of the most critical phenomena impacting individuals’ lives as well as their personal and social identity3. The rate of divorce in the Arab world has increased rapidly in recent years4. The rising divorce rate is a major problem in Saudi society because many couples consider divorce the primary solution to end their struggles5. For a very long time, economists, psychologists, and sociologists have struggled with the important and difficult question of predicting people’s social preferences6. As scholars extract textual features from online texts to support better user understanding and services, emotional signals and sentiment tendencies are also drawing more attention in the information age7.

According to local media reports, the divorce rate in Saudi Arabia has reached unprecedented levels in the last few years. In 2022, the average number of divorces was about 168 per day (seven per hour). There are three divorces for every ten marriages. According to Ministry of Justice administrative data, about 150,117 marriages and more than 57,500 divorces took place in 2020 (increases of 8.9% and 12.7%, respectively, from 2019). The overall divorce rate for the total population reached 2.18 per 1000 population, an increase of 10.1% from 2019. Saudi Arabia’s population has grown by 13.8% since 2019, and the overall divorce rate per 1000 Saudi population reached 3.64%. Among Saudi provinces, the highest overall divorce rate was recorded in Hail (4.47%), followed by Northern Borders (4.42%). The lowest overall divorce rate was recorded in Jazan Province (2.50%), followed by the Eastern Region (2.84%) and Albaha (2.84%). A large share of these divorces occurred in the last three months of the year: October (14.6%), November (13.9%), and December (14.5%)8,9.

Yöntem and İlhan10 built the Divorce Predictors Scale (DPS) on the foundation of Gottman couples therapy, which focuses on divorce prediction. The Gottman couples therapy model, created by John Gottman, a psychology professor at the University of Washington, explains the reasons that lead to divorce. According to this model, criticism, contempt, defensiveness, and stonewalling are the four main causes of problems in a relationship. The approach seeks to improve the couple’s friendship by fostering constructive conflict resolution and a sense of purpose in life. The theory rests on seven fundamental principles, among them love maps, turning towards each other and discussing, a positive perspective, solving problems together, managing conflicts, and shared meaning11.

Determining divorce rates and identifying the common causes of divorce can help to reduce the number of divorce cases. It also benefits family consultants and therapists when they counsel married couples and family members to help them resolve their disputes.

The goal of this study is to apply and compare machine learning algorithms to determine the success rate of the DPS in predicting divorce and to identify the reasons that usually lead to divorce in the Ha’il region, Kingdom of Saudi Arabia (KSA). For this purpose, three algorithms, artificial neural network (ANN), naïve Bayes (NB), and random forest (RF), were applied and compared to determine the success rate of the DPS, to predict divorce among Saudi couples, and to identify the reasons behind divorce.

Related work

People want the virtual world created by computing facilities to relate closely to the physical world, so that computing equipment can be seamlessly integrated into their lives and deliver more intelligent, universal services through real-time sensing and dynamic interaction with the physical world12. Researchers continually tune the algorithms to improve their performance because of issues such as classifier performance declining as emotion categories are refined, the lack of a connection between sentences and the entire text, and the difficulty of recognizing complex human emotions13. Text emotion analysis has grown in importance in recent years as one of the key areas of study in natural language processing. It addresses how to computationally identify and classify the opinions expressed in a piece of writing6. Thanks to advances in computing technology, a variety of pattern recognition algorithms that were previously prohibitively expensive can now be used to uncover hidden value in large datasets14. Learning new ideas improves a person’s meta data and aids in the evaluation of individual class predictions by the local algorithms15. Machine learning has been heavily incorporated into many fields, including signal processing, data mining, communications, finance, biomedicine, and robotics16,17.

Yöntem and İlhan built the DPS on the foundation of Gottman couples therapy, which focuses on divorce prediction10. The Gottman couples therapy model, which is based on empirical research, explains the most common reasons that lead to divorce. Within this paradigm, significant divorce predictors include the criteria outlined in the Sound Relationship House concept. In this model, Gottman characterized four communication styles, namely criticism, contempt, stonewalling, and defensiveness, which can predict the end of a relationship18. Turkish researchers (Mustafa Kemal Yöntem, Kemal Adem, Tahsin İlhan, and Serhat Kilicarslan) examined divorce prediction from a Turkish perspective19,10. Based on Gottman’s theory of couples, they created the DPS. They used ANN and correlation-based feature selection. Applied directly to the dataset, the radial basis function (RBF) neural network, ANN, and RF all achieved the highest prediction rate of 97.64%. After correlation-based feature selection, the success rate was 98.82% for ANN and 97.64% for RBF and RF. Thus, they obtained the best results when they used the ANN model in conjunction with correlation-based feature selection20,21.

Despite the lack of research on data mining techniques for divorce prediction, various data mining techniques, including classification, estimation, and clustering, are employed in numerous studies in psychology and psychiatry10. Baca-Garcia (2006) used data mining techniques to model psychiatrists’ hospitalization decisions for 509 suicide attempters who were assessed in the emergency room. That study’s conclusions indicate that the Forward Selection approach had a 99% success rate in correctly classifying patients22.

Song23 applied kNN, Bayes, and SVM data mining techniques to study psychological evaluation data of college students. With SVM, remarkable results were obtained for the binary classification model, with a success rate of 79.1%. Nguyen X24 employed data mining techniques to assess the effectiveness of insomnia symptoms in the management of long-term sleep apnea condition. Using decision trees, they showed that the unfavorable treatment responses were not related to long-term adjustment studies. A large number of radiology departments maintain an image database in an image archiving and communication system, which frequently offers a large number of examples for training neural networks. Since the 1960s, various computational methods for radiological diagnosis have been proposed and put into practice16.

To improve the students’ operational effectiveness in the psychological data management system, Qinghua25 implemented data mining technology based on the back-propagation ANN. The primary goal of that study was to avert psychological crises. Erikson et al.10 employed temporal data mining approaches to identify adverse medication responses. Rosenthal et al.26 used data mining techniques to examine the variables influencing occupational outcomes for people with mental impairments who received occupational rehabilitation services. Using the CHAID algorithm, they demonstrated that receiving job placement services has a favorable impact on occupational outcomes10. Bae et al.27 used decision tree algorithms to explore the factors that significantly affect the social functioning of schizophrenia patients.

Healthy marriages are essential to the development of psychological equilibrium in society. Researchers seek to counsel married couples on constructive marital remedies and to disseminate information about tried-and-true methods. Research on the rehabilitation of patients who are hospitalized after suicide attempts, on recognizing the challenging areas in the psycho-education of couples, and on the predictors of social functioning is now receiving more attention10.

In this study, the ANN, NB, and RF algorithms were applied and compared to determine the success rate of the DPS in the Ha’il region, KSA, to predict divorce among Saudi couples, and to identify the reasons behind divorce. The prediction accuracy using ANN, NB, and RF was 80.00%, 85.00%, and 90.00%, respectively. After feature selection, the accuracy of NB and RF increased to 88.33% and 91.66%, respectively, while the accuracy of ANN remained the same. Therefore, the best prediction was obtained with RF after feature selection. The results show that the DPS can predict divorce in the Ha’il region, KSA. The scale can help family counselors and therapists in case formulation and in developing intervention plans. Additionally, it may be argued that the Ha’il region sample confirmed the Gottman couples therapy predictors.

Methodology

Study design and setting

This was a descriptive study, and a survey design was used to collect data from the participants. A convenience sampling technique was applied to collect data in the Ha’il region, KSA. A Google Form was used to collect data from the participants. The form consisted of two parts. The first part covered personal information: age, gender, educational background, monthly income, kind of marriage, and marital status. The second part consisted of the 54 DPS questions. The responses to the 54 attributes were gathered on a five-point Likert scale (0 = Never, 1 = Rarely, 2 = Average, 3 = Often, 4 = Always). After data collection, the data were translated into English, cleaned, and preprocessed. Then the ANN, NB, and RF algorithms were used to determine the success rate of the DPS (Fig. 1).

Figure 1 Study design28.

Dataset description and participants

The dataset consisted of 148 cases in total. These cases were divided into two groups: a training dataset with 60% of the cases and a testing dataset with the remaining 40%. The machine learning algorithms were then applied in Google Colab twice, before and after feature selection. Google Colab was also used to produce a histogram analysis of all 54 DPS attributes. Table 1 lists the 54 attributes and Fig. 2 shows the histogram analysis.

Table 1 DPS attributes with detail29.

Figure 2 The divorce histogram analysis of all 54 attributes of DPS.

Data processing, training, and test sets

The collected data contained some missing values, which were filled with the mean value of the feature concerned. After processing, the data were divided into two parts: a training dataset consisting of about 60% of the total data and a testing dataset formed by the remaining 40%.
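The two preprocessing steps described above can be sketched in a few lines. This is a minimal illustration, not the code used in the study: the helper names `impute_column_means` and `split_train_test`, the use of `None` to mark a missing answer, the random seed, and the toy rows are all assumptions made for the example.

```python
import random

def impute_column_means(rows):
    """Replace None entries with the mean of the observed values in that column.

    Assumes every column has at least one observed value.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]

def split_train_test(rows, labels, train_fraction=0.6, seed=0):
    """Shuffle the case indices and cut them into a training and a testing part."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_train = round(train_fraction * len(rows))
    train, test = idx[:n_train], idx[n_train:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])

# Toy Likert answers (0-4) with missing values; hypothetical, not study data.
rows = [[0, 4, None], [2, None, 1], [4, 2, 3], [2, 0, 2], [None, 2, 2]]
labels = [0, 1, 1, 0, 1]

filled = impute_column_means(rows)
X_train, y_train, X_test, y_test = split_train_test(filled, labels)
```

The sketch follows the order given in the text (impute first, then split). A common alternative is to compute the column means on the training part only, so that no information from the testing cases enters the imputation.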

Feature selection

Feature selection is the process of reducing the dimension of the dataset through statistical techniques. In a nutshell, it improves mining performance, helps prevent overfitting, improves computational efficiency, speeds up the data mining process, and improves understandability30. In this study, the six most useful of the 54 attributes, those that most strongly affect divorce, were chosen using the correlation-based feature selection (CBFS) approach. In supervised machine learning, correlation-based feature selection chooses the subset of features that are strongly correlated with the class but not with one another31,32. The six attributes obtained by applying CBFS to the dataset were strongly correlated with the class.

Machine learning algorithms

This section describes the machine learning algorithms that were used to determine the success rate of the DPS in the Ha’il region, KSA. Three algorithms, ANN, NB, and RF, were applied and compared to determine the success rate of the DPS, to predict divorce among Saudi couples, and to identify the reasons behind divorce.

Artificial neural network

ANNs were developed to learn from data, generate new knowledge through learning, and deal with a very large number of variables. The ANN model was developed with the goal of simulating the human brain in a simplified manner on computers. It focuses on the mathematical modeling of biological neurons10.

The artificial neurons in an ANN are connected to one another. An artificial neuron is modeled on the four components of a biological neuron: the dendrites, the core, the axon, and the synapses. In the human brain, the dendrites transport the inputs from the sensory organs to the core. In the artificial counterpart, the input data are multiplied by weights and summed; the resulting value passes through an activation function and is transmitted, as the axon does in the biological neuron, to the synapses at the opposite end of the neuron33.

In this study, the accuracy rate was 80.00% when ANN was applied directly to the dataset. With the same approach on the feature-selected dataset, the accuracy rate remained the same.
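The weighted-sum-and-activation step described for an artificial neuron can be sketched as follows. The sigmoid activation, the layer sizes, and every weight below are hypothetical values chosen for illustration; the source does not state the architecture or activation function of the ANN that was trained.

```python
import math

def sigmoid(z):
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """Multiply each input by its weight, sum, add a bias, then activate."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

def forward(inputs, hidden_layer, output_unit):
    """Forward pass: one hidden layer feeding a single output neuron."""
    hidden = [neuron(inputs, w, b) for w, b in hidden_layer]
    w_out, b_out = output_unit
    return neuron(hidden, w_out, b_out)

# Three Likert answers and made-up weights (assumptions, not fitted values).
answers = [4, 0, 3]
hidden_layer = [([0.5, -0.2, 0.1], 0.0), ([-0.3, 0.8, 0.2], -0.1)]
output_unit = ([1.0, -1.0], 0.0)

p = forward(answers, hidden_layer, output_unit)  # a score between 0 and 1
```

In a trained network the weights would be learned from the training dataset, for example by back-propagation; here they are fixed by hand only to show how a value flows from the inputs to the output.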

Naïve bayes

NB34,35 is a probabilistic classifier that relies on Bayes’ theorem and makes strong independence assumptions about the features. Applications of NB include sentiment analysis, spam filtering, and recommendation systems. Although NB classifiers are quick and easy to build, their biggest limitation is the requirement that the features be independent of one another.

In this study, the accuracy percentage was 85.00% when the NB technique was used to analyze the dataset directly. However, while using the same technique on the feature-selected dataset, the accuracy rate increased to 88.33%.
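To make the independence assumption concrete, the following sketch implements a categorical naïve Bayes classifier for Likert-type answers. It is an illustration under stated assumptions, not the study's implementation: the function names `fit_nb` and `predict_nb`, the Laplace smoothing constant `alpha`, the class coding (1 and 0), and the toy rows are all hypothetical, and the source does not say which NB variant was used.

```python
import math
from collections import Counter

def fit_nb(rows, labels, n_levels=5, alpha=1.0):
    """Count how often each answer level occurs per feature within each class."""
    class_counts = Counter(labels)
    # value_counts[c][j][v] = number of class-c rows whose feature j equals v
    value_counts = {c: [Counter() for _ in rows[0]] for c in class_counts}
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[c][j][v] += 1
    return class_counts, value_counts, n_levels, alpha

def predict_nb(model, row):
    """Return the class with the highest log prior plus summed log likelihoods."""
    class_counts, value_counts, n_levels, alpha = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)  # log prior P(class)
        for j, v in enumerate(row):
            # Features are treated as independent given the class, so the
            # per-feature log likelihoods are simply added (Laplace-smoothed).
            likelihood = (value_counts[c][j][v] + alpha) / (n_c + alpha * n_levels)
            score += math.log(likelihood)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Toy training rows with two Likert features; hypothetical, not study data.
rows = [[0, 1], [1, 0], [0, 0], [4, 3], [3, 4], [4, 4]]
labels = [1, 1, 1, 0, 0, 0]
model = fit_nb(rows, labels)

print(predict_nb(model, [0, 1]))  # prints 1
print(predict_nb(model, [4, 4]))  # prints 0
```

Because the per-feature terms are added independently, two strongly correlated questions are effectively counted twice, which is one reason removing redundant features can help this classifier.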

Random forest

RF is a bagged ensemble method that relies on decision trees. Random feature selection makes each tree differ from the others. After many trees have been produced, they vote, and the most prevalent class is chosen. The RF algorithm can handle imbalanced data. It is fast at runtime and robust against overfitting36. When RF was applied directly to the dataset in this study, the accuracy rate was 90.00%. After feature selection, the accuracy rate rose from 90.00% to 91.66%.
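The three ingredients named above (bootstrap sampling, random feature selection, and majority voting) can be sketched with a deliberately simplified forest whose trees are one-split decision stumps. This is not the configuration used in the study: the stump depth, the number of trees, the size of the feature subset, the seed, and the toy data are all assumptions made to keep the example short.

```python
import random
from collections import Counter

def fit_stump(rows, labels, features):
    """Find the (feature, threshold) split with the fewest training errors."""
    best = None
    for j in features:
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            if not left or not right:
                continue
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, j, t, left_label, right_label)
    if best is None:  # no usable split: always predict the majority label
        majority = Counter(labels).most_common(1)[0][0]
        return (features[0], 0, majority, majority)
    return best[1:]

def fit_forest(rows, labels, n_trees=25, n_sub_features=2, seed=0):
    """Train each stump on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        features = rng.sample(range(len(rows[0])), n_sub_features)
        forest.append(fit_stump([rows[i] for i in sample],
                                [labels[i] for i in sample], features))
    return forest

def predict_forest(forest, row):
    """Every tree casts one vote; the most common class wins."""
    votes = Counter(left if row[j] <= t else right for j, t, left, right in forest)
    return votes.most_common(1)[0][0]

# Toy Likert rows with hypothetical class labels (not study data).
rows = [[0, 1, 0], [1, 0, 1], [0, 0, 1], [1, 1, 0],
        [4, 3, 4], [3, 4, 3], [4, 4, 3], [3, 3, 4]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
forest = fit_forest(rows, labels)
```

A full random forest grows deeper trees and draws a new random feature subset at every split; with a single split per tree the two coincide, which is why stumps are enough to show the mechanism.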

Correlation based feature selection

This approach was used to choose the six most useful of the 54 attributes that affect divorce. In supervised machine learning, correlation-based feature selection chooses the subset of features that are strongly correlated with the class but not with one another31,32. Six attributes were obtained by applying CBFS to the dataset, and their significance values indicated a strong correlation with the class.
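The idea of preferring features that correlate with the class but not with one another can be sketched with the merit score commonly used in correlation-based feature selection, k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation of a subset of k features. The sketch below is a simplification under stated assumptions: it uses absolute Pearson correlation and a greedy forward search that stops at a requested subset size, whereas the source does not specify the correlation measure or search strategy. All names and the synthetic data are hypothetical.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation; returns 0.0 when either column is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)

def cfs_merit(columns, y, subset):
    """High when features track the class, lower when they track each other."""
    k = len(subset)
    r_cf = sum(abs(pearson(columns[j], y)) for j in subset) / k
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = (sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def cfs_forward(columns, y, n_select):
    """Greedily add the feature that gives the largest merit at each step."""
    selected, remaining = [], list(range(len(columns)))
    while remaining and len(selected) < n_select:
        best = max(remaining, key=lambda j: cfs_merit(columns, y, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Synthetic data: the class depends on features 0 and 3; feature 1 is a
# near-copy of feature 0; features 2 and 4 are pure noise.
rng = random.Random(0)
n = 400
f0 = [rng.gauss(0, 1) for _ in range(n)]
f3 = [rng.gauss(0, 1) for _ in range(n)]
columns = [
    f0,
    [x + 0.01 * rng.gauss(0, 1) for x in f0],
    [rng.gauss(0, 1) for _ in range(n)],
    f3,
    [rng.gauss(0, 1) for _ in range(n)],
]
y = [1 if a + b > 0 else 0 for a, b in zip(f0, f3)]

selected = cfs_forward(columns, y, n_select=2)
```

On this synthetic data the search keeps feature 3 together with only one of the two near-duplicate features 0 and 1: the redundant copy is penalized by the feature-feature term even though it is strongly correlated with the class.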

Evaluation of models

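Two metrics were used to evaluate the performance of the algorithms. The accuracy of the algorithms was calculated from the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) with the following formula37: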

$$\begin{aligned} \text {Accuracy}=\frac{TP+TN}{TP+FP+FN+TN}. \end{aligned}$$

The kappa value of each algorithm was obtained by applying the following formula38:

$$\begin{aligned} k=\frac{p_o-p_e}{1-p_e}, \end{aligned}$$

where \(p_o\) is the relative observed agreement among raters (here, between the predicted and the actual class labels) and \(p_e\) is the hypothetical probability of chance agreement38.
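Both metrics can be computed directly from the four cells of a binary confusion matrix. The sketch below assumes the usual definition of chance agreement from the row and column totals; the counts are invented for illustration and are not taken from the confusion matrices of this study.

```python
def accuracy(tp, fp, fn, tn):
    """Share of all cases that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)

def cohen_kappa(tp, fp, fn, tn):
    """Agreement between predicted and actual labels, corrected for chance."""
    n = tp + fp + fn + tn
    p_o = (tp + tn) / n  # observed agreement (equals the accuracy)
    # Chance agreement: both "positive" by chance plus both "negative" by chance.
    p_pos = ((tp + fp) / n) * ((tp + fn) / n)
    p_neg = ((fn + tn) / n) * ((fp + tn) / n)
    p_e = p_pos + p_neg
    return (p_o - p_e) / (1 - p_e)

# Hypothetical counts, chosen only to make the arithmetic easy to follow.
tp, fp, fn, tn = 20, 5, 10, 15

print(round(accuracy(tp, fp, fn, tn), 2))     # prints 0.7
print(round(cohen_kappa(tp, fp, fn, tn), 2))  # prints 0.4
```

With these counts, \(p_o = 35/50 = 0.7\) and \(p_e = 0.5 \times 0.6 + 0.5 \times 0.4 = 0.5\), so \(k = (0.7 - 0.5)/(1 - 0.5) = 0.4\): an accuracy of 70% corresponds to only moderate agreement once chance is removed.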

Hyperparameter tuning

Hyperparameter tuning was performed to improve the performance of applied machine learning algorithms. Table 2 represents the hyperparameters which were used in this study to improve the performance of applied machine learning algorithms regarding divorce prediction.

Table 2 Hyperparameters for machine learning algorithms.

Results

In this study, a probabilistic classifier (NB) and an ensemble learning classifier (RF) were employed along with an ANN. The CBFS technique was used for feature selection, reducing the feature vector to just six attributes based on the identified correlation. Accuracy was used to evaluate the algorithms. Every algorithm was run twice, once with all features and once with only the selected features. Google Colab was used to execute the machine learning algorithms. The computer had 4 GB of RAM and an Intel Core i5-3320M 2.60 GHz processor.

Table 3 Success rate for ANN.

The accuracy rate of ANN was 80.00% both with and without feature selection (see Tables 3, 4).

Table 4 Confusion matrix for ANN.
Table 5 Success rate for NB.

The accuracy rate was 85.00% when NB was applied directly to the dataset, without feature selection (Tables 5, 6). After feature selection, the accuracy rate of NB rose from 85.00% to 88.33%.

Table 6 Confusion matrix for NB.
Table 7 Success rate for RF.

The accuracy rate was 90.00% when RF was applied directly to the dataset, without feature selection (Tables 7, 8). After feature selection, the accuracy rate of RF rose from 90.00% to 91.66%.

Table 8 Confusion matrix for RF.
Table 9 Values of significance through CBFS.

Table 9 lists the top six features and their significance values after applying the CBFS approach to the divorce dataset, and Fig. 3 shows the analysis of these features. Atr16, “Our views about the ideal marriage are similar”, was the most influential feature. The other attributes were Atr15, “In terms of living a good life, we both agree with each other”; Atr27, “I know my husband or wife very well”; Atr20, “I know how my partner wants to be taken care of when he/she is sick”; Atr7, “I enjoy traveling with my husband/wife”; and Atr3, “The time I spent with my husband/wife is special for me”. These features were highly correlated with the class and were used in the next phases. These DPS features might help counselors or therapists make decisions in the course of their work.

Figure 3 Divorce histogram analysis of the six most influential features.

Table 10 Classifiers performance with and without feature selection.

Table 10 shows that, when the classification methods were applied directly to the divorce dataset, the highest accuracy rate was 90.00%, obtained with the RF algorithm. After the six most influential features were selected with CBFS, the highest accuracy rate was 91.66%, again with RF. Thus, among the classification algorithms applied, the most successful result was achieved by RF combined with CBFS.

Discussion

Turkish researchers (Mustafa Kemal Yöntem, Kemal Adem, Tahsin İlhan, and Serhat Kilicarslan) examined divorce prediction from a Turkish perspective19,10. Based on Gottman’s theory of couples, they created the DPS. They used ANN and correlation-based feature selection. The RBF, ANN, and RF all achieved the highest prediction rate of 97.64%. After correlation-based feature selection, the success rate was 98.82% for ANN and 97.64% for RBF and RF. Thus, they obtained the best results when they used the ANN model in conjunction with correlation-based feature selection20,21. Baca-Garcia’s conclusions indicate that the Forward Selection approach had a 99% success rate in correctly classifying patients22. Song23 applied kNN, Bayes, and SVM data mining techniques to study psychological evaluation data of college students. With SVM, remarkable results were obtained for the binary classification model, with a success rate of 79.1%. The present study, by contrast, was conducted in the Ha’il region, KSA. The ANN, NB, and RF algorithms were applied and compared to determine the success rate of the DPS, to predict divorce among Saudi couples, and to identify the reasons behind divorce. The prediction accuracy using ANN, NB, and RF was 80.00%, 85.00%, and 90.00%, respectively. After feature selection, the success rate of NB and RF increased to 88.33% and 91.66%, respectively, while that of ANN remained the same. Therefore, the best prediction was obtained with RF after feature selection. Thus, our results align with the findings in Refs.19,10.

A strong outcome of the study is that the DPS can predict divorce in the Ha’il region, KSA. A larger dataset will most likely support these findings. The scale can help family counselors and therapists in case formulation and in developing intervention plans. Additionally, it may be argued that the Ha’il region sample confirmed the Gottman couples therapy predictors.

Conclusion

According to the findings of this study, the DPS can be helpful for divorce prediction. To find the best-performing machine learning algorithm, the algorithms were compared on the full feature set and on the selected features. Three algorithms were applied to the dataset. After feature selection, the accuracy rates of ANN, NB, and RF were 80.00%, 88.33%, and 91.66%, respectively. Therefore, the best prediction was obtained with RF after feature selection. One of the objectives of this study was to use machine learning algorithms to predict divorce among spouses in the Ha’il region, KSA. If an early detection mechanism can be put in place using the information presented in this research, it could help prevent the dissolution of thousands of families. Ministries that have direct contact with families, such as the Ministry of Family and Social Affairs, the Ministry of National Education, and the Ministry of Health, may benefit from using the DPS in their screening procedures. Counseling services personnel can use the scale to get to know the individuals who will receive family counseling and family therapy. The formulation of the case and the intervention strategy may be informed by the scale’s results. Moreover, it may be argued that the Ha’il region sample verified the divorce predictors from Gottman couples therapy. Further research should examine the effectiveness of the intervention strategies of the Gottman couples therapy model in the Ha’il region through experimental research, by creating psycho-educational programs based on Gottman couples therapy, and should use multiple attribute selection techniques to identify the correlated or highly influential attributes that best represent the Ha’il region, KSA perspective.

Ethics statement

All subjects gave their informed consent for inclusion before they participated in the study. All methods were carried out in accordance with relevant guidelines and regulations. The protocol was approved by the Research Deanship at the University of Ha’il, Saudi Arabia (number RD-21 067).