Abstract
Decision support systems can be of substantial help to medical doctors in diagnosing various diseases, especially in complicated cases. This article addresses the recognition and diagnosis of heart disease based on automatic computer processing of patients' electrocardiograms (ECGs). In the general case, the variation of ECG parameters can be represented as a random sequence of the processed signals. Developing new computational methods for such signal processing is an important research problem in creating efficient medical decision support systems. The authors consider the possibility of increasing the accuracy of diagnosing cardiovascular diseases by means of a newly proposed computational method of information processing. The method is based on the generalized nonlinear canonical decomposition of a random sequence describing the change of cardiogram parameters. The use of a nonlinear canonical model makes it possible to significantly simplify the maximum likelihood criterion for classifying diseases. This simplification is achieved by passing from the multidimensional distribution density of cardiogram parameters to a product of one-dimensional distribution densities of the independent random coefficients of the nonlinear canonical decomposition. Because no restrictions are imposed on the class of random sequences under study, the method makes it possible to achieve maximum accuracy in diagnosing cardiovascular diseases. Functional diagrams for implementing the proposed method, reflecting the features of its application, are presented. The quantitative parameters of the core of the computational diagnostic procedure can be determined in advance from preliminary statistical ECG data for different heart diseases. As a result, the developed method is computationally simple (in terms of computational complexity, computing time, etc.) and can be implemented in medical computer decision systems for monitoring cardiovascular diseases and diagnosing them in real time. The results of the numerical experiment confirm the high accuracy of the developed method for classifying cardiovascular diseases.
Introduction
Medical statistics show^{1,2,3} that diseases of the cardiovascular system are currently the main cause of death in the world (more than 30% of deaths), including among people of working age. Therefore, timely high-precision diagnosis of heart diseases, as well as their prevention and treatment at an early stage of development, is of exceptional relevance. To improve the accuracy of assessing the state of the heart, computer systems for the automatic analysis^{4,5,6} of electrocardiographic data have been widely used in recent decades. Automatic analysis of electrocardiograms is a rather complex theoretical problem, primarily because of the physiological origin of the signal^{7,8,9}, which makes it indeterminate, diverse, variable, unpredictable and nonstationary, and susceptible to numerous types of interference.
At present, the analysis process typically consists of studying isopotential and other maps generated from the recorded data by the software supplied with the signal recording equipment.
In medical practice, cardiologists' conclusions about patients' diagnoses are, as a rule, qualitative or verbal and are not always supported by sufficient quantitative data. In special or difficult cases of disease recognition, young or insufficiently experienced doctors may make diagnostic errors, and the diagnostic process may be significantly prolonged before a final, correct diagnosis is reached.
Decision support systems can be of substantial help to medical doctors in making decisions about the diagnosis of various heart diseases, especially in complicated cases. The most promising approach is based on recognizing and diagnosing heart disease through automatic computer processing of patients' electrocardiograms (ECGs).
In the general case, the variation of ECG parameters can be represented as a random sequence of the processed signals. Developing new computational methods for such signal processing is an important research problem in creating efficient medical decision support systems.
In this article, the authors consider the possibility of increasing the accuracy of diagnosing cardiovascular diseases by implementing a computational method based on the generalized nonlinear canonical decomposition of a random sequence describing the change of cardiogram parameters. Because no restrictions are imposed on the class of random sequences under study, the method makes it possible to achieve maximum accuracy in diagnosing cardiovascular diseases. Its main advantage is that the quantitative parameters of the computational diagnostic procedure can be determined in advance from preliminary statistical ECG data for different heart diseases. As a result, the developed method is computationally simple (in terms of computational complexity, computing time, etc.) and can be implemented in medical computer decision systems for monitoring cardiovascular diseases and diagnosing them in real time.
Thus, the development of efficient mathematical models and computational methods for identifying the individual characteristics of an electrocardiogram with high accuracy (with subsequent classification), as well as the creation of an automated computer diagnostic support system, is an urgent and important task of multidisciplinary research at the intersection of medicine and computer science.
The rest of the article is organized as follows. The “Background and analysis of the related works” section analyzes related work in the field of ECG processing. The “Problem statement” section formulates the problem. The “Solution” section develops the computational method and the corresponding mathematical model based on the generalized nonlinear canonical decomposition of a random sequence of the change of cardiogram parameters. The “Results of the numerical experiment” section presents modeling and simulation results for several existing methods and compares them with the proposed computational method. The paper ends with the “Conclusion” section.
Background and analysis of the related works
ECG diagnostics consists of three successive stages: (a) preprocessing, (b) feature extraction (normalization), and (c) classification. Let us analyze these stages in turn.
First stage: preprocessing reduces signal measurement noise by smoothing the electrocardiogram signal and suppressing drift and baseline wander. The most common methods used to reduce signal noise are (a) second-order low-pass and (b) high-pass Butterworth filters^{10}, (c) Daubechies wavelets^{11} and (d) orthogonal wavelet filters^{12}. In addition, techniques such as the median filter, the linear-phase high-pass filter and the mean-median filter are used for baseline adjustment.
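Two of the preprocessing operations named above, low-pass Butterworth smoothing and median-filter baseline adjustment, can be illustrated with a minimal, dependency-free sketch. The function names, the 40 Hz cutoff, the 500 Hz sampling rate and the window length are hypothetical choices, not values taken from the cited works. The coefficients follow the standard bilinear-transform design of a second-order Butterworth low-pass section; because the filter is applied causally (not forward-backward), it introduces a phase delay.

```python
import math
from statistics import median


def butterworth_lowpass_2nd(x, fc, fs):
    """Causal second-order Butterworth low-pass filter (bilinear transform).

    x  -- input samples; fc -- cutoff frequency in Hz; fs -- sampling rate in Hz.
    """
    k = math.tan(math.pi * fc / fs)  # pre-warped analog cutoff
    norm = 1.0 / (1.0 + math.sqrt(2.0) * k + k * k)
    b0 = k * k * norm
    b1, b2 = 2.0 * b0, b0
    a1 = 2.0 * (k * k - 1.0) * norm
    a2 = (1.0 - math.sqrt(2.0) * k + k * k) * norm
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        # Direct form I difference equation of the biquad section
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        y.append(yn)
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return y


def remove_baseline(x, window):
    """Subtract a running-median estimate of baseline wander (window in samples)."""
    half = window // 2
    out = []
    for n in range(len(x)):
        lo, hi = max(0, n - half), min(len(x), n + half + 1)
        out.append(x[n] - median(x[lo:hi]))
    return out


# Hypothetical usage (assumed 500 Hz recording, 40 Hz cutoff, 0.6 s median window):
# clean = butterworth_lowpass_2nd(remove_baseline(raw, 301), fc=40.0, fs=500.0)
```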
Second stage: feature extraction is an interactive process that includes a series of automatic data transformation procedures. When a large number of measured features describe the characteristics of the input signal, correlation and factor analysis can be used to reduce the dimensionality of the problem. According to the extraction and analysis methods used, the features can be divided into the following categories:

Temporal features^{13} (described in the time domain; they represent amplitude, slope and heart rate);

Spectral features^{14,15} (defined in the frequency domain; they include spectral concentration and normalized spectral moments);

Time-frequency/wavelet features^{16,17} (features extracted from the results of the wavelet transform applied to the electrocardiogram signal);

Features of the complexity of geometric distortions^{18} (various measures of the complexity of the considered segment of the electrocardiogram).
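For the temporal category, a minimal sketch of how heart-rate features might be derived from already detected R-peak positions is shown below. R-peak detection itself is assumed to have been done elsewhere, and the function name and the particular feature set (mean RR interval, heart rate, SDNN) are illustrative assumptions rather than the features used in the cited works.

```python
from statistics import mean, stdev


def rr_features(r_peaks, fs):
    """Temporal features from R-peak sample indices (fs: sampling rate in Hz)."""
    if len(r_peaks) < 2:
        raise ValueError("at least two R peaks are needed")
    # RR intervals in seconds between consecutive R peaks
    rr = [(b - a) / fs for a, b in zip(r_peaks, r_peaks[1:])]
    mean_rr = mean(rr)
    return {
        "mean_rr_s": mean_rr,
        "heart_rate_bpm": 60.0 / mean_rr,
        # Standard deviation of RR intervals, a simple variability measure
        "sdnn_s": stdev(rr) if len(rr) > 1 else 0.0,
    }


# Hypothetical usage: peaks at samples 0, 400 and 900 of a 500 Hz recording
# features = rr_features([0, 400, 900], fs=500)
```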
The feature extraction stage ends with optimizing the number of features, which removes redundant features, reduces computational costs and improves the overall performance of the system. This step uses three main categories of feature selection methods: (a) wrapper methods (recursive feature elimination^{14}, direct feature selection^{19}, genetic algorithms^{20}); (b) filter methods (correlation^{21}, chi-square^{15}, analysis of variance (ANOVA)^{22}, ReliefF^{23}); (c) built-in (embedded) methods^{24}.
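A correlation-based filter of the kind listed under (b) can be sketched as a greedy pass that keeps a feature only if its absolute Pearson correlation with every feature already kept stays below a threshold. The function names and the default threshold of 0.9 are hypothetical, and zero-variance features are assumed to have been removed beforehand (they would cause a division by zero).

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length feature columns."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def correlation_filter(columns, threshold=0.9):
    """Return indices of the features kept by a greedy correlation filter.

    columns -- list of feature columns (each a list of values over all records)
    """
    kept = []
    for j, col in enumerate(columns):
        # Keep feature j only if it is not strongly correlated with any kept feature
        if all(abs(pearson(col, columns[k])) < threshold for k in kept):
            kept.append(j)
    return kept
```

The greedy order means that, of two redundant features, the one listed first survives; ranking the features beforehand (for example by a class-separability score) would change which one is kept.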
To date, a large number of approaches have been developed to solve the problem of diagnosing cardiovascular diseases using various mathematical methods.
Third stage: classification can be carried out in several iterations, depending on the chosen recognition scheme. In some cases, the results obtained at this stage require revising the entire processing scheme. The most common classification methods are the discriminant function^{25}, cluster analysis^{26}, artificial neural networks^{27}, the naive Bayes classifier^{28}, support vector machines^{29}, k-nearest neighbors (kNN)^{30} and decision trees (DT)^{31}.
Other approaches to ECG analysis are based on chaos theory^{32}; a combination of statistical, geometric and nonlinear heart rate variability features^{32}; a semantic web ontology and heart failure expert system^{33}; the signal averaging method; multivariate analysis^{34}; recursive principal component analysis (RPCA)^{35}; the simultaneous perturbation stochastic approximation (SPSA) method^{36}; the amplitude-based (ABT), first-derivative-based (FDBT) and second-derivative-based (SDBT) techniques^{37}; the Hilbert transform^{38}; and others.
At the same time, each of the above-mentioned methods has its drawbacks and limitations. That is why the need to develop new effective methods of medical diagnostics remains relevant.
Since the values of the electrocardiogram parameters change stochastically, the diagnosis of cardiovascular diseases requires methods for recognizing random functions and random sequences.
The main method for recognizing realizations of random sequences is the Bayes decision rule, according to which a realization is assigned to the class with the maximum posterior probability. The method is theoretically optimal; however, like many of its modifications (the Neyman–Pearson criterion, the Wald criterion, etc.^{39}), it is applicable only when the stochastic properties of the classes of random sequences are fully known. If the prior probabilities of the classes are unknown, equal values are assigned to them, and the decision rule reduces to the maximum likelihood criterion. This criterion is especially important in recognition problems in which unlikely events cannot be excluded from consideration (diagnostics of emergency technical systems, medical diagnostics, etc.). However, for the maximum likelihood method, as for the Bayes rule, the problem of approximating the multivariate distribution density of a random sequence with a large number of sampling points remains unresolved.
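The relation between the two rules can be seen in a one-dimensional toy example: with equal priors the Bayes (maximum posterior) rule reduces to choosing the class with the largest likelihood, whereas unequal priors can change the decision. The Gaussian class-conditional densities, the class labels and their parameters below are hypothetical; they merely stand in for the unknown densities discussed in the text.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """One-dimensional normal density N(mu, sigma^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def bayes_decision(x, classes, priors=None):
    """Return the class with maximum posterior probability.

    classes -- dict label -> (mu, sigma) of a hypothetical class-conditional density
    priors  -- dict label -> prior probability; None means equal priors,
               in which case the rule is the maximum likelihood criterion
    """
    if priors is None:
        priors = {k: 1.0 / len(classes) for k in classes}
    # The evidence p(x) is common to all classes, so it does not affect the arg max.
    return max(classes, key=lambda k: priors[k] * gaussian_pdf(x, *classes[k]))
```

For `x = 1.0` with classes centered at 0 and 3, equal priors select the class centered at 0, while a strongly skewed prior in favor of the other class reverses the decision.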
Pugachev's canonical expansion^{40} makes it possible to pass in the decision rule from a multivariate distribution density to the product of one-dimensional distribution densities of uncorrelated random coefficients. However, this approach is valid only for Gaussian random sequences, since uncorrelatedness of the coefficients implies their independence for Gaussian sequences but not in general.
The aim of our study is to eliminate the above-mentioned disadvantage and to develop a diagnostic method that takes into account the nonlinear stochastic features of changes in cardiogram parameters.
Problem statement
The accumulated volume of statistical data on various cardiovascular diseases makes it possible to determine with high accuracy the moment characteristics \(E\left[ {C^{{\xi_{g} }} \left( {i - r_{g - 1} } \right) \cdots C^{{\xi_{2} }} \left( {i - r_{1} } \right)C^{{\xi_{1} }} \left( i \right)} \right],\) \(\sum\nolimits_{j = 1}^{g} {\xi_{j} } \le N,\) \(r_{j} = \overline{1,i - 1} ,\) \(i = \overline{1,I}\) (\(E\left[ \cdot \right]\) denotes the expected value) of the random sequences \(C\left( {i/k} \right), \, i = \overline{1,I} , \, k = \overline{1,K}\), which describe the changes in the values of the informative features of an electrocardiogram (\(K\) comprises \(K - 1\) different diseases and the normal state of a person; \(I\) is the number of informative features of an electrocardiogram). For example, with \(i = \overline{1,I}\): \(C\left( {i/1} \right)\) corresponds to Marfan syndrome, \(C\left( {i/2} \right)\) to pulmonary embolism, \(C\left( {i/3} \right)\) to heart attack, \(C\left( {i/4} \right)\) to cardiomyopathy, \(C\left( {i/5} \right)\) to pericardial disease, \(C\left( {i/6} \right)\) to rheumatic heart disease, \(C\left( {i/7} \right)\) to stroke, and \(C\left( {i/8} \right)\) to the normal state of a person; in this case \(K = 8\).
Electrocardiography yields a particular sequence of values \(c\left( i \right), \, i = \overline{1,I}\). It is necessary to determine to which class \(k^{*} \in \left\{ {1, \ldots ,K} \right\}\) this realization \(c\left( i \right), \, i = \overline{1,I}\) belongs.
Solution
Taking into account the specific features of the problem of diagnosing cardiovascular diseases, the cardiogram is, as a rule^{41,42}, analyzed using a random sequence \(\left\{ C \right\} = \left\{ {C\left( 1 \right),C\left( 2 \right), \ldots ,C\left( I \right)} \right\}, \, I = 14\), of fourteen elements. Each element corresponds to one of the most informative parameters of the electrocardiogram (Fig. 1): \(C\left( 1 \right)\) is the height of the \(P\) wave; \(C\left( 2 \right)\), the width of the \(P\) wave; \(C\left( 3 \right)\), the height of the \(Q\) wave; \(C\left( 4 \right)\), the \(PQ\) interval; \(C\left( 5 \right)\), the height of the first \(R\) wave; \(C\left( 6 \right)\), the \(QRS\) interval; \(C\left( 7 \right)\), the height of the \(S\) wave; \(C\left( 8 \right)\), the \(RR\) interval; \(C\left( 9 \right)\), the height of the \(T\) wave; \(C\left( {10} \right)\), the \(QT\) interval; \(C\left( {11} \right)\), the \(ST\) interval; \(C\left( {12} \right)\), the \(TP\) interval; \(C\left( {13} \right)\), the width of the \(U\) wave; \(C\left( {14} \right)\), the height of the \(U\) wave.
A universal method for recognizing a random sequence is the maximum likelihood criterion^{43,44}, according to which the realization \(\vec{c} = \left\{ {c\left( 1 \right), \ldots ,c\left( {14} \right)} \right\}\) is assigned to the class \(k^{*} \in \left\{ {1, \ldots ,K} \right\}\) for which the following condition is met^{45,46}:
$$k^{*} = \arg \mathop {\max }\limits_{k} f_{I} \left( {\vec{c}/k} \right),\quad k = \overline{1,K} ,$$(1)
where \(f_{I} \left( {\vec{c}/k} \right), \, k = \overline{1,K} , \, I = 14\), is the conditional distribution density of the features \(\vec{c}\), given that the realization belongs to class \(k\).
Thus, to solve the recognition problem, it is necessary to estimate the unknown densities \(f_{I} \left( {\vec{c}/k} \right), \, k = \overline{1,K}\), which, given their rather large dimension (\(I = 14\)), is a complex and time-consuming procedure. Under the simplifying assumption that the random sequence contains stochastic relations only between pairs of parameters (\(E\left[ {C^{\nu } \left( j \right)C^{\mu } \left( i \right)} \right] \ne 0,\) \(E\left[ {C^{{\xi_{g} }} \left( {i - r_{g - 1} } \right) \ldots C^{{\xi_{2} }} \left( {i - r_{1} } \right)C^{{\xi_{1} }} \left( i \right)} \right] = 0\)), the problem is greatly simplified by passing from the sequence \(C\left( i \right),\,i = \overline{1,14}\) to the analysis of a set of independent random coefficients \(P_{i}^{(N)} , \, i = \overline{1,14}\) of the canonical expansion^{47,48}:
The coordinate functions \(T_{h\nu }^{(\lambda )} \left( i \right)\) are determined from the relation^{49,50}:
where \(D_{\lambda } \left( i \right)\) are the variances of the random coefficients \(P_{i}^{(\lambda )}\).
In this case, replacing \(\vec{c}\) by the vector \(\vec{p}\) and taking into account that \(f_{I} \left( {\vec{p}/k} \right) = \prod\nolimits_{i = 1}^{I} {f_{1} \left( {p_{i}^{(N)} /k} \right)} ,\quad \, k = \overline{1,K} , \, I = 14\), we can write the decision rule in the following form:
$$k^{*} = \arg \mathop {\max }\limits_{k} \prod\limits_{i = 1}^{I} {f_{1} \left( {p_{i}^{(N)} /k} \right)} ,\quad k = \overline{1,K} .$$(6)
The recognition problem is therefore reduced to the successive approximation of fourteen one-dimensional distribution densities for each class.
Although decision rule (1) is thereby significantly simplified, the transition from the vector \(\vec{c}\) to the vector \(\vec{p}\) is valid only if the random sequence \(C\left( i \right), \, i = \overline{1,14}\) has no stochastic relations other than \(E\left[ {C^{\nu } \left( j \right)C^{\mu } \left( i \right)} \right]\).
To eliminate all existing probabilistic relationships \(E\left[ {C^{{\xi_{g} }} \left( {i - r_{g - 1} } \right) \ldots C^{{\xi_{2} }} \left( {i - r_{1} } \right)C^{{\xi_{1} }} \left( i \right)} \right]\), we introduce the following array of random variables:
The parameters of the matrix C are determined by the expressions
A priori information \(E\left[ {C^{{\xi_{g} }} \left( {i - r_{g - 1} } \right) \cdots C^{{\xi_{2} }} \left( {i - r_{1} } \right)C^{{\xi_{1} }} \left( i \right)} \right]\) about the sequence \(C\left( i \right), \, i = \overline{1,14}\) can be obtained by determining the cross-correlations of the elements of the array \(C\). Consider this array as a vector random sequence \(\vec{C}\), each component of which corresponds to a row of the array \(C\). Applying the vector linear canonical decomposition to \(\vec{C}\) gives the following expression for the first component of (7):
where \(M\left( \nu \right) = \begin{cases} N - 1, & \text{for } \nu \ge N - 1; \\ \nu , & \text{for } \nu < N - 1. \end{cases}\)
The coordinate functions \(\omega \left( {\nu ;\alpha_{1} /i;b_{1} \cdots b_{m - 1} ;a_{1} \cdots a_{m} } \right)\), \(\omega \left( {\nu ;\beta_{1} , \ldots \beta_{n - 1} ;\alpha_{1} , \ldots \alpha_{n} /i;b_{1} , \ldots b_{m - 1} ;a_{1} , \ldots a_{m} } \right)\) of the canonical decomposition (8) are determined by the relations:
where \(D^{{\alpha_{1} }} \left( \nu \right)\) and \(D^{{\beta_{1} , \ldots \beta_{n - 1} ;\alpha_{1} , \ldots \alpha_{n} }} \left( \nu \right)\) are the variances of the random coefficients \(P_{{\alpha_{1} }} \left( \nu \right)\) and \(P_{{\beta_{1} , \ldots \beta_{n - 1} ;\alpha_{1} , \ldots \alpha_{n} }} \left( \nu \right)\), respectively.
The values \(b_{1} , \ldots ,b_{m - 1} ;a_{1} , \ldots ,a_{m}\) vary in the intervals \(b_{\mu } \in \left[ {b_{\mu - 1}^{(m)} ;b_{\mu }^{\prime (m)} } \right]\) and \(a_{\mu } \in \left[ {1;a_{\mu }^{\prime (m)} } \right]\), \(m = \overline{1,N}\), respectively. The right boundaries of the intervals are determined from the formulas
The block diagram of the algorithm for calculating the parameters of the canonical expansion (8) is shown in Fig. 2.
Expression (8) is a nonlinear canonical decomposition of the investigated sequence that fully takes into account the stochastic properties \(E\left[ {C^{{\xi_{g} }} \left( {i - r_{g - 1} } \right) \ldots C^{{\xi_{2} }} \left( {i - r_{1} } \right)C^{{\xi_{1} }} \left( i \right)} \right]\). Therefore, the random coefficients \(P_{N - 1} \left( 1 \right)\), \(P_{1;1,N - 2} \left( 2 \right)\), \(P_{1,2;1,1,N - 3} \left( 3 \right)\), …, \(P_{14 - N + 1,14 - N + 2, \ldots ,13;1,1 \ldots 1} \left( {14} \right)\), calculated at the last iteration, are independent random variables, and the decision rule takes the form:
The absence of assumptions about the form of the distribution densities of the random variables \(P_{N - 1} \left( 1 \right)\), \(P_{1;1,N - 2} \left( 2 \right)\), \(P_{1,2;1,1,N - 3} \left( 3 \right)\), …, \(P_{14 - N + 1,14 - N + 2, \ldots ,13;1,1 \ldots 1} \left( {14} \right)\) makes it necessary to describe them by nonparametric methods. The simplest and most efficient approach under these conditions is the use of nonparametric Parzen-type estimates^{51}.
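Estimating the one-dimensional densities of the independent coefficients with Parzen-type estimates and then applying the product-form decision rule might be sketched as follows. The Gaussian kernel, Silverman's rule-of-thumb bandwidth and all function names are assumptions, since the text only specifies Parzen-type estimates. A sum of logarithms replaces the product of densities to avoid numerical underflow; this does not change the arg max.

```python
import math


def silverman_bandwidth(sample):
    """Rule-of-thumb bandwidth 1.06 * s * n^(-1/5) for a Gaussian kernel (assumption)."""
    n = len(sample)
    m = sum(sample) / n
    s = math.sqrt(sum((v - m) ** 2 for v in sample) / (n - 1))
    return 1.06 * s * n ** (-0.2)


def parzen_density(x, sample, h):
    """Parzen (Gaussian-kernel) estimate of a one-dimensional density at x."""
    norm = 1.0 / (len(sample) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in sample)


def classify(p, training, floor=1e-300):
    """Pick the class k maximizing sum_i log f_1(p_i / k).

    p        -- coefficient values computed for the cardiogram under test
    training -- dict k -> list of per-coefficient training samples, where
                training[k][i] holds the values of coefficient i for class k
    """
    best, best_score = None, -math.inf
    for k, per_coeff in training.items():
        score = 0.0
        for x, sample in zip(p, per_coeff):
            f = parzen_density(x, sample, silverman_bandwidth(sample))
            score += math.log(max(f, floor))  # floor guards against log(0)
        if score > best_score:
            best, best_score = k, score
    return best
```

The bandwidth is the main tuning choice here: a value that is too small makes the estimated densities spiky and sensitive to individual training cardiograms, while one that is too large blurs the differences between classes.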
The computational method for diagnosing cardiovascular diseases consists of the following stages:

(1) Estimation of the moment functions \(E\left[ {C^{{\xi_{g} }} \left( {i - r_{g - 1} } \right) \ldots C^{{\xi_{2} }} \left( {i - r_{1} } \right)C^{{\xi_{1} }} \left( i \right)} \right],\) \(\sum\limits_{j = 1}^{g} {\xi_{j} } \le N,\) \(r_{j} = \overline{1,i - 1} ,\) \(i = \overline{1,14}\) based on the statistical information \(c_{l} \left( i \right), \, i = \overline{1,14} , \, l = \overline{1,L}\);

(2) Formation of the canonical decompositions (8) for the various classes (diseases) of random sequences;

(3) Synthesis of the one-dimensional distribution densities of the independent random coefficients \(P_{N - 1} \left( 1 \right)\), \(P_{1;1,N - 2} \left( 2 \right)\), \(P_{1,2;1,1,N - 3} \left( 3 \right)\), …, \(P_{14 - N + 1,14 - N + 2, \ldots ,13;1,1 \ldots 1} \left( {14} \right)\);

(4) Calculation of the values \(p_{N - 1} \left( 1 \right)\), \(p_{1;1,N - 2} \left( 2 \right)\), \(p_{1,2;1,1,N - 3} \left( 3 \right)\), …, \(p_{14 - N + 1,14 - N + 2, \ldots ,13;1,1 \ldots 1} \left( {14} \right)\) for a given cardiogram \(\vec{c} = \left\{ {c\left( 1 \right), \ldots ,c\left( {14} \right)} \right\}\);

(5) Determination of the class to which the cardiogram \(\vec{c} = \left\{ {c\left( 1 \right), \ldots ,c\left( {14} \right)} \right\}\) belongs, based on the decision rule (11).
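Stage (1) amounts to replacing each expected value by a sample mean over the \(L\) training realizations of a class. The minimal sketch below assumes that realizations are stored as Python lists indexed from zero, so that \(C(i)\) is `c[i - 1]`; the function name and argument layout are hypothetical.

```python
def mixed_moment(samples, i, lags, powers):
    """Sample estimate of E[C^{xi_g}(i - r_{g-1}) ... C^{xi_2}(i - r_1) C^{xi_1}(i)].

    samples -- training realizations c_l, l = 1..L, with C(i) stored at c[i - 1]
    lags    -- [r_1, ..., r_{g-1}], each in 1..i-1
    powers  -- [xi_1, ..., xi_g]; xi_1 applies to C(i), xi_2 to C(i - r_1), ...
    """
    if len(powers) != len(lags) + 1:
        raise ValueError("expected exactly one more power than lags")
    if any(not 1 <= r <= i - 1 for r in lags):
        raise ValueError("each lag must lie in 1..i-1")
    positions = [i] + [i - r for r in lags]  # time indices i, i - r_1, ...
    total = 0.0
    for c in samples:
        term = 1.0
        for pos, xi in zip(positions, powers):
            term *= c[pos - 1] ** xi
        total += term
    return total / len(samples)
```

Looping this function over all admissible combinations of lags and powers with \(\sum \xi_{j} \le N\) yields the full set of moment functions for one class.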
The functional diagram of the system for diagnosing cardiovascular diseases based on the developed method is shown in Fig. 3.
Results of the numerical experiment
The proposed method was tested on statistical data for nine cardiovascular diseases: (a) mild neurocirculatory dystonia, \(\left\{ {C\left( i \right)/1} \right\}, \, i = \overline{1,14}\); (b) neurocirculatory dystonia of moderate degree, \(\left\{ {C\left( i \right)/2} \right\}, \, i = \overline{1,14}\); (c) severe neurocirculatory dystonia, \(\left\{ {C\left( i \right)/3} \right\}, \, i = \overline{1,14}\); (d) stenocardia of the first functional class, \(\left\{ {C\left( i \right)/4} \right\}, \, i = \overline{1,14}\); (e) myocardial hypertrophy, \(\left\{ {C\left( i \right)/5} \right\}, \, i = \overline{1,14}\); (f) severe arrhythmia, \(\left\{ {C\left( i \right)/6} \right\}, \, i = \overline{1,14}\); (g) aortic stenosis, \(\left\{ {C\left( i \right)/7} \right\}, \, i = \overline{1,14}\); (h) stenocardia of the second functional class, \(\left\{ {C\left( i \right)/8} \right\}, \, i = \overline{1,14}\); (i) stenocardia of the third functional class, \(\left\{ {C\left( i \right)/9} \right\}, \, i = \overline{1,14}\).
For the numerical experiment, two hundred different cardiograms for each disease \(\left\{ {C\left( i \right)/k} \right\}, \, i = \overline{1,14} , \, k = \overline{1,K} , \, K = 9\), were taken from the Physikalisch-Technische Bundesanstalt (PTB) dataset, a publicly available database compiled by the National Metrology Institute of Germany. It contains digitized ECG recordings of both healthy subjects and patients, provided for research purposes^{52}.
Testing the statistical hypothesis of the independence of the random coefficients \(P_{N - 1} \left( 1 \right)\), \(P_{1;1,N - 2} \left( 2 \right)\), \(P_{1,2;1,1,N - 3} \left( 3 \right)\), …, \(P_{14 - N + 1,14 - N + 2, \ldots ,13;1,1 \ldots 1} \left( {14} \right)\) with the \(\chi^{2}\) criterion^{53} and the Bloom criterion^{54} supported the hypothesis of independence of the coefficients at \(N = 4\) for all the studied sequences with a probability of at least \(P_{D} = 0.98\) (for a linear decomposition, there is a statistically significant relationship between the random coefficients; for \(N = 3\), there are insufficient grounds for accepting the hypothesis of independence of the random coefficients for \(i = 3, \, 6, \, 7, \, 8\)). Thus, the canonical expansion (8) for \(N = 4\) (whose random coefficients form an array of independent random variables), together with the corresponding sets of coordinate functions, is an adequate model^{40} of the studied random sequences \(\left\{ {C\left( i \right)/k} \right\}, \, i = \overline{1,14} , \, k = \overline{1,9}\). The following methods were used for recognition: (a) the linear criterion (rule (6) for \(E\left[ {C\left( j \right)C\left( i \right)} \right] \ne 0,\) \(E\left[ {C^{\nu } \left( j \right)C^{\mu } \left( i \right)} \right] = 0, \, \nu + \mu > 2\)); (b) the polynomial criterion (6)^{55}; (c) the fuzzy logic method^{56,57,58}; (d) a neural network^{59,60} based on the fourth-order Daubechies wavelet function and the Levenberg–Marquardt learning algorithm; (e) the generalized nonlinear criterion (11). Let us consider these criteria in detail:

(a) Linear criterion
$$k^{*} = \arg \mathop {\max }\limits_{k} \prod\limits_{i = 1}^{14} {f_{1} \left( {p_{i} /k} \right)} ,\quad k = \overline{1,9} ,$$(12)
where \(p_{i} , \, i = \overline{1,I} , \, I = 14\), are the values of the uncorrelated random coefficients \(P_{i} , \, i = \overline{1,14}\):
$$P_{i} = C\left( i \right) - E\left[ {C\left( i \right)} \right] - \sum\limits_{\nu = 1}^{i - 1} {P_{\nu } T_{\nu } \left( i \right)} ,\quad i = \overline{1,14} .$$
Coordinate functions:
$$T_{\nu } \left( i \right) = \frac{1}{{D\left( \nu \right)}}\left( {E\left[ {C\left( \nu \right)C\left( i \right)} \right] - E\left[ {C\left( \nu \right)} \right]E\left[ {C\left( i \right)} \right] - \sum\limits_{\rho = 1}^{\nu - 1} {D\left( \rho \right)T_{\rho } \left( \nu \right)T_{\rho } \left( i \right)} } \right),\quad \nu = \overline{1,i} ,\quad i = \overline{1,14} .$$
Variances of the random coefficients:
$$D\left( i \right) = E\left[ {C^{2} \left( i \right)} \right] - E^{2} \left[ {C\left( i \right)} \right] - \sum\limits_{\rho = 1}^{i - 1} {D\left( \rho \right)\left\{ {T_{\rho } \left( i \right)} \right\}^{2} } ,\quad i = \overline{1,14} .$$
(b) Polynomial criterion
$$k^{*} = \arg \mathop {\max }\limits_{k} \prod\limits_{i = 1}^{14} {f_{1} \left( {p_{i}^{(3)} /k} \right)} ,\quad k = \overline{1,9} .$$(13)
Random coefficients:
$$P_{i}^{(\lambda )} = C^{\lambda } \left( i \right) - \sum\limits_{\nu = 1}^{i - 1} {\sum\limits_{j = 1}^{3} {P_{\nu }^{(j)} T_{\lambda \nu }^{(j)} \left( i \right)} } - \sum\limits_{j = 1}^{\lambda - 1} {P_{i}^{(j)} T_{\lambda i}^{(j)} \left( i \right)} ,\quad i = \overline{1,14} .$$
Coordinate functions:
$$\begin{aligned} T_{h\nu }^{(\lambda )} \left( i \right) = & \,\frac{1}{{D_{\lambda } \left( \nu \right)}}\left( {E\left[ {C^{\lambda } \left( \nu \right)C^{h} \left( i \right)} \right] - E\left[ {C^{\lambda } \left( \nu \right)} \right]E\left[ {C^{h} \left( i \right)} \right] - \sum\limits_{\rho = 1}^{\nu - 1} {\sum\limits_{j = 1}^{3} {D_{j} \left( \rho \right)} } } \right. \\ & \times \,\left. {T_{\lambda \rho }^{(j)} \left( \nu \right)T_{h\rho }^{(j)} \left( i \right) - \sum\limits_{j = 1}^{\lambda - 1} {D_{j} \left( \nu \right)T_{\lambda \nu }^{(j)} \left( \nu \right)T_{h\nu }^{(j)} \left( i \right)} } \right),\quad \nu = \overline{1,i} ,\quad i = \overline{1,14} . \\ \end{aligned}$$
Variances of the random coefficients:
$$\begin{aligned} D_{\lambda } \left( i \right) = & \,E\left[ {C^{2\lambda } \left( i \right)} \right] - E^{2} \left[ {C^{\lambda } \left( i \right)} \right] - \sum\limits_{\rho = 1}^{i - 1} {\sum\limits_{j = 1}^{3} {D_{j} \left( \rho \right)} } \\ & \times \,\left\{ {T_{\lambda \rho }^{(j)} \left( i \right)} \right\}^{2} - \sum\limits_{j = 1}^{\lambda - 1} {D_{j} \left( i \right)\left\{ {T_{\lambda i}^{(j)} \left( i \right)} \right\}^{2} } ,\quad i = \overline{1,14} . \\ \end{aligned}$$
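Read together, criteria (12) and (13) suggest a compact implementation. If the powers \(C^{\lambda}(i)\) are ordered lexicographically by \((i, \lambda)\), the recursions for \(T\) and \(D\) appear to coincide with an \(LDL^{\mathsf T}\) (Gram–Schmidt) decomposition of the covariance matrix of that extended vector, with degree 1 giving the linear expansion and degree 3 the polynomial one. The sketch below relies on this reading, which is our assumption rather than a statement of the source. It estimates moments from training samples, centers the powers (the linear expansion subtracts \(E[C(i)]\); for the polynomial coefficients as written this only shifts each coefficient by a constant) and computes the coefficients of a new realization. All names are hypothetical.

```python
def extended_vector(c, degree):
    """Powers C(i), C^2(i), ..., C^degree(i) for i = 1..I, ordered by (i, lambda)."""
    return [x ** lam for x in c for lam in range(1, degree + 1)]


def fit_canonical(samples, degree):
    """Estimate means, coordinate functions T and variances D from training samples.

    T[r][k] links coefficient r to component k of the extended vector;
    D[k] is the variance of coefficient k (assumed to be strictly positive).
    """
    ys = [extended_vector(c, degree) for c in samples]
    n, size = len(ys[0]), len(ys)
    mean = [sum(y[k] for y in ys) / size for k in range(n)]
    cov = [[sum((y[u] - mean[u]) * (y[v] - mean[v]) for y in ys) / size
            for v in range(n)] for u in range(n)]
    T = [[0.0] * n for _ in range(n)]
    D = [0.0] * n
    for k in range(n):
        for r in range(k):
            # Covariance minus the part already explained by earlier coefficients
            T[r][k] = (cov[r][k] - sum(D[q] * T[q][r] * T[q][k] for q in range(r))) / D[r]
        T[k][k] = 1.0
        D[k] = cov[k][k] - sum(D[q] * T[q][k] ** 2 for q in range(k))
    return mean, T, D


def coefficients(c, degree, mean, T):
    """Random coefficients of a realization c (centered powers minus earlier terms)."""
    y = extended_vector(c, degree)
    p = []
    for k in range(len(y)):
        p.append(y[k] - mean[k] - sum(p[r] * T[r][k] for r in range(k)))
    return p
```

With `degree=3`, the coefficient \(p_{i}^{(3)}\) used in (13) would be `p[(i - 1) * 3 + 2]` for \(i = 1, \ldots, 14\). On the training samples themselves, the computed coefficients have a diagonal sample covariance matrix equal to `D`, which is a convenient sanity check.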
(c) Fuzzy logic method for medical diagnostics
This method is based on a fuzzy system with a hierarchically structured rule base^{56}.
Input parameters: \(p_{1}\), increase in the double product per kilogram of the patient's body weight; \(p_{2}\), increase in the double product per kilogram of physical exertion; \(p_{3}\), phosphorylation coefficient; \(p_{4}\), age of the patient; \(p_{5}\), double product of heart rate and arterial pressure; \(p_{6}\), adenosine triphosphoric acid; \(p_{7}\), adenosine diphosphoric acid; \(p_{8}\), adenylic acid; \(p_{9}\), ratio of lactic to pyruvic acid content; \(p_{10}\), maximal oxygen consumption per kilogram of the patient's body weight; \(p_{11}\), increase in the double product in response to submaximal physical exertion; \(p_{12}\), tolerance to physical activity.
The diagnosis is determined from expressions of the form:
where \(c_{1}\), …, \(c_{9}\) denote the diagnoses: \(c_{1}\), mild neurocirculatory dystonia, \(\left\{ {C\left( i \right)/1} \right\}\); \(c_{2}\), neurocirculatory dystonia of moderate degree, \(\left\{ {C\left( i \right)/2} \right\}\); \(c_{3}\), severe neurocirculatory dystonia, \(\left\{ {C\left( i \right)/3} \right\}\); \(c_{4}\), stenocardia of the first functional class, \(\left\{ {C\left( i \right)/4} \right\}\); \(c_{5}\), myocardial hypertrophy, \(\left\{ {C\left( i \right)/5} \right\}\); \(c_{6}\), severe arrhythmia, \(\left\{ {C\left( i \right)/6} \right\}\); \(c_{7}\), aortic stenosis, \(\left\{ {C\left( i \right)/7} \right\}\); \(c_{8}\), stenocardia of the second functional class, \(\left\{ {C\left( i \right)/8} \right\}\); \(c_{9}\), stenocardia of the third functional class, \(\left\{ {C\left( i \right)/9} \right\}\); in all cases \(i = \overline{1,14}\). Tables 1, 2 and 3 form the information base for a system of fuzzy logic equations that link the membership functions of the diagnoses and of the input variables.
Tables 1, 2 and 3 use the following abbreviations for fuzzy terms: L (low), BA (below average), A (average), AA (above average), H (high).
For example, if \(p_{1}\) = A, \(p^{(1)}\) = H and \(p^{(2)}\) = AA, then, according to the last row of Table 1, the diagnosed disease is stenocardia of the third functional class.
Parameter \(p^{(1)}\) takes the value H in three cases (rule base, Table 2):

(a) \(p_{2}\) = L, \(p_{3}\) = L, \(p_{4}\) = L, \(p_{5}\) = AA, \(p_{10}\) = L, \(p_{11}\) = L;

(b) \(p_{2}\) = BA, \(p_{3}\) = L, \(p_{4}\) = BA, \(p_{5}\) = H, \(p_{10}\) = L, \(p_{11}\) = BA;

(c) \(p_{2}\) = L, \(p_{3}\) = BA, \(p_{4}\) = BA, \(p_{5}\) = AA, \(p_{10}\) = L, \(p_{11}\) = L.
Parameter \(p^{(2)}\) takes the value AA if one of the following conditions is met (rule base, Table 3):

(a) \(p_{6}\) = BA, \(p_{7}\) = A, \(p_{8}\) = BA, \(p_{9}\) = A, \(p_{12}\) = A;

(b) \(p_{6}\) = AA, \(p_{7}\) = BA, \(p_{8}\) = A, \(p_{9}\) = BA, \(p_{12}\) = BA;

(c) \(p_{6}\) = L, \(p_{7}\) = A, \(p_{8}\) = A, \(p_{9}\) = BA, \(p_{12}\) = A.
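The rule bases above can be evaluated with the usual max–min composition: the degree to which \(p^{(1)}\) is H equals the maximum, over the rows of Table 2, of the minimum membership degree of the conditions in that row. The sketch below assumes this max–min form (the paper's own fuzzy logic equations are not reproduced in this section) and takes the membership degrees of the inputs as already computed; the data layout and names are hypothetical.

```python
# Rules for p^(1) = H (Table 2), one dict per row: input -> required fuzzy term.
RULES_P1_HIGH = [
    {"p2": "L", "p3": "L", "p4": "L", "p5": "AA", "p10": "L", "p11": "L"},
    {"p2": "BA", "p3": "L", "p4": "BA", "p5": "H", "p10": "L", "p11": "BA"},
    {"p2": "L", "p3": "BA", "p4": "BA", "p5": "AA", "p10": "L", "p11": "L"},
]

# Rules for p^(2) = AA (Table 3).
RULES_P2_ABOVE_AVERAGE = [
    {"p6": "BA", "p7": "A", "p8": "BA", "p9": "A", "p12": "A"},
    {"p6": "AA", "p7": "BA", "p8": "A", "p9": "BA", "p12": "BA"},
    {"p6": "L", "p7": "A", "p8": "A", "p9": "BA", "p12": "A"},
]


def rule_degree(rules, memberships):
    """Max-min evaluation (assumed form): OR over rows via max, AND within a row via min.

    memberships -- dict input -> dict fuzzy term -> degree in [0, 1];
                   missing inputs or terms are treated as degree 0.
    """
    return max(
        min(memberships.get(var, {}).get(term, 0.0) for var, term in rule.items())
        for rule in rules
    )
```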

(d) Neural network
A fourth-order Daubechies wavelet function was used, and the network was trained with the Levenberg–Marquardt algorithm^{59}.
The approximation and detail coefficients of the discrete wavelet transform are computed as follows:
$$\begin{aligned} & W_{\gamma } \left( {j_{0} ,k} \right) = \frac{1}{M}\sum\limits_{x} {F\left( x \right)} \gamma_{{j_{0} ,k}} \left( x \right), \\ & W_{\phi } \left( {j,k} \right) = \frac{1}{M}\sum\limits_{x} {F\left( x \right)} \phi_{j,k} \left( x \right), \\ \end{aligned}$$
where \(\gamma_{j,k} \left( x \right)\) and \(\phi_{j,k} \left( x \right)\) are families of basis functions.
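One level of such a decomposition can be sketched with the four-coefficient Daubechies filter (D4) and periodic extension of the signal. Whether "fourth order" in the text means this 4-tap filter or the 8-tap filter that some libraries call db4 is not specified, so the filter choice is an assumption. The sketch also uses the orthonormal filter-bank normalization rather than the \(1/M\) factor of the formulas above, and its function name is hypothetical.

```python
import math

# Four-tap Daubechies (D4) low-pass (scaling) filter coefficients.
_S3, _NORM = math.sqrt(3.0), 4.0 * math.sqrt(2.0)
H = [(1 + _S3) / _NORM, (3 + _S3) / _NORM, (3 - _S3) / _NORM, (1 - _S3) / _NORM]
# Quadrature-mirror high-pass (wavelet) filter: g[n] = (-1)^n h[3 - n].
G = [(-1) ** n * H[3 - n] for n in range(4)]


def dwt_d4_level(x):
    """One level of the periodic D4 DWT; returns (approximation, detail) lists."""
    n = len(x)
    if n < 4 or n % 2:
        raise ValueError("signal length must be even and at least 4")
    # Filter and downsample by two, wrapping around the end of the signal.
    approx = [sum(H[m] * x[(2 * k + m) % n] for m in range(4)) for k in range(n // 2)]
    detail = [sum(G[m] * x[(2 * k + m) % n] for m in range(4)) for k in range(n // 2)]
    return approx, detail
```

Because this periodized transform is orthogonal, the energy of the input equals the combined energy of the approximation and detail coefficients, and a constant signal produces zero detail coefficients. Applying the function repeatedly to the approximation part gives a multilevel decomposition.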
The output signal of each neuron of the output layer was formed as
The continuous bipolar sigmoid function \(F\left( x \right) = \tanh \left( x \right)\) was used as the activation function of each neuron.
Tables 4 and 5 show the comparative diagnostic results.
The data in Tables 4 and 5 indicate the low efficiency of the linear criterion (12), which uses the minimum amount of a priori information (only the moments \(E\left[ {C\left( j \right)C\left( i \right)} \right]\)). Adding the stochastic relations \(E\left[ {C^{\nu } \left( j \right)C^{\mu } \left( i \right)} \right]\) in criterion (13) increases the accuracy of diagnosing cardiovascular diseases compared with (12). The highest diagnostic accuracy is achieved with the proposed decision rule (11), which makes the fullest use of the stochastic properties \(E\left[ {C^{{\xi_{l} }} \left( {i - r_{l - 1} } \right) \cdots C^{{\xi_{2} }} \left( {i - r_{1} } \right)C^{{\xi_{1} }} \left( i \right)} \right]\) of the random sequences under study. For the fuzzy logic and neural network methods, the quality of decision-making about a cardiovascular disease is significantly limited by the lack of a rigorous mathematical apparatus for analyzing fuzzy equations and by the restricted set of available wavelet functions.
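The criteria compared above differ only in which moments of the random sequence \(C(i)\) they use. As a hedged sketch (a hypothetical helper, not code from the paper), such moments can be estimated for one disease class by averaging over its training realizations \(\left\{ {C\left( i \right)/k} \right\}\). Here `lags` holds \(0, r_{1}, \ldots, r_{l-1}\) and `powers` holds \(\xi_{1}, \ldots, \xi_{l}\); indices are 0-based, whereas the paper numbers the sections \(i = \overline{1,14}\).

```python
def mixed_moment(realizations, i, lags, powers):
    """Sample estimate of E[C^{xi_1}(i) * C^{xi_2}(i - r_1) * ... * C^{xi_l}(i - r_{l-1})].

    realizations: list of sequences (one per patient of the same class).
    lags: [0, r_1, ..., r_{l-1}]; powers: [xi_1, ..., xi_l].
    """
    if len(lags) != len(powers):
        raise ValueError("lags and powers must have the same length")
    total = 0.0
    for c in realizations:
        prod = 1.0
        for r, p in zip(lags, powers):
            idx = i - r
            # Guard explicitly: Python would silently wrap negative indices.
            if not 0 <= idx < len(c):
                raise IndexError("section index i - r is out of range")
            prod *= c[idx] ** p
        total += prod
    return total / len(realizations)
```

For instance, `mixed_moment(R, i, [0, i - j], [1, 1])` estimates \(E\left[ {C\left( j \right)C\left( i \right)} \right]\) as used in (12), and `mixed_moment(R, i, [0, i - j], [mu, nu])` estimates \(E\left[ {C^{\nu } \left( j \right)C^{\mu } \left( i \right)} \right]\) from (13).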
Conclusion
A computational method for automated computer diagnosis of cardiovascular diseases has been obtained, based on a generalized nonlinear canonical decomposition of the random sequence of changes in cardiogram parameters. The canonical model made it possible to formulate the maximum-density decision rule as a product of one-dimensional distribution densities of the random coefficients. The canonical decomposition imposes no significant restrictions (linearity, stationarity, Markov property, monotonicity, ergodicity, etc.) on the class of random sequences under study, so the stochastic characteristics of the sequences associated with various cardiovascular diseases can be taken into account as fully as possible.
Because the calculations follow a recurrent scheme, the diagnostic method is computationally simple and can use an arbitrary number of input parameters. A significant advantage of the method is that it can also use characteristics not directly related to the cardiogram (patient age, blood pressure, etc.).
During operation of a diagnostic system based on the proposed computational method, previously unknown diseases can be identified when the value of the likelihood function for the cardiogram under investigation differs significantly from its values for the classified cardiograms of known diseases.
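The decision rule and the check for an unknown disease described above can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: it assumes the independent canonical coefficients of a cardiogram have already been computed, models the one-dimensional density of each coefficient in each disease class as a normal distribution (the section does not restate the density form; a Parzen estimate would fit the same structure), and flags a possibly unknown disease with a user-chosen log-likelihood threshold, which is our simplification of "a significant difference" in likelihood.

```python
import math


def gaussian_logpdf(x, mean, std):
    """Log of a 1-D normal density (assumed density model)."""
    return -0.5 * math.log(2.0 * math.pi * std * std) - (x - mean) ** 2 / (2.0 * std * std)


def class_log_likelihood(coeffs, params):
    """Log of a product of 1-D densities of independent coefficients = sum of logs.

    params: list of (mean, std) per coefficient for one class (hypothetical layout).
    """
    return sum(gaussian_logpdf(v, m, s) for v, (m, s) in zip(coeffs, params))


def diagnose(coeffs, model, threshold):
    """Maximum-likelihood class, or None when even the best class is unlikely.

    model: {class_label: params}; threshold: assumed log-likelihood cut-off.
    """
    scores = {label: class_log_likelihood(coeffs, p) for label, p in model.items()}
    best = max(scores, key=scores.get)
    if scores[best] < threshold:
        return None, scores  # candidate for a disease not in the model
    return best, scores
```

Working with sums of logarithms rather than the product itself avoids numerical underflow when many coefficients are used.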
The results of the numerical experiment indicate that the proposed method diagnoses cardiovascular diseases with high reliability.
Data availability
The datasets generated and analysed during the current study are available in the Physikalisch-Technische Bundesanstalt (PTB) repository, https://physionet.org/physiobank/database/.
References
Yun, S. & Oh, K. The Korea national health and nutrition examination survey data linked cause of death data. J. Epidemiol. Health e2022021 (2022).
Mazumder, O. et al. Synthetic PPG signal generation to improve coronary artery disease classification: Study with physical model of cardiovascular system. J. Biomed. Health Inf. 26(5), 2136–2146 (2022).
Bair, E. et al. The use of paclitaxel-coated devices in the treatment of peripheral arterial disease is not associated with increased mortality or amputations. J. Ann. Vasc. Surgery (79), 387 (2022).
Zhao, Y., Zeng, Q., Li, J. & Jiang, X. Digital subtraction angiography image features under the deep learning algorithm in cardiovascular interventional treatment and nursing for vascular restenosis. J. Comput. Math. Methods Med. (2022).
Sinha, D., Sharma, A. & Sharma, S. Automated detection of coronary artery disease comparing arterial fat accumulation using CNN. J. Electron. Imaging 31(5), 051405 (2022).
Yu, C., Che, Y., Sun, G., Zhao, X. & Liu, B. Research on diagnosis architecture of cardiovascular diseases based on multi-medical images. J. Comput. Math. Methods Med. (2022).
Darmawahyuni, A. et al. Deep learning-based electrocardiogram rhythm and beat features for heart abnormality classification. J. Comput. Sci. 8, e825 (2022).
Zang, X., Li, B., Zhao, L., Yan, D. & Yang, L. End-to-end depression recognition based on a one-dimensional convolution neural network model using two-lead ECG signal. J. Med. Biol. Eng. 1–9 (2022).
Fuentes-Aguilar, R. Q., Pérez-Espinosa, H. & Filigrana-de-la-Cruz, M. A. Biosignals analysis (heart, phonatory system, and muscles). In Biosignal Processing and Classification Using Computational Learning and Intelligence 7–26 (2022).
Tripathy, R. K. et al. Detection of life-threatening ventricular arrhythmia using digital Taylor Fourier transform. J. Front. Physiol. https://doi.org/10.3389/fphys.2018.00722 (2018).
Oh, S. L. et al. Shockable versus non-shockable life-threatening ventricular arrhythmias using DWT and nonlinear features of ECG signals. J. Mech. Med. Biol. 17(07), 1740004. https://doi.org/10.1142/S0219519417400048 (2017).
Sharma, M., Tan, R.-S. & Acharya, U. R. Detection of shockable ventricular arrhythmia using optimal orthogonal wavelet filters. J. Neural Comput. Appl. 32(20), 15869–15884 (2020).
Arafat, M. A., Chowdhury, A. W. & Hasan, M. K. A simple time domain algorithm for the detection of ventricular fibrillation in electrocardiogram. J. Signal Image Video Process. 5(1), 1–10. https://doi.org/10.1007/s11760-009-0136-1 (2011).
Granitto, P. M., Furlanello, C., Biasioli, F. & Gasperi, F. Recursive feature elimination with random forest for PTR-MS analysis of agroindustrial products. J. Chemom. Intell. Lab. Syst. 83(2), 83–90. https://doi.org/10.1016/j.chemolab.2006.01.007 (2006).
Pławiak, P. Novel genetic ensembles of classifiers applied to myocardium dysfunction recognition based on ECG signals. J. Swarm Evol. Comput. 39, 192–208. https://doi.org/10.1016/j.swevo.2017.10.002 (2018).
Zhdanov, A. E. et al. OculusGraphy: Literature review on electrophysiological research methods in ophthalmology and electroretinograms processing using wavelet transform. In 2020 International Conference on eHealth and Bioengineering (EHB), Iasi, Romania, 1–6 (IEEE, 2020). https://doi.org/10.1109/EHB50910.2020.9280221
Li, Y., Bisera, J., Weil, M. H. & Tang, W. An algorithm used for ventricular fibrillation detection without interrupting chest compression. IEEE Trans. Biomed. Eng. 59(1), 78–86. https://doi.org/10.1109/TBME.2011.2118755 (2012).
Sinha, N. & Das, A. Automatic diagnosis of cardiac arrhythmias based on three stage feature fusion and classification model using DWT. J. Biomed. Signal Process. Control 62, 102066. https://doi.org/10.1016/j.bspc.2020.102066 (2020).
Ververidis, D. & Kotropoulos, C. Fast and accurate sequential floating forward feature selection with the Bayes classifier applied to speech emotion recognition. J. Signal Process. 88(12), 2956–2970. https://doi.org/10.1016/j.sigpro.2008.07.001 (2008).
Bhuvaneswari Amma, N. G. Cardiovascular disease prediction system using genetic algorithm and neural network. In Proceedings of 2012 International Conference on Computing, Communication and Applications 1–5 (2012). https://doi.org/10.1109/ICCCA.2012.6179185
Sireesha, M. Classification model for prediction of heart disease using correlation coefficient technique. Int. J. Adv. Trends Comput. Sci. Eng. 9(2), 2116–2123. https://doi.org/10.30534/ijatcse/2020/185922020 (2020).
Lehmann, E. L. & Romano, J. P. Testing Statistical Hypotheses 3rd edn. (Springer, 2008).
Fisher, R. A. Statistical methods for research workers. In Breakthroughs in Statistics (eds Kotz, S. & Johnson, N. L.) 66–70 (Springer, 1992). https://doi.org/10.1007/978-1-4612-4380-9_6.
Kira, K. & Rendell, L. A. The feature selection problem: Traditional methods and a new algorithm. In AAAI-92 Proceedings 129–134 (1992). Available at: https://aaai.org/Papers/AAAI/1992/AAAI92-020.pdf (Accessed November 29, 2020).
Firoozabadi, R., Gregg, R. E., Babaeizadeh, S. & Laciar, E. Identification of exercise-induced ischemia using QRS slopes. J. Electrocardiol. 49, 55–59 (2016).
Shen, C.-P. et al. Detection of cardiac arrhythmia in electrocardiograms using adaptive feature extraction and modified support vector machines. Expert Syst. Appl. 39, 7845–7852 (2012).
Ronzhina, M. et al. Spectral and higher-order statistics analysis of ECG: application to study of ischemia in rabbit isolated hearts. In 2012 Computing in Cardiology Conference (CinC) Vol 36, 645–648 (2012).
Sayadi, O., Shamsollahi, M. B., Clifford, G. D., Verleysen, M. & Bencherif, M. A. Robust detection of premature ventricular contractions using a wave-based Bayesian framework. IEEE Trans. Biomed. Eng. 57, 353–362 (2010).
Tseng, Y.-L., Lin, K.-S., Jaw, F.-S. & Laciar, E. Comparison of support-vector machine and sparse representation using a modified rule-based method for automated myocardial ischemia detection. J. Comput. Math. Methods Med. 2016, 1–8 (2016).
Doquire, G., de Lannoy, G., François, D., Verleysen, M. & Bencherif, M. A. Feature selection for interpatient supervised heart beat classification. J. Physiol. Meas. 31, 903–920 (2010).
Shebanin, V. et al. Application of fuzzy predicates and quantifiers by matrix presentation in informational resources modeling. In Proceedings of XII International Conference “MEMSTECH 2016”, Lviv-Poljana, 146–149 (2016). https://doi.org/10.1109/MEMSTECH.2016.7507536
Jovic, A. & Bogunovic, N. Electrocardiogram analysis using a combination of statistical, geometric, and nonlinear heart rate variability features. J. Artif. Intell. Med. 51(3), 175–186 (2011).
Prcela, M., Gamberger, D. & Jovic, A. Semantic web ontology utilization for heart failure expert system design. J. Stud. Health Technol. Inf. 136, 851–856 (2008).
Biel, L., Pettersson, O., Philipson, L. & Wide, P. ECG analysis: A new approach in human identification. J. Instrum. Meas. 50(3), 808–812 (2001).
Pawar, T., Anantakrishnan, N. S., Chaudhuri, S. & Duttagupta, S. P. Impact analysis of body movement in ambulatory ECG. In 2007 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society 5453–5456 (2007).
Gerencsér, L., Kozmann, G., Vágó, Z. & Haraszti, K. The use of the SPSA method in ECG analysis. J. IEEE Trans. Biomed. Eng. 49(10), 1094–1101 (2002).
Fang, Q., Sufi, F. & Cosic, I. A Mobile Device Based ECG Analysis System (INTECH Open Access Publisher, 2008).
Benitez, D., Gaydecki, P. A., Zaidi, A. & Fitzpatrick, A. P. The use of the Hilbert transform in ECG signal analysis. J. Comp. Biol. Med. 31(5), 399–406 (2001).
Box, G. E. P. & Jenkins, G. M. Time Series Analysis, Forecasting and Control (Holden-Day, 1970).
Kudritsky, V. D. Filtering, Extrapolation and Recognition Realizations of Random Functions 176 (FADA Ltd, 2001).
Rosenblum, M. G., Pikovsky, A. S. & Kurths, J. Synchronization approach to analysis of biological systems. Random Fluctuat. World Celebr. Two Decad. Fluctuat. Noise Lett. 335–344 (2022).
Peng, H., Wu, X., Wen, Y., Lin, J. & Guan, W. Myocardial infarction and stroke risks in multiple sclerosis patients: A two-sample Mendelian randomization study. J. Mult. Scler. Relat. Disord. 58, 103501 (2022).
Chabchoub, S., Mansouri, S. & Ben Salah, R. Signal processing techniques applied to impedance cardiography ICG signals–a review. J. Med. Eng. Technol. https://doi.org/10.1080/03091902.2022.2026508 (2022).
Bickel, D. Coherent checking and updating of Bayesian models without specifying the model space: A decision-theoretic semantics for possibility theory. Int. J. Approx. Reason. 142, 81–93 (2022).
Shebanin, V., et al. Canonical mathematical model and information technology for cardiovascular diseases diagnostics. In: Proceedings 14th International Conference CADSM 2017 438–440 (2017). https://doi.org/10.1109/CADSM.2017.7916170
Kondratenko, Y., et al. University curricula modification based on advancements in information and communication technologies. In Proceedings volume of ICTERI 2016, Kyiv on the 21st–24th of June, 2016 184–199 (2016).
Atamanyuk, I. P. Algorithm of extrapolation of a nonlinear random process on the basis of its canonical decomposition. J. Kibern. Sist. Anal. 2, 131–138 (2005).
Atamanyuk, I., Kondratenko, Y., Shebanin, V. & Mirgorod, V. Method of polynomial predictive control of fail-safe operation of technical systems. In Proceedings XIIIth International Conference CADSM 2015, Polyana-Svalyava, Ukraine 248–251 (2015). https://doi.org/10.1109/CADSM.2015.7230848
Atamanyuk, I. P. Optimal polynomial extrapolation of realization of a random process with a filtration of measurement errors. J. Automat. Inf. Sci. 41(8), 38–48. https://doi.org/10.1615/JAutomatInfScien.v41.i8.40 (2009).
Atamanyuk, I., Kondratenko, Y. & Sirenko, N. Management system for agricultural enterprise on the basis of its economic state forecasting, complex systems: Solutions and challenges in economics, management and engineering. In Studies in systems, decision and control Vol. 125 (eds Christian, B.V. et al.) 453–470 (Springer, 2018). https://doi.org/10.1007/978-3-319-69989-9_27.
Parzen, E. An estimation of a probability density function and mode. Ann. Math. Stat. 33(3), 1065–1076 (1962).
Yongcheng, Q. & Yingchao, Z. Empirical likelihood method for complete independence test on high dimensional data. arXiv preprint arXiv:2201.08492 (2022).
Li, H., Zhang, H. & Jiang, H. Combining power of different methods to detect associations in large data sets. J. Brief. Bioinform. 23(1), bbab488 (2022).
Atamanyuk, I. & Kondratenko, Y. Calculation method for a computer’s diagnostics of cardiovascular diseases based on canonical decompositions of random sequences. ICT in education, research and industrial applications: Integration, harmonization and knowledge transfer. In Proceedings of the 11th International Conference ICTERI-2015 (eds Batsakis, S. et al.), CEUR-WS, Vol. 1356, 108–120 (2015).
Rotshtein, A. P. Intellectual Technologies of Identification: Fuzzy Logic, Genetic Algorithms, Neuron Networks 320 (UNIVERSUM-Vinnitsa, Vinnitsa, 1999).
Zadeh, L. A. Fuzzy Sets. J Inf. Control 8, 338–353 (1965).
Zadeh, L. A. The role of fuzzy logic in modeling, identification and control. J. Model. Identif. Control 15(3), 191–203 (1994).
Grigoriev, D. S. & Spitsin, V. G. The application of neural network and discrete wavelet transform for the analysis and classification of electrocardiograms. J. Bull. Tomsk Polytech. Univ. 5, 57–61 (2012).
Gupta, V., Saxena, N. K., Kanungo, A., Gupta, A. & Kumar, P. A review of different ECG classification/detection techniques for improved medical applications. Int. J. Syst. Assur. Eng. Manag. 1–15 (2022).
Author information
Contributions
I.A. and Y.K. wrote the main manuscript text; V.H. and Y.V. performed the numerical experiments.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Atamanyuk, I., Kondratenko, Y., Havrysh, V. et al. Computational method of the cardiovascular diseases classification based on a generalized nonlinear canonical decomposition of random sequences. Sci Rep 13, 59 (2023). https://doi.org/10.1038/s41598-022-27318-0
DOI: https://doi.org/10.1038/s41598-022-27318-0