The impact of creating mathematical formula to predict cardiovascular events in patients with heart failure

In a retrospective study we derived a mathematical formula, $\alpha = f(x_1, \ldots, x_{252})$, where $\alpha$ is the probability of cardiovascular events in patients with heart failure (HF) and each $x_i$ is a clinical parameter. Here we prospectively tested the predictive capability and feasibility of such a formula in HF patients. First, to obtain a formula that uses a limited number of parameters, we retrospectively determined f(x), which formulates the relationship among the 50 most influential of the 252 clinical parameters (x), using 167 patients hospitalized due to acute HF; nonlinear optimization provided the formula $\alpha = f(x_1, \ldots, x_{50})$, which fitted the probability of the actual cardiovascular events per day. Second, we prospectively examined the predictability of f(x) in another 213 patients from 3 hospitals using the same 50 clinical parameters, and found that the Kaplan–Meier curves based on the actual and estimated occurrence probabilities of cardiovascular events were closely correlated. We conclude that the formula f(x) precisely predicted the per-day occurrence probability of future cardiovascular outcomes in HF patients. Mathematical modelling may predict the occurrence probability of cardiovascular events in HF patients.

and that this mathematical formula may not predict future clinical outcomes in the way that a law such as the law of gravity does 9; the law of gravity guarantees the time for an object to reach the ground.
To clarify whether our mathematical model prospectively provides the probability of cardiovascular events, we devised a mathematical formula from the retrospective clinical data of patients with HF and tested whether this formula can predict the per-day probability of future cardiovascular events in patients with HF. If this is proved, we can obtain in advance a formula that predicts the occurrence probability of cardiovascular events from many clinical or social parameters, leading to precision medicine for HF 10,11.

Ethics statement. This study was approved by the National Cerebral and Cardiovascular Center Research Ethics Committee (M22-49, M24-51). The Committee decided that the acquisition of informed consent from the 167 patients was not required according to the Japanese Clinical Research Guideline because this was a retrospective observational study. Instead, we made a public announcement on both the internet homepage of our institution and the bulletin boards of our out-patient and in-patient clinics, in accordance with the request of the Ethics Committee and the Guideline. For the prospective observational study of 213 patients, we obtained written informed consent after approval by the Research Ethics Committees of the three institutes: the National Cerebral and Cardiovascular Center, Hokkaido University and Kyushu University. The registration number of the clinical trial is UMIN000018691 at https://upload.umin.ac.jp/cgi-open-bin/ctr/ctr.cgi?function=brows&action=brows&recptno=R000021637&type=summary&language=J.
Protocols. Protocol I: The creation of the mathematical formula using the retrospective data. In the previous study 8, we retrospectively obtained 252 clinical parameters among 402 parameters in 152 patients with acute decompensated HF (ADHF) and calculated the formula that provides the probability of cardiovascular events (hospitalization or death due to HF). After sorting the data, we added 16 patients to the cohort and enrolled 167 patients with ADHF admitted between November 2007 and October 2009. We followed up these patients until the time of cardiovascular events or December 2014. The diagnosis of HF was confirmed by an expert team of cardiologists using the Framingham criteria 12.
Here, we describe how the mathematical formula to predict cardiovascular events was created in the previous study. The hypothesis of the previous study was that we can derive a mathematical formula for the estimation of prognosis, i.e., the equation $\tau = f(x_1, \ldots, x_p)$, where $x_1, \ldots, x_p$ are clinical features and $\tau$ represents the day of the cardiovascular event in patients with HF, and the previous study provided positive evidence to support this hypothesis. In the present study, we prospectively tested the predictive capability and feasibility of the mathematical formula in HF patients, to strengthen the case that a mathematical formula predicting the probability of cardiovascular events can be created.
Next, we explain how the mathematical formula $\tau = f(x_1, \ldots, x_p)$ was created in the previous study. We obtained 402 parameters at discharge following hospitalization due to ADHF from careful history-taking, physical examinations, laboratory tests, chest X-rays, electrocardiograms, complete Doppler echocardiographic studies, coronary angiography, right heart catheterization, cardiac scintigraphy, cardiovascular magnetic resonance, cardiopulmonary exercise testing and polysomnography in patients with HF. We hypothesized that all or some of these parameters influence the time of cardiovascular events to some extent, and we quantitatively assessed the occurrence probability of cardiovascular events using a probability model based on the Poisson process. Thus, the probability density $p_i(t)$ for the cardiovascular events of patient $i$ at an elapsed time $t$ after discharge is represented by the following exponential formula:

$$p_i(t) = \frac{1}{\tau_i} \exp\left(-\frac{t}{\tau_i}\right) \qquad (1)$$

The mean elapsed time $\tau_i$ from discharge to the rehospitalization of patient $i$ depends on some of the given clinical factors $X^i$ of the patient, i.e., a subset $X_S^i \subseteq X^i$ that is common over all patients. The dependency is primarily approximated by the following inverse linear relation:

$$\tau_i = \frac{1}{\sum_{x_j^i \in X_S^i} \beta_j x_j^i + \gamma} \qquad (2)$$

where the denominator represents the expected frequency of cardiovascular rehospitalization per day, $X_S^i$ is the set of values of the factors in $X_S$ for patient $i$, $\beta_j$ is the contributing weight of the $j$th factor to the frequency, and $\gamma$ is the intrinsic frequency for any patient.

We considered that the $\tau_i$ of the patients are sampled from a common population distribution $p_\tau(\tau)$. Therefore, the total probability distribution of the rehospitalization time $P(t)$ is expected to be a superposition of Eq. (1) for various $\tau$ sampled from $p_\tau(\tau)$, as follows, where $p(t)$ is $p_i(t)$ in Eq. (1) for a general $\tau$:

$$P(t) = \int p(t)\, p_\tau(\tau)\, d\tau$$

From these two equations we obtained the following equation:

$$P(t) = \int \frac{1}{\tau} \exp\left(-\frac{t}{\tau}\right) p_\tau(\tau)\, d\tau$$

Then we used a natural conjugate prior distribution for the unknown $p_\tau(\tau)$, where $\tau_i$ is given by the dataset $D$.
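The exponential model and the inverse linear relation described above can be illustrated with a minimal Python sketch. The function names, the example weights and the use of an equally weighted set of τ values to approximate the superposition are assumptions made for illustration; they are not taken from the previous study.

```python
import math

def event_rate(x, beta, gamma):
    """Expected event frequency per day: sum_j beta_j * x_j + gamma."""
    return sum(b * v for b, v in zip(beta, x)) + gamma

def mean_time_to_event(x, beta, gamma):
    """Inverse linear relation: tau is the reciprocal of the daily frequency."""
    rate = event_rate(x, beta, gamma)
    if rate <= 0:
        raise ValueError("the daily event frequency must be positive")
    return 1.0 / rate

def density(t, tau):
    """Exponential density (1/tau) * exp(-t/tau) at elapsed time t."""
    return math.exp(-t / tau) / tau

def mixture_density(t, taus):
    """Assumed approximation of the superposition P(t): an equally
    weighted average of the exponential densities over sampled tau values."""
    return sum(density(t, tau) for tau in taus) / len(taus)

# Hypothetical patient with two normalized factors and illustrative weights.
tau = mean_time_to_event([1.0, 0.5], beta=[0.002, 0.001], gamma=0.001)
print(round(tau, 1), density(30.0, tau))
```

In this sketch a larger weighted sum of factors raises the daily frequency and therefore shortens the mean time to the event.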
After several steps of manipulation, we arrived at the following modeling algorithm. First, the value of every factor $x_j^i \in X^i$ for all patients $i = 1, \ldots, n$ in $D$ was normalized to fit into the interval [0, 1] using its maximum and minimum values. This normalization, which eliminates differences in the factor scales, was necessary to allow the measurement of the essential contribution of each factor's variation to $\tau_i$. Subsequently, we applied equations (1) and (2) to the normalized dataset $D_N$ to model the probabilistic rehospitalization process, and we determined the model parameters $\beta_j$ and $\gamma$ so as to maximize the following objective function, where $t_i$ is the observed elapsed time of patient $i$:

$$L(\beta_1, \ldots, \beta_p, \gamma) = \sum_{i=1}^{n} \ln\left\{\frac{1}{\tau_i} \exp\left(-\frac{t_i}{\tau_i}\right)\right\} - \lambda \sum_{j=1}^{p} |\beta_j| \qquad (5)$$

The first term is the log-likelihood of the model consisting of the previous equations over $D_N$. The second term is called an L1-regularization term; it penalizes the coefficients of negligible factors by setting them equal to zero, and a larger hyper-parameter $\lambda$ eliminates more factors. This term avoids over-fitting of the model to the dataset by selecting a set of effective factors $X_S^i$ from a given $X^i$. In our study, $\lambda$ was tuned to 0.02 to maintain the largest value of equation (5), similarly to the other parameters $\beta_j$ and $\gamma$.
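The normalization step and the L1-penalized objective can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the log-likelihood is taken to be the sum of the logarithms of the exponential densities at the observed times, censoring is ignored, and the function names and the toy data are hypothetical.

```python
import math

def min_max_normalize(rows):
    """Scale every factor (column) into [0, 1] using its minimum and maximum."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]

def objective(beta, gamma, X, times, lam=0.02):
    """Log-likelihood of the exponential model minus the L1 penalty.

    With tau = 1 / rate, ln{(1/tau) exp(-t/tau)} = ln(rate) - rate * t.
    Returns -inf when a daily frequency is not positive (assumed guard).
    """
    log_lik = 0.0
    for x, t in zip(X, times):
        rate = sum(b * v for b, v in zip(beta, x)) + gamma
        if rate <= 0:
            return float("-inf")
        log_lik += math.log(rate) - rate * t
    return log_lik - lam * sum(abs(b) for b in beta)

# Toy data: three hypothetical patients, two factors, event times in days.
X_norm = min_max_normalize([[1, 10], [3, 20], [2, 30]])
print(objective([0.001, 0.002], 0.005, X_norm, [100, 200, 300]))
```

Because the penalty grows with the absolute size of each weight, weights that add little to the log-likelihood are pushed toward zero, which is how the regularization term removes negligible factors.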
To seek the optimum parameter values of $\beta_1, \ldots, \beta_p, \gamma$ that maximize the objective function $L(\beta_1, \ldots, \beta_p, \gamma)$, we applied a simple greedy hill-climbing algorithm, in which the parameter values are iteratively modified along the gradient direction $(\partial L/\partial \beta_1, \ldots, \partial L/\partial \beta_p, \partial L/\partial \gamma)$. When the improvement of $L$ becomes nearly negligible, the resulting parameter values are taken as the optima. Because this process depends on the initial values of the parameters, we repeated this optimization 100 times starting with random initial values and selected the result providing the maximum $L$. This was how we selected 252 influential parameters among the 402 clinical parameters in the previous study 8.
Then we selected the 50 most influential parameters among the 252 parameters and revised the mathematical formula. The 50 most influential parameters in the present study are defined as the clinical parameters with the 50 highest coefficient values shown in the previous manuscript 8. The number 50 is arbitrary; it was chosen as a realistic number of values to collect in the prospective study.
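A greedy hill-climbing procedure with random restarts can be sketched as follows. This is an illustrative sketch, not the implementation of the previous study: the numerical gradient, the fixed step size, the stopping tolerance, the range of the random initial values and the toy objective are all assumptions.

```python
import random

def numerical_gradient(f, params, eps=1e-6):
    """Central-difference estimate of the gradient of f at params."""
    grad = []
    for k in range(len(params)):
        up, down = list(params), list(params)
        up[k] += eps
        down[k] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad

def hill_climb(f, start, step=0.1, tol=1e-10, max_iter=10000):
    """Move along the gradient until the improvement of f is negligible."""
    params, best = list(start), f(start)
    for _ in range(max_iter):
        grad = numerical_gradient(f, params)
        candidate = [p + step * g for p, g in zip(params, grad)]
        value = f(candidate)
        if not value > best + tol:  # also stops on NaN or no improvement
            break
        params, best = candidate, value
    return params, best

def multi_start(f, dim, restarts=100, seed=0):
    """Repeat the climb from random initial values and keep the best run."""
    rng = random.Random(seed)
    runs = [
        hill_climb(f, [rng.uniform(-1.0, 1.0) for _ in range(dim)])
        for _ in range(restarts)
    ]
    return max(runs, key=lambda run: run[1])

# Toy concave objective with its maximum at (0.3, -0.2).
toy = lambda p: -((p[0] - 0.3) ** 2 + (p[1] + 0.2) ** 2)
print(multi_start(toy, dim=2, restarts=5))
```

Restarting from several random points matters when the objective has more than one local maximum, because a single climb only reaches the maximum nearest to its starting point.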
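The selection of the most influential parameters can be illustrated with a short Python sketch. Ranking by the absolute value of each coefficient is an assumption made here, since the text only refers to the highest coefficient values; the function name and the parameter names are hypothetical.

```python
def top_k_parameters(coefficients, k=50):
    """Return the names of the k parameters with the largest coefficients.

    Assumption: influence is ranked by the absolute value of the coefficient.
    """
    ranked = sorted(coefficients.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return [name for name, _ in ranked[:k]]

# Hypothetical coefficients for three parameters; keep the two largest.
example = {"param_a": 0.1, "param_b": -0.5, "param_c": 0.3}
print(top_k_parameters(example, k=2))
```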
Protocol II: The prospective study to validate the mathematical formula. We prospectively enrolled 213 patients with ADHF admitted between May 2013 and March 2015 in three different hospitals, the National Cerebral and Cardiovascular Center (n = 114), Hokkaido University (n = 80) and Kyushu University (n = 19), and followed up these patients until the time of cardiovascular events or the end of April 2016. The timing of patients' discharge was determined by an expert team of cardiologists in charge of the HF department; discharge was recommended when patients presented no signs of decompensation, such as a New York Heart Association (NYHA) Functional Classification < 3, no sign of rales, no galloping rhythm, stable blood pressure and an improvement in renal function due to optimal treatment that followed international guidelines 13. Rehospitalization was defined as hospitalization for decompensated HF, and cardiovascular death was defined as death due to the worsening of HF. The primary endpoint was the first cardiovascular event of either rehospitalization or death due to the worsening of HF.

Figure 1. The Kaplan-Meier plots of calculated and actual cardiovascular event-free rates in Protocol I (the retrospective study). The actual cardiovascular events started slightly later than the calculated events and ended earlier than the calculated events; however, the goodness-of-fit model found that KM and predictive curves were significantly close, and the coefficient of determination was P = 0.8404.
Next, we created the mathematical model for the occurrence probability of cardiovascular events. First, we assumed that the per-day probability of cardiovascular events of a patient does not change significantly from discharge to the cardiovascular event. We defined the mathematical formula to predict this constant occurrence probability of cardiovascular events per day as follows:

$$\alpha = \beta \cdot X + c$$

where $\alpha$ is the estimated occurrence probability of cardiovascular events per day for a patient, $X = (x_1, \ldots, x_p)$ is the clinical feature vector of the patient, $\beta = (\beta_1, \ldots, \beta_p)$ is a weight vector of the features, and $c$ is an intercept of $\alpha$. In this study, 50 clinical features, that is, $p = 50$, were used. As any event occurring with a constant probability in a given time period is generated by a Poisson process 14, cardiovascular events of a patient also occur through this process with the patient's individual $\alpha$. Thus, the probability density for cardiovascular events of a patient at an elapsed time $t$ after discharge is represented by the following exponential formula:

$$p(t \mid X) = \alpha \exp(-\alpha t) = (\beta \cdot X + c) \exp\{-(\beta \cdot X + c)\, t\}$$

Given the retrospective dataset $D_R$ of $N_R$ patients, where $X_i$ and $t_i$ are the clinical feature vector and the elapsed days from discharge to the cardiovascular event of patient $i$, respectively, the expected survival curve of the patients in $D_R$ is obtained by averaging the individual survival functions $\exp\{-(\beta \cdot X_i + c)\, t\}$ over the $N_R$ patients, which approximates the expectation over $P_{RE}(X)$, the population distribution of the retrospective dataset $D_R$. $N_R$ is 167 in our case. Separately, we directly derived the Kaplan-Meier survival curve $P_R(t)$ from $D_R$ by following a standard procedure 15. Then, we estimated the best parameter values of $\beta$ and $c$ as those that minimize the Kullback-Leibler divergence (KL-divergence) 16 between these two curves. The KL-divergence is a well-known statistical measure of the discrepancy between two probability distributions. It was evaluated over $D_{RR}$, a dataset that excludes from $D_R$ the patients whose observations are censored and who thus do not have $t_i$. The parameters $\beta$ and $c$ minimizing this measure were determined using the Nelder-Mead method 17, a well-known non-linear optimization algorithm.

We used these estimated parameter values of $\beta$ and $c$ to predict the survival curve of a given prospective dataset $D_P$ of $N_P$ patients, where $N_P$ is 213 in our case. The predicted survival curve was obtained by substituting the above-mentioned best values of $\beta$ and $c$ and the clinical feature vectors $X_i$ of the patients in $D_P$ into the expected survival curve. We compared this predicted curve for the prospective dataset $D_P$ with the Kaplan-Meier survival curve 15 $P_P(t)$ directly derived from $D_P$.

Figure 2. The Kaplan-Meier plots of calculated and actual cardiovascular event-free rates in patients in NCVC in Protocol II (the prospective study). The actual cardiovascular events started slightly later than the calculated events and ended earlier than the calculated events; however, the goodness-of-fit model found that KM and predictive curves were significantly close, and the coefficient of determination was P = 0.0784.
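The comparison between a Kaplan-Meier curve and an expected survival curve can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the procedure of this study: the KL-divergence is computed on the probability masses that each curve assigns to the intervals between the Kaplan-Meier event times, censored observations are handled by the Kaplan-Meier estimator rather than excluded, and all function names and data are hypothetical. The minimization itself is not shown; a Nelder-Mead routine such as `scipy.optimize.minimize(..., method="Nelder-Mead")` could be applied to `discrepancy`.

```python
import math

def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(t, S(t))] at each distinct event time."""
    at_risk, surv, curve = len(times), 1.0, []
    for t in sorted(set(times)):
        here = [e for tt, e in zip(times, events) if tt == t]
        deaths = sum(1 for e in here if e)
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= len(here)
    return curve

def expected_survival(t, X, beta, c):
    """Average of exp(-(beta . X_i + c) t) over the patients in X."""
    rates = [sum(b * v for b, v in zip(beta, x)) + c for x in X]
    return sum(math.exp(-r * t) for r in rates) / len(rates)

def interval_masses(surv_values):
    """Probability mass per interval, plus the tail beyond the last time."""
    masses, prev = [], 1.0
    for s in surv_values:
        masses.append(prev - s)
        prev = s
    masses.append(prev)
    return masses

def kl_divergence(p, q):
    """KL(p || q) = sum p_k ln(p_k / q_k); terms with p_k = 0 contribute 0."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)

def discrepancy(beta, c, X, km_curve):
    """KL-divergence between the Kaplan-Meier curve and the model curve."""
    p = interval_masses([s for _, s in km_curve])
    q = interval_masses([expected_survival(t, X, beta, c) for t, _ in km_curve])
    return kl_divergence(p, q)

# Toy data: four hypothetical patients, one of them censored at day 3.
curve = kaplan_meier([1, 2, 3, 4], [True, True, False, True])
print(curve, discrepancy([0.2], 0.3, [[1.0], [0.0]], curve))
```

A smaller value of `discrepancy` means that the model curve places its probability mass closer to where the observed events occurred.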

Statistical Analysis.
Normally distributed data were expressed as mean ± standard deviation; other values were reported as median and interquartile range (IQR). We conducted a goodness-of-fit test and used the coefficient of determination as a measure to assess the relationship between the predictive curves and the actual Kaplan-Meier curves of the cardiovascular event-free rate. The differences in the predictive curves were tested using the Wilcoxon signed-rank test. We estimated the error bounds of the parameters, α and β, by applying standard bootstrap sampling 16. All tests were two-tailed, and P < 0.05 was considered significant. All analyses were performed using the JMP software for Windows (version 8.0.2, SAS Inc., Cary, NC).
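The two measures named above can be illustrated with a short Python sketch. It is not the JMP analysis used in this study: the coefficient of determination is computed in its usual form from two curves sampled at the same time points, the bootstrap uses the percentile method as an assumed variant, and the function names and data are hypothetical.

```python
import random

def coefficient_of_determination(observed, predicted):
    """R^2 = 1 - SS_res / SS_tot for two curves on the same time grid."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def bootstrap_interval(values, statistic, n_resamples=1000, level=0.95, seed=0):
    """Percentile bootstrap bounds for a statistic (assumed variant)."""
    rng = random.Random(seed)
    stats = sorted(
        statistic([rng.choice(values) for _ in values])
        for _ in range(n_resamples)
    )
    lower = stats[int((1 - level) / 2 * n_resamples)]
    upper = stats[int((1 + level) / 2 * n_resamples) - 1]
    return lower, upper

# Hypothetical event-free rates at four time points.
actual = [1.0, 0.8, 0.6, 0.4]
predicted = [0.9, 0.8, 0.6, 0.5]
print(coefficient_of_determination(actual, predicted))

mean = lambda xs: sum(xs) / len(xs)
print(bootstrap_interval(list(range(1, 101)), mean))
```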

Results
Patients characteristics. In

Predictive capability of the mathematical formula for the prospective outcomes. We confirmed that the Kaplan-Meier curve calculated with this formula properly fitted the curve of the actual cardiovascular outcomes in the retrospective study (Fig. 1). Then, in the prospective study, we first analyzed only the prospective data from our own institute. Figure 2 shows that the mathematical formula obtained from the retrospective study can predict the clinical outcomes observed in the prospective study. We then tested whether our formula can predict the probability of cardiovascular events in all the institutes, and we found that it can predict the clinical outcomes for the three institutes (Fig. 3).
The factors that provoke or prevent cardiovascular events among the 50 clinical factors. Since we found that the mathematical formula can predict the occurrence of cardiovascular events in the prospective study, we assumed that each attribute coefficient of this formula is also essential for the clinical practice of HF (Table 2). When we investigated the contribution of each parameter to the objective measure, we found that ischemic heart disease was associated with a worse prognosis. In the physical examination, a high heart rate or the implantation of a pacemaker was a worsening factor, whereas the implantation of a cardiac resynchronization therapy device or an implantable cardioverter defibrillator was associated with better outcomes. Furthermore, the data from blood analysis, echocardiography and oral medications were related to cardiovascular events in complex and confounded ways. Intriguingly, the number of family members was associated with a better prognosis.

Discussion
This study provides evidence that a mathematical formula derived from retrospective clinical data gives the occurrence probability of cardiovascular events in a prospective study of patients with HF. We were able to derive the formula $\alpha = f(x_1, \ldots, x_{50})$, where $\alpha$ is the probability of cardiovascular events and $x_1, \ldots, x_{50}$ are clinical factors observed before the cardiovascular events, and this formula could prospectively predict the occurrence probability of cardiovascular events. This study proposes the novel idea that the occurrence probability of future cardiovascular events can be mathematically formulated and deduced from clinical and personal parameters obtained before the time of the cardiovascular events. Importantly, we found that the occurrence probability depends not only on cardiac dysfunction but also on parameters of the dysfunction of other organs, such as the kidneys and liver, and on social factors, such as the number of family members living with a patient. Therefore, we can regard the occurrence probability as the overall severity of HF. This concept matches the idea that, in large-scale clinical trials, the effect of a given treatment of HF should be judged by mortality or morbidity rather than by cardiac function 18. The mortality or morbidity during a certain observation period is depicted by Kaplan-Meier curves, which represent the occurrence probability of cardiovascular events.
What are the differences between the present and previous studies that assessed clinical outcomes? Earlier studies, including ours 19-21, merely identified the factors important for cardiovascular outcomes using cohort data of patients with HF. In such studies, clinical data are collected retrospectively or prospectively, and the most influential factors are identified by multivariate analysis. However, no researcher has tested whether such multiple factors can quantitatively predict the occurrence probability of future cardiovascular events. Above all, arbitrary factors, which are collected unintentionally by investigators and usually ignored, may be essential to explain the occurrence probability, and an investigator-intended analysis of the data cannot cover such unexpected factors. This is the concept of big data analysis or data mining 22. Wang et al. 23 revealed that although multiple biomarkers are associated with a high relative risk of adverse events, even the combination of these factors only moderately improved the prediction of risk in an individual. This suggests that the occurrence of cardiovascular events may not be well predictable even when multiple factors are combined. In contrast, we collected almost all the numerical data documented in the medical records before the onset of cardiovascular events and solved the mathematical formula using these parameters to provide the exact probability of future cardiovascular events. Of the more than 250 clinical factors that constitute the original mathematical formula 8, we selected the 50 most influential factors and re-solved the mathematical formula.
The mathematical formula using these 50 factors appears plausible for calculating the occurrence probability of cardiovascular events in patients with HF, suggesting that predicting future outcomes, or obtaining a mathematical formula for the prediction, requires more clinical data than we had expected. WBC values at admission may approximately indicate a value that is unique to each patient. On the other hand, the most abnormal values at admission may determine the severity of the pathophysiology of CHF.
How do we interpret the mathematical formulae given in the present study? One may argue that our process merely adjusts or fits the clinical data to the clinical outcomes using the mathematical formula. Nevertheless, if the clinical parameters had no relation to the time of occurrence of cardiovascular events, we could not have fitted the clinical parameters to the objective measures. Since we could fit the clinical parameters obtained before the occurrence of cardiovascular events to the objective function of the probability of cardiovascular events, we consider our fitting process reasonable. To further confirm the feasibility and applicability of the framework of the present investigation, we accepted this criticism of our previous work 8 and decided to perform the prospective study to test the validity of our mathematical formula in predicting the possibility of future cardiovascular events. Figures 2 and 3 support our hypothesis; thus, we can propose the predictability and reproducibility of the occurrence of cardiovascular events in patients with HF using mathematical models. On the other hand, the patients' characteristics in the retrospective and prospective studies are quite different, as shown in Tables 2 and 3. Patients in the prospective study seemed to have suffered from more severe HF than those in the retrospective study. Nevertheless, the Kaplan-Meier curves produced by the formula fitted the actual data of the prospective study well, suggesting that the present formula is valid for any group of patients with HF.
It would be intriguing to examine the coefficient of each clinical parameter in the mathematical formula. We note that, although 50 factors are essential to constitute the function of the occurrence probability of cardiovascular events, these factors are confounded with each other within the formula. We should therefore recognize the importance of the network of these 50 factors in creating the formula rather than the clinical impact of each individual factor. We should also be cautious of the fact that some of the 50 clinical parameters are strongly and sensitively affected by acute changes in the pathophysiology of HF. Since such parameters contribute to the present formula, we can only conclude that each value at admission or discharge in each patient affects the occurrence probability of cardiovascular events after discharge. The pathophysiological meaning needs to be investigated in a future study.
The most important point is that we can provide a predictive model of cardiovascular events in HF patients using 50 factors and that we verified the feasibility of the model in cohorts of HF patients in 3 different institutes.
Another important point of this study is that we derived the mathematical formula from the retrospective clinical data of the National Cerebral and Cardiovascular Center, in the central part of Japan, and tested its applicability in the prospective data of Hokkaido University, in the north of Japan, and Kyushu University, in the south. Although one may consider that this mathematical formula is valid only in the National Cerebral and Cardiovascular Center, this is not the case. In fact, this mathematical formula to predict the possibility of cardiovascular events in patients with HF is valid throughout Japan. The formula may not be valid in other countries; however, the pathophysiology and treatment strategy of HF are common worldwide, suggesting that such formulas should also be able to provide the future occurrence of cardiovascular events in other countries. The concept of creating a mathematical formula should be shared worldwide in order to know the real risk of cardiovascular events and to treat the relevant clinical factors using the data of patients with HF.
The present study has several applications and limitations. First, since these 50 clinical parameters can be easily obtained in outpatient or inpatient clinics, we can evaluate the severity of HF in each patient from the viewpoint of the probability of the onset of cardiovascular events. Second, we can identify which clinical factors increase the probability of cardiovascular events, suggesting that we can identify the target of HF treatment in each patient. Third, this formula may serve as an educational tool for HF patients. Fourth, the concept of creating a formula to predict clinical outcomes may be applicable to other fields such as cerebral infarction or cancers 24. On the other hand, the present formula has limitations because we created it using the data of HF patients with mild to moderate HF symptoms. Therefore, we are not able to apply the present formula to patients with severe HF to predict the occurrence probability of cardiovascular events, because we did not derive the present equation from a cohort of severe HF patients. To address this limitation, we need to create a mathematical formula using the data of patients with severe HF.

Conclusions
We created a mathematical formula that precisely provides the probability of the clinical outcomes of patients who are hospitalized with ADHF and discharged after appropriate treatment. Mathematics using the present cardiovascular big data may predict the occurrence probability of future cardiovascular events. Since we found that clinical parameters independent of cardiac function are important, this finding may contribute to better treatment of HF.