Introduction

Studies show that numerous factors, including disease severity, treatment protocols and the environment, independently determine patients’ prognoses. For example, in patients with chronic heart failure (CHF), many studies have shown that various independent indices of the severity of CHF, such as plasma B-type natriuretic peptide (BNP) level, left ventricular function, exercise tolerance or New York Heart Association (NYHA) functional class affect the time to hospitalization or cardiac death.1, 2, 3, 4, 5 However, because we could not identify the elapsed time until hospitalization in certain patients with CHF, we estimated this time using knowledge of the pathophysiology of CHF, our experience with previous comparable patients and Kaplan–Meier plots of their hospitalization in the clinical studies; we then explained our estimation to each patient. This procedure led us to conclude that estimating the elapsed time to rehospitalization is a type of problem that is specific to clinical medical science because the results and outcomes of biology or basic medical sciences can be derived from mathematically formulated equations. Furthermore, other fields of basic science, such as physics and mathematics or applied sciences, such as mechanics, thermodynamics and fluid dynamics, are mathematically formulated; the observational phenomena in applied sciences other than medical science can be predicted by mathematical equations, for example, the law of universal gravitation.6 The most important issue in deriving a mathematical expression for relationships among two or more factors is the prediction of the future value of one variable based on the other factor(s). All phenomena, such as the severity of CHF and the patients’ characteristics before the occurrence of clinical events, may therefore provide a mathematical equation for the clinical outcome if we can relate factors in the patient’s clinical status to clinical outcomes such as rehospitalization.

To investigate this possibility, we sought to solve the equation τ=f(x1, …, xp), where x1, …, xp represent clinical features affecting the clinical outcome for CHF. We attempted to determine the function (f) to yield τ, the time to rehospitalization, from the clinical parameters (x1, …, xp) reflecting patient characteristics at the time of discharge.

Methods

Ethics statement

This study was approved by the National Cerebral and Cardiovascular Center Research Ethics Committee. The Committee decided that informed consent from the 151 subjects was not required under the Japanese Clinical Research Guideline because this was a retrospective observational study. Instead, we made a public announcement in accordance with the request of the Ethics Committee and the Guideline.

Subjects and clinical parameters

A total of 486 patients with acute decompensated heart failure (ADHF) were admitted between May 2006 and December 2009. Patients who were admitted for ADHF only once were excluded, and the remaining 151 patients were included in this study. For patients with repeated hospitalizations during the study period, the earliest hospitalization was adopted as the index admission. The diagnosis of HF was confirmed by an expert team of cardiologists using the Framingham criteria.7 Careful history-taking, physical examinations, laboratory tests, chest X-rays, electrocardiograms, Doppler echocardiographic studies, coronary angiography and right heart catheterization were performed during the hospitalization. The timing of patient discharge was determined by the expert team of cardiologists in charge of the HF department; discharge was recommended when the patients presented no signs of decompensation, as indicated by NYHA functional class <3, no rales, no gallop rhythm, stable blood pressure and an improvement in renal function under optimal treatment that followed international guidelines.8 Rehospitalization for the enrolled patients was defined as hospitalization for decompensated HF. The primary end point was the first rehospitalization for decompensated HF.

Cardiac catheterization

Left ventricular pressure was recorded with a 5-F pigtail catheter. Left ventricular volume and ejection fraction were determined by left ventriculography with a contrast medium using Kennedy’s formula. Right-sided catheterization was performed using a 7-F Swan–Ganz catheter to measure pulmonary capillary wedge pressure, mean pulmonary artery pressure (PAP), right ventricular end-diastolic pressure and mean right atrial pressure. Cardiac output was measured using the estimated Fick principle and thermodilution. Systemic vascular resistance and pulmonary vascular resistance were calculated using the established formulas: systemic vascular resistance=80 × (mean arterial pressure−mean right atrial pressure)/cardiac output and pulmonary vascular resistance=80 × (mean pulmonary artery pressure−pulmonary capillary wedge pressure)/cardiac output.
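As a minimal sketch of the two resistance formulas, the following Python functions use the standard convention of pressures in mmHg and cardiac output in l/min, which gives resistances in dyn·s·cm−5. The function names and the example values are illustrative assumptions and are not taken from the study data.

```python
def vascular_resistance(upstream_mmhg, downstream_mmhg, cardiac_output_l_min):
    """Return 80 x (pressure drop) / cardiac output, in dyn*s*cm^-5."""
    if cardiac_output_l_min <= 0:
        raise ValueError("cardiac output must be positive")
    return 80.0 * (upstream_mmhg - downstream_mmhg) / cardiac_output_l_min


def systemic_vascular_resistance(mean_arterial, mean_right_atrial, cardiac_output):
    # Pressure drop across the systemic circulation.
    return vascular_resistance(mean_arterial, mean_right_atrial, cardiac_output)


def pulmonary_vascular_resistance(mean_pulmonary_artery, wedge, cardiac_output):
    # Pressure drop across the pulmonary circulation.
    return vascular_resistance(mean_pulmonary_artery, wedge, cardiac_output)


if __name__ == "__main__":
    # Hypothetical haemodynamic values, for illustration only.
    print(systemic_vascular_resistance(90.0, 5.0, 4.0))
    print(pulmonary_vascular_resistance(25.0, 12.0, 4.0))
```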

Echocardiography

Echocardiographic examinations were performed with a Sonos-5500 (Philips Medical System, Andover, MA, USA), Alpha 10 (Hitachi-Aloka Medical, Tokyo, Japan), Vivid 7 Dimension (GE Healthcare, Buckinghamshire, UK), ACUSON Sequoia C256 (Mochida Siemens Medical System, Tokyo, Japan) or Aplio XV (Toshiba Medical Systems, Tochigi, Japan) machine with a 2.5-MHz probe. Patients underwent a Doppler echocardiographic study for HF at admission and before discharge. Standard views were recorded, including the parasternal long-axis, short-axis and apical 4- and 2-chamber views, and cardiac chamber sizes and left atrial dimensions were evaluated according to the recommendations of the American Society of Echocardiography.9 The severity of valve regurgitation was quantified on a semicontinuous scale from none (0) to severe.4 Pulsed-wave Doppler examination and Doppler tissue imaging of the mitral annulus were performed. The peak mitral early diastolic inflow and atrial filling (E and A) velocities and the E-wave deceleration time were obtained. The sample volumes of the pulsed Doppler tissue imaging were placed at the septal and lateral margins of the mitral annulus. The peak early mitral annular velocities were measured, and the average of the septal and lateral velocities was used as E′.

The mathematical model for the rehospitalization process

To construct a model for future rehospitalization using the basic clinical factors for the patients, we adopted two working assumptions for the practical rehospitalization process.

Assumption 1

A mean elapsed time τi from discharge to the rehospitalization of patient i depends on some of the given clinical factors of the patient, that is, a subset of factors that is common to all patients. The dependency is primarily approximated by the following inverse linear relation:

\[ \tau_i = \frac{1}{\sum_{j} \beta_j x_{ij} + \gamma} \qquad (1) \]

where the denominator represents the expected frequency of cardiovascular rehospitalization per day, Xi={xij} is the set of values of the factors in the common subset for patient i, βj is the contributing weight of the jth factor to the frequency and γ is the intrinsic frequency for any patient.

Assumption 2

The clinical factors of patient i are fairly stable between discharge and rehospitalization. Thus, the expectation value of the mean elapsed time τi remains nearly constant for patient i. As any event occurring with a constant frequency in a given time period is generated by a Poisson process,10 rehospitalization also occurs via this process under Assumption 2. Thus, the probability density pi(t) for the rehospitalization of patient i at an elapsed time t after discharge is represented by the following exponential formula:

\[ p_i(t) = \frac{1}{\tau_i} \exp\left(-\frac{t}{\tau_i}\right) \qquad (2) \]

The parameter τi is given by Equation (1) according to Assumption 1.
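The two assumptions can be sketched in a few lines of Python. The sketch assumes that Equation (1) is the reciprocal of a linear daily rate, τ = 1/(Σ βj xj + γ), and that Equation (2) is the exponential density with mean τ. The function names and the numerical values are hypothetical and are not fitted coefficients from this study.

```python
import math


def mean_elapsed_time(x, beta, gamma):
    """Assumed form of Equation (1): tau = 1 / (sum_j beta_j * x_j + gamma)."""
    daily_rate = sum(b * v for b, v in zip(beta, x)) + gamma
    if daily_rate <= 0:
        raise ValueError("the expected daily frequency must be positive")
    return 1.0 / daily_rate


def rehospitalization_density(t, tau):
    """Assumed form of Equation (2): exponential density with mean tau."""
    return math.exp(-t / tau) / tau


def probability_within(t, tau):
    """Probability of rehospitalization by day t (integral of the density)."""
    return 1.0 - math.exp(-t / tau)


if __name__ == "__main__":
    # Hypothetical normalized factor values and weights, for illustration only.
    tau = mean_elapsed_time([0.5, 1.0], [0.002, 0.001], 0.0005)
    print(tau, probability_within(365.0, tau))
```

Under these assumptions a patient whose factors give a daily rate of 0.0025 has a mean elapsed time of about 400 days.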

We next describe the assumption test. Assumption 1 is limited to the relationship between the parameter τi and the clinical factors Xi. If the accuracy of the approximation is insufficient, we can easily extend it to a nonlinear relation such as a higher-order polynomial. Assumption 2 essentially characterizes the process of the occurrence of rehospitalization and defines the formula for its probability density pi(t). Accordingly, before the modeling of the rehospitalization process based on a given data set, a test should be applied to verify that Assumption 2 actually holds true for the given data set.

With n samples in the data set D={(Xi,τi)|i=1, …, n}, where Xi is the set of clinical factor values for patient i, and τi is the elapsed time at rehospitalization after discharge, we first compute a histogram of the rehospitalization occurrences over t, that is, the number nk of rehospitalization occurrences in each elapsed time interval ((k−1)Δt, kΔt) (k=1, …, q) in the data set. The number of equal-width bins q into which the sample range [0, qΔt] is partitioned is chosen according to the rule of Venables and Ripley.11 We also expect a certain value mk of this count by Equation (2) under Assumption 2. The value nk computed from the data set and its value expected by Equation (2), mk, should be consistent if Assumption 2 holds for the data set. The consistency of mk and nk is evaluated by the following G-score:12

\[ G = 2 \sum_{k=1}^{q} n_k \ln\frac{n_k}{m_k} \qquad (3) \]

Because this G-score is known to follow a χ2 distribution with q−2 degrees of freedom, we applied a χ2-test to the null hypothesis that the histogram of the given data set is consistent with Equation (2), that is, that Assumption 2 holds true for the data set. If the P-value of the test is less than a specific significance level α, such as α=0.05, we conclude that Assumption 2 does not hold for the data set. This G-test is known to be more rigorous than the well-known Pearson’s χ2-test.
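The histogram and the G-score can be sketched as follows, assuming the standard definition of the G statistic, G = 2 Σ (observed) ln(observed/expected), with empty bins contributing zero. The helper names are hypothetical, and placing times beyond the last bin edge into the last bin is a choice of this sketch rather than a detail stated in the text.

```python
import math


def histogram_counts(times, bin_width, n_bins):
    """Count events per equal-width bin; times past the range go in the last bin."""
    counts = [0] * n_bins
    for t in times:
        k = min(int(t // bin_width), n_bins - 1)
        counts[k] += 1
    return counts


def g_score(observed, expected):
    """G = 2 * sum(o * ln(o / e)); bins with o == 0 contribute nothing."""
    total = 0.0
    for o, e in zip(observed, expected):
        if o > 0:
            total += o * math.log(o / e)
    return 2.0 * total


if __name__ == "__main__":
    # Hypothetical elapsed times in days, for illustration only.
    observed = histogram_counts([10, 160, 170, 449], 150, 3)
    print(observed, g_score(observed, [1.5, 1.5, 1.0]))
```

The resulting score would then be compared with a χ2 distribution with q−2 degrees of freedom, for example through the survival function of a statistics library.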

Thus, our problem was to derive the expectation value mk (k=1, …, q) from Equation (2). We considered that the τi of the patients in D are sampled from a common population distribution pτ(τ). Therefore, the total probability distribution of the rehospitalization time P(t) is expected to be a superposition of Equation (2) for various τ sampled from pτ(τ), as follows, where p(t) is pi(t) in Equation (2) for a general τ:

\[ P(t) = \int_{0}^{\infty} p(t)\, p_\tau(\tau)\, d\tau \]

We use the following natural conjugate prior distribution for the unknown pτ(τ):

where τi is given by the data set D. The selection of this parameter distribution is widely considered to be reasonable in Bayesian statistics because it preserves the exponential shape of the distribution of elapsed times t.13 After several manipulations, the following P(t) is derived:

Accordingly, the expectation mk is given by the accumulation of P(t) over the interval ((k−1)Δt, kΔt) as follows:

\[ m_k = n \int_{(k-1)\Delta t}^{k\Delta t} P(t)\, dt \qquad (4) \]

Using Equations (3) and (4), we tested the validity of Assumption 2 for the given data set D.
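To illustrate how expected bin counts can follow from a mixture of exponential densities, the sketch below assumes that the conjugate prior on τ is an inverse-gamma distribution with hypothetical hyperparameters shape and scale; the parametrization actually used in this study may differ. Under that assumption the mixture has the closed form P(t) = shape × scale^shape / (t + scale)^(shape+1), with survival function (scale / (scale + t))^shape, so the bin probabilities are differences of the survival function.

```python
def mixture_density(t, shape, scale):
    """Exponential density averaged over an inverse-gamma prior on tau (assumed)."""
    return shape * scale ** shape / (t + scale) ** (shape + 1)


def mixture_survival(t, shape, scale):
    """Probability that rehospitalization has not yet occurred by time t."""
    return (scale / (scale + t)) ** shape


def expected_bin_counts(n, bin_width, n_bins, shape, scale):
    """Expected counts m_k = n * P((k-1)*dt < t <= k*dt) for k = 1..n_bins."""
    return [
        n * (mixture_survival((k - 1) * bin_width, shape, scale)
             - mixture_survival(k * bin_width, shape, scale))
        for k in range(1, n_bins + 1)
    ]


if __name__ == "__main__":
    # Hypothetical hyperparameters, for illustration only.
    print(expected_bin_counts(151, 150.0, 13, 2.0, 400.0))
```

Counts produced this way could serve as the expected values in a goodness-of-fit comparison against the observed histogram.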

Finally, we describe the modeling algorithm. First, the value of every factor for all patients i=1, …, n in D was normalized to fit into the interval [0,1] using the maximum and minimum values. This normalization, which eliminates differences in the factor scales, was necessary to measure the essential contribution of each factor’s variation to τi. Subsequently, we applied Equations (1) and (2) to the normalized data set DN to model the probabilistic rehospitalization process when Assumption 2 holds for the data set. We determined the model parameters βj and γ in Equation (1) to maximize the following objective function:

\[ L(\beta_1, \ldots, \beta_p, \gamma) = \sum_{i=1}^{n} \ln p_i(t_i) - \lambda \sum_{j=1}^{p} |\beta_j| \qquad (5) \]

where ti denotes the observed elapsed time to rehospitalization of patient i in DN and pi is given by Equations (1) and (2). The first term is the log-likelihood of the model consisting of Equations (1) and (2) over DN. The second term is called an L1-regularization term, which penalizes the coefficients of negligible factors by setting them equal to zero; a larger hyper-parameter λ eliminates more factors.13, 14 This term avoids the over-fitting of the model to the data set by selecting a set of effective factors from a given Xi. In our study, λ was tuned to 0.02 to maintain the largest value of Equation (5), similarly to the other parameters βj and γ.

To seek the optimum parameter values of β1,…,βp, γ that maximize the objective function L(β1,…,βp, γ), we applied a simple greedy hill-climbing algorithm, in which the parameter values are iteratively modified in the direction of the gradient (∂L/∂β1,…,∂L/∂βp, ∂L/∂γ). When the improvement of L becomes nearly negligible, the resulting parameter values are taken as the optima. Because this process depends on the initial values of the parameters, we repeated this optimization 100 times starting with random initial values and selected the result providing the maximum L.
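The fitting procedure can be sketched as follows. The sketch assumes that the objective is the exponential log-likelihood Σ [ln ri − ri ti], with daily rate ri = Σ βj xij + γ, minus λ Σ |βj|. The step-size rule, the initial-value ranges and all function names are assumptions of this sketch, not details reported in the study.

```python
import math
import random


def min_max_normalize(rows):
    """Scale every factor (column) into [0, 1]; constant columns map to 0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)] for row in rows]


def rate(beta, gamma, x):
    """Expected daily frequency: the denominator of the assumed Equation (1)."""
    return sum(b * v for b, v in zip(beta, x)) + gamma


def objective(beta, gamma, rows, times, lam):
    """Log-likelihood minus the L1 penalty; -inf if any rate is not positive."""
    total = 0.0
    for x, t in zip(rows, times):
        r = rate(beta, gamma, x)
        if r <= 0.0:
            return -math.inf
        total += math.log(r) - r * t
    return total - lam * sum(abs(b) for b in beta)


def gradient(beta, gamma, rows, times, lam):
    """(Sub)gradient of the objective with respect to beta and gamma."""
    g_beta = [-lam * ((b > 0) - (b < 0)) for b in beta]
    g_gamma = 0.0
    for x, t in zip(rows, times):
        resid = 1.0 / rate(beta, gamma, x) - t
        for j, v in enumerate(x):
            g_beta[j] += resid * v
        g_gamma += resid
    return g_beta, g_gamma


def hill_climb(rows, times, lam, rng, max_iter=2000, tol=1e-9):
    """Gradient ascent with step halving, from a random feasible start."""
    beta = [rng.uniform(0.0, 1e-3) for _ in rows[0]]   # assumed initial range
    gamma = rng.uniform(1e-3, 1e-2)                    # assumed initial range
    best = objective(beta, gamma, rows, times, lam)
    step = 1e-4
    for _ in range(max_iter):
        g_beta, g_gamma = gradient(beta, gamma, rows, times, lam)
        while step > 1e-18:
            new_beta = [b + step * g for b, g in zip(beta, g_beta)]
            new_gamma = gamma + step * g_gamma
            value = objective(new_beta, new_gamma, rows, times, lam)
            if value > best:
                break
            step *= 0.5          # overshoot or infeasible: shrink the step
        else:
            break                # no improving step could be found
        improvement = value - best
        beta, gamma, best = new_beta, new_gamma, value
        step *= 2.0
        if improvement < tol:
            break
    return best, beta, gamma


def fit(rows, times, lam=0.02, restarts=100, seed=0):
    """Normalize the factors, then keep the best of several random restarts."""
    rng = random.Random(seed)
    normalized = min_max_normalize(rows)
    runs = [hill_climb(normalized, times, lam, rng) for _ in range(restarts)]
    return max(runs, key=lambda run: run[0])


if __name__ == "__main__":
    # Hypothetical toy data: two factors, four patients, times in days.
    rows = [[1.0, 10.0], [2.0, 30.0], [3.0, 20.0], [4.0, 40.0]]
    print(fit(rows, [500.0, 300.0, 200.0, 100.0], restarts=5))
```

Plain gradient steps of this kind shrink small coefficients but rarely make them exactly zero; a proximal or coordinate-wise update would be one way to obtain exact zeros, and this sketch omits it.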

Results

Patient characteristics

Out of the 151 patients, 36 died of cardiovascular events after rehospitalization during the follow-up period. The remaining 115 patients were readmitted to our hospital at a median time of 296 days after discharge (range, 3–1891). Among these patients, the HF etiologies were valvular heart disease (n=38), dilated cardiomyopathy (n=30), hypertrophic cardiomyopathy (n=22), ischemic heart disease (n=20), hypertensive heart disease (n=17) and others. Their mean age was 68.6±14.6 years (range, 19–93), and 38% of the patients were women. The clinical characteristics of the 151 patients are summarized in Table 1.

Table 1 Patient characteristics

Validation of the formula

We hypothesized that the time-to-rehospitalization histogram for all patients (Figure 1) should be distributed exponentially if the rehospitalization of each patient is generated by a Poisson process. We therefore validated the assumptions of the model architecture. The goodness of fit was assessed by a χ2-test, considering that the incidence rates of rehospitalization or death differ depending on the patients. Thus, the null hypothesis that the observed frequency is generated by a mixed Poisson process was tested, as explained in the Methods section. We chose a bin width of 150 days for the elapsed time to rehospitalization, which is one-thirteenth of the range of the time interval [1, 1950], according to the bin-number rule described in the Methods section. As a result, the P-value was 0.29, which was far larger than 0.05, and the null hypothesis was not rejected. Therefore, we concluded that the mathematically derived estimation formula for the rehospitalization of each patient was consistent with a mixed Poisson process.

Figure 1

Time-to-rehospitalization histogram for all patients.

Factors in rehospitalization for HF

We collected 402 clinical factors (Figures 2 and 3), and 150 out of 402 factors having small effects on the prognosis were automatically excluded by the regularization method described in the Methods section. Finally, we selected 252 factors for the analysis (Figures 2 and 3). The estimation results for the attribute coefficients are presented in bar graph form and numerically.

Figure 2

Factors influencing the estimation of rehospitalization for HF and the contribution of each parameter. All of the clinical and personal factors for the patients with HF. Predictor variables with coefficient indicate the factors selected after the application of the regularization method. Negative values indicate favorable impact on prognosis, whereas positive values indicate undesirable effect. HF, heart failure; ADHF, acute decompensated heart failure; NYHA, New York Heart Association; CTR, cardiothoracic ratio; ECG, electrocardiogram; AST, aspartate aminotransferase; ALT, alanine aminotransferase; BNP, B-type natriuretic peptide; UIBC, unsaturated iron-binding capacity; LVDd, left ventricular end-diastolic dimension; LVDs, left ventricular end-systolic dimension; FS, fractional shortening; IVS, interventricular septal thickness; PW, left ventricular posterior thickness; LAD, left atrial dimension; TMF-E, the peak mitral inflow early diastolic velocity; TMF-A, the peak mitral inflow atrial filling; DcT, deceleration time; TR PG, tricuspid regurgitation pressure gradient; PAEDP, pulmonary artery end-diastolic pressure; MR, mitral regurgitation; AR, aortic regurgitation; AS, aortic stenosis; MS, mitral stenosis; ACEI, angiotensin-converting enzyme inhibitors; ARB, angiotensin receptor blockers; DM, diabetes mellitus; IABP, intraaortic balloon pumping; PCPS, percutaneous cardio pulmonary support; EDVI, end-diastolic volume index; ESVI, end-systolic volume index; IVC, inferior vena cava respiratory change; E/E′, ratio of peak mitral E-wave velocity to peak mitral annular velocity.

Figure 3

Factors influencing the estimation of rehospitalization for heart failure and the contribution of each parameter. All of the medications at discharge for the patients with heart failure. Medications were calculated as ratios of their recommended doses. All drugs were divided into 55 groups. Predictor variables with coefficient indicate the factors selected after the application of the regularization method. Negative values indicate favorable impact on prognosis, whereas positive values indicate undesirable effect. HF, heart failure; ADHF, acute decompensated heart failure; NYHA, New York Heart Association; CTR, cardiothoracic ratio; ECG, electrocardiogram; AST, aspartate aminotransferase; ALT, alanine aminotransferase; BNP, B-type natriuretic peptide; UIBC, unsaturated iron-binding capacity; LVDd, left ventricular end-diastolic dimension; LVDs, left ventricular end-systolic dimension; FS, fractional shortening; IVS, interventricular septal thickness; PW, left ventricular posterior thickness; LAD, left atrial dimension; TMF-E, the peak mitral inflow early diastolic velocity; TMF-A, the peak mitral inflow atrial filling; DcT, deceleration time; TR PG, tricuspid regurgitation pressure gradient; PAEDP, pulmonary artery end-diastolic pressure; MR, mitral regurgitation; AR, aortic regurgitation; AS, aortic stenosis; MS, mitral stenosis; ACEI, angiotensin-converting enzyme inhibitors; ARB, angiotensin receptor blockers; DM, diabetes mellitus; IABP, intraaortic balloon pumping; PCPS, percutaneous cardio pulmonary support; EDVI, end-diastolic volume index; ESVI, end-systolic volume index; IVC, inferior vena cava respiratory change; E/E’, ratio of peak mitral E-wave velocity to peak mitral annular velocity.

Regarding underlying diseases in HF, whereas dilated cardiomyopathy (−4.5), hypertrophic cardiomyopathy (−1.5) and hypertensive heart disease (−1.0) had better outcomes, valvular disease (7.4) and dilated phase hypertrophic cardiomyopathy (2.4) had poor prognoses. Ischemia (4.4) was the worst trigger of HF. Based on laboratory data, whereas elevated inflammatory response values, such as white blood cell counts (−1.6/5.8; at admission/at discharge) or C-reactive protein levels (−2.2/8.1; at admission/at discharge), did not indicate a poor prognosis at admission, these elevated inflammatory response values at discharge were associated with a poor prognosis. Increases in the levels of aspartate aminotransferase (6.6), alanine aminotransferase (3.2), uric acid (6.6) and BNP (4.8) at discharge also indicated a poor prognosis. Patients who received dopamine (11.9), isosorbide dinitrate (5.0) or diuretic (2.0) infusions in the acute management of HF showed worse prognoses. In contrast, the use of dobutamine (−2.5) or nitroglycerin (−2.5) drip infusions resulted in better prognoses.

Regarding oral medications at discharge, the angiotensin-converting enzyme inhibitor alacepril (−4.2), the β-blocker carvedilol (−7.1, the best response), the angiotensin receptor blocker telmisartan (−1.6), the diuretic furosemide (−4.2), the lipid-lowering drugs pitavastatin (−3.3), atorvastatin (−2.9) and ezetimibe (−2.2), the coronary dilator isosorbide dinitrate (−3.1), the antiallergic fexofenadine hydrochloride (−5.1), the sedative-hypnotic triazolam (−3.2), the proton pump inhibitor lansoprazole (−0.9) and all antiflatulents except toughmac led to better prognoses. However, the calcium channel blocker nifedipine (9.4) resulted in the worst outcome, and all diabetes drugs, antiarrhythmic drugs, potassium agents, vitamins and purgatives, excluding senna, were associated with worse prognoses.

Fitting the model to clinical data

The mean actual time to rehospitalization (X) was 388±377 days, whereas the mean estimated time calculated by the probability model based on a Poisson process (Y) was 398±381 days; X and Y were very tightly correlated (Figure 4). The results showed that the rehospitalization time can be expressed as the dependent variable of a mathematical formula in which the clinical and personal factors before rehospitalization are the independent variables.

Figure 4

Correlation between the clinical data and the values calculated using the mathematical formula. The clinical data are in excellent agreement with the calculated times.
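Agreement between observed and calculated times can be summarized with a correlation coefficient. The following sketch assumes that the agreement measure is the squared Pearson correlation between the two series, which is an assumption of this sketch; the input values are hypothetical and are not the study data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_squared(xs, ys):
    """Squared Pearson correlation."""
    r = pearson_r(xs, ys)
    return r * r


if __name__ == "__main__":
    # Hypothetical observed and calculated times in days, for illustration only.
    observed = [30.0, 120.0, 300.0, 650.0, 1200.0]
    calculated = [42.0, 110.0, 320.0, 640.0, 1180.0]
    print(r_squared(observed, calculated))
```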

Discussion

This study provided evidence that the values of numerous factors, including risk factors at one phase of disease, can be used to construct a mathematical equation to predict clinical outcomes. We were able to derive the equation τ=f(x1, …, xp), where τ is the time to a future clinical event and x1, …, xp are clinical factors observed before the event. In this case, τ represents the days until rehospitalization after discharge, and x1, …, xp are the clinical and personal factors for patients hospitalized for ADHF. This study provides evidence that the clinical outcome of τ in this context is a function of 252 significant factors such as plasma BNP levels at and soon after discharge. This study presents the time to rehospitalization as the dependent variable and the clinical and personal factors before rehospitalization as the independent variables.

This study suggests the novel idea that the time to clinical events, such as rehospitalization or death, can be mathematically formulated from clinical and personal factors, demonstrating that clinical medicine can engage in physical science. The novelty of this study lies in the fact that clinical outcomes have been thought to be determined mainly from medical knowledge and the experience of physicians. It can be argued that the known effectiveness of drugs may determine the time course of clinical events. Although this is partially true,15, 16, 17 no one knows how one drug or a combination of several drugs affects patients with different degrees of severity of a given disease. It may also be argued that large-scale trials may better depict clinical outcomes; for example, the patients with BNP levels of <170 pg/ml showed a 20% reduction of rehospitalization compared with the patients with BNP levels greater than 170 pg/ml.18, 19 Evaluating such results by Kaplan–Meier analysis is common in clinical medicine; however, this analysis only provides the average tendency of the average patient to undergo rehospitalization and does not prospectively provide a future clinical outcome for each patient. Indeed, in epidemiological studies, many biomarkers, such as BNP levels or C-reactive protein levels, in addition to the classical risk factors, such as hypertension or diabetes mellitus, are known to be related to cardiovascular events and death. However, Wang et al.20 showed that although multiple biomarkers are associated with a high relative risk of adverse events, even in combination they add only moderately to the prediction of risk in an individual person. This suggests that the occurrence of cardiovascular events may not be well predictable or mathematically formulated. On the other hand, using the formula developed in this study, we can identify the day of a clinical event to within a small range, suggesting that we need more clinical data than we expected to predict future outcomes or to obtain a mathematical formula for the prediction.

It would be difficult to strictly prove that this mathematical formula is correct because no gold standard or correct answer is available in the medical literature. However, there are hints as to the correctness of this formula. First, we assumed that rehospitalization is generated by a Poisson process; if this is true, a histogram of the day of rehospitalization after discharge should follow the exponential-type distribution expected from a mixed Poisson process. We found that the present data for the actual day of rehospitalization were consistent with this distribution. Second, when we compared the day of rehospitalization in a clinical setting with the calculated day of rehospitalization obtained by the formula, the two agreed well, suggesting that the current formula is likely to be correct. Third, we prevented over-fitting of the free variables to the clinical data by means of the regularization method, indicating the suitability of the present formula.

We do not believe that this equation is the perfect formula to predict the day of rehospitalization from numerous variables. Although we included 402 factors as the free variables, including factors as diverse as echocardiographic data and marital status, we may have neglected to include other unknown but important factors that may determine the day of rehospitalization. We did not include information on patient genetic backgrounds, such as point mutations in the myosin heavy chain; social status, such as occupation or annual income; private matters, such as hobbies or personal characteristics; or mental health parameters, such as depression. The inclusion of these factors may improve the formula presented in this study; however, the present formula already provides a good fit with an R2 value of 0.9879. Most importantly, the possibility of constituting such a mathematical formula in clinical practice is now clear.

In this study, we assumed that a linear function of each parameter contributes to the formation of the formula for the clinical outcome. One might suggest the use of nonlinear functions of all of the factors to provide a more accurate approximation of the rehospitalization time. In fact, we performed a nonlinear analysis using these data, and surprisingly, the nonlinear method using support vectors yielded no improvement over the present formula using the linear functions of the factors.

Limitations

First, the factors in this study may have confounded each other; we used the regularization method to automatically eliminate the factors that have weak effects on prognosis. Although the remaining factors with strong effects on prognosis could have confounded each other, the results of this study are probably not weakened because we obtained a good fit to the clinical outcome using these factors. When we consider the clinical and pathophysiological meaning of each factor, we need to pay attention to each factor independently.

The other main limitation of this study is that the patient population consists of a retrospective cohort. However, because we enrolled all of the patients who were admitted to our department during the entry period, the selection bias may be small. Furthermore, this is a single-center study, so the formula may hold only at our institute. However, because (1) approximately one-half of the patients who were hospitalized during this time were referred from other hospitals, (2) the nature and treatment of HF did not differ among the hospitals and (3) our hospital sets a high standard for CHF treatment and specializes in receiving CHF patients from all over Japan, we believe that the formula developed in this study may be generalized. We estimated the day of rehospitalization in this study; however, the important issue is the ability to make this prediction, which needs further investigation.

Conclusions

This study demonstrated that clinical medicine and practice can use a mathematical formula to predict clinical outcomes or events using current data. A prospective study is needed to test whether this formula predicts the day of rehospitalization in CHF patients who are admitted because of ADHF and discharged after treatment. The application of these risk factors to individual CHF patients may distinguish those patients who are at low risk from those who are at high risk and may benefit from closer monitoring and aggressive treatment.