Recurrent probabilistic neural network-based short-term prediction for acute hypotension and ventricular fibrillation

In this paper, we propose a novel method for predicting acute clinical deterioration triggered by hypotension, ventricular fibrillation, and an undiagnosed multiple disease condition using biological signals such as heart rate, RR interval, and blood pressure. Efforts to predict such acute clinical deterioration events have recently received much attention from researchers, but most of them target a single symptom. The distinctive feature of the proposed method is that the occurrence of the event is expressed as a probability by applying a recurrent probabilistic neural network in which a hidden Markov model and a Gaussian mixture model are embedded. Additionally, its machine learning scheme allows it to learn from sample data and to be applied to a wide range of symptoms. The performance of the proposed method was tested using datasets provided by Physionet and the University of Tokyo Hospital. The results show that the proposed method has a prediction accuracy of 92.5% for patients with acute hypotension and can predict the occurrence of ventricular fibrillation 5 min before it occurs with an accuracy of 82.5%. In addition, a symptom event triggered by a multiple disease condition can be predicted 7 min before it occurs with an accuracy of over 90%.

Biometric information monitoring devices are used in various clinical scenarios such as surgeries and intensive care units (ICUs) 1. Many of these devices raise an alarm when clinical deterioration of the patient is detected via biological indices. For example, a pulse oximeter, which is capable of measuring the saturation of peripheral oxygen (SpO2) through a simple pinch on the fingertip, raises an alarm when SpO2 falls below the threshold value (generally 89-92% 2). In other cases, blood pressure (e.g., diastolic blood pressure) can be continuously measured using a sphygmomanometer, and an alert is sounded when it falls below a set threshold. The thresholds for many of these alarms are set based on the prior experience of the healthcare provider and the patient's condition 3. These medical devices can perform long-term monitoring of the patients' biological information and are important for efficient and effective treatment. However, conventional medical devices raise an alarm only after detecting a deterioration. This is problematic for the medical staff, who cannot stay near the patient all the time.
To solve this problem, several studies have proposed clinical deterioration prediction systems 4,5. For example, Langley et al. 4 focused on changes in heart rate intervals and proposed an approach to predict the development of idiopathic atrial fibrillation with an accuracy of 56.0%. This approach used the deviation from the average heart rate interval as a predictor. Lynn and Chiang 6 proposed an algorithm based on nonlinear features

Materials and methods
Proposed method. Figure 1 shows the proposed prediction method. First, the measured biological signals were preprocessed. Preprocessing includes the calculation of indices related to heart rate variability (HRV). The preprocessed signal was then fed to a probabilistic neural network (i.e., R-LLGMN) to predict the probabilities of conditions P minutes in the future. In this section, the proposed method is discussed in detail.
Preprocessing. HRV analysis was performed on the heart rate interval (RRI) acquired from an electrocardiograph. In this paper, we configured the RR recording interval as 1 minute in accordance with previous studies 21-23 on short-term and ultra-short-term HRV analyses. The related indices include the coefficient of variation of R-R intervals (CVRR) and the following indices that reflect vagal tone intensity 24,25: the root mean square of successive differences 26 (RMSSD) and the proportion of pairs of successive RRIs that differ by more than 50 ms 27 (pNN50). In the equations defining these indices, $N_{RRI}$ is the total number of RRIs in 30 s, $RRI_{mean}$ is the average value of the RRIs in 30 s, $N_{dif}$ is the total number of successive adjacent RRI differences, and $N_{dif50}$ is the total number of successive adjacent RRI differences whose absolute value is greater than or equal to 50 ms.
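The three indices can be illustrated with a minimal sketch. It assumes the standard definitions of CVRR, RMSSD, and pNN50 expressed with the symbols defined above; the percentage scaling of CVRR and pNN50, the division by $N_{RRI}$ in the standard deviation, and the function name `hrv_indices` are assumptions made for illustration and may differ from the exact equations used in this study.

```python
import math


def hrv_indices(rri_ms):
    """Return (CVRR [%], RMSSD [ms], pNN50 [%]) for one window of RR intervals in ms.

    Assumed standard definitions; scaling to percent is an assumption.
    """
    n_rri = len(rri_ms)
    if n_rri < 2:
        raise ValueError("need at least two RR intervals")

    rri_mean = sum(rri_ms) / n_rri
    # Standard deviation with division by N_RRI (assumption).
    sd = math.sqrt(sum((r - rri_mean) ** 2 for r in rri_ms) / n_rri)
    cvrr = 100.0 * sd / rri_mean

    # Successive adjacent RRI differences; N_dif = N_RRI - 1.
    diffs = [b - a for a, b in zip(rri_ms, rri_ms[1:])]
    n_dif = len(diffs)
    rmssd = math.sqrt(sum(d * d for d in diffs) / n_dif)

    # N_dif50 counts differences whose absolute value is >= 50 ms.
    n_dif50 = sum(1 for d in diffs if abs(d) >= 50.0)
    pnn50 = 100.0 * n_dif50 / n_dif
    return cvrr, rmssd, pnn50


cvrr, rmssd, pnn50 = hrv_indices([800.0, 860.0, 800.0, 810.0])
```

In this sketch, each analysis window yields one three-dimensional feature vector (CVRR, RMSSD, pNN50), which corresponds to the three-dimensional input examined in the Vf experiment.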
For biological signals that were not obtained from electrocardiographs, we employed the following preprocessing methods:
(a) Standardisation of the biological signal to the normal distribution $N(0, \sigma_d)$.
(b) Time-differentiation using a differential filter that can reduce measurement noise, based on the centred difference method 28.
Here, $s(t)$ represents a measured biological signal and $h$ is the sampling time.
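The two preprocessing methods can be sketched as follows. This is a minimal illustration and not the exact filters of this study: it assumes that $\sigma_d$ is the target standard deviation of the standardised signal, and it uses the basic two-point centred difference $(s(t+h) - s(t-h)) / 2h$ in place of the noise-reducing differential filter of reference 28. The function names are hypothetical.

```python
import statistics


def standardise(signal, sigma_d=0.01):
    """Shift the signal to zero mean and scale it to standard deviation sigma_d.

    Treating sigma_d as a standard deviation is an assumption.
    """
    mean = statistics.fmean(signal)
    std = statistics.pstdev(signal)
    if std == 0.0:
        # A constant signal carries no variation to scale.
        return [0.0 for _ in signal]
    return [sigma_d * (s - mean) / std for s in signal]


def centred_difference(signal, h):
    """Approximate ds/dt at interior samples with the two-point centred difference.

    h is the sampling time; the first and last samples are dropped.
    """
    return [
        (signal[i + 1] - signal[i - 1]) / (2.0 * h)
        for i in range(1, len(signal) - 1)
    ]


scaled = standardise([1.0, 2.0, 3.0, 4.0, 5.0], sigma_d=0.1)
slope = centred_difference([0.0, 1.0, 4.0, 9.0, 16.0], h=1.0)
```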
Proposed prediction model. A prediction model for acute deterioration triggered by the target symptoms must satisfy the following requirements:
(1) Ability to account for the time series characteristics of biological signals.
(2) Ability to express the diversity of patients' conditions.
(3) Ability to express the uncertainty of the predicted physical condition in a probabilistic manner.
(4) Ability to simultaneously evaluate multiple types of biological signals.
(5) Applicability to different medical fronts and different patients.
To satisfy the first two requirements, we apply an HMM 29. An HMM can express various symptoms of the patient through the concepts of "states" and "probabilistic transitions" between the states. For example, the condition of a patient can be defined as either "normal deterioration" or "acute deterioration", and the temporal change of the biological signal drives the probabilistic transition between the defined conditions. However, biological signals are expected to be complex nonlinear waveforms. Therefore, they are approximated using a multidimensional Gaussian mixture model 30, which is capable of expressing a multimodal distribution as a weighted sum of multiple Gaussian distributions 31. To approximate the complex waveforms of biological signals and to satisfy requirements (3) and (4), the proposed model was constructed based on a continuous density HMM 32, which is a combination of an HMM and the multidimensional Gaussian mixture model. The probabilistic output of each state of the HMM can thus be represented by a multidimensional Gaussian mixture distribution, which enables calculation of the occurrence probability of a deterioration event from multiple types of biological signals. The bottom row of Fig. 1 [...], where $x(t) \in \mathbb{R}^d$. In the expression for the a posteriori probability of class $c$, $P(c|x(t))$, $\gamma^c_{k',k}$ is the probability of state transition from $k'$ to $k$ in class $c$, and $b^c_k(x(t))$ is defined as the a posteriori probability for state $k$ in class $c$ corresponding to $x(t)$. The prior probability $\pi^c_k$ is equal to $P(c, k)|_{t=0}$. The posterior probability $b^c_k(x(t))$ is assumed to be given by a multidimensional Gaussian mixture model consisting of $M_{c,k}$ components, so that $\gamma^c_{k',k} b^c_k(x(t))$ can be rewritten in terms of these components, where $r_{(c,k,m)}$ is the mixing proportion, $\mu^{(c,k,m)} \in \mathbb{R}^d$ is the mean vector, and $\Sigma^{(c,k,m)} \in \mathbb{R}^{d \times d}$ is the covariance matrix of each component.
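How a class posterior can be obtained from these quantities is sketched below. The sketch assumes the standard forward recursion of a continuous density HMM (initialisation with $\pi^c_k b^c_k(x(1))$, propagation through $\gamma^c_{k',k}$, and normalisation over all classes), which is consistent with the symbols defined above but is not confirmed to be term-by-term identical to the equations of this study. For brevity it also assumes diagonal covariance matrices, and all function and dictionary key names are hypothetical.

```python
import math


def gaussian_diag(x, mean, var):
    """Gaussian density with a diagonal covariance (a simplifying assumption)."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)


def emission(x, components):
    """b_k^c(x): weighted sum over (mixing proportion, mean, variance) components."""
    return sum(r * gaussian_diag(x, mu, var) for r, mu, var in components)


def class_posteriors(xs, models):
    """Return P(c | x(1..T)) for each class via the forward recursion.

    models[c] holds "pi" (joint priors P(c, k) at t = 0), "gamma" (K x K
    transition probabilities, gamma[k_prev][k]) and "mix" (components per state).
    No scaling is applied, so this is suitable for short sequences only.
    """
    totals = []
    for model in models:
        n_states = len(model["pi"])
        alpha = [
            model["pi"][k] * emission(xs[0], model["mix"][k])
            for k in range(n_states)
        ]
        for x in xs[1:]:
            alpha = [
                sum(alpha[kp] * model["gamma"][kp][k] for kp in range(n_states))
                * emission(x, model["mix"][k])
                for k in range(n_states)
            ]
        totals.append(sum(alpha))
    normaliser = sum(totals)
    return [t / normaliser for t in totals]


# Toy example: class 0 centred at 0 (normal), class 1 centred at 2 (deterioration).
toy_models = [
    {"pi": [0.5], "gamma": [[1.0]], "mix": [[(1.0, [0.0], [1.0])]]},
    {"pi": [0.5], "gamma": [[1.0]], "mix": [[(1.0, [2.0], [1.0])]]},
]
posterior = class_posteriors([[0.0], [0.1]], toy_models)
```

In R-LLGMN these quantities are not set by hand as in the toy example; they are represented by network weights and learned from data.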
The parameters included in the model are used to estimate the probability distribution and generate the posterior probability $P(c|x(t))$ of acute deterioration. To set up a machine learning framework for determining the parameters and to satisfy requirement (5), an R-LLGMN 14 (see Supplementary Information S1) was employed. The output of R-LLGMN is the posterior probability of each class based on the multidimensional Gaussian mixture model. In addition, because the parameters of R-LLGMN can be adjusted to the given learning dataset, the proposed method can be applied to various symptoms. Let us represent a pair of learning data for R-LLGMN consisting of the input vector of X [...] is the posterior probability of class c. Each index represents the following: $n = 1, 2, \ldots, N$ is the dataset number, $T_d$ is the total time step to output a posterior probability vector $O_c^{(n)}$ at the output layer of R-LLGMN, class c = 1 represents the normal condition, and c = 2 represents the occurrence of a deterioration event, such that C = 2. The evaluation function $J$ is defined over these learning data, and the learning process is applied to minimise $J$ (i.e., to maximise the likelihood). The weight parameters in R-LLGMN are iteratively updated using backpropagation through time 33 (BPTT). BPTT is a method of accumulating the error gradient over the time series and calculating the weight correction amount for each iteration. After the parameters included in the first and second layers are adjusted, R-LLGMN can predict the class of the condition of the target patients, such as normal or acute deterioration, P minutes after the acquisition of the biological signals.
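Because minimising $J$ is stated to be equivalent to maximising the likelihood, one plausible form of the evaluation function is the negative log-likelihood of the correct class summed over the learning data. The sketch below implements that form under this assumption; the teacher vectors, the function name, and the small constant `eps` are introduced only for illustration and are not taken from this study.

```python
import math


def evaluation_function(outputs, teachers, eps=1e-12):
    """Assumed form: J = -sum_n sum_c T_c(n) * log O_c(n).

    outputs[n][c]  : posterior probability of class c for learning sample n
    teachers[n][c] : 1 for the correct class of sample n, 0 otherwise
    eps guards against log(0) and is an implementation detail of this sketch.
    """
    j = 0.0
    for o_n, t_n in zip(outputs, teachers):
        for o_c, t_c in zip(o_n, t_n):
            j -= t_c * math.log(max(o_c, eps))
    return j


# Two learning samples: the first is normal (c = 1), the second is an event (c = 2).
j_value = evaluation_function([[0.9, 0.1], [0.2, 0.8]], [[1, 0], [0, 1]])
```

Under this form, $J$ approaches zero as the posterior probability assigned to the correct class approaches one for every learning sample.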

Experimental configuration.
In order to verify the prediction performance of the proposed model, a prediction experiment was conducted using datasets that include various cases of ICU patients, published by Physionet 13 and the University of Tokyo Hospital. The dataset provided by the ICU at the University of Tokyo Hospital is composed of vital signs of patients in the ICU, with a sampling time of 1024 [ms]. In addition, two types of information are given as expert annotations. These are technical validity and clinical relevance of the alarm raised by the biological information monitor. Technical validity is the result of the diagnosis performed by the nurse when an alarm is raised, and clinical relevance is the result of the diagnosis performed by the doctor when the alarm is raised.
For the Physionet 13 dataset, one of the authors took part in an online ethics training program called "Protecting Human Research Participants" 34 (Certification number: 1756830, Acquisition date: May 2, 2015). For the University of Tokyo Hospital dataset 35,36, the authors confirmed that all data were provided with authorisation from the University of Tokyo Hospital ethics committee. Furthermore, informed consent was obtained from all examinees or their families. Using the above datasets, the following five analyses were conducted:
(i) Preprocessing selection: The influence of two types of preprocessing methods on the prediction accuracy was tested to determine the best preprocessing method. In addition, the optimal $\sigma_d$ was also determined.
(ii) R-LLGMN hyperparameter selection: The influence of $M_{c,k}$ and $K_c$ on the prediction accuracy and the learning time was tested using the same dataset that was used for preprocessing selection.
(iii) Prediction of acute hypotension: Using the hyperparameters determined above, the prediction accuracy for the occurrence of acute hypotension was examined. The accuracy was compared with that of previous methods.
(iv) Prediction of Vf: The accuracy of Vf prediction was tested. This analysis aims to investigate whether the proposed method can predict the occurrence of acute diseases other than acute hypotension.
(v) Prediction of multiple symptoms: The prediction accuracy for events triggered by a multiple disease condition was tested.
In all experiments, the positive threshold was calculated by performing receiver operating characteristic (ROC) analysis on the learning dataset. In addition, we defined patients with deterioration as the patients whose events occurred within P minutes, corresponding to class c = 2, and patients in normal condition as those patients whose events did not occur within P minutes, corresponding to class c = 1. Next, the configurations are discussed in detail.
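A threshold selection of this kind can be sketched as follows. The criterion used to pick the operating point on the ROC curve is not stated here, so the sketch assumes Youden's index (sensitivity + specificity - 1) purely for illustration; the function name is hypothetical.

```python
def roc_threshold(probabilities, labels):
    """Pick the posterior-probability threshold that maximises Youden's index.

    probabilities : predicted posterior probability of the event class (c = 2)
    labels        : 1 for samples with an event (c = 2), 0 for normal (c = 1)
    Youden's index is an assumed criterion, not one confirmed by the source.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present in the learning dataset")

    best_threshold, best_index = None, -1.0
    # Every observed probability is a candidate threshold ("positive if p >= threshold").
    for threshold in sorted(set(probabilities)):
        tp = sum(1 for p, y in zip(probabilities, labels) if y == 1 and p >= threshold)
        tn = sum(1 for p, y in zip(probabilities, labels) if y == 0 and p < threshold)
        index = tp / n_pos + tn / n_neg - 1.0
        if index > best_index:
            best_threshold, best_index = threshold, index
    return best_threshold, best_index


threshold, youden = roc_threshold([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

The threshold is computed on the learning dataset only and then applied unchanged to the held-out data, so that the test data do not influence the operating point.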
Preprocessing selection. Dataset 1 provided by Physionet (see Table 1) was used to determine the best preprocessing method. Dataset 1 contains data from a total of 60 patients, of whom 30 had developed acute hypotension while the remaining 30 had not. The sampling time of Dataset 1 is 60 s, and the input biological signals are heart rate, systolic blood pressure, diastolic blood pressure, and mean blood pressure. The data were trimmed to 12 min based on a preliminary experiment (see Supplementary Information S3). Two types of processing, (a) normalisation processing and (b) time-differential processing, were performed to investigate their influence on accuracy. For preprocessing (a), $\sigma_d$ was varied as follows: $\sigma_d = 1, 0.1, 0.01, 0.001$. In these analyses, the learning process of R-LLGMN was repeated five times with different initial weights, and the average prediction accuracy was calculated. Iterative two-way analysis of variance (ANOVA) was performed to compare methods (a) and (b). If interactions were confirmed at a significance level of 5%, multiple tests based on the Bonferroni method were performed with p < 0.05 as the significance level. In the statistical processing for (b), multiple tests based on the Bonferroni method were performed at a significance level of 5%. In addition, multiple tests based on the Bonferroni method were also performed to compare the effect of different $\sigma_d$ values. Three combinations of the hyperparameters $M_{c,k}$ and $K_c$ included in R-LLGMN were set as $(M_{c,k}, K_c) = (1, 2), (2, 3), (3, 3)$.
R-LLGMN hyperparameter selection. The influence of $M_{c,k}$ and $K_c$ on prediction accuracy and learning time was tested using Dataset 1. This experiment was performed only with the normalisation process. The hyperparameters were varied in the ranges $M_{c,k} = 1, 2, 3, 4, 5$ and $K_c = 1, 2, 3, 4, 5$. The hyperparameter $\sigma_d$ was set to 0.01. The other hyperparameter settings were the same as those described in the previous section. The leave-one-patient-out cross-validation method was employed to calculate prediction accuracy and learning time. The PC used in this experiment had an Intel Xeon X5667 CPU (Intel Corporation; number of cores: 4; clock frequency: 3.1 GHz) and 16.0 GB of memory (DDR3 800/1066/1333).
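The hyperparameter sweep with leave-one-patient-out cross-validation can be sketched as below. The sketch is generic: `make_fit_predict` is a hypothetical factory that would return a train-and-predict routine for given $M_{c,k}$ and $K_c$, and the nearest-mean classifier used in the example is only a stand-in so that the code runs. It is not R-LLGMN and ignores both hyperparameters.

```python
def leave_one_patient_out_accuracy(samples, labels, fit_predict):
    """Hold out each patient once, train on the rest, and return accuracy in percent."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        predicted = fit_predict(train_x, train_y, samples[i])
        correct += int(predicted == labels[i])
    return 100.0 * correct / len(samples)


def grid_search(samples, labels, make_fit_predict,
                m_values=range(1, 6), k_values=range(1, 6)):
    """Evaluate every (M, K) pair in the 1-5 ranges used in this experiment."""
    return {
        (m, k): leave_one_patient_out_accuracy(samples, labels, make_fit_predict(m, k))
        for m in m_values
        for k in k_values
    }


def make_nearest_mean(m, k):
    """Stand-in model factory (NOT R-LLGMN); m and k are accepted but unused."""
    def fit_predict(train_x, train_y, x):
        means = {}
        for label in set(train_y):
            values = [v for v, y in zip(train_x, train_y) if y == label]
            means[label] = sum(values) / len(values)
        return min(means, key=lambda label: abs(x - means[label]))
    return fit_predict


# Toy one-dimensional data: class 1 near 0, class 2 near 1.
toy_x = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2]
toy_y = [1, 1, 1, 2, 2, 2]
results = grid_search(toy_x, toy_y, make_nearest_mean)
```

With 60 patients and 25 hyperparameter pairs, a sweep of this form requires 1500 training runs, which illustrates why learning time is reported alongside accuracy.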

Prediction of acute hypotension occurrence.
To verify the prediction performance of the proposed model, a comparison with a previous prediction method was conducted. The learning dataset and configuration were the same as those used in the R-LLGMN hyperparameter selection; however, only the normalisation preprocessing with $\sigma_d = 0.01$ was performed. The test dataset used for verification was Dataset 2 (see Table 1), in which 14 patients had developed acute hypotension whereas the remaining 26 patients had not. The sampling time was 1 min, and the input signals were heart rate, systolic blood pressure, diastolic blood pressure, and mean blood pressure.
Among the previous methods published in the Physionet Challenge 5, the method by Henriques et al. 11, which achieved the highest accuracy, was chosen as the comparison target. To compare the proposed method with the previous method under the same conditions, the prediction accuracy, sensitivity, and specificity for acute hypotension were calculated using the test data.
Prediction of Vf occurrence. Prediction of Vf was performed using Dataset 3 (see Table 1), which was provided by Physionet and is composed of patients with Vf. Because Dataset 3 only contains patients with positive events, we extracted a negative time span from each patient to constitute negative data such that the numbers of positive and negative data were equal. Normalisation preprocessing was selected as the preprocessing method. First, the parameters related to RRI were calculated and the influence of the input parameters on accuracy was investigated. The prediction accuracy for a one-dimensional input (input data: RRI) and a three-dimensional input (input data: CVRR, RMSSD, and pNN50) was compared. The influence of the prediction time P on the prediction accuracy was also investigated. Data from twenty patients with Vf were used in this analysis. As Vf occurred once for each patient, the normal data could be extracted from the same patient at different time spans. This resulted in 20 samples with occurrence of Vf and 20 samples under normal conditions. The normal data were extracted after excluding one hour before and after the occurrence of Vf. The sampling frequency was 250 Hz. The analysis target data were trimmed to 30 s. In preprocessing, $\sigma_d = 0.1$ was used for one-dimensional inputs and $\sigma_d = 0.01$ was used for three-dimensional inputs. The leave-one-event-out cross-validation method was applied to test the influence of the prediction time, P minutes ahead of the actual occurrence of Vf. Here, P was varied from 1 to 10 min in intervals of 1 min. The above procedure was repeated 10 times with different initial weights of R-LLGMN, and the average prediction accuracy was calculated. A statistical comparison was performed using the Welch test with a significance level of 5%.
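For reference, the Welch test statistic can be sketched as follows. The sketch computes the t statistic and the Welch-Satterthwaite degrees of freedom for two groups with unequal variances; obtaining the p-value additionally requires the Student t distribution, which is left out to keep the example within the standard library. The function name and the example numbers are hypothetical and are not results of this study.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t, degrees of freedom) for Welch's unequal-variance t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard errors of the two sample means (unbiased variances).
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t = (statistics.fmean(sample_a) - statistics.fmean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical accuracies [%] from repeated runs of two input configurations.
t_value, dof = welch_t([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0])
```

The unequal-variance form is a natural choice when the spread of accuracies across random initialisations differs between the compared configurations.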
Prediction of symptom events triggered by a multiple disease condition. Prediction of symptom events triggered by a multiple disease condition was performed using Dataset 4 (see Table 1). Dataset 4 contains biological signals, such as heart rate and arterial blood pressure, measured from ICU patients and provided by the Department of Emergency and Critical Care Medicine of the University of Tokyo Hospital 35,36. In this experiment, patients whose blood pressure gauge alarms were confirmed to be clinically and technically appropriate were defined as patients with a symptom event for their respective disease. Other patients were defined as normal. As such, 39 samples of symptom events and 30 samples under normal conditions were obtained (see Table 1). The prediction accuracy was calculated using leave-one-event-out cross-validation with different values of P; P was varied from 1 to 7 min in intervals of 1 min. Normalisation preprocessing was selected as the preprocessing method. The input signals were heart rate, systolic blood pressure, diastolic blood pressure, and mean blood pressure. The hyperparameter $\sigma_d$ was set to 0.01, and the analysis period was 30 s.

Results
Preprocessing selection. The results for the selection of the preprocessing method are shown in Table 2. The table shows a two-way ANOVA for hyperparameters $(M_{c,k}, K_c) = (1, 2), (2, 3), (3, 3)$. Based on Table 2.I, a significant difference in the influence of time-differential processing on the prediction accuracy was not confirmed ($p = 1.0$). In contrast, the normalisation process was confirmed to have a significant effect on the prediction accuracy, and a significant interaction between time-differential preprocessing and normalisation preprocessing was confirmed ($p = 3.1 \times 10^{-10}$ and $p = 3.0 \times 10^{-5}$, respectively). Based on Table 2.II, a significant difference in the effect of time-differential preprocessing on the accuracy ($p = 3.2 \times 10^{-6}$) was observed. It was also confirmed that there was a significant interaction between time-differential preprocessing and normalisation preprocessing ($p = 8.6 \times 10^{-6}$). However, a significant difference in the influence of normalisation preprocessing on accuracy ($p = 1.1 \times 10^{-1}$) was not confirmed. Based on Table 2.III, it was confirmed that there was a significant difference in the influence of time-differential preprocessing and normalisation preprocessing on prediction accuracy ($p = 1.4 \times 10^{-2}$ and $p = 1.3 \times 10^{-3}$, respectively). In addition, it was confirmed that there was a significant interaction between time-differential preprocessing and normalisation preprocessing ($p = 3.8 \times 10^{-4}$). The above results show that time-differential preprocessing and normalisation preprocessing affect each other for all $(M_{c,k}, K_c) = (1, 2), (2, 3), (3, 3)$.
Table 2. Interaction between time-differential processing and normalisation processing for hyperparameters $M_{c,k}$ and $K_c$. *: p < 0.05, **: p < 0.01.
Therefore, the four groups can be regarded as independent groups, and multiple tests based on the Bonferroni method were performed at a significance level of 5%. Figure 2a shows the average prediction accuracy calculated using the following four preprocessing conditions: (i) without preprocessing, (ii) normalisation preprocessing, (iii) time-differential preprocessing, and (iv) time-differential and normalisation preprocessing. From Fig. 2a, when $M_{c,k} = 1$, $K_c = 2$, the accuracy and standard deviation for (i), (ii), (iii), and (iv) were 43.7 ± 0.7%, 61.7 ± 2.0%, 49.0 ± 2.8%, and 56.3 ± 2.2%, respectively. In addition, it was confirmed that there was a significant difference between all the groups, except between (i) and (iii) and between (i) and (ii). When $M_{c,k} = 2$, $K_c = 3$, the accuracy and standard deviation for (i), (ii), (iii), and (iv) were 52.3 ± 3.4%, 62.0 ± 1.4%, 51.7 ± 2.6%, and 46.0 ± 2.8%, respectively. Significant differences were confirmed between (i) and (ii), (ii) and (iii), and (ii) and (iv). When $M_{c,k} = 3$, $K_c = 3$, the accuracy and standard deviation for (i), (ii), (iii), and (iv) were 49.3 ± 2.5%, 59.0 ± 1.9%, 51.3 ± 3.6%, and 50.7 ± 1.9%, respectively. In addition, significant differences were confirmed between (i) and (ii), (ii) and (iii), and (ii) and (iv). From these results, it can be determined that the accuracy was highest when normalisation preprocessing was performed under the conditions of $(M_{c,k}, K_c) = (1, 2), (2, 3), (3, 3)$. It was also confirmed that there was a significant difference between all groups. Figure 2b shows the average prediction accuracy for different $\sigma_d$. When $M_{c,k} = 1$, $K_c = 2$, the prediction accuracies for $\sigma_d = 1, 0.1, 0.01$, and $0.001$ were 61.7 ± 2.0%, 61.0 ± 3.5%, 70.3 ± 0.7%, and 58 ± 1.3%, respectively. A significant difference was confirmed between $\sigma_d = 1$ and $\sigma_d = 0.01$, $\sigma_d = 0.1$ and $\sigma_d = 0.01$, and $\sigma_d = 0.01$ and $\sigma_d = 0.001$.
When $M_{c,k} = 2$, $K_c = 3$, the prediction accuracies were 62.0 ± 1.4%, 60.0 ± 2.4%, 73.3 ± 0.0%, and 50.0 ± 1.7%, respectively. A significant difference was confirmed between all the groups, except between $\sigma_d = 1$ and $\sigma_d = 0.1$. When $M_{c,k} = 3$, $K_c = 3$, the average prediction accuracies and standard deviations were 49.3 ± 2.5%, 56.0 ± 2.1%, 73.3 ± 0.0%, and 65.0 ± 0.9%, respectively. In addition, it was confirmed that there was a significant difference between all groups. Therefore, the accuracy was highest when $\sigma_d = 0.01$, and a significant difference between all groups was confirmed under the conditions of $(M_{c,k}, K_c) = (1, 2), (2, 3), (3, 3)$.
R-LLGMN hyperparameter selection. Figure 2c shows the prediction accuracy and the time required for learning when $M_{c,k}$ and $K_c$ were varied in the range of 1-5. The prediction accuracy improved with an increase in $M_{c,k}$ and $K_c$ and was maximised (76.6%) when $M_{c,k} = 3$, $K_c = 3$ and when $M_{c,k} = 3$, $K_c = 4$. It then decreased as $M_{c,k}$ and $K_c$ increased further. In addition, it was confirmed that the time required for learning increases with an increase in $M_{c,k}$ and $K_c$.
Prediction of acute hypotension occurrence. The data from a total of 60 patients were extracted from Physionet 13 (see section "Experimental configuration") and used to test the prediction accuracy for acute hypotension. We compared the prediction accuracy of the proposed method against previous studies 5,11 in which Physionet datasets 13 were also used. Figure 3 compares the prediction accuracies of the proposed method and the previous methods. From the figure, it can be seen that the accuracy, sensitivity, and specificity of the proposed model were 92.5%, 100.0%, and 88.5%, respectively. A receiver operating characteristic (ROC) analysis gave an area under the curve (AUC) of 0.86. Thus, the proposed model had the same prediction accuracy (92.5%) as the method proposed by Henriques et al., which achieved the highest accuracy among the methods published in the Physionet Challenge 2009 13. Moreover, it was also confirmed that the proposed method has higher sensitivity (100.0%) than some of the previous methods.
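The three reported figures can be checked for mutual consistency against the composition of the test dataset (14 patients with acute hypotension and 26 without). The confusion-matrix counts below are not stated in the text; they are inferred here from the reported sensitivity and specificity and should be read as an illustrative reconstruction.

```python
# Test dataset composition stated in the experimental configuration.
n_positive, n_negative = 14, 26

# Inferred counts (assumption): 100.0% sensitivity means every positive was detected,
# and 88.5% specificity corresponds to the nearest whole number of true negatives.
true_positive = n_positive
true_negative = round(0.885 * n_negative)
false_negative = n_positive - true_positive
false_positive = n_negative - true_negative

sensitivity = 100.0 * true_positive / n_positive
specificity = 100.0 * true_negative / n_negative
accuracy = 100.0 * (true_positive + true_negative) / (n_positive + n_negative)

print(f"TP={true_positive} TN={true_negative} FN={false_negative} FP={false_positive}")
print(f"sensitivity={sensitivity:.1f}% specificity={specificity:.1f}% accuracy={accuracy:.1f}%")
```

Under these inferred counts, three false alarms among the 26 patients without hypotension account for the difference between the sensitivity and the overall accuracy.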
Prediction of Vf occurrence. Dataset 3 provided by Physionet 13 was used to test the prediction accuracy for Vf (see section "Experimental configuration"). Figure 4a shows the time series of posterior probabilities of Vf occurrence for one patient (Sub. P) when CVRR, RMSSD, and pNN50 were used together as inputs for each 10 s period. The figure confirms that the posterior probability increases as the time remaining until the occurrence of Vf decreases. Figure 4b compares the accuracy, sensitivity, and specificity between a one-dimensional and a three-dimensional input. The prediction time was set to P = 1 min. The figure confirms that there is a significant increase in accuracy, sensitivity, and specificity for the three-dimensional input compared with the one-dimensional input. Therefore, it was confirmed that the prediction accuracy improves when multidimensional inputs are used. Table 3 shows the confusion matrix and prediction accuracies for all patients at different prediction time points, P minutes ahead of the occurrence of Vf. Based on Table 3, the prediction accuracies at prediction time points P = 1, 2, ..., 10 are 90.0%, 90.0%, 87.5%, 82.5%, 82.5%, 77.5%, 72.5%, 72.5%, 75.0%, and 65.0%, respectively. The AUC values at prediction time points P = 1, 2, ..., 10 are 0.94, 0.94, 0.85, 0.91, 0.90, 0.84, 0.70, 0.74, 0.72, and 0.62, respectively. Therefore, the prediction accuracy increases as the time of occurrence of Vf approaches.
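The accuracies quoted above follow directly from the confusion-matrix counts reported in Table 3 of this paper (20 positive and 20 negative samples at each prediction time point). The short sketch below recomputes accuracy, sensitivity, and specificity from those counts; the function and variable names are chosen for illustration only.

```python
# Confusion-matrix counts from Table 3, for P = 1, 2, ..., 10 min.
true_positive = [17, 17, 16, 14, 15, 14, 12, 12, 13, 9]
true_negative = [19, 19, 19, 19, 18, 17, 17, 17, 17, 17]
false_negative = [3, 3, 4, 6, 5, 6, 8, 8, 7, 11]
false_positive = [1, 1, 1, 1, 2, 3, 3, 3, 3, 3]


def metrics(tp, tn, fn, fp):
    """Return (accuracy, sensitivity, specificity) in percent."""
    accuracy = 100.0 * (tp + tn) / (tp + tn + fn + fp)
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    return accuracy, sensitivity, specificity


table = [
    metrics(tp, tn, fn, fp)
    for tp, tn, fn, fp in zip(true_positive, true_negative, false_negative, false_positive)
]
accuracies = [row[0] for row in table]

for p, (acc, sens, spec) in enumerate(table, start=1):
    print(f"P={p:2d} min  accuracy={acc:.1f}%  sensitivity={sens:.1f}%  specificity={spec:.1f}%")
```

The recomputed values show that the loss of accuracy at larger P comes mainly from sensitivity, which falls from 85.0% at P = 1 to 45.0% at P = 10, whereas specificity stays between 85.0% and 95.0%.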
Prediction of symptom events triggered by a multiple disease condition. The dataset provided by the ICU at the University of Tokyo Hospital was used to test the prediction accuracy for symptom events triggered by an undiagnosed multiple disease condition (see section "Experimental configuration"). Figure 5 [...]

Discussion
With the aim of predicting an acute deterioration triggered by target symptoms, we proposed a prediction method employing a probabilistic neural network, called R-LLGMN, that embeds a hidden Markov model with a multidimensional Gaussian mixture model. It enables prediction of a symptom event from multiple biological signals using the probabilistic transition process between physiological conditions. The parameters of the model can be acquired through machine learning; hence, it can potentially be applied to various symptoms.
To determine the appropriate preprocessing method and model configuration, we statistically analysed the prediction accuracies obtained under different settings using the data provided by Physionet. We found that the prediction accuracy peaks when normalisation preprocessing is performed (see Fig. 2a). This is because appropriate scaling of the input data eliminates differences in amplitude between data, which are irrelevant for R-LLGMN in discriminating between the two classes. A significant difference between time-differential preprocessing and no preprocessing was not confirmed when the hyperparameters in R-LLGMN were set to $(M_{c,k}, K_c) = (1, 2), (2, 3), (3, 3)$ (see Fig. 2a). This is because time-differential preprocessing can only represent information on short-term temporal changes in the biological signal, which makes long-term prediction difficult. These results indicate that normalisation is an effective preprocessing method that yields the highest accuracy. In addition, we confirmed that the prediction accuracy was highest when $\sigma_d = 0.01$ (see Fig. 2b). This is because variation in the input data affected the learning of R-LLGMN. These results indicate that the hyperparameter $\sigma_d$ must be determined based on the variation in the input data used for prediction. Therefore, in the subsequent analyses, a preliminary analysis was conducted to determine $\sigma_d$. However, a detailed investigation of the method for selecting $\sigma_d$ will be necessary in the future.
In terms of the neural network configuration, it was demonstrated that the prediction accuracy is maximised when $M_{c,k} = 3$, $K_c = 3$ and when $M_{c,k} = 3$, $K_c = 4$ (see Fig. 2c). This is because increasing $M_{c,k}$ and $K_c$ enables R-LLGMN to model complicated time series characteristics by improving its representation ability. Moreover, it was demonstrated that the prediction accuracy decreases when the hyperparameters are increased beyond $M_{c,k} = 3$, $K_c = 3$ and $M_{c,k} = 3$, $K_c = 4$. This is due to overfitting, which can worsen the generalisation performance. In addition, increasing $M_{c,k}$ and $K_c$ increases the time required for the learning process (see Fig. 2c) because the computational complexity increases. Therefore, considering the trade-off between learning time and prediction accuracy, $M_{c,k} = 3$, $K_c = 3$ ($c = 1, 2$; $k = 1, 2, 3$) were considered the optimal hyperparameters.
Based on the hyperparameters and model configuration, the prediction accuracies were tested for acute hypotension, Vf, and a multiple disease condition. The prediction results for acute hypotension confirmed that the proposed model has the same level of prediction accuracy (92.5%) and sensitivity (100%) as the method proposed by Henriques et al. (see Fig. 3).
The prediction results for Vf confirmed a significant increase in accuracy, sensitivity, and specificity when using a three-dimensional input (see Fig. 4b). This is because not only the time series characteristics of RRI, but also the vagus nerve activity of the patient could be evaluated. Vagal nerve activity has been reported to increase or decrease 37 before the onset of Vf. Thus, it is effective in predicting Vf, which is a type of ventricular arrhythmia. In addition, an increase in the number of input dimensions also contributed to an improvement in prediction accuracy as it enabled R-LLGMN to extract characteristics of multiple types of biological information. However, the prediction accuracy decreased as the prediction time point parameter P increased. This indicates difficulties in early prediction (see Table 3). Introducing frequency analysis on the biological signals and applying it as an additional input dimension may improve early prediction accuracy.
The prediction result for a multiple disease condition confirmed that the posterior probability increases as the prediction time point P approaches the point of a symptom event (see Fig. 5). Table 4 shows that all the prediction accuracies at prediction time points from P = 1 min to P = 7 min before the occurrence of a symptom event exceed 90.0%. This verifies the effectiveness of the proposed method in predicting events triggered by a multiple disease condition.
In this paper, we only tested prediction accuracy using a limited number of combinations of hyperparameters ($K_c$ and $M_{c,k}$). Testing with more combinations could provide better prediction accuracy or enable earlier detection. In addition, optimising the duration of RR interval analysis may also contribute to better performance. However, searching the combinations of hyperparameters is considerably time-consuming, and large values of $K_c$ and $M_{c,k}$ may cause overlearning. A more efficient learning algorithm is required to optimise the hyperparameters for the neural networks used in this paper. All input indices employed in this study are linear variables in the time domain, but linear and nonlinear variables in the frequency domain, such as the standard deviation of HRV and the power of HRV in the high- and low-frequency bands, are reportedly more effective for predictive clinical purposes 23,38-40. However, the electrocardiogram data used in this study were sampled at 250 Hz, which was insufficient to estimate the frequency information of the heart rate interval accurately. Further improvement of the prediction accuracies and earlier detection of deterioration may thus be achieved by incorporating frequency domain indices derived from electrocardiogram data sampled at higher sampling frequencies. It should be noted that when adding these indices as input features, the proposed method does not need to change its fundamental structure and algorithm because it adopts an R-LLGMN-based machine learning framework.

Table 3. Confusion matrix and prediction accuracies at different prediction time points, P minutes ahead of the occurrence of Vf.
P [min]            1     2     3     4     5     6     7     8     9    10
True positive     17    17    16    14    15    14    12    12    13     9
True negative     19    19    19    19    18    17    17    17    17    17
False negative     3     3     4     6     5     6     8     8     7    11
False positive     1     1     1     1     2     3     3     3     3     3
Accuracy [%]    90.0  90.0  87.5  82.5  82.5  77.5  72.5  72.5  75.0  65.0
The number of patients analysed is not ideal. The database we used (Physionet Challenge 2009) only provides data for 30 patients with acute clinical deterioration for the learning dataset and 14 patients with acute clinical deterioration for the test dataset. Although we analysed four different datasets using the proposed algorithm with a single network architecture and the results demonstrated in this paper indicate the success and versatility of the proposed method, it is necessary to increase the number of patients from other open databases such as MIMIC III to further enhance the generalisability of the proposed method.
The results of our experiments showed that the proposed model matches the highest prediction accuracy reported among contemporary methods. In addition, the proposed method is capable of predicting a symptom event triggered by different diseases, such as acute hypotension and Vf, by adjusting the parameters of the model using the corresponding learning data. Given that the proposed method can predict a target symptom event before it actually occurs with a high accuracy of approximately 90%, we can conclude that the proposed method has achieved a clinically applicable precision.