Abstract
Well-trained clinicians may be able to provide diagnosis and prognosis from very short biomarker series using information and experience gained from previous patients. Although mathematical methods can potentially help clinicians to predict the progression of diseases, no method so far estimates the patient state from a very short time-series of a biomarker for making diagnosis and/or prognosis by employing the information of previous patients. Here, we propose a mathematical framework for integrating other patients' datasets to infer and predict the state of the disease in the current patient based on their short history. We extend the machine-learning framework of "prediction with expert advice" to deal with unstable dynamics. We construct this mathematical framework by combining expert advice with a mathematical model of prostate cancer. Our model predicted well the individual biomarker time-series of patients with prostate cancer that were used as clinical samples.
Introduction
Mathematical models of diseases have been constructed to understand the mechanisms of diseases^{1,2,3,4,5,6,7}, provide diagnosis and prognosis^{8,9,10}, and determine treatment options^{11,12,13,14}. In a clinical setting, it is crucial to be able to estimate the state of a disease from short biomarker observations. Clinicians make such estimations using their experience with previous patients (see Fig. 1a). To the best of our knowledge, such estimations have not been realized mathematically thus far. If such mathematical estimation were possible, we could optimize treatment options in a personalized way. The difficulty of these estimations stems not only from a lack of information, but also from the instability of biomarkers' time-series, such as those for cancer volumes. Thus, our goal is to infer the state of a disease from both short, unstable time-series data of biomarkers obtained from a target patient and longer time-series data of biomarkers from previous patients who suffered from the same disease. We adopt the machine-learning framework of "online prediction", which integrates "experts' advice"^{15} to make accurate predictions; here, the experts are short-term patterns of previous patients' histories, which are fitted to the target patient's time-series.
A series of samples from a patient contains information on (often unstable) disease dynamics^{8,10}, such as rapid increases. By considering the time-series observed from the unstable dynamics, we may be able to better understand the current disease state. Employing past patients' time-series as experts and the target patient's time-series as observations, we can predict a time-series with the standard expert advice method^{15}. However, this method cannot be used directly, because we must deal with unstable dynamics in which the value of a biomarker increases rapidly. In this paper, we propose an approach that adapts an existing machine-learning technique to the instability possessed by temporal disease datasets. Our method is based on the standard expert advice^{15}, but deals with the instability of the underlying dynamics^{8,10} by integrating trajectories in a database with weights that increase exponentially in time.
Results
The proposed method: temporal expert advice (TEA)
We extend the standard expert advice method^{15} to one that emphasizes near-past information. This temporal expert advice, or the TEA algorithm, consists of three steps (see Fig. 1b–d). TEA uses a collection of time-series, which we call experts, and weights each expert based on its agreement with the target time-series. The algorithm outputs a prediction by combining these experts.
The first step constructs experts for a target system. There are two options. The first option is to use long time-series observed in the past as they are: we construct experts by simply extracting parts of previous time-series, i.e., the datasets of previous patients. Let x_{j,l} be the lth point of the jth time-series in a database (j = 1, 2, …, J, l = 1, 2, …, L) and f_{i,t} be the ith expert's advice at time t. The numbers J and L are the number of time-series and the number of points in each time-series, respectively; we assume that the lengths of the time-series are equal, but it is easy to extend to cases of different lengths. Let P be the number of points related to each expert. Then, we can define an expert f_{(L−P+1)(j−1)+i,k} = x_{j,i+k−1} for i = 1, 2, …, L − P + 1, k = 1, 2, …, P. The second option is to roughly fit a mathematical model that has a set of parameters to a very short time-series, obtain the initial conditions for each set of parameters, and prepare the set of experts with these parameters. The details of this second option are discussed after we introduce a mathematical model of prostate cancer in a later section.
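As an illustration, the first option can be sketched as follows; the function name and array layout are our own assumptions, not the paper's code.

```python
import numpy as np

def build_experts(database, P):
    """Build sliding-window experts from a database of past time-series.

    database: array of shape (J, L) holding J previous time-series of length L.
    P: number of points per expert.
    Every length-P window of every series becomes one expert, giving
    J * (L - P + 1) experts in total, as in the first option above.
    """
    J, L = database.shape
    windows = [database[j, i:i + P] for j in range(J) for i in range(L - P + 1)]
    return np.array(windows)

# Toy usage: 2 series of length 5 and windows of length 3 give 2 * 3 = 6 experts.
db = np.arange(10.0).reshape(2, 5)
experts = build_experts(db, P=3)
```

Each row of `experts` is one candidate short-term pattern; the second step weights these rows against the target observations.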
In the second step, TEA weights the trajectories f_{i,t} in the database to generate an appropriate weighting for the most current state. Let y_{k} denote the observation at time k, and let l(·,·) be a loss function. When we include the next point, the weights w_{i,t} are updated according to a formula obtained by modifying the standard expert advice^{15}. To achieve this, we sum the loss at each time step with a coefficient as follows:

L_{i,t} = Σ_{k=1}^{t} a_{k}(t) l(f_{i,k}, y_{k}), (1)

L_{t} = Σ_{k=1}^{t} a_{k}(t) l(p_{k}, y_{k}), (2)

where p_{k} is the prediction at time k. The modified form considers the instability of the underlying dynamics by introducing a coefficient a_{k}(t), which measures the reliability of the prediction at, and increases exponentially with, time k such that a_{k}(t) = λ^{k−1} or λ^{k} with λ > 1. Thus, the real values L_{i,t} and L_{t} are the exponentially weighted losses of the ith expert and the predictor up to time t, respectively. We define L_{0} = L_{i,0} = 0 for simplicity. The weight of the expert is updated as

w_{i,t} = exp(−η L_{i,t}), (3)

where η is a learning rate. Chernov and Zhdanov^{16} proposed a modified expert advice method in which they defined a_{k}(t) = ρ^{t−k−1} with 0 < ρ < 1. We call this the "CZ method".
The third step predicts future states of the target system by applying the obtained weighting to the future trajectories in the database^{15}. We can generate a point prediction by simply averaging the trajectories q steps ahead with the weights obtained in the second step as follows:

p_{t+q} = Σ_{i=1}^{N} w_{i,t} f_{i,t+q} / Σ_{j=1}^{N} w_{j,t}, (4)

where N denotes the total number of experts. We can also generate a distributional prediction by assuming a distribution of observational errors and summing this error distribution with the weights. To make these predictions online, we repeat the second and third steps iteratively.
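The second and third steps can be sketched together as follows, under our own assumptions: the absolute loss, weights of the form exp(−η × loss), and the coefficient a_k(t) = λ^{k−1}. This is an illustration, not the authors' implementation.

```python
import numpy as np

def tea_predict(experts, y, lam=1.1, eta=0.5, q=1):
    """Weight experts by exponentially weighted losses and predict q steps ahead.

    experts: array (N, T) of expert trajectories f_{i,t}.
    y: array of the t observations made so far.
    The loss of expert i is sum_k lam**(k-1) * |f_{i,k} - y_k| (absolute loss
    with a_k(t) = lam**(k-1)); weights are exp(-eta * loss), and the prediction
    is the weighted average of the experts' values q steps ahead.
    """
    t = len(y)
    coeff = lam ** np.arange(t)                   # a_k(t) for k = 1..t
    losses = (coeff * np.abs(experts[:, :t] - y)).sum(axis=1)
    w = np.exp(-eta * (losses - losses.min()))    # shifted to avoid underflow
    return float(w @ experts[:, t + q - 1] / w.sum())
```

For example, with two constant experts at 1 and 5 and observations near 1, the weight of the first expert dominates and the one-step prediction lies close to 1.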
Upper bound of the regret of TEA
We derive an upper bound of the regret of TEA. The primary property peculiar to TEA lies in our definition of a_{k}(t). We define the coefficient as a_{k}(t) = λ^{k−1} or λ^{k}, where λ > 1. The choice between these two options depends on the situation. When the given data are too short, we choose the latter, e.g. in our prediction of the biomarker of prostate cancer, PSA (prostate-specific antigen). Let L_{i,t} and L_{t} of Eqs. (1) and (2) be the accumulated losses for the proposed method. We call these the exponential accumulated losses to distinguish them from the standard accumulated losses. In addition, we define the regret as R_{t} = L_{t} − min_{1≤i≤N} L_{i,t}. The upper bound of the regret with a_{k}(t) = λ^{k−1} is then given by

R_{t} ≤ (ln N)/η + (η ε^{2}/8) Σ_{k=1}^{t} λ^{2(k−1)} = (ln N)/η + η ε^{2}(λ^{2t} − 1)/(8(λ^{2} − 1)). (5)
Proof of the upper bound in our proposed method
We give a proof of Eq. (5) in a similar way to the proof of Theorem 2.2 in Ref. 15. Define a new variable W_{t} = Σ_{i=1}^{N} exp(−η L_{i,t}). We will consider the upper and lower bounds of ln(W_{t}/W_{0}) to construct the upper bound of the regret. First, we obtain the lower bound of ln(W_{t}/W_{0}) as

ln(W_{t}/W_{0}) ≥ ln(max_{1≤i≤N} exp(−η L_{i,t})) − ln N = −η min_{1≤i≤N} L_{i,t} − ln N. (6)
Second, we derive the upper bound of ln(W_{t}/W_{0}). Observe that L_{k,t} = L_{k,t−1} + λ^{t−1} l(f_{k,t}, y_{t}). Then, we can reformulate ln(W_{t}/W_{t−1}) as follows:

ln(W_{t}/W_{t−1}) = ln(Σ_{k=1}^{N} exp(−η λ^{t−1} l(f_{k,t}, y_{t})) exp(−η L_{k,t−1}) / Σ_{j=1}^{N} exp(−η L_{j,t−1})). (7)
Equation (7) can be regarded as the logarithm of the average of the random variable exp(−η λ^{t−1} l(f_{k,t}, y_{t})) with a probability mass function proportional to exp(−η L_{k,t−1}). Lemma 2.2 of Ref. 15 states that

ln E[exp(s x)] ≤ s E[x] + s^{2}(b − a)^{2}/8, (8)

where x is a random variable satisfying a ≤ x ≤ b, and the inequality holds for any real number s.
Replace s by −η λ^{t−1} and x by l(f_{k,t}, y_{t}), which takes values in [0, ε]. Then, the upper bound of Eq. (7) can be found using Eq. (8) as follows:

ln(W_{t}/W_{t−1}) ≤ −η λ^{t−1} Σ_{k=1}^{N} (exp(−η L_{k,t−1}) / Σ_{j=1}^{N} exp(−η L_{j,t−1})) l(f_{k,t}, y_{t}) + η^{2} λ^{2(t−1)} ε^{2}/8. (9)

We assume that l(·,·) is convex as described above, so the weighted average of the experts' losses in Eq. (9) is bounded below by the loss of the predictor, l(p_{t}, y_{t}). Then, the upper bound of ln(W_{t}/W_{0}) can be derived as

ln(W_{t}/W_{0}) = Σ_{k=1}^{t} ln(W_{k}/W_{k−1}) ≤ −η L_{t} + (η^{2} ε^{2}/8) Σ_{k=1}^{t} λ^{2(k−1)}. (10)

Because Eqs. (6) and (10) provide lower and upper bounds of ln(W_{t}/W_{0}), respectively, the following inequality is obtained:

−η min_{1≤i≤N} L_{i,t} − ln N ≤ −η L_{t} + (η^{2} ε^{2}/8) Σ_{k=1}^{t} λ^{2(k−1)}. (11)

By substituting the regret R_{t} = L_{t} − min_{1≤i≤N} L_{i,t} into Eq. (11), we finally reach the following inequality:

R_{t} ≤ (ln N)/η + (η ε^{2}/8) Σ_{k=1}^{t} λ^{2(k−1)} = (ln N)/η + η ε^{2}(λ^{2t} − 1)/(8(λ^{2} − 1)). (12)
(Proof end)
Optimization of the upper bound of TEA
We minimize the upper bound of Eq. (12) over η. First, we differentiate the upper bound with respect to η as follows:

d/dη [(ln N)/η + η ε^{2}(λ^{2t} − 1)/(8(λ^{2} − 1))] = −(ln N)/η^{2} + ε^{2}(λ^{2t} − 1)/(8(λ^{2} − 1)). (13)
The solution η_{*} = √(8(λ^{2} − 1) ln N / (ε^{2}(λ^{2t} − 1))) gives the smallest upper bound. Replacing η in the upper bound of Eq. (12) with η_{*}, we obtain the following optimal upper bound:

R_{t} ≤ ε √((λ^{2t} − 1) ln N / (2(λ^{2} − 1))). (14)

Although this optimal upper bound may seem curious at first glance due to its exponential increase with t, this is caused by the definition of the accumulated losses in Eqs. (1) and (2) with a_{k}(t) = λ^{k−1}. This regret can be compared with the normal types of regret using the relationship described in the next section. When ε = 1 and λ → 1, the optimal upper bound coincides with √((t ln N)/2), which is the upper bound obtained in the standard expert advice method^{15}.
Comparison between the proposed method and the Chernov–Zhdanov method
Here, we highlight the differences between the CZ method and the proposed TEA method. The first point of difference is the optimal upper bound of the regret. Let L^{CZ}_{i,t} and L^{CZ}_{t} be the accumulated losses for the ith expert and the predictor in the CZ method, respectively. The optimal upper bound of the regret for the CZ method, together with its proof, is given in Ref. 16. Note that we assume the case where the value of the decay rate ρ depends on neither t nor k.
Although we cannot directly compare these regrets, we can compare them after normalization. Assuming that the decay rates are equal, namely λ = ρ^{−1}, the regrets R^{CZ}_{t} and R_{t} have the following relation:

R^{CZ}_{t} = ρ^{t−1} R_{t}. (17)

Using this relation, a comparison between the two upper bounds is feasible. Multiplying the optimal upper bound of Eq. (14) by ρ^{t−1}, we obtain the normalized optimal upper bound as

ρ^{t−1} R_{t} ≤ ε √((1 − ρ^{2t}) ln N / (2(1 − ρ^{2}))). (18)
Then, the following relation is obtained: the normalized optimal upper bound of the proposed method is always smaller than that of the CZ method when 0 < ρ < 1.
Next, we compare the weights produced by the two methods. Let w^{CZ}_{i,t} and w^{TEA}_{i,t} be the weights of the ith expert at time t in the CZ and TEA methods, respectively. Similarly to the derivation of Eq. (17), the accumulated losses of both methods are related by L^{CZ}_{i,t} = ρ^{t−1} L_{i,t}. Substituting this relation into the definition of the weights, we have

w^{CZ}_{i,t} = (w^{TEA}_{i,t})^{ρ^{t−1}}. (20)

Equality (20) means that the proposed TEA method tends to assign heavier weights to reliable experts than the CZ method does, because ρ^{−t+1} ≥ 1 amplifies the relative weight of a reliable expert in TEA.
Examples of timeseries prediction for mathematical models
We demonstrate the superiority of the TEA method over both the CZ method and the standard expert advice in online time-series prediction using toy examples. We use the Hénon map^{17} and the Ikeda map^{18} for our demonstration. These two models, which are commonly used to test nonlinear time-series analysis methods, exhibit typical unstable chaotic dynamics. First, we generate time-series for the database using various parameter values. We then generate a target time-series for prediction using a set of parameter values different from those used to generate the database. We prepare M × S experts for the database, where M is the number of parameter sets; for each parameter set, we generate S experts with different initial conditions. In numerical simulations of TEA, we set ε = 1 and . We also set λ = ρ^{−1} and ρ = 0.9. See Algorithm 2 in Ref. 16 for the implementation of the CZ method, and Ref. 15 for that of the standard expert advice.
The Hénon map^{17} is a two-dimensional map defined as

x_{n+1} = 1 − a x_{n}^{2} + y_{n}, y_{n+1} = b x_{n}.

We set the parameters at a = 1.35 and b = 0.15 to generate the target time-series. Note that the dynamics produced by this parameter set is deterministic chaos. The experts' parameters are uniformly chosen from a ∈ [1.3, 1.4] and b ∈ [0.1, 0.2]. The initial conditions (x_{0}, y_{0}) are randomly chosen in [−0.02, 0.02] × [−0.02, 0.02], and the map is iterated for 1,000 steps to eliminate transient effects. We assume that we observe and predict the value of x + y, because we can only observe the scalar biomarker PSA in the prostate cancer application discussed later. The results presented in Figs. 2a, 2b, and 2c show that the proposed TEA achieves better online time-series prediction than the standard expert advice and the CZ method. We choose M = 100 and S = 1,000 in Figs. 2a, 2f, and 2g. Another example of the Ikeda map is shown in Supplementary Fig. S1 (see also Supplementary Information).
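The database construction for this experiment can be sketched as follows. The parameter ranges are those given above; the function, seed, and series lengths are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def henon_series(a, b, x0, y0, n, transient=1000):
    """Iterate the Henon map x' = 1 - a*x**2 + y, y' = b*x and return the
    scalar observable x + y after discarding an initial transient."""
    x, y = float(x0), float(y0)
    out = np.empty(n)
    for i in range(transient + n):
        x, y = 1.0 - a * x * x + y, b * x
        if i >= transient:
            out[i - transient] = x + y
    return out

# Target time-series with a = 1.35, b = 0.15 (the parameter set in the text).
target = henon_series(1.35, 0.15, 0.01, 0.01, n=100)

# Experts: nearby parameter values a in [1.3, 1.4], b in [0.1, 0.2],
# with random initial conditions in [-0.02, 0.02] x [-0.02, 0.02].
experts = np.array([
    henon_series(rng.uniform(1.3, 1.4), rng.uniform(0.1, 0.2),
                 rng.uniform(-0.02, 0.02), rng.uniform(-0.02, 0.02), n=100)
    for _ in range(10)
])
```

Feeding `experts` and successive prefixes of `target` to the TEA weighting then yields the online predictions compared in Fig. 2.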
The proposed TEA method provides the best online prediction in these different toy examples. The more experts we use, the smaller the prediction errors become, and when a large number of experts are used, the proposed TEA tends to achieve the best online time-series prediction. We need to decay the past information in these examples, because the unstable chaotic dynamics rapidly loses its memory.
Examples of timeseries prediction for real datasets
We now consider two real datasets: violin sounds^{19} and the membrane potential of squid giant axons^{20}. The violin sounds are RWC-MDB-I-2001-W05 No. 15 in the RWC Music Database (Musical Instrument Sound). Previous studies on squid giant axons have demonstrated the chaotic nature of the underlying dynamics^{20,21,22,23}. These time-series are both scalar and real-valued. We divide each time-series into two parts: the first part is used to build the database, and the second constructs the targets for online prediction. We use M = 1,000 and M = 120 targets for the analyses of the violin sounds and the squid giant axon data, respectively. The lengths of the target data are 311 for the violin data and 51 for the squid giant axon data; the numbers and lengths of target data are determined by the lengths of the original datasets.
We compare five methods using these real data: our TEA method, the CZ method, the standard expert advice, the persistence prediction, and the average prediction. The persistence prediction is a method in which the current value is used as the prediction for the next time point. We compare each pair of methods individually, and count the number of points at which the prediction by one method is better than that of the other for each target time-series. If one method is superior at more than half of the data points, we declare that method the winner on the target data. We exclude the initial ten points from the analysis, because we cannot prepare the learning part there. Finally, we count the number of wins and losses for each pair among the five methods. In the TEA numerical simulations, we set ε = 1 and . We also set λ = ρ^{−1} and ρ = 0.9. The violin sound^{19} results are shown in Figs. 2d and 2e, and Table 1. For this dataset, our method and the persistence prediction produce much better results than the other methods. Therefore, we next compare our TEA method with the persistence prediction with respect to the number of experts. We use the binomial test for the analysis, i.e., if the number of wins is greater (smaller) than 531 (469), the method is significantly superior (inferior) to the other method according to the 95% confidence level two-sided binomial test. When the number of experts is large, our TEA method is significantly superior to the persistence prediction, as shown in Fig. 2e and Table 1. In the example of the squid giant axon^{20}, the proposed TEA is also better than the other four methods when the number of experts is large, especially when it is greater than or equal to 87, as shown in Fig. 3 and Table 1.
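The pairwise comparison and the two-sided binomial criterion can be sketched as follows; this is our own illustration, and the paper's exact tie-handling is not specified.

```python
from math import comb
import numpy as np

def count_wins(err_a, err_b):
    """Count target series on which method A beats method B: A wins a target
    when its absolute error is smaller at more than half of the points."""
    better = (np.asarray(err_a) < np.asarray(err_b)).mean(axis=1)
    return int((better > 0.5).sum())

def win_threshold(n, alpha=0.05):
    """Smallest number of wins k out of n that is significant under a
    two-sided binomial test with p = 0.5, i.e. P(X >= k) <= alpha / 2."""
    tail = 0.0
    for k in range(n, -1, -1):
        tail += comb(n, k) * 0.5 ** n
        if tail > alpha / 2:
            return k + 1
    return 0
```

For n = 1,000 targets, `win_threshold(1000)` gives 532, matching the criterion above that more than 531 wins is significant at the 95% level.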
In conclusion, our TEA method tends to provide the best prediction when the number of experts is large. The precise number of experts for which this is the case may change depending on the given data, the length of targets, and the decay parameter.
Distribution prediction for the mathematical models
We applied the distribution prediction to time-series of the Hénon map; the distribution prediction is explained in the Methods section below. The setup is similar to that for the point prediction, except that we provide the prediction as a distribution. The results are presented in Figs. 2f and 2g. The width of the distribution prediction is narrow immediately after the learning period (Fig. 2f), then grows gradually as the number of prediction steps increases because of the instability of the underlying dynamics. The predicted confidence interval tends to contain the actual values. When we increase the number of points used for prediction, the width of the distribution prediction becomes narrower (Fig. 2g). We use S = 1,000 and M = 100 in Figs. 2f and 2g. In the TEA numerical simulations, we set ε = 1 and . The number of trials is 40 in each box in Fig. 2g. Restricting the range of λ to 1 < λ < 2 gives a better prediction. We generate the target and experts' time-series as in the previous section. We also obtain the distribution prediction of the Ikeda map using S = 1,000 and M = 100, as shown in Fig. S1f. The result is very similar to that of the Hénon map; again, restricting the range of λ to 1 < λ < 2 gives a better prediction.
Mathematical models of prostate cancer
TEA can be applied to clinical problems, such as predicting prostate-specific antigen (PSA) after an initial treatment while waiting to start an additional treatment. We apply TEA to the prediction of the tumor marker of prostate cancer, PSA. Before presenting the technical details, we introduce a mathematical model of prostate cancer in this section.
The patients had already received radical prostatectomy as an initial treatment. Then, clinicians followed postoperative PSA levels to determine when to commence salvage treatment. Although the timing at which patients start salvage treatment is an important problem, there is no definitive agreement on when it should be started. Currently, clinicians determine the start of salvage treatment at their discretion. The clinical part of this study was approved by the ethics committees of Jikei University School of Medicine and The University of Tokyo. All patients provided written informed consent. Cancer cells tend to thrive in an androgen-rich environment, whereas lowering androgen levels makes cancer cells grow more slowly or even decline. Because of this characteristic, clinicians suppress the androgen concentration with hormone therapy. However, when cancer cells remain exposed to an androgen-poor environment, they often acquire the ability to grow without androgen. This growth signals a cancer relapse. Intermittent androgen suppression was proposed to delay the relapse of cancer^{24}. In intermittent androgen suppression, we start hormone therapy, but stop when PSA levels have decreased sufficiently. Then, we wait until PSA increases and reaches a threshold value. After reaching this threshold, we resume hormone therapy. We repeat this process to delay the relapse. However, clinical trials show that the effects of intermittent androgen suppression depend on individual patients, and are limited^{25,26}.
Here, we use a mathematical model^{8} of intermittent androgen suppression for prostate cancer^{24,25,26}. This model was constructed based on data from Canadian patients^{25,26} whose PSA had increased to some extent after radiation therapy, and who were later treated by intermittent androgen suppression. Because the model of Ref. 8 has a small number of parameters, it is reasonable to predict future PSA values with this simple model and a very short time-series, although several mathematical models have been proposed to describe the dynamics under intermittent androgen suppression^{4,5,6,8,10,27,28,29,30,31}. In the model of Ref. 8, we assume that there are three classes of cancer cells: androgen-dependent cancer cells x_{1}, androgen-independent cancer cells generated through reversible changes x_{2}, and androgen-independent cancer cells generated through irreversible changes x_{3}. When the hormone therapy is underway, x_{1} may change to x_{2} or x_{3}. When the hormone therapy is stopped, x_{2} may return to x_{1}, whereas x_{3} cannot return to x_{1} or x_{2} because of genetic mutation. We previously verified two important properties of this model: namely, a piecewise linear model is sufficient to describe the dynamics of PSA, and the androgen concentration need not be explicitly included in the model^{8}. Based on these verified properties, we can simply construct the mathematical model as

dx_{1}/dt = d_{1} x_{1}, dx_{2}/dt = d_{2} x_{1} + d_{3} x_{2}, dx_{3}/dt = d_{4} x_{1} + d_{5} x_{2} + d_{6} x_{3},

for the on-treatment period, and

dx_{1}/dt = e_{1} x_{1} + e_{2} x_{2}, dx_{2}/dt = e_{3} x_{2}, dx_{3}/dt = e_{4} x_{3},

for the off-treatment period^{8}. Here, d_{1}, d_{2}, d_{3}, d_{4}, d_{5}, d_{6}, e_{1}, e_{2}, e_{3}, and e_{4} are model parameters. We assume that a PSA measurement is represented by x_{1} + x_{2} + x_{3} for simplicity. Thus, we must specify these 10 parameters for the dynamics and three other parameters for the initial conditions of x_{1}, x_{2}, and x_{3}. If we tried to find these 13 parameters directly from a single target patient, we would need to obtain a long time-series.
The application of the proposed TEA algorithm shortens the required observation period of PSA measurements by integrating observations from the target patient with the long time-series of PSA measurements obtained from previous prostate cancer patients. We note that we only analyze the off-treatment period in this paper, because the target dataset covers the follow-up period after an initial treatment. Therefore, we need only four model parameters and the initial conditions.
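As a rough sketch, the off-treatment dynamics can be simulated as a piecewise linear system dx/dt = A x; the matrix entries and initial conditions below are arbitrary illustrative values, not parameters fitted to any patient.

```python
import numpy as np

# Illustrative (not fitted) off-treatment matrix: x2 feeds back into x1,
# and each class grows on its own, consistent with a piecewise linear model.
A = np.array([[0.30, 0.10, 0.00],
              [0.00, 0.20, 0.00],
              [0.00, 0.00, 0.25]])

def psa_trajectory(x0, A, t_end, dt=0.001):
    """Forward-Euler integration of dx/dt = A x; returns the PSA value
    x1 + x2 + x3 at every step, starting from the initial state x0."""
    x = np.array(x0, dtype=float)
    psa = [x.sum()]
    for _ in range(int(round(t_end / dt))):
        x = x + dt * (A @ x)
        psa.append(x.sum())
    return np.array(psa)

psa = psa_trajectory([0.5, 0.3, 0.2], A, t_end=5.0)
```

With all growth rates positive, the simulated PSA rises monotonically, mimicking the follow-up period in which relapse is monitored.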
Construction of experts for prediction of PSA for prostate cancer
In this paper, we use two datasets: one from Canadian patients with many data points per patient, and the other from Japanese patients with only a few time points. Because a long time-series is needed to estimate the model parameters efficiently, we use the Canadian dataset to estimate the parameters and the Japanese dataset as prediction targets.
In applying TEA to prostate cancer, we first prepared 72 sets of model parameters, each of which was obtained from one of 72 Canadian prostate cancer patients treated with intermittent androgen suppression. These parameters were obtained from Ref. 8. We note that our prediction target dataset corresponds to the off-treatment period of the model^{8}. Second, we chose the number of observation points to use as known data points; this must be at least three because of the model dimensions^{8}. Third, using each set of parameters, we determined the initial model state by minimizing the fitting error between the initial three or more PSA measurements and the model output. The optimal initial conditions were selected by minimizing the following cost function:where h(x) = 10^{15}(1 − x) for x < 0 and h(x) = 0 for x ≥ 0, and t_{k} is the kth observation time. We denote the number of observation points used for learning by K. The method of obtaining the initial conditions was similar to that in Ref. 8. Fourth, we ran the model with each set of parameters and the corresponding initial conditions to construct the database of experts f_{i,t}; thus, we have 72 experts.
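The third step can be sketched as a search over candidate initial states; the grid search and Euler integration below are our own simplifications (Ref. 8 may use a different optimizer), with the penalty h(·) taken from the text.

```python
import numpy as np
from itertools import product

def h(x):
    """Penalty from the text: 1e15 * (1 - x) for x < 0 and 0 otherwise,
    which effectively forbids negative state components."""
    return 1e15 * (1.0 - x) if x < 0 else 0.0

def model_psa(x0, A, t, dt=0.01):
    """Integrate dx/dt = A x by forward Euler up to time t and
    return the model PSA output x1 + x2 + x3."""
    x = np.array(x0, dtype=float)
    for _ in range(int(round(t / dt))):
        x = x + dt * (A @ x)
    return float(x.sum())

def fit_initial_conditions(y, t_obs, A, grid):
    """Choose the candidate initial state minimizing the absolute fitting
    error over the first K observations plus the penalty on each component."""
    best, best_cost = None, np.inf
    for x0 in product(grid, repeat=3):
        cost = sum(h(v) for v in x0) + sum(
            abs(model_psa(x0, A, tk) - yk) for tk, yk in zip(t_obs, y))
        if cost < best_cost:
            best, best_cost = x0, cost
    return best
```

Running the fitted state forward with each of the 72 parameter sets then yields the 72 expert trajectories.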
Estimation of learning parameters
We applied the second step of the TEA algorithm to determine the weights of the experts from the PSA measurements. Then, we applied the third step of the TEA algorithm to obtain the distribution prediction. We determined the optimal decay rate λ by minimizing the error between the last learning observation and the prediction. We restricted the range of λ to 1 < λ < 2 to obtain better predictions. The standard deviation σ is estimated as follows. We ran the distribution prediction with the obtained initial conditions and the decay parameter, and set σ to the mean of the absolute errors between the median of the distribution prediction and the corresponding observation, where the mean is taken over the learning period.
Application of TEA to prediction of PSA for prostate cancer
We predict the values of PSA with the distribution prediction. The distribution prediction of PSA with TEA is shown in Fig. 4. Here we evaluate the larger side of the predicted distribution, because overlooking a high PSA is highly undesirable in a real clinical setting. We show seven points u_{t}(Q) of the predicted distribution (97.5%, 87.5%, 75%, 65%, 60%, 55%, and 52.5%) in these figures, where u_{t}(Q) is defined as the point at which the cumulative predicted distribution reaches Q, namely

∫_{−∞}^{u_{t}(Q)} p_{t}(y) dy = Q.

Note that Q is the intended value of the probability, i.e. 0.975, 0.875, 0.75, 0.65, 0.6, 0.55, and 0.525, respectively, in this situation. For each Q, we obtained the proportion of observed PSA values that were less than u_{t}(Q), counting only the PSA data point next to the final data point in the learning period; namely, if we use three data points for learning, we count the fourth data point. In this paper, we focus on the predictability of the next point. The results are summarized in Table 2. Note that TEA can predict not only the next data point, but also points far in the future. We predicted the future PSA values for 88, 86, 80, and 69 patients when we used the first three, four, five, and six time points, respectively. We also conducted numerical simulations using the CZ method and the standard expert advice. The predicted distributions were different for each method, as shown in Fig. 5. In the numerical simulations, we set ε = 1. We arrange the learning rate as four constant values with v = 1, 2, 3, and 4, and increase v as the number of learning points increases. We note that M = 72 is the number of experts. We also arrange the learning rate as for the standard expert advice and for the CZ method.
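Under the Gaussian-mixture form of the distribution prediction (see Methods), u_t(Q) can be computed by bisection on the mixture's cumulative distribution; this sketch is our own illustration, not the authors' code.

```python
from math import erf, sqrt

def mixture_cdf(x, means, weights, sigma):
    """CDF at x of a weighted mixture of Gaussians with a common sigma;
    the weights are normalized inside the function."""
    total = sum(weights)
    return sum(w * 0.5 * (1.0 + erf((x - m) / (sigma * sqrt(2.0))))
               for m, w in zip(means, weights)) / total

def u(Q, means, weights, sigma, lo=-1e6, hi=1e6, iters=100):
    """Bisection for the Q-quantile of the predicted distribution:
    the point where the cumulative distribution reaches Q."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, means, weights, sigma) < Q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For a single standard normal component, u(0.975) is about 1.96, the familiar 97.5% point.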
TEA exhibits the best performance among the three methods, because each proportion tends to be closest to the specified value of Q. These results imply that our proposed prediction method may be reasonable for real applications in a clinical setting. We also checked the prediction performance in terms of the median using the mean absolute error (MAE), as summarized in Supplementary Table S1. TEA shows the best performance in terms of the average MAE among the four cases. We note that the CZ method showed good performance in terms of the root mean square error (RMSE); however, we believe that the MAE suits our situation because we employed the absolute error function for the learning period.
Discussion
In general, clinicians provide salvage treatment to patients who have recurrence after surgery. Although many studies show the clinical benefit of salvage treatment for patients with prostate cancer, recent studies have reported that an earlier salvage treatment, especially for local recurrence, can improve clinical outcomes^{32}. These results suggest that postoperative patients with lower PSA values may have a higher frequency of local recurrence that could be efficiently treated by radiotherapy. If clinicians could accurately assess PSA failure at an earlier stage than the present standard criterion, namely a PSA value that increases to 0.2 (ng/ml) or more after surgery, salvage treatments could be scheduled more effectively for each patient, improving the final clinical outcome^{33}. However, there is still no standardized criterion to determine the best timing of salvage treatments^{32,33}. Combined with a mathematical model^{8}, TEA or its further extensions may potentially predict the PSA dynamics in patients before PSA failure. Therefore, the proposed TEA could become the basis of a new standard index for earlier prediction of PSA failure using a simple mathematical solution, which offers important information for suitable salvage treatment after surgery^{7,34,35}.
The more experts we use, the more accurate the prediction tends to become (Figs. 2b, 2c, and 2e); in this sense, the accumulation of datasets is important. Additionally, the longer the learning period, the more accurate the TEA prediction tends to become in the toy examples (Fig. 2g). This could be because the toy examples have bounded unstable dynamics. The prediction error does not monotonically decrease with an increase in the number of learning data points in the example of prostate cancer (Tables 2 and S1), because PSA tends to increase monotonically in time. TEA exhibits the best performance in our analyses. The proposed combination of expert advice with a predicted distribution enhances the reliability of prediction, which is important in many applications, especially in medicine.
In summary, we have demonstrated that TEA can infer the state of a target system by combining its short time-series with expert advice constructed as a collection of longer time-series. The proposed TEA may be applied to any problem in which a short time-series and its database are given, as demonstrated in the violin and squid giant axon examples, although we primarily intend to apply TEA in clinical settings, such as inferring the state of a disease using a short time-series from the target patient and longer time-series from previous patients with the same disease. We hope that TEA will improve the overall survival and/or quality of life of patients.
Methods
Standard expert advice method
The expert advice method^{15} is an online predictor in machine learning. We briefly introduce the standard expert advice method in this section; see the book by Cesa-Bianchi and Lugosi^{15} for a more detailed introduction. The expert advice method consists of experts and a predictor. At each time step, each expert gives a prediction of the future. The predictor makes its own prediction by weighting these pieces of advice based on the experts' prediction histories. After a new outcome is observed, the predictor updates the experts' weights using the losses produced in the current step. We iterate these steps to realize online prediction. Let f_{i,t} be the ith expert's advice at time t and N be the number of experts. We assign each expert the weight w_{i,t} = exp(−η L_{i,t}) at time t, and obtain the prediction by averaging the experts' advice as

p_{t} = Σ_{i=1}^{N} w_{i,t−1} f_{i,t} / Σ_{j=1}^{N} w_{j,t−1}, (27)

where p_{t} is the prediction at time t, η is a constant, and L_{i,t} is the accumulated loss for the ith expert at time t. Better experts have smaller accumulated losses, and hence larger weights. The accumulated losses for the ith expert and the predictor at time t are

L_{i,t} = Σ_{k=1}^{t} l(f_{i,k}, y_{k}), (28)

L_{t} = Σ_{k=1}^{t} l(p_{k}, y_{k}), (29)

where y_{k} is the observation at time k, and l(x, y) is a convex loss function, typically the absolute error |x − y| or squared error (x − y)^{2}. We evaluate the performance of the predictor by the regret, defined as the predictor's accumulated loss minus the accumulated loss of the best expert. Mathematically, the regret R_{t} is defined and bounded as

R_{t} = L_{t} − min_{1≤i≤N} L_{i,t} ≤ (ln N)/η + η ε^{2} t/8, (30)

where ε is the maximum value of l(·,·); namely, the regret is bounded above by the right-hand side of Eq. (30) (see Ref. 15 for the derivation). We obtain the following optimal constant η_{*} by minimizing the upper bound over η:

η_{*} = √(8 ln N/(ε^{2} t)). (31)

Replacing η with η_{*}, we obtain the optimal upper bound of the regret R_{t} as ε √((t ln N)/2). We call the accumulated losses defined in Eqs. (28) and (29) the standard accumulated losses.
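The forecaster described above can be sketched as an online loop; the absolute loss is assumed here for illustration.

```python
import numpy as np

def expert_advice_online(experts, y, eta=1.0):
    """Run the exponentially weighted average forecaster over observations y.

    experts: array (N, T) of the experts' advice f_{i,t}. The prediction at
    each step is the weighted average of advice with weights exp(-eta * L_i);
    the accumulated losses L_i are updated after each observation.
    """
    N, _ = experts.shape
    L = np.zeros(N)                          # accumulated losses L_{i,t}
    preds = []
    for t in range(len(y)):
        w = np.exp(-eta * (L - L.min()))     # shifted to avoid underflow
        preds.append(float(w @ experts[:, t] / w.sum()))
        L += np.abs(experts[:, t] - y[t])    # absolute loss update
    return np.array(preds)
```

With one expert always advising 0 and another always advising 1 against observations of 1, the prediction starts at 0.5 and converges toward the better expert as its relative weight grows.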
Although the standard expert advice can be applied in many cases, the method is not suited to the prediction of unstable systems, in which the recent history should be emphasized to predict the future more accurately. Thus, we extended the standard expert advice by placing greater weight on recent past information. We call our extension the temporal expert advice, or TEA.
Distribution prediction
Here, we extend the TEA method from point prediction to distribution prediction, so that we can handle the prediction of biomarkers. For this purpose, we introduce the distribution prediction of the ith expert at time t as follows:

p_{i,t}(y) = (1/√(2πσ^{2})) exp(−(y − f_{i,t})^{2}/(2σ^{2})),

where σ is the standard deviation. This distribution is given under the assumption that the point prediction f_{i,t} is disturbed by various errors and that the error is normally distributed around the point prediction. Then, the predictor for the distribution prediction is given by

p_{t}(y) = Σ_{i=1}^{N} w_{i,t−1} p_{i,t}(y) / Σ_{j=1}^{N} w_{j,t−1}.

We determine the optimal decay rate λ by minimizing the absolute error between the final learning point and the corresponding observation point. The standard deviation σ is set to the absolute difference between the point prediction and the observation at the final learning point under the λ estimated above, where the final learning point is indexed by the number of learning points. We note that we determined these parameters in a modified way in the prediction of PSA because of its instability.
References
 1.
Nowak, M. A. et al. Antigenic diversity thresholds and the development of AIDS. Science 254, 963–969 (1991).
 2.
Jackson, T. L. A mathematical model of prostate tumor growth and androgen-independent relapse. Discrete Contin. Dyn. Syst. Ser. B 4, 187–201 (2004).
 3.
Michor, F. et al. Dynamics of chronic myeloid leukaemia. Nature 435, 1267–1270 (2005).
 4.
Ideta, A. M., Tanaka, G., Takeuchi, T. & Aihara, K. A mathematical model of intermittent androgen suppression for prostate cancer. J. Nonlinear Sci. 18, 593–614 (2008).
 5.
Jain, H. V., Clinton, S. K., Bhinder, A. & Friedman, A. Mathematical modeling of prostate cancer progression in response to androgen ablation therapy. Proc. Natl. Acad. Sci. USA 108, 19701–19706 (2011).
 6.
Portz, T., Kuang, Y. & Nagy, J. D. A clinical data validated mathematical model of prostate cancer growth under intermittent androgen suppression therapy. AIP Adv. 2, 011002 (2012).
 7.
Draisma, G. et al. Lead time and overdiagnosis in prostate-specific antigen screening: importance of methods and context. J. Natl. Cancer Inst. 101, 374–383 (2009).
 8.
Hirata, Y., Bruchovsky, N. & Aihara, K. Development of a mathematical model that predicts the outcome of hormone therapy for prostate cancer. J. Theor. Biol. 264, 517–527 (2010).
 9.
Kronik, N. et al. Predicting outcomes of prostate cancer immunotherapy by personalized mathematical models. PLoS ONE 5, e15482 (2010).
 10.
Hirata, Y., Akakura, K., Higano, C. S., Bruchovsky, N. & Aihara, K. Quantitative mathematical modeling of PSA dynamics of prostate cancer patients treated with intermittent androgen suppression. J. Mol. Cell Biol. 4, 127–132 (2012).
 11.
Gorelik, B. et al. Efficacy of weekly docetaxel and bevacizumab in mesenchymal chondrosarcoma: a new theranostic method combining xenografted biopsies with a mathematical model. Cancer Res. 68, 9033–9040 (2008).
 12.
Suzuki, T., Bruchovsky, N. & Aihara, K. Piecewise affine systems modelling for optimizing hormone therapy of prostate cancer. Philos. Trans. R. Soc. Lond. A 368, 5045–5059 (2010).
 13.
Hirata, Y., di Bernardo, M., Bruchovsky, N. & Aihara, K. Hybrid optimal scheduling for intermittent androgen suppression of prostate cancer. Chaos 20, 045125 (2010).
 14.
Chmielecki, J. et al. Optimization of dosing for EGFR-mutant non-small cell lung cancer with evolutionary cancer modeling. Sci. Transl. Med. 3, 90ra59 (2011).
 15.
Cesa-Bianchi, N. & Lugosi, G. Prediction, Learning, and Games (Cambridge Univ. Press, New York, 2006).
 16.
Chernov, A. & Zhdanov, F. Prediction with expert advice under discounted loss. Proc. of ALT 2010, Lecture Notes in Artificial Intelligence 6331, 255–269 (2010).
 17.
Hénon, M. A two-dimensional mapping with a strange attractor. Commun. Math. Phys. 50, 69–77 (1976).
 18.
Ikeda, K. Multiple-valued stationary state and its instability of the transmitted light by a ring cavity system. Opt. Commun. 30, 257–261 (1979).
 19.
Goto, M. Development of the RWC Music Database. Proc. 18th Int. Congress on Acoustics (ICA 2004), I-553–556 (2004).
 20.
Mees, A. et al. Deterministic prediction and chaos in squid axon response. Phys. Lett. A 169, 41–45 (1992).
 21.
Hirata, Y., Judd, K. & Aihara, K. Characterizing chaotic response of a squid axon through generating partitions. Phys. Lett. A 346, 141–147 (2005).
 22.
Hirata, Y. & Aihara, K. Devaney's chaos on recurrence plots. Phys. Rev. E 82, 036209 (2010).
 23.
Hirata, Y., Oku, M. & Aihara, K. Chaos in neurons and its application: perspective of chaos engineering. Chaos 22, 047511 (2012).
 24.
Akakura, K. et al. Effects of intermittent androgen suppression on androgen-dependent tumors. Cancer 71, 2782–2790 (1993).
 25.
Bruchovsky, N. et al. Final results of the Canadian prospective phase II trial of intermittent androgen suppression for men in biochemical recurrence after radiotherapy for locally advanced prostate cancer: clinical parameters. Cancer 107, 389–395 (2006).
 26.
Bruchovsky, N., Klotz, L., Crook, J. & Goldenberg, S. L. Locally advanced prostate cancer: biochemical results from a prospective phase II study of intermittent androgen suppression for men with evidence of prostate-specific antigen recurrence after radiotherapy. Cancer 109, 858–867 (2007).
 27.
Tanaka, G., Hirata, Y., Goldenberg, S. L., Bruchovsky, N. & Aihara, K. Mathematical modelling of prostate cancer growth and its application to hormone therapy. Philos. Trans. R. Soc. Lond. A 368, 5029–5044 (2010).
 28.
Tanaka, G., Tsumoto, K., Tsuji, S. & Aihara, K. Bifurcation analysis on a hybrid systems model of intermittent hormonal therapy for prostate cancer. Physica D 237, 2616–2627 (2008).
 29.
Guo, Q., Tao, Y. & Aihara, K. Mathematical modeling of prostate tumor growth under intermittent androgen suppression with partial differential equations. Int. J. Bifurcat. Chaos 18, 3789–3797 (2008).
 30.
Tao, Y., Guo, Q. & Aihara, K. A model at the macroscopic scale of prostate tumor growth under intermittent androgen suppression. Math. Models Meth. Appl. Sci. 19, 2177–2201 (2009).
 31.
Tao, Y., Guo, Q. & Aihara, K. A mathematical model of prostate tumor growth under hormone therapy with mutation inhibitor. J. Nonlinear Sci. 20, 219–240 (2010).
 32.
Pfister, D. et al. Early salvage radiotherapy following radical prostatectomy. Eur. Urol. 65, 1034–1043 (2014).
 33.
King, C. R. The timing of salvage radiotherapy after radical prostatectomy: a systematic review. Int. J. Radiat. Oncol. Biol. Phys. 84, 104–111 (2012).
 34.
Hazelton, W. D. & Luebeck, E. G. Biomarker-based early cancer detection: is it achievable? Sci. Transl. Med. 3, 109fs9 (2011).
 35.
The U.S. Preventive Services Task Force. Screening for Prostate Cancer: U.S. Preventive Services Task Force Recommendation Statement. http://www.uspreventiveservicestaskforce.org/uspstf12/prostate/prostateart.htm (2012). Date of access: 04/01/2015.
Acknowledgements
We would like to express our appreciation to Dr. Nicholas Bruchovsky for valuable discussions and for sharing published clinical data. This work is partially supported by JSPS KAKENHI Grant Number 11J07088, by MEXT KAKENHI Grant Number 23240019, by JST CREST, and by the Aihara Innovative Mathematical Modelling Project, the Japan Society for the Promotion of Science (JSPS), through the "Funding Program for World-Leading Innovative R&D on Science and Technology (FIRST Program)", initiated by the Council for Science and Technology Policy (CSTP). The violin data used in this study are available in the RWC Music Database (Musical Instrument Sound).
Author information
Affiliations
Graduate School of Information Science and Technology, The University of Tokyo, Tokyo 113-8656, Japan
 Kai Morino
 , Yoshito Hirata
 , Kenji Yamanishi
 & Kazuyuki Aihara
Institute of Industrial Science, The University of Tokyo, Tokyo 153-8505, Japan
 Yoshito Hirata
 & Kazuyuki Aihara
Toyota Technological Institute at Chicago, Chicago, Illinois 60637, USA
 Ryota Tomioka
Graduate School of Informatics, Kyoto University, Kyoto 606-8501, Japan
 Hisashi Kashima
CREST, JST, Honcho, Kawaguchi, Saitama 332-0012, Japan
 Kenji Yamanishi
Department of Urology, Jikei University School of Medicine, Tokyo 105-8461, Japan
 Norihiro Hayashi
 & Shin Egawa
Authors
 Kai Morino, Yoshito Hirata, Ryota Tomioka, Hisashi Kashima, Kenji Yamanishi, Norihiro Hayashi, Shin Egawa & Kazuyuki Aihara
Contributions
N.H. and S.E. designed the clinical study. K.M., Y.H. and K.A. designed the rest of the study. K.M., Y.H., R.T., H.K., K.Y. and K.A. created the theoretical method. K.M. and Y.H. analyzed the data. N.H. and S.E. obtained the clinical data and suggested the clinical implications. K.M., Y.H. and K.A. wrote the manuscript. All authors checked the manuscript and agreed to submit the final version of the manuscript.
Competing interests
S.E. declares competing financial interests: support from Takeda Pharmaceutical Co., Astellas, and AstraZeneca. The other authors declare no competing financial interests.
Corresponding author
Correspondence to Kai Morino.
Supplementary information
PDF files
 1.
Supplementary Information
Rights and permissions
This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder in order to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/