Theoretical background

A time series is a sequence of time-oriented observations related to forecasting or controlling a specific variable1. This thematic study originated in 1927, adopting a general approach to time series analysis2. Nearly three decades later, new time series forecasting approaches began to emerge.

Initially, classical time series statistical models were proposed3. Subsequently, these models were refined to include exponential smoothing techniques4,5 before evolving into auto-regressive moving average models6. Eventually, they progressed further to incorporate Machine Learning7 and State-Space models8.

In all instances, the predictability of future events is a central element, crucial for planning and processes related to Operations Management, among others, such as Marketing, Economics, and Demography1. However, the predictability of an event or quantity depends on various factors, including an understanding of the influencing factors, data availability, future and past similarities, and the potential impact of forecasts on the predicted outcome9.

In the context of oncology studies, mortality and incidence projection methods were already compared in Canada, using age-period-cohort (APC), auto-regressive time series, and space-state models at least for ten cancer types10.

APC and Bayesian APC, auto-regressive integrated moving average (ARIMA) time series, and simple linear models were also compared for five cancer types in Switzerland11.

Using reported breast cancer cases in the Fijian population from 1995 to 2016, Chand et al.12 attempted to apply an ARIMA model to provide a 12-month ahead prediction. However, faced with non-stationary data according to the Augmented Dickey-Fuller test, a linear regression model was chosen. The proposed model was compared with the Naive Forecast Method, showing that the linear regression model outperformed the Naive Forecast Method.

Also exploring the epidemiological characteristics of breast cancer, Lin et al.13 used Exponential Smoothing (ETS) and Autoregressive Integrated Moving Average (ARIMA) models to forecast breast cancer incidence in China.

Regarding palliative cancer care, two different long short-term memory (LSTM) models were proposed, aiming to forecast the patients’ next visit day and estimate the total patient demand 1 week ahead14. For this, was take into account their requirements, demographics, and each service history profile.

Alrobai and Jilani15 also applied LSTM to forecast the incidence of the three most prevalent cancers in Saudi Arabia. However, it’s crucial to note that cancer prevalence can significantly vary from one country to another.

In Malaysia, to deal with the continued annual growth in cancer incidence rates, particularly female breast, colorectal, and lung cancer, Lazam et al.16 tested ARIMA and Exponential Smoothing (ETS) models. They intended to determine the best rates for incidence prediction for these mentioned types of cancer.

Tudor17 proposed alternative ways to forecast cancer incidence and mortality by connecting population web-search practices with health variables officially published by Romanian authorities. The applied models included ARIMA, the Exponential Smoothing State-Space Model with Box-Cox Transformation, ARMA Errors, Trend, and Seasonal Components, and a feed-forward neural network nonlinear autoregression model.

In this research, conducted in Brazil, we present the framework to evaluate previous works on cancer time-series prediction, dividing the time-series prediction according to Hyndman and Athanasopoulos9 into Classical Statistical models, State-Space models, and Machine Learning models (Table 1). For this, only researches that makes cancer predictions were considered.

Table 1 Forecasting model applied by cancer type.

After comparing ten previous works related to cancer incidence prediction (Table 1), we can conclude that:

  1. 1.

    Breast and lung cancer incidence predictions have garnered more attention in specialized literature and have been studied in 8 and 7 works, respectively; colorectal cancer has been studied in 5 works, while other cancer types have been studied in 4 works or less.

  2. 2.

    CSM and particularly ARIMA were the most used approaches.

  3. 3.

    Considering SSM and MLM, TBATS NNETAR, and MLP were never covered before in previous research.

  4. 4.

    We found no previous work in which all three classes of models were applied.

As will be presented in this paper, the third and fourth conclusions allow us to state that this work covers a gap in current cancer prediction. Thus, applying unseen methods (3rd) and the three classes of models (4th) to cancer prediction is an original contribution of this research.

Finally, the mentioned studies address the application of different forecasting methods in countries such as Canada, Switzerland, Fiji, China, Malaysia, and Romania. Their use in Brazil, for a larger sample of types of cancer and comparing them, seems like a complementary contribution.

Methods

Data collection

In this research, we analyze real cancer data from Brazil obtained from INCA. All time-series used are presented in Fig.  1 and are also available at Table 2. The seven cancer types evaluated are: Breast cancer (ICD-10 C50), Colorectal Cancer (ICD-10 C18 to C21), Prostate cancer (ICD-10 C61), Lung cancer (ICD-10 C33 and C34), Cervical cancer (ICD-10 C53), Head and Neck Cancer (ICD-10 C00 to C10) and Childhood Cancer (ICD-10 C00 to C96).

Figure 1
figure 1

ICD-10 Mortality rate by 100,000 inhabitants considering world population-adjusted by cancer type.

Table 2 ICD-10 Mortality rate by 100,000 inhabitants considering world population-adjusted by cancer type.

The filters employed for each type of cancer can be found in Table 3. We gathered data on the mortality rates for Brazilian cancer from INCA’s website21. The population figures were obtained from the 2022 Brazilian census22.

Table 3 Filters and criteria used to retrieve cancer data by cancer type.

In Brazil, the cancer incidence is not registered to all districts. So, INCA works with an approximate incidence inferred from the mortality rate considering Black et al.23, Ferlay et al.24 and Ferlay et al.25 estimation methodologies based mainly on the I/M ratio.

The mentioned methodologies links the unknown incidence rate-adjusted (IRa) to the known mortality rate-adjusted (MRa) of some district by the equation \(IRa = MRa*(I_R/M_0)\). Where \(I_R\) refers to known incidence of districts geographically near from the targeted unmeasured district and \(M_0\) refers to number of deaths of the same districts. The results of \(I_R/M_0\) ratio to the unmeasured district evaluated is presented in Table 4.

Table 4 I/M ratio by cancer type.

In Fig.  1 we present for each type of studied cancer the mortality rate by world population-adjusted by 100,000 inhabitants. Then, according to2325,24, estimation methodologies the expected incidence rate-adjusted IRa can be obtained by multiplying the mortality rate-adjusted in Fig.  1 and the values presented in Table 4 to each cancer type.

The current short-term predictions in Brazil rely on the average of the past 3 recent years. This outcome serves as a reference for the Brazilian public health system over the next 3 years. In essence, the existing approach is a simple moving average (MA).

Forecasting models applied

In this research we apply the univariate forecasting methods available in Hyndman and Khandakar26, Petris27 and Kourentzes28. Models applied in next sections are presented in Table 5. These models were implemented in R29 language (version 4.1.3) and the code used is available at Supplementary Material (Forecasting code.R).

Table 5 Forecasting models applied in this research.

To build each model is necessary to estimate many parameters, but the main features of each model are presented forward:

  • ETS: ETS is a class of models that essentially works with three components equations level (\(l_t\)), trend (\(b_t\)) and season (\(s_t\)) to explain the original time series variable (\(y_t\)) that we aim to forecast. In each model these components cannot be significant, also known as None (N) or can be significant and better described \(y_t\) as Additive (A) or Additive Damped (Ad) or Multiplicative (M) features. This class of models can be combined in 18 different ways (Fig.  2). For more details see Hyndman and Athanasopoulos9.

  • ARIMA: ARIMA or Seasonal ARIMA (SARIMA) is a class of models that combine autoregressive (AR) and moving average (MA) with differenced values. The AR part of ARIMA (p) shows that the time series is regressed on its own past data. The MA part of ARIMA (q) indicates that the forecast error is a linear combination of past respective errors. The I part of ARIMA (d) refers to differenced values of d order to obtain stationary time-series in which ARMA model approach can be applied Kotu and Deshpande (2019)30. The difference between ARIMA and SARIMA models remains on the same components appearing lagged by the length of seasonal time window (frequency) as P, D and Q. For more details see Hyndman and Athanasopoulos9 and Kotu and Deshpande30.

  • Kalman filter (KF): KF methods search the smallest vector that summarizes the past of the system that better describes the state of a deterministic dynamic system31. KF equation is basically composed by a linear autoregressive equation \({x(t)} = A*x(t) + W(t)\) where \(W(t) \approx N(0,Q)\) with a measurement that is \({y(t)} = C*y(t) + V(t)\) where \(V(t) \approx N(0,R)\) that defines the linearized process in which \(y(t) \in {\mathbb {R}}\). The random variables W(t) and V(t) are assumed to be independent of each other and both must follow a normal distribution.

  • TBATS: TBATS model is Trigonometric Seasonal (T) Exponential Smoothing Method + Box-Cox Transformation + ARMA model for residuals (BATS). Equations of the TBATS model are presented in equations below where \(\omega\) and \(\phi\) are Box-Cox and the damping parameters respectively, ARMA(pq) process model the error and \(m_1\) to \(m_J\) list the seasonal periods used while \(k_1\) to \(k_J\) are the corresponding number of Fourier terms used. For more details see De Liveira et al.32.

    $$\begin{aligned} y_{t}^{(\omega )}= & {} \frac{ y_{t}^{(\omega )}-1}{\omega }, \omega \ne 0,\\ y_{t}^{(\omega )}= & {} \log {y_{t}}, \omega = 0,\\ y_{t}^{(\omega )}= & {} l_{t-1}+\phi *b_{t-1}+\sum _{i=1}^{t} s_{t-m_i}^{i} +d_t,\\ l_{t}= & {} l_{t-1}+\phi *b_{t-1} +\alpha *d_t,\\ b_{t}= & {} (1-\phi )*b_t +\phi *b_{t-1}+\beta *d_t,\\ s_{t}^{i}= & {} s_{t-m_i}^{i} +\gamma _i *d_t,\\ d_{t}= & {} \sum _{i=1}^{p} \phi _i*d_{t-i}+\sum _{i=1}^{q} \theta _i*\epsilon _{t-i} +\epsilon _{t},\\ s_{t}^{i}= & {} \sum _{j=1}^{k_j} s_{j,t}^{i},\\ s_{t}^{i}= & {} s_{j,t-1}^{i}*\cos {\lambda _j^i} + s_{j,t-1}^{*i}*\sin {\lambda _j^i} + \gamma _1^i*d_t,\\ s_{t}^{*i}= & {} s_{j,t-1}^{i}*\sin {\lambda _j^i} + s_{j,t-1}^{*i}*\cos {\lambda _j^i} + \gamma _2^i*d_t\\ \end{aligned}$$
  • NNETAR: Neural Network Time Series Forecasts (NNETAR) is a class of feed-forward neural networks with a single hidden layer and lagged inputs. This model works with 2 (for non seasonal time-series) or 3 (for seasonal time-series) parameters: the number of past observations used as input layers (p), the number of past observations lagged by the length of seasonal time window used as input layers (P) and the number of neurons (k) in the single layer. In this research, a total of 20 repeats networks are fitted, each with random starting weights. These are then averaged when computing forecasts. The network is trained for one-step forecasting. Multi-step forecasts are computed recursively. The k selected to each type of cancer it the half of the number of input nodes plus 1. For non-seasonal data, the fitted model is denoted as an NNAR (pk) (Neural Network Autoregressive) model which is analogous to an AR (p) model but with nonlinear functions. For seasonal data, the fitted model is called an NNAR (pPk)[m] model, which is analogous to an ARIMA (p, 0, 0)(P, 0, 0)[m] model but with nonlinear functions. For more details see Hyndman and Athanasopoulos9.

  • MLP: MLP is an extension of feed-forward neural network where an arbitrary number of hidden layers that are placed in between the input and output layer (the truly computational engine of the MLP). According to Kourentzes et al.33, MLPs are designed to approximate any continuous function and can solve problems which are not linearly separable. In our case, the time-series problem proposed our input layer (like NNETARs’ model p) are the most recent past observations and we set the MLP model to choose the best number of input layers between 1 and the prediction length (3 years) lags will be used according to Mean Square Error. The same criteria were also adopted to choose the number of hidden nodes in each hidden layer. For more details see Kourentzes et al.33.

Figure 2
figure 2

Hyndman and Athanasopoulos9 ETS equations.

Forecasting models evaluation

The dataset presented in Table 2 were multiplied by I/M ratio for each cancer type shown in Table 4 to estimate the incidence rate of each type of cancer evaluated (Fig.  3).

For instance, to Breast cancer, the ICD-10 Mortality rate by 100,000 inhabitants are 17,77 in 1979, 21.73 in 1980 and so on (second column Table 2). Thus, the Breast cancer Incidence rate-ajusted will be these values multiplied by 5.59 (Breast cancers’ \(I_R/M_0\) ratio in Table 4) which are 77.65 in 1979, 94.96 in 1980, 69 in 1981 and so on that can be seen in Fig.  3.

Figure 3
figure 3

ICD-10 Incidence rate-ajusted (IRa) by cancer type.

In this research, we are interested in provide a comparison between Brazilian’s current short-term cancer prediction and the time-series state of art models. As mentioned in Section Theoretical Background, as long as the current short-term cancer prediction are made 3 years ahead, we split our dataset into training data (from 1979 to 2017) and test data (from 2018 to 2020).

Training (in sample) and test (out of sample) data are evaluated using the Root Mean Square Error (RMSE) criterion. A low RMSE in sample value indicates a good average fit of the model used while a low value of RMSE out of sample indicates that the model used, on average, delivers a reliable forecast9.

Below we present the criteria adopted to evaluate the current and proposed methods predictions to each cancer type:

  • The noise evaluation over the training (in sample) data according to the following tests: student (ST), normality (NT), Auto-correlation function (ACF) plot and Breusch-Pagan (BPT);

  • The error evaluation according to the test (out of sample) Root Mean Square Error (RMSE).

If the residuals produced a 0 mean error in Student-test, follows a normal distribution in Shapiro–Wilk test, remains between the interval defined by the blue lines in ACF plot test to all lags and presented no constant variance all over the time (homoscedasticity) in Breusch-Pagan test, we consider that the model residuals produced a white noise which means that the model is unbiased34,35,36,37,38.

The significance level adopted in this research is 0.05 which means that residuals produced a white noise if the obtained p-values in each test are higher than 0.05 to each model.

Thus, in this research we consider that the best model for each cancer type is given by their residual evaluation that (1) fulfill all requirements previously presented and (2) obtained the lowest out of sample RMSE.

Results

In this section we apply the methods presented in columns of Table 5 to each type of cancer incidence presented in Fig.  3. In Table 6 we summarize the in sample and out of sample RMSE results by model and type of cancer.

Table 6 RMSE per type of cancer per model.

As mentioned in Forecasting models evaluation section, to compare models errors summarized in Table 6 we select the out of sample RMSE criterion. Then, to ensure that models residuals give us a white noise in the training data we apply the Student test (Table 7), the ACF plot, the Shappiro-Wink normality test (Table 8) and the Breusch-Pagan test (Table 9).

Table 7 Student test p value per type of cancer per model.
Table 8 Normality test p value per type of cancer per model.
Table 9 Breusch—Pagan test p value per type of cancer per model.

As mentioned in Section Forecasting models evaluation, besides considering RMSE criteria we must also evaluate if each model produced residual values with a white error noise taking into account their auto-correlation plots and normality test to all cancer types (Table 6).

This evaluation is presented for all types of cancer evaluated, grouped (Figure 4) and individually—breast (Figure 5), colorectal (Figure 6), prostate (Figure 7), lung (Figure 8), cervical (Figure 9), head and neck (Figure 10) and childhood (Figure 11).

Figure 4
figure 4

All cancer types noise evaluation using INCA’s current model.

Figure 5
figure 5

Breast cancer noise evaluation by model.

Figure 6
figure 6

Colorectal cancer noise evaluation by model.

Figure 7
figure 7

Prostate cancer noise evaluation by model.

Figure 8
figure 8

Lung cancer noise evaluation by model.

Figure 9
figure 9

Cervical cancer noise evaluation by model.

Figure 10
figure 10

Head and Neck cancer noise evaluation by model.

Figure 11
figure 11

Childhood cancer noise evaluation by model.

The white noise failure evaluation by model and by cancer type is summarized in Table 10.

Table 10 White noise failure evaluation summary per type of cancer per model.

Considering the criteria presented in Section Forecasting models evaluation to ensure an unbiased model, we must select the best model to each type of cancer evaluated discarding the result of the following failed (biased) models for:

  • Current model, ETS, ARIMA, TBATS and KF to breast cancer which failed in Auto-correlation function (ACF) plot presented in Fig.  5 and, in normality test, MLP failed.

  • MLP to colorectal cancer which failed in ACF plot presented in Fig.  6, Breusch-Pagan test and in normality test. NNETAR also failed in Breusch-Pagan test.

  • NNETAR to prostate cancer which failed in Auto-correlation function (ACF) plot presented in Fig.  7 and MLP failed in normality test.

  • KF to lung cancer which failed in student test, ACF plot presented in Fig.  8 and, in normality test and ACF plot, MLP failed.

  • Cervical cancer presented residuals produced a significant ACF plot only to current model as presented in Fig.  9. MLP failed in normality test.

  • ARIMA to head and neck cancer which failed in ACF plot presented in Fig.  10 and, in normality test, KF and MLP failed.

  • MLP to childhood cancer which failed in normality test.

Thus, the best model to each cancer type are: NNETAR for breast, KF for colorectal, ARIMA for prostate, TBATS for lung, KF for cervical, the current method for Head and neck and KF for childhood.

Their prediction plots can be seen respectively in Figs. 12, 13, 14, 15, 16, 17 and 18. The 3-year ahead prediction values are summarized in Table 11

Table 11 Three years IRa prediction using the best model to each cancer type.
Figure 12
figure 12

NNAR breast cancer IRa prediction values.

Figure 13
figure 13

KF colorectal cancer IRa fitted and prediction values.

Figure 14
figure 14

ARIMA prostate cancer IRa fitted and prediction values.

Figure 15
figure 15

TBATS lung cancer IRa fitted and prediction values.

Figure 16
figure 16

KF cervical cancer IRa fitted and prediction.

Figure 17
figure 17

Current method head and neck cancer IRa fitted and prediction values.

Figure 18
figure 18

KF Childhood cancer IRa fitted and prediction values.

Discussion

A limitation of this research could be observed in the method used to obtain the incidence of cancer in Brazil. This occurs because, in practice, the incidence is not measured. Thus, we used cancer incidence estimation methodologies proposed in Black et al.23, Ferlay et al.24 and Ferlay et al.25 which are based on the mortality rate discussed in Section Data collection.

Considering that the presented methodologies can give us the best cancer incidence estimation evaluating only time-series univariate models, our findings in Table 6 seem to indicate that the current model applied by INCA in Brazil to forecasting cancer incidence underperform in 6 of the 7 type of cancers proposed in this research. So, the presented methodologies seem to behave more adequately than the Brazilian’s current methodology.

It is important to note that we are working with the same type and amount of data that is used today, meaning that it would not be necessary to collect new variables in order to increase the accuracy of the forecast.

In addition, we did not see the CSM models outperform the others in any type of cancer, although ARIMA models (CSM) are the most widely used models in the current literature so far as we presented in Table 1.

These facts imply that, while there is no broad and reliable Population-Based Cancer Registries in the country, all research that use these data as a primary source will be limited; including this one.

However, it is necessary to consider that Brazil has continental dimensions and a technological backwardness that do not facilitate the implementation of this type of record. Although restrictive, the fact has not prevented research and public policies aimed to cancer prevention and control in the country, that surely could be more effective.

In this sense, we reinforce that it is not possible to invalidate what has been done in the country, but to plead for the opening of space so that new, more accurate forecast models can be adopted, aiming at supporting strategic decisions to face cancer in the country. Even because the current literature has used models that go in the opposite direction of the results presented by this research in Table 1.

For instance, MLM models were only used in Soltani et al.14 and Alrobai and Jilani15 works and only LTSM were evaluated. Considering SSM, the current literature presents only Lee et al.10 research in which only KF approach is proposed.

In Table 11, we see that SSM (KF and TBATS) was selected in four of seven type of cancers evaluated while MLM (NNETAR), CSM (ARIMA) and current method where selected to one type of cancer.

The evaluation process adopted in this research and presented in Section Forecasting models evaluation was crucial to identify and discard biased models to each type of cancer. If we had only considered in sample RMSE criterion (measuring the best fitted model, on average) to select the models to each type of cancer, MLP would be selected in all time-series evaluated.

On the other hand, if we considered only out of sample RMSE criterion (measuring the best predicted values, on average), ARIMA and MLP would be selected in two types of cancer while ETS, TBATS and KF would be selected in only one type of cancer time-series (NNETAR and current method would not be selected).

The noise evaluation process adopted also allowed us to state that the current model can potentially provide a biased prediction because it failed in ACF plot to Breast and Cervical cancer as we can see in Fig.  4. Therefore, we cannot classify it as statistically valid for making predictions.

It is important to note that both cancers affects the female population and keep using the current method could jeopardize efficient planning of resources for diagnosis and treatment for them.

Considering that, in Brazil, government policies and programs are mostly focused on these types of cancer the situation may pose an important challenge to be overcome.

Finally, by evaluating Brazilian’s current approach, CSM, SSM and MLM using four exclusion criteria (mean 0, normality, ACF and homoscedasticity tests) and one decision criteria (lowest out of sample RMSE) we were able to establish the best unbiased model to each type of cancer, as we wanted to illustrate. We also emphasize that by comparing different methods we can potentially improve the main issue addressed in this research: how to provide an unbiased and reliable cancer forecasting.

Although it is not the focus of this research, causal and multivariate time-series models associated with other control variables such as cigarette smoking as a predictor of lung cancer and HPV vaccination coverage for cervical cancer should be investigated. Another promising direction is to investigate age-period-cohort (APC) models and combine them with the time-series models proposed in this research.

Conclusions

This research aimed to present and apply the main time-series-based models available in forecasting literature to the seven most prevalent types of cancer in Brazil. These models fall into three classes: classical statistical models, State-Space models, and machine learning models.

As mentioned in Theoretical Background section, it is the first attempt to apply unseen methods (TBATS, NNETAR and MLP) and the three classes of models to cancer prediction.

In Brazil, the incidence of cancer is not directly measured and must be estimated based on the mortality rate. Despite the challenge of not directly measuring cancer incidence, it is crucial for public health systems to estimate the incidence of a disease that ranks second in terms of mortality rate per 100,000 inhabitants.

While acknowledging the issue of not directly measuring incidence, our research mitigates this concern by utilizing the same data and employing the same cancer incidence estimation methods. This consistency ensures that our comparison between Brazil’s current prediction method and our proposed methods remains valid.

We also contributed to fulfill a literature gap identified in Table 1 by applying TBATS, MLP and NNETAR forecasting techniques predict seven cancer types in a Brazilian district.

Furthermore, we did not find any similar studies that compared the results of three classes of univariate time-series forecasting models or addressed more than one type of cancer.

When comparing only the error results (RMSE in sample and out of sample) between the approaches mentioned above and the current technique, we demonstrated that the current method underperforms for all types of cancer tested.

Moreover, in the Discussion section, we illustrated that, for breast and cervical cancers, the current approach applied in Brazil produced biased residuals, potentially affecting the quality and reliability of cancer incidence predictions in this country. Consequently, it may provide inaccurate information to healthcare decision-makers.

Therefore, we suggest that the methods evaluated in this study should be integrated into Brazil’s cancer forecast methodology to provide a reliable prediction for healthcare decision-makers.

To further researches, we also suggest a comparison between MLM time-series approaches. NNETAR and MLP (covered in this research) with LTSM which had been also used in recent previous works like Soltani et al.14 and Alrobai and Jilani15 presented in Table 1.

Although it was not the focus of this research, it should be noted that age-period-cohort (APC), previously mentioned in Section Theoretical Background, and Ensemble APC analysis as well as considering the birth-cohort effects39,40 have potential to provide more accurate forecasts compared to traditional time-series methods that only consider period components.

Finally, by contributing with a proposal for the application of a set of tested forecasting methods to estimate the incidence of cancer in Brazil, it is intended that the results encourage a discussion on the adoption of anticipatory actions, aimed at prevention and the provision of means and resources for the early detection of the most prevalent types of cancer.

In this sense, to provide more robust predictions causal models could be also taking into account like we can see in41,42,43,44,45,46,47 applied to other diseases. Using them it is possible to evaluate the impact of smoking reduction or HPV vaccines strategies for lung and cervical cancer respectively, for instance.