Introduction

Considering longitudinal data, also referred to as multivariate time series data, three-way data, or multivariate trajectory data, triclustering aims to discover patterns that satisfy specific homogeneity and statistical significance criteria. Given the increasing prevalence of three-way data across biomedical and social domains, triclustering—the discovery of patterns (triclusters) within three-way data—is becoming a reference technique to enhance the understanding of complex biological, individual, and societal systems1. Clustering is limited to this end, since objects (patients) in three-way data domains are typically only meaningfully correlated on subspaces of the overall space (subsets of features); and although biclustering can find correlated objects in a subspace of features, or temporal patterns for a single feature, it cannot consider both time and multiple features2.

In clinical domains, triclustering has been successfully applied for different ends: health record data analysis, where triclusters can identify groups of patients with correlated clinical features along time; neuroimaging data analysis, in which triclusters correspond to enhanced hemodynamic or electrophysiological responses and connectivity patterns between brain regions; multi-omics, where triclusters capture putative regulatory patterns within omic series data; and multivariate physiological signal data analysis, where triclusters capture coherent physiological responses for a group of individuals1,3,4. In spite of triclustering's relevance for descriptive tasks (knowledge acquisition), its potential in predictive tasks (medical decision support) remains considerably untapped1.

In this context, building on the potential of triclustering approaches, we propose a triclustering-based classifier to learn prognostic models from three-way clinical data. The classifier takes advantage of the temporal dependence between the monitored features and further enhances model explainability by learning an associative model grounded on local temporal patterns (subsets of features with specific values for a subset of patients in a contiguous set of temporal observations during follow-up). To this end, we propose TCtriCluster, a temporally constrained triclustering algorithm able to mine time-contiguous triclusters. It extends the state-of-the-art triCluster algorithm5, originally proposed by Zhao and Zaki to mine patterns in three-way gene expression data, to cope with three-way heterogeneous clinical data (patient-feature-time data).

As a case study, we target prognostic prediction in Amyotrophic Lateral Sclerosis (ALS) using a large cohort of Portuguese patients, where the triclusters learned from patients’ follow-up data can be interpreted as disease progression patterns. The patterns identifying groups of patients with coherent temporal evolution on a subset of features are then used for prognostic prediction as features in a state-of-the-art classifier. The prognostic models learned using the proposed triclustering-based classifier predict whether a patient will evolve to a target clinical endpoint within a certain time window. We target five clinically relevant endpoints in ALS: (1) need for non-invasive ventilation (NIV), (2) need for an auxiliary communication device, (3) need for percutaneous endoscopic gastrostomy (PEG), (4) need for a caregiver, and (5) need for a wheelchair.

The major contributions of this work are the following:

  • A new pattern-centric data transformation from longitudinal data into multivariate temporal features, the triclusters, yielding both descriptive and discriminative qualities for subsequent learning tasks;

  • First study in ALS that comprehensively assesses the state-of-the-art predictability limits of different clinical endpoints of interest, using time windows;

  • A new triclustering algorithm, termed TCtriCluster, able to find time-contiguous triclusters with constant and additive forms of homogeneity;

  • Discriminative patterns of (ALS) disease progression used for prognostic prediction and whose inspection can putatively help to explain prognostics, aiding medical research and practice.

The gathered results are promising, highlighting the potential of the proposed methodology regarding both predictability (outperforming state-of-the-art alternatives) and interpretability. Some limitations should, however, be pinpointed. First, our results primarily focus on the predictive value of follow-up assessments; nevertheless, the proposed predictors can straightforwardly combine static features with triclustering-based features (as we show at the end). Second, despite the large size of the ALS cohort (N = 1321), it was collected at a single Portuguese ALS center, and data from other ALS centers should be used for further validation.

The proposed triclustering-based classifier can be used to learn prognostic models from follow-up data in other diseases, as well as predictive models from three-way data in other domains. The TCtriCluster algorithm can be further used as a standalone tool to mine arbitrarily positioned, overlapping, and temporally constrained triclusters with constant, scaling, and shifting patterns from three-way heterogeneous data.

Background and related work

ALS is a neurodegenerative disease characterized by weakness and functional disability, with patients presenting different phenotypes and progression rates. Most patients with ALS die from respiratory complications within the first 3–5 years after disease onset. Notwithstanding, some patients live up to 10 years, while in more severe circumstances survival can be shortened to 1 year6. Recent studies have reported a prevalence of 8–9 cases per 100,000 inhabitants worldwide7; in Portugal, the described prevalence is similar8.

In the absence of curative treatment, it is essential to promote timely interventions for prolonging survival and improving quality of life. The most important interventions are NIV, with a major positive impact on survival; augmentative communication, to prevent social isolation; PEG, to maintain appropriate nutrition; routine caregiver support for daily life activities; and a wheelchair for regular outings, e.g. for medical appointments6,9,10. Clinicians have been using a well-established scale to determine disease progression: the revised ALS Functional Rating Scale (ALSFRS-R)11. This scale has specific questions regarding respiratory symptoms, speaking, swallowing, self-care and walking, which are essential to determine the timing of the several interventions. Regarding respiratory function, a number of tests are used to support the decision of NIV initiation.

Due to the high heterogeneity of this disease, the individual prognosis of an ALS patient is challenging. It is therefore of utmost importance to develop explainable machine learning models, pinpointing the need for approaches to learn explainable disease progression models that clinicians can effectively use for prognostic prediction and timely interventions, with a possible positive impact on survival and quality of life. Recent years have witnessed an increasing awareness of the potential of machine learning amongst ALS researchers, leading to several applications to ALS cohort data12,13,14,15,16,17,18,19,20,21. The great potential of learning stratification models has also revealed opportunities for future clinical trials, besides promoting more accurate and trustworthy predictions by learning group-specific prognostic models13,22,23,24.

In this context, Carreiro et al.12 conducted a pioneering study proposing prognostic models to predict the need for NIV in ALS based on clinically defined time windows. More recently, Pires et al.22 stratified patients according to their state of disease progression, obtaining three groups of progressors (slow, neutral and fast), and proposed specialized learning models for these groups. They further used patient and clinical profiles with promising results23. However, none of their studies took into account the temporal progression of the features. Recently, Martins et al. proposed coupling itemset mining with sequential pattern mining to unravel disease presentation and disease progression patterns, and used these patterns to predict the need for NIV in ALS patients25. Despite their relevant results, they did not consider the contiguity constraint imposed by the temporality of the patient's follow-up data. Matos et al.26 proposed a biclustering-based classifier: biclustering was used to find groups of patients with coherent values in subsets of clinical features (biclusters), which were then used as features together with static data. Despite being promising, this approach also did not take into account the temporal dependence between the features.

In previous work, a preliminary assessment of the role of classic triclustering approaches for predicting ventilation support needs in ALS was undertaken27, and biclusters discovered in the static dimension of the data were used to predict the need for NIV within specific time windows28. Differently from these earlier works, our research proposes a novel triclustering approach grounded on temporal contiguity constraints that yields both higher predictability and better explainability.

Complementarily to the above pattern-centric stances, Pancotti et al.29 recently applied state-of-the-art deep learning methods to study disease progression in ALS using a publicly available database (PRO-ACT), showing competitive performance.

Despite the extent of research on ALS prognostic ends, most of the existing works focus on survival prediction, NIV needs, or general changes to the ALS functional rating scale (ALSFRS-R), generally neglecting specific clinical endpoints of interest. Specific clinical endpoints, such as the need for a wheelchair or percutaneous endoscopic gastrostomy, have been primarily studied under descriptive stances, including the analysis of cumulative time-dependent risks30. To our knowledge, their predictability under the machine learning stance using time windows and explainable progression patterns remains unassessed.

Methods

This section describes the proposed methodology to learn a triclustering-based classifier from three-way data, from preprocessing (including creating learning examples) to classifier performance evaluation. It further describes TCtriCluster, the proposed triclustering algorithm to mine temporally constrained triclusters. Figure 1 depicts the overall workflow.

Figure 1. Proposed workflow to learn a triclustering-based classifier.

In what follows, consider that a three-way dataset, D, is defined by n objects \(X = \{x_1,\ldots ,x_n\}\), m features \(Y = \{y_1,\ldots ,y_m\}\), and p contexts \(Z = \{z_1,\ldots ,z_p\}\), where each element \(d_{ijk}\) relates object \(x_i\), feature \(y_j\), and context \(z_k\). Consider also that a bicluster \(B = (I, J)\) is a subspace given by a subset of objects, \(I \subseteq X\), and a subset of features, \(J \subseteq Y\)2. Similarly, a tricluster \({\mathscr {T}} = (I, J, K)\) contains \(I \subseteq X\) objects, \(J \subseteq Y\) features and \(K \subseteq Z\) contexts, and \(t_{ijk}\) denotes the elements of \({\mathscr {T}}\), where \(1 \le i \le |I|\), \(1 \le j \le |J|\) and \(1 \le k \le |K|\)1. In this context, each tricluster \({\mathscr {T}}\) can be represented as a set of biclusters \({\mathscr {T}} = \{{\mathscr {B}}_1, {\mathscr {B}}_2, \ldots , {\mathscr {B}}_{|K|}\}\):

$$ {\mathscr {B}}_1 = \begin{bmatrix} t_{111} & t_{121} & \cdots & t_{1|J|1} \\ t_{211} & t_{221} & \cdots & t_{2|J|1} \\ \vdots & \vdots & \ddots & \vdots \\ t_{|I|11} & t_{|I|21} & \cdots & t_{|I||J|1} \end{bmatrix}, \quad {\mathscr {B}}_2 = \begin{bmatrix} t_{112} & t_{122} & \cdots & t_{1|J|2} \\ t_{212} & t_{222} & \cdots & t_{2|J|2} \\ \vdots & \vdots & \ddots & \vdots \\ t_{|I|12} & t_{|I|22} & \cdots & t_{|I||J|2} \end{bmatrix}, \quad \ldots , \quad {\mathscr {B}}_{|K|} = \begin{bmatrix} t_{11|K|} & t_{12|K|} & \cdots & t_{1|J||K|} \\ t_{21|K|} & t_{22|K|} & \cdots & t_{2|J||K|} \\ \vdots & \vdots & \ddots & \vdots \\ t_{|I|1|K|} & t_{|I|2|K|} & \cdots & t_{|I||J||K|} \end{bmatrix} $$
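As a concrete illustration, a tricluster can be materialized as a subarray of a three-way dataset and viewed as a stack of bicluster slices, one per context. The sketch below uses hypothetical NumPy data, not the paper's implementation:

```python
import numpy as np

# Hypothetical three-way dataset D: 5 objects x 4 features x 6 contexts.
rng = np.random.default_rng(0)
D = rng.normal(size=(5, 4, 6))

# A tricluster T = (I, J, K) is the subarray indexed by subsets of each axis.
I = [0, 2, 4]   # subset of objects
J = [1, 3]      # subset of features
K = [2, 3, 4]   # subset of contiguous contexts (time points)
T = D[np.ix_(I, J, K)]   # shape (|I|, |J|, |K|)

# T can equivalently be represented as |K| biclusters (frontal slices B_k).
biclusters = [T[:, :, k] for k in range(T.shape[2])]

assert T.shape == (3, 2, 3)
assert len(biclusters) == len(K)
assert biclusters[0].shape == (len(I), len(J))
```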

Preprocessing data

The three-way dataset, composed of several heterogeneous features measured over a number of time points, is first preprocessed to obtain learning examples. Depending on the dataset, dealing with missing values and class imbalance might also be needed. Some triclustering searches, including the one proposed in this work, can ignore missing values, avoiding the need for imputation.

TCtriCluster: a new temporal triclustering algorithm

We select triCluster5, a pioneering and highly cited triclustering approach proposed and implemented by Zhao and Zaki. It is a quasi-exhaustive approach, able to mine arbitrarily positioned and overlapping triclusters with constant, scaling, and shifting patterns from three-way data. Given that triCluster was proposed to mine coherent triclusters in three-way gene expression data (gene-sample-time), it is important to note that clinical data can be preprocessed to have a similar structure, in which patient-feature-time data resembles the gene-sample-time data considered in earlier works. triCluster comprises three main steps: (1) it constructs a multigraph capturing similar value ranges between all pairs of samples; (2) it mines maximal biclusters from the multigraph formed for each time point (slices of the 3D dataset); and (3) it extracts triclusters by merging similar biclusters from different time points. Optionally, it can delete or merge triclusters according to the given overlapping criteria.

As our goal is to mine temporal three-way data, meaning the Z context dimension corresponds to time, we borrow a pivotal idea behind CCC-Biclustering31, a state-of-the-art and highly efficient temporal biclustering algorithm, and introduce a temporal constraint in triclustering to promote interpretability, predictive accuracy, and efficiency. The goal thus becomes to mine time-contiguous triclusters, i.e., triclusters with consecutive time points. In this context, we re-implemented triCluster in Python and extended it to cope with a time constraint. The new TCtriCluster algorithm implements this time constraint in its third phase, as shown in Algorithm 1 (line 9).
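The contiguity constraint can be illustrated with a small sketch (an illustrative helper, not TCtriCluster's actual code): biclusters sharing a pattern may only be merged across maximal runs of consecutive time points.

```python
def contiguous_runs(time_points, min_len=2):
    """Group sorted time-point indices into maximal runs of consecutive
    indices; only runs of at least min_len can yield a time-contiguous
    tricluster (the constraint TCtriCluster enforces when merging
    biclusters across time points)."""
    runs, current = [], [time_points[0]]
    for t in time_points[1:]:
        if t == current[-1] + 1:
            current.append(t)
        else:
            if len(current) >= min_len:
                runs.append(current)
            current = [t]
    if len(current) >= min_len:
        runs.append(current)
    return runs

# A bicluster pattern present at time points 1,2,3 and again at 5,6 yields
# two candidate time-contiguous triclusters, not one spanning 1..6.
assert contiguous_runs([1, 2, 3, 5, 6]) == [[1, 2, 3], [5, 6]]
```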

TCtriCluster accepts different combinations of input parameters (inherited from triCluster5), which should be explored in order to discover the best parameters with which the final classifier is learned. The input parameters are \(\varepsilon , mx, my, mz, \delta ^x, \delta ^y, \delta ^z, \eta\) and \(\gamma\), corresponding to the maximum ratio value, the minimum tricluster size along dimensions x, y and z, the maximum range threshold along dimensions x, y and z, and the overlapping and merging thresholds, respectively. More details about the input parameters can be found in5.

Algorithm 1. TCtriCluster.

Hyperparameterizing the triclustering search

In this step, we find the best hyperparameters used as input by the triclustering algorithm (described above) in order to optimize predictive performance. The workflow, depicted in Fig. 2, starts by performing triclustering on the preprocessed data to obtain triclusters. Next, and since our triclustering-based classifier uses the triclusters as features, we compute a 3D virtual pattern for each tricluster.

Figure 2. Learning the best triclustering parameters: workflow.

The proposed 3D virtual pattern corresponds to the most representative pattern of a tricluster, extending the 2D version defined in32, and is computed as follows.

Definition 1

(3D virtual pattern). Given a tricluster \({\mathscr {T}}\), its virtual pattern \({\mathscr {P}}\) is defined as a set of elements \({\mathscr {P}} = \{ \rho _1, \rho _2,\ldots , \rho _{|J|}\}\), where \(\rho _j, 1 \le j \le |J|\), is defined as the mean (or the mode, in the case of categorical features) of the values of the \(j^{th}\) feature across all objects and contexts:

$$\begin{aligned} \rho _j = \frac{1}{|I|\times |K|} \sum _{z_k\in K} \sum _{x_i\in I} t_{ijk}. \end{aligned}$$
(1)

Consider as an example a tricluster \({\mathscr {T}} = (I, J, K)\), mined from three-way data (X, Y, Z), composed of 3 objects, 3 features (\(y_1\) and \(y_7\) are categorical) and 3 contexts, such that \(I = \{ x_1, x_3, x_7 \},\; J = \{ y_1, y_3, y_7 \},\; K=\{ z_2, z_3, z_4 \}\). For simplicity, consider \({\mathscr {T}} = \{ B_2, B_3, B_4 \}\):

$$ B_2 = \begin{bmatrix} 1 & 3.1 & 5 \\ 1 & 2.8 & 3 \\ 3 & 2.1 & 10 \end{bmatrix}, \quad B_3 = \begin{bmatrix} 2 & 3.0 & 3 \\ 3 & 2.8 & 3 \\ 3 & 2.9 & 9 \end{bmatrix}, \quad B_4 = \begin{bmatrix} 3 & 2.9 & 3 \\ 2 & 2.9 & 3 \\ 3 & 2.4 & 8 \end{bmatrix} $$

and an object (patient) \(P(X_p, I, K)\) defined as \(P = \{ C_2, C_3, C_4 \}\): \(C_2 = \begin{bmatrix} 1&2.22&5 \end{bmatrix}; \; C_3 = \begin{bmatrix} 1&2.26&7 \end{bmatrix}; \; C_4 = \begin{bmatrix} 2&2.35&8 \end{bmatrix}\). In this setting, the virtual patterns are: \(\rho (B_2) = \begin{bmatrix} 1&2.6667&5 \end{bmatrix}\); \(\rho (B_3) = \begin{bmatrix} 3&2.9&3 \end{bmatrix}\); \(\rho (B_4) = \begin{bmatrix} 3&2.7333&3 \end{bmatrix}\); and \(\rho ({\mathscr {T}}) = \begin{bmatrix} 3&2.7667&3 \end{bmatrix}\).
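The worked example can be checked with a short sketch, a minimal reimplementation of the virtual-pattern computation (column-wise mode for the categorical features \(y_1\) and \(y_7\), column-wise mean otherwise; ties in the mode are broken by first occurrence, an assumption of this sketch):

```python
from collections import Counter

B2 = [[1, 3.1, 5], [1, 2.8, 3], [3, 2.1, 10]]
B3 = [[2, 3.0, 3], [3, 2.8, 3], [3, 2.9, 9]]
B4 = [[3, 2.9, 3], [2, 2.9, 3], [3, 2.4, 8]]
categorical = {0, 2}   # columns of the categorical features y1 and y7

def mode(values):
    # Counter.most_common breaks ties by first-seen order (CPython 3.7+).
    return Counter(values).most_common(1)[0][0]

def virtual_pattern(rows):
    """Column-wise mode (categorical) or mean (numeric), rounded to 4 dp."""
    cols = list(zip(*rows))
    return [mode(c) if j in categorical else round(sum(c) / len(c), 4)
            for j, c in enumerate(cols)]

print(virtual_pattern(B2))             # [1, 2.6667, 5]
print(virtual_pattern(B3))             # [3, 2.9, 3]
print(virtual_pattern(B4))             # [3, 2.7333, 3]
print(virtual_pattern(B2 + B3 + B4))   # 3D pattern: [3, 2.7667, 3]
```

The last call aggregates all nine rows of the tricluster, reproducing \(\rho ({\mathscr {T}})\) above.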

Note that, optionally, in cases where triclustering captures heterogeneous triclusters, we can detach the biclusters composing a tricluster and use those biclusters as features (computing the 2D virtual patterns) instead of the single pattern describing the whole tricluster. In the previous example, if we detach the tricluster, we use three patterns—\(\rho (B_2)\), \(\rho (B_3)\) and \(\rho (B_4)\)—of which the first differs considerably from the other two. This optional step gives more information to the classifier, promoting its predictive performance.

With the virtual patterns computed, to assess how well a specific object (patient), \(p_i\), follows the general tendency of a given tricluster \({\mathscr {T}}\), we compare \(p_i\) with the 3D virtual pattern, \({\mathscr {P}}\), the most representative pattern of \({\mathscr {T}}\). To do this, we propose two approaches: (1) computing the Euclidean distance, or (2) computing the Pearson correlation, between the 3D virtual pattern \({\mathscr {P}}\) and the equivalent pattern (same features and contexts) of \(p_i\).

We denote these assessments as Virtual Distance 3D and Virtual Correlation 3D, and define them as follows:

Definition 2

(Virtual distance 3D). The virtual Euclidean distance between an observation \(p_i\) and a tricluster \({\mathscr {T}}\) is defined as

$$\begin{aligned} \texttt {VD}_{\text {3D}}(p_i, {\mathscr {T}}) = E (p_i, \rho ) = \sqrt{\sum ^{|{\mathscr {P}}|}_{e=1} (p_{i_e} - \rho _e)^2 }. \end{aligned}$$
(2)

Definition 3

(Virtual correlation 3D). The virtual linear correlation between an object \(p_i\) and a tricluster \({\mathscr {T}}\) is defined as

$$\begin{aligned} \texttt {VC}_{\text {3D}}(p_i, {\mathscr {T}}) = r (p_i, \rho ) = \dfrac{\displaystyle \sum ^{|{\mathscr {P}}|}_{e=1} (p_{i_e} - \bar{p_i}) (\rho _e - {\bar{\rho }}) }{\sqrt{\displaystyle \sum ^{|{\mathscr {P}}|}_{e=1} (p_{i_e} - \bar{p_i})^2 \sum ^{|{\mathscr {P}}|}_{e=1} (\rho _e - {\bar{\rho }})^2}}. \end{aligned}$$
(3)
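A minimal sketch of Definitions 2 and 3, assuming the patient's equivalent pattern has already been extracted with the same length as the virtual pattern:

```python
import math

def virtual_distance_3d(p, rho):
    """Euclidean distance between a patient's equivalent pattern and a
    tricluster's virtual pattern (Definition 2)."""
    return math.sqrt(sum((pe - re) ** 2 for pe, re in zip(p, rho)))

def virtual_correlation_3d(p, rho):
    """Pearson correlation between a patient's equivalent pattern and a
    tricluster's virtual pattern (Definition 3)."""
    n = len(p)
    mp, mr = sum(p) / n, sum(rho) / n
    num = sum((pe - mp) * (re - mr) for pe, re in zip(p, rho))
    den = math.sqrt(sum((pe - mp) ** 2 for pe in p)
                    * sum((re - mr) ** 2 for re in rho))
    return num / den

assert virtual_distance_3d([0.0, 0.0], [3.0, 4.0]) == 5.0
assert abs(virtual_correlation_3d([1, 2, 3], [2, 4, 6]) - 1.0) < 1e-12
```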

After computing the similarity matrices based on the virtual patterns (using distances or correlations), these matrices are used as learning examples by the classifier (with the triclusters as features) and evaluated with a 5\(\times\)10-fold stratified cross-validation in order to find the best triclustering parameters, using classification performance as the metric. The best parameters are then fed to the next step.

Figure 3. Learning the final triclustering-based model: workflow.

Learning the final classifier

Figure 3 depicts the steps involved in learning the final model. With the best parameters found in the previous step, an additional iteration is performed in order to obtain the final triclusters. The final triclusters are then used to build a classic multivariate data space, with one variable per tricluster, computing the virtual distance/correlation between each training object and the given tricluster to produce the transformed data. Using this multivariate data space, a traditional classifier can be learned and used to make predictions in the next step.
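The transformation into the multivariate data space can be sketched as follows (toy objects and virtual patterns; any of the similarity functions from the previous section can be plugged in):

```python
import numpy as np

def transform(objects, virtual_patterns, similarity):
    """Map each object to a vector of similarities to every tricluster's
    virtual pattern: one feature per tricluster (the transformed space)."""
    return np.array([[similarity(obj, vp) for vp in virtual_patterns]
                     for obj in objects])

# Toy example: 3 objects, 2 virtual patterns, Euclidean distance as similarity.
euclid = lambda a, b: float(np.linalg.norm(np.asarray(a) - np.asarray(b)))
X = transform([[1.0, 2.0], [2.0, 2.0], [0.0, 0.0]],
              [[1.0, 2.0], [0.0, 0.0]], euclid)

# X has shape (n_objects, n_triclusters) and can feed any standard classifier.
assert X.shape == (3, 2)
assert X[0, 0] == 0.0   # first object coincides with the first pattern
```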

Testing stage

After learning the target triclustering-based predictive model, new three-way objects can be classified. To do this, we first compute the array of similarities between the new object and the triclusters (virtual patterns) obtained in the previous steps. This array is fed to the classifier, which in turn returns the classification for the new object together with an associated probability. Figure 4 depicts an example using clinical three-way data (the case study described in the next section).

Figure 4. Example of using the triclustering-based classifier to classify a new three-way example from patient follow-up.

Ethics approval and consent to participate

The study was conducted in accordance with the Declaration of Helsinki and was approved by the local (Faculty of Medicine, University of Lisbon) ethics committee. Informed consent to participate in the study was obtained from all participants. Data access was granted in the context of project AIpALS (PTDC/CCI-CIF/4613/2020), where the authors’ institutions participate.

Case study: prognostic prediction in ALS

In this study, we want to predict whether a given patient will evolve to a critical endpoint within k days (the time window) after the last clinical appointment, using data from the patients' follow-up. The target endpoints, considered and validated by the clinicians, are the following:

  • C1—need for non-invasive ventilation (NIV), as decided by the international guidelines11

  • C2—need for an auxiliary communication device (question 1 of the ALSFRS-R with a score of 1 or lower)

  • C3—need for percutaneous endoscopic gastrostomy (PEG) (question 3 of the ALSFRS-R with a score of 2 or lower)

  • C4—need for a caregiver (question 5 or 6 of the ALSFRS-R with a score of 1 or lower)

  • C5—need for a wheelchair (question 8 of the ALSFRS-R with a score of 1 or lower)

Figure 5. Overview of the triclustering-based classifier applied to the ALS case study. Three-way data corresponds to longitudinal data collected at patients' follow-up; in particular, the dimensions X, Y, and Z correspond to patients, features and time.

To apply the triclustering-based classification approach, the three-way data corresponds to longitudinal data collected at the patients' follow-up; in particular, the dimensions X, Y, and Z correspond to patients, features, and time, as shown in Fig. 5.

Cohort data

Our study is conducted using the Lisbon ALS clinic dataset, containing electronic health records from ALS patients regularly followed at the local ALS clinic since 1995 and last updated in October 2021. Its current version contains 1321 patients (740 males and 581 females) with a mean age at onset of \(63 \pm 13\) years. Each patient record includes a set of static features (demographics, disease severity, co-morbidities, medication, genetic information, exercise and smoking habits, past trauma/surgery, and occupations) along with temporal features (collected repeatedly at follow-up), such as disease progression tests (ALSFRS-R scale, respiratory tests, etc.). Table 1 shows the patient cohort characterization.

As the proposed methodology focuses on three-way clinical data analysis, and in order to test its potential, we first restrict our data to temporal data only, discarding static data (described in Table 1). We considered 7 features per time point: the functional scores (ALSFRS-R), briefly described next, and a respiratory test, Forced Vital Capacity (FVC). Following recent studies33,34, we computed an extra temporal feature based on the ALSFRS-R scale: the MITOS stage33. The values for this feature range between 0 and 5 and provide information about the patient's disease stage at the moment of the assessment. Concretely, the value represents the number of compromised ALSFRS-R domains33,34,35, with the value 5 representing death.

Table 1 Characterization of the population used in the case study.

ALSFRS-R scores for disease progression rating are an aggregation of integers on a scale of 0 to 4 (where 0 is the worst and 4 is the best), providing different evaluations of the patient's functional abilities at a given time point35. This functional evaluation is based on 12 questions, explained in Table 2. Different functional scores are then computed using subsets of scores, as shown in Table 3.

Table 2 ALSFRS-R questions.
Table 3 Functional scores and sub-scores according to ALSFRS-R.

Preprocessing

Data were preprocessed in accordance with the approach proposed by Carreiro et al.12, which assumes that patients are followed up regularly and perform a normative set of tests at each appointment. As patients may not be able to perform all tests in a single day, the method takes their temporal distribution into account when learning from the available clinical records, computing snapshots of the patient's condition by grouping tests performed within a clinically accepted time window.

Following these assumptions, we performed hierarchical (agglomerative) clustering with constraints to compute the patients' snapshots, a state-of-the-art procedure to perform alignments along a follow-up12. The constraints applied when grouping the sets of evaluations followed well-established principles, as in12: (1) the evaluations that compose a snapshot cannot belong to the same test, as clinicians do not prescribe the same test twice; and (2) all the evaluations in the same snapshot should be consistent regarding the critical features of interest (i.e., the patient should either be in the critical endpoint or not in all the records composing the snapshot). For this study, the cutting point for creating the snapshots was defined as 100 days, in line with Carreiro et al.12.
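A simplified sketch of snapshot creation is given below. The paper uses constrained hierarchical clustering; this greedy variant only conveys the no-repeated-test constraint and the 100-day cut, over hypothetical (test, day) pairs:

```python
def group_snapshots(evaluations, max_span_days=100):
    """Greedy illustration of snapshot creation: group evaluations
    (test_name, day) so that no snapshot spans more than max_span_days
    and no test appears twice in the same snapshot.  A sketch of the
    constraints only, not the constrained agglomerative clustering
    actually used in the paper."""
    snapshots, current = [], []
    for test, day in sorted(evaluations, key=lambda e: e[1]):
        ok_span = not current or day - current[0][1] <= max_span_days
        ok_test = all(t != test for t, _ in current)
        if ok_span and ok_test:
            current.append((test, day))
        else:
            snapshots.append(current)
            current = [(test, day)]
    if current:
        snapshots.append(current)
    return snapshots

evals = [("ALSFRS-R", 0), ("FVC", 30), ("ALSFRS-R", 150), ("FVC", 160)]
assert group_snapshots(evals) == [[("ALSFRS-R", 0), ("FVC", 30)],
                                  [("ALSFRS-R", 150), ("FVC", 160)]]
```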

Figure 6. An example of the transformation of the original data into patient snapshots following the approach of Carreiro et al.12. Patient 2 is the only individual who reached a C2 critical status (Q1 \(\le\) 1), with the corresponding date identified in their snapshots. Other critical dates based on tests are further computed based on well-established clinical criteria.

At this stage, we compute five datasets (one for each critical endpoint) with the patients' snapshots that have a critical feature, establishing, for each snapshot, whether the patient is in a critical endpoint (binary feature). The critical feature value (the target to be learned by the classifier) was computed for each critical endpoint based on the date on which the patient's critical status was detected. For each endpoint, the critical date, considered and validated by the clinicians, was the date of the first evaluation meeting the following ALSFRS-R conditions (see Table 2):

  • C1: critical when Q12 \(\le\) 3

  • C2: critical when Q1 \(\le\) 1

  • C3: critical when Q3 \(\le\) 2

  • C4: critical when Q5 \(\le\) 1 \(\vee\) Q6 \(\le\) 1

  • C5: critical when Q8 \(\le\) 1

As an example, for the target endpoint C1 (need for NIV), the critical feature identifies whether a patient will evolve to a critical status (need for NIV), which occurs when the patient has a date within the defined interval where the Q12 score is 3 or lower. Figure 6 depicts an example of the computation of patient snapshots.

After creating the patients' snapshots, we compute the learning examples used by the predictive models. Each dataset must encode, according to its critical endpoint of interest, the patient's evolution to a critical state, depending on the changes observed k days from the snapshot. We create the binary target class Evolution (E), where 1 represents an evolution to a critical status within k days from the snapshot, and 0 represents an unchanged critical status within the same time window.

The process of labelling the snapshots is based on the date on which a critical status was detected12. A patient's snapshot (with date i) for which the patient reached a critical state between i and \(i + k\) is labelled as E=1 (situation A). Snapshots dated more than k days before the critical status date (outside the time window) are labelled as E=0 (situation B). For patients in whom a critical status was never detected, their snapshots are labelled as E=0, provided at least one snapshot exists after \(i + k\) days (situation C). Snapshots with no critical status information after \(i + k\) days are considered not eligible for the analysis, since it is impossible to ensure whether or not an evolution to a critical status occurred in the considered time window (situation D). Snapshots in which the patient is already in a critical status are also not eligible, since we aim to predict the evolution from a non-critical state to a possible critical one (situation E). Figure 7 shows examples of the Evolution computation process.
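The five situations can be sketched as a labelling function (a hypothetical helper consistent with the rules above, not the authors' implementation; dates are day offsets):

```python
def label_snapshot(day, critical_day, last_day, k):
    """Label one snapshot dated `day`: 1 (evolution), 0 (no evolution),
    or None (not eligible).  critical_day is None when no critical status
    was ever detected; last_day is the patient's last available record."""
    if critical_day is not None and day >= critical_day:
        return None                     # E: already critical -> not eligible
    if critical_day is not None and critical_day <= day + k:
        return 1                        # A: becomes critical within k days
    if critical_day is not None:
        return 0                        # B: critical only after the window
    if last_day >= day + k:
        return 0                        # C: never critical, window covered
    return None                         # D: window not covered -> not eligible

k = 90
assert label_snapshot(0, 60, 400, k) == 1       # situation A
assert label_snapshot(0, 200, 400, k) == 0      # situation B
assert label_snapshot(0, None, 400, k) == 0     # situation C
assert label_snapshot(0, None, 50, k) is None   # situation D
assert label_snapshot(100, 60, 400, k) is None  # situation E
```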

Figure 7. Definition of the class Evolution (E) according to the patient's evolution to a critical status in the interval of k days, where i is the median date of the snapshot.

We chose three clinically relevant time windows for this study: 90, 180 and 365 days (3, 6 and 12 months). The process therefore resulted in one dataset per target endpoint and time window (15 in total). The number of snapshots in each dataset (broken down by class) is documented in Table 4.

Table 4 Initial class distribution for each critical endpoint of interest and time window (after snapshot creation)—N is the number of snapshots in which the patient will not evolve within the considered time window since the date of the snapshot, and Y is the number of snapshots in which the patient will evolve.

Finally, since the underlying triclustering algorithm is quasi-exhaustive1 and we want to make predictions based on current and recent clinical evaluations, we defined a maximum length of historical data to assist the prognostic tasks. With this assumption, we need to transform our datasets by coupling snapshots to create the final learning instances fed to the model. The process of grouping snapshots is depicted in Fig. 8 and consists of defining a maximum size L and grouping consecutive snapshots for each patient. The size of the sets (number of snapshots) is defined by \(\min (L, nP)\), where nP is the number of available snapshots for a given patient.
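The grouping can be sketched as a sliding window over each patient's snapshots; enumerating all sets of consecutive snapshots of size \(\min (L, nP)\) is our reading of the procedure, illustrated with hypothetical snapshot identifiers:

```python
def learning_instances(snapshots, L=3):
    """Group one patient's (non-critical) snapshots into all sets of
    consecutive snapshots of size min(L, nP), where nP is the number of
    available snapshots for the patient."""
    nP = len(snapshots)
    size = min(L, nP)
    return [snapshots[i:i + size] for i in range(nP - size + 1)]

# A patient with 5 snapshots and L = 3 yields 3 overlapping instances;
# a patient with only 2 snapshots (nP = 2 < L) yields a single instance.
assert learning_instances(["s1", "s2", "s3", "s4", "s5"]) == \
    [["s1", "s2", "s3"], ["s2", "s3", "s4"], ["s3", "s4", "s5"]]
assert learning_instances(["s1", "s2"]) == [["s1", "s2"]]
```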

Figure 8. Example of the computation of snapshot sets with maximum length \(\min (L, nP)\); in this case, L = 3 and nP is the number of snapshots (in which the patient was not in a critical state) available for each patient. P4 has only two snapshots (\(nP=2\)) before the critical state, and only one set with these two snapshots was considered.

The final learning examples, used in the experiments, considered 3, 4, and 5 consecutive snapshots (CS) per patient, corresponding to clinical evaluations at 3, 4, and 5 consecutive appointments, respectively. The Evolution (Y or N) label of the last snapshot is considered as the target class. The new class distributions and the coupled snapshots are depicted in Table 5.

Table 5 Initial class distribution concerning each target endpoint and time window, after creating the learning examples considering 3, 4, and 5 consecutive snapshots of patient historical assessments.

Table 5 shows that we face considerable class imbalance. In some time windows considered in this case study, the number of non-evolution snapshots (class N) far exceeds that of evolution snapshots (class Y). To tackle this imbalance and prevent its drawbacks in the classification process, when the proportion of examples belonging to the majority class (N instances) is higher than 2/3, we first perform Random Undersampling (RU) until the majority class represents 2/3 of the dataset and then use SMOTE36 to oversample the minority class, achieving an equal number of examples in both classes.
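A minimal sketch of this two-step scheme on toy data (in practice imbalanced-learn's RandomUnderSampler and SMOTE would be used; this sketch interpolates between random minority pairs, whereas real SMOTE interpolates towards nearest neighbours):

```python
import numpy as np

def balance(X, y, rng=None):
    """Undersample the majority class to at most twice the minority
    (i.e. 2/3 of the data), then oversample the minority with
    SMOTE-style interpolation until both classes are equal in size."""
    rng = rng or np.random.default_rng(0)
    maj = 0 if (y == 0).sum() >= (y == 1).sum() else 1
    Xmaj, Xmin = X[y == maj], X[y != maj]
    # Random undersampling: keep at most 2 * |minority| majority examples.
    keep = rng.choice(len(Xmaj), size=min(len(Xmaj), 2 * len(Xmin)),
                      replace=False)
    Xmaj = Xmaj[keep]
    # SMOTE-style oversampling: synthesize points between minority pairs.
    synth = []
    while len(Xmin) + len(synth) < len(Xmaj):
        a, b = Xmin[rng.choice(len(Xmin), size=2)]
        synth.append(a + rng.random() * (b - a))
    if synth:
        Xmin = np.vstack([Xmin, np.array(synth)])
    Xb = np.vstack([Xmaj, Xmin])
    yb = np.array([maj] * len(Xmaj) + [1 - maj] * len(Xmin))
    return Xb, yb

# Toy dataset: 30 majority (N) vs 5 minority (Y) examples in 2D.
X = np.vstack([np.zeros((30, 2)), np.ones((5, 2))])
y = np.array([0] * 30 + [1] * 5)
Xb, yb = balance(X, y)
assert (yb == 0).sum() == (yb == 1).sum() == 10
```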

Baseline results: prognostic models based on patient snapshots

Reproducing the methodology based only on patient snapshots and time windows presented by Carreiro et al.12, we performed experiments to predict the evolution of a given patient to a critical status for each of the critical endpoints of interest; predicting the progression to assisted ventilation (need for NIV) is also included. The experiments were conducted with the datasets preprocessed as explained in the previous sections. The creation of snapshots introduces missing values (with a prevalence ranging between 8 and 15%). To overcome this problem, and since we are dealing with temporal data, we imputed missing values using the values in the previous snapshot (Last Observation Carried Forward). After this, for the snapshots without an earlier snapshot (residual in number), we imputed the missing values with the mean/mode of the specific feature.
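The imputation described above can be sketched per patient and feature as below; the function name, data layout, and fallback value are illustrative assumptions, not the authors' code (in a pandas pipeline, a grouped forward fill would play the same role):

```python
def impute_locf(series, fallback):
    """Last Observation Carried Forward over a patient's chronologically
    ordered values for one feature; leading missing values (no earlier
    snapshot) fall back to a cohort-level mean/mode."""
    filled, last = [], None
    for value in series:
        if value is None:          # missing entry in this snapshot
            value = last if last is not None else fallback
        filled.append(value)
        last = value
    return filled

# one patient's values for a feature across snapshots; cohort mean = 80.0
print(impute_locf([None, 92.0, None, 85.0], fallback=80.0))
# -> [80.0, 92.0, 92.0, 85.0]
```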

We evaluated four classifiers: Naive Bayes (NB), SVM with Gaussian kernel, XGBoost (XGB), and Random Forests (RF) due to their state-of-the-art performance in this kind of predictive task23,25.

The evaluation was made using a 5 \(\times\) 10-fold stratified cross-validation scheme in which we ensured that all the assessments from a given patient were assigned to the same train/test fold. Moreover, to improve model performance, we tackled the class imbalance within cross-validation, applying the steps explained in the previous section to the training folds only.
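The grouping constraint can be sketched by assigning whole patients, rather than individual snapshots, to folds; the function below is a minimal illustration (names assumed) that ignores stratification, which a library route such as scikit-learn's `StratifiedGroupKFold` would additionally handle:

```python
import random

def patient_grouped_folds(patient_ids, k=10, seed=0):
    """Assign whole patients (not individual snapshots) to folds so that
    all assessments of a patient end up in the same train/test fold."""
    patients = sorted(set(patient_ids))
    random.Random(seed).shuffle(patients)
    fold_of = {p: i % k for i, p in enumerate(patients)}
    return [fold_of[p] for p in patient_ids]

ids = ["P1", "P1", "P2", "P2", "P2", "P3"]
folds = patient_grouped_folds(ids, k=3)
# every snapshot of a given patient carries the same fold index
assert folds[0] == folds[1] and folds[2] == folds[3] == folds[4]
```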

Tables 6 and 7 show the benchmark results. Superior results are observed against the reference state-of-the-art results gathered in a previous study (need for NIV)12. As in the original study12, the results for Sensitivity are lower than those for Specificity, which is understandable since positive cases (Evolution = Y) form the minority class.

Table 6 Baseline results using data preprocessed following the approach proposed by Carreiro et al.12, learned with 4 classifiers: Naive Bayes (NB), Support Vector Machine (SVM), Random Forests (RF), and eXtreme Gradient Boosting (XGB), to predict the Evolution for each of the target endpoints C1, C2, and C3 within the considered time windows (90, 180 and 365 days), respectively.
Table 7 Baseline results using data preprocessed following the approach proposed by Carreiro et al.12, learned with 4 classifiers: Naive Bayes (NB), Support Vector Machine (SVM), Random Forests (RF), and eXtreme Gradient Boosting (XGB), to predict the Evolution for each of the target endpoints C4 and C5 within the considered time windows (90, 180 and 365 days), respectively.

Triclustering-based classification results

To assess whether historical clinical evaluations improve model predictions when triclusters are used as features, we applied our triclustering-based classification approach in accordance with the principles introduced in section “Methods”. For this case study, we opted to detach the triclusters into biclusters and then use them as features. Note that these biclusters are slices of the mined triclusters representing the temporal disease progression. As introduced, each slice is used individually to better represent the state of patients at a specific time point, given the expected differences across the temporal dimension.
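The detaching step can be sketched as slicing a tricluster, represented by its patient, feature, and time-point index sets, into one bicluster per time point. The representation below is an illustrative assumption, not the authors' data structure:

```python
def detach_triclusters(tricluster):
    """Detach a tricluster (patients x features x time points) into one
    bicluster per time point: a patients-by-features slice at that point."""
    patients, features, times = tricluster
    return [
        {"patients": patients, "features": features, "time": t}
        for t in times
    ]

# a hypothetical tricluster spanning 2 patients, 2 features, 3 time points
tric = ({"P1", "P7"}, {"ALSFRS-R", "FVC"}, [0, 1, 2])
slices = detach_triclusters(tric)
print(len(slices))        # -> 3 biclusters, one per time point
print(slices[0]["time"])  # -> 0
```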

As for the baseline, we performed experiments using four classifiers: Naive Bayes, SVM with Gaussian kernel, XGBoost, and Random Forests. The full results are documented in Supplementary Information File SI1, corresponding to the prognostic models for predicting the progression to the critical status C1, need for NIV; C2, need for an auxiliary communication device; C3, need for PEG; C4, need for a caregiver; and C5, need for a wheelchair, respectively. We present the results for AUC, Sensitivity, and Specificity obtained with the models for time windows of 90, 180, and 365 days, identified by the clinicians as clinically relevant. We considered different numbers of historical assessments, creating datasets with 3, 4, and 5 consecutive snapshots (CS). Note that for each dataset (each with examples of a different history size) we applied the proposed approach using distances (D) and correlations (C) as the similarity criteria between the patients and the detached biclusters (from the triclusters). Table 8 presents a summary of the best results obtained for each target endpoint according to the three considered time windows.
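The two similarity criteria can be sketched as mapping a patient snapshot onto one detached bicluster via a Euclidean distance (D) and a Pearson correlation (C) against the bicluster's representative pattern. The function name and vectors below are illustrative assumptions:

```python
import math

def pattern_features(snapshot, pattern):
    """Similarity features between a patient's feature values and a
    bicluster's representative pattern: Euclidean distance (D) and
    Pearson correlation (C)."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(snapshot, pattern)))
    n = len(snapshot)
    ma, mb = sum(snapshot) / n, sum(pattern) / n
    cov = sum((a - ma) * (b - mb) for a, b in zip(snapshot, pattern))
    sa = math.sqrt(sum((a - ma) ** 2 for a in snapshot))
    sb = math.sqrt(sum((b - mb) ** 2 for b in pattern))
    corr = cov / (sa * sb) if sa and sb else 0.0
    return dist, corr

# a patient whose values exactly match the pattern: zero distance,
# perfect correlation
d, c = pattern_features([3.0, 2.0, 1.0], [3.0, 2.0, 1.0])
print(d, c)  # -> 0.0 1.0
```

Stacking these two values over all detached biclusters yields the transformed feature space fed to the classifiers.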

Comparing the gathered results with the baseline obtained by the state-of-the-art approach proposed by Carreiro et al.12 (see Fig. 9), we highlight the following:

  • triclustering-based classification obtained promising results, predicting all the target endpoints with solid accuracy. The best models achieved AUC results of up to 90% when predicting the progression to the target endpoints;

  • overall, triclustering-based predictors using current and past patient assessments outperform baseline models using only one evaluation (each snapshot individually) in predicting the progression to a critical status in ALS;

  • prognostic models of progression to C5 (wheelchair need) were those with the smallest differences in results relative to the baseline;

  • predicting progression to the C1 – C4 states yields distinctly higher predictive accuracy with the proposed triclustering-based approach than with the baselines. Mid- and long-term predictions yield differences of up to 10 pp;

  • prognostic models achieved AUC above 90% when predicting the need for an auxiliary communication device (C2), PEG (C3) and caregiver (C4). Most of the best predictions needed 5 appointments, but mid-term prediction for the need for PEG (C3) and short-term prediction for the need for a caregiver (C4) only required 3;

  • overall, the distance criterion between patients and triclusters yields better predictive results than the peer correlation criterion. The models with the best results were typically learned from a patient history with 5 follow-ups. However, for the C2 and C4 needs, short-term prognostics (90 days) yielded better results using only the 3 latest snapshots from the patient follow-up;

  • the high standard deviation of sensitivity estimates shows the inherent difficulty of predicting the positive class (Evolution=Y);

  • the triclustering-based approach allows the collection of discriminative patterns of disease progression, promoting better model interpretability in clinical domains.

Table 8 Summary of the best AUC results obtained with the triclustering-based classification approach for each of the target endpoints according to each of the considered time windows.

Some limitations should be noted. First, the approach focuses on dynamic features. Note, nevertheless, that static features can be straightforwardly combined with the triclustering-based features in the classification training step. Appendix 1 shows the results of using the static features described in Table 10 together with the triclustering features, using the best model parameters and classifiers shown in Table 8. Second, the triclustering algorithm’s ability to deal with the heterogeneity inherent to this type of data is limited, since categorical variables require a denormalization step (nominal variables) or numeric encoding (ordinal variables). Finally, despite the considerable size of the studied cohort in light of ALS prevalence, the validation of the predictors in international populations is highlighted as a relevant subsequent step.
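The two categorical encodings just mentioned can be sketched as follows; the category lists are hypothetical examples, not variables from the studied cohort:

```python
def encode_nominal(value, categories):
    """Denormalize a nominal variable into one binary column per category
    (one-hot encoding)."""
    return [1 if value == c else 0 for c in categories]

def encode_ordinal(value, order):
    """Numerically encode an ordinal variable by its rank in the ordering."""
    return order.index(value)

# hypothetical nominal and ordinal clinical variables
print(encode_nominal("bulbar", ["spinal", "bulbar", "respiratory"]))  # -> [0, 1, 0]
print(encode_ordinal("moderate", ["mild", "moderate", "severe"]))     # -> 1
```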

Figure 9
figure 9

Comparative plot of the AUC results obtained by the baseline vs the triclustering-based classifier. Blue bars refer to the triclustering-based classifier results, while orange bars refer to the baseline.

Model interpretability

The relevance of a prognostic methodology should be evaluated not only by its predictive performance but also by its guarantees of interpretability. The proposed triclustering-based approach allows us to collect essential patterns of disease progression (used as features of the new space), promoting better model interpretability in clinical domains. In addition, the importance of the input patterns/features for the predictive model can be further recovered to rank the discriminative relevance of the underlying patterns.

To assess model explainability and identify the most relevant patterns used by the models, the unified SHAP approach37 was applied. In particular, we selected the KernelExplainer and TreeExplainer methods, which introduce the possibility of directly measuring local feature interaction effects38. The goal is to understand which features are most relevant, which features appear together, and whether the patterns found are clinically relevant to understanding the patient’s progression to the critical endpoints: C1, need for NIV; C2, need for an auxiliary communication device; C3, need for PEG; C4, need for a caregiver; and C5, need for a wheelchair.

Figure 10
figure 10

Top 20 patterns (triclusters) used by the triclustering-based classifiers. The terminology used is the following: pattern names start with ‘Tric’, followed by an identifier and the snapshot (bicluster) position in the set of snapshots, in which 0 is the first position. Class 0 represents ‘non-evolutions’ and Class 1 represents ‘evolutions’.

Table 9 Most relevant patterns used by the best three models.

We chose to analyze three target endpoints for three different time windows. All the outputs of the remaining endpoints and time windows are made available in a repository (see section “Data availability”). Figure 10 and Table 9 illustrate the top patterns found by TCtriCluster and selected by the classifiers to make the predictions. For the sake of simplicity, we reproduce only the outputs for Random Forest models.

An overall analysis reveals that the majority of the selected patterns refer to the last snapshot/time point of the triclusters. This makes sense, since this is the snapshot closest to the target. However, patterns corresponding to previous snapshots remain relevant, as they can reveal other meaningful properties, including the underlying disease progression rate.

Conclusions

A new methodology was proposed to learn predictive models from longitudinal data using a novel triclustering-based classifier. To this end, TCtriCluster, an extension of triCluster, is proposed to handle heterogeneous clinical data under a temporal contiguity constraint. This restriction was shown to be effective in improving the efficacy of the target predictive models, highlighting its relevance for triclustering three-way time series data. We further show that triclustering-based classification enhances prognostic tasks with the potential for model interpretability, enabling the discovery of domain-relevant temporal patterns that are then used as features in the predictive models.

As the central case study, we targeted the problem of predicting the clinical progression of ALS patients towards disease endpoints within clinically relevant time windows (90, 180 and 365 days). In particular, we focused on the prognosis of five relevant endpoints (need for non-invasive ventilation, auxiliary communication device, PEG, caregiver, and wheelchair) and assessed the predictability limits using different lengths of patient historical assessments.

The triclustering-based models achieved good results in short-term predictions (AUC higher than 90%) for the need for an auxiliary communication device and the need for PEG. Short-term prognostics of the need for NIV, caregiver, and wheelchair also yield good predictive performance (AUC around 85%). Some of these models improved their performance when predicting in the mid and long term. The proposed methodology shows general improvements over the state-of-the-art in the capacity to predict the target endpoints, confirming the relevance of using triclusters to perform data transformations sensitive to local patterns of disease progression. The possibility of extracting group-specific patterns along time frames of arbitrary length offers a higher degree of feature expressiveness, which is generally lacking in peer approaches. Another relevant property of the proposed transformation is the preserved interpretability of the produced features, as they reveal informative progression patterns that discriminate a given outcome of interest. The inspection of those patterns unravels groups of individuals with coherent temporal variations on a subset of the clinical assessments throughout the follow-up.

This study represents a significant advance in prognostic prediction in ALS, showing generalized improvements in the predictability of degenerative progression towards critical states requiring clinical interventions. This offers a unique opportunity to better prepare families for the next stages of the illness, and further enables individualized management aimed at optimizing independence, function, and safety, thereby reducing symptom burden and improving the quality of life of the patients.

The proposed triclustering-based methodology can further be used to learn predictive models with different types of three-way data, encompassing prognostic problems in other diseases with available longitudinal cohort studies.