Five-year trajectories of multimorbidity patterns in an elderly Mediterranean population using Hidden Markov Models

This study aimed to analyse the trajectories and mortality of multimorbidity patterns in patients aged 65 to 99 years in Catalonia (Spain). Five-year (2012–2016) data from 916,619 participants in a primary care, population-based electronic health record database (Information System for Research in Primary Care, SIDIAP) were included in this retrospective cohort study. Individual longitudinal trajectories were modelled with a Hidden Markov Model across multimorbidity patterns. We computed the mortality hazard using Cox regression models to estimate survival across multimorbidity patterns. Ten multimorbidity patterns were originally identified, and two more states (death and drop-outs) were subsequently added. At baseline, the most frequent cluster was the Non-Specific pattern (42%), and the least frequent the Multisystem pattern (1.6%). Most participants stayed in the same cluster over the 5-year follow-up period, ranging from 92.1% in the Nervous, Musculoskeletal pattern to 59.2% in the Cardio-Circulatory and Renal pattern. The highest mortality rates were observed for patterns that included cardio-circulatory diseases: Cardio-Circulatory and Renal (37.1%); Nervous, Digestive and Circulatory (31.8%); and Cardio-Circulatory, Mental, Respiratory and Genitourinary (28.8%). This study demonstrates the feasibility of characterizing multimorbidity patterns over time. Multimorbidity trajectories were generally stable, although changes in specific multimorbidity patterns were observed. The Hidden Markov Model is useful for modelling transitions across multimorbidity patterns and mortality risk. Our findings suggest that health interventions targeting specific multimorbidity patterns may reduce mortality in patients with multimorbidity.


Supplementary 1
Prevalence of the 60 chronic diseases included in the study in individuals aged 65–99 years (N = 916,619, Catalonia, 2012). The last three columns indicate which diseases are retained at each prevalence cut-off (1%, 2%, All).

Supplementary 2. Machine Learning techniques applied to the dataset
This annex describes the procedure followed to obtain the set of multimorbidity patterns that characterize the patient population and to identify the longitudinal trajectories across the most frequent patterns.
Initial dataset selection procedure
The initial database was composed of the registered diagnoses of 60 diseases for N = 916,619 patients over a time span of T = 5 years (2012 to 2016). To obtain consistent results in the validation process, the diseases with a prevalence of less than 2% were discarded: only 47 diseases survived this screening. To enable the temporal evolution analysis, one sample per year was taken for each patient, yielding a vector containing categorical information about the presence or absence of each remaining disease, in addition to age and gender. As a result, the dimension of these vectors is D = 49 (47 disease indicators plus age and gender).
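As an illustration only (this is not the SIDIAP extraction code), the following Python sketch shows how such yearly observation vectors could be assembled from a hypothetical long-format diagnosis table; all file names and column names are assumptions.

```python
import pandas as pd

# Hypothetical long-format extracts: one row per (patient, year, diagnosed disease)
# and one row per (patient, year) with demographics.
diag = pd.read_csv("diagnoses.csv")      # assumed columns: patient_id, year, disease_code
demo = pd.read_csv("demographics.csv")   # assumed columns: patient_id, year, age, sex

# Keep diseases whose prevalence (share of patients ever diagnosed) is at least 2%.
n_patients = demo["patient_id"].nunique()
prevalence = diag.groupby("disease_code")["patient_id"].nunique() / n_patients
kept = prevalence[prevalence >= 0.02].index
diag = diag[diag["disease_code"].isin(kept)]

# One binary indicator column per retained disease, for each (patient, year) pair.
indicators = (diag.assign(flag=1)
                  .pivot_table(index=["patient_id", "year"], columns="disease_code",
                               values="flag", fill_value=0))

# Append age and sex so each yearly vector has D = 47 + 2 dimensions.
X = demo.set_index(["patient_id", "year"]).join(indicators).fillna(0)
```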
The dataset was thereby organized into temporal subsets in accordance with the time period. For each yearly subset, the original categorical vectors were transformed into a reduced set of continuous variables denominated factor scores. The new dimension, D', was selected by applying the Karlis-Saporta-Spinaki rule [2], that is, the criterion of retaining only the factor scores significantly higher than the mean factor score. The transformed dataset for year t is formed as $\mathbf{Y}_t = \{\mathbf{y}_{nt}\}_{n=1}^{N}$, where each D'-dimensional vector is obtained as

$$\mathbf{y}_{nt} = \mathbf{A}\,(\mathbf{x}_{nt} - \boldsymbol{\mu}), \qquad (1)$$

where $\boldsymbol{\mu}$ is a D-dimensional vector that guarantees that the distribution of the transformed database is centered on zero in all dimensions of the new observation space, and $\mathbf{A}$ is the projection onto the retained factor-score axes. Applying the Karlis-Saporta-Spinaki rule yielded the reduced dimension D' used in the subsequent analyses.
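As a simplified stand-in for this step (the exact reduction technique and the full form of the Karlis-Saporta-Spinaki criterion are not reproduced here), the following Python sketch selects the reduced dimension by keeping only the components whose eigenvalue exceeds the mean eigenvalue, assuming a numeric observation matrix `X`.

```python
import numpy as np
from sklearn.decomposition import PCA

# X: numeric array of shape (n_samples, D) with the yearly observation vectors (assumed).
pca = PCA().fit(X)
eigvals = pca.explained_variance_
threshold = eigvals.mean()                      # the actual rule adds a significance margin
D_reduced = int((eigvals > threshold).sum())

# Project the centred data onto the retained axes to obtain the factor scores.
Y = PCA(n_components=D_reduced).fit_transform(X)
```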

Initial soft clustering by applying FCM
It is assumed that the patient population is distributed into a set of K clusters, each corresponding to a different multimorbidity pattern. HMMs are normally fitted by applying an iterative procedure known as the Baum-Welch algorithm, as explained in the following subsection. To initialize some of the parameters used by the Baum-Welch algorithm, a soft clustering procedure, the Fuzzy C-means (FCM) algorithm, is first applied to the dataset to distribute the patients into the K clusters. Each cluster is assumed to characterize a multimorbidity pattern. Cluster analysis involves assigning patients to clusters such that individuals in the same cluster are as similar as possible, while individuals belonging to different clusters are as dissimilar as possible. In our procedure the distribution of patients across the set of patterns is different in each of the years of study, but it is assumed that the features characterizing each cluster through the corresponding centroid are stable over time.

FCM is an unsupervised form of grouping in which each individual can belong to more than one cluster; that is, each patient is assumed to have a graded or fuzzy membership in each of the clusters. The FCM clustering process assigns a membership factor $u_{nk}$ to each vector $\mathbf{y}_{n}$, for $k = 1, \dots, K$, where K is the number of clusters and the index $n$ runs over all patient-year observations; $u_{nk}$ indicates the degree to which a given patient, in a given year, belongs to the k-th cluster. The similar/dissimilar properties are measured in FCM through a heuristic global cost function $J_m$, the weighted sum of squared errors within groups:

$$J_m = \sum_{n=1}^{N}\sum_{k=1}^{K} u_{nk}^{\,m}\,\lVert \mathbf{y}_n - \mathbf{c}_k \rVert^{2}. \qquad (2)$$

The norm used in (2) defines a measure of similarity between a data point and the cluster prototypes [3]. The weighting exponent parameter $m > 1$ is selected to adjust the blending of the different clusters. The objective function is iteratively minimized by assigning a membership $u_{nk}$ to the vector $\mathbf{y}_n$ in the k-th cluster or pattern, and updating the cluster centroids $\mathbf{c}_k$ for $k = 1, \dots, K$. Once FCM has been applied to the dataset, we obtain a membership matrix $\mathbf{U}$ of size $N \times K$ and a matrix of cluster centroids of size $K \times D'$. The membership variables and the centroid coordinates are computed as

$$u_{nk} = \left[ \sum_{j=1}^{K} \left( \frac{\lVert \mathbf{y}_n - \mathbf{c}_k \rVert}{\lVert \mathbf{y}_n - \mathbf{c}_j \rVert} \right)^{\frac{2}{m-1}} \right]^{-1}, \qquad (3)$$

$$\mathbf{c}_k = \frac{\sum_{n=1}^{N} u_{nk}^{\,m}\,\mathbf{y}_n}{\sum_{n=1}^{N} u_{nk}^{\,m}}. \qquad (4)$$

Since clustering algorithms are unsupervised machine-learning techniques, the fit of the model to the dataset is traditionally assessed through cost functions that depend on both the dataset and the clustering parameters, denoted validation indexes. We computed three well-known validation indexes to obtain the optimal number of clusters K and the optimal value of the fuzziness parameter m: the partition coefficient (whose cost function is maximum for the optimal model), and the Xie-Beni and partition entropy indexes (whose cost functions are minimum for the optimal models) [4]. We computed 100 runs; in each run the centroids in (4) were randomly initialized, the FCM algorithm was applied, and a final set of centroids was obtained. We checked m = 1.1, 1.2 and 1.5 and K = 5, ..., 40. The averaged performance obtained through the three validation indexes was optimal for m = 1.1 and K values between 6 and 12. Once this range was established, the clusters were identified by clinical assessment.
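As a minimal illustration of equations (2)-(4) (not the production code used in the study), the following NumPy sketch alternates the membership update (3) and the centroid update (4); the data matrix `Y`, the number of clusters `K` and the fuzziness `m` shown here are assumptions.

```python
import numpy as np

def fuzzy_c_means(Y, K, m=1.1, n_iter=100, seed=0):
    """Minimal FCM: alternates the membership update (3) and centroid update (4)."""
    rng = np.random.default_rng(seed)
    N, _ = Y.shape
    centroids = Y[rng.choice(N, size=K, replace=False)]        # random initial centroids
    for _ in range(n_iter):
        # Squared distances between every observation and every centroid.
        d2 = ((Y[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2) + 1e-12
        # Membership update, eq. (3): inverse-distance weighting with exponent 2/(m-1).
        w = d2 ** (-1.0 / (m - 1.0))
        U = w / w.sum(axis=1, keepdims=True)
        # Centroid update, eq. (4): membership-weighted mean of the observations.
        Um = U ** m
        centroids = (Um.T @ Y) / Um.sum(axis=0)[:, None]
    return U, centroids

# Example usage on synthetic factor scores with the fuzziness value selected in the study.
Y = np.random.default_rng(1).normal(size=(1000, 6))
U, C = fuzzy_c_means(Y, K=10, m=1.1)
```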

HMM fitting by applying the Baum-Welch algorithm
The observed data are assumed to form a discrete-time series: the n-th patient is represented by the observed sequence $\mathbf{y}_{n} = (\mathbf{y}_{n1}, \dots, \mathbf{y}_{nT})$. Therefore, to model the temporal evolution of patients through the different clusters or patterns, the sequential individual observations are assumed to follow a dynamic random process represented by a hidden Markov model (HMM). This means that each patient follows a longitudinal trajectory over the T years through the K clusters. For example, a patient could belong to cluster 1 between years 1 and 3 inclusive, and evolve into cluster 2 in the last two years. Figure A1 shows a state transition diagram over time, where each row corresponds to one of the K states and each column corresponds to one of the latent variables $z_{n1}, \dots, z_{nT}$. The red path shows the evolution of one patient through its longitudinal trajectory.

Figure A1. State transition diagram of an HMM with K states. An individual trajectory is shown by red arrows.
A joint longitudinal representation of the latent variables $\mathbf{z}_n = (z_{n1}, \dots, z_{nT})$ associated with the n-th patient and the observed sequence $\mathbf{y}_n$ is shown in Figure A2, to emphasize the fact that each observed vector $\mathbf{y}_{nt}$ is conditioned on the state of the corresponding latent variable $z_{nt}$. In the following, the full set of latent variables is denoted as $\mathbf{Z} = \{\mathbf{z}_n\}_{n=1}^{N}$. Once the model has been defined, our main challenge is to estimate the model parameters $\boldsymbol{\theta} = \{\boldsymbol{\pi}, \mathbf{A}, \{\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k\}_{k=1}^{K}\}$, namely the initial state probabilities, the transition probabilities, and the parameters of the emission distributions.
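With this notation, and as a clarifying addition (the factorization itself is the standard one for HMMs with Gaussian emissions rather than a formula reproduced from the original text), the complete-data likelihood that the Baum-Welch algorithm maximizes in expectation can be written as

$$p(\mathbf{y}_n, \mathbf{z}_n \mid \boldsymbol{\theta}) = \pi_{z_{n1}} \prod_{t=2}^{T} a_{z_{n,t-1}\, z_{nt}} \prod_{t=1}^{T} \mathcal{N}\!\left(\mathbf{y}_{nt} \mid \boldsymbol{\mu}_{z_{nt}}, \boldsymbol{\Sigma}_{z_{nt}}\right), \qquad p(\mathbf{Y}, \mathbf{Z} \mid \boldsymbol{\theta}) = \prod_{n=1}^{N} p(\mathbf{y}_n, \mathbf{z}_n \mid \boldsymbol{\theta}).$$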
Given the observed dataset $\mathbf{Y}$, the maximum likelihood procedure consists of seeking the model parameters $\boldsymbol{\theta}$ that most likely have generated $\mathbf{Y}$. To this end, adopting the assumption of independent and identically distributed sequences, whereby each patient sequence is independent of the others, the likelihood of the joint probability distribution of the complete dataset, $p(\mathbf{Y} \mid \boldsymbol{\theta})$, has to be maximized in terms of $\boldsymbol{\theta}$. As a closed-form solution for the parameters does not exist, this problem is traditionally solved by applying the Expectation-Maximization (EM) algorithm, also called the Baum-Welch (BW) algorithm in the context of HMMs [5,6]. The BW algorithm is well documented in the literature. It is a procedure that iteratively alternates between an expectation step (E-step) and a maximization step (M-step), and it must be initialized by choosing starting values for the model parameters. In our work we randomly initialized the transition probabilities $a_{jk}$, respecting the summation constraints associated with their probabilistic nature. Furthermore, the means of the emission distributions were initialized as the centroids obtained by applying the FCM clustering algorithm given in (4), $\boldsymbol{\mu}_k = \mathbf{c}_k$, and the initial associated covariance matrices were taken as the membership-weighted sample covariances of the observations around each centroid. The initial state probabilities were taken as the probability of belonging to each cluster in the previous FCM clustering: the memberships were summed per cluster and divided by the total number of individuals, $\pi_k = \frac{1}{N}\sum_{n} u_{nk}$.
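A hedged sketch of how this initialization could be wired up in Python follows, assuming the hmmlearn library, the FCM outputs `U` (memberships) and `C` (centroids) from the previous step, the observation matrix `Y`, and a list `lengths` with the sequence length of each patient. The membership-weighted covariance shown here is one plausible choice, not necessarily the exact formula used in the study.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

# U: (n_obs, K) FCM memberships, C: (K, D') centroids, Y: (n_obs, D') observations,
# lengths: sequence length of each patient -- all assumed to be available.
K, D = C.shape
Um = U ** 1.1                                    # fuzziness exponent used for FCM

startprob = U.mean(axis=0)                       # pi_k: average membership per cluster
startprob /= startprob.sum()

transmat = np.random.default_rng(0).random((K, K))
transmat /= transmat.sum(axis=1, keepdims=True)  # random rows summing to one

covars = np.empty((K, D, D))
for k in range(K):                               # membership-weighted covariance per cluster
    diff = Y - C[k]
    covars[k] = (Um[:, k, None] * diff).T @ diff / Um[:, k].sum()
    covars[k] += 1e-6 * np.eye(D)                # small ridge for numerical stability

model = GaussianHMM(n_components=K, covariance_type="full", n_iter=50, init_params="")
model.startprob_, model.transmat_ = startprob, transmat
model.means_, model.covars_ = C, covars
model.fit(Y, lengths)                            # Baum-Welch refinement of all parameters
```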
The BW algorithm then alternates between these two steps until the likelihood converges.
In the expectation step, we compute the expectation of the complete-data log-likelihood with respect to the latent variables, conditioned on the observed sequences and the current parameter estimates; in the maximization step, the parameters are updated so as to maximize this expected log-likelihood. Once the best model has been selected, the log-likelihood of the observed data is compared with the theoretical log-likelihood that would result from a randomly generated dataset of the same size, distributed according to the fitted model.
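In standard EM notation (this expression is a clarifying addition, not part of the original text), the quantity computed in the expectation step and then maximized in the maximization step is

$$Q(\boldsymbol{\theta}, \boldsymbol{\theta}^{\mathrm{old}}) = \sum_{\mathbf{Z}} p(\mathbf{Z} \mid \mathbf{Y}, \boldsymbol{\theta}^{\mathrm{old}})\, \ln p(\mathbf{Y}, \mathbf{Z} \mid \boldsymbol{\theta}).$$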
Given that N is large, the log-likelihood does not depart much from this theoretical value if the model is correct. On the other hand, when there is a mismatch between the model that generated the observed data and the parameters used to compute the log-likelihood, the resulting value is clearly reduced. This can therefore be used as a method to validate the model assumed in the HMM. To that end, we generated Gaussian synthetic data using the parameters obtained with the HMM. The log-likelihood of the observations was -16.4, in close agreement with the theoretical value obtained from the synthetic data, indicating that the fitted model adequately describes the observed data.
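As an illustrative sketch only (assuming the hmmlearn model fitted above and the hypothetical variable names `Y` and `lengths`), this kind of check can be reproduced by sampling synthetic sequences from the fitted model and comparing per-observation log-likelihoods:

```python
import numpy as np

# Sample synthetic sequences with the same lengths as the observed ones from the fitted HMM.
synthetic = [model.sample(T)[0] for T in lengths]
Y_syn = np.vstack(synthetic)

# Per-observation log-likelihood of the real data versus the model-generated data.
ll_observed = model.score(Y, lengths) / len(Y)
ll_synthetic = model.score(Y_syn, lengths) / len(Y_syn)
print(f"observed: {ll_observed:.1f}  synthetic: {ll_synthetic:.1f}")
```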