Data encoding for healthcare data democratization and information leakage prevention

The lack of data democratization and information leakage from trained models hinder the development and acceptance of robust deep learning-based healthcare solutions. This paper argues that irreversible data encoding can provide an effective solution to achieve data democratization without violating the privacy constraints imposed on healthcare data and clinical models. An ideal encoding framework transforms the data into a new space where it is imperceptible to manual or computational inspection. However, the encoded data should preserve the semantics of the original data such that deep learning models can be trained effectively. This paper hypothesizes the characteristics of the desired encoding framework and then exploits random projections and random quantum encoding to realize this framework for dense and longitudinal or time-series data. Experimental evaluation highlights that models trained on encoded time-series data effectively uphold the information bottleneck principle and, hence, exhibit less latent information leakage.


Introduction
In recent years, deep learning has demonstrated remarkable success in a wide variety of fields [1], and it is expected to have a significant impact on healthcare as well [2]. Many attempts have been made to achieve this breakthrough in healthcare informatics, which often deals with noisy, heterogeneous, and non-standardized electronic health records (EHRs) [3]. However, most clinical deep learning tools are either not robust enough or have not been tested in real-world scenarios [4,5]. Deep learning solutions approved by regulatory bodies are less common in healthcare informatics, which shows that deep learning has not had the same level of success as in fields such as speech and image processing [6]. Along with the well-known explainability challenges in deep learning models [7], the lack of data democratization [8] and latent information leakage (information leakage from trained models) [9,10] can also be regarded as major hindrances to the development and acceptance of robust clinical deep learning solutions. In the current context, data democratization and information leakage can be described as:
• Data democratization: It involves making digital healthcare data available to a wider cohort of AI researchers. Achieving healthcare data democratization can result in global clinical models that are trained on data sampled from multiple geographical locations instead of being limited to a single site. These models are expected to be robust to population-specific distribution shifts and to exhibit better generalization. Wider access to healthcare data might also facilitate algorithmic contributions tailored for healthcare applications through a broader AI research base. However, healthcare data is "sensitive" and is rightly protected by data privacy laws, making data democratization difficult [11,12].
• Latent information leakage: Deep learning models are known for their high complexity and their ability to learn non-targeted latent information about the underlying population [10]. This latent information often acts as an inductive bias to improve the predictive performance of the model. However, the latent information can be sensitive or can help in inferring information such as the age, sex, and chronic or acute medical conditions of the patients. The revelation of this sensitive patient information can be considered a privacy violation.
Hence, data democratization and the prevention of latent information leakage are two of the important factors required to develop better clinical deep learning solutions that are secure and widely acceptable. Data democratization can be equated with the irreversible de-identification of healthcare data so that no patient can be linked to an electronic health record (EHR). A "truly" de-identified dataset cannot be considered sensitive or private, so sharing it publicly would not violate any data privacy laws [13]. However, researchers have not developed a truly irreversible de-identification mechanism, and there is always a risk of re-identification [11,13,14]. It is common practice to anonymize healthcare data, but the resulting data might not always be considered completely de-identified. In general, the notion of anonymity or de-identification is closely related to the amount of computational effort and time required to re-identify a patient from the data. An EHR can be considered non-anonymous (even after the anonymization process) if the effort required to re-identify the patient is considered reasonable. What constitutes "reasonable effort" is subjective and should change with advancements in technology [11]. As a result, simple data anonymization is not enough to achieve "true" de-identification and data democratization. Hence, there is a need for information processing mechanisms that can mask private information while retaining the data semantics, thereby enabling data sharing or democratization.
Aside from data democratization, trained clinical deep learning models also raise privacy concerns. These models have been shown to learn biomarkers of diabetic retinopathy, anemia, and chronic kidney disease from fundus images [15]. Deep learning models can also predict sex, ethnicity, and smoking status from a fundus image [16]. Hence, it is quite possible that a model trained to predict diabetic retinopathy from fundus images learns a feature representation that reveals non-targeted patient characteristics and sensitive information about the ailments of a patient suffering from chronic kidney disease and anemia. In the same way, a model trained for mortality prediction based on the first 48 hours of hospitalization in the intensive care unit (ICU) can provide information on the patient's acute as well as chronic conditions that may or may not be related to the current ICU stay or to mortality prediction (see Results). The extensive feature extraction in deep learning models results in better performance on the targeted task and in the discovery of new non-targeted or passive digital biomarkers for various diseases, thereby improving healthcare provision. This disclosure of non-targeted information, however, violates the privacy of the patients and poses an ethical dilemma.
Deep learning models can be seen as a combination of feature extraction layers, mapping an input example to a compressed semantic representation or embedding, and a final classification layer, mapping the embedding to the model output or predictions (Fig. 1D). According to the information bottleneck (IB) principle, an ideal model should minimize the mutual information between the input and the embedding while maximizing it between the embedding and the model output [17,18]. In other words, the embedding extracted by the model should only contain task-specific information and must strip away spurious or non-task-related information that might be present in the input. To avoid latent information leakage, clinical deep learning models should be designed or trained to follow the IB principle and must only extract the "relevant" information from the input patient data.
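For input X, embedding Z, and target Y, the IB principle described above is commonly stated as the following optimization problem (a standard textbook formulation; the trade-off parameter β is not specified in this paper):

```latex
% Information bottleneck objective: learn a stochastic encoder p(z|x)
% that compresses the input X while retaining information about Y.
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y), \qquad \beta > 0
```

Minimizing I(X; Z) discards input details, while the β-weighted I(Z; Y) term preserves the task-relevant information.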
This paper argues that encoding healthcare data can achieve both data democratization and latent information leakage prevention simultaneously. To accomplish this, we envision an encoding framework that transforms preprocessed and anonymized longitudinal health records or multivariate time-series data into a new space. This encoding framework should meet the following requirements:
• One-way transformation: The recovery of the original data from its encoded version should either be impossible or extremely computationally challenging.
• Imperceptibility of the encoded data: The encoded data should be a highly convoluted version of the original data. It should not be possible to infer any information about the original data through simple manual or computational analysis of its encoded version. Feature scaling or normalization, for example, cannot be considered a viable method of encoding information.
• Semantic preservation: The encoding framework must preserve the semantic characteristics of the original data to a large extent so that deep learning models can be trained effectively on the encoded data. In theory, the performance of models trained on the original data and on the encoded data should be the same.
The realization of this envisioned framework will enable the sharing of encoded healthcare data without violating privacy constraints. Ideally, encoded data is imperceptible, and the encoding process is practically irreversible. Therefore, it is very unlikely that any sensitive patient information can be derived from encoded data by either manual or computational inspection. Nevertheless, there is an obvious trade-off between the imperceptibility and semantic preservation requirements of the envisioned encoding framework: better semantic preservation results in lower imperceptibility and vice versa. As a result, the encoded data can be seen as a "deformed" version of the original data, and much higher computational effort is required to extract its semantic characteristics. This nature of encoded data results in inherent regularization during model training and indirectly enforces the IB principle (see Results) to prevent latent information leakage. This paper exploits random projections [19,20] and random quantum circuits [21,22] as information processing tools to realize the desired encoding framework for multivariate time-series data. Both random quantum circuits and random projections can deform or project the data into a space where it becomes imperceptible. Using random projections or random quantum circuits, the proposed encoding framework performs piece-wise or segment-wise temporal encoding of each feature or each 1-d signal of a multivariate time series (Fig. 1B). Since there is no interference among the features or signals of the original time series, the resulting encoded time series retains its semantic characteristics. However, the random transformations deform each segment of a signal to make it incomprehensible. Because the original data, the encoding method, the transformation matrix (used for random projections), and the random quantum circuit will not be made public, it is extremely difficult to reverse the encoding process. Hence, data democratization can be achieved by sharing this encoded data among deep learning researchers.
In addition to the clinical features and task labels, meta-data about the patients corresponding to ICU stays is also available. This includes gender information in all datasets, as well as the chronic, acute, and mixed conditions afflicting the patients in MIMIC-III and the ethnicity of the patients in eICU. More details about the clinical features representing the time series in all datasets can be found in the supplementary document. The following experiments are designed to evaluate different aspects of the proposed framework:
• On the original as well as the encoded data, we train 5 different neural networks on each dataset and compare their relative performances. These models include long short-term memory (LSTM) [28], temporal 1-d convolutions [29], multi-resolution temporal convolutions [30], transformer [31], and vision transformer [32]. More details can be found in Section 4.
• To assess the latent information leakage from the trained models, a single dense layer mapping the penultimate-layer embedding to the patient information is used. For the MIMIC-III dataset, gender and 25 "latent" or non-targeted patient disorders (acute, chronic, and mixed) are predicted from the penultimate-layer embedding of the trained mortality prediction models. For PhysioNet, we only predict gender as the latent information. Similarly, we predict the gender and ethnicity of patients from the trained ARF prediction models.
Since we employ only a single linear layer to map the embedding to gender, ethnicity, or patient disorders (Fig. 1D), no further feature transformations are applied. The performance of this latent information prediction therefore depends entirely on the nature of the embedding. More details about this experimental setup can be found in Section 4.
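The probing setup described above can be sketched as follows: a single linear layer with sigmoid activation, trained on frozen embeddings to predict a binary latent attribute. The embeddings, labels, dimensions, and training hyper-parameters below are synthetic placeholders, not the paper's actual data or models:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: 256-d penultimate-layer embeddings and a
# binary latent attribute (e.g. gender) correlated with them.
n, d = 512, 256
Z = rng.normal(size=(n, d))                        # frozen embeddings
w_true = rng.normal(size=d)
y = (Z @ w_true + rng.normal(size=n) > 0).astype(float)

# Single linear layer + sigmoid, trained by plain gradient descent on
# the logistic loss; no further feature transformation is applied.
w, b = np.zeros(d), 0.0
lr = 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))         # probe predictions
    w -= lr * (Z.T @ (p - y)) / n
    b -= lr * float(np.mean(p - y))

p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))
acc = float(np.mean((p > 0.5) == y))               # probe accuracy
```

A high probe accuracy indicates that the embedding still carries the latent attribute; in the paper's experiments this probing is evaluated with AUROC rather than accuracy.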

Performance on the encoded time-series data
The performance of the different models on the encoded and the original datasets is depicted in Fig. 2. An analysis of this figure highlights the following:
• Across all datasets, the performance of models trained and evaluated on the original data is superior to that of the models dealing with the encoded time-series data. Across all models on MIMIC-III, random quantum encoding and random projection-based encoding resulted in average relative performance drops of 3.52 (±1.25)% and 15.29 (±2.51)%, respectively. Average relative performance drops of 5.13 (±1.94)% and 22.44 (±4.75)% were observed on the PhysioNet dataset. Similarly, drops of 2.13 (±1.59)% and 12.45 (±2.29)% were observed for the eICU dataset. This drop is expected, as data encoding deforms the time series to protect patient information.
• Despite the performance drop exhibited by models trained on the encoded data, these models (especially those trained on the quantum-encoded data) are still effective at the target task. This shows that the encoding framework, whether using random projection or random quantum encoding, retains the essential semantic characteristics in the deformed encoded data.
• The performance of random quantum encoding is significantly better than that of random projections across all models and datasets. This shows that quantum encoding preserves the semantic characteristics to a greater extent while deforming the data using random quantum operations.

Patients' gender and ethnicity prediction
The performance for the task of predicting a patient's gender from the trained mortality and ARF prediction models is depicted in Fig. 3. According to the analysis of Fig. 3, we can effectively predict patients' gender from the models trained on the original or non-encoded data. This behavior is common across all datasets and all models, regardless of their modeling capacity. Similarly, the analysis of Fig. 4 illustrates that we can identify the patients' ethnicity from the ARF models trained on the original time-series data. Although gender and ethnicity are not sensitive information, these results highlight that trained models can indeed reveal latent, non-targeted patient characteristics.

Latent disorder prediction
Fig. 5(a) illustrates the performance of predicting patient disorders from the trained MIMIC-III models in a latent manner. The analysis of this figure highlights that all models trained on the original data generate representations or embeddings that reveal information about the patients' disorders. Across all models trained on the original data, a macro AUROC of approximately 0.7 is observed for latent disorder prediction. It should be noted that the macro AUROC obtained by the different models in this experiment is comparable to the performance achieved by the targeted patient phenotype prediction models (see Fig. S1 of the supplementary report). This shows that mortality prediction models are susceptible to leaking the patients' private medical information. Fig. 5(b) and (c) depict the performance of predicting the chronic and acute disorders (a subset of the 25 disorders) from the trained LSTM mortality prediction models. Similar behavior is observed for all the other models considered in this study (see supplementary document, Fig. S2). The analysis of these figures shows that the models learn characteristics that help infer or predict non-targeted patient disorders. We can predict both chronic and acute disorders that may or may not be correlated with mortality prediction. According to the odds ratios [33] for acute and chronic disorders (Fig. 6(a) and (b)), most acute conditions exhibit a higher risk of mortality (odds ratio >> 1), while most chronic conditions are only weakly associated with mortality (odds ratio ≈ 1). This shows that some conditions, such as shock and acute renal failure, are directly associated with mortality, while others, such as chronic lipid metabolism disorder and chronic renal disease, are not associated with mortality in the MIMIC-III patients corresponding to the ICU stays. Irrespective of the odds ratios or the association between disorders and mortality, we can identify patients suffering from these disorders with an average AUROC of > 0.7.

Encoded data minimizes information leakage
The analysis of Fig. 3, 4, and 5 further highlights that the models trained on the encoded data exhibit less latent information leakage than the models trained on the original data. On average, MIMIC-III models trained on data encoded using quantum circuits and random projections (rather than the original data) exhibited relative drops of 20.11 (±2.45)% and 23.52 (±3.98)%, respectively, in performance on the latent gender prediction task. The PhysioNet models exhibited relative drops of 22.66 (±5.45)% and 28.21 (±8.98)% for data encoded using the quantum circuit and random projections, respectively. Similar behavior is observed for the eICU models, where quantum encoding and random projection-based encoding resulted in relative drops of 23.1 (±4.25)% and 31.11 (±7.6)% in the performance of the gender prediction task. Encoding the data also resulted in a drop in the performance of the ethnicity prediction task. A similar trend is observed for patient disorder prediction from the MIMIC-III models: quantum encoding and random projections resulted in relative drops of 12.5 (±3.79)% and 18.75 (±5.45)% in the average macro AUROC score.
As discussed in Section 1, models that follow the IB principle exhibit less information leakage. The drop in latent information leakage from models trained on the encoded data can be attributed to the lower mutual information (MI) between the model input (i.e., the original or encoded time series) and the penultimate-layer embedding generated by the trained models. To support this claim, we estimated the MI for the LSTMs trained on the original and the encoded data. For the feasibility of MI estimation, we used the averaged and vectorized form of the input time series to compute the MI. Fig. 7 illustrates the distribution of the estimated MI between the input and the penultimate embedding. It is clear from this figure that using encoded data minimizes the MI between the model input and the learned representation. As a result, it can be inferred that training models with the encoded data inherently enforces the IB principle during training. Hence, the learned embedding only retains the information required to predict mortality while stripping away non-essential or non-targeted patient information.
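The MI comparison above can be illustrated with a simple plug-in estimator. The paper does not specify which MI estimator it used, so the histogram-based estimator below is only an assumed stand-in, applied here to synthetic 1-d variables:

```python
import numpy as np

def mutual_info_binned(x, y, bins=16):
    """Histogram-based plug-in estimate of I(X; Y) in nats for two
    1-d samples. Illustrative only; not the paper's actual estimator."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()                    # joint distribution
    px = pxy.sum(axis=1, keepdims=True)      # marginal of X
    py = pxy.sum(axis=0, keepdims=True)      # marginal of Y
    mask = pxy > 0                           # avoid log(0)
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))

rng = np.random.default_rng(0)
x = rng.normal(size=5000)
mi_dependent = mutual_info_binned(x, x + 0.1 * rng.normal(size=5000))
mi_independent = mutual_info_binned(x, rng.normal(size=5000))
```

Applied to (averaged input, embedding) pairs, lower estimates for the encoded-data models would mirror the trend shown in Fig. 7.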
The above analysis shows that random projection-based encoding provides the greatest protection against latent information leakage. However, if we analyze Fig. 3 together with Fig. 2, it is also evident that random projection-based encoding results in a larger drop in the performance of the targeted task. Random quantum encoding, on the other hand, provides a better balance between performance on the targeted task and prevention of information leakage.

Visual inspection of the encoded data
The difference between the original and the encoded examples from the MIMIC-III dataset is illustrated in Fig. 8. The analysis of this figure makes it clear that both the temporal trends and the distributions of features in the original and the encoded time-series examples are noticeably different. To further analyze the impact of the encoding process on the time-series data, 50 original and encoded examples from the positive (mortality) class of the PhysioNet dataset were randomly selected and averaged to obtain original and encoded "summary" time series. Fig. 9 depicts the behavior of four randomly chosen features from these summary time series. Again, the distribution of magnitudes as well as the temporal trends of the encoded features differ from those of the original time-series features. By mere visual inspection, it is nearly impossible to perceive any information from the encoded data (for both quantum encoding and random projections). Similar behavior is observed for the other features. Hence, the encoding process provides an additional layer of privacy over the de-identified data and pushes the community a step closer to achieving data democratization.

Data encoding and explainability
Encoded data is expected to retain the semantic characteristics of the original data to a large extent, such that models trained on the original and the encoded data exhibit similar behavior. Along with similar performance, the features relevant for predictions in models trained on the original and the encoded data should largely be the same. While encoded data does retain semantic characteristics, there is a noticeable performance drop due to data encoding (Fig. 2). This suggests that the behavior of models trained on encoded data could differ. Shapley additive explanations (SHAP) [34] are employed on the LSTM models trained using the original and encoded PhysioNet and MIMIC-III datasets to study the impact of data encoding on feature relevance. Fig. 10 illustrates the top 10 relevant features identified by SHAP in each PhysioNet model. The analysis of this figure highlights a large overlap between the sets of relevant features identified for the "original" and the "quantum-encoded" models. Moreover, the Glasgow coma score and blood urea nitrogen are regarded as the most relevant features in both models. Although there is some overlap between the relevant features of the original and the "random projection-encoded" models, the overall behavior seems to be very different. Similar behavior is observed for the MIMIC-III models (see Fig. S3 of the supplementary document). Hence, it can be argued that random quantum encoding succeeds in retaining semantic characteristics such that the resulting models behave similarly to the original models, up to an acceptable level.

Discussion
This study proposes encoding healthcare data to achieve data democratization and prevent information leakage. The irreversible, semantics-preserving encoding process outlined in this paper yields an imperceptible, deformed form of healthcare data that can be shared among researchers without violating privacy constraints. Moreover, the inherent regularization imposed on neural network training by the deformity of the training data induces the information bottleneck (IB) principle and results in models that are less susceptible to latent information leakage (Fig. 7).
This paper explores random projections and random quantum operations to piece-wise encode the 1-d signals in a time series, as highlighted in Section 4 and Fig. 1. The encoded signals exhibit different feature distributions and follow largely imperceptible trends (Fig. 9). Models trained on the encoded data perform well, highlighting that the semantics are effectively preserved (Fig. 2). Concomitantly, the information leakage from these models is significantly less than from models trained on the original data (Fig. 3, 4, and 5). Thus, as desired, the proposed encoding framework results in encoded data that is visually imperceptible, effective for deep learning, and minimizes information leakage from the trained deep models. The encoding transformation is nearly irreversible if the attacker does not have access to the random Gaussian matrix or the random quantum circuit. To learn this transformation computationally or with deep neural networks, both the encoded and the original data must be available, which is clearly not the case.
Based on the performance comparison between models trained on data encoded using random projections and random quantum circuits (Fig. 2 and 3), it is evident that random quantum encoding balances the deformation of the data with the preservation of its semantic characteristics, resulting in better models. Beyond its better performance, quantum encoding also makes retrieving the original data from its encoded version theoretically harder, as the outputs of the quantum circuit, i.e., the states of the qubits, are observed by projecting them onto a pre-defined basis state [35]. These measurements become the encoded signals, and estimating the qubit state from such a measurement can be ambiguous, as multiple qubit states could map to the same measurement. Even if the measurement were not an issue, one would have to estimate the structure of the quantum circuit (number of layers, number of gates, and nature of the gates) as well as the parameters of the rotation gates to possibly reverse the encoding process. In contrast, only the 4 × 4 transformation matrix needs to be estimated to reverse the random projections, and access to even one pair of original and encoded data is sufficient to estimate this matrix accurately.
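The linear-algebra attack on random projections mentioned above can be sketched directly: since each encoded segment is e_j = R x_j, any n = 4 linearly independent original/encoded segment pairs determine R exactly. All data here is synthetic and serves only to illustrate the claim:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
R = rng.normal(0.0, np.sqrt(1.0 / n), size=(n, n))  # secret projection

# One leaked original signal (4 segments of length 4, stacked as
# columns) together with its encoded counterpart.
X = rng.normal(size=(n, n))      # columns = original segments
E = R @ X                        # columns = encoded segments

R_est = E @ np.linalg.inv(X)     # attacker recovers R exactly
```

This is why both the transformation matrix and the original data must stay private; the quantum circuit, by contrast, cannot be inverted from its measurements this way.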
Data encoding can also facilitate collaboration among multiple research entities without infringing upon the privacy of the patients. All data collection sites can potentially share their data among themselves so that every site can access the "global" data. As discussed in Section 1, the models trained on this global data are expected to be more generic and better at handling population-specific distribution shifts. However, the random nature of the encoding at each site would impede this cross-site collaboration. This problem can be solved by agreeing beforehand on the nature of the data transformation, such as the quantum circuit structure and the rotation gate parameters. Thus, the encoded data from each site will lie in the same transformed space, allowing deep learning models to be trained effectively. Similar to cross-site collaboration, federated learning also allows a central server to collaborate with multiple sites to train a global model without data sharing [36]. However, the structure of the models is entirely decided by the server, and the sites have no independence: each site is expected to perform similar operations on its local data. Data encoding, on the other hand, allows the researchers at each site to access the global data and work independently on any deep learning algorithm.
As an alternative to data encoding, generative models such as generative adversarial networks have been used to generate data points that do not represent any real patients and can, in theory, be shared publicly [37,38]. However, generative models capture the input distribution of the data points, and it is always possible to sample data points that are extremely similar to the input points, i.e., to real patients. Similar to the subjectivity around the de-identification process (as discussed in Section 1), a sampled example that is similar to real patient data may or may not be considered a fabricated data point. Moreover, generative modeling requires extensive computational resources and a "large amount" of data to fabricate data points effectively. The proposed encoding approach, on the other hand, is an information-processing framework and does not require any training.
In hindsight, the proposed encoding framework suffers from two major drawbacks:
• The proposed framework was designed to encode data for deep learning models, which are known for their high modeling capacity. The deformations in the encoded data make it difficult for traditional machine-learning models with limited capacity to process the encoded data effectively. Similarly, the summary statistics of the encoded and the original data are, by design, poles apart. As a result, traditional statistical or epidemiological analyses are not feasible, and the use cases of the encoded data are limited to deep learning models.
• Neither random projections nor random quantum encoding provides an explicit mechanism to control the degree of deformation in the encoded data, i.e., to maintain a balance between the imperceptibility of the encoded data and the retention of semantic information. The performance drop in models trained on the encoded data can be attributed to the lack of such a balancing mechanism, which could have induced only the minimum deformation required to make the data imperceptible and prevent information leakage.
In the future, we will work towards non-linear or sub-linear data transformations that either automatically balance the deformation versus semantic-retention trade-off or provide a hyper-parameter to control the degree of deformation in the encoded data. Using such data transformations in the proposed encoding framework will improve performance on the target tasks while enabling data democratization and preventing information leakage.

Proposed encoding framework
A uniformly sampled multivariate time series is a collection of multiple 1-d signals representing features measured over time. Suppose X ∈ R^(F×T) is a time series consisting of F 1-d signals of length T, and x ∈ R^T or x = [x_1, x_2, x_3, ..., x_T] is one of the F signals. The proposed framework transforms the time series X by performing piece-wise encoding of every 1-d signal in X. The framework divides the signal x into segments or chunks of length n, as x = [x_(1:n), x_(n+1:2n), ..., x_((T−n+1):T)], and applies a transformation operation f(·) to every segment:

e_j = f(x_((j−1)n+1 : jn)),

where e_j ∈ R^n is the encoded version of the jth segment of x. Note that the dimensions of the encoded and input segments are the same, and a segment length of n = 4 has been used across all experiments. Each encoded segment of length n is temporally concatenated to obtain the encoded version, e ∈ R^T, of the signal x as e = [e_1, e_2, e_3, ..., e_(T/n)]. Similarly, the transformation or encoding operation is applied to all F 1-d signals to transform X into the encoded time series E ∈ R^(F×T). In this paper, we have used random projection and random quantum encoding as the data transformation operation f(·) in the proposed framework. Both mechanisms are discussed below.
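The segment-wise procedure above can be sketched as follows; the transformation f is left abstract, and the random linear map used as a placeholder is for illustration only:

```python
import numpy as np

def encode_timeseries(X, f, n=4):
    """Piece-wise encode each 1-d signal of a multivariate time series.

    X : array of shape (F, T), with T assumed divisible by n.
    f : maps a length-n segment to a length-n encoded segment.
    Returns the encoded time series E of the same (F, T) shape.
    """
    F, T = X.shape
    E = np.empty_like(X, dtype=float)
    for i in range(F):                   # each 1-d signal independently
        for j in range(0, T, n):         # each length-n segment
            E[i, j:j + n] = f(X[i, j:j + n])
    return E

# Placeholder f: a fixed random linear map per segment (illustrative).
rng = np.random.default_rng(0)
R = rng.normal(0.0, 0.5, size=(4, 4))
E = encode_timeseries(rng.normal(size=(3, 12)), lambda seg: R @ seg)
```

Because f is applied per signal, there is no interference among the F features, which is what preserves the per-feature semantics.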

Random projection
Random projection is a method of projecting input data into a random subspace using a random projection matrix whose columns are of unit length [19,20]. It is mainly used for dimensionality reduction, as it approximately preserves the similarity among data points in the projected subspace, as outlined by the Johnson-Lindenstrauss lemma [39]. In this work, we are not interested in dimensionality reduction; we are mainly concerned with projecting the input into a random subspace to make the data imperceptible. To this end, we use a projection matrix R ∈ R^(n×n) whose entries are randomly sampled from the Gaussian distribution N(0, 1/n). This projection matrix can be used to encode the jth segment x̄_j ∈ R^(n×1) of signal x as:

e_j = R x̄_j,

where e_j ∈ R^(n×1) is the encoded version of the input segment. As discussed above, we have used a segment length of n = 4, so a 4 × 4 projection matrix is used for data encoding.
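A minimal sketch of this random projection encoding, following the N(0, 1/n) sampling and n = 4 segment length described above (the input segment values are arbitrary examples):

```python
import numpy as np

n = 4                                   # segment length from the paper
rng = np.random.default_rng(0)
# Entries sampled from N(0, 1/n): standard deviation sqrt(1/n).
R = rng.normal(loc=0.0, scale=np.sqrt(1.0 / n), size=(n, n))

def encode_segment(x_seg):
    """Encode one length-n segment as e_j = R @ x_j."""
    return R @ x_seg

e = encode_segment(np.array([1.0, -0.5, 2.0, 0.25]))
```

The same matrix R is reused for every segment of every signal, so the encoding is deterministic once R is fixed, yet meaningless to anyone without R.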

Random quantum encoding
The term random quantum encoding refers to a process of data transformation through a quantum circuit containing multiple gates with random parameters [21]. The quantum circuit used in this study is shown in Fig. 1(c). This circuit is composed of the following components [40]:
• Qubits or wires: The circuit consists of four wires representing four quantum bits or qubits. A qubit is a quantum system with a resting state |0⟩ and an excited state |1⟩, and its general state is a superposition of the two. Initially, all four qubits are in the resting state |0⟩. The number of wires or qubits is dictated by the length of the input segment, i.e., n = 4.
• Rotation gates (RX): These gates rotate a qubit around the x-axis of its Bloch sphere by φ_k radians, where k is the index of the RX gate in the circuit and φ_k is a randomly chosen parameter. This rotation operator can be defined as:

RX(φ_k) = [[cos(φ_k/2), −i sin(φ_k/2)], [−i sin(φ_k/2), cos(φ_k/2)]].

The resultant qubit state |ψ′⟩ after applying the kth RX gate to a qubit |ψ⟩ is given as |ψ′⟩ = RX(φ_k)|ψ⟩.
• Controlled-NOT (CNOT) gates: A CNOT gate entangles two qubits and has no parameters. The first qubit is the control, and the second (target) qubit is flipped if the control is |1⟩. CNOT acts on a 2-qubit quantum system whose basis states are {|00⟩, |01⟩, |10⟩, |11⟩}. An input to the CNOT gate is a linear superposition of these basis states, a|00⟩ + b|01⟩ + c|10⟩ + d|11⟩, where a, b, c, and d are complex coefficients. Hence, the CNOT operation can be defined as:

CNOT(a|00⟩ + b|01⟩ + c|10⟩ + d|11⟩) = a|00⟩ + b|01⟩ + c|11⟩ + d|10⟩.

Encoding using the quantum circuit: The whole quantum encoding process can be divided into three steps:
• Encoding the input segment on wires: The first step is to infuse or project the input segment x̄_j onto the wires of the circuit. Each element x̄_j^n of the input segment corresponds to the nth wire or qubit. To encode the information from x̄_j^n into the nth qubit, we rotate this qubit by φ_n = π x̄_j^n radians around the y-axis of its Bloch sphere. This rotation operator is described as:

RY(φ_n) = [[cos(φ_n/2), −sin(φ_n/2)], [sin(φ_n/2), cos(φ_n/2)]].

The process of applying this operator is analogous to that of the RX gates.
• Processing qubits by quantum circuit: After preparing the qubits as encoded versions of the input segment, these qubits are processed by the quantum circuit (Fig. 1C) described above.
• Measuring the outputs: This operation registers the state of a qubit after all the quantum operations have been applied. In this work, we use the expectation of the Pauli-Z operator (Z) to measure the output state of a qubit |ψ⟩. We know that Z can be defined as [40]:

$$Z = |0\rangle\langle 0| - |1\rangle\langle 1|,$$

where |0⟩⟨0| − |1⟩⟨1| is the spectral decomposition of Z. Then, the expected value of the Pauli-Z operator for |ψ⟩ can be determined as:

$$\langle Z\rangle = \langle\psi|Z|\psi\rangle = |\langle 0|\psi\rangle|^2 - |\langle 1|\psi\rangle|^2$$
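The three steps above (RY data loading, randomly parameterized gates with CNOT entanglement, and per-wire Pauli-Z measurement) can be simulated classically for a single 4-element segment. The sketch below is an illustrative NumPy statevector simulation of this kind of circuit; the exact gate layout of Fig. 1(c) is assumed, and it is not the paper's PennyLane implementation.

```python
import numpy as np

def ry(phi):  # standard y-axis rotation gate
    c, s = np.cos(phi / 2), np.sin(phi / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def rx(phi):  # standard x-axis rotation gate
    c, s = np.cos(phi / 2), np.sin(phi / 2)
    return np.array([[c, -1j * s], [-1j * s, c]], dtype=complex)

def apply_1q(state, gate, wire, n=4):
    """Apply a single-qubit gate to `wire` of an n-qubit statevector."""
    ops = [np.eye(2, dtype=complex)] * n
    ops[wire] = gate
    full = ops[0]
    for op in ops[1:]:
        full = np.kron(full, op)
    return full @ state

def apply_cnot(state, control, target, n=4):
    """CNOT as a basis permutation: flip `target` bit when `control` bit is 1."""
    new = state.copy()
    for idx in range(2 ** n):
        if (idx >> (n - 1 - control)) & 1:
            new[idx] = state[idx ^ (1 << (n - 1 - target))]
    return new

def quantum_encode(segment, rx_angles):
    """Encode one length-4 segment: RY(pi*x) loading, random RX gates,
    a chain of entangling CNOTs, then Pauli-Z expectation per wire."""
    n = 4
    state = np.zeros(2 ** n, dtype=complex)
    state[0] = 1.0                        # all qubits start in |0>
    for w, x in enumerate(segment):       # step 1: data loading
        state = apply_1q(state, ry(np.pi * x), w)
    for w, phi in enumerate(rx_angles):   # step 2: random rotations
        state = apply_1q(state, rx(phi), w)
    for w in range(n - 1):                # step 2: entanglement
        state = apply_cnot(state, w, w + 1)
    # Step 3: <Z> per wire, i.e., probability-weighted (+1/-1) of that bit.
    probs = np.abs(state) ** 2
    z = np.empty(n)
    for w in range(n):
        signs = np.array([1 - 2 * ((i >> (n - 1 - w)) & 1) for i in range(2 ** n)])
        z[w] = probs @ signs
    return z

rng = np.random.default_rng(0)
e = quantum_encode(segment=np.array([0.1, 0.4, 0.7, 1.0]),
                   rx_angles=rng.uniform(0, 2 * np.pi, size=4))
print(e)  # four measured values, each in [-1, 1]
```

Because each measured value mixes the whole segment through entanglement and random rotations, the output is bounded in [-1, 1] and visually unrelated to the raw values, while remaining a deterministic function of the input for fixed gate parameters.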

Models
The following neural network architectures have been used for the prediction tasks:
• Long short-term memory (LSTM) based model: This model has also been used in [12] for mortality prediction. It consists of an LSTM with 256 recurrent units followed by a linear layer with 1 node and sigmoid activation for binary prediction.
• Temporal convolutional neural networks [29,30]: These models exploit 1-d convolution operations for modeling the input time series. In this work, we have used a temporal convolutional network (TCN) having four temporal blocks followed by a linear layer with 1 node and sigmoid activation that maps the 64-dimensional embedding to an output score. Each temporal block consists of two 1-d convolution layers having 64 filters of size 9, and each convolution layer is followed by a 1-d batch normalization layer, parametric ReLU activation, and a dropout layer with a dropout probability of 0.75. We have also used a multi-branch temporal convolutional network (Multi-TCN) consisting of two multi-branch temporal blocks followed by a linear layer with 1 node and sigmoid activation. Each multi-branch temporal block consists of three branches that process the same input in parallel. Each branch consists of two 1-d convolutional layers having 32 filters, where each convolutional layer is followed by batch normalization and parametric ReLU activation. The kernel sizes of the filters in the first, second and third branches are 5, 7 and 9, respectively. The last layer of the block is a 1-d convolution layer with 96 filters of size 1 that acts as an aggregator and selects the relevant features from all three branches.
• Transformer and Vision Transformer: A transformer [31] consists of an encoder and a decoder, each formulated by stacking multiple self-attention layers that can capture the global dependencies in the input signals. We employ a transformer encoder having 1 attention layer with sixteen 256-dimensional heads, followed by two linear layers having 16 and F nodes. This yields an output of shape T × F, where T is the number of time steps and F is the feature dimension of the input time series. The encoder output is temporally pooled and given as input to a two-layered MLP classifier having 128 and 1 nodes to obtain the binary prediction. The vision transformer (ViT) [32] is another prevalent variant of the transformer that is explicitly designed for images. In this work, we also exploit a vision transformer for modeling time series. The architecture is almost identical to the previously described transformer; in ViT, a learnable F-dimensional token is appended to the input time series and is given as input to the MLP classifier (instead of the temporally pooled representation used in the classical transformer).
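The simplest of the architectures above, the LSTM-based model, can be sketched in PyTorch as follows. This is an illustrative reconstruction from the textual description (256 recurrent units, a 1-node linear head, sigmoid output); the class and argument names are ours, not from the authors' code.

```python
import torch
import torch.nn as nn

class LSTMMortalityModel(nn.Module):
    """LSTM with 256 recurrent units followed by a 1-node
    linear layer and sigmoid activation for binary prediction."""

    def __init__(self, n_features):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=256,
                            batch_first=True)
        self.head = nn.Linear(256, 1)

    def forward(self, x):              # x: (batch, time, features)
        _, (h_n, _) = self.lstm(x)     # h_n: (1, batch, 256), final hidden state
        return torch.sigmoid(self.head(h_n[-1]))  # (batch, 1) risk score

model = LSTMMortalityModel(n_features=16)
scores = model(torch.randn(8, 48, 16))  # e.g., 8 stays, 48 steps, 16 features
print(scores.shape)
```

The 256-dimensional hidden state h_n[-1] is the penultimate-layer embedding of this model, i.e., exactly the kind of representation that is later probed for latent information leakage.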
As discussed earlier, for latent prediction we have used single-layer models having either 1 node (for the latent binary prediction tasks) or 25 nodes (for latent disorder prediction), followed by sigmoid activation, operating on embeddings from the mortality prediction models.

Training mortality prediction models
Irrespective of the data encoding strategy or model architecture, all prediction models are trained using the same parameter settings. Binary cross-entropy is used as the loss function. The Adam optimizer with a fixed learning rate of 0.001 and a batch size of 64 is used for training the models. Each model is trained to provide the best performance on the validation examples, and the best-performing model configuration is used for evaluation on the test or held-out dataset.
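The shared training setup (binary cross-entropy, Adam at a fixed learning rate of 0.001, batches of 64) can be sketched as below. The single-layer network is only a stand-in so the snippet is self-contained; any of the architectures above can be substituted, and the data here is synthetic.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in model with a sigmoid output, as all task heads above use.
model = nn.Sequential(nn.Linear(16, 1), nn.Sigmoid())
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
loss_fn = nn.BCELoss()  # binary cross-entropy

x = torch.randn(64, 16)                  # one batch of 64 examples
y = torch.randint(0, 2, (64, 1)).float() # binary labels

losses = []
for _ in range(5):                        # a few illustrative steps
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
print(losses[0], "->", losses[-1])        # loss decreases over the steps
```

In the paper's setup, the stopping criterion is validation performance rather than a fixed step count; the loop above shows only the per-batch update.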

Training latent information prediction models
For training the information leakage or latent prediction models, we again followed the same train, validation, and test split used for the prediction tasks. To estimate the information leakage from a trained model, we obtained the penultimate-layer embedding for all examples. These embeddings are used as input representations for training and evaluating the latent information prediction models, i.e., the gender, ethnicity and disorder prediction models. Binary cross-entropy loss and the Adam optimizer with a fixed learning rate of 0.001 and a batch size of 256 are used for training the models.
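A latent-information probe of this kind is, for a binary attribute, a single linear layer with sigmoid activation trained on frozen embeddings, i.e., logistic regression. The NumPy sketch below is illustrative: it uses plain mini-batch gradient descent rather than the paper's Adam, and synthetic "embeddings" in which one dimension leaks a binary attribute.

```python
import numpy as np

def train_linear_probe(emb, labels, lr=0.001, epochs=200, batch_size=256, seed=0):
    """Train a one-layer sigmoid probe (logistic regression) on
    penultimate-layer embeddings with binary cross-entropy."""
    rng = np.random.default_rng(seed)
    n, d = emb.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            x, y = emb[idx], labels[idx]
            p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # sigmoid predictions
            grad = p - y                            # dBCE/dlogit
            w -= lr * x.T @ grad / len(idx)
            b -= lr * grad.mean()
    return w, b

# Synthetic embeddings: dimension 0 carries a binary attribute (e.g., gender).
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=1000)
emb = rng.normal(size=(1000, 256))
emb[:, 0] += 2.0 * y
w, b = train_linear_probe(emb, y)
pred = (1.0 / (1.0 + np.exp(-(emb @ w + b))) > 0.5).astype(int)
print((pred == y).mean())  # probe accuracy well above chance
```

Above-chance probe accuracy is exactly what the paper treats as evidence of latent information leakage: the attribute was never a training target of the upstream model, yet its embedding still exposes it to a single linear layer.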

Implementation details
All the experiments are performed using Python. PyTorch is used as the deep learning library. Quantum operations have been simulated using PennyLane [41]. Mutual information for the IB analysis (Fig. 7) has been estimated using [42].
Project at the Hong Kong Centre for Cerebro-cardiovascular Health Engineering (COCHE). AT is supported by an EPSRC Healthcare Technologies Challenge Award (EP/N020774/1). The views expressed are those of the authors and not necessarily those of the NHS, the NIHR, the Department of Health, InnoHK-ITC, or the University of Oxford.

6 Performance on the encoded data for the task of Phenotyping

As discussed in Results (main document), along with mortality labels, we also have information about the 25 phenotypes or disorders corresponding to each ICU stay. In the main text, we used these disorders for evaluating the information leakage from the trained mortality prediction models using a multi-label, multi-class setup.
Here, we train all the models discussed in the main text to directly phenotype each ICU stay. The last layers of these models were changed to have 25 nodes followed by sigmoid activation to support multi-label, multi-class predictions. The train, test and validation setup used for mortality prediction is also used here. The Adam optimiser with a learning rate of 0.001 and a batch size of 64 is used for training all models.
Fig. S1 illustrates the performance of different models as a function of the encoding method for the task of phenotyping. The relation between encoding methods and model performance is almost identical to that observed in the binary prediction experiments: both random projection and quantum encoding result in a noticeable drop in performance, but this drop is considerably smaller in the case of quantum encoding.

10 Latent prediction of chronic and acute disorders from all models

Fig. 1 :
Fig. 1: A. Conceptual rendition of a multi-variate time series as a collection of multiple 1-d signals. B. Illustration of the process of encoding one of the 1-d signals within a time series using the proposed encoding framework. C. Illustration of a quantum circuit composed of four wires, unitary rotation gates, and controlled-NOT (CNOT) gates. D. Illustration of the setup used for evaluating latent information leakage from the trained mortality prediction models. The penultimate-layer embedding from the trained mortality prediction models is given as input to a linear or dense layer dealing with either gender or patient-disorder prediction.

Fig. 3 :
Fig. 3: Model-specific and aggregated (across all models) performance for the task of patients' gender prediction using the penultimate embedding generated from different models trained on the (a) MIMIC-III, (b) PhysioNet and (c) eICU datasets, respectively.

Fig. 4 :
Fig. 4: Model-specific and aggregated (across all models) performance for the task of predicting patients' ethnicity from the trained acute respiratory failure prediction models.

Fig. 5 :
Fig. 5: Performance for the task of latent patient disorder prediction using the penultimate embedding generated from MIMIC-III mortality prediction models. The chronic and acute disorders shown in (b) and (c) are subsets of the 25 different conditions considered in this work. A single model predicts the presence/absence of all 25 disorders.

Fig. 6 :
Fig. 6: Odds ratios comparing occurrences of (a) chronic and (b) acute disorders against the outcome, i.e., mortality, in the MIMIC-III dataset.

Fig. 7 :
Fig. 7: Kernel density estimation plots illustrating the estimated mutual information (MI) between the embedding obtained from the trained LSTM models and the input time series in MIMIC-III (first row) and PhysioNet (second row) as a function of the encoding methods. To facilitate MI estimation, the input time series is either vectorized or averaged across the time dimension before computing MI with the embedding.

Fig. 8 :
Fig. 8: Heat maps illustrating the differences in magnitude and trends of the original and encoded time-series examples. Each row represents an input time series and its encoded versions.

Fig. 9 :
Fig. 9: Difference in average trends and the average magnitude of the original and encoded signals or features.These signals are obtained by averaging 50 time series representing patients exhibiting mortality in the PhysioNet dataset.

Fig. 10 :
Fig. 10: A comparison of SHAP-based feature importance in LSTM models trained on (a) original, (b) quantum encoded, and (c) randomly projected versions of the PhysioNet dataset.

Fig. S1 :
Fig. S1: (a) Performance of LSTM, vision transformer (ViT), transformer, temporal convolutional network (TCN) and multi-branch temporal convolutional network (Multi-TCN) for phenotype prediction.(b) Aggregate performance of all models as a function of encoding method.

Fig. S2 :
Fig. S2: Odds ratios of different ethnicities with respect to acute respiratory failure in the eICU dataset.

Fig. S3 :
Fig. S3: A comparison of SHAP-based feature importance in LSTM models trained on (a) original, (b) quantum encoded, and (c) randomly projected versions of the MIMIC-III dataset.

9
Fig. S4 illustrates the differences in the encoded and original time-series examples from the PhysioNet dataset.

Fig. S4 :
Fig. S4: Heat maps illustrating the differences in magnitude and trends of the original and encoded time-series examples. Each row represents an input time series and its encoded versions.

Fig. 5 of the main text shows the accuracy of predicting different chronic and acute disorders from the LSTM mortality prediction models. Similarly, Fig. S5 and Fig. S6 document the performance of predicting chronic and acute disorders from the other mortality prediction models.

Here, |⟨0|ψ⟩|² and |⟨1|ψ⟩|² represent the probabilities of |ψ⟩ being in states |0⟩ and |1⟩, respectively. Note that ⟨a|b⟩ represents the inner product between |a⟩ and |b⟩ in Hilbert space. For the nth wire or qubit, the measured value e_j^n is regarded as the encoded version of the corresponding element x_j^n of the input segment x_j. By considering all n qubit measurements, we obtain an encoded version e_j = [e_j^1, e_j^2, . . ., e_j^n] of the input segment x_j. The encoded signal e is obtained by temporally concatenating all the encoded segments e_j.
7 Odds Ratio: Ethnicity vs Acute Respiratory Failure in the eICU dataset

The analysis of Fig. S2 highlights that the odds ratios for Black, Asian and Caucasian patients are close to 1. This shows that there is no apparent association between ethnicity and ARF in the eICU dataset. Despite that, we are able to effectively predict the ethnicity of the patients from the trained ARF models (Fig. 4 of the manuscript).