Robust Prediction of Personalized In Vivo Response to Unseen Drugs From In Vitro Screens Using a Novel Context-Aware Deconfounding Autoencoder



Introduction
Omics profiling, particularly transcriptomics, is a powerful technique to characterize cellular activity under various conditions, enabling the development of machine learning models for personalized phenotypic drug screening [1,2,3]. However, the success of such predictive models relies largely on the availability of sufficient amounts of data with coherent and comprehensive annotations. In the clinic, we are often short of large volumes of coherent in vivo patient data with drug treatment and response history. As a result, most drug response prediction studies to date have mainly utilized transcriptomic profiles from panels of in vitro cancer cell lines. Although such an approach is promising, the utility of drug response predictive models built with in vitro data is often limited when applied to actual patients due to the genetic and environmental differences between in vitro cell lines and patient-derived tissue samples, as well as various confounding factors and overwhelming context-specific patterns that may mask intrinsic biological signals.

Figure 1: Illustration of the CODE-AE method. Given labeled cell line drug response data, the aim of CODE-AE is to predict an individual patient's clinical response to drugs that have been tested in the cell line model but never in the patient. Conceptually, CODE-AE consists of four steps. (1) The unlabeled gene expression profiles of both cell lines and patients are mapped into an embedding space using unsupervised learning. (2) Confounding factors are disentangled from intrinsic biomarkers in the embedding. (3) The distribution of patient embeddings is aligned with that of cell line embeddings. (4) A supervised model is trained on the deconfounded and aligned embeddings of cell lines and tested on the deconfounded and aligned embeddings of patients.
The inability to predict patient-specific in vivo drug responses from in vitro screening data using machine learning originates from the fundamental challenge of the out-of-distribution (OOD) problem. The underlying assumption of existing machine learning methods is that the training data and the unseen test data follow the same distribution. When a machine learning model trained on cell line in vitro data is applied to patient in vivo samples, its performance can deteriorate significantly due to the distribution shift. Current efforts toward solving the OOD problem include domain adaptation and meta-learning. Many domain adaptation methods have been proposed in computer vision and natural language processing. However, their application to aligning in vitro with in vivo data can be suboptimal due to the noisy and heterogeneous nature of omics data. An adversarial deconfounding autoencoder (ADAE) was proposed to facilitate the domain adaptation of gene expression profiles [4], but ADAE has not been tested for translating in vitro data to in vivo data. A meta-learning approach named TCRP was recently proposed [5] to improve the transferability of predictive drug response models from in vitro screens to in vivo settings. However, TCRP still requires a certain number of patient samples for each drug tested to train the predictive model. It is often infeasible to obtain such data, especially for new drugs; thus the actual clinical application of TCRP is limited, as it applies only to the few-shot learning scenario, not zero-shot learning. Most relevant to this work, Jia et al. applied Variational Autoencoder (VAE) pre-training followed by Elastic Net supervised training (VAEN) to learn cell line models and used them to impute in vivo drug responses [6]. However, due to the fundamental limitations of the VAE, VAEN is not optimized to reliably transfer cell line data to patient samples or to disentangle confounding factors [6].
The unsolved question is how to robustly predict an individual patient's response to a new drug that has never been tested in patients, using only in vitro drug screens, in the setting of zero-shot learning. To address this problem, we propose a novel Context-aware Deconfounding Autoencoder (CODE-AE). In CODE-AE, we devised a self-supervised (pre)training scheme to construct a feature encoding module that can be easily tuned to adapt to different downstream tasks. We leverage both unlabeled cell line and tissue samples for the self-supervised (pre)training of the encoder. The unique features of CODE-AE are that it can extract both common biological signals shared by incoherent samples and private representations unique to them, and separate confounding factors from both. CODE-AE allows us to generalize existing cell line omics data for the robust prediction of in vivo patient-specific responses to new drugs in the setting of zero-shot learning, a critical component of patient-specific drug screening and personalized medicine.
To show the performance lift achieved by CODE-AE, we performed exhaustive comparative studies of CODE-AE (and its variants) and other competing methods on the breast cancer patient-derived tumor xenograft ex vivo PDTC dataset [7]. Moreover, to demonstrate the potential of CODE-AE in personalized medicine, we applied CODE-AE to predicting in vivo chemotherapy resistance for patients, a significant obstacle to effective cancer therapy. The lack of effective personalized chemotherapy tailored to individual patients often leads to unnecessary suffering and reduces a patient's chances of overall survival. Our extensive studies show that CODE-AE effectively alleviates the out-of-distribution problem when transferring the cell line model to patient samples. It significantly outperforms the state-of-the-art methods ADAE [4], TCRP [5], VAEN [6], and COXEN [8], which are specifically designed for transcriptomics data, as well as other popular domain adaptation methods, the variational autoencoder (VAE) [9], denoising autoencoder (DAE) [10], Deep CORAL [11], and Domain Separation Network (DSN) [12], in terms of both accuracy and robustness. Using CODE-AE, we screened 59 drugs for 9,808 cancer patients. The in vivo drug screening not only further validated CODE-AE but also identified novel personalized anti-cancer therapies and drug response biomarkers. Thus CODE-AE provides a useful framework for taking advantage of rich in vitro omics data to develop generalizable clinical predictive models.

Overview of CODE-AE
As illustrated in Figure 1, given the gene expression profiles of labeled cell lines and unlabeled patients as input, CODE-AE learns a nonlinear embedding function. The embedding function projects the high-dimensional expression profile of each cell line or patient to a low-dimensional vector, distinguishes biologically meaningful signals from confounding factors, and transforms the embeddings of cell lines and patients into similar distributions. The embedding function is learned using both labeled and unlabeled data; thus, CODE-AE is able to generalize to unlabeled data. Furthermore, aligning the distributions of cell line and patient embeddings across labeled and unlabeled samples alleviates the OOD problem.
Algorithmically, CODE-AE pretrains the neural network using an autoencoder that minimizes a data reconstruction error (see Methods). The pretraining step is useful for generalization to an unlabeled dataset. The overall architecture of CODE-AE is shown in Figure 2. Unlike conventional autoencoders such as the VAE, CODE-AE has two unique features. First, it learns signals shared between the cell line data and the patient data as well as private signals unique to each. The rationale is to disentangle common biological signals between datasets from context-specific patterns that overwhelm drug response biomarkers [5]. Second, CODE-AE regularizes the embeddings of cell lines and patients so that their distributions are similar; in this way, the knowledge learned from the cell line model can be transferred to patients. We test three regularization methods: simple concatenation of cell line and patient embeddings (CODE-AE-Base), minimization of their MMD loss (CODE-AE-MMD), and minimization of their adversarial loss (CODE-AE-ADV). After the unsupervised pretraining, a supervised drug response model is trained from the aligned common embedding using the labeled cell line data. For a new patient, the drug response is then predicted by the trained cell line model from the patient's pretrained common embedding. Thus, CODE-AE does not need any labeled patient samples to construct the predictive model.
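The zero-shot inference path described above can be made concrete with a minimal sketch. The encoder and classifier weights below are random stand-ins (hypothetical), not the paper's trained CODE-AE model; the point is only that a patient profile is scored by a classifier that never saw labeled patient data.

```python
import numpy as np

rng = np.random.default_rng(0)
d_gene, d_emb = 1426, 128  # input genes (as in the paper) / an assumed embedding size

# Stand-ins for the pretrained shared encoder E_s and the supervised head
# trained on labeled cell line embeddings (random weights, illustration only).
W_enc = rng.normal(size=(d_gene, d_emb)) / np.sqrt(d_gene)
w_clf, b_clf = rng.normal(size=d_emb) * 0.1, 0.0

def predict_response(x_patient):
    """Zero-shot prediction: embed the patient profile with the shared
    encoder, then apply the cell-line-trained classifier."""
    z = x_patient @ W_enc                  # shared embedding E_s(x)
    logit = z @ w_clf + b_clf
    return 1.0 / (1.0 + np.exp(-logit))    # probability of response

x = rng.normal(size=d_gene)                # one patient expression profile
p = predict_response(x)
print(p)                                   # in (0, 1); no labeled patient data used
```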

The optimal configuration of CODE-AE variants
We first studied the performance differences of the CODE-AE variants under different configurations of the learning paradigm. In particular, the configuration choices we explicitly explored were (1) whether to include a hidden representation normalization layer, (2) whether to use only the shared representation or the concatenation of private and shared representations for the downstream task, and (3) the loss function for measuring the similarity of tissue embeddings to cell line embeddings. We evaluated the performance of these CODE-AE variants using the PDTC test dataset. For each drug, we ranked the performance of the twelve models in the comparison from best (rank = 1) to worst (rank = 12) based on the area under the ROC curve (AUROC). The final performance was determined by the average rank across all drugs. As shown in Table 1 and Supplemental Figure S1, the overall best-performing CODE-AE variant is CODE-AE-ADV with hidden representation normalization, cross-domain features aligned over the shared representation, and adversarial loss. Hidden representation normalization prevents the embeddings from being pushed toward meaningless zero-valued vectors by the soft orthogonality loss. The better performance achieved by using only the shared representation in downstream tasks is aligned with our assumption that the shared representation is rich in transferable, deconfounded, biologically meaningful information. In the following sections, we compare only CODE-AE-ADV with the baseline models and apply it to the actual prediction tasks.
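The average-rank evaluation can be computed as in this short sketch (hypothetical scores; note that ties are broken arbitrarily by the sort rather than averaged):

```python
import numpy as np

def average_rank(auroc):
    """auroc: (n_drugs, n_models) array of per-drug AUROC scores.
    Returns the average rank (1 = best) of each model across drugs."""
    order = (-auroc).argsort(axis=1)   # model indices sorted by descending AUROC
    ranks = order.argsort(axis=1) + 1  # rank of each model within each drug
    return ranks.mean(axis=0)          # average rank per model across all drugs

# Hypothetical scores for 3 drugs x 3 models
scores = np.array([[0.90, 0.70, 0.80],
                   [0.60, 0.80, 0.70],
                   [0.85, 0.50, 0.60]])
print(average_rank(scores))  # model 0 wins two drugs, so it has the lowest average rank
```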

CODE-AE-ADV alleviates the out-of-distribution problem when transferring cell line models to patient data
We used the shared encoder from the pre-trained CODE-AE-ADV to generate new representations for in vivo TCGA patient samples and in vitro CCLE cell line samples. To inspect how well the embeddings of cell line data and patient samples are aligned, we generated tSNE plots to visualize the embeddings, as shown in Figure 3. The embeddings of TCGA and CCLE samples largely overlap in the tSNE manifolds, indicating that CODE-AE-ADV is effective in aligning cell line and patient representations. In comparison, the low-dimensional representations of CCLE and TCGA data are clearly separated when using the original gene expression profiles or a vanilla autoencoder. Thus, CODE-AE-ADV is more effective in addressing the out-of-distribution problem than the embedding algorithms used by the state-of-the-art method VAEN [6].

CODE-AE-ADV outperforms state-of-the-art models when predicting ex vivo drug responses
We then compared CODE-AE-ADV with the baseline models using ex vivo drug response data from PDTC. As shown in Table 2, CODE-AE-ADV is overall the best performer on the PDTC test dataset when evaluated by the average of predicted ranks in terms of both the area under the ROC curve (AUROC) and the area under the precision-recall curve (AUPRC). When evaluated by AUROC, CODE-AE significantly outperformed the second-best performer ADAE, which is specifically designed to remove confounders [4], and VAEN [6], which was developed for predicting in vivo drug responses from in vitro screens. When evaluated by AUPRC, CODE-AE-ADV is still significantly better than ADAE but only slightly better than VAEN. Interestingly, DSN, CORAL, and DANN, state-of-the-art domain adaptation methods in computer vision and natural language processing, do not perform well, even worse than the standard VAE. This observation suggests that omics data may be fundamentally different from images and human language, and that specially designed deep learning models are needed to address the challenges of omics data integration and predictive modeling. Among these domain adaptation methods, DSN performs the best. DSN uses the same idea as CODE-AE of separating common and unique features between two domains, which suggests the importance of disentangling shared and private information between cell line and patient samples. It is not surprising that models incorporating unlabeled pretraining clearly outperform those without it. Note that all models used exactly the same data and training procedure for the pretraining (details in the Methods section); the differences lie in the neural network architecture and the loss function. COXEN is on average the best performer among models without unlabeled pre-training.
Figure 4 shows the drug-wise performance of each algorithm, as measured by the AUROC and AUPRC of predicted drug responses for each drug across all mice. CODE-AE-ADV performed the best for three drugs, BX795, Obatoclax Mesylate, and Axitinib, with AUROC above 0.8. The AUPRC of CODE-AE-ADV was quite stable, remaining above 0.75 for most drugs. Overall, among the 50 drugs, CODE-AE-ADV and VAEN ranked the best for 22 drugs and 14 drugs, respectively, in terms of AUPRC. For several drugs, such as PD173074, AZD7762, and sorafenib, the performance of CODE-AE-ADV was consistently worse than that of other baselines. The reason for this performance disparity is unclear and worth further investigation. From a practical perspective, it may be worthwhile to use different methods for different drugs.

CODE-AE-ADV outperforms state-of-the-art models when predicting patient chemotherapy resistance
We further evaluated the performance of CODE-AE-ADV for predicting clinical chemotherapy resistance, defined in two ways: a lack of reduction in tumor size following chemotherapy, or the occurrence of clinical relapse after an initial "positive response to treatment" [13], as detailed in the Methods section. Again, we used the AUROC and AUPRC of the predicted drug response for each drug across patients to evaluate performance. The results are shown in Figure 5.
Consistent with the results from the PDTC dataset, CODE-AE-ADV consistently outperformed the baseline models in most cases when evaluated by AUROC. CODE-AE-ADV achieved the highest AUROC in 5 out of 7 cases with a statistically significant (two-sided t-test p-value ≤ 0.05) performance gain, and the second highest in the other 2 without a statistically significant difference from the best one. CODE-AE-ADV is overall the best performer in this task, as shown in Table 2, when evaluated by the average predicted AUROC ranks. VAEN performs relatively well in the case of relapse days after treatment, but much worse than CODE-AE-ADV, ADAE, and TCRP when evaluated by the clinical diagnosis. Furthermore, CODE-AE-ADV significantly outperforms ADAE by a large margin in all cases. This observation further supports that CODE-AE-ADV can enhance the signal-to-noise ratio in biomarker identification, because the major difference between CODE-AE-ADV and ADAE is the disentanglement of shared and private embeddings between cell lines and patient tissues. When the performance is evaluated by AUPRC, there is no clear winner among CODE-AE-ADV, VAEN, and VAE; there are no statistically significant differences between any two of them. TCRP and COXEN are inferior to the other methods that take advantage of pre-training with unlabeled data, demonstrating the importance of pre-training in few-shot and zero-shot learning. CODE-AE-ADV performed the best for Fluorouracil and Temozolomide, with p-values of 0.0028 and 0.0296, respectively. In the case of the drugs Sorafenib, Gemcitabine, and Cisplatin, CODE-AE-ADV ranked second best, and its performance difference from the respective best-performing methods is not statistically significant, with p-values of 0.3609, 0.8746, and 0.3102, respectively.

CODE-AE-ADV is successful in deconfounding biological variables
To show that CODE-AE-ADV can generate transferable embeddings by deconfounding uninteresting confounders while preserving the true biological signals present in expression data, we selected the gene expression datasets used in ADAE [4], which represents the state of the art for deconfounding biological variables, and performed the same evaluation process. Specifically, we chose the TCGA brain cancer expression dataset with gender information as the confounding factor and brain cancer subtype classification as the target downstream task. We first performed encoder training with all unlabeled gene expression profiles regardless of gender. Then, we trained elastic net classifiers using the labeled data from one gender to predict the cancer subtypes of the other gender. Following the evaluation procedure described in [4], the classification performance measured by the area under the precision-recall curve (AUPRC) and the area under the receiver operating characteristic curve (AUROC) in ten-fold cross-validation is reported in Table 3.
In addition, we performed a two-sample t-test on the average performance between CODE-AE-ADV and the best non-CODE-AE method in each setting; the results are shown in the last row of Table 3. We observed the same trends as in the drug resistance prediction. Using the model built from female data to predict male data, CODE-AE-ADV significantly outperforms ADAE, the second-best performer, measured by both AUROC and AUPRC. When applying the model trained on male data to predict female data, the performance of CODE-AE-ADV is slightly worse than CORAL, but the difference is not statistically significant. Both CODE-AE-ADV and CORAL significantly outperform the state-of-the-art deconfounding method ADAE (p-value ≤ 0.05). Additionally, two other observations from the drug response experiments hold: disentangling common and private features of different data modalities is essential for cell-line-to-tissue transfer learning, and adversarial loss is more effective than MMD loss.

Application of CODE-AE-ADV to personalized medicine
To further validate CODE-AE-ADV with patient data and demonstrate its utility in personalized medicine, we applied CODE-AE-ADV (per-drug) trained with CCLE data to screen 59 drugs for 9,808 cancer patients from TCGA.Our major findings are summarized below.

Differential expression analysis of drug targets verifies predicted patient drug responses from CODE-AE-ADV
We first verified our predictions by checking the association of the predicted drug responses with the gene expression values of the drug targets. If the predicted patient response to a targeted therapy is correlated with its drug target, it provides validation of the prediction. We selected the top 5% of predicted drug-sensitive patients as the responsive patient set and the bottom 5% as the resistant patient set. We found that 47 out of 50 targeted therapies show statistically significant (p-value ≤ 0.05) differential target gene expression between drug-sensitive and resistant patients (Supplemental Table S1). This indicates that CODE-AE-ADV can capture a drug's mode of action.
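The top/bottom-5% split and a differential-expression test statistic can be sketched as follows, on toy data. The Welch t statistic here stands in for whatever exact test the paper used (an assumption); a p-value would additionally require evaluating the t distribution (e.g., with scipy).

```python
import numpy as np

def responder_groups(pred_scores, frac=0.05):
    """Split patients into predicted responsive (top frac) and
    resistant (bottom frac) sets by predicted sensitivity score."""
    n = max(1, int(len(pred_scores) * frac))
    order = np.argsort(pred_scores)      # ascending
    return order[-n:], order[:n]         # (responsive indices, resistant indices)

def welch_t(a, b):
    """Welch's t statistic for the difference in mean target expression."""
    va, vb = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)
    return (a.mean() - b.mean()) / np.sqrt(va + vb)

rng = np.random.default_rng(0)
scores = rng.random(200)                             # hypothetical predicted sensitivities
target_expr = scores * 2 + rng.normal(0, 0.1, 200)   # toy target-gene expression
resp, resist = responder_groups(scores)
t = welch_t(target_expr[resp], target_expr[resist])
print(t)  # a large positive t: target expression differs between the two groups
```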

CODE-AE-ADV identifies precision anti-cancer therapy
We applied spectral biclustering to divide the 9,808 patients into 100 clusters and the 59 drugs into 30 clusters based on the predicted drug response matrix. In this way, patients with similar drug response profiles were grouped together. The clustering result for lung squamous cell carcinoma (LSCC), a type of non-small cell lung cancer (NSCLC), is shown in Figure 6A. The 498 LSCC patients were clustered into 45 groups (Figure 6B). The number of patients in each group ranged from 1 (0.2%) to 60 (12.0%).
Among the 59 drugs tested, the top 3 drugs to which LSCC was predicted to be most responsive are gefitinib, AICAR, and gemcitabine. Gefitinib is an EGFR tyrosine kinase inhibitor used in the first-line treatment of NSCLC [14]. AICAR is an AMPK agonist that can block the growth of cancer cells harboring the activated EGFR mutant [15]. Recently, a novel AMPK agonist was discovered to effectively suppress the growth of an EGFR-mutated NSCLC cell line [16]. Gemcitabine, a chemotherapy, has long been used as one of the most effective treatments for NSCLC that may not harbor the EGFR mutation [17,18]. Consistent with their modes of action, the CODE-AE-predicted patient profiles of gefitinib and AICAR are similar to each other but different from that of gemcitabine, as shown in Figure 6A. We further inspected the patient cluster predicted to be the most responsive to gefitinib that contained more than 30 LSCC patients. Besides LSCC, four other cancer types, head & neck squamous cell carcinoma (HNSCC), cervical & endocervical cancer, bladder urothelial carcinoma, and esophageal carcinoma, were included in this cluster (Figure 6C). Several clinical studies of gefitinib for the treatment of HNSCC have been carried out or are ongoing (e.g., https://clinicaltrials.gov/ct2/show/NCT00024089), owing to the fact that EGFR is over-expressed in over 90% of HNSCC patients [19,20]. A meta-analysis of existing clinical results showed that the efficacy of gefitinib on HNSCC was not significantly different from that of other chemotherapies [21]. Our prediction suggests that only a fraction of HNSCC patients would benefit from gefitinib treatment. Thus, it is necessary to stratify patients based on their drug response profiles in the design of clinical trials, and CODE-AE could be a useful tool for this purpose. Similarly, clinical trials of gefitinib for cervical cancer (https://clinicaltrials.gov/ct2/show/NCT00049556), bladder urothelial carcinoma (https://clinicaltrials.gov/ct2/show/NCT00246974) and esophageal
carcinoma (https://clinicaltrials.gov/ct2/show/NCT01243398) are ongoing, since certain patients diagnosed with these cancers have been observed to respond to EGFR inhibitors [22,23,24]. Thus, the predictions from CODE-AE are largely consistent with clinical observations.

Conclusion
In this paper, we introduce a new transfer learning framework, CODE-AE, to predict individual patient drug responses from a supervised neural network model trained on cell line data. Extensive benchmark studies demonstrate the advantage of CODE-AE over the state of the art in terms of both accuracy and robustness. When CODE-AE was applied to predict drug responses for patients in TCGA, the predictions were largely consistent with existing clinical observations. The performance gain of CODE-AE mainly comes from (1) unsupervised learning that combines unlabeled data from both cell lines and patient samples, (2) separation of features shared across cell lines and patient samples from embeddings unique to cell lines or patients, and (3) adversarial training to optimize the similarity and difference between incoherent datasets. CODE-AE could be further improved in several directions. In contrast to cell line data from a pure population of cells, patient tissue data are mixtures of normal, abnormal, and infiltrating immune cells; we could further improve CODE-AE by deconvolution of patient gene expression data. We used only transcriptomics profiles to build the predictive model in this study; additional omics data such as somatic mutations and copy number variants could be integrated in the framework of cross-level information transmission [25]. Finally, we applied CODE-AE only to cancers. It will be interesting to test the performance of CODE-AE on other diseases besides cancer, which may not even have a large amount of cell line data. In principle, CODE-AE can be applied to other transfer learning tasks with two data modalities that have shared and unique features.

Methods

We proposed CODE-AE to generate biologically informative gene expression embeddings that transfer knowledge from in vitro data to patient samples. CODE-AE employs the standard autoencoder as the backbone to leverage the unlabeled gene expression datasets. Inspired by the work on factorized latent spaces [26] and
domain separation networks [12], we encode the samples (from cell lines or tumor tissues) into two orthogonal embeddings, namely private embeddings and shared embeddings. The former are designed to capture the context-specific signals that overwhelm the common biomarkers; the latter contain the deconfounded common intrinsic biological signals used to transfer knowledge between cell lines and tissues.

CODE-AE Base
As shown in Figure 2, CODE-AE takes expression vectors from in vitro cell lines and patient tumor tissue samples as input. Let $X_t$ and $X_c$ denote the unlabeled data sets of $N_t$ patient tumor tissue samples and $N_c$ in vitro cancer cell line samples, respectively. Each sample $x$ is encoded into two separate embeddings through its corresponding cell line or tissue private encoder $E_{\bullet p}$ and the weight-sharing encoder $E_s$. The concatenation of these two embeddings of each sample is expected to reconstruct the original gene expression vector $x$ through a shared decoder $D$:

$$\hat{x}_\bullet = D\big([E_s(x_\bullet);\, E_{\bullet p}(x_\bullet)]\big),$$

where $\bullet \in \{c, t\}$ indexes the input gene expression profile, $\hat{x}_\bullet$ is the corresponding reconstructed sample, and $[\cdot\,;\cdot]$ stands for vector concatenation. We measure the quality of the autoencoder reconstruction through the mean squared error between the original samples and the reconstruction output:

$$\mathcal{L}_{recon} = \frac{1}{N_c}\sum_{i=1}^{N_c}\big\|x_c^{(i)} - \hat{x}_c^{(i)}\big\|_2^2 + \frac{1}{N_t}\sum_{i=1}^{N_t}\big\|x_t^{(i)} - \hat{x}_t^{(i)}\big\|_2^2.$$

In our formulation, we factorize each sample's latent space into two subspaces to capture domain-specific and common information separately. To minimize redundancy between the factorized latent spaces, we include an additional penalty term $\mathcal{L}_{diff}$ in the form of an orthogonality constraint. The difference loss $\mathcal{L}_{diff}$ is applied to both cell line and tissue samples and encourages the shared and private encoders to encode different aspects of the inputs. We define the loss via a soft subspace orthogonality constraint:

$$\mathcal{L}_{diff} = \big\|Z_{cs}^\top Z_{cp}\big\|_F^2 + \big\|Z_{ts}^\top Z_{tp}\big\|_F^2,$$

where $Z_{\bullet s}$ are the embedding matrices whose rows are the shared embeddings of cell line or tissue samples, and $Z_{\bullet p}$ are the embedding matrices whose rows are the corresponding private embeddings. $\mathcal{L}_{diff}$ alone tends to push the embeddings toward meaningless all-zero vectors; to avoid this scenario, we append an instance normalization layer after the output layer of each encoder. Lastly, the loss for CODE-AE-BASE is defined as a weighted combination of $\mathcal{L}_{recon}$ and $\mathcal{L}_{diff}$:

$$\mathcal{L}_{base} = \mathcal{L}_{recon} + \alpha\,\mathcal{L}_{diff},$$

where $\alpha$ is the embedding difference loss coefficient.
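A minimal numerical sketch of the CODE-AE-BASE objective, using random linear maps as stand-ins for the neural encoders/decoder (all sizes and the value of α below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_emb, n = 20, 4, 8                      # toy sizes (hypothetical)
X = rng.normal(size=(n, d_in))                 # toy expression profiles (one domain)
Ws, Wp, Wd = (rng.normal(size=s) * 0.1 for s in
              [(d_in, d_emb), (d_in, d_emb), (2 * d_emb, d_in)])

Zs, Zp = X @ Ws, X @ Wp                        # shared / private embeddings
X_hat = np.concatenate([Zs, Zp], axis=1) @ Wd  # shared decoder on the concatenation

L_recon = np.mean(np.sum((X - X_hat) ** 2, axis=1))   # reconstruction MSE
L_diff = np.sum((Zs.T @ Zp) ** 2)              # squared Frobenius norm of Zs^T Zp
alpha = 0.1                                    # illustrative coefficient
L_base = L_recon + alpha * L_diff
print(L_base)
```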

CODE-AE Variants
With CODE-AE-BASE, we can split a cell line or tissue sample's inherent information into private and shared streams. However, in our baseline experiments, we often found this to be sub-optimal or to show varied performance. Thus, we propose two variants with better and generally more stable performance. Under the CODE-AE framework, each input sample is factorized into two orthogonal embeddings, whose concatenation is considered the new representation of the original input. Given that all samples under consideration are gene expression profiles, regardless of whether they come from cell lines or patients, we assume that the new representations in the factorized latent space should be close to each other in terms of distributional differences. Hence, we incorporate an additional feature-alignment component into the CODE-AE-BASE framework. Specifically, the distributional difference between the concatenated private and shared embeddings of cell line and tumor tissue samples is minimized via the following two approaches.

CODE-AE-MMD.
The first variant, named CODE-AE-MMD, utilizes the well-known maximum mean discrepancy (MMD) [27] as the distance between the latent representations of cell line and tissue samples. The MMD loss [27] is a kernel-based distance function between samples from two distributions. In particular, we use an approximate version of the exact MMD loss in CODE-AE-MMD:

$$\mathcal{L}_{mmd} = \frac{1}{N^2}\sum_{i,j=1}^{N} k\big(z_c^{(i)}, z_c^{(j)}\big) + \frac{1}{N^2}\sum_{i,j=1}^{N} k\big(z_t^{(i)}, z_t^{(j)}\big) - \frac{2}{N^2}\sum_{i,j=1}^{N} k\big(z_c^{(i)}, z_t^{(j)}\big),$$

where $k(\cdot,\cdot)$ is a kernel function, $Z_c$ and $Z_t$ are the embedding matrices for cell line and tissue samples, respectively, whose rows are the concatenations of each sample's private and shared embeddings, and $z_\bullet^{(i)}$, $z_\bullet^{(j)}$ are the $i$-th and $j$-th samples' embedding vectors. In practice, $N$ is the batch size. Accordingly, the loss of CODE-AE-MMD is given as

$$\mathcal{L}_{CODE\text{-}AE\text{-}MMD} = \mathcal{L}_{base} + \beta\,\mathcal{L}_{mmd},$$

where $\beta$ is the MMD loss coefficient.
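A biased squared-MMD estimate can be computed as below. The RBF kernel and the bandwidth `gamma` are illustrative assumptions, not the paper's exact configuration; the sketch shows that aligned batches yield a smaller MMD than misaligned ones.

```python
import numpy as np

def rbf_mmd2(Zc, Zt, gamma=0.1):
    """Biased estimate of squared MMD between two embedding batches,
    using an RBF kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)  # pairwise sq. distances
        return np.exp(-gamma * d2)
    return k(Zc, Zc).mean() + k(Zt, Zt).mean() - 2 * k(Zc, Zt).mean()

rng = np.random.default_rng(0)
Zc = rng.normal(0.0, 1.0, size=(64, 8))        # cell line embeddings
Zt_far = rng.normal(3.0, 1.0, size=(64, 8))    # misaligned tissue embeddings
Zt_near = rng.normal(0.0, 1.0, size=(64, 8))   # aligned tissue embeddings
print(rbf_mmd2(Zc, Zt_far) > rbf_mmd2(Zc, Zt_near))  # True: alignment lowers the MMD
```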

CODE-AE-ADV
The second variant, CODE-AE-ADV, employs adversarial training to push the representations of cell line and tissue samples to be similar to each other. Specifically, we append a critic network F that scores representations, with the objective of consistently giving higher scores to representations of cancer cell line samples. The encoders for tissue samples are given an additional objective: to generate embeddings that fool the critic network into producing high scores. In this manner, the critic network and the tissue sample encoders play a min-max game in the form of an alternating training schedule, as adopted by Wasserstein generative adversarial networks (WGAN) [28].
To avoid the training instability common to alternating training schedules, we used the WGAN with gradient penalty (WGAN-GP) [29] instead of the standard WGAN [28]. Its affiliated loss terms are defined as

$$\mathcal{L}_{critic} = \mathbb{E}\big[F(\hat{z}_t)\big] - \mathbb{E}\big[F(\hat{z}_c)\big] + \lambda\,\mathbb{E}\Big[\big(\|\nabla_{\bar{z}} F(\bar{z})\|_2 - 1\big)^2\Big], \qquad \mathcal{L}_{gen} = -\,\mathbb{E}\big[F(\hat{z}_t)\big],$$

where $\hat{z}_\bullet = [z_{\bullet s};\, z_{\bullet p}]$ stands for the new representation of an input, $\bar{z} = \epsilon \hat{z}_c + (1-\epsilon)\hat{z}_t$ with $\epsilon \sim U(0, 1)$, and $\lambda$ is the gradient penalty coefficient. A detailed CODE-AE-ADV learning procedure can be found in Procedure 1. After the encoder training with unlabeled data as described above, the shared encoder $E_s$ can be used to directly generate the deconfounded, biologically meaningful embedding vectors, or a neural network module can be appended for specific downstream tasks. In the latter case, strategies such as gradual unfreezing and a decayed learning rate schedule can be adopted to further improve task-specific performance, as shown in the following experiments.
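The WGAN-GP objective can be illustrated with a toy numerical sketch. A real implementation computes the gradient penalty with automatic differentiation (e.g., PyTorch autograd); here the critic is linear (an assumption for illustration), so its gradient is constant and the penalty has a closed form.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, lam = 8, 32, 10.0                        # embedding dim, batch size, GP weight (assumed)
z_c = rng.normal(0.0, 1.0, size=(n, d))        # cell line representations
z_t = rng.normal(1.0, 1.0, size=(n, d))        # tissue representations
w = rng.normal(size=d) * 0.1                   # linear critic F(z) = z @ w

# Interpolate between cell line and tissue samples (one epsilon per pair),
# as WGAN-GP does before evaluating the gradient penalty.
eps = rng.uniform(size=(n, 1))
z_bar = eps * z_c + (1 - eps) * z_t            # unused below: a linear critic's gradient
                                               # is w at every point, including z_bar
grad_norm = np.linalg.norm(w)                  # ||grad_z F(z)||_2, closed form
gp = (grad_norm - 1.0) ** 2                    # gradient penalty term

L_critic = (z_t @ w).mean() - (z_c @ w).mean() + lam * gp
L_gen = -(z_t @ w).mean()                      # tissue encoder tries to raise F(z_t)
print(L_critic, L_gen)
```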

Gene set enrichment analysis
For each drug under consideration, we selected the top 5% of predicted drug-sensitive patients as the responsive patient set and the bottom 5% as the resistant patient set. Then, we input each responsive/resistant patient's tumor tissue gene expression profile (~20,000 genes) and its predicted response label into GSEA [30,31]. Our enrichment analysis focused on the oncogenic signature gene sets; each gene set includes a list of genes that are regulated after perturbation of some cancer-related gene. For each drug, the gene sets statistically enriched in sensitive and resistant patient tissues were explored further.

Clustering analysis
We grouped the 9,808 patients and 59 drugs into 100 patient clusters and 30 drug clusters with spectral biclustering based on the predicted drug response profiles []. For each cluster, we averaged the tissue drug response score profiles and then ranked the drugs based on the average scores. A higher score indicates that the cluster of patients is more sensitive to the drug.

Unlabeled pre-training (in vitro and in vivo). The unlabeled datasets used for encoder pre-training include cancer cell line and patient tumor tissue gene expression profiles. Specifically, we collected 1,305 cancer cell line samples with corresponding gene expression profiles from the DepMap portal [32] and 9,808 patient tumor tissue samples from The Cancer Genome Atlas (TCGA) [33]. All gene expression data are normalized to standard transcripts per million for each gene, with an additional log transformation. In addition, we used the gene selection method of reference [34] to select the top 1,000 varied genes, measured by the percentage of unique values in the gene expression samples, for cancer cell lines and tumor tissue samples separately. We then combined the two sets of top 1,000 varied genes as the input features, giving a total of 1,426 genes in the feature set. We also explored other feature selection approaches, such as mean absolute difference (MAD), but only report results based on the genes selected by the percentage of unique values, because the baseline methods under consideration showed overall better performance with them.
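The unique-value-based gene selection can be sketched as follows (a simplified illustration on toy data; the exact procedure of reference [34] may differ):

```python
import numpy as np

def top_varied_genes(expr, k):
    """Rank genes by the percentage of unique values across samples
    (a proxy for variability) and return the indices of the top k."""
    n_samples = expr.shape[0]
    uniq_frac = np.array([len(np.unique(expr[:, g])) / n_samples
                          for g in range(expr.shape[1])])
    return set(np.argsort(-uniq_frac)[:k])

rng = np.random.default_rng(3)
n_genes = 50
cell_lines = rng.normal(size=(100, n_genes))   # toy cell line profiles
tissues = rng.normal(size=(300, n_genes))      # toy tissue profiles
# Make the first 10 genes nearly discrete so they have few unique values
cell_lines[:, :10] = np.round(cell_lines[:, :10])
tissues[:, :10] = np.round(tissues[:, :10])

# Select per dataset, then take the union as the input feature set
features = top_varied_genes(cell_lines, 20) | top_varied_genes(tissues, 20)
print(len(features))  # between 20 and 40, depending on the overlap
```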
Labeled fine-tuning (in vitro). The labeled dataset used in the fine-tuning phase was collected from GDSC [35,36]. GDSC records the cellular growth response of cancer cell lines to a panel of drugs as the area under the drug response curve (AUC), defined as the fraction of the total area under the drug response curve between the highest and lowest screening concentrations in GDSC. For each drug of interest, we first identified all cell lines with a measured AUC for that drug and then split their sensitivities into binary labels, responsive or non-responsive (resistant). The categorization threshold was set to the average AUC over all available cell line sensitivities for the drug.
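The per-drug mean-threshold labeling can be sketched as below. The direction of the labels (AUC below the mean treated as responsive, since a lower viability AUC indicates stronger growth inhibition) is our assumption for illustration; the paper only specifies the mean as the threshold.

```python
import numpy as np

def binarize_auc(aucs):
    """Binarize per-drug cell line AUCs using the mean AUC as threshold.
    Assumption for this sketch: AUC below the mean -> responsive (1),
    otherwise resistant (0)."""
    aucs = np.asarray(aucs, dtype=float)
    threshold = aucs.mean()
    return (aucs < threshold).astype(int)

# four toy cell lines for one drug; mean AUC = 0.5
labels = binarize_auc([0.2, 0.4, 0.6, 0.8])
```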

Test Dataset
We evaluated CODE-AE in the zero-shot learning setting, i.e., the unseen out-of-distribution (OOD) data were never used in training. This is a more difficult but more realistic scenario than that of the state-of-the-art method TCRP [5], in which a small set of OOD data is used during training. Specifically, the predictive model for each drug of interest was learned only from the aforementioned in vitro dataset. At test time, we evaluated the model on the drug response classification task with the following ex vivo and in vivo labeled datasets, which were not used in the training phase, representing pre-clinical and clinical scenarios, respectively.
Pre-clinical (ex vivo). We used data from breast cancer PDTC [7] to evaluate drug response classification in a pre-clinical context. The original study collected 83 human breast tumor biopsies and established human cell cultures from these tumors with mice as intermediaries. Each of these cultures was exposed to a list of drugs. From the drugs available in PDTC, we selected 50 drugs with known protein targets whose cell-line responses had also been recorded in GDSC. The sensitivity classification of each drug was treated as a separate learning task. Analogous to the labeled GDSC dataset used during training, the PDTC responses were categorized into binary labels using PDTC AUCs, with the classification threshold set to the median AUC over all available PDTC AUCs for each drug of interest.

Clinical (in vivo).
To evaluate the performance of drug response classification in a clinical context, we consider a practical problem: predicting chemotherapy resistance from patients' gene expression profiles while training the predictive model only on the gene expression profiles of cancer cell lines.
Clinical chemotherapy resistance can be defined either as a lack of reduction in tumor size following chemotherapy or as clinical relapse after an initial "positive response to treatment" [13]. Hence, we extracted data sets assessing both aspects. Patient clinical drug responses were acquired from a recent work [34], which extracted from TCGA [33] the clinical response records of patients treated with the two chemotherapy agents Gemcitabine and Fluorouracil. Patients were split into two groups: responders, who had a partial or complete response, and non-responders, who had a progressive or stable disease diagnosis. Only patients on single-drug therapy for the entire duration of treatment were retained. Note that the gene expression profiles of these TCGA patients could be used in the unsupervised pre-training of CODE-AE and the baseline models, but the drug response data were not used in the supervised fine-tuning.
In addition to using clinical diagnosis to indicate patients' responses to a particular drug, we extracted patients' "new tumor events days after treatment" from TCGA [33] as a criterion to divide patients into responders and non-responders, with the median number of days to a new tumor event as the threshold. As in the data set from [34], only patients on single-drug therapy throughout the entire treatment duration were included in this test set. From the list of drugs in this test dataset, only drugs with more than 20 labeled samples were kept.
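The median-days labeling rule can be sketched as follows. The direction (a later new-tumor event counted as a responder) is our assumption for illustration, and the function name and data are hypothetical.

```python
from statistics import median

def label_by_relapse_days(days):
    """Split patients into responders (1) and non-responders (0) using the
    median number of days to a new tumor event as the threshold.
    Assumption for this sketch: a later event indicates a better response."""
    thr = median(days)
    return [1 if d > thr else 0 for d in days]

# four toy patients; median = 242.5 days
labels = label_by_relapse_days([30, 120, 365, 500])
```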

Baseline models
We compared CODE-AE with the following baseline models that include unlabeled pre-training: VAEN [6], the standard autoencoder (AE) [37], the denoising autoencoder (DAE) [10], and the variational autoencoder (VAE) [9], as well as representative domain adaptation methods including deep CORAL [11] and the domain separation network (DSN) [12] in both its MMD (DSN-MMD) and adversarial (DSN-DANN) training variants. Furthermore, we included the more recent adversarial deconfounding autoencoder (ADAE) [4], given its similar formulation to DANN [38] and its state-of-the-art performance on transcriptomics data sets. For the CODE-AE variants, we also explored different configurations, such as with/without hidden layer normalization and performing the downstream task with a concatenated versus a shared representation, in an ablation study on the PDTC test dataset.
For fair comparison, all encoders and decoders trained in the experiments share the same architecture. Specifically, the hidden representation has dimension 128. The encoders and the decoder are 2-layer neural network modules of dimensions (512, 256) and (256, 512), respectively, with rectified linear activations. Appended modules, such as the critic network in CODE-AE-ADV and the classifier network used for fine-tuning, are 2-layer neural networks of dimension (64, 32) with rectified linear activations; the critic network has one output node with linear activation, and the classifier networks use sigmoid activation. The loss weight terms in CODE-AE-MMD and CODE-AE-ADV are all set to 1.0. Among models that do not include unlabeled pre-training, we compared CODE-AE with COXEN [8] and TCRP [5] as well as a vanilla neural network (MLP), an elastic net classifier (EN), and a random forest classifier (RF). TCRP incorporates model-agnostic meta-learning and is one of the most successful methods to date for predicting individual patient drug response from cell line data.
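The shared architecture described above can be sketched in PyTorch. This is a minimal, faithful-to-the-dimensions sketch, not the authors' implementation; the class names, the 1,426-dimensional input (the paper's feature set size), and the batch size are illustrative.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """2-layer encoder (512, 256) mapping expression to a 128-d embedding."""
    def __init__(self, in_dim, latent=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, latent))
    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    """Mirrored 2-layer decoder (256, 512) reconstructing the input."""
    def __init__(self, out_dim, latent=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent, 256), nn.ReLU(),
            nn.Linear(256, 512), nn.ReLU(),
            nn.Linear(512, out_dim))
    def forward(self, z):
        return self.net(z)

# (64, 32) classifier head with a single sigmoid output, as in fine-tuning
classifier = nn.Sequential(
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 1), nn.Sigmoid())

enc, dec = Encoder(1426), Decoder(1426)
x = torch.randn(8, 1426)      # a batch of 8 expression profiles
z = enc(x)                    # 128-d embeddings
x_hat = dec(z)                # reconstruction
y_hat = classifier(z)         # drug sensitivity probability
```

The critic network in CODE-AE-ADV would share the (64, 32) head but end in a single linear (unbounded) output instead of a sigmoid.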

Training procedure
For models that include an unlabeled pre-training phase, we first pre-train them for N epochs using the same unlabeled samples from both cancer cell lines and tumor tissues; N is selected by grid search based on downstream task performance on the validation set. The pre-trained encoders are then appended with a classification module to perform the downstream drug sensitivity classification task in the subsequent fine-tuning step. We adopted early stopping based on validation performance in the fine-tuning phase (the training phase for models without unlabeled pre-training). Specifically, the labeled cell line samples were split into five stratified folds according to the drug sensitivity categorization. In each evaluation iteration, four of the five folds were used as the training set, and the remaining fold was used as the validation set for early stopping. Finally, the test performance of the classifier in each evaluation iteration was recorded.
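The stratified five-fold split driving early stopping can be sketched with scikit-learn. The data here are synthetic placeholders; only the split logic mirrors the procedure above.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))      # toy labeled cell line features
y = np.array([0, 1] * 25)          # balanced binary sensitivity labels

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
splits = list(skf.split(X, y))

for train_idx, val_idx in splits:
    # four folds train the classifier; the held-out fold is the
    # validation set used for early stopping in that iteration
    assert len(train_idx) == 40 and len(val_idx) == 10
    # stratification preserves the class ratio in every validation fold
    assert float(np.mean(y[val_idx])) == 0.5
```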

Performance evaluation
We chose the area under the receiver operating characteristic curve (AUROC) as the primary metric owing to its insensitivity to changes in the test data set's class distribution [39]. Model performance was measured by AUROC over the patient tissue expression data and corresponding drug response records, and different methods were compared by the average AUROC over five iterations. Note that only cell line data were used for model training and hyperparameter selection; all ex vivo tissue and patient data were used purely for testing. In addition to AUROC, the area under the precision-recall curve (AUPRC) is reported as a complementary metric because it is sensitive to imbalanced data.
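Both metrics are standard scikit-learn calls; a toy example on four labels illustrates them.

```python
from sklearn.metrics import roc_auc_score, average_precision_score

# toy ground-truth labels and predicted sensitivity scores
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

auroc = roc_auc_score(y_true, y_score)             # rank-based, class-ratio robust
auprc = average_precision_score(y_true, y_score)   # sensitive to class imbalance
```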

Figure 2 :
Figure 2: COntext-aware Deconfounding AutoEncoder (CODE-AE) framework. a) CODE-AE-BASE architecture: a layer-tying shared encoder E_s learns to map both cell line and tissue samples to extract common intrinsic biological signals. Private encoders E_·p learn to represent cell line/tissue context-specific information as private embeddings. A shared decoder D reconstructs the input samples from the concatenation of private and shared embeddings, and the reconstruction quality is measured with L_recon. The private and shared embeddings are pushed apart through a soft subspace orthogonality loss L_diff. The shared encoder E_s, appended with an additional classifier network, is trained during fine-tuning and performs inference during the testing phase. b) CODE-AE-MMD: a variation of CODE-AE-BASE where the concatenations of private and shared embeddings are kept similar by optimizing L_MMD. c) CODE-AE-ADV: a variation of CODE-AE-BASE where the concatenations of private and shared embeddings are kept similar by optimizing L_adv, which takes the form of a min-max optimization between a critic network F and the encoder components.
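A plausible form of the soft subspace orthogonality loss L_diff in Figure 2a, following the domain separation network formulation [12], is the squared Frobenius norm of the cross-Gram matrix between shared and private embeddings. This is an assumed sketch, not the authors' exact loss.

```python
import torch

def l_diff(shared, private):
    """Soft subspace orthogonality loss (assumed DSN-style form):
    squared Frobenius norm of shared^T @ private, which is zero
    exactly when the two embedding subspaces are orthogonal."""
    # shared, private: (batch, latent) embedding matrices
    return torch.norm(shared.t() @ private, p="fro") ** 2

s = torch.eye(4)                        # toy shared embeddings
p = torch.ones(4, 4) - torch.eye(4)     # toy private embeddings
loss = l_diff(s, p)                     # s^T p = p, so ||p||_F^2 = 12
```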

Figure 3 :
Figure 3: tSNE plots of a) the original expression, b) embeddings generated by a standard autoencoder, and c) embeddings generated by CODE-AE-ADV.

Figure 4 :
Figure 4: Performance comparison of PDTC drug response classification as measured by (A) AUROC, and (B) AUPRC.

Figure 5 :
Figure 5: Performance comparison of patient chemotherapy response prediction based on (A) clinical diagnosis, and (B) relapse days after treatment. When classifying the clinical diagnosis for Gemcitabine, CODE-AE-ADV outperforms the second best performer with statistical significance (p = 0.0290), while for 5-Fluorouracil, CODE-AE-ADV outperforms the second best performer with p = 0.7559. For classification based on relapse days after first treatment, CODE-AE-ADV significantly outperforms the second best performer for 5-Fluorouracil and Temozolomide, with p-values of 0.0028 and 0.0296, respectively. For Sorafenib, Gemcitabine, and Cisplatin, CODE-AE-ADV ranked second best, and its performance difference from the respective best performing methods is not statistically significant (p = 0.3609, 0.8746, and 0.3102, respectively).

Figure 6 :
Figure 6: Predicted responses of 498 lung squamous cell carcinoma (LSCC) patients to 59 drugs. (A) Patient clustering based on the predicted drug responses; the higher the prediction score, the more sensitive the patients are to the drug. The six largest patient clusters are labeled by stars. (B) The distribution of LSCC patients across 45 patient groups. (C) The distribution of patients with different primary tumors in patient cluster 44.

Table 1 :
Average of predicted ranks of the most responsive drug by CODE-AE variants on PDTC dataset

Table 3 :
Performance comparison on cancer subtype prediction with gender as a confounding factor. The best and second best performances are highlighted and underlined, respectively.