Background & Summary

Protein kinase inhibitors (KIs) belong to a class of targeted therapeutics that are being increasingly used in treatment of various cancers1. Their use and development have been accelerated in recent years as they could target tumors more effectively than most other chemotherapeutics, and their mechanisms of actions are well defined. In many cases, however, their intended- or off-target kinases serve key biological roles, which when blocked, lead to severe adverse effects2. One of the major adverse effects that lead to discontinuation of treatment with KIs is cardiotoxicity3. As a part of the NIH Library of Integrated Cellular Signatures (LINCS) program4, the main goal of the Mount Sinai Drug Toxicity Signature Generation Center (DToxS) is to better understand mechanisms of KI-associated cardiotoxicity by constructing cellular signatures of drug effects. Since different omics assays show varied sensitivities5 and offer complementary molecular information on the cellular phenotypic state6, in order to develop a comprehensive understanding of the cellular responses to KIs, we use network analyses7 to combine differential expression of genes and gene products in human cardiomyocytes treated by FDA approved KIs, analyzed using both transcriptomic and proteomic methods. In this dataset, we present the proteomic portion of the effects of KIs on human cardiomyocytes. Kinase inhibitors presented here are grouped and color-coded according to their primary target profile (see Table 1 for this and additional drug metadata).

Table 1 Drug metadata.

Quantitative proteomics technologies have evolved to become increasingly effective at identifying differentially expressed proteins among diverse experimental conditions. Currently, two broad strategies are widely used for large-scale quantitative proteomics studies: stable isotope label-based8,9 and label-free quantification (LFQ) strategies10,11. The label-based methods provide fewer experimental variations and better protein quantitation precision12,13; however, the cost for label-based proteomics reagents is prohibitively expensive for analyzing hundreds of samples. LFQ methods, on the other hand, do not involve expensive labeling reagents; thus, they can be used for clinically relevant studies of hundreds and thousands of samples if proper controls and quality guidelines are followed. LFQ methods have more comprehensive dynamic ranges and more flexibility in study design14,15,16. However, while LFQ sample throughput can be increased through this approach, it should be noted that this approach can also lead to a smaller number of identified proteins, due to limited peptide fractionation.

In this DToxS dataset, we have chosen the LFQ method in order to economically compare the proteomics signatures from over 300 samples (Supplementary Table 1) that have been collected over an extended period of time, some with limited protein yields. The transcriptomes of these samples have been previously assessed using 3′ digital gene expression RNA sequencing (RNAseq)17,18. To ensure both proteomic and transcriptomic data can be compared from the same drug-treated cells, we have optimized a method to extract proteins from the samples after RNA extraction at the Center for Advanced Proteomics Research (CAPR, http://njms.rutgers.edu/proweb/) at Rutgers University - New Jersey Medical School (Fig. 1). All the sample preparation, analysis procedures and data have gone through careful quality control steps to ensure reproducibility (Fig. 2); standard operating procedures and data descriptors are publicly available in the DToxS website (www.dtoxs.org) and both raw and MaxQuant-analyzed data are publicly available via the PRIDE database (PXD014791) as well as the LINCS Project data portal (http://lincsportal.ccs.miami.edu/dcic-portal/). We think that this large KI-induced proteomics dataset will be a valuable resource for cancer, cardiology and drug development communities.

Fig. 1
figure 1

LFQ proteomics workflow for the DToxS dataset. (a) Drug and cell treatment design and experimental workflow. (b) After drug treatment, RNA was extracted from cell pellets, and the remaining proteins were recovered via protein precipitation. The protein amount for each sample was carefully estimated from its SDS-PAGE staining intensity as measured relative to a HeLa cell lysate standard. Two micrograms of protein from each sample were analyzed by LC-MS/MS, and the resulting proteins were quantified via LFQ approach using MaxQuant.

Fig. 2
figure 2

MS/MS quality control for the DToxS dataset. A series of quality control steps have been implemented to obtain high quality data for this dataset. (a) A gel-based method was developed for protein estimation. In each SDS-PAGE gel, 25 µg of HeLa cell protein extract was run as a reference to estimate the total protein amount of each sample (blue verticle lines). Based on the density of the CBB stain of each lane in relation to the HeLa stain density, the protein amount can be calculated (see examples in Online-only Table 1). After the tryptic digestions, the resulting peptides were diluted into 0.5 µg/ml for LC-MS/MS analysis (orange verticle lines), based on the estimated protein amount in each sample. (b) Levey-Jennings performance chart showing instrument stability. Two hundred nanograms of commercial HeLa protein digest was analyzed regulary during the time period of the DToxS LINCS data acquisition. Total ion current (TIC) from each month is shown. Red lines demarcate one, two and three times the standard deviation; LCL/UCL = lower/upper control limit. (c) Deep proteome coverage indicates good instrument sensitivity. Total protein groups identified from each cell line in this study are shown, with over ~5,000 unique proteins identified per cell line. (d) The HeLa protein digests were used for LFQ normalization. Based on the MS1 LFQ counts of the HeLa cells from each gel (top panel), a normalization factor was calculated and applied to the MS1 counts of the samples. After the HeLa cell normalization, the LFQ variation was more stable.

Methods

Cell culture, drug treatments and RNA extraction

Detailed materials and methods for all the experiments in this study can be accessed as version- and quality-controlled standard operating procedures on DToxS.org. Briefly, four commercially available cell lines of primary adult human cardiomyocytes were purchased from PromoCell GmbH (Cat #: C12810; Heidelberg, Germany), expanded and differentiated under serum-free conditions for 28 days per manufacturer’s instructions. The four cell lines used in this study (Lot #: 3042901.2, 4031101.3, 2082801.2, 2120301.2) were isolated from two Caucasian male and two Caucasian female subjects, aged 54, 62, 61, 56, respectively (see cell lines PMC-A, B, D, E in Table 2). Although these cell lines were originally derived from the heart and have many cardiomyocyte-like properties, they are not excitable or contractile; hence, we refer to them as cardiomyocyte-like throughout the dataset. All four cell lines were subjected to rigorous standardized quality control metrics that included regular testing for mycoplasma contamination and confirmation of cardiac origin through RT-PCR screening of key cardiac genes (ACTN2, TNNT2, KCHN2, MEF2C, NKX2–5 and GATA4), immunofluorescent validation of cardiac-specific actinin crossfiber formation, and measurement of calcium activity. Details of extended cellular metadata for each line can also be found at DToxS.org. Briefly, cells were subcultured at 37 °C under 5% CO2 with the manufacturer-supplied and serum-supplemented growth media (Cat #: C22270, C39275) up until passage four. Once they reached 100% confluence, cells were trypsinized, counted, and replated in serum-free growth media (Cat #: C22270), at a concentration of 40,000 cells/cm2 in 60 mm tissue culture dishes. Cells were then differentiated under serum-free conditions for 28 days, whereby 50% of the media was replenished every other day. All experiments were performed on sixth or earlier passage cells. After 28 days of differentiation, cells were treated with individual KIs for 48 hours at a concentration equivalent to clinically-observed median peak plasma concentration. Forty-eight hours was chosen to mimic the in situ chronic conditions and to detect consistent, non-transient, proteomic changes. We note that no extensive cell death or apoptosis were observed. Before and after treatment cell counts and bright field images were obtained to quantify changes in cell numbers. After 48 hours of KI treatment, cells were lysed on ice using TRIzol (Thermo Fisher, Cat #: 15596026) for five minutes, scraped off the dish, and mixed. TRIzol lysates were mixed with chloroform according to manufacturer’s instructions and the RNA-protein fractions were separated. From these organic partitions, RNA samples were processed, sequenced and analyzed as previously reported18. Results of the transcriptomic signatures are reported elsewhere17.

Table 2 Cellular metadata.

Protein extraction

After isolating RNA, the remaining TRIzol solutions were processed for protein extractions (Fig. 1a). Each TRIzol extract were first separated into two 1.5 mL Eppendorf tubes; each tube contained ~600 µL of the TRIzol chloroform and methanol solvent mixture. Then to each tube, 900 µL of pre-chilled cold acetone was added and mixed well. Five hundred µL of the mixture was transferred from each tube to a third new tube, and 500 µL pre-chilled −20 °C acetone was added to each of the three tubes and mixed well. All three tubes were put on ice for at least 12 hours inside a cold room, and centrifuged at 16,000 rpm (~23,469 g) for 30 minutes, in a Beckman Coulter Allegra 64 R centrifuge. After centrifugation, the supernatants were removed and 1 mL of pre-chilled acetone was added to each tube to wash the protein pellets. The protein pellets were sonicated for five bursts at level-3 on a Branson Ultrasonic cleaner, repeated once. The protein solutions were placed on ice for four hours and pelleted via centrifugation as described above. The wash procedure for each protein pellet was repeated twice for a total of three times. The final protein pellet was washed with 50 µL of pre-chilled acetone and centrifuged at 16,000 rpm for 10 minutes, and the remaining pellet was air dried at room temperature for 10 to 15 minutes. The protein pellet was then resolubilized with a vortex mixer, with 70 µL of 8 M urea (Fisher Cat #: U15500) in 50 mM ammonium bicarbonate (Sigma Cat#: 09830-500 G), and stored at −80 °C before proteomics analysis. See additional protocol details at https://martip03.u.hpc.mssm.edu/sop.php.

Sample preparation for proteomics analysis

The protein solutions may contain TRIzol and other low mass contaminants from the RNA extraction buffer, which can confound accurate protein estimation via conventional biochemical assays. Consequently, we ran the SDS-PAGE gels to remove the low mass contaminants from the high mass proteins, and estimated the protein amounts via the stain intensities from the gels stained with the Coomassie brilliant blue (CBB) dye (Fig. 1b and Supplementary Fig. 1). In each gel, 25 µg of a HeLa cell extract produced at the CAPR (aliquots of the same preparation were used for all gels) was run in one lane as the normalization reference, along with the proteins isolated from the drug-treated cell lines. After CBB staining of the gels, the density of each sample lane was measured using the ImageJ software (NIH), and the protein amount for each drug-treated sample was calculated using the following equation: protein (µg) = HeLa protein 25 µg × (Sample density/HeLa density). The protein quantity information was later used to dilute the tryptic peptides into 0.5 µg protein equivalents per µL for the LC-MS/MS analysis (Online-only Table 1).

The in-gel digestions of proteins were performed following a procedure similar to the one described by the Mann lab19. In brief, the entire gel lane of each sample was excised into ~1 cm2 gel blocks, and washed four times with 10 mL each of a Wash Buffer containing 30% acetonitrile (ACN) and 70% of 100 mM NH4HCO3 to remove the contaminants from the RNA extraction buffer. Subsequently, one mL of 25 mM dithiothreitol (DTT) solution was added to the gel blocks for disulfide reduction at 55 °C for 30 minutes, and 1 mL of 50 mM iodoacetamide solution was then added for thiol alkylation at 37 °C in the dark for 30 minutes. After alkylation, the gel blocks were dehydrated with 9 mL of ACN to remove both DTT and iodoacetamide. For in-gel trypsin digestion, one mL of trypsin solution (5 µg/mL in 50 mM NH4HCO3) was added into each sample, and incubated at 37 °C for 16 hours. Resulting peptides were extracted, desalted with Pierce C18 spin columns (Thermo Scientific) based on the manufacturer’s protocol and concentrated in a Speed Vac prior to LC-MS/MS analysis.

LC-MS/MS analysis

According to the gel-based protein amount estimates described above, the peptides from each sample were resuspended in Solvent A (2% ACN in 0.1% formic acid (FA)) at a final concentration of 0.5 µg/µL (see examples in Online-only Table 1). Two micrograms of peptides from each sample were subjected to LC-MS/MS analysis on a Q Exactive Mass Spectrometer coupled with an UltiMate 3000 RSLCnano Duo LC system (Thermo Scientific). The peptides were first loaded onto a trapping column (Acclaim PepMap 100 C18 trap column 75 µm × 2 cm, 3 µm, 100 Å) and then separated on an Acclaim PepMap C18 column (75 µm × 50 cm, 2 µm, 100 Å), using a 4-h binary gradient from 2–95% of Solvent B (85% ACN in 0.1% FA), at a flow rate of 250 nL/min. The eluted peptides were directly introduced into the MS system for data-dependent MS/MS analysis in the positive ion mode. The MS full scans were acquired in an m/z range of 400 to 1750 in the profile mode, with the AGC value specified at 3E6, and the injection time set at 100 msec. The resolution of the full MS scan was set to 140,000 at m/z 400. Following each full MS scan, the 15 most intense ions with charge states between 2+ to 5+ were selected within an isolation window of 2 m/z for the subsequent MS/MS analysis. The AGC of MS/MS analysis was set to 5E4 and the dynamic exclusion was 45 sec. The peptide ions were fragmented using higher energy collision dissociation at a NCE of 27. A total of 308 LC-MS/MS raw files were obtained for the proteins isolated across four different cell lines for 60 different types of drug treatment conditions. The reference HeLa digest from each SDS-PAGE gel was analyzed using the same method to normalize the LFQ intensities from the other cell digests. To quality control the LC-MS/MS system performance, we also ran a Pierce HeLa Protein Digest Standard (Pierce, Cat# 88329) on the instrument, prior to running each set of drug-treated samples.

MaxQuant for protein identification and quantification

In order to evaluate the quality of the data and compare the quantitative proteomic signatures among the drug-treated samples across different cell lines, the entire 372 raw LC-MS/MS dataset (308 samples and 64 HeLa cell controls, ~825 GB of data) was submitted for database search using the Andromeda search engine on the MaxQuant platform (Version 1.6.0.13). The raw data files were loaded with “No fractions” option selected. Trypsin was selected as enzyme with two miss cleavages. Methionine oxidation (+15.9949 Da) and protein N-terminal acetylation (+42.0106 Da) were selected as various modifications and cysteine carbamidomethyl modification (+57.0215 Da) was set as a fixed modification. Initial search peptide mass tolerance was set to 20 ppm, and the main search peptide mass tolerance was set to 4.5 ppm. LFQ was selected for label-free quantification, with the minimal LFQ ratio count set at 2. The MS/MS spectra were searched against both UniProt human FASTA database (downloaded from https://www.uniprot.org/proteomes/UP000005640 with the last modification date of 10/22/2018, containing 73,101 human protein sequences) and the MaxQuant default contaminants FASTA database (containing 245 protein sequences). Match between runs was selected to maximize protein identification and quantitation with a match time window of 0.7 minutes. The protein false discovery rate (FDR) was estimated using the decoy databases containing reversed sequences of the original proteins. Proteins identified (Table 3) with both protein and peptide FDRs at or less than 1% were included in the final results for the subsequent analyses.

Table 3 Summary of MaxQuant analytics.

Data Records

All the raw data described in this study have been uploaded to the PRIDE repository at the ProteomeXchange website (https://identifiers.org/pride.project:PXD014791) with the project accession PXD014791, and they are freely available for the research community20. Online-only Table 2 provides the metadata information of drug treatments, cell lines and replication numbers for all raw mass spectrometry data files. In addition, the processed higher level data is available through LINCS Data Portal (http://lincsportal.ccs.miami.edu/dcic-portal/) with the accession LDG-1444. Data include metadata, raw files of LC-MS/MS analysis, protein FASTA database, and protein identification and quantitation results from MaxQuant21.

Technical Validation

We have developed a three-tiered quality control strategy to ensure the high throughput production of deep and reproducible DToxS proteomics datasets. This strategy, which focuses on samples, instrument and analytical pipeline, has enabled us to ensure high quality at (1) the protein sample level, (2) the LC-MS/MS instrument performance level, and (3) the overall quantitative data processing variations.

Quality validation of the protein samples for LC-MS/MS

Due to both the cost constraints of treating the human primary cardiomyocyte-like cells with KI drugs and the scientific necessity to compare proteomics data with RNAseq data from the same samples, the same drug-treated cells were used to analyze protein expression, after the RNA extraction. The challenges from this approach included occasionally uneven or poor protein yields, and imprecise protein concentration estimates with the established Bradford or BCA assays, due to the confounding components in the RNA extraction buffers. While the protein yield of our combined RNA-protein extraction approach was lower and more variable compared to standard “protein-only” protocols, we note that all of the samples investigated here had more than adequate protein yield necessary for LC-MS/MS. In order to rigorously compare drug effects with the given sample-to-sample variability in experimental processing, we performed an additional SDS-PAGE in-gel quantification step with a consistent positive control sample and internally-prepared HeLa protein lysates. These gels further served as an additional quality control step, as they were checked for signs of protein degradation (such as smearing), which were not observed in any of the samples. Seventy SDS-PAGE gels were run to increase the estimation accuracy of the proteins derived from the cells (see examples in Supplementary Fig. 1); in each gel, 25 µg of HeLa cell protein lysate was included in the first lane, as the reference for both accurate protein estimation and efficient in-gel digestion, and later for LC-MS/MS data normalization. In brief, the densities of the CBB for each KI-treated cell line in each gel lane were recorded (see examples in Online-only Table 1). The protein amounts from each sample were extrapolated by comparing its CBB density to that of the 25 µg of the internally-prepared HeLa proteins; the estimated protein amounts were later used to obtain 0.5 µg/µL equivalent of the tryptic peptides from each sample for LC-MS/MS analysis (see examples in Fig. 2a). For tryptic digestion, the drug-treated samples were digested along with the internally-prepared HeLa proteins separated on the same SDS-PAGE gels, which were later used for LC-MS/MS data normalization as well.

Quality validation of the LC-MS/MS system

To ensure that LC-MS/MS system could deliver overall satisfactory performance, 200 ng of commercial HeLa cell digest (Pierce, Cat# 88329) was run prior to each gel batch of the biological samples. To confirm well-behaved LC separations, MS1 ions for select peptides were carefully monitored for retention time shifts and peak widths. In order to minimize the run-to-run carry-over of peptides, a duo-LC system was configured in parallel for this study, so while one LC column was used for peptides separation and LC-MS/MS analysis, the other LC column could be washed to remove any residual peptides. For the overall LC-MS/MS quality control, the system is capable of identifying >3,800 unique proteins and >14,000 unique peptides from ~200 ng of the commercial HeLa cell digest. Indeed, the Levey-Jennings quality assurance plot showing the monthly total ion current (TIC) of the HeLa digest indicates consistent LC-MS/MS system performance for LFQ quantification (Fig. 2b). Qualitatively, the LC-MS/MS system enabled the deep proteome coverages (~5,000 unique protein groups/cell line) of all four DToxS cell lines (Fig. 3a and Table 3). Total 372 raw LC-MS/MS files, including the files for 308 drug treated cardiomyocyte-like cell line and 64 HeLa cell line samples, were submitted to the MaxQuant analysis (Table 3). In total, 83,674 unique peptides that mapped onto 5,994 unique protein groups were identified with the FDR of less than 1%, among the four different cell lines. These proteins were mapped onto 5,784 unique genes. Given no multi-dimensional LC fractionation was performed, these deep proteome coverages confirm the effectiveness of the LC-MS/MS quality control procedures used for this study.

Fig. 3
figure 3

Coverage and reproducibility of detected proteins in control samples. (a) Venn diagram of the overlapping and unique protein groups across all samples for the four cardiomyocyte lines PMC-A, B, D, E. (b) Normalized MS1 LFQ intensities for the same sample measured from two separate SDS-PAGE gels in two different MS/MS runs had strong agreement. (c) The correlation (mean and variation) of LFQ intensity between the control samples from the same experiment for each experiment versus the samples from different experiments in each cell line (cell lines PMC-A, B, D, E shown in color coordination). (d) The clustering of all biological replicates from four cell lines under control conditions based on the Eucledian distances of LFQ intensities of replicates. (e) Accordingly, proteomic signatures of the four cell lines under control conditions across 71 biological replicates showed strong clustering based on the source cell line.

Quality control of the quantitative data analysis

To minimize the LC-MS/MS quantification noise derived from the sample preparation steps, the HeLa proteins used for the in-gel protein estimation were digested along with the drug-treated samples from the same gels. Thus, these HeLa cell protein digests were used to control for the trypsin digestion efficiencies, LC-MS/MS variations and for the quantitative normalization of the LFQ data. For example, the TIC analyses indicates sample-to-sample variations among both HeLa and control (i.e., no drug treatment) cell digests (Fig. 2c,upper and lower panels, respectively); with HeLa TIC signal normalization, the variations in the control cell digests were further reduced (Fig. 2d). We also randomized all samples into separate gels and included multiple technical replicates of control samples into the gels to prevent time-dependent artifacts that may arise from instrumental drift or lot-to-lot variations in reagents.

Overall, our method for protein count normalization showed good reproducibility (Fig. 3b), which can be quantitatively assessed using Pearson correlation across two technical replicates that were analyzed on different days from two different HeLa signal normalization protocols (i.e., different SDS-PAGE gels). This was consistent for all technical replicates in this dataset (Supplementary Fig. 2). In fact, the average slope for sample-to-sample comparison was remarkably close to unity: 1.03 ± 0.16. Similarly, the average coefficient of determination for all technical replicates in the study was 0.97 ± 0.03. We also note that both the depth of protein quantification for each replicate and the overlap of detected proteins between different biological replicates were consistently high for all cardiomyocyte-like cell lines (Fig. 3c), suggesting that our experimental pipeline was robust in identification of proteomic signatures. The quantification strategy was also successful in reproducible identification of subtle proteomic differences between individual cardiomyocyte cell lines PMC-A, B, D and E, which clustered consistently across most biological replicates using either Pearson (Fig. 3d) or Euclidean (Fig. 3e) distance metrics. Furthermore, we used overlap analysis, to show that most of the identified protein groups were common to either all four cell lines (3,907 or 65%) or detected in at least three out of four lines (4,348 or 73%) with only 1,212 (or 20%) protein groups detected in uniquely one of the lines (Fig. 3a).

Identification of differentially expressed proteins

The normalized LFQ intensities of identified proteins were processed by a custom computational pipeline for identifying differentially expressed proteins (DEPs) that consists of four parts: (1) general logarithmic transformation and variance stabilization; (2) linear model fitting for treatment comparison; (3) empirical Bayes test for differential statistical significance; and (4) Fisher’s test for combining experiment-wise differential comparison results (Supplementary Fig. 3).

First, for each experiment on each cell line for each drug treatment, the LFQ intensity measurements of drug-treated and control conditions are transformed with a general logarithm formula based on inverse hyperbolic sine function using the vsnMatrix and predict function of the vsn R package such that the variance of transformed data becomes independent of its mean value. Then the transformed intensities of the samples are fitted to the linear model and the fitting coefficients are estimated using the lmFit and contrasts.fit functions from the limma R package. Next, these fitting parameters are given to the empirical Bayes test using the eBayes function with its intensity-based trend parameter turned on (for the purpose of dynamic prior estimation) to calculate various statistical metrics for the differential comparison between the drug-treated samples and the control samples. Finally, the p-values from all individual independent differential comparisons are combined using the Fisher’s method with the sumlog function of the metap R package to obtain a single meta p-value for the comparison between all the drug-treated samples and the control samples for a cell line and a drug treatment. This procedure is then repeated for all the combinations of cell lines and drug treatments. This produces a list of reference proteins ranked by their p-values (from most to least significant). These ranked protein lists together with their cell lines and treated drugs are passed to the downstream analysis.

Downstream analysis of differentially expressed proteins

To better collate the cell line and drug specific responses, we first merged all DEPs of the same cell line treated with the same drug to one new list of DEPs. For each DEP list, we then calculated a combined p-value defined as the geometric mean of the individual p-values of the different treatments and a combined log2(fold changes) defined as the average mean of the individual log2(fold changes). For all downstream analyses, we considered the top 100 DEPs (based on the combined p-values). Combined drug treatments of the different cell lines were hierarchically clustered based on the Euclidean distance of the combined log2(fold changes) using Cluster 3.022,23 (Fig. 4a). In order to investigate the consistency between the different experiments for the same drug treatment across the four cell lines, we ran successive Pearson correlations between different replicates for a given drug treatment of the same cell compared to the correlation among different cell types (Fig. 4b).

Fig. 4
figure 4

Reproducibility of differential protein expression measurements for drug-treated samples. (a) Unsupervised clustering of top 100 differentially expressed proteins in KI treated cardiomyocyte-like cell lines. Color-coding on the left reflects the primary target of the drugs (see Table 1 for further details), whereas the bars on the right highlight the cell line. While KIs with weak proteomic signatures cluster along cell lines, those that have strong differential protein expression cluster along the primary target profile of the drugs. (b) The similarity and variability of drug-treated samples within and between cell lines as determined by Pearson correlation between drug treated replicates and controls. (c) A summary of the drugs showing similar profiles between cells versus those showing different profiles within cells.

Usage Notes

Use case 1: Identification of KI drug-induced protein expression profiles

This DToxS dataset can be mined to identify protein expression changes triggered by each KI drug treatment of the four cardiac cell lines derived from human tissues. For example, we note that the DEP signatures from different cell lines and KIs clustered together based on the impact of the drug. While KIs that had modest proteomic changes clustered along the cell lines (top clusters of Fig. 4a), those that have more substantial differential protein expression signatures (bottom clusters in Fig. 4a that include nilotinib, sorafenib, pazopanib and regorafenib, all of which are reported to have cardiotoxic adverse associations including QT prolongation and myocardial infarction) clustered on their own irrespective of the cell line. The proteomic changes identified in this study can be used to understand the cell signal transduction events that may underlie KI-induced cardiotoxicities. Also, with the LC-MS/MS information provided in this dataset, follow-up proteomics assays can be developed to validate the protein expression changes in larger number of samples; such assays may include either targeted assays, e.g., selected or parallel reaction monitoring or untargeted proteomics assays, e.g., data-independent analysis.

Use case 2: Variability of differential KI drug-induced protein expression profiles

This dataset can be analyzed to compare and cluster the proteomic changes induced by different KIs, thus ascertaining the similarities and idiosyncrasies of different drugs and patient (or cell line) specific responses. One can also quantify the variability and consistency of drug-induced proteomic changes within a given subject (i.e., within a given cell line) compared across a population (i.e., between lines) to determine the generalizability of a given drug effect on cardiomyocytes (Fig. 4c).

Use case 3: Integration of transcript and protein expression profiles for comprehensive network studies

Drugs may induce complex changes on either gene or protein expression. Since proteomic and transcriptomic changes in complex human cells are not always in agreement (due to technical bias, differential regulation or turnover rates), it may be desirable to integrate both gene and protein expression datasets from the same samples, in order to obtain more comprehensive understanding of the drug-induced changes among the signaling networks. For this purpose, the DToxS proteomics dataset presented here can be integrated with the DToxS transcriptomics dataset17 to fully understand the effects of KI-induced cardiotoxicities. We note that 97% of the protein groups in our dataset map to individual genes (Supplementary Fig. 4).

Use case 4: Identification of drug-induced post-translational modifications (PTMs) and regulated proteolysis

Drugs may induce changes among signaling pathways not only via the changes of gene and protein expressions, but also via the rapid regulations of protein functions by altering specific PTMs (e.g., phosphorylation) and regulated proteolytic events (e.g., caspase 3 activation)24,25,26. Therefore, alternative protein database search schemes can be carried out to analyze the raw DToxS proteomics dataset (Supplementary Fig. 5a), and identify KI-induced changes among specific PTMs (Supplementary Fig. 5b,c) or semi-tryptic or non-tryptic peptides derived from regulated proteolysis. Since the current workflow did not include the enrichments of the subproteomes specific for PTMs, the yield for peptides containing PTMs or non-tryptic cleavages will likely be modest. As such, the results from the alternative analysis of the DToxS dataset may provide information for future proteomics studies that focus on the enrichments of specific PTMs, e.g. acetylation and phosphorylation.