Integrative multi-omics and drug response profiling of childhood acute lymphoblastic leukemia cell lines

Acute lymphoblastic leukemia (ALL) is the most common childhood cancer. Although standard-of-care chemotherapeutics are sufficient for most ALL cases, subsets of patients respond poorly and relapse. The biology underlying differences between subtypes and their response to therapy has only partially been explained by genetic and transcriptomic profiling. Here, we perform comprehensive multi-omic analyses of 49 readily available childhood ALL cell lines, using proteomics, transcriptomics, and pharmacoproteomic characterization. We connect the molecular phenotypes with drug responses to 528 oncology drugs, identifying drug correlations as well as lineage-dependent correlations. We also identify the diacylglycerol analog bryostatin-1 as a therapeutic candidate in the MEF2D-HNRNPUL1 fusion high-risk subtype, for which this drug activates pro-apoptotic ERK signaling associated with molecular mediators of pre-B cell negative selection. Our data are the foundation for the interactive online Functional Omics Resource of ALL (FORALL), with navigable proteomics, transcriptomics, and drug sensitivity profiles at https://proteomics.se/forall.

For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.
-The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
-A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
-The statistical test(s) used AND whether they are one- or two-sided. Only common tests should be described solely by name; describe more complex techniques in the Methods section.
-A description of all covariates tested
-A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons
-A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)
-For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted. Give P values as exact values whenever suitable.
-For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
-For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
-Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated
Our web collection on statistics for biologists contains articles on many of the points above.
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors and reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.

April 2020
Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
-Accession codes, unique identifiers, or web links for publicly available datasets
-A list of figures that have associated raw data
-A description of any restrictions on data availability

Data availability is stated in the paper in the "Data availability" section:
-The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD023662.
-The RNA-seq data discussed in this study have been deposited in NCBI's Gene Expression Omnibus and are accessible through GEO Series accession number GSE168386.
-This study also makes use of data generated by the St. Jude Children's Research Hospital - Washington University Pediatric Cancer Genome Project, deposited at the European Genome-phenome Archive (EGA) under the study accession code EGAS00001001952, https://ega-archive.org/studies/EGAS00001001952.
-Viable cell count data from flow cytometry experiments are hosted in our GitHub repository: https://github.com/isabelle-leo/FORALL/tree/main/data/flow_cytometry
-Gene sets used for GSEA were obtained from the Molecular Signatures Database v7.5: https://www.gsea-msigdb.org/gsea/msigdb
-Analyzed data can be browsed using our interactive Shiny app: http://proteomics.se/forall

Code availability is stated in the paper in the "Code availability" section:
-The code used to analyse the proteomics data and all code used to generate the figure panels are available in our GitHub repository: https://github.com/isabelle-leo/FORALL

Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.

Life sciences
For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf

Life sciences study design
All studies must disclose on these points even when the disclosure is negative.

Sample size
The number of cell lines included in the study (n=49) was chosen to include all accessible cell lines meeting the following criteria:
-Acute lymphoblastic leukaemia
-Childhood (age limit up to 20 years)
-B-ALL, BCP-ALL, or T-ALL lineage, derived from any tissue (bone marrow, peripheral blood) and not limited to any subtype (gene fusions, mutations)
-Commercially available or readily available upon request from repositories, for better reproducibility
-Available at the initiation of the project, which resulted in 49 childhood ALL cell lines
The 49 cell lines are sufficient for statistical analyses. Full information on the cell lines is available in Supplementary Table 1.
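The inclusion criteria above amount to a simple filter over candidate cell lines. The following is a minimal sketch of that filter, not the procedure actually used for the study: the `CellLine` record, its field names, and the inclusive reading of the 20-year age limit are all assumptions made for illustration and are not taken from Supplementary Table 1.

```python
from dataclasses import dataclass

# Hypothetical record layout; the field names are illustrative only.
@dataclass
class CellLine:
    name: str
    diagnosis: str      # e.g. "ALL"
    age_years: float    # patient age when the line was derived
    lineage: str        # e.g. "BCP-ALL"
    available: bool     # commercially or on request from a repository

ALLOWED_LINEAGES = {"B-ALL", "BCP-ALL", "T-ALL"}
MAX_AGE_YEARS = 20  # childhood age limit from the criteria; assumed inclusive

def meets_criteria(line: CellLine) -> bool:
    """Return True if a candidate satisfies every inclusion criterion."""
    return (
        line.diagnosis == "ALL"
        and line.age_years <= MAX_AGE_YEARS
        and line.lineage in ALLOWED_LINEAGES
        and line.available
    )

def select_panel(candidates):
    """Keep only the candidates that meet all criteria, preserving order."""
    return [c for c in candidates if meets_criteria(c)]
```

Tissue of origin and genetic subtype are deliberately absent from the filter, because the criteria place no restriction on either.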

Data exclusions
Data exclusions are stated in the text wherever an analysis addresses a question applicable only to a specific subset of the dataset, i.e. specific analyses by gene fusion phenotype or lineage. Non-protein-coding transcripts were excluded from the analyses of RNA-seq data. Quality control standards relating to viability of control samples were established prior to drug screening and flow cytometry analysis, as described in the Methods section, and all presented experimental data adhere to these standards. No data were excluded from the analyses, and all data met the quality control standards described above.
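The exclusion of non-protein-coding transcripts can be pictured as a lookup against a gene annotation. This is a hedged sketch only: the function name, the dictionary layout, and the `"protein_coding"` label (the biotype string used in Ensembl/GENCODE annotations) are assumptions, since the text does not say which annotation or tooling was used.

```python
def keep_protein_coding(counts, biotypes):
    """Drop every gene whose annotated biotype is not protein coding.

    counts:   dict mapping gene ID -> list of per-sample counts
    biotypes: dict mapping gene ID -> biotype string (assumed annotation)
    Genes missing from the annotation are dropped as well.
    """
    return {
        gene: values
        for gene, values in counts.items()
        if biotypes.get(gene) == "protein_coding"
    }
```

Dropping unannotated genes is a conservative choice made for this sketch; a real pipeline might instead flag them for review.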

Replication
Our proteomics data are novel, which limited our ability to validate some of our results. We used public RNA-seq data from clinical samples to validate part of our results at the transcriptomic and proteomic levels; in the tested cases, replication was successful. Replicate proteome profiles were obtained using DDA proteomics for n=32 cell lines and using DIA proteomics for the remaining cell lines, all of which demonstrated robust reproducibility of proteome phenotypes in unbiased hierarchical clustering. Selected cell lines (n=16) were replicated in the transcriptomic dataset, where they all clustered together with their replicates. Flow cytometry experiments were replicated a minimum of n=3 times, using separate cultures and drug treatments, as well as with a minimum of 3 biological replicate cell lines of the same phenotype, and replications were successful. CETSA experiments were replicated twice, and replication was successful.
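The statement that replicates clustered together can be checked with a simpler stand-in for hierarchical clustering: for each sample, find its most correlated other sample and test whether it comes from the same cell line. The sketch below uses that nearest-neighbour check with Pearson correlation. It is not the clustering analysis used in the study, and the function names and input layout are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def replicates_pair_up(profiles, line_of):
    """Return True if every sample's best-correlated partner is a replicate.

    profiles: dict mapping sample ID -> list of abundance values
    line_of:  dict mapping sample ID -> cell line name
    """
    for sample, vec in profiles.items():
        others = [o for o in profiles if o != sample]
        nearest = max(others, key=lambda o: pearson(vec, profiles[o]))
        if line_of[nearest] != line_of[sample]:
            return False
    return True
```

A nearest-neighbour check is stricter locally but says nothing about the global tree structure, so it complements rather than replaces a dendrogram.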

Randomization
The cell line panel was chosen to represent as many known and rare subtypes of childhood ALL as could be obtained from a readily available source; the selection of these cell lines was not altered based on additional randomization criteria. The experiments were randomized based on the date the cell lines were obtained and cultured, as well as on cytogenetic type and subtype.

Blinding
The distinction of phenotypic and genetic fusion subtypes was confirmed in an unbiased way using unsupervised clustering. All conclusions were obtained by or supported by unbiased analyses of multi-omics data, which represented in-depth results obtained in a technically identical and unguided manner. Investigator blinding to conditions and outcome assessment was not applicable.

Reporting for specific materials, systems and methods
We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response.

Mycoplasma contamination
All cell lines were tested for mycoplasma using the MycoAlert Mycoplasma Detection Kit (Lonza). All cell lines used in this study tested negative for mycoplasma.

Commonly misidentified lines
(See ICLAC register) The following cell lines from the ICLAC register of commonly misidentified lines are included in the dataset: A3, BE-13. Their status as derivative cell lines is disclosed, and they have been characterized as derivatives of cell lines that meet our cell line inclusion criteria. All metadata reflect the identity of these cell lines based on the cell line they are known to derive from, and do not include misidentified characteristics.

Flow Cytometry
Plots
Confirm that:
-The axis labels state the marker and fluorochrome used (e.g. CD4-FITC).
-The axis scales are clearly visible. Include numbers along axes only for bottom left plot of group (a 'group' is an analysis of identical markers).
-All plots are contour plots with outliers or pseudocolor plots.
-A numerical value for number of cells or percentage (with statistics) is provided.