A single-cell and single-nucleus RNA-Seq toolbox for fresh and frozen human tumors

Single-cell genomics is essential to chart tumor ecosystems. Although single-cell RNA-Seq (scRNA-Seq) profiles RNA from cells dissociated from fresh tumors, single-nucleus RNA-Seq (snRNA-Seq) is needed to profile frozen or hard-to-dissociate tumors. Each requires customization to different tissue and tumor types, posing a barrier to adoption. Here, we have developed a systematic toolbox for profiling fresh and frozen clinical tumor samples using scRNA-Seq and snRNA-Seq, respectively. We analyzed 216,490 cells and nuclei from 40 samples across 23 specimens spanning eight tumor types of varying tissue and sample characteristics. We evaluated protocols by cell and nucleus quality, recovery rate and cellular composition. scRNA-Seq and snRNA-Seq from matched samples recovered the same cell types, but at different proportions. Our work provides guidance for studies in a broad range of tumors, including criteria for testing and selecting methods from the toolbox for other tumors, thus paving the way for charting tumor atlases.


Software and code
Policy information about availability of computer code

Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
- Accession codes, unique identifiers, or web links for publicly available datasets
- A list of figures that have associated raw data
- A description of any restrictions on data availability

All main and Extended Data figures have associated raw data. Raw data will be available in the controlled access repository dbGaP (https://www.ncbi.nlm.nih.gov/gap/), under the dbGaP Study Accession phs001983.v1.p1; raw data will also be available in the controlled access repository DUOS (https://duos.broadinstitute.org/), under the following DUOS Dataset IDs: DUOS-000111, DUOS-000112, DUOS-000113, and DUOS-000114. The counts matrices and metadata for each sample will be publicly available in Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/) under data repository accession no. GSE140819. Finally, we provide a website that displays a comprehensive analysis summary for each sample tested (https://tumor-toolbox.broadinstitute.org).

Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.

Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf

Life sciences study design
All studies must disclose on these points even when the disclosure is negative.

Sample size
For each sample, 8,000 single cells or 8,000-10,000 single nuclei were loaded into each channel of the 10x Genomics Single-Cell Chromium Controller. These loading values were chosen to balance the probability of forming doublets against the goals of maximizing recovery and capturing enough cells or nuclei to reveal the heterogeneous landscape of the tumors.

Data exclusions
We removed low quality cells by requiring each cell to have a minimal number of UMIs and genes detected. We used different thresholds depending on the experimental modality (single cell or single nucleus) and on the 10x kit (V2 or V3 chemistry). For single nucleus data, we retained nuclei with at least 200 genes and 400 UMIs detected by V2 chemistry and with at least 500 genes and 1,000 UMIs detected by V3 chemistry. For single cell data, we retained cells with at least 500 genes and 1,000 UMIs detected by either V2 or V3 chemistry. For the V2-V3 comparison in HTAPP-951-SMP-4652 (Extended Data Fig. 9), we used the same thresholds for both chemistries: at least 200 genes and 400 UMIs detected. For both data types, we filtered out those cells or nuclei where >20% of UMIs came from mitochondrial genes.
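
The exclusion rules above can be written as a small filter. The following is a minimal sketch, not the pipeline used in the study: the `Barcode` record, its field names and the `passes_qc` function are hypothetical names introduced for illustration, and only the numeric thresholds come from the text. The shared 200-gene/400-UMI thresholds used for the V2-V3 comparison sample are not modeled here.

```python
from dataclasses import dataclass

# Minimum (genes, UMIs) per modality and 10x chemistry, as stated in the text.
MIN_COUNTS = {
    ("nucleus", "V2"): (200, 400),
    ("nucleus", "V3"): (500, 1000),
    ("cell", "V2"): (500, 1000),
    ("cell", "V3"): (500, 1000),
}

# Barcodes with more than 20% mitochondrial UMIs are filtered out.
MAX_MITO_FRACTION = 0.20


@dataclass
class Barcode:
    """Hypothetical per-barcode summary; field names are assumptions."""
    n_genes: int      # number of genes detected
    n_umis: int       # total UMIs
    n_mito_umis: int  # UMIs from mitochondrial genes


def passes_qc(bc: Barcode, modality: str, chemistry: str) -> bool:
    """Return True if the barcode meets the gene, UMI and mitochondrial thresholds."""
    min_genes, min_umis = MIN_COUNTS[(modality, chemistry)]
    if bc.n_genes < min_genes or bc.n_umis < min_umis:
        return False
    # n_umis is at least min_umis (> 0) here, so the division is safe.
    return bc.n_mito_umis / bc.n_umis <= MAX_MITO_FRACTION


if __name__ == "__main__":
    nucleus = Barcode(n_genes=250, n_umis=450, n_mito_umis=10)
    print(passes_qc(nucleus, "nucleus", "V2"))  # passes the V2 nucleus thresholds
    print(passes_qc(nucleus, "nucleus", "V3"))  # fails the stricter V3 thresholds
```

Keeping the thresholds in one lookup table makes the modality- and chemistry-specific choices explicit and easy to adjust for other tumor types.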

Wild animals
This study did not involve wild animals.

Field-collected samples
This study did not involve field-collected samples.

Ethics oversight
Animal use was restricted to one female athymic nude mouse for para-adrenal injection of O-PDX cells. This study was carried out in strict accordance with the recommendations in the Guide for the Care and Use of Laboratory Animals of the National Institutes of Health. The protocol was approved by the Institutional Animal Care and Use Committee (IACUC) at St. Jude Children's Research Hospital. All efforts were made to minimize suffering. All mice were housed in accordance with approved IACUC protocols. Animals were housed on a 12 h light/12 h dark cycle (lights on at 6 am and off at 6 pm) and provided food and water ad libitum. Athymic nude female mice were purchased from Charles River Laboratories (strain code 553).
Note that full information on the approval of the study protocol must also be provided in the manuscript.

Human research participants
Policy information about studies involving human research participants

Population characteristics
This research was not designed as a population study. Only a small number of samples (2-7) are profiled and analyzed per cancer type. Most samples are from adults, with the remaining samples being pediatric (pediatric high-grade glioma and neuroblastoma).

Recruitment
Patients were not actively recruited for this secondary-use study. Instead, patients were recruited under the initial IRB protocols approved by our collaborating institutions (see "Ethics oversight" section). External sample cohorts were then added to the
Note that full information on the approval of the study protocol must also be provided in the manuscript.

Flow Cytometry
Plots
Confirm that:
- The axis labels state the marker and fluorochrome used (e.g. CD4-FITC).
- The axis scales are clearly visible. Include numbers along axes only for bottom left plot of group (a 'group' is an analysis of identical markers).
- All plots are contour plots with outliers or pseudocolor plots.
- A numerical value for number of cells or percentage (with statistics) is provided.

Cell population abundance
We used a CD45+ depletion strategy to prepare an ovarian ascites sample for scRNA-Seq. To assess how well this strategy worked, we took a sample of the prepared cells, with and without CD45+ depletion, and performed flow cytometry. CD45- cells were enriched from 0.75% to 29.4% of the population, as determined using the anti-CD45 antibody. EpCAM+ cells were enriched from 0.17% to 4.9%, as determined using the PE anti-human EpCAM antibody.
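
The enrichment reported above can be expressed as a fold change. This is an illustrative calculation only; the helper name `fold_enrichment` is hypothetical, and the percentages are those quoted in the text.

```python
def fold_enrichment(before_pct: float, after_pct: float) -> float:
    """Ratio of a population's percentage after depletion to its percentage before."""
    if before_pct <= 0:
        raise ValueError("before_pct must be positive")
    return after_pct / before_pct


# Percentages quoted in the text (without vs. with CD45+ depletion).
cd45_neg_fold = fold_enrichment(0.75, 29.4)
epcam_pos_fold = fold_enrichment(0.17, 4.9)

print(f"CD45- cells:  {cd45_neg_fold:.1f}-fold")
print(f"EpCAM+ cells: {epcam_pos_fold:.1f}-fold")
```

With these inputs, the depletion corresponds to roughly a 39-fold enrichment of CD45- cells and roughly a 29-fold enrichment of EpCAM+ cells.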

Methodology
Gating strategy
- Cells were gated by FSC and SSC (35% of events retained for no depletion, 23% of events retained for depletion of CD45+ cells).
- Doublets were removed using FSC-A and FSC-H (100% singlets for no depletion, 99.8% singlets for depletion of CD45+ cells).
- Live cells were identified using 7-AAD (84.7% of cells retained for no depletion are live, 96.6% of cells retained for depletion of CD45+ cells are live).
- The distribution of immune and non-immune cells was quantified using the CD45 antibody (99.3% of cells retained for no depletion are CD45+, 70.5% of cells retained for depletion of CD45+ cells are CD45+).
- The distribution of EPCAM+ cells was quantified using the EPCAM antibody (0.17% of the cells retained for no depletion are EPCAM+, 4.92% of cells retained for depletion of CD45+ cells are EPCAM+).
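
As a rough illustration of how the sequential gates combine, the sketch below multiplies the retained fractions from the first three gates. It assumes that each percentage is reported relative to the parent gate, which the text does not state explicitly; the dictionary and function names are hypothetical.

```python
from math import prod

# Fraction retained at each gate, in order: FSC/SSC, singlets, live (7-AAD).
# Values are taken from the gating description; treating them as
# fractions of the parent gate is an assumption.
GATES = {
    "no depletion": [0.35, 1.000, 0.847],
    "CD45+ depletion": [0.23, 0.998, 0.966],
}


def cumulative_yield(gate_fractions: list[float]) -> float:
    """Fraction of all recorded events that survive every gate in sequence."""
    return prod(gate_fractions)


for condition, fractions in GATES.items():
    print(f"{condition}: {cumulative_yield(fractions):.1%} of events are live singlets")
```

Under that assumption, the product gives the share of all acquired events that reach the CD45 and EPCAM gates in each condition.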
Tick this box to confirm that a figure exemplifying the gating strategy is provided in the Supplementary Information.