Integrating single-cell RNA-seq and imaging with SCOPE-seq2

Live cell imaging allows direct observation and monitoring of phenotypes that are difficult to infer from transcriptomics. However, existing methods for linking microscopy and single-cell RNA-seq (scRNA-seq) have limited scalability. Here, we describe an upgraded version of Single Cell Optical Phenotyping and Expression (SCOPE-seq2) for combining single-cell imaging and expression profiling, with substantial improvements in throughput, molecular capture efficiency, linking accuracy, and compatibility with standard microscopy instrumentation. We introduce improved optically decodable mRNA capture beads and implement a more scalable and simplified optical decoding process. We demonstrate the utility of SCOPE-seq2 for fluorescence, morphological, and expression profiling of individual primary cells from a human glioblastoma (GBM) surgical sample, revealing relationships between simple imaging features and cellular identity, particularly among malignantly transformed tumor cells.

www.nature.com/scientificreports/ In conventional scRNA-seq, the barcoded mRNA capture beads are randomly co-encapsulated with individual cells in the microwells, so we do not know which barcode is incorporated into the cDNA library of each imaged cell. However, in SCOPE-seq we can identify the barcode sequence on the bead in each microwell by hybridizing fluorescently-labeled oligonucleotide probes and imaging the beads with a fluorescence microscope (optical decoding). In our original report, the cell-identifying barcode that was incorporated into the cDNA library of each cell and the optically decodable barcode sequence were distinct, and we had to prepare a separate sequencing library to link the two sequences. In addition, only a small subset of oligonucleotides on each bead actually contained the optically decodable barcode sequence, limiting the fluorescence signal and therefore imaging speed and throughput. For SCOPE-seq2, we devised an improved optically decodable bead where the sequencing and optically decodable barcode sequences that identify a given cell are the same (Fig. 1B). The cell barcode contains two 8-nucleotide sequences, each of which is a member of a pool of 96 sequences (Supplementary Table S1). An 8-nucleotide random sequence is dispersed into three parts and serves as both a unique molecular identifier (UMI) and a spacer between other functional sequences on the bead. The oligonucleotides on all beads share two common sequences-a universal PCR adapter on the 5′-end and oligo(dT) on the 3′-end for mRNA capture and cDNA amplification. The oligonucleotides are synthesized by split-pool, solid-phase synthesis (Fig. 1C). Beads are pooled together to add common sequences and random UMIs, and are split into 96 reactions to add one of the 96 cell barcode sequences. After two rounds of split-pooling, a total of 96 2 = 9216 cell barcodes are generated. To generate cDNA from cells, we co-encapsulate the cells with these beads, lyse the cells, capture cell mRNAs on beads by hybridization, and reverse transcribe the captured mRNAs.
To link cellular imaging with scRNA-seq from the same cell, we identify the cell barcode sequence on each bead in the microwell array by sequential fluorescent probe hybridization. Our strategy is related to methods of decoding DNA microarrays and highly multiplexed fluorescence in situ hybridization (FISH) [8][9][10][11] . We use a temporal barcoding strategy in which each 8-nt cell barcode sequence corresponds to a unique, pre-defined 8-bit binary code (Supplementary Tables S2, S3). Each bit of the binary code can be read out by one cycle of probe hybridization, where the presence or absence of a hybridized probe indicates one or zero, respectively. The two parts of the cell barcode can be decoded simultaneously using two sets of differently colored fluorescent probes. To realize this decoding scheme, we generate a pool of fluorescent probes for each cycle of hybridization. All oligos whose sequences are complimentary to the cell barcode sequence marked '1' in the corresponding binary code are pooled and conjugated with fluorophores, Cy5 or Cy3. Distinct fluorophore-conjugated probes against the two 8-nucleotide sequences comprising the cell barcode are then pooled together to form the final probe pool (Fig. 1D). Thus, we are able to decode all possible cell barcode sequences by eight cycles of two-color probe hybridization. This approach is more scalable than the original SCOPE-seq strategy and gives a brighter signal on the bead surface because every primer contains an optically decodable barcode. Thus, SCOPE-seq2 beads are compatible with higher speed imaging, leading to higher throughput.
Finally, we further increased the cell indexing capacity to 96 2 × 10 = 92, 160 by dividing the microwells into ten regions as previously described 4 . We extract the beads from each region of the device separately for library construction and indexing, and then sequence the cDNA libraries from each region in a single pool.
Cell barcode optical decoding analysis for SCOPE-seq2. To decode the cell barcode sequences from imaging, a 'cycle-by-cycle' method was used in SCOPE-seq 4,10 , which calls the binary code for each bead based on the bimodal distribution of intensity values across all beads in each hybridization cycle. This method works well when the bead fluorescence intensity values of the 'one' state population are well separated from that of the 'zero' state population. However, because the beads exhibit auto-fluorescence at shorter wavelengths, the two populations are not as well separated in the Cy3 emission channel as in the Cy5 emission channel (Supplementary Fig. S1).
To accurately decode the cell barcode sequences from imaging, we utilized a modified 'bead-by-bead' fluorescence intensity analysis strategy, which has been used to decode randomly ordered DNA microarrays 10 . We determine the cell barcode sequences of each bead by sorting the eight intensity values in each emission channel in ascending order, calculating the relative intensity change between each pair of adjacent values, establishing a threshold based on the largest relative intensity change to assign a binary code, and mapping the binary code to the actual cell barcode sequence ( Fig. 2A). For those unmappable binary codes, we repeatedly re-assign the binary code based on the next largest relative intensity change until the code can be successfully mapped to a cell barcode sequence. Since this method decodes each bead independently, we expected that it would give better results when the 'one' and 'zero' intensity states were poorly separated. Optical decoding errors have three possible outcomes: (1) assigning the bead an unmappable binary code that does not exist in our barcode set; (2) assigning the bead a mappable binary code that does not correspond to a single-cell profile in the sequencing data; (3) assigning the bead a mappable binary code that does correspond to a single-cell profile in the sequencing data, resulting in assignment of a single-cell image to the incorrect scRNAseq profile. The first and second outcomes reduce our linking yield, but do not cause linking errors, whereas the third outcome leads to the incorrect linking. In SCOPE-seq2, for each region, we only used 96 2 = 9216 out of 2 16 = 65,536 possible binary codes for the cell barcodes, so that most decoding errors result in the first outcome. Importantly, the number of sequenced cell barcodes in one region of our device (< 1000 cells per region) is far fewer than the total 96 2 = 9216 possible cell barcodes, so if a bead is assigned a mappable binary code, an error will most likely lead to the second outcome rather than the third. For example, in dataset PJ070, we optically decoded cell barcodes using the 'cycle-by-cycle' method. Among all beads that paired with cells in imaging, 57.4% of the beads were assigned an unmappable binary code (first outcome) and 17.9% of the beads have the second outcome. The remaining beads either resulted in correct links between single-cell images and scRNA-seq profiles or contributed to our very low error rate (see below). Thus, a more accurate optical decoding method is expected Scientific Reports | (2020) 10:19482 | https://doi.org/10.1038/s41598-020-76599-w www.nature.com/scientificreports/ to give a higher fraction of linked imaging and sequencing data. To compare the 'bead-by-bead' method with the 'cycle-by-cycle' method, we tested these two methods on two datasets. In dataset PJ070 and PJ069, we linked 46% and 57% scRNA-seq profiles with cell images using the 'bead-by-bead' method compared to only 24% and 37% using the 'cycle-by-cycle' method. In both datasets, we observed at least a 20% increase in the fraction of linked cells with the 'bead-by-bead' method ( Fig. 2B), which suggests that the 'bead-by-bead' method is more suitable for cell barcode optical decoding in SCOPE-seq2.

Validation of SCOPE-seq2.
To demonstrate the performance of SCOPE-seq2, in terms of throughput, molecular capture efficiency, and accuracy of linking imaging and sequencing data, we performed an experiment with mixed human (U87) and mouse (3T3) cells labeled with two differently colored live staining dyes. We loaded the mixed cells into the microwells at a relatively high density and obtained 9061 transcriptional profiles from a single experiment. At saturating sequencing depth, we detected on average 10,245 mRNA transcripts from 3548 genes per cell (Fig. 3A,B), which is a nearly three-fold improvement over the original report of SCOPE-seq (Fig. 3C,D) 4 . To evaluate the linking accuracy of SCOPE-seq2, we identified the species of each cell from the color of the fluorescent label and from the species-specific alignment rate in RNA-seq (a cell with > 90% of reads aligning to the transcriptome of a given species was considered species-specific), and examined the consistency of the two cell species calls. In the 4145 scRNA-seq profiles that we successfully linked with imaging data, we obtained a class-balanced linking accuracy of 99.2% (0.8% error rate), with 98.8% of human cells and 99.6% of mouse cells agreeing with the species calls from two-color imaging (Fig. 3E). This represents a nearly five-fold improvement in accuracy over the original report of SCOPE-seq (3.9% error rate). We are also able to confidently remove multiplets in SCOPE-seq2 by manually identifying mixed-species and single-species multiplets from the two-color cell images ( Supplementary Fig. S2). By comparing image-based and sequencing-based mixed-species multiplets, we obtained a multiplet detection sensitivity of 68.8% and a specificity of 97.0%. A large portion of transcriptional profiles with low purity have been removed (Fig. 3F). Since we confirmed that SCOPE-seq2 has high linking accuracy, we suspected that the mixed-species multiplets detected by sequencing but not imaging were because of imperfections in scRNA-seq data, which served as our ground truth.

SCOPE-seq2 allows paired analysis of image-based and transcriptional phenotypes in individual cells isolated from human tissues.
To demonstrate that we can collect paired optical and transcriptional phenotypes from human tissue samples using SCOPE-seq2, we performed an experiment on cells dissociated from a human GBM surgical sample and labeled with calcein AM, a fluorgenic dye that reports esterase activity. We obtained 1,954 scRNA-seq profiles and linked 1110 of them to live cell images. We manually removed cell multiplets based on imaging. Calcein AM is commonly used as a live stain, and so we also removed outlier cells with low fluorescence intensity (see "Methods"). Malignantly transformed GBM cells often resemble non-neoplastic neural cell types in the adult brain, and so simple marker-based analysis is insufficient to confirm malignant status. To address this, we identified a large population of cells with amplification of chromosome 7 and loss of chromosome 10, two commonly co-occurring aneuploidies that are pervasive in GBM 12,13 , based on the gene expression ( Supplementary Fig. S3). We then computed a low-dimensional representation of the www.nature.com/scientificreports/ data using single-cell hierarchical Poisson factorization (scHPF) to identify key gene signatures that define the population 14 and visualized their distributions across cells using Uniform Manifold Approximation and Projection (UMAP). We recovered all of the major cell types that have been previously reported from scRNA-seq of GBM 13,15 including myeloid cells, endothelial cells, pericytes, malignant-transformed astrocyte-like cells, mesenchymal-like cells, oligodendrocyte-progenitor-like/neuroblast-progenitor-like cells (OPC/NPC) and cycling cells (Fig. 4A,B). We also measured 16 imaging features from cell images and grouped those features into three   Table S4).

SCOPE-seq2 identifies relationships between imaging features and lineage identities of malignantly transformed GBM cells. Malignant cells in GBM can resemble multiple neural/glial line-
ages or exhibit a mesenchymal phenotype 13,[15][16][17] . Because malignant GBM cells are known to be highly plastic, we decided to use a diffusion map to visualize their lineage relationships 18 . We selected malignant cells based on aneuploidy as described above (Fig. 5A), reduced the dimensionality of malignant cell gene expression by scHPF, and visualized the factorized data with a diffusion map, which revealed two major branches (Fig. 5B).
One branch consists of astrocyte-like cells and terminates with mesenchymal-like cells, while the other branch consists of OPC/NPC cells and cycling cells. This is consistent with our previous studies showing that astrocytelike and mesenchymal glioma cells are significantly more quiescent than OPC-like glioma cells 13 .
To explore that how imaging features of malignant cells are related to their transcriptional phenotypes, we asked whether unsupervised clustering of cellular imaging features would correspond to the two major branches observed in the diffusion map from scRNA-seq (Fig. 5B). We clustered malignant cells by the three imaging meta-features described above using hierarchical clustering, and identified two major cellular imaging clusters (Fig. 5C). By plotting two imaging clusters on the diffusion map embedding of the malignant cells, we found that cells with round shape, low intensity and small size (imaging cluster 0) are enriched in the OPC/NPCcycling branch, and cells with rough shape, high intensity and large size (imaging cluster 1) are enriched in the astrocyte-mesenchymal branch (Fig. 5D,E). This finding was further supported by differential expression analysis comparing expression profiles of cells in the two imaging clusters. As expected, markers of OPC/NPCs (MAP2, OLIG1, DLL3) and cycling cells (CDK6) are significantly enriched (FDR < 0.05, Mann-Whitney U test) in imaging cluster 0, while markers of astrocyte-like cells (APOE, GFAP, GJA1, AQP4, ALDOC) and mesenchymal cells (CHI3L1, CD44, CHI3L2, CCL2) are significantly enriched (FDR < 0.05, Mann-Whitney U-test) in imaging cluster 1 (Fig. 5F, Supplementary Table S5). Therefore, there is a clear correspondence between the major gene expression and basic imaging features for the malignantly transformed cells in this tumor.

Discussion
SCOPE-seq enables a direct link between live cell imaging, cytometry, and scRNA-seq with the scalability and low cost of conventional microfluidic and pooled barcoding approaches. With the upgraded SCOPE-seq2, we achieve enhancements in almost every aspect of performance. In the first experiment described here, we profiled > 9000 cells in a single device, linking > 4100 of them to imaging data. This represents a ~4-fold improvement in throughput. We also achieved a ~ 3-fold improvement in molecular capture efficiency for scRNA-seq and a ~ 5-fold decrease in the error rate for linking imaging and sequence data. Importantly, our improved optical barcode design greatly simplified the automation and microscopy required for SCOPE-seq2. In the original SCOPE-seq, a small fraction of the oligonucleotides on each bead contained optical barcodes, which limited the fluorescence intensity of the beads and required the use of laser-based optics, a relatively sensitive camera, and a small field-of-view. Because of the relatively low signal and autofluorescence of the beads, we were restricted to red fluorophores and could only image in a single channel, which limited our multiplexing capacity and decoding speed. In SCOPE-seq2, every oligonucleotide on the bead contains an optical barcode (~ 100-fold more oligonucleotides) and so a fast, automated microscope with a large field-of-view camera and simple, LED illumination are sufficient and allow two-color optical decoding. These advances make the technology more accessible and contribute to the improved performance.
In SCOPE-seq2, we have achieved substantial improvements in throughput and linking accuracy, but the fraction of cells with linked imaging and sequencing is still ~ 50% as in the original report of SCOPE-seq. For future improvements, we could introduce an error correcting code 8,11 and eliminate cell barcode sequences that have low linking rates across multiple experiments (i.e. barcodes that cause systematic errors). We expect that these modifications would improve the yield of linked cells, and further improve the throughput and linking accuracy of SCOPE-seq2.
SCOPE-seq2 compares favorably to alternative approaches for linking imaging or cytometric data to scRNAseq. Some of the earliest techniques combined index sorting by FACS with scRNA-seq to link cytometric data with expression profiles on a cell-by-cell basis 7 . However, this is relatively expensive, does not allow imaging, and is limited by the scalability of library construction in 96-or 384-well plates. The Fluidigm C1, an early commercial platform for scRNA-seq, which could initially profile tens of cells and later scaled to hundreds of cells, directly linked live cell imaging and scRNA-seq 5 . However, it was also limited by relatively high operating costs and other performance issues such as multiplet capture 1 . The icell8 from Takara/Wafergen can link low resolution imaging for cytometry to scRNA-seq, but throughput is limited to ~ 1000 to 1800 cells per sample according to product literature. An upgraded version 6 combines the Wafergen technology with FACS and achieves higher throughput (~ 7500 cells per sample), but the imaging resolution appears limited to cytometric applications, likely because the chamber volumes are > 1000-fold larger than in SCOPE-seq2 (hundreds of nanoliters vs. ~ 100 picoliters). In addition, achieving a high loading density requires FACS, which can be problematic for some cell types. Relatedly, costs per cell were cited as ~ $1 including 100,000 reads, whereas SCOPE-seq2 is < $0.40 at 100,000 reads due to substantially reduced library preparation costs. Finally, Zhang et al. recently reported microfluidic technology for linking cytometric analysis with scRNA-seq using a combination of droplet microfluidics and microfabricated chambers 19 . The authors claim that their approach is more scalable than SCOPE-seq, but only demonstrated a throughput of ~ 1200 cells which is fewer than both the initial report of SCOPE-seq and the upgraded version www.nature.com/scientificreports/ described here. Furthermore, it is unclear whether this approach is able to link live cell imaging with scRNA-seq as opposed to just flow cytometric data. The improvements to SCOPE-seq2 have enabled applications in primary cells isolated from complex tissues, which are typically more challenging to profile by scRNA-seq than cell lines grown in culture. In parallel with this study, we used SCOPE-seq2 to reveal cell size dynamics during adult neurogenesis in the mouse brain and identify the precise stage in neuronal differentiation where morphological changes associated with cell cycle entry occur 20 . In the same experiment, we also used SCOPE-seq2 with anti-NOTUM immunostaining to identify the cellular targets of NOTUM, an extracellular WNT antagonist that plays a crucial regulatory role in adult neurogenesis. Here, we profiled a human GBM surgical specimen using SCOPE-seq2. The scRNA-seq data from this experiment recapitulated all of the major cellular populations and states that have been associated with GBM in previous reports. In a focused analysis of the malignantly transformed tumor cells, we discovered a strong correlation between certain morphological features of individual cells and their cellular identities. Consistent with earlier studies, the transformed cells appear to differentiate along two major branches-one that includes OPC-like, NPC-like and proliferative glioma cells, and a second that includes astrocyte-like and mesenchymallike glioma cells that are more quiescent. Interestingly, these two major branches are also distinguishable by the imaging features of the dissociated cells. The astrocyte-/mesenchymal-like cells are larger, less round, and exhibit higher esterase activity. Differential expression analysis based on imaging classification alone separated canonical markers of these two populations, demonstrating that significant information about cellular identity is encoded in simple imaging observables. To summarize, SCOPE-seq2 is a versatile and high-performance technology for directly linking live cell imaging and scRNA-seq that scales to thousands of cells per sample and enables applications in both cell lines and primary cells dissociated from complex tissues.

Methods and protocols
Bead construction. 8 Table S1) were designed using an R package 'DNAbarcodes' with following criteria: sequences were at least 3 Levenshtein distance from each other; sequences that contain homopolymers longer than 2 nucleotides, with GC content < 40% or > 60%, or perfectly self-complementary sequences were removed. Full-length mRNA capture oligo sequences (Fig. 1B) are then generated with these candidate 8-nt cell barcode sequences in a combinatorial fashion. Self-complimentary score of each candidate 8-nt cell barcode sequence, defined as the length of the longest continuous stretch of self-complimentary sequence among all full-length mRNA capture oligo sequences that contain this 8-nt cell barcode sequence, is computed. Every A-T paring and C-G paring is scored with a length of 2/3 and 1, respectively, to account for the stronger binding affinity of C-G paring compared to A-T paring. The 8-nt cell barcode sequences with the bottom 50% self-complimentary scores are selected.  Table S2) with 3′-amino modifications were synthesized and purified (Sigma-Aldrich), then resuspended in water at 200 μM. To generate probe mixtures corresponding to each bit in the binary code, oligonucleotides labeled with '1' were taken (Fig. 1D, Supplementary Table S3), pooled and resuspended in 0.1 M sodium tetraborate (pH 8.5) coupling buffer at a final concentration of 22 μM with 0.6 μg/μL reactive fluorophore. Sulfo-CY5 NHS ester (Lumiprobe, cat# 21320) was coupled with S oligo pools, and Sulfo-CY3 NHS ester (Lumiprobe, cat# 23320) was coupled with Q oligo pools overnight at room temperature. Excess fluorophores were removed and oligos were recovered by ethanol precipitation (80% Ethanol, 0.06 M NaCl, 6 μg/mL glycogen). The concentration of probes was quantified using a NanoDrop (Thermo Scientific). Probe pools were diluted such that each probe had a final concentration of ~ 20 nM, and the two, distinctly labeled probe pools were mixed together for each binary code bit prior to use.

SCOPE-seq2.
• Preparation -A microwell array device was filled with wash buffer (20 mM Tris-HCl pH7.9, 50 mM NaCl, 0.1% Tween-20) and stored in a humid chamber one day before use. -Cell culture or tissue samples were dissociated into single cell suspension (see section, GBM tissue processing), and stained with desired fluorescent dyes.
• Cell loading -The pre-filled microwell array device was flushed with Tris-buffered saline (TBS).
-The single cell suspension was pipetted into the microwell array device.
-After 3-min, un-trapped cells were then flushed out with TBS.
• Cellular imaging Scientific Reports | (2020) 10:19482 | https://doi.org/10.1038/s41598-020-76599-w www.nature.com/scientificreports/ -The cell-loaded microwell device was scanned using an automated fluorescence microscope (Nikon, Eclipse Ti2) under the bright-field and fluorescence channels. Bright-field images were taken using an RGB light source (Lumencor, Lida) and wide-field 10 × 0.3 NA objective (Nikon, cat# MRH00101). Fluorescence images were taken using LED light source (Lumencor, spectra x), Quad band filter set (Chroma, cat# 89402), wide-field 10 × 0.3 NA objective (Nikon, cat# MRH00101) with 470 nm (GFP channel) and 555 nm (TRITC channel) excitation for Calcein AM and Calcein red-orange, respectively. • scRNA-seq (steps performed on microwell device) -Beads (Chemgenes) were pipetted into the microwell device, and untrapped beads were flushed out with 1 × TBS. The microwell device containing the cells and the beads was connected to the computercontrolled reagent and temperature delivery system as previously described 21  • Bead optical demultiplexing -The microwell device containing the beads with cDNAs was connected to a computer-controlled reagent delivery and scanning system (see section, "Automated reagent delivery system"). -Melting buffer (150 mM NaOH) was infused into the device and incubated for 10 min. The device was then washed with imaging buffer (2xSSC, 0.1% Tween-20). An automated imaging program scanned the device in the bright-field, Cy3 and Cy5 emission channels. Fluorescence images were acquired using an LED light source (Lumencor, spectra x), Quad band filter set (Chroma, cat# 89402), wide-field 10 × objective (Nikon, cat# MRH00101) and 555 nm and 649 nm excitation for Cy3 and Cy5, respectively. Hybridization solution (imaging buffer supplemented with probe pool A, described below) was infused into the device and incubated for 10 min. The device was then washed with imaging buffer. An automated imaging program scanned the device in the bright-field, Cy3 and Cy5 emission channels. -Repeat the previous step 7 times, with probe pool B to H. -Melting buffer was infused into the device and incubated for 10 min. The device was then washed with imaging buffer, and then disconnected from the automated reagent delivery system.
• scRNA-seq (steps performed off microwell device) -Perfluorinated oil was pipetted into the device to seal the microwells. The device was then cut into 10 regions. Beads from each region were extracted separated by soaking each small piece of bead-containing PDMS in 100% ethanol, vortexing, water bath sonication, and centrifugation in a 1.7 mL microcentrifuge tube. PDMS was then removed by tweezer. -Beads extracted from each region were processed in separate reactions for the downstream library construction. Beads were washed sequentially with TE/SDS buffer ( , 72 °C 5 min) on a thermocycler. PCR product from each piece was pooled and purified using SPRI paramagnetic bead (Beckman, cat# A63881) with a bead-tosample volume ratio of 0.6:1. -Purified cDNAs were then tagmented and amplified using the Nextera kit for in vitro transposition (Illumina, FC-131-1024). 0.8 ng cDNA was used as input per reaction. A unique i7 index primer was used to barcode the libraries obtained from each piece of the device. The i5 index primer was replaced by a universal P5 primer (Supplementary Table S6) for the selective amplification of 5′ end of cDNA (corresponding to the 3′ end of mRNA). Two rounds of SPRI paramagnetic bead-based purification with a bead-to-sample volume ratio of 0.6:1 and 1:1 were performed sequentially on the Nextera PCR product to obtain sequencing-ready libraries.  Table S6) was used for read 1.
Scientific Reports | (2020) 10:19482 | https://doi.org/10.1038/s41598-020-76599-w www.nature.com/scientificreports/ Automated reagent delivery system. An automated reagent delivery and scanning system is designed for automated optical decoding. In this system, fixed positive pressure (~ 1 psi) stabilized by a pressure regular (SMC Pneumatics, cat# AW20-N02-Z-A) is used to drive fluid flow. The microwell device is constantly pressurized during incubation steps, which prevents evaporation and bubble formation. Two 10-channel rotary selector valves (IDEX Health & Science, cat# MLP778-605) are connected in parallel to toggle between 14 reagent channels. A three-way solenoid valve (Cole-Parmer, cat# EW-01540-11), located at the downstream of the microwell device is used as an on/off switch for reagent flow. The multi-channel selector valves are controlled by a USB digital I/O device (National Instruments, cat# SCB-68A). The three-way solenoid valve is controlled by the same USB digital I/O device, but through a homemade transistor-switch circuit. The system is controlled by imaging software (Nikon, NIS-Elements).
Bead optical decoding analysis. Eight cycles of probe hybridizations (A to H) were used for cell barcode optical decoding. For each cycle, the device was imaged in the bright-field, Cy3 and Cy5 emission channels. Beads were first identified in the bright-field image by the ImageJ Particle Analyzer plugin, and the positions of the beads in the bright-field image were recorded. Then the average fluorescence intensities of each bead in the Cy3 and Cy5 images were measured. Beads identified in cycles B to H were mapped to the nearest bead in cycle A. Thus, we obtained a probe hybridization matrix with n beads × 16 intensity values (8 for Cy3 and 8 for Cy5).
To call cell barcodes from the imaging data, we tested two methods: • Cycle-by-cycle The cycle-by-cycle method was modified from the stage-by-stage decoding method 10 .
-For each cycle and each fluorescent channel; -Get N log transformed average intensity values; -Compute an intensity histogram using 50 bins; -Determine the median intensity value M , and identify the highest bin with intensity values smaller than M as B 1 and the highest bin with intensity values greater than M as B 2 ; -Identify the lowest bin B 3 with intensity values between B 1 and B 2 ; -Get the medium intensity value I of bin B 3 , then assign 0 to intensity values smaller than I and assign 1 to intensity values greater than I. -Refer to the binary code table. If the code assigned is in the table, then return the corresponding cell barcode sequence.
• Bead-by-bead The bead-by-bead method was modified from the core-by-core decoding method 10 .
-For each bead and each fluorescence channel; -Get eight average fluorescence intensity values x 1 , x 2 , . . . , x 8 ; -Let y 1 , y 2 , . . . , y 8 be the sorted values; -Let f n = y n+1 − y n /y n , n = 1, 2, . . . , 7 be the relative intensity fold change between neighbor sorted values; -Determine the largest fold change N = argmax n f n , then assign 0 to values to y 1 , y 2 , . . . , y N and assign 1 to values y N+1 , y N+2 , . . . , y 8 ; -Refer to the binary code table. If the code assigned in step 4 is in the table, then return the corresponding cell barcode sequence; -Otherwise, remove f N from list { f n } and repeat step 4, 5, until a corresponding cell barcode sequence is returned or the list { f n } is empty.
Live cell imaging analysis. Images were analyzed using the ImageJ software as previously described 4 .
• Identify microwells with cells Microwell outlines were identified as objects from the bright-field image using a local threshold, and then average fluorescence intensities of microwells in the live staining images were measured. Average intensity values followed a bimodal distribution, with the higher intensity population corresponding to microwells that contain cells.
• Cell optical phenotype extraction Only microwells with cells were selected and each cell was analyzed individually within the smallest bounding square of the corresponding microwell. The cell was identified in the live staining fluorescence image using the auto threshold and particle analyzer. Microwells with multiple cells identified by the software were excluded. Sixteen imaging features were measured for each cell in the fluorescence image: area, mean intensity, standard Scientific Reports | (2020) 10:19482 | https://doi.org/10.1038/s41598-020-76599-w www.nature.com/scientificreports/ deviation of intensity, minimum intensity, maximum intensity, median intensity, perimeter, width, height, major axis, minor axis, circularity, Feret's diameter, minimum Feret's diameter, roundness, and solidity.

SCOPE-seq2 scRNA-seq analysis.
To analyze the scRNA-seq data from SCOPE-seq2, we first extracted the cell-identifying barcode and UMI from Read 1 based on the designed oligonucleotide sequence, NN(8-nt Cell Barcode S)NN(8-nt Cell Barcode Q)NNNN. The 192 8-nt cell barcode sequences have a Hamming distance of at least three for all sequence pairs. Therefore, we corrected one substitution error in the cell barcode sequences. We only keep reads with a complete cell barcode. Next, we align the reads from Read 2 to a merged human/mouse genome (GRCh38 for human and GRCm38 for mouse) with merged GENCODE transcriptome annotations (GENCODE v.24 for both species) using STAR v.2.7.0 aligner 22 after removing 3′ poly(A) tails (indicated by tracts of > 7 A's) and fragments with fewer than 24 nucleotides after poly(A) tail removal. Only reads that were uniquely mapped to exons on the annotated strand were included for the downstream analysis. Reads with the same cell barcode, UMI (after one substitution error correction) and gene mapping were considered to originate from the same cDNA molecule and collapsed. Finally, we used this information to generate a molecular count matrix.  GBM tissue processing. A single-cell suspension was obtained from excess material collected during surgical resection of a WHO Grade IV GBM. The patient was anonymous and the specimen was de-identified. The tissue was mechanically dissociated following a 30-min incubation with papain at 37 °C in Hank's balanced salt solution. Cells were re-suspended in TBS after centrifugation at 100×g followed by selective lysis of red blood cells with ammonium chloride for 15 min at room temperature. Finally, cells were washed with TBS and quantified using a Countess (ThermoFisher).

Human and mouse cells mixed experiment.
• Human U87 cells were stained with Calcein AM (ThermoFisher Scientific, cat# C3100MP) and mouse 3T3 cells were stained with Calcein red-orange (ThermoFisher Scientific, cat# C34851) in culture medium at 37 °C for 10 min. The stained cells are then dissociated into single cell suspension by 0.25% Trypsin-EDTA (Life Technologies, cat# 25200-072) and re-suspended in TBS buffer. The U87 and 3T3 cells were mixed at 1:1 ratio with a final total cell concentration 1000 cells/μL. • The mixed cell suspension was processed and sequenced with SCOPE-seq2 workflow described above (PJ070). • Images and sequencing data were processed with the SCOPE-seq2 pipeline described above.
Sub-sampling analysis. To analyze the saturation behavior and sensitivity of scRNA-seq data from SCOPE-seq2 (Fig. 3A), we randomly sub-sampled the aligned reads and re-processed them with the scRNA-seq analysis pipeline described above. We then calculated two statistics, molecules per cell and genes per cell, based on the cells that were discovered from the total reads.
To compare the molecular capture efficiencies of SCOPE-seq and SCOPE-seq2 (Fig. 3C,D), we compared the number of unique molecules and gene IDs detected in two mixed-species experiments performed using the SCOPE-seq (PJ061) and SCOPE-seq2 (PJ070). To correct for the difference in sequencing depths between the two methods, we randomly sub-sampled the aligned reads from SCOPE-seq, which had a higher sequencing depth, to match that of the SCOPE-seq2 dataset (to ~ 5.3 reads/molecule).
Accuracy of linking imaging and scRNA-seq data. The linking accuracy was defined as the concordance between the scRNA-seq and imaging-based species calling for cell barcodes associated with a single species. In scRNA-seq data, cells with > 90% of reads aligning uniquely to a given species were considered to correspond to a single species. In the imaging data, we determined the imaging-based species call based on cell live staining colors. Cells with Calcein AM intensity > 724 were called as imaging-based human cells; Cells with Calcein red-orange intensity > 2048 were called as imaging-based mouse cells. Intensity thresholds were determined as the intensity of the shortest bin between the two mean values of the bimodal Gaussian distribution of intensity values.
Imaging based multiplets identification. Two-color live staining fluorescence images were merged with Calcein AM signal in green and Calcein red-orange signal in magenta. Each well was manually examined within the smallest bounding square. Wells with mixed-species cells were determined as having at least one Scientific Reports | (2020) 10:19482 | https://doi.org/10.1038/s41598-020-76599-w www.nature.com/scientificreports/ green object and one magenta object; wells with a single cell were determined as having only one green object or one magenta object.

Glioblastoma experiment.
• GBM specimen was collected and dissociated into single cells as described above. Cells were stained with Calcein AM (ThermoFisher Scientific, cat# C3100MP) • The GBM cell suspension was processed and sequenced with SCOPE-seq2 workflow described above (PJ069).
• Imaging and sequencing data were processed with the SCOPE-seq2 pipeline described above.
• We removed multiplets based on manually examination of each well within the smallest bounding square of the Calcein AM fluorescence image. • We identified the dead cells based on the Calcein AM fluorescence intensity. We fitted a Gaussian distribution to the fluorescent intensity histogram, set a threshold of lower 5 percentile, and removed cells with intensity lower than the threshold.

Single cell hierarchical Poisson factorization (scHPF) analysis.
To reduce the dimensionality of scRNA-seq results, we factorized gene count matrix using the scHPF 14 with default parameters and K = 13 (www. githu b.com/simsl ab/schpf ). One of the factors contained several heat shock with high gene scores (among the top 50 genes), likely indicating dissociation artifacts in certain cells. This factor was removed in all downstream analysis.

Malignant cell identification.
The cell aneuploidy analysis was performed based on the scHPF model as described previously 23 . To compute the scHPF-imputed expression matrix, we multiplied the gene and cell weight matrix (expectation matrix of variable θ and β) in the scHPF model and then log-transformed the result matrix as log 2 expectedcounts/10000 + 1 . The average gene expression on each somatic chromosome was calculated using the scHPF-imputed count matrix as previously described 13 . We defined a malignancy score as the difference between the average expression of Chr. 7 genes to that of Chr. 10 genes, < log 2 Chr.7Expression > − < log 2 Chr.10Expression > , plotted in Fig. 5A. We fitted a double Gaussian distribution to the malignancy score and the score of the shortest bin between two mean intensities was used as the threshold that separates the malignant and non-malignant cell populations ( Supplementary Fig. S3A). The difference of chromosome average expression between malignant and non-malignant cells, computed as the expression subtracted by the average expression of non-malignant cells, was shown in Supplementary Fig. S3B.

scRNA-seq clustering and visualization.
To visualize the scHPF model (Fig. 4A), we generated a UMAP embedding using the Pearson correlation distance matrix computed from the cell score matrix. To cluster the scRNA-seq profiles, we used the Phenograph implementation of Louvain community detection 24 , with the Pearson correlation matrix and k = 50 to construct a k-nearest neighbors graph.
Cell optical phenotypes clustering. To reduce the dimensionality of the cellular imaging features, 16 cell imaging features were z-normalized and hierarchically clustered using the 'linkage' method in the python module 'scipy' with correlation distance. The dendrogram in Fig. 4C was cut as k = 3 to form three clusters of imaging features, corresponding to cell size, shape and esterase activity. The values of meta-features were calculated as an average of the imaging features within each cluster.
To cluster the malignant cells based on their optical phenotypes, we hierarchically clustered imaging metafeatures using the 'linkage' method in the python module 'scipy' with correlation distance. The dendrogram in Fig. 5C was cut as k = 2 to form two clusters of malignant cells.

Diffusion map embedding of malignantly transformed GBM cells.
We factorized the molecular count matrix for malignantly transformed GBM cells (identified by aneuploidy analysis as described above) using scHPF with default parameters and K = 15. Prior to further analysis, we removed one of the 15 factors, which exhibited high scores for heat shock response genes, because it likely represents a dissociation artifact in a subset of cells. We then computed diffusion components with the DMAPS Python library (https ://githu b.com/ hsidk y/dmaps ). A Pearson correlation distance matrix computed from the scHPF cell score matrix was used as input with a kernel bandwidth of 0.5. The first two diffusion components are plotted in Fig. 5B,D,E. scRNA-seq differential expression. We used the Mann-Whitney U test for differential expression analysis. For pairwise comparison of two groups of cells, the group with more cells was randomly sub-sampled to the same cell number as the group with fewer cells. Next, the detected molecules from the group with a higher average number of molecules detected per cell were randomly sub-sampled so that the two groups had the same average number of molecules detected per cell. The resulting sub-sampled matrices were then normalized using the random pooling method from Lun et al. as implemented in the scran R package 25 . Finally, the resulting normalized matrices were subjected to gene-by-gene differential expression testing using the Mann-Whitney U-test using the 'mannwhitneyu' function in the Python package SciPy. The resulting p values were corrected using the Benjamini-Hochberg method as implemented in the 'multipletests' function in the Python package statsmodels.