Comprehensive single-cell transcriptome analysis reveals heterogeneity in endometrioid adenocarcinoma tissues

Single cell transcriptome analysis of a cancer tissue can provide objective assessment of subtype population or the activation of each of various microenvironment component cells. In this study, we applied our newly developed technique of single cell analysis to the myometrial infiltration side (M-side) and the endometrial side (E-side) of a human endometrioid adenocarcinoma with squamous differentiation tissues. We also analyzed spherogenic cultures derived from the same tissue to identify putative regulators of stemness in vivo. Cancer cells in the E-side were highly malignant compared with those in the M-side. Many cells on the E-side were positive for spheroid-specific tumorigenesis-related markers including SOX2. In addition, there were higher numbers of epithelial-to-mesenchymal transition (EMT) cells in the E-side compared with the M-side. This study identified a site containing cells with high malignant potential such as EMT and cancer stem-like cells in cancer tissues. Finally, we demonstrate that established endometrioid adenocarcinoma subtype classifiers were variably expressed across individual cells within a tumor. Thus, such intratumoral heterogeneity may be related to prognostic implications.

expensive equipment and involve considerable time and effort for the treatment of cells. Recently, several methods for single cell gene expression analysis 4-8 that enable measurement of considerably larger numbers of single cells have been developed. We have also developed a novel strategy called Next generation 1 cell sequencing (Nx1-seq for short) to enable the single cell transcriptome analysis of thousands of single cells. Nx1-seq makes use of barcoded beads that can easily be amended to include a particular sequence. This new approach has high sensitivity and good reproducibility, and does not require expensive equipment.
The tumor microenvironment consists of many different cell types that have a range of biological roles. Navin et al. 9 reported that analysis of 100 single cells from a polygenomic tumor revealed three distinct clonal subpopulations that probably represented sequential clonal expansions. Cancer development is influenced by differential microenvironmental conditions such as hypoxia, immune cell infiltration and stromal fibrosis. Single-cell analysis in cancer can provide an objective assessment of the overall abundance and activation of various microenvironment components, including cancer-associated fibroblasts, vessels and immune cells.
In this study, we used Nx1-seq to characterize the complex cellular heterogeneity in a human endometrioid adenocarcinoma tissue. Endometrial cancer is the most common global the gynecologic malignancy 10,11 . Approximately 80% of cases are estrogen-dependent well-to-moderately differentiated endometrioid endometrial carcinomas (EEC) that are usually confined to the uterine corpus at diagnosis of stage I and, thus, most are surgically curable. In contrast, estrogen-independent EECs are poorly differentiated tumors that frequently invade the myometrium, extend beyond the uterus at the time of hysterectomy, and are associated with a poor outcome. Despite the high and increasing incidence of endometrial cancer, our current models for the prediction of prognosis and treatment responses are suboptimal. Novel molecular biomarkers are needed to assist clinical decisions. This study used our Nx1-seq analysis method to identify each cell type and status in a cancerous tissue and to develop candidate biomarker molecules.

Results
Sensitivity and reproducibility and of Nx1-seq. We have developed a method for comprehensive single-cell transcriptome analysis, which can obtain digital gene expression data from thousands of individual cells ( Supplementary Fig. 1a, b). To confirm the ability of Nx1-Seq to identify individual cells such as the PC9 cells, a lung cancer cell line, we compared the results with those from the RNA-seq analysis of a bulk cell population (hundreds of thousands of cells) ( Fig. 1a and Supplementary Table 1). We also performed Nx1-seq with the PC9 cell line to ensure the reproducibility of technical replicates. The top 100 libraries from PC9 cells were compared to the bulk cell library, and Pearson correlation coefficients of approximately 0.94 were obtained (Fig. 1b). The pooling of Nx1-seq data from single cells of two homogeneous cell populations provided rich and highly reproducible transcriptional profiles. In addition, gene expression patterns among libraries with large sequencing reads were similar (r = 0.97), as shown in the mean of the scatter plots (Fig. 1c). We tested the ability of Nx1-seq to identify individual cells among two cell types using a ~1:1 mixture of PC9 (a human lung carcinoma cell line) and LLC (a mouse lung cancer cell line) cells. The cells were loaded onto a microwell and Nx1-seq was performed using a Miseq sequencer. We found that human and mouse transcripts were associated with each cell barcode (Fig. 1d). Next, Nx1-seq data was compared with an other massively parallel bead-based scRNA-seq method, Drop-seq using mouse cell lines. The gene detection levels per reads of Nx1-seq for each cell lines were similar to those of Drop-seq (Fig. 1e,f). Moreover, we tested the method on human peripheral blood mononuclear cells. As shown in Fig. 1g and Supplementary Fig. 2, the obtained libraries could be clustered into monocytes, T cells, NK cells and B cells by a three dimensional analysis using t-Distributed Stochastic Neighbor Embedding (tSNE). These data suggest that Nx1-seq can be used to identify genes expressed in each cell population.
Heterogeneity of tumor tissues. Endometrioid adenocarcinoma (EA) of the endometrium is the most common type of endometrial carcinoma 10 . In this study, we obtained single cell transcriptomes from an EA with squamous differentiation. Figure 2a shows the workflow of Nx1-seq. To estimate the endometrial tumor cell diversity, a single cell analysis was performed using Nx1-seq. We also analyzed the differences between cancer tissue cell populations from two sites, namely the endometrial side (E-side) and the myometrial infiltration side (M-side). Myometrial invasion is an independent prognostic marker of endometrioid carcinomas and correlates with the risk of metastasis to pelvic and/or para-aortic lymph nodes 10,12 .
One thousand cells from the E-side and from the M-side were analyzed and yielded more than 400 genes per cell and more than 20,000 reads (Supplementary Table 2). We performed tSNE for cells in the E-side and M-side, and showed three major clusters based on cell types. The cell populations inferred from this analysis were readily matched to the tumor cells and infiltrating lymphocytes, including macrophages and T cell classes, based on the specific expression of known markers such as CD3D and CD68 (Fig. 2b). We defined each cell type using several cell-specific genes as shown in Supplementary Table 3. When we compared the cellular ratio in the tissues of both sides, T cell infiltration in the M-side was higher than in the E-side (Fig. 2b). Next, to determine whether the gene expression pattern from single cells was consistent with the locations of individual tissue cells in the cancer tissue, EA tissue sections were stained with antibodies specific for immune cell markers such as CD4, CD8, TIA1, FOXP3 and CD20, cell proliferation marker such as BIRC5 and MKI67, stromal cell marker such as VIM and ACTA2 ( Fig. 2c and Supplementary Fig. 3). These data showed that the Nx1-seq data of infiltrating T cells was consistent with the pathological data. Moreover, we estimated the population of the infiltrating T cells between the M-side and the E-side in EA from eight other endometrioid adenocarcinoma patients ( Supplementary Fig. 4). In agreement with the previous experiment, the data showed that T cell infiltration in the M-side was higher than in the E-side. The relative abundance of the major cell classes in our data agreed with the pathological data, indicating that Nx1-seq provided an accurate assessment of the cell population in the tumor environment. Heterogeneity of cancer cells. We next applied the Nx1-seq method to the characterization of cancer cells. It is well known that cancer cell populations include cancer stem cells, differentiated cells in the mesenchyme transitioning from epithelial cells, and cells affected by therapies. Therefore, we sought to determine whether our method could differentiate these cell populations using a range of biomarkers despite the accumulation of gene mutations in endometrial cancer. We used estrogen receptor (ER) and progesterone receptor (PR) as prognostic biomarkers as these have been validated for endometrial cancer 11 . Loss of ER and PR is linked to aggressive tumors, specifically to the endometrioid subtype. In addition, STMN1 and HER2 overexpression identifies high-risk patients and lymph node metastasis in endometrial cancer 11 . In agreement with the pathological assessment, few cells on either side were found to express ER or PR. In contrast, STMN1 positive cells were more abundant in the E-side ( Supplementary Fig. 3). Myometrial invasion in endometrioid carcinomas is thought to be correlated with the risk of metastasis and is related to epithelial-to-mesenchymal transition (EMT) [13][14][15] . We therefore used Nx1-seq to examine EMT in the E-side and M-side. We screened for cancer cells expressing at least one EMT marker, such as ACTA2, EPCAM, CD44, THY1, VIM, FN1, ZEB1 or CDH1 (Fig. 2d). Our results showed that the cancer cells could be separated into three groups as follows: cells with only epithelial markers (EA); cells with only mesenchymal markers (EA EMT ); and cells with both epithelial markers and mesenchymal markers (EA intEMT ). To characterize the cancer cells further, we used an unsupervised cluster analysis (Fig. 3). Interestingly, each cluster of cancer cells inferred from this analysis contained all three types of cells, namely EA, EA intEMT , and EA EMT cells (Fig. 2d). These data suggested that EMT-like cells in the classified groups might be derived from a single cell.
The relative frequencies of different EMT-like cells in the E-side and M-side were estimated. The analysis indicated many EA type cells in the M-side. In contrast, the EA intEMT and EA EMT cell types contributed a higher proportion of EMT-like cells in the E-side compared with the M-side (Fig. 2d). Over all, the results indicated that the cancer cell populations in the E-side and M-side were different. In addition, we examined expression of specific genes in EA EMT type cells. Genes that highly expressed in EA EMT compared with EA type cells were TCEAL4, BAG1, VAT1, and GPI which are known to be gynecological tumor markers (Supplementary Table 4).

Heterogeneity of infiltrating immune cells.
Recently, attention has turned to the various types of non-neoplastic cells present in tumors, such as stromal cells and infiltrating leukocytes, which can interact with tumor cells and have a major influence on long-term outcome in patients 16,17 . In addition to CD8+ cytotoxic T cells, macrophage subsets are also considered a key component of effective antitumor immunity 18 . We therefore sought to characterize leukocyte infiltration of the endometrial cancer microenvironment using Nx1-seq to analyze gene expression in macrophages and T cells in the E-and M-sides (Fig. 2e). The population of T cells and cytotoxic T lymphocytes (CTLs) was larger in the M-side than the E-side. Although the ratios of tumor-infiltrating . Clustering of cancer cells. We performed an unsupervised cluster analysis using the Nx1-seq data to determine to what degree the two sides of the cancer tissue could be distinguished for EA, EMT[intEMT] and EA[EMT] types. Notably, there was not complete separation of these three cancer types, indicating that each single cell became a single EA EMT during the growth of cancer. Enlarged view shows one example. macrophages did not differ between the two sides there were some qualitative differences in macrophages. A population of macrophages expressing CXCL3 and CXCL8, chemoattractants in inflammation, and NFKBIA, was present in greater numbers in the M-side (Fig. 4). In contrast, macrophages with UCHL1, which promotes cancer progression 19 , were more common in the E-side. The proportions of macrophages expressing the inflammatory factors CCL5, IL10 and IL6 did not differ between the two sides. The analysis of gene expression levels in T cells showed that MALAT1-and RARB-expressing T cells were more frequent in the M-side.

Cells expressing chemokines in cancer tissues. The infiltration of macrophages and T cells into cancer
tissues is regulated by several chemokines. We therefore investigated chemokine expression in tumor tissues by Nx1-seq. Figure 2f shows the number of cells expressing chemokines in the E-and M-sides. The number of cells expressing CCL4, CCL20, CXCL1, and CXCL2 was greater in the M-side than the E-side. Cells expressing these chemokines consisted of cancer cells including EA, EA intEMT and EA EMT cells, as well as macrophages and T cells. In addition, it is well known that CCL3, CCL4 and CCL20 are able to recruit T cells containing CTLs to the tumor site. Therefore, these chemokines might contribute to tissue invasion by T cells. Next, we examined the expression of CCL4 and CCL20 by T cells in the E-and M-sides (Fig. 2g). Our results indicated that chemokines were expressed by T cells and that their expression levels were different in the two sides.

Characterization of spheroid cells established from endometrial adenocarcinoma tissues.
It has been reported that cancer stem/initiating cells are present in endometrial cancer 20,21 . Here, we used Nx1-seq analysis to determine whether this method could also detect cancer stem/initiating cells in endometrial cancer. To characterize cancer stem/initiating cells from endometrial adenocarcinoma cells, we used a sphere-forming assay to enrich potential stem cells from the endometrial tissue and then performed Nx1-seq analysis (Fig. 5a). In addition, we also analyzed sphere-derived cells growing in serum adherent cultures (Ser cells), because the use of serum promotes cell differentiation. Sphere-derived cells growing in serum-free nonadherent cultures (Sph cells) were used to examine whether they influenced in vivo tumor formation when transplanted into NSG mice, compared with Ser cells. NSG mice were injected with 1-10 × 10 5 Sph and Ser cells. Subcutaneous tumor formation was induced by the injection of 10 Sph cells as shown by macroscopic examination of a tumor derived from Sph cells (Fig. 5b,c). In contrast, at least 1 × 10 2 Ser cells were required to give rise to a tumor. These results support the hypothesis that Sph cells have a high cancer-initiating ability.
When a sample of 500 cells from Sph and Ser were analyzed by Nx1-seq, both cell types were highly heterogeneous as previously reported (Fig. 5d) 22 . The relative number of cells positive for each of the genes screened was then compared between Sph and Ser cells (Supplementary Table 5). In Sph cells, the gene expressed in the largest proportion of cells was FADS2, followed by CTSV, S100B, and SCG2 (Fig. 5d). In contrast, the most frequently expressed gene in Ser cells was SFN, followed by ANKRD1, KRTAP2-3, and CA2. To identify gene-signature-based differences between Sph and Ser cell populations, we performed a gene set enrichment analysis (GSEA). Our results showed that gene sets involving "Nucleobasenucleosidenucleotide and nucleic acid metabolic process", "biopolymer metabolic process" and "negative regulation of cellular process" were upregulated in Sph/Ser  Table 5). SOX2, THY1, CD24 and ZEB1 genes were expressed at a higher level in Sph cells compared with Ser cells (Fig. 5d).
Cancer tissues and spheroid cells. The distribution of cancer stem-like cells expressing Sph-specific genes was analyzed in the two sides of endometrial cancer tissues. A larger proportion of cells showing high levels of gene expression were found in the E-side than the M-side (Table 1); by contrast, higher levels of expression occurred more frequently in Ser cells were found in the M-side. Cells positive for the stem cell markers THY1, MYC and SOX2 occurred at a higher rate in the E-side. Figure 5e shows immunohistostaining for SOX2 positive cells in the E-side and the M-side; the results of this immunohistochemical analysis for SOX2 positive cells were consistent with the pathological data.
We investigated whether malignant cells expressing stem cell markers and those in an EMT transcriptional state coexisted in the E-side and M-side. The analysis showed that the three types of cancer cells (EA, EA intEMT and Highly expressed genes in spheroid.  EA EMT ) expressed SOX2, suggesting that cancer stem-like cells might become EMT cells (Fig. 5f). In addition, cells expressing other Sph-specific markers, such as CTSV, S100B, and LCP1, were present at a higher rate in the E-side, suggesting that the E-side contained cells with greater tumorigenic activity. Notably, the E-side, but not the M-side, contains active tumorigenic cells and EMT cells. In contrast, T cells infiltrated into the M-side at a higher rate than into the E-side.

Number of cells
Origin of the squamous cell carcinoma. The origin of squamous cell carcinoma (SCC) in endometrioid adenocarcinoma with squamous differentiation is not clear. Because KRT14 is a marker for squamous cell carcinoma 23 , we screened the gene expression of KRT14-positive cells. The results indicated that KRT14-positive cells were present in the E-side but not in the M-side (Fig. 6a,b). This finding was confirmed by high molecular weight keratin staining. Single cell gene analysis showed that some SCCs positive for KRT14 expressed the adenocarcinoma marker KRT8 as well as the EMT markers VIM, and ZEB2 (Fig. 6b). Bass et al. 24 reported that SOX2-driven tumors expressed markers of both squamous differentiation and pluripotency. In the current study, one cell expressing KRT14 was found among SOX2-positive cancer cells in the E-side. This cell was also positive for BIRC5, STMN1, SOX2, KRT14, VIM, CD24, EPCAM1 and ZBTB17. Overall, our data suggested that the gene expression of KRT14 (but not the antibody) is a marker for SCCs in endometrial cancer. Moreover, SCC may be related to EMT and associated with factors such as serum or attachment signals.

Discussion
Cancer cells display a range of phenotypic states that differ in their functional properties. It has long been postulated that intratumoral heterogeneity contributes to disease progression and has a marked effect on therapeutic efficacy. Previous studies have shown the heterogeneity of cancer cells and stroma cells. Our newly developed single cell transcriptome analysis, which we termed Nx1-seq, can overcome issues related to heterogeneity and provide novel insights into tumor microenvironments. This new approach is simple and can be used to analyze several hundreds to tens of thousands of cells without special equipment. Moreover, these microwells with barcode beads in a Lab-Tek chamber slide can be stored with buffer for several months before use. Therefore, the plate can be carried anywhere and analyses can be performed immediately after cell isolation without the need to prepare plates. In addition, the size of the well can be modified depending on the size of the target cells, larger microwells for cancer cells or smaller microwells for leukocytes and other non-cancerous cells, or with varying degrees of morphology and membrane composition. To determine whether cells are present in a well, empty wells without beads can be determined by light microscopy. Alternatively, by using emulsion (em)PCR, we placed an insert sequence for any binding site such as TCR, or IgG, into the capture beads using specific primers ( Supplementary Fig. 5).
In this study, we characterized complex heterogeneous cell samples from human endometrioid adenocarcinoma tissue and analyzed thousands of cells per experiment using Nx1-seq. Human endometrioid adenocarcinoma tissues are comprised of varying ratios of different cell populations of infiltrating immune cells, fibroblasts and cancer cells. Here, we also investigated the differences in cells from the E-side and M-side. Myometrial invasion is an independent prognostic parameter of endometrioid carcinomas and is correlated with the risk of metastasis to the lymph nodes. In addition, cells expressing malignancy-related genes present at greater numbers in the myoinvasive front. However, in this study, cancer cells in the E-side were highly malignant compared with those in the M-side. Many cells on the E-side were positive for spheroid-specific markers, such as SOX2, CTSV and GNG2, which are related to tumorigenesis. However, EMT is also regulated by multiple epigenetic mechanisms that can repress the expression of epithelial markers and convert epithelial cells into aggressive, invasive tumor cells with stem cell properties. In addition to the increased frequency of cells with a cancer stem cell marker in the E-side, the population of EA intEMT and EA EMT cells in the E-side was higher than in the M-side. Moreover, the population of infiltrating T cells in the tumor, which is a marker for prognosis, was smaller in the E-side compared with the M-side. These data obtained by Nx1-seq, demonstrated that cells with high malignant potential (HMP) were present in the site of the same cancer tissue.
Examination of gene expression in sphere cells identified TCEAL4 25 , BAG1 26 , VAT1 27 and GPI 28 genes, which were significantly more highly expressed in EA EMT than EA cells. These genes were previous reported as cancer markers in gynecological tumors. Therefore, they may be useful as HMP markers of endometrial adenocarcinoma.
Huvila et al. examined the biological differences between low-grade and high-grade endometrioid endometrial adenocarcinomas 29 and Mhawech-Fauceglia et al. investigated the differentially expressed genes between late and early stage and good prognosis vs. poor prognosis endometrioid adenocarcinoma 30 . When we compared our data with these data sets, only GPI was consistently observed in both studies. Why so few genes were detected might be related to the difference in individuals and/or cancer stage or these specific genes might only be expressed in rare cells observed by single cell analysis.
We hypothesized that sites with two distinct characteristics were present in this carcinoma tissue: cancer cells at an HMP site, and a second site where the HMP of the cancer cells had decreased (Fig. 7). Cancer cells in the E-side had reduced proliferation potency similar to that of stem-like cells. This site may be a niche for cancer stem-like cells, and may be more prone to EMT. However, immunocytic invasion occurs at an extremely low frequency in this site. At the M-side, the expression of chemokines increases and immunocytic invasion occurs. Therefore, to develop more effective cancer therapies, our hypothesis suggests that the diversity of cancer cells in each distinct site of tumor tissues must be considered.
In conclusion, Nx1-seq analysis is a powerful approach for characterizing and understanding cellular diversity under physiological or pathological conditions. We showed that a cancer tissue in endometrioid adenocarcinoma with squamous differentiation was heterogeneous with a wide range of immune cells.

Methods
Clinical specimens. The endo`metrioid adenocarcinoma samples were obtained with informed consent from patients who had undergone radical resection at the Sapporo Medical University Hospital (Sapporo, Japan), and tissue acquisition procedures were approved by the ethics committee of Sapporo Medical University. Blood samples were obtained with informed consent from a healthy volunteer and collection was approved by the ethics Tumor sample. Endometrioid adenocarcinomas with squamous differentiation (grade III at diagnosis) were obtained from a patient and minced. The cell mixture was digested with collagenase and dispase, and cultured in PBS(-). A single cell suspension was passed through a 30-μm nylon mesh, and live cells were isolated by density gradient centrifugation using Lymphoprep (Axis Shield, Oslo, Norway) at 780 × g for 20 min.

Development of a method for comprehensive single cell transcriptome analysis. Our method
for single cell transcriptome analysis can provides digital gene expression data for hundreds or thousands of single cells. The method is a scalable approach for digital gene expression profiling of thousands of single cells without the use of robotics. The method has three steps: 1) preparation of polystyrene beads conjugated with barcode nucleotides and of oligo-dTs by means of emulsion PCR (Supplementary Fig. 1a); 2) placing a barcoded bead into a well and mixing a single cell with a barcoded bead; 3) adding a population of heterogeneous cells and beads into 20-pL wells molded in polydimethylsiloxane (PDMS) slides ( Supplementary Fig. 1b). Each slide can contain 1.6 × 10 5 wells (2 × 2 mm). Poly(dT) barcoded beads with a diameter of 20 μm were first added to the microwell slide to achieve 1 bead/well. Approximately 10,000 cells were allowed to settle into the wells of a PDMS slide by gravity. The slides were covered with dialysis membrane and then incubated with a cell lysis solution containing 1% lithium dodecyl sulfate. After lysis, cellular mRNA was bound to the poly(dT) barcoded beads, which were then collected and used for reverse transcription.
Preparation of barcode-dT capture beads. The protocol for preparing barcoded beads was modified slightly from that described in the instruction manual for the GS Junior Titanium emPCR Kit (Lib-L) kit (Roche) as shown in Supplementary Fig. 6. Briefly, the synthesized oligonucleotide ( 5′-CC AT CT CA TC CC TG C G TG TC TC CG AC TC AG GC AG TG AA AA AA AA AA AA AA AA AA AA AA AA AN NN NN NN NN NN NA CA TA GG CC GT CTTCAGCCGCTGAGACTGCCAAGGCACACAGGGGATAGG-3′, 1 × 10 5 molecules) was hybridized to polystyrene DNA capture beads (1 × 10 7 beads) bound to specific oligonucleotide (5′-CCTATCCCCTGTGTGCCTTGGCA-3) included in the kit and clonally amplified by emulsion PCR with primer-1 (5′-CCTATCCCCTGTGTGCCTTGGCA-3′) and primer-2 (5′ biotin-18-atom hexa-ethyleneglycol spacer -CCATCTCATCCCTGCGTGTC-3′) (IDT Technologies). After breaking the emulsion, an enrichment process was utilized to selectively capture beads with an oligo-dT, while rejecting null beads. After the beads were purified, they were treated with ExoI(NEB) for 30 min. The ExoI was inactivated at 80 °C, and the beads were incubated with BtsI(NEB) for 2 hours. Next, the enriched double-stranded DNAs on the capture beads were cut with BtsI for 2 hours at 55 °C. After digestion, double strand DNA on the beads was rendered single stranded by removal of the secondary strand through incubation in 0.125 M NaOH. The supernatant was carefully removed and discarded. The residual solution was then diluted by the addition of 1 mL annealing buffer (20 mM tris-acetate, pH 7.6, 5 mM magnesium acetate), and the suspension was vortexed at a medium speed for 2 seconds; the beads were then pelleted and the supernatant removed. The beads were stored at −20 °C until use.
The barcodes on the beads were produced using emulsion PCR using randomly synthesized barcode oligo DNAs. Because multiple beads can enter the water-in-oil (w/o) emulsion droplet that acts as the reaction compartment, we examined the diversity of the barcodes. The DNA of several barcoding beads was sequenced using a GS Junior Titanium sequencer to confirm whether a single barcode was multiplied after making the barcoded beads. The data indicated that overlapping barcodes in single wells accounted for less than 2% of the 10,000 beads ( Supplementary Fig. 1c). Therefore, the barcoded beads were suitable for single cell analyses.
Preparation of microwell slides. A grid of micropillars (25-μm diameter, 40-μm height) was photolithographically patterned onto a silica wafer using SU-8 photoresist. The silica wafer was then used as a mold to print the PDMS slides (SILPOT 184, Dow Corning Toray Co., Ltd., Japan); these slides have the dimensions of a standard microscope slide and each contains ~160,000 wells in a 20 × 20 mm section. The molded PDMS slides were placed in an oxygen plasma chamber (YSR, SAKIGAKE Semiconductor Co., Ltd) for 5 min to generate a hydrophilic surface. The slides were then set into Lab-Tek ® Chamber Slide (2 well style, Thermo Fisher Scientific) The slides were then blocked in 1% bovine serum albumin (BSA) for 30 min and washed with deionized water followed by PBS to prepare them for cell seeding.
Barcoded beads (~600,000) in 1.0 mL PBS were allowed to settle into the wells of a PDMS slide by gravity over the course of 30 min with gentle agitation to distribute the suspension over the slide surface. To ensure one bead per well, a BSA-blocked dialysis membrane (12,000-14,000 MWCO regenerated cellulose, Spectra/Por 2 membrane discs, ϕ47 mm, Spectrum), that had been rinsed with PBS, was placed on the slide surface to seal the microwells. Excess beads in wells were removed by squeezing the surface of the slide with a small stick. Then, the presence of one bead per well was determined using a microscope.
Synthesis of cDNA of a single cell. A suspension of 10,000 cells was placed on each PDMS microwell slide (~160,000 wells) and allowed to settle into the wells by gravity over the course of 15 min with gentle agitation. The cell seeding process was calculated to be 90% efficient by measuring the cell concentration in seeding buffers for both pre-and post-cell seeding. The distribution of cells as single or multiple cells per well was calculated using Poisson statistics. The cell:well ratio of 1:20 used in this experiment corresponds to 95% of cells being deposited at one cell/well. The cells were allowed to settle into wells, then a BSA-blocked dialysis membrane rinsed with PBS was placed on the slide surface, sealing the microwells and trapping the cells and beads inside. Excess PBS was removed from the slide and membrane surfaces using a 200 μL pipette. Cell lysis solution (200 μL of 500 mM LiCl in 100 mM TRIS buffer (pH 7.5) with 1% lithium dodecyl sulfate, 10 mM EDTA and 5 mM DTT) was applied to the dialysis membranes for 12 min at room temperature. Time-lapse microscopy revealed that all cells were fully Immunohistochemical staining. Immunohistochemical staining was performed using formalin-fixed, paraffin-embedded sections. Sections (4-5 µm thick) were deparaffinized in xylene and rehydrated in graded alcohols and antigen retrieval was performed using target retrieval solution (pH 9.0) (DAKO, Tokyo, Japan). Endogenous peroxidase activity was blocked by 3% hydrogen peroxide in ethanol for 10 min. The sections were incubated with primary antibody (Supplementary Table 6) for 1 hour followed by incubation with biotinylated anti-mouse IgG or anti-rabbit IgG (Nichirei, Tokyo, Japan) for 30 min. Subsequently, the sections were stained with streptavidin-biotin complex (Nichirei), followed by incubation with 3,3′-diaminobenzidine as the chromogen and counterstaining with hematoxylin. Spheroid formation assay. Sphere cultures were performed as described previously 31 . Cells were plated at 1000 cells per well in six-well ultra-low attachment plates (Corning Inc., Corning, NY) and cultured in a DMEM/ F12 medium with 10 ng/mL hEGF, 10 ng/mL hbFGF, and 2% B-27 (Life Technologies Corp.) at 37 °C in 5% CO 2 . On day 7, the number of colonies was counted under an inverted contrast microscope.
Xenograft. Spheroid attached cells were collected and resuspended at 1 × 10 2 to 1 × 10 5 cells per 50 µL of PBS, followed by the addition of 50 µL of Matrigel (BD Biosciences). This cell-Matrigel suspension was subcutaneously injected into the backs of 4-6-week-old NSG mice (NOD.Cg-Prkdc scid Il2rg tm1Wjl /SzJ, Charles River Laboratory, Yokohama, Japan) under anesthesia. Tumor growth was monitored weekly for 12 weeks. Then, the xenografted tumors were resected and analyzed for the tumorigenicity of sphere cells at the subcutaneous injection site.

Clustering analysis and cell classification. t-Distributed Stochastic Neighbor Embedding (tSNE)
analysis 32 and visualization was performed using R. Cancer and other cells were also defined as described in Supplementary Table 2.
Statistical analysis. The Pearson correlation coefficient was calculated to determine the correlation between a cell versus bulk library, experiment 1 versus experiment 2, and each cell marker. Pair-wise multiple group comparisons were analyzed by the Mann-Whitney U-test with FDR correction by Benjamini/Hochberg method. Accession Numbers. The Nx1-seq data have been deposited in the DNA Databank of Japan (DDBJ) with the accession number DRA005192.