Single-cell RNA sequencing (scRNA-seq) enables the study and characterization of cellular states and pathways at ever-growing experimental scales, including the Human Cell Atlas1, cell atlases for tumors2 and other diseases3,4, and large-scale Perturb-Seq screens of millions of cells under genetic5,6 or drug7 perturbations. Methods for capturing and processing single-cell libraries have been radically scaled in the past few years8,9,10,11, but sequencing itself has largely relied on Illumina technology. Here we describe the development of a sequencing technology intended to facilitate large-scale studies. Mostly natural sequencing-by-synthesis (mnSBS) is a new sequencing chemistry that relies on a low fraction of labeled nucleotides, combining the efficiency of non-terminating chemistry with the throughput and scalability of optical endpoint scanning within an open fluidics system to enable high-throughput sequencing, and has been demonstrated on Genome-in-a-Bottle reference samples and samples from the 1000 Genomes project12. To benchmark mnSBS with scRNA-seq, we performed experiments with four library types, sequenced in parallel on an Illumina sequencer and on an Ultima Genomics (Ultima) prototype sequencer implementing mnSBS (Fig. 1a).

Fig. 1: Experimental design. a, Work flow showing four samples used and adjustments made for Ultima sequencing. b, Library conversion showing PCR process to change adapters from Illumina (P5 and P7, parts of Read 1 and 2) to Ultima (Primer for Sequencing + Sample Barcode (PS-SBC) and Primer for Bead (PB), parts of Read 1 and 2). The 5′ libraries have TSO and 3′ libraries have poly(dT). Our Ultima libraries did not require index sequences for combining libraries together, though this feature can be added in the future. c, mnSBS schematic. d, Data conversion of single-end reads to simulated paired-end reads needed for Cell Ranger analysis. White box shows five bases trimmed from cDNA and three bases trimmed from UMI adjacent to the poly(dT) sequence in 3′ libraries. In 5′ libraries, only three bases were trimmed from the cDNA next to the TSO. PS-SBC read is used to deconvolute multiplexed libraries.

To implement mnSBS for massively parallel, droplet-based scRNA-seq, we converted a typical scRNA-seq work flow to be compatible with Ultima sequencing (Fig. 1b–d; Methods). Focusing on 10x Chromium scRNA-seq (Methods), a popular method, we first added adapters specific for Ultima sequencing to the cDNA libraries (Fig. 1b). Next, we addressed the fact that droplet-based scRNA-seq relies on pairing each cDNA read with a cell barcode (CBC) and a unique molecular identifier (UMI) (Methods). With Illumina sequencing, the two ends of the library are sequenced separately by paired-end sequencing; for single-end Ultima sequencing, we instead capture all the information in a single read of 200–250 bases (Fig. 1d and Extended Data Fig. 1), such that the CBC and UMI are read first, followed by the cDNA. For reads derived from the 3′ end of the transcript, we sequence through the poly(T) bases, which result from the mRNA poly(A) tail and lie adjacent to the cDNA sequence.
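As a rough illustration of the single-read layout described here, the following sketch splits one 3′-library read into a simulated Read 1 (CBC + UMI) and Read 2 (cDNA), applying the trimming described in Fig. 1d. The 16-base CBC, the 12-base UMI, the minimum poly(T) run of 10 bases and the function name are assumptions made for illustration only; the conversion actually used is the one described in Methods.

```python
import re

CBC_LEN = 16      # assumed cell-barcode length
UMI_LEN = 12      # assumed UMI length before trimming
UMI_TRIM = 3      # bases dropped from the UMI end next to poly(T) (Fig. 1d)
CDNA_TRIM = 5     # bases dropped from the cDNA start next to poly(T) (Fig. 1d)
MIN_POLY_T = 10   # assumed minimum length of the poly(T) run


def split_three_prime_read(seq, qual):
    """Split a single-end read laid out as CBC + UMI + poly(T) + cDNA.

    Returns ((r1_seq, r1_qual), (r2_seq, r2_qual)), or None when no
    poly(T) run is found directly after the UMI or no cDNA remains.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    umi_end = CBC_LEN + UMI_LEN
    # Greedy match of the poly(T) run that starts right after the UMI.
    match = re.match(r"T{%d,}" % MIN_POLY_T, seq[umi_end:])
    if match is None:
        return None
    cdna_start = umi_end + match.end() + CDNA_TRIM
    if cdna_start >= len(seq):
        return None
    r1_end = umi_end - UMI_TRIM
    read1 = (seq[:r1_end], qual[:r1_end])
    read2 = (seq[cdna_start:], qual[cdna_start:])
    return read1, read2
```

Because the poly(T) match is greedy, any T bases at the very start of the cDNA are absorbed into the poly(T) run; a production pipeline would need to decide how to handle that case.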

To evaluate mnSBS with scRNA-seq, we carried out experiments with four libraries, spanning different technical and biological use cases, and sequenced each in parallel on both Ultima and Illumina sequencers (Methods). Three libraries were from peripheral blood mononuclear cells (PBMCs) of healthy human donors, spanning 3′ scRNA-seq (~7,000 cells, 1 individual), 5′ scRNA-seq (~7,000 cells, 1 individual) and a library generated in multiplex by pooling cells from eight donors (~24,000 cells, 8 individuals, 5′ scRNA-seq). We chose PBMCs because they are primary human cells, include diverse cell types of various sizes and frequencies and have been used for previous benchmarking13,14. The fourth library was from a Perturb-Seq5,6 experiment, where ~20,000 cells were profiled after clustered regularly interspaced short palindromic repeats (CRISPR)–Cas9 pooled genetic perturbation, followed by scRNA-seq to detect both the profile of the cell and the associated guide RNA. Together, the four libraries span three major use cases (individual patient atlas, multiplex patient profiling and large-scale screens) and the two most commonly used library types for scRNA-seq.

We first tested the feasibility of mnSBS for scRNA-seq, with matched Ultima and 5′ and 3′ droplet-based scRNA-seq of PBMCs. Initial analysis (Methods) showed that the number of UMIs generated at a given sequencing depth was comparable between Ultima and Illumina in the 5′ libraries, while for the 3′ libraries we obtained more UMIs with Illumina than Ultima (Fig. 2a), owing to differences in sequence quality. While Ultima and Illumina data for 5′ libraries were similar, for the 3′ data there was lower quality for Ultima in the bases flanking the poly(T) region—the 3′ end of the UMI and the 5′ end of the cDNA (Extended Data Fig. 2a). Indeed, filtering out reads that have bases with quality <10 in their UMI (the filter applied by the pre-processing pipeline we used, Cell Ranger15) yields similar rarefaction curves for Illumina and Ultima (Extended Data Fig. 2b). Thus, much of the difference in the observed number of UMIs per sequenced read for 3′ libraries is explained by the lower sequence quality UMIs in the Ultima data caused by the need to sequence through the poly(T) bases. To overcome this, for 3′ libraries we trimmed five bases from the cDNA adjacent to the poly(T) bases and then explored how best to trim the UMI. As we shortened the UMIs, UMIs that differed only in the trimmed bases ‘collapsed’ into a single UMI leading to decreases in the fraction of UMI–CBC pairs that occur in only one gene at roughly the same rate in Illumina and Ultima data (Extended Data Fig. 2c). Shortening the UMIs for Illumina had a minimal effect at nine bases or more (Extended Data Fig. 2d), suggesting that the challenges with Ultima reads were caused by lower base quality and that trimming to nine bases was reasonable.
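The two checks discussed in this paragraph can be sketched as follows: a UMI quality filter (discard a read if any UMI base has quality below 10) and the fraction of UMI–CBC pairs that occur in only one gene after shortening the UMI. This is a minimal sketch, not the Cell Ranger implementation; the Phred+33 quality encoding, the record layout and the function names are assumptions.

```python
from collections import defaultdict


def umi_passes_quality(umi_qual, min_q=10, offset=33):
    """True if every UMI base has Phred quality >= min_q (Phred+33 assumed)."""
    return all(ord(ch) - offset >= min_q for ch in umi_qual)


def single_gene_fraction(records, umi_len):
    """Fraction of (CBC, shortened UMI) pairs observed in exactly one gene.

    records: iterable of (cbc, umi, gene) tuples (hypothetical layout).
    umi_len: number of leading UMI bases to keep.
    """
    genes_per_pair = defaultdict(set)
    for cbc, umi, gene in records:
        genes_per_pair[(cbc, umi[:umi_len])].add(gene)
    if not genes_per_pair:
        return 0.0
    single = sum(1 for genes in genes_per_pair.values() if len(genes) == 1)
    return single / len(genes_per_pair)
```

Evaluating `single_gene_fraction` over a range of `umi_len` values gives the kind of curve used to judge how far a UMI can be shortened before distinct molecules start to collapse.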

Fig. 2: Quality metrics for matched 5′ and 3′ libraries. a, Total number of UMIs detected per cell at different sequencing depths. For b–e, reads were sampled so that Illumina and Ultima have the same number of reads. b, Number of cells identified by Cell Ranger only in Ultima, only in Illumina or both. c,d, Distribution of the number of genes (c) or UMIs (d) per cell. Box plots show the 25% and 75% quantiles with the median marked in between. e, Scatter plots with one point for each gene. Labeled genes (gray) have a high FC (FC > 2 using a pseudocount of 10 TPM). The 20 genes with the highest FC are labeled in each plot. For all 3′ libraries, the last three UMI bases were trimmed for quality reasons.

We further investigated whether trimming the last three bases of the UMI impacted the results owing to increased ‘collisions’ when different full UMIs collapsed together to the same trimmed UMI. This could be the case when high UMI complexity is required, for a cell with many UMIs detected or for a highly expressed gene. To explore this, we examined the ratio of the number of trimmed to untrimmed UMIs for cells and for genes with different numbers of UMIs, in the Illumina 3′ PBMC dataset (with the higher quality UMIs). At the cell level, the ratio reduces as the coverage increases, but only modestly, by less than 10% for all but a few dozen cells with very high number of UMIs (Supplementary Fig. 1a). At the gene level, very few of the highly expressed genes (8 of 3,908 genes with >1,000 UMIs) show a reduction of >10% (Supplementary Fig. 1b). Conversely, some very lowly expressed genes have lower ratios, likely because, for these genes, losing even one UMI will lead to a smaller ratio. Taken together, our analyses show that shortening UMIs has only a modest effect on highly expressed genes and high-complexity cell profiles. This led us to exclude the last three bases of each UMI in Ultima 3′ data in subsequent downstream analysis (Methods).
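The collision check described in this paragraph can be approximated per cell as the number of distinct (gene, trimmed UMI) pairs divided by the number of distinct (gene, full UMI) pairs. The sketch below assumes that a UMI is counted once per gene within a cell and uses a hypothetical record layout; it is meant only to make the ratio concrete.

```python
from collections import defaultdict


def trimmed_to_untrimmed_ratio(records, trim=3):
    """Per-cell ratio of trimmed to untrimmed UMI counts.

    records: iterable of (cbc, gene, umi) tuples (hypothetical layout).
    trim: number of bases removed from the end of each UMI.
    A ratio of 1.0 means trimming caused no collisions in that cell.
    """
    full = defaultdict(set)
    short = defaultdict(set)
    for cbc, gene, umi in records:
        full[cbc].add((gene, umi))
        short[cbc].add((gene, umi[: len(umi) - trim]))
    return {cbc: len(short[cbc]) / len(full[cbc]) for cbc in full}
```

Swapping the roles of `cbc` and `gene` when grouping gives the corresponding gene-level ratio.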

Next, comparing the performance of these PBMC 3′ and 5′ matched libraries, we obtained similar overall performance for both sequencing technologies. First, to correct for differences in sequencing depths, which were higher in Ultima than Illumina, we randomly sampled Ultima reads, so that we used the same number of reads for each sequencing platform (Methods). Both technologies identified nearly all the same CBCs (Fig. 2b; 7,916 cells (Ultima) versus 7,926 cells (Illumina) in the 3′ data, and 7,875 cells (Ultima) versus 7,854 cells (Illumina) in the 5′ data), with the same number of UMIs and genes per cell for 5′ libraries and slightly lower numbers for 3′ libraries with Ultima (as expected) (Fig. 2c,d). When we sampled reads to have the same number of UMIs (Methods), we obtained a similar number of genes per cell in Illumina and Ultima also for 3′ libraries (Extended Data Fig. 3). Other metrics (Supplementary Table 1) also showed similar overall performance, with slightly higher genome mapping rates in Ultima but comparable transcriptome mapping rates.
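The depth correction described above amounts to randomly subsampling the deeper dataset down to the size of the shallower one. A minimal sketch, assuming reads are held in memory as Python lists and using a fixed seed for reproducibility (the function name and seed are illustrative, and the sampling actually used is described in Methods):

```python
import random


def downsample_to_match(reads_a, reads_b, seed=0):
    """Randomly subsample the larger read set to the size of the smaller.

    Returns two lists of equal length; the smaller input is returned
    unchanged (as a list) and the larger is sampled without replacement.
    """
    rng = random.Random(seed)
    reads_a, reads_b = list(reads_a), list(reads_b)
    target = min(len(reads_a), len(reads_b))

    def take(reads):
        return reads if len(reads) == target else rng.sample(reads, target)

    return take(reads_a), take(reads_b)
```

For real FASTQ files of this size, a streaming approach such as reservoir sampling would be more practical than loading all reads into memory.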

The two sequencing technologies yielded highly correlated expression levels for the matched 5′ and 3′ PBMC libraries, albeit with some outlier genes and minor differences (Pearson’s r = 0.98 in all cases; Fig. 2e and Extended Data Fig. 3c). As expected, when a single sequencing run was randomly split into two datasets, we saw even higher correlation of expression levels (Extended Data Fig. 3d). Specifically, there was a modest bias, particularly in the 3′ libraries, towards genes with higher GC content having higher expression in Illumina and the longest genes having higher expression in Ultima 3′ libraries (Extended Data Fig. 4a,b). Of the 166 genes with differences in expression for 3′ PBMC between the two sequencing platforms, most (130 genes, 78.3%) differed in the fraction of reads that were assigned by Cell Ranger to the gene out of all the reads mapped to that gene region (Extended Data Fig. 4c). This is likely related to how Ultima and Illumina reads map to different locations relative to the transcript, as expected from the difference in single-end versus paired-end reads (Fig. 1d). In 5′ data, Ultima reads map closer to the 5′ end than Illumina reads, while in 3′ data, Ultima reads map closer to the 3′ end than Illumina reads (Extended Data Fig. 4d,e). Because Cell Ranger excludes reads that do not fully map within annotated gene boundaries, more Ultima reads are excluded from analysis as they are closer to gene ends (Extended Data Fig. 4d,e), as shown, for example, for LILRA5 and HIST1H1D (Extended Data Fig. 4f,g). This difference in location can also lead to more multimapping or ambiguous reads (Extended Data Fig. 4h and Supplementary Table 2). For example, four (ARF5, MIF, IFITM1 and TCIRG1) of the 20 genes with the largest log fold change (FC) (all logs are the natural logarithm (base e) in this study, unless otherwise noted) between Ultima and Illumina in the 3′ data (labeled in Fig. 2e) have higher expression in Illumina and a much higher rate of mapped ambiguous reads in the Ultima than the Illumina data (>50 versus <10 ambiguous reads per non-ambiguous read for each gene, respectively) (Supplementary Table 2), possibly explaining the difference in their expression levels. Shortening Ultima Read 2 to the same length as Illumina Read 2 had a small effect on the fraction of assigned reads (Extended Data Fig. 4h) and other metrics (Supplementary Table 1), suggesting read length is not a major factor in the differences we observed.
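To make the fold-change convention concrete, the sketch below computes a natural-log fold change between two platforms after scaling counts to per-million units and adding a pseudocount of 10 (as in the Fig. 2 legend), then flags genes whose absolute log FC exceeds ln 2, that is, FC > 2 in either direction. Treating TPM as UMI counts scaled to one million per dataset is an assumption here, as are the function names.

```python
import math


def per_million(counts):
    """Scale a {gene: count} dict so that values sum to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no counts to normalize")
    return {gene: 1e6 * c / total for gene, c in counts.items()}


def log_fold_changes(counts_a, counts_b, pseudocount=10.0):
    """Natural-log fold change (a over b) per gene, with a pseudocount."""
    tpm_a, tpm_b = per_million(counts_a), per_million(counts_b)
    genes = set(tpm_a) | set(tpm_b)
    return {
        g: math.log((tpm_a.get(g, 0.0) + pseudocount)
                    / (tpm_b.get(g, 0.0) + pseudocount))
        for g in genes
    }


def outlier_genes(logfc, threshold=math.log(2)):
    """Genes whose absolute log FC exceeds the threshold (default ln 2)."""
    return sorted(g for g, value in logfc.items() if abs(value) > threshold)
```

The pseudocount damps fold changes for lowly expressed genes, so a gene detected at a few counts per million on one platform and not at all on the other is not flagged.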

To further explore the effects of gene annotation on Ultima and Illumina-based scRNA-seq, we extended the standard reference using RNA-seq data, as we have previously shown this can recover the expression of a gene with an alternative 3′ end compared to the annotation16. We created a pipeline that extends the annotated gene boundaries based on reads that overlap a gene but are not completely contained in any of its annotated exons (Methods). We generated three such references, extended with either (1) published bulk PBMC data13, (2) the Ultima 3′ scRNA-seq data, or (3) the Ultima 5′ scRNA-seq data (with Ultima and Illumina data sampled to the same number of reads). We compared the expression of genes in Ultima data processed with the extended references to those in Illumina data either with or without the extended reference (Extended Data Fig. 5 and Supplementary Tables 1 and 3).
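The idea behind the extended reference can be illustrated with a deliberately simplified sketch: a gene is treated as a single interval, and its boundaries are widened to cover any read that overlaps the gene without being fully contained in it. The pipeline described above works at the level of annotated exons; the single-interval model, the half-open coordinates and the function name are simplifying assumptions.

```python
def extend_gene(gene_start, gene_end, reads):
    """Extend a gene interval using partially overlapping reads.

    gene_start, gene_end: half-open interval [start, end) of the gene.
    reads: iterable of (start, end) half-open intervals on the same
           chromosome and strand as the gene.
    Returns the (possibly widened) (start, end) interval.
    """
    new_start, new_end = gene_start, gene_end
    for read_start, read_end in reads:
        overlaps = read_start < gene_end and read_end > gene_start
        contained = read_start >= gene_start and read_end <= gene_end
        if overlaps and not contained:
            new_start = min(new_start, read_start)
            new_end = max(new_end, read_end)
    return new_start, new_end
```

A more careful version would probably require a minimum number of supporting reads before extending a boundary and would avoid extending into a neighboring gene.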

Analyzing the 5′ PBMC data with the extended reference decreased the number of differentially expressed (DE) genes between Ultima and Illumina by 22 to 23% (absolute logFC > ln2) compared with the standard reference, while other overall metrics were largely unchanged. In the 3′ data, there were a similar number of DE genes in analyses with the extended and standard references, although the expression of some genes, for example, LILRA5 and MT-CO2, agreed much more closely using the extended reference. Comparing gene expression levels for the same sequencing dataset processed with the standard or an extended reference shows that most levels are very similar, though a sizeable number (23 to 83) are higher and a few (1 to 3) are lower (Extended Data Fig. 5c). Also, some of the top genes that differ between the extended and standard references are genes that differ between Ultima and Illumina with the standard reference, for example, MT-CO2 and LILRA5 in the 3′ data and HIST1H1D and HIST1H1E in the 5′ data (Fig. 2e and Extended Data Fig. 4f,g). This suggests that a data-driven extended reference might help recover expression in Ultima scRNA-seq data, particularly when using 5′ data. Alternatively, one could consider modifying the way Cell Ranger counts UMIs to better take advantage of reads that overlap genes but are not completely contained within them.

We examined the impact of the single-end Ultima versus paired-end Illumina data by sequencing the 5′ PBMC library with single-ended Illumina sequencing. Applying a similar pipeline to the one we used for Ultima with minor required modifications (Methods), we sampled the Ultima data to have the same number of reads as the single-end Illumina data and compared them (Supplementary Fig. 2). The two methods showed very high agreement in terms of the number of UMIs per cell and genes per cell, with far fewer outlier genes between Ultima and the single-ended Illumina data than observed when comparing to the paired-end data (Supplementary Table 3). Overall, the quality control metrics of single-end Illumina and Ultima sequencing are much more similar, particularly the mapping metrics (Supplementary Table 1).

To compare the biological insights derived from scRNA-seq using the two technologies, we turned to analyze 5′ scRNA-seq of PBMCs from eight individuals processed together and sequenced with both Ultima and Illumina (Methods). Both methods have roughly the same number of UMIs in this dataset (<1% difference) and performed similarly (using all reads; Supplementary Table 1 and Extended Data Fig. 6). We also generated matched T cell Receptor (TCR) and B cell Receptor (BCR) Illumina sequencing data (Methods). Ultima sequencing was not used for this, because the 10x Chromium constructs specifically require paired-ends or much longer single-end reads to cover the entirety of these genes.

In the eight-individual PBMC dataset, the two sequencing platforms produced very similar results for the common tasks of genotype-based assignment, cell-type labeling and identification of DE genes and were well embedded together. First, we used Vireo17, which finds genotype clusters in the data without prior knowledge of the genotypes of individuals in the experiment, to assign reads to each individual in the mixture (Methods). Both Ultima and Illumina data returned highly concordant labels (Fig. 3a), with 92% agreement in label if we include those cells declared doublets or unassigned (χ² test for independence gives a P < 2.2 × 10⁻¹⁶ and χ² = 199,127 with degrees of freedom = 81) and >99.9% agreement if the cell is assigned singlet by both technologies (only five cells differ; χ² test for independence gives a P < 2.2 × 10⁻¹⁶ and χ² = 146,879 with degrees of freedom = 49). Next, we clustered the cells for each of the two datasets separately (Methods) and used Azimuth18 to automatically label cell types in each (Methods). In both sequencing datasets, we identify the major cell types expected for PBMCs, with the expected cell-type markers (Extended Data Fig. 7a,b), and cells are comparably well-mixed among individuals (Fig. 3b), with low adjusted mutual information (AMI) between cell type and individual in both Ultima (0.026) and Illumina (0.025) (AMI = 0 corresponds to no relation between individual and cell type; AMI = 1 corresponds to the case of perfect agreement between the two labelings). The two sequencing datasets also had high agreement on proportions of each cell type from each individual, both for the main cell-type categories (Fig. 3c; 95% agreement in cell-type labels between Ultima and Illumina; χ² test for independence gives a P < 2.2 × 10⁻¹⁶ and χ² = 123,891 with degrees of freedom = 49, AMI = 0.88) and for finer cell subsets, such as subclusters of T cells (Fig. 3d). They further agreed on differential expression between cell types (Fig. 3e; r = 0.93–0.95), such that 67.9% of genes that are significantly DE (FDR < 0.05 with Presto19; Methods) in one cell type in one of the two datasets are significant in both for that cell type. We found similar results with 5′ and 3′ PBMC datasets from a single individual (Extended Data Fig. 8). Moreover, the two PBMC mixture datasets were co-embedded well together into a joint two-dimensional space using uniform manifold approximation and projection (UMAP) after regressing out dataset of origin (Methods), with good mixing between datasets (AMI = 0.00068 between the joint clustering and dataset of origin), and good separation of cell types (Fig. 3f). Thus, data generated by the two sequencing technologies are compatible and can be combined easily in a single analysis.
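The label-concordance statistics reported above can be reproduced in outline with a contingency table of per-cell labels from the two platforms. The sketch below computes the fraction of cells with identical labels, the Pearson χ² statistic for independence and its degrees of freedom, (rows − 1) × (columns − 1). For instance, a 10 × 10 table gives 81 degrees of freedom, which would be consistent with eight donors plus doublet and unassigned categories. The function name is illustrative, and the P value (which needs the χ² distribution) is omitted.

```python
from collections import Counter


def agreement_and_chi2(labels_a, labels_b):
    """Return (fraction of matching labels, chi-square statistic, dof).

    labels_a, labels_b: per-cell labels from the two platforms, in the
    same cell order.
    """
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label lists must be non-empty and equal in length")
    agreement = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    table = Counter(zip(labels_a, labels_b))
    row_totals = Counter(labels_a)
    col_totals = Counter(labels_b)
    chi2 = 0.0
    for row, row_n in row_totals.items():
        for col, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (table[(row, col)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return agreement, chi2, dof
```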

Fig. 3: Cell-type identification and characterization of a mixture of PBMCs. a, Number of cells assigned to each donor by Vireo. Donors were renamed to match between Ultima and Illumina. b, UMAP plots for Ultima (right) and Illumina (left) colored by donor (top) and cell type (bottom). c, Bar plots of the proportion of each Azimuth-defined cell type in each donor for Ultima and Illumina. d, Bar plots of the proportion of each Azimuth-defined T cell subtype in each donor for Ultima and Illumina. Strong agreement can be seen. e, Scatter plots of logFC from performing DE between cell-type clusters with Presto. f, Joint UMAP of Ultima and Illumina data colored as in b. We did not sample the exact same number of reads from Illumina or Ultima data since they have approximately the same number of total UMIs. NK, natural killer cells; CTL, cytotoxic T cells; TCM, central memory T cells; MAIT, mucosal-associated invariant T cells; Treg, regulatory T cells; dnT, double negative T cells; gdT, gamma delta T cells.

For B and T cells, where we had clonotype assignment only by Illumina sequencing of TCRs and BCRs (Methods), we found good concordance between Ultima and Illumina cell-type assignments. Most T cells called by either method had TCR sequences (76% in both Ultima and Illumina) with only a very small percent of cells of other cell types having a TCR sequence (3.7% in Ultima and 3.5% Illumina), with similar results for B cells and BCR sequences (Extended Data Fig. 7c; 93% of B cells in both Ultima and Illumina were assigned a BCR clonotype while only 0.72% of non-B cells in Ultima and 0.73% of non-B cells in Illumina were assigned a BCR clonotype). The distribution of T cell subsets to top TCR clonotypes for each individual was also largely concordant between Ultima and Illumina sequencing (Extended Data Fig. 7d), with small differences in cell-type labeling. CD8 T effector memory (TEM) cells were by far the most likely to be expanded, as expected20. Thus, Ultima sequencing for scRNA-seq can be combined with Illumina sequencing of TCR and BCR genes to generate comparable results to those found with only Illumina sequencing.

To explore finer signals, we compared the two datasets for continuous cell states, such as activation status or the cell cycle, recovered by unsupervised non-negative matrix factorization (NMF). Each NMF factor can reflect a gene program, defined by a non-negative score for each gene (referred to as gene loadings) and a non-negative score for each cell (referred to as cell loadings). Because NMF runs are not identical even when re-run on the same data, to compare NMF models from Ultima and Illumina data, we fit NMF on Ultima data, Illumina data and a null of randomly permuted Illumina expression values (Methods) and then measured how well cell or gene loadings fit each dataset. Cell loadings from the model learned on Ultima data fit the Illumina data almost as well as cell loadings from the Illumina-learned model and vice versa, while loadings from the permuted (null) dataset led to a much poorer fit (Extended Data Fig. 9a). For gene loadings, there was lower performance when fitting data from one sequencing technology with loadings from a model learned on data from the other technology, each to a comparable extent, and both far better than random permutations (Extended Data Fig. 9b). Consensus NMF (cNMF)21 (Methods), which reduces variability owing to random sampling between NMF runs, showed high correlations of cell (Extended Data Fig. 9c) or gene (Extended Data Fig. 9d) loadings between models learned on different runs. The correspondence was comparable to that observed between two independent cNMF runs on the same dataset (Extended Data Fig. 9e,f), and lower than when comparing a single run to itself (Extended Data Fig. 9g,h), as expected. It was also much stronger than comparing cNMF models of two different biological systems (5′ PBMC mixture data and Perturb-Seq; see below for details of this experiment) (Extended Data Fig. 9i,j). Notably, the same cell subsets score highly for Ultima (Extended Data Fig. 9k) and Illumina (Extended Data Fig. 9l) data-derived programs on a joint UMAP embedding. For example, factor 13 in Illumina and factor 1 in Ultima scored in the same cells (Extended Data Fig. 9k,l) and were correspondingly highly correlated on both cell (Extended Data Fig. 9c) and gene (Extended Data Fig. 9d) loadings, indicating that they correspond to the same program. Moreover, other factors that differed between Ultima and Illumina were highly related; for example, factor 5 in the Ultima dataset was roughly decomposed into factors 5 and 11 in the Illumina dataset. Overall, we conclude that there is a high correspondence between cell states in Ultima and Illumina data.
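One simple way to pair factors across two models, in the spirit of the loading correlations described above, is to match each factor from one model to the factor in the other model with the highest Pearson correlation of loadings. The sketch below assumes loadings are given over the same ordered set of cells (or genes) and does not run NMF or cNMF itself; all names are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / math.sqrt(var_x * var_y)


def best_matches(factors_a, factors_b):
    """Match each factor in factors_a to its most correlated factor in b.

    factors_a, factors_b: {factor name: list of loadings}, with loadings
    ordered over the same cells (or genes) in both models.
    Returns {name_a: (name_b, correlation)}.
    """
    matches = {}
    for name_a, load_a in factors_a.items():
        best = max(factors_b, key=lambda nb: pearson(load_a, factors_b[nb]))
        matches[name_a] = (best, pearson(load_a, factors_b[best]))
    return matches
```

A one-to-one best match cannot represent a factor that splits into two factors in the other model, so the full correlation matrix is still worth inspecting.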

As a final test, we evaluated performance with a Perturb-Seq screen, where heavy sequencing requirements are particularly limiting for scale5,6, and used a design that also tested for CITE-Seq22 and Cell Hashing23 performance. Specifically, we used a library from a pilot screen of an ongoing genome-wide Perturb-Seq study (PIT, KGS, CJF and AR, unpublished results) to identify regulators of MHC Class I in melanoma A375 cells (Fig. 4a). In this pilot, we introduced 6,127 guides targeting 1,902 transcription factors and chromatin modifiers (Supplementary Table 4) along with both intergenic and non-targeting control guides, enriched for cells with low human leukocyte antigen (HLA) levels, and followed by scRNA-seq of 20,000 cells that included CITE-seq22 and Cell Hashing23,24 (Methods). We sequenced the resulting scRNA-seq libraries with Illumina and Ultima, but the targeted PCR amplification (‘dial-out’) libraries used for guide detection, CITE-seq and Cell Hashing were only sequenced with Illumina (Extended Data Fig. 10a–c) because the read length was not sufficient for guide detection and the others were not attempted. Initial pre-processing of the Perturb-Seq scRNA-seq data showed similar performance for Ultima and Illumina, after sampling reads to have the same number of UMIs in each dataset (Extended Data Fig. 6a,b and Supplementary Table 1), as before, as well as in terms of cell assignment to guides (Fig. 4b and Extended Data Fig. 10c), Cell Hashing barcodes (Fig. 4c) and cell clustering and marker gene expression (Extended Data Fig. 10d–i).

Fig. 4: Perturb-Seq. a, Perturb-Seq to find regulators of MHC Class I in melanoma. We transduced A375 melanoma cells with a genome-wide library, and cells with low HLA expression were enriched by flow cytometry before scRNA-seq. b, Number of cells with each perturbation, only plotting those with >10 cells, excluding non-targeting and background guides. c, Number of cells with each Cell Hashing label. d,e, Guide similarity heat maps in Ultima (d) and Illumina (e). Effects of each guide on each gene in the Illumina data were calculated with an elastic-net-based approach as in MIMOSCA. The matrix of guides by genes with (uncorrected) P < 0.05 was extracted; the correlation between guides was calculated and plotted as a heat map. f, We extracted all gene–guide pairs from our DE analysis with FDR < 0.05 in either Illumina or Ultima. For each of these guide–gene pairs, we plotted the logFC on the y-axis (with 95% confidence intervals) and guide on the x-axis, with each box being a different gene, for both Illumina and Ultima. We included all guides that targeted a gene with any significantly different guides. The dots are colored red if significant (FDR < 0.05) and black if not. g, KEGG enrichment analysis for each guide. The ten pathways with smallest P values in both Illumina (left) and Ultima (right) are plotted as their −log10 P in both Illumina and Ultima. All pathways were significant in both Illumina and Ultima at FDR of 0.05.

Importantly, the Ultima and Illumina datasets identified similar relationships between perturbations and similar regulatory effects. For this analysis, we included the 335 cells in Illumina and 336 cells in Ultima, coming from 11 perturbations and 10 control guides in this pilot screen that were assigned to a single perturbation that had more than 10 assigned cells (the same perturbations were found by Illumina and Ultima). We then fit a regularized linear model (with elastic net, similar to previous studies6,24; Methods) of the mean impact of each perturbation on each gene, selected genes with nominal P < 0.05 using a permutation-based approach (Methods) and clustered the guides by these regulatory profiles. Analyses that were based on Ultima (Fig. 4d) and Illumina (Fig. 4e) yielded very similar guide relationships, both between multiple guides to the same gene (for example, STAT1 guides) and between guides to different, functionally related genes (for example, STAT1 and IRF1 or COP1_1 and CREBBP_3). Moreover, there was very high agreement in the effects on individual genes in both datasets, when comparing DE genes between each guide and an intergenic control (intergenic_1) in each dataset, in both significance and effect size (Fig. 4f and Extended Data Fig. 10j,k). Many such gene–guide pairs were significant in both the Ultima and Illumina datasets (Fig. 4f), and those significant only in one had highly similar effect sizes, showing consistent signal. Moreover, the KEGG pathways enriched with DE genes between each guide and an intergenic control were highly similar between the datasets (Fig. 4g).
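A permutation-based nominal P value of the kind mentioned above can be illustrated with a much simpler statistic than the elastic-net coefficient used in the analysis. The sketch below swaps in the absolute difference in mean expression between perturbed and control cells and estimates a two-sided P value by shuffling cell labels. The statistic, the number of permutations, the seed and the function name are all assumptions for illustration; the actual procedure is described in Methods.

```python
import random


def permutation_p_value(perturbed, control, n_perm=2000, seed=0):
    """Two-sided permutation P value for a difference in means.

    perturbed, control: expression values of one gene in cells carrying
    a given guide and in control cells, respectively.
    """
    rng = random.Random(seed)

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(perturbed) - mean(control))
    pooled = list(perturbed) + list(control)
    k = len(perturbed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign cells to groups at random
        if abs(mean(pooled[:k]) - mean(pooled[k:])) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

With only a few hundred cells spread across many guides, as in this pilot, the smallest attainable P value is limited by the number of permutations, which is one reason nominal rather than corrected thresholds are convenient at the gene-selection step.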

Finally, we leveraged the Perturb-Seq data to assess any impact that our use of shorter Illumina Read 2 in the PBMC data may have had on our results. In the Perturb-Seq data, the length of Illumina Read 2 (96 bases) is longer than in the Illumina PBMC data (~55 bases). Reanalyzing the Illumina Perturb-Seq data after trimming Read 2 to 55 bases showed only slight reductions in the number of genes and UMIs per cell, and overall very similar results to the full-length Perturb-Seq data (Supplementary Fig. 3).

In conclusion, the two sequencing platforms generally perform similarly for scRNA-seq, across two main protocols for droplet-based scRNA-seq (3′ and 5′), two different sample types (primary cells and a cell line) and multiple experimental designs (simplex and multiplex, Perturb-Seq, CITE-Seq and Cell Hashing). One key explanation for the minor differences we observed is the position of reads relative to annotated gene boundaries (Extended Data Fig. 4d–g), as a consequence of Ultima single-end reads being closer to gene ends. Additionally, we currently recommend 5′ over 3′ libraries, given the small penalty in lost reads in 3′ libraries (Fig. 2a) owing to lower sequencing quality adjacent to the poly(T) sequence (Extended Data Fig. 2). A similar comparison of BGI MGISEQ-2000 and Illumina sequencing of 10x Chromium scRNA-seq libraries also found highly comparable results25.

Lower-cost Ultima sequencing should make it possible to sequence more reads, cells and/or samples in the context of large-scale tissue atlasing projects, such as the Human Cell Atlas1, the BRAIN Initiative26, the Cancer Moonshot Human Tumor Atlas Network2 as well as perturbation screens5,6. It should also be possible to design droplet-based scRNA-seq reagents, and methods for other large-scale single-cell and spatial genomics27,28,29,30,31 customized to Ultima sequencing to directly generate libraries and eliminate the need for library conversion (Fig. 1b). With appropriate adaptations for Ultima sequencing, single-cell ATAC-seq should be possible even with the current sequencing lengths. Additionally, with longer Ultima reads or different construct designs, sequencing of other library types including TCR/BCR and targeted PCR amplification can be enabled. Finally, such reduced sequencing costs could open the way to use scRNA-seq in clinical applications, including diagnostics (as in next-generation blood tests, ‘CBC2.0’1) or for therapeutics screens of small molecules, antibodies or cell therapies, impacting both basic biological discoveries and their clinical translation.