Date: 2022-05-30

Cell sources, culturing and sorting

HEK293FT cells (Invitrogen) were grown in DMEM medium (4.5 g L−1 of glucose and 6 mM L-glutamine, Gibco), supplemented with 10% FBS (Sigma-Aldrich), 0.1 mM MEM non-essential amino acids (Gibco), 1 mM sodium pyruvate (Gibco) and 100 µg ml−1 of penicillin–streptomycin (Gibco) at 37 °C. For single-cell sorting, cells were harvested by incubation with TrypLE Express (Gibco). K562 (ATCC) cells were grown in RPMI medium, supplemented with 10% FBS (Sigma-Aldrich) and 1% penicillin–streptomycin (Gibco). Frozen aliquots of 10 million hPBMCs from healthy donors were purchased from Lonza. Written informed consent was obtained from all donors by Lonza at the point of sampling, and our analyses of hPBMCs were approved by the Regional Ethical Review Board in Stockholm, Sweden (2020-05070). hPBMCs were gently thawed and stained with PE mouse anti-human CCR7 (2-L1-A, 1:100), PE-Cy7 mouse anti-human CD4 (SK3, 1:250), FITC mouse anti-human CD45RA (HI100, 1:100), PerCP-Cy5.5/BB700 mouse anti-human CD8 (RPA-T8, 1:250) and PE-Cy5 mouse anti-human CD45RO (UCHL1, 1:250) (BD Biosciences) before sorting. For all cell types, dead cells were gated out after staining with propidium iodide (Thermo Fisher Scientific). Live, single cells were sorted into 384-well plates containing lysis buffer using a BD FACSMelody (BD FACSChorus version 1.3 software; BD Biosciences) equipped with a 100-µm nozzle and plate cooling, with index sorting enabled. After sorting, each plate was immediately spun down and stored at −80 °C.

Smart-seq3 library preparation

Full-volume Smart-seq3 library preparations were performed as previously described (ref. 2). PCR was carried out using 20 cycles of amplification.

Low-volume Smart-seq3 and Smart-seq3xpress library preparation

For experiments including overlays, such as Vapor-Lock (Qiagen), silicone oils (Sigma-Aldrich) and tri-/tetradecane (Sigma-Aldrich), 3 µl of the designated overlay was added to each well, and plates were stored at room temperature until use. Lysis buffer of various volumes (0.1–3 µl) was dispensed to each well using either a Formulatrix Mantis or a Dispendix I.Dot One liquid dispenser, all containing 0.1% Triton X-100, 5% PEG8000 adjusted to RT volume, 0.5 µM oligo(dT) adjusted to RT volume, 0.5 mM dNTPs each adjusted to RT volume and 0.5 U RNase Inhibitor (Takara, 40 U µl−1). After dispensing, lysis plates were briefly centrifuged to ensure that the lysis buffer was collected and stored under the overlay. Stored plates of sorted cells were denatured at 72 °C for 10 minutes, followed by the addition of indicated volumes of RT mix. In all conditions, the final reagent concentrations were kept constant: 25 mM Tris-HCl pH ~8.3 (Sigma-Aldrich), 30 mM NaCl (Ambion), 0.5 mM GTP (Thermo Fisher Scientific), 2.5 mM MgCl2 (Ambion), 8 mM DTT (Thermo Fisher Scientific), 0.25 U µl−1 of RNase Inhibitor (Takara), 2 µM TSO (5′-Biotin-AGAGACAGATTGCGCAATGNNNNNNNNrGrGrG-3′; IDT) and 2 U µl−1 of Maxima H Minus reverse transcriptase (Thermo Fisher Scientific). After RT mix dispensing, the plate was spun down to ensure that the RT and lysis reactions merged. RT was performed at 42 °C for 90 minutes, followed by ten cycles of 50 °C for 2 minutes and 42 °C for 2 minutes. Indicated volumes of PCR master mix were dispensed, all containing constant reaction concentrations of 1× KAPA HiFi PCR buffer (Roche), 0.3 mM dNTPs each (Roche), 0.5 mM MgCl2 (Ambion), 0.5 µM Smart-seq3 forward primer (5′-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGATTGCGCAATG-3′; IDT), 0.1 µM Smart-seq3 reverse primer (5′-ACGAGCATCAGCAGCATACGA-3′; IDT) and 0.02 U µl−1 of KAPA HiFi DNA polymerase (Roche). After dispensing, the plate was quickly spun down before being incubated in a thermal cycler as follows: 3 minutes at 98 °C for initial denaturation, 10–24 cycles of 20 seconds at 98 °C, 30 seconds at 65 °C and 2–6 minutes at 72 °C. Final elongation was performed for 5 minutes at 72 °C. For conditions with a clean-up step after cDNA pre-amplification, 100 nl of water, ExoSAP-IT express (Thermo Fisher Scientific) or 0.5 U ExoI (NEB) + 0.05 FastAP (Thermo Fisher Scientific) was dispensed per well and incubated at 37 °C for 15 minutes, followed by inactivation at 85 °C for 5 minutes.

For Smart-seq3xpress with SeqAmp (Takara), lysis and RT were carried out as described above with 0.125 µM or 0.5 µM oligodT30VN and 0.75 µM or 2 µM TSO unless otherwise indicated. The original Smart-seq3 TSO was 5′-Biotin-AGAGACAGATTGCGCAATGNNNNNNNNrGrGrG-3′ (IDT), and the improved TSO was 5′-Biotin-AGAGACAGATTGCGCAATGNNNNNNNNWWrGrGrG-3′ (IDT). PCR master mix was dispensed at 0.6 µl per cell, containing 1× SeqAmp PCR buffer, 0.025 U µl−1 of SeqAmp polymerase and 0.5 µM/1 µM Smart-seq3 forward and reverse primer. After dispensing PCR master mix, the plate was quickly spun down before being incubated as follows: 1 minute at 95 °C for initial denaturation, 6–18 cycles of 10 seconds at 98 °C, 30 seconds at 65 °C and 2–6 minutes at 68 °C. Final elongation was performed for 10 minutes at 72 °C. For Smart-seq3xpress with NEBNext Ultra II Q5 Master Mix (NEB), PCR master mix consisted of 1× NEBNext Ultra II Q5 Master Mix and 0.5 µM/1 µM Smart-seq3 forward and reverse primer, and PCR was performed for 30 seconds at 98 °C for initial denaturation, 12 cycles of 10 seconds at 98 °C, 30 seconds at 65 °C and 6 minutes at 72 °C. Final elongation was performed for 5 minutes at 72 °C. For Smart-seq3xpress with NEBNext Q5 Hot Start HiFi PCR Master Mix (NEB), PCR master mix consisted of 1× NEBNext Q5 Hot Start HiFi PCR Master Mix and 0.5 µM/1 µM Smart-seq3 forward and reverse primer, and PCR was performed for 30 seconds at 98 °C for initial denaturation, 12 cycles of 10 seconds at 98 °C, 30 seconds at 65 °C and 1 minute at 65 °C. Final elongation was performed for 5 minutes at 65 °C. For Smart-seq3xpress with Platinum SuperFi II DNA polymerase (Invitrogen), PCR master mix consisted of 1× SuperFi II Master Mix, 0.2 µM dNTPs and 0.5 µM/1 µM Smart-seq3 forward and reverse primer, and PCR was performed for 30 seconds at 98 °C for initial denaturation, 12 cycles of 10 seconds at 98 °C, 30 seconds at 60 °C and 6 minutes at 72 °C. Final elongation was performed for 5 minutes at 72 °C. For Smart-seq3xpress with Platinum II Taq Hot Start DNA polymerase (Invitrogen), PCR master mix consisted of 1× Platinum II Taq Master Mix, 0.2 µM dNTPs and 0.5 µM/1 µM Smart-seq3 forward and reverse primer, and PCR was performed for 2 minutes at 94 °C for initial denaturation, 12 cycles of 15 seconds at 94 °C, 30 seconds at 60 °C and 6 minutes at 68 °C. Final elongation was performed for 5 minutes at 68 °C.

A full and comprehensive protocol of Smart-seq3xpress has been deposited on protocols.io (ref. 18).

After pre-amplification workflow

For regular Smart-seq3, pre-amplified cDNA libraries were purified with homemade 22% PEG beads at a ratio of 1:0.6. Library sizes were checked on an Agilent Bioanalyzer High Sensitivity DNA chip, and concentrations were quantified using the QuantiFluor dsDNA assay (Promega). cDNA was subsequently diluted to 100 pg µl−1 unless otherwise specified.

For low-volume Smart-seq3, pre-amplified cDNA libraries were diluted by the addition of 9 µl of water to each well, unless otherwise indicated, followed by a quick centrifugation. Library sizes were checked on an Agilent Bioanalyzer High Sensitivity DNA chip, and concentrations were quantified using the QuantiFluor dsDNA assay (Promega). cDNA was normalized to 100 pg µl−1 unless otherwise specified.
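
The normalization to 100 pg µl−1 follows the usual dilution relation C1·V1 = C2·V2. The sketch below is only an illustration of that arithmetic; the function name and the example concentration are hypothetical and not taken from the protocol.

```python
def water_to_add_ul(conc_pg_per_ul: float, sample_ul: float,
                    target_pg_per_ul: float = 100.0) -> float:
    """Volume of water (µl) needed to bring a sample to the target concentration.

    Uses C1 * V1 = C2 * V2, so V2 = V1 * C1 / C2 and water = V2 - V1.
    """
    if conc_pg_per_ul < target_pg_per_ul:
        # Dilution cannot raise a concentration.
        raise ValueError("sample is already below the target concentration")
    final_volume = sample_ul * conc_pg_per_ul / target_pg_per_ul
    return final_volume - sample_ul


# Hypothetical example: 2 µl of cDNA measured at 450 pg/µl.
print(water_to_add_ul(450.0, 2.0))
```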

For Smart-seq3xpress, pre-amplified cDNA libraries were diluted with the addition of 9 µl of water unless stated otherwise, before transferring 1 µl of diluted cDNA from each well into tagmentation.

Sequence library preparation for Smart-seq3xpress

Tagmentation was performed in 2 µl consisting of 1 µl of either diluted or normalized pre-amplified cDNA and 1 µl of 1× tagmentation buffer (10 mM Tris pH 7.5, 5 mM MgCl2, 5% DMF) containing 0.025–0.5 µl of ATM (Illumina XT DNA sample preparation kit) or 0.0002–0.01 µl of tagmentation DNA enzyme 1 (TDE1; Illumina DNA sample preparation kit). When in-house Tn5 was used, the 1× tagmentation buffer consisted of 10 mM TAPS-NaOH pH 8.4, 5 mM MgCl2 and 8% PEG8000, with the indicated amounts of in-house Tn5 enzyme (0.0005–0.01 µM). Samples were incubated at 55 °C for 10 minutes, followed by the addition of 0.5 µl of 0.2% SDS to each well. After addition of 1.5 µl/3.5 µl of custom Nextera index primers (0.5 µM) carrying 10-bp dual indexes, library amplification was started by the addition of 2/4 µl of PCR mix (1× Phusion Buffer (Thermo Fisher Scientific), 0.01 U µl−1 of Phusion DNA polymerase (Thermo Fisher Scientific), 0.2 mM dNTP each) and incubated in a thermal cycler for 3 minutes at 72 °C; 30 seconds at 95 °C; 12–14 cycles of (10 seconds at 95 °C; 30 seconds at 55 °C; 30–60 seconds at 72 °C); and 5 minutes at 72 °C. Samples were pooled by gently spinning out each plate into a 300-ml robotic reservoir (Nalgene) fitted with a custom 3D-printed scaffold, pulsing to ~200g. The pooled library was purified with homemade 22% PEG beads at a ratio of 1:0.7.

10x Genomics library preparation

After thawing the PBMC sample, we stained dead cells with propidium iodide (Thermo Fisher Scientific) and sorted 200,000 live cells into a 5-ml tube. After centrifugation, the cells were resuspended, and the cell concentration was determined using a Countess automated cell counter (Thermo Fisher Scientific). We loaded ~13,000 cells for a target cell recovery of ~8,000 cells and prepared libraries according to the 10x Genomics version 3.1 user guide. For both pre-amplification and post-fragmentation PCR, we applied 12 cycles of PCR.

Sequencing

Smart-seq3 and Smart-seq3xpress libraries were sequenced on an Illumina NextSeq 500 (Illumina NextSeq Control Software 2.2.0) or MGI DNBSEQ G400RS platform (version 1.1.0.108 software). For NextSeq runs with Smart-seq3, denatured libraries were loaded on HighOutput version 2.5 cartridges at 2.1–2.3 pM. For G400RS runs, libraries were created using phosphorylated index primers or subjected to five cycles of adapter conversion PCR using the MGIEasy Universal Library Conversion Kit (MGI) and subsequently circularized from 1 pmol of dsDNA according to the manufacturer’s protocol. Next, 60 fmol of circular ssDNA library pools was used to make DNA nanoballs (DNBs) with a custom rolling-circle amplification primer (5′-TCGCCGTATCATTCAAGCAGAAGACG-3′). DNBs were loaded on FCL flow cells (MGI) and sequenced using SE100, PE100 or PE150 cartridges with custom sequencing primers (Read 1: 5′-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG-3′; MDA: 5′-CGTATGCCGTCTTCTGCTTGAATGATACGGCGAC-3′; Read 2: 5′-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG-3′; i7 index: 5′-CCGTATCATTCAAGCAGAAGACGGCATACGAGAT-3′; i5 index: 5′-CTGTCTCTTATACACATCTGACGCTGCCGACGA-3′). 10x Genomics version 3.1 libraries were sequenced on a NextSeq 500 according to manufacturer specifications (1.8 pM loading concentration; HighOutput version 2.5 150-cycle kits, 28–8–92 cycles for read1, index1 and read2).

Primary data processing

zUMIs (ref. 13) version 2.8.2 or newer was used to process raw FASTQ files. Reads were filtered for low-quality barcodes and UMIs (4 bases < phred 20 and 3 bases < phred 20, respectively), and UMI-containing reads were parsed by detection of the pattern (ATTGCGCAATG) while allowing up to two mismatches. Reads were mapped to the human genome (hg38) using STAR version 2.7.3, and error-corrected UMI counts were calculated from Ensembl gene annotations (GRCh38.95). zUMIs was also used to downsample cells to equal raw sequencing depth to facilitate method benchmarking.
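
The two read-level checks described above can be sketched as follows. This is not the zUMIs implementation: the helper names are hypothetical, the tag is assumed to sit at the start of the read, mismatches are counted as substitutions only, and a barcode or UMI is assumed to be discarded once the stated number of bases falls below the Phred threshold.

```python
TAG = "ATTGCGCAATG"  # pattern named in the text


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def is_umi_read(read: str, tag: str = TAG, max_mismatches: int = 2) -> bool:
    """True if the read begins with the tag, allowing a few substitutions.

    Assumption: the tag is expected at the first bases of the read.
    """
    prefix = read[:len(tag)]
    return len(prefix) == len(tag) and hamming(prefix, tag) <= max_mismatches


def passes_quality(quals: list[int], n_low_bases: int, phred: int = 20) -> bool:
    """Keep a barcode/UMI unless n_low_bases or more bases are below phred.

    Assumption: this is one reading of "N bases < phred 20".
    """
    return sum(q < phred for q in quals) < n_low_bases


# Hypothetical read: tag followed by an 8-nt UMI and cDNA sequence.
print(is_umi_read("ATTGCGCAATGACGTACGTGGGTTTT"))
print(passes_quality([35, 12, 33, 36, 34, 11, 37, 30], n_low_bases=3))
```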

Analysis of Smartseq3xpress hPBMC data

Low-quality libraries were removed by requiring (1) more than 50% of read pairs mapped to exons+introns, (2) more than 20,000 read pairs sequenced, (3) more than 500 genes (exon+intron quantification) detected per cell and (4) less than 15% of read pairs mapped to mitochondrial genes. Furthermore, a gene was required to be expressed in at least ten cells. Analysis was done using Seurat v4.0.1 (ref. 14). Data were normalized (‘LogNormalize’) and scaled to 10,000, and the total number of counts and the mitochondrial fraction were regressed out. Using the Seurat integration function, the donor effect from the seven different donors in the dataset was removed. The top 10,000 variable genes and 35 principal components were used for shared nearest neighbor (SNN) neighborhood construction and UMAP dimensionality reduction. Cell clusters were produced using the Louvain algorithm at a resolution of 0.8. Cell types were identified using the R package Presto (Wilcoxon & AUC, version 1.0.0). For the Azimuth predictions, a QC-filtered count matrix was uploaded to the Azimuth web-based application and processed according to the Azimuth app. For the direct donor comparison with 10x Genomics version 3.1 data, read counts from donor 7 only were downsampled to a median sequencing depth similar to that of the matching 10x dataset and quality filtered as follows: at least 10,000 read pairs, more than 50% of read pairs mapped to exons+introns and less than 15% of read pairs mapped to mitochondrial genes. A gene was required to be expressed in at least ten cells. Data were subset to 3,000 cells for analysis in Seurat. Data were LogNormalized and scaled to 10,000, and mitochondrial genes were regressed out. Default Seurat settings were used for neighborhood construction and dimensionality reduction. Cell clusters were assigned using the Louvain algorithm at a resolution of 0.8. Cell type identification was performed as above.
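
As an illustration, the four per-cell criteria listed first can be written as a single predicate. This is a minimal sketch rather than the analysis code: the `CellStats` record and its field names are hypothetical, and only the thresholds come from the text.

```python
from dataclasses import dataclass


@dataclass
class CellStats:
    """Per-cell summary values (hypothetical field names)."""
    read_pairs: int            # read pairs sequenced
    frac_exon_intron: float    # fraction of read pairs mapped to exons+introns
    genes_detected: int        # genes detected (exon+intron quantification)
    frac_mito: float           # fraction of read pairs mapped to mitochondrial genes


def passes_qc(cell: CellStats,
              min_read_pairs: int = 20_000,
              min_frac_mapped: float = 0.50,
              min_genes: int = 500,
              max_frac_mito: float = 0.15) -> bool:
    """Apply the four thresholds; all comparisons are strict, as in the text."""
    return (cell.frac_exon_intron > min_frac_mapped
            and cell.read_pairs > min_read_pairs
            and cell.genes_detected > min_genes
            and cell.frac_mito < max_frac_mito)


# Hypothetical cells: the second fails on the mitochondrial fraction.
cells = [CellStats(85_000, 0.72, 2_300, 0.04),
         CellStats(60_000, 0.65, 1_100, 0.22)]
print([passes_qc(c) for c in cells])
```

The donor 7 comparison uses a different read-pair threshold (at least 10,000), which in this sketch would need an inclusive comparison instead of the strict one.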

Analysis of 10x Genomics version 3.1 donor 7 hPBMC data

Raw sequencing data in FASTQ format were processed using zUMIs version 2.9.3 with automatic barcode detection based on the 10x Genomics version 3.1 allow-list. After completion, we exported full count tables including empty droplets and assigned ambient RNA droplets and real cells using the CellBender (version 0.2.0) remove-background function (ref. 19). To filter for doublets, the CellBender output.h5 file was used with Solo (ref. 20; version 0.6). Additional doublets were discarded by manually inspecting the distribution of total UMI counts per droplet and discarding droplets with more than 45,000 UMI counts. For downstream analysis in Seurat, a low-quality filter was applied, requiring at least 10,000 read pairs and less than 10% of read pairs mapped to mitochondrial genes. A gene was required to be expressed in at least ten cells to be included. A subset of 3,000 cells out of the 6,483 passing QC was used for direct comparison to Smart-seq3xpress. Seurat was run at default settings using SCTransform, and cell clusters were assigned at resolution 0.8 using the Louvain algorithm. Cell types were identified using Presto (Wilcoxon & AUC) together with the reference-based approach of the Azimuth app.

TCR reconstruction

TCR sequences were reconstructed using TraCeR version 0.6.0 (ref. 21) run in the teichlab/tracer Docker environment and using the --loci A B G D --species Hsap flags. Scirpy (ref. 22; version 0.8.0) was used to summarize and QC the output from TraCeR.

Molecular spike data processing and analysis

Molecular spike data were extracted from aligned zUMIs BAM files and analyzed using the UMIcountR package (ref. 10; https://github.com/cziegenhain/UMIcountR, version 0.1.1). After loading the data using the ‘extract_spike_dat’ function, overrepresented spikes were discarded using a cutoff of 25 or more reads. We next used molecular spike observations across all cells and conditions with at least five reads per molecule to sample 26 ground truth mean expression levels from 1 to 316 molecules per cell using the ‘subsample_recompute’ function. We then plotted the mean counting difference shaded by the standard deviation.
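
The spacing of the 26 expression levels is not stated above. One spacing that is consistent with 26 levels spanning 1 to 316 molecules per cell is a step of 0.1 on the log10 scale, since 10^2.5 ≈ 316; the sketch below shows that arithmetic only and should be read as an assumption, not as the UMIcountR default.

```python
# Assumed grid: exponents 0.0, 0.1, ..., 2.5 on the log10 scale (26 values).
levels = [10 ** (i / 10) for i in range(26)]

print(len(levels))          # number of levels
print(round(levels[0]))     # lowest level, in molecules per cell
print(round(levels[-1]))    # highest level, in molecules per cell
```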

Identification of TSO strand invasion artifact

To identify UMI reads with the TSO mis-priming artifact, we loaded sequencing reads into R using Rsamtools (version 2.6.0). In the case of paired-end sequencing, only first-in-pair reads were selected using the appropriate SAM flags. The strand orientation of the mapped reads was also determined from SAM flags. Then, we extracted a 20-bp window of genome sequence upstream of the read start position on the positive strand (+stranded mappings) or downstream of the read start position + read length on the negative strand (−stranded mappings) using the BSgenome package (version 1.62.0, human hg38). Afterwards, we checked for the presence of the UMI sequence (with or without addition of the GGG overhang) in the genomic window using R’s fuzzy string matching function (allowing 0, 1 or 2 mismatches). This procedure for identifying artifactual UMI reads was also implemented in Python 3 to process aligned BAM files and remove all artifactual reads/read pairs (available on GitHub: https://github.com/cziegenhain/pyTSOfilter).
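
The window extraction and matching logic can be sketched in plain Python as below. This is not the pyTSOfilter code. All function names are hypothetical, coordinates are assumed to be 0-based, mismatches are counted as substitutions only (R’s fuzzy matching also permits insertions and deletions), and reverse-complementing the window for −stranded mappings is an assumption made so that the window is in read orientation.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def fuzzy_contains(window: str, query: str, max_mismatches: int) -> bool:
    """True if query occurs in window with at most max_mismatches substitutions."""
    n = len(query)
    return any(
        sum(a != b for a, b in zip(window[i:i + n], query)) <= max_mismatches
        for i in range(len(window) - n + 1)
    )


def adjacent_window(chrom_seq: str, read_start: int, read_len: int,
                    strand: str, size: int = 20) -> str:
    """Genomic window next to the 5' end of the read, in read orientation."""
    if strand == "+":
        return chrom_seq[max(0, read_start - size):read_start]
    end = read_start + read_len
    return revcomp(chrom_seq[end:end + size])  # assumption: reorient for '-'


def is_tso_artifact(chrom_seq: str, read_start: int, read_len: int, strand: str,
                    umi: str, max_mismatches: int = 1, with_ggg: bool = True) -> bool:
    """Flag a read whose UMI (optionally plus GGG) matches the adjacent genome."""
    query = umi + ("GGG" if with_ggg else "")
    window = adjacent_window(chrom_seq, read_start, read_len, strand)
    return fuzzy_contains(window, query, max_mismatches)


# Toy chromosome: the "UMI" ACGTTGCA followed by GGG sits just upstream of the read.
chrom = "C" * 10 + "ACGTTGCA" + "GGG" + "T" * 30
print(is_tso_artifact(chrom, read_start=21, read_len=25, strand="+", umi="ACGTTGCA"))
```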

Isoform-based analysis

For analysis of skipped-exon (SE) isoform differences, we retrieved annotations from GenCode (Human v39) and produced the SE annotation in GFF file format using BRIEkit-event (version 0.2.2). We filtered SE events using BRIEkit-event-filter with the following criteria: (1) SE events are located on autosomes or X/Y chromosomes; (2) SE events are not overlapped by any other AS-exon; (3) surrounding introns are no shorter than a fixed length (100 bp); (4) specific splice sites are present (that is, the exon is surrounded by AG-GT); and (5) SE events have a minimum distance (10 bp) from the transcription start site or transcription termination site. Next, we summarized the coverage over the filtered SE events for each cell using the brie-count command from BRIE2 (version 2.0.6), using per-cell demultiplexed, aligned and TSO-artifact-filtered (see above) BAM files as input. The resulting count files in h5ad format were used as input for the Bayesian regression-based inference of PSI values and variable splicing detection over cell types. We applied the aggregated imputation mode introduced by BRIE2 to fit the gene-wise prior distribution through aggregation of data over all cells for each gene. Default settings for Monte Carlo EM were applied. Genes were filtered by requiring at least 50 counts, ten unique counts and at least 30 cells with unique counts. The minimum required minor isoform frequency was 0.001 (default settings). For variable splicing detection, we annotated each cell with a binarized dummy factor of cell type identity (Seurat clustering; Louvain resolution 2.0), removing the most common cell type to avoid collinearity of the design matrix. We loaded the resulting h5ad file into scanpy (ref. 23; version 1.8.2) for visualization of PSI values. For selection of SE events with significant cell type difference, we selected the highest evidence lower bound (ELBO) value per SE for each of the cell type LRT indices. Gene model plots to visualize significant cell-type-variable SE events were generated using the Gviz (version 1.38.1, https://link.springer.com/protocol/10.1007%2F978-1-4939-3578-9_16) and rtracklayer (version 1.54.0, https://academic.oup.com/bioinformatics/article/25/14/1841/225816) R packages.
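
The binarized dummy factor with the most common cell type removed can be illustrated as follows. This is a minimal sketch with a hypothetical helper name and toy labels; it shows why dropping one level avoids collinearity (with an intercept, the full set of indicator columns would always sum to one), not how BRIE2 builds its design matrix internally.

```python
from collections import Counter


def dummy_design(cell_types: list[str]) -> tuple[str, list[str], list[list[int]]]:
    """One-hot encode cell types, dropping the most common one as reference.

    Returns the reference level, the remaining column names and one 0/1 row
    per cell. Cells of the reference type get an all-zero row.
    """
    counts = Counter(cell_types)
    reference = counts.most_common(1)[0][0]
    columns = sorted(t for t in counts if t != reference)
    rows = [[int(ct == col) for col in columns] for ct in cell_types]
    return reference, columns, rows


# Toy labels standing in for cluster-derived cell type identities.
labels = ["T", "T", "B", "NK", "T", "B"]
reference, columns, rows = dummy_design(labels)
print(reference, columns)
print(rows)
```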

SNP and junction coverage analysis

Coverage over transcribed SNPs was analyzed per cell using the cellsnp-lite package (ref. 24; version 1.0.0) over the most common human polymorphisms (1000 Genomes Project minor allele frequency >0.005, 36 million positions). We applied the default settings of a minimum aggregated count over cells of 20 and a minimum MAPQ for read filtering of 20 (essentially discarding multimapping reads due to the mapping quality encoding of the STAR aligner). Coverage on RNA velocity informative positions was tabulated from zUMIs output BAM files. Fully spliced exon–exon reads were identified by the presence of splicing in their CIGAR value and exclusive assignment to exon regions, whereas nascent (that is, unspliced or partially spliced) exon–intron spanning reads were identified by the overlap with both exonic and intronic regions of the same gene.
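
The distinction between spliced and nascent reads can be sketched from a read’s alignment start and CIGAR string. This is an illustration, not the code used for the tabulation: function names are hypothetical, coordinates are 0-based and half-open, and exon and intron intervals are assumed to belong to a single gene.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_blocks(pos: int, cigar: str) -> tuple[list[tuple[int, int]], bool]:
    """Split an alignment into reference blocks at N (splice) operations."""
    blocks, start, cur, spliced = [], pos, pos, False
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=XD":          # operations that consume the reference
            cur += n
        elif op == "N":           # skipped region, i.e. a splice junction
            if cur > start:
                blocks.append((start, cur))
            cur += n
            start, spliced = cur, True
        # I, S, H and P do not advance the reference position
    if cur > start:
        blocks.append((start, cur))
    return blocks, spliced


def overlaps(block: tuple[int, int], intervals: list[tuple[int, int]]) -> bool:
    """True if the block overlaps any interval (half-open coordinates)."""
    return any(block[0] < end and start < block[1] for start, end in intervals)


def classify(pos: int, cigar: str, exons, introns) -> str:
    """Label a read as 'spliced', 'unspliced' or 'uninformative'."""
    blocks, has_junction = aligned_blocks(pos, cigar)
    hits_exon = any(overlaps(b, exons) for b in blocks)
    hits_intron = any(overlaps(b, introns) for b in blocks)
    if has_junction and hits_exon and not hits_intron:
        return "spliced"
    if hits_exon and hits_intron:
        return "unspliced"
    return "uninformative"


# Toy gene model: two exons separated by one intron.
exons, introns = [(100, 200), (300, 400)], [(200, 300)]
print(classify(180, "20M100N30M", exons, introns))
print(classify(190, "50M", exons, introns))
```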

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.