Combined aptamer and transcriptome sequencing of single cells

The transcriptome and proteome encode distinct information that is important for characterizing heterogeneous biological systems. We demonstrate a method to simultaneously characterize the transcriptomes and proteomes of single cells at high throughput using aptamer probes and droplet-based single cell sequencing. With our method, we differentiate distinct cell types based on aptamer surface binding and gene expression patterns. Aptamers provide advantages over antibodies for single cell protein characterization, including rapid, in vitro, and high-purity generation via SELEX, and the ability to amplify and detect them with PCR and sequencing.


Results
Aptamers allow single cell surface profiling via droplet barcoding and sequencing. Aptamers are nucleic acids that adopt a three-dimensional fold and bind specifically to protein epitopes and small molecules 34,36 . Like antibodies, they can be used in combination for multiplexed characterization 42 , while being easily identified via nucleic acid sequencing. To allow simultaneous sequencing of cell mRNA and aptamers, we polyadenylate the aptamers to mimic the structure of mRNA; this allows both to be captured and sequenced using identical poly-thymine primers (Fig. 1a). To label the cells with aptamers, the mixed aptamer library is incubated with a cell suspension, and unbound aptamers washed away (Fig. 1b). To barcode the cells, we employ Drop-seq, a high throughput microfluidic approach 13 , although other barcoding methods can also be used 12,[43][44][45] . In Dropseq, cells are isolated in droplets with barcoded beads and lysis buffer (Fig. 1c) 13 . Upon lysis, aptamers and mRNA hybridize to poly-thymine barcode sequences on the beads (Fig. 1d), followed by demulsification, washing, and nucleic acid amplification 12,[46][47][48] . Amplification conjugates a unique barcode sequence to all aptamers and transcripts of a single cell, allowing material for many cells to be pooled, sequenced, and computationally deconvoluted by barcode. This provides, for every cell, paired aptamer and transcript reads (Fig. 1e) that are separated in silico (Fig. 1f,g). Single cells of the washed cell suspension are co-encapsulated with beads carrying a unique DNA barcode in a microfluidic device. (d) Each droplet contains lysis solution to lyse cells. Aptamers and mRNA molecules can hybridize with the barcoding beads by means of their poly-A sequence. Using the barcode bead as a primer in reverse transcription and DNA polymerase reactions, the droplet-specific unique barcode is fused to the mRNA and aptamer, providing a cell specific identifier. (e) Pooling all beads after barcode fusion, sequencing their content in parallel, and deconvoluting aptamers and mRNAs, allows evaluation of epitope profiles in single cells (f). (g) Since the cell-specific barcode is shared between aptamers and transcripts, the epitope data can be combined with the single cell transcriptome for further interdependent analysis.
To assess the impact of the poly-A tail on aptamer fold, we use RNAstructure 55 , a secondary structure prediction algorithm, and predict the same fold for the aptamers with and without poly-A tail (Fig. 2a). To assess whether the tails interfere with binding, we synthesize all eight polyadenylated aptamers and apply them to Ramos and control 3T3 cells. The aptamers are incubated at equal molar concentration with either cell line, followed by five wash cycles and concentration estimation in the final wash supernatant and final cell suspension by qPCR. In agreement with previous studies, TD05, TD08, and TE02 are highly enriched in Ramos cell suspensions, while TD09 is moderately enriched. In contrast, TC01, TE17, and sgc8a are not enriched in Ramos cells, as expected ( Fig. 2b) (Supplementary Fig. 1). Sgc3b remains below the detection limit for either cell line. Notably, although sgc8a is a reported binder of human T-cells 42 , it enriches in mouse 3T3 cells. However, a previous study showed that both sgc8 and a PKT7 binding antibody can interact with other cell lines presumably devoid of PKT7 56 . We conclude that poly-adenylating the aptamers does not affect fold or binding and that all except TC01 perform as previously reported in our hands.
Apt-seq provides independent information from RNAseq for inferring cell type. Inferring cell type from RNAseq data can be challenging due to the dynamic, complex, and multidimensional nature of gene expression. An advantage of Apt-seq is that it allows independent confirmation of cell type by aptamer binding. This can be used to support inferences from transcriptome data, or provide additional information for differentiating between related cells and states. To illustrate these benefits, we perform combined aptamer and transcriptome sequencing of suspended Ramos and 3T3 cells. After incubation and washing of the aptamers, we barcode the cells with Drop-seq. Drop-seq uses template switching to generate defined 3′-ends on cDNA, enabling subsequent PCR amplification. After amplification, cDNA is tagmented to generate ~500 bp fragments containing necessary sequencing handles. Aptamers lack the 5′-cap of mRNA and, thus, template switching is non-processive, preventing addition of the handles by this route. However, our aptamers are constructed with known flanking sequences, which we use for amplification and handle attachment. Additionally, the aptamers are short (below 200 bp), so that amplicons can be sequenced without tagmentation (Methods, Supplementary Tables 1 and 2). These simplifications allow efficient, joint processing of aptamers and mRNA in the same barcoding reaction.
After barcoding, the nucleic acids of many cells are pooled and sequenced. We aim for about 200 cells and perform limited Illumina MiSeq sequencing collecting ~12 million paired-end reads. We computationally group reads to single cells using barcodes, and then to single molecules using unique molecular identifiers (UMIs). After processing, we obtain high-quality transcriptome and aptamer data for 58 cells, with >4500 unique reads per cell. More cells can be obtained by sequencing deeper and collecting more beads 13 . As expected, most reads belong to the transcriptome, and a smaller fraction to surface-bound aptamers (Fig. 3a). The aptamer profiles segregate into two groups, with TD08 and TE02, and to a lesser degree TD05 and TD09. The aptamer sgc8a is predominantly anticorrelated with this group. We obtain few reads for TC01, TE17, and sgc3b ( Fig. 3b), consistent with our multicell qPCR results (Fig. 2b). Based on the qPCR results and because TD05 is a binder of the immunoglobulin heavy chain, we expect the first rectangular block to represent Ramos and the second 3T3 cells. Because these cells are from different species, cell type can be inferred by direct sequence analysis of cDNA. To verify our aptamer results, we thus evaluate the transcript reads of each cell. We order the transcriptome data using the same y-axis as the aptamers and again observe a clear block segregation, indicating that the gene expression of the two cell types is differentiated as expected. Based on transcript sequences, we confirm that the lower block corresponds to the human Ramos and the upper block to the mouse 3T3 cells (Fig. 3c). To confirm the relationship between expected aptamer and gene expression data, we construct a barnyard plot, displaying each cell as a 2D coordinate corresponding to the number of human and mouse transcripts identified within its barcode cluster (Fig. 3d). We extract the principal components of the aptamer read per cell matrix by calculating a singular value decomposition (SVD) (Fig. 3d, upper-left). This yields two eigenvector principle components, describing how the aptamers tend to correlate with each other. In agreement with the observed block structure of the aptamer profiles, we obtain an all positive first eigenvector with major contribution of the aptamers TD05, TD08, and TE02 (Fig. 3d, upper right) and a second eigenvector consisting primarily of sgc8a (Fig. 3d, lower-right). These eigenvectors are strongly anticorrelated, indicating that the aptamers cluster cleanly based on cell type. We use these eigenvectors to define a two-parameter coloring scheme, coloring each data point corresponding to a cell in the coordinate space according to its amplitudes along the two eigenvectors. Cells containing mostly human transcripts have amplitudes primarily along the first eigenvector (blue, Fig. 3c upper-right), while cells containing mostly mouse transcripts have amplitudes primarily along the second eigenvector (purple, Fig. 3d, lower-right). These results demonstrate that transcriptional signatures cluster with known patterns of aptamer binding and, more broadly, that aptamers can be used to label specific cell types in a format that can be read out with mRNA sequencing data.

Discussion
While single cell mRNA profiling provides a valuable snapshot into the inner workings of cells, it is just one of many characterizations of the cellular space. Inaccessible from transcriptome data, the proteome contains traces of post-translational regulatory events and exhibits different temporal dynamics 57,58 . Tapping into both domains, Apt-seq provides a strategy for highly multiplexed, single cell surface profiling simultaneous with the transcriptome. Because Apt-seq uses polyadenylated aptamer probes, it is compatible with most common single cell transcriptome sequencing methods, including Drop-seq, InDrops, and their commercial variants.
Apt-seq mirrors Abseq, and its successors cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) and RNA expression and protein sequencing (REAP-seq), in that it allows combined surface and transcriptome profiling of tens-of-thousands of single cells at high throughput [26][27][28] . However, aptamers provide practical and functional advantages over antibodies. In practical terms, aptamers are faster and cheaper to generate against many novel targets than antibodies. The in vitro SELEX process can be performed directly on living cells, obviating the need for antigen purification, which is not possible for many targets. Thus, Abseq may be preferred when the marker is best detected by a readily-available antibody, while Apt-seq may be superior for novel targets. Additionally, while antibodies require tag conjugation, aptamers can be used directly, since they are already sequenceable nucleic acids. In addition, patents on the original SELEX method have expired and new protocols have emerged that should expand the library of high-quality, specific aptamers 37,59-61 . Moreover, aptamers can be synthesized with higher purity and reliability than antibodies through established chemical synthesis pipelines; this may allow aptamers to overcome the poor reproducibility that has plagued antibodies, and potentially achieve higher experimental reliability.
Apt-seq provides new capabilities for high throughput single cell characterization. In addition to scaling numbers of cells sequenced per sample, there is an increasing interest in scaling the number of samples sequenced in total. Ideally, many samples would be pooled and sequenced on a single run, but the need to index each one necessitates microfluidic barcoding of every sample, an expensive and labor-intensive process. Apt-seq can address this by labeling samples with unique surface aptamers that can be used independent of transcriptome data to batch samples together in one microfluidic barcoding run. Since after cell washing the free aptamer concentration is orders of magnitude below their dissociation constant, there should be minimal cross contamination between samples.
Aptamers and antibodies can bind internal epitopes of cells using fixation and permeabilization protocols; however, such binding may be better with the smaller aptamers, having a hydrodynamic diameter of ~2 nm compared to ~15 nm for antibodies 41,62 . Indeed, many aptamers are readily internalized and staining of intracellular compartments can be more effective than with antibodies [63][64][65] . This will be important for combining intracellular proteomic and transcriptomic readouts 66 . Aptamers also enable measurement opportunities not possible with antibodies. For example, a raw aptamer library produced by cell-SELEX can be directly applied to cells, allowing different cell types to be inferred based on aptamer spectrum, rather than assembling an aptamer library from known binders. This should allow differentiation between cells even when distinct biomarkers are not known a priori. Additionally, multidentate, bispecific, or multipiece aptamers can be constructed by combining binding sequences in the same aptamer 67,68 , allowing assessment of epitope proximity in a way currently accomplished with fluorescence or proximity ligation assays, but in a format that can be highly multiplexed and performed simultaneously with cell surface and transcriptome profiling. Furthermore, riboswitch-like aptamers have a unique ability to bind small molecules and, thus, Apt-seq opens new possibilities for adding cellular metabolite characterization to simultaneous protein and nucleic acid measurements in single cells 69 .
Cell cultures. Ramos and 3T3 cells were cultured at 37 °C in RPMI-1640 medium supplemented with antibiotics and 5% fetal bovine serum (FBS) in the presence of 5% CO 2 .
Aptamer staining. Aptamers (from IDT) were folded as described before 50 Table 2). Signal was detected on a QuantStudio5 (Applied Biosystems) qPCR machine with SYBR green as reporter. A control experiment was performed to assess background priming on non-target aptamers and was found to be below detection limit. Amplification standard curves were measured for all aptamer primer pairs in triplicate. The model C(n) = C 0 b n , where C(n) corresponds to the concentration after n PCR cycles and C 0 to the start concentration, was log transformed and fit by linear regression to obtain the amplification rate b and the detection concentration C(n Ct ), which is reached at the cycle threshold (Ct). Both calculated constants were used to estimate the aptamer concentration in the experiment. Uncertainty in the constants was propagated and final uncertainty of the concentration estimates is presented as one sample standard deviation with applied Bessel's correction.
Single cell experiment. The aptamer-stained cell suspension was diluted in ice cold PBS containing 0.1% BSA to a final concentration of 1.06*10 5 cells/ml.
Single cell experiments were performed as described by 13 (online protocol: 70 ) with the following modifications. About 2000 beads were used for further processing, corresponding to about 100-200 cells. After reverse transcription with Maxima H Minus Reverse Transcriptase (Thermo Fisher), which produces barcoded cDNA-mRNA hybrids and barcoded double-stranded aptamers, we amplified both libraries together by 13 cycles of PCR. DropSeq uses template switching to introduce the sequence ACTCTGCGTTGATACCACTGCTT at the 3′-end of the cDNA which allows use of a single primer (smart PCR) (Supplementary Table 2) to amplify the complete barcoded construct. Since template switching is much more efficient on 5′-capped mRNA, it only rarely happens on aptamers. Therefore, we use a specific reverse primer for each different aptamer flanking region (Supplementary Table 1), which we add to the PCR mix at a ratio of 1:20 compared to the smart PCR primer. Next DropSeq uses tagmentation (Nextera) to trim the transcripts to a mean length of about 500 bp and to introduce the sequence overhang GTCTCGTGGGCTCGG. Since the length of the aptamers is significantly shorter, tagmentation is not expected to act on them. To enable downstream joint processing of aptamers and transcripts we introduced the tagmentation overhang as part of the aforementioned specific primers. We processed the sample further as described 13 . The cDNA library was sequenced on a MiSeq device (Illumina) with a MiSeq Reagent Kit v3 (Illumina) in paired-end mode.
Sequencing and data analysis. The python scripts for the evaluation of InDrops experiments 71 was adjusted to be applicable to the barcoding scheme of the DropSeq experiment. Without attempting any error correction, in a first pass all cellular barcodes with a count above 2000 where retained, which yielded 210 distinct barcodes. The corresponding sequences, containing transcripts and aptamers, were evaluated with Kallisto in the SCientifiC REPORTS | (2018) 8:2919 | DOI:10.1038/s41598-018-21153-y UMI-aware mode to generate a pseudo-alignment and to map transcripts to equivalence classes 72,73 . To reduce the influence of stochastic fluctuations in the data, cells were only included for further analysis if they yielded a minimum of 4500 unique transcript reads; 58 cells passed this filter. For display purpose, the 200 equivalence classes with the highest variance were selected (Fig. 3c). A barnyard plot was generated based on equivalence classes that could be unambiguously assigned to either mouse or human only (Fig. 3d). Aptamer sequences were identified separately by the program cutadabt 74 . For each aptamer three overlapping 20mers were generated and each of these fragments was used as a cutadabt query and tested against all sequences. If a sequence contained a consecutive 20mer that matched the query with at most one base mismatch, the sequence was considered a read of that aptamer. The identified aptamer sequences were collapsed based on their UMI. SVD was calculated based on the mean-centered normalized cells versus aptamers count matrix. The color of the x-coordinate in the barnyard plot corresponds to the amplitude of the first eigenvector (cyan) and the y-coordinate to the inverse amplitude of the second eigenvector (magenta) (Fig. 3d).
Data availability. The data that support the findings of this study are available from the corresponding author upon reasonable request.