Differential binding cell-SELEX: method for identification of cell specific aptamers using high throughput sequencing

Aptamers have evolved as a viable alternative to antibodies in recent years. High throughput sequencing (HTS) has revolutionized the aptamer research by increasing the number of reads from few using Sanger sequencing to millions of reads using HTS approach. Despite the availability and advantages of HTS compared to Sanger sequencing there are only 50 aptamer HTS sequencing samples available on public databases. HTS data for aptamer research are mostly used to compare sequence enrichment between subsequent selection cycles. This approach does not take full advantage of HTS because enrichment of sequences during selection can be due to inefficient negative selection when using live cells. Here we present differential binding cell-SELEX (systematic evolution of ligands by exponential enrichment) workflow that adapts FASTAptamer toolbox and bioinformatics tool edgeR that is mainly used for functional genomics to achieve more informative metrics about the selection process. We propose fast and practical high throughput aptamer identification method to be used with cell-SELEX technique to increase successful aptamer selection rate against live cells. The feasibility of our approach is demonstrated by performing aptamer selection against clear cell renal cell carcinoma (ccRCC) RCC-MF cell line using RC-124 cell line from healthy kidney tissue for negative selection.

Sequencing was performed after the 4 th and 11 th selection cycles. The reads per sample after the 1 4 6 initial quality filtration, adapter and constant primer binding region removal and length filtration (40 nt) 1 4 7 varied from 169,024 to 1,142,856 (Table 1).

4 8
Combining all replicates from both samples after data clean-up, we identified 3,627,938 unique 1 4 9 sequences within the 4 th selection cycle experiment and 503,107 unique sequences in the 11 th 1 5 0 selection cycle experiment. After filtering the reads by edgeR to remove the sequences that had lower 1 5 1 counts per million (CPM) than two per sample and that were present in less than two replicates, we 1 5 2 were left with 1,015 unique sequences for the 4 th cycle aptamers and 35,859 sequences for the 11 th 1 5 3 cycle aptamers.

5 4
For differential binding data analysis (Fig. 5), we further used selected sequences to run the edgeR 1 5 5 package, a statistical analysis software that is used to estimate differential expressions from RNA-seq 1 5 6 data. The resulting data were adjusted for multiple comparisons using the built-in Benjamini-Hochberg 1 5 7 approach and filtered by removing all sequences that have log 2 fold change (logFC) values less than 1 5 8 two or that had adjusted p-values higher than 0.0001. to identify 720 sequences with enrichment log 2 > 5. edgeR was used to perform differential binding 1 6 5 analysis, resulting in 17 candidate sequences. Matching the sequences resulted in six aptamer 1 6 6 candidates that are represented in both analyses.

6 7
Comparing differential binding datasets using the 4 th selection cycle enriched library, we were unable 1 6 8 to identify any significantly differentially bound sequences based on the count per million (CPM) of 1 6 9 each sequence and a fold change (FC) comparison between two cell lines (Fig. 6a) to specifically target ccRCC cells (if the log 2 cut off value is decreased to 5, it is possible to identify 6 1 8 3 sequences that can be found in both the differential binding analysis and enrichment analysis results).

8 4
We also ordered all unique sequences that were present in the 11 th pool by CPM and calculated the 1 8 5 log 2 enrichment value between the 4 th and 11 th cycle (Supplementary Table 4). Log 2 enrichment 1 8 6 values for the top 10 most abundant sequences ranged from 4.7 to 6.2, and 7 of 10 sequences had a 1 8 7 Log 2 value above 5, meaning that these sequences are also included in the enrichment analysis indicates increased binding to the RCC-MF ccRCC cells, red dots indicate that these results are 1 9 6 statistically significant according to an adjusted p-value < 0.0001 using edgeR and have logFC > 2 in 1 9 7 absolute numbers. All results that fulfil these criteria can be seen in (c).

9 8
Differential binding results confirm that it is possible to use edgeR within our pipeline to identify the 1 9 9 most likely candidate molecules for further testing.  (Table 2). We estimated a population shift as a mode of fluorescence intensity (MFI) for 2 0 9 each aptamer sample (n=3). The data were corrected by subtracting the MFI from a sample that was 2 1 0 incubated with a randomized starting library (MFI lead-sequence -MFI random-library ).

1 1
Corrected MFIs were compared with the t-test (significance defined as p < 0.05, n=3) using GraphPad

1 2
Prism to determine if our identified sequences altogether bind more to RCC-MF cells than to RC-124 2 1 3 cells. Three sequences (DB-4, EN-2, MB-3) were confirmed to be differentially bound using flow 2 1 4 cytometry by comparing MFIs (Fig. 7, Fig. S2). While MB-3, identified as the 3 rd most abundant 2 1 5 sequence, was significantly (p=0.002) differentially bound, it was targeted towards RC-124 cells. The 2 1 6 EN-2 sequence was identified using enrichment analysis and was statistically significantly (p=0.013) 2 1 7 binding to RCC-MF cells. DB-4 was significantly (p=0.019) more bound to RCC-MF cells and was 2 1 8 identified through a combined differential binding cell-SELEX and enrichment approach. hinges correspond to the first and third quartiles, and the whiskers mark the 1.5* interquartile range 2 2 6 (IQR). Statistical significance was determined with a t-test using the GraphPad Prism software.

7
The binding of selected aptamer sequences was identified through differential binding cell-SELEX  The main goal achieved in this research is the development of a differential binding cell-SELEX 2 5 2 method. This method can identify cell type-specific aptamer sequences from cell-SELEX selection 2 5 3 pools that would not be selected by other cell-SELEX methods and thus would remain overlooked by 2 5 4 the investigators.

5 5
Currently, the most often used analysis for aptamer finding using HTS data includes enrichment 2 5 6 analysis, which means a comparison of the abundance of one particular sequence at the beginning of 2 5 7 the SELEX procedure to the abundance of the same sequence after the SELEX procedure.

5 8
Enrichment analysis can identify a large number of oligonucleotides with very similar log 2 enrichment 2 5 9 values, as can be seen in our results (Supplementary Table 2). However, enrichment analysis is 2 6 0 rarely useful for cell-SELEX because of the high possibility to enrich non-specific sequences. Using 2 6 1 enrichment analysis with a cut-off value of log 2 > 5, we identified 720 sequences to be further tested.

6 2
However, when the same sequencing dataset was submitted for differential binding analysis using 2 6 3 edgeR, we identified 17 sequences that were more abundant on the surface of RCC-MF cells than on 2 6 4 RC-124 cells.

6 5
Enrichment analysis identified one sequence (EN-2) that was significantly (p=0.013) more bound to 2 6 6 target RCC-MF cells, as also confirmed by flow cytometry (Fig. 7). We were able to confirm using flow 2 6 7 cytometry that another sequence (DB-4), identified with a combined differential binding cell-SELEX 2 6 8 and enrichment analysis, was significantly (p=0.019) more bound to the RCC-MF cells. Importantly, 2 6 9 DB-4 was found between 720 sequences identified using enrichment analysis, but only as the 528 th 2 7 0 most enriched sequence. This provides scientific evidence that our approach can be used to identify 2 7 1 lead aptamers that most likely would be lost during enrichment analysis.

7 2
MB-3, one of the most abundant sequences in the dataset, showed significant binding to RC-124 cells.

7 3
MB-3 was identified neither in the enrichment analysis results nor in the differential binding cell-2 7 4 SELEX results. However, seven of the 10 most abundant aptamer sequences after the cell-SELEX 2 7 5 process were enriched above the set cut-off value log 2 > 5 and thus did appear in the enrichment 2 7 6 analysis results. None of these sequences appeared in the differential binding results because they 2 7 7 did not pass the statistical significance test applied to logFC. These observations are in line with 2 7 8 previous statements that the most abundant aptamer sequences are not necessarily the best binders 2 7 9 30 . This proves the value of the differential binding approach for excluding the non-specifically 2 8 0 enriched sequences during the cell-SELEX procedure.

8 1
After noticing high guanine abundance in several of the identified lead sequences, we searched for G- represents the identified G4 motifs.

9 6
Differential binding cell-SELEX uses edgeR to compare how all sequences that can be found in the 2 9 7 final enriched aptamer library interact with the control and target cells; it is also used to estimate the 2 9 8 statistical significance of these differences. There are several bioinformatics tools available to analyse 2 9 9 the statistical significance of the differential expression for RNA-seq data 25,33,34 . To the best of our 3 0 0 knowledge, none of these tools have been applied to estimate differentially bound aptamers on the 3 0 1 cell surface. edgeR was chosen because it is compatible with the existing data analysis workflows in 3 0 2 R 35 . A combination of enrichment analysis and the differential binding approach provides an 3 0 3 algorithm to choose target sequences for further analysis.

0 4
Altogether, we demonstrate a combined analysis pipeline that can be used to identify lead aptamers 3 0 5 from low binding selectivity aptamer libraries after cell-SELEX experiments. We propose a fast and 3 0 6 practical high throughput aptamer identification method to be used with the cell-SELEX technique to 3 0 7 increase the successful aptamer selection rate against live cells.

0 8
A higher number of sequencing reads during differential binding cell-SELEX could even further 3 0 9 increase the likelihood to identify low abundance, but differentially bound sequences specific to cells 3 1 0 of interest. Sequences that were present only in one replicate from each selection pool were 3 1 1 discarded. After the 4 th selection cycle, only a few sequences were present in more than one 3 1 2 1 4 sequencing replicate (Fig. 6a) compared to the 11 th cycle (Fig. 6b). An increased number of reads 3 1 3 would cover more diverse libraries and would make it possible to identify differentially bound 3 1 4 aptamers using fewer selection cycles.

1 5
The cell-SELEX design described in this research uses commercially available human RCC-MF and 3 1 6 RC-124 cells both as a target and a negative control. We are the first to use these cell lines for 3 1 7 aptamer selection with a cell-SELEX approach. However, it could be more suitable to use patient-3 1 8 matched primary cells isolated from the tumour site and adjacent healthy kidney tissue within a few 3 1 9 passages after isolation, when cells are most likely to represent the diversity found in clinical settings 3 2 0 36 .

2 1
The differential binding cell-SELEX method developed here can be used to accelerate aptamer 3 2 2 selection based on HTS analysis. Additional information from differential binding cell-SELEX reduces not only to identify aptamers against cell lines but also against primary cells isolated from patient 3 2 5 samples.

2 6
We conclude that the differential binding cell-SELEX method can be used to characterize not only 3 2 7 sequence enrichment between selection cycles, but also to select aptamer sequences that selectively