Modular barcode beads for microfluidic single cell genomics

Barcode beads allow efficient nucleic acid tagging in single cell genomics. Current barcode designs, however, are fabricated with a particular application in mind. Repurposing to novel targets, or altering to add additional targets as information is obtained is possible but the result is suboptimal. Here, we describe a modular framework that simplifies generation of multifunctional beads and allows their easy extension to new targets.


INTRODUCTION
Biological samples from tissues or the environment often contain communities of distinct cell types. Such mixtures can be deconvoluted by isolating subsets for analysis or employing single cell methods. For example, single cell genomics (scDNAseq) allows identification of the clonal composition of cancer cells, while single cell transcriptomics (scRNAseq) enables deconvolution of mixed cells with distinct phenotypes (1). High throughput microfluidic methods can analyse thousands of cells to characterize large and heterogeneous populations and, therefore, are becoming ubiquitous in biological research.
A key step in single cell genomics workflows is loading droplets with high concentrations of barcode oligos to tag nucleic acids of interest; the resultant tagged molecules can be pooled for all cells and efficiently sequenced in a single run. Barcode loading is often accomplished using beads on which barcode sequences are synthesized (2,3). Efficient microfluidic techniques allow most cells to be paired with a bead(4) (Figure 1a), while efficient synthesis of oligos on beads provides ample barcode for target tagging, yielding optimal sequencing data. However, current barcode designs utilize fixed targeting primers that are not easily repurposed; consequently, if new targets are identified, a new batch of beads must be synthesized, which is expensive and laborious. For example, beads used for scDNAseq cannot be repurposed for scRNAseq, nor can additional targets be easily added to existing whole transcriptome beads; this would allow, for instance, sensitive capture of guide RNAs used in genome wide knockout screens (5,6). If a universal barcode bead could be designed that could be easily and cost effectively retargeted to new sequences, it would allow easy repurposing of existing beads to new targets, accelerating single cell experiments and reducing cost and waste.
Here, we describe a modular framework that allows easy repurposing of existing bead batches to new targets ( Figure 1b)(supplementary protocol). The modular design is compatible with described single cell sequencing workflows, including scDNAseq (7), scRNAseq (2), Abseq (8), multimodal analyses (9)(10)(11)(12), and genome wide knockout screens (13)(14)(15). Moreover, the optimized structure generates compact barcodes that can be assembled in a fraction of the time and cost compared to existing methods. The platform is general and flexible, making it useful for numerous single cell genomics applications.

Microfluidic device fabrication
Devices were fabricated with standard photolithography techniques (16). Custom device fabrication is not necessary to use these beads and can be substituted with commercially available instruments (e.g. from 10x Genomics, Mission Bio, 1CellBio and others). Master structures were made with Su8 3025 photoresist (MicroChem, Westborough, MA, USA) on a three inch silicon waver (University Wafer) by spin coating, soft baking at 95°C for 20 min and subjecting to 3 min UV-exposure through printed photolithography masks (CAD/Art Services, 12000 DPI)(supplementary file). Post UV exposure, the wafer was baked at 95°C for 2 min and cooled to room temperature and developed in a propylene glycol monomethyl ether acetate (Sigma Aldrich) bath, rinsed with acetate, and dried and hard baked at 225°C for 10 min. Curing agent and PDMS were mixed in 1:10 ratio, degassed and poured over the master structure and baked at 65°C for 4 h, removed from the master and punched with a 0.75 mm biopsy core (World Precision Instruments). The device was then bonded to a glass slide using O2 plasma and the channels were treated with Aquapel (PPG Industries) to render them hydrophobic. Aquapel was purged from the channels with air after 5 min contact time and residual liquid evaporated by baking at 65°C for 15 min.
Barcode primer plate and splint plate (supplementary table) for first round of split-pool barcode synthesis was prepared by combining each barcode with its cognate splint at a 1:1 ratio and a final concentration of 100 μMM. Paired barcode and splints were phosphorylated in a PCR plate by combining 20 μMl of oligonucleotides with 40 μMl phosphorylation mix (1X T4 ligation buffer, 0.2 mg/ml BSA, 0.167 U/μMl T4 PNK (NEB)) and incubated for 30 min at 37°C and heat inactivated for 20 min at 65°C. After phosphorylation, 100 μMl of bead suspension to each barcode splint pair. To start the first round of ligation 40 μMl of ligation mix (9.54 U/ul T4 ligase (NEB) in 1x T4 ligase buffer) was added to each well and the plate sealed and incubated for 1-4 h at room temperature and the enzyme inactivated by heating to 65°C for 10 min. Beads were collected, washed five times in TET and resuspended in ligation buffer at a 1:1 solid:solvent fraction for the next round of ligation. This process was repeated for the barcode fragments 2 and 3 to yield the final modular gelbeads which can be quickly functionalized for a specific purpose.
To prepare the barcoded beads for single cell experiments we resuspended the beads at 1:2 solid to liquid in T4 ligation buffer and performed a splinted DNA ligation for 1 h at room temperature. For RNA beads we used 10 μMM pBB4: CTCGAATAGG as splint and 10 μMM pBB5: /5Phos/TTCGAGNNNNNNNNTTTTTTTTTTTTTTTTTTTTTTTTV as primer. For the cancer hot spot panel we phosphorylated a set of 49 primers (Mission Bio Acute Myeloid Leukemia Panel) or 16 primers (supplementary table) at equimolar ratio and ligated to the beads at 10 μMM total concentration using 10 μMM pBB8: CTGCGAGTACTAGG or pBB4 as splint. To render the barcodes single stranded, beads were washed four times in denaturing solution (100 mM NaOH, 0.5 % v/v Brij-35) and washing solution quenched by resuspending in low salt buffer (10 mM NaCl, 10 mM Tris-HCl pH 8.0, 0.1 mM EDTA, 0.1% Tween-20).
A detailed step-by-step description is provided as supplementary protocol.

Barcode release test
Barcode beads (2 μMl) were resuspended in LS (10 mM Tris-HCl pH 7.5, 1 mM MgCl2, 50 mM NaCl, 0.1% Tween20) and incubated for 1 minute. The beads were pelleted at 2000g, the supernatant removed and the beads resusbended in 20 μMl 1x CutSmart (New England Biolabs), 1x Maxima H-RT buffer (Thermo Scientific) or 1x Kappa HiFi PCR buffer (Kapa Biosystems). For locally doublestranded cleavage tests pBB3 was added to a final concentration of 3 μMM. Then, 0.4 μMl USER II (New England Biolabs) enzyme mix was added an the suspension incubated for 45 min at 37°C, and heat inactivated. FAM-probe pBB6 was added to a final concentration of 1 μMM and the beads incubate at room temperature for 15 minutes under rotation. Beads were washed 3 times in 1 ml TET and imaged on a fluorescence microscope (EVOS cell imaging systems, Thermo Fisher) at constant light intensity, shutter speed and signal amplification.
For bioanalyzer traces 1 μMl beads were resuspended in 20 μMl CutSmart and released with 0.5 μMl USER II by incubating 30 minutes at 37°C. The Samples were diluted 4 fold with water and 1 μMl supernatant was loaded on a Bioanalyzer High Sensitivity DNA electrophoresis chip (Agilent Technologies).

Cell encapsulation and barcoding
For the cancer hot spot experiment K562 and Raji cells or P493-6 and LAX7R cells were mixed at a 1:1 ratio and resuspended in phosphate-buffered saline (PBS) with 10.2% (w/v) Iodixanol at an approximate concentration of 3 million cells/ml. Cells were co-flowed with lysis buffer (100 mM Tris at pH 8.0, 0.5% IGEPAL, proteinase K 1.0 mg/mL)(18) each at 1500 μMl/h and with 3000 μMl/h HFE-7500 with 2% (w/v) PEG-PFPE amphiphilic block copolymer on a bubble-trigger device (19) (20) (Figure 2a). The droplets equivalent of about 1000 cells (69 s fractions) were collected, the oil excess oil removed from the collection tubes and replaced with FC40 with 5% (w/v) PEG-PFPE amphiphilic block copolymer. Droplet PCR was thermocycled with the following conditions: 30min at 30°C to release primers, 3 min at 95°C; 20 cycles of 20 s at 98°C, 10 s at 72°C, 4 min at 62°C, and 30 s at 72°C; and a final step of 2 min at 72°C with all ramp rates set to 1°C/s. Emulsion was broken with 1H,1H,2H,2H-Perfluoro-1octanol diluted with 60 μMl water the beads pelleted and 50 μMl supernatant removed. To supernatant 5 μMl 10x CutSmart (NEB) was added and incubated with 20 units ExoI nuclease (NEB) for 1 h at 37°C before purification with 42 μMl AMPure beads (Beckman Coulter). The sample was eluted in 20 μMl water which served as input for sequencing library generation. Cancer hotspot experiments with CCRF-CEM and K562 cells were done as previously described (21).
For the RNA experiment K562 and 3T3 cells were mixed at a 1:1 ratio and resuspended at 2.57 million cells/ml in the same buffer as above. Cells were co-flowed with lysis buffer (30 mM Na-citrate pH 6.5, 0.2 % Trition-X100, 0.2 % SDS, 2 mM EDTA, 10 mM DTT) on the same device as above, droplets collected and incubate 1 h at 4°C. Same bead-and-droplet merging device was used to combine pBB5 functionalized, closed packed barcoding beads in RNA bead buffer (

Library preparation and sequencing
To the cancer hot spot library P5 and P7 sequences were attached by PCR using the custom pBB9 primer and Nextera N701 (Illumina). Library was purified with 0.6x volume fraction AMPure beads, its concentration measured by a fluorometer (Qubit 3.0, Invitrogen) and the absence of primer dimers verified on a Bioanalyzer High Sensitivity DNA electrophoresis chip (Agilent Technologies). For sequencing a MiSeq V2 300 cycles kit (Illumina) was used and the library diluted to 12 pM according to the recommendations of the sequencing kit manual. The library was sequenced in paired end mode and each side sequenced over 150 base pairs.
For RNA library, second strand synthesis and linear amplification by in vitro transcription (IVT) was done as described previously (22) without fragmenting RNA after IVT. IVT product was reverse transcribed with the primer: AAGCAGTGGTATCAACGCAGAGTGTANNNGGNNNB(23) as described (22), purified wit 0.9x volume fraction AMPure beads and the concentration measured by the Qubid dsDNA HS assay (Thermofisher). 13.8 ng RNA/DNA hybrid was fragmented with 20 μMg Tn5(24, 25) (assembled with pBB11 and pBB12) in 50 μMl. Library was amplified with pBB9 and pBB13 and NEBNext Ultra II Q5 according to the manufactorer protocol, purified with 0.65x AMPure beads and prepared a 12 pM sequencing library. The library was sequenced on a MiSeq V3 150 cycles kit (Illumina) in paired end mode, distributed 55/110 cycles, using custom sequencing primers pBB10 and pBB11 (supplementary table).

Bioinformatic data evaluation
Bead barcodes were parsed from the read 1 file with a custom script. The program cutadapt (v2.4) was used to find the common ligation scar between the combinatorial barcodes and the forward read primers. This step is necessary because the 0 -3 base pair spacer in the barcode bead sequence (which helps cluster identification on the sequencing device by increasing sequence diversity for otherwise identical segments of the barcode sequence such as the ligation scars) prevents extraction of the combinatorial barcode blocks by distance from the 5'-end. Barcode blocks are extracted by distance from the position of the ligation scar (3'-side distance) and matched against a white list, allowing a maximum edit distance of one, to identify true barcode sequences. Read 1 and read 2 sequences were demultiplexed into barcode groups and valid cell barcode groups were discriminated from background barcode groups by identifying the inflection point of the barcode rank plot versus number of associated reads. Reads from valid cell barcodes were processed as previously described (12). Briefly, FASTQ files with valid reads were aligned to the hg19 build of the human genome reference using bowtie2 (v2.3.4.1), filtered (properly mapped, mapping quality > 2, primary alignment), sorted, and indexed with samtools (v1.8). HaplotypeCaller from the GATK suite (v.4.1.3.0) was used to produce GVCF files and genotyped jointly on all genomic intervals with GATK GenotypeGVCFs. Genotyped intervals were combined into a single variant call format (VCF) file and multiallelic records split and left-aligned using bcftools (v1.9). Finally, variant records were exported to HDF5 format using a condensed representation of the genotyping calls (0: wildtype; 1: heterozygous alternate; 2: homozygous alternate; 3: no call). The result is a cell by allelic variant matrix, V, with the condensed genotype call as categorial matrix elements.
To cluster cells, genotype was one-hot encoded, converting V to four sparse binary matrices, V0 to V3. Pairwise cell-cell similarity was calculated by computing the dot product for the matrices corresponding to heterozygous and alternate calls and summing them: V1V1 T + V2V2 T , yielding the cellcell similarity matrix S. Hierarchical clustering of S was done with SciPy (v1.3.1) using scipy.cluster.hierarchy.linkage and Ward's minimum variance method. Variant calls that distinguish the two cell lines were identified by discarding all calls for which less than 10% of cells were called as alternate (sum of heterozygous and homozygous alternate) yielding 27 potentially informative variants. Remaining variant calls were inspected and discarded if constant over all cells. The remaining 14 variant call positions are the final list.
RNA sequencing data was evaluated with Kallisto(26) and Bustool (27). Since Kallisto currently does not support variable length barcode sequences, the 0 -3 base pair spacer was deleted from each read by a custom script. Next, Kallisto bus with option -x 0,0,24:0,25,32:1,0,0 was run to extract barcode sequences and pseudo align transcripts. Skipping barcode correction by the whitelist approach, output was converted to gene counts with Bustool and visualized in python using Scanpy (28) and matplotlib (29).

Compact barcode synthesis by split-pool ligation
The core concept of our bead design is to use modular sequence assembly in the synthesis process. Unlike existing approaches which synthesize and target beads in one workflow, thereby yielding beads limited to one sequencing task, we design beads in which the barcode sequence is assembled in a first step, and targeting in a second step with appropriate primers, such as poly-T for scRNAseq or multiplexed panels for scDNAseq.
Barcode bead fabrication starts with microfluidic emulsification of a gel precursor solution serving as the bead scaffold (22,30). Once polymerized, we recover the beads and the covalently linked primers serve as anchors for barcode synthesis.
To synthesize barcode sequences on the beads, we use a split-pool approach. We distribute the beads into wells containing unique barcode sequence fragments, adding the fragments to the bead stubs. The beads are recovered from all wells, mixed, and the mixed population divided and redistributed into the wells to attach another barcode sequence. The number of paths through the wells grows exponentially with the number of split-pool cycles. Obtaining high barcode diversity is important because cells are assigned barcode sequences randomly in accordance with a Poisson process. Consequently, a fraction of cells will be assigned a barcode already in use, termed barcodeclash, preventing recovery of single cell data in these instances. The fraction of clashing cells is a function of the total barcode space and number of profiled cells, f clas h =1− ( 1− 1 N ) , where N is the number of different barcodes and C the number of profiled cells.
Barcode synthesis uses enzymatic reactions to concatenate pre-synthesized oligos. We use enzymatic synthesis because the acrylamide hydrogel backbone of the beads is incompatible with phosphoramidites oligo synthesis. As concatenated subsequences we use octamers from a library with minimum Levenshtein distance four (31,32), which enables robust barcode identification even with sequencing errors.
DNA polymerases have previously been used in split-pool protocols, but require handles for specific hybridization, inflating barcode length. We instead use T4 DNA ligase which operates with four base pair overhangs (33) (Supplementary figure 2a, b). These overhangs ensure specific ligation, since different sequences used in sequential steps prevent improper propagation of reactive stubs remaining from failed ligations in previous rounds. To characterize the process, we measure ligation efficiency in a split pool synthesis reaction, observing >80% of stubs are ligated per round, such that after three rounds 64% of oligos on the bead are full length (Supplementary Figure 2c). In contrast, beads fabricated using polymerases achieve just 36% yield after two rounds (22). Thus, our results demonstrate that ligation is more efficient for barcode bead fabrication in split-pool protocols than polymerase extension while also yielding more compact barcodes that reduce sequencing waste.
Compact barcodes allow more sequence diversity in the same oligo length; this is important because barcode diversity sets clash rate and, thus, limits the number of cells that can be sequenced. For example, ligation permits assembly of three blocks into a barcode while polymerase extension allows only two. The result is that with three 96-well plates of barcode subsequences, ligation encodes 884,736 different sequences, allowing 45,382 cells per experiment for a 5% clash rate. By contrast, using polymerase extension to assemble two-block barcodes from eight 96 well plates yields just 147,456 barcodes, allowing just 7,564 cells to be profiled. To match ligation, polymerase extension would require eighteen 96-well plates.
In terms of cost, ligation is equivalent to polymerase extension per fabricated bead volume. However, the massive reduction in required barcode subsequences (3 rather than 18) affords a significant lower upfront investment. Ligation is also faster and less laborious: It uses double stranded DNA while polymerases require single stranded, obviating the need to denature after each cycle and reducing the number of wash steps. Moreover, the considerably smaller number of oligos makes manual pipetting feasible, whereas synthesis of two-block polymerase libraries from 18 plates requires robotics to ensure quality beads.

Enzymatic barcode release is cost effective
Popular protocols for single cell genomics release barcodes from beads to increase availability for reverse transcription or PCR priming. This is normally achieved using UV cleavable chemical moieties, like 2-nitrobenzyl, or disulfides that can be broken with a reducing agent (2,34). While fast to cleave, these linkers are expensive (Supplementary table 1) and require care to avoid premature cleavage during bead fabrication and handling. Our protocol instead employs enzymatic cleavage, yielding significant cost savings while also making the approach less sensitive to premature cleavage. Because genomic DNA is a valuable substrate in many single cell experiments, we avoid restriction enzymes including rare cutting Type IIS enzymes. Thus, we incorporate deoxyribose uracil (dU) as the linker, which does not exist in genomic DNA and can be cleaved by an enzyme mix comprising uracil DNA glycosylase and endonuclease III. Oligos containing dU are readily synthesized and inexpensive, and the linker consumes just one base. Moreover, dU containing oligos are stable and specifically cleavable by these enzymes, which are also readily available and inexpensive.
To characterize the efficiency of this barcode release strategy, we incubate beads functionalized with oligos containing dU in PCR and reverse transcription buffer, mimicking conditions of single cell genomics protocols. To measure cleavage efficiency, we use FAM-labelled probes complimentary to the cleavable oligo and measure fluorescence for cleaved beads and positive and negative controls. We observe little activity on single stranded oligos, but near complete cleavage of double stranded in both buffer conditions (Supplementary figure 2d, e). This makes sense because the enzymes natural substrate is double stranded DNA. Thus, to make the single stranded barcodes cleavable, we include an oligo complementary to the cleavage region in the mix, making it locally double stranded and achieving efficient cleavage. These results show that uracil cleavage is an efficient mechanism for barcode release, while reducing bead fabrication cost to about a third.

Flexible bead usage for scDNAseq and scRNAseq
To demonstrate the benefit of the modular bead design in typical use cases we create a mock cancer cell system consisting of two human cell lines, Raji and K562, and profile their clonal relationship. We select a panel of 49 genomic locations (supplementary table) that are known hot spots for mutations in acute myeloid leukemia, functionalizing the beads with the corresponding set of 49 forward primers, and running the associated microfluidic workflow (7). After cell encapsulation, lysis, and chromatin digestion, we merge the droplets with barcode beads, PCR reagents, and enzymes to cleave the uracil linkers, thereby releasing the barcode primers into solution for targeted amplification of the 49 loci ( Figure 2a). The resultant sequence data yields high quality reads across the panel for 1,020 cells, with a median of 3447 reads and 45 detected amplicon loci per cell (Supplementary Figure 3a). After aligning the fragments to the human reference genome and performing variant calling, we use hierarchical clustering to assign genotypes via Ward's minimum variance method (35), obtaining two separated clusters (Figure 2b). A comparison of the detected variant calls in single cells with the genotype obtained from a bulk measurement of Raji and K562 cells shows that the clusters indeed represent the two input lines (Figure 2c). We apply the same workflow to other cell lines such as P493-6, LAX7R; and profile CEM and K562 with the same beads but a different microfluidic strategy (21), to demonstrate that the method is insensitive to cellular or microfluidic differences (Supplementary figure 4). These experiments show that our barcode beads enable high throughput single cell genome sequencing with data quality equivalent to traditional beads.
A common quality control experiment in single cell RNA sequencing is to profile a cell suspension of two species because individual transcripts can be assigned to cell type based on their sequence. To showcase the potential of the modular bead design for scRNAseq we therefore attach a poly-T primer instead of the cancer hot spot panel to the same bead batch and prepare a mixed mouse (NIH 3T3) and human (K562) cell sample. We use the same microfluidic workflow, but include an acidic lysis solution to stabilize RNA, and reverse transcription (RT) instead of PCR reagents, to produce a scRNAseq library. We collect droplets for ~1 minute aiming to capture 300-400 cells; sequencing yields 134 mouse cells with a median count of 2615 transcripts, 184 human cells with 4640 transcripts and 33 cells of lower quality with a median transcript count below 1000 (Figure 2d, Supplementary  figure 3b). This demonstrates that our repurposed beads perform comparably to dedicated beads for scRNAseq (2). Bead modularity thus enables rapid deployment of the same barcode beads for single cell workflows targeting distinct molecules.

CONCLUSIONS
Our modular barcode design reduces the cost of bead synthesis while allowing rapid and inexpensive repurposing to other targets, including DNA and RNA. This makes them attractive for multiomic experiments by allowing fine tuning of primer sequences and concentrations to obtain best conditions for simultaneous DNA and RNA sequencing. Moreover, the barcode sequences are compact and efficiently synthesized to full length, reducing sequencing waste and minimally consuming read length.
A compact barcode structure also allows PCR primers targeted to specific cells of interest to increase sequence coverage and improve signal quality in the pooled library (36). The ease with which highly multiplexed primers can be added enables new opportunities to enhance scRNAseq sensitivity. For example, poly-T mRNA capture yields an unbiased profile of expressed transcripts, but because coverage is typically below ~10%, low abundance transcripts may be missed. Our modular design can enhance detection of such transcripts by dedicating a fraction of all primers on the bead to their capture alongside poly-T probes. Besides established DNA or RNA protocols, we expect that bead modularity will accelerate the introduction of novel assays by reducing the risk of fabricating dedicated bead batches that may not work.