Date: 2022-03-03

ADAPT

Supplementary Notes describe ADAPT’s algorithms, data structures and implementation details. Supplementary Note 2 defines objective functions and describes how ADAPT optimizes them. Supplementary Note 3 describes how ADAPT enforces specificity. Supplementary Note 4 describes how ADAPT searches for genomic regions to target and links with sequence databases. Supplementary Note 5 describes how ADAPT forecasts relatively likely genome substitutions.

Introductory analyses

To illustrate viral database growth, we charted the growth in the number of viral genomes and their unique 31-mers over time (Supplementary Fig. 1). We first curated a list of viral species known to infect humans from a National Center for Biotechnology Information (NCBI) database70 (November 2019). For each, we took all NCBI genome neighbors31 (influenza sequences from the Influenza Virus resource71), which represent near-complete or complete genomes. To assign a date for each, we used the GenBank entry creation date rather than sample collection date for several reasons, including that this date more directly represents our focus in the analysis (when the sequence becomes present in the database) and that every entry on GenBank contains a value for this field. To control for some viruses having multiple segments (and thus sequences), we only used counts for one segment for each species, namely the segment that has the greatest number of sequences.
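The growth curve can be sketched as a cumulative count over entries sorted by creation date. This is a minimal illustration rather than the analysis code: the function names, the input layout (date, sequence) and the toy sequences are assumptions, and the toy example uses k = 3 instead of 31.

```python
from datetime import date

def kmers(seq, k):
    """Return the set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def cumulative_growth(entries, k=31):
    """entries: iterable of (creation_date, sequence) pairs (hypothetical layout).

    Returns (date, genomes so far, unique k-mers so far) in date order.
    """
    seen = set()
    growth = []
    for n, (created, seq) in enumerate(sorted(entries), start=1):
        seen |= kmers(seq, k)
        growth.append((created, n, len(seen)))
    return growth

# Toy input with k=3; the analysis described above uses 31-mers.
toy = [
    (date(2001, 5, 1), "ACGTAC"),
    (date(2003, 2, 9), "ACGTTT"),
    (date(2002, 7, 4), "ACGTAC"),
]
growth = cumulative_growth(toy, k=3)
```

A duplicate genome (the 2002 entry) raises the genome count but not the unique k-mer count, which is why the two curves can diverge.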

We used FLUAV subtyping as an example to demonstrate the effect of evolution on diagnostics (Extended Data Fig. 1a and Supplementary Fig. 2). We selected the most conserved k-mers (representing probe or guide sequences) from the sequences available at different years. Here, for simplicity, we ignored all other constraints, such as detection activity and specificity (the latter of which is critical for subtyping), which may further degrade the temporal performance of the selected k-mers. In particular, for each design year Y, we selected the 15 non-overlapping 30-mers found in the largest number of sequences taken from the two most recent years (Y − 1 and Y). We then measured the fraction of sequences in subsequent test years (Y, Y + 1, …) that exactly contain each of these k-mers. We performed the design strategy over ten resamplings of the sequences and used the mean fraction. We repeated this four times: for segment 4 (HA) sequences of H1 and H3 subtypes, and segment 6 (NA) sequences of N1 and N2 subtypes.
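The selection and evaluation steps can be sketched as follows. This is an illustration under stated assumptions: the design sequences are treated as aligned without gaps so that overlap can be judged by start position, the greedy tie-breaking is arbitrary, and the ten resamplings are omitted. The function names and toy sequences are hypothetical.

```python
from collections import Counter

def select_conserved_kmers(design_seqs, k, n):
    """Greedily pick up to n k-mers found in the most design sequences,
    skipping any k-mer whose position overlaps one already chosen.
    Assumes gap-free aligned sequences (an illustrative simplification)."""
    counts = Counter()
    for seq in design_seqs:
        for i in range(len(seq) - k + 1):
            counts[(i, seq[i:i + k])] += 1
    chosen = []
    # Highest count first; ties broken by position then sequence.
    for (pos, kmer), _ in sorted(counts.items(), key=lambda x: (-x[1], x[0])):
        if all(abs(pos - p) >= k for p, _ in chosen):
            chosen.append((pos, kmer))
        if len(chosen) == n:
            break
    return [kmer for _, kmer in chosen]

def fraction_containing(test_seqs, kmer):
    """Fraction of test sequences that contain kmer exactly."""
    return sum(kmer in s for s in test_seqs) / len(test_seqs)

design = ["ACGTACGGTT", "ACGTACGGTA", "ACGTTCGGTT"]               # years Y-1 and Y
later = ["ACGTACGGTT", "ACCTACGGTT", "ACGTACGCTT", "ACGTTCGGTT"]  # a test year
picked = select_conserved_kmers(design, k=4, n=2)
fractions = {kmer: fraction_containing(later, kmer) for kmer in picked}
```

In the analysis described above, k = 30 and n = 15, and the fraction is averaged over resamplings.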

To visualize mutations accumulating on a genome during the course of an outbreak (Extended Data Fig. 1b), we used complete SARS-CoV-2 genomes from Global Initiative on Sharing All Influenza Data (GISAID)58. We called variants in all genomes, through 2020, against the reference genome ‘hCoV-19/Wuhan/IVDC-HB-01/2019’ (GISAID accession ‘EPI_ISL_402119’). For every date d between 1 February 2020 and 1 January 2021, spaced apart by 1 month, at every position we calculated the fraction of all genomes collected up to d that have a variant against the reference. We called all variants present between 0.1% and 1% frequency on some d as ‘low’ frequency and variants at ≥1% frequency on some d as ‘high’ frequency. We ignored all variants present at ≥1% frequency on the initial d (ancestral) or that were both low frequency on the initial d and stayed low frequency by the final d—that is, we kept the variants that transitioned to low or high frequency by the final d. We show the d when the variant first becomes called as low (light purple) or high (dark purple) frequency. If a variant transitions both to low and then to high frequency by the final d, we only show it for the d when it becomes high frequency.
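One reading of these labeling rules is sketched below. The handling of edge cases (for example, a variant that is low frequency on the initial date and later falls below 0.1%) is an assumption, as are the function name and the input layout.

```python
def classify_variant(freqs, low=0.001, high=0.01):
    """freqs: list of (date_label, cumulative variant frequency) in date order.

    Returns (label, date_label) for the date to display, or None if the
    variant is ignored. Thresholds follow the text (0.1% and 1%).
    """
    def level(f):
        if f >= high:
            return "high"
        if f >= low:
            return "low"
        return None

    levels = [level(f) for _, f in freqs]
    if levels[0] == "high":
        return None  # ancestral: already >= 1% on the initial date
    if levels[0] == "low" and levels[-1] == "low":
        return None  # low at the start and still low at the end
    # Prefer the first date called high; otherwise the first date called low.
    for label in ("high", "low"):
        for (d, _), lv in zip(freqs, levels):
            if lv == label:
                return (label, d)
    return None

rising = classify_variant([("2020-02", 0.0), ("2020-03", 0.002), ("2020-04", 0.02)])
ancestral = classify_variant([("2020-02", 0.02), ("2020-03", 0.5)])
emerging_low = classify_variant([("2020-02", 0.0), ("2020-03", 0.005)])
```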

Cas13a library design and testing

We designed a collection of CRISPR–Cas13a CRISPR RNA (crRNA) guides and target molecules to evaluate guide–target activity, focusing on assessing likely active guide–target pairs. First, we designed a target (the wild-type target) that is 865 nucleotides (nt) long (design details for the wild-type target are in the subsequent paragraph). We then created 94 guides (namely, the 28-nt spacers) tiling this wild-type target (Fig. 1a and Supplementary Fig. 3a). In the tiling scheme there are 30-nt blocks, each having four overlapping guides, in which the starts of the three guides, from the start of the most 5′ guide, are 4 nt, 13 nt and 23 nt. Of the 94 guides, 87 are experimental, three are negative controls and four are positive controls. We created 229 unique target sequences: one of them is the wild-type sequence (guides should exhibit activity against this target), 225 are experimental (mismatches and varying PFS alleles against the guides) and three are negative controls. All guides exactly match the wild-type target and should detect this, except the three negative control guides, which are not intended to detect any targets except one of the three negative control targets each. The four positive control guides target four 30-nt regions with a perfectly complementary sequence and non-G PFS that are held constant across all targets, with the exception of the three negative control targets. Across the experimental targets, the mismatches profile varying choices of positions and alleles against the guide. For the experimental targets, we generated single mismatches evenly spaced every 30 nt along the experimental region such that every guide targeting this region has either a single mismatch or an altered PFS at +1 or +2 nt from the protospacer; we created a total of 45 (3 × 15) such targets to probe all three possible mismatch alleles and 15 of 30 of the possible phasings. 
In the remainder of the experimental targets, we generated targets with two, three or four mismatches per 30-nt block with respect to the guide RNA in phase with the block. For these targets, we randomly selected mismatch positions to uniformly sample (or, when possible, exhaustively enumerate) average mismatch spacing and average mismatch distance to the center of the spacer, and randomly selected mismatch alleles. The 87 experimental guides may detect up to 226 unique target sequences (the wild type and 225 experimental targets), providing 19,662 experimental guide–target pairs.
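The tiling offsets and the library counts can be checked with a few lines. The block and guide lengths and the offsets come from the text; the position of the first block (`first_block_start`) is a hypothetical parameter.

```python
BLOCK_LEN = 30
GUIDE_LEN = 28
OFFSETS = (0, 4, 13, 23)  # guide starts within each 30-nt block

def guide_starts(n_blocks, first_block_start=0):
    """Start positions of tiled guides; first_block_start is hypothetical."""
    return [first_block_start + b * BLOCK_LEN + o
            for b in range(n_blocks) for o in OFFSETS]

starts = guide_starts(2)

# Library bookkeeping stated in the text.
n_guides = 87 + 3 + 4       # experimental + negative + positive controls
n_targets = 1 + 225 + 3     # wild type + experimental + negative controls
n_pairs = 87 * (1 + 225)    # experimental guides x (wild type + experimental)
```

Because a 28-nt guide starting at offset 23 extends 21 nt into the next block, guides from adjacent blocks overlap.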

To construct the wild-type target sequence, we aimed to produce a composition spanning viral genomic sequence diversity. In particular, we started with a previously described dataset of genomes from human-infecting viral species72, constructed a vector of the dinucleotide frequencies for each species and performed principal component analysis of the species from these vectors. For each 30-nt block of the wild-type target, we selected a point from the space of the first three principal components (uniformly at random), reconstructed a corresponding vector of dinucleotide frequencies (that is, transformed the point back to the original space) and then iteratively selected every next nucleotide in the block according to the distribution of dinucleotides. A goal of this scheme is for dinucleotides that are variable across viral species to also vary in frequency across the wild-type target: a dinucleotide that explains considerable variance across viral species (for example, is rich in some viral species and poor in others) ought to be rich in some blocks of the wild-type target and poor in other blocks, whereas a dinucleotide that explains little variance across species ought to have similar frequency along the target. In positions that would serve as a PFS for a guide, we disallowed G, and proportionately adjusted upwards the probability of choosing a G in non-PFS positions to maintain the total dinucleotide frequency in accordance with the randomly selected distribution (mismatches in experimental targets can still introduce a G PFS).

We synthesized the targets as DNA, in vitro transcribed them to RNA and synthesized the crRNAs as RNA. On all crRNAs, we used the same direct repeat (‘GAUUUAGACUACCCCAAAAACGAAGGGGACUAAAAC’). To determine a reasonable concentration for measuring fluorescence over time points, we tested eight concentrations of two targets and two guides in a pilot experiment (Supplementary Fig. 4a) and proceeded with \(6.25 \times 10^{9}\) copies per µl. We tested the library using CARMEN; we followed the methodology described in ref. 8, which also contains the protocol. Briefly, a guide–target pair is enclosed in a droplet, together with the Cas13a enzyme, that may result in a detection reaction and thus fluorescence. We took an image of each location on each chip roughly every 20 min to measure this fluorescence. To alleviate the presence of microdroplets in this experiment (that is, an irregular pairing of target and guide; about one-third of the droplets), we trained and applied a CNN on hand-labeled data to identify and remove these.

Quantifying activity

In our Cas13a detection experiments, a fluorescent reporter is cleaved over time and its cleavage follows first-order kinetics:

$$\begin{array}{l}\frac{{{\mathrm{d}}\left[ R \right]}}{{{\mathrm{d}}t}} = - \frac{{k_{{\mathrm{cat}}}}}{{K_{\mathrm{M}}}}\left[ E \right]\left[ R \right]\\ \Rightarrow \left[ R \right] = [R]_0{\mathrm{e}}^{ - \frac{{k_{{\mathrm{cat}}}}}{{K_{\mathrm{M}}}}\left[ E \right]t}\end{array}$$

where [R] is the concentration of the not-yet-cleaved reporter, [E] is the concentration of the Cas13a guide–target complex, \(\frac{{k_{{\mathrm{cat}}}}}{{K_{\mathrm{M}}}}\) is the catalytic efficiency of the particular guide–target complex and t is time. The fluorescence measurements that we make, y, are proportional to the quantity of cleaved reporter at some time point:

$$y \propto [R]_0 - \left[ R \right].$$

Therefore, for each guide–target complex we fit a curve of the form

$$y = C\left( {1 - {\mathrm{e}}^{ - kt}} \right) + B.$$

Here, C and B represent the saturation point and background fluorescence, respectively. k represents the rate at which the reporter is cleaved, and it is proportional to the catalytic efficiency of the particular guide–target complex:

$$k = \frac{{k_{{\mathrm{cat}}}}}{{K_{\mathrm{M}}}}\left[ E \right].$$

This relationship is validated by the linear relationship between k and [E] (Supplementary Fig. 4a) when we vary the concentration of target (the limiting component of the complex). In producing our dataset, we held [E] constant. We used \(\log_{10}(k)\) as our measurement of the overall enzymatic activity resulting from the guide–target pair (Figs. 1 and 2 and Supplementary Fig. 4a,b). Intuitively, because the half-life of the reporter is \(\ln(2)/k\), each one-unit increase in \(\log_{10}(k)\) corresponds to a tenfold decrease in the half-life of the reporter in the reaction.
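The fitting procedure itself is not specified above. The following is a minimal sketch, assuming a scan over a grid of log10(k) values with C and B solved by ordinary least squares at each k; the grid bounds, function name and synthetic data are illustrative assumptions.

```python
import math

def fit_activity(times, ys, log10_k_grid=None):
    """Fit y = C * (1 - exp(-k t)) + B.

    For each k on a log grid, the model is linear in C and B, so they are
    obtained in closed form; the k with the smallest squared error wins.
    """
    if log10_k_grid is None:
        log10_k_grid = [x / 100 for x in range(-600, 1)]  # 1e-6 to 1 (assumed)
    n = len(times)
    best = None
    for lk in log10_k_grid:
        k = 10 ** lk
        xs = [1 - math.exp(-k * t) for t in times]
        mx, my = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        if sxx == 0:
            continue
        C = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
        B = my - C * mx
        sse = sum((C * x + B - y) ** 2 for x, y in zip(xs, ys))
        if best is None or sse < best[0]:
            best = (sse, lk, C, B)
    _, lk, C, B = best
    return lk, C, B

# Synthetic, noise-free curve sampled every 20 time units.
times = list(range(0, 121, 20))
true_k, true_C, true_B = 10 ** -1.5, 1000.0, 50.0
ys = [true_C * (1 - math.exp(-true_k * t)) + true_B for t in times]
log10_k, C, B = fit_activity(times, ys)
half_life = math.log(2) / 10 ** log10_k
```

With real, noisy fluorescence data a nonlinear least-squares routine would be a natural alternative to the grid scan.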

Our experimental data incorporate multiple droplets for each guide–target pair (Extended Data Fig. 2a). Each droplet represents one technical replicate of a particular guide–target pair. Thus, we have fluorescence values for each replicate at different time points, and in practice we compute the activity \(\log_{10}(k)\) for each replicate.

We curated the data to obtain a final dataset. Namely, we discarded data from two guides that showed no activity between them and any targets, owing to low concentrations in their synthesis. We also did not use data from positive or negative control guides, or from the negative control targets. Our final dataset contains 19,209 unique guide–target pairs (Supplementary Fig. 3b,c), counting 20 nt of sequence context around each protospacer in the target (18,253 unique pairs when not counting context).

Most guide–target pairs show activity (Extended Data Fig. 2d), as expected. At small values of k on a limited time scale (t up to ~120 min), we do not observe reporter activation (Supplementary Fig. 4b). Moreover, the curve becomes approximately linear (first-order Maclaurin expansion: y ≈ Ckt + B). At such values of k, we cannot estimate both C and k together; intuitively, this is because there is too little detectable signal. Therefore, there is a cutoff at which we can estimate k; we labeled activities at \(\log_{10}(k) > -4\) as active, and the others as inactive. This phenomenon also implies that at smaller values of k, including ones we label as active, activity estimates might be less reliable.
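The identifiability argument can be made explicit by expanding the fitted curve around kt = 0 (a short derivation from the curve defined above, not an additional result):

$$y = C\left( {1 - {\mathrm{e}}^{ - kt}} \right) + B = C\left( {kt - \frac{{(kt)^2}}{2} + \cdots } \right) + B \approx Ckt + B\quad {\mathrm{for}}\,kt \ll 1.$$

In this regime the data constrain only the slope Ck, so any pair (C, k) with the same product fits the measurements equally well.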

Predicting detection activity

Measurement error

To account for measurement error, we sampled, with replacement, ten technical replicate measurements of activity for each guide–target pair (Extended Data Fig. 2a). We used this strategy to ensure that, although there are differing numbers of replicates per guide–target pair, each pair would be represented in the dataset with the same number of replicates. There are 19,209 × 10 = 192,090 points in total in our dataset that we use for training and testing. When plotting regression results on guide–target pairs in the hold-out test set (Fig. 2c, Extended Data Fig. 4a and Supplementary Fig. 10), we set the true activity of a pair to be the mean of the measured activities across the technical replicates for the pair.
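A minimal sketch of this resampling follows, assuming the replicate activities are held in a dictionary keyed by guide–target pair. The function name, seed and toy values are illustrative.

```python
import random
from statistics import mean

def resample_replicates(replicates_by_pair, n=10, seed=0):
    """Draw n replicate activities per pair, with replacement, so that every
    pair contributes the same number of points regardless of how many
    replicates were measured."""
    rng = random.Random(seed)
    return {pair: rng.choices(values, k=n)
            for pair, values in replicates_by_pair.items()}

# Toy log10(k) measurements for two pairs with unequal replicate counts.
data = {("g1", "t1"): [-1.2, -1.0, -1.1], ("g2", "t1"): [-3.0]}
resampled = resample_replicates(data)

# For plotting test-set results, the "true" activity is the replicate mean.
true_activity = {pair: mean(values) for pair, values in data.items()}
```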

Model and input descriptions

We approached prediction using a two-step hurdle model, reasoning that (1) separate processes govern whether a guide–target pair is active compared with the level of its activity; and (2) we could better predict the activity of active pairs if we excluded the inactive pairs from a regression. We developed a classifier to decide whether a pair is inactive or active, and a regression model to predict the activity of only active pairs.

We explored multiple models for classification (Fig. 2a and Supplementary Fig. 5a), each with a space of hyperparameters:

L1 logistic regression: regularization strength (logarithmic in \([10^{-4}, 10^{4}]\))

L2 logistic regression: regularization strength (logarithmic in \([10^{-4}, 10^{4}]\))

L1 + L2 logistic regression (elastic net): regularization strength (logarithmic in \([10^{-4}, 10^{4}]\)), L1/L2 mixing ratio (\(1.0 - 2^{x} + 2^{-5}\) for x uniform in [−5, 0])

Gradient-boosted trees (GBT): learning rate (logarithmic in \([10^{-2}, 1]\)), number of trees (logarithmic in \([1, 2^{8}]\), integral), minimum number of samples for splitting a node (logarithmic in \([2, 2^{3}]\), integral), minimum number of samples at a leaf node (logarithmic in \([1, 2^{2}]\), integral), maximum depth of a tree (logarithmic in \([2, 2^{3}]\), integral), number of features to consider when splitting a node (for n features, chosen uniformly among considering all, 0.1n, \(\sqrt n\) and \(\log_2 n\))

Random forest (RF): number of trees (logarithmic in \([1, 2^{8}]\), integral), minimum number of samples for splitting a node (logarithmic in \([2, 2^{3}]\), integral), minimum number of samples at a leaf node (logarithmic in \([1, 2^{2}]\), integral), maximum depth of a tree (chosen uniformly among not restricting the depth or restricting the depth to a value picked logarithmically from \([2, 2^{4}]\) and made integral), number of features to consider when splitting a node (for n features, chosen uniformly among considering all, 0.1n, \(\sqrt n\) and \(\log_2 n\))

Support vector machine (SVM; linear): regularization strength (logarithmic in \([10^{-8}, 10^{8}]\)), penalty type (chosen uniformly among L1 and L2)

Multilayer perceptron (MLP): number of layers excluding the output layer (uniform in [1, 3]), dimensionality of each layer excluding the output layer (each chosen uniformly in [4, 127]), dropout rate in front of each layer (uniform in [0, 0.5]), activation function (chosen uniformly among rectified linear unit (ReLU) and exponential linear unit (ELU)), batch size always 16

Long short-term memory recurrent neural network (LSTM): dimensionality of the output vector (logarithmic in \([2, 2^{8}]\), integral), whether to be bidirectional (chosen uniformly among unidirectional and bidirectional), dropout rate in front of the final layer (uniform in [0, 0.5]), whether to perform an embedding of the one-hot encoded nucleotides and the dimensionality if so (chosen with 1/3 chance to not perform an embedding, and with 2/3 chance to perform an embedding with dimensionality chosen uniformly in [1, 8]), batch size is always 16

CNN: number of parallel convolutional filters and their widths (chosen uniformly among not having a convolutional layer, 1 filter of width 1, 1 filter of width 2, 1 filter of width 3, 1 filter of width 4, 2 filters of widths {1, 2}, 3 filters of widths {1, 2, 3} and 4 filters of widths {1, 2, 3, 4}), convolutional dimension (uniform in [10, 249]), pooling layer width (uniform in [1, 3]), pooling layer computation (chosen uniformly among maximum, average and both), number of parallel locally connected layers and their widths (chosen uniformly among not having a locally connected layer, 1 filter of width 1, 1 filter of width 2 and 2 filters of widths {1, 2}), locally connected filter dimension (uniform in [1, 4]), number of fully connected layers and their dimensions (chosen uniformly among 1 layer with dimension uniform in [25, 74] and 2 layers each with dimension uniform in [25, 74]), whether to perform batch normalization in between the convolutional and pooling layers (uniform among yes and no), activation function (chosen uniformly among ReLU and ELU), dropout rate in front of the fully connected layers (uniform in [0, 0.5]), L2 regularization coefficient (lognormal with mean µ = −13, σ = 4), batch size (uniform in [32, 255]), learning rate (logarithmic in \([10^{-6}, 10^{-1}]\))

Similarly, for regression we explored multiple models (Supplementary Fig. 5b,c), each with a space of hyperparameters:

L1 linear regression: regularization strength (logarithmic in \([10^{-8}, 10^{8}]\))

L2 linear regression: regularization strength (logarithmic in \([10^{-8}, 10^{8}]\))

L1 + L2 linear regression (elastic net): regularization strength (logarithmic in \([10^{-8}, 10^{8}]\)), L1/L2 mixing ratio (\(1.0 - 2^{x} + 2^{-5}\) for x uniform in [−5, 0])

GBT: same hyperparameter space as for classification

RF: same hyperparameter space as for classification

MLP: same hyperparameter space as for classification

LSTM: same hyperparameter space as for classification

CNN: same hyperparameter space as for classification

The section ‘Model selection and evaluation’ describes the search process.
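Drawing one hyperparameter choice from spaces like those listed above can be sketched as follows. The helper names are hypothetical, ‘logarithmic’ is read as uniform in log space, and ‘integral’ is read as rounding to the nearest integer; both readings are assumptions.

```python
import math
import random

def log_uniform(rng, lo, hi):
    """Sample uniformly in log10 space between lo and hi."""
    return 10 ** rng.uniform(math.log10(lo), math.log10(hi))

def log_uniform_int(rng, lo, hi):
    """Log-uniform sample made integral (rounding is an assumption)."""
    return int(round(log_uniform(rng, lo, hi)))

def l1_ratio(rng):
    """L1/L2 mixing ratio 1.0 - 2^x + 2^-5 for x uniform in [-5, 0]."""
    x = rng.uniform(-5, 0)
    return 1.0 - 2 ** x + 2 ** -5

def sample_elastic_net_classifier(rng):
    """One draw from the elastic net logistic regression space."""
    return {
        "regularization_strength": log_uniform(rng, 1e-4, 1e4),
        "l1_ratio": l1_ratio(rng),
    }

rng = random.Random(0)
draws = [sample_elastic_net_classifier(rng) for _ in range(1000)]
tree_counts = [log_uniform_int(rng, 1, 2 ** 8) for _ in range(1000)]
```

The mixing-ratio transform maps x = −5 to 1 and x = 0 to 2^−5, so draws concentrate near 1 (mostly L1) while never reaching 0.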

When training and testing the models, we used a 28-nt guide and target sequence, and included 10 nt of context in the target sequence on each side of the protospacer. We tested the following different inputs:

‘One-hot (1D)’: vector containing 4 bits to encode the nucleotide at each target position and 4 bits similarly for each guide position; with a 28-nt guide and 10 nt of context in the target around the protospacer, there are (10 + 28 + 10 + 28) × 4 = 304 bits

‘One-hot MM’: similar to ‘One-hot (1D)’ except explicitly encoding mismatches between the guide and target—that is, vector containing 4 bits to encode the nucleotide at each target position and 4 bits, at each guide position, encoding whether there is a mismatch (if not, all 0) and, if so, the guide allele; same length as ‘One-hot (1D)’

‘Handcrafted’: features are count of each nucleotide in the guide, count of each dinucleotide in the guide, GC count in the guide, total number of mismatches between the guide and target sequence, and a one-hot encoding of the 2-nt PFS (coupling the 2 nucleotides); the number of features is 4 + 16 + 1 + 1 + 16 = 38

‘One-hot MM + Handcrafted’: concatenation of features from ‘One-hot MM’ and ‘Handcrafted’, except removing from ‘One-hot MM’ the bits encoding the 2-nt PFS because these are included in ‘Handcrafted’

We used these inputs for all models except the LSTM and CNN. For these two models, which can capture and extract spatial relationships in the input, we used an alternative input (labeled ‘One-hot (2D)’ in figures). Here, the input dimensionality is (48, 8) and consists of a concatenated one-hot encoding of the target and guide sequence. Namely, each element \(x_i\) (\(i \in \{1, \ldots, 48\}\)) is a vector \([x_{i,t}, x_{i,g}]\). Target context corresponds to \(i \in \{1, \ldots, 10\}\) (5′ end) and \(i \in \{39, \ldots, 48\}\) (3′ end); for these i, \(x_{i,t}\) is a one-hot encoding of the target sequence and \(x_{i,g}\) is all 0. The guide binds to the target at \(i \in \{11, \ldots, 38\}\) and, for these i, \(x_{i,t}\) is a one-hot encoding of the target sequence protospacer at position i − 10 of where the guide is designed to bind, while \(x_{i,g}\) is a one-hot encoding of the guide at position i − 10.
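The ‘One-hot (1D)’ and ‘One-hot (2D)’ encodings can be sketched as below. The nucleotide order (A, C, G, T), the representation of the guide in the same sense as the target, and the function names are assumptions made for illustration.

```python
ONEHOT = {"A": (1, 0, 0, 0), "C": (0, 1, 0, 0),
          "G": (0, 0, 1, 0), "T": (0, 0, 0, 1)}  # base order is assumed
ZERO = (0, 0, 0, 0)

def onehot_1d(target_with_context, guide):
    """Flat vector: 4 bits per target position, then 4 bits per guide position."""
    bits = []
    for base in target_with_context + guide:
        bits.extend(ONEHOT[base])
    return bits

def onehot_2d(target_with_context, guide, context=10):
    """One row per target position: 4 target bits followed by 4 guide bits.
    Guide bits are all 0 in the context flanking the protospacer."""
    assert len(target_with_context) == len(guide) + 2 * context
    rows = []
    for i, t in enumerate(target_with_context):
        j = i - context  # index into the guide, valid inside the protospacer
        g = ONEHOT[guide[j]] if 0 <= j < len(guide) else ZERO
        rows.append(ONEHOT[t] + g)
    return rows

guide = "ACGT" * 7                      # 28 nt
target = "A" * 10 + guide + "C" * 10    # 10 nt context on each side
x1d = onehot_1d(target, guide)
x2d = onehot_2d(target, guide)
```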

We evaluated all models, except the MLP, LSTM and CNN, using scikit-learn 0.22 (ref. 73). We implemented and evaluated the MLP, LSTM and CNN models in TensorFlow 2.1.0 (ref. 74).

For the MLP, LSTM and CNN models, we used binary cross-entropy as the loss function for classification and mean squared error for regression. For these three models, we used the Adam optimizer75 and performed early stopping during training (maximum of 1,000 epochs) with a held-out portion of the training data. Additionally, for the CNN we regularized the weights (L2). When training all classification models, we weighted the active and inactive classes equally.

Data splits and test set

When performing model cross-validation, we must determine folds of the data. Guides are tiled along the 865-nt wild-type target (Fig. 1a and Supplementary Fig. 3a) and their positions along the RNA target enable dividing guide–target pairs into two sets in which each set consists of cognate guide–target pairs that are unrelated to the pairs in the other set. During k-fold cross-validation, we split the positions of the guide–target pairs into k consecutive folds (positions are ordered, that is, not shuffled). For each fold, the validation set consists of guide–target pairs where the guide’s position is from the validation range, and the training set consists of guide–target pairs where the guide’s position is from the position ranges in the remaining k − 1 folds. Note that the validation set consists of guide–target pairs from one contiguous region of the 865-nt RNA targets, while the training set is not necessarily contiguous. With this strategy alone, guides between the training and validation sets may overlap according to the position against which they were designed along the wild-type target. Although effects on activity might be position-dependent within the guide, this overlap can cause guides to have similar sequence composition or to be in regions of the target sequence with similar structure. To remove this possibility of leakage across a data split, after making a split of X into \(X_{\mathrm{train}}\) and \(X_{\mathrm{validate}}\), we removed all guide–target pairs from \(X_{\mathrm{validate}}\) for which the guide has any overlap, in the target sequence it is designed to detect, with a guide in \(X_{\mathrm{train}}\). We performed this data splitting strategy during all cross-validated analyses, including for determining outer and inner folds of nested cross-validation.
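The position-based split with overlap removal can be sketched on guide start positions alone (each guide–target pair follows its guide). The function name and the way positions are divided into equal-sized consecutive folds are assumptions.

```python
def split_folds(guide_starts, k, guide_len=28):
    """Split ordered guide positions into k consecutive folds.

    Returns a list of (train_starts, validate_starts); validation guides that
    overlap any training guide are dropped to avoid leakage.
    """
    starts = sorted(set(guide_starts))
    fold_size = -(-len(starts) // k)  # ceiling division
    folds = []
    for f in range(k):
        val = starts[f * fold_size:(f + 1) * fold_size]
        val_set = set(val)
        train = [s for s in starts if s not in val_set]
        # Two guides of equal length overlap when their starts differ by
        # less than the guide length.
        val_clean = [v for v in val
                     if all(abs(v - s) >= guide_len for s in train)]
        folds.append((train, val_clean))
    return folds

# Toy tiling: 6 blocks of 30 nt with guide offsets 0, 4, 13 and 23.
starts = [b * 30 + o for b in range(6) for o in (0, 4, 13, 23)]
folds = split_folds(starts, k=3)
```

In the toy example, the first validation fold keeps only the guides starting at 0, 4, 13, 23 and 30; guides starting at 34, 43 and 53 reach into the training region and are removed.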

We also followed this strategy to choose a test set that we hold out from all analyses and use only for evaluating the final CNNs. This test set consists of the 30% of all guides (counted before removing overlaps between the test set and other data) that detect the 3′ end of the 865-nt targets.

Model selection and evaluation

We performed nested cross-validation to select models—both for classification and regression—and evaluate our selection of them (Fig. 2a and Supplementary Fig. 5). We used five outer folds of the data. For each outer fold, we searched for hyperparameters using a cross-validated (five inner folds) random search over the space defined in Model and input descriptions; we scored using the mean auROC (classification) or Spearman correlation (regression) over the inner folds. In each random search, we used 100 hyperparameter choices for all models, except for the LSTM and CNN models (50), which we found slower to train.

The CNN models outperformed others in the above analysis, so we selected a final CNN model for classification and another for regression. For each of classification and regression, we performed a random search across five folds of the data using 200 random samples. We selected the model with the highest auROC (classification) or Spearman correlation (regression) averaged over the folds. Our evaluations of these two models used the hold-out test set.

Incorporating into ADAPT

We integrated the CNN models into ADAPT. First, we set the decision threshold on the classifier’s output to be 0.577467. We chose the threshold, via cross-validation, to achieve a desired precision of 0.975. In particular, we took five folds of our data (excluding test data) and, for each fold, we calculated the threshold that achieves a precision of 0.975 on the validation data. Our decision threshold is the mean across the folds.
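Choosing a per-fold threshold and averaging across folds can be sketched as below. The rule used here (the lowest threshold whose precision on the validation data is at least the target) is an assumption, since the exact selection rule is not stated; the scores and labels are toy values.

```python
from statistics import mean

def threshold_for_precision(scores, labels, target=0.975):
    """Lowest threshold t such that calling score >= t 'active' reaches the
    target precision on the given validation data (None if unreachable)."""
    ranked = sorted(zip(scores, labels), reverse=True)
    tp = fp = 0
    best = None
    i = 0
    while i < len(ranked):
        t = ranked[i][0]
        # Take all points tied at this score before evaluating precision.
        while i < len(ranked) and ranked[i][0] == t:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        if tp / (tp + fp) >= target:
            best = t
    return best

fold_a = threshold_for_precision([0.9, 0.8, 0.7, 0.6, 0.5], [1, 1, 1, 0, 1])
fold_b = threshold_for_precision([0.95, 0.6, 0.5, 0.4], [1, 1, 1, 0])
decision_threshold = mean([fold_a, fold_b])
```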

We then defined a piecewise function, incorporating the classification and regression models, as:

$$d( {p,s} ) = \left\{ {\begin{array}{*{20}{c}} {0,} & {{\mathrm{if}}\,C( {p,s} ) < t} \\ {{\mathrm{max}}\left( {0,r + R( {p,s} )} \right),} & {\mathrm{else}} \end{array}} \right.$$

where d(p, s) is the predicted detection activity between a probe p and target sequence s (s includes 10 nt of context). C(p, s) is the output of the classifier, t is the classification decision threshold and R(p, s) is the output of the regression model. r is a shift that we add to regression outputs to ensure d(p, s) is non-negative; though a nice property, it is not strictly needed as long as we constrain the ground set as described in Supplementary Note 2a. The choice of r should depend on the range of activity values in the dataset; here, r = 4.
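The piecewise function translates directly into code. In this sketch the classifier and regression outputs are passed in as numbers rather than computed by the CNN models; the default t and r are the values given in the text.

```python
def detection_activity(classifier_output, regression_output,
                       t=0.577467, r=4.0):
    """Predicted detection activity d(p, s).

    classifier_output: C(p, s); regression_output: R(p, s).
    Returns 0 when the pair is classified inactive, otherwise the shifted
    regression output clipped at 0.
    """
    if classifier_output < t:
        return 0.0
    return max(0.0, r + regression_output)

inactive = detection_activity(0.30, -1.0)
active = detection_activity(0.90, -1.2801363)
clipped = detection_activity(0.90, -5.0)
```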

Comparison of predictions with independent Cas13a datasets

Supplementary Note 1 describes how we evaluated our model’s predictions using independent Cas13a datasets from refs. 36,37. When reporting P values for Spearman’s test and Pearson’s test (Extended Data Fig. 5), the alternative hypothesis is that the true correlation is not 0 (Pearson’s test uses a t-distribution). Pearson’s r (Extended Data Fig. 5) was calculated as a sample correlation coefficient between our model’s predicted values and paired, independently measured values.

ADAPT analyses

Comparing algorithms for submodular maximization

To compare the canonical greedy algorithm for constrained monotone submodular maximization43 with the fast randomized combinatorial algorithm42 (Supplementary Fig. 13), we ran ADAPT five times under each choice of parameter settings and species. For each run, we plotted the mean objective value taken across the best five design options. We used the arguments ‘-pm 3 -pp 0.9 --primer-gc-content-bounds 0.3 0.7 --max-primers-at-site 10 -gl 28 --max-target-len 250’ with our Cas13a activity model. We used the default objective function in ADAPT: 4 + A − 0.5 P − 0.25 L, where A is the objective value maximized by the submodular maximization algorithms, P is the number of primers and L is the target length.

Benchmarking comprehensiveness

To benchmark comprehensiveness (Fig. 3b,c and Supplementary Fig. 14), we ran ADAPT with three approaches. In all approaches, we decided that a probe detects a target sequence if and only if they are within one mismatch, counting G-U wobble pairs as matches, and used a sliding window of 200 nt and a probe length of 30 nt. We used bootstrapping to estimate uncertainty around plotted values owing to viral genome sampling: five times, we randomly sampled with replacement from all NCBI genome neighbors31 for each species (if there are N neighbors, we randomly sampled N with replacement) and used each of these resamplings as input to five runs. In the first approach (baselines), we used ADAPT’s design_naively.py program to select probes within each window via three strategies: (1) the consensus probe, computed at every site within the window, that detects the greatest number of genome sequences (‘consensus’); (2) the most common probe sequence, determined at every site within the window, that detects the greatest number of genome sequences (‘mode’); and (3) the n most common subsequences, with all n determined at each site in the window, choosing the n from the site where they collectively detect the greatest number of genome sequences (doing this separately for n ranging from 1 to 10). In the second approach, we maximized expected activity using ADAPT across the target sequences with different numbers of probes (hard constraints) using a penalty strength of 0 (that is, no soft constraint). Here, we defined the activity to be binary: 1 for detection, and 0 otherwise; this has the property that expected activity is equivalent to the fraction of sequences detected. In the third approach, we used the objective function in ADAPT that minimizes the number of probes subject to constraints on the fraction of sequences detected (specified via ‘-gp’; 0.9, 0.95 and 0.99).
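The detection rule and the resulting fraction of sequences detected can be sketched as follows. This is not ADAPT's implementation: the function names are hypothetical, and the wobble convention is an assumption in which the probe is written in the same sense as the target, so that a probe A opposite a target G, or a probe C opposite a target T, corresponds to a G-U pair in the synthesized probe–target duplex.

```python
def mismatches(probe, site, gu_wobble=True):
    """Count mismatches between a probe and an equal-length target site."""
    n = 0
    for p, t in zip(probe, site):
        if p == t:
            continue
        # Assumed wobble convention: probe A / target G and probe C / target T.
        if gu_wobble and (p, t) in (("A", "G"), ("C", "T")):
            continue
        n += 1
    return n

def detects(probe, seq, max_mismatches=1):
    """True if the probe is within max_mismatches of some site in seq."""
    L = len(probe)
    return any(mismatches(probe, seq[i:i + L]) <= max_mismatches
               for i in range(len(seq) - L + 1))

def fraction_detected(probes, seqs, max_mismatches=1):
    """With binary activity, this equals the expected activity."""
    hit = sum(any(detects(p, s, max_mismatches) for p in probes) for s in seqs)
    return hit / len(seqs)

seqs = ["ACGA", "GCGA", "ACTT", "TTTT"]
frac = fraction_detected(["ACGA"], seqs)
```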

Evaluating dispersion and generalization

We evaluated the dispersion, owing to randomness and sampling, in ADAPT’s designs (Supplementary Fig. 18). In all cases, we used all NCBI genome neighbors31 for each species and used the following arguments with ADAPT: ‘--obj maximize-activity --soft-guide-constraint 1 --hard-guide-constraint 5 --penalty-strength 0.25 -gl 28 -pl 30 -pm 3 -pp 0.98 --primer-gc-content-bounds 0.35 0.65 --max-primers-at-site 10 --max-target-length 500 --obj-fn-weights 0.50 0.25’, with a cluster threshold such that there is only one cluster, and used our Cas13a activity model. We ran ADAPT in two ways: 20 times without changing the input (output differences are owing to algorithmic randomness) and 20 times with resampled input (output differences are owing both to randomness and to sampling of the input sequences). Then, we measured dispersion by treating the five highest-ranked design options from each run as a set and computing pairwise Jaccard similarities across the 20 runs. This computation requires us to evaluate overlap between two sets: in one comparison, we consider a design option x to be in another set if x is present exactly in that other set (same primers and probes) and, in the other comparison, we consider a design option x to be in another set if that other set has some design option with both endpoints within 40 nt of x’s endpoints.
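The exact-match version of this dispersion measure can be sketched as below; design options are represented by arbitrary hashable identifiers (an assumption), and the relaxed comparison based on endpoints within 40 nt is not shown.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity between two sets of design options."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def pairwise_jaccard(runs):
    """Similarities for every unordered pair of runs."""
    return [jaccard(a, b) for a, b in combinations(runs, 2)]

# Toy example: three runs, each contributing its five best design options.
runs = [
    {"d1", "d2", "d3", "d4", "d5"},
    {"d1", "d2", "d3", "d4", "d6"},
    {"d1", "d2", "d7", "d8", "d9"},
]
similarities = pairwise_jaccard(runs)
n_pairs_for_20_runs = len(list(combinations(range(20), 2)))
```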

To evaluate the generalization of ADAPT’s designs (Fig. 4b), we performed cross-validation via repeated random subsampling. For each species, we took all NCBI genome neighbors31 and, 20 times, randomly selected 80% of them to use as input for design and the remaining 20% to test against. For each split, we used the same arguments with ADAPT as when evaluating dispersion: ‘--obj maximize-activity --soft-guide-constraint 1 --hard-guide-constraint 5 --penalty-strength 0.25 -gl 28 -pl 30 -pm 3 -pp 0.98 --primer-gc-content-bounds 0.35 0.65 --max-primers-at-site 10 --max-target-length 500 --obj-fn-weights 0.50 0.25’, with a cluster threshold such that there is only one cluster, and used our Cas13a activity model. When computing the fraction of sequences in the test set that are detected, we required the sequence to be detected by a primer on the 5′ and 3′ ends of a region (within three mismatches) and a probe (here, guide) to detect the region; we used the analyze_coverage.py program in ADAPT for this computation. We labeled detection of a sequence as ‘active’ if a guide in the guide set is decided by our Cas13a classification model to be active against the target. We labeled the detection as ‘highly active’ if a guide in the guide set is both decided to be active by the Cas13a classification model and its predicted activity, according to the Cas13a regression model, is ≥2.7198637 (4 added to the output of the model, −1.2801363). This threshold corresponds to the top 25% of predicted values on the subset of our hold-out test set that is classified as active.
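The subsampling and the labeling of detections can be sketched as follows. The function names, the seed and the layout of the per-guide predictions are assumptions; the thresholds are the values given in the text.

```python
import random

def random_splits(genomes, n_splits=20, design_frac=0.8, seed=0):
    """Repeated random subsampling: (design set, test set) for each split."""
    rng = random.Random(seed)
    n_design = round(design_frac * len(genomes))
    splits = []
    for _ in range(n_splits):
        shuffled = rng.sample(genomes, len(genomes))
        splits.append((shuffled[:n_design], shuffled[n_design:]))
    return splits

def label_detection(guide_predictions, t=0.577467, r=4.0, high=2.7198637):
    """guide_predictions: (classifier output, regression output) per guide in
    the guide set, against one target. Returns the strongest label reached."""
    active = [(c, reg) for c, reg in guide_predictions if c >= t]
    if any(r + reg >= high for _, reg in active):
        return "highly active"
    return "active" if active else "inactive"

splits = random_splits([f"genome_{i}" for i in range(100)])
labels = [
    label_detection([(0.9, -0.5), (0.2, -0.1)]),
    label_detection([(0.9, -2.0)]),
    label_detection([(0.2, -0.1)]),
]
```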

Using the same cross-validation strategy, we also evaluated generalization except with relaxed settings on constraints for the number of guides and more stringent settings on primer coverage (Supplementary Fig. 19): ‘--obj maximize-activity --soft-guide-constraint 3 --hard-guide-constraint 10 --penalty-strength 0.05 -gl 28 -pl 30 -pm 3 -pp 0.995 --primer-gc-content-bounds 0.20 0.80 --max-primers-at-site 15 --max-target-length 1000 --obj-fn-weights 0.30 0.05’. These settings allow for more complex assay designs (for example, more guides and primers) to enable a higher sensitivity. Additionally, when deciding detection of the held-out genomes in this analysis, we adjusted thresholds to allow a higher sensitivity with lower precision: we allowed four mismatches for primers (instead of three) and lowered the decision threshold of our Cas13a classification model to 0.3 (instead of 0.577467).

Benchmarking trie-based specificity queries

We benchmarked the approach described in Supplementary Note 3d against a single, large trie (Supplementary Fig. 16). For this, we sampled 1.28% of all 28-mers from 570 viral species (~78.7 million 28-mers in total), and built data structures indexing these. We then randomly selected 100 species (here, counting each segment of a segmented genome as a separate species), and queried 100 randomly selected 28-mers from each of these for hits against the other 569 species. We performed this for varying choices of mismatches. We used the same approach to generate results in Supplementary Fig. 15, there comparing queries with and without tolerance of G-U base pairing.

Benchmarking runtime improvement with memoization

We benchmarked the effect on runtime of memoizing repeated computations (Supplementary Fig. 17), as described in Supplementary Note 4b. We used all genome neighbors from NCBI’s viral genomes resource31 as input for each of the three species tested. To run ADAPT while memoizing computations, we used the arguments: ‘--obj maximize-activity --soft-guide-constraint 1 --hard-guide-constraint 5 --penalty-strength 0.25 --maximization-algorithm random-greedy -pm 3 -pp 0.9 --primer-gc-content-bounds 0.3 0.7 --max-primers-at-site 10 -gl 28 --max-target-len 250 --best-n-targets 10 --id-m 4 --id-frac 0.01 --id-method shard’. We also used our Cas13a predictive model and enforced specificity against all other species within each species’ family. To perform runs without memoizing computations, we did the same except added the argument ‘--do-not-memoize-guide-computations’, which skips all memoization steps during ADAPT’s search (except for calls to the predictive model).

Design of broadly effective SARS-related CoV assays in 2018 and their evaluation

To evaluate the efficacy of species-level assays on a novel virus (Supplementary Fig. 21), we focused on SARS-related CoV. We simulated the 2018 design of assays for detecting the SARS-related CoV species, roughly a year before the initial detection of SARS-CoV-2. In particular, we used as input all genome neighbors from NCBI’s viral genomes resource31 for SARS-related CoV that were released on or before 31 December 2018 (there are 311 genomes). For ADAPT’s designs, we used the same parameters used for the vertebrate-infecting viral species designs (‘Designs across vertebrate-infecting species’), except tolerating up to one mismatch between primer and target sequences; the specificity criteria were also the same as in those designs.

In 2018, the available SARS-related CoV genomes were biased toward SARS-CoV-1 (owing to SARS outbreak sequencing) relative to viruses sampled from animals. To alleviate this overrepresentation, we also produced designs using ADAPT in which the input downsampled SARS-CoV-1 to a single genome (Supplementary Fig. 21c,d). We used the RefSeq genome (GenBank accession AY274119) as that genome.

To determine the performance of these designs on SARS-CoV-2, we used the 184,197 complete genomes (low-quality removed) available on GISAID58 as of 12 November 2020. For an assay to be predicted to detect a target sequence (Supplementary Fig. 21b,d), we required that (1) primers on both ends are within three mismatches of the target sequence; and (2) a guide in the guide set is classified by our Cas13a predictive model as active. We used these criteria for evaluating detection of SARS-CoV-2 and of the design’s input.

Designs across vertebrate-infecting species

We found all viral species in NCBI’s viral genomes resource31 that have a vertebrate as a host, as of April 2020. These are species ratified by the International Committee on Taxonomy of Viruses76. We added to this list others that may have been incorrectly labeled, as well as influenza viruses, which are separate from the resource. There were 1,933 species in total and we used ADAPT to design primers and Cas13a guides to detect them. As input, we used all genome neighbors from NCBI’s viral genomes resource31 (influenza database for influenza species71). We ran ADAPT in May–June 2020, and thus the input incorporates sequences available through those dates.

We constrained primers to have a length and GC content that are recommended for use with RPA77 (recombinase polymerase amplification), and thus are suitable for use with SHERLOCK1 (Specific High-Sensitivity Enzymatic Reporter UnLOCKing) detection. We enforced specificity at the species-level within each family. That is, we required that the guides for each species not have off-target hits to sequence from any other species in its same family. Restricting our specificity queries to one family at a time reduces ADAPT’s memory usage and runtime.

We used the following arguments when running ADAPT to maximize expected activity:

Initial clustering: clustered with a maximum distance of 30% (‘--cluster-threshold 0.3’)

Primers and amplicons: primer length of 30, primers must have GC content between 35% and 65%, at most 10 primers at a site (although high, this is only an upper bound and is meant to restrict the search space and thus restrict runtime), up to 3 mismatches between primers and target sequence for hybridization, primers must hybridize to ≥98% of sequences and length of a targeted genome region (amplicon) must be ≤250 nt (‘-pl 30 --primer-gc-content-bounds 0.35 0.65 --max-primers-at-site 10 -pm 3 -pp 0.98 --max-target-length 250’)

Guides: Cas13a guide length of 28 nt, together with our Cas13a predictive model (‘-gl 28 --predict-activity-model-path models/classify/model-51373185 models/regress/model-f8b6fd5d’)

Guide activity objective: soft constraint of 1 guide, hard constraint of 5 guides, guide penalty (λ) of 0.25, using the randomized greedy algorithm (‘--obj maximize-activity --soft-guide-constraint 1 --hard-guide-constraint 5 --penalty-strength 0.25 --maximization-algorithm random-greedy’)

Specificity: query up to 4 mismatches counting G-U pairs as matches, calling a guide non-specific if it hits ≥1% of sequences in another taxon (‘--id-method shard --id-m 4 --id-frac 0.01’)

Objective function and search: weights λ_A = 0.5 and λ_L = 0.25 in the objective function (defined in Supplementary Note 4b) and finding the best 20 design options (‘--obj-fn-weights 0.5 0.25 --best-n-targets 20’)

We made some species-specific adjustments. For influenza A and dengue viruses, two especially diverse species, we decreased the number of tolerated primer mismatches to two and allowed at most five primers at a site (‘-pm 2 --max-primers-at-site 5’); while these further constrain the design, they decrease runtime. For Norwalk virus and Rhinovirus C, we relaxed the number of primers at a site and the maximum region length to identify designs (‘--max-primers-at-site 20 --max-target-length 500’). For Cervid alphaherpesvirus 2, which has a short genome, we changed the GC-content bounds on primers to be 20–80% (‘--primer-gc-content-bounds 0.2 0.8’) to allow more potential amplicons. For 42 species, we relaxed specificity constraints to identify designs (list and details in code).

Of the 1,933 species, seven could not produce a design while maximizing activity and enforcing specificity, even with species-specific adjustments. They are: Bat mastadenovirus, Bovine associated cyclovirus 1, Chiropteran bocaparvovirus 4, Cyclovirus PKgoat21/PAK/2009, Finkel–Biskis–Jinkins murine sarcoma virus, Panine gammaherpesvirus 1 and Squirrel fibroma virus. Each of these seven species has just one genome sequence and ADAPT could not identify a guide set satisfying specificity constraints; it is possible they are misclassified or have very high genetic similarity to other species. When showing results for this objective, we report on 1,926 species.

In addition to using the above settings, which maximize activity and enforce specificity, we ran ADAPT with three other approaches. We minimized the number of guides while enforcing specificity, requiring that guides be predicted to be highly active (as defined in ‘Evaluating dispersion and generalization’) in detecting 98% of sequences. We also ran the objectives to maximize activity and minimize guides without enforcing specificity. In total, 67 of the 1,933 species did not yield a design when minimizing the number of guides and enforcing specificity, owing to the constraints with this objective: ADAPT could not identify a guide set that is predicted to be highly active and achieves the desired coverage and specificity.

For species with segmented genomes, we ran ADAPT and produced designs separately for each segment. We then selected the segment whose highest-ranked design option has the best objective value (if multiple clusters, according to the largest cluster). We expect the selected segment to generally be the most conserved one.

In all analyses showing results of the designs (for example, number of guides, guide activity and target length), we used the highest-ranked design option output by ADAPT. For the species with more than one cluster, we report the mean across clusters from the highest-ranked design option in each cluster.

For producing designs across vertebrate-infecting viral species, we ran ADAPT on Amazon Web Services using the ‘x1.16xlarge’ instance type. We ran ADAPT in parallel across multiple species to fully use the instance’s resources. We evaluated ADAPT’s computational requirements, namely the runtime and memory usage, as part of these runs on that instance type.

Designs for evaluating sensitivity and specificity

ADAPT design parameters

To generate designs with ADAPT for experimental testing, we used the following arguments unless otherwise noted:

Initial clustering: force a single cluster (‘--cluster-threshold 1.0’)

Primers and amplicons: primer length of 30, primers must have GC content between 35% and 65%, at most 5 primers at a site, up to 1 mismatch between primers and target sequence for hybridization, primers must hybridize to ≥98% of sequences and length of a targeted genome region (amplicon) must be ≤250 nt (‘-pl 30 --primer-gc-content-bounds 0.35 0.65 --max-primers-at-site 5 -pm 1 -pp 0.98 --max-target-length 250’)

Guides: Cas13a guide length of 28 nt, together with our Cas13a predictive model (‘-gl 28 --predict-activity-model-path models/classify/model-51373185 models/regress/model-f8b6fd5d’)

Guide activity objective: soft constraint of 1 guide, hard constraint of 5 guides, guide penalty (λ) of 0.25, using the randomized greedy algorithm (‘--obj maximize-activity --soft-guide-constraint 1 --hard-guide-constraint 5 --penalty-strength 0.25 --maximization-algorithm random-greedy’)

Specificity: query up to 4 mismatches counting G-U pairs as matches, calling a guide non-specific if it hits ≥1% of sequences in another taxon (‘--id-method shard --id-m 4 --id-frac 0.01’)

Objective function and search: weights λ_A = 0.5 and λ_L = 0.25 in the objective function (defined in Supplementary Note 4b) (‘--obj-fn-weights 0.5 0.25’)

For SARS-CoV-2 input sequences, we used the 9,054 complete genomes available on GISAID58 as of 28 April 2020. We also used genomes from GISAID for pangolin SARS-like CoV input sequences (isolates from Guangxi, China and Guangdong, China). For all other input sequences—SARS-like CoV isolates RaTG13, ZC45 and ZXC21; other SARS-like CoVs; SARS-CoV-1 (also referred to as SARS-CoV); and other Coronaviridae species—we used all genome neighbors from NCBI from each taxon31. For input sequences to EVB designs, we also used genome neighbors from NCBI.

Generating test target sequences

Experimentally testing design options output by ADAPT also requires generating representative target sequences. We found representative sequences for a design option, using a collection of genomes spanning diversity of a taxon, as follows: (1) We extracted the amplicon (according to provided positions, for example, from primer sequences), while extending outward to achieve a minimum length (usually 500 nt). (2) We removed sequences that are too short, for example, owing to gaps in the alignment. (3) We computed pairwise Mash distances78 and performed hierarchical clustering (average linkage) to achieve a desired number of clusters or a maximum intercluster distance. (4) To avoid outliers, we greedily selected (in order of descending size) clusters that include a desired total fraction of sequences, or a particular number of targets, or ones representing particular taxa (specifics below). (5) We computed the medoid of each cluster—that is, the sequence with minimal total distance to all other sequences in the cluster. (6) We used the medoids of each of the clusters as representative target sequences. The pick_test_targets.py program in ADAPT implements the procedure and we used this program.
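Steps (4) to (6) of this procedure can be sketched as below. This is an illustrative sketch only and is not the code of pick_test_targets.py: the function names and the `min_frac` parameter are hypothetical, the clusters are assumed to have been computed already (steps 1 to 3), and `dist` stands in for a pairwise Mash distance supplied by the caller.

```python
def medoid(cluster, dist):
    """Return the member with minimal total distance to all other members."""
    return min(cluster, key=lambda a: sum(dist(a, b) for b in cluster))

def pick_representatives(clusters, dist, min_frac=0.9):
    """Greedily take clusters in descending size until they include at least
    min_frac of all sequences, then return the medoid of each chosen cluster.
    """
    total = sum(len(c) for c in clusters)
    chosen, covered = [], 0
    for cluster in sorted(clusters, key=len, reverse=True):
        if covered >= min_frac * total:
            break  # remaining (smaller) clusters are treated as outliers
        chosen.append(cluster)
        covered += len(cluster)
    return [medoid(c, dist) for c in chosen]
```

The sketch covers only the "desired total fraction of sequences" criterion; selecting a particular number of targets or particular taxa would replace the stopping condition.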

Baseline distribution of activity

We established a baseline distribution of activity using Cas13a guides, to detect SARS-CoV-2, selected from the genomic regions targeted by the US CDC’s RT–qPCR assays79. In particular, we picked ten random 28-mers from the US CDC’s N1 amplicon that have a non-G PFS and used these as Cas13a guides, according to the ‘hCoV-19/Wuhan/IVDC-HB-01/2019’ genome58. We also chose another Cas13a guide at the site of the TaqMan probe with a non-G PFS. We did the same from the US CDC N2 amplicon. In addition, in the N1 and N2 amplicons, we used ADAPT to design a single guide with maximal activity (ignoring specificity) from within the amplicon. This provides 24 guides in total.

Experimental designs with ADAPT

To evaluate the activity and lineage-level specificity of SARS-CoV-2 designs, we used ADAPT to produce ten design options for detecting SARS-CoV-2. We increased the specificity in ADAPT to call a guide non-specific if it hits any sequence outside SARS-CoV-2 and also used the greedy maximization to obtain more intuitive outputs because, in this case, we expect only a single Cas13a guide for each design option (‘--id-frac 0 --maximization-algorithm greedy’). We enforced specificity to not detect any sequences outside of SARS-CoV-2 from the SARS-related CoV species (including related bat and pangolin coronavirus isolates) and also to not detect sequences from the other 43 species in the Coronaviridae family. Owing to experimental constraints, we tested the highest-ranked five. We generated targets for each design option against which to test, using the ones representative of SARS-CoV-2; pangolin SARS-like CoVs (isolates from Guangxi, China); bat SARS-like CoV isolates ZC45 and RaTG13; and SARS-CoV-1.

To further evaluate activity and subspecies-comprehensiveness, we used ADAPT to produce ten design options for detecting the SARS-CoV-2-related taxon. In referring to SARS-CoV-2-related, we use the definition given in Fig. 1b of ref. 48; it encompasses SARS-CoV-2 and several related bat and pangolin SARS-like coronaviruses. To correct for sampling biases, we used ten sampled SARS-CoV-2 genomes as input so that they make up roughly half of sequences in the SARS-CoV-2-related taxon. We used the same adjusted arguments in ADAPT as used for the SARS-CoV-2 designs (‘--id-frac 0 --maximization-algorithm greedy’). We enforced specificity to not detect any sequences outside of SARS-CoV-2-related from the SARS-related CoV species (including other bat SARS-like coronaviruses) and also to not detect sequences from the other 43 species in the Coronaviridae family. For each design option, we generated targets, and used the ones representative of SARS-CoV-2; pangolin SARS-like CoVs (isolates from Guangxi, China and Guangdong, China); bat SARS-like CoV isolates ZC45, ZXC21 and RaTG13; and SARS-CoV-1. For this experiment, the SARS-CoV-1 target allows us to evaluate specificity, while the others allow us to evaluate activity and subspecies-comprehensiveness.

We used ADAPT to produce ten design options to detect the SARS-related CoV species, and we used these to evaluate activity, species-comprehensiveness and specificity. To correct for sampling biases, we used 300 sampled SARS-CoV-2 genomes as input so that they make up roughly half of sequences in the species. We enforced specificity to not detect sequences from the other 43 species in the Coronaviridae family. For each design option, we generated representative targets that encompass SARS-CoV-2, SARS-CoV-1, bat SARS-like CoVs, pangolin SARS-like CoVs, MERS-CoV, Human coronavirus OC43 and Human coronavirus HKU1. There were eight or nine representative targets in total for each design option.

To evaluate species-comprehensiveness, we focused on EVB and used ADAPT to produce ten design options. Owing to its extensive diversity, we made several adjustments to arguments, which help to increase the space of potential design options (‘--primer-gc-content-bounds 0.30 0.70 -pm 4 -pp 0.80 --max-primers-at-site 10 --id-frac 0.10 --penalty-strength 0.15’).

We enforced specificity to not detect the 18 other species in the Enterovirus genus. For each design option, we generated representative targets from clusters that encompass at least 90% of all sequences. There were between 1 and 15 targets for each design option (the precise number depends heavily on the location of the design option amplicon in the genome). We additionally tested specificity within the Enterovirus genus by generating a single representative target for each of Enterovirus A, Enterovirus C and Enterovirus D.

To benchmark ADAPT’s designs for EVB, we created baseline Cas13a guides using an entropy-based approach that identifies conserved sites. For each of ADAPT’s design options, we considered the amplicon it targets. Then, we computed the information-theoretic (Shannon) entropy, over alleles, at every site in the amplicon. (We counted an ambiguous base fractionally and a gap as a ‘base’.) We define the average entropy of a 28-nt site to be the mean entropy across its 28 positions. The approach finds the site in the amplicon that has the minimal average entropy and an active (non-G) PFS in GenBank accession MK800120. Our entropy-based baseline guide is the sequence from GenBank accession MK800120 at this site. We performed this process in the amplicon from each of ADAPT’s designs to generate and test one baseline guide; for five of the ten designs, we generated and tested two baseline guides, where the second was from the site with the second lowest entropy and an active PFS. The approach is implemented in ADAPT’s design_naively.py program.
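The entropy calculation behind this baseline can be sketched as follows. This is a simplified illustration rather than the code in design_naively.py: the function names are hypothetical, entropy is reported in bits (the base of the logarithm does not change which site is minimal), a gap is counted as an allele, ambiguous bases are counted as ordinary symbols rather than fractionally, and the PFS filter is omitted.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (bits) over the alleles in one alignment column."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def lowest_entropy_site(alignment, site_len=28):
    """Return (start, average entropy) of the window of site_len positions
    whose mean per-position entropy is minimal; ties keep the leftmost window.
    """
    length = len(alignment[0])
    entropies = [column_entropy([seq[i] for seq in alignment])
                 for i in range(length)]
    best = None
    for start in range(length - site_len + 1):
        avg = sum(entropies[start:start + site_len]) / site_len
        if best is None or avg < best[1]:
            best = (start, avg)
    return best
```

Here `alignment` is a list of equal-length aligned amplicon sequences; the baseline guide would then be read from the reference sequence at the returned start position, subject to the PFS requirement described above.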

We built a positive control into each target. In particular, we added the sequence 5′-CACTATAGGGGCTCTAGCGACTTCTTTAAATAGTGGCTTAAAATAAC-3′ to the 5′ end of each target and included in our tests of every target a guide with protospacer sequence 5′-GCTCTAGCGACTTCTTTAAATAGTGGCT-3′.

Experiments evaluating sensitivity and specificity

Experimental procedure

We largely followed the CARMEN-Cas13 platform8 for experimentally validating ADAPT’s designs, with some key differences. DNA targets were ordered from Integrated DNA Technologies and in vitro transcribed using the HiScribe T7 High Yield RNA Synthesis Kit (New England Biolabs). Transcriptions were performed according to the manufacturer’s recommendations with a reaction volume of 20 µl that was incubated overnight at 37 °C. The transcribed RNA products were purified using RNAClean XP beads (Beckman Coulter) and quantified using NanoDrop One (Thermo Scientific). The RNA was serially diluted from 10¹¹ to 10⁴ copies per µl and used as input into the detection reaction. crRNAs were synthesized by Integrated DNA Technologies, resuspended in nuclease-free water and diluted to 1 µM for input into the detection reaction. The Cas13 detection reactions were made into two separate mixes for loading onto a 192.24 Dynamic Array integrated fluidic circuit (IFC) for Gene Expression (Fluidigm). The assay mix contained 42.5 nM LwaCas13a, 42.5 nM crRNA, 2× Assay Loading Reagent (Fluidigm) and nuclease-free water. The sample mix contained 1 µl of RNAse Inhibitor (New England Biolabs), 1× ROX Reference Dye (Invitrogen), 1× GE Sample Loading Reagent (Fluidigm), 1.95 nM quenched synthetic fluorescent RNA reporter (FAM/rUrUrUrUrUrUrU/3IABkFQ/, Integrated DNA Technologies) and 9 nM MgCl2 in a nuclease assay buffer (40 mM Tris-HCl, 1 mM dithiothreitol pH 7.5). Syringe, Actuation Fluid, Pressure Fluid (Fluidigm) and 4 µl of assay and sample mixtures were loaded into their respective locations on a 192.24 IFC according to the manufacturer’s instructions. The IFC was loaded onto the IFC Controller RX (Fluidigm) where the ‘Load Mix’ script was run. After proper IFC loading, images over a 2-h period were collected using a custom protocol on Fluidigm’s Biomark HD.

Displaying experimental results

We plotted reference-normalized background-subtracted fluorescence for guide–target pairs. For a guide–target pair (at some time point t and target concentration), we first computed the reference-normalized value as

$$\mathrm{median}\left( \frac{P_t - P_0}{R_t - R_0} \right)$$

where P_t is the guide signal (FAM) at the time point, P_0 is its background measurement before the reaction, R_t is the reference signal (ROX) at the time point, R_0 is its background measurement and the median is taken across Fluidigm’s replicates. We performed the same calculation for the no-template (water) control of the guide, providing a background fluorescence value for the guide at t (when there were multiple technical replicates of such controls, we took the mean value across them). The reference-normalized background-subtracted fluorescence for a guide–target pair is the difference between these two values. Note that, by definition, plotted values greater than 0 represent fluorescence that exceeds background and the no-template control (‘NC’ in figures) has value of 0. When plotting the no-template control separately (Supplementary Fig. 23), we show reference-normalized values without background-subtracting. In heatmaps showing fluorescence at a fixed time point, we used the middle time point (59 min). In kinetic curves that show fluorescence over time (for example, Fig. 5b), we smoothed the value by taking the rolling mean within a window of two time points.
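The calculation above can be written compactly as in the following sketch. It is illustrative only: the function names and the tuple layout for replicate measurements are assumptions, and the treatment of the first time point in the rolling mean (it is dropped here) is a choice made for the example, since the text does not specify edge handling.

```python
from statistics import mean, median

def reference_normalized(replicates):
    """Median over replicates of (P_t - P_0) / (R_t - R_0).

    replicates: iterable of (P_t, P_0, R_t, R_0) tuples for one guide-target
    pair at one time point and target concentration.
    """
    return median((pt - p0) / (rt - r0) for pt, p0, rt, r0 in replicates)

def background_subtracted(target_replicates, control_replicate_sets):
    """Reference-normalized value for the target minus the mean of the
    reference-normalized values of the guide's no-template controls."""
    control = mean(reference_normalized(r) for r in control_replicate_sets)
    return reference_normalized(target_replicates) - control

def rolling_mean(values, window=2):
    """Rolling mean over consecutive time points (first window-1 points dropped)."""
    return [mean(values[i - window + 1:i + 1])
            for i in range(window - 1, len(values))]
```

With this definition, a no-template control compared against itself yields 0, consistent with the note above that the control has a plotted value of 0.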

When displaying the top-ranked design options from ADAPT (for example, in Fig. 5d–h), we ordered them according to the predicted activity of the Cas13a guides in expectation across the input genomes. ADAPT’s ranking incorporates additional factors (Supplementary Note 4b) that reflect amplification potential, and we used ADAPT’s objective function to identify the top N design options to test. But we ordered them according to only predicted fluorescent activity because our experimental testing did not involve amplification. When plotting fluorescence for a design that uses more than one guide, we plot the maximum fluorescence across the guides (computed separately at each target, target concentration and measurement time point). This is analogous to ADAPT’s model for measuring a probe set’s activity (Supplementary Note 2a), in which its activity in detecting a target sequence equals that of the best probe in the set for detecting that sequence.
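For designs with more than one guide, the per-condition maximum can be sketched as below. The data layout (a dictionary per guide, keyed by target, concentration and time point) and the function name are hypothetical and chosen only for illustration.

```python
def design_fluorescence(per_guide):
    """Combine guides in a design by taking, for every
    (target, concentration, time point) key, the maximum fluorescence
    across the guides that have a measurement for that key.

    per_guide: dict mapping guide name -> dict mapping
               (target, concentration, time) -> fluorescence.
    """
    combined = {}
    for measurements in per_guide.values():
        for key, value in measurements.items():
            if key not in combined or value > combined[key]:
                combined[key] = value
    return combined
```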

Evaluating specificity against non-viral taxa

We performed an in silico comparison of all experimentally tested guides with human transcripts and bacterial pathogens to determine if there is potential cross-reactivity. We first built an index consisting of human transcript sequences from GENCODE v.38 (ref. 80) and NCBI reference genome sequences for 11 bacterial pathogens (Bordetella pertussis (NC_018518.1); Chlamydia pneumoniae (NC_005043.1); Haemophilus influenzae (NZ_CP009610.1); Legionella pneumophila (NZ_CP013742.1); Mycobacterium tuberculosis (NC_000962.3); Mycoplasma pneumoniae (NZ_CP010546.1); Pseudomonas aeruginosa (NC_002516.2); Staphylococcus epidermidis (NZ_CP035288.1); Streptococcus pneumoniae (NZ_CP046357.1); Streptococcus pyogenes (NZ_CP010450.1); Streptococcus salivarius (NZ_CP066093.1)). We also included, as positive controls for the analysis, NCBI reference sequence genomes for SARS-CoV-1 (NC_004718.3) and SARS-CoV-2 (NC_045512.2).

We sought to query guide sequences against this index while tolerating multiple mismatches over a short query length (that is, the guide length of 28 nt). To enable this, we used Bowtie 2 (ref. 81) to align guide sequences to the index with the parameters ‘-a --end-to-end -N 1 -L 7 -i S,1,1 --ma 0 --mp 1,1 --rdg 100,1 --rfg 100,1 --score-min L,-4,0’. These settings permit us to identify all alignments of guides, against our index, having four or fewer mismatches across the length of the guide without tolerating gaps. Such alignments represent potential non-specificity of the guides. Of all guides in our experimental testing, the only identified non-specificity was for one guide from the entropy-based strategy for benchmarking EVB detection (Design no. 8; four mismatches from a human transcript); thus, with this exception, all guides are at least five mismatches different from human transcripts and the included bacterial genomes.
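The reasoning behind the four-mismatch limit can be checked with a short worked example. The sketch below reflects our reading of Bowtie 2's documented end-to-end scoring (a mismatch costs the ‘--mp’ penalty, a gap of length N costs open + N × extend, and ‘--score-min L,-4,0’ sets the minimum score to −4 + 0 × read length); the function names are hypothetical and this is not part of Bowtie 2 or ADAPT.

```python
MISMATCH_PENALTY = 1            # --mp 1,1 (maximum and minimum penalty both 1)
GAP_OPEN, GAP_EXTEND = 100, 1   # --rdg 100,1 and --rfg 100,1
MIN_SCORE = -4                  # --score-min L,-4,0 -> -4 + 0 * read_length

def alignment_score(n_mismatches, gap_lengths=()):
    """End-to-end alignment score; matches contribute 0 (--ma 0)."""
    score = -MISMATCH_PENALTY * n_mismatches
    for length in gap_lengths:
        score -= GAP_OPEN + GAP_EXTEND * length
    return score

def passes_score_min(n_mismatches, gap_lengths=()):
    """True if the alignment meets the minimum-score threshold."""
    return alignment_score(n_mismatches, gap_lengths) >= MIN_SCORE
```

Under these settings an alignment with four mismatches scores −4 and passes, one with five mismatches scores −5 and fails, and any single gap costs at least 101, so gapped alignments never pass.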

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.