Group testing via hypergraph factorization applied to COVID-19

Large scale screening is a critical tool in the life sciences, but is often limited by reagents, samples, or cost. An important recent example is the challenge of achieving widespread COVID-19 testing in the face of substantial resource constraints. To tackle this challenge, screening methods must efficiently use testing resources. However, given the global nature of the pandemic, they must also be simple (to aid implementation) and flexible (to be tailored for each setting). Here we propose HYPER, a group testing method based on hypergraph factorization. We provide theoretical characterizations under a general statistical model, and carefully evaluate HYPER with alternatives proposed for COVID-19 under realistic simulations of epidemic spread and viral kinetics. We find that HYPER matches or outperforms the alternatives across a broad range of testing-constrained environments, while also being simpler and more flexible. We provide an online tool to aid lab implementation: http://hyper.covid19-analysis.org.


S1 Maximally balanced designs
We seek designs that are maximally balanced in three ways: (1) all individuals are assigned to the same number q of pools; (2) all pool sizes are as even as possible; (3) all pool combinations are used as evenly as possible. To illustrate the third way: in the imbalanced design (consecutive pooling), pool combinations AB, CD and EF are each used twice but none of the other combinations (e.g., AC) are used.
Formally, maximally balanced designs can be defined as follows.
Definition 1: Let P ∈ {0, 1}^{m×n} denote a pooling design with n individuals and m pools, where P_{ij} = 1 means that pool i contains individual j and P_{ij} = 0 otherwise. We say P is maximally balanced if for some q: i) q_j = q for all j; ii) max_i k_i − min_i k_i = 0 if n is a multiple of m/q, and 1 otherwise; iii) max_{π∈Π_q} σ_π − min_{π∈Π_q} σ_π = 0 if n is a multiple of $\binom{m}{q}$, and 1 otherwise; where q_j := Σ_{i=1}^m P_{ij} is the number of pools individual j is assigned to, k_i := Σ_{j=1}^n P_{ij} is the size of pool i, σ_π is the number of individuals assigned to the pool combination π, and Π_q := {π ⊆ {1, . . . , m} : |π| = q} is the set of all possible combinations of q pools chosen from {1, . . . , m}.
We say that maximally balanced pools are furthermore perfectly balanced when every pool is assigned exactly the same number of times, i.e., not just as evenly as possible. Likewise, we say that the pool combinations are perfectly balanced when every combination is assigned exactly the same number of times.
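To make Definition 1 concrete, here is a minimal sketch (our own illustration, not the paper's code) of a checker for the three balance conditions applied to a 0/1 design matrix P:

```python
from itertools import combinations
from math import comb
import numpy as np

def is_maximally_balanced(P, q):
    """Check Definition 1 for an m x n 0/1 design matrix P (q must divide m)."""
    m, n = P.shape
    if not np.all(P.sum(axis=0) == q):            # (i) each individual in exactly q pools
        return False
    k = P.sum(axis=1)                             # pool sizes k_i
    k_gap = 0 if n % (m // q) == 0 else 1         # (ii) pool sizes as even as possible
    if k.max() - k.min() > k_gap:
        return False
    counts = {pi: 0 for pi in combinations(range(m), q)}
    for j in range(n):                            # (iii) combination counts sigma_pi
        counts[tuple(np.flatnonzero(P[:, j]))] += 1
    s = list(counts.values())
    s_gap = 0 if n % comb(m, q) == 0 else 1
    return max(s) - min(s) <= s_gap
```

For example, the consecutive-pooling design above passes conditions (i) and (ii) but fails (iii), since AB, CD and EF are each used twice while other combinations go unused.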

S2 On the nontriviality of developing maximally balanced designs

For q = 1, which corresponds to classical Dorfman testing, each individual needs to be assigned to a single pool. Balance is easily achieved here by cycling through the pools until all individuals are assigned. For example, to assign n = 8 individuals to m = 6 pools (A-F), assign individual 1 to pool A, individual 2 to B, and so on, yielding the following assignments:

Individual: 1 2 3 4 5 6 7 8
Pool:       A B C D E F A B

All n = 8 individuals are assigned to q = 1 pool, the m = 6 pools are assigned as evenly as possible (A and B are assigned twice and the rest once), and pool combinations are just the pools themselves when q = 1.
However, achieving maximal balance for q > 1 turns out to be highly nontrivial in general. Consider the straightforward approach of listing the pools and assigning individuals to consecutive pairs (e.g., AB, CD, EF, AB, ...). This design produces balanced pools but under-utilizes the combinatorial space (e.g., it never uses AC) and is inefficient. Alternatively, one might cycle through all possible pool combinations in their natural lexicographic order (i.e., AB, AC, AD, AE, ...). This design produces balanced pool combinations but now has imbalanced pools, which dilute the individuals unevenly and place some at a higher risk of false negatives than others. For instance, using all pairs in lexicographic order for n = 8 individuals assigned to q = 2 of m = 6 pools yields

Individual: 1  2  3  4  5  6  7  8
Pools:      AB AC AD AE AF BC BD BE

which assigns A five times but F only once. As a result, this design is not uniformly sensitive across individuals. For example, individual 1 undergoes a five-fold dilution in pool A and a four-fold dilution in pool B, while individual 5 is diluted in pool A but not at all in pool F. Uneven dilution of viral loads can lead to differential false negative test results, leaving individual 1 with a higher risk of false negatives than individual 5. Clearly, alternative orderings could be more balanced; the challenge is how to systematically identify them.
One approach is to generate random designs such as random assignment 1 , which assigns each individual to q of m pools chosen uniformly at random. This design treats both pools and pool combinations uniformly on average, but any given random draw may assign them very unevenly in practice, especially when m, n, and q are not very large. For the example above, this approach draws one of the $\binom{m}{q}^n = 15^8$ possible designs uniformly at random, so pool sizes (individuals per pool) can range all the way from zero to n. Likewise for pool combinations. In fact, 98.7% of the random draws in this case will not be maximally balanced, i.e., the pools or pool combinations are not assigned as evenly as possible, as can be verified by exhaustive search. Hence, it is rare for these randomly generated designs to both treat individuals uniformly and maximally use all available pool combinations. With high probability, the random assignment design does not achieve our goal.
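A small sketch of this experiment (ours, not the paper's code), reusing the is_maximally_balanced checker from the sketch in Section S1, estimates how rarely a random draw is maximally balanced:

```python
import numpy as np
# assumes is_maximally_balanced from the Section S1 sketch is in scope

rng = np.random.default_rng(0)
m, n, q = 6, 8, 2

def random_assignment(rng):
    """Assign each of n individuals to q of m pools, uniformly at random."""
    P = np.zeros((m, n), dtype=int)
    for j in range(n):
        P[rng.choice(m, size=q, replace=False), j] = 1
    return P

draws = 100_000
hits = sum(is_maximally_balanced(random_assignment(rng), q) for _ in range(draws))
print(f"fraction maximally balanced: {hits / draws:.3f}")  # roughly 0.013 per the text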
One could also consider searching among many random draws for a suitable design. In the small example above, it would take 180 draws to have a 90% chance of drawing at least one maximally balanced design. One might try selecting and then modifying the most balanced design (in some appropriate sense) from among the many draws, but such an approach is ad hoc, involves manual tweaking, and may still not yield a maximally balanced design. Double-pooling 2 is an alternative random design that instead partitions the n individuals into m/2 pools two times. This design produces balanced pools but does not guarantee balanced pool combinations, so again involves the possibility of manual tweaking to achieve maximal balance.
Exhaustive search, on the other hand, is a systematic approach but is only practical for very small cases. The $\binom{m}{q}^n$ many designs to search through rapidly become intractable in general. Moreover, one must repeat the process for any change in the number of individuals n, number of splits q, or number of pools m; the design is not easily adapted from one environment to another. One could attempt to precompute and store designs in advance, but this then limits us to the set of designs anticipated to be useful, again reducing flexibility.
Code-based designs (like P-BEST 3 ) and array designs (like plate-based arrays 4 ) can be maximally balanced but may only apply or remain balanced for certain choices of n and m. For example, array designs do not use all available pool combinations so do not have maximally balanced pool combinations once the number of individuals exceeds the number of cells in the array. Code-based designs can also require significant technical expertise to adapt. These aspects all limit the flexibility and adaptability of these approaches.
As a result, though it is quite natural to desire maximally balanced designs, the problem of easily generating them is highly nontrivial. In fact, it is not initially clear that designs maximally balanced in all three ways even exist in general, let alone an efficient way to find them. In this paper, we have shown how certain deep results from combinatorics, such as Baranyai's theorem 5 , imply that such designs do in fact exist under the minimal required conditions. Moreover, we demonstrate how to generate them under mild conditions using the combinatorics of hypergraph factorizations. For q = 3, we leverage a nontrivial number-theoretic construction due to Beth 6;7 . These tools enable us to develop a simple, flexible and efficient pooled testing strategy with maximally balanced pool designs.

S3 Maximal balance and hypergraph factorization
As discussed in the main article, HYPER pooling designs are guaranteed to be maximally balanced in general as a consequence of the properties of hypergraph factorization. In particular, the properties of hypergraph factorization imply that the resulting sequence of pool assignments (obtained in the setup stage) produces maximal balance for any number of individuals. Indeed, as we will now explain, this is why HYPER uses hypergraph factorization for generating the pools.
The key idea is to order the $\binom{m}{q}$ pool combinations so that each consecutive block of m/q combinations contains each pool exactly once (we require that q divides m). In our earlier example (Fig. 1), the first block of m/q = 3 pairs (AB, CD, EF) contains each of the pools (A-F) exactly once, as does the next block (BC, DF, AE), and so on. This property keeps pool sizes maximally balanced as individuals one-by-one get assigned to the pool pairs, because it guarantees that each pool is assigned once before any are assigned again. Likewise, the fact that each pair appears exactly once keeps the pool combinations maximally balanced. As a result, in Fig. 1, each pool was assigned four times and each pool pair was assigned once or never, making the pools perfectly balanced and the pool pairs maximally balanced. For n = 11 individuals, the pools would no longer be perfectly even, since pools A and D would have only 3 individuals, but they would still be maximally balanced. The same holds for any number of individuals.
Our task, therefore, is to divide the $\binom{m}{q}$ possible pool combinations into subsets, where each pool appears exactly once in each subset. This corresponds exactly to the mathematical problem of hypergraph factorization. As described in the main article and illustrated in Fig. 1, we think of the m pools as vertices and the $\binom{m}{q}$ possible pool combinations as hyperedges that each connect q of the m vertices (i.e., the blue lines in Fig. 1). The resulting hypergraph (i.e., the vertices and hyperedges taken together) contains all $\binom{m}{q}$ possible hyperedges and is known as the complete hypergraph of order q on m vertices, which we denote $K_m^q$. The hypergraph in Fig. 1 is $K_6^2$. Shown in more detail, $K_6^2$ has the following 15 hyperedges (in lexicographic order):

AB, AC, AD, AE, AF, BC, BD, BE, BF, CD, CE, CF, DE, DF, EF.

Drawing the corresponding complete hypergraph for q = 3 is harder, but one can quickly visualize the hyperedges as triangles connecting q = 3 vertices each. $K_6^3$ has the following 20 hyperedges (in lexicographic order):

ABC, ABD, ABE, ABF, ACD, ACE, ACF, ADE, ADF, AEF, BCD, BCE, BCF, BDE, BDF, BEF, CDE, CDF, CEF, DEF.

For example, the hyperedges BEF and ACD together cover each of the six pools exactly once, forming a 1-factor of $K_6^3$.
So, restated in these terms, our task is to factorize the complete hypergraph $K_m^q$ into disjoint 1-factors.
In Fig. 1, the sequence of pool assignments was obtained by factorizing K 2 6 into the five disjoint 1-factors shown as little hypergraphs inside the circle of pool assignments. The first 1-factor contained hyperedges AB, CD, and EF, which gave the first m/q = 3 pool assignments. The next 1-factor (BC, DF, AE) gave the next m/q = 3 pool assignments, and so on.
Hypergraph factorization has been the subject of intense study 8;9 and has even been used for group testing 10 (though in a different way). A particularly important theorem is that such a factorization always exists (as long as q divides m), which is a deep and celebrated result from combinatorics known as Baranyai's theorem 5 . However, finding a factorization can be an incredibly challenging combinatorial problem for q > 1. Fortunately, efficient algorithms for constructing these factorizations have been discovered 6;7;11 for q = 2 (as long as m is even) as well as q = 3 (as long as m is a multiple of 6 and m − 1 is a prime number). We describe these constructions in the following sections, and we discuss how HYPER fits into the broader context of design theory in Section S5.
As mentioned in the main article, these constructions cover a very wide range of useful m and q, and we have developed an online tool (http://hyper.covid19-analysis.org) that implements them and generates pool assignments.

S4 Efficient constructions of hypergraph factorizations

S4.1 Hypergraph factorization of the complete hypergraph for q = 1

For any m, the complete hypergraph $K_m^1$ of order q = 1 is itself a 1-factor. As a result, its factorization contains only a single 1-factor: itself. So, the sequence of pool assignments is simply the pools listed in any order. Notably, using this sequence to assign individuals to pools reproduces Dorfman pooling as a special case.
S4.2 Hypergraph factorization of the complete hypergraph for q = 2

For m even, we use the following efficient method for constructing a factorization of $K_m^2$. Here we follow the description in Section VII-5.5 of Colbourn and Dinitz 12 and illustrate it using m = 6 as an example; see also page 595 of Beth et al. 8 .
The construction arranges the vertices with one vertex, labeled ∞, at the center and the remaining m − 1 vertices, labeled 0, 1, . . . , m − 2, around a circle. Setting u = (m − 2)/2, the starter 1-factor is formed by the edges {(0, ∞), (−1, +1), . . . , (−u, +u)}, with arithmetic modulo m − 1. The remaining 1-factors are then generated by rotating this diagram: adding 1 (modulo m − 1) to every non-∞ vertex label gives the next 1-factor, adding 2 gives the one after, and so on. This yields m − 1 many 1-factors in total, which taken together form the desired hypergraph factorization of $K_m^2$. To connect this back to the pools, simply relabel the vertices using the pool names. For example, for m = 6, labeling vertices 0, . . . , 4 as pools A-E and ∞ as pool F yields the following five 1-factors: {AF, BE, CD}, {BF, AC, DE}, {CF, BD, AE}, {DF, CE, AB}, {EF, AD, BC}.
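To make the rotation concrete, here is a minimal Python sketch of this circle construction (our own illustration, not the authors' code); the pool relabeling at the end is an assumption matching the example above:

```python
from string import ascii_uppercase

def one_factorization_q2(m):
    """Circle-method 1-factorization of K^2_m for even m."""
    assert m % 2 == 0
    r = m - 1                          # non-infinity vertices 0..r-1, plus "inf"
    factors = []
    for g in range(r):                 # rotate the starter by g
        factor = [(g % r, "inf")]      # edge (g, inf)
        factor += [((g - d) % r, (g + d) % r) for d in range(1, (m - 2) // 2 + 1)]
        factors.append(factor)
    return factors

def to_pools(m, factors):
    # relabel vertices 0..m-2 as pools A, B, ...; inf gets the last pool name
    name = {i: ascii_uppercase[i] for i in range(m - 1)}
    name["inf"] = ascii_uppercase[m - 1]
    return [["".join(sorted(name[v] for v in e)) for e in f] for f in factors]

for f in to_pools(6, one_factorization_q2(6)):
    print(f)
# Five 1-factors, each containing every pool exactly once; together they
# cover all 15 pairs of K^2_6 exactly once.
```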
S4.3 Hypergraph factorization of the complete hypergraph for q = 3

For q = 3, we leverage a nontrivial number-theoretic construction due to Thomas Beth 6;7 , which is guaranteed to work when m = 6k for some integer k (i.e., the total number of pools is divisible by six) and r = 6k − 1 is a prime number (i.e., r is not divisible by any number strictly between 1 and r). Note that many designs of practical interest enjoy these properties: e.g., the numbers of pools m = 6, 12, 24, 48 are each divisible by six, and further r = 5, 11, 23, 47 are prime numbers.

Algebraic background. We follow the description in Beth 6 (which appears to be difficult to access online; the construction is also presented in Section 3.1 of the thesis 7 and is referenced in Tamm 11 ). The construction works as follows.
Consider the finite field (or Galois field) of prime order r, GF(r) = Z/rZ = F_r. This is defined as the set of numbers {0, 1, . . . , r − 1}, with addition, multiplication, and division by nonzero elements all defined modulo r (i.e., the result is always the residue after dividing by r). To this field we append the symbol ∞ as the result of division by zero, so that 1/0 = ∞ and 1/∞ = 0. We also define c + ∞ = ∞ for all c ∈ F_r and c · ∞ = ∞ for all nonzero c ∈ F_r. This constitutes the so-called projective line PG(1, r), with the point ∞ at infinity.
Beth's construction. Now, Beth considers the fractional linear map π : PG(1, r) → PG(1, r) given by π(x) = −(1 + x)/x. Here, 1 denotes the multiplicative unit of the field, while addition and division are taken modulo r. A key observation is that π is a fixed-point-free map of order three; that is, applying π three times returns every point to itself while π(x) ≠ x for all x. Consequently, the orbits of π partition the r + 1 points of PG(1, r) into (r + 1)/3 triples, and these triples form a starter 1-factor, which we denote O. Let also ω be a primitive element of F_r, that is, an element such that ω^j ≠ 1 for all j = 1, 2, . . . , r − 2. Then, Beth's result 6;7 states that the partitions induced by multiplying and translating O by specific values λ, g as λ · O + g form a 1-factorization of the complete hypergraph $K_{r+1}^3$ on PG(1, r). Here λ · O + g means that we take each of the hyperedges A_i ∈ O and transform their elements affinely into λ · A_i + g, thus obtaining another hyperedge (with ∞ mapped to itself). Specifically, λ takes the values of the powers of ω given by λ = ω^j , j = 1, . . . , (r − 1)/2, and g takes every value in F_r.
The key for us is that this construction can be evaluated very efficiently, by simply iterating over the orbits of π and the values λ, g.
Example: r = 5. For m = 6 pools, r = m − 1 = 5 is a prime number and Beth's construction applies. We go through the construction in this setting as an illustrative example.
1. We compute the orbits of π on PG(1, 5). For example, π(0) = ∞, π(∞) = 4, and π(4) = 0, so {0, 4, ∞} is one orbit; the remaining points form the orbit {1, 2, 3}.
2. These two triples together form the starter 1-factor O = {{0, 4, ∞}, {1, 2, 3}}.
3. We find a primitive element of F_5 by looking at powers: the powers of 2 are 2, 4, 3, 1 and the powers of 3 are 3, 4, 2, 1, while the powers of 4 are only 4, 1. So we can choose ω ∈ {2, 3}. We will (arbitrarily) choose ω = 2.
4. We then form λ · O + g for λ = ω^1 = 2, ω^2 = 4 and g = 0, 1, 2, 3, 4, giving 10 cells in total. Each cell in the resulting output table forms a partition of {0, 1, 2, 3, 4, ∞}, as desired. Looping over the cells and using the partitions to form pools yields the desired design.
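A compact Python sketch of Beth's construction, assembled from the description above (our own illustration, not the authors' code), reproduces this example:

```python
def beth_factorization(r):
    """Beth's 1-factorization of K^3_{r+1} on PG(1, r), for prime r = 6k - 1."""
    INF = "inf"
    def inv(x): return pow(x, -1, r)      # modular inverse (Python 3.8+)
    def pi(x):                            # pi(x) = -(1 + x)/x on PG(1, r)
        if x == 0:   return INF
        if x == INF: return (-1) % r
        return (-(1 + x) * inv(x)) % r
    # orbits of pi partition PG(1, r) into triples -> starter 1-factor O
    points, seen, O = list(range(r)) + [INF], set(), []
    for x in points:
        if x in seen: continue
        orbit = {x, pi(x), pi(pi(x))}     # pi has order three, no fixed points
        seen |= orbit
        O.append(orbit)
    # find a primitive element omega of F_r
    omega = next(w for w in range(2, r)
                 if all(pow(w, j, r) != 1 for j in range(1, r - 1)))
    def act(A, lam, g):                   # affine image lam * A + g (inf is fixed)
        return frozenset(INF if a == INF else (lam * a + g) % r for a in A)
    factors = []
    for j in range(1, (r - 1) // 2 + 1):
        lam = pow(omega, j, r)
        for g in range(r):
            factors.append([act(A, lam, g) for A in O])
    return factors

F = beth_factorization(5)
print(len(F), "1-factors")                # 10 1-factors of K^3_6
for f in F:
    print([sorted(map(str, A)) for A in f])  # each is a partition of PG(1, 5)
```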

S5 Relationship to design theory
Here we describe how the HYPER design fits in the broader context of design theory. See Beth et al. 8;9 for an excellent introduction to relevant design theory; we will follow the notation and terminology from those references.
For us, a design I is a collection of points V and blocks B, and an assignment of some points to some blocks. In group testing, the points correspond to individuals (or more generally, to samples), and the blocks correspond to pools. The terminology of points and blocks is meant to be evocative of geometry, and indeed designs are closely connected to finite geometries such as affine and projective planes. Intuitively, points can sometimes be viewed as geometric points, while blocks can be viewed as lines. Points will be denoted with lowercase letters such as p, while blocks will be denoted with upper case letters such as B. The fact that point p is associated with (or incident on) block B is denoted as pIB.
Designs are called q-hypergraphs if the set of points incident on each block B, i.e., (B) := {p ∈ V : pIB}, has size q. For group testing, this means that each sample is assigned to q pools. We are thus interested in q-hypergraphs for small values of q, such as 2 or 3.
A partition of a design is a disjoint union of its blocks into parts. A parallel class of a design is a collection of blocks such that each point is incident on exactly one block. This is analogous to the geometric idea that parallel lines do not intersect: if we partition the space into parallel lines, then each point belongs to exactly one such line. A design is called resolvable if it has a partition into parallel classes (a.k.a. a 1-factorization, parallelism, or resolution).
For group testing, a resolution means that in each part of the partition of the blocks, we use each pool exactly once.
Since our goal is to use pools in a balanced way, this is precisely what we want. Thus, from a design theory perspective, we are interested exactly in resolutions of q-hypergraphs. There is a great amount of work on existence and constructions of such resolutions, see, e.g., Ch VIII of Beth et al. 8 and references therein.
One classical strategy is the permutation group action approach. Here, we start with a collection of base blocks {B_j} and generate the remaining blocks of the design via the action of a permutation group on the vertices; see p. 207 of Beth et al. 8 . A more specific technique is the difference cycle method, which is an application of the permutation group action method where the cyclic group G = Z_l acts on the vertices V = Z_l by translation. This approach leads to resolutions of the complete hypergraph for both q = 2 and q = 3; see Section VIII.8 of Beth et al. 8 . These are precisely the algorithms that we use.

S6 Reordering HYPER pool designs to aid lab implementation

To aid lab implementation, our online tool (http://hyper.covid19-analysis.org) presents designs after carrying out a re-ordering that we describe here. In the re-ordered design, the assignments need not use AB, CD and EF as the first three pairs as we illustrated in Fig. 1; rather AB is repeated twice, then CD is repeated twice, and so on.

S7 Correcting for false negatives in HYPER
The conservative decoder used in HYPER does not correct for any pooled tests with false negative results, in contrast to, e.g., P-BEST 3 . Such error correction may be incorporated by introducing a tolerance so that individuals in just one or two negative pools are still considered putative positives, as has been previously done for random assignment 1 . Doing so can improve the overall sensitivity of HYPER. However, note that an important source of false negatives is the dilution of viral loads below the limit of detection. An individual with small viral load is hence likely to yield false negatives in most of its pools, in which case error correction may not significantly improve sensitivity. It is likely most useful for correcting false negative PCR results that are independent across tests, e.g., as may arise while performing the testing procedure. Note also that it can dramatically reduce efficiency. In our illustrative example (Fig. 1), allowing a tolerance of one negative pool would turn individuals 1, 5, and 9-12 into putative positives. Labs must consider how to properly weigh these tradeoffs according to their specific needs and their anticipated sources of errors.
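A hedged sketch of such a tolerance-based decoder (illustrative only, not the paper's implementation); with tolerance=0 it reduces to the conservative decoder:

```python
import numpy as np

def putative_positives(P, pool_results, tolerance=0):
    """P: m x n 0/1 design; pool_results: length-m 0/1 pooled test outcomes.
    An individual is a putative positive (retested individually in stage 2)
    if at most `tolerance` of its pools tested negative."""
    negatives_per_individual = (P * (1 - pool_results)[:, None]).sum(axis=0)
    return np.flatnonzero(negatives_per_individual <= tolerance)
```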

S8 Balanced array designs
In addition to plate-based arrays 4 , we consider balanced array designs. In particular, the plate-based arrays 4 proposed for COVID-19 testing use 8 × 12 and 16 × 24 arrays. This corresponds to plate sizes common in laboratory environments, making these choices convenient in practice. However, it results in imbalanced pools (Table 1) because the row pools are larger than the column pools. To address this, one could instead use square arrays, though these may no longer correspond to common physical plate sizes in the lab. In these cases, the arrays become primarily a conceptual tool for constructing the design. In particular, taking k = m/2, one could use a k × k array with n = k² = (m/2)² individuals.
This design has balanced pools since all pools contain k individuals. However, this is somewhat inflexible since n must then be a perfect square. One way to address this is to partially fill a larger array design than needed, i.e., consider a square array with holes.
For example, to replace the 8 × 12 array, which covers n = 96 individuals, one could use a 10 × 10 array with four holes.
Note, however, that the holes must be placed carefully to preserve maximal balance of the pools. For example, if the holes are all placed on the first row, then that row pool will have four fewer individuals than the rest of the row pools. Instead, the holes should be placed along diagonals of the array. In this way, the pools in the array stay maximally balanced.
Taking this approach for general n corresponds to filling the array along its diagonals. For k = 3, e.g., individuals 1-3 fill the main diagonal, individuals 4-6 fill the next (wrapped) diagonal, and individuals 7-9 fill the last. That is, for general k, individual i (counting from 0) is assigned to the cell in row i mod k and column (i + ⌊i/k⌋) mod k, and the pool assignments are given by the rows and columns.
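A minimal sketch of diagonal filling (the cell formula above is our reconstruction of the elided diagram; the cycling for n > k², described next, is included for completeness):

```python
def balanced_array_pools(n, k):
    """Assign n individuals to row/column pools of a k x k array, filling
    along diagonals so row and column pools stay maximally balanced."""
    assignments = []
    for i in range(n):
        t = i % (k * k)                            # cycle back for n > k^2 (see below)
        row, col = t % k, (t + t // k) % k         # diagonal filling
        assignments.append((f"R{row}", f"C{col}"))  # row pool and column pool
    return assignments

for i, pools in enumerate(balanced_array_pools(9, 3), start=1):
    print(i, pools)
```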
This approach is limited to producing designs with at most k² individuals. However, it can be extended to allow for more individuals, i.e., to create designs with n > k², by going back to the start and placing multiple individuals in each cell. Namely, individual k² + 1 is placed in the same cell as individual 1, individual k² + 2 with individual 2, and so on.
This parallels how HYPER handles n > $\binom{m}{q}$ individuals. Here it produces a design with maximally balanced pools. Pool combinations are not maximally balanced since some combinations are unused while others are assigned twice, but they are maximally balanced among those that are used.

The design can also be extended to three-dimensional arrays that assign each individual to q = 3 pools, with the pools now given by the slices of a k × k × k array along each of its three axes. The sequence of assignments to cells is generated by filling along diagonals, analogously to the two-dimensional case. As with the q = 2 balanced arrays, this approach is limited to producing designs with at most k³ individuals but can be extended by cycling back through the pool assignments, as is done in HYPER.
These extensions make it possible to generate array designs with parameters matching those considered for HYPER (Table 2), allowing for further comparison with HYPER.

S9 Reed-Solomon Kautz-Singleton code-based designs

We also consider the celebrated Kautz-Singleton 13 construction that converts a b-ary code (often a Reed-Solomon 14 code) into a binary design matrix. This design has recently been shown to be order optimal for the so-called probabilistic group testing problem under certain assumptions 15 .
For the reader's convenience, here we detail the steps involved in constructing the Reed-Solomon Kautz-Singleton (RS-KS) design used in this paper, in our notation. The design is parameterized by (n, m, q), together with an additional parameter f used in the construction. The first step is to construct an (m/q)-ary matrix from a Reed-Solomon code as follows: 1. Let b := m/q, and let F_b be the finite field of size b (so b must be a prime power).
2. Enumerate all b^f polynomials over F_b of degree less than f, e.g., in lexicographic order of their coefficient vectors. 3. Fix q distinct evaluation points in F_b. 4. Form the q × b^f matrix with b-ary entries whose (i, j)-th entry is the j-th polynomial evaluated at the i-th evaluation point. This requires that q ≤ b (so the evaluation is at distinct elements), and f ≤ q (so the map from polynomials to columns is injective; we need this to be a meaningful code).
The second step, the Kautz-Singleton construction, is to convert the b = m/q-ary matrix into a binary design matrix by replacing each letter of the code by a binary column vector in {0, 1}^b with a one-hot encoding, i.e., the symbol s ∈ {0, . . . , b − 1} is replaced by the vector with a one in position s and zeros elsewhere. This gives a qb × b^f binary design matrix with q ones in each column. Since m = qb, the size of the matrix is equivalently m × b^f. Note that this approach is limited to producing designs for which m/q is a prime power.
Note that the requirement f ≤ q means that for the choices q = 2 and q = 3 we consider, the relevant designs use f ≤ 2 and f ≤ 3, respectively, accommodating up to (m/q)^f individuals. For n ≤ (m/q)^f, we consider truncating the design by taking the first n columns to obtain the desired m × n binary design matrix. The m rows correspond to the m pools, the n columns correspond to the n individuals, and the q ones in each column identify the pools to which each individual is assigned.
For n > (m/q)^f, we consider extending the design in a straightforward way by recycling the pool assignments, i.e., by concatenating the design matrix with copies on the right until it has at least n columns total. Namely, individual (m/q)^f + 1 is assigned the same pools as individual 1, individual (m/q)^f + 2 is assigned the same pools as individual 2, and so on. This parallels how HYPER handles n > $\binom{m}{q}$ individuals. The resulting design parameters from Table 2 that can be handled by this construction are indicated there by asterisks (*). We found that for all these designs, the pools were maximally balanced and the pool combinations were maximally balanced among those that were used. When m/q is not a prime power, we considered using a prime power b larger than m/q in the construction and then dropping qb − m rows to obtain m final rows, but doing so produced columns with fewer than q ones (i.e., those individuals were not placed in q pools). This could be addressed by removing those columns, but in the cases we tried, this led to the pools being unbalanced.
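A sketch of the RS-KS construction for prime b (ours, not the paper's code; the general construction uses any prime-power field F_b, and our choice of the first q field elements as evaluation points is an assumption). It previews the worked example below:

```python
import numpy as np
from itertools import product

def rs_ks_design(n, m, q, f):
    """RS-KS binary design: m pools x n individuals, q ones per column.
    Assumes b = m/q is prime and n <= b^f (extend by tiling columns otherwise)."""
    b = m // q
    assert q <= b and f <= q
    points = list(range(q))                  # q distinct evaluation points in F_b
    codewords = []                           # enumerate polynomials of degree < f
    for coeffs in product(range(b), repeat=f):   # lexicographic coefficient order
        codewords.append([sum(c * pow(x, t, b) for t, c in enumerate(coeffs)) % b
                          for x in points])
    C = np.array(codewords[:n]).T            # q x n b-ary matrix (truncated)
    # Kautz-Singleton step: one-hot encode each b-ary symbol into b binary rows
    P = np.zeros((q * b, C.shape[1]), dtype=int)
    for i in range(q):
        for j in range(C.shape[1]):
            P[i * b + C[i, j], j] = 1
    return P                                 # m x n binary design matrix

print(rs_ks_design(n=9, m=6, q=2, f=2))
```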

Example. We work out the construction for n = 9, m = 6, q = 2, f = 2 as an illustrative example. Here b = 3, so we have F_3 = {0, 1, 2}, i.e., (Z_3, +, ·), with generator x = 2. Enumerating the 3² = 9 polynomials of degree less than 2 and evaluating each at the q = 2 evaluation points yields a 2 × 9 ternary matrix; converting each entry to its one-hot binary encoding then yields the final 6 × 9 binary design. Note that this design has balanced pools and maximally balanced pool combinations, but does not use all pool pairs (e.g., the pool pair AB is unused).

S10 Performance characterization under a common statistical model

Here we study the performance of HYPER under the common statistical model where each individual (or in more general contexts, each sample) is positive independently at random with probability p, and each test may be incorrect with some probability, i.e., each test has a specificity of 1 − α and a sensitivity of β. We will use T to denote the number of tests used by HYPER (including stage 2 tests). Note that T is a random variable due to randomness both in which individuals are positive and in any test errors.

S10.1 An upper bound on the expected number of tests in the noiseless case
We first consider the noiseless case, and give an upper bound on the expected number of tests used by HYPER (including stage 2 tests). To get this result, we leverage the Dawson-Sankoff inequality 16 , a nontrivial refinement of the Bonferroni union-intersection inequalities, which we use in the form given by Galambos 17 . We will later relax the assumption made here that tests have perfect sensitivity and specificity.
Theorem 1 (Performance in the noiseless case): Suppose that all tests are noiseless (α = 0, β = 1), and consider a HYPER design with n individuals, m pools, and q splits, such that n is a multiple of m/q. For any integer l ≥ 2, the expected number of tests is upper bounded as follows:

E(T) ≤ m + n [ 1 − (2/l) q (1 − p)^k + (2/(l(l−1))) $\binom{q}{2}$ (1 − p)^{2k−u} ],   (1)

where k = nq/m is the pool size and u = $\binom{m-2}{q-2}$ · n/$\binom{m}{q}$. The bound becomes an equality when (A) q = 1, or (B) n < $\binom{m}{q}$ and q = 2, taking l = 2 in both cases, or when (C) n is a multiple of $\binom{m}{q}$. For general q, the optimal choice of the parameter l is bounded by l ≤ (q − 1)(1 − p)^{k−u} + 2.
Note that in cases (A) and (B), the bound (1) not only becomes an equality but can also be simplified by using the fact that u = 0 in case (A) and u = 1 in case (B), yielding the following expressions:

E(T) = m + n [1 − (1 − p)^k]   in case (A),
E(T) = m + n [1 − 2(1 − p)^k + (1 − p)^{2k−1}]   in case (B),

which can be combined into a single formula as follows:

E(T) = m + n [1 − q(1 − p)^k + $\binom{q}{2}$ (1 − p)^{2k−1}].

Proof of Theorem 1. Let R_i, i = 1, . . . , n, be the indicator of the event that we need to retest individual i in stage 2. Using the standard approach for calculating E(T), the number of tests required equals m (one for each pool) plus any retests required. Hence,

E(T) = m + Σ_{i=1}^n Pr(R_i).

Now, for our decoder, R i happens precisely when all groups containing i are positive. Let X i , i = 1, . . . , n be the indicator that the i-th individual is positive. Let G j ⊂ {1, . . . , n} be the individuals contained in pool j, for j = 1, . . . , m.
We use a refined version of the Bonferroni inequality known as the Dawson-Sankoff inequality 16 to bound Pr(R_i). First let us recall the familiar Bonferroni union-intersection inequalities. Consider events A_1, . . . , A_N, and for all h ∈ {1, . . . , N}, let

S_h := Σ_{1 ≤ j_1 < · · · < j_h ≤ N} Pr(A_{j_1} ∩ · · · ∩ A_{j_h}).

Then, the well-known Bonferroni inequalities state that for even h ∈ {1, . . . , N},

Pr(A_1 ∪ · · · ∪ A_N) ≥ Σ_{t=1}^h (−1)^{t+1} S_t.

By construction, there are q pools containing individual i. Without loss of generality, we can assume by relabeling the pools that their indices are 1, . . . , q. Then, the individuals contained in them are G_1, . . . , G_q; these implicitly depend on i, but this is not displayed for notational simplicity. In the Bonferroni inequality, we set N = q, and for j = 1, . . . , q, we let A_j be the event that all individuals contained in the j-th pool containing individual i test negative. As above, these implicitly depend on i, but this is not displayed for notational simplicity. By definition, the i-th individual is not retested, so R_i does not happen, precisely when at least one of the pools 1, . . . , q to which individual i belongs contains no positive individuals. Equivalently, ¬R_i = ∪_{j=1}^q A_j. Then for any even integer h,

Pr(R_i) ≤ 1 − Σ_{t=1}^h (−1)^{t+1} S_t.

Taking h = 2, we thus find

Pr(R_i) ≤ 1 − S_1 + S_2.

We can get sharper results with the Dawson-Sankoff inequality 16 . In the form given by Galambos 17 , this states that for any integer l ≥ 2,

Pr(A_1 ∪ · · · ∪ A_N) ≥ (2/l) S_1 − (2/(l(l−1))) S_2.
It remains to bound |G j |. Here HYPER designs are useful, because they try to balance |G j |. In each consecutive block of m/q individuals, they use each of the m pools exactly once. Thus, in each G j , there is at most one new individual.
Recall that k is the number of consecutive blocks of individuals of size m/q, and we assumed that k = nq/m is an integer.
Based on the above, we have |G j | = k.
Next, we bound the sizes of the intersections G_j ∩ G_{j'} for j ≠ j'. Since HYPER designs are maximally balanced, any two pools intersect at most $\binom{m-2}{q-2}$ times in every consecutive block of $\binom{m}{q}$ individuals; these intersections correspond to the number of ways of choosing the remaining q − 2 pools out of the remaining m − 2. Hence |G_j ∩ G_{j'}| ≤ u, and so

Pr(A_j) = (1 − p)^k   and, for j ≠ j',   Pr(A_j ∩ A_{j'}) = (1 − p)^{2k − |G_j ∩ G_{j'}|} ≤ (1 − p)^{2k−u}.

In addition, we have equality for S_2 when either (A) q = 1 (in which case the intersection is empty and u = 0), or (B) n < $\binom{m}{q}$ and q = 2 (in which case the groups intersect in exactly u = 1 individual, namely the individual defining them), or (C) n is a multiple of $\binom{m}{q}$ (in which case the number of intersections is exactly u for each pair of groups). This leads to the desired result.
As is known 16;17 , the optimal choice for l is l = ⌊2 S_2/S_1⌋ + 2. We can approximate the optimal choice using the calculations from the proof: since 2 S_2/S_1 ≤ (q − 1)(1 − p)^{k−u}, the optimal l is bounded by l ≤ (q − 1)(1 − p)^{k−u} + 2. This finishes the proof.
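A self-contained Monte Carlo sanity check (our assumed setup, not the paper's code) of the exact case-(B) formula E(T) = m + n[1 − 2(1 − p)^k + (1 − p)^{2k−1}], using the first four 1-factors of the K^2_6 factorization from Section S4.2:

```python
import numpy as np

m, n, q, p = 6, 12, 2, 0.05
k = n * q // m                                   # pool size, here 4
pairs = [(0,5),(1,4),(2,3),(1,5),(0,2),(3,4),    # four 1-factors of K^2_6,
         (2,5),(1,3),(0,4),(3,5),(2,4),(0,1)]    # one individual per pair
P = np.zeros((m, n), dtype=int)
for j, (a, b) in enumerate(pairs):
    P[a, j] = P[b, j] = 1

rng = np.random.default_rng(1)
trials, total = 200_000, 0
for _ in range(trials):
    x = rng.random(n) < p                        # true statuses (noiseless tests)
    pool_pos = (P @ x) > 0                       # stage 1 pooled results
    n_neg_pools = P.T @ (~pool_pos)              # negative pools per individual
    total += m + int((n_neg_pools == 0).sum())   # stage 1 + stage 2 retests

formula = m + n * (1 - 2*(1-p)**k + (1-p)**(2*k - 1))
print(total / trials, "vs", formula)             # should agree closely
```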

S10.2 Optimal efficiency for HYPER with q = 2
Our next result characterizes the optimal efficiency of HYPER designs with q = 2 for noiseless tests in the case where the number of individuals per batch is large and the prevalence is small.

Proposition 2 (Optimal efficiency for HYPER with q = 2): Suppose that all tests are noiseless, and consider HYPER designs with q = 2 in the limit n → ∞ with m/n = y fixed. Then, as p → 0, the optimal choice of y is y ≈ 2p^{2/3} − p, and the corresponding expected number of tests per individual is E(T)/n ≈ 3p^{2/3}.

In comparison, the optimal expected tests per individual for Dorfman testing is approximately 2p^{1/2} for small p (see, e.g., Finucan 18 ), so HYPER designs with q = 2 improve on Dorfman designs in this regime. The expected tests per individual for three-stage testing 18 and double-pooling 2 are approximately 3p^{2/3} in the limit. This matches HYPER, but note that HYPER is a two-stage deterministic approach. The expected tests per individual for the hierarchical testing methods proposed in Finucan 18 and Mutesa et al. 19 is approximately e p ln(1/p) as p → 0, which is asymptotically more efficient, but note that these methods use multiple stages.
More generally, there is a lot of work on optimality for group testing in various settings, e.g., two-stage and multi-stage algorithms, adaptive and non-adaptive algorithms, worst case or average case, etc. [20][21][22][23][24][25] . As p → 0, these works and others discuss a universal lower bound of order Θ(p ln(1/p)) on the expected tests per individual. However, the best efficiency (and the algorithms that achieve it) depends on the specific rate at which p → 0. In particular, under the same statistical model as in our paper, Mezard and Toninelli 23 construct certain tests where each sample is placed into q = ln(1/p)/ln 2 pools, and show that these attain an asymptotic expected tests per individual of p ln(1/p). Coja-Oghlan et al. 26 construct a 2-stage algorithm with asymptotically optimal efficiency, requiring q = m ln(2)/(np). Gebhard et al. 27 discuss similar proposals for the noisy case. In our work, the constraints we work with do not allow q to grow as p → 0. Scarlett 28 proposes a 4-stage algorithm with asymptotically optimal efficiency. Our work is also related to d-disjunctive superimposed matrices for pooled testing, which work without errors when the number of positives is at most d and there are no false positives 13;29 . Sharp bounds on the size of d-disjunctive matrices were given in Erdös et al. 30 . Classic error-correcting codes such as Reed-Solomon codes have been suggested for group testing dating back at least to Kautz and Singleton 13 .
Proof of Proposition 2. Taking the limit as n → ∞ with m/n = y fixed, and noting that n < $\binom{m}{2}$ eventually in this limit so that case (B) of Theorem 1 applies with k = 2/y, we obtain that eventually the expected number of tests per individual is

E := E(T)/n = y + 1 − 2(1 − p)^{2/y} + (1 − p)^{4/y − 1}.

To optimize, we differentiate E with respect to y. The optimal y can be obtained by solving 0 = ∂E/∂y for y in terms of p. We approximate this solution in the limit p → 0 by taking the leading two terms of the Puiseux expansion (around p = 0) of the degree-four Taylor approximation of ∂E/∂y (with respect to p = 0). This has one branch corresponding to a real solution, yielding y ≈ 2p^{2/3} − p.
Substituting y = 2p^{2/3} − p into E and computing a Taylor approximation yields E ≈ 3p^{2/3}, completing the derivation.
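A quick numeric check of these asymptotics (our own sketch; the grid search over y is an assumption, not the paper's method):

```python
import numpy as np

# Minimize E(y) = y + 1 - 2(1-p)^(2/y) + (1-p)^(4/y - 1) over y = m/n by grid
# search, and compare with the asymptotics y* ~ 2 p^(2/3) - p, E* ~ 3 p^(2/3).
for p in [1e-2, 1e-3, 1e-4]:
    y = np.linspace(1e-4, 0.5, 200_000)
    E = y + 1 - 2*(1-p)**(2/y) + (1-p)**(4/y - 1)
    i = E.argmin()
    print(f"p={p:g}: y*={y[i]:.4f} vs {2*p**(2/3) - p:.4f}, "
          f"E*={E[i]:.4f} vs {3*p**(2/3):.4f}")
```

The agreement improves as p shrinks, matching the behavior shown in Supplementary Figure 1 (g, h).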

S10.3 Generalization to the noisy case
We next study the noisy case, where each of the pooled tests can have false positives and false negatives. Our first result gives an exact formula for the expected number of tests for q ≤ 2, and moreover, also gives formulas for the sensitivity, specificity, true negative probability, and true positive probability. Our second result gives a more generally applicable upper bound on the expected number of tests that is valid for all q.
Theorem 3 (Performance in the noisy case for q ≤ 2): Suppose each test has specificity 1 − α and sensitivity β, and consider a HYPER design with n individuals, m pools, and q splits, such that n is a multiple of m/q. Suppose further that q ≤ 2 and n ≤ $\binom{m}{q}$. Let k = nq/m and γ := [β + (α − β)(1 − p)^{k−1}]^q. Then the expected number of tests has the following exact form:

E(T) = m + n [ p β^q + (1 − p) γ ].

The overall sensitivity and specificity are, respectively,

Pr(X̂_i = 1 | X_i = 1) = β^{q+1}   and   Pr(X̂_i = 0 | X_i = 0) = 1 − α γ,

where X_i denotes the true status of individual i and X̂_i is the status declared by HYPER. Finally, the true negative and true positive probabilities are, respectively,

Pr(X_i = 0 | X̂_i = 0) = (1 − αγ) / [ 1 − αγ + (1 − β^{q+1})/o ]   and   Pr(X_i = 1 | X̂_i = 1) = β^{q+1} / [ β^{q+1} + o αγ ],

where o = (1 − p)/p is the odds ratio.
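A direct transcription of these closed forms as a small Python sketch (the example parameters at the end are hypothetical, chosen only to satisfy the theorem's conditions):

```python
def hyper_metrics(n, m, q, p, alpha, beta):
    """Theorem 3 closed forms; assumes q <= 2, n a multiple of m/q, n <= C(m,q)."""
    k = n * q // m                                   # pool size
    gamma = (beta + (alpha - beta) * (1 - p)**(k - 1))**q
    ET = m + n * (p * beta**q + (1 - p) * gamma)     # expected number of tests
    sens = beta**(q + 1)                             # overall sensitivity
    spec = 1 - alpha * gamma                         # overall specificity
    o = (1 - p) / p                                  # odds ratio
    ppv = beta**(q + 1) / (beta**(q + 1) + o * alpha * gamma)
    npv = (1 - alpha * gamma) / ((1 - alpha * gamma) + (1 - beta**(q + 1)) / o)
    return ET, sens, spec, ppv, npv

print(hyper_metrics(n=3072, m=192, q=2, p=0.01, alpha=0.05, beta=0.9))
```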

Theorem 4 (Performance in the noisy case for all q): Suppose each test has specificity 1 − α and sensitivity β, and consider a HYPER design with n individuals, m pools, and q splits, such that n is a multiple of m/q. For any integer l ≥ 2, the expected number of tests is upper bounded as follows:

E(T) ≤ m + n [ 1 − (2/l) q p_1 + (2/(l(l−1))) $\binom{q}{2}$ p_2 ],

where k = nq/m, u = $\binom{m-2}{q-2}$ · n/$\binom{m}{q}$, and

p_1 := (1 − β) + (β − α)(1 − p)^k,
p_2 := (1 − β)² + 2(1 − β)(β − α)(1 − p)^k + (β − α)² (1 − p)^{2k−u}.

The optimal choice of l, minimizing this upper bound, is achieved by l = ⌊(q − 1) p_2/p_1⌋ + 2.
Proof of Theorems 3 and 4. We will follow, to some extent, the notation and assumptions from Bilder's works (see, e.g., Bilder 31 ). Let Ĥ_j be the binary result of testing group j, and H_j be the true status of the j-th group. Recall that 1 − α is the specificity of each grouped test, so 1 − α = Pr(Ĥ_j = 0 | H_j = 0) (we use this notation in keeping with the notion of α as the level of a test in hypothesis testing); likewise, β is the sensitivity (or power) of each grouped test, so β = Pr(Ĥ_j = 1 | H_j = 1).

The key is to determine the probabilities Pr(R_i | X_i = t). Now R_i happens if and only if each of the groups containing i has a positive test result, i.e., Ĥ_j = 1 for all j such that i ∈ G_j. Since q = 1, 2 and n ≤ $\binom{m}{q}$, these groups are non-overlapping outside of i, and thus their test results are independent conditional on X_i. We can thus write

Pr(R_i | X_i = t) = Π_{j : i ∈ G_j} Pr(Ĥ_j = 1 | X_i = t).

Next, we can condition on H_j for each term. If X_i = 1, then H_j = 1, so Pr(Ĥ_j = 1 | X_i = 1) = β. If X_i = 0, then H_j = 1 exactly when some other individual in G_j is positive, so

Pr(Ĥ_j = 1 | X_i = 0) = β [1 − (1 − p)^{|G_j| − 1}] + α (1 − p)^{|G_j| − 1} = β + (α − β)(1 − p)^{|G_j| − 1}.

Working our way back up and letting n_i = |{j : i ∈ G_j}| be the number of groups that i belongs to, we find

Pr(R_i | X_i = 1) = β^{n_i},   Pr(R_i | X_i = 0) = Π_{j : i ∈ G_j} [β + (α − β)(1 − p)^{|G_j| − 1}].

Recall that for the hypergraph designs, |G_j| ≤ ⌈n/(m/q)⌉, and if n is an integer multiple of m/q as assumed here, then |G_j| = nq/m = k and n_i = q, so that Pr(R_i | X_i = 1) = β^q and Pr(R_i | X_i = 0) = [β + (α − β)(1 − p)^{k−1}]^q = γ. Hence

E(T) = m + Σ_{i=1}^n Pr(R_i) = m + n [ p β^q + (1 − p) γ ].

This gives the desired formula for the expected number of tests.
Next we derive the overall sensitivity, specificity, true negative probability, and true positive probability. Recall that X̂_i is the indicator that the i-th individual is declared positive. This happens if all groups containing i test positive in the first round, and then the result of a second, independent individual test X̂_{i2} is also positive. Recall that X_i is the indicator that the i-th individual is positive, and Pr(X_i = 1) = p. We are interested in the probabilities Pr(X̂_i = 1 | X_i = t) for t ∈ {0, 1}. Since the result of the retest is independent of the original tests and the retest has the same operating characteristics as any grouped test, we can write

Pr(X̂_i = 1 | X_i = t) = Pr(R_i | X_i = t) · Pr(X̂_{i2} = 1 | X_i = t).

We have Pr(X̂_{i2} = 1 | X_i = 0) = α and Pr(X̂_{i2} = 1 | X_i = 1) = β. Hence, using our previous results and denoting γ_i := Π_{j : i ∈ G_j} [β + (α − β)(1 − p)^{|G_j| − 1}], we find

Pr(X̂_i = 1 | X_i = 1) = β^{n_i + 1},   Pr(X̂_i = 1 | X_i = 0) = γ_i α.

We are next interested in the probabilities Pr(X_i = t | X̂_i = t̂), for t, t̂ ∈ {0, 1}. For Pr(X_i = 1 | X̂_i = 1) this is the true positive probability; for Pr(X_i = 0 | X̂_i = 0) this is the true negative probability. Using Bayes' rule, we can write

Pr(X_i = 1 | X̂_i = 1) = p β^{n_i + 1} / [ p β^{n_i + 1} + (1 − p) γ_i α ],
Pr(X_i = 0 | X̂_i = 0) = (1 − p)(1 − γ_i α) / [ (1 − p)(1 − γ_i α) + p (1 − β^{n_i + 1}) ].

Recall that for the hypergraph designs, if n is an integer multiple of m/q, then γ_i = γ = [β + (α − β)(1 − p)^{nq/m − 1}]^q and n_i = q. Hence, we find that the overall sensitivity and specificity are, respectively, β^{q+1} and 1 − αγ, and, denoting the odds ratio o = (1 − p)/p, we find that the true negative and true positive probabilities are, respectively,

Pr(X_i = 0 | X̂_i = 0) = (1 − αγ) / [ 1 − αγ + (1 − β^{q+1})/o ]   and   Pr(X_i = 1 | X̂_i = 1) = β^{q+1} / [ β^{q+1} + o αγ ].

This finishes the proof of Theorem 3. Now, we proceed to Theorem 4. This follows in a similar way to Theorem 1, but with more involved calculations. As before, the Dawson-Sankoff inequality 16 , in the form given by Galambos 17 , states that for any integer l ≥ 2,

Pr(A_1 ∪ · · · ∪ A_q) ≥ (2/l) S_1 − (2/(l(l−1))) S_2.
Applying this to the events A_j = {Ĥ_j = 0} over the q pools containing individual i, it is enough to give a lower bound for Pr(Ĥ_j = 0) and an upper bound for Pr(Ĥ_j = Ĥ_g = 0) for all j ≠ g.
We can condition on H_j to write

Pr(Ĥ_j = 0) = (1 − α) Pr(H_j = 0) + (1 − β) Pr(H_j = 1) = (1 − α)(1 − p)^k + (1 − β)[1 − (1 − p)^k] = (1 − β) + (β − α)(1 − p)^k = p_1.

In the last line, we have used that n is a multiple of m/q, so that |G_j| = k.
Similarly, we can calculate for j ≠ g, noting that |G_j| = |G_g| = k and denoting v := |G_j ∩ G_g|, so that |G_j ∪ G_g| = 2k − v,

Pr(Ĥ_j = Ĥ_g = 0) = (1 − α)² (1 − p)^{2k−v} + 2(1 − α)(1 − β)[(1 − p)^k − (1 − p)^{2k−v}] + (1 − β)² [1 − 2(1 − p)^k + (1 − p)^{2k−v}]
= (1 − β)² + 2(1 − β)(β − α)(1 − p)^k + (β − α)² (1 − p)^{2k−v}.

As discussed before, due to the construction of hypergraph factorization designs, we have v ≤ u = $\binom{m-2}{q-2}$ · n/$\binom{m}{q}$. Since the above expression is monotonically increasing in v, we can upper bound it by replacing v with u, which gives p_2. This finishes the proof.

Supplementary Figure 1: Performance under a common statistical model. We consider the overall efficiency (a), sensitivity (b), and specificity (c) of three HYPER designs (three choices of m, all with n = 3072 and q = 2) under a general statistical model with noisy tests having sensitivity β = 90% and specificity 1 − α = 95%. The theoretical predictions (solid curves) agree well with simulation results (scatter plots). For a lower sensitivity β = 80% (but the same specificity 1 − α = 95%), the efficiency (d) is slightly better (since more positives are missed), the sensitivity (e) is lower, and the specificity (f) is slightly improved (at larger prevalence). For noiseless tests (β = 1 − α = 1), we study the optimal choice (g) of m for HYPER with q = 2 in the limit of large batches (n → ∞) with diminishing prevalence (p → 0). As n grows from n = 96 to n = 6144, the value of m optimizing the efficiency (found by exhaustive search) approaches the theoretical limit of m/n ≈ 2p^{2/3} − p for small prevalence p. Likewise, the corresponding efficiency (h; relative to individual testing) approaches n/E(T) ≈ p^{−2/3}/3 for small prevalence p. The approximation improves for increasingly small p as n grows.

Supplementary Figure 2: Comparison of two design problems and their bounds. We compare two design problems. Problem 1 is to minimize the average number of tests used per individual by using any group testing method (i.e., any number of stages, any decoder, etc.). This problem is tackled, e.g., by fully-adaptive methods. Problem 2 is to minimize the average number of tests used per individual but by using two-stage group testing methods with a conservative decoder 32 . This is the problem that HYPER tackles. Namely, we consider methods with a first stage of pooled tests followed by a second stage where putative positives are tested individually. This class of methods is of great practical importance. Using only two stages reduces the time needed to receive results, which is crucial in public health settings. Moreover, carrying out more stages can be difficult in practice due to the added logistical burden (especially for adaptively chosen pooled tests). Using a conservative decoder also helps in practice since it avoids more complicated reasoning, e.g., to identify definite positives by process of elimination. This figure compares known lower bounds for the two problems in the noiseless large-n setting (i.e., α = 0, β = 1, and n → ∞) for a range of fixed prevalences p (sometimes called the linear regime 33 ). As one would naturally expect, the lower bound (counting bound 33 ) for problem 1 is lower than the lower bound 32 for problem 2, since problem 1 is less constrained. For problem 1, fully-adaptive methods are nearly optimal, as was already known 33 . For problem 2, HYPER appears to be fairly close to optimal. The gap visible for smaller prevalences is likely due to: a) our additional constraint that q ≤ 3 (to aid real-life implementation), b) looseness in our upper bound for HYPER, or c) potential looseness in the lower bound 32 .

Supplementary Figure 4: Comparison of P-BEST with additional HYPER designs.
We expand the comparison of HYPER and P-BEST in Fig. 2. To the previous H_{384,32,2} HYPER design, we add an H_{384,48,2} design (with the same number of pools m = 48 as P-BEST) and an H_{384,16,2} design (with the same pool size nq/m = 48 as P-BEST). Since H_{384,48,2} has the same number of pools as P-BEST, its efficiency at low prevalence is similar. However, its pools are one-third the size, helping it achieve a higher sensitivity. The H_{384,16,2} design has the same pool size as P-BEST, and a comparable sensitivity for much of the 50-day window highlighted. However, it has one-third as many pools, giving it an initial efficiency roughly three times higher. As before (Fig. 2), the efficiency of HYPER declines for both designs as prevalence grows, eventually falling below the constant efficiency achieved by P-BEST around day 80. Likewise, as before, the sensitivity of HYPER grows around the same time, while P-BEST significantly loses sensitivity. As before, average values of efficiency (relative to individual testing) and sensitivity of the various pooling designs are shown for each day, with results averaged across 200,000 random trials. For sensitivity, raw averages are shown as dots with degree-8 polynomial fits overlaid as curves; the curves for efficiency depict raw averages.

For both n = 96 (a) and n = 384 (b) individuals per batch, the average efficiency and sensitivity of both random assignment and double-pooling are generally similar to their corresponding HYPER designs. However, it turns out that these random designs behave less consistently than HYPER. Their performance depends on which individual happens to be positive, which is undesirable from a laboratory standpoint. This aspect is obscured by the averages in this figure, and we investigate it separately in Supplementary Fig. 7. Compared with the balanced variants of the plate-based arrays, HYPER uses fewer pools and is roughly 25% more efficient for much of the 50-day window highlighted. HYPER also has correspondingly larger pool sizes, and is slightly less sensitive. Note that the 10 × 10 and 20 × 20 arrays here are, respectively, the smallest square arrays that can accommodate n = 96 and n = 384 individuals without placing multiple individuals in the same array cell. Forming balanced arrays with the same number of pools as HYPER (i.e., m = 8 and m = 16) requires extending the design to assign multiple individuals to some array cells, and we consider these designs in Supplementary Fig. 6.

Supplementary Figure 6: Comparison with balanced array and code-based designs with matching parameters. Since plate-based arrays and P-BEST are particular instances of general array-based and code-based designs, we also compare HYPER with general balanced arrays (a,b) and Reed-Solomon Kautz-Singleton (RS-KS) code-based designs (c,d), with matching design parameters (number of individuals n, number of pools m, pools per individual q). We use the extended versions of these designs described in Sections S8 and S9; unextended versions do not accommodate more than (m/2)² individuals when q = 2. The sensitivity of these methods closely matches that of HYPER when the pooling parameters are the same. Sensitivity depends significantly on the number of individuals per pool, which is nq/m for all three methods since they all have balanced pools. HYPER has either similar or better efficiency than both methods, with a larger improvement arising for more aggressive pooling parameters that use fewer pools and are more efficient at low prevalence.
Supplementary Figure 7: Impact of imbalanced designs. We investigate how balance affects how much efficiency and sensitivity vary depending on where positive individuals happen to fall. We consider designs with n = 96 individuals per batch, m = 16 pools and q = 2 splits, and we suppose exactly one individual is positive in each batch. That individual's viral load is drawn from the distribution of nonzero viral loads on day 80 of the simulated epidemic from Fig. 2 (for which the prevalence is roughly 1.06%); everyone else has zero viral load. We consider placing the positive individual in each of the n slots of the batch (i.e., as individual 1, individual 2, etc.), and compare the average values (from 100,000 trials) for efficiency (relative to individual testing, i.e., individuals/test) and for sensitivity as functions of the location. Random assignment 1 (a-c) has varying performance depending on which particular design gets drawn, and often has unbalanced pool combinations and pools. As a result, all three draws had uneven efficiency and sensitivity; e.g., for the first draw, the sensitivity was 75.7% for a positive individual in location 11 but 72.2% for location 7. Double-pooling 2 (d-f) guarantees balanced pools but often has unbalanced pool combinations. As a result, all three draws had uniform sensitivity but uneven efficiency. Note that uneven efficiency means that the stage 2 workload is more unpredictable, which can make the logistics of large-scale screening more difficult. Consecutive pooling (AB, CD, EF, ...; g) only uses a subset of the $\binom{m}{q}$ possible pool combinations and was generally less efficient. It does, however, have balanced pools, yielding uniform sensitivity regardless of where the positive individual fell. Lexicographic pooling (AB, AC, AD, ...; h) has balanced pool combinations, yielding uniformly high efficiency. However, it has unbalanced pools, resulting in uneven sensitivity. Balanced array pooling (i) and Reed-Solomon Kautz-Singleton (RS-KS) code-based pooling (j) both had balanced pools here, and each was maximally balanced on the (m/2)² pool combinations it used. However, neither was perfectly balanced on the pool combinations it used; in both cases, some pool combinations were used twice while others were used once. Both had uniform sensitivity and slightly uneven efficiency. In contrast, HYPER (k) has both balanced pool combinations and balanced pools. As a result, it had both uniformly high efficiency and uniform sensitivity. Moreover, its median efficiency (5.68 individuals/test) and median sensitivity (74.4%) were generally among the best. The boxplots include a center line (median), box limits (upper and lower quartiles), whiskers (1.5× the interquartile range), and points (outliers).

Supplementary Figure 8: Efficiency and sensitivity of pooled testing during a simulated epidemic (Fig. 2), repeated for a 25-fold reduction in the limit of detection (from 100 to 4). With this reduction in the limit of detection (LOD), smaller viral loads are detected, resulting in a higher individual testing sensitivity of roughly 95%. The sensitivity of the group testing methods increased concordantly, resulting in similar relative sensitivities as before. For example, the H_{96,16,2} design and the 8 × 12 array design have a 0.54 percentage point difference in sensitivity averaged across days 40-90 here, compared with a 0.24 percentage point difference before (Fig. 2).
The sensitivity loss with respect to individual testing is now slightly smaller; e.g., the sensitivity of the H_{96,16,2} design is roughly 9 percentage points lower here, compared with a roughly 10 percentage point difference before (Fig. 2). In general, an increase in sensitivity can lead to a decrease in efficiency, since more pools (correctly) test positive, but the impact appears to be small here. Efficiency for the various methods is very similar to before (Fig. 2), with HYPER enjoying essentially the same gains in efficiency over individual testing.

Supplementary Figure 9: Efficiency and sensitivity of pooled testing during a simulated epidemic (Fig. 2), repeated with a two-wave epidemic. To evaluate a later (post-exponential) phase, we repeated our simulation but with a population undergoing a sustained, two-wave epidemic (obtained from Cleary and Hay et al. 1 ). The simulated population was generated from an SEIR model with transmission rate modified at two time points: R_0 was initiated at 2.5 at day 0, decreased to 0.8 at day 80, and subsequently increased to 1.5 at day 150. The result is an epidemic with an initial wave, followed by a decline phase and subsequently another growth phase (a). We compare the efficiency and sensitivity of the same methods as before (c,d,e,f). Consistent with earlier studies 1 , sensitivity is generally lower for all methods (including individual testing) during the decline phase compared to the two growth phases. For example, compare day 104 (during decline) with days 82 and 173 (during growth), which all had prevalence around 1.0%. The difference in sensitivity can be explained by observing that the distribution of nonzero viral loads (b) on day 104 is shifted to the left relative to days 82 and 173, due to a shift away from recent infections. Hence, positive viral loads are more likely to fall below the limit of detection and get missed. At the same time, the relative performance of the methods is similar to before. The chosen HYPER designs are as sensitive as their corresponding array designs, while being roughly 20-25% more efficient for n = 96 and roughly 10-25% more efficient for n = 384. For n = 384, the HYPER design is generally more efficient than P-BEST (up to 50% more) until day 180 while also being generally more sensitive. After day 180, P-BEST is more efficient but at an additional cost of sensitivity.

Supplementary Figure 10: Detailed comparison corresponding to Fig. 3. Each cell shows the effective screening capacity given constrained sample collection and testing budgets, for individual testing, HYPER (best n and m in annotation, with q indicated by bar color), plate-based arrays 4 (better design between 8 × 12 and 16 × 24 noted in white), and P-BEST 3 . The best effective screening capacity for each cell is shown in black.

Supplementary Figure 11: Comparison of pooling methods under resource constraints (Fig. 3 and Supplementary Fig. 10), repeated for day 53 (prevalence of 0.1%). At this low prevalence, HYPER is the most effective across the grid of resource constraints and can be significantly so. The low prevalence setting also favors simple designs (low q) across more of the test-constrained regime.

Supplementary Figure 13: Comparison of pooling methods under resource constraints (Fig. 3 and Supplementary Fig. 10), repeated for day 83 (prevalence of 1.36%). At this prevalence, HYPER remains best across all scenarios. Compared to Supplementary Fig. 12, designs with higher q are even more often the most effective.
Supplementary Figure 17: Comparison with balanced array designs under resource constraints for day 53 (prevalence of 0.1%). HYPER was about as effective as, or substantially more effective than, balanced arrays across the entire grid of resource constraints (c, d). Both methods were optimized over the same sweep of n, m, and q (Table 2). In important testing-constrained scenarios (with sample collection budget significantly outstripping testing budget), HYPER was up to 38% more effective than balanced arrays (e.g., for a budget of 384 samples and 12 tests; a). In some cases, HYPER was more effective because its greater efficiency enabled it to use more aggressive design parameters (a). In other cases, the best design parameters for both methods were the same, but HYPER was more effective with the same parameters (e.g., for a budget of 1536 samples and 24 tests; b).