Using spectral characterization to identify healthcare-associated infection (HAI) patients for clinical contact precaution

Healthcare-associated infections (HAIs) are a major problem in hospital infection control. Although HAIs can be suppressed using contact precautions, such precautions are expensive, and we can only apply them to a small fraction of patients (i.e., a limited budget). In this work, we focus on two clinical problems arising from the limited budget: (a) choosing the best patients to be placed under precaution given a limited budget to minimize the spread (the isolation problem), and (b) choosing the best patients to release when limited budget requires some of the patients to be cleared from precaution (the clearance problem). A critical challenge in addressing them is that HAIs have multiple transmission pathways such that locations can also accumulate ‘load’ and spread the disease. One of the most common practices when placing patients under contact precautions is the regular clearance of pathogen loads. However, standard propagation models like independent cascade (IC)/susceptible-infectious-susceptible (SIS) cannot capture such mechanisms directly. Hence to account for this challenge, using non-linear system theory, we develop a novel spectral characterization of a recently proposed pathogen load based model, 2-Mode-SIS model, on people/location networks to capture spread dynamics of HAIs. We formulate the two clinical problems using this spectral characterization and develop effective and efficient algorithms for them. Our experiments show that our methods outperform several natural structural and clinical approaches on real-world hospital testbeds and pick meaningful solutions.


Heterogeneous contact networks
G is a set of time-varying heterogeneous contact networks -it is sequence of graphs G 1 , G 2 , . . ., G T with T number of timesteps.We focus on unweighted and undirected graphs; hence each G t is represented as an N × N symmetric binary adjacency matrix A A A t s.t.A A A t (i, j) = 1 if nodes i and j are connected with an edge at time t, A A A t (i, j) = 0 otherwise.Note that our node-set is shared across all graphs and can represent people (patients and HCWs) as well as locations (patient rooms, nurses' stations, etc.).
As mentioned earlier, contact networks are not explicitly available in EHR data.Instead, we infer them from various tables, such as Flowsheets, Medicine Administration, Admission Discharge, and doctor's notes.These tables specify specific events that occurred during the patient's stay in the hospital, e.g., admission into the hospital or ICUs, administration of different kinds of medications.One can infer the exact location of the patient, and when a HCW enters the patient room, which we use to infer co-location.The contact network is constructed using this co-location information.

2-MODE-SIS model
In this work, we use 2-MODE-SIS model 25 to simulate the spread of MRSA pathogens.Procedure 1 shows the simulation steps of the model.The meaning of notations and parameters are listed in Supplementary Table 1 and Supplementary Table 2.In the contact networks, each node carries some amount of pathogens that changes over time, and the exchange of pathogens among the nodes is driven by edges that imply close contact among the nodes.2-MODE-SIS model uses τ i jt to denote the ratio of pathogen being transferred (or remaining if i = j) from node j to node i at time t, and φ i to denote the natural pathogen reduction rate on node i.By constructing a transfer matrix R R R t from the transfer ratios τ i jt , reduction rates φ i , and adjacency matrix A A A t , the pathogen load updates can be written as a linear operation as in Procedure 1 step 5, where x x x t is the infection state vector at time t and α is pathogen shedding rate for infected patients.Note that we restrict the column-sums of R R R t to be less than or equal to 1, which implies that the total amount of pathogen cannot increase after transfer (i.e., ∥R R R t l l l t ∥ 1 ≤ ∥l l l t ∥ 1 ).

Supplementary Table 1. List of notations
2-MODE-SIS is pathogen 'load-based' where each patient can be either susceptible or infected, and the model keeps track of pathogen load on all people and locations.Such pathogen load based models are widely used in HAIs modeling 7,[26][27][28][29][30][31] .Besides, they can also model the cleaning practice widely used in contact precautions (e.g., using disinfectants to clean the ward) directly 26,29,30,32,33 .Therefore, compared with traditional Independent Cascade (IC)/Susceptible-Infectious-Susceptible(SIS) models, 2-MODE-SIS model is more suitable for HAIs modeling.Unlike classical SIS models 34 , in 2-MODE-SIS model the infection probability of a patient increases with the amount of pathogen on the patient, which is formulated as a dose-response function f (•) in Procedure 1 step 8. Once infected, the patient sheds α additional pathogen per timestep to his own load, which can later be transferred to neighbors (both people and locations) via edges; this shedding continues until the patient recovers.For susceptible patients, they may still be colonized with the pathogen loads and spread them to others.As for HCWs and locations that are assumed to be non-infectable, they still act as pathways of pathogen transfer.

5:
Update loads l l l t+1 = R R R t l l l t + αx x x t .

6:
for each patient i do 7: if i is susceptible at time t (i.e., x x x t (i) = 0) then 8: i gets infected (i.e.x x x t+1 (i) = 1) with probability f (l l l t (i)) = min{1, β l l l t (i)}.

Approximation via NLDS
In this work, we develop a characterization that captures the overall dynamics of the system and derive a spectral characterization for the 2-MODE-SIS model.We will also show in later sections how it helps formulate the clinical problems.
Our proposed approach is to approximate the propagation dynamics of the 2-MODE-SIS model using a Non-Linear Dynamical System (NLDS) with two coupled states: the pathogen loads l l l t and the infection probabilities p p p t .For simplicity, assume that the system periodically operates on two pathogen transfer matrices, R R R 1 for odd timesteps and R R R 2 for even timesteps.Hence, the time-varying contact network will be Note that the following analysis can easily be generalized toward any arbitrary number of repeating pathogen transfer matrices.Given this assumption, the state transition 2/29 updates for odd timesteps are (1) where 1 (i,2t) cure and 1 (i,2t) load are indicator variables of the event at time 2t that an infected patient i is cured (with probability δ ) and the event where patient i becomes infected (with probability β l l l 2t (i)), respectively.In Equation 2, the first term vanishes when i is susceptible, and if i is infected instead, the second term vanishes.
From linearity of expectation, we get: Given the current state at time 2t, it is reasonable to assume that 1 (i,2t) load and x x x 2t (i) are independent as the two random variables are correlated indirectly through the past infection states.This allows us to use E[1 As a sidenote, this independence holds when patient i has been susceptible at time t ′ for all t ′ < 2t.Let s s s 2t = p p p 2t l l l 2t ∈ R (P+N) be the vector describing the state of the system at time 2t.Combining Equations 3 and 4 allows us to write the transition at odd timesteps as where g 1 is a non-linear mapping.Similarly, for even timesteps, we get (with non-linear mapping g 2 )

Stability analysis of the NLDS
Equipped with the non-linear equations of our approximated NLDS, we can then analyze its asymptotic behavior.The long-term behavior of a dynamical system is dictated by the stability of its equilibrium points, at which the states no longer change over time (i.e., s s s eq = g(s s s eq )).The particular equilibrium point of interest is the all-zero point 0 0 0 where there is no infection as well as no pathogen remaining.Clearly, both g 1 and g 2 map 0 0 0 onto itself and 0 0 0 is an equilibrium point.We aim to find under what condition this equilibrium point is 'stable' 24 (i.e. a small perturbation does not cause large deviations).We use the next claim on asymptotic stability in a discrete-time NLDS.
Lemma 1. Assume a NLDS with discrete-time updates s s s t+1 = g(s s s t ) has an equilibrium point at s s s eq .Then, the point s s s eq is asymptotically stable if the spectral radius (i.e., the largest eigenvalue) of the Jacobian matrix J J J = ∇g = ∂ g i ∂ s s s t ( j) i j evaluated at s s s eq is less than 1 35 .
Specifically, in the system with two update steps g 1 and g 2 , each of which the Jacobian matrices evaluated at 0 0 0 are The following lemma shows that the long-term behavior can be characterized by computing the spectral radius of a so-called system matrix, defined as the product of individual Jacobian matrices S S S 2 and S S S 1 .This result is derived by viewing the system as g(s s s 2t ) := g 2 (g 1 (s s s 2t )) and applying the chain rule to its Jacobian.Lemma 2. If the spectral radius (i.e., the largest eigenvalue) of the system matrix S S S = S S S 2 S S S 1 is less than 1 (i.e.ρ(S S S) < 1), then the all-zero equillibrium of the NLDS is asymptotically stable.
Proof.Consider another system that uses the same update rules, but takes two timesteps at once.This system can be written as s s s 2t+2 = g(s s s 2t ) where g(s s s 2t ) := g 2 (g 1 (s s s 2t )).These two systems are equivalent, and the Jacobian of g evaluated at 0 0 0 can be written as a product two Jacobians evaluated on the same point.
The first equality is due to chain rule, and the second comes from the fact that 0 0 0 is an equilibrium point (i.e., s s s 2t = 0 0 0 implies s s s 2t+1 = 0 0 0).Then, applying Lemma 1 to Equation 5 completes the proof.
As mentioned before, this argument can be generalized to any arbitrary number of repeating pathogen transfer matrices (i.e, from only repeating R R R , which leads to our main result. Theorem 1 (Spectral characterization).If the spectral radius (i.e., the largest eigenvalue) of the system matrix S S S = ∏ T t=1 S S S T −t+1 is less than 1 (i.e.ρ(S S S) < 1), then the all-zero equillibrium of the NLDS is asymptotically stable.
Here we show an example simulation results on the LYON hospital dataset with synthetic parameters to demonstrate our characterization.We vary ρ(S S S) with the shedding-rate α.Supplementary Fig 1(a) shows the varying infection trajectories as well as the corresponding ρ(S S S) values.When ρ(S S S) > 1, the system reaches an endemic state with nonzero infections, whereas when ρ(S S S) < 1, we see fast decline towards the zero-infection state.Supplementary Fig 1(b) displays the threshold on ρ(S S S) at which the dynamics of the system change.The endemic state footprint, measured as the average infected fraction across the last 50 timesteps, stays at 0 until ρ(S S S) reaches 1 as expected from Theorem 1.As soon as ρ(S S S) exceeds 1, the footprint suddenly takes off to a nonzero infection fraction, indicating a shift from gradual extinction to a long-lasting epidemic.Although this ρ(S S S) is different from the basic reproductive number R 0 for the 2-MODE-SIS model since the computation of ρ(S S S) also takes the pathogen loads l l l t into consideration.However, they are highly related and share the same tipping point (ρ(S S S) = 1 vs. R 0 = 1).

Explaining counter-intuitive behaviors
We also show that our characterization correctly captures and helps explain counter-intuitive behaviors of pathogen load-based propagation which have been observed before, specifically the so-called 'dilution effect' 25,36,37 .In Supplementary Fig 2 .we show the simulation results on static Erdős-Rényi random graphs with increasing link probability p.The transfer ratios τ i jt for i ̸ = j and φ i are fixed, such that the only difference between the curves are the number of edges in the graph and the proportion of pathogen remaining in each node after transfer (τ i jt for i = j).As p increases, the number of edges are also increasing in this situation.However, we observe that the number of infections at the end are decreasing with increasing p.This is surprising, because this goes against our intuition that more contacts lead to more infections.This also contrasts with previous Independent Cascade/Linear Threshold/SIR style models where adding an edge leads to more infections 38 .However, our ρ(S S S) correctly captures this dynamics.The reason is that with more edges in the graph, more pathogen is sent out and less remains, which means lower τ i jt for i = j.Hence, the diagonal elements of S S S decreases, and ρ(S S S) also decreases.

Using ρ(S) to formalize clinical problems
Here, we show how our spectral characterization ρ(S S S) can help formulate the two clinical problems IP and CP.The underlying intuition is that our spectral characterization well captures the transition dynamics of the HAIs breakout, and a lower ρ(S S S) value should correspond to fewer infections.We first formalize the IP problem as follows: Formal Isolation Problem (FIP).Let S S S(Θ, P) denote the system matrix given input parameters Θ and patients in set P under precaution.Given all patients X and a budget constraint k, find the subset of patients P * for precaution s.This formulation uses ρ(S S S(Θ, P)) to capture how isolating patients in P will affect the epidemic.Here, we formally model bringing a patient i under precaution by zeroing out off-diagonal terms in the i-th row and column in each pathogen transfer matrix R R R t (i.e., cut off its links with other nodes), and reducing the diagonal terms R R R t (i, i) by a multiplicative factor φ ppe < 1 (i.e., reduce its remaining pathogen via cleaning).They are widely used together in hospitals for contact precautions 9,21,39 .Note that the current setup can be readily modified to capture other contact precaution practices while maintaining a similar problem formulation.Similarly, we can also formalize the CP problem: Formal Clearance Problem (FCP).Let S S S(•) be defined as in FIP.Given an initial set Q of d candidate patients already under precaution (|Q| = d) and a budget constraint k < d, find the subset of patients P * for clearance s.t.P * = arg min P ρ(S S S(Θ, Q \ P)) subject to P ⊂ Q and |P| = k Algorithms Unfortunately, both formal problems are NP-hard.This can be seen via a reduction from the well-known Independent Set problem 40 or a more similar Nodal Spectral Radius Minimization (NSRM) problem 41 , both known to be NP-hard.For brevity, we defer detailed proofs later.However, computational hardness is not the only challenge.The optimization process itself can also incur numerical issues due to finite precision.We first discuss this issue in detail and develop practical algorithms for FIP and FCP via surrogate goals, which sidestep this issue.

Numerical issue
A natural algorithm for FIP or FCP would try adding (or removing) each candidate patient P under (or from) precaution and then compute by how much ρ(S S S) changes to incrementally build a feasible solution set while minimizing ρ(S S S).In practice, however, we find that computing the change in ρ(S S S) due to a single node removal runs into numerical issues: If there are much more patient-to-non-patient interactions than patient-to-patient interactions, the change in ρ(S S S) w.r.t.P can be too small to be computed within finite precision.The issue even gets worse when the total number of nodes N and the number of timesteps T increase.Justification: To see why, let us decompose the pathogen transfer matrix R R R t in S S S t into rows corresponding to patients (who can be infected) and rows corresponding to non-patients (who cannot be infected).Then the R R R t,pp submatrix of R R R t encodes the patient-to-patient interactions at time t, R R R t,ph encodes the patient-to-non-patient interactions at time t, and so on.Note that when entries in R R R t are much smaller than (1 − δ ) and α (as in real parameter settings), the spectrum of S S S t is mostly governed by the upper-left submatrix of S S S t (enclosed in red square).This can be seen by viewing R R R t,ph and R R R t,hp as small perturbations added to a block-diagonal matrix of which the eigenvalues equal those of the upper-left submatrix in red, and R R R t,hh .Hence if the majority of interactions are between patients and nonpatients rather than among patients, most changes will occur in R R R t,ph and R R R t,hp , but these changes due to link removal can be overshadowed and lead to a change in ρ(S S S) too small to be captured under finite precision.We observe that this is actually the case in our real-world testbeds: Only 0.005% of contacts are among patients in UVA-PRECOVID dataset, and 0.004% in UVA-COVID dataset.

Minimize the upper bound of ρ(S)
Hence we propose an alternate strategy to solve FIP and FCP by minimizing an upper bound on ρ(S S S) that can be computed more precisely.In particular, we define a system-adjacency matrix as A A A := ∏ T t=1 A A A T −t+1 , and minimize its largest eigenvalue ρ(A A A) as a surrogate to minimize the upper bound of ρ(S S S).The following theorem shows that ρ(S S S) is upper bounded by a monotonic function of ρ(A A A), justifying our approach of minimizing ρ(A A A) instead.
Theorem 2. Let S S S, A A A denote the system and system-adjacency matrices of 2-MODE-SIS model with T number of timesteps.Then ρ(S S S) ≤ f (ρ(A A A)), where k , and τ max = max i̸ = j,t {τ i jt }.Further, f (x) is a monotonic increasing function in x.
Proof.Lemma 3 first states a general upper bound on the spectral radius of any matrix M M M structured in a particular way.This Lemma can be proven using Schur complements and the interlacing theorem on a principal submatrix.
Lastly, note that M M M is a principle submatrix of M M M ′ .Since M M M ′ is non-negative, applying interlacing theorem of principle submatrix 42 leads to ρ(M M M) ≤ ρ(M M M ′ ) and this proves the lemma.
Using Lemma 3 and the fact that S S S is structured similarly to M M M, Lemma 4 shows that ρ(S S S) is upper bounded by a monotonic function of ρ(R R R), where R R R denotes an intermediate system-transfer matrix R R R := ∏ T t=1 R R R T −t+1 .Lemma 4. Given S S S, R R R and g(•) as before, then Proof.Say we have T = 2.

S S S 2 S S S
For T > 2, repeatedly multiplying and omitting high-order terms leads to following approximation: This approximation has the exact structure discussed in Lemma 3. Applying the Lemma proves the claim.
Note that having less non-infectable nodes leads to a tighter upper bound.Hence, starting from Lemma 4, we can obtain an upper bound on ρ(S S S) in terms of ρ(A A A) as such: The second inequality is due to monotonicity of g(•) with respect to the first input, and ρ(R R R) ≤ Cρ(A A A) + 1 from Lemma 5.
Expanding the last term leads to the desired result.
Next, Lemma 5 shows that ρ(R R R) is also upper bounded by a monotonic function of ρ(A A A) (which contains only zeros and ones) by viewing R R R as a Hadamard product of a constant matrix times A A A, plus some diagonal terms.Lemma 5. Given R R R, A A A as before, ρ(R R R) is upper bounded by a monotonic function of ρ(A A A): ρ(R R R) ≤ Cρ(A A A) + 1 (C and τ max as given in Theorem 2).
Proof.Note that the individual transfer matrices R R R t are formed by a Hadamard product plus diagonal terms: R R R Here T T T t stores the transfer rates [T T T t ] i, j = τ i jt , and D D D t are the additional diagonal terms.Then, the product R R R can be written as where the inequalities are element-wise.Then using Weyl's inequality leads to the desired upper-bound on ρ(R R R).
Theorem 2 then follows from Lemma 4 and Lemma 5 due to monotonicity of the upper bounds.
We observe that the upper bound follows our intuition with respect to individual parameters: A large α,β or small δ leads to a large upper bound, implying possibility of a large ρ(S S S).Increasing ρ(A A A) also increases the upper bound, but how much it increases depends on τ max or how much pathogen can be transferred through a single contact.Leveraging Theorem 2, we propose to solve the following two surrogate problems: Surrogate Isolation Problem (SIP).Let A A A(P) denote the system-adjacency matrix with patients in set P under precaution.Given all patients X and the budget constraint k, find a subset of patients P * for precaution s.t.P * = arg min P ρ(A A A(P)) subject to P ⊂ X and |P| = k.

Final algorithm
Fortunately, ρ(A A A) does not suffer from the same numerical issue as ρ(S S S) does.However, the two surrogate problems are also NP-hard.Hence we propose a simple yet effective heuristic algorithm, GREEDY-SPECTRAL, for SIP and SCP: Starting from an empty set, we iteratively find and add a patient for isolation (or releasing) that leads to the smallest ρ(A A A) value until the size of the set reaches the given budget.Note that in each iteration of GREEDY-SPECTRAL, we need to recalculate the change in ρ(A A A) after isolating or releasing every node, requiring O(kN) number of ρ(A A A)-computations.Each ρ(A A A)-computation can be time-consuming when done naively through computing and storing the full dense A A A and then computing its spectral radius, which takes O(N 2.37 T ) time 43 .Instead, we leverage the sparsity of individual A A A t , and use power iterations to compute ρ(A A A) without explicitly constructing A A A. The following Lemma 6 states the computational cost of this approach.Lemma 6. Computing ρ(A A A) with the Arnoldi method using implicit linear operator A A A op requires O(∑ T t=1 nnz(A A A t )) time and O(N) space per iteration, where nnz(A A A t ) denotes the number of nonzeros in A A A t .
Proof.Since the Arnoldi method only requires a function that computes matrix-vector products with the query matrix, we can use a linear operator A A A op : x x x → A A A T • • • A A A 1 x x x that does not explicitly compute A A A and then apply to x x x, but rather applies A A A 1 , A A A 2 , up to A A A T one-by-one to the input vector x x x.
Hence, when searching for the largest-magnitude eigenvalue of A A A, each i-th iteration of the Arnoldi method first computes the matrix-vector product q q q = A A A op (v v v i ) and then normalizes the result v v v i+1 = q q q ∥q q q∥ .Thus, each iteration only needs to store a length-N vector, which leads to the O(N) space complexity.The time complexity is dominated by the cost of computing A A A op (v v v).Within A A A op , computing the sparse matrix-vector product A A A t v v v requires O(nnz(A A A t )) time: each term in A A A t is multiplied by an entry in v v v and added to the output vector only once.Therefore, aggregating the cost of T sparse matrix-vector products in A A A op leads to O(∑ T t=1 nnz(A A A t )) time.
When using the Arnoldi method, the overall computational cost of GREEDY SPECTRAL can be reduced down to O(kn ∑ T t=1 nnz(A A A t ))-time and O(N)-space complexity as shown in Supplementary Fig 3 .Additionally, the change in ρ(A A A) after adding each node to the solution set can be computed in parallel for further speedup.

Experiemnts setup
Datasets: We evaluate the performance of GREEDY-SPECTRAL on various real datasets (see Supplementary Table 3, the meaning of notations are listed in Supplementary Table 1).On each dataset, we run 1000 simulation runs and report the average.For UVA datasets, we run the simulation till 294 timesteps (covering the entire given time period).This dataset contains a large suite of clinical metadata from the Epic-based SQL database at the University of Virginia (UVA) hospital.It also contains the weekly number of new tested positive MRSA cases in the hospital, and a rich set of features for each patient.From the dataset, we construct two time-varying heterogeneous contact networks, one representing 294 days before COVID-19 (May 2019 to Feb 2020), referred as the UVA-PRECOVID network, and another representing 294 days since the pandemic (May 2020 to Feb 2021), referred as the UVA-COVID network.The networks are derived from inpatient Data, doctor's Notes, and MedAdmin Data, which record the time and location of interactions between patients and HCWs.We aggregate these into daily networks such that two nodes are connected in G t if they have at least one contact at day t.
Besides, we also use a smaller dataset, LYON, to showcase the tipping point as in Supplementary Fig 1 .This dataset is collected using wearable RFID sensors 44,45 and contacts were recorded when people are in close proximity.Similarly to UVA, we construct G t if they have at least one contact at day t.Specifically, we use this dataset to show that our GREEDY-SPECTRAL algorithm for minimizing ρ(A A A) is also effective in minimizing ρ(S S S).However, this dataset doesn't contain the number of new tested positive MRSA cases, so we will use synthetic parameters in the experiments.Baselines: We compare GREEDY-SPECTRAL algorithm against the following popular methods: • RANDOM: Randomly pick k patients.
• PROPDEGREE: Pick k patients with the probability of picking i proportional to the sum of the nonzero terms of i in R t (intuitively, the degree of i) over all timesteps (i.e., p ∝ ∑ t ∑ j 1(R t ( j, i))).
• SHORR: Pick k patients with the highest so-called Shorr scores 46 ; Shorr score was developed to assess the clinical risk of an admitted patient acquiring MRSA, and variables used include the history of recent hospitalizations or kidney dialysis.This requires access to EHRs and is not applicable to the LYON dataset.Note that Shorr score is one of the classical clinical criteria used to pick patients for MRSA contact precaution 46 , we compare with SHORR to showcase that our GREEDY-SPECTRAL is more effective than such usual practices using patient-level features.
• OPTIMALSPECTRAL: Pick the optimal k patients that minimize ρ(A A A) by testing all possible combinations.However, this method is not scalable on large networks and we only get the results on the LYON testbed.

8/29
Evaluation metrics: We use the following metrics to quantify the performance in suppressing MRSA outbreak.
• η ρ(A A A) : Ratio of ρ(A A A) with P under precaution against ρ(A A A) when k = 0 (no patient under precaution) (i.e.η ρ(A A A) = ρ(A A A(P)) ρ(A A A( / 0)) ).Lower is better.
• γ CASES : Ratio of the number of new MRSA cases against k = 0. (e.g.γ CASES for PROPDEGREE is CASES PROPDEGREE CASES k=0 , where CASES PROPDEGREE and CASES k=0 are the number of new MRSA cases of PROPDEGREE and k = 0 respectively).Lower γ CASES is better.
• γ LOADS : Ratio of non-patient (HCWs and location) pathogen loads against k = 0. (e.g.γ LOADS for PROPDEGREE is , where LOADS PROPDEGREE and LOADS k=0 are the sum of HCWs pathogen loads and location pathogen loads of PROPDEGREE and k = 0 respectively).Lower γ CASES is better.
• PROB[CASES > TARGET]: Probability that the number of new MRSA cases is larger than TARGET.(e.g.PROB[CASES > 500] is the probability that the number of new MRSA cases is larger than 500).Lower PROB[CASES > TARGET] is better.

Additional results
First, we show how GREEDY-SPECTRAL performs in minimizing ρ(A A A) for Isolation problem.As seen in Supplementary Fig 4, our algorithm achieves lower η ρ(A A A) than RANDOM, PROPDEGREE, and SHORR on both UVA-PRECOVID and UVA-COVID dataset, as isolation budget k increases.This demonstrates the effectiveness of our algorithm in minimizing ρ(A A A).
Next, we show how GREEDY-SPECTRAL performs in minimizing ρ(A A A) for Clearance problem.As seen in Supplementary Fig 5, our algorithm achieves lower η ρ(A A A) than RANDOM, PROPDEGREE, and SHORR on both UVA-PRECOVID and UVA-COVID dataset, as clearance budget k increases.This demonstrates the effectiveness of our algorithm in minimizing ρ(A A A).

High precaution and cleaning scenario
Additional isolation problem results: Here, we first show more experiment results for high precaution and cleaning scenario (main article  7(a), the ratio of non-patient pathogen loads γ LOADS is always lower for our algorithm than other baselines.For example, when k = 4000, it achieves 9.2% (10.8%) lower loads than SHORR on UVA-PRECOVID (UVA-COVID).This indicates that our algorithm is more effective in reducing the non-patient pathogen loads than other baselines.However, in Supplementary Fig 7(b), we observe that it still gives a similar ratio of new MRSA cases γ CASES as other baselines.Similarly, in Supplementary Fig 7(c), it also gives similar PROB[CASES > TARGET] for varying TARGET (x-axis) as other baselines.

Isolation problem results:
First, we show more experiment results for relaxed precaution and cleaning scenario with fixing α such that the number of cases is 1000 when k = 0 but changing the k (main article Fig 4) in Supplementary Fig 8.
Next, we will show more results by fixing k but changing α.Supplementary Fig 9 shows the results with varying α but fixing k.As shown in Supplementary Fig 9, GREEDY-SPECTRAL achieves lower non-patient pathogen loads and number of new MRSA cases than other baselines.For example, for α such that the number of cases is 900 when k = 0 in UVA-COVID (bottom row), it leads to 12.6% less loads in Supplementary Fig 9(a

Performance of GREEDY-SPECTRAL in minimizing ρ(S)
Supplementary Fig 12 shows the effectiveness of minimizing ρ(S S S) by minimizing ρ(A A A) using GREEDY-SPECTRAL.In Supplementary Fig 12, we compare performance between the result of our algorithm in minimizing ρ(A A A) and OPTIMALSPECTRAL method in minimizing ρ(S S S) directly.Here, we perform the experiments on LYON dataset with synthetic parameters since the LYON dataset is the only dataset scalable for OPTIMALSPECTRAL while having no ground-truth number of cases for calibration.As shown in Supplementary Fig 12, we can see the two methods achieve similar performance in η ρ(S S S) (note that the two lines are overlapping), which demonstrates the effectiveness of our GREEDY-SPECTRAL algorithm in minimizing ρ(S S S).

SIP and FIP are NP-hard
Proof.Note that the Nodal Spectral Radius Minimization (NSRM) problem 41 is a special case of SIP where all nodes are infectable and there is only one timestep (i.e., |V | = P and T = 1).This implies SIP is at least as hard as NSRM, and is thus NP-hard.
Next, we prove FIP is NP-hard by showing FIP is a generalization of SIP proven to be NP-hard above.For all t = 1, . . ., T , each system matrix S S S t equals adjacency A A A t with zero-padding when (α, β , δ ) = (0, 0, 1), τ i jt = 1 for all i ̸ = j, and τ i jt = 0 for all i = j.Given this setup, S S S encodes A A A within the bottom-right block.Below is an illustrative example with T = 2: Note that this block-structure enforces not only ρ(S S S) = ρ(A A A), but also ρ(S S S(P)) = ρ(A A A(P)) for any possible set of isolated patients P.This implies existence of an FIP instance that is equivalent to any given instance of SIP.Hence FIP is at least as hard as SIP, and is thus NP-hard.

SCP and FCP are NP-hard
Proof.We prove this by reducing the Independent Set (IS) problem to SCP.Given a graph G = (V, E), the goal of IS is to find a set of k ≤ |V | nodes that do not share an edge.Note that a graph with no links between nodes has the all-zero adjacency matrix and a spectral radius of 0. Thus given an instance of IS, we can construct an equivalent instance of SCP by using the same graph G with all nodes representing infectable patients initially under precaution (i.e., R = V ), and then aiming to clear out |V | − k patients from precaution.If the resulting graph with k patients still under precaution has spectral radius equal to 0, then the graph G contains an independent set of k nodes.If the smallest achievable spectral radius is larger than 0, then no independent set of size k exists.This implies SCP is at least as hard as IS, and is thus NP-hard.
Then leveraging the exact same setting as for SIP and FIP above, we can reduce an instance of SCP to an equivalent instance of FCP.Therefore, FCP is at least as hard as SCP, and hence FCP is NP-hard.

Parameter setup
As mentioned in the main article, for high precaution and cleaning scenario, we calibrate on the weekly number of new tested positive MRSA cases in UVA hospital to get the parameters for 2-MODE-SIS model.For calibration, we use the Approximate Bayesian Computation-Rejection scheme implemented in the ABCPY package 47 .The procedure first samples a parameter set Θ i from a user-defined prior, runs multiple simulations with Θ i , and then stores Θ i if the average simulation results are close to actual observations.The final list of Θ i 's then approximates the posterior distribution given actual data.When evaluating which Θ i to store, we use cumulative weekly counts of new MRSA infections and measure the RMSE error between simulated and actual values.Note that in actual data, only patients who were negative in a previous test but were positive in a later test count as new MRSA infections.
Here, we show the effectiveness of the calibration procedure from two perspectives: On the one hand, we use some synthetic parameters to simulate some synthetic MRSA case curves.We then calibrate on these synthetic curves to show that our procedure could "recover" synthetic parameters via calibration.On the other hand, we also calibrate on the real-world MRSA case curves, and show the simulated curves using our calibrated parameters align closely with the real-world curves.
In Supplementary Fig 13, we set up 5 different "scenarios" for synthetic experiments.In each scenario, we set some "ground truth" parameters and run simulations with these ground-truth parameters to generate synthetic MRSA cases curve.Next, we use these curves for calibration to get the calibrated parameters.If the calibrated parameters closely align with the ground truth parameters we initially set (i.e., we can "recover" these parameters through calibration), we can conclude that the calibration process is effective.
In Supplementary Fig 14, we show that the resulting calibrated parameters fit the model closely towards actual observations.We also list the value of calibrated parameters in Supplementary Table 4. Specifically, in this table, τ p2p refers to the τ i jt for any t where i and j are patients.Others are similarly defined.p, h, l represent patients, HCWs, and locations, respectively.

Lemma 3 .
Given non-negative scalars a, b, c, non-negative n × n matrix O O O, and (n + p) × (n + p) matrix M M M: I p denotes the p × p identity matrix with p < n.Then, for the spectral radius of M M M we have ρ(M M M) ≤ g(ρ(O O O), a, b, c), where g(x, a, b, c) = 1 2 a + x + (a + x) 2 − 4(ax − bc) .Further, g(x, a, b, c) is monotonic increasing w.r.t.x.Proof.Let M M M ′ denote a matrix that is almost the same as M M M, but does not have the zeros filling up the gap: M M M ′ = aI I I n bI I I n cI I I n O O O Then, using the Schur complement leads to det(M M M ′ − λ I I I) = det ((O O O − λ I I I)(aI I I − λ I I I) − bcI I I) = det ((a − λ )O O O − (λ (a − λ ) + bc)I I I) This means that λ is an eigenvalue of M M M ′ if and only if µ = λ (a − λ ) + bc a − λ is an eigenvalue of O O O.This is a quadratic equation of λ , of which the solution is

( 1 −
δ )I I I p β I I I p 0 0 0 αI Let A A A(•) be as in SIP.Given an initial set Q of d candidate patients already under precaution (|Q| = d) and a budget constraint k < d, find the optimal subset of patients P * ⊂ Q for clearance: P * = arg min P ρ(A A A(Q \ P)) subject to P ⊂ Q and |P| = k.

Fig 3 )
in Supplementary Fig 6.In the main article Fig 3, we only show the results of γ LOADS and γ CASES for k = 1000, 2000, 3000, 4000, 5000 using bars.In Supplementary Fig 6, we also add k = 500, 1500, 2500, 3500, 4500 results and show them using lines.Clearance problem results: Here, we show the experiments for the clearance problem (CP) under high precaution and and cleaning scenario.Supplementary Fig 7 shows how GREEDY-SPECTRAL performs in minimizing non-patient pathogen loads and the number of new MRSA cases.As shown in Supplementary Fig ) and 9.2% less cases in Supplementary Fig9(b).In Supplementary Fig9(c), our algorithm can always lead to less than 500 cases, while SHORR has more than 48.6% probability of having more than 500 cases.From Supplementary Fig9, we can conclude GREEDY-SPECTRAL algorithm is effective in suppressing the MRSA outbreak under the relaxed precaution and cleaning scenario for isolation problem.Clearance problem results:Similarly, we first show more experiment results for relaxed precaution and cleaning scenario with fixing α such that the number of cases is 1000 when k = 0 but changing the k (main articleFig 5) in Supplementary Fig10.Then we show more results by fixing k but changing α.Supplementary Fig 11 shows the results with varying α but fixing k.As shown in Supplementary Fig 11, GREEDY-SPECTRAL achieves lower non-patient pathogen loads and number of new MRSA cases than other baselines.For example, for α such that the number of cases is 700 when k = 0 in UVA-PRECOVID (top row), it leads to 16.8% less loads than other baselines in Supplementary Fig 11(a) and 11.2% less cases in Supplementary Fig 11(b).In Supplementary Fig 11(c), our algorithm has only 5.0% probability of having more than 500 cases, while other baselines have more than 90.5% probability of having more than 500 cases.FromSupplementary Fig 11 into consideration, we can conclude GREEDY-SPECTRAL algorithm is effective in suppressing the MRSA outbreak under the relaxed precaution and cleaning scenario for clearance problem.

11 / 29 (
a) Time vs. Infected fraction (b)   vs. Footprint Supplementary Fig. 1. 2-MODE-SIS simulations on the LYON testbed.(a) Time vs. infected fraction.Banded regions indicate one standard error from an average of 100 runs.The endemic infection size decreases as we reduce ρ(S S S) from 1.21 to 0.96, where the disease goes extinct.(b) ρ(S S S) vs. endemic state infection fraction in log-linear scale.The long-term dynamics shifts at ρ(S S S) = 1 (dotted line).

Table 2 .
Pathogen load vector at time t x x x t Infection state vector at time t p p p t Infection probability vector at time t (p p p t = E[x x x t ]) A A A t Adjacency matrix at time t R R R t Pathogen transfer matrix at time t ρ(M M M) Spectral radius (i.e., the largest eigenvalue) of matrix M M M nnz(M M M) Number of nonzeros in matrix M M M List of 2-MODE-SIS model parameters Natural pathogen reduction rate on node i τ i jt Transfer ratio from node j to node i at time t