Vulnerability of the British swine industry to classical swine fever

Classical swine fever (CSF) is a notifiable, highly contagious viral disease of swine which results in severe welfare and economic consequences in affected countries. To improve preparedness, it is critical to have some understanding of how CSF would spread should it be introduced. Based on the data recorded during the 2000 epidemic of CSF in Great Britain (GB), a spatially explicit, premises-based model was developed to explore the risk of CSF spread in GB. We found that large outbreaks of CSF would be rare and generated from a limited number of areas in GB. Despite the consistently low vulnerability of the British swine industry to large CSF outbreaks, we identified concerns with respect to the role played by the non-commercial sector of the industry. The model further revealed how various epidemiological features may influence the spread of CSF in GB, highlighting the importance of between-farm biosecurity in preventing widespread dissemination of the virus. Knowledge of the factors affecting the risk of spread is a key component of surveillance planning and resource allocation, and this work provides a valuable stepping stone in guiding policy on CSF surveillance and control in GB.


Supplementary Methods
Supplementary Methods S1. Details on the data, method, and results for the nonmetric multidimensional scaling (NMS) analysis looking at factors correlated with the risk of CSF spread in Great Britain.

Data Management
Table A1 (Supplementary Methods S1) contains a list of the six variables included in the analysis.
Examination of the structure of the data in Table A1 (Supplementary Methods S1) required that some variables be categorised and others transformed for the purpose of statistical analysis; Table A1 describes how each variable was treated. To limit biases due to farms that underwent few incursion events, we restricted our analysis to farms with at least 10 incursion events (n=5613). From those farms, a random sample of 5000 farms was selected to reduce the computational burden. No dissimilarities were observed between the reduced and full sets of farms in either the distribution of the probability of epidemic take-off or the distribution of maximum epidemic size (Fig. A1 in Supplementary Methods S1).
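The filtering and subsampling step described above can be sketched as follows. This is a minimal illustration only: the function name, the dictionary layout of the input and the fixed seed are assumptions made for the example, not details taken from the original analysis.

```python
import random


def subsample_farms(incursions_per_farm, min_events=10, sample_size=5000, seed=0):
    """Keep farms with at least `min_events` incursion events, then draw a
    simple random sample of at most `sample_size` of them.

    `incursions_per_farm` is a hypothetical mapping {farm_id: number of
    incursion events}; the seed only makes the illustration reproducible.
    """
    # Sort the eligible IDs so that the result depends on the seed alone,
    # not on the insertion order of the dictionary.
    eligible = sorted(
        farm for farm, n_events in incursions_per_farm.items()
        if n_events >= min_events
    )
    if len(eligible) <= sample_size:
        return eligible
    rng = random.Random(seed)
    return rng.sample(eligible, sample_size)
```

With the figures reported in the text, the eligible list would hold 5613 farms and the returned sample 5000 of them.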

Statistical Analysis
The data in this study were analysed using nonmetric multidimensional scaling (NMS). NMS was used to identify epidemiological variables correlated with the risk of CSF spread, as assessed in terms of the probability of epidemic take-off and maximum epidemic size. NMS is a nonparametric ordination technique well suited to data that are non-normal or on arbitrary or discontinuous scales 1. Its advantage is that it avoids the assumption of linear relationships among variables: it uses ranked distances, which tends to linearise the relationships between variables 1. PC-ORD software version 6.08 (MjM Software Design, Gleneden Beach, OR) was used. A main matrix consisting of an anonymised farm ID designation and the variables listed in Table A1 (Supplementary Methods S1) was created. The final matrix consisted of six variables and 5000 farms.
A second matrix was created containing the farm ID, the probability of an epidemic occurring and the maximum size of the epidemic.
NMS was run with a Euclidean distance measure after relativizing the columns by standard deviates. The dimensionality of the data set was first determined by plotting a measure of fit ("stress") against the number of dimensions. Optimal dimensionality was based on the number of dimensions with the lowest stress (i.e. the smallest departure from monotonicity in the relationship between distance in the original space and distance in the reduced ordination space). A two-dimensional solution was requested, since the inclusion of additional dimensions did not statistically improve the fit. Two hundred and fifty iterations were used for each NMS run, using random starting coordinates.
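The preprocessing that precedes the ordination (column relativization by standard deviates, followed by Euclidean distances between farms) can be illustrated with the short sketch below. It is not PC-ORD code: the function names are hypothetical, and the use of the sample standard deviation is an assumption about how the relativization is defined.

```python
import math
import statistics


def relativize_by_standard_deviates(matrix):
    """Centre each column on its mean and divide by its standard deviation.

    `matrix` is a list of rows (one row per farm, one column per variable).
    Assumption: the sample standard deviation is used. A constant column
    would give a zero divisor and is not handled here.
    """
    columns = list(zip(*matrix))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    return [
        [(x - m) / s for x, m, s in zip(row, means, sds)]
        for row in matrix
    ]


def euclidean_distance_matrix(rows):
    """Pairwise Euclidean distances between the rows of a matrix."""
    return [[math.dist(a, b) for b in rows] for a in rows]
```

NMS then works on the rank order of these distances, which is why the stress measures departure from monotonicity rather than from a linear fit.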
Several NMS runs were performed for each analysis to ensure that the solution was stable and represented a configuration with the best possible fit.
Diagnostics for the NMS ordination model are presented in Table A2 (Supplementary Methods S1). The results of the NMS models are shown using 2D ordination graphs, in which the distance between sample units approximates the dissimilarity in the estimated risk of CSF spread (Supplementary Fig. S1).
The 80% confidence ellipses were further added to discriminate between groups of interest using the "car"

Data
The values of all parameters involved in both the local between-farm spread and the farm-level detection and control of CSF in Great Britain were fitted using data from the CSF epidemic that occurred in East Anglia, UK, in 2000 5. The data include the spatial location and the time of report for all farms (n=16) on which CSF was reported between 8 August and 3 November 2000 and on which mitigation procedures were carried out. All 16 farms were used to estimate the model parameters.

Model framework and Likelihood
Here, we considered a modelling framework for the spread of the epidemic between farms similar to that used in the simulation model described in the main text. Briefly, we considered a stochastic spatio-temporal SIR model in which susceptible farms become infectious and are then removed or recovered 6. The infectious compartment corresponds to the status in which the farm is infected and able to transmit the disease to other sites, while the removed/recovered state means that the infection has been detected on the farm and that mitigation procedures have been enforced within 24 hours of detection. In contrast with the simulation model, we assumed that (1) no movements occurred between farms during the epidemic period, and (2) the infection process would only be a function of the Euclidean distance between farms.
We further considered that epidemics occurred in a population of N farms, where the geographical location of each individual farm is known. We then assumed that epidemics start with a single initially infected farm and that an infected farm i would make an infectious contact with a susceptible farm j at a rate $\beta_{ij}$ such that [equation not recovered].
To account for the delay between a farm becoming infectious and its detection, we assumed that an infected farm becomes detected/removed after a minimum of c (c > 0) days of being infectious, arbitrarily fixed to two latent periods (i.e. $2T_{lat}$ = 8 days) based on 8. The infectious period is therefore assumed to follow a left-truncated gamma distribution: [equation not recovered].
The likelihood of the model can therefore be expressed as: [equation not recovered], where v is the index case, and $n_I$ and $n_R$ are the total numbers of infected and removed farms in the population, respectively, with $n_R = n_I$ since the epidemic has ceased. We denote by S the total farm-to-farm infectious pressure exerted during the course of the epidemic. This formulation arises when an infection is considered to happen only once the total pressure exerted on a susceptible farm by the infectious farms exceeds that farm's threshold 9. Therefore, we have [equation not recovered]. The infection process is in effect a time-dependent Poisson process, and S accounts for the fact that no event happens between successive event times.
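As an illustration of the left-truncated gamma assumption for the infectious period, the sketch below draws infectious periods that are never shorter than the minimum delay c (8 days in the text). The shape and scale arguments are hypothetical inputs, since the fitted values are not given in this section, and simple rejection sampling is used here purely for clarity; it is not necessarily how the original inference handled the truncation.

```python
import random


def sample_infectious_period(shape, scale, c=8.0, rng=random):
    """Draw one infectious period (in days) from a gamma distribution
    left-truncated at `c`.

    `shape` and `scale` are hypothetical gamma parameters. Draws below `c`
    are rejected and redrawn, so this becomes slow when almost all of the
    gamma mass lies below `c`.
    """
    while True:
        t = rng.gammavariate(shape, scale)  # alpha = shape, beta = scale
        if t >= c:
            return t
```

For example, `sample_infectious_period(4.0, 3.0, c=8.0, rng=random.Random(1))` returns a single draw of at least 8 days from a gamma distribution whose untruncated mean is 12 days.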

Bayesian inference
Data available from disease outbreaks, as is the case here, are usually the times at which infected farms were detected and from which mitigation procedures were carried out. Infection times are usually unknown unless diagnostic tests are available that give some indication of when the infections might have occurred; in general, no information is available on them. The infection times for all infected premises during the 2000 CSF epidemic in East Anglia therefore need to be inferred together with the set of model parameters θ using data augmentation techniques. A Bayesian framework was adopted, as it provides a natural approach for handling missing-data problems, together with Markov chain Monte Carlo (MCMC) methods as the computational tool [10][11][12].
The joint posterior distribution of the model parameters given the data can be written as [equation not recovered], where I is the vector of infection times with $I_i$ replaced by [symbol not recovered].
Supplementary Table S1. [table body not recovered]
Pseudo-R² = 0.99
a Interpretation: the odds of pig producers being infected through the movement of pigs in epidemics generated from incursions in assured commercial farms in Scotland were 4.52 (95% CI 4.29-4.77) times higher than from incursions in assured commercial farms in England/Wales.
$D_k$ and $D_{Tk}$ correspond to the first-order (direct) and total sensitivity indices for the kth variable tested in the global sensitivity analysis, respectively.
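The odds ratio quoted in the footnote to Supplementary Table S1 (4.52, 95% CI 4.29 to 4.77) can be related to a regression coefficient with the short sketch below. It assumes, without confirmation from the source, that the interval is a symmetric Wald-type interval on the log-odds scale; the function names are hypothetical.

```python
import math


def coefficient_from_odds_ratio(odds_ratio, lower, upper, z=1.96):
    """Back-calculate a log-odds coefficient and its standard error from an
    odds ratio and its 95% confidence limits.

    Assumption: the interval is exp(beta - z * se) to exp(beta + z * se).
    """
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2.0 * z)
    return beta, se


def wald_odds_ratio(beta, se, z=1.96):
    """Odds ratio and Wald-type confidence limits from a log-odds coefficient."""
    return (
        math.exp(beta),
        math.exp(beta - z * se),
        math.exp(beta + z * se),
    )
```

Applying `coefficient_from_odds_ratio(4.52, 4.29, 4.77)` and then `wald_odds_ratio` to the result recovers the reported interval to within rounding, which is consistent with, but does not prove, the Wald assumption.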

Supplementary Figures
Supplementary Figure S1. Nonmetric multidimensional scaling (NMS) ordination of the simulation of CSF spread. The final NMS solution was two-dimensional and explained 90.1% of the variation in the risk of CSF spread. NMS Axis 1 shows the influence of the producer type of the index case on the risk of CSF spread, whereas NMS Axis 2 shows the influence of having records of moving at least one pig to another producer. Solid dots represent the NMS location of the 104 producers with a probability of epidemic take-off >0.5. Small circles represent the NMS location of 100 randomly selected producers with a probability of epidemic take-off ≤0.5. Crosses, triangles and squares represent the geometric center (or centroid) of the NMS location of farms (whether they moved pigs or not) described as small producers, non-assured commercial producers and assured commercial producers, respectively. Ovals indicate the 80% confidence ellipses around the centroid for the different producer types which sent (solid) or not