Estimating epidemiological parameters using diagnostic testing data from low pathogenicity avian influenza infected turkey houses

Limiting spread of low pathogenicity avian influenza (LPAI) during an outbreak is critical to reduce the negative impact on poultry producers and local economies. Mathematical models of disease transmission can support outbreak control efforts by estimating relevant epidemiological parameters. In this article, diagnostic testing data from each house on a premises infected during a LPAI H5N2 outbreak in the state of Minnesota in the United States in 2018 was used to estimate the time of virus introduction and adequate contact rate, which determines the rate of disease spread. A well-defined most likely time of virus introduction, and upper and lower 95% credibility intervals were estimated for each house. The length of the 95% credibility intervals ranged from 11 to 22 with a mean of 17 days. In some houses the contact rate estimates were also well-defined; however, the estimated upper 95% credibility interval bound for the contact rate was occasionally dependent on the upper bound of the prior distribution. The estimated modes ranged from 0.5 to 6.0 with a mean of 2.8 contacts per day. These estimates can be improved with early detection, increased testing of monitored premises, and combining the results of multiple barns that possess similar production systems.


Limiting spread of low pathogenicity avian influenza (LPAI) during an outbreak is critical to reduce the negative impact on poultry producers and local economies. Mathematical models of disease transmission can support outbreak control efforts by estimating relevant epidemiological parameters.
In this article, diagnostic testing data from each house on a premises infected during a LPAI H5N2 outbreak in the state of Minnesota in the United States in 2018 was used to estimate the time of virus introduction and adequate contact rate, which determines the rate of disease spread. A well-defined most likely time of virus introduction, and upper and lower 95% credibility intervals were estimated for each house. The length of the 95% credibility intervals ranged from 11 to 22 with a mean of 17 days. In some houses the contact rate estimates were also well-defined; however, the estimated upper 95% credibility interval bound for the contact rate was occasionally dependent on the upper bound of the prior distribution. The estimated modes ranged from 0.5 to 6.0 with a mean of 2.8 contacts per day. These estimates can be improved with early detection, increased testing of monitored premises, and combining the results of multiple barns that possess similar production systems.
Outbreaks of low pathogenicity avian influenza (LPAI) can be highly disruptive to the poultry industry. LPAI can place a heavy financial burden on producers due to the reductions in egg production, decreases in bird market weight, and increases in mortality that can occur in infected flocks as well as costs stemming from intensive cleaning and extended downtime 1 . In addition, H5 and H7 LPAI strains can not only mutate into highly pathogenic avian influenza, a devastating disease in domestic poultry that can cause near 100% mortality in infected flocks 2 , but their presence in domesticated poultry can also result in significant trade restrictions. Thus, it is critical to limit the spread of virus during an outbreak.
Epidemiological investigation is an important part of any LPAI outbreak response. Transmission of LPAI between poultry premises is primarily driven by human activity, for example live bird movements, rendering truck visits to multiple premises, or equipment sharing 1 . Epidemiological investigation can identify the specific activities responsible for disease spread, allowing for effective mitigation strategies to be implemented that target a specific pathway during an outbreak. The identification of these pathways can also improve understanding of transmission routes, knowledge that can be used to inform future outbreak response. Furthermore, tracing efforts performed as part of an epidemiological investigation can highlight previously unidentified at-risk premises.
Information on the time of infection for a house or premises can assist epidemiological investigations by allowing investigators to focus on a specific timeframe, improving the precision of the pathway analysis and tracing efforts. Mathematical modeling can provide such information, estimating a range of possible virus introduction times from outbreak data [3][4][5][6][7] . Here, the time of virus introduction is estimated using diagnostic testing data www.nature.com/scientificreports/ from turkey flocks infected during a 2018 LPAI H5N2 outbreak in the state of Minnesota in the United States (U.S.). The likelihood of observing the diagnostic test results given a time of virus introduction was estimated using simulated output from a stochastic disease transmission model. This comparison of simulated and observed data can also be used to reduce uncertainty in the internal parameters of the simulation model, a process known as inverse modelling 8 . There is substantial uncertainty in the adequate contact rate (also called the transmission parameter) for LPAI in turkeys, which determines the rate of spread in the disease transmission model. To address this uncertainty, the adequate contact rate was jointly estimated with the time of virus introduction for the LPAI infected turkey flocks in Minnesota.
To reduce uncertainty and provide disease transmission estimates applicable to epidemiological investigations, we explored an approach to estimate the time of virus introduction and adequate contact rate using diagnostic testing data from LPAI infected turkey flocks. Results are presented for the houses testing virus positive on one of the eight meat-type turkey premises infected during the 2018 LPAI H5N2 outbreak in Minnesota. The performance of the estimation approach was evaluated by applying the method to diagnostic testing data simulated with known parameters for several combinations of scenarios including high and low contact rate scenarios, scenarios varying the time of exposure relative to the sampling time of the first test, and scenarios varying by the amount and frequency of testing performed. Lastly, the method validation was extended to assess the effect of combining the likelihoods from multiple houses on the parameter estimates.

Materials and methods
Outbreak data. Birds on eight premises, all meat-type turkey premises, tested positive by real-time reverse transcription polymerase chain reaction (rRT-PCR) during the 2018 LPAI H5N2 outbreak in Minnesota. Diagnostic test results were available for each of the 33 barns that tested positive across the eight premises from the time of detection until the birds were sent to processing as part of controlled marketing 9 . The current article focuses on the estimated time of virus introduction and adequate contact rate results for a single premises, the 2nd premises detected in Kandiyohi county in Minnesota (Kandiyohi 2). Results from the other premises have been reported 10 . All five barns on Kandiyohi 2 tested positive for virus. The data consisted of the number of birds in each barn, the date swabs for rRT-PCR or serum samples were taken, and the number of positive diagnostic test results. Swabs for rRT-PCR were taken from live birds and tested in pools of 10 or 11 swabs. Serology testing consisted of 10 serum samples tested individually by either agar gel immunodiffusion (AGID) or enzyme-linked immunosorbent assay (ELISA). The Kandiyohi 2 testing data is provided in Supplementary Table S1. All samples were collected and tested under the guidance and authority of the Code of Federal Regulations (CFR) enacted to guide avian influenza surveillance in poultry as outlined in 9 CFR parts 145, 146, 147 and 56 11 .
Stochastic within-house LPAI transmission model. The infection prevalence of LPAI and seroprevalence over time simulated from the stochastic within-house transmission model were used to estimate the likelihood of observing the diagnostic test results given different virus exposure times and adequate contact rates. The transmission model for LPAI in broiler breeders from Bonney et al. was reparametrized for LPAI in turkeys 12 . A summary of the input parameters is given in Table 1. The transmission model tracks the number of birds in the susceptible, latently infected, infectious, recovered, removed (dead), and seroconverted states at each simulation time step, which in this case was set to 0.01 days.
The number of initially infected birds in a house was dependent on whether the house was the first infected on the premises, as deduced from diagnostic testing data. In the house that first tested positive for virus, each transmission run began with a single latently infected bird. The number of initially infected birds was increased for the houses that tested positive for virus at a later date based on the assumption that more virus would be present on the premises and thus likely transmitted to more susceptible birds in neighboring houses. For these houses, the number of initially infected birds was randomly sampled from the integers between one and ten. The number of birds initially infected in non-index barns during an outbreak is unknown given potential onfarm spread due to movement of people or equipment and could be higher than the baseline assumption of one to ten 1 . Due to the uncertainty in this parameter a sensitivity scenario was evaluated where the interval for the number of initially infected birds was increased from one to fifty. For the results and discussion of this sensitivity analysis see Supplementary Table S9 and Supplementary Table S10.
In each proceeding time step, the number of infected birds was simulated from a binomial distribution as follows. Let β be the adequate contact rate, N I (t) be the number of infectious birds at time step t , and N(t) be Table 1. Input parameters for the stochastic within-house LPAI transmission model. www.nature.com/scientificreports/ the number of birds alive at time step t . The probability of infection in the time interval (t, t + �t) , p inf (t, �t) , is given by the equation 13,14 : The length of the time step t was set to 0.01 days in this analysis. Next, let N S (t) be the number of susceptible birds at time step t . The number of newly infected birds in the time interval (t, t + �t) , N new_cases (t, �t) , is modeled by the binomial distribution 13 : As constructed, any increase in the contact rate increases the probability that a susceptible bird is infected in the simulation time step, resulting in a faster rate of disease spread.
Random latent period and infectious period lengths were generated for each infected bird from distributions modeling their respective lengths. Birds transitioned out of the latent and infectious states in the first simulation time step where the randomly generated length was exceeded. The parameters of the latent and infectious period length distributions, assumed to be gamma distributed, were estimated from inoculation study data using a maximum likelihood estimation method. Data involving both H5 and H7 LPAI strains were included due to insufficient data on turkeys inoculated with H5 LPAI strains. The inoculation studies used in the estimation of the latent period distribution include Pillai  The proportion of birds that die from infection was set to 0.01 based on the low mortality observed in houses infected during this Minnesota LPAI outbreak. The birds designated to die from infection transition from the infectious to the removed state, while the other birds transition to the recovered state. Once a bird is in the removed or recovered state, it remains there for the rest of the simulation.
The transition into the seroconverted state occurred as follows. Once a bird was infected, a random number was generated from a distribution modeling the time to seroconversion. A bird transitioned into the seroconverted state in the first time step where this length was exceeded. The time to seroconversion was assumed to be gamma distributed. The parameters of this distribution were estimated from inoculation studies using a maximum likelihood method. Data involving any LPAI virus strain were included due to a scarcity of viable data. The studies used were Dundon et al., Morales, Homme et al., and Preskenis, which involved H4, H6, H7, and H9 LPAI strains [20][21][22][23] . The proportion of birds that seroconvert was set to 0.99 based on the results of Spackman et al. (2010) in which 88/89 (99%) of the turkeys inoculated with twelve North American H7 LPAI isolates had detectable antibody by day 18 to 21 post-inoculation 19 .
Due to high uncertainty in the parameter estimate for the proportion of birds that die due to infection, a sensitivity analysis was performed where the proportion of birds dying due to disease was set to 0 and 0.04. The Kandiyohi 2 results were identical to the baseline results in the sensitivity analysis, suggesting the estimates are robust to changes in the proportion of infected birds that die due to disease. This is to be expected since only live birds were sampled for testing. There is uncertainty in other parameter estimates such as the shape and scale of the time to seroconversion distribution. More work is required to characterize this uncertainty in the form of prior distributions for these parameters. However, this was reserved for future work in order to focus the paper on the estimation method for the time of virus introduction and adequate contact rate.

Estimation of the time of virus introduction and adequate contact rate. The time of virus intro-
duction and the adequate contact rate were estimated for each poultry house using a Bayesian approach. The likelihood of observing the diagnostic test results was estimated given output simulated from the stochastic within-house LPAI transmission model for different virus introduction time and contact rate pairs. Similar to Pinsent et al. (2014), the likelihood of observing a test result was modeled with the binomial distribution 4 . A summary of the notation used in the following derivation of the estimation method is given in Supplementary  Table S2.
The probability of observing a pooled rRT-PCR test result, with all swabs taken from live birds, was determined as follows. First, let N swabs pcr,j be the number of swabs per pool in rRT-PCR test event j. A test event was defined as the testing performed on samples taken at the same time. In the case of Kandiyohi 2, the rRT-PCR test events were comprised of either 3 pooled samples of 11 swabs each or 1 pooled sample of 11 swabs. Let ρ I,ij (β, t intro ) be the infection prevalence at the time of sampling for test event j. This prevalence was estimated from the ith LPAI transmission model run simulated with a given contact rate β and time of virus introduction t intro . The probability of including at least one swab from an infectious bird in a single pooled sample taken as part of test event j and based on simulation run i, a probability notated as p pcr,ij , is Next, let N tests pcr,j be the total number of pooled rRT-PCR samples tested as part of test event j and N pos pcr,j be the number of pooled samples that tested positive in test event j. Both N tests pcr,j and N pos pcr,j were observed from the field data. Finally, let Se pcr be the rRT-PCR test sensitivity, in this case equal to 0.90 24 . The probability of observing N pos pcr,j given contact rate β and time of virus introduction t intro and based on output from transmission run i is sero,k were observed in the field data. Next, let ρ S,ik (β, t intro ) be the seroprevalence in the flock at the time of sampling for test event k, estimated from the ith disease transmission model simulation run given contact rate β and time of virus introduction t intro . Last of all, let Se sero be the AGID or ELISA test sensitivity, in this case set equal to 1.00 25 . A sensitivity analysis was performed for the Se sero parameter where the time of virus introduction and adequate contact rate were estimated from the Kandiyohi 2 diagnostic test data considering an AGID or ELISA test sensitivity of 0.98. Little difference was observed in the results of the sensitivity analysis compared to the baseline. Results are given in Supplementary Table S7 and Supplementary Table S8. The probability of observing N pos sero,k given contact rate β and time of virus introduction t intro and based on the output from the ith iteration of the disease transmission model is The samples taken for rRT-PCR and serology testing were performed without replacement, meaning a hypergeometric distribution would provide a more exact probability of observing the test result. However, due to the large population size in the houses (over 7200 turkeys in each barn) compared to the size of the samples taken for testing (no more than 33 turkeys sampled for rRT-PCR and 10 turkeys sampled for serology testing), a binomial distribution can serve as an appropriate substitute. As its implementation was more straightforward, a binomial approximation was used in this analysis.
The likelihood of observing the set of diagnostic test results in a house for a given contact rate and time of virus introduction was estimated by taking the average across the 10,000 simulation iterations of the product of the individual test probabilities. Let N iterations be the number of transmission model simulation iterations run for each contact rate and time of virus introduction pair. Let N pcr be the total number of rRT-PCR test events and N sero be the total number of serology test events. The likelihood of observing the diagnostic test results given contact rate β and time of virus introduction t intro is The prior distributions for the contact rate and time of virus introduction were assumed to be uniform distributed. The distribution for the contact rate had a minimum of 0.1 and maximum of 6.0 contacts per day, which was set based on LPAI and HPAI contact rate estimates for turkeys in the literature 3,18,[26][27][28][29] . The latest time of virus introduction that was evaluated was the time of sampling for the first positive diagnostic test result observed in the data. The earliest time of virus introduction that was evaluated was 80 days prior to the sampling time of the last diagnostic test taken from the house.
Due to the low dimensionality of this problem (two parameters to be estimated), the parameter space defined by the prior distributions was explored using a grid-based method. The contact rate was incremented by 0.1 steps while the time of virus introduction was incremented by 0.25 day steps. The likelihood was estimated for each combination of contact rate and virus introduction and then normalized to obtain a posterior distribution. Let π(β, t intro ) be the prior distribution for the contact rate and time of virus introduction. The posterior distribution, p(β, t intro |x) , was determined by A posterior distribution was estimated for each house following this approach.
Multiple house estimation approach. The single house likelihoods can be conditioned on the results from other houses with similar production characteristics (e.g. similar housing structures and bird ages) with the goal of improving the parameter estimates. Houses with similar production characteristics were assumed to have the same contact rate. Therefore, the likelihoods from these other houses represent a source of additional information.
The contact rate and time of virus introduction were estimated for a given house conditioned on the contact rate likelihoods observed in houses with similar production characteristics using the following approach. A summary of the notation used for the multiple house estimation method is given in Supplementary Table S3. First, a multi-house contact rate likelihood was derived by multiplying the marginal contact rate likelihoods, estimated by summing up the likelihood across each candidate time of exposure for a given contact rate, from each house. Let l h (x|β, t intro ) be the likelihood of observing the diagnostic testing data from house h. Let t 1 be the earliest and t max be the latest time of virus introduction evaluated for house h. The marginal likelihood for the contact rate, l h (x|β) , was determined by The models were coded in R statistical software and C programming language and the analysis was performed using supercomputing resources of the Minnesota Supercomputing Institute 30,31 . Figures 1 and 2 were produced using the 'ggplot2′ package in R 30,32 .
Method validation. The time of virus introduction and adequate contact rate were estimated from simulated data generated with known parameters in order to validate the estimation approach. Test results were simulated for two test schedule and two contact rate scenarios for six barns, each with a different time of exposure relative to the first test. The first test schedule scenario was a replicate of the Kandiyohi 2 test schedule. The second test schedule scenario consisted of an intensive testing schedule of three pooled rRT-PCR samples of 11 swabs each and 10 serum samples tested by AGID or ELISA taken on the same day every three days. The two contact rate scenarios consisted of a slow contact rate scenario, with the contact rate set to 1.0 contacts per bird per day, and a fast contact rate scenario, with the contact rate set to 4.5 contacts per bird per day. The times of virus introduction relative to the first test day for the six barns were 19 days prior, 6 days prior, 3 days prior, 1 day after, 2 days after, and 9 days after, respectively. The validation datasets were generated by first simulating the number of positive birds in a sample from a binomial distribution with the probability of sampling a positive bird equal to the infection prevalence or seroprevalence estimated from the LPAI transmission model. If a positive bird was sampled, the test result was simulated from a Bernoulli distribution with the probability of the sample testing positive equal to the test sensitivity. The validation datasets were formed by taking the average test results from 10,000 simulation iterations for each barn under each test schedule and contact rate scenario. The time of virus introduction and adequate contact rate were estimated from these datasets for both the single and multiple house approach.

Results
Estimated time of virus introduction and adequate contact rate from Kandiyohi 2 diagnostic testing data. The mode, median, and 95% credibility interval (CI) estimated from diagnostic testing data from each of the five houses on Kandiyohi 2 are given in Table 2. Figures 1 and 2 display contour plots of the posterior likelihood and marginal posterior likelihood plots for the time of virus introduction and adequate contact rate for Kandiyohi 2 house 1 and Kandiyohi 2 house 2, respectively. The marginal posterior likelihood for the time of virus introduction was parabolic in shape for each house, meaning an informative mode and 95% CI could be estimated. On average, the 95% CI's spanned 17 days.
An informative mode and 95% CI could not always be estimated for the adequate contact rate. In two of the houses on Kandiyohi 1, house 1 and house 2, the likelihood increased as the contact rate increased, as can be seen for house 1 in Fig. 1. For these two houses, only the lower CI bound is informative, with the upper CI bound entirely dependent on the upper bound of the prior distribution. In house 2, house 3, and house 4, on the other hand, the marginal posterior likelihood for the contact rate was parabolic in shape, allowing for more informative estimates.   Table 3. Estimates are given for the single and multiple house approach. The multiple house approach results given in the table consider the addition of one house and five houses. The results for the addition of one house are given as the average mode, median, and 95% CI estimated from all the possible house pairings. Validation results estimated from the three scenarios not represented in Table 3. ie. the results from simulated Kandiyohi 2 test data given a contact rate of 4.5 and simulated increased testing data given a contact rate of 1.0 contact per day and 4.5 contacts per day, are provided in the Supplementary Table S4, Supplementary Table S5, and Supplementary Table S6, respectively. The true estimates for the time of virus introduction and contact rate were contained by the 95% CI for each validation run under the single house estimation approach. Including the contact rate information from other houses in the multiple house approach generally both improved the accuracy and reduced uncertainty in the estimates. Similar issues with estimating an upper bound for the contact rate as encountered in the Kandiyohi 2 field data were present in the validation results. This can be observed in the results under the single barn approach in rows 1, 2, 3, and 6 in Table 3, where the upper bound for the contact rate is dependent on the bound of the prior distribution.
As expected, more accurate estimates with narrower 95% CI bounds were observed in the results of the validation scenarios with increased testing. The estimation approach also performed better when the house was exposed to virus relatively close to or after the time of the first test as compared to houses exposed multiple days prior. While the estimates for the day of virus introduction were more accurate in the higher contact rate scenarios, performance did not necessarily improve for the contact rate estimates. An informative lower 95% CI bound could be estimated for the higher contact rate scenarios, but the upper bound, and oftentimes the mode, were dependent on the upper bound of the prior distribution.

Discussion
Monitoring flocks with diagnostic testing is an intrinsic part of outbreak management strategies like controlled marketing in LPAI infected poultry flocks. Pinsent et al. (2014) developed a method to estimate the time of virus introduction and basic reproduction number from diagnostic testing data using a binomial likelihood and output from a deterministic disease transmission model 4 . Here, diagnostic testing data from five barns infected on a premises during the 2018 LPAI H5N2 outbreak in the U.S. state of Minnesota were used to estimate the time of virus introduction and adequate contact rate for each barn. The approach used built on the Pinsent et al. (2014) method in the following ways. First, output from a stochastic disease transmission model was used to estimate the likelihood as opposed to a deterministic model. Furthermore, pooled testing in addition to individual bird testing was incorporated into the method, and the method was applied to field data as opposed to simulated data only. Most notably, the method was extended to allow for parameter estimation from the combined likelihoods of multiple houses with the goal of reducing uncertainty in the estimates.
The parameter estimates for the time of virus introduction and adequate contact rate can be used in a number of applications related to preventing widespread transmission of LPAI during an outbreak. The time of Table 3. Validation results given barns were infected 19, 6, and 3 days prior to and 1, 2, and 9 days after the date of the first test, assumed to have been performed on October 20. The time of virus introduction and adequate contact rate were estimated from simulated test data based on the Kandiyohi 2 test data and a contact rate of 1.0 contacts per day. Results are given for the single barn approach and multiple house approach. www.nature.com/scientificreports/ virus introduction estimates provide information on how the outbreak unfolded, which can help stakeholders evaluate the management of the outbreak and inform the response to future outbreaks. For example, the time between the estimated virus introduction time and date of detection could be used to assess the effectiveness of the active surveillance program used to monitor premises status. Most importantly, the estimates for the time of virus introduction can support epidemiologic investigations by helping stakeholders focus their efforts on a specific time window. The adequate contact rate can be an important input parameter in epidemiological models, including the stochastic disease transmission model used in this study in which the contact rate determines the within-flock rate of disease spread. The transmission model has also been used previously to inform surveillance design and risk assessments 12,33,34 . The adequate contact rate can vary according to factors like the virus strain, bird species and age, or housing type. The rate of disease spread can have a direct impact on the effectiveness of outbreak control measures. Diseases with slow rates of spread, for example, can take longer to be detected than ones with faster rates under the same active surveillance protocol 34 . Additionally, contact rate estimates can generate insight into the risk factors for fast and slow spread, which can be used to inform intervention strategies for specific houses and premises. During this LPAI outbreak in Minnesota, some industry veterinarians requested predictions for the time until no more infectious birds would be present in infected houses based on the latest diagnostic test results. These predictions can support logistics planning, namely when to schedule the houses to be sent to processing. Moreover, the accuracy of the predictions can be improved with contact rates estimated directly from the outbreak in progress.
The only estimates for the contact rate for LPAI in turkeys identified in a literature review were from Comin et al. (2011) and Saenz et al. (2012) 3,18 . Furthermore, only in Comin et al. (2011) was the transmission parameter estimated from field data, where the spread dynamics can differ from those of the laboratory in transmission experiments 3 . As a result, there is substantial uncertainty in the turkey LPAI contact rate, which increases uncertainty in the appropriate contact rate value or distribution to use in proactive risk assessment, where it can be advantageous to model outbreak conditions with greater generality. Thus, the contact rates estimated from the turkeys flocks infected with LPAI in the field in Minnesota provide critical information on this parameter.
The estimation approach performed well for estimating the time of virus introduction, as an informative mode, upper 95% CI bound, and lower 95% CI bound could be estimated for each house on Kandiyohi 2 as well as for each of the validation simulation runs. While informative estimates could be derived for the contact rate for many houses and validation runs, for others the estimates for the mode and upper 95% CI bound were uninformative due to being dependent on the upper bound of the prior distribution. An example of the latter can be observed in the plot of the marginal posterior likelihood for the contact rate estimated from the Kandiyohi 1 house 1 diagnostic testing data given in Fig. 1. The shape of the marginal posterior is suggestive of practical non-identifiability, which refers to the inability to derive informative estimates from the model due to limited or uncertainty in the data 35 .
The validation results suggest a few strategies that can improve contact rate identifiability. First of all, greater uncertainty was observed in the validation results when the flock was close to 100% seroconverted by the time of the first test. When the flock is close to 100% seroconversion, there is little change in the test results; they quickly fall into a pattern of all negative rRT-PCR results and all seropositive serum samples. This introduces uncertainty into the contact rate estimates in that the pattern of change observed in the test results provides information on the rate of spread. For example, suppose 10 serum samples were collected five days apart. A result of zero seropositive samples followed by 10 seropositive samples would suggest faster disease spread than a result of zero seropositive samples followed by 3 seropositive samples. An example of greater uncertainty in the contact rate due to testing starting close to or after the flock is 100% seroconverted can be observed in the estimates for the validation house assumed to be exposed 19 days prior to the first test given in the first row of Table 3. All this suggests that early detection of LPAI virus in a house is important for obtaining identifiable contact rate estimates.
A second strategy that can improve contact rate estimation is to increase the amount of testing done during the outbreak. An example of improved contact rate estimates due to increased testing can be observed in the validation results for the house assumed to be exposed six days prior to the first day of testing given in Table 3 compared to the results given in Supplementary Table S4. The third strategy is to use the multiple house estimation method, which improved the contact rate estimates for the validation houses in rows 1, 2, 3, and 6 in Table 3, for example. While this strategy is far easier and less costly to implement than increasing active surveillance for early detection or increasing the amount of testing performed while monitoring flock status, it may not always be appropriate due to the houses having differing production characteristics. Detecting virus in the house well before 100% seroconversion, increasing the amount of testing performed for monitoring infected flocks, and using the multiple house estimation method all improve the quality of information on the dynamics of virus spread within the infected houses. This not only can refine the estimates for the contact rate, but also can reduce uncertainty and improve the accuracy of estimates for the time of virus introduction.
The objective in this study of estimating parameters from diagnostic testing data has the benefit of a defined likelihood function that can link output from the stochastic simulation to the observed data, i.e. the binomial likelihood function. The time of virus introduction and adequate contact rate could alternatively be estimated using a likelihood-free approach such as Approximate Bayesian Computation (ABC) 36 , a method growing in popularity and used by Guinat et al. (2018), for example, to estimate the time of virus introduction in pig herds infected with African swine fever 7 .
An advantage of the ABC approach as compared to the likelihood approach used in this analysis is that the parameter space can be sampled using Markov Chain Monte Carlo (MCMC), allowing for more efficient exploration of the parameter space. This advantage would be even more pronounced as the number of parameters to be estimated increases, since the number of unique parameter combinations would increase exponentially in the grid-based method used in the likelihood approach. However, convergence of the MCMC chains can be difficult Scientific Reports | (2021) 11:1602 | https://doi.org/10.1038/s41598-021-81254-z www.nature.com/scientificreports/ to achieve. The main advantage of the likelihood parameter estimation approach is the ability to easily combine the likelihoods from multiple houses and estimate a combined contact rate distribution that can then be used to estimate the time of virus introduction for a given house. It was demonstrated in the validation results that the multiple house estimation method can reduce uncertainty in and improve the accuracy of the contact rate and time of virus introduction estimates. Joint estimation of the contact rate for multiple houses would be challenging under the ABC approach in that the simulated output would have to match or be close to the observed test results for multiple houses, leading to low acceptance rates. Both approaches are computationally intensive and their implementation is only feasible due to advances in and wider availability of supercomputing resources. Estimating parameters from diagnostic testing data has utility for LPAI in poultry due to established tests that can detect virus and antibodies with high sensitivity. The estimation approach detailed in this article may be less applicable to those viruses whose diagnostic tests have low sensitivity as this would introduce greater uncertainty into the observed test results and estimated parameters. Since LPAI virus infections are often associated with low mortality, diagnostic testing is an especially important indicator for the progression of the virus through meat turkey and broiler flocks. Egg laying flocks can have an additional indicator in the form of decreased egg production. Egg production data could be combined with diagnostic testing data using an approach such as ABC to estimate the time of virus introduction or contact rate. For viruses such as highly pathogenic avian influenza or African swine fever that cause very high mortality in the host population, an approach that relies on the mortality data has the greatest utility. Examples of approaches for high mortality include Bos et al. and Guinat et al. 5,7 .
Overall, our findings demonstrate a valid approach for estimating disease transmission parameters that has been used to support epidemiological investigations of LPAI outbreaks in Missouri, North and South Carolina, and the outbreak in Minnesota 10,37 . This approach may also be deployable in response to outbreaks caused by high consequence viruses in other species and geographical locations.

Data availability
The diagnostic testing data analyzed in this study are available in Supplementary Table S1 in the Supplementary Materials.