Quantifying defence cascade responses as indicators of pig affect and welfare using computer vision methods

Affective states are key determinants of animal welfare. Assessing such states under field conditions is thus an important goal in animal welfare science. The rapid Defence Cascade (DC) response (startle, freeze) to sudden unexpected stimuli is a potential indicator of animal affect; humans and rodents in negative affective states often show potentiated startle magnitude and freeze duration. To be a practical field welfare indicator, quick and easy measurement is necessary. Here we evaluate whether DC responses can be quantified in pigs using computer vision. 280 video clips of induced DC responses made by 12 pigs were analysed by eye to provide ‘ground truth’ measures of startle magnitude and freeze duration which were also estimated by (i) sparse feature tracking computer vision image analysis of 200 Hz video, (ii) load platform, (iii) Kinect depth camera, and (iv) Kinematic data. Image analysis data strongly predicted ground truth measures and were strongly positively correlated with these and all other estimates of DC responses. Characteristics of the DC-inducing stimulus, pig orientation relative to it, and ‘relaxed-tense’ pig behaviour prior to it moderated DC responses. Computer vision image analysis thus offers a practical approach to measuring pig DC responses, and potentially pig affect and welfare, under field conditions.

(startle) to a stimulus such as an unexpected noise [20][21][22] , monitoring and evaluation of the stimulus accompanied by freezing/immobility 22,24 , and a final response of either defensive/escape behaviour or resumption of previous activities. Critically, components of the DC response are modulated by affective state in humans and rodents, making them potentially valuable indicators of these states.
The startle component, a chained series of rapid flexor reflex movements that cascade through the body and serve a protective function (e.g. eye blink), acts as a behavioural interrupt and enhances vigilance by switching attention towards potential threats 20,21,25 . In humans and rodents, negative states appear to potentiate startle magnitude. For the human eye blink, electromyography is usually used to quantify the strength of contraction of the orbicularis oculi muscle that closes the eye. For rodents, the magnitude of movements such as jumping are recorded using a load platform. For example, humans show larger startle responses when viewing or anticipating unpleasant pictures [26][27][28][29][30] or negative emotional expressions 31 , anticipating shock 32 , in the presence of odours previously paired with a stressful social experience 33 , and even when exposed to sweat odours from anxious donors 34 . In rodents, many studies demonstrate potentiated startle in subjects exposed to conditioned stimuli predicting an aversive event 25,28,[35][36][37][38] , an effect that may be mediated by corticotropin-releasing hormone 39 and attenuated by anxiolytic drugs (e.g. diazepam 37 ). Startle-potentiating effects of aversive conditioned stimuli are also observed in monkeys 40 .
Conversely, startle magnitude is often attenuated in people exposed to pleasant pictures 27,29 , sounds 41 , or conditioned visual stimuli paired with the removal of a painful stimulus and hence 'relief ' 42 , and in rodents exposed to conditioned visual or olfactory stimuli associated with rewards such as food [43][44][45][46] and rewarding brain stimulation 47 .
The links between negative states and startle potentiation and positive states and startle attenuation are not always observed. For example, startle magnitude sometimes increases in people anticipating pleasurable pictures 48,49 or monetary reward 32,50 (although startle magnitude attenuates following monetary reward 50 ). Bach 51 provides a theoretical and computational model which reconciles these findings with those described previously by assuming that forgone opportunities due to a brief startle response are lower than those due to being injured (e.g. by a predator) if one does not startle. The model thus predicts an increased startle, resulting in lower overall costs, when beneficial opportunities are available such as during reward anticipation. Other exceptions include that startle was potentiated when viewing the names of loved ones compared to neutral or famous names 52 and that anxiolytics had no effects on fear-potentiated startle 53 . Likewise, enrichment 25 and preferred home cage odour 54 did not attenuate startle in rats, and chronic mild stress failed to potentiate startle in mice 55 .
The freeze component of the DC response may function to allow effective processing of new information about potential danger, and to decrease the likelihood of detection by predators 22,56,57 . Negative affective states associated with increased anticipation of negative events 58 may thus favour prolonged freezing responses that minimise chances of being detected by a predator and reflect a higher threshold of evidence required to judge that the environment is safe. For example, freezing in rodent aversive conditioning studies is frequently used as an indicator of a fear-like state [59][60][61] and is potentiated by induced stress 62 and attenuated by drugs and other treatments assumed to generate a less negative state [63][64][65][66][67] . Rats selected for an anxiety and depression-like phenotype show higher levels of freezing in aversive conditioning tests 68 and dogs freeze for longer in response to negatively than positively valenced stimuli 69 . As with startle, there are exceptions to these findings such as the enhancement of unconditioned freezing in appetitive contexts 70 and the failure of GABA agonists, assumed to have an anxiolytic function, to modify freezing in mice 71 .
Variation in startle and freeze components of the DC response thus offer potential as new indicators of affective valence and hence welfare. Such variation is likely to reflect a combination of influences on an animal's current affective state, including temperamental and environmental factors, and hence can provide a summary readout of how particular individuals with particular experiences and genetic predispositions are being affected by their current situation. Preliminary studies of startle responses have been carried out in some farm animal species (e.g. pigs [72][73][74] ; cattle 75 ; sheep 76 ), and a recent study by Ross et al. 77 demonstrated attenuated startle responses in hens living with, relative to without, preferred enrichments. However, a major challenge to the use of DC responses as indicators of affective state and welfare in the field is quick and accurate measurement of these rapid and subtle behavioural patterns. 'Gold standard' measures such as direct behavioural observations (e.g. detailed by-eye video analysis) may be manageable for individuals or small experimental groups under lab conditions, but are slow and/or impractical in the field. One possible solution to this challenge is to use computer vision techniques to implement real-time automated analysis of video-recorded DC responses 78,79 . Successful development of this approach would allow cheap, objective and rapid measurement of DC responses on farm, or in other contexts such as abattoirs, using just a video camera, a standardised eliciting stimulus, and the required software.
Here we explore this possibility in a commercially important farm animal -the pig -by evaluating the validity of computer vision image analysis against 'ground truth' data provided by human behavioural observation which has been used successfully in previous studies of pig DC behaviour [72][73][74] . We also collect force, kinematic, and depth-camera (Kinect) measures for comparison, as these are widely used to assess movement in other species. Finally, we evaluate whether contextual factors influence the DC response. We choose the pig because it shows characteristic DC behaviour when unexpectedly disturbed, usually involving a whole-body startle and movement to a tense standing position, sometimes accompanied by a bark vocalization 72 , followed by a period of immobility or freezing during which the animal appears to be monitoring or attempting to detect the source of the disturbance. These responses terminate when the pig flees or resumes ongoing behaviour. Successful development of an automated measure of DC responses will open the way for validating variation in startle and freeze behaviour as an indicator of affective valence in pigs and trialling the approach in groups of animals and on farm 73,80 . Habituation to test room. On arrival (week 1 of the study) pigs were given individual ear tags and then left undisturbed (except for cleaning and feeding) for three days. During weeks 1-2 the pigs were gradually habituated to human contact, moving to the Test Room and spending time on the load platform. They were initially introduced to the room in a group of six and food treats were available throughout. Over a period of days, the group sizes were reduced and time periods increased until each pig was spending at least 5 minutes alone on the platform.
testing. The aim of this study was to generate recordings of a large number of Defence Cascade events and then, using all these events, to compare metrics generated by different measurement techniques in order to establish the validity of computer vision measures of the DC response. To this end pigs were tested repeatedly to generate the dataset of recordings. Testing was carried out when the pigs were relatively young and light (weeks 3-5 of the study; 9-11 weeks of age; 20-40 kg) and subsequently when they approached slaughter weight (weeks 11-13 of the study; 17-19 weeks of age; 50-80 kg). In each phase, each pig received three Standard test sessions and one Kinematic test session. One pig had to be excluded from two sessions due to illness, thus giving 94 sessions in total.
During each Standard test session each pig was taken from its home pen to the Test Room and encouraged to enter the force-measuring pen with food treats. It was then given c.2.5 min to settle before the first test. During this time, the pig's behaviour was observed live by PS and a 'relaxed-tense' score given using the rating scale shown in Table S1. A startling stimulus was then presented and the pig's response was recorded for 30 s using the load platform, Kinect, fast-capture, and standard video cameras. Sessions comprised a maximum of 5 tests (stimulus presentations), each separated by c.5 min, after which the pig returned to its home pen.
During sessions 1 and 2, the startling stimulus was a remotely-activated bursting balloon next to the force pen. Pigs rapidly habituated to this stimulus and therefore, given the repeated testing required for this experiment in order to yield the dataset required for analysis (see above), it was deemed necessary to vary the stimulus for the remaining sessions to generate sufficient startle/freeze responses for a comparison of automated image analysis and ground-truth measures of these. In session 3 the bursting balloon was combined with a person either calling out or stepping out from behind a curtain. From session 4 onwards the stimulus was varied in each test, the stimuli used included; an unfamiliar researcher stepping out from behind a curtain, a bucket containing metal nuts being dropped, a model person being hoisted above the pig, an umbrella being opened towards them or a bin bag being waved.
During Kinematic test sessions in weeks 4 and 12, pigs were taken from their home pen to a small room where 10 mm passive reflective spherical markers were attached to their backs using double sided sticky tape. One was placed between the ears and three more were spaced down the spine. The final two markers were transverse to each other in the region of the sacroiliac joints; these identified the rear of the pig. Once markers were securely in place the pig was taken to the Test room and the test procedure continued in the same way as for Standard tests with all measures recorded as usual, plus kinematic data recorded for 15 s following each startling stimulus.
In total, 285 tests nested within 94 test sessions (each comprising up to 5 stimulus presentations) were carried out across both phases. Technical issues with some tests mean the final dataset for analysis was 280 tests.
observations of behaviour from video recordings. Recordings from the standard video camera were coded by-eye using the Observer program 81 . Each video was examined in slow motion, multiple times to record the behaviour exhibited in response to the stimulus (Jump and fall, Jump away, Jump on spot, Spin to face, Side step, Head up, Head turn, Muscle ripple, Ear prick, Sniff at/approach stimulus, Freeze; see Table S2). Coding continued until an outcome behaviour (Flee, Slowly leave, Return to normal behaviour; see Table S2) occurred which always happened within 30 s of the startling stimulus. An Observer Startle Magnitude score of 0-4 (least to most intense) was generated from the coded behaviour according to a rating scale described in Table 1. A total www.nature.com/scientificreports www.nature.com/scientificreports/ Observer Freeze Duration (s) was also calculated. Any freezes that lasted less than a second were reviewed and those less than 0.4 s were re-classified as 'no freeze' because they did not show the tension/immobility required by our behavioural definition: 'muscles tensed, whole body stationary, ears often pricked' .
image analysis (iA) data. All IA measures were derived from the overhead fast capture camera (200 Hz frame rate) as it gave the clearest unobstructed view of the pig. A view from above is also likely to avoid obstructions such as pen walls, posts and other pigs under field conditions. To measure the magnitude of the startle response to the stimulus, accelerations of highly textured image regions were estimated using sparse feature tracking. For freeze durations, both speed and acceleration were extracted. Sparse feature tracking considers only the easiest regions to track and hence improves accuracy compared to dense feature tracking 82 . Birchfield's implementation of the Kanade Lucas Tomasi tracker was utilised (KLT; https://cecas.clemson.edu/~stb/klt/), and tracked points are referred to as KLT points. Justification for using points rather than centroids is provided in the Supplementary Information.
The magnitude of the initial response to the startling stimulus (KLT Acceleration Startle Magnitude; acceleration being closely related to force) was defined as the maximum acceleration of 50 KLT points (pixels/frame 2 ) in a temporal window 0.7 s after the startle stimulus (see Supplementary Information). The tracker was always initialized with 150 points. If points were lost by the tracker, new ones were initialized. For our calculations, we only considered the 50 points with the highest acceleration that were successfully tracked for 3 consecutive frames. KLT Speed Freeze Duration was defined as the total time the speed of the 50 fastest KLT points (pixels/ frame) was below an empirically determined threshold for a continuous period of at least 0.4 seconds (see above). Thresholds were calculated using 10-fold cross validation with 90% training and 10% testing data repeated 10 times for all data (see Supplementary Information). The same method was used to determine KLT Acceleration Freeze Duration, in case acceleration of the KLT points provided a better predictor of freeze duration than speed of the KLT points.
Load platform data. The four load cells of the platform each generated an analogue signal encoding change in force into change in voltage. This was captured at a rate of 400 Hz, converted into digital form and stored on computer to generate a time series synchronised with the other data and used in all subsequent analyses (see Supplementary Information). Signals from the four transducers were summed to give a measure of total instantaneous vertical force and normalised by the mass of each pig as this changed significantly over the course of the experiment and heavy pigs exert more force than lighter ones. The weight of the pig was determined from the average total force reading prior to the startle stimulus. All readings had this weight subtracted and were subsequently divided by this value to get vertical acceleration in units of gravities, g, where one g is 9.8 m/s 2 . Load Platform Startle Magnitudes were calculated as the peak absolute acceleration during a 0.7 s window after the startle stimulus. Load Platform Freeze Durations were calculated using load platform acceleration measures and thresholding in the same way as for Image Analysis (IA) data.
Kinect data. Depth maps were extracted from Kinect data collected at a 30 Hz frame rate (see Supplementary   Information). For each depth map sequence, a time series recording the vertical displacement of the pigs' centroids for the total duration of each test was constructed. These time series were used in all subsequent analyses. Vertical centroid speed was the magnitude of the first differential with respect to time in these series. Acceleration magnitude was the second differential. Startle magnitudes and freeze durations were extracted using both of these measures. Kinect Speed Startle Magnitude and Kinect Acceleration Startle Magnitude were calculated as, respectively, the peak absolute values of the vertical centre of mass velocity (mm/frame), and acceleration (mm/ frame 2 ) during a 0.7 s window following the startle. The method used to calculate Kinect Speed Freeze Duration and Kinect Acceleration Freeze Duration was the same as that used for the IA data.
Kinematic data. Kinematic data were recorded for 15 s following the startle stimulus and sub-sampled at 30 Hz. We extracted the magnitude of velocities and accelerations for each spherical marker on the pig individually and then took the average of the magnitude of those markers' trajectories. We only considered motion in

Analysis of the effects of other factors on observer measures of startle and freeze.
Multilevel models were used to investigate the possible effects of a range of factors (e.g. pig sex, weight, behaviour prior to test, orientation in apparatus, time of day, experience of test across time and within a day, startle stimulus, test session type; Table S3) on observer measures of startle and freeze responses. As above, an ordinal response multinomial model was used for startle magnitude data with Observer Startle Magnitude (0-4) as the response variable. However for freeze data, where the comparison to continuous measures required in the above analyses was now not essential, we converted the non-normal data into a binary freeze/no-freeze variable and used a binomial model with a logit link function. The hierarchical structure of the data was as described previously. Univariable analysis was first completed by adding each of the possible influencing factors individually into the multilevel model and recording whether they were significantly related to the response variable. A multivariable model was then constructed by sequentially adding in each of the significant influencing factors (most significant first) and checking at each stage that all previously added factors in the model remained significant. Once this had been done, any non-significant factors were entered into the model one by one. If they were now significant, the model-building process was repeated until none of the remaining factors, when introduced into the model, were found to be significant. Wald tests were then used to check whether the levels of hierarchical structure within the model were significant. If not, they were removed from the final model.

Results
comparisons of automated readouts with observer ground truth measures -startle magnitude. The ground truth Observer Startle Magnitude measure was significantly positively correlated with all automated measures of startle magnitude ( Table 2). The strongest correlation was with the Kinematic measures, although these were only available for a subset of the data. Multilevel models showed that the Load Platform and then KLT Acceleration measures were the strongest predictors of the Observer Startle Magnitude scores. (Table 2). KLT Acceleration data were strongly positively correlated not just with Observer Startle Magnitude, but also with all the automated measures, which are widely used to measure movement (Load Platform, r s = 0.824: n = 280, P < 0.001; Kinematic Velocity: r s = 0.821, n = 70, P < 0.001; Kinematic Acceleration: r s = 0.816, n = 70, P < 0.001; Kinect Speed: r s = 0.772, n = 280, P < 0.001; Kinect Acceleration: r s = 0.693, n = 280, P < 0.001; Fig. 1).
Effects of other factors on the observer measure of startle magnitude. The multilevel ordinal regression model of the influence of factors on Observer Startle Magnitude scores initially included Tests nested within Sessions within Pig. However neither of the sets of random effects for Session or Pig were significant when the predictor variables were added, and therefore the final model did not require any random effects. Factors significant in the final model are shown in Table 5. The primary reason that Session was significant is due to stronger startle magnitudes occurring in Session 4, when the startling stimuli started to be varied, compared to Sessions 5-8. The use of new stimuli resulted in increased startle magnitudes in comparison to the original balloon-burst stimulus, in particular when the bin bag was used. When compared to pigs that were rated as being calm prior to presentation of the first startling stimulus (relaxed-tense score of 1), those that received a score of 2 or 3 displayed a stronger startle reaction, but this effect was not seen for those animals with scores of 4 and 5. Larger startle reactions were seen when the pig was orientated away from the stimulus.
Effects of other factors on the observer measure of whether a freeze response occurred. The multilevel model initially included Tests nested within Sessions within Pig. However again neither Session or Pig random effects were significant once predictors were added and so the final model did not include a hierarchical structure. The likelihood of a freeze response decreased from Session 1 to 2 and increased significantly in Session 4 when the startling stimuli started to be varied. There was also an increased likelihood of a freeze response when the pig was facing away from the stimulus at the point of testing, and for pigs with a 'relaxed-tense' score of 2 or 3 compared to a score of 1 prior to the first startling stimulus. However, pigs with a score of 4 or 5 did not show this increased likelihood of freezing (Table 6).

Discussion
In this study, we investigated the validity of computer vision image analysis (IA) as a measure of startle magnitude and freeze duration Defence Cascade (DC) responses in pigs, because of its potential as a practical tool for assessing these responses under field conditions. We used behavioural observation of video as our ground truth measure of startle and freeze responses (see [72][73][74] ) and also collected load platform, kinematic, and Kinect depth-camera measures because these have been used to assess movement in other species.
Our KLT Acceleration image analysis estimate of startle magnitude, and estimates from all the automated measurements we made, were significantly positively correlated with the observer behavioural observation data. Kinematic measures generated the highest correlation coefficient which is unsurprising given that they are designed to capture detailed 3D movements and were derived using four carefully positioned cameras to measure www.nature.com/scientificreports www.nature.com/scientificreports/ precise movement of markers on the pigs. The Load Platform produced the next strongest correlation, followed by the image analysis KLT Acceleration data, with the Kinect measure generating the weakest correlations. Load Platform and KLT Acceleration were the best predictors of Observer Startle Magnitude in our multilevel model. The relatively weak predictive power of kinematic data in these analyses likely reflects the smaller sample size for this measure. Our image analysis estimate of startle magnitude was also strongly positively correlated with the load platform, kinematic and depth-camera measures.   Table 4. Sensitivity, specificity, positive and negative predictive values of each automated measure as a predictor of whether a freeze was scored as occurring or not by a human observer.  www.nature.com/scientificreports www.nature.com/scientificreports/ Our KLT Speed and Acceleration image analysis estimates of freeze duration were strongly positively correlated with the observer measure, as were Kinect Acceleration and Speed, and Load Platform data. Because there were a high number of zeroes (non-freezes) during tests, we carried out sensitivity and specificity analyses of whether or not a freeze was detected against the human observer ground truth. All measures had high specificity -they rarely detected a freeze when the observer did not detect one -with computer vision KLT measures performing particularly strongly. Their positive predictive values -how well a detected freeze actually indicates that a freeze did occur -were therefore high. However sensitivity was lower, particularly for Load Platform and KLT Acceleration data, which frequently did not detect freezes when the observer did and hence generated a high proportion of false negatives. In many such cases, the observer recorded short freezes (<2 s) in which the body appeared tensed but isolated parts (e.g. ears) were moving. This short duration and residual movement may have accounted for the failure of other measures to detect a freeze response.
For true positive data (when both the observer and automated measures detected a freeze), automated measures of freeze duration were strongly positively correlated with those made by the observer, with KLT Speed and Kinect Speed performing best. Furthermore, KLT Speed estimates of freeze duration were also strongly positively correlated with the other automated measures.
In addition to analysing the relationship between different measures of startle and freeze behaviour, we also investigated factors that may influence expression of these behaviours in pigs tested under laboratory conditions. In our study, repeat-testing of individuals was necessary to generate sufficient examples of DC responses to establish associations between different measures of these responses, whilst also following 3Rs recommendations to minimise animal use. However, repeat-testing resulted in pigs habituating to the original startling stimulus (bursting balloon), as has been found in previous studies of pigs 72,84 . Consequently, we varied the nature of the stimulus within test sessions, starting in session 4, and startle response magnitude increased after this session. Similarly, the likelihood of a freeze response initially decreased from session 1 to session 2 when the bursting balloon was used repeatedly, but increased in session 4 when stimuli started to be varied within sessions. The type of stimulus used also affected DC responses with bin bag and umbrella stimuli, both involving rapid movement, being particularly effective at generating greater startle magnitudes than the balloon. Use of stimuli with a pronounced visual component may thus be more potent inducers of DC responses in pigs compared to purely auditory stimuli.
Larger startle magnitudes were observed when pigs were orientated away from the stimulus possibly because startling stimuli presented in this context induced a greater surprise reaction [20][21][22] . An alternative explanation is that those pigs orientated away from the stimulus were more likely to make a large jump with movement (Observer Startle Magnitude score of 3) in order to face the stimulus, whilst pigs already facing the stimulus may have been more likely to jump on the spot (score of 2). However, our finding that there was also an increased probability of freezing when pigs were orientated away from the stimulus lends support to the idea that the stimulus was indeed more startling and surprising in this context, including leading to increased post-startle processing (during freeze) to resolve the source of the event 21,22 .
A subjective rating of how 'relaxed or tense' the pigs were before the first stimulus was delivered during a test also influenced DC responses. For both startle magnitude and freeze occurrence an increase in reaction was seen for pigs scored as 2 or 3 when compared to the calmest pigs (scored as 1). However, this difference was not observed in pigs scored as 4 or 5 (least calm). In fact, the probability of a freeze reaction was decreased in pigs with a score of 4 compared to those with a score of 1. Although this seems counterintuitive, we observed informally that the least calm pigs appeared to pay least attention to their surroundings and were often attempting to leave the test pen. Consequently, delivery of the startling stimulus did not intrude into their already active behaviour. This re-emphasises the need for pigs to be settled and calm at the point of testing, for example in a home pen.  Table 6. Significant factors in the multilevel binomial regression model of Observer Freeze Occurrence. A positive estimated coefficient of a predictor indicates an increased probability of a freeze occurring, whereas a negative coefficient indicates a decreased probability. Bold text indicates effects that are significant at p < 0.05.
It is interesting to note that our models found corresponding effects of moderating factors on both startle magnitude and probability of freeze occurrence. Both increased when startling stimuli were altered, when the pig was facing away from them, and in pigs who were slightly or moderately tense prior to their presentation, although not for animals rated as tense or very tense who appeared to be focused on leaving the test pen. This coherence of effects indicates that startle magnitude and freeze probability may reflect a similar underlying construct, for example affective valence 25,28,38,40 .
In summary, we found that computer vision image analysis measures were comparable with force, kinematic and depth measurements at estimating ground truth observer measures of both startle magnitude and freeze duration/occurrence. These findings demonstrate that computerised image analysis of video recordings can be used to detect and quantify startle and freeze Defence Cascade responses in individual pigs, and hence has potential as a practical automated measure of these behaviours under field conditions. We also identified factors including startling stimulus characteristics, orientation relative to the stimulus, and 'relaxed-tense' behaviour that can moderate DC responses and hence should be borne in mind during further investigation of this behaviour. The similarity of these moderating effects across both startle and freeze behaviours indicates that these two forms of DC response may represent similar underlying constructs such as affective valence. If variation in startle and freeze behaviour does indeed reflect affective state in pigs, as it appears to in other species [20][21][22]28,[59][60][61] , computer vision image analysis may thus provide a novel method for assessing pig welfare in the field. Further research on the link between affective state and pig startle and freeze behaviour, the capacity of image analysis to measure these behaviours in groups of pigs, and the utility of the approach under field conditions, including startle stimulus design and issues of habituation, is needed to realise this potential 73,78,80 .

Data availability
Data for all variables used in the statistical analyses will be made available on request. Extensive sets of uncompressed raw video RGB data, Kinect data, load platform voltage data and kinematic x,y,z data are also available on reasonable request.