Emotions are mental states occurring in response to external or internal stimuli which influence an individual’s fitness1. In the dimensional model of emotions, emotional states can be mapped in two-dimensional space, where valence (positive versus negative) and arousal (intensity) characterise each dimension1,2. Emotional states in non-human animals have been demonstrated in a broad range of species across the taxonomic scale (e.g. insects:3,4; birds:5,6; mammals:7,8). However, clear indicators of positively valenced emotion are rare, because identifying consistent correlates on the valence dimension is difficult9,10. Emotions are multi-componential and each component can vary across the two dimensions of arousal and valence1. The four pillars of emotion assessment in humans are physiological, behavioural, cognitive and subjective verbal report11,12. Quantifying emotions in non-human animals relies on indirect assessment of three of these pillars: physiology, behaviour and cognition; the fourth pillar is solely accessible in humans. Cognitive bias testing assesses the effect of emotion on cognition in animals13,14,15. However, this test requires lengthy specialised training of animals to implement, which itself could potentially alter the emotional state of the indivudals being assessed16,17. Vocalisations, however, can be reliably measured and may allow the quantification of emotions in non-human animals as they reflect the inner state of the caller, providing information about an individuals’ affective state10,18,19. Thus, emotions expressed in vocalisations may play an important role in shapping social interactions among indiviuduals.

Morton’s motivational-structural rules state that acoustic characteristics of vocalisations are predictable from the context in which they are produced20. Vocalisations produced in more ‘hostile’ contexts are expected to have lower frequencies and longer durations than vocalisations produced in ‘friendly’ or ‘fearful’ contexts20. However, in addition to between-context variation in acoustic characteristics, variation within vocalisation types may also be context-dependent, and may reflect the internal state of the caller21,22,23. In the domestic pig, the most common vocalisation is the grunt, which functions predominantly as a contact call24,25. Information encoded in grunts includes body size25, individual identity26, personality type27 and emotional state28. Pigs’ vocal responses to mental and physical stressors differ depending on the nature of the stressor29. The cognitive bias paradigm has been applied to assess of the effects of various factors on emotional valence in pigs7,30,31.

The aim of this study was to test whether vocalisations produced in oppositely valenced contexts differ in their acoustic characteristics. We based the choice of acoustic parameters to measure on previous studies (see Table 1) and measured these in grunts produced in two different situations. In the first situation, the valence of the context was experimentally manipulated using cognitive bias positive and negative training trials. In the second situation, response to the probe in ambiguous test trials provided the measure of the valence of the context for the individual. The valence of each context was defined as approaching a stimulus in positive contexts and avoiding a stimulus in negative contexts32. We then analysed the effect of optimistic versus pessimistic responding on the acoustic characteristics of grunts, within each situation, in two separate analyses.

Table 1 Overview of the acoustic parameters measured and the underlying rationale for each parameter included in the analysis.


Animals and housing

We studied crossbred PIC 337, Large White x Landrace pigs, allocated into groups of 18 per pen, balanced for weight and sex, where they remained from 4–10 weeks of age at a research farm (Agri-Food and Biosciences Institute, Hillsborough, Northern Ireland). All test subjects were 10 weeks old at the time of cognitive bias testing. The experiment consisted of three replicates. Within each replicate there were two types of housing environment treatment; a barren environment and an enriched environment (cf.27,33 and see electronic supplementary material for full details). The barren environment had a partially slatted concrete floor and the minimum legal space allowance of 0.41 m2 per pig, whereas the enriched environment had a solid floor with straw bedding and a greater space allowance of 0.62 m2 per pig.

Ethical note

Ethical approval for this research was granted from University of Lincoln’s College of Science Ethics Committee (COSREC62, 8/9/15). All procedures conformed to the ASAB/ABS Guidelines for the Use of Animals in Research.

Cognitive bias testing

Twenty-seven pigs were tested in a spatial cognitive bias task (see Supplementary Methods for full details of training). On the cognitive bias test day, each pig was exposed once individually to each of the ambiguous probe locations: near positive (NP), middle (M) and near negative (NN), with pseudo-randomization between two positive (P) and negative (N) training trials resulting in 9 trials per test (i.e. P, N, M, N, P, NN, N, P, NP). Ambiguous locations were unrewarded but in positive and negative training trials the locations were baited as in training. Latency to reach the bowl was recorded in all trials. Individuals were given 30 seconds to approach the bowl and if they did not approach in that time they were recorded as having a latency of 30 sec and returned to the start box for the next trial.

The raw latency to reach the probe data violated the assumptions of the linear models, therefore we classified responses to the probe within each as “optimistic”, “pessimistic” or “neutral” (cf.31). Pigs’ responses in each trial were classified in relation to the mean speed of approach to the trained positive and negative targets from the final training session using the following formula:

$$adjusted\,score=\frac{x}{[(y+z)/2]}\,\times 100$$

where x = latency to contact the bowl during the cognitive bias test, y = mean latency to contact the bowl during the positive training trials, z = mean latency to contact the bowl during the negative training trials. As per Carreras et al. (2016), adjusted scores were calculated as percentages; if the adjusted score was ≤75% the response was classified as “optimistic”; if the score was >75 and <125% the response was classified as “neutral”; and if the score was ≥125% the response was classified as “pessimistic”.

Acoustic recording and analysis

Vocalisations produced during the cognitive bias test were recorded at a distance of 1–4 meters using a Sennheiser ME66/K6 (Sennheiser Electronic GmbH & Co. KG, Wedemark, Germany) directional microphone connected to a Marantz PMD660 (D&M Professional, Kanagawa, Japan) solid state audio recorder (.wav format, sampling frequency: 44.1 kHz, resolution: 16 bit). The microphone was placed on one corner of the arena at a height of 1.5 m and pointed into the test arena (see Fig. S1 for microphone location above test arena). The recorder was switched on at the start of the test session for each individual and all trials were recorded.

All good quality grunts with a high signal-to-noise ratio from the positive, negative and ambiguous trials were selected for analysis (see Fig. 1 for spectrograms). Grunts were chosen for analysis on the condition that they were at least three calls apart, if produced in a bout, to avoid pseudoreplication. A total of 125 calls were produced when a subject responded optimistically; 153 calls were produced when a subject responded pessimistically; 4 calls were produced when the subject’s response was neutral and these 4 calls were excluded from subsequent analyses. A further 14 calls that had been produced in positive and negative trials where the subject had responded ‘incorrectly’ (i.e. responded pessimistically in the positive trial and optimistically in the negative trial) were also excluded. This resulted in a total of 264 calls from 23 individuals for analysis (78 calls from ambiguous test trials and 186 calls from the positive and negative training trials: see Table S1 for full details of number of calls contributed per individual). Using a custom built script in PRAAT34, we measured 11 acoustic parameters from each call (see Table 1 for list of parameters measured and supplement for further details of acoustic analysis). Some parameters (amplitude modulations and rate) could not be measured in every call and this resulted in some missing values, hence the sample size differs between acoustic parameters (Table 2).

Figure 1
figure 1

Grunt vocalisations of domestic pig. Grunt vocalisations from individual 71: (a) is from a positive trial with duration = 0.372 seconds and (b) is from a negative trial with duration = 0.411 seconds. F0 = fundamental frequency and F1 = the first formant (calls are available as audio files “Positive context grunt” and “Negative context grunt” respectively in the supplementary material). Visualisation settings: view range = 0–8000 Hz, window length = 0.03 sec, dynamic range = 65 dB, time steps = 700, frequency steps = 250, Gaussian window.

Table 2 Raw means and SDs, along with results from models for the effect of training trial type, i.e. Negative or Positive, controlled for sex, environment, and individual identity along with statistical results, sample size (N) and P values.

Statistical analysis

Data analysis was conducted using R v. 3.0.2 (R Development Core Team 2008). The statistical analysis of the acoustic parameters was split into two analyses: (1) the trained positive/negative trials, and; (2) the ambiguous trials. This was because these two trial types, trained versus ambiguous, were fundamentally different. In the trained trials the bowls were baited with either a positive or negative stimulus and thus, were expected to elicit either a positive or negative valence state during that trial. The ambiguous trials, on the other hand, were not baited and in addition, in each of the 3 ambiguous trials, this was the first time the individual had encountred the probe in that location.

Linear mixed models were used to investigate the effect of response type (optimistic or pessimistic) on the acoustic parameters in the positive/negative trials and the ambiguous trials separately, accounting for experimental design (see electronic supplementary material for full details of models). Model residuals were visually inspected for normality and homoscedasticity. Data are presented as mean ± SD.


Positive and negative training trials

The duration of calls produced in the negative training trials was significantly longer than the duration of calls produced during the positive training trials (raw mean ± SD: negative = 0.43 ± 0.15, positive = 0.35 ± 0.15, p = 0.0017). There was no significant difference between the positive and negative training trials in the other acoustic parameters investigated (see Table 2). To correct for multiple testing, we applied a Bonferroni correction; this led to a corrected value of p < 0.0046 (i.e. 0.05/11), which did not affect the interpretation of any of the results.

Ambiguous trials

There were no significant effect of response type (optimistic or pessimistic) on the acoustic parameters of the grunts from the ambiguous trials (see Table 2).


We found that emotional valence is encoded in the duration of calls. Grunts produced during contexts of positive valence were shorter than those produced during contexts of negative valence. Call duration is affected by respiration, and changes in the action or tension of the respiratory muscles may explain the shorter duration of vocalisations in positive compared to negative contexts10. Longer calls produced in response to negative stimuli may function to increase the salience of these, presumably more urgent, calls to conspecifics. Previous studies in pigs have also found shorter call duration in positive contexts compared to negative contexts28,35, with similar findings in horses23, elephants21 and dogs36. Thus, call duration appears to provide a consistent indicator of emotional valence in mammalian vocalisations10.

Interestingly, we did not find significant differences in the other measured acoustic parameters between positive and negative contexts, which differs from recent studies22,37,38. There are several non-mutually exclusive explantions why the outcomes differ among studies: firstly, the emotional valence experienced by indiviudals may not have differed enough to be reflected in the measured acoustic parameters. The only difference between the two contexts was the position of the bowl and whether a reward was received or not, and the trials were carried out in quick succession. The dimensional model views valence as a continuum. Other studies may have induced emotions further apart on the continuum, whilst our experimental contexts may have induced emotions that were closer to each other on the continuum. Thus, an acoustic parameter would need to be very sensitive to subtle differences in valence to be used as an indicator of valences that are closer together on the continuum. Secondly, the experimental design of previous studies differed from ours in several other aspects, including day of recording, number of individuals present during recording and the functional relevance of the context e.g. social isolation, startling and aggression as negative contexts and food anticipation and affiliative interactions as positive contexts22,28,37,38. Thirdly, it may be that the calls produced here were influenced more by unmeasured factors than the valence of the contexts. Leliveld and colleagues found that the context accounted for only 1.5–11.9% of the variability in the acoustic parameters they measured28, suggesting that many other factors influence the acoustic characteristics of vocalisations.

Vocalisations may be more associated with the expression of arousal than valence39,40. Indeed, one of the main challenges in assessing emotional valence is eliciting oppositely valenced states whilst not affecting arousal, since negatively valenced states also tend to be higher arousal states37. For assessing the characteristics of vocalisations in such conditions, this is problematic, as high arousal is known to impact the acoustic characteristics of vocalisations41. Fundamental frequency parameters and energy quartiles are robust measures of emotional arousal in mammals10,42,43. We did not detect any statistically significant differences in these parameters between the positively and negatively valenced contexts (see Table 2). This suggests that level of arousal did not differ between the positive and negative contexts used here.

In the ambiguous trials, individuals were presented with an ambiguous cue, i.e. the bowl located in the near positive, middle and near negative locations. In contrast to the positive and negative trained trials, we did not find any significant associations between optimistic/pessimistic response type and any of the acoustic parameters measured during the ambiguous trials. However, there was a trend for calls to be shorter when produced during an optimistic response than a pessimistic response. As the training trials were rewarded/unrewarded, we predicted they would induce emotional states of positive and negative valence respectively44. In contrast, responses in ambiguous trials provide a measure of the animals’ underlying emotional state14, rather than inducing any valenced emotional state per se. Thus, optimistic and pessimistic responding in the ambiguous trials may not have been directly comparable to optimistic and pessimistic responding in the training trials.

Emotions are typically measured through behaviour, for example through body posture45, facial expression46,47, and vocalisations48. For social species, the ability to recognise the emotional state of conspecifics is likely to be adaptive as it may enable them to predict the future behaviour of the individual and adjust their behaviour accordingly49,50,51. Individuals of different species are able to perceive, and respond to, differences in emotional state based on differences in the acoustic structure of the same type of vocalisations produced in different contexts51,52,53,54. Changes in the characteristics of vocalisations could potentially function as a signal of emotional state to conspecifics, however further research is required to fully understand the effect of valence on acoustic parameters in animal vocalisations. Recent studies on emotional contagion suggest that cues relating to emotional state can elicit similar changes in behaviour in individuals who have no direct experience of the original cue55,56,57. Thus, changes in vocalisations corresponding with emotions could function to signal emotional state to conspecifics and to regulate social interactions.

In conclusion, duration of vocalisation appears to be a consistent indicator of valence across species10, and our results suggest that it may also be a sensitive indicator of small differences in valence. This is encouraging, as call duration has the advantage of being easy to measure and apply in a wide variety of contexts. To assess whether valence affects the acoustic parameters of calls, it is necessary to understand how similar or dissimilar oppositely valenced states are to each other. This could help uncover which acoustic parameters are sensitive to small changes in valence, and which others may only be affected by large differences in valence.