The measurement of peripheral oxytocin (OT) concentrations has become a widespread tool in psychology and animal behaviour research1,2,3. Previous work in humans has associated endogenous OT release with trust4,5, mother–child play6, and social affiliative touch7. Research with non-human animals has linked peripheral OT concentrations to prosocial and affiliative behaviour8,9, inter-specific interactions involving social touch10,11, and domestication12. OT is a neuropeptide hormone that regulates physiological processes such as eating behaviour and satiety13, heart rate and blood pressure14, birth, lactation, and parenting behaviour15,16, and also plays a crucial role in social bond formation and maintenance17. It is produced in the hypothalamus and released into the bloodstream by the pituitary gland, hence it can be measured centrally (in brain tissue by microdialysis18; in cerebrospinal fluid19) or in peripheral substrates (in plasma20; in milk21; in saliva22; in urine9). Non-invasive means of measuring OT concentrations (i.e., in saliva and urine) are in high demand because they do not disturb the subject’s behaviour and are less likely to cause a stress response which may in turn affect OT concentrations23,24.

However, the validity of studies measuring peripheral OT concentrations has been criticized25,26,27,28, with many published studies not reporting the essential validation steps of the assays used to measure OT and/or its metabolites, suggesting that inconsistent findings are likely associated with a lack of analytical rigor and consistency26. Comparability across studies and labs is severely hindered by the fact that there are no standardized protocols detailing how to prepare samples to measure OT and its metabolites in different sample matrices and/or species. For example, as demonstrated by a recent meta-analysis29, it is particularly important to state whether or not sample extraction has been conducted before analysis as this greatly affects measurements. There are several different ways to measure OT and its immunoreactive metabolites in peripheral substrates, including enzyme immunoassays (EIA9), radio immunoassays (RIA30), and mass spectrometry applications (i.e., LC–MS31,32; nanoLC-MS33). The current paper will focus on EIAs as they appear to be most commonly used in the behavioural sciences and psychology, yet to date only a few published studies conducted and reported validations for OT EIAs using peripheral substrates (Table 1).

Table 1 Overview of studies reporting validations of oxytocin EIAs using peripheral substrates (blood, urine, saliva).

In general, one can differentiate between full and partial validations: A full validation is necessary when establishing a new assay for the first time, or when a commercially available assay kit is used for the first time for a particular species and/or sample matrix. A partial validation may be sufficient when a commercial assay is used and the manufacturer has already assessed certain parameters (such robustness or antibody cross-reactivity) during development34. Nevertheless, each assay needs to be validated every time, prior to its use, in a different species, for each new sample matrix, or when a new extraction protocol is established. The following is usually needed to sufficiently validate an immunoassay for its intended use: an assessment of its (1) selectivity (i.e., antibody cross-reactivity), (2) dilution linearity or parallelism (i.e., to determine the assay’s linear range by using either spiked or non-spiked samples, and identify potentially interfering matrix effects, respectively), (3) extraction efficiency and assay accuracy to calculate percent recovery and variation, (4) performance of a biological or physiological validation using a known trigger of endogenous OT release or by administering exogenous OT, and finally, (5) assessment of antibody specificity using chromatographic separation3. Furthermore, recording measures of repeatability and precision (i.e., reported as intra- and inter-assay coefficients of variation (CV) and on-going internal quality control (QC) are required for continuous evaluation of assay performance throughout a study including publication of obtained values alongside results. A validation should reflect the intended purpose of a subsequent study and may be considered successful if it produces reliable results in the context of the data’s intended use (see ‘fit-for-purpose approach’ in biomarker research34,35). It should also allow the estimation of the smallest detectable effect to determine whether the assay is suitable given the expected effect size of a study. Lastly, even if an assay does not meet requirements for a given purpose, validation parameters should be reported nonetheless, as this information may contribute to saving valuable resources.

The aim of the present paper was to analytically and physiologically validate a commercially available OT EIA kit (Arbor Assays, Ann Arbor, MI, USA, Cat. No. K048-H5) for dog, wolf, and human urine samples, and compare its performance to another commercial kit (Enzo Life Sciences, Assay Designs, Cat. No. 901-153A-0001) previously validated for OTM measurement in dog and wolf36 as well as human urine37 by our group, thereby providing practical recommendations for future studies. To this end, we ran tests of parallelism for each species to investigate the presence of matrix effects. Next, we assessed extraction efficiency and assay accuracy followed by the determination of patterns of immunoreactivity (IR). Finally, we physiologically validated the assay by intranasally administering exogenous OT (or a placebo) to a group of pet dogs. All analytical parameters for the Enzo assay kit reported in this paper were obtained in the same way as for the Arbor assay. We used pooled samples from the same study populations for analytical validation of both assays; however, we did not reuse the old samples from the Enzo validation to avoid long storage periods. All tests were conducted by the same experimenter under the same laboratory conditions. Full methodological details and results for the Enzo assay were published before36,37 and are cited here for comparative purposes.

Material and methods


Urine samples of 11 pet dogs (5 females, 6 males) and 8 humans (4 females, 4 males) were collected at the Max-Planck-Institute for Evolutionary Anthropology (MPI EVA) in Leipzig, Germany, and urine samples of 6 wolves (3 females, 3 males) were collected at the Wolf Science Center (WSC), in Ernstbrunn, Austria, for analytical assay validation. All individuals were in good health status at the time of sample collection. For the physiological validation, 14 adult, healthy pet dogs of different breeds (9 males, 5 females) recruited from the database of the Clever Dog Lab (CDL) of the University of Veterinary Medicine (Vienna, Austria) were trained to inhale OT nasal spray (Syntocinon, Novartis) using a vaporizer mask previously shown to be effective in administering exogenous OT to dogs38.

Urine sample collection

Dog urine samples at the MPI EVA were collected when the dogs urinated spontaneously during leashed walks with their owners in an outside area in front of the institute. Urine samples were collected in plastic trays (Carl Roth, 5195.1) and brought to the Endocrinology Laboratory within 5 min. Human participants were asked to urinate into a plastic tray (Carl Roth, 5195.1) and samples were then brought to the Endocrinology Laboratory at the MPI EVA, as well within 5 min following collection.

Dogs at the CDL and wolves at the WSC (once habituated to the urine collection process using an expandable metal stick with a plastic cup attached; Carl Roth, 5195.1; Fig. 1) provided spontaneously voided urine samples during leashed walks with their owners or animal trainers, respectively. Within a maximum of 15 min following collection, samples (kept on ice packs in the meantime) were brought to the facilities of the CDL or WSC.

Figure 1
figure 1

Urine collection device used for dogs and wolves consisting of an expandable metal stick and a plastic cup.

At the respective laboratories, all urine samples (dog, wolf, and human) were subsequently divided into 1 ml aliquots and 100 µl of a 0.1% phosphoric acid (PA) was added per 1 ml sample to avoid OT degradation36,37. Samples were aliquoted and frozen at − 20 °C until further processing. In case samples had to be transported to the MPI EVA for extraction and analysis, they were kept on dry ice during shipment which took less than 12 h.

Intranasal oxytocin administration

To physiologically validate the assay at hand, we administered 12 international units (IU) OT nasal spray (Syntocinon, Novartis) or a placebo (PL; saline solution; 0.9% sodium chloride, Ringer) using a vaporizer mask (Nebutec, M-neb vet nebulizer and inhalation mask for dogs; see Schaebs et al.38 for details) to 14 pet dogs and collected urine samples before and 45–60 min after treatment. Each dog received both treatments in a semi-randomized and counterbalanced order, on different days. Analysis of the samples was blinded (i.e., the experimenter processing the samples did not know which treatment the dog had received).

Ethics declarations

Wolves and dogs

The study was discussed and approved by the institutional ethics and animal welfare committee and all experiments were performed in compliance with GSP and ARRIVE guidelines and national legislation. Specifically, approval was obtained from the ethical commission of the University of Veterinary Medicine, Vienna (approval number: ETK 05/03/2017) for the wolf samples, and from the ethical commission of the Max Planck Society for the dog samples (approval number 2017_07) used for the analytical assay validation. The OT/PL administration was part of a study with pet dogs run at the CDL (University of Veterinary Medicine, Vienna) and approved by its ethical commission (approval number: ETK 13/11/2017). We obtained informed consent from all pet dog owners after full description of the procedure.

Human participants

The study was discussed and approved by the institutional ethics committee and all experiments were performed in accordance with GSP guidelines and national legislation. Ethical approval for participation of human subjects was obtained from the ethical commission of the Max Planck Society (approval number 2017_09) and informed consent was obtained from all participants after full explanation of the purpose and nature of the study.

Sample extraction and urinary oxytocin metabolite measurement

All laboratory analyses were performed in the Endocrinology lab at the MPI EVA. Urine sample extraction with solid phase extraction (SPE) cartridges was conducted according to a previously validated and published protocol9 incorporating minor adjustments (see36 for details). Extracted samples were analysed according to the assay manufacturer’s instructions and incubated overnight at 4 °C. All samples were measured in duplicates. When optical density (OD) values of sample duplicates differed more than 10% the measurement was repeated or the sample got excluded from further analysis.

Average Zero standard (B0; wells contained only assay buffer but no sample) OD values achieved after incubation were more than twice as high with the Arbor as with the Enzo assay1.11 (SD 0.12; N = 12 plates) for the Arbor, and 0.47 (SD 0.08; N = 32 plates) for the Enzo assay, respectively.

The inter-assay CV of OTM concentrations for a high concentrated OT standard (QC high: 640 pg/ml; N = 5 plates) was 4.1%, and 16.8% for a low concentrated OT standard (QC low: 102.4 pg/ml; N = 5 plates). The intra-assay CV, as calculated by averaging variability across duplicates of all samples measured on a single assay plate, was 8.6% (N = 29 samples) for dog and wolf samples, and 9.5% (N = 29 samples) for human samples.

Analytical validation


We conducted a test for parallelism for each of the three species to investigate the potential presence of matrix effects. 450 µl of an extracted dog urine pool was spiked with 50 µl of an OT standard (concentration 1600 pg/ml; supplied by Arbor Assays) and diluted serially36. The same procedure was performed on an extracted wolf and human urine pool.

Extraction efficiency and assay accuracy

To determine extraction efficiency and assay accuracy, we created five pools of dog, wolf, and human urine samples. For extraction efficiency, 237.5 µl pooled urine samples were spiked with 12.5 µl of three different concentrations of an OT standard (delivered with the assay system; high: 40,000 pg/ml; medium: 20,000 pg/ml; low: 10,000 pg/ml) before extraction. To assess assay accuracy, 237.5 µl extracted urine samples were spiked with 12.5 µl of the same three different concentrations of an OT standard (see above). Subsequently, percent recovery was calculated following the formula reported in36.


Patterns of immunoreactivity (IR) were investigated following the protocol given in36. In brief, IR was determined by running 100 µl of extracted dog, wolf, human pool samples, or extracted OT standard, over a Waters Alliance 2695 high-performance liquid chromatograph (HPLC) equipped with a Gemini C18 column (Phenomenex, Torrance, CA, USA). The obtained fractions were collected with a Waters Fraction Collector 3 (Waters, Milford, MA, USA), lyophilized overnight, and kept frozen at − 20 °C until measurement with the EIA. We calculated the percentage of ‘explained IR’ (i.e., IR that overlapped with the OT standard and thus likely originates from OT or one of its degradation products/metabolites) according to the formula given in36.


All statistical tests were run and plots created using R (version 3.3.3; paired t-tests performed using version 4.0.239). We tested for parallelism by fitting a linear model including the interaction between sample type (standard curve and pooled sample) and the concentration of the standard with the percent binding as response variable36. The model was fitted using the function lm. The check for assumptions of normality and homogeneity of the residuals did not indicate any problems (inspection of a qq-plot of the residuals and residuals plotted against fitted values40). Model stability was assessed by means of DFBeta40, which did not indicate any problems. Paired t-tests were conducted to assess changes in urinary OTM concentrations from pre- to post-treatment using the data obtained from the physiological validation (intranasal OT administration). Effect sizes were determined using R2 (paired t squared/(paired t squared + df)).


The Arbor assay measured average OTM concentrations of 398 pg/ml (SD 158) in the dog urine pool, 367 pg/ml (SD 189) in the wolf urine pool, and 119 pg/ml (SD 62) in the human urine pool. In contrast, the Enzo assay measured average OTM concentrations of 152 pg/ml (SD 67) in pooled dog urine, 123 pg/ml (SD 47) in pooled wolf urine, and 35 pg/ml (SD 8) in pooled human urine36,37.


All three serially diluted pools were parallel to the standard curve (dog urine: t(12) =  − 0.233, P = 0.820; wolf urine: t(12) =  − 0.243, P = 0.812; human urine: t(12) =  − 0.351, P = 0.732) and this was confirmed by visual inspection (Fig. 2).

Figure 2
figure 2

Parallelism of serially diluted human, dog, and wolf urine pool samples to the oxytocin (OT) standard curve. Note that the x-axis is on a log scale.

Extraction efficiency and assay accuracy

For the dog urine pool, mean extraction efficiency was 138% (range: 127–144%; SD = 9.9; n = 3; Table 2) when spiked with a high, 137% (range: 115–157%; SD = 21.3; n = 3; Table 2) when spiked with a medium and 157% (range: 130–170%; SD = 22.7; n = 3; Table 2) when spiked with a low concentrated OT standard. Mean assay accuracy for the dog pool was 166% (range: 150–186%; SD = 18.0; n = 3; Table 2) when spiked with a high, 137% (range: 126–147%; SD = 15.1; n = 2; Table 2) when spiked with a medium and 137% (range: 134–140%; SD = 4.1; n = 2; Table 2) when spiked with a low concentrated OT standard.

Table 2 Extraction efficiency and assay accuracy for the Arbor and Enzo assays.

For the wolf urine pool, mean extraction efficiency was 119% (range: 111–128%; SD = 6.3; n = 5; Table 2) when spiked with a high, 132% (range: 118–143%; SD = 10.6; n = 4; Table 2) when spiked with a medium and 119% (range: 96.8–152%; SD = 21.5; n = 5; Table 2) when spiked with a low concentrated OT standard. Mean assay accuracy for the wolf pool was 129% (range: 115–146%; SD = 15.3; n = 5; Table 2) when spiked with a high, 132% (range: 119–149%; SD = 13.6; n = 5; Table 2) when spiked with a medium and 143% (range: 123–162%; SD = 19.5; n = 3; Table 2) when spiked with a low concentrated OT standard.

For the human urine pool, mean extraction efficiency was 105% (range: 102–107%; SD = 2.5; n = 3; Table 2) when spiked with a high, 99.6% (range: 97.2–102%; SD = 2.6; n = 3; Table 2) when spiked with a medium and 98.6% (range: 93.0–104%; SD = 8.0; n = 2; Table 2) when spiked with a low concentrated OT standard. Mean assay accuracy for the human pool was 112% (range: 110–113%; SD = 1.7; n = 3; Table 2) when spiked with a high, 116% (range: 111–126%; SD = 8.6; n = 3; Table 2) when spiked with a medium and 114% (range: 101–130%; SD = 14.7; n = 3; Table 2) when spiked with a low concentrated OT standard.


The immunogram of the extracted OT standard revealed IR in fractions 2 and 3 (accounting for 26.5% and 73.5% of the total IR, respectively; Fig. 3, Table 3).

Figure 3
figure 3

Percent of total immunoreactivity detected in each fraction in extracted oxytocin (OT) standard and extracted dog, wolf, and human urine.

Table 3 Percent of total immunoreactivity (IR) detected in extracted urine explained by IR found in extracted oxytocin (OT) standard as measured by Enzo and Arbor OT assays.

The immunogram of extracted dog urine revealed IR in fractions 2, 3 and 4 (accounting for 28.5%, 66.5% and 5% of the total IR, respectively; Fig. 3, Table 3). Thus, 95% of IR in extracted dog urine can be explained by that in extracted OT standard.

The immunogram of extracted wolf urine revealed IR in fractions 2 and 3 (accounting for 28% and 72% of the total IR, respectively; Fig. 3, Table 3). Thus, 100% of the IR found in extracted wolf urine can be explained by that in extracted OT standard.

The immunogram of extracted human urine revealed IR in fractions 2 and 3 (accounting for 25% and 75% of the total IR, respectively; Fig. 3, Table 3). Thus, 100% of the IR in extracted human urine can be explained by that in extracted OT standard.

Physiological assay validation

Following intranasal OT treatment, the Arbor assay measured an average increase in urinary OTM concentrations of 52.9% (average pre-treatment concentrations: 165 pg/mg creatinine; SD 86.1; range: 53.9–463; median = 147; average post-treatment concentrations: 252 pg/mg creatinine; SD 139; range: 75.2–718; median = 220). This increase was significant (t (36) = 4.23, P < 0.001) (Fig. 4a) and the effect size was large (R2 = 0.33). Following intranasal PL treatment, an increase in OTM concentrations of 4.9% was detected (average pre-treatment concentrations: 165 pg/mg creatinine; SD 94.3; range: 65.7–599; median = 144; average post-treatment concentrations: 173 pg/mg creatinine; SD 98.7; range: 54.1–476; median = 141). This increase was not significant (t (33) = 0.79, P = 0.44) (Fig. 4b) and the effect size was negligible (R2 = 0.02).

Figure 4
figure 4

(a, b) Individual changes of urinary OTM concentrations following intranasal OT (a) or placebo (b) treatment in pet dogs (boxes indicate the interquartile range, horizontal lines indicate the median).


With the present study, we aimed to evaluate the performance of a commercial EIA kit (Arbor Assays, Ann Arbor) to measure urinary OTM concentrations in dogs, wolves, and humans. In addition, we tested whether the assay would pick up changes in dogs’ urinary OTM concentrations following intranasal treatment with either OT or a PL solution. Building on previous studies by our group36,37 we compare and discuss the outcomes of this validation of the Arbor OT assay in relation to the results we obtained for another commercial kit from a different manufacturer, the Enzo OT assay kit (Enzo Life Sciences, Assay Designs), to guide decisions regarding assay suitability for the measurement of urinary OTM in dogs, wolves, and humans.

The Arbor assay performed well with regard to inter- and intra-assay CVs and parallelism, in all three species assessed, indicating that matrix effects were not an issue. However, similarly to the Enzo OT assay, values for extraction efficiency and assay accuracy were higher than 100% for dogs and wolves for low, medium, and high concentrations with relatively large SDs (Table 2). For the human samples, values for extraction efficiency exceeded 100% only for the high concentration, but all three concentrations (low, medium, high) for assay accuracy. However, SDs for human samples were considerably lower compared to the Enzo assay. Taken together, results indicate a comparable performance of the two assays with regard to accuracy and extraction efficiency for dogs and wolves, but warrants caution when measuring samples in the lower range of both assays as subtle differences may not be picked up36. For human urine samples, the Arbor assay performed better than the Enzo with regard to its accuracy.

There was a striking difference in average Zero standard (B0) OD values achieved following over-night incubation. Compared to the Enzo assay, the Arbor assay reached OD values more than twice as high. Low OD readings due to insufficient colour development can be caused, among other things, by low temperature (in the lab, or of the reagents), too short incubation periods, or too many wash cycles, and may result in low repeatability (i.e., higher intra-assay CVs) as the standard curve becomes relatively flat and small differences in OD values result in largely different hormone concentrations. Furthermore, the proportion of measurements which fall below or above the linear range of the standard curve increases. This results in more samples needing to be re-measured. Therefore, for this aspect of binding sensitivity, the present assay showed clear advantages over the previously validated one.

To evaluate whether the assay system indeed measures OT and its immunoreactive metabolites rather than cross-reacting substances that do not stem from the OT metabolism, patterns of IR in the samples were determined. For the Arbor assay, the immunogram of OT standard showed one major peak in fraction 3 accounting for 73.5% of total IR, as well as considerable IR in fraction 2 accounting for 26.5% of total IR. OT molecules are sensitive to structural changes due to temperature and pH-level of the samples41,42 and may be altered or broken down during sample handling and extraction3. The finding of IR in more than one fraction of OT standard hence suggests the presence of not only OT, but also OT degradation products36. The immunograms for wolf and human urine revealed that IR was present in the same two fractions (fractions 2 and 3) as in the OT standard sample, explaining 28% and 72%, and 25% and 75% of total IR, respectively. In case of dog urine, IR was found in three fractions (2, 3, and 4), accounting for 28.5%, 66.5%, and 5% of total IR, respectively. Thus, while for wolf and human urine, 100% of IR in the samples can be explained by IR in extracted OT standard, for dog urine, only 95% of IR detected matched IR present in extracted OT standard and a small proportion of additional IR was found in fraction 4, accounting for 5% of total IR. Since all urine samples for the analytical assay validation were collected and treated exactly the same way from storage and extraction to measurement, this may reflect species-specific differences in either the metabolic breakdown of the OT molecule in the body, degradation processes during handling, particular features of the urine (i.e., such as acidity/pH-level42), or the presence of cross-reacting substances in dog urine that do not stem from OT metabolism. To investigate in detail how OT is metabolized in each species and secreted into specific substrates, one would have to perform a radiometabolism study whereby a radioactively labelled hormone is injected into an animal and samples are taken repeatedly to investigate excretion patterns (see for example43). Unfortunately such studies, while of great interest and importance, are often not feasible in the species at hand due to high invasiveness, budget considerations, and specific requirements related to handling radioactive material.

To summarize, proportions of IR in urine explained by IR patterns in the OT standard were considerably higher when samples were measured with the Arbor than the Enzo assay (Fig. 3, Table 3), in particular for wolf and dog samples, indicating higher antibody specificity and capacity to detect urinary OT and its metabolites/degradation products. This further suggests that the OT antibodies provided by the different manufacturers varied in the epitopes they recognized, hence different OT metabolites were detected by the two assays (see also44 for a comparison of two EIAs and a RIA), and emphasizes the lack of comparability of absolute hormones values across studies when different assay systems are used even if both assays were validated for the species and substrates at hand45. To illustrate this discrepancy, we found average OTM concentrations in the population pools (N = 11 dogs; N = 6 wolves; N = 8 humans) to be more than twice as high when comparable pool samples were measured with the Arbor than with Enzo assay36,37.

The Arbor assay was able to detect changes in pet dogs’ urinary OTM concentrations after intranasal treatment with OT nasal spray using a vaporizer mask and performed similarly to the previously validated assay38. Specifically, urinary OTM concentrations increased significantly following intranasal OT administration but not when a PL treatment was applied. Similar results were obtained with the Enzo assay36 and thus both assays appear suitable to determine administration success in studies using intranasal OT administration in dogs.

In addition to reporting assay validation parameters, Schaebs and colleagues36 outlined important factors to consider concerning sample storage (particularly regarding temperature and storage time) and highlighted the importance of sample extraction. Here we added the validation of another commercially available assay and found that both assays met the requirements of a “fit-for-purpose” validation35 and may be used to measure urinary OTM in dogs, wolves, and humans in behavioural or psychological research. The Arbor assay performed better with regard to binding sensitivity (i.e., maximum OD values achieved) and antibody specificity (proportions of IR explained). Hence, while further refinement of extraction protocols is still required to improve measures of accuracy, the assay system validated here may offer improved performance compared to the Enzo assay for the measurement of urinary OTM in dogs, wolves, and humans. Importantly, careful consideration of reported variation in assay accuracy and extraction efficiency in combination with CVs of QCs will allow estimation whether the assay system is accurate enough for a given study purpose particularly when expected effect sizes are known. To conclude, the present study further cautions against comparing absolute values across studies/labs when different assay systems were used and highlights the need for rigorous method validation in peripheral OT research before carrying out studies.