Introduction

Measuring glucocorticoids (GCs) deposited in hair is an increasingly popular method for biomarker-based stress assessment. Hair is sampled easily and painlessly, it is often an abundant source of material, and it has been argued to have superior qualities over other methods for analyzing GCs when it comes to gauging chronic stress1,2,3. If GCs are sequestered from the blood stream and locked into place in the growing hair at the level of the hair follicle, a single strand of hair contains within it a historical record of the HPA axis activity of its owner spanning months into the past. This idea is taken to the next level when researchers segment hairs and analyze different sections, ostensibly corresponding to different periods in the past, to make inferences regarding the perceived stress levels over time of their subjects, whether human patients4, captive animals5, or long-dead mummies6.

But are hair glucocorticoids (hGCs) a robust marker of stress? Local production of GCs in the hair follicle has been confirmed7, with the local (follicular) HPA axis appearing to respond to local stressors independently of the rest of the organism8. The rate of incorporation of GCs into hairs9 is unclear and the mechanisms by which this takes place are unknown10,11. Taken together, it is unclear to what degree hGCs reflect the (central) stress response of an individual. It has even been suggested that an individual’s hGC levels may follow a circadian rhythm, changing with the time of day12. Moreover, with some evidence that hGC concentrations change much faster than can be explained by mechanisms concerning only incorporation of GCs in the hair follicle13,14, can sections of a hair really be related to a specific period in the past?

Hair analysis for quantitating GC levels builds on a long-standing forensic practice of looking for deposition of toxins and illicit substances in hairs and fingernails15. Methods for detecting residual anabolic steroid hormones in hair gained momentum in the wake of the high-profile 1998 doping scandal in competitive cycling16,17, and were subsequently extended to also quantitate endogenously produced steroids18. A study in 2002 on rock hyraxes2 is, we believe, the first study to infer preceding stress from the concentrations of hGCs. In 2007, the first study linking hGCs to stress in humans was published20 (the study seems to have been initiated years earlier19), citing the method’s successful application in animals (specifically, Davenport et al.1). Notably, having inherited a head start in method development from forensics, none of these early papers needed to address methodological issues with respect to when and how GCs were sequestered into hairs. There are, however, crucial differences between the forensic application of hair analysis and hair analysis of GCs for estimating preceding stress. With illicit (exogenous) substances, concentrations are less important – there are no legal circulating levels of cocaine, athletes should not present with traces of nandrolone in their system, and if your hairs contain traces of thallium you have angered a regime and you are secretly being poisoned – quantitation is mostly only relevant with regard to detection limits. Endogenous compounds are more problematic in a forensic context as relative levels become important (testosterone doping in athletes being a notorious forensic conundrum21). Moreover, relating the position in a strand of hair of an exogenous substance of interest to a time of exposure in the past is highly approximate, even if the substance’s interactions with the hair matrix are well-characterized22,23. Individual differences in hair uptake (capacity) have also been noted, suggesting that comparisons of measured quantities between subjects may be inappropriate24,25. For hGC analyses, by contrast, concentrations, timing of deposition, and inter-individual comparisons are key features. Here, there are no clear precedents within the vast body of literature on forensic hair analysis for hGC analyses to lean on.

Despite the many unknowns surrounding the use of hGCs as a measure of chronic stress, the biomarker is presently used to gauge mental illness26 and the wellbeing of human trauma victims27 (and long-term consequences of the trauma28,29), post-traumatic stress disorder (PTSD) sufferers4,30,31, and children20,32, as well as to assess animal welfare in wildlife33, captive animals34,35, and laboratory animals36. The mismatch between the uncertainties of the method and the confidence with which it is applied is concerning.

The present systematic review strove to collect and evaluate the empirical evidence supporting the use of hGC analysis as a method for assessing physiological and psychological stress. With the method being used both for studying animals and humans, and the two bodies of literature feeding off one another, limiting the study to either humans or non-human animals would not have painted a complete picture. Unlike previous reviews/meta-analyses3,26,37,38, we thus set out to collate data from all vertebrate species, including studies carried out in human and non-human animals alike. First, we investigated whether hGC levels had been found to correlate well with concentrations of GCs in other biological matrices that are established as reflecting HPA axis activity in the recent past. Second, we collated studies that compared hGC levels in stressed individuals to those of unstressed controls to determine whether an up-regulated HPA axis would produce a predictable increase in hGCs. Through meta-analytic investigation of these two sets of studies, we attempted to answer whether hGC concentrations appear to be reflective of the central HPA axis activity. The focus of this review is thus on the evidence supporting the claim that stressful stimuli result in measurable increases in hGC levels. Moreover, by subdividing controlled studies by type and temporality of the stressor, we attempted to determine whether hGC levels are better suited for describing certain types of stress than others.

Material and Methods

The methods listed below were pre-specified in a study protocol accessible online since Jan 13, 2016 (Supplemental materials C).

A broad – inclusive – search strategy was employed in an attempt to find all relevant publications that could provide unambiguous evidence of hGCs being related to the HPA axis-activating stress response of an individual. Studies where hGC levels were used as a measure of stress or where hGC levels were correlated to GC levels in other biological matrices (blood, saliva, urine and feces) were retrieved through multiple databases (MEDLINE, Web of Science, EMBASE, Zoological Record, and PsycINFO). The publication searches were conducted in January 2016. Although we have contextualized our findings with more recent examples, no studies published after this date were included in the analyses. Quality assessments were made according to an adapted (nine-item) checklist and basic study information was extracted along with hGC results. Data from multiple sources were synthesized, and summary estimates were created separately for correlations with different biological matrices. Similarly, experimental designs deemed fundamentally incompatible were separated out and individual summary estimates were created. Specifically, studies were classified as studying “induced (acute) stress”, “chronic stress”, “observed stress” (where the stressor was inferred by an observer), “self-assessed stress”, “past stress” (where a subject was exposed to a stressful period, which had subsequently ended prior to hair collection) and post-traumatic stress disorder (“PTSD”; which we chose to include because its link with hGCs was receiving great attention at the time this review was scoped). Random effects models were used throughout and differences between experimental subjects and controls were expressed as standardized mean differences. For detailed descriptions of the methods, refer to Supplemental materials A.
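As an illustration of how a standardized mean difference can be derived from reported group summaries, the following minimal Python sketch computes Hedges' g and its approximate sampling variance. It assumes the common textbook formulas (pooled standard deviation, small-sample correction factor); the function name and the example numbers are hypothetical and are not taken from any included study or from the supplemental methods.

```python
import math

def hedges_g(mean_s, sd_s, n_s, mean_c, sd_c, n_c):
    """Return (g, variance of g) for a stressed group versus a control group."""
    df = n_s + n_c - 2
    # Pooled standard deviation across both groups
    sd_pooled = math.sqrt(((n_s - 1) * sd_s**2 + (n_c - 1) * sd_c**2) / df)
    d = (mean_s - mean_c) / sd_pooled            # Cohen's d
    j = 1 - 3 / (4 * df - 1)                     # small-sample correction factor
    # Approximate sampling variance of d
    var_d = (n_s + n_c) / (n_s * n_c) + d**2 / (2 * (n_s + n_c))
    return j * d, j**2 * var_d

# Hypothetical hGC summaries (arbitrary units), for illustration only
g, var_g = hedges_g(mean_s=30.0, sd_s=10.0, n_s=10,
                    mean_c=20.0, sd_c=10.0, n_c=10)
print(f"g = {g:.3f}, SE = {math.sqrt(var_g):.3f}")
```

A positive g indicates higher hGC concentrations in the stressed group than in the controls; the variance is what a random effects model would use to weight the study.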

Results

A total of 3,518 unique entries were found using the search strategy, of which 468 entries were retained for full text analysis (Fig. 1). A majority of these studies were subsequently excluded due to not meeting the pre-stated inclusion criteria: 28% were excluded due to their exploratory study design – often characterized by the lack of a control group and a clear a priori hypothesis; 16% presented no data from a controlled study – these were mostly method papers, reviews, opinion papers, and other narrative journal entries; 26% were not peer-reviewed publications – these were mostly meeting abstracts and theses. Other incompatible study designs, and entries where the full text could not be obtained, made up 17% of the entries retained for full text screenings.

Figure 1

Flow chart outlining the systematic search strategy, the subsequent screening, and inclusion/exclusion of database entries. The diagram has been adapted from the PRISMA Flow Diagram87.

For the entries retained for full text screening – where all texts were verified to concern the use of hGCs – an exponential growth in method adoption is obvious: 2015 saw more publications on hGCs than had been published between 2003 and 2011 in total. Presently, a new publication (counting also non-peer reviewed entries) on hGCs is available online every three days (or less).

Study quality of experimental studies

Of the 59 peer-reviewed publications included in the present systematic review, 38 papers reported on 42 studies with a stress group/control group design that could be assessed for study quality.

A salient trend was found when assessing the risk of bias: a majority of the 38 papers did not account for the possibility that a stressor other than the one that was purportedly studied could have influenced the results. This is evident in Fig. 2, focusing on checklist items 2, 3 and 8: the influence of concurrent interventions or unintended exposures could only be ruled out in 9 (24%) of the studies (item 3), the influence of confounding factors could only be ruled out in 16 (42%) of the studies (item 2), and only 12 (32%) of the studies featured a study design that ensured that the subjects were equally exposed to any confounding factors (item 8). In only three studies (8%) could all three sources of bias be ruled out entirely. Similar ambient conditions for stress and control groups could also only be guaranteed in 15 (39%) of the studies (item 5). Remarkably, only 3 (8%) of the studies reported on blinding of the outcome assessors (item 6), even though this is an explicit recommendation of most present-day best-practice frameworks (e.g. the ARRIVE guidelines39). In no single study were all of the sources of bias addressed, and in a few none were (for a by-entry summary of the risk-of-bias analyses, refer to Supplemental materials B, appendix 1).

Figure 2

Results from the risk-of-bias checklist assessment of the experimental study designs.

Study characteristics and data extraction

The studies retained for analysis presented a diverse set, with no two study designs quite alike (Tables 1 and 2). Of the studies retained for analysis, roughly half (48%) were human studies. Both sexes have been studied in roughly equal numbers (52% female subjects across all studies), but only rarely were equal sex ratios employed in any one study; study objectives and opportunistic sampling of e.g. wildlife populations tended to bias the sex ratio in favor of one or the other. We made initial attempts at exploring sex differences – similar to a previous meta-analysis38 – however, the data were insufficient to draw any conclusions. Similarly, when extracting data we had hoped to be able to compare the effects of differing sampling and analysis protocols that have been discussed previously40. However, the laboratory methods employed were fairly similar and the study designs fairly dissimilar, a combination that lends itself poorly to stringent analyses. Human studies were consistent in sampling the posterior vertex of the head, whereas the non-human studies appeared to sample regions by convenience or at random (e.g. studies in dogs have sampled backs, shoulders, chests, and legs, depending on research group and study). Although often discussed as a potential issue41,42, no study reported including hair follicles in its hair samples, and all but six papers5,34,43,44,45,46 explicitly described methods designed to ensure that samples were free of follicles. It has been theorized that the color of a hair can influence the GC content47; however, the results are inconsistent10 and hair color is rarely reported. We consequently did not attempt to extract information on this. Only human and other primate studies employed the “stress calendar” idea, sub-sectioning hairs to infer circulating GC levels at multiple time points in the past from the same sample. Of all the studies retained for analysis, a clear majority (48 studies, 83%) employed a washing step, intended to remove contaminants from the outside of the hairs, and all but two studies minced/pulverized the hairs prior to analysis. For quantification of GCs, antibody-based methods were most frequently employed (48 studies, 83%); however, numerous different protocols/antibodies have been utilized.

Table 1 Study characteristics for studies with extracted correlations. In total, 29 comparisons were extracted from 27 peer-reviewed publications, collecting data from 1,093 subjects across nine species. These studies were used for producing summary estimates of correlation coefficients between hGC concentrations and GC concentrations in other biological matrices.
Table 2 Study characteristics for experimental studies. In total, 42 comparisons were extracted from 38 peer-reviewed publications, collecting data from 3,199 subjects across 16 species. These studies were used for creating summary estimates for the difference in hGC concentrations between stressed subjects and controls for different sub-categories of stressors.

When extracting data, two studies – Manenschijn et al.48 and Luo et al.4 – were singled out as having a reported precision more than tenfold higher than the other 38 studies (including studies utilizing the very same methodology in comparable subjects). We believe that this is simply due to incorrect reporting of the measure of dispersion. Unable to reach the authors for a comment – despite multiple attempts – we have tentatively included the data from these studies, assuming that the graphically presented measures of dispersion were in fact SEMs, rather than – as listed – 95% CIs. Three studies – two experimental studies of chronic stressors33,49, and one study reporting a non-significant correlation between hGC and GC in saliva50 – were excluded from further analysis as critical information could not be obtained from the corresponding authors (none of the studies listed the number of samples/subjects used in their analyses). We do not expect these exclusions to have significantly altered our summary estimates, however, as these studies were of moderate size and all fell into well-populated subgroups.
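The consequence of mislabelling a measure of dispersion can be illustrated with a minimal sketch. Assuming a normal approximation (z = 1.96) and a hypothetical error bar and sample size, the Python below back-calculates the standard deviation implied by the same bar when it is read as an SEM and when it is read as the half-width of a 95% CI. The function names and numbers are illustrative assumptions, not values from the studies discussed.

```python
import math

def sd_from_sem(sem, n):
    """Standard deviation implied by a standard error of the mean."""
    return sem * math.sqrt(n)

def sd_from_ci95(half_width, n, z=1.96):
    """Standard deviation implied by the half-width of a 95% CI
    (normal approximation; z is an assumed critical value)."""
    return half_width / z * math.sqrt(n)

# Hypothetical error bar of length 2.0 around a group mean, n = 25
bar, n = 2.0, 25
sd_if_sem = sd_from_sem(bar, n)    # bar interpreted as an SEM
sd_if_ci = sd_from_ci95(bar, n)    # same bar interpreted as a 95% CI half-width
print(sd_if_sem, round(sd_if_ci, 2), round(sd_if_sem / sd_if_ci, 2))
```

Under these assumptions, reading a bar as a 95% CI half-width when it is in fact an SEM understates the standard deviation by a factor of about 1.96, which makes a study appear more precise than it is and inflates its weight in a summary estimate.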

Correlations with GC in other matrices

Meta-analyses of correlation coefficients revealed a great deal of heterogeneity between studies, as could be expected from the diverse set of studies analyzed (Fig. 3). Significant synthesized meta-correlations could be found between hGCs and GCs in blood, saliva and feces. A significant correlation could not be found between hGCs and GCs in urine. However, this analysis featured only five studies (collecting 169 subjects), all with fairly high intra-study variance of data. Leave-one-out analysis furthermore revealed that the statistically significant correlation found between GCs in blood and hGCs could not be substantiated if data from the study by Yu et al.51 were removed. Moreover, removing the data from the study by Accorsi et al.52 would more than halve the synthesized correlation coefficient between GCs in feces and hGC (putting it in range with the other correlations at r = 0.22), suggesting that the strength of the correlation may be somewhat overestimated. Additionally, removing a single study for each correlation summary would reduce heterogeneity markedly, bringing down the largest I² value to 38%, suggesting that a small number of studies were responsible for a majority of the heterogeneity (the leave-one-out analyses are presented in their entirety in Supplemental Materials B, Appendix 2). A factor adding to the heterogeneity may also be that some studies averaged multiple samples over time – e.g. the study by D’Anna-Hernandez et al.53 where hair and saliva samples were averaged across four sampling points throughout human pregnancy, or Sauvé et al.54 comparing single hair samples to urine samples averaged over a 24-hour period. With a relatively small dataset, we did not see fit to analyze these as separate subgroups, as doing so would have increased our “researcher degrees of freedom”55,56 and risked flagging spurious correlations (this would, further, only have been possible for the correlation with GC in saliva). Moreover, the studies that correlated both point estimates and averages31,57,58,59 did not demonstrate a consistent difference between the two approaches.
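The logic of a leave-one-out analysis on synthesized correlations can be sketched as follows. The Python below is a minimal illustration, assuming Fisher's z-transform with sampling variance 1/(n − 3) and DerSimonian-Laird random-effects pooling; all function names and the four example studies are hypothetical and do not reproduce the data or software used for this review.

```python
import math

def dl_pool(effects, variances):
    """DerSimonian-Laird random-effects estimate and its standard error."""
    w = [1.0 / v for v in variances]                       # fixed-effect weights
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]           # random-effects weights
    pooled = sum(wi * e for wi, e in zip(w_re, effects)) / sum(w_re)
    return pooled, math.sqrt(1.0 / sum(w_re))

def pool_correlations(rs, ns):
    """Pool correlations on Fisher's z scale; return (r, ci_low, ci_high)."""
    zs = [math.atanh(r) for r in rs]
    vs = [1.0 / (n - 3) for n in ns]
    z, se = dl_pool(zs, vs)
    return math.tanh(z), math.tanh(z - 1.96 * se), math.tanh(z + 1.96 * se)

def leave_one_out(rs, ns):
    """Re-pool the correlations with each study removed in turn."""
    return [pool_correlations(rs[:i] + rs[i + 1:], ns[:i] + ns[i + 1:])
            for i in range(len(rs))]

# Hypothetical data: three similar studies and one with a much stronger correlation
rs = [0.20, 0.25, 0.15, 0.80]
ns = [40, 50, 30, 35]
for i, (r, lo, hi) in enumerate(leave_one_out(rs, ns)):
    print(f"without study {i}: r = {r:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

In this toy example the pooled correlation drops sharply when the fourth study is omitted, which is the kind of single-study sensitivity described above.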

Figure 3

Synthesis of correlation coefficients. Forest plots are presented for correlation coefficients between hGCs and GC in (A) blood, (B) saliva, (C) urine, and (D) feces. Where multiple coefficients were reported in the same study (grey markers) these were weighted accordingly, to avoid biasing the random effects model. The summary estimate correlation coefficients are reported with 95% CI.

hGCs as a measure of stress

Summarizing the evidence from the experimental studies using random effects models produced varied results (Fig. 4), reaffirming our decision to carry out subgroup analyses. Induced (acute) stress models produced a clear elevation in GC concentrations measured in hairs (Fig. 4A) with low inter-study heterogeneity (I² could not be estimated). Chronic stressors also produced a significant elevation in deposited GC compared to control groups (Fig. 4B). The results from the chronic stress studies were, however, highly heterogeneous, with a majority of the variance of the summary estimate stemming from between-study variation, as opposed to within-study variation (I² = 80%), suggesting that not all of the studies were comparable with respect to the stress response and its effect on hGC levels. Simply put, it is unlikely that these studies all describe a similar HPA axis activation in response to the studied stressor; it is, for instance, likely that some scenarios simply did not induce a stress response. Observed stress (Fig. 4C) and self-assessed stress (Fig. 4D) produced unclear results. Finally, stressors that had subsided at the time of sampling (“past stress”) did not produce a measurable elevation in hGCs (Fig. 4E). Studies concerning hGCs measured in PTSD sufferers similarly generated unclear results (Fig. 4F), with a combination of studies showing both elevations and decreases in hGC output relative to a control group.

Figure 4

Forest plots summarizing results from induced (acute stress) studies (A), chronic stress studies (B), studies where stress was inferred through observation (C) or assessed by the subject (D), studies where the stressor had passed (E), and studies featuring subjects with PTSD (F). The numbers of subjects in the studies are listed with the control group last. Summary estimates (Hedges’ g) were constructed using the DerSimonian-Laird approach and are reported with 95% CI.
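For readers unfamiliar with how the heterogeneity statistic relates to the summary estimate, the following Python sketch shows one way a DerSimonian-Laird summary estimate, its 95% CI, the between-study variance τ², and I² can be computed from study-level effect sizes and variances. It is a minimal illustration under textbook formulas; the function name, the normal-approximation CI, and the two example studies are assumptions and do not correspond to the data in Fig. 4.

```python
import math

def random_effects(effects, variances):
    """Pool effect sizes (e.g. Hedges' g) with the DerSimonian-Laird estimator.

    Returns the pooled estimate, its 95% CI, tau^2 and I^2 (in percent).
    """
    w = [1.0 / v for v in variances]                        # inverse-variance weights
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0         # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * e for wi, e in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    # Share of total variation attributed to between-study differences
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"g": pooled,
            "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
            "tau2": tau2,
            "i2": i2}

# Two hypothetical studies with equal variances but different effects
result = random_effects([0.0, 1.0], [0.1, 0.1])
print(result)
```

I² is derived from Cochran's Q and the degrees of freedom, so with very few studies or with Q below its degrees of freedom it is truncated at zero and carries little information.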

Discussion

Study landscape as evidenced by systematic searches

With a couple of new papers appearing every week that concern or utilize hGC analysis, it is fair to say that it has become a widespread method for assessing stress. But hair growth is a slow process, and popular speculation60,61 suggests that GCs are sequestered by hairs over several weeks – if not months. Consequently, controlled studies are, for logistical reasons, hard to design and execute. Perhaps this is why our search strategy turned up more narrative reviews, opinion papers, and book chapters lauding the method than it did actual controlled studies providing empirical evidence that the method is a sound one. Moreover, the typical (i.e. the most numerous) study employing hGC analyses, published prior to February 2016, was an exploratory one. Characteristically, a single cohort of subjects had hair samples collected along with a number of other environmental, physiological, psychological, and/or demographic data. Correlations were then constructed to scrutinize which parameters were linked to elevated hGC concentrations. The topics of these studies are varied, from investigations of environmental effects on squirrel gliders62 or social effects on German shepherds63 to probing cultural64, environmental65, nutritional32 or genetic66 influences on psychological stress in people of differing ages. The implicit prior assumption for studies of this kind is that hGCs are linked to central HPA axis functioning and are thus a measure of (chronic) stress. This puts even more of an onus on the (relatively small number of) controlled studies to validate and affirm the use of hGC concentrations as a measure of stress.

Is there support for hGCs as a biomarker of stress?

The present investigation supports the use of hGCs as a measure of central HPA axis functioning and, consequently, as a stress-sensitive biomarker. The compiled data, however, call into question the temporality of the marker, suggesting it is a better marker of ongoing than of past stress.

In studies where subjects were exposed to a controlled stressor, a predictable elevation was found in most cases. Whether repeated ACTH challenges67 or a more elaborate protocol combining multiple stressors were employed68, a consistent increase was found across species when comparing challenged subjects to unstressed controls. The effect of acute stress on hGC levels seems, furthermore, to be rapid. Whereas most studies sampled hairs at least two weeks after having applied the stressor, the study by Cattet et al.14 is remarkable in that they report elevated hGCs within hours of stressor onset. In a similar vein, most stress protocols were applied continuously for weeks before hGC concentrations were evaluated, but González-de-la-Vara et al.67 found that they could detect an elevation in hGCs two weeks after a pair of sustained-release ACTH injections. Both studies point to hGC concentrations being reflective, primarily, of events in the recent past, as opposed to historical stressors. This is also consistent with hGCs correlating with GCs in other matrices.

Although both inter- and intra-study variances were high for the collated data, it is clear that hGC concentrations correlate significantly with GC concentrations in other matrices. The synthesized correlation coefficients are weak to moderate – ranging from 0.13 to 0.56 – but this is in range with the correlations between established matrices obtained in these very same studies54,59,69,70. Due to large fluctuations stemming from the pulsatile nature of GC release71, coupled with the different temporality of the matrices – serum and saliva concentrations of GCs change in a matter of minutes in response to a stressor, urinary and fecal GCs change over a period of hours72 – these correlations will inevitably be moderate at most. The correlation between hGCs and GCs in feces is the strongest of the four, which is to be expected as fecal samples integrate circulating GC concentrations over a period of several hours. Hairs are similarly suggested to sequester GCs from circulation over a longer time window. Contrary to popular claims, however, it is unlikely that this time window is several weeks long, as hGC concentrations also correlate significantly with serum and salivary concentrations of GCs.

The effect of confounders on measured hGC levels in chronic stress studies

When compiling studies of chronic stress, a link between individuals experiencing stress and elevated levels of hGCs was found, albeit a weaker link than for acute stressors. The greater level of heterogeneity of this dataset is probably in part because some of the studies were carried out under highly uncontrolled circumstances. With long-term studies featuring subjects – whether human or non-human – in an uncontrolled environment, it is hard to ensure that the studied stressor is the sole and most influential source of stress. It may be that a lack of dietary salmon elicits a physiological stress reaction in grizzly bears, as suggested by Bryan et al.43, but it is quite impossible to tell what other factors might influence the life and allostasis of these bears. The confounding factors of this study may well have overshadowed the effect the authors were looking for. Similarly, military training is not all long marches and adrenaline-fueled combat training. With no outside verification, the soldiers undergoing basic training studied by Boesch and collaborators73 may not have had a more active HPA axis than e.g. an office worker with an active lifestyle in the period of sampling. This is not to criticize these experiments; rather, this is to highlight the fact that a number of studies into chronic stressors have an exploratory element to them, as the magnitude of the chronic stressor is hard to judge in relation to a host of ambient stressors. Our risk-of-bias assessment singled out unrelated confounding factors as the most common unchecked source of bias. Only 24% of studies could account for external confounding factors in the studied period, and only in 32% of studies could they be assumed to have been distributed equally between the studied subject groups. The evidence supplied by the chronic stress studies should thus be interpreted carefully.

Differences between the human concept of stress and HPA axis activity

In our investigation, studies where periods of stress were inferred presented highly heterogeneous data. It has been shown before that when human subjects are asked to introspectively assess their own level of stress, assessments correlate poorly with their actual HPA axis functioning38,74,75. In the present investigation we see a similar trend for studies relying on self-reported stress. Whereas we will note that the present investigation contains only a handful of studies, no consistent trend or even weak effect can be inferred. This is not to say that the subjects were not experiencing psychological stress – the studies collect data from distressed subjects ranging from survivors of natural disasters76 to patients sourced from mental health services31,77 – but it serves as a reminder that the human concept of stress is not synonymous with the prototypical fight-or-flight response. Different states of stress will involve the HPA axis differently. This is further exemplified by the studies of PTSD, where in two studies31,78 a notable reduction in hGCs is found for PTSD subjects when compared to healthy controls. We specifically analyzed PTSD studies separately as it has been suggested that PTSD is accompanied by a lowering in circulating GC levels, as opposed to an elevation. Notably, PTSD subjects are identified through clinical scores suggesting the subjects were assigned to groups according to arbitrary cutoffs in a continuum of chronic stress diagnoses. This muddying of the waters, where the line between chronic stress conditions and PTSD is blurred, may, in part, explain why no clear trend is found concerning hGC profiles for either. In the future, a larger dataset that would allow for a more stringent subgrouping of chronic stress studies based on e.g. clinical scores may assist in identifying more uniform profiles.

Studies where the stressed subjects were identified through observation similarly did not paint a consistent picture. Whether the result of studying animal behavior5,79 or of putting human patients through structured interviews4,80, there seems to be a mismatch between subjectively assessed stress and hGC levels. For this category of studies we will note that it is particularly concerning that no blinding was employed, even though the findings hinge completely on the subjective assessment of an external observer. Would the marked difference between groups have been as profound in the study by Carlitz et al.5 (the average effect size was greater than that found in any induced stress study) if periods of stress had been recorded by a blinded observer? The study was made even more subjective and bias-prone by not ever defining the studied stressors, leaving it in the hands of unblinded observers to determine what to consider a stressor. With few studies and heterogeneous results, it is currently hard to determine whether studies that rely on externally assessing stress can provide empirical evidence with respect to the utility of hGC analyses.

Temporal aspects of hGCs as a biomarker of stress

An important factor shared between the chronic stress studies that demonstrate a clear difference between stressed and control subjects is that the stressor persisted at the time of sampling. When singling out the studies where the stressor could be positively ensured to have subsided at the time of hair sampling, the pattern was found to be different. In the study by Kapoor et al.81, pregnant rhesus monkeys were exposed to a daily acoustic startle stress protocol for five weeks. Serum samples analyzed for circulating GC levels were used to verify that the protocol elicited a significant stress response throughout the period. When analyzing hGC concentrations 3–13 weeks later (depending on subject), no elevation could be found relative to a control group; not a trace to be found of a considerable elevation of circulating GC levels persisting for five weeks. Similarly, when Ashley et al.34 analyzed hairs from both reindeer and caribou two weeks after a single (non-sustained-release) ACTH challenge, no elevation could be found. Fecal GC analyses confirmed that the stressor had subsided after 24–48 hours. With only two studies in this category, we should be careful not to over-interpret; however, this is all part of a recurring pattern. In a recent meta-analysis, Stalder et al.38 reanalyzed historical data from human studies in aggregate – collecting data from 66 studies and more than 10,000 hGC samples – and found that in cases of past/absent stress, no relation with elevated hGC concentrations could be found. The idea of hairs containing a historical record of past stress is, and remains, completely unproven, empirical evidence instead pointing to hGCs being a measure of concurrent stress.

Evidence that hGC concentrations are a historical record of stress could also come in the form of studies sub-sectioning hairs, inferring circulating GC levels at multiple time-points in the past. However, the GC levels of hairs were found to be similar across all the sampled segments – individuals with elevated levels of hGCs had higher levels of hGCs in all segments when compared to controls77,82,83. In only two studies the authors attempted to construct a narrative based on point-to-point fluctuations in GC concentrations along the hair shafts. The findings by Luo and collaborators4 are however marred by a strong wash-out effect, with hGC levels successively becoming lower the further away from the scalp a segment is sourced. The most distant segment is purported to contain the lowest levels of hGCs as this segment is hypothesized to correspond to a period before a major trauma. However, this also holds true for the non-traumatized controls, undermining the hypothesis. The study by Carlitz et al.5 is similarly problematic in that the narrative seems to have been constructed post hoc, and only three individual profiles are shown in the paper. To our knowledge, there is little evidence to suggest that interactions between a steroid hormone and a strand of hair are strong enough to lock the molecules permanently in place. This effect is the basis by which a specific section of hair can be related to a time-point in the past. Convincing evidence that baleen from whales can trap hormones, leaving a historical record of hormonal fluctuations, has been presented84,85. A similar case for hair – a distantly related keratinous matrix – remains elusive however. A recent investigation using radiolabeled GCs in monkeys has instead presented fairly conclusive evidence that GCs do not form discrete bands in strands of hair, but that GCs move along the shaft of hair post-deposition86. 
The difference between baleen and hair may lie in the gauge and density of the matrices: baleen samples are extracted from a depth of several centimeters using power tools, whereas a micrometer-thick hair is processed in its entirety.

Regardless, the evidence provided by sub-sectioning of hairs, taken together, suggests that hGCs are distributed along a strand of hair by longitudinal transport of the hydrophobic hormones, whether through diffusion or capillary action (possibly helped along by the waxy sebum). Reading too much into point-to-point fluctuations thus currently appears to be a case of seeing patterns where there are none to be found.
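
The consequence of longitudinal transport for segmental analysis can be illustrated with a toy calculation. The sketch below is not drawn from any of the reviewed studies: it assumes simple one-dimensional diffusion along a hair divided into 30 hypothetical segments, with an arbitrary baseline of 1.0, an arbitrary "stress band" of 5.0, and an arbitrary transport rate and number of time steps, none of which are fitted to data or carry physical units. It only shows that, if GCs are free to move along the shaft, an initially discrete band flattens towards a uniformly elevated profile while the total amount of hormone is conserved.

```python
def diffuse(profile, rate=0.2, steps=200):
    """Explicit finite-difference diffusion along a 1-D shaft with closed ends.

    profile: hormone amount per segment (arbitrary units).
    rate:    hypothetical fraction of the difference between neighbouring
             segments exchanged per time step (must stay below 0.5).
    steps:   hypothetical number of time steps.
    """
    current = list(profile)
    for _ in range(steps):
        nxt = current[:]
        for i in range(len(current) - 1):
            # Net transfer from segment i to i + 1 follows the local gradient.
            flux = rate * (current[i] - current[i + 1])
            nxt[i] -= flux
            nxt[i + 1] += flux
        current = nxt
    return current


# Hypothetical starting point: a discrete band deposited in segments 10-14.
initial = [1.0] * 30
for i in range(10, 15):
    initial[i] = 5.0

final = diffuse(initial)

# Ratio of highest to lowest segment, before and after transport.
contrast_before = max(initial) / min(initial)
contrast_after = max(final) / min(final)

print(f"contrast before: {contrast_before:.2f}")
print(f"contrast after:  {contrast_after:.2f}")
print(f"total before: {sum(initial):.2f}, total after: {sum(final):.2f}")
```

Under these assumptions every segment ends up above the baseline and the band is no longer distinguishable as a discrete period, which is qualitatively the pattern reported in the sub-sectioning studies. Whether real transport along a hair resembles diffusion at all remains an open question.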

Conclusions

Combining the results of controlled studies with the correlational evidence, it seems fair to state that hGC levels relate to central HPA axis functioning. GC levels in hairs appear to be an appropriate marker of ongoing physiological stress. If the stressor persists, hGC analyses will remain useful; however, it is currently inadvisable to interpret events in the past based on hGC levels. The idea of GCs being locked into place, providing a historical record of HPA axis functioning, has been called into question every time it has been tested in a controlled experiment. Based on the collected evidence, we would strongly advise against sub-segmenting hairs and speculating about specific periods in the past. We would be delighted to be proven wrong by a future study, but there is something to be said not only about the studies our search strategy uncovered, but also about the ones that could not be found. Whereas it is hard to design a study where subjects’ stress levels are controlled for weeks on end, it is far from impossible to design a study to test the hypothesis that a stressor in the past can be uncovered in a specific segment of hair. Yet, these studies are nowhere to be found. Although the available data did not allow for a stringent exploration of publication bias, it seems highly probable that a number of studies providing negative results have remained unpublished. With this review, and others like it, it is our hope that these negative findings may find their way into publication, providing a better picture of when hGC analyses are appropriate, and when they are not.

We strongly recommend that current and future research into hGC analyses focus on some of the fundamental questions. How are GCs incorporated into hairs? How long do they remain post-deposition? There are a number of basic questions that can be answered with modest means through clever study design. With two notable exceptions9,86, studies utilizing radioisotope-labelled GCs are almost entirely missing; yet, the information that could be gained from studies of this type is invaluable. Applied studies utilizing hGC analyses would, in the meantime, do well to approach theoretical concepts surrounding hGCs in an agnostic fashion. We do not currently know how far into the past we can measure stress through sampling hairs. Stating that a certain protocol measures e.g. three months of preceding stress is misleading and perpetuates misinformation. Moreover, it is important that the duration of stressors be recorded and reported as accurately as possible. If we are able to pin down the temporality of hGCs as a stress marker, findings of studies in the past may have to be re-interpreted. This will, however, only be possible if there is enough information on the timing of stressors relative to hair sampling. By detailing, as best they can, the timing of stressors, researchers will, in a sense, future-proof their study results. Moreover, we hope that researchers will be more wary of unaccounted-for sources of stress in their studies. In many cases, these cannot be avoided. Consequently, we would encourage the reporting of possible “contaminating factors” – sources of stress that could not be accounted for in the study design. We would also urge authors to publish their raw data, or, at the very least, keep a record of them. In our investigation we were disappointed to find peer-reviewed publications missing crucial basic information (e.g. the number of subjects analyzed in a study) and to learn, when contacting the authors, that the information could not be produced.
With most journals being able to host supporting data files online, and a host of repositories available for when journals cannot, there is no reason to withhold data and risk losing important records. Finally, we note that many of the analyzed papers utilizing hGC analyses have made useful contributions to science, whether giving a voice to overlooked wildlife, trying to improve animal welfare, or assessing the mental wellbeing of people. We may seem critical of some of these studies; however, this comes from an adamant belief that we can and should do even better. We must constantly hold ourselves to a higher standard in order to improve our field of research.