Hormones are central regulators of organismal function and flexibility that mediate a diversity of phenotypic traits from early development through senescence. Yet despite these important roles, basic questions about how and why hormone systems vary within and across species remain unanswered. Here we describe HormoneBase, a database of circulating steroid hormone levels and their variation across vertebrates. This database aims to provide all available data on the mean, variation, and range of plasma glucocorticoids (both baseline and stress-induced) and androgens in free-living and un-manipulated adult vertebrates. HormoneBase (www.HormoneBase.org) currently includes >6,580 entries from 476 species, reported in 648 publications from 1967 to 2015, and unpublished datasets. Entries are associated with data on the species and population, sex, year and month of study, geographic coordinates, life history stage, method and latency of hormone sampling, and analysis technique. This novel resource could be used for analyses of the function and evolution of hormone systems, and the relationships between hormonal variation and a variety of processes including phenotypic variation, fitness, and species distributions.
Machine-accessible metadata file describing the reported data (ISA-tab format)
Background & Summary
Hormones are central regulators of phenotype, whose effects span multiple fields of research, from molecular biology to population biology1,
A particularly promising approach to answering such questions – and many others of broad interest to animal behaviour and organismal biology – lies in large-scale comparative analyses of the multitude of endocrine data that have been collected over the past several decades. Such analyses, conducted within a rigorous phylogenetic, environmental, and life-history framework, have the potential to illuminate the factors driving divergence in the hormonal mechanisms of behaviour, physiology, and morphology13,14. To date, most analyses have focused on relatively small taxonomic scales, and on comparing mean trait values across populations and species15,
In this context, we present HormoneBase, a resource of compiled endocrine data across vertebrates. Included in this dataset are >6,580 measures of mean and within-population variation in glucocorticoids and androgens from 476 species (Figs 1,2; Table 1) that were reported in 648 publications – and additional unpublished resources – between 1967 and 2015. Additional information on geographic location (Fig. 3), life history, study design, and time period accompanies each entry. By making HormoneBase publicly available we aim to encourage data sharing across the scientific community and facilitate research into the function and evolution of physiological traits.
Endocrine data were obtained from publications, and from several unpublished datasets (Data Citation 1: figshare https://doi.org/10.6084/m9.figshare.5649727). We searched for studies that conformed to our inclusion criteria using: (i) online academic databases (e.g., Google Scholar, Web of Science), and (ii) cross-referencing from other published works. Studies were selected for inclusion if they reported data on circulating glucocorticoids (baseline or stress-induced corticosterone/cortisol) or androgens (testosterone/11-ketotestosterone) that: (i) were from free-living populations, (ii) were collected from adults that had not been subject to an experimental manipulation prior to sampling (e.g., of hormones or the environment), (iii) measured plasma levels, (iv) did not pool data across males and females, or across adults and juveniles, and (v) were reported in or could be converted to a standard unit of measurement (ng/mL).
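As an illustration of how the five inclusion criteria and the ng/mL standardization might be applied programmatically, the sketch below encodes them as a single predicate. The `CandidateStudy` record, its field names, the supported unit strings, and the approximate molar masses are assumptions for illustration, not part of HormoneBase. Conversion from molar units uses ng/mL = nmol/L × M / 1000, where M is the molar mass in g/mol.

```python
from dataclasses import dataclass
from typing import Optional

# Approximate molar masses in g/mol (assumed values, for illustration only).
MOLAR_MASS_G_PER_MOL = {
    "cortisol": 362.46,
    "corticosterone": 346.46,
    "testosterone": 288.42,
    "11-ketotestosterone": 302.41,
}


def to_ng_per_ml(value: float, unit: str, hormone: str) -> Optional[float]:
    """Convert a concentration to ng/mL; return None if the unit is unsupported."""
    if unit == "ng/mL":
        return value
    if unit == "pg/mL":
        return value / 1000.0
    if unit == "nmol/L":
        # nmol/L * g/mol gives ng/L; divide by 1000 to obtain ng/mL.
        return value * MOLAR_MASS_G_PER_MOL[hormone] / 1000.0
    return None


@dataclass
class CandidateStudy:
    """Hypothetical summary of one candidate hormone measure."""
    free_living: bool            # criterion (i)
    adults_unmanipulated: bool   # criterion (ii)
    matrix: str                  # criterion (iii), e.g. "plasma"
    pools_sexes_or_ages: bool    # criterion (iv)
    hormone: str
    unit: str                    # criterion (v)


def meets_inclusion_criteria(s: CandidateStudy) -> bool:
    """Return True only if all five criteria hold and the unit is convertible."""
    return (
        s.free_living
        and s.adults_unmanipulated
        and s.matrix == "plasma"
        and not s.pools_sexes_or_ages
        and s.hormone in MOLAR_MASS_G_PER_MOL
        and to_ng_per_ml(1.0, s.unit, s.hormone) is not None
    )
```

The hormone check comes before the unit check so that an unsupported hormone never reaches the molar-mass lookup.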
Published values were obtained from text, tables, or supplementary materials, or extracted from published figures using the program Data Thief III (http://datathief.org). Entries include mean circulating concentrations (ng/mL) for each population/group and time period; whenever possible, data on within-population variation (coefficient of variation, standard error), range (maximum and minimum values), and sample size are also included. When papers did not directly report the coefficient of variation (CV), it was calculated from the standard deviation (SD), or from the standard error (SE) and sample size (n), according to the following formulas: $\mathrm{CV} = \mathrm{SD}/\bar{x}$ or $\mathrm{CV} = (\mathrm{SE} \times \sqrt{n})/\bar{x}$, where $\bar{x}$ is the reported mean concentration. If papers reported that outliers had been excluded, we noted this for each hormone measure, along with the criteria for exclusion where provided.
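A minimal sketch of the CV calculation is shown below. It assumes that either the SD, or the SE together with n, is available for an entry, and it expresses CV as a proportion rather than a percentage (the source does not state which). The function name is hypothetical. Because $\mathrm{SE} = \mathrm{SD}/\sqrt{n}$, the two branches give the same CV for consistent inputs.

```python
import math
from typing import Optional


def coefficient_of_variation(mean: float,
                             sd: Optional[float] = None,
                             se: Optional[float] = None,
                             n: Optional[int] = None) -> float:
    """CV as a proportion: SD / mean, or (SE * sqrt(n)) / mean when SD is missing."""
    if mean == 0:
        raise ValueError("CV is undefined for a mean of zero")
    if sd is not None:
        return sd / mean
    if se is not None and n is not None:
        # Recover SD from SE via SD = SE * sqrt(n).
        return se * math.sqrt(n) / mean
    raise ValueError("need SD, or SE together with n")
```

For example, a reported mean of 10 ng/mL with SE = 1 and n = 4 implies SD = 2 and therefore CV = 0.2.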
When a single reference reported multiple means for different groups of individuals (e.g., populations or life history stages), or from different time points, data were entered on separate lines. In cases where papers reported a single hormonal mean from data collected across multiple populations, the location of up to three of the sampled populations was noted in the entry. When stress-induced glucocorticoid levels were measured at multiple time points during a standardized stress series, only the time period at which mean glucocorticoid levels were highest was included.
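The rule for standardized stress series amounts to keeping the sampling time with the highest mean. In this hypothetical sketch, a series is a mapping from minutes after capture to the mean glucocorticoid level (ng/mL). Ties resolve to the first time point in insertion order. That choice is arbitrary, and the source does not specify one.

```python
from typing import Dict, Tuple


def peak_stress_timepoint(series: Dict[int, float]) -> Tuple[int, float]:
    """Return (minutes after capture, mean level) for the highest mean in a stress series."""
    if not series:
        raise ValueError("empty stress series")
    # max() over the keys, ranked by their mean values; the first maximum wins on ties.
    minutes = max(series, key=series.get)
    return minutes, series[minutes]
```

For a series sampled at 3, 15, 30, and 60 minutes whose means peak at 30 minutes, only the 30-minute value would be entered.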
We focused HormoneBase on androgens and glucocorticoids because these are currently the most widely sampled hormones across vertebrates. Because hormone concentrations are not directly comparable across biological matrices, we included only plasma hormone concentrations. Hormone levels are increasingly being measured in other biological matrices (e.g., feces, feathers), but these sample types are less well suited to large-scale comparative analyses because they measure hormone metabolites, which differ within and across species and assay/antibody types24.
Sample Collection and Assay Method Data
Because sampling method and assay technique may influence circulating hormone levels, we included specific information about capture, sampling, and assay approach. The time of day (i.e., range of hours) at which samples were collected was recorded as provided. The specific method used to capture free-living individuals was noted, and the capture/sampling method was assigned to one of three categories. “Active” sampling methods were those in which a blood sample was obtained rapidly and within a known period of time after targeting a previously undisturbed animal. “Passive” methods were those in which animals were sampled after an unknown period of restraint (e.g., non-continuously monitored traps or nets). “Attractant” methods were those in which animals were drawn to the site of capture using some type of attractant (e.g., song playbacks, baited traps). The maximum sampling latency (interval from capture to blood sampling) was recorded for androgens and baseline glucocorticoids. For stress-induced glucocorticoids, we also recorded the type of acute stressor and the interval from initial capture to collection of the stress-induced sample.
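The three capture categories could be encoded as shown below. This is a sketch only. The boolean inputs are assumptions made here for illustration and are not rules stated by the database authors. So is the precedence given to attractant use, under which a baited trap counts as “Attractant” even when the sampling time is known.

```python
from enum import Enum


class CaptureCategory(Enum):
    ACTIVE = "Active"
    PASSIVE = "Passive"
    ATTRACTANT = "Attractant"


def classify_capture(used_attractant: bool,
                     known_time_since_targeting: bool,
                     previously_undisturbed: bool) -> CaptureCategory:
    """Assign a capture method to one of the three categories (illustrative rules)."""
    # Assumption: any attractant (playback, bait) takes precedence over timing.
    if used_attractant:
        return CaptureCategory.ATTRACTANT
    # Active: an undisturbed animal was targeted and sampled within a known interval.
    if known_time_since_targeting and previously_undisturbed:
        return CaptureCategory.ACTIVE
    # Otherwise the restraint period is unknown, e.g. unmonitored traps or nets.
    return CaptureCategory.PASSIVE
```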
To explore and control for potential differences in assay technique, we included information on the assay method used to assess plasma hormone levels (e.g., radioimmunoassay, enzyme immunoassay), and, where provided, the specific antibody or commercial kit that was used. Because laboratory identity can also influence measured hormone levels25, we recorded the identity of the laboratory in which hormone assays were conducted. For collaborative papers that were the product of multiple endocrine laboratories and did not identify where assays were conducted, we arbitrarily assigned one of the collaborating laboratories as the assay location.
Taxonomic and Geographical Data
All endocrine data include associated taxonomic information, using common and scientific names. Where relevant, scientific names were updated to reflect recent reclassifications. Taxonomy was determined using major lineage-specific trees (ray-finned fishes26, amphibians27,28, mammals29, squamates30, turtles31, and birds32).
The location name, geographic coordinates (latitude and longitude in decimal degrees), and elevation (in meters) of the population from which the data were collected are also recorded for each entry. When these were not provided in the original publication, approximate geographic coordinates and elevation were determined by searching for the location name in Google Earth.
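Coordinates that a source reports in degrees, minutes, and seconds need converting to decimal degrees before entry. The helper below is hypothetical. It follows the usual convention of encoding southern latitudes and western longitudes as negative values.

```python
def dms_to_decimal(degrees: float, minutes: float = 0.0, seconds: float = 0.0,
                   hemisphere: str = "N") -> float:
    """Convert degrees/minutes/seconds plus hemisphere letter to signed decimal degrees."""
    if hemisphere not in {"N", "S", "E", "W"}:
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must lie in [0, 60)")
    value = abs(degrees) + minutes / 60.0 + seconds / 3600.0
    # South and West are negative by convention.
    return -value if hemisphere in {"S", "W"} else value
```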
Temporal and Life History Data
To enable assessment of seasonal and life history patterns17,33,34, we included information on the time period of sampling as reported (the month and year in which data were collected) and the life history stage of sampled individuals. Measurements were characterized as coming from breeding or non-breeding individuals, or a combination of the two. Designations were based on author classifications when provided. When life history stage was not provided in the original data source, samples were classified as coming from a combination of breeding and non-breeding individuals, except in cases where seasonally breeding populations were sampled only during months that did not overlap with the breeding season.
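The default rule for assigning breeding status can be sketched as follows. The function, its label strings, and the representation of months as integers from 1 to 12 are illustrative assumptions. Passing `breeding_months=None` stands for a population that is not treated as a seasonal breeder.

```python
from typing import Optional, Set


def classify_breeding_status(author_label: Optional[str],
                             sampled_months: Set[int],
                             breeding_months: Optional[Set[int]]) -> str:
    """Return 'breeding', 'non-breeding', or 'combination' following the stated rule."""
    # 1. Author classifications take priority when provided.
    if author_label is not None:
        return author_label
    # 2. Seasonal breeders sampled only outside the breeding season are non-breeding.
    if breeding_months is not None and sampled_months and not (sampled_months & breeding_months):
        return "non-breeding"
    # 3. Otherwise default to a combination of breeding and non-breeding individuals.
    return "combination"
```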
When life history sub-stage was provided in the original data source, this information was also included in the database. To standardize across species with widely varying terminology, reported sub-stages were grouped into fourteen categories: pre-breeding, courtship, incubation, copulation, gravid/pregnant, non-gravid/pregnant, laying, young care, lactation, post-breeding, migration, torpor, hibernation, and pre-basic moult. When information about life history sub-stage was not contained in the original data source, this field was left blank. An associated column indicates whether the sampled individuals were confirmed to be in a given life history stage (e.g., incubating birds captured off their nests), or whether the life history stage reflected the typical stage for individuals in that population at the time of sampling (e.g., birds sampled in mist nets during the breeding season but not traced to a specific nest). For birds, information on moult status was also recorded as provided35.
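The fourteen sub-stage categories can serve as a controlled vocabulary. This sketch uses a hypothetical function name. It normalizes whitespace and case, returns `None` for a blank field, and rejects labels outside the list. Mapping authors' varied terminology onto the categories would still require a manual or separately defined lookup, which is not shown here.

```python
from typing import Optional

# The fourteen sub-stage categories listed in the text.
SUB_STAGES = frozenset({
    "pre-breeding", "courtship", "incubation", "copulation",
    "gravid/pregnant", "non-gravid/pregnant", "laying", "young care",
    "lactation", "post-breeding", "migration", "torpor",
    "hibernation", "pre-basic moult",
})


def normalize_sub_stage(raw: Optional[str]) -> Optional[str]:
    """Return a canonical sub-stage label, or None when the source gave none."""
    if raw is None or not raw.strip():
        return None  # the field is left blank
    label = raw.strip().lower()
    if label not in SUB_STAGES:
        raise ValueError(f"unrecognized sub-stage: {raw!r}")
    return label
```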
The HormoneBase dataset (Data Citation 1: figshare https://doi.org/10.6084/m9.figshare.5649727) is provided as two comma-separated values text files: a single file that includes all data described above (HormoneBase_v1.csv), and a file that contains the reference information for the source of each entry (HormoneBase_references_v1.csv). Variable names are provided in the first row, with details of each variable and units measured summarized in Table 2 (available online only). These files are accompanied by a metadata pdf file (HormoneBase_MetadataData.pdf).
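Both CSV files can be read with Python's standard `csv` module, which takes the variable names from the first row. This is a minimal sketch that assumes UTF-8 encoding (the source does not state the encoding), and the helper names are hypothetical. How entries link to their references should be checked against Table 2 and the metadata file rather than assumed.

```python
import csv
import io
from typing import Dict, List


def parse_table(text: str) -> List[Dict[str, str]]:
    """Parse CSV text whose first row holds variable names into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def load_table(path: str) -> List[Dict[str, str]]:
    """Read a CSV file such as HormoneBase_v1.csv (encoding assumed to be UTF-8)."""
    with open(path, newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))
```

A typical call would be `entries = load_table("HormoneBase_v1.csv")`, followed by a second call for `HormoneBase_references_v1.csv`. All values are returned as strings, so numeric fields need explicit conversion.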
The data presented in HormoneBase are primarily from published, peer-reviewed sources, but also include unpublished data provided by authors. Data entry was initially proofread by each lab that entered the data to confirm that the entries matched the reported data. Upon submission to the central repository, two members of the database entry team independently examined each entry to identify incomplete entries or extreme values. All hormone measures were also mapped onto a phylogeny to reveal putative taxonomic outliers. When such cases were identified, entries were confirmed or corrected by consulting the original source.
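The checks for incomplete entries and extreme values were carried out manually, but a similar first pass can be automated. This sketch flags missing required fields. It also flags extreme values within one group of comparable entries (for example, a single species, sex, and hormone) using a modified z-score, computed from the median and median absolute deviation of log10-transformed means with the commonly used 3.5 cutoff. The field names, the grouping, and the threshold are all assumptions. This is not the phylogenetic outlier check described above.

```python
import math
import statistics
from typing import Dict, List, Optional

REQUIRED_FIELDS = ("species", "sex", "mean_ng_ml")  # hypothetical field names


def _positive_number(value: object) -> Optional[float]:
    """Return value as a float if it is a positive number, else None."""
    if isinstance(value, (int, float)) and not isinstance(value, bool) and value > 0:
        return float(value)
    return None


def flag_entries(entries: List[Dict[str, object]], threshold: float = 3.5) -> List[List[str]]:
    """Return a list of flags per entry: 'missing:<field>' and/or 'extreme'."""
    values = [_positive_number(e.get("mean_ng_ml")) for e in entries]
    logs = [math.log10(v) for v in values if v is not None]
    median = statistics.median(logs) if logs else 0.0
    mad = statistics.median([abs(x - median) for x in logs]) if logs else 0.0

    results: List[List[str]] = []
    for entry, value in zip(entries, values):
        flags = [f"missing:{k}" for k in REQUIRED_FIELDS if entry.get(k) in (None, "")]
        if value is not None and mad > 0:
            # Modified z-score on the log scale (0.6745 rescales MAD to a normal SD).
            modified_z = 0.6745 * (math.log10(value) - median) / mad
            if abs(modified_z) > threshold:
                flags.append("extreme")
        results.append(flags)
    return results
```

A robust score is used because a single large value inflates an ordinary standard deviation enough to hide itself in small groups.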
How to cite this article: Vitousek, M. N. et al. HormoneBase, a population-level database of steroid hormone levels across vertebrates. Sci. Data 5:180097 doi: 10.1038/sdata.2018.97 (2018).
Vitousek, M. N. et al. figshare https://doi.org/10.6084/m9.figshare.5649727 (2017)
We wish to thank Sarah Burgan, Stefania Casagrande, Pete Collins, Julia Cramer, Travis Flock, Jennifer Hoots, Michael Jessel, Lucia Mentesana, Meredith Miles, Steffanie Munguia, Samantha Murphy, Sophie Nicolich-Henkin, Sara Ocasio, Eric Schuppe, Beth Skinner, Jocelyn Stedman, Ashley Stenstrom, Sarah Talamantes, and Andrew Wang for assistance compiling hormone data. We are especially grateful to John Wingfield for submitting a substantial amount of unpublished data for inclusion in HormoneBase, and to Patrick Kelley for compiling these data.