Long-term monitoring of the Iberian ibex population in the Sierra Nevada of the southeast Iberian Peninsula

This dataset provides long-term information on the presence of the Iberian ibex (Capra pyrenaica hispanica Schimper, 1848) in Sierra Nevada (SE Iberian Peninsula). Data on the abundance and demographic structure of the Iberian ibex population were compiled over the last three decades. Along transects, observers recorded variables such as the number of individuals sighted, the perpendicular distance from each group of Iberian ibex to the transect line, and the sex of each individual, together with the age of males. These data enabled the calculation of population parameters such as density, sex ratio, birth rate, and age structure. These parameters are key for Iberian ibex conservation and management, given that Sierra Nevada harbours the largest population of this species in the Iberian Peninsula. The dataset is structured according to the Darwin Core biodiversity data standard and contains 3,091 events (582 transect walk events and 2,509 group sighting events), 5,396 occurrences, and 2,502 measurements. The occurrences include the sightings of 11,436 individuals (grouped by sex and age) from 1993 to 2018 in a total of 88 transects distributed across Sierra Nevada, 33 of which have been sampled continuously since 2008.

Population density was calculated using dedicated software (different versions of TRANSECT and DISTANCE). The remaining population parameters, i.e. sex ratio, birth rate, and age pyramid (for males), were estimated as described in the Usage Notes section. The overall analysis of these parameters was decisive in defining new management guidelines, which aim to bring these parameters closer to their theoretical values. This management approach helps to counter the increased probability of the emergence and transmission of infectious diseases, while avoiding habitat deterioration.
The dataset described here contains annual observations for each transect. In total, 11,436 Iberian ibex have been recorded since 1993 in 88 transects. The Open Access publication of these historical data provides a valuable resource for scientists and managers and guarantees the preservation of the data through free and open access (without legal or economic restrictions) for all sectors of society. The data can be reused to generate new information for the management of the Iberian ibex in Sierra Nevada and in similar mountain ranges.

Methods
Study area. Sierra Nevada is a massif located in the south-eastern Iberian Peninsula (37°14′-36°54′N; 2°37′-3°39′W) within the Baetic System (part of the Penibaetic ranges), near the Mediterranean Sea (Fig. 3a). Sierra Nevada has the highest summits of the Iberian Peninsula, with Mulhacén peak reaching 3,479 m a.s.l., making it the second-highest mountain range in mainland Europe after the Alps. Although the general climate is Mediterranean, the morphology of the mountains gives it continental characteristics. Biogeographically, five of the six thermotypes defined for the Mediterranean region occur in Sierra Nevada, from the thermomediterranean in the lowest and driest areas of the east to the cryoromediterranean on the highest peaks 47. Mean precipitation corresponds mainly to dry and subhumid ombrotypes, with exceptions in the extremely dry eastern part and in areas receiving more than 1,000 mm/year. Topographically, the area is heterogeneous, with strong climatic contrasts between the sunny, dry south-facing slopes and the shaded, wetter north-facing slopes.
A major biodiversity hotspot in the Mediterranean region, Sierra Nevada has unique ecosystems with many endemic species. Overall, Sierra Nevada comprises 27 habitat types listed in Annex I of the Habitats Directive (Council Directive 92/43/EEC). The Iberian ibex is distributed throughout Sierra Nevada, from the summits to the valley bottoms, depending on the season.
The Iberian ibex has been continuously present in this mountainous region for several reasons, including the legislative measures implemented in the mid-1960s to promote the conservation of the species, human depopulation, habitat transformation, the expansion of shrublands, and large-scale reforestation projects.
Figure caption (partial): Different colours indicate the bibliographic source. There is a discrepancy between the official census and the density values of this dataset because they are based on different methodologies and cover different areas (see Fig. 1).
The trends of the Iberian ibex population in Sierra Nevada appear to be related to land-use changes and human depopulation.
www.nature.com/scientificdata www.nature.com/scientificdata/
Large predators are absent from Sierra Nevada: although the wolf (Canis lupus) was abundant from the Middle Ages until nearly the end of the 19th century, hunting led to its disappearance in 1933.
Sampling protocol. Sampling based on line transects has been used since the early 1930s. This method has proved practical, efficient, and relatively inexpensive 48,49. It is also recommended because it allows the reliability of the results to be assessed 49, which is the main reason for using distance sampling in censuses of wild ungulates 39,50. In Sierra Nevada, the line transect method was first used to estimate the size of the Iberian ibex population in 1993 51. A more detailed explanation of the methodology can be found in the Plan específico de gestión de la población de cabra montés en el Parque Nacional de Sierra Nevada 43 (Specific Management Plan for the Iberian ibex population in Sierra Nevada National Park) and in the monitoring methodologies of the Sierra Nevada Global-Change Observatory 52.
The transects are sampled by two or more observers, on foot or, where terrain conditions allow, by vehicle at no more than 15 km/h. The sampling time is adapted to the dates on which fieldwork is carried out, and the official time is recorded in the surveys. In summer, the observers walk the transects mainly at dawn and dusk; in autumn, sampling extends throughout the day. Each transect is sampled on a single day, so that sampling of all transects is completed on consecutive days over about two weeks. The optical equipment consists of binoculars (8 × 35) and a telescope (20 × 40). When circumstances prevent satisfactory viewing (individuals far away or hidden by the surrounding vegetation, very brief contact, etc.), sightings are discarded.
The sampling design (length and location of transects) followed criteria of randomness and stratification. Although a total of 88 transects were sampled over the monitoring period since 1993, not all of them were sampled every year: before 2008 because of methodological changes, and after 2008 only because of weather conditions. Transect design was standardised in 2008, when 33 transects were fixed; these have been the only transects sampled regularly since then. Some of them were modifications of earlier transects (Fig. 3b), while others had also been sampled before 2008. Based on our experience, these 33 transects were considered suitable because they cover a large part of the mountain range and each can be sampled in a reasonable time (less than four hours) without causing observer fatigue.
Overall, one sampling was conducted annually, except in 1995, when two samplings were undertaken (summer and autumn), and in 1994, 1999, 2005 and 2006, when none was carried out. In the first years, surveys were conducted in summer so that snowfall would not prevent sampling. Since 2008, the 33 fixed transects have been sampled annually in autumn whenever weather conditions allow. Autumn sampling takes place before the oestrous period, when animals are more active and more easily observed, which increases the probability of detecting them.
In each sampling, the observers walk the transect lines, taking notes on the Iberian ibex groups sighted and recording variables such as the number of individuals observed (group size), the contact hour, and the perpendicular distance from each group of Iberian ibex to the transect line. At the individual level, they record physical condition (mainly the presence of lesions caused by sarcoptic mange, or sarcoptidosis), the sex of each Iberian ibex, and the age of males. In addition, the date, the start and end times of the sampling, and the identity of the observers are recorded.
The probability of detecting an individual depends on the spatial distribution of the sightings 53, visibility conditions, habitat cover, topography, animal and group size, and density. The method assumes that, if density is high, many individuals will be sighted close to the line, whereas if density is low, only a few individuals will be sighted, and far away. The following assumptions must hold: animals on the transect line are always detected; animals are detected at their initial location, before they move in response to the observers; no animal is counted twice; distances and sighting angles are measured accurately; and sightings are independent events.
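To make the estimation step concrete, the following Python sketch computes line-transect density with the conventional distance-sampling estimator D = n / (2Lμ), where μ is the effective strip half-width. It assumes a half-normal detection function with no truncation and expands group density to individuals using the mean observed group size. These are illustrative choices, not the models or settings used in TRANSECT or DISTANCE for this dataset, which typically compare several detection functions; the function names are hypothetical.

```python
import math
from statistics import mean


def half_normal_sigma(distances_m):
    """Maximum-likelihood scale of a half-normal detection function.

    Assumes untruncated perpendicular distances, so sigma^2 = sum(x^2) / n.
    """
    return math.sqrt(sum(x * x for x in distances_m) / len(distances_m))


def line_transect_density(distances_m, group_sizes, total_length_km):
    """Estimate (groups per km², individuals per km²) from line-transect data.

    distances_m: perpendicular distance of each detected group (m)
    group_sizes: number of individuals in each detected group
    total_length_km: summed length of all transects walked (km)
    """
    if not distances_m or len(distances_m) != len(group_sizes):
        raise ValueError("need one distance and one group size per detected group")
    sigma_km = half_normal_sigma(distances_m) / 1000.0
    # Effective strip half-width: integral of exp(-x^2 / (2 sigma^2)) over x >= 0.
    # This relies on g(0) = 1, i.e. groups on the line are always detected.
    esw_km = sigma_km * math.sqrt(math.pi / 2.0)
    group_density = len(distances_m) / (2.0 * total_length_km * esw_km)
    # Naive expansion to individuals: assumes observed group size is unbiased.
    return group_density, group_density * mean(group_sizes)
```

The assumption that animals on the line are always detected enters through the integral of the detection function: if groups on the line were missed, the estimate would be biased low.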
With all these data, the parameters defining the population were calculated: density (number of individuals/km²), sex ratio (number of females/number of males), birth rate (number of kids/number of adult females) and the age pyramid for males. Horn size and body morphology make it possible to determine the age in years, or the age class, of males, as in Alados and Escós 37,54.
Habitat description. The 33 transects sampled over the last 10 years (since 2008) cover different habitats and elevational ranges. To explore how the transects cover the ecosystems found in Sierra Nevada, we classified each transect according to the ecosystems it covers. For each transect, we calculated a 200-m buffer along the line and, using the map of the ecosystems of Sierra Nevada 55, computed the percentage of each ecosystem within the buffer. The transects were then classified based on the percentage of each ecosystem and the elevational range covered. Finally, a non-metric multidimensional scaling (NMDS) ordination, performed with the vegan R package 56, was used to explore and validate the classification (Fig. 4).
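The percentage calculation can be sketched as follows, assuming the 200-m buffers have already been intersected with the ecosystem map in a GIS (for example ArcGIS or PostGIS) and exported as hypothetical (transect_id, ecosystem, area_m2) rows. The Python code only aggregates those areas; the ecosystem labels used with it are placeholders, not the map's legend.

```python
from collections import defaultdict


def ecosystem_percentages(overlay_rows):
    """Aggregate (transect_id, ecosystem, area_m2) rows into % of buffer area per transect."""
    areas = defaultdict(lambda: defaultdict(float))
    for transect_id, ecosystem, area_m2 in overlay_rows:
        areas[transect_id][ecosystem] += area_m2
    result = {}
    for transect_id, by_ecosystem in areas.items():
        total = sum(by_ecosystem.values())
        result[transect_id] = {
            eco: 100.0 * area / total for eco, area in by_ecosystem.items()
        }
    return result


def dominant_ecosystem(percentages):
    """Return (ecosystem, percentage) for the ecosystem covering most of the buffer."""
    eco = max(percentages, key=percentages.get)
    return eco, percentages[eco]
```

The resulting percentages are the inputs that a rule such as ">70% high-mountain shrubland" would be applied to.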
The classes are:
• HMS (High-Mountain Shrubland). Transects situated at 1,950-2,600 m a.s.l. in which juniper-genista thickets (high-mountain shrubland) predominate (>70%).
• F (Forest). Transects in which the predominant ecosystem is natural forest (holm oak, Quercus ilex; Pyrenean oak, Quercus pyrenaica) or pine plantation (Pinus sp.).
• C (Croplands). Transects covering mainly mountain crops (53%), with a significant presence of aquatic systems (15%) and mid-mountain shrubland (13%).
• HMG (High-Mountain Grasslands). Transects located at the highest elevations (2,650/2,800-3,300 m a.s.l.), in which high-mountain grasslands predominate (>70%).
• MMS (Mid-Mountain Shrubland). Transects with a diverse habitat composition in which mid-mountain shrubland is the dominant ecosystem, generally accompanied by different forests (pine plantations, native Scots pine, and holm oak, Quercus ilex) and occasionally by mid-mountain grasslands.
Data management and standardisation. Publishing the historical Iberian ibex data from Sierra Nevada, stored in old formats, not only makes them public but also protects them from accidental loss 57. In this case, all field data were recorded annually on paper survey forms for each transect sampled (Fig. 5a). To facilitate digitalisation, we designed a database with a data-entry form to store the information. Its structure reflects the main elements of the sampling and their relations: the observers, the transects, the observed groups of Iberian ibex, and the individual-level sightings of the species (Fig. 5b).
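A minimal relational sketch of that structure might look like this. It uses Python's built-in sqlite3 as a stand-in for the Access database mentioned in the Technical Validation section. Table and column names are hypothetical; the point is the one-to-many chain from transect to sampling to group to individual sighting, plus the many-to-many link between samplings and observers.

```python
import sqlite3

SCHEMA = """
CREATE TABLE transect (
    transect_id   INTEGER PRIMARY KEY,
    locality      TEXT NOT NULL
);
CREATE TABLE observer (
    observer_id   INTEGER PRIMARY KEY,
    name          TEXT NOT NULL
);
CREATE TABLE sampling (            -- one walk of one transect on one day
    sampling_id   INTEGER PRIMARY KEY,
    transect_id   INTEGER NOT NULL REFERENCES transect(transect_id),
    sampling_date TEXT NOT NULL,   -- ISO 8601 date
    start_time    TEXT NOT NULL,
    end_time      TEXT NOT NULL
);
CREATE TABLE sampling_observer (   -- two or more observers per sampling
    sampling_id   INTEGER NOT NULL REFERENCES sampling(sampling_id),
    observer_id   INTEGER NOT NULL REFERENCES observer(observer_id),
    PRIMARY KEY (sampling_id, observer_id)
);
CREATE TABLE ibex_group (
    group_id      INTEGER PRIMARY KEY,
    sampling_id   INTEGER NOT NULL REFERENCES sampling(sampling_id),
    contact_time  TEXT NOT NULL,
    distance_m    REAL,            -- perpendicular distance to the transect line
    group_size    INTEGER NOT NULL CHECK (group_size > 0)
);
CREATE TABLE sighting (            -- individual-level records within a group
    sighting_id   INTEGER PRIMARY KEY,
    group_id      INTEGER NOT NULL REFERENCES ibex_group(group_id),
    sex           TEXT CHECK (sex IN ('male', 'female', 'unknown')),
    life_stage    TEXT,
    age_years     INTEGER,         -- recorded for males only
    mange_lesions INTEGER NOT NULL DEFAULT 0 CHECK (mange_lesions IN (0, 1))
);
"""


def create_database(path=":memory:"):
    """Create the sketch schema; foreign keys must be enabled per connection in SQLite."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

With foreign keys enabled, a group cannot be stored for a sampling that does not exist. This enforces at entry time some of the "is all information associated?" checks described under Technical Validation.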
After data cleaning (see Technical Validation section) and disaggregation (Fig. 5c), the dataset was standardised to the Darwin Core 58 structure as sampling-event data (Fig. 5d). The resulting dataset was published through the Integrated Publishing Toolkit 59 (IPT v2.3.6) of the Spanish node of the Global Biodiversity Information Facility (GBIF) (http://ipt.gbif.es) (Fig. 5e).
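Disaggregation and standardisation can be illustrated with a Python sketch that turns one digitised group record into three kinds of Darwin Core rows: a child event, one occurrence row per sex/life-stage combination (with organismQuantity), and one measurement row for the contact distance. The input keys, the identifier scheme and the measurementType label are hypothetical. The published archive may use different values.

```python
def group_to_dwc(group, parent_event_id):
    """Convert one group record into Darwin Core event, occurrence and measurement rows.

    group: dict with hypothetical keys group_id, time, distance_m and counts,
           where counts maps (sex, lifeStage) -> number of individuals.
    parent_event_id: eventID of the transect walk in which the group was sighted.
    """
    event_id = f"{parent_event_id}_G{group['group_id']}"  # hypothetical ID scheme
    event = {"eventID": event_id, "parentEventID": parent_event_id,
             "eventTime": group["time"]}
    # Disaggregate: one occurrence per sex/life-stage class actually observed.
    classes = [(key, n) for key, n in sorted(group["counts"].items()) if n > 0]
    occurrences = []
    for i, ((sex, life_stage), n) in enumerate(classes, start=1):
        occurrences.append({
            "occurrenceID": f"{event_id}_O{i}",
            "eventID": event_id,
            "scientificName": "Capra pyrenaica hispanica Schimper, 1848",
            "basisOfRecord": "HumanObservation",
            "occurrenceStatus": "present",
            "sex": sex,
            "lifeStage": life_stage,
            "organismQuantity": n,
            "organismQuantityType": "individuals",
        })
    measurements = []
    if group.get("distance_m") is not None:
        measurements.append({"eventID": event_id,
                             "measurementType": "contact distance",
                             "measurementValue": group["distance_m"],
                             "measurementUnit": "m"})
    return event, occurrences, measurements
```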
This collaborative framework for retrieving the data and making them available to the scientific community is part of the Sierra Nevada Global-Change Observatory's efforts to manage data following the FAIR principles 60. In this case, we seek to enhance the potential reusability of these data by making the dataset:
• Findable: by integrating and disseminating data and metadata through GBIF with a unique and persistent identifier. The dataset is also hosted by the Sierra Nevada Global-Change Observatory.
• Accessible: by providing open and free access to the data and metadata.
• Interoperable: by standardising the data to the Darwin Core standard and the metadata to the EML standard.
• Reusable: by providing complete provenance and a description of the dataset in the GBIF metadata sections and in the data descriptor presented here. In addition, in the Usage Notes section, we present a reproducible example that starts by downloading the Darwin Core Archive from GBIF and shows how to explore some population parameters.

Data Records
The data descriptor we present corresponds to version 1.7 of the dataset "Dataset of Iberian ibex population in Sierra Nevada (Spain)" 61. The dataset will be updated annually, depending on financial support. The custodian of all the information collected is the Administrative Center of the Sierra Nevada National and Natural Park, whereas the owner is the Department of Agriculture, Livestock, Fisheries, and Sustainable Development.
The Darwin Core Archive (DwC-A) contains sampling-event data, specifically 3,091 event records (582 transect walk events and 2,509 group sighting events), 5,396 occurrence records, and 2,502 records of associated measurements. The DwC-A structure is based on a sampling-event hierarchy: the "parent events" are the transect walk events, each corresponding to one transect sampled by two or more observers, whereas the "child events" are the group sighting events, each corresponding to an Iberian ibex group sighted, for which the observers record different variables. The occurrences compile the sightings of 11,436 Iberian ibex (grouped by sex and life stage) from 1993 to 2018 in 88 transects distributed across Sierra Nevada. One variable was included in the Measurement or Fact table: the contact distance, i.e. the perpendicular distance between each group of Iberian ibex and the transect line.
Clarifications and remarks concerning some Darwin Core elements included in the dataset (Fig. 5d) are provided below:
• The parentEventID element creates the hierarchy between the transect walk (parent) events and the group sighting (child) events. It links the different group sightings to the transect walk in which they were recorded. A child event "inherits" the information from its parent event.
• Individuals (species occurrences) with the same eventID belong to the same group (event).
• The verbatimLocality element contains local place names marking the start and end points of the transect (e.g. a mountain refuge or a crag) or a representative place where the transect is located.
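The parent-child logic can be sketched in Python: each child event is resolved against its parent through parentEventID and inherits the fields it does not repeat. Which fields are inherited (INHERITED_FIELDS) is an assumption made for illustration. The hierarchy has only two levels in this dataset, so the sketch does not walk deeper chains.

```python
# Hypothetical choice of fields a group sighting takes from its transect walk.
INHERITED_FIELDS = ("eventDate", "samplingProtocol", "verbatimLocality", "locationID")


def resolve_events(events):
    """Return copies of event rows in which child events inherit missing parent fields."""
    by_id = {e["eventID"]: e for e in events}
    resolved = []
    for event in events:
        row = dict(event)  # copy so the input rows are not modified
        parent_id = row.get("parentEventID")
        if parent_id:
            parent = by_id.get(parent_id)
            if parent is None:
                raise KeyError(f"parentEventID {parent_id!r} not found")
            for field in INHERITED_FIELDS:
                if not row.get(field) and parent.get(field):
                    row[field] = parent[field]
        resolved.append(row)
    return resolved


def group_counts_by_transect_walk(events):
    """Count group-sighting (child) events per transect-walk (parent) event."""
    counts = {e["eventID"]: 0 for e in events if not e.get("parentEventID")}
    for e in events:
        parent = e.get("parentEventID")
        if parent:
            counts[parent] = counts.get(parent, 0) + 1
    return counts
```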

Technical Validation
Different validation processes were applied at the data-cycle stages described in Fig. 5: (a) During sampling, the observers cross-checked the sightings in situ. (b) In the second step, given the large volume of data, we implemented controls and validation rules in the Access form to reduce human error and facilitate digitalisation:
• Input masks controlled data-entry formats (especially date/time data types).
• We defined required fields (e.g. transect number and sampling date).
• We created lists of predefined values (e.g. group types: male alone, female alone, males, females, females with kids, and mixed groups).
• We established some "control fields", i.e. variables that the person digitalising the data calculated manually to make the information easier to identify. For instance, before entering the sightings, this person had to indicate the total number of groups identified in each survey, the size of each group, the type of group categorised by sex and age, etc.
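The form-level rules above can be approximated in a few lines of Python, as a hypothetical stand-in for the Access input masks, required fields and value lists. The field names and date/time formats are assumptions; the group types are those listed in the text.

```python
from datetime import datetime

# Controlled vocabulary for group types, as listed in the text.
GROUP_TYPES = {"male alone", "female alone", "males", "females",
               "females with kids", "mixed"}
# Hypothetical field names for the survey header.
REQUIRED_FIELDS = ("transect_number", "sampling_date")


def _matches(value, fmt):
    """True if `value` parses with the strptime format `fmt` (an input-mask stand-in)."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


def validate_survey(survey):
    """Return a list of data-entry problems for one digitised survey (empty if none)."""
    problems = [f"missing required field: {name}"
                for name in REQUIRED_FIELDS if not survey.get(name)]
    date = survey.get("sampling_date")
    if date and not _matches(date, "%Y-%m-%d"):
        problems.append(f"sampling_date not in YYYY-MM-DD format: {date!r}")
    for name in ("start_time", "end_time"):
        value = survey.get(name)
        if value and not _matches(value, "%H:%M"):
            problems.append(f"{name} not in HH:MM format: {value!r}")
    for i, group in enumerate(survey.get("groups", []), start=1):
        group_type = group.get("group_type")
        if group_type not in GROUP_TYPES:
            problems.append(f"group {i}: unknown group type {group_type!r}")
    return problems
```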
For the transects, a more accurate digitalisation was carried out at a scale of 1:1000 in ArcGIS 10.2 62, using PNOA orthophotos (Spanish National Aerial Orthophotography Plan) as the cartographic base. (c) The data were processed with the PostgreSQL relational database management system (RDBMS) version 11.3 63 together with R version 3.6.0 64, using the package RPostgres 65 and the spatial extension PostGIS version 2.5.2 66, in addition to other packages: DBI 67, knitr 68, dplyr 69 and splitstackshape 70. In this way, we built a validation process in R and SQL code to detect and correct specific errors arising from digitalisation. When necessary, the paper surveys were re-checked, and several validation rounds were run. Specific examples are given below:
• We checked whether all the information was linked, e.g. samplings without any observers assigned, groups without any sightings assigned, etc.
• Regarding null values, we checked whether all essential variables were filled in, e.g. males without an age value, groups without a size value, etc.
• We identified any duplicated information.
• We reviewed any inconsistent data, e.g. the time at which a group was sighted had to fall between the start and end times of the sampling.
• We also checked the "control fields", because they were susceptible to error, e.g. the automatic sum of individuals might not match the indicated group size, or groups categorised as mixed had to contain males and females with kids, etc.
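Some of these cross-checks can be expressed as a Python sketch over one digitised group. The record layout is hypothetical, and the mixed-group rule is read literally as requiring males, females and kids, which is an interpretation. The authors' actual checks were written in R and SQL.

```python
from datetime import time


def check_group(group, start, end):
    """Return consistency problems for one digitised group sighting.

    group: dict with hypothetical keys contact_time (datetime.time), group_size,
           group_type and individuals (list of dicts with sex, life_stage, age).
    start, end: datetime.time of the start and end of the sampling.
    """
    problems = []
    if not start <= group["contact_time"] <= end:
        problems.append("contact time outside the sampling period")
    individuals = group["individuals"]
    if len(individuals) != group["group_size"]:
        problems.append(f"{len(individuals)} individuals recorded, "
                        f"but group size is {group['group_size']}")
    for ind in individuals:
        # Null-value check: non-kid males must have an age.
        if ind["sex"] == "male" and ind["life_stage"] != "kid" and ind.get("age") is None:
            problems.append("male without age")
    if group["group_type"] == "mixed":
        has_male = any(i["sex"] == "male" and i["life_stage"] != "kid" for i in individuals)
        has_female = any(i["sex"] == "female" and i["life_stage"] != "kid" for i in individuals)
        has_kid = any(i["life_stage"] == "kid" for i in individuals)
        # Literal reading of "males and females with kids" (an assumption).
        if not (has_male and has_female and has_kid):
            problems.append("mixed group lacks males, females or kids")
    return problems
```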

Usage Notes
We provided a reproducible example using the data stored at GBIF. The first step was to download the Darwin Core Archive (.zip file) of the dataset from the IPT. Then, using the finch package 71, we processed the Darwin Core Archive (DwC-A) and loaded its tables. In the following steps, we computed the population structure over the study period and explored several population parameters, such as sex ratio and birth rate.
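The Usage Notes rely on R and the finch package. As a language-neutral illustration, a rough Python equivalent using only the standard library could read the downloaded archive by following its meta.xml descriptor. It assumes each data file has one header row with column names, as in IPT-generated archives, and returns all values as strings. It is a sketch, not a full Darwin Core Archive parser.

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile

NS = "{http://rs.tdwg.org/dwc/text/}"  # XML namespace of meta.xml


def read_dwca(source):
    """Load every table of a Darwin Core Archive as {rowType name: list of row dicts}.

    source: path or binary file object of the downloaded .zip archive.
    """
    tables = {}
    with zipfile.ZipFile(source) as archive:
        meta = ET.fromstring(archive.read("meta.xml"))
        for node in meta.findall(NS + "core") + meta.findall(NS + "extension"):
            location = node.find(f"{NS}files/{NS}location").text.strip()
            # e.g. "http://rs.tdwg.org/dwc/terms/Occurrence" -> "Occurrence"
            name = node.get("rowType").rstrip("/").rsplit("/", 1)[-1]
            # meta.xml writes delimiters as escapes such as "\t"
            delimiter = node.get("fieldsTerminatedBy", ",").encode().decode("unicode_escape")
            enclosed_by = node.get("fieldsEnclosedBy", '"')
            text = archive.read(location).decode(node.get("encoding", "UTF-8"))
            reader = csv.reader(
                io.StringIO(text, newline=""),
                delimiter=delimiter,
                quotechar=enclosed_by or '"',
                quoting=csv.QUOTE_MINIMAL if enclosed_by else csv.QUOTE_NONE,
            )
            rows = [row for row in reader if row]
            header, body = rows[0], rows[1:]  # assumes one header line
            tables[name] = [dict(zip(header, row)) for row in body]
    return tables
```

For this dataset one would expect an Event core with Occurrence and measurement extensions. The exact table names depend on the rowType values declared in the archive's meta.xml.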
Population structure. We explored how the population structure changed over time. For this, we computed, for each year, the percentage of individuals belonging to each age class, as well as the mean percentage of each age class over the study period.
• First, from the Occurrence table (from the DwC-A), we selected the field lifeStage, which indicates the age of each individual.
• For individuals with lifeStage "kid", we considered half to be males, since in many mammalian populations the sexes are balanced at a 1:1 ratio, which does not differ significantly from a theoretical distribution 50,72.
• Then, we computed the number and percentage of individuals by year and age class.
Then, we plotted the structure of the population for each year (Fig. 6).
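The population-structure steps can be sketched in Python on Occurrence rows. The lifeStage labels other than "kid" are hypothetical. The yearly percentages here are computed over males only (male age classes plus half of the kids), which is an assumption about how the yearly structure is normalised.

```python
from collections import defaultdict


def male_age_structure(occurrences):
    """Percentage of males in each age class (lifeStage) per year.

    occurrences: Occurrence rows (dicts) with eventDate, sex, lifeStage and
    organismQuantity; values may be strings, as read from the DwC-A.
    Half of the kids are counted as males, assuming a 1:1 sex ratio.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for occ in occurrences:
        year = int(occ["eventDate"][:4])
        n = float(occ["organismQuantity"])
        if occ["lifeStage"] == "kid":
            counts[year]["kid"] += n / 2.0
        elif occ["sex"] == "male":
            counts[year][occ["lifeStage"]] += n
    structure = {}
    for year, by_class in sorted(counts.items()):
        total = sum(by_class.values())
        if total > 0:
            structure[year] = {c: 100.0 * v / total for c, v in by_class.items()}
    return structure


def mean_structure(structure):
    """Mean percentage of each age class over all years (0 where a class is absent)."""
    classes = {c for by_class in structure.values() for c in by_class}
    n_years = len(structure)
    return {c: sum(s.get(c, 0.0) for s in structure.values()) / n_years
            for c in classes}
```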
Sex ratio and birth rate. To compute the sex ratio and the birth rate, we first needed to determine male and female counts grouped by year. We also computed the number of kids per year, using the field lifeStage included in the Occurrence table.
• Extract the year from the eventDate field.
• Group the data by year and determine the male and female counts.
Then we computed the sex ratio (sr) as female count/male count and the birth rate (br) as kid count/female count, using the variables eventDate, sex, organismQuantity and lifeStage from the Occurrence table (from the DwC-A). The results are plotted in Fig. 7.
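A Python sketch of the same yearly computation follows. It assumes that the male and female counts exclude individuals with lifeStage "kid", so that the birth rate approximates kids per adult female as defined in the Methods. The authors' R code on Zenodo may group the data differently.

```python
from collections import defaultdict


def sex_ratio_and_birth_rate(occurrences):
    """Return {year: (sex_ratio, birth_rate)} from Darwin Core Occurrence rows.

    sex_ratio = females / males and birth_rate = kids / females, where kids
    are rows with lifeStage "kid" and males/females exclude kids (an assumption).
    Years lacking males (or females) get NaN for the affected ratio.
    """
    totals = defaultdict(lambda: {"male": 0.0, "female": 0.0, "kid": 0.0})
    for occ in occurrences:
        year = int(occ["eventDate"][:4])  # works for single dates and ranges
        n = float(occ["organismQuantity"])
        if occ["lifeStage"] == "kid":
            totals[year]["kid"] += n
        elif occ["sex"] in ("male", "female"):
            totals[year][occ["sex"]] += n
    result = {}
    for year, t in sorted(totals.items()):
        sr = t["female"] / t["male"] if t["male"] else float("nan")
        br = t["kid"] / t["female"] if t["female"] else float("nan")
        result[year] = (sr, br)
    return result
```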

Code availability
The code used in the Usage Notes section is publicly available through the Zenodo repository 73. The analyses in the Usage Notes section were performed in the R language 64 using the packages finch 71, tidyverse 74, knitr 68 and here 75.