A national macroinvertebrate dataset collected for the biomonitoring of Ireland’s river network, 2007–2018

The Environmental Protection Agency (EPA) in Ireland is responsible for the ecological monitoring and assessment of 37 hydrometric areas covering 46 river catchments and over 13,000 km of river channel nationwide. The national river monitoring program commenced in 1971 and has developed further since 2007 into the National Rivers Water Framework Directive (WFD) Monitoring Program following the implementation of the WFD across the European Union. The monitoring program is designed to obtain sufficiently representative information to assess ecological quality for each water body assessed. Consequently, macroinvertebrate data have been collected at over 2,900 river survey stations on a minimum 3-year cycle to fulfil these requirements. While the EPA has collected these data for water quality assessments we recognize that the data have value beyond this one purpose. We provide a summary of how these 10,987 data records, covering the years 2007 to 2018, have been collected and used to deepen understanding of water quality, biodiversity and general ecological health of Ireland’s river network.


Background & Summary
Ireland (see usage notes) has over 84,800 km of river channel with streams, rivers and tributaries flowing from their headwaters to the sea through a vast network of channels. The national river monitoring program, established in Ireland in 1971 by An Foras Forbartha (succeeded by the Environmental Protection Agency (EPA) in 1993), is designed to provide representative information on the entire river network through the monitoring of discrete sections of river called water bodies. There are 3,192 river water bodies in Ireland and the national river monitoring program is designed to obtain sufficiently representative information to assign a status to each of these water bodies. All the major rivers and their significant tributaries are included in the monitoring program (https://gis.epa.ie/EPAMaps/) which has been carried out on a continuous basis since 1971 when 2,900 km of river channel was first surveyed. The surveys presently combine general chemical and biological assessments with approximately 13,200 km of river channel surveyed over a 3-year cycle [1][2][3] .
A summary of historical information on Irish water quality (covering rivers, lakes, ground, estuarine and coastal waters) is available in O'Boyle et al. 3 and Trodd and O'Boyle 4 (but see https://www.epa.ie/ for regular updates). In Ireland, the most commonly encountered forms of pollution in rivers are eutrophication and organic pollution 3,5,6 . Less frequently encountered are non-organic types of pollution such as toxic pollution (e.g. by sheep treatment or industrial chemicals) 3,7 , siltation (e.g. arising from cattle access, over-grazing, drainage, quarrying or stone-cutting operations) 3,8,9 and acidification in sensitive afforested areas 10 . Similarly, hydromorphological alteration to river channels can have adverse effects on aquatic organisms 3 .
The Water Framework Directive (WFD), establishing a framework for European Union (EU) community action in the field of water policy with the objective to maintain or restore aquatic ecosystems, was adopted by the EU in October 2000 (2000/60/EC) and aims to achieve these objectives through a process of river basin management planning supported by status assessments derived from monitoring programs. As part of the WFD implementation process, the EPA was required to develop a new monitoring program for surface (rivers and lakes), ground, estuarine and coastal waters. The monitoring program became operational in December 2006 and in early 2007 the new WFD rivers monitoring program replaced the previous national river monitoring program 11 . The EPA is responsible for overseeing the WFD monitoring of biological elements, which in rivers focuses mainly on the assessment of aquatic macroinvertebrates to aid in the establishment of ecological status (see Methods for more information).
The data described here were originally collected for one purpose: to assess rivers nationwide and determine the quality of the macroinvertebrate communities across the country as part of the WFD ecological status assessment. These assessments indicate the changes that pollution brings about in the benthic macroinvertebrate communities, i.e. larval insects (e.g. mayflies, stoneflies, caddisflies, beetles, etc.) together with crustaceans (e.g. shrimps), snails and bivalves, worms, and leeches. These changes reflect the varying sensitivities of the different groups of macroinvertebrates to the stresses caused by pollution, with sensitive species being progressively replaced by more tolerant forms as pollution increases. Such effects are well documented in the Irish and international literature (e.g. Kail et al. 12 ).
While the EPA has collected these data for water quality assessments, we recognize that the data have value beyond this one purpose. By providing an extensive spatial and temporal record, this dataset has the potential to address many questions relating to our river environment including, for example, biodiversity assessments, species atlases and International Union for Conservation of Nature red listing. Our purpose here is to widen the knowledge of, and accessibility of, these data. To date, several projects and studies have acquired and used these data; for example, Donohue et al. 5 , Bennett et al. 13 , White et al. 14 , Feeley et al. 15,16 and Kelly-Quinn et al. 7 , examining water quality and ecological relationships, ecology, biodiversity, species distributions and other work such as species red listing.
The biological program and macroinvertebrate data described here include 2,976 unique biomonitoring stations and 10,987 individual surveys, assessed between 2007 and 2018, covering four 3-year reporting cycles.
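As an illustration of how the four 3-year reporting cycles partition the 2007 to 2018 period, the following minimal Python sketch maps a survey year to its cycle. It assumes that cycles are aligned to calendar years starting in 2007 (i.e. 2007 to 2009, 2010 to 2012, and so on); the function name and this alignment are assumptions made for illustration and are not taken from the dataset.

```python
from typing import Tuple

FIRST_YEAR = 2007   # first year covered by the dataset
LAST_YEAR = 2018    # last year covered by the dataset
CYCLE_LENGTH = 3    # years per reporting cycle


def reporting_cycle(year: int) -> Tuple[int, int]:
    """Return (start_year, end_year) of the 3-year cycle containing `year`.

    Assumption: cycles are aligned to calendar years beginning in 2007.
    """
    if not FIRST_YEAR <= year <= LAST_YEAR:
        raise ValueError(f"year {year} is outside {FIRST_YEAR}-{LAST_YEAR}")
    index = (year - FIRST_YEAR) // CYCLE_LENGTH
    start = FIRST_YEAR + index * CYCLE_LENGTH
    return start, start + CYCLE_LENGTH - 1


if __name__ == "__main__":
    for y in (2007, 2012, 2018):
        print(y, reporting_cycle(y))
```

Under this assumption, every year from 2007 to 2018 falls into exactly one of four cycles.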

Methods
Sampling rationale and design. The EPA, in conjunction with local authorities and other public bodies in Ireland, has undertaken a substantial characterization of the physical water environment and the impact of human activities on waters 17 . Accordingly, the selection of monitored river water bodies and the national river monitoring program are designed to obtain sufficiently representative information across river typologies and on significant pressures to assign a WFD status to each water body across our entire river network (Fig. 1).
www.nature.com/scientificdata
The data collected cover the range of ecological conditions found in Irish rivers to assist the assignment of an ecological status as required by the WFD. Ecological status is an assessment of the quality of the structure and functioning of surface water ecosystems and it highlights the influence of pressures (e.g. pollution and habitat degradation) on several identifiable quality elements. As part of the WFD, ecological status is determined for each of the surface water body categories (i.e. rivers, lakes, transitional waters and coastal waters) using intercalibrated (see Technical Validation for further details) biological quality elements (BQEs) and supported by physico-chemical and hydromorphological quality elements. Ecological status for surface water bodies is primarily driven by the BQEs, namely fish, aquatic flora, macroinvertebrates and phytoplankton. The overall ecological status classification for any water body is determined, according to the 'one out, all out' principle, by the element with the worst status out of all the biological and supporting quality elements. In Ireland, macroinvertebrates are the main BQE determining the ecological status in rivers 3 .
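The 'one out, all out' principle can be sketched in a few lines of Python. This is a minimal illustration, not the EPA's classification procedure: it assumes the five WFD status classes ordered from 'High' to 'Bad', and the element names and statuses in the example are hypothetical.

```python
from typing import Dict

# WFD status classes ordered from best to worst.
STATUS_ORDER = ["High", "Good", "Moderate", "Poor", "Bad"]


def overall_status(element_statuses: Dict[str, str]) -> str:
    """Return the worst status among all assessed quality elements."""
    if not element_statuses:
        raise ValueError("at least one quality element is required")
    # The element furthest down STATUS_ORDER determines the overall class.
    return max(element_statuses.values(), key=STATUS_ORDER.index)


if __name__ == "__main__":
    # Hypothetical water body: values are illustrative only.
    example = {
        "macroinvertebrates": "Good",
        "fish": "High",
        "physico-chemical": "Moderate",
    }
    print(overall_status(example))
```

In this example a single 'Moderate' element is enough to set the overall classification to 'Moderate', even though the other elements are 'Good' or 'High'.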
The WFD requires BQE scores to be expressed as an Ecological Quality Ratio (EQR) to standardize and provide a common scale of ecological quality across participating Member States using differing national methods 18 . The EQR is calculated as the ratio of the observed result to the expected result (Table 1). The 'expected' or 'reference' condition (EQR close to 1) is the natural, undisturbed environment, i.e. the benchmark. The assessment of the scale of anthropogenic pollution in any water body is based on the extent of deviation from expected reference conditions and follows the definitions as outlined in the WFD (Table 1). For example, 'High status' is defined as the biological, chemical and morphological conditions associated with no or very low human pressure and, therefore, little or no deviation from reference; 'Good status' means 'slight' deviation, 'Moderate status' means 'moderate' deviation, and so on. EQRs provide a common scale to ensure comparability across different pressures, allowing water managers to easily recognize and characterize impact, which facilitates the development of mitigation measures to restore or preserve ecological status 17 . To assess the network of rivers in Ireland, monitoring stations cover all 37 hydrometric areas (HA), providing full national coverage (Fig. 1).
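The ratio described above can be written as a short Python sketch. The function name and the numeric values are illustrative assumptions; they are not taken from Table 1, and no status class boundaries are implied.

```python
def ecological_quality_ratio(observed: float, expected: float) -> float:
    """Return observed / expected, where `expected` is the reference value.

    A result close to 1 indicates little or no deviation from reference
    conditions; lower values indicate greater deviation.
    """
    if expected <= 0:
        raise ValueError("the expected (reference) value must be positive")
    if observed < 0:
        raise ValueError("the observed value must not be negative")
    return observed / expected


if __name__ == "__main__":
    # Hypothetical metric scores, for illustration only.
    print(ecological_quality_ratio(observed=4.0, expected=5.0))
```

Expressing results as a ratio rather than as a raw score is what allows national methods with different scales to be compared on a common footing.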
Field sampling and data collection. River macroinvertebrates are collected from June to September each year, when flows are likely to be relatively low. Occasionally, for operational or weather-related reasons, surveys may occur outside of this period. Two approaches are used. The first, and principal, methodology (96.7% of surveys in the dataset) is kick-sampling with a standard pond net (230 × 225 mm frame with 1 mm mesh). In this approach a semi-quantitative two-minute macroinvertebrate kick-sample is collected from the riverbed, preferably from the faster flowing riffle habitats 19 . A further one-minute hand search is carried out to locate macroinvertebrates that remain attached to the underside of the cobbles 19 . Depending upon the proportion of other habitats (e.g. glides, margins, pools), time may also be spent sampling these habitats, with operators moving location approximately every 4 to 5 seconds over a 50 m stretch. Similar studies in Ireland and elsewhere have found that this sampling approach is sufficient to achieve a suitable representation of taxa for bioassessment of lotic habitats 20,21 . Occasionally, when the substratum (e.g. bedrock) or flow conditions make kick-sampling difficult, or the abundance of macroinvertebrates collected is extremely low (e.g. toxic pollution, see Kelly-Quinn et al. 7 ), it may have been necessary to spend longer sampling the river to accumulate a sufficient diversity and abundance of macroinvertebrates. In fast flowing steep rivers, it may have been necessary to kick deeper into the riverbed to disturb the organisms and to include more of the marginal areas to ensure taxa are recorded 19 . This sampling approach requires avoidance of obvious localized disturbance (e.g. cattle access points) which may adversely influence the sample taken.
If the river is too deep to wade, a separate approach is taken: a bankside extension net is used to collect macroinvertebrates from deep (non-wadable) rivers. This methodology is used far less frequently than the kick-sampling approach. If employed, the depth and number of extension poles attached to a modified hand net will vary on a site-by-site basis. The net (frame and mesh dimensions as above) is pulled upstream along the riverbed, generally at a perpendicular angle to the bank to cover as much surface area as possible, with operators moving location after every pull over a 20 to 50 m stretch. The net may also need to be emptied between pulls to ensure that macroinvertebrates already collected are not lost inadvertently during the next pull. The extension net is also used to sweep along the water surface and marginal vegetation. This approach is conducted for a minimum of five minutes or until a representative sample is obtained (see Technical Validation for more details).
Once a live sample is collected it is assessed on the riverbank and the EPA Q-value classification is assigned (see Toner et al. 1 for more details). This involves recording the taxa present at a suitable and attainable (under field conditions) taxonomic resolution (Table 2) and their categorical relative abundance (Table 3), determined using approximate counts. Once all taxa and their relative abundance have been recorded, the sample is returned to the river. Potential users should note that actual counts have not been recorded and are therefore unavailable within the dataset. Similarly, taxonomic resolution may vary from what is outlined in Table 2. Indeterminate specimens may be brought back to the laboratory for identification under a microscope. Taxa are also occasionally returned to the laboratory and identified by microscope as a quality control measure. A brief description of the Q-value ecological quality rating (EQR) is outlined in Table 1. The typology of each river station is described in Table 4, after Kelly-Quinn et al. 22,23 .
Each hydrometric area (Table 5 and Fig. 1) is generally surveyed on a three-year cycle; however, full surveys of certain hydrometric areas may be split across two consecutive years (e.g. HA 25), and on occasion a subset of stations was surveyed or resurveyed outside of the main survey year to closely track any progress in status changes following the implementation of a program of measures (Table 6). Certain stations were sampled on a more frequent basis, such as seriously polluted sites (i.e. Red dot sites; Fanning et al. 24 ), WFD high status objective sites, priority areas for action identified in Ireland's national river basin plan 17 and occasional sites of interest to local authorities and the EPA Office of Environmental Enforcement.
Within each hydrometric area, water bodies may have one or more sampling stations along their continuum. The number of stations may also vary between survey years although, unless health and safety or other unforeseen circumstances arise, the EPA attempts to sample the same stations in each survey cycle. Similarly, the numbers of water bodies and stations sampled within each hydrometric area will reflect the geographical area and length of river network.

Data Records
Although the EPA has collected river macroinvertebrate data in Ireland since 1971, only data from 2007 onwards are available digitally at present. While efforts are being made to digitize pre-2007 data, the current paper and associated dataset relate only to the period 2007 to 2018, covering four full WFD river invertebrate biomonitoring cycles. Table 6 highlights the availability of data in each HA over the data collection period. The dataset of macroinvertebrate taxa includes several descriptive and complementary fields for each record (Table 7).
The complete list of taxa included in the dataset is fixed; however, because taxonomic resolution varies between years and between stations, some minor editing may be required before use. The associated relative abundance categories are outlined in Table 3. Note that if a taxon has no relative abundance for a given record, it was not recorded in that survey. Nomenclature is based on that supplied by the British Natural History Museum (https://www.nhm.ac.uk/our-science/data/uk-species/species/index.html [accessed 01 July 2020]) and the Freshwater Animal Diversity Assessment (FADA) Project (http://fada.biodiversity.be/).
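One form the 'minor editing' mentioned above might take is collapsing taxa recorded at different resolutions to a common level before comparing surveys. The Python sketch below is an assumption-laden illustration: the lookup table is hypothetical and must be supplied by the user, the example names are not drawn from the dataset, and only presence is carried through because the relative abundance categories of Table 3 are not reproduced here.

```python
from typing import Dict, Iterable, List


def harmonize_taxa(recorded: Iterable[str], lookup: Dict[str, str]) -> List[str]:
    """Map each recorded taxon to a common resolution and remove duplicates.

    `lookup` is a user-supplied (hypothetical) mapping from a recorded name
    to the coarser name it should be reported under. Names missing from the
    lookup are kept unchanged.
    """
    return sorted({lookup.get(name, name) for name in recorded})


if __name__ == "__main__":
    # Hypothetical family-level lookup, for illustration only.
    to_family = {
        "Baetis rhodani": "Baetidae",
        "Baetis": "Baetidae",
        "Gammarus duebeni": "Gammaridae",
    }
    survey = ["Baetis rhodani", "Baetis", "Gammarus duebeni", "Gammaridae"]
    print(harmonize_taxa(survey, to_family))
```

Harmonizing to the coarsest resolution used across the surveys being compared avoids counting the same organisms as different taxa simply because they were identified to different levels.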
The dataset includes 10,987 records (see Table 6), each representing an associated list of macroinvertebrate taxa from a unique date and monitoring station. All data collections occurred between 17 April 2007 and 18 October 2018 and are available in figshare 25 .
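Because surveys are normally carried out from June to September but the records span 17 April 2007 to 18 October 2018, some users may wish to flag out-of-season surveys before analysis. The following Python sketch shows one way to do this using only the survey date; the function name is an assumption and no dataset field names are implied.

```python
from datetime import date

SEASON_START_MONTH = 6  # June
SEASON_END_MONTH = 9    # September


def in_standard_season(survey_date: date) -> bool:
    """Return True if the survey date falls within June to September."""
    return SEASON_START_MONTH <= survey_date.month <= SEASON_END_MONTH


if __name__ == "__main__":
    # The earliest and latest collection dates, plus a mid-season example.
    for d in (date(2007, 4, 17), date(2012, 7, 1), date(2018, 10, 18)):
        print(d.isoformat(), in_standard_season(d))
```

Whether out-of-season surveys should be excluded or simply flagged depends on the question being asked; the sketch only identifies them.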

Technical Validation
The data described here were collected using the EPA Q-value classification as described briefly above, with full details available in Toner et al. 1 . This approach, established and developed by An Foras Forbartha, was tested and intercalibrated by the EPA against other European macroinvertebrate assessment methods (2008/915/EC) 26 and was adopted as the official macroinvertebrate classification system for assessing rivers in Ireland (Statutory Instrument No. 272 of 2009). More detailed information on intercalibration of this metric is available in McGarrigle and Lucey 27 and a comparison with other intercalibrated European macroinvertebrate metrics is available in Bennett et al. 13 . Note that the Q-value is based on the well-established sensitivities, abundance and diversity of macroinvertebrates and their relationship to water quality. The system is considered a proprietary expert system and is generally only applied from June through September.
In terms of data collection, operator harmonization occurs annually at the beginning of each sampling season to ensure all operators in the field use identical and replicable approaches to sample collection, sample size, macroinvertebrate identification and data recording. All data collected are checked and transferred to the in-house database, then double-checked by an independent user to ensure that they have been accurately transcribed.
The long-term collection of these data and the harmonization approaches employed allow the production of a time series which provides a valuable record of environmental change in Irish rivers at a national level (Fig. 2). Additionally, Donohue et al. 5 found highly significant relationships between Q-values and measures of urbanization and agricultural intensity, densities of humans and cattle within a catchment, and chemical indicators of water quality, namely molybdate-reactive phosphorus, ammonia, total oxidized nitrogen, biological oxygen demand, pH and conductivity. There is also evidence that macroinvertebrate data collected for Q-value determination are related to fish diversity and density 28 .

Usage Notes
The terms 'Ireland', 'Republic of Ireland' and 'Irish' are used interchangeably throughout this paper and refer only to the State known officially as 'Ireland' and described as the 'Republic of Ireland'. They do not refer to the entirety of the island of Ireland, which includes 'Northern Ireland', part of the United Kingdom.
Data produced directly by the EPA are free to use; however, where data are used in publications and/or presentations, in full or in part, the EPA should be acknowledged. An example of how to acknowledge the use of EPA data is, but is not limited to, the following: 'Data included [in this study/review/other] was provided by the Environmental Protection Agency (Ireland)'.