Domestic waste emissions to European waters in the 2010s

Estimation of domestic waste emissions to waters is needed for pollution assessment and modelling. We assessed the quantity and location of domestic waste emissions to European waters for the 2010s. Specifically, we considered discharges of domestic waste in Population Equivalent (PE, the amount of waste equal to 60 g per day of Biochemical Oxygen Demand), and mean annual loads (t/y) of total nitrogen, total phosphorus, and 5-day Biochemical Oxygen Demand. The spatial resolution and extent of the analysis corresponded to the CCM2 River and Catchment Database for Europe, for catchments with a mean area of 6.4 km². The assessment is based on available European databases that allowed pinpointing waste emissions at a high spatial and conceptual resolution. Content gaps, particularly concerning domestic waste from isolated dwellings, were filled through alternative sources of information, exploiting population density and national statistics data. The dataset is of interest for assessing waste emissions to, and their fate through, European fresh and marine waters, also beyond the three pollutants evaluated in this study.

sources (IAS and SD) leach underground, where they could contribute locally to the pollution of groundwater, and reach surface water via subsurface pathways (diffuse sources).
Homogeneous data on the quantity and location of all sources of pollution outlined above are not available for the whole of Europe. A method was developed to merge domestic waste data reported in databases (REP approach) with alternative sources of information based on population density and national statistics on sewerage connection and treatment level rates (POP approach), to fill in content and regional gaps (Fig. 1). The spatial resolution and extent of the analysis and dataset corresponded to the CCM2 River and Catchment Database for Europe [7][8][9], which defines the European drainage network and divides the land into topologically connected catchments. Iceland was excluded from the analysis as it is not covered by CCM2. The reference period for the assessment was 2014-2015, although in some cases a longer time period was considered. The dataset is of interest for assessing and modelling waste pollution in waters at the continental scale.

Methods
The assessment of emissions of nitrogen, phosphorus and organic matter (BOD) in domestic waste involved several steps, namely assessment (i) of the spatial distribution of annual domestic waste (LOCATION); (ii) of the pollution loads (QUANTITY); and (iii) of the level of treatment (REDUCTION).
Location. The Waterbase - Urban Waste Water Treatment Directive (UWWTD) database v6 10, reporting data for 2014, was used as the basis for domestic waste emissions in the REP approach region. The UWWTD database reports domestic waste emitted by agglomerations larger than 2000 Population Equivalent (PE) in the 28 European Union Member States plus Iceland, Norway, and Switzerland. The database reports waste loads generated in agglomerations, the WWTPs or IAS to which loads are transferred, WWTP treatment levels, and the location of WWTP discharge points. All loads are reported in terms of PE. Besides waste generated by the resident population, PE loads also comprise commercial, industrial, or tourism waste that is produced in the agglomerations.
The UWWTD database is composed of several tables that portray the complex transfers between agglomerations, WWTPs, and discharge points. The waste load generated in agglomerations may be transferred to WWTPs, to IAS, or discharged without treatment. One agglomeration can be served by more than one WWTP and one WWTP may serve more than one agglomeration. Similarly, waste from IAS may be released in the environment or transferred in part or in total to one or more WWTPs by truck. Finally, a WWTP may have one or more discharge points.
In the database, some missing data and errors, for example in geographic coordinates, were detected. Further, inconsistencies between waste loads generated by agglomerations, transferred to WWTPs, treated, and ultimately transferred to discharge points were noted. Thus, the original database was amended, filling in missing information where possible and reducing inconsistencies by tracking records through the database structure, with the aim of preserving the generated waste load from agglomerations to treatment facilities and discharge points. Geographic coordinates were corrected where possible based on facility names, or the location of linked items (agglomerations, WWTPs or discharge points). Rules applied to address the inconsistencies are detailed in Vigiak et al. 11. Not all inconsistencies could be addressed; for example, a discrepancy of 10% of waste between what was transferred to and received from any single WWTP was considered acceptable. While the revision process may have reduced database inconsistencies, it may also have inadvertently generated errors, as assumptions had to be made when addressing each inconsistency/error type. This was particularly true for Croatia, for which no information on the path of waste generated in agglomerations to IAS or WWTP was reported, and for which mean statistics of neighbouring Slovenia were used instead. Thus, results for Croatia should be considered approximations only.
Fig. 1 Orange dots indicate domestic emission points based on the UWWTD database, covering EU28, Norway and Switzerland (REP approach). Blue background indicates the extent of the dataset and data coverage for population statistics (POP approach); stripes indicate regions for which only POP approach data were available.
The revision made it possible to attribute PE generated in agglomerations (PE_GEN) to IAS or WWTP discharge points. Waste load treated in IAS (PE_IAS) was set equal to the share of load transferred from agglomerations to IAS (TO_IAS) less the waste load transferred from agglomerations to WWTPs by truck (IAS_to_WWTP; Table 1). In total, about 627.5 million PE were generated in 2014 in the 30 countries included in the UWWTD database of the REP region (PE_GEN); of this waste, about 2.3% was not treated (PE_0), 1.8% was treated in IAS (PE_IAS), and 95.9% was connected and treated in WWTPs (PE_WWTP). Because small inconsistencies persisted at agglomeration or WWTP level, some small differences between PE generated and PE allocated to the three waste pathways remained, with the sum of allocations exceeding generated waste by 0.2% overall.
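The IAS bookkeeping and the closing consistency check can be illustrated with a minimal sketch. The function names are hypothetical, the guard against a trucked load larger than TO_IAS is an assumption added for illustration, and the numbers in the usage note are toy values rather than database records.

```python
def pe_treated_in_ias(to_ias, ias_to_wwtp):
    """PE_IAS = TO_IAS - IAS_to_WWTP, following the definition in the text."""
    if ias_to_wwtp > to_ias:
        # Assumption: such a record would be flagged as inconsistent.
        raise ValueError("trucked load cannot exceed load sent to IAS")
    return to_ias - ias_to_wwtp


def allocation_excess_pct(pe_gen, pe_0, pe_ias, pe_wwtp):
    """Percent by which the three pathway loads exceed the generated PE."""
    return 100.0 * (pe_0 + pe_ias + pe_wwtp - pe_gen) / pe_gen
```

For instance, a toy agglomeration with 1000 PE generated and pathway loads of 23, 18 and 961 PE would show an excess of 0.2%, the same order as the overall residual mismatch reported above.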
Waste generated in agglomerations but not treated (PE_0) was considered to be discharged directly to the stream network at the agglomeration location, less a 10% abatement that occurs in the sewerage system 2. As the UWWTD database reports no information about treatment and location of IAS, waste load treated in IAS (PE_IAS) was assumed to receive primary treatment and be discharged in the ground (diffuse source) at the agglomeration coordinates. Waste load treated in WWTPs (PE_WWTP) was reduced according to WWTP treatment level, and emitted in the stream network at the WWTP discharge points. When a WWTP had more than one discharge point, WWTP emissions were divided among discharge points assuming that larger portions of waste would be discharged to larger rivers/streams. The mean annual flow of the receiving reaches was used to define the fraction received by each discharge point.
www.nature.com/scientificdata
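A minimal sketch of the two rules described above follows: the 10% sewer abatement on untreated load, and the split of a WWTP emission among its discharge points in proportion to the mean annual flow of the receiving reaches. Function names are hypothetical, and the equal-split fallback for missing flows is an assumption not stated in the text.

```python
def untreated_emission(pe_0, sewer_abatement=0.10):
    """Untreated load reaching the stream after abatement in the sewer."""
    return pe_0 * (1.0 - sewer_abatement)


def split_by_flow(wwtp_emission, reach_flows):
    """Divide one WWTP emission among discharge points by reach flow."""
    total_flow = sum(reach_flows)
    if total_flow <= 0:
        # Assumption: fall back to an equal split when flows are unknown.
        return [wwtp_emission / len(reach_flows)] * len(reach_flows)
    return [wwtp_emission * q / total_flow for q in reach_flows]
```

With two discharge points on reaches of 30 and 10 flow units, an emission of 100 would be split 75 to 25.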
In parallel and for the whole European continent, domestic waste was also assessed based on national statistics of domestic waste treatment coupled with the spatial distribution of population density (POP approach). This was the only method applicable in the region outside the 30 countries listed in Table 1. The percentages of population connected to sewerage systems and served by each waste water treatment level were derived from national statistics (Online-only Table 1). The main source of information was Eurostat 12. Population shares were correspondingly defined as: (a) Collected in sewer (%, corresponding to Eurostat 'Urban wastewater collecting system'); (b) IAS (%, 'Independent wastewater treatment - total'); (c) 1ary treatment (%, 'Urban and other waste water treatment plants - primary treatment'); (d) 2ary treatment (%, 'Urban and other wastewater treatment plants - secondary treatment'); (e) 3ary treatment (%, 'Urban and other wastewater treatment plants - tertiary treatment'). In a few instances, the distribution in 1ary, 2ary or 3ary treatment plants was unreported in Eurostat 12, but was extracted instead from another Eurostat dataset 13. From statistics (a) to (e), we derived three more population shares: (f) population whose waste is collected but not treated (Pop_0 = Collected in sewer − sum of 1ary, 2ary and 3ary treatments; = (a) − (c + d + e), %); (g) Disconnected population (DISC), i.e. population share whose waste is not collected in sewers (DISC = 100 − Collected in sewer, = 100 − (a), %); and (h) Scattered Dwellings (SD), i.e. small, sparsely distributed homesteads, equal to the share of disconnected population that is not treated with IAS (SD = DISC − IAS, = (g) − (b), %). Small inconsistencies in national statistics were identified. For example, IAS data were sometimes unreported or larger than DISC; Pop_0 did not always match the Eurostat 12 'Percentage of resident population not connected to urban and other waste water treatment plants' statistics.
The inconsistencies were addressed by maintaining information about collected and treated shares, i.e. items from (a) to (e), while adjusting derived shares, from (f) to (h). These inconsistencies indicate a degree of conceptual uncertainty in defining population shares or in the interpretation assumed in this study, especially with regard to Pop_0 and DISC. When Eurostat 12 data were not available, other reporting sources were used (Online-only Table 1); however, it was noted that different international sources sometimes indicated discordant figures 14. Caution should be exercised especially for statistics reported for Albania, Moldova, and the Russian Federation.
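The derivation of shares (f) to (h) from the reported shares (a) to (e) can be sketched as below. SD = DISC − IAS follows from the verbal definition in (h). Clamping negative derived shares to zero is only one possible way of "adjusting derived shares" and is an assumption here; the function name and the example percentages are hypothetical.

```python
def derived_shares(collected, ias, t1, t2, t3):
    """Derive Pop_0, DISC and SD (all in %) from reported shares (a)-(e)."""
    pop_0 = collected - (t1 + t2 + t3)   # (f) collected but not treated
    disc = 100.0 - collected             # (g) not collected in sewers
    sd = disc - ias                      # (h) disconnected and without IAS
    # Assumption: reported shares (a)-(e) are kept, derived ones are clamped.
    return {"Pop_0": max(pop_0, 0.0), "DISC": max(disc, 0.0), "SD": max(sd, 0.0)}
```

For a hypothetical country with 90% collected, 6% IAS, and 10%, 30% and 45% in primary, secondary and tertiary treatment, this gives Pop_0 = 5%, DISC = 10% and SD = 4%.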
The 1 km² raster grid of Global Human Settlement (GHS) population of 2015 15 was used to define population density (inhabitants/km²). Population was allocated to waste water treatment shares according to its density, assuming that the most densely populated areas would benefit from the best nationally available technology, and conversely that the least populated areas would not be connected to sewerage systems. Thus, four increasing population density thresholds per country were identified based on the national cumulative population density distribution and national treatment statistics: Th_DISC, the density below which population was assumed to be disconnected from sewerage; Th_0, the density up to which population was assumed to be connected to sewers but whose waste was not treated; Th_1, the density up to which population was assumed to be served by primary treatment; and Th_2, the density up to which population was assumed to be served by secondary treatment; population densities above Th_2 were assumed to be served by tertiary treatment. After applying the density thresholds, the number of inhabitants per treatment and per catchment was obtained by multiplying the catchment mean density (inhabitants/km²) of the relevant share by the catchment area (km²). Through this procedure, population was spatially partitioned into:
i. Population that is not connected to sewer systems (Pop_DISC: GHS2015 population density < Th_DISC). Pop_DISC was divided into two fractions, Pop_IAS (i.e. population whose waste is treated in IAS) and Pop_SD (i.e. waste generated in scattered dwellings, served by septic tanks). The ratio IAS/DISC between disconnected population whose waste is treated in IAS and all disconnected population was derived from national statistics and used to separate the two fractions:
$$\mathrm{Pop\_IAS} = \mathrm{Pop\_DISC} \times \frac{\mathrm{IAS}}{\mathrm{DISC}}, \qquad \mathrm{Pop\_SD} = \mathrm{Pop\_DISC} - \mathrm{Pop\_IAS}$$
(Table 1). Thus Pop_IAS and Pop_SD share the same spatial distribution.
ii. Population that is connected to a sewer system but whose waste is not treated (Pop_0: Th_DISC ≤ density < Th_0).
iii. Population that is connected to a sewer system and whose waste is treated at primary (Pop_1: Th_0 ≤ density < Th_1), secondary (Pop_2: Th_1 ≤ density < Th_2), or tertiary level (Pop_3: density ≥ Th_2).
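The partition above amounts to binning a density value against four increasing thresholds and then splitting the disconnected class by the national IAS/DISC ratio. The sketch below is one way to express it, assuming lower bounds are inclusive and thresholds are non-decreasing; names and threshold values are hypothetical.

```python
from bisect import bisect_right

CLASSES = ["Pop_DISC", "Pop_0", "Pop_1", "Pop_2", "Pop_3"]


def classify_density(density, th_disc, th_0, th_1, th_2):
    """Return the treatment class of a cell from its population density."""
    thresholds = [th_disc, th_0, th_1, th_2]  # must be non-decreasing
    # bisect_right counts the thresholds that are <= density.
    return CLASSES[bisect_right(thresholds, density)]


def split_disconnected(pop_disc, ias_over_disc):
    """Split Pop_DISC into (Pop_IAS, Pop_SD) with the national ratio."""
    pop_ias = pop_disc * ias_over_disc
    return pop_ias, pop_disc - pop_ias
```

With illustrative thresholds of 10, 20, 50 and 100 inhabitants/km², a density of 5 falls in Pop_DISC, exactly 10 in Pop_0, and 100 or more in Pop_3.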
A final check consisted of summing inhabitants per country and treatment level to verify that proportions respected the official statistics. Deviations of allocated from official population shares were less than 0.35% in more than 90% of cases. The largest negative deviation was −3% for tertiary treatment in Turkey, and the largest positive deviation was +3% for primary treatment in Georgia. Within EU28, the largest deviation was +1.2% of population allocated to tertiary treatment in Luxembourg. These differences are due to errors in allocating a CCM2 catchment to a single country along country borders.
Merging of domestic waste data sources. The UWWTD database does not report waste from agglomerations below 2000 PE unless their waste is treated in WWTPs. Thus, part of the population that is disconnected from sewerage systems (part of Pop_DISC) or possibly served by sewerage but not treated in small agglomerations (part of Pop_0) remains unreported. To fill this source gap it was necessary to estimate what share of population might be unreported in the UWWTD database (called herein "residual population", Pop_RES). This was done by relating domestic waste estimates from the REP and POP approaches. However, direct comparison was complicated by (i) differences in reported units, as the UWWTD database reports PE while the POP approach is based on inhabitants; and (ii) the highlighted uncertainties in reported shares of Pop_DISC and Pop_0 for the POP approach.
(2020) 7:33 | https://doi.org/10.1038/s41597-020-0367-0
The relationship between PE reported in the UWWTD database (PE_GEN) and inhabitants (total resident population PopTot, estimated from GHS2015 15) needed to be better understood to allow for a meaningful merging of the two approaches. Theoretically, missing population in the UWWTD database would be negligible in countries where the shares of scattered dwellings (disconnected but not treated in IAS) or connected but not treated (Pop_0) population are nil or very low. Of the 30 countries analysed in the UWWTD database, 15 reported at least 97.5% of population as treated through IAS or WWTPs (Online-only Table 1). The country ratio between PE_GEN and PopTot (inhabitants) for these 15 countries ranged from 0.8 to 2.4 (median 1.18). Despite this variability and the small sample size, a significant linear regression between PE_GEN and PopTot could be identified: PE_GEN = 1.23 × PopTot (R2 = 0.98; sample size = 15; Fig. 2). The rate of 1.23 PE/inhabitant was thus adopted to transform PE into resident population and vice versa. We refer to inhabitants derived from PE as Population Resident Equivalents (PRE, inhabitants), where 1 PRE = 1 PE/1.23. The interpretation of this rate is that, on average across Europe, the contribution of commercial, industrial and tourism emissions to domestic waste on top of resident population can be considered around 23%. This figure is higher than a global average of 15% 2 but seems reasonable for industrialized countries, and especially for urban areas.
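The conversion between PE and PRE, and the kind of slope estimate behind the 1.23 rate, can be sketched as follows. Fitting the line through the origin is an assumption suggested by the form PE_GEN = 1.23 × PopTot; the text does not state whether an intercept was fitted. Function names are hypothetical.

```python
PE_PER_INHABITANT = 1.23  # rate adopted in the text


def pe_to_pre(pe):
    """Population Resident Equivalents (inhabitants) from PE."""
    return pe / PE_PER_INHABITANT


def pre_to_pe(inhabitants):
    """PE from resident inhabitants."""
    return inhabitants * PE_PER_INHABITANT


def slope_through_origin(pop_tot, pe_gen):
    """Least-squares slope of pe_gen = b * pop_tot (no intercept)."""
    sxy = sum(x * y for x, y in zip(pop_tot, pe_gen))
    sxx = sum(x * x for x in pop_tot)
    return sxy / sxx
```

Applied to per-country pairs of PopTot and PE_GEN, `slope_through_origin` would return the PE/inhabitant rate under the no-intercept assumption.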
The PE/inhabitant rate was used to estimate the share of country population that was not accounted for in the UWWTD database (Pop_RES). Figure 3 shows a conceptual scheme of the procedure. First, country total PE_GEN reported in the UWWTD database (blue bar) was transformed into population resident equivalents (PRE), which were compared to total population (PopTot). The difference between total population (PopTot) and the estimated inhabitants reported in the UWWTD database (PRE) was defined as Pop_RES. If the total population was lower than the estimated PRE, Pop_RES was nil (case A in Fig. 3). Otherwise (cases B in Fig. 3), Pop_RES was taken (and spatially distributed) as part of the population disconnected (Pop_DISC) or connected but not treated (Pop_0). In the first instance, if the disconnected population was larger than the estimated Pop_RES (case B1 in Fig. 3), then Pop_RES was defined and distributed as the Pop_RES/Pop_DISC fraction of the disconnected population; all of this fraction was considered to belong to scattered dwellings (Pop_SD). When Pop_RES was larger than Pop_DISC (case B2 in Fig. 3), then after allocating Pop_SD equal to Pop_DISC, the remaining portion of Pop_RES was taken and distributed as a fraction of connected not treated population (Pop_RES_0). Finally, there could be cases where Pop_RES was larger than the sum of Pop_DISC and Pop_0 (case B3 in Fig. 3). In these cases, all Pop_DISC was considered Pop_SD, all Pop_0 was considered Pop_RES_0, but there was no further attempt to fill the remaining estimated population gap, and the final Pop_RES allocated to the country was lower than the population gap initially estimated.
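The case logic of Fig. 3 can be written as a short decision function. This is a sketch under stated assumptions: the function and key names are hypothetical, and a gap exactly equal to Pop_DISC is assigned to case B1, a boundary the text does not specify.

```python
def allocate_residual(pop_tot, pe_gen, pop_disc, pop_0, rate=1.23):
    """Allocate the population gap (Pop_RES) to Pop_SD and Pop_RES_0."""
    pre = pe_gen / rate            # inhabitants implied by reported PE
    gap = pop_tot - pre            # population not accounted for
    if gap <= 0:                   # case A: nothing is missing
        return {"case": "A", "Pop_SD": 0.0, "Pop_RES_0": 0.0, "unfilled": 0.0}
    if gap <= pop_disc:            # case B1: part of disconnected population
        return {"case": "B1", "Pop_SD": gap, "Pop_RES_0": 0.0, "unfilled": 0.0}
    if gap <= pop_disc + pop_0:    # case B2: all Pop_DISC plus part of Pop_0
        return {"case": "B2", "Pop_SD": pop_disc,
                "Pop_RES_0": gap - pop_disc, "unfilled": 0.0}
    # case B3: the gap exceeds both shares and is only partly filled
    return {"case": "B3", "Pop_SD": pop_disc, "Pop_RES_0": pop_0,
            "unfilled": gap - pop_disc - pop_0}
```

In case B1 the returned Pop_SD would then be distributed spatially as the fraction Pop_RES/Pop_DISC of the disconnected population in each catchment.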
In some countries (AT, CH, DE, DK, EE, ES, IT, MT, and SE; country codes are indicated in Online-only Table 1), PRE exceeded population, thus Pop_RES was nil. In other cases (BE, CY, CZ, LT, LV, PL, SI, and SK), population exceeded PE_GEN; in these countries Pop_RES amounted to 16-42% of population, and was a considerable source of domestic waste in addition to what is reported in the UWWTD database. Finally, in the remaining countries (BG, FI, FR, GB, GR, HR, HU, IE, LU, NL, NO, PT, RO) total PE_GEN was larger than population, but the corresponding PRE was lower than population. In these cases, the median Pop_RES was 3% of population,
Fig. 2 Relationship between generated population equivalent (PE_GEN) as reported in the UWWTD database and total population (inhabitants), for 15 countries that reported at least 97.5% of population as treated through IAS or WWTPs (Online-only Table 1).
In the POP approach, shares of connected population (Pop_0 to Pop_3) were transformed into PE loads using the PRE equivalence definition, i.e. adding a 23% component due to commercial, industrial and tourism activities to that of the resident population. However, a 10% reduction was applied to these PE loads to account for losses occurring in the sewerage system 2. Pop_IAS was also transformed using the PRE equivalence, adding 23% to align it with the PE_IAS definition. Instead, domestic load from scattered dwellings (Pop_SD) was considered as produced solely by resident inhabitants (1 PE/inhabitant in this case).
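The three conversion rules of the POP approach can be put side by side in a small sketch. The function and key names are hypothetical; the factors 1.23, 10% and 1 PE/inhabitant are those stated in the text.

```python
def pop_to_pe(pop_connected, pop_ias, pop_sd, rate=1.23, sewer_loss=0.10):
    """Convert inhabitants per pathway into PE loads (POP approach)."""
    return {
        # Connected shares (Pop_0 to Pop_3): add 23%, then lose 10% in sewers.
        "PE_connected": pop_connected * rate * (1.0 - sewer_loss),
        # IAS: add 23% to align with the PE_IAS definition, no sewer loss.
        "PE_IAS": pop_ias * rate,
        # Scattered dwellings: residents only, 1 PE per inhabitant.
        "PE_SD": pop_sd * 1.0,
    }
```

For 1000 connected inhabitants this yields about 1107 PE, whereas 1000 inhabitants in scattered dwellings yield 1000 PE.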
In synthesis, the two datasets (REP and POP) were merged as follows (Fig. 1):
• Treated loads: In regions covered by the UWWTD database, treated load was estimated with the UWWTD database and attributed to discharge point locations (REP approach). For countries not covered by the UWWTD database, treated waste was estimated with Pop_1, Pop_2 and Pop_3 assessed with the POP approach, and emissions were distributed according to catchment population density;
• Disconnected and connected not treated domestic loads: In regions covered by the UWWTD database, PE_IAS and PE_0 reported in the UWWTD database were attributed at agglomeration coordinates. Additionally, unreported population pertaining to small agglomerations (Pop_RES) was distributed according to the POP approach (Pop_SD and Pop_RES_0). For countries not covered by the UWWTD database, the analogues from the POP approach (Pop_SD, Pop_IAS, and Pop_0) were used.
Domestic pollutant loads (QUANTITY). All waste generated by these sources was considered domestic, although urban waste reported in the UWWTD database includes waste from commercial, industrial, and tourism activities. After allocating domestic waste load (PE) spatially across Europe, the associated emissions of nitrogen, phosphorus, and BOD loads were estimated assuming them to be dependent on human diet 1,16,17.
Emissions of nitrogen and phosphorus from human excreta were estimated based on protein consumption 16. Consumption was considered equal to intake less 20% of retail losses for vegetable proteins and 11% for animal proteins, and a further 3% of losses through sweat/hair/blood 2. Therefore, N and P emissions (E_N and E_P; g/day/PE) were calculated according to Jönsson and Vinnerås 16 as: [...] where VEGPRT is the vegetable protein intake and ANIMPRT is the animal protein intake (g/day/PE); α_N is the content of nitrogen in proteins (0.11) and α_P is the content of phosphorus in proteins (0.011) 16. [...] where X indicates the constituent (nitrogen, phosphorus or BOD). An additional source of phosphorus emissions in domestic waste due to use of detergents was estimated with Bouraoui et al. 1 (Online-only Table 1).
Treatment pollution removal (REDUCTION). Annual load emissions of domestic waste were computed as:
$$\mathrm{Load}_{out,T,X} = \mathrm{Load}_{in,T,X} \times \left(1 - \mathrm{eff}_{T,X}\right)$$
where Load_out,T,X is the annual pollutant emission load of constituent X (t/y of nitrogen, phosphorus, or BOD) at treatment level T; Load_in,T,X is the annual load undergoing treatment, and eff_T,X is the removal efficiency. Removal efficiencies per treatment level were adopted from literature 3,19-21 (Table 2). BOD efficiencies were set after calibration of BOD fluxes in Europe 22 within the range of literature values. WWTP treatment types are reported in the UWWTD database 10; however, only nutrient removal technologies were considered to assign the tertiary treatment level. WWTP treatment levels for nutrients were thus assigned as follows: tertiary when nitrogen or phosphorus removal was indicated; secondary when secondary treatment was specified; primary in all other cases. Notably, in this way primary level was assigned to 1424 WWTPs which had no treatment reported in the UWWTD database, and to a further 394 ambiguous WWTP cases.
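The treatment-level rule and the load reduction step can be sketched together. The boolean flags and function names are hypothetical, and the efficiency used in the usage note is an arbitrary illustrative value, not one of the Table 2 efficiencies.

```python
def assign_treatment_level(n_removal, p_removal, secondary):
    """Tertiary if N or P removal is indicated, else secondary, else primary."""
    if n_removal or p_removal:
        return "tertiary"
    if secondary:
        return "secondary"
    return "primary"  # also covers unreported or ambiguous treatment


def load_out(load_in, efficiency):
    """Emitted load after removal: load_in * (1 - efficiency)."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be a fraction between 0 and 1")
    return load_in * (1.0 - efficiency)
```

For example, an incoming load of 100 t/y with an illustrative efficiency of 0.25 would emit 75 t/y.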
Phosphorus removal technology considerably improves tertiary treatment P efficiencies (T3P; Table 2). In the UWWTD database, adoption of phosphorus removal is specified. In the POP approach, national statistics report this information only partially 12. However, the fraction of tertiary WWTPs that include phosphorus removal in the UWWTD database was generally higher than in Eurostat 12 data, possibly because the UWWTD database is more recent. Thus the rate of phosphorus removal adoption in tertiary treatment (T3P/T3) as estimated from the UWWTD database was adopted to define the fraction of Pop_3 treated with phosphorus removal (Pop_3P = Pop_3 × T3P/T3). In countries not covered by the UWWTD database, for which no data were available, all tertiary treatment was considered to be without phosphorus removal technology (T3 only; Pop_3P = 0).
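As a small illustration of the Pop_3P rule, the sketch below splits tertiary-treated population by the T3P/T3 adoption rate. Whether T3P and T3 are expressed as facility counts or as loads is not stated in the text, so the arguments are left generic; the function name is hypothetical.

```python
def split_tertiary(pop_3, t3p, t3):
    """Return (Pop_3P, Pop_3 without P removal) using the T3P/T3 rate."""
    # Countries without UWWTD data: no P removal is assumed (Pop_3P = 0).
    rate = t3p / t3 if t3 > 0 else 0.0
    pop_3p = pop_3 * rate
    return pop_3p, pop_3 - pop_3p
```

With an adoption rate of 80 out of 100, a tertiary population of 1000 would be split into 800 with and 200 without phosphorus removal.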

Data Records
Waste emissions to waters comprised point and diffuse sources. Locations of individual point sources, however, are subject to some uncertainty; we therefore elected to aggregate emissions at the CCM2 9 catchment scale, which we consider appropriate for modelling pollution transport in freshwaters at continental scale. Further, emissions were aggregated at administrative units, which comprised NUTS Level 3 European administrative units 23, combined with ESRI administrative units 24 for countries not covered by Eurostat 23.
The dataset produced in this study 25 consists of the original CCM2 catchment polygon layer 7-9 (spatial projection ETRS LAEA 1989) whose table reports emissions to waters according to the items listed in Online-only Table 2. All sources of the same type (table items) were aggregated per catchment. The list of items reported in Online-only Table 2 reflects the approach and interpretation given in this study. To allow for a more flexible use of the dataset, for example in case users would select only some waste sources, or apply the dataset to other waste related studies, we report domestic waste in terms of PE load emitted through different pathways and derived constituents (nitrogen, phosphorus, and BOD) as estimated in our study. For the administrative units, we provide the same table but with reference to administrative regional code at NUTS Level 3 23 .

Technical Validation
Comparison of domestic waste population shares. Continental and global assessments of domestic waste are usually based on population and national statistics as availability of more detailed data is rare. Thus, it is insightful to compare country domestic PE loads for the region where both REP and POP approaches could be applied.
Country total Population Resident Equivalent in the two approaches showed very good agreement, with a Pearson's correlation coefficient ρ of 0.99 (Fig. 4; coefficient of determination R2 = 0.98; percent bias PBIAS = 4.6%; Nash-Sutcliffe Efficiency NSE = 0.98). This was achieved thanks to the introduction of 'residual population' to account for small dwellings that may be unreported in the UWWTD database. Yet, in some countries, notably AT and DK, which reported large PE loads, PRE in the REP approach remains higher than residential population and above the 1:1 line. Conversely, other countries, like BE and CZ, reporting lower than expected PEs but small shares of disconnected or untreated population, lie below the 1:1 line.
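The agreement statistics quoted here can be computed as in the following sketch. The PBIAS sign convention (positive when the second series exceeds the first) and the choice of which approach plays the role of reference are assumptions, since the text does not define them; function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pbias(ref, est):
    """Percent bias, here 100 * sum(est - ref) / sum(ref)."""
    return 100.0 * sum(e - r for r, e in zip(ref, est)) / sum(ref)


def nse(ref, est):
    """Nash-Sutcliffe Efficiency of est against ref."""
    mean_ref = sum(ref) / len(ref)
    num = sum((r - e) ** 2 for r, e in zip(ref, est))
    den = sum((r - mean_ref) ** 2 for r in ref)
    return 1.0 - num / den
```

Unlike ρ, NSE penalises systematic offsets from the 1:1 line, which is why a pair of series can show a high correlation and a low or negative NSE at the same time.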
Larger differences between the two approaches emerge, however, when looking at shares of population by treatment level. The correlation ρ between PRE in disconnected population was 0.78 (Fig. 4, orange triangles; R2 = 0.61; PBIAS = −4.6%; NSE = 0.46). This attests to inconsistencies in the data sources used in the two approaches. For example, the GB share of disconnected population in the REP approach is much higher than in POP, because the UWWTD database reports that in GB almost 300,000 PE are treated via IAS (part of disconnected population, [...] time REP data for CH, EE, RO report lower shares of disconnected population but higher shares of connected not treated population than POP data. The highest inconsistency between the two approaches was observed for the 'Connected not treated' population share (PE_0/Pop_0). For this share, the largest deviation was the case of IT; in the UWWTD database, IT reports 0.7% of connected not treated PE (Table 1), but Eurostat 9 reports about 30% of population as connected to sewerage but not treated (Online-only Table 1).
Table 2. Treatment removal efficiencies adopted in this study per treatment level T and constituent X (N = nitrogen; P = phosphorus; BOD = Biochemical Oxygen Demand).
Correlation increased for primary treatment share, albeit differences for some countries were large (Pearson's correlation coefficient ρ = 0.79; R2 = 0.62; PBIAS = +10.6%; NSE = −0.41). In this case, part of the discrepancies between the two approaches may have been generated by assuming primary treatment for WWTPs whose treatment type was not clearly declared in the UWWTD database. The assumption affected the majority (>75%) of WWTPs classified as primary treatment in BE, BG, CZ, ES, FI, HR, IE, LU, LV, RO, SI and SK. Conversely, it did not affect REP primary treatment for DE, GB or DK, which instead report larger shares of primary treatment through the UWWTD database than through national statistics (Table 1 and Online-only Table 1). The agreement between shares of population increases substantially in secondary (ρ = 0.98; R2 = 0.95; PBIAS = 3.5%; NSE = 0.69) and tertiary treatment (ρ = 0.96; green squares in Fig. 4; R2 = 0.93; PBIAS = 8.8%; NSE = 0.89). This is reassuring because shares of domestic loads treated at the secondary or higher level represent the large majority of population (89% of PE_GEN). Figure 4 highlights discrepancies in national reporting of domestic waste through different channels 14 . It is possible that interpretation of Eurostat statistics by reporting countries was different from the one assumed in this study (Online-only Table 1). Differences in reporting periods and variability in the PE/Population rate further complicate comparisons.
Finally, we remark on the spatial implications of using waste emissions from point data (REP approach) or from combining population density and national statistics (POP approach). Figure 5 compares PE loads (in logarithmic scale) estimated by the two approaches and aggregated at decreasing spatial administrative scale, from country level (NUTS Level 0), to NUTS Level 1, NUTS Level 2, and NUTS Level 3. At national scale the two approaches are consistent and close to the 1:1 line. However, allocation of domestic waste diverges under the two approaches as the spatial scale gets finer. Pearson's correlation coefficient ρ decreased from 0.99 at country scale (NUTS Level 0), to 0.96 at NUTS Level 1, 0.93 at NUTS Level 2, and 0.83 at NUTS Level 3 (and to 0.34 at CCM2 catchment scale). A limit case at NUTS Level 2 scale exemplifies the cause of the divergence: UKI4, part of greater London, where 2.6 million people live, scores 0 PEs in the REP approach because no WWTP discharge point falls in this area, whereas under the POP approach 3.2 million PEs are attributed to this region.
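Because NUTS codes are hierarchical (two characters for the country, then one more character per level), loads stored at NUTS Level 3 can be rolled up to coarser levels by truncating the code. The sketch below assumes a hypothetical mapping from NUTS Level 3 codes to PE loads; the codes and values in the usage note only illustrate the code structure.

```python
from collections import defaultdict

# Number of leading characters that identify each NUTS level.
NUTS_PREFIX_LEN = {0: 2, 1: 3, 2: 4, 3: 5}


def aggregate_by_nuts(loads_by_nuts3, level):
    """Sum NUTS Level 3 loads up to the requested NUTS level."""
    n = NUTS_PREFIX_LEN[level]
    totals = defaultdict(float)
    for code, load in loads_by_nuts3.items():
        totals[code[:n]] += load
    return dict(totals)
```

For instance, `aggregate_by_nuts({"UKI41": 1.0, "UKI42": 2.0, "UKI31": 4.0}, 2)` groups the first two units under `UKI4` and the third under `UKI3`. Repeating the aggregation for both approaches at each level would allow the correlation to be tracked as the scale gets finer.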
Both approaches suffer from limits and uncertainties, arising on the one hand from the assumptions adopted in the study, and on the other hand from errors and inconsistencies that were detected in data sources; thus discrepancies should not be regarded as errors in either approach. Yet, the UWWTD database represents an important step forward in tracking domestic waste generation and fate, and the reporting quality of the original UWWTD database has improved over time. At the same time, the introduction of "residual population" shares made it possible to align the two approaches and to fill a domestic source gap (small dwellings) in the UWWTD database.
Comparison of emission loads and removal efficiencies with UWWTD database data. For a minority of WWTPs and on a voluntary basis, the UWWTD database reports incoming and exiting loads of nitrogen, phosphorus and BOD. When the reported loads were consistent with incoming PE load and the WWTP treatment level was unambiguously declared, this information was used to (i) test the estimation of pollutant loads from domestic waste adopted in this study, and (ii) compare reported treatment efficiencies with those assumed in this study (Table 2). Notably, however, declared WWTP emissions were not retained in the final dataset 25 to avoid methodological inconsistencies, which could arise for example when using the dataset for scenario analysis.
In Fig. 6, declared incoming loads as reported in UWWTD database are compared with incoming loads as estimated from human diet in this study. Reported treatment efficiencies varied considerably, with interquartile ranges being larger than 0.2 (Table 3). Mean efficiencies from reported data however compare very well with the ones assumed in this study (Table 2). Assumed treatment efficiency for BOD at secondary level and for phosphorus removal were slightly higher than reported means, but were still within the interquartile ranges. In any case, Table 3 indicates that removal efficiencies are a substantial source of uncertainty in the estimation of domestic waste emissions.

Comparison of WWTP emissions from different sources. The European Pollutant Release and Transfer Register database (E-PRTR) 26 reports emissions from large WWTPs (i.e. those with incoming loads above 100,000 PE or whose emissions are higher than given thresholds). The presence of WWTPs in the E-PRTR database provides another source of information about pollutant emissions of domestic waste, albeit limited to a sample of very large facilities. An independent study 27 analysed WWTP-related information reported in the E-PRTR and estimated median emission factors for nitrogen (N), phosphorus (P) and Total Organic Carbon (TOC) per PE and per treatment level, which can be compared to the estimates of this study (Table 4).
Nitrogen emissions estimated in this study are slightly higher than medians reported in van Duijnhoven and van den Roovart 27 or emissions declared in the UWWTD database, probably because of the low efficiency assumed for primary treatment (Table 2). Conversely, phosphorus emissions for primary and secondary treatment estimated in this study were higher than what is reported in the E-PRTR or UWWTD database. van den Roovart et al. 27 did not separate tertiary treatment with or without P removal. However, most tertiary WWTPs include phosphorus removal technology (80% of tertiary facilities, treating 90% of incoming waste load treated at [...]
Table 3.
To transform BOD into TOC, this study assumed the molecular ratio of 1.68 ± 0.375 28. For primary treatment, TOC estimates of this study concur with median emissions from E-PRTR, but are higher than estimates from the UWWTD database data. Conversely, at secondary or tertiary treatment the estimates of this study concur with the UWWTD database but are lower than the van den Roovart et al. 27 estimates. The comparison confirms the validity of the assumptions taken in this study to assess emission loads and abatement, but also highlights the uncertainty in the estimation of emissions.

Usage Notes
Emissions to waters (items in Online-only Table 2) were selected to allow for flexible usage. PE loads of domestic waste can be used for assessing other forms of waste pollution, for example emerging pollutants. PE for the REP approach conforms to the UWWTD database definition; PE in POP approach areas was derived by multiplying population by the factor of 1.23 assumed for Europe. As they are not part of official reporting, shares of domestic waste for the "residual population" were kept separate so that users can discard them if they prefer. Users can decide whether to accept the methods of this study to assess nitrogen, phosphorus or BOD emissions, or to change some assumptions, e.g. treatment efficiencies or pathways. For example, Morée et al. 2 applied a 20% loss to incoming nitrogen loads in scattered dwellings to account for volatilization; this loss was not applied in the present dataset but can easily be applied by the user.
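As an illustration of these user-side adjustments, the following Python sketch derives PE from population with the 1.23 factor and optionally applies the 20% volatilization loss to a nitrogen load from scattered dwellings. The function names and example values are hypothetical and do not correspond to fields of the dataset.

```python
# Minimal sketch of two adjustments a user might make.
# All names and example values are illustrative assumptions.

PE_PER_INHABITANT = 1.23    # PE per inhabitant assumed for Europe (POP approach)
VOLATILIZATION_LOSS = 0.20  # nitrogen loss applied by Moree et al. (optional)


def population_to_pe(population):
    """Convert a population count into Population Equivalents (PE)."""
    if population < 0:
        raise ValueError("population must be non-negative")
    return population * PE_PER_INHABITANT


def apply_volatilization(n_load, loss=VOLATILIZATION_LOSS):
    """Reduce a nitrogen load (any unit, e.g. t/y) by a fractional loss."""
    if not 0.0 <= loss <= 1.0:
        raise ValueError("loss must be a fraction between 0 and 1")
    return n_load * (1.0 - loss)


# Example: a catchment with 1,000 inhabitants in a POP approach area
pe = population_to_pe(1000)
# Example: 10 t/y of nitrogen from scattered dwellings, with the optional loss
n_adjusted = apply_volatilization(10.0)
print(f"PE = {pe:.0f}, adjusted N load = {n_adjusted:.1f} t/y")
```

Keeping the loss as a parameter lets users test other volatilization assumptions without changing the original loads.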
The database can be downloaded as an ESRI file geodatabase that includes catchment spatial information, or as stand-alone tables (csv format). In addition to the database, a map viewer allows browsing the dataset according to the four constituents (PE, nitrogen, phosphorus, or BOD, in annual values per km2) and three types of nested hierarchies: administrative units 23,24 , hydrological units 9 , and official Water Framework Directive River Basin Districts 29 . The web viewer is scale dependent, so units activate or deactivate when zooming in or out of an area, and it can be used to explore the amounts and shares of PE, and the loads of nitrogen, phosphorus and BOD emissions. By default, the viewer displays PE at administrative level. Three main buttons on the right side of the viewer allow the user to activate the legend, change the layers to be displayed (the top layer selected covers the others), and access an information page. Clicking on any unit opens a pop-up window that shows the dataset content.
We discourage spatial disaggregation of the reported loads to a resolution finer than the CCM2 catchments, to limit the impact of potential location errors on tracking emissions. The population density used in the POP approach has an original resolution of 1 km2. For the REP region, users who wish to revert to point discharges could use the original UWWTD database 10 instead of this dataset.

Code availability
Changes to the UWWTD database were implemented through several R scripts, whereas checks and corrections of spatial coordinates were conducted using the ArcGIS interface. Final assembly of the GIS dataset was conducted using SQL scripts. We regret that it is not possible to package all the code and procedures into shareable documentation.
Table 4. Comparison of emission factors of nitrogen (N), phosphorus (P) and Total Organic Carbon (TOC) per PE (i) estimated by an independent study 27 based on E-PRTR data 26 ; (ii) estimated in this study; and (iii) derived from the UWWTD database 10 subset of data. *TOC was estimated from BOD assuming a ratio BOD/TOC = 1.68 ± 0.375 after Dubber and Gray 28 . Values in brackets report TOC emission ranges when adopting this error.