Introduction

Since 1880, the average global temperature on Earth has increased by at least 1.1 °C (refs. 1,2), and the world population has grown from ~4.4 billion to nearly 8 billion people between 1980 and today3. These profound changes have considerably influenced land use patterns and watershed characteristics, leading to shifts in hydrogeology, vegetation cover, rainfall-runoff patterns, and water quality4,5. Source apportionment in watershed management is a critical analytical process for identifying diverse sources of pollutants and assessing their specific contributions6. However, current methods typically analyze data from a single year, taking a static perspective and neglecting the dynamic changes in land use, climate, and human activity over long periods7. Apportionment results based on static assumptions may overestimate or underestimate source contributions and may no longer be applicable once specific sources change8. This has also created challenges for long-term watershed management9. It is thus necessary to adopt adaptive management to enable timely adjustment of reduction strategies under changing conditions.

Generally, pollution sources can be categorized into three types. The first type is closely linked to natural conditions, as pollutant emissions are directly influenced by factors such as rainfall, wind direction, and temperature. For instance, rainfall can impact runoff processes and water quality6. The second type includes sources strongly associated with human activities, such as industrial emissions, vehicle exhaust, agricultural fertilizers, and wastewater discharge10. For example, increased irrigation in farmland can lead to higher concentrations of fertilizers in water bodies8. The third type encompasses sources that are influenced by an interplay of both natural conditions and human activities, leading to distinct emission characteristics. Many scholars have demonstrated that pollution sources undergo drastic changes over a long period of time11,12. Tan et al. (2023)13 reported that the share of pollution attributed to rural domestic activities, which includes household wastewater, septic systems, and agricultural runoff, decreased from 23.50% to 20.04% from 1995 to 2005 and then increased to 20.42% by 2020. Hu et al.14 showed that the net anthropogenic phosphorus inputs progressively increased by 1.4 times from 1980 to 2015 in the Yangtze River basin in China. Previous studies have underestimated or overestimated the true extent of pollution, as the cycles of meteorological changes and social dynamics that notably impact pollution sources span long time scales. Climate research, for instance, requires long-term data spanning decades or even centuries to explore trends and cyclical variations15. Similarly, the length of a social cycle varies depending on factors such as development stage, political system, and cultural characteristics, typically ranging from 30 to 50 years16. We acknowledge that the dynamics of pollution sources over time can vary significantly. Many sources are changing simultaneously in China, a consequence of rapid development. 
However, certain pollution sources may remain stable, showing little meaningful change. Conversely, some sources might change abruptly in response to sudden alterations in environmental or constructed conditions. This complex behavior underscores the importance of a nuanced approach to studying pollution dynamics over time. Current research must not only consider the changes in the types of pollution sources and their contributions but also concurrently identify the diverse patterns of change in these sources. This includes examining variations in their intensity, frequency, and impact areas to fully understand the dynamic nature of water quality and to develop effective pollution control strategies.

At present, although identifying specific key sources provides valuable information, the value of source apportionment extends beyond merely pinpointing important sources. For example, methods such as export coefficients17, the phosphorus index (PI)18, source characteristic factors, composition and ratio methods, chemical mass balances19, receptor modeling, and materials flow analysis20 enable researchers to quantify the relative contributions of point sources, such as industrial discharge, and nonpoint sources (NPSs), such as agricultural runoff, to overall pollution levels. However, these methods assume that pollution sources have a uniform impact over time21,22. This assumption does not fully capture the complexities of real-world scenarios where non-linear factors such as topography, land use, rainfall patterns, and hydrodynamic processes can considerably alter pollutant transport and the contributions of various sources. Additionally, other methods overlook pollutants’ temporal and spatial variations, which seriously limits the accuracy of key source identification23,24. Nonetheless, the changes in pollutant emissions over a long period of time can be analyzed as a non-stationary time series25,26. The Seasonal Trend Decomposition using locally weighted regression (STL) method has been applied to water quality assessment by considering the periodic and random characteristics of pollution during the change process27,28,29. The Autoregressive Integrated Moving Average (ARIMA) model, which combines time series and regression analysis, can effectively capture changes in different periods and provide valuable insights for guiding pollutant control30,31,32. Physics-based models are firmly rooted in physical laws and can simulate environmental processes, especially those related to hydrology and pollution dispersion. 
Moreover, source apportionment can be regarded as a multi-objective problem that incorporates the long-term contributions and trends of different sources. It provides decision-makers with a series of Pareto-optimal solutions by comparing the non-dominance relationships among different sources33. These solutions enable decision-makers to determine the best plan based on various factors.

Our primary objective in this study is to develop a dynamic source apportionment framework that integrates the STL method, the ARIMA model, and physics-based models. Specifically, we aim to understand the patterns of change in pollution sources and their impacts using an extensive inventory of both long-term and dynamic sources. This approach is designed to provide a comprehensive and nuanced view of source apportionment, accommodating the complexities and temporal variations inherent in environmental data. We selected the Hangbu River watershed in the Chaohu Lake basin, China, to study the dynamics of pollution sources. This watershed was selected because its mix of land uses and varied pollution sources makes it suitable for studying nitrogen and phosphorus dynamics in comparable watersheds. The sources of nitrogen and phosphorus are attributed to two major categories: human activities and natural sources (Fig. 1)34,35. Human activity sources include industrial sewage; urban domestic sources; rural domestic sources; urban NPSs; the intensive livestock and poultry breeding industry; the dispersed livestock and poultry breeding industry; the planting industry; and aquaculture. Natural sources refer to background-level pollution from natural land and river sediments.

Fig. 1: Changes in pollution sources in agricultural watersheds over the past 40 years.

As socio-economic development progresses, industrial sources, urban non-point sources, and urban domestic sources have emerged and increased successively in watersheds.

Results

Varying source composition and contributions

By considering various scenarios and identifying uncertainties in the contributions of different sources to watersheds (Supplementary Table 1, Tables 2 and 3), it is possible to assess how environmental factors, climate change, and socio-economic development influence the distribution of these sources over time. A hydrological year refers to a 12-month period used for measuring precipitation and streamflow in hydrology. The categorization of a hydrological year as wet, normal, or dry relies on long-term analysis of precipitation and streamflow. Precipitation and streamflow in wet years significantly exceed the long-term average, while in dry years they fall below it, indicating periods of surplus or scarcity, respectively. Normal years align closely with the historical average, serving as a baseline for water resource equilibrium. In a wet hydrological year with high socio-economic development (Scenario 1) (Fig. 2a), the planting industry is identified as the dominant contributor of nitrogen and phosphorus, constituting ~64% and 38%, respectively, of the total loads. Here, the nitrogen and phosphorus loads from the planting industry amount to ~7672.47 tons (t) and 314.10 t, respectively. This dominance can be attributed to conditions specific to wet years. In wet hydrological years, increased runoff and leaching from agricultural lands transport more nitrogen and phosphorus from fertilizers and soil nutrients into water bodies, elevating nutrient loads in surface waters. In such scenarios, the planting industry, with its extensive use of fertilizers and other agrochemicals, becomes a major source of these nutrients. 
The intensive livestock and poultry breeding industry is the second largest contributor, accounting for approximately 12% of the total nitrogen loads (1448.51 t) and 20% of the total phosphorus loads (163.04 t), primarily due to the runoff and leaching of manure and other waste products, which are also exacerbated by heavy rainfall. However, these observations differ in a dry hydrological year with high socio-economic development (Scenario 2) (Fig. 2b), where the planting industry, despite remaining the dominant source, contributes markedly less nitrogen, at approximately 1904.55 t, which accounts for approximately 36%. This difference is largely due to reduced runoff and leaching under drier conditions36. Light rainfall does not notably affect the migration of phosphorus attached to sediment9. Similarly, in a normal hydrological year with high socio-economic development (Scenario 5) (Fig. 2(e)), the planting industry again emerges as the most substantial contributor to nitrogen and phosphorus, showing a substantial increase to 2617.95 t and 142.14 t, respectively, accounting for 39% and 27% of the total loads. These findings underscore the considerable influence of rainfall variations on the contributions of different sources and highlight the importance of considering climatic factors when interpreting source apportionment results.

Fig. 2: The proportion of pollution source contribution in different scenarios.

Scenario 1 represents wet years with high socio-economic development; Scenario 2 represents dry years with high socio-economic development; Scenario 3 represents normal years with low socio-economic development; Scenario 4 represents normal years with moderate socio-economic development; and Scenario 5 represents normal years with high socio-economic development.

To explore the contributions of pollution sources under different development scenarios, we analyzed three typical scenarios, namely a low development scenario (Scenario 3), a moderate development scenario (Scenario 4), and a high development scenario (Scenario 5), to demonstrate the impacts of economic growth, population increase, and land use changes on pollution loads. In a normal hydrological year with low socio-economic development (Scenario 3) (Fig. 2c), the planting industry was the primary contributor, with a nitrogen contribution of 17466.36 t, accounting for 76% of the total nitrogen loading. Urban domestic sources, mainly untreated or improperly treated sewage discharged directly into rivers, ranked second, with a nitrogen contribution of 4059.19 t, accounting for approximately 18% of the total loading. In contrast, for phosphorus, urban domestic sources were the principal contributor, with 293.11 t, accounting for 45% of the total, while the planting industry ranked second, with 166.44 t, accounting for 25% of the total. Economic growth and population increase may lead to higher pollution levels from fertilizers37. In a normal hydrological year with moderate socio-economic development (Scenario 4) (Fig. 2d), the planting industry remained the primary source, contributing 91% of the total nitrogen load (32278.28 t) and 58% of the total phosphorus load (556.56 t). Human activities and land use changes are increasing rapidly, leading to an increase in pollution closely associated with human activities, such as agriculture and the livestock breeding industry. Moreover, driven by increased productivity, wastewater treatment plants have been constructed in watersheds over the past decade to treat industrial and urban wastewater. In a normal hydrological year with high socio-economic development (Scenario 5) (Fig. 
2(e)), the planting industry remained the largest nitrogen source, with a contribution of 2617.95 t (39% of the total). In comparison, the intensive livestock and poultry breeding industry ranked second, with a contribution of 1495.06 t (22% of the total). The planting industry contributed the most phosphorus, with 142.14 t (27% of the total), followed by aquaculture, with a contribution of 131.62 t (25% of the total). Environmental systems, influenced by factors such as seasonal variations, climate shifts, and land use changes, are inherently dynamic. Relying on data from a specific year would miss these fluctuations, potentially skewing apportionment results. For example, pollution patterns in dry and wet hydrological years can differ markedly due to varying runoff. Our findings highlight the importance of multiyear analyses for obtaining a comprehensive understanding of pollution sources.

Tendency, periodicity, and mutagenicity of sources

The complexity of long-term and temporal trends in nitrogen and phosphorus emissions requires an understanding of how development and hydroclimate variability impact the physicochemical processes responsible for changes in water quality38. In this study, the 36-year long-term values (from 1985 to 2020) of specific sources in the study area were analyzed using the Mann-Kendall (M-K) method and the least squares method. The M-K method, a non-parametric test, was employed to detect the significance of trends in the emission time series. The least squares method was then applied to quantify the magnitude of the detected trends. Emissions from rural domestic sources increased by 414.98 t per year for nitrogen and 17.73 t per year for phosphorus between 1985 and 1991. From 1992 to 2020, the annual emissions of pollutants from rural domestic sources decreased. Specifically, the gradients of the fitted lines for annual nitrogen and phosphorus emissions were 385.24 t a−1 and 16.41 t a−1, respectively. This analysis also confirmed a marked downwards trend from urban domestic discharge to direct discharge of industrial sewage from 1985 to 2003, followed by a slight decline in subsequent years. However, nitrogen emissions exhibited an overall upwards trend during 2000–2020, with a particularly notable increase before 2010, when all sources except urban domestic and industrial sewage showed a marked increase. This trend can be largely attributed to several factors. First, the extensive development of agricultural land and intensive use of nitrogen-rich fertilizers during this period were found to be direct contributors to increased emissions. Furthermore, a shift towards more intensive agricultural practices occurred during the early 21st century in China, including the overuse of fertilizers and the cultivation of high-yield crop varieties, which demand more nitrogen. 
These practices, coupled with inadequate nutrient management, resulted in elevated nitrogen levels in adjacent water bodies. Additionally, the loss of soil organic materials due to runoff, especially in the 2010s, further exacerbated this situation. These factors collectively contributed to the heightened nitrogen emissions observed during this period39.
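The two-step trend analysis described above (M-K test for significance, least squares for magnitude) can be sketched as follows. This is a minimal illustration, not the study's implementation: the function names and the synthetic series are assumptions, and the M-K variance omits the correction for tied values.

```python
import math


def mann_kendall(values):
    """Mann-Kendall trend test without tie correction.

    Returns (S, Z, p) where S is the M-K statistic, Z the normal
    approximation, and p the two-sided p-value.
    """
    n = len(values)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)  # sign of each pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p


def least_squares_slope(years, values):
    """Slope of the ordinary least squares line (units of value per year)."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    return sxy / sxx


# Synthetic, hypothetical series: emissions rising by 2 units per year.
years = list(range(1985, 1992))
emissions = [10.0 + 2.0 * k for k in range(len(years))]
s_stat, z_stat, p_value = mann_kendall(emissions)
slope = least_squares_slope(years, emissions)
```

A small p-value indicates that the monotonic trend is unlikely to be due to chance, and the slope then gives its magnitude in the units of the series per year.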

In this study, we model the spatial or temporal variability in nitrogen and phosphorus by establishing a linear relationship between the STL decompositions of precipitation, emissions from different sources, and environmental variables. The changes in environmental factors versus riverine fluxes reveal robust relationships characterized by linear increases in riverine fluxes. The correlations among precipitation, construction land area, and urban NPSs emissions were linear and significant, with coefficient of determination (R2) values of 0.93 for nitrogen and 0.96 for phosphorus (Supplementary Table 4). The strong correlation can be primarily attributed to the inherent link between urbanization, population growth, and the expansion of construction land. This expansion inevitably leads to an increase in urban NPSs pollution. Urbanization, characterized by the widespread emergence of impervious surfaces such as concrete and asphalt, drastically reduces the ability of soil to absorb rainwater and transports more urban pollutants into local water bodies40,41. These changes intensify urban NPSs emissions and disrupt natural hydrological processes, thereby impacting the severity of urban pollution and the sustainability of water resources42,43,44.
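The idea of relating decomposed components through a linear fit can be sketched as follows. Note the substitution: instead of STL (which uses locally weighted regression), this sketch uses a simpler classical additive decomposition with a moving-average trend as a stand-in. The function names, the `period` and `window` values, and the synthetic series are all hypothetical; the study does not state these settings here.

```python
def moving_average_trend(series, window):
    """Centered moving average; the window shrinks near the endpoints."""
    half = window // 2
    trend = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        trend.append(sum(series[lo:hi]) / (hi - lo))
    return trend


def decompose(series, period, window):
    """Additive split into trend, periodic, and residual terms.

    Classical decomposition used as a simple stand-in for STL.
    """
    trend = moving_average_trend(series, window)
    detrended = [y - t for y, t in zip(series, trend)]
    # Average the detrended values at each phase of the cycle.
    phase_means = []
    for k in range(period):
        vals = detrended[k::period]
        phase_means.append(sum(vals) / len(vals))
    centre = sum(phase_means) / period  # force the periodic term to zero mean
    periodic = [phase_means[i % period] - centre for i in range(len(series))]
    residual = [y - t - p for y, t, p in zip(series, trend, periodic)]
    return trend, periodic, residual


def fit_line(x, y):
    """Ordinary least squares fit; returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Synthetic, hypothetical annual series with a 4-year cycle.
precip = [1000.0 + 5.0 * i + (80.0 if i % 4 == 0 else -20.0) for i in range(36)]
emission = [0.5 * p + 3.0 for p in precip]
_, precip_periodic, _ = decompose(precip, period=4, window=5)
_, emission_periodic, _ = decompose(emission, period=4, window=5)
slope, intercept, r2 = fit_line(precip_periodic, emission_periodic)
```

Here the slope of the fitted line plays the role of the change in emissions per unit change in the periodic precipitation term, and the coefficient of determination summarizes how well the linear relationship holds.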

The periodic variations in several source emissions, such as urban NPSs, the planting industry, and natural sources, were strongly influenced by precipitation45, especially for the planting industry. Agricultural development has long been identified as a factor that strongly affects the quantity and quality of water bodies46. There are linear relationships among precipitation, farmland area, fertilizer application amount, and source emissions from the planting industry, with R2 values of 0.72 for nitrogen and 0.76 for phosphorus (Supplementary Table 4). Moreover, the periodic variations in precipitation are positively correlated with nitrogen and phosphorus emissions. For the studied period, if the periodic trend of annual precipitation exhibited a change of 1 mm, nitrogen and phosphorus emissions increased by 2354 t and 287.14 t, respectively (Supplementary Table 5). Temporal variations in nitrogen concentrations in surface water are predominantly influenced by seasonal climate changes, especially during periods of nutrient application47,48,49. Additionally, excessive manure application to meet crop demands can lead to adverse effects on soil phosphorus conditions. This over-application often results in a marked increase in phosphorus concentration in the soil solution, which may subsequently leach into adjacent water bodies50.

In this study, ARIMA modeling was employed to quantify the variability in pollution sources. In our analysis, for the random term of the nitrogen from the planting pollution source, the ARIMA parameters are set as 2, 0, and 2 for the p, d, and q values, respectively, resulting in an R2 of 0.27. For the random term of the nitrogen from the urban NPSs, the parameters are 4, 0, and 0, respectively, achieving satisfactory results with an R2 greater than 0.57. Additionally, the random terms of phosphorus from the planting industry, nitrogen from urban NPSs, and nitrogen and phosphorus from natural sources exhibited characteristics of white noise. In statistical terms, this means that their auto-correlation function and partial auto-correlation function did not show a gradual decay in correlation with increasing time lags. These findings revealed a robust and positive correlation between periodic fluctuations in rainfall and emissions from the planting industry, urban NPSs, and background values from natural sources. These periodicity and variability factors markedly influence the observed trends, particularly in the case of planting source, which demonstrates the most pronounced sensitivity to rainfall, exhibiting fluctuations ranging from 0.93% to 58.45%. Based on the above results, the study quantified the trends and patterns of changes in different sources over a long time series (Fig. 3).
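Whether a random term behaves like white noise can be screened with its sample autocorrelation function before fitting an ARIMA model. The sketch below is a simplified check under stated assumptions: the function names are hypothetical, the approximate 95% bound of 1.96/sqrt(n) is a common rule of thumb rather than the study's stated criterion, and the partial autocorrelation function is not computed.

```python
import math


def autocorrelation(series, max_lag):
    """Sample autocorrelation coefficients for lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        for k in range(1, max_lag + 1)
    ]


def looks_like_white_noise(series, max_lag=10):
    """True if every autocorrelation lies inside the approximate 95% band."""
    bound = 1.96 / math.sqrt(len(series))
    return all(abs(r) <= bound for r in autocorrelation(series, max_lag))


# Hypothetical example: a steadily rising series is strongly autocorrelated,
# so it should not be treated as white noise.
ramp = [float(i) for i in range(36)]
ramp_acf = autocorrelation(ramp, max_lag=5)
ramp_is_noise = looks_like_white_noise(ramp, max_lag=5)
```

A series that fails this screen carries structure that an ARIMA(p, d, q) model may capture, whereas a series that passes it leaves little for such a model to explain.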

Fig. 3: Different terms of nitrogen and phosphorus emission series.

a Trend term, b periodic term, and c random term of the nitrogen emission series (NES) from the planting industry; d trend term, e periodic term, and f random term of the phosphorus emission series (PES) from the planting industry.

Dynamic key source identification

Next, the pollution sources were ranked using a Pareto-based approach, considering three dynamic contribution indicators over a long-term time series (Supplementary Table 6), namely the contribution value (the relative contribution of a pollution source to the total pollution load), trend of change (the direction and magnitude of changes in a source’s pollution contribution), and robustness (the consistency and reliability of a pollution source’s contribution across different conditions and time periods).
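The layered Pareto ranking described above can be sketched as repeated extraction of non-dominated sources. This is an illustrative sketch only: the source labels and scores are hypothetical, and it assumes that all three indicators are oriented so that larger values mean higher priority, which the text does not specify.

```python
def dominates(a, b):
    """True if score tuple a is at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_layers(scores):
    """Split sources into successive Pareto fronts (layer 1 = non-dominated)."""
    remaining = dict(scores)
    layers = []
    while remaining:
        front = [
            name
            for name, score in remaining.items()
            if not any(
                dominates(other_score, score)
                for other, other_score in remaining.items()
                if other != name
            )
        ]
        layers.append(sorted(front))
        for name in front:
            del remaining[name]
    return layers


# Hypothetical sources scored as (contribution value, trend of change, robustness).
example_scores = {
    "A": (0.6, 0.10, 0.9),
    "B": (0.3, 0.40, 0.5),
    "C": (0.2, 0.05, 0.4),
    "D": (0.1, 0.00, 0.1),
}
layers = pareto_layers(example_scores)
```

In this toy case, A and B land in the first layer because neither beats the other on every indicator, which mirrors how a source with a modest contribution but a strong upward trend can still be ranked as a key source.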

However, it should be noted that the strategies for each source may exhibit slight variations depending on the specific pollutant under consideration. The Pareto front surface is shown in Supplementary Table 7, with 33 layers of the Pareto front for nitrogen and 29 layers of the Pareto front for phosphorus. In the case of nitrogen, dynamic source apportionment revealed the planting industry as a key source in the region, covering 64.60% of the watershed; this industry is located mainly in the middle and upper reaches of the watershed. Additionally, urban domestic sources are determined to be key sources in the region, covering 3.99% of the watershed and primarily located in the upper reaches. Together, these areas contributed 87.83% of the total nitrogen. Regarding phosphorus, dynamic source allocation designates large-scale livestock and poultry breeding industry pollution in the upper reaches, covering 22.00% of the watershed area, as a key source. Simultaneously, the planting industry in the downstream region, which accounts for 14.15% of the area, and urban domestic sources in the densely populated central region, which covers 2.53% of the watershed area, are identified as key sources. These areas contributed 57.68% of the total phosphorus. Compared to conventional methods, the dynamic source apportionment considers characteristics such as trends and mutations (Fig. 4a, d). As a result, certain pollution sources, such as the planting industry from sub-watersheds No. 13 and No. 21 for nitrogen, and the urban domestic industry from sub-watershed No. 21 for phosphorus, are identified as key sources, even though their contributions to the overall output may not be the most prominent.

Fig. 4: Key source identification and spatialization under different strategies.

a Traditional strategies; b recent strategies; c forward strategies for nitrogen; d traditional strategies; e recent strategies; f forward strategies for phosphorus.

Discussion

Studies on source apportionment often focus on specific years or limited periods. However, long-term factors from different sources exhibit diverse trends. For example, farmland exports a varying percentage of nitrogen and phosphorus, ranging from 36.17% to 90.93% and 10.85% to 58.01%, respectively, during different hydrological years. This finding supports the notion that pollutant sources can fluctuate over time due to changes in environmental factors. Among the various changes, the dominant factor is the trend, which is further influenced by periodic fluctuations resulting from variations in rainfall patterns and societal factors. These additional elements contribute to the complexity of management. The periodic changes observed in long-term series can substantially impact the dynamic trend of intrinsic nutrient inputs in watersheds. For instance, influenced by rainfall periodicity and other factors, the coefficient of variation of the planting industry contribution changes from 0.36 to 0.42. While the ARIMA model captures discrete random changes, its influence is relatively small compared to the impact of periodic changes.

Source apportionment under changing conditions is essential for ensuring that the relevant measures are implemented51,52. We conducted a cluster analysis on the Pareto-based stratification results by using recent, forward, and maintenance strategies. In the short term, prioritizing pollution control from major sources becomes crucial when decision-makers focus on the impact of these sources on watershed exports. This perspective, however, shifts in a long-term context. Here, decision-makers with a forward-looking approach might emphasize emerging trends, potentially spotlighting industries in urban domestic sectors that show a marked upwards trend. It is hypothesized that these sectors contribute substantially to environmental degradation over time. Conversely, decision-makers aiming for robust strategies might favor targeting stable, easily manageable sources such as industrial and rural domestic sectors. These choices reflect a balance between immediate impact and long-term sustainability.

The prioritization strategies for source apportionment, both for a specific year and over an extended period, are illustrated in Fig. 4 b–e. This approach is consistent with the views on adaptive strategies for climate change adopted by various countries and some departments around the world, emphasizing the need for adaptable and context-specific strategies in pollution control. For instance, in 2021, the executive branch of the United States, through executive actions and coordination among various federal agencies, updated the nation’s climate adaptation approach. This update focused on protecting federal infrastructure and introducing accountability measures for climate resilience. Norway has concentrated its efforts on assessments, planning, monitoring of flood risk, and constructing more resilient and sustainable communities53. In the United Kingdom, the Thames Barrier stands as a testament to adaptive ingenuity, safeguarding London from flooding and exemplifying robustness and flexibility in adaptation strategies. Moreover, in the agricultural sector, adaptation is evident through the revision of conventional farming practices. This includes the enhancement of irrigation systems and the cultivation of drought-tolerant crops, thereby ensuring the resilience of the global food system against the intensifying effects of heatwaves and droughts. Each of these national strategies reflects an approach to addressing the local impacts of global climate change, showcasing the importance of tailored solutions in building resilience.

In this study, we proposed an adaptive method that enhances real-time decision-making and response capabilities. By analyzing real-time data, this method enables the timely adjustment of strategies to adapt to changing conditions. In contrast, traditional source apportionment relies on static data and pre-determined measures54. Compared with results based on specific years, which may deviate from previous years, measures considering long-term source results were more concentrated. In traditional approaches, sources requiring reduction are prioritized based on their contribution to the entire watershed, often with the planting industry being the primary contributor. However, with adaptive source apportionment, we consider not only the contributions of sources but also their trends and periodic variations, which can lead to a scientifically informed strategy. For example, when identifying key sources of phosphorus, sub-watersheds 8 and 14 contributed 13.29% and 6.5%, respectively. However, their contributions are anticipated to decrease in the future, and their stability is expected to be relatively low, considering the dynamic contribution indicators: trend of change and robustness (Supplementary Table 6). Consequently, we should not prioritize implementing control measures in these areas.

Adopting comprehensive management that encompasses both short-term and long-term perspectives, instead of relying solely on individual strategies, is essential for achieving effective improvements in water quality55. Decision-makers should consider several factors when analyzing dynamic patterns of different sources over time series, including each source’s contribution over multiple years, the trends in their variations, and the reliability of control effects associated with each source. In light of these considerations, it is crucial to prioritize the development of control strategies that emphasize long-term effectiveness and resilience, particularly in temperate climates and predominantly agricultural watersheds, which typically exhibit distinct seasonal variations that can markedly impact the patterns of pollution sources.

Several steps need to be performed in the future. First, the analysis employed linear equations to model the emissions and contributions of different sources. It is essential to recognize that complex environmental changes often lead to non-linear variations over time or in response to environmental variables. For instance, in 2016, the Chinese government designated protected zones for drinking water sources and terrestrial areas within the Chaohu Lake watershed, with restricted and prohibited farming areas covering 1215.10 km² (ref. 56; Fig. 5a, b). Simple linear approximations may not reflect the effects of such regulations. Second, incorporating the contributions of different sources into a single optimization layer assumes the equal importance of these influences. This simplification is necessary due to data availability and consistency with other regional planning processes, but it may not accurately reflect the varying environmental impacts. Future research will seek to refine model parameterization through the integration of Dynamic Key Source Identification analysis, utilizing multi-source data assimilation algorithms to better reflect the dynamic nature of pollution sources. Third, we focused primarily on the contributions, variations, and robustness of the sources. It is crucial to incorporate weighted factors to prioritize sources based on their environmental significance, considering ecological, health, or socio-economic impacts. This approach, moving away from treating all influences as equally important, will provide a more nuanced and effective strategy for dynamic conservation and sustainable management of watersheds. Finally, although this study focused on phosphorus and nitrogen as the main eutrophication drivers, we recognize the importance of expanding our investigation to include a broader spectrum of pollutants, such as metals, E. coli, and various minerals. 
Future efforts will aim to include additional major nutrients and contaminants, thereby deepening our comprehension of environmental issues and enhancing strategies for ecosystem management and restoration in the region.

Fig. 5: Geographical location and regional characteristics of Chaohu Lake and the Hangbu River.

a Geographic location; b location map of monitoring stations of Chaohu Lake in China; c digital elevation model and d soil distribution map of the Hangbu River.

Methods

Accessibility and processing of data

The spatial heterogeneity of land use types (obtained from https://zenodo.org) in the watershed was substantial from 1985 to 2020 (Supplementary Table 8 and Fig. 1)57. The land use in the study area was dominated by agricultural land (~62–69%), which mainly consisted of dry land and paddy fields. Large-scale agricultural development started in the 1980s in the Hangbu River watershed. The progressive land use change over the past decades indicated that agricultural development culminated in 2000 and then became stable. Simultaneously, construction land was developed in the Hangbu River watershed, and its proportion in the whole watershed changed from 1% to 4%, with a corresponding decrease in natural lands (grassland and wetland). The digital elevation map (obtained from http://www.ngcc.cn/ngcc/) and soil distribution of the Hangbu River are displayed in Fig. 5c and d, respectively. The time series data, covering the period from 1985 to 2020, were derived from yearbook data and field investigations (Supplementary Tables 9–11). The rural domestic pollution, intensive livestock and poultry breeding industry pollution, dispersed livestock and poultry breeding industry pollution, and aquaculture pollution were calculated based on township yearbook data, with the townships serving as the administrative units for data collection and analysis (Supplementary Table 9). This study used a method for processing long-term statistical series data based on administrative units and auxiliary information. The auxiliary information comprises additional data sources, including geographical data, land use patterns, population statistics, and other relevant environmental indicators obtained from various government reports, local surveys, and academic studies. The method involved consolidating administrative units, matching statistical data with a standard administrative unit, and integrating indicator data from multiple sources. The fused data were also processed to ensure accuracy and reliability (Supplementary Section 1).

Construction of a long-term and dynamic sources inventory

Supplementary Fig. 2 shows how each model and statistical test is integrated. The long-term and dynamic sources inventory was constructed by integrating the following models:

  • Dynamic export coefficient model, which provides accurate pollutant emission data.

  • Soil and Water Assessment Tool (SWAT) model and Markov algorithm, which offer insights into the process of pollutant transport in river channels and allow obtaining distributed and precise contributions of pollution sources.

The export coefficient method, proposed by Johnes17, calculates pollutant emissions from different sources (Supplementary Section 2). The dynamic export coefficient model (DECM) was used to obtain more accurate watershed pollutant emission sources. A detailed explanation of the DECM is provided in our previous studies58,59. The formula for the DECM is as follows:

$${U}_{i}=\mathop{\sum }\limits_{q=1}^{n}D\left({{pcp}}_{i}\right)\left[{B}_{q}\left({J}_{i}\right)\right]$$
(1)

where \({U}_{i}\) is the emission of the i-th type of source, kg; \(D({{pcp}}_{i})\) is the dynamic export coefficient of the i-th type of source, which depends on the rainfall amount; pcp is the annual rainfall in a sub-watershed; \({B}_{q}\) is the area of the q-th type of calculation unit, considering land use, soil type, slope, population, livestock, and fertilizer and pesticide use, km2; and \({J}_{i}\) is the input of the i-th type of source, considering the nutrient input from rainfall, fertilizer, and pesticide use, as derived from big data by physics-based models, kg.

In this section, the contributions of pollution sources were obtained by employing the SWAT model, a hydrological model used to simulate water balance and water quality processes in watersheds, and the Markov algorithm. The SWAT model is capable of simulating crop growth, nutrient cycling, and sediment transport while also considering the influence of climate factors, such as rainfall, temperature, and solar radiation, on hydrological processes52. The coefficient of determination (\({R}^{2}\)), the Nash-Sutcliffe efficiency (NSE), and percent bias (PBIAS) were used to assess the performance of the SWAT model (Supplementary Fig. 3, Supplementary Tables 12 and 13, and Supplementary Section 2).
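The three evaluation metrics named above can be sketched as follows. This is a minimal illustration using their commonly cited definitions, not the calibration code used in this study; the function names are hypothetical, and the PBIAS sign convention (positive values indicating underestimation) is an assumption, since the text does not state one.

```python
def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / spread


def percent_bias(obs, sim):
    """Percent bias; under the assumed convention, positive means underestimation."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)


def r_squared(obs, sim):
    """Coefficient of determination as the squared Pearson correlation."""
    n = len(obs)
    mean_o, mean_s = sum(obs) / n, sum(sim) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_s = sum((s - mean_s) ** 2 for s in sim)
    return cov ** 2 / (var_o * var_s)


# Hypothetical observed and simulated loads, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 3.0, 4.0, 5.0]
print(nash_sutcliffe(observed, simulated),
      percent_bias(observed, simulated),
      r_squared(observed, simulated))
```

A constant offset, as in this example, leaves \({R}^{2}\) at 1 while lowering NSE and shifting PBIAS, which is why the three metrics are typically reported together.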

This study used the Markov algorithm to define upstream and downstream relationships (Supplementary Fig. 4). Pollutants were generated upstream and moved to downstream regions. Ultimately, they reached the watershed outlet after a finite number of transitions. Our previous research and Supplementary Section 3 provide a detailed explanation of the method of incorporating spatial location information, which involved grid-scale and source-transfer-sink simulations based on the Markov algorithm60.
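The upstream-to-downstream transfer described above can be illustrated with a small absorbing-chain sketch. It is not the grid-scale implementation of Supplementary Section 3: the four-unit network, the loads, and the per-transition delivery fractions are all hypothetical, and treating in-channel losses as a fixed fraction per transition is an assumption made only for illustration.

```python
def route_to_outlet(loads, transition, outlet):
    """Move loads along the network until everything has reached the outlet.

    loads[i]         load generated in unit i (kg)
    transition[i][j] fraction of the load in unit i that moves to unit j
                     in one transition; the outlet keeps its own load
    """
    n = len(loads)
    state = list(loads)
    for _ in range(n):  # an acyclic network needs at most n - 1 transitions
        if all(abs(v) < 1e-12 for k, v in enumerate(state) if k != outlet):
            break
        state = [sum(state[i] * transition[i][j] for i in range(n))
                 for j in range(n)]
    return state[outlet]


# Hypothetical network: units 0 and 1 drain to unit 2, which drains to outlet 3.
T = [
    [0.0, 0.0, 0.9, 0.0],  # 90% of the load from unit 0 reaches unit 2
    [0.0, 0.0, 0.8, 0.0],  # 80% of the load from unit 1 reaches unit 2
    [0.0, 0.0, 0.0, 1.0],  # unit 2 passes everything to the outlet
    [0.0, 0.0, 0.0, 1.0],  # the outlet is absorbing
]
total_at_outlet = route_to_outlet([100.0, 50.0, 20.0, 0.0], T, outlet=3)
from_unit_0 = route_to_outlet([100.0, 0.0, 0.0, 0.0], T, outlet=3)
print(total_at_outlet, from_unit_0)
```

Routing one unit's load at a time, as in the second call, gives that unit's contribution at the outlet.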

The response of pollution loads entering specific water bodies was as follows:

$$\Delta W=\Delta \left(\mathop{\sum }\limits_{i=1}^{l}{P}_{j}^{i}\right)$$
(2)

where, \(W\) is the load of the i-th type of source on the l-th reach entering the specific water bodies, kg; \({P}_{j}^{i}\) is the i-th type of source load to the outlets from the j-th sub-watershed; and l is the number of reaches where the pollutant was transferred to the specific water bodies.

Time series analysis and modeling

Identifying trends is crucial because it helps to understand the long-term changes in different sources and reveals periodicities and other characteristics of pollution processes. Long-term series of source emissions and contributions were obtained from the models described above, and the change patterns of the sources were then identified. Specifically, the long-term emission series were decomposed with the seasonal-trend decomposition using LOESS (STL) model, and the sources were divided into different change patterns. Each component was then modeled to quantify its impact using the Mann-Kendall (M-K) test, the least squares method, a trigonometric function model, and the Autoregressive Integrated Moving Average (ARIMA) model (Supplementary Section 4).
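As an illustration of the trend test mentioned above, the following is a minimal sketch of the Mann-Kendall test in its standard form, with a tie correction and a two-sided normal approximation. The function name and the example series are hypothetical; the settings actually used are described in Supplementary Section 4.

```python
from collections import Counter
from math import erf, sqrt


def mann_kendall(series):
    """Return (S, Z, p) for the Mann-Kendall monotonic trend test."""
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of each pairwise difference
    # Variance of S, corrected for groups of tied values.
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(series).values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0
    if s > 0:
        z = (s - 1) / sqrt(var_s)
    elif s < 0:
        z = (s + 1) / sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value from the standard normal distribution.
    p = 2.0 * (1.0 - 0.5 * (1.0 + erf(abs(z) / sqrt(2.0))))
    return s, z, p


# Hypothetical annual emission series, for illustration only.
s_stat, z_stat, p_value = mann_kendall([1.0, 2.0, 3.0, 4.0, 5.0])
print(s_stat, round(z_stat, 3), round(p_value, 4))
```

A positive Z with a small p-value suggests an increasing trend; a negative Z suggests a decreasing one.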

The STL model uses robust local-weighted regression as a smoothing method to estimate the value of a response variable23. The dynamic source emission series can be expressed as the sum of three components: trend, seasonal, and random factors. The original time series can be decomposed as follows:

$$Y={Y}_{{trend}}+{Y}_{{seasonal}}+{Y}_{{random}}$$
(3)

where, \(Y\) is the long-term series of source emissions, \({Y}_{{trend}}\) is the trend component, \({Y}_{{seasonal}}\) is the seasonal component and \({Y}_{{random}}\) is the random component.

The variation characteristics of the time series of each component are different due to different influencing factors. Among them, the trend component is mainly affected by human activity and economic factors and reflects a more extended period of development. The seasonal component is a periodic fluctuation with a fixed length and amplitude influenced by precipitation variation61. In this study, the temporal scale of pollution source emissions is annual, so the seasonal component can also be considered periodic. Various accidental factors, including national policies, influence random components. The individual components are quantified using a decomposition technique, and the quantified values of each component are used to establish response equations with different environmental variables in the Hangbu River. The response relationship models of the variables relevant to emissions from different sources in the long-term series are as follows:

$${Y}_{{trend}}=a+b{X}_{1}+c{X}_{2}+\cdots +n{X}_{n}$$
(4)
$${Y}_{{seasonal}}={y}_{0}+a\sin (\pi f({P}_{p}))$$
(5)
$${Y}_{{\rm{random}}}={\rm{ARIMA}}({\rm{p}},{\rm{d}},{\rm{q}})$$
(6)

where, \({P}_{p}\) is the periodicity trend of the precipitation data.

A trigonometric function was constructed to quantify the periodicity trend of emissions over long time series. Then, the non-stationary precipitation data were decomposed into three components: a trend term, periodicity trend, and random trend62. Similarly, a trigonometric function was constructed for the periodicity trend of precipitation data over long time series. Finally, a response relationship model was established between the periodicity trend of emissions and precipitation data.

$${Y}_{{seasonal}}={y}_{0}+a\sin (\pi (t-b)/w)$$
(7)
$${P}_{p}={z}_{0}+a\sin (\pi (t-b)/w)$$
(8)
$${Y}_{{seasonal}}=f({P}_{p})$$
(9)

where, \(t\) represents the year between 1985 and 2020.
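The trigonometric form of Eqs. (7) and (8) can be fitted by ordinary least squares once w is fixed, because \(a\sin (\pi (t-b)/w)\) expands into a linear combination of \(\sin (\pi t/w)\) and \(\cos (\pi t/w)\). The sketch below illustrates this under the assumption that w is known (for example, from scanning candidate values); the fitting procedure actually used is not stated in this section, and the function names and synthetic series are hypothetical.

```python
from math import atan2, cos, hypot, pi, sin


def solve_linear(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_sine(t, y, w):
    """Fit y = y0 + a*sin(pi*(t - b)/w) for a fixed w; the period is 2*w."""
    basis = [(1.0, sin(pi * ti / w), cos(pi * ti / w)) for ti in t]
    normal = [[sum(bi[r] * bi[c] for bi in basis) for c in range(3)]
              for r in range(3)]
    moment = [sum(bi[r] * yi for bi, yi in zip(basis, y)) for r in range(3)]
    y0, coef_sin, coef_cos = solve_linear(normal, moment)
    a = hypot(coef_sin, coef_cos)             # amplitude
    b = atan2(-coef_cos, coef_sin) * w / pi   # phase shift, defined modulo 2*w
    return y0, a, b


# Synthetic series over 1985-2020 with known parameters, for illustration only.
years = list(range(1985, 2021))
series = [10.0 + 2.0 * sin(pi * (yr - 3.0) / 6.0) for yr in years]
print(fit_sine(years, series, w=6.0))
```

Applying the same fit to the emission series and to the precipitation series gives the two periodic terms that the response relationship in Eq. (9) then links.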

This study used the ARIMA model to quantify the random term of pollution emissions. An ARIMA model has three parameters, p, d, and q, which represent the order of the autoregressive component (p), the number of differencing steps applied to obtain a stationary time series (d), and the order of the moving-average component (q), respectively. When specific parameters are set to 0, ARIMA reduces to simpler forms, such as the autoregressive (AR) and moving average (MA) models63. The AR and MA formulas are shown in Eq. (10) and Eq. (11), respectively.

$${Y}_{t}={e}_{t}+{\varphi }_{0}+{\varphi }_{1}{Y}_{t-1}+{\varphi }_{2}{Y}_{t-2}+\cdots +{\varphi }_{p}{Y}_{t-p}$$
(10)

where, \(\{{Y}_{t}, t = 0, \pm 1, \pm 2, \ldots \}\) is a time series; \(\{{e}_{t}, t = 0, \pm 1, \pm 2, \ldots \}\) is a white noise time series; and \(E\left({Y}_{s}{e}_{t}\right) = 0\) for any \(s < t\).

$${Y}_{t}={e}_{t}-{\theta }_{1}{e}_{t-1}-{\theta }_{2}{e}_{t-2}-\cdots -{\theta }_{q}{e}_{t-q}$$
(11)

where, \({Y}_{t}\) is the dependent variable at time t; \({e}_{t}\) is white noise with variance \({\sigma }^{2}\); and \(\{{\theta }_{q}, q = 0, \pm 1, \pm 2, \ldots \} \in [0, 1]\).
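The AR and MA expressions above can be evaluated directly once the coefficients are known. The sketch below only restates the two formulas as functions; it does not estimate p, d, q or the coefficients, and the function names and numbers are hypothetical.

```python
def ar_value(phi0, phis, past_values, noise_t=0.0):
    """AR(p): Y_t = e_t + phi_0 + phi_1*Y_{t-1} + ... + phi_p*Y_{t-p}.

    past_values[0] is Y_{t-1}, past_values[1] is Y_{t-2}, and so on.
    """
    return noise_t + phi0 + sum(phi * y for phi, y in zip(phis, past_values))


def ma_value(thetas, noise_t, past_noise):
    """MA(q): Y_t = e_t - theta_1*e_{t-1} - ... - theta_q*e_{t-q}.

    past_noise[0] is e_{t-1}, past_noise[1] is e_{t-2}, and so on.
    """
    return noise_t - sum(theta * e for theta, e in zip(thetas, past_noise))


# Hypothetical coefficients and values, for illustration only.
print(ar_value(1.0, [0.5, 0.25], [4.0, 8.0]))  # AR(2) with the noise term set to 0
print(ma_value([0.5], 1.0, [2.0]))             # MA(1)
```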

Dynamic identification of key sources

Identifying key sources in dynamic apportionment can be achieved by using a multi-objective algorithm, an effective method for obtaining optimal allocation results (Supplementary Section 5). This study established a multi-objective function for dynamic source apportionment in watersheds with the objectives of maximizing contributions, maximizing growth trends, and achieving high robustness. Subsequently, the solutions were ranked by solving the multi-objective ranking problem.

Objective function 1: Maximum contribution

$${\rm{Load}}=\max \frac{\mathop{\sum }\limits_{n=1}^{N}\mathop{\sum }\limits_{i=1}^{l}{P}_{j}^{i}}{N}$$
(12)

Objective function 2: Maximum trend of change

$${\rm{Load}}=\max k$$
(13)
$$k=\frac{n\sum {xy}-\sum x\sum y}{n\sum \left({x}^{2}\right)-{\left(\sum x\right)}^{2}}$$
(14)

where, n represents the number of samples and x and y represent the independent and dependent variables of the samples, respectively. Σ represents the summation symbol, \(\sum {xy}\) represents the sum of the products of x and y, \(\sum x\) represents the sum of x, \(\sum y\) represents the sum of y and \(\sum \left({x}^{2}\right)\) represents the sum of the squares of x.
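The slope k is the ordinary least-squares slope of y on x and can be computed directly from the sums defined above. The following is a minimal sketch; the function name and the sample values are hypothetical.

```python
def least_squares_slope(x, y):
    """Slope k = (n*sum(xy) - sum(x)*sum(y)) / (n*sum(x^2) - sum(x)^2)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    return (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)


# Hypothetical yearly index (x) and source contribution (y), for illustration only.
print(least_squares_slope([0, 1, 2, 3], [1.0, 3.0, 5.0, 7.0]))
```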

Objective function 3: Maximum robustness

$${\rm{Load}}=\max \sqrt{\frac{1}{M}\mathop{\sum }\limits_{i=1}^{M}{\left(\mathop{\sum }\limits_{n=1}^{N}\mathop{\sum }\limits_{i=1}^{l}{P}_{j}^{i}-\mu \right)}^{2}}$$
(15)
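One common way to rank candidates under several maximized objectives is non-dominated (Pareto) sorting. The sketch below illustrates that general idea only; the ranking procedure actually used is described in Supplementary Section 5 and may differ, and the source names and objective values are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and better somewhere."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_ranks(objectives):
    """Assign rank 1 to the non-dominated set, rank 2 to the next set, etc.

    objectives maps a source name to a tuple of values that are all
    to be maximized (here: contribution, trend, robustness).
    """
    remaining = dict(objectives)
    ranks, rank = {}, 1
    while remaining:
        front = [name for name, value in remaining.items()
                 if not any(dominates(other, value)
                            for key, other in remaining.items() if key != name)]
        for name in front:
            ranks[name] = rank
            del remaining[name]
        rank += 1
    return ranks


# Hypothetical sources with (contribution, trend, robustness) scores.
scores = {
    "source_A": (3.0, 3.0, 3.0),
    "source_B": (2.0, 2.0, 2.0),
    "source_C": (3.0, 1.0, 4.0),
    "source_D": (1.0, 1.0, 1.0),
}
print(pareto_ranks(scores))
```

Sources sharing rank 1 cannot be improved on one objective without losing on another, so a further preference rule is needed to choose among them.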

To meet the diverse management needs of decision-makers, this study employed the k-means clustering algorithm to investigate the generation of different preference scenarios. The specific descriptions of the different strategies are as follows:

  • Recent strategy: This strategy aims to maximize the short-term benefits of pollution prevention and control without considering the long-term advantages.

  • Forward strategy: This strategy focuses on pollution control and trend monitoring. The goal is to allocate limited resources to more cost-effective sources and prevent their further growth.

  • Maintenance strategy: This strategy aims to sustain the current contribution of the pollution source and the existing level of investment in pollution control measures.