Introduction

Rivers are important conduits for the migration and transport of matter between land and lakes or oceans, and they provide abundant freshwater resources for drinking water, irrigation, aquaculture, navigation, and power generation1,2. However, river ecosystems are deteriorating widely and are globally threatened by anthropogenic activities and climate change3,4. A global study revealed that nearly 80% (4.8 billion) of the world’s population in 2000 lived in areas with a high incidence of threat (>75%) to human water security5. Moreover, one-third of the world’s population lacks access to safe drinking water6. Given these challenges, it is urgent to diagnose the threats to river water quality across a broad range of temporal and spatial scales, address their underlying causes, and control them at the source to protect river freshwater resources5.

China’s rivers have suffered profound water quality impairment, driven by the pressure of economic development on the environment, since China’s Reform and Opening-up in 19787. Water pollution has been confirmed as a major cause of water shortage in China, amounting to 40 billion cubic meters per year8. Elevated inputs of anthropogenic nutrients are a critical cause of declining water quality in Chinese rivers. Multi-scale model estimates indicate that the total dissolved nitrogen (TDN) and total dissolved phosphorus (TDP) inputs to rivers in China in 2012 were 28 Tg and 3 Tg, respectively9. Excess nutrients are further transported from rivers to lakes and the ocean, causing frequent algal blooms and red tides that endanger human and aquatic health and ecosystem services10. Fortunately, inland water quality across China improved markedly or remained at favorable levels nationwide from 2003 to 2017, which has been attributed to reductions in nutrient discharge11,12. In 2022, a national survey of 3641 sampling sites in rivers, lakes, and reservoirs across China showed that 12.1% of sites had water quality worse than Class III under the China Surface Water Environmental Quality Standard (GB3838-2002), and 0.7% were worse than Class V13. In the four decades since the Reform and Opening-up, China has had to balance economic development and environmental protection; identifying the water quality patterns and underlying mechanisms in China’s rivers is therefore essential to inform river water quality protection in developing countries7.

Several studies have examined water quality patterns in China’s rivers and their drivers, for example by quantifying N and P inputs to Chinese rivers from different sources at multiple scales9, tracing nutrient cycling in river systems (sources, transformation, and flux)14,15, and mapping spatial water quality patterns and critical covariates of river impairment12,16,17,18. However, gaps remain in understanding the spatiotemporal variation of water quality in China’s rivers over the past four decades and its underlying mechanisms. First, the lack of long-term, regularly sampled nationwide monitoring data is the major bottleneck for studying impacts on river water quality, because traceable and available monitoring data extend back only to 200312,19. Second, identifying the mechanisms driving river water quality variation is constrained by the temporal and spatial resolution of factorial models and explanatory variables (including natural geographic characteristics, socioeconomic indicators, land use data, and meteorological factors)12,17. Finally, it remains challenging to bridge scientific research and management practice, and to apply the understanding of historical river water quality variation and its drivers to future water quality management and the achievement of the Sustainable Development Goals (SDGs)7,10.

This study assembled a 16-year (2003–2018) monthly dataset from 613 riverine water quality monitoring sites, together with watershed characteristics (e.g., longitude, latitude, land-use patterns, net anthropogenic N/P inputs, and soil properties) and climate conditions at the national scale, to build a set of stacking machine-learning models. Stacking integrates the results of different base models, which can reduce variance and improve the stability of the final model20. Three base models, random forest (RF), support vector machine (SVM), and k-nearest neighbors (KNN), were selected for their popularity and performance in previous studies19,20. The stacking model was used to simulate and predict the annual and monthly variations in river water quality from 1980 to 2018 (Fig. 1). We then used two future scenarios (SSP2-RCP4.5 and SSP5-RCP8.5) to predict decadal trends in water quality for 2020–2050. Multiple linear regression (MLR) models and correlation analyses were employed to quantify the relative contributions of anthropogenic, climatic, and geographical factors to changes in riverine TN, NH3-N, TP, and CODMn. Based on the relations between the SDGs and water quality, we propose sustainable water quality management policies to achieve a better aquatic environment for China’s rivers and those of other developing countries.

Fig. 1: Framework used for simulating nutrient concentrations in Chinese rivers.

Details of the model stacking process, including data processing, the stacking model, ten-fold cross-validation, and model application. Monitoring data of TN, TP, NH3-N, and CODMn concentrations from 2003 to 2018 are used to simulate monthly data for 1980–2018 and interdecadal data for 2020–2050 under two future scenarios (SSP2-RCP4.5 and SSP5-RCP8.5).

Results and discussion

Spatiotemporal trends in TN, TP, NH3-N, and CODMn concentrations

Comparison between measured and predicted values in the 10 large river basins showed that our machine-learning models generally reproduced TN, TP, NH3-N, and CODMn concentrations (p < 0.01) with low predictive bias, as assessed by R2, root mean square error (RMSE), Nash-Sutcliffe efficiency (NSE), and mean absolute error (MAE) (Supplementary Fig. 1, Supplementary Table 1). Accuracy statistics from 10-fold cross-validation on both test and validation datasets also indicated good fit and predictive power of the stacking model for TN, TP, NH3-N, and CODMn concentrations (Supplementary Tables 2–5). Although this data-driven stacking model has limited interpretability and limited capacity for mechanistic deduction, the ensemble approach showed promising robustness and stability19,20.

The concentration trends of TN, TP, NH3-N, and CODMn differed across the country from 1980 to 2018 (Fig. 2a–d). The proportions of sampling sites with simulated TN concentrations below 1.5 mg L−1 in 1980, 1990, 2000, 2010, 2015, and 2018 were 19.83%, 20.25%, 19.60%, 18.65%, 17.41%, and 18.12%, respectively, suggesting that TN pollution increased over the study period. Although the China Surface Water Environmental Quality Standard specifies a TN threshold (1 mg L−1 for Class III), TN is not currently included in the assessment system used in water environment management21. NH3-N concentrations increased from 1980 to 2010 and then decreased from 2010 to 2018, whereas CODMn increased from 1980 to 2000 and then decreased from 2000 to 2018 (Fig. 2). This result is broadly consistent with a previous study, which found that NH3-N and COD concentrations decreased overall from 2003 to 2017 based on monthly monitoring data from inland water bodies (rivers and lakes) in China12. TP concentrations increased from 1980 to 2015 and then decreased from 2015 to 2018 (Fig. 2). In 2000, the Chinese government proposed the Total Amount of Pollutants Control Plan, which designated COD as one of the control indices for 12 major pollutants and has achieved remarkable results7,22. The Law of the People’s Republic of China on the Prevention and Control of Water Pollution, revised in 2008, strictly tightened the regulation of water environmental protection23. The construction of an ecological civilization was then elevated to a national strategy in 2012, and discharge reduction targets for TP, COD, and NH3-N were set in the 13th Five-Year Comprehensive Work Plan (2016–2020)24. However, our projections indicate that nutrient concentrations are not expected to decrease under future human activities and climate change (Supplementary Fig. 2). Continued implementation of these policies and action plans is therefore still needed to gradually decouple economic growth from its environmental impacts7,10.

Fig. 2: Nutrient concentrations in Chinese rivers between 1980 and 2018.

Panels a–d display the cumulative proportions of the simulated annual average concentrations of TN, TP, NH3-N, and CODMn in China’s rivers. e–h Average nutrient concentrations in 613 rivers of China from 1980 to 2018 (mg L−1). The size of the blue circles represents the average nutrient concentration of each of the 10 major basins during 1980–2018.

Owing to differences in geographical conditions among watersheds, changes in nutrient concentrations during 1980–2018 differed between basins (Supplementary Figs. 3–6). Average TN concentrations increased in all basins except the Huaihe River (n = 48, p < 0.05, r = −0.21), with statistically significant upward trends in the Southeast River (n = 41, p < 0.05, r = 0.86) and the Pearl River (n = 66, p < 0.01, r = 0.97) (Supplementary Fig. 3). Average TP concentrations decreased in most basins of China, most notably in the Songhua River (n = 45, p < 0.05, r = −0.75) and the Yellow River (n = 67, p < 0.01, r = −0.76) (Supplementary Fig. 4). In contrast, NH3-N and CODMn concentrations changed in distinct phases in some basins: they increased early in the study period but have decreased in most regions of China in recent years (Supplementary Figs. 5, 6).
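The basin-level trend statistics above pair a correlation coefficient with a significance level. A minimal sketch of one way to obtain such numbers, assuming r is the Pearson correlation between year and the basin's annual mean concentration (the source does not state exactly how the series behind each n was formed), uses SciPy's `scipy.stats.pearsonr`; the function name `annual_trend` and the synthetic data are illustrative only.

```python
import numpy as np
from scipy.stats import pearsonr


def annual_trend(years, concentrations):
    """Pearson r and two-sided p-value between year and a basin's annual mean concentration.

    Assumption: `concentrations` holds one basin-averaged value per year (mg L-1),
    aligned with `years`; missing years should be dropped beforehand.
    """
    years = np.asarray(years, dtype=float)
    conc = np.asarray(concentrations, dtype=float)
    r, p = pearsonr(years, conc)
    return float(r), float(p)


# Synthetic series for a hypothetical basin: upward TN trend plus noise.
rng = np.random.default_rng(0)
years = np.arange(1980, 2019)
tn = 1.2 + 0.03 * (years - 1980) + rng.normal(0.0, 0.05, years.size)
r, p = annual_trend(years, tn)
print(f"r = {r:.2f}, p = {p:.1e}")
```

A positive r with a small p indicates an increasing trend; a negative r indicates a decreasing one.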

Our results show that TN concentrations varied significantly in space between 1980 and 2018, with poorer water quality in eastern China (p < 0.01; Fig. 2e). TP, NH3-N, and CODMn concentrations did not vary significantly between basins (Fig. 2f–h). The proportion of sampling sites with TN concentrations greater than 1.5 mg L−1 was 62.3%, suggesting that TN is a relatively serious pollution problem in most regions of China under current water quality standards. The proportions of simulated observations with TP greater than 0.2 mg L−1, NH3-N greater than 1.0 mg L−1, and CODMn greater than 6.0 mg L−1 were 13.3%, 16.3%, and 13.7%, respectively. Compared with the historical distribution of nutrients, the proportion of high concentrations increases in 2020–2050 (Supplementary Fig. 7). Under the SSP2-RCP4.5 scenario, the proportions of simulated observations exceeding these thresholds (TN > 1.5 mg L−1, TP > 0.2 mg L−1, NH3-N > 1.0 mg L−1, and CODMn > 6.0 mg L−1) are 75.8%, 18.2%, 40.2%, and 17.6%, respectively; under the SSP5-RCP8.5 scenario, they are 75.2%, 19.6%, 43.2%, and 17.5%, respectively. These results indicate that human activities and climate change will significantly influence riverine nutrient concentrations, especially in the Yellow, Huaihe, and Haihe rivers (p < 0.01).
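The exceedance percentages above reduce to counting simulated values that lie above a fixed threshold. The sketch below, a hypothetical helper rather than the study's code, computes these shares for the thresholds quoted in the text (TN 1.5, TP 0.2, NH3-N 1.0, and CODMn 6.0 mg L−1); treating the threshold as a strict inequality and ignoring missing values are our assumptions.

```python
import numpy as np

# Thresholds (mg L-1) quoted in the text; the keys are our own labels.
THRESHOLDS = {"TN": 1.5, "TP": 0.2, "NH3-N": 1.0, "CODMn": 6.0}


def exceedance_share(values, threshold):
    """Percentage of finite values strictly greater than `threshold`."""
    v = np.asarray(values, dtype=float)
    v = v[np.isfinite(v)]  # drop missing (NaN) simulations
    if v.size == 0:
        return float("nan")
    return 100.0 * np.count_nonzero(v > threshold) / v.size


def exceedance_table(simulated):
    """Map each parameter name to its exceedance percentage."""
    return {name: exceedance_share(vals, THRESHOLDS[name])
            for name, vals in simulated.items()}


# Toy input: a few simulated site-year values (not the study's data).
toy = {"TN": [0.8, 1.6, 2.1, np.nan], "TP": [0.10, 0.25, 0.05, 0.30]}
print(exceedance_table(toy))
```

The share of sites below a threshold, as reported for TN in 1980–2018, is the complement of this value when no simulated value equals the threshold exactly.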

Attribution of riverine water quality variability

The covariates related to nutrient concentrations were divided into three categories: anthropogenic, climatic, and geographical factors. Their contribution ratios were quantified by fitting the MLR model separately to the data from each of the 10 river basins (Fig. 3a–d). Anthropogenic predictors contributed more (24.93–71.29% for TN, 22.43–77.10% for TP, and 52.37–91.06% for NH3-N) than climatic drivers (21.51–58.83% for TN, 17.73–73.86% for TP, and 5.16–37.79% for NH3-N) and geographical drivers (4.25–16.23% for TN, 3.70–25.61% for TP, and 6.90–18.21% for NH3-N) (Fig. 3a–d). The dominance of anthropogenic factors for TN, TP, and NH3-N may arise because riverine nitrogen and phosphorus derive mainly from anthropogenic sources and are therefore highly correlated with human activities25,26,27. A previous study found that point sources accounted for 75% of the TDP input and agricultural non-point sources accounted for 72% of the TDN input to Chinese rivers in 20129. By contrast, the contributions of anthropogenic, climatic, and geographical factors to CODMn differed between river basins. In the Northwest Inland River, geographical drivers contributed more (41.60%) than anthropogenic predictors (32.83%) and climatic drivers (25.55%). In the Songhua, Yellow, Huaihe, Southwest, and Southeast rivers, climatic drivers contributed more to CODMn concentrations than geographical or anthropogenic factors. This may be because COD is more closely related to natural sources, including endogenous sources (degradation of algae and aquatic plants and release from sediments) and exogenous sources (atmospheric deposition and inputs of terrestrial vegetation and soil organic matter)28,29. Climate change-induced warming, hydrological intensification, and extreme weather affect the timing and magnitude of dissolved organic matter delivery from terrestrial ecosystems to surface water2,29.

Fig. 3: Evaluation of the percentage contribution and effect of anthropogenic, climatic and geographical factors to nutrient variability in ten major basins between 1980 and 2018.

a–d Percentage contributions of anthropogenic, climatic, and geographical factors estimated using multiple linear regression. e–h Regression coefficients for each predictor.

Moreover, the regression coefficients of TN, TP, and NH3-N concentrations on anthropogenic drivers were higher than those on climatic and geographical drivers (Fig. 3e–h). Among the selected geographical drivers, elevation, slope, soil total nitrogen (STN), soil total phosphorus (STP), and soil organic matter (SOM) contributed discernibly to nutrient variability in most river basins. Precipitation factors (Pre and PRCPTOT) were strong predictors of TN, TP, and CODMn concentrations, but not of NH3-N, across the 10 studied basins. In contrast, air temperature (Atmr10) was a weak predictor of nutrient concentrations across the 10 basins (Supplementary Table 6).

Among the selected anthropogenic drivers, our analysis shows that the percentages of farmland, forestland, grassland, and urban area, together with population and anthropogenic N and P inputs, were critical covariates of riverine nutrient levels (Fig. 3e–h). The extent of urban area and farmland within a watershed showed a consistently positive relationship with nutrient concentrations, whereas forestland and grassland showed a negative relationship. Population had a distinct signature on nutrient levels in the Songhua (NH3-N, TN, TP, and CODMn), Huaihe (NH3-N and CODMn), Southwest (TP), and Northwest Inland (NH3-N and TP) rivers. A previous study that used nighttime light intensity as a proxy for population concluded that nighttime light intensity had a distinct signature (contribution > 35%) on TP and NH3-N levels in the Yellow and Pearl River Basins17, which differs from our findings. Anthropogenic N and P inputs made larger contributions to the variability of both nutrients in the Songhua, Haihe, Huaihe, Yangtze, Southwest, and Northwest Inland rivers, where their regression coefficients exceeded 1 (Fig. 3e–h). These results suggest that, with socioeconomic growth, rivers in the western and inland regions of China, besides those in eastern China, have experienced severe water quality impairment, which warrants more attention in the future.

Analysis of natural and anthropogenic drivers

Characterizing the signature of anthropogenic activities required an assessment of the role of natural factors, such as meteorological conditions and geographical characteristics, in shaping riverine water quality. Across all 613 sub-watersheds in China, our analysis revealed a weak negative relationship between elevation and/or slope and nutrient concentrations (Fig. 3e–h), suggesting that sites at higher elevations and/or with steeper slopes had lower nutrient levels. In general, anthropogenic activity intensifies as altitude decreases, and plains and lowland areas are prone to intensive agricultural cultivation, livestock and poultry farming, urban development, and population aggregation30. Watershed slope determines the speed of water flow and the severity of soil erosion. Compared with lowland rivers, steeper mountainous rivers are expected to have faster flow and more severe erosion, resulting in shorter water retention times and weaker self-purification capacity2. Our results support the negative correlation between altitude and nutrient concentration in China’s rivers, and suggest that the expected positive effect of slope on nutrient concentration was offset by human activities such as impoundment, which regulates hydraulic and nutrient retention17,31.

Long-term meteorological forcing may affect the timing and magnitude of exogenous nutrient inputs and the factors that promote internal nutrient migration and transformation in rivers32. Rising air temperature will influence riverine thermal regimes and the physical and chemical properties of water (e.g., pH, salinity, solubility, viscosity, and diffusion rates), in turn affecting biochemical processes such as nitrification, denitrification, sediment mineralization, and nutrient re-release14,33,34. Precipitation, including extreme precipitation, is also a dominant factor affecting hydrological regimes, including hydraulic characteristics, water level, flow rate, inundation patterns, and water cycles3,35. Moreover, changes in the amount, frequency, and intensity of precipitation mobilize nutrients on land through surface and subsurface pathways, delivering non-point source pollution, and raise sediment concentrations through erosion and resuspension36,37. A previous study suggested that precipitation dominated the interannual variability of riverine N loading across the continental United States during 1987–20073. Our results show that the relationships between riverine nutrient concentrations and mean and extreme meteorological factors vary by geographical region and water quality indicator (Fig. 3), possibly because of covariance between meteorological factors and other natural and human variables14,33.

Compared with the relatively minor effects of climatic and geographical factors, population, NANI/NAPI, and the percentages of specific land-use types were stronger predictors of riverine nutrient levels (except CODMn) and collectively explained most of the nutrient variation (Fig. 3). Continuous urbanization and intensive agricultural development have been confirmed to have a profound impact on nutrient inputs from land to rivers25,26. Notably, the screened predictor variables explained on average 52.92%, 35.74%, 72.15%, and 31.97% of the variation in CODMn, NH3-N, TN, and TP concentrations, respectively, across the 10 river basins (Supplementary Table 7). Future studies should therefore account for pollution control measures, such as the proportion of land served by drainage systems and the capacity of sewage treatment plants, and for the construction of water conservancy facilities, given their impacts on river nutrient input and migration, although such data are difficult to obtain at high resolution and accuracy for China since 198017.

Management implications for future environmental efforts

Our analysis suggests that anthropogenic activities and natural factors have a significant impact on riverine nutrient levels. In addition, changes in water environmental management policies have played an important role in improving water quality. As discharge control standards gradually gained attention, water environmental management policy shifted to targeting COD discharge in 2000. Accordingly, CODMn concentrations have decreased since then, especially in the Yellow River and the Huaihe River (Supplementary Fig. 8). The Action Plan for Prevention and Control of Water Pollution in China was formulated in 2015 to strengthen the prevention and control of water pollution, requiring all water functional areas to meet their water quality requirements. During 2007–2017, the N and P loads exported from agriculture decreased significantly, from 1.598 × 10⁹ to 7.195 × 10⁸ kg and from 1.087 × 10⁸ to 7.62 × 10⁷ kg, respectively38. TP and NH3-N concentrations have also decreased with the change in water policy in recent years (Fig. 4, Supplementary Fig. 9). However, TN has not been included in China’s surface water quality control targets, and TN concentrations have therefore not decreased significantly (Supplementary Figs. 3 and 10), which may harm aquatic ecosystems. Targeted mitigation measures are therefore needed to manage N and restore water quality in China21.
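For scale, the agricultural load reductions quoted above correspond to the following relative decreases over 2007–2017 (our arithmetic on the reported values, not a figure stated in ref. 38):

$$\frac{1.598\times {10}^{9}-7.195\times {10}^{8}}{1.598\times {10}^{9}}\approx 55.0 \%\;(\mathrm{N}),\qquad \frac{1.087\times {10}^{8}-7.62\times {10}^{7}}{1.087\times {10}^{8}}\approx 29.9 \%\;(\mathrm{P})$$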

Fig. 4: Temporal trends in TP concentrations with a change in water environmental management policies in several typical watersheds between 1980 and 2018.

I represents the standard discharge control used between 1980 and 2005; II represents the target total amount control approach used between 2005 and 2015; and III represents the water environmental quality improvement between 2015 and 2018. a Songhua River; b Yellow River; c Huaihe River; and d Yangtze River.

Although the recent decreases in TP, NH3-N, and CODMn concentrations in most rivers indicate that China’s nutrient control measures have been effective, reaching good ecological status will require considerable time (Supplementary Fig. 2). China’s water environmental management system currently faces several problems, such as uniform water environmental quality standards for all regions, the separation of water quantity management from water quality management, and poor coordination between management organizations. Our findings indicate that China now needs more flexible regional water strategies to cope with differing regional trends and sources of nutrient loading to freshwaters. The ongoing revision of China’s Water Pollution Prevention and Control Law should introduce significant changes to the current water governance structure and allow flexibility between regions, to further reduce pollutant loadings and nutrient concentrations in rivers.

A sustainable pathway is essential to reduce riverine TN, TP, NH3-N, and CODMn concentrations in the near future. All 17 SDGs have targets relevant to water quality in China’s rivers10. Two SDGs, SDG 6 “Clean Water and Sanitation” and SDG 14 “Life Below Water”, are particularly relevant to water quality. For example, reducing nutrient pollution in shallow groundwater and surface water resources could help achieve universal and equitable access to safe and affordable drinking water by 2030 (SDG 6.1)8. Substantially increasing water-use efficiency across all sectors and ensuring sustainable withdrawals and supply of freshwater to address water scarcity by 2030 (SDG 6.4) could reduce nutrient pollution in water systems, for instance by improving agricultural water-use efficiency and upgrading domestic wastewater pipe networks to reduce nutrient leaching and runoff. Reducing river export of nutrients could help meet the target of preventing and significantly reducing marine pollution of all kinds, in particular from land-based activities and including marine debris and nutrient pollution, by 2025 (SDG 14.1)10. In line with the SDGs, water resources, the water environment, aquatic ecology, and water risk should be considered together to achieve good ecological status in China’s rivers. Future policies should pay special attention to pollution discharge, sewage systems, and climate change, and assess their economic, societal, institutional, and technical feasibility to ensure effective pollution control.

Methods

Machine-learning models for simulating concentrations of water quality parameters

The collected variables were screened using the Maximal Information Coefficient (MIC) and Spearman correlation analysis in three steps: (1) retention of predictors with MIC > 0.25; (2) deletion of collinear predictor variables (Spearman R > 0.8); and (3) retention of predictors highly correlated with the response variable (Spearman R > 0.4). The general modeling process is depicted in Fig. 1. We used model stacking, which produces a composite prediction from the results of multiple base models (RF, SVM, and KNN)20. Stacking uses a two-layer learning framework in which the outputs of individual base models are fed into another model to generate the final predictions39,40. The learning process of the stacking model comprises three steps: stacking generation, stacking pruning, and stacking integration. Stacking generation produces the base models, whereas the last two steps combine the base model predictions optimally into a final set of predictions using a second-level algorithm.
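The three screening steps can be sketched as follows. This is an illustrative implementation under stated assumptions, not the study's code: MIC is not computed here but accepted as an optional user-supplied function `mic(x, y)`, correlations are compared in absolute value, and when two predictors are collinear the one less correlated with the response is dropped (the source does not specify which one is removed). The function names are hypothetical.

```python
import numpy as np
from scipy.stats import rankdata


def spearman(a, b):
    """Spearman rank correlation as the Pearson correlation of (average) ranks."""
    return float(np.corrcoef(rankdata(a), rankdata(b))[0, 1])


def screen_predictors(X, y, names, mic=None, mic_min=0.25,
                      collinear_max=0.8, response_min=0.4):
    """Three-step screening sketch; returns the names of retained predictors.

    mic: optional callable mic(x, y) -> float. If None, step 1 is skipped.
    Absolute correlations and the tie-breaking rule in step 2 are assumptions.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    candidates = list(range(X.shape[1]))

    # Step 1: keep predictors whose MIC with the response exceeds mic_min.
    if mic is not None:
        candidates = [j for j in candidates if mic(X[:, j], y) > mic_min]

    # |Spearman| between each candidate and the response.
    strength = {j: abs(spearman(X[:, j], y)) for j in candidates}

    # Step 2: greedy collinearity filter. Visit predictors from strongest to
    # weakest response correlation; skip one if it is collinear with a kept one.
    kept = []
    for j in sorted(candidates, key=lambda k: strength[k], reverse=True):
        if all(abs(spearman(X[:, j], X[:, k])) <= collinear_max for k in kept):
            kept.append(j)

    # Step 3: retain predictors strongly correlated with the response.
    kept = [j for j in kept if strength[j] > response_min]
    return [names[j] for j in sorted(kept)]


# Synthetic example: x1 nearly duplicates x0, x2 is pure noise.
rng = np.random.default_rng(42)
n = 200
x0 = rng.normal(size=n)
x1 = x0 + rng.normal(scale=0.01, size=n)
x2 = rng.normal(size=n)
y = 2.0 * x0 + rng.normal(scale=0.3, size=n)
X = np.column_stack([x0, x1, x2])
print(screen_predictors(X, y, ["x0", "x1", "x2"]))
```

Visiting predictors in order of response correlation means a dropped collinear predictor is always weaker than the one that replaced it, so step 3 never removes a predictor that step 2 preferred over a stronger one.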

Different methods can be used to combine the base models, among which linear combination is the most widely used41,42,43. A linear stacking model has a prediction function expressed as:

$$y={\omega }_{1}{f}_{1}+{\omega }_{2}{f}_{2}+\cdots +{\omega }_{M}{f}_{M}$$
(1)

where y represents the stacking target; f1, f2, …, fM denote the predictions of the M individual base models (M = 3 in this study); and ωm (m = 1, …, M) is the weight assigned to each base model. The key problem with this approach lies in obtaining the optimal set of weights (Fig. 1). A quadratic programming-based algorithm implemented in R was adopted to estimate the weights. Assume that the dataset used to estimate the weights contains N observations. First, each base model m is trained on the dataset with the ith observation removed, and \({\hat{f}}_{m}^{-i}({{x}_{i}})\) denotes the prediction of model m for the ith observation. The weights are then estimated by least-squares linear regression of yi (the observed value of the ith observation) on the linear combination of \({\hat{f}}_{m}^{-i}({{x}_{i}})\), m = 1, …, M. The optimal set of stacking weights is estimated by minimizing the following objective function under two constraints:

$${\hat{\omega }}^{{st}}=\mathop{{\rm{argmin}}}\limits_{\omega }\mathop{\sum }\limits_{i=1}^{N}{\left[{y}_{i}-\mathop{\sum }\limits_{m=1}^{M}{\omega }_{m}{\hat{f}}_{m}^{-i}({x}_{i})\right]}^{2}$$
(2)
$${\omega }_{m}\ge 0,\quad m=1,2,\cdots ,M$$
$$\mathop{\sum }\limits_{m=1}^{M}{\omega }_{m}=1$$

where \({\hat{\omega }}^{{st}}\) is the vector of estimated stacking weights that minimizes the objective function, and xi refers to the ith observation composed of all environmental variables. The two constraints above are reasonable if the weights are interpreted as posterior model probabilities. Note that the ith observation is removed from the training data when training model m, to avoid assigning unfairly high weights to models with higher complexity44. More detailed information on the approach is provided in the Supplementary Information.
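Equations (1) and (2) amount to a small quadratic program: least squares on the leave-one-out predictions, with non-negative weights that sum to one. The sketch below solves it with SciPy's SLSQP solver (`scipy.optimize.minimize` with `method="SLSQP"`) instead of the R quadratic-programming routine used in the study; it assumes the N × M matrix of leave-one-out base-model predictions has already been computed, and the function names `stacking_weights` and `stack_predict` are hypothetical.

```python
import warnings

import numpy as np
from scipy.optimize import minimize


def stacking_weights(F, y):
    """Constrained least squares of Eq. (2).

    F : (N, M) array; column m holds the leave-one-out predictions of base model m.
    y : (N,) array of observed values.
    Returns w with w_m >= 0 and sum(w) == 1.
    """
    F = np.asarray(F, dtype=float)
    y = np.asarray(y, dtype=float)
    M = F.shape[1]

    def sse(w):
        resid = y - F @ w
        return resid @ resid

    def sse_grad(w):
        return -2.0 * F.T @ (y - F @ w)

    res = minimize(
        sse,
        x0=np.full(M, 1.0 / M),                      # start from equal weights
        jac=sse_grad,
        method="SLSQP",
        bounds=[(0.0, None)] * M,                    # omega_m >= 0
        constraints=[{"type": "eq",                  # sum of omega_m = 1
                      "fun": lambda w: np.sum(w) - 1.0,
                      "jac": lambda w: np.ones_like(w)}],
        options={"ftol": 1e-12, "maxiter": 500},
    )
    if not res.success:
        warnings.warn(f"SLSQP did not report convergence: {res.message}")
    return res.x


def stack_predict(F_new, w):
    """Eq. (1): weighted linear combination of base-model predictions."""
    return np.asarray(F_new, dtype=float) @ np.asarray(w, dtype=float)


# Synthetic check: the target is 0.7*f1 + 0.3*f2, and f3 is irrelevant.
rng = np.random.default_rng(1)
F = rng.normal(size=(200, 3))
y = 0.7 * F[:, 0] + 0.3 * F[:, 1] + rng.normal(scale=0.01, size=200)
w = stacking_weights(F, y)
print(np.round(w, 3))
```

SLSQP is a general-purpose solver; a dedicated QP solver would be an equally valid choice, and with only three base models the problem is tiny either way.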

The predictive performance on the training and testing datasets provided complementary information for model validation. Training performance primarily reflects model robustness, i.e., the stability and balance of model predictability under data shuffling. Testing measures model performance on unseen data and addresses model fitness. We used the squared Pearson correlation coefficient (R2) as the statistical metric to quantify the predictive performance of the models (Supplementary Tables 1–5). To supplement R2 and provide an in-depth assessment of model accuracy, we also calculated the RMSE, NSE, and MAE. NSE quantifies the agreement between observed and predicted values relative to the variance of the observations45.
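For reference, the four accuracy metrics can be computed as below, using the standard definitions RMSE = √(mean((P − O)²)), MAE = mean(|P − O|), and NSE = 1 − Σ(P − O)²/Σ(O − Ō)², where O and P are observed and predicted values; R2 is taken as the squared Pearson correlation, as stated in the text. The helper name `fit_metrics` is hypothetical.

```python
import numpy as np


def fit_metrics(obs, pred):
    """Return R2 (squared Pearson r), RMSE, NSE, and MAE for paired values."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    err = pred - obs
    r = np.corrcoef(obs, pred)[0, 1]
    return {
        "R2": float(r ** 2),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        # NSE = 1 is a perfect fit; NSE <= 0 means no better than the observed mean.
        "NSE": float(1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2)),
        "MAE": float(np.mean(np.abs(err))),
    }


m = fit_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
print(m)
```

Unlike R2, NSE penalizes systematic bias: predictions that are perfectly correlated with the observations but offset by a constant still have R2 = 1 and a reduced NSE.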

Factors controlling TN, TP, NH3-N, and CODMn variability

In this study, an MLR model was employed to assess the factors controlling nutrient levels in the water bodies. MLR models provide insights into the relationships between a response variable and multiple explanatory variables. The influence of each explanatory variable on the response variable is determined as the ratio of its standardized coefficient to the sum of the absolute values of all standardized coefficients46. The response and explanatory variables considered are listed in Supplementary Table 6. This approach has been widely applied to simulate water quality and identify key driving factors3,32. The MLR model links the response variables (nutrient concentrations) linearly to the explanatory variables (environmental factors). For a response variable Y, p explanatory variables X1, …, Xp, and n observations, the MLR model is:

$${y}_{i}={\beta }_{0}+{\beta }_{1}{x}_{i1}+\ldots +{\beta }_{p-1}{x}_{i,p-1}+{\beta }_{p}{x}_{{ip}}+{\varepsilon }_{i}$$
(3)

where \({\varepsilon }_{i} \sim N(0,{\delta }^{2})\) for i = 1, …, n. After testing the predictor variables for collinearity, we fitted the model separately for each of the 10 major river basins using the annual mean data of the 613 sub-watersheds from 1980 to 2018. The standardized coefficients were estimated by ordinary least squares, which minimizes the sum of squared errors. The significance of the standardized coefficients and of the fitted equations was tested using t tests and F tests, respectively46. The standardized coefficient (r) between the response variable and each explanatory variable was used to compare the influence of each variable on nutrient variability. This influence is expressed as the contribution percentage of each variable:

$${C}_{i}=\frac{{r}_{i}}{\left|{r}_{1}\right|+\left|{r}_{2}\right|+\cdots +\left|{r}_{p-1}\right|+\left|{r}_{p}\right|}\times 100 \%$$
(4)

where Ci represents the contribution percentage of variable i, i = 1, 2, 3, ···, p, and ri represents the standardized coefficient between response variables and the explanatory variable i.
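Equations (3) and (4) can be sketched with NumPy: z-score the response and predictors, fit ordinary least squares, and normalize the standardized coefficients by the sum of their absolute values. The helpers are hypothetical, and `group_shares` encodes an assumption not stated in the source, namely that a factor group's share (anthropogenic, climatic, or geographical) is the sum of |Ci| over its variables, so that group shares add to 100%. The t and F significance tests are omitted.

```python
import numpy as np


def standardized_coefficients(X, y):
    """OLS on z-scored variables (Eq. 3); returns r_1..r_p, dropping the intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    A = np.column_stack([np.ones(len(yz)), Xz])   # intercept column + predictors
    coef, *_ = np.linalg.lstsq(A, yz, rcond=None)
    return coef[1:]


def contribution_percentages(r):
    """Eq. (4): C_i = r_i / sum_j |r_j| * 100 (the sign of r_i is kept)."""
    r = np.asarray(r, dtype=float)
    return 100.0 * r / np.sum(np.abs(r))


def group_shares(C, groups):
    """Hypothetical aggregation: sum |C_i| over the variables in each factor group."""
    C = np.asarray(C, dtype=float)
    return {g: float(np.sum(np.abs(C[idx]))) for g, idx in groups.items()}


# Synthetic basin: 39 annual means and 3 predictors with mixed signs.
rng = np.random.default_rng(7)
X = rng.normal(size=(39, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=39)
r = standardized_coefficients(X, y)
C = contribution_percentages(r)
print(np.round(C, 1))
print(group_shares(C, {"anthropogenic": [0, 1], "climatic": [2]}))
```

Because Eq. (4) keeps the sign of ri, a negative Ci marks a variable that lowers nutrient concentrations, while |Ci| gives its share of the total standardized effect.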

Data sources

Four water quality parameters, CODMn, TN, NH3-N, and TP, were selected to describe water quality in China. Monthly data for 2003–2018 were collected from 613 river water quality monitoring sites in the nation’s 10 major river basins from the China National Environmental Monitoring Center (http://www.mee.gov.cn/hjzl/shj/dbszdczb//). CODMn, TN, NH3-N, and TP concentrations were analyzed in the laboratory using the standard testing procedures recommended by the Ministry of Environmental Protection of China47, which did not change over the reported period. The 10 river basins were the Songhua, Liaohe, Haihe, Yellow, Huaihe, Yangtze, Southeast, Pearl, Southwest, and Northwest Inland rivers. Spatial data on the basins were also collected, including geographical conditions, physicochemical soil properties, climatic conditions, land use, anthropogenic discharges, and socioeconomic development (Supplementary Table 6). The elevation and slope of each monitoring station were determined from a digital elevation model (1 × 1 km resolution) from the Resource and Environment Science and Data Center (http://www.resdc.cn/Default.aspx). Local meteorological conditions (e.g., temperature, precipitation, wind speed, and extreme climate indices) over the spatial domain of the national network were taken from the CN05.1 dataset, which was provided by the China Meteorological Administration and constructed using the “anomaly approach”, with interpolation between sites based on observations from approximately 2400 stations in China48,49. The dataset has a spatial resolution of 0.25° × 0.25°. The soil properties of each sub-basin were extracted from a 1:1,000,000-scale digital soil properties map from the Institute of Soil Science, Chinese Academy of Sciences (http://www.issas.cas.cn/). The land-use dataset for China (30 × 30 m resolution) was obtained from the Institute of Geographic Sciences and Natural Resources Research (IGSNRR, Chinese Academy of Sciences) (http://www.resdc.cn/Default.aspx). Net anthropogenic nitrogen (N) and phosphorus (P) inputs (NANI and NAPI) were estimated from reported discharge activity data and discharge coefficients50,51. The discharge activity data were obtained from the Chinese Statistical Yearbook (https://data.cnki.net/Yearbook/Navi?type=type&code=A). Gross domestic product (GDP) and population density (POP) are two important socioeconomic indicators that may affect pollution sources and pollutant inputs to water bodies52. The spatial distribution of socioeconomic data (1 × 1 km resolution) was obtained from the Resource and Environment Science and Data Center of the Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences (https://www.resdc.cn/).