Water quality assessment based on multivariate statistics and water quality index of a strategic river in the Brazilian Atlantic Forest

Fifty-four water samples were collected between July and December 2019 at nine monitoring stations, and fifteen parameters were analysed to provide an updated diagnosis of the Piabanha River water quality. In addition, forty years of monitoring data were analysed, including government data and previous research projects. A georeferenced database containing water management data was also built. The Water Quality Index from the National Sanitation Foundation (WQINSF) was calculated using two datasets and showed an improvement in overall water quality, despite systematic violations of Brazilian standards. Principal components analysis (PCA) identified the parameters contributing most to water quality and enabled their association with the main pollution sources identified in the geodatabase, showing that sewage discharge is still the main pollution source. Cluster analysis (CA) made it possible to recommend an optimization of the monitoring network, thereby enabling the expansion of monitoring to other rivers. Finally, the diagnosis provided by this research establishes the first step towards the Framing of water resources according to their intended uses, as established by the Brazilian National Water Resources Policy.


Results
Water uses. We have requested and received from INEA two water user databases of the Piabanha Basin.
The first set corresponds to raw data from the National Water Resources Register (CNARH), with all registrations up to December 2017 and 1549 registered interferences (water abstractions or effluent discharges). The second is the registration validated by INEA up to August 2018 through the Águas do Rio project, comprising a total of 669 validated interferences. With these data, it was possible to build a georeferenced database and to list the main effluent discharges by type for each monitoring station.
In the validated database, 84% of the 669 interferences are water abstractions and 16% are effluent discharges. Water abstractions account for 425 m³ day⁻¹, 75% from wells and 25% from rivers, whereas effluent discharges amount to 89 m³ day⁻¹. The largest volume of effluents comes from the sanitation sector (57% of the total), followed by industry (33%), aquaculture (4%) and mining (3%).
Comparing the two databases shows that the number of registered users is much larger than the number of validated users, that is, users whose data were verified by the state environmental agency and who therefore received a license. For example, the validated database contains only six interferences related to agriculture, in contrast to 789 agricultural interferences awaiting validation. This is a serious obstacle to water resources management in the region and threatens the sustainability of its water resources.
Short-term monitoring and water quality index. To assess and compare the water quality of the Piabanha River, we calculated the Water Quality Index from the National Sanitation Foundation (WQINSF) using two datasets, one from 2012 and the other from 2019 (Tables 1 and 2). The 2012 results (Fig. 1A) oscillated between the bad and medium categories, generally with medium quality (50.5 ± 10.3). In 2019 (Fig. 1B), the results ranged between the medium and good categories, again generally with medium quality (61.6 ± 10.8).
Both datasets show significant seasonal differences (p < 0.05; Fig. 1C,D) between the end of the dry period (Jul, Aug, Sep) and the beginning of the rainy period (Oct, Nov, Dec) for DO, WT, pH, nitrate, phosphate and turbidity, whereas no significant seasonal difference (p > 0.05) was found for E. coli, BOD and TDS. The parameters with the greatest impact on the WQINSF were coliforms and BOD. Ammonia and total phosphorus are not included in the WQINSF, but their concentrations violated Brazilian legislation, and their influence can be better understood through PCA.
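The text does not name the statistical test behind the dry-versus-rainy comparison. Purely as an illustration, the sketch below assumes a two-sided Mann-Whitney U test (a common nonparametric choice for two independent groups), computed with the normal approximation and without tie or continuity correction. The function names and the DO values are hypothetical and are not the study's data.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def mann_whitney_u(group_a, group_b):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Simplification: no tie or continuity correction, so small samples
    with many ties give only approximate p-values.
    """
    n1, n2 = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2   # U statistic for group_a
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))        # two-sided p-value
    return u1, z, p

# Hypothetical DO values (mg/L): end of dry period vs beginning of rainy period
dry = [7.8, 8.1, 7.5, 8.4, 7.9, 8.0]
rainy = [6.2, 6.8, 6.5, 7.0, 6.1, 6.6]
u, z, p = mann_whitney_u(dry, rainy)
print(f"U = {u:.1f}, Z = {z:.2f}, p = {p:.4f}")
```

A rank-based test is used here only because it does not assume normality; if the campaigns were paired by station, a paired test would be the more appropriate choice.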
Principal components and cluster analysis. The 2019 dataset (n = 48), comprising six monitoring campaigns at the eight monitoring stations along the Piabanha River with 15 parameters analysed, was grouped by the average value of each parameter at each station (n = 8). Pearson's correlation matrix is presented in Table 3. PCA was applied to identify groups of parameters that influence water quality. PC1, PC2 and PC3 account for 72% (eigenvalue 10.74), 14% (eigenvalue 13.94) and 5% (eigenvalue 0.8) of the data variance, respectively. Components with eigenvalues greater than one were retained, so the first two components together account for 86% of the total variance. The loadings of the first two components are presented in Table 4, and the stations that most influence the results are represented in Fig. 2A.
PC1 was substantially correlated with practically all parameters. Stations 1 to 4 loaded positively on PC1 (loadings > 0.7) together with TDS, alkalinity, ammonia, total nitrogen, phosphate, total phosphorus, BOD, COD and E. coli, while stations 5 to 8 loaded negatively (loadings < −0.7) together with nitrate, turbidity, SS, pH and WT. PC2 was most influenced by stations in the urban area, notably station 1, and was positively correlated with DO, COD and BOD (loadings > 0.5) and, to a lesser extent, with SS (loading = 0.33). Conversely, it was negatively correlated with E. coli (loading = −0.66), with a strong influence from station 3.
The sampling stations were grouped into three statistically significant clusters at 75% similarity by agglomerative hierarchical clustering using Ward's linkage and Euclidean distance.

Long-term monitoring assessment based on the Mann-Kendall rank test and Fourier transform.
In a complementary way, to evaluate possible trends in water quality and to detect the seasonal behavior of the basin, we used a 40-year monitoring time series. Since dissolved oxygen can be used as a surrogate variable for the general health of aquatic ecosystems [49][50][51], it was selected to perform the Mann-Kendall rank test of randomness for the most upstream and the most downstream stations of the Piabanha River, PB002 and PB011, respectively. The upstream station showed a statistically significant increasing trend (n = 166, S = 1507, Z = 2.10, p < 0.03), whereas the downstream station did not show a statistically significant trend (n = 198, S = 1179, Z = 1.27, p = 0.20). The entire dataset can be found in Supplementary Tables S3 and S4. To detect seasonal behavior, we applied a Fourier transform algorithm to the time series of station PB011 from 1980 to 2019 (Fig. 3A), which does not display a trend and can be considered representative of the entire basin because it is the most downstream station. The DO data were organized as quarterly averages. The two most powerful signals correspond to frequencies of approximately 0.25 and 0.45 cycles per quarter (Fig. 3B), that is, periods of about 12 and 6 months, respectively. Taking this seasonality into account, we confirmed that our 2019 field campaigns are representative of the seasonal cycle, covering the final half of the dry season and the initial half of the rainy season.

Discussion
Water quality assessment. The Piabanha River had better water quality in 2019 than in 2012, according to the WQINSF results (Fig. 1). The improvement was substantial over the first 40 km, which were rated "bad" in most 2012 campaigns but "medium" in most 2019 campaigns, owing to the expansion of the sewage collection and treatment system. Since 2012, Petrópolis has built 50 km of sewage collection network and 7 new sewage treatment units 52. These plants produce secondary-level effluents through biological treatment, with a total flow capacity of about 800 L s⁻¹. They use different technologies, such as submerged aerated biofilters, anaerobic upflow reactors, moving bed biofilm reactors and upflow anaerobic sludge blanket reactors; in addition, some plants use biosystems 53. In the stretches beyond 40 km, water quality improved owing to self-purification processes and the contribution of clean tributaries, in line with findings from other rivers worldwide 31,54,55.
Dry seasons, in general, presented better water quality indexes than rainy seasons. Other studies 28,56,57 have shown similar seasonal behavior, in which water quality worsens in the rainy season because of the input of sediments and pollutants carried by the rain. In addition, most of the sewage network also collects rainwater. Thus, during rain events, sewage is not treated and is discharged directly into rivers.
Although the WQINSF had a medium rating in 2019, BOD and coliforms were substantially above the maximum allowed by Brazilian regulation. In addition, the index is limited to the parameters used in its calculation 58. This is the case for ammonium, which presented concentrations up to three times higher than allowed by Brazilian regulation, bearing in mind that only nitrate is used in the WQINSF. The same occurs with total phosphorus: only phosphate is considered in the index, although phosphate has no maximum value established by Brazilian federal regulation. In what follows, we analyse these parameters in more detail.
Biochemical oxygen demand (BOD) is one of the most widely used criteria for water quality assessment. It provides information on the readily biodegradable fraction of the organic load in water 59. High BOD reduces oxygen availability, mainly through microbiological activity 60. BOD ranged from 2.00 to 45 mg L⁻¹ (average 7.69 ± 7.52) over the entire dataset, with concentrations most of the time substantially above the maximum allowed by Brazilian regulation (5 mg L⁻¹). Escherichia coli is naturally present in the intestinal tracts of warm-blooded animals and is widely used as an indicator of fecal contamination 61,62. Villas-Boas 42 pointed to fecal coliforms as the most relevant water quality parameter in the urban area of Petrópolis, mainly related to pollution caused by untreated domestic sewage.
Phosphorus is an essential nutrient for all forms of life 63. Its availability can be related to atmospheric deposition 64, to the anthropic use of products such as detergents 65 and to agricultural activities 66. Orthophosphates are the most relevant forms in the aquatic environment, as they are the main form of phosphate assimilated by aquatic plants 67. Previous studies 42,68,69 in the Piabanha Basin found phosphate values in close agreement with ours. Alvim 68 points out that the main source of phosphorus for the Piabanha River is sewage discharge and that the highest concentrations are found during the rainy season.
Nitrate is a very common constituent of surface water, since it is the end product of the aerobic decomposition of organic nitrogenous compounds 70,71. Its sources are related to landscape composition and are influenced by both agricultural and urban uses 72. Villas-Boas 42 found high concentrations of nitrate and ammonium in the urban region of the Piabanha River, in agreement with this study. Alvim 68 reports that domestic sewage discharged into the Piabanha River accounts for 43% of the nitrogen load, the atmospheric contribution for 31% and farming activity for 15%.
The major contributors to water quality and stretches of river with similar water quality. The first two components together account for 86% of the total variance, indicating the high explanatory power of the method. This is considerably higher than in similar studies around the world 29,30,71,[73][74][75]. PC1 predominantly accounts for urban sewage pollution. This is clearly demonstrated by the fact that stations 1 to 4, located in the urban area of Petrópolis, loaded positively on PC1 together with organic matter (BOD and COD), TDS and nutrients such as phosphorus and nitrogenous constituents, especially ammonia, indicating recent pollution. Even clearer is the fact that stations 5 to 8 loaded negatively together with nitrate, reflecting the degradation of nitrogen compounds in the stretches downstream of the urban area. On the other hand, the increase in nitrate concentrations, together with the increase in turbidity, at stations outside the urban area may also be associated with land use, especially agriculture.
PC2 is dominated by dissolved oxygen and other parameters that indicate the health of the river, such as organic load and coliforms. It is explained by water pollution from organic matter and biological activity, and it reinforces the result of PC1. In the study region, sanitation is still a challenge to be faced by the government, especially in the first urban stretch, up to 40 km from the source of the Piabanha River, where 26% of the sewage is untreated 53.
Cluster analysis was used to group sampling stations into similarity classes indicating stretches of river with similar water quality. As pointed out by Singh 29, this implies that a single site in each cluster may serve as well for the spatial assessment of water quality as the whole cluster. Thus, the number of sampling sites, and hence costs, can be reduced without losing any significance in the outcome. However, this interpretation should be made with caution, since trends in different stretches can differ considerably, and future changes may make these differences significant. Therefore, great care must be taken when reducing the number of monitoring stations.
It is important to note that the first cluster (S1, S6 and S4, S5) groups station 1 with station 6: the former corresponds to the urban area of Petrópolis, whose pollution stems from sewage and industrial effluents, while station 6 is located after the confluence with the Preto-Paquequer River, which crosses Teresópolis, the second largest city in the hydrographic basin, also with economic and industrial activities. Sand mining is the predominant activity near stations 4 and 5, which together receive the impact of five mining companies. Similarly, station 6, after the Preto River, receives the impact of seven sand mines. In fact, this group brings together economic activities whose impacts on water quality are similar. Station 5 could therefore be removed from the monitoring network to reduce costs.
The second cluster (S2 and S3) refers to the most urbanized section of the basin. When individually checking the quality parameters at these stations, one can conclude that they differ only in the diluting effect caused by the contribution of the Araras River, on the left bank, and of the Poço do Ferreira River, on the right bank, which receives the waters of the Bonfim River after its source in the Serra dos Órgãos National Park, an important federal conservation unit. Station 3 was introduced precisely to detect this diluting effect, but since the cluster analysis showed that it was not significant, it is recommended to remove this station.
The stations of the third cluster (S7 and S8) behave very similarly: station 8 is just before the Piabanha River mouth, and station 7 is located less than 10 km upstream of the mouth. In addition, only three interferences registered as discharges are located on this stretch. Thus, it is recommended to remove station 7, given the importance of maintaining a station close to the river mouth.
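The clustering and station-reduction reasoning above can be sketched with SciPy's hierarchical clustering. This is a minimal sketch under stated assumptions: the station means are hypothetical (chosen only to mimic the three groups described above), the columns are standardized before computing Euclidean distances, and the dendrogram is cut into a fixed number of clusters rather than at the 75% similarity level used in the study. The helper names `ward_clusters` and `representatives` are illustrative, and picking the station closest to each cluster centroid is a purely statistical rule; the recommendations above also weigh location, such as keeping a station near the mouth.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical station means (columns: BOD, ammonia, nitrate); NOT the study's data
STATIONS = ["S1", "S2", "S3", "S4", "S5", "S6", "S7", "S8"]
X = np.array([
    [8.0, 2.00, 0.50],
    [15.0, 5.00, 0.30],
    [14.0, 4.80, 0.35],
    [8.5, 2.10, 0.60],
    [7.8, 1.90, 0.55],
    [8.2, 2.20, 0.50],
    [3.0, 0.30, 1.50],
    [3.1, 0.35, 1.40],
])

def ward_clusters(data, n_clusters):
    """Standardize columns, build a Ward/Euclidean tree, cut it into n_clusters groups."""
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    tree = linkage(z, method="ward", metric="euclidean")
    return z, fcluster(tree, t=n_clusters, criterion="maxclust")

def representatives(z, labels, names):
    """For each cluster, the station closest to the cluster centroid (statistical choice only)."""
    chosen = {}
    for label in np.unique(labels):
        idx = np.flatnonzero(labels == label)
        centroid = z[idx].mean(axis=0)
        dist = np.linalg.norm(z[idx] - centroid, axis=1)
        chosen[int(label)] = names[idx[np.argmin(dist)]]
    return chosen

z, labels = ward_clusters(X, n_clusters=3)
print(dict(zip(STATIONS, labels.tolist())))
print(representatives(z, labels, STATIONS))
```

Standardizing first keeps parameters measured in large units (such as BOD in mg L⁻¹) from dominating the Euclidean distance.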

Trend analysis and seasonal variation.
Although systematic violations of Brazilian standards 76 persist, water quality in the Piabanha River has in general improved over the past 40 years (Fig. 3A,B). This statement is supported by the Mann-Kendall rank test of randomness, which indicates a significant (p = 0.03) increasing trend in dissolved oxygen at station PB002, located in the urban area of Petrópolis, which is highly impacted by effluent discharges despite municipal sewage treatment. PB011 has presented high DO levels since the beginning of the time series, with nearly constant behavior over time, and thus shows no trend. The high DO levels are due both to the river's reoxygenation process and to the contribution of clean water from its tributaries, such as the Fagundes River.
A strong annual and semi-annual seasonality was indicated by the power spectral density, as shown in the periodogram (Fig. 3B) obtained from the fast Fourier transform. These results are consistent with the literature 77, which indicates that more than 90% of the total variance of dissolved oxygen is accounted for by the annual periodicity and the next four higher harmonics (semi-annual, tri-annual, etc.). Seasonality follows the rainfall regime, with a dry period from April to September and a wet period from October to March, according to the study by Araújo 78 carried out in the Piabanha River basin.
Water quality at point PB002 started to improve in 2000, when the first sewage treatment plant in the city of Petrópolis came into operation. Currently, 95% of the population has access to drinking water, and the coverage of treated urban sewage is 85%. The municipality has 26 sewage treatment units, responsible for the treatment of 56.2 million liters per day. In relation to the other municipalities in the basin, according to the National Sanitation Information System 79 (SNIS), the municipality of Três Rios treats 2.97% of its sewage, while the other municipalities, Teresópolis, Areal, São José do Vale do Rio Preto, Paty do Alferes and Paraíba do Sul did not report their data to SNIS, potentially indicating that they do not perform sewage treatment. In other words, about 50% of the population has no formal access to sewage treatment services.

Conclusion
The diagnosis provided by this research establishes the first step towards the Framing of water resources according to their intended uses, as established by the Brazilian National Water Resources Policy. In addition to the diagnosis, a georeferenced database was built. There are few cases of Framework implementation in Brazil and none in the studied watershed, which makes this study relevant to Brazilian water resources management. The considerable number of users awaiting regularization by the State Environmental Institute is a limitation to implementing the Framework and requires a joint effort by the watershed committee.
Answering our initial question, the Piabanha River water quality is medium according to the WQINSF and is certainly not able to support high levels of biodiversity. For the coliform, BOD and TP parameters, some river stretches have quality compatible with class 4 of the Brazilian regulation; hence, their waters cannot be used for irrigation or for human or animal consumption, even after treatment. On the other hand, the Framework must be carried out according to intended uses. Therefore, we recommend that the Piabanha Committee, in partnership with the State Public Ministry, lead actions to reduce the concentrations of these parameters, mainly in the sanitation sector.
It is recommended that the monitoring program be continued and expanded to stretches where conflicts between water uses occur, in order to implement the Framework and enforce the improvement of water quality. It is also important to point out that this study was financed with public resources from the Piabanha water resources fund and that the present analysis made it possible to recommend the exclusion of three of the eight existing stations, thereby enabling the expansion of monitoring to other tributaries of the Piabanha River that are under the influence of large populations with practically no sanitation, notably the Rio Preto/Paquequer sub-basin.
Scientific Reports | (2020) 10:22038 | https://doi.org/10.1038/s41598-020-78563-0 www.nature.com/scientificreports/
This work describes a methodological approach that can be useful for other research in environmental science and management. We applied an integrated approach, using data from different sources combined with data analysis based on WQI, PCA, CA, frequency analysis and trend analysis, which were used in a complementary way to understand a research problem.

Materials and methods
Study area. The Piabanha Basin is located in southeastern Brazil, in the mountainous region of the State of Rio de Janeiro, and has an area of 2050 km² (Fig. 4). The Piabanha River rises at an altitude of 1150 m and runs 80 km until it flows into the Paraíba do Sul River at an altitude of 260 m. The upper portion of the basin has a humid tropical climate, with steep slopes and annual rainfall exceeding 2000 mm. The lower portion of the basin has a sub-humid climate, and the average rainfall decreases to 1300 mm. The seasons are well defined throughout the basin, and the rainfall regime has symmetry in its distribution between the periods from January to June and from July to December 78. The territory was home to 535 thousand people in 2018 80. The two largest cities in the region, Petrópolis and Teresópolis, are located in the headwaters of the basin and give rise to the Piabanha and Preto rivers, respectively. Because sewage treatment is limited and river flows are low, high constituent concentrations are observed (e.g., fecal coliforms, nitrate and BOD), especially in urban areas 42.
Datasets. Three sets of monitoring data were used in this research (Fig. 4). The first and main one was the result of a monitoring program being conducted by the Piabanha Watershed Committee, from which data from July to December 2019 were analysed; it is described in more detail in the next subsection. The second comprised 6 campaigns carried out in 2012 by the HIDROECO project 44, also funded by the Piabanha Committee, and is used as a baseline for comparison. The third comprised two stations of the basic monitoring network of the Rio de Janeiro Environmental Institute, with data from 1980 to the present, except for periods of data gaps.
A georeferenced database containing water management data was also built. The Brazilian National Water Agency (ANA) has developed the National Water Resources Users Register (CNARH) for any bulk water user that changes the regime, quantity or quality of a water body. It is a federal platform, but it can be managed by each state. Registration is a prerequisite for the other stages of the regularization of uses.
Water Quality Index. A Water Quality Index (WQI) is an empirical expression that integrates significant physical, chemical and microbiological water quality parameters into a single number 82. It can be a powerful communication tool, simplifying a complex set of parameters, whose individual interpretation can be difficult, into a single index representing general water quality. A water quality index was initially proposed by Horton 26 and further developed by Brown 27,83, resulting in the National (USA) Sanitation Foundation Water Quality Index (WQINSF). The original version of the WQINSF was an additive expression 27; however, field data analysis suggested that the additive WQI lacked sensitivity in reflecting the effect of a single low-value parameter on overall water quality. As a result, a multiplicative form of the WQI was proposed 82,83:
$$\mathrm{WQI}_{NSF} = \prod_{i=1}^{n} q_i^{\,w_i}$$
where $q_i$ is the quality rating of the $i$th variable, a number between 0 and 100 obtained from the respective average quality variation curve 82 as a function of the concentration of that variable; $w_i$ is the relative weight of the $i$th variable, a number between 0 and 1 assigned according to the importance of the variable for overall quality, with the weights summing to 1; and $n$ is the number of variables. WQINSF is the National Sanitation Foundation Water Quality Index, a number between 0 and 100, rated as "excellent" (100 ≥ WQI ≥ 90), "good" (90 > WQI ≥ 70), "medium" (70 > WQI ≥ 50), "bad" (50 > WQI ≥ 25) or "very bad" (25 > WQI ≥ 0).
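The multiplicative index and its rating bands can be sketched directly. The sketch assumes the quality ratings $q_i$ have already been read from the NSF rating curves, which are not reproduced here, so the $q_i$ values below are hypothetical inputs; the weights are those that this study reports adopting from INEA. The names `INEA_WEIGHTS`, `wqi_nsf` and `wqi_category` are illustrative only.

```python
import math

# Weights reported in this study following INEA; TDS replaces total solids and
# E. coli (with a correction factor) stands in for fecal coliforms.
INEA_WEIGHTS = {
    "DO": 0.17, "fecal_coliforms": 0.16, "pH": 0.11, "BOD": 0.11,
    "nitrate": 0.10, "phosphate": 0.10, "temperature": 0.10,
    "turbidity": 0.08, "TDS": 0.07,
}

def wqi_nsf(q, weights=INEA_WEIGHTS):
    """Multiplicative WQI: product of q_i ** w_i, with q_i in [0, 100] and weights summing to 1."""
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1")
    wqi = 1.0
    for name, w in weights.items():
        if not 0 <= q[name] <= 100:
            raise ValueError(f"q[{name!r}] must lie in [0, 100]")
        wqi *= q[name] ** w
    return wqi

def wqi_category(wqi):
    """Rating bands as defined in the text."""
    if wqi >= 90:
        return "excellent"
    if wqi >= 70:
        return "good"
    if wqi >= 50:
        return "medium"
    if wqi >= 25:
        return "bad"
    return "very bad"

# Hypothetical ratings: every parameter rated 80 except a poor coliform rating
q = {name: 80.0 for name in INEA_WEIGHTS}
q["fecal_coliforms"] = 5.0
multiplicative = wqi_nsf(q)
additive = sum(w * q[name] for name, w in INEA_WEIGHTS.items())
print(f"multiplicative: {multiplicative:.1f} ({wqi_category(multiplicative)})")
print(f"additive: {additive:.1f} ({wqi_category(additive)})")
```

In this example the single poor coliform rating pulls the multiplicative index down to about 51, whereas a weighted sum of the same ratings gives 68, which illustrates the sensitivity argument that motivated the multiplicative form.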

Monitoring campaigns and analytical procedures.
The WQINSF and its many adaptations have been widely used 84,85; however, their use is not uniform, and parameters are sometimes replaced without the necessary adaptation of the respective indicator curves. In Brazil, the WQINSF has been used since 1975 by CETESB (Environmental Company of the State of São Paulo). In the following decades, other Brazilian states adopted this index with minor adaptations, and today it is the most widely used index in the country. In the present study, the weights ($w_i$) follow the methodology established by INEA (Environmental Institute of the State of Rio de Janeiro): DO (0.17); fecal coliforms (0.16); pH and BOD (0.11 each); nitrate, phosphate and temperature (0.10 each); turbidity (0.08); and TDS (0.07), used instead of total solids. Based on our estimates (n = 48, 2019 data), replacing total solids with dissolved solids may cause an average variation of 0.2% in the final WQINSF result. For microbiology, E. coli was used instead of fecal coliforms, applying a correction factor 86.
Principal component analysis (PCA) 87 is a multivariate technique of covariance modeling that reduces the dimensionality of an originally correlated dataset with the lowest possible information loss. It transforms a dataset of correlated variables into a new set of orthogonal, uncorrelated variables, which are weighted linear combinations of the original variables 30.
The PCA technique extracts the eigenvalues and eigenvectors from the covariance matrix of the original variables. The PCs are obtained by multiplying the original correlated variables by the eigenvectors, whose coefficients are frequently called "loadings" 29,30,88,89. A widely accepted and simple qualitative rule proposes that loadings greater than 0.30 or less than −0.30 are significant, loadings greater than 0.40 or less than −0.40 are more important, and loadings greater than 0.50 or less than −0.50 are very significant 90. The suitability of the data for PCA was evaluated with the Kaiser-Meyer-Olkin 91,92 (KMO) measure of sampling adequacy and Bartlett's test of sphericity 93. The Shapiro-Wilk test was used to verify data normality (α = 0.01).
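The PCA workflow described in the Results (averaging samples per station, eigen-decomposing the Pearson correlation matrix, retaining components with eigenvalues greater than one) can be sketched with NumPy. Two assumptions should be flagged: the loadings here are eigenvector coefficients scaled by the square root of the eigenvalue, i.e. correlations between variables and components, which is one common convention for applying the 0.30/0.40/0.50 thresholds; and the input data are hypothetical. The function names are illustrative, and KMO and Bartlett's test are not implemented.

```python
import numpy as np

def station_means(station_ids, samples):
    """Average each parameter per station (e.g. 48 samples -> 8 station means)."""
    ids = np.asarray(station_ids)
    data = np.asarray(samples, dtype=float)
    stations = np.unique(ids)
    return stations, np.vstack([data[ids == s].mean(axis=0) for s in stations])

def pca_correlation(X):
    """PCA on the Pearson correlation matrix of X (rows = observations, columns = parameters)."""
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)            # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]                # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()              # proportion of total variance
    keep = eigvals > 1.0                             # Kaiser criterion
    # Scale by sqrt(eigenvalue); clip tiny negative round-off (with few stations, R is rank-deficient)
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return eigvals, explained, loadings[:, keep]

def loading_significance(value):
    """Qualitative rule quoted in the text."""
    a = abs(value)
    if a > 0.50:
        return "very significant"
    if a > 0.40:
        return "more important"
    if a > 0.30:
        return "significant"
    return "not significant"

# Hypothetical samples (columns: BOD, ammonia, nitrate), two per station
ids = ["S1", "S1", "S2", "S2", "S3", "S3", "S4", "S4"]
samples = [[8.0, 2.0, 0.5], [9.0, 2.4, 0.4], [15.0, 5.1, 0.3], [14.0, 4.7, 0.3],
           [6.0, 1.2, 0.9], [6.5, 1.0, 1.0], [3.0, 0.3, 1.5], [3.2, 0.4, 1.4]]
stations, means = station_means(ids, samples)
eigvals, explained, loadings = pca_correlation(means)
print("eigenvalues:", np.round(eigvals, 2))
print("explained variance:", np.round(explained, 2))
for name, row in zip(["BOD", "ammonia", "nitrate"], loadings):
    print(name, [loading_significance(v) for v in row])
```

For a correlation matrix the eigenvalues sum to the number of parameters, so with 15 standardized parameters each eigenvalue divided by 15 gives that component's share of the variance.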
Cluster analysis reveals the latent structure of a dataset by categorizing objects into groups, or clusters, on the basis of their similarities 30,88,89. Hierarchical agglomerative cluster analysis (CA) classifies objects by first placing each object in a separate cluster and then joining clusters stepwise until a single cluster remains 29.
Time series analysis and trend detection. The Mann-Kendall trend test is a nonparametric test used to identify a trend in a series, first proposed by Mann 94 and further improved by Kendall 95 and Hirsch 96. The null hypothesis (H0) of these tests is that there is no trend in the series. The tests are based on Kendall's tau measure of association between two samples, which is itself based on the ranks within the samples. Each observation is compared with every later observation in the series, and the number of pairs with negative differences is subtracted from the number of pairs with positive differences to give the statistic S. A positive value of S indicates an upward trend, and a negative value a downward trend. For n > 10, a normal approximation is used to compute the Z statistic, from which the p-value is obtained 96.
Fourier decomposition is a technique that allows the separation of frequency components of a data series with seasonal behavior from a complex water quality dataset 97 … influences and their scales 50. The power spectral density (PSD) obtained from the fast Fourier transform (FFT) and represented by periodograms is a recommended procedure to detect seasonality 98,99.
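The Mann-Kendall statistic can be sketched in plain Python. This follows the standard formulation (S summed over all pairs, normal approximation with a continuity correction) but, as a simplifying assumption, uses the no-ties variance and implements no tie or seasonal correction, so it is not necessarily the exact variant the authors ran. The function names are illustrative.

```python
import math

def mann_kendall_s(x):
    """S statistic: sum of sign(x[j] - x[i]) over all pairs i < j."""
    s = 0
    for i in range(len(x) - 1):
        for j in range(i + 1, len(x)):
            diff = x[j] - x[i]
            s += (diff > 0) - (diff < 0)   # +1, -1, or 0 for ties
    return s

def mann_kendall_z(s, n):
    """Normal approximation with continuity correction; variance assumes no ties."""
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0

def mann_kendall(x):
    """Return S, Z and the two-sided p-value for H0: no monotonic trend."""
    s = mann_kendall_s(x)
    z = mann_kendall_z(s, len(x))
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p

# Plug in the S and n reported for PB002 and PB011 in the Results
print(round(mann_kendall_z(1507, 166), 2), round(mann_kendall_z(1179, 198), 2))
```

With the reported values, the first call reproduces Z = 2.10 for PB002, and the second gives about 1.26 against the reported 1.27 for PB011, a small difference that may come from a tie correction or rounding that this sketch does not model.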
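A periodogram of quarterly averages, of the kind used for station PB011, can be sketched with NumPy's real FFT. The sketch assumes an evenly spaced, gap-free quarterly series; real records with gaps would need infilling or a method for uneven sampling such as the Lomb-Scargle periodogram. Frequencies come out in cycles per quarter, so the period in months is 3 divided by the frequency. The function names and the synthetic series are hypothetical.

```python
import numpy as np

def periodogram(series):
    """Raw periodogram of an evenly spaced series; frequencies in cycles per sample."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()                          # remove the mean so the zero-frequency bin is empty
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0)
    psd = np.abs(spectrum) ** 2 / len(x)
    return freqs[1:], psd[1:]                 # drop the zero-frequency bin

def dominant_periods_months(series, k=2, months_per_sample=3):
    """Periods (months) of the k strongest spectral peaks; 3 months per quarterly sample."""
    freqs, psd = periodogram(series)
    top = np.argsort(psd)[::-1][:k]
    return [months_per_sample / f for f in freqs[top]]

# Synthetic 40-year quarterly DO series: annual cycle, weaker semi-annual cycle, noise
rng = np.random.default_rng(0)
t = np.arange(160)
do = 7.0 + 1.0 * np.cos(2 * np.pi * t / 4) + 0.3 * np.cos(np.pi * t) + rng.normal(0, 0.2, t.size)
print([round(p, 1) for p in dominant_periods_months(do)])
```

With quarterly averages, a 6-month cycle sits at the Nyquist frequency of 0.5 cycles per quarter, the highest frequency the series can resolve, while the annual cycle falls at 0.25 cycles per quarter.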
Brazilian legal regulation. Brazilian fresh waters are divided into four classes, depending on the intended use 76 . The Special Class is intended mainly for the preservation of the natural balance of aquatic communities in fully protected conservation areas. Class 1 is designed for human consumption supply, after simplified treatment, for the protection of aquatic communities and for primary contact recreation. Class 2 requires conventional treatment for human consumption. Class 3 requires conventional or advanced treatment for human consumption and can be used to feed animals and irrigate some crops. Class 4 is intended only for navigation and landscape harmony. It is important to note that the Framework refers to the required water quality target according to water uses. The river basin committees are responsible for implementing the Framework, in accordance with the Brazilian National Water Resources Policy 33 . As long as the Framework is not established by the basin committee, fresh waters will be considered class 2 (Art. 42 CONAMA 357/2005) 76 .

Data availability
All data generated or analysed during this study are included in this published article and its Supplementary Information files.