Introduction

Air is vital and its quality is measured in the most strategic places in cities1,2. Particulate matter (PM) with an aerodynamic diameter \(<10 \upmu \text{m}\) represents a global health problem3 with short or long term effects4,5. The fixation of this pollutant in the upper respiratory system depends on the inhalation flow rate6, causing cardiovascular diseases7 and mortality due to lung cancer8. Megacities are the most affected due to industrialization, energy consumption and high vehicular flow, generating social costs9.

The World Health Organization (WHO)10 has established PM\(_{10}\) concentration threshold values for 24-h (50 \(\upmu\)g/m\(^{3}\)) and annual (20 \(\upmu\)g/m\(^{3}\)) average measurements, which are used as a reference in air quality monitoring programs. Meteorological monitoring also contributes to measuring the effects of meteorological variables on air quality and the environment11. Relative humidity (RH), temperature (T), atmospheric pressure (AP), wind speed (WS) and wind direction (WD) influence both the distribution and concentration of PM\(_{10}\)12,13,14,15,16. This information is used in predictive models of particulate matter to optimize the control of atmospheric emissions17 and to manage sustainable cities18.

The city of Lima, Peru, is approaching ten million inhabitants. The onset of the COVID-19 pandemic in March 2020 led to the closure of land borders19 and the suspension of international air transport, which improved air quality3,20,21 until the economic reactivation began in May 2020 and restarted the industrial sector and vehicular transport (approximately 22,794,136 daily trips in 2019)22,23.

The Ministerio del Ambiente (MINAM) updated the air quality standards (AQS)24 and established the air quality index (AQI) for PM\(_{10}\) called Índice de Calidad del Aire (INCA)25, but there is no evidence of PM\(_{10}\) prediction models in southern Lima to help improve air quality management and minimize risks to human health. The Dirección General de Salud Ambiental (DIGESA) of the Ministerio de Salud (MINSA) maintains a monitoring network for relative humidity (RH), atmospheric pressure (AP), solar radiation (SR), wind speed (WS), wind direction (WD) and PM\(_{10}\) as part of the air quality health surveillance program in Lima and Callao. This network includes two stations in southern Lima: Santiago de Surco (SS), an urbanized and commercial sector; and San Juan de Miraflores (SJM), a less paved, dusty sector. These two areas of Lima are complete opposites in terms of environmental management characteristics. In addition, SJM borders the “Panamericana Sur” highway, one of the most important roads in the country, and is close to cement factories. Together, the two districts (SS and SJM) have a total of 900,000 inhabitants26.

In this context, statistical models are suitable to describe the complex relationship between PM\(_{10}\) and meteorological variables and to predict its behavior, especially in urban areas. This research applies statistical approaches to the prediction of PM\(_{10}\) before and during the COVID-19 pandemic in Lima. Moreover, there are a limited number of studies in Lima, which is one of the cities with the highest pollution levels in South America27,28,29. Predicting air quality with high accuracy can be problematic, but these tools are becoming increasingly important because they provide comprehensive information to prevent critical pollution episodes and reduce human exposure to these pollutants30,31.

The objective of this research is to contribute to the environmental management of air quality through the application of a simple and effective statistical modeling of air quality related to PM\(_{10}\) concentration levels, based on meteorological parameters and with scientific support that allows authorities to optimize decision making in the control of air pollution and risks to human health. The contributions of the research are summarized below:

  • Implementation of statistical modeling for time series data during 2019–2020, at two meteorological and air quality monitoring locations in southern Lima.

  • A three-dimensional PM\(_{10}\) forecasting model based on time series, applied for the first time in South Lima, together with a principal component analysis (PCA) to evaluate the effect of meteorological variables on the behavior of PM\(_{10}\).

The rest of the paper is structured as follows: “Literature review” presents the studies that precede this research. “Materials and methods” describes the methodology, which is based on statistical modeling approaches. “Results and discussion” presents the main findings of this research compared to other studies. Finally, “Limitations” and “Conclusions” provide the main conclusions, together with some recommendations for future research.

Literature review

The review of studies with applications for modeling PM\(_{10}\) as a function of meteorological variables did not produce information for southern Lima (SJM and SS stations), but there is some background information for other districts of the city of Lima. For example, air quality was evaluated at 3 points in North Lima, using the gray grouping method32. In addition, the dependence of particulate matter in air on meteorological parameters was studied in the town of Zárate in Lima33. Also, PM concentrations were analyzed one month before the pandemic (February 2020) and at the beginning of the pandemic (March-April 2020)21. In another investigation, a multivariate regression model based on internodal correlations ranging from 0.31 to 0.49 was implemented to analyze air quality at other stations in Lima, using a low-cost sensor with IIoT technology18. Similarly, the Weather Research and Forecasting-Chem (WRF-Chem) model was applied to develop an operational forecast system for air quality in the Metropolitan Area of Lima and Callao (MALC)12. Air quality for PM\(_{10}\) was also evaluated in other districts of Lima through the application of artificial neural networks (ANN), which showed certain difficulties in prediction at stations with critical pollution episodes34.

Applications of PM\(_{10}\) models on meteorological variables are related to the use of algorithms and functional relationships35. In China, meteorological conditions were related to interannual PM variations through correlations using robust ANN prediction models36. In Iran, meteorological factors were related to the PM\(_{10}\)/PM\(_{2.5}\) ratio using the AirQ+ model, calculating positive correlations between relative humidity (RH) and temperature (T), and negative correlations with precipitation and wind speed (WS)37. However, one of the most commonly used models is the multiple linear regression (MLR) model16,38. For example, in Bulgaria, MLR was applied relating AP-RH-T-T-PM\(_{10}\), reporting moderate correlations and indicating that more factors affecting PM\(_{10}\) concentration should be included15. In China, PM\(_{10}\) and PM\(_{2.5}\) were related to meteorological variables by linear and single-factor exponential regressions, and determined robust models of negative relationship for WS-PM\(_{10}\) and positive relationships for T-PM\(_{10}\) in warm seasons, but not in cool seasons14. Also, ANN and MLR models were compared for meteorological variable-PM\(_{10}\) combinations without obtaining significant correlations (p >0.05) for RH39. Another study combined the dew point (DP)-T variables on an integral Moivre-Laplace model to predict PM\(_{10}\) and PM\(_{2.5}\)40. Similarly, the T-WS-AP-RH-PM\(_{10}\) factors were used, showing limitations in the single factor models, and instead applied 3D graph fits to obtain better prediction results13. In the present research, the methodology proved to be simple and robust and was based on the statistical modeling approach to predict PM\(_{10}\) concentration and contribute to generate new tools for air quality management, aerosol control and prevention of risks associated with COVID-19.

Materials and methods

Area of study and dataset

Lima, the capital of Peru, has a flat morphology formed by the valleys of the Chillón, Rímac and Lurín rivers. Its meteorological variables are influenced by the relief of the Andes mountain range, the cold Humboldt current and the South Pacific anticyclone (SPA), which generate microclimates in the city41 and variations in the altitude of the thermal inversion (TI) base that influence PM\(_{10}\) dispersion.

The first monitoring station is located at a MINSA facility in the SS district (12\(^{\circ }\)8’43.47”S 76\(^{\circ }\)59’50.07”W), and the second is located at the Hospital María Auxiliadora in the SJM district (12\(^{\circ }\)9’41.33”S 76\(^{\circ }\)57’32.36”W). The monitoring generates continuous information and real-time published data for PM\(_{10}\) (\(\upmu\)g/m\(^{3}\)), temperature-T (\(^{\circ }\)C), dew point-DP (\(^{\circ }\)C), atmospheric pressure-AP (hPa), solar radiation-SR (W/m\(^2\)), wind speed-WS (m/s) and wind direction-WD (degrees, \(^{\circ }\)). The map of southern Lima, Peru (see Fig. 1) shows the monitoring stations evaluated in the period 2019–2020. This map was prepared using ArcGIS 10.4.1 software, with shapefiles of Peru, including the Pacific Ocean.

Figure 1

Location map and wind rose for SS and SJM, period 2019–2020. Map created in ArcGIS software version 10.4.1 (https://desktop.arcgis.com/es/arcmap/10.4/get-started/setup/arcgis-desktop-quick-start-guide.htm).

As Fig. 1 shows, both monitoring stations correspond to coastal and surrounding urban areas in southern Lima. The district of San Juan de Miraflores is more desert-like and sandy, with prevailing coastal winds, intense vehicular traffic and dust resuspension42, and it is close to limestone, clay and gypsum quarries. The predominant wind directions were South (S), South South West (SSW) and South East (SE), with intensities between 0.9 and 2.5 m/s for SJM, apart from one isolated anomalous value in the spring (4.2 m/s), and between 0.3 and 2.4 m/s for SS.

Equipment and materials

The calibrated automatic equipment (zero air equipment and argon gas dilution equipment) was installed between 1.5 and 10 m above the ground and connected to a 220 V supply, with no solid barriers within 10 m to ensure adequate air flow. The Campbell Scientific weather station was calibrated to measure each meteorological variable. Data obtained from January 1, 2019 to December 31, 2020 were processed by real-time telemetry using AirMetReport software. Glass fiber filters of 25 mm diameter were used, placed daily inside the low volume test equipment (PM\(_{10}\) beta gauge) with automatic analysis of PM\(_{10}\) concentration in \(\upmu\)g/m\(^{3}\). Monitoring was performed according to the Protocol for Air Quality Monitoring and Data Management43. The data were validated by DIGESA and are published on the institutional web page.

Statistical procedure

A total of 188,859 valid hourly records (T, DP, AP, SR, RH, WD, WS and PM\(_{10}\)) generated between 2019 and 2020 were used. For 2019 only 83.5% of the data were used, while for 2020 only 51.13% were used. The 2019 data were used to develop and calibrate the models, and the 2020 data were used to validate the models obtained with the 2019 data. The hourly data were used for plotting temporal variability and for statistical analysis. The wind rose analysis was prepared using WRPLOT View 8.0.2 software. The input data were processed in Origin-Pro 8.0 software, generating the following:

  • Analysis of the correlation between meteorological factors and PM\(_{10}\).

  • Application of PCA and dimensionality reduction of the interrelated variables by linear transformation of the input vectors from high to low dimension, and generation of uncorrelated components by reducing the number of predictor variables through the correlation matrix (M):

    $$\begin{aligned} |M-\lambda I| = 0 \end{aligned}$$
    (1)

    where \(\lambda\) is an eigenvalue and I is the identity matrix. Multiplying \(\lambda\) by a nonzero eigenvector E generates the correspondence \({C_{e}}\):

    $$\begin{aligned} C_{e}=\lambda E \end{aligned}$$
    (2)

    Thus, the proportion of variance explained by the j-th principal component \(\text{PC}\) is given as:

    $$\begin{aligned} \text{Variance }=\frac{\lambda _{j}}{\Sigma _{n} \lambda _{n}} \end{aligned}$$
    (3)

    The calculated principal components (PC) are the linear combinations of variables with the largest \(\lambda\), that is, those capturing the highest data variability; multiplying by the eigenvectors transforms the original set into an orthogonal one44.

  • PM\(_{10}\) modeling using a single meteorological variable.

  • PM\(_{10}\) modeling using MLR, defined by:

    $$\begin{aligned} Y_{i}=\beta _{0}+\beta _{1} X_{1 i}+\cdots +\beta _{k} X_{k i}+\varepsilon _{i},\quad i=1,2, \ldots , n \end{aligned}$$
    (4)

    where \(Y_i\) is the dependent variable; \(\beta _0, \beta _1, \ldots , \beta _k\) are \(k+1\) constant parameters; \(X_1, \ldots , X_k\) are k independent variables; \(\varepsilon _i\) are independent, identically distributed Gaussian errors; and n is the number of observations39.

  • Modeling of PM\(_{10}\) using the combination of 2 meteorological variables by 3D surface fitting, which is an extension of the ordinary nonlinear fitting for both XYZ and matrix data. The method consisted of converting all data to Log10, assigning the meteorological variables (independent) to the X, Y axes and the PM\(_{10}\) (dependent) to the Z axis. The worksheet was converted to matrix using the grid method and random parameters of 24 columns by 24 rows. Different models based on the Levenberg-Marquardt algorithm (damped least squares) were tested taking into account the metrics defined in this item, generating the following functions:

    Extreme Cum: Non-linear Extreme Value Cumulative Function

    $$\begin{aligned} z = z_0 + B * \exp \left\{ -\exp \left\{ \frac{C-x}{D}\right\} \right\} +E * \exp \left\{ -\exp \left\{ \frac{F-y}{G}\right\} \right\} +H \exp \left\{ -\exp \left\{ \frac{C-x}{D}\right\} -\exp \left\{ \frac{F-y}{G}\right\} \right\} \end{aligned}$$
    (5)

    Voigt2DMod: The Voigt surface with volume as a parameter, where \(z_0\), A, \(x_c\), \(w_1\), \(y_c\), \(w_2\), \(m_u\) are constant parameters.

    $$\begin{aligned} z=z_{0}+A\left[ m_{u} \frac{4}{\pi ^{2}} * \frac{w_{1}}{4\left( x-x_{c}\right) ^{2}+w_{1}^{2}} * \frac{w_{2}}{4\left( y-y_{c}\right) ^{2}+w_{2}^{2}}+\left( 1-m_{u}\right) \frac{4 \ln 2}{\pi w_{1} w_{2}} e^{-\frac{4 \ln 2}{w_{1}^{2}}\left( x-x_{c}\right) ^{2}-\frac{4 \ln 2}{w_{2}^{2}}\left( y-y_{c}\right) ^{2}}\right] \end{aligned}$$
    (6)

    Poly2D: Two-dimensional polynomial function

    $$\begin{aligned} z=z_0+a x+b y+c x^{2}+d y^{2}+f x y \end{aligned}$$
    (7)

    For these tests, a 95% confidence level, the correlation coefficient (r) and the coefficient of determination (R\(^2\)) were considered45,46.
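As an illustration of the surface-fitting step, the sketch below fits the Poly2D function of Eq. (7) to log10-transformed data. It is a minimal sketch and not the Origin-Pro procedure used in the study: because Poly2D is linear in its coefficients, ordinary least squares (normal equations) is used here in place of the Levenberg-Marquardt algorithm. The function names, the input values and the "true" coefficients are hypothetical and serve only to check that the fit recovers a known surface.

```python
import math


def poly2d_features(x, y):
    """Terms of Eq. (7): 1, x, y, x^2, y^2, x*y."""
    return [1.0, x, y, x * x, y * y, x * y]


def solve_linear(a, b):
    """Solve a * p = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    p = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * p[c] for c in range(r + 1, n))
        p[r] = (aug[r][n] - s) / aug[r][r]
    return p


def fit_poly2d(xs, ys, zs):
    """Least-squares estimate of [z0, a, b, c, d, f] via the normal equations."""
    rows = [poly2d_features(x, y) for x, y in zip(xs, ys)]
    k = 6
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atz = [sum(r[i] * z for r, z in zip(rows, zs)) for i in range(k)]
    return solve_linear(ata, atz)


def predict_poly2d(params, x, y):
    """Evaluate Eq. (7) at (x, y), both already in log10 units."""
    return sum(p * f for p, f in zip(params, poly2d_features(x, y)))


# Hypothetical synthetic data: two predictors on a 4 x 4 grid, log10-transformed,
# with log10(PM10) generated from known (assumed) coefficients.
true_params = [1.5, 0.3, 0.2, -0.1, 0.05, 0.02]  # z0, a, b, c, d, f
raw_x = [0.5, 1.0, 2.0, 4.0]
raw_y = [10.0, 20.0, 40.0, 80.0]
xs, ys, zs = [], [], []
for vx in raw_x:
    for vy in raw_y:
        lx, ly = math.log10(vx), math.log10(vy)
        xs.append(lx)
        ys.append(ly)
        zs.append(predict_poly2d(true_params, lx, ly))

params = fit_poly2d(xs, ys, zs)
log_pm10 = predict_poly2d(params, math.log10(1.5), math.log10(30.0))
print("fitted coefficients:", [round(p, 4) for p in params])
print("predicted PM10 (back-transformed):", round(10 ** log_pm10, 2))
```

The Extreme Cum and Voigt2DMod functions are nonlinear in their parameters, so they would require an iterative solver such as Levenberg-Marquardt rather than this closed-form solution.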

Performance metrics

Analysis of prediction performance involves calculating the errors between the observed and predicted values. Four statistical metrics were used to compare the performance of the models:

  1. Pearson correlation coefficient (r):

    $$\begin{aligned} \text{r}=\frac{\sum _{i=1}^{n}\left( Y_{o}^{i}-\bar{Y}_{o}\right) \left( Y_{m}^{i}-\bar{Y}_{m}\right) }{\sqrt{\sum _{i=1}^{n}\left( Y_{o}^{i}-\bar{Y}_{o}\right) ^{2}} \cdot \sqrt{\sum _{i=1}^{n}\left( Y_{m}^{i}-\bar{Y}_{m}\right) ^{2}}} \end{aligned}$$
    (8)

  2. Coefficient of determination (R\(^{2}\)):

    $$\begin{aligned} \text{R}^{2}=\frac{\left[ \sum _{i=1}^{n}\left( Y_{o}^{i}-\bar{Y}_{o}\right) \left( Y_{m}^{i}-\bar{Y}_{m}\right) \right] ^{2}}{\sum _{i=1}^{n}\left( Y_{o}^{i}-\bar{Y}_{o}\right) ^{2} * \sum _{i=1}^{n}\left( Y_{m}^{i}-\bar{Y}_{m}\right) ^{2}} \end{aligned}$$
    (9)

  3. Root Mean Squared Error (RMSE):

    $$\begin{aligned} \text{RMSE}=\sqrt{\frac{\sum _{i=1}^{n}\left( Y_{m}^{i}-Y_{o}^{i}\right) ^{2}}{n}} \end{aligned}$$
    (10)

  4. Nash-Sutcliffe Efficiency (NSE):

    $$\begin{aligned} \text{NSE}=1-\frac{\sum _{i=1}^{n}\left( Y_{m}^{i}-Y_{o}^{i}\right) ^{2}}{\sum _{i=1}^{n}\left( Y_{o}^{i}-\bar{Y}_{o}\right) ^{2}}, \quad -\infty <\text{NSE} \le 1.0 \end{aligned}$$
    (11)

where \(Y_{o}^{i}\) and \(Y_{m}^{i}\) stand for the observed (target) and model-predicted values, respectively, \(\bar{Y}_{o}\) and \(\bar{Y}_{m}\) are their mean values and n represents the number of observations.
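The four metrics of Eqs. (8)-(11) can be sketched in a few lines of Python. This is an illustrative sketch using only the standard library, not the software used in the study; the function names and the sample observed and predicted values are hypothetical.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def pearson_r(obs, mod):
    """Eq. (8): Pearson correlation between observed and modeled values."""
    mo, mm = _mean(obs), _mean(mod)
    cov = sum((o - mo) * (m - mm) for o, m in zip(obs, mod))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sm = math.sqrt(sum((m - mm) ** 2 for m in mod))
    return cov / (so * sm)


def r_squared(obs, mod):
    """Eq. (9): squared Pearson correlation."""
    return pearson_r(obs, mod) ** 2


def rmse(obs, mod):
    """Eq. (10): root mean squared error, in the units of the data."""
    return math.sqrt(sum((m - o) ** 2 for o, m in zip(obs, mod)) / len(obs))


def nse(obs, mod):
    """Eq. (11): Nash-Sutcliffe efficiency, at most 1.0 (perfect fit)."""
    mo = _mean(obs)
    num = sum((m - o) ** 2 for o, m in zip(obs, mod))
    den = sum((o - mo) ** 2 for o in obs)
    return 1.0 - num / den


# Hypothetical PM10 values in ug/m3, for illustration only.
observed = [40.0, 55.0, 62.0, 48.0, 71.0]
predicted = [42.0, 53.0, 60.0, 50.0, 68.0]

print("r    =", round(pearson_r(observed, predicted), 4))
print("R2   =", round(r_squared(observed, predicted), 4))
print("RMSE =", round(rmse(observed, predicted), 4))
print("NSE  =", round(nse(observed, predicted), 4))
```

An NSE of 1 indicates a perfect match, while values at or below 0 indicate that the model predicts no better than the mean of the observations.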

Air quality index (AQI) and air quality standards (AQS)

In Peru, the 24-h AQS for PM\(_{10}\) (100 \(\upmu\)g/m\(^{3}\)) should not be exceeded more than 7 times a year and the annual arithmetic mean should not exceed 50 \(\upmu\)g/m\(^{3}\). The AQI PM\(_{10}\) was calculated from the 24-h AQS PM\(_{10}\) and the alert threshold value (150 \(\upmu\)g/m\(^{3}\)) according to the following expression:

$$\begin{aligned} I(\text{PM}_{10})=\frac{\left[ \text{PM}_{10} \,\upmu \text{g} / \text{m}^{3}\right] \times 100 \,\upmu \text{g} / \text{m}^{3}}{150 \,\upmu \text{g} / \text{m}^{3}} \end{aligned}$$
(12)

where I(PM\(_{10}\)) expresses the calculated AQI PM\(_{10}\), and the value inside the parenthesis is the observed PM\(_{10}\). Table 1 shows the national AQI classification (INCA).
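A minimal sketch of Eq. (12) follows. The function name and the example concentrations are hypothetical; only the scaling factor (100) and the alert threshold (150 \(\upmu\)g/m\(^{3}\)) come from the expression above.

```python
def inca_pm10(pm10_ug_m3, alert_threshold=150.0):
    """Eq. (12): scale a PM10 concentration so that 150 ug/m3 maps to 100."""
    if pm10_ug_m3 < 0:
        raise ValueError("PM10 concentration cannot be negative")
    return pm10_ug_m3 * 100.0 / alert_threshold


# Hypothetical 24-h concentrations in ug/m3, for illustration only.
for concentration in (30.0, 75.0, 150.0):
    print(concentration, "ug/m3 -> index", round(inca_pm10(concentration), 1))
```

The resulting index is then compared against the class limits of the national classification to assign an air quality category.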

Table 1 Air quality index values.

Results and discussion

Meteorological variations and seasonal correlations

Figure 2 shows the average monthly distribution of meteorological variables in SS and SJM during 2019. The monthly averages of meteorological variables evidenced that:

  • T, DP and SR variables are higher in the austral summer (January, February and March), while RH and AP have low monthly averages due to the proximity of the sun to the earth.

  • In winter, especially in July and August, there is a decrease in SR and T (including DP), and the values of AP and RH increase.

Figure 2

Distribution of the monthly mean values of the meteorological variables at stations SS and SJM, between January and December 2019, Lima-Peru: (a) WS (m/s), (b) WD (degrees), (c) T (\(^{\circ }\text{C}\)), (d) DP (\(^{\circ }\text{C}\)), (e) SR (W/m\(^{2}\)), (f) AP (hPa) and (g) RH (%).

WS varied slightly from \(1.06 \,\text{m} / \text{s}\) in January \((\text{SS})\) to \(1.93 \,\text{m} / \text{s}\) in December \((\text{SJM})\) (Fig. 2a), with wind patterns in a south-easterly direction for SS and south-westerly for SJM, especially in April (Fig. 2b). Temperature ranged between \(15.11^{\circ }\text{C}\) in winter (August-SJM) and \(25.56^{\circ }\text{C}\) in summer (February-SS) (see Fig. 2c), and DP fluctuated between \(12.61{ }^{\circ } \text{C}\) (August-SS) and \(21.18{ }^{\circ } \text{C}\) (February-SJM) (see Fig. 2d).

The annual means of T and DP in both districts showed that \(\text{T}_{\text{SS}}>\text{DP}_{\text{SS}}\) (3.65\(^{\circ }\text{C}\)) and \(\text{T}_{\text{SJM}}>\text{DP}_{\text{SJM}}\) (2.59\(^{\circ }\text{C}\)), but SS was warmer than SJM: \(\text{T}_{\text{SS}}>\text{T}_{\text{SJM}}\) (0.26\(^{\circ }\text{C}\)) and \(\text{DP}_{\text{SJM}}>\text{DP}_{\text{SS}}\) (0.68\(^{\circ }\text{C}\)). Likewise, SR was higher in summer (394.36 \(\,\text{W}/\text{m}^{2}\), February-SS) and lower in winter (80.26 \(\,\text{W}/\text{m}^{2}\), July-SJM) (Fig. 2e), with annual means \(\text{SR}_{\text{SS}}>\text{SR}_{\text{SJM}}\) (34.1 \(\,\text{W}/\text{m}^{2}\)), indicating that the Peruvian central coastal strip presents greater annual variations of solar energy received over the surface47. In contrast, AP (see Fig. 2f) and RH (see Fig. 2g) showed an inverse behavior and particular patterns (\(\text{RH}_{\text{SS}}<\text{RH}_{\text{SJM}}\) and \(\text{AP}_{\text{SS}}>\text{AP}_{\text{SJM}}\)), with mean monthly values of RH ranging between 69.8% (Summer-March, SS) and 91.94% (Winter-July, SJM), while AP fluctuated between 994.99 hPa (February, SJM) and 1004.73 hPa (September, SS), with maximum peaks in the winter months. Figure 3 shows the temporal evolution of PM\(_{10}\) in Santiago de Surco and San Juan de Miraflores during 2019.

Figure 3

Temporal variation of PM\(_{10}\): (a) Hourly mean in SS; (b) Hourly mean in SJM; (c) Daily behavior of PM\(_{10}\) throughout the week; (d) Monthly mean; (e) Percentage of AQI values in SJM; (f) Percentage of AQI values in SS. Period 2019.

Hourly, daily, weekly and monthly variation of PM\(_{10}\) concentration

Figure 3a,b show lower PM\(_{10}\) hourly averages in the early morning (1:00 a.m. and 5:00 a.m.), associated with higher humidity and weaker winds that kept aerosols suspended. As the hours passed, \(\text{T}\) increased and RH decreased, concentrating PM\(_{10}\) in the air column, especially at peak hours in the morning (9:00 a.m. and 12:00 p.m.) and at night (6:00 p.m. and 9:00 p.m.). In SS, PM\(_{10}\) at peak hours followed the seasonal order: spring \(\left( 60.38 \,\upmu \text{g} / \text{m}^{3}\right)>\) autumn \(\left( 49.64 \,\upmu \text{g} / \text{m}^{3}\right)>\) winter \(\left( 44.73 \,\upmu \text{g} / \text{m}^{3}\right)>\) summer \(\left( 40.92 \,\upmu \text{g} / \text{m}^{3}\right)\), and in \(\text{SJM}\): summer \(\left( 117.9 \,\upmu \text{g} / \text{m}^{3}\right)>\) autumn \(\left( 107.8 \,\upmu \text{g} / \text{m}^{3}\right)>\) spring \(\left( 102.1 \,\upmu \text{g} / \text{m}^{3}\right)>\) winter \(\left( 91.3 \,\upmu \text{g} / \text{m}^{3}\right)\). The intense activity of the Lima-Callao vehicle fleet48, dust resuspension and industrial activities generate this effect.

Likewise, Fig. 3c shows the daily averages of PM\(_{10}\), in SS and SJM. Lower values were recorded on Saturdays and Sundays (SS: \(29 \upmu \text{g} / \text{m}^{3}\) and \(\text{SJM}: 54.58 \upmu \text{g} / \text{m}^{3}\) ), due to formal and student work breaks, and decreased activity of vehicular mobile sources48. For the other days of the week, the values were higher (Wednesday in SS: 46 \(\upmu \text{g} / \text{m}^{3}\), in spring; and Monday in SJM: 97.94 \(\upmu \text{g} / \text{m}^{3}\), in autumn).

On the other hand, the monthly mean values for PM\(_{10}\) (Fig. 3d) ranged from 32.6 \(\upmu \text{g} / \text{m}^{3}\) (February-SS) to 92.8 \(\upmu \text{g} / \text{m}^{3}\) (May-SJM). For SJM, post hoc tests showed a significant difference \((\text{p}=0.0)\) between concentrations recorded during the warm austral summer-autumn months (PM\(_{10}\): 89.6 to 92.8 \(\upmu \text{g} / \text{m}^{3}\)) and the cool winter-spring months (PM\(_{10}\): 73 to 79.1 \(\upmu \text{g} / \text{m}^{3}\)). The decrease in TI base altitude in the coastal summer-autumn coincided with the highest PM\(_{10}\) concentrations.

Air quality indexes

In 2019, the AQI values in South Lima (Fig. 3e,f) were 49.3 \(\%\) good quality \(\left( \text{PM}_{10}<76 \,\upmu \text{g} / \text{m}^{3}\right) >44.1 \%\) moderate quality \(\left( 76-150 \,\upmu \text{g} / \text{m}^{3}\right) >6.5 \%\) poor quality (101-250 \(\upmu \text{g} / \text{m}^{3}\) threshold state of care). In addition, differences were found between the SS and SJM AQIs. The AQI\(_{\text{SJM}}\) values were \(77 \%\) of moderate quality \(>14 \%\) of poor quality \(>8.8 \%\) of good quality, while the AQI\(_{\text{SS}}\) values were \(86 \%\) of good quality \(>14 \%\) of moderate quality. On the other hand, \(13 \%\) of hourly \(\text{PM}_{10\text{SS}}\) values exceeded the WHO reference value \(\left( 50 \,\upmu \text{g} / \text{m}^{3}\right)\), but their annual mean \(\left( 40 \,\upmu \text{g} / \text{m}^{3}\right)\) did not exceed the national annual AQS \(\left( 50 \,\upmu \text{g} / \text{m}^{3}\right)\), while for \(\text{SJM}\), \(100 \%\) of hourly data exceeded the WHO reference. Likewise, \(15 \%\) of hourly \(\text{PM}_{10\text{SJM}}\) values exceeded the current national AQS \(\left( 100 \,\upmu \text{g} / \text{m}^{3}\right)\) and the annual mean \(\left( 78.7 \,\upmu \text{g} / \text{m}^{3}\right)\) exceeded the annual AQS \(\left( 50 \,\upmu \text{g} / \text{m}^{3}\right)\). Both locations therefore present a potential risk of respiratory and cardiovascular diseases, especially SJM, due to the daily exposure of people to higher PM\(_{10}\) levels.

Meteorological and PM\(_{10}\) variations in 2020 during the COVID-19 pandemic

Due to the pandemic, annual monitoring of meteorological variables in SJM was suspended between April and June and PM\(_{10}\) was monitored in January, February, September and October. In SS, meteorological monitoring was continuous and PM\(_{10}\) was monitored in February and July-December. On the other hand, the increase in monthly average SR in South Lima was notable, as \(\text{SR}_{\text{SJM2020}}(274.74 \,\text{W}/\text{m}^{2})>\text{SR}_{\text{SJM2019}}(227.37 \,\text{W}/\text{m}^{2})\) and \(\text{SR}_{\text{SS2020}}\) \(\left( 307.68 \,\text{W} / \text{m}^{2}\right) >\text{SR}_{\text{SS} 2019}\left( 258.68 \,\text{W} / \text{m}^{2}\right)\). The wind pattern had a dominant east-southeast direction for SS and a southwest direction for SJM. In 2020, the comparison of PM\(_{10}\) concentrations for SJM corresponding to the months of each year showed no significant differences, while for SS a 25% increase in PM\(_{10}\) concentration was observed. The pandemic did not reduce PM\(_{10}\) levels in 2020 as expected, because major emission sources remained in operation during that time49. The monitoring results are shown in Fig. 4.

Figure 4

Distribution of the monthly mean values of the meteorological variables at the SS and SJM stations, between January and December 2020: (a) WS (m/s), (b) WD (degrees), (c) T (\(^{\circ }\text{C}\)), (d) DP (\(^{\circ }\text{C}\)), (e) AP (hPa) and (f) RH (%), (g) SR (W/m\(^2\)) and (h) PM\(_{10}\) (\(\upmu\)g/m\(^{3}\)).

Correlations between meteorological variables and PM\(_{10}\)

Table 2 shows the correlations determined for the meteorological variables and PM\(_{10}\) data observed in 2019. Statistically significant values \((\text{p}<0.05)\), with strong and moderate magnitudes, are shown in bold: \(\text{r}_{\text{T-DP}}(0.95788)>\text{r}_{\text{T-SR}}(0.72471)>\text{r}_{\text{RH-SR}}(-0.66936)>\text{r}_{\text{RH-T}}(-0.6484)>\text{r}_{\text{DP-SR}}(0.61907)\). Regarding PM\(_{10}\), the correlations were moderate and weak, in the order: \(\text{r}_{\text{WD}-\text{PM}_{10}}(0.48192)>\text{r}_{\text{WS}-\text{PM}_{10}}(0.40526)>\text{r}_{\text{AP}-\text{PM}_{10}}(-0.39443)>\text{r}_{\text{RH}-\text{PM}_{10}}(0.18348)>\text{r}_{\text{DP}-\text{PM}_{10}}(0.13796)\). Likewise, Fig. 5 shows the fluctuations of each meteorological variable and PM\(_{10}\) concentrations, while Fig. 6 presents the single variable regressions in Cartesian coordinates for southern Lima in 2019.

Table 2 Pearson correlations of the meteorological variables and the annual and seasonal PM\(_{10}\).
Figure 5

Daily variations of meteorological variables and concentrations of PM\(_{10}\), for South Lima in 2019. (a) WS, (b) WD, (c) AP, (d) DP, (e) SR and (f) RH.

Figure 6

Regressions of a single meteorological factor \((p < 0.05)\) and PM\(_{10}\) in South Lima in 2019: (a) AP-PM\(_{10}\) Annual, (b) WD-PM\(_{10}\) Annual, (c) WS-PM\(_{10}\) Annual, (d) RH-PM\(_{10}\) Annual, (e) DP-PM\(_{10}\) Annual, (f) PM\(_{10}\)-SR (summer).

Figure 5a shows the average daily fluctuation of WS in southern Lima, with a U-shaped distribution for SS and SJM, and Fig. 5b shows the fluctuation of WD. Seasonally, the range of velocities was lower for SS \((1.15-1.34 \,\text{m} / \text{s})\) and higher for SJM (1.49-1.86 m/s). Regarding the WS-PM\(_{10}\) relationship, some authors calculated negative correlations13,14. In contrast, this study found positive and direct correlations (annual \(\text{r}_{\text{WS-PM}_{10}}=0.40526\)), especially for summer (\(\text{r}_{\text{WS-PM}_{10}}=0.55296\)) and autumn (\(\text{r}_{\text{WS-PM}_{10}}=0.56274\)). These values are higher than those of other studies reported in Lima34, indicating an influence on PM\(_{10}\) dispersion, resuspension and transport, including its decrease with the simultaneous diminution of wind50. The non-parametric regression produced a coefficient of determination \(\left( \text{R}^{2}=0.182\right)\) close to that of other studies on the same variable13 (Fig. 6c). On the other hand, with respect to annual WD in southern Lima, the correlations were also significant (\(\text{r}_{\text{WD-PM}_{10}}=0.48192\)) (Fig. 5b), in the order: \(\text{r}_{\text{spring}}(0.72592)>\text{r}_{\text{summer}}(0.55989)>\text{r}_{\text{winter}}(0.5149)>\text{r}_{\text{autumn}}(0.08837)\).

Figure 5c shows higher AP values for lower PM\(_{10}\) values, confirmed by their negative correlations, in the order: summer \(\left( \text{r}_{\text{AP}-\text{PM}_{10}}=-0.74682\right)>\) autumn \(\left( \text{r}_{\text{AP}-\text{PM}_{10}}=-0.72994\right)>\) spring \(\left( \text{r}_{\text{AP}-\text{PM}_{10}}=-0.54587\right)>\) annual \(\left( \text{r}_{\text{AP}-\text{PM}_{10}}=-0.39443\right)\) (see Table 2). In that sense, Li points out that low atmospheric pressures associated with downward mass fluxes restrict the upward movement of PM by accumulating them in the air column14. The AP-PM\(_{10}\) statistical adjustment produced a weak but significant regression \(\left( \text{R}^{2}=0.2725, \text{p}<0.05\right)\), a stronger relationship than those calculated for the other variables.

Temperature, dew point and relative humidity were also evaluated in this study. Contrary to Govindasamy’s study, no significant correlations were found for T-PM\(_{10}\), but significant correlations were found for DP-PM\(_{10}\)49. These were weak and direct in winter \(\left( \text{r}_{\text{DP}-\text{PM}_{10}}=0.2957\right)\) and spring \(\left( \text{r}_{\text{DP}-\text{PM}_{10}}=0.184\right)\), and the regressions were also weak \(\left( \text{R}^{2}=0.0349\right)\) (Fig. 6e). According to Szep, if T \(>\text{DP}\), as occurs in southern Lima, the pollutant concentration and atmospheric stability decrease, causing dilution of the pollutant and its partial wet precipitation40. The SS zone presents a greater \(\text{T}-\text{DP}\) difference \(\left( \text{SS}: 3.65^{\circ } \text{C}\right)\) than SJM \(\left( \text{SJM}: 2.59^{\circ } \text{C}\right)\), which would favor a greater dilution of PM\(_{10}\).

On the other hand, it is evident that the high RH in Lima in winter is associated with haze and rainfall events, generating drifting particulate matter (PM\(_{10}\)) that is removed from the air column by wet precipitation40,51. The warm periods (summer) do not usually present rainfall and RH decreases as a characteristic of desert geography, generating a significant T-RH correlation (\(\text{r}_{\text{T-RH}}-\text{summer}=0.67751\) and \(\text{r}_{\text{T-RH}}-\text{summer}=0.80644\)). A more stable atmosphere with a progressive decrease in the altitude of the TI base in January-May (warm periods) produced greater evaporation and accumulation of PM\(_{10}\), especially in SJM. The RH-PM\(_{10}\) correlation presented the following order: spring \(\left( \text{r}_{\text{RH}-\text{PM}_{10}}=0.39192\right)>\) winter \(\left( \text{r}_{\text{RH}-\text{PM}_{10}}=0.35943\right)>\) autumn \(\left( \text{r}_{\text{RH}-\text{PM}_{10}}=0.22361\right)\), which generated a weak regression (\(\text{R}^{2}=0.0475\), Fig. 6d), as found in other international studies (\(\text{r}_{\text{RH}-\text{PM}_{10}}=0.382\) to 0.467)15.

In addition, the solar radiation presented in Fig. 4g shows a marked variability in the trends between SR and PM\(_{10}\), with a weak inverse correlation in summer \((\text{r}=-0.21155)\). This led to the calculation of particular correlations in each zone according to the seasons of the year: summer \(\left( \text{r}_{\text{SS}}=0.15194; \text{r}_{\text{SJM}}=0.009844\right)\), autumn \(\left( \text{r}_{\text{SS}}=0.03957; \text{r}_{\text{SJM}}=0.124166\right)\), winter \(\left( \text{r}_{\text{SS}}=0.28932; \text{r}_{\text{SJM}}=0.5552\right)\) and spring \(\left( \text{r}_{\text{SS}}=0.33422; \text{r}_{\text{SJM}}=0.44675\right)\). The results were congruent with those reported in other investigations; for example, Vardoulakis and Kassomenos also reported weak correlations in European cities during warm \((\text{r}=0.02\) to \(0.06)\) and cool \((\text{r}=0.11\) to \(0.38)\) months52. On the other hand, direct combinations of high PM\(_{10}\)-SR or low PM\(_{10}\)-SR values would be infrequent and suggest that high PM\(_{10}\) concentrations could decrease SR intensity53. Indeed, the study showed reductions in SR in this coastal area of the southern solstice circle54 of between \(11.5\%\) (summer) and \(25.3 \%\) (winter) for elevated seasonal averages of PM in SJM (73.14 to \(87.21 \,\upmu \text{g} / \text{m}^{3}\)) compared to SS, as mineral dust aerosols scatter and absorb some of the SR reaching land53.

Multivariate relationships using the PCA

Table 3 shows the principal component analysis. Eigenvalues were produced for three principal components (\(\text{PC}\)) that explained between 80% and 88% of the total variance of PM\(_{10}\) concentrations. As the PC factor is the square of the factor loading, it has been interpreted as the equivalent of the coefficient of determination. The PM\(_{10}\) variances in summer \((83\%)\), autumn \((88\%)\), winter \((80\%)\) and spring \((88\%)\) showed moderate loadings of the variables with only one factor55.

In summer, factors PC1 (RH-AP) and PC2 (T-DP) explained \(67\%\) of the variance, with descending wind flows and low pressure levels favoring increased humidity and atmospheric stability and decreasing the altitude of the TI base and thermal gradients (1.1 to \(0.9^{\circ } \text{C} / 100 \,\text{m}\)), increasing PM\(_{10}\)56. In autumn, factors PC1 (RH-T-DP) and PC2 (AP-WS) explained \(71\%\) of the variance: in May, atmospheric stability prevented vertical development of the mixing layer and maintained high PM\(_{10}\) levels, but in June coastal winds, humidity, thermal gradient (\(2.5^{\circ } \text{C} / 100 \,\text{m}\)) and TI base altitude (756.6 m) intensified56, driving aerosol dispersion. In winter, factors PC1 (RH-WS-WD) and PC2 (T-DP) explained \(64\%\) of the variance, with higher thermal gradients (2.6 to \(3.4^{\circ } \text{C} / 100 \,\text{m}\)), wind intensity, humidity and TI base altitude (\(>750 \,\text{m}\)) favoring higher PM\(_{10}\) dispersion56. In spring, factors PC1 (AP-WD) and PC2 (RH-T-DP) explained \(76\%\) of the variance, producing greater atmospheric instability without significant humidity inputs, together with temperature increases that warmed the surface and favored the dispersion of the pollutant56. Figure 7 shows the two main factors (PC1 and PC2) that accounted for the highest percentages of variance in the PCA for each season of 2019.
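
As an illustration of how eigenvalues, percentages of variance and loadings of the kind reported in Table 3 can be obtained, the following is a minimal sketch of a PCA on the correlation matrix. It assumes standardized variables and uses random synthetic data; the function name `pca_from_correlation` and the data are illustrative assumptions, not the study's procedure or measurements.

```python
import numpy as np


def pca_from_correlation(X):
    """PCA on standardized columns of X (rows = observations).

    Returns eigenvalues (descending), the fraction of variance each
    component explains, and the loadings matrix (variables x components).
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = (Z.T @ Z) / (X.shape[0] - 1)        # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)     # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]        # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return eigvals, explained, loadings


# Synthetic stand-in for six meteorological variables (e.g. RH, T, DP, AP,
# WS, WD); the numbers are random and purely illustrative.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
X = np.column_stack([
    base[:, 0] + 0.3 * rng.normal(size=200),
    -base[:, 0] + 0.3 * rng.normal(size=200),
    base[:, 0] + 0.5 * rng.normal(size=200),
    base[:, 1] + 0.3 * rng.normal(size=200),
    base[:, 1] + 0.5 * rng.normal(size=200),
    rng.normal(size=200),
])

eigvals, explained, loadings = pca_from_correlation(X)
print("eigenvalues:", np.round(eigvals, 3))
print("cumulative variance:", np.round(np.cumsum(explained), 3))
# A squared loading is the share of a variable's variance captured by a PC.
print("squared loadings on PC1:", np.round(loadings[:, 0] ** 2, 3))
```

In this formulation the squared loadings of each variable sum to one across all components, which is why a squared loading can be read like a coefficient of determination.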

Table 3 Principal Components of meteorological variables. EV: Eigenvalue (%); PV: Percentage of variance (%) and CV: Cumulative variance.
Figure 7

Principal components analysis: (a) summer, (b) autumn, (c) winter, (d) spring, (e) annual.

Multiple regression model

The meteorological and PM\(_{10}\) data observed in 2019 produced the following multiple regression model with significant \(\text{p}\)-value \((\text{p}<0.05)\):

$$\begin{aligned} \text{PM}_{10}=612.9611+2.90988\text{RH}+14.68703\text{T}-16.8064\text{DP}-0.88883\text{AP}+22.90704\text{WS}+0.251\text{WD} \end{aligned}$$
(13)

The relative error for the intercept was equal to 217.335, and for the coefficients of the variables it ranged from 0.16989 (AP) to 6.5235 (DP). The coefficient of determination reflected a weak fit (\(\text{R}^{2}=0.3802\)), evidencing the limitations of the method. This result was consistent with the model described by Lin38 for the T-WS-PM\(_{10}\) combination (\(\text{R}^{2}=0.394\)). In contrast, Ceylan did not obtain a significant model in similar tests39. Likewise, Kamarul related RH-T with a MODIS-AOD550 satellite factor to improve the regression (\(\text{R}^{2}=0.66\)), but suggested optimizing it and including WS-WD16.
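
For readers who wish to apply Eq. (13), the sketch below evaluates the regression for one set of predictor values and computes a coefficient of determination for paired series. The coefficients are transcribed from Eq. (13); the function names, the dictionary layout and the sample input are hypothetical, and the units of each predictor are assumed to be those of the original monitoring data.

```python
# Coefficients transcribed from Eq. (13); units are those of the original
# monitoring data, which this sketch does not verify.
INTERCEPT = 612.9611
COEFFS = {
    "RH": 2.90988,
    "T": 14.68703,
    "DP": -16.8064,
    "AP": -0.88883,
    "WS": 22.90704,
    "WD": 0.251,
}


def predict_pm10(obs):
    """Evaluate Eq. (13) for a dict with keys RH, T, DP, AP, WS, WD."""
    missing = set(COEFFS) - set(obs)
    if missing:
        raise KeyError(f"missing predictors: {sorted(missing)}")
    return INTERCEPT + sum(COEFFS[name] * obs[name] for name in COEFFS)


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical input values chosen only to exercise the function.
sample = {"RH": 80.0, "T": 20.0, "DP": 16.0, "AP": 1000.0, "WS": 2.0, "WD": 180.0}
print(round(predict_pm10(sample), 2))
```

Because the intercept is large and is offset mainly by the AP term, small changes in pressure shift the prediction noticeably, which is one reason to keep inputs within the range observed during calibration.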

Three-dimensional models

Multiple variables jointly influence the dilution and diffusion of atmospheric pollutants13. Under this assumption, statistical fitting of 3D surfaces was performed. The results yielded strong and significant regressions (\(\text{R}^{2}>0.75\); \(\text{p}<0.05\)), in the order: RH-AP-PM\(_{10}\) (\(\text{R}^{2}=0.94685\), Fig. 8a) > T-DP-PM\(_{10}\) (\(\text{R}^{2}=0.87646\), Fig. 8b) > AP-WS-PM\(_{10}\) (\(\text{R}^{2}=0.85064\), Fig. 8c) > AP-WD-PM\(_{10}\) (\(\text{R}^{2}=0.77984\), Fig. 8d). These three-dimensional models (see Table 4) correspond with the results obtained in the PCA and indicate that relative humidity and atmospheric pressure largely affect the PM\(_{10}\) concentration, as do temperature and wind action. Compared with the curve fitted under the influence of a single factor and with the MLR, the fitting performance of the functional relationship is higher and confirms that different meteorological factors have different effects on PM\(_{10}\) concentration.
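
To make the idea of a fitted 3D surface concrete, the following sketch fits a plane in log space, \(\log_{10}\text{PM}_{10}=a+b\log_{10}x_1+c\log_{10}x_2\), by ordinary least squares. This functional form is an assumption made only for illustration; the forms actually fitted in the study are those listed in Table 4. The function name and the synthetic data are likewise hypothetical.

```python
import numpy as np


def fit_log_surface(x1, x2, pm10):
    """Least-squares fit of log10(pm10) = a + b*log10(x1) + c*log10(x2).

    Returns the coefficients (a, b, c) and R^2 computed in log space.
    All inputs must be strictly positive.
    """
    lx1, lx2, ly = (np.log10(np.asarray(v, dtype=float)) for v in (x1, x2, pm10))
    A = np.column_stack([np.ones_like(lx1), lx1, lx2])
    coef, *_ = np.linalg.lstsq(A, ly, rcond=None)
    fitted = A @ coef
    ss_res = float(np.sum((ly - fitted) ** 2))
    ss_tot = float(np.sum((ly - ly.mean()) ** 2))
    return coef, 1.0 - ss_res / ss_tot


# Synthetic data generated from known coefficients (illustrative only).
rng = np.random.default_rng(1)
rh = rng.uniform(60.0, 95.0, size=50)      # hypothetical RH values
ap = rng.uniform(990.0, 1015.0, size=50)   # hypothetical AP values
true_a, true_b, true_c = 2.0, -0.5, 0.25
pm10 = 10 ** (true_a + true_b * np.log10(rh) + true_c * np.log10(ap))

coef, r2 = fit_log_surface(rh, ap, pm10)
print(np.round(coef, 4), round(r2, 4))
```

Note that an R\(^2\) obtained this way describes the fit to log-transformed concentrations, not to concentrations in \(\upmu\)g/m\(^{3}\).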

Figure 8

Functional relationships for meteorological bi-variable combinations with PM\(_{10}\) concentration, expressed as: (a) logRH-logAP, (b) logT-logDP, (c) logAP-logWS and (d) logAP-logWD.

Table 4 Three-dimensional models.

Comparing model predictions to monitoring data 2019 and model applied to 2020

Figure 9 shows the data fit between the observed PM\(_{10}\) and the calculated (modeled) PM\(_{10}\) for the year 2019, while Fig. 10 represents the data fit in the 2020 assessment. The calibration of the models, evaluated by the correlation coefficient between the outputs (modeled PM\(_{10}\)) and the observed data, indicates that the MLR performed better than the other models developed (\(\text{r}=0.6166\), Fig. 9a). Also, the three-dimensional function that combined LogAP-LogWD-LogPM\(_{10}\) (Fig. 9b) presented a moderate correlation (\(\text{r}=0.5753\)), unlike the other two 3D models. However, Table 5 shows that the RMSE for the MLR was higher (\(\text{RMSE}=12.9226\)) relative to the others, but comparable to the modeling errors of another study (\(\text{RMSE}=10.64\) to 26.08, T-AP-RH-WS)45. The 3D models had smaller errors (\(\text{RMSE}=0.0989\) to 0.2776) because the algorithm for 3D models by regression fitting generates more complex interactions between input and output data. The NSE criterion for MLR (\(\text{NSE}=0.3804\)) was closer to unity, and was also comparable to Nguyen’s fitting errors45 (between 0.26 and 0.53).
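
The two error criteria used in this comparison can be written compactly. The sketch below uses the standard definitions of the root mean square error and the Nash-Sutcliffe efficiency; the function names and the toy series are illustrative and do not reproduce the study's data.

```python
from math import sqrt


def rmse(observed, simulated):
    """Root mean square error, in the units of the compared series."""
    n = len(observed)
    return sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / n)


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency.

    1 means a perfect match, 0 means the model is no better than the mean
    of the observations, and negative values mean it is worse than that mean.
    """
    mean_obs = sum(observed) / len(observed)
    num = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    den = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - num / den


# Toy series (hypothetical values, not the study's measurements).
observed = [50.0, 60.0, 70.0, 80.0, 90.0]
simulated = [55.0, 58.0, 75.0, 78.0, 85.0]
print(round(rmse(observed, simulated), 4), round(nse(observed, simulated), 4))
```

RMSE carries the units of the series being compared, so a value computed on log-transformed concentrations and one computed in \(\upmu\)g/m\(^{3}\) are on different scales.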

Figure 9

Calibrations 2019: (a) multiple linear regression, (b) Log AP-Log WD-Log PM\(_{10}\), (c) Log RH-Log AP-Log PM\(_{10}\), (d) Log AP-Log WS-Log PM\(_{10}\) combinations.

Figure 10

Evaluation 2020: (a) multiple linear regression, (b) Log AP-Log WD-Log PM\(_{10}\), (c) Log RH-Log AP-Log PM\(_{10}\), (d) Log AP-Log WS-Log PM\(_{10}\) combinations.

Table 5 Statistical comparison results between the models studied for PM\(_{10}\). Model 1: Model 3D (LogAP-LogWD-LogPM\(_{10}\)); Model 3D (LogRH-LogAP-LogPM\(_{10}\)); Model 3D (LogAP-LogWS-LogPM\(_{10}\)); CAL: Calibration and EVA: Evaluation.

The evaluation of the models applied to the 2020 data showed a decrease in correlations, with the exception of the combined 3D model of RH-AP-PM\(_{10}\) and the MLR, which showed slightly higher correlations (\(\text{r}=0.4435\) and \(\text{r}=0.3239\), respectively). The RMSE for the MLR doubled relative to its calibration (\(\text{RMSE}=23.9983\)) and the values of the NSE criterion were all negative, but the NSE values of the MLR and the combined 3D AP-WS models were the closest to unity, at −2.7214 and −1.5304, respectively. Consequently, two relevant aspects were highlighted:

  • Conditions in 2019 were characterized by intense anthropogenic activity, while in 2020 the cessation of activities at the beginning of the pandemic brought changes during the months of lockdown and subsequent reactivation (May). These changes were reflected in the wind patterns, especially for SS, which presented a dominant direction towards the south in 2019, while in 2020 it was from east to south-east, associated with the increase in PM\(_{10}\).

  • The results should be considered significant for predicting PM\(_{10}\) concentration. However, including new predictor variables related to TI base altitude, aerosol re-suspension, vehicular traffic and the discrimination of anthropogenic and geogenic sources57 would likely help to improve the model and allow comparison with others, in order to minimize human health risks in times of pandemic.

Comparison of research on atmospheric quality in the city of Lima, Peru

The results of research conducted by different authors on air quality related to PM\(_{10}\) in the city of Lima were compared with the present study.

  • This research provides an easy and practical method with effective and reliable results through the development of statistical prediction models for PM\(_{10}\), based on multiple linear regression, use of three-dimensional logarithms and principal component analysis, under the influence of meteorological variables in the warm and cool season in southern Lima. This technique allows testing the applicability of the models and reveals the spatial distribution dynamics of PM\(_{10}\), strengthening decision making in environmental management related to the protection of human health through the prevention and control of PM\(_{10}\) air pollution in the context of constant urban growth.

  • Silva et al.58 evaluated the PM\(_{10}\) pollutant in the city of Lima over a 6-year period (2010-2015), showing that the highest PM\(_{10}\) concentrations were observed in the eastern part of the city, mainly in the summer (December to March). In addition, the authors identified large open spaces, vehicular traffic and the commercialization of rubble, bricks and cement as the main sources of particulate matter. These results are similar to those reported in this research; however, the authors conducted the research in a period before the COVID-19 pandemic, which reflects a stable situation in environmental conditions.

  • Reátegui-Romero et al.59 conducted a study on PM\(_{10}\) and PM\(_{2.5}\) pollutants during 2 months (February and July 2016), showing that the highest PM\(_{10}\) concentrations were observed in the northern area of Lima and that relative humidity is inversely proportional to PM\(_{10}\) concentrations, with higher peaks observed in the summer month (February). The authors’ results coincide with the findings of this research for those months, which are explained through the seasonal behavior pattern of South Lima and by the 3D model that demonstrates the influence of the association between RH and AP on PM\(_{10}\) (Fig. 8a).

  • Sanchez et al.12 used the WRF-Chem to predict PM\(_{10}\) concentrations in Lima during April 2016, showing that there is a higher PM\(_{10}\) concentration in areas with greater impact of vehicular traffic, reaching values of 476.8 \(\upmu\)g/m\(^{3}\) for the Santa Anita station. The authors related temperature, relative humidity and wind speed, in addition to the incorporation of topographic and meteorological data that increased the accuracy in terms of normalized mean bias for PM\(_{10}\) based on the emissions inventory. In contrast, this research was based on the modeling of air quality through the exclusive relation of meteorological variables with PM\(_{10}\), showing the limitations of the applied models that could be enhanced with the inclusion of geomorphological factors, among others.

  • Cordova et al.34 evaluated the PM\(_{10}\) pollutant in Metropolitan Lima during 2017 and 2018, mentioning that the main sources were the vehicle fleet, the industrial park and overpopulation, reaching maximum values (974 \(\upmu\)g/m\(^{3}\)) at the Huachipa station for the summer months (December–March). Artificial neural networks, specifically, the Long Short-Term Memory model under two validation schemes were used to predict PM\(_{10}\) concentrations. The results showed good prediction performance for both low concentrations and critical episodes. The model presented a potential application for South Lima and could be compared with the simple methodology (MLR, 3D and PCA) applied in this research. However, an analysis of the RMSE errors calculated in both applications resulted in discrepant values (statistical methods: 0.0989 to 23.9983; ANN: 10.573 to 64.297) and very close Spearman correlations (MLR: 0.6166; 3D RH-AP Model: 0.5753; ANN: 0.517–0.756).

Limitations

This study has some limitations. The data record has gaps: only about 67% of the data are valid (2019–2020). The study is also limited to the application of statistical methods such as MLR, PCA and logarithmic functions, assessed using the factors, coefficient of determination, correlation coefficient, RMSE and NSE. The model is proposed with data from 2019 and extrapolated to the following year due to the limited availability of data in 2020. The application of three-dimensional models is limited by their low R\(^2\) (for the year 2020). The data cover a relatively short period (two years). A more extended period of hourly data may have allowed a more rigorous statistical analysis and more conclusive results.

Conclusions

A statistical modeling approach has been applied to predict PM\(_{10}\) concentration in two locations in South Lima (SJM and SS) both before (2019) and during the COVID-19 pandemic (2020), as a function of meteorological variables. The PCA evidenced the seasonal influence explained by the various combinations of meteorological variables on the distribution of PM\(_{10}\). The SJM district presented moderate to poor PM\(_{10}\) quality levels versus mostly acceptable values in SS. Calibration of the statistical models in 2019 demonstrated a better (significant) fit for the multiple linear regression model than for the 3D modeling, while evaluation of the models in 2020 generated lower coefficients of determination. Thus, this research strengthens the application of statistical models in predicting the spatial distribution of PM\(_{10}\), providing scientific support in decision making related to public health protection. As future work, we consider developing new models under the machine learning approach through the application of geostatistical models to compare the accuracy in the prediction of air quality for PM\(_{10}\). Likewise, we plan to address the analysis of air pollution before, during and after the pandemic using diagnostic measures for the class of nonparametric regression models with symmetric random errors, which includes all continuous and symmetric distributions60.