Introduction

The COVID-19 pandemic has had a significant impact on various aspects of society, including environmental factors such as methane emissions. Methane emissions contribute to climate change, which negatively impacts the survival of living species1. Several studies have explored the relationship between the pandemic and methane emissions using satellite data and statistical tools. For instance, Lyon et al.2 focused on measuring changes in methane emissions in the United States’ Permian Basin during the volatile oil price period associated with the pandemic. Stevenson et al.3 highlighted that COVID-19 lockdown measures led to reductions in emissions that could explain over half of the observed increase in global atmospheric methane levels. Sun et al.4 examined changes in formaldehyde columns related to COVID-19 and its association with anthropogenic emissions. These studies collectively demonstrate the complex interplay between the pandemic, human activities, and methane emissions. Additionally, McNorton et al.5 quantified methane emissions from hotspots and during COVID-19 using a global atmospheric inversion, while Thorpe et al.6 reported a decline in methane emissions due to reduced oil, natural gas, and refinery production during the pandemic. Hari et al.7 and Gulyaev et al.8 provided insights into the impact of the lockdown on atmospheric methane over India and in urban environments, respectively. Monteiro et al.9 analyzed the impact of the lockdown on greenhouse gases across multiple cities.

Moreover, Zhang et al.10 emphasized the importance of quantifying pollutant emission changes during the pandemic to understand its impacts on air quality and society. Peischl et al.11 studied the quantification of methane and ethane emissions from oil and natural gas production regions in the central and western United States. The study utilizes new satellite observations and atmospheric inverse modeling to report on methane emissions specifically from the Permian Basin, which is a significant oil-producing region in the U.S. Le Quéré et al.12 contextualized the 2020 emissions changes within the broader scope of fossil \(\hbox {CO}_2\) emissions post-COVID-19. Rita et al.13 discussed the pandemic’s impact on methane emissions, noting a slight increase due to reduced agricultural and fishery activities but an overall positive effect on the environment. Recent advancements in technology, such as the integration of spatial and statistical methods for climate change mitigation by Han et al.14, the use of deep learning models for precision agriculture by Peng et al.15, and the development of hybrid UNet models for over 99% accurate brain tumor MRI segmentation by Bhatti et al.16, demonstrate significant progress across environmental monitoring, medical imaging, and disease diagnosis. Similarly, Zhang et al.17 present a novel interactive annotation framework using Attention U-net and composite geodesic distance to improve the accuracy and efficiency of medical image labeling, significantly reducing user interaction and time. Moreover, recent research in medical image security, such as the encrypted double tagging algorithm reported by Zeng et al.18 and Xiao et al.19, has effectively improved image protection during transmission.

Despite these insights, there remains a gap in studies that rigorously apply statistical tools to determine the significance of changes in methane emissions during the pandemic. For instance, Zhang et al.10 used TROPOMI satellite data and ANOVA to assess whether there were significant differences in NOx pollution data during 2020 amidst the pandemic. However, since all concentration comparisons were non-significant, they did not perform post-hoc analysis. Post-hoc analysis is typically necessary when ANOVA indicates a statistically significant difference among groups, as it controls for Type I errors (false positives) that can occur during multiple comparisons. Moreover, when the sample sizes of the groups being compared differ, it is necessary to validate the ANOVA results with a t-test method (such as Welch’s t-test) and also to apply various post-hoc analyses. Although Peischl et al.11 employed Welch’s t-test to determine significant differences between annual methane and ethane measurements in the Denver basin, they provided limited detail on confidence intervals and other statistical parameters. Both the t-test and ANOVA assume that the samples are normally distributed; if this assumption is violated, the test results may be inaccurate and misleading. A normality test is therefore essential to verify this assumption. However, in these publications, no normality test was conducted, and the assumption that the sample is normally distributed was made without empirical validation. Monteiro et al.9 and Gulyaev et al.8 similarly did not perform a normality test in their studies. While these studies offer valuable insights and correlations related to the COVID-19 pandemic, they lack an in-depth and comprehensive statistical analysis that could further validate their findings.

In this study, we introduce a hybrid statistical method to analyze satellite data, specifically focusing on evaluating the significance of the annual increase in methane emissions in Seoul, South Korea. The key contributions of this research include:

  • Enhanced statistical analysis: Utilization of normality tests, Welch’s t-test, Student’s t-test, and ANOVA to rigorously assess differences in methane emissions, ensuring high precision and reliability in the results.

  • Quantitative impact assessment: Providing a robust, data-driven evaluation of the COVID-19 pandemic’s impact on methane emissions, offering clear insights into changes with statistical confidence.

  • Methodological innovation: Introducing a novel hybrid statistical approach that sets a new standard for analyzing environmental data, establishing a foundation for future research in this area.

Methods

The flowchart in Figure 1 outlines the process of analyzing methane emissions in Seoul using satellite data. It starts with data collection and pre-processing. The analysis then follows a hybrid statistical approach: normality tests determine whether to use t-tests and ANOVA or alternative methods. If ANOVA shows significant differences, post-hoc analysis is conducted. This study highlights the significant year-to-year changes in methane emissions in South Korea and introduces a hybrid statistical approach that sets a new standard and can be applied to similar cases in the atmospheric science community. The hybrid statistical analysis was implemented in Python, and all related algorithms are provided in the Supplementary Information to support replication of the study.

TROPOMI Sentinel-5 Precursor data

Time-series data of methane concentrations in South Korea were obtained from the Tropospheric Monitoring Instrument (TROPOMI) onboard the Sentinel-5 Precursor satellite. The area of interest is centered on Olympic Park in Seoul (latitude 37.5207, longitude 127.1227). Level 3 data from the satellite were processed by averaging the methane concentrations within a 1000-meter radius around this point. In the satellite data, methane concentration is expressed as \(\hbox {XCH}_4\), which represents the column-averaged dry-air mole fraction of methane. Measurement data from 2019 to 2023 were collected and analyzed. The data are available through Google Earth Engine.20

Normality tests

Both ANOVA and the t-test assume that the samples are normally distributed. If they are not, the results of these statistical tests can be misleading or inaccurate. Therefore, normality tests are required to confirm whether the samples follow a normal distribution before conducting ANOVA and the t-test. The null hypothesis (H0) used in this study is that the samples are normally distributed, while the alternative hypothesis (H1) is that they are not; the specific alternative considered in this study is a uniform distribution. Two common tests used for this purpose are the Shapiro-Wilk (S-W) test and the Kolmogorov-Smirnov (K-S) test. The Shapiro-Wilk test, developed by Shapiro and Wilk21, is particularly effective for small to medium-sized samples and is recognized for its high power in detecting deviations from normality. On the other hand, the K-S test, as detailed by Yap and Sim22, is a non-parametric test that compares the empirical cumulative distribution function of the data with the cumulative distribution function (CDF) of a normal distribution.

K-S test

To perform the K-S test, the empirical distribution function (EDF) \(F_n(x)\) is calculated using the equation:

$$\begin{aligned} F_n(x) = \frac{\text {no. of observations} \le x}{n}, \quad -\infty< x < \infty \end{aligned}$$
(1)

where \(F_n(x)\) is defined as a step function that increases by \(\frac{1}{n}\) at each of the \(n\) data points. It represents the cumulative probability up to each point in the sample. The cumulative distribution function (CDF) \(F(x)\) for the normal distribution is calculated as:

$$\begin{aligned} F(x) = \frac{1}{2} \left[ 1 + \text {erf}\left( \frac{x - \mu }{\sigma \sqrt{2}} \right) \right] \end{aligned}$$
(2)

where \(\text {erf}\) is the error function. Afterward, the K-S test statistic \(D\) is calculated using the equation:

$$\begin{aligned} D = \max _x \left| F_n(x) - F(x) \right| \end{aligned}$$
(3)

which measures the largest deviation between the observed data and the expected distribution. The \(p\)-value is then calculated based on \(D\), which represents the probability of observing such a difference (or greater) if the null hypothesis is true. Finally, the decision is made by comparing the \(p\)-value with significance level \(\alpha\) (typically 0.05). If \(p\)-value \(< \alpha\), then the conclusion is to reject the null hypothesis (the sample does not come from a normal distribution).
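As a minimal sketch of Eqs. (1) to (3), the statistic \(D\) can be computed with the Python standard library alone. The function names and the sample values below are hypothetical and purely illustrative; they are not the study's data or the code from the Supplementary Information. Because the empirical distribution function is a step function, the gap is checked on both sides of each step.

```python
import math

def normal_cdf(x, mu, sigma):
    """Normal CDF written with the error function, as in Eq. (2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic(sample, mu, sigma):
    """Largest gap between the empirical CDF and the normal CDF, as in Eq. (3)."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = normal_cdf(x, mu, sigma)
        # The EDF jumps from (i - 1)/n to i/n at x, so check both sides of the step.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Hypothetical XCH4 values in ppb (illustrative only, not the study's data).
xch4 = [1871.2, 1868.5, 1875.9, 1880.1, 1866.3, 1873.4, 1878.8, 1870.0]
mu = sum(xch4) / len(xch4)
sigma = math.sqrt(sum((v - mu) ** 2 for v in xch4) / (len(xch4) - 1))
print(f"D = {ks_statistic(xch4, mu, sigma):.4f}")
```

Note that when \(\mu\) and \(\sigma\) are estimated from the same sample, as in this sketch, the standard K-S \(p\)-value tends to be conservative; a simulated null distribution of \(D\) is one way to account for this.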

S-W test

The S-W test statistic \(W\) is based on the covariance matrix of the ordered sample values \((X_{(1)}, X_{(2)}, \ldots , X_{(n)})\) and calculated using the following equation:

$$\begin{aligned} W = \frac{\left( \sum _{i=1}^{n} a_i X_{(i)} \right) ^2}{\sum _{i=1}^{n} (X_i - {\bar{X}})^2}, \end{aligned}$$
(4)

where \({\bar{X}}\) is the sample mean and \(a_i\) are constants derived from the expected values of the order statistic under the normal distribution (null hypothesis). The result \(W\) is then compared with the critical values from the Shapiro-Wilk distribution for the given sample size \(n\) and chosen significance level \(\alpha\).

Monte Carlo simulation

Monte Carlo simulations are particularly useful for evaluating the type I error rate (false positive rate) and type II error rate (false negative rate), as well as for estimating statistical power under various conditions. The statistical power is the probability that the test will correctly reject a false null hypothesis (i.e., the probability of avoiding a Type II error)23. Therefore, the higher the power, the lower the probability of making a Type II error. Monte Carlo simulations were performed by repeatedly sampling from the null distribution and calculating the test statistic for each sample.
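The procedure can be sketched as follows, using the K-S distance to a standard normal distribution as the test statistic. The sample size, the number of simulations, and the variance-matched uniform alternative are assumptions chosen for illustration, not the settings used in this study.

```python
import math
import random

def ks_statistic(sample):
    """K-S distance between a sample and the standard normal CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def draw_normal(rng):
    return rng.gauss(0.0, 1.0)

def draw_uniform(rng):
    # Uniform alternative scaled to mean 0 and variance 1 (an assumed choice).
    return rng.uniform(-math.sqrt(3.0), math.sqrt(3.0))

def simulate(draw, n, n_sim, rng):
    """Return n_sim K-S statistics, each computed from n values of draw(rng)."""
    return [ks_statistic([draw(rng) for _ in range(n)]) for _ in range(n_sim)]

def rejection_rate(stats, critical_value):
    """Fraction of simulated statistics that exceed the critical value."""
    return sum(s > critical_value for s in stats) / len(stats)

rng = random.Random(0)
n, n_sim, alpha = 50, 2000, 0.05

# Critical value: empirical (1 - alpha) quantile of D under the null hypothesis.
null_stats = sorted(simulate(draw_normal, n, n_sim, rng))
critical = null_stats[int((1 - alpha) * n_sim) - 1]

# Type I error: rejection rate on fresh null samples. Power: rejection rate
# on samples drawn from the alternative distribution.
type_i = rejection_rate(simulate(draw_normal, n, n_sim, rng), critical)
power = rejection_rate(simulate(draw_uniform, n, n_sim, rng), critical)
print(f"critical D = {critical:.3f}, type I = {type_i:.3f}, power = {power:.3f}")
```

The estimated type I error rate should lie close to the nominal \(\alpha\), while the power depends strongly on the sample size and on how far the alternative departs from normality.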

ANOVA and post-hoc analyses

The second statistical tool used in this study is the analysis of variance (ANOVA). ANOVA is a statistical method used to analyze the means of three or more groups to determine if there are statistically significant differences among them. It does this by partitioning the total variance observed in the data into different components: variance between groups \(MSB\) and variance within groups \(MSW\). ANOVA uses the F-statistic to determine if the differences between group means are significant. The F-statistic is calculated as the ratio of the variance between groups to the variance within groups.

$$\begin{aligned} F = \frac{MSB}{MSW} = \frac{\sum _{i=1}^{k} n_i ({\bar{X}}_i - {\bar{X}})^2 / (k-1)}{\sum _{i=1}^{k} \sum _{j=1}^{n_i} (X_{ij} - {\bar{X}}_i)^2 / (N-k)} \end{aligned}$$
(5)

Here, \({\bar{X}}_i\) is the mean of the \(i\)-th group; \(n_i\) is the number of observations in the \(i\)-th group; \({\bar{X}}\) is the overall mean of all observations; \(N\) is the total number of observations; \(k\) is the number of groups; \(X_{ij}\) is the \(j\)-th observational value in the \(i\)-th group. A high F-value indicates that the group means are significantly different, while a low F-value suggests that the group means are similar.24,25
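A minimal sketch of Eq. (5) in plain Python is given below. The function name and the toy groups are hypothetical and serve only to make the arithmetic easy to follow; the \(p\)-value would additionally require the F distribution with \((k-1, N-k)\) degrees of freedom.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples, following Eq. (5)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares, weighted by group size.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares around each group's own mean.
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    msb = ssb / (k - 1)
    msw = ssw / (n_total - k)
    return msb / msw, k - 1, n_total - k

# Toy data: group means 2, 3, 4 and grand mean 3, so MSB = 6/2 and MSW = 6/6.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
f_value, df_between, df_within = one_way_anova_f(groups)
print(f_value, df_between, df_within)  # 3.0 2 6
```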

ANOVA tells if there are any significant differences between group means but does not indicate which specific groups are different from each other. Post-hoc tests help identify these specific group differences. Post-hoc analyses in ANOVA are essential for identifying specific differences between groups after an overall analysis has been conducted. In addition, the risk of committing a Type I error (false positive) increases when making multiple comparisons because each individual test carries its own chance of incorrectly rejecting a true null hypothesis (finding a difference when there isn’t one). When multiple comparisons are made, the risk of committing at least one Type I error increases with the number of comparisons. This cumulative risk is known as the familywise error rate (FWER). To address this increased risk, post-hoc analyses use various methods to control the overall Type I error rate.26

One common method used for post-hoc comparisons is the Bonferroni correction. The Bonferroni correction is frequently employed to control the FWER following ANOVA, especially when multiple pairwise comparisons are made.27 The correction divides the level of significance \(\alpha\) by the number of comparisons \(m\) performed after ANOVA (written \(m\) here to distinguish it from the number of groups \(k\) in Eq. 5).

$$\begin{aligned} \alpha _{\text {adjusted}} = \frac{\alpha }{m} \end{aligned}$$
(6)

This results in a new adjusted level of significance \(\alpha _{\text {adjusted}}\). After that, the p-value is compared with \(\alpha _{\text {adjusted}}\), and if the p-value is lower, then the result is considered statistically significant for each test.
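As a small illustration, the adjusted significance level can be computed as below. The sketch assumes that all pairwise comparisons among the five yearly groups are tested, which may differ from the set of comparisons actually reported, and the raw \(p\)-values are hypothetical.

```python
from itertools import combinations

def bonferroni_alpha(alpha, n_comparisons):
    """Adjusted significance level for a given number of comparisons."""
    return alpha / n_comparisons

years = [2019, 2020, 2021, 2022, 2023]
pairs = list(combinations(years, 2))  # all pairwise year comparisons (assumed)
alpha_adjusted = bonferroni_alpha(0.05, len(pairs))
print(len(pairs), alpha_adjusted)

# Hypothetical raw p-values for two of the comparisons (illustrative only).
raw_p = {(2019, 2020): 0.02, (2022, 2023): 0.0001}
for pair, p in raw_p.items():
    print(pair, "significant" if p < alpha_adjusted else "not significant")
```

With ten comparisons, a raw \(p\)-value of 0.02 would be significant at \(\alpha = 0.05\) but not at the adjusted level, which shows how the correction guards against false positives at the cost of some power.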

The second method that can be used to perform post-hoc analysis is Tukey’s Honestly Significant Difference (HSD) test. Tukey’s HSD test is applied after ANOVA to determine which specific group means differ significantly from each other.28 It provides a balance between controlling the overall Type I error rate and being sensitive enough to detect true differences between groups. After conducting ANOVA, the HSD is calculated with the following equation for equal sample sizes:

$$\begin{aligned} \text {HSD} = q \cdot \sqrt{\frac{\text {MSW}}{n}} \end{aligned}$$
(7)

and using the Tukey-Kramer form for unequal sample sizes, which reduces to the previous expression when \(n_i = n_j = n\):

$$\begin{aligned} \text {HSD} = q \cdot \sqrt{\frac{\text {MSW}}{2} \left( \frac{1}{n_i} + \frac{1}{n_j}\right) } \end{aligned}$$
(8)

where \(q\) is the critical value from the studentized range distribution, \(n\) is the common sample size, and \(n_i\) and \(n_j\) are the sample sizes of groups \(i\) and \(j\), respectively. The absolute difference between each pair of group means \(|{\bar{X}}_i - {\bar{X}}_j|\) is then compared with the HSD. If \(|{\bar{X}}_i - {\bar{X}}_j| > \text {HSD}\), the null hypothesis is rejected (the pair shows a statistically significant difference).

Another commonly used method for post-hoc analyses in ANOVA is Scheffe’s method. Scheffe’s method is a conservative approach that is suitable for situations where the number of comparisons is not predetermined. It is particularly useful when the number of groups being compared is small and when the assumption of homogeneity of variances is violated.29

Student’s and Welch’s t-tests

The t-test is a statistical method commonly used to compare the means of two independent groups. It assesses whether there is a statistically significant difference between the means of the two groups. The Student’s t-test calculates a t-statistic t based on the difference between the means of the two groups and the variance within each group.

$$\begin{aligned} t = \frac{{\bar{x}} - {\bar{y}}}{s_p \cdot \sqrt{\frac{1}{n_x} + \frac{1}{n_y}}} \end{aligned}$$
(9)

where \(n_x\) and \(n_y\) are the numbers of observations in the first and second groups, respectively. The pooled standard deviation \(s_p\) is given by

$$\begin{aligned} s_p = \sqrt{\frac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n_x + n_y - 2}} \end{aligned}$$
(10)

The t-statistic is then compared to a critical value from the t-distribution to determine if the observed difference between the group means is significant. A significant result indicates that the means of the two groups are different, while a non-significant result suggests that there is no significant difference between the group means.30

The variances of the two samples are pooled to obtain the most accurate estimate of the assumed equal variances of the two populations. This highlights the importance of the underlying assumption of equal population variances in the Student’s t-test. When these variances are actually unequal, the Student’s t-test performs poorly, leading to issues with both Type I and Type II errors.31 To overcome this, the statistical tool Welch’s t-test, also known as the unequal variances t-test t’, is used.

$$\begin{aligned} t' = \frac{{\bar{x}} - {\bar{y}}}{\sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}}} \end{aligned}$$
(11)

The degrees of freedom in Welch’s t-test are calculated differently from those in the Student’s t-test. In the Student’s t-test, the degrees of freedom are simply

$$\begin{aligned} \nu = n_x + n_y - 2 \end{aligned}$$
(12)

while in the Welch’s t-test they are calculated with

$$\begin{aligned} \nu ' = \frac{\left( \frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}\right) ^2}{\frac{\left( \frac{s_x^2}{n_x}\right) ^2}{n_x - 1} + \frac{\left( \frac{s_y^2}{n_y}\right) ^2}{n_y - 1}} \end{aligned}$$
(13)
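The two statistics and their degrees of freedom can be sketched in plain Python as follows. Welch’s statistic is written in its standard form, with standard error \(\sqrt{s_x^2/n_x + s_y^2/n_y}\), which is the same quantity that appears in the numerator of Eq. (13). The function names and toy data are hypothetical; the \(p\)-values would additionally require the t-distribution and are omitted here.

```python
import math
from statistics import mean, variance  # variance() is the sample variance (n - 1)

def student_t(x, y):
    """Student's t statistic and degrees of freedom, Eqs. (9), (10) and (12)."""
    nx, ny = len(x), len(y)
    sp = math.sqrt(((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2))
    t = (mean(x) - mean(y)) / (sp * math.sqrt(1 / nx + 1 / ny))
    return t, nx + ny - 2

def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom, Eq. (13)."""
    nx, ny = len(x), len(y)
    vx, vy = variance(x) / nx, variance(y) / ny
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df

# Toy data with equal sample sizes but unequal variances (illustrative only).
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
t_s, df_s = student_t(x, y)
t_w, df_w = welch_t(x, y)
print(f"Student: t = {t_s:.4f}, df = {df_s}")
print(f"Welch:   t = {t_w:.4f}, df = {df_w:.4f}")
```

In this toy case the sample sizes are equal, so the two statistics coincide; only the degrees of freedom differ, with Welch’s value being smaller because the variances are unequal.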

The permutation test

The permutation test, often referred to as the randomization test, is a non-parametric statistical method used to determine the significance of an observed effect.32 To conduct a permutation test, first state the hypotheses: a null hypothesis stating there is no significant difference, and an alternative hypothesis stating there is a significant difference. Next, calculate the test statistic for the observed data. Then, make permutations by shuffling the data, dividing it into groups, and calculating the statistic for each shuffle. Repeat this many times to create a distribution under the null hypothesis. Finally, calculate the p-value from this distribution and decide whether to reject the null hypothesis if the p-value is lower than the significance level.
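The steps above can be sketched as follows, using the difference in group means as the test statistic. The function name, the number of permutations, the fixed seed, and the toy data are assumptions made for illustration.

```python
import random

def permutation_test(x, y, n_perm=5000, seed=0):
    """Two-sided permutation test on the difference in means.

    Returns the observed statistic and the permutation p-value.
    """
    rng = random.Random(seed)
    observed = sum(x) / len(x) - sum(y) / len(y)
    pooled = list(x) + list(y)
    n_x = len(x)
    extreme = 0
    for _ in range(n_perm):
        # Shuffle the pooled data and split it into two groups of the original sizes.
        rng.shuffle(pooled)
        stat = sum(pooled[:n_x]) / n_x - sum(pooled[n_x:]) / (len(pooled) - n_x)
        if abs(stat) >= abs(observed):
            extreme += 1
    # Counting the observed arrangement itself keeps the p-value above zero.
    return observed, (extreme + 1) / (n_perm + 1)

# Toy example with two clearly separated groups (illustrative only).
group_a = [float(v) for v in range(1, 11)]
group_b = [float(v) for v in range(101, 111)]
observed, p_value = permutation_test(group_a, group_b)
print(f"observed difference = {observed:.1f}, p = {p_value:.4f}")
```

Because the test relies only on reshuffling the observed values, it does not require the normality assumption, which makes it a useful cross-check on the t-test results.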

Results and discussion

Figure 2 shows the time series of the column-averaged dry-air mole fraction of methane \(\text {XCH}_4\) in the Seoul area. Over the study period, the methane concentration exhibits a distinct upward trend, with noticeable peaks and troughs, especially before and after the COVID-19 pandemic. The data indicate higher concentrations after COVID-19, possibly due to increased anthropogenic activities. To further understand the variations and the factors contributing to the methane concentrations, a series of statistical analyses was performed. These analyses determine whether there are significant differences in methane levels across different time periods and conditions.

Boxplot

Figure 3 shows the boxplot of methane data in the Seoul area from 2019 to 2023. The boxplot shows fluctuations in methane levels across the years, with notable variations that could be linked to the COVID-19 pandemic. According to Kim et al.33, COVID-19 in South Korea occurred in three waves. The first wave lasted from around January 20 to April 26, 2020, the second from July 28 to October 12, 2020, and the third from November 3, 2020 to February 1, 2021. During this period, lockdown and social distancing were strictly enforced. The implementation of lockdown measures and restrictions during the pandemic period may have led to shifts in methane emissions and atmospheric conditions in Seoul. The data displayed in the boxplot could reflect the changes in industrial activities, transportation patterns, and overall human mobility that occurred in response to the pandemic regulations. To draw robust conclusions with statistical confidence values, statistical tests must be performed.

The normality tests

The results of the normality test are summarized in detail in the supplementary materials. The analysis indicated that the samples followed a normal distribution, validating the use of the t-test and ANOVA. Notably, some previous studies overlooked the normality assumption when employing t-tests and ANOVA, despite its critical importance. Failure to confirm normality can lead to misleading conclusions, as t-tests and ANOVA results can be invalid when samples are not normally distributed.

ANOVA and post-hoc analyses

Since the data groups compared in Figure 2 consist of 5 groups, it is more efficient to perform ANOVA analysis first. The results of the ANOVA analysis are shown in Table 1. The ANOVA results reveal that there are statistically significant differences in methane concentrations across the years, as indicated by the very low p-value ( \(2.02\times 10^{-13}\)) and the high F-value (26.57198). This means that the year-to-year variation in methane concentrations is greater than what would be expected by random chance. To determine which specific years differ from each other, post-hoc analyses are necessary. These tests will identify the pairs of years with significant differences in methane concentrations.

The ANOVA, followed by post-hoc comparisons using the Bonferroni correction, Tukey’s HSD, and Scheffe’s method, reveals similar patterns of significant differences in methane concentrations between years, as shown in Table 2. Methane concentrations at the onset of the COVID-19 pandemic (2019 vs 2020) were not significantly different under any of the three post-hoc methods (p-values: Bonferroni 0.1045, Tukey’s HSD 0.397, Scheffe’s 0.1045). During the pandemic (2020 vs 2021, 2021 vs 2022), methane concentrations did not differ significantly either. However, after the COVID-19 pandemic (2022 vs 2023), the methane concentrations showed a significant difference (p-values: Bonferroni 0.0001, Tukey’s HSD 0.0002, Scheffe’s 0.0001).

One possible cause of the significant increase after the COVID-19 pandemic is the implementation of the “new normal” policy. The “new normal” policy refers to the set of regulations and societal behaviors adopted in response to the COVID-19 pandemic to mitigate its spread and manage its impacts on daily life. During lockdowns, strict measures such as stay-at-home orders, closure of non-essential businesses, and bans on large gatherings were implemented. The “new normal” policies emerged as these lockdowns were gradually lifted, aiming to sustain virus control while allowing more routine activities to resume. These measures differ from pre-pandemic norms by emphasizing ongoing preventive practices even as restrictions ease, reflecting a permanent shift in public health strategy. South Korea introduced a “new normal” policy34 in November 2021, which took full effect from the beginning of 2022 and is likely to have contributed to the increase in methane concentrations. The “new normal” may cause changes in industrial activities, transportation patterns, and other socioeconomic factors that contribute to higher methane emissions. For example, as restrictions are relaxed and economic activities resume, increased movement of people and goods may result in higher emissions from transportation and industry.

Student’s and Welch’s t-tests

Student’s t-test and Welch’s t-test were performed to validate the ANOVA results. While both tests were used, Welch’s t-test is more appropriate due to the differing sample sizes across the years (2019, 2020, 2022, and 2023). Nonetheless, both tests led to the same conclusions in this study, as shown in Figure 4. The permutation test results for the 2019 and 2020 data showed the same test statistic value of -0.0096, with p-values of 0.1191 (Student’s t-test) and 0.1156 (Welch’s t-test). Since both p-values exceed the significance level \(\alpha\) (0.05), we conclude there is no significant difference between these two years. For 2022 and 2023, the permutation test showed a test statistic value of -0.0172 and a p-value of 0.0001 (identical for both t-tests). Here, the p-value is far below \(\alpha\), indicating a significant difference between these years. The permutation test results suggested that before the pandemic, the observed test statistic fell within the central 95% of the permutation distribution, indicating no significant difference in methane levels. However, post-pandemic, the observed test statistic fell outside the histogram of the permutation distribution, confirming the ANOVA results that showed significant differences in methane concentrations after COVID-19.

Fig. 1 Flowchart of satellite data analysis using hybrid statistical analysis.

Fig. 2 Time series measurements of methane concentrations around Olympic Park, Seoul from data from the TROPOMI instrument on board Sentinel-5-Precursor satellite.

Fig. 3 Boxplot of methane gas measurements around Olympic Park, Seoul from TROPOMI instrument data.

Fig. 4 Comparison of t-test results before and after the COVID-19 pandemic in South Korea.

Comparison with previous studies

Monteiro et al.9 analyzed methane emissions during the COVID-19 lockdown using ground-based networks in U.S. and Canadian cities, employing high-precision techniques such as Cavity Ring-Down Spectroscopy (CRDS), non-dispersive infrared absorption, and off-axis integrated cavity output spectroscopy. Despite these detailed observations, they found inconsistent and mostly insignificant changes in methane levels, likely due to urban methane sources like natural gas leaks and landfills, which were minimally impacted by the lockdown. This contrasts with our satellite-based study, which revealed significant post-pandemic increases in methane concentrations in Seoul. This discrepancy can be attributed to differences in data collection methods and spatial scales. Ground-based instruments provide continuous, localized measurements limited to surface-level methane concentrations, while satellite data offers broader spatial coverage, capturing column-averaged methane concentrations and integrating emissions over larger areas, thus providing a more comprehensive perspective that includes both ground-level and upper-atmosphere concentrations.

Hari et al.7, using TROPOMI satellite data, observed a reduction in methane levels during the COVID-19 lockdown, consistent with the decrease noted in our analysis of TROPOMI data for Seoul. Similarly, Gulyaev et al.8, who also utilized CRDS in their ground-based study, reported a reduction in inter-hourly methane variation during the lockdown, followed by a significant increase thereafter in Ekaterinburg. Our study found no significant change in methane concentrations during the initial lockdown phase, but we observed a notable increase post-2021. These findings align with the general trend of decreased emissions during lockdowns, followed by a rebound as restrictions were eased.

Adding to this, McNorton et al.5 used a high-resolution 4D-Var global inversion system based on the ECMWF Integrated Forecasting System (IFS) and newly available satellite observations to analyze methane emissions. They reported that global anthropogenic methane emissions in the first half of 2020 were slightly higher than in 2019, mainly due to the energy and agricultural sectors. While they observed a temporary increase in emissions from China during the early months of the global slowdown, a decrease below expected levels followed in later months. This suggests that the overall impact of the COVID-19 slowdown on methane emissions might be small compared to the long-term positive trend in emissions. Our findings of a significant increase in methane concentrations post-2021 align with this long-term trend, indicating that the temporary reductions during the lockdown were not sufficient to offset the broader upward trajectory in methane emissions.

Growth rate of methane emissions

Long-term methane measurements by Lee et al.35 using a cavity ring-down spectrometer (CRDS) at Anmyeondo, South Korea, showed that the baseline increase in methane over the five years before the COVID-19 pandemic (2016-2020) was 8.3 \(\text {ppb}\cdot \text {yr}^{-1}\) (similar to the WMO global mean of \(9 \pm 2 \, \text {ppb}\cdot \text {yr}^{-1}\)). In this study, the methane concentration in Seoul increased by 10.1 \(\text {ppb}\cdot \text {yr}^{-1}\) before the pandemic (2019 to 2020), which is not much different from their result. The data show that this baseline increase is relatively steady, suggesting consistent sources and processes influencing methane emissions over that period. However, after the COVID-19 pandemic (2022-2023), the growth rate rose sharply to 17.8 \(\text {ppb}\cdot \text {yr}^{-1}\), a significant deviation from the established baseline. This significant increase, as confirmed by the t-tests and ANOVA, suggests that the pandemic period had an additional effect on methane emissions.

Table 1 ANOVA result table from methane measurement during 2019 to 2023.
Table 2 ANOVA Post-Hoc analysis results.

Conclusions and future works

This study presents a comprehensive analysis of methane concentrations in Seoul, South Korea, from 2019 to 2023, with a particular emphasis on the impact of the COVID-19 pandemic and “new normal” policies, employing hybrid statistical methods. The results revealed that there was no significant difference in methane concentrations between 2019 and 2020, indicating that the initial phase of the pandemic did not drastically alter methane emissions in Seoul. During the pandemic period (2020-2021), year-to-year comparisons also showed no significant differences in methane concentrations, suggesting that the lockdown and social distancing measures maintained similar levels of emissions. However, post-2021, there was a significant increase in methane concentrations, particularly between 2022 and 2023. These findings, consistent across all three post-hoc tests (Bonferroni correction, Tukey’s HSD, Scheffe’s method), suggest that the “new normal” policy introduced in November 2021 and fully in effect from early 2022 likely contributed to these increases, reflecting resumed economic activities, increased industrial operations, and higher transportation emissions. The results from Student’s t-test and Welch’s t-test agreed well with the ANOVA findings. Additionally, the results of the normality tests (Shapiro-Wilk and Kolmogorov-Smirnov) indicate that the sample data can be considered to come from a normal distribution. This supports the validity of the t-test and ANOVA results, since both tests assume normally distributed samples.

The beneficiaries of this study include policymakers, environmental scientists, and urban planners. Additionally, research scientists in other fields, such as those developing algorithms in watermarking technology to detect significant changes in images19,36,37,38, can also benefit from the methodologies used in this study. Policymakers can use these findings to develop targeted interventions aimed at reducing methane emissions. By understanding the impact of the “new normal” policy, strategies can be designed to balance economic activities with environmental sustainability. Urban planners can incorporate these insights into the development of greener urban infrastructure, promoting sustainable city environments. From a policy perspective, the significant increase in methane concentrations post-2021 underscores the need for robust environmental regulations and monitoring systems. Policies that limit industrial emissions, enhance public transportation, and encourage renewable energy use are critical. The study highlights the importance of maintaining some of the reduced-activity patterns observed during the pandemic to achieve long-term environmental benefits.

Despite the robust statistical analyses employed in this study, there are several limitations. First, the study relies on satellite data, which, while comprehensive, may not capture finer-scale variations in methane concentrations. Second, the analysis is constrained to Seoul, and the findings may not be directly applicable to other regions with different socio-economic and environmental contexts. Third, the study period is relatively short, covering only five years, which may not fully capture long-term trends in methane emissions. Future research should address these limitations by incorporating ground-based measurement data to validate and complement satellite observations. Expanding the study to other regions and extending the analysis period would provide a more comprehensive understanding of methane emission trends. Additionally, investigating the specific sources of increased methane emissions post-pandemic, such as changes in industrial activity and transportation patterns, would offer more detailed insights into the underlying factors contributing to these trends. By addressing these questions, the study not only enhances our understanding of the pandemic’s environmental impact but also informs the development of effective policies to mitigate greenhouse gas emissions and combat climate change.