Introduction

Solar flares are sudden explosions of energy that occur on the Sun’s surface. They often occur in solar active regions (ARs), caused by strong magnetic fields typically associated with sunspot areas. Solar flares are categorized into five classes A, B, C, M, and X, with A-class flares having the lowest intensity and X-class flares having the highest intensity. Major flares are usually accompanied by coronal mass ejections and solar energetic particles1,2,3,4,5,6,7. These eruptive events can have significant and harmful effects on or near Earth, damaging technologies, power grids, space stations, and human life8,9,10,11. Therefore, providing accurate and early forecasts of solar flares is crucial for disaster risk management, risk mitigation, and preparedness.

Although a lot of effort has been devoted to flare prediction12,13,14,15, developing accurate, operational near-real-time flare forecasting systems remains a challenge. In the past, researchers designed statistical models for the prediction of flares based on the physical properties of active regions16,17,18. With the availability of large amounts of flare-related data14, researchers started using machine learning methods for flare forecasting3,19,20. More recently, deep learning, which is a subfield of machine learning, has emerged and showed promising results in predicting solar eruptions, including solar flares21,22.

For example, Nishizuka et al.23 developed deep neural networks to forecast M- and C-class flares that would occur within 24 h using data downloaded from the Solar Dynamics Observatory (SDO)24 and the Geostationary Operational Environmental Satellite (GOES). Sun et al.22 employed three-dimensional (3D) convolutional neural networks (CNNs) to forecast \(\ge\)M-class and \(\ge\)C-class flares using Space-weather HMI Active Region Patches (SHARP)25 magnetograms downloaded from the Joint Science Operations Center (JSOC) accessible at http://jsoc.stanford.edu/. Li et al.26 also adopted a CNN model to forecast \(\ge\)M-class and \(\ge\)C-class flares using SHARP magnetograms where the authors restructured the CNN layers in their neural network with different filter sizes. Deng et al.27 developed a hybrid CNN model to predict solar flares during the rising and declining phases of Solar Cycle 24.

Some researchers adopted SHARP magnetic parameters in time series for flare prediction. Static SHARP parameters quantitatively describe the properties of ARs, especially their ability to produce flares, at a given time. On the other hand, dynamic information, such as the magnetic helicity injection rate, sunspot motions, shear flows, and magnetic flux emergence/flux cancelation, is more important for flare forecasting. Using time series of SHARP parameters allows a model to capture the relationship between the evolution of magnetic fields of ARs and solar flares, hence achieving more accurate flare predictions 28,29. In an earlier study, Yu et al.30 added the evolutionary information of ARs to a predictive model for the prediction of short-term solar flares. More recently, Chen et al.31 designed a long short-term memory (LSTM) network to identify precursors of solar flare events using time series of SHARP parameters. LSTM is suitable for capturing the temporal dynamics of time series. Liu et al.9 developed another LSTM network with a customized attention mechanism to direct the network to focus on important patterns in time series of SHARP parameters. Sun et al.32 attempted to distinguish between ARs with strong flares (\(\ge\)M-class flares) and ARs with no flare at all. The authors showed that combining LSTM and CNN can better solve the “strong versus quiet” flare prediction problem, with data from both Solar Cycle 23 and Cycle 24. All of the aforementioned studies provided valuable models and algorithms in the field. However, the existing methods focused on short-term forecasts (usually within 24 h). Furthermore, the models were not used as operational systems.

In this paper, we propose a new deep learning approach to predicting solar flares using time series of SHARP parameters. Our approach employs a transformer-based framework, named SolarFlareNet, which predicts whether there would be a flare within 24 to 72 h, where the flare could be a \(\ge\)M5.0-, \(\ge\)M-, or \(\ge\)C-class flare. We further extend SolarFlareNet to produce probabilistic forecasts of flares and implement the probabilistic model into an operational, near real-time flare forecasting system. Experimental results demonstrate that SolarFlareNet generally performs better than, or is comparable to, related flare prediction methods.

Results

Deterministic prediction tasks

For any given active region (AR) and time point t, we predict whether there would be a \(\gamma\)-class flare within the next 24 h (48 h, 72 h, respectively) of t where \(\gamma\) can be \(\ge\)M5.0, \(\ge\)M, or \(\ge\)C. A \(\ge\)M5.0-class flare means the GOES X-ray flux value of the flare is above \(5 \times 10^{-5}\text{ Wm}^{-2}\). A \(\ge\)M-class flare refers to an X- or M-class flare. A \(\ge\)C-class flare refers to an X-class, M-class, or C-class flare. We focus on these three classes of flares due to their importance in space weather9,19,23,33. We developed three transformer models to tackle the three prediction tasks individually and separately. Notice that we did not consider \(\gamma\) to be \(\ge\)X due to the lack of samples for X-class flares. Instead, we use \(\ge\)M5.0 as the most significant class, which contains sufficient samples.

Comparison with previous methods

We conducted a series of experiments to compare the proposed SolarFlareNet framework with closely related methods. All these methods perform binary classifications/predictions as defined above. We adopt several performance metrics. Formally, given an AR and a data sample \(x_{t}\) observed at time point t, we define \(x_{t}\) to be a true positive (TP) if the \(\ge\)M5.0 (\(\ge\)M, \(\ge\)C, respectively) model predicts that \(x_{t}\) is positive, i.e., the AR will produce a \(\ge\)M5.0- (\(\ge\)M-, \(\ge\)C-, respectively) class flare within the next 24 h of t, and \(x_{t}\) is indeed positive. We define \(x_{t}\) as a false positive (FP) if the \(\ge\)M5.0 (\(\ge\)M, \(\ge\)C, respectively) model predicts that \(x_{t}\) is positive while \(x_{t}\) is actually negative, i.e., the AR will not produce a \(\ge\)M5.0- (\(\ge\)M-, \(\ge\)C-, respectively) class flare within the next 24 h of t. We say \(x_{t}\) is a true negative (TN) if the \(\ge\)M5.0 (\(\ge\)M, \(\ge\)C, respectively) model predicts \(x_{t}\) to be negative and \(x_{t}\) is indeed negative; \(x_{t}\) is a false negative (FN) if the \(\ge\)M5.0 (\(\ge\)M, \(\ge\)C, respectively) model predicts \(x_{t}\) to be negative while \(x_{t}\) is actually positive. We also use TP (FP, TN, and FN, respectively) to represent the total number of true positives (false positives, true negatives, and false negatives, respectively). The TP, FP, TN, and FN for the 48-h and 72-h ahead predictions are defined similarly. The performance metrics are calculated as follows:

$$\begin{aligned} \text {Recall} = \frac{ \textrm{TP}}{\mathrm {TP + FN}} \end{aligned}$$
(1)
$$\begin{aligned} \text {Precision} = \frac{ \textrm{TP}}{\mathrm {TP + FP}} \end{aligned}$$
(2)
$$\begin{aligned} \text {Accuracy (ACC)} = \frac{ \mathrm {TP + TN}}{\mathrm {TP+FP + TN + FN}} \end{aligned}$$
(3)
$$\begin{aligned} \text {Balanced Accuracy (BACC)} = \frac{1}{2} \left( \frac{ \textrm{TP}}{\mathrm {TP+FN}} + \frac{ \textrm{TN}}{\mathrm {TN+FP}}\right) \end{aligned}$$
(4)
$$\begin{aligned} \text {True Skill Statistics (TSS)} = \frac{ \textrm{TP}}{\mathrm {TP+FN}} - \frac{\textrm{FP}}{\mathrm {FP+TN}} \end{aligned}$$
(5)

Table 1 compares SolarFlareNet with related methods for 24-h ahead flare predictions. The performance metric values of SolarFlareNet are mean values obtained from 10-fold cross-validation9. The metric values of the highest performance models in the related studies are taken directly from those studies and are displayed in Table 1. The symbol ‘–’ means that a method does not produce the metric value for the corresponding prediction task. The best metric values are highlighted in boldface. TSS is the primary metric used in the literature on flare prediction. It can be seen from Table 1 that SolarFlareNet outperforms the state-of-the-art methods in terms of TSS except that Liu et al.9 perform better than SolarFlareNet in predicting \(\ge\)M5.0 class flares.

Table 1 Performance comparison between SolarFlareNet and related methods for 24-h ahead flare predictions.

Table 2 presents the mean performance metric values with standard deviations enclosed in parentheses for the 48- and 72-h forecasts made by SolarFlareNet. None of the existing methods in Table 1 provides predictions in 48 or 72 h in advance and, therefore, they are not listed in Table 2. Overall, SolarFlareNet performs well for the 48- and 72-h forecasts. However, the metric values of the tool in Table 2 are lower than those in Table 1. This is understandable due to the longer range of predictions in Table 2.

Table 2 Performance metric values of SolarFlareNet for 48- and 72-h ahead flare predictions.

Probabilistic forecasting with calibration

SolarFlareNet is essentially a probabilistic forecasting method, producing a probability between 0 and 1. The method compares the probability with a predetermined threshold, which is set to 0.5. Given an AR and a data sample \(x_{t}\) at time point t, if the predicted probability is greater than or equal to the threshold, then the AR will produce a flare within the next 24 (48, 72, respectively) hours of t (i.e., \(x_{t}\) belongs to the positive class); otherwise, the AR will not produce a flare within the next 24 (48, 72, respectively) hours of t (i.e., \(x_{t}\) belongs to the negative class). We can turn SolarFlareNet into a probabilistic forecasting method by directly outputting the predicted probability without comparing it with the threshold. Under this circumstance, the output is interpreted as a probabilistic estimate of how likely the AR will produce a flare within the next 24 (48, 72, respectively) hours of t. We employ a probability calibration technique with isotonic regression 36,37 to adjust the predicted probability and avoid the mismatch between the distributions of the predicted and expected probabilistic values 5. We add a suffix “-C” to SolarFlareNet to denote the network without calibration.

To evaluate the performance of a probabilistic forecasting method, we use the Brier score (BS) and Brier skill score (BSS), defined as follows4,5,38:

$$\begin{aligned} \text {BS} = \frac{1}{N}\sum _{i=1}^{N}(y_i - \hat{y}_i)^2 \end{aligned}$$
(6)
$$\begin{aligned} \text {BSS} = 1 - \frac{BS}{\frac{1}{N}\sum _{i=1}^{N}(y_i - \bar{y})^2} \end{aligned}$$
(7)

where N is the number of data samples in a test set; \(y_i\) denotes the observed probability and \(\hat{y}_i\) denotes the predicted probability of the ith test data sample, respectively; \(\bar{y} = \frac{1}{N}\sum _{i=1}^{N} y_i\) denotes the mean of all the observed probabilities. BS values range from 0 to 1, with 0 being a perfect score. BSS values range from \(-\infty\) to 1, with 1 being a perfect score.

Table 3 compares SolarFlareNet, used as a probabilistic forecasting method, with a closely related method9. The BS and BSS values in the table are mean values (with standard deviations enclosed in parentheses) obtained from 10-fold cross-validation. The metric values for the existing method are taken directly from the related work9. The best BS and BSS values are highlighted in bold. Notice that the existing method did not make 48-h or 72-h forecasts in advance. Table 3 shows that there is no definitive conclusion regarding the relative performance of SolarFlareNet and the existing method. The existing method is better in terms of BS, while SolarFlareNet is better in terms of BSS. On the other hand, the calibrated version of a model is better than the model without calibration. Notice also that the results of the 48-h and 72-h forecasts are worse than those of the 24-h forecasts. This is understandable since the longer the prediction window, the worse the performance a model achieves due to data deviation over time.

Table 3 Performance comparison between SolarFlareNet and an existing method for probabilistic flare predictions (24 to 72 h in advance).

The SolarFlareNet system

Figure 1
figure 1

The graphical user interface of the SolarFlareNet system.

We have implemented the probabilistic forecasting method described above into an operational, near real-time flare forecasting system. To access the system, visit the SolarDB website at https://nature.njit.edu/solardb/index.html. On the website, select and click the menu entry “Tools” and then select and click “Flare Forecasting System.” Figure 1 shows the graphical user interface (GUI) of the system. It displays a probabilistic estimate of how likely an AR will produce a flare within the next 24, 48, and 72 h of the time point at which the system is updated each day. No prediction is made for an AR marked with a special character *, #, or \(\sim\) where

  • * means the AR is near the limb,

  • # means the AR is spotless with the number of spots being zero,

  • \(\sim\) means no SHARP data is available for the AR.

The system provides daily predictions based on the data obtained from the previous day. When the user clicks the link to the previous day, the user is led to the SolarMonitor site that is accessible at https://www.solarmonitor.org/index.php where detailed AR information for that day is available. The system also provides previous forecasting results since the operational system came online. We compare the previous forecasting results with the true flare events in the GOES X-ray flare catalogs provided by NCEI. The SolarFlareNet system achieves 89% (76%, 71%, respectively) accuracy for 24-h (48-h, 72-h, respectively) ahead predictions.

Discussion and conclusion

In this article, we present a novel transformer-based framework to predict whether a solar active region (AR) would produce a \(\gamma\)-class flare within the next 24 to 72 h where \(\gamma\) is \(\ge\)M5.0, \(\ge\)M, or \(\ge\)C. We use three transform models to handle the three classes of flares individually and separately. All three transformer models perform binary predictions. We collect ARs with flares that occurred between 2010 and 2022 from the GOES X-ray flare catalogs provided by the National Centers for Environmental Information (NCEI). In addition, we downloaded SHARP magnetic parameters from the Joint Science Operations Center (JSOC). Each data sample contains SHARP parameters suitable for machine learning. We conducted experiments using 10-fold cross-validation9. Based on the experiments, our transformer-based framework generally performs better than closely related methods in terms of TSS (true skill statistics), as shown in Table 1. We further extend our framework to produce probabilistic forecasts of flares and implement the framework into an operational, near real-time flare forecasting system accessible on the Web. The probabilistic framework is comparable to a closely related method9 in terms of BS (Brier score) and BSS (Brier skill score) when making 24-h forecasts, as shown in Table 3, although the existing method did not make 48- or 72-h forecasts. Thus, we conclude that SolarFlareNet is a feasible tool for producing flare forecasts within 24 to 72 h.

Methods

Data collection

In this study we used SHARP magnetic parameters2,9,25 downloaded from the Joint Science Operations Center (JSOC) accessible at http://jsoc.stanford.edu/. Specifically, we collect data samples, composed of SHARP parameters, at a cadence of 12 minutes where the data samples are retrieved from the hmi.sharp_cea_720s data series on the JSOC website using the Python package SunPy39. We selected nine SHARP magnetic parameters as suggested in the literature2,3,4,9,19. These nine parameters include the total unsigned current helicity (TOTUSJH), total unsigned vertical current (TOTUSJZ), total unsigned flux (USFLUX), mean characteristic twist parameter (MEANALP), sum of flux near polarity inversion line (R_VALUE), total photospheric magnetic free energy density (TOTPOT), sum of the modulus of the net current per polarity (SAVNCPP), area of strong field pixels in the active region (AREA_ACR), and absolute value of the net current helicity (ABSNJZH). Table 4 presents an overview of the nine parameters. The SHARP parameters’ values are in different scales and units; therefore, we normalize each parameter’s values using the min-max normalization method4,5. Formally, let \(p^k_{i}\) be the original value of the ith parameter of the kth data sample. Let \(q^k_{i}\) be the normalized value of the ith parameter of the kth data sample. Let \(min_i\) be the minimum value of the ith parameter. Let \(max_i\) be the maximum value of the ith parameter. Then

$$\begin{aligned} q^k_{i} = \frac{p^k_i - min_i}{max_i - min_i} \end{aligned}$$
(8)

We collected A-, B-, C-, M- and X-class flares that occurred between May 2010 and December 2022, and their associated active regions (ARs) from the GOES X-ray flare catalogs provided by the National Centers for Environmental Information (NCEI). Flares without identified ARs were excluded. This process yielded a database of 8 A-class flares, 6571 B-class flares, 8973 C-class flares, 895 M-class flares, and 58 X-class flares. Also, we collected 10 nonflaring ARs40. We collected data samples that were 24 (48, 72, respectively) hours before a flare. Furthermore, we collected data samples that were 24 (48, 72, respectively) hours after the start time of each nonflaring AR. The data was then cleaned as follows2,5,9.

We discard ARs that are outside ± 70\(^\circ\) of the central meridian. These ARs are near the limb and have projection effects that render the calculation of the ARs’ SHARP parameters incorrect. In addition, we discard a data sample if (i) its corresponding flare record has an absolute value of the radial velocity of SDO greater than 3500 m \(s^{-1}\), (ii) the HMI data have low quality41, or (iii) the data sample has missing values or incomplete SHARP parameters. Thus, we exclude low-quality data samples and keep qualified data samples of high quality in our study.

Table 4 Overview of the nine SHARP parameters used in our study.

Data labeling

Data labeling is crucial in machine learning. To predict \(\ge\)C-class flares, suppose that a C-, M-, or X-class flare occurs at time point t on an AR (more precisely, the start time of the flare is t). Data samples between t and t − 24 h (48, 72 h, respectively) in the AR are labeled positive. If the flare occurs at time point t is an A-class or B-class flare, the data samples between t and t − 24 h (48, 72 h, respectively) in the AR are labeled negative. Figure 2 illustrates the labeling scheme to predict whether a \(\ge\)C-class flare would occur within 24 h. In predicting \(\ge\)M-class flares, we use \(\ge\)M-class flares to label positive data samples; use \(\le\)C-class flares to label negative data samples. In predicting \(\ge\)M5.0-class flares, we use \(\ge\)M5.0-class flares to label positive data samples; use \(\le\)C-class flares as well as M1.0- through M4.0-class flares to label negative data samples. If there are recurring flares whose corresponding data samples overlap, we give priority to the largest flare and label the overlapped data samples based on the largest flare. In all three prediction tasks, the data samples in the nonflaring ARs are labeled negative.

Table 5 shows the total numbers of positive and negative data samples in each class for 24-, 48-, and 72-h ahead flare predictions. The numbers in the table are lower than expected. This is because we discarded/removed many low-quality data samples as described above. If a gap occurs in the middle of a time series due to removal, we use a zero-padding strategy5,9 to create a synthetic data sample to fill the gap. The synthetic data sample has zero values for all nine SHARP parameters. The synthetic data sample is added after normalization of the values of the SHARP parameters, and therefore the synthetic data sample does not affect the normalization procedure.

Figure 2
figure 2

Illustration of positive and negative data samples used in predicting \(\ge\)C-class flares. In the left panel, the red vertical line indicates the start time of a \(\ge\)C-class flare. The data samples collected in the 24 h prior to the red vertical line are labeled positive (in green color). In the right panel, the red vertical line indicates the start time of an A-class or B-class flare. The data samples collected in the 24 h prior to the red vertical line are labeled negative (in yellow color).

For each prediction task, we divide the corresponding data samples into 10 equal sized distinct partitions/folds that are used to perform 10-fold cross-validation experiments. In the run i, where \(1 \le i \le 10\), we use fold i as the test set and the union of the other nine folds as the training set. The data samples of the same AR are placed in the training set or the test set, but not both. This scheme ensures that a model is trained with data different from the test data and makes predictions on the test data that it has never seen during training. There are 10 folds and, consequently, 10 runs. The means and standard deviations of the performance metrics’ values over the 10 runs are calculated and recorded.

Data augmentation

The data sets used in this study to predict flares of the \(\ge\)M- and \(\ge\)M5.0-class are imbalanced as shown in Table 5 where negative data samples are much more than positive data samples. Imbalanced data pose a challenge in model training and often result in poor model performance. One may use data augmentation to combat the imbalanced data. Data augmentation is an important technique that enriches training data and increases the generalization of the model 42. Here, we adopt the Gaussian white noise (GWN) data augmentation scheme because it has shown a significant improvement in model performance43,44. GWN assumes that any two values are statistically independent, regardless of how close they are in time. The stationary random values of GWN are generated using the zero mean and 5\(\%\) of the standard deviation. During training, the data augmentation is applied to the minority (positive) class, leaving the majority (negative) class as is. During testing, the data are left without any augmentation so that the model predicts only on the actual test data to avoid any misleading performance assessment.

Table 5 Total numbers of positive and negative data samples in each class for 24-, 48-, and 72-h ahead flare predictions.

The SolarFlareNet architecture

Figure 3 presents the architecture of SolarFlareNet. It is a transformer-based framework that combines a one-dimensional convolutional neural network (Conv1D), long short-term memory (LSTM), transformer encoder blocks (TEBs), and additional layers that include batch normalization (BN) layers, dropout layers, and dense layers. The first layer is the input layer, which takes as input a time series of m consecutive data samples \(x_{t-m+1}\), \(x_{t-m+2}\) ...\(x_{t-1}\), \(x_{t}\) where \(x_{t}\) is the data sample at time point t5. (In the study presented here, m is set to 10.) The input layer is followed by a BN layer. BN is an additional mechanism to stabilize SolarFlareNet, make it faster, and help to avoid overfitting during training45. We applied BN after the input layer, the LSTM layer, and within the TEBs to make sure that SolarFlareNet is stable throughout the training process. The BN layer is followed by the Conv1D layer because time series generally have a strong 1D time locality that can be extracted by the Conv1D layers46. Then, the LSTM layer is used, which is equipped with regularization to also avoid overfitting. LSTM is suitable for handling time series data to capture the temporal correlation and dependency in the data. Adding an LSTM layer after a Conv1D layer has shown significant improvement in time series prediction47,48,49. The LSTM layer passes the learned features and patterns to a BN layer to stabilize the network before the data go to the TEBs.

We use transformer encoders without decoders because we process time series here, rather than performing natural language processing where the decoders are required to decode the words for sentence translation. The number of TEBs is set to 4. This number has a significant effect on the overall performance of the model50. When we use less than 4 TEBs, the model is not able to learn useful patterns and is under-fitted. When we use more than 4 TEBs, the large number of TEBs causes overhead on the encoder processing while the model tends to do excessive overfitting and lean toward the majority class (i.e., negative class) in the data, ignoring the minority class (i.e., positive class) entirely. Each TEB is configured with a dropout layer, multi-head attention (MHA) layer, a BN layer, a Conv1D layer, and an LSTM layer. The MHA layer is the most important layer in the encoder because it provides the transformation on the sequence values to obtain the different metrics. The MHA layer is configured with 4 heads and each attention head is also set to 4. The dropout layer is mainly used to overcome the overfitting caused by the imbalanced data. It drops a percentage of the neurons from the architecture, which causes the internal architecture of the model to change, allowing for better performance and stability. Finally, the softmax function is used as the final activation function, which produces a probabilistic estimate of how likely a flare will occur within the next 24 (48, 72, respectively) hours of t.

Figure 3
figure 3

Architecture of SolarFlareNet.

Ablation study

We performed ablation tests to assess each component of SolarFlareNet. We consider four variants of SolarFlareNet, denoted SolarFlareNet-Conv, SolarFlareNet-L, SolarFlareNet-ConvL, and SolarFlareNet-T, respectively. Here, SolarFlareNet-Conv (SolarFlareNet-L, SolarFlareNet-ConvL, SolarFlareNet-T, respectively) represents the subnet of SolarFlareNet in which the Conv1D layer (LSTM layer, Conv1D and LSTM layers, transformer network with the 4 TEBs, respectively) is removed while keeping the remaining components of the SolarFlareNet framework. Table 6 compares the TSS values of the five models for the 24-, 48-, and 72-h ahead flare prediction. It can be seen from Table 6 that the full model, SolarFlareNet, outperforms the four subnets in terms of the TSS metric. This happens because the SolarFlareNet-Conv model captures the temporal correlation of the test data, but does not learn additional characteristics of the data to build a stronger relationship between the test data. SolarFlareNet-L captures the properties of the test data, but lacks knowledge of the temporal correlation patterns in the data to deeply analyze the sequential information in the test data. It can also be seen from Table 6 that the SolarFlareNet-ConvL model is not as good as the full model, indicating that the transformer network alone is not enough to produce the best results. Lastly, SolarFlareNet-T has the least performance among the four subnets, demonstrating the importance of the transformer network. In conclusion, our ablation study indicates that the performance of the proposed SolarFlareNet framework is not dominated by any single component. In fact, all components have made contributions to the overall performance of the proposed framework.

Table 6 TSS values of the five methods considered in the ablation study.