Spectral band selection and ANIMR-GAN for high-performance multispectral coal gangue classification

Low-energy and efficient coal gangue sorting is crucial for environmental protection. Multispectral imaging (MSI) has emerged as a promising technology in this domain. This work addresses the challenge of low resolution and poor recognition performance in underground MSI equipment. We propose an attention-based multi-level residual network (ANIMR) within a super-resolution reconstruction model (ANIMR-GAN) inspired by CycleGAN. This model incorporates improvements to the discriminator and loss function. We trained the model on 600 coal and gangue MSI samples and validated it on an independent set of 120 samples. The ANIMR-GAN, combined with a random forest classifier, achieved a maximum accuracy of 97.78% and an average accuracy of 93.72%. Furthermore, the study identifies the 959.37 nm band as optimal for coal and gangue classification. Compared to existing super-resolution methods, ANIMR-GAN offers advantages, paving the way for intelligent and efficient coal gangue sorting, ultimately promoting advancements in sustainable mineral processing.


Multispectral data acquisition
Shaanxi is an important coal-producing province in China, with Yan'an, Jingbian, and Yulin being its main coal mining regions [23]. A total of 300 coal samples and 300 gangue samples were selected from the CM1-CM5 mining areas, as shown in Fig. 1. Multispectral imaging of these samples yielded 300 coal images and 300 gangue images, which were used for modeling and for analyzing algorithm performance. In addition, 60 coal samples and 60 gangue samples were collected separately from the CM6 mining area, and their MSI were obtained. These images were not used for modeling, only for validating the final model. The selected coal and gangue pieces are 50-300 mm in size, and the coal is mainly non-stick coal and long flame coal.
The MSI system for coal and gangue recognition, illustrated in Fig. 2, consists of a light source and a spectrometer. The light source is an LS-LHA produced by SUMITA OPTICAL GLASS, Inc., with a maximum power of 150 W and a corresponding average illumination of 406 klx. The filtering device consists of two filters manufactured by Edmund Optics (United States), which pass only light with wavelengths in the range 675-975 nm. The multispectral imager is an MQ022HG-IM-SM5X5-NIR produced by XIMEA, Germany. It acquires 25 spectral images in the 675-975 nm wavelength range, each with a resolution of 409 × 216 pixels. For more detailed information, please refer to Refs. [26, 27].
The focal length and exposure time of the spectrometer were set to 3.2 mm and 85.03 ms, respectively. A total of 600 multispectral data were used for modeling, comprising 300 coal samples and 300 gangue samples. Because of the limited sample size, data augmentation techniques such as flipping and adding Gaussian noise were applied to the spectral images, yielding 3000 multispectral data in total: 1500 coal and 1500 gangue, each containing 25 spectral images. Figure 3 displays the spectral images of coal and gangue, where (01)-(25) denote the different wavelength bands of the spectral images in the HSImager software.
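The fivefold expansion from 600 to 3000 multispectral data can be sketched as follows. The text names flipping and Gaussian noise but not the exact set of transforms, so the four augmented copies below, the noise level `sigma`, and all function names are illustrative assumptions rather than the authors' pipeline.

```python
import random

def flip_horizontal(img):
    """Mirror a 2-D image (list of rows) left to right."""
    return [row[::-1] for row in img]

def flip_vertical(img):
    """Mirror a 2-D image top to bottom."""
    return [row[:] for row in img[::-1]]

def add_gaussian_noise(img, sigma, rng, lo=0.0, hi=255.0):
    """Add zero-mean Gaussian noise to every pixel and clip to [lo, hi]."""
    return [[min(hi, max(lo, p + rng.gauss(0.0, sigma))) for p in row] for row in img]

def augment_cube(cube, rng, sigma=2.0):
    """Return the original multispectral cube plus four augmented copies.

    `cube` is a list of band images; each transform is applied to every band
    so that the geometric changes stay aligned across bands.
    The choice of these four transforms is an assumption.
    """
    def per_band(transform):
        return [transform(band) for band in cube]

    return [
        cube,
        per_band(flip_horizontal),
        per_band(flip_vertical),
        per_band(lambda b: add_gaussian_noise(b, sigma, rng)),
        per_band(lambda b: add_gaussian_noise(flip_horizontal(b), sigma, rng)),
    ]
```

Applying `augment_cube` to each of the 600 cubes gives five cubes per sample, which matches the 3000 multispectral data reported above.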
The grayscale images are converted to three-channel heat maps, ranging from red to white, for enhanced visualization. To standardize the analysis, the images are resized to 400 × 400 pixels using quadratic interpolation. Figure 4 shows the results. Although visual differences between coal and gangue are more pronounced after processing, direct visual classification remains challenging. This highlights the need for super-resolution (SR) techniques combined with a suitable classification algorithm to optimize the mineral sorting process.

Method
This paper proposes an SR method that integrates an attention network and an improved multi-level residual network (ANIMR). The method introduces an attention mechanism and uses the improved residual network to enhance feature extraction. It has the dual advantages of low response time and high reconstructed image quality. The model consists of a generator and a discriminator: the generator includes a reconstruction network G and a degradation network F, while the discriminator includes $D_{LR}$ and $D_{HR}$. G reconstructs LR images into HR images, F downsamples HR images into LR images, $D_{LR}$ distinguishes real LR images from degraded images, and $D_{HR}$ distinguishes real HR images from reconstructed HR images. The overall ANIMR-GAN system architecture is shown in Fig. 5. In Eq. (1), $K_{RG1}$, $K_{RG2}$, $K_{RG3}$, and $K_{RG4}$ are the outputs of the four residual blocks (RB), $I_c$ represents the concatenation of these four outputs, $I_{CAB}$ represents the channel attention module, and $I_{RG}$ is a 1 × 1 convolutional layer used to adjust the number of output channels.
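A form of Eq. (1) consistent with the symbol definitions above, given as a plausible reading rather than the authors' exact expression (the name $K_{RG}$ for the residual-group output is an assumption), is:

```latex
K_{RG} = I_{RG}\Big( I_{CAB}\big( I_c( K_{RG1}, K_{RG2}, K_{RG3}, K_{RG4} ) \big) \Big) \tag{1}
```

Under this reading the four block outputs are concatenated, re-weighted channel by channel, and projected back to the working channel count. Any skip connection around the group is not captured here.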
In ANIMR, the RB architecture is augmented with a spatial attention block (SAB) placed at the end of the block, and the BN layer is removed, as illustrated in Fig. 6B. This study builds on the RB structures designed in SRResNet and VDSR (Fig. 6C,D) [28] and introduces both a channel attention module and an SAB, so that the residual network learns channel and spatial weights respectively, enhancing feature extraction across different channels and spatial regions. The channel attention mechanism assigns higher weights to important channels, enhancing feature extraction without increasing the network width. The spatial attention mechanism allocates more attention to important high-frequency information, such as texture and edges, in the feature map.
Figure 2. MSI acquisition system. The acquisition system includes the host, PC, light source, filter and necessary accessories, and is arranged on the coal and gangue conveyor belt. A dust removal system on the belt keeps the equipment dust-free by means of a suction pipe, a centrifugal machine and a filter; coal and gangue are separated by an automatic air compressor.

Degradation network
The degradation network degrades the HR image into an LR image, which is the reverse of the reconstruction network's task. As shown in Fig. 7, where Interpolate denotes an interpolation-based downsampling operation, the degradation network first downsamples the HR image to the low-resolution space, then uses the residual groups from ANIMR to learn the mapping between HR and LR, and finally reconstructs the predicted LR image through a convolutional layer.

$D_{LR}$ and $D_{HR}$ discriminators
The discriminators used in the algorithm are based on the structure proposed in Ref. [29], as shown in Fig. 8. The relative discriminator network is essentially a binary classifier, as shown in Eq. (2).
In Eq. (2), $HR_r$ represents the original HR image, $HR_f$ represents the SR image obtained by the reconstruction network, σ denotes the sigmoid activation function, $I_D$ represents the parameters of the convolutional layers in the discriminator network, and E represents the mean over all samples in the current dataset. When the input to the discriminator is a reconstructed image $HR_f$, the improved output is no longer the probability that $HR_f$ is an original image, but the probability that $HR_f$ is more realistic than the original image. This enables the HR image to play a more active role in training.
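A form of Eq. (2) consistent with these definitions, following the relativistic average discriminator commonly used in SR GANs and given as a plausible reading rather than the authors' exact expression, is:

```latex
D(HR_r, HR_f) = \sigma\big( I_D(HR_r) - \mathbb{E}\,[\, I_D(HR_f) \,] \big) \tag{2}
```

Swapping the two arguments gives $D(HR_f, HR_r)$, the probability that a reconstructed image is more realistic than the original images on average. Here $I_D(\cdot)$ is read as the raw output of the discriminator's convolutional layers before the sigmoid, which is an assumption.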

Loss function design
During reconstruction, there is always inherent uncertainty in the image details when going from LR to HR. In ANIMR-GAN, the loss function of the reconstruction network G consists of four parts: pixel-level loss, perceptual loss, adversarial loss, and cycle consistency loss, as shown in Eq. (3).
Pixel-level loss. In the pixel-level loss, H and W represent the size of the feature map, and i and j represent the pixel coordinates. $F_{i,j}$ represents the grayscale value of the SR image at that pixel, and $Y_{i,j}$ represents the grayscale value of the original HR image at that pixel. The error is computed as the absolute difference rather than the squared difference, which reduces the computational complexity during iteration, improves the convergence speed of the model, and alleviates over-blurring in the SR image.
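The pixel-level term described above amounts to a mean absolute error over the H × W grid, $\frac{1}{HW}\sum_{i}\sum_{j}|F_{i,j}-Y_{i,j}|$. The sketch below computes it alongside the squared-error alternative it replaces. The function names and the nested-list image representation are illustrative assumptions, and the normalisation by HW is assumed because the equation itself is not shown.

```python
def l1_pixel_loss(sr, hr):
    """Mean absolute difference between the SR image F and the HR image Y."""
    h, w = len(hr), len(hr[0])
    total = sum(abs(sr[i][j] - hr[i][j]) for i in range(h) for j in range(w))
    return total / (h * w)

def l2_pixel_loss(sr, hr):
    """Mean squared difference, shown only for comparison with the L1 form."""
    h, w = len(hr), len(hr[0])
    total = sum((sr[i][j] - hr[i][j]) ** 2 for i in range(h) for j in range(w))
    return total / (h * w)

# Tiny 2 x 2 example: one small error (2) and one large error (10).
sr_demo = [[10, 20], [30, 40]]
hr_demo = [[12, 20], [30, 30]]
l1_demo = l1_pixel_loss(sr_demo, hr_demo)
l2_demo = l2_pixel_loss(sr_demo, hr_demo)
```

In this example the single large error dominates the squared loss far more than the absolute loss, which is the usual motivation for preferring the absolute difference when over-blurred outputs are a concern.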
Perceptual loss. Perceptual loss was originally proposed by Fei-Fei Li's team at Stanford University [30] and is applied here to the SR task. The perceptual loss function uses the VGG19 network for feature extraction. Because the perceptual gap between two images can be measured more comprehensively on feature maps before activation, the feature map before the ReLU activation layer is selected to calculate the loss. The perceptual loss function is shown in Eq. (5).
In Eq. (5), H and W are the dimensions of the image; the subscripts i and j are the pixel coordinates; φ represents the VGG19 network; $F_{i,j}$ is the gray value of the reconstructed image at that pixel; and $Y_{i,j}$ is the gray value of the original HR image at that pixel.
Adversarial loss. Training a generative adversarial network is an iterative learning process in which the generator and discriminator compete against each other. The adversarial loss function of the relative discriminator is shown in Eq. (6). The adversarial loss of the generator includes not only the reconstructed image but also the original HR image, which promotes the training of the reconstruction network. In SRGAN, the generator's adversarial loss involves only the reconstructed images, so the real HR images do not contribute directly to generator training and the discriminator's comparison with real images is not exploited. A relative discriminator is therefore a more appropriate choice. It not only distinguishes reconstructed images from original HR images but also helps the generator learn the mapping between them. In this way, the generator learns to produce more realistic images, improving the learning ability of the reconstruction network.
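A minimal sketch of a relativistic average adversarial loss is given below. It follows the formulation commonly used with relative discriminators, not necessarily the exact form of Eq. (6), which is not shown in this text. The inputs are raw (pre-sigmoid) discriminator scores, and all function names are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mean(values):
    return sum(values) / len(values)

def relativistic_prob(scores_a, scores_b):
    """sigma(C(a) - E[C(b)]): how much more realistic each a looks than b on average."""
    mean_b = mean(scores_b)
    return [sigmoid(s - mean_b) for s in scores_a]

def discriminator_loss(real_scores, fake_scores):
    """The discriminator wants real images to look more realistic than fakes."""
    d_rf = relativistic_prob(real_scores, fake_scores)
    d_fr = relativistic_prob(fake_scores, real_scores)
    return -mean([math.log(p) for p in d_rf]) - mean([math.log(1.0 - p) for p in d_fr])

def generator_loss(real_scores, fake_scores):
    """The generator wants the opposite; both real and fake scores enter its loss."""
    d_rf = relativistic_prob(real_scores, fake_scores)
    d_fr = relativistic_prob(fake_scores, real_scores)
    return -mean([math.log(1.0 - p) for p in d_rf]) - mean([math.log(p) for p in d_fr])
```

Because `real_scores` appears in `generator_loss`, gradients reach the generator through both terms, which is the sense in which the original HR images take part in training the reconstruction network.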
Cycle consistency loss. Through the adversarial loss, the reconstruction network, degradation network, and discriminator can be trained separately. However, during reconstruction the network may generate different outputs for the same MSI input, leading to a one-to-many mapping between LR and HR images. To ensure that the feature information of the input image is not lost, a cycle consistency loss is introduced in the cyclic structure. This loss ensures that an input x remains close to the output obtained after passing through a full cycle, as shown in Eq. (8).
The cycle consistency loss is obtained by summing the mean absolute errors between the inputs and outputs of the two cycle structures, as shown in Eq. (9).
In Eq. (9), MAE represents the mean absolute error, G represents the reconstruction network, and F represents the degradation network.
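The two cycles can be illustrated with stand-in networks. In the sketch below, `toy_G` (nearest-neighbour upsampling) and `toy_F` (2 × 2 average pooling) are hypothetical placeholders for the trained reconstruction and degradation networks; only the structure MAE(F(G(x)), x) + MAE(G(F(y)), y) is taken from the description above.

```python
def mae(a, b):
    """Mean absolute error between two equally sized 2-D images."""
    h, w = len(a), len(a[0])
    return sum(abs(a[i][j] - b[i][j]) for i in range(h) for j in range(w)) / (h * w)

def toy_G(lr):
    """Stand-in reconstruction network: 2x nearest-neighbour upsampling."""
    out = []
    for row in lr:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def toy_F(hr):
    """Stand-in degradation network: 2 x 2 average pooling."""
    return [[(hr[i][j] + hr[i][j + 1] + hr[i + 1][j] + hr[i + 1][j + 1]) / 4.0
             for j in range(0, len(hr[0]), 2)]
            for i in range(0, len(hr), 2)]

def cycle_consistency_loss(x_lr, y_hr, G, F):
    """Sum of the LR cycle error MAE(F(G(x)), x) and the HR cycle error MAE(G(F(y)), y)."""
    return mae(F(G(x_lr)), x_lr) + mae(G(F(y_hr)), y_hr)

x_demo = [[1.0, 2.0], [3.0, 4.0]]
y_demo = [[0.0, 2.0, 4.0, 4.0],
          [2.0, 4.0, 4.0, 4.0],
          [1.0, 1.0, 0.0, 8.0],
          [1.0, 1.0, 8.0, 0.0]]
cycle_demo = cycle_consistency_loss(x_demo, y_demo, toy_G, toy_F)
```

With these toy networks the LR cycle is lossless, so the whole loss comes from the HR cycle, where detail removed by F cannot be restored by G. That residual is what the cycle term penalises during training.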

Evaluation index
This article uses two evaluation metrics, structural similarity (SSIM) and peak signal-to-noise ratio (PSNR), to assess the super-resolution reconstruction algorithm [31]. SSIM is a perception-oriented measure based on three relatively independent factors: luminance, contrast, and structure. It quantifies the structural similarity of the generated image, and its value ranges from 0 to 1. A higher value indicates a smaller error between the reconstructed image and the original image. SSIM is defined in Eq. (10).
PSNR, an error-sensitive image evaluation metric, is calculated from the mean square error (MSE) between the original and reconstructed images. It is commonly used to measure the similarity between the original and reconstructed images, with higher values indicating smaller errors. PSNR is calculated as in Eq. (11).
In Eqs. (10) and (11), n represents the number of bits per pixel, which is typically 8; l and m represent the dimensions of the image; and $I_{SR}(i,j)$ and $I_{HR}(i,j)$ denote the pixel values of the SR and HR images at position (i, j). $\mu_{I_{SR}}$ and $\mu_{I_{HR}}$ are the mean pixel values of the SR and HR images, while $\sigma_{I_{SR}}$ and $\sigma_{I_{HR}}$ are the standard deviations of the pixels in the SR and HR images. $\sigma_{I_{SR} I_{HR}}$ represents the covariance between the SR and HR images, and $c_1$ and $c_2$ are constants.
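Both metrics can be sketched directly from the quantities just defined. The code below uses the standard textbook forms: PSNR as $10\log_{10}\big((2^n-1)^2/\mathrm{MSE}\big)$, and SSIM computed once over the whole image rather than over local windows. The constants $c_1 = (0.01\,(2^n-1))^2$ and $c_2 = (0.03\,(2^n-1))^2$ are conventional choices assumed here, since the text does not give their values.

```python
import math

def _flatten(img):
    return [p for row in img for p in row]

def psnr(sr, hr, n_bits=8):
    """Peak signal-to-noise ratio in dB, from the MSE between SR and HR images."""
    a, b = _flatten(sr), _flatten(hr)
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    if mse == 0:
        return math.inf  # identical images
    peak = 2 ** n_bits - 1
    return 10.0 * math.log10(peak ** 2 / mse)

def ssim_global(sr, hr, n_bits=8, k1=0.01, k2=0.03):
    """Single-window SSIM from the means, variances and covariance of the two images."""
    a, b = _flatten(sr), _flatten(hr)
    n = len(a)
    peak = 2 ** n_bits - 1
    c1, c2 = (k1 * peak) ** 2, (k2 * peak) ** 2  # assumed conventional constants
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_b = sum((y - mu_b) ** 2 for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator
```

Library implementations usually average SSIM over sliding local windows, so values from this global variant are not directly comparable with the figures reported in the experiments.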

Random forest modeling
Random forest (RF) is an improved variant of the bagging ensemble algorithm. It is built on decision trees: the model is constructed by combining multiple trees, and a random selection of feature attributes is introduced during training. For a given data set, this double randomness (over samples and over features) gives the random forest better generalization and greater resistance to overfitting.
After reconstructing the coal and gangue images with ANIMR-GAN, we use RF to classify the reconstructed images. This requires two important parameters to be determined. First, the number of decision trees in the RF must be set, which is crucial for balancing computational resources and model performance. Second, the feature wavelengths of the coal and gangue MSI must be selected, identifying the key spectral bands relevant to coal and gangue recognition, which is essential for reducing model input and improving model robustness.
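The double randomness of RF can be illustrated with a deliberately small toy model. The sketch below replaces full decision trees with one-threshold decision stumps, draws a bootstrap sample and one random feature per stump, and classifies by majority vote. It is a simplified illustration, not the classifier used in this study, and the function names and the two-column feature layout are hypothetical.

```python
import random
from collections import Counter

def fit_stump(X, y, feature):
    """Best one-threshold rule on a single feature: (feature, threshold, left, right)."""
    values = sorted({row[feature] for row in X})
    if len(values) < 2:  # constant feature in this sample: predict the majority class
        label = Counter(y).most_common(1)[0][0]
        return (feature, values[0], label, label)
    best = None
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2.0
        for left, right in ((0, 1), (1, 0)):
            correct = sum((left if row[feature] <= thr else right) == label
                          for row, label in zip(X, y))
            if best is None or correct > best[0]:
                best = (correct, (feature, thr, left, right))
    return best[1]

def predict_stump(stump, row):
    feature, thr, left, right = stump
    return left if row[feature] <= thr else right

def fit_forest(X, y, n_trees, rng):
    """Train n_trees stumps, each on a bootstrap sample and one random feature."""
    forest = []
    n = len(X)
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # randomness 1: bootstrap rows
        feature = rng.randrange(len(X[0]))           # randomness 2: random feature
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest

def predict_forest(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]
```

Here `n_trees` plays the role of the number of decision trees discussed above. Each additional stump costs one more training pass and one more vote at prediction time, which is the resource trade-off that motivates capping the number of trees.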

Experimental results and analysis
This section presents a comprehensive validation of the proposed algorithm using real-world coal and gangue multispectral images (MSI).We perform a comparative analysis against various existing algorithms to assess its effectiveness.To facilitate this comparison, we down-sample the original high-resolution (HR) images by factors of 2, 4, and 8, generating corresponding low-resolution (LR) counterparts.The validation experiments are conducted on a well-equipped computational platform comprising a 64-bit Ubuntu 20.04 operating system, an Intel Core i7 processor, and an Nvidia GeForce RTX 3090 graphics card.The algorithm implementation leverages Python 3.8 and the PyTorch deep learning framework.
The experimental steps are as follows. First, all the data are divided into three sets, namely a training set, a test set and an independent validation set, in the ratio 7:3:2. The independent validation set samples are taken from the CM6 mining area. Because the selection of the training set can affect model accuracy, we repeated the recognition of spectral images at each wavelength 100 times, with the training set randomly selected each time. Second, the original grayscale image is mapped to a heat map and resized to 400 × 400 pixels. Third, the image is downsampled at rates of 2×, 4×, and 8× to obtain LR images of 200 × 200, 100 × 100, and 50 × 50 pixels, respectively. During training, the L1 loss function and the Adam optimizer are used, with $\beta_1 = 0.8$, $\beta_2 = 0.999$, and $\varepsilon = 10^{-7}$. The learning rate is set to $10^{-4}$. At the beginning of training, the reconstruction network and degradation network are pre-trained separately for 10 rounds. Then, the pre-trained models and the discriminators are trained alternately for 30,000 iterations. The total training time is about 48 h. The specific parameters of the model are shown in Table 1.
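The split ratio and the LR image sizes follow from simple arithmetic, sketched below. The sketch assumes that the 7:3 split is applied to the 600 modeling samples and that the 120 CM6 samples form the independent set, which is the reading under which the stated 7:3:2 ratio holds exactly. The function names and the `ADAM_CONFIG` dictionary are illustrative, not part of the original implementation.

```python
def split_counts(n_modeling, n_independent, train_parts=7, test_parts=3):
    """Split the modeling samples train:test and keep the independent set aside."""
    n_train = n_modeling * train_parts // (train_parts + test_parts)
    return n_train, n_modeling - n_train, n_independent

def lr_sizes(hr_size, factors=(2, 4, 8)):
    """Side length of the LR image for each downsampling factor."""
    return {factor: hr_size // factor for factor in factors}

# Optimizer settings quoted in the text, gathered in one place.
ADAM_CONFIG = {"lr": 1e-4, "beta1": 0.8, "beta2": 0.999, "eps": 1e-7}

counts = split_counts(600, 120)
sizes = lr_sizes(400)
```

Dividing each entry of `counts` by 60 recovers the 7:3:2 proportion, and `sizes` reproduces the 200, 100 and 50 pixel side lengths listed for the 2×, 4× and 8× cases.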

Determination of the number of residual groups
ANIMR-GAN uses residual groups (RG) to learn the mapping from LR images to HR images. The number of RG affects both the overall number of network parameters and the results. We trained models with different numbers of RG and tested 4× SR on the dataset. The number of RG was set to 8, 16, 32, 48, and 64. Figure 9 shows the relationship between the number of RG and the reconstruction results. When the number of RG is 8, the SSIM and PSNR values are 0.8894 and 32.54, respectively. When the number of RG is increased from 8 to 16, the SSIM and PSNR values improve by 5.959‰ and 7.068‰ to reach 0.8947 and 32.77, respectively, indicating an improvement in reconstruction quality. When the number of RG is increased from 16 to 48, the SSIM value shows a slight decrease while the PSNR continues to increase, reaching 0.8911 and 32.82, respectively. When the number of RG is increased from 48 to 64, both SSIM and PSNR increase, to 0.8934 and 32.94, respectively, and the PSNR reaches its highest value. Increasing the number of RG from 8 to 64 increases the number of parameters from 1,765,021 to 11,940,358. However, this substantial increase in parameters does not significantly improve the reconstruction quality and instead wastes computational resources. Therefore, we set the number of RG to 32, which maintains reconstruction quality while reducing the number of parameters and improving computational speed.
Figure 9. Experimental results for different numbers of RG. The optimal number of RG for the ANIMR-GAN model was determined to be 32, based on a balance between reconstruction quality and computational efficiency. This configuration makes good use of resources and is suitable for real-time applications where both accuracy and speed are important.
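The relative gains quoted in this subsection can be checked with a few lines of arithmetic. The helper name `per_mille_gain` is illustrative; the numbers are those reported in the text.

```python
def per_mille_gain(before, after):
    """Relative change from `before` to `after`, expressed in parts per thousand."""
    return (after - before) / before * 1000.0

# Going from 8 to 16 residual groups.
ssim_gain_8_to_16 = per_mille_gain(0.8894, 0.8947)
psnr_gain_8_to_16 = per_mille_gain(32.54, 32.77)

# Going from 8 to 64 residual groups: parameter growth versus PSNR gain.
param_ratio_8_to_64 = 11_940_358 / 1_765_021
psnr_gain_8_to_64 = per_mille_gain(32.54, 32.94)
```

The first two values reproduce the reported 5.959‰ and 7.068‰. The last two show that roughly 6.8 times as many parameters buy a PSNR gain of only about 12‰, which supports settling on an intermediate number of RG.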

Comparison of different SR methods
In this section, we compare ANIMR with commonly used SR methods, including the super-resolution convolutional neural network (SRCNN), the sparse coding-based network (SCN), and very deep networks for SR (VDSR). The evaluation is based on the existing MSI of coal and gangue, and PSNR and SSIM are used to assess the performance of each model. The main idea of SRCNN, which builds on the relationship between deep learning and traditional sparse coding, is to divide reconstruction into three stages: image patch extraction, non-linear mapping, and image reconstruction. These stages are then combined into a single deep convolutional neural network framework.
VDSR builds on the SRCNN approach, but adopts the VGG network structure developed for image classification and uses a deeper network with 20 weight layers. It aims to establish a mapping from LR to HR images using this deeper architecture.
The sparse coding-based network (SCN) borrows the idea of sparse-representation SR. It combines the independently optimized modules of sparse representation, mapping, and sparse reconstruction into a single sparse network and optimizes them jointly to obtain a globally optimal solution. SCN first obtains the sparse prior information of the image through a feature extraction layer, then builds a feedforward neural network based on the learned iterative shrinkage and thresholding algorithm (LISTA) to perform sparse encoding and decoding of the image. Finally, image enlargement is achieved through a cascaded network.
All three methods (SRCNN, VDSR, and SCN) are commonly used SR algorithms. The following compares ANIMR with them, using PSNR and SSIM to evaluate each model.
Models based on ANIMR, SRCNN, VDSR, and SCN were built, and their reconstruction results at different iteration counts are shown in Fig. 10. The horizontal axis represents the iteration number of the algorithm, while the vertical axis shows the PSNR and SSIM values obtained on the validation dataset. The results indicate that the reconstruction results of ANIMR-GAN are significantly better than those of the simpler SRCNN and VDSR, and slightly better than those of SCN.
To compare the SR performance of the four methods (SRCNN, VDSR, SCN, ANIMR-GAN) at different magnification factors, we ran each method at 2×, 4×, and 8× and calculated the SSIM and PSNR values. Figure 11 provides the detailed comparison. Overall, the SSIM and PSNR values of all reconstruction models decrease as the magnification factor increases, indicating a decline in reconstruction performance. For the 2× model, ANIMR-GAN achieved the best PSNR of 38.61 and an SSIM of 0.957. For the 4× model, ANIMR-GAN achieved the highest PSNR and SSIM values, 32.97 and 0.906, respectively. SCN also performed well, with PSNR and SSIM values of 32.9 and 0.902, respectively. For the 8× model, ANIMR-GAN still performed best, with PSNR and SSIM values of 27.56 and 0.780, respectively. To balance computational resources and reconstruction performance, we chose a magnification factor of 4×, which ensures the reconstruction performance without a dramatic increase in the input volume of the classification model.
Based on the experimental results above, it can be concluded that ANIMR-GAN performs well in reconstructing coal and gangue images at different iteration numbers and reconstruction ratios. Compared with SRCNN, VDSR, and SCN, it shows higher SSIM and PSNR values.

Random forest classification method based on ANIMR-GAN
First, the number of decision trees in the RF needs to be determined. Previous research has shown that the more decision trees an RF contains, the better its learning performance. However, too many decision trees waste computing resources and time. We selected five typical bands, comprising the shortest wavelength, the longest wavelength, and three intermediate wavelengths (682.92 nm, 736.25 nm, 851.48 nm, 900.95 nm, and 959.37 nm), to study the relationship between the number of decision trees in the RF and accuracy.
Figure 12 shows the average recognition accuracy for the selected bands at different numbers of decision trees. As the number of decision trees increases, the average accuracy also increases. However, once the number exceeds 50, the gain in average accuracy becomes minimal and the accuracy stabilizes. Therefore, we set the number of decision trees to 50. We also observed that the average accuracy differed between spectral bands, with 736.25 nm showing a relatively large fluctuation in the average recognition rate for coal and gangue, which may be related to the light source.
Figure 13 illustrates the range and average of the identification accuracy obtained by feeding MSI of different wavelengths into the RF model. The maximum and minimum values are marked with dotted lines and labels in the figure. The average accuracy is positively correlated with wavelength, i.e., longer wavelengths tend to yield higher accuracy. The maximum average accuracy appears at 959.37 nm, while the minimum appears at 872.87 nm. This result is very close to previous research [22]. Since absorption wavelengths correspond to the functional groups of substances, 872.87 nm corresponds to the stretching vibration of the C-O bond in alcohols or phenols. We speculate that the poor accuracy at this wavelength may be due to the presence of certain alcohol-like substances in both coal and gangue.
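The per-band summary behind such a comparison can be sketched as follows: for each band, collect the accuracies from the repeated random splits, then report the mean, range and standard deviation and pick the band with the highest mean. The band labels and accuracy values in `demo` are made-up placeholders for illustration only, not results from this study, and the function names are hypothetical.

```python
from statistics import mean, pstdev

def summarize_bands(acc_by_band):
    """Map each band to (mean, min, max, std) of its repeated-run accuracies."""
    return {band: (mean(acc), min(acc), max(acc), pstdev(acc))
            for band, acc in acc_by_band.items()}

def best_and_worst_band(acc_by_band):
    """Bands with the highest and lowest mean accuracy."""
    stats = summarize_bands(acc_by_band)
    best = max(stats, key=lambda band: stats[band][0])
    worst = min(stats, key=lambda band: stats[band][0])
    return best, worst

# Placeholder accuracies for three hypothetical bands and four repeats each.
demo = {
    "band_a": [0.80, 0.82, 0.81, 0.79],
    "band_b": [0.70, 0.74, 0.72, 0.68],
    "band_c": [0.92, 0.94, 0.93, 0.95],
}
best_band, worst_band = best_and_worst_band(demo)
```

In the experiments each band would carry 100 accuracies, one per random training split, and the standard deviation column would correspond to the fluctuation discussed for individual bands.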

Comparison of classification methods for coal gangue with SR
With the optimal number of decision trees in the RF model determined, the combination of ANIMR-GAN and RF can be used for identifying coal and gangue in MSI.
To compare the performance of different classification methods, commonly used machine learning methods, namely K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Least Squares Support Vector Machine (LSSVM), and eXtreme Gradient Boosting (XGBoost), are compared with RF. The parameter settings for each method are listed in Table S1; parameters were optimized mainly by grid search. Figure 14 shows the comparison of recognition accuracy, from which the following conclusions are drawn.
(1) KNN, SVM, LSSVM, XGBoost, and RF were used to compare the classification performance of different machine learning models on the coal and gangue images reconstructed by ANIMR-GAN. The accuracy differs from band to band and, overall, tends to increase with wavelength.
(2) The classification accuracy of KNN, SVM, and LSSVM was lower than that of RF in the wavelength range of 682.92 nm to 882.95 nm. For wavelengths beyond 882.95 nm, however, the five methods showed similar classification performance. XGBoost's classification accuracy was not significantly different from that of RF across all 25 wavelength bands, but its average accuracy was slightly lower.
(3) Compared to KNN, SVM, and LSSVM, RF has a lower standard deviation and therefore smaller fluctuations in accuracy. Although XGBoost has a similar average recognition rate to RF, its standard deviation is still larger.
(4) It is worth noting that the maximum recognition rates for SVM, LSSVM, and XGBoost all correspond to the 932.08 nm wavelength, which is associated with the stretching vibration of C-O-C bonds in lipid substances. This suggests that the chemical substances absorbing at this wavelength may be of significant importance in distinguishing between coal and gangue. However, due to the precision limitations of the

Conclusion
The separation of coal and gangue is a crucial step in coal mining and processing. Using MSI combined with ore sorting equipment for gangue separation can greatly improve mining efficiency and reduce the consumption of ore sorting reagents and water. SR can address the low resolution caused by equipment limitations and dust interference. In this paper, an attention mechanism is introduced to construct a multi-level residual network within a cyclic generative adversarial network structure, forming a new reconstruction method called ANIMR-GAN, which is of significant importance in accelerating the application of MSI to coal and gangue identification. We collected 360 coal samples and 360 gangue samples from six mining areas: samples from five of the areas (300 of each class) were used for model construction and research, and samples from the remaining area (60 of each class) were used for independent validation after modeling. We employed an MSI device to capture images in the 682.92-959.37 nm wavelength range. The impact of different numbers of residual groups on the SSIM and PSNR values of the reconstruction model was compared, and 32 RG were ultimately chosen to balance computation and reconstruction efficacy. ANIMR-GAN was compared with three commonly used reconstruction algorithms, SRCNN, VDSR, and SCN, at different iteration numbers and reconstruction magnifications. The results showed that reconstruction performance improved as the iteration number increased. ANIMR-GAN's reconstruction results were significantly superior to those of the simpler SRCNN and VDSR models and slightly better than those of SCN. In the 4× reconstruction, ANIMR-GAN had the highest PSNR and SSIM values, 32.97 and 0.906, respectively. To balance classification and reconstruction performance, a reconstruction magnification of 4× was chosen.
Next, we compared multispectral RF classification models with different numbers of decision trees and different spectral bands. The accuracy of the RF models varied with the spectral band used, with generally better performance at longer wavelengths. However, the model built using 872.87 nm had the lowest accuracy. We speculate that alcohol-like substances present in both coal and gangue may have affected the modeling results at this wavelength, but further investigation is needed to confirm this.
Finally, we combined ANIMR-GAN with the RF algorithm and compared it with KNN, SVM, LSSVM, and XGBoost, achieving good classification performance on both the modeling and independent datasets. The highest accuracy reached 97.78%, corresponding to a wavelength of 959.37 nm, with an average accuracy of 93.72%. The results show that using the 959.37 nm band for modeling is more effective than using all 25 bands. Compared with previous studies on coal and gangue classification without SR, this approach improved accuracy by approximately 2% [11].
Certainly, there is always room for improvement.In the next step, we will study feature extraction from the images reconstructed by ANIMR-GAN, and select combinations of bands for modeling analysis, to continuously improve the accuracy of the models we build.

Figure 3. Multispectral data for coal and gangue. The spectral images of coal and gangue are displayed, where (01)-(25) represent the different wavelength bands of the spectral images in the HSImager software. (A) Coal, (B) Gangue.

Figure 4. Spectral data of coal and gangue after processing. The spectral images of coal and gangue are displayed, where (01)-(25) represent the different wavelength bands of the spectral images. (A) Coal, (B) Gangue.

Figure 5. The overall ANIMR-GAN system architecture. RG denotes residual group. The large heat maps represent the high-resolution images and the small heat maps represent the low-resolution images; a solid line represents a generative network and a dashed line represents a reconstructed network. The architecture includes a generator G, a degradation network F, a low-resolution discriminator $D_{LR}$, and a high-resolution discriminator $D_{HR}$.

Figure 7. Degradation network structure. HR denotes high-resolution images and RG denotes residual group. The degradation network first downsamples the HR image to the low-resolution space, then uses the residual groups from ANIMR to learn the mapping between HR and LR.

Figure 8. Discriminator structure. (A) The overall structure of the discriminator. (B) The structure of the Conv block.

Figure 10. Experimental results of ANIMR-GAN and other networks with different models. (A) The value of PSNR. (B) The value of SSIM.

Figure 11. The experimental results of different SR methods. The labels on the bar chart represent the SSIM of different models.

Figure 12. The relationship between the average recognition rate of the RF model and the number of decision trees.

Figure 13. The average recognition accuracy and accuracy range corresponding to different wavelengths.

Table 2. The validation results on the independent dataset.