Introduction

Computed tomography (CT) is essential to modern clinical diagnostic imaging. CT scanning provides multi-directional anatomical images of the human body that clearly display the structural features, organs, and lesions of each region, giving doctors an important diagnostic basis1. However, the radiation produced during CT scans can affect the subjects’ physical health. Studies have found that when the X-ray dose absorbed by the human body exceeds the normal range, it can cause metabolic disorders and induce leukemia, cancer, and many other diseases2. To reduce this harm, clinical practice uses low-dose CT (LDCT) scanning, which lowers the radiation dose by reducing the X-ray tube current, the tube voltage, and other measures3. Under low-dose conditions, the number of X-ray photons drops sharply, so the number of photons received by the detector fluctuates, which causes quantum noise and artifacts in the reconstructed images4. This noise blurs image details and reduces contrast, which affects the accuracy of medical imaging. Therefore, LDCT image denoising has become a research direction of great interest in medical imaging. Current methods for improving LDCT image quality can be categorized into projection domain processing methods5, iterative reconstruction methods6, and image post-processing methods.

Projection domain denoising algorithms denoise the projection data directly and then apply a CT reconstruction algorithm to obtain the denoised CT image. Classical methods of this kind include bilateral filtering and total generalized variation filters8. Li et al.7 studied the quantum noise problem in LDCT images and proposed a penalized likelihood method to suppress this noise9. Cui et al.10 proposed a statistical sinogram smoothing method based on adaptive weighted total variation regularization that reduces LDCT noise while preserving image detail. Projection domain methods have the advantages of low computational cost and short processing time. However, additional noise may be introduced when the CT image is reconstructed, and the methods rely on the projection data.

Iterative reconstruction methods are another common approach to improving LDCT image quality. They obtain denoised images mainly by optimizing an objective function with a prior regularization term, and researchers have proposed a large number of such algorithms. Liang et al.11 proposed the adaptive statistical iterative reconstruction (ASIR) algorithm to reconstruct LDCT images, but the method has limitations in image reconstruction. In recent years, numerous improved methods have been built on iterative reconstruction techniques. Sidky et al.12 added total variation (TV) to the objective function as an image prior. However, the TV constraint assumes that the signal is piecewise smooth. When complex structures are present, TV regularization can blur fine details in the reconstructed image. To solve this problem, Liu et al.13 proposed an adaptive weighted total variation minimization method, which can effectively reduce noise and artifacts in LDCT image reconstruction. Bredies et al.14 introduced a generalized total variation constraint in a penalized weighted least squares framework, which can effectively reduce the block artifacts caused by the TV constraint. However, iterative reconstruction methods require projection data, are vendor-dependent, and must be implemented on the scanner’s reconstruction system, where performing multiple iterations incurs high computational costs15.

Image post-processing methods denoise the reconstructed CT images directly, without relying on the original projection data, and can be applied flexibly to different scanning systems with fast processing speed, which makes them an important direction in current low-dose CT denoising research. Many post-processing methods have therefore been proposed, such as total variation and its variants16, wavelet transform algorithms17, the large-scale non-local means (LNLM) filter18 combined with a multi-scale directional diffusion scheme based on a fractional-order partial differential equation (PDE) filter19, and residual encoder-decoder convolutional network algorithms20. Buades et al.21 proposed the non-local means (NLM) method, which searches for similar pixels in a non-local neighborhood and has been used by researchers to denoise low-dose CT images22. However, as the noise level increases, the accuracy of this method in identifying similar pixels decreases. In23, the quality of low-dose CT images was improved through a large-scale NLM algorithm. However, this algorithm has no significant effect in removing artifacts. Zamyatin et al.24 proposed an adaptive multi-scale TV filtering algorithm that decomposes the image into multiple scales through Gaussian pyramid decomposition, effectively suppressing noise in low-dose CT images. However, the total variation method can blur or lose image edges while removing noise.

To further improve denoising results, dictionary learning-based denoising has also been applied to low-dose CT image post-processing. Elad et al.25 first proposed the K-SVD dictionary learning algorithm, and Zhu et al.26 used it for LDCT image denoising, improving image quality over traditional methods but struggling to preserve the edge and detail information of the image. The K-SVD algorithm learns from a given set of images so that the dictionary adapts to their content, but it trains only a dictionary with a fixed atom size, which limits how precisely it can describe image information. To further improve the performance of sparse representation denoising, Chen et al.27 proposed a sparse representation method based on fast dictionary learning to remove artifacts and noise in low-dose CT images of abdominal tumors. However, strong artifacts are difficult for this method to suppress, because streak artifacts can produce sparse coefficients as large as those of tissue structures. To solve this problem, Chen et al.28 proposed an artifact suppression dictionary learning (ASDL) algorithm, which combines wavelet decomposition and dictionary learning to remove noise and artifacts effectively. Chen et al.29 proposed a denoising method that combines dictionary learning with sharpening filtering for low-dose CT images of abdominal tumors, effectively recovering valid information from the CT images. However, when unsharp filtering is used for image denoising, it may over-smooth the image and lose some detail. Some researchers combine image self-similarity models with sparse representations. Jiang et al.30 proposed a weighted coding algorithm that combines the non-local self-similarity of images with sparse representation to build a variational model, which can retain more useful information. However, the dictionary used for sparse representation does not classify images according to the specific information they contain and therefore does not describe that information accurately.

Although denoising algorithms based on sparse representation achieve good results to a certain extent, how to retain image details effectively while removing noise is still an open question. To address it, this paper proposes an LDCT image denoising method based on two-dimensional variational mode decomposition and dictionary learning. The method approaches the problem through frequency-division processing of the image and optimization of the trained dictionary. First, two-dimensional variational mode decomposition is used to decompose the image into modal components with different center frequencies. Then, the K-SVD dictionary learning algorithm is used to train a dedicated dictionary for each type of modal component. Finally, atom detection is performed on the dictionary, and the atoms that cannot represent the image are optimized. By training on classified image blocks, the method obtains an over-complete dictionary that is better suited to the image features and can represent images more sparsely. At the same time, optimizing the noise atoms in the trained over-complete dictionary reduces their impact on image denoising. The work of this paper can be summarized as follows:

  1. Traditional sparse dictionary training methods do not consider differences in image structural characteristics, so the redundant dictionary obtained by training cannot fully reflect the detailed characteristics of the image. To solve this problem, the proposed classification dictionary training method uses two-dimensional variational mode decomposition to decompose the image signal into multiple modal components and trains a dictionary separately for modal components containing different image information. The resulting dictionaries preserve the texture and structure of the image at different scales while adapting better to the complexity and diversity of images.

  2. To improve the sparse representation ability of the dictionary, the regularized orthogonal matching pursuit (ROMP) algorithm is introduced in the sparse coding stage to sparsely represent the image signal. Selecting the most relevant atoms and updating the residuals and weights enables a more accurate sparse representation of signals.

  3. To avoid the impact of noise atoms in the trained dictionary on denoising performance, the Bartlett test is applied to the optimization of dictionary noise atoms, reducing their impact on the quality of the denoised image.

Basic theory

Sparse representation

The essence of sparse representation is to exploit the sparsity of a signal so that its main features are represented with as few non-zero elements as possible. This representation reduces the redundant information in the signal and extracts its main features, thus achieving compression and dimensionality reduction. A given image signal \(x \in R^{N}\) can be represented as a linear combination of the atoms of an overcomplete dictionary \(D = [d_{1} ,d_{2} , \ldots ,d_{K} ] \in R^{N \times K}\), i.e.

$$ x = D\alpha $$
(1)

where \(\alpha = [\alpha_{1} ,\alpha_{2} , \ldots ,\alpha_{K} ]^{T} \in R^{K}\) is the sparse coefficient vector of the image, \(K\) is the number of dictionary atoms, and \(K > N\). The image sparsity problem based on an overcomplete dictionary can be expressed as follows:

$$ \mathop {\min }\limits_{\alpha } \left\| \alpha \right\|_{0} , \quad s.t. \quad x = D\alpha $$
(2)

where \(\left\| \cdot \right\|_{0}\) denotes the \(l_{0}\)-norm, that is, the number of non-zero elements in the sparse coefficient vector, which is solved for under the constraints imposed by the dictionary and the signal sample. Since the dictionary \(D\) is overcomplete, the constraint \(x = D\alpha\) in Eq. (2) has an infinite number of solutions, so additional constraints are needed to find a solution with few non-zero coefficients. Elad et al.20 expressed the sparse representation-based image recovery model as follows:

$$ \mathop {\min }\limits_{\alpha } \frac{1}{2}\left\| {x - D\alpha } \right\|_{2}^{2} + \lambda \left\| \alpha \right\|_{0} $$
(3)
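To make Eqs. (1) to (3) concrete, the following minimal sketch builds a toy overcomplete dictionary with \(N = 2\) and \(K = 3\) and compares two coefficient vectors that reproduce the same signal. The dictionary, the signal, the weight `lam`, and the helper names are illustrative assumptions and are not taken from this paper.

```python
import numpy as np

# Toy overcomplete dictionary: N = 2 signal dimension, K = 3 unit-norm atoms.
D = np.array([[1.0, 0.0, 1.0 / np.sqrt(2.0)],
              [0.0, 1.0, 1.0 / np.sqrt(2.0)]])
x = np.array([1.0, 1.0])

# Two different coefficient vectors that both satisfy x = D @ alpha (Eq. 1).
alpha_dense = np.array([1.0, 1.0, 0.0])
alpha_sparse = np.array([0.0, 0.0, np.sqrt(2.0)])


def l0_norm(alpha, tol=1e-12):
    """Number of non-zero entries, i.e. the l0 'norm' used in Eq. (2)."""
    return int(np.count_nonzero(np.abs(alpha) > tol))


def objective(x, D, alpha, lam):
    """Value of the penalized objective in Eq. (3) for a given alpha."""
    residual = x - D @ alpha
    return 0.5 * float(residual @ residual) + lam * l0_norm(alpha)


# Both candidates reconstruct x, but the sparse one uses fewer atoms and
# therefore has the lower objective value for any positive lam.
for name, alpha in [("dense", alpha_dense), ("sparse", alpha_sparse)]:
    print(name, l0_norm(alpha), objective(x, D, alpha, lam=0.1))
```

Because both vectors satisfy the constraint exactly, only the \(l_{0}\) term separates them, which is the sense in which Eq. (2) selects the sparsest of the infinitely many solutions.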

Sparse representation theory has been widely used in image restoration because it can represent signals flexibly, concisely, and adaptively. After an image is sparsely decomposed, the valid information and the noise exhibit different characteristics on the sparse dictionary, and these differences make it possible to separate them, which is why sparse representation can be used for denoising. Therefore, this paper uses a sparse representation-based method to denoise LDCT images: the K-SVD dictionary learning algorithm is trained on the initial images, and the dictionary is updated adaptively by learning the feature information of the target image. The trained dictionary contains prior information about the image and reflects its structural features. This approach preserves the original detail of the image while removing noise and artifacts.

K-SVD dictionary learning algorithm

Dictionaries are generally divided into fixed dictionaries and learned dictionaries. Fixed dictionaries include wavelet dictionaries, curvelet dictionaries, etc. This type of dictionary is simple and easy to implement for the sparse representation of signals, but its form of signal expression is limited and it cannot effectively represent all image structure information. In contrast, a learned dictionary is obtained by learning from a large amount of data. It has richer atom forms and strong adaptive ability, and it can better match the structure of the image itself. Therefore, learned dictionaries are usually used.

The K-SVD algorithm is a classic dictionary learning algorithm whose core is to train a dictionary that can effectively represent the sample signals. The algorithm is flexible and can be combined with numerous sparse decomposition algorithms. Training consists of two alternating stages: sparse coding and dictionary update. Let \(Y = \{ y_{i} \}_{i = 1}^{N}\) be the set of input sample signals, where \(i = 1,2, \ldots ,N\). The objective function of dictionary learning can be described as:

$$ \mathop {\min }\limits_{D,A} \left\| {Y - DA} \right\|_{F}^{2} \quad s.t. \quad \forall i,\left\| {\alpha_{i} } \right\|_{0} \le T_{0} $$
(4)

where \(D \in R^{N \times K} (K > N)\) is the dictionary; \(A\) is the sparse coefficient matrix, whose \(i\)-th column \(\alpha_{i}\) is the sparse coefficient vector of sample \(y_{i}\); and \(T_{0}\) is the sparsity level, i.e. the maximum number of non-zero coefficients allowed per sample. In the sparse coding stage, the dictionary \(D\) is held fixed and the sparse coefficients are solved with a sparse decomposition algorithm; in this paper, the ROMP algorithm is used to sparsely represent the sample signals. Once the sparse coefficients \(\alpha_{i}\) are obtained, they are fixed and the dictionary \(D\) is updated.

K-SVD updates the dictionary column by column. When the \(k\)-th atom is updated, the other atoms are fixed. To update the \(k\)-th atom \(d_{k}\), let \(\alpha_{T}^{k}\) denote the \(k\)-th row of \(A\), i.e. the coefficients of \(Y\) on atom \(d_{k}\). The objective function is then:

$$ \left\| {Y - DA} \right\|_{F}^{2} = \left\| {Y - \sum\limits_{j = 1}^{K} {d_{j} \alpha_{T}^{j} } } \right\|_{F}^{2} = \left\| {\left( {Y - \sum\limits_{j \ne k} {d_{j} \alpha_{T}^{j} } } \right) - d_{k} \alpha_{T}^{k} } \right\|_{F}^{2} = \left\| {E_{k} - d_{k} \alpha_{T}^{k} } \right\|_{F}^{2} $$
(5)

where \(d_{k}\) represents the \(k\)-th column of the dictionary, \(\alpha_{T}^{k}\) denotes the \(k\)-th row of the coefficient matrix \(A\), which corresponds to \(d_{k}\), and the matrix \(E_{k}\) represents the error over all samples when the contribution of \(d_{k}\) is removed. In the dictionary update phase, the main task is to perform the singular value decomposition (SVD) of the error matrix \(E_{k}\) to obtain the updated atom and the corresponding sparse coefficients.
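The atom update around Eq. (5) can be sketched in a few lines of NumPy. This is a minimal illustration under assumptions, not the implementation used in this paper: the function name `ksvd_update_atom` is hypothetical, and the error matrix is restricted to the samples that actually use atom \(d_{k}\), which is the usual K-SVD convention for keeping the codes sparse.

```python
import numpy as np


def ksvd_update_atom(Y, D, A, k):
    """Return copies of D and A with atom k and row k of A updated via Eq. (5)."""
    D, A = D.copy(), A.copy()
    users = np.flatnonzero(A[k, :])        # samples whose code uses atom k
    if users.size == 0:
        return D, A                        # unused atom: nothing to update
    A[k, users] = 0.0                      # drop atom k's own contribution
    E_k = Y[:, users] - D @ A[:, users]    # error matrix restricted to the users
    U, s, Vt = np.linalg.svd(E_k, full_matrices=False)
    D[:, k] = U[:, 0]                      # new unit-norm atom
    A[k, users] = s[0] * Vt[0, :]          # new coefficients on that atom
    return D, A
```

The leading singular pair gives the best rank-one approximation of the restricted \(E_{k}\), so the residual \(\left\| {Y - DA} \right\|_{F}\) cannot increase after the update.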

The proposed method

Image classification principles

Because the information features of images are rich and complex, traditional sparse dictionary training methods cannot fully account for the differences between image structures, so image texture and contour information is easily lost. To solve this problem, we propose a method for training sparse classification dictionaries: the image patches are first classified, and a corresponding dictionary is then trained separately for each type of image block. An image can be regarded as composed of high-frequency and low-frequency parts, and these different structural components each contain important image information. After decomposition, the image can be expressed as:

$$ I_{image} = I_{LF} \cup I_{HF} $$
(6)

In formula (6), \(I_{image}\) represents the entire image area; \(I_{LF}\) and \(I_{HF}\) represent the low-frequency and high-frequency parts of the image respectively. Sub-component dictionaries suited to representing the low-frequency and high-frequency parts of the image are then built separately.
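As a rough illustration of the decomposition in Eq. (6), the sketch below splits an image into a low-frequency and a high-frequency part with a Gaussian low-pass filter in the Fourier domain. This filter is a simple stand-in chosen for brevity and is not the 2D-VMD decomposition used in this paper; the function name `split_low_high` and the cutoff `sigma` (in cycles per pixel) are assumptions.

```python
import numpy as np


def split_low_high(image, sigma=0.05):
    """Split an image into low- and high-frequency parts that sum to the image."""
    image = np.asarray(image, dtype=float)
    fy = np.fft.fftfreq(image.shape[0])[:, None]   # vertical frequencies
    fx = np.fft.fftfreq(image.shape[1])[None, :]   # horizontal frequencies
    lowpass = np.exp(-(fx ** 2 + fy ** 2) / (2.0 * sigma ** 2))
    low = np.real(np.fft.ifft2(np.fft.fft2(image) * lowpass))
    high = image - low                             # everything the filter removed
    return low, high
```

In this additive form the two parts sum back to the original image, so each part can be processed separately and the results recombined afterwards.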

Two-dimensional variational mode decomposition is used to classify the images to be trained into a low-frequency part and a high-frequency part. The low-frequency modal component retains most of the original image information and contains a small amount of noise, while the high-frequency modal components contain a large amount of noise together with image edge information. Two-dimensional variational mode decomposition31 (2D-VMD) is an image processing technique that extends variational mode decomposition (VMD) to two-dimensional images. Its principle is to decompose the original two-dimensional signal into several modal components with different center frequencies. The constrained variational model of 2D-VMD is expressed as:

$$ \left\{ {\begin{array}{*{20}l} {\mathop {\min }\limits_{{\{ \mu_{k} \} ,\{ \omega_{k} \} }} \left\{ {\sum\limits_{k} {\left\| {\nabla [\mu_{AS,k} (x)e^{{ - j\left\langle {\omega_{k} ,x} \right\rangle }} ]} \right\|_{2}^{2} } } \right\}} \hfill \\ {s.t. \, \sum\limits_{k} {\mu_{k} (x)} = f(x)} \hfill \\ \end{array} } \right. $$
(7)

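In the formula, \(f\) represents the image to be processed, \(\mu_{k}\) is the \(k\)-th modal component, \(\omega_{k}\) is its center frequency, and \(\mu_{AS,k}\) is the corresponding two-dimensional analytic signal, \(\mu_{AS,k} (x) = \mu_{k} \times [\delta (\left\langle {x,\omega_{k} } \right\rangle ) + \frac{j}{{\pi \left\langle {x,\omega_{k} } \right\rangle }}]\delta (\left\langle {x,\omega_{k} } \right\rangle )\).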

To solve the constrained variational problem in Eq. (7), a Lagrange multiplier \(\lambda\) and a quadratic penalty factor \(\alpha\) are introduced, which transforms it into the following unconstrained variational problem:

$$ L(\{ \mu _{k} \} ,\{ \omega _{k} \} ,\lambda ) = \sum\limits_{k} {\alpha _{k} \left\| {\left. {\nabla [\mu _{{AS,k}} (x)e^{{ - j\left\langle {\omega _{k} ,x} \right\rangle }} ]} \right\|_{2}^{2} } \right.} + \left\| {\left. {f(x) - \sum\limits_{k} {\mu _{k} (x)} } \right\|} \right._{2}^{2} + \left\langle {\lambda (x),f(x) - \sum\limits_{k} {\mu _{k} (x)} } \right\rangle $$
(8)

The alternating direction method of multipliers (ADMM) is used to solve this variational problem and thereby decompose the input image signal into its sub-modes. The specific algorithm steps for image block classification in this paper are as follows:

Step 1: Initialize \(\{ \hat{\mu }_{k}^{0} \}\), \(\{ \hat{\omega }_{k}^{0} \}\), \(\hat{\lambda }^{0}\) and \(n\);

Step 2: Execute the loop, \(n = n + 1\);

Step 3: Update \(\hat{\mu }_{k}\), that is, \(\hat{\mu }_{k}^{n + 1} (\omega ) = \mathop {\arg \min }\limits_{{\hat{\mu }_{k} }} \left\{ {\alpha_{k} \left\| {j(\omega - \omega_{k} )[(1 + {\text{sgn}}(\omega \cdot \omega_{k} ))\hat{\mu }_{k} (\omega )]} \right\|_{2}^{2} + \left\| {\hat{f}(\omega ) - \sum\limits_{i} {\hat{\mu }_{i} (\omega )} + \frac{{\hat{\lambda }(\omega )}}{2}} \right\|_{2}^{2} } \right\}\), where \((1 + {\text{sgn}}(\omega \cdot \omega_{k} ))\hat{\mu }_{k} (\omega ) = \hat{\mu }_{AS,k} (\omega ) = \left\{ {\begin{array}{*{20}l} {2\hat{\mu }_{k} (\omega ),} & {{\text{if }}\left\langle {\omega ,\omega_{k} } \right\rangle > 0} \\ {\hat{\mu }_{k} (\omega ),} & {{\text{if }}\left\langle {\omega ,\omega_{k} } \right\rangle = 0} \\ {0,} & {{\text{if }}\left\langle {\omega ,\omega_{k} } \right\rangle < 0} \\ \end{array} } \right.\);

Step 4: Update \(\hat{\omega }_{k}\), that is, \(\omega_{k}^{n + 1} = \mathop {\arg \min }\limits_{{\omega_{k} }} \left\{ {\alpha_{k} \left\| {j(\omega - \omega_{k} )[(1 + {\text{sgn}}(\omega \cdot \omega_{k} ))\hat{\mu }_{k} (\omega )]} \right\|_{2}^{2} } \right\}\);

Step 5: Update \(\hat{\lambda }(\omega )\), that is, \(\hat{\lambda }^{n + 1} (\omega ) = \hat{\lambda }^{n} (\omega ) + \tau \left( {\hat{f}(\omega ) - \sum\limits_{k} {\hat{\mu }_{k}^{n + 1} (\omega )} } \right)\);

Step 6: If \(\sum\nolimits_{k} {\frac{{\left\| {\hat{\mu }_{k}^{n + 1} - \hat{\mu }_{k}^{n} } \right\|_{2}^{2} }}{{\left\| {\hat{\mu }_{k}^{n} } \right\|_{2}^{2} }}} < Ke\) is satisfied, where \(Ke\) is the convergence threshold, stop the iteration; otherwise return to Step 2.
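Steps 3 and 4 are stated as minimization problems. In the standard 2D-VMD formulation both have closed-form solutions in the Fourier domain. Assuming the method here follows that formulation (the update rules are not written out explicitly in the text), the updates would read:

$$ \hat{\mu }_{k}^{n + 1} (\omega ) = \frac{{\hat{f}(\omega ) - \sum\nolimits_{i \ne k} {\hat{\mu }_{i} (\omega )} + \frac{{\hat{\lambda }(\omega )}}{2}}}{{1 + 2\alpha_{k} \left| {\omega - \omega_{k} } \right|^{2} }},\quad \omega \in \Omega_{k} = \{ \omega \mid \left\langle {\omega ,\omega_{k} } \right\rangle \ge 0\} $$

$$ \omega_{k}^{n + 1} = \frac{{\int_{{\Omega_{k} }} {\omega \left| {\hat{\mu }_{k}^{n + 1} (\omega )} \right|^{2} d\omega } }}{{\int_{{\Omega_{k} }} {\left| {\hat{\mu }_{k}^{n + 1} (\omega )} \right|^{2} d\omega } }} $$

Under this reading, each mode is a Wiener-type filtering of the current residual around its center frequency \(\omega_{k}\), and each center frequency is re-estimated as the center of gravity of the mode's power spectrum on the half-plane \(\Omega_{k}\).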

Finally, the \(k\) IMF components are obtained by the inverse Fourier transform of \(\hat{\mu }_{k} (\omega )\), that is, \(\mu_{k} (x) = IFFT(\hat{\mu }_{k} (\omega ))\). To illustrate the reliability of the training sample classification algorithm used in this paper, a low-dose CT image was classified using two-dimensional variational mode decomposition. The low-dose CT image is shown in Fig. 1a, and the classification results are shown in Fig. 1b and c. Fig. 1b shows that the low-frequency modal component retains the overall information and global features of the low-dose CT image, while Fig. 1c shows the high-frequency modal component, which contains the details, texture, edges, and noise of the image. Therefore, the classification into low-frequency and high-frequency modal components is accurate.

Figure 1

IMF images obtained by the two-dimensional variational mode decomposition method: (a) LDCT image; (b) low-frequency component; (c) high-frequency component.

Optimizing dictionary atoms

To better adapt the dictionary to different image data, this paper uses noisy image blocks to train the dictionary. However, training the dictionary on noisy image blocks with K-SVD is prone to noise interference, and false atoms become mixed in. Because false atoms do not contain feature information of the image, they will not be used to represent the image and will instead degrade the denoising performance of the dictionary32. Figure 2 shows the initial DCT dictionary and the dictionary obtained by training on noisy images with the K-SVD algorithm; false atoms are marked with white boxes.

Figure 2

Dictionary containing some false atoms after training with the K-SVD algorithm: (a) DCT dictionary; (b) dictionary learned by K-SVD.

To solve this problem, our method verifies the atoms and performs atom optimization during the training process. The basic idea is to exploit the fact that samples in an image feature region are strongly correlated in at least one direction, whereas samples in a noise region show no significant correlation. To estimate the sample correlation in one or more directions, the entries of each atom are rearranged along a specific direction to obtain a vector \(U^{\theta } = \{ u_{1}^{\theta } ,u_{2}^{\theta } , \ldots ,u_{n}^{\theta } \}\), where \(\theta \in \{ \theta_{1} , \ldots ,\theta_{n} \}\) denotes the direction. The directions selected here are \(\theta \in \{ 0^{ \circ } ,45^{ \circ } ,90^{ \circ } ,135^{ \circ } \}\), and the dictionary atoms are rearranged along these directions as shown in Fig. 3.

Figure 3

Four directions for rearranging dictionary atoms.

To measure the sample correlation in a specific direction \(\theta\), the feature vector is defined as:

$$ Z^{\theta } = \{ z_{2}^{\theta } ,z_{3}^{\theta } , \ldots ,z_{{n^{2} }}^{\theta } \} $$
(9)

The strength of the sample correlation in a given direction is evaluated by the magnitude of \(z_{k}^{\theta }\). If the values in \(Z^{\theta }\) are small, the correlation between samples in this direction is weak; otherwise, the correlation is strong. In image feature and noise detection problems, the variance of the samples in one direction is usually used as the statistical criterion for testing noise atoms. Let \(\sigma_{\theta }^{2}\) be the variance of the feature vector \(Z^{\theta }\), and perform Bartlett’s test on the variances \(\sigma_{\theta }^{2}\) to determine whether an atom is a noise atom. The null hypothesis \(H_{0}\) and the alternative hypothesis \(H_{1}\) are set up as follows:

\(H_{0} :\sigma_{1}^{2} = \sigma_{2}^{2} = \sigma_{3}^{2} = \sigma_{4}^{2}\), in which case the atom is a noise atom.

\(H_{1}\): there exists at least one pair \((i,j)\) with \(i \ne j\) such that \(\sigma_{i}^{2} \ne \sigma_{j}^{2}\), in which case the atom contains image feature information.

This paper introduces the Bartlett test to weaken the effect of noise atoms on dictionary denoising33. The test is used to optimize dictionary atoms and improve the sparse representation ability of the dictionary. The Bartlett test statistic of the four directional variances \(\sigma_{\theta }^{2}\) is defined as:

$$ Q = \frac{{(n - 2)\left( {4\ln \left( {\sum\limits_{i = 1}^{4} {\frac{{\sigma_{i}^{2} }}{4}} } \right) - \sum\limits_{i = 1}^{4} {\ln \sigma_{i}^{2} } } \right)}}{{1 + \frac{1}{9}\left( {\frac{4}{n - 2} - \frac{1}{4(n - 2)}} \right)}} $$
(10)

If the Bartlett test statistic of an atom satisfies \(Q < \chi^{2} (\alpha ;3)\), the null hypothesis cannot be rejected: the contribution of the atom is small, and the atom is considered a noise atom. Here \(\chi^{2} (\alpha ;3)\) denotes the upper \(\alpha\) percentile of the chi-squared distribution with 3 degrees of freedom, and \(\alpha\) is taken to be 0.05.
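The detection rule can be sketched as follows. Two points are assumptions rather than statements from this paper: the exact scan order for each direction, and the feature \(z_{k}^{\theta }\), which is taken here to be the product of neighbouring samples of the mean-removed scan because the text does not spell out how it is computed. All function names are hypothetical. The constant 7.815 is the upper 5% point of the chi-squared distribution with 3 degrees of freedom.

```python
import numpy as np

CHI2_UPPER_5PCT_DF3 = 7.815  # upper 5% point, chi-squared with 3 degrees of freedom


def directional_scans(atom):
    """Re-order the atom's pixels along four directions (assumed scan orders)."""
    a = np.asarray(atom, dtype=float)
    flipped = np.fliplr(a)
    offsets = range(-a.shape[0] + 1, a.shape[1])
    return [
        a.ravel(),                                               # 0 deg: row by row
        np.concatenate([flipped.diagonal(k) for k in offsets]),  # 45 deg: anti-diagonals
        a.T.ravel(),                                             # 90 deg: column by column
        np.concatenate([a.diagonal(k) for k in offsets]),        # 135 deg: diagonals
    ]


def lag_products(u):
    """ASSUMED feature: z_k = u_k * u_(k-1) on the mean-removed scan, k = 2..n."""
    u = u - u.mean()
    return u[1:] * u[:-1]


def bartlett_q(variances, n):
    """Bartlett statistic of Eq. (10) for four variances with n - 2 dof each."""
    v = np.asarray(variances, dtype=float)
    m = n - 2
    numerator = m * (4.0 * np.log(v.mean()) - np.log(v).sum())
    denominator = 1.0 + (4.0 / m - 1.0 / (4.0 * m)) / 9.0
    return float(numerator / denominator)


def is_noise_atom(atom, threshold=CHI2_UPPER_5PCT_DF3, eps=1e-12):
    """Flag the atom as noise when H0 (equal directional variances) is not rejected."""
    scans = directional_scans(atom)
    variances = [np.var(lag_products(u), ddof=1) + eps for u in scans]
    return bartlett_q(variances, scans[0].size) < threshold
```

For an 8 × 8 atom, `n` is 64, so each directional variance has 62 degrees of freedom. Equal directional variances give \(Q\) close to zero, while one dominant direction drives \(Q\) well above the threshold.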

The denoising framework using Bartlett’s test is shown in Fig. 4. In this paper, the dictionary is trained using the K-SVD algorithm, and its atoms are optimized in the optimization phase. The method does not affect the informative atoms; it only suppresses the dictionary atoms with a small contribution, so that the trained dictionary adapts better to the characteristics of low-dose CT images and better captures the structural information and texture details in the images.

Figure 4

Dictionary Atomic Optimization Framework.

The principle of dictionary atom optimization is to replace each noise atom with the corresponding atom of the overcomplete DCT dictionary. This eliminates the influence of noise atoms while keeping the number of dictionary atoms unchanged, which reduces noise interference and favors the sparse representation of the image. As shown in Fig. 5, the noise atoms in the original overcomplete dictionary (a) are replaced with the corresponding atoms in the overcomplete DCT dictionary (b) to obtain the optimized dictionary (c).

Figure 5

Optimization of the overcomplete dictionary: (a) original dictionary containing some meaningless atoms; (b) overcomplete DCT dictionary; (c) optimized dictionary.
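The replacement step can be sketched as follows. The overcomplete DCT construction is the separable one commonly paired with K-SVD. The sizes (8 × 8 patches, 256 atoms), the function names, and the boolean `noise_mask` are illustrative assumptions; in practice the mask would come from the Bartlett test described above.

```python
import numpy as np


def overcomplete_dct(patch_size=8, atoms_1d=16):
    """Overcomplete 2D DCT dictionary of shape (patch_size**2, atoms_1d**2)."""
    grid = np.arange(patch_size)
    D1 = np.zeros((patch_size, atoms_1d))
    for k in range(atoms_1d):
        atom = np.cos(grid * k * np.pi / atoms_1d)
        if k > 0:
            atom = atom - atom.mean()      # remove the DC part of non-constant atoms
        D1[:, k] = atom / np.linalg.norm(atom)
    return np.kron(D1, D1)                 # separable 2D atoms stored as columns


def replace_noise_atoms(D_learned, noise_mask, D_dct):
    """Swap the flagged columns of D_learned for the same columns of D_dct."""
    D_opt = D_learned.copy()
    D_opt[:, noise_mask] = D_dct[:, noise_mask]
    return D_opt
```

Replacing columns in place keeps the number of atoms unchanged, so the sparse coding stage can use the optimized dictionary without any other modification.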

K-SVD low-dose CT image denoising algorithm based on dictionary atom optimization

Although sparse representation denoising based on dictionary learning performs well, it restricts the entire image to a single dictionary for sparse decomposition and reconstruction, which results in the loss of image detail. In addition, to make the dictionary adapt better to different image data, it is trained on noisy images; however, a dictionary trained directly on noisy image blocks is easily disturbed by noise and mixed with false atoms. False atoms do not contain feature information of the image and will not be used to represent it; instead, they degrade the denoising performance of the dictionary. To overcome these shortcomings, we propose a low-dose CT image denoising method based on 2D-VMD and dictionary atom optimization. The method first classifies the blocks of the image to be denoised: two-dimensional variational mode decomposition decomposes the image into modal components with different center frequencies, in preparation for the subsequent classification training, which builds sparse representation dictionaries suited to specific image information. Based on the classification results, a corresponding learned dictionary is trained for each type of modal component. This training combines the regularized orthogonal matching pursuit algorithm with dictionary atom optimization. Finally, the resulting dictionaries are used to denoise the image. The flowchart of the method is shown in Fig. 6.

Figure 6

K-SVD image denoising flow chart based on image decomposition and dictionary atom optimization.

The specific implementation steps of our method are as follows.

  1. 1.

    Preprocessing of noisy images. Modal decomposition of images using the 2D-VMD method decomposes the image into high-frequency modal components \(IMF_{1}\) and low-frequency modal components \(IMF_{2}\).

  2. 2.

    DCT dictionaries are used as initial dictionaries for the K-SVD algorithm, training the dictionary with the decomposed modal component data and obtaining the high-frequency modal component dictionary \(D_{{IMF_{{1}} }}\) and the low-frequency modal component dictionary \(D_{{IMF_{{2}} }}\).

  3. 3.

    Chunk the modal components to process. For a modal component of size \(\sqrt N \times \sqrt N\), the modal components are divided into overlapping image chunks of size \(\sqrt n \times \sqrt n\). For ease of processing, each image chunk is vectorized to obtain the sample set \(\{ x_{i} \}_{i = 1}^{M}\), where the samples \(x_{i}\) have length \(L = n\). The image is divided into 8 × 8 image blocks, constituting the sample set for training.

  4. 4.

    Sparse coding stage. We use the ROMP algorithm to solve sparse representation coefficients. First, initialize dictionary \(D = [x_{1} ,x_{2} ,....,x_{i} ]\); \(K\) is the sparsity; Initialize residuals \(r_{0} = y\); Atomic index set \(\Lambda_{0} = \emptyset\); Matching matrix \(A_{{0}} { = }\emptyset\); \(t = 1\); Initial number of iterations \(k = 1\); The detailed sparse deco mposition process is as follows:

    1. a)

      Atomic selection: according to \(u = \{ u_{j} |u_{j} = | < r_{t - 1} ,\alpha_{j} > |,1 \le j \le N\}\), When \(\left| {u{}_{0}} \right| < K\), store the column numbers of all atoms in \(u\) in the set of column numbers \(J_{0}\); When \(\left| {u_{0} } \right| > K\),\(J_{0} = {\text{supp}}(u_{K} )\).

    2. b)

      Regularization process: Find \(J_{0}\) in the set \(J\) that satisfies: \(\left| {u(i)} \right| \le \left| {2u(i)} \right|,i,j \in J_{0}\) has maximum energy \(\sum\nolimits_{j} {\left| {u(j)} \right|}^{2}\).

    3. c)

      Updated support set: \(\Lambda_{t} = \Lambda_{t - 1} \cup J{}_{0}\), \({\rm A}_{t} = {\rm A}_{t - 1} \cup \alpha_{j}\), \(j \in J_{0}\).

    4. d)

      Updating the signal sparse representation coefficient estimates: find the \(y = {\rm A}_{i} \theta_{i}\) least squares solution from the equation \(\hat{\theta }_{t} = \mathop {\arg \min }\limits_{\theta } \left\| {y - {\rm A}_{t} \theta_{t} } \right\| = ({\rm A}_{t}^{T} {\rm A}_{t} )^{ - 1} {\rm A}_{t}^{T}\).

    5. e)

      Update residuals: \(r_{t} = y - {\rm A}_{t} \hat{\theta }_{t}\).

    6. f)

      Determine iteration termination conditions: Let \(t = t + 1\), as \(t \le K\), return to step (a). If \(t > K\) or \(\left\| {\Lambda_{i} } \right\|_{0} \ge 2K\) or \(r_{t} = 0\), then stop the iteration and go to step (g).

    7. g)

      Estimated \(\hat{\theta }\): location of non-zero items in \(\Lambda_{i}\), the corresponding nonzero value in \(\hat{\theta }\) is the final required \(\hat{\theta }_{t}\).

  5. 5.

    Dictionary Updates. According to the sparsity coefficient \(\hat{\theta }_{t}\) and the initial dictionary \(D\) obtained in step (4), the sparsity coefficient is fixed, the dictionary is updated, and for each basic element, the atom whose coefficient is 0 is deleted, and then the SVD decomposition is used to find a set of optimal basic atoms to replace the original basic atoms, and the iteration is not stopped until the condition is satisfied, and the sparsity coefficient is updated at the same time that the dictionary is updated.

  6. 6.

    Dictionary atom optimization. Perform noise atom detection and replacement to obtain the optimized dictionary \(\hat{D}^{\prime}\).

  7. 7.

    Image denoising. The denoised high-frequency modal component and low-frequency modal component are obtained by reconstructing the image with the optimized dictionary and the corresponding sparse matrix. Finally, the two denoised modal components are fused to obtain the final denoised image.
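
The sparse coding steps (a) to (g) can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `romp`, the numerical tolerance `tol`, and the assumption that the atoms have unit norm are introduced here for illustration, and the selection rule follows the standard regularized orthogonal matching pursuit scheme.

```python
import numpy as np


def romp(A, y, K, tol=1e-10):
    """Sketch of regularized OMP: find a sparse theta with y ≈ A @ theta.

    A: (m, N) dictionary whose columns are the atoms alpha_j (assumed unit norm).
    y: (m,) signal. K: target sparsity. tol: numerical zero (an assumption).
    """
    A = np.asarray(A, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    support = np.array([], dtype=int)          # Lambda_t
    coef = np.zeros(0)                         # theta_t restricted to the support
    r = y.copy()                               # r_0 = y
    for _ in range(K):
        # (a) atom selection: the K largest correlations, or all nonzero ones if fewer
        u = np.abs(A.T @ r)
        nonzero = np.flatnonzero(u > tol)
        if nonzero.size == 0:
            break
        J = nonzero[np.argsort(u[nonzero])[::-1][:K]]   # sorted by decreasing u
        # (b) regularization: comparable subset (|u(i)| <= 2|u(j)|) with maximum energy
        uJ = u[J]
        best, best_energy = J[:1], -1.0
        for start in range(J.size):
            comparable = uJ[start:] >= uJ[start] / 2.0
            energy = float(np.sum(uJ[start:][comparable] ** 2))
            if energy > best_energy:
                best, best_energy = J[start:][comparable], energy
        # (c) update the support set
        support = np.union1d(support, best)
        # (d) least-squares coefficients on the current support
        coef = np.linalg.lstsq(A[:, support], y, rcond=None)[0]
        # (e) update the residual
        r = y - A[:, support] @ coef
        # (f) termination: support large enough or residual numerically zero
        if support.size >= 2 * K or np.linalg.norm(r) <= tol:
            break
    # (g) scatter the coefficients back into a length-N vector
    theta = np.zeros(A.shape[1])
    theta[support] = coef
    return theta
```

Because the candidates in `J` are sorted by decreasing correlation, every comparable subset of maximum energy is a contiguous run in that order, so a single pass over the possible starting positions is enough for step (b).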

Simulation experiment and result analysis

Evaluation indicators

To evaluate the quality of the denoised images objectively, two evaluation criteria are used in this paper. The first is the peak signal-to-noise ratio (PSNR), one of the most widely used objective image quality metrics. A larger PSNR value indicates smaller image distortion and hence a better denoising effect. It is defined as follows:

$$ PSNR(X,Y) = {10} \times {\text{log}}_{10} ({255}^{{2}} {/}MSE) $$
(11)

where

$$ MSE(X,Y) = \frac{1}{MN}\sum\limits_{i = 1}^{M} {\sum\limits_{j = 1}^{N} {(x_{ij} - y_{ij} )^{2} } } $$
(12)

where \(X\) and \(Y\) are two images of size \(M \times N\), and \(x_{ij}\) and \(y_{ij}\) are the pixel values of \(X\) and \(Y\) at position \((i,j)\). It can be observed from (11) and (12) that the larger the PSNR value, the closer the image \(X\) is to \(Y\).
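
As a minimal sketch of Eqs. (11) and (12), the two quantities can be computed with NumPy as shown below. The function names `mse` and `psnr` and the `peak` parameter are illustrative choices made here; the peak value 255 corresponds to 8-bit images, as in Eq. (11).

```python
import numpy as np


def mse(x, y):
    """Mean squared error between two images of identical shape, Eq. (12)."""
    # Cast to float first so that unsigned integer images do not wrap around.
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    if x.shape != y.shape:
        raise ValueError("images must have the same shape")
    return float(np.mean((x - y) ** 2))


def psnr(x, y, peak=255.0):
    """Peak signal-to-noise ratio in dB, Eq. (11)."""
    err = mse(x, y)
    if err == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(peak ** 2 / err))
```

For example, two images that differ by 5 gray levels at every pixel have an MSE of 25 and a PSNR of \(20\log_{10}(255/5)\) dB.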

The other criterion is the structural similarity (SSIM)34, which is commonly used to measure the similarity between two images. SSIM takes values in (0,1); a larger SSIM value indicates greater structural similarity between the two images. It is calculated as follows:

$$ SSIM(x,y) = \frac{{(2\mu_{x} \mu_{y} + C_{1} )(2\sigma_{xy} + C_{2} )}}{{(\mu_{x}^{2} + \mu_{y}^{2} + C_{1} )(\sigma_{x}^{2} + \sigma_{y}^{2} + C_{2} )}} $$
(13)

where \(x\) and \(y\) denote the original image and the image to be evaluated; the means \((\mu_{x} ,\mu_{y} )\) indicate the brightness of the images; the standard deviations \((\sigma_{x} ,\sigma_{y} )\) denote their contrast; the covariance \((\sigma_{xy} )\) represents the image structure; and \(C_{1}\) and \(C_{2}\) are small positive constants that keep the ratio stable when the denominator is close to zero.
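
The following sketch evaluates Eq. (13) once over the whole image. It rests on several assumptions that are not stated in the text: the function name `ssim_global` is hypothetical, the constants are set to the commonly used values \(C_{1} = (0.01 \cdot 255)^{2}\) and \(C_{2} = (0.03 \cdot 255)^{2}\), and no local windowing is applied, whereas practical SSIM implementations usually average the index over local windows.

```python
import numpy as np


def ssim_global(x, y, peak=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM over the whole image, following Eq. (13)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    if x.shape != y.shape:
        raise ValueError("images must have the same shape")
    c1 = (k1 * peak) ** 2                      # stabilizes the luminance term
    c2 = (k2 * peak) ** 2                      # stabilizes the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()            # brightness
    var_x, var_y = x.var(), y.var()            # squared contrast
    cov_xy = np.mean((x - mu_x) * (y - mu_y))  # structure
    numerator = (2.0 * mu_x * mu_y + c1) * (2.0 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)
```

With this definition an image compared with itself gives exactly 1, and the index is symmetric in its two arguments.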

Experimental data

This paper uses simulated low-dose CT data sets and clinical low-dose CT data sets to verify the denoising performance of the proposed denoising method.

Low-dose CT simulation data: The dataset was downloaded from the National Biomedical Imaging Archive (NBIA). It contains 7015 normal-dose CT images of 165 patients, and each image has 256 × 256 pixels. Before these data are used to test the performance of the denoising methods, Poisson noise is added to the normal-dose CT images to obtain simulated low-dose CT images. Assuming a monochromatic source, the projection measurements of a CT scan follow a Poisson distribution, so the noise characteristics of the simulated data can be expressed as:

$$ z_{i} \sim Poisson\{ b_{i} e^{{ - l_{i} }} + r_{i} \} ,\quad i = 1, \ldots ,I $$
(14)

where \(z_{i}\) represents the measurement value along the \(i\)th ray path, \(l_{i}\) represents the line integral of the attenuation coefficient, \(b_{i}\) represents the blank scan coefficient, and \(r_{i}\) represents noise in low-dose CT images. The noise level is controlled by the blank scan coefficient \(b_{i}\).
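
The noise model in Eq. (14) can be sketched as follows. This is an illustration under assumptions rather than the simulation pipeline used in the paper: the function name `simulate_low_dose`, the default values of `b` and `r`, and the clipping of photon-starved rays are choices made here, and the forward projection of the normal-dose image and the subsequent reconstruction are omitted.

```python
import numpy as np


def simulate_low_dose(line_integrals, b=1e4, r=0.0, seed=0):
    """Draw z_i ~ Poisson(b * exp(-l_i) + r) and return (z, noisy line integrals)."""
    rng = np.random.default_rng(seed)
    l = np.asarray(line_integrals, dtype=np.float64)
    expected = b * np.exp(-l) + r              # mean photon count per ray
    z = rng.poisson(expected).astype(np.float64)
    # Remove the background term and clip so the logarithm stays finite.
    counts = np.maximum(z - r, 1.0)
    noisy_l = -np.log(counts / b)              # back to the line-integral domain
    return z, noisy_l
```

A smaller blank scan coefficient `b` means fewer photons per ray and therefore noisier line integrals, which is how the dose level is lowered in this kind of simulation.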

Low-dose CT image clinical data: The clinical low-dose CT image dataset from the “2016 AAPM Mayo Clinic Low Dose CT Grand Challenge”, authorized by the Mayo Clinic, was used. The Mayo dataset contains images from 10 patient cases; the conventional-dose scans were acquired with a tube voltage of 120 kV and an effective current of 200 mAs. The dataset consists of paired 3 mm normal-dose CT images and simulated quarter-dose LDCT images of size 512 × 512; the quarter-dose images were simulated by adding Poisson noise to the full-dose projection data.

Experimental results and analysis

To verify the effectiveness of the proposed denoising algorithm, it was compared with seven different state-of-the-art methods: wavelet denoising17, the TV algorithm35, the WNNM algorithm36, non-local mean filtering22 (NLM), DCT37, the MOD algorithm38, and the K-SVD algorithm26. The TV algorithm, the WNNM algorithm, NLM, and wavelet denoising are classic image denoising methods that have been successfully applied to low-dose CT image denoising. DCT, MOD, and K-SVD are the current mainstream image denoising methods based on sparse representation. The comparison therefore covers both classic and currently widely used low-dose CT denoising methods, which provides a comprehensive verification of the denoising performance of the proposed method. The denoising performance is verified through quantitative and qualitative analysis; peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) were selected to evaluate the denoised images quantitatively.

  • 1. Simulated data experiment

To verify the denoising performance of the proposed method, experiments were conducted using chest CT images from the simulated data. Figure 7 shows the denoising results for a chest CT image, including the normal-dose CT (NDCT) image, the LDCT image, and the images denoised by the compared algorithms. To evaluate the denoising effect of the algorithms more comprehensively, a region of interest is marked in each image and enlarged in Fig. 8. Figure 7 shows that all algorithms suppress noise to a certain extent. Figure 7c and d show that the image noise is reduced after wavelet and TV denoising, but there is an obvious block effect and a serious loss of details. In Fig. 7e and f, the denoising effect of the WNNM and NLM algorithms is better than that of the first two methods: the images appear more natural, but the edge details are too smooth. DCT, MOD, and K-SVD retain more detailed information than the other methods. Figure 7j shows that the proposed method performs best in denoising and better retains image edge information.

Figure 7

Denoising effects of different algorithms on chest images (a) NDCT (b) LDCT (c) Wavelet (d) TV (e) WNNM (f) NLM (g) DCT (h) MOD (i) K-SVD (j) Proposed.

Figure 8

Zoomed parts over the region of interest (ROI) marked by the red box in Fig. 7 (a) NDCT (b) LDCT (c) Wavelet (d) TV (e) WNNM (f) NLM (g) DCT (h) MOD (i) K-SVD (j) Proposed.

Figure 8 is a detailed enlargement of the red box area in Fig. 7. In Fig. 8, blue circles and red arrows mark some texture details. Careful comparison of the images shows that the low-dose CT image restored by the proposed method better retains the structural information of the image and is closer to the normal-dose CT image. As can be seen from Fig. 8c, d and e, the denoised images still contain noise and artifacts (the details pointed out by the yellow arrows are not clear). Figure 8f, g, h and i show the denoising effects of NLM and of the sparse representation-based methods, respectively. The denoising effect is significantly improved, but the parts of the image indicated by the yellow arrows are too smooth and unnatural, and the denoised image is not well restored. Figure 8j shows the result of the proposed method, whose denoising effect and recovery of texture detail are better than those of the other methods.

For quantitative evaluation, Table 1 gives the PSNR and SSIM values of the proposed method and the other representative denoising algorithms. The larger the PSNR value, the better the denoising effect and the more completely the image details are retained. The larger the SSIM value, the better the visual quality of the generated image. As can be seen from Table 1, the proposed algorithm achieves better results than the other algorithms on both indicators. Combining subjective and objective evaluations, the proposed method effectively removes the noise in low-dose CT images while retaining their detail and edge information.

Table 1 Quantitative analysis results of chest images using different denoising methods.

  • 2. Clinical data experiment

To verify the denoising effect of the proposed method, two representative slices were selected from the clinical test set and denoised with the different denoising methods, and the denoised images were compared visually with the normal-dose and low-dose CT images. The yellow rectangular box in Fig. 9 marks the area of interest. The denoising effect is evaluated subjectively in terms of noise artifacts, edge preservation, and detail information. As shown in Fig. 9b, owing to the insufficient number of incident X-ray photons, noise and artifacts in the low-dose CT image are obvious. The noise reduces the quality of CT images, blurs image details, and distorts edges, making it difficult for doctors to interpret the CT images accurately.

Figure 9

Denoising effects of different algorithms on liver images (a) NDCT (b) LDCT (c) Wavelet (d) TV (e) WNNM (f) NLM (g) DCT (h) MOD (i) K-SVD (j) Proposed.

Subjective evaluation: The experimental results show that each method exhibits some degree of noise suppression. First, the classic traditional denoising methods are considered. As shown in Fig. 9c and d, the CT images after wavelet and TV denoising still contain noise and are blurry compared with the original image, so a good reconstruction is not achieved. Figure 9e and f show that the WNNM and NLM algorithms denoise better than wavelet and TV denoising; however, the texture and detail information of the image are not well restored. Second, the classic sparse representation denoising methods are considered, including the DCT, MOD, and K-SVD algorithms. As can be seen from Fig. 9g, the DCT-based method removes noise very well, but the denoised image is too smooth and details are lost. Figure 9h and i show the dictionary learning denoising results based on MOD and K-SVD; both methods retain more content, detail, and texture information while removing noise, but a certain gap remains compared with the NDCT image. Figure 9j shows that, compared with the other methods, the proposed method preserves more texture details in low-dose CT images while removing noise, improving the visualization of low-dose CT images.

To display the noise reduction results more clearly, the area of interest of the CT images denoised by the different methods is enlarged, as shown in Fig. 10. Circles and arrows mark some structural details. Figure 10c and d show that the wavelet and TV denoising effects are not ideal: residual noise blurs the texture details in the blue circle. Figure 10e and f show that the WNNM and NLM methods suppress more noise; however, after denoising, the edges of the image are over-smoothed and detailed information is lost. The visual effect of the denoised images in Fig. 10g, h and i is better than that of the previous methods; however, the detail and contour information in the regions marked by circles and arrows is not well restored. Figure 10j shows that the image denoised by the proposed method is closer to the NDCT image and that the areas of interest are well restored. The proposed method achieves a better balance between denoising and image structure protection, resulting in better image quality.

Figure 10

Zoomed parts over the region of interest (ROI) marked by the yellow box in Fig. 9 (a) NDCT (b) LDCT (c) Wavelet (d) TV (e) WNNM (f) NLM (g) DCT (h) MOD (i) K-SVD (j) Proposed.

Figure 11 shows the results of denoising selected abdominal low-dose CT images with the different denoising methods. The low-dose CT image and the corresponding NDCT image are shown in Fig. 11b and a, which clearly show substantial image differences. Figure 11c and d show the effect of wavelet and TV denoising: noise and artifacts remain, and detailed information of the image is lost. Figure 11e and f show that WNNM and NLM improve the image quality to a certain extent, but the processed images suffer from detail loss and over-smoothing. In comparison, for the sparse representation-based denoising methods in Fig. 11g, h and i, over-smoothing is less obvious, but edge information is lost. Figure 11j shows the denoising effect of the proposed method; the denoised image is closer to the normal-dose CT image. This is because the proposed method uses classification-based dictionary training, so the trained dictionaries better reflect the detail and texture structure features of the image.

Figure 11

Denoising results of abdominal images using different algorithms (a) NDCT (b) LDCT (c) Wavelet (d) TV (e) WNNM (f) NLM (g) DCT (h) MOD (i) K-SVD (j) Proposed.

Figure 12 is a detailed enlargement of the red rectangular area in Fig. 11. In Fig. 12, red arrows mark some texture details. Careful comparison of the images shows that the low-dose CT image restored by the proposed method better retains the structural information of the image. The denoising effect of the traditional denoising methods is not ideal, and detailed information is lost after denoising (the details indicated by the red arrows are not clear). Figure 12g, h and i show the denoising results of the sparse representation-based methods; the denoising effect is significantly improved, but the parts of the image indicated by the red arrows are too smooth and unnatural. Figure 12j shows the result of the proposed method, which is better than the other methods in terms of both denoising effect and recovery of texture detail.

Figure 12

Zoomed parts over the region of interest (ROI) marked by the yellow box in Fig. 11 (a) NDCT (b) LDCT (c) Wavelet (d) TV (e) WNNM (f) NLM (g) DCT (h) MOD (i) K-SVD (j) Proposed.

Quantitative evaluation: To analyze the reconstruction results of the different algorithms quantitatively, two objective evaluation indicators, PSNR and SSIM, are used to evaluate the quality of the denoised images. Tables 2 and 3 show these indicators for the LDCT images denoised by the different methods. The two tables show that the proposed method performs well on both indicators, with significantly improved PSNR and SSIM. Specifically, the image denoising algorithms based on sparse representation significantly improve image quality according to the indicators, but they still have shortcomings in the handling of details. The WNNM and NLM methods each have their advantages, and their performance in the objective evaluation is slightly lower than that of the sparse representation-based methods. Wavelet and TV show no significant improvement in the two indicators, which is consistent with their poor noise suppression in the subjective evaluation. Combining the subjective and objective evaluations, the proposed method, which combines image decomposition and dictionary atom optimization, effectively removes noise in low-dose CT while retaining the detail and edge information of low-dose CT images.

Table 2 Quantitative analysis results of chest images using different denoising methods.
Table 3 Quantitative analysis results of chest images using different denoising methods.

To further verify whether the denoising method preserves the details of pathological structures, the denoising results for regions containing lesions are shown in Fig. 13. The figure shows that the proposed method is better in terms of noise suppression and retention of structural information. The lower right corner in Fig. 13 shows an enlarged view of the lesion area, marked with a red circle, after processing by the proposed method and the other algorithms. At the position indicated by the red circle in Fig. 13, lesions are visible in all denoised images. However, in the images denoised by the wavelet and TV algorithms, the lesion area is noticeably blurry. In the WNNM, NLM, DCT, MOD, and K-SVD denoised images, the lesion edges are smoothed and partly lost. In contrast, the lesion area after denoising by the proposed method is closer to that of NDCT.

Figure 13

Demonstrating the denoising effect of marking regions containing lesions (a) NDCT (b) LDCT (c) Wavelet (d) TV (e) WNNM (f) NLM (g) DCT (h) MOD (i) K-SVD (j) Proposed.

Conclusion

Aiming at the problem of serious noise and artifacts in LDCT images, this paper proposes an image denoising algorithm based on image decomposition and dictionary optimization, which combines the ideas of two-dimensional variational modal decomposition (2D-VMD) and dictionary learning. First, the LDCT image to be processed is decomposed into multiple modal components. Each modal component corresponds to different frequency and time-domain features, i.e., each modal component contains different image information. A targeted dictionary is then trained for each modal component, which can represent the image more sparsely, and the trained dictionary is optimized by detecting and replacing the noise atoms, which weakens the influence of the noise atoms on the denoising performance of the dictionary. The comparison with other algorithms shows that the proposed algorithm effectively removes noise and artifacts in LDCT images while retaining and recovering more details and structural information. The algorithm can therefore improve the quality of LDCT images, which is conducive to the subsequent processing and analysis of LDCT images. Although the proposed method achieved better denoising performance than the compared methods, some issues still require further research. For example, the proposed method should be compared with more advanced image denoising algorithms. In addition, the use of two-dimensional variational mode decomposition inevitably introduces errors, because the algorithm needs to determine the decomposition scale, and under different image and noise conditions, different parameter selections may affect the denoising effect.

Therefore, in future research, it is expected that the 2D variational modal decomposition will be replaced by better feature decomposition techniques to further improve the flexibility of the proposed framework. In addition, the computation of dictionary learning-based methods, including the combination of 2D-VMD and dictionary learning, is relatively complicated: it involves multiple steps such as frequency-domain transformation, iterative optimization, and dictionary learning, which require more computational resources and time. Future work can consider further optimizing the running time of the algorithm so as to reduce the denoising time while improving the denoising effect.