Introduction

This paper considers the following model

$$\begin{aligned} \min \{h(x)|~x\in R^n\}, \end{aligned}$$
(1)

where the objective function \(h:R^n\longrightarrow R\) is continuously differentiable. The conjugate gradient (CG) algorithm is widely used to solve (1), in which the iteration formula is written as:

$$\begin{aligned} x_{k+1}=x_k+\alpha _kd_k, ~k=0,1,2,\ldots , \end{aligned}$$
(2)

where \(x_{k+1}\), \(\alpha _k\), and \(d_k\) are the next iteration point, the step size, and the search direction, respectively. The search direction \(d_k\) is generally defined by the formula

$$\begin{aligned} d_k=\left\{ \begin{array}{ll} -g_k+\beta _{k}d_{k-1}, &{} \quad \text{ if }\,\,\,\, k\ge 1,\\ -g_k,&{} \quad \text{ if }\,\,\,\,k=0,\\ \end{array} \right. \end{aligned}$$
(3)

where \(g_k\) denotes the gradient of the objective function h(x) at the iteration point \(x_k\), and \(\beta _k\in R\) is a scalar. Many CG algorithms have been proposed to solve large-scale optimization and engineering problems. In Ref.4, a general conjugate gradient method using the Wolfe line search is proposed, together with a condition on the scalar \(\beta _k\) that is sufficient for global convergence. In Ref.16, a projection-based method is proposed to solve large-scale nonlinear pseudo-monotone equations without Lipschitz continuity. In Refs.19,20,21, Sheng et al. proposed trust region algorithms for nonsmooth minimization, large-residual nonsmooth least squares problems, and optimization problems. Yuan et al. proposed nonlinear conjugate gradient methods for nonlinear equations and image restoration in Refs.24,25. In Ref.5, Dai summarized several analyses of conjugate gradient methods. In Ref.9, the authors adopted conjugate gradient solvers on graphics processing units. In Ref.12, the authors proposed a new conjugate gradient method with guaranteed descent and an efficient line search for optimization. In Ref.18, the authors proposed a hybrid conjugate gradient algorithm combining the PRP and FR algorithms. In Ref.23, Wei et al. proposed a conjugate gradient algorithm with a negative coefficient in the formula of the search direction. In fact, an important task is the design of \(\beta _k\), and several classical expressions are widely used, including Hestenes-Stiefel (HS)8,14,27, Liu-Storey (LS)22, Polak-Ribière-Polyak (PRP)11,25,26,28, Dai-Yuan (DY)6,29, conjugate descent (CD)10,13, and Fletcher-Reeves (FR)15, where the first three have relatively good numerical performance but fewer theoretical results, while the converse holds for the others. Their definitions are listed in Table 1, where \(\Vert .\Vert \) is the Euclidean norm.

Table 1 Six classical CG scalars.
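For reference, the commonly cited forms of these six scalars can be written compactly. The following Python sketch (with \(y_{k-1}=g_k-g_{k-1}\); the function name is illustrative only) is a minimal rendering and may differ from Table 1 in minor conventions.

```python
import numpy as np

def classical_betas(g_k, g_prev, d_prev):
    """Commonly cited CG scalars (sketch); y = g_k - g_{k-1}."""
    y = g_k - g_prev
    return {
        "HS":  (g_k @ y) / (d_prev @ y),             # Hestenes-Stiefel
        "PRP": (g_k @ y) / (g_prev @ g_prev),        # Polak-Ribiere-Polyak
        "LS":  (g_k @ y) / -(g_prev @ d_prev),       # Liu-Storey
        "DY":  (g_k @ g_k) / (d_prev @ y),           # Dai-Yuan
        "CD":  (g_k @ g_k) / -(g_prev @ d_prev),     # conjugate descent
        "FR":  (g_k @ g_k) / (g_prev @ g_prev),      # Fletcher-Reeves
    }
```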

The primary components of conjugate gradient algorithms encompass the search direction, step size (when applicable), and global convergence. The ultimate objective is to achieve a satisfactory balance between numerical efficiency and theoretical scrutiny.

In fact, the sufficient descent property is a prerequisite for theoretical analysis and is characterized by the following inequality

$$\begin{aligned} g_k^Td_k\le -t\Vert g_k\Vert ^2, \end{aligned}$$
(4)

where \(t>0.\) Moreover, the trust region technique indicates that the search radius plays a crucial role in determining numerical efficacy. In trust region methods, the search direction is obtained by solving the following quadratic subproblem, where \(\Delta _{k}\) denotes the trust region radius and \(Q_{k}\) is a symmetric approximation of the Hessian matrix:

$$\begin{aligned}&\min _{d\in R^n}~ g_{k}^{T}d+\dfrac{1}{2}d^{T}Q_{k}d,\\ & s.t.~~~\Vert d\Vert \le \Delta _{k}. \end{aligned}$$
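As an illustration only (this subproblem motivates the trust region property below but is not solved by the algorithms proposed in this paper), a cheap approximate solution is the Cauchy point, the minimizer of the quadratic model along \(-g_k\) restricted to the trust region. A minimal Python sketch, assuming \(Q_k\) is a given symmetric matrix:

```python
import numpy as np

def cauchy_point(g, Q, delta):
    """Approximate trust region step: minimize g^T d + 0.5 d^T Q d along -g, with ||d|| <= delta."""
    gnorm = np.linalg.norm(g)
    if gnorm == 0.0:
        return np.zeros_like(g)
    gQg = g @ Q @ g
    if gQg <= 0.0:
        tau = 1.0                                   # model decreases along -g all the way to the boundary
    else:
        tau = min(1.0, gnorm**3 / (delta * gQg))    # unconstrained minimizer along -g, clipped to the boundary
    return -tau * (delta / gnorm) * g
```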

A search direction in CG algorithms is said to satisfy the trust region property if the following inequality holds.

$$\begin{aligned} \Vert d_{k}\Vert \le t_1\Vert g_{k}\Vert , \end{aligned}$$
(5)

where \(t_1 > 0\). Inequalities (4) and (5) are intimately connected with global convergence. Furthermore, an inexact line search is frequently utilized to determine a suitable step size \(\alpha _k\). This paper adopts the weak Wolfe-Powell (WWP) inexact line search, which is formulated as follows:

$$\begin{aligned} h(x_k+\alpha _kd_k)\le h(x_k)+\delta \alpha _kg_k^Td_k \end{aligned}$$
(6)

and

$$\begin{aligned} g(x_k+\alpha _kd_k)^Td_k\ge \tau g_k^Td_k, \end{aligned}$$
(7)

where \(\delta \in (0,\frac{1}{2})\) and \(\tau \in (\delta ,1)\).
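In practice, a step size satisfying (6) and (7) is often located by a bisection-type procedure. The following Python sketch is a minimal illustration only (not necessarily the procedure used in the experiments); `h` and `grad` are assumed to be callables returning the objective value and gradient.

```python
import numpy as np

def weak_wolfe_step(h, grad, x, d, delta=0.2, tau=0.9, max_iter=60):
    """Bisection sketch for a step size satisfying the WWP conditions (6)-(7)."""
    h0, g0d = h(x), grad(x) @ d          # assumes d is a descent direction, so g0d < 0
    lo, hi, alpha = 0.0, np.inf, 1.0
    for _ in range(max_iter):
        if h(x + alpha * d) > h0 + delta * alpha * g0d:
            hi = alpha                                # condition (6) fails: shrink the step
        elif grad(x + alpha * d) @ d < tau * g0d:
            lo = alpha                                # condition (7) fails: enlarge the step
        else:
            return alpha                              # both (6) and (7) hold
        alpha = 0.5 * (lo + hi) if np.isfinite(hi) else 2.0 * lo
    return alpha                                      # fallback after max_iter trials
```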

The aforementioned discussions are intricately linked to global convergence, which necessitates certain fundamental assumptions: (i) the objective function is continuously differentiable; (ii) the level set \(S=\{x\in R^n: h(x)\le h(x_0)\}\) is bounded, where \(x_0\) denotes an initial point; and (iii) the gradient function g(x) is Lipschitz continuous. The FR method1, modified HS method7, modified LS method17, and modified DY method29 achieve global convergence in the sense that

$$\begin{aligned} \liminf _{k \rightarrow \infty } \Vert g_{k}\Vert =0. \end{aligned}$$

In other words, Lipschitz continuity of the gradient function is a prerequisite in existing works, which prompts us to ask whether global convergence can be attained in its absence. This paper proposes three-term trust region conjugate gradient methods that converge without Lipschitz continuity, with the main properties summarized as follows:

  • The proposed algorithms possess both the sufficient descent and trust region properties without any additional conditions. The trust region property is inherited from the trust region technique, while the algorithm design is based on classical approaches such as Hestenes-Stiefel (HS) and Polak-Ribière-Polyak (PRP).

  • These algorithms achieve global convergence even without Lipschitz continuity of the gradient function, under the weak Wolfe-Powell line search technique.

  • The applications of these algorithms include the restoration of noisy grayscale and color images, as well as the solution of large-scale unconstrained problems. The case studies illustrate that TT-TR-WP and TT-TR-CG possess superior numerical performance.

The remainder of the paper is organized as follows: “Motivation and TT-TR-WP” provides an overview of the motivation behind TT-TR-WP; “The global convergence of TT-TR-WP” presents its convergence analysis; “TT-TR-CG and theoretical analysis” describes the TT-TR-CG algorithm and its convergence analysis; “Case studies” presents the case studies, including image restoration and large-scale unconstrained problem-solving; and finally, the last section offers concluding remarks.

Motivation and TT-TR-WP

The first three-term conjugate gradient formula was proposed by Zhang et al.30, in which the search direction is defined by

$$\begin{aligned} d_k=\left\{ \begin{array}{ll} -g_k,&{} \text{ if }\,\,\,\,k=0,\\ -g_k+\frac{g_k^Ty_{k-1}d_{k-1}-g_k^Td_{k-1}y_{k-1}}{\Vert g_{k-1}\Vert ^2}, &{} \text{ if }\,\,\,\, k\ge 1.\\ \end{array} \right. \end{aligned}$$
(8)

Formula (8), where \(y_{k-1}=g_k-g_{k-1}\), satisfies the sufficient descent property without any additional conditions, while its trust region property is closely tied to the objective function, Lipschitz continuity, and the level set.

Formula (9) was introduced by Yuan et al.28 under the weak Wolfe-Powell line search technique, where the search direction is given by the following expression:

$$\begin{aligned} d_k=\left\{ \begin{array}{ll} -g_k,&{} \text{ if }\,\,\,\,k=0,\\ -g_k+\alpha _{k-1}\frac{g_k^Ty_{k-1}d_{k-1}-g_k^Td_{k-1}y_{k-1}}{\Vert g_{k-1}\Vert ^2}, &{} \text{ if }\,\,\,\, k\ge 1,\\ \end{array} \right. \end{aligned}$$
(9)

The step size \(\alpha _{k-1}\) is included in the search direction (9). This formula not only satisfies the sufficient descent property without other conditions, but also guarantees global convergence without Lipschitz continuity, while the trust region property is closely linked to the relation \(\alpha _{k-1}d_{k-1} = x_{k} - x_{k-1}\), the objective function, and the level set.

To summarize, while formulas (8) and (9) possess the sufficient descent property without additional conditions, they have several limitations. The trust region property, which is vital for both theoretical analysis and numerical performance, depends on the objective function, the basic assumptions, and an involved analysis. This motivates the search for simpler and more cost-effective algorithms that simultaneously achieve better numerical performance and stronger theoretical results.

The aforementioned discussions inspire us to propose the following formula, where \(\sigma >0\) is a constant:

$$\begin{aligned} d_k=\left\{ \begin{array}{ll} -g_k,&{} \text{ if }\,\,\,\,k=0,\\ -g_k+\frac{g_k^Ty_{k-1}d_{k-1}-g_k^Td_{k-1}y_{k-1}}{\sigma \Vert d_{k-1}\Vert \Vert y_{k-1}\Vert +|d_{k-1}^Ty_{k-1}|}, &{} \text{ if }\,\,\,\, k\ge 1,\\ \end{array} \right. \end{aligned}$$
(10)

Remark 1

  (i) Formula (10) possesses the sufficient descent and trust region properties, independent of any additional conditions.

  (ii) Global convergence is guaranteed even without Lipschitz continuity of the gradient function.

  (iii) The classical HS algorithm’s excellent numerical performance is incorporated into TT-TR-WP through the specified denominator.

This section presents Algorithm 1, while the subsequent section provides the theoretical analysis.

TT-TR-WP: A convergent three-term trust region algorithm with the weak Wolfe-Powell line search

  • Step 0: Initialize \(x_0\in R^n\), \(d_0=-g_0\), constants \(\epsilon \in (0,1)\), \(\delta \in (0,\frac{1}{2})\), \(\tau \in (\delta ,1)\), \(\sigma >0\), and set \(k=0\).

  • Step 1: If \(\Vert g_k\Vert \le \epsilon \), stop.

  • Step 2: Choose a step size \(\alpha _k\) satisfying formulas (6) and (7).

  • Step 3: Update iteration point \(x_{k+1} = x_k+\alpha _kd_k\).

  • Step 4: If \(\Vert g_{k+1}\Vert \le \epsilon \), stop.

  • Step 5: Update the search direction by formula (10).

  • Step 6: Set \(k=k+1\), and go to Step 2.
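Steps 0-6 can be assembled into a compact solver. The following Python sketch is a minimal, illustrative rendering of TT-TR-WP (not the authors' implementation), assuming \(y_{k-1}=g_k-g_{k-1}\) and that the `weak_wolfe_step` routine sketched earlier (or any step size satisfying (6) and (7)) is available.

```python
import numpy as np

def direction_tt_tr_wp(g_k, g_prev, d_prev, sigma=0.1):
    """Search direction (10) with y_{k-1} = g_k - g_{k-1}."""
    y = g_k - g_prev
    denom = sigma * np.linalg.norm(d_prev) * np.linalg.norm(y) + abs(d_prev @ y)
    if denom == 0.0:                                  # degenerate case: fall back to steepest descent
        return -g_k
    return -g_k + ((g_k @ y) * d_prev - (g_k @ d_prev) * y) / denom

def tt_tr_wp(h, grad, x0, eps=1e-6, sigma=0.1, max_iter=8000):
    """Minimal sketch of Steps 0-6; h and grad are callables."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                            # Step 0
    for _ in range(max_iter):
        if np.linalg.norm(g) <= eps:                  # Steps 1 and 4: stopping rule
            break
        alpha = weak_wolfe_step(h, grad, x, d)        # Step 2: WWP conditions (6)-(7)
        x_new = x + alpha * d                         # Step 3
        g_new = grad(x_new)
        d = direction_tt_tr_wp(g_new, g, d, sigma)    # Step 5: formula (10)
        x, g = x_new, g_new                           # Step 6
    return x
```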

The global convergence of TT-TR-WP

This section analyzes the global convergence of TT-TR-WP, beginning with the sufficient descent and trust region properties.

Lemma 3.1

The search direction (10) simultaneously has the sufficient descent (4) and trust region (5) properties, i.e.,

$$\begin{aligned} g_k^Td_k = -\Vert g_k\Vert ^2, \end{aligned}$$
(11)

and

$$\begin{aligned} \Vert d_k\Vert \le (1+\frac{2}{\sigma })\Vert g_k\Vert . \end{aligned}$$
(12)

Proof

If \(k=0\), then \(d_0=-g_0\), so that \(g_0^Td_0=-\Vert g_0\Vert ^2\) and \(\Vert d_0\Vert =\Vert g_0\Vert \le (1+\frac{2}{\sigma })\Vert g_0\Vert .\)

If \(k\ge 1\), the following formulas can be obtained from formula (10):

$$\begin{aligned} g_k^Td_k= & {} g_k^T\left( -g_k+\frac{g_k^Ty_{k-1}d_{k-1}-g_k^Td_{k-1}y_{k-1}}{\sigma \Vert d_{k-1}\Vert \Vert y_{k-1}\Vert +|d_{k-1}^Ty_{k-1}|}\right) \\= & {} -\Vert g_k\Vert ^2+\frac{(g_k^Ty_{k-1})(g_k^Td_{k-1})-(g_k^Td_{k-1})(g_k^Ty_{k-1})}{\sigma \Vert d_{k-1}\Vert \Vert y_{k-1}\Vert +|d_{k-1}^Ty_{k-1}|}\\= & {} -\Vert g_k\Vert ^2. \end{aligned}$$

and

$$\begin{aligned} \Vert d_k\Vert= & {} \Vert -g_k+\frac{g_k^Ty_{k-1}d_{k-1}-g_k^Td_{k-1}y_{k-1}}{\sigma \Vert d_{k-1}\Vert \Vert y_{k-1}\Vert +|d_{k-1}^Ty_{k-1}|}\Vert \\\le & {} \Vert g_k\Vert +\frac{2\Vert g_k\Vert \Vert y_{k-1}\Vert \Vert d_{k-1}\Vert }{\sigma \Vert d_{k-1}\Vert \Vert y_{k-1}\Vert +|d_{k-1}^Ty_{k-1}|}\\\le & {} (1+\frac{2}{\sigma })\Vert g_k\Vert , \end{aligned}$$

which completes the proof. \(\square \)

Remark 2

  (i) Lemma 3.1 establishes the sufficient descent and trust region properties of the search direction (10), which are independent of any assumptions and of the line search technique.

  (ii) From formula (11) and the Cauchy-Schwarz inequality, we obtain

  $$\begin{aligned} -\Vert d_k\Vert \Vert g_k\Vert \le g_k^Td_k = -\Vert g_k\Vert ^2, \end{aligned}$$

  which implies that

  $$\begin{aligned} \Vert g_k\Vert \le \Vert d_k\Vert , \end{aligned}$$

  and thus the following formula holds from formula (12):

  $$\begin{aligned} \Vert g_k\Vert \le \Vert d_k\Vert \le (1+\frac{2}{\sigma })\Vert g_k\Vert ,\quad \forall \, k. \end{aligned}$$
  (13)
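As a quick numerical sanity check (not part of the original analysis), identity (11) and the bounds (12)-(13) can be verified on random vectors using the `direction_tt_tr_wp` routine sketched earlier:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.1
for _ in range(1000):
    g_k, g_prev, d_prev = rng.standard_normal((3, 50))
    d_k = direction_tt_tr_wp(g_k, g_prev, d_prev, sigma)
    assert np.isclose(g_k @ d_k, -(g_k @ g_k))                                    # identity (11)
    assert np.linalg.norm(g_k) <= np.linalg.norm(d_k) * (1.0 + 1e-9)              # lower bound in (13)
    assert np.linalg.norm(d_k) <= (1.0 + 2.0 / sigma) * np.linalg.norm(g_k) * (1.0 + 1e-9)  # bound (12)
```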

To achieve global convergence, certain basic assumptions are proposed.

Assumption

  (i) The level set \(S=\{x| h(x)\le h(x_0)\}\) is well-defined and bounded, where \(x_0\) is the initial point.

  (ii) The objective function h(x) is continuously differentiable and bounded below.

Under these assumptions, the following significant properties hold:

Property 1: The iteration sequence \(\{x_k\}\) is bounded.

Property 2: The gradient function g(x) is continuous on the level set.

We now turn to the global convergence of TT-TR-WP.

Theorem 3.1

If the sequences \(\{x_k,d_k,\alpha _k,g_k\}\) are generated by TT-TR-WP under the above assumptions, then the following formula holds:

$$\begin{aligned} \liminf _{k \rightarrow \infty } \Vert g_{k}\Vert =0. \end{aligned}$$
(14)

Proof

We argue by contradiction and first assume that

$$\begin{aligned} \Vert g_k\Vert \ge \varepsilon _C, \end{aligned}$$
(15)

for all \(k\), where \(\varepsilon _C\) is a positive constant.

Since the iteration sequence \(\{x_k\}\) is bounded, there exists a convergent subsequence \(\{x_{k_i}\}\); that is,

$$\begin{aligned} x_{k_i}\rightarrow x^*, \quad i\rightarrow \infty . \end{aligned}$$

Since the gradient function is continuous, for any \(\epsilon _1>0\) there exists an integer \(N_1>0\) such that

$$\begin{aligned} \Vert g(x_{k_i})-g(x^*)\Vert <\epsilon _1,\,\,\forall \,\,i>N_1. \end{aligned}$$
(16)

Similarly, from formula (13), for any \(\epsilon _2>0\) there exists an integer \(N_2>0\) satisfying

$$\begin{aligned} \Vert d(x_{k_i})-d(x^*)\Vert <\epsilon _2,\,\,\forall \,\,i>N_2. \end{aligned}$$
(17)

From (16), (17), (11), and (15), the following formula holds:

$$\begin{aligned} g(x^*)^Td(x^*)\le -\Vert g(x^*)\Vert ^2 \le -\varepsilon _C^2<0. \end{aligned}$$
(18)

On the other hand, the following formula is obtained from (7):

$$\begin{aligned} g(x_k+\alpha _kd_k)^Td_k\ge \tau g_k^Td_k, \end{aligned}$$

thus

$$\begin{aligned} g_{k_{i}+1}^Td_{k_{i}} - \tau g_{k_{i}}^Td_{k_{i}}\ge 0, \end{aligned}$$

Taking the limit on both sides along the subsequence \(\{x_{k_i}\}\), with \(N=\max \{N_1,N_2\}\), we can deduce that

$$\begin{aligned} \lim _{i \rightarrow \infty }(g_{k_{i}+1}^Td_{k_{i}}-\tau g_{k_{i}}^Td_{k_{i}})=(1-\tau )g(x^*)^Td(x^*)\ge 0. \end{aligned}$$

Since \(1-\tau >0\), it follows that

$$\begin{aligned} g(x^*)^Td(x^*)\ge 0, \end{aligned}$$

which contradicts (18). Hence assumption (15) fails, formula (14) holds, and the proof is completed. \(\square \)

Remark 3

  (i) Non-Lipschitz continuous gradient functions are prevalent; for instance, \(g(x) = \sin (\frac{1}{x})\) and \(g(x)=x^{\frac{3}{2}}\sin (\frac{1}{x})\) for \(x\in (0, 1].\)

  (ii) The global convergence of TT-TR-WP is established under the weak Wolfe-Powell line search technique without Lipschitz continuity of the gradient function.

  (iii) The sufficient descent and trust region properties, (11) and (12), simplify the convergence analysis.

TT-TR-CG and theoretical analysis

This section proposes another modified three-term trust region CG algorithm, TT-TR-CG, and establishes some of its properties.

In TT-TR-CG, the search direction has the following form:

$$\begin{aligned} d_k=\left\{ \begin{array}{ll} -g_k,&{} \text{ if }\,\,\,\,k=0,\\ -g_k+\frac{g_k^Ty_{k-1}d_{k-1}-g_k^Td_{k-1}y_{k-1}}{\max \{\mu \Vert d_{k-1}\Vert \Vert y_{k-1}\Vert ,\Vert g_{k-1}\Vert ^2\}}, &{} \text{ if }\,\,\,\, k\ge 1,\\ \end{array} \right. \end{aligned}$$
(19)

where \(\mu > 0.\)
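Under the same conventions as the earlier TT-TR-WP sketch (with \(y_{k-1}=g_k-g_{k-1}\)), formula (19) differs from (10) only in its denominator. A minimal illustrative rendering:

```python
import numpy as np

def direction_tt_tr_cg(g_k, g_prev, d_prev, mu=0.1):
    """Search direction (19); falls back to -g_k if the denominator degenerates."""
    y = g_k - g_prev
    denom = max(mu * np.linalg.norm(d_prev) * np.linalg.norm(y), g_prev @ g_prev)
    if denom == 0.0:
        return -g_k
    return -g_k + ((g_k @ y) * d_prev - (g_k @ d_prev) * y) / denom
```

Replacing `direction_tt_tr_wp` with this routine in the `tt_tr_wp` driver sketched earlier yields a corresponding TT-TR-CG sketch.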

We first describe the algorithm.

TT-TR-CG: A convergent three-term trust region CG algorithm with the weak Wolfe-Powell line search

  • Step 0: Initialize \(x_0\in R^n\), \(d_0=-g_0\), constants \(\epsilon \in (0,1)\), \(\delta \in (0,\frac{1}{2})\), \(\tau \in (\delta ,1)\), \(\mu >0\), and set \(k=0\).

  • Step 1: If \(\Vert g_k\Vert \le \epsilon \), stop.

  • Step 2: Choose a step size \(\alpha _k\) satisfying formulas (6) and (7).

  • Step 3: Update iteration point \(x_{k+1} = x_k+\alpha _kd_k\).

  • Step 4: If \(\Vert g_{k+1}\Vert \le \epsilon \), stop.

  • Step 5: Update the search direction by formula (19).

  • Step 6: Set \(k=k+1\), and go to Step 2.

Remark 4

  (i) The search direction (19) satisfies both the sufficient descent and trust region properties simultaneously.

  (ii) Global convergence is established without Lipschitz continuity of the gradient function, under the weak Wolfe-Powell line search technique.

  (iii) The good numerical performance of the classical PRP algorithm is partly incorporated into TT-TR-CG through the specified denominator.

Lemma 4.1

The search direction (19) has the sufficient descent (4) and trust region (5) properties simultaneously without any conditions, i.e.,

$$\begin{aligned} g_k^Td_k = -\Vert g_k\Vert ^2, \end{aligned}$$
(20)

and

$$\begin{aligned} \Vert d_k\Vert \le (1+\frac{2}{\mu })\Vert g_k\Vert . \end{aligned}$$
(21)

Proof

The proof is similar to that of Lemma 3.1 and is therefore omitted. \(\square \)

To obtain global convergence, some basic assumptions are proposed.

Assumption

  (i) The level set \(S=\{x| h(x)\le h(x_0)\}\) is well-defined and bounded, where \(x_0\) is an initial point.

  (ii) The objective function h(x) is continuously differentiable and bounded below.

Theorem 4.1

If the sequences \(\{x_k,d_k,\alpha _k,g_k\}\) are generated by TT-TR-CG under the above assumptions, then the following formula holds:

$$\begin{aligned} \liminf _{k \rightarrow \infty } \Vert g_{k}\Vert =0. \end{aligned}$$
(22)

Proof

The proof is similar to that of Theorem 3.1 in “The global convergence of TT-TR-WP” and is therefore omitted. \(\square \)

Case studies

This section applies the proposed algorithms to restore noisy images and to solve large-scale unconstrained optimization problems in order to test their numerical performance.

To further test the numerical performance, this paper introduces two baseline algorithms from Refs.26,28, namely MPRP and A-T-PRP-A, whose search directions are given by (8) and (9), respectively. The former is the first three-term conjugate gradient algorithm and is widely cited. The latter is a recent algorithm that updates the search direction with the step size and possesses global convergence without Lipschitz continuity. Both baseline algorithms possess good numerical performance and theoretical properties in the existing works.

The experimental environment consists of an Intel(R) Core(TM) i5-8250U CPU @ 1.60 GHz/1.80 GHz with 16 GB RAM, running the Windows 11 operating system.

Image restoration

The restoration of noisy images is of great practical importance and is widely required. This subsection uses TT-TR-WP, TT-TR-CG, and the baseline algorithms to restore noisy images in order to test their numerical performance; three test images are chosen because they are widely used, classical test images, see Refs.24,25.

The objective function and experimental settings are described as follows: The candidate noise index set is denoted as N, the objective function as \(\omega (u)\), and the edge-preserving function as \(\chi \). The true image containing \(K\times L\) pixels is denoted as x. For a more detailed explanation of image restoration, please refer to Refs.3,24,25,28.

$$\begin{aligned} N:=\left\{ (i, j) \in I \mid {\bar{\zeta }}_{i, j} \ne \zeta _{i, j}, \zeta _{i, j}=s_{\min } \text{ or } s_{\max }\right\} , \end{aligned}$$

where \(I = \{1, 2, \ldots , K\} \times \{1,2,\ldots ,L\},\) \(\zeta _{i, j}\) is the observed noisy image, \({\bar{\zeta }}_{i, j}\) is the verified image, and \(s_{min}\) and \(s_{max}\) are the minimum and maximum noisy pixel values. Consider the following optimization problem

$$\begin{aligned} \min _u \omega (u) \end{aligned}$$

and

$$\begin{aligned} \omega (u)=\sum _{(i, j) \in N}\left\{ \sum _{(m, n) \in \phi _{i, j} \backslash N} \chi \left( u_{i, j}-\zeta _{m, n}\right) +\frac{1}{2} \sum _{(m, n) \in \phi _{i, j} \bigcap {N}} \chi \left( u_{i, j}-u_{m, n}\right) \right\} , \end{aligned}$$

where \(\phi _{i, j} = \{(i,j-1), (i,j+1),(i-1,j),(i+1,j)\}\) denotes the set of the four nearest neighbors of pixel (i, j), and the edge-preserving function \(\chi \) is defined by

$$\begin{aligned} \chi =\left\{ \begin{array}{ll} t^{2} / \nu , &{} \text{ if } \quad |t| \le \nu \\ |t|-2 \nu , &{} \text{ if } \quad |t|>\nu , \end{array}\right. \end{aligned}$$

where \(\nu > 0.\)
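For concreteness, the edge-preserving function \(\chi \) and the objective \(\omega (u)\) as written above can be sketched in Python. This is a minimal illustration only: the noise index set N is assumed to have been detected beforehand, u is treated as a full image array, out-of-range neighbors are skipped, and all names are illustrative.

```python
import numpy as np

def chi(t, nu=0.1):
    """Edge-preserving function, following the piecewise definition above."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= nu, t * t / nu, np.abs(t) - 2.0 * nu)

def omega(u, zeta, noise_set, nu=0.1):
    """Objective omega(u): u is the candidate image, zeta the observed noisy image,
    noise_set the set N of detected noisy pixel indices (assumed given)."""
    K, L = zeta.shape
    total = 0.0
    for (i, j) in noise_set:
        for (m, n) in [(i, j - 1), (i, j + 1), (i - 1, j), (i + 1, j)]:   # neighborhood phi_{i,j}
            if not (0 <= m < K and 0 <= n < L):
                continue                                                  # skip out-of-range neighbors
            if (m, n) in noise_set:
                total += 0.5 * chi(u[i, j] - u[m, n], nu)                 # noisy neighbor term
            else:
                total += chi(u[i, j] - zeta[m, n], nu)                    # clean neighbor term
    return float(total)
```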

Restoration quality is measured by the peak signal-to-noise ratio (PSNR), defined as

$$\begin{aligned} PSNR=10\times \log _{10}\left( \frac{(2^{num}-1)^2}{MSE}\right) , \end{aligned}$$

where MSE is the mean squared error between the original image and the processed image, and num is the number of bits per pixel.
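A short sketch of this measure (assuming num = 8 for 8-bit images):

```python
import numpy as np

def psnr(original, restored, num=8):
    """Peak signal-to-noise ratio as defined above; num is the number of bits per pixel."""
    mse = np.mean((np.asarray(original, dtype=float) - np.asarray(restored, dtype=float)) ** 2)
    if mse == 0.0:
        return np.inf                      # identical images
    return 10.0 * np.log10(((2 ** num - 1) ** 2) / mse)
```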

The stopping rule of the algorithms is \(\frac{\Vert h_{k+1}-h_k\Vert }{\Vert h_k\Vert }<\varepsilon \), and the parameters are \(\delta =0.2, \tau = 0.895, \sigma =0.1, \mu = 0.1, \varepsilon =10^{-6}.\)

Table 2 The running time under different noise ratios with diverse algorithms.
Table 3 The ratio of total running time comparing with TT-TR-WP.
Table 4 The SSIM and PSNR under different noise ratios with diverse algorithms.

In restoring noisy grayscale images, we conclude from Table 2 that TT-TR-WP exhibits the best numerical performance in terms of running time, TT-TR-CG is the second best, MPRP is third, and A-T-PRP-A is the slowest. Furthermore, taking the running time of TT-TR-WP as the reference, TT-TR-CG takes around 2.34 times as long, while the other algorithms take around 2.46 and 2.42 times as long, respectively. Table 3 reports the running-time ratios of all algorithms on each image and on all images together, in which the largest gap is 1.68; TT-TR-WP is far ahead of the others, and TT-TR-CG performs quite well in most situations. Additionally, the results in Table 4 further demonstrate that all algorithms obtain highly similar SSIM and PSNR values. Combining the above discussion, we conclude that, in obtaining highly similar results, TT-TR-WP and TT-TR-CG perform relatively well and the proposed algorithms are competitive.

In summary, TT-TR-WP exhibits impressive numerical performance, and TT-TR-CG is highly competitive with the others. To save space, this paper only records numerical results for noise ratios of 70% and 90%, without displaying the corresponding restored images; the restored images for the 50% noise ratio are shown in Fig. 1. In each row, the first column is obtained by TT-TR-WP, the second column by TT-TR-CG, the third column by A-T-PRP-A, and the last column by MPRP.

Figure 1

From left to right, the images disturbed by 50\(\%\) salt-and-pepper noise, the images restored by TT-TR-WP (first column), TT-TR-CG (second column), A-T-PRP-A (third column) and MPRP (last column), respectively.

Color image restoration

To further evaluate the performance of the proposed algorithms, this section applies the various algorithms to restore color images with different levels of noise. The peak signal-to-noise ratio (PSNR), the structural similarity index measure (SSIM), and the mean squared error (MSE) are widely used measures of image quality and are adopted in this section. To save space, this paper only records numerical results for noise ratios of 20%, 60%, and 80%, without displaying the corresponding restored images. The stopping rule of the algorithms is \(\frac{\Vert h_{k+1}-h_k\Vert }{\Vert h_k\Vert }<\varepsilon \), and the parameters are \(\delta =0.0885, \tau = 0.885, \sigma =0.0015, \mu = 1.1555, \varepsilon =10^{-4}.\)

Table 5 The running time with different noise ratios across various algorithms.

In Table 5, the total running times of the four algorithms are 73.83, 74.88, 80.02, and 74.52 s, respectively. Additionally, from Tables 6, 7 and 8, the PSNR, MSE, and SSIM values of the algorithms are highly similar, and the proposed algorithms are relatively competitive. The images restored by the various algorithms under a noise ratio of 40% are presented in Fig. 2. In each row, the first column is obtained by TT-TR-WP, the second column by TT-TR-CG, the third column by A-T-PRP-A, and the last column by MPRP.

Table 6 The PSNR with different noise ratios across various algorithms.
Table 7 The MSE with different noise ratios across various algorithms.
Table 8 The SSIM with different noise ratios across various algorithms.
Figure 2

From left to right, the images disturbed by 40\(\%\) salt-and-pepper noise, the images restored by TT-TR-WP (first column), TT-TR-CG (second column), A-T-PRP-A (third column) and MPRP (last column), respectively.

General unconstrained optimization

To further test the numerical performance, this subsection applies the algorithms to solve large-scale unconstrained optimization problems. Sixty-five classical test functions are randomly selected from Ref.2, as shown in Table 9, with dimensions of 3000, 6000, and 12,000. The stopping criterion is \(\Vert g(x_k)\Vert <\varepsilon \) or \(NI > 8000\), where NI is the number of iterations and \(g(x_k)\) is the gradient at the point \(x_k\). The parameters used are \(\delta =0.2, \tau = 0.9, \sigma =0.001, \mu = 0.1, \varepsilon =10^{-6}\).

Table 9 Test functions.

The running time in seconds is used as the reference standard for evaluating numerical performance, as shown in Table 10. The relative numerical performance in solving the large-scale problems is illustrated in Fig. 3, in which the red line denotes TT-TR-WP, the black line TT-TR-CG, the blue line A-T-PRP-A, and the remaining line MPRP. TT-TR-WP has a high initial value, which indicates relatively good robustness. TT-TR-CG exhibits a gradually increasing trend throughout, which indicates relatively good applicability. Both TT-TR-WP and TT-TR-CG possess better robustness and applicability than the others.

In summary, TT-TR-WP and TT-TR-CG possess better numerical performance than the baseline algorithms in terms of applicability and robustness, with TT-TR-WP having the best robustness and relatively good applicability, and TT-TR-CG the opposite.

Table 10 The running time of diverse algorithms on tested problems.

Conclusion

This paper introduces two three-term trust region conjugate gradient algorithms, TT-TR-WP and TT-TR-CG, which converge for non-Lipschitz continuous gradient functions without any additional conditions. These algorithms possess the sufficient descent and trust region properties and demonstrate global convergence. To assess their numerical performance, we compare them with two classical algorithms on restoring noisy grayscale and color images as well as on solving large-scale unconstrained problems. In obtaining highly similar SSIM and PSNR values on noisy grayscale images, TT-TR-WP exhibits the best numerical performance in terms of running time, TT-TR-CG is the second best, MPRP is third, and A-T-PRP-A is the slowest. Furthermore, taking the running time of TT-TR-WP as the reference, TT-TR-CG takes around 2.34 times as long, while the other algorithms take around 2.46 and 2.42 times as long, respectively. In restoring the same color images, the proposed algorithms exhibit relatively good performance compared with the other algorithms. Additionally, in the comparative experiments on algorithm performance, the curve of TT-TR-CG has the maximum initial value, while the curve of TT-TR-WP is the second best, indicating that TT-TR-CG and TT-TR-WP are relatively more robust and more stable when facing diverse situations. In summary, TT-TR-WP and TT-TR-CG exhibit relatively better performance in terms of applicability and robustness.

Figure 3

The running time of diverse algorithms on tested problems.