Article

An Improvement of the Alternating Direction Method of Multipliers to Solve the Convex Optimization Problem

College of Mathematics and Computational Science, Guilin University of Electronic Technology, Guilin 541004, China
* Author to whom correspondence should be addressed.
Mathematics 2025, 13(5), 811; https://doi.org/10.3390/math13050811
Submission received: 18 December 2024 / Revised: 17 February 2025 / Accepted: 25 February 2025 / Published: 28 February 2025

Abstract

The alternating direction method is one of the attractive approaches for solving convex optimization problems with linear constraints and separable objective functions. Experience with applications has shown that the number of iterations depends significantly on the penalty parameter for the linear constraint. In the classical alternating direction method, the penalty parameter is a constant. In this paper, an improved alternating direction method is proposed, which not only adaptively adjusts the penalty parameter at each iteration based on the iteration message but also adds relaxation factors to the Lagrange multiplier update steps. Preliminary numerical experiments show that adaptively adjusting the penalty parameter at each iteration and attaching relaxation factors to the Lagrange multiplier updating steps are effective in practical applications.

1. Introduction

Many problems in the fields of signal and image processing, machine learning, medical image reconstruction, computer vision, and network communications [1,2,3,4,5,6,7,8] can be reduced to solving the following convex optimization problem with linear constraints:
$$\min \big\{ f(x) + g(y) \;\big|\; Ax + By = b,\ x \in X,\ y \in Y \big\}, \tag{1}$$
where $A \in \mathbb{R}^{m \times m_1}$ and $B \in \mathbb{R}^{m \times m_2}$ are given matrices; $b \in \mathbb{R}^m$ is a given vector; $f: \mathbb{R}^{m_1} \to \mathbb{R}$ and $g: \mathbb{R}^{m_2} \to \mathbb{R}$ are continuous closed convex functions; and $X$ and $Y$ are nonempty closed convex subsets of $\mathbb{R}^{m_1}$ and $\mathbb{R}^{m_2}$, respectively. The augmented Lagrangian function of the convex optimization problem (Equation (1)) is the following Equation (2):
$$L(x, y, \lambda) = f(x) + g(y) - \lambda^T (Ax + By - b) + \frac{\beta}{2}\,\|Ax + By - b\|_2^2, \tag{2}$$
where $\beta > 0$ is the penalty parameter and $\lambda \in \mathbb{R}^m$ is the Lagrange multiplier vector. Based on the separable structure of the objective function, the alternating direction method of multipliers (ADMM) has been proposed in the literature [9,10] to solve the problem (Equation (1)) as follows:
$$\begin{aligned}
x^{k+1} &= \arg\min_{x \in X}\Big\{ f(x) - (\lambda^k)^T(Ax + By^k - b) + \frac{\beta}{2}\,\|Ax + By^k - b\|_2^2 \Big\},\\
y^{k+1} &= \arg\min_{y \in Y}\Big\{ g(y) - (\lambda^k)^T(Ax^{k+1} + By - b) + \frac{\beta}{2}\,\|Ax^{k+1} + By - b\|_2^2 \Big\},\\
\lambda^{k+1} &= \lambda^k - \beta\,(Ax^{k+1} + By^{k+1} - b).
\end{aligned} \tag{3}$$
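To make the iteration concrete, the following Python sketch implements the generic scheme (3) for problem (1). It is only an illustrative skeleton written for this article, not the authors' code: the two subproblem minimizers are assumed to be supplied as callables (the names argmin_x, argmin_y, and all parameter defaults are placeholders), and the parameter gamma anticipates the relaxed multiplier update discussed below (gamma = 1 recovers scheme (3)).

```python
import numpy as np

def admm(argmin_x, argmin_y, A, B, b, beta=1.0, gamma=1.0,
         x0=None, y0=None, lam0=None, tol=1e-6, max_iter=1000):
    """Generic ADMM skeleton for min f(x)+g(y) s.t. Ax+By=b (scheme (3)).

    argmin_x(lam, y, beta) -> solves the x-subproblem of (3);
    argmin_y(lam, x, beta) -> solves the y-subproblem of (3);
    gamma = 1 gives the classical multiplier update; other values in
    (0, (1+sqrt(5))/2) give the relaxed update of Fortin and Glowinski.
    """
    x = np.zeros(A.shape[1]) if x0 is None else x0
    y = np.zeros(B.shape[1]) if y0 is None else y0
    lam = np.zeros(b.shape[0]) if lam0 is None else lam0
    for _ in range(max_iter):
        x = argmin_x(lam, y, beta)            # x-update
        y = argmin_y(lam, x, beta)            # y-update
        r = A @ x + B @ y - b                 # primal residual
        lam = lam - gamma * beta * r          # multiplier update
        if np.linalg.norm(r) <= tol:
            break
    return x, y, lam
```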
The ADMM algorithm can be viewed as a Gauss–Seidel form of the augmented Lagrangian multiplier method, and it can be interpreted as the Douglas–Rachford splitting method (DRSM) applied from a dual perspective [11]. In addition, the DRSM can be further interpreted as a proximal point algorithm (PPA) [12]. The ADMM algorithm is an important method for solving separable convex optimization problems. Due to its fast processing speed and good convergence performance, it is widely used in statistical learning, machine learning, and other fields. However, the ADMM algorithm converges slowly when high accuracy is required on large-scale problems. Therefore, many experts and scholars have worked on improving the ADMM algorithm. For example, Fortin and Glowinski suggested in [13] attaching a relaxation factor to the Lagrange multiplier updating step in Equation (3), which results in the following scheme:
$$\begin{aligned}
x^{k+1} &= \arg\min_{x \in X}\Big\{ f(x) - (\lambda^k)^T(Ax + By^k - b) + \frac{\beta}{2}\,\|Ax + By^k - b\|_2^2 \Big\},\\
y^{k+1} &= \arg\min_{y \in Y}\Big\{ g(y) - (\lambda^k)^T(Ax^{k+1} + By - b) + \frac{\beta}{2}\,\|Ax^{k+1} + By - b\|_2^2 \Big\},\\
\lambda^{k+1} &= \lambda^k - \gamma\beta\,(Ax^{k+1} + By^{k+1} - b),
\end{aligned} \tag{4}$$
where the parameter $\gamma$ can be chosen in the interval $\big(0, \frac{1+\sqrt{5}}{2}\big)$, which makes it possible to enlarge the step size for updating the Lagrange multiplier. An advantage of the larger step size in Equation (4) is that it often leads to faster convergence in practice. The scheme (Equation (4)) differs from the original ADMM scheme (Equation (3)) only in that the step size for updating the Lagrange multiplier can be larger than 1. Technically, however, they are two distinct families of ADMM algorithms: one is derived from the operator splitting framework and the other from Lagrangian splitting. Thus, despite the similarity in notation, the ADMM scheme (Equation (4)) with Fortin and Glowinski's larger step size and the original ADMM scheme (Equation (3)) are different in nature. On the other hand, Glowinski et al. [14] applied the Douglas–Rachford splitting method (DRSM) [11] to the dual of Equation (1) and obtained the following scheme:
$$\begin{aligned}
x^{k+1} &= \arg\min_{x \in X}\Big\{ f(x) - (\lambda^k)^T(Ax + By^k - b) + \frac{\beta}{2}\,\|Ax + By^k - b\|_2^2 \Big\},\\
\lambda^{k+\frac12} &= \lambda^k - \beta\,(Ax^{k+1} + By^k - b),\\
y^{k+1} &= \arg\min_{y \in Y}\Big\{ g(y) - (\lambda^{k+\frac12})^T(Ax^{k+1} + By - b) + \frac{\beta}{2}\,\|Ax^{k+1} + By - b\|_2^2 \Big\},\\
\lambda^{k+1} &= \lambda^{k+\frac12} - \beta\,(Ax^{k+1} + By^{k+1} - b).
\end{aligned} \tag{5}$$
This scheme (Equation (5)) can be regarded as a symmetric version of the ADMM scheme (Equation (3)) in the sense that the variables x and y are treated equally, each of which is immediately followed by an update of the Lagrange multiplier. However, as shown in [15], the sequence generated by the symmetric ADMM (Equation (5)) is not necessarily strictly contractive with respect to the solution set of Equation (1), whereas this property is ensured for the sequence generated by the ADMM (Equation (3)). Because of this deficiency, Bingsheng He et al. [15] proposed the following scheme:
$$\begin{aligned}
x^{k+1} &= \arg\min_{x \in X}\Big\{ f(x) - (\lambda^k)^T(Ax + By^k - b) + \frac{\beta}{2}\,\|Ax + By^k - b\|_2^2 \Big\},\\
\lambda^{k+\frac12} &= \lambda^k - \alpha\beta\,(Ax^{k+1} + By^k - b),\\
y^{k+1} &= \arg\min_{y \in Y}\Big\{ g(y) - (\lambda^{k+\frac12})^T(Ax^{k+1} + By - b) + \frac{\beta}{2}\,\|Ax^{k+1} + By - b\|_2^2 \Big\},\\
\lambda^{k+1} &= \lambda^{k+\frac12} - \alpha\beta\,(Ax^{k+1} + By^{k+1} - b),
\end{aligned} \tag{6}$$
where the parameter $\alpha \in (0, 1)$ shrinks the step sizes in Equation (5). The sequence generated by Equation (6) is strictly contractive with respect to the solution set of Equation (1); thus, it is called the strictly contractive symmetric version of the ADMM. Restricting $\alpha \in (0, 1)$ in the strictly contractive symmetric ADMM (Equation (6)) makes the updating of the Lagrange multipliers more conservative, with smaller steps, which is undesirable in practical applications. Instead, one seeks larger steps wherever possible to speed up numerical performance. For this purpose, Bingsheng He et al. [16] proposed the following scheme:
$$\begin{aligned}
x^{k+1} &= \arg\min_{x \in X}\Big\{ f(x) - (\lambda^k)^T(Ax + By^k - b) + \frac{\beta}{2}\,\|Ax + By^k - b\|_2^2 \Big\},\\
\lambda^{k+\frac12} &= \lambda^k - r\beta\,(Ax^{k+1} + By^k - b),\\
y^{k+1} &= \arg\min_{y \in Y}\Big\{ g(y) - (\lambda^{k+\frac12})^T(Ax^{k+1} + By - b) + \frac{\beta}{2}\,\|Ax^{k+1} + By - b\|_2^2 \Big\},\\
\lambda^{k+1} &= \lambda^{k+\frac12} - s\beta\,(Ax^{k+1} + By^{k+1} - b),
\end{aligned} \tag{7}$$
where r and s are limited to the following region
$$D = \Big\{ (s, r) \;\Big|\; s \in \Big(0, \tfrac{1+\sqrt{5}}{2}\Big),\ r \in (-1, 1),\ r + s > 0,\ |r| < 1 + s - s^2 \Big\}. \tag{8}$$
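As a quick illustration of the admissible region (8), the small Python helper below (written for this article, not part of any cited code) checks whether a pair (r, s) satisfies the four conditions; the sample values used in the call are arbitrary.

```python
import math

def in_region_D(r, s):
    """Check whether (r, s) lies in the region D of Equation (8):
    s in (0, (1+sqrt(5))/2), r in (-1, 1), r+s > 0, |r| < 1 + s - s**2."""
    golden = (1.0 + math.sqrt(5.0)) / 2.0
    return (0.0 < s < golden) and (-1.0 < r < 1.0) \
        and (r + s > 0.0) and (abs(r) < 1.0 + s - s * s)

print(in_region_D(0.5, 1.0))    # True
print(in_region_D(-0.5, 0.3))   # False (r + s <= 0)
```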
For further improvements of the ADMM algorithm of the symmetric alternating type, we refer the reader to [17,18,19,20,21,22,23,24,25,26].
In addition, the convergence speed of the ADMM algorithm is directly related to the choice of the penalty parameter $\beta$. Therefore, improvements of the ADMM algorithm that use a varying penalty parameter $\beta_k$ instead of the fixed penalty parameter $\beta$ have been proposed. For example, He, Yang, and Wang [27] proposed an adaptive penalty parameter scheme to solve linearly constrained monotone variational inequalities. In this paper, based on the ideas of references [16,27], we propose an alternating direction method of multipliers with an adaptive penalty parameter, relaxation factors, and a Lagrange multiplier that is updated twice at each iteration. The iteration format is shown in Algorithm 1.
Algorithm 1: Improved alternating direction method of multipliers for solving the convex optimization problem (1)
Given an error tolerance $\varepsilon > 0$, a control parameter $\mu \in (0, 1)$, and relaxation factors $(r, s) \in D$ defined in Equation (8). Choose a parameter sequence $\{\tau_k\}$ satisfying $\tau_k \ge 0$ and $\sum_{k=0}^{\infty} \tau_k < +\infty$. Given an initial penalty parameter $\beta_0 > 0$ and an initial approximation $(x^0, y^0, \lambda^0) \in X \times Y \times \mathbb{R}^m$. Set $k = 0$.
Step 1. Compute
$$x^{k+1} = \arg\min_{x \in X}\Big\{ f(x) - (\lambda^k)^T(Ax + By^k - b) + \frac{\beta_k}{2}\,\|Ax + By^k - b\|_2^2 \Big\}, \tag{9}$$
$$\lambda^{k+\frac12} = \lambda^k - r\beta_k\,(Ax^{k+1} + By^k - b), \tag{10}$$
$$y^{k+1} = \arg\min_{y \in Y}\Big\{ g(y) - (\lambda^{k+\frac12})^T(Ax^{k+1} + By - b) + \frac{\beta_k}{2}\,\|Ax^{k+1} + By - b\|_2^2 \Big\}, \tag{11}$$
$$\lambda^{k+1} = \lambda^{k+\frac12} - s\beta_k\,(Ax^{k+1} + By^{k+1} - b). \tag{12}$$
Step 2. If $\max\big\{ \|Ax^{k+1} + By^{k+1} - b\|_2,\ \|B(y^k - y^{k+1})\|_2 \big\} \le \varepsilon$, stop. Otherwise, go to Step 3.
Step 3. Updating penalty parameter
$$\beta_{k+1} = \begin{cases} (1+\tau_k)\,\beta_k, & \text{if } \big\|x^k - P_X\big[x^k - \big(\nabla f(x^k) - A^T\lambda^k\big)\big]\big\|_2 < \mu\,\|Ax^k + By^k - b\|_2,\\[2pt] \beta_k/(1+\tau_k), & \text{if } \mu\,\big\|x^k - P_X\big[x^k - \big(\nabla f(x^k) - A^T\lambda^k\big)\big]\big\|_2 > \|Ax^k + By^k - b\|_2,\\[2pt] \beta_k, & \text{otherwise}. \end{cases} \tag{13}$$
Step 4. Set k = k + 1 , and go to Step 1.
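The following Python snippet sketches the outer loop of Algorithm 1. It is a minimal illustration written for this article (not the authors' MATLAB code): the subproblem solvers argmin_x and argmin_y (e.g., implementations of Algorithm 2 described next), the gradient grad_f, the projection proj_X, and all default parameter values are assumed placeholders, and the adaptive rule (13) is evaluated at the most recent iterates.

```python
import numpy as np

def improved_admm(argmin_x, argmin_y, grad_f, proj_X, A, B, b,
                  r=0.8, s=1.17, mu=0.5, beta0=1.0, tau=0.3, k_max=50,
                  x0=None, y0=None, lam0=None, eps=1e-6, max_iter=1000):
    """Illustrative sketch of Algorithm 1 (adaptive penalty, two multiplier updates)."""
    x = np.zeros(A.shape[1]) if x0 is None else x0
    y = np.zeros(B.shape[1]) if y0 is None else y0
    lam = np.zeros(b.shape[0]) if lam0 is None else lam0
    beta = beta0
    for k in range(max_iter):
        # Step 1: the updates (9)-(12)
        x = argmin_x(lam, y, beta)
        lam_half = lam - r * beta * (A @ x + B @ y - b)
        y_new = argmin_y(lam_half, x, beta)
        res = A @ x + B @ y_new - b
        lam = lam_half - s * beta * res
        # Step 2: stopping criterion
        if max(np.linalg.norm(res), np.linalg.norm(B @ (y - y_new))) <= eps:
            y = y_new
            break
        y = y_new
        # Step 3: adaptive penalty update, Equation (13)
        tau_k = tau if k <= k_max else 0.0
        dual_gap = np.linalg.norm(x - proj_X(x - (grad_f(x) - A.T @ lam)))
        primal_gap = np.linalg.norm(A @ x + B @ y - b)
        if dual_gap < mu * primal_gap:
            beta = (1.0 + tau_k) * beta        # push primal feasibility harder
        elif mu * dual_gap > primal_gap:
            beta = beta / (1.0 + tau_k)        # relax the penalty
    return x, y, lam
```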
We choose the algorithm described in [28] with the necessary modifications to solve the x-subproblem and y-subproblem in Algorithm 1. The iteration method to solve the x-subproblem can be described as follows in Algorithm 2.
Algorithm 2: Algorithm to solve x-subproblem (9) or y-subproblem (11)
Given constants $\theta \in (0, 1)$, $\nu \in (0, 1)$, $l \in (1, \infty)$, an error tolerance $\varepsilon$, and an initial approximation $x^0 \in X$. Set $\eta_0 = 0.9$ and $k = 0$.
Step 1. Compute $\bar{x}^{k+1} = P_X\big[x^k - \eta_k F(x^k)\big]$, where $F(x) = \nabla f(x) - A^T\lambda^k + \beta_k A^T(Ax + By^k - b)$.
Step 2. If $\psi_k = \eta_k \|F(x^k) - F(\bar{x}^{k+1})\|_2 / \|x^k - \bar{x}^{k+1}\|_2 \le \nu$, compute
$$x^{k+1} = P_X\big[x^k - \rho_k F(\bar{x}^{k+1})\big],$$
where $\rho_k = l\,\eta_k\, e_k^T d_k / \|d_k\|_2^2$, $e_k = x^k - \bar{x}^{k+1}$, and $d_k = e_k - \eta_k\big[F(x^k) - F(\bar{x}^{k+1})\big]$.
If $\|x^{k+1} - P_X\big[x^{k+1} - F(x^{k+1})\big]\|_2 \le \varepsilon$, stop (in this case, $x^{k+1}$ is an approximate solution of the x-subproblem in Algorithm 1). Otherwise, define
$$\eta_{k+1} = \begin{cases} \tfrac{3}{2}\eta_k, & \text{if } \psi_k \le \theta,\\ \eta_k, & \text{otherwise}, \end{cases}$$
set $k = k + 1$, and go to Step 1.
If $\psi_k > \nu$, go to Step 3.
Step 3. Set $\eta_k \leftarrow \tfrac{2}{3}\eta_k \min\{1, 1/\psi_k\}$ and $\bar{x}^{k+1} = P_X\big[x^k - \eta_k F(x^k)\big]$. Go to Step 2.
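A minimal Python sketch of Algorithm 2 is given below. It is an illustrative projection-type solver in the spirit of [28], not the authors' implementation: the mapping F, the projection proj, and the safeguards (the small constants guarding divisions) are assumptions made for this sketch.

```python
import numpy as np

def solve_subproblem(F, proj, x0, theta=0.1, nu=0.8, ell=1.8,
                     eps=1e-8, eta0=0.9, max_iter=1000):
    """Sketch of Algorithm 2 for the x- or y-subproblem.

    F(x): subproblem mapping, e.g. grad_f(x) - A.T @ lam + beta * A.T @ (A @ x + B @ y - b);
    proj(v): projection onto the feasible set X (or Y).
    """
    x, eta = x0.copy(), eta0
    for _ in range(max_iter):
        x_bar = proj(x - eta * F(x))                      # Step 1 (prediction)
        while True:                                       # Step 3: shrink eta until psi <= nu
            diff = F(x) - F(x_bar)
            psi = eta * np.linalg.norm(diff) / max(np.linalg.norm(x - x_bar), 1e-16)
            if psi <= nu:
                break
            eta *= (2.0 / 3.0) * min(1.0, 1.0 / psi)
            x_bar = proj(x - eta * F(x))
        e = x - x_bar                                     # Step 2 (correction)
        d = e - eta * (F(x) - F(x_bar))
        rho = ell * eta * (e @ d) / max(d @ d, 1e-16)
        x = proj(x - rho * F(x_bar))
        if np.linalg.norm(x - proj(x - F(x))) <= eps:     # stopping test
            return x
        if psi <= theta:                                  # enlarge the step size
            eta *= 1.5
    return x
```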
The rest of this paper is organized as follows: In Section 2, the global convergence of the proposed algorithm is proven. In Section 3, some numerical comparison experiments with existing algorithms are given. Finally, we draw some conclusions.

2. Convergence of Algorithm 1

To conveniently prove the global convergence of Algorithm 1, we first give the following Lemmas 1–3.
Lemma 1.
([29]). Let $\Omega$ be a nonempty closed convex subset of $\mathbb{R}^n$, and let $F(u)$ be a continuously differentiable convex function on $\Omega$; then, a necessary and sufficient condition for $u^*$ to be a solution of the optimization problem
$$\min_{u \in \Omega} F(u)$$
is that $u^*$ is a solution of the variational inequality
$$(u - u^*)^T\,\nabla F(u^*) \ge 0, \quad \forall\, u \in \Omega. \tag{14}$$
Lemma 2.
([26,27]). A necessary and sufficient condition for the vector $u^*$ to be a solution of a variational inequality of the form (14), with mapping $F$, is that $u^*$ is also a solution of the projection equation
$$u = P_{\Omega}\big[u - F(u)\big],$$
where $P_{\Omega}(u)$ is the projection of $u$ onto $\Omega$; that is, $P_{\Omega}(u) = \arg\min_{y \in \Omega} \|y - u\|_2$.
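For the sets used later in the numerical experiments (a Euclidean ball and a box), the projection operator of Lemma 2 has a closed form. The following Python helpers are a small illustration written for this article; the default radius and bounds match the experimental setup of Section 3.1 but are otherwise arbitrary.

```python
import numpy as np

def proj_ball(u, radius=1.0, center=None):
    """Projection onto the Euclidean ball {x : ||x - center||_2 <= radius}."""
    c = np.zeros_like(u) if center is None else center
    d = u - c
    norm = np.linalg.norm(d)
    return u if norm <= radius else c + radius * d / norm

def proj_box(u, lower=0.0, upper=5.0):
    """Projection onto the box {x : lower <= x_i <= upper for all i}."""
    return np.clip(u, lower, upper)
```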
Lemma 3.
([30]). Let $\Omega$ be a nonempty closed convex subset of $\mathbb{R}^n$; then, we have
$$\|P_{\Omega}(v) - P_{\Omega}(u)\|_2 \le \|v - u\|_2, \quad \forall\, v, u \in \mathbb{R}^n.$$
Let
$$w = \begin{pmatrix} x \\ y \\ \lambda \end{pmatrix}, \qquad Q(w) = \begin{pmatrix} \nabla f(x) - A^T\lambda \\ \nabla g(y) - B^T\lambda \\ Ax + By - b \end{pmatrix}, \qquad \Omega = X \times Y \times \mathbb{R}^m;$$
then, the problem (Equation (1)) is equivalent to finding $w^* \in \Omega$ such that
$$(w - w^*)^T\, Q(w^*) \ge 0, \quad \forall\, w = (x, y, \lambda) \in \Omega. \tag{15}$$
It is known from Lemma 2 that w * is a solution of the variational inequality (Equation (15)) if and only if w * is the zero point of
$$e(w) := \begin{pmatrix} e_x(w) \\ e_y(w) \\ e_\lambda(w) \end{pmatrix} = \begin{pmatrix} x - P_X\big[x - \big(\nabla f(x) - A^T\lambda\big)\big] \\ y - P_Y\big[y - \big(\nabla g(y) - B^T\lambda\big)\big] \\ Ax + By - b \end{pmatrix}.$$
Lemma 4.
Assume that the sequence $\{w^k\}$ is generated by Algorithm 1; then, a necessary and sufficient condition for $w^{k+1}$ to be a solution of the variational inequality (Equation (15)) is
$$\|Ax^{k+1} + By^{k+1} - b\|_2 = \|B(y^k - y^{k+1})\|_2 = 0. \tag{16}$$
Proof. 
It follows from Lemma 1 that solving Equations (9) and (11) is equivalent to finding $x^{k+1} \in X$ and $y^{k+1} \in Y$ such that
$$(x - x^{k+1})^T\Big\{\nabla f(x^{k+1}) - A^T\big[\lambda^k - \beta_k(Ax^{k+1} + By^k - b)\big]\Big\} \ge 0, \quad \forall\, x \in X, \tag{17}$$
$$(y - y^{k+1})^T\Big\{\nabla g(y^{k+1}) - B^T\big[\lambda^{k+\frac12} - \beta_k(Ax^{k+1} + By^{k+1} - b)\big]\Big\} \ge 0, \quad \forall\, y \in Y. \tag{18}$$
Thus, it follows from Lemma 2 that
$$x^{k+1} = P_X\Big\{x^{k+1} - \Big[\nabla f(x^{k+1}) - A^T\big(\lambda^k - \beta_k(Ax^{k+1} + By^k - b)\big)\Big]\Big\},$$
$$y^{k+1} = P_Y\Big\{y^{k+1} - \Big[\nabla g(y^{k+1}) - B^T\big(\lambda^{k+\frac12} - \beta_k(Ax^{k+1} + By^{k+1} - b)\big)\Big]\Big\}.$$
Noting that Equations (10) and (12) hold, we have
$$\lambda^{k+1} = \lambda^k - r\beta_k\,(Ax^{k+1} + By^k - b) - s\beta_k\,(Ax^{k+1} + By^{k+1} - b).$$
Therefore, by Lemma 3, we have
$$\begin{aligned}
\|e(w^{k+1})\|_2^2 &= \left\|\begin{pmatrix} x^{k+1} - P_X\big[x^{k+1} - \big(\nabla f(x^{k+1}) - A^T\lambda^{k+1}\big)\big] \\ y^{k+1} - P_Y\big[y^{k+1} - \big(\nabla g(y^{k+1}) - B^T\lambda^{k+1}\big)\big] \\ Ax^{k+1} + By^{k+1} - b \end{pmatrix}\right\|_2^2
\le \left\|\begin{pmatrix} A^T\big[(\lambda^k - \lambda^{k+1}) - \beta_k(Ax^{k+1} + By^k - b)\big] \\ B^T\big[(\lambda^{k+\frac12} - \lambda^{k+1}) - \beta_k(Ax^{k+1} + By^{k+1} - b)\big] \\ Ax^{k+1} + By^{k+1} - b \end{pmatrix}\right\|_2^2 \\
&= \left\|\begin{pmatrix} A^T\big[(r+s-1)\beta_k(Ax^{k+1} + By^{k+1} - b) - (1-r)\beta_k B(y^k - y^{k+1})\big] \\ (s-1)\beta_k\, B^T(Ax^{k+1} + By^{k+1} - b) \\ Ax^{k+1} + By^{k+1} - b \end{pmatrix}\right\|_2^2 \\
&\le \Big[2(r+s-1)^2\beta_k^2\|A\|_F^2 + (s-1)^2\beta_k^2\|B\|_F^2 + 1\Big]\,\|Ax^{k+1} + By^{k+1} - b\|_2^2 + 2(1-r)^2\beta_k^2\|A\|_F^2\,\|B(y^k - y^{k+1})\|_2^2.
\end{aligned}$$
From the above inequality, and noting that $r$ and $s$ are fixed constants and $\{\beta_k\}$ is a bounded sequence of positive numbers, we know that $w^{k+1}$ is a solution of the variational inequality (Equation (15)) if and only if Equation (16) holds. □
Lemma 5.
Assume that the sequence $\{w^k\}$ is generated by Algorithm 1 and $w^*$ is a solution of the variational inequality (Equation (15)); then, we have
$$(\lambda^k - \lambda^*)^T(Ax^{k+1} + By^{k+1} - b) \ge \beta_k\,(Ax^{k+1} + By^k - b)^T(Ax^{k+1} - Ax^*) + r\beta_k\,(Ax^{k+1} + By^k - b)^T(By^{k+1} - By^*) + \beta_k\,(Ax^{k+1} + By^{k+1} - b)^T(By^{k+1} - By^*). \tag{19}$$
Proof. 
Since $w^*$ is a solution of the variational inequality (Equation (15)), we have $Ax^* + By^* - b = 0$ and
$$(x^{k+1} - x^*)^T\big[\nabla f(x^*) - A^T\lambda^*\big] \ge 0, \tag{20}$$
$$(y^{k+1} - y^*)^T\big[\nabla g(y^*) - B^T\lambda^*\big] \ge 0. \tag{21}$$
On the other hand, setting $x = x^*$ in Equation (17) and $y = y^*$ in Equation (18), we have
$$(x^* - x^{k+1})^T\Big\{\nabla f(x^{k+1}) - A^T\big[\lambda^k - \beta_k(Ax^{k+1} + By^k - b)\big]\Big\} \ge 0, \tag{22}$$
$$(y^* - y^{k+1})^T\Big\{\nabla g(y^{k+1}) - B^T\big[\lambda^{k+\frac12} - \beta_k(Ax^{k+1} + By^{k+1} - b)\big]\Big\} \ge 0. \tag{23}$$
Noting that the gradient of a convex function is a monotone mapping, we have from Equations (20) and (22) that
$$(x^{k+1} - x^*)^T A^T\big[(\lambda^k - \lambda^*) - \beta_k(Ax^{k+1} + By^k - b)\big] \ge 0. \tag{24}$$
Similarly, we have from Equations (21) and (23) that
$$(y^{k+1} - y^*)^T B^T\big[(\lambda^{k+\frac12} - \lambda^*) - \beta_k(Ax^{k+1} + By^{k+1} - b)\big] \ge 0. \tag{25}$$
Noting that $Ax^* + By^* - b = 0$, we know from Equation (10) and Equations (24) and (25) that Equation (19) holds. □
Lemma 6.
For $k \ge 1$, we have
$$\beta_k\,(Ax^{k+1} + By^{k+1} - b)^T B(y^k - y^{k+1}) \ge \frac{1-s}{1+r}\,\beta_{k-1}\,(Ax^k + By^k - b)^T B(y^k - y^{k+1}) - \frac{r}{1+r}\,\beta_k\,\|B(y^k - y^{k+1})\|_2^2. \tag{26}$$
Proof. 
Setting $y = y^k$ in Equation (18), we have
$$(y^k - y^{k+1})^T\Big\{\nabla g(y^{k+1}) - B^T\big[\lambda^{k+\frac12} - \beta_k(Ax^{k+1} + By^{k+1} - b)\big]\Big\} \ge 0, \tag{27}$$
and, on the other hand, by setting $k = k - 1$ and $y = y^{k+1}$ in Equation (18), we have
$$(y^{k+1} - y^k)^T\Big\{\nabla g(y^k) - B^T\big[\lambda^{k-\frac12} - \beta_{k-1}(Ax^k + By^k - b)\big]\Big\} \ge 0. \tag{28}$$
Noting that Equation (12) holds (with $k$ replaced by $k-1$), we have
$$\lambda^k = \lambda^{k-\frac12} - s\beta_{k-1}\,(Ax^k + By^k - b). \tag{29}$$
Combining Equation (10) and Equation (29), we obtain
$$\lambda^{k+\frac12} = \lambda^{k-\frac12} - r\beta_k\,(Ax^{k+1} + By^k - b) - s\beta_{k-1}\,(Ax^k + By^k - b). \tag{30}$$
From Equations (27), (28), and (30) and the monotonicity of the gradient of a convex function, we know that Equation (26) holds. □
Let
$$H_k = \begin{pmatrix} (r+s-rs)\,\beta_k^2\, B^T B & -r\beta_k B^T \\ -r\beta_k B & I_m \end{pmatrix}, \qquad (r, s) \in D;$$
then $H_k$ is a symmetric positive semi-definite matrix, and, defining
$$\|x\|_{H_k}^2 = x^T H_k x,$$
we have the following Lemma 7.
Lemma 7.
Assume that the sequence $\{w^k\}$ is generated by Algorithm 1 and $w^*$ is a solution of the variational inequality (Equation (15)); then, we have
$$\|v^{k+1} - v^*\|_{H_k}^2 \le \|v^k - v^*\|_{H_k}^2 - \varphi(w^k), \tag{31}$$
where $v^k = \begin{pmatrix} y^k \\ \lambda^k \end{pmatrix}$, $v^* = \begin{pmatrix} y^* \\ \lambda^* \end{pmatrix}$, and
$$\varphi(w^k) = (r+s)(2-r-s)\,\beta_k^2\,\|Ax^{k+1} + By^{k+1} - b\|_2^2 + \frac{(1-r)^2(r+s)}{1+r}\,\beta_k^2\,\|B(y^k - y^{k+1})\|_2^2 + \frac{2(1-r)(1-s)(r+s)}{1+r}\,\beta_{k-1}\beta_k\,(Ax^k + By^k - b)^T B(y^k - y^{k+1}). \tag{32}$$
Proof. 
By Equations (10) and (12) and the definitions of $H_k$ and $\|\cdot\|_{H_k}$, we have
$$\begin{aligned}
\left\|\begin{pmatrix} y^{k+1} - y^k \\ \lambda^{k+1} - \lambda^k \end{pmatrix}\right\|_{H_k}^2
&= \left\|\begin{pmatrix} y^k - y^{k+1} \\ (r+s)\beta_k(Ax^{k+1} + By^{k+1} - b) + r\beta_k B(y^k - y^{k+1}) \end{pmatrix}\right\|_{H_k}^2 \\
&= (r+s-rs)\beta_k^2\,\|B(y^k - y^{k+1})\|_2^2 + \big\|(r+s)\beta_k(Ax^{k+1} + By^{k+1} - b) + r\beta_k B(y^k - y^{k+1})\big\|_2^2 \\
&\quad - 2r(r+s)\beta_k^2\,(Ax^{k+1} + By^{k+1} - b)^T B(y^k - y^{k+1}) - 2r^2\beta_k^2\,\|B(y^k - y^{k+1})\|_2^2 \\
&= (r+s)^2\beta_k^2\,\|Ax^{k+1} + By^{k+1} - b\|_2^2 + (1-r)(r+s)\beta_k^2\,\|B(y^k - y^{k+1})\|_2^2.
\end{aligned}$$
By the definition of $H_k$, Lemmas 5 and 6, and noting that $Ax^* + By^* - b = 0$, we have
$$\begin{aligned}
2\begin{pmatrix} y^k - y^* \\ \lambda^k - \lambda^* \end{pmatrix}^T H_k \begin{pmatrix} y^{k+1} - y^k \\ \lambda^{k+1} - \lambda^k \end{pmatrix}
&= -2(r+s-rs)\beta_k^2\,(By^k - By^*)^T B(y^k - y^{k+1}) + 2r(r+s)\beta_k^2\,(By^k - By^*)^T(Ax^{k+1} + By^{k+1} - b) \\
&\quad + 2r^2\beta_k^2\,(By^k - By^*)^T B(y^k - y^{k+1}) - 2(r+s)\beta_k\,(\lambda^k - \lambda^*)^T(Ax^{k+1} + By^{k+1} - b) \\
&= -2(1-r)(r+s)\beta_k^2\,(By^{k+1} - By^*)^T B(y^k - y^{k+1}) - 2(1-r)(r+s)\beta_k^2\,\|B(y^k - y^{k+1})\|_2^2 \\
&\quad + 2r(r+s)\beta_k^2\,(By^k - By^*)^T(Ax^{k+1} + By^{k+1} - b) - 2(r+s)\beta_k\,(\lambda^k - \lambda^*)^T(Ax^{k+1} + By^{k+1} - b) \\
&\le -2(1-r)(r+s)\beta_k^2\,(By^{k+1} - By^*)^T B(y^k - y^{k+1}) - 2(1-r)(r+s)\beta_k^2\,\|B(y^k - y^{k+1})\|_2^2 \\
&\quad + 2r(r+s)\beta_k^2\,(By^k - By^*)^T(Ax^{k+1} + By^{k+1} - b) - 2r(r+s)\beta_k^2\,(Ax^{k+1} + By^k - b)^T(By^{k+1} - By^*) \\
&\quad - 2(r+s)\beta_k^2\,(Ax^{k+1} + By^k - b)^T(Ax^{k+1} - Ax^*) - 2(r+s)\beta_k^2\,(Ax^{k+1} + By^{k+1} - b)^T(By^{k+1} - By^*) \\
&= -2(1-r)(r+s)\beta_k^2\,\|B(y^k - y^{k+1})\|_2^2 - 2(r+s)\beta_k^2\,\|Ax^{k+1} + By^{k+1} - b\|_2^2 \\
&\quad - 2(r+s)\beta_k^2\,\big(B(y^k - y^{k+1})\big)^T(By^{k+1} - By^*) - 2(r+s)\beta_k^2\,\big(B(y^k - y^{k+1})\big)^T(Ax^{k+1} - Ax^*) \\
&\quad + 2r(r+s)\beta_k^2\,(Ax^{k+1} + By^{k+1} - b)^T B(y^k - y^{k+1}) \\
&= -2(1-r)(r+s)\beta_k^2\,\|B(y^k - y^{k+1})\|_2^2 - 2(r+s)\beta_k^2\,\|Ax^{k+1} + By^{k+1} - b\|_2^2 \\
&\quad - 2(1-r)(r+s)\beta_k^2\,(Ax^{k+1} + By^{k+1} - b)^T B(y^k - y^{k+1}) \\
&\le -2(r+s)\beta_k^2\,\|Ax^{k+1} + By^{k+1} - b\|_2^2 - \frac{2(1-r)(r+s)}{1+r}\,\beta_k^2\,\|B(y^k - y^{k+1})\|_2^2 \\
&\quad - \frac{2(1-r)(1-s)(r+s)}{1+r}\,\beta_{k-1}\beta_k\,(Ax^k + By^k - b)^T B(y^k - y^{k+1}).
\end{aligned}$$
Therefore, we have
$$\begin{aligned}
\|v^{k+1} - v^*\|_{H_k}^2 &= \|v^k - v^*\|_{H_k}^2 + \left\|\begin{pmatrix} y^{k+1} - y^k \\ \lambda^{k+1} - \lambda^k \end{pmatrix}\right\|_{H_k}^2 + 2\begin{pmatrix} y^k - y^* \\ \lambda^k - \lambda^* \end{pmatrix}^T H_k \begin{pmatrix} y^{k+1} - y^k \\ \lambda^{k+1} - \lambda^k \end{pmatrix} \\
&\le \|v^k - v^*\|_{H_k}^2 - (r+s)(2-r-s)\,\beta_k^2\,\|Ax^{k+1} + By^{k+1} - b\|_2^2 - \frac{(1-r)^2(r+s)}{1+r}\,\beta_k^2\,\|B(y^k - y^{k+1})\|_2^2 \\
&\quad - \frac{2(1-r)(1-s)(r+s)}{1+r}\,\beta_{k-1}\beta_k\,(Ax^k + By^k - b)^T B(y^k - y^{k+1}) \\
&= \|v^k - v^*\|_{H_k}^2 - \varphi(w^k).
\end{aligned}$$
Hence, the inequality Equation (31) holds. □
To prove the global convergence of Algorithm 1, we divide the domain $D$ defined in Equation (8) into the following five parts:
$$\begin{aligned}
D_1 &= \big\{(r, s) \mid r \in (-1, 1),\ s \in (0, 1),\ r + s > 0\big\}, \\
D_2 &= \big\{(r, s) \mid r \in (-1, 1),\ s = 1\big\}, \\
D_3 &= \big\{(r, s) \mid r \in (-1, 0),\ s \in \big(1, \tfrac{1+\sqrt{5}}{2}\big),\ -r < 1 + s - s^2\big\}, \\
D_4 &= \big\{(r, s) \mid r = 0,\ s \in \big(1, \tfrac{1+\sqrt{5}}{2}\big)\big\}, \\
D_5 &= \big\{(r, s) \mid r \in (0, 1),\ s \in \big(1, \tfrac{1+\sqrt{5}}{2}\big),\ r < 1 + s - s^2\big\}.
\end{aligned}$$
Obviously,
$$D = \bigcup_{n=1}^{5} D_n \quad \text{and} \quad D_i \cap D_j = \emptyset, \qquad i, j \in \{1, 2, 3, 4, 5\},\ i \ne j.$$
Lemma 8.
Assume that the sequence $\{w^k\}$ is generated by Algorithm 1; then, for any $(r, s) \in D$, there exist constants $C_0, C_1, C_2 > 0$ and $\xi$ such that $\varphi(w^k)$ defined in Equation (32) satisfies
$$\varphi(w^k) \ge \xi\, C_0\,\Big[\beta_k^2\,\|Ax^{k+1} + By^{k+1} - b\|_2^2 - \beta_{k-1}^2\,\|Ax^k + By^k - b\|_2^2\Big] + C_1\,\beta_k^2\,\|Ax^{k+1} + By^{k+1} - b\|_2^2 + C_2\,\beta_k^2\,\|B(y^k - y^{k+1})\|_2^2, \tag{33}$$
where $\xi = \begin{cases} 0, & (r, s) \in D_2, \\ 1, & (r, s) \in D \setminus D_2. \end{cases}$
Proof. 
In the case $(r, s) \in D_1$, we have by the Cauchy–Schwarz inequality, applied to the last term of $\varphi(w^k)$ in Lemma 7, that
$$\frac{2(1-r)(1-s)(r+s)}{1+r}\,\beta_{k-1}\beta_k\,(Ax^k + By^k - b)^T B(y^k - y^{k+1}) \ge -(1-s)(r+s)\,\beta_{k-1}^2\,\|Ax^k + By^k - b\|_2^2 - \frac{(1-r)^2(1-s)(r+s)}{(1+r)^2}\,\beta_k^2\,\|B(y^k - y^{k+1})\|_2^2.$$
Thus, we have
$$\begin{aligned}
\varphi(w^k) &\ge (r+s)(2-r-s)\,\beta_k^2\,\|Ax^{k+1}+By^{k+1}-b\|_2^2 + \frac{(1-r)^2(r+s)}{1+r}\,\beta_k^2\,\|B(y^k-y^{k+1})\|_2^2 \\
&\quad - (1-s)(r+s)\,\beta_{k-1}^2\,\|Ax^k+By^k-b\|_2^2 - \frac{(1-r)^2(1-s)(r+s)}{(1+r)^2}\,\beta_k^2\,\|B(y^k-y^{k+1})\|_2^2 \\
&= (1-s)(r+s)\Big[\beta_k^2\,\|Ax^{k+1}+By^{k+1}-b\|_2^2 - \beta_{k-1}^2\,\|Ax^k+By^k-b\|_2^2\Big] + (1-r)(r+s)\,\beta_k^2\,\|Ax^{k+1}+By^{k+1}-b\|_2^2 \\
&\quad + \Big(\frac{(1-r)(r+s)}{1+r}\Big)^2\,\beta_k^2\,\|B(y^k-y^{k+1})\|_2^2.
\end{aligned}$$
Taking $C_0$, $C_1$, and $C_2$ in Equation (33) as
$$C_0 = (r+s)(1-s) > 0, \qquad C_1 = (r+s)(1-r) > 0, \qquad C_2 = \Big(\frac{(1-r)(r+s)}{1+r}\Big)^2 > 0,$$
then the inequality Equation (33) holds.
In the case $(r, s) \in D_2$, that is, $r \in (-1, 1)$ and $s = 1$, we have by Lemma 7 that
$$\varphi(w^k) = (1+r)(1-r)\,\beta_k^2\,\|Ax^{k+1}+By^{k+1}-b\|_2^2 + (1-r)^2\,\beta_k^2\,\|B(y^k-y^{k+1})\|_2^2.$$
Letting $C_1 = (1+r)(1-r) > 0$ and $C_2 = (1-r)^2 > 0$, we have
$$\varphi(w^k) \ge C_1\,\beta_k^2\,\|Ax^{k+1}+By^{k+1}-b\|_2^2 + C_2\,\beta_k^2\,\|B(y^k-y^{k+1})\|_2^2.$$
In the case $(r, s) \in D_3$, we have by the Cauchy–Schwarz inequality, applied to the last term of $\varphi(w^k)$ in Lemma 7, that
$$\frac{2(1-r)(1-s)(r+s)}{1+r}\,\beta_{k-1}\beta_k\,(Ax^k + By^k - b)^T B(y^k - y^{k+1}) \ge -(r+s)\big[T_1-(r+s)\big]\,\beta_{k-1}^2\,\|Ax^k+By^k-b\|_2^2 - \frac{(1-r)^2(1-s)^2(r+s)}{(1+r)^2\big[T_1-(r+s)\big]}\,\beta_k^2\,\|B(y^k-y^{k+1})\|_2^2,$$
where $T_1 = r + s + \frac{(s^2-s)(2-s)}{1+r} > 0$. Thus, we have
$$\begin{aligned}
\varphi(w^k) &\ge (r+s)\big[T_1-(r+s)\big]\Big[\beta_k^2\,\|Ax^{k+1}+By^{k+1}-b\|_2^2 - \beta_{k-1}^2\,\|Ax^k+By^k-b\|_2^2\Big] + (r+s)(2-T_1)\,\beta_k^2\,\|Ax^{k+1}+By^{k+1}-b\|_2^2 \\
&\quad + \Big[\frac{(1-r)^2(r+s)}{1+r} - \frac{(1-r)^2(1-s)^2(r+s)}{(1+r)^2[T_1-(r+s)]}\Big]\,\beta_k^2\,\|B(y^k-y^{k+1})\|_2^2.
\end{aligned}$$
Taking $C_0$, $C_1$, and $C_2$ in Equation (33) as
$$C_0 = (r+s)\big[T_1-(r+s)\big] = \frac{(r+s)(s^2-s)(2-s)}{1+r} > 0, \qquad C_1 = (r+s)(2-T_1) > -r(r+s) > 0, \qquad C_2 = \frac{(1-r)^2(r+s)}{1+r} - \frac{(1-r)^2(1-s)^2(r+s)}{(1+r)^2[T_1-(r+s)]} = \frac{(1-r)^2(r+s)(1+s-s^2)}{s(1+r)(2-s)} > 0,$$
then the inequality Equation (33) holds.
In the case $(r, s) \in D_4$ (that is, $r = 0$), we have by the Cauchy–Schwarz inequality, applied to the last term of $\varphi(w^k)$ in Lemma 7, that
$$2s(1-s)\,\beta_{k-1}\beta_k\,(Ax^k + By^k - b)^T B(y^k - y^{k+1}) \ge -s(T_2-s)\,\beta_{k-1}^2\,\|Ax^k+By^k-b\|_2^2 - \frac{s(1-s)^2}{T_2-s}\,\beta_k^2\,\|B(y^k-y^{k+1})\|_2^2,$$
where $T_2 = \frac{1}{3}(s^2 - s + 5)$. Thus, we have
$$\varphi(w^k) \ge s(T_2-s)\Big[\beta_k^2\,\|Ax^{k+1}+By^{k+1}-b\|_2^2 - \beta_{k-1}^2\,\|Ax^k+By^k-b\|_2^2\Big] + s(2-T_2)\,\beta_k^2\,\|Ax^{k+1}+By^{k+1}-b\|_2^2 + s\Big[1-\frac{(1-s)^2}{T_2-s}\Big]\,\beta_k^2\,\|B(y^k-y^{k+1})\|_2^2.$$
Taking $C_0$, $C_1$, and $C_2$ in Equation (33) as
$$C_0 = s(T_2-s) = \frac{1}{3}s(s^2-4s+5) > 0, \qquad C_1 = s(2-T_2) = \frac{1}{3}s(1+s-s^2) > 0, \qquad C_2 = s\Big[1-\frac{(1-s)^2}{T_2-s}\Big] = \frac{2s(1+s-s^2)}{1+(s-2)^2} > 0,$$
then the inequality Equation (33) holds.
In the case $(r, s) \in D_5$, we have by the Cauchy–Schwarz inequality, applied to the last term of $\varphi(w^k)$ in Lemma 7, that
$$\frac{2(1-r)(1-s)(r+s)}{1+r}\,\beta_{k-1}\beta_k\,(Ax^k + By^k - b)^T B(y^k - y^{k+1}) \ge -(r+s)\big[T_3-(r+s)\big]\,\beta_{k-1}^2\,\|Ax^k+By^k-b\|_2^2 - \frac{(1-r)^2(1-s)^2(r+s)}{(1+r)^2\big[T_3-(r+s)\big]}\,\beta_k^2\,\|B(y^k-y^{k+1})\|_2^2,$$
where $T_3 = r + s + (1-s)^2$. Thus, we have
$$\begin{aligned}
\varphi(w^k) &\ge (r+s)\big[T_3-(r+s)\big]\Big[\beta_k^2\,\|Ax^{k+1}+By^{k+1}-b\|_2^2 - \beta_{k-1}^2\,\|Ax^k+By^k-b\|_2^2\Big] + (r+s)(2-T_3)\,\beta_k^2\,\|Ax^{k+1}+By^{k+1}-b\|_2^2 \\
&\quad + \Big[\frac{(1-r)^2(r+s)}{1+r} - \frac{(1-r)^2(1-s)^2(r+s)}{(1+r)^2[T_3-(r+s)]}\Big]\,\beta_k^2\,\|B(y^k-y^{k+1})\|_2^2.
\end{aligned}$$
Taking $C_0$, $C_1$, and $C_2$ in Equation (33) as
$$C_0 = (r+s)\big[T_3-(r+s)\big] = (r+s)(1-s)^2 > 0, \qquad C_1 = (r+s)(2-T_3) > 0, \qquad C_2 = \frac{(1-r)^2(r+s)}{1+r} - \frac{(1-r)^2(1-s)^2(r+s)}{(1+r)^2[T_3-(r+s)]} = \frac{r(1-r)^2(r+s)}{(1+r)^2} > 0,$$
then the inequality Equation (33) holds. □
Lemma 9.
Assume that the sequence $\{w^k\}$ is generated by Algorithm 1; then, there exist constants $t_1, t_2 \in \big(0, \frac{(1-r)(r+s)}{r+s-rs}\big)$ such that
$$t_1\,(r+s-rs)\,\beta_{k-1}^2\,\|B(y^k - y^*)\|_2^2 \le \|v^k - v^*\|_{H_{k-1}}^2 \quad \text{and} \quad t_2\,\|\lambda^k - \lambda^*\|_2^2 \le \|v^k - v^*\|_{H_{k-1}}^2,$$
where $(r, s) \in D$.
Proof. 
By the definition of $H_k$ and $\|\cdot\|_{H_k}$, we have
$$\|v^k - v^*\|_{H_{k-1}}^2 - t_1\,(r+s-rs)\,\beta_{k-1}^2\,\|B(y^k - y^*)\|_2^2 = \begin{pmatrix} y^k - y^* \\ \lambda^k - \lambda^* \end{pmatrix}^T \begin{pmatrix} (1-t_1)(r+s-rs)\,\beta_{k-1}^2\,B^TB & -r\beta_{k-1}B^T \\ -r\beta_{k-1}B & I_m \end{pmatrix} \begin{pmatrix} y^k - y^* \\ \lambda^k - \lambda^* \end{pmatrix}.$$
Since $t_1 \in \big(0, \frac{(1-r)(r+s)}{r+s-rs}\big)$, the matrix
$$\begin{pmatrix} (1-t_1)(r+s-rs)\,\beta_{k-1}^2\,B^TB & -r\beta_{k-1}B^T \\ -r\beta_{k-1}B & I_m \end{pmatrix}$$
is positive semi-definite, and thus $\|v^k - v^*\|_{H_{k-1}}^2 - t_1(r+s-rs)\beta_{k-1}^2\|B(y^k - y^*)\|_2^2 \ge 0$. Analogously, we have $t_2\,\|\lambda^k - \lambda^*\|_2^2 \le \|v^k - v^*\|_{H_{k-1}}^2$. □
Lemma 10.
Assume that the sequence $\{w^k\}$ is generated by Algorithm 1; then, there exists a constant $\bar{C} > 0$ such that $\|v^k - v^*\|_{H_{k-1}}^2 < \bar{C}$ holds for all positive integers $k$.
Proof. 
If $\beta_k > \beta_{k-1}$, then $\beta_k = (1+\tau_{k-1})\beta_{k-1}$ by Equation (13), and we have by Lemmas 7–9 and the definition of $H_k$ and $\|\cdot\|_{H_k}$ that
$$\begin{aligned}
&\|v^{k+1}-v^*\|_{H_k}^2 + \xi C_0\,\beta_k^2\,\|Ax^{k+1}+By^{k+1}-b\|_2^2 \\
&\quad\le \|v^k-v^*\|_{H_k}^2 + \xi C_0\,\beta_{k-1}^2\,\|Ax^k+By^k-b\|_2^2 - C_1\,\beta_k^2\,\|Ax^{k+1}+By^{k+1}-b\|_2^2 - C_2\,\beta_k^2\,\|B(y^k-y^{k+1})\|_2^2 \\
&\quad\le \|v^k-v^*\|_{H_k}^2 + \xi C_0\,\beta_{k-1}^2\,\|Ax^k+By^k-b\|_2^2 \\
&\quad= (1+\tau_{k-1})\,\|v^k-v^*\|_{H_{k-1}}^2 + \begin{pmatrix} y^k-y^* \\ \lambda^k-\lambda^* \end{pmatrix}^T \begin{pmatrix} (r+s-rs)\,\tau_{k-1}(1+\tau_{k-1})\,\beta_{k-1}^2\,B^TB & 0 \\ 0 & -\tau_{k-1} I_m \end{pmatrix} \begin{pmatrix} y^k-y^* \\ \lambda^k-\lambda^* \end{pmatrix} + \xi C_0\,\beta_{k-1}^2\,\|Ax^k+By^k-b\|_2^2 \\
&\quad= (1+\tau_{k-1})\,\|v^k-v^*\|_{H_{k-1}}^2 + \tau_{k-1}(1+\tau_{k-1})\,(r+s-rs)\,\beta_{k-1}^2\,\|B(y^k-y^*)\|_2^2 - \tau_{k-1}\,\|\lambda^k-\lambda^*\|_2^2 + \xi C_0\,\beta_{k-1}^2\,\|Ax^k+By^k-b\|_2^2 \\
&\quad\le (1+\tau_{k-1})\,\|v^k-v^*\|_{H_{k-1}}^2 + \frac{\tau_{k-1}}{t_1}(1+\tau_{k-1})\,\|v^k-v^*\|_{H_{k-1}}^2 + \xi C_0\,\beta_{k-1}^2\,\|Ax^k+By^k-b\|_2^2 \\
&\quad= (1+\tau_{k-1})\Big(1+\frac{\tau_{k-1}}{t_1}\Big)\,\|v^k-v^*\|_{H_{k-1}}^2 + \xi C_0\,\beta_{k-1}^2\,\|Ax^k+By^k-b\|_2^2.
\end{aligned}$$
If $\beta_k < \beta_{k-1}$, then $\beta_k = \beta_{k-1}/(1+\tau_{k-1})$ by Equation (13), and we similarly have
$$\begin{aligned}
&\|v^{k+1}-v^*\|_{H_k}^2 + \xi C_0\,\beta_k^2\,\|Ax^{k+1}+By^{k+1}-b\|_2^2
\le \|v^k-v^*\|_{H_k}^2 + \xi C_0\,\beta_{k-1}^2\,\|Ax^k+By^k-b\|_2^2 \\
&\quad= \frac{1}{1+\tau_{k-1}}\,\|v^k-v^*\|_{H_{k-1}}^2 + \begin{pmatrix} y^k-y^* \\ \lambda^k-\lambda^* \end{pmatrix}^T \begin{pmatrix} -\dfrac{(r+s-rs)\,\tau_{k-1}\,\beta_{k-1}^2}{(1+\tau_{k-1})^2}\,B^TB & 0 \\ 0 & \dfrac{\tau_{k-1}}{1+\tau_{k-1}} I_m \end{pmatrix} \begin{pmatrix} y^k-y^* \\ \lambda^k-\lambda^* \end{pmatrix} + \xi C_0\,\beta_{k-1}^2\,\|Ax^k+By^k-b\|_2^2 \\
&\quad= \frac{1}{1+\tau_{k-1}}\,\|v^k-v^*\|_{H_{k-1}}^2 + \frac{\tau_{k-1}}{1+\tau_{k-1}}\,\|\lambda^k-\lambda^*\|_2^2 - \frac{\tau_{k-1}(r+s-rs)}{(1+\tau_{k-1})^2}\,\beta_{k-1}^2\,\|B(y^k-y^*)\|_2^2 + \xi C_0\,\beta_{k-1}^2\,\|Ax^k+By^k-b\|_2^2 \\
&\quad\le \frac{t_2+\tau_{k-1}}{t_2(1+\tau_{k-1})}\,\|v^k-v^*\|_{H_{k-1}}^2 + \xi C_0\,\beta_{k-1}^2\,\|Ax^k+By^k-b\|_2^2.
\end{aligned}$$
Let
$$\mathcal{N}_{>} = \{k \mid \beta_k > \beta_{k-1}\}, \qquad \mathcal{N}_{<} = \{k \mid \beta_k < \beta_{k-1}\}, \qquad \mathcal{N}_{=} = \{k \mid \beta_k = \beta_{k-1}\},$$
which are pairwise disjoint and satisfy $\mathcal{N}_{>} \cup \mathcal{N}_{<} \cup \mathcal{N}_{=} = \{1, 2, \ldots, k\}$; then
$$\|v^{k+1}-v^*\|_{H_k}^2 \le \prod_{i \in \mathcal{N}_{>}} (1+\tau_{i-1})\Big(1+\frac{\tau_{i-1}}{t_1}\Big) \prod_{j \in \mathcal{N}_{<}} \frac{t_2+\tau_{j-1}}{t_2(1+\tau_{j-1})}\,\Big(\|v^1-v^*\|_{H_0}^2 + \xi C_0\,\beta_0^2\,\|Ax^1+By^1-b\|_2^2\Big).$$
Noting that $\prod_{i=1}^{\infty}(1+\tau_{i-1})\big(1+\frac{\tau_{i-1}}{t_1}\big) < \infty$ and $\prod_{i=1}^{\infty}\frac{t_2+\tau_{i-1}}{t_2(1+\tau_{i-1})} < \infty$, we know that there exists a constant $\bar{C} > 0$ such that $\|v^k-v^*\|_{H_{k-1}}^2 < \bar{C}$ holds for all positive integers $k$. □
Theorem 1.
Assume that the sequence $\{w^k\}$ is generated by Algorithm 1; then, we have
$$\lim_{k \to \infty}\Big\{\|Ax^{k+1} + By^{k+1} - b\|_2^2 + \|B(y^k - y^{k+1})\|_2^2\Big\} = 0.$$
Consequently, $\hat{w} = \lim_{k\to\infty} w^k$ is a solution of the variational inequality (Equation (15)); hence, $(\hat{x}, \hat{y})$ is a solution of the optimization problem (Equation (1)).
Proof. 
If $\beta_k > \beta_{k-1}$, we have by Lemmas 7, 8, and 10 that
$$\begin{aligned}
&C_1\,\beta_k^2\,\|Ax^{k+1}+By^{k+1}-b\|_2^2 + C_2\,\beta_k^2\,\|B(y^k-y^{k+1})\|_2^2 \\
&\quad\le \|v^k-v^*\|_{H_k}^2 - \|v^{k+1}-v^*\|_{H_k}^2 + \xi C_0\big(\beta_{k-1}^2\,\|Ax^k+By^k-b\|_2^2 - \beta_k^2\,\|Ax^{k+1}+By^{k+1}-b\|_2^2\big) \\
&\quad\le \|v^k-v^*\|_{H_{k-1}}^2 - \|v^{k+1}-v^*\|_{H_k}^2 + \xi C_0\big(\beta_{k-1}^2\,\|Ax^k+By^k-b\|_2^2 - \beta_k^2\,\|Ax^{k+1}+By^{k+1}-b\|_2^2\big) + \Big[(1+\tau_{k-1})\Big(1+\frac{\tau_{k-1}}{t_1}\Big)-1\Big]\,\|v^k-v^*\|_{H_{k-1}}^2.
\end{aligned}$$
Hence, summing over $k$ and using Lemma 10, we have
$$\min(C_1, C_2)\,\sum_{i=1}^{\infty}\beta_i^2\big(\|Ax^{i+1}+By^{i+1}-b\|_2^2 + \|B(y^i-y^{i+1})\|_2^2\big) \le \sum_{i=1}^{\infty}\beta_i^2\big(C_1\,\|Ax^{i+1}+By^{i+1}-b\|_2^2 + C_2\,\|B(y^i-y^{i+1})\|_2^2\big) \le \|v^1-v^*\|_{H_0}^2 + \xi C_0\,\beta_0^2\,\|Ax^1+By^1-b\|_2^2 + \bar{C}\,\sum_{i=1}^{\infty}\Big[(1+\tau_{i-1})\Big(1+\frac{\tau_{i-1}}{t_1}\Big)-1\Big].$$
Noting that $\sum_{i=1}^{\infty}\big[(1+\tau_{i-1})\big(1+\frac{\tau_{i-1}}{t_1}\big)-1\big] < \infty$, the right-hand side is finite, so
$$\min(C_1, C_2)\,\sum_{i=1}^{\infty}\beta_i^2\big(\|Ax^{i+1}+By^{i+1}-b\|_2^2 + \|B(y^i-y^{i+1})\|_2^2\big) < \infty.$$
Since $\{\beta_k\}$ is bounded away from zero, we have $\lim_{k\to\infty}\{\|Ax^{k+1}+By^{k+1}-b\|_2^2 + \|B(y^k-y^{k+1})\|_2^2\} = 0$.
If $\beta_k < \beta_{k-1}$, we have by Lemmas 7, 8, and 10 that
$$\begin{aligned}
&C_1\,\beta_k^2\,\|Ax^{k+1}+By^{k+1}-b\|_2^2 + C_2\,\beta_k^2\,\|B(y^k-y^{k+1})\|_2^2 \\
&\quad\le \|v^k-v^*\|_{H_k}^2 - \|v^{k+1}-v^*\|_{H_k}^2 + \xi C_0\big(\beta_{k-1}^2\,\|Ax^k+By^k-b\|_2^2 - \beta_k^2\,\|Ax^{k+1}+By^{k+1}-b\|_2^2\big) \\
&\quad\le \|v^k-v^*\|_{H_{k-1}}^2 - \|v^{k+1}-v^*\|_{H_k}^2 + \xi C_0\big(\beta_{k-1}^2\,\|Ax^k+By^k-b\|_2^2 - \beta_k^2\,\|Ax^{k+1}+By^{k+1}-b\|_2^2\big) + \Big[\frac{t_2+\tau_{k-1}}{t_2(1+\tau_{k-1})}-1\Big]\,\|v^k-v^*\|_{H_{k-1}}^2.
\end{aligned}$$
Hence, summing over $k$ and using Lemma 10, we have
$$\min(C_1, C_2)\,\sum_{i=1}^{\infty}\beta_i^2\big(\|Ax^{i+1}+By^{i+1}-b\|_2^2 + \|B(y^i-y^{i+1})\|_2^2\big) \le \|v^1-v^*\|_{H_0}^2 + \xi C_0\,\beta_0^2\,\|Ax^1+By^1-b\|_2^2 + \bar{C}\,\sum_{i=1}^{\infty}\Big[\frac{t_2+\tau_{i-1}}{t_2(1+\tau_{i-1})}-1\Big].$$
Noting that $\sum_{i=1}^{\infty}\big[\frac{t_2+\tau_{i-1}}{t_2(1+\tau_{i-1})}-1\big] < \infty$, the right-hand side is finite, and therefore $\lim_{k\to\infty}\{\|Ax^{k+1}+By^{k+1}-b\|_2^2 + \|B(y^k-y^{k+1})\|_2^2\} = 0$.
If $\beta_k = \beta_{k-1} = \beta$, we have by Lemmas 7 and 8 that
$$C_1\,\beta_k^2\,\|Ax^{k+1}+By^{k+1}-b\|_2^2 + C_2\,\beta_k^2\,\|B(y^k-y^{k+1})\|_2^2 \le \|v^k-v^*\|_{H_k}^2 - \|v^{k+1}-v^*\|_{H_k}^2 + \xi C_0\big(\beta_{k-1}^2\,\|Ax^k+By^k-b\|_2^2 - \beta_k^2\,\|Ax^{k+1}+By^{k+1}-b\|_2^2\big).$$
Hence, summing over $k$, we have
$$\min(C_1, C_2)\,\sum_{i=1}^{\infty}\beta_i^2\big(\|Ax^{i+1}+By^{i+1}-b\|_2^2 + \|B(y^i-y^{i+1})\|_2^2\big) \le \|v^1-v^*\|_{H_0}^2 + \xi C_0\,\beta_0^2\,\|Ax^1+By^1-b\|_2^2 < \infty,$$
and therefore $\lim_{k\to\infty}\{\|Ax^{k+1}+By^{k+1}-b\|_2^2 + \|B(y^k-y^{k+1})\|_2^2\} = 0$. By the above discussion and Lemma 4, we know that $\hat{w} = \lim_{k\to\infty} w^k$ is a solution of the variational inequality (Equation (15)). Hence, $(\hat{x}, \hat{y})$ is a solution of the optimization problem (Equation (1)). □

3. Numerical Experiments

In this section, some numerical experiments are presented to illustrate the efficiency of Algorithm 1. The experiments are divided into two parts. In the first part, we analyze the relationship between the parameters r and s and the convergence behavior of Algorithm 1 and give a selection range for the best parameters r and s; we then compare Algorithm 1 with the algorithms proposed in [14], [16], and [27] on a convex quadratic programming problem. In the second part, we give a numerical comparison between Algorithm 1 and the algorithm proposed in [16] on an image restoration problem. All codes were written in MATLAB R2019a, and all experiments were performed on a personal computer with an Intel(R) Core i7-6567U processor (3.3 GHz) and 8 GB of memory.

3.1. Convex Quadratic Programming Problem

To specify the problem (Equation (1)), we choose a separable convex quadratic programming problem proposed in [31], where
$$f(x) = \frac{1}{2}x^T M_1 x + q_1^T x \quad \text{and} \quad g(y) = \frac{1}{2}y^T M_2 y + q_2^T y,$$
where the matrices $M_i = Q_i^T Q_i$ $(i = 1, 2)$, and $Q_1 \in \mathbb{R}^{m \times m}$ and $Q_2 \in \mathbb{R}^{n \times n}$ are randomly generated square matrices with all entries chosen in the interval $[-5, 5]$. The vectors $q_i$ $(i = 1, 2)$ are generated from a uniform distribution on the interval $[-500, 500]$. For the linear constraint, $A = \mathbf{1}_{1 \times m}$ and $B = \mathbf{1}_{1 \times n}$ (i.e., all entries of $A$ and $B$ are equal to 1), and $b = (m+n)/100$. The convex set $X \subseteq \mathbb{R}^m$ is the ball of radius 1 centered at the origin, and the convex set $Y \subseteq \mathbb{R}^n$ is the box $[0, 5]^n$.
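The following Python snippet sketches how a test instance of this form can be generated. It is only an illustration written for this article (the paper's experiments were run in MATLAB and the exact generation code is not given); the function name, the seed handling, and the returned tuple are assumptions of this sketch.

```python
import numpy as np

def make_qp_instance(m, n, seed=0):
    """Generate a random separable convex QP of the form used in Section 3.1."""
    rng = np.random.default_rng(seed)
    Q1 = rng.uniform(-5, 5, size=(m, m))
    Q2 = rng.uniform(-5, 5, size=(n, n))
    M1, M2 = Q1.T @ Q1, Q2.T @ Q2          # positive semi-definite Hessians
    q1 = rng.uniform(-500, 500, size=m)
    q2 = rng.uniform(-500, 500, size=n)
    A = np.ones((1, m))                    # constraint: sum of all x entries
    B = np.ones((1, n))                    # plus sum of all y entries
    b = np.array([(m + n) / 100.0])
    f = lambda x: 0.5 * x @ M1 @ x + q1 @ x
    g = lambda y: 0.5 * y @ M2 @ y + q2 @ y
    return f, g, M1, M2, q1, q2, A, B, b
```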
Since the convergence of the original ADMM algorithm is well established, the algorithm proposed in [11] may fail to converge in some cases, the symmetric ADMM proposed in [16] allows a larger step size than the strictly contractive scheme in [15], and the adaptive penalty parameter has been less studied, we only compare Algorithm 1 numerically with the algorithms presented in references [14], [16], and [27], denoted respectively by Algorithm 14, Algorithm 16, and Algorithm 27.
To ensure the fairness of the experiment, the parameter $\gamma$ of Algorithms 14 and 27 is chosen as the optimal value $\gamma = 1.6$; the parameters $r$ and $s$ of the algorithm proposed in [16] are chosen as the optimal values $r = 0.8$, $s = 1.17$; and the initial penalty parameter $\beta$ of all algorithms mentioned above is chosen as 1. The initial iterates for all tested methods are zero vectors, and the stopping criterion is chosen as
$$\max\big\{\|Ax^{k+1}+By^{k+1}-b\|_2,\ \|B(y^k-y^{k+1})\|_2\big\} \le \varepsilon = 10^{-6}.$$
For Algorithm 1, we take
$$\tau_k = \begin{cases} 0.3, & \text{if } k \le k_{\max},\\ 0, & \text{otherwise}. \end{cases}$$
The parameters of Algorithm 2, which solves the x-subproblem and the y-subproblem of Algorithm 1, are chosen as θ = 0.1 , ν = 0.8 , and l = 1.8 . Table 1 reports the iteration numbers (Iter) and computation time (Time) of Algorithm 1 with different parameters r and s, and the best estimate of the parameters r and s can be chosen as r = 0.95, s = 1.12. And so, in the following numerical comparison experiment, the parameters r and s of Algorithm 1 are chosen as r = 0.95, s = 1.12.
In Table 2, we show the iteration numbers (Iter) and the computational time (Time) of Algorithm 1 and Algorithms 14, 16, and 27 for solving the separable convex quadratic programming problem proposed in [31]. To investigate the stability and accuracy of the algorithms, we tested 8 different sets of m and n values throughout the experiment, running each set of data 10 times and averaging the final results. In Figure 1, we plot the comparison in terms of objective function values (Obj) and computing time for Algorithm 1 and Algorithms 14, 16, and 27 on the separable convex quadratic programming problem proposed in [31].
Based on the tests reported in Table 2 and Figure 1, and on many other unreported tests showing similar patterns, we conclude that Algorithm 1 is more efficient than the other tested algorithms.

3.2. Image Restoration Problem

In this subsection, we consider the total variation image deblurring model studied in [16], whose discretized version can be written as
$$\min_{y}\ \|Ay\|_1 + \frac{\lambda}{2}\,\|By - z\|_2^2, \tag{34}$$
where $y \in \mathbb{R}^n$ represents a digital clean image, $z \in \mathbb{R}^n$ is the corrupted input image, $A := (\nabla_1, \nabla_2): \mathbb{R}^n \to \mathbb{R}^n \times \mathbb{R}^n$ is the discrete gradient operator, and $\nabla_1: \mathbb{R}^n \to \mathbb{R}^n$ and $\nabla_2: \mathbb{R}^n \to \mathbb{R}^n$ are the discrete derivatives in the horizontal and vertical directions, respectively. $B: \mathbb{R}^n \to \mathbb{R}^n$ is the matrix representation of a spatially invariant blurring operator, $\lambda > 0$ is a constant balancing the data fidelity and total variation regularization terms, and $\|\cdot\|_1$ defined on $\mathbb{R}^n \times \mathbb{R}^n$ is given by
$$\|x\|_1 = \sum_{i,j} \sqrt{(x_1)_{i,j}^2 + (x_2)_{i,j}^2}, \qquad x = (x_1, x_2) \in \mathbb{R}^n \times \mathbb{R}^n.$$
This is a basic model for various more advanced image processing tasks, and it has been studied extensively in the literature. Introducing the auxiliary variable x, we can reformulate Equation (34) as
$$\min_{x, y}\ \|x\|_1 + \frac{\lambda}{2}\,\|By - z\|_2^2 \qquad \text{s.t.}\quad x - Ay = 0,$$
which is a special case of the generic model (Equation (1)) under discussion, and thus Algorithm 1 is applicable.
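To illustrate the ingredients of this reformulation, the Python helpers below sketch one possible realization of the discrete gradient operator and the isotropic total variation norm. They are assumptions made for this sketch (square images, forward differences with a replicated boundary), not the discretization used in [16] or by the authors.

```python
import numpy as np

def grad_op(y):
    """Discrete gradient (forward differences, replicated boundary):
    one possible realization of the operator A = (grad_1, grad_2)."""
    side = int(np.sqrt(y.size))                     # assumes a square image
    img = y.reshape(side, side)
    gx = np.diff(img, axis=1, append=img[:, -1:])   # horizontal differences
    gy = np.diff(img, axis=0, append=img[-1:, :])   # vertical differences
    return gx.ravel(), gy.ravel()

def tv_norm(x1, x2):
    """Isotropic total variation ||x||_1 of x = (x_1, x_2)."""
    return np.sum(np.sqrt(x1 ** 2 + x2 ** 2))
```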
We test two images, lena.tif (512 × 512) and man.pgm (512 × 512). These images are first corrupted by a blur operator with a kernel size of 21 × 21, and then the blurred images are further corrupted by zero-mean white Gaussian noise with standard deviation 0.002. In Figure 2, we show the original and degraded images. The quality of the restored images is measured by the SNR value given by
$$\mathrm{SNR} = 20 \log_{10} \frac{\|y\|_2}{\|y - \bar{y}\|_2},$$
where y is the original image and y ¯ is the recovered image. A larger SNR value means a higher quality of the restored image. Table 3 and Figure 3 report the comparison experiment between Algorithm 1 and the algorithm proposed in [16] for the image restoration problem.
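A small Python helper for computing the SNR defined above is given below; it is a direct transcription of the formula, written for this article.

```python
import numpy as np

def snr(y, y_rec):
    """SNR (in dB) of a recovered image y_rec against the original image y."""
    return 20.0 * np.log10(np.linalg.norm(y) / np.linalg.norm(y - y_rec))
```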

4. Conclusions

The alternating direction method is one of the most attractive approaches for solving convex optimization problems with linear constraints and separable objective functions. Experience with applications has shown that the iteration numbers and computing time depend significantly on the penalty parameter for the linear constraint. In the classical alternating direction method, the penalty parameter is a constant. In this paper, based on the ideas of references [16,27], we propose an alternating direction method of multipliers with an adaptive penalty parameter, relaxation factors, and a Lagrange multiplier that is updated twice at each iteration (Algorithm 1), which not only adaptively adjusts the penalty parameter at each iteration based on the iteration message but also adds relaxation factors to the Lagrange multiplier update steps. The global convergence of the proposed Algorithm 1 is proven (Theorem 1). Preliminary numerical experiments show that adaptively adjusting the penalty parameter at each iteration and attaching relaxation factors to the Lagrange multiplier updating steps are effective in practical applications (see Table 2 and Figure 1).

Author Contributions

Writing—original draft preparation, J.P. and Z.W.; writing-review and editing, S.Y. and Z.T. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by National Natural Science Foundation of China (grant number 11961012) and the Special Research Project for Guangxi Young Innovative Talents (grant number AD20297063).

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

The authors thank the anonymous reviewer for valuable suggestions that helped them to improve this article.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Yang, J.F.; Zhang, Y. Alternating direction algorithms for l1-problems in compressive sensing. SIAM J. Sci. Comput. 2011, 33, 250–278.
  2. Li, J.C. A parameterized proximal point algorithm for separable convex optimization. Optim. Lett. 2018, 12, 1589–1608.
  3. Tao, M.; Yuan, X.M. Recovering low-rank and sparse components of matrices from incomplete and noisy observations. SIAM J. Optim. 2011, 21, 57–81.
  4. Hager, W.W.; Zhang, H.C. Convergence rates for an inexact ADMM applied to separable convex optimization. Comput. Optim. Appl. 2020, 77, 729–754.
  5. Liu, Z.S.; Li, J.C.; Liu, X.N. A new model for sparse and low-rank matrix decomposition. J. Appl. Anal. Comput. 2017, 7, 600–616.
  6. Jiang, F.; Wu, Z.M. An inexact symmetric ADMM algorithm with indefinite proximal term for sparse signal recovery and image restoration problems. J. Comput. Appl. Math. 2023, 417, 114628.
  7. Bai, J.C.; Hager, W.W.; Zhang, H.C. An inexact accelerated stochastic ADMM for separable convex optimization. Comput. Optim. Appl. 2022, 81, 479–518.
  8. Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 2011, 3, 1–122.
  9. Chan, T.F.; Glowinski, R. Finite Element Approximation and Iterative Solution of a Class of Mildly Nonlinear Elliptic Equations; Technical Report STAN-CS-78-674; Computer Science Department, Stanford University: Stanford, CA, USA, 1978.
  10. Hestenes, M.R. Multiplier and gradient methods. J. Optim. Theory Appl. 1969, 4, 303–320.
  11. Lions, P.L.; Mercier, B. Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 1979, 16, 964–979.
  12. Cai, X.J.; Gu, G.Y.; He, B.S.; Yuan, X.M. A proximal point algorithm revisit on the alternating direction method of multipliers. Sci. China Math. 2013, 56, 2179–2186.
  13. Fortin, M.; Glowinski, R. Augmented Lagrangian Methods: Applications to the Numerical Solution of Boundary-Value Problems; North-Holland: Amsterdam, The Netherlands, 1983.
  14. Glowinski, R.; Karkkainen, T.; Majava, K. On the convergence of operator-splitting methods. In Numerical Methods for Scientific Computing, Variational Problems and Applications; Heikkola, E., Kuznetsov, P.Y., Neittaanm, P., Pironneau, O., Eds.; CIMNE: Barcelona, Spain, 2003; pp. 67–79.
  15. He, B.S.; Liu, H.; Wang, Z.R.; Yuan, X.M. A strictly contractive Peaceman–Rachford splitting method for convex programming. SIAM J. Optim. 2014, 24, 1011–1040.
  16. He, B.S.; Ma, F.; Yuan, X.M. Convergence study on the symmetric version of ADMM with larger step sizes. SIAM J. Imaging Sci. 2016, 9, 1467–1501.
  17. Luo, G.; Yang, Q.Z. A fast symmetric alternating direction method of multipliers. Numer. Math. Theory Meth. Appl. 2020, 13, 200–219.
  18. Li, X.X.; Yuan, X.M. A proximal strictly contractive Peaceman–Rachford splitting method for convex programming with applications to imaging. SIAM J. Imaging Sci. 2015, 8, 1332–1365.
  19. Bai, J.C.; Li, J.C.; Xu, F.M.; Zhang, H.C. Generalized symmetric ADMM for separable convex optimization. Comput. Optim. Appl. 2018, 70, 129–170.
  20. Wu, Z.M.; Li, M. An LQP-based symmetric alternating direction method of multipliers with larger step sizes. J. Oper. Res. Soc. China 2019, 7, 365–383.
  21. Chang, X.K.; Bai, J.C.; Song, D.J.; Liu, S.Y. Linearized symmetric multi-block ADMM with indefinite proximal regularization and optimal proximal parameter. Calcolo 2020, 57, 38.
  22. Han, D.R.; Sun, D.F.; Zhang, L.W. Linear rate convergence of the alternating direction method of multipliers for convex composite programming. Math. Oper. Res. 2018, 43, 622–637.
  23. Gao, B.; Ma, F. Symmetric alternating direction method with indefinite proximal regularization for linearly constrained convex optimization. J. Optim. Theory Appl. 2018, 176, 178–204.
  24. Shen, Y.; Zuo, Y.N.; Yu, A.L. A partially proximal S-ADMM for separable convex optimization with linear constraints. Appl. Numer. Math. 2021, 160, 65–83.
  25. Adona, V.A.; Goncalves, M.L.N. An inexact version of the symmetric proximal ADMM for solving separable convex optimization. Numer. Algor. 2023, 94, 1–28.
  26. He, B.S.; Yang, H. Some convergence properties of a method of multipliers for linearly constrained monotone variational inequalities. Oper. Res. Lett. 1998, 23, 151–161.
  27. He, B.S.; Yang, H.; Wang, S.L. Alternating direction method with self-adaptive penalty parameters for monotone variational inequalities. J. Optim. Theory Appl. 2000, 106, 337–356.
  28. He, B.S.; Liao, L.Z. Improvements of some projection methods for monotone nonlinear variational inequalities. J. Optim. Theory Appl. 2002, 112, 111–128.
  29. Palomar, D.P.; Scutari, G. Variational inequality theory: A mathematical framework for multiuser communication systems and signal processing. In Proceedings of the International Workshop on Mathematical Issues in Information Sciences (MIIS), Xi'an, China, 7–13 July 2012.
  30. He, B.S. A new method for a class of linear variational inequalities. Math. Program. 1994, 66, 137–144.
  31. Han, D.R.; He, H.J.; Yang, H.; Yuan, X.M. A customized Douglas–Rachford splitting algorithm for separable convex minimization with linear constraints. Numer. Math. 2014, 127, 167–200.
Figure 1. Objective function curve of Algorithm 1 and the other three algorithms.
Figure 2. First column is original images; second column is corrupted images.
Figure 3. First column is blurred images; second column is recovered images by Algorithm 1; and third column is recovered images by the algorithm in [16].
Table 1. The numerical comparison of Algorithm 1 with different parameters r and s (each cell gives Iter / Time).

| m | n | r = 0.95, s = 1.12 | r = 0.9, s = 1 | r = , s = 1 | r = 1, s = 1.5 | r = 1.54, s = 0.2 | r = 0.1, s = 1.57 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 800 | 200 | 10.7 / 0.41 | 11.4 / 0.44 | 10.8 / 0.44 | 11.1 / 0.46 | 17.7 / 0.65 | 19.1 / 0.71 |
| 600 | 400 | 10.7 / 0.26 | 11.4 / 0.29 | 11.0 / 0.27 | 11.6 / 0.29 | 16.6 / 0.42 | 18.7 / 0.49 |
| 1600 | 400 | 10.0 / 1.51 | 10.3 / 1.66 | 9.9 / 1.61 | 11.0 / 1.75 | 18.5 / 2.84 | 18.0 / 2.82 |
| 1200 | 800 | 10.0 / 1.13 | 10.7 / 1.26 | 9.9 / 1.16 | 10.7 / 1.27 | 17.8 / 2.05 | 18.0 / 2.08 |
| 2400 | 600 | 9.7 / 6.95 | 10.1 / 7.23 | 9.7 / 6.98 | 10.8 / 7.72 | 19.0 / 13.13 | 17.1 / 12.30 |
| 1800 | 1200 | 9.6 / 2.73 | 10.3 / 3.06 | 9.8 / 2.92 | 11.2 / 3.30 | 18.2 / 5.22 | 17.0 / 5.03 |
| 3200 | 800 | 9.8 / 15.32 | 10.0 / 15.59 | 10.1 / 15.81 | 10.3 / 16.01 | 19.0 / 29.22 | 16.8 / 26.24 |
| 2400 | 1600 | 9.7 / 8.54 | 10.1 / 8.83 | 10.2 / 8.83 | 10.8 / 9.34 | 17.1 / 14.30 | 15.6 / 13.86 |
Table 2. The numerical comparison of Algorithm 1 with the other three algorithms (each cell gives Iter / Time).

| m | n | Obj | Algorithm 1 (r = 0.95, s = 1.12) | Algorithm 16 (r = 0.8, s = 1.17) | Algorithm 27 (γ = 1.6) | Algorithm 14 (γ = 1.6) |
| --- | --- | --- | --- | --- | --- | --- |
| 800 | 200 | −9478.39 | 10.6 / 0.41 | 39.1 / 1.41 | 20.2 / 0.68 | 50.1 / 1.79 |
| 600 | 400 | −8939.27 | 10.7 / 0.26 | 42.4 / 1.19 | 20.0 / 0.51 | 54.1 / 1.47 |
| 1600 | 400 | −12,473.27 | 10.0 / 1.51 | 29.1 / 5.13 | 18.9 / 3.06 | 37.5 / 6.62 |
| 1200 | 800 | −12,038.11 | 10.0 / 1.13 | 30.1 / 3.77 | 19.0 / 2.16 | 38.8 / 4.84 |
| 2400 | 600 | −14,789.51 | 9.5 / 6.95 | 21.7 / 15.56 | 17.6 / 12.61 | 27.9 / 20.01 |
| 1800 | 1200 | −14,312.39 | 9.8 / 2.73 | 22.9 / 7.19 | 18.0 / 5.30 | 30.1 / 9.51 |
| 3200 | 800 | −16,191.25 | 9.8 / 15.32 | 18.3 / 28.18 | 17.3 / 27.32 | 23.8 / 36.43 |
| 2400 | 1600 | −14,365.07 | 9.7 / 8.54 | 19.7 / 17.03 | 17.3 / 14.99 | 25.5 / 21.98 |
Table 3. The numerical comparison of Algorithm 1 and the algorithm in [16] (each cell gives Iter / Time / SNR).

| Algorithm | Lena (512 × 512) | man (512 × 512) |
| --- | --- | --- |
| Algorithm 1 (r = 0.95, s = 1.12) | 26 / 3.93 / 23.02 | 30 / 4.51 / 17.02 |
| Algorithm 16 (r = 0.8, s = 1.17) | 37 / 5.54 / 23.00 | 40 / 6.08 / 17.00 |