Abstract
We revisit the classic supporting hyperplane illustration of the duality gap for non-convex optimization problems. It is refined by dissecting the duality gap into two terms: the first measures the degree of near-optimality in a Lagrangian relaxation, while the second measures the degree of near-complementarity in the Lagrangian relaxed constraints. We also give an example of how this dissection may be exploited in the design of a solution approach within discrete optimization.
1 Background
We consider the primal problem of finding
\[
\begin{aligned}
f^* := \min \ & f(x), \qquad && \text{(1a)}\\
\text{subject to } & g(x) \le 0, && \text{(1b)}\\
& x \in X, && \text{(1c)}
\end{aligned}
\]
where \(X \subseteq \mathbb {R}^n\) is a given set and \(f : \mathbb {R}^n \mapsto \mathbb {R}\) and \(g : \mathbb {R}^n \mapsto \mathbb {R}^m\) are given functions. With the vector \(u \in \mathbb {R}^m_+\) of Lagrangian multipliers for the constraint (1b), the dual function associated with the Lagrangian relaxation of this constraint is
\[
\theta (u) := \min _{x \in X} \ \bigl\{ f(x) + u^\mathrm{T} g(x) \bigr\}, \tag{2}
\]
while
\[
\theta ^* := \max _{u \in \mathbb {R}^m_+} \ \theta (u) \tag{3}
\]
is the Lagrangian dual problem.
We assume that the set X is non-empty and compact, and that the functions f and g are continuous on X. Then the relaxed problem (2) has an optimal solution for every \(u\in \mathbb {R}^m_+\). We further assume that the primal problem fulfils some constraint qualification which ensures that the dual problem (3) has an optimal solution (such as a Slater condition, see e.g. [1, Proposition 2.4.1]). Optimal solutions to problems (1) and (3) are denoted \(x^*\) and \(u^*\), respectively. The duality gap for the primal–dual pair is \(\varGamma := f^* - \theta ^*\). To ensure that the duality gap is zero, the primal problem must have a convexity property; cf. [2, Theorem 6.2.4], [3, Chapter 5], and [4, Chapter 6]. In case the primal problem is non-convex (e.g., a discrete optimization problem), a positive duality gap can be expected. (Readers that are not well acquainted with Lagrangian duality are referred to e.g. [2,3,4].)
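As a brief reminder of why the gap is one-sided (standard weak duality, recalled here only for completeness): for any \(u \in \mathbb {R}^m_+\) and any x that is feasible in (1),
\[
\theta (u) = \min _{y \in X} \bigl\{ f(y) + u^\mathrm{T} g(y) \bigr\} \le f(x) + u^\mathrm{T} g(x) \le f(x),
\]
since \(x \in X\), \(u \ge 0\) and \(g(x) \le 0\); maximizing the left-hand side over u and minimizing the right-hand side over feasible x gives \(\theta ^* \le f^*\), that is, \(\varGamma \ge 0\).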
If the duality gap is zero, optimal solutions to both the primal problem (1) and its Lagrangian dual problem (3) can be characterized through the classic global optimality conditions, see e.g. [5, Theorem 5.1] and [2, Theorem 6.2.5]. Letting \((x,u) \in X \times \mathbb {R}^m_+\), these can be stated as
\[
x \in \arg \min _{y \in X} \bigl\{ f(y) + u^\mathrm{T} g(y) \bigr\}, \qquad g(x) \le 0, \qquad u^\mathrm{T} g(x) = 0. \tag{4}
\]
The interpretation of these three conditions is optimality in the Lagrangian relaxed problem (2), feasibility in the relaxed constraint (1b), and complementarity in this constraint, respectively. The following result, which can be found in e.g. [2, Theorem 6.2.5], establishes the equivalence between the consistency of the system (4) and primal–dual optimality with a zero duality gap.
Theorem 1.1
(primal–dual optimality condition) A pair \((x,u) \in X \times \mathbb {R}^m_+\) satisfies the system (4) if and only if x solves the primal problem (1), u solves the dual problem (3), and \(f^* = \theta ^*\) holds.
A conclusion from this theorem is that the system (4) is inconsistent whenever u is not optimal in the Lagrangian dual problem (3) or there is a positive duality gap. In the case when the duality gap is zero and the dual vector is optimal in the Lagrangian dual problem (3), the result of Theorem 1.1 can be used to characterize all optimal solutions to the primal problem (1).
Corollary 1.1
(characterization of optimal primal solutions) If \(f^* = \theta ^*\) holds and u solves the dual problem (3), then an \(x \in X\) solves the primal problem (1) if and only if it, together with u, satisfies the system (4).
This characterization has been generalized [6, Proposition 5] to allow for a positive duality gap, the use of a \(u \in \mathbb {R}^m_+\) that is not necessarily optimal in the dual problem (3), and also to describe near-optimal solutions to the primal problem (1). This generalization is based on the following relaxed global optimality conditions for the problem (1). Here, \(\beta \in \mathbb {R}_+\) and again we let \((x,u) \in X \times \mathbb {R}^m_+\).
Note that the quantities \(\varepsilon \) and \(\delta \) will always become non-negative whenever \((x,u) \in X \times \mathbb {R}_+^m\) and (5) hold. They capture near-optimality in the Lagrangian relaxed problem (2) and near-complementarity in the relaxed constraint (1b), respectively. The following theorem is a restatement of [6, Proposition 5].
Theorem 1.2
(characterization of near-optimal primal solutions) For any given \(u \in \mathbb {R}^m_+\), an \(x \in X\) is \(\beta \)-optimal in the primal problem (1) if and only if it, together with u and some values of \(\varepsilon \) and \(\delta \), satisfies the system (5).
Note that for any \(u \in \mathbb {R}^m_+\) and the choice \(\beta = 0\), the system (5) characterizes all primal optimal solutions. Further, if the duality gap is zero, u solves the dual problem, and \(\beta = 0\), this characterization reduces to that of Corollary 1.1.
The characterization in Theorem 1.2 can be simplified by introducing the function \(\varepsilon : X \times \mathbb {R}^m_+ \mapsto \mathbb {R}_+\) with \(\varepsilon (x,u) = f(x) + u^\mathrm{T} g(x) - \theta (u)\), which for a given u measures the degree of near-optimality of an \(x \in X\) in the Lagrangian relaxation, and the function \(\delta : X \times \mathbb {R}^m_+ \mapsto \mathbb {R}_+\) with \(\delta (x,u) = \max \{ 0, - u^\mathrm{T} g(x)\}\), which for a given u measures the degree of near-complementarity of an \(x \in X\) in the relaxed constraint.
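For completeness, we note two immediate consequences of these definitions (elementary, and implicit in the discussion above): since \(\theta (u)\) is the minimum of \(f(y) + u^\mathrm{T} g(y)\) over \(y \in X\), we have \(\varepsilon (x,u) \ge 0\) for every \(x \in X\); and for a primal feasible x, that is, \(g(x) \le 0\), together with \(u \ge 0\) we get
\[
\delta (x,u) = \max \{ 0, -u^\mathrm{T} g(x) \} = -u^\mathrm{T} g(x) \ge 0,
\]
so that adding the two quantities yields the identity stated next.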
Note that
\[
f(x) - \theta (u) = \varepsilon (x,u) + \delta (x,u) \tag{6}
\]
holds for any choice of primal feasible solution x and \(u\in \mathbb {R}^m_+\). Further, for any such choice, the identity (6) provides a dissection of the difference between the primal and dual objective values into a non-negative Lagrangian near-optimality term and a non-negative near-complementarity term. In particular, \(f^* - \theta ^* = \varepsilon (x^*,u^*) + \delta (x^*,u^*) = \varGamma \). With this new notation, Theorem 1.2 can be restated as follows.
Corollary 1.2
(characterization of near-optimal primal solutions) For any given \(u \in \mathbb {R}^m_+\), an x that is feasible in the primal problem (1) is \(\beta \)-optimal if and only if
\[
\varepsilon (x,u) + \delta (x,u) \le f^* - \theta (u) + \beta
\]
holds.
The functions \(\varepsilon \) and \(\delta \) were introduced in [7, 8] (although those works did not include the maximum operator in the definition of \(\delta \)). Further, their values were interpreted in the Lagrangian dual space; see also Fig. 6 below. We here give an interpretation with respect to the supporting hyperplane illustration of the duality gap.
2 Supporting hyperplane illustrations
We now consider the case \(m=1\), introduce auxiliary variables \(z \in \mathbb {R}\) and \(v \in \mathbb {R}\), which describe values of the functions f and g, respectively, and define the set \((g,f)(X) = \left\{ (g(x),f(x))~|~x \in X \right\} \subset \mathbb {R}^2\). The Lagrangian relaxed problem (2) can then be restated as
\[
\theta (u) = \min \ \bigl\{ z + u v ~|~ (v,z) \in (g,f)(X) \bigr\}.
\]
Figures 1 and 2 show the classical geometric illustrations of Lagrangian dualization, see e.g. [2], for the case of a zero and a positive duality gap, respectively. Here, and in the remainder of this section, \(\cdot \,^*\) denotes an optimal value. Points in the set (g, f)(X) with \(g(x)\le 0\) are indicated by the gray area, and \((g^*,f^*)=(g(x^*),f(x^*))\).
We now bring the functions \(\varepsilon \) and \(\delta \) into this illustration; Fig. 3 shows their values at the primal–dual optimum \((x^*,u^*)\). Since \(\varepsilon (x^*,u^*)\) measures the degree of near-optimality of \(x^*\in X\) in the Lagrangian relaxation (2), the line \(z + u^*v = \theta ^* + \varepsilon (x^*,u^*)\) passes through the point \((g^*,f^*)\) and intersects the z-axis at \(\theta ^* + \varepsilon (x^*,u^*)\). The geometric interpretation of \(\delta (x^*,u^*)\) follows from \(\varGamma =\varepsilon (x^*,u^*) + \delta (x^*,u^*)\), or, alternatively, from the definition \(\delta (x,u) = \max \{ 0, - u^\mathrm{T} g(x)\}\), which gives \(\delta (x^*,u^*) = - u^*g^*\).
Next, in Fig. 4, we illustrate the dissection of \(f(x) - \theta (u)\) into \(\varepsilon (x,u)\) and \(\delta (x,u)\) for a non-optimal primal feasible solution \(\bar{x}\) and a non-optimal \(\bar{u}\in \mathbb {R}^m_+\). Here, \((\bar{g},\bar{f})=(g(\bar{x}),f(\bar{x}))\). Since \(\bar{u}\) is not optimal, the line \(z+\bar{u}v = \theta (\bar{u})\) supports the set (g, f)(X) at only one point (which may correspond to an \(x \in X\) that is feasible or infeasible). The construction of the geometric interpretation of \(\varepsilon (\bar{x},\bar{u})\) and \(\delta (\bar{x},\bar{u})\) follows the same arguments as in Fig. 3.
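To spell out the last step (a short verification, using only the definitions above): for the feasible \(\bar{x}\) the parallel line through \((\bar{g},\bar{f})\) has z-intercept \(\theta (\bar{u}) + \varepsilon (\bar{x},\bar{u}) = \bar{f} + \bar{u}\bar{g}\), and hence the remaining vertical distance on the z-axis up to the level \(\bar{f}\) is
\[
\bar{f} - \bigl( \theta (\bar{u}) + \varepsilon (\bar{x},\bar{u}) \bigr) = -\bar{u}\bar{g} = \delta (\bar{x},\bar{u}),
\]
in agreement with the dissection \(\bar{f} - \theta (\bar{u}) = \varepsilon (\bar{x},\bar{u}) + \delta (\bar{x},\bar{u})\).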
To make the interpretations very concrete, we conclude this section with a detailed analysis of a numerical example, which is a knapsack problem.
The optimal solution is \(x^*=(1,1,0,0,1)\) and \(f^*=24\), with \(g^*=-3\). The dual optimum is \(u^*=\frac{13}{11}\) and \(\theta ^*=15\frac{4}{11}\). Hence, \(\varGamma = 8\frac{7}{11}\), with optimal near-complementarity \(\delta (x^*,u^*) = -\frac{13}{11}(-3) = 3\frac{6}{11}\) and Lagrangian near-optimality \(\varepsilon (x^*,u^*) = \varGamma - \delta (x^*,u^*) = 5\frac{1}{11}\). The problem is illustrated in Fig. 5. The set (g, f)(X) is here discrete and indicated by circles. The circles corresponding to primal feasible solutions are in gray, and \((g^*,f^*)\) is in black. For \(u=u^*\), the Lagrangian relaxed problem has the two optimal solutions \(x^1=(1,1,0,0,0)\) and \(x^2=(1,1,1,0,0)\), with \((g(x^1),f(x^1))=(2,13)\) and \((g(x^2),f(x^2))=(-9,26)\). The line \(z+u^*v=\theta ^*\) passes through these two points. The values \(\varepsilon (x^*,u^*)\) and \(\delta (x^*,u^*)\) can also be interpreted in the Lagrangian dual space [7, 8]. For our numerical example, this is shown in Fig. 6.
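As a quick arithmetic check of this dissection, the following Python snippet recomputes \(\varGamma \), \(\varepsilon (x^*,u^*)\) and \(\delta (x^*,u^*)\) with exact fractions; it is a minimal sketch that uses only the values quoted above, not the underlying knapsack data.

```python
from fractions import Fraction

# Values reported for the knapsack example
f_star     = Fraction(24)        # optimal primal objective value f*
theta_star = Fraction(169, 11)   # optimal dual value theta* = 15 4/11
u_star     = Fraction(13, 11)    # optimal dual multiplier u*
g_star     = Fraction(-3)        # constraint value g(x*) at the primal optimum

gap   = f_star - theta_star                    # duality gap Gamma = 8 7/11
delta = max(Fraction(0), -u_star * g_star)     # near-complementarity = 3 6/11
eps   = f_star + u_star * g_star - theta_star  # near-optimality      = 5 1/11

assert eps + delta == gap                      # the identity (6) at (x*, u*)
print(gap, eps, delta)                         # 95/11 56/11 39/11
```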
3 A practical implication
The purpose of this section is to illustrate how the quantities \(\varepsilon \) and \(\delta \) can be exploited when designing solution approaches for certain problem structures. Preliminary results along this line of research are presented in [8]. We here present a slight extension of the findings from that reference.
We consider the Set Covering Problem (SCP) stated as
\[
\begin{aligned}
\min \ & \sum _{j\in \mathcal {J}} c_j x_j, \qquad && \text{(9a)}\\
\text{subject to } & \sum _{j\in \mathcal {J}} a_{ij} x_j \ge 1, \quad i \in \mathcal {I}, && \text{(9b)}\\
& x_j \in \{0,1\}, \quad j \in \mathcal {J}, && \text{(9c)}
\end{aligned}
\]
where \(\mathcal {J}= \{1,\ldots ,n\}\), \(\mathcal {I}= \{1,\ldots ,m\}\), and all \(c_j > 0\) and \(a_{ij} \in \{0,1\}\). By Lagrangian relaxing the constraints (9b) with multipliers \(u \in \mathbb {R}^m_+\), the dual function becomes \(h:\mathbb {R}^m_+ \rightarrow \mathbb {R}\) with
\[
h(u) = \sum _{i\in \mathcal {I}} u_i + \min _{x \in \{0,1\}^n} \ \sum _{j\in \mathcal {J}} \Bigl( c_j - \sum _{i\in \mathcal {I}} u_i a_{ij} \Bigr) x_j = \sum _{i\in \mathcal {I}} u_i + \sum _{j\in \mathcal {J}} \min \Bigl\{ 0, \, c_j - \sum _{i\in \mathcal {I}} u_i a_{ij} \Bigr\}.
\]
The dual problem is \(h^*=\max _{u \in \mathbb {R}^m_+ } ~ h(u)\). This Lagrangian relaxation has the integrality property [9, p. 177]. Hence, \(h^*\) coincides with the optimal value of the linear programming relaxation of the SCP. Further, any optimal solution to the dual of the latter problem is an optimal solution to the Lagrangian dual problem. Since the upper bounds on the variables are redundant in SCP, we may consider the linear programming relaxation without these bounds. Let \(u^*\) be an optimal dual solution to this problem. Then \(\overline{c}_j=c_j-\sum _{i\in \mathcal {I}} u^*_i a_{ij} \ge 0\) holds for all \(j\in \mathcal {J}\).
For the SCP, \(\varepsilon :\{0,1\}^n \times \mathbb {R}^m_+ \rightarrow \mathbb {R}_+\) with
\[
\varepsilon (x,u) = \sum _{j\in \mathcal {J}} c_j x_j + \sum _{i\in \mathcal {I}} u_i \Bigl( 1 - \sum _{j\in \mathcal {J}} a_{ij} x_j \Bigr) - h(u),
\]
and \(\delta :\{0,1\}^n \times \mathbb {R}^m_+ \rightarrow \mathbb {R}_+\) with
\[
\delta (x,u) = \max \Bigl\{ 0, \, \sum _{i\in \mathcal {I}} u_i \Bigl( \sum _{j\in \mathcal {J}} a_{ij} x_j - 1 \Bigr) \Bigr\}.
\]
Let \(\bar{x}\) be any feasible solution to SCP. Since \(h^*=\sum _{i\in \mathcal {I}} u_i^*\), we get
\[
\varepsilon (\bar{x},u^*) = \sum _{j\in \mathcal {J}} \Bigl( c_j - \sum _{i\in \mathcal {I}} u^*_i a_{ij} \Bigr) \bar{x}_j = \sum _{j\in \mathcal {J}} \overline{c}_j \bar{x}_j \ge 0.
\]
Further, since \(\bar{x}\) is feasible and \(u^* \ge 0\), the maximum in the definition of \(\delta \) can be dropped, giving
\[
\delta (\bar{x},u^*) = \sum _{i\in \mathcal {I}} u^*_i \Bigl( \sum _{j\in \mathcal {J}} a_{ij} \bar{x}_j - 1 \Bigr) \ge 0.
\]
From (6) we have that \(\sum _{j\in \mathcal {J}}c_j\bar{x}_j-h^* = \varepsilon (\bar{x},u^*)+\delta (\bar{x},u^*)\), and in particular if \(\bar{x}\) is optimal we obtain that \(\varGamma = \varepsilon (\bar{x},u^*) + \delta (\bar{x},u^*)\).
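These expressions are straightforward to evaluate computationally. The sketch below is a minimal illustration with a hypothetical toy instance (the data a, c, the multipliers u and the cover x are illustrative values, not taken from the paper); it evaluates h(u), \(\varepsilon (x,u)\) and \(\delta (x,u)\) from the general expressions for \(\varepsilon \) and \(\delta \) given earlier in this section.

```python
import numpy as np

def scp_dissection(a, c, u, x):
    """Dual value h(u) and the terms eps(x, u), delta(x, u) for a set covering
    instance  min c^T x  s.t.  a x >= 1,  x binary  (a: m-by-n 0/1 matrix)."""
    cbar = c - u @ a                             # reduced costs c_j - sum_i u_i a_ij
    h_u = u.sum() + np.minimum(0.0, cbar).sum()  # dual function value h(u)
    eps = c @ x + u @ (1.0 - a @ x) - h_u        # near-optimality in the relaxation
    delta = max(0.0, u @ (a @ x - 1.0))          # near-complementarity
    return h_u, eps, delta

# Toy instance: 3 rows, 4 columns (illustrative data only)
a = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0],
              [1, 0, 0, 1]], dtype=float)
c = np.array([2.0, 3.0, 2.0, 1.0])
u = np.array([1.0, 1.0, 1.0])        # some multipliers in R^m_+
x = np.array([1.0, 1.0, 0.0, 1.0])   # a feasible cover
h_u, eps, delta = scp_dissection(a, c, u, x)
print(h_u, eps, delta)               # for feasible x, eps + delta = c^T x - h(u)
```

For a feasible x the two terms sum to \(c^\mathrm{T} x - h(u)\), and when u is an optimal LP dual solution (so that all reduced costs \(\overline{c}_j\) are non-negative) the value eps reduces to \(\sum _{j\in \mathcal {J}}\overline{c}_j x_j\), as in the expression for \(\varepsilon (\bar{x},u^*)\) above.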
We study 11 challenging SCP problem instances taken from the OR-Library [10]; details concerning the computational setup can be found in [8]. The instances are listed in Table 1. The first five are artificial and taken from [11], and the other six originate from a rail crew scheduling application [12]. The former have a density of 5% (by construction) and the latter have densities between 0.2% and 1.3%. Three of the instances could be solved to proven optimality.
For the optimal or best found solution, denoted \(x^*\), and its objective value \(z^*_{\text {IP}}\), we calculate the following quantities: relative gap \(\varGamma _{\text {rel}}:=(z^*_{\text {IP}}-h^*)/h^*\), relative near-optimality \(\varepsilon _{\text {rel}}:=\varepsilon (x^*,u^*)/(z^*_{\text {IP}}-h^*)\), and relative near-complementarity \(\delta _{\text {rel}}:=\delta (x^*,u^*)/(z^*_{\text {IP}}-h^*)\). We also calculate the quantity Average Excess Coverage, \(\text {AEC}:=\tfrac{1}{m}\sum _{i\in \mathcal {I}} \bigl(\sum _{j\in \mathcal {J}} a_{ij}x^*_j - 1\bigr)\). Note that \(\varepsilon _{\text {rel}} + \delta _{\text {rel}} = 1\).
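For reference, these quantities are one-liners to compute once \(z^*_{\text {IP}}\), \(h^*\), \(\varepsilon (x^*,u^*)\), \(\delta (x^*,u^*)\) and the constraint matrix are available; a minimal sketch (the function name and argument layout are illustrative, not taken from [8]):

```python
import numpy as np

def gap_statistics(z_ip, h_star, eps, delta, a, x):
    """Relative gap, relative near-optimality/near-complementarity, and
    average excess coverage (AEC) for a feasible SCP solution x.
    Assumes a positive gap z_ip - h_star."""
    gap = z_ip - h_star
    gamma_rel = gap / h_star
    eps_rel, delta_rel = eps / gap, delta / gap  # these sum to 1 by identity (6)
    aec = float(np.mean(a @ x - 1.0))            # average excess coverage over rows
    return gamma_rel, eps_rel, delta_rel, aec
```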
The functions \(\varepsilon \) and \(\delta \) depend on both x and u. Hence, if there are alternative optimal primal or dual solutions, then the contributions of \(\varepsilon \) and \(\delta \) to the duality gap may vary between these solutions; this was noticed already in [6]. To study this aspect, we solved the two problems of minimizing and maximizing \(\delta (x,u^*)\) over the feasible set of SCP with the additional constraint \(\sum _{j\in \mathcal {J}} c_j x_j \le z^*_{\text {IP}}\).
Their optimal values give the full range for \(\delta (x,u^*)\) over all solutions to SCP that are at least as good as \(x^*\). These problems are actually sometimes harder to solve than the original SCP, but most were solved to proven optimality. Detailed results are given in Table 1. (The analysis of the full range of \(\delta (x,u)\) with respect to both optimal x and u is a much more complex task.)
As can be seen in the table, the primal–dual gap \(\varepsilon (x^*,u^*) + \delta (x^*,u^*)\) can be caused by either of the two terms. For the first five instances, the gap is vastly dominated by the violation of complementarity, while for the rail instances it can be composed of either the Lagrangian near-optimality term or the near-complementarity term, or a combination of the two. Further, large gaps are consistently caused solely by violation of complementarity, due to excess coverage of constraints.
Our observations can be utilized when designing core problem solution strategies for classes of set covering problems with known characteristics. A core problem is a restricted but feasible version of an original problem; it should be of a manageable size and is constructed by selecting a subset of the original variables, see for example [12]. Our results indicate that if the duality gap is expected to be large, then the near-optimality term can be expected to be relatively small. Since \(\varepsilon (x^*,u^*)=\sum _{j\in \mathcal {J}}\overline{c}_jx_j^* \ge 0\), it is then likely that \(x_j^*=0\) holds whenever \(\overline{c}_j\) is large, and such variables can therefore most likely be excluded from the core problem. If the gap is instead expected to be moderate, then the near-optimality term can be relatively large, and the core problem should then also contain variables with relatively large reduced costs. These conclusions give a theoretical justification of the core problem construction used in [12].
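To make this concrete, the sketch below outlines one possible reduced-cost-based core construction in the spirit of the discussion above; the selection rule and the parameters keep_frac and per_row are hypothetical illustrative choices, not the procedure used in [12].

```python
import numpy as np

def build_core(a, c, u_star, keep_frac=0.2, per_row=5):
    """Select a core set of columns for a set covering instance:
    columns with small LP reduced cost plus, to keep the core feasible,
    the per_row cheapest (by reduced cost) columns covering each row.
    keep_frac and per_row are illustrative tuning parameters."""
    cbar = c - u_star @ a                    # LP reduced costs
    k = max(1, int(keep_frac * len(c)))
    core = set(np.argsort(cbar)[:k])         # columns with smallest reduced costs
    for i in range(a.shape[0]):              # guarantee each row can be covered
        covering = np.flatnonzero(a[i] > 0)
        cheapest = covering[np.argsort(cbar[covering])[:per_row]]
        core.update(cheapest.tolist())
    return sorted(core)
```

The per-row safeguard keeps the reduced problem feasible, while the fraction of retained low-reduced-cost columns would be chosen more generously when the duality gap, and hence the near-optimality term, is expected to be moderate.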
4 Conclusion
We have extended the classical supporting hyperplane illustration of the duality gap for non-convex optimization problems by dissecting the gap into two contributions: near-optimality in the Lagrangian relaxation and near-complementarity in the Lagrangian relaxed constraints. This dissection improves the understanding of the nature of the duality gap. We have also demonstrated that it may have implications for the design of solution approaches.
References
1. Hiriart-Urruty, J.-B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms II. Springer, Berlin (1993)
2. Bazaraa, M.S., Sherali, H.D., Shetty, C.M.: Nonlinear Programming: Theory and Algorithms, 2nd edn. Wiley, New York (1993)
3. Bertsekas, D.P.: Nonlinear Programming, 2nd edn. Athena Scientific, Belmont (1999)
4. Bertsekas, D.P., Nedić, A., Ozdaglar, A.E.: Convex Analysis and Optimization. Athena Scientific, Belmont (2003)
5. Shapiro, J.F.: Mathematical Programming: Structures and Algorithms. Wiley, New York (1979)
6. Larsson, T., Patriksson, M.: Global optimality conditions for discrete and nonconvex optimization—with applications to Lagrangian heuristics and column generation. Oper. Res. 54, 436–453 (2006)
7. Zhao, Y., Larsson, T., Rönnberg, E.: An integer programming column generation principle for heuristic search methods. Int. Trans. Oper. Res. 27, 665–695 (2020)
8. Ngulo, U., Larsson, T., Quttineh, N.-H.: A dissection of the duality gap of set covering problems. In: Neufeld, J.S., Buscher, U., Lasch, R., Möst, D., Schönberger, J. (eds.) Operations Research Proceedings 2019, Selected Papers of the Annual International Conference of the German Operations Research Society (GOR), Dresden 2019, pp. 175–181. Springer, Cham (2020)
9. Wolsey, L.A.: Integer Programming. Wiley, Hoboken (1998)
10. Beasley, J.E.: OR-Library: distributing test problems by electronic mail. J. Oper. Res. Soc. 41(11), 1069–1072 (1990)
11. Beasley, J.E.: A Lagrangian heuristic for set-covering problems. Nav. Res. Logist. 37(1), 151–164 (1990)
12. Ceria, S., Nobili, P., Sassano, A.: A Lagrangian-based heuristic for large-scale set covering problems. Math. Program. 81(2), 215–228 (1998)
Acknowledgements
We thank the anonymous reviewers for valuable suggestions that led to a considerable improvement of the paper.
Funding
Open access funding provided by Linköping University.