Generalized derivatives of optimal-value functions with parameterized convex programs embedded

994 Accesses
Explore all metrics

Abstract

This article proposes new practical methods for furnishing generalized derivative information of optimal-value functions with embedded parameterized convex programs, with potential applications in nonsmooth equation-solving and optimization. We consider three cases of parameterized convex programs: (1) partial convexity—functions in the convex programs are convex with respect to decision variables for fixed values of parameters, (2) joint convexity—the functions are convex with respect to both decision variables and parameters, and (3) linear programs where the parameters appear in the objective function. These new methods calculate an LD-derivative, which is a recently established useful generalized derivative concept, by constructing and solving a sequence of auxiliary linear programs. In the general partial convexity case, our new method requires that the strong Slater conditions are satisfied for the embedded convex program’s decision space, and requires that the convex program has a unique optimal solution. It is shown that these conditions are essentially less stringent than the regularity conditions required by certain established methods, and our new method is at the same time computationally preferable over these methods. In the joint convexity case, the uniqueness requirement of an optimal solution is further relaxed, and to our knowledge, there is no established method for computing generalized derivatives prior to this work. In the linear program case, both the Slater conditions and the uniqueness of an optimal solution are not required by our new method.

Directional Derivative of the Value Function for Parametric Set-Constrained Optimization Problems

Article 11 June 2024

Second-Order Optimality Conditions for Constrained Optimization Problems with $C^1$ Data Via Regular and Limiting Subdifferentials

Article 28 June 2021

Value Functions and Their Directional Derivatives in Parametric Nonlinear Programming

Article 09 November 2015

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

This article considers parameterized convex programs of the form

$$\begin{aligned} \begin{aligned} \phi {(\textbf{y})}:=\min _{\textbf{x}\in {X}}\quad&f(\textbf{x},\textbf{y})\\ \text {s.t.}\quad&\textbf{g}(\textbf{x},\textbf{y})\le \textbf{0},\\&\textbf{h}(\textbf{x},\textbf{y})=\textbf{0}, \end{aligned} \end{aligned}$$

(1)

where the functions f, $\textbf{g}$, $\textbf{h}$ are continuously differentiable, $\phi $ is an optimal-value function in the sense of Thibault [51], and for each fixed $\textbf{y}$, the optimization problem is convex. It is well-known that even if the optimization problem in (1) is convex and if the functions f, $\textbf{g}$, and $\textbf{h}$ are sufficiently smooth, the function $\phi $ is still nonsmooth and nonconvex in general, or even discontinuous without certain regularity conditions (c.f. [4]). Due to such nonsmoothness, algorithms for equation-solving [10, 15, 40, 52] and optimization [31, 32, 53, 55] with $\phi $ embedded typically require furnishing $\phi $’s generalized derivatives, which are sensitivity concepts for nonsmooth functions generalized from smooth functions. This article will propose new implementable methods for furnishing the generalized derivatives of $\phi $, under mild conditions that can be satisfied in practice.

(Clarke’s) generalized Jacobian [6] is a point-to-set generalization of the classical derivative to locally Lipschitz continuous functions that are not differentiable everywhere, and is useful in methods for nonsmooth equation-solving or optimization. Semismooth Newton methods [40, 52] for equation-solving replace the usual derivatives employed in the classical Newton method with elements of the generalized Jacobian, and can exhibit Q-quadratic local convergence properties [39] similarly to their smooth counterparts [36]. Recently, a linear program (LP-)based Newton method [10, 15] even relaxes the non-singularity requirement of sensitivities in both classical and semismooth Newton methods, while keeping the same local convergence properties. Besides, bundle methods [31, 32] for nonsmooth optimization utilize generalized Jacobian elements to approximate the original nonsmooth functions locally at each iteration. Elements of the generalized Jacobian were also supplied to the smooth optimization solver IPOPT [54] to solve nonsmooth programs [53, 55], which turned out to outperform the bundle methods in some cases.

Though the generalized Jacobian elements are useful, they are typically difficult to compute in practice. One reason is that the generalized Jacobian only satisfy calculus rules as inclusions (see [5, Theorem 2.6.6]); often strict inclusions [1]. Nesterov’s lexicographic derivative (L-derivative) [35] and the lexicographic directional derivative (LD-derivative) [27] are useful for mitigating this problem. The L-derivative is another generalized derivative concept for nonsmooth functions. It has been shown in [26] that the L-derivatives are elements of the plenary hull of the generalized Jacobian whenever they exist. As a result, the L-derivatives are no less useful than generalized Jacobian elements in methods for equation-solving or optimization, in the sense that using L-derivatives in place of these elements does not affect the desirable convergence properties of these methods (c.f. [27, Sect. 2.2]). Moreover, for piecewise-differentiable functions (in the sense of Scholtes [47]) or scalar-valued functions such as $\phi $ above, L-derivatives are guaranteed to comprise a subset of the generalized Jacobian (c.f. [27, Sect. 3] and [35, Theorem 11], respectively). The LD-derivative [27] is a nonsmooth extension of the classical directional derivative, which provides a practical way for calculating the L-derivatives. From an LD-derivative, an L-derivative can be computed by solving a simple linear equation system. Unlike the generalized Jacobian, the LD-derivatives follow a sharp chain rule for composite functions, which enables a vector forward mode of automatic differentiation (AD) [27] or a reverse AD mode via branch-locking [25].

The goal of this article is to propose implementable methods for furnishing useful generalized derivative information for $\phi $ in (1). Unfortunately, established methods for estimating, characterizing, or computing generalized derivatives of optimal-value functions are of limited scope for parameterized nonlinear programs (NLPs). For parameterized NLPs whose parameters only appear in the objective function, sufficient conditions under which the optimal-value function is differentiable were given in [4, 38]. When the optimal-value function is not differentiable, Clarke [5] gave a characterization of the generalized Jacobian. For the situation where parameters only appear at constraints’ right-hand sides, Gauvin [16] estimated the generalized Jacobian as a subset of the convex hull of all KKT multipliers. If assuming parameterized convex NLPs and that the strong Slater conditions ([20, Definition 2.3.1]) hold, then the corresponding optimal-value function is convex, and the generalized Jacobian is identical to the set of all KKT multipliers (follows from [5, Proposition 1.2] and [20, Theorems 3.3.2 and 3.3.3]). For linear programs (LPs) parameterized by the constraints’ right-hand-side vectors, Höffner et al. [21] proposed a practical method for computing an LD-derivative by solving a sequence of auxiliary LPs. This method was extended to compute LD-derivatives for lexicographic LPs [18] and dynamic systems with LPs embedded [21]. The situation where parameters appear at the left-hand side of constraints, such as in (1), is in general more difficult to handle (and can readily cover the right-hand-side perturbations). Some results for estimating the Fréchet subgradients can be found in [34]. Rockafellar [43, 44] estimated the generalized Jacobian based on Lagrange multipliers (treat active inequality constraints as equalities) and the concept of proximal subgradients. When the decision space is closed and convex for fixed parameters, Thibault [51] estimated the generalized Jacobian of the optimal-value function using generalized Jacobians of the support functions of the decision space. De Wolf and Smeers [9] characterized the generalized Jacobian of LPs parameterized by the matrix coefficients in constraints, which were then applied to solve nonsmooth NLPs arising in the applications of natural gas transmission networks. Other studies (e.g. [4, 7, 8, 22, 41, 45], among many) focus on directional differentiability of optimal-value functions. However, such directional derivatives are not enough for furnishing an L-derivative or an element in the generalized Jacobian [1], which are in turn useful in algorithms for nonsmooth equation-solving and optimization.

There are also extensive studies (refer to [4, 11, 12, 14, 29]) focusing on generalized sensitivity information of the optimal-solution mapping of parameterized programs. Given such information, generalized derivatives of the optimal-value function may be easily computed. Recent works computed an LD-derivative of the optimal-solution mapping of parametric NLPs by solving a nonsmooth and nonlinear equation system [49], or by solving a sequence of quadratic programs [48], both under the linear independence constraint qualification (LICQ) and strong second-order sufficient condition. Under these conditions, the optimal solution is guaranteed to be unique around the parameter values of interest. Then, an LD-derivative of the optimal-value function follows from the sharp chain rule for LD-derivatives [27]. These methods could in principle be used to compute an LD-derivative of $\phi $ in (1). However, these methods cannot deal with parameterized convex NLPs with non-unique optimal solutions, and the LICQ cannot be verified unless the original NLP is solved, and is also somewhat stringent in convex optimization. For example, LICQ is not satisfied at any degenerate vertex (c.f. [3, Sect. 2.4]) of a polyhedral decision space. However, to the authors’ knowledge, these are the only established methods that may be used to achieve the goal of this article.

In this article, we propose new practical methods for evaluating useful LD-derivatives of the optimal-value function $\phi $ with parameterized convex NLPs embedded of the form (1). As mentioned, LD-derivatives can be used for computing L-derivatives, and the L-derivatives are guaranteed to be valid generalized Jacobian elements for optimal-value functions, which are in turn useful for nonsmooth equation-solving and optimization. Moreover, such LD-derivatives follow a sharp chain rule for composite functions, and thus allowing treatment of complex problems with $\phi $ embedded (in contrary to methods only computing generalized Jacobian elements e.g. [5, 9]). We consider three cases of parameterized convex NLPs of the form (1):

Partial convexity – the functions f, $\textbf{g}$, and $\textbf{h}$ are convex with respect to $\textbf{x}$, for each fixed $\textbf{y}$.
Joint convexity – the functions f, $\textbf{g}$, and $\textbf{h}$ are jointly convex with respect to both $\textbf{x}$ and $\textbf{y}$.
Parameterized LP’s objective function – the optimization problem in (1) is an LP wherein $\textbf{y}$ only appears in the objective function.

We propose new methods for computing the LD-derivatives for each of the three cases mentioned above. In each case, the new method computes the LD-derivatives by solving a sequence of LPs, of which the computational complexity grows polynomially with the size of the problem. In the partial convexity case, our new method is in general computationally preferable over the established methods [48, 49], which solve a nonsmooth and nonlinear equation system or a sequence of quadratic programs. The new method requires that the strong Slater conditions hold for the parameterized convex NLPs and requires that there exists exactly one optimal solution at the parameter value of interest. These conditions are essentially less stringent than the regularity conditions in [48, 49] (as will be discussed in Sect. 3.1), and thus our new method is applicable to a broader class of problems with partial convexity. We nevertheless note that [48, 49] primarily aim at calculating LD-derivatives of the optimal-solution mapping for general nonconvex parameterized NLPs, which intuitively needs more restrictive assumptions than our purpose. In the joint convexity case, the uniqueness requirement of an optimal solution is further relaxed; our new method only assumes a nonempty and bounded optimal solutions set for the parameters’ values of interest. Due to such non-uniqueness of optimal solutions, the methods in [48, 49] are no longer applicable. To the authors’ knowledge, prior to this article, there is no established method for furnishing computationally relevant generalized derivatives in this case. In the case of parameterized LP’s objective function, both the Slater conditions and the uniqueness requirement of an optimal solution are not required, and the new method necessarily yields an element of the B-subdifferential (see Definition 1 below). It has been shown [27] that the B-subdifferential-based Newton method [30] exhibits local Q-quadratic convergence under less stringent conditions than the semismooth Newton method [40] based on the generalized Jacobian.

The rest of this article is structured as follows. Section 2 presents mathematical preliminaries including notational convention, Clarke generalized Jacobian, lexicographic derivative, and set-valued mapping. Sections 3.1, 3.2, and 3.3 respectively present the new methods for computing an LD-derivative of $\phi $ in the partial convexity, joint convexity, and parameterized LP’s objective function cases.

2 Preliminaries

2.1 Notation

Throughout this article, scalars are denoted as lowercase letters (e.g. $x\in {\mathbb {R}}$), vectors are denoted as boldface lowercase letters (e.g. $\textbf{x}\in {\mathbb {R}}^n$), and the $i^{\textrm{th}}$ component of a vector $\textbf{x}$ is denoted as $x_i$. $\textbf{0}$ denotes a vector with all elements to be 0, whose dimensions are clear from the context. Matrices are denoted as boldface uppercase letters (e.g. $\textbf{M}\in {\mathbb {R}}^{n\times m}$). The $j^{\textrm{th}}$ column of a matrix $\textbf{M}$ is denoted as $\textbf{m}_{(j)}$. Sets are denoted as uppercase letters (e.g. $X\subset {\mathbb {R}}^n$). The interior of a set X is denoted as $\textrm{int}(X)$. Inequalities involving vectors are to be interpreted componentwise. The standard Euclidean norm $\Vert \cdot \Vert $ is considered on ${\mathbb {R}}^n$. For a function $\textbf{f}:{\mathbb {R}}^n\rightarrow {\mathbb {R}}^m$ that is (Fréchet) differentiable at $\textbf{x}\in {\mathbb {R}}^n$, $\textbf{J}\textbf{f}(\textbf{x})\in {\mathbb {R}}^{m\times n}$ denotes the Jacobian of $\textbf{f}$ at $\textbf{x}$. For a scalar-valued differentiable function f of $(\textbf{x},\textbf{y})$, $\nabla _\textbf{x}f\in {\mathbb {R}}^{n_x}$ and $\nabla _\textbf{y}f\in {\mathbb {R}}^{n_y}$ denote the partial derivative of f with respect to $\textbf{x}$ and $\textbf{y}$, respectively. A vector-valued function $\textbf{f}$ is said to be convex if each component $f_i$ is convex.

2.2 Generalized Jacobian

Clarke’s generalized Jacobian [6] defined below is a pertinent generalized derivative concept for locally Lipschitz continuous functions that are not differentiable everywhere.

Definition 1

(from [27], originally from [6]) Given an open set $X\subset {\mathbb {R}}^n$ and a locally Lipschitz continuous function $\textbf{f}:X\rightarrow {\mathbb {R}}^m$, let $S\subset X$ be the set on which $\textbf{f}$ is differentiable. The B-subdifferential of $\textbf{f}$ at $\textbf{x}\in X$ is defined as

$$\begin{aligned} \partial _\textrm{B}\textbf{f}(\textbf{x}):=\{\textbf{H}\in {\mathbb {R}}^{m\times n}:\textbf{H}=\lim _{i\rightarrow \infty }\textbf{J}\textbf{f}(\textbf{x}^{(i)}),\quad \textbf{x}=\lim _{i\rightarrow \infty }\textbf{x}^{(i)},\quad \textbf{x}^{(i)}\in S,\;\forall i\in {\mathbb {N}}\}. \end{aligned}$$

The (Clarke) generalized Jacobian of $\textbf{f}$ at $\textbf{x}$, denoted as $\partial \textbf{f}(\textbf{x})$, is the convex hull of $\partial _\textrm{B}\textbf{f}(\textbf{x})$, and is nonempty, compact, and convex.

As introduced in Sect. 1, elements of the generalized Jacobian are used in the semismooth Newton method [40] and LP-Newton method [10, 15] for nonsmooth equation-solving, and in bundle methods [31, 32] for nonsmooth optimization.

2.3 Lexicographic derivatives

This section introduces Nesterov’s lexicographic derivatives [35], which have been shown to be no less useful than the generalized Jacobian for nonsmooth equation-solving and optimization [26, 35].

Definition 2

(from [47]) Given an open set $X\subset {\mathbb {R}}^n$, a function $\textbf{f}:X\rightarrow {\mathbb {R}}^m$, some $\textbf{x}\in X$, and some $\textbf{d}\in {\mathbb {R}}^n$, the limit

$$\begin{aligned} \underset{\alpha \downarrow 0}{\lim }\frac{\textbf{f}(\textbf{x}+\alpha \textbf{d})-\textbf{f}(\textbf{x})}{\alpha } \end{aligned}$$

is called the directional derivative of $\textbf{f}$ at $\textbf{x}$ if and only if it exists, and is denoted as $\textbf{f}'(\textbf{x};\textbf{d})$. The function $\textbf{f}$ is directionally differentiable at $\textbf{x}$ if and only if $\textbf{f}'(\textbf{x};\textbf{d})$ exists and is finite for each $\textbf{d}\in {\mathbb {R}}^n$. The function $\textbf{f}$ is directionally differentiable if and only if it is directionally differentiable at each point in its domain.

Definition 3

(from [27], originally from [35]) Let $X\subset {\mathbb {R}}^n$ be open and $\textbf{f}:X\rightarrow {\mathbb {R}}^m$ be locally Lipschitz continuous and directionally differentiable at $\textbf{x}\in X$. The function $\textbf{f}$ is lexicographically smooth (L-smooth) at $\textbf{x}$ if and only if for any $p\in {\mathbb {N}}$ and any matrix $\textbf{M}:=[\textbf{m}_{(1)}\;\textbf{m}_{(2)}\;\cdot \cdot \cdot \;\textbf{m}_{(p)}]\in {\mathbb {R}}^{n\times p}$, the following homogenization sequence of functions is well defined:

$$\begin{aligned} \begin{aligned}&\textbf{f}^{(0)}_{\textbf{x},\textbf{M}}:{\mathbb {R}}^n\rightarrow {\mathbb {R}}^m:\textbf{d}\mapsto \textbf{f}'(\textbf{x};\textbf{d}),\\&\textbf{f}^{(j)}_{\textbf{x},\textbf{M}}:{\mathbb {R}}^n\rightarrow {\mathbb {R}}^m:\textbf{d}\mapsto [\textbf{f}^{(j-1)}_{\textbf{x},\textbf{M}}]'(\textbf{m}_{(j)};\textbf{d}),\quad \forall j\in \{1,\ldots ,p\}. \end{aligned} \end{aligned}$$

The function $\textbf{f}$ is L-smooth if and only if it is L-smooth at each point in its domain. When the columns of $\textbf{M}$ span ${\mathbb {R}}^n$, $\textbf{f}^{(p)}_{\textbf{x},\textbf{M}}$ is guaranteed to be linear. If $p=n$ and $\textbf{M}\in {\mathbb {R}}^{n\times n}$ is non-singular, then a lexicographic derivative (L-derivative) of $\textbf{f}$ at $\textbf{x}$ in the directions $\textbf{M}$, denoted as $\textbf{J}_\textrm{L}\textbf{f}(\textbf{x};\textbf{M})$, is defined as

$$\begin{aligned} \textbf{J}_\textrm{L}\textbf{f}(\textbf{x};\textbf{M}):=\textbf{J}\textbf{f}^{(n)}_{\textbf{x},\textbf{M}}(\textbf{0})\in {\mathbb {R}}^{m\times n}, \end{aligned}$$

and the lexicographic subdifferential $\partial _\textrm{L}\textbf{f}(\textbf{x})$ is defined as

$$\begin{aligned} \partial _\textrm{L}\textbf{f}(\textbf{x}):=\{\textbf{J}_\textrm{L}\textbf{f}(\textbf{x};\textbf{M})\in {\mathbb {R}}^{m\times n}:\forall \textbf{M}\in {\mathbb {R}}^{n\times n}\text { and non-singular}\}. \end{aligned}$$

It has been shown that $\partial _\textrm{L}\textbf{f}(\textbf{x})\subset \partial \textbf{f}(\textbf{x})$ if $\textbf{f}$ is piecewise differentiable in the sense of Scholtes [47] ( [27, Sect. 3]), or if $\textbf{f}$ is a scalar-valued function ([35, Theorem 11]). Moreover, for vector-valued functions but not necessarily piecewise differentiable, Khan and Barton [26] showed that the L-derivatives are elements of the plenary hull [50] of the generalized Jacobian whenever they exists. These elements are no less useful than the generalized Jacobian in methods for nonsmooth equation-solving and optimization, in the sense that using these in place of the generalized Jacobian elements will not affect the convergence results of these methods (c.f. [26, Sect. 2.4]).

The following definition of lexicographic directional derivative provides a convenient way for evaluating an L-derivative.

Definition 4

(from [27]) Given an open set $X\subset {\mathbb {R}}^n$, a locally Lipschitz continuous function $\textbf{f}:X\rightarrow {\mathbb {R}}^m$ that is L-smooth at $\textbf{x}\in X$, and a matrix $\textbf{M}:=[\textbf{m}_{(1)}\;\cdot \cdot \cdot \;\textbf{m}_{(p)}]\in {\mathbb {R}}^{n\times p}$, define the lexicographic directional derivative (LD-derivative) as

$$\begin{aligned} \textbf{f}'(\textbf{x};\textbf{M}):=[\textbf{f}^{(0)}_{\textbf{x},\textbf{M}}(\textbf{m}_{(1)})\;\textbf{f}^{(1)}_{\textbf{x},\textbf{M}}(\textbf{m}_{(2)})\;\cdot \cdot \cdot \;\textbf{f}^{(p-1)}_{\textbf{x},\textbf{M}}(\textbf{m}_{(p)})]. \end{aligned}$$

As shown in [27], such LD-derivatives are particularly useful for evaluating the L-derivatives. Firstly, it holds that

$$\begin{aligned} \textbf{f}'(\textbf{x};\textbf{M})\equiv [\textbf{f}^{(p)}_{\textbf{x},\textbf{M}}(\textbf{m}_{(1)})\;\cdot \cdot \cdot \;\textbf{f}^{(p)}_{\textbf{x},\textbf{M}}(\textbf{m}_{(p)})]. \end{aligned}$$

Furthermore, since $\textbf{f}^{(n)}_{\textbf{x},\textbf{M}}$ is linear, if $\textbf{M}\in {\mathbb {R}}^{n\times n}$ is non-singular, $\textbf{f}'(\textbf{x};\textbf{M})$ and $\textbf{J}_\textrm{L}\textbf{f}(\textbf{x};\textbf{M})$ are related by

$$\begin{aligned} \textbf{f}'(\textbf{x};\textbf{M})\equiv \textbf{J}_\textrm{L}\textbf{f}(\textbf{x};\textbf{M})\textbf{M}. \end{aligned}$$

(2)

Secondly, the LD-derivatives follow a sharp chain rule (c.f. [27, Proposition 2.2] and [18, Theorems 2.1 and 2.2]) for L-smooth composite functions, which enables a vector forward mode of automatic differentiation [27].

2.4 Set-valued mappings

This section summarizes some closedness, boundedness, and Lipschitzian properties of set-valued mappings.

Definition 5

(from [46]) Given $Y\subset {\mathbb {R}}^n$ and $X\subset {\mathbb {R}}^m$, a set-valued mapping $\Omega :Y\rightrightarrows X$ maps each element of Y to a subset of X.

Definition 6

(from [23]) Given $Y\subset {\mathbb {R}}^n$ and $X\subset {\mathbb {R}}^m$, a set-valued mapping $\Omega :Y\rightrightarrows X$ is closed at $\bar{\textbf{y}}\in Y$ if for some sequence $\{\textbf{y}^k\}\subset Y$ such that $\bar{\textbf{y}}=\lim _{k\rightarrow \infty }\textbf{y}^k$, and some sequence $\{\textbf{x}^k\}$ such that $\textbf{x}^k\in \Omega (\textbf{y}^k)$ and $\bar{\textbf{x}}=\lim _{k\rightarrow \infty }\textbf{x}^k$, it follows that $\bar{\textbf{x}}\in \Omega (\bar{\textbf{y}})$.

Definition 7

(from [23]) Given $X\subset {\mathbb {R}}^m$, an open $Y\subset {\mathbb {R}}^n$, and a $\bar{\textbf{y}}\in Y$, a set-valued mapping $\Omega :Y\rightrightarrows X$ is uniformly bounded near $\bar{\textbf{y}}$, if there exists a neighborhood $N_{\bar{\textbf{y}}}\subset Y$ of $\bar{\textbf{y}}$ such that $\cup _{\textbf{y}\in N_{\bar{\textbf{y}}}}\Omega (\textbf{y})$ is bounded.

Definition 8

(from [28]) For some $\textbf{x}\in {\mathbb {R}}^n$ and $Z\subset {\mathbb {R}}^n$, let

$$\begin{aligned} d(\textbf{x},Z):=\inf \{\Vert \textbf{x}-\textbf{z}\Vert :\textbf{z}\in Z\}. \end{aligned}$$

Then, given $X^\textrm{A},X^\textrm{B}\subset {\mathbb {R}}^n$, define the Hausdorff distance between $X^\textrm{A}$ and $X^\textrm{B}$ as

$$\begin{aligned} d_\textrm{H}(X^\textrm{A},X^\textrm{B}):=\inf \{k:d(\textbf{x}^\textrm{A},X^\textrm{B})\le k, d(\textbf{x}^\textrm{B},X^\textrm{A})\le k, \textbf{x}^\textrm{A}\in X^\textrm{A},\textbf{x}^\textrm{B}\in X^\textrm{B}\}. \end{aligned}$$

Definition 9

(from [28]) Given $X\subset {\mathbb {R}}^m$, an open $Y\subset {\mathbb {R}}^n$, and $\bar{\textbf{y}}\in Y$, a set-valued mapping $\Omega :Y\rightrightarrows X$ is Lipschitzian near $\bar{\textbf{y}}$ if there exist a neighborhood $N_{\bar{\textbf{y}}}$ of $\bar{\textbf{y}}$ and a $l>0$ such that

$$\begin{aligned} d_\textrm{H}(\Omega (\textbf{y}^\textrm{A}),\Omega (\textbf{y}^\textrm{B}))\le l\Vert \textbf{y}^\textrm{A}-\textbf{y}^\textrm{B}\Vert ,\quad \forall \textbf{y}^\textrm{A},\textbf{y}^\textrm{B}\in N_{\bar{\textbf{y}}}. \end{aligned}$$

3 Computing generalized derivatives

This section presents our new methods for computing an LD-derivative of $\phi $ of the form (1), which is specialized to the partial convexity, joint convexity, and parameterized LP’s objective function cases as introduced in Sect. 1. These new methods compute an LD-derivative by solving a sequence of easily-constructed LPs. From an LD-derivative, an L-derivative of $\phi $ is readily computed via (2), provided a non-singular directions matrix $\textbf{M}\in {\mathbb {R}}^{n_y\times n_y}$ (the simplest is to employ an identity matrix $\textbf{M}:=\textbf{I}$). Since $\phi $ is scalar-valued, [35, Theorem 11] indicates that the L-derivatives are elements of Clarke generalized Jacobian, which are in turn useful in nonsmooth equation-solving and optimization. Moreover, the sharp chain rule of LD-derivatives allows to evaluate generalized derivatives for more complex problems with $\phi $ embedded.

With a slightly abuse of notation, the same letters may be used to represent different quantities in different subsections. Nevertheless, these letters will have consistent meanings in each subsection.

3.1 Partial convexity

The following assumption formalizes a parameterized convex NLP with partial convexity considered in this subsection.

Assumption 1

Let $X\subset {\mathbb {R}}^{n_x}$ be open and convex, and let $Y\subset {\mathbb {R}}^{n_y}$ be open. Consider functions $f:{X}\times Y\rightarrow {\mathbb {R}}$, $\textbf{g}:{X}\times Y\rightarrow {\mathbb {R}}^m$, and $\textbf{h}:{X}\times Y\rightarrow {\mathbb {R}}^l$. Consider the following parameterized NLP:

$$\begin{aligned} \begin{aligned} \phi (\textbf{y}):=\min _{\textbf{x}\in {X}}\quad&f(\textbf{x},\textbf{y})\\ \text {s.t.}\quad&\textbf{g}(\textbf{x},\textbf{y})\le \textbf{0},\\&\textbf{h}(\textbf{x},\textbf{y})=\textbf{0}. \end{aligned} \end{aligned}$$

(3)

For some $\bar{\textbf{y}}\in Y$ and some neighborhood $N_{\bar{\textbf{y}}}\subset Y$ of $\bar{\textbf{y}}$, define the parameterized decision space as a set-valued mapping $\Omega :N_{\bar{\textbf{y}}}\rightrightarrows X$ such that for each $\textbf{y}\in N_{\bar{\textbf{y}}}$,

$$\begin{aligned} \Omega (\textbf{y}):=\{\textbf{x}\in X: \textbf{g}(\textbf{x},\textbf{y})\le \textbf{0}\text { and }\textbf{h}(\textbf{x},\textbf{y})=\textbf{0}\}. \end{aligned}$$

Suppose that the following conditions hold:

I.1
the functions f, $\textbf{g}$, and $\textbf{h}$ are continuously differentiable on ${X}\times N_{\bar{\textbf{y}}}$,
I.2
the optimization problem in (3) for $\textbf{y}:=\bar{\textbf{y}}$ has exactly one optimal solution, denoted as $\bar{\textbf{x}}^*$,
I.3
there exists a closed set ${\tilde{X}}\subset X$ such that $\cup _{\textbf{y}\in N_{\bar{\textbf{y}}}}\Omega (\textbf{y})\subset {\tilde{X}}$,

and for each $\textbf{y}\in N_{\bar{\textbf{y}}}$,

I.4
$\phi ({\textbf{y}})$ is finite,
I.5
the mappings $f(\cdot ,\textbf{y})$ and $\textbf{g}(\cdot ,\textbf{y})$ are convex,
I.6
the function $\textbf{h}(\cdot ,\textbf{y})$ is affine,
I.7
the strong Slater conditions hold for $\Omega (\textbf{y})$: since Condition I.6 holds, for each $i\in \{1,\ldots ,l\}$, let $\textbf{a}^i_{\textbf{y}}:=\nabla _\textbf{x}h_i(\textbf{x},{\textbf{y}}),\forall \textbf{x}\in X$, suppose that the vectors $\textbf{a}^i_{\textbf{y}},\forall i\in \{1,\ldots ,l\}$ are linearly independent and there exists a $\varvec{{\xi }}_\textbf{y}\in X$ (can be dependent on $\textbf{y}$) such that $\textbf{g}(\varvec{{\xi }}_\textbf{y},{\textbf{y}})< \textbf{0}$ and $\textbf{h}(\varvec{{\xi }}_\textbf{y},{\textbf{y}})=\textbf{0}$.

Observe that for each $\textbf{y}\in N_{\bar{\textbf{y}}}$, the optimization problem in (3) is convex. Then, (3) is called a parameterized convex NLP with partial convexity.

Condition I.3 can often be satisfied in practice, and if $X\equiv {\mathbb {R}}^{n_x}$, this condition is always satisfied with ${\tilde{X}}:={\mathbb {R}}^{n_x}$. Condition I.2 can be satisfied if the objective function $f(\cdot ,\textbf{y})$ is strictly convex, or if certain regularity conditions are satisfied (e.g. [41, 48, 49]). This condition can be relaxed in both the joint convexity and parameterized LP cases, as will be seen in Sects. 3.2 and 3.3.

For a special case where the parameters $\textbf{y}$ only appear in the objective function, [4, Remark 4.14] indicates that the optimal-value function $\phi $ in Assumption 1 is differentiable, and gives a practical method for calculating the gradients. In other general nonsmooth cases, [48, 49] are the only established methods that can be used for calculating a generalized derivative of $\phi $. These methods compute an LD-derivative of a unique optimal-solution function, and the LD-derivative of $\phi $ follows from a sharp chain rule. We note that in the partial convexity setting (i.e., Conditions I.4–I.6 hold), Conditions I.1, I.2 and I.7 are essentially less stringent than the required regularity conditions in [48, 49] for the following reasons. [48, 49] require that the linear independence constraint qualification and the strong second-order sufficient condition hold at an optimal solution, and that the functions f, $\textbf{g}$, and $\textbf{h}$ to be second-order continuously differentiable. Under these conditions, the parameterized primal optimal solution and dual optimal solution are guaranteed to be unique near $\bar{\textbf{y}}$, and to be piecewise-differentiable functions on $N_{\bar{\textbf{y}}}$. Notably, this directly implies Conditions I.1, I.2, and I.7 (see [20, Theorem 2.3.2] for satisfaction of Condition I.7), but not the converse. Interestingly, our method requires the optimal solution at $\bar{\textbf{y}}$ to be unique, but not necessarily around $\bar{\textbf{y}}$. We nevertheless note that [48, 49] primarily aim at calculating LD-derivatives of the optimal-solution mapping for general nonconvex parameterized NLPs, which intuitively needs more restrictive assumptions than our purpose.

The following proposition establishes the locally Lipschitz continuity of $\phi $ in Assumption 1, which is essential for $\phi $ to be L-smooth.

Proposition 1

Under Assumption 1, the function $\phi $ is locally Lipschitz continuous at $\bar{\textbf{y}}$.

Proof

Let $W\subsetneq X$ be a bounded open convex set such that $\bar{\textbf{x}}^*\in W$. Let ${\tilde{N}}_{\bar{\textbf{y}}}\subsetneq N_{\bar{\textbf{y}}}$ be a neighborhood of $\bar{\textbf{y}}$. Since Condition I.1 holds, it follows that the functions f, ${\textbf{g}}$, and ${\textbf{h}}$ are (globally) Lipschitz continuous on $W\times {\tilde{N}}_{\bar{\textbf{y}}}$. Since Condition I.2 holds, the optimal solutions set of the NLP for $\textbf{y}:=\bar{\textbf{y}}$ in (3) is nonempty and bounded. Combining these with Conditions I.3 ,I.5, I.6, and I.7 in Assumption 1, according to [28, Theorem 1], we obtain that the infimum function

$$\begin{aligned} \begin{aligned} {\tilde{\phi }}(\textbf{y}):=\inf _{\textbf{x}\in X}\quad&{f}(\textbf{x},\textbf{y})\\ \text {s.t.}\quad&{\textbf{g}}(\textbf{x},\textbf{y})\le \textbf{0},\\&{\textbf{h}}(\textbf{x},\textbf{y})=\textbf{0}, \end{aligned} \end{aligned}$$

(4)

is locally Lipschitz continuous at $\bar{\textbf{y}}$. Since ${\tilde{\phi }}(\textbf{y})\equiv \phi (\textbf{y}),\forall \textbf{y}\in N_{\bar{\textbf{y}}}$ due to Condition I.4, it follows that $\phi $ is locally Lipschitz continuous at $\bar{\textbf{y}}$. $\square $

Remark 1

Note that [28, Theorem 1] used in the proof above requires that the set $X\equiv {\mathbb {R}}^{n_x}$, while here X is an open set. However, since the decision variables $\textbf{x}$ never leave X, relevant properties of f, $\textbf{g}$, and $\textbf{h}$ only need to be held on X. Moreover, since Condition I.3 holds, [28, Theorem 1] is indeed applicable here with the justification above.

The following Lemmata establish important regularity properties of the primal and dual optimal solutions mappings of the optimization problem near $\bar{\textbf{y}}$ in (3).

Lemma 1

Suppose that Assumption 1 holds. Define $M:N_{\bar{\textbf{y}}}\rightrightarrows {\mathbb {R}}^{n_x}$ so that for each $\textbf{y}\in N_{\bar{\textbf{y}}}$, $M(\textbf{y})$ is the set of optimal solutions of the optimization problem in (3), i.e.,

$$\begin{aligned} M(\textbf{y}):=\{\textbf{x}\in \Omega (\textbf{y}):\phi (\textbf{y})= f(\textbf{x},\textbf{y})\}. \end{aligned}$$

Then, M is nonempty and uniformly bounded near $\bar{\textbf{y}}$.

Proof

Condition I.4 implies that M is nonempty near $\bar{\textbf{y}}$.

For some $\epsilon >0$, define $M_{\epsilon }:N_{\bar{\textbf{y}}}\rightrightarrows {\mathbb {R}}^{n_x}$ such that for each $\textbf{y}\in N_{\bar{\textbf{y}}}$, $M_{\epsilon }(\textbf{y})$ is the set of all $\epsilon $-optimal solutions of the NLP in (3), i.e.,

$$\begin{aligned} M_{\epsilon }(\textbf{y}):=\{\textbf{x}\in \Omega (\textbf{y}): f(\textbf{x},\textbf{y})\le \phi (\textbf{y})+\epsilon \}. \end{aligned}$$

Following the arguments in proof of Proposition 1 and applying [28, Theorem 1], we obtain that $M_{\epsilon }$ is Lipschitzian near $\bar{\textbf{y}}$ as in Definition 9. Moreover, since $M(\bar{\textbf{y}})$ is bounded due to Condition I.2, according to [42, Corollary 8.7.1], $M_{\epsilon }(\bar{\textbf{y}})$ is bounded. Now, according to Definition 9, there exist a neighborhood ${\hat{N}}_{\bar{\textbf{y}}}\subset N_{\bar{\textbf{y}}}$ of $\bar{\textbf{y}}$, a $l>0$, and a $\sigma >0$ such that

$$\begin{aligned} d_{\textrm{H}}(M_\epsilon (\textbf{y}),M_\epsilon (\bar{\textbf{y}}))\le l\Vert \textbf{y}-\bar{\textbf{y}}\Vert \le l\sigma , \quad \forall \textbf{y}\in {\hat{N}}_{\bar{\textbf{y}}}. \end{aligned}$$

This implies that for each $\textbf{y}\in {\hat{N}}_{\bar{\textbf{y}}}$ and each $\textbf{x}^\textrm{B}\in M_\epsilon (\textbf{y})$,

$$\begin{aligned} d(\textbf{x}^\textrm{B},M_\epsilon (\bar{\textbf{y}}))\equiv \inf \{\Vert \textbf{x}^\textrm{B}-\textbf{x}^\textrm{A}\Vert :\textbf{x}^\textrm{A}\in M_\epsilon (\bar{\textbf{y}})\}\le l\sigma . \end{aligned}$$

Since $M_\epsilon (\bar{\textbf{y}})$ is bounded, define ${\tilde{M}}_{\epsilon }(\bar{\textbf{y}})$ to be the closure of $M_\epsilon (\bar{\textbf{y}})$. Then, we have

$$\begin{aligned} d(\textbf{x}^\textrm{B},M_\epsilon (\bar{\textbf{y}}))\equiv \min \{\Vert \textbf{x}^\textrm{B}-\tilde{\textbf{x}}^\textrm{A}\Vert :\tilde{\textbf{x}}^\textrm{A}\in {\tilde{M}}_\epsilon (\bar{\textbf{y}})\}\le l\sigma . \end{aligned}$$

Now, consider an arbitrary $\textbf{y}\in {\hat{N}}_{\bar{\textbf{y}}}$ and an arbitrary $\textbf{x}^\textrm{B}\in M_{\epsilon }(\textbf{y})$. We may find a $\tilde{\textbf{x}}^{\textrm{A}*}\in {\tilde{M}}_\epsilon (\bar{\textbf{y}})$ such that

$$\begin{aligned} d(\textbf{x}^\textrm{B},M_\epsilon (\bar{\textbf{y}}))\equiv \Vert \textbf{x}^\textrm{B}-\tilde{\textbf{x}}^{\textrm{A}*}\Vert . \end{aligned}$$

Moreover, since ${\tilde{M}}_\epsilon (\bar{\textbf{y}})$ is bounded, there exists an $\alpha >0$ such that $\Vert \tilde{\textbf{x}}^A\Vert \le \alpha ,\forall \tilde{\textbf{x}}^A\in {\tilde{M}}_\epsilon (\bar{\textbf{y}})$. Then, it follows that

$$\begin{aligned} \begin{aligned} \Vert \textbf{x}^\textrm{B}\Vert -\Vert \tilde{\textbf{x}}^{\textrm{A}*}\Vert&\le \Vert \textbf{x}^\textrm{B}-\tilde{\textbf{x}}^{\textrm{A}*}\Vert \le l\sigma \\ \Vert \textbf{x}^\textrm{B}\Vert&\le l\sigma + \Vert \tilde{\textbf{x}}^{A*}\Vert \le l\sigma +\alpha . \end{aligned} \end{aligned}$$

Since both the choices of $\textbf{y}\in {\hat{N}}_{\bar{\textbf{y}}}$ and $\textbf{x}^\textrm{B}\in M_\epsilon (\bar{\textbf{y}})$ are arbitrary above, it implies that $M_\epsilon $ is uniformly bounded on ${\hat{N}}_{\bar{\textbf{y}}}$. Since $M(\textbf{y})\subset M_\epsilon (\textbf{y}),\forall \textbf{y}\in {\hat{N}}_{\bar{\textbf{y}}}$, it follows that M is uniformly bounded on ${\hat{N}}_{\bar{\textbf{y}}}$ too. $\square $

Remark 2

The proof of Lemma 1 implies that in general, if a set-valued mapping is bounded at a point, and if the mapping is also locally Lipschitzian at this point, then the mapping must also be locally bounded.

Lemma 2

Under Assumption 1, the set-valued mapping M in Lemma 1 is closed at $\bar{\textbf{y}}$ as in Definition 6.

Proof

Since Condition I.3 holds, for each $\textbf{y}\in N_{\bar{\textbf{y}}}$,

$$\begin{aligned} \Omega (\textbf{y})\equiv \{\textbf{x}\in {\tilde{X}}:\textbf{g}(\textbf{x},\textbf{y})\le \textbf{0}\text { and }\textbf{h}(\textbf{x},\textbf{y})=\textbf{0}\}. \end{aligned}$$

Moreover, since the functions $\textbf{g}$ and $\textbf{h}$ are continuous by assumption, the set-valued mapping $\Omega $ is closed at $\textbf{y}$ due to [22, Theorem A.5].

Since it has been established that M is nonempty near $\bar{\textbf{y}}$ in Lemma 1, we may choose a sequence $\{\textbf{y}^k\}\subset N_{\bar{\textbf{y}}}$ such that $\lim _{k\rightarrow \infty }\textbf{y}^k=\bar{\textbf{y}}$ and sequence $\{\textbf{x}^k\}$ such that $\textbf{x}^k\in M(\textbf{y}^k)$ and $\lim _{k\rightarrow \infty }\textbf{x}^k=\bar{\textbf{x}}$. Then, we have

$$\begin{aligned} f(\bar{\textbf{x}},\bar{\textbf{y}})=\lim _{k\rightarrow \infty }f(\textbf{x}^k,\textbf{y}^k)=\lim _{k\rightarrow \infty }\phi (\textbf{y}^k). \end{aligned}$$

Since $\phi $ is continuous near $\bar{\textbf{y}}$ due to Proposition 1 and since $\lim _{k\rightarrow \infty }\textbf{y}^k=\bar{\textbf{y}}$, we have $\lim _{k\rightarrow \infty }\phi (\textbf{y}^k)=\phi (\bar{\textbf{y}})$, which implies that $ f(\bar{\textbf{x}},\bar{\textbf{y}})=\phi (\bar{\textbf{y}})$. Moreover, since $\Omega $ is closed at $\bar{\textbf{y}}$, it follows that $\bar{\textbf{x}}\in \Omega (\bar{\textbf{y}})$. Thus, $\bar{\textbf{x}}\in M(\bar{\textbf{y}})$ and M is closed at $\bar{\textbf{y}}$. $\square $

Lemma 3

Suppose that Assumption 1 holds. Define the Lagrange dual function $\ell :X\times N_{\bar{\textbf{y}}}\times {\mathbb {R}}^m\times {\mathbb {R}}^l\rightarrow {\mathbb {R}}$ of the parameterized convex NLP in (3) such that for each $(\textbf{x},\textbf{y},\varvec{{\lambda }},\varvec{{\mu }})\in X\times N_{\bar{\textbf{y}}}\times {\mathbb {R}}^m\times {\mathbb {R}}^l $,

$$\begin{aligned} \ell (\textbf{x},\textbf{y},\varvec{{\lambda }},\varvec{{\mu }}):=f(\textbf{x},\textbf{y})+\varvec{{\lambda }}^\textrm{T}\textbf{g}(\textbf{x},\textbf{y})+\varvec{{\mu }}^\textrm{T}\textbf{h}(\textbf{x},\textbf{y}). \end{aligned}$$

Define $U:N_{\bar{\textbf{y}}}\rightrightarrows {\mathbb {R}}^{m+l}$ such that for each $\textbf{y}\in N_{\bar{\textbf{y}}}$, $U(\textbf{y})$ is the set of dual optimal solutions of the optimization problem in (3), i.e.,

$$\begin{aligned} U(\textbf{y}):=\arg \max _{\varvec{{\lambda }}\in {\mathbb {R}}^m,\varvec{{\mu }}\in {\mathbb {R}}^l} o(\textbf{y},\varvec{{\lambda }},\varvec{{\mu }})\text { s.t. }\varvec{{\lambda }}\ge \textbf{0}, \end{aligned}$$

where

$$\begin{aligned} o(\textbf{y},\varvec{{\lambda }},\varvec{{\mu }}):=\inf _{\textbf{x}\in {X}}\ell (\textbf{x},\textbf{y},\varvec{{\lambda }},\varvec{{\mu }}). \end{aligned}$$

Then, U is closed at $\bar{\textbf{y}}$, and is nonempty and uniformly bounded near $\bar{\textbf{y}}$.

Proof

This proof proceeds similarly to the proof of [22, Lemma 2].

Since it has been shown in Lemma 1 that the optimal-solution set-valued mapping M is nonempty and uniformly bounded near $\bar{\textbf{y}}$ and since X is assumed to be open, we may choose a compact set ${\hat{X}}\subset X$ such that $M(\textbf{y})\subset \textrm{int}({\hat{X}})$ for all $\textbf{y}$ in some neighborhood ${\hat{N}}_{\bar{\textbf{y}}}\subset N_{\bar{\textbf{y}}}$ of $\bar{\textbf{y}}$. Then, define the following optimal-value function on ${\hat{N}}_{\bar{\textbf{y}}}$:

$$\begin{aligned} \begin{aligned} {\hat{\phi }}(\textbf{y}):=\min _{\textbf{x}\in {\hat{X}}}\quad&f(\textbf{x},\textbf{y})\\ \text {s.t.}\quad&\textbf{g}(\textbf{x},\textbf{y})\le \textbf{0},\\&\textbf{h}(\textbf{x},\textbf{y})=\textbf{0}. \end{aligned} \end{aligned}$$

(5)

It is straightforward that for each $\textbf{y}\in {\hat{N}}_{\bar{\textbf{y}}}$, $\phi (\textbf{y})\equiv {\hat{\phi }}({\textbf{y}})$, and the optimization problems in (3) and (5) have identical primal optimal solutions. Define ${\hat{U}}:{\hat{N}}_{\bar{\textbf{y}}}\rightrightarrows {\mathbb {R}}^{m+l}$ such that for each $\textbf{y}\in {\hat{N}}_{\bar{\textbf{y}}}$, ${\hat{U}}(\textbf{y})$ is the set of dual optimal solutions of the optimization problem in (5), that is

$$\begin{aligned} {\hat{U}}(\textbf{y}):=\arg \max _{\varvec{{\lambda }}\in {\mathbb {R}}^m,\varvec{{\mu }}\in {\mathbb {R}}^l} {\hat{o}}(\textbf{y},\varvec{{\lambda }},\varvec{{\mu }})\text { s.t. }\varvec{{\lambda }}\ge \textbf{0}, \end{aligned}$$

(6)

where

$$\begin{aligned} {\hat{o}}(\textbf{y},\varvec{{\lambda }},\varvec{{\mu }}):=\inf _{\textbf{x}\in {\hat{X}}}\ell (\textbf{x},\textbf{y},\varvec{{\lambda }},\varvec{{\mu }}). \end{aligned}$$

(7)

Now, we show that for each $\textbf{y}\in {\hat{N}}_{\bar{\textbf{y}}}$, $U(\textbf{y})\equiv {\hat{U}}(\textbf{y})$. We will show later that ${\hat{U}}$ is uniformly bounded near $\bar{\textbf{y}}$ and is closed at $\bar{\textbf{y}}$. Since Conditions I.4, I.5, I.6, and I.7 hold, [20, Theorem 2.3.2] shows that U is bounded at $\bar{\textbf{y}}$, is nonempty near $\textbf{y}$, and strong duality holds near $\textbf{y}$. Thus, for each $\textbf{y}\in {\hat{N}}_{\bar{\textbf{y}}}$,

$$\begin{aligned} \phi (\textbf{y})\equiv \left( \sup _{\varvec{{\lambda }}\in {\mathbb {R}}^m,\varvec{{\mu }}\in {\mathbb {R}}^l} o(\textbf{y},\varvec{{\lambda }},\varvec{{\mu }})\text { s.t. }\varvec{{\lambda }}\ge \textbf{0}\right) . \end{aligned}$$

(8)

Similarly to [28, Theorem 1] mentioned in the proof of Proposition 1, [20, Theorem 2.3.2] also requires $X\equiv {\mathbb {R}}^{n_x}$. However, since Condition I.3 holds, this result is applicable here. The weak duality theorem [2, Proposition 5.1.3] indicates that for each $\textbf{y}\in {\hat{N}}_{\bar{\textbf{y}}}$,

$$\begin{aligned} \phi (\textbf{y})\equiv {\hat{\phi }}(\textbf{y})\ge \left( \sup _{\varvec{{\lambda }}\in {\mathbb {R}}^m,\varvec{{\mu }}\in {\mathbb {R}}^l} {\hat{o}}(\textbf{y},\varvec{{\lambda }},\varvec{{\mu }})\text { s.t. }\varvec{{\lambda }}\ge \textbf{0}\right) . \end{aligned}$$

(9)

Consider an arbitrary fixed $\textbf{y}\in {\hat{N}}_{\bar{\textbf{y}}}$. Since ${\hat{X}}\subset X$, we have that

$$\begin{aligned} {\hat{o}}(\textbf{y},\varvec{{\lambda }},\varvec{{\mu }})\ge {o}(\textbf{y},\varvec{{\lambda }},\varvec{{\mu }}),\quad \forall \varvec{{\lambda }}\ge 0,\forall \varvec{{\mu }}\in {\mathbb {R}}^l. \end{aligned}$$

Moreover, since (8) holds, for each $(\varvec{{\lambda }}^*,\varvec{{\mu }}^*)\in U(\textbf{y})$,

$$\begin{aligned} {\hat{o}}(\textbf{y},\varvec{{\lambda }}^*,\varvec{{\mu }}^*)\ge {o}(\textbf{y},\varvec{{\lambda }}^*,\varvec{{\mu }}^*)=\phi (\textbf{y}). \end{aligned}$$

Combining this with (9), we have that $\sup \{{\hat{o}}(\textbf{y},\varvec{{\lambda }},\varvec{{\mu }}):\varvec{{\lambda }}\ge \textbf{0},\varvec{{\mu }}\in {\mathbb {R}}^l\}=\phi (\textbf{y})$ and $U(\textbf{y})\subset {\hat{U}}(\textbf{y})$.

To arrive a contradiction, assume that $U(\textbf{y})\subsetneq {\hat{U}}(\textbf{y})$. This implies that there exists a $(\hat{\varvec{{\lambda }}},\hat{\varvec{{\mu }}})\in {\hat{U}}(\textbf{y})$ such that $o(\textbf{y},\hat{\varvec{{\lambda }}},\hat{\varvec{{\mu }}})<\phi (\textbf{y})={\hat{o}}(\textbf{y},\hat{\varvec{{\lambda }}},\hat{\varvec{{\mu }}})$, i.e.,

$$\begin{aligned} \inf \{\ell (\textbf{x},\textbf{y},\hat{\varvec{{\lambda }}},\hat{\varvec{{\mu }}}):\textbf{x}\in X\}< \inf \{\ell (\textbf{x},\textbf{y},\hat{\varvec{{\lambda }}},\hat{\varvec{{\mu }}}):\textbf{x}\in {\hat{X}}\}. \end{aligned}$$

Thus, there must exist a point $\textbf{x}^\textrm{A}\in ({X}{\setminus }{\hat{X}})$ such that $\ell (\textbf{x}^\textrm{A},\textbf{y},\hat{\varvec{{\lambda }}},\hat{\varvec{{\mu }}})<\ell (\textbf{x}^*,\textbf{y},\hat{\varvec{{\lambda }}},\hat{\varvec{{\mu }}})$, where $\textbf{x}^*\in \big (M(\textbf{y})\subset \textrm{int}({\hat{X}})\big )$. Since the mapping $\ell (\cdot ,\textbf{y},\hat{\varvec{{\lambda }}},\hat{\varvec{{\mu }}})$ is convex, it follows that for any $0<\gamma <1$, with $\textbf{x}^\textrm{B}:=\gamma \textbf{x}^\textrm{A}+(1-\gamma )\textbf{x}^*$, we have that $\ell (\textbf{x}^\textrm{B},\textbf{y},\hat{\varvec{{\lambda }}},\hat{\varvec{{\mu }}})<\ell (\textbf{x}^*,\textbf{y},\hat{\varvec{{\lambda }}},\hat{\varvec{{\mu }}})$. Moreover, since $\textbf{x}^*$ minimizes the NLP in (5), [2, Proposition 5.1.1] indicates that $\textbf{x}^*$ also minimizes $\ell (\cdot ,\textbf{y},\hat{\varvec{{\lambda }}},\hat{\varvec{{\mu }}})$ on ${\hat{X}}$. This implies that $\textbf{x}^\textrm{B}$ must be not in ${\hat{X}}$, which contradicts the fact that $\textbf{x}^*\in \big (M(\textbf{y})\subset \textrm{int}({\hat{X}})\big )$. Thus, Combining this with $U(\textbf{y})\subset {\hat{U}}(\textbf{y})$, we have that $U(\textbf{y})\equiv {\hat{U}}(\textbf{y})$.

Since ${\hat{X}}$ is compact, applying [22, Theorem A.1] to (7) yields that ${\hat{o}}$ is continuous. Since ${\hat{U}}(\bar{\textbf{y}})\equiv U(\bar{\textbf{y}})$, ${\hat{U}}$ is also nonempty near $\bar{\textbf{y}}$ and bounded at $\textbf{y}$. Then, applying [22, Theorem A.4] to (6) yields that ${\hat{U}}$ is uniformly bounded near $\bar{\textbf{y}}$. Finally [22, Theorem A.2] shows that ${\hat{U}}$ is closed at $\bar{\textbf{y}}$. All these properties also hold for U. $\square $

The following theorem shows that the function $\phi $ in Assumption 1 is directionally differentiable at $\bar{\textbf{y}}$, and the directional derivative can be computed by constructing and solving an LP.

Theorem 1

Under Assumption 1, the directional derivative $\phi '(\bar{\textbf{y}};\textbf{d})$ exists and is finite for all $\textbf{d}\in {\mathbb {R}}^{n_y}$, and moreover,

$$\begin{aligned} \phi '(\bar{\textbf{y}};\textbf{d})\equiv [\nabla _\textbf{y}f(\bar{\textbf{x}}^*,\bar{\textbf{y}})]^\textrm{T}\textbf{d}+\psi (\textbf{d}), \end{aligned}$$

(10)

where the function $\psi :{\mathbb {R}}^{n_y}\rightarrow {\mathbb {R}}$ is defined as follows:

$$\begin{aligned} \begin{aligned} \psi (\textbf{d}):=\min _{\textbf{w}\in {\mathbb {R}}^{n_x}}\quad&[\nabla _\textbf{x}f(\bar{\textbf{x}}^*,\bar{\textbf{y}})]^\textrm{T}\textbf{w}\\ \text {s.t.}\quad&\,[\nabla _\textbf{x}g_i(\bar{\textbf{x}}^*,\bar{\textbf{y}})]^\textrm{T}\textbf{w}\le -[\nabla _\textbf{y}g_i(\bar{\textbf{x}}^*,\bar{\textbf{y}})]^\textrm{T}\textbf{d},\quad \forall i\text { such that }g_i(\bar{\textbf{x}}^*,\bar{\textbf{y}})=0,\\&[\nabla _\textbf{x}h_j(\bar{\textbf{x}}^*,\bar{\textbf{y}})]^\textrm{T}\textbf{w}=-[\nabla _\textbf{y}h_j(\bar{\textbf{x}}^*,\bar{\textbf{y}})]^\textrm{T}\textbf{d},\quad \forall j\in \{1,\ldots ,l\}. \end{aligned} \end{aligned}$$

(11)

Proof

Under Assumption 1, Lemmata 1, 2, and 3 above show that both the primal optimal solutions mapping M and the dual optimal solutions mapping U are closed at $\bar{\textbf{y}}$ and are nonempty and uniformly bounded near $\bar{\textbf{y}}$. Then, we reformulate (3) to a maximization problem and reformulate each equality constraint $h_i(\textbf{x},\textbf{y})=0$ as two inequalities. Since X is open, it must be that $M(\bar{\textbf{y}})\subset \textrm{int}(X)$. Then, following the arguments in the proof of [22, Theorem 2 and Corollary 2.1], we obtain that $\phi '(\bar{\textbf{y}};\textbf{d})$ exists and is finite for all $\textbf{d}\in {\mathbb {R}}^{n_y}$ and

$$\begin{aligned} \begin{aligned}&\phi '(\bar{\textbf{y}};\textbf{d})\equiv \min _{\begin{array}{c} \textbf{x}\in M(\bar{\textbf{y}}),\textbf{w}\in {\mathbb {R}}^{n_x},\\ \textbf{z}^g\in {\mathbb {R}}^{m},\textbf{z}^h\in {\mathbb {R}}^{2l} \end{array}}\quad [\nabla _\textbf{y}f(\textbf{x},\bar{\textbf{y}})]^\textrm{T}\textbf{d}+[\nabla _\textbf{x}f(\textbf{x},\bar{\textbf{y}})]^\textrm{T}\textbf{w}\\ \text {s.t.}&\quad z^g_ig_i(\textbf{x},\bar{\textbf{y}})+[\nabla _\textbf{x}g_i(\textbf{x},\bar{\textbf{y}})]^\textrm{T}\textbf{w}\le -[\nabla _\textbf{y}g_i(\textbf{x},\bar{\textbf{y}})]^\textrm{T}\textbf{d},\;\forall i\in \{1,\ldots ,m\},\\&\quad z^h_jh_j(\textbf{x},\bar{\textbf{y}})+[\nabla _\textbf{x}h_j(\textbf{x},\bar{\textbf{y}})]^\textrm{T}\textbf{w}\le -[\nabla _\textbf{y}h_j(\textbf{x},\bar{\textbf{y}})]^\textrm{T}\textbf{d},\;\forall j\in \{1,\ldots ,l\},\\&\qquad -z^h_{j+l}h_j(\textbf{x},\bar{\textbf{y}})+[-\nabla _\textbf{x}h_j(\textbf{x},\bar{\textbf{y}})]^\textrm{T}\textbf{w}\\&\le -[-\nabla _\textbf{y}h_j(\textbf{x},\bar{\textbf{y}})]^\textrm{T}\textbf{d}, \;\forall j\in \{1,\ldots ,l\}. \end{aligned} \end{aligned}$$

(12)

Since $M(\bar{\textbf{y}}):=\{\bar{\textbf{x}}^*\}$ as in Condition I.2, (12) reduces to (10) and (11). $\square $

Remark 3

The proof above utilizes arguments of [22, Theorem 2 and Corollary 2.1], and these results’ statements require X to be closed and convex, and require a parameterized convex program with only inequality constraints and the Slater conditions must hold. However, these requirements are only to guarantee that the mappings M and U in [22]’s case have the properties as in Lemmata 1, 2, and 3. Since these properties have already been established in our case without satisfying [22]’s conditions, it is verified that the arguments in the proof of [22, Theorem 2 and Corollary 2.1] are sufficient to obtain (12).

Remark 4

Theorem 1 provides a characterization of the directional derivative of parameterized convex NLPs with both inequality and equality constraints. This is a nontrivial generalization of [22, Theorem 2 and Corollary 2.1] which only deal with parameterized convex NLPs with inequality constraints, and deal with a closed X. We allow an open X by imposing Condition I.3, which opens up the theory to a broader class of functions in practice. The main difficulties for including affine equality constraints is to establish the uniform boundedness of M near $\bar{\textbf{y}}$ and the closedness of M at $\bar{\textbf{y}}$. In [22], the justifications of these properties (c.f. Theorems A.2 and A.4) rely heavily on the fact that the decision space mapping $\Omega $ is open at $\bar{\textbf{y}}$ (in the sense of Hogan [22, 23]). This is true for $\Omega $ only defined by inequality constraints (see [23, Theorem 12]), but does not necessarily hold if also considering equality constraints. In our work, based on a later result [28, Theorem 1], Lemmata 1 and 2 establish these properties without requiring $\Omega $ to be open at $\bar{\textbf{y}}$.

Observe that the function $\psi $ in (10) is an LP parameterized by the constraints’ right-hand side. To proceed, we reformulate the optimization problem of $\psi (\textbf{d})$ for each $\textbf{d}\in {\mathbb {R}}^{n_y}$ to a standard form of LP (c.f. [3]). Let $n_{{\bar{c}}}$ be the number of active constraints of (3) for $\textbf{y}:=\bar{\textbf{y}}$ at $\bar{\textbf{x}}^*$, and then we have for each $\textbf{d}\in {\mathbb {R}}^{n_y}$ that

$$\begin{aligned} \psi (\textbf{d})\equiv {q}(\varvec{{\beta }}(\textbf{d})) \end{aligned}$$

(13)

where $q:{\mathbb {R}}^{n_{{\bar{c}}}}\rightarrow {\mathbb {R}}\cup \{+\infty ,-\infty \}$ such that

$$\begin{aligned} \begin{aligned} q(\textbf{b}):= \inf _\textbf{v}\quad&\textbf{c}^\textrm{T}\textbf{v}\\ \text {s.t.}\quad&\textbf{A}\textbf{v}=\textbf{b},\\&\textbf{v}\ge \textbf{0}, \end{aligned} \end{aligned}$$

with appropriate dimensions for $\textbf{c},\textbf{v}$ and $\textbf{A}$, and $\varvec{{\beta }}:{\mathbb {R}}^{n_y}\rightarrow {\mathbb {R}}^{n_{{\bar{c}}}}:\textbf{d}\mapsto \textbf{G}\textbf{d}$ is a linear function where $\textbf{G}\in {\mathbb {R}}^{n_{{\bar{c}}}\times n_y}$ is the stacked $[-\nabla _{\textbf{y}}g_i(\bar{\textbf{x}}^*,\bar{\textbf{y}})]^\textrm{T}$ for all i such that $g_i(\bar{\textbf{x}}^*,\bar{\textbf{y}})=0$ and all $[-\nabla _{\textbf{y}}h_j(\bar{\textbf{x}}^*,\bar{\textbf{y}})]^\textrm{T}$. With this preparation, the following proposition shows that $\psi $ is L-smooth whose LD-derivatives can be computed by solving a sequence of LPs.

Proposition 2

Consider the functions $\psi $, q, $\varvec{{\beta }}$ as in (13). Then, the function $\psi $ is L-smooth, and for any $\textbf{d}\in {\mathbb {R}}^{n_y}$ and any $\textbf{K}\in {\mathbb {R}}^{n_y\times r}$, the following sequence of functions is well-defined: $\forall j\in \{0,\ldots ,r\},$

$$\begin{aligned} \begin{aligned} \psi ^{(j)}_{\textbf{d},\textbf{K}}:{\mathbb {R}}^{n_y}\rightarrow {\mathbb {R}}:\tilde{\textbf{d}}\mapsto \max _{\varvec{{\lambda }}\in {\mathbb {R}}^{{\bar{c}}}}\quad&[\textbf{G}\tilde{\textbf{d}}]^\textrm{T}\varvec{{\lambda }}\\ \text {s.t.}\quad&\begin{bmatrix} \textbf{A}^\textrm{T}\\ -[\varvec{{\beta }}(\textbf{d})]^\textrm{T}\\ -[\textbf{G}\textbf{k}_{(1)}]^\textrm{T}\\ \vdots \\ -[\textbf{G}\textbf{k}_{(j)}]^\textrm{T}\end{bmatrix}\varvec{{\lambda }}\le \begin{bmatrix} \textbf{c}\\ -\psi (\textbf{d})\\ -\psi _{\textbf{d},\textbf{M}}^{(0)}(\textbf{k}_{(1)})\\ \vdots \\ -\psi _{\textbf{d},\textbf{M}}^{(j-1)}(\textbf{k}_{(j)}) \end{bmatrix}, \end{aligned} \end{aligned}$$

(14)

where for $j=0$, the constraint $-[\textbf{G}\textbf{k}_{(j)}]^\textrm{T}\varvec{{\lambda }}\le -\psi _{\textbf{d},\textbf{K}}^{(j-1)}(\textbf{k}_{(j)})$ is understood to be vacuous. Moreover, associated to the sequence (14), define a sequence of sets $D_{\textbf{d},\textbf{K}}^{(j)}\subset {\mathbb {R}}^{^{n_{{\bar{c}}}}}$ such that for each $j\in \{0,\ldots ,r-1\}$, $D^{(j)}_{\textbf{d},\textbf{K}}$ is the optimal solution set of the LP for evaluating $\psi ^{(j)}_{\textbf{d},\textbf{K}}(\textbf{k}_{(j+1)})$ in (14). Then, it follows that

$$\begin{aligned} D^{(0)}_{\textbf{d},\textbf{K}}:=\arg \max _{\varvec{{\lambda }}\in \Lambda (\textbf{G}\textbf{d})}[\textbf{G}\textbf{k}_{(1)}]^\textrm{T}\varvec{{\lambda }} \end{aligned}$$

where $\Lambda (\textbf{G}\textbf{d})$ is the optimal solution set of the dual problem (c.f. [3]) of $q(\textbf{G}\textbf{d})$, and for each $j\in \{1,\ldots ,r-1\}$,

$$\begin{aligned} D^{(j)}_{\textbf{d},\textbf{K}}:=\Big (\arg \max _{\varvec{{\lambda }}\in D^{(j-1)}_{\textbf{d},\textbf{K}}}[\textbf{G}\textbf{k}_{(j+1)}]^\textrm{T}\varvec{{\lambda }}\Big )\subset D^{(j-1)}_{\textbf{d},\textbf{K}}. \end{aligned}$$

(15)

Furthermore, the LD-derivative of $\psi $ at $\textbf{d}$ in the directions $\textbf{K}$ is given by

$$\begin{aligned} \psi '(\textbf{d};\textbf{K}):=\varvec{{\lambda }}^\textrm{T}\textbf{G}\textbf{K},\quad \text {for any }\varvec{{\lambda }}\in D^{(r-1)}_{\textbf{d},\textbf{K}}. \end{aligned}$$

Proof

From Theorem 1, $\psi $ is finite for any $\textbf{d}\in {\mathbb {R}}^{n_y}$, i.e., $F\equiv {\mathbb {R}}^{n_y}$ where $F:=\{\textbf{d}\in {\mathbb {R}}^{n_y}:-\infty<\psi (\textbf{d})<+\infty \}$, and thus $\textbf{d}\in \textrm{int}(F)$. Moreover, since $\varvec{{\beta }}$ is linear and (13) holds, the claimed results follow from [21, Proposition 3.7]. $\square $

Remark 5

Analogously to [21, Remark 3.4], evaluating $\psi '(\textbf{d};\textbf{K})$ requires solving at most $n_y$ LPs, because $\psi ^{(k)}_{\textbf{d},\textbf{K}}$ is guaranteed to be linear if the columns $\{\textbf{k}_{(1)},\ldots ,\textbf{k}_{(k)}\}$ span ${\mathbb {R}}^{n_y}$. However, if at a $i<n_y$ such that $D^{(i)}_{\textbf{d},\textbf{K}}$ is a singleton, then (15) implies that $D^{(i)}_{\textbf{d},\textbf{K}}\equiv D^{(i+1)}_{\textbf{d},\textbf{K}}\equiv \cdot \cdot \cdot D^{(r-1)}_{\textbf{d},\textbf{K}}$; no need to solve more LPs. Moreover, since the computational complexity of solving each LP grows polynomially with the size of the problem, computational complexity for evaluation of the LD-derivative also grows polynomially with the size of the problem.

Motivated by Remark 5, we employ the following result from [33] to examine whether an obtained optimal solution for an LP is unique or not. This result examines LP solutions’ uniqueness by constructing and solving an auxiliary LP. This result provides a practical way to terminate solving the sequence of LPs in Proposition 2 effectively when one LP in the sequence turns out to have a unique solution.

Proposition 3

(Adapted from [33]) Consider an LP of the general form:

$$\begin{aligned} \begin{aligned} \min _\textbf{x}\quad&\textbf{p}^\textrm{T}\textbf{x}\\ \text {s.t.}\quad&\tilde{\textbf{A}}\textbf{x}=\tilde{\textbf{b}},\quad \leftarrow \text {multipliers}\;\textbf{u}\\&\tilde{\textbf{C}}\textbf{x}\ge \tilde{\textbf{d}},\quad \leftarrow \text {multipliers}\;\textbf{v}. \end{aligned} \end{aligned}$$

(16)

Let $\textbf{x}^*$ be a primal optimal solution and $(\textbf{u}^*,\textbf{v}^*)$ be a dual optimal solution of (16). Let $\tilde{\textbf{c}}_i$ denote the $i^\textrm{th}$ row of $\tilde{\textbf{C}}$. Define index sets

$$\begin{aligned} \begin{aligned} {\mathcal {K}}&:=\{i:\tilde{\textbf{c}}_i\textbf{x}^*={\tilde{d}}_i\text { and }v^*_i>0\},\\ \text {and}\quad {\mathcal {L}}&:=\{i:\tilde{\textbf{c}}_i\textbf{x}^*={\tilde{d}}_i\text { and }v^*_i=0\}. \end{aligned} \end{aligned}$$

Let $\tilde{\textbf{C}}_{\mathcal {K}}$ and $\tilde{\textbf{C}}_{\mathcal {L}}$ denote matrices whose rows are $\tilde{\textbf{c}}_i,\forall i\in {\mathcal {K}}$ and are $\tilde{\textbf{c}}_i,\forall i\in {\mathcal {L}}$, respectively. Consider the following LP:

$$\begin{aligned} \begin{aligned} \max _\textbf{x}\quad&\textbf{e}^\textrm{T}\tilde{\textbf{C}}_{\mathcal {L}}\textbf{x}\\ \text {s.t.}\quad&\;\;\tilde{\textbf{A}}\textbf{x}=\textbf{0},\\&\tilde{\textbf{C}}_{\mathcal {K}}\textbf{x}=\textbf{0},\\&\tilde{\textbf{C}}_{\mathcal {L}}\textbf{x}\ge \textbf{0}, \end{aligned} \end{aligned}$$

(17)

where $\textbf{e}$ is a vector of appropriate dimension with all components being one. Then, $\textbf{x}^*$ is a unique optimal solution of (16) if and only if the optimal objective value of (17) is zero. For any LP of the form (16), (17) will be called its (optimal) solutions’ uniqueness examination LP.

Proof

This result follows from [33, Theorem 2 and Remark 2]. $\square $

The following theorem shows that the function $\phi $ defined in (3) is L-smooth at $\bar{\textbf{y}}$, whose LD-derivatives can be evaluated by solving a sequence of LPs.

Theorem 2

Suppose that Assumption 1 holds, and consider the formulation of the directional derivative $\phi '(\bar{\textbf{y}};\textbf{d})$ established in Theorem 1 and the sequence of functions $\psi ^{(j)}_{\textbf{d},\textbf{K}}$ and sets $D_{\textbf{d},\textbf{K}}^{(j)}$ defined in Proposition 2. Then, the function $\phi $ is L-smooth at $\bar{\textbf{y}}$, and the following statements hold for any $\textbf{M}\in {\mathbb {R}}^{n_y\times p}$:

II.1
The following sequence of functions
$$\begin{aligned} \begin{aligned}&\phi ^{(0)}_{\bar{\textbf{y}},\textbf{M}}:{{\mathbb {R}}^{n_y}\rightarrow {\mathbb {R}}}:\textbf{d}\mapsto \phi '(\bar{\textbf{y}};\textbf{d}),\\&\phi ^{(j)}_{\bar{\textbf{y}},\textbf{M}}:{{\mathbb {R}}^{n_y}\rightarrow {\mathbb {R}}}:\textbf{d}\mapsto [\phi ^{(j-1)}_{\bar{\textbf{y}},\textbf{M}}]'(\textbf{m}_{(j)};\textbf{d}),\quad \forall j\in \{1,\ldots ,p\} \end{aligned} \end{aligned}$$
are well-defined and given by:
$$\begin{aligned} \begin{aligned} \phi ^{(0)}_{\bar{\textbf{y}},\textbf{M}}(\textbf{d})&:=\phi '(\bar{\textbf{y}};\textbf{d})\text { is given in }(10),\\ \phi ^{(j)}_{\bar{\textbf{y}},\textbf{M}}(\textbf{d})&:=[\nabla _\textbf{y}f(\bar{\textbf{x}}^*,\bar{\textbf{y}})]^\textrm{T}\textbf{d}+\psi ^{(j-1)}_{\textbf{m}_{(1)},\textbf{M}_{(2):(p)}}(\textbf{d}),\quad \forall j\in \{1,\ldots ,p\}, \end{aligned} \end{aligned}$$
where $\textbf{M}_{(2):(p)}$ denotes $[\textbf{m}_{(2)}\;\cdot \cdot \cdot \;\textbf{m}_{(p)}]$.
II.2
The LD derivative $\phi '(\bar{\textbf{y}};\textbf{M})$ is given by
$$\begin{aligned} \phi '(\bar{\textbf{y}};\textbf{M}):=[\phi '(\bar{\textbf{y}};\textbf{m}_{(1)})\;([\nabla _\textbf{y}f(\bar{\textbf{x}}^*,\bar{\textbf{y}})]^\textrm{T}+\varvec{{\lambda }}^\textrm{T}\textbf{G})\textbf{M}_{(2):(p)}] \end{aligned}$$
(18)
for any $\varvec{{\lambda }}^\textrm{T}\in D^{(p-2)}_{\textbf{m}_{(1)},\textbf{M}_{(2):(p)}}$.

Proof

Observe that the term $[\nabla _\textbf{y}f(\bar{\textbf{x}}^*,\bar{\textbf{y}})]^\textrm{T}\textbf{d}$ in (10) is affine w.r.t. $\textbf{d}$, and then Statement II.1 follows from Theorem 1 and Proposition 2. Moreover, since Proposition 1 holds and according to Definition 3, $\phi $ is L-smooth at $\bar{\textbf{y}}$. Then, Statement II.2 again follows from Theorem 1 and Proposition 2. $\square $

With the theoretical results established above, the following algorithm computes an LD-derivative of $\phi $ at $\bar{\textbf{y}}$ in Assumption 1 by solving a sequence of LPs.

Remark 6

In the best case, the algorithm above requires solving two LPs to evaluate $\phi '(\bar{\textbf{y}};\textbf{M})$, namely the LP in (13) for evaluating $\psi (\textbf{m}_{(1)})$ and the solutions’ uniqueness examination LP of the dual problem of this LP. If this solutions’ uniqueness examination LP has a maximum zero, i.e., when the dual problem of the LP in (13) has a unique solution, then Proposition 2 implies that this solution must solve all LPs in the sequence defined in (14). Thus, this solution can be used directly to evaluate $\phi '(\bar{\textbf{y}};\textbf{M})$ using (18), as the algorithm above shows. We also note that Algorithm 1 requires solving at most $(2p-1)$ LPs, of which the computational cost grows polynomially with p.

We believe that the best case mentioned above may happen frequently in practice for the following reasons. Since $\phi $ is locally Lipschitz continuous, Rademacher’s theorem ( [19, Theorem 3.1]) implies that $\phi $ is differentiable almost everywhere in the Lebesgue sense. Note that the non-differentiable points of measure-zero are indeed reachable during problem-solving on a finite precision machine (c.f. [1, Sect. 4]). If $\phi $ is differentiable at $\bar{\textbf{y}}$, then [37, Proposition 1] implies that the function $\psi $ in (10) must be linear. Since $\psi $ is an optimal-value function with embedded LP parameterized by its constraints’ right-hand side vector, [3, Theorem 5.2] shows that the subdifferential (in the sense of convex analysis [42]) of $\psi $ at any $\textbf{d}\in {\mathbb {R}}^{n_y}$ is identical to the dual optimal solutions set of the corresponding LP. Thus, for any $\textbf{m}_{(1)}\in {\mathbb {R}}^{n_y}$, the dual optimal solutions set of the LP in (13) with $\textbf{d}:=\textbf{m}_{(1)}$ must be a singleton, and Algorithm 1 computes an LD-derivative by solving two LPs as discussed. Then, an L-derivative, which reduces to the Jacobian of $\phi $ at $\bar{\textbf{y}}$ in this case, can be readily computed via (2).

To implement Algorithm 1, an LP solver that can return both primal and dual optimal solutions is favorable; this is hardly a problem for prevalent LP solvers such as CPLEX [24]. If such solver is unavailable, then extra efforts need to be taken to compute a dual optimal solution. Given a primal optimal solution, a practical approach may be to identify an extreme point of the dual optimal solutions set represented by the KKT conditions.

3.2 Joint convexity

In this subsection, we show that if the functions f, $\textbf{g}$, and $\textbf{h}$ in Assumption 1 are jointly convex with respect to $(\textbf{x},\textbf{y})$. Then, with Condition I.2 being relaxed, we can still calculate an LD-derivative of $\phi $ using Algorithm 1 with easy modifications. Such parameterized convex NLPs with joint convexity are formalized as follows.

Assumption 2

Consider the setup in Assumption 1, suppose that Condition I.5 is strengthened to:

III.1
f, $\textbf{g}$, and $\textbf{h}$ are convex on ${X}\times N_{\bar{\textbf{y}}}$,

and Condition I.2 is relaxed to

III.2
$M(\bar{\textbf{y}})$ is nonempty and bounded, where M is the primal solution mapping defined in Lemma 1.

Then, (3) is a parameterized convex NLP with joint convexity.

Roughly, under Assumption 2, Theorem 2 still holds with the unique optimal solution $\bar{\textbf{x}}^*$ replaced by any $\hat{\textbf{x}}\in M(\bar{\textbf{y}})$ for evaluating $\psi $, and thus Algorithm 1 is valid for this joint convexity case with $\bar{\textbf{x}}^*$ replaced by $\hat{\textbf{x}}$. This result requires significantly simpler justification than in the previous section, due to the joint convexity of f, $\textbf{g}$, and $\textbf{h}$. When the optimal solutions are non-unique, this result (which is formalized below) is particularly beneficial for implementation, since any optimal solution of (3) obtained by numerical solver can be used directly to evaluate an LD-derivative. Note that since the optimal solution is non-unique here, the methods in [48, 49] cannot be applied to compute an LD-derivative of $\phi $.

Theorem 3

Suppose that Assumption 2 holds, and consider an arbitrary $\hat{\textbf{x}}\in M(\bar{\textbf{y}})$. Then, Algorithm 1 can be used for computing an LD-derivative $\phi '(\bar{\textbf{y}};\textbf{M})$ with $\hat{\textbf{x}}$ in place of $\bar{\textbf{x}}^*$ wherever $\bar{\textbf{x}}^*$ appears.

Proof

Since Condition III.2 holds, following the same arguments in the proof of Proposition 1, we have $\phi $ is locally Lipschitz continuous at $\bar{\textbf{y}}$.

Define a set

$$\begin{aligned}V:=\{\textbf{y}\in {N_{\bar{\textbf{y}}}}:\exists \textbf{x}\in {X}, \textbf{g}(\textbf{x},\textbf{y})<0\text { and }\textbf{h}(\textbf{x},\textbf{y})=0\}.\end{aligned}$$

Then, Condition I.7 implies that $\bar{\textbf{y}}\in \textrm{int}(V)$. Since Assumption 2 holds, by reformulating each equality constraint $h_i(\textbf{x},\textbf{y})=0$ as two inequalities and applying [22, Corollaries 3.2 and 3.1], we have that for each $\textbf{d}\in {\mathbb {R}}^{n_y}$, $\phi '(\bar{\textbf{y}};\textbf{d})$ is given by (10) and (11) with $\hat{\textbf{x}}$ in place of $\bar{\textbf{x}}^*$.

Then, Proposition 2 and Theorem 2 follow immediately with $\hat{\textbf{x}}$ in place of $\bar{\textbf{x}}^*$. Consequently, Algorithm 1 is valid for this joint convexity case with $\hat{\textbf{x}}$ in place of $\bar{\textbf{x}}^*$. $\square $

Remark 7

In the joint convexity case, $\phi $ is convex on $N_{\bar{\textbf{y}}}$ due to [13, Corollary 2.1]. Thus, an L-derivative is a subgradient in the context of convex analysis [42].

3.3 Parameterized LP’s objective function

In this subsection, we propose a new method for computing LD-derivatives of optimal-value functions with embedded an LP whose parameters appear in the objective function. Similarly to the methods in the previous subsections, this new method also computes an LD-derivative by solving a sequence of LPs. The following assumption formalizes such parameterized LP.

Assumption 3

Let $Y\subset {\mathbb {R}}^{n_y}$ be open and consider a function $\textbf{c}:Y\rightarrow {\mathbb {R}}^{n_x}$. Consider a parameterized LP of the form:

$$\begin{aligned} \begin{aligned} \phi (\textbf{y}):=\min _{\textbf{x}\in {\mathbb {R}}^{n_x}}\quad&[\textbf{c}(\textbf{y})]^\textrm{T}\textbf{x}\\ \text {s.t.}\quad&\textbf{A}\textbf{x}= \varvec{{\alpha }},\\&\textbf{B}\textbf{x}\le \varvec{{\beta }}, \end{aligned} \end{aligned}$$

(19)

with appropriate dimensions for the matrices $(\textbf{A}$, $\textbf{B})$ and the vectors $(\varvec{{\alpha }},\varvec{{\beta }})$. Suppose that the following conditions hold:

IV.1
the decision space $X:=\{\textbf{x}\in {\mathbb {R}}^{n_x}:\textbf{A}\textbf{x}=\varvec{{\alpha }}\text { and }\textbf{B}\textbf{x}\le \varvec{{\beta }}\}$ is compact, and
IV.2
the function $\textbf{c}$ is continuously differentiable.

Unlike the partial and joint convexity cases in the previous subsections, here we do not assume the Slater conditions for the decision space X, and do not assume uniqueness of an optimal solution of the LP. Instead, X is assumed to be compact. This requirement is guaranteed to be satisfied if the decision variables $\textbf{x}$ have known lower and upper bounds. Moreover, for a well-posed problem where $\phi (\textbf{y})$ is finite for each $\textbf{y}\in {Y}$, even if X is not compact, one can always impose sufficiently large bounds on $\textbf{x}$ to make the decision space compact, yet without affecting the value of $\phi (\textbf{y})$.

We first introduce a classical result by Danskin [8] for describing directional derivatives of optimal-value functions with parameter–independent decision space.

Proposition 4

(Originally from [8], summarized in [22]) Let ${\tilde{X}}\subset {\mathbb {R}}^n$ and ${\tilde{Y}}\subset {\mathbb {R}}^m$ be open, and let $\Xi \subset {\tilde{X}}$ be compact. Consider a function $f:{\tilde{X}}\times {\tilde{Y}}\rightarrow {\mathbb {R}}$ of $(\textbf{x},\textbf{y})$, for which suppose that f and the partial derivative mapping $\nabla _\textbf{y}f:{\tilde{X}}\times {\tilde{Y}}\rightarrow {\mathbb {R}}^m$ are continuous. Consider an optimal-value function $v:{\tilde{Y}}\rightarrow {\mathbb {R}}$ such that for each $\textbf{y}\in {\tilde{Y}}$,

$$\begin{aligned} v(\textbf{y}):=\min _{\textbf{x}\in \Xi }f(\textbf{x},\textbf{y}). \end{aligned}$$

Then, v is directionally differentiable on ${\tilde{Y}}$, and for each $\textbf{y}\in {\tilde{Y}}$ and $\textbf{d}\in {\mathbb {R}}^m$,

$$\begin{aligned} v'(\textbf{y};\textbf{d}):=\min _{\textbf{x}\in M(\textbf{y})}[\nabla _{\textbf{y}}f(\textbf{x},\textbf{y})]^\textrm{T}\textbf{d}, \end{aligned}$$

(20)

where $M(\textbf{y}):=\{\textbf{x}\in \Xi :v(\textbf{y})=f(\textbf{x},\textbf{y})\}$ is the optimal solutions set of the optimization problem in (20).

The following result provides a method for computing the directional derivatives of $\phi $ in Assumption 3, by applying Theorem 4.

Proposition 5

Consider the setup in Assumption 3. Then, for each $\textbf{y}\in {Y}$ and each $\textbf{d}\in {\mathbb {R}}^{n_y}$, the directional derivative $\phi '(\textbf{y};\textbf{d})$ is well-defined and given by

$$\begin{aligned} \begin{aligned}&\phi '({\textbf{y}};\textbf{d}):=\min _{\textbf{x}\in {\mathbb {R}}^{n_x}}[\textbf{J}\textbf{c}({\textbf{y}})\textbf{d}]^\textrm{T}\textbf{x}\\ \text {s.t.}&\quad \textbf{A}\textbf{x}=\varvec{{\alpha }},\\&\quad \textbf{B}\textbf{x}\le \varvec{{\beta }},\\&\qquad [\textbf{c}({\textbf{y}})]^\textrm{T}\textbf{x}=\phi ({\textbf{y}}). \end{aligned} \end{aligned}$$

(21)

Proof

Under Assumption 3, this result follows immediately from Theorem 4. $\square $

Observe that the decision space of the optimization problem in (21) is $\textbf{d}$-independent and is compact, and the mapping $\textbf{d}\mapsto \textbf{J}\textbf{c}(\textbf{y})\textbf{d}$ is linear. This enables using Proposition 4 to compute the directional derivatives of the mapping $\textbf{d}\mapsto \phi '({\textbf{y}};\textbf{d})$. Based on this intuition, the following result shows that the function $\phi $ in Assumption 3 is L-smooth, and moreover, is piecewise differentiable. Besides, for any $\textbf{y}\in {Y}$ and directions matrix $\textbf{M}\in {\mathbb {R}}^{n_y\times p}$, the LD-derivative $\phi (\textbf{y};\textbf{M})$ can be computed by solving a sequence of LPs.

Theorem 4

Consider the setup in Assumption 3. Then, the function $\phi $ is piecewise differentiable and is L-smooth on Y. Moreover, for each $\textbf{y}\in {Y}$ and each $\textbf{M}\in {\mathbb {R}}^{n_y\times p}$, the following sequence of functions

$$\begin{aligned} \begin{aligned}&\phi ^{(0)}_{{\textbf{y}},\textbf{M}}:{\mathbb {R}}^{n_y}\rightarrow {\mathbb {R}}:\textbf{d}\mapsto \phi '({\textbf{y}};\textbf{d}),\\&\phi ^{(j)}_{{\textbf{y}},\textbf{M}}:{\mathbb {R}}^{n_y}\rightarrow {\mathbb {R}}:\textbf{d}\mapsto [\phi ^{(j-1)}_{{\textbf{y}},\textbf{M}}]'(\textbf{m}_{(j)};\textbf{d}),\quad \forall j\in \{1,\ldots ,p\} \end{aligned} \end{aligned}$$

are well-defined and given by: for each $j\in \{0,\ldots ,p\}$,

$$\begin{aligned} \begin{aligned} \phi ^{(j)}_{{\textbf{y}},\textbf{M}}(\textbf{d}):=\min _{\textbf{x}\in {\mathbb {R}}^{n_x}}\quad&[\textbf{J}\textbf{c}(\textbf{y})\textbf{d}]^\textrm{T}\textbf{x}\\ \text {s.t.}&\quad \textbf{A}\textbf{x}= \varvec{{\alpha }},\\&\quad \textbf{B}\textbf{x}\le \varvec{{\beta }},\\&\quad [\textbf{c}(\textbf{y})]^\textrm{T}\textbf{x}=\phi (\textbf{y}),\\&\qquad [\textbf{J}\textbf{c}(\textbf{y})\textbf{m}_{(1)}]^\textrm{T}\textbf{x}=\phi ^{(0)}_{{\textbf{y}},\textbf{M}}(\textbf{m}_{(1)}),\\&\quad \vdots \\&\quad [\textbf{J}\textbf{c}(\textbf{y})\textbf{m}_{(j)}]^\textrm{T}\textbf{x}=\phi ^{(j-1)}_{{\textbf{y}},\textbf{M}}(\textbf{m}_{(j)}),\\ \end{aligned} \end{aligned}$$

(22)

where the constraint $[\textbf{J}\textbf{c}(\textbf{y})\textbf{m}_{(1)}]^\textrm{T}\textbf{x}=\phi ^{(0)}_{{\textbf{y}},\textbf{M}}(\textbf{m}_{(1)})$ is vacuous for $j=0$. Associated to the sequence (22), define a sequence of sets $D_{\textbf{y},\textbf{M}}^{(j)}\subset {\mathbb {R}}^{^{n_{x}}}$ such that for each $j\in \{0,\ldots ,p-1\}$, $D^{(j)}_{\textbf{y},\textbf{M}}$ is the optimal solution set of the LP for evaluating $\phi ^{(j)}_{\textbf{y},\textbf{M}}(\textbf{m}_{(j+1)})$ in (22). Then, it follows that

$$\begin{aligned} D^{(0)}_{\textbf{y},\textbf{M}}:=\arg \max _{\textbf{x}\in M(\textbf{y})}[\textbf{J}\textbf{c}(\textbf{y})\textbf{m}_{(1)}]^\textrm{T}\textbf{x} \end{aligned}$$

(23)

where $M(\textbf{y})$ is the optimal solution set of the LP in (19), and for each $j\in \{1,\ldots ,p-1\}$,

$$\begin{aligned} D^{(j)}_{\textbf{y},\textbf{M}}:=\Big (\arg \max _{\textbf{x}\in D^{(j-1)}_{\textbf{y},\textbf{M}}}[\textbf{J}\textbf{c}(\textbf{y})\textbf{m}_{(j+1)}]^\textrm{T}\textbf{x}\Big )\subset D^{(j-1)}_{\textbf{y},\textbf{M}}. \end{aligned}$$

(24)

Furthermore, an LD-derivative $\phi '(\textbf{y};\textbf{M})$ is given by

$$\begin{aligned} \phi '(\textbf{y};\textbf{M}):=\textbf{x}^\textrm{T}\textbf{J}\textbf{c}(\textbf{y})\textbf{M},\quad \text {for any }\textbf{x}\in D^{(p-1)}_{\textbf{y},\textbf{M}}. \end{aligned}$$

(25)

Proof

Consider a function ${\tilde{\phi }}:{\mathbb {R}}^{n_x}\rightarrow {\mathbb {R}}$:

$$\begin{aligned} \begin{aligned} {\tilde{\phi }}(\tilde{\textbf{c}}):=\min _{\textbf{x}\in {\mathbb {R}}^{n_x}}\quad&[\tilde{\textbf{c}}]^\textrm{T}\textbf{x}\\ \text {s.t.}\quad&\textbf{A}\textbf{x}= \varvec{{\alpha }},\\&\textbf{B}\textbf{x}\le \varvec{{\beta }}. \end{aligned} \end{aligned}$$

[3, Sect. 5.4] shows that ${\tilde{\phi }}$ is a piecewise linear concave function. Since $\textbf{c}$ is continuously differentiable and $\phi \equiv {\tilde{\phi }}\circ \textbf{c}$, $\phi $ is piecewise differentiable according to [47, Sect. 4.1]. Then, it follows that $\phi $ is L-smooth according to [27, Proposition 3.1].

The function $\phi ^{(1)}_{\textbf{y},\textbf{M}}$ is obtained by applying Proposition 4 to $\phi ^{(0)}_{\textbf{y},\textbf{M}}\equiv \phi '(\textbf{y};\cdot )$ established in Proposition 5. Then, the rest functions in the sequence (22) are obtained by applying Proposition 4 inductively.

The relations (23) and (24) follow immediately from the structure of the sequence of functions (22).

Consider an arbitrary $\textbf{x}\in D^{(p-1)}_{\textbf{y},\textbf{M}}$. Since (24) holds and following from [27, Definition 2.1],

$$\begin{aligned} \begin{aligned} \phi '(\textbf{y};\textbf{M})&=[\phi ^{(0)}_{\textbf{y},\textbf{M}}(\textbf{m}_{(1)})\;\phi ^{(1)}_{\textbf{y},\textbf{M}}(\textbf{m}_{(2)})\;\cdot \cdot \cdot \;\phi ^{(p-1)}_{\textbf{y},\textbf{M}}(\textbf{m}_{(p)})]\\&=[\textbf{x}^\textrm{T}\textbf{J}\textbf{c}(\textbf{y})\textbf{m}_{(1)}\;\textbf{x}^\textrm{T}\textbf{J}\textbf{c}(\textbf{y})\textbf{m}_{(2)}\;\cdot \cdot \cdot \;\textbf{x}^\textrm{T}\textbf{J}\textbf{c}(\textbf{y})\textbf{m}_{(p)}]\\&=\textbf{x}^\textrm{T}\textbf{J}\textbf{c}(\textbf{y})\textbf{M}. \end{aligned} \end{aligned}$$

$\square $

Theorem 4 shows that under Assumption 3, an LD-derivative $\phi '(\textbf{y};\textbf{M})$ can be computed by solving a sequence of LPs defined in (22). Since (24) holds, similarly to Remark 5, if one LP in the sequence has a unique solution, then there is no need to solve the rest LPs in the sequence. Thus, the method for examining the uniqueness of LPs’ optimal solutions described in Proposition 3 is also useful here. Embedded with this solutions’ uniqueness examination method, the complete algorithm for computing $\phi '(\textbf{y};\textbf{M})$ in Assumption 3 is presented below.

If the original LP in (19) for some $\bar{\textbf{y}}\in Y$ has a unique solution, then [5, Proposition 1.13 and Theorem 2.1] imply that $\phi $ is differentiable at $\bar{\textbf{y}}$. In this case, Algorithm 2 only requires solving one LP to furnish the desired LD-derivative, namely the solutions’ uniqueness examination LP of this original LP. Similarly to the discussion after Algorithm 1, we expect this to happen frequently in practice, since $\phi $ is differentiable almost everywhere. Algorithm 2 requires solving at most 2p LPs.

Now, we compare Algorithm 2 to an established method [5] which can be used to compute a generalized Jacobian element of $\phi $ in Assumption 3. According to [5, Theorem 2.1], an element $\textbf{s}\in \partial \phi (\bar{\textbf{y}})$ is given by:

$$\begin{aligned} \textbf{s}\equiv [\textbf{J}\textbf{c}(\bar{\textbf{y}})]^\textrm{T}\textbf{x},\quad \text {for any } \textbf{x}\in M(\bar{\textbf{y}}), \end{aligned}$$

(26)

where $M(\bar{\textbf{y}})$ is the optimal solution set of the LP in (19) for $\textbf{y}:=\bar{\textbf{y}}$. Both this method and Algorithm 2 can compute a valid element of the generalized Jacobian. Recall that an L-derivative can be computed from an LD-derivative via (2), and an L-derivative in this scalar-valued function case is an element of the generalized Jacobian. It appears that (26) is more efficient since no auxiliary LP needs to be solved. However, Algorithm 2 offers two advantages. Firstly, unlike the generalized Jacobian, the LD-derivatives obtained by Algorithm 3 satisfy a sharp chain rule [27], and thus allow dealing with more complex problems with $\phi $ embedded. Secondly, since $\phi $ is piecewise differentiable as shown in Theorem 4, in this case, an L-derivative is also guaranteed to be an element of the B-subdifferential in Definition 1 ( [27, Corollary 3.6]). Such element enables solving nonsmooth equation systems with $\phi $ embedded using the B-subdifferential-based Newton method [30]. It has been shown in [27] that for piecewise differentiable functions, this generalized Newton method exhibits local Q-quadratic convergence under less stringent conditions than the semismooth Newton method [40] based on elements of the generalized Jacobian.

4 Conclusions and future work

This article has proposed new methods for computing LD-derivatives of optimal-value functions with embedded parameterized convex programs, with potential applications in nonsmooth equation solving and optimization. We have considered three cases of parameterized convex programs, namely the partial convexity case, the joint convexity case, and the case where the parameters appear in the objective function of LPs. In each case, our new method computes an LD-derivative by solving a sequence of LPs. In the partial convexity case, the key assumptions of the new method are that the strong Slater conditions hold for the embedded convex NLP’s decision space, and this convex NLP has a unique optimal solution for the parameters’ values of interest. It has been shown that these assumptions are essentially less stringent than the conditions required by the only established methods [48, 49] that may be used for our purpose. Moreover, under these less stringent conditions, our new Algorithm 1 is also evidently computationally preferable over these methods. We then showed in Theorem 3 that in the joint convexity case, the proposed method does not require a unique optimal solution of the embedded convex programs, and any optimal solution can be used for furnishing the desired LD-derivatives. This feature is particularly beneficial for implementation. Prior to this article, there is no established method for furnishing computationally relevant generalized derivatives in this case. In the case of parameterized LP’s objective function, our new method does not require Slater conditions for the decision space, does not require a unique optimal solution, and yields valid B-subdifferential elements for facilitating piecewise differentiable equation-solving. Lastly, Algorithms 1 and 2 employ an LP’s solutions’ uniqueness examination scheme as described in Proposition 3, which may significantly reduce computational effort for evaluating LD-derivatives in practice.

An interesting application of the proposed results is to design convergent algorithms for solving large-scale decomposable programs of the form

$$\begin{aligned} \begin{aligned} \min _{\textbf{x}\in X, \textbf{y}\in Y} \quad&f(\textbf{x},\textbf{y})\\ \text {s.t.}\quad&\textbf{g}(\textbf{x},\textbf{y})\le \textbf{0},\\ \quad&\textbf{h}(\textbf{x},\textbf{y})=\textbf{0}, \end{aligned} \end{aligned}$$

(27)

where $\textbf{y}$ are the complicating variables, and the functions f, $\textbf{g}$, and $\textbf{h}$ are convex with respect to $\textbf{x}$ for each fixed $\textbf{y}$. This problem is equivalent to

$$\begin{aligned} \begin{aligned} \min _{\textbf{y}\in Y^*}\quad&\phi (\textbf{y}), \end{aligned} \end{aligned}$$

(28)

where $Y^*$ is the projection of the original decision space onto $\textbf{y}$-space [17], and $\phi (\textbf{y}):=\min \{f(\textbf{x},\textbf{y}):\textbf{g}(\textbf{x},\textbf{y})\le \textbf{0},\textbf{h}(\textbf{x},\textbf{y})=\textbf{0},\textbf{x}\in X\}$. The generalized Benders decomposition [17] is a prevalent approach for solving such problems. This approach progressively refines upper and lower bounds of the globally optimal objective value, by solving certain auxiliary subproblems. The results of this article may open a new avenue for solving such NLPs with partial convexity. When $\textbf{y}$ are continuous variables, the proposed results may be applied to evaluate a generalized derivative of $\phi (\textbf{y})$, and thus enables solving (28) directly using nonsmooth optimization methods, while still exploiting any decomposable structure when evaluating $\phi (\textbf{y})$. Such methods would not require the functions in (27) to have Property P (c.f. [17]), which is crucial to apply the generalized Benders decomposition.

Data availability statements

Data sharing not applicable to this article as no datasets were generated or analysed during the current study.

References

Barton, P.I., Khan, K.A., Stechlinski, P., Watson, H.A.: Computationally relevant generalized derivatives: theory, evaluation, and applications. Optim. Methods Softw. 33(4–6), 1030–1072 (2018)
Article MathSciNet Google Scholar
Bertsekas, D.P.: Nonlinear Programming. Athena Scientific, Belmont (1999)
Google Scholar
Bertsimas, D.P., Tsitsiklis, J.N.: Introduction to Linear Optimization. Athena Scientific, Belmont (1997)
Google Scholar
Bonnans, J.F., Shapiro, A.: Perturbation Analysis of Optimization Problems. Springer, Berlin (2000)
Book Google Scholar
Clarke, F.H.: Generalized gradients and applications. Trans. Am. Math. Soc. 205, 247–262 (1975)
Article MathSciNet Google Scholar
Clarke, F.H.: Optimization and Nonsmooth Analysis. SIAM, Philadelphia (1990)
Book Google Scholar
Craven, B., Janin, R.: Regularity properties of the optimal value function in non linear programming. Optimization 28(1), 1–7 (1993)
Article MathSciNet Google Scholar
Danskin, J.M.: The theory of max–min, with applications. SIAM J. Appl. Math. 14(4), 641–664 (1966)
Article MathSciNet Google Scholar
De Wolf, D., Smeers, Y.: Generalized derivatives of the optimal value of a linear program with respect to matrix coefficients. Eur. J. Oper. Res. 291(2), 491–496 (2021)
Article MathSciNet Google Scholar
Facchinei, F., Fischer, A., Herrich, M.: An LP-Newton method: nonsmooth equations, KKT systems, and nonisolated solutions. Math. Program. 146(1), 1–36 (2014)
Article MathSciNet Google Scholar
Facchinei, F., Pang, J.S.: Finite-Dimensional Variational Inequalities and Complementarity Problems. Springer, New York (2003)
Google Scholar
Fiacco, A.V.: Introduction to Sensitivity and Stability Analysis in Nonlinear Programming, vol. 165. Academic Press, New York (1983)
Google Scholar
Fiacco, A.V., Kyparisis, J.: Convexity and concavity properties of the optimal value function in parametric nonlinear programming. J. Optim. Theory Appl. 48(1), 95–126 (1986)
Article MathSciNet Google Scholar
Fiacco, A.V., McCormick, G.P.: Nonlinear Programming: Sequential Unconstrained Minimization Techniques. Wiley, New York (1968)
Google Scholar
Fischer, A., Herrich, M., Izmailov, A.F., Solodov, M.V.: A globally convergent LP-Newton method. SIAM J. Optim. 26(4), 2012–2033 (2016)
Article MathSciNet Google Scholar
Gauvin, J.: The generalized gradient of a marginal function in mathematical programming. Math. Oper. Res. 4(4), 458–463 (1979)
Article MathSciNet Google Scholar
Geoffrion, A.M.: Generalized Benders decomposition. J. Optim. Theory Appl. 10(4), 237–260 (1972)
Article MathSciNet Google Scholar
Gomez, J.A., Höffner, K., Khan, K.A., Barton, P.I.: Generalized derivatives of lexicographic linear programs. J. Optim. Theory Appl. 178(2), 477–501 (2018)
Article MathSciNet Google Scholar
Heinonen, J.: Lectures on Lipschitz Analysis. University of Jyväskylä, Jyväskylä (2005)
Google Scholar
Hiriart-Urruty, J.B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms I: Fundamentals. Springer, Berlin (2013)
Google Scholar
Höffner, K., Khan, K.A., Barton, P.I.: Generalized derivatives of dynamic systems with a linear program embedded. Automatica 63, 198–208 (2016)
Article MathSciNet Google Scholar
Hogan, W.W.: Directional derivatives for extremal-value functions with applications to the completely convex case. Oper. Res. 21(1), 188–209 (1973)
Article MathSciNet Google Scholar
Hogan, W.W.: Point-to-set maps in mathematical programming. SIAM Rev. 15(3), 591–603 (1973)
Article MathSciNet Google Scholar
IBM ILOG: V20.1.0: User’s manual for CPLEX. International Business Machines Corporation (2020)
Khan, K.A.: Branch-locking AD techniques for nonsmooth composite functions and nonsmooth implicit functions. Optim. Methods Softw. 33(4–6), 1127–1155 (2018)
Article MathSciNet Google Scholar
Khan, K.A., Barton, P.I.: Generalized derivatives for solutions of parametric ordinary differential equations with non-differentiable right-hand sides. J. Optim. Theory Appl. 163(2), 355–386 (2014)
Article MathSciNet Google Scholar
Khan, K.A., Barton, P.I.: A vector forward mode of automatic differentiation for generalized derivative evaluation. Optim. Methods Softw. 30(6), 1185–1212 (2015)
Article MathSciNet Google Scholar
Klatte, D., Kummer, B.: Stability properties of infima and optimal solutions of parametric optimization problems. In: Demyanov, V.F., Pallaschke, D. (eds.) Nondifferentiable Optimization: Motivations and Applications, pp. 215–229. Springer, Berlin (1985)
Chapter Google Scholar
Klatte, D., Kummer, B.: Nonsmooth Equations in Optimization: Regularity, Calculus, Methods and Applications. Kluwer, Dordecht (2002)
Google Scholar
Kojima, M., Shindo, S.: Extension of Newton and quasi-Newton methods to systems of PC¹ equations. J. Oper. Res. Soc. Jpn. 29(4), 352–375 (1986)
Google Scholar
Lemaréchal, C., Strodiot, J.J., Bihain, A.: On a bundle algorithm for nonsmooth optimization. In: Mangasarian, O.L., Meyer, R.R., Robinson, S.M. (eds.) Nonlinear Programming 4, pp. 245–282. Academic Press, New York (1981)
Chapter Google Scholar
Lukšan, L., Vlček, J.: A bundle-Newton method for nonsmooth unconstrained minimization. Math. Program. 83(1), 373–391 (1998)
Article MathSciNet Google Scholar
Mangasarian, O.: Uniqueness of solution in linear programming. Linear Algebra Appl. 25, 151–162 (1979)
Article MathSciNet Google Scholar
Mordukhovich, B.S., Nam, N.M., Yen, N.D.: Subgradients of marginal functions in parametric mathematical programming. Math. Program. 116(1), 369–396 (2009)
Article MathSciNet Google Scholar
Nesterov, Y.: Lexicographic differentiation of nonsmooth functions. Math. Program. 104(2–3), 669–700 (2005)
Article MathSciNet Google Scholar
Nielsen, C.J., Barton, P.I.: 110th anniversary: a generalized nonsmooth operator for process integration. Ind. Eng. Chem. Res. 59(1), 253–264 (2019)
Article Google Scholar
Pang, J.S.: Newton’s method for B-differentiable equations. Math. Oper. Res. 15(2), 311–341 (1990)
Article MathSciNet Google Scholar
Penot, J.P.: Differentiability properties of optimal value functions. Can. J. Math. 56(4), 825–842 (2004)
Article MathSciNet Google Scholar
Qi, L.: Convergence analysis of some algorithms for solving nonsmooth equations. Math. Oper. Res. 18(1), 227–244 (1993)
Article MathSciNet Google Scholar
Qi, L., Sun, J.: A nonsmooth version of Newton’s method. Math. Program. 58(1), 353–367 (1993)
Article MathSciNet Google Scholar
Ralph, D., Dempe, S.: Directional derivatives of the solution of a parametric nonlinear program. Math. Program. 70(1), 159–172 (1995)
Article MathSciNet Google Scholar
Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)
Book Google Scholar
Rockafellar, R.T.: Proximal subgradients, marginal values, and augmented Lagrangians in nonconvex optimization. Math. Oper. Res. 6(3), 424–436 (1981)
Article MathSciNet Google Scholar
Rockafellar, R.T.: Lagrange multipliers and subderivatives of optimal value functions in nonlinear programming. Math. Program. Study 17, 28–66 (1982)
Article MathSciNet Google Scholar
Rockafellar, R.T.: Directional differentiability of the optimal value function in a nonlinear programming problem. In: Sensitivity, Stability and Parametric Analysis, pp. 213–226. Springer (1984)
Rockafellar, R.T., Wets, R.J.B.: Variational Analysis, vol. 317. Springer, Berlin (2009)
Google Scholar
Scholtes, S.: Introduction to Piecewise Differentiable Equations. Springer, Berlin (2012)
Book Google Scholar
Stechlinski, P., Jäschke, J., Barton, P.I.: Generalized sensitivity analysis of nonlinear programs using a sequence of quadratic programs. Optimization 68(2–3), 485–508 (2019)
Article MathSciNet Google Scholar
Stechlinski, P., Khan, K.A., Barton, P.I.: Generalized sensitivity analysis of nonlinear programs. SIAM J. Optim. 28(1), 272–301 (2018)
Article MathSciNet Google Scholar
Sweetser, T.: A minimal set-valued strong derivative for vector-valued Lipschitz functions. J. Optim. Theory Appl. 23(4), 549–562 (1977)
Article MathSciNet Google Scholar
Thibault, L.: On subdifferentials of optimal value functions. SIAM J. Control. Optim. 29(5), 1019–1036 (1991)
Article MathSciNet Google Scholar
Ulbrich, M.: Semismooth Newton methods for operator equations in function spaces. SIAM J. Optim. 13(3), 805–841 (2002)
Article MathSciNet Google Scholar
Vikse, M., Watson, H.A., Kim, D., Barton, P.I., Gundersen, T.: Optimization of a dual mixed refrigerant process using a nonsmooth approach. Energy 196, 116999 (2020)
Article Google Scholar
Wächter, A., Biegler, L.T.: On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Math. Program. 106(1), 25–57 (2006)
Article MathSciNet Google Scholar
Watson, H.A., Vikse, M., Gundersen, T., Barton, P.I.: Optimization of single mixed-refrigerant natural gas liquefaction processes described by nondifferentiable models. Energy 150, 860–876 (2018)
Article Google Scholar

Download references

Funding

’Open Access funding provided by the MIT Libraries’

Author information

Authors and Affiliations

Process Systems Engineering Laboratory, Department of Chemical Engineering, MIT, 77 Massachusetts Ave, Cambridge, MA, 02139, USA
Yingkai Song & Paul I. Barton

Authors

Yingkai Song
View author publications
You can also search for this author in PubMed Google Scholar
Paul I. Barton
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Paul I. Barton.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was funded by the OCP Group, Morocco.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Song, Y., Barton, P.I. Generalized derivatives of optimal-value functions with parameterized convex programs embedded. J Glob Optim 89, 355–378 (2024). https://doi.org/10.1007/s10898-023-01359-9

Download citation

Received: 27 June 2022
Accepted: 23 December 2023
Published: 25 January 2024
Issue Date: June 2024
DOI: https://doi.org/10.1007/s10898-023-01359-9

Generalized derivatives of optimal-value functions with parameterized convex programs embedded

Abstract

Similar content being viewed by others

Directional Derivative of the Value Function for Parametric Set-Constrained Optimization Problems

Second-Order Optimality Conditions for Constrained Optimization Problems with \(C^1\) Data Via Regular and Limiting Subdifferentials

Value Functions and Their Directional Derivatives in Parametric Nonlinear Programming

1 Introduction

2 Preliminaries

2.1 Notation

2.2 Generalized Jacobian

Definition 1

2.3 Lexicographic derivatives

Definition 2

Definition 3

Definition 4

2.4 Set-valued mappings

Definition 5

Definition 6

Definition 7

Definition 8

Definition 9

3 Computing generalized derivatives

3.1 Partial convexity

Assumption 1

Proposition 1

Proof

Remark 1

Lemma 1

Proof

Remark 2

Lemma 2

Proof

Lemma 3

Proof

Theorem 1

Proof

Remark 3

Remark 4

Proposition 2

Proof

Remark 5

Proposition 3

Proof

Theorem 2

Proof

Remark 6

3.2 Joint convexity

Assumption 2

Theorem 3

Proof

Remark 7

3.3 Parameterized LP’s objective function

Assumption 3

Proposition 4

Proposition 5

Proof

Theorem 4

Proof

4 Conclusions and future work

Data availability statements

References

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Mathematics Subject Classification

Search

Navigation