Abstract
In data-driven inverse optimization an observer aims to learn the preferences of an agent who solves a parametric optimization problem depending on an exogenous signal. Thus, the observer seeks the agent’s objective function that best explains a historical sequence of signals and corresponding optimal actions. We focus here on situations where the observer has imperfect information, that is, where the agent’s true objective function is not contained in the search space of candidate objectives, where the agent suffers from bounded rationality or implementation errors, or where the observed signal-response pairs are corrupted by measurement noise. We formalize this inverse optimization problem as a distributionally robust program minimizing the worst-case risk that the predicted decision (i.e., the decision implied by a particular candidate objective) differs from the agent’s actual response to a random signal. We show that our framework offers rigorous out-of-sample guarantees for different loss functions used to measure prediction errors and that the emerging inverse optimization problems can be exactly reformulated as (or safely approximated by) tractable convex programs when a new suboptimality loss function is used. We show through extensive numerical tests that the proposed distributionally robust approach to inverse optimization often attains better out-of-sample performance than state-of-the-art approaches.
Notes
A possible choice is \(\beta _N=\exp (-\sqrt{N})\).
We did not consider the ERM approach in Sect. 5 because it coincides with the VI approach for linear hypotheses.
References
Ackerberg, D., Benkard, C.L., Berry, S., Pakes, A.: Econometric tools for analyzing market outcomes. In: Heckman, J., Leamer, E. (eds.) Handbook of Econometrics, pp. 4171–4276. Elsevier, Amsterdam (2007)
Ahmed, S., Guan, Y.: The inverse optimal value problem. Math. Program. A 102, 91–110 (2005)
Ahuja, R.K., Orlin, J.B.: A faster algorithm for the inverse spanning tree problem. J. Algorithms 34, 177–193 (2000)
Ahuja, R.K., Orlin, J.B.: Inverse optimization. Oper. Res. 49, 771–783 (2001)
Anderson, B.D., Moore, J.B.: Optimal Control: Linear Quadratic Methods. Courier Corporation, North Chelmsford (2007)
Aswani, A., Shen, Z.-J.M., Siddiq, A.: Inverse optimization with noisy data. Preprint arXiv:1507.03266 (2015)
Bajari, P., Benkard, C.L., Levin, J.: Estimating dynamic models of imperfect competition. Technical Report, National Bureau of Economic Research (2004)
Ben-Ayed, O., Blair, C.E.: Computational difficulties of bilevel linear programming. Oper. Res. 38, 556–560 (1990)
Ben-Tal, A., El Ghaoui, L., Nemirovski, A.: Robust Optimization. Princeton University Press, Princeton (2009)
Bertsimas, D., Gupta, V., Paschalidis, I.C.: Inverse optimization: a new perspective on the Black–Litterman model. Oper. Res. 60, 1389–1403 (2012)
Bertsimas, D., Gupta, V., Paschalidis, I.C.: Data-driven estimation in equilibrium using inverse optimization. Math. Program. 153, 595–633 (2015)
Boissard, E.: Simple bounds for convergence of empirical and occupation measures in 1-Wasserstein distance. Electron. J. Probab. 16, 2296–2333 (2011)
Bottasso, C.L., Prilutsky, B.I., Croce, A., Imberti, E., Sartirana, S.: A numerical procedure for inferring from experimental data the optimization cost functions using a multibody model of the neuro-musculoskeletal system. Multibody Syst. Dyn. 16, 123–154 (2006)
Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2009)
Braess, D., Nagurney, A., Wakolbinger, T.: On a paradox of traffic planning. Transp. Sci. 39, 446–450 (2005)
Bubeck, S.: Convex optimization: algorithms and complexity. Found. Trends Mach. Learn. 8, 231–357 (2015)
Burton, D., Toint, P.L.: On an instance of the inverse shortest paths problem. Math. Program. 53, 45–61 (1992)
Carr, S., Lovejoy, W.: The inverse newsvendor problem: choosing an optimal demand portfolio for capacitated resources. Manag. Sci. 46, 912–927 (2000)
Chan, T.C., Craig, T., Lee, T., Sharpe, M.B.: Generalized inverse multiobjective optimization with application to cancer therapy. Oper. Res. 62, 680–695 (2014)
Faragó, A., Szentesi, Á., Szviatovszki, B.: Inverse optimization in high-speed networks. Discrete Appl. Math. 129, 83–98 (2003)
Fournier, N., Guillin, A.: On the rate of convergence in Wasserstein distance of the empirical measure. Probab. Theory Relat. Fields 162, 707–738 (2014)
Friedman, J., Hastie, T., Tibshirani, R.: The Elements of Statistical Learning. Springer, Berlin (2001)
Harker, P., Pang, J.: Finite-dimensional variational inequality and nonlinear complementarity problems: a survey of theory, algorithms and applications. Math. Program. 48, 161–220 (1990)
Heuberger, C.: Inverse combinatorial optimization: a survey on problems, methods, and results. J. Comb. Optim. 8, 329–361 (2004)
Hochbaum, D.S.: Efficient algorithms for the inverse spanning-tree problem. Oper. Res. 51, 785–797 (2003)
Iyengar, G., Kang, W.: Inverse conic programming with applications. Oper. Res. Lett. 33, 319–330 (2005)
Keshavarz, A., Wang, Y., Boyd, S.: Imputing a convex objective function. In: IEEE International Symposium on Intelligent Control, pp. 613–619 (2011)
Lofberg, J.: YALMIP: a toolbox for modeling and optimization in MATLAB. In: IEEE International Symposium on Computer Aided Control Systems Design, pp. 284–289 (2004)
Markowitz, H.M.: Portfolio Selection: Efficient Diversification of Investments. Yale University Press, New Haven (1968)
Mohajerin Esfahani, P., Kuhn, D.: Data-driven distributionally robust optimization using the Wasserstein metric: performance guarantees and tractable reformulations. Math. Program. (2017). https://doi.org/10.1007/s10107-017-1172-1
Murty, K.G., Kabadi, S.N.: Some NP-complete problems in quadratic and nonlinear programming. Math. Program. 39, 117–129 (1987)
Neumann-Denzau, G., Behrens, J.: Inversion of seismic data using tomographical reconstruction techniques for investigations of laterally inhomogeneous media. Geophys. J. Int. 79, 305–315 (1984)
Rockafellar, R.T., Uryasev, S.: Optimization of conditional value-at-risk. J. Risk 2, 21–42 (2000)
Saez-Gallego, J., Morales, J.M., Zugno, M., Madsen, H.: A data-driven bidding model for a cluster of price-responsive consumers of electricity. IEEE Trans. Power Syst. 31, 5001–5011 (2016)
Schaefer, A.J.: Inverse integer programming. Optim. Lett. 3, 483–489 (2009)
Shafieezadeh-Abadeh, S., Kuhn, D., Mohajerin Esfahani, P.: Regularization via mass transportation. Preprint arXiv:1710.10016 (2017)
Shafieezadeh-Abadeh, S., Mohajerin Esfahani, P., Kuhn, D.: Distributionally robust logistic regression. In: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 28, pp. 1576–1584. Curran Associates, Inc. (2015)
Sion, M.: On general minimax theorems. Pac. J. Math. 8, 171–176 (1958)
Terekhov, A.V., Pesin, Y.B., Niu, X., Latash, M.L., Zatsiorsky, V.M.: An analytical approach to the problem of inverse optimization with additive objective functions: an application to human prehension. J. Math. Biol. 61, 423–453 (2010)
Troutt, M.D., Pang, W.-K., Hou, S.-H.: Behavioral estimation of mathematical programming objective function coefficients. Manag. Sci. 52, 422–434 (2006)
Wang, L.: Cutting plane algorithms for the inverse mixed integer linear programming problem. Oper. Res. Lett. 37, 114–116 (2009)
Woodhouse, J.H., Dziewonski, A.M.: Mapping the upper mantle: three-dimensional modeling of earth structure by inversion of seismic waveforms. J. Geophys. Res. 89, 5953–5986 (1984)
Zhang, J., Xu, C.: Inverse optimization for linearly constrained convex separable programming problems. Eur. J. Oper. Res. 200, 671–679 (2010)
Acknowledgements
This work was supported by the Swiss National Science Foundation grant BSCGI0_157733.
Appendix A. Convex hypotheses
Consider the class \(\mathcal {F}\) of hypotheses of the form \(F_\theta (s,x):=\big \langle \theta , \Psi (x) \big \rangle \), where each component function of the feature map \(\Psi :{\mathbb {R}}^n\rightarrow {\mathbb {R}}^d\) is convex, and where the weight vector \(\theta \) ranges over a convex closed search space \(\Theta \subseteq {\mathbb {R}}^d_+\). Thus, by construction, \(F_\theta (s,x)\) is convex in x for every fixed \(\theta \in \Theta \). In the remainder we will assume without much loss of generality that the transformation from the signal-response pair (s, x) to the signal-feature pair \((s,\Psi (x))\) is Lipschitz continuous with Lipschitz modulus 1, that is, we require
\[ \big \Vert \big (s,\Psi (x)\big )-\big (s',\Psi (x')\big )\big \Vert \;\le \; \big \Vert (s,x)-(s',x')\big \Vert \qquad \forall (s,x),\,(s',x')\in \Xi . \qquad (34) \]
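For instance (this particular feature map is purely illustrative and is not prescribed above), the choice \(\Psi (x)=\big (x_1^2,\dots ,x_n^2,\,x_1,-x_1,\dots ,x_n,-x_n\big )\) generates hypotheses of the form \(F_\theta (s,x)=\sum _{j=1}^n a_j x_j^2+\big \langle b, x \big \rangle \) with \(a\ge 0\) and \(b\in {\mathbb {R}}^n\) unrestricted in sign, where \(a_j=\theta _j\) and \(b_j=\theta _{n+2j-1}-\theta _{n+2j}\); every component of \(\Psi \) is convex, and \(F_\theta (s,x)\) is indeed convex in x for every \(\theta \in {\mathbb {R}}^{3n}_+\).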
Before devising a safe conic approximation for the inverse optimization problem (13) with convex hypotheses, we recall that the conjugate of a function \(f:{\mathbb {R}}^n\rightarrow {\mathbb {R}}\) is defined through \(f^*(z)=\sup _{x\in {\mathbb {R}}^n} \big \langle z, x \big \rangle -f(x)\).
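To fix ideas with a simple univariate illustration, for \(f(x)=x^2\) the supremum in the definition is attained at \(x=z/2\), so \(f^*(z)=z^2/4\); consequently, a perspective term of the kind \(\theta f^*(z/\theta )\) appearing in (40) below reduces to \(z^2/(4\theta )\) for \(\theta >0\).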
Theorem A.1
(Convex hypotheses and suboptimality loss) Assume that \(\mathcal {F}\) represents the class of convex hypotheses induced by the feature map \(\Psi \) and with a convex closed search space \(\Theta \subseteq {\mathbb {R}}_+^d\) and that Assumption 5.1 holds. If the observer uses the suboptimality loss (4b) and measures risk using the \(\mathrm{CVaR}\) at level \(\alpha \in (0,1]\), then the following convex program provides a safe approximation for the distributionally robust inverse optimization problem (13) over the 1-Wasserstein ball:
Note that Theorem A.1 remains valid if \((\widehat{s}_i, \widehat{x}_i)\notin \Xi \) for some \(i\le N\).
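To make the ingredients of Theorem A.1 concrete, the following is a minimal sketch of the non-robust special case only: it minimizes the empirical CVaR of the suboptimality loss (4b) for linear hypotheses \(F_\theta (s,x)=\big \langle \theta , x \big \rangle \) and covering-type feasible sets \({\mathbb {X}}(s)=\{y\ge 0: Ay\ge s\}\), using LP duality to handle the inner minimization. The data, the normalization of \(\theta \), and the use of the cvxpy package are illustrative assumptions; the Wasserstein robustification and the convex-hypothesis machinery of Theorem A.1 are deliberately omitted.

```python
import cvxpy as cp
import numpy as np

# Illustrative (hypothetical) data: linear hypotheses F_theta(s, x) = <theta, x>
# and polyhedral feasible sets X(s) = {y >= 0 : A y >= s} (a covering-type polytope).
n, m, N = 4, 6, 50                               # decision dim, #constraints (= signal dim), samples
rng = np.random.default_rng(0)
A = rng.uniform(0.1, 1.0, size=(m, n))           # nonnegative constraint matrix
S_hat = rng.uniform(0.5, 1.5, size=(N, m))       # observed signals (right-hand sides)
X_hat = rng.uniform(0.0, 2.0, size=(N, n))       # observed (possibly noisy) responses
alpha = 0.1                                      # CVaR level

theta = cp.Variable(n, nonneg=True)              # candidate objective weights
tau = cp.Variable()                              # CVaR auxiliary variable
t = cp.Variable(N)                               # epigraph variables for the suboptimality loss
gammas = [cp.Variable(m, nonneg=True) for _ in range(N)]

constraints = [cp.sum(theta) == 1]               # illustrative normalization excluding theta = 0
for i in range(N):
    # LP duality: gamma_i >= 0 and A' gamma_i <= theta imply
    # <s_i, gamma_i> <= min_{y in X(s_i)} <theta, y>, so t_i upper-bounds the suboptimality loss.
    constraints += [A.T @ gammas[i] <= theta,
                    theta @ X_hat[i] - gammas[i] @ S_hat[i] <= t[i]]

# CVaR_alpha of the suboptimality losses (Rockafellar-Uryasev representation)
empirical_cvar = tau + cp.sum(cp.pos(t - tau)) / (alpha * N)
cp.Problem(cp.Minimize(empirical_cvar), constraints).solve()
print("imputed objective weights:", theta.value)
```

Replacing the empirical distribution by a 1-Wasserstein ball of radius \(\varepsilon >0\) introduces the dual multiplier and transportation-cost terms that appear in the proof below.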
Proof of Theorem A.1
As in the proof of Theorem 5.2 one can show that the objective function of the inverse optimization problem (13) is equivalent to (19). In the remainder, we derive a safe conic approximation for the (intractable) subordinate worst-case expectation problem
To this end, note that the suboptimality loss \(\ell _\theta (s,x) = \big \langle \theta , \Psi (x) \big \rangle - \min _{y \in {\mathbb {X}}(s)} \big \langle \theta , \Psi (y) \big \rangle \) depends on x only through \(\Psi (x)\). This motivates us to define a lifted suboptimality loss \(\ell ^\Psi _\theta (s,\psi ) = \big \langle \theta , \psi \big \rangle - \min _{y \in {\mathbb {X}}(s)} \big \langle \theta , \Psi (y) \big \rangle \), where \(\psi \) represents an element of the feature space \({\mathbb {R}}^d\), and the empirical distribution \(\widehat{\mathbb {P}}_N^\Psi = \frac{1}{N}\sum _{i=1}^N \delta _{(\widehat{s}_i, \Psi (\widehat{x}_i))}\) of the signal-feature pairs. Moreover, we denote by \({\mathbb {B}}^1_{\varepsilon } (\widehat{\mathbb {P}}_N^\Psi )\) the 1-Wasserstein ball of all distributions on \({\mathbb {S}} \times {\mathbb {R}}^d\) that have a distance of at most \(\varepsilon \) from \(\widehat{\mathbb {P}}_N^\Psi \). In the following we will show that the worst-case expectation
on the signal-feature space \({\mathbb {S}} \times {\mathbb {R}}^d\) provides a tractable upper bound on the worst-case expectation (36). By Definition 4.3, each distribution \(\mathbb {Q}\in {\mathbb {B}}^1_{\varepsilon } (\widehat{\mathbb {P}}_N)\) corresponds to a transportation plan \(\Pi \), that is, a joint distribution of two signal-response pairs (s, x) and \((s',x')\) under which (s, x) has marginal distribution \(\mathbb {Q}\) and \((s',x')\) has marginal distribution \(\widehat{\mathbb {P}}_N\). By the law of total probability, the transportation plan can be expressed as \(\Pi =\frac{1}{N}\sum _{i=1}^N \delta _{(\widehat{s}_i,\widehat{x}_i)}\otimes \mathbb {Q}_i\), where \(\mathbb {Q}_i\) denotes the conditional distribution of (s, x) given \((s',x')=(\widehat{s}_i,\widehat{x}_i)\), \(i\le N\); see also [30, Theorem 4.2]. Thus, the worst-case expectation (36) satisfies
where the first inequality follows from (34), while the second inequality follows from relaxing the implicit condition that the signal-feature pair \((s,\psi )\) must be supported on \(\{(s,\Psi (x)):(s,x)\in \Xi \}\subseteq \mathbb {S}\times {\mathbb {R}}^d\). By a similar reasoning as before, the last expression is readily recognized as the worst-case expectation (37). Thus, (37) indeed provides an upper bound on (36). Duality arguments borrowed from [30, Theorem 4.2] (a generic form of this duality is recalled after the proof) imply that the infinite-dimensional linear program (37) admits a strong dual of the form
By using the definitions of \(\mathbb S\) and \(\mathbb {X}(s)\) put forth in Assumption 5.1, the i-th member of the first constraint group in (38) is satisfied if and only if the optimal value of the maximization problem
does not exceed \(r_i\). A tedious but routine calculation shows that the dual of (39) can be represented as
Note that the perspective functions \(\theta _j \Psi ^*_j(z_{ij}/\theta _j)\) in the objective of (40) emerge from taking the conjugate of \(\theta _j \Psi _j(y)\). Thus, for \(\theta _j=0\) we must interpret \(\theta _j \Psi ^*_j(z_{ij}/\theta _j)\) as an indicator function in \(z_{ij}\) which vanishes for \(z_{ij}=0\) and equals \(\infty \) otherwise. By weak duality, (40) provides an upper bound on (39). We conclude that the i-th member of the first constraint group in (38) is satisfied whenever the dual problem (40) has a feasible solution whose objective value does not exceed \(r_i\). A similar reasoning shows that the i-th member of the second constraint group in (38) holds if and only if there exists \(\phi _{i2} \in \mathcal {C}^*\) such that
Thus, the worst-case expectation (36) is bounded above by the optimal value of the finite convex program
The claim then follows by substituting this convex program into (19).
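For the reader's convenience, and modulo the lifting to the signal-feature space used above, the strong duality borrowed from [30, Theorem 4.2] takes the following generic form for a 1-Wasserstein ball induced by a norm \(\Vert \cdot \Vert \) (under the regularity conditions of that theorem):
\[ \sup _{\mathbb {Q}\in {\mathbb {B}}^1_{\varepsilon } (\widehat{\mathbb {P}}_N^\Psi )} {\mathbb {E}}^{\mathbb {Q}}\big [\ell ^\Psi _\theta (s,\psi )\big ] \;=\; \inf _{\lambda \ge 0}\ \lambda \varepsilon + \frac{1}{N}\sum _{i=1}^N \sup _{(s,\psi )\in {\mathbb {S}}\times {\mathbb {R}}^d} \Big (\ell ^\Psi _\theta (s,\psi )-\lambda \big \Vert (s,\psi )-(\widehat{s}_i,\Psi (\widehat{x}_i))\big \Vert \Big ). \]
The epigraphic variables \(r_i\) and the two constraint groups in (38) arise from applying this identity to the loss at hand and moving the inner suprema into constraints.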